[I] Support writes using spark dataframe end to end [hudi]

via GitHub Sun, 30 Nov 2025 03:39:26 -0800


hudi-bot opened a new issue, #16846:
URL: https://github.com/apache/hudi/issues/16846


   We want to support Spark writes end to end using DataFrames, without 
converting the rows to Avro records.
   
    
   
   This opens up a lot of opportunities for Hudi:
    * This will place Hudi close to direct Parquet writes for straightforward 
immutable use cases.
    * For mutable use cases, we anticipate a 10 to 20% improvement over the 
RDD-based write client implementation.
    * We can leverage Spark optimizations that kick in only with DataFrames.
    * RAPIDS, vectorized reading, etc. can speed up Hudi writes once we move 
to end-to-end DataFrame writes.
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-9019
   - Type: Improvement
   - Fix version(s):
     - 1.1.0


