Hudi Writer vs Spark Parquet Writer - Sync

Kizhakkel Jose, Felix
Sun, 30 Aug 2020 14:16:37 -0700

Hello All,

Hive has the bucketBy feature, and Spark is going to add support for Hive-style 
bucketBy for data sources. Once that is implemented, it should substantially 
improve read performance. Since Hudi takes a different path when writing 
Parquet data, are we planning to add bucketBy functionality? 
Spark seems to be adding writer features that improve read performance. Given 
that Hudi has its own writer, are we keeping track of these new Spark features, 
so that the Hudi writer does not diverge greatly from the Spark file (Parquet) 
writer or lack its features?


Regards,
Felix K Jose


