Hello All, Hive has the bucketBy feature, and Spark is going to add Hive-style bucketBy support for data sources; once that is implemented, it should improve read performance considerably. Since HUDI takes a different path when writing Parquet data, are we planning to add bucketBy functionality? Spark seems to be adding writer-side features that benefit read performance. Given that HUDI has its own writer, are we keeping track of these new Spark features, so that the HUDI writer does not diverge greatly from the Spark file (Parquet) writer or end up lacking its features?
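To make the read-side benefit concrete, here is a minimal sketch of the bucketing idea in plain Python. It is not Spark, Hive, or HUDI code: the function names, the bucket count, and the use of CRC32 as the hash are all assumptions made for illustration, and the real engines use their own hash functions and file layouts.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for this sketch


def bucket_id(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    # CRC32 stands in for the engine's real hash function (illustrative assumption).
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def write_bucketed(rows, key_field, num_buckets=NUM_BUCKETS):
    # Writer side: every row lands in the bucket derived from its key,
    # so all rows sharing a key end up in the same bucket.
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[key_field], num_buckets)].append(row)
    return dict(buckets)


def read_by_key(buckets, key_field, key, num_buckets=NUM_BUCKETS):
    # Reader side: only the one bucket that can contain the key is scanned.
    candidate = buckets.get(bucket_id(key, num_buckets), [])
    return [row for row in candidate if row[key_field] == key]
```

The point of the sketch is that the layout decision is made at write time, which is why the writer path matters for this feature: a reader can skip every bucket except one for a key lookup, and two datasets bucketed the same way on the join key can be joined bucket by bucket.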
Regards, Felix K Jose