Hudi supports pluggable indexing (HoodieIndex), and the phases of index lookup 
are nicely abstracted out. We have a Jira for supporting bucket indexing: 
https://issues.apache.org/jira/browse/HUDI-55 
You can get bucket indexing done by implementing that interface, along with 
additional changes for handling initial writes to the partition and for 
bucketing information, which IMO are not significant. If you are interested in 
contributing, we would be happy to guide you and help land the change.
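
To make the idea concrete, here is a rough sketch of the key-to-bucket mapping 
that a bucket index would be built around. This is only an illustration and 
not Hudi code: the class BucketSketch, its methods, the choice of CRC32 as the 
hash, and the zero-padded bucket label are all assumptions made for this 
example, and none of it reflects the actual HoodieIndex method signatures.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper, not part of Hudi: shows how a record key could be
// mapped deterministically to one of a fixed number of buckets.
public class BucketSketch {

    // Returns a bucket number in [0, numBuckets) for the given record key.
    // CRC32 is an arbitrary choice here; any stable hash would work.
    static int bucketFor(String recordKey, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        CRC32 crc = new CRC32();
        crc.update(recordKey.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the remainder is non-negative.
        return (int) (crc.getValue() % numBuckets);
    }

    // Assumed naming scheme: a zero-padded label that a file in the
    // partition could carry so it can be matched to its bucket.
    static String bucketLabel(int bucket) {
        return String.format("%08d", bucket);
    }

    public static void main(String[] args) {
        int bucket = bucketFor("id-123", 16);
        System.out.println("id-123 -> bucket " + bucketLabel(bucket));
    }
}
```

Because the mapping depends only on the key and the bucket count, a writer and 
a reader can both compute the bucket for a given "id" without a lookup, which 
is what would let a query skip files belonging to other buckets.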
Thanks,
Balaji.V



    On Wednesday, October 21, 2020, 07:51:07 PM PDT, Roopa Murthy 
<[email protected]> wrote:  
 
 Hello Hudi team,

We have a requirement to compact data on S3, but we need bucketing on top of 
compaction so that at query time, only the files relevant to the "id" in the 
query are scanned. We are told that bucketing is not currently supported 
in Hudi. Is it possible to extend Hudi to support it? What does it take to 
extend the framework in order to do this?

We are trying to assess, from a timeline perspective, whether this is an option 
to consider, and we need your help in analyzing and planning for it.

Thanks,
Roopa



  
