Hudi supports pluggable indexing (HoodieIndex), and the phases of index lookup
are nicely abstracted out. We have a Jira for supporting bucket indexing:
https://issues.apache.org/jira/browse/HUDI-55
You can get bucket indexing done by implementing that interface, along with
additional changes to handle initial writes to a partition and to track
bucketing information; IMO those additional changes are not significant. If you
are interested in contributing, we would be happy to guide you and help land
the change.
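
To make the idea concrete, here is a rough sketch of what hash-based bucketing
could look like. It is not Hudi code and does not use the HoodieIndex
interface: the class name, the helper methods, and the convention of prefixing
file names with a zero-padded bucket id are all assumptions made up for
illustration. It only shows how a record key such as "id" can map to one
bucket, so that a lookup touches just the files of that bucket.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: none of these names exist in Hudi.
public class BucketSketch {

    // Map a record key to a bucket in [0, numBuckets).
    // floorMod keeps the result non-negative even for negative hash codes.
    static int bucketFor(String recordKey, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        return Math.floorMod(recordKey.hashCode(), numBuckets);
    }

    // Assumed naming convention: file names start with a zero-padded bucket id.
    static String bucketPrefix(int bucketId) {
        return String.format("%08d", bucketId);
    }

    // Keep only the files that belong to the bucket of the given key.
    static List<String> filesToScan(List<String> fileNames, String recordKey, int numBuckets) {
        String prefix = bucketPrefix(bucketFor(recordKey, numBuckets)) + "-";
        List<String> matches = new ArrayList<>();
        for (String name : fileNames) {
            if (name.startsWith(prefix)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int numBuckets = 4;
        List<String> files = new ArrayList<>();
        for (int b = 0; b < numBuckets; b++) {
            files.add(bucketPrefix(b) + "-part.parquet");
        }
        // Only the single file of the matching bucket is printed.
        System.out.println(filesToScan(files, "id-42", numBuckets));
    }
}
```

The bucket count is fixed up front in this sketch; changing it later would
move keys to different buckets, which is part of what the bucketing
information mentioned above would need to track.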
Thanks,
Balaji.V
On Wednesday, October 21, 2020, 07:51:07 PM PDT, Roopa Murthy
<[email protected]> wrote:
Hello Hudi team,
We have a requirement to compact data on S3, but we need bucketing on top of
compaction so that, at query time, only the files relevant to the "id" in the
query are scanned. We are told that bucketing is not currently supported
in Hudi. Is it possible to extend Hudi to support it? What does it take to
extend the framework in order to do this?
We are trying to assess, from a timeline perspective, whether this is an option
to consider, and we need your help in analyzing and planning for it.
Thanks,
Roopa