+1, looking forward to the RFC.

Best,
Gary
On Sun, Dec 12, 2021 at 7:12 PM leesf <[email protected]> wrote:

> +1 for the improvement to make the bucket index more comprehensive, and
> looking forward to the RFC for more details.
>
> Yuwei Xiao <[email protected]> wrote on Fri, Dec 10, 2021 at 16:22:
>
> > Dear Hudi Community,
> >
> > I would like to propose Consistent Hashing Indexing to enable a dynamic
> > bucket number, sparing Hudi users the hyper-parameter tuning.
> >
> > Currently, we have the Bucket Index landing [1]. It is an effective
> > index approach to address the performance issue during Upsert: I
> > observed ~3x throughput improvement for Upsert in my local setup
> > compared to the Bloom Filter approach. However, it requires
> > pre-configuring a bucket number when creating the table. As described
> > in [1], this imposes two limitations:
> >
> > - Due to the one-to-one mapping between buckets and file groups, the
> > size of a single file group may grow without bound. Services like
> > compaction will take longer because of the larger read/write
> > amplification.
> >
> > - Data skew may arise from an imbalanced data distribution, resulting
> > in long-tail reads and writes.
> >
> > Based on the above observations, supporting a dynamic bucket number is
> > necessary, especially for rapidly changing Hudi tables. Looking at the
> > market, Consistent Hashing has already been adopted in DB systems
> > [2][3]. Its main idea is to turn the "key -> bucket" mapping into
> > "key -> hash_value -> (range mapping) -> bucket", constraining the
> > re-hashing process to touch only a few local buckets (e.g., only large
> > file groups) rather than shuffling the whole hash table.
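To make the range-mapping idea concrete, below is a rough, self-contained
Java sketch. All class and method names are hypothetical (this is not an
existing Hudi API); it only illustrates that a lookup is a ceiling search
in a sorted range map, and that splitting one oversized bucket rewrites a
single range entry while every other bucket stays untouched:

    import java.util.TreeMap;

    /**
     * Rough sketch of the "key -> hash_value -> (range mapping) -> bucket"
     * lookup. Each bucket owns a contiguous hash range, kept in a sorted
     * map keyed by the range's upper bound.
     */
    public class ConsistentHashingSketch {

      private static final int HASH_SPACE = 1 << 20;

      // Upper bound of each bucket's hash range -> bucket (file group) id.
      private final TreeMap<Integer, String> rangeToBucket = new TreeMap<>();

      public ConsistentHashingSketch(int initialBucketNum) {
        // Evenly partition [0, HASH_SPACE) across the initial buckets.
        int step = HASH_SPACE / initialBucketNum;
        for (int i = 1; i <= initialBucketNum; i++) {
          int upper = (i == initialBucketNum) ? HASH_SPACE - 1 : i * step - 1;
          rangeToBucket.put(upper, "bucket-" + i);
        }
      }

      /** key -> hash -> smallest range upper bound >= hash -> bucket. */
      public String bucketFor(String recordKey) {
        int hash = Math.floorMod(recordKey.hashCode(), HASH_SPACE);
        return rangeToBucket.ceilingEntry(hash).getValue();
      }

      /**
       * Split one oversized bucket in the middle of its hash range. Only
       * this bucket's entry changes; all other buckets keep their ranges,
       * which is what keeps re-hashing local.
       */
      public void splitInTheMiddle(int rangeUpperBound, String newBucketId) {
        Integer prevUpper = rangeToBucket.lowerKey(rangeUpperBound);
        int start = (prevUpper == null) ? 0 : prevUpper + 1;
        int mid = start + (rangeUpperBound - start) / 2;
        if (mid >= rangeUpperBound) {
          return; // range too narrow to split further
        }
        rangeToBucket.put(mid, newBucketId);   // new bucket owns [start, mid]
        // old bucket keeps (mid, rangeUpperBound]
      }
    }

A merge would be the inverse operation: dropping one boundary so that two
adjacent ranges collapse into a single bucket.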
> > In order to introduce Consistent Hashing to Hudi, we need to consider
> > the following issues:
> >
> > - Storing the hashing metadata, such as the range mapping info. Metadata
> > size and concurrent updates to the metadata should also be considered.
> >
> > - Splitting & merging criteria. We need to design one (or several)
> > policies to decide when and how to split & merge buckets. A simple
> > policy would be to split in the middle once a file group reaches a size
> > threshold.
> >
> > - Supporting concurrent writes & reads. The splitting or merging must
> > not block concurrent writers & readers, and the whole process should be
> > fast enough (e.g., one bucket at a time) to minimize the impact on other
> > operations.
> >
> > - Integrating the splitting & merging process into the existing Hudi
> > table service pipelines.
> >
> > I have sketched a prototype design to address the above problems:
> >
> > - Maintain hashing metadata for each partition (persisted as files), and
> > use instants to manage multi-versioning and concurrent updates of it.
> >
> > - A flexible framework will be implemented for different pluggable
> > policies. The splitting plan, specifying which buckets to split (or
> > merge) and how, will be generated during scheduling (just like
> > compaction does).
> >
> > - Dual-write will be activated once the writer observes the splitting
> > (or merging) process, upserting records as log files into both the old
> > and the new buckets (file groups). Readers can see the records once the
> > writer completes, regardless of the splitting process.
> >
> > - The splitting & merging could be integrated as a sub-task into the
> > Clustering service, because we could view them as a special case of
> > Clustering's goal (i.e., managing file groups based on file size).
> > Though we would need to modify Clustering to handle log files, the
> > bucket index enhances Clustering by allowing concurrent updates.
> >
> > Would love to hear your thoughts and any feedback about the proposal. I
> > can draft an RFC with a detailed design once we reach an agreement.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/HUDI/RFC+-+29%3A+Hash+Index
> >
> > [2] YugabyteDB
> > https://docs.yugabyte.com/latest/architecture/docdb-sharding/sharding/#example
> >
> > [3] PolarDB-X
> > https://help.aliyun.com/document_detail/316603.html#title-y5n-2i1-5ws
> >
> > Best,
> >
> > Yuwei Xiao
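On the pluggable-policy point in the design sketch above, here is a
minimal Java sketch of what a size-based policy could look like (type and
field names are made up for illustration, not the proposed interface):
during scheduling it scans per-bucket file group sizes and emits a plan of
split operations, analogous to how a compaction plan is generated.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    /**
     * Hypothetical size-threshold split policy. At scheduling time it
     * looks at each bucket's file group size and plans a "split in the
     * middle" for every bucket over the threshold, one operation per
     * bucket so that each split stays small and fast.
     */
    public class SizeBasedSplitPolicySketch {

      /** One planned operation: split the given file group in the middle. */
      public static class SplitOperation {
        public final String fileGroupId;

        public SplitOperation(String fileGroupId) {
          this.fileGroupId = fileGroupId;
        }
      }

      private final long maxFileGroupSizeBytes;

      public SizeBasedSplitPolicySketch(long maxFileGroupSizeBytes) {
        this.maxFileGroupSizeBytes = maxFileGroupSizeBytes;
      }

      /** fileGroupSizes: fileGroupId -> size in bytes at scheduling time. */
      public List<SplitOperation> schedule(Map<String, Long> fileGroupSizes) {
        List<SplitOperation> plan = new ArrayList<>();
        for (Map.Entry<String, Long> entry : fileGroupSizes.entrySet()) {
          if (entry.getValue() > maxFileGroupSizeBytes) {
            plan.add(new SplitOperation(entry.getKey()));
          }
        }
        return plan;
      }
    }

A merge policy could be the mirror image, collecting adjacent buckets
whose combined size falls below a lower threshold.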

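For the dual-write behaviour described in the design sketch, here is a
rough sketch (again with made-up types, not Hudi code) of the routing
decision a writer would make: while a split or merge of a bucket is
pending, records for that bucket are upserted as log files into both the
old and the new file groups, so readers see complete results as soon as
the write completes, regardless of how far the split has progressed.

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    /**
     * Hypothetical sketch of dual-write routing. While a bucket has a
     * pending split (or merge), every upsert for that bucket is logged to
     * both the old file group and the new one being created.
     */
    public class DualWriteRoutingSketch {

      public static class BucketState {
        public final String fileGroupId;
        // Non-null only while a split/merge of this bucket is in flight.
        public final String pendingTargetFileGroupId;

        public BucketState(String fileGroupId, String pendingTargetFileGroupId) {
          this.fileGroupId = fileGroupId;
          this.pendingTargetFileGroupId = pendingTargetFileGroupId;
        }
      }

      /** File groups this record's upsert must be written to (as log files). */
      public static List<String> targetFileGroups(BucketState bucket) {
        if (bucket.pendingTargetFileGroupId == null) {
          return Collections.singletonList(bucket.fileGroupId);  // normal path
        }
        return Arrays.asList(bucket.fileGroupId,                 // old bucket
                             bucket.pendingTargetFileGroupId);   // new bucket
      }
    }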