+1, looking forward to the RFC.

Best,
Gary

On Sun, Dec 12, 2021 at 7:12 PM leesf <[email protected]> wrote:

> +1 for the improvement to make the bucket index more comprehensive, and
> looking forward to the RFC for more details.
>
> Yuwei Xiao <[email protected]> wrote on Fri, Dec 10, 2021 at 16:22:
>
> > Dear Hudi Community,
> >
> > I would like to propose Consistent Hashing Indexing to enable a dynamic
> > bucket number, saving Hudi users from hyper-parameter tuning.
> >
> > Currently, the Bucket Index is landing [1]. It is an effective index
> > approach to address the performance issue during Upsert. In my local
> > setup I observed a ~3x Upsert throughput improvement compared to the
> > Bloom Filter approach. However, it requires pre-configuring a bucket
> > number when creating the table. As described in [1], this imposes two
> > limitations:
> >
> > - Due to the one-to-one mapping between buckets and file groups, the
> > size of a single file group may grow indefinitely. Services like
> > compaction will take longer because of the larger read/write
> > amplification.
> >
> > - Data skew may arise from an imbalanced data distribution, resulting
> > in long-tail reads/writes.
> >
> > Based on the above observations, supporting a dynamic bucket number is
> > necessary, especially for rapidly changing Hudi tables. Looking at the
> > market, Consistent Hashing has already been adopted in DB systems
> > [2][3]. Its main idea is to turn the "key->bucket" mapping into
> > "key->hash_value->(range mapping)->bucket", constraining the re-hashing
> > process to touch only a few local buckets (e.g., only large file
> > groups) rather than reshuffling the whole hash table.
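> >
> > To make the idea concrete, here is a minimal, hypothetical Java sketch
> > (illustrative only, not the actual Hudi implementation) of the
> > "key -> hash_value -> (range mapping) -> bucket" lookup, where splitting
> > a bucket only adds one boundary to its own range and leaves every other
> > bucket untouched:
> >
> > import java.util.TreeMap;
> > import java.util.UUID;
> >
> > // Hypothetical sketch only: class and method names are made up.
> > public class ConsistentHashingSketch {
> >
> >   // hash range upper bound (inclusive) -> bucket (file group) id
> >   private final TreeMap<Integer, String> ring = new TreeMap<>();
> >
> >   public ConsistentHashingSketch(int initialBuckets) {
> >     long step = (1L << 32) / initialBuckets;
> >     for (int i = 1; i <= initialBuckets; i++) {
> >       long upper = Integer.MIN_VALUE + step * i - 1;
> >       ring.put(i == initialBuckets ? Integer.MAX_VALUE : (int) upper,
> >           UUID.randomUUID().toString());
> >     }
> >   }
> >
> >   // key -> hash_value -> (range mapping) -> bucket
> >   public String bucketFor(String recordKey) {
> >     int hash = recordKey.hashCode(); // placeholder hash function
> >     return ring.ceilingEntry(hash).getValue();
> >   }
> >
> >   // Split the bucket owning `hash` at the middle of its range; only
> >   // records in the lower half of that single range are re-mapped.
> >   public void split(int hash) {
> >     int upper = ring.ceilingKey(hash);
> >     Integer prev = ring.lowerKey(upper);
> >     long lower = (prev == null) ? Integer.MIN_VALUE : prev + 1L;
> >     int mid = (int) ((lower + upper) / 2);
> >     ring.put(mid, UUID.randomUUID().toString()); // new bucket: lower half
> >   }
> > }
> >
> > In this sketch the TreeMap plays the role of the persisted range mapping
> > metadata; a lookup is O(log n) and a split touches exactly one file
> > group.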
> >
> > In order to introduce Consistent Hashing to Hudi, we need to consider the
> > following issues:
> >
> > - Storing hashing metadata, such as the range mapping info. Metadata
> > size and concurrent updates to the metadata should also be considered.
> >
> > - Splitting & merging criteria. We need to design one (or several)
> > policies to decide when and how to split & merge buckets. A simple
> > policy would be splitting the bucket in the middle of its range once
> > the file group reaches a size threshold (see the sketch after this
> > list).
> >
> > - Supporting concurrent write & read. The splitting or merging must not
> > block concurrent writers & readers, and the whole process should be
> > fast enough (e.g., one bucket at a time) to minimize the impact on
> > other operations.
> >
> > - Integrating the splitting & merging process into the existing Hudi
> > table service pipelines.
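> >
> > As a strawman for the size-based policy mentioned above, the scheduling
> > step could look roughly like the hypothetical sketch below
> > (FileGroupStat and SplitPlan are made-up names, not existing Hudi
> > classes):
> >
> > import java.util.Optional;
> >
> > // Hypothetical sketch of a size-triggered split policy.
> > public class SizeBasedSplitPolicy {
> >
> >   private final long maxFileGroupSizeBytes;
> >
> >   public SizeBasedSplitPolicy(long maxFileGroupSizeBytes) {
> >     this.maxFileGroupSizeBytes = maxFileGroupSizeBytes;
> >   }
> >
> >   // Returns a plan that splits the bucket's hash range in the middle
> >   // once its file group exceeds the size threshold.
> >   public Optional<SplitPlan> schedule(FileGroupStat stat) {
> >     if (stat.sizeBytes <= maxFileGroupSizeBytes) {
> >       return Optional.empty(); // below threshold, nothing to schedule
> >     }
> >     long mid = stat.rangeStart + (stat.rangeEnd - stat.rangeStart) / 2;
> >     return Optional.of(new SplitPlan(stat.bucketId, mid));
> >   }
> >
> >   public static class FileGroupStat {
> >     final String bucketId;
> >     final long sizeBytes;
> >     final long rangeStart; // inclusive hash range owned by the bucket
> >     final long rangeEnd;   // inclusive
> >
> >     public FileGroupStat(String bucketId, long sizeBytes,
> >         long rangeStart, long rangeEnd) {
> >       this.bucketId = bucketId;
> >       this.sizeBytes = sizeBytes;
> >       this.rangeStart = rangeStart;
> >       this.rangeEnd = rangeEnd;
> >     }
> >   }
> >
> >   public static class SplitPlan {
> >     final String bucketId;
> >     final long splitPoint; // new boundary; lower half goes to a new bucket
> >
> >     public SplitPlan(String bucketId, long splitPoint) {
> >       this.bucketId = bucketId;
> >       this.splitPoint = splitPoint;
> >     }
> >   }
> > }
> >
> > Other policies (e.g., merging small neighbouring buckets) could plug
> > into the same interface.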
> >
> > I have sketched a prototype design to address the above problems:
> >
> > - Maintain hashing metadata for each partition (persisted as files),
> > and use instants to manage multi-versioning and concurrent updates of
> > it.
> >
> > - A flexible framework will be implemented for different pluggable
> > policies. The splitting (or merging) plan, specifying which buckets to
> > split (or merge) and how, will be generated during scheduling (just as
> > compaction plans are).
> >
> > - Dual-write will be activated once the writer observes an ongoing
> > splitting (or merging) process, upserting records as log files into
> > both the old and new buckets (file groups). Readers can see the records
> > once the writer completes, regardless of the splitting process (see the
> > sketch after this list).
> >
> > - The splitting & merging could be integrated as a sub-task into the
> > Clustering service, because we can view them as a special case of
> > Clustering's goal (i.e., managing file groups based on file size).
> > Though we need to modify Clustering to handle log files, the bucket
> > index enhances Clustering by allowing concurrent updates.
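> >
> > For illustration, the dual-write routing could look roughly like the
> > hypothetical sketch below (BucketMetadata and appendAsLogFile are
> > placeholders, not the actual Hudi writer API):
> >
> > import java.util.Optional;
> >
> > // Hypothetical dual-write sketch, not the actual implementation.
> > public class DualWriteRouter {
> >
> >   // Minimal metadata view assumed by this sketch.
> >   public interface BucketMetadata {
> >     String bucketFor(String recordKey);
> >     // Empty when no split/merge touching this key is in flight.
> >     Optional<String> pendingBucketFor(String recordKey);
> >   }
> >
> >   private final BucketMetadata metadata;
> >
> >   public DualWriteRouter(BucketMetadata metadata) {
> >     this.metadata = metadata;
> >   }
> >
> >   // Upsert one record; if its bucket is being split (or merged), also
> >   // write it into the pending target bucket as a log file.
> >   public void upsert(String recordKey, byte[] payload) {
> >     appendAsLogFile(metadata.bucketFor(recordKey), recordKey, payload);
> >     metadata.pendingBucketFor(recordKey)
> >         .ifPresent(b -> appendAsLogFile(b, recordKey, payload));
> >   }
> >
> >   private void appendAsLogFile(String bucketId, String recordKey,
> >       byte[] payload) {
> >     // Placeholder: append the record to the bucket's log file.
> >     System.out.printf("append %s to bucket %s (%d bytes)%n",
> >         recordKey, bucketId, payload.length);
> >   }
> > }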
> >
> >
> > I would love to hear your thoughts and any feedback on the proposal. I
> > can draft an RFC with a detailed design once we reach an agreement.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/HUDI/RFC+-+29%3A+Hash+Index
> >
> > [2] YugabyteDB
> > https://docs.yugabyte.com/latest/architecture/docdb-sharding/sharding/#example
> >
> > [3] PolarDB-X
> > https://help.aliyun.com/document_detail/316603.html#title-y5n-2i1-5ws
> >
> >
> >
> > Best,
> >
> > Yuwei Xiao
> >
>
