GitHub user nsivabalan added a comment to the discussion: Dynamic Bucket Index For Flink streaming
Thanks @cshuo for the detailed writeup — the problem statement is clear and the
motivation around limitations of existing bucket indexes is well articulated.
I had a question about the design choice that I'd like to understand better.
The core of this proposal is: use partitioned RLI (record-level index) as the persistent key →
bucket mapping, lazily load it into an in-memory cache, and look up every key
against that cache for routing. The bucket assignment is immutable once
written.
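
To make sure I'm reading the routing path correctly, here is a minimal sketch of how I understand it. This is not Hudi code: `LazyBucketRouter`, `load_partition`, and `pick_bucket` are hypothetical names, and the loader simply stands in for a partitioned RLI read.

```python
from typing import Callable, Dict

# Hypothetical loader: given a partition path, return its persisted
# key -> bucket mappings (a stand-in for reading partitioned RLI).
PartitionLoader = Callable[[str], Dict[str, str]]


class LazyBucketRouter:
    """Illustrative only; names and structure are assumptions, not Hudi APIs."""

    def __init__(self, load_partition: PartitionLoader):
        self._load_partition = load_partition
        self._cache: Dict[str, Dict[str, str]] = {}
        self.loads = 0  # how many times the persistent index was read

    def _mappings(self, partition: str) -> Dict[str, str]:
        # Lazy bootstrap: read a partition's mappings only on first touch.
        if partition not in self._cache:
            self._cache[partition] = dict(self._load_partition(partition))
            self.loads += 1
        return self._cache[partition]

    def route(self, partition: str, key: str,
              pick_bucket: Callable[[], str]) -> str:
        mappings = self._mappings(partition)
        bucket = mappings.get(key)
        if bucket is None:
            # New key: assign once. The mapping is never changed afterwards.
            bucket = pick_bucket()
            mappings[key] = bucket
        return bucket
```

In this sketch a known key always returns its stored bucket, and `pick_bucket` is only consulted for keys the index has not seen.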
But if we're already paying the cost of maintaining partitioned RLI and doing
per-key lookups against it, I'm wondering — what does the bucket index
abstraction add on top of just using partitioned RLI directly?
Consider the standard write path with partitioned RLI (no bucket index):
- Key lookup: RLI tells you which file group a key belongs to → route the
record there. Same as this proposal.
- Small file handling: The existing BucketAssigner / WriteProfile
infrastructure already profiles file sizes, routes new inserts to small files
first, and creates new file groups only when existing ones are full. This is
essentially the same "select a non-full bucket, create a new one if all are
full" logic described here.
- Lazy bootstrap + cache eviction: Same approach would apply — load a
partition's RLI mappings on demand, evict when idle.
The main difference I see is that this proposal makes bucket assignment
immutable forever, which is presented as a benefit (no data relocation). But
this also means:
- You can never rebalance skewed file groups
- Clustering cannot freely reorganize file layout — it's constrained by the
fixed key-to-bucket mapping
- If early assignments turn out suboptimal, you're stuck with them
With plain partitioned RLI, clustering can merge small file groups, split
large ones, re-sort data — and simply update the RLI. The layout remains fully
optimizable over time, which seems strictly more flexible.
I'd also like to flag the workload profile assumption here. The combination of
lazy bootstrap and partition-granularity cache eviction works well for fact table workloads
where only recent partitions are actively written to — older partitions go
cold, their caches get evicted, and memory stays bounded. But for dimension
table workloads, where updates arrive across all partitions randomly and
continuously, most partitions stay hot. In that scenario, the cache effectively
needs to hold key → bucket mappings for the entire table in memory, and
partition-level eviction provides little relief. How would this design handle
such workloads without running into memory pressure?
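
To illustrate the concern, here is a toy simulation of partition-level idle eviction. Everything in it is an assumption made for the sake of the example: idleness is measured in number of writes rather than wall-clock time, every partition holds the same number of keys, and the dimension-style workload is modeled as round-robin rather than truly random updates.

```python
from collections import OrderedDict
from typing import Iterable


def peak_cached_keys(writes: Iterable[str], keys_per_partition: int,
                     max_idle_writes: int) -> int:
    """Peak number of cached key -> bucket entries for a write sequence.

    A partition is evicted once more than max_idle_writes writes have
    passed since it was last touched.
    """
    last_touch: "OrderedDict[str, int]" = OrderedDict()  # least recent first
    peak = 0
    for i, partition in enumerate(writes):
        last_touch[partition] = i
        last_touch.move_to_end(partition)
        # Evict from the least recently touched end while entries are idle.
        while last_touch:
            oldest, touched = next(iter(last_touch.items()))
            if i - touched > max_idle_writes:
                last_touch.popitem(last=False)
            else:
                break
        peak = max(peak, len(last_touch) * keys_per_partition)
    return peak


# Fact-style: 100 partitions written one after another, 50 writes each.
fact_writes = [f"p{d}" for d in range(100) for _ in range(50)]
# Dimension-style: the same 5000 writes spread round-robin over all partitions.
dim_writes = [f"p{i % 100}" for i in range(5000)]

fact_peak = peak_cached_keys(fact_writes, keys_per_partition=1000,
                             max_idle_writes=200)
dim_peak = peak_cached_keys(dim_writes, keys_per_partition=1000,
                            max_idle_writes=200)
```

With these made-up numbers the fact-style sequence peaks at 5 resident partitions (5,000 entries), while the round-robin sequence never evicts anything and holds all 100 partitions (100,000 entries). The absolute values mean nothing; the point is that eviction only bounds memory when the write pattern lets partitions go idle.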
So the question is: is there a specific capability or property that the
bucket index framing provides, beyond what partitioned RLI with the existing
small file handling already gives us? If the answer is primarily the file
naming convention and compatibility with bucket index readers, that might not
justify the immutability constraint and the workload limitations. Would love to
hear your thoughts.
GitHub link:
https://github.com/apache/hudi/discussions/18514#discussioncomment-16734618