Re: [D] Dynamic Bucket Index For Flink streaming [hudi]

via GitHub Mon, 27 Apr 2026 14:43:06 -0700


GitHub user nsivabalan added a comment to the discussion: Dynamic Bucket Index 
For Flink streaming


Thanks @cshuo for the detailed writeup — the problem statement is clear and the 
motivation around limitations of existing bucket indexes is well articulated.   
                                                                                
  I had a question about the design choice that I'd like to understand better.  
                                                                          
  The core of this proposal is: use partitioned RLI as the persistent key → 
bucket mapping, lazily load it into an in-memory cache, and look up every key 
against that cache for routing. The bucket assignment is immutable once 
written.                                                                        
                                                                                
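To make sure I am reading the proposal correctly, here is a minimal sketch of the routing flow as I understand it. This is not code from Hudi or from the proposal: `LazyBucketRouter`, `assign_new`, and the nested dict standing in for the partitioned RLI are all hypothetical names, and the sketch persists a new mapping at routing time purely for brevity, whereas a real writer would presumably do that at commit.

```python
from typing import Callable, Dict

# Hypothetical stand-in for the partitioned RLI: partition -> {record key -> bucket id}.
PersistentIndex = Dict[str, Dict[str, int]]


class LazyBucketRouter:
    """Routes keys via an in-memory cache that is filled one partition at a time."""

    def __init__(self, persistent_index: PersistentIndex,
                 assign_new: Callable[[str, str], int]) -> None:
        self._persistent = persistent_index
        self._assign_new = assign_new      # policy for keys seen for the first time
        self._cache: Dict[str, Dict[str, int]] = {}
        self.partition_loads = 0           # counts lazy bootstraps, for illustration

    def route(self, partition: str, key: str) -> int:
        mapping = self._cache.get(partition)
        if mapping is None:
            # Lazy bootstrap: copy this partition's mappings on first access only.
            mapping = dict(self._persistent.get(partition, {}))
            self._cache[partition] = mapping
            self.partition_loads += 1
        if key not in mapping:
            # First write of this key: assign once and record it; never reassigned.
            bucket = self._assign_new(partition, key)
            mapping[key] = bucket
            self._persistent.setdefault(partition, {})[key] = bucket
        return mapping[key]


index = {"2026-04-27": {"k1": 0, "k2": 1}}
router = LazyBucketRouter(index, assign_new=lambda partition, key: 2)
known = router.route("2026-04-27", "k1")   # already mapped in the persistent index
fresh = router.route("2026-04-27", "k9")   # new key, assigned by the policy
again = router.route("2026-04-27", "k9")   # same bucket on every later lookup
```

Every record goes through `route`, so the per-key lookup cost is the same whether the value is called a bucket id or a file group id.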
                                                                          
  But if we're already paying the cost of maintaining partitioned RLI and doing 
per-key lookups against it, I'm wondering — what does the bucket index 
abstraction add on top of just using partitioned RLI directly?
                                                                                
  Consider the standard write path with partitioned RLI (no bucket index):      
                                                                          
  - Key lookup: RLI tells you which file group a key belongs to → route the 
record there. Same as this proposal.                                          
  - Small file handling: The existing BucketAssigner / WriteProfile 
infrastructure already profiles file sizes, routes new inserts to small files 
first, and creates new file groups only when existing ones are full. This is 
essentially the same "select a non-full bucket, create a new one if all are 
full" logic described here.
  - Lazy bootstrap + cache eviction: Same approach would apply — load a 
partition's RLI mappings on demand, evict when idle.                            
  
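The small file handling in the second point can be sketched roughly as follows. This is an illustration of the "fill small files first, create a new file group only when none has room" policy, not the actual BucketAssigner / WriteProfile code; `FileGroup`, `pick_file_group`, and the byte sizes are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileGroup:
    file_id: str
    size_bytes: int


def pick_file_group(groups: List[FileGroup], record_bytes: int,
                    max_bytes: int) -> FileGroup:
    """Return the smallest group with room for the record, or append a new one."""
    candidates = [g for g in groups if g.size_bytes + record_bytes <= max_bytes]
    if candidates:
        # Prefer the smallest file group so small files fill up first.
        target = min(candidates, key=lambda g: g.size_bytes)
    else:
        # Every existing group is full: open a new one.
        target = FileGroup(file_id=f"fg-{len(groups)}", size_bytes=0)
        groups.append(target)
    target.size_bytes += record_bytes
    return target


groups = [FileGroup("fg-0", 90), FileGroup("fg-1", 40)]
first = pick_file_group(groups, record_bytes=50, max_bytes=100)   # fg-1 has room
second = pick_file_group(groups, record_bytes=50, max_bytes=100)  # no group has room
```

Swapping "file group" for "bucket" in this sketch gives the selection logic described in the proposal, which is why the two look equivalent to me.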
                                                                                
  The main difference I see is that this proposal makes bucket assignment 
immutable forever, which is presented as a benefit (no data relocation). But 
this also means:                                                                
                                                                        
  - You can never rebalance skewed file groups                                  
  - Clustering cannot freely reorganize file layout — it's constrained by the 
fixed key-to-bucket mapping
  - If early assignments turn out suboptimal, you're stuck with them            
                                                                          
  With plain partitioned RLI, clustering can merge small file groups, split 
large ones, re-sort data — and simply update the RLI. The layout remains fully 
optimizable over time, which seems strictly more flexible.                      
             
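As a rough illustration of "simply update the RLI" after clustering merges file groups, assuming the RLI can be modeled as a key → file group map (`merge_file_groups` and the file group ids are hypothetical, and a real clustering commit would also rewrite the data files):

```python
from typing import Dict, Iterable


def merge_file_groups(rli: Dict[str, str], sources: Iterable[str], target: str) -> int:
    """Repoint every key stored in one of `sources` to `target`; return how many moved."""
    source_set = set(sources)
    moved = 0
    for key, file_group in rli.items():
        if file_group in source_set:
            rli[key] = target   # only values change, so iterating the dict is safe
            moved += 1
    return moved


rli = {"k1": "fg-0", "k2": "fg-1", "k3": "fg-2", "k4": "fg-1"}
moved = merge_file_groups(rli, sources=["fg-0", "fg-1"], target="fg-3")
```

With an immutable key-to-bucket mapping, this remapping step is exactly what is ruled out.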

I'd also like to flag the workload profile assumption here. The lazy bootstrap 
+ partition-granularity cache eviction works well for fact table workloads 
where only recent partitions are actively written to — older partitions go 
cold, their caches get evicted, and memory stays bounded. But for dimension 
table workloads, where updates arrive across all partitions randomly and 
continuously, most partitions stay hot. In that scenario, the cache effectively 
needs to hold key → bucket mappings for the entire table in memory, and 
partition-level eviction provides little relief. How would this design handle 
such workloads without running into memory pressure?

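A toy replay makes the memory concern concrete. Everything here is an assumption chosen for illustration: 100 partitions, 1,000 keys per partition, an idle timeout of 200 ticks, and a round-robin pattern standing in for "updates across all partitions". `peak_cached_keys` is a hypothetical helper, not part of the proposal.

```python
from typing import Dict, Iterable, Tuple


def peak_cached_keys(writes: Iterable[Tuple[int, str]], keys_per_partition: int,
                     idle_ticks: int) -> int:
    """Replay (tick, partition) writes and return the peak number of cached key mappings."""
    last_seen: Dict[str, int] = {}
    peak = 0
    for tick, partition in writes:
        last_seen[partition] = tick
        # Partition-level eviction: drop partitions idle for more than `idle_ticks`.
        for idle in [p for p, t in last_seen.items() if tick - t > idle_ticks]:
            del last_seen[idle]
        peak = max(peak, len(last_seen) * keys_per_partition)
    return peak


partitions = [f"p{i}" for i in range(100)]
# Fact-style: each partition receives 10 consecutive writes, then goes cold.
fact = [(t, partitions[t // 10]) for t in range(1000)]
# Dimension-style: writes cycle through all partitions, so none stays idle for long.
dimension = [(t, partitions[t % 100]) for t in range(1000)]

fact_peak = peak_cached_keys(fact, keys_per_partition=1_000, idle_ticks=200)
dimension_peak = peak_cached_keys(dimension, keys_per_partition=1_000, idle_ticks=200)
```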
  So the question is: is there a specific capability or property that the 
bucket index framing provides, beyond what partitioned RLI with the existing 
small file handling already gives us? If the answer is primarily the file 
naming convention and compatibility with bucket index readers, that might not 
justify the immutability constraint and the workload limitations. Would love to 
hear your thoughts. 

GitHub link: 
https://github.com/apache/hudi/discussions/18514#discussioncomment-16734618

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
