GitHub user danny0405 added a comment to the discussion: Dynamic Bucket Index 
For Flink streaming

We had more discussions offline, and the initial idea may not work well 
because of hash code conflicts: when two keys share a hash code, a matching 
hash code alone cannot tell us whether the key really exists. So we have 
decided to use the partitioned RLI directly, but with the bucket-index-style 
file group id format, so that the local record key -> location mappings can 
be smaller (a short numeric value can represent the local bucket within each 
partition).
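
To make the two points above concrete, here is a minimal sketch in Python. It is 
not Hudi code: `toy_hash`, `PartitionedKeyIndex` and `file_group_prefix` are 
hypothetical names, the hash is deliberately weak so that a conflict is easy to 
reproduce, and the 8-digit zero-padded prefix is only an assumed stand-in for a 
bucket-index-style file group id.

```python
from typing import Dict, Optional, Set


def toy_hash(key: str, buckets: int = 8) -> int:
    """Deliberately weak hash (byte sum) so that conflicts are easy to show."""
    return sum(key.encode("utf-8")) % buckets


class PartitionedKeyIndex:
    """Hypothetical per-partition mapping: record key -> short bucket number."""

    def __init__(self) -> None:
        self._by_partition: Dict[str, Dict[str, int]] = {}

    def put(self, partition: str, key: str, bucket: int) -> None:
        self._by_partition.setdefault(partition, {})[key] = bucket

    def lookup(self, partition: str, key: str) -> Optional[int]:
        # The full key is compared, so a hash conflict cannot produce a false hit.
        return self._by_partition.get(partition, {}).get(key)


def file_group_prefix(bucket: int) -> str:
    """Assumed format: render the bucket number as a zero-padded numeric prefix."""
    return f"{bucket:08d}"


# A hash-only index stores hash codes, not keys.
seen_hashes: Set[int] = {toy_hash("ab")}
# "ba" was never written, yet its hash code matches the one stored for "ab".
false_hit = toy_hash("ba") in seen_hashes

index = PartitionedKeyIndex()
index.put("2024/01/01", "ab", toy_hash("ab"))
real_miss = index.lookup("2024/01/01", "ba")  # None: the key does not exist
```

The mapping only needs to store a small integer per key, because the partition 
is already known at lookup time and the integer is enough to derive the file 
group within that partition.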

The partitioned RLI would be loaded on demand and evicted when no longer used 
(that is, once it falls outside the current checkpoint cycle and no usage is 
detected).
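
One possible reading of this load-and-evict policy, sketched in Python under 
stated assumptions: `PartitionIndexCache`, its `loader` callback and 
`on_checkpoint` are hypothetical names, and "no usage detected" is interpreted 
here as "not accessed during the checkpoint cycle that just ended". The actual 
design may differ.

```python
from typing import Callable, Dict


class PartitionIndexCache:
    """Hypothetical cache of per-partition key -> bucket mappings."""

    def __init__(self, loader: Callable[[str], Dict[str, int]]) -> None:
        self._loader = loader                 # reads one partition's index
        self._loaded: Dict[str, Dict[str, int]] = {}
        self._last_used: Dict[str, int] = {}  # partition -> cycle of last access
        self._cycle = 0                       # current checkpoint cycle

    def get(self, partition: str) -> Dict[str, int]:
        # Load on demand: only the first access in a residency pays the cost.
        if partition not in self._loaded:
            self._loaded[partition] = self._loader(partition)
        self._last_used[partition] = self._cycle
        return self._loaded[partition]

    def on_checkpoint(self) -> None:
        # Evict partitions that were not touched in the cycle that just ended.
        stale = [p for p, c in self._last_used.items() if c < self._cycle]
        for partition in stale:
            del self._loaded[partition]
            del self._last_used[partition]
        self._cycle += 1

    def is_loaded(self, partition: str) -> bool:
        return partition in self._loaded
```

A partition that keeps receiving records stays resident across checkpoints, 
while one that goes idle for a full cycle is dropped and reloaded only if it is 
needed again.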

cc @cshuo to update with the latest ideas.

GitHub link: 
https://github.com/apache/hudi/discussions/18514#discussioncomment-16736087
