GitHub user danny0405 added a comment to the discussion: Dynamic Bucket Index For Flink streaming
We actually had more discussions offline, and the initial idea may not work well because of hash code conflicts: when two record keys share a hash code, a matching hash does not tell us whether the key really exists. So we are planning to use the partitioned record level index (RLI) directly, but with the bucket index style file group id format, so that the local record key -> location mappings can be smaller (a short numeric value can represent the local bucket within each partition). The partitioned RLI would be loaded on demand and evicted when no longer used, i.e. once the current checkpoint cycle has passed and no usage is detected.

cc @cshuo to update with the latest ideas.

GitHub link: https://github.com/apache/hudi/discussions/18514#discussioncomment-16736087
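A minimal Java sketch of the lookup path described above, under stated assumptions. The class `PartitionedKeyIndex`, the `loader` function (standing in for reading one partition of the RLI), and the `onCheckpointComplete` hook are all hypothetical and are not Hudi or Flink APIs. It does not reproduce Hudi's actual file group id format. It only shows the three ideas in the comment: exact record keys are kept, so a hit cannot be a hash collision; each key maps to a short per-partition bucket id; and partitions are loaded on demand and dropped after a checkpoint cycle in which they went unused.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

// Hypothetical per-partition record key -> local bucket id cache (not a Hudi API).
final class PartitionedKeyIndex {
    // Assumed loader: returns the full key -> bucket id mapping of one partition.
    private final Function<String, Map<String, Short>> loader;
    // partition path -> (record key -> short local bucket id)
    private final Map<String, Map<String, Short>> cache = new HashMap<>();
    // Partitions accessed since the last completed checkpoint.
    private final Set<String> usedSinceCheckpoint = new HashSet<>();

    PartitionedKeyIndex(Function<String, Map<String, Short>> loader) {
        this.loader = loader;
    }

    // Loads a partition lazily on first access; copies so the cached map is mutable.
    private Map<String, Short> partition(String partitionPath) {
        usedSinceCheckpoint.add(partitionPath);
        return cache.computeIfAbsent(partitionPath, p -> new HashMap<>(loader.apply(p)));
    }

    // Exact key comparison: unlike a hash-only index, a hit means the key really exists.
    Optional<Short> lookup(String partitionPath, String recordKey) {
        return Optional.ofNullable(partition(partitionPath).get(recordKey));
    }

    // Records a new key -> bucket assignment in memory only (persistence is out of scope).
    void assign(String partitionPath, String recordKey, short bucketId) {
        partition(partitionPath).put(recordKey, bucketId);
    }

    // Assumed to be called when a checkpoint completes: evicts partitions
    // that were not accessed during the cycle that just finished.
    void onCheckpointComplete() {
        cache.keySet().retainAll(usedSinceCheckpoint);
        usedSinceCheckpoint.clear();
    }

    boolean isLoaded(String partitionPath) {
        return cache.containsKey(partitionPath);
    }
}
```

Storing a `short` per key instead of a full file group id is what keeps the local mapping small; the trade-off is that the bucket id is only meaningful together with its partition, which is why the cache is keyed by partition first.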
