GitHub user danny0405 added a comment to the discussion: Dynamic Bucket Index For Flink streaming
Overall looks good. Can you clarify these items:

1. The small-file profile for assigning new keys to existing buckets. There are two candidate metrics: the row count and the file size (of the file group / base file). Let's decide which one we want here, and we also need a way to calculate or estimate the values.
2. Reading the partitioned record-level index (RLI) for a specific partition: is there any read amplification? For example, are a partition's index mappings scattered across multiple buckets, or stored together with other partitions within one RLI bucket?

GitHub link: https://github.com/apache/hudi/discussions/18514#discussioncomment-16629755
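To make item 1 concrete, here is a minimal sketch of how either metric could drive the assignment of new keys to small existing buckets. It assumes a hypothetical per-bucket profile (base file size plus an optional exact row count) and converts both metrics into a common "remaining record capacity" so they can be compared. The names `BucketProfile`, `free_capacity`, `assign_new_keys`, `max_file_bytes`, `max_rows`, and `avg_record_bytes` are all hypothetical illustrations, not Hudi or Flink APIs, and the greedy fill order is only one possible policy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BucketProfile:
    """Hypothetical per-bucket stats for one partition."""
    bucket_id: int
    base_file_bytes: int             # size of the latest base file in the file group
    row_count: Optional[int] = None  # exact row count, if it is cheaply available


def free_capacity(p: BucketProfile, metric: str, max_file_bytes: int,
                  max_rows: int, avg_record_bytes: int) -> int:
    """Estimate how many more records bucket `p` can absorb (hypothetical policy)."""
    if metric == "size":
        # Size-based: translate the remaining bytes into records via an average record size.
        return max(0, (max_file_bytes - p.base_file_bytes) // avg_record_bytes)
    if metric == "rows":
        # Row-based: fall back to a size-derived estimate when the exact count is unknown.
        rows = p.row_count if p.row_count is not None else p.base_file_bytes // avg_record_bytes
        return max(0, max_rows - rows)
    raise ValueError(f"unknown metric: {metric}")


def assign_new_keys(num_new_keys: int, profiles: List[BucketProfile], metric: str,
                    max_file_bytes: int, max_rows: int,
                    avg_record_bytes: int) -> Dict[int, int]:
    """Greedily top up small buckets; leftover keys go to id -1, meaning 'new bucket'."""
    assignment: Dict[int, int] = {}
    remaining = num_new_keys
    # Visit the smallest base files first so small files are filled before larger ones.
    for p in sorted(profiles, key=lambda b: b.base_file_bytes):
        if remaining == 0:
            break
        take = min(remaining,
                   free_capacity(p, metric, max_file_bytes, max_rows, avg_record_bytes))
        if take > 0:
            assignment[p.bucket_id] = take
            remaining -= take
    if remaining > 0:
        assignment[-1] = remaining
    return assignment
```

With `profiles = [BucketProfile(0, 100_000_000), BucketProfile(1, 20_000_000, row_count=150_000)]`, `max_file_bytes=120_000_000`, `max_rows=1_000_000` and `avg_record_bytes=100`, assigning 1,500,000 new keys yields `{1: 1_000_000, 0: 200_000, -1: 300_000}` under the size metric but `{1: 850_000, -1: 650_000}` under the row metric. The two answers diverge whenever the assumed average record size does not match the real data, which is why picking one metric and a reliable way to estimate it matters.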
