Re: [PR] feat(flink): Support data skipping based on partitioned RLI [hudi]

via GitHub Wed, 17 Jun 2026 19:00:14 -0700


danny0405 commented on code in PR #19006:
URL: https://github.com/apache/hudi/pull/19006#discussion_r3432540865



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##########
@@ -511,6 +511,15 @@ public class FlinkOptions extends HoodieConfig {
           + "E.g., given query: SELECT * FROM T WHERE `uuid` IN (1,2,3,4,5,6,7,8,9), the number of hoodie keys is 9, and\n"
           + "the maximum value is 8, so the source will not perform record level index filtering.");
 
+  @AdvancedConfig
+  public static final ConfigOption<Integer> READ_DATA_SKIPPING_RLI_PARTITIONS_MAX_NUM = ConfigOptions
+      .key("read.data.skipping.rli.partitions.max.num")
+      .intType()
+      .defaultValue(3)
+      .withDescription("The maximum number of candidate data table partitions that can be queried through the partitioned record level index "

Review Comment:
   > When a query doesn't filter on the partition column, the candidate set can
   > span a large number of data partitions, and we'd fan out an RLI lookup to each
   > one, which adds extra cost.
   
   Why do we even support the query optimization in this scenario? You cannot
   assume the table has only 3 partitions.
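
For reference, a minimal sketch of the kind of guard this option seems to describe, assuming it mirrors the hoodie-key limit quoted in the description above (the lookup is skipped once the count exceeds the maximum). `RliFanoutGuard` and `shouldUseRli` are hypothetical names used only for illustration; they are not Hudi APIs and not the PR's actual implementation.

```java
import java.util.List;

// Hypothetical helper, not part of Hudi: illustrates the threshold check only.
final class RliFanoutGuard {
  private RliFanoutGuard() {
  }

  // Returns true when the number of candidate data partitions is within the cap,
  // i.e. an RLI lookup per partition is still considered cheap enough.
  // Assumption: exceeding the cap disables the optimization entirely.
  static boolean shouldUseRli(List<String> candidatePartitions, int maxPartitions) {
    return candidatePartitions.size() <= maxPartitions;
  }
}
```

With a cap of 3, a query that resolves to two candidate partitions would still use the index, while one that resolves to four would fall back to a normal scan, which is the case the question above is about.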



