Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18017 )
Change subject: IMPALA-10910, IMPALA-5509: Runtime filter: dictionary filter support
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18017/2/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18017/2/be/src/exec/parquet/hdfs-parquet-scanner.cc@1905
PS2, Line 1905: ROW_GROUPS_KEY
I think this leads to misleading stats - if there is a single row group with 100 dictionary entries and all pass, then the profile will look like this:
- RowGroups processed: 100
- RowGroups rejected: 100
- RowGroups total: 100
suggesting that there were 100 row groups.

I would prefer to increase the counters only once per row group. Another solution would be to add a new category like FilterStats::DICT_ENTRIES_KEY.


http://gerrit.cloudera.org:8080/#/c/18017/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-dictionary-runtime-filter.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-dictionary-runtime-filter.test:

http://gerrit.cloudera.org:8080/#/c/18017/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-dictionary-runtime-filter.test@23
PS2, Line 23: aggregation(SUM, RowGroups rejected): 1
Does this test break without your BE changes? My concern is that the min-max runtime filter should also be enough to filter this row group - as b.col_2's min and max values will both be "a", we can exclude the second file, where min and max are both "b".

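To make the suggestion for line 1905 concrete, here is a minimal sketch of updating the counters once per row group rather than once per dictionary entry. RowGroupCounters and EvalDictionaryOnce are hypothetical names used only for illustration; they are not Impala's FilterStats API, and the sketch assumes a row group is rejected when no dictionary entry passes the filter.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-filter row group counters in the profile.
struct RowGroupCounters {
  int64_t total = 0;
  int64_t processed = 0;
  int64_t rejected = 0;
};

// Evaluates a filter against the dictionary entries of ONE row group and
// updates the counters exactly once, regardless of the dictionary size.
// Returns true if the row group can be skipped (no entry passes the filter).
inline bool EvalDictionaryOnce(const std::vector<std::string>& dict_entries,
    const std::function<bool(const std::string&)>& filter_passes,
    RowGroupCounters* counters) {
  bool any_entry_passes = false;
  for (const std::string& entry : dict_entries) {
    if (filter_passes(entry)) {
      any_entry_passes = true;
      break;  // One passing entry is enough to keep the row group.
    }
  }
  // Counters move once per row group, not once per dictionary entry.
  ++counters->total;
  ++counters->processed;
  if (!any_entry_passes) ++counters->rejected;
  return !any_entry_passes;
}
```

With this shape, a single row group with 100 dictionary entries contributes 1 to each of the row group counters it touches, so the profile no longer suggests 100 row groups. Per-entry numbers, if wanted, would go into a separate category.
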
-- 
To view, visit http://gerrit.cloudera.org:8080/18017
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ida0ada8799774be34312eaa4be47336149f637c7
Gerrit-Change-Number: 18017
Gerrit-PatchSet: 2
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Nov 2021 12:05:18 +0000
Gerrit-HasComments: Yes