Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18017 )

Change subject: IMPALA-10910, IMPALA-5509: Runtime filter: dictionary filter support
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18017/2/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18017/2/be/src/exec/parquet/hdfs-parquet-scanner.cc@1905
PS2, Line 1905: ROW_GROUPS_KEY
I think that leads to misleading stats - if there is a single row group with 
100 dictionary entries and all pass, then the profile will look like this:
             - RowGroups processed: 100
             - RowGroups rejected: 100
             - RowGroups total: 100
suggesting that there were 100 row groups.

I would prefer to increase the counters only once per row group. Another 
solution would be to add a new category like FilterStats::DICT_ENTRIES_KEY.
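
To illustrate the "once per row group" option, here is a minimal sketch. It is not the actual scanner code: RowGroupCounters and EvalDictionaryForRowGroup are hypothetical names, and the per-entry filter results are assumed to be precomputed into a vector.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-filter profile counters;
// this is NOT Impala's real FilterStats class.
struct RowGroupCounters {
  int64_t total = 0;
  int64_t processed = 0;
  int64_t rejected = 0;
};

// Checks the dictionary entries of ONE row group against a runtime filter.
// 'entry_passes[i]' is assumed to hold whether dictionary entry i passes.
// Returns true if the row group can be skipped (no entry passes).
// The counters are bumped once per call, i.e. once per row group,
// no matter how many dictionary entries were inspected.
bool EvalDictionaryForRowGroup(const std::vector<bool>& entry_passes,
                               RowGroupCounters* counters) {
  assert(counters != nullptr);
  bool any_pass = false;
  for (bool passes : entry_passes) {
    if (passes) {
      any_pass = true;
      break;  // One passing entry is enough to keep the row group.
    }
  }
  ++counters->total;
  ++counters->processed;
  if (!any_pass) ++counters->rejected;
  return !any_pass;
}
```

With this shape, a single row group with 100 dictionary entries contributes 1 to "RowGroups total" and "RowGroups processed", and at most 1 to "RowGroups rejected".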


http://gerrit.cloudera.org:8080/#/c/18017/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-dictionary-runtime-filter.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-dictionary-runtime-filter.test:

http://gerrit.cloudera.org:8080/#/c/18017/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-dictionary-runtime-filter.test@23
PS2, Line 23: aggregation(SUM, RowGroups rejected): 1
Does this test break without your BE changes?
My concern is that the min-max runtime filter should also be enough to filter 
this row group: as b.col_2's min and max values will both be "a", we can 
exclude the second file, where min and max are both "b".
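
The min-max reasoning can be sketched as a plain range-overlap check. This is only an illustration: MinMax and RangesOverlap are hypothetical names, not Impala's filter API, and string comparison is assumed to be simple lexicographic order.

```cpp
#include <string>

// Hypothetical holder for a [min, max] range. It is used both for the
// runtime filter built from the join's build side and for the
// column statistics of a file or row group.
struct MinMax {
  std::string min;
  std::string max;
};

// Returns true if the two closed ranges share at least one value.
// If they do not overlap, the file or row group can be skipped
// without looking at its dictionary.
bool RangesOverlap(const MinMax& filter, const MinMax& stats) {
  return !(stats.max < filter.min || stats.min > filter.max);
}
```

With a filter range of ["a", "a"] and file statistics of ["b", "b"], RangesOverlap returns false, so the second file would already be excluded by min-max filtering alone.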



--
To view, visit http://gerrit.cloudera.org:8080/18017
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ida0ada8799774be34312eaa4be47336149f637c7
Gerrit-Change-Number: 18017
Gerrit-PatchSet: 2
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Nov 2021 12:05:18 +0000
Gerrit-HasComments: Yes
