Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18017 )

Change subject: IMPALA-10910, IMPALA-5509: Runtime filter: dictionary filter support
......................................................................


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/18017/5/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18017/5/be/src/exec/parquet/hdfs-parquet-scanner.cc@1891
PS5, Line 1891:   vector<bool> runtime_filters_processed(
              :       runtime_filters == nullptr ? 0 : runtime_filters->size(), false);
We'll always process the first N runtime filters, where N >= 0 && N < runtime_filters->size(). All runtime filters with idx >= N are unprocessed, so it's enough to store a single integer (N).


http://gerrit.cloudera.org:8080/#/c/18017/5/be/src/exec/parquet/hdfs-parquet-scanner.cc@1913
PS5, Line 1913:       // All runtime filter evaulation should fail to eliminate this row group.
To eliminate the row group, every dictionary element must fail either a conjunct or at least one runtime filter. So here, where we are dealing with a single dictionary element, a single runtime filter that doesn't contain the element is enough to reject it (the runtime filters are ANDed together).


http://gerrit.cloudera.org:8080/#/c/18017/5/be/src/exec/parquet/hdfs-parquet-scanner.cc@1916
PS5, Line 1916:           if (runtime_filters->at(rf_idx)->Eval(&row)) {
              :             column_has_match = true;
              :             break;
              :           }
We should break on the first filter that returns false.


http://gerrit.cloudera.org:8080/#/c/18017/5/be/src/exec/parquet/hdfs-parquet-scanner.cc@1920
PS5, Line 1920:         }
              :       }
After the for-loop we should check whether we processed all runtime_filters. If the last runtime filter also evaluated to true, then the column has a match, and we should break out of the dict entry for-loop.

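
For illustration, here is a minimal sketch of the control flow the comments above describe. It is not the scanner code: `Filter`, `DictFilterResult`, and `EvalDictionary` are hypothetical names, and the real `Eval(&row)` call is simplified to a predicate over an int. Conjunct evaluation is left out. In this sketch `filters_processed` counts the filters that were evaluated at least once, so it can reach `filters.size()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a runtime filter: true means the value may match.
using Filter = std::function<bool(int)>;

struct DictFilterResult {
  bool column_has_match;          // some dictionary entry passed every filter
  std::size_t filters_processed;  // filters [0, N) were evaluated at least once
};

// Assumption: an empty dictionary yields column_has_match == false; the real
// scanner may treat that case differently.
DictFilterResult EvalDictionary(const std::vector<int>& dict,
                                const std::vector<Filter>& filters) {
  DictFilterResult result{false, 0};
  for (int entry : dict) {
    bool entry_passes = true;
    std::size_t rf_idx = 0;
    for (; rf_idx < filters.size(); ++rf_idx) {
      // Filters are ANDed: the first one that rejects the entry settles it.
      if (!filters[rf_idx](entry)) {
        entry_passes = false;
        break;
      }
    }
    // If we broke out, filter rf_idx was evaluated too, hence the +1.
    std::size_t evaluated = entry_passes ? rf_idx : rf_idx + 1;
    result.filters_processed = std::max(result.filters_processed, evaluated);
    if (entry_passes) {
      // All filters accepted this entry, so the row group cannot be skipped.
      result.column_has_match = true;
      break;
    }
  }
  return result;
}
```

A single integer replaces the `vector<bool>` because filters are always evaluated in index order, so the processed set is always a prefix.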

http://gerrit.cloudera.org:8080/#/c/18017/5/be/src/exec/parquet/hdfs-parquet-scanner.cc@1931
PS5, Line 1931:           runtime_filters->at(rf_idx)->stats->IncrCounters(
              :               FilterStats::ROW_GROUPS_KEY, 1, 0, 0);
We should only increment the counters for the processed filters.



-- 
To view, visit http://gerrit.cloudera.org:8080/18017
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ida0ada8799774be34312eaa4be47336149f637c7
Gerrit-Change-Number: 18017
Gerrit-PatchSet: 5
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Jan 2022 18:08:58 +0000
Gerrit-HasComments: Yes