Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18017 )

Change subject: IMPALA-10910, IMPALA-5509: Runtime filter: dictionary filter support
......................................................................


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/18017/5/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18017/5/be/src/exec/parquet/hdfs-parquet-scanner.cc@1891
PS5, Line 1891: vector<bool> runtime_filters_processed(
              :         runtime_filters == nullptr ? 0 : runtime_filters->size(), false);
We'll always process the first N runtime filters, where 0 <= N <= runtime_filters->size(), and all runtime filters with idx >= N are unprocessed.

This means it's enough to store a single integer (N) instead of a vector<bool>.
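A minimal sketch of the single-integer bookkeeping, assuming filters are always evaluated in index order starting from 0. `ProcessedFilterCount` and its methods are hypothetical names for illustration, not code from the patch.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical replacement for vector<bool> runtime_filters_processed.
// Filters are evaluated in index order and evaluation stops early, so the
// processed filters always form a prefix [0, n); storing n is enough.
class ProcessedFilterCount {
 public:
  // Record that the filter at `idx` was evaluated for some dictionary entry.
  void MarkProcessed(int idx) {
    assert(idx >= 0);
    n_ = std::max(n_, idx + 1);
  }
  // A filter was processed iff its index lies inside the prefix.
  bool IsProcessed(int idx) const { return idx < n_; }
  int count() const { return n_; }

 private:
  int n_ = 0;  // 0 <= n_ <= number of runtime filters
};
```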


http://gerrit.cloudera.org:8080/#/c/18017/5/be/src/exec/parquet/hdfs-parquet-scanner.cc@1913
PS5, Line 1913:           // All runtime filter evaluation should fail to eliminate this row group.
To eliminate the row group, every dictionary element must be rejected, either by a conjunct or by at least one runtime filter.

This means that here, where we are dealing with a single dictionary element, it is enough for a single runtime filter not to contain the element (since the runtime filters are ANDed together).
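The elimination rule above can be written as a predicate. This is a sketch, assuming dictionary values are `int64_t` and that the conjunct and the runtime filters can be modeled as plain predicates; `Predicate` and `CanSkipRowGroup` are hypothetical stand-ins for the `Eval()` calls in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a conjunct or runtime filter evaluated on one value.
using Predicate = std::function<bool(int64_t)>;

// A row group can be skipped only if EVERY dictionary element is rejected,
// either by the conjunct or by at least one runtime filter. The runtime
// filters are ANDed, so for a single element one rejecting filter is enough.
bool CanSkipRowGroup(const std::vector<int64_t>& dict,
                     const Predicate& conjunct,
                     const std::vector<Predicate>& runtime_filters) {
  return std::all_of(dict.begin(), dict.end(), [&](int64_t v) {
    if (!conjunct(v)) return true;  // eliminated by the conjunct
    // any_of stops at the first filter that rejects v.
    return std::any_of(runtime_filters.begin(), runtime_filters.end(),
                       [v](const Predicate& f) { return !f(v); });
  });
}
```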


http://gerrit.cloudera.org:8080/#/c/18017/5/be/src/exec/parquet/hdfs-parquet-scanner.cc@1916
PS5, Line 1916:             if (runtime_filters->at(rf_idx)->Eval(&row)) {
              :               column_has_match = true;
              :               break;
              :             }
We should break on the first filter that returns false.
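A sketch of the inner loop with that early exit, using a hypothetical `Predicate` type in place of the runtime filters' `Eval()`. The hypothetical `FirstRejectingFilter` returns the index of the first filter that rejects the value, or `filters.size()` if every filter accepts it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Predicate = std::function<bool(int64_t)>;

// Evaluates the filters in order and stops at the first one that returns
// false: the filters are ANDed, so the remaining ones cannot change the
// outcome for this dictionary element.
std::size_t FirstRejectingFilter(const std::vector<Predicate>& filters,
                                 int64_t value) {
  std::size_t rf_idx = 0;
  for (; rf_idx < filters.size(); ++rf_idx) {
    if (!filters[rf_idx](value)) break;
  }
  return rf_idx;
}
```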


http://gerrit.cloudera.org:8080/#/c/18017/5/be/src/exec/parquet/hdfs-parquet-scanner.cc@1920
PS5, Line 1920:           }
              :         }
After the for-loop, we should check whether we processed all runtime_filters and whether the last runtime filter also evaluated to true. If so, the column has a match, and we should break out of the dictionary-entry for-loop.
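Putting these pieces together, the dictionary loop could look roughly like the sketch below. It assumes a hypothetical `Predicate` type for the filters' `Eval()` and a hypothetical `DictFilterResult` struct, and it leaves out conjunct evaluation to focus on the runtime filters; `num_processed` plays the role of the single integer N mentioned in the first comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Predicate = std::function<bool(int64_t)>;

struct DictFilterResult {
  bool column_has_match = false;  // some entry passed every runtime filter
  std::size_t num_processed = 0;  // filters [0, num_processed) were evaluated
};

DictFilterResult EvalDictionary(const std::vector<int64_t>& dict,
                                const std::vector<Predicate>& filters) {
  DictFilterResult result;
  for (int64_t value : dict) {
    std::size_t rf_idx = 0;
    for (; rf_idx < filters.size(); ++rf_idx) {
      if (!filters[rf_idx](value)) break;  // first rejecting filter
    }
    // Filters up to and including rf_idx were evaluated (capped at size).
    result.num_processed =
        std::max(result.num_processed, std::min(rf_idx + 1, filters.size()));
    // All filters were processed and the last one also returned true:
    // this entry survives, so the row group cannot be skipped.
    if (rf_idx == filters.size()) {
      result.column_has_match = true;
      break;  // leave the dictionary-entry loop
    }
  }
  return result;
}
```

The post-loop test `rf_idx == filters.size()` covers both conditions from the comment at once, since the inner loop only runs to the end when no filter returned false.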


http://gerrit.cloudera.org:8080/#/c/18017/5/be/src/exec/parquet/hdfs-parquet-scanner.cc@1931
PS5, Line 1931:         runtime_filters->at(rf_idx)->stats->IncrCounters(
              :             FilterStats::ROW_GROUPS_KEY, 1, 0, 0);
We should only increment the counters for the processed filters.
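A sketch of restricting the counter updates to the processed prefix [0, N). `FilterStatsStub` and `IncrProcessedFilterCounters` are hypothetical; the stub models a single per-filter row-group count rather than the actual arguments of `stats->IncrCounters(FilterStats::ROW_GROUPS_KEY, 1, 0, 0)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-filter stats: counts row groups this filter was evaluated on.
struct FilterStatsStub {
  int row_groups = 0;
};

// Only filters in the processed prefix [0, num_processed) were actually
// evaluated against this row group, so only they get their counters bumped.
void IncrProcessedFilterCounters(std::vector<FilterStatsStub>& stats,
                                 std::size_t num_processed) {
  assert(num_processed <= stats.size());
  for (std::size_t rf_idx = 0; rf_idx < num_processed; ++rf_idx) {
    stats[rf_idx].row_groups += 1;
  }
}
```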



--
To view, visit http://gerrit.cloudera.org:8080/18017

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ida0ada8799774be34312eaa4be47336149f637c7
Gerrit-Change-Number: 18017
Gerrit-PatchSet: 5
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Jan 2022 18:08:58 +0000
Gerrit-HasComments: Yes
