Qifan Chen has uploaded a new patch set (#22). ( http://gerrit.cloudera.org:8080/17252 )
Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................

IMPALA-10647 Improve always-true min/max filter handling in coordinator

This change improves how the coordinator behaves when a newly arrived
min/max filter is always true. A new member
'always_true_filter_received_' records that fact. Similarly, the new
member 'always_false_flipped_to_false_' indicates that the always-false
flag has been flipped from 'true' to 'false'. These two members only
influence how the min and max columns in the "Filter routing table" and
"Final filter table" in the profile are displayed, as follows.

  1. 'PartialUpdates' - The min and the max are partially updated;
  2. 'AlwaysTrue' - One received filter is AlwaysTrue;
  3. 'AlwaysFalse' - No filter is received or all received filters
     are empty;
  4. 'Real values' - The final accumulated min/max from all received
     filters.

A second change records, in the scan node, the arrival time of min/max
filters (as time elapsed since the last system reboot, obtained by
calling MonotonicMillis()). A similar timestamp is recorded for HDFS
Parquet scanners when a row group is processed. Comparing these two
timestamps makes it easy to diagnose issues caused by the late arrival
of min/max filters.

This change also fixes a flaw in which rows were unexpectedly filtered
out because the always_true_ flag in a min/max filter, when set, was
ignored in the eval code path in RuntimeFilter::Eval().

Testing:
  1. Added three new tests in overlap_min_max_filters.test to verify
     that the min/max are displayed correctly when the min/max filter
     in the hash join builder is set to always true, always false, or
     a pair of meaningful min and max values;
  2. Ran unit tests;
  3. Ran runtime-filter-test;
  4. Ran core tests successfully.
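For readers of the profile output, the four display cases listed above can be sketched roughly as follows. This is an illustration only, not the code in this patch: the struct FilterDisplayState, its fields, the function MinMaxDisplay(), the order in which the cases are checked, and the "min, max" output format are all assumptions made for the example.

```cpp
#include <cstdint>
#include <string>

// Hypothetical summary of what the coordinator knows about one min/max
// filter. None of these names come from the Impala source.
struct FilterDisplayState {
  bool always_true_filter_received = false;  // some received filter was always true
  bool always_false = true;   // nothing received yet, or everything received was empty
  int received_updates = 0;   // updates that have arrived so far
  int expected_updates = 0;   // updates the coordinator is waiting for
  int64_t min = 0;            // accumulated minimum, valid only for real values
  int64_t max = 0;            // accumulated maximum, valid only for real values
};

// Returns the text for the min/max columns. The precedence of the checks
// is an assumption of this sketch.
std::string MinMaxDisplay(const FilterDisplayState& s) {
  if (s.always_true_filter_received) return "AlwaysTrue";
  if (s.always_false) return "AlwaysFalse";
  if (s.received_updates < s.expected_updates) return "PartialUpdates";
  return std::to_string(s.min) + ", " + std::to_string(s.max);
}
```

Under these assumptions, a default-constructed state displays as 'AlwaysFalse', and a single always-true update takes precedence over any accumulated values.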
Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
9 files changed, 224 insertions(+), 30 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/17252/22
--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 22
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fangyu....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>