Qifan Chen has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/16720 )
Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate ...................................................................... IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate This patch adds the logic to utilize min/max stats for Parquet row groups or pages to skip these entities when they don't qualify an equi-join predicate. A new class of predicates called overlap predicates is introduced to aid in the determination of whether a Parquet row group or a page overlap with a range computed from the hash join. If not, then the entire Parquet row group or the page are skipped. The new class of predicates co-exist with the existing min/max conjuncts that are introduced based on the local scan predicates. Both classes of predicates can work individually or together with each other. The overlap predicates are evaluted after the existing min/max conjuncts. To be done: 1. Handle STRING, DATE, TIME and DECIMAL data tyes; 2. Unit/performance testing; 3. Core testing. Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 --- M be/src/exec/exec-node.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java 16 files changed, 525 insertions(+), 69 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16720/16 -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 16 Gerrit-Owner: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>