Hello Qifan Chen, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit
    http://gerrit.cloudera.org:8080/17815

to look at the new patch set (#4).

Change subject: IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader
......................................................................

IMPALA-10873: Push down EQUALS, IS NULL and IN-list predicate to ORC reader

This patch pushes down more kinds of predicates into the ORC reader,
including EQUALS, IN-list, and IS-NULL predicates, to gain further
improvements:
 - EQUALS and IN-list predicates can be evaluated inside the ORC reader
   using the bloom filters in the ORC files.
 - Unlike Parquet scanning, which converts an IN-list predicate into two
   binary predicates (i.e. LE and GE), the ORC reader can use IN-list
   predicates directly to skip ORC RowGroups. E.g. a RowGroup with int
   column 'x' in range [1, 100] will be skipped if we push down the
   predicate "x in (0, 101)", whereas the equivalent range predicate
   "x >= 0 and x <= 101" cannot skip it.
 - IS-NULL predicates (including IS-NOT-NULL) can also be used in the
   ORC reader to skip RowGroups.

Implementation:
FE collects these kinds of predicates into 'min_max_conjuncts' of
THdfsScanNode. To better reflect their meaning, 'min_max_conjuncts' is
renamed to 'stats_conjuncts', and other related variable names are
renamed likewise. The Parquet scanner only picks binary min-max
conjuncts (i.e. LT, GT, LE, and GE) to keep the existing behavior. The
ORC scanner builds a SearchArgument from all of these conjuncts.
Tests:
 * Add test in orc-stats.test

Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
9 files changed, 568 insertions(+), 162 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/17815/4
--
To view, visit http://gerrit.cloudera.org:8080/17815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa89f080fe2e87d94fc8ea7f1be83e087fa34225
Gerrit-Change-Number: 17815
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>