Quanlong Huang has uploaded a new patch set (#8) to the change originally 
created by Norbert Luksa. ( http://gerrit.cloudera.org:8080/15403 )

Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner
......................................................................

IMPALA-6505: Min-Max predicate push down in ORC scanner

In the planning phase, the planner collects and generates min-max
predicates that can be evaluated on Parquet file statistics. We can
easily extend this to ORC tables.

This commit implements min/max predicate pushdown for the ORC scanner
by leveraging the external ORC library's search arguments. We build
the search arguments when we open the scanner, since we do not need to
modify them later.
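
As a rough illustration of the idea only (not the code in this patch and
not the ORC library's API), min/max pushdown skips a chunk of data when
its statistics prove that no row can match. The StripeStats class and
may_contain function below are hypothetical names, and the sketch
ignores NULLs and non-integer types:

```python
from dataclasses import dataclass


@dataclass
class StripeStats:
    """Hypothetical per-stripe column statistics (min and max value)."""
    min_val: int
    max_val: int


def may_contain(stats, op, literal):
    """Return False only if no value in [min_val, max_val] can satisfy
    `col <op> literal`; otherwise the stripe has to be read."""
    if op == "<":
        return stats.min_val < literal
    if op == "<=":
        return stats.min_val <= literal
    if op == ">":
        return stats.max_val > literal
    if op == ">=":
        return stats.max_val >= literal
    if op == "=":
        return stats.min_val <= literal <= stats.max_val
    # Unknown operator: be conservative and read the stripe.
    return True


# Three stripes with disjoint value ranges; predicate is `col > 150`.
stripes = [StripeStats(0, 99), StripeStats(100, 199), StripeStats(200, 299)]
selected = [i for i, s in enumerate(stripes) if may_contain(s, ">", 150)]
```

Here only the first stripe can be skipped; the second one must still be
read because its range straddles the literal.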

Also added a new query option, orc_read_statistics, similar to
parquet_read_statistics. If the option is set to true (the default),
predicate pushdown takes effect; otherwise it is skipped.

Limitations:
 - Min-max predicates on CHAR/VARCHAR types are not pushed down due to
   inconsistent behaviors on padding/truncating between Hive and Impala.
   (IMPALA-10882)
 - Min-max predicates with non-literal const exprs are not pushed down
   since SearchArgument interfaces only accept literals. This only
   happens when expr rewrites are disabled, which also disables
   constant folding.
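
The two limitations above amount to an eligibility check before a
predicate is turned into a search argument. The sketch below is a
hypothetical restatement of that rule, not the scanner's actual code;
Predicate, UNSUPPORTED_TYPES and can_push_down are made-up names:

```python
from dataclasses import dataclass


@dataclass
class Predicate:
    """Hypothetical description of a min-max predicate `col <op> rhs`."""
    column_type: str      # e.g. "INT", "STRING", "CHAR", "VARCHAR"
    op: str               # e.g. "<", ">=", "="
    rhs_is_literal: bool  # False for a non-folded const expr such as 1 + 1


# Types excluded because of the padding/truncation inconsistency.
UNSUPPORTED_TYPES = {"CHAR", "VARCHAR"}


def can_push_down(pred):
    """Return True if the predicate is eligible under the two limitations."""
    if pred.column_type.upper() in UNSUPPORTED_TYPES:
        return False
    if not pred.rhs_is_literal:
        return False
    return True


eligible = can_push_down(Predicate("INT", "<", True))
```

A predicate that fails this check is simply not pushed down; it would
still be evaluated on the rows after they are read.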

Tests:
 - Add e2e tests similar to test_parquet_stats to verify that
   predicates are pushed down.
 - Run CORE tests
 - Run the TPCH benchmark: there is no improvement and no regression.
   On the other hand, certain selective queries gained a significant
   speed-up.

Change-Id: I136622413db21e0941d238ab6aeea901a6464845
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-metadata-utils.h
M be/src/exprs/scalar-expr.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M bin/impala-config.sh
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/workloads/functional-query/queries/QueryTest/orc-stats.test
M tests/query_test/test_nested_types.py
A tests/query_test/test_orc_stats.py
13 files changed, 690 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/15403/8
--
To view, visit http://gerrit.cloudera.org:8080/15403
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845
Gerrit-Change-Number: 15403
Gerrit-PatchSet: 8
Gerrit-Owner: Norbert Luksa <norbert.lu...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward (520)
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <norbert.lu...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
