Hello Daniel Becker, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19534

to look at the new patch set (#6).

Change subject: IMPALA-11701 Part1: Don't push down predicates to scanner if 
already applied by Iceberg
......................................................................

IMPALA-11701 Part1: Don't push down predicates to scanner if already applied by 
Iceberg

We push down predicates to Iceberg that uses them to filter out files
when getting the results of planFiles(). Using the
FileScanTask.residual() function we can find out if we have to use
the predicates to further filter the rows of the given files or if
Iceberg has already performed all the filtering.
Basically if we only filter on partition columns then Iceberg can
filter the files and using these filters in Impala wouldn't filter any
more rows from the output (assuming that no partition evolution was
performed on the table).

An additional benefit of not pushing down no-op predicates to the
scanner is that we can potentially materialize less slots.
For example:

SELECT count(1) from iceberg_tbl where part_col = 10;

In the above query Iceberg filters the files using the predicate on
a partition column and then there won't be any need to materialize
'part_col' in Impala, nor to push down the 'part_col = 10' predicate.

Note, this is an all or nothing approach, meaning that assuming N
number of predicates we either push down all predicates to the scanner
or none of them. There is a room for improvement to identify a subset
of the predicates that we still have to push down to the scanner.
However, for this we'd need a mapping between Impala predicates and the
predicates returned by Iceberg's FileScanTask.residual() function that
would significantly increase the complexity of the relevant code.

Testing:
  - Some existing tests needed some extra care as they were checking
    for predicates being pushed down to the scanner, but with this
    patch not all of them are pushed down. For these tests I added some
    extra predicates to achieve that all of the predicates are pushed
    down to the scanner.
  - Added a new planner test suite for checking how predicate push down
    works with Iceberg tables.

Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b
---
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
12 files changed, 311 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/19534/6
--
To view, visit http://gerrit.cloudera.org:8080/19534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b
Gerrit-Change-Number: 19534
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>

Reply via email to