Peter Rozsa has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
......................................................................

IMPALA-12089: Be able to skip pushing down a subset of the predicates

This change adds a predicate filtering mechanism at planning time that
locates Impala's predicates in the residual expressions from Iceberg
planning. By locating all residual expressions, the remainder
expression set can be calculated.

The current implementation is an all-or-nothing filter: if 'planFiles()'
(Iceberg API) returns no residual expression, then all Impala
predicates can be skipped; if there is any residual expression, every
Impala predicate is pushed down to the Impala scanner.

Residual expressions are the remaining filter expressions after the
pushdown of predicates into the Iceberg table scan. By locating the
remainder expression, we can reduce the number of predicates that will
be pushed down to the Impala scanner.

After this change, the Iceberg residual expression handling is improved
by locating the simple conjuncts in the residual expression and mapping
them back to Impala conjuncts. For example, suppose the list of Impala
conjuncts consists of two predicates, 'col_i != 100' and 'col_s = "a"',
'col_i' is a partition column in the Iceberg table definition, and the
Iceberg table scan can eliminate the predicate on it. The residual
expression will then be 'col_s = "a"'. This expression can be mapped
back to an Impala predicate and pushed down to the scanner, while any
other expression can be removed from the effective Impala conjunct list,
skipping the unnecessary filtering of 'col_i'.

If there's no residual expression, the behavior is the same as before:
all predicate pushdown is skipped.
If Impala is unable to match every residual expression to an Impala
conjunct, then all the conjuncts are pushed down to the Impala scanner.
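
The three cases above can be sketched as follows. This is only an
illustration, not the code in IcebergScanPlanner.java: the class and
method names are hypothetical, and conjuncts and residual expressions
are modeled as plain strings instead of Impala and Iceberg expression
objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper; expressions are modeled as strings (an assumption).
final class ResidualSubsetting {
  /** Returns the conjuncts that still have to be evaluated by the scanner. */
  static List<String> selectPushdown(List<String> conjuncts, Set<String> residuals) {
    // No residual expression: nothing is left to evaluate, skip all pushdown.
    if (residuals.isEmpty()) return new ArrayList<>();
    // A residual that cannot be matched to a conjunct: push down everything.
    if (!conjuncts.containsAll(residuals)) return new ArrayList<>(conjuncts);
    // Otherwise keep only the conjuncts that were located among the residuals.
    List<String> result = new ArrayList<>();
    for (String conjunct : conjuncts) {
      if (residuals.contains(conjunct)) result.add(conjunct);
    }
    return result;
  }
}
```

With the conjuncts 'col_i != 100' and 'col_s = "a"' and the single
residual 'col_s = "a"', the sketch returns only 'col_s = "a"'.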

This change offers the advantage of not pushing down already evaluated
filters to the Impala scanner nodes, resulting in enhanced scanning
performance. Additionally, if the filter expression affects columns that
are unnecessary for the final result and can be filtered out during
Iceberg's table scan, it leads to a reduced row size, thereby optimizing
data retrieval and improving overall query efficiency.

This solution is limited to cases where Impala's expression list
contains only conjuncts. Compound expressions are not supported, because
partial elimination of compounds would involve expression rewrites in
the Impala expression.
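
A minimal sketch of such a conjunct-only check is shown below. The
'Expr' type and the 'canSubset' method are hypothetical stand-ins and
not Impala's actual classes; the sketch assumes that a compound
(AND/OR/NOT) is recognizable by having child expressions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ConjunctOnlyCheck {
  /** Hypothetical node: a leaf predicate, or a compound (AND/OR/NOT) over children. */
  static final class Expr {
    final String label;
    final List<Expr> children;

    Expr(String label, Expr... children) {
      this.label = label;
      this.children = Collections.unmodifiableList(Arrays.asList(children));
    }

    boolean isCompound() { return !children.isEmpty(); }
  }

  /** True only if every entry of the expression list is a simple predicate. */
  static boolean canSubset(List<Expr> exprs) {
    for (Expr e : exprs) {
      // Dropping part of a compound would require rewriting it, so give up.
      if (e.isCompound()) return false;
    }
    return true;
  }
}
```

In this model, a list holding 'col_i != 100' and 'col_s = "a"' as
separate entries passes the check, while a list holding the single
entry 'col_i != 100 OR col_s = "a"' does not.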

A new query option is added: iceberg_predicate_pushdown_subsetting. The
query option's default value is true. It can be turned off by setting it
to false.

Performance of the predicate location is measured on two edge cases:
 - 1000 expressions, 999 skipped: on average 2 ms
 - 1000 expressions, 1 skipped: on average 25 ms

Tests:
 - planner test cases added for disabled mode
 - existing planner test cases adjusted
 - core tests passed

Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
12 files changed, 371 insertions(+), 72 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/9
--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 9
Gerrit-Owner: Peter Rozsa <pro...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pro...@cloudera.com>
