[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg

2023-09-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates in Iceberg
..

IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg

This change adds a predicate filtering mechanism at planning time that
locates Impala's predicates in the residual expressions from Iceberg
planning. By locating all residual expressions, the remainder
expression set can be calculated.

The current implementation is an all-or-nothing filter, if 'planFiles()'
(Iceberg API) returns no residual expression, then all Impala
predicates can be skipped, if there's any residual expression, every
Impala predicate is pushed down to the Impala scanner.

Residual expressions are the remaining filter expressions after the
pushdown of predicates into the Iceberg table scan. By locating the
remainder expression, we can reduce the number of predicates that will
be pushed down to the Impala scanner.

After this change, the Iceberg residual expression handling is improved
by locating the simple conjuncts in the residual expression and mapping
back them to Impala conjuncts. For example, if the list of Impala
conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"'
and 'col_i' happens to be a partition column in the Iceberg table
definition and Iceberg table scan can eliminate the expression, the
residual expression will be 'col_s = "a"'. This expression can be mapped
back as an Impala predicate, and any other expression can be removed
from the effective Impala conjunct list, and pushed down to the scanner,
skipping the unnecessary filtering of 'col_i'.

If there's no residual expression, the behavior is the same as before,
all predicate pushdown is skipped.
If Impala is unable to match all residual expression to Impala conjuncts
then all the conjunct are pushed dow to Impala scanner.

This change offers the advantage of not pushing down already evaluated
filters to the Impala scanner nodes, resulting in enhanced scanning
performance. Additionally, if the filter expression affects columns that
are unnecessary for the final result and can be filtered out during
Iceberg's table scan, it leads to a reduced row size, thereby optimizing
data retrieval and improving overall query efficiency.

This solution is limited to cases where Impala's expression list
contains only conjuncts, compound expressions are not supported, because
partial elimination of compounds would involve expression rewrites in
the Impala expression.

A new query option is added: iceberg_predicate_pushdown_subsetting. The
query option's default value is true. It can be turned off by setting it
to false.

Performance of the predicate location is measured on two edge cases:
 - 1000 expression, 999 skipped: on avreage 2 ms
 - 1000 expression, 1 skipped: on average 25 ms

Tests:
 - planner test cases added for disabled mode
 - existing planner test cases adjusted
 - core tests passed

Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Reviewed-on: http://gerrit.cloudera.org:8080/20133
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
12 files changed, 370 insertions(+), 72 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 13
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg

2023-09-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates in Iceberg
..


Patch Set 12: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 12
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 28 Sep 2023 14:16:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg

2023-09-28 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates in Iceberg
..


Patch Set 11: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 11
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 28 Sep 2023 09:09:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg

2023-09-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates in Iceberg
..


Patch Set 12: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 12
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 28 Sep 2023 09:09:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg

2023-09-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates in Iceberg
..


Patch Set 12:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9772/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 12
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 28 Sep 2023 09:09:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg

2023-09-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates in Iceberg
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14100/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 11
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 28 Sep 2023 08:41:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg

2023-09-28 Thread Peter Rozsa (Code Review)
Peter Rozsa has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates in Iceberg
..

IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg

This change adds a predicate filtering mechanism at planning time that
locates Impala's predicates in the residual expressions from Iceberg
planning. By locating all residual expressions, the remainder
expression set can be calculated.

The current implementation is an all-or-nothing filter, if 'planFiles()'
(Iceberg API) returns no residual expression, then all Impala
predicates can be skipped, if there's any residual expression, every
Impala predicate is pushed down to the Impala scanner.

Residual expressions are the remaining filter expressions after the
pushdown of predicates into the Iceberg table scan. By locating the
remainder expression, we can reduce the number of predicates that will
be pushed down to the Impala scanner.

After this change, the Iceberg residual expression handling is improved
by locating the simple conjuncts in the residual expression and mapping
back them to Impala conjuncts. For example, if the list of Impala
conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"'
and 'col_i' happens to be a partition column in the Iceberg table
definition and Iceberg table scan can eliminate the expression, the
residual expression will be 'col_s = "a"'. This expression can be mapped
back as an Impala predicate, and any other expression can be removed
from the effective Impala conjunct list, and pushed down to the scanner,
skipping the unnecessary filtering of 'col_i'.

If there's no residual expression, the behavior is the same as before,
all predicate pushdown is skipped.
If Impala is unable to match all residual expression to Impala conjuncts
then all the conjunct are pushed dow to Impala scanner.

This change offers the advantage of not pushing down already evaluated
filters to the Impala scanner nodes, resulting in enhanced scanning
performance. Additionally, if the filter expression affects columns that
are unnecessary for the final result and can be filtered out during
Iceberg's table scan, it leads to a reduced row size, thereby optimizing
data retrieval and improving overall query efficiency.

This solution is limited to cases where Impala's expression list
contains only conjuncts, compound expressions are not supported, because
partial elimination of compounds would involve expression rewrites in
the Impala expression.

A new query option is added: iceberg_predicate_pushdown_subsetting. The
query option's default value is true. It can be turned off by setting it
to false.

Performance of the predicate location is measured on two edge cases:
 - 1000 expression, 999 skipped: on avreage 2 ms
 - 1000 expression, 1 skipped: on average 25 ms

Tests:
 - planner test cases added for disabled mode
 - existing planner test cases adjusted
 - core tests passed

Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
12 files changed, 370 insertions(+), 72 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/11
--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 11
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg

2023-09-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates in Iceberg
..


Patch Set 10: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9750/


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 10
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Fri, 22 Sep 2023 11:22:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg

2023-09-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates in Iceberg
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9750/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 10
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Fri, 22 Sep 2023 07:10:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg

2023-09-21 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates in Iceberg
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 10
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 21 Sep 2023 14:36:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg

2023-09-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates in Iceberg
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/14017/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 10
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Mon, 18 Sep 2023 09:10:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg

2023-09-18 Thread Peter Rozsa (Code Review)
Peter Rozsa has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates in Iceberg
..

IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg

This change adds a predicate filtering mechanism at planning time that
locates Impala's predicates in the residual expressions from Iceberg
planning. By locating all residual expressions, the remainder
expression set can be calculated.

The current implementation is an all-or-nothing filter, if 'planFiles()'
(Iceberg API) returns no residual expression, then all Impala
predicates can be skipped, if there's any residual expression, every
Impala predicate is pushed down to the Impala scanner.

Residual expressions are the remaining filter expressions after the
pushdown of predicates into the Iceberg table scan. By locating the
remainder expression, we can reduce the number of predicates that will
be pushed down to the Impala scanner.

After this change, the Iceberg residual expression handling is improved
by locating the simple conjuncts in the residual expression and mapping
back them to Impala conjuncts. For example, if the list of Impala
conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"'
and 'col_i' happens to be a partition column in the Iceberg table
definition and Iceberg table scan can eliminate the expression, the
residual expression will be 'col_s = "a"'. This expression can be mapped
back as an Impala predicate, and any other expression can be removed
from the effective Impala conjunct list, and pushed down to the scanner,
skipping the unnecessary filtering of 'col_i'.

If there's no residual expression, the behavior is the same as before,
all predicate pushdown is skipped.
If Impala is unable to match all residual expression to Impala conjuncts
then all the conjunct are pushed dow to Impala scanner.

This change offers the advantage of not pushing down already evaluated
filters to the Impala scanner nodes, resulting in enhanced scanning
performance. Additionally, if the filter expression affects columns that
are unnecessary for the final result and can be filtered out during
Iceberg's table scan, it leads to a reduced row size, thereby optimizing
data retrieval and improving overall query efficiency.

This solution is limited to cases where Impala's expression list
contains only conjuncts, compound expressions are not supported, because
partial elimination of compounds would involve expression rewrites in
the Impala expression.

A new query option is added: iceberg_predicate_pushdown_subsetting. The
query option's default value is true. It can be turned off by setting it
to false.

Performance of the predicate location is measured on two edge cases:
 - 1000 expression, 999 skipped: on avreage 2 ms
 - 1000 expression, 1 skipped: on average 25 ms

Tests:
 - planner test cases added for disabled mode
 - existing planner test cases adjusted
 - core tests passed

Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
12 files changed, 369 insertions(+), 72 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/10
--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 10
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 9: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9660/


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 9
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 31 Aug 2023 11:50:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9660/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 9
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 31 Aug 2023 07:38:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-30 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 9
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Wed, 30 Aug 2023 10:42:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13882/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 9
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Wed, 30 Aug 2023 08:22:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-30 Thread Peter Rozsa (Code Review)
Peter Rozsa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 9:

(2 comments)

Thank you, Gabor!

http://gerrit.cloudera.org:8080/#/c/20133/8/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/20133/8/common/thrift/ImpalaService.thrift@846
PS8, Line 846: the sca
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/20133/8/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/20133/8/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@426
PS8, Line 426:
> I'd also push down this logging into the newly introduced function to reduc
Done



--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 9
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Wed, 30 Aug 2023 07:56:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-30 Thread Peter Rozsa (Code Review)
Peter Rozsa has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..

IMPALA-12089: Be able to skip pushing down a subset of the predicates

This change adds a predicate filtering mechanism at planning time that
locates Impala's predicates in the residual expressions from Iceberg
planning. By locating all residual expressions, the remainder
expression set can be calculated.

The current implementation is an all-or-nothing filter, if 'planFiles()'
(Iceberg API) returns no residual expression, then all Impala
predicates can be skipped, if there's any residual expression, every
Impala predicate is pushed down to the Impala scanner.

Residual expressions are the remaining filter expressions after the
pushdown of predicates into the Iceberg table scan. By locating the
remainder expression, we can reduce the number of predicates that will
be pushed down to the Impala scanner.

After this change, the Iceberg residual expression handling is improved
by locating the simple conjuncts in the residual expression and mapping
back them to Impala conjuncts. For example, if the list of Impala
conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"'
and 'col_i' happens to be a partition column in the Iceberg table
definition and Iceberg table scan can eliminate the expression, the
residual expression will be 'col_s = "a"'. This expression can be mapped
back as an Impala predicate, and any other expression can be removed
from the effective Impala conjunct list, and pushed down to the scanner,
skipping the unnecessary filtering of 'col_i'.

If there's no residual expression, the behavior is the same as before,
all predicate pushdown is skipped.
If Impala is unable to match all residual expression to Impala conjuncts
then all the conjunct are pushed dow to Impala scanner.

This change offers the advantage of not pushing down already evaluated
filters to the Impala scanner nodes, resulting in enhanced scanning
performance. Additionally, if the filter expression affects columns that
are unnecessary for the final result and can be filtered out during
Iceberg's table scan, it leads to a reduced row size, thereby optimizing
data retrieval and improving overall query efficiency.

This solution is limited to cases where Impala's expression list
contains only conjuncts, compound expressions are not supported, because
partial elimination of compounds would involve expression rewrites in
the Impala expression.

A new query option is added: iceberg_predicate_pushdown_subsetting. The
query option's default value is true. It can be turned off by setting it
to false.

Performance of the predicate location is measured on two edge cases:
 - 1000 expression, 999 skipped: on avreage 2 ms
 - 1000 expression, 1 skipped: on average 25 ms

Tests:
 - planner test cases added for disabled mode
 - existing planner test cases adjusted
 - core tests passed

Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
12 files changed, 371 insertions(+), 72 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/9
--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 9
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-25 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 8: Code-Review+1

(2 comments)

Just some minor nits, looks good in overall.

http://gerrit.cloudera.org:8080/#/c/20133/8/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/20133/8/common/thrift/ImpalaService.thrift@846
PS8, Line 846: the the
nit: typo


http://gerrit.cloudera.org:8080/#/c/20133/8/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/20133/8/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@426
PS8, Line 426:   LOG.debug("Iceberg predicate pushdown subsetting took {} 
ms",
I'd also push down this logging into the newly introduced function to reduce 
the noise from this level.



--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 8
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Fri, 25 Aug 2023 13:23:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13833/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 8
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 24 Aug 2023 14:54:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-24 Thread Peter Rozsa (Code Review)
Hello Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20133

to look at the new patch set (#8).

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..

IMPALA-12089: Be able to skip pushing down a subset of the predicates

This change adds a predicate filtering mechanism at planning time that
locates Impala's predicates in the residual expressions from Iceberg
planning. By locating all residual expressions, the remainder
expression set can be calculated.

The current implementation is an all-or-nothing filter, if 'planFiles()'
(Iceberg API) returns no residual expression, then all Impala
predicates can be skipped, if there's any residual expression, every
Impala predicate is pushed down to the Impala scanner.

Residual expressions are the remaining filter expressions after the
pushdown of predicates into the Iceberg table scan. By locating the
remainder expression, we can reduce the number of predicates that will
be pushed down to the Impala scanner.

After this change, the Iceberg residual expression handling is improved
by locating the simple conjuncts in the residual expression and mapping
back them to Impala conjuncts. For example, if the list of Impala
conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"'
and 'col_i' happens to be a partition column in the Iceberg table
definition and Iceberg table scan can eliminate the expression, the
residual expression will be 'col_s = "a"'. This expression can be mapped
back as an Impala predicate, and any other expression can be removed
from the effective Impala conjunct list, and pushed down to the scanner,
skipping the unnecessary filtering of 'col_i'.

If there's no residual expression, the behavior is the same as before,
all predicate pushdown is skipped.
If Impala is unable to match all residual expression to Impala conjuncts
then all the conjunct are pushed dow to Impala scanner.

This change offers the advantage of not pushing down already evaluated
filters to the Impala scanner nodes, resulting in enhanced scanning
performance. Additionally, if the filter expression affects columns that
are unnecessary for the final result and can be filtered out during
Iceberg's table scan, it leads to a reduced row size, thereby optimizing
data retrieval and improving overall query efficiency.

This solution is limited to cases where Impala's expression list
contains only conjuncts, compound expressions are not supported, because
partial elimination of compounds would involve expression rewrites in
the Impala expression.

A new query option is added: iceberg_predicate_pushdown_subsetting. The
query option's default value is true. It can be turned off by setting it
to false.

Performance of the predicate location is measured on two edge cases:
 - 1000 expression, 999 skipped: on avreage 2 ms
 - 1000 expression, 1 skipped: on average 25 ms

Tests:
 - planner test cases added for disabled mode
 - existing planner test cases adjusted
 - core tests passed

Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
12 files changed, 372 insertions(+), 72 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/8
--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 8
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13832/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 7
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 24 Aug 2023 12:34:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-24 Thread Peter Rozsa (Code Review)
Peter Rozsa has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..

IMPALA-12089: Be able to skip pushing down a subset of the predicates

This change adds a predicate filtering mechanism at planning time that
locates Impala's predicates in the residual expressions from Iceberg
planning. By locating all residual expressions, the remainder
expression set can be calculated.

The current implementation is an all-or-nothing filter, if 'planFiles()'
(Iceberg API) returns no residual expression, then all Impala
predicates can be skipped, if there's any residual expression, every
Impala predicate is pushed down to the Impala scanner.

Residual expressions are the remaining filter expressions after the
pushdown of predicates into the Iceberg table scan. By locating the
remainder expression, we can reduce the number of predicates that will
be pushed down to the Impala scanner.

After this change, the Iceberg residual expression handling is improved
by locating the simple conjuncts in the residual expression and mapping
back them to Impala conjuncts. For example, if the list of Impala
conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"'
and 'col_i' happens to be a partition column in the Iceberg table
definition and Iceberg table scan can eliminate the expression, the
residual expression will be 'col_s = "a"'. This expression can be mapped
back as an Impala predicate, and any other expression can be removed
from the effective Impala conjunct list, and pushed down to the scanner,
skipping the unnecessary filtering of 'col_i'.

If there's no residual expression, the behavior is the same as before,
all predicate pushdown is skipped.
If Impala is unable to match all residual expression to Impala conjuncts
then all the conjunct are pushed dow to Impala scanner.

This change offers the advantage of not pushing down already evaluated
filters to the Impala scanner nodes, resulting in enhanced scanning
performance. Additionally, if the filter expression affects columns that
are unnecessary for the final result and can be filtered out during
Iceberg's table scan, it leads to a reduced row size, thereby optimizing
data retrieval and improving overall query efficiency.

This solution is limited to cases where Impala's expression list
contains only conjuncts, compound expressions are not supported, because
partial elimination of compounds would involve expression rewrites in
the Impala expression.

A new query option is added: iceberg_predicate_pushdown_subsetting. The
query option's default value is true. It can be turned off by setting it
to false.

Performance of the predicate location is measured on two edge cases:
 - 1000 expression, 999 skipped: on avreage 2 ms
 - 1000 expression, 1 skipped: on average 25 ms

Tests:
 - planner test cases added for disabled mode
 - existing planner test cases adjusted
 - core tests passed

Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
12 files changed, 373 insertions(+), 72 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/7
--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 7
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-23 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/20133/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20133/6//COMMIT_MSG@56
PS6, Line 56: Performance of the predicate location is measured on two edge 
cases:
Just curious: In terms of the # of slots being materialised, was this a select 
* query or did skipping the predicates also reduce the number of materialised 
slots too?

But anyway, quite impressive numbers!


http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@421
PS5, Line 421:   return;
nit: curly brackets aren't needed around the return statement.


http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@422
PS5, Line 422: }
I'm wondering if putting the rest of the code in this function into another 
function would increase readability. like 
trySubsettingPredicatesBeingPushedDown(). There might be a better and shorter 
name for that function, though.


http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@429
PS5, Line 429:   for (Expression located : locatedExpressions) {
nit: linebreak not necessary here


http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@433
PS5, Line 433:  filtering the predicates to be pushed down to Impala 
scanner.
nit: curly brackets not needed.


http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test:

http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@1005
PS4, Line 1005:  further ro
> This is an example for a query that produces untranslated expression.
what I meant is that instead of comparing for equality, there is n "IS NULL" 
expression that could be tried out in my opinion.


http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@1025
PS4, Line 1025: ned_position
> The Iceberg expression conversion can't convert binary predicates with NULL
I get that but for future readers this might not be that obvious.



--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 6
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Wed, 23 Aug 2023 09:18:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13801/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 6
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Tue, 22 Aug 2023 09:35:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-22 Thread Peter Rozsa (Code Review)
Peter Rozsa has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..

IMPALA-12089: Be able to skip pushing down a subset of the predicates

This change adds a predicate filtering mechanism at planning time that
locates Impala's predicates in the residual expressions from Iceberg
planning. By locating all residual expressions, the remainder
expression set can be calculated.

The current implementation is an all-or-nothing filter, if 'planFiles()'
(Iceberg API) returns no residual expression, then all Impala
predicates can be skipped, if there's any residual expression, every
Impala predicate is pushed down to the Impala scanner.

Residual expressions are the remaining filter expressions after the
pushdown of predicates into the Iceberg table scan. By locating the
remainder expression, we can reduce the number of predicates that will
be pushed down to the Impala scanner.

After this change, the Iceberg residual expression handling is improved
by locating the simple conjuncts in the residual expression and mapping
back them to Impala conjuncts. For example, if the list of Impala
conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"'
and 'col_i' happens to be a partition column in the Iceberg table
definition and Iceberg table scan can eliminate the expression, the
residual expression will be 'col_s = "a"'. This expression can be mapped
back as an Impala predicate, and any other expression can be removed
from the effective Impala conjunct list, and pushed down to the scanner,
skipping the unnecessary filtering of 'col_i'.

If there's no residual expression, the behavior is the same as before,
all predicate pushdown is skipped.
If Impala is unable to match all residual expression to Impala conjuncts
then all the conjunct are pushed dow to Impala scanner.

This change offers the advantage of not pushing down already evaluated
filters to the Impala scanner nodes, resulting in enhanced scanning
performance. Additionally, if the filter expression affects columns that
are unnecessary for the final result and can be filtered out during
Iceberg's table scan, it leads to a reduced row size, thereby optimizing
data retrieval and improving overall query efficiency.

This solution is limited to cases where Impala's expression list
contains only conjuncts, compound expressions are not supported, because
partial elimination of compounds would involve expression rewrites in
the Impala expression.

A new query option is added: iceberg_predicate_pushdown_subsetting. The
query option's default value is true. It can be turned off by setting it
to false.

Performance of the predicate location is measured on two edge cases:
 - 1000 expression, 999 skipped: on avreage 2 ms
 - 1000 expression, 1 skipped: on average 25 ms

Tests:
 - planner test cases added for disabled mode
 - existing planner test cases adjusted
 - core tests passed

Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
12 files changed, 355 insertions(+), 72 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/6
--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 6
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-22 Thread Peter Rozsa (Code Review)
Peter Rozsa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@118
PS5, Line 118:   private final BiMap 
impalaIcebergPredicateMapping_ =
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@173
PS5, Line 173:   private boolean needIcebergForPlanning() {
> line too long (92 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 6
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Tue, 22 Aug 2023 09:09:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13713/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 5
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Fri, 04 Aug 2023 09:35:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-04 Thread Peter Rozsa (Code Review)
Peter Rozsa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 5:

(17 comments)

Thank you for the review Gabor!

http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@24
PS4, Line 24: Iceberg residual expression hand
> This 'residual expression backtracking' sounds pretty mysterious. Might be
Done


http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@37
PS4, Line 37: If Impala is unable to match a
> 'If Impala is unable to match all residual expression to Impala conjuncts t
Done


http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@53
PS4, Line 53: is true. It can be turned of
> I think it's in fact 'iceberg_predicate_pushdown_subsetting'
Done


http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@57
PS4, Line 57:  - 1000 expression, 999 skipped: on avreage 2 ms
> Did you have the chance to make some perf measurements when you start incre
Done


http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/analysis/PredicateVisitor.java
File fe/src/main/java/org/apache/impala/analysis/PredicateVisitor.java:

http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/analysis/PredicateVisitor.java@33
PS4, Line 33:
> From Impala's perspective this class is only a Visitor for Iceberg predicat
Renamed to IcebergExpressionCollector


http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@118
PS4, Line 118: impalaIcebergPredicate
> I think something like 'impalaIcebergPredicateMapping' would be more verbos
Done


http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@375
PS4, Line 375: IcebergUtil.planFiles(getIceTable(),
> indentation is off with these 2 lines.
Done


http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@418
PS4, Line 418:   
conjuncts_.removeAll(impalaIcebergPredicateMapping_.values());
> Might be personal preference, but if you returned here then you wouldn't ne
Done


http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@421
PS4, Line 421: if 
(!analyzer_.getQueryOptions().iceberg_predicate_pushdown_subsetting) { return; }
> Doesn't this fit into the line above?
Done


http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@430
PS4, Line 430:  If we fail to locate any of the Iceberg residual 
expressions then we skip
> Could you please add some comment here for more clarity, like "If we fail t
Done


http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@431
PS4, Line 431:  filtering the predicates to be pushed down to Impala 
scanner.
> merge into the line above
Done


http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@437
PS4, Line 437: skippedExpressions_.addAll(
> I think something the other way around would be more readable if feasible.
Done


http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test:

http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test@1
PS4, Line 1: # If one or more non-partition predicates are present, then all 
predicates are pushed down
> Please mention in this comment that the pred subsetting feature is off. I k
Done


http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test:

http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@77
PS4, Line 77: 
> I think this iceberg-predicates.test file is the place where we should test
Newly added test cases moved to this file


http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test:

http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@1005
PS4, Line 1005:  

[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-04 Thread Peter Rozsa (Code Review)
Peter Rozsa has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..

IMPALA-12089: Be able to skip pushing down a subset of the predicates

This change adds a predicate filtering mechanism at planning time that
locates Impala's predicates in the residual expressions from Iceberg
planning. By locating all residual expressions, the remainder
expression set can be calculated.

The current implementation is an all-or-nothing filter, if 'planFiles()'
(Iceberg API) returns no residual expression, then all Impala
predicates can be skipped, if there's any residual expression, every
Impala predicate is pushed down to the Impala scanner.

Residual expressions are the remaining filter expressions after the
pushdown of predicates into the Iceberg table scan. By locating the
remainder expression, we can reduce the number of predicates that will
be pushed down to the Impala scanner.

After this change, the Iceberg residual expression handling is improved
by locating the simple conjuncts in the residual expression and mapping
back them to Impala conjuncts. For example, if the list of Impala
conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"'
and 'col_i' happens to be a partition column in the Iceberg table
definition and Iceberg table scan can eliminate the expression, the
residual expression will be 'col_s = "a"'. This expression can be mapped
back as an Impala predicate, and any other expression can be removed
from the effective Impala conjunct list, and pushed down to the scanner,
skipping the unnecessary filtering of 'col_i'.

If there's no residual expression, the behavior is the same as before,
all predicate pushdown is skipped.
If Impala is unable to match all residual expression to Impala conjuncts
then all the conjunct are pushed dow to Impala scanner.

This change offers the advantage of not pushing down already evaluated
filters to the Impala scanner nodes, resulting in enhanced scanning
performance. Additionally, if the filter expression affects columns that
are unnecessary for the final result and can be filtered out during
Iceberg's table scan, it leads to a reduced row size, thereby optimizing
data retrieval and improving overall query efficiency.

This solution is limited to cases where Impala's expression list
contains only conjuncts, compound expressions are not supported, because
partial elimination of compounds would involve expression rewrites in
the Impala expression.

A new query option is added: iceberg_predicate_pushdown_subsetting. The
query option's default value is true. It can be turned off by setting it
to false.

Performance of the predicate location is measured on two edge cases:
 - 1000 expression, 999 skipped: on avreage 2 ms
 - 1000 expression, 1 skipped: on average 25 ms

Tests:
 - planner test cases added for disabled mode
 - existing planner test cases adjusted
 - core tests passed

Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
12 files changed, 353 insertions(+), 72 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/5
--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 5
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@118
PS5, Line 118:   private final BiMap 
impalaIcebergPredicateMapping_ = HashBiMap.create();
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@173
PS5, Line 173: return !impalaIcebergPredicateMapping_.isEmpty() || 
tblRef_.getTimeTravelSpec() != null;
line too long (92 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 5
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Fri, 04 Aug 2023 09:13:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-08-01 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 4:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@24
PS4, Line 24: residual expression backtracking
This 'residual expression backtracking' sounds pretty mysterious. Might be 
better this way: 'Iceberg residual expression handling'


http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@37
PS4, Line 37: If the expression backtracking
'If Impala is unable to match all residual expression to Impala conjuncts then 
all the conjunct are pushed dow to Impala scanner' ?


http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@53
PS4, Line 53: iceberg_predicate_subsetting
I think it's in fact 'iceberg_predicate_pushdown_subsetting'


http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@57
PS4, Line 57: Tests:
Did you have the chance to make some perf measurements when you start 
increasing the number of predicates so that we can be sure converting Iceberg 
preds back to Impala preds doesn't get too expensive with scale?


http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/analysis/PredicateVisitor.java
File fe/src/main/java/org/apache/impala/analysis/PredicateVisitor.java:

http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/analysis/PredicateVisitor.java@33
PS4, Line 33: PredicateVisitor
>From Impala's perspective this class is only a Visitor for Iceberg 
>predicates/expressions, right? IcebergPredicateVisitor would be a more 
>suitable name, then.
But being even more precise, this class is actually a collector of Iceberg 
predicates (using the visitor pattern). So something like 
IcebergPredicateCollector or IcebergExpressionCollector would be even more 
precise in my opinion.


http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@118
PS4, Line 118: translatedExpressions_
I think something like 'impalaIcebergPredicateMapping' would be more verbose.


http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@375
PS4, Line 375:  IcebergUtil.planFiles(getIceTable(),
indentation is off with these 2 lines.


http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@418
PS4, Line 418:   conjuncts_.removeAll(translatedExpressions_.values());
Might be personal preference, but if you returned here then you wouldn't need 
to put the rest of the function in an else block.


http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@421
PS4, Line 421: return;
Doesn't this fit into the line above?


http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@430
PS4, Line 430:   if (expr == null) {
Could you please add some comment here for more clarity, like "If we fail to 
translate any of the Iceberg residual expressions then we skip filtering the 
predicates to be pushed down to Impala scanner."


http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@431
PS4, Line 431: return;
merge into the line above


http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@437
PS4, Line 437:   expressionsToRetain.forEach(skippedExpressions_::remove);
I think something the other way around would be more readable if feasible. Like 
skippedExpressions_.stream().filter(some condition) ?


http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test:

http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test@1
PS4, Line 1: # If one or more non-partition predicates are present, then all 
predicates are pushed down
Please mention in this comment that the pred subsetting feature is off. I know 
this info is in the file name but you can't emphasise this enough.


http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test:


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-07-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13580/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 4
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Wed, 19 Jul 2023 12:07:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-07-19 Thread Peter Rozsa (Code Review)
Peter Rozsa has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..

IMPALA-12089: Be able to skip pushing down a subset of the predicates

This change adds a predicate filtering mechanism at planning time that
locates Impala's predicates in the residual expressions from Iceberg
planning. By locating all residual expressions, the remainder
expression set can be calculated.

The current implementation is an all-or-nothing filter, if 'planFiles()'
(Iceberg API) returns no residual expression, then all Impala
predicates can be skipped, if there's any residual expression, every
Impala predicate is pushed down to the Impala scanner.

Residual expressions are the remaining filter expressions after the
pushdown of predicates into the Iceberg table scan. By locating the
remainder expression, we can reduce the number of predicates that will
be pushed down to the Impala scanner.

After this change, the residual expression backtracking is improved by
locating the simple conjuncts in the residual expression and mapping
back them to Impala conjuncts. For example, if the list of Impala
conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"'
and 'col_i' happens to be a partition column in the Iceberg table
definition and Iceberg table scan can eliminate the expression, the
residual expression will be 'col_s = "a"'. This expression can be mapped
back as an Impala predicate, and any other expression can be removed
from the effective Impala conjunct list, and pushed down to the scanner,
skipping the unnecessary filtering of 'col_i'.

If there's no residual expression, the behavior is the same as before,
all predicate pushdown is skipped.
If the expression backtracking can't match all residual expressions in
the Impala conjunct list, then the filtering resorts to the original
all-or-nothing approach.

This change offers the advantage of not pushing down already evaluated
filters to the Impala scanner nodes, resulting in enhanced scanning
performance. Additionally, if the filter expression affects columns that
are unnecessary for the final result and can be filtered out during
Iceberg's table scan, it leads to a reduced row size, thereby optimizing
data retrieval and improving overall query efficiency.

This solution is limited to cases where Impala's expression list
contains only conjuncts, compound expressions are not supported, because
partial elimination of compounds would involve expression rewrites in
the Impala expression.

A new query option is added: iceberg_predicate_subsetting. The query
option's default value is true. It can be turned off by setting it to
false.

Tests:
 - planner test cases added for disabled mode
 - existing planner test cases adjusted
 - core tests passed

Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A fe/src/main/java/org/apache/impala/analysis/PredicateVisitor.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
12 files changed, 304 insertions(+), 72 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/4
--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 4
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-07-04 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 3:

(30 comments)

Thanks for your work, Peter!

http://gerrit.cloudera.org:8080/#/c/20133/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20133/3//COMMIT_MSG@9
PS3, Line 9: This change adds a predicate filtering mechanism at planning time 
that
For me, this paragraph was a bit difficult to read. I can imagine that someone 
with no background knowledge on this optimization would find it hard to 
understand what this patch does based on this explanation. I'd split this into 
multiple paragraphs for more readability/ eg.:
- what is the current behaviour (all-or-nothing predicate filtering).
- What we mean be residual (preds returned from Iceberg after a planFiles() 
call). Also emphasize that there is pred pushdown in Iceberg and in Impala too 
and here we optimize the pushdown in Impala.
- How we want to achieve this (also a bit more structured description would be 
nice).
- What we gain here.


http://gerrit.cloudera.org:8080/#/c/20133/3//COMMIT_MSG@20
PS3, Line 20: compound
: expressions created by Impala
example pls


http://gerrit.cloudera.org:8080/#/c/20133/3//COMMIT_MSG@19
PS3, Line 19: The visitor implementation
: handles IN, NULL and binary expressions, and does not handle 
compound
: expressions created by Impala, and aggregates, startsWith, isNan
: expressions by Iceberg.
Is there a reason for not covering these? due to difficulty in implementation? 
Are there plans to add this support later on?


http://gerrit.cloudera.org:8080/#/c/20133/3//COMMIT_MSG@30
PS3, Line 30:  - planner test cases added for disabled mode
I see also coverage for the 'enabled' mode in the existing tests.


http://gerrit.cloudera.org:8080/#/c/20133/3/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/20133/3/common/thrift/ImpalaService.thrift@800
PS3, Line 800:   ICEBERG_PREDICATE_SUBSETTING = 158;
Wondering if ICEBERG_PREDICATE_PUSHDOWN_SUBSETTING would be an even more 
self-explanatory name. Might be too long, though.


http://gerrit.cloudera.org:8080/#/c/20133/3/fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java
File fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java:

http://gerrit.cloudera.org:8080/#/c/20133/3/fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java@45
PS3, Line 45: ImpalaExpressionVisitor
this is rather an ImpalaPredicateLocator, isn't it?


http://gerrit.cloudera.org:8080/#/c/20133/3/fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java@47
PS3, Line 47: OPERATORS
A more verbose name with be nice. Like IMPALA_ICEBERG_OPERATOR_MAP


http://gerrit.cloudera.org:8080/#/c/20133/3/fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java@56
PS3, Line 56: expressions
I think impalaExpressions would be a better name as we deal with Iceberg 
expressions here as well, so would be nice to easily differentiate between them.


http://gerrit.cloudera.org:8080/#/c/20133/3/fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java@67
PS3, Line 67: if (literal.to(StringType.get()) == literal) {
This is a check if T is a string, right? Can't we simply call 
literal.toString() for all the cases? Wouldn't that work for String literals?


http://gerrit.cloudera.org:8080/#/c/20133/3/fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java@86
PS3, Line 86: return result;
this seems weird, but honestly not 100% up to speed with this Iceberg visitor 
thing :)
Could you add a test to exercise this 'not' operator? I guess we'd need a bool 
partition column for this.

Giving this a second thought, with this visitor we only check the occurrence of 
certain expressions within the Iceberg expression tree. So I think I get why we 
just simply return the intermediate results for a 'not' operation as it won't 
change what expressions we have seen so far.


http://gerrit.cloudera.org:8080/#/c/20133/3/fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java@102
PS3, Line 102:   public  ImpalaExpressionVisitorResult 
predicate(BoundPredicate pred) {
Could you help me understand what is this BoundPredicate and why it doesn't 
require such a treatment as the below UnboundPredicate does?


http://gerrit.cloudera.org:8080/#/c/20133/3/fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java@112
PS3, Line 112: if (!(pred.term() instanceof NamedReference)) {
you could merge L112-114 into one line.


http://gerrit.cloudera.org:8080/#/c/20133/3/fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java@119
PS3, Line 119: case NOT_NULL:
I personally find the case blocks more 

[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13432/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 3
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 29 Jun 2023 12:05:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-06-29 Thread Peter Rozsa (Code Review)
Peter Rozsa has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..

IMPALA-12089: Be able to skip pushing down a subset of the predicates

This change adds a predicate filtering mechanism at planning time that
locates Impala's predicates in the residual expressions from Iceberg
planning. By locating all residual expressions, the remainder expression
set can be calculated. If there's no residual expression, the behaviour
is the same as before, all predicate pushdown is skipped. If there is a
residual expression, an Iceberg expression visitor implementation tries
to locate each Impala predicate in the expression tree that the Iceberg
planning returned as a residual expression. If the visitor's result is
mismatched (the number of expressions does not match the number of
located expressions), then it resorts to the original behaviour, all
predicates are pushed down to the scanners. The visitor implementation
handles IN, NULL and binary expressions, and does not handle compound
expressions created by Impala, and aggregates, startsWith, isNan
expressions by Iceberg.

A new query option is added: iceberg_predicate_subsetting. The query
option's default value is true. It can be turned off by setting it to
false.

Tests:
 - unit tests added for ImpalaExpressionVisitor
 - planner test cases added for disabled mode
 - core tests passed

Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
A fe/src/test/java/org/apache/impala/analysis/ImpalaExpressionVisitorTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
12 files changed, 656 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/3
--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 3
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 2:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/13430/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 2
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 29 Jun 2023 11:24:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-06-29 Thread Peter Rozsa (Code Review)
Peter Rozsa has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..

IMPALA-12089: Be able to skip pushing down a subset of the predicates

This change adds a predicate filtering mechanism at planning time that
locates Impala's predicates in the residual expressions from Iceberg
planning. By locating all residual expressions, the remainder expression
set can be calculated. If there's no residual expression, the behaviour
is the same as before, all predicate pushdown is skipped. If there is a
residual expression, an Iceberg expression visitor implementation tries
to locate each Impala predicate in the expression tree that the Iceberg
planning returned as a residual expression. If the visitor's result is
mismatched (the number of expressions does not match the number of
located expressions), then it resorts to the original behaviour, all
predicates are pushed down to the scanners. The visitor implementation
handles IN, NULL and binary expressions, and does not handle compound
expressions created by Impala, and aggregates, startsWith, isNan
expressions by Iceberg.

A new query option is added: iceberg_predicate_subsetting. The query
option's default value is true. It can be turned off by setting it to
false.

Tests:
 - unit tests added for ImpalaExpressionVisitor
 - planner test cases added for disabled mode
 - core tests passed

Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
A fe/src/test/java/org/apache/impala/analysis/ImpalaExpressionVisitorTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
12 files changed, 622 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/2
--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 2
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-06-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20133 )

Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..


Patch Set 1:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/13429/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 1
Gerrit-Owner: Peter Rozsa 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Peter Rozsa 
Gerrit-Comment-Date: Thu, 29 Jun 2023 08:06:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates

2023-06-29 Thread Peter Rozsa (Code Review)
Peter Rozsa has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20133


Change subject: IMPALA-12089: Be able to skip pushing down a subset of the 
predicates
..

IMPALA-12089: Be able to skip pushing down a subset of the predicates

This change adds a predicate filtering mechanism at planning time that
locates Impala's predicates in the residual expressions from Iceberg
planning. By locating all residual expressions, the remainder expression
set can be calculated. If there's no residual expression, the behaviour
is the same as before, all predicate pushdown is skipped. If there is a
residual expression, an Iceberg expression visitor implementation tries
to locate each Impala predicate in the expression tree that the Iceberg
planning returned as a residual expression. If the visitor's result is
mismatched (the number of expressions does not match the number of
located expressions), then it resorts to the original behaviour, all
predicates are pushed down to the scanners. The visitor implementation
handles IN, NULL and binary expressions, and does not handle compound
expressions created by Impala, and aggregates, startsWith, isNan
expressions by Iceberg.

A new query option is added: iceberg_predicate_subsetting. The query
option's default value is true. It can be turned off by setting it to
false.

Tests:
 - unit tests added for ImpalaExpressionVisitor
 - planner test cases added for disabled mode
 - core tests passed

Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
A fe/src/test/java/org/apache/impala/analysis/ImpalaExpressionVisitorTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
12 files changed, 622 insertions(+), 36 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/1
--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 1
Gerrit-Owner: Peter Rozsa