[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg

2023-03-06 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19534 )

Change subject: IMPALA-11701 Part1: Don't push down predicates if already 
applied by Iceberg
..


Patch Set 3:

(5 comments)

Left some minor comments, but really nice work!

http://gerrit.cloudera.org:8080/#/c/19534/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19534/3//COMMIT_MSG@20
PS3, Line 20: down
nit: pushing down?


http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@241
PS3, Line 241: getSkippedConjuncts()
I don't think we need this as this is the DELETE SCAN node which wouldn't 
evaluate any conjuncts anyway.


http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@388
PS3, Line 388: new ArrayList()
nit: could be 'Collections.emptyList()'


http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/datasets/functional/functional_schema_template.sql@3265
PS3, Line 3265: TBLPROPERTIES('iceberg.catalog'='hadoop.catalog',
  :   
'iceberg.catalog_location'='/test-warehouse/iceberg_test/hadoop_catalog',
  :   
'iceberg.table_identifier'='functional_parquet.iceberg_partition_evolution');
Since we are writing the table via Impala we probably don't need to set these.


http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test:

http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@990
PS3, Line 990: event_time='2020-01-01 11:00:00';
iceberg_partitioned is partitioned by HOUR(event_time), how can we skip pushing 
it down?



--
To view, visit http://gerrit.cloudera.org:8080/19534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b
Gerrit-Change-Number: 19534
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 06 Mar 2023 17:06:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg

2023-03-03 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19534 )

Change subject: IMPALA-11701 Part1: Don't push down predicates if already 
applied by Iceberg
..


Patch Set 3:

(7 comments)

Thanks Gabor. Could you also address the comments on the commit message?

http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1390
PS3, Line 1390: is
'is' is not needed here?


http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2027
PS3, Line 2027: explainLevel
'explainLevel' may not be the best name as it could be confused with the 
verbosity level of the explain query ('detailLevel' here) - we can use "set 
EXPLAIN_LEVEL=..." in the shell.
'indentPrefix' would be better.


http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@64
PS1, Line 64:
> 'skippedConjuncts_' is already removed from 'conjuncts_' in the Iceberg pla
The comment on 'nonIdentityConjuncts_' says that it is a subset of 'conjuncts_' 
that does not include the conjuncts pushed down to Iceberg - isn't the latter 
set the same as 'skippedConjuncts_'?

That's why I thought that the union of 'nonIdentityConjuncts_' and 
'skippedConjuncts_' would be the original value of 'conjuncts_'. But then after 
removing 'skippedConjuncts_' from 'conjuncts_', 'nonIdentityConjuncts_' is not 
a (proper) subset of 'conjuncts_' but the same as 'conjuncts_'. Is this not 
correct?


http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@115
PS3, Line 115: have
Nit: has


http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@1279
PS1, Line 1279: exercising
> I googled it, and for me it seems that it's 'exercising' :)
You're right, sorry :)


http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test:

http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@49
PS3, Line 49: could be
Could not be?


http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test:

http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test@51
PS3, Line 51: predicae
Nit: predicate



--
To view, visit http://gerrit.cloudera.org:8080/19534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b
Gerrit-Change-Number: 19534
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 03 Mar 2023 13:28:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg

2023-03-02 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19534 )

Change subject: IMPALA-11701 Part1: Don't push down predicates if already 
applied by Iceberg
..


Patch Set 3:

I've just realized that I forgot to address the comments on the commit message. 
Let me wait for further comments and will get back to them together.


--
To view, visit http://gerrit.cloudera.org:8080/19534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b
Gerrit-Change-Number: 19534
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 02 Mar 2023 12:41:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg

2023-02-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19534 )

Change subject: IMPALA-11701 Part1: Don't push down predicates if already 
applied by Iceberg
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12488/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b
Gerrit-Change-Number: 19534
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 28 Feb 2023 13:01:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg

2023-02-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19534 )

Change subject: IMPALA-11701 Part1: Don't push down predicates if already 
applied by Iceberg
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12487/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b
Gerrit-Change-Number: 19534
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 28 Feb 2023 12:55:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg

2023-02-28 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19534 )

Change subject: IMPALA-11701 Part1: Don't push down predicates if already 
applied by Iceberg
..


Patch Set 3:

(15 comments)

Thanks for the review, Daniel! I hope I addresses all of your comments.

http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2014
PS1, Line 2014: ort(formats);
> You could add a comment describing how this method should be used.
Sure. I also noticed that if the derived class return a multi-line string then 
the indentation would be off as I only shift the first line at L1970 when I 
append to 'detailPrefix'.


http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@64
PS1, Line 64:
> What is the relationship between 'conjuncts_', 'nonIdentityConjuncts_' and
'skippedConjuncts_' is already removed from 'conjuncts_' in the Iceberg 
planner. I think the comment on L56 is still valid after introducing this new 
member.


http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@64
PS1, Line 64:
> Should end with an underscore: 'skippedConjuncts_'.
Done


http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@114
PS1, Line 114: ther we have to push down 'impalaIc
> Should end with underscore.
Done


http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@114
PS1, Line 114: ther we have to push down 'impalaIc
> Could make it clearer that here we mean pushing predicates down to the scan
Done


http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@375
PS1, Line 375:   "Failed to load data files for Iceberg table: %s", 
getIceTable().getFullName()),
> Wouldn't it be possible that some (but not all) conjuncts can be skipped? C
Good obervation. It's possible that a subset of 'icebergPredicates_' could be 
omitted from pushing down in Impala. This is planned in a future patch as it's 
not 100% trivial how to match between an Iceberg predicate (returned by the 
residual() function call) and Impala's predicates. At first also it's worth 
running a benchmark to see if we gain noticeable performance with this.


http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@379
PS1, Line 379:   }
> Nit: if we forgot to call filterConjuncts() but called filterFileDescriptor
Well, in that case our Planner tests would start failing because some 
predicates would be present both in the 'predicates' and 'skipped Iceberg 
predicates' section of the explain output.
With this approach we either push down all the 'Iceberg' predicates to the 
scanners or we don't push any of them so introducing another member that would 
be identical to 'impalaIcebergPredicates_' seems overkill to me. In the second 
go where we might push down a subset of these predicates we might want to 
introduce what you suggest.
But I wouldn't worry about forgetting to call filterConjuncts() because the 
PlannerTests would catch it.


http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@1279
PS1, Line 1279: exercising
> Nit: excercising.
I googled it, and for me it seems that it's 'exercising' :)


http://gerrit.cloudera.org:8080/#/c/19534/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test:

http://gerrit.cloudera.org:8080/#/c/19534/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@1
PS1, Line 1: bo
> Nit: no need for 'in'.
Done


http://gerrit.cloudera.org:8080/#/c/19534/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@2
PS1, Line 2:  th
> Nit: out.
Done


http://gerrit.cloudera.org:8080/#/c/19534/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@52
PS1, Line 52:
> Did you mean 'event_time'?
Well, this test case is not what I intended. I re-

[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg

2023-02-28 Thread Gabor Kaszab (Code Review)
Hello Daniel Becker, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19534

to look at the new patch set (#3).

Change subject: IMPALA-11701 Part1: Don't push down predicates if already 
applied by Iceberg
..

IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg

We push down predicates to Iceberg that uses them to filter out files
when getting the results of planFiles(). Using the
FileScanTask.residual() function we can find out if we have to use
the predicates to further filter the rows of the given files or if
Iceberg has already performed all the filtering.
Basically if we only
filter on partition columns then Iceberg can filter the files based
on this and using these filter in Impala wouldn't filter any more
rows from the output (taking into account that no partition evolution)
was performed on the table.

An additional benefit of not down no-op predicates is that we can
potentially materialize less slots. For example:

SELECT count(1) from iceberg_tbl where part_col = 10;

In the above query Iceberg filters the files using the predicate on
a partition column and then there won't be any need to materialize
'part_col' in Impala, nor to push down the 'part_col = 10' predicate.

Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b
---
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
12 files changed, 307 insertions(+), 68 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/19534/3
--
To view, visit http://gerrit.cloudera.org:8080/19534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b
Gerrit-Change-Number: 19534
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg

2023-02-28 Thread Gabor Kaszab (Code Review)
Hello Daniel Becker, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/19534

to look at the new patch set (#2).

Change subject: IMPALA-11701 Part1: Don't push down predicates if already 
applied by Iceberg
..

IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg

We push down predicates to Iceberg that uses them to filter out files
when getting the results of planFiles(). Using the
FileScanTask.residual() function we can find out if we have to use
the predicates to further filter the rows of the given files or if
Iceberg has already performed all the filtering.
Basically if we only
filter on partition columns then Iceberg can filter the files based
on this and using these filter in Impala wouldn't filter any more
rows from the output (taking into account that no partition evolution)
was performed on the table.

An additional benefit of not down no-op predicates is that we can
potentially materialize less slots. For example:

SELECT count(1) from iceberg_tbl where part_col = 10;

In the above query Iceberg filters the files using the predicate on
a partition column and then there won't be any need to materialize
'part_col' in Impala, nor to push down the 'part_col = 10' predicate.

Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b
---
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
12 files changed, 305 insertions(+), 68 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/19534/2
--
To view, visit http://gerrit.cloudera.org:8080/19534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b
Gerrit-Change-Number: 19534
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg

2023-02-24 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19534 )

Change subject: IMPALA-11701 Part1: Don't push down predicates if already 
applied by Iceberg
..


Patch Set 1:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@16
PS1, Line 16: filter
Nit: filters.


http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@17
PS1, Line 17: taking into account
Wouldn't "assuming" be better?


http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@17
PS1, Line 17: )
Nit: closing parenthesis should come at the end of the sentence.


http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@20
PS1, Line 20: An additional benefit of not down no-op predicates
Am I right that here we mean pushing down predicates to the scanner, as opposed 
to pushing it down to Iceberg? We could make this clearer, perhaps also in the 
title.


http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@20
PS1, Line 20: not down
not pushing down


http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@28
PS1, Line 28:
We could include a Testing section describing what tests have been 
added/modified.


http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2014
PS1, Line 2014: getDerivedExplainString
You could add a comment describing how this method should be used.


http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@64
PS1, Line 64:   private List skippedConjuncts;
What is the relationship between 'conjuncts_', 'nonIdentityConjuncts_' and 
'skippedConjuncts_'? Is 'conjuncts_' the union of the other two? In this case 
'skippedConjuncts_' may not need to be stored (or taken in the constructor) 
separately.

Or, if the 'skippedConjuncts_' have already been removed from 'conjuncts_', the 
comment on L56 has become a bit misleading.


http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@64
PS1, Line 64: skippedConjuncts
Should end with an underscore: 'skippedConjuncts_'.


http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@114
PS1, Line 114: canSkipPushingDownIcebergPredicates
Should end with underscore.


http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@114
PS1, Line 114: canSkipPushingDownIcebergPredicates
Could make it clearer that here we mean pushing predicates down to the 
scanners, not to Iceberg.


http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@375
PS1, Line 375:   conjuncts_.removeAll(impalaIcebergPredicates_);
Wouldn't it be possible that some (but not all) conjuncts can be skipped? Can 
this approach be further refined (in a follow-up commit)?


http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@379
PS1, Line 379:   private List getSkippedConjuncts() {
Nit: if we forgot to call filterConjuncts() but called filterFileDescriptors(), 
it's possible that 'conjuncts_' would still contain the elements that 
getSkippedConjuncts() returns.
If we had a separate member for 'skippedConjuncts_' that would be returned by 
getSkippedConjuncts() and would be populated in filterConjuncts(), this anomaly 
would not be possible.


http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@1279
PS1, Line 1279: exercising
Nit: excercising.


http://gerrit.cloudera.org:8080/#/c/19534/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test:

http://gerrit.cloudera.org:8080/#/c/19534/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@1
PS1, Line 1: in
Nit: no need for 'in'.


http://gerrit.cloudera.org:8080/#/c/19534/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@2
PS1, Line 2: are
Nit: out.


http://gerrit.cloudera.org:8080/#/c/19534/1/t

[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg

2023-02-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19534 )

Change subject: IMPALA-11701 Part1: Don't push down predicates if already 
applied by Iceberg
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/12435/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/19534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b
Gerrit-Change-Number: 19534
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 24 Feb 2023 11:15:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg

2023-02-24 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/19534


Change subject: IMPALA-11701 Part1: Don't push down predicates if already 
applied by Iceberg
..

IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg

We push down predicates to Iceberg that uses them to filter out files
when getting the results of planFiles(). Using the
FileScanTask.residual() function we can find out if we have to use
the predicates to further filter the rows of the given files or if
Iceberg has already performed all the filtering.
Basically if we only
filter on partition columns then Iceberg can filter the files based
on this and using these filter in Impala wouldn't filter any more
rows from the output (taking into account that no partition evolution)
was performed on the table.

An additional benefit of not down no-op predicates is that we can
potentially materialize less slots. For example:

SELECT count(1) from iceberg_tbl where part_col = 10;

In the above query Iceberg filters the files using the predicate on
a partition column and then there won't be any need to materialize
'part_col' in Impala, nor to push down the 'part_col = 10' predicate.

Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b
---
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
12 files changed, 290 insertions(+), 65 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/19534/1
--
To view, visit http://gerrit.cloudera.org:8080/19534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b
Gerrit-Change-Number: 19534
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab