[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/19534 ) Change subject: IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg .. Patch Set 3: (5 comments) Left some minor comments, but really nice work! http://gerrit.cloudera.org:8080/#/c/19534/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19534/3//COMMIT_MSG@20 PS3, Line 20: down nit: pushing down? http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@241 PS3, Line 241: getSkippedConjuncts() I don't think we need this as this is the DELETE SCAN node which wouldn't evaluate any conjuncts anyway. http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@388 PS3, Line 388: new ArrayList() nit: could be 'Collections.emptyList()' http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/datasets/functional/functional_schema_template.sql@3265 PS3, Line 3265: TBLPROPERTIES('iceberg.catalog'='hadoop.catalog', : 'iceberg.catalog_location'='/test-warehouse/iceberg_test/hadoop_catalog', : 'iceberg.table_identifier'='functional_parquet.iceberg_partition_evolution'); Since we are writing the table via Impala we probably don't need to set these. http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test: http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@990 PS3, Line 990: event_time='2020-01-01 11:00:00'; iceberg_partitioned is partitioned by HOUR(event_time), how can we skip pushing it down? -- To view, visit http://gerrit.cloudera.org:8080/19534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b Gerrit-Change-Number: 19534 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 06 Mar 2023 17:06:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19534 ) Change subject: IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg .. Patch Set 3: (7 comments) Thanks Gabor. Could you also address the comments on the commit message? http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1390 PS3, Line 1390: is 'is' is not needed here? http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2027 PS3, Line 2027: explainLevel 'explainLevel' may not be the best name as it could be confused with the verbosity level of the explain query ('detailLevel' here) - we can use "set EXPLAIN_LEVEL=..." in the shell. 'indentPrefix' would be better. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java: http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@64 PS1, Line 64: > 'skippedConjuncts_' is already removed from 'conjuncts_' in the Iceberg pla The comment on 'nonIdentityConjuncts_' says that it is a subset of 'conjuncts_' that does not include the conjuncts pushed down to Iceberg - isn't the latter set the same as 'skippedConjuncts_'? That's why I thought that the union of 'nonIdentityConjuncts_' and 'skippedConjuncts_' would be the original value of 'conjuncts_'. But then after removing 'skippedConjuncts_' from 'conjuncts_', 'nonIdentityConjuncts_' is not a (proper) subset of 'conjuncts_' but the same as 'conjuncts_'. Is this not correct? http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@115 PS3, Line 115: have Nit: has http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@1279 PS1, Line 1279: exercising > I googled it, and for me it seems that it's 'exercising' :) You're right, sorry :) http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test: http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@49 PS3, Line 49: could be Could not be? http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test: http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test@51 PS3, Line 51: predicae Nit: predicate -- To view, visit http://gerrit.cloudera.org:8080/19534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b Gerrit-Change-Number: 19534 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 03 Mar 2023 13:28:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/19534 ) Change subject: IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg .. Patch Set 3: I've just realized that I forgot to address the comments on the commit message. Let me wait for further comments and will get back to them together. -- To view, visit http://gerrit.cloudera.org:8080/19534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b Gerrit-Change-Number: 19534 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 02 Mar 2023 12:41:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19534 ) Change subject: IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12488/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b Gerrit-Change-Number: 19534 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 28 Feb 2023 13:01:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19534 ) Change subject: IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12487/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b Gerrit-Change-Number: 19534 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 28 Feb 2023 12:55:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/19534 ) Change subject: IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg .. Patch Set 3: (15 comments) Thanks for the review, Daniel! I hope I addresses all of your comments. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2014 PS1, Line 2014: ort(formats); > You could add a comment describing how this method should be used. Sure. I also noticed that if the derived class return a multi-line string then the indentation would be off as I only shift the first line at L1970 when I append to 'detailPrefix'. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java: http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@64 PS1, Line 64: > What is the relationship between 'conjuncts_', 'nonIdentityConjuncts_' and 'skippedConjuncts_' is already removed from 'conjuncts_' in the Iceberg planner. I think the comment on L56 is still valid after introducing this new member. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@64 PS1, Line 64: > Should end with an underscore: 'skippedConjuncts_'. Done http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@114 PS1, Line 114: ther we have to push down 'impalaIc > Should end with underscore. Done http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@114 PS1, Line 114: ther we have to push down 'impalaIc > Could make it clearer that here we mean pushing predicates down to the scan Done http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@375 PS1, Line 375: "Failed to load data files for Iceberg table: %s", getIceTable().getFullName()), > Wouldn't it be possible that some (but not all) conjuncts can be skipped? C Good obervation. It's possible that a subset of 'icebergPredicates_' could be omitted from pushing down in Impala. This is planned in a future patch as it's not 100% trivial how to match between an Iceberg predicate (returned by the residual() function call) and Impala's predicates. At first also it's worth running a benchmark to see if we gain noticeable performance with this. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@379 PS1, Line 379: } > Nit: if we forgot to call filterConjuncts() but called filterFileDescriptor Well, in that case our Planner tests would start failing because some predicates would be present both in the 'predicates' and 'skipped Iceberg predicates' section of the explain output. With this approach we either push down all the 'Iceberg' predicates to the scanners or we don't push any of them so introducing another member that would be identical to 'impalaIcebergPredicates_' seems overkill to me. In the second go where we might push down a subset of these predicates we might want to introduce what you suggest. But I wouldn't worry about forgetting to call filterConjuncts() because the PlannerTests would catch it. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@1279 PS1, Line 1279: exercising > Nit: excercising. I googled it, and for me it seems that it's 'exercising' :) http://gerrit.cloudera.org:8080/#/c/19534/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test: http://gerrit.cloudera.org:8080/#/c/19534/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@1 PS1, Line 1: bo > Nit: no need for 'in'. Done http://gerrit.cloudera.org:8080/#/c/19534/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@2 PS1, Line 2: th > Nit: out. Done http://gerrit.cloudera.org:8080/#/c/19534/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@52 PS1, Line 52: > Did you mean 'event_time'? Well, this test case is not what I intended. I re-
[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg
Hello Daniel Becker, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19534 to look at the new patch set (#3). Change subject: IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg .. IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg We push down predicates to Iceberg that uses them to filter out files when getting the results of planFiles(). Using the FileScanTask.residual() function we can find out if we have to use the predicates to further filter the rows of the given files or if Iceberg has already performed all the filtering. Basically if we only filter on partition columns then Iceberg can filter the files based on this and using these filter in Impala wouldn't filter any more rows from the output (taking into account that no partition evolution) was performed on the table. An additional benefit of not down no-op predicates is that we can potentially materialize less slots. For example: SELECT count(1) from iceberg_tbl where part_col = 10; In the above query Iceberg filters the files using the predicate on a partition column and then there won't be any need to materialize 'part_col' in Impala, nor to push down the 'part_col = 10' predicate. Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b --- M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test 12 files changed, 307 insertions(+), 68 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/19534/3 -- To view, visit http://gerrit.cloudera.org:8080/19534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b Gerrit-Change-Number: 19534 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg
Hello Daniel Becker, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/19534 to look at the new patch set (#2). Change subject: IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg .. IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg We push down predicates to Iceberg that uses them to filter out files when getting the results of planFiles(). Using the FileScanTask.residual() function we can find out if we have to use the predicates to further filter the rows of the given files or if Iceberg has already performed all the filtering. Basically if we only filter on partition columns then Iceberg can filter the files based on this and using these filter in Impala wouldn't filter any more rows from the output (taking into account that no partition evolution) was performed on the table. An additional benefit of not down no-op predicates is that we can potentially materialize less slots. For example: SELECT count(1) from iceberg_tbl where part_col = 10; In the above query Iceberg filters the files using the predicate on a partition column and then there won't be any need to materialize 'part_col' in Impala, nor to push down the 'part_col = 10' predicate. Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b --- M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test 12 files changed, 305 insertions(+), 68 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/19534/2 -- To view, visit http://gerrit.cloudera.org:8080/19534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b Gerrit-Change-Number: 19534 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19534 ) Change subject: IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg .. Patch Set 1: (21 comments) http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@16 PS1, Line 16: filter Nit: filters. http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@17 PS1, Line 17: taking into account Wouldn't "assuming" be better? http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@17 PS1, Line 17: ) Nit: closing parenthesis should come at the end of the sentence. http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@20 PS1, Line 20: An additional benefit of not down no-op predicates Am I right that here we mean pushing down predicates to the scanner, as opposed to pushing it down to Iceberg? We could make this clearer, perhaps also in the title. http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@20 PS1, Line 20: not down not pushing down http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@28 PS1, Line 28: We could include a Testing section describing what tests have been added/modified. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2014 PS1, Line 2014: getDerivedExplainString You could add a comment describing how this method should be used. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java: http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@64 PS1, Line 64: private List skippedConjuncts; What is the relationship between 'conjuncts_', 'nonIdentityConjuncts_' and 'skippedConjuncts_'? Is 'conjuncts_' the union of the other two? In this case 'skippedConjuncts_' may not need to be stored (or taken in the constructor) separately. Or, if the 'skippedConjuncts_' have already been removed from 'conjuncts_', the comment on L56 has become a bit misleading. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@64 PS1, Line 64: skippedConjuncts Should end with an underscore: 'skippedConjuncts_'. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@114 PS1, Line 114: canSkipPushingDownIcebergPredicates Should end with underscore. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@114 PS1, Line 114: canSkipPushingDownIcebergPredicates Could make it clearer that here we mean pushing predicates down to the scanners, not to Iceberg. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@375 PS1, Line 375: conjuncts_.removeAll(impalaIcebergPredicates_); Wouldn't it be possible that some (but not all) conjuncts can be skipped? Can this approach be further refined (in a follow-up commit)? http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@379 PS1, Line 379: private List getSkippedConjuncts() { Nit: if we forgot to call filterConjuncts() but called filterFileDescriptors(), it's possible that 'conjuncts_' would still contain the elements that getSkippedConjuncts() returns. If we had a separate member for 'skippedConjuncts_' that would be returned by getSkippedConjuncts() and would be populated in filterConjuncts(), this anomaly would not be possible. http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@1279 PS1, Line 1279: exercising Nit: excercising. http://gerrit.cloudera.org:8080/#/c/19534/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test: http://gerrit.cloudera.org:8080/#/c/19534/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@1 PS1, Line 1: in Nit: no need for 'in'. http://gerrit.cloudera.org:8080/#/c/19534/1/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@2 PS1, Line 2: are Nit: out. http://gerrit.cloudera.org:8080/#/c/19534/1/t
[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19534 ) Change subject: IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/12435/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/19534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b Gerrit-Change-Number: 19534 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 24 Feb 2023 11:15:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg
Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19534 Change subject: IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg .. IMPALA-11701 Part1: Don't push down predicates if already applied by Iceberg We push down predicates to Iceberg that uses them to filter out files when getting the results of planFiles(). Using the FileScanTask.residual() function we can find out if we have to use the predicates to further filter the rows of the given files or if Iceberg has already performed all the filtering. Basically if we only filter on partition columns then Iceberg can filter the files based on this and using these filter in Impala wouldn't filter any more rows from the output (taking into account that no partition evolution) was performed on the table. An additional benefit of not down no-op predicates is that we can potentially materialize less slots. For example: SELECT count(1) from iceberg_tbl where part_col = 10; In the above query Iceberg filters the files using the predicate on a partition column and then there won't be any need to materialize 'part_col' in Impala, nor to push down the 'part_col = 10' predicate. Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b --- M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test 12 files changed, 290 insertions(+), 65 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/19534/1 -- To view, visit http://gerrit.cloudera.org:8080/19534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b Gerrit-Change-Number: 19534 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab