Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/19534 )
Change subject: IMPALA-11701 Part1: Don't push down predicates to scanner if already applied by Iceberg ...................................................................... Patch Set 6: (17 comments) http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@16 PS1, Line 16: t (ass > Nit: filters. Done http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@17 PS1, Line 17: ). > Wouldn't "assuming" be better? Done http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@17 PS1, Line 17: > Nit: closing parenthesis should come at the end of the sentence. Done http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@20 PS1, Line 20: scanner is that we can potentially materialize les > Am I right that here we mean pushing down predicates to the scanner, as opp Done http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@20 PS1, Line 20: tentiall > not pushing down Done http://gerrit.cloudera.org:8080/#/c/19534/1//COMMIT_MSG@28 PS1, Line 28: > We could include a Testing section describing what tests have been added/mo Done http://gerrit.cloudera.org:8080/#/c/19534/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19534/3//COMMIT_MSG@20 PS3, Line 20: iall > nit: pushing down? Done http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1390 PS3, Line 1390: to > 'is' is not needed here? Done http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2027 PS3, Line 2027: indentPrefix > 'explainLevel' may not be the best name as it could be confused with the ve Done http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java: http://gerrit.cloudera.org:8080/#/c/19534/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@64 PS1, Line 64: > The comment on 'nonIdentityConjuncts_' says that it is a subset of 'conjunc 'nonIdentityConjuncts_' is in fact a subset of 'conjuncts_' that doesn't include conjuncts on identity partitioned columns. 'skippedConjuncts_' indeed contains identity conjuncts but not necessarily all of them just the ones that won't filter any further rows. OTOH As I see 'nonIdentityConjuncts_' is used for cardinality estimates. However, I wonder if from this point on can we drop use 'conjuncts_' - 'skippedConjuncts_' for this purpose. @Zoltan, do you have a thought on this? http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@115 PS3, Line 115: has > Nit: has Done http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@241 PS3, Line 241: > I don't think we need this as this is the DELETE SCAN node which wouldn't e Done http://gerrit.cloudera.org:8080/#/c/19534/3/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@388 PS3, Line 388: : %s", getIceTable(). > nit: could be 'Collections.emptyList()' Done http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/datasets/functional/functional_schema_template.sql@3265 PS3, Line 3265: ---- DEPENDENT_LOAD : # We can use 'date_string_col' as it is once IMPALA-11954 is done. : INSERT INTO {db_name}{db_suffix}.iceberg_partition_evolution > Since we are writing the table via Impala we probably don't need to set the Indeed, done http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test: http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@49 PS3, Line 49: could no > Could not be? Indeed :) http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test: http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@990 PS3, Line 990: event_time='2020-01-01 11:00:00'; > iceberg_partitioned is partitioned by HOUR(event_time), how can we skip pus I had to check, but apparently we push down predicates of an HOUR() partition to Iceberg. Found this in the logs: "Push down the predicate: ref(name="event_time") == 1577872800000000 to iceberg" I'm wondering if this is because the partition value contains date+hour and not just hour. "event_time_hour=2020-01-01-08" In this particular test (due to some timezone mismatch) Iceberg filters out all the files using this pushed down filter, hence it shows up in "skipped Iceberg predicates". http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test: http://gerrit.cloudera.org:8080/#/c/19534/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test@51 PS3, Line 51: predicat > Nit: predicate Done -- To view, visit http://gerrit.cloudera.org:8080/19534 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b Gerrit-Change-Number: 19534 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Tue, 28 Mar 2023 15:49:20 +0000 Gerrit-HasComments: Yes