Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 )
Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates ...................................................................... Patch Set 4: (17 comments) http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@24 PS4, Line 24: residual expression backtracking This 'residual expression backtracking' sounds pretty mysterious. Might be better this way: 'Iceberg residual expression handling' http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@37 PS4, Line 37: If the expression backtracking 'If Impala is unable to match all residual expression to Impala conjuncts then all the conjunct are pushed dow to Impala scanner' ? http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@53 PS4, Line 53: iceberg_predicate_subsetting I think it's in fact 'iceberg_predicate_pushdown_subsetting' http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@57 PS4, Line 57: Tests: Did you have the chance to make some perf measurements when you start increasing the number of predicates so that we can be sure converting Iceberg preds back to Impala preds doesn't get too expensive with scale? http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/analysis/PredicateVisitor.java File fe/src/main/java/org/apache/impala/analysis/PredicateVisitor.java: http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/analysis/PredicateVisitor.java@33 PS4, Line 33: PredicateVisitor >From Impala's perspective this class is only a Visitor for Iceberg >predicates/expressions, right? IcebergPredicateVisitor would be a more >suitable name, then. But being even more precise, this class is actually a collector of Iceberg predicates (using the visitor pattern). So something like IcebergPredicateCollector or IcebergExpressionCollector would be even more precise in my opinion. http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@118 PS4, Line 118: translatedExpressions_ I think something like 'impalaIcebergPredicateMapping' would be more verbose. http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@375 PS4, Line 375: IcebergUtil.planFiles(getIceTable(), indentation is off with these 2 lines. http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@418 PS4, Line 418: conjuncts_.removeAll(translatedExpressions_.values()); Might be personal preference, but if you returned here then you wouldn't need to put the rest of the function in an else block. http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@421 PS4, Line 421: return; Doesn't this fit into the line above? http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@430 PS4, Line 430: if (expr == null) { Could you please add some comment here for more clarity, like "If we fail to translate any of the Iceberg residual expressions then we skip filtering the predicates to be pushed down to Impala scanner." http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@431 PS4, Line 431: return; merge into the line above http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@437 PS4, Line 437: expressionsToRetain.forEach(skippedExpressions_::remove); I think something the other way around would be more readable if feasible. Like skippedExpressions_.stream().filter(some condition) ? http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test: http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test@1 PS4, Line 1: # If one or more non-partition predicates are present, then all predicates are pushed down Please mention in this comment that the pred subsetting feature is off. I know this info is in the file name but you can't emphasise this enough. http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test: http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@77 PS4, Line 77: ==== I think this iceberg-predicates.test file is the place where we should test this feature more heavily. e.g. include the use case I mentioned in another test file where some of the residuals can't be matched to Impala predicates so we fallback to push down everything. Also some further combinations of predicates, and also predicates vs partition transformed table would be nice here. You can go wild and come up with all kind of tricky predicates to see how this feature handle them. http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test: http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@1005 PS4, Line 1005: user = null user is null ? http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@1025 PS4, Line 1025: untranslated What is an 'untranslated expression' and it's not 100% clear for me why 'action = null' is untranslated. http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@1035 PS4, Line 1035: ==== Would be nice to have a test case where there are residual expression but not all of them could be matched to an Impala predicate so we backtrack to push down all the predicates to Impala. -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 4 Gerrit-Owner: Peter Rozsa <pro...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Peter Rozsa <pro...@cloudera.com> Gerrit-Comment-Date: Tue, 01 Aug 2023 12:06:07 +0000 Gerrit-HasComments: Yes