[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg .. IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg This change adds a predicate filtering mechanism at planning time that locates Impala's predicates in the residual expressions from Iceberg planning. By locating all residual expressions, the remainder expression set can be calculated. The current implementation is an all-or-nothing filter, if 'planFiles()' (Iceberg API) returns no residual expression, then all Impala predicates can be skipped, if there's any residual expression, every Impala predicate is pushed down to the Impala scanner. Residual expressions are the remaining filter expressions after the pushdown of predicates into the Iceberg table scan. By locating the remainder expression, we can reduce the number of predicates that will be pushed down to the Impala scanner. After this change, the Iceberg residual expression handling is improved by locating the simple conjuncts in the residual expression and mapping back them to Impala conjuncts. For example, if the list of Impala conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"' and 'col_i' happens to be a partition column in the Iceberg table definition and Iceberg table scan can eliminate the expression, the residual expression will be 'col_s = "a"'. This expression can be mapped back as an Impala predicate, and any other expression can be removed from the effective Impala conjunct list, and pushed down to the scanner, skipping the unnecessary filtering of 'col_i'. If there's no residual expression, the behavior is the same as before, all predicate pushdown is skipped. If Impala is unable to match all residual expression to Impala conjuncts then all the conjunct are pushed dow to Impala scanner. This change offers the advantage of not pushing down already evaluated filters to the Impala scanner nodes, resulting in enhanced scanning performance. Additionally, if the filter expression affects columns that are unnecessary for the final result and can be filtered out during Iceberg's table scan, it leads to a reduced row size, thereby optimizing data retrieval and improving overall query efficiency. This solution is limited to cases where Impala's expression list contains only conjuncts, compound expressions are not supported, because partial elimination of compounds would involve expression rewrites in the Impala expression. A new query option is added: iceberg_predicate_pushdown_subsetting. The query option's default value is true. It can be turned off by setting it to false. Performance of the predicate location is measured on two edge cases: - 1000 expression, 999 skipped: on avreage 2 ms - 1000 expression, 1 skipped: on average 25 ms Tests: - planner test cases added for disabled mode - existing planner test cases adjusted - core tests passed Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Reviewed-on: http://gerrit.cloudera.org:8080/20133 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test 12 files changed, 370 insertions(+), 72 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 13 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg .. Patch Set 12: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 12 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 28 Sep 2023 14:16:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 11 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 28 Sep 2023 09:09:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg .. Patch Set 12: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 12 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 28 Sep 2023 09:09:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg .. Patch Set 12: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9772/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 12 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 28 Sep 2023 09:09:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14100/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 11 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 28 Sep 2023 08:41:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg
Peter Rozsa has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg .. IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg This change adds a predicate filtering mechanism at planning time that locates Impala's predicates in the residual expressions from Iceberg planning. By locating all residual expressions, the remainder expression set can be calculated. The current implementation is an all-or-nothing filter, if 'planFiles()' (Iceberg API) returns no residual expression, then all Impala predicates can be skipped, if there's any residual expression, every Impala predicate is pushed down to the Impala scanner. Residual expressions are the remaining filter expressions after the pushdown of predicates into the Iceberg table scan. By locating the remainder expression, we can reduce the number of predicates that will be pushed down to the Impala scanner. After this change, the Iceberg residual expression handling is improved by locating the simple conjuncts in the residual expression and mapping back them to Impala conjuncts. For example, if the list of Impala conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"' and 'col_i' happens to be a partition column in the Iceberg table definition and Iceberg table scan can eliminate the expression, the residual expression will be 'col_s = "a"'. This expression can be mapped back as an Impala predicate, and any other expression can be removed from the effective Impala conjunct list, and pushed down to the scanner, skipping the unnecessary filtering of 'col_i'. If there's no residual expression, the behavior is the same as before, all predicate pushdown is skipped. If Impala is unable to match all residual expression to Impala conjuncts then all the conjunct are pushed dow to Impala scanner. This change offers the advantage of not pushing down already evaluated filters to the Impala scanner nodes, resulting in enhanced scanning performance. Additionally, if the filter expression affects columns that are unnecessary for the final result and can be filtered out during Iceberg's table scan, it leads to a reduced row size, thereby optimizing data retrieval and improving overall query efficiency. This solution is limited to cases where Impala's expression list contains only conjuncts, compound expressions are not supported, because partial elimination of compounds would involve expression rewrites in the Impala expression. A new query option is added: iceberg_predicate_pushdown_subsetting. The query option's default value is true. It can be turned off by setting it to false. Performance of the predicate location is measured on two edge cases: - 1000 expression, 999 skipped: on avreage 2 ms - 1000 expression, 1 skipped: on average 25 ms Tests: - planner test cases added for disabled mode - existing planner test cases adjusted - core tests passed Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test 12 files changed, 370 insertions(+), 72 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/11 -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 11 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg .. Patch Set 10: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9750/ -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 10 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Fri, 22 Sep 2023 11:22:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9750/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 10 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Fri, 22 Sep 2023 07:10:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 10 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 21 Sep 2023 14:36:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/14017/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 10 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Mon, 18 Sep 2023 09:10:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg
Peter Rozsa has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg .. IMPALA-12089: Be able to skip pushing down a subset of the predicates in Iceberg This change adds a predicate filtering mechanism at planning time that locates Impala's predicates in the residual expressions from Iceberg planning. By locating all residual expressions, the remainder expression set can be calculated. The current implementation is an all-or-nothing filter, if 'planFiles()' (Iceberg API) returns no residual expression, then all Impala predicates can be skipped, if there's any residual expression, every Impala predicate is pushed down to the Impala scanner. Residual expressions are the remaining filter expressions after the pushdown of predicates into the Iceberg table scan. By locating the remainder expression, we can reduce the number of predicates that will be pushed down to the Impala scanner. After this change, the Iceberg residual expression handling is improved by locating the simple conjuncts in the residual expression and mapping back them to Impala conjuncts. For example, if the list of Impala conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"' and 'col_i' happens to be a partition column in the Iceberg table definition and Iceberg table scan can eliminate the expression, the residual expression will be 'col_s = "a"'. This expression can be mapped back as an Impala predicate, and any other expression can be removed from the effective Impala conjunct list, and pushed down to the scanner, skipping the unnecessary filtering of 'col_i'. If there's no residual expression, the behavior is the same as before, all predicate pushdown is skipped. If Impala is unable to match all residual expression to Impala conjuncts then all the conjunct are pushed dow to Impala scanner. This change offers the advantage of not pushing down already evaluated filters to the Impala scanner nodes, resulting in enhanced scanning performance. Additionally, if the filter expression affects columns that are unnecessary for the final result and can be filtered out during Iceberg's table scan, it leads to a reduced row size, thereby optimizing data retrieval and improving overall query efficiency. This solution is limited to cases where Impala's expression list contains only conjuncts, compound expressions are not supported, because partial elimination of compounds would involve expression rewrites in the Impala expression. A new query option is added: iceberg_predicate_pushdown_subsetting. The query option's default value is true. It can be turned off by setting it to false. Performance of the predicate location is measured on two edge cases: - 1000 expression, 999 skipped: on avreage 2 ms - 1000 expression, 1 skipped: on average 25 ms Tests: - planner test cases added for disabled mode - existing planner test cases adjusted - core tests passed Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test 12 files changed, 369 insertions(+), 72 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/10 -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 10 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 9: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9660/ -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 9 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 31 Aug 2023 11:50:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9660/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 9 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 31 Aug 2023 07:38:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 9 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Wed, 30 Aug 2023 10:42:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13882/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 9 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Wed, 30 Aug 2023 08:22:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Peter Rozsa has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 9: (2 comments) Thank you, Gabor! http://gerrit.cloudera.org:8080/#/c/20133/8/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/20133/8/common/thrift/ImpalaService.thrift@846 PS8, Line 846: the sca > nit: typo Done http://gerrit.cloudera.org:8080/#/c/20133/8/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/20133/8/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@426 PS8, Line 426: > I'd also push down this logging into the newly introduced function to reduc Done -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 9 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Wed, 30 Aug 2023 07:56:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Peter Rozsa has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. IMPALA-12089: Be able to skip pushing down a subset of the predicates This change adds a predicate filtering mechanism at planning time that locates Impala's predicates in the residual expressions from Iceberg planning. By locating all residual expressions, the remainder expression set can be calculated. The current implementation is an all-or-nothing filter, if 'planFiles()' (Iceberg API) returns no residual expression, then all Impala predicates can be skipped, if there's any residual expression, every Impala predicate is pushed down to the Impala scanner. Residual expressions are the remaining filter expressions after the pushdown of predicates into the Iceberg table scan. By locating the remainder expression, we can reduce the number of predicates that will be pushed down to the Impala scanner. After this change, the Iceberg residual expression handling is improved by locating the simple conjuncts in the residual expression and mapping back them to Impala conjuncts. For example, if the list of Impala conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"' and 'col_i' happens to be a partition column in the Iceberg table definition and Iceberg table scan can eliminate the expression, the residual expression will be 'col_s = "a"'. This expression can be mapped back as an Impala predicate, and any other expression can be removed from the effective Impala conjunct list, and pushed down to the scanner, skipping the unnecessary filtering of 'col_i'. If there's no residual expression, the behavior is the same as before, all predicate pushdown is skipped. If Impala is unable to match all residual expression to Impala conjuncts then all the conjunct are pushed dow to Impala scanner. This change offers the advantage of not pushing down already evaluated filters to the Impala scanner nodes, resulting in enhanced scanning performance. Additionally, if the filter expression affects columns that are unnecessary for the final result and can be filtered out during Iceberg's table scan, it leads to a reduced row size, thereby optimizing data retrieval and improving overall query efficiency. This solution is limited to cases where Impala's expression list contains only conjuncts, compound expressions are not supported, because partial elimination of compounds would involve expression rewrites in the Impala expression. A new query option is added: iceberg_predicate_pushdown_subsetting. The query option's default value is true. It can be turned off by setting it to false. Performance of the predicate location is measured on two edge cases: - 1000 expression, 999 skipped: on avreage 2 ms - 1000 expression, 1 skipped: on average 25 ms Tests: - planner test cases added for disabled mode - existing planner test cases adjusted - core tests passed Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test 12 files changed, 371 insertions(+), 72 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/9 -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 9 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 8: Code-Review+1 (2 comments) Just some minor nits, looks good in overall. http://gerrit.cloudera.org:8080/#/c/20133/8/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/20133/8/common/thrift/ImpalaService.thrift@846 PS8, Line 846: the the nit: typo http://gerrit.cloudera.org:8080/#/c/20133/8/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/20133/8/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@426 PS8, Line 426: LOG.debug("Iceberg predicate pushdown subsetting took {} ms", I'd also push down this logging into the newly introduced function to reduce the noise from this level. -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 8 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Fri, 25 Aug 2023 13:23:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13833/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 8 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 24 Aug 2023 14:54:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Hello Gabor Kaszab, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20133 to look at the new patch set (#8). Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. IMPALA-12089: Be able to skip pushing down a subset of the predicates This change adds a predicate filtering mechanism at planning time that locates Impala's predicates in the residual expressions from Iceberg planning. By locating all residual expressions, the remainder expression set can be calculated. The current implementation is an all-or-nothing filter, if 'planFiles()' (Iceberg API) returns no residual expression, then all Impala predicates can be skipped, if there's any residual expression, every Impala predicate is pushed down to the Impala scanner. Residual expressions are the remaining filter expressions after the pushdown of predicates into the Iceberg table scan. By locating the remainder expression, we can reduce the number of predicates that will be pushed down to the Impala scanner. After this change, the Iceberg residual expression handling is improved by locating the simple conjuncts in the residual expression and mapping back them to Impala conjuncts. For example, if the list of Impala conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"' and 'col_i' happens to be a partition column in the Iceberg table definition and Iceberg table scan can eliminate the expression, the residual expression will be 'col_s = "a"'. This expression can be mapped back as an Impala predicate, and any other expression can be removed from the effective Impala conjunct list, and pushed down to the scanner, skipping the unnecessary filtering of 'col_i'. If there's no residual expression, the behavior is the same as before, all predicate pushdown is skipped. If Impala is unable to match all residual expression to Impala conjuncts then all the conjunct are pushed dow to Impala scanner. This change offers the advantage of not pushing down already evaluated filters to the Impala scanner nodes, resulting in enhanced scanning performance. Additionally, if the filter expression affects columns that are unnecessary for the final result and can be filtered out during Iceberg's table scan, it leads to a reduced row size, thereby optimizing data retrieval and improving overall query efficiency. This solution is limited to cases where Impala's expression list contains only conjuncts, compound expressions are not supported, because partial elimination of compounds would involve expression rewrites in the Impala expression. A new query option is added: iceberg_predicate_pushdown_subsetting. The query option's default value is true. It can be turned off by setting it to false. Performance of the predicate location is measured on two edge cases: - 1000 expression, 999 skipped: on avreage 2 ms - 1000 expression, 1 skipped: on average 25 ms Tests: - planner test cases added for disabled mode - existing planner test cases adjusted - core tests passed Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test 12 files changed, 372 insertions(+), 72 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/8 -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 8 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13832/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 7 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 24 Aug 2023 12:34:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Peter Rozsa has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. IMPALA-12089: Be able to skip pushing down a subset of the predicates This change adds a predicate filtering mechanism at planning time that locates Impala's predicates in the residual expressions from Iceberg planning. By locating all residual expressions, the remainder expression set can be calculated. The current implementation is an all-or-nothing filter, if 'planFiles()' (Iceberg API) returns no residual expression, then all Impala predicates can be skipped, if there's any residual expression, every Impala predicate is pushed down to the Impala scanner. Residual expressions are the remaining filter expressions after the pushdown of predicates into the Iceberg table scan. By locating the remainder expression, we can reduce the number of predicates that will be pushed down to the Impala scanner. After this change, the Iceberg residual expression handling is improved by locating the simple conjuncts in the residual expression and mapping back them to Impala conjuncts. For example, if the list of Impala conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"' and 'col_i' happens to be a partition column in the Iceberg table definition and Iceberg table scan can eliminate the expression, the residual expression will be 'col_s = "a"'. This expression can be mapped back as an Impala predicate, and any other expression can be removed from the effective Impala conjunct list, and pushed down to the scanner, skipping the unnecessary filtering of 'col_i'. If there's no residual expression, the behavior is the same as before, all predicate pushdown is skipped. If Impala is unable to match all residual expression to Impala conjuncts then all the conjunct are pushed dow to Impala scanner. This change offers the advantage of not pushing down already evaluated filters to the Impala scanner nodes, resulting in enhanced scanning performance. Additionally, if the filter expression affects columns that are unnecessary for the final result and can be filtered out during Iceberg's table scan, it leads to a reduced row size, thereby optimizing data retrieval and improving overall query efficiency. This solution is limited to cases where Impala's expression list contains only conjuncts, compound expressions are not supported, because partial elimination of compounds would involve expression rewrites in the Impala expression. A new query option is added: iceberg_predicate_pushdown_subsetting. The query option's default value is true. It can be turned off by setting it to false. Performance of the predicate location is measured on two edge cases: - 1000 expression, 999 skipped: on avreage 2 ms - 1000 expression, 1 skipped: on average 25 ms Tests: - planner test cases added for disabled mode - existing planner test cases adjusted - core tests passed Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test 12 files changed, 373 insertions(+), 72 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/7 -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 7 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/20133/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20133/6//COMMIT_MSG@56 PS6, Line 56: Performance of the predicate location is measured on two edge cases: Just curious: In terms of the # of slots being materialised, was this a select * query or did skipping the predicates also reduce the number of materialised slots too? But anyway, quite impressive numbers! http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@421 PS5, Line 421: return; nit: curly brackets aren't needed around the return statement. http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@422 PS5, Line 422: } I'm wondering if putting the rest of the code in this function into another function would increase readability. like trySubsettingPredicatesBeingPushedDown(). There might be a better and shorter name for that function, though. http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@429 PS5, Line 429: for (Expression located : locatedExpressions) { nit: linebreak not necessary here http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@433 PS5, Line 433: filtering the predicates to be pushed down to Impala scanner. nit: curly brackets not needed. http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test: http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@1005 PS4, Line 1005: further ro > This is an example for a query that produces untranslated expression. what I meant is that instead of comparing for equality, there is n "IS NULL" expression that could be tried out in my opinion. http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@1025 PS4, Line 1025: ned_position > The Iceberg expression conversion can't convert binary predicates with NULL I get that but for future readers this might not be that obvious. -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 6 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Wed, 23 Aug 2023 09:18:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13801/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 6 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 22 Aug 2023 09:35:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Peter Rozsa has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. IMPALA-12089: Be able to skip pushing down a subset of the predicates This change adds a predicate filtering mechanism at planning time that locates Impala's predicates in the residual expressions from Iceberg planning. By locating all residual expressions, the remainder expression set can be calculated. The current implementation is an all-or-nothing filter, if 'planFiles()' (Iceberg API) returns no residual expression, then all Impala predicates can be skipped, if there's any residual expression, every Impala predicate is pushed down to the Impala scanner. Residual expressions are the remaining filter expressions after the pushdown of predicates into the Iceberg table scan. By locating the remainder expression, we can reduce the number of predicates that will be pushed down to the Impala scanner. After this change, the Iceberg residual expression handling is improved by locating the simple conjuncts in the residual expression and mapping back them to Impala conjuncts. For example, if the list of Impala conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"' and 'col_i' happens to be a partition column in the Iceberg table definition and Iceberg table scan can eliminate the expression, the residual expression will be 'col_s = "a"'. This expression can be mapped back as an Impala predicate, and any other expression can be removed from the effective Impala conjunct list, and pushed down to the scanner, skipping the unnecessary filtering of 'col_i'. If there's no residual expression, the behavior is the same as before, all predicate pushdown is skipped. If Impala is unable to match all residual expression to Impala conjuncts then all the conjunct are pushed dow to Impala scanner. This change offers the advantage of not pushing down already evaluated filters to the Impala scanner nodes, resulting in enhanced scanning performance. Additionally, if the filter expression affects columns that are unnecessary for the final result and can be filtered out during Iceberg's table scan, it leads to a reduced row size, thereby optimizing data retrieval and improving overall query efficiency. This solution is limited to cases where Impala's expression list contains only conjuncts, compound expressions are not supported, because partial elimination of compounds would involve expression rewrites in the Impala expression. A new query option is added: iceberg_predicate_pushdown_subsetting. The query option's default value is true. It can be turned off by setting it to false. Performance of the predicate location is measured on two edge cases: - 1000 expression, 999 skipped: on avreage 2 ms - 1000 expression, 1 skipped: on average 25 ms Tests: - planner test cases added for disabled mode - existing planner test cases adjusted - core tests passed Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test 12 files changed, 355 insertions(+), 72 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/6 -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 6 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Peter Rozsa has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@118 PS5, Line 118: private final BiMap impalaIcebergPredicateMapping_ = > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@173 PS5, Line 173: private boolean needIcebergForPlanning() { > line too long (92 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 6 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Tue, 22 Aug 2023 09:09:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13713/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 5 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Fri, 04 Aug 2023 09:35:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Peter Rozsa has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 5: (17 comments) Thank you for the review Gabor! http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@24 PS4, Line 24: Iceberg residual expression hand > This 'residual expression backtracking' sounds pretty mysterious. Might be Done http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@37 PS4, Line 37: If Impala is unable to match a > 'If Impala is unable to match all residual expression to Impala conjuncts t Done http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@53 PS4, Line 53: is true. It can be turned of > I think it's in fact 'iceberg_predicate_pushdown_subsetting' Done http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@57 PS4, Line 57: - 1000 expression, 999 skipped: on avreage 2 ms > Did you have the chance to make some perf measurements when you start incre Done http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/analysis/PredicateVisitor.java File fe/src/main/java/org/apache/impala/analysis/PredicateVisitor.java: http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/analysis/PredicateVisitor.java@33 PS4, Line 33: > From Impala's perspective this class is only a Visitor for Iceberg predicat Renamed to IcebergExpressionCollector http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@118 PS4, Line 118: impalaIcebergPredicate > I think something like 'impalaIcebergPredicateMapping' would be more verbos Done http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@375 PS4, Line 375: IcebergUtil.planFiles(getIceTable(), > indentation is off with these 2 lines. Done http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@418 PS4, Line 418: conjuncts_.removeAll(impalaIcebergPredicateMapping_.values()); > Might be personal preference, but if you returned here then you wouldn't ne Done http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@421 PS4, Line 421: if (!analyzer_.getQueryOptions().iceberg_predicate_pushdown_subsetting) { return; } > Doesn't this fit into the line above? Done http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@430 PS4, Line 430: If we fail to locate any of the Iceberg residual expressions then we skip > Could you please add some comment here for more clarity, like "If we fail t Done http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@431 PS4, Line 431: filtering the predicates to be pushed down to Impala scanner. > merge into the line above Done http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@437 PS4, Line 437: skippedExpressions_.addAll( > I think something the other way around would be more readable if feasible. Done http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test: http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test@1 PS4, Line 1: # If one or more non-partition predicates are present, then all predicates are pushed down > Please mention in this comment that the pred subsetting feature is off. I k Done http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test: http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@77 PS4, Line 77: > I think this iceberg-predicates.test file is the place where we should test Newly added test cases moved to this file http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test: http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test@1005 PS4, Line 1005:
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Peter Rozsa has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. IMPALA-12089: Be able to skip pushing down a subset of the predicates This change adds a predicate filtering mechanism at planning time that locates Impala's predicates in the residual expressions from Iceberg planning. By locating all residual expressions, the remainder expression set can be calculated. The current implementation is an all-or-nothing filter, if 'planFiles()' (Iceberg API) returns no residual expression, then all Impala predicates can be skipped, if there's any residual expression, every Impala predicate is pushed down to the Impala scanner. Residual expressions are the remaining filter expressions after the pushdown of predicates into the Iceberg table scan. By locating the remainder expression, we can reduce the number of predicates that will be pushed down to the Impala scanner. After this change, the Iceberg residual expression handling is improved by locating the simple conjuncts in the residual expression and mapping back them to Impala conjuncts. For example, if the list of Impala conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"' and 'col_i' happens to be a partition column in the Iceberg table definition and Iceberg table scan can eliminate the expression, the residual expression will be 'col_s = "a"'. This expression can be mapped back as an Impala predicate, and any other expression can be removed from the effective Impala conjunct list, and pushed down to the scanner, skipping the unnecessary filtering of 'col_i'. If there's no residual expression, the behavior is the same as before, all predicate pushdown is skipped. If Impala is unable to match all residual expression to Impala conjuncts then all the conjunct are pushed dow to Impala scanner. This change offers the advantage of not pushing down already evaluated filters to the Impala scanner nodes, resulting in enhanced scanning performance. Additionally, if the filter expression affects columns that are unnecessary for the final result and can be filtered out during Iceberg's table scan, it leads to a reduced row size, thereby optimizing data retrieval and improving overall query efficiency. This solution is limited to cases where Impala's expression list contains only conjuncts, compound expressions are not supported, because partial elimination of compounds would involve expression rewrites in the Impala expression. A new query option is added: iceberg_predicate_pushdown_subsetting. The query option's default value is true. It can be turned off by setting it to false. Performance of the predicate location is measured on two edge cases: - 1000 expression, 999 skipped: on avreage 2 ms - 1000 expression, 1 skipped: on average 25 ms Tests: - planner test cases added for disabled mode - existing planner test cases adjusted - core tests passed Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test 12 files changed, 353 insertions(+), 72 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/5 -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 5 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@118 PS5, Line 118: private final BiMap impalaIcebergPredicateMapping_ = HashBiMap.create(); line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/20133/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@173 PS5, Line 173: return !impalaIcebergPredicateMapping_.isEmpty() || tblRef_.getTimeTravelSpec() != null; line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 5 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Fri, 04 Aug 2023 09:13:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 4: (17 comments) http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@24 PS4, Line 24: residual expression backtracking This 'residual expression backtracking' sounds pretty mysterious. Might be better this way: 'Iceberg residual expression handling' http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@37 PS4, Line 37: If the expression backtracking 'If Impala is unable to match all residual expression to Impala conjuncts then all the conjunct are pushed dow to Impala scanner' ? http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@53 PS4, Line 53: iceberg_predicate_subsetting I think it's in fact 'iceberg_predicate_pushdown_subsetting' http://gerrit.cloudera.org:8080/#/c/20133/4//COMMIT_MSG@57 PS4, Line 57: Tests: Did you have the chance to make some perf measurements when you start increasing the number of predicates so that we can be sure converting Iceberg preds back to Impala preds doesn't get too expensive with scale? http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/analysis/PredicateVisitor.java File fe/src/main/java/org/apache/impala/analysis/PredicateVisitor.java: http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/analysis/PredicateVisitor.java@33 PS4, Line 33: PredicateVisitor >From Impala's perspective this class is only a Visitor for Iceberg >predicates/expressions, right? IcebergPredicateVisitor would be a more >suitable name, then. But being even more precise, this class is actually a collector of Iceberg predicates (using the visitor pattern). So something like IcebergPredicateCollector or IcebergExpressionCollector would be even more precise in my opinion. http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@118 PS4, Line 118: translatedExpressions_ I think something like 'impalaIcebergPredicateMapping' would be more verbose. http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@375 PS4, Line 375: IcebergUtil.planFiles(getIceTable(), indentation is off with these 2 lines. http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@418 PS4, Line 418: conjuncts_.removeAll(translatedExpressions_.values()); Might be personal preference, but if you returned here then you wouldn't need to put the rest of the function in an else block. http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@421 PS4, Line 421: return; Doesn't this fit into the line above? http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@430 PS4, Line 430: if (expr == null) { Could you please add some comment here for more clarity, like "If we fail to translate any of the Iceberg residual expressions then we skip filtering the predicates to be pushed down to Impala scanner." http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@431 PS4, Line 431: return; merge into the line above http://gerrit.cloudera.org:8080/#/c/20133/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@437 PS4, Line 437: expressionsToRetain.forEach(skippedExpressions_::remove); I think something the other way around would be more readable if feasible. Like skippedExpressions_.stream().filter(some condition) ? http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test: http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test@1 PS4, Line 1: # If one or more non-partition predicates are present, then all predicates are pushed down Please mention in this comment that the pred subsetting feature is off. I know this info is in the file name but you can't emphasise this enough. http://gerrit.cloudera.org:8080/#/c/20133/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test:
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13580/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 4 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Wed, 19 Jul 2023 12:07:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Peter Rozsa has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. IMPALA-12089: Be able to skip pushing down a subset of the predicates This change adds a predicate filtering mechanism at planning time that locates Impala's predicates in the residual expressions from Iceberg planning. By locating all residual expressions, the remainder expression set can be calculated. The current implementation is an all-or-nothing filter, if 'planFiles()' (Iceberg API) returns no residual expression, then all Impala predicates can be skipped, if there's any residual expression, every Impala predicate is pushed down to the Impala scanner. Residual expressions are the remaining filter expressions after the pushdown of predicates into the Iceberg table scan. By locating the remainder expression, we can reduce the number of predicates that will be pushed down to the Impala scanner. After this change, the residual expression backtracking is improved by locating the simple conjuncts in the residual expression and mapping back them to Impala conjuncts. For example, if the list of Impala conjuncts consists of two predicates 'col_i != 100' and 'col_s = "a"' and 'col_i' happens to be a partition column in the Iceberg table definition and Iceberg table scan can eliminate the expression, the residual expression will be 'col_s = "a"'. This expression can be mapped back as an Impala predicate, and any other expression can be removed from the effective Impala conjunct list, and pushed down to the scanner, skipping the unnecessary filtering of 'col_i'. If there's no residual expression, the behavior is the same as before, all predicate pushdown is skipped. If the expression backtracking can't match all residual expressions in the Impala conjunct list, then the filtering resorts to the original all-or-nothing approach. This change offers the advantage of not pushing down already evaluated filters to the Impala scanner nodes, resulting in enhanced scanning performance. Additionally, if the filter expression affects columns that are unnecessary for the final result and can be filtered out during Iceberg's table scan, it leads to a reduced row size, thereby optimizing data retrieval and improving overall query efficiency. This solution is limited to cases where Impala's expression list contains only conjuncts, compound expressions are not supported, because partial elimination of compounds would involve expression rewrites in the Impala expression. A new query option is added: iceberg_predicate_subsetting. The query option's default value is true. It can be turned off by setting it to false. Tests: - planner test cases added for disabled mode - existing planner test cases adjusted - core tests passed Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift A fe/src/main/java/org/apache/impala/analysis/PredicateVisitor.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test 12 files changed, 304 insertions(+), 72 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/4 -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 4 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 3: (30 comments) Thanks for your work, Peter! http://gerrit.cloudera.org:8080/#/c/20133/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20133/3//COMMIT_MSG@9 PS3, Line 9: This change adds a predicate filtering mechanism at planning time that For me, this paragraph was a bit difficult to read. I can imagine that someone with no background knowledge on this optimization would find it hard to understand what this patch does based on this explanation. I'd split this into multiple paragraphs for more readability/ eg.: - what is the current behaviour (all-or-nothing predicate filtering). - What we mean be residual (preds returned from Iceberg after a planFiles() call). Also emphasize that there is pred pushdown in Iceberg and in Impala too and here we optimize the pushdown in Impala. - How we want to achieve this (also a bit more structured description would be nice). - What we gain here. http://gerrit.cloudera.org:8080/#/c/20133/3//COMMIT_MSG@20 PS3, Line 20: compound : expressions created by Impala example pls http://gerrit.cloudera.org:8080/#/c/20133/3//COMMIT_MSG@19 PS3, Line 19: The visitor implementation : handles IN, NULL and binary expressions, and does not handle compound : expressions created by Impala, and aggregates, startsWith, isNan : expressions by Iceberg. Is there a reason for not covering these? due to difficulty in implementation? Are there plans to add this support later on? http://gerrit.cloudera.org:8080/#/c/20133/3//COMMIT_MSG@30 PS3, Line 30: - planner test cases added for disabled mode I see also coverage for the 'enabled' mode in the existing tests. http://gerrit.cloudera.org:8080/#/c/20133/3/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/20133/3/common/thrift/ImpalaService.thrift@800 PS3, Line 800: ICEBERG_PREDICATE_SUBSETTING = 158; Wondering if ICEBERG_PREDICATE_PUSHDOWN_SUBSETTING would be an even more self-explanatory name. Might be too long, though. http://gerrit.cloudera.org:8080/#/c/20133/3/fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java File fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java: http://gerrit.cloudera.org:8080/#/c/20133/3/fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java@45 PS3, Line 45: ImpalaExpressionVisitor this is rather an ImpalaPredicateLocator, isn't it? http://gerrit.cloudera.org:8080/#/c/20133/3/fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java@47 PS3, Line 47: OPERATORS A more verbose name with be nice. Like IMPALA_ICEBERG_OPERATOR_MAP http://gerrit.cloudera.org:8080/#/c/20133/3/fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java@56 PS3, Line 56: expressions I think impalaExpressions would be a better name as we deal with Iceberg expressions here as well, so would be nice to easily differentiate between them. http://gerrit.cloudera.org:8080/#/c/20133/3/fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java@67 PS3, Line 67: if (literal.to(StringType.get()) == literal) { This is a check if T is a string, right? Can't we simply call literal.toString() for all the cases? Wouldn't that work for String literals? http://gerrit.cloudera.org:8080/#/c/20133/3/fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java@86 PS3, Line 86: return result; this seems weird, but honestly not 100% up to speed with this Iceberg visitor thing :) Could you add a test to exercise this 'not' operator? I guess we'd need a bool partition column for this. Giving this a second thought, with this visitor we only check the occurrence of certain expressions within the Iceberg expression tree. So I think I get why we just simply return the intermediate results for a 'not' operation as it won't change what expressions we have seen so far. http://gerrit.cloudera.org:8080/#/c/20133/3/fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java@102 PS3, Line 102: public ImpalaExpressionVisitorResult predicate(BoundPredicate pred) { Could you help me understand what is this BoundPredicate and why it doesn't require such a treatment as the below UnboundPredicate does? http://gerrit.cloudera.org:8080/#/c/20133/3/fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java@112 PS3, Line 112: if (!(pred.term() instanceof NamedReference)) { you could merge L112-114 into one line. http://gerrit.cloudera.org:8080/#/c/20133/3/fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java@119 PS3, Line 119: case NOT_NULL: I personally find the case blocks more
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/13432/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 3 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 29 Jun 2023 12:05:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Peter Rozsa has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. IMPALA-12089: Be able to skip pushing down a subset of the predicates This change adds a predicate filtering mechanism at planning time that locates Impala's predicates in the residual expressions from Iceberg planning. By locating all residual expressions, the remainder expression set can be calculated. If there's no residual expression, the behaviour is the same as before, all predicate pushdown is skipped. If there is a residual expression, an Iceberg expression visitor implementation tries to locate each Impala predicate in the expression tree that the Iceberg planning returned as a residual expression. If the visitor's result is mismatched (the number of expressions does not match the number of located expressions), then it resorts to the original behaviour, all predicates are pushed down to the scanners. The visitor implementation handles IN, NULL and binary expressions, and does not handle compound expressions created by Impala, and aggregates, startsWith, isNan expressions by Iceberg. A new query option is added: iceberg_predicate_subsetting. The query option's default value is true. It can be turned off by setting it to false. Tests: - unit tests added for ImpalaExpressionVisitor - planner test cases added for disabled mode - core tests passed Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift A fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java A fe/src/test/java/org/apache/impala/analysis/ImpalaExpressionVisitorTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test 12 files changed, 656 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/3 -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 3 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 2: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/13430/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 2 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 29 Jun 2023 11:24:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Peter Rozsa has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. IMPALA-12089: Be able to skip pushing down a subset of the predicates This change adds a predicate filtering mechanism at planning time that locates Impala's predicates in the residual expressions from Iceberg planning. By locating all residual expressions, the remainder expression set can be calculated. If there's no residual expression, the behaviour is the same as before, all predicate pushdown is skipped. If there is a residual expression, an Iceberg expression visitor implementation tries to locate each Impala predicate in the expression tree that the Iceberg planning returned as a residual expression. If the visitor's result is mismatched (the number of expressions does not match the number of located expressions), then it resorts to the original behaviour, all predicates are pushed down to the scanners. The visitor implementation handles IN, NULL and binary expressions, and does not handle compound expressions created by Impala, and aggregates, startsWith, isNan expressions by Iceberg. A new query option is added: iceberg_predicate_subsetting. The query option's default value is true. It can be turned off by setting it to false. Tests: - unit tests added for ImpalaExpressionVisitor - planner test cases added for disabled mode - core tests passed Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift A fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java A fe/src/test/java/org/apache/impala/analysis/ImpalaExpressionVisitorTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test 12 files changed, 622 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/2 -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 2 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/20133 ) Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/13429/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 1 Gerrit-Owner: Peter Rozsa Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Peter Rozsa Gerrit-Comment-Date: Thu, 29 Jun 2023 08:06:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12089: Be able to skip pushing down a subset of the predicates
Peter Rozsa has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20133 Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates .. IMPALA-12089: Be able to skip pushing down a subset of the predicates This change adds a predicate filtering mechanism at planning time that locates Impala's predicates in the residual expressions from Iceberg planning. By locating all residual expressions, the remainder expression set can be calculated. If there's no residual expression, the behaviour is the same as before, all predicate pushdown is skipped. If there is a residual expression, an Iceberg expression visitor implementation tries to locate each Impala predicate in the expression tree that the Iceberg planning returned as a residual expression. If the visitor's result is mismatched (the number of expressions does not match the number of located expressions), then it resorts to the original behaviour, all predicates are pushed down to the scanners. The visitor implementation handles IN, NULL and binary expressions, and does not handle compound expressions created by Impala, and aggregates, startsWith, isNan expressions by Iceberg. A new query option is added: iceberg_predicate_subsetting. The query option's default value is true. It can be turned off by setting it to false. Tests: - unit tests added for ImpalaExpressionVisitor - planner test cases added for disabled mode - core tests passed Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift A fe/src/main/java/org/apache/impala/analysis/ImpalaExpressionVisitor.java M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java A fe/src/test/java/org/apache/impala/analysis/ImpalaExpressionVisitorTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test 12 files changed, 622 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/1 -- To view, visit http://gerrit.cloudera.org:8080/20133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8 Gerrit-Change-Number: 20133 Gerrit-PatchSet: 1 Gerrit-Owner: Peter Rozsa