Peter Rozsa has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/20133 )
Change subject: IMPALA-12089: Be able to skip pushing down a subset of the predicates
......................................................................

IMPALA-12089: Be able to skip pushing down a subset of the predicates

This change adds a predicate filtering mechanism at planning time that
locates Impala's predicates in the residual expressions from Iceberg
planning. By locating all residual expressions, the remainder expression
set can be calculated. Before this change, the implementation was an
all-or-nothing filter: if 'planFiles()' (Iceberg API) returned no
residual expression, all Impala predicates could be skipped; if there
was any residual expression, every Impala predicate was pushed down to
the Impala scanner.

Residual expressions are the filter expressions that remain after the
pushdown of predicates into the Iceberg table scan. By locating the
remainder expression, we can reduce the number of predicates that will
be pushed down to the Impala scanner.

After this change, the Iceberg residual expression handling is improved
by locating the simple conjuncts in the residual expression and mapping
them back to Impala conjuncts. For example, suppose the list of Impala
conjuncts consists of two predicates, 'col_i != 100' and 'col_s = "a"',
and 'col_i' is a partition column in the Iceberg table definition, so
the Iceberg table scan can eliminate that expression. The residual
expression will then be 'col_s = "a"'. This expression can be mapped
back to an Impala predicate and pushed down to the scanner, while every
other expression is removed from the effective Impala conjunct list,
skipping the unnecessary filtering of 'col_i'.

If there's no residual expression, the behavior is the same as before:
all predicate pushdown is skipped. If Impala is unable to match every
residual expression to an Impala conjunct, then all the conjuncts are
pushed down to the Impala scanner.
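
The three outcomes described above (no residual, fully matched residual,
unmatched residual) can be illustrated with a minimal sketch. It is not
the code in IcebergExpressionCollector.java or IcebergScanPlanner.java:
the class name ResidualSubsetSketch, the method selectPushdown, and the
use of plain strings to stand in for Impala conjuncts and Iceberg
residual conjuncts are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: predicates are modeled as canonical strings rather
// than Impala Expr / Iceberg Expression objects.
public class ResidualSubsetSketch {

  // Returns the Impala conjuncts that still have to be pushed down to the
  // Impala scanner, given the simple conjuncts found in the Iceberg residual.
  static List<String> selectPushdown(List<String> impalaConjuncts,
      List<String> residualConjuncts) {
    Set<String> residuals = new HashSet<>(residualConjuncts);

    // No residual: Iceberg planning already evaluated every predicate,
    // so nothing needs to be pushed down.
    if (residuals.isEmpty()) return new ArrayList<>();

    // Some residual cannot be mapped back to an Impala conjunct:
    // fall back to pushing down every conjunct.
    if (!new HashSet<>(impalaConjuncts).containsAll(residuals)) {
      return new ArrayList<>(impalaConjuncts);
    }

    // Every residual was matched: keep only the matched conjuncts,
    // preserving the original Impala conjunct order.
    List<String> kept = new ArrayList<>();
    for (String conjunct : impalaConjuncts) {
      if (residuals.contains(conjunct)) kept.add(conjunct);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> conjuncts = List.of("col_i != 100", "col_s = \"a\"");
    // 'col_i' is eliminated by the Iceberg scan; only 'col_s = "a"' remains.
    System.out.println(selectPushdown(conjuncts, List.of("col_s = \"a\"")));
    // No residual at all: nothing is pushed down.
    System.out.println(selectPushdown(conjuncts, List.of()));
    // Unmatched residual: every conjunct is pushed down.
    System.out.println(selectPushdown(conjuncts, List.of("col_x > 1")));
  }
}
```

Matching by string equality is a simplification; the actual change has to
compare Iceberg expressions against the expressions Impala produced for
its own conjuncts.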

This change avoids pushing down already evaluated filters to the Impala
scanner nodes, which improves scanning performance. Additionally, if a
filter expression affects columns that are unnecessary for the final
result and can be filtered out during Iceberg's table scan, the row size
is reduced, which makes data retrieval cheaper and improves overall
query efficiency.

This solution is limited to cases where Impala's expression list
contains only conjuncts. Compound expressions are not supported, because
partial elimination of compounds would involve expression rewrites in
the Impala expression.

A new query option is added: iceberg_predicate_pushdown_subsetting.
The query option's default value is true. It can be turned off by
setting it to false.

Performance of the predicate location was measured on two edge cases:
 - 1000 expressions, 999 skipped: on average 2 ms
 - 1000 expressions, 1 skipped: on average 25 ms

Tests:
 - planner test cases added for disabled mode
 - existing planner test cases adjusted
 - core tests passed

Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A fe/src/main/java/org/apache/impala/analysis/IcebergExpressionCollector.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates-disabled-subsetting.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
12 files changed, 371 insertions(+), 72 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/20133/9
--
To view, visit http://gerrit.cloudera.org:8080/20133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I597f69ad03ecaf9e304613ef934654e3d9614ae8
Gerrit-Change-Number: 20133
Gerrit-PatchSet: 9
Gerrit-Owner: Peter Rozsa <pro...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pro...@cloudera.com>