Tamas Mate has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/18531 )
Change subject: IMPALA-10453: Support file pruning via runtime filters on Iceberg
......................................................................

IMPALA-10453: Support file pruning via runtime filters on Iceberg

Iceberg tables store partition information in manifest files and not in
the file path. This metadata has already been pushed down to the
scanners and this commit uses this metadata to evaluate runtime filters
on Iceberg files.

Performance measurement:
Used TPC-DS Q10 [1] with scale of 10 to measure the query performance.
Min/Max filters were disabled and the wait time for runtime filters was
increased to 5 seconds. After pre-warming the Catalog I executed Q10
5 times on my local machine. The fastest execution times were:
 * Baseline Parquet tables: 1.08s
 * Baseline Iceberg tables without this patch: 1.43s
 * Iceberg tables with this patch: 1.09s

Testing:
 * Added e2e tests.
 * Initial performance test with TPC-DS Q10.

Ref:
[1] TPC-DS Q10:
select
  cd_gender,
  cd_marital_status,
  cd_education_status,
  count(*) cnt1,
  cd_purchase_estimate,
  count(*) cnt2,
  cd_credit_rating,
  count(*) cnt3,
  cd_dep_count,
  count(*) cnt4,
  cd_dep_employed_count,
  count(*) cnt5,
  cd_dep_college_count,
  count(*) cnt6
from
  customer c, customer_address ca, customer_demographics
where
  c.c_current_addr_sk = ca.ca_address_sk
  and ca_county in ('Walker County','Richland County','Gaines County',
                    'Douglas County','Dona Ana County')
  and cd_demo_sk = c.c_current_cdemo_sk
  and exists (select *
              from store_sales, date_dim
              where c.c_customer_sk = ss_customer_sk
                and ss_sold_date_sk = d_date_sk
                and d_year = 2002
                and d_moy between 4 and 4+3)
  and exists (select *
              from (select ws_bill_customer_sk as customer_sk, d_year, d_moy
                    from web_sales, date_dim
                    where ws_sold_date_sk = d_date_sk
                      and d_year = 2002
                      and d_moy between 4 and 4+3
                    union all
                    select cs_ship_customer_sk as customer_sk, d_year, d_moy
                    from catalog_sales, date_dim
                    where cs_sold_date_sk = d_date_sk
                      and d_year = 2002
                      and d_moy between 4 and 4+3
                   ) x
              where c.c_customer_sk = customer_sk)
group by
  cd_gender,
  cd_marital_status,
  cd_education_status,
  cd_purchase_estimate,
  cd_credit_rating,
  cd_dep_count,
  cd_dep_employed_count,
  cd_dep_college_count
order by
  cd_gender,
  cd_marital_status,
  cd_education_status,
  cd_purchase_estimate,
  cd_credit_rating,
  cd_dep_count,
  cd_dep_employed_count,
  cd_dep_college_count
limit 100;

Change-Id: I7762e1238bdf236b85d2728881a402a2bb41f36a
---
M be/src/exec/file-metadata-utils.cc
M be/src/exec/file-metadata-utils.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-runtime-filter.test
M tests/query_test/test_iceberg.py
M tests/query_test/test_runtime_filters.py
13 files changed, 217 insertions(+), 39 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/18531/4
--
To view, visit http://gerrit.cloudera.org:8080/18531
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7762e1238bdf236b85d2728881a402a2bb41f36a
Gerrit-Change-Number: 18531
Gerrit-PatchSet: 4
Gerrit-Owner: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gfurnst...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>