Hello Tamas Mate, Qifan Chen, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17960

to look at the new patch set (#3).

Change subject: IMPALA-10777: Enable min/max filtering for Iceberg partitions
......................................................................

IMPALA-10777: Enable min/max filtering for Iceberg partitions

This patch enables min/max filters for Iceberg columns that
participate in table partitioning. The min/max filters are
evaluated at the Parquet row group level, so the scanner still
has to open each file. This makes them slower than dynamic
partition pruning (which doesn't even need to open the files),
but much faster than no pruning at all.
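
As a rough illustration of what row-group-level evaluation means,
here is a minimal Python sketch. It is not the code in
hdfs-parquet-scanner.cc; the names MinMaxFilter, RowGroupStats,
can_skip_row_group and row_groups_to_read are hypothetical, and
integer column values are assumed for simplicity. A row group is
skipped when its [min, max] statistics cannot overlap the range
carried by the filter.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MinMaxFilter:
    """Hypothetical runtime filter: value range seen on the join build side."""
    min_value: int
    max_value: int


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical per-row-group column statistics read from a Parquet file."""
    min_value: int
    max_value: int


def can_skip_row_group(stats: RowGroupStats, flt: MinMaxFilter) -> bool:
    # The two closed ranges are disjoint if one ends before the other starts.
    return stats.max_value < flt.min_value or stats.min_value > flt.max_value


def row_groups_to_read(row_groups: List[RowGroupStats],
                       flt: MinMaxFilter) -> List[int]:
    # Keep the indexes of row groups whose range overlaps the filter range.
    return [i for i, s in enumerate(row_groups)
            if not can_skip_row_group(s, flt)]


# Example: only the middle row group can contain values in [12, 18].
row_groups = [RowGroupStats(1, 10), RowGroupStats(11, 20), RowGroupStats(21, 30)]
selected = row_groups_to_read(row_groups, MinMaxFilter(12, 18))
```

The statistics still have to be read from each file, which is why
this cannot match pruning that never opens the files.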

Performance

I used the following query to measure performance on a scale
factor 10 TPC-DS dataset:

 select i_item_id, sum(ss_ext_sales_price) total_sales
 from
   store_sales,
   date_dim,
   customer_address,
   item
 where i_item_id in (
         select i_item_id
         from item
         where i_color in ('orchid', 'chiffon', 'lace'))
   and ss_item_sk = i_item_sk
   and ss_sold_date_sk = d_date_sk
   and d_year = 2000
   and d_moy = 1
   and ss_addr_sk = ca_address_sk
   and ca_gmt_offset = -8

The above query took the following times to execute:

Regular Parquet table: 1.16s
Iceberg table without min/max filters: 4.39s
Iceberg table with min/max filters: 1.77s
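
For reference, the relative differences implied by these timings
can be worked out as below. This is only arithmetic on the three
numbers reported above; the variable names are made up for the
example.

```python
# Timings in seconds, copied from the measurements above.
parquet_s = 1.16
iceberg_no_filter_s = 4.39
iceberg_with_filter_s = 1.77

# Speedup of the Iceberg scan from enabling min/max filters.
speedup_from_filters = iceberg_no_filter_s / iceberg_with_filter_s

# Remaining slowdown of the filtered Iceberg scan vs. regular Parquet.
slowdown_vs_parquet = iceberg_with_filter_s / parquet_s

print(f"speedup from min/max filters: {speedup_from_filters:.2f}x")
print(f"slowdown vs regular Parquet:  {slowdown_vs_parquet:.2f}x")
```

So the filters make the Iceberg query about 2.5x faster, and it
remains about 1.5x slower than the regular Parquet table.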

Testing:
 * added e2e test
 * a planner test could not be added because Iceberg tables behave
   differently during planner tests (due to some hacks that need
   refactoring)

Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/runtime/runtime-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test
8 files changed, 80 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/17960/3
--
To view, visit http://gerrit.cloudera.org:8080/17960

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I51b53188c6da7eeebfeae385e1de31ace0980cac
Gerrit-Change-Number: 17960
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
