Tamas Mate has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/18017 )

Change subject: IMPALA-10910, IMPALA-5509: Runtime filter: dictionary filter support
......................................................................

IMPALA-10910, IMPALA-5509: Runtime filter: dictionary filter support

This commit is based on Csaba Ringhofer's earlier work on IMPALA-5509.

If a runtime filter uses only a single column, it can be applied to the
Parquet dictionary of that column. If every dictionary value is filtered
out, the whole row group can be skipped. This is especially useful for
Iceberg tables: the partition column is stored in the data file, so
dictionary filtering can eliminate unnecessary reads.
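
As a rough illustration of the skipping rule described above, here is a
minimal Python sketch. It is not the code in hdfs-parquet-scanner.cc; the
function name, the set-based stand-in for the runtime filter, and the sample
values are all hypothetical. It also assumes every page of the column chunk
is dictionary-encoded, since a Parquet writer may fall back to plain
encoding, in which case the dictionary no longer covers all values.

```python
from typing import Callable, Iterable


def can_skip_row_group(dictionary_values: Iterable[object],
                       filter_may_contain: Callable[[object], bool]) -> bool:
    """Return True when no dictionary entry passes the runtime filter.

    Hypothetical helper: assumes the dictionary lists every distinct value
    that occurs in the row group for the filtered column.
    """
    return not any(filter_may_contain(v) for v in dictionary_values)


# Stand-in for a runtime filter built from the join's build side. A real
# filter may be probabilistic and report false positives; a set never does.
build_side_keys = {"2021-01-01", "2021-01-02"}
runtime_filter = build_side_keys.__contains__

# Dictionaries of the filtered column in two hypothetical row groups.
row_group_a = ["2020-12-30", "2020-12-31"]  # no entry passes the filter
row_group_b = ["2020-12-31", "2021-01-01"]  # one entry passes the filter

skip_a = can_skip_row_group(row_group_a, runtime_filter)
skip_b = can_skip_row_group(row_group_b, runtime_filter)
```

In this sketch row_group_a can be skipped and row_group_b has to be read,
because a single passing dictionary entry is enough to keep the row group.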

The chance of false positives grows exponentially with the size of the
dictionary, so this optimisation is only useful for small dictionaries.
Therefore, the dictionary size has been limited to 1024 for runtime
filtering.
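
The effect of dictionary size can be estimated with a back-of-the-envelope
model. The sketch below assumes each dictionary entry that does not really
match is a false positive independently with probability fp_rate; both the
independence assumption and the example rate are illustrative and are not
taken from the patch.

```python
def prob_any_false_positive(fp_rate: float, dict_size: int) -> float:
    """Chance that at least one of dict_size non-matching entries passes."""
    return 1.0 - (1.0 - fp_rate) ** dict_size


def prob_skip_row_group(fp_rate: float, dict_size: int) -> float:
    """Chance that a row group with no real match is still skipped.

    Skipping requires every entry to be rejected, so this probability
    shrinks geometrically as the dictionary grows.
    """
    return (1.0 - fp_rate) ** dict_size


# Illustrative per-lookup false positive rate (an assumption).
example_fp_rate = 0.01
for size in (10, 100, 1024):
    print(size,
          prob_any_false_positive(example_fp_rate, size),
          prob_skip_row_group(example_fp_rate, size))
```

Under these assumptions the chance of skipping a non-matching row group
falls off quickly as the dictionary grows, which is consistent with applying
the optimisation only to small dictionaries.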

Testing:
 - Added e2e test that creates an Iceberg/Parquet table and queries it
 - Ran a single-node perf test with TPC-H scale 10 on Parquet; there
   were no regressions

Change-Id: Ida0ada8799774be34312eaa4be47336149f637c7
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
A testdata/workloads/functional-query/queries/QueryTest/iceberg-dictionary-runtime-filter.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-dictionary-runtime-filter.test
M tests/query_test/test_runtime_filters.py
5 files changed, 348 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/18017/11
--
To view, visit http://gerrit.cloudera.org:8080/18017

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ida0ada8799774be34312eaa4be47336149f637c7
Gerrit-Change-Number: 18017
Gerrit-PatchSet: 11
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
