Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18017 )

Change subject: IMPALA-10910, IMPALA-5509: Runtime filter: dictionary filter 
support
......................................................................

IMPALA-10910, IMPALA-5509: Runtime filter: dictionary filter support

This commit is based on Csaba Ringhofer's earlier work on IMPALA-5509.

If a runtime filter is bound to a single column, it can be evaluated
against the Parquet dictionary of that column. If every dictionary
value is filtered out, the whole row group can be skipped. This is
especially useful for Iceberg tables, where the partition column is
stored in the data file, so the check can eliminate unnecessary reads.
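
The row-group check described above can be sketched roughly as follows.
This is a minimal illustration, not the code in hdfs-parquet-scanner.cc:
the function name, the `may_contain` callable and the use of a Python
set as a stand-in for the runtime filter are all assumptions made for
the example. A real Bloom filter can also return True for values that
were never inserted.

```python
def can_skip_row_group(dictionary_values, may_contain):
    """Return True if no dictionary entry can pass the runtime filter.

    dictionary_values: the distinct values of one column in a row group.
    may_contain: hypothetical callable standing in for the runtime
        filter; returns False only when the value is definitely absent.
    """
    if not dictionary_values:
        # Nothing to test against, so stay conservative and read the rows.
        return False
    # The row group is skippable only when every entry is rejected.
    return not any(may_contain(value) for value in dictionary_values)


# A plain set acts as an exact filter for the join build side.
build_side_keys = {10, 20, 30}
may_contain = build_side_keys.__contains__

print(can_skip_row_group([1, 2, 3], may_contain))  # True: no entry matches
print(can_skip_row_group([1, 20], may_contain))    # False: 20 may match
```

A single surviving entry is enough to force a read of the row group,
which is why the check pays off mainly when dictionaries are small.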

The chance of false positives grows exponentially with the size of the
dictionary, so this optimisation is only useful for small dictionaries.
A new query option, 'PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT',
limits runtime filter evaluation to smaller dictionaries. Its default
value is 1024.
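
To see why large dictionaries defeat the check, here is a small
back-of-the-envelope sketch. It assumes each dictionary entry passes
the filter falsely and independently with probability `fpp`; the 1%
rate and the function name are illustrative assumptions, not values
taken from Impala.

```python
def survival_probability(fpp, num_entries):
    """Chance that at least one of num_entries non-matching dictionary
    values passes the filter, assuming independent false positives."""
    return 1.0 - (1.0 - fpp) ** num_entries


# Hypothetical 1% per-entry false positive rate.
for num_entries in (1, 16, 128, 1024, 16384):
    p = survival_probability(0.01, num_entries)
    print(f"{num_entries:>6} entries -> row group kept with p = {p:.4f}")
```

Under these assumptions the chance of actually skipping a row group,
(1 - fpp) ** num_entries, shrinks exponentially as the dictionary
grows, so evaluating the filter on large dictionaries is mostly wasted
work.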

Testing:
 - Added e2e test that creates an Iceberg/Parquet table and queries it
 - Ran single node perf test with TPC-H scale 10 on Parquet; there
   were no regressions

Change-Id: Ida0ada8799774be34312eaa4be47336149f637c7
Reviewed-on: http://gerrit.cloudera.org:8080/18017
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A testdata/workloads/functional-query/queries/QueryTest/iceberg-dictionary-runtime-filter.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-dictionary-runtime-filter.test
M tests/query_test/test_runtime_filters.py
9 files changed, 375 insertions(+), 20 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/18017
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ida0ada8799774be34312eaa4be47336149f637c7
Gerrit-Change-Number: 18017
Gerrit-PatchSet: 16
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
