Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18017 )
Change subject: IMPALA-10910, IMPALA-5509: Runtime filter: dictionary filter support
......................................................................

IMPALA-10910, IMPALA-5509: Runtime filter: dictionary filter support

This commit is based on Csaba Ringhofer's earlier work on IMPALA-5509.

If a runtime filter uses only a single column, it can be used to filter Parquet dictionaries. If all dictionary values are filtered out, the whole row group can be skipped. This is especially useful for Iceberg tables: the partition column is stored in the data files, so this can help eliminate unnecessary reads.

The chance that at least one dictionary value passes the filter as a false positive grows rapidly with the size of the dictionary, and a single false positive prevents the row group from being skipped. This optimisation is therefore only useful for small dictionaries. A new query option, 'PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT', limits runtime filter evaluation to smaller dictionaries. Its default value is 1024.

Testing:
 - Added an e2e test that creates an Iceberg/Parquet table and queries it
 - Ran a single-node perf test with TPC-H scale factor 10 on Parquet; there were no regressions

Change-Id: Ida0ada8799774be34312eaa4be47336149f637c7
Reviewed-on: http://gerrit.cloudera.org:8080/18017
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A testdata/workloads/functional-query/queries/QueryTest/iceberg-dictionary-runtime-filter.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-dictionary-runtime-filter.test
M tests/query_test/test_runtime_filters.py
9 files changed, 375 insertions(+), 20 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit
http://gerrit.cloudera.org:8080/18017

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ida0ada8799774be34312eaa4be47336149f637c7
Gerrit-Change-Number: 18017
Gerrit-PatchSet: 16
Gerrit-Owner: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <amarg...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
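The dictionary-filtering idea in the commit message above can be illustrated with a small Python sketch. This is not Impala's implementation, which lives in the C++ Parquet scanner. The names `can_skip_row_group` and `skip_probability`, the callable used to model a single-column runtime filter, and the assumption that dictionaries larger than the entry limit are simply not evaluated are all hypothetical. The second function assumes each value is an independent false positive with rate p. Under that assumption, a row group whose n dictionary values truly do not match is skipped with probability (1 - p)^n, which is why the optimisation pays off only for small dictionaries.

```python
from typing import Any, Callable, Sequence

# Mirrors the stated default of PARQUET_DICTIONARY_RUNTIME_FILTER_ENTRY_LIMIT.
DEFAULT_ENTRY_LIMIT = 1024


def can_skip_row_group(
    dictionary: Sequence[Any],
    filter_may_contain: Callable[[Any], bool],
    entry_limit: int = DEFAULT_ENTRY_LIMIT,
) -> bool:
    """Hypothetical helper: True if the row group can be skipped.

    `dictionary` holds the distinct values of one Parquet column chunk.
    `filter_may_contain` models a single-column runtime filter: it may
    return false positives (like a Bloom filter) but never false negatives.
    """
    # Assumption: dictionaries above the limit are not evaluated at all.
    if len(dictionary) > entry_limit:
        return False
    # Skip only if the filter rejects every dictionary value.
    return not any(filter_may_contain(value) for value in dictionary)


def skip_probability(false_positive_rate: float, dictionary_size: int) -> float:
    """Chance that a row group with no truly matching values is skipped,
    assuming each value is an independent false positive at the given rate."""
    return (1.0 - false_positive_rate) ** dictionary_size


if __name__ == "__main__":
    # Values produced by the build side of a hypothetical join.
    build_side = {"2021-01-01", "2021-01-02"}
    print(can_skip_row_group(["2020-12-30", "2020-12-31"], build_side.__contains__))  # True
    print(can_skip_row_group(["2020-12-31", "2021-01-01"], build_side.__contains__))  # False
    print(round(skip_probability(0.01, 10), 3))  # 0.904
    print(skip_probability(0.01, 1000))  # about 4.3e-05
```

With a 1% false-positive rate, a 10-entry dictionary lets a non-matching row group be skipped about 90% of the time, while a 1000-entry dictionary almost never does. This illustrates why a small entry limit is a reasonable default.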