Daniel Becker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17026


Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1
......................................................................

IMPALA-9470: Use Parquet Bloom filters - Part 1

This change adds read support for Parquet Bloom filters for some types.
The supported Parquet type - Impala type pairs are the following:

 ---------------------------------------
|Parquet type |  Impala type            |
|---------------------------------------|
|INT32        |  TINYINT, SMALLINT, INT |
|INT64        |  BIGINT                 |
|FLOAT        |  FLOAT                  |
|DOUBLE       |  DOUBLE                 |
|BYTE_ARRAY   |  STRING                 |
 ---------------------------------------

If a Bloom filter is available for a column that is fully dictionary
encoded, the Bloom filter is not used as the dictionary can give exact
results in filtering.

Testing:
  - Added tests/query_test/test_parquet_bloom_filter.py that tests that
    Parquet Bloom filtering works for the supported types and that we do
    not incorrectly discard row groups for the unsupported type VARCHAR.

Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
---
M LICENSE.txt
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exprs/expr-value.h
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/kudu/util/block_bloom_filter.cc
M be/src/kudu/util/block_bloom_filter.h
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
A be/src/thirdparty/xxhash/README.md
A be/src/thirdparty/xxhash/xxhash.h
M be/src/util/CMakeLists.txt
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/impala-bloom-filter-buffer-allocator.cc
A be/src/util/impala-bloom-filter-buffer-allocator.h
A be/src/util/parquet-bloom-filter.cc
A be/src/util/parquet-bloom-filter.h
M common/thrift/parquet.thrift
A testdata/data/parquet-bloom-filtering.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter.test
A tests/query_test/test_parquet_bloom_filter.py
25 files changed, 6,815 insertions(+), 122 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17026/11
--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 11
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>

Reply via email to