Qifan Chen has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins on 
sorted columns in Parquet tables
......................................................................

IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in 
Parquet tables

This patch enables min/max filters for equi-joins on sort by
columns in a Parquet table created by Impala. It takes advantage
of Impala sorting the min/max values in the column index of each data
file for the table. When there are multiple sort by columns in the
table, only the leading column is assigned a min/max filter. The
feature is controlled by the query option minmax_filter_sorted_columns,
which defaults to true.

When minmax_filter_sorted_columns is true and the threshold (query
option minmax_filter_threshold) is 0, the patch automatically assigns
a reasonable value for the threshold, and selects PAGE to be the
filtering level (query option minmax_filtering_level). When the
threshold is greater than 0, no adjustment will be made to either the
threshold or the filtering level. When the min and max column stats
exist on the leading sort column, these stats can be used to help
select filters that are most likely helpful.

When minmax_filter_sorted_columns is set to false, no min/max filters
will be specifically assigned to the leading sort by columns.

In the backend, the skipped pages are quickly identified by finding the
lower and the upper bounds in the sorted min and max value arrays,
given the min and max range in the filter.
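
The bound search described above can be sketched as follows. This is a
minimal illustration, not the code in this patch: the function name
CandidatePageRange, the int64_t value type, and the half-open return
convention are all assumptions made for the example. It assumes the
per-page min and max arrays are both sorted in ascending order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not Impala's actual API). Given per-page min and max
// values, both sorted ascending, returns the half-open range [first, last)
// of page indexes that may overlap the filter range [filter_min, filter_max].
// Every page outside that range can be skipped.
std::pair<std::size_t, std::size_t> CandidatePageRange(
    const std::vector<int64_t>& page_mins,
    const std::vector<int64_t>& page_maxs,
    int64_t filter_min, int64_t filter_max) {
  assert(page_mins.size() == page_maxs.size());

  // First page whose max value is >= filter_min. Earlier pages end before
  // the filter range starts.
  std::size_t first =
      std::lower_bound(page_maxs.begin(), page_maxs.end(), filter_min) -
      page_maxs.begin();

  // First page whose min value is > filter_max. This page and all later
  // ones start after the filter range ends.
  std::size_t last =
      std::upper_bound(page_mins.begin(), page_mins.end(), filter_max) -
      page_mins.begin();

  // Normalize so that an empty result always has first == last.
  return {first, std::max(first, last)};
}
```

For example, with page mins {1, 11, 21, 31}, page maxs {10, 20, 30, 40}
and a filter range of [15, 25], the sketch returns the range [1, 3), so
pages 0 and 3 are skipped. Each lookup is a binary search, so the cost
grows logarithmically with the number of pages.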

Testing:
  1). Added two new tests in overlap_min_max_filters.test to verify that
      a) Min/max filters are created only for the leading sort by column;
      b) The query option minmax_filter_sorted_columns works.
  2). Added new tests in parquet-page-index-test.cc to cover the fast
      code path;
  3). Core [TBD]
  4). Performance [TBD]

Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-common.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exec/parquet/parquet-page-index-test.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
16 files changed, 529 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/17478/18
--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
