Qifan Chen has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/17478 )
Change subject: IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables
......................................................................

IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables

This patch enables min/max filters for equi-joins on sort-by columns in
Parquet tables created by Impala. This takes advantage of the fact that
the min/max values in the column index are fully sorted within each data
file of such a table. When the table has multiple sort-by columns, only
the leading column is assigned a min/max filter.

The feature is controlled by the query option
minmax_filter_sorted_columns, which defaults to true.

When minmax_filter_sorted_columns is true and the threshold (query
option minmax_filter_threshold) is 0, the patch automatically assigns a
reasonable value to the threshold and selects PAGE as the filtering
level (query option minmax_filtering_level). When the threshold is
greater than 0, neither the threshold nor the filtering level is
adjusted.

When minmax_filter_sorted_columns is set to false, no min/max filters
are specifically assigned to the leading sort-by columns.

Testing:
1). Added two new tests in overlap_min_max_filters.test to verify that
    a) min/max filters are created only for the leading sort-by column;
    b) the query option minmax_filter_sorted_columns works.
2). Core [TBD]
3). Performance [TBD]

Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
10 files changed, 131 insertions(+), 4 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/17478/8
--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 8
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
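
For readers following the option handling described in the commit message, here is a minimal Python sketch of that decision logic. It is not the patch's code (the actual change is in the Java planner and C++ backend files listed above). The class, function, and parameter names are hypothetical. `auto_threshold` stands in for the "reasonable value" the patch assigns, which the commit message does not state, and the `ROW_GROUP` default for the filtering level is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class QueryOptions:
    # Field names mirror the query options named in the commit message.
    minmax_filter_sorted_columns: bool = True
    minmax_filter_threshold: float = 0.0
    # Assumed default; the commit message only says PAGE is selected.
    minmax_filtering_level: str = "ROW_GROUP"


def adjust_for_sorted_columns(opts, auto_threshold):
    """Return the effective options for a join on a leading sort-by column.

    auto_threshold is a hypothetical placeholder for the value the patch
    picks automatically; the commit message does not give the number.
    """
    if opts.minmax_filter_sorted_columns and opts.minmax_filter_threshold == 0:
        # Threshold left at 0: pick a threshold and filter at page level.
        return replace(
            opts,
            minmax_filter_threshold=auto_threshold,
            minmax_filtering_level="PAGE",
        )
    # Option disabled, or the user set an explicit threshold: no adjustment.
    return opts


def columns_to_filter(sort_by_columns, opts):
    """Only the leading sort-by column is assigned a min/max filter."""
    if not opts.minmax_filter_sorted_columns:
        return []
    return list(sort_by_columns[:1])
```

With the defaults, `adjust_for_sorted_columns(QueryOptions(), auto_threshold=t)` switches the filtering level to PAGE and sets the threshold to `t`. An explicit positive threshold, or `minmax_filter_sorted_columns=False`, leaves the options untouched.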