Janaki Lahorani has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/12113 )
Change subject: IMPALA-6533: Add min-max filter for decimal types on kudu tables.
......................................................................

IMPALA-6533: Add min-max filter for decimal types on kudu tables.

The code mimics the code written for other min-max filters. Decimal
data can be stored using 4 bytes, 8 bytes or 16 bytes, and the code
handles each of these three storage sizes. The column definition
states the precision, and the precision determines the storage size.
The minimum and maximum values are stored in a union. The precision of
the column comes in as an input. The storage size is derived from the
precision, and the union member that matches that size is used.

The code in min-max-filter* follows the general convention of the
file, and hence uses macros.

The test includes 24 decimal columns (listed below) with the
following joins:
1. Inner join with broadcast (2 tables)
   1a. 1 predicate
   1b. 4 predicates - all result in a decimal min-max filter
   1c. 4 predicates - 3 result in a decimal min-max filter; 1 doesn't
2. Inner join with shuffle (3 tables)
3. Right outer join (2 tables)
4. Left semi join (2 tables)
5. Right semi join (2 tables)

Decimal columns:
4 bytes:  (5,0), (5,1), (5,3), (5,5), (9,0), (9,1), (9,5), (9,9)
8 bytes:  (14,0), (14,1), (14,7), (14,14), (18,0), (18,1), (18,9), (18,18)
16 bytes: (28,0), (28,1), (28,14), (28,28), (38,0), (38,1), (38,19), (38,38)

The test aggregates the count of probe rows. A probe row count that is
lower than the total number of rows in the probe-side table shows that
the min-max filter was exercised. The count of probe rows is
considered to be deterministic. However, a change in the way Kudu
partitions data could change the probe row count, and in that case
the test will have to be updated.

impala_test_suite.py and test_result_verifier.py are enhanced to
support saving of aggregation using update_results.
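The precision-to-size dispatch described above can be sketched roughly as follows. This is an illustration only, not the code in min-max-filter.h: the names DecimalByteSize and DecimalMinMaxSketch are hypothetical, values are assumed to arrive as unscaled integers, and __int128 is a GCC/Clang extension used here as a stand-in for the 16-byte representation. The 9 and 18 digit cut-offs are consistent with the column list in the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: storage size in bytes for a decimal column of the
// given precision (1..9 -> 4 bytes, 10..18 -> 8 bytes, 19..38 -> 16 bytes).
inline int DecimalByteSize(int precision) {
  assert(precision >= 1 && precision <= 38);
  if (precision <= 9) return 4;
  if (precision <= 18) return 8;
  return 16;
}

// Hypothetical min-max tracker; not the class from min-max-filter.h.
class DecimalMinMaxSketch {
 public:
  explicit DecimalMinMaxSketch(int precision)
    : size_(DecimalByteSize(precision)), empty_(true) {}

  // Records a value, given as the unscaled integer of the decimal.
  // Only the union member matching the storage size is ever touched.
  void Insert(__int128 v) {
    switch (size_) {
      case 4: Update(&min_.v4, &max_.v4, static_cast<int32_t>(v)); break;
      case 8: Update(&min_.v8, &max_.v8, static_cast<int64_t>(v)); break;
      default: Update(&min_.v16, &max_.v16, v); break;
    }
    empty_ = false;
  }

  // Returns false only if v is certainly outside [min, max].
  bool MayContain(__int128 v) const {
    if (empty_) return false;
    switch (size_) {
      case 4: return v >= min_.v4 && v <= max_.v4;
      case 8: return v >= min_.v8 && v <= max_.v8;
      default: return v >= min_.v16 && v <= max_.v16;
    }
  }

 private:
  // One slot per storage size; which member is active depends on size_.
  union Value {
    int32_t v4;
    int64_t v8;
    __int128 v16;
  };

  template <typename T>
  void Update(T* min, T* max, T v) {
    if (empty_ || v < *min) *min = v;
    if (empty_ || v > *max) *max = v;
  }

  int size_;
  bool empty_;
  Value min_;
  Value max_;
};
```

Because the storage size is fixed when the filter is created, each instance always reads and writes the same union member.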
Change-Id: Ib7e7278e902160d7060f8097290bc172d9031f94
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/decimal-value.inline.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M bin/rat_exclude_files.txt
M common/thrift/Data.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/data/README
A testdata/data/decimal_rtf_tbl.txt
A testdata/data/decimal_rtf_tiny_tbl.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/decimal_min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test
M tests/common/impala_test_suite.py
M tests/common/test_result_verifier.py
M tests/query_test/test_runtime_filters.py
22 files changed, 7,321 insertions(+), 23 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/12113/17
--
To view, visit http://gerrit.cloudera.org:8080/12113
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib7e7278e902160d7060f8097290bc172d9031f94
Gerrit-Change-Number: 12113
Gerrit-PatchSet: 17
Gerrit-Owner: Janaki Lahorani <jan...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Janaki Lahorani <jan...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>