Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18433 )
Change subject: IMPALA-11141: Use exact data types in IN-list filter
......................................................................

IMPALA-11141: Use exact data types in IN-list filter

Currently, we use a std::unordered_set<int64_t> for all numeric types (including the DATE type). This wastes space for small data types such as tinyint, smallint, and int. This patch extends the base InListFilter class with native implementations for the different data types.

For string-type in-list filters, this patch uses impala::StringValue instead of std::string. This simplifies the Insert() method, which reduces codegen time. To use impala::StringValue, this patch switches the set implementation to boost::unordered_set, the same implementation that InPredicate uses.

Another benefit of using impala::StringValue is that we can easily maintain the strings in a MemPool. When a new batch of values is inserted, the new values go into a temporary set, and the string pointers still reference the original tuple values. At the end of processing each batch, MaterializeValues() is invoked to copy the strings into the filter's own mem pool. This is more memory-friendly than the original approach because the memory for a whole batch of strings can be allocated at once.
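The two ideas described above (exact-width numeric sets and two-phase string insertion) can be sketched roughly as follows. This is an illustrative sketch only, not the code in be/src/util/in-list-filter.h: the class names NumericInListSketch and StringInListSketch are hypothetical, std::string_view and std::unordered_set stand in for impala::StringValue and boost::unordered_set, and a vector of heap buffers stands in for MemPool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string_view>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical numeric filter: the element type of the set matches the column
// type, so a TINYINT filter stores int8_t values instead of int64_t values.
template <typename T>
class NumericInListSketch {
 public:
  void Insert(T v) { values_.insert(v); }
  bool Find(T v) const { return values_.count(v) > 0; }
  size_t size() const { return values_.size(); }

 private:
  std::unordered_set<T> values_;
};

// Hypothetical string filter with two-phase insertion.
class StringInListSketch {
 public:
  // Phase 1: remember a view that still points into the caller's tuple memory.
  // No bytes are copied here, which keeps the per-row path small.
  void Insert(std::string_view s) {
    if (values_.count(s) == 0) pending_.insert(s);
  }

  // Phase 2: copy every pending string into one buffer owned by the filter,
  // using a single allocation for the whole batch. Must be called before the
  // memory behind the pending views is reused or freed.
  void MaterializeValues() {
    if (pending_.empty()) return;
    size_t total = 0;
    for (std::string_view s : pending_) total += s.size();
    auto buf = std::make_unique<char[]>(total);
    char* dst = buf.get();
    for (std::string_view s : pending_) {
      if (!s.empty()) std::memcpy(dst, s.data(), s.size());
      values_.insert(std::string_view(dst, s.size()));
      dst += s.size();
    }
    pool_.push_back(std::move(buf));  // the buffer address stays stable
    pending_.clear();
  }

  bool Find(std::string_view s) const {
    return values_.count(s) > 0 || pending_.count(s) > 0;
  }

  // Number of materialized values.
  size_t size() const {
    assert(pending_.empty());  // call MaterializeValues() first
    return values_.size();
  }

 private:
  std::unordered_set<std::string_view> values_;   // views into pool_
  std::unordered_set<std::string_view> pending_;  // views into caller memory
  std::vector<std::unique_ptr<char[]>> pool_;     // stand-in for a MemPool
};
```

In this sketch a caller would invoke Insert() once per row and MaterializeValues() once per batch, after which the batch's own memory can be released without invalidating the filter's contents.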
Tests:
- Add unit tests for different types of in-list filters

Change-Id: Id434a542b2ced64efa3bfc974cb565b94a4193e9
Reviewed-on: http://gerrit.cloudera.org:8080/18433
Reviewed-by: Qifan Chen <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/hdfs-orc-scanner.cc
M be/src/runtime/runtime-filter-bank.cc
M be/src/util/CMakeLists.txt
M be/src/util/in-list-filter-ir.cc
A be/src/util/in-list-filter-test.cc
M be/src/util/in-list-filter.cc
M be/src/util/in-list-filter.h
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
11 files changed, 615 insertions(+), 235 deletions(-)

Approvals:
  Qifan Chen: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/18433
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id434a542b2ced64efa3bfc974cb565b94a4193e9
Gerrit-Change-Number: 18433
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
