Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/18141 )
Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables ...................................................................... Patch Set 17: (10 comments) Looks great! http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/exec/hdfs-orc-scanner.h File be/src/exec/hdfs-orc-scanner.h: http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/exec/hdfs-orc-scanner.h@427 PS17, Line 427: Returns nit. Decides http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/exec/hdfs-orc-scanner.h@433 PS17, Line 433: Usable nit. IsPushableInListFilter. http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/exec/hdfs-orc-scanner.cc@1486 PS17, Line 1486: if (in_list_filter->ContainsNull()) { : in_list.emplace_back(GetOrcPredicateDataType(col_type)); : } I wonder how the ORC layer recognizes the need to filter null. The logic here implies we append the column type at the end of the value list. http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/service/query-options.cc@974 PS17, Line 974: // Parse the enabled runtime filter types and validate it. nit. Parse and verify the enabled runtime filter types. http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/util/in-list-filter-ir.cc File be/src/util/in-list-filter-ir.cc: http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/util/in-list-filter-ir.cc@18 PS17, Line 18: #include "util/in-list-filter.h" : : #include "common/object-pool.h" nit. May arrange them in ascending order. http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/util/in-list-filter.cc File be/src/util/in-list-filter.cc: http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/util/in-list-filter.cc@38 PS17, Line 38: switch (col_type.type) { : case TYPE_TINYINT: : v = *reinterpret_cast<const int8_t*>(val); : break; : case TYPE_SMALLINT: : v = *reinterpret_cast<const int16_t*>(val); : break; : case TYPE_INT: : v = *reinterpret_cast<const int32_t*>(val); : break; : case TYPE_BIGINT: : v = *reinterpret_cast<const int64_t*>(val); : break; : case TYPE_DATE: : v = reinterpret_cast<const DateValue*>(val)->Value(); : break; : case TYPE_STRING: : case TYPE_VARCHAR: : s = reinterpret_cast<const StringValue*>(val); : return str_values_.find(string(s->ptr, s->len)) != str_values_.end(); : case TYPE_CHAR: : return str_values_.find(string(reinterpret_cast<const char*>(val), col_type.len)) : != str_values_.end(); : default: : DCHECK(false) << "Not support IN-list filter type: " << TypeToString(type_); : return false; : } Future improvement. We may need to something similar to MinMaxFilter class hierarchy to push the type into a subclass of InListFilter. In this way, we do not incur the cost of type switching for each Find() call. 182 #define NUMERIC_MIN_MAX_FILTER(NAME, TYPE) \ 183 class NAME##MinMaxFilter : public MinMaxFilter { \ 184 public: \ 185 NAME##MinMaxFilter() { \ 186 min_ = std::numeric_limits<TYPE>::max(); \ 187 max_ = std::numeric_limits<TYPE>::lowest(); \ 188 } \ 189 NAME##MinMaxFilter(const MinMaxFilterPB& protobuf); \ 190 virtual ~NAME##MinMaxFilter() {} \ 191 virtual const void* GetMin() const override { return &min_; } \ 192 virtual const void* GetMax() const override { return &max_; } \ 193 virtual bool GetCastIntMinMax( \ 194 const ColumnType& type, int64_t* out_min, int64_t* out_max) const override; \ 195 bool EvalOverlap( \ 196 const ColumnType& type, void* data_min, void* data_max) const override; \ 197 float ComputeOverlapRatio( \ min-max-filter.h http://gerrit.cloudera.org:8080/#/c/18141/17/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/18141/17/common/thrift/ImpalaService.thrift@725 PS17, Line 725: // Maximum number of entries in a runtime in-list filter. nit. distinct http://gerrit.cloudera.org:8080/#/c/18141/17/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/18141/17/common/thrift/PlanNodes.thrift@a134 PS17, Line 134: nit. dropped. http://gerrit.cloudera.org:8080/#/c/18141/17/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/18141/17/testdata/datasets/functional/functional_schema_template.sql@274 PS17, Line 274: neg nit. may spell out the entire word: alltypestiny_negative http://gerrit.cloudera.org:8080/#/c/18141/17/testdata/datasets/functional/schema_constraints.csv File testdata/datasets/functional/schema_constraints.csv: http://gerrit.cloudera.org:8080/#/c/18141/17/testdata/datasets/functional/schema_constraints.csv@323 PS17, Line 323: alltypestiny_neg nit. alltypestiny_negative -- To view, visit http://gerrit.cloudera.org:8080/18141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501 Gerrit-Change-Number: 18141 Gerrit-PatchSet: 17 Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Comment-Date: Thu, 24 Feb 2022 15:25:13 +0000 Gerrit-HasComments: Yes