Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18141 )

Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables
......................................................................


Patch Set 17:

(10 comments)

Looks great!

http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/exec/hdfs-orc-scanner.h
File be/src/exec/hdfs-orc-scanner.h:

http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/exec/hdfs-orc-scanner.h@427
PS17, Line 427: Returns
nit. Decides


http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/exec/hdfs-orc-scanner.h@433
PS17, Line 433: Usable
nit. IsPushableInListFilter.


http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/exec/hdfs-orc-scanner.cc@1486
PS17, Line 1486: if (in_list_filter->ContainsNull()) {
               :       in_list.emplace_back(GetOrcPredicateDataType(col_type));
               :     }
I wonder how the ORC layer recognizes the need to filter null.  The logic here 
implies we append the column type at the end of the value list.


http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/service/query-options.cc@974
PS17, Line 974: // Parse the enabled runtime filter types and validate it.
nit. Parse and verify the enabled runtime filter types.


http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/util/in-list-filter-ir.cc
File be/src/util/in-list-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/util/in-list-filter-ir.cc@18
PS17, Line 18: #include "util/in-list-filter.h"
             :
             : #include "common/object-pool.h"
nit. May arrange them in ascending order.


http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/util/in-list-filter.cc
File be/src/util/in-list-filter.cc:

http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/util/in-list-filter.cc@38
PS17, Line 38: switch (col_type.type) {
             :     case TYPE_TINYINT:
             :       v = *reinterpret_cast<const int8_t*>(val);
             :       break;
             :     case TYPE_SMALLINT:
             :       v = *reinterpret_cast<const int16_t*>(val);
             :       break;
             :     case TYPE_INT:
             :       v = *reinterpret_cast<const int32_t*>(val);
             :       break;
             :     case TYPE_BIGINT:
             :       v = *reinterpret_cast<const int64_t*>(val);
             :       break;
             :     case TYPE_DATE:
             :       v = reinterpret_cast<const DateValue*>(val)->Value();
             :       break;
             :     case TYPE_STRING:
             :     case TYPE_VARCHAR:
             :       s = reinterpret_cast<const StringValue*>(val);
             :       return str_values_.find(string(s->ptr, s->len)) != 
str_values_.end();
             :     case TYPE_CHAR:
             :       return str_values_.find(string(reinterpret_cast<const 
char*>(val), col_type.len))
             :           != str_values_.end();
             :     default:
             :       DCHECK(false) << "Not support IN-list filter type: " << 
TypeToString(type_);
             :       return false;
             :   }
Future improvement.

We may need to something similar to MinMaxFilter class hierarchy to push the 
type into a subclass of InListFilter. In this way, we do not incur the cost of 
type switching for each Find() call.

182 #define NUMERIC_MIN_MAX_FILTER(NAME, TYPE)                                  
        \
183   class NAME##MinMaxFilter : public MinMaxFilter {                          
        \
184    public:                                                                  
        \
185     NAME##MinMaxFilter() {                                                  
        \  
186       min_ = std::numeric_limits<TYPE>::max();                              
        \
187       max_ = std::numeric_limits<TYPE>::lowest();                           
        \
188     }                                                                       
        \
189     NAME##MinMaxFilter(const MinMaxFilterPB& protobuf);                     
        \
190     virtual ~NAME##MinMaxFilter() {}                                        
        \
191     virtual const void* GetMin() const override { return &min_; }           
        \
192     virtual const void* GetMax() const override { return &max_; }           
        \ 
193     virtual bool GetCastIntMinMax(                                          
        \
194         const ColumnType& type, int64_t* out_min, int64_t* out_max) const 
override; \
195     bool EvalOverlap(                                                       
        \
196         const ColumnType& type, void* data_min, void* data_max) const 
override;     \
197     float ComputeOverlapRatio(                                              
        \
min-max-filter.h


http://gerrit.cloudera.org:8080/#/c/18141/17/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/18141/17/common/thrift/ImpalaService.thrift@725
PS17, Line 725:   // Maximum number of entries in a runtime in-list filter.
nit. distinct


http://gerrit.cloudera.org:8080/#/c/18141/17/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/18141/17/common/thrift/PlanNodes.thrift@a134
PS17, Line 134:
nit. dropped.


http://gerrit.cloudera.org:8080/#/c/18141/17/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/18141/17/testdata/datasets/functional/functional_schema_template.sql@274
PS17, Line 274: neg
nit. may spell out the entire word: alltypestiny_negative


http://gerrit.cloudera.org:8080/#/c/18141/17/testdata/datasets/functional/schema_constraints.csv
File testdata/datasets/functional/schema_constraints.csv:

http://gerrit.cloudera.org:8080/#/c/18141/17/testdata/datasets/functional/schema_constraints.csv@323
PS17, Line 323: alltypestiny_neg
nit. alltypestiny_negative



--
To view, visit http://gerrit.cloudera.org:8080/18141
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501
Gerrit-Change-Number: 18141
Gerrit-PatchSet: 17
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Comment-Date: Thu, 24 Feb 2022 15:25:13 +0000
Gerrit-HasComments: Yes

Reply via email to