Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18141 )

Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables
......................................................................


Patch Set 14:

(10 comments)

Great. Thanks a lot for the rework.

http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG@20
PS14, Line 20: Currently they
             : are generated only for small build side (e.g. #rows <= 1024) of a
             : broadcast join.
nit. They are generated for the build side of a broadcast join.

Suggest not to mention small build side until a few sentences later.


http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG@24
PS14, Line 24: #rows of the build side exceeds the threshold (1024)
nit. # of distinct values of the build side exceeds a threshold (default to 
1024).


http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG@39
PS14, Line 39: IN-list filters are disabled by default
It seems a formal performance test against TPCDs can help decide the default 
setting. Maybe we do this once Impala 11140, 11141 and 11142 are resolved.


http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/runtime/runtime-filter-bank.cc@380
PS14, Line 380:    uint32_t entry_limit = InListFilter::DEFAULT_ENTRY_LIMIT;
              :     if 
(query_state_->query_options().__isset.runtime_in_list_filter_entry_limit) {
              :       entry_limit = 
query_state_->query_options().runtime_in_list_filter_entry_limit;
              :     }
              :     in_list_filter = 
InListFilter::Create(params.in_list_filter(),
              :         fs->consumed_filter->type(), entry_limit, &obj_pool_);
              :     fs->in_list_filters.push_back(in_list_filter);
Duplicated code with a portion of implementation of 
RuntimeFilterBank::AllocateScratchInListFilte().


http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/runtime/runtime-filter-bank.cc@444
PS14, Line 444: :DEFAULT_ENTRY_LIMIT;
I wonder if we need this, since runtime_in_list_filter_entry_limit is default 
to 1024 and covers the non-negative domain.


http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc
File be/src/util/in-list-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc@55
PS13, Line 55: if (UNLIKELY(s->ptr == nullptr)) {
             :         contains_null_ = true
> The default constructor of StringValue creates a null 'ptr'. I think we'd b
Done


http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/util/in-list-filter-ir.cc
File be/src/util/in-list-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/util/in-list-filter-ir.cc@61
PS14, Line 61: return
should set always_true_ to true here.


http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/ImpalaService.thrift@725
PS14, Line 725:   RUNTIME_IN_LIST_FILTER_ENTRY_LIMIT = 142;
nit. missing comment.


http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/Query.thrift@578
PS14, Line 578:   143: optional i32 runtime_in_list_filter_entry_limit = 1024;
nit. missing comment.


http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test
File testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test:

http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test@127
PS13, Line 127:
> Sure. The ORC date_tbl is corrupted and need to wait for https://gerrit.clo
Okay.



--
To view, visit http://gerrit.cloudera.org:8080/18141
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501
Gerrit-Change-Number: 18141
Gerrit-PatchSet: 14
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Comment-Date: Wed, 23 Feb 2022 16:10:22 +0000
Gerrit-HasComments: Yes

Reply via email to