Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18141 )

Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables
......................................................................


Patch Set 16:

(9 comments)

Thanks for your feedbacks, Qifan!

Addressed the comments and added tests for
* empty string
* large string that exceeds the limit
* DATE type

Also added some logs that are useful for debugging.

http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG@20
PS14, Line 20: Currently they
             : are generated for the build side of a broadcast join. They will 
only be
             : applied on ORC
> nit. They are generated for the build side of a broadcast join.
Done


http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG@24
PS14, Line 24: ld side exceeds a threshold (default to 1024), we se
> nit. # of distinct values of the build side exceeds a threshold (default to
Done


http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG@39
PS14, Line 39: IN-list filters are disabled by default
> It seems a formal performance test against TPCDs can help decide the defaul
Yeah, hopefully we can turn it on if all the regressions are resolved.


http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/runtime/runtime-filter-bank.cc@380
PS14, Line 380:    
DCHECK(query_state_->query_options().__isset.runtime_in_list_filter_entry_limit);
              :     int entry_limit = 
query_state_->query_options().runtime_in_list_filter_entry_limit;
              :     in_list_filter = 
InListFilter::Create(params.in_list_filter(),
              :         fs->consumed_filter->type(), entry_limit, &obj_pool_);
              :     fs->in_list_filters.push_back(in_list_filter);
              :     
total_in_list_filter_items_->Add(params.in_list_filter().value_size());
              :     details = Substitute(" with $0 items", params.
> Duplicated code with a portion of implementation of RuntimeFilterBank::Allo
The codes are much simpler now.


http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/runtime/runtime-filter-bank.cc@444
PS14, Line 444:
> I wonder if we need this, since runtime_in_list_filter_entry_limit is defau
Yeah, I think thrift will make sure the default value is set. Removed this.


http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/util/in-list-filter-ir.cc
File be/src/util/in-list-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/util/in-list-filter-ir.cc@61
PS14, Line 61: str_va
> should set always_true_ to true here.
Oops, thanks for catching this!


http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/ImpalaService.thrift@725
PS14, Line 725:   // Maximum number of entries in a runtime in-list filter.
> nit. missing comment.
Done


http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/Query.thrift@578
PS14, Line 578:   // See comment in ImpalaService.thrift
> nit. missing comment.
Done


http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test
File testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test:

http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test@127
PS13, Line 127: select STRAIGHT_JOIN count(*) from date_tbl a
> Okay.
Added the DATE tests.



--
To view, visit http://gerrit.cloudera.org:8080/18141
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501
Gerrit-Change-Number: 18141
Gerrit-PatchSet: 16
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Comment-Date: Thu, 24 Feb 2022 03:04:19 +0000
Gerrit-HasComments: Yes

Reply via email to