Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18141 )
Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables ...................................................................... Patch Set 16: (9 comments) Thanks for your feedbacks, Qifan! Addressed the comments and added tests for * empty string * large string that exceeds the limit * DATE type Also added some logs that are useful for debugging. http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG@20 PS14, Line 20: Currently they : are generated for the build side of a broadcast join. They will only be : applied on ORC > nit. They are generated for the build side of a broadcast join. Done http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG@24 PS14, Line 24: ld side exceeds a threshold (default to 1024), we se > nit. # of distinct values of the build side exceeds a threshold (default to Done http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG@39 PS14, Line 39: IN-list filters are disabled by default > It seems a formal performance test against TPCDs can help decide the defaul Yeah, hopefully we can turn it on if all the regressions are resolved. http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/runtime/runtime-filter-bank.cc File be/src/runtime/runtime-filter-bank.cc: http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/runtime/runtime-filter-bank.cc@380 PS14, Line 380: DCHECK(query_state_->query_options().__isset.runtime_in_list_filter_entry_limit); : int entry_limit = query_state_->query_options().runtime_in_list_filter_entry_limit; : in_list_filter = InListFilter::Create(params.in_list_filter(), : fs->consumed_filter->type(), entry_limit, &obj_pool_); : fs->in_list_filters.push_back(in_list_filter); : total_in_list_filter_items_->Add(params.in_list_filter().value_size()); : details = Substitute(" with $0 items", params. > Duplicated code with a portion of implementation of RuntimeFilterBank::Allo The codes are much simpler now. http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/runtime/runtime-filter-bank.cc@444 PS14, Line 444: > I wonder if we need this, since runtime_in_list_filter_entry_limit is defau Yeah, I think thrift will make sure the default value is set. Removed this. http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/util/in-list-filter-ir.cc File be/src/util/in-list-filter-ir.cc: http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/util/in-list-filter-ir.cc@61 PS14, Line 61: str_va > should set always_true_ to true here. Oops, thanks for catching this! http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/ImpalaService.thrift@725 PS14, Line 725: // Maximum number of entries in a runtime in-list filter. > nit. missing comment. Done http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/Query.thrift File common/thrift/Query.thrift: http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/Query.thrift@578 PS14, Line 578: // See comment in ImpalaService.thrift > nit. missing comment. Done http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test File testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test: http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test@127 PS13, Line 127: select STRAIGHT_JOIN count(*) from date_tbl a > Okay. Added the DATE tests. -- To view, visit http://gerrit.cloudera.org:8080/18141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501 Gerrit-Change-Number: 18141 Gerrit-PatchSet: 16 Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Comment-Date: Thu, 24 Feb 2022 03:04:19 +0000 Gerrit-HasComments: Yes