Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/18141 )
Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables ...................................................................... Patch Set 14: (10 comments) Great. Thanks a lot for the rework. http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG@20 PS14, Line 20: Currently they : are generated only for small build side (e.g. #rows <= 1024) of a : broadcast join. nit. They are generated for the build side of a broadcast join. Suggest not to mention small build side until a few sentences later. http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG@24 PS14, Line 24: #rows of the build side exceeds the threshold (1024) nit. # of distinct values of the build side exceeds a threshold (default to 1024). http://gerrit.cloudera.org:8080/#/c/18141/14//COMMIT_MSG@39 PS14, Line 39: IN-list filters are disabled by default It seems a formal performance test against TPCDs can help decide the default setting. Maybe we do this once Impala 11140, 11141 and 11142 are resolved. http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/runtime/runtime-filter-bank.cc File be/src/runtime/runtime-filter-bank.cc: http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/runtime/runtime-filter-bank.cc@380 PS14, Line 380: uint32_t entry_limit = InListFilter::DEFAULT_ENTRY_LIMIT; : if (query_state_->query_options().__isset.runtime_in_list_filter_entry_limit) { : entry_limit = query_state_->query_options().runtime_in_list_filter_entry_limit; : } : in_list_filter = InListFilter::Create(params.in_list_filter(), : fs->consumed_filter->type(), entry_limit, &obj_pool_); : fs->in_list_filters.push_back(in_list_filter); Duplicated code with a portion of implementation of RuntimeFilterBank::AllocateScratchInListFilte(). http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/runtime/runtime-filter-bank.cc@444 PS14, Line 444: :DEFAULT_ENTRY_LIMIT; I wonder if we need this, since runtime_in_list_filter_entry_limit is default to 1024 and covers the non-negative domain. http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc File be/src/util/in-list-filter-ir.cc: http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc@55 PS13, Line 55: if (UNLIKELY(s->ptr == nullptr)) { : contains_null_ = true > The default constructor of StringValue creates a null 'ptr'. I think we'd b Done http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/util/in-list-filter-ir.cc File be/src/util/in-list-filter-ir.cc: http://gerrit.cloudera.org:8080/#/c/18141/14/be/src/util/in-list-filter-ir.cc@61 PS14, Line 61: return should set always_true_ to true here. http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/ImpalaService.thrift@725 PS14, Line 725: RUNTIME_IN_LIST_FILTER_ENTRY_LIMIT = 142; nit. missing comment. http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/Query.thrift File common/thrift/Query.thrift: http://gerrit.cloudera.org:8080/#/c/18141/14/common/thrift/Query.thrift@578 PS14, Line 578: 143: optional i32 runtime_in_list_filter_entry_limit = 1024; nit. missing comment. http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test File testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test: http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test@127 PS13, Line 127: > Sure. The ORC date_tbl is corrupted and need to wait for https://gerrit.clo Okay. -- To view, visit http://gerrit.cloudera.org:8080/18141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501 Gerrit-Change-Number: 18141 Gerrit-PatchSet: 14 Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Comment-Date: Wed, 23 Feb 2022 16:10:22 +0000 Gerrit-HasComments: Yes