Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/18141 )
Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables ...................................................................... Patch Set 13: (18 comments) Looks very good! http://gerrit.cloudera.org:8080/#/c/18141/4/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/18141/4/be/src/exec/hdfs-orc-scanner.cc@1271 PS4, Line 1271: gumentBuilder* sarg) { > hmm, I don't think we will use unmerged versions of runtime filters here. f Yeah, since the focus of the patch is for broadcasting HJ, I think we are okay. The merge for partitioned HJ is done here https://github.com/apache/impala/blob/master/be/src/runtime/runtime-filter-bank.cc#L206 and here https://github.com/apache/impala/blob/master/be/src/runtime/runtime-filter.cc#L52. We probably should specifically handle in-list case for https://github.com/apache/impala/blob/master/be/src/runtime/runtime-filter.cc#L52. http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc File be/src/util/in-list-filter-ir.cc: http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc@26 PS13, Line 26: if (val == nullptr) { UNLIKELY http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc@30 PS13, Line 30: if (values_.size() >= entry_limit_ || str_values_.size() >= entry_limit_) { UNLIKELY http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc@55 PS13, Line 55: if (s->ptr == nullptr) { : contains_null_ = true nit. should we check null-ness again? See line at 26. http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc@58 PS13, Line 58: str_size_ nit. Probably should be named as str_total_size_. http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.h File be/src/util/in-list-filter.h: http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.h@39 PS13, Line 39: InListFilter(uint32_t entry_limit) : always_true_(false), entry_limit_(entry_limit) {} Include contains_null and column type here. http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc File be/src/util/in-list-filter.cc: http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@63 PS13, Line 63: break "return false" here helps with release code. http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@71 PS13, Line 71: filter->type_ = type; nit. it is better to supply the type in the cstr. http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@78 PS13, Line 78: filter->type_ = type; : filter->contains_null_ = protobuf.contains_null(); : filter->always_true_ = protobuf.always_true(); nit. probably should inited in the cstr. http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@100 PS13, Line 100: break; return null? http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@112 PS13, Line 112: out->set_always_true(in.always_true()); : for (ColumnValuePB in_value : in.value()) { : ColumnValuePB* out_value = out->add_value(); : *out_value = in_value; : } Other fields that are not copied: type_, contains_null_, str_size_ and entry_limit_. http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@142 PS13, Line 142: } Same comment for Copy method: missing fields. http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@174 PS13, Line 174: ']' should handle null case. http://gerrit.cloudera.org:8080/#/c/18141/8/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/18141/8/common/thrift/ImpalaService.thrift@725 PS8, Line 725: RUNTIME_IN_LIST_FILTER_ENTRY_LIMIT > I think having the RUNTIME prefix is consistent with existing options, e.g. Done http://gerrit.cloudera.org:8080/#/c/18141/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/18141/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@394 PS4, Line 394: o > I added a check for the inner side started from patch set 5: https://gerrit Okay. Thanks. http://gerrit.cloudera.org:8080/#/c/18141/8/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test: http://gerrit.cloudera.org:8080/#/c/18141/8/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@8 PS8, Line 8: 3.44K > Sorry, these should not be introduced. I replace the test files so got thes Great. Thanks a lot for taking care if it. I wonder if we should also check that estimated memory or cardinality do not change when no in-list filter is present in a query. http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test: http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@787 PS13, Line 787: broadcast May repeat this test with partition HJ to verify that in-list filters is not present. http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test File testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test: http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test@127 PS13, Line 127: may add a test on date column type. -- To view, visit http://gerrit.cloudera.org:8080/18141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501 Gerrit-Change-Number: 18141 Gerrit-PatchSet: 13 Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Comment-Date: Tue, 22 Feb 2022 17:37:45 +0000 Gerrit-HasComments: Yes