Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18141 )

Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables
......................................................................


Patch Set 13:

(18 comments)

Looks very good!

http://gerrit.cloudera.org:8080/#/c/18141/4/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18141/4/be/src/exec/hdfs-orc-scanner.cc@1271
PS4, Line 1271: gumentBuilder* sarg) {
> hmm, I don't think we will use unmerged versions of runtime filters here. f
Yeah, since the focus of the patch is for broadcasting HJ, I think we are okay.

The merge for partitioned HJ is done here 
https://github.com/apache/impala/blob/master/be/src/runtime/runtime-filter-bank.cc#L206
 and here 
https://github.com/apache/impala/blob/master/be/src/runtime/runtime-filter.cc#L52.

We probably should specifically handle in-list case for 
https://github.com/apache/impala/blob/master/be/src/runtime/runtime-filter.cc#L52.


http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc
File be/src/util/in-list-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc@26
PS13, Line 26:   if (val == nullptr) {
UNLIKELY


http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc@30
PS13, Line 30:   if (values_.size() >= entry_limit_ || str_values_.size() >= 
entry_limit_) {
UNLIKELY


http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc@55
PS13, Line 55: if (s->ptr == nullptr) {
             :         contains_null_ = true
nit. should we check null-ness again? See line at 26.


http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc@58
PS13, Line 58: str_size_
nit. Probably should be named as str_total_size_.


http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.h
File be/src/util/in-list-filter.h:

http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.h@39
PS13, Line 39:   InListFilter(uint32_t entry_limit) : always_true_(false), 
entry_limit_(entry_limit) {}
Include contains_null and column type here.


http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc
File be/src/util/in-list-filter.cc:

http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@63
PS13, Line 63: break
"return false" here helps with release code.


http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@71
PS13, Line 71: filter->type_ = type;
nit. it is better to supply the type in the cstr.


http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@78
PS13, Line 78:   filter->type_ = type;
             :   filter->contains_null_ = protobuf.contains_null();
             :   filter->always_true_ = protobuf.always_true();
nit. probably should inited in the cstr.


http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@100
PS13, Line 100: break;
return null?


http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@112
PS13, Line 112: out->set_always_true(in.always_true());
              :   for (ColumnValuePB in_value : in.value()) {
              :     ColumnValuePB* out_value = out->add_value();
              :     *out_value = in_value;
              :   }
Other fields that are not copied: type_, contains_null_,  str_size_ and 
entry_limit_.


http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@142
PS13, Line 142:   }
Same comment for Copy method: missing fields.


http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@174
PS13, Line 174: ']'
should handle null case.


http://gerrit.cloudera.org:8080/#/c/18141/8/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/18141/8/common/thrift/ImpalaService.thrift@725
PS8, Line 725: RUNTIME_IN_LIST_FILTER_ENTRY_LIMIT
> I think having the RUNTIME prefix is consistent with existing options, e.g.
Done


http://gerrit.cloudera.org:8080/#/c/18141/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/18141/4/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@394
PS4, Line 394: o
> I added a check for the inner side started from patch set 5: https://gerrit
Okay. Thanks.


http://gerrit.cloudera.org:8080/#/c/18141/8/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/18141/8/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@8
PS8, Line 8: 3.44K
> Sorry, these should not be introduced. I replace the test files so got thes
Great. Thanks a lot for taking care if it.

I wonder if we should also check that estimated memory or cardinality do not 
change when no in-list filter is present in a query.


http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test:

http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@787
PS13, Line 787: broadcast
May repeat this test with partition HJ to verify that in-list filters is not 
present.


http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test
File testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test:

http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test@127
PS13, Line 127:
may add a test on date column type.



--
To view, visit http://gerrit.cloudera.org:8080/18141
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501
Gerrit-Change-Number: 18141
Gerrit-PatchSet: 13
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Comment-Date: Tue, 22 Feb 2022 17:37:45 +0000
Gerrit-HasComments: Yes

Reply via email to