Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )
Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu ...................................................................... Patch Set 9: (17 comments) http://gerrit.cloudera.org:8080/#/c/15683/9/be/CMakeLists.txt File be/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/15683/9/be/CMakeLists.txt@234 PS9, Line 234: "-DKUDU_HEADERS_NO_STUBS" nit formatting: trailing whitespace and you can wrap this into the rest of the string instead of putting it on its own line http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc@91 PS9, Line 91: //local_bloom_filter->Insert(val, expr_eval->root().type()); I guess this was left by accident? http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc@106 PS9, Line 106: // An example of the generated code for TPCH-Q2: RF002 -> n_regionkey This needs to be updated (though of course it will only change slightly). If you're not sure how to get this, see: https://cwiki.apache.org/confluence/display/IMPALA/Codegen (the easiest way is probably just to add a line at the end of this function that logs fn->dump() or whatever) http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@232 PS9, Line 232: } : else if nit: formatting ('else if' should go on the same line as prior '}') Not sure if I've bugged you about clang-format before, but its a great way to help ensure that you're code reviews don't have these sorts of simple errors. See: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536 Note that clang-format is just a guide and if it's suggestion differs from surrounding code then you can feel free to ignore it http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@234 PS9, Line 234: continue; Brief comment, eg. 'This filter won't actually remove any rows so we don't need to push it down to Kudu' http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@236 PS9, Line 236: else nit: it doesn't actually change how the code works, but it would be better to leave off this 'else' since its not actually mutually exclusive with the prior 'if's http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@238 PS9, Line 238: if (filter != nullptr) { I think we can save ourselves some duplication and an indention level by changing this to 'ctx.filter->HasFilter()' and moving it up to the 'if(AlwaysTrue())' http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@239 PS9, Line 239: auto it = ctx.filter->filter_desc().planid_to_target_ndx.find( : scan_node_->id()); : const TRuntimeFilterTargetDesc &target_desc = : ctx.filter->filter_desc().targets[it->second]; : const string &col_name = target_desc.kudu_col_name; : DCHECK(col_name != ""); This is duplicated below, I think we can eliminate the duplication by moving this up to before the 'if(is_bloom)' http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h File be/src/util/bloom-filter.h: http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@53 PS9, Line 53: class BloomFilterBufferAllocator; I don't think this is necessary http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@73 PS9, Line 73: BloomFilterBufferAllocator nit: maybe name this ImpalaBloomFilterBufferAllocator to ensure there's no confusion http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@91 PS9, Line 91: std::shared_ptr<kudu::BlockBloomFilterBufferAllocatorIf> Clone() const override { If this is something that's really only used in Kudu's internal testing and won't ever get hit in Impala, it might be better to just do a LOG(FATAL) here or something, since then you would be able to simplify the logic of AllocateBuffer()/Close()/etc. and not support a code path that's never getting hit. Probably have to check with Bankim to ensure that's really the case, though http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@111 PS9, Line 111: /// A BloomFilter stores sets of items and offers a query operation indicating whether or This class comment could probably use some updating to reflect the fact that this class is now basically a thin wrapper around Kudu's bloom filter http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@171 PS9, Line 171: void Insert(void* val, const ColumnType& col_type) noexcept; Is this version of Insert() used anywhere? Same with the equivalent Find() below http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@241 PS9, Line 241: HashAlgorithm hash_algorithm_; As far as I can tell, this isn't actually getting used anywhere, and I don't think that it would be valid to pass any value other than FAST_HASH into Init() anyways (since we've hard-coded the use of fasthash in various places such as FilterContext::Insert()), so I think its probably better to remove this, the related parameter to Init(), etc. http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.cc@205 PS9, Line 205: if (is_allocated_) Close(); I suspect its probably supposed to be guaranteed that FreeBuffer() is always called before the allocator is destructed, in which case it would be good to do a LOG(DFATAL) here like we do in the BloomFilter destructor above (in general, its against Impala's style to do any real work in destructors, not sure about Kudu but I assume they have the same rule) http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.cc@226 PS9, Line 226: Close(); // Ensure that any previously allocated memory is released. I wonder if this is really necessary - I would hope that Kudu provides a guarantee that every call to AllocateBuffer will have a corresponding call to FreeBuffer. Might need to ask Bankim to be sure. http://gerrit.cloudera.org:8080/#/c/15683/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/15683/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@723 PS9, Line 723: if (slotRef == null || slotRef.getDesc().getColumn() == null : || (targetExpr instanceof CastExpr) : || filter.getExprCompOp() == Operator.NOT_DISTINCT) { : continue; : } This is almost identical to the 'if' below. I think we can reduce code duplication by moving this outside the "if(BLOOM) {} else(MIN_MAX) {}" section -- To view, visit http://gerrit.cloudera.org:8080/15683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754 Gerrit-Change-Number: 15683 Gerrit-PatchSet: 9 Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Bankim Bhavsar <ban...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com> Gerrit-Comment-Date: Mon, 04 May 2020 21:57:27 +0000 Gerrit-HasComments: Yes