Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11100 )

Change subject: Implement BloomFilter Predicate in server side.
......................................................................


Patch Set 9:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate-test.cc
File src/kudu/common/column_predicate-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate-test.cc@881
PS9, Line 881: InList
Should this be 'Equality'?


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@230
PS9, Line 230:   bool EvaluateCellForBloomFilter(DataType type, const void* 
cell) const;
Is this used anywhere?  If it's test only please indicate that in the doc.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@376
PS9, Line 376:       typedef typename DataTypeTraits<PhysicalType>::cpp_type 
cpp_type;
             :       size_t size = sizeof(cpp_type);
             :       const void* data = cell;
             :       if (PhysicalType == BINARY) {
             :         const Slice *slice = reinterpret_cast<const Slice 
*>(cell);
             :         size = slice->size();
             :         data = slice->data();
             :       }
             :       Slice cell_slice(reinterpret_cast<const uint8_t *>(data), 
size);
             :       BloomKeyProbe probe(cell_slice, bf.hash_algorithm());
This entire portion could be hoisted outside the for loop, which saves 
recomputing the probe in the case of multiple filters.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@386
PS9, Line 386:       if (!BloomFilter(bf.bloom_data(), 
bf.nhash()).MayContainKey(probe)) {
Once this for loop is simplified to just be this call, it should further 
simplify to just use std::all_of


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc
File src/kudu/common/column_predicate.cc:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@115
PS9, Line 115:   return ColumnPredicate(PredicateType::InBloomFilter, 
move(column), bfs, lower, upper);
I think this should be calling Simplify(), or if there's a reason not to please 
leave a comment.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@385
PS9, Line 385:       bloom_filters_.insert(bloom_filters_.end(), 
other.bloom_filters().begin(),
Pretty sure this can be simplified to a straight copy, eg:

bloom_filters_ = other.bloom_filters_;


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@387
PS9, Line 387:       if (other.lower_ != nullptr &&
This range portion of this code doesn't need to be duplicated if you move 
InBloomFilter above Range, and let it fall-through.  Make sure to annotate with 
FALLTHROUGH_INTENDED, though.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@588
PS9, Line 588:       if (new_values.empty()) {
Simplify() should already handle this, so I think you can reduce this to just 
the else clause.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@644
PS9, Line 644:           new_values.emplace_back(value);
can this use copy_if like above?  I think it's more elegant that way.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@648
PS9, Line 648:         SetToNone();
likewise, I think this is already handled by the Simplify() call.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@661
PS9, Line 661:       if (other.lower_ != nullptr &&
likewise I think this can be simplified with the fallthrough trick.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@895
PS9, Line 895:     case PredicateType::InBloomFilter: {
This could be simplified by moving it above Range and allowing it to fall 
through to check the range portion (after checking the annotations).  Annotate 
with FALLTHROUGH_INTENDED if you do that.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@899
PS9, Line 899:       auto bound_equal = [&] (const void* eleml, const void* 
elemr) {
This is nice, consider doing the same simplification for Range.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/common.proto
File src/kudu/common/common.proto:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/common.proto@347
PS9, Line 347: enum HashAlgorithmInBloomFilter {
I suggested this back in PS5 but I think there was some confusion.  I think you 
can call this just 'HashAlgorithm', and use it for both bloom filters as well 
as in HashBucketSchemaPB (where the existing one will be removed).  To do that 
compatibly the existing HashAlgorithm values have to remain, so it should be:

HashAlgorithm {
  UNKNOWN = 0;
  MURMUR_HASH_2 = 1;
  CITY_HASH = 2;
}


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/wire_protocol.cc@604
PS9, Line 604:         if (!bf.has_nhash() || !bf.has_bloom_data() || 
!bf.has_hash_algorithm()) {
This should also be checking for BloomFilter::UNKNOWN when that variant is 
added.



--
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 9
Gerrit-Owner: ZhangYao <triplesheep0...@gmail.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: ZhangYao <triplesheep0...@gmail.com>
Gerrit-Comment-Date: Fri, 05 Oct 2018 05:08:37 +0000
Gerrit-HasComments: Yes

Reply via email to