Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
......................................................................


Patch Set 6:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@225
PS5, Line 225:     return checkIfContains(byteBuffer);
> I'm not sure here.
Actually, the length of 'bytes' doesn't matter because it's hashed into a 
single long regardless.


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@29
PS6, Line 29:  * An space-efficient filter and offers an approximate 
containment check.
Nit: "A space-efficient filter which offers an approximate containment check"?


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@37
PS6, Line 37: shrink the amount
Nit: "constrain the number"


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@39
PS6, Line 39: you are expecting TServer to return records have
            :  * the same value in a scan.
Nit: "you expect the TServer to return records with the same value in a scan".


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@52
PS6, Line 52:  *   // TODO: implemnt the interface for serializaing and sending
"implement" and "serializing"


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@69
PS6, Line 69:     if (bitSet.size() < 8) {
            :       throw new IllegalArgumentException("Number of bits in 
bitset should be at least 8, but found "
            :         + bitSet.length());
            :     }
You can use Guava's Preconditions to simplify these assertions:

  Preconditions.checkArgument(bitset.size() < 8, "...");


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@73
PS6, Line 73:     this.nBits = bitSet.size();
Could you look into whether bitSet.size() is a constant time operation? If it 
is, there's no need to store nBits separately; we could just query 
bitSet.size() whenever we need the number of bits.


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@82
PS6, Line 82:    * @param nBytes size of Bloom filter
Nit: "size of bloom filter in bytes"


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@83
PS6, Line 83:    * @param fpRate false positive rate
Could you elaborate on what this means? How should a user know what value to 
provide here?

Below as well.


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@89
PS6, Line 89:   public static BloomFilter BySizeAndFPRate(int nBytes, double 
fpRate, HashFunction hashFunction) {
Please doc this one too.


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@97
PS6, Line 97: Bloom
Nit: lower case "bloom" (to be consistent with how you've written "bloom 
filter" in other Javadoc.


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@112
PS6, Line 112:   public void put(byte[] data) {
Please Javadoc the various put() methods.


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@163
PS6, Line 163:
             :   public byte[] getBitSet() {
             :     return bitSet.toByteArray();
             :   }
             :
             :   public int getNHashes() {
             :     return nHashes;
             :   }
             :
             :   public String getHashFunctionName() {
             :     return hashFunction.toString();
             :   }
Are these testing only? If so, annotate them as such. If not, provide Javadoc.


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@200
PS6, Line 200:     assert (byteBuffer.length >= length);
Nit: use Guava's Preconditions here.


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java@29
PS5, Line 29:   private int kRandomSeed = 0xdeadbeef;
> By reusing the same seed, you'll get the exact same PRNG sequencing with ev
Did you see this comment?


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java@35
PS6, Line 35:     assertTrue(BloomFilter.ByCountAndFPRate(10, 0.1).getNHashes() 
== 3);
You can use assertEquals() here instead.



--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 6
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Wed, 12 Sep 2018 18:41:30 +0000
Gerrit-HasComments: Yes

Reply via email to