Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11333 )
Change subject: [KUDU-2521] Java Implementation for BloomFilter ...................................................................... Patch Set 6: (16 comments) http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java: http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@225 PS5, Line 225: return checkIfContains(byteBuffer); > I'm not sure here. Actually, the length of 'bytes' doesn't matter because it's hashed into a single long regardless. http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java: http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@29 PS6, Line 29: * An space-efficient filter and offers an approximate containment check. Nit: "A space-efficient filter which offers an approximate containment check"? http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@37 PS6, Line 37: shrink the amount Nit: "constrain the number" http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@39 PS6, Line 39: you are expecting TServer to return records have : * the same value in a scan. Nit: "you expect the TServer to return records with the same value in a scan". http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@52 PS6, Line 52: * // TODO: implemnt the interface for serializaing and sending "implement" and "serializing" http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@69 PS6, Line 69: if (bitSet.size() < 8) { : throw new IllegalArgumentException("Number of bits in bitset should be at least 8, but found " : + bitSet.length()); : } You can use Guava's Preconditions to simplify these assertions: Preconditions.checkArgument(bitset.size() < 8, "..."); http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@73 PS6, Line 73: this.nBits = bitSet.size(); Could you look into whether bitSet.size() is a constant time operation? If it is, there's no need to store nBits separately; we could just query bitSet.size() whenever we need the number of bits. http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@82 PS6, Line 82: * @param nBytes size of Bloom filter Nit: "size of bloom filter in bytes" http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@83 PS6, Line 83: * @param fpRate false positive rate Could you elaborate on what this means? How should a user know what value to provide here? Below as well. http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@89 PS6, Line 89: public static BloomFilter BySizeAndFPRate(int nBytes, double fpRate, HashFunction hashFunction) { Please doc this one too. http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@97 PS6, Line 97: Bloom Nit: lower case "bloom" (to be consistent with how you've written "bloom filter" in other Javadoc. http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@112 PS6, Line 112: public void put(byte[] data) { Please Javadoc the various put() methods. http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@163 PS6, Line 163: : public byte[] getBitSet() { : return bitSet.toByteArray(); : } : : public int getNHashes() { : return nHashes; : } : : public String getHashFunctionName() { : return hashFunction.toString(); : } Are these testing only? If so, annotate them as such. If not, provide Javadoc. http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@200 PS6, Line 200: assert (byteBuffer.length >= length); Nit: use Guava's Preconditions here. http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java: http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java@29 PS5, Line 29: private int kRandomSeed = 0xdeadbeef; > By reusing the same seed, you'll get the exact same PRNG sequencing with ev Did you see this comment? http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java: http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java@35 PS6, Line 35: assertTrue(BloomFilter.ByCountAndFPRate(10, 0.1).getNHashes() == 3); You can use assertEquals() here instead. -- To view, visit http://gerrit.cloudera.org:8080/11333 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e Gerrit-Change-Number: 11333 Gerrit-PatchSet: 6 Gerrit-Owner: jinxing6...@126.com Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Dan Burkert <danburk...@apache.org> Gerrit-Reviewer: Grant Henke <granthe...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: jinxing6...@126.com Gerrit-Comment-Date: Wed, 12 Sep 2018 18:41:30 +0000 Gerrit-HasComments: Yes