jinxing6...@126.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/11077 )
Change subject: [KUDU-2521] Java Implementation for BloomFilter ...................................................................... Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java: http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@22 PS1, Line 22: public class BloomFilter { > Is this meant to be part of the public API or an internal-facing class? I'm True, I updated and added some annotation. http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@40 PS1, Line 40: genBloomKeyProbe > this pattern seems like it will create a new object for each lookup in the True, I updated in current change. http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@86 PS1, Line 86: CharsetUtil > I think we can use StandardCharsets from java 7 here Updated http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@139 PS1, Line 139: CityHash > I'm curious why the decision to use CityHash here? We are using murmur hash Yes, in Kudu kernel, we are using Murmur2 for partitioning and CityHash for blooms on PK lookups. The hashing added in this change is only used for bloom filter. So I think it's necessary to make the hashing consistent with the blooms on PK lookups in TServer. I was also looking forward to use the bloom filters to eliminate entire partitions. But it was hard. In scenario of Join, the small table used to generate BloomFilter is not always small enough, thus hard to eliminate the entire partition. http://gerrit.cloudera.org:8080/#/c/11077/3/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java: http://gerrit.cloudera.org:8080/#/c/11077/3/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@28 PS3, Line 28: public BloomFilter(BloomFilterSizing sizing) { Added some annotations http://gerrit.cloudera.org:8080/#/c/11077/3/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@169 PS3, Line 169: Yes, in Kudu kernel, we are using Murmur2 for partitioning and CityHash for blooms on PK lookups. The hashing added in this change is only used for bloom filter. So I think it's necessary to make the hashing consistent with the blooms on PK lookups in TServer. I was also looking forward to use the bloom filters to eliminate entire partitions. But it was hard. In scenario of Join, the small table used to generate BloomFilter is not always small enough, thus hard to eliminate the entire partition. -- To view, visit http://gerrit.cloudera.org:8080/11077 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8 Gerrit-Change-Number: 11077 Gerrit-PatchSet: 2 Gerrit-Owner: jinxing6...@126.com Gerrit-Reviewer: Dan Burkert <danburk...@apache.org> Gerrit-Reviewer: Grant Henke <granthe...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: jinxing6...@126.com Gerrit-Comment-Date: Mon, 13 Aug 2018 09:59:24 +0000 Gerrit-HasComments: Yes