jinxing6...@126.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11077 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@22
PS1, Line 22: public class BloomFilter {
> Is this meant to be part of the public API or an internal-facing class? I'm
True, I updated and added some annotation.


http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@40
PS1, Line 40: genBloomKeyProbe
> this pattern seems like it will create a new object for each lookup in the
True,  I updated in current change.


http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@86
PS1, Line 86: CharsetUtil
> I think we can use StandardCharsets from java 7 here
Updated


http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@139
PS1, Line 139: CityHash
> I'm curious why the decision to use CityHash here? We are using murmur hash
Yes, in Kudu kernel, we are using Murmur2 for partitioning and CityHash for 
blooms on PK lookups. The hashing added in this change is only used for bloom 
filter. So I think it's necessary to make the hashing consistent with the 
blooms on PK lookups in TServer.
I was also looking forward to use the bloom filters to eliminate entire 
partitions. But it was hard. In scenario of Join, the small table used to 
generate BloomFilter is not always small enough, thus hard to eliminate the 
entire partition.


http://gerrit.cloudera.org:8080/#/c/11077/3/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11077/3/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@28
PS3, Line 28:   public BloomFilter(BloomFilterSizing sizing) {
Added some annotations


http://gerrit.cloudera.org:8080/#/c/11077/3/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@169
PS3, Line 169:
Yes, in Kudu kernel, we are using Murmur2 for partitioning and CityHash for 
blooms on PK lookups. The hashing added in this change is only used for bloom 
filter. So I think it's necessary to make the hashing consistent with the 
blooms on PK lookups in TServer.
I was also looking forward to use the bloom filters to eliminate entire 
partitions. But it was hard. In scenario of Join, the small table used to 
generate BloomFilter is not always small enough, thus hard to eliminate the 
entire partition.



--
To view, visit http://gerrit.cloudera.org:8080/11077
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8
Gerrit-Change-Number: 11077
Gerrit-PatchSet: 2
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Mon, 13 Aug 2018 09:59:24 +0000
Gerrit-HasComments: Yes

Reply via email to