wgtmac commented on code in PR #1020: URL: https://github.com/apache/parquet-mr/pull/1020#discussion_r1070853151
########## parquet-column/src/main/java/org/apache/parquet/column/values/bloomfilter/BlockSplitBloomFilter.java: ########## @@ -394,4 +395,24 @@ public long hash(float value) { public long hash(Binary value) { return hashFunction.hashBytes(value.getBytes()); } + + @Override + public void merge(BloomFilter otherBloomFilter) throws IOException { + Preconditions.checkArgument(otherBloomFilter != null, "Cannot merge a null BloomFilter"); + Preconditions.checkArgument((getAlgorithm() == otherBloomFilter.getAlgorithm()), + "BloomFilters must have the same algorithm (%s != %s)", + getAlgorithm(), otherBloomFilter.getAlgorithm()); + Preconditions.checkArgument((getHashStrategy() == otherBloomFilter.getHashStrategy()), + "BloomFilters must have the same hashStrategy (%s != %s)", + getHashStrategy(), otherBloomFilter.getHashStrategy()); + Preconditions.checkArgument((getBitsetSize() == otherBloomFilter.getBitsetSize()), Review Comment: Maybe we should put this check to the 1st place to make it fail fast. Usually parameters above do not change. ########## parquet-column/src/main/java/org/apache/parquet/column/values/bloomfilter/BlockSplitBloomFilter.java: ########## @@ -394,4 +395,24 @@ public long hash(float value) { public long hash(Binary value) { return hashFunction.hashBytes(value.getBytes()); } + + @Override + public void merge(BloomFilter otherBloomFilter) throws IOException { Review Comment: I still think it is helpful to extract the check logic to a separate method like `boolean canMergeFrom(BloomFilter other)` because user has to check these parameters anyway before calling the `merge` method. Then `merge` implementation can simply call `canMergeFrom`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org