yabola commented on code in PR #1020:
URL: https://github.com/apache/parquet-mr/pull/1020#discussion_r1070591022
########## parquet-column/src/test/java/org/apache/parquet/column/values/bloomfilter/TestBlockSplitBloomFilter.java: ##########
@@ -181,6 +182,60 @@ public void testBloomFilterNDVs(){
     assertTrue(bytes < 5 * 1024 * 1024);
   }
 
+  @Test
+  public void testMergeBloomFilter() throws IOException {
+    Random random = new Random();
+    int numBytes = BlockSplitBloomFilter.optimalNumOfBits(1024 * 1024, 0.01) / 8;
+    BloomFilter otherBloomFilter = new BlockSplitBloomFilter(numBytes);
+    BloomFilter mergedBloomFilter = new BlockSplitBloomFilter(numBytes);
+
+    Set<String> originStrings = new HashSet<>();
+    Set<String> testStrings = new HashSet<>();
+    Set<Integer> testInts = new HashSet<>();
+    Set<Double> testDoubles = new HashSet<>();
+    Set<Float> testFloats = new HashSet<>();
+    for (int i = 0; i < 1024; i++) {
+
+      String originStrValue = RandomStringUtils.randomAlphabetic(1, 64);
+      originStrings.add(originStrValue);
+      mergedBloomFilter.insertHash(otherBloomFilter.hash(Binary.fromString(originStrValue)));
+
+      String testString = RandomStringUtils.randomAlphabetic(1, 64);
+      testStrings.add(testString);
+      otherBloomFilter.insertHash(otherBloomFilter.hash(Binary.fromString(testString)));
+
+      int testInt = random.nextInt();
+      testInts.add(testInt);
+      otherBloomFilter.insertHash(otherBloomFilter.hash(testInt));
+
+      double testDouble = random.nextDouble();
+      testDoubles.add(testDouble);
+      otherBloomFilter.insertHash(otherBloomFilter.hash(testDouble));
+
+      float testFloat = random.nextFloat();
+      testFloats.add(testFloat);
+      otherBloomFilter.insertHash(otherBloomFilter.hash(testFloat));
+    }
+    mergedBloomFilter.merge(otherBloomFilter);
+    for (String testString : originStrings) {
+      assertTrue(mergedBloomFilter.findHash(mergedBloomFilter.hash(Binary.fromString(testString))));

Review Comment:
Because I added random values, if the BloomFilter that receives the merge (mergedBloomFilter) is not empty beforehand, there is a small probability that the two BloomFilters give different answers when checking whether a hash value is present. Merging ORs in the bits of otherBloomFilter, so mergedBloomFilter can report extra positives (false positives caused by its own values) that otherBloomFilter does not.
But if that BloomFilter is empty at the beginning, the merge leaves it with exactly the same bits as the other one, so the two BloomFilters should always return the same result.
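To illustrate that reasoning, here is a minimal Java sketch. It uses a hypothetical ToyBloomFilter class, a classic single-bitset Bloom filter rather than Parquet's split-block layout, and a SplitMix64-style mixer in place of the real hash function. It assumes merge ORs the two bit sets, which is the usual way to merge Bloom filters of the same shape. An empty target ends up with exactly the other filter's bits, so every lookup agrees; a non-empty target can only gain extra positives.

```java
import java.util.BitSet;

// Hypothetical demo: compares lookups on a merged filter against the filter it absorbed.
final class MergeDemo {
  static final int NUM_BITS = 8192;
  static final int NUM_HASHES = 3;

  // Builds a filter containing every long in [from, to).
  static ToyBloomFilter filterWith(long from, long to) {
    ToyBloomFilter filter = new ToyBloomFilter(NUM_BITS, NUM_HASHES);
    for (long v = from; v < to; v++) {
      filter.insertHash(ToyBloomFilter.hash(v));
    }
    return filter;
  }

  // Counts probe values in [0, probes) on which the two filters give different answers.
  static int disagreements(ToyBloomFilter a, ToyBloomFilter b, int probes) {
    int count = 0;
    for (long v = 0; v < probes; v++) {
      long h = ToyBloomFilter.hash(v);
      if (a.findHash(h) != b.findHash(h)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    ToyBloomFilter other = filterWith(1_000_000, 1_001_000);

    // Target starts empty: after the merge it holds exactly the bits of other.
    ToyBloomFilter emptyTarget = new ToyBloomFilter(NUM_BITS, NUM_HASHES);
    emptyTarget.merge(other);

    // Target already holds its own values: its extra bits can add false positives.
    ToyBloomFilter nonEmptyTarget = filterWith(2_000_000, 2_001_000);
    nonEmptyTarget.merge(other);

    System.out.println("empty target disagreements: " + disagreements(emptyTarget, other, 10_000));
    System.out.println("non-empty target disagreements: " + disagreements(nonEmptyTarget, other, 10_000));
  }
}

// Hypothetical toy Bloom filter (NOT Parquet's BlockSplitBloomFilter): one BitSet,
// NUM_HASHES bit positions per value, used only to illustrate OR-based merging.
final class ToyBloomFilter {
  private final int numBits;
  private final int numHashes;
  private final BitSet bits;

  ToyBloomFilter(int numBits, int numHashes) {
    this.numBits = numBits;
    this.numHashes = numHashes;
    this.bits = new BitSet(numBits);
  }

  // SplitMix64 finalizer, standing in for a real 64-bit hash function.
  static long hash(long value) {
    long z = value + 0x9E3779B97F4A7C15L;
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }

  // Double hashing: the i-th bit position is h1 + i * h2, reduced modulo numBits.
  private int position(long hash, int i) {
    int h1 = (int) hash;
    int h2 = (int) (hash >>> 32);
    return Math.floorMod(h1 + i * h2, numBits);
  }

  void insertHash(long hash) {
    for (int i = 0; i < numHashes; i++) {
      bits.set(position(hash, i));
    }
  }

  boolean findHash(long hash) {
    for (int i = 0; i < numHashes; i++) {
      if (!bits.get(position(hash, i))) {
        return false;
      }
    }
    return true;
  }

  // Merging ORs the bit sets, so this filter's bits become a superset of the other's.
  void merge(ToyBloomFilter other) {
    if (other.numBits != numBits || other.numHashes != numHashes) {
      throw new IllegalArgumentException("filters must have the same shape");
    }
    bits.or(other.bits);
  }
}
```

Running MergeDemo prints 0 disagreements for the empty target, because its bit set is identical to the other filter's after the merge. The non-empty target should report some disagreements, all of them extra positives, which is why an equality check between the two filters can fail occasionally when the merge target already contains random values.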