aherbert commented on code in PR #406:
URL: 
https://github.com/apache/commons-collections/pull/406#discussion_r1276350791


##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -121,7 +186,7 @@ default boolean merge(final Hasher hasher) {
     /**
      * Merges the specified index producer into this Bloom filter.
      *
-     * <p>Specifically: all counts for the indexes identified by the {@code 
indexProducer} will be incremented by 1.</p>
+     * <p>Specifically: all cells for the indexes identified by the {@code 
indexProducer} will be incremented by 1.</p>

Review Comment:
   I don't think your conversion to a CellProducer upholds this contract. The 
cells will not be incremented by 1; they will be incremented by the count of 
the indices because merge then calls add.
   
   IIUC a merge is from BloomFilter and should flatten all arguments to an 
effective bitmap, removing duplicates. The add operation should not remove 
duplicate indices because they are treated as cells with a count above 1.
   
   We should have some tests to check these definitions are upheld.
   
   ```Java
   ArrayCountingBloomFilter f1 = new ArrayCountingBloomFilter(Shape.fromKM(1, 
10));
   ArrayCountingBloomFilter f2 = f1.copy();
   ArrayCountingBloomFilter f3 = f1.copy();
   IndexProducer ip = p -> {
       p.test(3);
       p.test(3);
       return true;
   };
   // The merge SHOULD increment cells by 1
   f1.merge(ip);
   // The add should increment cells by 2
   f2.add(CellProducer.from(ip));
   // This merge will increment by 1 as the round-trip makes the indices unique
   f3.merge(IndexProducer.fromIndexArray(ip.asIndexArray()));
   
   f1.forEachCell((k, v) -> {
       System.out.printf("%d = %d%n", k, v);
       return true;
   });
   f2.forEachCell((k, v) -> {
       System.out.printf("%d = %d%n", k, v);
       return true;
   });
   f3.forEachCell((k, v) -> {
       System.out.printf("%d = %d%n", k, v);
       return true;
   });
   ```
   Prints:
   ```
   3 = 2    // ERROR !
   3 = 2
   3 = 1
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to