[GitHub] [commons-collections] Claude-at-Instaclustr commented on a change in pull request #258: Simplify bloom filters

GitBox Thu, 10 Mar 2022 08:38:39 -0800


Claude-at-Instaclustr commented on a change in pull request #258:
URL: 
https://github.com/apache/commons-collections/pull/258#discussion_r823925019




##########
File path: 
src/main/java/org/apache/commons/collections4/bloomfilter/ArrayCountingBloomFilter.java
##########
@@ -53,203 +48,142 @@
  * @see Shape
  * @since 4.5
  */
-public class ArrayCountingBloomFilter extends AbstractBloomFilter implements 
CountingBloomFilter {
+public final class ArrayCountingBloomFilter implements CountingBloomFilter {
+
+    /**
+     * The shape of this Bloom filter.
+     */
+    private final Shape shape;
 
     /**
      * The count of each bit index in the filter.
      */
     private final int[] counts;
 
     /**
-     * The state flag. This is a bitwise OR of the entire history of all 
updated
+     * The state flag. This is a bitwise @{code OR} of the entire history of 
all updated
      * counts. If negative then a negative count or integer overflow has 
occurred on
      * one or more counts in the history of the filter and the state is 
invalid.
      *
      * <p>Maintenance of this state flag is branch-free for improved 
performance. It
      * eliminates a conditional check for a negative count during 
remove/subtract
      * operations and a conditional check for integer overflow during merge/add
-     * operations.
+     * operations.</p>
      *
      * <p>Note: Integer overflow is unlikely in realistic usage scenarios. A 
count
      * that overflows indicates that the number of items in the filter exceeds 
the
      * maximum possible size (number of bits) of any Bloom filter constrained 
by
      * integer indices. At this point the filter is most likely full (all bits 
are
-     * non-zero) and thus useless.
+     * non-zero) and thus useless.</p>
      *
      * <p>Negative counts are a concern if the filter is used incorrectly by
      * removing an item that was never added. It is expected that a user of a
      * counting Bloom filter will not perform this action as it is a mistake.
      * Enabling an explicit recovery path for negative or overflow counts is a 
major
      * performance burden not deemed necessary for the unlikely scenarios when 
an
      * invalid state is created. Maintenance of the state flag is a concession 
to
-     * flag improper use that should not have a major performance impact.
+     * flag improper use that should not have a major performance impact.</p>
      */
     private int state;
 
-    /**
-     * An iterator of all indexes with non-zero counts.
-     *
-     * <p>In the event that the filter state is invalid any index with a 
negative count
-     * will also be produced by the iterator.
-     */
-    private class IndexIterator implements PrimitiveIterator.OfInt {
-        /** The next non-zero index (or counts.length). */
-        private int next;
-
-        /**
-         * Create an instance.
-         */
-        IndexIterator() {
-            advance();
-        }
-
-        /**
-         * Advance to the next non-zero index.
-         */
-        void advance() {
-            while (next < counts.length && counts[next] == 0) {
-                next++;
-            }
-        }
-
-        @Override
-        public boolean hasNext() {
-            return next < counts.length;
-        }
-
-        @Override
-        public int nextInt() {
-            if (hasNext()) {
-                final int result = next++;
-                advance();
-                return result;
-            }
-            // Currently unreachable as the iterator is only used by
-            // the StaticHasher which iterates correctly.
-            throw new NoSuchElementException();
-        }
-    }
-
     /**
      * Constructs an empty counting Bloom filter with the specified shape.
      *
      * @param shape the shape of the filter
+     *
      */
     public ArrayCountingBloomFilter(final Shape shape) {
-        super(shape);
+        Objects.requireNonNull(shape, "shape");
+        this.shape = shape;
         counts = new int[shape.getNumberOfBits()];
     }
 
-    @Override
-    public int cardinality() {
-        int size = 0;
-        for (final int c : counts) {
-            if (c != 0) {
-                size++;
-            }
-        }
-        return size;
+    private ArrayCountingBloomFilter(ArrayCountingBloomFilter source) {
+        this.shape = source.shape;
+        this.state = source.state;
+        this.counts = source.counts.clone();
     }
 
     @Override
-    public boolean contains(final BloomFilter other) {
-        // The AbstractBloomFilter implementation converts both filters to 
long[] bits.
-        // This would involve checking all indexes in this filter against zero.
-        // Ideally we use an iterator of bit indexes to allow fail-fast on the
-        // first bit index that is zero.
-        if (other instanceof ArrayCountingBloomFilter) {
-            verifyShape(other);
-            return contains(((ArrayCountingBloomFilter) other).iterator());
-        }
-
-        // Note:
-        // This currently creates a StaticHasher which stores all the indexes.
-        // It would greatly benefit from direct generation of the index 
iterator
-        // avoiding the intermediate storage.
-        return contains(other.getHasher());
+    public ArrayCountingBloomFilter copy() {
+        return new ArrayCountingBloomFilter(this);
     }
 
     @Override
-    public boolean contains(final Hasher hasher) {
-        verifyHasher(hasher);
-        return contains(hasher.iterator(getShape()));
-    }
-
-    /**
-     * Return true if this filter is has non-zero counts for each index in the 
iterator.
-     *
-     * @param iter the iterator
-     * @return true if this filter contains all the indexes
-     */
-    private boolean contains(final OfInt iter) {
-        while (iter.hasNext()) {
-            if (counts[iter.nextInt()] == 0) {
-                return false;
-            }
-        }
+    public boolean isSparse() {
         return true;
     }
 
     @Override
-    public long[] getBits() {
-        final BitSet bs = new BitSet();
-        for (int i = 0; i < counts.length; i++) {
-            if (counts[i] != 0) {
-                bs.set(i);
-            }
-        }
-        return bs.toLongArray();
+    public int cardinality() {
+        return (int) IntStream.range(0, counts.length).filter(i -> counts[i] > 
0).count();
     }
 
     @Override
-    public StaticHasher getHasher() {
-        return new StaticHasher(iterator(), getShape());
+    public CountingBloomFilter merge(BloomFilter other) {
+        Objects.requireNonNull(other, "other");
+        CountingBloomFilter filter = copy();
+        filter.add(BitCountProducer.from(other));
+        return filter;
     }
 
-    /**
-     * Returns an iterator over the enabled indexes in this filter.
-     * Any index with a non-zero count is considered enabled.
-     * The iterator returns indexes in their natural order.
-     *
-     * @return an iterator over the enabled indexes
-     */
-    private PrimitiveIterator.OfInt iterator() {
-        return new IndexIterator();
+    @Override
+    public CountingBloomFilter merge(Hasher hasher) {
+        Objects.requireNonNull(hasher, "hasher");
+        ArrayCountingBloomFilter filter = copy();
+        try {
+            filter.add(BitCountProducer.from(hasher.uniqueIndices(shape)));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(

Review comment:
       I think this is a documentation issue.  We have specific comments for 
mergeInPlace that states it throws an illgetlArgumentException on numbers out 
of range.  merge() and mergeInPlace() need to be consistant across all the 
implementations.  I could convert  all the methods to thow IndexOutOfBounds 
excpetions.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [commons-collections] Claude-at-Instaclustr commented on a change in pull request #258: Simplify bloom filters

Reply via email to