Ayush Sharma created COLLECTIONS-883:
----------------------------------------

             Summary: BloomFilter Shape class limits numberOfBits to int, preventing large-scale filters (>2.1B bits)
                 Key: COLLECTIONS-883
                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-883
             Project: Commons Collections
          Issue Type: Bug
          Components: Bloomfilter
    Affects Versions: 4.5.0
            Reporter: Ayush Sharma
         Attachments: Screenshot 2025-12-15 at 11.47.24 AM.png, image1.png

*Problem*

When creating a Bloom filter for large datasets using Shape.fromNP(n, p), the 
operation fails if the calculated number of bits exceeds Integer.MAX_VALUE 
(~2.1 billion).

*Error Message*

Resulting filter has more than 2147483647 bits: 7.569340059E9



*Environment*
* Dataset: ~500 million elements
* False Positive Probability: 0.005
* Apache Commons Collections version: 4.5.0-M2

*Root Cause*

The Shape class stores numberOfBits as an int, and its factory methods and accessor use int for the item and bit counts:
* Shape.fromNP(int n, double p)
* Shape.fromKM(int k, int m)
* Shape.fromNM(int n, int m)
* Shape.getNumberOfBits() returns int

For large-scale applications, the required number of bits can exceed Integer.MAX_VALUE (2,147,483,647), so fromNP throws instead of returning a Shape.
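The sketch below illustrates why the bit count cannot simply be narrowed to an int: Java's narrowing conversions clamp or wrap silently instead of failing, so an explicit guard is needed. The class and method names are hypothetical; the guard's message copies the format of the error quoted under *Error Message*, and 5.51e9 is roughly the bit count for n = 500 million, p = 0.005.

```java
/**
 * Hypothetical demo: a bit count above Integer.MAX_VALUE cannot be cast to int,
 * because Java's narrowing conversions clamp or wrap instead of failing.
 */
final class IntNarrowingDemo {
    // Roughly the bit count for n = 500 million, p = 0.005.
    static final double REQUIRED_BITS = 5.51e9;

    /** double -> int narrowing saturates at Integer.MAX_VALUE (JLS 5.1.3). */
    static int castDouble(double bits) {
        return (int) bits;
    }

    /** long -> int narrowing keeps only the low 32 bits, so large values wrap. */
    static int castLong(long bits) {
        return (int) bits;
    }

    /** Explicit guard that fails loudly, using the same message format as the reported error. */
    static int checkedBits(double bits) {
        if (bits > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "Resulting filter has more than " + Integer.MAX_VALUE + " bits: " + bits);
        }
        return (int) bits;
    }

    public static void main(String[] args) {
        System.out.println(castDouble(REQUIRED_BITS));       // 2147483647 (clamped)
        System.out.println(castLong((long) REQUIRED_BITS)); // 1215032704 (wrapped)
        try {
            checkedBits(REQUIRED_BITS);
        } catch (IllegalArgumentException e) {
            // Resulting filter has more than 2147483647 bits: 5.51E9
            System.out.println(e.getMessage());
        }
    }
}
```

Silently clamping would build a filter far smaller than requested (with a much higher false-positive rate than p), so failing fast is the safer behavior until the type itself is widened.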

*Calculation*

m = -n × ln(p) / (ln(2))²
m = -500,000,000 × ln(0.005) / 0.4805
m ≈ 500,000,000 × 5.2983 / 0.4805
m ≈ 5.51 billion bits (about 689 MB), well above Integer.MAX_VALUE (2,147,483,647)
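The arithmetic above can be checked with a few lines of standalone Java that do not depend on Commons Collections. BloomSizing and its methods are hypothetical names. The bit formula is the one used above; the hash-function count uses the standard optimum k = (m / n) × ln 2, and whether Shape rounds it exactly this way is an assumption.

```java
/**
 * Hypothetical helper reproducing the sizing arithmetic with plain doubles.
 */
final class BloomSizing {
    private static final double LN2 = Math.log(2.0);

    /** Optimal number of bits: m = ceil(-n * ln(p) / (ln 2)^2). */
    static double optimalBits(long n, double p) {
        return Math.ceil(-n * Math.log(p) / (LN2 * LN2));
    }

    /** Optimal number of hash functions: k = round(m / n * ln 2), at least 1. */
    static long optimalHashFunctions(long n, double m) {
        return Math.max(1L, Math.round(m / n * LN2));
    }

    /** Largest n whose optimal bit count still fits in an int, for the given p. */
    static long maxItemsForIntBits(double p) {
        double bitsPerItem = -Math.log(p) / (LN2 * LN2);
        return (long) Math.floor(Integer.MAX_VALUE / bitsPerItem);
    }

    public static void main(String[] args) {
        long n = 500_000_000L;
        double p = 0.005;
        double m = optimalBits(n, p);
        System.out.printf("m = %.0f bits (%.1f MB)%n", m, m / 8 / 1_000_000);
        System.out.println("fits in int? " + (m <= Integer.MAX_VALUE));
        System.out.println("k = " + optimalHashFunctions(n, m));
        System.out.println("max n with int-sized m at p = 0.005: " + maxItemsForIntBits(p));
    }
}
```

At p = 0.005 each item needs about 11.03 bits, so an int-sized bit count tops out at roughly 195 million items, well short of the 500 million in this report.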



*Suggested Fix*

Change numberOfBits from int to long in the Shape class and related methods.
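A minimal sketch of what the suggested fix could look like, assuming a hypothetical LongShape class (not part of Commons Collections) that mirrors the fromNP calculation but keeps the item and bit counts in longs. It caps m at Integer.MAX_VALUE × 64, the most bits a long[] bitmap with int-indexed words can address; in practice JVM array-length and heap limits would bite earlier.

```java
/**
 * HYPOTHETICAL sketch: a Shape-like value class whose bit count is a long.
 * LongShape is not part of Commons Collections; names are illustrative only.
 */
final class LongShape {
    private static final double LN2 = Math.log(2.0);

    /** Upper bound for a long[] bitmap: at most Integer.MAX_VALUE words of 64 bits. */
    static final long MAX_BITS = (long) Integer.MAX_VALUE * Long.SIZE;

    private final int numberOfHashFunctions;
    private final long numberOfBits;

    private LongShape(int numberOfHashFunctions, long numberOfBits) {
        this.numberOfHashFunctions = numberOfHashFunctions;
        this.numberOfBits = numberOfBits;
    }

    /** Like Shape.fromNP, but n and m are longs. */
    static LongShape fromNP(long n, double p) {
        if (n < 1) {
            throw new IllegalArgumentException("Number of items must be greater than 0: " + n);
        }
        if (!(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("Probability must be in (0, 1): " + p);
        }
        // m = ceil(-n * ln(p) / (ln 2)^2), computed in double precision
        double m = Math.ceil(-n * Math.log(p) / (LN2 * LN2));
        if (m > MAX_BITS) {
            throw new IllegalArgumentException(
                "Resulting filter has more than " + MAX_BITS + " bits: " + m);
        }
        long bits = (long) m;
        // k = round(m / n * ln 2), with at least one hash function
        int k = (int) Math.max(1L, Math.round((double) bits / n * LN2));
        return new LongShape(k, bits);
    }

    long getNumberOfBits() {
        return numberOfBits;
    }

    int getNumberOfHashFunctions() {
        return numberOfHashFunctions;
    }

    /** Length of a long[] bitmap for this shape; fits in an int because bits <= MAX_BITS. */
    int numberOfWords() {
        return (int) ((numberOfBits + Long.SIZE - 1) / Long.SIZE);
    }

    public static void main(String[] args) {
        LongShape shape = fromNP(500_000_000L, 0.005);
        System.out.println("bits  = " + shape.getNumberOfBits());
        System.out.println("k     = " + shape.getNumberOfHashFunctions());
        System.out.println("words = " + shape.numberOfWords());
    }
}
```

Widening the field is only part of the change: any code that addresses or counts individual bits would also need long indices, which is presumably what "related methods" would have to cover.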



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
