Ayush Sharma created COLLECTIONS-883:
----------------------------------------
Summary: BloomFilter Shape class limits numberOfBits to int,
preventing large-scale filters (>2.1B bits)
Key: COLLECTIONS-883
URL: https://issues.apache.org/jira/browse/COLLECTIONS-883
Project: Commons Collections
Issue Type: Bug
Components: Bloomfilter
Affects Versions: 4.5.0
Reporter: Ayush Sharma
Attachments: Screenshot 2025-12-15 at 11.47.24 AM.png, image1.png
*Problem*
When creating a Bloom filter for large datasets using Shape.fromNP(n, p), the
operation fails if the calculated number of bits exceeds Integer.MAX_VALUE
(~2.1 billion).
*Error Message*
Resulting filter has more than 2147483647 bits: 7.569340059E9
*Environment*
* Dataset: ~500 million elements
* False Positive Probability: 0.005
* Apache Commons Collections version: 4.5.0-M2
*Root Cause*
The Shape class stores numberOfBits as an int, and its factory methods and
accessor use int throughout:
* Shape.fromNP(int n, double p)
* Shape.fromKM(int k, int m)
* Shape.fromNM(int n, int m)
* Shape.getNumberOfBits() returns int
For large-scale applications, the required bits can exceed Integer.MAX_VALUE.
*Calculation*
m = -n × ln(p) / (ln(2))²
m = -500,000,000 × ln(0.005) / 0.4805
m ≈ 5.51 billion bits (about 2.6 × Integer.MAX_VALUE)
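The calculation above can be checked with a small standalone program. This is a minimal sketch that does not depend on Commons Collections; the class and method names (BloomBitsEstimate, requiredBits, fitsInInt) are made up for illustration and are not part of the library.

```java
public class BloomBitsEstimate {

    /**
     * Standard Bloom filter sizing: m = ceil(-n * ln(p) / (ln 2)^2).
     * The result is kept as a double so the estimate itself cannot overflow.
     */
    public static double requiredBits(long n, double p) {
        double ln2 = Math.log(2);
        return Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    /** True when the bit count can be represented by an int field. */
    public static boolean fitsInInt(double bits) {
        return bits <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Values from the Environment section: ~500 million items, p = 0.005.
        double m = requiredBits(500_000_000L, 0.005);
        System.out.printf("required bits: %.0f%n", m);   // roughly 5.51 billion
        System.out.println("fits in int:   " + fitsInInt(m));
    }
}
```

For the parameters in this report the estimate is well above Integer.MAX_VALUE, so any int-typed bit count rejects the shape.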
*Suggested Fix*
Change numberOfBits from int to long in the Shape class and related methods.
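As a rough illustration of what a long-based shape might look like, here is a hypothetical sketch. LongShapeSketch and all of its members are invented for this example and are not the existing or proposed Commons Collections API. It assumes the bits would be stored in a single long[] bitmap, and it uses the usual optimum k = round(ln 2 × m / n) for the number of hash functions.

```java
public final class LongShapeSketch {

    private static final double LN2 = Math.log(2);

    // Upper bound assumed for a single long[] bitmap: 64 bits per word and an
    // int-indexed array. Real JVM array limits are slightly lower.
    private static final long MAX_BITS = 64L * Integer.MAX_VALUE;

    private final int numberOfHashFunctions;
    private final long numberOfBits;

    private LongShapeSketch(int numberOfHashFunctions, long numberOfBits) {
        this.numberOfHashFunctions = numberOfHashFunctions;
        this.numberOfBits = numberOfBits;
    }

    /** Hypothetical long-based counterpart of a fromNP-style factory. */
    public static LongShapeSketch fromNP(long n, double p) {
        if (n < 1) {
            throw new IllegalArgumentException("Number of items must be positive: " + n);
        }
        if (!(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("Probability must be in (0, 1): " + p);
        }
        double m = Math.ceil(-n * Math.log(p) / (LN2 * LN2));
        if (m > MAX_BITS) {
            throw new IllegalArgumentException("Resulting filter has more than " + MAX_BITS + " bits: " + m);
        }
        long bits = (long) m;
        int k = (int) Math.max(1L, Math.round(LN2 * bits / n));
        return new LongShapeSketch(k, bits);
    }

    public long getNumberOfBits() {
        return numberOfBits;
    }

    public int getNumberOfHashFunctions() {
        return numberOfHashFunctions;
    }

    /** Number of 64-bit words needed to hold the bitmap; still fits in an int. */
    public int numberOfWords() {
        return (int) ((numberOfBits + 63) >>> 6);
    }
}
```

Under these assumptions the word index stays within int range even though the bit index does not, so only bit-level indices would need to widen to long.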