[ https://issues.apache.org/jira/browse/COLLECTIONS-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ayush Sharma updated COLLECTIONS-883:
-------------------------------------
Attachments: image1.png, Screenshot 2025-12-15 at 11.47.24 AM.png
> BloomFilter Shape class limits numberOfBits to int, preventing large-scale filters (>2.1B bits)
> -----------------------------------------------------------------------------------------------
>
> Key: COLLECTIONS-883
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-883
> Project: Commons Collections
> Issue Type: Bug
> Components: Bloomfilter
> Affects Versions: 4.5.0
> Reporter: Ayush Sharma
> Priority: Major
> Attachments: Screenshot 2025-12-15 at 11.47.24 AM.png, image1.png
>
>
> *Problem*
> When creating a Bloom filter for large datasets using Shape.fromNP(n, p), the
> operation fails if the calculated number of bits exceeds Integer.MAX_VALUE
> (~2.1 billion).
> *Error Message*
> Resulting filter has more than 2147483647 bits: 7.569340059E9
> *Environment*
> * Dataset: ~500 million elements
> * False Positive Probability: 0.005
> * Apache Commons Collections version: 4.5.0-M2
> *Root Cause*
> The Shape class stores numberOfBits as an int, and its factory methods and accessor are int-typed as well:
> * Shape.fromNP(int n, double p)
> * Shape.fromKM(int k, int m)
> * Shape.fromNM(int n, int m)
> * Shape.getNumberOfBits() returns int
> For large-scale applications, the required bits can exceed Integer.MAX_VALUE.
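A minimal illustration of the int ceiling, relying only on plain Java conversion rules (BitCountWidth is a hypothetical name, not Commons Collections code). Narrowing a double bit count to int clamps it at Integer.MAX_VALUE, while a long holds it exactly, which is presumably why fromNP reports an error instead of storing a truncated value.

```java
// Hypothetical illustration (not Commons Collections code): how a double-valued
// bit count behaves when narrowed to int versus widened to long.
final class BitCountWidth {

    /** Narrowing cast: Java clamps doubles above Integer.MAX_VALUE to Integer.MAX_VALUE. */
    static int asInt(double bits) {
        return (int) bits;
    }

    /** Widening to long keeps the exact value for any integral bit count below 2^53. */
    static long asLong(double bits) {
        return (long) bits;
    }

    /** True when the bit count cannot be stored in an int-typed field. */
    static boolean exceedsIntRange(double bits) {
        return bits > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        double required = 7.569340059E9; // value reported in the error message of this issue
        System.out.println("fits in int? " + !exceedsIntRange(required)); // false
        System.out.println("as int:      " + asInt(required));  // 2147483647 (clamped)
        System.out.println("as long:     " + asLong(required)); // 7569340059
    }
}
```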
> *Calculation*
> m = -n × ln(p) / (ln(2))²
> m = -500,000,000 × ln(0.005) / 0.4805, using (ln(2))² ≈ 0.4805
> m = -500,000,000 × (-5.2983) / 0.4805
> m ≈ 5.51 billion bits, about 2.6 × Integer.MAX_VALUE
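The calculation above can be checked with a short sketch, assuming the textbook sizing formulas m = -n ln(p) / (ln 2)² and k = (m / n) ln 2. BloomSizing and its methods are hypothetical helpers, not the library's API; they simply use long for n and m so the result is never forced into an int.

```java
// Hypothetical sketch: optimal Bloom filter sizing computed in long arithmetic,
// so bit counts above Integer.MAX_VALUE remain representable.
final class BloomSizing {

    private static final double LN2 = Math.log(2);
    private static final double LN2_SQUARED = LN2 * LN2;

    /** Optimal number of bits m = ceil(-n * ln(p) / (ln 2)^2). */
    static long optimalBits(long n, double p) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        if (!(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("p must be in (0, 1): " + p);
        }
        return (long) Math.ceil(-n * Math.log(p) / LN2_SQUARED);
    }

    /** Optimal number of hash functions k = round((m / n) * ln 2), at least 1. */
    static int optimalHashFunctions(long n, long m) {
        return (int) Math.max(1, Math.round((double) m / n * LN2));
    }

    public static void main(String[] args) {
        long n = 500_000_000L;
        double p = 0.005;
        long m = optimalBits(n, p);
        System.out.println("m = " + m + " bits");
        System.out.println("m / Integer.MAX_VALUE = " + (double) m / Integer.MAX_VALUE);
        System.out.println("k = " + optimalHashFunctions(n, m));
    }
}
```

With n = 500,000,000 and p = 0.005 this yields roughly 5.51 billion bits and k = 8, well past the int limit.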
> *Suggested Fix*
> Change numberOfBits from int to long in the Shape class, and widen the affected factory methods (fromNP, fromKM, fromNM) and getNumberOfBits() accordingly.
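One possible shape for the suggested fix, sketched under assumptions: LongShape, LongBitMap and LongShapeDemo are hypothetical names, and the bitmap simply packs bits into a long[] indexed by bitIndex >>> 6. Java arrays are int-indexed, so a single long[] still caps out near Integer.MAX_VALUE × 64 bits (about 1.37e11), which comfortably covers the ~5.5 billion bits needed here.

```java
// Hypothetical sketch of a long-based shape and bitmap; none of these classes
// are part of Commons Collections.
final class LongShapeDemo {
    public static void main(String[] args) {
        LongShape shape = LongShape.fromNP(500_000_000L, 0.005);
        long words = (shape.getNumberOfBits() + 63) >>> 6;
        System.out.println("m = " + shape.getNumberOfBits() + ", k = " + shape.getNumberOfHashFunctions());
        System.out.println("backing long[] length = " + words + " (~" + words * 8 / 1_000_000 + " MB)");

        // Small bitmap just to exercise long-indexed set/get without a large allocation.
        LongBitMap bits = new LongBitMap(130);
        bits.set(129);
        System.out.println("bit 129 set? " + bits.get(129)); // true
    }
}

final class LongShape {
    private final int numberOfHashFunctions;
    private final long numberOfBits;

    LongShape(int numberOfHashFunctions, long numberOfBits) {
        if (numberOfHashFunctions < 1) throw new IllegalArgumentException("k must be >= 1");
        if (numberOfBits < 1) throw new IllegalArgumentException("m must be >= 1");
        this.numberOfHashFunctions = numberOfHashFunctions;
        this.numberOfBits = numberOfBits;
    }

    /** Long-typed analogue of a fromNP factory: no Integer.MAX_VALUE ceiling on m. */
    static LongShape fromNP(long n, double p) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1: " + n);
        if (!(p > 0.0 && p < 1.0)) throw new IllegalArgumentException("p must be in (0, 1): " + p);
        double ln2 = Math.log(2);
        long m = (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
        int k = (int) Math.max(1, Math.round((double) m / n * ln2));
        return new LongShape(k, m);
    }

    int getNumberOfHashFunctions() { return numberOfHashFunctions; }

    long getNumberOfBits() { return numberOfBits; }
}

final class LongBitMap {
    private final long[] words;
    private final long numberOfBits;

    LongBitMap(long numberOfBits) {
        if (numberOfBits < 1) throw new IllegalArgumentException("numberOfBits must be >= 1");
        // Arrays are int-indexed; toIntExact rejects sizes a single long[] cannot hold.
        this.words = new long[Math.toIntExact((numberOfBits + 63) >>> 6)];
        this.numberOfBits = numberOfBits;
    }

    /** Index of the 64-bit word that holds bitIndex. */
    static int wordIndex(long bitIndex) {
        return Math.toIntExact(bitIndex >>> 6);
    }

    void set(long bitIndex) {
        checkIndex(bitIndex);
        words[wordIndex(bitIndex)] |= 1L << bitIndex; // shift distance uses only the low 6 bits
    }

    boolean get(long bitIndex) {
        checkIndex(bitIndex);
        return (words[wordIndex(bitIndex)] & (1L << bitIndex)) != 0;
    }

    private void checkIndex(long bitIndex) {
        if (bitIndex < 0 || bitIndex >= numberOfBits) {
            throw new IndexOutOfBoundsException("bit " + bitIndex + " outside [0, " + numberOfBits + ")");
        }
    }
}
```

For this dataset the backing array would be about 86 million longs, roughly 690 MB, so callers would also need a heap sized accordingly.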
--
This message was sent by Atlassian Jira
(v8.20.10#820010)