Claudenw commented on pull request #258: URL: https://github.com/apache/commons-collections/pull/258#issuecomment-996005399
@aherbert, moving to long indices for the Bloom filter has some interesting side effects.

1. The BitMap version needs checks to ensure that the values do not exceed 64 * Integer.MAX_VALUE, as larger values would index past the end of the largest possible BitMap array.
2. It would be possible to construct a SparseBloomFilter (the one with the array of values) that cannot be represented as fully populated. For example, if the number of bits in the filter is Integer.MAX_VALUE + 1, it is not possible to produce a filter with all the bits turned on, because the array of indices would need more entries than a single array can hold.

I am certain there are other minor issues as well.

Perhaps it would make sense to create a set of "Large" Bloom filters, with multiple arrays of indexes or BitMaps and equivalent methods with BigInteger and BigDecimal variables and parameters. I can see that it would be possible to convert the Bloom filter implementations here to the "Large" versions. Also, at the scale we are talking about, it might make sense to have a ByteBuffer-backed implementation so that memory-mapped files could be used.

But I think these are all for another day. Perhaps the class documentation should state that it only works with filters of up to Integer.MAX_VALUE bits.
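
To make the two limits in the comment concrete, here is a minimal sketch of the range checks it describes. The class and method names (`BloomIndexLimits`, `bitMapWordIndex`, `sparseCanBeFullyPopulated`) are hypothetical and not part of commons-collections, and the sketch assumes the BitMap is a single `long[]` with 64 bits per word. It also treats Integer.MAX_VALUE as the array length limit, although typical JVMs cap array lengths slightly below that.

```java
// Hypothetical helper, not part of commons-collections: illustrates the limits only.
final class BloomIndexLimits {

    /** Assumed limit: 64 bits per long word times the largest int array length. */
    static final long MAX_BITMAP_BITS = 64L * Integer.MAX_VALUE;

    private BloomIndexLimits() {
    }

    /**
     * Returns the word index that would hold the given bit in a long[] bitmap,
     * rejecting bit indices that a single array cannot address.
     */
    static int bitMapWordIndex(long bitIndex) {
        if (bitIndex < 0 || bitIndex >= MAX_BITMAP_BITS) {
            throw new IndexOutOfBoundsException("bit index out of range: " + bitIndex);
        }
        // Dividing by 64 selects the word; the check above guarantees it fits in an int.
        return (int) (bitIndex >>> 6);
    }

    /**
     * Returns true if a sparse filter of this size could list every bit as
     * enabled in one array of indices (one array entry per enabled bit).
     */
    static boolean sparseCanBeFullyPopulated(long numberOfBits) {
        return numberOfBits >= 0 && numberOfBits <= Integer.MAX_VALUE;
    }
}
```

Under these assumptions, a filter of Integer.MAX_VALUE + 1 bits still fits comfortably in the bitmap form but fails the sparse check, which is the asymmetry the second point describes.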