Claudenw commented on pull request #258:
URL: 
https://github.com/apache/commons-collections/pull/258#issuecomment-996005399


   @aherbert, moving to long indices for the Bloom filter has some interesting 
side effects.
   
   1. The BitMap version needs checks to ensure that bit indices stay below 
64*Integer.MAX_VALUE; a larger index maps to a position beyond the maximum 
length of the BitMap array (index too large).
   2. It would be possible to construct a SparseBloomFilter (the one with the 
array of values) that cannot be represented when fully populated.  For 
example: if the number of bits in the filter is Integer.MAX_VALUE+1, it is not 
possible to produce a filter with all the bits turned on, as the array index 
will overflow (index too large again).
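
   To make the two limits above concrete, here is a minimal sketch of the 
arithmetic. The class and method names (BitMapLimits, longIndex, bitMask, 
sparseCanBeFull) are hypothetical and are not part of the PR; the only 
assumptions are that the bit map is a single long[] and that the sparse filter 
keeps one array entry per enabled bit.

```java
// Hypothetical helper, not part of commons-collections: illustrates the index limits only.
final class BitMapLimits {

    /**
     * Upper bound on the bits one long[] can address: 64 bits per long and at
     * most Integer.MAX_VALUE longs. Real JVMs may cap array length slightly
     * lower, so treat this as a theoretical ceiling.
     */
    static final long MAX_BITS = 64L * Integer.MAX_VALUE;

    /** Returns the long[] slot holding the given bit, rejecting indices the array cannot address. */
    static int longIndex(long bitIndex) {
        if (bitIndex < 0 || bitIndex >= MAX_BITS) {
            throw new IndexOutOfBoundsException("bit index out of range: " + bitIndex);
        }
        return (int) (bitIndex >>> 6); // divide by 64
    }

    /** Returns the mask selecting the bit inside its long. */
    static long bitMask(long bitIndex) {
        return 1L << (bitIndex & 63); // bitIndex mod 64
    }

    /**
     * A sparse filter stores one entry per enabled bit, so a filter with every
     * bit enabled needs numberOfBits entries in a single array.
     */
    static boolean sparseCanBeFull(long numberOfBits) {
        return numberOfBits <= Integer.MAX_VALUE;
    }
}
```

   With these definitions, sparseCanBeFull(Integer.MAX_VALUE + 1L) is false, 
which is the case described in point 2.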
   
   
   I am certain there are other minor issues as well.  Perhaps it would make 
sense to create a set of "Large" Bloom filters, backed by multiple arrays of 
indexes or BitMaps, with equivalent methods that take BigInteger and BigDecimal 
variables and parameters.  I can see that it would be possible to convert the 
Bloom filter implementations here to the "Large" versions.  Also, at the scale 
we are talking about, it might make sense to have a ByteBuffer-backed 
implementation so that memory-mapped files could be used.  But I think these 
are all for another day.  
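
   As a rough illustration of the ByteBuffer idea, the sketch below stores the 
bits in a single ByteBuffer. ByteBufferBitMap and its methods are hypothetical 
names, not an existing class. A heap buffer is used here for simplicity; a 
MappedByteBuffer obtained from FileChannel.map could be passed in the same way. 
Note that a single ByteBuffer is itself int-indexed, so a truly "Large" filter 
would still need several buffers.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not part of commons-collections.
final class ByteBufferBitMap {
    private final ByteBuffer buffer;
    private final long numberOfBits;

    ByteBufferBitMap(ByteBuffer buffer) {
        this.buffer = buffer;
        // 8 bits per byte; capacity() is an int, so this is at most 8 * Integer.MAX_VALUE.
        this.numberOfBits = 8L * buffer.capacity();
    }

    /** Turns on the bit at the given index. */
    void set(long bitIndex) {
        int pos = bytePosition(bitIndex);
        int mask = 1 << (int) (bitIndex & 7);
        buffer.put(pos, (byte) (buffer.get(pos) | mask));
    }

    /** Returns true if the bit at the given index is on. */
    boolean contains(long bitIndex) {
        int pos = bytePosition(bitIndex);
        int mask = 1 << (int) (bitIndex & 7);
        return (buffer.get(pos) & mask) != 0;
    }

    /** Maps a bit index to the byte that holds it, rejecting indices outside the buffer. */
    private int bytePosition(long bitIndex) {
        if (bitIndex < 0 || bitIndex >= numberOfBits) {
            throw new IndexOutOfBoundsException("bit index out of range: " + bitIndex);
        }
        return (int) (bitIndex >>> 3); // divide by 8
    }
}
```

   For example, new ByteBufferBitMap(ByteBuffer.allocate(4)) gives a 32-bit 
map; set(31) succeeds and set(32) throws IndexOutOfBoundsException.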
   
   Perhaps the class documentation should state that it only works with filters 
up to Integer.MAX_VALUE bits.



