[jira] [Commented] (CASSANDRA-2466) bloom filters should avoid huge array allocations to avoid fragmentation concerns

Ryan King (JIRA) Wed, 13 Apr 2011 10:50:47 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019457#comment-13019457
 ]


Ryan King commented on CASSANDRA-2466:
--------------------------------------

Moving to smaller arrays would make the allocation easier, but wouldn't reduce 
the raw amount of memory needed for a large bloom filter.

Would it be worth moving these off-heap completely?

> bloom filters should avoid huge array allocations to avoid fragmentation 
> concerns
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2466
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2466
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Priority: Minor
>
> The fact that bloom filters are backed by single large arrays of longs is 
> expected to interact badly with promotion of objects into old gen with CMS, 
> due to fragmentation concerns (as discussed in CASSANDRA-2463).
> It should be less of an issue than CASSANDRA-2463 in the sense that you need 
> to have a lot of rows before the array sizes become truly huge. For 
> comparison, the ~ 143 million row key limit implied by the use of 'int' in 
> BitSet prior to the switch to OpenBitSet translates roughly to 238 MB 
> (assuming the limiting factor there was the addressability of the bits with 
> a 32 bit int, which is my understanding).
> Having a preliminary look at OpenBitSet with an eye towards replacing the 
> single long[] with multiple arrays, it seems that if we're willing to drop 
> some of the functionality that is not used for bloom filter purposes, the 
> bits[i] indexing should be pretty easy to augment with modulo to address an 
> appropriate smaller array. Locality is not an issue since the bloom filter 
> case is the worst possible case for locality anyway, and it doesn't matter 
> whether it's one huge array or a number of ~ 64k arrays.
> Callers such as BloomFilterSerializer, which cares about the underlying bit 
> array, may be affected.
> If the full functionality of OpenBitSet is to be maintained (e.g., xorCount), 
> some additional acrobatics would be necessary, presumably at a noticeable 
> performance cost if such operations were used in performance-critical 
> places.
> An argument against touching OpenBitSet is that it seems to be pretty 
> carefully written and tested and has some non-trivial details and people have 
> seemingly benchmarked it quite carefully. On the other hand, the improvement 
> would then apply to other things as well, such as the bitsets used to keep 
> track of in-core pages (off the cuff for scale, a 64 gig sstable should imply 
> a 2 mb bit set, with one bit per 4k page).
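
For illustration, the multi-array idea from the issue description could be sketched roughly as below. This is not OpenBitSet and not a proposed patch: the class name PagedBitSet, the 64 KiB page size, and the use of a shift and mask in place of a literal modulo are all assumptions made for the example. Only get and set are kept, which is what a bloom filter lookup needs.

```java
/**
 * Hypothetical sketch: a bit set backed by many fixed-size long[] pages
 * instead of one huge long[]. Names and page size are assumptions.
 */
final class PagedBitSet {
    // 8192 longs * 8 bytes = 64 KiB per page (assumed page size).
    private static final int PAGE_SHIFT = 13;
    private static final int WORDS_PER_PAGE = 1 << PAGE_SHIFT;
    private static final int PAGE_MASK = WORDS_PER_PAGE - 1;

    private final long[][] pages;
    private final long numBits;

    PagedBitSet(long numBits) {
        if (numBits <= 0) {
            throw new IllegalArgumentException("numBits must be positive");
        }
        this.numBits = numBits;
        long words = (numBits + 63) >>> 6; // 64 bits per long
        int pageCount = (int) ((words + WORDS_PER_PAGE - 1) >>> PAGE_SHIFT);
        pages = new long[pageCount][];
        for (int i = 0; i < pageCount; i++) {
            // Each allocation is small, so no single huge array is needed.
            pages[i] = new long[WORDS_PER_PAGE];
        }
    }

    void set(long index) {
        checkIndex(index);
        long word = index >>> 6;
        // High bits of the word index select the page, low bits the slot.
        pages[(int) (word >>> PAGE_SHIFT)][(int) (word & PAGE_MASK)]
                |= 1L << (int) (index & 63);
    }

    boolean get(long index) {
        checkIndex(index);
        long word = index >>> 6;
        long bits = pages[(int) (word >>> PAGE_SHIFT)][(int) (word & PAGE_MASK)];
        return (bits & (1L << (int) (index & 63))) != 0;
    }

    long size() {
        return numBits;
    }

    int pageCount() {
        return pages.length;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= numBits) {
            throw new IndexOutOfBoundsException("bit index " + index);
        }
    }
}
```

The total memory is unchanged (slightly higher, since the last page is always allocated in full), which matches the point made in the comment above: paging helps allocation, not footprint. Whole-set operations such as xorCount would have to iterate page by page, and a serializer would need to write the pages in order rather than a single array.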

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
