[jira] [Commented] (CASSANDRA-2466) bloom filters should avoid huge array allocations to avoid fragmentation concerns

C. Scott Andreas (JIRA) Tue, 12 Apr 2011 23:16:51 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019222#comment-13019222 ]


C. Scott Andreas commented on CASSANDRA-2466:
---------------------------------------------

Peter, it's interesting that you mention this. During the first memory 
profiling run I did while investigating CASSANDRA-2463 on vanilla 0.7.4, I 
saw a tremendous amount of long[] allocation (comprising ~70% of heap 
allocations at that specific moment), but was not sure of the source. I doubt 
I'll have time to look into this at the office this week, but if I can find 
time outside of work I may dig into the allocations a bit.

If you're on it, by all means go for it. Thanks for mentioning this!

> bloom filters should avoid huge array allocations to avoid fragmentation 
> concerns
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2466
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2466
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Priority: Minor
>
> The fact that bloom filters are backed by single large arrays of longs is 
> expected to interact badly with promotion of objects into old gen with CMS, 
> due to fragmentation concerns (as discussed in CASSANDRA-2463).
> It should be less of an issue than CASSANDRA-2463 in the sense that you need 
> to have a lot of rows before the array sizes become truly huge. For 
> comparison, the ~ 143 million row key limit implied by the use of 'int' in 
> BitSet prior to the switch to OpenBitSet translates roughly to 238 MB 
> (assuming the limiting factor there was the addressability of the bits with 
> a 32 bit int, which is my understanding).
> Having a preliminary look at OpenBitSet with an eye towards replacing the 
> single long[] with multiple arrays, it seems that if we're willing to drop 
> some of the functionality that is not used for bloom filter purposes, the 
> bits[i] indexing should be pretty easy to augment with modulo to address an 
> appropriate smaller array. Locality is not an issue since the bloom filter 
> case is the worst possible case for locality anyway, and it doesn't matter 
> whether it's one huge array or a number of ~ 64k arrays.
> Callers such as BloomFilterSerializer, which cares about the underlying bit 
> array, may be affected.
> If the full functionality of OpenBitSet is to be maintained (e.g., xorCount), 
> some additional acrobatics would be necessary, presumably at a noticeable 
> performance cost if such operations were to be used in performance-critical 
> places.
> An argument against touching OpenBitSet is that it seems to be pretty 
> carefully written and tested and has some non-trivial details and people have 
> seemingly benchmarked it quite carefully. On the other hand, the improvement 
> would then apply to other things as well, such as the bitsets used to keep 
> track of in-core pages (off the cuff for scale, a 64 GB sstable should imply 
> a 2 MB bit set, with one bit per 4k page).
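
The idea in the description of replacing the single long[] with several 
smaller arrays might look roughly like the sketch below. It is illustrative 
only: PagedBitSet, PAGE_WORDS and pageCount() are hypothetical names, not part 
of OpenBitSet or Cassandra, and the 64 KiB page size (8192 longs) is an 
assumption based on the "~ 64k arrays" remark. Only get/set are covered, 
which is the subset a bloom filter needs.

```java
// Hypothetical sketch of a bit set backed by many small long[] pages
// instead of one huge array. Not the OpenBitSet API.
final class PagedBitSet {
    // Assumed page size: 8192 longs * 8 bytes = 64 KiB per page.
    private static final int PAGE_WORDS = 8192;

    private final long[][] pages;
    private final long numBits;

    PagedBitSet(long numBits) {
        if (numBits <= 0) {
            throw new IllegalArgumentException("numBits must be positive");
        }
        this.numBits = numBits;
        long words = (numBits + 63) >>> 6; // 64 bits per long, rounded up
        int pageCount = (int) ((words + PAGE_WORDS - 1) / PAGE_WORDS);
        pages = new long[pageCount][];
        for (int i = 0; i < pageCount; i++) {
            // Every page is full-sized except possibly the last one.
            long remaining = words - (long) i * PAGE_WORDS;
            pages[i] = new long[(int) Math.min(PAGE_WORDS, remaining)];
        }
    }

    void set(long index) {
        checkIndex(index);
        long word = index >>> 6;
        // Division picks the page, modulo picks the word within that page.
        pages[(int) (word / PAGE_WORDS)][(int) (word % PAGE_WORDS)]
                |= 1L << (int) (index & 63);
    }

    boolean get(long index) {
        checkIndex(index);
        long word = index >>> 6;
        long bits = pages[(int) (word / PAGE_WORDS)][(int) (word % PAGE_WORDS)];
        return (bits & (1L << (int) (index & 63))) != 0;
    }

    int pageCount() {
        return pages.length;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= numBits) {
            throw new IndexOutOfBoundsException("bit index " + index);
        }
    }
}
```

With this layout no single allocation exceeds 64 KiB regardless of the filter 
size. For the in-core page example above, one bit per 4k page of a 64 GiB 
sstable is 2^24 bits, which this sketch would spread over 32 pages. Whole-set 
operations such as xorCount would have to iterate page by page, which is the 
extra complexity the description warns about.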

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
