[ 
https://issues.apache.org/jira/browse/CASSANDRA-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768379#comment-13768379
 ] 

Matt Abrams commented on CASSANDRA-5906:
----------------------------------------

Glad to see HLL++ might be helpful here.  I think the approach mentioned above 
is a good one.  One note of caution: HLL++ (and HLL) are not thread-safe, so 
you may need to synchronize access to the estimator if multiple threads will 
be updating it concurrently.  There has been some discussion of creating a 
thread-safe version of HLL++, but the general consensus has been that 
synchronization is better done at the client level.  Let me know if you have 
strong opinions on this.
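
To make the client-level synchronization concrete, here is a minimal sketch. 
It does not use the real HLL++ classes: CardinalityEstimator, ExactEstimator 
and SynchronizedEstimator are hypothetical names introduced for illustration, 
and the exact HashSet-backed counter only stands in for a non-thread-safe 
estimator so that the example runs without external libraries.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a non-thread-safe estimator such as HLL++. */
interface CardinalityEstimator {
    boolean offer(Object o);
    long cardinality();
}

/** Toy exact counter (not thread-safe); an assumption made for this sketch. */
final class ExactEstimator implements CardinalityEstimator {
    private final Set<Object> seen = new HashSet<>();
    public boolean offer(Object o) { return seen.add(o); }
    public long cardinality() { return seen.size(); }
}

/** Client-level synchronization: every call takes the wrapper's monitor. */
final class SynchronizedEstimator implements CardinalityEstimator {
    private final CardinalityEstimator delegate;
    SynchronizedEstimator(CardinalityEstimator delegate) { this.delegate = delegate; }
    public synchronized boolean offer(Object o) { return delegate.offer(o); }
    public synchronized long cardinality() { return delegate.cardinality(); }
}

public class SynchronizedEstimatorDemo {
    /** Feeds disjoint key ranges from several threads into one shared estimator. */
    static long countConcurrently(int threads, int keysPerThread) {
        CardinalityEstimator est = new SynchronizedEstimator(new ExactEstimator());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * keysPerThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < keysPerThread; i++) {
                    est.offer(base + i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return est.cardinality();
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 10_000));
    }
}
```

The wrapper keeps the estimator itself lock-free for single-threaded callers, 
which is the trade-off behind doing the synchronization on the client side.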

                
> Avoid allocating over-large bloom filters
> -----------------------------------------
>
>                 Key: CASSANDRA-5906
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5906
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Yuki Morishita
>             Fix For: 2.0.1
>
>
> We conservatively estimate the number of partitions post-compaction to be the 
> total number of partitions pre-compaction.  That is, we assume the worst-case 
> scenario of no partition overlap at all.
> This can waste substantial memory on bloom filters for sstables produced by 
> highly overlapping compactions.
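
As a small illustration of the gap described in the issue, the sketch below 
compares the worst-case estimate (the sum of per-sstable partition counts) 
with the number of distinct partitions after merging. The class name, method 
names and sample keys are hypothetical and are not taken from the Cassandra 
code base; real sstables would not be held as in-memory sets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PartitionEstimateDemo {
    /** Worst case: assume no partition key appears in more than one input. */
    static long worstCaseEstimate(List<Set<String>> sstables) {
        long total = 0;
        for (Set<String> keys : sstables) {
            total += keys.size();
        }
        return total;
    }

    /** Partitions actually present after compaction: the union of all keys. */
    static long actualCount(List<Set<String>> sstables) {
        Set<String> union = new HashSet<String>();
        for (Set<String> keys : sstables) {
            union.addAll(keys);
        }
        return union.size();
    }

    /** Three made-up, highly overlapping inputs. */
    static List<Set<String>> sampleInputs() {
        List<Set<String>> inputs = new ArrayList<Set<String>>();
        inputs.add(new HashSet<String>(Arrays.asList("a", "b", "c")));
        inputs.add(new HashSet<String>(Arrays.asList("a", "b", "c")));
        inputs.add(new HashSet<String>(Arrays.asList("b", "c", "d")));
        return inputs;
    }

    public static void main(String[] args) {
        List<Set<String>> inputs = sampleInputs();
        System.out.println(worstCaseEstimate(inputs)); // prints 9
        System.out.println(actualCount(inputs));       // prints 4
    }
}
```

With these inputs a bloom filter sized from the worst-case figure would be 
provisioned for 9 partitions while only 4 exist.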

