[ https://issues.apache.org/jira/browse/CASSANDRA-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830317#comment-13830317 ]
Matt Abrams commented on CASSANDRA-5906:
----------------------------------------

When SP > 0, the algorithm uses a variant of a linear counter to get very accurate counts at small cardinalities. At some threshold, the algorithm switches from the linear counter to HLL. A linear counter grows in size as a function of the number of inputs, whereas HLL's size is a function of the desired error rate. We could (should?) tune the threshold so that the conversion happens earlier. Currently the threshold is equal to 2^p * 0.75.

> Avoid allocating over-large bloom filters
> -----------------------------------------
>
>                 Key: CASSANDRA-5906
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5906
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Yuki Morishita
>             Fix For: 2.1
>
>
> We conservatively estimate the number of partitions post-compaction to be the
> total number of partitions pre-compaction. That is, we assume the worst-case
> scenario of no partition overlap at all.
> This can result in substantial memory wasted in sstables resulting from
> highly overlapping compactions.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
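
As a rough illustration of the threshold mentioned in the comment above, the following sketch computes 2^p * 0.75 and compares the memory the two representations might occupy at that point. The per-entry and per-register sizes (4 bytes per sparse entry, 6 bits per dense register) and all function names are assumptions made for illustration only; they are not taken from the ticket and may not match the actual implementation.

```python
# Hypothetical sketch: sizes and names below are illustrative assumptions.

def sparse_threshold(p: int, factor: float = 0.75) -> int:
    """Entry count at which the sparse (linear-counter) form converts to HLL."""
    return int((1 << p) * factor)


def dense_size_bytes(p: int, bits_per_register: int = 6) -> int:
    """Approximate dense HLL size: 2^p registers, packed (assumed 6 bits each)."""
    return ((1 << p) * bits_per_register + 7) // 8


def sparse_size_bytes(entries: int, bytes_per_entry: int = 4) -> int:
    """Approximate sparse size, assuming fixed-width uncompressed entries."""
    return entries * bytes_per_entry


def break_even_entries(p: int, bytes_per_entry: int = 4) -> int:
    """Largest entry count at which the sparse form is no bigger than dense."""
    return dense_size_bytes(p) // bytes_per_entry


if __name__ == "__main__":
    p = 14
    t = sparse_threshold(p)
    print("threshold entries:", t)
    print("sparse bytes at threshold:", sparse_size_bytes(t))
    print("dense bytes:", dense_size_bytes(p))
    print("break-even entries:", break_even_entries(p))
```

Under these assumptions, the sparse form at the current threshold would be several times larger than the dense form, which is one way to read the suggestion that the conversion should happen earlier.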