[ 
https://issues.apache.org/jira/browse/CASSANDRA-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324492#comment-14324492
 ] 

Benedict commented on CASSANDRA-8757:
-------------------------------------

To explain why I consider this a priority for 2.1: users with very large STCS 
compactions can hit some fairly pathological behaviour. Say our target file is 
500GB and 20% of the data is the partition key. The summary will then be 
approximately 800MB, assuming defaults. If we re-open the result every 50MB 
(the default behaviour), we will allocate a total of 4TB of memory for 
summaries over the duration of the compaction. Not all of this will be in use 
at once; ideally, in fact, we would only ever have maybe 1.6GB allocated. But 
there is no guarantee of that, and longer-running operations like compactions 
could retain copies of multiple different instances indefinitely, so we could 
see several GB of summary floating around in this pathological case. If there 
is reluctance to introduce this into 2.1, another option might be either to 
disable early reopening entirely for very large files, or to reopen far less 
frequently, say at even intervals of sqrt(N) where N is the expected end size, 
or at logarithmically increasing intervals. But the advantage of reopening 
vanishes if we do this, so without this patch we may as well just not do it 
for such files.
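
The arithmetic behind those figures can be reproduced with a rough sketch. 
It is a back-of-the-envelope model, not Cassandra code: every name in it is 
hypothetical, the sampling interval of 128 is an assumption about what 
"defaults" means here, sizes use decimal units, and N for the sqrt(N) schedule 
is taken in MB because the comment does not state a unit. The 1.6GB figure is 
read here as two live copies of the final summary, which is also an assumption.

```python
import math

# Illustrative model only; none of these names come from Cassandra.
TARGET_FILE_MB = 500 * 1000   # 500GB target file, in MB (decimal units)
KEY_FRACTION = 0.20           # 20% of the data is the partition key
SAMPLING_INTERVAL = 128       # ASSUMED default: one summary entry per 128 keys
REOPEN_INTERVAL_MB = 50       # early reopen every 50MB


def summary_size_mb(data_written_mb):
    """Approximate summary size once data_written_mb has been written."""
    return data_written_mb * KEY_FRACTION / SAMPLING_INTERVAL


def total_allocated_mb(reopen_points_mb):
    """Total summary memory allocated if a fresh summary is built at each reopen."""
    return sum(summary_size_mb(p) for p in reopen_points_mb)


# Size of the summary for the finished file (the "approximately 800MB").
final_summary_mb = summary_size_mb(TARGET_FILE_MB)

# Fixed-interval schedule: one rebuild per 50MB written.
fixed_points = range(REOPEN_INTERVAL_MB, TARGET_FILE_MB + 1, REOPEN_INTERVAL_MB)
total_fixed_mb = total_allocated_mb(fixed_points)

# sqrt(N) schedule, ASSUMING N is the expected end size measured in MB.
sqrt_interval_mb = math.isqrt(TARGET_FILE_MB)
sqrt_points = range(sqrt_interval_mb, TARGET_FILE_MB + 1, sqrt_interval_mb)
total_sqrt_mb = total_allocated_mb(sqrt_points)

# ASSUMED reading of the 1.6GB figure: old and new summary both live at once.
ideal_peak_mb = 2 * final_summary_mb

print(f"final summary:        {final_summary_mb:,.0f} MB")
print(f"fixed 50MB reopens:   {len(fixed_points):,} rebuilds, "
      f"{total_fixed_mb / 1e6:.2f} TB allocated in total")
print(f"sqrt(N) reopens:      {len(sqrt_points):,} rebuilds, "
      f"{total_sqrt_mb / 1e6:.2f} TB allocated in total")
print(f"ideal peak (2 copies): {ideal_peak_mb / 1e3:.2f} GB")
```

Under these assumptions the fixed schedule rebuilds the summary 10,000 times 
at an average size of about half the final one, which is where a total in the 
region of 4TB comes from. The sqrt(N) schedule cuts the number of rebuilds by 
more than an order of magnitude, at the cost of reopening so rarely that most 
of the benefit of early reopening is lost.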

> IndexSummaryBuilder should construct itself offheap, and share memory between 
> the result of each build() invocation
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8757
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8757
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 2.1.4
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
