[ 
https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13354:
-------------------------------------
    Attachment: patchedTest.png

> LCS estimated compaction tasks does not take number of files into account
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13354
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13354
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>         Environment: Cassandra 2.2.9
>            Reporter: Jan Karlsson
>            Assignee: Jan Karlsson
>         Attachments: 13354-trunk.txt, patchedTest.png, unpatchedTest.png
>
>
> In LCS, we estimate the number of compaction tasks remaining for L0 by 
> taking the size of an SSTable and multiplying it by four. This gives 4*160 MB 
> with default settings. This calculation is used to determine whether repaired 
> or unrepaired data is compacted.
> This works well until you take repair into account. Repair streams over 
> many SSTables, which can be smaller than the configured SSTable size 
> depending on your use case. In our case we are talking about many thousands 
> of tiny SSTables. As the number of files increases, one can run into any 
> number of problems, including GC issues, too many open files, or a plain 
> increase in read latency.
> With the current algorithm we choose repaired or unrepaired depending on 
> whichever side has more data in it, even if the repaired files outnumber the 
> unrepaired files by a large margin.
> Similarly, our algorithm that selects compaction candidates takes up to 32 
> SSTables at a time in L0, but our estimated task calculation does not 
> take this number into account. These two mechanisms should be aligned with 
> each other.
> I propose that we take the number of files in L0 into account when estimating 
> remaining tasks. 
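
To make the mismatch described above concrete, here is a minimal sketch comparing a size-only L0 estimate with one that also counts files. It is not the attached patch and not Cassandra's actual code: the class name, method names, and both formulas are assumptions for illustration. The size-only estimate is assumed to be the bytes in excess of 4 x SSTable size, divided by the SSTable size; the file-aware estimate is assumed to add a floor of (file count / 32), with 32 taken from the candidate limit mentioned in the description.

```java
// Hypothetical illustration only; none of these names exist in Cassandra.
final class L0TaskEstimate {
    // Default SSTable size from the description (160 MB), in bytes.
    static final long SSTABLE_SIZE_BYTES = 160L * 1024 * 1024;
    // Maximum number of L0 SSTables picked per compaction, per the description.
    static final int MAX_COMPACTING_L0 = 32;

    // Assumed size-only estimate: bytes above 4 x SSTable size, in SSTable-size units.
    static long sizeBasedTasks(long totalL0Bytes) {
        long threshold = 4 * SSTABLE_SIZE_BYTES;
        long excess = Math.max(0L, totalL0Bytes - threshold);
        // Integer ceiling division.
        return (excess + SSTABLE_SIZE_BYTES - 1) / SSTABLE_SIZE_BYTES;
    }

    // Assumed file-aware estimate: never lower than the number of
    // full 32-file batches sitting in L0.
    static long fileAwareTasks(long totalL0Bytes, int l0FileCount) {
        long byFiles = l0FileCount / MAX_COMPACTING_L0;
        return Math.max(sizeBasedTasks(totalL0Bytes), byFiles);
    }

    public static void main(String[] args) {
        // 5000 tiny SSTables of 100 KiB each: under 4 x 160 MB in total.
        int files = 5000;
        long bytes = files * 100L * 1024;
        System.out.println("size-only estimate:  " + sizeBasedTasks(bytes));
        System.out.println("file-aware estimate: " + fileAwareTasks(bytes, files));
    }
}
```

With thousands of tiny SSTables the size-only estimate stays at zero, while the file-aware one grows with the file count, which is the behaviour the proposal asks for. Taking the maximum of the two is just one possible way to combine them; summing them would be another.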



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
