[ https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Karlsson updated CASSANDRA-13354:
-------------------------------------
    Attachment: patchedTest.png

> LCS estimated compaction tasks does not take number of files into account
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13354
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13354
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>         Environment: Cassandra 2.2.9
>            Reporter: Jan Karlsson
>            Assignee: Jan Karlsson
>         Attachments: 13354-trunk.txt, patchedTest.png, unpatchedTest.png
>
>
> In LCS, we estimate the number of compaction tasks remaining for L0 by
> taking the size of an SSTable and multiplying it by four. This gives
> 4*160 MB with default settings. This calculation is used to determine
> whether repaired or unrepaired data is being compacted.
> This works well until you take repair into account. Repair streams over
> many, many SSTables, which can be smaller than the configured SSTable size
> depending on your use case. In our case we are talking about many thousands
> of tiny SSTables. As the number of files increases, one can run into any
> number of problems, including GC issues, too many open files, or a plain
> increase in read latency.
> With the current algorithm we choose repaired or unrepaired depending on
> whichever side has more data in it, even if the repaired files outnumber
> the unrepaired files by a large margin.
> Similarly, our algorithm that selects compaction candidates takes up to 32
> SSTables at a time in L0, but our estimated task calculation does not
> take this number into account. These two mechanisms should be aligned with
> each other.
> I propose that we take the number of files in L0 into account when
> estimating remaining tasks.
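
To make the proposal concrete, here is a rough sketch of one way a file-count term could be folded into the L0 estimate. It is not the attached patch and not Cassandra's actual code. The class `L0TaskEstimate` and its method names are hypothetical. The exact shape of the size-based formula (bytes above a 4 x SSTable-size threshold, divided by the SSTable size) is an assumption; only the factor of four, the 160 MB default and the limit of 32 L0 candidates come from the description above.

```java
// Hypothetical illustration only; names and formula shape are assumptions.
final class L0TaskEstimate {
    // Up to 32 SSTables are picked from L0 per compaction (from the issue text).
    static final int MAX_COMPACTING_L0 = 32;
    static final long MB = 1024L * 1024L;

    // Size-only estimate (assumed shape): bytes above the L0 threshold of
    // 4 * SSTable size, counted in units of one SSTable, rounded up.
    static long bySize(long totalL0Bytes, long maxSSTableBytes) {
        long threshold = 4L * maxSSTableBytes;
        long excess = Math.max(0L, totalL0Bytes - threshold);
        return (excess + maxSSTableBytes - 1) / maxSSTableBytes;
    }

    // Count-only estimate: number of full batches of 32 files sitting in L0.
    static long byCount(int l0FileCount) {
        return l0FileCount / MAX_COMPACTING_L0;
    }

    // Combined estimate: whichever of the two signals reports more work.
    static long combined(long totalL0Bytes, int l0FileCount, long maxSSTableBytes) {
        return Math.max(bySize(totalL0Bytes, maxSSTableBytes), byCount(l0FileCount));
    }
}
```

With 5000 SSTables of 100 KB each (about 488 MB in total) and a 160 MB SSTable size, `bySize` returns 0 because the data never crosses the 640 MB threshold, while `combined` returns 156. That is the situation described above: a side holding thousands of tiny files looks idle to a size-only estimate.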