[ https://issues.apache.org/jira/browse/CASSANDRA-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-3532:
--------------------------------------
    Attachment: 3532-v2.txt

v2 attached, with the new approach moved into componentsFor, so that open() can take advantage of the improvement too. Also, componentsFor now respects the temporary-ness of the descriptor passed, so a separate TempState enum is unnecessary. Also renamed cleanupIfNecessary to abort, and moved to catch block as discussed above.

> Compaction cleanupIfNecessary costly when many files in data dir
> ----------------------------------------------------------------
>
> Key: CASSANDRA-3532
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3532
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0.0
> Environment: Solaris 10, 1.0.4 release candidate
> Reporter: Eric Parusel
> Labels: compaction
> Fix For: 1.0.6
>
> Attachments: 3532-v2.txt, 3532.txt
>
>
> From what I can tell SSTableWriter.cleanupIfNecessary seems increasingly costly as the number of files in the data dir increases.
> It calls SSTable.componentsFor(descriptor, Descriptor.TempState.TEMP) which lists all files in the data dir to find matching components.
> Am I roughly correct that (cleanupCost = SSTable count * data dir size)?
> We had been doing write load testing with default compaction throttling (16MB/s) and LeveledCompaction.
> Unfortunately we haven't been keeping tabs on sstable counts and it grew out of control.
> On a system with 300,000 sstables (!) here is an example of our compaction rate. Note that as you're probably aware cleanupIfNecessary is included in the timing:
> INFO [CompactionExecutor:48] 2011-11-25 22:25:30,353 CompactionTask.java (line 213) Compacted to [/data1/cassandra/data/MA_DDR/indexes_03-hc-5369-Data.db,]. 5,821,590 to 5,306,354 (~91% of original) bytes for 123 keys at 0.163755MB/s. Time: 30,903ms.
> Here's a slightly larger one:
> INFO [CompactionExecutor:43] 2011-11-25 22:23:28,956 CompactionTask.java (line 213) Compacted to
> [/data1/cassandra/data/MA_DDR/indexes_03-hc-5336-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5337-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5338-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5339-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5340-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5341-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5342-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5343-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5344-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5345-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5346-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5347-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5348-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5349-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5350-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5351-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5352-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5353-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5354-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5355-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5356-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5357-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5358-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5359-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5360-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5361-Data.db,].
> 140,706,512 to 137,990,868 (~98% of original) bytes for 2,181 keys at 0.338627MB/s. Time: 388,623ms.
> This is with compaction throttling set to 0 (Off).
> So I believe because of this it's going to take a very long time to recover from having so many small sstables.
> It might be notable that we're using Solaris 10, possibly listFiles() is faster on other platforms?
> Is it feasible to keep track of the temp files and just delete them rather than searching for them for each SSTable using SSTable.componentsFor()?
> Here's the stack trace for the CompactionExecutor:14 thread that appears to be occupying the majority of the cpu time on this node:
> Name: CompactionExecutor:14
> State: RUNNABLE
> Total blocked: 3 Total waited: 1,610,714
> Stack trace:
> java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
> java.io.UnixFileSystem.getBooleanAttributes(Unknown Source)
> java.io.File.isDirectory(Unknown Source)
> org.apache.cassandra.io.sstable.SSTable$3.accept(SSTable.java:204)
> java.io.File.listFiles(Unknown Source)
> org.apache.cassandra.io.sstable.SSTable.componentsFor(SSTable.java:200)
> org.apache.cassandra.io.sstable.SSTableWriter.cleanupIfNecessary(SSTableWriter.java:289)
> org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:189)
> org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:57)
> org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:134)
> org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:114)
> java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> java.util.concurrent.FutureTask.run(Unknown Source)
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> java.lang.Thread.run(Unknown Source)
> No matter where I click in the busy Compaction thread timeline in YourKit it's in Running state and showing this above trace, except for short periods of time where it's actually compacting :)
> Thanks,
> Eric
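
For readers following the thread, the difference between scanning the data dir and looking only at the files a given sstable could own can be sketched roughly as below. This is not the code in 3532-v2.txt. The class name TempComponents, the suffix list, the "-tmp-" file naming and the demo() helper are all assumptions made up for illustration; the real component set and naming live in Cassandra's own classes.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class TempComponents {
    // Hypothetical component suffixes, for illustration only.
    static final List<String> SUFFIXES =
            Arrays.asList("Data.db", "Index.db", "Filter.db", "Statistics.db");

    // Directory scan: work grows with the total number of files in the data dir.
    static List<File> byListing(File dir, String prefix) {
        List<File> found = new ArrayList<>();
        File[] all = dir.listFiles();
        if (all == null) return found; // not a directory, or an I/O error
        for (File f : all)
            if (!f.isDirectory() && f.getName().startsWith(prefix)) found.add(f);
        return found;
    }

    // Direct probe: build each candidate name and check it, so the work
    // depends only on the number of component types.
    static List<File> byProbing(File dir, String prefix) {
        List<File> found = new ArrayList<>();
        for (String suffix : SUFFIXES) {
            File f = new File(dir, prefix + suffix);
            if (f.exists()) found.add(f);
        }
        return found;
    }

    // Delete whatever temp components exist for this prefix; returns how many were removed.
    static int abort(File dir, String prefix) {
        int deleted = 0;
        for (File f : byProbing(dir, prefix))
            if (f.delete()) deleted++;
        return deleted;
    }

    // Small self-check in a scratch directory: both strategies must agree.
    static int demo() {
        try {
            File dir = Files.createTempDirectory("sstables").toFile();
            String tmp = "indexes_03-tmp-hc-5369-"; // assumed temp naming
            new File(dir, tmp + "Data.db").createNewFile();
            new File(dir, tmp + "Index.db").createNewFile();
            new File(dir, "indexes_03-hc-5368-Data.db").createNewFile(); // live sstable, must survive
            if (!new HashSet<>(byListing(dir, tmp)).equals(new HashSet<>(byProbing(dir, tmp))))
                throw new IllegalStateException("strategies disagree");
            return abort(dir, tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("temp components deleted: " + demo());
    }
}
```

With 300,000 sstables in one directory, byListing touches every entry on each call, while byProbing issues a handful of existence checks regardless of directory size. Whether the actual patch probes or tracks files is not stated in this message.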