[ https://issues.apache.org/jira/browse/CASSANDRA-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3532:
--------------------------------------

    Attachment: 3532-v2.txt

v2 attached, with the new approach moved into componentsFor so that open() can 
take advantage of the improvement too. componentsFor now also respects whether 
the descriptor passed in is temporary, so a separate TempState enum is 
unnecessary.
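
For the archives, a minimal sketch of the componentsFor idea (illustrative, 
not the exact patch; KNOWN_COMPONENTS is a hypothetical list of the component 
singletons): rather than listing the whole data directory, probe for each 
known component by name against the descriptor, which already encodes whether 
the sstable is temporary:

    import java.io.File;
    import java.util.HashSet;
    import java.util.Set;

    // Sketch only: Descriptor and Component stand for the existing sstable classes.
    static Set<Component> componentsFor(Descriptor desc)
    {
        Set<Component> components = new HashSet<Component>();
        for (Component component : KNOWN_COMPONENTS) // hypothetical component list
        {
            // desc.filenameFor(component) reflects the descriptor's
            // temporary flag, so no separate TempState argument is needed
            if (new File(desc.filenameFor(component)).exists())
                components.add(component);
        }
        return components;
    }

This turns the per-writer cost from O(files in the data dir) into O(component 
types).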

Also renamed cleanupIfNecessary to abort, and moved it into the catch block as 
discussed above.
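
Roughly the shape this takes in the compaction path (illustrative only; the 
writer creation and merge loop below are hypothetical stand-ins, not the 
patch's code):

    SSTableWriter writer = createWriter();   // hypothetical stand-in for writer creation
    try
    {
        writeMergedRows(writer);             // hypothetical stand-in for the merge loop
        writer.closeAndOpenReader();
    }
    catch (Exception e)
    {
        writer.abort(); // formerly cleanupIfNecessary; now invoked only on failure
        throw new RuntimeException(e);
    }

So the happy path no longer pays any cleanup cost at all; abort runs only when 
the compaction fails partway.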
                
> Compaction cleanupIfNecessary costly when many files in data dir
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-3532
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3532
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>         Environment: Solaris 10, 1.0.4 release candidate
>            Reporter: Eric Parusel
>              Labels: compaction
>             Fix For: 1.0.6
>
>         Attachments: 3532-v2.txt, 3532.txt
>
>
> From what I can tell, SSTableWriter.cleanupIfNecessary grows increasingly 
> costly as the number of files in the data dir increases.
> It calls SSTable.componentsFor(descriptor, Descriptor.TempState.TEMP), which 
> lists all files in the data dir to find matching components.
> Am I roughly correct that (cleanup cost = SSTable count * number of files in 
> the data dir)?
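> To put illustrative numbers on that (assuming ~5 components per sstable): 
> 300,000 sstables means ~1,500,000 directory entries, so a single listFiles() 
> pass in componentsFor stats ~1.5M files, and doing one such pass per sstable 
> written makes the total cleanup cost roughly quadratic in sstable count.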
> We had been doing write load testing with default compaction throttling 
> (16MB/s) and LeveledCompaction.
> Unfortunately we hadn't been keeping tabs on sstable counts, and they grew 
> out of control.
> On a system with 300,000 sstables (!), here is an example of our compaction 
> rate. Note that, as you're probably aware, cleanupIfNecessary is included in 
> the timing:
>  INFO [CompactionExecutor:48] 2011-11-25 22:25:30,353 CompactionTask.java 
> (line 213) Compacted to 
> [/data1/cassandra/data/MA_DDR/indexes_03-hc-5369-Data.db,].  5,821,590 to 
> 5,306,354 (~91% of original) bytes for 123 keys at 0.163755MB/s.  Time: 
> 30,903ms.
> Here's a slightly larger one:
>  INFO [CompactionExecutor:43] 2011-11-25 22:23:28,956 CompactionTask.java 
> (line 213) Compacted to 
> [/data1/cassandra/data/MA_DDR/indexes_03-hc-5336-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5337-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5338-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5339-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5340-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5341-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5342-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5343-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5344-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5345-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5346-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5347-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5348-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5349-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5350-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5351-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5352-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5353-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5354-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5355-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5356-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5357-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5358-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5359-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5360-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5361-Data.db,].
>   140,706,512 to 137,990,868 (~98% of original) bytes for 2,181 keys at 
> 0.338627MB/s.  Time: 388,623ms.
> This is with compaction throttling set to 0 (Off).
> Because of this, I believe it's going to take a very long time to recover 
> from having so many small sstables.
> It might be notable that we're using Solaris 10; possibly listFiles() is 
> faster on other platforms?
> Is it feasible to keep track of the temp files and just delete them, rather 
> than searching for them with SSTable.componentsFor() for each SSTable?
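> Something along these lines is what I have in mind (hypothetical sketch, not 
> real code; the writer would remember what it created):
>
>     // inside the writer: remember each component as it is created, so
>     // cleanup is O(own components) instead of O(files in the data dir)
>     private final Set<Component> written = new HashSet<Component>();
>
>     void componentCreated(Component c) { written.add(c); }
>
>     void cleanup()
>     {
>         for (Component c : written)
>             new File(descriptor.filenameFor(c)).delete();
>     }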
> Here's the stack trace for the CompactionExecutor:14 thread that appears to 
> be occupying the majority of the cpu time on this node:
> Name: CompactionExecutor:14
> State: RUNNABLE
> Total blocked: 3  Total waited: 1,610,714
> Stack trace: 
>  java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
> java.io.UnixFileSystem.getBooleanAttributes(Unknown Source)
> java.io.File.isDirectory(Unknown Source)
> org.apache.cassandra.io.sstable.SSTable$3.accept(SSTable.java:204)
> java.io.File.listFiles(Unknown Source)
> org.apache.cassandra.io.sstable.SSTable.componentsFor(SSTable.java:200)
> org.apache.cassandra.io.sstable.SSTableWriter.cleanupIfNecessary(SSTableWriter.java:289)
> org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:189)
> org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:57)
> org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:134)
> org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:114)
> java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> java.util.concurrent.FutureTask.run(Unknown Source)
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> java.lang.Thread.run(Unknown Source)
> No matter where I click in the busy compaction thread timeline in YourKit, 
> it's in the Running state showing the trace above, except for short periods 
> where it's actually compacting :)
> Thanks,
> Eric
