[ https://issues.apache.org/jira/browse/CASSANDRA-3592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168549#comment-13168549 ]
Jonathan Ellis commented on CASSANDRA-3592: ------------------------------------------- You're right, that's not likely to help. It sounded like such a good fit superficially! > Major Compaction Incredibly Slow > -------------------------------- > > Key: CASSANDRA-3592 > URL: https://issues.apache.org/jira/browse/CASSANDRA-3592 > Project: Cassandra > Issue Type: Bug > Components: Core > Affects Versions: 1.0.3 > Environment: RHEL6 - 24 core machines 24 GB mem total, 11 GB java heap > java version "1.6.0_26" > 6 node cluster (4@0.8.6, 2@1.0.3) > Reporter: Dan Hendry > Labels: compaction > > Twice now (on different nodes), I have observed major compaction for certain > column families take *significantly* longer on 1.0.3 in comparison to 0.8.6. > For example, > On the 0.8.6 node, the post compaction log message: > {noformat}CompactionManager.java (line 608) Compacted to XXX. 339,164,959,170 > to 158,825,469,883 (~46% of original) bytes for 25,996 keys. Time: > 26,934,317ms.{noformat} > On the 1.0.3 node, the post compaction log message: > {noformat} CompactionTask.java (line 213) Compacted to [XXX]. > 222,338,354,529 to 147,751,403,084 (~66% of original) bytes for 26,100 keys > at 0.562045MB/s. Time: 250,703,563ms.{noformat} > So... literally an order of magnitude slower on 1.0.3 in comparison to 0.8.6. > Relevant configuration settings: > * compaction_throughput_mb_per_sec: 0 (why? because the compaction throttling > logic as currently implemented is highly unsuitable for wide rows but thats a > different issue) > * in_memory_compaction_limit_in_mb: 128 > Column family characteristics: > * Many wide rows (~5% of rows greater than > 10MB and hundreds of rows > greater than 100 MB, with many small columns). > * Heavy use of expiring columns - each row represents data for a particular > hour so typically all columns in the row will expire together. > * The significant size shrinkage as reported by the log messages is due > mainly to expired data being cleaned up (I typically trigger major compaction > when 30-50% of the on disk data has expired which is about once every 3 weeks > per node). > * Perhaps obviously: size tiered compaction and no compression (the schema > has not changed since the partial upgrade to 1.0.x) > * Standard column family > Performance notes during compaction: > * Nice CPU usage and load average is basically the same between 0.8.6 and > 1.0.3 - ie, compaction IS running and is not getting stalled or hung up > anywhere. > * Compaction is IO bound on the 0.8.6 machines - the disks see heavy, > constant utilization when compaction is running. > * Compaction is uses virtually no IO on the 1.0.3 machines - disk utilization > is virtually no different when compacting vs not compacting (but at the same > time, CPU usage and load average clearly indicate that compaction IS running). > Finally, I have not had time to profile more thoroughly but jconsole always > shows the following stacktrace for the active compaction thread (for the > 1.0.3 machine): > {noformat} > Stack trace: > > org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(ColumnFamilyStore.java:851) > org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(ColumnFamilyStore.java:835) > org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore.java:826) > org.apache.cassandra.db.compaction.PrecompactedRow.removeDeletedAndOldShards(PrecompactedRow.java:77) > org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:102) > org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:133) > org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:102) > org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:87) > org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:116) > org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99) > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140) > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135) > com.google.common.collect.Iterators$7.computeNext(Iterators.java:614) > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140) > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135) > org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:172) > org.apache.cassandra.db.compaction.CompactionManager$4.call(CompactionManager.java:277) > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > java.util.concurrent.FutureTask.run(FutureTask.java:138) > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > java.lang.Thread.run(Thread.java:662) > {noformat} > I have not really had time to debug the issue further but given the stack > trace and column family characteristics, I suspect some problem with the > column family column iterator .remove() method. If the underlying > ISortedColumns implementation is a ArrayBackedSortedColumns instance, then > repeated .remove() calls could be incredibly inefficient. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira