[jira] [Commented] (CASSANDRA-4182) multithreaded compaction very slow with large single data file and a few tiny data files

Karl Mueller (JIRA) Fri, 27 Apr 2012 12:33:14 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263907#comment-13263907
 ]


Karl Mueller commented on CASSANDRA-4182:
-----------------------------------------

Yes, I figure it's pretty much a worst-case scenario.  I didn't expect it to be 
any faster than single-threaded, and possibly a bit slower or using more CPU.  

However, it's a LOT slower (~80% slower). 

I'd be happy if it were the same speed as the single thread for the worst case 
with more CPU.  

                
> multithreaded compaction very slow with large single data file and a few tiny 
> data files
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4182
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4182
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.0.9
>         Environment: Redhat
> Sun JDK 1.6.0_20-b02
>            Reporter: Karl Mueller
>
> Turning on multithreaded compaction makes compaction take nearly twice as 
> long in our environment, which includes a very large SSTable and a few 
> smaller ones, relative to either 0.8.x with MT turned off or 1.0.x with MT 
> turned off.  
> compaction_throughput_mb_per_sec is set to 0.  
> We currently compact about 500 GB of data nightly due to overwrites.  
> (Leveled compaction will probably be enabled on the busy CFs once 1.0.x is 
> rolled out completely.)  The time it takes to do the compaction is:
> 451m13.284s (multithreaded)
> 273m58.740s (multithreaded disabled)
> Our nodes run on SSDs and therefore have a high read and write rate available 
> to them. The primary CF they're compacting right now, with most of the data, 
> is localized to a very large file (~300+GB) and a few tiny files (1-10GB) 
> since the CF has become far less active.  
> I would expect multithreaded compaction to be no worse than single-threaded 
> compaction, or perhaps to cost more CPU for the same performance, but it 
> runs at half the speed with the same or higher CPU usage. 
> I have two graphs available from testing 2 or 3 compactions which demonstrate 
> some interesting characteristics.  1.0.9 was installed on the 21st with MT 
> turned on.  Prior stuff is 0.8.7 with MT turned off, but 1.0.9 with MT turned 
> off seems to perform as well as 0.8.7.
> http://www.xney.com/temp/cass-irq.png  (interrupts)
> http://www.xney.com/temp/cass-iostat.png (io bandwidth of disks)
> These show a large increase in rescheduling interrupts and only half 
> the disk bandwidth being used.  I suspect this is because some of the 
> threads are thrashing.
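
For reference, the two wall-clock figures quoted in the description can be compared directly. The following is a minimal sketch in Python, assuming the durations are in the `time`-style "XmY.YYYs" format shown above; the helper name parse_time is hypothetical and not part of Cassandra or any tool mentioned in this issue.

```python
import re


def parse_time(text):
    """Convert a `time`-style duration such as '451m13.284s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Figures copied from the issue description.
multithreaded = parse_time("451m13.284s")
single_threaded = parse_time("273m58.740s")

# How much longer the multithreaded run took, as a ratio and a percentage.
ratio = multithreaded / single_threaded
extra_time_pct = (ratio - 1) * 100

# Equivalent drop in effective throughput for the same amount of data.
throughput_drop_pct = (1 - 1 / ratio) * 100

print(f"multithreaded / single-threaded: {ratio:.2f}x")
print(f"extra wall-clock time: {extra_time_pct:.0f}%")
print(f"effective throughput drop: {throughput_drop_pct:.0f}%")
```

With these two samples the multithreaded run works out to roughly 1.65 times the single-threaded duration. This is a single pair of measurements, so it only indicates the rough size of the regression.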

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
