[ 
https://issues.apache.org/jira/browse/CASSANDRA-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277638#comment-14277638
 ] 

Branimir Lambov commented on CASSANDRA-6809:
--------------------------------------------

Thanks for the review, Ariel. The plan was to implement compression without 
introducing too much extra complexity, so I've stayed away from adding new 
queues and stages dedicated to compression. Even so, a pipeline that achieves 
the highest possible throughput is quite doable by simply using more than one 
sync thread, and the size of the compression window is easily controlled via 
the sync period. Since we don't have an incremental compression mechanism, 
compression necessarily has to happen at the end, i.e. once the whole set of 
mutations to compress has been written to the buffer (this is usually not at 
the end of the segment).
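
To make "compression at the end of the window" concrete, here is a rough 
sketch of what a sync thread could do with the bytes written since the last 
sync. It is an illustration only, not the code in the patch: the class and 
method names are made up for this example, and java.util.zip.Deflater stands 
in for whatever compressor is actually used.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of Cassandra: compresses one sync window in one shot.
class SyncWindowCompressor {

    // Compresses bytes [start, end) of the segment buffer. The whole window must
    // already be written, since there is no incremental compression here.
    static byte[] compressWindow(ByteBuffer segment, int start, int end) {
        // Work on a duplicate so the writers' position and limit are left alone.
        ByteBuffer view = segment.duplicate();
        view.limit(end);
        view.position(start);
        byte[] input = new byte[end - start];
        view.get(input);

        Deflater deflater = new Deflater(); // assumption: stand-in compressor
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Inverse of compressWindow, as replay would need; originalLength must be
    // recorded alongside the compressed bytes.
    static byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] result = new byte[originalLength];
            int off = 0;
            while (off < originalLength) {
                int n = inflater.inflate(result, off, originalLength - off);
                if (n == 0) {
                    throw new IllegalStateException("truncated or corrupt window");
                }
                off += n;
            }
            return result;
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        ByteBuffer segment = ByteBuffer.allocate(1 << 16);
        int lastSynced = 0; // start of the current sync window

        // Writers append mutations; nothing is compressed at this point.
        for (int i = 0; i < 200; i++) {
            segment.put(("mutation-" + i + ";").getBytes(StandardCharsets.UTF_8));
        }

        // At sync time the window [lastSynced, position) is compressed as a unit.
        int end = segment.position();
        byte[] compressed = compressWindow(segment, lastSynced, end);
        byte[] restored = decompress(compressed, end - lastSynced);
        System.out.println((end - lastSynced) + " bytes -> " + compressed.length
                + " compressed, restored " + restored.length);
    }
}
```

In this sketch a longer sync period simply means a larger [start, end) window 
per call, and several sync threads could each run compressWindow on a 
different window.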

CASSANDRA-7075 is valuable in its own right. Arguably RAID 0 is not good enough 
in either performance or reliability. None of the extra complexity we introduce 
there is made necessary by compression-related concerns, but one of the side 
effects of it is the availability of more than one sync thread for compression. 
It is a solution of sorts to the lack of CPU saturation from this patch, but it 
is not at all the only way to achieve it.

This code was written before ByteBuffer compression was made available; I will 
definitely make use of that now, but I wonder if that should not be a separate 
patch so that we don't have to block on or conflict with Jake's patch.

The playback tests are in the various RecoveryManagerTests in o.a.c.db; the 
tests are the same for the uncompressed (test/testold target) and compressed 
case (test-compressed target). For performance tests the ultimate measure is 
cassandra-stress; ComitLogStress is a simple microbenchmark of how much we can 
push that favors compression -- make sure to run it with periodic rather than 
batch sync. A latency test is probably needed for batch mode; that would 
likely require some changes to the service to make sure the sync period can 
go low enough for the write latency to show up.

I have not looked at the other suggestions yet; I have to switch modes from the 
quite different node allocation work, so give me a couple of days.

> Compressed Commit Log
> ---------------------
>
>                 Key: CASSANDRA-6809
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6809
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: logtest.txt
>
>
> It seems an unnecessary oversight that we don't compress the commit log. 
> Doing so should improve throughput, but some care will need to be taken to 
> ensure we use as much of a segment as possible. I propose decoupling the 
> writing of the records from the segments. Basically write into a (queue of) 
> DirectByteBuffer, and have the sync thread compress, say, ~64K chunks every X 
> MB written to the CL (where X is ordinarily CLS size), and then pack as many 
> of the compressed chunks into a CLS as possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
