[ https://issues.apache.org/jira/browse/CASSANDRA-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285436#comment-14285436 ]
Branimir Lambov commented on CASSANDRA-6809:
--------------------------------------------

The current approach boils down to multiple sync threads which
* form sections at regular time intervals,
* compress the section,
* wait for any previous syncs to have retired,
* write and flush the compressed data,
* retire the sync.

(In the uncompressed case, a single thread which skips the second and third steps.)

Let me try to rephrase what you are saying to make sure I understand it correctly:
* a single sync thread forms sections at regular time intervals and sends them to a compression executor/phase (SPMC queue),
* a compression task sends completed sections to a flush executor/phase (MPSC queue; ordering and waiting for the first in-flight one are required),
* a flush task retires syncs in order.

Is this what you mean? Why is this simpler, or of comparable complexity? Wouldn't the two extra queues waste resources and increase latency?

Smaller-than-segment batches (sections) are already part of the design in both cases, assuming the sync period is sane (e.g. ~100ms). In both approaches there is room to further separate write and flush, at the expense of added complexity.

> Compressed Commit Log
> ---------------------
>
>                 Key: CASSANDRA-6809
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6809
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: ComitLogStress.java, logtest.txt
>
>
> It seems an unnecessary oversight that we don't compress the commit log.
> Doing so should improve throughput, but some care will need to be taken to
> ensure we use as much of a segment as possible. I propose decoupling the
> writing of the records from the segments.
> Basically write into a (queue of)
> DirectByteBuffers, and have the sync thread compress, say, ~64K chunks every X
> MB written to the CL (where X is ordinarily the CLS size), and then pack as many
> of the compressed chunks into a CLS as possible.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
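The chunked scheme quoted above can be sketched roughly as follows. This is only an illustration, not the actual patch: it uses `java.util.zip.Deflater` as a stand-in codec, and the segment size, chunk size, and 4-byte length framing are all assumptions chosen for the example.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.zip.Deflater;

// Hypothetical sketch: compress ~64K raw chunks of commit log data,
// then pack as many compressed chunks as fit into a fixed-size segment.
public class ChunkPacker {
    static final int CHUNK_SIZE = 64 * 1024;      // ~64K raw chunks, per the ticket description
    static final int SEGMENT_SIZE = 1024 * 1024;  // illustrative segment size, not Cassandra's actual CLS size

    /** Compress one raw chunk with DEFLATE (a stand-in for whatever codec the patch would use). */
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Count segments needed, starting a new segment whenever the next framed chunk won't fit. */
    static int packIntoSegments(Queue<byte[]> compressedChunks) {
        int segments = 1;
        int used = 0;
        for (byte[] chunk : compressedChunks) {
            int framed = 4 + chunk.length;  // 4-byte length prefix per chunk (illustrative framing)
            if (used + framed > SEGMENT_SIZE) {
                segments++;
                used = 0;
            }
            used += framed;
        }
        return segments;
    }

    public static void main(String[] args) {
        Queue<byte[]> compressed = new ArrayDeque<>();
        byte[] raw = new byte[CHUNK_SIZE];  // zero-filled, so it compresses very well
        for (int i = 0; i < 100; i++) {
            compressed.add(compress(raw));
        }
        System.out.println("packed 100 x 64K raw chunks into "
                + packIntoSegments(compressed) + " segment(s)");
    }
}
```

The point of the example is the packing step: because compressed chunks vary in size, the writer fills each segment greedily and only rolls to a new segment when the next chunk would overflow it, which is what lets the design "use as much of a segment as possible".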