[ https://issues.apache.org/jira/browse/CASSANDRA-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287853#comment-14287853 ]
Branimir Lambov commented on CASSANDRA-6809:
--------------------------------------------

{quote}
* single sync thread forms sections at regular time intervals and sends them to compression executor/phase (SPMC queue),
* sync thread waits on futures and syncs each in order
{quote}

I gave your suggestion a day of development, but it still introduces more problems than it solves. I took it [this far|https://github.com/blambov/cassandra/compare/blambov:compressed-cl...compressed-cl-compressionexecutor]. It is already significantly more complicated than the option I proposed, and I got worse performance and still have some uncertainties around recycling and shutdown.

Perhaps I did not put this clearly, but I don't see a point in introducing a trigger for compression other than a sync. Reducing write latency for a 10s sync period is of no value whatsoever; with a short period, especially in batch mode where it really matters, you wouldn't want to start a compression cycle before the batch is completed anyway (if you did, a better solution to the problem would be to just compress each mutation individually). We have ample flexibility in the sync period (time) and segment size (space) to use compression efficiently. Granted, this may require documenting different defaults for compression, but that is something I would much rather live with than the extra code complexity needed to work around badly chosen parameters.

Assuming sync-only triggering and short periods, your suggestion requires decoupling sync starts from sync completions, with a queue of sync requests in flight. That is what I implemented in the code above. Am I doing something wrong?

Going back to the previous approach (updated to fix a problem with the sync possibly completing earlier than it should):

bq. We're now no longer honouring the sync interval; we are syncing more frequently, which may reduce disk throughput.
bq. The exact time of syncing in relation to each other may also vary, likely falling into lock-step under saturation, so that there may be short periods of many competing syncs, potentially yielding pathological disk behaviour and introducing competition for the synchronized blocks inside the segments, in effect introducing an MPMC queue and eliminating those few micros of benefit.

The sync frequency is as specified; the intervals will vary, but writes to disk are still serial, so disks should behave normally. There will be competition on waitForSync if compression is constantly late but, as you say, in that case the magnitude of the overheads is too small to matter. A bigger problem is that I can imagine a pathological situation where only one thread is doing work: the others have nothing to do, but also become late waiting for it, and all start the next cycle at the same time.

> Compressed Commit Log
> ---------------------
>
>                 Key: CASSANDRA-6809
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6809
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: ComitLogStress.java, logtest.txt
>
> It seems an unnecessary oversight that we don't compress the commit log. Doing so should improve throughput, but some care will need to be taken to ensure we use as much of a segment as possible. I propose decoupling the writing of the records from the segments. Basically write into a (queue of) DirectByteBuffer, and have the sync thread compress, say, ~64K chunks every X MB written to the CL (where X is ordinarily CLS size), and then pack as many of the compressed chunks into a CLS as possible.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
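For reference, the scheme in the description (write records into a queue of buffers decoupled from segments; at sync time compress ~64K chunks and pack as many compressed chunks as fit into a segment) might be sketched roughly as below. All names here are hypothetical, and Deflater merely stands in for whichever ICompressor the real patch would use; this is a shape-of-the-idea sketch, not the actual implementation.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.zip.Deflater;

// Hypothetical sketch: records accumulate in a queue of buffers independent of
// any segment; the sync thread slices them into ~64K chunks, compresses each,
// and packs as many compressed chunks as possible into one segment.
public class ChunkedCommitLogSketch {
    static final int CHUNK_SIZE = 64 * 1024;   // ~64K uncompressed chunks
    static final int SEGMENT_SIZE = 1 << 20;   // stand-in for the CLS size

    final Queue<ByteBuffer> pending = new ArrayDeque<>();

    // Writers append serialized mutations here, decoupled from segments.
    void append(byte[] record) {
        pending.add(ByteBuffer.wrap(record));
    }

    static byte[] compress(byte[] chunk, int len) {
        Deflater d = new Deflater();
        d.setInput(chunk, 0, len);
        d.finish();
        // Generous bound so even incompressible input fits in one deflate call.
        byte[] out = new byte[len + len / 100 + 64];
        int n = d.deflate(out);
        d.end();
        byte[] exact = new byte[n];
        System.arraycopy(out, 0, exact, 0, n);
        return exact;
    }

    // Called by the sync thread: drain pending writes into ~64K chunks,
    // compress each, then group compressed chunks into segment-sized batches.
    List<List<byte[]>> sync() {
        List<byte[]> compressed = new ArrayList<>();
        byte[] chunk = new byte[CHUNK_SIZE];
        int filled = 0;
        while (!pending.isEmpty()) {
            ByteBuffer b = pending.poll();
            while (b.hasRemaining()) {
                int n = Math.min(b.remaining(), CHUNK_SIZE - filled);
                b.get(chunk, filled, n);
                filled += n;
                if (filled == CHUNK_SIZE) {
                    compressed.add(compress(chunk, filled));
                    filled = 0;
                }
            }
        }
        if (filled > 0)
            compressed.add(compress(chunk, filled));

        // Pack as many compressed chunks as possible into each segment.
        List<List<byte[]>> segments = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int used = 0;
        for (byte[] c : compressed) {
            if (used + c.length > SEGMENT_SIZE && !current.isEmpty()) {
                segments.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(c);
            used += c.length;
        }
        if (!current.isEmpty())
            segments.add(current);
        return segments;
    }
}
```

In this shape the compression happens on the sync thread itself, matching the sync-only-trigger position above; the executor variant under discussion would instead submit the compress calls to a pool and have the sync thread wait on the resulting futures in order.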