[ https://issues.apache.org/jira/browse/CASSANDRA-11366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198786#comment-15198786 ]

Jeff Jirsa commented on CASSANDRA-11366:
----------------------------------------

There are other times when pending compactions will increase (notably 
bootstrapping / repair, where streaming - especially in a vnode situation - 
will create dozens/hundreds/thousands of sstables). If you apply backpressure 
and drop inbound mutations in these situations, you'll end up needing to 
repair, which will again create tons of sstables, which will create more 
backpressure, which could start a vicious cycle.

As an operator, I would strongly prefer the existing behavior to having 
one more source of backpressure to try to reason about - the right answer is 
for operations teams to monitor pending compactions and sstable counts and 
adjust accordingly (especially since adjusting may mean tuning concurrent 
compactors or compaction throughput, not throttling writes).
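
For what it's worth, a minimal sketch of that kind of monitoring over JMX; 
the host, port, poll interval and threshold are made-up values, and the 
MBean name is assumed to be the metrics gauge Cassandra exposes for pending 
compactions (verify it against your version's JMX tree):

{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/**
 * Polls a node's pending-compaction gauge over JMX and warns when it crosses
 * a threshold. Host, port, poll interval and threshold are illustrative only.
 */
public class PendingCompactionWatcher
{
    public static void main(String[] args) throws Exception
    {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7199; // default Cassandra JMX port
        int threshold = 100; // purely illustrative alert threshold

        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // assumed metrics gauge name; may differ between versions
            ObjectName pending = new ObjectName(
                    "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks");
            while (true)
            {
                Number backlog = (Number) mbs.getAttribute(pending, "Value");
                if (backlog.intValue() > threshold)
                    System.out.println("WARN: " + backlog + " pending compactions - "
                            + "consider raising compaction throughput / concurrent "
                            + "compactors or throttling ingest");
                else
                    System.out.println("OK: " + backlog + " pending compactions");
                Thread.sleep(30_000); // poll every 30s
            }
        }
        finally
        {
            connector.close();
        }
    }
}
{code}

The same figure is also visible from the command line via nodetool 
compactionstats, and compaction throughput can be adjusted on the fly with 
nodetool setcompactionthroughput.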
 

> Compaction backlog triggered backpressure to write operations
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-11366
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11366
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths
>            Reporter: Wei Deng
>
> A general perception about Cassandra is that it is super efficient at taking 
> writes and can handle very high write throughput, due to its simple and 
> efficient commitlog/memtable write path. However, one aspect that is often 
> overlooked by new Cassandra users is that when they ingest a lot of data at 
> the highest throughput possible, trying to push the hardware's limit, as soon 
> as compaction starts to kick in, a seemingly harmless write workload can 
> create far more I/O and CPU consumption than their hardware can accommodate. 
> As a result, memtable flushes take much longer to finish due to I/O 
> contention from compaction, thousands of SSTables start to show up on the 
> data disk because compaction is not able to process them all in time, and GC 
> pauses become more frequent and impactful because reads have to touch far 
> more SSTables than in an ideal situation. Depending on the compaction 
> strategy a Cassandra table uses, this kind of overwhelmed, backlogged 
> situation can sometimes take a long time to clear up (for 
> LeveledCompactionStrategy, or LCS, it could be hours) even after the write 
> workload is removed.
>
> Currently writes have a back pressure mechanism called load shedding, which 
> is triggered when the MutationStage gets too overwhelmed and a mutation 
> takes more than 2 seconds to get acknowledged; in that case the mutation 
> message is dropped and a hint is added by StorageProxy. However, this 
> mechanism does not take compaction backlogs into consideration. As a result, 
> compaction can fall far behind under a heavy write workload and continue to 
> accumulate more and more pending compactions and unprocessed SSTables, while 
> writes keep flowing at the same rate because write timeouts have not been 
> triggered yet. For LCS and DTCS this can get out of control and become hard 
> to recover from. As a practical workaround, you can monitor the number of 
> pending compactions and, if it starts trending upward, reduce the write 
> throughput. This has proved to be effective. However, these two activities 
> are usually handled by different teams in an enterprise, so they are not 
> always easy to coordinate. If there were some back pressure mechanism (to 
> slow down the intake of writes) that could be triggered automatically based 
> on how far behind compaction is, it would make DBAs' lives easier.
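
To make the shape of the proposal concrete, here is a rough, 
illustrative-only sketch of a backlog-driven throttle; pendingCompactions(), 
SOFT_LIMIT, MAX_DELAY_MS and the linear delay formula are all hypothetical 
and not part of Cassandra's actual write path:

{code:java}
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

/**
 * Illustrative-only sketch of compaction-backlog-driven write throttling:
 * once the backlog exceeds a soft limit, each incoming mutation is delayed,
 * and the delay grows with the size of the backlog. All names and numbers
 * are hypothetical; this is not Cassandra's write path.
 */
public class CompactionBackpressureSketch
{
    static final int SOFT_LIMIT = 100;   // pending compactions before throttling starts (hypothetical)
    static final long MAX_DELAY_MS = 50; // cap on per-mutation delay (hypothetical)

    /** Stand-in for a real pending-compactions gauge. */
    static int pendingCompactions()
    {
        return ThreadLocalRandom.current().nextInt(0, 300);
    }

    /** Delay grows linearly with how far the backlog exceeds the soft limit. */
    static void maybeThrottle() throws InterruptedException
    {
        int backlog = pendingCompactions();
        if (backlog <= SOFT_LIMIT)
            return;
        long delay = Math.min(MAX_DELAY_MS, (backlog - SOFT_LIMIT) / 4);
        TimeUnit.MILLISECONDS.sleep(delay);
    }

    public static void main(String[] args) throws InterruptedException
    {
        for (int i = 0; i < 10; i++)
        {
            maybeThrottle(); // apply backpressure before accepting the write
            System.out.println("accepted mutation " + i);
        }
    }
}
{code}

Any real implementation would of course have to weigh this against the 
bootstrap/repair concern raised in the comment above, where throttling and 
dropping mutations can make the backlog worse.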



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
