[ https://issues.apache.org/jira/browse/CASSANDRA-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei Deng updated CASSANDRA-11380:
---------------------------------
    External issue ID: 7937

> Client visible backpressure mechanism
> -------------------------------------
>
>                 Key: CASSANDRA-11380
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11380
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Coordination
>            Reporter: Wei Deng
>
> Cassandra currently lacks a sophisticated backpressure mechanism to prevent
> clients from ingesting data at too high a throughput. One reason is its SEDA
> (Staged Event-Driven Architecture) design. With SEDA, an overloaded thread
> pool can drop droppable messages (in this case, MutationStage can drop
> mutation or counter mutation messages) when they exceed the 2-second
> timeout. This can save the JVM from running out of memory and crashing.
> However, one downside of this load-shedding approach to backpressure is that
> a higher number of dropped mutations increases the chance of inconsistency
> among replicas and will likely require more repair (hints can help to some
> extent, but they are not designed to cover all inconsistencies). Another
> downside is that excessive writes also put much more pressure on compaction
> (especially LCS). Backlogged compaction increases read latency and causes
> more frequent GC pauses, and, depending on the type of compaction, some
> backlog can take a long time to clear even after the write load is removed.
> It seems that the current load-shedding mechanism is not adequate for a
> common bulk-loading scenario, where clients try to ingest data at the
> highest possible throughput. We need a more direct way to tell the client
> drivers to slow down.
> It appears that HBase suffered from a similar situation, as discussed in
> HBASE-5162, and introduced a special exception type to tell the client to
> slow down when a certain "overloaded" criterion is met.
> If we can leverage a similar mechanism, our dropped mutation events can be
> used to trigger such exceptions to push back on the client. At the same
> time, backlogged compaction (when the number of pending compactions exceeds
> a certain threshold) can also be used for push-back, which can prevent the
> vicious cycle mentioned in
> https://issues.apache.org/jira/browse/CASSANDRA-11366?focusedCommentId=15198786&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15198786.
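
The push-back idea described in the issue could be sketched roughly as follows. This is only an illustration in Python, not Cassandra or driver code: `OverloadedError`, `NodeLoad`, `check_overloaded`, `write_with_backoff`, and both threshold values are hypothetical names and numbers chosen for the example, and the issue itself does not specify any of them.

```python
import time
from dataclasses import dataclass


class OverloadedError(Exception):
    """Hypothetical signal that tells the client to slow down."""


@dataclass
class NodeLoad:
    dropped_mutations: int    # mutations dropped in the recent window
    pending_compactions: int  # compaction tasks still waiting to run


# Illustrative thresholds only; the issue does not propose concrete values.
MAX_DROPPED_MUTATIONS = 0
MAX_PENDING_COMPACTIONS = 100


def check_overloaded(load: NodeLoad) -> None:
    """Server side: raise OverloadedError when either criterion is met."""
    if load.dropped_mutations > MAX_DROPPED_MUTATIONS:
        raise OverloadedError("mutations are being dropped")
    if load.pending_compactions > MAX_PENDING_COMPACTIONS:
        raise OverloadedError("compaction backlog exceeds threshold")


def write_with_backoff(do_write, max_attempts=5, base_delay=0.1,
                       sleep=time.sleep):
    """Client side: retry a write, doubling the pause after each push-back."""
    for attempt in range(max_attempts):
        try:
            return do_write()
        except OverloadedError:
            if attempt == max_attempts - 1:
                raise  # give up and surface the overload to the caller
            sleep(base_delay * 2 ** attempt)
```

In this sketch the server rejects a write up front instead of silently dropping it later, and the client reacts by pausing for progressively longer intervals, which is one possible way a driver might "slow down" in response to such an exception.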