[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375132#comment-15375132 ]

Sergio Bossa commented on CASSANDRA-9318:
-----------------------------------------

[~jbellis],

bq. it causes other problems in the other two (non-global-overload) scenarios.

I think you are overstating the problem here: the first two scenarios are
either very limited in time (the first) or very limited in magnitude (the
second), and the back-pressure algorithm can be made as sensitive and as
reactive as you wish by tuning the incoming/outgoing imbalance you want to
tolerate and the growth factor.
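
The following is a minimal Java sketch showing how those two knobs could drive a per-replica rate limit. It is hypothetical and is not the code in the patch. It assumes the following:
- Over each time window, the coordinator compares the rate of responses coming back from a replica (incoming) with the rate of requests sent to it (outgoing).
- If the ratio stays at or above the tolerated threshold, the limit grows by the growth factor.
- Otherwise, the limit is clamped to the rate the replica is actually serving.

The names `ReplicaRate`, `toleratedRatio`, `growthFactor` and `MIN_RATE` are illustrative assumptions.

```java
import java.util.Locale;

final class BackPressureSketch {

    /** Hypothetical per-replica limiter state; not the patch's actual API. */
    static final class ReplicaRate {
        // Assumed floor so a replica that answered nothing in a window can still recover.
        private static final double MIN_RATE = 1.0;

        private final double toleratedRatio; // e.g. 0.9: responses must keep up with 90% of requests sent
        private final double growthFactor;   // e.g. 0.1: raise the limit by 10% after a healthy window
        private double rateLimit;            // writes per second allowed towards this replica

        ReplicaRate(double initialLimit, double toleratedRatio, double growthFactor) {
            if (initialLimit < MIN_RATE || toleratedRatio <= 0 || growthFactor < 0)
                throw new IllegalArgumentException("invalid back-pressure parameters");
            this.rateLimit = initialLimit;
            this.toleratedRatio = toleratedRatio;
            this.growthFactor = growthFactor;
        }

        /** Called once per window with the observed outgoing (sent) and incoming (answered) rates. */
        void update(double outgoingRate, double incomingRate) {
            if (outgoingRate <= 0) return; // nothing sent this window: keep the current limit
            double ratio = incomingRate / outgoingRate;
            if (ratio >= toleratedRatio) {
                rateLimit *= 1 + growthFactor; // replica keeps up: probe upwards
            } else {
                rateLimit = Math.max(incomingRate, MIN_RATE); // replica lags: match what it serves
            }
        }

        double rateLimit() { return rateLimit; }
    }

    public static void main(String[] args) {
        ReplicaRate replica = new ReplicaRate(1000, 0.9, 0.1);
        replica.update(1000, 950); // ratio 0.95 >= 0.9: the limit grows by 10%
        System.out.printf(Locale.ROOT, "after healthy window: %.1f%n", replica.rateLimit());
        replica.update(1100, 600); // ratio ~0.55 < 0.9: the limit drops to 600/s
        System.out.printf(Locale.ROOT, "after slow window: %.1f%n", replica.rateLimit());
    }
}
```

Lowering `toleratedRatio` makes the limiter ignore larger transient imbalances. A larger `growthFactor` lets the limit recover faster once the replica catches up, at the cost of overshooting more.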

bq. I honestly don't see what is "better" about a "slow every write down to the 
speed of the slowest, possibly sick, replica" approach. Defining a simple high 
water mark on requests in flight should be much simpler without the negative 
side effects.

Such a threshold would be too arbitrary and coarse-grained, but that's not
even the real problem; the point is rather what you're going to do when the
threshold is met. Say the high water mark is reached: we really only have
these options:
1) Throttle at the rate of the slow replicas, which is what we do in this patch.
2) Take the slow replica(s) out, which is even worse in terms of availability.
3) Rate-limit the message dequeueing in the outbound connection, but this only
moves the back-pressure problem from one place to another.
4) Rate-limit at a global rate equal to the water mark, but this only helps the
coordinator, as such a rate might still be too high for the slow replicas.
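
Option 1 can be illustrated with a hypothetical Java sketch; it is not the patch itself. The coordinator takes the smallest per-replica rate among the replicas a write targets and paces sends at that rate. The names `SlowestReplicaThrottle`, `effectiveRate` and `reserve` are assumptions. The pacing is a plain "next free slot" scheme, chosen for brevity.

```java
import java.util.Arrays;
import java.util.Collection;

final class SlowestReplicaThrottle {
    // Earliest time (ns) the next write may be sent; MIN_VALUE because System.nanoTime() may be negative.
    private long nextFreeNanos = Long.MIN_VALUE;

    /** Option 1: a write may go no faster than the slowest of its replicas allows. */
    static double effectiveRate(Collection<Double> replicaRates) {
        double min = Double.POSITIVE_INFINITY; // no replicas -> no throttling
        for (double rate : replicaRates) min = Math.min(min, rate);
        return min;
    }

    /**
     * Reserves a send slot at {@code ratePerSecond} and returns how many
     * nanoseconds the caller should wait, measured from {@code nowNanos}.
     */
    synchronized long reserve(double ratePerSecond, long nowNanos) {
        if (Double.isInfinite(ratePerSecond)) return 0;
        if (!(ratePerSecond > 0)) throw new IllegalArgumentException("rate must be positive");
        long intervalNanos = (long) (1_000_000_000L / ratePerSecond);
        long start = Math.max(nowNanos, nextFreeNanos); // idle time does not build up credit
        nextFreeNanos = start + intervalNanos;
        return start - nowNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        SlowestReplicaThrottle throttle = new SlowestReplicaThrottle();
        // Hypothetical per-replica limits (writes/s); the 200/s replica dictates the pace.
        double rate = effectiveRate(Arrays.asList(1000.0, 200.0, 500.0));
        for (int i = 0; i < 3; i++) {
            long waitNanos = throttle.reserve(rate, System.nanoTime());
            if (waitNanos > 0) Thread.sleep(waitNanos / 1_000_000, (int) (waitNanos % 1_000_000));
            System.out.println("write " + i + " sent after waiting " + waitNanos / 1_000_000 + " ms");
        }
    }
}
```

The trade-off the comment describes is visible here. Every write to this replica set is paced at 200/s, even though two of the three replicas could accept more.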

In the end, I can't see any better option than what we implement in this patch
for those use cases willing to trade performance for overall stability, and I
would at least like it to go through proper QA testing, to see how it behaves on
larger clusters, fix any sharp edges, and assess how it stands overall.

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Sergio Bossa
>         Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, 
> limit.btm, no_backpressure.png
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster 
> by bounding the number of in-flight requests and request bytes.
> An implementation might track the number of outstanding bytes and requests
> and, if it reaches a high watermark, disable reads on client connections
> until it goes back below some low watermark.
> We need to make sure that disabling reads on client connections won't
> introduce other issues.
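
The high/low watermark idea in the issue description can be sketched in Java as follows. This is a minimal illustration under these assumptions:
- Both request count and request bytes are tracked.
- Reads are disabled when either counter crosses its high watermark.
- Reads are re-enabled only when both counters are back at or below their low watermarks.

The `setClientReads` hook is hypothetical. In a Netty-based server it might toggle `Channel.config().setAutoRead(...)` on client connections.

```java
import java.util.function.Consumer;

final class InFlightLimiter {
    private final long highRequests, lowRequests, highBytes, lowBytes;
    // Hypothetical hook that enables (true) or disables (false) reads on client connections.
    private final Consumer<Boolean> setClientReads;
    private long requests;
    private long bytes;
    private boolean readsEnabled = true;

    InFlightLimiter(long highRequests, long lowRequests, long highBytes, long lowBytes,
                    Consumer<Boolean> setClientReads) {
        if (lowRequests >= highRequests || lowBytes >= highBytes)
            throw new IllegalArgumentException("low watermarks must be below high watermarks");
        this.highRequests = highRequests;
        this.lowRequests = lowRequests;
        this.highBytes = highBytes;
        this.lowBytes = lowBytes;
        this.setClientReads = setClientReads;
    }

    /** Called when the coordinator accepts a request of {@code size} bytes. */
    synchronized void onRequestStart(long size) {
        requests++;
        bytes += size;
        if (readsEnabled && (requests >= highRequests || bytes >= highBytes)) {
            readsEnabled = false; // stop reading new requests from clients
            setClientReads.accept(false);
        }
    }

    /** Called when a request of {@code size} bytes completes (success, failure or timeout). */
    synchronized void onRequestComplete(long size) {
        requests--;
        bytes -= size;
        // Hysteresis: resume only once both counters fall to their low watermarks,
        // so reads are not toggled on and off at every request near the limit.
        if (!readsEnabled && requests <= lowRequests && bytes <= lowBytes) {
            readsEnabled = true;
            setClientReads.accept(true);
        }
    }

    public static void main(String[] args) {
        InFlightLimiter limiter = new InFlightLimiter(3, 1, 1 << 20, 1 << 19,
                enabled -> System.out.println("client reads " + (enabled ? "enabled" : "disabled")));
        for (int i = 0; i < 3; i++) limiter.onRequestStart(1024); // third request hits the high watermark
        limiter.onRequestComplete(1024); // 2 in flight: still above the low watermark
        limiter.onRequestComplete(1024); // 1 in flight: reads resume
    }
}
```

The hook is invoked while holding the lock to keep the sketch short. This is one of the "other issues" the description warns about: a real hook would need to be non-blocking.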



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
