[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375176#comment-15375176
 ] 

Sylvain Lebresne commented on CASSANDRA-9318:
---------------------------------------------

bq. At this point instead of adding more complexity to an approach that 
fundamentally doesn't solve that, why not back up and use an approach that does 
the right thing in all 3 cases instead?

My understanding of the fundamentals of Sergio's approach is to:
# maintain, on the coordinator, a state for each node that keeps track of how 
many in-flight queries we have for that node.
# on a new write query, check the state for the replicas involved in that query 
to decide what to do (when to hint the node, when to start rate limiting or 
when to start rejecting the queries to the client).
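
To make the first step concrete, here is a minimal Java sketch of the per-node 
state a coordinator could keep. The names ({{InFlightTracker}}, 
{{onRequestSent}}, {{onRequestCompleted}}, {{inFlightFor}}) are hypothetical and 
are not taken from the patch; nodes are identified by plain strings only to 
keep the example self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical coordinator-side state: number of in-flight writes per replica.
final class InFlightTracker {
    private final Map<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();

    // Called when a write has been sent to the given replica.
    void onRequestSent(String node) {
        inFlight.computeIfAbsent(node, n -> new AtomicInteger()).incrementAndGet();
    }

    // Called when the replica responds or the request times out.
    void onRequestCompleted(String node) {
        AtomicInteger count = inFlight.get(node);
        if (count != null) {
            // Never go below zero, even if a completion is reported twice.
            count.updateAndGet(c -> Math.max(0, c - 1));
        }
    }

    // Current number of outstanding requests for the replica (0 if unknown).
    int inFlightFor(String node) {
        AtomicInteger count = inFlight.get(node);
        return count == null ? 0 : count.get();
    }
}
```

The second step would then read {{inFlightFor()}} for each replica of a write 
before deciding whether to send, hint, rate limit or reject.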

In that sense, I don't think the approach is fundamentally wrong, but I feel 
the main question is the "what to do (and when)". And as I'm not sure there is 
a single perfect answer for that, I do also like the approach of a strategy, if 
only so experimentation is easier. Technically, though, instead of just having 
an {{apply()}} that potentially throws or sleeps, I think the strategy should 
take the replicas for the query and return a list of nodes to query and a list 
of nodes to hint (preserving the ability to sleep or throw). That gives more 
options on the "what to do" and avoids making back-pressure a node-per-node 
thing.

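As a rough illustration of a strategy that sees all replicas of a write at 
once, the interface could look something like the Java sketch below. Every name 
here ({{WriteBackPressureStrategy}}, {{WriteDecision}}, {{OverloadedException}}, 
{{NoBackPressure}}, the {{requiredAcks}} parameter) is an assumption made for 
the example and not the API of the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical result of a decision: which replicas to write to and which to hint.
final class WriteDecision {
    final List<String> toQuery;
    final List<String> toHint;

    WriteDecision(List<String> toQuery, List<String> toHint) {
        this.toQuery = toQuery;
        this.toHint = toHint;
    }
}

// Hypothetical exception a strategy throws to reject the write to the client.
final class OverloadedException extends RuntimeException {
    OverloadedException(String message) {
        super(message);
    }
}

// Hypothetical strategy interface: it receives every replica of the write,
// so the decision is not made node by node.
interface WriteBackPressureStrategy {
    // May sleep (rate limiting) or throw OverloadedException (rejection).
    WriteDecision decide(List<String> replicas, int requiredAcks);
}

// Trivial implementation: no back-pressure, write to every replica.
final class NoBackPressure implements WriteBackPressureStrategy {
    @Override
    public WriteDecision decide(List<String> replicas, int requiredAcks) {
        return new WriteDecision(new ArrayList<>(replicas), new ArrayList<>());
    }
}
```

Returning two lists rather than a yes/no per node is what lets a strategy hint 
one slow replica while still writing to the others.
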
In terms of the "default" back-pressure strategy we provide, I agree that we 
should mostly try to solve scenario 3: we should define some condition 
where we consider things overloaded and only apply back-pressure from there. 
Not sure what that exact condition is btw, and I'm not convinced we can come 
up with a good one out of thin air: I think we need to experiment.

tl;dr, if we make the strategy a bit more generic as mentioned above, so the 
decision is made from all replicas involved (maybe the strategy should also 
keep track of the replica state completely internally, so that implementing a 
basic strategy such as a simple high watermark is very easy), and we make sure 
not to throttle too quickly (typically, if a single replica is slow and we 
don't really need it, start by just hinting it), then I'd be happy moving to 
the "actually test this" phase and seeing how it goes.

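For what it's worth, a simple high-watermark strategy that keeps the replica 
state internally and hints before it throttles might look like the Java sketch 
below. It is only an illustration under stated assumptions: the class, its 
nested {{Decision}} type, the {{requiredAcks}} parameter and the use of 
{{IllegalStateException}} for rejection are all hypothetical, and the watermark 
value would have to come out of experimentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical strategy: hint replicas that are above a high watermark and
// reject the write only when too few replicas remain to satisfy the request.
final class HighWatermarkStrategy {
    // Result type declared here so the sketch stands alone.
    static final class Decision {
        final List<String> toQuery = new ArrayList<>();
        final List<String> toHint = new ArrayList<>();
    }

    private final int highWatermark;
    // Replica state is kept entirely inside the strategy.
    private final Map<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();

    HighWatermarkStrategy(int highWatermark) {
        this.highWatermark = highWatermark;
    }

    private AtomicInteger counter(String node) {
        return inFlight.computeIfAbsent(node, n -> new AtomicInteger());
    }

    // Note: check-then-increment is not atomic; acceptable for a sketch only.
    Decision decide(List<String> replicas, int requiredAcks) {
        Decision d = new Decision();
        for (String node : replicas) {
            if (counter(node).get() >= highWatermark) {
                d.toHint.add(node);   // slow replica: hint instead of sending
            } else {
                d.toQuery.add(node);
            }
        }
        if (d.toQuery.size() < requiredAcks) {
            // Hinting alone is not enough any more: reject (or rate limit) here.
            throw new IllegalStateException("overloaded: too few healthy replicas");
        }
        for (String node : d.toQuery) {
            counter(node).incrementAndGet();
        }
        return d;
    }

    // Called when a replica responds or its request times out.
    void onResponse(String node) {
        counter(node).updateAndGet(c -> Math.max(0, c - 1));
    }
}
```

With a watermark of 2 and three replicas, a single replica that already has 
two outstanding writes is hinted while the other two are still written to; the 
write is only rejected once fewer than {{requiredAcks}} replicas are below the 
watermark.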

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Sergio Bossa
>         Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, 
> limit.btm, no_backpressure.png
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster 
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding 
> bytes and requests and if it reaches a high watermark disable read on client 
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't 
> introduce other issues.



