[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365817#comment-15365817 ]
Sylvain Lebresne commented on CASSANDRA-9318:
---------------------------------------------

A first concern I have, which has already been brought up earlier in the conversation, is that this will create a lot more OverloadedExceptions than we're currently used to, with 2 concrete issues imo:
# the new OverloadedException has a different meaning than the old one. The old one meant the coordinator was overloaded (by too much hint writing), so client drivers will tend to try another coordinator when that happens. That's not a good idea here.
# I think in many cases that's not what the user wants. If you write at CL.ONE and 1 of your 3 replicas is struggling (but the other nodes are perfectly fine), having all your queries rejected (concretely breaking availability) until that one replica catches up (or dies) feels quite wrong. What I think you'd want instead is to consider the node dead for that query and hint that slow node (and if a coordinator gets overwhelmed by those hints, we already throw an OE), but proceed with the write.

I'll note in particular that when a node dies, it's not detected as such right away, and detection can actually take a few seconds, which might well mean a node that just died will temporarily be considered overloaded by some coordinator. So treating overloaded as roughly the same as dead makes some sense (in particular, you wouldn't want this mechanism to start failing writes just because a node died, when you have enough nodes for your CL).

bq. For trunk, we can enable it by default provided we can test it on a medium size cluster before committing

We *should* test it on a medium sized cluster before committing, but even then I'm really reluctant to make it the default on commit. I don't think making a new feature that is untested in production and can have huge impacts the default right away is smart, even though we have done it numerous times (unsuccessfully most of the time, I'd say).
I'd much rather leave it opt-in for a few releases so users can test it, and wait for more feedback before we make it the default (I know the counter-argument, that no one will test it unless it's the default, but I doubt that's true if people are genuinely put off by the existing behavior). In general, as was already brought up earlier in the discussion, I suspect fine-tuning the parameters won't be trivial, and I think we'll need a fair amount of testing in different conditions to guarantee the defaults for those parameters are sane.

bq. back_pressure_timeout_override: should we call it for what it is, the back_pressure_window_size and recommend that they increase the write timeout in the comments

I concur that I don't love the name either, nor the concept of having an override for another setting. I'd also prefer splitting the double meaning: let {{write_request_timeout_in_ms}} always be the timeout, and add a {{window_size}} in the strategy parameters. We can additionally do 2 things to get roughly the same defaults as in the current patch:
# make {{window_size}} equal to the write timeout by default.
# make the {{*_request_timeout_in_ms}} options "silent" defaults (i.e. commented out by default), and make the default for the write one depend on whether back-pressure is on or not.

To finish on the yaml options: if we move {{window_size}} to the strategy parameters, we could also get rid of {{back_pressure_enabled}} and instead have a {{NoBackPressure}} strategy (which we can totally special-case internally). I don't care terribly about it, but it's imo slightly more in line with the other strategies.

I'd also suggest abstracting the {{BackPressureState}} state as an interface and making the strategy responsible for creating a new one, since most of the details of the state are likely somewhat strategy-specific.
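To make the suggestion concrete, the option layout could look something like the following cassandra.yaml fragment. This is only a sketch of the idea: the class names, parameter names, and values shown are illustrative assumptions, not what the patch actually ships.

```yaml
# Hypothetical sketch: no separate back_pressure_enabled flag; the strategy
# itself carries its parameters, including window_size moved out of the
# timeout override. Class and parameter names are illustrative only.
back_pressure_strategy:
    - class_name: org.apache.cassandra.net.RateBasedBackPressure
      parameters:
          - high_ratio: 0.9
            factor: 10
            window_size: 2000   # would default to write_request_timeout_in_ms

# Disabling would then mean picking a no-op strategy instead of a flag:
# back_pressure_strategy:
#     - class_name: org.apache.cassandra.net.NoBackPressure
```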
As far as I can tell, all we'd need as an interface is:
{noformat}
interface BackPressureState
{
    void onMessageSent();
    void onResponseReceived();
    double currentBackPressureRate();
}
{noformat}
This won't provide more info for custom implementations just yet, but at least it'll give them more flexibility, and it's imo a bit clearer.

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Sergio Bossa
>         Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, limit.btm, no_backpressure.png
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other issues.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
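For illustration, a strategy-specific implementation of that interface might look like the sketch below: a state that tracks the ratio of responses received to messages sent per replica. The class name and the ratio heuristic are assumptions for the sake of the example, not the actual Cassandra implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

// The interface as proposed in the comment above.
interface BackPressureState {
    void onMessageSent();
    void onResponseReceived();
    double currentBackPressureRate();
}

// Hypothetical sketch: one possible state a rate-based strategy could create.
// It reports the fraction of sent messages that have been acknowledged;
// a low ratio suggests the replica is falling behind.
class RatioBackPressureState implements BackPressureState {
    private final AtomicLong sent = new AtomicLong();
    private final AtomicLong received = new AtomicLong();

    @Override
    public void onMessageSent() {
        sent.incrementAndGet();
    }

    @Override
    public void onResponseReceived() {
        received.incrementAndGet();
    }

    @Override
    public double currentBackPressureRate() {
        long s = sent.get();
        // Nothing outgoing yet: report a fully healthy rate.
        return s == 0 ? 1.0 : (double) received.get() / s;
    }
}
```

A strategy would then act as the factory for these objects, keeping all state details behind the interface as suggested.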