[ https://issues.apache.org/jira/browse/CASSANDRA-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042734#comment-17042734 ]
Benedict Elliott Smith commented on CASSANDRA-15375:
----------------------------------------------------

bq. since it's never really been tested at scale that I know of

^ This was a euphemism for “this feature has never been used, and is probably bad”.

It was implemented some time ago by DataStax, never advertised in any way by OSS, and has never been updated (making it either the first perfect feature, or broken). It has perhaps been used by DataStax in their own offerings, but never by OSS. It is unlikely (m?)any even know it exists.

Given the 4.0 networking changes, this feature no longer provides any utility for stability. We now limit the amount of data inbound from any specific (and all) coordinators so that we cannot be overwhelmed, and vice-versa, and this happens instantly i.e. responsively*.

This feature, however, makes some basic implementation errors, and appears to have several problematic semantics, particularly with vnodes, responsiveness and choppiness: it imposes three arbitrary rates of LOW, HIGH, INFINITE for each unique combination of message recipients (probably really problematic with vnodes and high RF), updated once every WriteRpcTimeout, assuming the system clock doesn’t get updated by e.g. NTP.

The only behaviour missing from internode is the ability to notify clients of back pressure, either by propagating to the client connection or by throwing overloaded exceptions. However this is also implemented poorly here, “applying backpressure” by consuming a {{RequestPoolExecutor}} thread until permitted to proceed. Thanks to CASSANDRA-15013 this will only be suboptimal, but prior to 4.0 this would have led to really problematic cluster behaviours.

It’s worth noting that the above was all perhaps a reasonable set of trade-offs when first implemented, though the original ticket led to a great deal of debate about the reasonableness of the approach (CASSANDRA-9318).
However, it also suggests to me we are better off removing this unused, unmaintained feature that is no longer particularly needed, and if we have time implementing the version that makes sense in the current context.

(*That all said, 4.0 stability at scale is part of the 4.0 testing plan, and determining reasonable numbers for the limits is a remaining exercise - they are almost certainly too high today to guarantee stability.)

> back pressure log line is misleading
> ------------------------------------
>
>                 Key: CASSANDRA-15375
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15375
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Observability/Logging
>            Reporter: Jon Haddad
>            Assignee: Jon Haddad
>            Priority: Low
>
> This is odd:
> {{INFO [main] 2019-10-25 10:33:07,985 DatabaseDescriptor.java:803 -
> Back-pressure is disabled with strategy
> org.apache.cassandra.net.RateBasedBackPressure\{high_ratio=0.9, factor=5,
> flow=FAST}.}}
> When I saw that, I wasn't sure if back pressure was actually disabled, or if
> I was really using {{RateBasedBackPressure}}.
> This should change to output either:
> {{Back-pressure is disabled}}
> {{or}}
> {{Back-pressure is enabled with strategy
> org.apache.cassandra.net.RateBasedBackPressure\{high_ratio=0.9, factor=5,
> flow=FAST}.}}
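
The change requested in the issue description could look roughly like the sketch below. This is only an illustration of the two proposed messages, not the actual code in {{DatabaseDescriptor}}: the class name {{BackPressureLogMessage}} and the method {{describe}} are hypothetical, and the strategy is passed in as a plain string rather than the real strategy object.

```java
// Hypothetical helper illustrating the two log messages proposed in the issue.
// It is NOT the code in DatabaseDescriptor; names and signature are assumptions.
public final class BackPressureLogMessage {
    private BackPressureLogMessage() {}

    /**
     * Builds the startup log message.
     *
     * @param enabled  whether back pressure is switched on
     * @param strategy textual description of the configured strategy
     */
    public static String describe(boolean enabled, String strategy) {
        if (!enabled) {
            // Do not mention the strategy at all when the feature is off,
            // so the reader cannot mistake it for an active setting.
            return "Back-pressure is disabled";
        }
        return "Back-pressure is enabled with strategy " + strategy + ".";
    }

    public static void main(String[] args) {
        String strategy =
            "org.apache.cassandra.net.RateBasedBackPressure{high_ratio=0.9, factor=5, flow=FAST}";
        System.out.println(describe(false, strategy));
        System.out.println(describe(true, strategy));
    }
}
```

The point of the sketch is only that the strategy name appears in the message when, and only when, back pressure is enabled.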