[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385280#comment-15385280 ]
Stefania commented on CASSANDRA-9318:
-------------------------------------

The current implementation is still not that generic:
* If we removed {{MessagingService.getBackPressurePerHost()}}, which doesn't seem to be used, then we could at least remove {{getBackPressureRateLimit}} and {{getHost}} from {{BackPressureState}}, leaving it with only {{onMessageSent}} and {{onMessageReceived}}.
* We could also consider moving the states inside the strategy, at the cost of an O(1) lookup, or at least encapsulate {{MessagingService.updateBackPressureState}} so that the states become opaque. Eventually we could introduce an update type (message received, memory threshold reached, etc.) so that strategies deal only with the types of updates they require.
* The {{apply}} method doesn't leave much room in terms of what to do: we can basically sleep after a set of replicas has been selected by SP, or throw an exception. We are also applying back-pressure _after sending the messages_; is this intentional?

Some other nits:
* In {{RateBasedBackPressure}}, some fields can have package-private access rather than public/protected, and the {{static}} in the flow enum declaration is redundant.
* We may want to add the flow to the {{logger.info}} message "Initialized back-pressure..." in the constructor of {{RateBasedBackPressure}}.
* In {{RateBasedBackPressure.apply(Iterable<RateBasedBackPressureState> states)}}, casting the lambda to {{Comparator<RateBasedBackPressureState>}} is unnecessary.
* In the very same comparator lambda, if two states are not equal but have the exact same rate limit, I assume it is intentional to compare them by hash code in order to get a stable sort? If so, we should add a comment, since the tree set only requires the items to be mutually comparable, and therefore returning zero would have been allowed as well, I believe.
* {{RateBasedBackPressureState}} can be package-local, {{windowSize}} can be private, and {{getLastAcquire()}} can be package-local.
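On the comparator point, a minimal sketch of the hash-code tie-break (the {{State}} class and method names below are hypothetical stand-ins for {{RateBasedBackPressureState}}, not the actual code). One thing worth noting in the comment: {{TreeSet}} treats {{compare(a, b) == 0}} as a duplicate and silently keeps only one of the two elements, so besides giving a stable order, the tie-break keeps distinct states with equal rate limits from being dropped:

{code}
import java.util.Comparator;
import java.util.TreeSet;

public class TieBreakDemo
{
    // Hypothetical stand-in for RateBasedBackPressureState: only the rate limit matters here.
    static final class State
    {
        final double rateLimit;
        State(double rateLimit) { this.rateLimit = rateLimit; }
    }

    // Inserts one State per limit into a TreeSet ordered by rate limit, breaking
    // ties by identity hash code, and returns how many states survive insertion.
    static int survivingStates(double... limits)
    {
        Comparator<State> byRate = (a, b) ->
        {
            int cmp = Double.compare(a.rateLimit, b.rateLimit);
            // Without this tie-break, compare() would return 0 for distinct states
            // with equal limits, and TreeSet would keep only one of them.
            return cmp != 0 ? cmp : Integer.compare(System.identityHashCode(a),
                                                    System.identityHashCode(b));
        };
        TreeSet<State> sorted = new TreeSet<>(byRate);
        for (double limit : limits)
            sorted.add(new State(limit));
        return sorted.size();
    }

    public static void main(String[] args)
    {
        // Two distinct states share the limit 10.0; both survive thanks to the tie-break.
        System.out.println(survivingStates(10.0, 10.0, 5.0)); // prints 3
    }
}
{code}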
* {{sortOrder}} in {{MockBackPressureState}} is not used.
* Several trailing spaces.

One [failure|https://cassci.datastax.com/view/Dev/view/sbtourist/job/sbtourist-CASSANDRA-9318-3.0-dtest/lastCompletedBuild/testReport/cqlsh_tests.cqlsh_copy_tests/CqlshCopyTest/test_bulk_round_trip_with_timeouts_2/] that needs to be looked at:
{code}
ERROR [SharedPool-Worker-1] 2016-07-19 12:40:38,257 QueryMessage.java:128 - Unexpected error during query
java.lang.IllegalArgumentException: Size should be greater than precision.
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:122) ~[guava-18.0.jar:na]
	at org.apache.cassandra.utils.SlidingTimeRate.<init>(SlidingTimeRate.java:51) ~[main/:na]
	at org.apache.cassandra.net.RateBasedBackPressureState.<init>(RateBasedBackPressureState.java:65) ~[main/:na]
	at org.apache.cassandra.net.RateBasedBackPressure.newState(RateBasedBackPressure.java:205) ~[main/:na]
	at org.apache.cassandra.net.RateBasedBackPressure.newState(RateBasedBackPressure.java:43) ~[main/:na]
	at org.apache.cassandra.net.MessagingService.getConnectionPool(MessagingService.java:649) ~[main/:na]
	at org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:663) ~[main/:na]
	at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:812) ~[main/:na]
	at org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:755) ~[main/:na]
	at org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:733) ~[main/:na]
	at org.apache.cassandra.service.StorageProxy.truncateBlocking(StorageProxy.java:2380) ~[main/:na]
	at org.apache.cassandra.cql3.statements.TruncateStatement.execute(TruncateStatement.java:71) ~[main/:na]
	at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:206) ~[main/:na]
{code}
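The trace shows the {{SlidingTimeRate}} constructor rejecting its arguments, which suggests the configured back-pressure window size ended up no larger than the tracker's sampling precision. A minimal sketch of that kind of guard (the class shape, field names, and millisecond units here are assumptions drawn from the error message, not the actual {{SlidingTimeRate}} signature):

{code}
public class WindowGuardDemo
{
    // Hypothetical sliding-window rate tracker whose window size must be strictly
    // greater than its sampling precision, mirroring the "Size should be greater
    // than precision." failure in the stack trace above.
    static final class SlidingWindow
    {
        final long sizeMillis;
        final long precisionMillis;

        SlidingWindow(long sizeMillis, long precisionMillis)
        {
            if (sizeMillis <= precisionMillis)
                throw new IllegalArgumentException("Size should be greater than precision.");
            this.sizeMillis = sizeMillis;
            this.precisionMillis = precisionMillis;
        }
    }

    public static void main(String[] args)
    {
        new SlidingWindow(10_000, 1_000); // fine: 10s window, 1s precision
        try
        {
            new SlidingWindow(1_000, 1_000); // size == precision: rejected
        }
        catch (IllegalArgumentException e)
        {
            System.out.println(e.getMessage()); // same message as in the trace
        }
    }
}
{code}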
bq. Metrics can be added in this ticket once we settle on the core algorithm, but then yes, any reporting mechanism to clients should probably be dealt with separately, as it would probably involve changes to the native protocol (and I'm not sure what the usual procedure is in such a case).

Let's test the strategy first; metrics can be added later on. Let's not worry about native protocol changes for now, since we are no longer throwing exceptions.

bq. Regarding the memory-threshold strategy, I would like to stress once again the fact that it's a coordinator-only back-pressure mechanism which wouldn't directly benefit replicas; also, Ariel Weisberg showed in his tests that such a strategy isn't even needed, given the local hints back-pressure/overflow mechanisms already implemented. So my vote goes against it, but if you/others really want it, I'm perfectly fine implementing it; it will have to be in this ticket, though, as it requires some more changes to the back-pressure API.

A generic strategy should be sufficient, without implementing the coordinator-based approach. I would also recommend rebasing to trunk before testing, since it will only be committed to trunk.

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Sergio Bossa
>         Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, limit.btm, no_backpressure.png
>
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other issues.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)