[ https://issues.apache.org/jira/browse/CASSANDRA-16663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401274#comment-17401274 ]

Caleb Rackliffe commented on CASSANDRA-16663:
---------------------------------------------

Here are the latest {{tlp-stress}} results from a 3-node ccm cluster with RF=1, 
in a format similar to the single-node tests above. The general pattern here is 
that we look at three scenarios:

1.) A request rate that falls just short of the aggregate 3000 requests/second 
that should overload the cluster (3 coordinators each with a limit of 1000).

2.) A request rate that just breaches the aggregate 3000 requests/second.

3.) A request rate that wildly overruns (by a factor of two) the aggregate 3000 
requests/second.

{noformat}
bin/tlp-stress run \
  --compaction "{'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'tombstone_compaction_interval': '864000', 'tombstone_threshold': '0.2'}" \
  --compression "{'sstable_compression': 'LZ4Compressor'}" \
  --replication "{'class': 'SimpleStrategy', 'replication_factor' : 1}" \
  --cl LOCAL_ONE \
  --readrate 1.0 \
  --partitions 10000 --populate 10000 \
  --rate [2900|3300|6000] \
  --duration 5m \
  --client-retry disable \
  --protocol [4|5] \
  KeyValue
{noformat}

*Native Protocol V4, Backpressure on Overload*

|Aggregate Rate Limit (requests/second)|Client-Requested Rate|Client p99 (millis)|Client-Observed Rate|
|3000|2900|0.38|2884.28|
|3000|3300|103.33|2970.01|
|3000|6000|102.93|2957.44|

*Native Protocol V4, Throw on Overload*

|Aggregate Rate Limit (requests/second)|Client-Requested Rate|Client p99 (millis)|Client-Observed Rate|Errors/Second|
|3000|2900|0.39|2884.31|0|
|3000|3300|0.3|3278.02|299.85|
|3000|6000|0.62|5882.59|2959|

*Native Protocol V5, Backpressure on Overload*

|Aggregate Rate Limit (requests/second)|Client-Requested Rate|Client p99 (millis)|Client-Observed Rate|
|3000|2900|0.3|2883.95|
|3000|3300|102.25|2948.42|
|3000|6000|102.31|2961.74|

*Native Protocol V5, Throw on Overload*

|Aggregate Rate Limit (requests/second)|Client-Requested Rate|Client p99 (millis)|Client-Observed Rate|Errors/Second|
|3000|2900|0.28|2884.29|0|
|3000|3300|0.37|3278.02|299.72|
|3000|6000|11.83|5883.29|2958.85|

(Note: {{tlp-stress}} counts errors as completed requests, so the 
"Client-Observed Rate" above includes them.)

tl;dr V4 and V5 are consistent with each other and consistent with expectations 
for the two overload handling modes.

More specifically, in the throwing case, we nominally meet the client's requested 
throughput, with minimal latency degradation even _just above_ the limit. The 
errors per second are almost exactly the client-observed rate minus the limit, 
since rejected requests do not consume permits.
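
Concretely, using the V4 throw-on-overload rows above (the V5 numbers work out the 
same way):

{noformat}
requested 3300: 3278.02 observed - 299.85 errors/second = 2978.17 successful/second (aggregate limit 3000)
requested 6000: 5882.59 observed - 2959.00 errors/second = 2923.59 successful/second (aggregate limit 3000)
{noformat}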

In the backpressure case, as expected, the client-observed rate adheres very 
closely to the limit, and the higher latencies represent time spent queuing 
rather than errors.

> Request-Based Native Transport Rate-Limiting
> --------------------------------------------
>
>                 Key: CASSANDRA-16663
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16663
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Messaging/Client
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 4.x
>
>          Time Spent: 14h 40m
>  Remaining Estimate: 0h
>
> Together, CASSANDRA-14855, CASSANDRA-15013, and CASSANDRA-15519 added support 
> for a runtime-configurable, per-coordinator limit on the number of bytes 
> allocated for concurrent requests over the native protocol. It supports 
> channel back-pressure by default, and optionally supports throwing 
> OverloadedException if that is requested in the relevant connection’s STARTUP 
> message.
> This can be an effective tool to prevent the coordinator from running out of 
> memory, but it may not correspond to how expensive queries are or provide a 
> direct conceptual mapping to how users think about request capacity. I 
> propose adding the option of request-based (or perhaps more correctly 
> message-based) back-pressure, coexisting with (and reusing the logic that 
> supports) the current bytes-based back-pressure.
> _We can roll this forward in phases_, where the server’s cost accounting 
> becomes more accurate, we segment limits by operation type/keyspace/etc., and 
> the client/driver reacts more intelligently to (especially non-back-pressure) 
> overload, _but something minimally viable could look like this_:
> 1.) Reuse most of the existing logic in Limits, et al. to support a simple 
> per-coordinator limit only on native transport requests per second. Under 
> this limit will be CQL reads and writes, but also auth requests, prepare 
> requests, and batches. This is obviously simplistic, and it does not account 
> for the variation in cost between individual queries, but even a fixed cost 
> model should be useful in aggregate.
>  * If the client specifies THROW_ON_OVERLOAD in its STARTUP message at 
> connection time, a breach of the per-node limit will result in an 
> OverloadedException being propagated to the client, and the server will 
> discard the request.
>  * If THROW_ON_OVERLOAD is not specified, the server will stop consuming 
> messages from the channel/socket, which should back-pressure the client, 
> while the message continues to be processed.
> 2.) This limit is infinite by default (or simply disabled), and can be 
> enabled via the YAML config or JMX at runtime. (It might be cleaner to have a 
> no-op rate limiter that's used when the feature is disabled entirely.)
> 3.) The current value of the limit is available via JMX, and metrics around 
> coordinator operations/second are already available to compare against it.
> 4.) The new limit will intersect with the existing byte-based limits. (i.e. A 
> breach of either limit, byte-based or request-based, will actuate 
> back-pressure or OverloadedExceptions.)
> In this first pass, explicitly out of scope would be any work on the 
> client/driver side.
> In terms of validation/testing, our biggest concern with anything that adds 
> overhead on a very hot path is performance. In particular, we want to fully 
> understand how the client and server perform along two axes constituting 4 
> scenarios. Those are a.) whether or not we are breaching the request limit 
> and b.) whether the server is throwing on overload at the behest of the 
> client. Having said that, query execution should dwarf the cost of limit 
> accounting.
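
To make the proposed overload handling concrete, here is a minimal sketch of a 
per-coordinator request limiter of the kind described above. It is illustrative 
only, not the actual patch: the class and method names are hypothetical, it leans 
on Guava's {{RateLimiter}} for the permit accounting, and where a real 
implementation would pause reads on the client's channel to apply back-pressure, 
the sketch simply blocks the calling thread.

{code:java}
import com.google.common.util.concurrent.RateLimiter;

/**
 * Hypothetical sketch of a per-coordinator native transport request limiter.
 * One instance per node; every native protocol request (reads, writes, auth,
 * prepare, batches) asks it for a permit before being processed.
 */
public final class NativeRequestRateLimiter
{
    private final long requestsPerSecond;   // e.g. 1000 per node in the tests above
    private final RateLimiter limiter;      // null models the disabled/no-op case

    public NativeRequestRateLimiter(long requestsPerSecond)
    {
        this.requestsPerSecond = requestsPerSecond;
        this.limiter = requestsPerSecond > 0 ? RateLimiter.create(requestsPerSecond) : null;
    }

    /**
     * @param throwOnOverload true if the client sent THROW_ON_OVERLOAD in its STARTUP message
     * @throws OverloadedException if the limit is breached and the client requested THROW_ON_OVERLOAD
     */
    public void acquirePermit(boolean throwOnOverload)
    {
        if (limiter == null)
            return; // limiting disabled (the default)

        if (throwOnOverload)
        {
            // Rejected requests consume no permit, which is why errors/second in the
            // tables above is roughly the client-observed rate minus the limit.
            if (!limiter.tryAcquire())
                throw new OverloadedException("Exceeded " + requestsPerSecond + " requests/second");
            return;
        }

        // Back-pressure mode: wait for a permit. A real implementation would stop
        // consuming from the channel instead of blocking, but the effect is the same:
        // the observed rate converges on the limit and the overload shows up as latency.
        limiter.acquire();
    }

    /** Stand-in for org.apache.cassandra.exceptions.OverloadedException. */
    public static final class OverloadedException extends RuntimeException
    {
        OverloadedException(String message)
        {
            super(message);
        }
    }
}
{code}

In a real deployment the limit would come from a YAML option (or be adjusted at 
runtime via JMX) rather than a constructor argument, and a breach of either this 
limit or the existing bytes-in-flight limits would trigger the same 
back-pressure/throw behavior, per the description above.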


