[ 
https://issues.apache.org/jira/browse/CASSANDRA-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281958#comment-13281958
 ] 

Peter Schuller commented on CASSANDRA-4277:
-------------------------------------------

This is a result of the architecture of Cassandra, which fundamentally requires 
a thread for each active thrift request. Fixing that would mean making the entire 
thrift front-end asynchronous. But without that happening, it is not correct to 
consider the number of CPU cores when selecting a reasonable limit on the number 
of concurrent thrift requests.
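
As a rough illustration of why the limit depends on throughput and latency rather than on cores: Little's law gives the average number of requests in flight as throughput times average latency. The sketch below only applies that estimate. The function name, the headroom factor, and the example numbers are hypothetical and are not taken from Cassandra.

```python
import math


def estimate_rpc_threads(requests_per_sec, avg_latency_ms, headroom=2.0):
    """Estimate how many request threads are needed, using Little's law.

    Average requests in flight = throughput * average latency.
    `headroom` is a hypothetical safety factor for latency spikes
    (for example a node pausing for GC); it is an assumption, not a
    Cassandra setting.
    """
    if requests_per_sec < 0 or avg_latency_ms < 0:
        raise ValueError("throughput and latency must be non-negative")
    if headroom < 1.0:
        raise ValueError("headroom must be at least 1.0")
    in_flight = requests_per_sec * avg_latency_ms / 1000
    # Never go below one thread, and round up to a whole thread.
    return max(1, math.ceil(in_flight * headroom))


# Hypothetical load: 5000 requests/s at 10 ms average latency.
print(estimate_rpc_threads(5000, 10))
```

With these made-up numbers about 50 requests are in flight on average, so the estimate with 2x headroom is 100 threads, regardless of how many cores the node has.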
                
> hsha default thread limits make no sense, and yaml comments look confused
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4277
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4277
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Peter Schuller
>
> The cassandra.yaml states with respect to {{rpc_max_threads}}:
> {code}
> # For the Hsha server, the min and max both default to quadruple the number of
> # CPU cores.
> {code}
> The code does indeed seem to do this. But this makes, as far as I can tell, no 
> sense whatsoever, since the number of concurrent RPC threads you need is a 
> function of the throughput and the average latency of requests (which includes 
> synchronously waiting on network traffic).
> Defaulting to anything having to do with CPU cores seems inherently wrong. If 
> a default is non-static, a closer guess might be to look at thread stack size 
> and heap size and infer what "might" be reasonable.
> *NOTE*: The effect of having this too low is "strange" (if you don't know 
> what's going on) latencies observed from the client on all thrift requests 
> (*any* thrift request, including e.g. {{describe_ring()}}), which are not 
> visible in any latency metric exposed by Cassandra. This is why I consider 
> this "major", since unwitting users may be seeing detrimental performance for 
> no good reason.
> In addition, I read this about async:
> {code}
> # async -> Nonblocking server implementation with one thread to serve 
> #          rpc connections.  This is not recommended for high throughput use
> #          cases. Async has been tested to be about 50% slower than sync
> #          or hsha and is deprecated: it will be removed in the next major
> #          release.
> {code}
> This makes even less sense. Running with *one* rpc thread limits you to a 
> single concurrent request. How was that 50% number even attained? By 
> single-node testing being completely CPU bound locally on a node? The actual 
> effect should be "stupidly slow" in any real situation with lots of requests 
> on a cluster of many nodes and network traffic (though I didn't test that) - 
> especially in the event of any kind of hiccup like a node doing GC. I agree 
> that if the above is true, async should *definitely* be deprecated, but the 
> reasons seem *much* stronger than implied.
> I may be missing something here, in which case I apologize, but I 
> specifically double-checked after I fixed this setting on our clusters 
> after seeing exactly the expected side-effect of having it be too low. I 
> was always under the impression that rpc_max_threads affects the number of 
> RPC requests running concurrently, and code inspection (it being used for the 
> worker thread limit) plus the effects on client-observed latency are consistent 
> with my understanding.
> I suspect the setting was set strangely by someone because the phrasing of 
> the comments in {{cassandra.yaml}} strongly suggests that this should be tied 
> to CPU cores, hiding the fact that this really has to do with the number of 
> requests that can be serviced concurrently regardless of implementation 
> details of thrift/networking being sync/async/etc.
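
The client-side effect described in the issue can be sketched with a toy queueing model: requests wait for a free worker thread, so the latency the client sees is queue wait plus service time, while a server-side metric that only times the request handler would report service time alone. This is an illustrative model with hypothetical names and numbers, not Cassandra's or Thrift's code.

```python
import heapq


def client_latencies(arrivals, service_time, workers):
    """Return the latency each request sees from the client side.

    arrivals: request arrival times, in ascending order.
    service_time: time a worker spends handling one request.
    workers: size of the worker pool (the thread limit).
    Requests are served first come, first served by the earliest free worker.
    """
    free_at = [0] * workers  # min-heap of the time each worker becomes free
    heapq.heapify(free_at)
    latencies = []
    for t in arrivals:
        # The request starts when it has arrived and a worker is free.
        start = max(t, heapq.heappop(free_at))
        finish = start + service_time
        heapq.heappush(free_at, finish)
        latencies.append(finish - t)
    return latencies


# Eight requests arriving at once, each needing 10 ms of handling.
print(client_latencies([0] * 8, 10, workers=2))
print(client_latencies([0] * 8, 10, workers=8))
```

In this model, with two workers the later requests see 20, 30, and 40 ms even though every request is handled in 10 ms; with eight workers every request sees 10 ms.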

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
