[ https://issues.apache.org/jira/browse/CASSANDRA-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404687#comment-13404687 ]
Jeremy Hanna commented on CASSANDRA-4277: ----------------------------------------- Sounds from Sylvain that a SP asynch rewrite would be good for implementing CASSANDRA-2478 as well. > hsha default thread limits make no sense, and yaml comments look confused > ------------------------------------------------------------------------- > > Key: CASSANDRA-4277 > URL: https://issues.apache.org/jira/browse/CASSANDRA-4277 > Project: Cassandra > Issue Type: Bug > Components: Core > Reporter: Peter Schuller > Assignee: Peter Schuller > Fix For: 1.2 > > Attachments: CASSANDRA-4277-trunk-v2.txt, CASSANDRA-4277-trunk.txt > > > The cassandra.yaml states with respect to {{rpc_max_threads}}: > {code} > # For the Hsha server, the min and max both default to quadruple the number of > # CPU cores. > {code} > The code seems to indeed do this. But this makes, as far as I can tell, no > sense what-so-ever since the number of concurrent RPC threads you need is a > function of the throughput and the average latency of requests (that includes > synchronously waiting on network traffic). > Defaulting to anything having to do with CPU cores seems inherently wrong. If > a default is non-static, a closer guess might be to look at thread stack size > and heap size and infer what "might" be reasonable. > *NOTE*: The effect of having this too low, is "strange" (if you don't know > what's going on) latencies observed form the client on all thrift requests > (*any* thrift request, including e.g. {{describe_ring()}}), that isn't > visible in any latency metric exposed by Cassandra. This is why I consider > this "major", since unwitting users may be seeing detrimental performance for > no good reason. > In addition, I read this about async: > {code} > # async -> Nonblocking server implementation with one thread to serve > # rpc connections. This is not recommended for high throughput use > # cases. Async has been tested to be about 50% slower than sync > # or hsha and is deprecated: it will be removed in the next major > release. > {code} > This makes even less sense. Running with *one* rpc thread limits you to a > single concurrent request. How was that 50% number even attained? By > single-node testing being completely CPU bound locally on a node? The actual > effect should be "stupidly slow" in any real situation with lots of requests > on a cluster of many nodes and network traffic (though I didn't test that) - > especially in the event of any kind of hiccup like a node doing GC. I agree > that if the above is true, async should *definitely* be deprecated, but the > reasons seem *much* stronger than implied. > I may be missing something here, in which case I apologize,, but I > specifically double-checked after I fixed this setting on on our our clusters > after seeing exactly the expected side-effect of having it be too low. I > always was under the impression that rpc_max_threads affects the number of > RPC requests running concurrently, and code inspection (it being used for the > worker thread limit) + the effects of client-observed latency is consistent > with my understanding. > I suspect the setting was set strangely by someone because the phrasing of > the comments in {{cassandra.yaml}} strongly suggest that this should be tied > to CPU cores, hiding the fact that this really has to do with the number of > requests that can be serviced concurrently regardless of implementation > details of thrift/networking being sync/async/etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira