[jira] [Comment Edited] (CASSANDRA-4277) hsha default thread limits make no sense, and yaml comments look confused

Vijay (JIRA) Wed, 23 May 2012 21:25:47 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282171#comment-13282171 ]


Vijay edited comment on CASSANDRA-4277 at 5/24/12 4:24 AM:
-----------------------------------------------------------

Peter, I don't understand how latency comes into the picture. Selectors are 
woken when the data is available, right? If for some reason your connection is 
taking 15 ms or whatever, even in the middle of a read, you are better off 
disconnecting and reconnecting... I still don't understand how 500 threads will 
help; it will all hang, right?

Basically these threads (4 * CPU cores) are used for selector read/write (only 
during that time); the TP (thread pool) executes the request, and the selector 
is woken up again when the data has to be written. Are we having the same 
conversation as in CASSANDRA-3590 (but this is within the DCs, where latencies 
are really low)?
                
      was (Author: vijay2...@yahoo.com):
    Peter, I don't understand how latency comes into the picture. Selectors are 
woken when the data is available, right? If for some reason your connection is 
taking 15 ms or whatever, even in the middle of a read, you are better off 
disconnecting and reconnecting... I still don't understand how 500 threads will 
help; it will all hang, right? How does it help?

Basically these threads (4 * CPU cores) are used for selector read/write (only 
during that time); the TP (thread pool) executes the request, and the selector 
is woken up again when the data has to be written. Are we having the same 
conversation as in CASSANDRA-3590 (but this is within the DCs, where latencies 
are really low)?
                  
> hsha default thread limits make no sense, and yaml comments look confused
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4277
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4277
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Peter Schuller
>
> The cassandra.yaml states with respect to {{rpc_max_threads}}:
> {code}
> # For the Hsha server, the min and max both default to quadruple the number of
> # CPU cores.
> {code}
> The code does indeed seem to do this. But this makes, as far as I can tell, 
> no sense whatsoever, since the number of concurrent RPC threads you need is a 
> function of the throughput and the average latency of requests (that includes 
> synchronously waiting on network traffic).
> Defaulting to anything having to do with CPU cores seems inherently wrong. If 
> a default is non-static, a closer guess might be to look at thread stack size 
> and heap size and infer what "might" be reasonable.
> *NOTE*: The effect of having this too low is "strange" (if you don't know 
> what's going on) latencies observed from the client on all thrift requests 
> (*any* thrift request, including e.g. {{describe_ring()}}), which aren't 
> visible in any latency metric exposed by Cassandra. This is why I consider 
> this "major", since unwitting users may be seeing detrimental performance for 
> no good reason.
> In addition, I read this about async:
> {code}
> # async -> Nonblocking server implementation with one thread to serve 
> #          rpc connections.  This is not recommended for high throughput use
> #          cases. Async has been tested to be about 50% slower than sync
> #          or hsha and is deprecated: it will be removed in the next major
> #          release.
> {code}
> This makes even less sense. Running with *one* rpc thread limits you to a 
> single concurrent request. How was that 50% number even attained? By 
> single-node testing being completely CPU bound locally on a node? The actual 
> effect should be "stupidly slow" in any real situation with lots of requests 
> on a cluster of many nodes and network traffic (though I didn't test that) - 
> especially in the event of any kind of hiccup like a node doing GC. I agree 
> that if the above is true, async should *definitely* be deprecated, but the 
> reasons seem *much* stronger than implied.
> I may be missing something here, in which case I apologize, but I 
> specifically double-checked after I fixed this setting on our clusters 
> after seeing exactly the expected side-effect of having it be too low. I 
> always was under the impression that rpc_max_threads affects the number of 
> RPC requests running concurrently, and code inspection (it being used for the 
> worker thread limit) + the observed effects on client latency are consistent 
> with my understanding.
> I suspect the setting was set strangely by someone because the phrasing of 
> the comments in {{cassandra.yaml}} strongly suggests that this should be tied 
> to CPU cores, hiding the fact that this really has to do with the number of 
> requests that can be serviced concurrently regardless of implementation 
> details of thrift/networking being sync/async/etc.
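
The reporter's sizing argument (needed threads are a function of throughput and average latency) can be sketched with Little's law. This is only a rough illustration: it assumes, as the issue description does, that every in-flight request holds one worker thread for its whole latency, which is exactly the point disputed in the comment above. The workload figures (10,000 requests/s, 15 ms, 8 cores) and the function names are hypothetical and do not come from Cassandra.

```python
import math


def required_threads(throughput_rps: float, avg_latency_ms: float) -> int:
    """Little's law: requests in flight = arrival rate * time in system."""
    return math.ceil(throughput_rps * avg_latency_ms / 1000)


def max_throughput(threads: int, avg_latency_ms: float) -> float:
    """Upper bound on requests/s if each request occupies one thread."""
    return threads * 1000 / avg_latency_ms


# Hypothetical workload: 10,000 requests/s at 15 ms average latency.
needed = required_threads(10_000, 15)

# Hypothetical 8-core node with the hsha default of 4 * CPU cores.
default_pool = 4 * 8
ceiling = max_throughput(default_pool, 15)

# A single worker thread, as the yaml comment describes for async.
single = max_throughput(1, 15)

print(f"threads needed: {needed}")
print(f"ceiling with {default_pool} threads: {ceiling:.0f} req/s")
print(f"ceiling with 1 thread: {single:.0f} req/s")
```

Under these assumptions the thread count that matters is driven by latency and load rather than by core count. If worker threads are in fact occupied only while a request executes, the same formula applies with the execution time in place of the end-to-end latency.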
