[ 
https://issues.apache.org/jira/browse/CASSANDRA-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962613#comment-13962613
 ] 

Jason Brown commented on CASSANDRA-6995:
----------------------------------------

bq. [~xedin] Can we schedule requests to the appropriate stage directly from 
thrift selector threads?

Well, to a degree, you could say we already do that :); it just happens late 
in StorageProxy/ARE (but only when we're executing locally). 
However, if you mean (which I think you do, but correct me if I am mistaken) 
"can we detect, further upstream (before SP), which stage we'll eventually use, 
and use it locally on the coordinator", I think that is possible in both 
thrift-land (CassandraServer) and cql3 (QueryProcessor or in the CQLStatement 
implementations, I believe). However, I see a couple of issues there:

* You will always incur the thread context switch to one of the stages (even 
when you are not reading locally, which is probably most of the use cases in 
the wild).
* A given node's stages will be contended for by both coordinator use and data 
node use. This perhaps suggests a model similar to what [~benedict] mentioned 
earlier: a semaphore to limit the use of an individual stage (a rough sketch 
follows this list). 
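
To make the semaphore idea concrete, here is a rough sketch. The class and 
method names are invented and this is not the actual stage API; it just shows 
coordinator-originated work taking a permit before it may enter a stage, so it 
cannot starve replica-originated work. When the stage is saturated, acquire() 
blocks the request thread, which dovetails with the upstream stalling I 
mention below:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;

public final class BoundedStage
{
    private final ExecutorService stage;        // e.g. the READ stage executor
    private final Semaphore coordinatorPermits; // caps coordinator-side usage

    public BoundedStage(ExecutorService stage, int maxCoordinatorTasks)
    {
        this.stage = stage;
        this.coordinatorPermits = new Semaphore(maxCoordinatorTasks);
    }

    // Blocks the calling (request) thread when no permits are available,
    // applying backpressure upstream, close to the caller.
    public void submitFromCoordinator(final Runnable task) throws InterruptedException
    {
        coordinatorPermits.acquire();
        stage.execute(new Runnable()
        {
            @Override
            public void run()
            {
                try
                {
                    task.run();
                }
                finally
                {
                    coordinatorPermits.release();
                }
            }
        });
    }
}
{code}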

I do like the idea of stalling/blocking requests further upstream (closer to 
the caller), and perhaps breaking it down by the type of operation (reads vs. 
writes vs. schema changes vs. ...). However, I think that might be different 
from the original intent of this ticket, which is to eliminate the additional 
context switch when reading locally on the coordinator.
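
For reference, the short-circuit this ticket is after looks roughly like the 
following. This is a hypothetical helper, not the attached patch (the real 
change lives in AbstractReadExecutor/StorageProxy), but the control flow is 
the point:

{code:java}
import org.apache.cassandra.concurrent.Stage;
import org.apache.cassandra.concurrent.StageManager;
import org.apache.cassandra.db.ConsistencyLevel;

public final class LocalReadShortCircuit
{
    public static void execute(Runnable localReadRunnable, boolean replicaIsLocal, ConsistencyLevel cl)
    {
        if (replicaIsLocal && (cl == ConsistencyLevel.ONE || cl == ConsistencyLevel.LOCAL_ONE))
        {
            // Run on the request thread: no queue hop, no thread context switch.
            localReadRunnable.run();
        }
        else
        {
            // Existing path: enqueue on the READ stage for async execution.
            StageManager.getStage(Stage.READ).execute(localReadRunnable);
        }
    }
}
{code}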

> Execute local ONE/LOCAL_ONE reads on request thread instead of dispatching to 
> read stage
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6995
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6995
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: performance
>             Fix For: 2.0.7
>
>         Attachments: 6995-v1.diff, syncread-stress.txt
>
>
> When performing a read local to a coordinator node, AbstractReadExecutor will 
> create a new SP.LocalReadRunnable and drop it into the read stage for 
> asynchronous execution. If you are using a client that intelligently routes 
> read requests to a node holding the data for a given request, and are using 
> CL.ONE/LOCAL_ONE, enqueuing the SP.LocalReadRunnable and waiting for the 
> context switches (and possible NUMA misses) adds unnecessary latency. We can 
> reduce that latency and improve throughput by avoiding the queueing and 
> thread context switching, simply executing the SP.LocalReadRunnable 
> synchronously in the request thread. Testing on a three-node cluster (each 
> with 32 CPUs, 132 GB RAM) yields ~10% improvement in throughput and ~20% 
> speedup on avg/95/99 percentiles (99.9% was about a 5-10% improvement).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
