[ https://issues.apache.org/jira/browse/CASSANDRA-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962671#comment-13962671 ]

Benedict commented on CASSANDRA-6995:
-------------------------------------

bq. Not sure what this means or how you would know something is in memory 
(without something like mincore), but choosing to read on the request thread 
shouldn't depend on that knowledge. It's not that kind of tradeoff we're trying 
to win with this ticket.

The point of the limit on the number of concurrent readers in the read stage is 
to prevent spamming the disk. If you know a request will not hit the disk, how 
many requests are already on the read stage should be irrelevant: an in-memory 
read can proceed immediately on the request thread; if it is not in-memory, you 
should then be able to permit-steal (borrow one of the stage's concurrency 
slots and still run inline); and if neither scenario works, you simply place it 
on the queue.
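
For concreteness, something like the following (a sketch only; the Semaphore, 
the isInMemory flag, and the class shape are illustrative assumptions, not the 
actual patch):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;

final class LocalReadDispatch
{
    private final ExecutorService readStage;   // the existing read stage
    private final Semaphore readPermits;       // mirrors the stage's concurrency limit

    LocalReadDispatch(ExecutorService readStage, int concurrentReads)
    {
        this.readStage = readStage;
        this.readPermits = new Semaphore(concurrentReads);
    }

    void execute(Runnable localRead, boolean isInMemory)
    {
        if (isInMemory)
        {
            // The read cannot hit the disk, so the stage limit is
            // irrelevant: run it inline on the request thread.
            localRead.run();
        }
        else if (readPermits.tryAcquire())
        {
            // "Permit-steal": borrow one slot of the stage's disk budget
            // and still run inline rather than queueing.
            try { localRead.run(); }
            finally { readPermits.release(); }
        }
        else
        {
            // Neither scenario works: place it on the queue as before.
            readStage.execute(localRead);
        }
    }
}
{code}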

bq. I would argue, though, why not give the optimization to clients who know 
what they are doing (or happen to get lucky via round-robin)?

Well, if this is only available to (synchronous-only?) Thrift clients 
performing CL=1 queries while the server is unloaded (i.e. the read stage is 
not fully occupied), it seems a narrow win to justify writing a custom executor 
service, especially as any round-robin benefit will vanish once the cluster 
grows beyond about 8 nodes. That said, I'm all for any performance improvement, 
and I hope we can eventually bring it to the more general use case. But I'd 
like to see evidence it is still beneficial once the patch is amended to 
honour the read stage limit, which we really do need to do.

bq. I do like the idea of stalling/blocking requests further upstream (closer 
to the caller), and perhaps breaking it down by the type of operation (reads 
vs. writes vs. schema changes vs ....). 

There's no benefit to moving more work into the stage when we can safely do it 
outside. Despatching the remote requests at the same time as the local read is 
added to the read stage means they are sent as early as possible, which is the 
best outcome given the remote replicas will be the slowest to respond.
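
To illustrate the ordering, a rough sketch (sendReadCommand and the 
surrounding class are stand-ins for the real AbstractReadExecutor plumbing, 
not actual APIs):

{code:java}
import java.net.InetAddress;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

final class CoordinatorRead
{
    private final InetAddress localAddress;
    private final ExecutorService readStage;
    private final Consumer<InetAddress> sendReadCommand; // ships a message to a remote replica

    CoordinatorRead(InetAddress localAddress, ExecutorService readStage,
                    Consumer<InetAddress> sendReadCommand)
    {
        this.localAddress = localAddress;
        this.readStage = readStage;
        this.sendReadCommand = sendReadCommand;
    }

    void execute(List<InetAddress> replicas, Runnable localRead)
    {
        // Remote replicas will be the slowest responders, so their messages
        // go on the wire first, directly from the request thread.
        for (InetAddress replica : replicas)
            if (!replica.equals(localAddress))
                sendReadCommand.accept(replica);

        // Only then is the local read handed to the read stage; the remote
        // requests are already in flight, so nothing is gained by moving
        // this despatch into the stage as well.
        readStage.execute(localRead);
    }
}
{code}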

bq. I think we can schedule pure coordinator requests to the separate stage 
(when it doesn't do any operations except forwarding requests)

There are no performance-sensitive requests that hit the stages merely to be 
forwarded: the request thread forwards them at the same time as it adds them 
to the stages.


> Execute local ONE/LOCAL_ONE reads on request thread instead of dispatching to 
> read stage
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6995
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6995
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: performance
>             Fix For: 2.0.7
>
>         Attachments: 6995-v1.diff, syncread-stress.txt
>
>
> When performing a read local to a coordinator node, AbstractReadExecutor will 
> create a new SP.LocalReadRunnable and drop it into the read stage for 
> asynchronous execution. If you are using a client that intelligently routes 
> read requests to a node holding the data for a given request, and are using 
> CL.ONE/LOCAL_ONE, enqueuing the SP.LocalReadRunnable and waiting for the 
> context switches (and possible NUMA misses) adds unnecessary latency. We can 
> reduce that latency and improve throughput by avoiding the queueing and 
> thread context switching, simply executing the SP.LocalReadRunnable 
> synchronously on the request thread. Testing on a three-node cluster (each 
> with 32 CPUs, 132 GB RAM) yields ~10% improvement in throughput and ~20% 
> speedup on avg/95/99 percentiles (99.9% saw about a 5-10% improvement).
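
For reference, a minimal sketch of the fast path the description above 
proposes; the boolean flags and class shape are illustrative stand-ins for the 
real routing checks around StorageProxy.LocalReadRunnable, not the actual 
patch:

{code:java}
import java.util.concurrent.ExecutorService;

final class SyncLocalRead
{
    static void fetch(ExecutorService readStage, Runnable localRead,
                      boolean coordinatorIsReplica, boolean consistencyOne)
    {
        if (coordinatorIsReplica && consistencyOne)
        {
            // CL.ONE read served by the coordinator itself: skip the queue
            // and the context switch, run directly on the request thread.
            localRead.run();
        }
        else
        {
            // Everything else keeps the existing asynchronous path.
            readStage.execute(localRead);
        }
    }
}
{code}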



--
This message was sent by Atlassian JIRA
(v6.2#6252)
