[ 
https://issues.apache.org/jira/browse/CASSANDRA-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963400#comment-13963400
 ] 

Jason Brown commented on CASSANDRA-6995:
----------------------------------------

[~xedin] Ahh, I hadn't thought about a new stage for the coordinator; that way there 
wouldn't be contention on the read or write stages between coordinator work and 
data-node work.
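
For illustration, a minimal sketch of that separation (hypothetical names and pool 
sizes, not the actual StageManager wiring): coordinator-side work gets its own 
executor, so it never competes with local data reads/writes for threads.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch only: real stages would be sized from concurrent_reads / concurrent_writes
// and registered with the existing stage machinery.
public final class CoordinatorStageSketch
{
    static final ExecutorService READ_STAGE        = Executors.newFixedThreadPool(32);
    static final ExecutorService WRITE_STAGE       = Executors.newFixedThreadPool(32);
    // hypothetical new stage: request parsing, replica selection, response merging
    static final ExecutorService COORDINATOR_STAGE = Executors.newFixedThreadPool(32);

    static void handleClientRead(Runnable coordinatorWork)
    {
        // coordinator work no longer lands on READ_STAGE, so a busy coordinator
        // cannot starve reads it serves on behalf of other nodes (and vice versa)
        COORDINATOR_STAGE.execute(coordinatorWork);
    }
}
{code}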

bq. remote read/write requests I think they should be treated in the same 
concurrency quota as thrift/cql requests and they take as such system resource 
so scheduling them to the same stages would provide appropriate back-pressure to 
the client instead of internally overloading the system ….

OK, I can see the argument here for additional back-pressure and for not 
punishing the internal systems - it does seem a bit different from the original 
intent of this ticket, though :). 

bq. [~vijay2win] In a separate note shouldn't we throttle on the number of disk 
read from the disk instead of concurrent_writers and reads? 

Wow, I like this so much better than the concurrent_reads yaml property - 
which ultimately just sets the size of a thread pool. Using throughput or disk 
IO requests per <time_period> or something similar seems a bit more in tune 
with what we are trying to do with the machine. But, alas, that might be for a 
different ticket.
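
For illustration, a rough sketch of what a "disk read ops per time period" throttle 
could look like, using Guava's RateLimiter (the yaml knob name and the wiring are 
made up; this is not what this ticket proposes):

{code:java}
import com.google.common.util.concurrent.RateLimiter;

public final class DiskReadThrottleSketch
{
    // hypothetical yaml setting, e.g. disk_read_ops_per_second: 5000
    private final RateLimiter diskReadLimiter;

    public DiskReadThrottleSketch(double diskReadOpsPerSecond)
    {
        this.diskReadLimiter = RateLimiter.create(diskReadOpsPerSecond);
    }

    public void read(Runnable readFromSSTable)
    {
        // blocks the caller until a permit is available, so back-pressure is
        // expressed in actual IO operations rather than in thread-pool size
        diskReadLimiter.acquire();
        readFromSSTable.run();
    }
}
{code}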

[~benedict]:
bq. if you know the request will not hit the disk, it should be irrelevant how 
many requests are on the read stage;

How do you *know* the request will not hit the disk? I know of only two things 
here: using something like mincore to know if the mmap’ed page is, in fact, in 
memory, or using something like DataStax’s in-memory option 
(http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/inMemory.html).
We don’t have the former, and the latter is outside the scope of the OSS project.
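
For completeness, a sketch of how a page-residency check could be wired up through 
a JNA binding for mincore(2) - nothing like this exists in the codebase today, and 
the JNA 5 dependency, page size, and raw-address plumbing here are all assumptions:

{code:java}
import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

public final class PageResidencySketch
{
    interface CLib extends Library
    {
        // JNA 5.x; older JNA versions use Native.loadLibrary instead
        CLib INSTANCE = Native.load("c", CLib.class);

        // int mincore(void *addr, size_t length, unsigned char *vec)
        int mincore(Pointer addr, long length, byte[] vec);
    }

    private static final long PAGE_SIZE = 4096; // assumption; query the OS in real code

    /** Returns true only if every page covering [address, address + length) is resident. */
    static boolean isResident(long address, long length)
    {
        long alignedAddr = address & ~(PAGE_SIZE - 1);
        long alignedLen  = (address + length) - alignedAddr;
        int pages = (int) ((alignedLen + PAGE_SIZE - 1) / PAGE_SIZE);
        byte[] vec = new byte[pages];

        if (CLib.INSTANCE.mincore(new Pointer(alignedAddr), alignedLen, vec) != 0)
            return false; // on error, assume the read might hit disk

        for (byte b : vec)
            if ((b & 1) == 0)
                return false;
        return true;
    }
}
{code}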

bq. if this is only available to (synchronous only?) thrift clients …

It is not thrift-only. It applies to any request that a client routes to a node 
holding the data and that uses CL.ONE/LOCAL_ONE.
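
A simplified sketch of the decision this ticket is about (hypothetical types; the 
real code lives in AbstractReadExecutor and SP.LocalReadRunnable): when the 
coordinator is itself a replica and the consistency level only needs one local 
response, run the read inline on the request thread instead of enqueuing it.

{code:java}
import java.util.concurrent.ExecutorService;

final class LocalReadSketch
{
    enum ConsistencyLevel { ONE, LOCAL_ONE, QUORUM, ALL }

    static void executeLocalRead(Runnable localReadRunnable,
                                 ConsistencyLevel cl,
                                 boolean coordinatorIsReplica,
                                 ExecutorService readStage)
    {
        boolean singleLocalResponse = coordinatorIsReplica
                && (cl == ConsistencyLevel.ONE || cl == ConsistencyLevel.LOCAL_ONE);

        if (singleLocalResponse)
            localReadRunnable.run();              // no queueing, no context switch
        else
            readStage.execute(localReadRunnable); // existing asynchronous path
    }
}
{code}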

bq. But I'd like to see evidence it is still beneficial once the change is 
added to honour the read stage limit

See Vijay’s comment - I think that is a very germane insight. Lacking that, though, 
yes, respecting the concurrent_reads size is required. That said, I think Pavel's 
suggestion is better than twisting the existing code to use a semaphore.
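
For comparison, one possible shape of the semaphore variant (a sketch only, not 
the attached patch): inline execution still counts against a quota sized from 
concurrent_reads, so the read-stage limit is honoured even when the queue is 
bypassed.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;

final class BoundedInlineReadSketch
{
    private final Semaphore readPermits;     // sized from concurrent_reads
    private final ExecutorService readStage; // existing read stage

    BoundedInlineReadSketch(int concurrentReads, ExecutorService readStage)
    {
        this.readPermits = new Semaphore(concurrentReads);
        this.readStage = readStage;
    }

    void execute(final Runnable localRead)
    {
        if (readPermits.tryAcquire())
        {
            // inline on the request thread, but still within the global read quota
            try { localRead.run(); }
            finally { readPermits.release(); }
        }
        else
        {
            // saturated: fall back to the asynchronous path, still under the same quota
            readStage.execute(new Runnable()
            {
                public void run()
                {
                    readPermits.acquireUninterruptibly();
                    try { localRead.run(); }
                    finally { readPermits.release(); }
                }
            });
        }
    }
}
{code}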

I think the ideas of Vijay and Pavel are reasonably close in nature, and I will 
spend some time thinking about them - and about how they will or will not affect 
this ticket.

> Execute local ONE/LOCAL_ONE reads on request thread instead of dispatching to 
> read stage
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6995
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6995
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: performance
>             Fix For: 2.0.7
>
>         Attachments: 6995-v1.diff, syncread-stress.txt
>
>
> When performing a read local to a coordinator node, AbstractReadExecutor will 
> create a new SP.LocalReadRunnable and drop it into the read stage for 
> asynchronous execution. If you are using a client that intelligently routes  
> read requests to a node holding the data for a given request, and are using 
> CL.ONE/LOCAL_ONE, enqueuing the SP.LocalReadRunnable and waiting for the 
> context switches (and possible NUMA misses) add unnecessary latency. We can 
> reduce that latency and improve throughput by avoiding the queueing and 
> thread context switching by simply executing the SP.LocalReadRunnable 
> synchronously in the request thread. Testing on a three node cluster (each 
> with 32 CPUs, 132 GB RAM) yields ~10% improvement in throughput and ~20% 
> speedup on avg/95/99 percentiles (99.9% was about 5-10% improvement).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
