[ https://issues.apache.org/jira/browse/CASSANDRA-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963400#comment-13963400 ]
Jason Brown commented on CASSANDRA-6995:
----------------------------------------

[~xedin] Ahh, I hadn't thought about a new stage for the coordinator; that way there would be no contention on the read or write stages between the coordinator and data node roles.

bq. remote read/write requests, I think, should be treated under the same concurrency quota as thrift/cql requests, since they consume the same system resources; scheduling them to the same stages would provide appropriate back-pressure to the client instead of internally overloading the system …

OK, I can see the argument here for additional back-pressure and for not punishing the internal systems - it does seem a bit different from the original intent of this ticket, though :).

[~vijay2win]:
bq. On a separate note, shouldn't we throttle on the number of reads from the disk instead of concurrent_writers and reads?

Wow, I like this so much better than the concurrent_reads yaml property, which ultimately just sets the size of a thread pool. Throttling on throughput, or on disk IO requests per <time_period>, or something similar seems much more in tune with what we are actually trying to control on the machine. But, alas, that might be for a different ticket.

[~benedict]:
bq. if you know the request will not hit the disk, it should be irrelevant how many requests are on the read stage;

How do you *know* the request will not hit the disk? I know of only two ways: using something like mincore to check whether the mmap'ed page is, in fact, resident in memory, or using something like Datastax's in-memory option (http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/inMemory.html). We don't have the former, and the latter is outside the scope of the OSS project.

bq. if this is only available to (synchronous only?) thrift clients …

It is not thrift-only. It applies to any request that a client routes to an appropriate node and that uses CL.ONE/LOCAL_ONE.

bq. But I'd like to see evidence it is still beneficial once the change is added to honour the read stage limit

See Vijay's comment - I think that is a very germane insight. Lacking that, though, yes, respecting the concurrent_reads limit is required; I just think Pavel's suggestion is better than twisting the existing code to use a semaphore. The ideas of Vijay and Pavel are reasonably close in nature, and I will spend some time thinking about them - and about how they will or will not affect this ticket.
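For concreteness, here is a minimal sketch of the change this ticket describes: run the local read inline on the request thread when the coordinator holds the data and the consistency level is ONE/LOCAL_ONE, otherwise fall through to the read stage as today. The class and names below are illustrative stand-ins, not the attached patch (the real plumbing lives in AbstractReadExecutor, SP.LocalReadRunnable, and StageManager):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch only; names are stand-ins for the real
// AbstractReadExecutor / StageManager plumbing.
public class InlineLocalRead
{
    // Stand-in for StageManager.getStage(Stage.READ).
    static final ExecutorService READ_STAGE = Executors.newFixedThreadPool(32);

    static void executeRead(Runnable localReadRunnable, boolean dataIsLocal, boolean isCLOne)
    {
        if (dataIsLocal && isCLOne)
        {
            // Run synchronously on the request thread: no queueing on the
            // read stage, no context switch, no chance of a NUMA miss.
            localReadRunnable.run();
        }
        else
        {
            // Existing behaviour: dispatch for asynchronous execution.
            READ_STAGE.execute(localReadRunnable);
        }
    }
}
{code}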
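If we do have to honour the read stage limit, the semaphore variant discussed above could look roughly like this - again a hedged sketch, where the permit count and the fall-back-to-stage policy are my assumptions for illustration, not anything from the patch:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;

// Hedged sketch: keep the inline fast path, but bound the total number
// of in-flight reads by (roughly) concurrent_reads. Both the permit
// count and the fallback policy here are assumptions.
public class BoundedInlineRead
{
    static final Semaphore READ_PERMITS = new Semaphore(32); // ~ concurrent_reads

    static void executeRead(Runnable localRead, ExecutorService readStage)
    {
        if (READ_PERMITS.tryAcquire())
        {
            try
            {
                localRead.run(); // inline, but counted against the limit
            }
            finally
            {
                READ_PERMITS.release();
            }
        }
        else
        {
            readStage.execute(localRead); // saturated: fall back to the async path
        }
    }
}
{code}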
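And to make Vijay's throttling idea concrete: instead of sizing a thread pool, gate the actual disk reads with a rate limiter. Purely illustrative - this uses Guava's RateLimiter (which Cassandra already ships) with a made-up permits-per-second figure, and is certainly material for a different ticket:

{code:java}
import com.google.common.util.concurrent.RateLimiter;

// Illustrative sketch of throttling disk IO requests per time period
// rather than capping reader threads. The 10000 reads/sec figure is a
// placeholder, not a recommendation.
public class DiskReadThrottle
{
    static final RateLimiter DISK_READS = RateLimiter.create(10000); // reads per second

    static void readFromDisk(Runnable diskRead)
    {
        DISK_READS.acquire(); // blocks until a disk-read permit is available
        diskRead.run();
    }
}
{code}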
> Execute local ONE/LOCAL_ONE reads on request thread instead of dispatching to read stage
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6995
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6995
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: performance
>             Fix For: 2.0.7
>
>         Attachments: 6995-v1.diff, syncread-stress.txt
>
>
> When performing a read local to the coordinator node, AbstractReadExecutor will create a new SP.LocalReadRunnable and drop it into the read stage for asynchronous execution. If you are using a client that intelligently routes read requests to a node holding the data for a given request, and are using CL.ONE/LOCAL_ONE, enqueuing the SP.LocalReadRunnable and waiting through the context switches (and possible NUMA misses) adds unnecessary latency. We can reduce that latency and improve throughput by avoiding the queueing and thread context switching, and simply executing the SP.LocalReadRunnable synchronously on the request thread.
> Testing on a three node cluster (each with 32 cpus and 132 GB ram) yields a ~10% improvement in throughput and a ~20% speedup on the avg/95/99 percentiles (the 99.9th percentile improved by about 5-10%).

--
This message was sent by Atlassian JIRA
(v6.2#6252)