[ https://issues.apache.org/jira/browse/CASSANDRA-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828304#comment-13828304 ]
Jason Brown commented on CASSANDRA-1632:
----------------------------------------

On a happier note, though, by simply switching OTC to batch reads from its LBQ, I scored a 10% improvement in coordinator throughput (latencies remained unaffected). I'll clean up that patch (https://github.com/jasobrown/cassandra/tree/1632_batchDispatch), and actually put in error handling :). I'll also apply the same technique elsewhere in the code, although PeriodicCommitLogExecutorService seems like the only other interesting place for it.

> Thread workflow and cpu affinity
> --------------------------------
>
>                 Key: CASSANDRA-1632
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1632
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Chris Goffinet
>            Assignee: Jason Brown
>              Labels: performance
>         Attachments: threadAff_reads.txt, threadAff_writes.txt
>
> Here are some thoughts I wanted to write down; we need to run some serious
> benchmarks to see the benefits:
> 1) All thread pools for our stages use a shared queue per stage. For some
> stages we could move to a model where each thread has its own queue. This
> would reduce lock contention on the shared queue. This workload only suits
> stages that have no variance, else you run into thread starvation. One
> stage where this might work: ROW-MUTATION.
> 2) Set cpu affinity for each thread in each stage. If we can pin threads to
> specific cores, and control the workflow of a message from Thrift down through
> each stage, we should see improvements from reduced L1 cache misses. We would
> need to build a JNI extension (to set cpu affinity), as I could not find
> anywhere in the JDK where it is exposed.
> 3) Batching the delivery of requests across stage boundaries. Peter Schuller
> hasn't looked deeply enough into the JDK yet, but he thinks there may be
> significant improvements to be had there, especially in high-throughput
> situations: on each consumption you would consume everything in the queue,
> rather than imposing a synchronization point between each request.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
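The batch-dispatch idea described in the comment can be sketched roughly as follows: instead of `take()`-ing one message per loop iteration (one lock handshake per message), the consumer blocks for the first message and then `drainTo()`s whatever else is already queued, amortizing the queue synchronization across the whole batch. This is a minimal illustrative sketch, not the actual OTC patch; the class and method names (`BatchDrain`, `takeBatch`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchDrain {
    // Blocks until at least one element is available, then drains the rest
    // of the queue in a single pass, so the lock is taken once per batch
    // rather than once per message.
    static <T> List<T> takeBatch(LinkedBlockingQueue<T> queue) throws InterruptedException {
        List<T> batch = new ArrayList<>();
        batch.add(queue.take());   // block for the first message
        queue.drainTo(batch);      // grab everything else without further blocking
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("a");
        queue.add("b");
        queue.add("c");
        List<String> batch = takeBatch(queue);
        System.out.println(batch.size() + " messages drained in one pass");  // → 3
    }
}
```

Under load, the consumer still makes one blocking call per batch, but each `drainTo()` picks up every message that arrived while the previous batch was being processed, which is where the throughput win comes from.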