[ https://issues.apache.org/jira/browse/CASSANDRA-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831609#comment-13831609 ]

Chris Burroughs commented on CASSANDRA-1632:
--------------------------------------------

Jason, thank you for the long writeup.

On affinity and IRQs:  A commonly recommended setup for an event-driven load 
balancer (e.g. HAProxy) is to pin network interrupts to one core and run the lb 
on another core that shares an L2 cache with it.  Naturally, this works best on 
CPUs where cores actually share an L2 cache.  The closest analogue for Cassandra 
would be to pin interrupts to one core, run an evented selector thread on 
another, and everything else on the rest.

It would be interesting to know if there are best practices around interrupts 
that could get that 5% performance bump.  Pinning and disabling/enabling 
irqbalance are all relatively easy to change, at least in a bash-script kind of 
way. On the other hand, I've never heard a "%sintr is too high!" complaint on 
the mailing list.

> Thread workflow and cpu affinity
> --------------------------------
>
>                 Key: CASSANDRA-1632
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1632
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Chris Goffinet
>            Assignee: Jason Brown
>              Labels: performance
>             Fix For: 2.1
>
>         Attachments: 1632-v2.txt, 1632-v3.diff, 1632-v4.diff, 
> 1632_batchRead-v1.diff, patch_v5.diff, patch_v5a.diff, threadAff_reads.txt, 
> threadAff_writes.txt
>
>
> Here are some thoughts I wanted to write down; we need to run some serious 
> benchmarks to see the benefits:
> 1) All thread pools for our stages use a shared queue per stage. For some 
> stages we could move to a model where each thread has its own queue. This 
> would reduce lock contention on the shared queue. This model only suits 
> stages whose workload has little variance, else you run into thread 
> starvation. One stage where this might work: ROW-MUTATION.
> 2) Set cpu affinity for each thread in each stage. If we can pin threads to 
> specific cores, and control the workflow of a message from Thrift down to 
> each stage, we should see improvements from reducing L1 cache misses. We 
> would need to build a JNI extension to set cpu affinity, as I could not 
> find it exposed anywhere in the JDK.
> 3) Batch the delivery of requests across stage boundaries. Peter Schuller 
> hasn't looked deeply enough into the JDK yet, but he thinks there may be 
> significant improvements to be had there, especially in high-throughput 
> situations, if on each consumption you were to drain everything in the 
> queue rather than imposing a synchronization point between each request.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
