--------------------------------------------------------------------------
This e-mail contains PeerApp Proprietary and Confidential information.
 
 
--------------------------------------------------------------------------
-----Original Message-----
> From: Matt Domsch [mailto:[email protected]]
> Sent: Wednesday, 12 August 2009 16:47
> 
> On Mon, Aug 10, 2009 at 08:37:03AM +0300, Sasha wrote:
> >    We're using PERC 6 (MegaRAID SAS 1078) controller on R610 machines. When
> >    under stress, PERC interrupts (megasas in /proc/interrupts) consume up to
> >    100% of CPU time on one of the cores. Usually this is CPU0, but we can
> >    move it by writing into the /proc/irq/<megasas IRQ number>/smp_affinity file. I
> >    was wondering if it is possible to reduce its CPU consumption, or perhaps
> >    spread the load among several cores.
> 
> Using the smp_affinity, which is a bitmask, you can spread the
> interrupts across several cores, simply by setting the appropriate CPU
> ID bits to 1.
> 
> echo "0x55" > smp_affinity
> 
> would send interrupts to CPUs 0, 2, 4, and 6 (0x55 is binary 01010101).
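(A quick sketch of building such a mask from a CPU list, in case it helps anyone trying this; the IRQ number below is just a placeholder:)

```shell
# Build an smp_affinity bitmask from a list of CPU IDs.
# Bits 0, 2, 4 and 6 set gives 0x55 (binary 01010101).
mask=0
for cpu in 0 2 4 6; do
  mask=$(( mask | (1 << cpu) ))
done
printf '%x\n' "$mask"    # prints 55

# To apply it (requires root; 16 is a placeholder IRQ number):
#   echo "$(printf '%x' "$mask")" > /proc/irq/16/smp_affinity
```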

I am afraid this does not work for us on the R610.
Writing a bitmask into the smp_affinity file assumes that the IO-APIC has been
configured in so-called logical low-priority interrupt delivery mode.
The problem is that on the R610 the kernel does not put the IO-APIC into this
mode, because:
1. We have two quad-core CPUs with hyper-threading, i.e. 16 logical cores. In 
logical mode the IO-APIC can address at most 8 of them (the destination field is 
a one-byte bitmask, one bit per core).
2. The local APIC IDs of the CPUs are 8 or higher (even with hyper-threading off). 
Since in logical mode the IO-APIC can only address cores 0-7, the kernel falls 
back into so-called physical fixed interrupt delivery mode - i.e. every interrupt 
goes to a single core. In this mode it can address up to 256 cores (it's the same 
one-byte register, but now treated as a number, not a bitmask).
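To see which case a given box falls into, you can check the highest local APIC ID the kernel reports. A rough sketch (the awk field position assumes the usual /proc/cpuinfo layout; the sample IDs below are made up):

```shell
# Print the highest local APIC ID found in a cpuinfo-style file; any ID
# of 8 or more means the flat logical mode (max 8 targets) cannot cover
# all CPUs, and the kernel uses physical fixed delivery instead.
max_apicid() {
  awk 'BEGIN { m = -1 }
       /^apicid/ { id = $3 + 0; if (id > m) m = id }
       END { print m }' "$1"
}

# On a live box: max_apicid /proc/cpuinfo
# Sample fragment with made-up IDs for illustration:
cat > /tmp/cpuinfo.sample <<'EOF'
apicid          : 0
apicid          : 1
apicid          : 16
apicid          : 17
EOF
max_apicid /tmp/cpuinfo.sample    # prints 17
```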

I was thinking about something else. Network devices support the MSI-X PCI 
extension, which allows a single device to have multiple interrupt vectors. With 
multiple interrupt vectors per device we can spread interrupts across several 
cores even in physical mode. For instance, the onboard Broadcom NICs have 8 
vectors per port, so with the smp_affinity trick above you can configure 8 
cores to serve their interrupts.
I am wondering if there's something similar for megaraid_sas?
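For what it's worth, a quick way to see how many vectors a device got is to count its lines in /proc/interrupts, since with MSI-X each vector shows up as its own line. A sketch (the device name and sample lines below are made up):

```shell
# Count interrupt lines whose name matches a device; with MSI-X, each
# vector appears as a separate line (e.g. eth0-0, eth0-1, ...).
count_vectors() {
  grep -c -- "$2" "$1"
}

# On a live box: count_vectors /proc/interrupts eth0
# Sample fragment for illustration:
cat > /tmp/interrupts.sample <<'EOF'
 50:   1234   PCI-MSI-edge   eth0-0
 51:   2345   PCI-MSI-edge   eth0-1
 52:     99   PCI-MSI-edge   megasas
EOF
count_vectors /tmp/interrupts.sample eth0    # prints 2
```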

> Note, for network devices, spreading the interrupts out may have an
> adverse impact, as then the kernel may see the arriving packets "out
> of order", and invoke the relatively expensive TCP packet reordering
> algorithms.  For this reason, irqbalanced tries to keep network device
> interrupts pinned to a single CPU, and/or switch the device to polling
> mode if there's clearly enough work to do that taking an interrupt per
> packet would be overkill.

When you let the IO-APIC do round-robin delivery, it may indeed cause reordering. 
With the MSI-X technique I described it is different: each interrupt vector has a 
queue behind it, and there are separate TX and RX queues. The NIC analyzes each 
packet and decides which queue to pass it to. The same happens in the kernel when 
it transmits a packet (see simple_tx_hash() and dev_pick_tx() in 
net/core/dev.c).
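The property that avoids reordering is that one flow always hashes to one queue. A toy stand-in for the kernel's hash illustrates the idea (the multiplier 31 is arbitrary, not what simple_tx_hash() actually uses):

```shell
# Toy queue selection: hash the flow's ports and take the result modulo
# the number of queues, so packets of one flow always land on one queue.
pick_queue() {  # args: src_port dst_port n_queues
  echo $(( ($1 * 31 + $2) % $3 ))
}

pick_queue 40000 80 8    # a given flow maps to a fixed queue
pick_queue 40000 80 8    # the same flow again: the same queue
pick_queue 40001 80 8    # a different flow may land on another queue
```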

> By the same token, lwn.net this week covers an enhancement to the
> kernel's block I/O layer, to let it switch from interrupt mode to
> polling mode, which may be beneficial to workloads such as yours.
> This is slated for 2.6.32ish.

Thanks for the info. I'll look into it.

Alexander (Sasha) Sandler.
Software Engineer.
PeerApp LTD.
[email protected]

_______________________________________________
Linux-PowerEdge mailing list
[email protected]
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq