Vu Pham wrote:

Alternatively, is there anything in the SCST layer I should tweak? I'm
still running rev 245 of that code (kind of old, but it works with OFED 1.3.1
w/o hacks).


What is the mode (pass thru, blockio...)?
blockio
What is the scst_threads=<xx> parameter?
Default, which I believe is #cpus




My target server (with DAS) contains eight 2.8 GHz CPU cores and can sustain over 200K IOPS locally, but only around 73K IOPS over SRP.

Is this number from one initiator or multiple?
One initiator. At first I thought it might be a limitation of SRP, so I added a second initiator, but the aggregate performance of the two was about equal to that of a single initiator.

Looking at /proc/interrupts, I see that the mlx_core (comp) device is pushing about 135K Int/s on 1 of 2 CPUs. All CPUs are enabled for that PCI-E slot, but it only ever uses 2 of the CPUs, and only 1 at a time. None of the other CPUs has an interrupt rate more than about 40-50K/s.
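For anyone who wants to reproduce the per-CPU interrupt rates described above, here is a minimal sketch that diffs two snapshots of /proc/interrupts. The function names and the "mlx" substring filter are illustrative assumptions, not part of any tool mentioned in this thread; adjust the filter to whatever label your driver shows.

```python
import time


def parse_interrupts(text, name_filter):
    """Sum per-CPU counters over /proc/interrupts rows containing name_filter."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * ncpus
    for line in lines[1:]:
        if name_filter not in line:
            continue
        fields = line.split()
        # fields[0] is the "IRQ:" label, followed by one counter per CPU
        counts = fields[1:1 + ncpus]
        if len(counts) < ncpus or not all(c.isdigit() for c in counts):
            continue  # skip short rows such as "ERR:"
        for cpu, count in enumerate(counts):
            totals[cpu] += int(count)
    return totals


def rates(before, after, seconds):
    """Convert two counter snapshots into interrupts per second per CPU."""
    return [(a - b) / seconds for b, a in zip(before, after)]


def sample(name_filter="mlx", seconds=1.0, path="/proc/interrupts"):
    """Read the file twice, `seconds` apart, and return per-CPU rates."""
    with open(path) as f:
        first = parse_interrupts(f.read(), name_filter)
    time.sleep(seconds)
    with open(path) as f:
        second = parse_interrupts(f.read(), name_filter)
    return rates(first, second, seconds)
```

Calling sample() on the target while the SRP load is running should show which CPUs are actually taking the HCA completion interrupts.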


The number of interrupts can be cut down if there are more completions to be processed by software; i.e., please test with multiple QPs between one initiator and your target, and with multiple initiators against your target.
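As a rough back-of-the-envelope check on the figures quoted earlier in this thread (about 135K interrupts/s on the busy CPU and about 73K IOPS over SRP), the ratio of interrupts to I/Os can be sketched as below. It assumes every one of those interrupts belongs to the SRP workload, which the thread does not confirm, and the function name is purely illustrative.

```python
def interrupts_per_io(interrupts_per_sec, iops):
    """Average number of interrupts taken per completed I/O."""
    return interrupts_per_sec / iops


# Figures quoted in this thread; attributing all interrupts to SRP is an assumption.
ratio = interrupts_per_io(135_000, 73_000)
print(f"{ratio:.2f} interrupts per I/O")
```

If each interrupt handled several completions instead, the same I/O rate would need proportionally fewer interrupts, which appears to be the point of the suggestion above.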

A couple of questions here on my side. How would more QP connections reduce interrupts? It seems like they'd still need to come through the same mlx device, causing the same number of interrupts or more. More importantly though, how would one increase the number of QPs between an initiator and a target? I did have my ib_srpt threads up; would that be comparable?

Does anyone know of a trick to spread those interrupts out more (which I realize might be bad due to context switching), or something else that will reduce the interrupts on that CPU? The mlx4 uses an MSI-X interrupt. I've changed it to an APIC interrupt, but that seems to give slightly lower performance.

There is a userspace daemon, irqbalance, that dynamically directs IRQs to different CPUs. You can define which CPUs CAN handle an IRQ, but you cannot control how it is done. You can look at Documentation/IRQ-affinity.txt for details on how to configure it. In some cases I found it better, performance-wise, to shut irqbalance off, assign the process to one (or more) CPUs, and use a different CPU to serve interrupts.

Earlier, I did go over that file, and tried playing around with /sys/class/pci_bus/<slot>/cpu_affinity and /proc/irq/<slot>/smp_affinity for the pci slot I was using, but didn't have much luck. I also tried turning off irqbalance, but that made no difference.
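For reference, smp_affinity takes a hexadecimal CPU bitmask (bit n set means CPU n may service the IRQ), and the directory under /proc/irq is normally named after the IRQ number shown in /proc/interrupts. A minimal sketch of building and writing such a mask follows; the helper names are hypothetical, writing requires root, and the simple single-word mask assumes a machine with at most 32 CPUs.

```python
def cpu_mask(cpus):
    """Build the hex bitmask string for a list of CPU numbers (bit n = CPU n)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def set_irq_affinity(irq, cpus, proc_root="/proc/irq"):
    """Write the mask to <proc_root>/<irq>/smp_affinity (needs root)."""
    path = f"{proc_root}/{irq}/smp_affinity"
    with open(path, "w") as f:
        f.write(cpu_mask(cpus))
    return path
```

For example, cpu_mask([2, 3]) yields "c", which would restrict an IRQ to CPUs 2 and 3. A running irqbalance daemon may rewrite the mask later, so it is usually stopped first when pinning by hand.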


Additionally, I found that I can load the newer SCST code if I use the kernel-supplied modules and the standalone srpt-1.0.0 package that I think you provide, Vu. I was about to try it, along with dropping a module param for ib_srpt (I was using a thread count of 32, which had given me better performance in an earlier test). I'll report back on this.
Thanks for the help,
Cameron
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
