Vu Pham wrote:
Cameron Harr wrote:
Vu Pham wrote:

Alternatively, is there anything in the SCST layer I should tweak. I'm
still running rev 245 of that code (kinda old, but works with OFED 1.3.1
w/o hacks).

With blockio I get the best performance + stability with scst_threads=1
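[For reference, scst_threads is a load-time module parameter of the SCST core in that generation of the code; a sketch of how it could be set, assuming the modules are not built in (module load order and names should be verified against your SCST revision):

```shell
# Reload the stack with a single SCST I/O thread (requires root).
# Unload in reverse dependency order first:
rmmod ib_srpt scst_vdisk scst

# SCST core with one processing thread, then the handlers on top:
modprobe scst scst_threads=1
modprobe scst_vdisk
modprobe ib_srpt
```
]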

I got best performance with threads=2 or 3, and I've noticed that the srpt_thread is often at 99%, though increasing or decreasing the "thread=?" parameter for ib_srpt doesn't seem to make a difference. A second initiator doesn't seem to help much either; with a single initiator writing to two targets, I can now usually get between 95K and 105K IOPs.

My target server (with DAS) contains 8 2.8 GHz CPU cores and can sustain over 200K IOPs locally, but only around 73K IOPs over SRP.

Is this number from one initiator or multiple?
One initiator. At first I thought it might be a limitation of SRP, so I added a second initiator, but the aggregate performance of the two was about equal to that of a single initiator.

Try again with scst_threads=1. I expect that you can get ~140K IOPs with two initiators.

Unfortunately, I'm nowhere close to that high, though I am significantly higher than before. Two initiators do seem to reduce the context-switching rate, however, which is good.
Looking at /proc/interrupts, I see that the mlx_core (comp) device is pushing about 135K interrupts/s on 1 of 2 CPUs. All CPUs are enabled for that PCI-E slot, but it only ever uses 2 of the CPUs, and only 1 at a time. None of the other CPUs has an interrupt rate of more than about 40-50K/s.
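[If the completion interrupts are landing on only one or two CPUs, it may be worth inspecting the IRQ's affinity mask. A sketch, where N stands for whatever IRQ number the mlx completion vector shows under in /proc/interrupts (the exact device name varies, and irqbalance may rewrite the mask):

```shell
# Find the IRQ line(s) for the HCA's completion vector:
grep -i mlx /proc/interrupts

# Show which CPUs IRQ N may be delivered to (a hex CPU bitmask):
cat /proc/irq/N/smp_affinity

# Example: allow CPUs 0-3 for IRQ N (requires root):
echo f > /proc/irq/N/smp_affinity
```
]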
The number of interrupts can be cut down if there are more completions to be processed by software per interrupt, i.e. please test with multiple QPs between one initiator and your target, and with multiple initiators against your target.
Interrupts are still pretty high (around 160K/s now), but that seems not to be my bottleneck. Context switching seems to be about 2-2.5 switches per IOP, and sometimes less - not perfect, but not horrible either.
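[For what it's worth, the switches-per-IOP figure quoted above can be derived by dividing the cs column of `vmstat 1` by the IOPS reported by the load generator; a trivial sketch with made-up sample numbers:

```shell
# Hypothetical samples: cs/s taken from `vmstat 1`, IOPS from the benchmark tool.
cs_per_sec=230000
iops=100000
awk -v cs="$cs_per_sec" -v iops="$iops" \
    'BEGIN { printf "%.1f context switches per IOP\n", cs / iops }'
# prints "2.3 context switches per IOP"
```
]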

ib_srpt processes completions in its event callback handler. With more QPs there are more completions pending per interrupt, instead of one completion event per interrupt. You can have multiple QPs between an initiator and a target by using different initiator_ext values, i.e.:

echo id_ext=xxx,ioc_guid=yyy,....initiator_ext=1 > /sys/class/infiniband_srp/.../add_target
echo id_ext=xxx,ioc_guid=yyy,....initiator_ext=2 > /sys/class/infiniband_srp/.../add_target
echo id_ext=xxx,ioc_guid=yyy,....initiator_ext=3 > /sys/class/infiniband_srp/.../add_target
This doesn't seem to net much of an improvement, though I understand the reasoning behind it. My hunch is there's another bottleneck to look for now.

Cameron
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
