Vu Pham wrote:
Cameron Harr wrote:
Vu Pham wrote:
Alternatively, is there anything in the SCST layer I should tweak? I'm
still running rev 245 of that code (kinda old, but it works with OFED
1.3.1 without hacks).
With blockio I get the best performance + stability with scst_threads=1
I got the best performance with threads=2 or 3, and I've noticed that
the srpt_thread is often at 99%, though increasing or decreasing the
"thread=?" parameter for ib_srpt doesn't seem to make a difference.
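Both knobs discussed here are module parameters. A minimal sketch of how they would be set, with parameter names taken from this thread (verify them with modinfo for your SCST/OFED revision before relying on them):

```shell
# Sketch only -- parameter names as used in this thread; confirm with
# `modinfo scst` and `modinfo ib_srpt` for your revision.
modprobe scst scst_threads=1   # number of SCST processing threads
modprobe ib_srpt thread=1      # ib_srpt completion-thread mode
```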
A second initiator doesn't seem to help much either; with a single
initiator writing to two targets, I can now usually get between 95K and
105K IOPs.
My target server (with DAS) contains 8 2.8 GHz CPU cores and can
sustain over 200K IOPs locally, but only around 73K IOPs over SRP.
Is this number from one initiator or multiple?
One initiator. At first I thought it might be a limitation of the
SRP, and added a second initiator, but the aggregate performance of
the two was about equal to that of a single initiator.
Try again with scst_threads=1. I expect that you can get ~140K with
two initiators.
Unfortunately, I'm nowhere close to that high, though I am significantly
higher than before. Two initiators do seem to reduce the context-switching
rate, however, which is good.
Looking at /proc/interrupts, I see that the mlx_core (comp) device
is pushing about 135K int/s on 1 of 2 CPUs. All CPUs are enabled
for that PCI-E slot, but it only ever uses 2 of the CPUs, and only
1 at a time. None of the other CPUs has an interrupt rate higher
than about 40-50K/s.
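When one CPU is saturated with completion interrupts like this, the IRQ's smp_affinity mask is worth checking. A sketch, assuming a hypothetical IRQ number 24 (find the real one on the mlx_core comp line of /proc/interrupts):

```shell
# Find the IRQ number for the mlx completion vector.
grep mlx /proc/interrupts

# Pin that IRQ to CPU 2 (bitmask 0x4 = CPU 2); "24" is a
# hypothetical IRQ number for illustration.
echo 4 > /proc/irq/24/smp_affinity
```

Note that with a single completion vector, spreading the mask across several CPUs still tends to deliver interrupts to only one CPU at a time, which matches the behavior described above.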
The number of interrupts can be cut down if there are more
completions to be processed by software per event, i.e. please test
with multiple QPs between one initiator and your target, and with
multiple initiators against your target.
Interrupts are still pretty high (around 160K/s now), but that seems
not to be my bottleneck. Context switching seems to be about 2-2.5
switches per IOP, and sometimes less - not perfect, but not horrible
either.
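The switches-per-IOP figure above can be derived from two counters: the "cs" column of vmstat and the benchmark's IOPs number. A sketch with made-up sample values (the 220000/100000 figures are hypothetical, for illustration only):

```shell
# Estimate context switches per IOP from two measured rates.
# Both sample values below are hypothetical.
cs_per_sec=220000   # "cs" column from `vmstat 1`
iops=100000         # from your benchmark tool
awk -v cs="$cs_per_sec" -v io="$iops" \
    'BEGIN { printf "%.1f context switches per IOP\n", cs/io }'
# prints: 2.2 context switches per IOP
```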
ib_srpt processes completions in its event callback handler. With more
QPs there are more completions pending per interrupt instead of one
completion event per interrupt.
You can have multiple QPs between initiator and target by using
different initiator_ext values, i.e.:
echo id_ext=xxx,ioc_guid=yyy,....initiator_ext=1 >
/sys/class/infiniband_srp/.../add_target
echo id_ext=xxx,ioc_guid=yyy,....initiator_ext=2 >
/sys/class/infiniband_srp/.../add_target
echo id_ext=xxx,ioc_guid=yyy,....initiator_ext=3 >
/sys/class/infiniband_srp/.../add_target
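The three echo commands above can equivalently be written as a loop; the id_ext/ioc_guid values and the elided device path remain placeholders exactly as in the original:

```shell
# Placeholders (xxx, yyy, the "..." path segment) as in the
# commands above -- substitute your own values.
for ext in 1 2 3; do
  echo "id_ext=xxx,ioc_guid=yyy,initiator_ext=$ext" > \
    /sys/class/infiniband_srp/.../add_target
done
```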
This doesn't seem to net much of an improvement, though I understand the
reasoning behind it. My hunch is that there's another bottleneck to look
for now.
Cameron
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general