Cameron Harr wrote:
Vu Pham wrote:
Cameron Harr wrote:
Vu Pham wrote:
Alternatively, is there anything in the SCST layer I should tweak? I'm
still running rev 245 of that code (kinda old, but works with OFED
1.3.1 w/o hacks).
With blockio I get the best performance and stability with scst_threads=1.
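(For reference, a minimal sketch of setting that, assuming scst_threads is
the scst module's load-time parameter as in stock SCST builds:

  # reload the core with a single processing thread
  # (unload dependent modules such as ib_srpt first)
  modprobe -r scst
  modprobe scst scst_threads=1

Adjust the unload/reload steps for whatever handlers your target actually
uses.)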
I got the best performance with threads=2 or 3, and I've noticed that the
srpt_thread is often at 99%, though if I increase/decrease the
"thread=?" parameter for ib_srpt, it doesn't seem to make a
difference. A second initiator doesn't seem to help much either; with
a single initiator writing to two targets, I can now usually get between
95K and 105K IOPs.
ib_srpt's "thread=?" parameter does not mean number of thread but
indicating you are in thread context or not
thread=0 --> you will avoid one context switch between ib_srpt's thread
and scst's threads. You may get better result with thread=0; however,
there is stability risk
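(A minimal sketch of trying that, assuming thread is ib_srpt's load-time
module parameter as described above:

  # process completions in the event callback context instead of ib_srpt's thread
  modprobe -r ib_srpt
  modprobe ib_srpt thread=0

or put "options ib_srpt thread=0" in your modprobe configuration so it
persists across reloads.)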
My target server (with DAS) contains eight 2.8 GHz CPU cores and can
sustain over 200K IOPs locally, but only around 73K IOPs over SRP.
Is this number from one initiator or multiple?
One initiator. At first I thought it might be a limitation of SRP,
and added a second initiator, but the aggregate performance of
the two was about equal to that of a single initiator.
Try again with scst_threads=1. I expect that you can get ~140K with
two initiators
Unfortunately, I'm nowhere close to that high, though I am significantly
higher than before. Two initiators do seem to reduce the context-switching
rate, however, which is good.
Could you try again with ib_srpt's thread=0?
Looking at /proc/interrupts, I see that the mlx_core (comp)
device is pushing about 135K Int/s on 1 of 2 CPUs. All CPUs are
enabled for that PCI-E slot, but it only ever uses 2 of the CPUs,
and only 1 at a time. None of the other CPUs has an interrupt
rate more than about 40-50K/s.
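(For what it's worth, a quick way to estimate those rates, with the grep
pattern being whatever your HCA's completion vector is called in
/proc/interrupts:

  # sample the completion vector's per-CPU counters one second apart
  grep -i mlx /proc/interrupts; sleep 1; grep -i mlx /proc/interrupts

and diff the counts by hand.)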
The number of interrupts can be cut down if there are more
completions to be processed by software per interrupt, i.e. please test
with multiple QPs between one initiator and your target, and with
multiple initiators against your target.
Interrupts are still pretty high (around 160K/s now), but that does not
seem to be my bottleneck. Context switching seems to be about 2-2.5
switches for every IOP, and sometimes less - not perfect, but not
horrible either.
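(That ratio comes from eyeballing system-wide numbers; a rough way to
reproduce it is to compare the cs column from vmstat against the device
IOPs from iostat over the same interval:

  vmstat 1 5       # "cs" column = context switches/s
  iostat -x 1 5    # r/s + w/s on the SRP devices = IOPs

and divide the two.)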
ib_srpt processes completions in its event callback handler. With more QPs
there are more completions pending per interrupt, instead of one
completion event per interrupt.
You can have multiple QPs between an initiator and a target by using
different initiator_ext values, i.e.:

echo id_ext=xxx,ioc_guid=yyy,....initiator_ext=1 > /sys/class/infiniband_srp/.../add_target
echo id_ext=xxx,ioc_guid=yyy,....initiator_ext=2 > /sys/class/infiniband_srp/.../add_target
echo id_ext=xxx,ioc_guid=yyy,....initiator_ext=3 > /sys/class/infiniband_srp/.../add_target
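(If it helps, each line written to add_target should show up on the
initiator as its own SCSI host, so you can confirm the extra
connections/QPs with something like:

  ls /sys/class/scsi_host        # one hostN per SRP connection
  lsscsi                         # if installed, maps sd devices back to hosts

and then make sure your test actually spreads I/O across the resulting
devices.)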
This doesn't seem to net much of an improvement, though I understand
the reasoning behind it. My hunch is there's another bottleneck now to
look for.
Cameron
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general