Cameron Harr wrote:
Vu Pham wrote:
Cameron Harr wrote:
Vu Pham wrote:
Alternatively, is there anything in the SCST layer I should tweak? I'm
still running rev 245 of that code (kinda old, but works with OFED
1.3.1 w/o hacks).
With blockio I get the best performance and stability with scst_threads=1.
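(For reference, a minimal sketch of setting that, assuming scst_threads is
the scst module's load-time parameter as in stock SCST builds:

  # reload the core with a single processing thread
  # (unload dependent modules such as ib_srpt first)
  modprobe -r scst
  modprobe scst scst_threads=1

Adjust the unload/reload steps for whatever handlers your target actually
uses.)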
I got the best performance with threads=2 or 3, and I've noticed that the
srpt_thread is often at 99%, though if I increase/decrease the
"thread=?" parameter for ib_srpt, it doesn't seem to make a
difference. A second initiator doesn't seem to help much either; with
a single initiator writing to two targets, I can now usually get between
95K and 105K IOPs.
ib_srpt's "thread=?" parameter does not mean number of thread but
indicating you are in thread context or not
thread=0 --> you will avoid one context switch between ib_srpt's thread
and scst's threads. You may get better result with thread=0; however,
there is stability risk
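(A minimal sketch of trying that, assuming thread is ib_srpt's load-time
module parameter as described above:

  # process completions in the event callback context instead of ib_srpt's thread
  modprobe -r ib_srpt
  modprobe ib_srpt thread=0

or put "options ib_srpt thread=0" in your modprobe configuration so it
persists across reloads.)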
My target server (with DAS) contains eight 2.8 GHz CPU cores and can
sustain over 200K IOPs locally, but only around 73K IOPs over SRP.
Is this number from one initiator or multiple?
One initiator. At first I thought it might be a limitation of SRP,
and added a second initiator, but the aggregate performance of
the two was about equal to that of a single initiator.
Try again with scst_threads=1. I expect that you can get ~140K with
two initiators
Unfortunately, I'm nowhere close to that high, though I am significantly
higher than before. Two initiators do seem to reduce the context-switching
rate, however, which is good.
Could you try again with ib_srpt's thread=0?
Looking at /proc/interrupts, I see that the mlx_core (comp)
device is pushing about 135K Int/s on 1 of 2 CPUs. All CPUs are
enabled for that PCI-E slot, but it only ever uses 2 of the CPUs,
and only 1 at a time. None of the other CPUs has an interrupt
rate more than about 40-50K/s.
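(For what it's worth, a quick way to estimate those rates, with the grep
pattern being whatever your HCA's completion vector is called in
/proc/interrupts:

  # sample the completion vector's per-CPU counters one second apart
  grep -i mlx /proc/interrupts; sleep 1; grep -i mlx /proc/interrupts

and diff the counts by hand.)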
The number of interrupts can be cut down if there are more
completions to be processed by software per interrupt, i.e. please test
with multiple QPs between one initiator and your target, and with
multiple initiators against your target.
Interrupts are still pretty high (around 160K/s now), but that does not
seem to be my bottleneck. Context switching seems to be about 2-2.5
switches for every IOP, and sometimes less - not perfect, but not
horrible either.
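(That ratio comes from eyeballing system-wide numbers; a rough way to
reproduce it is to compare the cs column from vmstat against the device
IOPs from iostat over the same interval:

  vmstat 1 5       # "cs" column = context switches/s
  iostat -x 1 5    # r/s + w/s on the SRP devices = IOPs

and divide the two.)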
ib_srpt processes completions in its event callback handler. With more QPs
there are more completions pending per interrupt, instead of one
completion event per interrupt.
You can have multiple QPs between an initiator and a target by using
different initiator_ext values, i.e.:

echo id_ext=xxx,ioc_guid=yyy,....initiator_ext=1 > /sys/class/infiniband_srp/.../add_target
echo id_ext=xxx,ioc_guid=yyy,....initiator_ext=2 > /sys/class/infiniband_srp/.../add_target
echo id_ext=xxx,ioc_guid=yyy,....initiator_ext=3 > /sys/class/infiniband_srp/.../add_target
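(If it helps, each line written to add_target should show up on the
initiator as its own SCSI host, so you can confirm the extra
connections/QPs with something like:

  ls /sys/class/scsi_host        # one hostN per SRP connection
  lsscsi                         # if installed, maps sd devices back to hosts

and then make sure your test actually spreads I/O across the resulting
devices.)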
This doesn't seem to net much of an improvement, though I understand
the reasoning behind it. My hunch is there's another bottleneck now to
look for.
Cameron
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general