Vu Pham wrote:
Cameron Harr wrote:

One thing that makes the results hard to interpret is that they vary enormously. I've been doing more testing with 3 physical LUNs (instead of two) on the target, srpt_thread=0, and scst_threads varied across 1, 2, and 3. With scst_threads=1, I'm fairly low (50K IOPS), while with 2 and 3 threads the results are higher, though in all cases the context-switch counts are low, often less than 1 per I/O.


Can you test again with srpt_thread=0,1 and scst_threads=1,2,3 in NULLIO mode (exporting 1, 2, and 3 NULLIO LUNs)?
srpt_thread=0:
scst_t: |    1    |    2      |    3       |
-------------------------------------------|
1 LUN*  |  54K    | 54K-75K   | 54K-75K    |
2 LUNs* |120K-200K|150K-200K**| 120K-180K**|
3 LUNs* |170K-195K|160K-195K  | 130K-170K**|

srpt_thread=1:
scst_t: |    1    |    2      |    3      |
------------------------------------------|
1 LUN*  |   74K   |    54K    |   55K     |
2 LUNs* |140K-190K| 130K-200K | 150K-220K |
3 LUNs* |170K-195K| 170K-195K | 175K-195K |

* One FIO (benchmark) process was run per LUN, so with 3 LUNs there were three FIO processes running simultaneously.

** Sometimes the benchmark "zombied" (the process did no work but could not be killed) after running for some amount of time. It wasn't reliably repeatable, so the marker only indicates that this particular run has zombied at least once.

- Note 1: There were a number of outliers (often between 98K and 230K), but I tried to capture where the bulk of the activity happened; it's still a rough estimate. Where the range is large, it usually means the results were widely scattered.
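For reference, the per-LUN load described above could be generated with an fio invocation along these lines. This is a sketch, not the exact job used here: the device path /dev/sdX, the queue depth, and the runtime are placeholder assumptions.

```shell
# Hypothetical fio command for one LUN: 512 B direct random writes.
# One such process was started per exported LUN (/dev/sdX is a placeholder).
fio --name=srp-lun \
    --filename=/dev/sdX \
    --rw=randwrite \
    --bs=512 \
    --direct=1 \
    --ioengine=libaio \
    --iodepth=32 \
    --runtime=60 --time_based
```

With --direct=1, fio opens the device with O_DIRECT, bypassing the initiator-side page cache, which matches the "direct IO" claim in the test description below.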

Summary: It's hard to draw firm conclusions given the variation in the results. The runs with srpt_thread=1 tended to have fewer outliers at the beginning, but as time went on they scattered as well. Running with 2 or 3 threads seems to be essentially a toss-up.

Also a little disconcerting is that the average request size on the target has grown. I'm always writing 512 B requests, and with one initiator the average request size is around 600-800 B. When I add a second initiator, the average basically doubles, to around 1200-1600 B. I'm specifying direct I/O in the test, and SCST is configured as BLOCKIO (and thus direct I/O), but it appears something is cached at some point and coalesced when another initiator is involved. Does this seem odd or normal? It holds true whether the initiators write to different partitions on the same LUN or to the same LUN with no partitions.
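One way to check this on the target is to derive the average write size from /proc/diskstats deltas: sectors written divided by writes completed, times 512. If the block layer is merging adjacent 512 B requests, the ratio climbs above one sector per write. A minimal sketch, assuming the standard diskstats field layout (field 8 = writes completed, field 10 = sectors written, with the device name as field 3); the numbers in the example are illustrative, not measured:

```shell
#!/bin/sh
# Average write request size over an interval, from two diskstats samples.
# Arguments: writes_start sectors_start writes_end sectors_end
avg_write_size() {
    writes=$(( $3 - $1 ))
    sectors=$(( $4 - $2 ))
    if [ "$writes" -eq 0 ]; then echo 0; return; fi
    echo $(( sectors * 512 / writes ))
}

# No merging: 1000 one-sector writes -> 512 B average.
avg_write_size 0 0 1000 1000
# Pairs of 512 B writes merged: 500 completions cover 1000 sectors -> 1024 B.
avg_write_size 0 0 500 1000
```

In a live test, the start/end counters would come from the relevant line of /proc/diskstats sampled before and after the run.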

What I/O scheduler are you running on the local storage? Since you are using BLOCKIO, you should experiment with the scheduler's tunable parameters (for the deadline scheduler, for example: front_merges, writes_starved, ...). Please see Documentation/block/*.txt in the kernel source tree.
I'm using CFQ. Months ago I tried different schedulers with their default options and saw essentially no difference. I can try some of that again; however, I don't believe I can tune the schedulers, because my back end doesn't expose a "queue" directory under /sys/block/<dev>/.
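When the queue directory is present, the scheduler and its tunables are normally reachable through sysfs along these lines (a sketch; sdX is a placeholder device, and the values shown are the documented defaults, not recommendations):

```shell
# Show the current elevator for a block device (the active one is bracketed):
cat /sys/block/sdX/queue/scheduler

# Switch to the deadline scheduler:
echo deadline > /sys/block/sdX/queue/scheduler

# Deadline tunables from Documentation/block/deadline-iosched.txt:
echo 1 > /sys/block/sdX/queue/iosched/front_merges
echo 2 > /sys/block/sdX/queue/iosched/writes_starved
```

If the back end doesn't register a request queue in sysfs, as described above, these knobs simply aren't available for that device.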

-Cameron
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

