Vu Pham wrote:
Cameron Harr wrote:

One thing that makes the results hard to interpret is that they vary enormously. I've been doing more testing with 3 physical LUNs (instead of two) on the target, srpt_thread=0, and scst_threads varied across 1, 2, and 3. With scst_threads=1, I'm fairly low (50K IOPS), while with 2 and 3 threads the results are higher, though in all cases the context-switch counts are low, often less than 1 per I/O.


Can you test again with srpt_thread=0,1 and scst_threads=1,2,3 in NULLIO mode (exporting 1, 2, and 3 NULLIO LUNs)?
srpt_thread=0:
scst_t: |    1    |    2      |    3       |
-------------------------------------------|
1 LUN*  |  54K    | 54K-75K   | 54K-75K    |
2 LUNs* |120K-200K|150K-200K**| 120K-180K**|
3 LUNs* |170K-195K|160K-195K  | 130K-170K**|

srpt_thread=1:
scst_t: |    1    |    2      |    3      |
------------------------------------------|
1 LUN*  |   74K   |    54K    |   55K     |
2 LUNs* |140K-190K| 130K-200K | 150K-220K |
3 LUNs* |170K-195K| 170K-195K | 175K-195K |

* One FIO (benchmark) process was run per LUN, so with 3 LUNs there were three FIO processes running simultaneously.

** Sometimes the benchmark "zombied" (the process did no work but could not be killed) after running for some amount of time. It wasn't reliably repeatable, so the marker only indicates that this particular run has zombied at least once.

- Note 1: There were a number of outliers (often between 98K and 230K), but I tried to capture where the bulk of the activity happened; it's still a rough estimate. Where the range is large, it usually means the results were widely scattered.
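For reference, the per-LUN load described above could be generated with an fio invocation along these lines. This is a sketch, not the exact job used here: the device path /dev/sdX, the queue depth, and the runtime are placeholder assumptions.

```shell
# Hypothetical fio command for one LUN: 512 B direct random writes.
# One such process was started per exported LUN (/dev/sdX is a placeholder).
fio --name=srp-lun \
    --filename=/dev/sdX \
    --rw=randwrite \
    --bs=512 \
    --direct=1 \
    --ioengine=libaio \
    --iodepth=32 \
    --runtime=60 --time_based
```

With --direct=1, fio opens the device with O_DIRECT, bypassing the initiator-side page cache, which matches the "direct IO" claim in the test description below.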

Summary: It's hard to draw firm conclusions given the variation in the results. The runs with srpt_thread=1 tended to have fewer outliers at the beginning, but as time went on they scattered as well. Running with 2 or 3 threads seems to be essentially a toss-up.

Also a little disconcerting is that the average request size on the target has grown. I'm always writing 512 B requests, and with one initiator the average request size is around 600-800 B. When I add a second initiator, the average basically doubles, to around 1200-1600 B. I'm specifying direct I/O in the test, and SCST is configured as BLOCKIO (and thus direct I/O), but it appears something is cached at some point and coalesced when another initiator is involved. Does this seem odd or normal? It holds true whether the initiators write to different partitions on the same LUN or to the same LUN with no partitions.
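One way to check this on the target is to derive the average write size from /proc/diskstats deltas: sectors written divided by writes completed, times 512. If the block layer is merging adjacent 512 B requests, the ratio climbs above one sector per write. A minimal sketch, assuming the standard diskstats field layout (field 8 = writes completed, field 10 = sectors written, with the device name as field 3); the numbers in the example are illustrative, not measured:

```shell
#!/bin/sh
# Average write request size over an interval, from two diskstats samples.
# Arguments: writes_start sectors_start writes_end sectors_end
avg_write_size() {
    writes=$(( $3 - $1 ))
    sectors=$(( $4 - $2 ))
    if [ "$writes" -eq 0 ]; then echo 0; return; fi
    echo $(( sectors * 512 / writes ))
}

# No merging: 1000 one-sector writes -> 512 B average.
avg_write_size 0 0 1000 1000
# Pairs of 512 B writes merged: 500 completions cover 1000 sectors -> 1024 B.
avg_write_size 0 0 500 1000
```

In a live test, the start/end counters would come from the relevant line of /proc/diskstats sampled before and after the run.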

What I/O scheduler are you running on the local storage? Since you are using BLOCKIO, you should experiment with the scheduler's tunable parameters (for the deadline scheduler, for example: front_merges, writes_starved, ...). Please see Documentation/block/*.txt in the kernel source tree.
I'm using CFQ. Months ago I tried different schedulers with their default options and saw essentially no difference. I can try some of that again; however, I don't believe I can tune the schedulers, because my back end doesn't expose a "queue" directory under /sys/block/<dev>/.
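When the queue directory is present, the scheduler and its tunables are normally reachable through sysfs along these lines (a sketch; sdX is a placeholder device, and the values shown are the documented defaults, not recommendations):

```shell
# Show the current elevator for a block device (the active one is bracketed):
cat /sys/block/sdX/queue/scheduler

# Switch to the deadline scheduler:
echo deadline > /sys/block/sdX/queue/scheduler

# Deadline tunables from Documentation/block/deadline-iosched.txt:
echo 1 > /sys/block/sdX/queue/iosched/front_merges
echo 2 > /sys/block/sdX/queue/iosched/writes_starved
```

If the back end doesn't register a request queue in sysfs, as described above, these knobs simply aren't available for that device.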

-Cameron
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

