On Mon, 2009-06-29 at 16:32 +0100, John Haxby wrote:
> That's a fairly busy system but the iostat output doesn't look to me
> like something that's I/O bound: the average wait times and queue size
> just don't look like something that's in trouble or even working all
> that hard.
> 
> Am I missing something here?

I think what you're missing is the I/O being spread across multiple
paths.  It's difficult to read in the format with dm-12, dm-13, and
dm-14 mixed together; if you separate them, however, you'll see a
pattern like this (columns are the usual iostat -x ones: rrqm/s,
wrqm/s, r/s, w/s, rsec/s, wsec/s, avgrq-sz, avgqu-sz, await, svctm,
%util):

dm-12             0.00     0.00 350.00  880.00 56176.00  7040.00    51.40     7.16    5.82   0.69  85.40
dm-12             0.00     0.00 582.00  108.00 89688.00   864.00   131.23     2.54    3.67   0.98  67.80
dm-12             0.00     0.00 368.32  645.54 56839.60  5164.36    61.16     5.17    5.10   0.85  85.94
dm-12             0.00     0.00 402.00 1520.00 59472.00 12160.00    37.27     9.22    4.80   0.50  95.20
dm-12             0.00     0.00 387.00  100.00 61496.00   800.00   127.92     3.73    7.61   1.54  75.00
dm-12             0.00     0.00 486.00  444.00 75848.00  3552.00    85.38     4.64    4.93   0.82  76.10
dm-12             0.00     0.00 373.00 1488.00 57416.00 11904.00    37.25    11.70    6.32   0.51  95.60
dm-12             0.00     0.00 408.00  185.00 61504.00  1480.00   106.21     2.88    4.86   1.26  74.60

After that, dm-12 will be quiet, but then dm-13 wakes up:

dm-13             0.00     0.00 288.00    0.00 41264.00     0.00   143.28     2.04    6.94   2.61  75.20
dm-13             0.00     0.00 468.00   56.00 68424.00   448.00   131.44     2.00    3.85   1.40  73.40
dm-13             0.00     0.00 526.00  292.00 77128.00  2336.00    97.14     5.61    6.88   0.89  72.50
dm-13             0.00     0.00 514.00  216.00 73808.00  1728.00   103.47     5.87    7.88   0.96  70.00
dm-13             0.00     0.00 548.51   83.17 76879.21   665.35   122.76     2.87    4.71   1.09  69.11
dm-13             0.00     0.00 508.00  136.00 72520.00  1088.00   114.30     6.08    9.43   1.07  68.60
dm-13             0.00     0.00 300.00    3.00 44840.00    24.00   148.07     4.09   13.53   2.76  83.50
dm-13             0.00     0.00 392.00  172.00 55608.00  1376.00   101.04     8.69   14.97   1.41  79.40

And finally, dm-14:

dm-14             0.00     0.00 427.00    4.00 63816.00    32.00   148.14     1.29    2.99   1.08  46.50
dm-14             0.00     0.00 510.00    0.00 73032.00     0.00   143.20     1.91    3.75   1.45  73.90
dm-14             0.00     0.00 171.00    0.00 24600.00     0.00   143.86     2.15   12.59   5.48  93.70
dm-14             0.00     0.00 459.00    0.00 66368.00     0.00   144.59     2.96    6.46   1.67  76.80
dm-14             0.00     0.00 549.50    0.00 77132.67     0.00   140.37     2.46    4.47   1.27  69.60
dm-14             0.00     0.00 570.00    3.00 82000.00    24.00   143.15     2.26    3.94   1.23  70.70
dm-14             0.00     0.00 404.00    0.00 58680.00     0.00   145.25     1.82    4.51   2.02  81.50
dm-14             0.00     0.00 459.00    0.00 66368.00     0.00   144.59     2.44    5.32   1.64  75.40

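(If anyone wants to do the same separation themselves, it's just a grep
per device over a saved capture.  Here's a sketch -- the iostat.log file
name and the embedded sample lines are made up for illustration; in
practice you'd capture with something like "iostat -x 1 > iostat.log".)

```shell
# Stand-in capture file; real data would come from "iostat -x 1 > iostat.log".
cat > iostat.log <<'EOF'
dm-12 0.00 0.00 350.00 880.00 56176.00 7040.00 51.40 7.16 5.82 0.69 85.40
dm-13 0.00 0.00 288.00 0.00 41264.00 0.00 143.28 2.04 6.94 2.61 75.20
dm-12 0.00 0.00 582.00 108.00 89688.00 864.00 131.23 2.54 3.67 0.98 67.80
EOF

# Pull each device's samples out of the interleaved log.
for dev in dm-12 dm-13 dm-14; do
    echo "=== $dev ==="
    grep "^$dev " iostat.log || true   # "|| true" so a device with no samples doesn't abort the loop
done
```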
Then it wraps around to dm-12 and the pattern repeats for about 90
seconds.  I'm assuming that corresponds to the 90 seconds of the full
table scan.  He's pretty I/O bound during that part.  He could set the
multipath rr_min_io parameter lower to balance the I/O more evenly
across the paths, but based on the graph his SAN admin provided, I
think he's already pretty much maxing out the IOPS his array has.  In
the end, the RHEL5 box appears to be doing what it can with the IOPS
available.
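For the record, lowering rr_min_io would look something like this in
/etc/multipath.conf (the value 10 below is only illustrative -- I
believe the RHEL5 default is 1000, i.e. 1000 I/Os down one path before
switching):

```
defaults {
        rr_min_io 10    # illustrative value; lower means paths rotate more often
}
```

Worth remembering that spreading the I/O more evenly only helps if a
single path is the bottleneck; it won't buy anything if the array
itself is out of IOPS, which looks to be the case here.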

Later,
Tom


_______________________________________________
rhelv5-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/rhelv5-list
