Hello,
we have a problem with our disk array. It might even be a hardware issue; I'm
not sure.
The array holds home directories of our users + mail.
HW configuration:
an HP DL585 server with four 6-core Opterons and 128 GiB of RAM
array: IBM DS4300 with 7 LUNs, each a RAID5 with 4 disks (250GB).
Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA
(the array only supports 2Gb)
NCQ queue depth is 32
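(In case it helps, something like this should show the negotiated link speed
and the per-LUN queue depth; the sysfs paths are what I'd expect with the stock
qla2xxx driver on RHEL5, so adjust if yours differ:)

for h in /sys/class/fc_host/host*; do
    # negotiated FC link speed; should report 2 Gbit against this array
    echo "$h: $(cat $h/speed)"
done
for d in sda sdb sdc sdd sde sdf sdg; do
    # queue depth the SCSI layer uses for each LUN
    echo "$d: queue_depth=$(cat /sys/block/$d/device/queue_depth)"
done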
SW configuration:
RHEL5.3
the home partition is a linear LVM volume:
# lvdisplay -m /dev/array-vg/newhome
  --- Logical volume ---
  LV Name                /dev/array-vg/newhome
  VG Name                array-vg
  LV UUID                9XxWH5-5yv4-t661-K24d-Hdzg-G0aW-zUxRul
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                2.18 TB
  Current LE             571393
  Segments               9
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:7

  --- Segments ---
  Logical extent 0 to 66998:
    Type                linear
    Physical volume     /dev/sda
    Physical extents    111470 to 178468

  Logical extent 66999 to 133997:
    Type                linear
    Physical volume     /dev/sdb
    Physical extents    111470 to 178468

  Logical extent 133998 to 200996:
    Type                linear
    Physical volume     /dev/sdc
    Physical extents    111470 to 178468

  Logical extent 200997 to 267995:
    Type                linear
    Physical volume     /dev/sdd
    Physical extents    111470 to 178468

  Logical extent 267996 to 334994:
    Type                linear
    Physical volume     /dev/sde
    Physical extents    111470 to 178468

  Logical extent 334995 to 401993:
    Type                linear
    Physical volume     /dev/sdf
    Physical extents    111470 to 178468

  Logical extent 401994 to 468992:
    Type                linear
    Physical volume     /dev/sdg
    Physical extents    111470 to 178468

  Logical extent 468993 to 527946:
    Type                linear
    Physical volume     /dev/sdg
    Physical extents    15945 to 74898

  Logical extent 527947 to 571392:
    Type                linear
    Physical volume     /dev/sdc
    Physical extents    15945 to 59390
All LUNs use the deadline scheduler.
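(Something like this should confirm the scheduler and request queue settings
per LUN; these are just the standard RHEL5 sysfs paths, nothing we tuned:)

for d in sda sdb sdc sdd sde sdf sdg; do
    # the active elevator is shown in brackets, e.g. "noop anticipatory [deadline] cfq"
    echo "$d: $(cat /sys/block/$d/queue/scheduler)"
    # number of requests the block layer will queue per device
    echo "$d: nr_requests=$(cat /sys/block/$d/queue/nr_requests)"
done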
Now the problem:
whenever there is a 'large' write (on the order of hundreds of megabytes),
the system load rises considerably.
Inspection with iostat shows that the per-device activity goes from something
like this:
Device:          tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda           373.00         8.00      7792.00          8       7792
sdb            11.00         8.00        80.00          8         80
sdc            13.00         8.00        96.00          8         96
sdd             9.00         8.00        80.00          8         80
sde            23.00         8.00       296.00          8        296
sdf             9.00         8.00        80.00          8         80
sdg             5.00         8.00        32.00          8         32
after a $ dd if=/dev/zero of=file bs=$((2**20)) count=128
it goes to this:
Device:          tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             0.00         0.00         0.00          0          0
sdb             0.00         0.00         0.00          0          0
sdc             0.00         0.00         0.00          0          0
sdd             0.00         0.00         0.00          0          0
sde            31.00         8.00     28944.00          8      28944
sdf             1.00         8.00         0.00          8          0
sdg             1.00         8.00         0.00          8          0
and when I generate some reads it goes from
Device:          tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda           171.00      3200.00      3448.00       3200       3448
sdb            24.00      3336.00        56.00       3336         56
sdc            17.00      3280.00        16.00       3280         16
sdd            15.00      3208.00        24.00       3208         24
sde            18.00      3200.00        56.00       3200         56
sdf            18.00      3192.00        40.00       3192         40
sdg            23.00      3184.00       144.00       3184        144
to
Device:          tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             5.00       392.00        88.00        392         88
sdb             2.00       352.00         0.00        352          0
sdc             2.00       264.00         0.00        264          0
sdd             2.00       264.00         0.00        264          0
sde           277.00       560.00     38744.00        560      38744
sdf             2.00       264.00         0.00        264          0
sdg             1.00       296.00         0.00        296          0
It looks like a single large write somehow starves all other requests.
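(For what it's worth, this is roughly how I watch the flush while the dd runs;
nothing specific to our setup, just /proc/meminfo and extended iostat:)

# in one terminal: dirty / in-flight writeback totals, refreshed every second
watch -n 1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'

# in another terminal: extended per-device stats; await and avgqu-sz show
# where requests pile up
iostat -x 1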
Switching to a striped LVM volume would probably help, but the data migration
would be really painful for us.
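(For the record, the striped layout I have in mind would be something like the
sketch below; the stripe size, LV name and size are only placeholders, and we
would still have to copy all the data over by hand, which is the painful part:)

# hypothetical striped replacement spread over all seven PVs, 64 KiB stripe size
lvcreate -i 7 -I 64 -L 2.2T -n newhome_striped array-vg \
    /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
# then copy the data across (e.g. with rsync) and swap the mount points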
Does anyone have an idea where the problem might be? Any pointers would be
appreciated.
Regards,
Jiri Novosad
_______________________________________________
rhelv5-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/rhelv5-list