On 27.3.2010 23:21, Pasi Kärkkäinen wrote:
> On Thu, Mar 25, 2010 at 06:54:38PM +0100, [email protected] wrote:
>>
>> Hello Jiri,
>>
>> The high load may be caused by I/O wait (check with sar -u). In any case,
>> 30 MB/s seems a little slow for an FC array of any kind..
>>
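For reference, a quick way to see how much of the load is really I/O wait
rather than CPU time (the interval and count below are arbitrary):

$ sar -u 2 10      # a high %iowait with low %user/%system points at the storage
$ iostat -x 2      # per-LUN await and %util while the load is high
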
> Hi,
>
> 30 MB/sec can be a LOT.. depending on the IO pattern and IO size.
>
> If you're doing totally random IO where each IO is 512 bytes in size,
> then 30 MB/sec would equal over 61000 IOPS.
>
> Single 15k SAS/FC disk can do around 300-400 random IOPS max, so 61000 IOPS
> would require you to have around 150x (15k) disks in raid-0.
>
> -- Pasi

Actually that was 15 MB/sec (the units were blocks). But it's true that our
access pattern is quite random. There are 30+ users logged in over ssh, others
access their mail over imap/pop3, and some 80 PCs mount the users' home
directories over NFS. Sometimes 30 users run NetBeans at once, plus there's
Firefox ...

The real issue here is not the overall speed; I'm sorry if I didn't make
myself clear. The problem is that when there is a large amount of writes to a
single LUN, only a small percentage of requests (if any) make it to the other
LUNs.

I ran another test to compare what happens when I bypass the page cache
(oflag=direct makes dd open the output file with O_DIRECT). I generate some
reads from all LUNs and all looks well (this time iostat reports KiB):

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             327.00      1440.00      5940.00       1440       5940
sdb              19.00      1588.00        28.00       1588         28
sdc              15.00      1720.00         0.00       1720          0
sdd              21.00      1700.00        32.00       1700         32
sde              28.00      1660.00        60.00       1660         60
sdf              13.00      1664.00         0.00       1664          0
sdg              71.00      1664.00       228.00       1664        228

Then I run $ dd if=/dev/zero of=file bs=$((2**20)) count=128. It finishes in
half a second and after a while iostat says:

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               1.00       140.00         0.00        140          0
sdb               1.00       192.00         0.00        192          0
sdc               1.00       180.00         0.00        180          0
sdd               1.00       128.00         0.00        128          0
sde               1.00       128.00         0.00        128          0
sdf              46.00       144.00     23400.00        144      23400
sdg               2.00       128.00         4.00        128          4

On the other hand, if I run $ dd oflag=direct if=/dev/zero of=file
bs=$((2**20)) count=128, it takes about 8 seconds to finish and iostat says
something like:

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             258.42      1702.97      3251.49       1720       3284
sdb              30.69      1710.89       102.97       1728        104
sdc              45.54      1699.01       704.95       1716        712
sdd              23.76      1817.82        15.84       1836         16
sde              18.81      1766.34        27.72       1784         28
sdf              85.15      1770.30     16308.91       1788      16472
sdg              62.38      1778.22       198.02       1796        200

Is it possible that writing out the FS cache (dirty pages) has a higher
priority than other accesses?
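One thing I plan to look at (only a sketch, not a conclusion): with 128 GiB of
RAM, the kernel's percentage-based dirty-page limits allow a large amount of
dirty page cache to build up and then be flushed to a single LUN in one burst,
and the deadline scheduler also has per-LUN knobs for how writes are batched
against reads. The values below are purely illustrative, not recommendations:

# sysctl vm.dirty_background_ratio vm.dirty_ratio    # current writeback thresholds (percent of memory)
# sysctl -w vm.dirty_background_ratio=1              # start background writeback earlier (illustrative)
# sysctl -w vm.dirty_ratio=5                         # cap dirty memory sooner (illustrative)
# cat /sys/block/sdf/queue/iosched/write_expire      # deadline tunables, per LUN (sdf as an example)
# cat /sys/block/sdf/queue/iosched/writes_starved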
>
>> I don't know your DS-4300 at all but if you're using a SAN or an FC loop
>> to connect to your array, here are (maybe) a few things you might want to
>> look for:
>>
>> - What kind of disks are used in your DS4300? 10k or 15k rpm FC disks? Did
>>   you check how heavily used your disks were during transfers? (there
>>   should be software provided with the array to allow that, perhaps even
>>   an embedded webserver).

7200 rpm SATA.

>> - Did you monitor your array's Fibre Adapter activity? (unless you're the
>>   sole user of the array and no other server can hit the same physical
>>   disks, in which case you're most likely not overloading it).

I did not, but this server is the only one accessing the array.

>> - Do you have multiple paths from your server to your switch and/or to
>>   your array? (even if the array is only active/passive and 2 Gbps, having
>>   multiple paths provides redundancy and better performance with correct
>>   configuration).

No.

>> - What kind of data is your FS holding (many little files, hundreds of
>>   thousands of files, etc..?). Tuning the FS or switching to a different
>>   FS type can help..
>>
>> - If there is no bottleneck noticed above, then striping might help
>>   (that's what we use here on active/active DMX arrays) but take care not
>>   to end up on the same physical disks at the array block level..

Yes, it might help, but there's no easy way to do the switch.
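Roughly, the switch would mean creating a second, striped LV next to the
current one and copying everything over, something like the sketch below. The
LV name, size, stripe size and mount points are only illustrative, ext3 stands
in for whatever filesystem newhome actually uses, and it assumes the VG has
enough free extents (or extra LUNs) to hold a second copy of the data:

# lvcreate -i 7 -I 256 -L 2.2T -n newhome_striped array-vg   # stripe across all 7 LUNs, 256 KiB stripe size
# mkfs.ext3 /dev/array-vg/newhome_striped
# mount /dev/array-vg/newhome_striped /mnt/newhome_striped
# rsync -aH /newhome/ /mnt/newhome_striped/                  # first pass while the system is still live
  (then stop services, run a final rsync -aH --delete, and swap the mount points)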
>> My 2c,
>>
>> Vincent

Thanks for your suggestions,
Jiri Novosad

>> On Wed, 24 Mar 2010, Jiri Novosad wrote:
>>
>>> Hello,
>>>
>>> we have a problem with our disk array. It might even be in HW, I'm not sure.
>>> The array holds home directories of our users + mail.
>>>
>>> HW configuration:
>>>
>>> an HP DL585 server with four 6-core Opterons and 128 GiB RAM
>>>
>>> array: IBM DS4300 with 7 LUNs, each a RAID5 with 4 disks (250 GB).
>>> Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA
>>> (the array only supports 2Gb)
>>> NCQ queue depth is 32
>>>
>>> SW configuration:
>>>
>>> RHEL 5.3
>>>
>>> the home partition is a linear LVM volume:
>>>
>>> # lvdisplay -m /dev/array-vg/newhome
>>>   --- Logical volume ---
>>>   LV Name                /dev/array-vg/newhome
>>>   VG Name                array-vg
>>>   LV UUID                9XxWH5-5yv4-t661-K24d-Hdzg-G0aW-zUxRul
>>>   LV Write Access        read/write
>>>   LV Status              available
>>>   # open                 1
>>>   LV Size                2.18 TB
>>>   Current LE             571393
>>>   Segments               9
>>>   Allocation             inherit
>>>   Read ahead sectors     auto
>>>   - currently set to     256
>>>   Block device           253:7
>>>
>>>   --- Segments ---
>>>   Logical extent 0 to 66998:
>>>     Type                linear
>>>     Physical volume     /dev/sda
>>>     Physical extents    111470 to 178468
>>>
>>>   Logical extent 66999 to 133997:
>>>     Type                linear
>>>     Physical volume     /dev/sdb
>>>     Physical extents    111470 to 178468
>>>
>>>   Logical extent 133998 to 200996:
>>>     Type                linear
>>>     Physical volume     /dev/sdc
>>>     Physical extents    111470 to 178468
>>>
>>>   Logical extent 200997 to 267995:
>>>     Type                linear
>>>     Physical volume     /dev/sdd
>>>     Physical extents    111470 to 178468
>>>
>>>   Logical extent 267996 to 334994:
>>>     Type                linear
>>>     Physical volume     /dev/sde
>>>     Physical extents    111470 to 178468
>>>
>>>   Logical extent 334995 to 401993:
>>>     Type                linear
>>>     Physical volume     /dev/sdf
>>>     Physical extents    111470 to 178468
>>>
>>>   Logical extent 401994 to 468992:
>>>     Type                linear
>>>     Physical volume     /dev/sdg
>>>     Physical extents    111470 to 178468
>>>
>>>   Logical extent 468993 to 527946:
>>>     Type                linear
>>>     Physical volume     /dev/sdg
>>>     Physical extents    15945 to 74898
>>>
>>>   Logical extent 527947 to 571392:
>>>     Type                linear
>>>     Physical volume     /dev/sdc
>>>     Physical extents    15945 to 59390
>>>
>>> All LUNs use the deadline scheduler.
>>>
>>> Now the problem:
>>> whenever there is a 'large' write (on the order of hundreds of megabytes),
>>> the system load rises considerably.
>>> Inspection using iostat shows that from something like this:
>>>
>>> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>>> sda             373.00         8.00      7792.00          8       7792
>>> sdb              11.00         8.00        80.00          8         80
>>> sdc              13.00         8.00        96.00          8         96
>>> sdd               9.00         8.00        80.00          8         80
>>> sde              23.00         8.00       296.00          8        296
>>> sdf               9.00         8.00        80.00          8         80
>>> sdg               5.00         8.00        32.00          8         32
>>>
>>> after a $ dd if=/dev/zero of=file bs=$((2**20)) count=128
>>> it goes to this:
>>>
>>> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>>> sda               0.00         0.00         0.00          0          0
>>> sdb               0.00         0.00         0.00          0          0
>>> sdc               0.00         0.00         0.00          0          0
>>> sdd               0.00         0.00         0.00          0          0
>>> sde              31.00         8.00     28944.00          8      28944
>>> sdf               1.00         8.00         0.00          8          0
>>> sdg               1.00         8.00         0.00          8          0
>>>
>>> and when I generate some reads it goes from
>>>
>>> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>>> sda             171.00      3200.00      3448.00       3200       3448
>>> sdb              24.00      3336.00        56.00       3336         56
>>> sdc              17.00      3280.00        16.00       3280         16
>>> sdd              15.00      3208.00        24.00       3208         24
>>> sde              18.00      3200.00        56.00       3200         56
>>> sdf              18.00      3192.00        40.00       3192         40
>>> sdg              23.00      3184.00       144.00       3184        144
>>>
>>> to
>>>
>>> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>>> sda               5.00       392.00        88.00        392         88
>>> sdb               2.00       352.00         0.00        352          0
>>> sdc               2.00       264.00         0.00        264          0
>>> sdd               2.00       264.00         0.00        264          0
>>> sde             277.00       560.00     38744.00        560      38744
>>> sdf               2.00       264.00         0.00        264          0
>>> sdg               1.00       296.00         0.00        296          0
>>>
>>> It looks like the single write somehow cancels out all other requests.
>>>
>>> Switching to a striped LVM volume would probably help, but the data
>>> migration would be really painful for us.
>>>
>>> Does anyone have an idea where the problem might be? Any pointers would be
>>> appreciated.
>>>
>>> Regards,
>>> Jiri Novosad

_______________________________________________
rhelv5-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/rhelv5-list
