Re: [opensuse] Raid5/LVM2/XFS alignment
On Jan 29, 2008 3:05 PM, Ciro Iriarte [EMAIL PROTECTED] wrote:

  2008/1/28, Greg Freemyer [EMAIL PROTECTED]:
  [Greg's suggestions about LVM striping, extent size and snapshot chunksize -- quoted in full in the next message]

  Just for the record: while dealing with a bug that made the raid hang, I found a workaround that also gave me a performance boost:

  echo 4096 > /sys/block/md2/md/stripe_cache_size

  Result:

  mainwks:~ # dd if=/dev/zero bs=1024k count=1000 of=/datos/test
  1000+0 records in
  1000+0 records out
  1048576000 bytes (1,0 GB) copied, 6,78341 s, 155 MB/s

  mainwks:~ # rm /datos/test
  mainwks:~ # dd if=/dev/zero bs=1024k count=20000 of=/datos/test
  20000+0 records in
  20000+0 records out
  20971520000 bytes (21 GB) copied, 199,135 s, 105 MB/s

  Ciro

Ciro,

105 MB/s seems strange to me. I would have expected 75 MB/s or 225 MB/s.

For normal non-full-stripe i/o it should be 75 MB/s * 4 / 4, where 75 MB/sec is what I typically see for one drive, the first 4 is the number of drives that can be doing parallel i/o, and the second 4 is the number of i/o's per write. ie. When you do a non-full-stripe write, the kernel has to read the old checksum, read the old chunk data, recalculate the checksum, write the new chunk data, and write the new checksum.

Out of curiosity, on the dd line, do you get better performance if you set your block size to exactly one stripe? ie. 3x 256KB = 768KB per stripe.

I've read that Linux's raid5 implementation is optimized to handle full-stripe writes. ie. Writing 3 chunks produces: calculate the new checksum from all the new data, then write d1, d2, d3, p. So to get 3 256KB chunks onto the drives, the kernel ends up invoking 4 256KB writes, or 75 MB/s * 4 * 3 / 4 = 225 MB/sec.

If you have everything optimized, I think you should see the same performance with a 2-stripe write, ie. 6x 256KB. If your alignment is wrong, you will instead see a speed improvement from the bigger write, because a write spanning two stripes is guaranteed to contain at least one full-stripe write even when it does not start on a stripe boundary.

Thanks
Greg
--
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf
The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
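Greg's question about a one-stripe block size can be checked directly with dd. Below is a minimal sketch of that test, assuming the array described later in the thread (4 drives, 256KB chunk, so one stripe holds 3 x 256KB = 768KB of data). The scratch path /datos/test comes from Ciro's own runs; the count and the conv=fsync flag (used so buffered data does not inflate the numbers) are illustrative additions, not something anyone in the thread ran.

    #!/bin/sh
    # Sketch of the full-stripe write test Greg suggests: one-stripe writes
    # (768KB) and two-stripe writes (1536KB) against the same scratch file.
    for bs in 768k 1536k; do
        rm -f /datos/test
        echo "bs=$bs"
        dd if=/dev/zero of=/datos/test bs=$bs count=1000 conv=fsync 2>&1 | tail -1
    done

If the alignment is right, both block sizes should report roughly the same throughput; if the 1536KB runs are clearly faster, partial-stripe writes are probably still happening at 768KB.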
Re: [opensuse] Raid5/LVM2/XFS alignment
2008/1/28, Greg Freemyer [EMAIL PROTECTED]:

  On Jan 28, 2008 6:41 PM, Ciro Iriarte [EMAIL PROTECTED] wrote:

  Ok, I guess you know reads are not significantly impacted by the tuning we're talking about. This is mostly about tuning for raid5 write performance.

  Anyway, are you planning to stripe together multiple md raid5 arrays via LVM? I believe that is what --stripes and --stripesize are for. (ie. If you have 8 drives, you could create 2 raid5 arrays and use LVM to interleave them by using --stripes = 2.) I've never used that feature.

  You need to worry about the vg extents. I think vgcreate --physicalextentsize is what you need to tune. I would make each extent a whole number of stripes in size, ie. 768KB * N. Maybe use N=10, so -s 7680K.

  Assuming you're not using LVM stripes, and since this appears to be a new setup, I would also use -C or --contiguous to ensure all the data is sequential. It may be overkill, but it will further ensure you _avoid_ LV extents that don't end on a stripe boundary. (a stripe == 3 raid5 chunks for you)

  Then if you are going to use the snapshot feature, you need to set your chunksize efficiently. If you are only going to have large files, then I would use a large LVM snapshot chunksize. 256KB seems like a good choice, but I have not benchmarked snapshot chunksizes.

  Greg

Just for the record: while dealing with a bug that made the raid hang, I found a workaround that also gave me a performance boost:

echo 4096 > /sys/block/md2/md/stripe_cache_size

Result:

mainwks:~ # dd if=/dev/zero bs=1024k count=1000 of=/datos/test
1000+0 records in
1000+0 records out
1048576000 bytes (1,0 GB) copied, 6,78341 s, 155 MB/s

mainwks:~ # rm /datos/test
mainwks:~ # dd if=/dev/zero bs=1024k count=20000 of=/datos/test
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 199,135 s, 105 MB/s

Ciro
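The workaround above is a runtime setting. A minimal sketch of checking and applying it, assuming the md2 array from this thread; the memory note and the repeat of the dd test are illustrative additions, and the value does not survive a reboot unless re-applied from a boot script.

    cat /sys/block/md2/md/stripe_cache_size           # md's default is 256
    echo 4096 > /sys/block/md2/md/stripe_cache_size   # enlarge the raid5 stripe cache

    # The cache costs roughly stripe_cache_size x 4KB x member disks of RAM
    # (about 64MB for 4096 entries on a 4-disk array), so size it to taste.

    # Re-run the same sequential write test to compare before/after:
    dd if=/dev/zero of=/datos/test bs=1024k count=1000 conv=fsync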
Re: [opensuse] Raid5/LVM2/XFS alignment
On Jan 28, 2008 11:25 AM, Ciro Iriarte [EMAIL PROTECTED] wrote:

  Hi, does anybody have some notes about tuning md raid5, lvm and xfs? I'm getting 20 MB/s with dd and I think it can be improved. I'll add config parameters as soon as I get home. I'm using md raid5 on a motherboard with an nvidia sata controller, 4x 500GB Samsung SATA2 disks and lvm with OpenSUSE [EMAIL PROTECTED]

  Regards,
  Ciro

I have not done any raid 5 perf. testing; 20 MB/sec seems pretty bad, but not outrageous I suppose.

I can get about 4-5 GB/min from new sata drives, so about 75 MB/sec from a single raw drive (ie. dd if=/dev/zero of=/dev/sdb bs=4k).

You don't say how you're invoking dd. The default bs is only 512 bytes I think, and that is totally inefficient with the linux kernel. I typically use 4k, which maps to what the kernel uses. ie.

dd if=/dev/zero of=big-file bs=4k count=1000

should give you a simple but meaningful test.

I think the default chunk (stride) is 64k per drive, so if you're writing 3x 64K at a time you may get perfect alignment and avoid the overhead of having to recalculate the checksum all the time. As another data point, I would bump that up to 30x 64K and see if you continue to get speed improvements.

So tell us the write speed for:

bs=512
bs=4k
bs=192k
bs=1920k

And the read speeds for the same, ie. dd if=big-file of=/dev/null bs=4k, etc.

I would expect the write speed to go up with each increase in bs, but the read speed to be more or less constant.

Then you need to figure out what sort of real-world block sizes you're going to be using. Once you have a bs, or a collection of bs sizes, that matches your needs, then you can start tuning your stack.

Greg
--
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf
The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
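A minimal script sketch of the measurement Greg asks for, at the four block sizes he lists. The target path, the roughly 1 GB file size, the conv=fsync flag, and the drop_caches step (so the read pass is not served from the page cache) are illustrative assumptions, not part of Greg's message.

    #!/bin/sh
    # Write/read throughput sweep over the block sizes Greg lists. Run as root.
    FILE=/datos/big-file

    run() {   # run <bs> <count>: write the test file, then read it back
        bs=$1; count=$2
        echo "=== bs=$bs ==="
        dd if=/dev/zero of=$FILE bs=$bs count=$count conv=fsync 2>&1 | tail -1
        echo 3 > /proc/sys/vm/drop_caches          # flush cache before the read test
        dd if=$FILE of=/dev/null bs=$bs 2>&1 | tail -1
    }

    run 512   2000000    # roughly 1 GB at each block size
    run 4k    250000
    run 192k  5300
    run 1920k 530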
Re: [opensuse] Raid5/LVM2/XFS alignment
2008/1/28, Greg Freemyer [EMAIL PROTECTED]:

  [Greg's reply with the dd block-size suggestions -- quoted in full in the previous message]

Hi, I posted the first mail from my cell phone, so I couldn't add more info. I created the raid with chunk size = 256k.
mainwks:~ # mdadm --misc --detail /dev/md2
/dev/md2:
        Version : 01.00.03
  Creation Time : Sun Jan 27 20:08:48 2008
     Raid Level : raid5
     Array Size : 1465151232 (1397.28 GiB 1500.31 GB)
  Used Dev Size : 976767488 (465.76 GiB 500.10 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2
    Persistence : Superblock is persistent
  Intent Bitmap : Internal
    Update Time : Mon Jan 28 17:42:51 2008
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 256K
           Name : 2
           UUID : 65cb16de:d89af60e:6cac47da:88828cfe
         Events : 12

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
       2       8       65        2      active sync   /dev/sde1
       4       8       81        3      active sync   /dev/sdf1

- Speed reported by hdparm:

mainwks:~ # hdparm -tT /dev/sdc
/dev/sdc:
 Timing cached reads:   1754 MB in 2.00 seconds = 877.60 MB/sec
 Timing buffered disk reads:  226 MB in 3.02 seconds = 74.76 MB/sec

mainwks:~ # hdparm -tT /dev/md2
/dev/md2:
 Timing cached reads:   1250 MB in 2.00 seconds = 624.82 MB/sec
 Timing buffered disk reads:  620 MB in 3.01 seconds = 206.09 MB/sec

- LVM:

mainwks:~ # vgdisplay data
  Incorrect metadata area header checksum
  --- Volume group ---
  VG Name               data
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  5
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               1.36 TB
  PE Size               4.00 MB
  Total PE              357702
  Alloc PE / Size       51200 / 200.00 GB
  Free  PE / Size       306502 / 1.17 TB
  VG UUID               KpUAeN-mPjO-2K8t-hiLX-FF0C-93R2-IP3aFI

mainwks:~ # pvdisplay /dev/sdc1
  Incorrect metadata area header checksum
  --- Physical volume ---
  PV Name               /dev/md2
  VG Name               data
  PV Size               1.36 TB / not usable 3.75 MB
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              357702
  Free PE               306502
  Allocated PE          51200
  PV UUID               Axl2c0-RP95-WwO0-inHP-aJEF-6SYJ-Fqhnga

- XFS:

mainwks:~ # xfs_info /dev/data/test
meta-data=/dev/mapper/data-test  isize=256    agcount=16, agsize=1638400 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=16     swidth=48 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=16384, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0

- The reported dd:

mainwks:~ # dd if=/dev/zero
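One thing visible in the xfs_info output above: sunit=16 and swidth=48 blocks with bsize=4096 correspond to 64KB and 192KB, i.e. geometry for a 64KB chunk, while this array actually uses 256KB chunks. Below is a minimal sketch of recreating the filesystem with matching geometry; nobody in the thread ran this, mkfs destroys whatever is on the volume, and the device path is simply the one already used in the thread.

    # Recreate XFS with stripe geometry matching the md array: 256KB chunk (su),
    # 3 data disks in the 4-disk raid5 (sw). WARNING: destroys the filesystem.
    mkfs.xfs -f -d su=256k,sw=3 /dev/data/test

    # xfs_info should then report sunit=64 swidth=192 blks (with bsize=4096).
    xfs_info /dev/data/test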
[opensuse] Raid5/LVM2/XFS alignment
Hi, does anybody have some notes about tuning md raid5, lvm and xfs? I'm getting 20 MB/s with dd and I think it can be improved. I'll add config parameters as soon as I get home. I'm using md raid5 on a motherboard with an nvidia sata controller, 4x 500GB Samsung SATA2 disks and lvm with OpenSUSE [EMAIL PROTECTED]

Regards,
Ciro
Re: [opensuse] Raid5/LVM2/XFS alignment
On Jan 28, 2008 3:51 PM, Ciro Iriarte [EMAIL PROTECTED] wrote:

  [Ciro's configuration message, quoted in full earlier in this thread; the archived copy of this reply is truncated before Greg's own text]
Re: [opensuse] Raid5/LVM2/XFS alignment
2008/1/28, Greg Freemyer [EMAIL PROTECTED]:

  [the earlier exchange and configuration output, quoted in full earlier in this thread; the archived copy of this message is truncated before Ciro's own text]
Re: [opensuse] Raid5/LVM2/XFS alignment
On Jan 28, 2008 6:41 PM, Ciro Iriarte [EMAIL PROTECTED] wrote:

  [the earlier exchange and configuration output, quoted in full earlier in this thread]

Ok, I guess you know reads are not significantly impacted by the tuning we're talking about. This is mostly about tuning for raid5 write performance.

Anyway, are you planning to stripe together multiple md raid5 arrays via LVM? I believe that is what --stripes and --stripesize are for. (ie. If you have 8 drives, you could create 2 raid5 arrays and use LVM to interleave them by using --stripes = 2.) I've never used that feature.

You need to worry about the vg extents. I think vgcreate --physicalextentsize is what you need to tune. I would make each extent a whole number of stripes in size, ie. 768KB * N. Maybe use N=10, so -s 7680K.

Assuming you're not using LVM stripes, and since this appears to be a new setup, I would also use -C or --contiguous to ensure all the data is sequential. It may be overkill, but it will further ensure you _avoid_ LV extents that don't end on a stripe boundary. (a stripe == 3 raid5 chunks for you)

Then if you are going to use the snapshot feature, you need to set your chunksize efficiently. If you are only going to have large files, then I would use a large LVM snapshot chunksize. 256KB seems like a good choice, but I have not benchmarked snapshot chunksizes.

Greg
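Greg's extent-size, contiguous-allocation and snapshot-chunksize suggestions map onto vgcreate/lvcreate roughly as sketched below. The LV names and sizes are illustrative (reusing /dev/md2 and the "data" VG from the thread), nobody in the thread actually ran these commands, and some LVM2 releases only accept power-of-two physical extent sizes, in which case the 7680K value would be rejected.

    # VG whose physical extent size is a whole number of 768KB stripes (768KB * 10):
    vgcreate -s 7680K data /dev/md2

    # Linear LV with contiguous allocation so extents stay sequential on the array:
    lvcreate -C y -L 200G -n test data

    # If snapshots will be used, set the snapshot chunk size explicitly
    # (Greg's 256KB suggestion); -s makes this a snapshot of the LV above:
    lvcreate -s -c 256k -L 20G -n test_snap /dev/data/test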
Re: [opensuse] Raid5/LVM2/XFS alignment
2008/1/28, Greg Freemyer [EMAIL PROTECTED]:

  [the earlier exchange and configuration output, quoted in full earlier in this thread; the archived copy of this message is truncated before any new text]