2008/1/28, Greg Freemyer <[EMAIL PROTECTED]>:
> On Jan 28, 2008 6:41 PM, Ciro Iriarte <[EMAIL PROTECTED]> wrote:
> > 2008/1/28, Greg Freemyer <[EMAIL PROTECTED]>:
> > > On Jan 28, 2008 3:51 PM, Ciro Iriarte <[EMAIL PROTECTED]> wrote:
> > > > 2008/1/28, Greg Freemyer <[EMAIL PROTECTED]>:
> > > > > On Jan 28, 2008 11:25 AM, Ciro Iriarte <[EMAIL PROTECTED]> wrote:
> > > > > > Hi, does anybody have notes about tuning md RAID5, LVM and XFS? I'm
> > > > > > getting 20 MB/s with dd and I think it can be improved. I'll add
> > > > > > config parameters as soon as I get home. I'm using md RAID5 on a
> > > > > > motherboard with an nVidia SATA controller, 4x 500 GB Samsung SATA2
> > > > > > disks and LVM with OpenSUSE [EMAIL PROTECTED]
> > > > > >
> > > > > > Regards,
> > > > > > Ciro
> > > > >
> > > > > I have not done any RAID5 performance testing: 20 MB/s seems pretty bad,
> > > > > but not outrageous I suppose. I can get about 4-5 GB/min from new SATA
> > > > > drives, so about 75 MB/s from a single raw drive (i.e. dd
> > > > > if=/dev/zero of=/dev/sdb bs=4k).
> > > > >
> > > > > You don't say how you're invoking dd. The default bs is only 512 bytes,
> > > > > I think, and that is totally inefficient with the Linux kernel.
> > > > >
> > > > > I typically use 4k, which matches what the kernel uses; i.e. dd
> > > > > if=/dev/zero of=big-file bs=4k count=1000 should give you a simple but
> > > > > meaningful test.
> > > > >
> > > > > I think the default chunk size is 64k per drive, so if you're writing
> > > > > 3x 64K at a time, you may get perfect alignment and avoid the overhead
> > > > > of having to recalculate the parity all the time.
> > > > >
> > > > > As another data point, I would bump that up to 30x 64K and see if you
> > > > > continue to get speed improvements.
> > > > >
> > > > > So tell us the write speed for
> > > > > bs=512
> > > > > bs=4k
> > > > > bs=192k
> > > > > bs=1920k
> > > > >
> > > > > And the read speeds for the same, i.e. dd if=big-file of=/dev/null
> > > > > bs=4k, etc.
> > > > >
> > > > > I would expect the write speed to go up with each increase in bs, but
> > > > > the read speed to be more or less constant. Then you need to figure
> > > > > out what sort of real-world block sizes you're going to be using. Once
> > > > > you have a bs, or a collection of bs sizes, that matches your needs,
> > > > > you can start tuning your stack.
> > > > >
> > > > > Greg
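For reference, the write/read sweep Greg suggests above could be scripted roughly as follows. The target path under /mnt/custom, the test file name and the roughly 1 GB written per run are just examples, not anything from his mail:

# write test at several block sizes (each run writes roughly 1 GB)
dd if=/dev/zero of=/mnt/custom/ddtest bs=512   count=2000000
dd if=/dev/zero of=/mnt/custom/ddtest bs=4k    count=250000
dd if=/dev/zero of=/mnt/custom/ddtest bs=192k  count=5000
dd if=/dev/zero of=/mnt/custom/ddtest bs=1920k count=500

# read test: drop the page cache first (needs root) so the numbers
# reflect the disks rather than RAM
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/custom/ddtest of=/dev/null bs=4k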
> > > > Hi, I posted the first mail from my cell phone, so I couldn't add more
> > > > info...
> > > >
> > > > - I created the RAID with chunk size = 256k:
> > > >
> > > > mainwks:~ # mdadm --misc --detail /dev/md2
> > > > /dev/md2:
> > > >         Version : 01.00.03
> > > >   Creation Time : Sun Jan 27 20:08:48 2008
> > > >      Raid Level : raid5
> > > >      Array Size : 1465151232 (1397.28 GiB 1500.31 GB)
> > > >   Used Dev Size : 976767488 (465.76 GiB 500.10 GB)
> > > >    Raid Devices : 4
> > > >   Total Devices : 4
> > > > Preferred Minor : 2
> > > >     Persistence : Superblock is persistent
> > > >
> > > >   Intent Bitmap : Internal
> > > >
> > > >     Update Time : Mon Jan 28 17:42:51 2008
> > > >           State : active
> > > >  Active Devices : 4
> > > > Working Devices : 4
> > > >  Failed Devices : 0
> > > >   Spare Devices : 0
> > > >
> > > >          Layout : left-symmetric
> > > >      Chunk Size : 256K
> > > >
> > > >            Name : 2
> > > >            UUID : 65cb16de:d89af60e:6cac47da:88828cfe
> > > >          Events : 12
> > > >
> > > >     Number   Major   Minor   RaidDevice   State
> > > >        0       8       33        0        active sync   /dev/sdc1
> > > >        1       8       49        1        active sync   /dev/sdd1
> > > >        2       8       65        2        active sync   /dev/sde1
> > > >        4       8       81        3        active sync   /dev/sdf1
> > > >
> > > > - Speed reported by hdparm:
> > > >
> > > > mainwks:~ # hdparm -tT /dev/sdc
> > > >
> > > > /dev/sdc:
> > > >  Timing cached reads:   1754 MB in 2.00 seconds = 877.60 MB/sec
> > > >  Timing buffered disk reads:  226 MB in 3.02 seconds = 74.76 MB/sec
> > > >
> > > > mainwks:~ # hdparm -tT /dev/md2
> > > >
> > > > /dev/md2:
> > > >  Timing cached reads:   1250 MB in 2.00 seconds = 624.82 MB/sec
> > > >  Timing buffered disk reads:  620 MB in 3.01 seconds = 206.09 MB/sec
> > > >
> > > > - LVM:
> > > >
> > > > mainwks:~ # vgdisplay data
> > > >   Incorrect metadata area header checksum
> > > >   --- Volume group ---
> > > >   VG Name               data
> > > >   System ID
> > > >   Format                lvm2
> > > >   Metadata Areas        1
> > > >   Metadata Sequence No  5
> > > >   VG Access             read/write
> > > >   VG Status             resizable
> > > >   MAX LV                0
> > > >   Cur LV                2
> > > >   Open LV               2
> > > >   Max PV                0
> > > >   Cur PV                1
> > > >   Act PV                1
> > > >   VG Size               1.36 TB
> > > >   PE Size               4.00 MB
> > > >   Total PE              357702
> > > >   Alloc PE / Size       51200 / 200.00 GB
> > > >   Free  PE / Size       306502 / 1.17 TB
> > > >   VG UUID               KpUAeN-mPjO-2K8t-hiLX-FF0C-93R2-IP3aFI
> > > >
> > > > mainwks:~ # pvdisplay /dev/sdc1
> > > >   Incorrect metadata area header checksum
> > > >   --- Physical volume ---
> > > >   PV Name               /dev/md2
> > > >   VG Name               data
> > > >   PV Size               1.36 TB / not usable 3.75 MB
> > > >   Allocatable           yes
> > > >   PE Size (KByte)       4096
> > > >   Total PE              357702
> > > >   Free PE               306502
> > > >   Allocated PE          51200
> > > >   PV UUID               Axl2c0-RP95-WwO0-inHP-aJEF-6SYJ-Fqhnga
> > > >
> > > > - XFS:
> > > >
> > > > mainwks:~ # xfs_info /dev/data/test
> > > > meta-data=/dev/mapper/data-test  isize=256    agcount=16, agsize=1638400 blks
> > > >          =                       sectsz=512   attr=0
> > > > data     =                       bsize=4096   blocks=26214400, imaxpct=25
> > > >          =                       sunit=16     swidth=48 blks, unwritten=1
> > > > naming   =version 2              bsize=4096
> > > > log      =internal               bsize=4096   blocks=16384, version=1
> > > >          =                       sectsz=512   sunit=0 blks, lazy-count=0
> > > > realtime =none                   extsz=4096   blocks=0, rtextents=0
> > > >
> > > > - The reported dd:
> > > >
> > > > mainwks:~ # dd if=/dev/zero bs=1024k count=100 of=/mnt/custom/t3
> > > > 100+0 records in
> > > > 100+0 records out
> > > > 104857600 bytes (105 MB) copied, 5.11596 s, 20.5 MB/s
> > > >
> > > > - New dd (seems to give a better result):
> > > >
> > > > mainwks:~ # dd if=/dev/zero bs=1024k count=1000 of=/mnt/custom/t0
> > > > 1000+0 records in
> > > > 1000+0 records out
> > > > 1048576000 bytes (1.0 GB) copied, 13.6218 s, 77.0 MB/s
> > > >
> > > > Ciro
> > >
> > > I'm not sure I followed why the old and new dd runs were so different. I
> > > do see the old one only had 5 seconds' worth of data, which is not much
> > > to base a test run on.
> > >
> > > If you really have 1 MB average write sizes, you should read
> > > http://oss.sgi.com/archives/xfs/2007-06/msg00411.html for a tuning
> > > sample.
> > >
> > > Basically that post recommends:
> > >
> > > chunk size = 256KB
> > > LVM align = 3x chunk size = 768KB (assumes a 4-disk RAID5)
> > >
> > > and tuning the XFS bsize/sunit/swidth to match.
> > >
> > > But that all _assumes_ a large data write size. If you have a more
> > > typical desktop load, the average write is way below that and you need
> > > to reduce all of the above accordingly (except bsize; I think 4K bsize
> > > is always best with Linux, but I'm not positive about that).
> > >
> > > Also, dd is only able to simulate a sequential data stream. If you
> > > don't have that kind of load, once again you need to reduce the chunk
> > > size. I think the generically preferred chunk size is 64KB, and with
> > > some database apps that can drop down to 4KB.
> > >
> > > So really and truly, you need to characterize your workload before you
> > > start tuning.
> > >
> > > OTOH, if you just want bragging rights, test with and tune for a big
> > > average write, but be warned that your typical performance will be
> > > going down at the same time that your large-write performance is going
> > > up.
> > >
> > > Greg
> > > --
> > > Greg Freemyer
> > > Litigation Triage Solutions Specialist
> > > http://www.linkedin.com/in/gregfreemyer
> > > First 99 Days Litigation White Paper -
> > > http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf
> > >
> > > The Norcross Group
> > > The Intersection of Evidence & Technology
> > > http://www.norcrossgroup.com
> >
> > Hi, I found that thread too; the problem is I'm not sure how to tune the
> > LVM alignment. Maybe --stripes & --stripesize at LV creation time? I
> > can't find an option for pvcreate or vgcreate. It will basically be a
> > repository for media files: movies, backups, ISO images, etc. For the
> > rest (documents, ebooks and music) I'll create other LVs with ext3.
> >
> > Regards,
> > Ciro
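On the XFS side of that recommendation, for this 4-disk array with a 256KB chunk (3 data disks, so a 768KB stripe) I understand the geometry would be expressed roughly as below. This is only an untested sketch: mkfs.xfs would of course wipe the filesystem, the device and mount point are simply the ones from this thread, and I believe (but have not verified) that an existing filesystem can also be given the geometry at mount time, with values counted in 512-byte sectors:

# recreate the filesystem with stripe geometry matching the RAID (destroys data!)
mkfs.xfs -b size=4096 -d su=256k,sw=3 /dev/data/test

# or hint the geometry to an existing filesystem at mount time
# (256KB = 512 sectors, 768KB = 1536 sectors)
mount -o sunit=512,swidth=1536 /dev/data/test /mnt/custom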
> OK, I guess you know reads are not significantly impacted by the tuning
> we're talking about. This is mostly about tuning for RAID5 write
> performance.

Yep, I know...

> Anyway, are you planning to stripe together multiple md RAID5 arrays via
> LVM? I believe that is what --stripes and --stripesize are for. (i.e. if
> you have 8 drives, you could create 2 RAID5 arrays and use LVM to
> interleave them by using --stripes 2.) I've never used that feature.

No, I don't plan to use something like that.

> You need to worry about the VG extents. I think vgcreate
> --physicalextentsize is what you need to tune. I would make each extent
> a whole number of stripes in size, i.e. 768KB * N. Maybe use N=10, so
> -s 7680K.

Well, I'm not sure about the PE size parameter; as far as I know it doesn't
affect every write operation. A large value just helps the allocation
process (LV creation/grow), and a small value gives finer allocation
granularity (slower creation/grow of LVs).

> Assuming you're not using LVM stripes, and since this appears to be a new
> setup, I would also use -C or --contiguous to ensure all the data is
> sequential. It may be overkill, but it will further ensure you _avoid_ LV
> extents that don't end on a stripe boundary (a stripe == 3 RAID5 chunks
> for you).

Taking note...

> Then, if you are going to use the snapshot feature, you need to set your
> chunksize efficiently. If you are only going to have large files, then I
> would use a large LVM snapshot chunksize. 256KB seems like a good choice,
> but I have not benchmarked snapshot chunksizes.

Read about that, but I probably won't use snapshots with this VG.

> Greg

Thanks,
Ciro
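For the archives, putting these suggestions together the LVM side would look roughly like this. It is an untested sketch: it assumes the VG gets rebuilt from scratch on /dev/md2, it reuses the names from this thread, the LV and snapshot sizes are just examples, and I still have to check whether this lvm2 version accepts a non-power-of-two extent size like 7680K:

# volume group whose extent size is a whole number of 768KB stripes (10 here)
vgcreate -s 7680k data /dev/md2

# contiguous LV for the media repository (size is only an example)
lvcreate -C y -L 200G -n test data

# if snapshots ever get used: a large snapshot chunk size suits large files
lvcreate -s -L 20G -c 256k -n test_snap /dev/data/test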