On Aug 26, 2009 00:46 -0400, Robin Humble wrote:
> I've had another go at fixing the problem I was seeing a few months ago:
>   http://lists.lustre.org/pipermail/lustre-discuss/2009-April/010315.html
> and which we are seeing again now as we are setting up a new machine
> with 128k chunk software raid (md) RAID6 8+2, e.g.
>   Lustre: test-OST000d: underlying device md5 should be tuned for larger
>   I/O requests: max_sectors = 1024 could be up to max_hw_sectors=2560
>
> without this patch, and despite raising all disks to a ridiculously
> huge max_sectors_kb, all Lustre 1M RPCs are still fragmented into two
> 512k chunks before being sent to md :-/ likely md then aggregates them
> again, because performance isn't totally dismal, which it would be if
> it was 100% read-modify-writes for each stripe write.
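As a concrete illustration of the tuning described above, here is a minimal shell sketch for inspecting the block-layer request-size limits of a device through sysfs. The device names (md5, sda) are assumptions taken from the example in this thread, not a prescription:

```shell
#!/bin/sh
# Print the block-layer request-size limits for a device, if its sysfs
# queue directory exists (device names here are only examples).
show_limits() {
    dev=$1
    for f in max_sectors_kb max_hw_sectors_kb; do
        p="/sys/block/$dev/queue/$f"
        if [ -r "$p" ]; then
            echo "$dev $f=$(cat "$p")"
        else
            echo "$dev $f=unavailable"
        fi
    done
}

show_limits md5

# Raising the soft limit on a member disk (as root) would look like:
#   echo 1024 > /sys/block/sda/queue/max_sectors_kb
```

Note that max_sectors_kb can only be raised up to max_hw_sectors_kb, which is exactly the ceiling the Lustre warning message above is reporting.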
Yes, we've seen this same issue, but haven't been able to tweak the
/sys tunables correctly to get MD RAID to agree.  I wonder if the
problem is that the /sys/block/*/queue/max_* tunables are being set too
late in the MD startup, so it picks up the 1024-sector value too early
and never updates it afterward?

> with the patch, 1M I/Os are being fed to md (according to brw_stats),
> and performance is a little better for RAID6 8+2 with 128k chunks, and
> a bit worse for RAID6 8+2 with 64k chunks (which are curiously now fed
> half 512k and half 1M I/Os by Lustre).

This was the other question I'd asked internally.  If the array is
formatted with 64kB chunks, then 512k IOs shouldn't cause any
read-modify-write operations and should (in theory) give the same
performance as 1M IOs on a 128kB-chunksize array.  What is the relative
performance of the 64kB and 128kB configurations?

> the one-liner is a core kernel change, so perhaps some Lustre/kernel
> block device/md people can look at it and see if it's acceptable for
> inclusion in standard Lustre OSS kernels, or whether it breaks
> assumptions in the core SCSI layer somehow.
>
> IMHO the best solution would be to apply the patch, and then have a
> /sys/block/md*/queue/ for md devices so that max_sectors_kb and
> max_hw_sectors_kb can be tuned without recompiling the kernel...
> is that possible?
>
> the patch is against 2.6.18-128.1.14.el5-lustre1.8.1
>
> --- linux-2.6.18.x86_64.lustre/include/linux/blkdev.h	2009-08-18 17:40:51.000000000 +1000
> +++ linux-2.6.18.x86_64.lustre.hackBlock/include/linux/blkdev.h	2009-08-21 13:47:55.000000000 +1000
> @@ -778,7 +778,7 @@
>  #define MAX_PHYS_SEGMENTS	128
>  #define MAX_HW_SEGMENTS	128
>  #define SAFE_MAX_SECTORS	255
> -#define BLK_DEF_MAX_SECTORS	1024
> +#define BLK_DEF_MAX_SECTORS	2048
>
>  #define MAX_SEGMENT_SIZE	65536

This patch definitely looks reasonable, and since we already patch the
server kernel it doesn't appear to be a huge problem to include it.
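For reference, the arithmetic behind the 512k/1M boundary and the one-line patch above can be sketched as follows (BLK_DEF_MAX_SECTORS counts 512-byte sectors, and a full stripe on an 8+2 RAID6 array is the 8 data disks times the chunk size; this is an illustration, not part of the original message):

```shell
#!/bin/sh
# BLK_DEF_MAX_SECTORS is in 512-byte sectors, so the one-liner raises
# the default request cap from 512 KiB to 1 MiB:
for sectors in 1024 2048; do
    echo "$sectors sectors = $((sectors * 512 / 1024)) KiB"
done

# A full stripe on an N+2 RAID6 array is data_disks * chunk size;
# writes that are multiples of this avoid read-modify-write cycles:
data_disks=8
for chunk_kb in 64 128; do
    echo "chunk ${chunk_kb}k -> full stripe $((data_disks * chunk_kb))k"
done
```

This shows why 512k requests already align to full stripes on the 64kB-chunk array (8 x 64k = 512k), while the 128kB-chunk array needs the full 1M requests that the patch enables.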
Can you please create a bug and attach the patch there.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
