Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
On Sun, 28 Nov 2010 04:18:25 +0000 Ben Hutchings b...@debian.org wrote:
> On Sun, 2010-11-28 at 08:28 +1100, Neil Brown wrote:
> > The fix I would recommend for 2.6.26 is to add
> >
> >     if (q->merge_bvec_fn)
> >         rs->max_phys_segments = 1;
> >
> > to dm_set_device_limits.  Though the redhat one is probably adequate.
> > If you really need an upstream fix, you will need to chase upstream
> > to apply one :-(
>
> I won't do that myself - as you can see, I don't really understand the
> issue fully.
>
> Is that fix also valid (modulo renaming of max_phys_segments) for later
> versions?

Yes.  For current mainline it would look like replacing

    if (q->merge_bvec_fn && !ti->type->merge)
        limits->max_sectors =
            min_not_zero(limits->max_sectors,
                         (unsigned int) (PAGE_SIZE >> 9));

with

    if (q->merge_bvec_fn && !ti->type->merge)
        limits->max_segments = 1;

(the test on ->type->merge is important and applies to 2.6.26 as well).

NeilBrown
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
On Mon, 2010-11-29 at 09:37 +1100, Neil Brown wrote:
> On Sun, 28 Nov 2010 04:18:25 +0000 Ben Hutchings b...@debian.org wrote:
> > Is that fix also valid (modulo renaming of max_phys_segments) for
> > later versions?
>
> Yes.  For current mainline it would look like replacing
>
>     if (q->merge_bvec_fn && !ti->type->merge)
>         limits->max_sectors =
>             min_not_zero(limits->max_sectors,
>                          (unsigned int) (PAGE_SIZE >> 9));
>
> with
>
>     if (q->merge_bvec_fn && !ti->type->merge)
>         limits->max_segments = 1;
>
> (the test on ->type->merge is important and applies to 2.6.26 as well).

Why is it not necessary to set seg_boundary_mask to PAGE_CACHE_SIZE - 1,
as for md devices?

Ben.

-- 
Ben Hutchings, Debian Developer and kernel team member
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
On Mon, 29 Nov 2010 00:08:47 +0000 Ben Hutchings b...@debian.org wrote:
> >     if (q->merge_bvec_fn && !ti->type->merge)
> >         limits->max_segments = 1;
> >
> > (the test on ->type->merge is important and applies to 2.6.26 as well).
>
> Why is it not necessary to set seg_boundary_mask to PAGE_CACHE_SIZE - 1,
> as for md devices?

Sorry.  It is necessary of course.  I guess I was being a bit hasty and
forgetting all the details.

    if (q->merge_bvec_fn && !ti->type->merge) {
        limits->max_segments = 1;	/* Make sure only one segment in each bio */
        limits->seg_boundary_mask = PAGE_CACHE_SIZE - 1;	/* make sure that segment is in just one page */
    }

NeilBrown
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
On Mon, 2010-11-29 at 11:48 +1100, Neil Brown wrote:
> On Mon, 29 Nov 2010 00:08:47 +0000 Ben Hutchings b...@debian.org wrote:
> > Why is it not necessary to set seg_boundary_mask to
> > PAGE_CACHE_SIZE - 1, as for md devices?
>
> Sorry.  It is necessary of course.  I guess I was being a bit hasty and
> forgetting all the details.
>
>     if (q->merge_bvec_fn && !ti->type->merge) {
>         limits->max_segments = 1;	/* Make sure only one segment in each bio */
>         limits->seg_boundary_mask = PAGE_CACHE_SIZE - 1;	/* make sure that segment is in just one page */
>     }

Thanks again.  I'll apply this change in Debian and try to get it
upstream if we don't see any regressions.

Ben.

-- 
Ben Hutchings, Debian Developer and kernel team member
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
On Wed, 2010-11-24 at 17:01 +0100, Wouter D'Haeseleer wrote:
> Hi Ben,
>
> I have now successfully compiled the kernel including the patch, which
> this time applied without problem.  However the original bug is still
> present with the patch you grabbed upstream.
>
> For testing purposes I have also tried the patch supplied by Red Hat,
> and I can confirm that this patch works without a problem.  So it
> looks like the patch from Neil Brown does not work for this bug.

The result of my conversation with Neil Brown is that his fix covers
only md devices at the top of a stack, whereas the Red Hat patch covers
only dm devices at the top of a stack.  We should really be fixing both
in the same way.

Please can you test the attached patch, which covers both dm and md.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

commit a451e30d044e5b29f31d965a32d17bf7e97f99b0
Author: Ben Hutchings b...@decadent.org.uk
Date:   Sun Nov 28 23:46:46 2010 +0000

    dm: Deal with merge_bvec_fn in component devices better

    This is analogous to commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71,
    which does the same for md devices at the top of the stack.  The
    following explanation is taken from that commit.  Thanks to
    Neil Brown ne...@suse.de for the advice.

    If a component device has a merge_bvec_fn then as we never call it
    we must ensure we never need to.  Currently this is done by setting
    max_sector to 1 PAGE, however this does not stop a bio being created
    with several sub-page iovecs that would violate the merge_bvec_fn.

    So instead set max_segments to 1 and set the segment boundary to the
    same as a page boundary to ensure there is only ever one single-page
    segment of IO requested at a time.

    This can particularly be an issue when 'xen' is used as it is known
    to submit multiple small buffers in a single bio.

    Signed-off-by: Ben Hutchings b...@decadent.org.uk

commit 71ff0067805fb917142a745246f7996f3ad86d5b
Author: NeilBrown ne...@suse.de
Date:   Mon Mar 8 16:44:38 2010 +1100

    md: deal with merge_bvec_fn in component devices better.

    commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71 upstream.

    If a component device has a merge_bvec_fn then as we never call it
    we must ensure we never need to.  Currently this is done by setting
    max_sector to 1 PAGE, however this does not stop a bio being created
    with several sub-page iovecs that would violate the merge_bvec_fn.

    So instead set max_segments to 1 and set the segment boundary to the
    same as a page boundary to ensure there is only ever one single-page
    segment of IO requested at a time.

    This can particularly be an issue when 'xen' is used as it is known
    to submit multiple small buffers in a single bio.

    Signed-off-by: NeilBrown ne...@suse.de
    Cc: sta...@kernel.org
    [bwh: Backport to Linux 2.6.26]

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 94116ea..186445d0 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -506,17 +506,15 @@ void dm_set_device_limits(struct dm_target *ti, struct block_device *bdev)
 	rs->max_sectors =
 		min_not_zero(rs->max_sectors, q->max_sectors);
 
-	/* FIXME: Device-Mapper on top of RAID-0 breaks because DM
-	 *        currently doesn't honor MD's merge_bvec_fn routine.
-	 *        In this case, we'll force DM to use PAGE_SIZE or
-	 *        smaller I/O, just to be safe. A better fix is in the
-	 *        works, but add this for the time being so it will at
-	 *        least operate correctly.
+	/*
+	 * Since we don't call merge_bvec_fn, we must never risk
+	 * violating it, so limit max_phys_segments to 1 lying within
+	 * a single page.
 	 */
-	if (q->merge_bvec_fn)
-		rs->max_sectors =
-			min_not_zero(rs->max_sectors,
-				     (unsigned int) (PAGE_SIZE >> 9));
+	if (q->merge_bvec_fn) {
+		rs->max_phys_segments = 1;
+		rs->seg_boundary_mask = PAGE_CACHE_SIZE - 1;
+	}
 
 	rs->max_phys_segments =
 		min_not_zero(rs->max_phys_segments,
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index ec921f5..fe8508a 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -136,12 +136,14 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
 		blk_queue_stack_limits(mddev->queue,
 				       rdev->bdev->bd_disk->queue);
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit max_segments to 1 lying within
+		 * a single page.
 		 */
-		if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
-		    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-			blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);
+		if (rdev->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_phys_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE - 1);
+		}
 
 		disk->size = rdev->size;
 		conf->array_size +=
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
Ben,

I'm running 4 days now without any disk errors anymore.  As stated in my
previous message this is with the Red Hat patch applied.

If I compare the patches I see that the patch you grabbed upstream does
not deal with t->limits.max_sectors.

Thanks for a reply,
Wouter

-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe".  Trouble?  Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/22752968add47840a7d404102c7563b005a82de...@vasco-be-exch2.vasco.com
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
Neil, would you mind looking at this:

On Sat, 2010-11-27 at 10:49 +0100, Wouter D'Haeseleer wrote:
> Ben,
>
> I'm running 4 days now without any disk errors anymore.  As stated in
> my previous message this is with the Red Hat patch applied.
>
> If I compare the patches I see that the patch you grabbed upstream
> does not deal with t->limits.max_sectors.

I've tried backporting your commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71
to Linux 2.6.26 in Debian stable but it doesn't seem to fix the problem
there.  My version is
http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=57;filename=0002-md-deal-with-merge_bvec_fn-in-component-devices-bett.patch;att=1;bug=604457
and RH's very different patch for RHEL 5 is
https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff&context=patch&collapsed=&headers=1&format=raw

Our bug log is at http://bugs.debian.org/604457.

Ben.

-- 
Ben Hutchings, Debian Developer and kernel team member
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
On Sat, 27 Nov 2010 19:53:54 +0000 Ben Hutchings b...@debian.org wrote:
> Neil, would you mind looking at this:
>
> On Sat, 2010-11-27 at 10:49 +0100, Wouter D'Haeseleer wrote:
> > Ben,
> >
> > I'm running 4 days now without any disk errors anymore.  As stated
> > in my previous message this is with the Red Hat patch applied.
> >
> > If I compare the patches I see that the patch you grabbed upstream
> > does not deal with t->limits.max_sectors.
>
> I've tried backporting your commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71
> to Linux 2.6.26 in Debian stable but it doesn't seem to fix the
> problem there.  My version is
> http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=57;filename=0002-md-deal-with-merge_bvec_fn-in-component-devices-bett.patch;att=1;bug=604457
> and RH's very different patch for RHEL 5 is
> https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff&context=patch&collapsed=&headers=1&format=raw
>
> Our bug log is at http://bugs.debian.org/604457.
>
> Ben.

Hi Ben,

You probably know most of this, but:

The problem is that with stacked devices, if the lower device has a
merge_bvec_fn, and the upper device never bothers to call it, then the
upper device must make sure that it never sends a bio with more than one
page in the bi_iovec.  This is a property of the block device interface.

The patch you back-ported fixes md so that when it is the upper device
it behaves correctly.  However in the original problem, the md/raid10 is
the lower device, and dm is the upper device.  So dm needs to be fixed.

Despite the fact that I learned about setting blk_queue_max_segments on
the dm mailing list (if I remember correctly), dm still doesn't include
this fix in mainline.

The fix I would recommend for 2.6.26 is to add

    if (q->merge_bvec_fn)
        rs->max_phys_segments = 1;

to dm_set_device_limits.  Though the redhat one is probably adequate.

If you really need an upstream fix, you will need to chase upstream to
apply one :-(

NeilBrown
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
On Sun, 2010-11-28 at 08:28 +1100, Neil Brown wrote:
[...]
> The problem is that with stacked devices, if the lower device has a
> merge_bvec_fn, and the upper device never bothers to call it, then the
> upper device must make sure that it never sends a bio with more than
> one page in the bi_iovec.  This is a property of the block device
> interface.
>
> The patch you back-ported fixes md so when it is the upper device it
> behaves correctly.  However in the original problem, the md/raid10 is
> the lower device, and dm is the upper device.  So dm needs to be fixed.

Thanks, I didn't spot that subtlety.

> Despite the fact that I learned about setting blk_queue_max_segments
> on the dm mailing list (if I remember correctly), dm still doesn't
> include this fix in mainline.
>
> The fix I would recommend for 2.6.26 is to add
>
>     if (q->merge_bvec_fn)
>         rs->max_phys_segments = 1;
>
> to dm_set_device_limits.  Though the redhat one is probably adequate.
>
> If you really need an upstream fix, you will need to chase upstream to
> apply one :-(

I won't do that myself - as you can see, I don't really understand the
issue fully.

Is that fix also valid (modulo renaming of max_phys_segments) for later
versions?

Ben.

-- 
Ben Hutchings, Debian Developer and kernel team member
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
Hi Ben,

I have now successfully compiled the kernel including the patch, which
this time applied without problem.  However the original bug is still
present with the patch you grabbed upstream.

For testing purposes I have also tried the patch supplied by Red Hat,
and I can confirm that this patch works without a problem.  So it looks
like the patch from Neil Brown does not work for this bug.

Thanks
Wouter
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
Hi Ben,

Thanks for the quick response.  I'm trying the patch you sent me but it
seems to be hugely different from what I get using the source.  This is
what I did:

    apt-get source linux-image-2.6.26-2-xen-686
    cd linux-2.6-2.6.26
    fakeroot debian/rules source
    fakeroot debian/rules setup
    cd debian/build/source_i386_xen

And I tried your patch at this level.  Attached you can find the
drivers/md/linear.c I have.

Thanks for your help.
Wouter

-----Original Message-----
From: Ben Hutchings b...@decadent.org.uk
To: Wouter D'Haeseleer wouter.dhaesel...@vasco.com
Cc: 604...@bugs.debian.org
Subject: Re: Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
Date: Tue, 23 Nov 2010 03:34:07 +0100

On Tue, 2010-11-23 at 02:31 +0000, Ben Hutchings wrote:
> I have attempted to adjust this for Debian's stable kernel version
> (2.6.26) and the result is attached.  Please could you test this,
> following the instructions at
> http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official.

Oops, that version was not quite completely adjusted.  Please test this
instead.

Ben.

/*
   linear.c : Multiple Devices driver for Linux
	      Copyright (C) 1994-96 Marc ZYNGIER
	      zyng...@ufr-info-p7.ibp.fr or
	      m...@gloups.fdn.fr

   Linear mode management functions.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2, or (at your option)
   any later version.

   You should have received a copy of the GNU General Public License
   (for example /usr/src/linux/COPYING); if not, write to the Free
   Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/

#include <linux/module.h>
#include <linux/raid/md.h>
#include <linux/slab.h>
#include <linux/raid/linear.h>

#define MAJOR_NR MD_MAJOR
#define MD_DRIVER
#define MD_PERSONALITY

/*
 * find which device holds a particular offset
 */
static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector)
{
	dev_info_t *hash;
	linear_conf_t *conf = mddev_to_conf(mddev);
	sector_t block = sector >> 1;

	/*
	 * sector_div(a,b) returns the remainer and sets a to a/b
	 */
	block >>= conf->preshift;
	(void)sector_div(block, conf->hash_spacing);
	hash = conf->hash_table[block];

	while ((sector>>1) >= (hash->size + hash->offset))
		hash++;
	return hash;
}

/**
 *	linear_mergeable_bvec -- tell bio layer if two requests can be merged
 *	@q: request queue
 *	@bio: the buffer head that's been built up so far
 *	@biovec: the request that could be merged to it.
 *
 *	Return amount of bytes we can take at this offset
 */
static int linear_mergeable_bvec(struct request_queue *q,
				 struct bio *bio,
				 struct bio_vec *biovec)
{
	mddev_t *mddev = q->queuedata;
	dev_info_t *dev0;
	unsigned long maxsectors, bio_sectors = bio->bi_size >> 9;
	sector_t sector = bio->bi_sector + get_start_sect(bio->bi_bdev);

	dev0 = which_dev(mddev, sector);
	maxsectors = (dev0->size << 1) - (sector - (dev0->offset<<1));

	if (maxsectors < bio_sectors)
		maxsectors = 0;
	else
		maxsectors -= bio_sectors;

	if (maxsectors <= (PAGE_SIZE >> 9) && bio_sectors == 0)
		return biovec->bv_len;
	/* The bytes available at this offset could be really big,
	 * so we cap at 2^31 to avoid overflow */
	if (maxsectors > (1 << (31-9)))
		return 1<<31;
	return maxsectors << 9;
}

static void linear_unplug(struct request_queue *q)
{
	mddev_t *mddev = q->queuedata;
	linear_conf_t *conf = mddev_to_conf(mddev);
	int i;

	for (i=0; i < mddev->raid_disks; i++) {
		struct request_queue *r_queue = bdev_get_queue(conf->disks[i].rdev->bdev);
		blk_unplug(r_queue);
	}
}

static int linear_congested(void *data, int bits)
{
	mddev_t *mddev = data;
	linear_conf_t *conf = mddev_to_conf(mddev);
	int i, ret = 0;

	for (i = 0; i < mddev->raid_disks && !ret ; i++) {
		struct request_queue *q = bdev_get_queue(conf->disks[i].rdev->bdev);
		ret |= bdi_congested(&q->backing_dev_info, bits);
	}
	return ret;
}

static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
{
	linear_conf_t *conf;
	dev_info_t **table;
	mdk_rdev_t *rdev;
	int i, nb_zone, cnt;
	sector_t min_spacing;
	sector_t curr_offset;
	struct list_head *tmp;

	conf = kzalloc (sizeof (*conf) + raid_disks*sizeof(dev_info_t),
			GFP_KERNEL);
	if (!conf)
		return NULL;

	cnt = 0;
	conf->array_size = 0;

	rdev_for_each(rdev, tmp, mddev) {
		int j = rdev->raid_disk;
		dev_info_t *disk = conf->disks + j;

		if (j < 0 || j >= raid_disks || disk->rdev) {
			printk("linear: disk numbering problem. Aborting!\n");
			goto out;
		}

		disk->rdev = rdev;

		blk_queue_stack_limits(mddev->queue,
				       rdev->bdev->bd_disk->queue);
		/* as we don't honour merge_bvec_fn, we must never risk
		 * violating it, so limit ->max_sector to one PAGE, as
		 * a one page request is never in violation.
		 */
		if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
		    mddev->queue->max_sectors > (PAGE_SIZE>>9))
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
Oops, please ignore my previous post; it seems I made a mistake with the
patch files.  I have re-compiled the kernel, and after running for
almost 3 hours now I don't see the error anymore.  Therefore I can say
this bug is resolved.

When will this patch make it into the normal updates?

Thanks
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
I spoke too soon - the issue is still present using the patch.
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
Just a small question to be sure I patched it correctly; this is what I did:

    cd /usr/src
    apt-get build-dep linux-image-2.6.26-2-xen-686
    apt-get source linux-image-2.6.26-2-xen-686
    cd linux-2.6-2.6.26
    fakeroot debian/rules source
    fakeroot debian/rules setup
    cd debian/build/source_i386_xen

    # Getting your patch and applying it
    wget 'http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=10;filename=0001-md-deal-with-merge_bvec_fn-in-component-devices-bett.patch;att=1;bug=604457' -O 0001-md-deal-with-merge_bvec_fn-in-component-devices-bett.patch
    patch -p1 < 0001-md-deal-with-merge_bvec_fn-in-component-devices-bett.patch

    fakeroot make -f debian/rules.gen binary-arch_i386_xen_686
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
On Tue, 2010-11-23 at 18:20 +0100, Wouter D'Haeseleer wrote:
> Just a small question to be sure I patched it correctly; this is what
> I did:
>
>     cd /usr/src
>     apt-get build-dep linux-image-2.6.26-2-xen-686
>     apt-get source linux-image-2.6.26-2-xen-686
>     cd linux-2.6-2.6.26
>     fakeroot debian/rules source
>     fakeroot debian/rules setup
>     cd debian/build/source_i386_xen
>
>     # Getting your patch and applying it
>     wget 'http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=10;filename=0001-md-deal-with-merge_bvec_fn-in-component-devices-bett.patch;att=1;bug=604457' -O 0001-md-deal-with-merge_bvec_fn-in-component-devices-bett.patch
>     patch -p1 < 0001-md-deal-with-merge_bvec_fn-in-component-devices-bett.patch
>
>     fakeroot make -f debian/rules.gen binary-arch_i386_xen_686

Sorry, I realise now that those instructions are not correct for the
kernel package in stable.  You need to apply the patch *before* running
'debian/rules setup'.  (For newer kernel packages the order doesn't
matter.)

Also, I wrote:

On Tue, 2010-11-23 at 02:31 +0000, Ben Hutchings wrote:
> > I have attempted to adjust this for Debian's stable kernel version
> > (2.6.26) and the result is attached.  Please could you test this,
> > following the instructions at
> > http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official.
>
> Oops, that version was not quite completely adjusted.  Please test
> this instead.

and I have no idea why I thought that, because the first version I sent
you was correct and the second was not.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
On Tue, 2010-11-23 at 17:50 +0000, Ben Hutchings wrote:
[...]
> > Oops, that version was not quite completely adjusted.  Please test
> > this instead.
>
> and I have no idea why I thought that, because the first version I
> sent you was correct and the second was not.

*sigh*  OK, neither of them was correct.  This version will really work,
I promise.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

From 34df5016f2e8681ae9e99e54a66c826463dd74a5 Mon Sep 17 00:00:00 2001
From: NeilBrown ne...@suse.de
Date: Mon, 8 Mar 2010 16:44:38 +1100
Subject: [PATCH] md: deal with merge_bvec_fn in component devices better.

commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71 upstream.

If a component device has a merge_bvec_fn then as we never call it
we must ensure we never need to.  Currently this is done by setting
max_sector to 1 PAGE, however this does not stop a bio being created
with several sub-page iovecs that would violate the merge_bvec_fn.

So instead set max_segments to 1 and set the segment boundary to the
same as a page boundary to ensure there is only ever one single-page
segment of IO requested at a time.

This can particularly be an issue when 'xen' is used as it is known
to submit multiple small buffers in a single bio.

Signed-off-by: NeilBrown ne...@suse.de
Cc: sta...@kernel.org
[bwh: Backport to Linux 2.6.26]
---
 drivers/md/linear.c    | 13 -
 drivers/md/multipath.c | 22 ++
 drivers/md/raid0.c     | 14 --
 drivers/md/raid1.c     | 30 +++---
 drivers/md/raid10.c    | 30 +++---
 5 files changed, 68 insertions(+), 41 deletions(-)

diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index ec921f5..627cd38 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -136,12 +136,15 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
 		blk_queue_stack_limits(mddev->queue,
 				       rdev->bdev->bd_disk->queue);
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit max_segments to 1 lying within
+		 * a single page.
 		 */
-		if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
-		    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-			blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);
+		if (rdev->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_phys_segments(mddev->queue, 1);
+			blk_queue_max_hw_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE - 1);
+		}
 
 		disk->size = rdev->size;
 		conf->array_size += rdev->size;
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index e968116..9605b21 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -293,14 +293,17 @@ static int multipath_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
 			blk_queue_stack_limits(mddev->queue, q);
 
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit ->max_segments to one, lying
+		 * within a single page.
 		 * (Note: it is very unlikely that a device with
 		 * merge_bvec_fn will be involved in multipath.)
 		 */
-			if (q->merge_bvec_fn &&
-			    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-				blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);
+			if (q->merge_bvec_fn) {
+				blk_queue_max_phys_segments(mddev->queue, 1);
+				blk_queue_max_hw_segments(mddev->queue, 1);
+				blk_queue_segment_boundary(mddev->queue,
+							   PAGE_CACHE_SIZE - 1);
+			}
 
 			conf->working_disks++;
 			mddev->degraded--;
@@ -453,9 +456,12 @@ static int multipath_run (mddev_t *mddev)
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, not that we ever expect a device with
 		 * a merge_bvec_fn to be involved in multipath */
-		if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
-		    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-			blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);
+		if (rdev->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_phys_segments(mddev->queue, 1);
+			blk_queue_max_hw_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE - 1);
+		}
 
 		if (!test_bit(Faulty, &rdev->flags))
 			conf->working_disks++;
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 914c04d..806e20d 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -141,14 +141,16 @@ static int create_strip_zones (mddev_t *mddev)
 		blk_queue_stack_limits(mddev->queue,
 				       rdev1->bdev->bd_disk->queue);
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit ->max_segments to 1, lying within
+		 * a single page.
 		 */
-		if (rdev1->bdev->bd_disk->queue->merge_bvec_fn &&
-		    mddev->queue->max_sectors
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
Package: linux-image-2.6.26-2-xen-686
Version: 2.6.26-25lenny1
Severity: critical
Justification: causes serious data loss

When accessing an LV configured on a RAID10 under Xen, data is
corrupted, as the following syslog entry indicates:

  kernel: raid10_make_request bug: can't convert block across chunks or bigger than 64k 309585274 4

Continued attempts to use the disk in the domU result in I/O errors and
the partition being remounted read-only.

See also Debian bug 461644
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=461644).  Since that
bug is old and was closed without a fix, I want to open a new bug for
it.

Redhat made a patch for the appropriate driver, but it's not included
upstream.  Can someone please make sure this patch gets into the sources
of the Debian fork.

See this patch:
https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff

See also this kernel trap related discussion:
http://kerneltrap.org/mailarchive/linux-raid/2010/3/8/6837883

This same thread contains another patch besides the Red Hat one, and
that one is also confirmed as working.

-- System Information:
Debian Release: 5.0.6
  APT prefers stable
  APT policy: (990, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.26-2-xen-686 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-image-2.6.26-2-xen-686 depends on:
ii  initramfs-tools           0.92o            tools for generating an initramfs
ii  linux-modules-2.6.26-2-x  2.6.26-25lenny1  Linux 2.6.26 modules on i686

Versions of packages linux-image-2.6.26-2-xen-686 recommends:
ii  libc6-xen                 2.10.2-2         GNU C Library: Shared libraries [X

Versions of packages linux-image-2.6.26-2-xen-686 suggests:
ii  grub                      0.97-47lenny2    GRand Unified Bootloader (Legacy v
pn  linux-doc-2.6.26          <none>           (no description available)

-- no debconf information
Archive: http://lists.debian.org/20101122104855.7917.71402.report...@xen-6080-01.infra.vasco.com
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
On Mon, 2010-11-22 at 11:48 +0100, Wouter D'Haeseleer wrote:
> Package: linux-image-2.6.26-2-xen-686
> Version: 2.6.26-25lenny1
> Severity: critical
> Justification: causes serious data loss
>
> When accessing an LV configured on a RAID10 under Xen, data is
> corrupted, as the following syslog entry indicates:
>
>   kernel: raid10_make_request bug: can't convert block across chunks or bigger than 64k 309585274 4
>
[...]
> Redhat made a patch for the appropriate driver, but it's not included
> upstream.  Can someone please make sure this patch gets into the
> sources of the Debian fork.
>
> See this patch:
> https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff

We much prefer to use bug fixes that have been accepted upstream.

> See also this kernel trap related discussion:
> http://kerneltrap.org/mailarchive/linux-raid/2010/3/8/6837883
>
> This same thread contains another patch besides the Red Hat one, and
> that one is also confirmed as working.

Well that was also not accepted upstream.  However, I eventually tracked
down the accepted version, which for future reference is:

commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71
Author: NeilBrown ne...@suse.de
Date:   Mon Mar 8 16:44:38 2010 +1100

    md: deal with merge_bvec_fn in component devices better.

I have attempted to adjust this for Debian's stable kernel version
(2.6.26) and the result is attached.  Please could you test this,
following the instructions at
http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
From ea1cddfe4cad61b43b6551ebc6bef466b25ff128 Mon Sep 17 00:00:00 2001
From: NeilBrown <ne...@suse.de>
Date: Mon, 8 Mar 2010 16:44:38 +1100
Subject: [PATCH] md: deal with merge_bvec_fn in component devices better.

commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71 upstream.

If a component device has a merge_bvec_fn then as we never call it
we must ensure we never need to.  Currently this is done by setting
max_sector to 1 PAGE, however this does not stop a bio being created
with several sub-page iovecs that would violate the merge_bvec_fn.

So instead set max_segments to 1 and set the segment boundary to the
same as a page boundary to ensure there is only ever one single-page
segment of IO requested at a time.

This can particularly be an issue when 'xen' is used as it is known
to submit multiple small buffers in a single bio.

Signed-off-by: NeilBrown <ne...@suse.de>
Cc: sta...@kernel.org
[bwh: Backport to Linux 2.6.26]
---
 drivers/md/linear.c    |   13 ++++++++-----
 drivers/md/multipath.c |   20 ++++++++++++--------
 drivers/md/raid0.c     |   14 ++++++++------
 drivers/md/raid1.c     |   30 +++++++++++++++++++-----------
 drivers/md/raid10.c    |   30 +++++++++++++++++++-----------
 5 files changed, 66 insertions(+), 41 deletions(-)

diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index ec921f5..627cd38 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -136,12 +136,15 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
 		blk_queue_stack_limits(mddev->queue,
 				       rdev->bdev->bd_disk->queue);
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit max_segments to 1 lying within
+		 * a single page.
 		 */
-		if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
-		    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-			blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);
+		if (rdev->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_phys_segments(mddev->queue, 1);
+			blk_queue_max_hw_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE - 1);
+		}
 
 		disk->size = rdev->size;
 		conf->array_size += rdev->size;
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index e968116..0e84b4f 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -293,14 +293,16 @@ static int multipath_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
 			blk_queue_stack_limits(mddev->queue, q);
 
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit ->max_segments to one, lying
+		 * within a single page.
 		 * (Note: it is very unlikely that a device with
 		 * merge_bvec_fn will be involved in multipath.)
 		 */
-		if (q->merge_bvec_fn &&
-		    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-			blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);
+		if (q->merge_bvec_fn) {
+			blk_queue_max_segments(mddev->queue, 1);
+
Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k
On Tue, 2010-11-23 at 02:31 +0000, Ben Hutchings wrote:
> I have attempted to adjust this for Debian's stable kernel version
> (2.6.26) and the result is attached.  Please could you test this,
> following the instructions at
> http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official.

Oops, that version was not quite completely adjusted.  Please test this
instead.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

From: NeilBrown <ne...@suse.de>
Date: Mon, 8 Mar 2010 16:44:38 +1100
Subject: [PATCH] md: deal with merge_bvec_fn in component devices better.

commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71 upstream.

If a component device has a merge_bvec_fn then as we never call it
we must ensure we never need to.  Currently this is done by setting
max_sector to 1 PAGE, however this does not stop a bio being created
with several sub-page iovecs that would violate the merge_bvec_fn.

So instead set max_segments to 1 and set the segment boundary to the
same as a page boundary to ensure there is only ever one single-page
segment of IO requested at a time.

This can particularly be an issue when 'xen' is used as it is known
to submit multiple small buffers in a single bio.

Signed-off-by: NeilBrown <ne...@suse.de>
Cc: sta...@kernel.org
---
 drivers/md/linear.c    |   12 +++++++-----
 drivers/md/multipath.c |   20 ++++++++++++--------
 drivers/md/raid0.c     |   13 +++++++------
 drivers/md/raid1.c     |   28 +++++++++++++++++-----------
 drivers/md/raid10.c    |   28 +++++++++++++++++-----------
 5 files changed, 60 insertions(+), 41 deletions(-)

diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index af2d39d..bb2a231 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -172,12 +172,14 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
 		disk_stack_limits(mddev->gendisk, rdev->bdev,
 				  rdev->data_offset << 9);
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit max_segments to 1 lying within
+		 * a single page.
 		 */
-		if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
-		    queue_max_sectors(mddev->queue) > (PAGE_SIZE>>9))
-			blk_queue_max_hw_sectors(mddev->queue, PAGE_SIZE>>9);
+		if (rdev->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE - 1);
+		}
 
 		conf->array_sectors += rdev->sectors;
 		cnt++;
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 4b323f4..5558ebc 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -301,14 +301,16 @@ static int multipath_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
 				  rdev->data_offset << 9);
 
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit ->max_segments to one, lying
+		 * within a single page.
 		 * (Note: it is very unlikely that a device with
 		 * merge_bvec_fn will be involved in multipath.)
 		 */
-		if (q->merge_bvec_fn &&
-		    queue_max_sectors(q) > (PAGE_SIZE>>9))
-			blk_queue_max_hw_sectors(mddev->queue, PAGE_SIZE>>9);
+		if (q->merge_bvec_fn) {
+			blk_queue_max_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE - 1);
+		}
 
 			conf->working_disks++;
 		mddev->degraded--;
@@ -476,9 +478,11 @@ static int multipath_run (mddev_t *mddev)
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, not that we ever expect a device with
 		 * a merge_bvec_fn to be involved in multipath */
-		if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
-		    queue_max_sectors(mddev->queue) > (PAGE_SIZE>>9))
-			blk_queue_max_hw_sectors(mddev->queue, PAGE_SIZE>>9);
+		if (rdev->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE - 1);
+		}
 
 		if (!test_bit(Faulty, &rdev->flags))
 			conf->working_disks++;
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index a1f7147..377cf2a 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -176,14 +176,15 @@ static int create_strip_zones(mddev_t *mddev)
 		disk_stack_limits(mddev->gendisk, rdev1->bdev,
 				  rdev1->data_offset << 9);
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit ->max_segments to 1, lying within
+		 * a single page.
 		 */
-		if (rdev1->bdev->bd_disk->queue->merge_bvec_fn &&
-		    queue_max_sectors(mddev->queue) > (PAGE_SIZE>>9))
-			blk_queue_max_hw_sectors(mddev->queue, PAGE_SIZE>>9);
-
+		if (rdev1->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE -