Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-28 Thread Neil Brown
On Sun, 28 Nov 2010 04:18:25 +0000 Ben Hutchings b...@debian.org wrote:

 On Sun, 2010-11-28 at 08:28 +1100, Neil Brown wrote:
   The fix I would recommend for 2.6.26 is to add
  
  if (q->merge_bvec_fn)
  	rs->max_phys_segments = 1;
  
  to dm_set_device_limits.  Though the redhat one is probably adequate.
  
  If you really need an upstream fix, you will need to chase upstream to apply
  one :-(
 
 I won't do that myself - as you can see, I don't really understand the
 issue fully.  Is that fix also valid (modulo renaming of
 max_phys_segments) for later versions?
 

Yes.
For current mainline it would look like replacing


	if (q->merge_bvec_fn && !ti->type->merge)
		limits->max_sectors =
			min_not_zero(limits->max_sectors,
				     (unsigned int) (PAGE_SIZE >> 9));

with

	if (q->merge_bvec_fn && !ti->type->merge)
		limits->max_segments = 1;

(the test on ->type->merge is important and applies to 2.6.26 as well).

NeilBrown





Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-28 Thread Ben Hutchings
On Mon, 2010-11-29 at 09:37 +1100, Neil Brown wrote:
 On Sun, 28 Nov 2010 04:18:25 +0000 Ben Hutchings b...@debian.org wrote:
 
  On Sun, 2010-11-28 at 08:28 +1100, Neil Brown wrote:
The fix I would recommend for 2.6.26 is to add
   
   if (q->merge_bvec_fn)
   	rs->max_phys_segments = 1;
   
   to dm_set_device_limits.  Though the redhat one is probably adequate.
   
   If you really need an upstream fix, you will need to chase upstream to
   apply one :-(
  
  I won't do that myself - as you can see, I don't really understand the
  issue fully.  Is that fix also valid (modulo renaming of
  max_phys_segments) for later versions?
  
 
 Yes.
 For current mainline it would look like replacing
 
 
   if (q->merge_bvec_fn && !ti->type->merge)
   	limits->max_sectors =
   		min_not_zero(limits->max_sectors,
   			     (unsigned int) (PAGE_SIZE >> 9));
 
 with
 
   if (q->merge_bvec_fn && !ti->type->merge)
   	limits->max_segments = 1;
 
 (the test on ->type->merge is important and applies to 2.6.26 as well).

Why is it not necessary to set seg_boundary_mask to PAGE_CACHE_SIZE - 1,
as for md devices?

Ben.

-- 
Ben Hutchings, Debian Developer and kernel team member





Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-28 Thread Neil Brown
On Mon, 29 Nov 2010 00:08:47 +0000 Ben Hutchings b...@debian.org wrote:

  
  if (q->merge_bvec_fn && !ti->type->merge)
  	limits->max_segments = 1;
  
  (the test on ->type->merge is important and applies to 2.6.26 as well).
 
 Why is it not necessary to set seg_boundary_mask to PAGE_CACHE_SIZE - 1,
 as for md devices?
 

Sorry.  It is necessary of course.  I guess I was being a bit hasty and
forgetting all the details.

 if (q->merge_bvec_fn && !ti->type->merge) {
 	limits->max_segments = 1;   /* Make sure only one segment in each bio */
 	limits->seg_boundary_mask = PAGE_CACHE_SIZE-1; /* make sure that
 				segment is in just one page */
 }
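To see why the boundary mask is needed on top of max_segments, consider a
single 4 KiB segment that starts in the middle of a page. A small user-space
illustration of the arithmetic (my own sketch, not kernel code; 4 KiB pages
assumed):

#include <stdio.h>

#define PAGE_CACHE_SIZE 4096u

/* One segment, but starting mid-page: without a segment boundary of
 * PAGE_CACHE_SIZE - 1 it would still straddle two pages, so
 * max_segments = 1 alone does not give the single-page guarantee. */
int main(void)
{
	unsigned int start = 3584;	/* 512 bytes before a page boundary */
	unsigned int len = 4096;	/* one 4 KiB segment */
	unsigned int pages = (start + len - 1) / PAGE_CACHE_SIZE
			     - start / PAGE_CACHE_SIZE + 1;

	printf("pages touched: %u\n", pages);	/* prints 2 */
	return 0;
}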

NeilBrown





Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-28 Thread Ben Hutchings
On Mon, 2010-11-29 at 11:48 +1100, Neil Brown wrote:
 On Mon, 29 Nov 2010 00:08:47 +0000 Ben Hutchings b...@debian.org wrote:
 
   
 	if (q->merge_bvec_fn && !ti->type->merge)
 		limits->max_segments = 1;
   
   (the test on ->type->merge is important and applies to 2.6.26 as well).
  
  Why is it not necessary to set seg_boundary_mask to PAGE_CACHE_SIZE - 1,
  as for md devices?
  
 
 Sorry.  It is necessary of course.  I guess I was being a bit hasty and
 forgetting all the details.
 
  if (q->merge_bvec_fn && !ti->type->merge) {
  	limits->max_segments = 1;   /* Make sure only one segment in each bio */
  	limits->seg_boundary_mask = PAGE_CACHE_SIZE-1; /* make sure that
  				segment is in just one page */
  }

Thanks again.  I'll apply this change in Debian and try to get it
upstream if we don't see any regressions.

Ben.

-- 
Ben Hutchings, Debian Developer and kernel team member





Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-28 Thread Ben Hutchings
On Wed, 2010-11-24 at 17:01 +0100, Wouter D'Haeseleer wrote:
 Hi Ben,
 
 I have now successfully compiled the kernel including the patch, which
 this time applied without problems.
 However, the original bug is still present with the patch you grabbed
 upstream.
 
 For testing purposes I have also tried the patch supplied by Red Hat,
 and I can confirm that it works without a problem.
 
 So it looks like the patch from Neil Brown does not work for this bug.

The result of my conversation with Neil Brown is that his fix covers
only md devices at the top of a stack, whereas the Red Hat patch covers
only dm devices at the top of a stack.  We should really be fixing both
in the same way.

Please can you test the attached patch, which covers both dm and md.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
commit a451e30d044e5b29f31d965a32d17bf7e97f99b0
Author: Ben Hutchings b...@decadent.org.uk
Date:   Sun Nov 28 23:46:46 2010 +0000

dm: Deal with merge_bvec_fn in component devices better

This is analogous to commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71,
which does the same for md devices at the top of the stack.  The
following explanation is taken from that commit.  Thanks to Neil Brown
ne...@suse.de for the advice.

If a component device has a merge_bvec_fn then as we never call it
we must ensure we never need to.  Currently this is done by setting
max_sector to 1 PAGE, however this does not stop a bio being created
with several sub-page iovecs that would violate the merge_bvec_fn.

So instead set max_segments to 1 and set the segment boundary to the
same as a page boundary to ensure there is only ever one single-page
segment of IO requested at a time.

This can particularly be an issue when 'xen' is used as it is
known to submit multiple small buffers in a single bio.

Signed-off-by: Ben Hutchings b...@decadent.org.uk

commit 71ff0067805fb917142a745246f7996f3ad86d5b
Author: NeilBrown ne...@suse.de
Date:   Mon Mar 8 16:44:38 2010 +1100

md: deal with merge_bvec_fn in component devices better.

commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71 upstream.

If a component device has a merge_bvec_fn then as we never call it
we must ensure we never need to.  Currently this is done by setting
max_sector to 1 PAGE, however this does not stop a bio being created
with several sub-page iovecs that would violate the merge_bvec_fn.

So instead set max_segments to 1 and set the segment boundary to the
same as a page boundary to ensure there is only ever one single-page
segment of IO requested at a time.

This can particularly be an issue when 'xen' is used as it is
known to submit multiple small buffers in a single bio.

Signed-off-by: NeilBrown ne...@suse.de
Cc: sta...@kernel.org
[bwh: Backport to Linux 2.6.26]
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 94116ea..186445d0 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -506,17 +506,15 @@ void dm_set_device_limits(struct dm_target *ti, struct block_device *bdev)
 	rs->max_sectors =
 		min_not_zero(rs->max_sectors, q->max_sectors);
 
-	/* FIXME: Device-Mapper on top of RAID-0 breaks because DM
-	 *        currently doesn't honor MD's merge_bvec_fn routine.
-	 *        In this case, we'll force DM to use PAGE_SIZE or
-	 *        smaller I/O, just to be safe. A better fix is in the
-	 *        works, but add this for the time being so it will at
-	 *        least operate correctly.
+	/*
+	 * Since we don't call merge_bvec_fn, we must never risk
+	 * violating it, so limit max_phys_segments to 1 lying within
+	 * a single page.
 	 */
-	if (q->merge_bvec_fn)
-		rs->max_sectors =
-			min_not_zero(rs->max_sectors,
-				     (unsigned int) (PAGE_SIZE >> 9));
+	if (q->merge_bvec_fn) {
+		rs->max_phys_segments = 1;
+		rs->seg_boundary_mask = PAGE_CACHE_SIZE - 1;
+	}
 
 	rs->max_phys_segments =
 		min_not_zero(rs->max_phys_segments,
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index ec921f5..fe8508a 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -136,12 +136,14 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
 		blk_queue_stack_limits(mddev->queue,
 				       rdev->bdev->bd_disk->queue);
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit max_segments to 1 lying within
+		 * a single page.
 		 */
-		if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
-		    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-			blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);
+		if (rdev->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_phys_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE - 1);
+		}
 
 		disk->size = rdev->size;
 		conf->array_size += rdev->size;

Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-27 Thread Wouter D'Haeseleer
Ben,

I've been running for 4 days now without any disk errors.
As stated in my previous message, this is with the Red Hat patch applied.

If I compare the patches, I see that the patch you grabbed upstream does not
deal with t->limits.max_sectors.

Thanks for a reply.

Wouter





Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-27 Thread Ben Hutchings
Neil, would you mind looking at this:

On Sat, 2010-11-27 at 10:49 +0100, Wouter D'Haeseleer wrote:
 Ben,
 
 I've been running for 4 days now without any disk errors.
 As stated in my previous message, this is with the Red Hat patch applied.
 
 If I compare the patches, I see that the patch you grabbed upstream does not
 deal with t->limits.max_sectors.

I've tried backporting your commit
627a2d3c29427637f4c5d31ccc7fcbd8d312cd71 to Linux 2.6.26 in Debian
stable but it doesn't seem to fix the problem there.  My version is
http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=57;filename=0002-md-deal-with-merge_bvec_fn-in-component-devices-bett.patch;att=1;bug=604457
and RH's very different patch for RHEL 5 is
https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff&context=patch&collapsed=&headers=1&format=raw

Our bug log is at http://bugs.debian.org/604457.

Ben.

-- 
Ben Hutchings, Debian Developer and kernel team member





Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-27 Thread Neil Brown
On Sat, 27 Nov 2010 19:53:54 +0000 Ben Hutchings b...@debian.org wrote:

 Neil, would you mind looking at this:
 
 On Sat, 2010-11-27 at 10:49 +0100, Wouter D'Haeseleer wrote:
  Ben,
  
  I've been running for 4 days now without any disk errors.
  As stated in my previous message, this is with the Red Hat patch applied.
  
  If I compare the patches, I see that the patch you grabbed upstream does
  not deal with t->limits.max_sectors.
 
 I've tried backporting your commit
 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71 to Linux 2.6.26 in Debian
 stable but it doesn't seem to fix the problem there.  My version is
 http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=57;filename=0002-md-deal-with-merge_bvec_fn-in-component-devices-bett.patch;att=1;bug=604457
 and RH's very different patch for RHEL 5 is
 https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff&context=patch&collapsed=&headers=1&format=raw
 
 Our bug log is at http://bugs.debian.org/604457.
 
 Ben.
 

Hi Ben,

 You probably know most of this, but:

The problem is that with stacked devices, if the lower device has a
merge_bvec_fn, and the upper device never bothers to call it, then the
upper device must make sure that it never sends a bio with more than one
page in the bi_iovec.  This is a property of the block device interface.
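That contract can be stated as a simple predicate; here is a user-space
sketch of it (an illustration only, not kernel code; 4 KiB pages assumed):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u

/* If the upper device never consults the lower queue's merge_bvec_fn,
 * each bio it submits must carry one run of data that fits entirely
 * inside a single page. */
static bool safe_without_merge_bvec_fn(uint64_t offset, uint32_t len)
{
	if (len == 0 || len > PAGE_SIZE)
		return false;
	/* the run must not cross a page boundary */
	return (offset & ~(uint64_t)(PAGE_SIZE - 1)) ==
	       ((offset + len - 1) & ~(uint64_t)(PAGE_SIZE - 1));
}

int main(void)
{
	printf("%d %d\n",
	       safe_without_merge_bvec_fn(0, 4096),	/* 1: one full page */
	       safe_without_merge_bvec_fn(3584, 1024));	/* 0: spans two pages */
	return 0;
}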

The patch you back-ported fixes md so when it is the upper device it
behaves correctly.

However in the original problem, the md/raid10 is the lower device, and
dm is the upper device.  So dm needs to be fixed.

Despite the fact that I learned about setting blk_queue_max_segments on
the dm mailing list (if I remember correctly), dm still doesn't include
this fix in mainline.

 The fix I would recommend for 2.6.26 is to add

   if (q->merge_bvec_fn)
   	rs->max_phys_segments = 1;

to dm_set_device_limits.  Though the redhat one is probably adequate.

If you really need an upstream fix, you will need to chase upstream to apply
one :-(

NeilBrown





Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-27 Thread Ben Hutchings
On Sun, 2010-11-28 at 08:28 +1100, Neil Brown wrote:
 On Sat, 27 Nov 2010 19:53:54 +0000 Ben Hutchings b...@debian.org wrote:
 
  Neil, would you mind looking at this:
  
  On Sat, 2010-11-27 at 10:49 +0100, Wouter D'Haeseleer wrote:
   Ben,
   
   I've been running for 4 days now without any disk errors.
   As stated in my previous message, this is with the Red Hat patch applied.
   
   If I compare the patches, I see that the patch you grabbed upstream does
   not deal with t->limits.max_sectors.
  
  I've tried backporting your commit
  627a2d3c29427637f4c5d31ccc7fcbd8d312cd71 to Linux 2.6.26 in Debian
  stable but it doesn't seem to fix the problem there.  My version is
  http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=57;filename=0002-md-deal-with-merge_bvec_fn-in-component-devices-bett.patch;att=1;bug=604457
  and RH's very different patch for RHEL 5 is
  https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff&context=patch&collapsed=&headers=1&format=raw
  
  Our bug log is at http://bugs.debian.org/604457.
  
  Ben.
  
 
 Hi Ben,
 
  You probably know most of this, but:
 
 The problem is that with stacked devices, if the lower device has a
 merge_bvec_fn, and the upper device never bothers to call it, then the
 upper device must make sure that it never sends a bio with more than one
 page in the bi_iovec.  This is a property of the block device interface.
 
 The patch you back-ported fixes md so when it is the upper device it
 behaves correctly.
 
 However in the original problem, the md/raid10 is the lower device, and
 dm is the upper device.  So dm needs to be fixed.

Thanks, I didn't spot that subtlety.

 Despite the fact that I learned about setting blk_queue_max_segments on
 the dm mailing list (if I remember correctly), dm still doesn't include
 this fix in mainline.
 
  The fix I would recommend for 2.6.26 is to add
 
 	if (q->merge_bvec_fn)
 		rs->max_phys_segments = 1;
 
 to dm_set_device_limits.  Though the redhat one is probably adequate.
 
 If you really need an upstream fix, you will need to chase upstream to apply
 one :-(

I won't do that myself - as you can see, I don't really understand the
issue fully.  Is that fix also valid (modulo renaming of
max_phys_segments) for later versions?

Ben.

-- 
Ben Hutchings, Debian Developer and kernel team member





Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-24 Thread Wouter D'Haeseleer
Hi Ben,

I have now successfully compiled the kernel including the patch, which this
time applied without problems.
However, the original bug is still present with the patch you grabbed upstream.

For testing purposes I have also tried the patch supplied by Red Hat, and I
can confirm that it works without a problem.

So it looks like the patch from Neil Brown does not work for this bug.

Thanks

Wouter
 







Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-23 Thread Wouter D'Haeseleer
Hi Ben,

Thanks for the quick response.
I'm trying the patch you sent me, but it seems to differ a lot from the
source I have.
This is what I did:

apt-get source linux-image-2.6.26-2-xen-686
cd linux-2.6-2.6.26
fakeroot debian/rules source
fakeroot debian/rules setup
cd debian/build/source_i386_xen

And I tried your patch at this level.
Attached you can find the drivers/md/linear.c I have.

Thanks for your help.

Wouter


-Original Message-
From: Ben Hutchings b...@decadent.org.uk
To: Wouter D'Haeseleer wouter.dhaesel...@vasco.com
Cc: 604...@bugs.debian.org 604...@bugs.debian.org
Subject: Re: Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting
LV to xen results in error can't convert block across chunks or bigger
than 64k
Date: Tue, 23 Nov 2010 03:34:07 +0100


On Tue, 2010-11-23 at 02:31 +0000, Ben Hutchings wrote:

 I have attempted to adjust this for Debian's stable kernel version
 (2.6.26) and the result is attached.  Please could you test this,
 following the instructions at
 http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official.

Oops, that version was not quite completely adjusted.  Please test this
instead.

Ben.



/*
   linear.c : Multiple Devices driver for Linux
	  Copyright (C) 1994-96 Marc ZYNGIER
	  zyng...@ufr-info-p7.ibp.fr or
	  m...@gloups.fdn.fr

   Linear mode management functions.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2, or (at your option)
   any later version.
   
   You should have received a copy of the GNU General Public License
   (for example /usr/src/linux/COPYING); if not, write to the Free
   Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.  
*/

#include <linux/module.h>

#include <linux/raid/md.h>
#include <linux/slab.h>
#include <linux/raid/linear.h>

#define MAJOR_NR MD_MAJOR
#define MD_DRIVER
#define MD_PERSONALITY

/*
 * find which device holds a particular offset 
 */
static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector)
{
	dev_info_t *hash;
	linear_conf_t *conf = mddev_to_conf(mddev);
	sector_t block = sector >> 1;

	/*
	 * sector_div(a,b) returns the remainder and sets a to a/b
	 */
	block >>= conf->preshift;
	(void)sector_div(block, conf->hash_spacing);
	hash = conf->hash_table[block];

	while ((sector>>1) >= (hash->size + hash->offset))
		hash++;
	return hash;
}

/**
 *	linear_mergeable_bvec -- tell bio layer if two requests can be merged
 *	@q: request queue
 *	@bio: the buffer head that's been built up so far
 *	@biovec: the request that could be merged to it.
 *
 *	Return amount of bytes we can take at this offset
 */
static int linear_mergeable_bvec(struct request_queue *q, struct bio *bio, struct bio_vec *biovec)
{
	mddev_t *mddev = q->queuedata;
	dev_info_t *dev0;
	unsigned long maxsectors, bio_sectors = bio->bi_size >> 9;
	sector_t sector = bio->bi_sector + get_start_sect(bio->bi_bdev);

	dev0 = which_dev(mddev, sector);
	maxsectors = (dev0->size << 1) - (sector - (dev0->offset<<1));

	if (maxsectors < bio_sectors)
		maxsectors = 0;
	else
		maxsectors -= bio_sectors;

	if (maxsectors <= (PAGE_SIZE >> 9 ) && bio_sectors == 0)
		return biovec->bv_len;
	/* The bytes available at this offset could be really big,
	 * so we cap at 2^31 to avoid overflow */
	if (maxsectors > (1 << (31-9)))
		return 1<<31;
	return maxsectors << 9;
}

static void linear_unplug(struct request_queue *q)
{
	mddev_t *mddev = q->queuedata;
	linear_conf_t *conf = mddev_to_conf(mddev);
	int i;

	for (i=0; i < mddev->raid_disks; i++) {
		struct request_queue *r_queue = bdev_get_queue(conf->disks[i].rdev->bdev);
		blk_unplug(r_queue);
	}
}

static int linear_congested(void *data, int bits)
{
	mddev_t *mddev = data;
	linear_conf_t *conf = mddev_to_conf(mddev);
	int i, ret = 0;

	for (i = 0; i < mddev->raid_disks && !ret ; i++) {
		struct request_queue *q = bdev_get_queue(conf->disks[i].rdev->bdev);
		ret |= bdi_congested(&q->backing_dev_info, bits);
	}
	return ret;
}

static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
{
	linear_conf_t *conf;
	dev_info_t **table;
	mdk_rdev_t *rdev;
	int i, nb_zone, cnt;
	sector_t min_spacing;
	sector_t curr_offset;
	struct list_head *tmp;

	conf = kzalloc (sizeof (*conf) + raid_disks*sizeof(dev_info_t),
			GFP_KERNEL);
	if (!conf)
		return NULL;

	cnt = 0;
	conf->array_size = 0;

	rdev_for_each(rdev, tmp, mddev) {
		int j = rdev->raid_disk;
		dev_info_t *disk = conf->disks + j;

		if (j < 0 || j >= raid_disks || disk->rdev) {
			printk("linear: disk numbering problem. Aborting!\n");
			goto out;
		}

		disk->rdev = rdev;

		blk_queue_stack_limits(mddev->queue,
				       rdev->bdev->bd_disk->queue);
		/* as we don't honour merge_bvec_fn, we must never risk
		 * violating it, so limit ->max_sector to one PAGE, as
		 * a one page request is never in violation.
		 */
		if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
		    mddev->queue->max_sectors > (PAGE_SIZE>>9))
			

Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-23 Thread Wouter D'Haeseleer
Oops, please ignore my previous post; it seems I made a mistake with the
patch files.

I have re-compiled the kernel.
After running for almost 3 hours now, I don't see the error anymore.

Therefore I can say this bug is resolved.
When will this patch make it into the normal updates?

Thanks






Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-23 Thread Wouter D'Haeseleer
I spoke too soon; the issue is still present with the patch.





Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-23 Thread Wouter D'Haeseleer
Just a small question to be sure I patched it correctly; this is what I
did:

cd /usr/src

apt-get build-dep linux-image-2.6.26-2-xen-686
apt-get source linux-image-2.6.26-2-xen-686
cd linux-2.6-2.6.26

fakeroot debian/rules source
fakeroot debian/rules setup

cd debian/build/source_i386_xen

# Getting your patch and applying it 
wget 
'http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=10;filename=0001-md-deal-with-merge_bvec_fn-in-component-devices-bett.patch;att=1;bug=604457'
 -O 0001-md-deal-with-merge_bvec_fn-in-component-devices-bett.patch
patch -p1 < 0001-md-deal-with-merge_bvec_fn-in-component-devices-bett.patch

fakeroot make -f debian/rules.gen binary-arch_i386_xen_686


Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-23 Thread Ben Hutchings
On Tue, 2010-11-23 at 18:20 +0100, Wouter D'Haeseleer wrote:
 Just a small question to be sure I patched it correctly; this is what I
 did:
 
 cd /usr/src 
 apt-get build-dep linux-image-2.6.26-2-xen-686
 apt-get source linux-image-2.6.26-2-xen-686
 cd linux-2.6-2.6.26
 
 fakeroot debian/rules source
 fakeroot debian/rules setup
 
 cd debian/build/source_i386_xen
 
 # Getting your patch and applying it 
 wget 
 'http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=10;filename=0001-md-deal-with-merge_bvec_fn-in-component-devices-bett.patch;att=1;bug=604457'
  -O 0001-md-deal-with-merge_bvec_fn-in-component-devices-bett.patch
 patch -p1 < 0001-md-deal-with-merge_bvec_fn-in-component-devices-bett.patch
 
 fakeroot make -f debian/rules.gen binary-arch_i386_xen_686

Sorry, I realise now that those instructions are not correct for the
kernel package in stable.  You need to apply the patch *before* running
'debian/rules setup'.  (For newer kernel packages the order doesn't
matter.)

Also, I wrote:
 On Tue, 2010-11-23 at 02:31 +0000, Ben Hutchings wrote:
 
  I have attempted to adjust this for Debian's stable kernel version
  (2.6.26) and the result is attached.  Please could you test this,
  following the instructions at
  http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official.
 
 Oops, that version was not quite completely adjusted.  Please test this
 instead.

and I have no idea why I thought that, because the first version I sent
you was correct and the second was not.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.




Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-23 Thread Ben Hutchings
On Tue, 2010-11-23 at 17:50 +, Ben Hutchings wrote:
[...]
  Oops, that version was not quite completely adjusted.  Please test this
  instead.
 
 and I have no idea why I thought that, because the first version I sent
 you was correct and the second was not.

*sigh* OK, neither of them was correct.  This version will really work,
I promise.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
From 34df5016f2e8681ae9e99e54a66c826463dd74a5 Mon Sep 17 00:00:00 2001
From: NeilBrown ne...@suse.de
Date: Mon, 8 Mar 2010 16:44:38 +1100
Subject: [PATCH] md: deal with merge_bvec_fn in component devices better.

commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71 upstream.

If a component device has a merge_bvec_fn then as we never call it
we must ensure we never need to.  Currently this is done by setting
max_sector to 1 PAGE, however this does not stop a bio being created
with several sub-page iovecs that would violate the merge_bvec_fn.

So instead set max_segments to 1 and set the segment boundary to the
same as a page boundary to ensure there is only ever one single-page
segment of IO requested at a time.

This can particularly be an issue when 'xen' is used as it is
known to submit multiple small buffers in a single bio.

Signed-off-by: NeilBrown ne...@suse.de
Cc: sta...@kernel.org
[bwh: Backport to Linux 2.6.26]
---
 drivers/md/linear.c    |   13 ++++++++-----
 drivers/md/multipath.c |   22 ++++++++++++++--------
 drivers/md/raid0.c     |   14 ++++++++------
 drivers/md/raid1.c     |   30 +++++++++++++++++-----------
 drivers/md/raid10.c    |   30 +++++++++++++++++-----------
 5 files changed, 68 insertions(+), 41 deletions(-)

diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index ec921f5..627cd38 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -136,12 +136,15 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
 		blk_queue_stack_limits(mddev->queue,
 				       rdev->bdev->bd_disk->queue);
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit max_segments to 1 lying within
+		 * a single page.
 		 */
-		if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
-		    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-			blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);
+		if (rdev->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_phys_segments(mddev->queue, 1);
+			blk_queue_max_hw_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE - 1);
+		}
 
 		disk->size = rdev->size;
 		conf->array_size += rdev->size;
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index e968116..9605b21 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -293,14 +293,17 @@ static int multipath_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
 			blk_queue_stack_limits(mddev->queue, q);
 
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit ->max_segments to one, lying
+		 * within a single page.
 		 * (Note: it is very unlikely that a device with
 		 * merge_bvec_fn will be involved in multipath.)
 		 */
-			if (q->merge_bvec_fn &&
-			    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-				blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);
+			if (q->merge_bvec_fn) {
+				blk_queue_max_phys_segments(mddev->queue, 1);
+				blk_queue_max_hw_segments(mddev->queue, 1);
+				blk_queue_segment_boundary(mddev->queue,
+							   PAGE_CACHE_SIZE - 1);
+			}
 
 			conf->working_disks++;
 			mddev->degraded--;
@@ -453,9 +456,12 @@ static int multipath_run (mddev_t *mddev)
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, not that we ever expect a device with
 		 * a merge_bvec_fn to be involved in multipath */
-		if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
-		    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-			blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);
+		if (rdev->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_phys_segments(mddev->queue, 1);
+			blk_queue_max_hw_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE - 1);
+		}
 
 		if (!test_bit(Faulty, &rdev->flags))
 			conf->working_disks++;
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 914c04d..806e20d 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -141,14 +141,16 @@ static int create_strip_zones (mddev_t *mddev)
 		blk_queue_stack_limits(mddev->queue,
 				       rdev1->bdev->bd_disk->queue);
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit ->max_segments to 1, lying within
+		 * a single page.
 		 */
 
-		if (rdev1->bdev->bd_disk->queue->merge_bvec_fn &&
-		    mddev->queue->max_sectors >

Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-22 Thread Wouter D'Haeseleer
Package: linux-image-2.6.26-2-xen-686
Version: 2.6.26-25lenny1
Severity: critical
Justification: causes serious data loss

When accessing an LV configured on a RAID10 from Xen, data is corrupted, as
the following syslog line indicates:
kernel: raid10_make_request bug: can't convert block across chunks or bigger
than 64k 309585274 4

Continued attempts to use the disk in the domU result in I/O errors and
the partition being remounted read-only.
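For reference, the numbers in that log line can be checked with simple
offset arithmetic. A user-space sketch (my own illustration, not the
kernel's code; it assumes the trailing "4" is the request size in KiB,
i.e. 8 sectors, and that 64 KiB chunks are 128 sectors):

#include <stdio.h>
#include <stdint.h>

/* raid10 refuses a request that crosses a chunk boundary: a request of
 * n sectors starting at sector s crosses iff (s % chunk) + n > chunk. */
static int crosses_chunk(uint64_t sector, unsigned int sectors,
			 unsigned int chunk_sects)
{
	return (sector & (chunk_sects - 1)) + sectors > chunk_sects;
}

int main(void)
{
	/* values from the report: sector 309585274, 8 sectors, 128-sector chunks */
	printf("crosses: %d\n", crosses_chunk(309585274ULL, 8, 128));	/* 1 */
	return 0;
}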

See also Debian bug 461644
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=461644).
Since that bug is old and was closed without a fix, I want to open a new bug
for it.

Red Hat made a patch for the appropriate driver, but it's not included
upstream. Can someone please make sure this patch gets into the sources of
the Debian kernel?
See this patch: https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff

See also this related KernelTrap discussion:
http://kerneltrap.org/mailarchive/linux-raid/2010/3/8/6837883
The same thread contains another patch besides the Red Hat one, and that one
is also confirmed as working.

-- System Information:
Debian Release: 5.0.6
  APT prefers stable
  APT policy: (990, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.26-2-xen-686 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-image-2.6.26-2-xen-686 depends on:
ii  initramfs-tools  0.92o   tools for generating an initramfs
ii  linux-modules-2.6.26-2-x 2.6.26-25lenny1 Linux 2.6.26 modules on i686

Versions of packages linux-image-2.6.26-2-xen-686 recommends:
ii  libc6-xen 2.10.2-2   GNU C Library: Shared libraries [X

Versions of packages linux-image-2.6.26-2-xen-686 suggests:
ii  grub   0.97-47lenny2 GRand Unified Bootloader (Legacy v
pn  linux-doc-2.6.26   none(no description available)

-- no debconf information






Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-22 Thread Ben Hutchings
On Mon, 2010-11-22 at 11:48 +0100, Wouter D'Haeseleer wrote:
 Package: linux-image-2.6.26-2-xen-686
 Version: 2.6.26-25lenny1
 Severity: critical
 Justification: causes serious data loss
 
 When accessing an LV configured on a RAID10 from Xen, data is corrupted, as
 the following syslog line indicates:
 kernel: raid10_make_request bug: can't convert block across chunks or bigger
 than 64k 309585274 4
 
 Continued attempts to use the disk in the domU result in I/O errors and
 the partition being remounted read-only. 
 
 See also Debian bug 461644
 (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=461644).
 Since that bug is old and was closed without a fix, I want to open a new
 bug for it.
 
 Red Hat made a patch for the appropriate driver, but it's not included
 upstream. Can someone please make sure this patch gets into the sources of
 the Debian kernel?
 See this patch:
 https://bugzilla.redhat.com/attachment.cgi?id=342638&action=diff

We much prefer to use bug fixes that have been accepted upstream.

 See also this related KernelTrap discussion:
 http://kerneltrap.org/mailarchive/linux-raid/2010/3/8/6837883
 The same thread contains another patch besides the Red Hat one, and that
 one is also confirmed as working.

Well that was also not accepted upstream.  However, I eventually tracked
down the accepted version, which for future reference is:

commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71
Author: NeilBrown ne...@suse.de
Date:   Mon Mar 8 16:44:38 2010 +1100

md: deal with merge_bvec_fn in component devices better.

I have attempted to adjust this for Debian's stable kernel version
(2.6.26) and the result is attached.  Please could you test this,
following the instructions at
http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
From ea1cddfe4cad61b43b6551ebc6bef466b25ff128 Mon Sep 17 00:00:00 2001
From: NeilBrown ne...@suse.de
Date: Mon, 8 Mar 2010 16:44:38 +1100
Subject: [PATCH] md: deal with merge_bvec_fn in component devices better.

commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71 upstream.

If a component device has a merge_bvec_fn then as we never call it
we must ensure we never need to.  Currently this is done by setting
max_sector to 1 PAGE, however this does not stop a bio being created
with several sub-page iovecs that would violate the merge_bvec_fn.

So instead set max_segments to 1 and set the segment boundary to the
same as a page boundary to ensure there is only ever one single-page
segment of IO requested at a time.

This can particularly be an issue when 'xen' is used as it is
known to submit multiple small buffers in a single bio.

Signed-off-by: NeilBrown ne...@suse.de
Cc: sta...@kernel.org
[bwh: Backport to Linux 2.6.26]
---
 drivers/md/linear.c    |   13 ++++++++-----
 drivers/md/multipath.c |   20 ++++++++++++--------
 drivers/md/raid0.c     |   14 ++++++++------
 drivers/md/raid1.c     |   30 +++++++++++++++++-----------
 drivers/md/raid10.c    |   30 +++++++++++++++++-----------
 5 files changed, 66 insertions(+), 41 deletions(-)

diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index ec921f5..627cd38 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -136,12 +136,15 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
 		blk_queue_stack_limits(mddev->queue,
 				       rdev->bdev->bd_disk->queue);
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit max_segments to 1 lying within
+		 * a single page.
 		 */
-		if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
-		    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-			blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);
+		if (rdev->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_phys_segments(mddev->queue, 1);
+			blk_queue_max_hw_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE - 1);
+		}
 
 		disk->size = rdev->size;
 		conf->array_size += rdev->size;
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index e968116..0e84b4f 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -293,14 +293,16 @@ static int multipath_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
 			blk_queue_stack_limits(mddev->queue, q);
 
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit ->max_segments to one, lying
+		 * within a single page.
 		 * (Note: it is very unlikely that a device with
 		 * merge_bvec_fn will be involved in multipath.)
 		 */
-			if (q->merge_bvec_fn &&
-			    mddev->queue->max_sectors > (PAGE_SIZE>>9))
-				blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);
+			if (q->merge_bvec_fn) {
+				blk_queue_max_segments(mddev->queue, 1);
+

Bug#604457: linux-image-2.6.26-2-xen-686: Raid10 exporting LV to xen results in error can't convert block across chunks or bigger than 64k

2010-11-22 Thread Ben Hutchings
On Tue, 2010-11-23 at 02:31 +0000, Ben Hutchings wrote:

 I have attempted to adjust this for Debian's stable kernel version
 (2.6.26) and the result is attached.  Please could you test this,
 following the instructions at
 http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official.

Oops, that version was not quite completely adjusted.  Please test this
instead.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
From: NeilBrown ne...@suse.de
Date: Mon, 8 Mar 2010 16:44:38 +1100
Subject: [PATCH] md: deal with merge_bvec_fn in component devices better.

commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71 upstream.

If a component device has a merge_bvec_fn then as we never call it
we must ensure we never need to.  Currently this is done by setting
max_sector to 1 PAGE, however this does not stop a bio being created
with several sub-page iovecs that would violate the merge_bvec_fn.

So instead set max_segments to 1 and set the segment boundary to the
same as a page boundary to ensure there is only ever one single-page
segment of IO requested at a time.

This can particularly be an issue when 'xen' is used as it is
known to submit multiple small buffers in a single bio.

Signed-off-by: NeilBrown ne...@suse.de
Cc: sta...@kernel.org
---
 drivers/md/linear.c    |   12 +++++++-----
 drivers/md/multipath.c |   20 ++++++++++++--------
 drivers/md/raid0.c     |   13 +++++++------
 drivers/md/raid1.c     |   28 ++++++++++++++-----------
 drivers/md/raid10.c    |   28 ++++++++++++++-----------
 5 files changed, 60 insertions(+), 41 deletions(-)

diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index af2d39d..bb2a231 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -172,12 +172,14 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
 		disk_stack_limits(mddev->gendisk, rdev->bdev,
 				  rdev->data_offset << 9);
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit max_segments to 1 lying within
+		 * a single page.
 		 */
-		if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
-		    queue_max_sectors(mddev->queue) > (PAGE_SIZE>>9))
-			blk_queue_max_hw_sectors(mddev->queue, PAGE_SIZE>>9);
+		if (rdev->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE - 1);
+		}
 
 		conf->array_sectors += rdev->sectors;
 		cnt++;
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 4b323f4..5558ebc 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -301,14 +301,16 @@ static int multipath_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
 			  rdev->data_offset << 9);
 
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit ->max_segments to one, lying
+		 * within a single page.
 		 * (Note: it is very unlikely that a device with
 		 * merge_bvec_fn will be involved in multipath.)
 		 */
-			if (q->merge_bvec_fn &&
-			    queue_max_sectors(q) > (PAGE_SIZE>>9))
-				blk_queue_max_hw_sectors(mddev->queue, PAGE_SIZE>>9);
+			if (q->merge_bvec_fn) {
+				blk_queue_max_segments(mddev->queue, 1);
+				blk_queue_segment_boundary(mddev->queue,
+							   PAGE_CACHE_SIZE - 1);
+			}
 
 			conf->working_disks++;
 			mddev->degraded--;
@@ -476,9 +478,11 @@ static int multipath_run (mddev_t *mddev)
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, not that we ever expect a device with
 		 * a merge_bvec_fn to be involved in multipath */
-		if (rdev->bdev->bd_disk->queue->merge_bvec_fn &&
-		    queue_max_sectors(mddev->queue) > (PAGE_SIZE>>9))
-			blk_queue_max_hw_sectors(mddev->queue, PAGE_SIZE>>9);
+		if (rdev->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE - 1);
+		}
 
 		if (!test_bit(Faulty, &rdev->flags))
 			conf->working_disks++;
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index a1f7147..377cf2a 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -176,14 +176,15 @@ static int create_strip_zones(mddev_t *mddev)
 		disk_stack_limits(mddev->gendisk, rdev1->bdev,
 				  rdev1->data_offset << 9);
 		/* as we don't honour merge_bvec_fn, we must never risk
-		 * violating it, so limit ->max_sector to one PAGE, as
-		 * a one page request is never in violation.
+		 * violating it, so limit ->max_segments to 1, lying within
+		 * a single page.
 		 */
 
-		if (rdev1->bdev->bd_disk->queue->merge_bvec_fn &&
-		    queue_max_sectors(mddev->queue) > (PAGE_SIZE>>9))
-			blk_queue_max_hw_sectors(mddev->queue, PAGE_SIZE>>9);
-
+		if (rdev1->bdev->bd_disk->queue->merge_bvec_fn) {
+			blk_queue_max_segments(mddev->queue, 1);
+			blk_queue_segment_boundary(mddev->queue,
+						   PAGE_CACHE_SIZE -