[PATCH 28/61] md: Fix bug where spares don't always get rebuilt properly when they become live.

2006-10-31 Thread Chris Wright
-stable review patch.  If anyone has any objections, please let us know.
--

From: NeilBrown <[EMAIL PROTECTED]>

If saved_raid_disk is >= 0, then the device could be a device that is
already in sync and is being re-added, so we need to default this
value to -1.


Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>

---
 drivers/md/md.c |    1 +
 1 file changed, 1 insertion(+)

--- linux-2.6.18.1.orig/drivers/md/md.c
+++ linux-2.6.18.1/drivers/md/md.c
@@ -1994,6 +1994,7 @@ static mdk_rdev_t *md_import_device(dev_
kobject_init(&rdev->kobj);
 
rdev->desc_nr = -1;
+   rdev->saved_raid_disk = -1;
rdev->flags = 0;
rdev->data_offset = 0;
rdev->sb_events = 0;



[PATCH 29/61] md: Fix calculation of ->degraded for multipath and raid10

2006-10-31 Thread Chris Wright
-stable review patch.  If anyone has any objections, please let us know.
--

From: NeilBrown <[EMAIL PROTECTED]>

Two less-used md personalities have bugs in the calculation of 
 ->degraded (the extent to which the array is degraded).

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>

---
 drivers/md/multipath.c |    2 +-
 drivers/md/raid10.c    |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- linux-2.6.18.1.orig/drivers/md/multipath.c
+++ linux-2.6.18.1/drivers/md/multipath.c
@@ -480,7 +480,7 @@ static int multipath_run (mddev_t *mddev
mdname(mddev));
goto out_free_conf;
}
-   mddev->degraded = conf->raid_disks = conf->working_disks;
+   mddev->degraded = conf->raid_disks - conf->working_disks;
 
conf->pool = mempool_create_kzalloc_pool(NR_RESERVED_BUFS,
 sizeof(struct multipath_bh));
--- linux-2.6.18.1.orig/drivers/md/raid10.c
+++ linux-2.6.18.1/drivers/md/raid10.c
@@ -2042,7 +2042,7 @@ static int run(mddev_t *mddev)
disk = conf->mirrors + i;
 
if (!disk->rdev ||
-   !test_bit(In_sync, &rdev->flags)) {
+   !test_bit(In_sync, &disk->rdev->flags)) {
disk->head_position = 0;
mddev->degraded++;
}



Re: New features?

2006-10-31 Thread Frido Ferdinand
Hi,

On Tue, Oct 31, 2006 at 08:50:19AM -0800, Mike Hardy wrote:
> >> 1 "Warm swap" - replacing drives without taking down the array but maybe
> >> having to type in a few commands. Presumably a sata or sata/raid
> >> interface issue. (True hot swap is nice but not worth delaying warm-
> >> swap.)
> > 
> > I believe that 2.6.18 has SATA hot-swap, so this should be available
> > now ... providing you can find out what commands to use.
> 
> I forgot 2.6.18 has SATA hot-swap; has anyone tested that?

Yeah, I've tracked the SATA EH patches and now 2.6.18 for a while, and
hot-swap works. However, if you pull a disk from a raidset, the disk is set
as faulty and the device (/dev/sda, for example) disappears. If you
replug it, the device does not regain its original device name but
will use the next free 'slot' available (in a four-disk layout that's
/dev/sde). Also, trying to --remove the disk doesn't work since the
device file is gone. So be sure to --remove disks _before_ you pull them.
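
A minimal sketch of the safe order (array and device names are only
examples):

   mdadm /dev/md0 -f /dev/sdc   # mark the member faulty while it still exists
   mdadm /dev/md0 -r /dev/sdc   # remove it from the array
   # now pull the disk; after replugging, re-add whatever name the
   # kernel actually assigns (e.g. /dev/sde in a four-disk layout):
   mdadm /dev/md0 -a /dev/sde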

Does anyone know if there's work being done to fix this issue, and does
this also happen on SCSI?

With kind regards,

-- Frido Ferdinand


Re: [PATCH 000 of 6] md: udev events and cache bypass for reads

2006-10-31 Thread Greg KH
On Tue, Oct 31, 2006 at 05:00:40PM +1100, NeilBrown wrote:
> Following are 6 patches for md in -latest which I have been sitting
> on for a while because I hadn't had a chance to test them properly.
> I now have, so there shouldn't be too many bugs left :-)
> 
> First is suitable for 2.6.19 (if it isn't too late and gregkh thinks it
> is good).  Rest are for 2.6.20.

No objections from me.

thanks,

greg k-h


Re: [PATCH 001 of 6] md: Send online/offline uevents when an md array starts/stops.

2006-10-31 Thread Greg KH
On Tue, Oct 31, 2006 at 05:00:46PM +1100, NeilBrown wrote:
> 
> This allows udev to do something intelligent when an
> array becomes available.
> 
> cc: [EMAIL PROTECTED]
> Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

Acked-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>


Re: RAID5 refuses to accept replacement drive.

2006-10-31 Thread greg
On Oct 26,  7:25am, Neil Brown wrote:
} Subject: Re: RAID5 refuses to accept replacement drive.

Hi Neil, hope your week is going well, thanks for the reply.

> > Environment:
> > Kernel: 2.4.33.3
> > MDADM:  2.4.1/2.5.3
> > MD: Three drive RAID5 (md3)

> Old kernel, new mdadm.  Not a tested combination unfortunately.  I
> guess I should try booting 2.4 somewhere and try it out...

Based on what I found, it's probably an old library issue as much as
anything.

More below.

> > Drives were shuffled to get the machine operational.  The machine came
> > up with md3 degraded.  The md3 device refuses to accept a replacement
> > partition using the following syntax:
> > 
> > mdadm --manage /dev/md3 -a /dev/sde1
> > 
> > No output from mdadm, nothing in the logfiles.  Tail end of strace is
> > as follows:
> > 
> > open("/dev/md3", O_RDWR)= 3
> > fstat64(0x3, 0xb8fc)= 0
> > ioctl(3, 0x800c0910, 0xb9f8)= 0

> Those last two lines are a call to md_get_version,
> probably the one in open_mddev.
> 
> > _exit(0)= ?
> 
> But I can see no way that it would exit...
> 
> Are you comfortable with gdb?
> Would you be interested in single stepping around and seeing what path
> leads to the exit?

My apologies for not being quicker on the draw; I should have gone
grovelling with gdb first.

The problem appears to be due to what must be a broken implementation
of getopt_long in the version of the installed C library.  Either that
or the reasonably complex :-) option parsing in mdadm is tripping
it up.

As I noted before the following syntax fails:

mdadm --manage /dev/md3 -a /dev/sde1

After poking around a bit and watching the option parsing in gdb I
noticed that the following syntax should work:

mdadm /dev/md3 -a /dev/sde1

I tried the latter command outside of GDB and things worked
perfectly.  The drive was added to the RAID5 array and synchronization
proceeded properly.

I then failed out a drive element on one of the other MD devices on
the machine and was able to repeat the problem.  The following refused
to work:

mdadm --manage /dev/md1 -a /dev/sdb2

While the following worked:

mdadm /dev/md1 -a /dev/sdb2

The getopt_long function is not picking up on the fact that -a should
have optarg set to /dev/sdb2 when the option is recognized.  Instead
optarg is set to NULL and devs_found is left at 1 rather than 2.  That
results in mdadm simply exiting without saying anything.
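
A quick way to see the silent exit (a sketch using the devices from this
report; adjust the names for your system):

   mdadm --manage /dev/md1 -a /dev/sdb2 ; echo "exit status: $?"
   cat /proc/mdstat                      # spare not added, yet status is 0
   mdadm /dev/md1 -a /dev/sdb2           # the form that works here
   cat /proc/mdstat                      # recovery should now be running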

I know the 1.x version of mdadm we were using before processed the
'mdadm --manage' syntax properly.  This must have been the first time
we had to add a drive element back into an MD device since we upgraded
mdadm.

I would be happy to chase this a bit more or send you a statically
linked binary if you want to see what it is up to.  At the very least
it may be worthwhile to issue a warning message on exit if mdadm has
an MD device specification, a mode specification and no devices.

I remember trying to build a statically linked copy of mdadm with
dietlibc and running into option-parsing problems.  The resultant binary
would always exit complaining that a device had not been specified.  I
remember the dietlibc documentation noting that the GNU folks had an
inconsistent world view when it came to getopt processing
semantics... :-)

I suspect there is a common thread involved in both cases.

> NeilBrown

Hope the above is useful.  Let me know if you have any
questions/issues.

Happy Halloween.

Greg

}-- End of excerpt from Neil Brown

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.   Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949   EMAIL: [EMAIL PROTECTED]
--
"Fools ignore complexity.  Pragmatists suffer it.  Some can avoid it.
Geniuses remove it.
-- Perliss' Programming Proverb #58
   SIGPLAN National, Sept. 1982


Re: New features?

2006-10-31 Thread Bill Davidsen

John Rowe wrote:
> All this discussion has led me to wonder if we users of linux RAID have
> a clear consensus of what our priorities are, ie what are the things we
> really want to see soon as opposed to the many things that would be nice
> but not worth delaying the important things for. FWIW, here are mine, in
> order although the first two are roughly equal priority.
>
> 1 "Warm swap" - replacing drives without taking down the array but maybe
> having to type in a few commands. Presumably a sata or sata/raid
> interface issue. (True hot swap is nice but not worth delaying warm-
> swap.)

That seems to work now. It does assume that you have hardware hot swap
capability.

> 2 Adding new disks to arrays. Allows incremental upgrades and to take
> advantage of the hard disk equivalent of Moore's law.

Also seems to work.

> 3. RAID level conversion (1 to 5, 5 to 6, with single-disk to RAID 1 a
> lower priority).

Single to RAID-N is possible, but involves a good bit of magic with
leaving room for superblocks, etc.

> 4. Uneven disk sizes, eg adding a 400GB disk to a 2x200GB mirror to
> create a 400GB mirror. Together with 2 and 3, allows me to continuously
> expand a disk array.

???

--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: New features?

2006-10-31 Thread Mike Hardy


Neil Brown wrote:
> On Tuesday October 31, [EMAIL PROTECTED] wrote:

>> 1 "Warm swap" - replacing drives without taking down the array but maybe
>> having to type in a few commands. Presumably a sata or sata/raid
>> interface issue. (True hot swap is nice but not worth delaying warm-
>> swap.)
> 
> I believe that 2.6.18 has SATA hot-swap, so this should be available
> now ... providing you can find out what commands to use.

I forgot 2.6.18 has SATA hot-swap; has anyone tested that?

FWIW, SCSI (or SAS now, using SCSI or SATA drives) has full hot-swap
with completely online drive exchanges. I have done this on recent
kernels in production and it works.

> 
>> 2 Adding new disks to arrays. Allows incremental upgrades and to take
>> advantage of the hard disk equivalent of Moore's law.
> 
> Works for raid5 and linear.  Raid6 one day.

Also works for raid1!


>> 4. Uneven disk sizes, eg adding a 400GB disk to a 2x200GB mirror to
>> create a 400GB mirror. Together with 2 and 3, allows me to continuously
>> expand a disk array.
> 
> So you have a RAID1 (md) from sda and sdb, both 200GB, and you now have a
> sdc which is 400GB.
> So
>    mdadm /dev/md0 -a /dev/sdc
>    mdadm /dev/md0 -f /dev/sda
>    mdadm /dev/md0 -r /dev/sda
>    # wait for recovery

Could be:

mdadm /dev/md0 -a /dev/sdc
mdadm --grow /dev/md0 --raid-devices=3 # 3-disk mirror
# wait for recovery
# don't forget grub-install /dev/sda (or similar)!
mdadm /dev/md0 -f /dev/sda
mdadm /dev/md0 -r /dev/sda
mdadm --grow /dev/md0 --raid-devices=2 # 2-disk again

# Run a 'smartctl -d ata -t long /dev/sdb' before next line...

>    mdadm /dev/md0 -f /dev/sdb
>    mdadm /dev/md0 -r /dev/sdb
>    mdadm -C /dev/md1 -l linear -n 2 /dev/sda /dev/sdb
>    mdadm /dev/md0 -a /dev/md1
>    # wait for recovery
>    mdadm --grow /dev/md0 --size=max
> 
> You do run with a degraded array for a while, but you can do it
> entirely online.
> It might be possible to decrease the time when the array is degraded,
> but it is too late at night to think about that.

All I did was decrease the degradation time, but hey it could help. And
don't forget the long SMART test before running degraded for real. Could
save you some pain.
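
For example (a sketch; the device name is only illustrative):

   smartctl -d ata -t long /dev/sdb     # start the extended self-test
   # ...wait for it to finish, then check the result before degrading:
   smartctl -d ata -l selftest /dev/sdb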

-Mike


Re: future hardware

2006-10-31 Thread Rob Bray
> I have been using an older 64-bit system, socket 754, for a while now.  It has
> the old 33MHz PCI bus.  I have two low-cost (no HW RAID) PCI SATA I cards,
> each with 4 ports, to give me an eight-disk RAID 6.  I also have a Gig NIC
> on the PCI bus.  I have Gig switches with clients connecting to it at Gig
> speed.
>
> As many know, you get a peak transfer rate of 133 MB/s (1064 Mb/s) from that
> PCI bus: http://en.wikipedia.org/wiki/Peripheral_Component_Interconnect
>
> The transfer rate is not bad across the network, but my bottleneck is the
> PCI bus.  I have been shopping around for a new MB and PCI-express cards.  I
> have been using mdadm for a long time and would like to stay with it.  I am
> having trouble finding an eight-port PCI-express card that does not have all
> the fancy HW RAID which jacks up the cost.  I am now considering using a MB
> with eight SATA II slots onboard: GIGABYTE GA-M59SLI-S5 Socket AM2 NVIDIA
> nForce 590 SLI MCP ATX.
>
> What are other users of mdadm using with PCI-express cards?  What is the
> most cost-effective solution?
>
>

I agree that SATA drives on PCI-E cards are as much bang-for-buck as is
available right now. On the newer platforms, each PCI-E slot, the onboard
RAID controller(s), and the 32-bit PCI bus all have discrete paths to the
chip.

Play with the thing to see how many disks you can put on a controller
without a slowdown. Don't assume the controller isn't oversold on
bandwidth (I was only able to use three out of four CK804 ports on a
GA-K8NE without saturating it, and two out of four ports on a PCI Sil3114).
Combining the bandwidth of the onboard RAID controller, two SATA slots,
and one PCI controller card, sustained reads reach 450MB/s (across 7
disks, RAID-0) with an $80 board and three $20 controller cards.
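
One rough way to check for controller saturation (a sketch; disk names
are examples, and the disks should otherwise be idle):

   # read from every disk behind one controller in parallel and compare
   # the per-disk throughput with a single-disk run of the same command:
   for d in sda sdb sdc sdd; do
       dd if=/dev/$d of=/dev/null bs=1M count=1024 &
   done
   wait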



Re: New features?

2006-10-31 Thread John Rowe
Thanks for this Neil, good to know that most of what I would like is
already available. I think your reply highlights what I almost put in
there as my first priority: documentation, specifically a HOWTO.

> I believe that 2.6.18 has SATA hot-swap, so this should be available
> now ... providing you can find out what commands to use.

Exactly!

> > 2 Adding new disks to arrays. Allows incremental upgrades and to take
> > advantage of the hard disk equivalent of Moore's law.
> 
> Works for raid5 and linear.  Raid6 one day.

Am I misinterpreting the mdadm 2.5 man page when it says:

Grow (or shrink) an array, or otherwise reshape it in some way.
Currently supported growth options including changing the active
size of component devices in RAID level 1/4/5/6 and changing the
number of active devices in RAID1.

> > 3. RAID level conversion (1 to 5, 5 to 6, with single-disk to RAID 1 a
> > lower priority).
> 
> A single disk is larger than a RAID1 built from it, so this is
> non-trivial.  What exactly do you want to do there?

Single disk to RAID1 is less important, but adding a third disk to a RAID1
pair to make a RAID5 would be nice, as would adding one or more disks
to a RAID5 to make a RAID6.

John




Re: New features?

2006-10-31 Thread Neil Brown
On Tuesday October 31, [EMAIL PROTECTED] wrote:
> All this discussion has led me to wonder if we users of linux RAID have
> a clear consensus of what our priorities are, ie what are the things we
> really want to see soon as opposed to the many things that would be nice
> but not worth delaying the important things for. FWIW, here are mine, in
> order although the first two are roughly equal priority.
> 
> 1 "Warm swap" - replacing drives without taking down the array but maybe
> having to type in a few commands. Presumably a sata or sata/raid
> interface issue. (True hot swap is nice but not worth delaying warm-
> swap.)

I believe that 2.6.18 has SATA hot-swap, so this should be available
now ... providing you can find out what commands to use.

> 
> 2 Adding new disks to arrays. Allows incremental upgrades and to take
> advantage of the hard disk equivalent of Moore's law.

Works for raid5 and linear.  Raid6 one day.
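
A minimal sketch of the raid5 case (assuming a recent mdadm 2.x; device
names and counts are only examples):

   mdadm /dev/md0 -a /dev/sdd               # add the new disk as a spare
   mdadm --grow /dev/md0 --raid-devices=4   # reshape a 3-disk raid5 onto 4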

> 
> 3. RAID level conversion (1 to 5, 5 to 6, with single-disk to RAID 1 a
> lower priority).

A single disk is larger than a RAID1 built from it, so this is
non-trivial.  What exactly do you want to do there?

> 
> 4. Uneven disk sizes, eg adding a 400GB disk to a 2x200GB mirror to
> create a 400GB mirror. Together with 2 and 3, allows me to continuously
> expand a disk array.

So you have a RAID1 (md) from sda and sdb, both 200GB, and you now have
sdc, which is 400GB.
So
   mdadm /dev/md0 -a /dev/sdc
   mdadm /dev/md0 -f /dev/sda
   mdadm /dev/md0 -r /dev/sda
   # wait for recovery
   mdadm /dev/md0 -f /dev/sdb
   mdadm /dev/md0 -r /dev/sdb
   mdadm -C /dev/md1 -l linear -n 2 /dev/sda /dev/sdb
   mdadm /dev/md0 -a /dev/md1
   # wait for recovery
   mdadm --grow /dev/md0 --size=max

You do run with a degraded array for a while, but you can do it
entirely online.
It might be possible to decrease the time when the array is degraded,
but it is too late at night to think about that.

NeilBrown


New features?

2006-10-31 Thread John Rowe
All this discussion has led me to wonder if we users of linux RAID have
a clear consensus of what our priorities are, ie what are the things we
really want to see soon as opposed to the many things that would be nice
but not worth delaying the important things for. FWIW, here are mine, in
order although the first two are roughly equal priority.

1 "Warm swap" - replacing drives without taking down the array but maybe
having to type in a few commands. Presumably a sata or sata/raid
interface issue. (True hot swap is nice but not worth delaying warm-
swap.)

2 Adding new disks to arrays. Allows incremental upgrades and to take
advantage of the hard disk equivalent of Moore's law.

3. RAID level conversion (1 to 5, 5 to 6, with single-disk to RAID 1 a
lower priority).

4. Uneven disk sizes, eg adding a 400GB disk to a 2x200GB mirror to
create a 400GB mirror. Together with 2 and 3, allows me to continuously
expand a disk array.

(Not knowing the code, I wonder if 2, 3 and 4 could be accomplished by
allowing an "external" RAID device to have several internal devices, with
changes accomplished by letting the old one shrink and the new one grow
until the old one no longer exists.)

Thanks for listening.

John





Re: [stable] [PATCH] Check bio address after mapping through partitions.

2006-10-31 Thread Chris Wright
* Jens Axboe ([EMAIL PROTECTED]) wrote:
> On Tue, Oct 31 2006, NeilBrown wrote:
> > This would be good for 2.6.19 and even 18.2, if it is seen as acceptable.
> > raid0 at least (possibly others) can be made to Oops with a bad partition
> > table, and the best fix seems to be to not let out-of-range requests get down
> > to the device.
> > 
> > ### Comments for Changeset
> > 
> > Partitions are not limited to live within a device.  So
> > we should range check after partition mapping.
> > 
> > Note that 'maxsector' was being used for two different things.  I have
> > split off the second usage into 'old_sector' so that maxsector can
> > still be used for its primary usage later in the function.
> > 
> > Cc: Jens Axboe <[EMAIL PROTECTED]>
> > Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
> 
> Code looks good to me, but for some reason your comment exceeds 80
> chars. Can you please fix that up?

Maybe it was just copied, pasted and pushed one tab over from the same check
above the loop.  What about consolidating that one?

thanks,
-chris

--

From: Neil Brown <[EMAIL PROTECTED]>

Partitions are not limited to live within a device.  So
we should range check after partition mapping.

   
Note that 'maxsector' was being used for two different things.  I have
split off the second usage into 'old_sector' so that maxsector can
still be used for its primary usage later in the function.

   
Acked-by: Jens Axboe <[EMAIL PROTECTED]>
Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

Add check_bad_sector() to consolidate the checks a touch.

Signed-off-by: Chris Wright <[EMAIL PROTECTED]>

---
(something like this; only a compile-tested extension to Neil's patch)

 block/ll_rw_blk.c |   51 +++
 1 file changed, 31 insertions(+), 20 deletions(-)

--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -2971,6 +2971,26 @@ static void handle_bad_sector(struct bio
set_bit(BIO_EOF, &bio->bi_flags);
 }
 
+static int check_bad_sector(struct bio *bio)
+{
+   sector_t maxsector = bio->bi_bdev->bd_inode->i_size >> 9;
+   int nr_sectors = bio_sectors(bio);
+   if (maxsector) {
+   sector_t sector = bio->bi_sector;
+
+   if (maxsector < nr_sectors || maxsector - nr_sectors < sector) {
+   /*
+* This may well happen - the kernel calls bread()
+* without checking the size of the device, e.g., when
+* mounting a device.
+*/
+   handle_bad_sector(bio);
+   return 1;
+   }
+   }
+   return 0;
+}
+
 /**
  * generic_make_request: hand a buffer to its device driver for I/O
  * @bio:  The bio describing the location in memory and on the device.
@@ -2998,26 +3018,14 @@ static void handle_bad_sector(struct bio
 void generic_make_request(struct bio *bio)
 {
request_queue_t *q;
-   sector_t maxsector;
-   int ret, nr_sectors = bio_sectors(bio);
+   sector_t old_sector;
+   int ret;
dev_t old_dev;
 
might_sleep();
/* Test device or partition size, when known. */
-   maxsector = bio->bi_bdev->bd_inode->i_size >> 9;
-   if (maxsector) {
-   sector_t sector = bio->bi_sector;
-
-   if (maxsector < nr_sectors || maxsector - nr_sectors < sector) {
-   /*
-* This may well happen - the kernel calls bread()
-* without checking the size of the device, e.g., when
-* mounting a device.
-*/
-   handle_bad_sector(bio);
-   goto end_io;
-   }
-   }
+   if (check_bad_sector(bio))
+   goto end_io;
 
/*
 * Resolve the mapping until finished. (drivers are
@@ -3027,7 +3035,7 @@ void generic_make_request(struct bio *bi
 * NOTE: we don't repeat the blk_size check for each new device.
 * Stacking drivers are expected to know what they are doing.
 */
-   maxsector = -1;
+   old_sector = -1;
old_dev = 0;
do {
char b[BDEVNAME_SIZE];
@@ -3061,15 +3069,18 @@ end_io:
 */
blk_partition_remap(bio);
 
-   if (maxsector != -1)
+   if (old_sector != -1)
blk_add_trace_remap(q, bio, old_dev, bio->bi_sector, 
-   maxsector);
+   old_sector);
 
blk_add_trace_bio(q, bio, BLK_TA_QUEUE);
 
-   maxsector = bio->bi_sector;
+   old_sector = bio->bi_sector;
old_dev = bio->bi_bdev->bd_dev;

Re: [PATCH 002 of 6] md: Change lifetime rules for 'md' devices.

2006-10-31 Thread Jens Axboe
On Tue, Oct 31 2006, Neil Brown wrote:
> On Tuesday October 31, [EMAIL PROTECTED] wrote:
> > On Tue, Oct 31 2006, Neil Brown wrote:
> > > 
> > > I'm guessing we need
> > > 
> > > diff .prev/block/elevator.c ./block/elevator.c
> > > --- .prev/block/elevator.c2006-10-31 20:06:22.0 +1100
> > > +++ ./block/elevator.c2006-10-31 20:06:40.0 +1100
> > > @@ -926,7 +926,7 @@ static void __elv_unregister_queue(eleva
> > >  
> > >  void elv_unregister_queue(struct request_queue *q)
> > >  {
> > > - if (q)
> > > + if (q && q->elevator)
> > >   __elv_unregister_queue(q->elevator);
> > >  }
> > > 
> > > 
> > > Jens?  md never registers an elevator for its queue.
> > 
> > Hmm, but blk_unregister_queue() doesn't call elv_unregister_queue()
> > unless q->request_fn is set. And in that case, you must have an io
> > scheduler attached.
> 
> Hmm.. yes.  Oh, I get it.  I have
> 
>   blk_cleanup_queue(mddev->queue);
>   mddev->queue = NULL;
>   del_gendisk(mddev->gendisk);
>   mddev->gendisk = NULL;
> 
> That's the wrong order, isn't it? :-(

Yep, you want to reverse that :-)

-- 
Jens Axboe



Re: [PATCH 002 of 6] md: Change lifetime rules for 'md' devices.

2006-10-31 Thread Neil Brown
On Tuesday October 31, [EMAIL PROTECTED] wrote:
> On Tue, Oct 31 2006, Neil Brown wrote:
> > 
> > I'm guessing we need
> > 
> > diff .prev/block/elevator.c ./block/elevator.c
> > --- .prev/block/elevator.c  2006-10-31 20:06:22.0 +1100
> > +++ ./block/elevator.c  2006-10-31 20:06:40.0 +1100
> > @@ -926,7 +926,7 @@ static void __elv_unregister_queue(eleva
> >  
> >  void elv_unregister_queue(struct request_queue *q)
> >  {
> > -   if (q)
> > +   if (q && q->elevator)
> > __elv_unregister_queue(q->elevator);
> >  }
> > 
> > 
> > Jens?  md never registers an elevator for its queue.
> 
> Hmm, but blk_unregister_queue() doesn't call elv_unregister_queue()
> unless q->request_fn is set. And in that case, you must have an io
> scheduler attached.

Hmm.. yes.  Oh, I get it.  I have

blk_cleanup_queue(mddev->queue);
mddev->queue = NULL;
del_gendisk(mddev->gendisk);
mddev->gendisk = NULL;

That's the wrong order, isn't it? :-(

Thanks,
NeilBrown


Re: [PATCH 002 of 6] md: Change lifetime rules for 'md' devices.

2006-10-31 Thread Jens Axboe
On Tue, Oct 31 2006, Neil Brown wrote:
> On Tuesday October 31, [EMAIL PROTECTED] wrote:
> > On Tue, 31 Oct 2006 17:00:51 +1100
> > NeilBrown <[EMAIL PROTECTED]> wrote:
> > 
> > > Currently md devices are created when first opened and remain in existence
> > > until the module is unloaded.
> > > This isn't a major problem, but it is somewhat ugly.
> > > 
> > > This patch changes the lifetime rules so that an md device will
> > > disappear on the last close if it has no state.
> > 
> > This kills the G5:
> > 
> > 
> > EXT3-fs: recovery complete.
> > EXT3-fs: mounted filesystem with ordered data mode.
> > Oops: Kernel access of bad area, sig: 11 [#1]
> > SMP NR_CPUS=4 
> > Modules linked in:
> > NIP: C01A31B8 LR: C018E5DC CTR: C01A3404
> > REGS: c17ff4a0 TRAP: 0300   Not tainted  (2.6.19-rc4-mm1)
> > MSR: 90009032   CR: 8448  XER: 
> > DAR: 6B6B6B6B6B6B6BB3, DSISR: 4000
> > TASK = cff2b7f0[1899] 'nash' THREAD: c17fc000 CPU: 1
> > GPR00: 0008 C17FF720 C06B26D0 6B6B6B6B6B6B6B7B 
> ..
> > NIP [C01A31B8] .kobject_uevent+0xac/0x55c
> > LR [C018E5DC] .__elv_unregister_queue+0x20/0x44
> > Call Trace:
> > [C17FF720] [C0562508] read_pipe_fops+0xd0/0xd8 (unreliable)
> > [C17FF840] [C018E5DC] .__elv_unregister_queue+0x20/0x44
> > [C17FF8D0] [C0195548] .blk_unregister_queue+0x58/0x9c
> > [C17FF960] [C019683C] .unlink_gendisk+0x1c/0x50
> > [C17FF9F0] [C0122840] .del_gendisk+0x98/0x22c
> 
> I'm guessing we need
> 
> diff .prev/block/elevator.c ./block/elevator.c
> --- .prev/block/elevator.c2006-10-31 20:06:22.0 +1100
> +++ ./block/elevator.c2006-10-31 20:06:40.0 +1100
> @@ -926,7 +926,7 @@ static void __elv_unregister_queue(eleva
>  
>  void elv_unregister_queue(struct request_queue *q)
>  {
> - if (q)
> + if (q && q->elevator)
>   __elv_unregister_queue(q->elevator);
>  }
> 
> 
> Jens?  md never registers an elevator for its queue.

Hmm, but blk_unregister_queue() doesn't call elv_unregister_queue()
unless q->request_fn is set. And in that case, you must have an io
scheduler attached.

-- 
Jens Axboe



Re: [PATCH 002 of 6] md: Change lifetime rules for 'md' devices.

2006-10-31 Thread Neil Brown
On Tuesday October 31, [EMAIL PROTECTED] wrote:
> On Tue, 31 Oct 2006 17:00:51 +1100
> NeilBrown <[EMAIL PROTECTED]> wrote:
> 
> > Currently md devices are created when first opened and remain in existence
> > until the module is unloaded.
> > This isn't a major problem, but it is somewhat ugly.
> > 
> > This patch changes the lifetime rules so that an md device will
> > disappear on the last close if it has no state.
> 
> This kills the G5:
> 
> 
> EXT3-fs: recovery complete.
> EXT3-fs: mounted filesystem with ordered data mode.
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=4 
> Modules linked in:
> NIP: C01A31B8 LR: C018E5DC CTR: C01A3404
> REGS: c17ff4a0 TRAP: 0300   Not tainted  (2.6.19-rc4-mm1)
> MSR: 90009032   CR: 8448  XER: 
> DAR: 6B6B6B6B6B6B6BB3, DSISR: 4000
> TASK = cff2b7f0[1899] 'nash' THREAD: c17fc000 CPU: 1
> GPR00: 0008 C17FF720 C06B26D0 6B6B6B6B6B6B6B7B 
..
> NIP [C01A31B8] .kobject_uevent+0xac/0x55c
> LR [C018E5DC] .__elv_unregister_queue+0x20/0x44
> Call Trace:
> [C17FF720] [C0562508] read_pipe_fops+0xd0/0xd8 (unreliable)
> [C17FF840] [C018E5DC] .__elv_unregister_queue+0x20/0x44
> [C17FF8D0] [C0195548] .blk_unregister_queue+0x58/0x9c
> [C17FF960] [C019683C] .unlink_gendisk+0x1c/0x50
> [C17FF9F0] [C0122840] .del_gendisk+0x98/0x22c

I'm guessing we need

diff .prev/block/elevator.c ./block/elevator.c
--- .prev/block/elevator.c  2006-10-31 20:06:22.0 +1100
+++ ./block/elevator.c  2006-10-31 20:06:40.0 +1100
@@ -926,7 +926,7 @@ static void __elv_unregister_queue(eleva
 
 void elv_unregister_queue(struct request_queue *q)
 {
-   if (q)
+   if (q && q->elevator)
__elv_unregister_queue(q->elevator);
 }


Jens?  md never registers an elevator for its queue.

> 
> Also, it'd be nice to enable CONFIG_MUST_CHECK and take a look at a few
> things...
> 
> drivers/md/md.c: In function `bind_rdev_to_array':
> drivers/md/md.c:1379: warning: ignoring return value of `kobject_add', 
> declared with attribute warn_unused_result
> drivers/md/md.c:1385: warning: ignoring return value of `sysfs_create_link', 
> declared with attribute warn_unused_result
> drivers/md/md.c: In function `md_probe':
> drivers/md/md.c:2986: warning: ignoring return value of `kobject_register', 
> declared with attribute warn_unused_result
> drivers/md/md.c: In function `do_md_run':
> drivers/md/md.c:3135: warning: ignoring return value of `sysfs_create_group', 
> declared with attribute warn_unused_result
> drivers/md/md.c:3150: warning: ignoring return value of `sysfs_create_link', 
> declared with attribute warn_unused_result
> drivers/md/md.c: In function `md_check_recovery':
> drivers/md/md.c:5446: warning: ignoring return value of `sysfs_create_link', 
> declared with attribute warn_unused_result

I guess... I saw mail a while ago about why we really should be
checking those.  I'll have to dig it up again.

NeilBrown


Re: [PATCH 002 of 6] md: Change lifetime rules for 'md' devices.

2006-10-31 Thread Andrew Morton
On Tue, 31 Oct 2006 17:00:51 +1100
NeilBrown <[EMAIL PROTECTED]> wrote:

> Currently md devices are created when first opened and remain in existence
> until the module is unloaded.
> This isn't a major problem, but it is somewhat ugly.
> 
> This patch changes the lifetime rules so that an md device will
> disappear on the last close if it has no state.

This kills the G5:


EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=4 
Modules linked in:
NIP: C01A31B8 LR: C018E5DC CTR: C01A3404
REGS: c17ff4a0 TRAP: 0300   Not tainted  (2.6.19-rc4-mm1)
MSR: 90009032   CR: 8448  XER: 
DAR: 6B6B6B6B6B6B6BB3, DSISR: 4000
TASK = cff2b7f0[1899] 'nash' THREAD: c17fc000 CPU: 1
GPR00: 0008 C17FF720 C06B26D0 6B6B6B6B6B6B6B7B 
GPR04: 0002 0001 0001 000200D0 
GPR08: 00050C00 C01A3404  C01A318C 
GPR12: 8444 C0535680 100FE350 100FE7B8 
GPR16:     
GPR20: 10005CD4 1009  10007E54 
GPR24:  C0472C80 6B6B6B6B6B6B6B7B C1FD2530 
GPR28: C7B8C2F0 6B6B6B6B6B6B6B7B C057DAE8 C79A2550 
NIP [C01A31B8] .kobject_uevent+0xac/0x55c
LR [C018E5DC] .__elv_unregister_queue+0x20/0x44
Call Trace:
[C17FF720] [C0562508] read_pipe_fops+0xd0/0xd8 (unreliable)
[C17FF840] [C018E5DC] .__elv_unregister_queue+0x20/0x44
[C17FF8D0] [C0195548] .blk_unregister_queue+0x58/0x9c
[C17FF960] [C019683C] .unlink_gendisk+0x1c/0x50
[C17FF9F0] [C0122840] .del_gendisk+0x98/0x22c
[C17FFA90] [C035B56C] .mddev_put+0xa0/0xe0
[C17FFB20] [C0362178] .md_release+0x84/0x9c
[C17FFBA0] [C00FDDE0] .__blkdev_put+0x204/0x220
[C17FFC50] [C00C765C] .__fput+0x234/0x274
[C17FFD00] [C00C5264] .filp_close+0x6c/0xfc
[C17FFD90] [C00C53B8] .sys_close+0xc4/0x178
[C17FFE30] [C000872C] syscall_exit+0x0/0x40
Instruction dump:
4e800420 0020 0250 0278 0280 0258 0260 0268 
0270 3b20 2fb9 419e003c  2fa0 7c1d0378 409e0070 
 <7>eth0: no IPv6 routers present

It happens during initscripts.  The machine has no MD devices.  config is
at http://userweb.kernel.org/~akpm/config-g5.txt

Also, it'd be nice to enable CONFIG_MUST_CHECK and take a look at a few
things...

drivers/md/md.c: In function `bind_rdev_to_array':
drivers/md/md.c:1379: warning: ignoring return value of `kobject_add', declared 
with attribute warn_unused_result
drivers/md/md.c:1385: warning: ignoring return value of `sysfs_create_link', 
declared with attribute warn_unused_result
drivers/md/md.c: In function `md_probe':
drivers/md/md.c:2986: warning: ignoring return value of `kobject_register', 
declared with attribute warn_unused_result
drivers/md/md.c: In function `do_md_run':
drivers/md/md.c:3135: warning: ignoring return value of `sysfs_create_group', 
declared with attribute warn_unused_result
drivers/md/md.c:3150: warning: ignoring return value of `sysfs_create_link', 
declared with attribute warn_unused_result
drivers/md/md.c: In function `md_check_recovery':
drivers/md/md.c:5446: warning: ignoring return value of `sysfs_create_link', 
declared with attribute warn_unused_result

