Re: [PATCH 002 of 6] md: Change lifetime rules for 'md' devices.

2006-10-31 Thread Jens Axboe
On Tue, Oct 31 2006, Neil Brown wrote:
> On Tuesday October 31, [EMAIL PROTECTED] wrote:
> > On Tue, Oct 31 2006, Neil Brown wrote:
> > > 
> > > I'm guessing we need
> > > 
> > > diff .prev/block/elevator.c ./block/elevator.c
> > > --- .prev/block/elevator.c  2006-10-31 20:06:22.0 +1100
> > > +++ ./block/elevator.c  2006-10-31 20:06:40.0 +1100
> > > @@ -926,7 +926,7 @@ static void __elv_unregister_queue(eleva
> > >  
> > >  void elv_unregister_queue(struct request_queue *q)
> > >  {
> > > - if (q)
> > > + if (q && q->elevator)
> > >   __elv_unregister_queue(q->elevator);
> > >  }
> > > 
> > > 
> > > Jens?  md never registers an elevator for its queue.
> > 
> > Hmm, but blk_unregister_queue() doesn't call elv_unregister_queue()
> > unless q->request_fn is set. And in that case, you must have an io
> > scheduler attached.
> 
> Hmm.. yes.  Oh, I get it.  I have
> 
>   blk_cleanup_queue(mddev->queue);
>   mddev->queue = NULL;
>   del_gendisk(mddev->gendisk);
>   mddev->gendisk = NULL;
> 
> That's the wrong order, isn't it? :-(

Yep, you want to reverse that :-)
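
I.e. del_gendisk() needs to run while mddev->queue is still alive;
something along these lines (a sketch of the reversed order, not the
exact md.c change):

	/* unregister the disk while mddev->queue is still valid, so
	 * blk_unregister_queue() doesn't poke at freed memory */
	del_gendisk(mddev->gendisk);
	mddev->gendisk = NULL;

	/* only then tear the queue itself down */
	blk_cleanup_queue(mddev->queue);
	mddev->queue = NULL;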

-- 
Jens Axboe



Re: [PATCH 002 of 6] md: Change lifetime rules for 'md' devices.

2006-10-31 Thread Neil Brown
On Tuesday October 31, [EMAIL PROTECTED] wrote:
> On Tue, Oct 31 2006, Neil Brown wrote:
> > 
> > I'm guessing we need
> > 
> > diff .prev/block/elevator.c ./block/elevator.c
> > --- .prev/block/elevator.c  2006-10-31 20:06:22.0 +1100
> > +++ ./block/elevator.c  2006-10-31 20:06:40.0 +1100
> > @@ -926,7 +926,7 @@ static void __elv_unregister_queue(eleva
> >  
> >  void elv_unregister_queue(struct request_queue *q)
> >  {
> > -   if (q)
> > +   if (q && q->elevator)
> > __elv_unregister_queue(q->elevator);
> >  }
> > 
> > 
> > Jens?  md never registers an elevator for its queue.
> 
> Hmm, but blk_unregister_queue() doesn't call elv_unregister_queue()
> unless q->request_fn is set. And in that case, you must have an io
> scheduler attached.

Hmm.. yes.  Oh, I get it.  I have

blk_cleanup_queue(mddev->queue);
mddev->queue = NULL;
del_gendisk(mddev->gendisk);
mddev->gendisk = NULL;

That's the wrong order, isn't it? :-(

Thanks,
NeilBrown


Re: [PATCH 002 of 6] md: Change lifetime rules for 'md' devices.

2006-10-31 Thread Jens Axboe
On Tue, Oct 31 2006, Neil Brown wrote:
> On Tuesday October 31, [EMAIL PROTECTED] wrote:
> > On Tue, 31 Oct 2006 17:00:51 +1100
> > NeilBrown <[EMAIL PROTECTED]> wrote:
> > 
> > > Currently md devices are created when first opened and remain in existence
> > > until the module is unloaded.
> > > This isn't a major problem, but it is somewhat ugly.
> > > 
> > > This patch changes the lifetime rules so that an md device will
> > > disappear on the last close if it has no state.
> > 
> > This kills the G5:
> > 
> > 
> > EXT3-fs: recovery complete.
> > EXT3-fs: mounted filesystem with ordered data mode.
> > Oops: Kernel access of bad area, sig: 11 [#1]
> > SMP NR_CPUS=4 
> > Modules linked in:
> > NIP: C01A31B8 LR: C018E5DC CTR: C01A3404
> > REGS: c17ff4a0 TRAP: 0300   Not tainted  (2.6.19-rc4-mm1)
> > MSR: 90009032   CR: 8448  XER: 
> > DAR: 6B6B6B6B6B6B6BB3, DSISR: 4000
> > TASK = cff2b7f0[1899] 'nash' THREAD: c17fc000 CPU: 1
> > GPR00: 0008 C17FF720 C06B26D0 6B6B6B6B6B6B6B7B 
> ..
> > NIP [C01A31B8] .kobject_uevent+0xac/0x55c
> > LR [C018E5DC] .__elv_unregister_queue+0x20/0x44
> > Call Trace:
> > [C17FF720] [C0562508] read_pipe_fops+0xd0/0xd8 (unreliable)
> > [C17FF840] [C018E5DC] .__elv_unregister_queue+0x20/0x44
> > [C17FF8D0] [C0195548] .blk_unregister_queue+0x58/0x9c
> > [C17FF960] [C019683C] .unlink_gendisk+0x1c/0x50
> > [C17FF9F0] [C0122840] .del_gendisk+0x98/0x22c
> 
> I'm guessing we need
> 
> diff .prev/block/elevator.c ./block/elevator.c
> --- .prev/block/elevator.c  2006-10-31 20:06:22.0 +1100
> +++ ./block/elevator.c  2006-10-31 20:06:40.0 +1100
> @@ -926,7 +926,7 @@ static void __elv_unregister_queue(eleva
>  
>  void elv_unregister_queue(struct request_queue *q)
>  {
> - if (q)
> + if (q && q->elevator)
>   __elv_unregister_queue(q->elevator);
>  }
> 
> 
> Jens?  md never registers an elevator for its queue.

Hmm, but blk_unregister_queue() doesn't call elv_unregister_queue()
unless q->request_fn is set. And in that case, you must have an io
scheduler attached.
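
For reference, that check looks roughly like this (paraphrased from
the 2.6.19-era block layer, not a verbatim quote):

	void blk_unregister_queue(struct gendisk *disk)
	{
		request_queue_t *q = disk->queue;

		/* only request-based queues ever registered an elevator */
		if (q && q->request_fn) {
			elv_unregister_queue(q);
			/* ... queue kobject teardown follows ... */
		}
	}

md's queue is make_request based, so q->request_fn should be NULL and
this path should never run for it.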

-- 
Jens Axboe



Re: [PATCH 002 of 6] md: Change lifetime rules for 'md' devices.

2006-10-31 Thread Neil Brown
On Tuesday October 31, [EMAIL PROTECTED] wrote:
> On Tue, 31 Oct 2006 17:00:51 +1100
> NeilBrown <[EMAIL PROTECTED]> wrote:
> 
> > Currently md devices are created when first opened and remain in existence
> > until the module is unloaded.
> > This isn't a major problem, but it is somewhat ugly.
> > 
> > This patch changes the lifetime rules so that an md device will
> > disappear on the last close if it has no state.
> 
> This kills the G5:
> 
> 
> EXT3-fs: recovery complete.
> EXT3-fs: mounted filesystem with ordered data mode.
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=4 
> Modules linked in:
> NIP: C01A31B8 LR: C018E5DC CTR: C01A3404
> REGS: c17ff4a0 TRAP: 0300   Not tainted  (2.6.19-rc4-mm1)
> MSR: 90009032   CR: 8448  XER: 
> DAR: 6B6B6B6B6B6B6BB3, DSISR: 4000
> TASK = cff2b7f0[1899] 'nash' THREAD: c17fc000 CPU: 1
> GPR00: 0008 C17FF720 C06B26D0 6B6B6B6B6B6B6B7B 
..
> NIP [C01A31B8] .kobject_uevent+0xac/0x55c
> LR [C018E5DC] .__elv_unregister_queue+0x20/0x44
> Call Trace:
> [C17FF720] [C0562508] read_pipe_fops+0xd0/0xd8 (unreliable)
> [C17FF840] [C018E5DC] .__elv_unregister_queue+0x20/0x44
> [C17FF8D0] [C0195548] .blk_unregister_queue+0x58/0x9c
> [C17FF960] [C019683C] .unlink_gendisk+0x1c/0x50
> [C17FF9F0] [C0122840] .del_gendisk+0x98/0x22c

I'm guessing we need

diff .prev/block/elevator.c ./block/elevator.c
--- .prev/block/elevator.c  2006-10-31 20:06:22.0 +1100
+++ ./block/elevator.c  2006-10-31 20:06:40.0 +1100
@@ -926,7 +926,7 @@ static void __elv_unregister_queue(eleva
 
 void elv_unregister_queue(struct request_queue *q)
 {
-   if (q)
+   if (q && q->elevator)
__elv_unregister_queue(q->elevator);
 }


Jens?  md never registers an elevator for its queue.

> 
> Also, it'd be nice to enable CONFIG_MUST_CHECK and take a look at a few
> things...
> 
> drivers/md/md.c: In function `bind_rdev_to_array':
> drivers/md/md.c:1379: warning: ignoring return value of `kobject_add', 
> declared with attribute warn_unused_result
> drivers/md/md.c:1385: warning: ignoring return value of `sysfs_create_link', 
> declared with attribute warn_unused_result
> drivers/md/md.c: In function `md_probe':
> drivers/md/md.c:2986: warning: ignoring return value of `kobject_register', 
> declared with attribute warn_unused_result
> drivers/md/md.c: In function `do_md_run':
> drivers/md/md.c:3135: warning: ignoring return value of `sysfs_create_group', 
> declared with attribute warn_unused_result
> drivers/md/md.c:3150: warning: ignoring return value of `sysfs_create_link', 
> declared with attribute warn_unused_result
> drivers/md/md.c: In function `md_check_recovery':
> drivers/md/md.c:5446: warning: ignoring return value of `sysfs_create_link', 
> declared with attribute warn_unused_result

I guess... I saw mail a while ago about why we really should be
checking those.  I'll have to dig it up again.
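
For the sysfs_create_link() style callers that probably just means
capturing the return value and complaining, e.g. (untested sketch; the
link name is whatever the call site already passes):

	err = sysfs_create_link(&mddev->kobj, &rdev->kobj, nm);
	if (err)
		printk(KERN_WARNING
		       "md: cannot register %s in sysfs (%d)\n", nm, err);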

NeilBrown


Re: [PATCH 002 of 6] md: Change lifetime rules for 'md' devices.

2006-10-31 Thread Andrew Morton
On Tue, 31 Oct 2006 17:00:51 +1100
NeilBrown <[EMAIL PROTECTED]> wrote:

> Currently md devices are created when first opened and remain in existence
> until the module is unloaded.
> This isn't a major problem, but it is somewhat ugly.
> 
> This patch changes the lifetime rules so that an md device will
> disappear on the last close if it has no state.

This kills the G5:


EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=4 
Modules linked in:
NIP: C01A31B8 LR: C018E5DC CTR: C01A3404
REGS: c17ff4a0 TRAP: 0300   Not tainted  (2.6.19-rc4-mm1)
MSR: 90009032   CR: 8448  XER: 
DAR: 6B6B6B6B6B6B6BB3, DSISR: 4000
TASK = cff2b7f0[1899] 'nash' THREAD: c17fc000 CPU: 1
GPR00: 0008 C17FF720 C06B26D0 6B6B6B6B6B6B6B7B 
GPR04: 0002 0001 0001 000200D0 
GPR08: 00050C00 C01A3404  C01A318C 
GPR12: 8444 C0535680 100FE350 100FE7B8 
GPR16:     
GPR20: 10005CD4 1009  10007E54 
GPR24:  C0472C80 6B6B6B6B6B6B6B7B C1FD2530 
GPR28: C7B8C2F0 6B6B6B6B6B6B6B7B C057DAE8 C79A2550 
NIP [C01A31B8] .kobject_uevent+0xac/0x55c
LR [C018E5DC] .__elv_unregister_queue+0x20/0x44
Call Trace:
[C17FF720] [C0562508] read_pipe_fops+0xd0/0xd8 (unreliable)
[C17FF840] [C018E5DC] .__elv_unregister_queue+0x20/0x44
[C17FF8D0] [C0195548] .blk_unregister_queue+0x58/0x9c
[C17FF960] [C019683C] .unlink_gendisk+0x1c/0x50
[C17FF9F0] [C0122840] .del_gendisk+0x98/0x22c
[C17FFA90] [C035B56C] .mddev_put+0xa0/0xe0
[C17FFB20] [C0362178] .md_release+0x84/0x9c
[C17FFBA0] [C00FDDE0] .__blkdev_put+0x204/0x220
[C17FFC50] [C00C765C] .__fput+0x234/0x274
[C17FFD00] [C00C5264] .filp_close+0x6c/0xfc
[C17FFD90] [C00C53B8] .sys_close+0xc4/0x178
[C17FFE30] [C000872C] syscall_exit+0x0/0x40
Instruction dump:
4e800420 0020 0250 0278 0280 0258 0260 0268 
0270 3b20 2fb9 419e003c  2fa0 7c1d0378 409e0070 
 <7>eth0: no IPv6 routers present

It happens during initscripts.  The machine has no MD devices.  The config is
at http://userweb.kernel.org/~akpm/config-g5.txt

Also, it'd be nice to enable CONFIG_MUST_CHECK and take a look at a few
things...

drivers/md/md.c: In function `bind_rdev_to_array':
drivers/md/md.c:1379: warning: ignoring return value of `kobject_add', declared 
with attribute warn_unused_result
drivers/md/md.c:1385: warning: ignoring return value of `sysfs_create_link', 
declared with attribute warn_unused_result
drivers/md/md.c: In function `md_probe':
drivers/md/md.c:2986: warning: ignoring return value of `kobject_register', 
declared with attribute warn_unused_result
drivers/md/md.c: In function `do_md_run':
drivers/md/md.c:3135: warning: ignoring return value of `sysfs_create_group', 
declared with attribute warn_unused_result
drivers/md/md.c:3150: warning: ignoring return value of `sysfs_create_link', 
declared with attribute warn_unused_result
drivers/md/md.c: In function `md_check_recovery':
drivers/md/md.c:5446: warning: ignoring return value of `sysfs_create_link', 
declared with attribute warn_unused_result




[PATCH 002 of 6] md: Change lifetime rules for 'md' devices.

2006-10-30 Thread NeilBrown

Currently md devices are created when first opened and remain in existence
until the module is unloaded.
This isn't a major problem, but it is somewhat ugly.

This patch changes the lifetime rules so that an md device will
disappear on the last close if it has no state.

Locking rules depend on bd_mutex being held in do_open and
__blkdev_put, and on setting bd_disk->private_data to 'mddev'.

There is room for a race because md_probe is called early in do_open
(get_gendisk) to create the mddev.  As this isn't protected by
bd_mutex, a concurrent call to md_close can destroy that mddev before
do_open calls md_open to get a reference on it.
md_open and md_close are serialised by md_mutex, so the worst that
can happen is that md_open finds that the mddev structure doesn't
exist after all.  In this case bd_disk->private_data will be NULL,
and md_open exits with -EBUSY, which is arguably an appropriate
result.

The new 'dead' field in mddev is used to track whether it is time
to destroy the mddev (if a last-close happens).  It is cleared when
any state is created (set_array_info) and set when the array is stopped
(do_md_stop).

mddev_put becomes simpler. It just destroys the mddev when the
refcount hits zero.  This will normally be the reference held in
bd_disk->private_data.
  

cc: [EMAIL PROTECTED]
Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./drivers/md/md.c   |   35 +--
 ./include/linux/raid/md_k.h |3 +++
 2 files changed, 28 insertions(+), 10 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c   2006-10-31 16:41:02.0 +1100
+++ ./drivers/md/md.c   2006-10-31 16:41:14.0 +1100
@@ -226,13 +226,14 @@ static void mddev_put(mddev_t *mddev)
 {
if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
return;
-   if (!mddev->raid_disks && list_empty(&mddev->disks)) {
-   list_del(&mddev->all_mddevs);
-   spin_unlock(&all_mddevs_lock);
-   blk_cleanup_queue(mddev->queue);
-   kobject_unregister(&mddev->kobj);
-   } else
-   spin_unlock(&all_mddevs_lock);
+   list_del(&mddev->all_mddevs);
+   spin_unlock(&all_mddevs_lock);
+
+   blk_cleanup_queue(mddev->queue);
+   mddev->queue = NULL;
+   del_gendisk(mddev->gendisk);
+   mddev->gendisk = NULL;
+   kobject_unregister(&mddev->kobj);
 }
 
 static mddev_t * mddev_find(dev_t unit)
@@ -273,6 +274,7 @@ static mddev_t * mddev_find(dev_t unit)
atomic_set(&new->active, 1);
spin_lock_init(&new->write_lock);
init_waitqueue_head(&new->sb_wait);
+   new->dead = 1;
 
new->queue = blk_alloc_queue(GFP_KERNEL);
if (!new->queue) {
@@ -3362,6 +3364,8 @@ static int do_md_stop(mddev_t * mddev, i
disk = mddev->gendisk;
if (disk)
set_capacity(disk, 0);
+   mddev->dead = 1;
+
mddev->changed = 1;
} else if (mddev->pers)
printk(KERN_INFO "md: %s switched to read-only mode.\n",
@@ -4293,7 +4297,8 @@ static int md_ioctl(struct inode *inode,
printk(KERN_WARNING "md: couldn't set"
   " array info. %d\n", err);
goto abort_unlock;
-   }
+   } else
+   mddev->dead = 0;
}
goto done_unlock;
 
@@ -4377,6 +4382,8 @@ static int md_ioctl(struct inode *inode,
err = -EFAULT;
else
err = add_new_disk(mddev, &info);
+   if (!err)
+   mddev->dead = 0;
goto done_unlock;
}
 
@@ -4423,8 +4430,12 @@ static int md_open(struct inode *inode, 
 * Succeed if we can lock the mddev, which confirms that
 * it isn't being stopped right now.
 */
-   mddev_t *mddev = inode->i_bdev->bd_disk->private_data;
-   int err;
+   mddev_t *mddev;
+   int err = -EBUSY;
+
+   mddev = inode->i_bdev->bd_disk->private_data;
+   if (!mddev)
+   goto out;
 
if ((err = mutex_lock_interruptible_nested(&mddev->reconfig_mutex, 1)))
goto out;
@@ -4443,6 +4454,10 @@ static int md_release(struct inode *inod
mddev_t *mddev = inode->i_bdev->bd_disk->private_data;
 
BUG_ON(!mddev);
+   if (inode->i_bdev->bd_openers == 0 && mddev->dead) {
+   inode->i_bdev->bd_disk->private_data = NULL;
+   mddev_put(mddev);
+   }
mddev_put(mddev);
 
return 0;

diff .prev/include/linux/raid/md_k.h ./include/linux/raid/md_k.h
--- .prev/include/linux/raid/md_k.h 2006-10-31 16:40:51.000