Re: [PATCH 002 of 6] md: Change lifetime rules for 'md' devices.
On Tue, Oct 31 2006, Neil Brown wrote: > On Tuesday October 31, [EMAIL PROTECTED] wrote: > > On Tue, Oct 31 2006, Neil Brown wrote: > > > > > > I'm guessing we need > > > > > > diff .prev/block/elevator.c ./block/elevator.c > > > --- .prev/block/elevator.c2006-10-31 20:06:22.0 +1100 > > > +++ ./block/elevator.c2006-10-31 20:06:40.0 +1100 > > > @@ -926,7 +926,7 @@ static void __elv_unregister_queue(eleva > > > > > > void elv_unregister_queue(struct request_queue *q) > > > { > > > - if (q) > > > + if (q && q->elevator) > > > __elv_unregister_queue(q->elevator); > > > } > > > > > > > > > Jens? md never registers and elevator for its queue. > > > > Hmm, but blk_unregister_queue() doesn't call elv_unregister_queue() > > unless q->request_fn is set. And in that case, you must have an io > > scheduler attached. > > Hmm.. yes. Oh, I get it. I have > > blk_cleanup_queue(mddev->queue); > mddev->queue = NULL; > del_gendisk(mddev->gendisk); > mddev->gendisk = NULL; > > That's the wrong order, isn't it. :-( Yep, you want to reverse that :-) -- Jens Axboe - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 002 of 6] md: Change lifetime rules for 'md' devices.
On Tuesday October 31, [EMAIL PROTECTED] wrote: > On Tue, Oct 31 2006, Neil Brown wrote: > > > > I'm guessing we need > > > > diff .prev/block/elevator.c ./block/elevator.c > > --- .prev/block/elevator.c 2006-10-31 20:06:22.0 +1100 > > +++ ./block/elevator.c 2006-10-31 20:06:40.0 +1100 > > @@ -926,7 +926,7 @@ static void __elv_unregister_queue(eleva > > > > void elv_unregister_queue(struct request_queue *q) > > { > > - if (q) > > + if (q && q->elevator) > > __elv_unregister_queue(q->elevator); > > } > > > > > > Jens? md never registers and elevator for its queue. > > Hmm, but blk_unregister_queue() doesn't call elv_unregister_queue() > unless q->request_fn is set. And in that case, you must have an io > scheduler attached. Hmm.. yes. Oh, I get it. I have blk_cleanup_queue(mddev->queue); mddev->queue = NULL; del_gendisk(mddev->gendisk); mddev->gendisk = NULL; That's the wrong order, isn't it. :-( Thanks, NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 002 of 6] md: Change lifetime rules for 'md' devices.
On Tue, Oct 31 2006, Neil Brown wrote: > On Tuesday October 31, [EMAIL PROTECTED] wrote: > > On Tue, 31 Oct 2006 17:00:51 +1100 > > NeilBrown <[EMAIL PROTECTED]> wrote: > > > > > Currently md devices are created when first opened and remain in existence > > > until the module is unloaded. > > > This isn't a major problem, but it somewhat ugly. > > > > > > This patch changes the lifetime rules so that an md device will > > > disappear on the last close if it has no state. > > > > This kills the G5: > > > > > > EXT3-fs: recovery complete. > > EXT3-fs: mounted filesystem with ordered data mode. > > Oops: Kernel access of bad area, sig: 11 [#1] > > SMP NR_CPUS=4 > > Modules linked in: > > NIP: C01A31B8 LR: C018E5DC CTR: C01A3404 > > REGS: c17ff4a0 TRAP: 0300 Not tainted (2.6.19-rc4-mm1) > > MSR: 90009032 CR: 8448 XER: > > DAR: 6B6B6B6B6B6B6BB3, DSISR: 4000 > > TASK = cff2b7f0[1899] 'nash' THREAD: c17fc000 CPU: 1 > > GPR00: 0008 C17FF720 C06B26D0 6B6B6B6B6B6B6B7B > .. > > NIP [C01A31B8] .kobject_uevent+0xac/0x55c > > LR [C018E5DC] .__elv_unregister_queue+0x20/0x44 > > Call Trace: > > [C17FF720] [C0562508] read_pipe_fops+0xd0/0xd8 (unreliable) > > [C17FF840] [C018E5DC] .__elv_unregister_queue+0x20/0x44 > > [C17FF8D0] [C0195548] .blk_unregister_queue+0x58/0x9c > > [C17FF960] [C019683C] .unlink_gendisk+0x1c/0x50 > > [C17FF9F0] [C0122840] .del_gendisk+0x98/0x22c > > I'm guessing we need > > diff .prev/block/elevator.c ./block/elevator.c > --- .prev/block/elevator.c2006-10-31 20:06:22.0 +1100 > +++ ./block/elevator.c2006-10-31 20:06:40.0 +1100 > @@ -926,7 +926,7 @@ static void __elv_unregister_queue(eleva > > void elv_unregister_queue(struct request_queue *q) > { > - if (q) > + if (q && q->elevator) > __elv_unregister_queue(q->elevator); > } > > > Jens? md never registers and elevator for its queue. Hmm, but blk_unregister_queue() doesn't call elv_unregister_queue() unless q->request_fn is set. And in that case, you must have an io scheduler attached. -- Jens Axboe - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 002 of 6] md: Change lifetime rules for 'md' devices.
On Tuesday October 31, [EMAIL PROTECTED] wrote: > On Tue, 31 Oct 2006 17:00:51 +1100 > NeilBrown <[EMAIL PROTECTED]> wrote: > > > Currently md devices are created when first opened and remain in existence > > until the module is unloaded. > > This isn't a major problem, but it somewhat ugly. > > > > This patch changes the lifetime rules so that an md device will > > disappear on the last close if it has no state. > > This kills the G5: > > > EXT3-fs: recovery complete. > EXT3-fs: mounted filesystem with ordered data mode. > Oops: Kernel access of bad area, sig: 11 [#1] > SMP NR_CPUS=4 > Modules linked in: > NIP: C01A31B8 LR: C018E5DC CTR: C01A3404 > REGS: c17ff4a0 TRAP: 0300 Not tainted (2.6.19-rc4-mm1) > MSR: 90009032 CR: 8448 XER: > DAR: 6B6B6B6B6B6B6BB3, DSISR: 4000 > TASK = cff2b7f0[1899] 'nash' THREAD: c17fc000 CPU: 1 > GPR00: 0008 C17FF720 C06B26D0 6B6B6B6B6B6B6B7B .. > NIP [C01A31B8] .kobject_uevent+0xac/0x55c > LR [C018E5DC] .__elv_unregister_queue+0x20/0x44 > Call Trace: > [C17FF720] [C0562508] read_pipe_fops+0xd0/0xd8 (unreliable) > [C17FF840] [C018E5DC] .__elv_unregister_queue+0x20/0x44 > [C17FF8D0] [C0195548] .blk_unregister_queue+0x58/0x9c > [C17FF960] [C019683C] .unlink_gendisk+0x1c/0x50 > [C17FF9F0] [C0122840] .del_gendisk+0x98/0x22c I'm guessing we need diff .prev/block/elevator.c ./block/elevator.c --- .prev/block/elevator.c 2006-10-31 20:06:22.0 +1100 +++ ./block/elevator.c 2006-10-31 20:06:40.0 +1100 @@ -926,7 +926,7 @@ static void __elv_unregister_queue(eleva void elv_unregister_queue(struct request_queue *q) { - if (q) + if (q && q->elevator) __elv_unregister_queue(q->elevator); } Jens? md never registers and elevator for its queue. > > Also, it'd be nice to enable CONFIG_MUST_CHECK and take a look at a few > things... > > drivers/md/md.c: In function `bind_rdev_to_array': > drivers/md/md.c:1379: warning: ignoring return value of `kobject_add', > declared with attribute warn_unused_result > drivers/md/md.c:1385: warning: ignoring return value of `sysfs_create_link', > declared with attribute warn_unused_result > drivers/md/md.c: In function `md_probe': > drivers/md/md.c:2986: warning: ignoring return value of `kobject_register', > declared with attribute warn_unused_result > drivers/md/md.c: In function `do_md_run': > drivers/md/md.c:3135: warning: ignoring return value of `sysfs_create_group', > declared with attribute warn_unused_result > drivers/md/md.c:3150: warning: ignoring return value of `sysfs_create_link', > declared with attribute warn_unused_result > drivers/md/md.c: In function `md_check_recovery': > drivers/md/md.c:5446: warning: ignoring return value of `sysfs_create_link', > declared with attribute warn_unused_result I guess... I saw mail a while ago about why we really should be checking those. I'll have to dig it up again. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 002 of 6] md: Change lifetime rules for 'md' devices.
On Tue, 31 Oct 2006 17:00:51 +1100 NeilBrown <[EMAIL PROTECTED]> wrote: > Currently md devices are created when first opened and remain in existence > until the module is unloaded. > This isn't a major problem, but it somewhat ugly. > > This patch changes the lifetime rules so that an md device will > disappear on the last close if it has no state. This kills the G5: EXT3-fs: recovery complete. EXT3-fs: mounted filesystem with ordered data mode. Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=4 Modules linked in: NIP: C01A31B8 LR: C018E5DC CTR: C01A3404 REGS: c17ff4a0 TRAP: 0300 Not tainted (2.6.19-rc4-mm1) MSR: 90009032 CR: 8448 XER: DAR: 6B6B6B6B6B6B6BB3, DSISR: 4000 TASK = cff2b7f0[1899] 'nash' THREAD: c17fc000 CPU: 1 GPR00: 0008 C17FF720 C06B26D0 6B6B6B6B6B6B6B7B GPR04: 0002 0001 0001 000200D0 GPR08: 00050C00 C01A3404 C01A318C GPR12: 8444 C0535680 100FE350 100FE7B8 GPR16: GPR20: 10005CD4 1009 10007E54 GPR24: C0472C80 6B6B6B6B6B6B6B7B C1FD2530 GPR28: C7B8C2F0 6B6B6B6B6B6B6B7B C057DAE8 C79A2550 NIP [C01A31B8] .kobject_uevent+0xac/0x55c LR [C018E5DC] .__elv_unregister_queue+0x20/0x44 Call Trace: [C17FF720] [C0562508] read_pipe_fops+0xd0/0xd8 (unreliable) [C17FF840] [C018E5DC] .__elv_unregister_queue+0x20/0x44 [C17FF8D0] [C0195548] .blk_unregister_queue+0x58/0x9c [C17FF960] [C019683C] .unlink_gendisk+0x1c/0x50 [C17FF9F0] [C0122840] .del_gendisk+0x98/0x22c [C17FFA90] [C035B56C] .mddev_put+0xa0/0xe0 [C17FFB20] [C0362178] .md_release+0x84/0x9c [C17FFBA0] [C00FDDE0] .__blkdev_put+0x204/0x220 [C17FFC50] [C00C765C] .__fput+0x234/0x274 [C17FFD00] [C00C5264] .filp_close+0x6c/0xfc [C17FFD90] [C00C53B8] .sys_close+0xc4/0x178 [C17FFE30] [C000872C] syscall_exit+0x0/0x40 Instruction dump: 4e800420 0020 0250 0278 0280 0258 0260 0268 0270 3b20 2fb9 419e003c 2fa0 7c1d0378 409e0070 <7>eth0: no IPv6 routers present It happens during initscripts. The machine has no MD devices. config is at http://userweb.kernel.org/~akpm/config-g5.txt Also, it'd be nice to enable CONFIG_MUST_CHECK and take a look at a few things... drivers/md/md.c: In function `bind_rdev_to_array': drivers/md/md.c:1379: warning: ignoring return value of `kobject_add', declared with attribute warn_unused_result drivers/md/md.c:1385: warning: ignoring return value of `sysfs_create_link', declared with attribute warn_unused_result drivers/md/md.c: In function `md_probe': drivers/md/md.c:2986: warning: ignoring return value of `kobject_register', declared with attribute warn_unused_result drivers/md/md.c: In function `do_md_run': drivers/md/md.c:3135: warning: ignoring return value of `sysfs_create_group', declared with attribute warn_unused_result drivers/md/md.c:3150: warning: ignoring return value of `sysfs_create_link', declared with attribute warn_unused_result drivers/md/md.c: In function `md_check_recovery': drivers/md/md.c:5446: warning: ignoring return value of `sysfs_create_link', declared with attribute warn_unused_result - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 002 of 6] md: Change lifetime rules for 'md' devices.
Currently md devices are created when first opened and remain in existence until the module is unloaded. This isn't a major problem, but it somewhat ugly. This patch changes the lifetime rules so that an md device will disappear on the last close if it has no state. Locking rules depend on bd_mutex being held in do_open and __blkdev_put, and on setting bd_disk->private_data to 'mddev'. There is room for a race because md_probe is called early in do_open (get_gendisk) to create the mddev. As this isn't protected by bd_mutex, a concurrent call to md_close can destroy that mddev before do_open calls md_open to get a reference on it. md_open and md_close are serialised by md_mutex so the worst that can happen is that md_open finds that the mddev structure doesn't exist after all. In this case bd_disk->private_data will be NULL, and md_open chooses to exit with -EBUSY in this case, which is arguable and appropriate result. The new 'dead' field in mddev is used to track whether it is time to destroy the mddev (if a last-close happens). It is cleared when any state is create (set_array_info) and set when the array is stopped (do_md_stop). mddev_put becomes simpler. It just destroys the mddev when the refcount hits zero. This will normally be the reference held in bd_disk->private_data. cc: [EMAIL PROTECTED] Signed-off-by: Neil Brown <[EMAIL PROTECTED]> ### Diffstat output ./drivers/md/md.c | 35 +-- ./include/linux/raid/md_k.h |3 +++ 2 files changed, 28 insertions(+), 10 deletions(-) diff .prev/drivers/md/md.c ./drivers/md/md.c --- .prev/drivers/md/md.c 2006-10-31 16:41:02.0 +1100 +++ ./drivers/md/md.c 2006-10-31 16:41:14.0 +1100 @@ -226,13 +226,14 @@ static void mddev_put(mddev_t *mddev) { if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock)) return; - if (!mddev->raid_disks && list_empty(&mddev->disks)) { - list_del(&mddev->all_mddevs); - spin_unlock(&all_mddevs_lock); - blk_cleanup_queue(mddev->queue); - kobject_unregister(&mddev->kobj); - } else - spin_unlock(&all_mddevs_lock); + list_del(&mddev->all_mddevs); + spin_unlock(&all_mddevs_lock); + + blk_cleanup_queue(mddev->queue); + mddev->queue = NULL; + del_gendisk(mddev->gendisk); + mddev->gendisk = NULL; + kobject_unregister(&mddev->kobj); } static mddev_t * mddev_find(dev_t unit) @@ -273,6 +274,7 @@ static mddev_t * mddev_find(dev_t unit) atomic_set(&new->active, 1); spin_lock_init(&new->write_lock); init_waitqueue_head(&new->sb_wait); + new->dead = 1; new->queue = blk_alloc_queue(GFP_KERNEL); if (!new->queue) { @@ -3362,6 +3364,8 @@ static int do_md_stop(mddev_t * mddev, i disk = mddev->gendisk; if (disk) set_capacity(disk, 0); + mddev->dead = 1; + mddev->changed = 1; } else if (mddev->pers) printk(KERN_INFO "md: %s switched to read-only mode.\n", @@ -4293,7 +4297,8 @@ static int md_ioctl(struct inode *inode, printk(KERN_WARNING "md: couldn't set" " array info. %d\n", err); goto abort_unlock; - } + } else + mddev->dead = 0; } goto done_unlock; @@ -4377,6 +4382,8 @@ static int md_ioctl(struct inode *inode, err = -EFAULT; else err = add_new_disk(mddev, &info); + if (!err) + mddev->dead = 0; goto done_unlock; } @@ -4423,8 +4430,12 @@ static int md_open(struct inode *inode, * Succeed if we can lock the mddev, which confirms that * it isn't being stopped right now. */ - mddev_t *mddev = inode->i_bdev->bd_disk->private_data; - int err; + mddev_t *mddev; + int err = -EBUSY; + + mddev = inode->i_bdev->bd_disk->private_data; + if (!mddev) + goto out; if ((err = mutex_lock_interruptible_nested(&mddev->reconfig_mutex, 1))) goto out; @@ -4443,6 +4454,10 @@ static int md_release(struct inode *inod mddev_t *mddev = inode->i_bdev->bd_disk->private_data; BUG_ON(!mddev); + if (inode->i_bdev->bd_openers == 0 && mddev->dead) { + inode->i_bdev->bd_disk->private_data = NULL; + mddev_put(mddev); + } mddev_put(mddev); return 0; diff .prev/include/linux/raid/md_k.h ./include/linux/raid/md_k.h --- .prev/include/linux/raid/md_k.h 2006-10-31 16:40:51.000