Re: [RFD driver-core] Lifetime problems of the current driver model
On Mon, 2 Apr 2007 04:59:43 +0900, Tejun Heo [EMAIL PROTECTED] wrote: Okay, here's preliminary patch. I've tested it very lightly so it's likely to contain bugs and definitely needs to be splitted. Cool. However, there's something fishy there (not sure whether it's in your patch or a latent bug in the ccw bus code that just has been uncovered): [ cut here ] kernel BUG at mm/slab.c:607! illegal operation: 0001 [#1] CPU:1Not tainted Process kslowcrw (pid: 36, task: 01a80cc0, ksp: 01a87ce0) Krnl PSW : 040400018000 000b21b0 (kfree+0x144/0x154) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3 Krnl GPRS: 009836d8 03fffd22ffe0 000b20aa 00396ba8 01a6d100 00986c88 070001a879d8 00555428 0344a5b0 01a879c8 0344a500 00390ca8 000b20aa 01a879c8 Krnl Code: 000b21a2: e392c0480024 stg %r9,72(%r2,%r12) 000b21a8: a7f4ffc9 brc 15,1000b213a 000b21ac: a7f40001 brc 15,b21ae 000b21b0: a7f4ff9d brc 15,1000b20ea 000b21b4: e3303014 lg %r3,16(%r3) 000b21ba: a7f4ff90 brc 15,1000b20da 000b21be: 0707 bcr 0,%r7 000b21c0: eb8ff0580024 stmg %r8,%r15,88(%r15) Call Trace: ([000b20aa] kfree+0x3e/0x154) [001166f2] release_sysfs_dirent+0x3e/0xf4 [00114914] sysfs_hash_and_remove+0x150/0x154 [00118aec] remove_files+0x48/0x68 [00118b84] sysfs_remove_group+0x78/0xe8 [001cce3c] device_remove_groups+0x48/0x68 [001cd4c0] device_remove_attrs+0x3c/0x7c [001cd70e] device_del+0x20e/0x3a8 [001cd8d2] device_unregister+0x2a/0x44 [0022fb2c] css_sch_device_unregister+0x3c/0x54 [00230124] css_evaluate_subchannel+0x2f0/0x400 [0023041a] css_trigger_slow_path+0xda/0x154 [00054c9a] run_workqueue+0x136/0x1fc [00054e3a] worker_thread+0xda/0x170 [0005b05a] kthread+0x122/0x15c [00019912] kernel_thread_starter+0x6/0xc [0001990c] kernel_thread_starter+0x0/0xc This happens when I detach a device (which causes it to be unregistered). I'll look at it when I'm fully operational. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
On Mon, Apr 02, 2007 at 04:59:43AM +0900, Tejun Heo wrote: Dependencies between sysfs/kobject objects are clearer now and I think I got the locking and referencing correct. This patch immediately fixes 'sysfs attr grabbing the wrong kobject and module' problem - sysfs and module lifetimes are unrelated now. We can drop half-working attr-owner. * A sysfs node no longer hold reference to its kobject. It just attaches itself to the kobject on creation and detaches on deletion. * For a dir node, sysfs_dirent no longer directly points to kobject. It points to sysfs_dir which contains pointer to kobject and a rwsem to protect it. * An open file doesn't hold a reference to kobject. It holds a reference to sysfs_dirent. kobject pointer is verified and show/store are performed with rwsem read-locked. Deletion disconnects the sysfs from its kobject while the rwsem is write-locked. This mechanism replaces buffer orphaning and kobject validation during open. Ah, very nice. * attr ops is determined on sysfs node creation not on open. This is a step toward making kobject opaque in sysfs. * bin files are handled similarly but mmap makes the open file hold reference to the kobject till it's closed. Any better ideas? This is probably needed, and might be acceptable, and I think we can live with it as there is only a very small number of sysfs files that fall into this category. * symlink is reimplemented in terms of sysfs_dirents instead of kobjects. sysfs_dirent-s_parent is added and each sysfs_dirent holds reference to its parent. Currently walking up the tree requires read locking and unlocking sysfs_dir at each level. This can be removed if name is added to sysfs_dirent. * As kobject can be disconnected anytime and sysfs code still needs to look follow dentry link in kobject, kobject-dentry is protected by dcache_lock. Once kobject becomes opaque to sysfs, this hack can go away. All in all, making kobject completely opaque in sysfs isn't too far away after this patch although it would require mass code update. What would need to be updated after this? What do you think about this approach? If it's acceptable, I'll test further and split the patch into logical steps to get it reviewed better. At first glance, I think it looks fine, but Maneesh is the one who understands this code better than anyone, so I would like to get his opinion on it. I think you should start splitting it all up into steps, which will help us review it a whole lot easier, as overall, I think this is a very worthy goal. thanks so much for doing this work, greg k-h - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
On Mon, Apr 02, 2007 at 04:59:43AM +0900, Tejun Heo wrote: Hello, James, Greg. On Fri, Mar 30, 2007 at 01:19:34PM -0500, James Bottomley wrote: That's sort of what I was reaching for too ... it just looks to me that all the sysfs glue is in kobject, so they make a good candidate for the pure sysfs objects. Whatever we do, there has to be breakable cross linking between the driver's tree and the sysfs representation ... of course, nasty things like ksets get in the way of considering the objects as separate, sigh. The dependency between kobject and sysfs doesn't seem to be that tight. I managed to break them such that sysfs can disconnect from its kobject on deletion _without_ changing sysfs/kobject API. This way the sysfs reference counting can be put completely out of picture without impacting the rest of code. Handling symlink and suicidal attributes might need some extra attention but I think they can be done. True ... but you'll also have to implement something within sysfs to do refcounting ... it needs to know how long to keep its nodes around. sysfs_dirent already had reference counter and plays that role quite nicely. Okay, here's preliminary patch. I've tested it very lightly so it's likely to contain bugs and definitely needs to be splitted. Dependencies between sysfs/kobject objects are clearer now and I think I got the locking and referencing correct. This patch immediately fixes 'sysfs attr grabbing the wrong kobject and module' problem - sysfs and module lifetimes are unrelated now. We can drop half-working attr-owner. * A sysfs node no longer hold reference to its kobject. It just attaches itself to the kobject on creation and detaches on deletion. Earlier only sysfs leaf nodes held ref to kobject that is also only during open/close for regular files and in create/follow/destroy symlinks. * For a dir node, sysfs_dirent no longer directly points to kobject. It points to sysfs_dir which contains pointer to kobject and a rwsem to protect it. * An open file doesn't hold a reference to kobject. It holds a reference to sysfs_dirent. kobject pointer is verified and show/store are performed with rwsem read-locked. Deletion disconnects the sysfs from its kobject while the rwsem is write-locked. This mechanism replaces buffer orphaning and kobject validation during open. * attr ops is determined on sysfs node creation not on open. This is a step toward making kobject opaque in sysfs. * bin files are handled similarly but mmap makes the open file hold reference to the kobject till it's closed. Any better ideas? * symlink is reimplemented in terms of sysfs_dirents instead of kobjects. sysfs_dirent-s_parent is added and each sysfs_dirent holds reference to its parent. Currently walking up the tree requires read locking and unlocking sysfs_dir at each level. This can be removed if name is added to sysfs_dirent. * As kobject can be disconnected anytime and sysfs code still needs to look follow dentry link in kobject, kobject-dentry is protected by dcache_lock. Once kobject becomes opaque to sysfs, this hack can go away. All in all, making kobject completely opaque in sysfs isn't too far away after this patch although it would require mass code update. What do you think about this approach? If it's acceptable, I'll test further and split the patch into logical steps to get it reviewed better. It does complicate a lot in sysfs code. The code is already quite fragile considering the races (akpm is facing a couple of them, and which I am still working on) from VFS/dentry/inode angle. The previous rework of sysfs was to make dentries/inodes corresponding to sysfs leaf nodes reclaimable, i.e. looking at sysfs from VFS angle and IIUC you are looking at it from the driver layer to solve races between module/driver- layer/sysfs, which is a valid purpose. Following are the few rules in short to take care 1. directory dentries are pinned 2. regular files and symlink dentries are reclaimable 3. sysfs_dirent tree is protected using the parent inode's (ie directory dentry's inode) i_mutex 4. sysfs rename is protected using a semaphore, sysfs_rename_sem also. 5. sysfs_dirent starts with refcount of 1 at the time of creation, refcount is incremented/decremented when a dentry is attached or detached. So, the refcount at the max can be 2. 6. symlink creation pins the target kobject, so that the corresponding dentry is also pinned and removal releases the kobject. There were few more additions like sysfs poll and shadow directory support, for which respective authors can review and I can review the VFS related changes but please split the patches as you like but one possible way could be as chunks dealing with changes related to 1. data structure changes 2. creation/removal of directories 3. creation/removal of files 4. creation/removal of
Re: [RFD driver-core] Lifetime problems of the current driver model
On Mon, 2 Apr 2007 11:20:48 +0200, Cornelia Huck [EMAIL PROTECTED] wrote: Cool. However, there's something fishy there (not sure whether it's in your patch or a latent bug in the ccw bus code that just has been uncovered): Similar bug when loading/unloading a module that creates a driver attribute. The winner seems to be kfree(sd-s_element) in release_sysfs_dirent() (in case of an attribute, it will point to the attribute structure, which is usually statically created)... - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
--- Tejun Heo [EMAIL PROTECTED] wrote: Cornelia Huck wrote: On Sat, 31 Mar 2007 00:08:19 +0900, Tejun Heo [EMAIL PROTECTED] wrote: (3) make sure all existing kobjects are released by module exit function. For example, let's say there is a hypothetical disk device /dev/dk0 driven by a hypothetical driver mydrv. /dev/dk0 is represented like the following in the sysfs tree. /sys/devices/pci:00/:00:1f.0/dk0/{myknob0,myknob1} Owner of both attrs myknob0 and myknob1 is mydrv and opening either increases the reference counts of dk0 and mydrv and closing does the opposite. * When there is no opener of either knob and the /dev/dk0 isn't used by anyone. Reference count of dk0 is 1, mydrv 0. Hm, but as long as dk0 is registered, it can be looked up and someone could get a reference on it. Yeah, exactly. That's why any getting any kobject reference backed by a module must be accompanied by try_module_get(). Tejun, This has always been the case. By design, the kobject infrastructure is one such that allows for different behavior implementations. You could use it to accomplish what you want and describe in your RFD, or you could use it to implement what we have today or you could use it to implement completely different (refcount) infrastructure. Luben int mydrv_get_dk(struct dk *dk) { rc = try_module_get(mydrv); if (rc) return rc; kobject_get(dk-kobj); return 0; } * User issues rmmod mydrv. As mydrv's reference count is zero, unload proceeds and mydrv's exit function is called. * mydrv's exit function looks like the following. mydrv_exit() { sysfs_remove_file(dk0, myknob0); sysfs_remove_file(dk1, myknob1); device_del(dk0); deinit controller; release all resources; } The device_del(dk0) drops dk0's reference count to zero and its -release is invoked immediately. And here is the problem if someone else still has a reference. The module will be unloaded, but -release will not be called until the someone else gives up the reference... Exactly, in that case, module reference count must not be zero. You and I are saying the same thing. Why are we running in circle? -- tejun - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
--- James Bottomley [EMAIL PROTECTED] wrote: I'd favour trying to separate kobject and struct device for this ... move all the sysfs stuff into kobject and device only stuff into struct ^^^ Currently the kobject implementation is pure and well-defined. It is a good implementation [kobject], and I'd hate to see it lost into being convoluted with/into another model. Currently the infrastructure layers are well defined: kobject - (A layer with objects, their behavor and implementation) device - () sysfs. () This isn't that bad of an infrastructure. It is this well defined layering, i.e. objects, their behavior and implementation, that allows different (better/worse) infrastructures to be built on top of it. It is this well-defined layering which will allow what Tejun wants to be implemented. device ... but that would get us into disentangling the ksets, which, on balance, isn't going to be fun ... Luben - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
Cornelia Huck wrote: On Mon, 2 Apr 2007 11:20:48 +0200, Cornelia Huck [EMAIL PROTECTED] wrote: Cool. However, there's something fishy there (not sure whether it's in your patch or a latent bug in the ccw bus code that just has been uncovered): Similar bug when loading/unloading a module that creates a driver attribute. The winner seems to be kfree(sd-s_element) in release_sysfs_dirent() (in case of an attribute, it will point to the attribute structure, which is usually statically created)... Thanks for finding it out. I was suspecting that last minute change. The code should be if (dir node) kfree(s_element) else if (symlink node) do things and kfree() Thanks. -- tejun - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
Hello, James, Greg. On Fri, Mar 30, 2007 at 01:19:34PM -0500, James Bottomley wrote: That's sort of what I was reaching for too ... it just looks to me that all the sysfs glue is in kobject, so they make a good candidate for the pure sysfs objects. Whatever we do, there has to be breakable cross linking between the driver's tree and the sysfs representation ... of course, nasty things like ksets get in the way of considering the objects as separate, sigh. The dependency between kobject and sysfs doesn't seem to be that tight. I managed to break them such that sysfs can disconnect from its kobject on deletion _without_ changing sysfs/kobject API. This way the sysfs reference counting can be put completely out of picture without impacting the rest of code. Handling symlink and suicidal attributes might need some extra attention but I think they can be done. True ... but you'll also have to implement something within sysfs to do refcounting ... it needs to know how long to keep its nodes around. sysfs_dirent already had reference counter and plays that role quite nicely. Okay, here's preliminary patch. I've tested it very lightly so it's likely to contain bugs and definitely needs to be splitted. Dependencies between sysfs/kobject objects are clearer now and I think I got the locking and referencing correct. This patch immediately fixes 'sysfs attr grabbing the wrong kobject and module' problem - sysfs and module lifetimes are unrelated now. We can drop half-working attr-owner. * A sysfs node no longer hold reference to its kobject. It just attaches itself to the kobject on creation and detaches on deletion. * For a dir node, sysfs_dirent no longer directly points to kobject. It points to sysfs_dir which contains pointer to kobject and a rwsem to protect it. * An open file doesn't hold a reference to kobject. It holds a reference to sysfs_dirent. kobject pointer is verified and show/store are performed with rwsem read-locked. Deletion disconnects the sysfs from its kobject while the rwsem is write-locked. This mechanism replaces buffer orphaning and kobject validation during open. * attr ops is determined on sysfs node creation not on open. This is a step toward making kobject opaque in sysfs. * bin files are handled similarly but mmap makes the open file hold reference to the kobject till it's closed. Any better ideas? * symlink is reimplemented in terms of sysfs_dirents instead of kobjects. sysfs_dirent-s_parent is added and each sysfs_dirent holds reference to its parent. Currently walking up the tree requires read locking and unlocking sysfs_dir at each level. This can be removed if name is added to sysfs_dirent. * As kobject can be disconnected anytime and sysfs code still needs to look follow dentry link in kobject, kobject-dentry is protected by dcache_lock. Once kobject becomes opaque to sysfs, this hack can go away. All in all, making kobject completely opaque in sysfs isn't too far away after this patch although it would require mass code update. What do you think about this approach? If it's acceptable, I'll test further and split the patch into logical steps to get it reviewed better. Thanks. --- fs/sysfs/bin.c | 121 +++- fs/sysfs/dir.c | 134 ++-- fs/sysfs/file.c| 158 ++--- fs/sysfs/inode.c | 25 fs/sysfs/mount.c |9 --- fs/sysfs/symlink.c | 115 +++--- fs/sysfs/sysfs.h | 75 + 7 files changed, 366 insertions(+), 271 deletions(-) diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c index d3b9f5f..0c4671b 100644 --- a/fs/sysfs/bin.c +++ b/fs/sysfs/bin.c @@ -20,26 +20,41 @@ #include sysfs.h +struct bin_buffer { + struct mutexmutex; + void*buffer; + int mmapped; +}; + static int fill_read(struct dentry *dentry, char *buffer, loff_t off, size_t count) { struct bin_attribute * attr = to_bin_attr(dentry); - struct kobject * kobj = to_kobj(dentry-d_parent); + struct sysfs_dirent * sd = dentry-d_parent-d_fsdata; + struct kobject * kobj; + int rc; if (!attr-read) return -EIO; - return attr-read(kobj, buffer, off, count); + kobj = sysfs_get_kobj(sd); + if (!kobj) + return -ENODEV; + + rc = attr-read(kobj, buffer, off, count); + + sysfs_put_kobj(sd); + + return rc; } static ssize_t read(struct file * file, char __user * userbuf, size_t count, loff_t * off) { - char *buffer = file-private_data; + struct bin_buffer *bb = file-private_data; struct dentry *dentry = file-f_path.dentry; int size = dentry-d_inode-i_size; loff_t offs = *off; - int ret; if (count PAGE_SIZE) count
Re: [RFD driver-core] Lifetime problems of the current driver model
On Sat, 31 Mar 2007 12:12:48 +0900, Tejun Heo [EMAIL PROTECTED] wrote: Hm, but as long as dk0 is registered, it can be looked up and someone could get a reference on it. Yeah, exactly. That's why any getting any kobject reference backed by a module must be accompanied by try_module_get(). int mydrv_get_dk(struct dk *dk) { rc = try_module_get(mydrv); if (rc) return rc; kobject_get(dk-kobj); return 0; } This works if the caller always knows which module to grab (I was thinking about some tree-walking code). Exactly, in that case, module reference count must not be zero. You and I are saying the same thing. Why are we running in circle? I hope we're not, it just makes one dizzy :) I'll think some more about it... - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
Cornelia Huck wrote: On Sat, 31 Mar 2007 12:12:48 +0900, Tejun Heo [EMAIL PROTECTED] wrote: Hm, but as long as dk0 is registered, it can be looked up and someone could get a reference on it. Yeah, exactly. That's why any getting any kobject reference backed by a module must be accompanied by try_module_get(). int mydrv_get_dk(struct dk *dk) { rc = try_module_get(mydrv); if (rc) return rc; kobject_get(dk-kobj); return 0; } This works if the caller always knows which module to grab (I was thinking about some tree-walking code). Yeah, that's why we have so many -owner's and getting it wrong results in jumping to non-existent address. -- tejun - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFD driver-core] Lifetime problems of the current driver model
Hello, all. This document tries to describe lifetime problems of the current device driver model primarily from the point view of device drivers and establish consensus, or at least, start discussion about how to solve these problems. This is primarily based on my experience with IDE and SCSI layers and my knowledge on other drivers is limited, so I might have generalized too aggressively. Feel free to point out. Table of Contents 1. Why is it this complex? 1-1. Rule #a and #b 1-2. Rule #b and #c 2. How do we solve this? 2-1. Rule #a and #b 2-2. Rule #b and #c 3. How do we get there from here? 4. Some afterthoughts 1. Why is it this complex? Object lifetime management in device drivers is complex and thus error-prone under the current driver model. The primary reason is that device drivers have to match impedances of three different life time rules. Rule a. The device itself. Many devices are hot pluggable these days. Devices come when they come and go when they go. Rule b. Driver model object lifetime rule. Fundamental data structures used by device drivers are reference counted and their lifetime can be arbitrarily extended by userspace sysfs access. Rule c. Module life time rule. Most device drivers can be compiled as modules and thus module busy count should be managed properly. 1-1. Rule #a and #b Being what it is, the primary concern of a device driver is the 'device' itself. A device driver should create and release objects representing the devices it manages as they come and go. In one way or another, this just has to be done, so this rule is pretty obvious and intuitive to device driver writers. The current device driver model mandates implementation of rule #b in device drivers. This makes the object lifetime management complicated and, more importantly, less intuitive to driver writers. As implementing such lifetime management in each device driver is too painful, most device driver midlayers handle this. e.g. SCSI midlayer handles driver model lifetime rules and presents simplified register/unregister-immediately interface to all low level SCSI drivers. IDE tries to do similar and USB and netdev have their own mechanisms too. I'm not familiar with other layers, but it took quite a while to get this right in the SCSI midlayer. The current code works and low level drivers can do away with most of lifetime management but the added complexity is saddening to look at. 1-2. Rule #b and #c On the surface, it seems that we're doing this mostly correctly but subtle bugs caused by interaction between driver model lifetime and module lifetime rules aren't difficult to find. This happens because the two lifetime rules are closely related but the relation is not properly represented in device driver model itself. A module cannot be unloaded unless all device driver objects it hosts are released, but the only way to guarantee that is to always grab module reference whenever grabbing relevant kobject reference. This is ugly to code and too easy to get wrong. Here are two such examples. Example 1. sysfs_schedule_callback() not grabbing the owning module This function is recently added to be used by suicidal sysfs nodes such that they don't deadlock when trying to unregister themselves. +#include linux/delay.h static void sysfs_schedule_callback_work(struct work_struct *work) { struct sysfs_schedule_callback_struct *ss = container_of(work, struct sysfs_schedule_callback_struct, work); + msleep(100); (ss-func)(ss-data); kobject_put(ss-kobj); kfree(ss); } int sysfs_schedule_callback(struct kobject *kobj, void (*func)(void *), void *data) { struct sysfs_schedule_callback_struct *ss; ss = kmalloc(sizeof(*ss), GFP_KERNEL); if (!ss) return -ENOMEM; kobject_get(kobj); ss-kobj = kobj; ss-func = func; ss-data = data; INIT_WORK(ss-work, sysfs_schedule_callback_work); schedule_work(ss-work); return 0; } Two lines starting with '+' are inserted to make the problem reliably reproducible. With the above changes, # insmod drivers/scsi/scsi_mod.ko; insmod drivers/scsi/sd_mod.ko; insmod drivers/ata/libata.ko; insmod drivers/ata/ahci.ko # echo 1 /sys/block/sda/device/delete; rmmod ahci; rmmod libata; rmmod sd_mod; rmmod scsi_mod It's assumed that ahci detects /dev/sda. The above command sequence causes the following oops. BUG: unable to handle kernel paging request at virtual address e0984020 [--snip--] EIP is at 0xe0984020 [--snip--] [c0127132] run_workqueue+0x92/0x140 [c01278c7] worker_thread+0x137/0x160 [c012a513] kthread+0xa3/0xd0 [c0104457] kernel_thread_helper+0x7/0x10 The problem here is that kobjec_get() in sysfs_schedule_callback() doesn't grab the module backing the kobject it's grabbing. By the time (ss-func)(ss-kobj) runs, scsi_mod
Re: [RFD driver-core] Lifetime problems of the current driver model
On Fri, 2007-03-30 at 18:43 +0900, Tejun Heo wrote: Orphaning sysfs nodes on unregistration is a big step in this direction. With sysfs reference counting out of the picture, implementing 'disconnect immediate' interface only on a few components (including request_queue) should suffice for most block device drivers. I'm not familiar with other drivers but I don't think they'll be very different. I agree with this statement. The big question in my mind is how do we do it? The essential problem, and the reason why the lifetime rules are entangled is that fundamentally, sysfs entries are managed by kobjects. The things the device drivers are interested in is struct device, which is usually embedded in driver data structures. Unfortunately, the required kobject is usually embedded in struct device meaning that the relevant driver structure cannot be freed while the sysfs entry still exists. It seems to me that the only way to Orphan the sysfs entries is to separate the kobject and the struct device so their lifetime rules are no longer entangled. Then we can free the driver structure with the embedded struct device while still keeping the necessary kobject around to perform orphaned sysfs operations. So it seems to me that what we need to do is to give struct device its own kref and a pointer to a kobject. Then we can manage the separate lifetimes independently. The device would basically allocate and take a reference to the kobject on device_add() and relinquish it again (and null out the kobject pointer) on device_del(). The complexity here is that we now have to allocate the kobject somewhere else ... probably in device_add() (it doesn't come for free with the device structures). James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
Hi James, On 3/30/07, James Bottomley [EMAIL PROTECTED] wrote: On Fri, 2007-03-30 at 18:43 +0900, Tejun Heo wrote: Orphaning sysfs nodes on unregistration is a big step in this direction. With sysfs reference counting out of the picture, implementing 'disconnect immediate' interface only on a few components (including request_queue) should suffice for most block device drivers. I'm not familiar with other drivers but I don't think they'll be very different. I agree with this statement. The big question in my mind is how do we do it? The essential problem, and the reason why the lifetime rules are entangled is that fundamentally, sysfs entries are managed by kobjects. The things the device drivers are interested in is struct device, which is usually embedded in driver data structures. Unfortunately, the required kobject is usually embedded in struct device meaning that the relevant driver structure cannot be freed while the sysfs entry still exists. It seems to me that the only way to Orphan the sysfs entries is to separate the kobject and the struct device so their lifetime rules are no longer entangled. Then we can free the driver structure with the embedded struct device while still keeping the necessary kobject around to perform orphaned sysfs operations. So it seems to me that what we need to do is to give struct device its own kref and a pointer to a kobject. Then we can manage the separate lifetimes independently. The device would basically allocate and take a reference to the kobject on device_add() and relinquish it again (and null out the kobject pointer) on device_del(). The complexity here is that we now have to allocate the kobject somewhere else ... probably in device_add() (it doesn't come for free with the device structures). If you want to manage lifetime rules independently you might want to not embed struct device into you subsystems objects but attach them via pointers and use device_create(). Now that we orphan sysfs access upon unregistering device this will severe all ties from driver core to your system once you start teardown of a device and you should be in clear. However there are simpler subsystems (input, serio, etc.) where there is only one core module which provides services to device drivers and handles registration and deregistration. For such substustems it makes sense to embed struct devices and manage lifetime for all components at once. -- Dmitry - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
On Fri, 30 Mar 2007 18:43:02 +0900, Tejun Heo [EMAIL PROTECTED] wrote: One way to solve this problem is to subordinate lifetime rule #b to rule #c. Each kobject points to its owning module such that grabbing a kobject automatically grabs the module. The problem with this approach is that it requires wide update and makes kobject_get heavier. Shouldn't getting/putting the module refcount be solely done in kobject.c? Grab the module reference when the kobject is created and release the module reference in kobject_cleanup() after the release function has been called. This doesn't make kobject_get() heavier, and it ensures we don't delete the module until after the last kobject it is supposed to clean up has been released. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
Cornelia Huck wrote: On Fri, 30 Mar 2007 18:43:02 +0900, Tejun Heo [EMAIL PROTECTED] wrote: One way to solve this problem is to subordinate lifetime rule #b to rule #c. Each kobject points to its owning module such that grabbing a kobject automatically grabs the module. The problem with this approach is that it requires wide update and makes kobject_get heavier. Shouldn't getting/putting the module refcount be solely done in kobject.c? Grab the module reference when the kobject is created and release the module reference in kobject_cleanup() after the release function has been called. This doesn't make kobject_get() heavier, and it ensures we don't delete the module until after the last kobject it is supposed to clean up has been released. If we do that, we wouldn't be able to unload a module if there is any kobject referencing it even when the node has no openers, so no easy way out there. :-( -- tejun - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
James Bottomley wrote: On Fri, 2007-03-30 at 18:43 +0900, Tejun Heo wrote: Orphaning sysfs nodes on unregistration is a big step in this direction. With sysfs reference counting out of the picture, implementing 'disconnect immediate' interface only on a few components (including request_queue) should suffice for most block device drivers. I'm not familiar with other drivers but I don't think they'll be very different. I agree with this statement. The big question in my mind is how do we do it? The essential problem, and the reason why the lifetime rules are entangled is that fundamentally, sysfs entries are managed by kobjects. The things the device drivers are interested in is struct device, which is usually embedded in driver data structures. Unfortunately, the required kobject is usually embedded in struct device meaning that the relevant driver structure cannot be freed while the sysfs entry still exists. It seems to me that the only way to Orphan the sysfs entries is to separate the kobject and the struct device so their lifetime rules are no longer entangled. Then we can free the driver structure with the embedded struct device while still keeping the necessary kobject around to perform orphaned sysfs operations. So it seems to me that what we need to do is to give struct device its own kref and a pointer to a kobject. Then we can manage the separate lifetimes independently. The device would basically allocate and take a reference to the kobject on device_add() and relinquish it again (and null out the kobject pointer) on device_del(). The complexity here is that we now have to allocate the kobject somewhere else ... probably in device_add() (it doesn't come for free with the device structures). My plan was to make sysfs more independent from struct device/kobject. e.g. Something like the following. * kobject_get() on attr/symlink creation * open() doesn't need extra kobject reference * deleting a node makes it release the kobject reference and the kobject pointer is nullified. This way the sysfs reference counting can be put completely out of picture without impacting the rest of code. Handling symlink and suicidal attributes might need some extra attention but I think they can be done. In the long term, I think sysfs should be independent from driver model and kobject such that an entity which wants to use sysfs but is not a device doesn't have to dance with kobject just to use sysfs. Thanks. -- tejun - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
On Fri, 30 Mar 2007 22:19:52 +0900, Tejun Heo [EMAIL PROTECTED] wrote: Shouldn't getting/putting the module refcount be solely done in kobject.c? Grab the module reference when the kobject is created and release the module reference in kobject_cleanup() after the release function has been called. This doesn't make kobject_get() heavier, and it ensures we don't delete the module until after the last kobject it is supposed to clean up has been released. If we do that, we wouldn't be able to unload a module if there is any kobject referencing it even when the node has no openers, so no easy way out there. :-( But we must not unload the module when there is still a kobject around referencing a release function in the module, or we will oops if the kobject is finally released. If needed, the module must clean up its kobjects in its exit function. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
Cornelia Huck wrote: On Fri, 30 Mar 2007 22:19:52 +0900, Tejun Heo [EMAIL PROTECTED] wrote: Shouldn't getting/putting the module refcount be solely done in kobject.c? Grab the module reference when the kobject is created and release the module reference in kobject_cleanup() after the release function has been called. This doesn't make kobject_get() heavier, and it ensures we don't delete the module until after the last kobject it is supposed to clean up has been released. If we do that, we wouldn't be able to unload a module if there is any kobject referencing it even when the node has no openers, so no easy way out there. :-( But we must not unload the module when there is still a kobject around referencing a release function in the module, or we will oops if the kobject is finally released. If needed, the module must clean up its kobjects in its exit function. It's a little bit more convoluted than that. Module reference count of zero doesn't indicate that there is no one referencing the module. It just means that the module can be unloaded. ie. There still can be any number of kobjects with release function backed by the module but as long as all of them can be deleted and released by module exit function, the module is unloadable at that point. IOW, module reference count does not count number of objects depending on the module. It counts the number of active usages of those objects. -- tejun - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
On Fri, 30 Mar 2007 22:58:39 +0900, Tejun Heo [EMAIL PROTECTED] wrote: It's a little bit more convoluted than that. Module reference count of zero doesn't indicate that there is no one referencing the module. It just means that the module can be unloaded. ie. There still can be any number of kobjects with release function backed by the module but as long as all of them can be deleted and released by module exit function, the module is unloadable at that point. IOW, module reference count does not count number of objects depending on the module. It counts the number of active usages of those objects. We must make sure that the module is never deleted while there may be calls to -release functions - the exit function can only return when all -release calls have returned. This can be guaranteed if we (1) don't allow the module to unload if there are outstanding kobjects (we may need a self destruct knob then) or (2) make sure the -release functions are outside of the module (see, for example, drivers/s390/s390_rdev.c). (Gah, that stuff is always giving me headaches. Sorry if I'm not making sense...) - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
On Fri, Mar 30, 2007 at 06:43:02PM +0900, Tejun Heo wrote: Hello, all. This document tries to describe lifetime problems of the current device driver model primarily from the point view of device drivers and establish consensus, or at least, start discussion about how to solve these problems. This is primarily based on my experience with IDE and SCSI layers and my knowledge on other drivers is limited, so I might have generalized too aggressively. Feel free to point out. Very nice summary of the problems we are trying to address here, thanks for doing this. 2. How do we solve this? 2-1. Rule #a and #b If you think about it, the impedance matching between rule #a and #b should be done somewhere. Maybe doing it in each low level driver is what's intended when the driver model is designed but it's impractically painful and the reality reflects that by doing it in midlayers. It's better if we do it in higher layers. For example, SCSI has a mechanism to reject new requests and drain request_queue to allow low level driver to just unregister an existing device. IDE currently doesn't have such feature but it would need to do almost the same thing to support hotplug. If request_queue just had a feature to drain and kill itself, both SCSI and IDE could use it. It would be simpler and less error-prone. On the other hand, if it's pushed downward, it will cause much more pain in all the low level drivers. No, we should not push this kind of thing down to the lower drivers. In fact, my goal is to try to keep a normal driver writer from having to know anything about the driver model, except for how to add and remove a sysfs file that is bound to the device controlled by the driver. And for the majority of the busses, this is true, it's only people like you who are mucking around in the bus specific code that know how hard this whole thing is :) Orphaning sysfs nodes on unregistration is a big step in this direction. With sysfs reference counting out of the picture, implementing 'disconnect immediate' interface only on a few components (including request_queue) should suffice for most block device drivers. I'm not familiar with other drivers but I don't think they'll be very different. All in all, I'm hoping something like the following can be done in device drivers, midlayer or low level. * For binding alloc resources; init controller; register to upper layers; * For unbinding unregister from upper layers; (no lingering references or objects) deinit controller; release resources; This basically nullifies lifetime rule #b from the POV of drivers. I agree, that should be our goal. And for almost all busses, this is true today, right? 2-2. Rule #b and #c One way to solve this problem is to subordinate lifetime rule #b to rule #c. Each kobject points to its owning module such that grabbing a kobject automatically grabs the module. The problem with this approach is that it requires wide update and makes kobject_get heavier. Why would kobject_get become heavier? Another more appealing approach is to do nothing. Solving the problem between rule #a and #b in the way described above means virtually nullifying rule #b. With rule #b gone, rule #c can't conflict with it. IOW, no reference from upper layer would be remaining after a driver unregisters itself from the upper layer - the module can be safely unloaded. Yes, but you have to watch out for devices that add their own files with callbacks to that driver. That is a problem today that Oliver has tried to address with his recent patches. 3. How do we get there from here? The current device driver model is used by most device drivers - huge amount of code. We can't update them at once. No, the majority of the logic is in the busses, and we can update them pretty easily. Fortunately, the suggested solution can easily be implemented gradually. We can add 'disconnect now' interface to upper driver interface one-by-one and convert its users gradually. The existing reference counting mechanism can be left alone and used to verify that the conversion is correct by verifying reference count is 0 after unregistering. Once the conversion is complete (maybe after a year), we can remove the reference counting mechanism from device driver interface. I'd be interested in seeing how we can do this. Do you have any specific patches started in this area? 4. Some afterthoughts * Doing this would make module reference counting more flexible. Instead of being confined by implementation, it can be used to define when to allow and disallow module unloading. (you can't unload a block driver module if a fs on it is mounted but sysfs access doesn't matter.) Well, remember, module unloading is racy and unrecommended to use anyway :) Also, look at the network drivers who all have their own special type of lock, with no module reference count increment when
Re: [RFD driver-core] Lifetime problems of the current driver model
On Fri, Mar 30, 2007 at 10:38:00PM +0900, Tejun Heo wrote: James Bottomley wrote: On Fri, 2007-03-30 at 18:43 +0900, Tejun Heo wrote: Orphaning sysfs nodes on unregistration is a big step in this direction. With sysfs reference counting out of the picture, implementing 'disconnect immediate' interface only on a few components (including request_queue) should suffice for most block device drivers. I'm not familiar with other drivers but I don't think they'll be very different. I agree with this statement. The big question in my mind is how do we do it? The essential problem, and the reason why the lifetime rules are entangled is that fundamentally, sysfs entries are managed by kobjects. The things the device drivers are interested in is struct device, which is usually embedded in driver data structures. Unfortunately, the required kobject is usually embedded in struct device meaning that the relevant driver structure cannot be freed while the sysfs entry still exists. It seems to me that the only way to Orphan the sysfs entries is to separate the kobject and the struct device so their lifetime rules are no longer entangled. Then we can free the driver structure with the embedded struct device while still keeping the necessary kobject around to perform orphaned sysfs operations. So it seems to me that what we need to do is to give struct device its own kref and a pointer to a kobject. Then we can manage the separate lifetimes independently. The device would basically allocate and take a reference to the kobject on device_add() and relinquish it again (and null out the kobject pointer) on device_del(). The complexity here is that we now have to allocate the kobject somewhere else ... probably in device_add() (it doesn't come for free with the device structures). My plan was to make sysfs more independent from struct device/kobject. e.g. Something like the following. * kobject_get() on attr/symlink creation * open() doesn't need extra kobject reference * deleting a node makes it release the kobject reference and the kobject pointer is nullified. Hm, this sounds good, but I think it will be racy. Although separating the very tight tie between sysfs and kobject would be very good to have. Pat originally wanted to do that years ago, but he's now lost to the land of a million sheep... So I don't object to this goal at all. This way the sysfs reference counting can be put completely out of picture without impacting the rest of code. Handling symlink and suicidal attributes might need some extra attention but I think they can be done. Ok, I'll but I really want to see that code :) In the long term, I think sysfs should be independent from driver model and kobject such that an entity which wants to use sysfs but is not a device doesn't have to dance with kobject just to use sysfs. I agree. thanks, greg k-h - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
On Fri, 2007-03-30 at 09:15 -0400, Dmitry Torokhov wrote: If you want to manage lifetime rules independently you might want to not embed struct device into you subsystems objects but attach them via pointers and use device_create(). Now that we orphan sysfs access upon unregistering device this will severe all ties from driver core to your system once you start teardown of a device and you should be in clear. But that wouldn't really help ... all objects have lifetimes. What Tejun is pointing out is that we have two different lifetime requirements: driver internal (and subsystem) objects and sysfs objects ... we still need something embedded in the driver objects, so it may as well be struct device ... trying to manage struct device outside of the driver objects would turn into another nasty refcounting problem. The struct device is usually the generic abstraction of the specific structure it's embedded in, so I think it really does make sense to keep these pieces together. However there are simpler subsystems (input, serio, etc.) where there is only one core module which provides services to device drivers and handles registration and deregistration. For such substustems it makes sense to embed struct devices and manage lifetime for all components at once. James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
On 3/30/07, James Bottomley [EMAIL PROTECTED] wrote: On Fri, 2007-03-30 at 09:15 -0400, Dmitry Torokhov wrote: If you want to manage lifetime rules independently you might want to not embed struct device into you subsystems objects but attach them via pointers and use device_create(). Now that we orphan sysfs access upon unregistering device this will severe all ties from driver core to your system once you start teardown of a device and you should be in clear. But that wouldn't really help ... all objects have lifetimes. What Tejun is pointing out is that we have two different lifetime requirements: driver internal (and subsystem) objects and sysfs objects ... Exactly. If driver-private data structures have different lifetime than device abstractions (like they seem to have with scsi and ide) then you split them and refcount separately. We are just arguing where to put the line. You want to split devices and kobjects and rework infrastructure and I am saying that we already have infrastructure we need and that split shoud be between generic device structure and driver-managed device structure. -- Dmitry - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
Cornelia Huck wrote: On Sat, 31 Mar 2007 00:08:19 +0900, Tejun Heo [EMAIL PROTECTED] wrote: (3) make sure all existing kobjects are released by module exit function. For example, let's say there is a hypothetical disk device /dev/dk0 driven by a hypothetical driver mydrv. /dev/dk0 is represented like the following in the sysfs tree. /sys/devices/pci:00/:00:1f.0/dk0/{myknob0,myknob1} Owner of both attrs myknob0 and myknob1 is mydrv and opening either increases the reference counts of dk0 and mydrv and closing does the opposite. * When there is no opener of either knob and the /dev/dk0 isn't used by anyone. Reference count of dk0 is 1, mydrv 0. Hm, but as long as dk0 is registered, it can be looked up and someone could get a reference on it. Yeah, exactly. That's why any getting any kobject reference backed by a module must be accompanied by try_module_get(). int mydrv_get_dk(struct dk *dk) { rc = try_module_get(mydrv); if (rc) return rc; kobject_get(dk-kobj); return 0; } * User issues rmmod mydrv. As mydrv's reference count is zero, unload proceeds and mydrv's exit function is called. * mydrv's exit function looks like the following. mydrv_exit() { sysfs_remove_file(dk0, myknob0); sysfs_remove_file(dk1, myknob1); device_del(dk0); deinit controller; release all resources; } The device_del(dk0) drops dk0's reference count to zero and its -release is invoked immediately. And here is the problem if someone else still has a reference. The module will be unloaded, but -release will not be called until the someone else gives up the reference... Exactly, in that case, module reference count must not be zero. You and I are saying the same thing. Why are we running in circle? -- tejun - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD driver-core] Lifetime problems of the current driver model
Tejun Heo wrote: Cornelia Huck wrote: On Sat, 31 Mar 2007 00:08:19 +0900, Tejun Heo [EMAIL PROTECTED] wrote: (3) make sure all existing kobjects are released by module exit function. For example, let's say there is a hypothetical disk device /dev/dk0 driven by a hypothetical driver mydrv. /dev/dk0 is represented like the following in the sysfs tree. /sys/devices/pci:00/:00:1f.0/dk0/{myknob0,myknob1} Owner of both attrs myknob0 and myknob1 is mydrv and opening either increases the reference counts of dk0 and mydrv and closing does the opposite. * When there is no opener of either knob and the /dev/dk0 isn't used by anyone. Reference count of dk0 is 1, mydrv 0. Hm, but as long as dk0 is registered, it can be looked up and someone could get a reference on it. Yeah, exactly. That's why any getting any kobject reference backed by a module must be accompanied by try_module_get(). int mydrv_get_dk(struct dk *dk) { rc = try_module_get(mydrv); if (rc) return rc; kobject_get(dk-kobj); return 0; } And one more thing just in case. In the above code, try_module_get() and kobject_get() must be and is atomic w.r.t. try_stop_module(). That's why we do the following. stop_machine_run(__try_stop_module, sref, NR_CPUS);. -- tejun - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html