On Tue, 10 Dec 2013, Greg KH wrote: > On Sun, Nov 03, 2013 at 08:33:08PM +0100, Sebastian Andrzej Siewior wrote: > > I run a couple times into the case where "put" was called on an already > > cleanup object. The results was either nothing because "0" went back to > > 0xff…ff and release was not called a second time or some thing like this > > with SLAB once that memory region was used again: > > > > |edma-dma-engine edma-dma-engine.0: freeing channel for 57 > > |Slab corruption (Not tainted): kmalloc-256 start=edc4c8c0, len=256 > > |070: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkjkkkkkkkkkkk > > |Single bit error detected. Probably bad RAM. > > |Run a memory test tool. > > |Prev obj: start=edc4c7c0, len=256 > > |000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk > > |010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk > > |Next obj: start=edc4c9c0, len=256 > > |000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk > > |010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk > > > > which got me a little confused about the bit flip at first. > > The reason for this was resource counting that went wrong followed by a > > "put" > > called from two places. > > After the second time running into the same problem using the same driver, > > I was looking for something to help me to find atleast one caller (and the > > kind of object) a little quicker. Here is a debugobject extension to kref > > which > > seems to do the job. > > At kref_init() the debugobject is initialized and activated. > > At kref_get() / kref_sub() time it is checked if the kref is active. If > > it is, the request is completed otherwise the "usual" debugobject is > > printed. Here an example of kref_put() on an already gone object: > > > > |edma-dma-engine edma-dma-engine.0: freeing channel for 57 > > |------------[ cut here ]------------ > > |WARNING: CPU: 0 PID: 2053 at lib/debugobjects.c:260 > > debug_print_object+0x94/0xc4() > > |ODEBUG: active_state not available (active state 0) object type: kref > > hint: (null) > > |Modules linked in: ti_am335x_adc(-) > > |[<c0014d38>] (unwind_backtrace+0x0/0xf4) from [<c001249c>] > > (show_stack+0x14/0x1c) > > |[<c001249c>] (show_stack+0x14/0x1c) from [<c0037264>] > > (warn_slowpath_common+0x64/0x84) > > |[<c0037264>] (warn_slowpath_common+0x64/0x84) from [<c0037318>] > > (warn_slowpath_fmt+0x30/0x40) > > |[<c0037318>] (warn_slowpath_fmt+0x30/0x40) from [<c022e8d0>] > > (debug_print_object+0x94/0xc4) > > |[<c022e8d0>] (debug_print_object+0x94/0xc4) from [<c022e9e4>] > > (debug_object_active_state+0xe4/0x148) > > |[<c022e9e4>] (debug_object_active_state+0xe4/0x148) from [<bf0a3474>] > > (iio_buffer_put+0x24/0x70 [industrialio]) > > |[<bf0a3474>] (iio_buffer_put+0x24/0x70 [industrialio]) from [<bf0a0340>] > > (iio_dev_release+0x44/0x64 [industrialio]) > > |[<bf0a0340>] (iio_dev_release+0x44/0x64 [industrialio]) from [<c0290308>] > > (device_release+0x2c/0x94) > > |[<c0290308>] (device_release+0x2c/0x94) from [<c021f040>] > > (kobject_release+0x44/0x78) > > |[<c021f040>] (kobject_release+0x44/0x78) from [<c029600c>] > > (release_nodes+0x1a0/0x248) > > |[<c029600c>] (release_nodes+0x1a0/0x248) from [<c029355c>] > > (__device_release_driver+0x98/0xf0) > > |[<c029355c>] (__device_release_driver+0x98/0xf0) from [<c0293668>] > > (driver_detach+0xb4/0xb8) > > |[<c0293668>] (driver_detach+0xb4/0xb8) from [<c0292538>] > > (bus_remove_driver+0x98/0xec) > > |[<c0292538>] (bus_remove_driver+0x98/0xec) from [<c008fe2c>] > > (SyS_delete_module+0x1e0/0x24c) > > |[<c008fe2c>] (SyS_delete_module+0x1e0/0x24c) from [<c000e680>] > > (ret_fast_syscall+0x0/0x48) > > |---[ end trace bc5e9551626b178a ]--- > > > > This should help to notice that there is a second "put" and the > > call chain should point to the object. The "hint" callback is empty because > > in the "double free" case we don't have any information and the release > > function is passed as an argument to kref_put(). I think the information > > is quite good and it is not worth extending debugobject to somehow add > > this information. > > The "get/put unless" macros aren't traced completely because it is hard > > to get it correct without a false positive and without touching each > > user. The object might be added to a list which is browsed by someone. > > That someone holds the same lock that is required for in the cleanup path > > and so the cleanup is blocked. That means that the kref object is > > gone from debugobject point of view but the release function has not yet > > complete so it is valid to check the current counter. > > > > v1…v2: > > - not an RFC anymore > > - addressed tglx review: > > - use debug_obj_descr with state active > > - use debug_object_active_state() to check for active object instead the > > other hack I had. > > - added debug_object_free() in a way that does not interfere with the > > NSA sniffer API so it does not get removed from the patch by accident. > > > > Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de> > > I need an ack from Thomas or some other debugobjects-knowledgable > developer before I can take this...
Reviwed-by: Thomas Gleixner <t...@linutronix.de>