On Sat, 2018-06-30 at 20:57 -0700, Linus Torvalds wrote: > On Sat, Jun 30, 2018 at 8:42 PM Benjamin Herrenschmidt > <b...@kernel.crashing.org> wrote: > > > > But what if something *else* still holds a reference to the kobject ? > > It could be anything really... t > > But that's fine. Then the object will continue to exist, and the sysfs > file will continue to exist, and you won't get a new glue directory.
I suspect you didn't read it my entire argument or I wasn't clear enough :-) This is actually the crux of the problem: Yes the object continues to exist. However, the *last* kobject_put to it will no longer be done under whatever higher level locking the subsystem provides (whatever prevents for example concurrent add and removes). Thus in that scenario the "last minute" kobject_release() done by the last kobject_put() will be effectively unprotected from for example the gdp_mutex (in the case of the gluedirs) or whatever other locking the subsystem owning the kobject is using to avoid making that "refount 0" object "discoverable". So not only the conitnued existence of the object will potentially trigger the duplicate sysfs name problem, but it *will* also re-open the window for the object to be temporarily be visible with a 0 refcount. The kobject debugging doesn't create a new race here. It just significantly increase the size of an existing race window. You argued yourself that the reason that window is a non-issue without kobject debugging is that the expectation is that the release happens synchronously with the last kobject_put done by device_del, which has appropriate higher-level locking. My point is that this specific assumption doesn't hold in the case where another reference exists. In that case the object will be freed with the race open for others to observe it while its refcount is 0. > In fact, what you describe is a problem with *your* patch, exactly > because you introduce another counter that is *not* the reference > count, and now you have the problem with "old directory kobject is > still live, but I removed the sysfs part, and now I'm creating a new > object with the same name". > > Hmm? So my patch 1/2 prevents us from finding the old dying object (and thus from crashing) but replaces this with the duplicate name problem. My patch 2/2 removes that second problem by ensuring we remove the object from sysfs synchronously in device_del when it no longer contains any children, explicitely rather than implicitely by the virtue of doing the "last" kobject_put. You'll notice the existing code wraps that kobject_put() with the gdp_mutex, specifically I suppose to synronize with the creation path of the gluedir, ie with the expectation that this kobject_put() will synchronously remove the object from the kset. I argue this assumption is flawed, that another reference could delay the removal from the kset to a point where the mutex isn't held, and thus the race re-opens. My patch doesn't implement a separate "refcount" per-se, it implements a "childcount". This is akin to mm_users vs mm_count. We want the glue dir to be removed when it has no more children, we currently "conflate" this with the kobject having no reference. This is what I argue is incorrect. The kobject can have "unrelated" references, that define the lifetime of the object in memory, but it's presence in sysfs is dictated by the number of children it contains. > > It is and there's a WARN_ON about it inside kobject_get(). I don't > > think anybody argues against that, you are absolutely right. > > No. That the zero kobject_get() will not result in a warning. It just > does a kref_get(), no warnings anywhere. It's there but it's in refcount: void refcount_inc(refcount_t *r) { WARN_ONCE(!refcount_inc_not_zero(r), "refcount_t: increment on 0; use-after-free.\n"); } EXPORT_SYMBOL(refcount_inc); In fact that's how I started digging into that problem in the first place :-) > Yes, there is a kobject_get() warning, but that's about an > _uninitialized_ kobject, not a already released one! You will get no > warning that I can see from the "oh, you just got a stale one". Cheers, Ben. > Linus