On Wednesday, October 27, 2010, Rafael J. Wysocki wrote: > On Tuesday, October 26, 2010, Mathieu Desnoyers wrote: > > * Alan Stern ([email protected]) wrote: > > > On Tue, 26 Oct 2010, Mathieu Desnoyers wrote: > > > > > > > * Peter Zijlstra ([email protected]) wrote: > > > > > On Tue, 2010-10-26 at 11:56 -0500, Pierre Tardy wrote: > > > > > > > > > > > > + trace_runtime_pm_usage(dev, > > > > > > atomic_read(&dev->power.usage_count)+1); > > > > > > atomic_inc(&dev->power.usage_count); > > > > > > > > > > That's terribly racy.. > > > > > > > > Looking at the original code, it looks racy even without considering the > > > > tracepoint: > > > > > > > > int __pm_runtime_get(struct device *dev, bool sync) > > > > { > > > > int retval; > > > > > > > > + trace_runtime_pm_usage(dev, > > > > atomic_read(&dev->power.usage_count)+1); > > > > atomic_inc(&dev->power.usage_count); > > > > retval = sync ? pm_runtime_resume(dev) : pm_request_resume(dev); > > > > > > > > There is no implied memory barrier after "atomic_inc". So either all > > > > these > > > > inc/dec are protected with mutexes or spinlocks, in which case one > > > > might wonder > > > > why atomic operations are used at all, or it's a racy mess. (I vote for > > > > the > > > > second option) > > > > > > I don't understand. What's the problem? The inc/dec are atomic > > > because they are not protected by spinlocks, but everything else is > > > (aside from the tracepoint, which is new). > > > > > > > kref should certainly be used there. > > > > > > What for? > > > > kref has the following "get": > > > > atomic_inc(&kref->refcount); > > smp_mb__after_atomic_inc(); > > > > What seems to be missing in __pm_runtime_get() and > > pm_runtime_get_noresume() is > > the memory barrier after the atomic increment. The atomic increment is free > > to > > be reordered into the following spinlock (within pm_request_resume or > > pm_request > > resume execution) because taking a spinlock only acts as a memory barrier > > with > > acquire semantic, not a full memory barrier. > > > > So AFAIU, the failure scenario would be as follows (sorry for the 80+ > > columns): > > > > initial conditions: usage_count = 1 > > > > CPU A CPU B > > 1) __pm_runtime_get() (sync = true) > > 2) atomic_inc(&usage_count) (not committed to memory yet) > > 3) pm_runtime_resume() > > 4) spin_lock_irqsave(&dev->power.lock, flags); > > 5) retval = __pm_request_resume(dev); > > If sync = true this is > retval = __pm_runtime_resume(dev); > which drops and reacquires the spinlock. In the meantime it sets > ->power.runtime_status so that __pm_runtime_idle() will fail if run at this > point. > > > 6) (execute the body of __pm_request_resume and return) > > 7) > > __pm_runtime_put() (sync = true) > > 8) if > > (atomic_dec_and_test(&dev->power.usage_count)) > > (still see > > usage_count == 1 before decrement, > > thus > > decrement to 0) > > 9) > > pm_runtime_idle() > > 10) spin_unlock_irqrestore(&dev->power.lock, flags) > > 11) > > spin_lock_irq(&dev->power.lock); > > 12) retval = > > __pm_runtime_idle(dev); > > Moreover, __pm_runtime_idle() checks ->power.usage_count under the spinlock, > so it will see it's been incremented in the meantime and it will back off. > > > 13) > > spin_unlock_irq(&dev->power.lock); > > > > So we end up in a situation where CPU A expects the device to be resumed, > > but > > the last action performed has been to bring it to idle. > > > > A smp_mb__after_atomic_inc() between lines 2 and 3 would fix this. > > I don't think this particular race is possible. However, there is another one > that seems to be possible (in a different function) that an explicit barrier > will prevent from happening. > > It's related to pm_runtime_get_noresume(), but I think it's better to put the > barrier where it's necessary rather than into pm_runtime_get_noresume() > itself.
Actually, no. Since rpm_idle() and rpm_suspend() both check usage_count under the spinlock, the race I was thinking about doesn't appear to be possible after all. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html
