On Fri, Sep 5, 2014 at 9:59 AM, Mark Rutland <mark.rutl...@arm.com> wrote: > > As you point out below, the race on event->ctx is the fundamental issue. That > is what results in decrementing the refcount twice (once on a stale event->ctx > pointer).
So quite frankly, the whole perf_pmu_migrate_context() thing looks completely and fundamentally broken. Your patch doesn't really solve anything in general, and would need to be extended to do that get_online_cpus() around every single case where we read it. Which isn't really acceptable. perf_pmu_migrate_context() is just f*cked up. It needs to be reverted. It seems to think that "the event->ctx" pointer is a RCU-protected pointer, but it really isn't. Its lifetime is protected by refcounting, and the pointer access itself isn't protected by anything, because "event->ctx" isn't supposed to change over the lifetime of "event". Nobody does all the necessary rcu_read_[un]lock() around it, nor access it with rcu_dereference(), because all users know that it's just fixed. Quite frankly, I think commit 0cda4c023132 ("perf: Introduce perf_pmu_migrate_context()") is unfixably broken. The whole thing needs to be re-thought. It violates everything that everybody else does with the context. No amount of (sane) locking can fix the breakage, I suspect. A possible (but unfortunate) real fix is to try to keep the broken code around, and make its broken assumptions be actually true. Make "event->ctx" truly be an RCU pointer. Add the RCU annotations, add the required rcu read-locking, and make everything really do what that currently completely broken perf_pmu_migrate_context() thinks it does. That implies, for example, that any time you use the thing in a sleeping context, you'd need to do something like this (possibly with a helper function to do that: "get_event_ctx()" rcu_read_lock(); ctx = rcu_dereference(event->rcu); get_ctx(ctx); rcu_read_unlock(); .. you can now sleep here and use 'ctx' ... put_ctx(ctx); and if you don't need to sleep, you can just do rcu_read_lock(); ctx = rcu_dereference(event->ctx); .. use ctx in an atomic context ... rcu_read_unlock(); but that is a much bigger patch than either of the "introduce a completely broken ad-hoc lock around just one random use case" patches. Alternatively (and this sounds like the better fix), use some way to avoid that broken perf_pmu_migrate_context() model entirely. Never "migrate" events. Create completely new ones on the new CPU, and leave the old stale ones alone even on dead CPU's (they will presumably not actually be activated, and who the hell cares?). Or even just say: "if somebody takes down a CPU with existing uncore events, those events are now effectively dead". Don't try to migrate them, don't try to create new events on another CPU, just let it go. The CPU is down, the events are inactive. Keep it simple. Not the current completely broken thing that clearly doesn't honor the actual real rules for "event->ctx". Because the current rules are effectively that event->ctx is a constant over the whole lifetime of the event. Agreed? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/