On Wed, Feb 19, 2014 at 08:59:08PM +0100, Stephane Eranian wrote:
> On Wed, Feb 19, 2014 at 7:36 PM, Peter Zijlstra <pet...@infradead.org> wrote:
> > On Wed, Feb 19, 2014 at 07:03:13PM +0100, Stephane Eranian wrote:
> >> I am trying to understand the context here.
> >> Are you saying, we may call an offline CPU?
> >
> > Yes, that is what's happening.
> >
> >> I saw that sometimes you retry, sometimes you don't.
> >
> > I tried to do exactly what we do for the task case which is far more
> > likely to fail. Could be I messed up.
> >
> I am not sure why you need to retry. If the CPU is offline, it is offline.
> Or are you saying, you get an error, but you don't know the exact
> reason, thus you keep trying? But how do you get out of this if
> the CPU stays offline?

Ah, so take perf_remove_from_context() as it was before the patch: if the
cpu_function_call() fails because the CPU is offline, it doesn't call
list_del_event().

Now the offline function is supposed to take the events off the list, but
it doesn't actually do so when they're grouped. This leaves a free()d event
on the offline CPU's context list.

After that things quickly go downhill. But before I got there I was led
down a few too many rabbit holes trying to figure out wtf happened.

We could probably fix it differently though. But by the time I more or less
understood things I was too tired to make something pretty.

Anyway, if you have to do something when cpu_function_call() fails, you
also have to check whether the CPU came back up since you tried; at which
point you've got the same pattern as we have for task_function_call().
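
To make the "how do you get out if the CPU stays offline" case concrete,
here is a minimal user-space sketch of that retry pattern. It is a model
only, not the kernel code: every model_* name, the struct layout, and the
online_after test hook are hypothetical, and the boolean lock_held merely
stands in for holding ctx->lock. The assumption it illustrates is that the
retry only happens when the CPU is seen online again under the lock;
otherwise the event is removed directly and the function returns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a per-CPU context; not the kernel's structures. */
struct model_ctx {
    bool cpu_online;   /* stands in for the CPU's online state */
    bool lock_held;    /* stands in for holding ctx->lock */
    int  nr_events;    /* stands in for the context's event list */
    int  online_after; /* test hook: CPU comes online during the Nth
                          failed call; 0 means it stays offline */
};

/* Stand-in for list_del_event(): only legal with the lock held. */
static void model_list_del(struct model_ctx *ctx)
{
    assert(ctx->lock_held);
    ctx->nr_events--;
}

/*
 * Stand-in for cpu_function_call(): returns 0 if the function ran on the
 * target CPU (which then removes the event), -1 if the CPU was offline.
 */
static int model_cpu_function_call(struct model_ctx *ctx)
{
    if (!ctx->cpu_online) {
        /* Simulate CPU hotplug racing with the caller. */
        if (ctx->online_after > 0 && --ctx->online_after == 0)
            ctx->cpu_online = true;
        return -1;
    }
    ctx->lock_held = true;
    model_list_del(ctx);
    ctx->lock_held = false;
    return 0;
}

/* Removes one event; returns how many cross-call attempts were made. */
static int model_remove_from_context(struct model_ctx *ctx)
{
    int attempts = 0;

retry:
    attempts++;
    if (model_cpu_function_call(ctx) == 0)
        return attempts;

    ctx->lock_held = true;
    if (ctx->cpu_online) {
        /* The CPU came back up since we tried: go around again. */
        ctx->lock_held = false;
        goto retry;
    }

    /* Still offline: nothing can run there, so remove the event here. */
    model_list_del(ctx);
    ctx->lock_held = false;
    return attempts;
}
```

In this model a CPU that stays offline costs exactly one failed call
followed by a direct removal, so there is no unbounded loop; a second
attempt only happens when the CPU is observed online after the failure.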