On Feb 17, 2016 1:29 AM, "Borislav Petkov" <b...@alien8.de> wrote: > > On Wed, Feb 17, 2016 at 09:16:46AM +0100, Ingo Molnar wrote: > > So I'm wondering why this started triggering only now. Is this a > > pre-existing bug > > that somehow got triggered via: > > > > 58122bf1d856 x86/fpu: Default eagerfpu=on on all CPUs > > > > ? > > Well, that's an interesting question. See, the thing is, I triggered > this only *once* by accident and I haven't seen it ever since. > > The "reliable" "reproducer" I used to debug this was Andy's suggestion > to stick a schedule() in __fpu__restore_sig(). > > So the answer to that question is not easy. > > BUT(!), regardless, the bug still needs to be fixed because my tracing > here > > https://lkml.kernel.org/r/20160215191422.gb32...@pd.tnic > > showed that getting preempted after setting > > fpu->fpstate_active = 1; > > leads to the WARN. Because - and please doublecheck me on that - when > we're in __switch_to() and the task which already has ->fpstate_active > set and it is the next task to which we're going to switch to, when it > enters switch_fpu_prepare(), it does: > > fpu.preload = static_cpu_has(X86_FEATURE_FPU) && > new_fpu->fpstate_active && > ^^^^^^^^^^^^^^^^^^^^^^^ > > so that fpu.preload is set now. > > A bit later in that same function: > > /* Don't change CR0.TS if we just switch! */ > if (fpu.preload) { > new_fpu->counter++; > __fpregs_activate(new_fpu); > ^^^^^^^^^^^^^^^^^ > > ->fpregs_active gets set here and when the task returns to > __fpu__restore_sig(), fpu__restore() sets it again, leading to the WARN. > > Mind you, this happens on 32-bit only because there we sigreturn with > irqs enabled. Look at the call trace.
Not quite. IRQs are on in the 64-bit case, too, but the problematic code doesn't run because the 64-bit sigreturn case doesn't need to fiddle with the fpu layout. --Andy