On Tue Nov 21, 2023 at 2:10 PM AEST, Timothy Pearson wrote:
> ----- Original Message -----
> > From: "Michael Ellerman" <m...@ellerman.id.au>
> > To: "Timothy Pearson" <tpear...@raptorengineering.com>
> > Cc: "Jens Axboe" <ax...@kernel.dk>, "regressions" 
> > <regressi...@lists.linux.dev>, "npiggin" <npig...@gmail.com>,
> > "christophe leroy" <christophe.le...@csgroup.eu>, "linuxppc-dev" 
> > <linuxppc-dev@lists.ozlabs.org>
> > Sent: Monday, November 20, 2023 5:39:52 PM
> > Subject: Re: [PATCH v2] powerpc: Don't clobber fr0/vs0 during fp|altivec 
> > register  save
>
> > Timothy Pearson <tpear...@raptorengineering.com> writes:
> >> ----- Original Message -----
> >>> From: "Michael Ellerman" <m...@ellerman.id.au>
> > ...
> >>> 
> >>> But we now have a new path, because io-uring can call copy_process() via
> >>> create_io_thread() from the signal handling path. That's OK if the signal
> >>> is handled as we return from a syscall, but it's not OK if the signal is
> >>> handled due to some other interrupt.
> >>> 
> >>> Which is:
> >>> 
> >>> interrupt_return_srr_user()
> >>>  interrupt_exit_user_prepare()
> >>>    interrupt_exit_user_prepare_main()
> >>>      do_notify_resume()
> >>>        get_signal()
> >>>          task_work_run()
> >>>            create_worker_cb()
> >>>              create_io_worker()
> >>>                copy_process()
> >>>                  dup_task_struct()
> >>>                    arch_dup_task_struct()
> >>>                      flush_all_to_thread()
> >>>                        save_all()
> >>>                          if (tsk->thread.regs->msr & MSR_FP)
> >>>                            save_fpu()
> >>>                            # fr0 is clobbered and potentially live in userspace
> >>> 
> >>> 
> >>> So tldr I think the corruption is only an issue since io-uring started
> >>> doing the clone via signal, which I think matches the observed timeline
> >>> of this bug appearing.
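> >>> 
> >>> The clobber happens because the save path reuses fr0 as scratch when
> >>> storing the FPSCR. Roughly, in C terms (a paraphrase of the fpu.S asm,
> >>> helper names made up, not the literal code):
> >>> 
> >>>   void save_fpu_sketch(struct task_struct *tsk)
> >>>   {
> >>>           struct thread_fp_state *fp = &tsk->thread.fp_state;
> >>>           u64 fr0;                  /* stands in for the live fr0 */
> >>> 
> >>>           store_fprs(fp->fpr);      /* SAVE_32FPVSRS: fr0..fr31   */
> >>>           fr0 = read_fpscr();       /* mffs fr0: fr0 is scratch   */
> >>>           fp->fpscr = fr0;          /* stfd fr0,FPSTATE_FPSCR(..) */
> >>>           /* the live fr0 now holds the FPSCR image, and nothing
> >>>            * restores it on the save-only (non-giveup) paths      */
> >>>   }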
> >>
> >> I agree the corruption really only started showing up in earnest on
> >> io_uring clone-via-signal, as this was confirmed several times in the
> >> course of debugging.
> > 
> > Thanks.
> > 
> >> Note as well that I may very well have a wrong call order in the
> >> commit message, since I was relying on a couple of WARN_ON() macros I
> >> inserted to check for a similar (but not identical) condition and
> >> didn't spend much time getting new traces after identifying the root
> >> cause.
> > 
> > Yep no worries. I'll reword it to incorporate the full path from my mail.
> > 
> >> I went back and grabbed some real world system-wide stack traces, since I
> >> now know what to trigger on.  A typical example is:
> >>
> >> interrupt_return_srr_user()
> >>  interrupt_exit_user_prepare()
> >>   interrupt_exit_user_prepare_main()
> >>    schedule()
> >>     __schedule()
> >>      __switch_to()
> >>       giveup_all()
> >>        # tsk->thread.regs->msr MSR_FP is still set here
> >>        __giveup_fpu()
> >>         save_fpu()
> >>         # fr0 is clobbered and potentially live in userspace
> > 
> > fr0 is not live there.
> <snip> 
> > ie. it clears the FP etc. bits from the task's MSR. That means the FP
> > state will be reloaded from the thread struct before the task is run again.
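> > 
> > That is, __giveup_fpu() does roughly the following (a sketch; see
> > process.c for the real thing):
> > 
> >   static void __giveup_fpu(struct task_struct *tsk)
> >   {
> >           unsigned long msr;
> > 
> >           save_fpu(tsk);
> >           msr = tsk->thread.regs->msr;
> >           msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
> >           if (cpu_has_feature(CPU_FTR_VSX))
> >                   msr &= ~MSR_VSX;
> >           regs_set_return_msr(tsk->thread.regs, msr);
> >   }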
>
> So a little more detail on this, just to put it to rest properly vs. assuming 
> hand analysis caught every possible pathway. :)
>
> The debugging that generates this stack trace also verifies the following in 
> __giveup_fpu():
>
> 1.) tsk->thread.fp_state.fpr[0] (the fr0 slot) doesn't contain the FPSCR
> contents prior to calling save_fpu()
> 2.) tsk->thread.fp_state.fpr[0] contains the FPSCR contents directly after
> calling save_fpu()
> 3.) MSR_FP is set both in the task struct and in the live MSR.
>
> Only if all three conditions are met will it generate the trace.  This is a 
> generalization of the hack I used to find the problem in the first place.
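> 
> The check is of roughly this shape (a simplified sketch, not the exact
> instrumentation), wrapped around the save_fpu() call in __giveup_fpu():
> 
>   u64 before = tsk->thread.fp_state.fpr[0][0];
> 
>   save_fpu(tsk);
> 
>   /* fire only when all three conditions above hold */
>   WARN_ON(before != tsk->thread.fp_state.fpscr &&
>           tsk->thread.fp_state.fpr[0][0] == tsk->thread.fp_state.fpscr &&
>           (tsk->thread.regs->msr & MSR_FP) && (mfmsr() & MSR_FP));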
>
> If the state will subsequently be reloaded from the thread struct, that means 
> we're reloading the registers from the thread struct that we just verified 
> was corrupted by the earlier save_fpu() call.  There are only two ways I can 
> see for that to be true -- one is if the registers were already clobbered 
> when giveup_all() was entered, and the other is if save_fpu() went ahead and 
> clobbered them right here inside giveup_all().
>
> To see which scenario we were dealing with, I added a bit more 
> instrumentation to dump the current register state if the MSR_FP bit was 
> already set in the live MSR (i.e. not dumping data from the task struct, but 
> using the live FPU registers instead), and sure enough the registers are 
> corrupt on entry, so something else has already called save_fpu() before we 
> even hit giveup_all() in this call chain.
>
> Unless I'm missing something, doesn't this effectively mean that anything 
> interrupting a task can hit this bug?  Or, put another way, I'm seeing 
> several processes hit this exact call chain with the corrupt register going 
> back out to userspace without io_uring even in the mix, so there seems to be 
> another pathway in play.  These traces are from a qemu guest, in case it 
> matters given the kvm path is possibly susceptible.
>
> Just a few things to think about.  The FPU patch itself definitely resolves 
> the problems; I used a sledgehammer approach *specifically* so that there is 
> no way for a rare call sequence we didn't consider to hit it again down the 
> line. :)

I don't think interrupts are supposed to use (or save) FP/VEC
registers. So you're allowed to _take_ interrupts while FP/VEC
are being saved or used, just not be preempted, block, or
return to user. Hence all the preempt_disable() around these
things.
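ie. the expected pattern for a kernel FP/VEC user is roughly this
(illustrative only, not pointing at any particular caller):

        preempt_disable();
        enable_kernel_fp();
        /* FP/VSX may be used here; interrupts can still arrive, but
         * they must not touch FP/VEC state themselves */
        disable_kernel_fp();
        preempt_enable();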

Not that we cover all of these with warnings very well in the
enable_kernel_* functions AFAIKS. We could add more checks.
At least a check that interrupts are enabled would be good;
balance and user-exit checks should be somewhat covered by the
preempt count implicitly.
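e.g. at the top of enable_kernel_fp() and friends, something along
the lines of (just a sketch of the idea, not a tested patch):

        WARN_ON_ONCE(irqs_disabled());  /* and/or an in_interrupt() check */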

Thanks,
Nick
