On Tue Nov 21, 2023 at 9:39 AM AEST, Michael Ellerman wrote: > Timothy Pearson <tpear...@raptorengineering.com> writes: > > ----- Original Message ----- > >> From: "Michael Ellerman" <m...@ellerman.id.au> > ... > >> > >> But we now have a new path, because io-uring can call copy_process() via > >> create_io_thread() from the signal handling path. That's OK if the signal > >> is > >> handled as we return from a syscall, but it's not OK if the signal is > >> handled > >> due to some other interrupt. > >> > >> Which is: > >> > >> interrupt_return_srr_user() > >> interrupt_exit_user_prepare() > >> interrupt_exit_user_prepare_main() > >> do_notify_resume() > >> get_signal() > >> task_work_run() > >> create_worker_cb() > >> create_io_worker() > >> copy_process() > >> dup_task_struct() > >> arch_dup_task_struct() > >> flush_all_to_thread() > >> save_all() > >> if (tsk->thread.regs->msr & MSR_FP) > >> save_fpu() > >> # fr0 is clobbered and potentially live in > >> userspace > >> > >> > >> So tldr I think the corruption is only an issue since io-uring started > >> doing > >> the clone via signal, which I think matches the observed timeline of this > >> bug > >> appearing. > > > > I agree the corruption really only started showing up in earnest on > > io_uring clone-via-signal, as this was confirmed several times in the > > course of debugging. > > Thanks. > > > Note as well that I may very well have a wrong call order in the > > commit message, since I was relying on a couple of WARN_ON() macros I > > inserted to check for a similar (but not identical) condition and > > didn't spend much time getting new traces after identifying the root > > cause. > > Yep no worries. I'll reword it to incorporate the full path from my mail. > > > I went back and grabbed some real world system-wide stack traces, since I > > now know what to trigger on. A typical example is: > > > > interrupt_return_srr_user() > > interrupt_exit_user_prepare() > > interrupt_exit_user_prepare_main() > > schedule() > > __schedule() > > __switch_to() > > giveup_all() > > # tsk->thread.regs->msr MSR_FP is still set here > > __giveup_fpu() > > save_fpu() > > # fr0 is clobbered and potentially live in userspace > > fr0 is not live there. > > __giveup_fpu() does roughly: > > msr = tsk->thread.regs->msr; > msr &= ~(MSR_FP|MSR_FE0|MSR_FE1); > msr &= ~MSR_VSX; > tsk->thread.regs = msr; > > ie. it clears the FP etc. bits from the task's MSR. That means the FP > state will be reloaded from the thread struct before the task is run again. > > Also on that path we're switching to another task, so we'll be reloading > the other task's FP state before returning to userspace. > > So I don't see any bug there. > > There's only two places that call save_fpu() and skip the giveup logic, > which is save_all() and kvmppc_save_user_regs(). > > save_all() is only called via clone() so I think that's unable to > actually cause visible register corruption as I described in my previous > mail. > > I thought the KVM case was similar, as it's called via an ioctl, but > I'll have to talk to Nick as his mail indicates otherwise.
Yeah it can, because later on it runs the guest and that will clobber other regs. I reproduced fr14 corruption in the host easily with KVM selftests. It should just do a "giveup" on FP/VEC (as it does with TM). Thanks, Nick