On Thu, Sep 18, 2025 at 07:24:14PM +0200, Peter Zijlstra wrote:

> So we have:
> 
> do_syscall_64()
>   ... do stuff ...
>   syscall_exit_to_user_mode(regs)
>     syscall_exit_to_user_mode_work(regs)
>       syscall_exit_work()
>       exit_to_user_mode_prepare()
>         exit_to_user_mode_loop()
>         resume_user_mode_work()
>           task_work_run()
>     exit_to_user_mode()
>       unwind_reset_info();
>       user_enter_irqoff();
>       arch_exit_to_user_mode();
>       lockdep_hardirqs_on();
>   SYSRET/IRET
> 
> 
> and
> 
> DEFINE_IDTENTRY*()
>   irqentry_enter();
>   ... stuff ...
>   irqentry_exit()
>     irqentry_exit_to_user_mode()
>       exit_to_user_mode_prepare()
>         exit_to_user_mode_loop();
>         resume_user_mode_work()
>           task_work_run()
>       exit_to_user_mode()
>         unwind_reset_info();
>       ...
>   IRET
> 
> Now, task_work_run() is in the exit_to_user_mode_loop() which is notably
> *before* exit_to_user_mode() which does the unwind_reset_info().
> 
> What happens if we get an NMI requesting an unwind after
> unwind_reset_info() while still very much being in the kernel on the way
> out?

AFAICT it will try and do a task_work_add(TWA_RESUME) from NMI context,
and this will fail horribly.

If you do something like:

        twa_mode = in_nmi() ? TWA_NMI_CURRENT : TWA_RESUME;
        task_work_add(foo, twa_mode);

it might actually work.
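
To spell out the mode choice, here is a small userspace model of it. It
is only a sketch, not the actual patch: in_nmi_ctx stands in for the
kernel's in_nmi(), the SKETCH_* enumerators stand in for TWA_RESUME and
TWA_NMI_CURRENT (their numeric values here are arbitrary), and
pick_twa_mode() is a made-up helper name.

```c
#include <stdbool.h>

/*
 * Userspace stand-ins for the two notify modes named above.
 * Assumption: the values are illustrative, not the kernel's.
 */
enum sketch_twa_mode {
	SKETCH_TWA_RESUME,      /* models TWA_RESUME */
	SKETCH_TWA_NMI_CURRENT, /* models TWA_NMI_CURRENT */
};

/*
 * Hypothetical helper: pick the notify mode from the calling context.
 * in_nmi_ctx models the result of in_nmi() at the call site.
 */
static enum sketch_twa_mode pick_twa_mode(bool in_nmi_ctx)
{
	return in_nmi_ctx ? SKETCH_TWA_NMI_CURRENT : SKETCH_TWA_RESUME;
}
```

In the kernel the chosen mode would go in as the last argument of
task_work_add(), after the task and the struct callback_head.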

