Oleg Nesterov <[email protected]> writes:
> kernel_wait4() doesn't sleep and returns -EINTR if there is no
> eligible child and signal_pending() is true.
>
> That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not
> enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending()
> return false and avoid a busy-wait loop.
I took a look through the code. It used to be that TIF_NOTIFY_SIGNAL
was all about waking up a task so that task_work_run can be used.
io_uring still mostly uses it that way. There is also a use in
kthread_stop that just uses it as a TIF_SIGPENDING without having a
pending signal.
At the point in do_exit where exit_notify and thus zap_pid_ns_processes
is called I can't possibly see a use for TIF_NOTIFY_SIGNAL.
exit_task_work, exit_signals, and io_uring_cancel have all been called.
So TIF_NOTIFY_SIGNAL should be spurious at this point and safe to clear.
Why it remains set is a mystery to me.
If I had infinite time and energy the ideal is to rework the pid
namespace exit logic so that waiting for everything to exit works like
delay_group_leader in wait_task_consider. Simply blocking reaping of
the pid namespace leader until everything in the pid namespace have been
reaped. I think acct_exit_ns is the only piece of code that needs
to be moved to allow that, and acct_exit_ns is purely bookkeeping so
does not affect userspace visible semantics.
This active waiting is weird and non-standard in the kernel and winds up
causeing a problem every couple of years because of that.
>
> Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL")
> Reported-by: Rachel Menge <[email protected]>
> Closes:
> https://lore.kernel.org/all/[email protected]/
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/pid_namespace.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index dc48fecfa1dc..25f3cf679b35 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -218,6 +218,7 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
> */
> do {
> clear_thread_flag(TIF_SIGPENDING);
> + clear_thread_flag(TIF_NOTIFY_SIGNAL);
> rc = kernel_wait4(-1, NULL, __WALL, NULL);
> } while (rc != -ECHILD);