** Changed in: linux (Ubuntu Noble)
Status: Fix Committed => Fix Released
** Changed in: linux (Ubuntu Jammy)
Status: Fix Committed => Fix Released
https://bugs.launchpad.net/bugs/2077044
Title:
zap_pid_ns_processes() gets stuck in a busy loop when zombie processes
are in namespace
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Jammy:
Fix Released
Status in linux source package in Noble:
Fix Released
Bug description:
BugLink: https://bugs.launchpad.net/bugs/2077044
[Impact]
A deadlock can occur in zap_pid_ns_processes(), which can hang the
system because RCU gets stuck.
zap_pid_ns_processes() has a busy loop that calls kernel_wait4() on a
child process of the namespace init task, waiting for it to exit. The
problem is that the loop clears TIF_SIGPENDING but not
TIF_NOTIFY_SIGNAL, so signal_pending() stays true and kernel_wait4()
returns immediately instead of sleeping. The parent then spins in the
busy loop forever: the child is sleeping in synchronize_rcu() and is
never woken up, because the parent never calls schedule() or
rcu_note_context_switch().
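The loop behaviour described above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not
kernel code: struct model_task, model_signal_pending(), model_wait4()
and model_zap_loop() are hypothetical names invented for illustration,
the interrupted wait is modeled as -EINTR, and "sleeping" is reduced to
the child exiting and being reaped at once.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model; field names only mirror the kernel flags. */
struct model_task {
    bool tif_sigpending;
    bool tif_notify_signal;
    bool has_children;      /* a child of the namespace init still exists */
};

/* Model of signal_pending(): true if either flag is set. */
static bool model_signal_pending(const struct model_task *t)
{
    return t->tif_notify_signal || t->tif_sigpending;
}

/* Model of the wait call: it only "sleeps" when no signal is pending. */
static int model_wait4(struct model_task *t)
{
    if (!t->has_children)
        return -ECHILD;         /* nothing left to wait for */
    if (model_signal_pending(t))
        return -EINTR;          /* returns at once, never sleeps */
    /* Assumption: sleeping lets the child finish, so it gets reaped. */
    t->has_children = false;
    return 0;
}

/*
 * Model of the wait loop. Returns the number of iterations needed to
 * reach -ECHILD, or -1 if max_iters is hit (standing in for the
 * endless busy loop).
 */
static long model_zap_loop(struct model_task *t, bool clear_notify_signal,
                           long max_iters)
{
    long iters = 0;
    int rc;

    assert(max_iters > 0);
    do {
        if (iters++ >= max_iters)
            return -1;
        t->tif_sigpending = false;
        if (clear_notify_signal)
            t->tif_notify_signal = false;
        rc = model_wait4(t);
    } while (rc != -ECHILD);
    return iters;
}
```

In this model, a task with both flags set and one remaining child never
leaves the loop when clear_notify_signal is false (the iteration cap is
hit), and leaves it after two iterations when it is true.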
An example soft lockup trace is:
Watchdog: BUG: soft lockup - CPU#3 stuck for 276s! [rcudeadlock:1836]
CPU: 3 PID: 1836 Comm: rcudeadlock Tainted: G L
5.15.0-117-generic #127-Ubuntu
RIP: 0010:_raw_read_lock+0xe/0x30
Code: f0 0f b1 17 74 08 31 c0 5d c3 cc cc cc cc b8 01 00 00 00 5d c3 cc cc cc
cc 0f 1f 00 0f 1f 44 00 00 b8 00 02 00 00 f0 0f c1 07 <a9> ff 01 00 00 75 05 c3
cc cc cc cc 55 48 89 e5 e8 4d 79 36 ff 5d
CR2: 000000c0002b0000
Call Trace:
<IRQ>
? show_trace_log_lvl+0x1d6/0x2ea
? show_trace_log_lvl+0x1d6/0x2ea
? kernel_wait4+0xaf/0x150
? show_regs.part.0+0x23/0x29
? show_regs.cold+0x8/0xd
? watchdog_timer_fn+0x1be/0x220
? lockup_detector_update_enable+0x60/0x60
? __hrtimer_run_queues+0x107/0x230
? read_hv_clock_tsc_cs+0x9/0x30
? hrtimer_interrupt+0x101/0x220
? hv_stimer0_isr+0x20/0x30
? __sysvec_hyperv_stimer0+0x32/0x70
? sysvec_hyperv_stimer0+0x7b/0x90
</IRQ>
<TASK>
? asm_sysvec_hyperv_stimer0+0x1b/0x20
? _raw_read_lock+0xe/0x30
? do_wait+0xa0/0x310
kernel_wait4+0xaf/0x150
? thread_group_exited+0x50/0x50
zap_pid_ns_processes+0x111/0x1a0
forget_original_parent+0x348/0x360
exit_notify+0x4a/0x210
do_exit+0x24f/0x3c0
do_group_exit+0x3b/0xb0
get_signal+0x150/0x900
arch_do_signal_or_restart+0xde/0x100
? __x64_sys_futex+0x78/0x1e0
exit_to_user_mode_loop+0xc4/0x160
exit_to_user_mode_prepare+0xa3/0xb0
syscall_exit_to_user_mode+0x27/0x50
? x64_sys_call+0x1022/0x1fa0
do_syscall_64+0x63/0xb0
? __io_uring_add_tctx_node+0x111/0x1a0
? fput+0x13/0x20
? __do_sys_io_uring_enter+0x10d/0x540
? __smp_call_single_queue+0x59/0x90
? exit_to_user_mode_prepare+0x37/0xb0
? syscall_exit_to_user_mode+0x2c/0x50
? x64_sys_call+0x1819/0x1fa0
? do_syscall_64+0x63/0xb0
? try_to_wake_up+0x200/0x5a0
? wake_up_q+0x50/0x90
? futex_wake+0x159/0x190
? do_futex+0x162/0x1f0
? __x64_sys_futex+0x78/0x1e0
? switch_fpu_return+0x4e/0xc0
? exit_to_user_mode_prepare+0x37/0xb0
? syscall_exit_to_user_mode+0x2c/0x50
? x64_sys_call+0x1022/0x1fa0
? do_syscall_64+0x63/0xb0
? do_user_addr_fault+0x1e7/0x670
? exit_to_user_mode_prepare+0x37/0xb0
? irqentry_exit_to_user_mode+0xe/0x20
? irqentry_exit+0x1d/0x30
? exc_page_fault+0x89/0x170
entry_SYSCALL_64_after_hwframe+0x6c/0xd6
</TASK>
There is no known workaround.
[Fix]
This was fixed in the below commit in 6.10-rc5:
commit 7fea700e04bd3f424c2d836e98425782f97b494e
Author: Oleg Nesterov <[email protected]>
Date: Sat Jun 8 14:06:16 2024 +0200
Subject: zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with
TIF_SIGPENDING
Link:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7fea700e04bd3f424c2d836e98425782f97b494e
This patch has made its way to upstream stable, and is already applied
to Ubuntu kernels.
[Testcase]
There are two possible testcases to reproduce this issue. Both are
courtesy of Rachel Menge, using the reproducers in her GitHub repo:
https://github.com/rlmenge/rcu-soft-lock-issue-repro
Start a Jammy or Noble VM on Azure. A D8s v3 instance will be plenty.
$ git clone https://github.com/rlmenge/rcu-soft-lock-issue-repro.git
npm repro:
Install Docker.
$ sudo docker run telescope.azurecr.io/issue-repro/zombie:v1.1.11
$ ./rcu-npm-repro.sh
go repro:
$ go mod init rcudeadlock.go
$ go mod tidy
$ CGO_ENABLED=0 go build -o ./rcudeadlock ./
$ sudo ./rcudeadlock
Look at dmesg. After some minutes, you should see the soft lockup
from the impact section.
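When checking dmesg output by script rather than by eye, the soft
lockup line from the impact section can be matched with a small helper.
The following is a minimal sketch: struct soft_lockup and
parse_soft_lockup() are hypothetical names, and the format string
assumes the message looks exactly like the line quoted in the trace
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of a soft lockup message. */
struct soft_lockup {
    int  cpu;        /* CPU number after "CPU#" */
    int  seconds;    /* how long the CPU has been stuck */
    char comm[32];   /* command name inside the brackets */
    int  pid;        /* PID inside the brackets */
};

/*
 * Returns true and fills *out if the line contains a soft lockup
 * message such as:
 *   BUG: soft lockup - CPU#3 stuck for 276s! [rcudeadlock:1836]
 * Limitation: a command name that itself contains ':' is not handled.
 */
static bool parse_soft_lockup(const char *line, struct soft_lockup *out)
{
    const char *p;
    int n;

    assert(line != NULL && out != NULL);

    /* Skip any prefix (timestamp, "watchdog:") before the message. */
    p = strstr(line, "BUG: soft lockup - CPU#");
    if (p == NULL)
        return false;

    n = sscanf(p, "BUG: soft lockup - CPU#%d stuck for %ds! [%31[^:]:%d]",
               &out->cpu, &out->seconds, out->comm, &out->pid);
    return n == 4;
}
```

A small main() that reads dmesg output line by line could call
parse_soft_lockup() on each line and report the first match, for
example to confirm that the stuck task is rcudeadlock.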
[Where problems can occur]
We now clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING in
zap_pid_ns_processes(), so that signal_pending() returns false and
kernel_wait4() can sleep instead of leaving us in a busy-wait loop.
The change is confined to the exit path of a PID namespace's init
task. If a regression were to occur, it could potentially affect all
processes in namespaces.
[Other Info]
Upstream mailing list discussion:
https://lore.kernel.org/linux-kernel/[email protected]/T/