In a CONFIG_PREEMPT=n kernel, a soft lockup was observed while
executing the for loop in exit_sem(). Apparently the loop can
take quite a long time, and it has no scheduling point in it.
Since the code is executing under an RCU read-side critical
section, this may also cause RCU stalls, which in turn block
synchronize_rcu() operations and more or less destabilise
the whole system.

Fix this by adding a cond_resched() at the beginning of the
loop, before the RCU read-side critical section is entered.

Signed-off-by: Nikolay Borisov <[email protected]>
---

This patch fixes the following soft lockup:

NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [httpd:18119]
CPU: 10 PID: 18119 Comm: httpd Tainted: G           O    4.4.20-clouder2 #6
Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
task: ffff88348d695280 ti: ffff881c95550000 task.ti: ffff881c95550000
RIP: 0010:[<ffffffff81614bc7>]  [<ffffffff81614bc7>] _raw_spin_lock+0x17/0x30
RSP: 0018:ffff881c95553e40  EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff883161b1eea8 RCX: 000000000000000d
RDX: 0000000000000001 RSI: 000000000000000e RDI: ffff883161b1eea4
RBP: ffff881c95553ea0 R08: ffff881c95553e68 R09: ffff883fef376f88
R10: ffff881fffb58c20 R11: ffffea0072556600 R12: ffff883161b1eea0
R13: ffff88348d695280 R14: ffff883dec427000 R15: ffff8831621672a0
FS:  0000000000000000(0000) GS:ffff881fffb40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3b3723e020 CR3: 0000000001c0a000 CR4: 00000000001406e0
Stack:
 ffffffff8129717c ffff88348d695280 ffff883161b1eea4 ffffea000f2359c0
 000480094089b340 ffff881c95553e68 ffff881c95553e68 ffff88348d695280
 0000000000000000 00007ffe01ca0170 ffff883217c74280 ffff883217c742e8
Call Trace:
 [<ffffffff8129717c>] ? exit_sem+0x7c/0x280
 [<ffffffff81055548>] do_exit+0x338/0xb40
 [<ffffffff81055dd3>] do_group_exit+0x43/0xd0
 [<ffffffff81055e74>] SyS_exit_group+0x14/0x20
 [<ffffffff81614f5b>] entry_SYSCALL_64_fastpath+0x16/0x6e
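
For illustration only, here is a minimal userspace sketch of the pattern
the patch applies: a scheduling point at the top of each loop iteration,
taken before the read-side section is entered. It is not the kernel code.
The fake_* helpers, struct undo and drain_undo_list() are hypothetical
stand-ins for rcu_read_lock(), rcu_read_unlock(), cond_resched() and the
undo list, modelled only loosely on the loop in exit_sem().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; the kernel uses the real
 * rcu_read_lock(), rcu_read_unlock() and cond_resched(). */
static int rcu_depth;        /* nesting depth of the simulated read section */
static int resched_points;   /* number of scheduling points reached */

static void fake_rcu_read_lock(void)   { rcu_depth++; }
static void fake_rcu_read_unlock(void) { rcu_depth--; }

static void fake_cond_resched(void)
{
	/* A scheduling point must not sit inside the read section. */
	assert(rcu_depth == 0);
	resched_points++;
}

/* Hypothetical, heavily simplified list node. */
struct undo {
	struct undo *next;
};

/* Drain the list one entry per iteration and return how many
 * entries were unlinked. */
static int drain_undo_list(struct undo **head)
{
	int freed = 0;

	for (;;) {
		struct undo *un;

		/* Top of the loop, before the read section begins. */
		fake_cond_resched();

		fake_rcu_read_lock();
		un = *head;
		if (un == NULL) {
			/* List is empty: leave the read section and stop. */
			fake_rcu_read_unlock();
			break;
		}
		*head = un->next;	/* unlink the entry */
		fake_rcu_read_unlock();
		freed++;
	}
	return freed;
}
```

With a three-entry list the sketch reaches the scheduling point four
times: once per entry plus once on the final pass that finds the list
empty. The point of the placement is that the task can yield on every
iteration without ever doing so inside the read section.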

 ipc/sem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/ipc/sem.c b/ipc/sem.c
index 1f97a24871c5..8563535cbd0e 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -2088,6 +2088,8 @@ void exit_sem(struct task_struct *tsk)
                struct list_head tasks;
                int semid, i;
 
+               cond_resched();
+
                rcu_read_lock();
                un = list_entry_rcu(ulp->list_proc.next,
                                    struct sem_undo, list_proc);
-- 
2.5.0
