+Cc Paul, who was weirdly forgotten last time
And one additional question below, which made me remember this thing. On Wed, Jul 29, 2020 at 02:58:11PM +0200, pet...@infradead.org wrote: > > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: > > rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): > > ------------[ cut here ]------------ > > IRQs not enabled as expected > > WARNING: CPU: 0 PID: 32297 at kernel/sched/core.c:2701 > > try_invoke_on_locked_down_task+0x18b/0x320 kernel/sched/core.c:2701 > > Kernel panic - not syncing: panic_on_warn set ... > > CPU: 0 PID: 32297 Comm: syz-executor.2 Not tainted 5.8.0-rc7-syzkaller #0 > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > > Google 01/01/2011 > > Call Trace: > > <IRQ> > > __dump_stack lib/dump_stack.c:77 [inline] > > dump_stack+0x1f0/0x31e lib/dump_stack.c:118 > > panic+0x264/0x7a0 kernel/panic.c:231 > > __warn+0x227/0x250 kernel/panic.c:600 > > report_bug+0x1b1/0x2e0 lib/bug.c:198 > > handle_bug+0x42/0x80 arch/x86/kernel/traps.c:235 > > exc_invalid_op+0x16/0x40 arch/x86/kernel/traps.c:255 > > asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:540 > > RIP: 0010:try_invoke_on_locked_down_task+0x18b/0x320 > > kernel/sched/core.c:2701 > > Code: 48 89 df e8 f7 35 09 00 4c 89 f7 e8 df b5 cf 06 e9 b5 00 00 00 c6 05 > > 34 82 38 08 01 48 c7 c7 8c d7 07 89 31 c0 e8 a5 a9 f5 ff <0f> 0b e9 15 ff > > ff ff 48 c7 c1 30 71 8d 89 80 e1 07 80 c1 03 38 c1 > > RSP: 0018:ffffc90000007c50 EFLAGS: 00010046 > > RAX: 1aaa08be6903c500 RBX: ffff888085d16ac8 RCX: ffff888085d16240 > > RDX: 0000000000010004 RSI: 0000000000010004 RDI: 0000000000000000 > > RBP: ffff888085d16b0c R08: ffffffff815dd389 R09: ffffed1015d041c3 > > R10: ffffed1015d041c3 R11: 0000000000000000 R12: 0000000000000000 > > R13: ffff8880a8bac140 R14: ffff8880a8bac4c0 R15: dffffc0000000000 > > rcu_print_task_stall kernel/rcu/tree_stall.h:269 [inline] > > print_other_cpu_stall kernel/rcu/tree_stall.h:477 [inline] > > Ha, that calls it with IRQs already disabled, > > So I'm thinking we want something like so? > > --- > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 2142c6767682..3182caf14844 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -2694,12 +2694,11 @@ try_to_wake_up(struct task_struct *p, unsigned int > state, int wake_flags) > */ > bool try_invoke_on_locked_down_task(struct task_struct *p, bool > (*func)(struct task_struct *t, void *arg), void *arg) > { > - bool ret = false; > struct rq_flags rf; > + bool ret = false; > struct rq *rq; > > - lockdep_assert_irqs_enabled(); > - raw_spin_lock_irq(&p->pi_lock); > + raw_spin_lock_irqsave(&p->pi_lock, rf.flags); > if (p->on_rq) { > rq = __task_rq_lock(p, &rf); > if (task_rq(p) == rq) > @@ -2716,7 +2715,7 @@ bool try_invoke_on_locked_down_task(struct task_struct > *p, bool (*func)(struct t > ret = func(p, arg); > } > } > - raw_spin_unlock_irq(&p->pi_lock); > + raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags); > return ret; > } Paul, I wanted to use this function, but found it has very weird semantics. Why do you need it to (remotely) call @func when p is current? The user in rcu_print_task_stall() explicitly bails in this case, and the other in rcu_wait_for_one_reader() will attempt an IPI. Would it be possible to change this function to: - blocked task: call @func with p->pi_lock held - queued, !running task: call @func with rq->lock held - running task: fail. ?