On Thu, Jan 18, 2024 at 02:18:42AM +0000, Chen Zhongjin wrote:
> There is a deadlock scenario in kprobe_optimizer():
> 
> pid A				pid B			pid C
> kprobe_optimizer()		do_exit()		perf_kprobe_init()
> mutex_lock(&kprobe_mutex)	exit_tasks_rcu_start()
> 							mutex_lock(&kprobe_mutex)
> synchronize_rcu_tasks()	zap_pid_ns_processes()	// waiting kprobe_mutex
> // waiting tasks_rcu_exit_srcu	kernel_wait4()
> 				// waiting pid C exit
> 
> To avoid this deadlock loop, use synchronize_rcu_tasks_rude() in
> kprobe_optimizer() rather than synchronize_rcu_tasks().
> synchronize_rcu_tasks_rude() can also promise that all preempted tasks
> have scheduled, but it will not wait tasks_rcu_exit_srcu.
> 
> Fixes: a30b85df7d59 ("kprobes: Use synchronize_rcu_tasks() for optprobe with
> CONFIG_PREEMPT=y")
> Signed-off-by: Chen Zhongjin <chenzhong...@huawei.com>
Just so you know, your email ends up in gmail's spam folder. :-/

> ---
> v1 -> v2: Add Fixes tag
> ---
>  arch/Kconfig     | 2 +-
>  kernel/kprobes.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index f4b210ab0612..dc6a18854017 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -104,7 +104,7 @@ config STATIC_CALL_SELFTEST
>  config OPTPROBES
>  	def_bool y
>  	depends on KPROBES && HAVE_OPTPROBES
> -	select TASKS_RCU if PREEMPTION
> +	select TASKS_RUDE_RCU
>  
>  config KPROBES_ON_FTRACE
>  	def_bool y
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index d5a0ee40bf66..09056ae50c58 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -623,7 +623,7 @@ static void kprobe_optimizer(struct work_struct *work)
>  	 * Note that on non-preemptive kernel, this is transparently converted
>  	 * to synchronoze_sched() to wait for all interrupts to have completed.
>  	 */
> -	synchronize_rcu_tasks();
> +	synchronize_rcu_tasks_rude();

Again, that comment reads in full as follows:

	/*
	 * Step 2: Wait for quiesence period to ensure all potentially
	 * preempted tasks to have normally scheduled. Because optprobe
	 * may modify multiple instructions, there is a chance that Nth
	 * instruction is preempted. In that case, such tasks can return
	 * to 2nd-Nth byte of jump instruction. This wait is for avoiding it.
	 * Note that on non-preemptive kernel, this is transparently converted
	 * to synchronoze_sched() to wait for all interrupts to have completed.
	 */

Please note well that first sentence.  Unless that first sentence no
longer holds, this patch cannot work because synchronize_rcu_tasks_rude()
will not (repeat, NOT) wait for preempted tasks.

So how to safely break this deadlock?  Reproducing Chen Zhongjin's diagram:

pid A				pid B			pid C
kprobe_optimizer()		do_exit()		perf_kprobe_init()
mutex_lock(&kprobe_mutex)	exit_tasks_rcu_start()
							mutex_lock(&kprobe_mutex)
synchronize_rcu_tasks()		zap_pid_ns_processes()	// waiting kprobe_mutex
// waiting tasks_rcu_exit_srcu	kernel_wait4()
				// waiting pid C exit

We need to stop synchronize_rcu_tasks() from waiting on tasks like pid B
that are voluntarily blocked.  One way to do that is to replace SRCU with
a set of per-CPU lists.  Then exit_tasks_rcu_start() adds the current
task to this list and does ...

OK, this is getting a bit involved.  If you would like to follow along,
please feel free to look here:

https://docs.google.com/document/d/1MEHHs5qbbZBzhN8dGP17pt-d87WptFJ2ZQcqS221d9I/edit?usp=sharing

							Thanx, Paul

>  	/* Step 3: Optimize kprobes after quiesence period */
>  	do_optimize_kprobes();
> -- 
> 2.25.1
> 
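
P.S.  Purely to make the per-CPU-list idea above a bit more concrete, here
is a minimal sketch of the shape that exit_tasks_rcu_start() could take.
Every name in it (tasks_rcu_exit_list, the ->rcu_tasks_exit_list field in
task_struct, and so on) is hypothetical, the locking is simplified, and
boot-time initialization of the per-CPU structures is omitted, so please
treat the document linked above as the real description, not this.

	/*
	 * Sketch only:  the exiting task adds itself to a per-CPU list
	 * instead of entering an SRCU read-side critical section, so a
	 * later Tasks RCU grace period can scan exiting tasks without
	 * having to wait on a reader that is blocked in kernel_wait4().
	 */
	static DEFINE_PER_CPU(struct list_head, tasks_rcu_exit_list);
	static DEFINE_PER_CPU(raw_spinlock_t, tasks_rcu_exit_lock);

	void exit_tasks_rcu_start(void)
	{
		unsigned long flags;
		raw_spinlock_t *lock;

		preempt_disable();
		lock = this_cpu_ptr(&tasks_rcu_exit_lock);
		raw_spin_lock_irqsave(lock, flags);
		/* ->rcu_tasks_exit_list would be a new task_struct field. */
		list_add(&current->rcu_tasks_exit_list,
			 this_cpu_ptr(&tasks_rcu_exit_list));
		raw_spin_unlock_irqrestore(lock, flags);
		preempt_enable();
	}

The task would also need to record which CPU's list it went onto, so that
exit_tasks_rcu_finish() can take the matching lock and list_del() it, and
the grace-period machinery would then scan these lists rather than doing
a synchronize_srcu() on tasks_rcu_exit_srcu.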