On Thu, Jan 18, 2024 at 02:18:42AM +0000, Chen Zhongjin wrote:
> There is a deadlock scenario in kprobe_optimizer():
> 
> pid A                           pid B                   pid C
> kprobe_optimizer()              do_exit()               perf_kprobe_init()
> mutex_lock(&kprobe_mutex)       exit_tasks_rcu_start()
> synchronize_rcu_tasks()         zap_pid_ns_processes()  mutex_lock(&kprobe_mutex)
> // waiting tasks_rcu_exit_srcu  kernel_wait4()          // waiting kprobe_mutex
>                                 // waiting pid C exit
> 
> To avoid this deadlock loop, use synchronize_rcu_tasks_rude() in
> kprobe_optimizer() rather than synchronize_rcu_tasks().
> synchronize_rcu_tasks_rude() can also promise that all preempted tasks
> have scheduled, but it will not wait for tasks_rcu_exit_srcu.
> 
> Fixes: a30b85df7d59 ("kprobes: Use synchronize_rcu_tasks() for optprobe with CONFIG_PREEMPT=y")
> Signed-off-by: Chen Zhongjin <chenzhong...@huawei.com>

Just so you know, your email ends up in gmail's spam folder.  :-/

> ---
> v1 -> v2: Add Fixes tag
> ---
>  arch/Kconfig     | 2 +-
>  kernel/kprobes.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index f4b210ab0612..dc6a18854017 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -104,7 +104,7 @@ config STATIC_CALL_SELFTEST
>  config OPTPROBES
>       def_bool y
>       depends on KPROBES && HAVE_OPTPROBES
> -     select TASKS_RCU if PREEMPTION
> +     select TASKS_RUDE_RCU
>  
>  config KPROBES_ON_FTRACE
>       def_bool y
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index d5a0ee40bf66..09056ae50c58 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -623,7 +623,7 @@ static void kprobe_optimizer(struct work_struct *work)
>        * Note that on non-preemptive kernel, this is transparently converted
>        * to synchronoze_sched() to wait for all interrupts to have completed.
>        */
> -     synchronize_rcu_tasks();
> +     synchronize_rcu_tasks_rude();

Again, that comment reads in full as follows:

        /*
         * Step 2: Wait for quiesence period to ensure all potentially
         * preempted tasks to have normally scheduled. Because optprobe
         * may modify multiple instructions, there is a chance that Nth
         * instruction is preempted. In that case, such tasks can return
         * to 2nd-Nth byte of jump instruction. This wait is for avoiding it.
         * Note that on non-preemptive kernel, this is transparently converted
         * to synchronoze_sched() to wait for all interrupts to have completed.
         */

Please note well that first sentence.

Unless that first sentence no longer holds, this patch cannot work
because synchronize_rcu_tasks_rude() will not (repeat, NOT) wait for
preempted tasks.

So how to safely break this deadlock?  Reproducing Chen Zhongjin's
diagram:

pid A                           pid B                   pid C
kprobe_optimizer()              do_exit()               perf_kprobe_init()
mutex_lock(&kprobe_mutex)       exit_tasks_rcu_start()
synchronize_rcu_tasks()         zap_pid_ns_processes()  mutex_lock(&kprobe_mutex)
// waiting tasks_rcu_exit_srcu  kernel_wait4()          // waiting kprobe_mutex
                                // waiting pid C exit

We need to stop synchronize_rcu_tasks() from waiting on tasks like
pid B that are voluntarily blocked.  One way to do that is to replace
SRCU with a set of per-CPU lists.  Then exit_tasks_rcu_start() adds the
current task to this list and does ...

OK, this is getting a bit involved.  If you would like to follow along,
please feel free to look here:

https://docs.google.com/document/d/1MEHHs5qbbZBzhN8dGP17pt-d87WptFJ2ZQcqS221d9I/edit?usp=sharing

                                                        Thanx, Paul

>       /* Step 3: Optimize kprobes after quiesence period */
>       do_optimize_kprobes();
> -- 
> 2.25.1
> 
