On Wed, Jan 17, 2024 at 12:31:33PM -0800, Andrew Morton wrote:
> On Wed, 17 Jan 2024 06:16:36 +0000 Chen Zhongjin <chenzhong...@huawei.com> wrote:
> 
> > There is a deadlock scenario in kprobe_optimizer():
> > 
> >  pid A                           pid B                    pid C
> >  kprobe_optimizer()              do_exit()                perf_kprobe_init()
> >  mutex_lock(&kprobe_mutex)       exit_tasks_rcu_start()   mutex_lock(&kprobe_mutex)
> >  synchronize_rcu_tasks()         zap_pid_ns_processes()   // waiting kprobe_mutex
> >  // waiting tasks_rcu_exit_srcu  kernel_wait4()
> >                                  // waiting pid C exit
> > 
> > To avoid this deadlock loop, use synchronize_rcu_tasks_rude() in
> > kprobe_optimizer() rather than synchronize_rcu_tasks().
> > synchronize_rcu_tasks_rude() can also promise that all preempted tasks
> > have scheduled, but it will not wait for tasks_rcu_exit_srcu.
Hello, Chen Zhongjin,

Relying on the non-preemptibility of the last portion of the do_exit()
function does appear to be the basis for a much better solution than
what we currently have.  So at the very least, thank you for the idea!!!
I feel a bit silly for not having thought of it myself.  ;-)

However, just invoking synchronize_rcu_tasks_rude() will be bad for
both energy efficiency and real-time response.  This is due to the fact
that synchronize_rcu_tasks_rude() sends an IPI to each and every online
CPU, almost none of which will be in the non-preemptible tail of
do_exit() at any given time.  These additional unnecessary IPIs will
drain battery when they hit idle CPUs and degrade real-time response
when they hit CPUs running aggressive real-time applications.  Which
might not make people happy.

So, would you be willing to use RCU's do_exit() hooks and RCU's hook
into the scheduler (either rcu_note_context_switch() or rcu_tasks_qs(),
whichever would work better) to maintain a per-CPU variable that is a
pointer to the task in the non-preemptible tail of do_exit() if there
is one, or NULL otherwise?  This would get us the deadlock-avoidance
simplicity of your underlying idea, while avoiding (almost all of) the
energy-efficiency and real-time-response issues with your patch.

This does require a bit of memory-ordering care, so if you would prefer
that I do the implementation, just let me know.  (The memory-ordering
trick is to use "smp_mb(); smp_load_acquire();" to sample this variable
and "smp_store_release(); smp_mb();" to update it.)  A rough sketch of
this approach appears below the quoted text.

							Thanx, Paul

> > Signed-off-by: Chen Zhongjin <chenzhong...@huawei.com>
> 
> Thanks.  Should we backport this fix into earlier kernels?  If so,
> are we able to identify a suitable Fixes: target?
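
To make the per-CPU-pointer approach described above a bit more concrete,
here is a rough, untested sketch.  The variable name (exit_tail_task) and
the helper functions are purely illustrative, not existing kernel symbols,
and the exact hook placement (the do_exit() side versus the
rcu_note_context_switch()/rcu_tasks_qs() side) would still need to be
worked out against the current code:

/*
 * Rough sketch only: exit_tail_task and these helpers are illustrative
 * names, not existing kernel symbols.
 */
#include <linux/percpu.h>
#include <linux/sched.h>

/* Task currently in the non-preemptible tail of do_exit() on this CPU,
 * or NULL if there is no such task. */
static DEFINE_PER_CPU(struct task_struct *, exit_tail_task);

/* Called from RCU's do_exit() hook once the exiting task has entered
 * the non-preemptible tail of do_exit(). */
static void exit_tail_set(struct task_struct *t)
{
	/* Release store, then full barrier, so that a concurrent
	 * sampler cannot miss this task. */
	smp_store_release(this_cpu_ptr(&exit_tail_task), t);
	smp_mb();
}

/* Called from the scheduler hook (rcu_note_context_switch() or
 * rcu_tasks_qs()): once this CPU has context-switched, it is no
 * longer running an exit tail. */
static void exit_tail_clear(void)
{
	smp_store_release(this_cpu_ptr(&exit_tail_task), NULL);
	smp_mb();
}

/* Called from the Tasks-RCU grace-period machinery to check whether
 * the specified CPU is currently running a task in the non-preemptible
 * tail of do_exit(). */
static struct task_struct *exit_tail_sample(int cpu)
{
	smp_mb();  /* Pairs with the updaters' smp_mb(). */
	return smp_load_acquire(per_cpu_ptr(&exit_tail_task, cpu));
}

The barrier placement simply follows the "smp_mb(); smp_load_acquire();"
/ "smp_store_release(); smp_mb();" pattern mentioned above; the finer
memory-ordering points would of course need careful review.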