On Fri, Feb 14, 2025 at 10:44:59AM +0800, Yafang Shao wrote:
> The longest duration of klp_try_complete_transition() ranges from 8.5
> to 17.2 seconds.
> 
> It appears that the RCU stall is not only driven by num_processes *
> average_klp_try_switch_task, but also by contention within
> klp_try_complete_transition(), particularly around the tasklist_lock.
> Interestingly, even after replacing "read_lock(&tasklist_lock)" with
> "rcu_read_lock()", the RCU stall persists. My verification shows that
> the only way to prevent the stall is by checking need_resched() during
> each iteration of the loop.

I'm confused... rcu_read_lock() shouldn't cause any contention, right?
So if klp_try_switch_task() isn't the problem, then what is?

I wonder if those function timings might be misleading.  If
klp_try_complete_transition() gets preempted immediately when it
releases the lock, it could take a while before it eventually returns.
So that funclatency might not be telling the whole story.
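
To illustrate the point, here is a userspace sketch only, not kernel
code.  now_ns() and timed_call() are made-up names, and the nanosleep()
merely stands in for the task being scheduled out.  It times the same
call two ways: wall-clock entry-to-return time, which is what an
entry/return probe sees, and the thread's on-CPU time.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Read the given clock and return its value in nanoseconds. */
int64_t now_ns(clockid_t clk)
{
	struct timespec ts;

	clock_gettime(clk, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical stand-in for a function that does almost no work but
 * spends off_cpu_ms milliseconds off the CPU before it returns.
 * Reports both the wall-clock duration and the on-CPU duration.
 */
void timed_call(long off_cpu_ms, int64_t *wall_ns, int64_t *cpu_ns)
{
	struct timespec nap = {
		.tv_sec = off_cpu_ms / 1000,
		.tv_nsec = (off_cpu_ms % 1000) * 1000000L,
	};
	int64_t wall0 = now_ns(CLOCK_MONOTONIC);
	int64_t cpu0 = now_ns(CLOCK_THREAD_CPUTIME_ID);

	nanosleep(&nap, NULL);	/* simulates being scheduled out */

	*cpu_ns = now_ns(CLOCK_THREAD_CPUTIME_ID) - cpu0;
	*wall_ns = now_ns(CLOCK_MONOTONIC) - wall0;
}
```

Calling timed_call(200, &wall, &cpu) should report a wall-clock time
of at least 200 ms while the on-CPU time stays far smaller, so a large
entry-to-return latency by itself doesn't show where the time went.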

Though 8.5 - 17.2 seconds is a bit excessive...

-- 
Josh
