On Tue, Feb 11, 2025 at 02:24:36PM +0800, Yafang Shao wrote:
> void klp_try_complete_transition(void)
> {
> + unsigned long timeout, proceed_pending_processes;
> unsigned int cpu;
> struct task_struct *g, *task;
> struct klp_patch *patch;
> @@ -467,9 +468,30 @@ void klp_try_complete_transition(void)
> * unless the patch includes changes to a very common function.
> */
> read_lock(&tasklist_lock);
> - for_each_process_thread(g, task)
> + timeout = jiffies + HZ;
> + proceed_pending_processes = 0;
> + for_each_process_thread(g, task) {
> + /* check if this task has already switched over */
> + if (task->patch_state == klp_target_state)
> + continue;
> +
> + proceed_pending_processes++;
> +
> if (!klp_try_switch_task(task))
> complete = false;
> +
> + /*
> + * Prevent hardlockup by not blocking tasklist_lock for too
> long.
> + * But guarantee the forward progress by making sure at least
> + * some pending processes were checked.
> + */
> + if (rwlock_is_contended(&tasklist_lock) &&
> + time_after(jiffies, timeout) &&
> + proceed_pending_processes > 100) {
> + complete = false;
> + break;
> + }
> + }
> read_unlock(&tasklist_lock);
Instead of all this can we not just use rcu_read_lock() instead of
tasklist_lock?
Petr, I know you mentioned that would widen the race window for the
do_exit() path, but don't we need to fix that race anyway?
--
Josh