On Wed, Feb 12, 2025 at 04:42:39PM +0100, Petr Mladek wrote:
> CPU1 CPU1
>
> klp_try_complete_transition()
>
>
> taskA:
> + fork()
> + klp_copy_process()
> child->patch_state = KLP_PATCH_UNPATCHED
>
> klp_try_switch_task(taskA)
> // safe
>
> child->patch_state = KLP_PATCH_PATCHED
>
> all processes patched
>
> klp_finish_transition()
>
>
> list_add_tail_rcu(&p->thread_node,
> &p->signal->thread_head);
>
>
> BANG: The forked task has KLP_PATCH_UNPATCHED so that
> klp_ftrace_handler() will redirect it to the old code.
>
> But CPU1 thinks that all tasks are migrated and is going
> to finish the transition
Maybe klp_try_complete_transition() could iterate the tasks in two
passes? The first pass would use rcu_read_lock(). Then if all tasks
appear to be patched, try again with tasklist_lock.
Or, we could do something completely different. There's no need for
klp_copy_process() to copy the parent's state: a newly forked task can
be patched immediately because it has no stack.
So instead, just initialize it to KLP_TRANSITION_IDLE with
TIF_PATCH_PENDING cleared. Then when klp_ftrace_handler() encounters a
KLP_TRANSITION_IDLE task, it considers its state to be 'klp_target_state'.
// called from copy_process()
void klp_init_task(struct task_struct *child)
{
/* klp_ftrace_handler() will transition the task immediately */
child->patch_state = KLP_TRANSITION_IDLE;
clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
}
klp_ftrace_handler():
patch_state = current->patch_state;
if (patch_state == KLP_TRANSITION_IDLE)
patch_state = klp_target_state;
...
Hm?
> I would first like to understand how exactly the stall happens.
> It is possible that even rcu_read_lock() won't help here!
>
> If the it takes too long time to check backtraces of all pending
> processes then even rcu_read_lock() might trigger the RCU stall
> warning as well.
Yeah, based on Yafang's reply it appears there are RCU stalls either
way.
--
Josh