On Fri, Feb 28, 2025 at 12:22 AM Petr Mladek <[email protected]> wrote:
>
> On Thu 2025-02-27 10:47:33, Yafang Shao wrote:
> > The tasklist_lock in the KLP transition might result in high latency under
> > specific workloads. We can replace it with RCU.
> >
> > After a new task is forked, its kernel stack is always set to empty[0].
> > Therefore, we can initialize these new tasks to the KLP_TRANSITION_IDLE state
> > after they are forked. If these tasks are forked during the KLP
> > transition but before klp_check_and_switch_task(), they can be safely
> > skipped during klp_check_and_switch_task(). Additionally, if
> > klp_ftrace_handler() is triggered right after forking, the task can
> > determine which function to use based on the klp_target_state.
> >
> > With the above change, we can safely convert the tasklist_lock to RCU.
> >
> > Link: https://lore.kernel.org/all/20250213173253.ovivhuq2c5rmvkhj@jpoimboe/ [0]
> > Link: https://lore.kernel.org/all/20250214181206.xkvxohoc4ft26uhf@jpoimboe/ [1]
> > Suggested-by: Josh Poimboeuf <[email protected]>
> > Signed-off-by: Yafang Shao <[email protected]>
> > ---
> > --- a/kernel/livepatch/patch.c
> > +++ b/kernel/livepatch/patch.c
> > @@ -95,7 +95,13 @@ static void notrace klp_ftrace_handler(unsigned long ip,
> >
> > patch_state = current->patch_state;
> >
> > - WARN_ON_ONCE(patch_state == KLP_TRANSITION_IDLE);
> > + /* If the patch_state is KLP_TRANSITION_IDLE, it means the task
> > + * was forked after klp_init_transition(). In this case, the
> > + * newly forked task can determine which function to use based
> > + * on the klp_target_state.
> > + */
> > + if (patch_state == KLP_TRANSITION_IDLE)
> > + patch_state = klp_target_state;
> >
>
I'm a bit confused by your example. Let me break it down to ensure I
understand correctly:
> CPU0 CPU1
>
> task_0 (forked process):
>
> funcA
> klp_ftrace_handler()
> if (patch_state == KLP_TRANSITION_IDLE)
> patch_state = klp_target_state
> # set to KLP_TRANSITION_PATCHED
>
> # redirected to klp_funcA()
It seems that the KLP is still in the process of transitioning, right?
>
>
> echo 0 >/sys/kernel/livepatch/patch1/enabled
>
> klp_reverse_transition()
>
> klp_target_state = !klp_target_state;
> # set to KLP_TRANSITION_UNPATCHED
If you attempt to initiate a new transition while the current one is
still ongoing, the __klp_disable_patch function will return -EBUSY,
correct?
__klp_disable_patch()
	if (klp_transition_patch)
		return -EBUSY;
On the other hand, if klp_transition_patch is NULL, it indicates that
the first KLP transition has completed successfully.
>
>
> funcB
> if (patch_state == KLP_TRANSITION_IDLE)
> patch_state = klp_target_state
> # set to KLP_TRANSITION_UNPATCHED
>
> # staying in original funcB
>
>
> BANG: livepatched and original function called at the same time
>
> => broken consistency model.
>
> BTW: This is what I tried to warn you about at
> https://lore.kernel.org/r/[email protected]
>
> See below for more:
>
> > if (patch_state == KLP_TRANSITION_UNPATCHED) {
> > /*
> > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> > index ba069459c101..af76defca67a 100644
> > --- a/kernel/livepatch/transition.c
> > +++ b/kernel/livepatch/transition.c
> > @@ -23,7 +23,7 @@ static DEFINE_PER_CPU(unsigned long[MAX_STACK_ENTRIES], klp_stack_entries);
> >
> > struct klp_patch *klp_transition_patch;
> >
> > -static int klp_target_state = KLP_TRANSITION_IDLE;
> > +int klp_target_state = KLP_TRANSITION_IDLE;
> >
> > static unsigned int klp_signals_cnt;
> >
> > @@ -294,6 +294,14 @@ static int klp_check_and_switch_task(struct task_struct *task, void *arg)
> > {
> > int ret;
> >
> > + /*
> > + * If the patch_state remains KLP_TRANSITION_IDLE at this point, it
> > + * indicates that the task was forked after klp_init_transition().
> > + * In this case, it is safe to skip the task.
> > + */
> > + if (!test_tsk_thread_flag(task, TIF_PATCH_PENDING))
> > + return 0;
> > +
> > if (task_curr(task) && task != current)
> > return -EBUSY;
> >
> > @@ -466,11 +474,11 @@ void klp_try_complete_transition(void)
> > * Usually this will transition most (or all) of the tasks on a system
> > * unless the patch includes changes to a very common function.
> > */
> > - read_lock(&tasklist_lock);
> > + rcu_read_lock();
> > for_each_process_thread(g, task)
> > if (!klp_try_switch_task(task))
> > complete = false;
> > - read_unlock(&tasklist_lock);
> > + rcu_read_unlock();
> >
> > /*
> > * Ditto for the idle "swapper" tasks.
> > @@ -694,25 +702,10 @@ void klp_reverse_transition(void)
> > }
> >
> > /* Called from copy_process() during fork */
> > -void klp_copy_process(struct task_struct *child)
> > +void klp_init_process(struct task_struct *child)
> > {
> > -
> > - /*
> > - * The parent process may have gone through a KLP transition since
> > - * the thread flag was copied in setup_thread_stack earlier. Bring
> > - * the task flag up to date with the parent here.
> > - *
> > - * The operation is serialized against all klp_*_transition()
> > - * operations by the tasklist_lock. The only exceptions are
> > - * klp_update_patch_state(current) and __klp_sched_try_switch(), but we
> > - * cannot race with them because we are current.
> > - */
> > - if (test_tsk_thread_flag(current, TIF_PATCH_PENDING))
> > - set_tsk_thread_flag(child, TIF_PATCH_PENDING);
> > - else
> > - clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
> > -
> > - child->patch_state = current->patch_state;
> > + clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
> > + child->patch_state = KLP_TRANSITION_IDLE;
>
> I thought that we might do:
>
> child->patch_state = klp_target_state;
>
> to avoid the race in the klp_ftrace_handler() described above.
>
> But the following might happen:
>
> CPU0 CPU1
>
> klp_init_process(child)
>
> child->patch_state = KLP_TRANSITION_IDLE
>
> klp_enable_patch()
> # setup ftrace handlers
> # initialize all processes
> # in the task list
>
> # add "child" into the task list
>
> schedule()
>
> [running child now]
>
> funcA()
>
> klp_ftrace_handler()
>
> child->patch_state = KLP_TRANSITION_IDLE
>
> BANG: Same problem as with the original patch.
>
>
> Hmm, the 2nd version of this patchset tried to do:
>
> diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
> index 90408500e5a3..5e523a3fbb3c 100644
> --- a/kernel/livepatch/patch.c
> +++ b/kernel/livepatch/patch.c
> @@ -95,7 +95,12 @@ static void notrace klp_ftrace_handler(unsigned long ip,
>
> patch_state = current->patch_state;
>
> - WARN_ON_ONCE(patch_state == KLP_TRANSITION_IDLE);
> + /* If the patch_state is KLP_TRANSITION_IDLE, it indicates the
> + * task was forked after klp_init_transition(). For this newly
> + * forked task, it is safe to switch it to klp_target_state.
> + */
> + if (patch_state == KLP_TRANSITION_IDLE)
> + current->patch_state = klp_target_state;
>
> if (patch_state == KLP_TRANSITION_UNPATCHED) {
> /*
>
> Note: It is broken. It sets current->patch_state but it later
> checks the local variable "patch_state" which is still
> KLP_TRANSITION_IDLE.
You're right; Josh has already highlighted this. I overlooked the local variable.
>
> But is it safe when we fix it?
>
> It might be as long as we run klp_synchronize_transition() between
> updating the global @klp_target_state and the other operations.
>
> Especially, we need to make sure that @klp_target_state always have
> either KLP_TRANSITION_PATCHED or KLP_TRANSITION_UNPATCHED when
> func->transition is set.
>
> For this, we would need to add klp_synchronize_transition() into
>
> + klp_init_transition() between setting @klp_target_state
> and func->transition = true.
>
> + klp_complete_transition() also for KLP_TRANSITION_UNPATCHED way.
> It is currently called only for the PATCHED target state.
>
> Will this be enough?
>
> FACT: It is more complicated than it looked.
> QUESTION: Is this worth the effort?
>
> NOTE: The rcu_read_lock() does not solve the reported stall.
> We are spending time on it only because it looked nice and
> simple to you.
It can help avoid contention on the tasklist_lock. We often encounter
latency spikes caused by lock contention, but identifying the root
cause isn't always straightforward. If we can eliminate this global
lock, I believe it’s a worthwhile improvement.
>
> My opinion: I would personally prefer to focus on solving
> the stall and the use-after-free access in do_exit().
These are distinct issues, and I believe it would be more effective to
discuss them separately.
--
Regards
Yafang