On Wed 2025-02-12 19:54:21, Yafang Shao wrote:
> On Wed, Feb 12, 2025 at 10:34 AM Yafang Shao <[email protected]> wrote:
> >
> > On Wed, Feb 12, 2025 at 8:40 AM Josh Poimboeuf <[email protected]> wrote:
> > >
> > > On Tue, Feb 11, 2025 at 02:24:36PM +0800, Yafang Shao wrote:
> > > > void klp_try_complete_transition(void)
> > > > {
> > > > + unsigned long timeout, proceed_pending_processes;
> > > > unsigned int cpu;
> > > > struct task_struct *g, *task;
> > > > struct klp_patch *patch;
> > > > @@ -467,9 +468,30 @@ void klp_try_complete_transition(void)
> > > > * unless the patch includes changes to a very common function.
> > > > */
> > > > read_lock(&tasklist_lock);
> > > > - for_each_process_thread(g, task)
> > > > + timeout = jiffies + HZ;
> > > > + proceed_pending_processes = 0;
> > > > + for_each_process_thread(g, task) {
> > > > + /* check if this task has already switched over */
> > > > + if (task->patch_state == klp_target_state)
> > > > + continue;
> > > > +
> > > > + proceed_pending_processes++;
> > > > +
> > > > if (!klp_try_switch_task(task))
> > > > complete = false;
> > > > +
> > > > +		/*
> > > > +		 * Prevent hardlockup by not blocking tasklist_lock for too long.
> > > > +		 * But guarantee forward progress by making sure at least some
> > > > +		 * pending processes were checked.
> > > > +		 */
> > > > + if (rwlock_is_contended(&tasklist_lock) &&
> > > > + time_after(jiffies, timeout) &&
> > > > + proceed_pending_processes > 100) {
> > > > + complete = false;
> > > > + break;
> > > > + }
> > > > + }
> > > > read_unlock(&tasklist_lock);
> > >
> > > Instead of all this can we not just use rcu_read_lock() instead of
> > > tasklist_lock?
> >
> > I’m concerned that there’s a potential race condition in fork() if we
> > use RCU, as illustrated below:
> >
> > CPU0                                            CPU1
> >
> > write_lock_irq(&tasklist_lock);
> > klp_copy_process(p);
> >                                                 parent->patch_state = klp_target_state
> > list_add_tail_rcu(&p->tasks, &init_task.tasks);
> > write_unlock_irq(&tasklist_lock);
> >
> > In this scenario, after the parent executes klp_copy_process(p) to
> > copy its patch_state to the child, but before adding the child to the
> > task list, the parent’s patch_state might be updated by the KLP
> > transition. This could result in the child being left with an outdated
> > state.
A similar race actually existed even before.
> > We need to ensure that klp_copy_process() and list_add_tail_rcu() are
> > treated as a single atomic operation.
>
> Before the newly forked task is added to the task list, it doesn’t
> execute any code and can always be considered safe during the KLP
> transition. Therefore, we could replace klp_copy_process() with
> klp_init_process(), where we simply set patch_state to
> KLP_TRANSITION_IDLE, as shown below:
This might work when a brand new program is started, for example,
via execve(). But it would fail when the process is just forked
(man 2 fork) and both the parent and the child continue running
the same code.
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2544,7 +2544,9 @@ __latent_entropy struct task_struct *copy_process(
> p->exit_signal = args->exit_signal;
> }
>
> -	klp_copy_process(p);
> +	// klp_init_process(p);
> +	clear_tsk_thread_flag(p, TIF_PATCH_PENDING);
> +	p->patch_state = KLP_TRANSITION_IDLE;
This is exactly the location where we depend on the synchronization
with the tasklist_lock. It allows us to synchronize both the
TIF_PATCH_PENDING flag and p->patch_state between the parent
and the child; see the comment in klp_copy_process():
/* Called from copy_process() during fork */
void klp_copy_process(struct task_struct *child)
{
/*
* The parent process may have gone through a KLP transition since
* the thread flag was copied in setup_thread_stack earlier. Bring
* the task flag up to date with the parent here.
*
* The operation is serialized against all klp_*_transition()
* operations by the tasklist_lock. The only exceptions are
* klp_update_patch_state(current) and __klp_sched_try_switch(), but we
* cannot race with them because we are current.
*/
if (test_tsk_thread_flag(current, TIF_PATCH_PENDING))
set_tsk_thread_flag(child, TIF_PATCH_PENDING);
else
clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
child->patch_state = current->patch_state;
}
When using rcu_read_lock() in klp_try_complete_transition(),
child->patch_state might carry outdated information. The following
race comes to my mind:
CPU0                                    CPU1

                                        klp_try_complete_transition()
taskA:
  fork()
    klp_copy_process()
      child->patch_state =
          KLP_TRANSITION_UNPATCHED

                                        klp_try_switch_task(taskA)
                                          // safe
                                          taskA->patch_state =
                                              KLP_TRANSITION_PATCHED

                                        // all processes patched
                                        klp_finish_transition()

    list_add_tail_rcu(&p->thread_node,
                      &p->signal->thread_head);

BANG: The forked task still has KLP_TRANSITION_UNPATCHED, so
klp_ftrace_handler() will redirect it to the old code. But CPU1
thinks that all tasks are migrated and is going to finish the
transition.
My opinion:

I am afraid that using rcu_read_lock() in klp_try_complete_transition()
might cause more harm than good. The code is already complicated,
and this might make it even trickier.
I would first like to understand how exactly the stall happens.
It is possible that even rcu_read_lock() won't help here!
If it takes too long to check the backtraces of all pending
processes, then even rcu_read_lock() might trigger the RCU stall
warning as well.
Best Regards,
Petr