On 29/01/21 01:02, Tao Zhou wrote:
> Hi,
>
> On Wed, Jan 27, 2021 at 07:30:35PM +0000, Valentin Schneider wrote:
>
>> Fiddling some more with a TLA+ model of set_cpus_allowed_ptr() & friends
>> unearthed one more outstanding issue. This doesn't even involve
>> migrate_disable(), but rather affinity changes and execution of the stopper
>> racing with each other.
>>
>> My own interpretation of the (lengthy) TLA+ splat (note the potential for
>> errors at each level) is:
>>
>>   Initial conditions:
>>     victim.cpus_mask = {CPU0, CPU1}
>>
>>   CPU0                             CPU1                             CPU<don't care>
>>
>>   switch_to(victim)
>>                                    set_cpus_allowed(victim, {CPU1})
>>                                      kick CPU0 migration_cpu_stop({.dest_cpu = CPU1})
>>   switch_to(stopper/0)
>>                                                                     // e.g. CFS load balance
>>                                                                     move_queued_task(CPU0, victim, CPU1);
>                                                                      ^^^^^^^^^^^^^^^^
>
> Why is it move_queued_task() and not attach_task()/detach_task() for CFS load
> balance?
>

Heh, I expected that one; it is indeed detach_task()/attach_task() for CFS
LB. I didn't want to make this any longer than it needed to be, and I figured
that, move_queued_task() being a composition of detach_task(), attach_task()
and rq_locks, it would get the point across.
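
For reference, this is roughly the shape of move_queued_task() in
kernel/sched/core.c around that time (a paraphrased sketch, not the verbatim
mainline code):

  /*
   * Paraphrased sketch of move_queued_task(): detach from the source rq,
   * retarget the task, swap rq locks, attach on the destination rq.
   */
  static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
                                     struct task_struct *p, int new_cpu)
  {
          lockdep_assert_held(&rq->lock);

          deactivate_task(rq, p, DEQUEUE_NOCLOCK);  /* ~detach_task() */
          set_task_cpu(p, new_cpu);
          rq_unlock(rq, rf);

          rq = cpu_rq(new_cpu);

          rq_lock(rq, rf);
          BUG_ON(task_cpu(p) != new_cpu);
          activate_task(rq, p, 0);                  /* ~attach_task() */
          check_preempt_curr(rq, p, 0);

          return rq;
  }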

This does raise an "interesting" point: ATM I don't think this issue can
actually involve move_queued_task(), since all current move_queued_task()
calls are issued either from a stopper or from set_cpus_allowed_ptr().

CFS' detach_task() + attach_task() could do it, though.
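
For completeness, the CFS LB pair looks roughly like this (paraphrased from
kernel/sched/fair.c, not the verbatim code); the point being that
detach_task() runs under the src rq lock and attach_task() under the dst rq
lock, with a window in between where neither is held:

  static void detach_task(struct task_struct *p, struct lb_env *env)
  {
          lockdep_assert_held(&env->src_rq->lock);

          /* Dequeue from the src rq and retarget; the task is now on no rq */
          deactivate_task(env->src_rq, p, DEQUEUE_NOCLOCK);
          set_task_cpu(p, env->dst_cpu);
  }

  static void attach_task(struct rq *rq, struct task_struct *p)
  {
          lockdep_assert_held(&rq->lock);

          /* Enqueue on the dst rq, possibly preempting its current task */
          BUG_ON(task_rq(p) != rq);
          activate_task(rq, p, ENQUEUE_NOCLOCK);
          check_preempt_curr(rq, p, 0);
  }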

>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 06b449942adf..b57326b0a742 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1923,20 +1923,28 @@ static int migration_cpu_stop(void *data)
>>                      complete = true;
>>              }
>>
>> -            /* migrate_enable() --  we must not race against SCA */
>> -            if (dest_cpu < 0) {
>> -                    /*
>> -                     * When this was migrate_enable() but we no longer
>> -                     * have a @pending, a concurrent SCA 'fixed' things
>> -                     * and we should be valid again. Nothing to do.
>> -                     */
>> -                    if (!pending) {
>> -                            WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), &p->cpus_mask));
>> -                            goto out;
>> -                    }
>> +           /*
>> +            * When this was migrate_enable() but we no longer
>> +            * have a @pending, a concurrent SCA 'fixed' things
>> +            * and we should be valid again.
>> +            *
>> +            * This can also be a stopper invocation that was 'fixed' by an
>> +            * earlier one.
>> +            *
>> +            * Nothing to do.
>> +            */
>> +            if ((dest_cpu < 0 || dest_cpu == cpu_of(rq)) && !pending) {
>
> When the condition 'dest_cpu == cpu_of(rq)' is true, pending is not NULL.
> The condition may be like this:
>
>     if ((dest_cpu < 0 && !pending) || dest_cpu == cpu_of(rq))
>
> We want to choose one CPU in the new (currently modified) cpus_mask and
> complete all.
>

Consider the execution of migration_cpu_stop() in the above trace with
migrate_task_to(). We do have:
- dest_cpu == cpu_of(rq)
- p->migration_pending

but we do *not* want to bail out at this condition, because we need to fix
up dest_cpu.
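
Spelling that out against the check itself (illustration only, reusing the
names from the patch above):

  /*
   * Illustration only, not a further patch. With the suggested variant:
   *
   *     if ((dest_cpu < 0 && !pending) || dest_cpu == cpu_of(rq))
   *
   * the migrate_task_to() stopper above would bail out on seeing
   * dest_cpu == cpu_of(rq) despite p->migration_pending being set,
   * leaving the stale dest_cpu in place. Hence the bail-out must also
   * require !pending:
   */
  if ((dest_cpu < 0 || dest_cpu == cpu_of(rq)) && !pending) {
          WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), &p->cpus_mask));
          goto out;
  }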
