Hi,

On 18/10/20 10:46, ouwen wrote:
> On Fri, Oct 16, 2020 at 01:48:17PM +0100, Valentin Schneider wrote:
>> ---
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index a5b6eac07adb..1ebf653c2c2f 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1859,6 +1859,13 @@ static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
>>      return rq;
>>  }
>>  
>> +struct set_affinity_pending {
>> +    refcount_t              refs;
>> +    struct completion       done;
>> +    struct cpu_stop_work    stop_work;
>> +    struct migration_arg    arg;
>> +};
>> +
>>  /*
>>   * migration_cpu_stop - this will be executed by a highprio stopper thread
>>   * and performs thread migration by bumping thread off CPU then
>> @@ -1866,6 +1873,7 @@ static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
>>   */
>>  static int migration_cpu_stop(void *data)
>>  {
>> +    struct set_affinity_pending *pending;
>>      struct migration_arg *arg = data;
>>      struct task_struct *p = arg->task;
>>      struct rq *rq = this_rq();
>> @@ -1886,13 +1894,22 @@ static int migration_cpu_stop(void *data)
>>  
>>      raw_spin_lock(&p->pi_lock);
>>      rq_lock(rq, &rf);
>> +
>> +    if (arg->done)
>
> If I'm not wrong (always likely), arg->done points to the ->done of the
> pending installed by the first task that called sca. It should not be
> NULL, since it is a pointer to a stack address, regardless of what that
> stack currently contains.
>

Correct; here I'm using it as an indicator of whether migration_cpu_stop()
was invoked by SCA with a pending affinity request. I'll admit it's icky;
I'd prefer having an explicit flag to check against.
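
Roughly what I have in mind, purely as a sketch (the flags field and the
MIGRATION_ARG_SCA_PENDING name are made up here, and this assumes
migration_arg carries the ->done pointer this series adds):

	struct migration_arg {
		struct task_struct	*task;
		int			dest_cpu;
		struct completion	*done;
		unsigned int		flags;	/* hypothetical explicit marker */
	};

	#define MIGRATION_ARG_SCA_PENDING	0x1

	/* migration_cpu_stop() would then test the flag rather than ->done: */
	if (arg->flags & MIGRATION_ARG_SCA_PENDING)
		pending = container_of(arg->done, struct set_affinity_pending, done);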

>> +            pending = container_of(arg->done, struct set_affinity_pending, done);
>>      /*
>>       * If task_rq(p) != rq, it cannot be migrated here, because we're
>>       * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
>>       * we're holding p->pi_lock.
>>       */
>>      if (task_rq(p) == rq) {
>> -            if (is_migration_disabled(p))
>> +            /*
>> +             * An affinity update may have raced with us.
>> +             * p->migration_pending could now be NULL, or could be pointing
>> +             * elsewhere entirely.
>> +             */
>> +            if (is_migration_disabled(p) ||
>> +                (arg->done && p->migration_pending != pending))
>                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> p->migration_pending can be set to a pending on some random task's stack,
> and that address could happen to be the same as the previous pending's.
> It's very, very unlikely. But I also totally failed here.
>

Do you mean if we encounter the above race, but on top of that a new
pending gets installed that has the *same* address as the previous one?

That would mean that the task which installed that first pending got out of
affine_move_task() and *back into it*, with the same stack depth, before the
stopper got to run & grab the task_rq_lock. I also thought about this, but
am unsure how far to push the paranoia.
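
To spell out the scenario I'm picturing (purely hypothetical, just to show
why the pointer comparison alone can't catch it):

	/*
	 * Task A: affine_move_task()
	 *   installs pending_A on its stack, p->migration_pending = &pending_A,
	 *   queues migration_cpu_stop(), then gets its request completed some
	 *   other way and returns all the way out of SCA.
	 *
	 * Task A: calls set_cpus_allowed_ptr() again at the same stack depth,
	 *   installs pending_B at the *same* address pending_A occupied.
	 *
	 * Stopper: migration_cpu_stop()
	 *   p->migration_pending == &pending_B, which is numerically equal to
	 *   the stale pending_A derived from arg->done, so the
	 *   "p->migration_pending != pending" check doesn't fire even though
	 *   this is a different request.
	 */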


Side thought: don't we need to NULL p->migration_pending in __sched_fork()?
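
i.e. something like this in __sched_fork() (untested, but I don't see why a
child should inherit its parent's pending pointer):

	/* A forked task shouldn't carry over a stale affinity request */
	p->migration_pending = NULL;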

> I couldn't come up with anything at the time, so for now this is just
> noise: maybe use refcount_add()/refcount_dec() on the MIGRATE_ENABLE
> path to prevent that. Not sure yet.
>

One annoying thing is that in that path we can't wait for the refcount to
reach 0, since migrate_{disable,enable}() disable preemption (the stopper
only gets scheduled once preemption is re-enabled in migrate_enable()).

Including the stopper callback in the refcount chain would probably reduce
future headaches, but it's not as straightforward.
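
Concretely, I'd picture something along these lines (only a sketch; the
error paths and the existing ref handling in affine_move_task() would need
more care):

	/* affine_move_task(): pin *pending for the stopper callback itself */
	refcount_inc(&pending->refs);
	stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
			    &pending->arg, &pending->stop_work);

	/* migration_cpu_stop(): drop the stopper's reference when done */
	if (pending) {
		complete(&pending->done);
		refcount_dec(&pending->refs);
	}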


>>                      goto out;
>>  
>>              if (task_on_rq_queued(p))
>> @@ -2024,13 +2041,6 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
>>      __do_set_cpus_allowed(p, new_mask, 0);
>>  }
>>  
>> -struct set_affinity_pending {
>> -    refcount_t              refs;
>> -    struct completion       done;
>> -    struct cpu_stop_work    stop_work;
>> -    struct migration_arg    arg;
>> -};
>> -
>>  /*
>>   * This function is wildly self concurrent; here be dragons.
>>   *
