On Thu, 6 Apr 2017, Ingo Molnar wrote:
> CPU hotplug and changing the affinity mask are the more complex cases,
> because there migrating or not migrating is a correctness issue:
> 
>  - CPU hotplug has to be aware of this anyway, regardless of whether it's
>    solved via a counter or the affinity mask.

You simply have to prevent CPU hotplug as long as there are
migration-disabled tasks in flight. Making that depend on whether they are
on a CPU which is about to be unplugged or not would be complete overkill,
as you still have to solve the case of a task calling migrate_disable()
AFTER the CPU down machinery has started.
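To make that concrete, here is a rough sketch of the shape this takes -
all names (nr_pinned_tasks, ->migration_disabled) are made up for
illustration, this is not the actual -rt code:

  /* Per-CPU count of tasks currently inside a migrate_disable() region */
  static DEFINE_PER_CPU(int, nr_pinned_tasks);

  void migrate_disable(void)
  {
          preempt_disable();
          if (!current->migration_disabled++)
                  this_cpu_inc(nr_pinned_tasks);
          preempt_enable();
  }

  void migrate_enable(void)
  {
          preempt_disable();
          if (!--current->migration_disabled)
                  this_cpu_dec(nr_pinned_tasks);
          preempt_enable();
  }

  /*
   * Naive hotplug interlock: the cpu-down path waits for the last pinned
   * task to leave its migrate-disabled region.  Note that this alone does
   * NOT close the race mentioned above: a task can enter migrate_disable()
   * right after the wait finished, so the down path also has to stop new
   * entries on the outgoing CPU.
   */
  while (per_cpu(nr_pinned_tasks, cpu))
          schedule_timeout_uninterruptible(1);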

>  - Changing the affinity mask (set_cpus_allowed()) has two main cases:
>    the synchronous and asynchronous case:
> 
>      - synchronous is when the current task changes its own affinity mask,
>        this should work fine mostly out of box, as we don't call
>        set_cpus_allowed() from inside migration disabled regions. (We can
>        enforce this via a debugging check.)
> 
>      - The asynchronous case is when the affinity mask of some other task
>        is changed - this would not have an immediate effect with
>        migration-disabled logic, the migration would be delayed to when
>        migration is re-enabled again.
> 
> As for general fragility, is there any reason why a simple debugging check in 
> set_task_cpu() would not catch most mishaps:
> 
>       WARN_ON_ONCE(p->state != TASK_RUNNING && p->migration_disabled);
> 
> ... or something like that?
>
> I.e. my point is that I think using a counter would be much simpler, yet
> still as robust and maintainable. We could in fact move
> migrate_disable()/enable() upstream straight away and eliminate this small
> fork of functionality between mainline and -rt.

The counter alone might be enough for the scheduler placement decisions,
but it cannot solve the hotplug issue. You still need something like what I
sketched out in my previous reply.
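For the placement part a counter check is indeed trivial - something along
these lines (again, invented names, just to illustrate) in the wakeup and
balance paths keeps a migrate-disabled task on its current CPU:

  static inline bool task_can_move_to(struct task_struct *p, int dest_cpu)
  {
          /* A migrate-disabled task must stay where it is */
          if (p->migration_disabled)
                  return dest_cpu == task_cpu(p);

          return cpumask_test_cpu(dest_cpu, &p->cpus_allowed);
  }

But nothing in such a check prevents the CPU the task is pinned on from
being taken down, so the hotplug side remains a separate problem.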

Thanks,

        tglx
