On 12/11/2015 04:15 PM, Jason Baron wrote:
On 12/10/2015 04:30 PM, Chris Friesen wrote:
If I put a task into a cpuset and then call sched_setaffinity() on it, it will be affined to the intersection of the two sets of cpus (those specified in the cpuset and those specified in the syscall). However, if I then change the cpus in the cpuset, the process affinity will simply be overwritten by the new cpuset affinity. It does not seem to take into account any restrictions from the original sched_setaffinity() call. Wouldn't it make more sense to affine the process to the intersection between the new set of cpus from the cpuset and the current process affinity? That way, if I explicitly masked out certain CPUs in the original sched_setaffinity() call, they would remain masked out regardless of changes to the set of cpus assigned to the cpuset.
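To make the question concrete, here is a minimal user-space sketch of the two behaviours, not kernel code. It assumes a toy model in which bit n of a 64-bit mask stands for CPU n; the type cpu_mask and every function name below are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (assumption): bit n of the mask represents CPU n. */
typedef uint64_t cpu_mask;

/* Build a mask covering CPUs first..last inclusive. */
static inline cpu_mask cpu_range(unsigned first, unsigned last)
{
    assert(first <= last && last < 64);
    cpu_mask m = 0;
    for (unsigned c = first; c <= last; c++)
        m |= (cpu_mask)1 << c;
    return m;
}

/* sched_setaffinity() on a task inside a cpuset: the result is the
 * intersection of the cpuset's cpus and the requested mask. */
static inline cpu_mask affinity_after_setaffinity(cpu_mask cpuset_cpus,
                                                  cpu_mask requested)
{
    return cpuset_cpus & requested;
}

/* Behaviour described in the message: changing the cpuset's cpus
 * overwrites the task's affinity with the new cpuset mask. */
static inline cpu_mask on_cpuset_change_overwrite(cpu_mask current_affinity,
                                                  cpu_mask new_cpuset_cpus)
{
    (void)current_affinity; /* the earlier restriction is discarded */
    return new_cpuset_cpus;
}

/* Suggested alternative: intersect the new cpuset mask with the task's
 * current affinity, so masked-out CPUs stay masked out. */
static inline cpu_mask on_cpuset_change_intersect(cpu_mask current_affinity,
                                                  cpu_mask new_cpuset_cpus)
{
    return current_affinity & new_cpuset_cpus;
}
```

With a cpuset of CPUs 0-3 and a request for CPUs 2-5, the task ends up on CPUs 2-3. Growing the cpuset to CPUs 0-7 then gives CPUs 0-7 under the overwrite behaviour and CPUs 2-3 under the intersection. Note that intersecting with the current affinity can only ever shrink the mask; CPUs 4-5 from the original request are not recovered in this model.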
<snip>
To add the behavior you are describing, I think we would need another cpumask_t field in the task_struct, where we could store the last requested mask value for sched_setaffinity() and then use it, via an intersection as you described, when updating the cpus for a cpuset. I think adding a task to a cpuset should still wipe out any sched_setaffinity() settings, but that would depend on the desired semantics here. It would also require a knob so as not to break existing behavior by default.
Agreed, the additional field in the task_struct makes sense. Personally I don't think that adding a task to a cpuset should wipe out any previously-set affinity, I think it should take the intersection for that case as well.
In this environment it might make sense to have separate queries to return the requested and actual affinity.
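A rough sketch of how the extra field and the two queries could fit together, again as a user-space toy model rather than a kernel patch. The struct task_model, its fields, and all function names are hypothetical; the reset_requested flag stands in for the two attach semantics discussed above (wipe versus intersect), and the model does not decide what should happen when an intersection comes out empty.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (assumption): bit n of the mask represents CPU n. */
typedef uint64_t cpu_mask;

struct task_model {
    cpu_mask requested; /* hypothetical extra field: last sched_setaffinity() mask */
    cpu_mask effective; /* mask actually in force for the task */
};

/* Build a mask covering CPUs first..last inclusive. */
static inline cpu_mask cpus(unsigned first, unsigned last)
{
    assert(first <= last && last < 64);
    cpu_mask m = 0;
    for (unsigned c = first; c <= last; c++)
        m |= (cpu_mask)1 << c;
    return m;
}

/* Attach a task to a cpuset. reset_requested != 0 wipes any earlier
 * request; reset_requested == 0 keeps it and intersects. */
static inline void task_attach(struct task_model *t, cpu_mask cpuset_cpus,
                               int reset_requested)
{
    if (reset_requested)
        t->requested = ~(cpu_mask)0; /* "no restriction requested" */
    t->effective = t->requested & cpuset_cpus;
}

/* Model of sched_setaffinity(): remember the request, apply the intersection. */
static inline void task_setaffinity(struct task_model *t, cpu_mask cpuset_cpus,
                                    cpu_mask mask)
{
    t->requested = mask;
    t->effective = mask & cpuset_cpus;
}

/* The cpuset's cpus changed: recompute from the stored request. */
static inline void task_cpuset_changed(struct task_model *t,
                                       cpu_mask new_cpuset_cpus)
{
    t->effective = t->requested & new_cpuset_cpus;
}

/* Separate queries for the requested and the actual affinity. */
static inline cpu_mask task_requested(const struct task_model *t)
{
    return t->requested;
}

static inline cpu_mask task_effective(const struct task_model *t)
{
    return t->effective;
}
```

Because the request is stored separately, a task that asked for CPUs 2-5 while its cpuset held CPUs 0-3 runs on CPUs 2-3, and regains CPUs 4-5 once the cpuset grows to CPUs 0-7. Intersecting with the current affinity alone would not give that result.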
You could also create a child cgroup for the process that you don't want to change and set the cpus on that cgroup instead of using sched_setaffinity(). Then you change the cpus for the parent cgroup, and that shouldn't affect the child as long as the child cgroup is a subset. But it's not entirely clear to me if that addresses your use-case?
I ended up doing something like this where I had a top-level cpuset and a number of child cpusets, each with an exclusive subset of the CPUs assigned to it. But it meant that I needed more complicated code to figure out which tasks needed to go into which child cpusets, and more complicated code to handle removing a CPU from the top-level cpuset (since you have to remove it from any children first).
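The bookkeeping described above can be sketched roughly as follows. This is a toy in-memory model, not the cpuset filesystem interface: struct cpuset_model, MAX_CHILDREN, and both functions are hypothetical names, and the model only captures the ordering constraint (children first, then the top level) plus the exclusive-subset rule.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CHILDREN 8 /* arbitrary limit for this sketch */

/* Toy model (assumption): bit n of the mask represents CPU n. */
typedef uint64_t cpu_mask;

struct cpuset_model {
    cpu_mask top;                 /* cpus of the top-level cpuset */
    cpu_mask child[MAX_CHILDREN]; /* exclusive subsets of top */
    unsigned nchildren;
};

/* Add a child cpuset. Returns its index, or -1 if the mask is not a
 * subset of the top-level mask, overlaps a sibling, or no slot is free. */
static inline int add_child(struct cpuset_model *s, cpu_mask child_cpus)
{
    if (s->nchildren >= MAX_CHILDREN)
        return -1;
    if (child_cpus & ~s->top)
        return -1; /* not a subset of the top-level cpuset */
    for (unsigned i = 0; i < s->nchildren; i++)
        if (s->child[i] & child_cpus)
            return -1; /* would overlap an existing child */
    s->child[s->nchildren] = child_cpus;
    return (int)s->nchildren++;
}

/* Remove one CPU from the top-level cpuset: clear it from every child
 * first, and only then from the top level. */
static inline void remove_cpu(struct cpuset_model *s, unsigned cpu)
{
    assert(cpu < 64);
    cpu_mask bit = (cpu_mask)1 << cpu;
    for (unsigned i = 0; i < s->nchildren; i++)
        s->child[i] &= ~bit;
    s->top &= ~bit;
}
```

The model leaves out the other half of the extra work mentioned above, namely deciding which tasks belong in which child cpuset, and it does not say what to do with a child whose mask becomes empty.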
Chris