Hi.

On Thu, Jan 01, 2026 at 02:15:58PM -0500, Waiman Long wrote:
> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
> with the cpuset.cpus/cpuset.cpus.exclusive of a sibling partition,
> the sibling's partition state becomes invalid. This is overly harsh and
> is probably not necessary.
> 
> The cpuset.cpus.exclusive control file, if set, will override the
> cpuset.cpus of the same cpuset when creating a cpuset partition.
> So cpuset.cpus has less priority than cpuset.cpus.exclusive in setting up
> a partition.  However, it cannot override a conflicting cpuset.cpus file
> in a sibling cpuset and the partition creation process will fail. This
> is inconsistent.  That will also make using cpuset.cpus.exclusive less
> valuable as a tool to set up cpuset partitions as the users have to
> check if such a cpuset.cpus conflict exists or not.
> 
> Fix these problems by strictly adhering to the setting of the
> following control files in descending order of priority when setting
> up a partition.
> 
>  1. cpuset.cpus.exclusive.effective of a valid partition
>  2. cpuset.cpus.exclusive
>  3. cpuset.cpus


> 
> So once a cpuset.cpus.exclusive is set without failure, it will
> always be allowed to form a valid partition as long as at least one
> CPU can be granted from its parent irrespective of the state of the
> siblings' cpuset.cpus values. Of course, setting cpuset.cpus.exclusive
> will fail if it conflicts with the cpuset.cpus.exclusive or the
> cpuset.cpus.exclusive.effective value of a sibling.

Concept question:
When a/b/cpuset.cpus.exclusive ⊂ a/b/cpuset.cpus (proper subset),
a/b/cpuset.cpus.partition == root and a/cpuset.cpus.partition == root
(i.e. b is a valid partition),
should a/b/cpuset.cpus.exclusive.effective be equal to a/b/cpuset.cpus (as
all of them happen to be exclusive) or "only" to a/b/cpuset.cpus.exclusive?
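
The two candidate answers can be spelled out with plain sets. This is a
toy sketch in Python, not kernel code: the CPU numbers and variable names
are made up for illustration, and it does not claim which answer the
kernel actually gives.

```python
# Toy illustration only: hypothetical CPU numbers, not kernel code.
b_cpus = {0, 1, 2, 3}        # stands in for a/b/cpuset.cpus
b_exclusive = {0, 1}         # stands in for a/b/cpuset.cpus.exclusive

# Precondition of the question: exclusive is a proper subset of cpus.
assert b_exclusive < b_cpus

# Candidate 1: exclusive.effective covers all of cpuset.cpus.
answer_all_cpus = set(b_cpus)
# Candidate 2: exclusive.effective covers "only" cpuset.cpus.exclusive.
answer_exclusive_only = set(b_exclusive)

# The CPUs whose status the question is really about.
undecided = answer_all_cpus - answer_exclusive_only
print(sorted(undecided))
```

The difference set is exactly the CPUs that are listed in cpuset.cpus but
not in cpuset.cpus.exclusive.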

> Partition can still be created by setting only cpuset.cpus without
> setting cpuset.cpus.exclusive. However, any conflicting CPUs in sibling's
> cpuset.cpus.exclusive.effective and cpuset.cpus.exclusive values will
> be removed from its cpuset.cpus.exclusive.effective as long as there
> is still one or more CPUs left and can be granted from its parent. This
> CPU stripping is currently done in rm_siblings_excl_cpus().
> 
> The new code will now try its best to enable the creation of new
> partitions with only cpuset.cpus set without invalidating existing ones.

OK. (After I re-learnt the benefits of remote partitions or, more
precisely, of cpuset.cpus.effective.)

> However it is not guaranteed that all the CPUs requested in cpuset.cpus
> will be used in the new partition even when all these CPUs can be
> granted from the parent.
> 
> This is similar to the fact that cpuset.cpus.effective may not be
> able to include all the CPUs requested in cpuset.cpus. In this case,
> the parent may not be able to grant all the exclusive CPUs requested in
> cpuset.cpus to cpuset.cpus.exclusive.effective if some of them have
> already been granted to other partitions earlier.
> 
> With the creation of multiple sibling partitions by setting
> only cpuset.cpus, this does have the side effect that their exact
> cpuset.cpus.exclusive.effective settings will depend on the order of
> partition creation if there are conflicts. Due to the exclusive nature
> of the CPUs in a partition, it is not easy to make it fair other than
> the old behavior of invalidating all the conflicting partitions.
> 
> For example,
>   # echo "0-2" > A1/cpuset.cpus
>   # echo "root" > A1/cpuset.cpus.partition
>   # cat A1/cpuset.cpus.partition
>   root
>   # cat A1/cpuset.cpus.exclusive.effective
>   0-2
>   # echo "2-4" > B1/cpuset.cpus
>   # echo "root" > B1/cpuset.cpus.partition
>   # cat B1/cpuset.cpus.partition
>   root
>   # cat B1/cpuset.cpus.exclusive.effective
>   3-4
>   # cat B1/cpuset.cpus.effective
>   3-4
> 
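
The order dependence in the example above can be sketched as a toy model
in Python. It assumes the simplest possible rule (a new partition keeps
whichever requested CPUs no earlier sibling has taken) and is not the
kernel's rm_siblings_excl_cpus(); grant_in_order() is a hypothetical
helper named only for this illustration.

```python
def grant_in_order(requests):
    """Toy model, not kernel code.

    `requests` is a list of (name, cpus) pairs in partition creation
    order. Each partition keeps the requested CPUs that no earlier
    sibling has already taken.
    """
    taken = set()
    granted = {}
    for name, cpus in requests:
        left = set(cpus) - taken
        # In this model an empty `left` would mean no valid partition.
        granted[name] = left
        taken |= left
    return granted


a1 = {0, 1, 2}   # "0-2" written to A1/cpuset.cpus
b1 = {2, 3, 4}   # "2-4" written to B1/cpuset.cpus

a_first = grant_in_order([("A1", a1), ("B1", b1)])
b_first = grant_in_order([("B1", b1), ("A1", a1)])
print(a_first)
print(b_first)
```

With A1 created first, the model gives A1 CPUs 0-2 and B1 CPUs 3-4,
matching the quoted example. Swapping the creation order gives B1 CPUs
2-4 and leaves A1 with 0-1, which is the unfairness described above.
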
> For users who want to be sure that they can get most of the CPUs they
> want,

Slightly OT, but I'd say that users want:
a) confinement (some cpuset.cpus in leaves)
b) isolation (cpuset.cpus.exclusive in leaves)
c) hierarchical organization
  - confinement generalizes OK
  - children can only claim what the parent allowed

Conflicting exclusivity configs should be no user's intention or want :-p


> cpuset.cpus.exclusive should be used instead if they can set
> it successfully without failure. Setting cpuset.cpus.exclusive will
> guarantee that sibling conflicts from then onward are no longer possible.

I think the background idea of the paragraph (shift away from local to
remote partitions, also mentioned the other day) could be somehow fitted
into the Documentation/ hunks.

> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> ...
> @@ -2632,6 +2641,9 @@ Cpuset Interface Files
>  
>       The root cgroup is always a partition root and its state cannot
>       be changed.  All other non-root cgroups start out as "member".
> +     Even though the "cpuset.cpus.exclusive*" control files are not
> +     present in the root cgroup, they are implicitly the same as
> +     "cpuset.cpus".

Even "cpuset.cpus" has CFTYPE_NOT_ON_ROOT, so this formulation might be
confusing. Maybe it's the same as "cpuset.cpus.effective"?

Thanks,
Michal
