Re: [ovs-dev] [PATCH v2] ovs-thread: Retry sched_getaffinity() with larger CPU masks.

Ilya Maximets Fri, 19 Jun 2026 12:59:11 -0700

On 6/10/26 12:46 PM, Wang Zhan wrote:
> count_cpu_cores__() allocates the CPU affinity mask based on the
> number of online CPUs returned by sysconf(_SC_NPROCESSORS_ONLN).
> On Linux, sched_getaffinity() returns EINVAL if the supplied mask is
> smaller than the kernel affinity mask.
> 
> This can happen when the possible CPU range exceeds the configured CPU
> count reported by sysconf(_SC_NPROCESSORS_CONF), for example on systems
> or VMs configured for CPU hotplug.  In that case OVS falls back to the
> online CPU count and may ignore the process affinity mask when sizing
> handler threads.
> 
> Use _SC_NPROCESSORS_CONF as the initial Linux allocation size and retry
> sched_getaffinity() with larger dynamically allocated CPU masks on EINVAL,
> while preserving fallback behavior for other failures.  This follows the
> sched_getaffinity(2) guidance for large CPU affinity masks: probe with
> dynamically allocated masks of increasing size until the call no longer
> fails with EINVAL.
> 
> Testing:
> - Tested on a QEMU/KVM Rocky Linux 9.7 VM running Linux 6.6.119,
>   with CPUs 0-159 possible, CPUs 0-19 online, and the tested process
>   limited to CPUs 6-9; verified that sched_getaffinity() failed with
>   CPU_ALLOC(20/64/128) and succeeded with CPU_ALLOC(160), returning
>   4 CPUs.
> 
> Assisted-by: GPT-5, Codex
> Signed-off-by: Wang Zhan <[email protected]>
> ---
> v2:
> - Use _SC_NPROCESSORS_CONF as the initial Linux allocation size.
> - Keep the EINVAL retry loop for CPU hotplug environments where configured
>   CPUs are still smaller than the kernel affinity mask.
> - Remove one-shot enums, extra casts, MIN(), and use '* 2' growth.
> - Use full name in Signed-off-by.
> 
>  lib/ovs-thread.c | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c
> index 243791de8..095843d8d 100644
> --- a/lib/ovs-thread.c
> +++ b/lib/ovs-thread.c
> @@ -647,14 +647,31 @@ count_cpu_cores__(void)
>  #endif
>  #ifdef __linux__


The code above still requests _SC_NPROCESSORS_ONLN for linux, but the
value is not really needed for anything.  We should just store the
_SC_NPROCESSORS_CONF value in n_cores here if it is linux and avoid
the unnecessary _SC_NPROCESSORS_ONLN.

>      if (n_cores > 0) {
> -        cpu_set_t *set = CPU_ALLOC(n_cores);
> -
> -        if (set) {
> -            size_t size = CPU_ALLOC_SIZE(n_cores);
> +        long int n_conf = sysconf(_SC_NPROCESSORS_CONF);
> +        size_t n_alloc = MAX(n_cores, n_conf);

n_cores is always smaller than n_conf, so this is not very useful.

> +        size_t max_n_alloc = MAX(n_alloc, 1 << 18);

And we don't really need a separate n_alloc.  We could track the size
in n_cores variable, it is not used inside the loop anyway, only
updated before breaking.  And so, we can do:

        size_t max_n_cores = MAX(n_cores, 1 << 18);

And use that in the loop.

> +        int error = EINVAL;
> +
> +        /* sched_getaffinity() returns EINVAL when 'size' is smaller than the
> +         * kernel affinity mask.  This can happen when the possible CPU range
> +         * exceeds the CPU count reported by sysconf(). */
> +        while (error == EINVAL && n_alloc <= max_n_alloc) {
> +            size_t size = CPU_ALLOC_SIZE(n_alloc);
> +            cpu_set_t *set = CPU_ALLOC(n_alloc);
> +
> +            if (!set) {
> +                break;
> +            }
>  
> +            CPU_ZERO_S(size, set);
>              if (!sched_getaffinity(0, size, set)) {
>                  n_cores = CPU_COUNT_S(size, set);
> +                CPU_FREE(set);
> +                break;
>              }
> +
> +            error = errno;

This is not fully correct.  On sucess errno may not be set to zero, it
is only meaningful on error.  A better option here is to write the else
case for the if condition above and update the error and the n_cores in
both branches (on success set error to zero and n_cores to the actual
count, on failure - set error to errno and double the n_cores), then
the CPU_FREE can go to the end of the loop and the loop condition will
break, i.e. no need to break inside the if.

> +            n_alloc *= 2;
>              CPU_FREE(set);
>          }
>      }

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2] ovs-thread: Retry sched_getaffinity() with larger CPU masks.

Reply via email to