On 6/10/26 12:46 PM, Wang Zhan wrote:
> count_cpu_cores__() allocates the CPU affinity mask based on the
> number of online CPUs returned by sysconf(_SC_NPROCESSORS_ONLN).
> On Linux, sched_getaffinity() returns EINVAL if the supplied mask is
> smaller than the kernel affinity mask.
>
> This can happen when the possible CPU range exceeds the configured CPU
> count reported by sysconf(_SC_NPROCESSORS_CONF), for example on systems
> or VMs configured for CPU hotplug. In that case OVS falls back to the
> online CPU count and may ignore the process affinity mask when sizing
> handler threads.
>
> Use _SC_NPROCESSORS_CONF as the initial Linux allocation size and retry
> sched_getaffinity() with larger dynamically allocated CPU masks on EINVAL,
> while preserving fallback behavior for other failures. This follows the
> sched_getaffinity(2) guidance for large CPU affinity masks: probe with
> dynamically allocated masks of increasing size until the call no longer
> fails with EINVAL.
>
> Testing:
> - Tested on a QEMU/KVM Rocky Linux 9.7 VM running Linux 6.6.119,
> with CPUs 0-159 possible, CPUs 0-19 online, and the tested process
> limited to CPUs 6-9; verified that sched_getaffinity() failed with
> CPU_ALLOC(20/64/128) and succeeded with CPU_ALLOC(160), returning
> 4 CPUs.
>
> Assisted-by: GPT-5, Codex
> Signed-off-by: Wang Zhan <[email protected]>
> ---
> v2:
> - Use _SC_NPROCESSORS_CONF as the initial Linux allocation size.
> - Keep the EINVAL retry loop for CPU hotplug environments where configured
> CPUs are still smaller than the kernel affinity mask.
> - Remove one-shot enums, extra casts, MIN(), and use '* 2' growth.
> - Use full name in Signed-off-by.
>
> lib/ovs-thread.c | 25 +++++++++++++++++++++----
> 1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c
> index 243791de8..095843d8d 100644
> --- a/lib/ovs-thread.c
> +++ b/lib/ovs-thread.c
> @@ -647,14 +647,31 @@ count_cpu_cores__(void)
> #endif
> #ifdef __linux__
The code above still requests _SC_NPROCESSORS_ONLN for linux, but the
value is not really needed for anything. We should just store the
_SC_NPROCESSORS_CONF value in n_cores here if it is linux and avoid
the unnecessary _SC_NPROCESSORS_ONLN.
> if (n_cores > 0) {
> - cpu_set_t *set = CPU_ALLOC(n_cores);
> -
> - if (set) {
> - size_t size = CPU_ALLOC_SIZE(n_cores);
> + long int n_conf = sysconf(_SC_NPROCESSORS_CONF);
> + size_t n_alloc = MAX(n_cores, n_conf);
n_cores is always smaller than n_conf, so this is not very useful.
> + size_t max_n_alloc = MAX(n_alloc, 1 << 18);
And we don't really need a separate n_alloc. We could track the size
in n_cores variable, it is not used inside the loop anyway, only
updated before breaking. And so, we can do:
size_t max_n_cores = MAX(n_cores, 1 << 18);
And use that in the loop.
> + int error = EINVAL;
> +
> + /* sched_getaffinity() returns EINVAL when 'size' is smaller than the
> + * kernel affinity mask. This can happen when the possible CPU range
> + * exceeds the CPU count reported by sysconf(). */
> + while (error == EINVAL && n_alloc <= max_n_alloc) {
> + size_t size = CPU_ALLOC_SIZE(n_alloc);
> + cpu_set_t *set = CPU_ALLOC(n_alloc);
> +
> + if (!set) {
> + break;
> + }
>
> + CPU_ZERO_S(size, set);
> if (!sched_getaffinity(0, size, set)) {
> n_cores = CPU_COUNT_S(size, set);
> + CPU_FREE(set);
> + break;
> }
> +
> + error = errno;
This is not fully correct. On sucess errno may not be set to zero, it
is only meaningful on error. A better option here is to write the else
case for the if condition above and update the error and the n_cores in
both branches (on success set error to zero and n_cores to the actual
count, on failure - set error to errno and double the n_cores), then
the CPU_FREE can go to the end of the loop and the loop condition will
break, i.e. no need to break inside the if.
> + n_alloc *= 2;
> CPU_FREE(set);
> }
> }
Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev