On Thu, Apr 24, 2025 at 02:07:17PM +0100, Daniel P. Berrangé wrote:
>
> On Thu, Apr 24, 2025 at 01:22:46PM +0200, Felix Huettner via Devel wrote:
> > In case of a host that has a large number of cpus offline the count of
> > host cpus and the last bit set in the virHostCPUGetOnlineBitmap might
> > diverge significantly. This can e.g. be the case when disabeling smt via
> > /sys/devices/system/cpu/smt/control.
> >
> > On the host this looks like:
> > ```
> > $ cat /sys/devices/system/cpu/present
> > 0-255
> > $ cat /sys/devices/system/cpu/online
> > 0-127
> > ```
> >
> > However in this case virBitmapToData previously only allocated 16 bytes
> > for the output bitmap. This is becase the last set bit is on the 15th
> > byte.
> >
> > Users of virHostCPUGetMap however rely on the "cpumap" containing enough
> > space for all existing cpus (so they would expect 32 bytes in this case).
> > E.g. cmdNodeCpuMap relies on this for its output. It will then actually
> > read 32 bytes from the start of the "cpumap" address where in this case
> > the last 16 of these bytes are uninitialized.
> >
> > This manifests itself in flapping outputs of "virsh nodecpumap --pretty"
> > like:
> > ```
> > $ virsh nodecpumap --pretty
> > CPUs present: 256
> > CPUs online: 128
> > CPU map: 0-127,192,194,202
> >
> > $ virsh nodecpumap --pretty
> > CPUs present: 256
> > CPUs online: 128
> > CPU map: 0-127,192,194,197
> >
> > $ virsh nodecpumap --pretty
> > CPUs present: 256
> > CPUs online: 128
> > CPU map: 0-127,192,194,196-197
> > ```
> >
> > This in turn potentially causes users of this data to report wrong cpu
> > counts.
> >
> > Note that this only seems to happen with at least 256 phyiscal cpus
> > where at least 128 are offline.
> >
> > We fix this by preallocating the expected bitmap size.
> >
> > Signed-off-by: Felix Huettner <[email protected]>
> > ---
> > src/util/virhostcpu.c | 14 +++++++++-----
> > 1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/src/util/virhostcpu.c b/src/util/virhostcpu.c
> > index 5dbcc8987c..626faa88cf 100644
> > --- a/src/util/virhostcpu.c
> > +++ b/src/util/virhostcpu.c
> > @@ -1091,22 +1091,26 @@ virHostCPUGetMap(unsigned char **cpumap,
> > {
> > g_autoptr(virBitmap) cpus = NULL;
> > int ret = -1;
> > - int dummy;
> >
> > virCheckFlags(0, -1);
> >
> > + ret = virHostCPUGetCount();
>
> This sets 'ret' to a positive value....
>
> > +
> > if (!cpumap && !online)
> > - return virHostCPUGetCount();
> > + return ret;
> >
> > if (!(cpus = virHostCPUGetOnlineBitmap()))
> > goto cleanup;
>
> ...in the failure scenario we now jump to 'cleanup'
> with 'ret' still positive.
>
> Better to use a different pattern with separate variables
>
> int ret = -1;
> int ncpus = virHostCPUGetCount();
>
> ...do stuff...
>
> ret = ncpus;
> cleanup:
> if (ret < 0)
> ...
> return ret;
Hi Daniel,
thanks for the review.
I will post a fixed v2.
I decided to get rid of the "goto cleanup" completely and just return -1
instead. "cleanup" seems to only have existed to deallocate cpumap, but
by the time we jump there cpumap can not possibly be allocated.
So i hope this makes it more easy to read.
Thanks a lot,
Felix
>
> >
> > - if (cpumap)
> > - virBitmapToData(cpus, cpumap, &dummy);
> > + if (cpumap) {
> > + int len = (ret + CHAR_BIT) / CHAR_BIT;
> > + *cpumap = g_new0(unsigned char, len);
> > + virBitmapToDataBuf(cpus, *cpumap, len);
> > + }
> > +
> > if (online)
> > *online = virBitmapCountBits(cpus);
> >
> > - ret = virHostCPUGetCount();
> >
> > cleanup:
> > if (ret < 0 && cpumap)
> >
> > base-commit: c5a73f75bc6cae4f466d0a6344d5b3277ac9c2f4
> > --
> > 2.43.0
> >
>
> With regards,
> Daniel
> --
> |: https://berrange.com/
> |: https://libvirt.org/
> |: https://entangle-photo.org/
>