On Thu, May 07, 2026 at 03:12:34PM +0100, Daniel P. Berrangé wrote:
On Thu, May 07, 2026 at 03:45:03PM +0200, Martin Kletzander via Devel wrote:
Hello,

so we found an issue in the ESX driver which is pretty easy to solve,
but requires an update to the communication with the server.

What is the actual issue that needs this change ?


Long story short (not sure if the issue [1] is accessible publicly)
when there are multiple machines on vSphere which have the same UUID (in
VMX that is uuid.bios, in the API it is config.uuid) then because we
search based on that (and keep that around in virDomain struct) the
server returns just the first VM with that UUID, so it can happen that
you do:

`virsh dumpxml vm_a`

and get a domain XML for vm_b.

Of course worse issues are when you try and do shutdown, reset, etc.

Apparently the previous uuid lost its uniqueness at some point, I'd
guess it's related to cloning from templates, but I cannot be sure.  So
there is another field now, instanceUuid (vc.uuid in the VMX and
config.instanceUuid in the API) which can be thought of as "more unique" (I
know how bad that sounds).  We can even search for it using the same
FindByUuid() function that we already use, we only need to change its
`instanceUuid` boolean to `true` because that says we are searching
based on the config.instanceUuid and not config.uuid.
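
To make the ambiguity concrete, here is a toy model in Python.  It is not
the libvirt or vSphere code: the inventory, the UUID values and the
`find_by_uuid()` helper are all made up for illustration and only mimic
the "first match wins" behaviour described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VM:
    name: str
    uuid: str           # stands in for config.uuid (uuid.bios in the VMX)
    instance_uuid: str  # stands in for config.instanceUuid (vc.uuid in the VMX)


# Hypothetical inventory: both VMs share the same BIOS UUID.
SHARED = "4210aaaa-0000-0000-0000-000000000001"
INVENTORY = [
    VM("vm_b", SHARED, "5010bbbb-0000-0000-0000-00000000000b"),
    VM("vm_a", SHARED, "5010bbbb-0000-0000-0000-00000000000a"),
]


def find_by_uuid(uuid: str, instance_uuid: bool = False) -> Optional[VM]:
    """Return the first VM whose selected UUID field matches."""
    for vm in INVENTORY:
        key = vm.instance_uuid if instance_uuid else vm.uuid
        if key == uuid:
            return vm
    return None


vm_a = INVENTORY[1]
by_bios = find_by_uuid(vm_a.uuid)                                   # ambiguous
by_instance = find_by_uuid(vm_a.instance_uuid, instance_uuid=True)  # unique
print(by_bios.name, by_instance.name)  # prints "vm_b vm_a"
```

Looking up vm_a by the shared UUID hands back vm_b, which is the
`virsh dumpxml vm_a` surprise from above; the instance UUID lookup does
not have that problem as long as the instance UUIDs really are unique.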

However, after switching that boolean from `esxVI_Boolean_Undefined` to
`esxVI_Boolean_True` I get a 500 server error.  And that's where the
investigation got a bit more "interesting".

The newer
VI API allows for the instanceUuid parameter when searching for VMs, but
only since 4.0, and because we do not send any "SOAPAction:" header it
behaves as if we used the oldest one.  If we add the header "SOAPAction:
urn:vim25/4.0" (at least 4.0) it works, but I'm not sure if there are
any incompatibilities and whether there is a use case for older clusters
that do not support at least this version.

As Richard says, it would be nice to understand what version 4.0
corresponds to in terms of vcenter releases ?


It would be.  But I am getting confused with their versioning.  Even the
API docs are confusing sometimes.  I was testing this on a cluster with
VI API version 7.0.3.0, the documentation [2] explicitly mentions all
parameters of the FindByUuid() function to be available since 2.0.

Now I get different behaviour based on what is in the SOAPAction HTTP
header.  Testing it more thoroughly today (and realising there are some
bad dependencies because a simple `ninja -C build` does not account for
all the changes somehow) I get this:

- No SOAPAction added -- HTTP 500, Unexpected element tag "instanceUuid"
  seen
- "SOAPAction: urn:vim25" -- Lots of warnings about unexpected properties
  for deserialisation (from libvirt codebase), but the API works
- "SOAPAction: urn:vim25/1.0" -- Same as above
- "SOAPAction: urn:vim25/2.0" -- Same as above
- "SOAPAction: urn:vim25/2.5" -- Same as above
- "SOAPAction: urn:vim25/3.0" -- Same as above
- "SOAPAction: urn:vim25/4.0" -- No warnings, "just works"
- "SOAPAction: urn:vim25/7.0" -- Works, but there are again lots of
  warnings about the same properties in UserSession, ServiceContent,
  HostConfigManager etc.
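
For anyone who wants to reproduce the matrix above outside of libvirt,
this is roughly how the header can be varied.  It is a minimal sketch
using Python's standard library rather than the libvirt code; the host
name, the empty body and the `build_request()` helper are placeholders,
and the header value is written unquoted exactly as in the list above.
The request object is only built here, never sent.

```python
import urllib.request
from typing import Optional


def build_request(url: str, body: bytes,
                  api_version: Optional[str]) -> urllib.request.Request:
    """Build a SOAP POST request, optionally with a SOAPAction header.

    api_version=None -> no SOAPAction header at all
    api_version=""   -> "SOAPAction: urn:vim25"
    api_version="4.0" -> "SOAPAction: urn:vim25/4.0"
    """
    headers = {"Content-Type": "text/xml; charset=UTF-8"}
    if api_version is not None:
        action = "urn:vim25"
        if api_version:
            action += "/" + api_version
        headers["SOAPAction"] = action
    return urllib.request.Request(url, data=body, headers=headers,
                                  method="POST")


# Placeholder endpoint and body, for illustration only.
URL = "https://esx.example.com/sdk"
req = build_request(URL, b"", "4.0")

# urllib stores header names capitalised ("Soapaction"); HTTP header
# names are case-insensitive, so this does not change the meaning.
print(req.get_header("Soapaction"))
```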

Now, we explicitly check that the API version is at least 2.5, but based
on the warnings the codebase might be more prepared for 4.0.  When
looking at the properties the warnings mention, for example those of
UserSession, they are properties of that type added specifically in
vSphere API Release 5.0 and 5.1, which means that for some reason it is
probably behaving as a newer API version unless we specify 4.0 exactly.

Now we can add the new fields as optional, that's not a problem (I tried
it and it works nicely).  I am just not familiar enough with backwards
compatibility of the vSphere API to know whether we can just add
"SOAPAction: urn:vim25" (without defining the number) and call it a day
or how careful we need to be.

We have not formally specified what vmware versions we target, but if
we take our usual approach to limit our liability, we'd probably want
to go for compat with something like the N most recent major versions
for low values of "N"

If there is a reasonable use case for it (well, anyone saying "here, me,
I'd like to keep it") we could change the behaviour based on the version
we parse when connecting to the server.  But that's an extra bit of
mess, although not a totally big one.

Or accept a URI parameter to control compatibility level ?


That would be safer, but would shift the responsibility to layers that
I am sure do not want to deal with it.  And we are parsing the API
version so we at least know what should work.
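
The two options could be combined along these lines.  This is only a
sketch of the decision logic in Python, not a patch: the `api_compat`
URI parameter is hypothetical (nothing like it exists today), the
function names are made up, pinning 4.0 is just what happened to work in
my testing, and the version parser assumes purely numeric dotted
strings.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def parse_api_version(text: str) -> Tuple[int, ...]:
    """Turn "7.0.3.0" into (7, 0, 3, 0); assumes numeric components."""
    return tuple(int(part) for part in text.split("."))


def choose_soap_action(server_api_version: str,
                       conn_uri: str) -> Optional[str]:
    """Pick the SOAPAction value, or None to send no header."""
    query = parse_qs(urlsplit(conn_uri).query)
    override = query.get("api_compat")  # hypothetical URI parameter
    if override:
        return "urn:vim25/" + override[0]
    # No override: pin 4.0 when the server reports at least that version.
    if parse_api_version(server_api_version) >= (4, 0):
        return "urn:vim25/4.0"
    # Older server: keep the current behaviour of sending no header.
    return None


print(choose_soap_action("7.0.3.0", "esx://esx.example.com/"))
print(choose_soap_action("7.0.3.0", "esx://esx.example.com/?api_compat=7.0"))
print(choose_soap_action("2.5", "esx://esx.example.com/"))
```

With this shape the default comes from the version we already parse, and
the URI parameter is only an escape hatch, so the upper layers do not
have to care unless they want to.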

[1] https://redhat.atlassian.net/browse/RHEL-174300
[2] https://developer.broadcom.com/xapis/vsphere-web-services-api/latest/vim.SearchIndex.html#findByUuid

With regards,
Daniel
--
|: https://berrange.com       ~~        https://hachyderm.io/@berrange :|
|: https://libvirt.org          ~~          https://entangle-photo.org :|
|: https://pixelfed.art/berrange   ~~    https://fstop138.berrange.com :|
