On Thu, May 07, 2026 at 03:12:34PM +0100, Daniel P. Berrangé wrote:
> On Thu, May 07, 2026 at 03:45:03PM +0200, Martin Kletzander via Devel wrote:
>> Hello,
>>
>> so we found an issue in the ESX driver which is pretty easy to solve, but requires an update to the communication with the server.
>
> What is the actual issue that needs this change?

Long story short (I'm not sure whether the issue [1] is publicly accessible): when there are multiple machines on vSphere that share the same UUID (uuid.bios in the VMX, config.uuid in the API), the server returns just the first VM with that UUID, because that is what we search by (and what we keep in the virDomain struct). So it can happen that you run `virsh dumpxml vm_a` and get the domain XML for vm_b. Worse issues, of course, arise when you try to shut down, reset, etc. Apparently the old UUID lost its uniqueness at some point; I'd guess it's related to cloning from templates, but I cannot be sure.

So there is now another field, instanceUuid (vc.uuid in the VMX, config.instanceUuid in the API), which can be thought of as "more unique" (I know how bad that sounds). We can even search by it using the same FindByUuid() function we already use; we only need to set its `instanceUuid` boolean to `true`, which says we are searching by config.instanceUuid rather than config.uuid. However, when I switch that boolean from `esxVI_Boolean_Undefined` to `esxVI_Boolean_True`, I get an HTTP 500 server error. And that's where the investigation got a bit more "interesting".
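
To make the failure mode concrete, here is a minimal sketch of how a FindByUuid request body could be serialised when `instanceUuid` is tri-state: an undefined value omits the element entirely, while `true` emits `<instanceUuid>true</instanceUuid>`, the element the server rejects when no suitable SOAPAction is sent. The `Tristate` type and the `buildFindByUuidBody()` helper are hypothetical stand-ins, not libvirt's actual esxVI serialiser; the element order (_this, uuid, vmSearch, instanceUuid) is an assumption based on the vSphere SearchIndex documentation, and the optional datacenter element is left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical tri-state boolean, loosely modelled on esxVI_Boolean:
 * UNDEFINED means "do not serialise the element at all". */
typedef enum {
    TRISTATE_UNDEFINED = 0,
    TRISTATE_TRUE,
    TRISTATE_FALSE,
} Tristate;

/* Append <name>true|false</name> at *off; write nothing when undefined.
 * Returns -1 if the buffer is too small. */
static int
appendOptionalBool(char *buf, size_t size, size_t *off,
                   const char *name, Tristate value)
{
    int n;

    if (value == TRISTATE_UNDEFINED)
        return 0;

    n = snprintf(buf + *off, size - *off, "<%s>%s</%s>", name,
                 value == TRISTATE_TRUE ? "true" : "false", name);
    if (n < 0 || (size_t)n >= size - *off)
        return -1;

    *off += (size_t)n;
    return 0;
}

/* Hypothetical helper: build a SearchIndex.FindByUuid body.  searchIndex is
 * the managed object reference value of the SearchIndex.  No XML escaping
 * is done; a UUID string needs none.  vmSearch is always true (we want VMs). */
static int
buildFindByUuidBody(char *buf, size_t size, const char *searchIndex,
                    const char *uuid, Tristate instanceUuid)
{
    size_t off;
    int n;

    if (uuid == NULL || strlen(uuid) == 0)
        return -1;

    n = snprintf(buf, size,
                 "<FindByUuid xmlns=\"urn:vim25\">"
                 "<_this type=\"SearchIndex\">%s</_this>"
                 "<uuid>%s</uuid>", searchIndex, uuid);
    if (n < 0 || (size_t)n >= size)
        return -1;
    off = (size_t)n;

    if (appendOptionalBool(buf, size, &off, "vmSearch", TRISTATE_TRUE) < 0 ||
        appendOptionalBool(buf, size, &off, "instanceUuid", instanceUuid) < 0)
        return -1;

    n = snprintf(buf + off, size - off, "</FindByUuid>");
    if (n < 0 || (size_t)n >= size - off)
        return -1;

    return 0;
}
```

Called with `TRISTATE_UNDEFINED` the body contains no `instanceUuid` element at all; with `TRISTATE_TRUE` it ends in `<vmSearch>true</vmSearch><instanceUuid>true</instanceUuid></FindByUuid>`, which is the extra element an old-API server refuses.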

>> The newer VI API allows the instanceUuid parameter when searching for VMs, but only since 4.0, and because we do not send any "SOAPAction:" header, the server behaves as if we used the oldest version. If we add the header "SOAPAction: urn:vim25/4.0" (version 4.0 or newer) it works, but I'm not sure whether there are any incompatibilities, or whether there is a use case for older clusters that do not support at least this version.
>
> As Richard says, it would be nice to understand what version 4.0 corresponds to in terms of vCenter releases?

It would be. But I am getting confused by their versioning; even the API docs are confusing sometimes. I was testing this on a cluster with VI API version 7.0.3.0, and the documentation [2] explicitly says all parameters of the FindByUuid() function are available since 2.0. Yet I get different behaviour depending on what is in the SOAPAction HTTP header. Testing it more thoroughly today (and realising there are some bad dependencies, because a simple `ninja -C build` somehow does not account for all the changes), I get this:

- No SOAPAction added -- HTTP 500, Unexpected element tag "instanceUuid" seen
- "SOAPAction: urn:vim25" -- Lots of warnings (from the libvirt codebase) about unexpected properties during deserialisation, but the API works
- "SOAPAction: urn:vim25/1.0" -- Same as above
- "SOAPAction: urn:vim25/2.0" -- Same as above
- "SOAPAction: urn:vim25/2.5" -- Same as above
- "SOAPAction: urn:vim25/3.0" -- Same as above
- "SOAPAction: urn:vim25/4.0" -- No warnings, "just works"
- "SOAPAction: urn:vim25/7.0" -- Works, but again there are lots of warnings about the same properties in UserSession, ServiceContent, HostConfigManager etc.

Now, we explicitly check that the API version is at least 2.5, but based on the warnings the codebase might be better prepared for 4.0. Looking at the properties the warnings mention, for example those of UserSession, they were added to that type specifically in vSphere API Release 5.0 and 5.1, which means that for some reason the server probably behaves as a newer API version unless we specify 4.0 exactly.

Now we can add the new fields as optional; that's not a problem (I tried it and it works nicely). I am just not familiar enough with vSphere API backwards compatibility to know whether we can just add "SOAPAction: urn:vim25" (without a version number) and call it a day, or how careful we need to be.
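
Going by the results above (no header means the oldest API; 4.0 gives no warnings), the connection could derive the header from the `apiVersion` string the server reports in its AboutInfo, for example "7.0.3.0". The sketch below keeps the existing 2.5 minimum and only sends "SOAPAction: urn:vim25/4.0" when the server reports 4.0 or newer. Pinning exactly 4.0, the `SoapAction` enum and the function names are assumptions for illustration, not something vSphere or libvirt prescribes; only the leading major.minor of the version string is looked at.

```c
#include <stdbool.h>
#include <stdio.h>

/* Which SOAPAction header to send (hypothetical names). */
typedef enum {
    SOAP_ACTION_NONE,      /* no header: server falls back to its oldest API */
    SOAP_ACTION_VIM25_40,  /* "SOAPAction: urn:vim25/4.0" */
} SoapAction;

/* Header line to add to each request, or NULL to add none. */
static const char *
soapActionHeader(SoapAction action)
{
    return action == SOAP_ACTION_VIM25_40 ? "SOAPAction: urn:vim25/4.0" : NULL;
}

static bool
versionAtLeast(unsigned int major, unsigned int minor,
               unsigned int wantMajor, unsigned int wantMinor)
{
    return major > wantMajor || (major == wantMajor && minor >= wantMinor);
}

/* Decide the header from the apiVersion string reported by the server.
 * Returns -1 for unparsable versions or versions older than 2.5. */
static int
chooseSoapAction(const char *apiVersion, SoapAction *action)
{
    unsigned int major, minor;

    if (sscanf(apiVersion, "%u.%u", &major, &minor) != 2)
        return -1;

    /* Keep the existing minimum supported version */
    if (!versionAtLeast(major, minor, 2, 5))
        return -1;

    /* Assumption: 4.0 is the level that accepts instanceUuid cleanly */
    *action = versionAtLeast(major, minor, 4, 0) ? SOAP_ACTION_VIM25_40
                                                 : SOAP_ACTION_NONE;
    return 0;
}
```

Sending no header for 2.5 to 3.x servers keeps today's behaviour for them, at the cost of not being able to search by instanceUuid there.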

> We have not formally specified what VMware versions we target, but if we take our usual approach to limit our liability, we'd probably want to go for compat with something like the N most recent major versions, for low values of "N".
>
>> If there is a reasonable use case for it (well, anyone saying "here, me, I'd like to keep it") we could change the behaviour based on the version we parse when connecting to the server. But that's an extra bit of mess, although not a particularly big one.
>
> Or accept a URI parameter to control compatibility level?
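
If such a URI parameter were added, one possible rule is to let it only lower the level detected at connect time, never raise it above what the server reports. A minimal sketch of that rule, assuming the parameter's value has already been extracted from the URI; the parameter itself and the `effectiveApiVersion()` helper are hypothetical.

```c
#include <stdio.h>

/* Hypothetical: resolve the API level to use from the version detected at
 * connect time and an optional override taken from a (hypothetical) URI
 * parameter.  Returns -1 on parse errors or if the override is too new. */
static int
effectiveApiVersion(const char *detected, const char *override,
                    unsigned int *major, unsigned int *minor)
{
    unsigned int dMajor, dMinor, oMajor, oMinor;

    if (sscanf(detected, "%u.%u", &dMajor, &dMinor) != 2)
        return -1;

    /* No override: use what the server reports */
    if (override == NULL) {
        *major = dMajor;
        *minor = dMinor;
        return 0;
    }

    if (sscanf(override, "%u.%u", &oMajor, &oMinor) != 2)
        return -1;

    /* Only allow lowering the level: the server cannot serve a newer API
     * than the one it reports. */
    if (oMajor > dMajor || (oMajor == dMajor && oMinor > dMinor))
        return -1;

    *major = oMajor;
    *minor = oMinor;
    return 0;
}
```

Rejecting an override newer than the detected version keeps a mistyped URI from silently requesting features the server does not have.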

That would be safer, but it would shift the responsibility to layers that I am sure do not want to deal with it. And we are parsing the API version, so we at least know what should work.

[1] https://redhat.atlassian.net/browse/RHEL-174300
[2] https://developer.broadcom.com/xapis/vsphere-web-services-api/latest/vim.SearchIndex.html#findByUuid

> With regards,
> Daniel
> --
> |: https://berrange.com ~~ https://hachyderm.io/@berrange :|
> |: https://libvirt.org ~~ https://entangle-photo.org :|
> |: https://pixelfed.art/berrange ~~ https://fstop138.berrange.com :|
