On Wed, Jan 14, 2026 at 11:17:21 -0000, partha.satapathy--- via Devel wrote:
> Completely agree with your views — no amount of time is truly sufficient in 
> these scenarios. However, when debugging these issues, it’s crucial that we 
> provide a larger window to ensure we can capture the right guest context.
> 
> To that end, we’re exploring the possibility of injecting an NMI, controlled 
> via a configuration, in case a CPU hot-plug timeout issue arises (using 
> qemuMonitorInjectNMI). The current 5-second timeout could indeed be too 
> aggressive in some cases. To support better debugging within the guest, would 
> it be possible to introduce a combination of a configurable timeout followed 
> by the NMI to capture the guest core state? This would give us more 
> flexibility and a better chance of diagnosing the issue effectively.

Once again, the asynchronous API I mentioned in the reply you've trimmed
has no timeout. It simply submits the device removal request to the VM;
success of that API means only that the device unplug request was sent
to the VM.

You need to act based on whether you've received the
VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED event within whatever timeframe your
application desires, which gives you full control over any policy you
might want to apply.

https://www.libvirt.org/html/libvirt-libvirt-domain.html#VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED
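To illustrate the pattern, here is a minimal Python sketch of the
application-side policy. The libvirt calls referenced in the comments
(detachDeviceFlags, domainEventRegisterAny,
VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED) are the real libvirt-python API; the
UnplugWatcher class and the simulated callback are hypothetical glue I
made up for illustration, and the timeout/NMI fallback is left to the
caller, exactly as described above:

```python
# Sketch of the event-driven unplug policy: submit the async detach,
# then wait for DEVICE_REMOVED with whatever timeout the app chooses.
import threading

class UnplugWatcher:
    """Hypothetical helper tracking one pending device unplug."""
    def __init__(self):
        self._removed = threading.Event()

    def on_device_removed(self, dev_alias):
        # In a real client this is the callback registered with
        # conn.domainEventRegisterAny(dom,
        #     libvirt.VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED, cb, None)
        self._removed.set()

    def wait(self, timeout):
        # True if DEVICE_REMOVED arrived within `timeout` seconds.
        return self._removed.wait(timeout)

# dom.detachDeviceFlags(xml, libvirt.VIR_DOMAIN_AFFECT_LIVE)  # async request
watcher = UnplugWatcher()
# Simulate the guest acknowledging the unplug after 0.1 s:
threading.Timer(0.1, watcher.on_device_removed, args=("vcpu3",)).start()
if watcher.wait(timeout=5.0):
    print("device removed")
else:
    print("timed out; app-level fallback (e.g. inject NMI) goes here")
```

The point is that the timeout lives entirely in the application, so a
configurable grace period followed by an NMI is just one possible policy
layered on top of the same event.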

The existence of the APIs which do apply a timeout is only a quirk of
their implementation, as historically hot-unplug was considered
synchronous. Those APIs had to be retrofitted to behave as if the
operation were synchronous, even though it is not, and that is what
creates the weird situations.

That's why the async API exists: it is very upfront about being
asynchronous and about requiring the user to watch for the event.
