I completely agree with your views: no amount of time is truly sufficient in these scenarios. When debugging these issues, however, we need a larger window so that we can capture the right guest context.
To that end, we're exploring injecting an NMI (using qemuMonitorInjectNMI), controlled by a configuration option, when a CPU hot-plug timeout occurs. The current 5-second timeout may be too aggressive in some cases. To support better debugging within the guest, would it be possible to introduce a configurable timeout, followed by an NMI on expiry to capture the guest core state? That would give us more flexibility and a better chance of diagnosing the issue. Looking forward to hearing your thoughts!
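
To make the proposal a bit more concrete, here is a rough, self-contained sketch of the control flow I have in mind. It is not libvirt code: the names HotplugDebugConfig, hotplug_timeout_ms, nmi_on_timeout and wait_for_hotplug are all hypothetical, the polling loop is a simplification of however the real wait is implemented, and the NMI callback only stands in for a call such as qemuMonitorInjectNMI. The FakeGuest helpers exist purely to exercise the logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical settings; neither field is an existing libvirt option. */
typedef struct {
    unsigned int hotplug_timeout_ms; /* how long to wait for the guest */
    bool nmi_on_timeout;             /* inject an NMI if the wait expires */
} HotplugDebugConfig;

/* Callbacks stand in for the real monitor/event interaction. */
typedef bool (*HotplugDoneFn)(void *opaque); /* true once the operation completed */
typedef int (*InjectNmiFn)(void *opaque);    /* 0 on success; placeholder for an NMI injection call */
typedef void (*SleepFn)(unsigned int ms);

typedef enum {
    HOTPLUG_OK = 0,
    HOTPLUG_TIMEOUT = -1,          /* timed out, no NMI sent */
    HOTPLUG_TIMEOUT_NMI_SENT = -2  /* timed out, NMI injected for guest-side debugging */
} HotplugResult;

/* Wait up to cfg->hotplug_timeout_ms, then optionally inject an NMI. */
static HotplugResult
wait_for_hotplug(const HotplugDebugConfig *cfg, unsigned int poll_ms,
                 HotplugDoneFn done, InjectNmiFn nmi, SleepFn sleep_fn,
                 void *opaque)
{
    unsigned int waited = 0;

    assert(cfg != NULL && done != NULL && sleep_fn != NULL);
    if (poll_ms == 0)
        poll_ms = 1; /* avoid spinning forever without progress */

    for (;;) {
        if (done(opaque))
            return HOTPLUG_OK;
        if (waited >= cfg->hotplug_timeout_ms)
            break;
        sleep_fn(poll_ms);
        waited += poll_ms;
    }

    /* Timeout expired: optionally ask the guest to dump its state. */
    if (cfg->nmi_on_timeout && nmi != NULL && nmi(opaque) == 0)
        return HOTPLUG_TIMEOUT_NMI_SENT;
    return HOTPLUG_TIMEOUT;
}

/* Fake guest used only to exercise the sketch. */
typedef struct {
    unsigned int polls_until_online; /* failed polls before "completion" */
    unsigned int nmi_count;          /* number of NMIs "injected" */
} FakeGuest;

static bool fake_done(void *opaque)
{
    FakeGuest *g = opaque;
    if (g->polls_until_online == 0)
        return true;
    g->polls_until_online--;
    return false;
}

static int fake_nmi(void *opaque)
{
    ((FakeGuest *)opaque)->nmi_count++;
    return 0;
}

static void fake_sleep(unsigned int ms)
{
    (void)ms; /* no real delay in the sketch */
}
```

With both knobs in the configuration, the default could stay at today's 5 seconds with the NMI disabled, so existing behavior would not change unless someone opts in for debugging.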
