Please excuse my late reply; I was on vacation.

Daniel P. Berrangé <berra...@redhat.com> writes:

> On Tue, Aug 09, 2022 at 08:40:24AM +0200, Claudio Imbrenda wrote:
>> This patch adds support for asynchronously tearing down a VM on Linux.
>>
>> When qemu terminates, either naturally or because of a fatal signal,
>> the VM is torn down. If the VM is huge, it can take a considerable
>> amount of time for it to be cleaned up. In case of a protected VM, it
>> might take even longer than a non-protected VM (this is the case on
>> s390x, for example).
>>
>> Some users might want to shut down a VM and restart it immediately,
>> without having to wait. This is especially true if management
>> infrastructure like libvirt is used.
>>
>> This patch implements a simple trick on Linux to allow qemu to return
>> immediately, with the teardown of the VM being performed
>> asynchronously.
>>
>> If the new commandline option -async-teardown is used, a new process is
>> spawned from qemu at startup, using the clone syscall, in such way that
>> it will share its address space with qemu.
>>
>> The new process will have the name "cleanup/<QEMU_PID>". It will wait
>> until qemu terminates, and then it will exit itself.
>>
>> This allows qemu to terminate quickly, without having to wait for the
>> whole address space to be torn down. The teardown process will exit
>> after qemu, so it will be the last user of the address space, and
>> therefore it will take care of the actual teardown.
>>
>> The teardown process will share the same cgroups as qemu, so both
>> memory usage and cpu time will be accounted properly.
>>
>> This feature can already be used with libvirt by adding the following
>> to the XML domain definition to pass the parameter to qemu directly:
>>
>>   <commandline xmlns="http://libvirt.org/schemas/domain/qemu/1.0">
>>     <arg value='-async-teardown'/>
>>   </commandline>
>>
>> More advanced interfaces like pidfd or close_range have intentionally
>> been avoided in order to be more compatible with older kernels.
>>
>> Signed-off-by: Claudio Imbrenda <imbre...@linux.ibm.com>

[...]

>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 3f23a42fa8..d434353159 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -4743,6 +4743,23 @@ HXCOMM Internal use
>>  DEF("qtest", HAS_ARG, QEMU_OPTION_qtest, "", QEMU_ARCH_ALL)
>>  DEF("qtest-log", HAS_ARG, QEMU_OPTION_qtest_log, "", QEMU_ARCH_ALL)
>>
>> +#ifdef __linux__
>> +DEF("async-teardown", 0, QEMU_OPTION_asyncteardown,
>> +    "-async-teardown enable asynchronous teardown\n",
>> +    QEMU_ARCH_ALL)
>> +#endif
>> +SRST
>> +``-async-teardown``
>> +    Enable asynchronous teardown. A new teardown process will be
>> +    created at startup, using clone. The teardown process will share
>> +    the address space of the main qemu process, and wait for the main
>> +    process to terminate. At that point, the teardown process will
>> +    also exit. This allows qemu to terminate quickly if the guest was
>> +    huge, leaving the teardown of the address space to the teardown
>> +    process. Since the teardown process shares the same cgroups as the
>> +    main qemu process, accounting is performed correctly.
>> +ERST
>> +
>>  DEF("msg", HAS_ARG, QEMU_OPTION_msg,
>>      "-msg [timestamp[=on|off]][,guest-name=[on|off]]\n"
>>      " control error message format\n"
>
> It occurs to me that we've got a general goal of getting away from
> adding new top level command line arguments. Most of the time there's
> an obvious existing place to put them, but I'm really not sure
> where this particular option would fit ?
>
> it isn't tied to any aspect of the VM backend configuration nor
> hardware frontends.
>
> The closest match is the lifecycle action option (-no-shutdown)
> which were merged into a -action arg, but even that's a bit of a
> stretch.

If I understand the proposed new option correctly, it modifies how QEMU
terminates, independent of why it terminates.  Could be guest reboot
with -action reboot=shutdown, monitor command quit, SIGTERM, ...

I agree putting it under -action would be a bit of a stretch, as so far
-action is entirely about configuring the reaction to certain guest
actions:

    -action reboot=reset|shutdown
                       action when guest reboots [default=reset]
    -action shutdown=poweroff|pause
                       action when guest shuts down [default=poweroff]
    -action panic=pause|shutdown|exit-failure|none
                       action when guest panics [default=shutdown]
    -action watchdog=reset|shutdown|poweroff|inject-nmi|pause|debug|none
                       action when watchdog fires [default=reset]

A different stretch: -daemonize, -runas, -chroot.  These modify how QEMU
starts.  They too are "top-level".

> Markus/Paolo: do you have suggestions ?

Ramblings^WThoughts, not actionable suggestions, I'm afraid.

[...]
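
For anyone reading along without the patch at hand, here is a minimal
sketch of the trick as the commit message describes it. It is not the
code from the patch. The names teardown_helper(),
spawn_teardown_helper() and HELPER_STACK_SIZE are made up for
illustration, and noticing the main process's death via
PR_SET_PDEATHSIG is an assumption on my part.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical size; the helper needs very little stack. */
#define HELPER_STACK_SIZE (64 * 1024)

/* Runs in the helper process; arg carries the PID of the main process. */
static int teardown_helper(void *arg)
{
    pid_t main_pid = (pid_t)(intptr_t)arg;
    sigset_t all, hup;
    int sig;

    /* Block all signals so SIGHUP stays pending until sigwait() takes it. */
    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, NULL);

    /* Assumption: have the kernel send SIGHUP when the parent goes away. */
    prctl(PR_SET_PDEATHSIG, SIGHUP);

    sigemptyset(&hup);
    sigaddset(&hup, SIGHUP);

    /*
     * Checking getppid() covers the case where the main process died
     * before prctl() took effect, and filters out stray SIGHUPs.
     */
    while (getppid() == main_pid) {
        sigwait(&hup, &sig);
    }

    /* Exiting last drops the final reference to the shared address space. */
    _exit(0);
}

/* Returns the helper's PID, or -1 on failure. */
pid_t spawn_teardown_helper(void)
{
    /* Never freed: the helper runs on this stack until it exits. */
    char *stack = malloc(HELPER_STACK_SIZE);

    if (!stack) {
        return -1;
    }
    /* CLONE_VM shares the address space; pass the top of the stack. */
    return clone(teardown_helper, stack + HELPER_STACK_SIZE, CLONE_VM,
                 (void *)(intptr_t)getpid());
}
```

In this sketch, spawn_teardown_helper() would be called once at startup
when -async-teardown is given. The main process can then exit at any
time; the helper wakes up afterwards and the kernel does the expensive
address space teardown on its exit instead.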