On Fri, Jul 10, 2020 at 01:06:47AM -0400, John Snow wrote: > This is done primarily to avoid the 'bare except' pattern, which > suppresses all exceptions during shutdown and can obscure errors. > > Replace this with a pattern that isolates the different kind of shutdown > paradigms (_hard_shutdown and _soft_shutdown), and a new fallback shutdown > handler (_do_shutdown) that gracefully attempts one before the other. > > This split now also ensures that no matter what happens, > _post_shutdown() is always invoked. > > shutdown() changes in behavior such that if it attempts to do a graceful > shutdown and is unable to, it will now always raise an exception to > indicate this. This can be avoided by the test writer in three ways: > > 1. If the VM is expected to have already exited or is in the process of > exiting, wait() can be used instead of shutdown() to clean up resources > instead. This helps avoid race conditions in shutdown. > > 2. If a test writer is expecting graceful shutdown to fail, shutdown > should be called in a try...except block. > > 3. If the test writer has no interest in performing a graceful shutdown > at all, kill() can be used instead. > > > Handling shutdown in this way makes it much more explicit which type of > shutdown we want and allows the library to report problems with this > process. > > Signed-off-by: John Snow <js...@redhat.com> > --- > python/qemu/machine.py | 95 +++++++++++++++++++++++++++++++++++------- > 1 file changed, 80 insertions(+), 15 deletions(-) > > diff --git a/python/qemu/machine.py b/python/qemu/machine.py > index aaa173f046..b24ce8a268 100644 > --- a/python/qemu/machine.py > +++ b/python/qemu/machine.py > @@ -48,6 +48,12 @@ class QEMUMachineAddDeviceError(QEMUMachineError): > """ > > > +class AbnormalShutdown(QEMUMachineError): > + """ > + Exception raised when a graceful shutdown was requested, but not > performed. > + """ > + > + > class MonitorResponseError(qmp.QMPError): > """ > Represents erroneous QMP monitor reply > @@ -365,6 +371,7 @@ def _early_cleanup(self) -> None: > """ > Perform any cleanup that needs to happen before the VM exits. > > + May be invoked by both soft and hard shutdown in failover scenarios. > Called additionally by _post_shutdown for comprehensive cleanup. > """ > # If we keep the console socket open, we may deadlock waiting > @@ -374,32 +381,90 @@ def _early_cleanup(self) -> None: > self._console_socket.close() > self._console_socket = None > > + def _hard_shutdown(self) -> None: > + """ > + Perform early cleanup, kill the VM, and wait for it to terminate. > + > + :raise subprocess.Timeout: When timeout is exceeds 60 seconds > + waiting for the QEMU process to terminate. > + """ > + self._early_cleanup()
Like I commented on patch 5, I don't think the *current* type of cleanup done is needed on a scenario like this... > + self._popen.kill() ... as I don't remember QEMU's SIGKILL handler to be susceptible to the race condition that motivated the closing of the console file in the first place. But, I also can not prove it's not susceptible at this time. Note: I have some old patches that added tests for QEMUMachine itself. I intend to respin them on top of your work, so we may have a clearer understanding of the QEMU behaviors we need to handle. So, feel free to take the prudent route here, and keep the early cleanup. Reviewed-by: Cleber Rosa <cr...@redhat.com> Tested-by: Cleber Rosa <cr...@redhat.com>
signature.asc
Description: PGP signature