On 17.09.21 07:40, John Snow wrote:
To use the AQMP backend, Machine just needs to be a little more diligent
about what happens when closing a QMP connection. The operation is no
longer a freebie in the async world.
Because async QMP continues to check for messages asynchronously, it's
almost certainly likely that the loop will have exited due to EOF after
issuing the last 'quit' command. That error will ultimately be bubbled
up when attempting to close the QMP connection. The manager class here
then is free to discard it if it was expected.
Signed-off-by: John Snow <js...@redhat.com>
---
Yes, I regret that this class has become quite a dumping ground for
complexity around the exit path. It's in need of a refactor to help
separate out the exception handling and cleanup mechanisms from the
VM-related stuff, but it's not a priority to do that just yet -- but
it's on the list.
---
python/qemu/machine/machine.py | 51 ++++++++++++++++++++++++++++++----
1 file changed, 46 insertions(+), 5 deletions(-)
diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 6e58d2f951..8f5a6649e5 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -342,9 +342,15 @@ def _post_shutdown(self) -> None:
# Comprehensive reset for the failed launch case:
self._early_cleanup()
- if self._qmp_connection:
- self._qmp.close()
- self._qmp_connection = None
+ try:
+ self._close_qmp_connection()
+ except Exception as err: # pylint: disable=broad-except
+ LOG.warning(
+ "Exception closing QMP connection: %s",
+ str(err) if str(err) else type(err).__name__
+ )
+ finally:
+ assert self._qmp_connection is None
self._close_qemu_log_file()
@@ -420,6 +426,31 @@ def _launch(self) -> None:
close_fds=False)
self._post_launch()
+ def _close_qmp_connection(self) -> None:
+ """
+ Close the underlying QMP connection, if any.
+
+ Dutifully report errors that occurred while closing, but assume
+ that any error encountered indicates an abnormal termination
+ process and not a failure to close.
+ """
+ if self._qmp_connection is None:
+ return
+
+ try:
+ self._qmp.close()
+ except EOFError:
+ # EOF can occur as an Exception here when using the Async
+ # QMP backend. It indicates that the server closed the
+ # stream. If we successfully issued 'quit' at any point,
+ # then this was expected. If the remote went away without
+ # our permission, it's worth reporting that as an abnormal
+ # shutdown case.
+ if not self._has_quit:
+ raise
+ finally:
+ self._qmp_connection = None
+
def _early_cleanup(self) -> None:
"""
Perform any cleanup that needs to happen before the VM exits.
@@ -461,8 +492,18 @@ def _soft_shutdown(self, timeout: Optional[int]) -> None:
if self._qmp_connection:
if not self._has_quit:
- # Might raise ConnectionReset
- self.qmp('quit')
+ try:
+ # May raise ExecInterruptedError or StateError if the
+ # connection dies or has already died.
+ self.qmp('quit')
+ finally:
+ # If 'quit' fails, we'll still want to call close(),
+ # which will raise any final errors that may have
+ # occurred while trying to send 'quit'.
+ self._close_qmp_connection()
+ else:
+ # Regardless, we want to tidy up the socket.
+ self._close_qmp_connection()
Why can’t we wait for _post_shutdown to do that? Has it something to do
with this operation being “no longer a freeby” (I’m not quite sure what
this refers to, execution time, or the set of possible exceptions, or
perhaps something else entirely), and so we want to do it in the
already-not-instantaneous _soft_shutdown?
Hanna
# May raise subprocess.TimeoutExpired
self._subp.wait(timeout=timeout)