[PATCH v2 12/17] python/machine: Handle QMP errors on close more meticulously

2021-09-22 Thread John Snow
To use the AQMP backend, Machine just needs to be a little more diligent
about what happens when closing a QMP connection. The operation is no
longer a freebie in the async world; it may return errors encountered in
the async bottom half on incoming message receipt, etc.

(AQMP's disconnect, ultimately, serves as the quiescence point where all
async contexts are gathered together, and any final errors reported at
that point.)

Because async QMP continues to check for messages asynchronously, it is
very likely that the loop will have exited due to EOF after
issuing the last 'quit' command. That error will ultimately be bubbled
up when attempting to close the QMP connection. The manager class here
then is free to discard it -- if it was expected.
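The discard-if-expected pattern can be sketched in miniature. (The `Conn` class and `close_conn` helper below are illustrative stand-ins, not Machine's real API; the real code consults `self._quit_issued`.)

```python
class Conn:
    """Toy stand-in for an async-backed QMP connection (illustrative)."""
    def __init__(self, quit_issued: bool) -> None:
        self.quit_issued = quit_issued

    def close(self) -> None:
        # The async bottom half's terminal error surfaces here; after a
        # successful 'quit', that error is an EOF we asked for.
        raise EOFError


def close_conn(conn: Conn) -> None:
    """Discard the EOF if it was expected; re-raise it otherwise."""
    try:
        conn.close()
    except EOFError:
        if not conn.quit_issued:
            raise
```

Closing after a deliberate 'quit' swallows the EOF; an unsolicited hangup still propagates as an error.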

Signed-off-by: John Snow 

---

Yes, I regret that this class has become quite a dumping ground for
complexity around the exit path. It's in need of a refactor to help
separate out the exception handling and cleanup mechanisms from the
VM-related stuff, but it's not a priority to do that just yet -- it's
on the list.

Signed-off-by: John Snow 
---
 python/qemu/machine/machine.py | 48 +-
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 0bd40bc2f76..c33a78a2d9f 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -342,9 +342,15 @@ def _post_shutdown(self) -> None:
         # Comprehensive reset for the failed launch case:
         self._early_cleanup()
 
-        if self._qmp_connection:
-            self._qmp.close()
-            self._qmp_connection = None
+        try:
+            self._close_qmp_connection()
+        except Exception as err:  # pylint: disable=broad-except
+            LOG.warning(
+                "Exception closing QMP connection: %s",
+                str(err) if str(err) else type(err).__name__
+            )
+        finally:
+            assert self._qmp_connection is None
 
         self._close_qemu_log_file()
 
@@ -420,6 +426,31 @@ def _launch(self) -> None:
                                close_fds=False)
         self._post_launch()
 
+    def _close_qmp_connection(self) -> None:
+        """
+        Close the underlying QMP connection, if any.
+
+        Dutifully report errors that occurred while closing, but assume
+        that any error encountered indicates an abnormal termination
+        process and not a failure to close.
+        """
+        if self._qmp_connection is None:
+            return
+
+        try:
+            self._qmp.close()
+        except EOFError:
+            # EOF can occur as an Exception here when using the Async
+            # QMP backend. It indicates that the server closed the
+            # stream. If we successfully issued 'quit' at any point,
+            # then this was expected. If the remote went away without
+            # our permission, it's worth reporting that as an abnormal
+            # shutdown case.
+            if not self._quit_issued:
+                raise
+        finally:
+            self._qmp_connection = None
+
     def _early_cleanup(self) -> None:
         """
         Perform any cleanup that needs to happen before the VM exits.
@@ -460,9 +491,14 @@ def _soft_shutdown(self, timeout: Optional[int]) -> None:
         self._early_cleanup()
 
         if self._qmp_connection:
-            if not self._quit_issued:
-                # Might raise ConnectionReset
-                self.qmp('quit')
+            try:
+                if not self._quit_issued:
+                    # May raise ExecInterruptedError or StateError if the
+                    # connection dies or has *already* died.
+                    self.qmp('quit')
+            finally:
+                # Regardless, we want to quiesce the connection.
+                self._close_qmp_connection()
 
         # May raise subprocess.TimeoutExpired
         self._subp.wait(timeout=timeout)
-- 
2.31.1




[PATCH v2 11/17] python/machine: remove has_quit argument

2021-09-22 Thread John Snow
If we spy on the QMP commands instead, we don't need callers to remember
to pass it. Seems like a fair trade-off.

The one slightly weird bit is overloading this instance variable for
wait(), where we use it to mean "don't issue the qmp 'quit'
command". This means that wait() will "fail" if the QEMU process does
not terminate of its own accord.

In most cases, we probably did already actually issue quit -- some
iotests do this -- but in some others, we may be waiting for QEMU to
terminate for some other reason, such as a test wherein we tell the
guest (directly) to shut down.
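The spying technique itself is simple enough to sketch standalone. (`VMSketch` is an illustrative miniature, not the real QEMUMachine; the real `qmp()` also sends the command over the wire.)

```python
class VMSketch:
    """Illustrative: the qmp() wrapper spies on outgoing commands so the
    shutdown path can consult a flag instead of a has_quit argument."""

    def __init__(self) -> None:
        self._quit_issued = False
        self.sent: list = []

    def qmp(self, cmd: str) -> dict:
        # Record the intent to quit as a side effect of issuing it.
        if cmd == 'quit':
            self._quit_issued = True
        self.sent.append(cmd)
        return {'return': {}}  # pretend the command succeeded

    def soft_shutdown(self) -> None:
        # Only issue 'quit' if nobody already did.
        if not self._quit_issued:
            self.qmp('quit')
```

A caller that issued 'quit' itself no longer needs to tell `shutdown()` about it; the wrapper already knows.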

Signed-off-by: John Snow 
---
 python/qemu/machine/machine.py | 34 +++---
 tests/qemu-iotests/040 |  7 +--
 tests/qemu-iotests/218 |  2 +-
 tests/qemu-iotests/255 |  2 +-
 4 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 056d340e355..0bd40bc2f76 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -170,6 +170,7 @@ def __init__(self,
         self._console_socket: Optional[socket.socket] = None
         self._remove_files: List[str] = []
         self._user_killed = False
+        self._quit_issued = False
 
     def __enter__(self: _T) -> _T:
         return self
@@ -368,6 +369,7 @@ def _post_shutdown(self) -> None:
                 command = ''
             LOG.warning(msg, -int(exitcode), command)
 
+        self._quit_issued = False
         self._user_killed = False
         self._launched = False
 
@@ -443,15 +445,13 @@ def _hard_shutdown(self) -> None:
         self._subp.kill()
         self._subp.wait(timeout=60)
 
-    def _soft_shutdown(self, timeout: Optional[int],
-                       has_quit: bool = False) -> None:
+    def _soft_shutdown(self, timeout: Optional[int]) -> None:
         """
         Perform early cleanup, attempt to gracefully shut down the VM, and wait
         for it to terminate.
 
         :param timeout: Timeout in seconds for graceful shutdown.
                         A value of None is an infinite wait.
-        :param has_quit: When True, don't attempt to issue 'quit' QMP command
 
         :raise ConnectionReset: On QMP communication errors
         :raise subprocess.TimeoutExpired: When timeout is exceeded waiting for
@@ -460,21 +460,19 @@ def _soft_shutdown(self, timeout: Optional[int],
         self._early_cleanup()
 
         if self._qmp_connection:
-            if not has_quit:
+            if not self._quit_issued:
                 # Might raise ConnectionReset
-                self._qmp.cmd('quit')
+                self.qmp('quit')
 
         # May raise subprocess.TimeoutExpired
         self._subp.wait(timeout=timeout)
 
-    def _do_shutdown(self, timeout: Optional[int],
-                     has_quit: bool = False) -> None:
+    def _do_shutdown(self, timeout: Optional[int]) -> None:
         """
         Attempt to shutdown the VM gracefully; fallback to a hard shutdown.
 
         :param timeout: Timeout in seconds for graceful shutdown.
                         A value of None is an infinite wait.
-        :param has_quit: When True, don't attempt to issue 'quit' QMP command
 
         :raise AbnormalShutdown: When the VM could not be shut down gracefully.
             The inner exception will likely be ConnectionReset or
@@ -482,13 +480,13 @@ def _do_shutdown(self, timeout: Optional[int],
             may result in its own exceptions, likely subprocess.TimeoutExpired.
         """
         try:
-            self._soft_shutdown(timeout, has_quit)
+            self._soft_shutdown(timeout)
         except Exception as exc:
             self._hard_shutdown()
             raise AbnormalShutdown("Could not perform graceful shutdown") \
                 from exc
 
-    def shutdown(self, has_quit: bool = False,
+    def shutdown(self,
                  hard: bool = False,
                  timeout: Optional[int] = 30) -> None:
 """
@@ -498,7 +496,6 @@ def shutdown(self, has_quit: bool = False,
         If the VM has not yet been launched, or shutdown(), wait(), or kill()
         have already been called, this method does nothing.
 
-        :param has_quit: When true, do not attempt to issue 'quit' QMP command.
         :param hard: When true, do not attempt graceful shutdown, and
                      suppress the SIGKILL warning log message.
         :param timeout: Optional timeout in seconds for graceful shutdown.
@@ -512,7 +509,7 @@ def shutdown(self, has_quit: bool = False,
                 self._user_killed = True
                 self._hard_shutdown()
             else:
-                self._do_shutdown(timeout)
+                self._do_shutdown(timeout)
         finally:
             self._post_shutdown()
 
@@ -529,7 +526,8 @@ def wait(self, timeout: Optional[int] = 30) -> None:
 :param timeout: Optional timeout in seconds. Default 30 seconds.
 A 

[PATCH v2 15/17] python/aqmp: Create sync QMP wrapper for iotests

2021-09-22 Thread John Snow
This is a wrapper around the async QMPClient that mimics the old,
synchronous QEMUMonitorProtocol class. It is designed to be
interchangeable with the old implementation.

It does not, however, attempt to mimic Exception compatibility.
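The core wrapping technique is easy to show in isolation: each synchronous method drives an async coroutine to completion on an event loop. (`SyncFacade` below is an illustrative miniature, not the real wrapper, which delegates to `QMPClient` and uses `asyncio.get_event_loop()`.)

```python
import asyncio
from typing import Awaitable, Optional, TypeVar

_T = TypeVar('_T')


class SyncFacade:
    """Illustrative sketch: a synchronous facade over async internals,
    one call driven to completion at a time on a private loop."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()

    def _sync(self, future: Awaitable[_T],
              timeout: Optional[float] = None) -> _T:
        # Block the caller until the awaitable finishes (or times out).
        return self._loop.run_until_complete(
            asyncio.wait_for(future, timeout=timeout)
        )

    def command(self, cmd: str) -> str:
        async def _execute() -> str:
            await asyncio.sleep(0)  # stand-in for real socket I/O
            return f'{cmd}: ok'
        return self._sync(_execute())
```

Because the loop only runs inside `_sync()`, callers never observe any concurrency: the class behaves like an ordinary blocking client.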

Signed-off-by: John Snow 
Acked-by: Hanna Reitz 
---
 python/qemu/aqmp/legacy.py | 135 +
 1 file changed, 135 insertions(+)
 create mode 100644 python/qemu/aqmp/legacy.py

diff --git a/python/qemu/aqmp/legacy.py b/python/qemu/aqmp/legacy.py
new file mode 100644
index 000..a75dbc599c0
--- /dev/null
+++ b/python/qemu/aqmp/legacy.py
@@ -0,0 +1,135 @@
+"""
+Sync QMP Wrapper
+
+This class pretends to be qemu.qmp.QEMUMonitorProtocol.
+"""
+
+import asyncio
+from typing import (
+Awaitable,
+List,
+Optional,
+TypeVar,
+Union,
+)
+
+import qemu.qmp
+from qemu.qmp import QMPMessage, QMPReturnValue, SocketAddrT
+
+from .qmp_client import QMPClient
+
+
+# pylint: disable=missing-docstring
+
+
+class QEMUMonitorProtocol(qemu.qmp.QEMUMonitorProtocol):
+def __init__(self, address: SocketAddrT,
+ server: bool = False,
+ nickname: Optional[str] = None):
+
+# pylint: disable=super-init-not-called
+self._aqmp = QMPClient(nickname)
+self._aloop = asyncio.get_event_loop()
+self._address = address
+self._timeout: Optional[float] = None
+
+_T = TypeVar('_T')
+
+def _sync(
+self, future: Awaitable[_T], timeout: Optional[float] = None
+) -> _T:
+return self._aloop.run_until_complete(
+asyncio.wait_for(future, timeout=timeout)
+)
+
+def _get_greeting(self) -> Optional[QMPMessage]:
+if self._aqmp.greeting is not None:
+# pylint: disable=protected-access
+return self._aqmp.greeting._asdict()
+return None
+
+# __enter__ and __exit__ need no changes
+# parse_address needs no changes
+
+def connect(self, negotiate: bool = True) -> Optional[QMPMessage]:
+self._aqmp.await_greeting = negotiate
+self._aqmp.negotiate = negotiate
+
+self._sync(
+self._aqmp.connect(self._address)
+)
+return self._get_greeting()
+
+def accept(self, timeout: Optional[float] = 15.0) -> QMPMessage:
+self._aqmp.await_greeting = True
+self._aqmp.negotiate = True
+
+self._sync(
+self._aqmp.accept(self._address),
+timeout
+)
+
+ret = self._get_greeting()
+assert ret is not None
+return ret
+
+def cmd_obj(self, qmp_cmd: QMPMessage) -> QMPMessage:
+return dict(
+self._sync(
+# pylint: disable=protected-access
+
+# _raw() isn't a public API, because turning off
+# automatic ID assignment is discouraged. For
+# compatibility with iotests *only*, do it anyway.
+self._aqmp._raw(qmp_cmd, assign_id=False),
+self._timeout
+)
+)
+
+# Default impl of cmd() delegates to cmd_obj
+
+def command(self, cmd: str, **kwds: object) -> QMPReturnValue:
+return self._sync(
+self._aqmp.execute(cmd, kwds),
+self._timeout
+)
+
+def pull_event(self,
+   wait: Union[bool, float] = False) -> Optional[QMPMessage]:
+if wait is False:
+# Return None if there's no event ready to go
+if self._aqmp.events.empty():
+return None
+
+timeout = None
+if isinstance(wait, float):
+timeout = wait
+
+return dict(
+self._sync(
+self._aqmp.events.get(),
+timeout
+)
+)
+
+def get_events(self, wait: Union[bool, float] = False) -> List[QMPMessage]:
+events = [dict(x) for x in self._aqmp.events.clear()]
+if events:
+return events
+
+event = self.pull_event(wait)
+return [event] if event is not None else []
+
+def clear_events(self) -> None:
+self._aqmp.events.clear()
+
+def close(self) -> None:
+self._sync(
+self._aqmp.disconnect()
+)
+
+def settimeout(self, timeout: Optional[float]) -> None:
+self._timeout = timeout
+
+def send_fd_scm(self, fd: int) -> None:
+self._aqmp.send_fd_scm(fd)
-- 
2.31.1




[PATCH v2 09/17] python/qmp: add send_fd_scm directly to QEMUMonitorProtocol

2021-09-22 Thread John Snow
It turns out you can do this directly from Python ... and because of
this, you don't need to worry about setting the inheritability of the
fds or spawning another process.

Doing this is helpful because it allows QEMUMonitorProtocol to keep its
file descriptor and socket object as private implementation
details. /that/ is helpful in turn because it allows me to write a
compatible, alternative implementation.
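The Python-native technique is `socket.sendmsg()` with `SCM_RIGHTS` ancillary data. A standalone sketch of what the patch does (the helper name here is illustrative; the real method lives on QEMUMonitorProtocol and uses its private socket):

```python
import os
import socket
import struct


def send_fd_scm(sock: socket.socket, fd: int) -> None:
    """Pass a file descriptor across an AF_UNIX socket via SCM_RIGHTS.

    Standalone sketch of the technique adopted by this patch.
    """
    if sock.family != socket.AF_UNIX:
        raise RuntimeError("SCM_RIGHTS requires an AF_UNIX socket")
    sock.sendmsg(
        [b' '],  # at least one byte of ordinary data must accompany the fd
        [(socket.SOL_SOCKET, socket.SCM_RIGHTS, struct.pack('@i', fd))]
    )
```

The receiver gets a duplicate of the descriptor in its own fd table via `recvmsg()`, with no subprocess and no fd-inheritability fiddling required.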

Signed-off-by: John Snow 
Reviewed-by: Hanna Reitz 
---
 python/qemu/machine/machine.py | 44 +++---
 python/qemu/qmp/__init__.py| 21 +++-
 2 files changed, 18 insertions(+), 47 deletions(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index ae945ca3c94..1c6532a3d68 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -213,48 +213,22 @@ def add_fd(self: _T, fd: int, fdset: int,
     def send_fd_scm(self, fd: Optional[int] = None,
                     file_path: Optional[str] = None) -> int:
         """
-        Send an fd or file_path to socket_scm_helper.
+        Send an fd or file_path to the remote via SCM_RIGHTS.
 
-        Exactly one of fd and file_path must be given.
-        If it is file_path, the helper will open that file and pass its own fd.
+        Exactly one of fd and file_path must be given.  If it is
+        file_path, the file will be opened read-only and the new file
+        descriptor will be sent to the remote.
         """
-        # In iotest.py, the qmp should always use unix socket.
-        assert self._qmp.is_scm_available()
-        if self._socket_scm_helper is None:
-            raise QEMUMachineError("No path to socket_scm_helper set")
-        if not os.path.exists(self._socket_scm_helper):
-            raise QEMUMachineError("%s does not exist" %
-                                   self._socket_scm_helper)
-
-        # This did not exist before 3.4, but since then it is
-        # mandatory for our purpose
-        if hasattr(os, 'set_inheritable'):
-            os.set_inheritable(self._qmp.get_sock_fd(), True)
-            if fd is not None:
-                os.set_inheritable(fd, True)
-
-        fd_param = ["%s" % self._socket_scm_helper,
-                    "%d" % self._qmp.get_sock_fd()]
-
         if file_path is not None:
             assert fd is None
-            fd_param.append(file_path)
+            with open(file_path, "rb") as passfile:
+                fd = passfile.fileno()
+                self._qmp.send_fd_scm(fd)
         else:
             assert fd is not None
-            fd_param.append(str(fd))
+            self._qmp.send_fd_scm(fd)
 
-        proc = subprocess.run(
-            fd_param,
-            stdin=subprocess.DEVNULL,
-            stdout=subprocess.PIPE,
-            stderr=subprocess.STDOUT,
-            check=False,
-            close_fds=False,
-        )
-        if proc.stdout:
-            LOG.debug(proc.stdout)
-
-        return proc.returncode
+        return 0
 
     @staticmethod
     def _remove_if_exists(path: str) -> None:
diff --git a/python/qemu/qmp/__init__.py b/python/qemu/qmp/__init__.py
index c27594b66a2..358c0971d06 100644
--- a/python/qemu/qmp/__init__.py
+++ b/python/qemu/qmp/__init__.py
@@ -21,6 +21,7 @@
 import json
 import logging
 import socket
+import struct
 from types import TracebackType
 from typing import (
     Any,
@@ -408,18 +409,14 @@ def settimeout(self, timeout: Optional[float]) -> None:
             raise ValueError(msg)
         self.__sock.settimeout(timeout)
 
-    def get_sock_fd(self) -> int:
+    def send_fd_scm(self, fd: int) -> None:
         """
-        Get the socket file descriptor.
-
-        @return The file descriptor number.
-        """
-        return self.__sock.fileno()
-
-    def is_scm_available(self) -> bool:
+        Send a file descriptor to the remote via SCM_RIGHTS.
         """
-        Check if the socket allows for SCM_RIGHTS.
+        if self.__sock.family != socket.AF_UNIX:
+            raise RuntimeError("Can't use SCM_RIGHTS on non-AF_UNIX socket.")
 
-        @return True if SCM_RIGHTS is available, otherwise False.
-        """
-        return self.__sock.family == socket.AF_UNIX
+        self.__sock.sendmsg(
+            [b' '],
+            [(socket.SOL_SOCKET, socket.SCM_RIGHTS, struct.pack('@i', fd))]
+        )
-- 
2.31.1




[PATCH v2 14/17] iotests: Conditionally silence certain AQMP errors

2021-09-22 Thread John Snow
AQMP likes to be very chatty about errors it encounters. In general,
this is good because it allows us to get good diagnostic information for
otherwise complex async failures.

For example, during a failed QMP connection attempt, we might see:

+ERROR:qemu.aqmp.qmp_client.qemub-2536319:Negotiation failed: EOFError
+ERROR:qemu.aqmp.qmp_client.qemub-2536319:Failed to establish session: EOFError

This might be nice in iotests output, because failure scenarios
involving the new QMP library will be spelled out plainly in the output
diffs.

For tests that are intentionally causing this scenario though, filtering
that log output could be a hassle. For now, add a context manager that
simply lets us toggle this output off during a critical region.
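The mechanics of the helper are straightforward. A self-contained sketch of the same shape (logger names and the `ListHandler` scaffolding here are illustrative):

```python
import logging
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def change_log_level(logger_name: str,
                     level: int = logging.CRITICAL) -> Iterator[None]:
    """Raise a logger's threshold for the duration of the block,
    restoring the previous level even if the body raises."""
    logger = logging.getLogger(logger_name)
    old_level = logger.level
    logger.setLevel(level)
    try:
        yield
    finally:
        logger.setLevel(old_level)


class ListHandler(logging.Handler):
    """Collects log records in a list (demo scaffolding only)."""
    def __init__(self) -> None:
        super().__init__()
        self.records: list = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


log = logging.getLogger('demo.aqmp')
handler = ListHandler()
log.addHandler(handler)
log.setLevel(logging.DEBUG)

with change_log_level('demo.aqmp'):
    log.error('expected failure')   # below CRITICAL: suppressed
log.error('unexpected failure')     # recorded again
```

Only the ERROR logged outside the context manager survives; the expected one inside the critical region is dropped.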

(Additionally, a forthcoming patch allows the use of either legacy or
async QMP to be toggled with an environment variable. In this
circumstance, we can't amend the iotest output to just always expect the
error message, either. Just suppress it for now. More rigorous log
filtering can be investigated later if/when it is deemed safe to
permanently replace the legacy QMP library.)

Signed-off-by: John Snow 
---
 tests/qemu-iotests/iotests.py | 20 +++-
 tests/qemu-iotests/tests/mirror-top-perms |  9 +
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 9afa258a405..4d39b86ed85 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -30,7 +30,7 @@
 import subprocess
 import sys
 import time
-from typing import (Any, Callable, Dict, Iterable,
+from typing import (Any, Callable, Dict, Iterable, Iterator,
                     List, Optional, Sequence, TextIO, Tuple, Type, TypeVar)
 import unittest
 
@@ -116,6 +116,24 @@
     sample_img_dir = os.environ['SAMPLE_IMG_DIR']
 
 
+@contextmanager
+def change_log_level(
+        logger_name: str, level: int = logging.CRITICAL) -> Iterator[None]:
+    """
+    Utility function for temporarily changing the log level of a logger.
+
+    This can be used to silence errors that are expected or uninteresting.
+    """
+    _logger = logging.getLogger(logger_name)
+    current_level = _logger.level
+    _logger.setLevel(level)
+
+    try:
+        yield
+    finally:
+        _logger.setLevel(current_level)
+
+
 def unarchive_sample_image(sample, fname):
     sample_fname = os.path.join(sample_img_dir, sample + '.bz2')
     with bz2.open(sample_fname) as f_in, open(fname, 'wb') as f_out:
diff --git a/tests/qemu-iotests/tests/mirror-top-perms b/tests/qemu-iotests/tests/mirror-top-perms
index 9fe315e3b01..5a34ec655e2 100755
--- a/tests/qemu-iotests/tests/mirror-top-perms
+++ b/tests/qemu-iotests/tests/mirror-top-perms
@@ -21,7 +21,7 @@
 
 import os
 import iotests
-from iotests import qemu_img
+from iotests import change_log_level, qemu_img
 
 # Import qemu after iotests.py has amended sys.path
 # pylint: disable=wrong-import-order
@@ -100,9 +100,10 @@ class TestMirrorTopPerms(iotests.QMPTestCase):
         self.vm_b.add_blockdev(f'file,node-name=drive0,filename={source}')
         self.vm_b.add_device('virtio-blk,drive=drive0,share-rw=on')
         try:
-            self.vm_b.launch()
-            print('ERROR: VM B launched successfully, this should not have '
-                  'happened')
+            with change_log_level('qemu.aqmp'):
+                self.vm_b.launch()
+            print('ERROR: VM B launched successfully, '
+                  'this should not have happened')
         except (qemu.qmp.QMPConnectError, ConnectError):
             assert 'Is another process using the image' in self.vm_b.get_log()
 
-- 
2.31.1




[PATCH v2 08/17] python/qmp: clear events on get_events() call

2021-09-22 Thread John Snow
All callers in the tree *already* clear the events after a call to
get_events(). Do it automatically instead and update callsites to remove
the manual clear call.

These semantics are quite a bit easier to emulate with async QMP, and
nobody appears to be abusing some emergent properties of what happens if
you decide not to clear them, so let's dial down to the dumber, simpler
thing.

Specifically: callers of clear() right after a call to get_events() are
more likely expressing their desire to not see any events they just
retrieved, whereas callers of clear_events() not in relation to a recent
call to pull_event/get_events are likely expressing their desire to
simply drop *all* pending events straight onto the floor. In the sync
world, this is safe enough; in the async world it's nearly impossible to
promise that nothing happens between getting and clearing the
events.

Making the retrieval also clear the queue is vastly simpler.
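The swap-and-return idiom at the heart of the change can be shown on its own. (`EventQueue` is an illustrative miniature; the real change lives in `QEMUMonitorProtocol.get_events()`.)

```python
from typing import Any, Dict, List


class EventQueue:
    """Sketch of retrieve-and-clear semantics for pending events."""

    def __init__(self) -> None:
        self._events: List[Dict[str, Any]] = []

    def put(self, event: Dict[str, Any]) -> None:
        self._events.append(event)

    def get_events(self) -> List[Dict[str, Any]]:
        # Hand back the accumulated list and start a fresh one, so
        # retrieval and clearing cannot be interleaved with arrivals.
        events = self._events
        self._events = []
        return events
```

Because the list object itself is handed to the caller and replaced, there is no window between "get" and "clear" in which a newly arrived event could be silently dropped.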

Signed-off-by: John Snow 
Reviewed-by: Hanna Reitz 
---
 python/qemu/machine/machine.py | 1 -
 python/qemu/qmp/__init__.py| 6 --
 python/qemu/qmp/qmp_shell.py   | 1 -
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 34131884a57..ae945ca3c94 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -631,7 +631,6 @@ def get_qmp_events(self, wait: bool = False) -> List[QMPMessage]:
         events = self._qmp.get_events(wait=wait)
         events.extend(self._events)
         del self._events[:]
-        self._qmp.clear_events()
         return events
 
     @staticmethod
diff --git a/python/qemu/qmp/__init__.py b/python/qemu/qmp/__init__.py
index 269516a79b9..c27594b66a2 100644
--- a/python/qemu/qmp/__init__.py
+++ b/python/qemu/qmp/__init__.py
@@ -361,7 +361,7 @@ def pull_event(self,
 
     def get_events(self, wait: bool = False) -> List[QMPMessage]:
         """
-        Get a list of available QMP events.
+        Get a list of available QMP events and clear all pending events.
 
         @param wait (bool): block until an event is available.
         @param wait (float): If wait is a float, treat it as a timeout value.
@@ -374,7 +374,9 @@ def get_events(self, wait: bool = False) -> List[QMPMessage]:
         @return The list of available QMP events.
         """
         self.__get_events(wait)
-        return self.__events
+        events = self.__events
+        self.__events = []
+        return events
 
     def clear_events(self) -> None:
         """
diff --git a/python/qemu/qmp/qmp_shell.py b/python/qemu/qmp/qmp_shell.py
index 337acfce2d2..e7d7eb18f19 100644
--- a/python/qemu/qmp/qmp_shell.py
+++ b/python/qemu/qmp/qmp_shell.py
@@ -381,7 +381,6 @@ def read_exec_command(self) -> bool:
         if cmdline == '':
             for event in self.get_events():
                 print(event)
-            self.clear_events()
             return True
 
         return self._execute_cmd(cmdline)
-- 
2.31.1




[PATCH v2 07/17] python/aqmp: Disable logging messages by default

2021-09-22 Thread John Snow
AQMP is a library, and ideally it should not print error diagnostics
unless a user opts into seeing them. By default, Python will print all
WARNING, ERROR or CRITICAL messages to screen if no logging
configuration has been created by a client application.

In AQMP's case, ERROR logging statements are used to report additional
detail about runtime failures that will also eventually be reported to the
client library via an Exception, so these messages should not be
rendered by default.

(Why bother to have them at all, then? In async contexts, there may be
multiple Exceptions and we are only able to report one of them back to
the client application. It is not reasonably easy to predict ahead of
time if one or more of these Exceptions will be squelched. Therefore,
it's useful to log intermediate failures to help make sense of the
ultimate, resulting failure.)

Add a NullHandler that will suppress these messages until a client
application opts into logging via logging.basicConfig or similar. Note
that upon calling basicConfig(), this handler will *not* suppress these
messages from being displayed by the client's configuration.
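The idiom is standard for Python libraries; a small sketch with a placeholder library name (`mylib`):

```python
import logging

# What a library does once at import time (mirrors the patch): attach a
# NullHandler so records always find *a* handler, which keeps Python's
# last-resort stderr handler from firing.
logging.getLogger('mylib').addHandler(logging.NullHandler())

# Library code then logs normally; with no client configuration, the
# messages go nowhere instead of being printed to the screen.
log = logging.getLogger('mylib.client')
log.error('connection refused')

# A client that wants the diagnostics opts in explicitly, e.g.:
#     logging.basicConfig(level=logging.ERROR)
```

The NullHandler sits at the library's root logger, so every child logger under `mylib.` inherits the behavior without further setup.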

Signed-off-by: John Snow 
---
 python/qemu/aqmp/__init__.py | 4 
 1 file changed, 4 insertions(+)

diff --git a/python/qemu/aqmp/__init__.py b/python/qemu/aqmp/__init__.py
index ab1782999cf..d1b0e4dc3d3 100644
--- a/python/qemu/aqmp/__init__.py
+++ b/python/qemu/aqmp/__init__.py
@@ -21,6 +21,7 @@
 # This work is licensed under the terms of the GNU GPL, version 2.  See
 # the COPYING file in the top-level directory.
 
+import logging
 import warnings
 
 from .error import AQMPError
@@ -41,6 +42,9 @@
 
 warnings.warn(_WMSG, FutureWarning)
 
+# Suppress logging unless an application engages it.
+logging.getLogger('qemu.aqmp').addHandler(logging.NullHandler())
+
 
 # The order of these fields impact the Sphinx documentation order.
 __all__ = (
-- 
2.31.1




[PATCH v2 16/17] python/aqmp: Remove scary message

2021-09-22 Thread John Snow
The scary message interferes with the iotests output. Coincidentally, if
iotests works by removing this, then it's good evidence that we don't
really need to scare people away from using it.

Signed-off-by: John Snow 
---
 python/qemu/aqmp/__init__.py | 12 
 1 file changed, 12 deletions(-)

diff --git a/python/qemu/aqmp/__init__.py b/python/qemu/aqmp/__init__.py
index d1b0e4dc3d3..880d5b6fa7f 100644
--- a/python/qemu/aqmp/__init__.py
+++ b/python/qemu/aqmp/__init__.py
@@ -22,7 +22,6 @@
 # the COPYING file in the top-level directory.
 
 import logging
-import warnings
 
 from .error import AQMPError
 from .events import EventListener
@@ -31,17 +30,6 @@
 from .qmp_client import ExecInterruptedError, ExecuteError, QMPClient
 
 
-_WMSG = """
-
-The Asynchronous QMP library is currently in development and its API
-should be considered highly fluid and subject to change. It should
-not be used by any other scripts checked into the QEMU tree.
-
-Proceed with caution!
-"""
-
-warnings.warn(_WMSG, FutureWarning)
-
 # Suppress logging unless an application engages it.
 logging.getLogger('qemu.aqmp').addHandler(logging.NullHandler())
 
-- 
2.31.1




[PATCH v2 13/17] iotests: Accommodate async QMP Exception classes

2021-09-22 Thread John Snow
(But continue to support the old ones for now, too.)

There are very few cases of any user of QEMUMachine or a subclass
thereof relying on a QMP Exception type. If you'd like to check for
yourself, you want to grep for all of the derivatives of QMPError,
excluding 'AQMPError' and its derivatives. That'd be these:

- QMPError
- QMPConnectError
- QMPCapabilitiesError
- QMPTimeoutError
- QMPProtocolError
- QMPResponseError
- QMPBadPortError


Signed-off-by: John Snow 
---
 scripts/simplebench/bench_block_job.py| 3 ++-
 tests/qemu-iotests/tests/mirror-top-perms | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/scripts/simplebench/bench_block_job.py b/scripts/simplebench/bench_block_job.py
index 4f03c121697..a403c35b08f 100755
--- a/scripts/simplebench/bench_block_job.py
+++ b/scripts/simplebench/bench_block_job.py
@@ -28,6 +28,7 @@
 sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'python'))
 from qemu.machine import QEMUMachine
 from qemu.qmp import QMPConnectError
+from qemu.aqmp import ConnectError
 
 
 def bench_block_job(cmd, cmd_args, qemu_args):
@@ -49,7 +50,7 @@ def bench_block_job(cmd, cmd_args, qemu_args):
         vm.launch()
     except OSError as e:
         return {'error': 'popen failed: ' + str(e)}
-    except (QMPConnectError, socket.timeout):
+    except (QMPConnectError, ConnectError, socket.timeout):
         return {'error': 'qemu failed: ' + str(vm.get_log())}
 
     try:
diff --git a/tests/qemu-iotests/tests/mirror-top-perms b/tests/qemu-iotests/tests/mirror-top-perms
index 2fc8dd66e0a..9fe315e3b01 100755
--- a/tests/qemu-iotests/tests/mirror-top-perms
+++ b/tests/qemu-iotests/tests/mirror-top-perms
@@ -26,6 +26,7 @@ from iotests import qemu_img
 # Import qemu after iotests.py has amended sys.path
 # pylint: disable=wrong-import-order
 import qemu
+from qemu.aqmp import ConnectError
 
 
 image_size = 1 * 1024 * 1024
@@ -102,7 +103,7 @@ class TestMirrorTopPerms(iotests.QMPTestCase):
             self.vm_b.launch()
             print('ERROR: VM B launched successfully, this should not have '
                   'happened')
-        except qemu.qmp.QMPConnectError:
+        except (qemu.qmp.QMPConnectError, ConnectError):
             assert 'Is another process using the image' in self.vm_b.get_log()
 
         result = self.vm.qmp('block-job-cancel',
-- 
2.31.1




[PATCH v2 17/17] python, iotests: replace qmp with aqmp

2021-09-22 Thread John Snow
Swap out the synchronous QEMUMonitorProtocol from qemu.qmp with the sync
wrapper from qemu.aqmp instead.

Add an escape hatch in the form of the environment variable
QEMU_PYTHON_LEGACY_QMP which allows you to cajole QEMUMachine into using
the old implementation, proving that both implementations work
concurrently.

Signed-off-by: John Snow 
Reviewed-by: Hanna Reitz 
Tested-by: Hanna Reitz 
---
 python/qemu/machine/machine.py | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index c33a78a2d9f..32879faeb40 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -41,7 +41,6 @@
 )
 
 from qemu.qmp import (  # pylint: disable=import-error
-    QEMUMonitorProtocol,
     QMPMessage,
     QMPReturnValue,
     SocketAddrT,
@@ -50,6 +49,12 @@
 from . import console_socket
 
 
+if os.environ.get('QEMU_PYTHON_LEGACY_QMP'):
+    from qemu.qmp import QEMUMonitorProtocol
+else:
+    from qemu.aqmp.legacy import QEMUMonitorProtocol
+
+
 LOG = logging.getLogger(__name__)
 
 
-- 
2.31.1




[PATCH v2 10/17] python, iotests: remove socket_scm_helper

2021-09-22 Thread John Snow
It's no longer used.

Signed-off-by: John Snow 
Reviewed-by: Hanna Reitz 
---
 tests/qemu-iotests/socket_scm_helper.c | 136 -
 python/qemu/machine/machine.py |   3 -
 python/qemu/machine/qtest.py   |   2 -
 tests/Makefile.include |   1 -
 tests/meson.build  |   4 -
 tests/qemu-iotests/iotests.py  |   3 -
 tests/qemu-iotests/meson.build |   5 -
 tests/qemu-iotests/testenv.py  |   8 +-
 8 files changed, 1 insertion(+), 161 deletions(-)
 delete mode 100644 tests/qemu-iotests/socket_scm_helper.c
 delete mode 100644 tests/qemu-iotests/meson.build

diff --git a/tests/qemu-iotests/socket_scm_helper.c b/tests/qemu-iotests/socket_scm_helper.c
deleted file mode 100644
index eb76d31aa94..000
--- a/tests/qemu-iotests/socket_scm_helper.c
+++ /dev/null
@@ -1,136 +0,0 @@
-/*
- * SCM_RIGHTS with unix socket help program for test
- *
- * Copyright IBM, Inc. 2013
- *
- * Authors:
- *  Wenchao Xia
- *
- * This work is licensed under the terms of the GNU LGPL, version 2 or later.
- * See the COPYING.LIB file in the top-level directory.
- */
-
-#include "qemu/osdep.h"
-#include <sys/socket.h>
-#include <sys/un.h>
-
-/* #define SOCKET_SCM_DEBUG */
-
-/*
- * @fd and @fd_to_send will not be checked for validation in this function,
- * a blank will be sent as iov data to notify qemu.
- */
-static int send_fd(int fd, int fd_to_send)
-{
-    struct msghdr msg;
-    struct iovec iov[1];
-    int ret;
-    char control[CMSG_SPACE(sizeof(int))];
-    struct cmsghdr *cmsg;
-
-    memset(&msg, 0, sizeof(msg));
-    memset(control, 0, sizeof(control));
-
-    /* Send a blank to notify qemu */
-    iov[0].iov_base = (void *)" ";
-    iov[0].iov_len = 1;
-
-    msg.msg_iov = iov;
-    msg.msg_iovlen = 1;
-
-    msg.msg_control = control;
-    msg.msg_controllen = sizeof(control);
-
-    cmsg = CMSG_FIRSTHDR(&msg);
-
-    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
-    cmsg->cmsg_level = SOL_SOCKET;
-    cmsg->cmsg_type = SCM_RIGHTS;
-    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));
-
-    do {
-        ret = sendmsg(fd, &msg, 0);
-    } while (ret < 0 && errno == EINTR);
-
-    if (ret < 0) {
-        fprintf(stderr, "Failed to send msg, reason: %s\n", strerror(errno));
-    }
-
-    return ret;
-}
-
-/* Convert string to fd number. */
-static int get_fd_num(const char *fd_str, bool silent)
-{
-    int sock;
-    char *err;
-
-    errno = 0;
-    sock = strtol(fd_str, &err, 10);
-    if (errno) {
-        if (!silent) {
-            fprintf(stderr, "Failed in strtol for socket fd, reason: %s\n",
-                    strerror(errno));
-        }
-        return -1;
-    }
-    if (!*fd_str || *err || sock < 0) {
-        if (!silent) {
-            fprintf(stderr, "bad numerical value for socket fd '%s'\n", fd_str);
-        }
-        return -1;
-    }
-
-    return sock;
-}
-
-/*
- * To make things simple, the caller needs to specify:
- * 1. socket fd.
- * 2. path of the file to be sent.
- */
-int main(int argc, char **argv, char **envp)
-{
-    int sock, fd, ret;
-
-#ifdef SOCKET_SCM_DEBUG
-    int i;
-    for (i = 0; i < argc; i++) {
-        fprintf(stderr, "Parameter %d: %s\n", i, argv[i]);
-    }
-#endif
-
-    if (argc != 3) {
-        fprintf(stderr,
-                "Usage: %s < socket-fd > < file-path >\n",
-                argv[0]);
-        return EXIT_FAILURE;
-    }
-
-
-    sock = get_fd_num(argv[1], false);
-    if (sock < 0) {
-        return EXIT_FAILURE;
-    }
-
-    fd = get_fd_num(argv[2], true);
-    if (fd < 0) {
-        /* Now only open a file in readonly mode for test purpose. If more
-           precise control is needed, use python script in file operation,
-           which is supposed to fork and exec this program. */
-        fd = open(argv[2], O_RDONLY);
-        if (fd < 0) {
-            fprintf(stderr, "Failed to open file '%s'\n", argv[2]);
-            return EXIT_FAILURE;
-        }
-    }
-
-    ret = send_fd(sock, fd);
-    if (ret < 0) {
-        close(fd);
-        return EXIT_FAILURE;
-    }
-
-    close(fd);
-    return EXIT_SUCCESS;
-}
diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 1c6532a3d68..056d340e355 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -98,7 +98,6 @@ def __init__(self,
  name: Optional[str] = None,
  base_temp_dir: str = "/var/tmp",
  monitor_address: Optional[SocketAddrT] = None,
- socket_scm_helper: Optional[str] = None,
  sock_dir: Optional[str] = None,
  drain_console: bool = False,
  console_log: Optional[str] = None,
@@ -113,7 +112,6 @@ def __init__(self,
 @param name: prefix for socket and log file names (default: qemu-PID)
 @param base_temp_dir: default location where temp files are created
 @param monitor_address: address for QMP monitor
-@param socket_scm_helper: helper 

[PATCH v2 04/17] python/aqmp: add send_fd_scm

2021-09-22 Thread John Snow
The single space is indeed required to successfully transmit the file
descriptor to QEMU.

Python 3.11 removes support for calling sendmsg directly from a
transport's socket. There is no other interface for doing this, our use
case is, I suspect, "quite unique".

As far as I can tell, this is safe to do -- send_fd_scm is a synchronous
function and we are guaranteed that the async coroutines will *not* be
running when it is invoked. In testing, it works correctly.

I investigated quite thoroughly the possibility of creating my own
asyncio Transport (The class that ultimately manages the raw socket
object) so that I could manage the socket myself, but this is so wildly
invasive and unportable I scrapped the idea. It would involve a lot of
copy-pasting of various python utilities and classes just to re-create
the same infrastructure, and for extremely little benefit. Nah.

Just boldly void the warranty instead, while I try to follow up on
https://bugs.python.org/issue43232

Signed-off-by: John Snow 
---
 python/qemu/aqmp/qmp_client.py | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/python/qemu/aqmp/qmp_client.py b/python/qemu/aqmp/qmp_client.py
index d2ad7459f9f..f987da02eb0 100644
--- a/python/qemu/aqmp/qmp_client.py
+++ b/python/qemu/aqmp/qmp_client.py
@@ -9,6 +9,8 @@
 
 import asyncio
 import logging
+import socket
+import struct
 from typing import (
 Dict,
 List,
@@ -624,3 +626,23 @@ async def execute(self, cmd: str,
 """
 msg = self.make_execute_msg(cmd, arguments, oob=oob)
 return await self.execute_msg(msg)
+
+@upper_half
+@require(Runstate.RUNNING)
+def send_fd_scm(self, fd: int) -> None:
+"""
+Send a file descriptor to the remote via SCM_RIGHTS.
+"""
+assert self._writer is not None
+sock = self._writer.transport.get_extra_info('socket')
+
+if sock.family != socket.AF_UNIX:
+raise AQMPError("Sending file descriptors requires a UNIX socket.")
+
+# Void the warranty sticker.
+# Access to sendmsg in asyncio is scheduled for removal in Python 3.11.
+sock = sock._sock  # pylint: disable=protected-access
+sock.sendmsg(
+[b' '],
+[(socket.SOL_SOCKET, socket.SCM_RIGHTS, struct.pack('@i', fd))]
+)
-- 
2.31.1
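Outside of asyncio, the same SCM_RIGHTS dance is straightforward with the standard socket API. Below is a minimal, self-contained sketch (helper names are illustrative, not the QMPClient code), including the single byte of regular payload that must accompany the ancillary data:

```python
import array
import socket


def send_fd(sock: socket.socket, fd: int, payload: bytes = b' ') -> None:
    """Send one file descriptor over a connected AF_UNIX socket.

    At least one byte of ordinary payload is required: ancillary data
    cannot be transmitted on its own.
    """
    sock.sendmsg(
        [payload],
        [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
          array.array('i', [fd]).tobytes())]
    )


def recv_fd(sock: socket.socket) -> int:
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array('i')
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    level, ctype, data = ancdata[0]
    assert level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS
    fds.frombytes(data[:fds.itemsize])
    return fds[0]
```

The patch above performs only the sending half, against the asyncio transport's underlying (private) socket object.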




[PATCH v2 01/17] python/aqmp: add greeting property to QMPClient

2021-09-22 Thread John Snow
Expose the greeting as a read-only property of QMPClient so it can be
retrieved at-will.

Signed-off-by: John Snow 
Reviewed-by: Hanna Reitz 
---
 python/qemu/aqmp/qmp_client.py | 5 +
 1 file changed, 5 insertions(+)

diff --git a/python/qemu/aqmp/qmp_client.py b/python/qemu/aqmp/qmp_client.py
index 82e9dab124c..d2ad7459f9f 100644
--- a/python/qemu/aqmp/qmp_client.py
+++ b/python/qemu/aqmp/qmp_client.py
@@ -224,6 +224,11 @@ def __init__(self, name: Optional[str] = None) -> None:
 'asyncio.Queue[QMPClient._PendingT]'
 ] = {}
 
+@property
+def greeting(self) -> Optional[Greeting]:
+"""The `Greeting` from the QMP server, if any."""
+return self._greeting
+
 @upper_half
 async def _establish_session(self) -> None:
 """
-- 
2.31.1




[PATCH v2 02/17] python/aqmp: add .empty() method to EventListener

2021-09-22 Thread John Snow
Synchronous clients may want to know if they're about to block waiting
for an event or not. A method such as this is necessary to implement a
compatible interface for the old QEMUMonitorProtocol using the new async
internals.

Signed-off-by: John Snow 
Reviewed-by: Hanna Reitz 
---
 python/qemu/aqmp/events.py | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/python/qemu/aqmp/events.py b/python/qemu/aqmp/events.py
index fb81d216102..271899f6b82 100644
--- a/python/qemu/aqmp/events.py
+++ b/python/qemu/aqmp/events.py
@@ -556,6 +556,12 @@ async def get(self) -> Message:
 """
 return await self._queue.get()
 
+def empty(self) -> bool:
+"""
+Return `True` if there are no pending events.
+"""
+return self._queue.empty()
+
 def clear(self) -> None:
 """
 Clear this listener of all pending events.
-- 
2.31.1




[PATCH v2 06/17] python/aqmp: Reduce severity of EOFError-caused loop terminations

2021-09-22 Thread John Snow
When we encounter an EOFError, we don't know if it's an "error" in the
perspective of the user of the library yet. Therefore, we should not log
it as an error. Reduce the severity of this logging message to "INFO" to
indicate that it's something that we expect to occur during the normal
operation of the library.

Signed-off-by: John Snow 
---
 python/qemu/aqmp/protocol.py | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/python/qemu/aqmp/protocol.py b/python/qemu/aqmp/protocol.py
index 32e78749c11..ae1df240260 100644
--- a/python/qemu/aqmp/protocol.py
+++ b/python/qemu/aqmp/protocol.py
@@ -721,8 +721,11 @@ async def _bh_loop_forever(self, async_fn: _TaskFN, name: str) -> None:
 self.logger.debug("Task.%s: cancelled.", name)
 return
 except BaseException as err:
-self.logger.error("Task.%s: %s",
-  name, exception_summary(err))
+self.logger.log(
+logging.INFO if isinstance(err, EOFError) else logging.ERROR,
+"Task.%s: %s",
+name, exception_summary(err)
+)
 self.logger.debug("Task.%s: failure:\n%s\n",
   name, pretty_traceback())
 self._schedule_disconnect()
-- 
2.31.1
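The key trick in the hunk above is `logging.Logger.log()`, which takes the severity as a value rather than being baked into the method name. A tiny standalone sketch of the pattern (the helper name is illustrative):

```python
import logging


def failure_level(err: BaseException) -> int:
    # EOF is an expected way for the reader loop to end (the peer hung
    # up), so it is merely informational; anything else is a real error.
    return logging.INFO if isinstance(err, EOFError) else logging.ERROR


logger = logging.getLogger('example')


def report_task_failure(name: str, err: BaseException) -> None:
    # Severity is chosen at call time, message formatting is unchanged.
    logger.log(failure_level(err), "Task.%s: %s", name, type(err).__name__)
```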




[PATCH v2 05/17] python/aqmp: Add dict conversion method to Greeting object

2021-09-22 Thread John Snow
The iotests interface expects to return the greeting as a dict; AQMP
offers it as a rich object.

Signed-off-by: John Snow 
---
 python/qemu/aqmp/models.py | 13 +
 1 file changed, 13 insertions(+)

diff --git a/python/qemu/aqmp/models.py b/python/qemu/aqmp/models.py
index 24c94123ac0..de87f878047 100644
--- a/python/qemu/aqmp/models.py
+++ b/python/qemu/aqmp/models.py
@@ -8,8 +8,10 @@
 # pylint: disable=too-few-public-methods
 
 from collections import abc
+import copy
 from typing import (
 Any,
+Dict,
 Mapping,
 Optional,
 Sequence,
@@ -66,6 +68,17 @@ def __init__(self, raw: Mapping[str, Any]):
 self._check_member('QMP', abc.Mapping, "JSON object")
 self.QMP = QMPGreeting(self._raw['QMP'])
 
+def _asdict(self) -> Dict[str, object]:
+"""
+For compatibility with the iotests sync QMP wrapper.
+
+The legacy QMP interface needs Greetings as a garden-variety Dict.
+
+This interface is private in the hopes that it will be able to
+be dropped again in the near-future. Caller beware!
+"""
+return dict(copy.deepcopy(self._raw))
+
 
 class QMPGreeting(Model):
 """
-- 
2.31.1
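The deep copy matters: it keeps callers from mutating the model's internal state through the returned dict. A minimal sketch with a stand-in class (not the real Greeting model):

```python
import copy


class Greeting:
    """Illustrative stand-in for the AQMP Greeting model."""
    def __init__(self, raw):
        self._raw = raw

    def _asdict(self):
        # Deep-copy so callers may freely mutate the result without
        # corrupting the model's internal state.
        return dict(copy.deepcopy(self._raw))


g = Greeting({'QMP': {'version': {}, 'capabilities': []}})
d = g._asdict()
d['QMP']['capabilities'].append('oob')  # mutates the copy only
```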




[PATCH v2 03/17] python/aqmp: Return cleared events from EventListener.clear()

2021-09-22 Thread John Snow
This serves two purposes:

(1) It is now possible to discern whether or not clear() removed any
event(s) from the queue with absolute certainty, and

(2) It is now very easy to get a List of all pending events in one
chunk, which is useful for the sync bridge.

Signed-off-by: John Snow 
Reviewed-by: Hanna Reitz 
---
 python/qemu/aqmp/events.py | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/python/qemu/aqmp/events.py b/python/qemu/aqmp/events.py
index 271899f6b82..5f7150c78d4 100644
--- a/python/qemu/aqmp/events.py
+++ b/python/qemu/aqmp/events.py
@@ -562,7 +562,7 @@ def empty(self) -> bool:
 """
 return self._queue.empty()
 
-def clear(self) -> None:
+def clear(self) -> List[Message]:
 """
 Clear this listener of all pending events.
 
@@ -570,17 +570,22 @@ def clear(self) -> None:
 pending FIFO queue synchronously. It can also be used to
 manually clear any pending events, if desired.
 
+:return: The cleared events, if any.
+
 .. warning::
 Take care when discarding events. Cleared events will be
 silently tossed on the floor. All events that were ever
 accepted by this listener are visible in `history()`.
 """
+events = []
 while True:
 try:
-self._queue.get_nowait()
+events.append(self._queue.get_nowait())
 except asyncio.QueueEmpty:
 break
 
+return events
+
 def __aiter__(self) -> AsyncIterator[Message]:
 return self
 
-- 
2.31.1
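The drain loop in clear() is the standard idiom for synchronously emptying an asyncio.Queue. A self-contained sketch (names are illustrative):

```python
import asyncio


def drain(queue: asyncio.Queue) -> list:
    """Synchronously remove and return all items currently queued."""
    items = []
    while True:
        try:
            items.append(queue.get_nowait())
        except asyncio.QueueEmpty:
            break
    return items


async def demo() -> tuple:
    q: asyncio.Queue = asyncio.Queue()
    for n in (1, 2, 3):
        q.put_nowait(n)
    # Returns the drained items and whether the queue is now empty.
    return drain(q), q.empty()
```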




[PATCH v2 00/17] Switch iotests to using Async QMP

2021-09-22 Thread John Snow
Based-on: <20210915162955.333025-1-js...@redhat.com>
  [PATCH v4 00/27] python: introduce Asynchronous QMP package
GitLab: https://gitlab.com/jsnow/qemu/-/commits/python-aqmp-iotest-wrapper
CI: https://gitlab.com/jsnow/qemu/-/pipelines/375637927

Hiya,

This series continues where the first AQMP series left off and adds a
synchronous 'legacy' wrapper around the new AQMP interface, then drops
it straight into iotests to prove that AQMP is functional and totally
cool and fine. The disruption and churn to iotests is extremely minimal.
(There's actually a net negative SLOC in tests/qemu-iotests.)

In the event that a regression happens and I am not physically proximate
to inflict damage upon, one may set the QEMU_PYTHON_LEGACY_QMP variable
to any non-empty string as it pleases you to engage the QMP machinery
you are used to.

I'd like to try and get this committed early in the 6.2 development
cycle to give ample time to smooth over any possible regressions. I've
tested it locally and via gitlab CI, across Python versions 3.6 through
3.10, and "worksforme". If something bad happens, we can revert the
actual switch-flip very trivially.

Layout:

Patches 1-7: ./python/qemu/aqmp changes that serve as pre-requisites.
Patches 8-12: other ./python changes that ease the transition.
Patches 13-14: iotest changes to support the new QMP backend.
Patches 15-17: Make the switch.

V2:

001/17:[] [--] 'python/aqmp: add greeting property to QMPClient'
002/17:[] [--] 'python/aqmp: add .empty() method to EventListener'
003/17:[] [--] 'python/aqmp: Return cleared events from EventListener.clear()'
004/17:[0007] [FC] 'python/aqmp: add send_fd_scm'
005/17:[down] 'python/aqmp: Add dict conversion method to Greeting object'
006/17:[down] 'python/aqmp: Reduce severity of EOFError-caused loop terminations'
007/17:[down] 'python/aqmp: Disable logging messages by default'

008/17:[0002] [FC] 'python/qmp: clear events on get_events() call'
009/17:[] [--] 'python/qmp: add send_fd_scm directly to QEMUMonitorProtocol'
010/17:[] [--] 'python, iotests: remove socket_scm_helper'
011/17:[0013] [FC] 'python/machine: remove has_quit argument'
012/17:[down] 'python/machine: Handle QMP errors on close more meticulously'

013/17:[0009] [FC] 'iotests: Accommodate async QMP Exception classes'
014/17:[down] 'iotests: Conditionally silence certain AQMP errors'

015/17:[0016] [FC] 'python/aqmp: Create sync QMP wrapper for iotests'
016/17:[0002] [FC] 'python/aqmp: Remove scary message'
017/17:[] [--] 'python, iotests: replace qmp with aqmp'

- Rebased on jsnow/python, which was recently rebased on origin/master.
- Make aqmp's send_fd_scm method bark if the socket isn't AF_UNIX (Hanna)
- Uh... modify send_fd_scm so it doesn't break when Python 3.11 comes out ...
  See the commit message for more detail.
- Drop the "python/aqmp: Create MessageModel and StandaloneModel class"
  patch and replace with a far simpler method that just adds an
  _asdict() method.
- Add patches 06 and 07 to change how the AQMP library handles logging.
- Adjust docstring in patch 08 (Hanna)
- Rename "_has_quit" attribute to "_quit_issued" (Hanna)
- Renamed patch 12, simplified the logic in _soft_shutdown a tiny bit.
- Fixed bad exception handling logic in 13 (Hanna)
- Introduce a helper in patch 14 to silence log output when it's unwanted.
- Small addition of _get_greeting() helper in patch 15, coinciding with the
  new patch 05 here.
- Contextual changes in 16.

John Snow (17):
  python/aqmp: add greeting property to QMPClient
  python/aqmp: add .empty() method to EventListener
  python/aqmp: Return cleared events from EventListener.clear()
  python/aqmp: add send_fd_scm
  python/aqmp: Add dict conversion method to Greeting object
  python/aqmp: Reduce severity of EOFError-caused loop terminations
  python/aqmp: Disable logging messages by default
  python/qmp: clear events on get_events() call
  python/qmp: add send_fd_scm directly to QEMUMonitorProtocol
  python, iotests: remove socket_scm_helper
  python/machine: remove has_quit argument
  python/machine: Handle QMP errors on close more meticulously
  iotests: Accommodate async QMP Exception classes
  iotests: Conditionally silence certain AQMP errors
  python/aqmp: Create sync QMP wrapper for iotests
  python/aqmp: Remove scary message
  python, iotests: replace qmp with aqmp

 tests/qemu-iotests/socket_scm_helper.c| 136 --
 python/qemu/aqmp/__init__.py  |  14 +--
 python/qemu/aqmp/events.py|  15 ++-
 python/qemu/aqmp/legacy.py| 135 +
 python/qemu/aqmp/models.py|  13 +++
 python/qemu/aqmp/protocol.py  |   7 +-
 python/qemu/aqmp/qmp_client.py|  27 +
 python/qemu/machine/machine.py| 133 +++--
 python/qemu/machine/qtest.py  |   2 -
 python/qemu/qmp/__init__.py   |  27 +++--
 python/qemu/qmp/qmp_shell.py 

[PATCH 2/6] iotests: add warning for rogue 'qemu' packages

2021-09-22 Thread John Snow
Add a warning for when 'iotests' runs against a qemu namespace that
isn't the one in the source tree. This might occur if you have
(accidentally) installed the Python namespace package to your local
packages.

(I'm not going to say that this is because I bit myself with this,
but. You can fill in the blanks.)

Signed-off-by: John Snow 
---
 tests/qemu-iotests/testenv.py | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
index 88104dace90..8a43b193af5 100644
--- a/tests/qemu-iotests/testenv.py
+++ b/tests/qemu-iotests/testenv.py
@@ -16,6 +16,8 @@
 # along with this program.  If not, see .
 #
 
+import importlib.util
+import logging
 import os
 import sys
 import tempfile
@@ -25,7 +27,7 @@
 import random
 import subprocess
 import glob
-from typing import List, Dict, Any, Optional, ContextManager
+from typing import List, Dict, Any, Optional, ContextManager, cast
 
 DEF_GDB_OPTIONS = 'localhost:12345'
 
@@ -112,6 +114,22 @@ def init_directories(self) -> None:
 # Path where qemu goodies live in this source tree.
 qemu_srctree_path = Path(__file__, '../../../python').resolve()
 
+# warn if we happen to be able to find 'qemu' packages from an
+# unexpected location (i.e. the package is already installed in
+# the user's environment)
+qemu_spec = importlib.util.find_spec('qemu.qmp')
+if qemu_spec:
+spec_path = Path(cast(str, qemu_spec.origin))
+try:
+_ = spec_path.relative_to(qemu_srctree_path)
+except ValueError:
+self._logger.warning(
+"WARNING: 'qemu' package will be imported from outside "
+"the source tree!")
+self._logger.warning(
+"Importing from: '%s'",
+spec_path.parents[1])
+
 self.pythonpath = os.getenv('PYTHONPATH')
 self.pythonpath = os.pathsep.join(filter(None, (
 self.source_iotests,
@@ -231,6 +249,7 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: str,
 
 self.build_root = os.path.join(self.build_iotests, '..', '..')
 
+self._logger = logging.getLogger('qemu.iotests')
 self.init_directories()
 self.init_binaries()
 
-- 
2.31.1
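The detection logic boils down to resolving the module spec's origin and testing it against the source tree with Path.relative_to(). A standalone sketch (the function name is illustrative):

```python
import importlib.util
from pathlib import Path


def import_origin_outside(module: str, srctree: Path) -> bool:
    """Return True if `module` would be imported from outside `srctree`.

    Mirrors the testenv check: find the module's spec, then test
    whether its origin path lives under the source tree.
    """
    spec = importlib.util.find_spec(module)
    if spec is None or spec.origin is None:
        return False  # not importable at all: nothing to warn about
    try:
        Path(spec.origin).resolve().relative_to(srctree.resolve())
    except ValueError:
        return True  # origin is not under the source tree
    return False
```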




[PATCH 5/6] iotests/migrate-bitmaps-test: delint

2021-09-22 Thread John Snow
Mostly uninteresting stuff. Move the test injections under a function
named main() so that the variables used during that process aren't in
the global scope.

Signed-off-by: John Snow 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Hanna Reitz 
---
 tests/qemu-iotests/tests/migrate-bitmaps-test | 50 +++
 1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/tests/qemu-iotests/tests/migrate-bitmaps-test b/tests/qemu-iotests/tests/migrate-bitmaps-test
index dc431c35b35..c23df3d75c7 100755
--- a/tests/qemu-iotests/tests/migrate-bitmaps-test
+++ b/tests/qemu-iotests/tests/migrate-bitmaps-test
@@ -19,10 +19,11 @@
 # along with this program.  If not, see .
 #
 
-import os
 import itertools
 import operator
+import os
 import re
+
 import iotests
 from iotests import qemu_img, qemu_img_create, Timeout
 
@@ -224,25 +225,6 @@ def inject_test_case(klass, suffix, method, *args, **kwargs):
 setattr(klass, 'test_' + method + suffix, lambda self: mc(self))
 
 
-for cmb in list(itertools.product((True, False), repeat=5)):
-name = ('_' if cmb[0] else '_not_') + 'persistent_'
-name += ('_' if cmb[1] else '_not_') + 'migbitmap_'
-name += '_online' if cmb[2] else '_offline'
-name += '_shared' if cmb[3] else '_nonshared'
-if cmb[4]:
-name += '__pre_shutdown'
-
-inject_test_case(TestDirtyBitmapMigration, name, 'do_test_migration',
- *list(cmb))
-
-for cmb in list(itertools.product((True, False), repeat=2)):
-name = ('_' if cmb[0] else '_not_') + 'persistent_'
-name += ('_' if cmb[1] else '_not_') + 'migbitmap'
-
-inject_test_case(TestDirtyBitmapMigration, name,
- 'do_test_migration_resume_source', *list(cmb))
-
-
 class TestDirtyBitmapBackingMigration(iotests.QMPTestCase):
 def setUp(self):
 qemu_img_create('-f', iotests.imgfmt, base_a, size)
@@ -304,6 +286,30 @@ class TestDirtyBitmapBackingMigration(iotests.QMPTestCase):
 self.assert_qmp(result, 'return', {})
 
 
+def main() -> None:
+for cmb in list(itertools.product((True, False), repeat=5)):
+name = ('_' if cmb[0] else '_not_') + 'persistent_'
+name += ('_' if cmb[1] else '_not_') + 'migbitmap_'
+name += '_online' if cmb[2] else '_offline'
+name += '_shared' if cmb[3] else '_nonshared'
+if cmb[4]:
+name += '__pre_shutdown'
+
+inject_test_case(TestDirtyBitmapMigration, name, 'do_test_migration',
+ *list(cmb))
+
+for cmb in list(itertools.product((True, False), repeat=2)):
+name = ('_' if cmb[0] else '_not_') + 'persistent_'
+name += ('_' if cmb[1] else '_not_') + 'migbitmap'
+
+inject_test_case(TestDirtyBitmapMigration, name,
+ 'do_test_migration_resume_source', *list(cmb))
+
+iotests.main(
+supported_fmts=['qcow2'],
+supported_protocols=['file']
+)
+
+
 if __name__ == '__main__':
-iotests.main(supported_fmts=['qcow2'],
- supported_protocols=['file'])
+main()
-- 
2.31.1
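The inject_test_case() pattern used above -- stamping out one test_* method per parameter combination with setattr() -- can be sketched in isolation (class and method names are illustrative):

```python
import itertools


class MigrationSuite:
    """Illustrative stand-in for the iotests test class."""
    def do_test(self, persistent: bool, online: bool):
        return ('persistent' if persistent else 'transient',
                'online' if online else 'offline')


def inject_test_case(klass, suffix: str, method: str, *args) -> None:
    # Bind the parameters now; the default-argument trick avoids the
    # classic late-binding bug when closures are created in a loop.
    def case(self, _method=method, _args=args):
        return getattr(self, _method)(*_args)
    setattr(klass, 'test_' + method + suffix, case)


for combo in itertools.product((True, False), repeat=2):
    name = '_persistent' if combo[0] else '_transient'
    name += '_online' if combo[1] else '_offline'
    inject_test_case(MigrationSuite, name, 'do_test', *combo)
```

unittest-style runners then discover the generated test_* methods by name, exactly as with hand-written cases.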




[PATCH 4/6] iotests/mirror-top-perms: Adjust imports

2021-09-22 Thread John Snow
We need to import things from the qemu namespace; importing the
namespace alone doesn't bring the submodules with it -- unless someone
else (like iotests.py) imports them too.

Adjust the imports.

Signed-off-by: John Snow 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Hanna Reitz 
---
 tests/qemu-iotests/tests/mirror-top-perms | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/tests/mirror-top-perms b/tests/qemu-iotests/tests/mirror-top-perms
index 73138a0ef91..3d475aa3a54 100755
--- a/tests/qemu-iotests/tests/mirror-top-perms
+++ b/tests/qemu-iotests/tests/mirror-top-perms
@@ -21,7 +21,8 @@
 
 import os
 
-import qemu
+from qemu import qmp
+from qemu.machine import machine
 
 import iotests
 from iotests import qemu_img
@@ -46,7 +47,7 @@ class TestMirrorTopPerms(iotests.QMPTestCase):
 def tearDown(self):
 try:
 self.vm.shutdown()
-except qemu.machine.machine.AbnormalShutdown:
+except machine.AbnormalShutdown:
 pass
 
 if self.vm_b is not None:
@@ -101,7 +102,7 @@ class TestMirrorTopPerms(iotests.QMPTestCase):
 self.vm_b.launch()
 print('ERROR: VM B launched successfully, this should not have '
   'happened')
-except qemu.qmp.QMPConnectError:
+except qmp.QMPConnectError:
 assert 'Is another process using the image' in self.vm_b.get_log()
 
 result = self.vm.qmp('block-job-cancel',
-- 
2.31.1




[PATCH 6/6] iotests: Update for pylint 2.11.1

2021-09-22 Thread John Snow
1. Ignore the new f-strings warning, we're not interested in doing a
   full conversion at this time.

2. Just mute the unbalanced-tuple-unpacking warning, it's not a real
   error in this case and muting the dozens of callsites is just not
   worth it.

3. Add encodings to read_text().

Signed-off-by: John Snow 
---
 tests/qemu-iotests/pylintrc  | 6 +-
 tests/qemu-iotests/testrunner.py | 7 ---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/pylintrc b/tests/qemu-iotests/pylintrc
index f2c0b522ac0..8cb4e1d6a6d 100644
--- a/tests/qemu-iotests/pylintrc
+++ b/tests/qemu-iotests/pylintrc
@@ -19,13 +19,17 @@ disable=invalid-name,
 too-many-public-methods,
 # pylint warns about Optional[] etc. as unsubscriptable in 3.9
 unsubscriptable-object,
+# pylint's static analysis causes false positives for file_path();
+# If we really care to make it statically knowable, we'll use mypy.
+unbalanced-tuple-unpacking,
 # Sometimes we need to disable a newly introduced pylint warning.
 # Doing so should not produce a warning in older versions of pylint.
 bad-option-value,
 # These are temporary, and should be removed:
 missing-docstring,
 too-many-return-statements,
-too-many-statements
+too-many-statements,
+consider-using-f-string,
 
 [FORMAT]
 
diff --git a/tests/qemu-iotests/testrunner.py b/tests/qemu-iotests/testrunner.py
index 4a6ec421ed6..a56b6da3968 100644
--- a/tests/qemu-iotests/testrunner.py
+++ b/tests/qemu-iotests/testrunner.py
@@ -266,12 +266,13 @@ def do_run_test(self, test: str) -> TestResult:
   diff=file_diff(str(f_reference), str(f_bad)))
 
 if f_notrun.exists():
-return TestResult(status='not run',
-  description=f_notrun.read_text().strip())
+return TestResult(
+status='not run',
+description=f_notrun.read_text(encoding='utf-8').strip())
 
 casenotrun = ''
 if f_casenotrun.exists():
-casenotrun = f_casenotrun.read_text()
+casenotrun = f_casenotrun.read_text(encoding='utf-8')
 
 diff = file_diff(str(f_reference), str(f_bad))
 if diff:
-- 
2.31.1




[PATCH 3/6] iotests/linters: check mypy files all at once

2021-09-22 Thread John Snow
We can circumvent the '__main__' redefinition problem by passing
--scripts-are-modules. Take mypy out of the loop per-filename and check
everything in one go: it's quite a bit faster.

Signed-off-by: John Snow 
Reviewed-by: Hanna Reitz 
---
 tests/qemu-iotests/297 | 44 +++---
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/tests/qemu-iotests/297 b/tests/qemu-iotests/297
index 467b712280e..91ec34d9521 100755
--- a/tests/qemu-iotests/297
+++ b/tests/qemu-iotests/297
@@ -74,32 +74,28 @@ def run_linters():
 print('=== mypy ===')
 sys.stdout.flush()
 
-# We have to call mypy separately for each file.  Otherwise, it
-# will interpret all given files as belonging together (i.e., they
-# may not both define the same classes, etc.; most notably, they
-# must not both define the __main__ module).
 env['MYPYPATH'] = env['PYTHONPATH']
-for filename in files:
-p = subprocess.run(('mypy',
-'--warn-unused-configs',
-'--disallow-subclassing-any',
-'--disallow-any-generics',
-'--disallow-incomplete-defs',
-'--disallow-untyped-decorators',
-'--no-implicit-optional',
-'--warn-redundant-casts',
-'--warn-unused-ignores',
-'--no-implicit-reexport',
-'--namespace-packages',
-filename),
-   env=env,
-   check=False,
-   stdout=subprocess.PIPE,
-   stderr=subprocess.STDOUT,
-   universal_newlines=True)
+p = subprocess.run(('mypy',
+'--warn-unused-configs',
+'--disallow-subclassing-any',
+'--disallow-any-generics',
+'--disallow-incomplete-defs',
+'--disallow-untyped-decorators',
+'--no-implicit-optional',
+'--warn-redundant-casts',
+'--warn-unused-ignores',
+'--no-implicit-reexport',
+'--namespace-packages',
+'--scripts-are-modules',
+*files),
+   env=env,
+   check=False,
+   stdout=subprocess.PIPE,
+   stderr=subprocess.STDOUT,
+   universal_newlines=True)
 
-if p.returncode != 0:
-print(p.stdout)
+if p.returncode != 0:
+print(p.stdout)
 
 
 for linter in ('pylint-3', 'mypy'):
-- 
2.31.1
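The single-invocation pattern is plain subprocess.run() with stderr folded into stdout, so a failing checker's diagnostics come back as one report. A minimal sketch (the helper name is illustrative) that works for any checker command:

```python
import subprocess
import sys


def run_checker(argv) -> tuple:
    """Run a checker once over all its arguments.

    Returns (returncode, combined stdout+stderr as text); the caller
    decides whether to print the output, as test 297 does on failure.
    """
    proc = subprocess.run(argv,
                          check=False,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT,
                          universal_newlines=True)
    return proc.returncode, proc.stdout
```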




[PATCH 1/6] iotests: add 'qemu' package location to PYTHONPATH in testenv

2021-09-22 Thread John Snow
We can drop the sys.path hacking in various places by doing
this. Additionally, by doing it in one place right up top, we can print
interesting warnings in case the environment does not look correct.

If we ever decide to change how the environment is crafted, all of the
"help me find my python packages" goop is all in one place, right in one
function.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/235|  2 --
 tests/qemu-iotests/297|  6 --
 tests/qemu-iotests/300|  7 +++
 tests/qemu-iotests/iotests.py |  2 --
 tests/qemu-iotests/testenv.py | 14 +-
 tests/qemu-iotests/tests/mirror-top-perms |  7 +++
 6 files changed, 15 insertions(+), 23 deletions(-)

diff --git a/tests/qemu-iotests/235 b/tests/qemu-iotests/235
index 8aed45f9a76..4de920c3801 100755
--- a/tests/qemu-iotests/235
+++ b/tests/qemu-iotests/235
@@ -24,8 +24,6 @@ import os
 import iotests
 from iotests import qemu_img_create, qemu_io, file_path, log
 
-sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'python'))
-
 from qemu.machine import QEMUMachine
 
 iotests.script_initialize(supported_fmts=['qcow2'])
diff --git a/tests/qemu-iotests/297 b/tests/qemu-iotests/297
index b04cba53667..467b712280e 100755
--- a/tests/qemu-iotests/297
+++ b/tests/qemu-iotests/297
@@ -68,12 +68,6 @@ def run_linters():
 # Todo notes are fine, but fixme's or xxx's should probably just be
 # fixed (in tests, at least)
 env = os.environ.copy()
-qemu_module_path = os.path.join(os.path.dirname(__file__),
-'..', '..', 'python')
-try:
-env['PYTHONPATH'] += os.pathsep + qemu_module_path
-except KeyError:
-env['PYTHONPATH'] = qemu_module_path
 subprocess.run(('pylint-3', '--score=n', '--notes=FIXME,XXX', *files),
env=env, check=False)
 
diff --git a/tests/qemu-iotests/300 b/tests/qemu-iotests/300
index fe94de84edd..10f9f2a8da6 100755
--- a/tests/qemu-iotests/300
+++ b/tests/qemu-iotests/300
@@ -24,12 +24,11 @@ import random
 import re
 from typing import Dict, List, Optional
 
-import iotests
-
-# Import qemu after iotests.py has amended sys.path
-# pylint: disable=wrong-import-order
 from qemu.machine import machine
 
+import iotests
+
+
 BlockBitmapMapping = List[Dict[str, object]]
 
 mig_sock = os.path.join(iotests.sock_dir, 'mig_sock')
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index ce06cf56304..b06ad76e0c5 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -36,8 +36,6 @@
 
 from contextlib import contextmanager
 
-# pylint: disable=import-error, wrong-import-position
-sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'python'))
 from qemu.machine import qtest
 from qemu.qmp import QMPMessage
 
diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
index 70da0d60c80..88104dace90 100644
--- a/tests/qemu-iotests/testenv.py
+++ b/tests/qemu-iotests/testenv.py
@@ -108,12 +108,16 @@ def init_directories(self) -> None:
  SAMPLE_IMG_DIR
  OUTPUT_DIR
 """
+
+# Path where qemu goodies live in this source tree.
+qemu_srctree_path = Path(__file__, '../../../python').resolve()
+
 self.pythonpath = os.getenv('PYTHONPATH')
-if self.pythonpath:
-self.pythonpath = self.source_iotests + os.pathsep + \
-self.pythonpath
-else:
-self.pythonpath = self.source_iotests
+self.pythonpath = os.pathsep.join(filter(None, (
+self.source_iotests,
+str(qemu_srctree_path),
+self.pythonpath,
+)))
 
 self.test_dir = os.getenv('TEST_DIR',
   os.path.join(os.getcwd(), 'scratch'))
diff --git a/tests/qemu-iotests/tests/mirror-top-perms b/tests/qemu-iotests/tests/mirror-top-perms
index 2fc8dd66e0a..73138a0ef91 100755
--- a/tests/qemu-iotests/tests/mirror-top-perms
+++ b/tests/qemu-iotests/tests/mirror-top-perms
@@ -20,13 +20,12 @@
 #
 
 import os
+
+import qemu
+
 import iotests
 from iotests import qemu_img
 
-# Import qemu after iotests.py has amended sys.path
-# pylint: disable=wrong-import-order
-import qemu
-
 
 image_size = 1 * 1024 * 1024
 source = os.path.join(iotests.test_dir, 'source.img')
-- 
2.31.1
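The PYTHONPATH assembly above relies on filter(None, ...) to drop unset components before joining, which collapses the old if/else dance over the environment variable into one expression. A sketch of the same pattern (the function name is illustrative):

```python
import os


def build_search_path(*parts) -> str:
    """Join path fragments with os.pathsep, skipping None/empty entries."""
    return os.pathsep.join(filter(None, parts))
```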




[PATCH 0/6] iotests: update environment and linting configuration

2021-09-22 Thread John Snow
GitLab: https://gitlab.com/jsnow/qemu/-/commits/python-package-iotest-pt1
CI: https://gitlab.com/jsnow/qemu/-/pipelines/375630185

This series partially supersedes:
  [PATCH v3 00/16] python/iotests: Run iotest linters during Python CI'

Howdy, this is good stuff we want even if we aren't yet in agreement
about the best way to run iotest 297 from CI.

- Update linting config to tolerate pylint 2.11.1
- Eliminate sys.path hacking in individual test files
- make mypy execution in test 297 faster

The rest of the actual "run at CI time" stuff can get handled separately
and later pending some discussion on the other series.

--js

John Snow (6):
  iotests: add 'qemu' package location to PYTHONPATH in testenv
  iotests: add warning for rogue 'qemu' packages
  iotests/linters: check mypy files all at once
  iotests/mirror-top-perms: Adjust imports
  iotests/migrate-bitmaps-test: delint
  iotests: Update for pylint 2.11.1

 tests/qemu-iotests/235|  2 -
 tests/qemu-iotests/297| 50 ---
 tests/qemu-iotests/300|  7 ++-
 tests/qemu-iotests/iotests.py |  2 -
 tests/qemu-iotests/pylintrc   |  6 ++-
 tests/qemu-iotests/testenv.py | 35 ++---
 tests/qemu-iotests/testrunner.py  |  7 +--
 tests/qemu-iotests/tests/migrate-bitmaps-test | 50 +++
 tests/qemu-iotests/tests/mirror-top-perms | 12 ++---
 9 files changed, 95 insertions(+), 76 deletions(-)

-- 
2.31.1





Re: [PATCH v3 16/16] iotests/linters: check mypy files all at once

2021-09-22 Thread John Snow
On Fri, Sep 17, 2021 at 7:23 AM Hanna Reitz  wrote:

> On 16.09.21 06:09, John Snow wrote:
> > We can circumvent the '__main__' redefinition problem by passing
> > --scripts-are-modules. Take mypy out of the loop per-filename and check
> > everything in one go: it's quite a bit faster.
>
> Is it possible to pull this to the beginning of the series?  Just
> because patch 14 has to make everything quite slow (which might be a
> tiny nuisance in bisecting some day?).
>
>
Reasonable. Yes.

--js

>
> > Signed-off-by: John Snow 
> > ---
> >   tests/qemu-iotests/linters.py | 62 ---
> >   1 file changed, 29 insertions(+), 33 deletions(-)
>
> Reviewed-by: Hanna Reitz 
>
>


Re: [PATCH v3 14/16] iotests/linters: Add workaround for mypy bug #9852

2021-09-22 Thread John Snow
On Wed, Sep 22, 2021 at 4:37 PM John Snow  wrote:

>
> On Fri, Sep 17, 2021 at 7:16 AM Hanna Reitz  wrote:
>
>>
>> Question is, when “can we use” mypy >= 0.920?  Should we check the
>> version string and append this switch as required?
>>
>>
> The answer to that question depends on how the block maintainers feel
> about what environments they expect this test to be runnable under. I
> lightly teased kwolf once about an "ancient" version of pylint they were
> running, but felt kind of bad about it in retrospect: the tests I write
> should "just work" for everyone without them needing to really know
> anything about python or setting up or managing their dependencies,
> environments, etc.
>
> (1) We can use it the very moment it is released if you're OK with running
> 'make check-dev' from the python/ directory. That script sets up its own
> virtual environment and manages its own dependencies. If I set a new
> minimum version, it will always use it. (Same story for 'make check-tox',
> or 'make check-pipenv'. The differences between the tests are primarily on
> what other dependencies they have on your environment -- the details are
> boring, see python/Makefile for further reading if desired.)
>
> (2) Otherwise, it depends on how people feel about being able to run this
> test directly from iotests and what versions of mypy/pylint they are using.
> Fedora 33 for instance has 0.782-2.fc33 still, so I can't really "expect"
> people to have a bleeding-edge version of mypy unless they went out of
> their way to install one themselves. (pip3 install --user --upgrade mypy,
> by the way.) Since people are used to running these python scripts
> *outside* of a managed environment (using their OS packages directly), I
> have largely made every effort to support versions as old as I reasonably
> can -- to avoid disruption whenever I possibly can.
>
> So, basically, it kind of depends on if you want to keep 297 or not.
> Keeping it implies some additional cost for the sake of maximizing
> compatibility. If we ditch it, you can let the scripts in ./python do their
> thing and set up their own environments to run tests that should probably
> "just work" for everyone. 297 could even just be updated to a bash script
> that just hopped over to ./python and ran a special avocado command that
> ran /only/ the iotest linters, if you wanted. I just felt that step #1 was
> to change as little as possible, prove the new approach worked, and then
> when folks were comfortable with it drop the old approach.
>
>

Oh, uh, and to answer your more concrete question: Nah, we don't need to
conditionally append this workaround. The speed lost from making the check
incremental is made up for by not invoking the tool 20 times, so it's OK to
just unconditionally add it for now.


Re: [PATCH v3 14/16] iotests/linters: Add workaround for mypy bug #9852

2021-09-22 Thread John Snow
On Fri, Sep 17, 2021 at 7:16 AM Hanna Reitz  wrote:

> On 16.09.21 06:09, John Snow wrote:
> > This one is insidious: if you use the invocation
> > "from {namespace} import {subpackage}" as mirror-top-perms does,
> > mypy will fail on every-other invocation *if* the package being
> > imported is a package.
> >
> > Now, I could just edit mirror-top-perms to avoid this invocation, but
> > since I tripped on a landmine, I might as well head it off at the pass
> > and make sure nobody else trips on the same landmine.
> >
> > It seems to have something to do with the order in which files are
> > checked as well, meaning the random order in which set(os.listdir())
> > produces the list of files to test will cause problems intermittently.
> >
> > mypy >= 0.920 isn't released yet, so add this workaround for now.
> >
> > See also:
> >   https://github.com/python/mypy/issues/11010
> >   https://github.com/python/mypy/issues/9852
> >
> > Signed-off-by: John Snow 
> > ---
> >   tests/qemu-iotests/linters.py | 3 +++
> >   1 file changed, 3 insertions(+)
> >
> > diff --git a/tests/qemu-iotests/linters.py
> b/tests/qemu-iotests/linters.py
> > index 4df062a973..9c97324e87 100755
> > --- a/tests/qemu-iotests/linters.py
> > +++ b/tests/qemu-iotests/linters.py
> > @@ -100,6 +100,9 @@ def run_linters(
> >   '--warn-unused-ignores',
> >   '--no-implicit-reexport',
> >   '--namespace-packages',
> > +# Until we can use mypy >= 0.920, see
> > +# https://github.com/python/mypy/issues/9852
> > +'--no-incremental',
> >   filename,
> >   ),
>
> I’m afraid I still don’t really understand this, but I’m happy with this
> given as the reported workaround and you saying it works.
>
> Reviewed-by: Hanna Reitz 
>
> Question is, when “can we use” mypy >= 0.920?  Should we check the
> version string and append this switch as required?
>
>
The answer to that question depends on how the block maintainers feel about
what environments they expect this test to be runnable under. I lightly
teased kwolf once about an "ancient" version of pylint they were running,
but felt kind of bad about it in retrospect: the tests I write should "just
work" for everyone without them needing to really know anything about
python or setting up or managing their dependencies, environments, etc.

(1) We can use it the very moment it is released if you're OK with running
'make check-dev' from the python/ directory. That script sets up its own
virtual environment and manages its own dependencies. If I set a new
minimum version, it will always use it. (Same story for 'make check-tox',
or 'make check-pipenv'. The differences between the tests are primarily on
what other dependencies they have on your environment -- the details are
boring, see python/Makefile for further reading if desired.)

(2) Otherwise, it depends on how people feel about being able to run this
test directly from iotests and what versions of mypy/pylint they are using.
Fedora 33 for instance has 0.782-2.fc33 still, so I can't really "expect"
people to have a bleeding-edge version of mypy unless they went out of
their way to install one themselves. (pip3 install --user --upgrade mypy,
by the way.) Since people are used to running these python scripts
*outside* of a managed environment (using their OS packages directly), I
have largely made every effort to support versions as old as I reasonably
can -- to avoid disruption whenever I possibly can.

So, basically, it kind of depends on if you want to keep 297 or not.
Keeping it implies some additional cost for the sake of maximizing
compatibility. If we ditch it, you can let the scripts in ./python do their
thing and set up their own environments to run tests that should probably
"just work" for everyone. 297 could even just be updated to a bash script
that just hopped over to ./python and ran a special avocado command that
ran /only/ the iotest linters, if you wanted. I just felt that step #1 was
to change as little as possible, prove the new approach worked, and then
when folks were comfortable with it drop the old approach.


> Hanna
>
>


Re: [PATCH v3 11/16] iotests/297: return error code from run_linters()

2021-09-22 Thread John Snow
On Fri, Sep 17, 2021 at 7:00 AM Hanna Reitz  wrote:

> On 16.09.21 06:09, John Snow wrote:
> > This turns run_linters() into a bit of a hybrid test; returning non-zero
> > on failed execution while also printing diffable information. This is
> > done for the benefit of the avocado simple test runner, which will soon
> > be attempting to execute this test from a different environment.
> >
> > (Note: universal_newlines is added to the pylint invocation for type
> > consistency with the mypy run -- it's not strictly necessary, but it
> > avoids some typing errors caused by our re-use of the 'p' variable.)
> >
> > Signed-off-by: John Snow 
> > Reviewed-by: Vladimir Sementsov-Ogievskiy 
> > ---
> >   tests/qemu-iotests/297 | 10 --
> >   1 file changed, 8 insertions(+), 2 deletions(-)
>
> I don’t think I like this very much.  Returning an integer error code
> seems archaic.
>
> (You can perhaps already see that this is going to be one of these
> reviews of mine where I won’t say anything is really wrong, but where I
> will just lament subjectively missing beauty.)
>
>
Haha. It's fine, Vladimir didn't like the smell of this one either. I just
didn't want to prematurely optimize or overcomplicate.


> As you say, run_linters() to me seems very iotests-specific still: It
> emits a specific output that is compared against a reference output.
> Fine for 297, but not fine for a function provided by a linters.py, I’d
> say.
>
> I’d prefer run_linters() to return something like a Map[str,
> Optional[str]], that would map a tool to its output in case of error,
> i.e. ideally:
>
> `{'pylint': None, 'mypy': None}`
>
>
Splitting the test apart into two sub-tests is completely reasonable.
Python CI right now has individual tests for pylint, mypy, etc.


> 297 could format it something like
>
> ```
> for tool, output in run_linters().items():
>  print(f'=== {tool} ===')
>  if output is not None:
>  print(output)
> ```
>
> Or something.
>
> To check for error, you could put a Python script in python/tests that
> checks `any(output is not None for output in run_linters().values())` or
> something (and on error print the output).
>
>
> Pulling out run_linters() into an external file and having it print
> something to stdout just seems too iotests-centric to me.  I suppose as
> long as the return code is right (which this patch is for) it should
> work for Avocado’s simple tests, too (which I don’t like very much
> either, by the way, because they too seem archaic to me), but, well.  It
> almost seems like the Avocado test should just run ./check then.
>
>
Yeh. Ideally, we'd just have a mypy.ini and a pylintrc that configures the
tests adequately, and then 297 (or whomever else) would just call the
linters which would in turn read the same configuration. This series is
somewhat of a stop-gap to measure the temperature of the room to see how
important it was to leave 297 inside of iotests. So, I did the conservative
thing that's faster to review even if it now looks *slightly* fishy.

As for things being archaic or not ... possibly, but why involve extra
complexity if it isn't warranted? a simple pass/fail works perfectly well.
(And the human can read the output to understand WHY it failed.) If you
want more rigorous analytics for some reason, we can discuss the use cases
and figure out how to represent that information, but that's definitely a
bit beyond scope here.
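For what it's worth, the map-returning variant suggested above could look something like this. This is only a sketch, not the real linters.py code; the tool tuple, signature, and `json.tool` stand-in in the usage note are illustrative assumptions:

```python
import subprocess
import sys
from typing import Dict, Iterable, Optional, Sequence


def run_linters(
    files: Sequence[str],
    tools: Iterable[str] = ('pylint', 'mypy'),
) -> Dict[str, Optional[str]]:
    """Run each linter over 'files'; map a tool's name to its combined
    output when it reports problems, or to None when it exits cleanly."""
    results: Dict[str, Optional[str]] = {}
    for tool in tools:
        proc = subprocess.run(
            [sys.executable, '-m', tool, *files],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            check=False,
        )
        results[tool] = None if proc.returncode == 0 else proc.stdout
    return results
```

A 297-style caller could then print per-tool sections for diffing, while a python/tests caller could simply assert `all(v is None for v in run_linters(files).values())`.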

>
> Come to think of it, to be absolutely blasphemous, why not.  I could say
> all of this seems like quite some work that could be done by a
> python/tests script that does this:
>
> ```
> #!/bin/sh
> set -e
>
> cat >/tmp/qemu-parrot.sh <<EOF
> #!/bin/sh
> echo ': qcow2'
> echo ': qcow2'
> echo 'virtio-blk'
> EOF
>
> cd $QEMU_DIR/tests/qemu-iotests
>
> QEMU_PROG="/tmp/qemu-parrot.sh" \
> QEMU_IMG_PROG=$(which false) \
> QEMU_IO_PROG=$(which false) \
> QEMU_NBD_PROG=$(which false) \
> QSD_PROG=$(which false) \
> ./check 297
> ```
>
> And, no, I don’t want that!  But the point of this series seems to just
> be to rip the core of 297 out so it can run without ./check, because
> ./check requires some environment variables to be set. Doing that seems
> just seems wrong to me.
>
>
Right, the point of this series is effectively to split out the linting
configuration and separate it from the "test" which executes the linters
with that configuration. Simplest possible thing was to just take the
configuration as it exists in its current form -- hardcoded in a python
script -- and isolate it. To my delight, it worked quite well!


> Like, can’t we have a Python script in python/tests that imports
> linters.py, invokes run_linters() and sensibly checks the output? Or,
> you know, at the very least not have run_linters() print anything to
> stdout and not have it return an integer code. linters.py:main() can do
> that conversion.
>
>
Well, I certainly don't want to bother parsing output from python tools and
trying to make sure that it works sensibly 

Re: [PULL 18/28] file-posix: try BLKSECTGET on block devices too, do not round to power of 2

2021-09-22 Thread Halil Pasic
On Mon, 6 Sep 2021 16:24:20 +0200
Halil Pasic  wrote:

> On Fri, 25 Jun 2021 16:18:12 +0200
> Paolo Bonzini  wrote:
> 
> > bs->sg is only true for character devices, but block devices can also
> > be used with scsi-block and scsi-generic.  Unfortunately BLKSECTGET
> > returns bytes in an int for /dev/sgN devices, and sectors in a short
> > for block devices, so account for that in the code.
> > 
> > The maximum transfer also need not be a power of 2 (for example I have
> > seen disks with 1280 KiB maximum transfer) so there's no need to pass
> > the result through pow2floor.
> > 
> > Signed-off-by: Paolo Bonzini   
> 
> We have found that this patch leads to in guest I/O errors when DASD
> is used as a source device. I.e. libvirt domain xml wise something like:
> 
> 
>   
>   
>   
>   
>   
> 
> 
> I don't think it is the fault of this patch: it LGTM. But it correlates
> 100%, furthermore the problem seems to be related to the value of
> bl.max_iov which now comes from sysfs. 
> 
> We are still investigating what is actually wrong. Just wanted to give
> everybody a heads-up that this does seem to cause a nasty regression on
> s390x, even if the code itself is perfect.
> 

We have figured out what is going on here. The problem seems to be
specific to linux aio, which seems to limit the size of the iovec to
1024 (UIO_MAXIOV).

Because of this requests get rejected with -EINVAL by the io_submit()
syscall. Here comes a real world example:

(gdb) p *laiocb
$5 = {co = 0x3ff700072c0, ctx = 0x3fe60002650, iocb = {data = 0x0, aio_rw_flags = 0, key = 0,
    aio_lio_opcode = 8, aio_reqprio = 0, aio_fildes = 38, u = {c = {buf = 0x3ff70055bc0,
    nbytes = 1274, offset = 19977953280, __pad3 = 0, flags = 1, resfd = 43}, v = {
    vec = 0x3ff70055bc0, nr = 0, offset = 19977953280}, poll = {__pad1 = 1023,
    events = 1879399360}, saddr = {addr = 0x3ff70055bc0, len = 0}}}, ret = -22,
  nbytes = 20586496, qiov = 0x3ff700382f8, is_read = false, next = {sqe_next = 0x0}}

On the host kernel side, the -EINVAL comes from this line:
https://elixir.bootlin.com/linux/v5.15-rc2/source/lib/iov_iter.c#L1863
in iovec_from_user() roughly via: __do_sys_io_submit()->
io_submit_one() -> aio_write() -> aio_setup_rw() -> __import_iovec().

Based on the offline discussion with the DASD maintainers, and on the
linux in source tree documentation (Documentation/block/queue-sysfs.rst
and Documentation/block/biodoc.rst), I believe that the DASD driver is
not wrong when advertising the value 65535 for queue/max_segments.

I believe QEMU jumps to the wrong conclusion in blk_get_max_iov() or
in virtio_blk_submit_multireq(), I can't really tell because I'm not
sure what the semantic of blk_get_max_iov() is. But if, I had to, I would
bet that blk_get_max_iov() returns the wrong value, when linux aio is
used. I'm not sure what is the exact relationship between max_segments
and max_iov.

One idea on how to fix this would be, to cap max_iov to UIO_MAXIOV
(unconditionally, or when linux aio is used). But I have to admit, I
don't have clarity. I couldn't even find documentation that states
this limitation of linux aio (I didn't look for it too hard though), so
I can not even be sure this is a QEMU problem.
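If the cap idea pans out, the logic would amount to something like the following. This is Python pseudocode purely for illustration: the real change would live in QEMU's C block layer, and whether to apply the cap unconditionally or only for linux-aio is exactly the open question above:

```python
# Linux limit on iovec entries accepted per io_submit() request,
# per the -EINVAL behaviour described in the report above.
UIO_MAXIOV = 1024


def effective_max_iov(max_segments: int, using_linux_aio: bool) -> int:
    """Sketch: clamp the advertised iovec count to what linux-aio accepts."""
    if using_linux_aio:
        return min(max_segments, UIO_MAXIOV)
    return max_segments


# A DASD advertising queue/max_segments = 65535 would be clamped:
assert effective_max_iov(65535, using_linux_aio=True) == 1024
assert effective_max_iov(65535, using_linux_aio=False) == 65535
```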

That is why I decided to write this up first, and start a discussion on
who is playing foul with the most relevant people posted. I intend to try
myself fixing the problem once my understanding of it reaches a
sufficient level.

Thanks in advance for your contribution!

Regards,
Halil



Re: [PATCH v3 07/16] iotests/297: Don't rely on distro-specific linter binaries

2021-09-22 Thread John Snow
(This email just explains python packaging stuff. No action items in here.
Skim away.)

On Fri, Sep 17, 2021 at 5:43 AM Hanna Reitz  wrote:

> On 16.09.21 06:09, John Snow wrote:
> > 'pylint-3' is another Fedora-ism. Use "python3 -m pylint" or "python3 -m
> > mypy" to access these scripts instead. This style of invocation will
> > prefer the "correct" tool when run in a virtual environment.
> >
> > Note that we still check for "pylint-3" before the test begins -- this
> > check is now "overly strict", but shouldn't cause anything that was
> > already running correctly to start failing.
> >
> > Signed-off-by: John Snow 
> > Reviewed-by: Vladimir Sementsov-Ogievskiy 
> > Reviewed-by: Philippe Mathieu-Daudé 
> > ---
> >   tests/qemu-iotests/297 | 45 --
> >   1 file changed, 26 insertions(+), 19 deletions(-)
>
> I know it sounds silly, but to be honest I have no idea if replacing
> `mypy` by `python3 -m mypy` is correct, because no mypy documentation
> seems to suggest it.
>
>
Right, I don't think it's necessarily documented that you can do this. It
just happens to be a very convenient way to invoke the same script without
needing to know *where* mypy is. You let python figure out where it's going
to import mypy from, and it handles the rest.

(This also makes it easier to use things like mypy, pylint etc with an
explicitly specified PYTHON interpreter. I don't happen to do that in this
patch, but ... we could.)
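Concretely, the pattern is just the following (a sketch; the stdlib 'platform' module stands in here for any module-invokable tool like mypy or pylint, so the example runs without either installed):

```python
import subprocess
import sys


def run_module(module, *args):
    """Invoke a tool the same way 'python3 -m <module>' would, using
    whichever interpreter is currently running."""
    return subprocess.run(
        [sys.executable, '-m', module, *args],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    )


# e.g. run_module('mypy', '--version') when mypy is importable;
# 'platform' keeps the sketch self-contained.
proc = run_module('platform')
```

Because `sys.executable` is used, the same invocation transparently picks up whichever mypy/pylint is installed in the active virtual environment.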

>
>  From what I understand, that’s generally how Python “binaries” work,
> though (i.e., installed as a module invokable with `python -m`, and then
> providing some stub binary that, well, effectively does this, but kind
> of in a weird way, I just don’t understand it), and none of the
> parameters seem to be hurt in this conversion, so:
>
>
Right. Technically, any python package can ask for any number of
executables to be installed, but the setuptools packaging ecosystem
provides a way to "generate" these based on package configuration. I use a
few in our own Python packages. If you look in python/setup.cfg, you'll see
stuff like this:

[options.entry_points]
console_scripts =
qom = qemu.qmp.qom:main
qom-set = qemu.qmp.qom:QOMSet.entry_point
qom-get = qemu.qmp.qom:QOMGet.entry_point
qom-list = qemu.qmp.qom:QOMList.entry_point
qom-tree = qemu.qmp.qom:QOMTree.entry_point
qom-fuse = qemu.qmp.qom_fuse:QOMFuse.entry_point [fuse]
qemu-ga-client = qemu.qmp.qemu_ga_client:main
qmp-shell = qemu.qmp.qmp_shell:main

These entries cause those weird little binary wrapper scripts to be
generated for you when the package is *installed*. So our packages will put
'qmp-shell' and 'qom-tree' into your $PATH*. The stuff to the right of the
equals sign is just a pointer to a function that can be executed that
implements the CLI command. qemu.qmp.qmp_shell points to the module to
import, and ':main' points to the function to run.
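The generated shim is tiny; stripped of packaging details, it boils down to resolving that 'module:function' spec and calling it. In this sketch, 'json.tool:main' is a stdlib stand-in for entries like 'qemu.qmp.qmp_shell:main':

```python
from importlib import import_module


def resolve_entry_point(spec: str):
    """Turn a 'package.module:func' spec into the callable it names,
    roughly what a generated console-script wrapper does at startup."""
    module_name, _, attr = spec.partition(':')
    return getattr(import_module(module_name), attr)


# Stand-in target; a real setup.cfg entry would name e.g. 'qemu.qmp.qom:main'.
main = resolve_entry_point('json.tool:main')
```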

The last bit of this is that many, though not all (and there's zero
requirement this has to be true), python packages that implement CLI
commands will often have a stanza in their __init__.py module that says
something like this:

if __name__ == '__main__':
do_the_command_line_stuff()

Alternatively, a package can include a literal __main__.py file that python
knows to check for, and this module is the one that gets executed for
`python3 -m mypackage` invocations. This is what mypy does.

Those are the magical blurbs that allow you to execute a module as if it
were a script by running "python3 -m mymodule" -- that hooks directly into
that little execution stanza. For python code distributed as packages,
that's the real reason to have that little magic stanza -- it provides a
convenient way to run stuff without needing to write the incredibly more
tedious:

python3 -c "from mypy.__main__ import console_entry; console_entry();"

... which is quite a bit more porcelain too, depending on how they
re/factor the code inside of the package.

Seeing as how mypy explicitly includes a __main__.py file:
https://github.com/python/mypy/blob/master/mypy/__main__.py, I am taking it
as a given that they are fully aware of invoking mypy in this fashion, and
believe it safe to rely on.
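The mechanism is easy to demonstrate end to end with a throwaway package (a self-contained sketch; the 'hello_pkg' name and its printed message are made up here):

```python
import os
import subprocess
import sys
import tempfile

# Build a minimal package containing a __main__.py, then run it with
# 'python -m', exactly the hook mypy relies on.
with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, 'hello_pkg')
    os.mkdir(pkg)
    open(os.path.join(pkg, '__init__.py'), 'w').close()
    with open(os.path.join(pkg, '__main__.py'), 'w') as f:
        f.write("print('running hello_pkg as a module')\n")
    env = dict(os.environ, PYTHONPATH=tmp)
    proc = subprocess.run(
        [sys.executable, '-m', 'hello_pkg'],
        stdout=subprocess.PIPE, universal_newlines=True, env=env,
    )
```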

There will be a quiz later.
(There will not be a quiz.)

Reviewed-by: Hanna Reitz 
>

Thanks!

--js

*If your $PATH is configured to include, say, ~/.local/bin/, that is. On
Fedora, this is where executable shims for python get deposited when you do
"pip3 install --user somepackage". Python virtual environments will
configure your $PATH to include packages installed in that venv, etc etc
etc. Not every distro is configured such that pip packages are on the PATH,
so you can't rely on them being there when writing other scripts, usually.
Invoking the python interpreter directly seems far more flexible.


Re: [PULL 00/12] jobs: mirror: Handle errors after READY cancel

2021-09-22 Thread Vladimir Sementsov-Ogievskiy

22.09.2021 19:05, Richard Henderson wrote:

On 9/21/21 3:20 AM, Vladimir Sementsov-Ogievskiy wrote:

The following changes since commit 326ff8dd09556fc2e257196c49f35009700794ac:

   Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into 
staging (2021-09-20 16:17:05 +0100)

are available in the Git repository at:

   https://src.openvz.org/scm/~vsementsov/qemu.git  tags/pull-jobs-2021-09-21

for you to fetch changes up to c9489c04319cac75c76af8fc27c254f46e10214c:

   iotests: Add mirror-ready-cancel-error test (2021-09-21 11:56:11 +0300)


mirror: Handle errors after READY cancel


Hanna Reitz (12):
   job: Context changes in job_completed_txn_abort()
   mirror: Keep s->synced on error
   mirror: Drop s->synced
   job: Force-cancel jobs in a failed transaction
   job: @force parameter for job_cancel_sync()
   jobs: Give Job.force_cancel more meaning
   job: Add job_cancel_requested()
   mirror: Use job_is_cancelled()
   mirror: Check job_is_cancelled() earlier
   mirror: Stop active mirroring after force-cancel
   mirror: Do not clear .cancelled
   iotests: Add mirror-ready-cancel-error test


This fails testing with errors like so:

Running test test-replication
test-replication: ../job.c:186: job_state_transition: Assertion 
`JobSTT[s0][s1]' failed.
ERROR test-replication - too few tests run (expected 13, got 8)
make: *** [Makefile.mtest:816: run-test-100] Error 1
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 1

https://gitlab.com/qemu-project/qemu/-/pipelines/375324015/failures




Interesting :(

I've reproduced, starting test-replication in several parallel loops. (it 
doesn't reproduce for me if just start in one loop). So, that's some racy bug..

Hmm, and seems it doesn't reproduce so simple on master. I'll try to bisect the 
series tomorrow.



(gdb) bt
#0  0x7f034a3d09d5 in raise () from /lib64/libc.so.6
#1  0x7f034a3b9954 in abort () from /lib64/libc.so.6
#2  0x7f034a3b9789 in __assert_fail_base.cold () from /lib64/libc.so.6
#3  0x7f034a3c9026 in __assert_fail () from /lib64/libc.so.6
#4  0x55d3b503d670 in job_state_transition (job=0x55d3b5e67020, s1=JOB_STATUS_CONCLUDED) at ../job.c:186
#5  0x55d3b503e7c2 in job_conclude (job=0x55d3b5e67020) at ../job.c:652
#6  0x55d3b503eaa1 in job_finalize_single (job=0x55d3b5e67020) at ../job.c:722
#7  0x55d3b503ecd1 in job_completed_txn_abort (job=0x55d3b5e67020) at ../job.c:801
#8  0x55d3b503f2ea in job_cancel (job=0x55d3b5e67020, force=false) at ../job.c:973
#9  0x55d3b503f360 in job_cancel_err (job=0x55d3b5e67020, errp=0x7fffcc997a80) at ../job.c:992
#10 0x55d3b503f576 in job_finish_sync (job=0x55d3b5e67020, finish=0x55d3b503f33f , errp=0x0) at ../job.c:1054
#11 0x55d3b503f3d0 in job_cancel_sync (job=0x55d3b5e67020, force=false) at ../job.c:1008
#12 0x55d3b4ff14a3 in replication_close (bs=0x55d3b5e6ef80) at ../block/replication.c:152
#13 0x55d3b50277fc in bdrv_close (bs=0x55d3b5e6ef80) at ../block.c:4677
#14 0x55d3b50286cf in bdrv_delete (bs=0x55d3b5e6ef80) at ../block.c:5100
#15 0x55d3b502ae3a in bdrv_unref (bs=0x55d3b5e6ef80) at ../block.c:6495
#16 0x55d3b5023a38 in bdrv_root_unref_child (child=0x55d3b5e4c690) at ../block.c:3010
#17 0x55d3b5047998 in blk_remove_bs (blk=0x55d3b5e73b40) at ../block/block-backend.c:845
#18 0x55d3b5046e38 in blk_delete (blk=0x55d3b5e73b40) at ../block/block-backend.c:461
#19 0x55d3b50470dc in blk_unref (blk=0x55d3b5e73b40) at ../block/block-backend.c:516
#20 0x55d3b4fdb20a in teardown_secondary () at ../tests/unit/test-replication.c:367
#21 0x55d3b4fdb632 in test_secondary_continuous_replication () at ../tests/unit/test-replication.c:504
#22 0x7f034b26979e in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#23 0x7f034b26959b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#24 0x7f034b26959b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#25 0x7f034b269c8a in g_test_run_suite () from /lib64/libglib-2.0.so.0
#26 0x7f034b269ca5 in g_test_run () from /lib64/libglib-2.0.so.0
#27 0x55d3b4fdb9c0 in main (argc=1, argv=0x7fffcc998138) at ../tests/unit/test-replication.c:613
(gdb) fr 4
#4  0x55d3b503d670 in job_state_transition (job=0x55d3b5e67020, s1=JOB_STATUS_CONCLUDED) at ../job.c:186
186 assert(JobSTT[s0][s1]);
(gdb) list
181 JobStatus s0 = job->status;
182 assert(s1 >= 0 && s1 < JOB_STATUS__MAX);
183 trace_job_state_transition(job, job->ret,
184 JobSTT[s0][s1] ? "allowed" : "disallowed",
185 JobStatus_str(s0), JobStatus_str(s1));
186 assert(JobSTT[s0][s1]);
187 job->status = s1;
188
189 if 

Re: [PATCH v3 03/16] iotests/migrate-bitmaps-postcopy-test: declare instance variables

2021-09-22 Thread John Snow
On Fri, Sep 17, 2021 at 4:37 AM Hanna Reitz  wrote:

> On 16.09.21 06:09, John Snow wrote:
> > Signed-off-by: John Snow 
> > ---
> >   tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test | 3 +++
> >   1 file changed, 3 insertions(+)
> >
> > diff --git a/tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test
> b/tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test
> > index 00ebb5c251..9c00ae61c8 100755
> > --- a/tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test
> > +++ b/tests/qemu-iotests/tests/migrate-bitmaps-postcopy-test
> > @@ -115,6 +115,9 @@ class
> TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
> >   self.vm_a_events = []
> >   self.vm_b_events = []
> >
> > +self.discards1_sha256: str
> > +self.all_discards_sha256: str
> > +
> >   def start_postcopy(self):
> >   """ Run migration until RESUME event on target. Return this
> event. """
> >   for i in range(nb_bitmaps):
>
> Isn’t this obsolete after e2ad17a62d9da45fbcddc3faa3d6ced519a9453a?
>
> Hanna
>
>
Yes, my apologies.

--js


Re: [PATCH v3 06/16] iotests/297: Add get_files() function

2021-09-22 Thread John Snow
On Fri, Sep 17, 2021 at 5:24 AM Hanna Reitz  wrote:

> On 16.09.21 06:09, John Snow wrote:
> > Split out file discovery into its own method to begin separating out the
> > "environment setup" and "test execution" phases.
> >
> > Signed-off-by: John Snow 
> > ---
> >   tests/qemu-iotests/297 | 12 +---
> >   1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/tests/qemu-iotests/297 b/tests/qemu-iotests/297
> > index 3e86f788fc..b3a56a3a29 100755
> > --- a/tests/qemu-iotests/297
> > +++ b/tests/qemu-iotests/297
> > @@ -21,6 +21,7 @@ import re
> >   import shutil
> >   import subprocess
> >   import sys
> > +from typing import List
> >
> >   import iotests
> >
> > @@ -56,10 +57,15 @@ def is_python_file(filename: str, directory: str =
> '.') -> bool:
> >   return False
> >
> >
> > +def get_test_files(directory: str = '.') -> List[str]:
> > +named_test_dir = os.path.join(directory, 'tests')
> > +named_tests = [f"tests/{entry}" for entry in
> os.listdir(named_test_dir)]
> > +check_tests = set(os.listdir(directory) + named_tests) -
> set(SKIP_FILES)
> > +return list(filter(lambda f: is_python_file(f, directory),
> check_tests))
>
> Seeing a filter() makes me immensely happy, but I thought that was
> unpythonic?
>
>
Eh, depends on what you're doing, I guess?

The alternative spelling is:
list(f for f in check_tests if is_python_file(f, directory))

... which I guess *is* shorter and skips the lambda. but, I suppose I have
some mild revulsion to seeing "f for f in ..." constructs.
If I ever tell you not to use a filter, feel free to cite this patch and
then just tell me to shut up. I apologize for any inconsistencies in my
style, real or imagined.
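For the record, the two spellings really are interchangeable (a toy sketch; `is_python_file` here is a stand-in predicate, not the real one from 297, which also inspects shebang lines):

```python
def is_python_file(name: str) -> bool:
    # Stand-in for the real check: .py suffix, or extensionless test names.
    return name.endswith('.py') or '.' not in name


check_tests = {'297', 'tests/migrate-bitmaps-test', 'iotests.py', 'README.rst'}

via_filter = sorted(filter(is_python_file, check_tests))
via_genexpr = sorted(f for f in check_tests if is_python_file(f))
assert via_filter == via_genexpr
```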

--js

Reviewed-by: Hanna Reitz 
>
>


Re: [PULL 00/12] jobs: mirror: Handle errors after READY cancel

2021-09-22 Thread Richard Henderson

On 9/21/21 3:20 AM, Vladimir Sementsov-Ogievskiy wrote:

The following changes since commit 326ff8dd09556fc2e257196c49f35009700794ac:

   Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into 
staging (2021-09-20 16:17:05 +0100)

are available in the Git repository at:

   https://src.openvz.org/scm/~vsementsov/qemu.git  tags/pull-jobs-2021-09-21

for you to fetch changes up to c9489c04319cac75c76af8fc27c254f46e10214c:

   iotests: Add mirror-ready-cancel-error test (2021-09-21 11:56:11 +0300)


mirror: Handle errors after READY cancel


Hanna Reitz (12):
   job: Context changes in job_completed_txn_abort()
   mirror: Keep s->synced on error
   mirror: Drop s->synced
   job: Force-cancel jobs in a failed transaction
   job: @force parameter for job_cancel_sync()
   jobs: Give Job.force_cancel more meaning
   job: Add job_cancel_requested()
   mirror: Use job_is_cancelled()
   mirror: Check job_is_cancelled() earlier
   mirror: Stop active mirroring after force-cancel
   mirror: Do not clear .cancelled
   iotests: Add mirror-ready-cancel-error test


This fails testing with errors like so:

Running test test-replication
test-replication: ../job.c:186: job_state_transition: Assertion 
`JobSTT[s0][s1]' failed.
ERROR test-replication - too few tests run (expected 13, got 8)
make: *** [Makefile.mtest:816: run-test-100] Error 1
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 1

https://gitlab.com/qemu-project/qemu/-/pipelines/375324015/failures


r~



Re: [PATCH v6 0/5] block/nbd: drop connection_co

2021-09-22 Thread Vladimir Sementsov-Ogievskiy

22.09.2021 16:12, Eric Blake wrote:

On Wed, Sep 22, 2021 at 10:48:58AM +0300, Vladimir Sementsov-Ogievskiy wrote:

ping)

Only one patch is not reviewed. But the biggest one :(


Yeah, and it's on my list for this week.  I'm overdue for a pull
request, so I want to get this series in if at all possible, which
means I'll get on that (large) review soon.  Meanwhile, the ping is
appreciated!



Thanks!

--
Best regards,
Vladimir



Re: [PATCH v6 00/11] 64bit block-layer: part II

2021-09-22 Thread Eric Blake
On Wed, Sep 22, 2021 at 10:52:20AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Ping)
> 
> Not reviewed: 6,7,10

Also on my list to see what I can get in this week's NBD pull request.

> 
> 03.09.2021 13:27, Vladimir Sementsov-Ogievskiy wrote:
> > Hi all!
> > 
> > Sorry for a long delay:(  Finally, here is v6.
> > 

> > Vladimir Sementsov-Ogievskiy (11):
> >block/io: bring request check to bdrv_co_(read,write)v_vmstate
> >qcow2: check request on vmstate save/load path
> >block: use int64_t instead of uint64_t in driver read handlers
> >block: use int64_t instead of uint64_t in driver write handlers
> >block: use int64_t instead of uint64_t in copy_range driver handlers
> >block: make BlockLimits::max_pwrite_zeroes 64bit
> >block: use int64_t instead of int in driver write_zeroes handlers
> >block/io: allow 64bit write-zeroes requests
> >block: make BlockLimits::max_pdiscard 64bit
> >block: use int64_t instead of int in driver discard handlers
> >block/io: allow 64bit discard requests
> 
> 
> -- 
> Best regards,
> Vladimir
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v6 0/5] block/nbd: drop connection_co

2021-09-22 Thread Eric Blake
On Wed, Sep 22, 2021 at 10:48:58AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> ping)
> 
> Only one patch is not reviewed. But the biggest one :(

Yeah, and it's on my list for this week.  I'm overdue for a pull
request, so I want to get this series in if at all possible, which
means I'll get on that (large) review soon.  Meanwhile, the ping is
appreciated!

> 
> 02.09.2021 13:38, Vladimir Sementsov-Ogievskiy wrote:
> > v6:
> > 01,02: add Eric's r-b
> > 03: make new interface clearer
> > 04,05: rebased on updated 03
> > 
> > Vladimir Sementsov-Ogievskiy (5):
> >block/nbd: nbd_channel_error() shutdown channel unconditionally
> >block/nbd: move nbd_recv_coroutines_wake_all() up
> >block/nbd: refactor nbd_recv_coroutines_wake_all()
> >block/nbd: drop connection_co
> >block/nbd: check that received handle is valid
> > 
> >   block/nbd.c  | 412 +++
> >   nbd/client.c |   2 -
> >   2 files changed, 121 insertions(+), 293 deletions(-)
> > 
> 
> 
> -- 
> Best regards,
> Vladimir
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v7 00/11] qcow2: fix parallel rewrite and discard (reqlist)

2021-09-22 Thread Vladimir Sementsov-Ogievskiy

Ping.

Hi Kevin!

Could you look at description in cover-letter and 07-09, and say do you like the approach 
in general? Then I can resend with reqlist included and without "Based-on".

Or do you still prefer the solution with RWLock?

I don't like RWLock-based blocking solution, because:

1. Mirror does discards in parallel with other requests, so blocking discard 
may decrease mirror performance
2. Backup (in push-backup-with-fleecing scheme) will do discards on temporary 
image, again blocking discard is bad in this case
3. block-commit and block-stream will at some moment do discard-after-copy to 
not waste disk space
4. It doesn't seem possible to pass ownership on RWLock to task coroutine (as 
we can't lock it in one coroutine and release in another)


04.09.2021 19:24, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

This is a new (third :)) variant of solving the problem in qcow2 that we
lack synchronisation between reallocating host clusters with refcount=0 and
guest reads/writes that may still be in progress on those clusters.

Previous version was
"[PATCH v6 00/12] qcow2: fix parallel rewrite and discard (lockless)"
Supersedes: <20210422163046.442932-1-vsement...@virtuozzo.com>

Key features of new solution:

1. Simpler data structure: a requests list instead of the "dynamic refcount"
concept. What is good is that the request list data structure is
implemented and used in my other series, so we can improve and reuse
it.

2. Don't care about discards: my previous attempts were complicated by
trying to postpone discards when we have in-flight operations on a
cluster whose refcount becomes 0. Don't bother with that anymore.
There is nothing bad in a discard: it is similar to the guest doing
simultaneous writes into the same cluster, where we don't care to
serialize the writes. So keep discards as they are.

So now the whole fix may be described in one sentence: don't consider a
host cluster "free" (for allocation) if there are still some in-flight
guest operations on it.
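The one-sentence fix above can be sketched in a few lines. This is an illustrative model only: the names `ReqList` and `is_cluster_free` are hypothetical stand-ins for the actual qcow2/reqlist code, not the real API.

```python
class ReqList:
    """Track in-flight guest requests as (offset, length) byte ranges."""

    def __init__(self):
        self.reqs = []

    def add(self, offset, length):
        req = (offset, length)
        self.reqs.append(req)
        return req

    def remove(self, req):
        self.reqs.remove(req)

    def overlaps(self, offset, length):
        # True if any in-flight request intersects [offset, offset+length)
        return any(o < offset + length and offset < o + l
                   for o, l in self.reqs)


def is_cluster_free(refcount, reqlist, cluster_start, cluster_size):
    # The key idea: refcount == 0 alone is not enough to reallocate a
    # host cluster; no guest request may still be in flight on it.
    return refcount == 0 and not reqlist.overlaps(cluster_start, cluster_size)
```

With this model, a cluster whose refcount just dropped to 0 stays unavailable for allocation until the last overlapping guest request is removed from the list.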

What patches are:

1-2: simple additions to reqlist (see also *)

3: test. It's unchanged since previous solution

4-5: simpler refactorings

6-7: implement guest requests tracking in qcow2

8: refactoring for [9]
9: BUG FIX: use request tracking to avoid reallocating clusters under
guest operation

10-11: improvement of request tracking to avoid the extra small critical
sections that were added in [7]

[*]:
For this series to work we also need
  "[PATCH v2 06/19] block: intoduce reqlist":
Based-on: <20210827181808.311670-7-vsement...@virtuozzo.com>

Still, note that we use only the simple core of reqlist and don't use its
coroutine-related features. That probably means I should split a kind of
BlockReqListBase data structure out of BlockReqList, and then have a
separate

CoBlockReqList for the "[PATCH v2 00/19] Make image fleecing more usable"
series and a Qcow2ReqList for this series.

But that's a side question.

Vladimir Sementsov-Ogievskiy (11):
   block/reqlist: drop extra assertion
   block/reqlist: add reqlist_new_req() and reqlist_free_req()
   iotests: add qcow2-discard-during-rewrite
   qcow2: introduce qcow2_parse_compressed_cluster_descriptor()
   qcow2: refactor qcow2_co_preadv_task() to have one return
   qcow2: prepare for tracking guest io requests in data_file
   qcow2: track guest io requests in data_file
   qcow2: introduce is_cluster_free() helper
   qcow2: don't reallocate host clusters under guest operation
   block/reqlist: implement reqlist_mark_req_invalid()
   qcow2: use reqlist_mark_req_invalid()

  block/qcow2.h |  15 ++-
  include/block/reqlist.h   |  33 ++
  block/qcow2-cluster.c |  45 ++-
  block/qcow2-refcount.c|  34 +-
  block/qcow2.c | 112 +-
  block/reqlist.c   |  25 +++-
  .../tests/qcow2-discard-during-rewrite|  72 +++
  .../tests/qcow2-discard-during-rewrite.out|  21 
  8 files changed, 310 insertions(+), 47 deletions(-)
  create mode 100755 tests/qemu-iotests/tests/qcow2-discard-during-rewrite
  create mode 100644 tests/qemu-iotests/tests/qcow2-discard-during-rewrite.out




--
Best regards,
Vladimir



Re: [PATCH v6 00/11] 64bit block-layer: part II

2021-09-22 Thread Vladimir Sementsov-Ogievskiy

Ping)

Not reviewed: 6,7,10

03.09.2021 13:27, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

Sorry for a long delay:(  Finally, here is v6.

v6: rebase on new rbd handlers and backup-top renamed to copy-before-write. 
Also:

01: add Eric's r-b
 tweak commit msg to not mention sheepdog
02: add Eric's r-b
03: tweak commit msg
 drop extra type conversion in raw_co_pwrite_zeroes

04: vvfat: no write_target_commit() anymore
 nvme: fix over-80 line
 type conversion in raw_co_pwrite_zeroes() now dropped in 03

05: tweak commit msg
 add Eric's r-b

06: update comment

07: tweak commit msg, grammar and typos in comments
 don't add extra empty line
 handle also qemu_rbd_co_pwrite_zeroes

08: tweak commit msg, add Eric's r-b
09: tweak commit msg, add Eric's r-b
10: add rbd, drop sheepdog
 add assertion to iscsi_co_pdiscard()
11: tweak commit msg, add Eric's r-b

Vladimir Sementsov-Ogievskiy (11):
   block/io: bring request check to bdrv_co_(read,write)v_vmstate
   qcow2: check request on vmstate save/load path
   block: use int64_t instead of uint64_t in driver read handlers
   block: use int64_t instead of uint64_t in driver write handlers
   block: use int64_t instead of uint64_t in copy_range driver handlers
   block: make BlockLimits::max_pwrite_zeroes 64bit
   block: use int64_t instead of int in driver write_zeroes handlers
   block/io: allow 64bit write-zeroes requests
   block: make BlockLimits::max_pdiscard 64bit
   block: use int64_t instead of int in driver discard handlers
   block/io: allow 64bit discard requests



--
Best regards,
Vladimir



Re: [PATCH v2] block: drop BLK_PERM_GRAPH_MOD

2021-09-22 Thread Vladimir Sementsov-Ogievskiy

Ping)

Patch is reviewed.

02.09.2021 12:37, Vladimir Sementsov-Ogievskiy wrote:

First, this permission never protected a node from being changed, as
generic child-replacing functions don't check it.

Second, it's a strange thing: it represents the permission of a parent node
to change its child. But generally, children are replaced by other
mechanisms, like jobs or QMP commands, not by nodes.

The graph-mod permission is hard to understand. All other permissions
describe operations which are done by the parent node on its child: read,
write, resize. Graph modification operations are something completely
different.

The only place where BLK_PERM_GRAPH_MOD is used as a "perm" (not a shared
perm) is mirror_start_job, for s->target. Still, modern code should use
bdrv_freeze_backing_chain() to protect against graph modification; if we
don't do it somewhere, that may be considered a bug. So it's a bit risky
to drop GRAPH_MOD, and analyzing the possible loss of protection is hard.
But one day we should do it, so let's do it now.

One more bit of information is that locking the corresponding byte in
file-posix doesn't make sense at all.

Signed-off-by: Vladimir Sementsov-Ogievskiy



--
Best regards,
Vladimir



Re: [PATCH v6 0/5] block/nbd: drop connection_co

2021-09-22 Thread Vladimir Sementsov-Ogievskiy

ping)

Only one patch is not reviewed. But the biggest one :(

02.09.2021 13:38, Vladimir Sementsov-Ogievskiy wrote:

v6:
01,02: add Eric's r-b
03: make new interface clearer
04,05: rebased on updated 03

Vladimir Sementsov-Ogievskiy (5):
   block/nbd: nbd_channel_error() shutdown channel unconditionally
   block/nbd: move nbd_recv_coroutines_wake_all() up
   block/nbd: refactor nbd_recv_coroutines_wake_all()
   block/nbd: drop connection_co
   block/nbd: check that received handle is valid

  block/nbd.c  | 412 +++
  nbd/client.c |   2 -
  2 files changed, 121 insertions(+), 293 deletions(-)




--
Best regards,
Vladimir



Re: [PATCH v2] iotests/check: move long options to double dash

2021-09-22 Thread Vladimir Sementsov-Ogievskiy

ping.

Patch is reviewed)

03.09.2021 15:00, Vladimir Sementsov-Ogievskiy wrote:

So, the change:

   -makecheck -> --makecheck
   -gdb   -> --gdb
   -valgrind  -> --valgrind
   -misalign  -> --misalign
   -nocache   -> --nocache
   -qcow2 (and other formats) -> --qcow2
   -file (and other protocols) -> --file

Motivation:

1. The check script uses ArgumentParser to parse options, which supports
combining of short options. So using one dash for long options is a
bit ambiguous.

2. Several long options already have a double dash:
   --dry-run, --color, --groups, --exclude-groups, --start-from

3. -misalign is used to add the --misalign option (two dashes) to qemu-io

4. qemu-io and qemu-nbd have a --nocache option (two dashes)
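The ambiguity in point 1 can be seen with a short argparse sketch (option names are illustrative, not the real check script): argparse lets single-character flags combine, so `-vd` means `-v -d`, which is why a multi-character single-dash option like `-valgrind` reads confusingly, while `--valgrind` is unambiguous.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-v', action='store_true')       # short flags...
parser.add_argument('-d', action='store_true')       # ...may be combined
parser.add_argument('--valgrind', action='store_true')  # long option, double dash

# "-vd" is parsed as the two short flags -v and -d; a hypothetical
# single-dash "-valgrind" would look like a run of combined short flags.
args = parser.parse_args(['-vd', '--valgrind'])
```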

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

v2: cover more things, update also format and protocol options

  docs/devel/testing.rst   | 12 ++--
  .gitlab-ci.d/buildtest.yml   |  4 ++--
  tests/check-block.sh |  2 +-
  tests/qemu-iotests/README|  7 ---
  tests/qemu-iotests/check | 14 +++---
  tests/qemu-iotests/common.rc |  4 ++--
  6 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 4a0abbf23d..907b18a600 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -153,16 +153,16 @@ with arguments:
  .. code::
  
# test with qcow2 format

-  ./check -qcow2
+  ./check --qcow2
# or test a different protocol
-  ./check -nbd
+  ./check --nbd
  
  It's also possible to list test numbers explicitly:
  
  .. code::
  
# run selected cases with qcow2 format

-  ./check -qcow2 001 030 153
+  ./check --qcow2 001 030 153
  
  Cache mode can be selected with the "-c" option, which may help reveal bugs

  that are specific to certain cache mode.
@@ -229,7 +229,7 @@ Debugging a test case
  The following options to the ``check`` script can be useful when debugging
  a failing test:
  
-* ``-gdb`` wraps every QEMU invocation in a ``gdbserver``, which waits for a

+* ``--gdb`` wraps every QEMU invocation in a ``gdbserver``, which waits for a
connection from a gdb client.  The options given to ``gdbserver`` (e.g. the
address on which to listen for connections) are taken from the 
``$GDB_OPTIONS``
environment variable.  By default (if ``$GDB_OPTIONS`` is empty), it 
listens on
@@ -237,10 +237,10 @@ a failing test:
It is possible to connect to it for example with
``gdb -iex "target remote $addr"``, where ``$addr`` is the address
``gdbserver`` listens on.
-  If the ``-gdb`` option is not used, ``$GDB_OPTIONS`` is ignored,
+  If the ``--gdb`` option is not used, ``$GDB_OPTIONS`` is ignored,
regardless of whether it is set or not.
  
-* ``-valgrind`` attaches a valgrind instance to QEMU. If it detects

+* ``--valgrind`` attaches a valgrind instance to QEMU. If it detects
warnings, it will print and save the log in
``$TEST_DIR/.valgrind``.
The final command line will be ``valgrind --log-file=$TEST_DIR/
diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index e74998efb9..139c27554d 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -303,10 +303,10 @@ build-tcg-disabled:
  - make check-unit
  - make check-qapi-schema
  - cd tests/qemu-iotests/
-- ./check -raw 001 002 003 004 005 008 009 010 011 012 021 025 032 033 048
+- ./check --raw 001 002 003 004 005 008 009 010 011 012 021 025 032 033 048
  052 063 077 086 101 104 106 113 148 150 151 152 157 159 160 163
  170 171 183 184 192 194 208 221 226 227 236 253 277 image-fleecing
-- ./check -qcow2 028 051 056 057 058 065 068 082 085 091 095 096 102 122
+- ./check --qcow2 028 051 056 057 058 065 068 082 085 091 095 096 102 122
  124 132 139 142 144 145 151 152 155 157 165 194 196 200 202
  208 209 216 218 227 234 246 247 248 250 254 255 257 258
  260 261 262 263 264 270 272 273 277 279 image-fleecing
diff --git a/tests/check-block.sh b/tests/check-block.sh
index f86cb863de..cff1263c0b 100755
--- a/tests/check-block.sh
+++ b/tests/check-block.sh
@@ -77,7 +77,7 @@ export PYTHONUTF8=1
  
  ret=0

  for fmt in $format_list ; do
-${PYTHON} ./check -makecheck -$fmt $group || ret=1
+${PYTHON} ./check --makecheck --$fmt $group || ret=1
  done
  
  exit $ret

diff --git a/tests/qemu-iotests/README b/tests/qemu-iotests/README
index 6079b401ae..8e1f3e19c3 100644
--- a/tests/qemu-iotests/README
+++ b/tests/qemu-iotests/README
@@ -10,9 +10,10 @@ but no actual block drivers like ide, scsi or virtio.
  
  * Usage
  
-Just run ./check to run all tests for the raw image format, or ./check

--qcow2 to test the qcow2 image format.  The output of ./check -h explains
-additional options to test further image formats or I/O methods.
+Just run ./check to run all tests for the raw image format,
+or ./check --qcow2 to test the qcow2 image format.  The output of
+./check -h 

Re: [PATCH v2 00/19] Make image fleecing more usable

2021-09-22 Thread Vladimir Sementsov-Ogievskiy

ping )

27.08.2021 21:17, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

That continues "[PATCH RFC DRAFT 00/11] Make image fleecing more usable"
and supersedes "[PATCH v2 for-6.2 0/6] push backup with fleecing"

Supersedes: <20210804131750.127574-1-vsement...@virtuozzo.com>
Supersedes: <20210721140424.163701-1-vsement...@virtuozzo.com>

There are several improvements to the fleecing scheme:

1. support bitmap in copy-before-write filter

2. introduce fleecing block driver, which opens the door for a lot of
image fleecing improvements.
See "block: introduce fleecing block driver" commit message for
details.

3. support "push backup with fleecing" scheme, when backup job is a
client of common fleecing scheme. That helps when writes to final
backup target are slow and we don't want guest writes hang waiting
for copy-before-write operations to final target.

Vladimir Sementsov-Ogievskiy (19):
   block/block-copy: move copy_bitmap initialization to
 block_copy_state_new()
   block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value
   block/block-copy: block_copy_state_new(): add bitmap parameter
   block/copy-before-write: add bitmap open parameter
   block/block-copy: add block_copy_reset()
   block: intoduce reqlist
   block/dirty-bitmap: introduce bdrv_dirty_bitmap_status()
   block/reqlist: add reqlist_wait_all()
   block: introduce FleecingState class
   block: introduce fleecing block driver
   block/copy-before-write: support fleecing block driver
   block/block-copy: add write-unchanged mode
   block/copy-before-write: use write-unchanged in fleecing mode
   iotests/image-fleecing: add test-case for fleecing format node
   iotests.py: add qemu_io_pipe_and_status()
   iotests/image-fleecing: add test case with bitmap
   block: blk_root(): return non-const pointer
   qapi: backup: add immutable-source parameter
   iotests/image-fleecing: test push backup with fleecing

  qapi/block-core.json|  39 ++-
  block/fleecing.h| 151 
  include/block/block-copy.h  |   4 +-
  include/block/block_int.h   |   1 +
  include/block/dirty-bitmap.h|   4 +-
  include/block/reqlist.h |  75 ++
  include/qemu/hbitmap.h  |  11 +
  include/sysemu/block-backend.h  |   2 +-
  block/backup.c  |  61 -
  block/block-backend.c   |   2 +-
  block/block-copy.c  | 157 +---
  block/copy-before-write.c   |  70 +-
  block/dirty-bitmap.c|  15 +-
  block/fleecing-drv.c| 260 
  block/fleecing.c| 182 ++
  block/monitor/bitmap-qmp-cmds.c |   5 +-
  block/replication.c |   2 +-
  block/reqlist.c |  84 +++
  blockdev.c  |   1 +
  util/hbitmap.c  |  36 +++
  MAINTAINERS |   7 +-
  block/meson.build   |   3 +
  tests/qemu-iotests/iotests.py   |   4 +
  tests/qemu-iotests/tests/image-fleecing | 178 +++---
  tests/qemu-iotests/tests/image-fleecing.out | 221 -
  25 files changed, 1420 insertions(+), 155 deletions(-)
  create mode 100644 block/fleecing.h
  create mode 100644 include/block/reqlist.h
  create mode 100644 block/fleecing-drv.c
  create mode 100644 block/fleecing.c
  create mode 100644 block/reqlist.c




--
Best regards,
Vladimir



Re: [PATCH] nbd/client: Request larger block status by default

2021-09-22 Thread Vladimir Sementsov-Ogievskiy

21.09.2021 23:00, Eric Blake wrote:

On Tue, Sep 21, 2021 at 10:12:02PM +0300, Vladimir Sementsov-Ogievskiy wrote:

21.09.2021 21:08, Eric Blake wrote:

On Tue, Sep 21, 2021 at 08:25:11PM +0300, Vladimir Sementsov-Ogievskiy wrote:

21.09.2021 19:17, Eric Blake wrote:

Now that commit 5a1cfd21 has clarified that a driver's block_status
can report larger *pnum than in the original request, we can take
advantage of that in the NBD driver.  Rather that limiting our request
to the server based on the maximum @bytes our caller mentioned, we
instead ask for as much status as possible (the minimum of our 4G
limit or the rest of the export); the server will still only give us
one extent in its answer (because we are using NBD_CMD_FLAG_REQ_ONE),
but now the block layer's caching of data areas can take advantage of
cases where the server gives us a large answer to avoid the need for
future NBD_CMD_BLOCK_STATUS calls.
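The widening described in the commit message can be sketched roughly as follows. This is an illustrative model, not the actual nbd driver code; the constant and function names are assumptions.

```python
NBD_MAX_EXTENT_LEN = 0xFFFFFFFF  # 32-bit length field, just under 4 GiB

def status_request_length(offset, caller_bytes, export_size):
    # Instead of clamping the NBD_CMD_BLOCK_STATUS request to what the
    # caller asked for, request as much as the protocol's 4G limit and
    # the remaining export size allow, so a generous server reply can
    # seed the block layer's block-status cache.
    widened = min(NBD_MAX_EXTENT_LEN, export_size - offset)
    return max(caller_bytes, widened)
```

The server still returns a single extent (because NBD_CMD_FLAG_REQ_ONE is set), but that one extent may now describe far more of the image than the caller's original request.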

Signed-off-by: Eric Blake 
---




I remember we already discussed that, but I can't find it.

The problem is that it's not for free:

In the server code, in blockstatus_to_extents(), we loop through the disk, trying to
merge extents of the same type.

With a fully allocated qcow2 image, we'd have to load all L2 tables and handle them to merge all
block status into one big "allocated" extent.



We don't have to loop that far.  The NBD protocol allows the server to
stop looping at whatever point makes sense, as long as it makes
progress.


Maybe we need some additional negotiation flag to allow the BLOCK_STATUS command
with the NBD_CMD_FLAG_REQ_ONE flag to return an extent larger than required when
that information is available for free?


That's already the case when FLAG_REQ_ONE is not present.  The reason
that REQ_ONE clamps things at the requested limit is because older
qemu had a bug that it rejected the server sending extra information,
even when that info was free.



That's one possibility.  Another adds nothing to the NBD protocol, but
instead limits the code that loops over block status to determine a
larger "allocated" answer: stop looping after a finite number of extents
have been merged together.
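The bounded-loop suggestion could look roughly like this. It is a sketch under assumed names (a list of `(length, flags)` pairs and an arbitrary cap), not the actual blockstatus_to_extents() code.

```python
def merge_extents(extents, max_merged=16):
    # Merge consecutive (length, flags) extents of the same type, but stop
    # after a finite number of merged entries instead of walking the whole
    # disk.  Extending the last entry is always free; only starting a new
    # entry counts against the cap.
    merged = []
    for length, flags in extents:
        if merged and merged[-1][1] == flags:
            merged[-1] = (merged[-1][0] + length, flags)
        elif len(merged) < max_merged:
            merged.append((length, flags))
        else:
            break  # enough progress; the client can ask again from here
    return merged
```

This keeps the "merge when cheap" behavior while bounding the worst case (e.g. a fully allocated qcow2 image) to a fixed amount of L2-table work per request.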



In this case we need to answer the question: when do we stop looping? I'm not sure
we can simply drop the loop:

For example, for compressed clusters, bdrv_co_block_status() will return them
one by one, and sending them one by one onto the wire when the user requested a large
range would be inefficient.
Or should we change block-status behavior for compressed clusters? And maybe
add a flag to block_status() saying that we are not interested in valid_offset, so it
can return an extent corresponding to the whole L2 table chunk (if all entries
are allocated, but not consecutive)?


Currently, bdrv_co_block_status() takes 'bool want_zero' that says
what the client wants.  Maybe it's worth expanding that into an enum
or bitmask to allow finer-grained client requests (the notion of
whether valid_offset matters to the caller IS relevant for deciding
when to clamp vs. loop).


I think, sooner or later we'll do it anyway






Hmm. So, if we don't update the spec, we'll have to "fix" the implementation. That
actually means we should update the spec anyway, at least to note that "clients tend
to request large regions in the hope that the server will not spend too much time
serving them, but will instead return a shorter answer".


I'm really hoping we don't have to tweak the NBD spec on this one, but
rather improve the quality of implementation in qemu.



And you'll never have a guarantee that some other (non-qemu) NBD server will
not try to satisfy the whole request in one go.


That's true, but the NBD spec has always tried to encourage servers to
provide more information when it was free, but to give up early if it
gets too expensive.  It's a judgment call on where that line lies, and
may indeed be different between different servers.


Hmm. Okay.

Still, if we go the way this patch suggests, we'll need to at least improve our
server implementation of NBD_BLOCK_STATUS around qcow2.






In other words:

1. We want the block_status of some region
2. If there is some free information available about a larger region, we are happy to
cache it

With your solution, we just request a much larger region, so we lose the information
from [1]. That means the server can't tell how much of the requested region is
really needed, i.e. whether doing some additional work to return more information
(still within the boundaries of the request) will be:
  - good work that minimizes network traffic
OR
  - extra work that wastes server time, producing information the client will cache
but probably never use (or even lose soon, as our cache is very simple)

With an additional negotiation flag we don't lose [1], i.e. how much the client
wants now.


So, for me, modifying the protocol looks nicer.

Another approach is to do the request without NBD_CMD_FLAG_REQ_ONE and handle several
extents.


_This_ idea is nicer.  It allows the client to request an actual
length it is interested in