date:20220215

Re: [PATCH 4/4] iotests: make qemu_img raise on non-zero rc by default

2022-02-15 Thread John Snow

On Tue, Feb 15, 2022 at 6:09 PM Eric Blake  wrote:
>
> On Tue, Feb 15, 2022 at 05:08:53PM -0500, John Snow wrote:
> > re-configure qemu_img() into a function that will by default raise a
> > VerboseProcessException (extended from CalledProcessException) on
> > non-zero return codes. This will produce a stack trace that will show
> > the command line arguments and return code from the failed process run.
> >
> > Users that want something more flexible (There appears to be only one)
>
> s/There/there/
>
> > can use check=False and manage the return themselves.
> >
> > Signed-off-by: John Snow 
> > ---
> >  tests/qemu-iotests/257|  8 --
> >  tests/qemu-iotests/iotests.py | 53 +++
> >  2 files changed, 53 insertions(+), 8 deletions(-)
> >
>
> > +def qemu_img(*args: str, check: bool = True, combine_stdio: bool = True
> > + ) -> subprocess.CompletedProcess[str]:
> > +"""
> > +Run qemu_img, returning a CompletedProcess instance.
> > +
> > +The CompletedProcess object has args, returncode, and output 
> > properties.
> > +If streams are not combined, It will also have stdout/stderr 
> > properties.
>
> s/It/it/
>
> > +
> > +:param args: command-line arguments to qemu_img.
> > +:param check: set to False to suppress VerboseProcessError.
> > +:param combine_stdio: set to False to keep stdout/stderr separated.
> > +
> > +:raise VerboseProcessError:
> > +On non-zero exit code, when 'check=True' was provided. This
> > +exception has 'stderr', 'stdout' and 'returncode' properties
> > +that may be inspected to show greater detail. If this exception
> > +is not handled, The command-line, return code, and all console
>
> s/The/the/
>
> > +output will be included at the bottom of the stack trace.
> > +"""
>
> > @@ -469,8 +511,9 @@ def qemu_nbd_popen(*args):
> >
> >  def compare_images(img1, img2, fmt1=imgfmt, fmt2=imgfmt):
> >  '''Return True if two image files are identical'''
> > -return qemu_img('compare', '-f', fmt1,
> > -'-F', fmt2, img1, img2) == 0
> > +res = qemu_img('compare', '-f', fmt1,
> > +   '-F', fmt2, img1, img2, check=False)
> > +return res.returncode == 0
>
> Not sure why there was so much Mid-sentence capitalization ;)

Mostly the result of editing a few different drafts together and
failing to fix the casing. Oopsie.

>
> Seems useful, although it is at the limits of my python review skills,
> so this is a weak:
>
> Reviewed-by: Eric Blake 
>

Understood, thanks!

--js

Re: [PATCH 3/4] iotests: Remove explicit checks for qemu_img() == 0

2022-02-15 Thread John Snow

On Tue, Feb 15, 2022 at 6:05 PM Eric Blake  wrote:
>
> On Tue, Feb 15, 2022 at 05:08:52PM -0500, John Snow wrote:
> > qemu_img() returning zero ought to be the rule, not the
> > exception. Remove all explicit checks against the condition in
> > preparation for making non-zero returns an Exception.
> >
> > Signed-off-by: John Snow 
> > ---
>
> Reviewed-by: Eric Blake 
>
> This one seems clean whether or not there are questions about using
> enboxify earlier in the series.
>

Yeah, if we don't want ~fancy~ text decoration, the rest of this
should still be broadly helpful, I think. There are simpler text
decorations we can use to keep the output from looking like traceback
soup.

--js

Re: [PATCH 1/4] python/utils: add enboxify() text decoration utility

2022-02-15 Thread Philippe Mathieu-Daudé via


On 16/2/22 00:53, John Snow wrote:

On Tue, Feb 15, 2022 at 5:55 PM Eric Blake  wrote:


On Tue, Feb 15, 2022 at 05:08:50PM -0500, John Snow wrote:

print(enboxify(msg, width=72, name="commit message"))

┏━ commit message ━┓
┃ enboxify() takes a chunk of text and wraps it in a text art box that ┃
┃  adheres to a specified width. An optional title label may be given, ┃
┃  and any of the individual glyphs used to draw the box may be┃


Why do these two lines have a leading space,


┃ replaced or specified as well.   ┃


but this one doesn't?  It must be an off-by-one corner case when your
choice of space to wrap on is exactly at the wrap column.



Right, you're probably witnessing the right-pad *and* the actual space.


┗━━┛

Signed-off-by: John Snow 
---
  python/qemu/utils/__init__.py | 58 +++
  1 file changed, 58 insertions(+)



+def _wrap(line: str) -> str:
+return os.linesep.join([
+wrapped_line.ljust(lwidth) + suffix
+for wrapped_line in textwrap.wrap(
+line, width=lwidth, initial_indent=prefix,
+subsequent_indent=prefix, replace_whitespace=False,
+drop_whitespace=False, break_on_hyphens=False)


Always nice when someone else has written the cool library function to
do all the hard work for you ;)  But this is probably where you have the 
off-by-one I called out above.



Yeah, I just didn't want it to eat multiple spaces if they were
present -- I wanted it to reproduce them faithfully. The tradeoff is
some silliness near the margins.

Realistically, if I want something any better than what I've done
here, I should find a library to do it for me instead -- but for the
sake of highlighting some important information, this may be
just-enough-juice.


's/^┃  /┃ /' on top ;D

Re: [PATCH 2/4] iotests: add VerboseProcessError

2022-02-15 Thread John Snow

On Tue, Feb 15, 2022 at 5:58 PM Eric Blake  wrote:
>
> On Tue, Feb 15, 2022 at 05:08:51PM -0500, John Snow wrote:
> > This adds an Exception that extends the garden variety
> > subprocess.CalledProcessError. When this exception is raised, it will
> > still be caught when selecting for the stdlib variant.
> >
> > The difference is that the str() method of this Exception also adds the
> > stdout/stderr logs. In effect, if this exception goes unhandled, Python
> > will print the output in a nice, highlighted box to the terminal so that
> > it's easy to spot.
> >
> > This should save some headache from having to re-run test suites with
> > debugging enabled if we augment the exceptions we print more information
>
> This didn't parse well for me.  Maybe
> s/enabled/enabled,/ s/print more/print with more/
>

*cough* copy-paste failure. Two drafts collided here. Oopsie.

Re: [PATCH 1/4] python/utils: add enboxify() text decoration utility

2022-02-15 Thread John Snow

On Tue, Feb 15, 2022 at 5:55 PM Eric Blake  wrote:
>
> On Tue, Feb 15, 2022 at 05:08:50PM -0500, John Snow wrote:
> > >>> print(enboxify(msg, width=72, name="commit message"))
> > ┏━ commit message ━┓
> > ┃ enboxify() takes a chunk of text and wraps it in a text art box that ┃
> > ┃  adheres to a specified width. An optional title label may be given, ┃
> > ┃  and any of the individual glyphs used to draw the box may be┃
>
> Why do these two lines have a leading space,
>
> > ┃ replaced or specified as well.   ┃
>
> but this one doesn't?  It must be an off-by-one corner case when your
> choice of space to wrap on is exactly at the wrap column.
>

Right, you're probably witnessing the right-pad *and* the actual space.

> > ┗━━┛
> >
> > Signed-off-by: John Snow 
> > ---
> >  python/qemu/utils/__init__.py | 58 +++
> >  1 file changed, 58 insertions(+)
> >
> > diff --git a/python/qemu/utils/__init__.py b/python/qemu/utils/__init__.py
> > index 7f1a5138c4b..f785316f230 100644
> > --- a/python/qemu/utils/__init__.py
> > +++ b/python/qemu/utils/__init__.py
> > @@ -15,7 +15,10 @@
> >  # the COPYING file in the top-level directory.
> >  #
> >
> > +import os
> >  import re
> > +import shutil
> > +import textwrap
> >  from typing import Optional
> >
> >  # pylint: disable=import-error
> > @@ -23,6 +26,7 @@
> >
> >
> >  __all__ = (
> > +'enboxify',
> >  'get_info_usernet_hostfwd_port',
> >  'kvm_available',
> >  'list_accel',
> > @@ -43,3 +47,57 @@ def get_info_usernet_hostfwd_port(info_usernet_output: 
> > str) -> Optional[int]:
> >  if match is not None:
> >  return int(match[1])
> >  return None
> > +
> > +
> > +# pylint: disable=too-many-arguments
> > +def enboxify(
> > +content: str = '',
> > +width: Optional[int] = None,
> > +name: Optional[str] = None,
> > +padding: int = 1,
> > +upper_left: str = '┏',
> > +upper_right: str = '┓',
> > +lower_left: str = '┗',
> > +lower_right: str = '┛',
> > +horizontal: str = '━',
> > +vertical: str = '┃',
> > +) -> str:
> > +"""
> > +Wrap some text into a text art box of a given width.
> > +
> > +:param content: The text to wrap into a box.
> > +:param width: The number of columns (including the box itself).
> > +:param name: A label to apply to the upper-left of the box.
> > +:param padding: How many columns of padding to apply inside.
> > +"""
>
> Where's theh :param docs for the 6 custom glyphs?
>

Omitted as kinda-sorta-uninteresting. I can add them if we decide we
want this series.
(It's admittedly a bit of a "Hey, what do you think of this?")

> > +if width is None:
> > +width = shutil.get_terminal_size()[0]
> > +prefix = vertical + (' ' * padding)
> > +suffix = (' ' * padding) + vertical
> > +lwidth = width - len(suffix)
> > +
> > +def _bar(name: Optional[str], top: bool = True) -> str:
> > +ret = upper_left if top else lower_left
> > +right = upper_right if top else lower_right
> > +if name is not None:
> > +ret += f"{horizontal} {name} "
> > +
> > +assert width is not None
> > +filler_len = width - len(ret) - len(right)
> > +ret += f"{horizontal * filler_len}{right}"
> > +return ret
> > +
> > +def _wrap(line: str) -> str:
> > +return os.linesep.join([
> > +wrapped_line.ljust(lwidth) + suffix
> > +for wrapped_line in textwrap.wrap(
> > +line, width=lwidth, initial_indent=prefix,
> > +subsequent_indent=prefix, replace_whitespace=False,
> > +drop_whitespace=False, break_on_hyphens=False)
>
> Always nice when someone else has written the cool library function to
> do all the hard work for you ;)  But this is probably where you have the 
> off-by-one I called out above.
>

Yeah, I just didn't want it to eat multiple spaces if they were
present -- I wanted it to reproduce them faithfully. The tradeoff is
some silliness near the margins.

Realistically, if I want something any better than what I've done
here, I should find a library to do it for me instead -- but for the
sake of highlighting some important information, this may be
just-enough-juice.

--js

Re: [PATCH v2] nbd/server: Allow MULTI_CONN for shared writable exports

2022-02-15 Thread Eric Blake

On Tue, Feb 15, 2022 at 09:23:36PM +0200, Nir Soffer wrote:
> On Tue, Feb 15, 2022 at 7:22 PM Eric Blake  wrote:
> 
> > According to the NBD spec, a server advertising
> > NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
> > not see any cache inconsistencies: when properly separated by a single
> > flush, actions performed by one client will be visible to another
> > client, regardless of which client did the flush.  We satisfy these
> > conditions in qemu when our block layer is backed by the local
> > filesystem (by virtue of the semantics of fdatasync(), and the fact
> > that qemu itself is not buffering writes beyond flushes).  It is
> > harder to state whether we satisfy these conditions for network-based
> > protocols, so the safest course of action is to allow users to opt-in
> > to advertising multi-conn.  We may later tweak defaults to advertise
> > by default when the block layer can confirm that the underlying
> > protocol driver is cache consistent between multiple writers, but for
> > now, this at least allows savvy users (such as virt-v2v or nbdcopy) to
> > explicitly start qemu-nbd or qemu-storage-daemon with multi-conn
> > advertisement in a known-safe setup where the client end can then
> > benefit from parallel clients.
> >
> 
> It makes sense, and will be used by oVirt. Actually we are already using
> multiple connections for writing about 2 years, based on your promise
> that if every client writes to district  areas this is safe.

I presume s/district/distinct/, but yes, I'm glad we're finally trying
to make the code match existing practice ;)

> > +++ b/docs/tools/qemu-nbd.rst
> > @@ -139,8 +139,7 @@ driver options if ``--image-opts`` is specified.
> >  .. option:: -e, --shared=NUM
> >
> >Allow up to *NUM* clients to share the device (default
> > -  ``1``), 0 for unlimited. Safe for readers, but for now,
> > -  consistency is not guaranteed between multiple writers.
> > +  ``1``), 0 for unlimited.
> >
> 
> Removing the note means that now consistency is guaranteed between
> multiple writers, no?
> 
> Or maybe we want to mention here that consistency depends on the protocol
> and users can opt in, or refer to the section where this is discussed?

Yeah, a link to the QAPI docs where multi-conn is documented might be
nice, except I'm not sure the best way to do that in our sphinx
documentation setup.

> > +##
> > +# @NbdExportMultiConn:
> > +#
> > +# Possible settings for advertising NBD multiple client support.
> > +#
> > +# @off: Do not advertise multiple clients.
> > +#
> > +# @on: Allow multiple clients (for writable clients, this is only safe
> > +#  if the underlying BDS is cache-consistent, such as when backed
> > +#  by the raw file driver); ignored if the NBD server was set up
> > +#  with max-connections of 1.
> > +#
> > +# @auto: Behaves like @off if the export is writable, and @on if the
> > +#export is read-only.
> > +#
> > +# Since: 7.0
> > +##
> > +{ 'enum': 'NbdExportMultiConn',
> > +  'data': ['off', 'on', 'auto'] }
> >
> 
> Are we going to have --multi-con=(on|off|auto)?

Oh. The QMP command (which is immediately visible through
nbd-server-add/block-storage-add to qemu and qemu-storage-daemon)
gains "multi-conn":"on", but you may be right that qemu-nbd would want
a command line option (either that, or we accellerate our plans that
qsd should replace qemu-nbd).

> > +++ b/blockdev-nbd.c
> > @@ -44,6 +44,11 @@ bool nbd_server_is_running(void)
> >  return nbd_server || is_qemu_nbd;
> >  }
> >
> > +int nbd_server_max_connections(void)
> > +{
> > +return nbd_server ? nbd_server->max_connections : -1;
> > +}
> >
> 
> -1 is a little bit strange for a limit, maybe 1 is a better default when
> we nbd_server == NULL? When can this happen?

In qemu, if you haven't used the QMP command 'nbd-server-start' yet.
In qemu-nbd, always (per the nbd_server_is_running function just
above).  My iotest only covered the qemu/qsd side, not the qemu-nbd
side, so it looks like I need a v3...

> > +++ b/nbd/server.c

> > +/*
> > + * Determine whether to advertise multi-conn.  Default is auto,
> > + * which resolves to on for read-only and off for writable.  But
> > + * if the server has max-connections 1, that forces the flag off.
> >
> 
> Looks good, this can be enabled automatically based on the driver
> if we want, so users using auto will can upgrade to multi-con automatically.

Yes, that's part of why I made it a tri-state with a default of 'auto'.

> 
> 
> > + */
> > +if (!arg->has_multi_conn) {
> > +arg->multi_conn = NBD_EXPORT_MULTI_CONN_AUTO;
> > +}
> > +if (nbd_server_max_connections() == 1) {
> 
> +arg->multi_conn = NBD_EXPORT_MULTI_CONN_OFF;
> > +}
> 
> +if (arg->multi_conn == NBD_EXPORT_MULTI_CONN_AUTO) {
> > +multi_conn = readonly;
> > +} else {
> > +multi_conn = arg->multi_conn == NBD_EXPORT_MULTI_CONN_ON;
> > +}
> >
> 
> This part is a little bit

Re: [PATCH 4/4] iotests: make qemu_img raise on non-zero rc by default

2022-02-15 Thread Eric Blake

On Tue, Feb 15, 2022 at 05:08:53PM -0500, John Snow wrote:
> re-configure qemu_img() into a function that will by default raise a
> VerboseProcessException (extended from CalledProcessException) on
> non-zero return codes. This will produce a stack trace that will show
> the command line arguments and return code from the failed process run.
> 
> Users that want something more flexible (There appears to be only one)

s/There/there/

> can use check=False and manage the return themselves.
> 
> Signed-off-by: John Snow 
> ---
>  tests/qemu-iotests/257|  8 --
>  tests/qemu-iotests/iotests.py | 53 +++
>  2 files changed, 53 insertions(+), 8 deletions(-)
> 

> +def qemu_img(*args: str, check: bool = True, combine_stdio: bool = True
> + ) -> subprocess.CompletedProcess[str]:
> +"""
> +Run qemu_img, returning a CompletedProcess instance.
> +
> +The CompletedProcess object has args, returncode, and output properties.
> +If streams are not combined, It will also have stdout/stderr properties.

s/It/it/

> +
> +:param args: command-line arguments to qemu_img.
> +:param check: set to False to suppress VerboseProcessError.
> +:param combine_stdio: set to False to keep stdout/stderr separated.
> +
> +:raise VerboseProcessError:
> +On non-zero exit code, when 'check=True' was provided. This
> +exception has 'stderr', 'stdout' and 'returncode' properties
> +that may be inspected to show greater detail. If this exception
> +is not handled, The command-line, return code, and all console

s/The/the/

> +output will be included at the bottom of the stack trace.
> +"""

> @@ -469,8 +511,9 @@ def qemu_nbd_popen(*args):
>  
>  def compare_images(img1, img2, fmt1=imgfmt, fmt2=imgfmt):
>  '''Return True if two image files are identical'''
> -return qemu_img('compare', '-f', fmt1,
> -'-F', fmt2, img1, img2) == 0
> +res = qemu_img('compare', '-f', fmt1,
> +   '-F', fmt2, img1, img2, check=False)
> +return res.returncode == 0

Not sure why there was so much Mid-sentence capitalization ;)

Seems useful, although it is at the limits of my python review skills,
so this is a weak:

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 3/4] iotests: Remove explicit checks for qemu_img() == 0

2022-02-15 Thread Eric Blake

On Tue, Feb 15, 2022 at 05:08:52PM -0500, John Snow wrote:
> qemu_img() returning zero ought to be the rule, not the
> exception. Remove all explicit checks against the condition in
> preparation for making non-zero returns an Exception.
> 
> Signed-off-by: John Snow 
> ---

Reviewed-by: Eric Blake 

This one seems clean whether or not there are questions about using
enboxify earlier in the series.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 2/4] iotests: add VerboseProcessError

2022-02-15 Thread Eric Blake

On Tue, Feb 15, 2022 at 05:08:51PM -0500, John Snow wrote:
> This adds an Exception that extends the garden variety
> subprocess.CalledProcessError. When this exception is raised, it will
> still be caught when selecting for the stdlib variant.
> 
> The difference is that the str() method of this Exception also adds the
> stdout/stderr logs. In effect, if this exception goes unhandled, Python
> will print the output in a nice, highlighted box to the terminal so that
> it's easy to spot.
> 
> This should save some headache from having to re-run test suites with
> debugging enabled if we augment the exceptions we print more information

This didn't parse well for me.  Maybe
s/enabled/enabled,/ s/print more/print with more/

> in the default case.
> 
> Signed-off-by: John Snow 
> ---
>  tests/qemu-iotests/iotests.py | 34 ++
>  1 file changed, 34 insertions(+)
>

Otherwise seems useful.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 1/4] python/utils: add enboxify() text decoration utility

2022-02-15 Thread Eric Blake

On Tue, Feb 15, 2022 at 05:08:50PM -0500, John Snow wrote:
> >>> print(enboxify(msg, width=72, name="commit message"))
> ┏━ commit message ━┓
> ┃ enboxify() takes a chunk of text and wraps it in a text art box that ┃
> ┃  adheres to a specified width. An optional title label may be given, ┃
> ┃  and any of the individual glyphs used to draw the box may be┃

Why do these two lines have a leading space,

> ┃ replaced or specified as well.   ┃

but this one doesn't?  It must be an off-by-one corner case when your
choice of space to wrap on is exactly at the wrap column.

> ┗━━┛
> 
> Signed-off-by: John Snow 
> ---
>  python/qemu/utils/__init__.py | 58 +++
>  1 file changed, 58 insertions(+)
> 
> diff --git a/python/qemu/utils/__init__.py b/python/qemu/utils/__init__.py
> index 7f1a5138c4b..f785316f230 100644
> --- a/python/qemu/utils/__init__.py
> +++ b/python/qemu/utils/__init__.py
> @@ -15,7 +15,10 @@
>  # the COPYING file in the top-level directory.
>  #
>  
> +import os
>  import re
> +import shutil
> +import textwrap
>  from typing import Optional
>  
>  # pylint: disable=import-error
> @@ -23,6 +26,7 @@
>  
>  
>  __all__ = (
> +'enboxify',
>  'get_info_usernet_hostfwd_port',
>  'kvm_available',
>  'list_accel',
> @@ -43,3 +47,57 @@ def get_info_usernet_hostfwd_port(info_usernet_output: 
> str) -> Optional[int]:
>  if match is not None:
>  return int(match[1])
>  return None
> +
> +
> +# pylint: disable=too-many-arguments
> +def enboxify(
> +content: str = '',
> +width: Optional[int] = None,
> +name: Optional[str] = None,
> +padding: int = 1,
> +upper_left: str = '┏',
> +upper_right: str = '┓',
> +lower_left: str = '┗',
> +lower_right: str = '┛',
> +horizontal: str = '━',
> +vertical: str = '┃',
> +) -> str:
> +"""
> +Wrap some text into a text art box of a given width.
> +
> +:param content: The text to wrap into a box.
> +:param width: The number of columns (including the box itself).
> +:param name: A label to apply to the upper-left of the box.
> +:param padding: How many columns of padding to apply inside.
> +"""

Where's theh :param docs for the 6 custom glyphs?

> +if width is None:
> +width = shutil.get_terminal_size()[0]
> +prefix = vertical + (' ' * padding)
> +suffix = (' ' * padding) + vertical
> +lwidth = width - len(suffix)
> +
> +def _bar(name: Optional[str], top: bool = True) -> str:
> +ret = upper_left if top else lower_left
> +right = upper_right if top else lower_right
> +if name is not None:
> +ret += f"{horizontal} {name} "
> +
> +assert width is not None
> +filler_len = width - len(ret) - len(right)
> +ret += f"{horizontal * filler_len}{right}"
> +return ret
> +
> +def _wrap(line: str) -> str:
> +return os.linesep.join([
> +wrapped_line.ljust(lwidth) + suffix
> +for wrapped_line in textwrap.wrap(
> +line, width=lwidth, initial_indent=prefix,
> +subsequent_indent=prefix, replace_whitespace=False,
> +drop_whitespace=False, break_on_hyphens=False)

Always nice when someone else has written the cool library function to
do all the hard work for you ;)  But this is probably where you have the 
off-by-one I called out above.

> +])
> +
> +return os.linesep.join((
> +_bar(name, top=True),
> +os.linesep.join(_wrap(line) for line in content.splitlines()),
> +_bar(None, top=False),
> +))
> -- 
> 2.34.1
> 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PULL 0/3] Block patches

2022-02-15 Thread Peter Maydell

On Mon, 14 Feb 2022 at 17:44, Stefan Hajnoczi  wrote:
>
> The following changes since commit cc5ce8b8b6be83e5fe3b668dbd061ad97c534e3f:
>
>   Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220210' into 
> staging (2022-02-13 20:33:28 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/stefanha/qemu.git tags/block-pull-request
>
> for you to fetch changes up to 4c41c69e05fe28c0f95f8abd2ebf407e95a4f04b:
>
>   util: adjust coroutine pool size to virtio block queue (2022-02-14 17:11:25 
> +)
>
> 
> Pull request
>
> This contains coroutine poll size scaling, virtiofsd rseq seccomp for new 
> glibc
> versions, and the QEMU C virtiofsd deprecation notice.
>
> 

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [PATCH 2/3] iotests: Allow using QMP with the QSD

2022-02-15 Thread Eric Blake

On Tue, Feb 15, 2022 at 02:57:26PM +0100, Hanna Reitz wrote:
> Add a parameter to optionally open a QMP connection when creating a
> QemuStorageDaemon instance.
> 
> Signed-off-by: Hanna Reitz 
> ---
>  tests/qemu-iotests/iotests.py | 29 -
>  1 file changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index 6ba65eb1ff..47e3808ab9 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -39,6 +39,7 @@
>  
>  from qemu.machine import qtest
>  from qemu.qmp import QMPMessage
> +from qemu.aqmp.legacy import QEMUMonitorProtocol

I thought we were trying to get rid of aqmp.legacy usage, so this
feels like a temporary regression.  Oh well, not the end of the
testing world.

>  def stop(self, kill_signal=15):
>  self._p.send_signal(kill_signal)
>  self._p.wait()
>  self._p = None
>  
> +if self._qmp:
> +self._qmp.close()
> +
>  try:
> +if self._qmpsock is not None:
> +os.remove(self._qmpsock)
>  os.remove(self.pidfile)
>  except OSError:
>  pass

Do we need two try: blocks here, to remove self.pidfile even if
os.remove(self._qmpsock) failed?

Otherwise, makes sense to me.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 1/3] block: Make bdrv_refresh_limits() non-recursive

2022-02-15 Thread Eric Blake

On Tue, Feb 15, 2022 at 02:57:25PM +0100, Hanna Reitz wrote:
> bdrv_refresh_limits() recurses down to the node's children.  That does
> not seem necessary: When we refresh limits on some node, and then
> recurse down and were to change one of its children's BlockLimits, then
> that would mean we noticed the changed limits by pure chance.  The fact
> that we refresh the parent's limits has nothing to do with it, so the
> reason for the change probably happened before this point in time, and
> we should have refreshed the limits then.
> 
> On the other hand, we do not have infrastructure for noticing that block
> limits change after they have been initialized for the first time (this
> would require propagating the change upwards to the respective node's
> parents), and so evidently we consider this case impossible.
> 
> If this case is impossible, then we will not need to recurse down in
> bdrv_refresh_limits().  Every node's limits are initialized in
> bdrv_open_driver(), and are refreshed whenever its children change.
> We want to use the childrens' limits to get some initial default, but we
> can just take them, we do not need to refresh them.
> 
> The problem with recursing is that bdrv_refresh_limits() is not atomic.
> It begins with zeroing BDS.bl, and only then sets proper, valid limits.
> If we do not drain all nodes whose limits are refreshed, then concurrent
> I/O requests can encounter invalid request_alignment values and crash
> qemu.  Therefore, a recursing bdrv_refresh_limits() requires the whole
> subtree to be drained, which is currently not ensured by most callers.
> 
> A non-recursive bdrv_refresh_limits() only requires the node in question
> to not receive I/O requests, and this is done by most callers in some
> way or another:
> - bdrv_open_driver() deals with a new node with no parents yet
> - bdrv_set_file_or_backing_noperm() acts on a drained node
> - bdrv_reopen_commit() acts only on drained nodes
> - bdrv_append() should in theory require the node to be drained; in
>   practice most callers just lock the AioContext, which should at least
>   be enough to prevent concurrent I/O requests from accessing invalid
>   limits
> 
> So we can resolve the bug by making bdrv_refresh_limits() non-recursive.

Long explanation, but very helpful.

> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1879437
> Signed-off-by: Hanna Reitz 
> ---
>  block/io.c | 4 
>  1 file changed, 4 deletions(-)

And deceptively simple fix!

Reviewed-by: Eric Blake 

> 
> diff --git a/block/io.c b/block/io.c
> index 4e4cb556c5..c3e7301613 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -189,10 +189,6 @@ void bdrv_refresh_limits(BlockDriverState *bs, 
> Transaction *tran, Error **errp)
>  QLIST_FOREACH(c, >children, next) {
>  if (c->role & (BDRV_CHILD_DATA | BDRV_CHILD_FILTERED | 
> BDRV_CHILD_COW))
>  {
> -bdrv_refresh_limits(c->bs, tran, errp);
> -if (*errp) {
> -return;
> -}
>  bdrv_merge_limits(>bl, >bs->bl);
>  have_limits = true;
>  }
> -- 
> 2.34.1
> 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH] tests/qemu-iotests: Rework the checks and spots using GNU sed

2022-02-15 Thread Eric Blake

On Tue, Feb 15, 2022 at 02:20:31PM +0100, Thomas Huth wrote:
> Instead of failing the iotests if GNU sed is not available (or skipping
> them completely in the check-block.sh script), it would be better to
> simply skip the bash-based tests that rely on GNU sed, so that the other
> tests could still be run. Thus we now explicitely use "gsed" (either as
> direct program or as a wrapper around "sed" if it's the GNU version)
> in the spots that rely on the GNU sed behavior. Then we also remove the
> sed checks from the check-block.sh script, so that "make check-block"
> can now be run on systems without GNU sed, too.
> 
> Signed-off-by: Thomas Huth 
> ---
>  I've checked that this works fine with:
>  make vm-build-netbsd TARGET_LIST=x86_64-softmmu BUILD_TARGET=check-block
>  make vm-build-freebsd TARGET_LIST=x86_64-softmmu BUILD_TARGET=check-block
>  and with the macOS targets in our CI.
> 
>  tests/check-block.sh | 12 --
>  tests/qemu-iotests/271   |  2 +-
>  tests/qemu-iotests/common.filter | 74 
>  tests/qemu-iotests/common.rc | 45 +--
>  4 files changed, 61 insertions(+), 72 deletions(-)
> 

> +++ b/tests/qemu-iotests/271
> @@ -896,7 +896,7 @@ _make_test_img -o extended_l2=on 1M
>  # Second and third writes in _concurrent_io() are independent and may finish 
> in
>  # different order. So, filter offset out to match both possible variants.
>  _concurrent_io | $QEMU_IO | _filter_qemu_io | \
> -$SED -e 's/\(20480\|40960\)/OFFSET/'
> +sed -e 's/\(20480\|40960\)/OFFSET/'

Looks like a portable sed script, so 'sed' instead of 'gsed' here is fine.

>  _concurrent_verify | $QEMU_IO | _filter_qemu_io
>  
>  # success, all done
> diff --git a/tests/qemu-iotests/common.filter 
> b/tests/qemu-iotests/common.filter
> index 935217aa65..a3b1708adc 100644
> --- a/tests/qemu-iotests/common.filter
> +++ b/tests/qemu-iotests/common.filter
> @@ -21,58 +21,58 @@
>  
>  _filter_date()
>  {
> -$SED -re 's/[0-9]{4}-[0-9]{2}-[0-9]{2} 
> [0-9]{2}:[0-9]{2}:[0-9]{2}/-mm-dd hh:mm:ss/'
> +gsed -re 's/[0-9]{4}-[0-9]{2}-[0-9]{2} 
> [0-9]{2}:[0-9]{2}:[0-9]{2}/-mm-dd hh:mm:ss/'

GNU sed recommends spelling it 'sed -E', not 'sed -r', when using
extended regex.  Older POSIX did not support either spelling, but the
next version will require -E, as many implementations have it now:
https://www.austingroupbugs.net/view.php?id=528

Other than the fact that this was easier to write with ERE, I'm not
seeing any other GNU-only extensions in use here; but I'd recommend
that as long as we're touching the line, we spell it 'gsed -Ee'
instead of -re (here, and in several other places).

>  _filter_qom_path()
>  {
> -$SED -e '/Attached to:/s/\device[[0-9]\+\]/device[N]/g'
> +gsed -e '/Attached to:/s/\device[[0-9]\+\]/device[N]/g'

Here, it is our use of \+ that is a GNU sed extension, although it is
fairly easy (but verbose) to translate that one to portable sed
(\+ is the same as *).  So gsed is correct.  On the
other hand, the use of [[0-9]\+\] looks ugly - it probably does NOT
match what we meant (we have a bracket expression '[...]' that matches
the 11 characters [ and 0-9, then '\+' to match that bracket
expression 1 or more times, then '\]' which in its context is
identical to ']' to match a closing ], since only opening [ needs \
escaping for literal treatment).  What we probably meant is:

'/Attached to:/s/\device\[[0-9][0-9]*]/device[N]/g'

at which point normal sed would do.

>  }
>  
>  # replace occurrences of the actual TEST_DIR value with TEST_DIR
>  _filter_testdir()
>  {
> -$SED -e "s#$TEST_DIR/#TEST_DIR/#g" \
> - -e "s#$SOCK_DIR/#SOCK_DIR/#g" \
> - -e "s#SOCK_DIR/fuse-#TEST_DIR/#g"
> +sed -e "s#$TEST_DIR/#TEST_DIR/#g" \
> +-e "s#$SOCK_DIR/#SOCK_DIR/#g" \
> +-e "s#SOCK_DIR/fuse-#TEST_DIR/#g"

And this one indeed looks portable to POSIX (unless $TEST_DIR contains
weird stuff by accident).

>  # Removes \r from messages
>  _filter_win32()
>  {
> -$SED -e 's/\r//g'
> +gsed -e 's/\r//g'

Yep, \r is another GNU sed extension.

>  }
>  
>  # sanitize qemu-io output
>  _filter_qemu_io()
>  {
> -_filter_win32 | $SED -e "s/[0-9]* ops\; [0-9/:. sec]* ([0-9/.inf]* 
> [EPTGMKiBbytes]*\/sec and [0-9/.inf]* ops\/sec)/X ops\; XX:XX:XX.X (XXX 
> YYY\/sec and XXX ops\/sec)/" \
> +_filter_win32 | gsed -e "s/[0-9]* ops\; [0-9/:. sec]* ([0-9/.inf]* 
> [EPTGMKiBbytes]*\/sec and [0-9/.inf]* ops\/sec)/X ops\; XX:XX:XX.X (XXX 
> YYY\/sec and XXX ops\/sec)/" \
>  -e "s/: line [0-9][0-9]*:  *[0-9][0-9]*\( Aborted\| Killed\)/:\1/" \
>  -e "s/qemu-io> //g"

I'm not seeing anything specific to GNU sed in this (long) sed script;
can we relax this one to plain 'sed'?  Use of s#some/text## might be
easier than having to s/some\/text//, but that's not essential.

>  }
> @@ -80,7 +80,7 @@ _filter_qemu_io()
>  # replace occurrences of QEMU_PROG with "qemu"
>  _filter_qemu()
>  {
> -

Re: [PATCH 3/3] iotests/graph-changes-while-io: New test

2022-02-15 Thread Eric Blake

On Tue, Feb 15, 2022 at 02:57:27PM +0100, Hanna Reitz wrote:
> Test the following scenario:
> 1. Some block node (null-co) attached to a user (here: NBD server) that
>performs I/O and keeps the node in an I/O thread
> 2. Repeatedly run blockdev-add/blockdev-del to add/remove an overlay
>to/from that node
> 
> Each blockdev-add triggers bdrv_refresh_limits(), and because
> blockdev-add runs in the main thread, it does not stop the I/O requests.
> I/O can thus happen while the limits are refreshed, and when such a
> request sees a temporarily invalid block limit (e.g. alignment is 0),
> this may easily crash qemu (or the storage daemon in this case).
> 
> The block layer needs to ensure that I/O requests to a node are paused
> while that node's BlockLimits are refreshed.
> 
> Signed-off-by: Hanna Reitz 
> ---
>  .../qemu-iotests/tests/graph-changes-while-io | 91 +++
>  .../tests/graph-changes-while-io.out  |  5 +
>  2 files changed, 96 insertions(+)
>  create mode 100755 tests/qemu-iotests/tests/graph-changes-while-io
>  create mode 100644 tests/qemu-iotests/tests/graph-changes-while-io.out

Reviewed-by: Eric Blake 

Since we found this with the help of NBD, should I be considering this
series for my NBD queue, or is there a better block-related maintainer
queue that it should go through?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[PATCH 0/4] iotests: add detailed tracebacks to qemu_img() failures

2022-02-15 Thread John Snow

It came to my attention via th_huth that iotest 065 would crash in a way
that was largely silent, except for the async QMP traces. The real cause
turns out to be that iotest 065 does not support ztsd being compiled out
of the build, so the qemu-img command fails ... silently.
(And then everything after it explodes nastily.)

Almost every user of iotests.qemu_img() does not check the return code,
a few assert it to be zero, and exactly one user asserts it to be
non-zero. qemu_img() is already throwing away process output, too, so no
callers are using that information, either.

Therefore: add an Exception to qemu_img(), with some zazz.

RFC: I didn't attempt to clean up the other dozen function helpers we
have. It's possible we can unify and consolidate cases a bit, but I
wanted to test the waters with a smaller...ish incision first. qemu_io
and qemu_nbd are candidates for this treatment, and using the same
terminal decorations for the VMLaunchError in machine.py is also worth
looking into.

To see this in action, you could configure your QEMU to omit zstd
support and then run ./check -qcow2 065. It'd look something like below:

After:

065   fail   [16:26:17] [16:26:18]   0.3s   (last: 0.4s)  failed, exit 
status 1
--- /home/jsnow/src/qemu/tests/qemu-iotests/065.out
+++ 065.out.bad
@@ -1,5 +1,64 @@
-
+EEE.
+==
+ERROR: test_qmp (__main__.TestQCow3LazyQMP)
+--
+Traceback (most recent call last):
+  File "/home/jsnow/src/qemu/tests/qemu-iotests/065", line 74, in setUp
+self.TestImageInfoSpecific.setUp(self)
+  File "/home/jsnow/src/qemu/tests/qemu-iotests/065", line 38, in setUp
+qemu_img('create', '-f', iotests.imgfmt, '-o', self.img_options,
+  File "/home/jsnow/src/qemu/tests/qemu-iotests/iotests.py", line 289, in 
qemu_img
+raise VerboseProcessError(
+iotests.VerboseProcessError: Command 
'['/home/jsnow/src/qemu/bin/git/tests/qemu-iotests/../../qemu-img', 'create', 
'-f', 'qcow2', '-o', 'compat=1.1,lazy_refcounts=on,compression_type=zstd', 
'/home/jsnow/src/qemu/bin/git/tests/qemu-iotests/scratch/test.img', '128K']' 
returned non-zero exit status 1.
+  ┏━ output 
━━━┓
+  ┃ Formatting 
┃
+  ┃ '/home/jsnow/src/qemu/bin/git/tests/qemu-iotests/scratch/test.img',
┃
+  ┃ fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zstd 
┃
+  ┃ size=131072 compat=1.1 lazy_refcounts=on refcount_bits=16  
┃
+  ┃ qemu-img:  
┃
+  ┃ /home/jsnow/src/qemu/bin/git/tests/qemu-iotests/scratch/test.img:  
┃
+  ┃ Parameter 'compression-type' does not accept value 'zstd'  
┃
+  
┗┛
+
+==
+ERROR: test_human (__main__.TestQCow3NotLazy)
+--
+Traceback (most recent call last):
+  File "/home/jsnow/src/qemu/tests/qemu-iotests/065", line 38, in setUp
+qemu_img('create', '-f', iotests.imgfmt, '-o', self.img_options,
+  File "/home/jsnow/src/qemu/tests/qemu-iotests/iotests.py", line 289, in 
qemu_img
+raise VerboseProcessError(
+iotests.VerboseProcessError: Command 
'['/home/jsnow/src/qemu/bin/git/tests/qemu-iotests/../../qemu-img', 'create', 
'-f', 'qcow2', '-o', 'compat=1.1,lazy_refcounts=off,compression_type=zstd', 
'/home/jsnow/src/qemu/bin/git/tests/qemu-iotests/scratch/test.img', '128K']' 
returned non-zero exit status 1.
+  ┏━ output 
━━━┓
+  ┃ Formatting 
┃
+  ┃ '/home/jsnow/src/qemu/bin/git/tests/qemu-iotests/scratch/test.img',
┃
+  ┃ fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zstd 
┃
+  ┃ size=131072 compat=1.1 lazy_refcounts=off refcount_bits=16 
┃
+  ┃ qemu-img:  
┃
+  ┃ /home/jsnow/src/qemu/bin/git/tests/qemu-iotests/scratch/test.img:  
┃
+  ┃ Parameter 'compression-type' does not accept value 'zstd'  
┃
+  
┗┛
+
+==
+ERROR: test_json (__main__.TestQCow3NotLazy)
+--
+Traceback (most recent call last):
+  File "/home/jsnow/src/qemu/tests/qemu-iotests/065", line 38, in setUp
+qemu_img('create', '-f', iotests.imgfmt, '-o', self.img_options,
+  File "/home/jsnow/src/qemu/tests/qemu-iotests/iotests.py", line 289, in 
qemu_img
+

[PATCH 3/4] iotests: Remove explicit checks for qemu_img() == 0

2022-02-15 Thread John Snow

qemu_img() returning zero ought to be the rule, not the
exception. Remove all explicit checks against the condition in
preparation for making non-zero returns an Exception.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/163 |  9 +++--
 tests/qemu-iotests/216 |  6 +++---
 tests/qemu-iotests/218 |  2 +-
 tests/qemu-iotests/224 | 11 +--
 tests/qemu-iotests/228 | 12 ++--
 tests/qemu-iotests/257 |  3 +--
 tests/qemu-iotests/258 |  4 ++--
 tests/qemu-iotests/310 | 14 +++---
 tests/qemu-iotests/tests/block-status-cache|  3 +--
 tests/qemu-iotests/tests/image-fleecing|  4 ++--
 tests/qemu-iotests/tests/mirror-ready-cancel-error |  6 ++
 tests/qemu-iotests/tests/mirror-top-perms  |  3 +--
 .../qemu-iotests/tests/remove-bitmap-from-backing  |  8 
 tests/qemu-iotests/tests/stream-error-on-reset |  4 ++--
 14 files changed, 40 insertions(+), 49 deletions(-)

diff --git a/tests/qemu-iotests/163 b/tests/qemu-iotests/163
index b8bfc95358e..e4cd4b230f3 100755
--- a/tests/qemu-iotests/163
+++ b/tests/qemu-iotests/163
@@ -107,8 +107,7 @@ class ShrinkBaseClass(iotests.QMPTestCase):
 
 if iotests.imgfmt == 'raw':
 return
-self.assertEqual(qemu_img('check', test_img), 0,
- "Verifying image corruption")
+qemu_img('check', test_img)
 
 def test_empty_image(self):
 qemu_img('resize',  '-f', iotests.imgfmt, '--shrink', test_img,
@@ -130,8 +129,7 @@ class ShrinkBaseClass(iotests.QMPTestCase):
 qemu_img('resize',  '-f', iotests.imgfmt, '--shrink', test_img,
  self.shrink_size)
 
-self.assertEqual(qemu_img("compare", test_img, check_img), 0,
- "Verifying image content")
+qemu_img("compare", test_img, check_img)
 
 self.image_verify()
 
@@ -146,8 +144,7 @@ class ShrinkBaseClass(iotests.QMPTestCase):
 qemu_img('resize',  '-f', iotests.imgfmt, '--shrink', test_img,
  self.shrink_size)
 
-self.assertEqual(qemu_img("compare", test_img, check_img), 0,
- "Verifying image content")
+qemu_img("compare", test_img, check_img)
 
 self.image_verify()
 
diff --git a/tests/qemu-iotests/216 b/tests/qemu-iotests/216
index c02f8d2880f..88b385afa30 100755
--- a/tests/qemu-iotests/216
+++ b/tests/qemu-iotests/216
@@ -51,10 +51,10 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('--- Setting up images ---')
 log('')
 
-assert qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M') == 0
+qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M')
 assert qemu_io_silent(base_img_path, '-c', 'write -P 1 0M 1M') == 0
-assert qemu_img('create', '-f', iotests.imgfmt, '-b', base_img_path,
-'-F', iotests.imgfmt, top_img_path) == 0
+qemu_img('create', '-f', iotests.imgfmt, '-b', base_img_path,
+ '-F', iotests.imgfmt, top_img_path)
 assert qemu_io_silent(top_img_path,  '-c', 'write -P 2 1M 1M') == 0
 
 log('Done')
diff --git a/tests/qemu-iotests/218 b/tests/qemu-iotests/218
index 4922b4d3b6f..853ed52b349 100755
--- a/tests/qemu-iotests/218
+++ b/tests/qemu-iotests/218
@@ -145,7 +145,7 @@ log('')
 with iotests.VM() as vm, \
  iotests.FilePath('src.img') as src_img_path:
 
-assert qemu_img('create', '-f', iotests.imgfmt, src_img_path, '64M') == 0
+qemu_img('create', '-f', iotests.imgfmt, src_img_path, '64M')
 assert qemu_io_silent('-f', iotests.imgfmt, src_img_path,
   '-c', 'write -P 42 0M 64M') == 0
 
diff --git a/tests/qemu-iotests/224 b/tests/qemu-iotests/224
index 38dd1536254..c31c55b49d2 100755
--- a/tests/qemu-iotests/224
+++ b/tests/qemu-iotests/224
@@ -47,12 +47,11 @@ for filter_node_name in False, True:
  iotests.FilePath('top.img') as top_img_path, \
  iotests.VM() as vm:
 
-assert qemu_img('create', '-f', iotests.imgfmt,
-base_img_path, '64M') == 0
-assert qemu_img('create', '-f', iotests.imgfmt, '-b', base_img_path,
-'-F', iotests.imgfmt, mid_img_path) == 0
-assert qemu_img('create', '-f', iotests.imgfmt, '-b', mid_img_path,
-'-F', iotests.imgfmt, top_img_path) == 0
+qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M')
+qemu_img('create', '-f', iotests.imgfmt, '-b', base_img_path,
+ '-F', iotests.imgfmt, mid_img_path)
+qemu_img('create', '-f', iotests.imgfmt, '-b', mid_img_path,
+ '-F', iotests.imgfmt, top_img_path)
 
 # Something to commit
 assert qemu_io_silent(mid_img_path, '-c', 'write -P 1 0 1M') == 0
diff --git

[PATCH 2/4] iotests: add VerboseProcessError

2022-02-15 Thread John Snow

This adds an Exception that extends the garden variety
subprocess.CalledProcessError. When this exception is raised, it will
still be caught when selecting for the stdlib variant.

The difference is that the str() method of this Exception also adds the
stdout/stderr logs. In effect, if this exception goes unhandled, Python
will print the output in a nice, highlighted box to the terminal so that
it's easy to spot.

This should save some headache from having to re-run test suites with
debugging enabled if we augment the exceptions we print more information
in the default case.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/iotests.py | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 6ba65eb1ffe..7df393df2c3 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -30,6 +30,7 @@
 import struct
 import subprocess
 import sys
+import textwrap
 import time
 from typing import (Any, Callable, Dict, Iterable, Iterator,
 List, Optional, Sequence, TextIO, Tuple, Type, TypeVar)
@@ -39,6 +40,7 @@
 
 from qemu.machine import qtest
 from qemu.qmp import QMPMessage
+from qemu.utils import enboxify
 
 # Use this logger for logging messages directly from the iotests module
 logger = logging.getLogger('qemu.iotests')
@@ -117,6 +119,38 @@
 sample_img_dir = os.environ['SAMPLE_IMG_DIR']
 
 
+class VerboseProcessError(subprocess.CalledProcessError):
+"""
+The same as CalledProcessError, but more verbose.
+
+This is useful for debugging failed calls during test executions.
+The return code, signal (if any), and terminal output will be displayed
+on unhandled exceptions.
+"""
+def summary(self) -> str:
+return super().__str__()
+
+def __str__(self) -> str:
+lmargin = '  '
+width = shutil.get_terminal_size()[0] - len(lmargin)
+sections = []
+
+if self.stdout:
+name = 'output' if self.stderr is None else 'stdout'
+sections.append(enboxify(self.stdout, width, name))
+else:
+sections.append(f"{name}: N/A")
+
+if self.stderr:
+sections.append(enboxify(self.stderr, width, 'stderr'))
+elif self.stderr is not None:
+sections.append("stderr: N/A")
+
+return os.linesep.join((
+self.summary(),
+textwrap.indent(os.linesep.join(sections), prefix=lmargin),
+))
+
 @contextmanager
 def change_log_level(
 logger_name: str, level: int = logging.CRITICAL) -> Iterator[None]:
-- 
2.34.1

[PATCH 4/4] iotests: make qemu_img raise on non-zero rc by default

2022-02-15 Thread John Snow

re-configure qemu_img() into a function that will by default raise a
VerboseProcessException (extended from CalledProcessException) on
non-zero return codes. This will produce a stack trace that will show
the command line arguments and return code from the failed process run.

Users that want something more flexible (There appears to be only one)
can use check=False and manage the return themselves.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/257|  8 --
 tests/qemu-iotests/iotests.py | 53 +++
 2 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/tests/qemu-iotests/257 b/tests/qemu-iotests/257
index fb5359c581e..e7e7a2317e3 100755
--- a/tests/qemu-iotests/257
+++ b/tests/qemu-iotests/257
@@ -241,11 +241,13 @@ def compare_images(image, reference, baseimg=None, 
expected_match=True):
 expected_ret = 0 if expected_match else 1
 if baseimg:
 qemu_img("rebase", "-u", "-b", baseimg, '-F', iotests.imgfmt, image)
-ret = qemu_img("compare", image, reference)
+
+sub = qemu_img("compare", image, reference, check=False)
+
 log('qemu_img compare "{:s}" "{:s}" ==> {:s}, {:s}'.format(
 image, reference,
-"Identical" if ret == 0 else "Mismatch",
-"OK!" if ret == expected_ret else "ERROR!"),
+"Identical" if sub.returncode == 0 else "Mismatch",
+"OK!" if sub.returncode == expected_ret else "ERROR!"),
 filters=[iotests.filter_testfiles])
 
 def test_bitmap_sync(bsync_mode, msync_mode='bitmap', failure=None):
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 7df393df2c3..8a244fafece 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -249,9 +249,51 @@ def qemu_img_pipe_and_status(*args: str) -> Tuple[str, 
int]:
 return qemu_tool_pipe_and_status('qemu-img', full_args,
  drop_successful_output=is_create)
 
-def qemu_img(*args: str) -> int:
-'''Run qemu-img and return the exit code'''
-return qemu_img_pipe_and_status(*args)[1]
+def qemu_img(*args: str, check: bool = True, combine_stdio: bool = True
+ ) -> subprocess.CompletedProcess[str]:
+"""
+Run qemu_img, returning a CompletedProcess instance.
+
+The CompletedProcess object has args, returncode, and output properties.
+If streams are not combined, It will also have stdout/stderr properties.
+
+:param args: command-line arguments to qemu_img.
+:param check: set to False to suppress VerboseProcessError.
+:param combine_stdio: set to False to keep stdout/stderr separated.
+
+:raise VerboseProcessError:
+On non-zero exit code, when 'check=True' was provided. This
+exception has 'stderr', 'stdout' and 'returncode' properties
+that may be inspected to show greater detail. If this exception
+is not handled, The command-line, return code, and all console
+output will be included at the bottom of the stack trace.
+"""
+full_args = qemu_img_args + qemu_img_create_prepare_args(list(args))
+
+subp = subprocess.run(
+full_args,
+stdout=subprocess.PIPE,
+stderr=subprocess.STDOUT if combine_stdio else subprocess.PIPE,
+universal_newlines=True,
+check=False
+)
+
+# For behavioral consistency with qemu_tool_pipe_and_status():
+# Print out an immediate error when the return code is negative.
+if subp.returncode < 0:
+cmd = ' '.join(full_args)
+sys.stderr.write(
+f'qemu-img received signal {-subp.returncode}: {cmd}\n')
+
+if check and subp.returncode:
+raise VerboseProcessError(
+subp.returncode, full_args,
+output=subp.stdout,
+stderr=subp.stderr,
+)
+
+return subp
+
 
 def ordered_qmp(qmsg, conv_keys=True):
 # Dictionaries are not ordered prior to 3.6, therefore:
@@ -469,8 +511,9 @@ def qemu_nbd_popen(*args):
 
 def compare_images(img1, img2, fmt1=imgfmt, fmt2=imgfmt):
 '''Return True if two image files are identical'''
-return qemu_img('compare', '-f', fmt1,
-'-F', fmt2, img1, img2) == 0
+res = qemu_img('compare', '-f', fmt1,
+   '-F', fmt2, img1, img2, check=False)
+return res.returncode == 0
 
 def create_image(name, size):
 '''Create a fully-allocated raw image with sector markers'''
-- 
2.34.1

[PATCH 1/4] python/utils: add enboxify() text decoration utility

2022-02-15 Thread John Snow

>>> print(enboxify(msg, width=72, name="commit message"))
┏━ commit message ━┓
┃ enboxify() takes a chunk of text and wraps it in a text art box that ┃
┃  adheres to a specified width. An optional title label may be given, ┃
┃  and any of the individual glyphs used to draw the box may be┃
┃ replaced or specified as well.   ┃
┗━━┛

Signed-off-by: John Snow 
---
 python/qemu/utils/__init__.py | 58 +++
 1 file changed, 58 insertions(+)

diff --git a/python/qemu/utils/__init__.py b/python/qemu/utils/__init__.py
index 7f1a5138c4b..f785316f230 100644
--- a/python/qemu/utils/__init__.py
+++ b/python/qemu/utils/__init__.py
@@ -15,7 +15,10 @@
 # the COPYING file in the top-level directory.
 #
 
+import os
 import re
+import shutil
+import textwrap
 from typing import Optional
 
 # pylint: disable=import-error
@@ -23,6 +26,7 @@
 
 
 __all__ = (
+'enboxify',
 'get_info_usernet_hostfwd_port',
 'kvm_available',
 'list_accel',
@@ -43,3 +47,57 @@ def get_info_usernet_hostfwd_port(info_usernet_output: str) 
-> Optional[int]:
 if match is not None:
 return int(match[1])
 return None
+
+
+# pylint: disable=too-many-arguments
+def enboxify(
+content: str = '',
+width: Optional[int] = None,
+name: Optional[str] = None,
+padding: int = 1,
+upper_left: str = '┏',
+upper_right: str = '┓',
+lower_left: str = '┗',
+lower_right: str = '┛',
+horizontal: str = '━',
+vertical: str = '┃',
+) -> str:
+"""
+Wrap some text into a text art box of a given width.
+
+:param content: The text to wrap into a box.
+:param width: The number of columns (including the box itself).
+:param name: A label to apply to the upper-left of the box.
+:param padding: How many columns of padding to apply inside.
+"""
+if width is None:
+width = shutil.get_terminal_size()[0]
+prefix = vertical + (' ' * padding)
+suffix = (' ' * padding) + vertical
+lwidth = width - len(suffix)
+
+def _bar(name: Optional[str], top: bool = True) -> str:
+ret = upper_left if top else lower_left
+right = upper_right if top else lower_right
+if name is not None:
+ret += f"{horizontal} {name} "
+
+assert width is not None
+filler_len = width - len(ret) - len(right)
+ret += f"{horizontal * filler_len}{right}"
+return ret
+
+def _wrap(line: str) -> str:
+return os.linesep.join([
+wrapped_line.ljust(lwidth) + suffix
+for wrapped_line in textwrap.wrap(
+line, width=lwidth, initial_indent=prefix,
+subsequent_indent=prefix, replace_whitespace=False,
+drop_whitespace=False, break_on_hyphens=False)
+])
+
+return os.linesep.join((
+_bar(name, top=True),
+os.linesep.join(_wrap(line) for line in content.splitlines()),
+_bar(None, top=False),
+))
-- 
2.34.1

Re: [PATCH v2] nbd/server: Allow MULTI_CONN for shared writable exports

2022-02-15 Thread Vladimir Sementsov-Ogievskiy


15.02.2022 20:18, Eric Blake wrote:

According to the NBD spec, a server advertising
NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
not see any cache inconsistencies: when properly separated by a single
flush, actions performed by one client will be visible to another
client, regardless of which client did the flush.  We satisfy these
conditions in qemu when our block layer is backed by the local
filesystem (by virtue of the semantics of fdatasync(), and the fact
that qemu itself is not buffering writes beyond flushes).  It is
harder to state whether we satisfy these conditions for network-based
protocols, so the safest course of action is to allow users to opt-in
to advertising multi-conn.  We may later tweak defaults to advertise
by default when the block layer can confirm that the underlying
protocol driver is cache consistent between multiple writers, but for
now, this at least allows savvy users (such as virt-v2v or nbdcopy) to
explicitly start qemu-nbd or qemu-storage-daemon with multi-conn
advertisement in a known-safe setup where the client end can then
benefit from parallel clients.

Note, however, that we don't want to advertise MULTI_CONN when we know


Here we change existing default behavior. Pre-patch MULTI_CONN is still 
advertised for readonly export even when connections number is limited to 1.

Still may be it's a good change? Let's then note it here.. Could it break some 
existing clients? Hard to imagine.

Otherwise patch looks good to me, nonsignificant notes below.


that a second client cannot connect (for historical reasons, qemu-nbd
defaults to a single connection while nbd-server-add and QMP commands
default to unlimited connections; but we already have existing means
to let either style of NBD server creation alter those defaults).  The
harder part of this patch is setting up an iotest to demonstrate
behavior of multiple NBD clients to a single server.  It might be
possible with parallel qemu-io processes, but concisely managing that
in shell is painful.  I found it easier to do by relying on the libnbd
project's nbdsh, which means this test will be skipped on platforms
where that is not available.

Signed-off-by: Eric Blake 
Fixes: https://bugzilla.redhat.com/1708300
---



[..]


+##
+# @NbdExportMultiConn:
+#
+# Possible settings for advertising NBD multiple client support.
+#
+# @off: Do not advertise multiple clients.
+#
+# @on: Allow multiple clients (for writable clients, this is only safe
+#  if the underlying BDS is cache-consistent, such as when backed
+#  by the raw file driver); ignored if the NBD server was set up
+#  with max-connections of 1.
+#
+# @auto: Behaves like @off if the export is writable, and @on if the
+#export is read-only.
+#
+# Since: 7.0
+##
+{ 'enum': 'NbdExportMultiConn',
+  'data': ['off', 'on', 'auto'] }


Probably we should use generic OnOffAuto type that we already have.. But in 
your way documentation looks better.


+
  ##
  # @BlockExportOptionsNbd:
  #


[..]


+
+_make_test_img 4M
+$QEMU_IO -c 'w -P 1 0 2M' -c 'w -P 2 2M 2M' "$TEST_IMG" | _filter_qemu_io
+_launch_qemu 2> >(_filter_nbd)
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"qmp_capabilities"}' "return"
+_send_qemu_cmd $QEMU_HANDLE '{"execute":"blockdev-add",
+  "arguments":{"driver":"qcow2", "node-name":"n",
+"file":{"driver":"file", "filename":"'"$TEST_IMG"'"}}}' "return"
+export nbd_unix_socket
+


[..]


+nbdsh -c '
+import os
+sock = os.getenv("nbd_unix_socket")


Just interested why you pass it through environment and no simply do '... sock = 
"'"$nbd_unix_socket"'"... '


+h = []
+
+for i in range(3):
+  h.append(nbd.NBD())


Looks like Python:) And if something looks like Python, it's Python, we know. 
Should I say about PEP8 that recommends 4 whitespaces?)


+  h[i].connect_unix(sock)
+  assert h[i].can_multi_conn()
+
+buf1 = h[0].pread(1024 * 1024, 0)
+if buf1 != b"\x01" * 1024 * 1024:
+  print("Unexpected initial read")
+buf2 = b"\x03" * 1024 * 1024
+h[1].pwrite(buf2, 0)
+h[2].flush()
+buf1 = h[0].pread(1024 * 1024, 0)
+if buf1 == buf2:
+  print("Flush appears to be consistent across connections")
+
+for i in range(3):
+  h[i].shutdown()
+'


--
Best regards,
Vladimir

Re: [PATCH v2] nbd/server: Allow MULTI_CONN for shared writable exports

2022-02-15 Thread Nir Soffer

On Tue, Feb 15, 2022 at 7:22 PM Eric Blake  wrote:

> According to the NBD spec, a server advertising
> NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
> not see any cache inconsistencies: when properly separated by a single
> flush, actions performed by one client will be visible to another
> client, regardless of which client did the flush.  We satisfy these
> conditions in qemu when our block layer is backed by the local
> filesystem (by virtue of the semantics of fdatasync(), and the fact
> that qemu itself is not buffering writes beyond flushes).  It is
> harder to state whether we satisfy these conditions for network-based
> protocols, so the safest course of action is to allow users to opt-in
> to advertising multi-conn.  We may later tweak defaults to advertise
> by default when the block layer can confirm that the underlying
> protocol driver is cache consistent between multiple writers, but for
> now, this at least allows savvy users (such as virt-v2v or nbdcopy) to
> explicitly start qemu-nbd or qemu-storage-daemon with multi-conn
> advertisement in a known-safe setup where the client end can then
> benefit from parallel clients.
>

It makes sense, and will be used by oVirt. Actually we are already using
multiple connections for writing about 2 years, based on your promise
that if every client writes to district  areas this is safe.

Note, however, that we don't want to advertise MULTI_CONN when we know
> that a second client cannot connect (for historical reasons, qemu-nbd
> defaults to a single connection while nbd-server-add and QMP commands
> default to unlimited connections; but we already have existing means
> to let either style of NBD server creation alter those defaults).  The
> harder part of this patch is setting up an iotest to demonstrate
> behavior of multiple NBD clients to a single server.  It might be
> possible with parallel qemu-io processes, but concisely managing that
> in shell is painful.  I found it easier to do by relying on the libnbd
> project's nbdsh, which means this test will be skipped on platforms
> where that is not available.
>
> Signed-off-by: Eric Blake 
> Fixes: https://bugzilla.redhat.com/1708300
> ---
>
> v1 was in Aug 2021 [1], with further replies in Sep [2] and Oct [3].
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg04900.html
> [2] https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg00038.html
> [3] https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg06744.html
>
> Since then, I've tweaked the QAPI to mention 7.0 (instead of 6.2), and
> reworked the logic so that default behavior is unchanged for now
> (advertising multi-conn on a writable export requires opt-in during
> the command line or QMP, but remains default for a readonly export).
> I've also expanded the amount of testing done in the new iotest.
>
>  docs/interop/nbd.txt   |   1 +
>  docs/tools/qemu-nbd.rst|   3 +-
>  qapi/block-export.json |  34 +++-
>  include/block/nbd.h|   3 +-
>  blockdev-nbd.c |   5 +
>  nbd/server.c   |  27 ++-
>  MAINTAINERS|   1 +
>  tests/qemu-iotests/tests/nbd-multiconn | 188 +
>  tests/qemu-iotests/tests/nbd-multiconn.out | 112 
>  9 files changed, 363 insertions(+), 11 deletions(-)
>  create mode 100755 tests/qemu-iotests/tests/nbd-multiconn
>  create mode 100644 tests/qemu-iotests/tests/nbd-multiconn.out
>
> diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
> index bdb0f2a41aca..6c99070b99c8 100644
> --- a/docs/interop/nbd.txt
> +++ b/docs/interop/nbd.txt
> @@ -68,3 +68,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:",
> NBD_CMD_CACHE
>  * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports,
>  NBD_CMD_FLAG_FAST_ZERO
>  * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
> +* 7.0: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
> diff --git a/docs/tools/qemu-nbd.rst b/docs/tools/qemu-nbd.rst
> index 6031f9689312..1de785524c36 100644
> --- a/docs/tools/qemu-nbd.rst
> +++ b/docs/tools/qemu-nbd.rst
> @@ -139,8 +139,7 @@ driver options if ``--image-opts`` is specified.
>  .. option:: -e, --shared=NUM
>
>Allow up to *NUM* clients to share the device (default
> -  ``1``), 0 for unlimited. Safe for readers, but for now,
> -  consistency is not guaranteed between multiple writers.
> +  ``1``), 0 for unlimited.
>

Removing the note means that now consistency is guaranteed between
multiple writers, no?

Or maybe we want to mention here that consistency depends on the protocol
and users can opt in, or refer to the section where this is discussed?

 .. option:: -t, --persistent
>
> diff --git a/qapi/block-export.json b/qapi/block-export.json
> index f183522d0d2c..0a27e8ee84f9 100644
> --- a/qapi/block-export.json
> +++ b/qapi/block-export.json
> @@ -21,7 +21,9 @@
>  #

Re: [PULL 0/6] hw/nvme updates

2022-02-15 Thread Peter Maydell

On Mon, 14 Feb 2022 at 08:08, Klaus Jensen  wrote:
>
> From: Klaus Jensen 
>
> Hi Peter,
>
> The following changes since commit 48033ad678ae2def43bf0d543a2c4c3d2a93feaf:
>
>   Merge remote-tracking branch 
> 'remotes/vsementsov/tags/pull-nbd-2022-02-09-v2' into staging (2022-02-12 
> 22:04:07 +)
>
> are available in the Git repository at:
>
>   git://git.infradead.org/qemu-nvme.git tags/nvme-next-pull-request
>
> for you to fetch changes up to e321b4cdc2dd0b5e806ecf759138be7f83774142:
>
>   hw/nvme: add support for zoned random write area (2022-02-14 08:58:29 +0100)
>
> 
> hw/nvme updates
>
>   - fix CVE-2021-3929
>   - add zone random write area support
>   - misc cleanups from Philippe
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.0
for any user-visible changes.

-- PMM

Re: [PULL 0/4] Python patches

2022-02-15 Thread John Snow

On Tue, Feb 15, 2022 at 1:01 PM Peter Maydell  wrote:
>
> On Tue, 15 Feb 2022 at 17:46, John Snow  wrote:
> > Just so I don't leave this thread hanging, I filed a GitLab issue and
> > I'm working on it, but this one isn't as quick to solve as the other.
> >
> > https://gitlab.com/qemu-project/qemu/-/issues/874
>
> Is there anything particular to NetBSD that means it happens
> more often there, or is it just random luck that we hit
> the race there and haven't seen it elsewhere ?
>
> -- PMM

Complete random luck, something jostled loose by the scheduler.

I need to change the interface in the async library entirely to make
the process more granular -- We don't need the granularity in a truly
async mode, but the sync wrapper that allows the existing iotests
corpus to use the library in a synchronous manner *requires* a more
granular connection API, so I have to write one. It's in progress, it
just might be a few more days; verifying and testing the error
pathways has been slow work.

(In detail: python's asyncio.create_unix_server() call combines bind()
+ listen() + accept() into a single discrete step. A synchronous
client, though, needs to have a reprieve from all of those blocking
steps to launch the QEMU process after listen() but before accept() so
it can launch the QEMU process. I was able to pull the bind() step
out, but the async listen() + accept() steps the way I initially wrote
it are inseparable. Live and learn.)

In the meantime, there *IS* a way to use the old library, but I don't
think the environment variable in question is routed down into the VM
tests. I can look at (as a very quick fix) amending the VM launcher to
pass along that environment variable if it sees it set in the host
environment -- that should get you on the old, tried-and-true library
when you want it, and the test should pass.

--js

Re: [PULL 0/4] Python patches

2022-02-15 Thread Peter Maydell

On Tue, 15 Feb 2022 at 17:46, John Snow  wrote:
> Just so I don't leave this thread hanging, I filed a GitLab issue and
> I'm working on it, but this one isn't as quick to solve as the other.
>
> https://gitlab.com/qemu-project/qemu/-/issues/874

Is there anything particular to NetBSD that means it happens
more often there, or is it just random luck that we hit
the race there and haven't seen it elsewhere ?

-- PMM

Re: [PULL 18/38] Remove unnecessary minimum_version_id_old fields

2022-02-15 Thread Peter Maydell

On Tue, 15 Feb 2022 at 17:34, Cédric Le Goater  wrote:
>
> On 2/15/22 18:13, Peter Maydell wrote:
> > No, as the commit message notes, it deliberately did not change
> > that one vmstate, because at the time of writing the patch
> > that was the one vmstate that really was still using
> > load_state_old. As it happens commit 8f91aca7ff0044b hit
> > master first, removing that use of load_state_old (but
> > forgetting to remove the minimum_version_id_old field along
> > with it),
>
> If I remember well, at the time of this patch, we were both working
> on the same part and I thought that removing all minimum_version_id_old
> in one go was better.

I would tend to disagree, but we got to the right place anyway,
so it doesn't matter. I've just posted the patch that removes
the load_state_old and minimum_version_id_old fields from
the struct definition entirely; it's nice to be able to
finally drop that little bit of legacy support.

-- PMM

[PATCH 4/4] block: simplify handling of try to merge different sized bitmaps

2022-02-15 Thread Vladimir Sementsov-Ogievskiy

We have too much logic to simply check that bitmaps are of the same
size. Let's just define that hbitmap_merge() and
bdrv_dirty_bitmap_merge_internal() require their argument bitmaps be of
same size, this simplifies things.

Let's look through the callers:

For backup_init_bcs_bitmap() we already assert that merge can't fail.

In bdrv_reclaim_dirty_bitmap_locked() we gracefully handle the error
that can't happen: successor always has same size as its parent, drop
this logic.

In bdrv_merge_dirty_bitmap() we already has assertion and separate
check. Make the check explicit and improve error message.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block_int.h |  2 +-
 include/qemu/hbitmap.h| 15 ++-
 block/backup.c|  6 ++
 block/dirty-bitmap.c  | 25 ++---
 util/hbitmap.c| 25 +++--
 5 files changed, 22 insertions(+), 51 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 27008cfb22..cc40b6363d 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1382,7 +1382,7 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t offset, 
int64_t bytes);
 
 void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out);
 void bdrv_restore_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *backup);
-bool bdrv_dirty_bitmap_merge_internal(BdrvDirtyBitmap *dest,
+void bdrv_dirty_bitmap_merge_internal(BdrvDirtyBitmap *dest,
   const BdrvDirtyBitmap *src,
   HBitmap **backup, bool lock);
 
diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index 5e71b6d6f7..4dc1c6ad14 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -76,20 +76,9 @@ void hbitmap_truncate(HBitmap *hb, uint64_t size);
  *
  * Store result of merging @a and @b into @result.
  * @result is allowed to be equal to @a or @b.
- *
- * Return true if the merge was successful,
- *false if it was not attempted.
+ * All bitmaps must have same size.
  */
-bool hbitmap_merge(const HBitmap *a, const HBitmap *b, HBitmap *result);
-
-/**
- * hbitmap_can_merge:
- *
- * hbitmap_can_merge(a, b) && hbitmap_can_merge(a, result) is sufficient and
- * necessary for hbitmap_merge will not fail.
- *
- */
-bool hbitmap_can_merge(const HBitmap *a, const HBitmap *b);
+void hbitmap_merge(const HBitmap *a, const HBitmap *b, HBitmap *result);
 
 /**
  * hbitmap_empty:
diff --git a/block/backup.c b/block/backup.c
index 21d5983779..fb3d4b0e13 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -228,15 +228,13 @@ out:
 
 static void backup_init_bcs_bitmap(BackupBlockJob *job)
 {
-bool ret;
 uint64_t estimate;
 BdrvDirtyBitmap *bcs_bitmap = block_copy_dirty_bitmap(job->bcs);
 
 if (job->sync_mode == MIRROR_SYNC_MODE_BITMAP) {
 bdrv_clear_dirty_bitmap(bcs_bitmap, NULL);
-ret = bdrv_dirty_bitmap_merge_internal(bcs_bitmap, job->sync_bitmap,
-   NULL, true);
-assert(ret);
+bdrv_dirty_bitmap_merge_internal(bcs_bitmap, job->sync_bitmap, NULL,
+ true);
 } else if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
 /*
  * We can't hog the coroutine to initialize this thoroughly.
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index d16b96ee62..9d803fcda3 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -309,10 +309,7 @@ BdrvDirtyBitmap 
*bdrv_reclaim_dirty_bitmap_locked(BdrvDirtyBitmap *parent,
 return NULL;
 }
 
-if (!hbitmap_merge(parent->bitmap, successor->bitmap, parent->bitmap)) {
-error_setg(errp, "Merging of parent and successor bitmap failed");
-return NULL;
-}
+hbitmap_merge(parent->bitmap, successor->bitmap, parent->bitmap);
 
 parent->disabled = successor->disabled;
 parent->busy = false;
@@ -899,13 +896,14 @@ bool bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const 
BdrvDirtyBitmap *src,
 goto out;
 }
 
-if (!hbitmap_can_merge(dest->bitmap, src->bitmap)) {
-error_setg(errp, "Bitmaps are incompatible and can't be merged");
+if (bdrv_dirty_bitmap_size(src) != bdrv_dirty_bitmap_size(dest)) {
+error_setg(errp, "Bitmaps are of different sizes (destination size is 
%"
+   PRId64 ", source size is %" PRId64 ") and can't be merged",
+   bdrv_dirty_bitmap_size(dest), bdrv_dirty_bitmap_size(src));
 goto out;
 }
 
-ret = bdrv_dirty_bitmap_merge_internal(dest, src, backup, false);
-assert(ret);
+bdrv_dirty_bitmap_merge_internal(dest, src, backup, false);
 
 out:
 bdrv_dirty_bitmaps_unlock(dest->bs);
@@ -919,18 +917,17 @@ out:
 /**
  * bdrv_dirty_bitmap_merge_internal: merge src into dest.
  * Does NOT check bitmap permissions; not suitable for use as public API.
+ * @dest, @src and @backup (if not NULL) must have same size.
  *
  *

[PATCH 3/4] block: improve block_dirty_bitmap_merge(): don't allocate extra bitmap

2022-02-15 Thread Vladimir Sementsov-Ogievskiy

We don't need extra bitmap. All we need is to backup the original
bitmap when we do first merge. So, drop extra temporary bitmap and work
directly with target and backup.

Note that block_dirty_bitmap_merge() semantics changed: on failure
target may be modified now, and caller should call
block_dirty_bitmap_restore() if needed. The only caller is
qmp_transaction() and ->abort() is called for failed action. So, we are
OK.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/monitor/bitmap-qmp-cmds.c | 31 +--
 1 file changed, 9 insertions(+), 22 deletions(-)

diff --git a/block/monitor/bitmap-qmp-cmds.c b/block/monitor/bitmap-qmp-cmds.c
index a94aaa9fb3..2ce1b4e455 100644
--- a/block/monitor/bitmap-qmp-cmds.c
+++ b/block/monitor/bitmap-qmp-cmds.c
@@ -252,12 +252,15 @@ void qmp_block_dirty_bitmap_disable(const char *node, 
const char *name,
 bdrv_disable_dirty_bitmap(bitmap);
 }
 
+/*
+ * On failure target may be modified. In this case @backup is set.
+ */
 BdrvDirtyBitmap *block_dirty_bitmap_merge(const char *node, const char *target,
   BlockDirtyBitmapMergeSourceList *bms,
   HBitmap **backup, Error **errp)
 {
 BlockDriverState *bs;
-BdrvDirtyBitmap *dst, *src, *anon;
+BdrvDirtyBitmap *dst, *src;
 BlockDirtyBitmapMergeSourceList *lst;
 
 dst = block_dirty_bitmap_lookup(node, target, , errp);
@@ -265,12 +268,6 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char 
*node, const char *target,
 return NULL;
 }
 
-anon = bdrv_create_dirty_bitmap(bs, bdrv_dirty_bitmap_granularity(dst),
-NULL, errp);
-if (!anon) {
-return NULL;
-}
-
 for (lst = bms; lst; lst = lst->next) {
 switch (lst->value->type) {
 const char *name, *node;
@@ -279,8 +276,7 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char *node, 
const char *target,
 src = bdrv_find_dirty_bitmap(bs, name);
 if (!src) {
 error_setg(errp, "Dirty bitmap '%s' not found", name);
-dst = NULL;
-goto out;
+return NULL;
 }
 break;
 case QTYPE_QDICT:
@@ -288,28 +284,19 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char 
*node, const char *target,
 name = lst->value->u.external.name;
 src = block_dirty_bitmap_lookup(node, name, NULL, errp);
 if (!src) {
-dst = NULL;
-goto out;
+return NULL;
 }
 break;
 default:
 abort();
 }
 
-if (!bdrv_merge_dirty_bitmap(anon, src, NULL, errp)) {
-dst = NULL;
-goto out;
+if (!bdrv_merge_dirty_bitmap(dst, src, backup, errp)) {
+return NULL;
 }
+backup = NULL; /* Set once by first bdrv_merge_dirty_bitmap() call */
 }
 
-/* Merge into dst; dst is unchanged on failure. */
-if (!bdrv_merge_dirty_bitmap(dst, anon, backup, errp)) {
-dst = NULL;
-goto out;
-}
-
- out:
-bdrv_release_dirty_bitmap(anon);
 return dst;
 }
 
-- 
2.31.1

[PATCH 2/4] block: block_dirty_bitmap_merge(): fix error path

2022-02-15 Thread Vladimir Sementsov-Ogievskiy

At the end we ignore failure of bdrv_merge_dirty_bitmap() and report
success. And still set errp. That's wrong.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/monitor/bitmap-qmp-cmds.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/monitor/bitmap-qmp-cmds.c b/block/monitor/bitmap-qmp-cmds.c
index 83970b22fa..a94aaa9fb3 100644
--- a/block/monitor/bitmap-qmp-cmds.c
+++ b/block/monitor/bitmap-qmp-cmds.c
@@ -303,7 +303,10 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char 
*node, const char *target,
 }
 
 /* Merge into dst; dst is unchanged on failure. */
-bdrv_merge_dirty_bitmap(dst, anon, backup, errp);
+if (!bdrv_merge_dirty_bitmap(dst, anon, backup, errp)) {
+dst = NULL;
+goto out;
+}
 
  out:
 bdrv_release_dirty_bitmap(anon);
-- 
2.31.1

[PATCH 1/4] block: bdrv_merge_dirty_bitmap: add return value

2022-02-15 Thread Vladimir Sementsov-Ogievskiy

Add return value to bdrv_merge_dirty_bitmap() and use it to reduce
error propagation.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/dirty-bitmap.h| 2 +-
 block/dirty-bitmap.c| 6 --
 block/monitor/bitmap-qmp-cmds.c | 5 +
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 40950ae3d5..f95d350b70 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -77,7 +77,7 @@ void bdrv_dirty_bitmap_set_persistence(BdrvDirtyBitmap 
*bitmap,
bool persistent);
 void bdrv_dirty_bitmap_set_inconsistent(BdrvDirtyBitmap *bitmap);
 void bdrv_dirty_bitmap_set_busy(BdrvDirtyBitmap *bitmap, bool busy);
-void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
+bool bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
  HBitmap **backup, Error **errp);
 void bdrv_dirty_bitmap_skip_store(BdrvDirtyBitmap *bitmap, bool skip);
 bool bdrv_dirty_bitmap_get(BdrvDirtyBitmap *bitmap, int64_t offset);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 0ef46163e3..d16b96ee62 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -881,10 +881,10 @@ bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap 
*bitmap,
  *
  * @backup: If provided, make a copy of dest here prior to merge.
  */
-void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
+bool bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
  HBitmap **backup, Error **errp)
 {
-bool ret;
+bool ret = false;
 
 bdrv_dirty_bitmaps_lock(dest->bs);
 if (src->bs != dest->bs) {
@@ -912,6 +912,8 @@ out:
 if (src->bs != dest->bs) {
 bdrv_dirty_bitmaps_unlock(src->bs);
 }
+
+return ret;
 }
 
 /**
diff --git a/block/monitor/bitmap-qmp-cmds.c b/block/monitor/bitmap-qmp-cmds.c
index 9f11deec64..83970b22fa 100644
--- a/block/monitor/bitmap-qmp-cmds.c
+++ b/block/monitor/bitmap-qmp-cmds.c
@@ -259,7 +259,6 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char *node, 
const char *target,
 BlockDriverState *bs;
 BdrvDirtyBitmap *dst, *src, *anon;
 BlockDirtyBitmapMergeSourceList *lst;
-Error *local_err = NULL;
 
 dst = block_dirty_bitmap_lookup(node, target, , errp);
 if (!dst) {
@@ -297,9 +296,7 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char *node, 
const char *target,
 abort();
 }
 
-bdrv_merge_dirty_bitmap(anon, src, NULL, _err);
-if (local_err) {
-error_propagate(errp, local_err);
+if (!bdrv_merge_dirty_bitmap(anon, src, NULL, errp)) {
 dst = NULL;
 goto out;
 }
-- 
2.31.1

[PATCH 0/4] block/dirty-bitmaps: fix and improve bitmap merge

2022-02-15 Thread Vladimir Sementsov-Ogievskiy

Hi all!

Here are some good refactoring and fix of bitmap merge paths.

Vladimir Sementsov-Ogievskiy (4):
  block: bdrv_merge_dirty_bitmap: add return value
  block: block_dirty_bitmap_merge(): fix error path
  block: improve block_dirty_bitmap_merge(): don't allocate extra bitmap
  block: simplify handling of try to merge different sized bitmaps

 include/block/block_int.h   |  2 +-
 include/block/dirty-bitmap.h|  2 +-
 include/qemu/hbitmap.h  | 15 ++-
 block/backup.c  |  6 ++
 block/dirty-bitmap.c| 31 ++-
 block/monitor/bitmap-qmp-cmds.c | 31 +--
 util/hbitmap.c  | 25 +++--
 7 files changed, 36 insertions(+), 76 deletions(-)

-- 
2.31.1

Re: [PULL 0/4] Python patches

2022-02-15 Thread John Snow

On Tue, Feb 8, 2022 at 9:40 AM Peter Maydell  wrote:
>
> On Thu, 3 Feb 2022 at 23:22, John Snow  wrote:
> >
> > On Thu, Feb 3, 2022 at 11:52 AM Peter Maydell  
> > wrote:
> > >
> > > On Thu, 3 Feb 2022 at 16:38, John Snow  wrote:
> > >
> > > > On Thu, Feb 3, 2022, 11:20 AM Peter Maydell  
> > > > wrote:
> > > >> Summary of Failures:
> > > >>
> > > >> 1/1 qemu:block / qemu-iotests qcow2 ERROR  243.14s   exit 
> > > >> status 1
> >
> > I'm not too familiar with this new test runner, yet. (Is this error
> > even anything to do with the python lib? I guess I can't rule it
> > out...)
> > I just got a clean run of 'make vm-build-netbsd', so I'm using that
> > output as reference and making some guesses.
>
> Rerunning on the netbsd VM with Paolo's "revert the iotests
> conversion" patch, here's the output from a failing run, where
> iotest 041 failed:
>
> TEST   iotest-qcow2: 041 [fail]
> QEMU  --
> "/home/qemu/qemu-test.Kywnb7/build/tests/qemu-iotests/../../qemu-system-aarch64"
> -nodefaults -display none -accel qtest -machine virt
> QEMU_IMG  --
> "/home/qemu/qemu-test.Kywnb7/build/tests/qemu-iotests/../../qemu-img"
> QEMU_IO   --
> "/home/qemu/qemu-test.Kywnb7/build/tests/qemu-iotests/../../qemu-io"
> --cache writeback --aio threads -f qcow2
> QEMU_NBD  --
> "/home/qemu/qemu-test.Kywnb7/build/tests/qemu-iotests/../../qemu-nbd"
> IMGFMT-- qcow2
> IMGPROTO  -- file
> PLATFORM  -- NetBSD/amd64 localhost 9.2
> TEST_DIR  -- /home/qemu/qemu-test.Kywnb7/build/tests/qemu-iotests/scratch
> SOCK_DIR  -- /tmp/tmp6fiu68sr
> GDB_OPTIONS   --
> VALGRIND_QEMU --
> PRINT_QEMU_OUTPUT --
>
> --- /home/qemu/qemu-test.Kywnb7/src/tests/qemu-iotests/041.out
> +++ 041.out.bad
> @@ -1,5 +1,44 @@
> -...
> +ERROR:qemu.aqmp.qmp_client.qemu-14411:Failed
> to establish connection: concurrent.futures._base.CancelledError
> +E..
> +==
> +ERROR: test_mirror_to_self (__main__.TestSingleBlockdev)
> +--
> +Traceback (most recent call last):
> +  File "/home/qemu/qemu-test.Kywnb7/src/python/qemu/machine/machine.py",
> line 428, in launch
> +self._launch()
> +  File "/home/qemu/qemu-test.Kywnb7/src/python/qemu/machine/machine.py",
> line 467, in _launch
> +self._post_launch()
> +  File "/home/qemu/qemu-test.Kywnb7/src/python/qemu/machine/qtest.py",
> line 147, in _post_launch
> +super()._post_launch()
> +  File "/home/qemu/qemu-test.Kywnb7/src/python/qemu/machine/machine.py",
> line 369, in _post_launch
> +self._qmp.accept(self._qmp_timer)
> +  File "/home/qemu/qemu-test.Kywnb7/src/python/qemu/aqmp/legacy.py",
> line 95, in accept
> +timeout
> +  File "/home/qemu/qemu-test.Kywnb7/src/python/qemu/aqmp/legacy.py",
> line 68, in _sync
> +asyncio.wait_for(future, timeout=timeout)
> +  File "/usr/pkg/lib/python3.7/asyncio/base_events.py", line 587, in
> run_until_complete
> +return future.result()
> +  File "/usr/pkg/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
> +raise futures.TimeoutError()
> +concurrent.futures._base.TimeoutError
> +
> +The above exception was the direct cause of the following exception:
> +
> +Traceback (most recent call last):
> +  File "/home/qemu/qemu-test.Kywnb7/src/tests/qemu-iotests/041", line
> 233, in setUp
> +TestSingleDrive.setUp(self)
> +  File "/home/qemu/qemu-test.Kywnb7/src/tests/qemu-iotests/041", line
> 54, in setUp
> +self.vm.launch()
> +  File "/home/qemu/qemu-test.Kywnb7/src/python/qemu/machine/machine.py",
> line 445, in launch
> +) from exc
> +qemu.machine.machine.VMLaunchFailure: TimeoutError
> +   Exit code: 1
> +   Command:
> /home/qemu/qemu-test.Kywnb7/build/tests/qemu-iotests/../../qemu-system-aarch64
> -display none -vga none -chardev
> socket,id=mon,path=/tmp/tmp6fiu68sr/qemu-14411-monitor.sock -mon
> chardev=mon,mode=control -qtest
> unix:path=/tmp/tmp6fiu68sr/qemu-14411-qtest.sock -accel qtest
> -nodefaults -display none -accel qtest -machine virt -drive
> if=virtio,id=drive0,file=/home/qemu/qemu-test.Kywnb7/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads,node-name=top,backing.node-name=base
> +   Output: qemu-system-aarch64: -chardev
> socket,id=mon,path=/tmp/tmp6fiu68sr/qemu-14411-monitor.sock: Failed to
> connect to '/tmp/tmp6fiu68sr/qemu-14411-monitor.sock': Connection
> refused
> +
> +
> +
>  --
>  Ran 107 tests
>
> -OK
> +FAILED (errors=1)
>
>
> thanks
> -- PMM
>

Just so I don't leave this thread hanging, I filed a GitLab issue and
I'm working on it, but this one isn't as quick to solve as the other.

Re: [PULL 18/38] Remove unnecessary minimum_version_id_old fields

2022-02-15 Thread Cédric Le Goater


On 2/15/22 18:13, Peter Maydell wrote:

On Thu, 27 Jan 2022 at 15:14, Juan Quintela  wrote:


From: Peter Maydell 

The migration code will not look at a VMStateDescription's
minimum_version_id_old field unless that VMSD has set the
load_state_old field to something non-NULL.  (The purpose of
minimum_version_id_old is to specify what migration version is needed
for the code in the function pointed to by load_state_old to be able
to handle it on incoming migration.)

We have exactly one VMSD which still has a load_state_old,
in the PPC CPU; every other VMSD which sets minimum_version_id_old
is doing so unnecessarily. Delete all the unnecessary ones.

Commit created with:
   sed -i '/\.minimum_version_id_old/d' $(git grep -l 
'\.minimum_version_id_old')
with the one legitimate use then hand-edited back in.

Signed-off-by: Peter Maydell 
Reviewed-by: Juan Quintela 

Signed-off-by: Juan Quintela 

---

It missed vmstate_ppc_cpu.


No, as the commit message notes, it deliberately did not change
that one vmstate, because at the time of writing the patch
that was the one vmstate that really was still using
load_state_old. As it happens commit 8f91aca7ff0044b hit
master first, removing that use of load_state_old (but
forgetting to remove the minimum_version_id_old field along
with it), 


If I remember well, at the time of this patch, we were both working
on the same part and I thought that removing all minimum_version_id_old
in one go was better. Anyhow,


so this commit as it hit master is OK.


yes.

Thanks,

C.

[PATCH v2] nbd/server: Allow MULTI_CONN for shared writable exports

2022-02-15 Thread Eric Blake

According to the NBD spec, a server advertising
NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
not see any cache inconsistencies: when properly separated by a single
flush, actions performed by one client will be visible to another
client, regardless of which client did the flush.  We satisfy these
conditions in qemu when our block layer is backed by the local
filesystem (by virtue of the semantics of fdatasync(), and the fact
that qemu itself is not buffering writes beyond flushes).  It is
harder to state whether we satisfy these conditions for network-based
protocols, so the safest course of action is to allow users to opt-in
to advertising multi-conn.  We may later tweak defaults to advertise
by default when the block layer can confirm that the underlying
protocol driver is cache consistent between multiple writers, but for
now, this at least allows savvy users (such as virt-v2v or nbdcopy) to
explicitly start qemu-nbd or qemu-storage-daemon with multi-conn
advertisement in a known-safe setup where the client end can then
benefit from parallel clients.

Note, however, that we don't want to advertise MULTI_CONN when we know
that a second client cannot connect (for historical reasons, qemu-nbd
defaults to a single connection while nbd-server-add and QMP commands
default to unlimited connections; but we already have existing means
to let either style of NBD server creation alter those defaults).  The
harder part of this patch is setting up an iotest to demonstrate
behavior of multiple NBD clients to a single server.  It might be
possible with parallel qemu-io processes, but concisely managing that
in shell is painful.  I found it easier to do by relying on the libnbd
project's nbdsh, which means this test will be skipped on platforms
where that is not available.

Signed-off-by: Eric Blake 
Fixes: https://bugzilla.redhat.com/1708300
---

v1 was in Aug 2021 [1], with further replies in Sep [2] and Oct [3].

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg04900.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg00038.html
[3] https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg06744.html

Since then, I've tweaked the QAPI to mention 7.0 (instead of 6.2), and
reworked the logic so that default behavior is unchanged for now
(advertising multi-conn on a writable export requires opt-in during
the command line or QMP, but remains default for a readonly export).
I've also expanded the amount of testing done in the new iotest.

 docs/interop/nbd.txt   |   1 +
 docs/tools/qemu-nbd.rst|   3 +-
 qapi/block-export.json |  34 +++-
 include/block/nbd.h|   3 +-
 blockdev-nbd.c |   5 +
 nbd/server.c   |  27 ++-
 MAINTAINERS|   1 +
 tests/qemu-iotests/tests/nbd-multiconn | 188 +
 tests/qemu-iotests/tests/nbd-multiconn.out | 112 
 9 files changed, 363 insertions(+), 11 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/nbd-multiconn
 create mode 100644 tests/qemu-iotests/tests/nbd-multiconn.out

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index bdb0f2a41aca..6c99070b99c8 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -68,3 +68,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports,
 NBD_CMD_FLAG_FAST_ZERO
 * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
+* 7.0: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
diff --git a/docs/tools/qemu-nbd.rst b/docs/tools/qemu-nbd.rst
index 6031f9689312..1de785524c36 100644
--- a/docs/tools/qemu-nbd.rst
+++ b/docs/tools/qemu-nbd.rst
@@ -139,8 +139,7 @@ driver options if ``--image-opts`` is specified.
 .. option:: -e, --shared=NUM

   Allow up to *NUM* clients to share the device (default
-  ``1``), 0 for unlimited. Safe for readers, but for now,
-  consistency is not guaranteed between multiple writers.
+  ``1``), 0 for unlimited.

 .. option:: -t, --persistent

diff --git a/qapi/block-export.json b/qapi/block-export.json
index f183522d0d2c..0a27e8ee84f9 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -21,7 +21,9 @@
 # recreated on the fly while the NBD server is active.
 # If missing, it will default to denying access (since 4.0).
 # @max-connections: The maximum number of connections to allow at the same
-#   time, 0 for unlimited. (since 5.2; default: 0)
+#   time, 0 for unlimited. Setting this to 1 also stops
+#   the server from advertising multiple client support
+#   (since 5.2; default: 0)
 #
 # Since: 4.2
 ##
@@ -50,7 +52,9 @@
 # recreated on the fly while the NBD server is active.
 # If missing, it will default to denying access (since 4.0).
 # @max-connections: The

Re: [PULL 18/38] Remove unnecessary minimum_version_id_old fields

2022-02-15 Thread Peter Maydell

On Thu, 27 Jan 2022 at 15:14, Juan Quintela  wrote:
>
> From: Peter Maydell 
>
> The migration code will not look at a VMStateDescription's
> minimum_version_id_old field unless that VMSD has set the
> load_state_old field to something non-NULL.  (The purpose of
> minimum_version_id_old is to specify what migration version is needed
> for the code in the function pointed to by load_state_old to be able
> to handle it on incoming migration.)
>
> We have exactly one VMSD which still has a load_state_old,
> in the PPC CPU; every other VMSD which sets minimum_version_id_old
> is doing so unnecessarily. Delete all the unnecessary ones.
>
> Commit created with:
>   sed -i '/\.minimum_version_id_old/d' $(git grep -l 
> '\.minimum_version_id_old')
> with the one legitimate use then hand-edited back in.
>
> Signed-off-by: Peter Maydell 
> Reviewed-by: Juan Quintela 
>
> Signed-off-by: Juan Quintela 
>
> ---
>
> It missed vmstate_ppc_cpu.

No, as the commit message notes, it deliberately did not change
that one vmstate, because at the time of writing the patch
that was the one vmstate that really was still using
load_state_old. As it happens commit 8f91aca7ff0044b hit
master first, removing that use of load_state_old (but
forgetting to remove the minimum_version_id_old field along
with it), so this commit as it hit master is OK.

-- PMM

Re: [PATCH v2] ide: Increment BB in-flight counter for TRIM BH

2022-02-15 Thread Hanna Reitz


Ping

(I can take it too, if you’d like, John, but you’re listed as the only 
maintainer for hw/ide, so...  Just say the word, though!)


On 20.01.22 15:22, Hanna Reitz wrote:

When we still have an AIOCB registered for DMA operations, we try to
settle the respective operation by draining the BlockBackend associated
with the IDE device.

However, this assumes that every DMA operation is associated with an
increment of the BlockBackend’s in-flight counter (e.g. through some
ongoing I/O operation), so that draining the BB until its in-flight
counter reaches 0 will settle all DMA operations.  That is not the case:
For TRIM, the guest can issue a zero-length operation that will not
result in any I/O operation forwarded to the BlockBackend, and also not
increment the in-flight counter in any other way.  In such a case,
blk_drain() will be a no-op if no other operations are in flight.

It is clear that if blk_drain() is a no-op, the value of
s->bus->dma->aiocb will not change between checking it in the `if`
condition and asserting that it is NULL after blk_drain().

The particular problem is that ide_issue_trim() creates a BH
(ide_trim_bh_cb()) to settle the TRIM request: iocb->common.cb() is
ide_dma_cb(), which will either create a new request, or find the
transfer to be done and call ide_set_inactive(), which clears
s->bus->dma->aiocb.  Therefore, the blk_drain() must wait for
ide_trim_bh_cb() to run, which currently it will not always do.

To fix this issue, we increment the BlockBackend's in-flight counter
when the TRIM operation begins (in ide_issue_trim(), when the
ide_trim_bh_cb() BH is created) and decrement it when ide_trim_bh_cb()
is done.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2029980
Suggested-by: Paolo Bonzini 
Signed-off-by: Hanna Reitz 
---
v1:
https://lists.nongnu.org/archive/html/qemu-block/2022-01/msg00024.html

v2:
- Increment BB’s in-flight counter while the BH is active so that
   blk_drain() will poll until the BH is done, as suggested by Paolo

(No git-backport-diff, because this patch was basically completely
rewritten, so it wouldn’t be worth it.)
---
  hw/ide/core.c | 7 +++
  1 file changed, 7 insertions(+)

Re: [PATCH 1/6] tests/qemu-iotests: Improve the check for GNU sed

2022-02-15 Thread Thomas Huth


On 15/02/2022 14.51, Daniel P. Berrangé wrote:

On Tue, Feb 15, 2022 at 02:28:24PM +0100, Thomas Huth wrote:

On 11/02/2022 17.48, Thomas Huth wrote:

On 11/02/2022 17.14, Eric Blake wrote:

On Tue, Feb 08, 2022 at 03:52:19PM +0100, Thomas Huth wrote:

The current code with $SED has been introduced almost three years
ago already...


    Can’t we just do `alias sed=gsed`?


Maybe ... but let's ask Philippe and Kevin first, who Signed-off
commit bde36af1ab4f476 that introduced the current way with $SED:
What's your opinion about this?


This commit was to have check-block working on the OpenBSD VM image.


Sure. The question was whether using an alias as suggested by Hanna would be
nicer instead of using $SED ?


Scripting with aliases becomes a nightmare to debug, since it is
relatively uncommon.  In particular, in bash, you have to explicitly
opt in to using aliases (contrary to POSIX sh where aliases are
available to scripts at startup).


shopt -s expand_aliases
... as I just learnt the hard way ;-)


Using $SED everywhere may require
more hunting, but it is more obvious when reading a test that "oh
yeah, I might be using extensions that the default 'sed' can't
support" than a script that blindly uses 'sed' and depends on it
aliasing to a more-capable sed at a distance.

The other question is how many GNU sed features are we actually
depending on?  Which tests break if we have BSD sed or busybox sed?
Can we rewrite those sed scripts to avoid GNU extensions?  But
auditing for s/sed/$SED/ seems easier than auditing for which
non-portable sed extensions we depend on.


The most obvious part are the filter functions in common.filter - we're
using "-r" here that is not part of the POSIX sed as far as I can see.


Now that I stepped through the list, the other major part that is failing on
non-GNU seds are the statements that use "\r" or "\n" or "\e" to replace
special characters. That seems to be a non-POSIX extension, too.

But for running with Alpine, there is also the additional problems that the
libc uses slightly different error strings, e.g. "I/O error" instead of
"Input/output error", see e.g.:

  https://gitlab.com/thuth/qemu/-/jobs/2094869856

Maybe it could be fixed with some extensions to the filters, but I'm not
sure whether we really want to go down that road...?


AFAIK, errno strings are not standardized by POSIX, so I presume this
problem will apply to running I/O tests on any non-Linux system too.

With this in mind I think we should consider what a portable solution
looks like. We can't simply match the Alpine strings and turn them
into GLibC strings, as that does nothing to help portability on *BSD,
macOS, Windows, etc.
Luckily, the strings did not cause that much problems on *BSDs and macOS 
yet... Most of the current set of tests works fine there. It's really just 
that libc from Alpine that is causing this trouble...


 Thomas

[PATCH 1/3] block: Make bdrv_refresh_limits() non-recursive

2022-02-15 Thread Hanna Reitz

bdrv_refresh_limits() recurses down to the node's children.  That does
not seem necessary: When we refresh limits on some node, and then
recurse down and were to change one of its children's BlockLimits, then
that would mean we noticed the changed limits by pure chance.  The fact
that we refresh the parent's limits has nothing to do with it, so the
reason for the change probably happened before this point in time, and
we should have refreshed the limits then.

On the other hand, we do not have infrastructure for noticing that block
limits change after they have been initialized for the first time (this
would require propagating the change upwards to the respective node's
parents), and so evidently we consider this case impossible.

If this case is impossible, then we will not need to recurse down in
bdrv_refresh_limits().  Every node's limits are initialized in
bdrv_open_driver(), and are refreshed whenever its children change.
We want to use the childrens' limits to get some initial default, but we
can just take them, we do not need to refresh them.

The problem with recursing is that bdrv_refresh_limits() is not atomic.
It begins with zeroing BDS.bl, and only then sets proper, valid limits.
If we do not drain all nodes whose limits are refreshed, then concurrent
I/O requests can encounter invalid request_alignment values and crash
qemu.  Therefore, a recursing bdrv_refresh_limits() requires the whole
subtree to be drained, which is currently not ensured by most callers.

A non-recursive bdrv_refresh_limits() only requires the node in question
to not receive I/O requests, and this is done by most callers in some
way or another:
- bdrv_open_driver() deals with a new node with no parents yet
- bdrv_set_file_or_backing_noperm() acts on a drained node
- bdrv_reopen_commit() acts only on drained nodes
- bdrv_append() should in theory require the node to be drained; in
  practice most callers just lock the AioContext, which should at least
  be enough to prevent concurrent I/O requests from accessing invalid
  limits

So we can resolve the bug by making bdrv_refresh_limits() non-recursive.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1879437
Signed-off-by: Hanna Reitz 
---
 block/io.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index 4e4cb556c5..c3e7301613 100644
--- a/block/io.c
+++ b/block/io.c
@@ -189,10 +189,6 @@ void bdrv_refresh_limits(BlockDriverState *bs, Transaction 
*tran, Error **errp)
 QLIST_FOREACH(c, >children, next) {
 if (c->role & (BDRV_CHILD_DATA | BDRV_CHILD_FILTERED | BDRV_CHILD_COW))
 {
-bdrv_refresh_limits(c->bs, tran, errp);
-if (*errp) {
-return;
-}
 bdrv_merge_limits(>bl, >bs->bl);
 have_limits = true;
 }
-- 
2.34.1

[PATCH 3/3] iotests/graph-changes-while-io: New test

2022-02-15 Thread Hanna Reitz

Test the following scenario:
1. Some block node (null-co) attached to a user (here: NBD server) that
   performs I/O and keeps the node in an I/O thread
2. Repeatedly run blockdev-add/blockdev-del to add/remove an overlay
   to/from that node

Each blockdev-add triggers bdrv_refresh_limits(), and because
blockdev-add runs in the main thread, it does not stop the I/O requests.
I/O can thus happen while the limits are refreshed, and when such a
request sees a temporarily invalid block limit (e.g. alignment is 0),
this may easily crash qemu (or the storage daemon in this case).

The block layer needs to ensure that I/O requests to a node are paused
while that node's BlockLimits are refreshed.

Signed-off-by: Hanna Reitz 
---
 .../qemu-iotests/tests/graph-changes-while-io | 91 +++
 .../tests/graph-changes-while-io.out  |  5 +
 2 files changed, 96 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/graph-changes-while-io
 create mode 100644 tests/qemu-iotests/tests/graph-changes-while-io.out

diff --git a/tests/qemu-iotests/tests/graph-changes-while-io 
b/tests/qemu-iotests/tests/graph-changes-while-io
new file mode 100755
index 00..567e8cf21e
--- /dev/null
+++ b/tests/qemu-iotests/tests/graph-changes-while-io
@@ -0,0 +1,91 @@
+#!/usr/bin/env python3
+# group: rw
+#
+# Test graph changes while I/O is happening
+#
+# Copyright (C) 2022 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import os
+from threading import Thread
+import iotests
+from iotests import imgfmt, qemu_img, qemu_img_create, QMPTestCase, \
+QemuStorageDaemon
+
+
+top = os.path.join(iotests.test_dir, 'top.img')
+nbd_sock = os.path.join(iotests.sock_dir, 'nbd.sock')
+
+
+def do_qemu_img_bench() -> None:
+"""
+Do some I/O requests on `nbd_sock`.
+"""
+assert qemu_img('bench', '-f', 'raw', '-c', '200',
+f'nbd+unix:///node0?socket={nbd_sock}') == 0
+
+
+class TestGraphChangesWhileIO(QMPTestCase):
+def setUp(self) -> None:
+# Create an overlay that can be added at runtime on top of the
+# null-co block node that will receive I/O
+assert qemu_img_create('-f', imgfmt, '-F', 'raw', '-b', 'null-co://',
+   top) == 0
+
+# QSD instance with a null-co block node in an I/O thread,
+# exported over NBD (on `nbd_sock`, export name "node0")
+self.qsd = QemuStorageDaemon(
+'--object', 'iothread,id=iothread0',
+'--blockdev', 'null-co,node-name=node0,read-zeroes=true',
+'--nbd-server', f'addr.type=unix,addr.path={nbd_sock}',
+'--export', 'nbd,id=exp0,node-name=node0,iothread=iothread0,' +
+'fixed-iothread=true,writable=true',
+qmp=True
+)
+
+def tearDown(self) -> None:
+self.qsd.stop()
+
+def test_blockdev_add_while_io(self) -> None:
+# Run qemu-img bench in the background
+bench_thr = Thread(target=do_qemu_img_bench)
+bench_thr.start()
+
+# While qemu-img bench is running, repeatedly add and remove an
+# overlay to/from node0
+while bench_thr.is_alive():
+result = self.qsd.qmp('blockdev-add', {
+'driver': imgfmt,
+'node-name': 'overlay',
+'backing': 'node0',
+'file': {
+'driver': 'file',
+'filename': top
+}
+})
+self.assert_qmp(result, 'return', {})
+
+result = self.qsd.qmp('blockdev-del', {
+'node-name': 'overlay'
+})
+self.assert_qmp(result, 'return', {})
+
+bench_thr.join()
+
+if __name__ == '__main__':
+# Format must support raw backing files
+iotests.main(supported_fmts=['qcow', 'qcow2', 'qed'],
+ supported_protocols=['file'])
diff --git a/tests/qemu-iotests/tests/graph-changes-while-io.out 
b/tests/qemu-iotests/tests/graph-changes-while-io.out
new file mode 100644
index 00..ae1213e6f8
--- /dev/null
+++ b/tests/qemu-iotests/tests/graph-changes-while-io.out
@@ -0,0 +1,5 @@
+.
+--
+Ran 1 tests
+
+OK
-- 
2.34.1

[PATCH 2/3] iotests: Allow using QMP with the QSD

2022-02-15 Thread Hanna Reitz

Add a parameter to optionally open a QMP connection when creating a
QemuStorageDaemon instance.

Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/iotests.py | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 6ba65eb1ff..47e3808ab9 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -39,6 +39,7 @@
 
 from qemu.machine import qtest
 from qemu.qmp import QMPMessage
+from qemu.aqmp.legacy import QEMUMonitorProtocol
 
 # Use this logger for logging messages directly from the iotests module
 logger = logging.getLogger('qemu.iotests')
@@ -348,14 +349,30 @@ def cmd(self, cmd):
 
 
 class QemuStorageDaemon:
-def __init__(self, *args: str, instance_id: str = 'a'):
+_qmp: Optional[QEMUMonitorProtocol] = None
+_qmpsock: Optional[str] = None
+# Python < 3.8 would complain if this type were not a string literal
+# (importing `annotations` from `__future__` would work; but not on <= 3.6)
+_p: 'Optional[subprocess.Popen[bytes]]' = None
+
+def __init__(self, *args: str, instance_id: str = 'a', qmp: bool = False):
 assert '--pidfile' not in args
 self.pidfile = os.path.join(test_dir, f'qsd-{instance_id}-pid')
 all_args = [qsd_prog] + list(args) + ['--pidfile', self.pidfile]
 
+if qmp:
+self._qmpsock = os.path.join(sock_dir, f'qsd-{instance_id}.sock')
+all_args += ['--chardev',
+ f'socket,id=qmp-sock,path={self._qmpsock}',
+ '--monitor', 'qmp-sock']
+
+self._qmp = QEMUMonitorProtocol(self._qmpsock, server=True)
+
 # Cannot use with here, we want the subprocess to stay around
 # pylint: disable=consider-using-with
 self._p = subprocess.Popen(all_args)
+if self._qmp is not None:
+self._qmp.accept()
 while not os.path.exists(self.pidfile):
 if self._p.poll() is not None:
 cmd = ' '.join(all_args)
@@ -370,12 +387,22 @@ def __init__(self, *args: str, instance_id: str = 'a'):
 
 assert self._pid == self._p.pid
 
+def qmp(self, cmd: str, args: Optional[Dict[str, object]] = None) \
+-> QMPMessage:
+assert self._qmp is not None
+return self._qmp.cmd(cmd, args)
+
 def stop(self, kill_signal=15):
 self._p.send_signal(kill_signal)
 self._p.wait()
 self._p = None
 
+if self._qmp:
+self._qmp.close()
+
 try:
+if self._qmpsock is not None:
+os.remove(self._qmpsock)
 os.remove(self.pidfile)
 except OSError:
 pass
-- 
2.34.1

[PATCH 0/3] block: Make bdrv_refresh_limits() non-recursive

2022-02-15 Thread Hanna Reitz

Hi,

Most bdrv_refresh_limits() callers do not drain the subtree of the node
whose limits are refreshed, so concurrent I/O requests to child nodes
can occur (if the node is in an I/O thread).  bdrv_refresh_limits() is
recursive, so such requests can happen to a node whose limits are being
refreshed.

bdrv_refresh_limits() is not atomic, and so the I/O requests can
encounter invalid limits, like a 0 request_alignment.  This will crash
qemu (e.g. because of a division by 0, or a failed assertion).

On inspection, bdrv_refresh_limits() doesn’t look like it really needs
to be recursive.  It just has always been.  Dropping the recursion fixes
those crashes, because all callers of bdrv_refresh_limits() make sure
one way or another that concurrent requests to the node whose limits are
to be refreshed are at leased paused (by draining, and/or by acquiring
the AioContext).

I see two other ways to fix it:
(A) Have all bdrv_refresh_limits() callers drain the entire subtree,
(B) Protect BDS.bl with RCU, which would make concurrent I/O just fine.

(A) is kind of ugly, and after starting down that path two times, both
times I decided I didn’t want to follow through with it.  It was always
an AioContext-juggling mess.  (E.g. bdrv_set_backing_hd() would need to
drain the subtree; but that means having to acquire the `backing_hd`
context, too, because `bs` might be moved into that context, and so when
`backing_hd` is attached to `bs`, `backing_hd` would be drained in the
new context.  But we can’t acquire a context twice, so we can only
acquire `backing_hd`’s context if the caller hasn’t done so already.
But the worst is that we can’t actually acquire that context: If `bs` is
moved into `backing_hd`’s context, then `bdrv_set_aio_context_ignore()`
requires us not to hold that context.  It’s just kind of a mess.)

I tried (B), and it worked, and I liked it very much; but it requires
quite a bit of refactoring (every BDS.bl reader must then use
qatomic_rcu_read() and take the RCU read lock), so it feels really
difficult to justify when the fix this series proposes just removes four
lines of code.


Hanna Reitz (3):
  block: Make bdrv_refresh_limits() non-recursive
  iotests: Allow using QMP with the QSD
  iotests/graph-changes-while-io: New test

 block/io.c|  4 -
 tests/qemu-iotests/iotests.py | 29 +-
 .../qemu-iotests/tests/graph-changes-while-io | 91 +++
 .../tests/graph-changes-while-io.out  |  5 +
 4 files changed, 124 insertions(+), 5 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/graph-changes-while-io
 create mode 100644 tests/qemu-iotests/tests/graph-changes-while-io.out

-- 
2.34.1

Re: [PATCH 1/6] tests/qemu-iotests: Improve the check for GNU sed

2022-02-15 Thread Daniel P . Berrangé

On Tue, Feb 15, 2022 at 02:28:24PM +0100, Thomas Huth wrote:
> On 11/02/2022 17.48, Thomas Huth wrote:
> > On 11/02/2022 17.14, Eric Blake wrote:
> > > On Tue, Feb 08, 2022 at 03:52:19PM +0100, Thomas Huth wrote:
> > > > > > The current code with $SED has been introduced almost three years
> > > > > > ago already...
> > > > > > 
> > > > > > >    Can’t we just do `alias sed=gsed`?
> > > > > > 
> > > > > > Maybe ... but let's ask Philippe and Kevin first, who Signed-off
> > > > > > commit bde36af1ab4f476 that introduced the current way with $SED:
> > > > > > What's your opinion about this?
> > > > > 
> > > > > This commit was to have check-block working on the OpenBSD VM image.
> > > > 
> > > > Sure. The question was whether using an alias as suggested by Hanna 
> > > > would be
> > > > nicer instead of using $SED ?
> > > 
> > > Scripting with aliases becomes a nightmare to debug, since it is
> > > relatively uncommon.  In particular, in bash, you have to explicitly
> > > opt in to using aliases (contrary to POSIX sh where aliases are
> > > available to scripts at startup).
> > 
> > shopt -s expand_aliases
> > ... as I just learnt the hard way ;-)
> > 
> > > Using $SED everywhere may require
> > > more hunting, but it is more obvious when reading a test that "oh
> > > yeah, I might be using extensions that the default 'sed' can't
> > > support" than a script that blindly uses 'sed' and depends on it
> > > aliasing to a more-capable sed at a distance.
> > > 
> > > The other question is how many GNU sed features are we actually
> > > depending on?  Which tests break if we have BSD sed or busybox sed?
> > > Can we rewrite those sed scripts to avoid GNU extensions?  But
> > > auditing for s/sed/$SED/ seems easier than auditing for which
> > > non-portable sed extensions we depend on.
> > 
> > The most obvious part are the filter functions in common.filter - we're
> > using "-r" here that is not part of the POSIX sed as far as I can see.
> 
> Now that I stepped through the list, the other major part that is failing on
> non-GNU seds are the statements that use "\r" or "\n" or "\e" to replace
> special characters. That seems to be a non-POSIX extension, too.
> 
> But for running with Alpine, there is also the additional problems that the
> libc uses slightly different error strings, e.g. "I/O error" instead of
> "Input/output error", see e.g.:
> 
>  https://gitlab.com/thuth/qemu/-/jobs/2094869856
> 
> Maybe it could be fixed with some extensions to the filters, but I'm not
> sure whether we really want to go down that road...?

AFAIK, errno strings are not standardized by POSIX, so I presume this
problem will apply to running I/O tests on any non-Linux system too.

With this in mind I think we should consider what a portable solution
looks like. We can't simply match the Alpine strings and turn them
into GLibC strings, as that does nothing to help portability on *BSD,
macOS, Windows, etc. We would need to figure out how to blank out
arbitrary input error message strings.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH 1/6] tests/qemu-iotests: Improve the check for GNU sed

2022-02-15 Thread Thomas Huth


On 11/02/2022 17.48, Thomas Huth wrote:

On 11/02/2022 17.14, Eric Blake wrote:

On Tue, Feb 08, 2022 at 03:52:19PM +0100, Thomas Huth wrote:

The current code with $SED has been introduced almost three years
ago already...


   Can’t we just do `alias sed=gsed`?


Maybe ... but let's ask Philippe and Kevin first, who Signed-off
commit bde36af1ab4f476 that introduced the current way with $SED:
What's your opinion about this?


This commit was to have check-block working on the OpenBSD VM image.


Sure. The question was whether using an alias as suggested by Hanna would be
nicer instead of using $SED ?


Scripting with aliases becomes a nightmare to debug, since it is
relatively uncommon.  In particular, in bash, you have to explicitly
opt in to using aliases (contrary to POSIX sh where aliases are
available to scripts at startup).


shopt -s expand_aliases
... as I just learnt the hard way ;-)


Using $SED everywhere may require
more hunting, but it is more obvious when reading a test that "oh
yeah, I might be using extensions that the default 'sed' can't
support" than a script that blindly uses 'sed' and depends on it
aliasing to a more-capable sed at a distance.

The other question is how many GNU sed features are we actually
depending on?  Which tests break if we have BSD sed or busybox sed?
Can we rewrite those sed scripts to avoid GNU extensions?  But
auditing for s/sed/$SED/ seems easier than auditing for which
non-portable sed extensions we depend on.


The most obvious part are the filter functions in common.filter - we're 
using "-r" here that is not part of the POSIX sed as far as I can see.


Now that I stepped through the list, the other major part that is failing on 
non-GNU seds are the statements that use "\r" or "\n" or "\e" to replace 
special characters. That seems to be a non-POSIX extension, too.


But for running with Alpine, there is also the additional problems that the 
libc uses slightly different error strings, e.g. "I/O error" instead of 
"Input/output error", see e.g.:


 https://gitlab.com/thuth/qemu/-/jobs/2094869856

Maybe it could be fixed with some extensions to the filters, but I'm not 
sure whether we really want to go down that road...?


 Thomas

[PATCH] tests/qemu-iotests: Rework the checks and spots using GNU sed

2022-02-15 Thread Thomas Huth

Instead of failing the iotests if GNU sed is not available (or skipping
them completely in the check-block.sh script), it would be better to
simply skip the bash-based tests that rely on GNU sed, so that the other
tests could still be run. Thus we now explicitely use "gsed" (either as
direct program or as a wrapper around "sed" if it's the GNU version)
in the spots that rely on the GNU sed behavior. Then we also remove the
sed checks from the check-block.sh script, so that "make check-block"
can now be run on systems without GNU sed, too.

Signed-off-by: Thomas Huth 
---
 I've checked that this works fine with:
 make vm-build-netbsd TARGET_LIST=x86_64-softmmu BUILD_TARGET=check-block
 make vm-build-freebsd TARGET_LIST=x86_64-softmmu BUILD_TARGET=check-block
 and with the macOS targets in our CI.

 tests/check-block.sh | 12 --
 tests/qemu-iotests/271   |  2 +-
 tests/qemu-iotests/common.filter | 74 
 tests/qemu-iotests/common.rc | 45 +--
 4 files changed, 61 insertions(+), 72 deletions(-)

diff --git a/tests/check-block.sh b/tests/check-block.sh
index 720a46bc36..af0c574812 100755
--- a/tests/check-block.sh
+++ b/tests/check-block.sh
@@ -52,18 +52,6 @@ if LANG=C bash --version | grep -q 'GNU bash, version [123]' 
; then
 skip "bash version too old ==> Not running the qemu-iotests."
 fi
 
-if ! (sed --version | grep 'GNU sed') > /dev/null 2>&1 ; then
-if ! command -v gsed >/dev/null 2>&1; then
-skip "GNU sed not available ==> Not running the qemu-iotests."
-fi
-else
-# Double-check that we're not using BusyBox' sed which says
-# that "This is not GNU sed version 4.0" ...
-if sed --version | grep -q 'not GNU sed' ; then
-skip "BusyBox sed not supported ==> Not running the qemu-iotests."
-fi
-fi
-
 cd tests/qemu-iotests
 
 # QEMU_CHECK_BLOCK_AUTO is used to disable some unstable sub-tests
diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
index 2775b4d130..c7c2cadda0 100755
--- a/tests/qemu-iotests/271
+++ b/tests/qemu-iotests/271
@@ -896,7 +896,7 @@ _make_test_img -o extended_l2=on 1M
 # Second and third writes in _concurrent_io() are independent and may finish in
 # different order. So, filter offset out to match both possible variants.
 _concurrent_io | $QEMU_IO | _filter_qemu_io | \
-$SED -e 's/\(20480\|40960\)/OFFSET/'
+sed -e 's/\(20480\|40960\)/OFFSET/'
 _concurrent_verify | $QEMU_IO | _filter_qemu_io
 
 # success, all done
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 935217aa65..a3b1708adc 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -21,58 +21,58 @@
 
 _filter_date()
 {
-$SED -re 's/[0-9]{4}-[0-9]{2}-[0-9]{2} 
[0-9]{2}:[0-9]{2}:[0-9]{2}/-mm-dd hh:mm:ss/'
+gsed -re 's/[0-9]{4}-[0-9]{2}-[0-9]{2} 
[0-9]{2}:[0-9]{2}:[0-9]{2}/-mm-dd hh:mm:ss/'
 }
 
 _filter_vmstate_size()
 {
-$SED -r -e 's/[0-9. ]{5} [KMGT]iB/ SIZE/' \
+gsed -r -e 's/[0-9. ]{5} [KMGT]iB/ SIZE/' \
 -e 's/[0-9. ]{5} B/   SIZE/'
 }
 
 _filter_generated_node_ids()
 {
-$SED -re 's/\#block[0-9]{3,}/NODE_NAME/'
+gsed -re 's/\#block[0-9]{3,}/NODE_NAME/'
 }
 
 _filter_qom_path()
 {
-$SED -e '/Attached to:/s/\device[[0-9]\+\]/device[N]/g'
+gsed -e '/Attached to:/s/\device[[0-9]\+\]/device[N]/g'
 }
 
 # replace occurrences of the actual TEST_DIR value with TEST_DIR
 _filter_testdir()
 {
-$SED -e "s#$TEST_DIR/#TEST_DIR/#g" \
- -e "s#$SOCK_DIR/#SOCK_DIR/#g" \
- -e "s#SOCK_DIR/fuse-#TEST_DIR/#g"
+sed -e "s#$TEST_DIR/#TEST_DIR/#g" \
+-e "s#$SOCK_DIR/#SOCK_DIR/#g" \
+-e "s#SOCK_DIR/fuse-#TEST_DIR/#g"
 }
 
 # replace occurrences of the actual IMGFMT value with IMGFMT
 _filter_imgfmt()
 {
-$SED -e "s#$IMGFMT#IMGFMT#g"
+sed -e "s#$IMGFMT#IMGFMT#g"
 }
 
 # Replace error message when the format is not supported and delete
 # the output lines after the first one
 _filter_qemu_img_check()
 {
-$SED -e '/allocated.*fragmented.*compressed clusters/d' \
--e 's/qemu-img: This image format does not support checks/No errors 
were found on the image./' \
--e '/Image end offset: [0-9]\+/d'
+gsed -e '/allocated.*fragmented.*compressed clusters/d' \
+ -e 's/qemu-img: This image format does not support checks/No errors 
were found on the image./' \
+ -e '/Image end offset: [0-9]\+/d'
 }
 
 # Removes \r from messages
 _filter_win32()
 {
-$SED -e 's/\r//g'
+gsed -e 's/\r//g'
 }
 
 # sanitize qemu-io output
 _filter_qemu_io()
 {
-_filter_win32 | $SED -e "s/[0-9]* ops\; [0-9/:. sec]* ([0-9/.inf]* 
[EPTGMKiBbytes]*\/sec and [0-9/.inf]* ops\/sec)/X ops\; XX:XX:XX.X (XXX 
YYY\/sec and XXX ops\/sec)/" \
+_filter_win32 | gsed -e "s/[0-9]* ops\; [0-9/:. sec]* ([0-9/.inf]* 
[EPTGMKiBbytes]*\/sec and [0-9/.inf]* ops\/sec)/X ops\; XX:XX:XX.X (XXX 
YYY\/sec and XXX ops\/sec)/" \
 -e "s/: line

[PATCH] block: fix preallocate filter: don't do unaligned preallocate requests

2022-02-15 Thread Vladimir Sementsov-Ogievskiy

There is a bug in handling BDRV_REQ_NO_WAIT flag: we still may wait in
wait_serialising_requests() if request is unaligned. And this is
possible for the only user of this flag (preallocate filter) if
underlying file is unaligned to its request_alignment on start.

So, we have to fix preallocate filter to do only aligned preallocate
requests.

Next, we should fix generic block/io.c somehow. Keeping in mind that
preallocate is the only user of BDRV_REQ_NO_WAIT and that we have to
fix its behavior now, it seems more safe to just assert that we never
use BDRV_REQ_NO_WAIT with unaligned requests and add corresponding
comment. Let's do so.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Denis V. Lunev 
---
 include/block/block.h |  3 ++-
 block/io.c|  4 
 block/preallocate.c   | 15 ---
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index e1713ee306..f73203af4a 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -83,7 +83,8 @@ typedef enum {
 
 /*
  * If we need to wait for other requests, just fail immediately. Used
- * only together with BDRV_REQ_SERIALISING.
+ * only together with BDRV_REQ_SERIALISING. Used only with requests aligned
+ * to request_alignment (corresponding assertions are in block/io.c).
  */
 BDRV_REQ_NO_WAIT = 0x400,
 
diff --git a/block/io.c b/block/io.c
index 4e4cb556c5..eb562a1cb8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2173,6 +2173,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild 
*child,
 
 padding = bdrv_init_padding(bs, offset, bytes, );
 if (padding) {
+assert(!(flags & BDRV_REQ_NO_WAIT));
 bdrv_make_request_serialising(req, align);
 
 bdrv_padding_rmw_read(child, req, , true);
@@ -2307,6 +2308,7 @@ int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
  * serialize the request to prevent interactions of the
  * widened region with other transactions.
  */
+assert(!(flags & BDRV_REQ_NO_WAIT));
 bdrv_make_request_serialising(, align);
 bdrv_padding_rmw_read(child, , , false);
 }
@@ -3328,6 +3330,8 @@ static int coroutine_fn bdrv_co_copy_range_internal(
 /* TODO We can support BDRV_REQ_NO_FALLBACK here */
 assert(!(read_flags & BDRV_REQ_NO_FALLBACK));
 assert(!(write_flags & BDRV_REQ_NO_FALLBACK));
+assert(!(read_flags & BDRV_REQ_NO_WAIT));
+assert(!(write_flags & BDRV_REQ_NO_WAIT));
 
 if (!dst || !dst->bs || !bdrv_is_inserted(dst->bs)) {
 return -ENOMEDIUM;
diff --git a/block/preallocate.c b/block/preallocate.c
index 1d4233f730..e15cb8c74a 100644
--- a/block/preallocate.c
+++ b/block/preallocate.c
@@ -276,6 +276,10 @@ static bool coroutine_fn handle_write(BlockDriverState 
*bs, int64_t offset,
 int64_t end = offset + bytes;
 int64_t prealloc_start, prealloc_end;
 int ret;
+uint32_t file_align = bs->file->bs->bl.request_alignment;
+uint32_t prealloc_align = MAX(s->opts.prealloc_align, file_align);
+
+assert(QEMU_IS_ALIGNED(prealloc_align, file_align));
 
 if (!has_prealloc_perms(bs)) {
 /* We don't have state neither should try to recover it */
@@ -320,9 +324,14 @@ static bool coroutine_fn handle_write(BlockDriverState 
*bs, int64_t offset,
 
 /* Now we want new preallocation, as request writes beyond s->file_end. */
 
-prealloc_start = want_merge_zero ? MIN(offset, s->file_end) : s->file_end;
-prealloc_end = QEMU_ALIGN_UP(end + s->opts.prealloc_size,
- s->opts.prealloc_align);
+prealloc_start = QEMU_ALIGN_UP(
+want_merge_zero ? MIN(offset, s->file_end) : s->file_end,
+file_align);
+prealloc_end = QEMU_ALIGN_UP(
+MAX(prealloc_start, end) + s->opts.prealloc_size,
+prealloc_align);
+
+want_merge_zero = want_merge_zero && (prealloc_start <= offset);
 
 ret = bdrv_co_pwrite_zeroes(
 bs->file, prealloc_start, prealloc_end - prealloc_start,
-- 
2.31.1

[PATCH v2 6/6] libvduse: Add support for reconnecting

2022-02-15 Thread Xie Yongji

To support reconnecting after restart or crash, VDUSE backend
might need to resubmit inflight I/Os. This stores the metadata
such as the index of inflight I/O's descriptors to a shm file so
that VDUSE backend can restore them during reconnecting.

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c|  14 ++
 subprojects/libvduse/libvduse.c | 232 +++-
 subprojects/libvduse/libvduse.h |  12 ++
 3 files changed, 253 insertions(+), 5 deletions(-)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index e456dfe2b3..b519a34370 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -38,6 +38,7 @@ typedef struct VduseBlkExport {
 uint16_t num_queues;
 uint32_t blk_size;
 bool writable;
+char *recon_file;
 } VduseBlkExport;
 
 struct virtio_blk_inhdr {
@@ -232,6 +233,8 @@ static void vduse_blk_enable_queue(VduseDev *dev, 
VduseVirtq *vq)
 
 aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq),
true, on_vduse_vq_kick, NULL, NULL, NULL, vq);
+/* Make sure we don't miss any kick afer reconnecting */
+eventfd_write(vduse_queue_get_fd(vq), 1);
 }
 
 static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq)
@@ -407,6 +410,15 @@ static int vduse_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
 return -ENOMEM;
 }
 
+vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s",
+   g_get_tmp_dir(), exp->id);
+if (vduse_set_reconnect_log_file(vblk_exp->dev, vblk_exp->recon_file)) {
+error_setg(errp, "failed to set reconnect log file");
+vduse_dev_destroy(vblk_exp->dev);
+g_free(vblk_exp->recon_file);
+return -EINVAL;
+}
+
 for (i = 0; i < num_queues; i++) {
 vduse_dev_setup_queue(vblk_exp->dev, i, queue_size);
 }
@@ -430,6 +442,8 @@ static void vduse_blk_exp_delete(BlockExport *exp)
 vblk_exp);
 blk_set_dev_ops(exp->blk, NULL, NULL);
 vduse_dev_destroy(vblk_exp->dev);
+unlink(vblk_exp->recon_file);
+g_free(vblk_exp->recon_file);
 }
 
 static void vduse_blk_exp_request_shutdown(BlockExport *exp)
diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
index 9c6a8bcf6e..31f7d6e119 100644
--- a/subprojects/libvduse/libvduse.c
+++ b/subprojects/libvduse/libvduse.c
@@ -41,6 +41,8 @@
 #define VDUSE_VQ_ALIGN 4096
 #define MAX_IOVA_REGIONS 256
 
+#define LOG_ALIGNMENT 64
+
 /* Round number down to multiple */
 #define ALIGN_DOWN(n, m) ((n) / (m) * (m))
 
@@ -51,6 +53,31 @@
 #define unlikely(x)   __builtin_expect(!!(x), 0)
 #endif
 
+typedef struct VduseDescStateSplit {
+uint8_t inflight;
+uint8_t padding[5];
+uint16_t next;
+uint64_t counter;
+} VduseDescStateSplit;
+
+typedef struct VduseVirtqLogInflight {
+uint64_t features;
+uint16_t version;
+uint16_t desc_num;
+uint16_t last_batch_head;
+uint16_t used_idx;
+VduseDescStateSplit desc[];
+} VduseVirtqLogInflight;
+
+typedef struct VduseVirtqLog {
+VduseVirtqLogInflight inflight;
+} VduseVirtqLog;
+
+typedef struct VduseVirtqInflightDesc {
+uint16_t index;
+uint64_t counter;
+} VduseVirtqInflightDesc;
+
 typedef struct VduseRing {
 unsigned int num;
 uint64_t desc_addr;
@@ -73,6 +100,10 @@ struct VduseVirtq {
 bool ready;
 int fd;
 VduseDev *dev;
+VduseVirtqInflightDesc *resubmit_list;
+uint16_t resubmit_num;
+uint64_t counter;
+VduseVirtqLog *log;
 };
 
 typedef struct VduseIovaRegion {
@@ -96,8 +127,36 @@ struct VduseDev {
 int fd;
 int ctrl_fd;
 void *priv;
+void *log;
 };
 
+static inline size_t vduse_vq_log_size(uint16_t queue_size)
+{
+return ALIGN_UP(sizeof(VduseDescStateSplit) * queue_size +
+sizeof(VduseVirtqLogInflight), LOG_ALIGNMENT);
+}
+
+static void *vduse_log_get(const char *filename, size_t size)
+{
+void *ptr = MAP_FAILED;
+int fd;
+
+fd = open(filename, O_RDWR | O_CREAT, 0600);
+if (fd == -1) {
+return MAP_FAILED;
+}
+
+if (ftruncate(fd, size) == -1) {
+goto out;
+}
+
+ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+
+out:
+close(fd);
+return ptr;
+}
+
 static inline bool has_feature(uint64_t features, unsigned int fbit)
 {
 assert(fbit < 64);
@@ -139,6 +198,102 @@ static int vduse_inject_irq(VduseDev *dev, int index)
 return ioctl(dev->fd, VDUSE_VQ_INJECT_IRQ, );
 }
 
+static int inflight_desc_compare(const void *a, const void *b)
+{
+VduseVirtqInflightDesc *desc0 = (VduseVirtqInflightDesc *)a,
+   *desc1 = (VduseVirtqInflightDesc *)b;
+
+if (desc1->counter > desc0->counter &&
+(desc1->counter - desc0->counter) < VIRTQUEUE_MAX_SIZE * 2) {
+return 1;
+}
+
+return -1;
+}
+
+static int vduse_queue_check_inflights(VduseVirtq *vq)
+{
+int i = 0;
+VduseDev *dev =

[PATCH v2 5/6] vduse-blk: Add vduse-blk resize support

2022-02-15 Thread Xie Yongji

To support block resize, this uses vduse_dev_update_config()
to update the capacity field in configuration space and inject
config interrupt on the block resize callback.

Signed-off-by: Xie Yongji 
---
 block/export/vduse-blk.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index 942f985de3..e456dfe2b3 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -309,6 +309,23 @@ static void blk_aio_detach(void *opaque)
 vblk_exp->export.ctx = NULL;
 }
 
+static void vduse_blk_resize(void *opaque)
+{
+BlockExport *exp = opaque;
+VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export);
+struct virtio_blk_config config;
+
+config.capacity =
+cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS);
+vduse_dev_update_config(vblk_exp->dev, sizeof(config.capacity),
+offsetof(struct virtio_blk_config, capacity),
+(char *));
+}
+
+static const BlockDevOps vduse_block_ops = {
+.resize_cb = vduse_blk_resize,
+};
+
 static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts,
 Error **errp)
 {
@@ -400,6 +417,8 @@ static int vduse_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
 blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
  vblk_exp);
 
+blk_set_dev_ops(exp->blk, _block_ops, exp);
+
 return 0;
 }
 
@@ -409,6 +428,7 @@ static void vduse_blk_exp_delete(BlockExport *exp)
 
 blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach,
 vblk_exp);
+blk_set_dev_ops(exp->blk, NULL, NULL);
 vduse_dev_destroy(vblk_exp->dev);
 }
 
-- 
2.20.1

[PATCH v2 3/6] libvduse: Add VDUSE (vDPA Device in Userspace) library

2022-02-15 Thread Xie Yongji

VDUSE [1] is a linux framework that makes it possible to implement
software-emulated vDPA devices in userspace. This adds a library
as a subproject to help implementing VDUSE backends in QEMU.

[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html

Signed-off-by: Xie Yongji 
---
 meson.build |   15 +
 meson_options.txt   |2 +
 scripts/meson-buildoptions.sh   |3 +
 subprojects/libvduse/include/atomic.h   |1 +
 subprojects/libvduse/libvduse.c | 1152 +++
 subprojects/libvduse/libvduse.h |  225 
 subprojects/libvduse/linux-headers/linux|1 +
 subprojects/libvduse/meson.build|   10 +
 subprojects/libvduse/standard-headers/linux |1 +
 9 files changed, 1410 insertions(+)
 create mode 12 subprojects/libvduse/include/atomic.h
 create mode 100644 subprojects/libvduse/libvduse.c
 create mode 100644 subprojects/libvduse/libvduse.h
 create mode 12 subprojects/libvduse/linux-headers/linux
 create mode 100644 subprojects/libvduse/meson.build
 create mode 12 subprojects/libvduse/standard-headers/linux

diff --git a/meson.build b/meson.build
index ae5f7eec6e..27e6e07110 100644
--- a/meson.build
+++ b/meson.build
@@ -1304,6 +1304,21 @@ if not get_option('fuse_lseek').disabled()
   endif
 endif
 
+have_libvduse = (targetos == 'linux')
+if get_option('libvduse').enabled()
+if targetos != 'linux'
+error('libvduse requires linux')
+endif
+elif get_option('libvduse').disabled()
+have_libvduse = false
+endif
+
+libvduse = not_found
+if have_libvduse
+  libvduse_proj = subproject('libvduse')
+  libvduse = libvduse_proj.get_variable('libvduse_dep')
+endif
+
 # libbpf
 libbpf = dependency('libbpf', required: get_option('bpf'), method: 
'pkg-config')
 if libbpf.found() and not cc.links('''
diff --git a/meson_options.txt b/meson_options.txt
index 95d527f773..7099ab7b8b 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -193,6 +193,8 @@ option('virtfs', type: 'feature', value: 'auto',
description: 'virtio-9p support')
 option('virtiofsd', type: 'feature', value: 'auto',
description: 'build virtiofs daemon (virtiofsd)')
+option('libvduse', type: 'feature', value: 'auto',
+   description: 'build VDUSE Library')
 
 option('capstone', type: 'combo', value: 'auto',
choices: ['disabled', 'enabled', 'auto', 'system', 'internal'],
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 48a454cece..8f031ea92e 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -58,6 +58,7 @@ meson_options_help() {
   printf "%s\n" '  libssh  ssh block device support'
   printf "%s\n" '  libudev Use libudev to enumerate host devices'
   printf "%s\n" '  libusb  libusb support for USB passthrough'
+  printf "%s\n" '  libvdusebuild VDUSE Library'
   printf "%s\n" '  linux-aio   Linux AIO support'
   printf "%s\n" '  linux-io-uring  Linux io_uring support'
   printf "%s\n" '  lzfse   lzfse support for DMG images'
@@ -187,6 +188,8 @@ _meson_option_parse() {
 --disable-libudev) printf "%s" -Dlibudev=disabled ;;
 --enable-libusb) printf "%s" -Dlibusb=enabled ;;
 --disable-libusb) printf "%s" -Dlibusb=disabled ;;
+--enable-libvduse) printf "%s" -Dlibvduse=enabled ;;
+--disable-libvduse) printf "%s" -Dlibvduse=disabled ;;
 --enable-linux-aio) printf "%s" -Dlinux_aio=enabled ;;
 --disable-linux-aio) printf "%s" -Dlinux_aio=disabled ;;
 --enable-linux-io-uring) printf "%s" -Dlinux_io_uring=enabled ;;
diff --git a/subprojects/libvduse/include/atomic.h 
b/subprojects/libvduse/include/atomic.h
new file mode 12
index 00..8c2be64f7b
--- /dev/null
+++ b/subprojects/libvduse/include/atomic.h
@@ -0,0 +1 @@
+../../../include/qemu/atomic.h
\ No newline at end of file
diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
new file mode 100644
index 00..9c6a8bcf6e
--- /dev/null
+++ b/subprojects/libvduse/libvduse.c
@@ -0,0 +1,1152 @@
+/*
+ * VDUSE (vDPA Device in Userspace) library
+ *
+ * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *   Portions of codes and concepts borrowed from libvhost-user.c, so:
+ * Copyright IBM, Corp. 2007
+ * Copyright (c) 2016 Red Hat, Inc.
+ *
+ * Author:
+ *   Xie Yongji 
+ *   Anthony Liguori 
+ *   Marc-André Lureau 
+ *   Victor Kaplansky 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "include/atomic.h"
+#include "linux-headers/linux/virtio_ring.h"
+#include "linux-headers/linux/virtio_config.h"
+#include "linux-headers/linux/vduse.h"
+#include

[PATCH v2 1/6] block: Support passing NULL ops to blk_set_dev_ops()

2022-02-15 Thread Xie Yongji

This supports passing NULL ops to blk_set_dev_ops()
so that we can remove stale ops in some cases.

Signed-off-by: Xie Yongji 
---
 block/block-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 4ff6b4d785..08dd0a3093 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1015,7 +1015,7 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps 
*ops,
 blk->dev_opaque = opaque;
 
 /* Are we currently quiesced? Should we enforce this right now? */
-if (blk->quiesce_counter && ops->drained_begin) {
+if (blk->quiesce_counter && ops && ops->drained_begin) {
 ops->drained_begin(opaque);
 }
 }
-- 
2.20.1

[PATCH v2 4/6] vduse-blk: implements vduse-blk export

2022-02-15 Thread Xie Yongji

This implements a VDUSE block backends based on
the libvduse library. We can use it to export the BDSs
for both VM and container (host) usage.

The new command-line syntax is:

$ qemu-storage-daemon \
--blockdev file,node-name=drive0,filename=test.img \
--export vduse-blk,node-name=drive0,id=vduse-export0,writable=on

After the qemu-storage-daemon started, we need to use
the "vdpa" command to attach the device to vDPA bus:

$ vdpa dev add name vduse-export0 mgmtdev vduse

Also the device must be removed via the "vdpa" command
before we stop the qemu-storage-daemon.

Signed-off-by: Xie Yongji 
---
 block/export/export.c |   6 +
 block/export/meson.build  |   5 +
 block/export/vduse-blk.c  | 428 ++
 block/export/vduse-blk.h  |  20 ++
 meson.build   |  13 ++
 meson_options.txt |   2 +
 qapi/block-export.json|  24 +-
 scripts/meson-buildoptions.sh |   4 +
 8 files changed, 500 insertions(+), 2 deletions(-)
 create mode 100644 block/export/vduse-blk.c
 create mode 100644 block/export/vduse-blk.h

diff --git a/block/export/export.c b/block/export/export.c
index 6d3b9964c8..00dd505540 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -26,6 +26,9 @@
 #ifdef CONFIG_VHOST_USER_BLK_SERVER
 #include "vhost-user-blk-server.h"
 #endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+#include "vduse-blk.h"
+#endif
 
 static const BlockExportDriver *blk_exp_drivers[] = {
 _exp_nbd,
@@ -35,6 +38,9 @@ static const BlockExportDriver *blk_exp_drivers[] = {
 #ifdef CONFIG_FUSE
 _exp_fuse,
 #endif
+#ifdef CONFIG_VDUSE_BLK_EXPORT
+_exp_vduse_blk,
+#endif
 };
 
 /* Only accessed from the main thread */
diff --git a/block/export/meson.build b/block/export/meson.build
index 0a08e384c7..cf311d2b1b 100644
--- a/block/export/meson.build
+++ b/block/export/meson.build
@@ -5,3 +5,8 @@ if have_vhost_user_blk_server
 endif
 
 blockdev_ss.add(when: fuse, if_true: files('fuse.c'))
+
+if have_vduse_blk_export
+blockdev_ss.add(files('vduse-blk.c'))
+blockdev_ss.add(libvduse)
+endif
diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
new file mode 100644
index 00..942f985de3
--- /dev/null
+++ b/block/export/vduse-blk.c
@@ -0,0 +1,428 @@
+/*
+ * Export QEMU block device via VDUSE
+ *
+ * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *   Portions of codes and concepts borrowed from vhost-user-blk-server.c, so:
+ * Copyright (c) 2020 Red Hat, Inc.
+ *
+ * Author:
+ *   Xie Yongji 
+ *   Coiby Xu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include 
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "sysemu/block-backend.h"
+#include "block/export.h"
+#include "qemu/error-report.h"
+#include "util/block-helpers.h"
+#include "subprojects/libvduse/libvduse.h"
+
+#include "standard-headers/linux/virtio_ring.h"
+#include "standard-headers/linux/virtio_blk.h"
+
+#define VIRTIO_BLK_SECTOR_BITS 9
+#define VIRTIO_BLK_SECTOR_SIZE (1ULL << VIRTIO_BLK_SECTOR_BITS)
+
+#define VDUSE_DEFAULT_NUM_QUEUE 1
+#define VDUSE_DEFAULT_QUEUE_SIZE 256
+
+typedef struct VduseBlkExport {
+BlockExport export;
+VduseDev *dev;
+uint16_t num_queues;
+uint32_t blk_size;
+bool writable;
+} VduseBlkExport;
+
+struct virtio_blk_inhdr {
+unsigned char status;
+};
+
+typedef struct VduseBlkReq {
+VduseVirtqElement elem;
+int64_t sector_num;
+size_t in_len;
+struct virtio_blk_inhdr *in;
+struct virtio_blk_outhdr out;
+VduseVirtq *vq;
+} VduseBlkReq;
+
+static void vduse_blk_req_complete(VduseBlkReq *req)
+{
+vduse_queue_push(req->vq, >elem, req->in_len);
+vduse_queue_notify(req->vq);
+
+free(req);
+}
+
+static bool vduse_blk_sect_range_ok(VduseBlkExport *vblk_exp,
+uint64_t sector, size_t size)
+{
+uint64_t nb_sectors;
+uint64_t total_sectors;
+
+if (size % VIRTIO_BLK_SECTOR_SIZE) {
+return false;
+}
+
+nb_sectors = size >> VIRTIO_BLK_SECTOR_BITS;
+
+QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE != VIRTIO_BLK_SECTOR_SIZE);
+if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
+return false;
+}
+if ((sector << VIRTIO_BLK_SECTOR_BITS) % vblk_exp->blk_size) {
+return false;
+}
+blk_get_geometry(vblk_exp->export.blk, _sectors);
+if (sector > total_sectors || nb_sectors > total_sectors - sector) {
+return false;
+}
+return true;
+}
+
+static void coroutine_fn vduse_blk_virtio_process_req(void *opaque)
+{
+VduseBlkReq *req = opaque;
+VduseVirtq *vq = req->vq;
+VduseDev *dev = vduse_queue_get_dev(vq);
+VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev);
+BlockBackend *blk = vblk_exp->export.blk;
+VduseVirtqElement *elem = >elem;
+struct iovec *in_iov = elem->in_sg;
+struct iovec *out_iov = elem->out_sg;
+

[PATCH v2 0/6] Support exporting BDSs via VDUSE

2022-02-15 Thread Xie Yongji

Hi all,

Last few months ago, VDUSE (vDPA Device in Userspace) [1] has
been merged into Linux kernel as a framework that make it
possible to emulate a vDPA device in userspace. This series
aimed at implementing a VDUSE block backend based on the
qemu-storage-daemon infrastructure.

To support that, we firstly introduce a VDUSE library as a
subproject (like what libvhost-user does) to help implementing
VDUSE backends in QEMU. Then a VDUSE block export is implemented
based on this library. At last, we add resize and reconnect support
to the VDUSE block export and VDUSE library.

Since we don't support vdpa-blk in QEMU currently, the VM case is
tested with my previous patchset [2].

[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html
[2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg797569.html

Please review, thanks!

V1 to V2:
- Move vduse header to linux-headers [Stefan]
- Add two new API to support creating device from /dev/vduse/$NAME or
  file descriptor [Stefan]
- Check VIRTIO_F_VERSION_1 during intialization [Stefan]
- Replace malloc() + memset to calloc() [Stefan]
- Increase default queue size to 256 for vduse-blk [Stefan]
- Zero-initialize virtio-blk config space [Stefan]
- Add a patch to support reset blk->dev_ops
- Validate vq->log->inflight fields [Stefan]
- Add vduse_set_reconnect_log_file() API to support specifing the
  reconnect log file
- Fix some bugs [Stefan]

Xie Yongji (6):
  block: Support passing NULL ops to blk_set_dev_ops()
  linux-headers: Add vduse.h
  libvduse: Add VDUSE (vDPA Device in Userspace) library
  vduse-blk: implements vduse-blk export
  vduse-blk: Add vduse-blk resize support
  libvduse: Add support for reconnecting

 block/block-backend.c   |2 +-
 block/export/export.c   |6 +
 block/export/meson.build|5 +
 block/export/vduse-blk.c|  462 +++
 block/export/vduse-blk.h|   20 +
 linux-headers/linux/vduse.h |  306 +
 meson.build |   28 +
 meson_options.txt   |4 +
 qapi/block-export.json  |   24 +-
 scripts/meson-buildoptions.sh   |7 +
 scripts/update-linux-headers.sh |2 +-
 subprojects/libvduse/include/atomic.h   |1 +
 subprojects/libvduse/libvduse.c | 1374 +++
 subprojects/libvduse/libvduse.h |  237 
 subprojects/libvduse/linux-headers/linux|1 +
 subprojects/libvduse/meson.build|   10 +
 subprojects/libvduse/standard-headers/linux |1 +
 17 files changed, 2486 insertions(+), 4 deletions(-)
 create mode 100644 block/export/vduse-blk.c
 create mode 100644 block/export/vduse-blk.h
 create mode 100644 linux-headers/linux/vduse.h
 create mode 12 subprojects/libvduse/include/atomic.h
 create mode 100644 subprojects/libvduse/libvduse.c
 create mode 100644 subprojects/libvduse/libvduse.h
 create mode 12 subprojects/libvduse/linux-headers/linux
 create mode 100644 subprojects/libvduse/meson.build
 create mode 12 subprojects/libvduse/standard-headers/linux

-- 
2.20.1

[PATCH v2 2/6] linux-headers: Add vduse.h

2022-02-15 Thread Xie Yongji

This adds vduse header to linux headers so that the
relevant VDUSE API can be used in subsequent patches.

Signed-off-by: Xie Yongji 
---
 linux-headers/linux/vduse.h | 306 
 scripts/update-linux-headers.sh |   2 +-
 2 files changed, 307 insertions(+), 1 deletion(-)
 create mode 100644 linux-headers/linux/vduse.h

diff --git a/linux-headers/linux/vduse.h b/linux-headers/linux/vduse.h
new file mode 100644
index 00..d47b004ce6
--- /dev/null
+++ b/linux-headers/linux/vduse.h
@@ -0,0 +1,306 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _VDUSE_H_
+#define _VDUSE_H_
+
+#include 
+
+#define VDUSE_BASE 0x81
+
+/* The ioctls for control device (/dev/vduse/control) */
+
+#define VDUSE_API_VERSION  0
+
+/*
+ * Get the version of VDUSE API that kernel supported (VDUSE_API_VERSION).
+ * This is used for future extension.
+ */
+#define VDUSE_GET_API_VERSION  _IOR(VDUSE_BASE, 0x00, __u64)
+
+/* Set the version of VDUSE API that userspace supported. */
+#define VDUSE_SET_API_VERSION  _IOW(VDUSE_BASE, 0x01, __u64)
+
+/**
+ * struct vduse_dev_config - basic configuration of a VDUSE device
+ * @name: VDUSE device name, needs to be NUL terminated
+ * @vendor_id: virtio vendor id
+ * @device_id: virtio device id
+ * @features: virtio features
+ * @vq_num: the number of virtqueues
+ * @vq_align: the allocation alignment of virtqueue's metadata
+ * @reserved: for future use, needs to be initialized to zero
+ * @config_size: the size of the configuration space
+ * @config: the buffer of the configuration space
+ *
+ * Structure used by VDUSE_CREATE_DEV ioctl to create VDUSE device.
+ */
+struct vduse_dev_config {
+#define VDUSE_NAME_MAX 256
+   char name[VDUSE_NAME_MAX];
+   __u32 vendor_id;
+   __u32 device_id;
+   __u64 features;
+   __u32 vq_num;
+   __u32 vq_align;
+   __u32 reserved[13];
+   __u32 config_size;
+   __u8 config[];
+};
+
+/* Create a VDUSE device which is represented by a char device 
(/dev/vduse/$NAME) */
+#define VDUSE_CREATE_DEV   _IOW(VDUSE_BASE, 0x02, struct vduse_dev_config)
+
+/*
+ * Destroy a VDUSE device. Make sure there are no more references
+ * to the char device (/dev/vduse/$NAME).
+ */
+#define VDUSE_DESTROY_DEV  _IOW(VDUSE_BASE, 0x03, char[VDUSE_NAME_MAX])
+
+/* The ioctls for VDUSE device (/dev/vduse/$NAME) */
+
+/**
+ * struct vduse_iotlb_entry - entry of IOTLB to describe one IOVA region 
[start, last]
+ * @offset: the mmap offset on returned file descriptor
+ * @start: start of the IOVA region
+ * @last: last of the IOVA region
+ * @perm: access permission of the IOVA region
+ *
+ * Structure used by VDUSE_IOTLB_GET_FD ioctl to find an overlapped IOVA 
region.
+ */
+struct vduse_iotlb_entry {
+   __u64 offset;
+   __u64 start;
+   __u64 last;
+#define VDUSE_ACCESS_RO 0x1
+#define VDUSE_ACCESS_WO 0x2
+#define VDUSE_ACCESS_RW 0x3
+   __u8 perm;
+};
+
+/*
+ * Find the first IOVA region that overlaps with the range [start, last]
+ * and return the corresponding file descriptor. Return -EINVAL means the
+ * IOVA region doesn't exist. Caller should set start and last fields.
+ */
+#define VDUSE_IOTLB_GET_FD _IOWR(VDUSE_BASE, 0x10, struct 
vduse_iotlb_entry)
+
+/*
+ * Get the negotiated virtio features. It's a subset of the features in
+ * struct vduse_dev_config which can be accepted by virtio driver. It's
+ * only valid after FEATURES_OK status bit is set.
+ */
+#define VDUSE_DEV_GET_FEATURES _IOR(VDUSE_BASE, 0x11, __u64)
+
+/**
+ * struct vduse_config_data - data used to update configuration space
+ * @offset: the offset from the beginning of configuration space
+ * @length: the length to write to configuration space
+ * @buffer: the buffer used to write from
+ *
+ * Structure used by VDUSE_DEV_SET_CONFIG ioctl to update device
+ * configuration space.
+ */
+struct vduse_config_data {
+   __u32 offset;
+   __u32 length;
+   __u8 buffer[];
+};
+
+/* Set device configuration space */
+#define VDUSE_DEV_SET_CONFIG   _IOW(VDUSE_BASE, 0x12, struct vduse_config_data)
+
+/*
+ * Inject a config interrupt. It's usually used to notify virtio driver
+ * that device configuration space has changed.
+ */
+#define VDUSE_DEV_INJECT_CONFIG_IRQ_IO(VDUSE_BASE, 0x13)
+
+/**
+ * struct vduse_vq_config - basic configuration of a virtqueue
+ * @index: virtqueue index
+ * @max_size: the max size of virtqueue
+ * @reserved: for future use, needs to be initialized to zero
+ *
+ * Structure used by VDUSE_VQ_SETUP ioctl to setup a virtqueue.
+ */
+struct vduse_vq_config {
+   __u32 index;
+   __u16 max_size;
+   __u16 reserved[13];
+};
+
+/*
+ * Setup the specified virtqueue. Make sure all virtqueues have been
+ * configured before the device is attached to vDPA bus.
+ */
+#define VDUSE_VQ_SETUP _IOW(VDUSE_BASE, 0x14, struct vduse_vq_config)
+
+/**
+ * struct vduse_vq_state_split - split virtqueue state
+ * @avail_index: available index
+ */

Re: [PATCH 0/4] Make qemu-img dd more flexible

2022-02-15 Thread Fabian Ebner

Am 11.02.22 um 17:42 schrieb Hanna Reitz:
> On 11.02.22 17:31, Eric Blake wrote:
>> On Thu, Feb 10, 2022 at 02:31:19PM +0100, Fabian Ebner wrote:
>>> Adds support for reading from stdin and writing to stdout (when raw
>>> format is used), as well as overriding the size of the output and
>>> input image/stream.
>>>
>>> Additionally, the options -n for skipping output image creation and -l
>>> for loading a snapshot are made available like for convert.
>> Without looking at the series itself, I want to refer back to earlier
>> times that someone proposed improving 'qemu-img dd':
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg00636.html
>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg02618.html
>>
>> As well as the observation that when we originally allowed 'qemu-img
>> dd' to be added, the end goal was that if 'qemu-img dd' can't operate
>> as a thin wrapper around 'qemu-img convert', then 'qemu-img convert'
>> needs to be made more powerful first.  Every time we diverge on what
>> the two uses can do, rather than keeping dd as a thin wrapper, we add
>> to our maintenance burden.

I'm wondering why it's not actually implemented as a thin wrapper then?
The fact that it isn't is (part of) the reason why dd was chosen, as
mentioned in the first patch:

"While dd and convert have overlapping use cases, `dd` is a
simple read/write loop while convert is much more
sophisticated and has ways to dealing with holes and blocks
of zeroes.
Since these typically can't be detected in pipes via
SEEK_DATA/HOLE or skipped while writing, dd seems to be the
better choice for implementing stdin/stdout streams."

Adding the same feature to convert seems much more involved.

>>
>> Sadly, there is a lot of technical debt in this area ('qemu-img dd
>> skip= count=' is STILL broken, more than 4 years after I first
>> proposed a potential patch), where no one has spent the necessary time
>> to improve the situation.
> 
> Note that by now (in contrast to 2018), we have FUSE disk exports, and I
> even have a script that uses them to let you run dd on any image:
> 
> https://gitlab.com/hreitz/qemu-scripts/-/blob/main/qemu-dd.py
> 
> Which is nice, because it gives you feature parity with dd, because it
> simply runs dd.

Thank you for the suggestion. It's definitely worth considering,
although it does add a bit of complexity and we currently don't have
FUSE support enabled in our builds.

> 
> (The main problem with the script is that it lives in that personal repo
> of mine and so nobody but me knows about it.  Suggestions to improve
> that situation are more than welcome.)
> 
> Now, the qemu storage daemon does not support loading qcow2 snapshots
> (as far as I’m aware), which is proposed in patch 4 of this series.  But
> I think that just means that it would be nice if the QSD could support
> that.

I suppose adding a snapshot-load QMP command would be a natural way to
add it?

> 
> Hanna
> 
>

Re: [PATCH v2 7/8] tests/qemu-iotests/testrunner: Print diff to stderr in TAP mode

2022-02-15 Thread Paolo Bonzini


On 2/9/22 11:15, Thomas Huth wrote:

When running in TAP mode, stdout is reserved for the TAP protocol.
To see the "diff" of the failed test, we have to print it to
stderr instead.

Signed-off-by: Thomas Huth 
---
  tests/qemu-iotests/testrunner.py | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/testrunner.py b/tests/qemu-iotests/testrunner.py
index 9d20f51bb1..1f7ca1f2f9 100644
--- a/tests/qemu-iotests/testrunner.py
+++ b/tests/qemu-iotests/testrunner.py
@@ -404,7 +404,10 @@ def run_tests(self, tests: List[str], jobs: int = 1) -> 
bool:
  if res.status == 'fail':
  failed.append(name)
  if res.diff:
-print('\n'.join(res.diff))
+if self.tap:
+print('\n'.join(res.diff), file=sys.stderr)
+else:
+print('\n'.join(res.diff))
  elif res.status == 'not run':
  notrun.append(name)
  elif res.status == 'pass':


Queued, thanks.

Paolo

54 matches

Mail list logo