Re: [PATCH v2] nbd/server: Add --selinux-label option

2021-09-24 Thread Eric Blake
Ping

On Wed, Aug 25, 2021 at 02:35:04PM -0500, Eric Blake wrote:
> On Fri, Jul 23, 2021 at 05:38:06PM +0100, Daniel P. Berrangé wrote:
> > On Fri, Jul 23, 2021 at 06:18:55PM +0200, Kevin Wolf wrote:
> > > Am 23.07.2021 um 12:33 hat Richard W.M. Jones geschrieben:
> > > > Under SELinux, Unix domain sockets have two labels.  One is on the
> > > > disk and can be set with commands such as chcon(1).  There is a
> > > > different label stored in memory (called the process label).  This can
> > > > only be set by the process creating the socket.  When using SELinux +
> > > > SVirt and wanting qemu to be able to connect to a qemu-nbd instance,
> > > > you must set both labels correctly first.
> > > > 
> > > > For qemu-nbd the options to set the second label are awkward.  You can
> > > > create the socket in a wrapper program and then exec into qemu-nbd.
> > > > Or you could try something with LD_PRELOAD.
> > > > 
> > > > This commit adds the ability to set the label straightforwardly on the
> > > > command line, via the new --selinux-label flag.  (The name of the flag
> > > > is the same as the equivalent nbdkit option.)
> > > > 
> > > > A worked example showing how to use the new option can be found in
> > > > this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1984938
> > > > 
> > > > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1984938
> > > > Signed-off-by: Richard W.M. Jones 
> > > 
> > > I suppose this would also be relevant for the built-in NBD server,
> > > especially in the context of qemu-storage-daemon?
> > 
> > It depends on the usage scenario really. nbdkit / qemu-nbd are
> > not commonly run under any SELinux policy, so then end up being
> > unconfined_t. A QEMU NBD client can't connect to an unconfined_t
> > socket, so we need to override it with this arg.
> > 
> > In the case of qemu system emulator, under libvirt, it will
> > already have a svirt_t type, so in that case there is no need
> > to override the type for the socket.
> > 
> > For qsd there's not really any strong practice established
> > but i expect most current usage is unconfined_t too and
> > would benefit from setting label.
> > 
> > > If so, is this something specific to NBD sockets, or would it actually
> > > make sense to have it as a generic option in UnixSocketAddress?
> > 
> > It is applicable to inet sockets too in fact.
> 
> So now that 6.2 is open, should I queue the patch as is, or wait for a
> v3 that makes the option more generic to all socket usage?
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v3 1/9] nbd/client-connection: nbd_co_establish_connection(): fix non set errp

2021-09-24 Thread Eric Blake
On Tue, Sep 07, 2021 at 12:44:53PM -0500, Eric Blake wrote:
> On Mon, Sep 06, 2021 at 10:06:46PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > When we don't have a connection and blocking is false, we return NULL
> > but don't set errp. That's wrong.
> 
> Oops...
> 
> > 
> > We have two paths for calling nbd_co_establish_connection():
> > 
> > 1. nbd_open() -> nbd_do_establish_connection() -> ...
> >   but that will never set blocking=false
> > 
> > 2. nbd_reconnect_attempt() -> nbd_co_do_establish_connection() -> ...
> >   but that uses errp=NULL
> > 
> > So, we are safe with our wrong errp policy in
> > nbd_co_establish_connection(). Still let's fix it.
> 
> ...phew!  Thus, it's not critical to backport.
> 
> Reviewed-by: Eric Blake 

Queuing this one through my NBD tree.  Given the discussion on 2/9,
I'll leave the rest of the series for later.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 11/11] Deprecate stable non-JSON -device and -object

2021-09-24 Thread Eric Blake
On Fri, Sep 24, 2021 at 11:04:27AM +0200, Kevin Wolf wrote:
> We want to switch both from QemuOpts to the keyval parser in the future,
> which results in some incompatibilities, mainly around list handling.
> Mark the non-JSON version of both as unstable syntax so that management
> tools switch to JSON and we can later make the change without breaking
> things.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  docs/about/deprecated.rst | 11 +++
>  1 file changed, 11 insertions(+)

Reviewed-by: Eric Blake 

> 
> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> index 3c2be84d80..42f6a478fb 100644
> --- a/docs/about/deprecated.rst
> +++ b/docs/about/deprecated.rst
> @@ -160,6 +160,17 @@ Use ``-display sdl`` instead.
>  
>  Use ``-display curses`` instead.
>  
> +Stable non-JSON ``-device`` and ``-object`` syntax (since 6.2)
> +''
> +
> +If you rely on a stable interface for ``-device`` and ``-object`` that 
> doesn't
> +change incompatibly between QEMU versions (e.g. because you are using the 
> QEMU
> +command line as a machine interface in scripts rather than interactively), 
> use
> +JSON syntax for these options instead.
> +
> +There is no intention to remove support for non-JSON syntax entirely, but
> +future versions may change the way to spell some options.
> +
>  
>  Plugin argument passing through ``arg=`` (since 6.1)
>  
> -- 
> 2.31.1
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH] nbd/server: Allow LIST_META_CONTEXT without STRUCTURED_REPLY

2021-09-24 Thread Eric Blake
On Tue, Sep 14, 2021 at 05:19:42PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 07.09.2021 20:35, Eric Blake wrote:
> > The NBD protocol just relaxed the requirements on
> > NBD_OPT_LIST_META_CONTEXT:
> > 
> > https://github.com/NetworkBlockDevice/nbd/commit/13a4e33a87
> > 
> > Since listing is not stateful (unlike SET_META_CONTEXT), we don't care
> > if a client asks for meta contexts without first requesting structured
> > replies.  Well-behaved clients will still ask for structured reply
> > first (if for no other reason than for back-compat to older servers),
> > but that's no reason to avoid this change.
> > 
> > Signed-off-by: Eric Blake
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 

Thanks; queuing this one through my NBD tree.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 10/11] vl: Enable JSON syntax for -device

2021-09-24 Thread Eric Blake
On Fri, Sep 24, 2021 at 11:04:26AM +0200, Kevin Wolf wrote:
> Like we already do for -object, introduce support for JSON syntax in
> -device, which can be kept stable in the long term and guarantees that a
> single code path with identical behaviour is used for both QMP and the
> command line. Compared to the QemuOpts based code, the parser contains
> less surprises and has support for non-scalar options (lists and
> structs). Switching management tools to JSON means that we can more
> easily change the "human" CLI syntax from QemuOpts to the keyval parser
> later.
> 
> In the QAPI schema, a feature flag is added to the device-add command to
> allow management tools to detect support for this.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qapi/qdev.json | 15 +
>  softmmu/vl.c   | 58 --
>  2 files changed, 62 insertions(+), 11 deletions(-)
> 
> diff --git a/qapi/qdev.json b/qapi/qdev.json
> index b83178220b..cdc8f911b5 100644
> --- a/qapi/qdev.json
> +++ b/qapi/qdev.json
> @@ -32,17 +32,23 @@
>  ##
>  # @device_add:
>  #
> +# Add a device.
> +#
>  # @driver: the name of the new device's driver
>  #
>  # @bus: the device's parent bus (device tree path)
>  #
>  # @id: the device's ID, must be unique
>  #
> -# Additional arguments depend on the type.
> -#
> -# Add a device.
> +# Features:
> +# @json-cli: If present, the "-device" command line option supports JSON
> +#syntax with a structure identical to the arguments of this
> +#command.
>  #
>  # Notes:
> +#
> +# Additional arguments depend on the type.
> +#
>  # 1. For detailed information about this command, please refer to the
>  #'docs/qdev-device-use.txt' file.
>  #
> @@ -67,7 +73,8 @@
>  ##
>  { 'command': 'device_add',
>'data': {'driver': 'str', '*bus': 'str', '*id': 'str'},
> -  'gen': false } # so we can get the additional arguments
> +  'gen': false, # so we can get the additional arguments
> +  'features': ['json-cli'] }

Eventually, we'll get rid of this 'gen':false, but this patch series
is already an improvement towards that goal.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 09/11] qdev: Avoid QemuOpts in QMP device_add

2021-09-24 Thread Eric Blake
On Fri, Sep 24, 2021 at 11:04:25AM +0200, Kevin Wolf wrote:
> Directly call qdev_device_add_from_qdict() for QMP device_add instead of
> first going through QemuOpts and converting back to QDict.
> 
> Note that this changes the behaviour of device_add, though in ways that
> should be considered bug fixes:
> 
> QemuOpts ignores differences between data types, so you could
> successfully pass a string "123" for an integer property, or a string
> "on" for a boolean property (and vice versa).  After this change, the
> correct data type for the property must be used in the JSON input.
> 
> qemu_opts_from_qdict() also silently ignores any options whose value is
> a QDict, QList or QNull.
> 
> To illustrate, the following QMP command was accepted before and is now
> rejected for both reasons:
> 
> { "execute": "device_add",
>   "arguments": { "driver": "scsi-cd",
>  "drive": { "completely": "invalid" },
>  "physical_block_size": "4096" } }
> 
> Signed-off-by: Kevin Wolf 
> ---
>  softmmu/qdev-monitor.c | 18 +++---
>  1 file changed, 11 insertions(+), 7 deletions(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 08/11] qdev: Base object creation on QDict rather than QemuOpts

2021-09-24 Thread Eric Blake
On Fri, Sep 24, 2021 at 11:04:24AM +0200, Kevin Wolf wrote:
> QDicts are both what QMP natively uses and what the keyval parser
> produces. Going through QemuOpts isn't useful for either one, so switch
> the main device creation function to QDicts. By sharing more code with
> the -object/object-add code path, we can even reduce the code size a
> bit.
> 
> This commit doesn't remove the detour through QemuOpts from any code
> path yet, but it allows the following commits to do so.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/hw/qdev-core.h |  8 ++---
>  hw/core/qdev.c |  5 ++--
>  hw/net/virtio-net.c|  4 +--
>  hw/vfio/pci.c  |  4 +--
>  softmmu/qdev-monitor.c | 67 +++---
>  5 files changed, 41 insertions(+), 47 deletions(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 04/11] qdev: Avoid using string visitor for properties

2021-09-24 Thread Eric Blake
On Fri, Sep 24, 2021 at 11:04:20AM +0200, Kevin Wolf wrote:
> The only thing the string visitor adds compared to a keyval visitor is
> list support. git grep for 'visit_start_list' and 'visit.*List' shows
> that devices don't make use of this.
> 
> In a world with a QAPIfied command line interface, the keyval visitor is
> used to parse the command line. In order to make sure that no devices
> start using this feature that would make backwards compatibility harder,
> just switch away from object_property_parse(), which internally uses the
> string visitor, to a keyval visitor and object_property_set().
> 
> Signed-off-by: Kevin Wolf 
> ---
>  softmmu/qdev-monitor.c | 20 +---
>  1 file changed, 17 insertions(+), 3 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 0/6] iotests: update environment and linting configuration

2021-09-24 Thread John Snow
On Thu, Sep 23, 2021 at 2:07 PM John Snow  wrote:

> GitLab: https://gitlab.com/jsnow/qemu/-/commits/python-package-iotest-pt1
> CI: https://gitlab.com/jsnow/qemu/-/pipelines/376236687
>
> This series partially supersedes:
>   [PATCH v3 00/16] python/iotests: Run iotest linters during Python CI'
>
> Howdy, this is good stuff we want even if we aren't yet in agreement
> about the best way to run iotest 297 from CI.
>
> - Update linting config to tolerate pylint 2.11.1
> - Eliminate sys.path hacking in individual test files
> - make mypy execution in test 297 faster
>
> The rest of the actual "run at CI time" stuff can get handled separately
> and later pending some discussion on the other series.
>
> V2:
>
> 001/6:[0011] [FC] 'iotests: add 'qemu' package location to PYTHONPATH in
> testenv'
> 002/6:[0025] [FC] 'iotests: add warning for rogue 'qemu' packages'
>
> - Squashed in a small optimization from Vladimir to 001, kept R-Bs.
> - Fixed the package detection logic to not panic if it can't find
>   'qemu' at all (kwolf)
> - Updated commit messages for the first two patches.
>
> --js
>
> John Snow (6):
>   iotests: add 'qemu' package location to PYTHONPATH in testenv
>   iotests: add warning for rogue 'qemu' packages
>   iotests/linters: check mypy files all at once
>   iotests/mirror-top-perms: Adjust imports
>   iotests/migrate-bitmaps-test: delint
>   iotests: Update for pylint 2.11.1
>
>  tests/qemu-iotests/235|  2 -
>  tests/qemu-iotests/297| 50 ---
>  tests/qemu-iotests/300|  7 ++-
>  tests/qemu-iotests/iotests.py |  2 -
>  tests/qemu-iotests/pylintrc   |  6 ++-
>  tests/qemu-iotests/testenv.py | 39 ---
>  tests/qemu-iotests/testrunner.py  |  7 +--
>  tests/qemu-iotests/tests/migrate-bitmaps-test | 50 +++
>  tests/qemu-iotests/tests/mirror-top-perms | 12 ++---
>  9 files changed, 99 insertions(+), 76 deletions(-)
>
> --
> 2.31.1
>
>
>
Patch 2 can just be dropped, and everything else is reviewed, so I think
this can be staged at your leisure.

--js


Re: [PATCH v2 2/6] iotests: add warning for rogue 'qemu' packages

2021-09-24 Thread John Snow
On Thu, Sep 23, 2021 at 4:27 PM Vladimir Sementsov-Ogievskiy <
vsement...@virtuozzo.com> wrote:

> 23.09.2021 21:44, John Snow wrote:
> >
> >
> > On Thu, Sep 23, 2021 at 2:32 PM Vladimir Sementsov-Ogievskiy <
> vsement...@virtuozzo.com > wrote:
> >
> > 23.09.2021 21:07, John Snow wrote:
> >  > Add a warning for when 'iotests' runs against a qemu namespace
> that
> >  > isn't the one in the source tree. This might occur if you have
> >  > (accidentally) installed the Python namespace package to your
> local
> >  > packages.
> >  >
> >  > (I'm not going to say that this is because I bit myself with this,
> >  > but you can fill in the blanks.)
> >  >
> >  > In the future, we will pivot to always preferring a specific
> installed
> >  > instance of qemu python packages managed directly by iotests. For
> now
> >  > simply warn if there is an ambiguity over which instance that
> iotests
> >  > might use.
> >  >
> >  > Example: If a user has navigated to ~/src/qemu/python and executed
> >  > `pip install .`, you will see output like this when running
> `./check`:
> >  >
> >  > WARNING: 'qemu' python packages will be imported from outside the
> source tree ('/home/jsnow/src/qemu/python')
> >  >   Importing instead from
> '/home/jsnow/.local/lib/python3.9/site-packages/qemu'
> >  >
> >  > Signed-off-by: John Snow  js...@redhat.com>>
> >  > ---
> >  >   tests/qemu-iotests/testenv.py | 24 
> >  >   1 file changed, 24 insertions(+)
> >  >
> >  > diff --git a/tests/qemu-iotests/testenv.py
> b/tests/qemu-iotests/testenv.py
> >  > index 99a57a69f3a..1c0f6358538 100644
> >  > --- a/tests/qemu-iotests/testenv.py
> >  > +++ b/tests/qemu-iotests/testenv.py
> >  > @@ -16,6 +16,8 @@
> >  >   # along with this program.  If not, see <
> http://www.gnu.org/licenses/ >.
> >  >   #
> >  >
> >  > +import importlib.util
> >  > +import logging
> >  >   import os
> >  >   import sys
> >  >   import tempfile
> >  > @@ -112,6 +114,27 @@ def init_directories(self) -> None:
> >  >   # Path where qemu goodies live in this source tree.
> >  >   qemu_srctree_path = Path(__file__,
> '../../../python').resolve()
> >  >
> >  > +# Warn if we happen to be able to find qemu namespace
> packages
> >  > +# (using qemu.qmp as a bellwether) from an unexpected
> location.
> >  > +# i.e. the package is already installed in the user's
> environment.
> >  > +try:
> >  > +qemu_spec = importlib.util.find_spec('qemu.qmp')
> >  > +except ModuleNotFoundError:
> >  > +qemu_spec = None
> >  > +
> >  > +if qemu_spec and qemu_spec.origin:
> >  > +spec_path = Path(qemu_spec.origin)
> >  > +try:
> >  > +_ = spec_path.relative_to(qemu_srctree_path)
> >
> > It took some time and looking at specification trying to understand
> what's going on here :)
> >
> > Could we just use:
> >
> > if not Path(qemu_spec.origin).is_relative_to(qemu_srctree_path):
> >  ... logging ...
> >
> >
> > Nope, that's 3.9+ only. (I made the same mistake.)
>
> Oh :(
>
> OK
>
> >
> >
> >  > +except ValueError:
> >
> >  > +self._logger.warning(
> >  > +"WARNING: 'qemu' python packages will be
> imported from"
> >  > +" outside the source tree ('%s')",
> >  > +qemu_srctree_path)
>
> why not use f-strings ? :)
>
>
The logging subsystem still uses % formatting by default, you can see I'm
not actually using the % operator to apply the values -- the Python docs
claim this is for "efficiency" because the string doesn't have to be
evaluated unless the logging statement is actually emitted, but that gain
sounds mostly theoretical to me, because Python still has eager evaluation
of passed arguments ... unless I've missed something.

Either way, using f-strings on logging calls gives you a pylint warning
that I'd just have to then disable, so I just skip the hassle.

Now, the logging system *does* allow you to use new-style python strings [
https://docs.python.org/3/howto/logging-cookbook.html#formatting-styles ]
but given that the default is still %-style and virtually every library
uses that style, I thought it might cause more problems than it creates by
trying to use a different style for just one portion of the codebase. Mind
you, I've never even bothered to try, so my apprehensions might not be
strictly factual ;)


> >  > +self._logger.warning(
> >  > +" Importing instead from '%s'",
> >  > +spec_path.parents[1])
> >  > +
> >
> > Also, I'd move this new chunk 

Re: [PATCH 01/11] qom: Reduce use of error_propagate()

2021-09-24 Thread Eric Blake
On Fri, Sep 24, 2021 at 11:04:17AM +0200, Kevin Wolf wrote:
> ERRP_GUARD() makes debugging easier by making sure that _abort
> still fails at the real origin of the error instead of
> error_propagate().
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qom/object.c|  7 +++
>  qom/object_interfaces.c | 17 ++---
>  2 files changed, 9 insertions(+), 15 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v6 4/5] block/nbd: drop connection_co

2021-09-24 Thread Eric Blake
On Thu, Sep 02, 2021 at 01:38:04PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> OK, that's a big rewrite of the logic.

And a time-consuming review on my part!

> 
> Pre-patch we have an always running coroutine - connection_co. It does
> reply receiving and reconnecting. And it leads to a lot of difficult
> and unobvious code around drained sections and context switch. We also
> abuse bs->in_flight counter which is increased for connection_co and
> temporary decreased in points where we want to allow drained section to
> begin. One of these place is in another file: in nbd_read_eof() in
> nbd/client.c.
> 
> We also cancel reconnect and requests waiting for reconnect on drained
> begin which is not correct. And this patch fixes that.
> 
> Let's finally drop this always running coroutine and go another way:
> do both reconnect and receiving in request coroutines.
> 
> The detailed list of changes below (in the sequence of diff hunks).

Well, depending on the git order file in use ;)

> 
> 1. receiving coroutines are woken directly from nbd_channel_error, when
>we change s->state
> 
> 2. nbd_co_establish_connection_cancel(): we don't have drain_begin now,
>and in nbd_teardown_connection() all requests should already be
>finished (and reconnect is done from request). So
>nbd_co_establish_connection_cancel() is called from
>nbd_cancel_in_flight() (to cancel the request that is doing
>nbd_co_establish_connection()) and from reconnect_delay_timer_cb()
>(previously we didn't need it, as reconnect delay only should cancel
>active requests not the reconnection itself. But now reconnection

Missing )

>itself is done in the separate thread (we now call
>nbd_client_connection_enable_retry() in nbd_open()), and we need to
>cancel the requests that waits in nbd_co_establish_connection()

Singular/plural disagreement. I think the intended meaning is
'requests that wait' and not 'request that waits'.

>now).
> 
> 2. We do receive headers in request coroutine. But we also should

Second point 2; I'll call it 2A below, because it looks related to
point 8.

>dispatch replies for another pending requests. So,

s/another/other/

>nbd_connection_entry() is turned into nbd_receive_replies(), which
>does reply dispatching until it receive another request headers, and

s/until/while/, s/another/other/

>returns when it receive the requested header.

receives

> 
> 3. All old staff around drained sections and context switch is dropped.
>In details:
>- we don't need to move connection_co to new aio context, as we
>  don't have connection_co anymore
>- we don't have a fake "request" of connection_co (extra increasing
>  in_flight), so don't care with it in drain_begin/end
>- we don't stop reconnection during drained section anymore. This
>  means that drain_begin may wait for a long time (up to
>  reconnect_delay). But that's an improvement and more correct
>  behavior see below[*]
> 
> 4. In nbd_teardown_connection() we don't have to wait for
>connection_co, as it is dropped. And cleanup for s->ioc and nbd_yank
>is moved here from removed connection_co.
> 
> 5. In nbd_co_do_establish_connection() we now should handle
>NBD_CLIENT_CONNECTING_NOWAIT: if new request comes when we are in
>NBD_CLIENT_CONNECTING_NOWAIT, it still should call
>nbd_co_establish_connection() (who knows, maybe connection already
>established by thread in background). But we shouldn't wait: if

maybe the connection was already established by another thread in the background

>nbd_co_establish_connection() can't return new channel immediately
>the request should fail (we are in NBD_CLIENT_CONNECTING_NOWAIT
>state).
> 
> 6. nbd_reconnect_attempt() is simplified: it's now easier to wait for
>other requests in the caller, so here we just assert that fact.
>Also delay time is now initialized here: we can easily detect first
>attempt and start a timer.
> 
> 7. nbd_co_reconnect_loop() is dropped, we don't need it. Reconnect
>retries are fully handle by thread (nbd/client-connection.c), delay
>timer we initialize in nbd_reconnect_attempt(), we don't have to
>bother with s->drained and friends. nbd_reconnect_attempt() now
>called from nbd_co_send_request().

A lot going on there, but it's making sense so far.

> 
> 8. nbd_connection_entry is dropped: reconnect is now handled by
>nbd_co_send_request(), receiving reply is now handled by
>nbd_receive_replies(): all handled from request coroutines.
> 
> 9. So, welcome new nbd_receive_replies() called from request coroutine,
>that receives reply header instead of nbd_connection_entry().
>Like with sending requests, only one coroutine may receive in a
>moment. So we introduce receive_mutex, which is locked around
>nbd_receive_reply(). It also protects some related fields. Still,
>full audit of thread-safety in nbd driver is a separate 

Re: [PATCH 05/11] qdev: Make DeviceState.id independent of QemuOpts

2021-09-24 Thread Vladimir Sementsov-Ogievskiy

24.09.2021 18:10, Kevin Wolf wrote:

Am 24.09.2021 um 16:02 hat Vladimir Sementsov-Ogievskiy geschrieben:

24.09.2021 12:04, Kevin Wolf wrote:

DeviceState.id is a pointer to a string that is stored in the QemuOpts
object DeviceState.opts and freed together with it. We want to create
devices without going through QemuOpts in the future, so make this a
separately allocated string.

Signed-off-by: Kevin Wolf 


Interesting that in hw/xen/xen-legacy-backend.c
g_strdup_printf-allocated id is passed to qdev_set_id prior this
patch. So, the patch seems to fix that small leak. Worth to mention?


Ok, I can mention it explicitly.


---
   include/hw/qdev-core.h  | 2 +-
   include/monitor/qdev.h  | 2 +-
   hw/arm/virt.c   | 2 +-
   hw/core/qdev.c  | 1 +
   hw/pci-bridge/pci_expander_bridge.c | 2 +-
   hw/ppc/e500.c   | 2 +-
   softmmu/qdev-monitor.c  | 5 +++--
   7 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 34c8a7506a..1857d9698e 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -176,7 +176,7 @@ struct DeviceState {
   Object parent_obj;
   /*< public >*/
-const char *id;
+char *id;
   char *canonical_path;
   bool realized;
   bool pending_deleted_event;
diff --git a/include/monitor/qdev.h b/include/monitor/qdev.h
index eaa947d73a..389287eb44 100644
--- a/include/monitor/qdev.h
+++ b/include/monitor/qdev.h
@@ -9,6 +9,6 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error 
**errp);
   int qdev_device_help(QemuOpts *opts);
   DeviceState *qdev_device_add(QemuOpts *opts, Error **errp);
-void qdev_set_id(DeviceState *dev, const char *id);
+void qdev_set_id(DeviceState *dev, char *id);
   #endif
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 1d59f0e59f..f933d48d3b 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1459,7 +1459,7 @@ static void create_platform_bus(VirtMachineState *vms)
   MemoryRegion *sysmem = get_system_memory();
   dev = qdev_new(TYPE_PLATFORM_BUS_DEVICE);
-dev->id = TYPE_PLATFORM_BUS_DEVICE;
+dev->id = g_strdup(TYPE_PLATFORM_BUS_DEVICE);
   qdev_prop_set_uint32(dev, "num_irqs", PLATFORM_BUS_NUM_IRQS);
   qdev_prop_set_uint32(dev, "mmio_size", 
vms->memmap[VIRT_PLATFORM_BUS].size);
   sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), _fatal);
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index cefc5eaa0a..d918b50a1d 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -956,6 +956,7 @@ static void device_finalize(Object *obj)
   }
   qemu_opts_del(dev->opts);
+g_free(dev->id);
   }
   static void device_class_base_init(ObjectClass *class, void *data)
diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index 7112dc3062..10e6e7c2ab 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -245,7 +245,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool 
pcie, Error **errp)
   } else {
   bus = pci_root_bus_new(ds, "pxb-internal", NULL, NULL, 0, 
TYPE_PXB_BUS);
   bds = qdev_new("pci-bridge");
-bds->id = dev_name;
+bds->id = g_strdup(dev_name);
   qdev_prop_set_uint8(bds, PCI_BRIDGE_DEV_PROP_CHASSIS_NR, 
pxb->bus_nr);
   qdev_prop_set_bit(bds, PCI_BRIDGE_DEV_PROP_SHPC, false);
   }
diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
index 95451414dd..960e7efcd3 100644
--- a/hw/ppc/e500.c
+++ b/hw/ppc/e500.c
@@ -1006,7 +1006,7 @@ void ppce500_init(MachineState *machine)
   /* Platform Bus Device */
   if (pmc->has_platform_bus) {
   dev = qdev_new(TYPE_PLATFORM_BUS_DEVICE);
-dev->id = TYPE_PLATFORM_BUS_DEVICE;
+dev->id = g_strdup(TYPE_PLATFORM_BUS_DEVICE);
   qdev_prop_set_uint32(dev, "num_irqs", pmc->platform_bus_num_irqs);
   qdev_prop_set_uint32(dev, "mmio_size", pmc->platform_bus_size);
   sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), _fatal);
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 034b999401..1207e57a46 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -592,7 +592,8 @@ static BusState *qbus_find(const char *path, Error **errp)
   return bus;
   }
-void qdev_set_id(DeviceState *dev, const char *id)
+/* Takes ownership of @id, will be freed when deleting the device */
+void qdev_set_id(DeviceState *dev, char *id)
   {
   if (id) {
   dev->id = id;
@@ -690,7 +691,7 @@ DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
   }
   }
-qdev_set_id(dev, qemu_opts_id(opts));
+qdev_set_id(dev, g_strdup(qemu_opts_id(opts)));
   /* set properties */
   if (qemu_opt_foreach(opts, set_property, dev, errp)) {



In hw/pci/pci_bridge.c

we do

br->bus_name = dev->qdev.id

It seems bad, as we now stole dynamic pointer, that may be freed later.


If it's bad now, it was as bad before 

Re: [PATCH 05/11] qdev: Make DeviceState.id independent of QemuOpts

2021-09-24 Thread Kevin Wolf
Am 24.09.2021 um 16:02 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 24.09.2021 12:04, Kevin Wolf wrote:
> > DeviceState.id is a pointer to a string that is stored in the QemuOpts
> > object DeviceState.opts and freed together with it. We want to create
> > devices without going through QemuOpts in the future, so make this a
> > separately allocated string.
> > 
> > Signed-off-by: Kevin Wolf 
> 
> Interesting that in hw/xen/xen-legacy-backend.c
> g_strdup_printf-allocated id is passed to qdev_set_id prior this
> patch. So, the patch seems to fix that small leak. Worth to mention?

Ok, I can mention it explicitly.

> > ---
> >   include/hw/qdev-core.h  | 2 +-
> >   include/monitor/qdev.h  | 2 +-
> >   hw/arm/virt.c   | 2 +-
> >   hw/core/qdev.c  | 1 +
> >   hw/pci-bridge/pci_expander_bridge.c | 2 +-
> >   hw/ppc/e500.c   | 2 +-
> >   softmmu/qdev-monitor.c  | 5 +++--
> >   7 files changed, 9 insertions(+), 7 deletions(-)
> > 
> > diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> > index 34c8a7506a..1857d9698e 100644
> > --- a/include/hw/qdev-core.h
> > +++ b/include/hw/qdev-core.h
> > @@ -176,7 +176,7 @@ struct DeviceState {
> >   Object parent_obj;
> >   /*< public >*/
> > -const char *id;
> > +char *id;
> >   char *canonical_path;
> >   bool realized;
> >   bool pending_deleted_event;
> > diff --git a/include/monitor/qdev.h b/include/monitor/qdev.h
> > index eaa947d73a..389287eb44 100644
> > --- a/include/monitor/qdev.h
> > +++ b/include/monitor/qdev.h
> > @@ -9,6 +9,6 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error 
> > **errp);
> >   int qdev_device_help(QemuOpts *opts);
> >   DeviceState *qdev_device_add(QemuOpts *opts, Error **errp);
> > -void qdev_set_id(DeviceState *dev, const char *id);
> > +void qdev_set_id(DeviceState *dev, char *id);
> >   #endif
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 1d59f0e59f..f933d48d3b 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -1459,7 +1459,7 @@ static void create_platform_bus(VirtMachineState *vms)
> >   MemoryRegion *sysmem = get_system_memory();
> >   dev = qdev_new(TYPE_PLATFORM_BUS_DEVICE);
> > -dev->id = TYPE_PLATFORM_BUS_DEVICE;
> > +dev->id = g_strdup(TYPE_PLATFORM_BUS_DEVICE);
> >   qdev_prop_set_uint32(dev, "num_irqs", PLATFORM_BUS_NUM_IRQS);
> >   qdev_prop_set_uint32(dev, "mmio_size", 
> > vms->memmap[VIRT_PLATFORM_BUS].size);
> >   sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), _fatal);
> > diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> > index cefc5eaa0a..d918b50a1d 100644
> > --- a/hw/core/qdev.c
> > +++ b/hw/core/qdev.c
> > @@ -956,6 +956,7 @@ static void device_finalize(Object *obj)
> >   }
> >   qemu_opts_del(dev->opts);
> > +g_free(dev->id);
> >   }
> >   static void device_class_base_init(ObjectClass *class, void *data)
> > diff --git a/hw/pci-bridge/pci_expander_bridge.c 
> > b/hw/pci-bridge/pci_expander_bridge.c
> > index 7112dc3062..10e6e7c2ab 100644
> > --- a/hw/pci-bridge/pci_expander_bridge.c
> > +++ b/hw/pci-bridge/pci_expander_bridge.c
> > @@ -245,7 +245,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool 
> > pcie, Error **errp)
> >   } else {
> >   bus = pci_root_bus_new(ds, "pxb-internal", NULL, NULL, 0, 
> > TYPE_PXB_BUS);
> >   bds = qdev_new("pci-bridge");
> > -bds->id = dev_name;
> > +bds->id = g_strdup(dev_name);
> >   qdev_prop_set_uint8(bds, PCI_BRIDGE_DEV_PROP_CHASSIS_NR, 
> > pxb->bus_nr);
> >   qdev_prop_set_bit(bds, PCI_BRIDGE_DEV_PROP_SHPC, false);
> >   }
> > diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
> > index 95451414dd..960e7efcd3 100644
> > --- a/hw/ppc/e500.c
> > +++ b/hw/ppc/e500.c
> > @@ -1006,7 +1006,7 @@ void ppce500_init(MachineState *machine)
> >   /* Platform Bus Device */
> >   if (pmc->has_platform_bus) {
> >   dev = qdev_new(TYPE_PLATFORM_BUS_DEVICE);
> > -dev->id = TYPE_PLATFORM_BUS_DEVICE;
> > +dev->id = g_strdup(TYPE_PLATFORM_BUS_DEVICE);
> >   qdev_prop_set_uint32(dev, "num_irqs", pmc->platform_bus_num_irqs);
> >   qdev_prop_set_uint32(dev, "mmio_size", pmc->platform_bus_size);
> >   sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), _fatal);
> > diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
> > index 034b999401..1207e57a46 100644
> > --- a/softmmu/qdev-monitor.c
> > +++ b/softmmu/qdev-monitor.c
> > @@ -592,7 +592,8 @@ static BusState *qbus_find(const char *path, Error 
> > **errp)
> >   return bus;
> >   }
> > -void qdev_set_id(DeviceState *dev, const char *id)
> > +/* Takes ownership of @id, will be freed when deleting the device */
> > +void qdev_set_id(DeviceState *dev, char *id)
> >   {
> >   if (id) {
> >   dev->id = id;
> > @@ -690,7 +691,7 @@ DeviceState *qdev_device_add(QemuOpts *opts, Error 
> > **errp)
> >   }
> >   }

Re: [PATCH] block: introduce max_hw_iov for use in scsi-generic

2021-09-24 Thread Michael Roth via
On Fri, Sep 24, 2021 at 08:50:05AM +0200, Christian Borntraeger wrote:
> Peter, Michael,
> 
> do we still do stable releases for QEMU or has this stopped?

Hi Christian,

Yes, it's just been a perfect storm of job moves / bad timing / much-needed
testing rework. I plan to restart the stable releases starting with 6.0.1 and
6.1.1 shortly after. I'll get the release schedules posted on the release wiki
week.

Sorry for the delays on this.

-Mike

> 
> Am 24.09.21 um 07:27 schrieb Paolo Bonzini:
> > Yes, the question is whether it still exists... Paolo El jue., 23 sept. 
> > 2021 16:48, Christian Borntraeger  escribió: Am 
> > 23.09.21 um 15:04 schrieb Paolo Bonzini: > Linux limits the size of iovecs 
> > to 1024 (UIO_MAXIOV ZjQcmQRYFpfptBannerStart
> > This Message Is From an External Sender
> > This message came from outside your organization.
> > ZjQcmQRYFpfptBannerEnd
> > Yes, the question is whether it still exists...
> > 
> > Paolo
> > 
> > El jue., 23 sept. 2021 16:48, Christian Borntraeger  > > escribió:
> > 
> > 
> > 
> > Am 23.09.21 um 15:04 schrieb Paolo Bonzini:
> >  > Linux limits the size of iovecs to 1024 (UIO_MAXIOV in the kernel
> >  > sources, IOV_MAX in POSIX).  Because of this, on some host adapters
> >  > requests with many iovecs are rejected with -EINVAL by the
> >  > io_submit() or readv()/writev() system calls.
> >  >
> >  > In fact, the same limit applies to SG_IO as well.  To fix both the
> >  > EINVAL and the possible performance issues from using fewer iovecs
> >  > than allowed by Linux (some HBAs have max_segments as low as 128),
> >  > introduce a separate entry in BlockLimits to hold the max_segments
> >  > value from sysfs.  This new limit is used only for SG_IO and clamped
> >  > to bs->bl.max_iov anyway, just like max_hw_transfer is clamped to
> >  > bs->bl.max_transfer.
> >  >
> >  > Reported-by: Halil Pasic  > >
> >  > Cc: Hanna Reitz mailto:hre...@redhat.com>>
> >  > Cc: Kevin Wolf mailto:kw...@redhat.com>>
> >  > Cc: qemu-block@nongnu.org 
> >  > Fixes: 18473467d5 ("file-posix: try BLKSECTGET on block devices too, 
> > do not round to power of 2", 2021-06-25)
> > 
> > This sneaked in shortly before the 6.1 release (between rc0 and rc1 I 
> > think).
> > Shouldnt that go to stable in cass this still exist?
> > 
> > 
> >  > Signed-off-by: Paolo Bonzini  > >
> >  > ---
> >  >   block/block-backend.c          | 6 ++
> >  >   block/file-posix.c             | 2 +-
> >  >   block/io.c                     | 1 +
> >  >   hw/scsi/scsi-generic.c         | 2 +-
> >  >   include/block/block_int.h      | 7 +++
> >  >   include/sysemu/block-backend.h | 1 +
> >  >   6 files changed, 17 insertions(+), 2 deletions(-)
> >  >
> >  > diff --git a/block/block-backend.c b/block/block-backend.c
> >  > index 6140d133e2..ba2b5ebb10 100644
> >  > --- a/block/block-backend.c
> >  > +++ b/block/block-backend.c
> >  > @@ -1986,6 +1986,12 @@ uint32_t blk_get_max_transfer(BlockBackend 
> > *blk)
> >  >       return ROUND_DOWN(max, blk_get_request_alignment(blk));
> >  >   }
> >  >
> >  > +int blk_get_max_hw_iov(BlockBackend *blk)
> >  > +{
> >  > +    return MIN_NON_ZERO(blk->root->bs->bl.max_hw_iov,
> >  > +                        blk->root->bs->bl.max_iov);
> >  > +}
> >  > +
> >  >   int blk_get_max_iov(BlockBackend *blk)
> >  >   {
> >  >       return blk->root->bs->bl.max_iov;
> >  > diff --git a/block/file-posix.c b/block/file-posix.c
> >  > index cb9bffe047..1567edb3d5 100644
> >  > --- a/block/file-posix.c
> >  > +++ b/block/file-posix.c
> >  > @@ -1273,7 +1273,7 @@ static void 
> > raw_refresh_limits(BlockDriverState *bs, Error **errp)
> >  >
> >  >           ret = hdev_get_max_segments(s->fd, );
> >  >           if (ret > 0) {
> >  > -            bs->bl.max_iov = ret;
> >  > +            bs->bl.max_hw_iov = ret;
> >  >           }
> >  >       }
> >  >   }
> >  > diff --git a/block/io.c b/block/io.c
> >  > index a19942718b..f38e7f81d8 100644
> >  > --- a/block/io.c
> >  > +++ b/block/io.c
> >  > @@ -136,6 +136,7 @@ static void bdrv_merge_limits(BlockLimits *dst, 
> > const BlockLimits *src)
> >  >       dst->min_mem_alignment = MAX(dst->min_mem_alignment,
> >  >                                    src->min_mem_alignment);
> >  >       dst->max_iov = MIN_NON_ZERO(dst->max_iov, src->max_iov);
> >  > +    dst->max_hw_iov = MIN_NON_ZERO(dst->max_hw_iov, 
> > src->max_hw_iov);
> >  >   }
> >  >
> >  >   typedef struct BdrvRefreshLimitsState {
> >  > diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
> >  > index 665baf900e..0306ccc7b1 100644
> >  > --- 

Re: [PATCH 07/11] qemu-option: Allow deleting opts during qemu_opts_foreach()

2021-09-24 Thread Vladimir Sementsov-Ogievskiy

24.09.2021 12:04, Kevin Wolf wrote:

Use QTAILQ_FOREACH_SAFE() so that the current QemuOpts can be deleted
while iterating through the whole list.

Signed-off-by: Kevin Wolf


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PULL 0/3] hw/nvme updates

2021-09-24 Thread Peter Maydell
On Fri, 24 Sept 2021 at 07:47, Klaus Jensen  wrote:
>
> From: Klaus Jensen 
>
> Hi Peter,
>
> The following changes since commit 2c3e83f92d93fbab071b8a96b8ab769b01902475:
>
>   Merge remote-tracking branch 
> 'remotes/alistair23/tags/pull-riscv-to-apply-20210921' into staging 
> (2021-09-21 10:57:48 -0700)
>
> are available in the Git repository at:
>
>   git://git.infradead.org/qemu-nvme.git tags/nvme-next-pull-request
>
> for you to fetch changes up to c53a9a91021c2f57de9ab18393d0048bd0fe90c2:
>
>   hw/nvme: Return error for fused operations (2021-09-24 08:43:58 +0200)
>
> 
> hw/nvme updates
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.2
for any user-visible changes.

-- PMM



Re: [PATCH 01/11] qom: Reduce use of error_propagate()

2021-09-24 Thread Markus Armbruster
Kevin Wolf  writes:

> ERRP_GUARD() makes debugging easier by making sure that _abort
> still fails at the real origin of the error instead of
> error_propagate().
>
> Signed-off-by: Kevin Wolf 

Yes.

The code you patch uses error_propagate() to work around functions not
returning distinct error values.  error.h's big comment recommends such
return values, but recommendations don't update code, patches do.

Until then, ERRP_GUARD() is clearly a better crutch than
error_propagate().

Reviewed-by: Markus Armbruster 




Re: [PATCH 06/11] qdev: Add Error parameter to qdev_set_id()

2021-09-24 Thread Vladimir Sementsov-Ogievskiy

24.09.2021 12:04, Kevin Wolf wrote:

object_property_add_child() fails (with _abort) if an object with
the same name already exists. As long as QemuOpts is in use for -device
and device_add, it catches duplicate IDs before qdev_set_id() is even
called. However, for enabling non-QemuOpts code paths, we need to make
sure that the condition doesn't cause a crash, but fails gracefully.

Signed-off-by: Kevin Wolf 
---
  include/monitor/qdev.h  |  2 +-
  hw/xen/xen-legacy-backend.c |  3 ++-
  softmmu/qdev-monitor.c  | 16 ++--
  3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/monitor/qdev.h b/include/monitor/qdev.h
index 389287eb44..7961308c75 100644
--- a/include/monitor/qdev.h
+++ b/include/monitor/qdev.h
@@ -9,6 +9,6 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error 
**errp);
  
  int qdev_device_help(QemuOpts *opts);

  DeviceState *qdev_device_add(QemuOpts *opts, Error **errp);
-void qdev_set_id(DeviceState *dev, char *id);
+void qdev_set_id(DeviceState *dev, char *id, Error **errp);
  
  #endif

diff --git a/hw/xen/xen-legacy-backend.c b/hw/xen/xen-legacy-backend.c
index dd8ae1452d..17aca85ddc 100644
--- a/hw/xen/xen-legacy-backend.c
+++ b/hw/xen/xen-legacy-backend.c
@@ -276,7 +276,8 @@ static struct XenLegacyDevice *xen_be_get_xendev(const char 
*type, int dom,
  xendev = g_malloc0(ops->size);
  object_initialize(>qdev, ops->size, TYPE_XENBACKEND);
  OBJECT(xendev)->free = g_free;
-qdev_set_id(DEVICE(xendev), g_strdup_printf("xen-%s-%d", type, dev));
+qdev_set_id(DEVICE(xendev), g_strdup_printf("xen-%s-%d", type, dev),
+_abort);
  qdev_realize(DEVICE(xendev), xen_sysbus, _fatal);
  object_unref(OBJECT(xendev));
  
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c

index 1207e57a46..c2af906df0 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -593,26 +593,27 @@ static BusState *qbus_find(const char *path, Error **errp)
  }
  
  /* Takes ownership of @id, will be freed when deleting the device */

-void qdev_set_id(DeviceState *dev, char *id)
+void qdev_set_id(DeviceState *dev, char *id, Error **errp)


According to recommendations in error.h, worth adding also return value (for 
example true=success false=failure), so [..]


  {
  if (id) {
  dev->id = id;
  }


Unrelated but.. What's the strange logic?

Is it intended that with passed id=NULL we don't update dev->id variable but try 
to do following logic with old dev->id?

  
  if (dev->id) {

-object_property_add_child(qdev_get_peripheral(), dev->id,
-  OBJECT(dev));
+object_property_try_add_child(qdev_get_peripheral(), dev->id,
+  OBJECT(dev), errp);
  } else {
  static int anon_count;
  gchar *name = g_strdup_printf("device[%d]", anon_count++);
-object_property_add_child(qdev_get_peripheral_anon(), name,
-  OBJECT(dev));
+object_property_try_add_child(qdev_get_peripheral_anon(), name,
+  OBJECT(dev), errp);
  g_free(name);
  }
  }
  
  DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)

  {
+ERRP_GUARD();
  DeviceClass *dc;
  const char *driver, *path;
  DeviceState *dev = NULL;
@@ -691,7 +692,10 @@ DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
  }
  }
  
-qdev_set_id(dev, g_strdup(qemu_opts_id(opts)));

+qdev_set_id(dev, g_strdup(qemu_opts_id(opts)), errp);
+if (*errp) {
+goto err_del_dev;
+}


[..] here we'll have

if (!qdev_set_id(...)) {
  goto err_del_dev;
}

and no need for ERRP_GUARD.

  
  /* set properties */

  if (qemu_opt_foreach(opts, set_property, dev, errp)) {




--
Best regards,
Vladimir



Re: [PATCH 05/11] qdev: Make DeviceState.id independent of QemuOpts

2021-09-24 Thread Vladimir Sementsov-Ogievskiy

24.09.2021 12:04, Kevin Wolf wrote:

DeviceState.id is a pointer to a string that is stored in the QemuOpts
object DeviceState.opts and freed together with it. We want to create
devices without going through QemuOpts in the future, so make this a
separately allocated string.

Signed-off-by: Kevin Wolf 


Interesting that in hw/xen/xen-legacy-backend.c g_strdup_printf-allocated id is 
passed to qdev_set_id prior this patch. So, the patch seems to fix that small 
leak. Worth to mention?


---
  include/hw/qdev-core.h  | 2 +-
  include/monitor/qdev.h  | 2 +-
  hw/arm/virt.c   | 2 +-
  hw/core/qdev.c  | 1 +
  hw/pci-bridge/pci_expander_bridge.c | 2 +-
  hw/ppc/e500.c   | 2 +-
  softmmu/qdev-monitor.c  | 5 +++--
  7 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 34c8a7506a..1857d9698e 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -176,7 +176,7 @@ struct DeviceState {
  Object parent_obj;
  /*< public >*/
  
-const char *id;

+char *id;
  char *canonical_path;
  bool realized;
  bool pending_deleted_event;
diff --git a/include/monitor/qdev.h b/include/monitor/qdev.h
index eaa947d73a..389287eb44 100644
--- a/include/monitor/qdev.h
+++ b/include/monitor/qdev.h
@@ -9,6 +9,6 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error 
**errp);
  
  int qdev_device_help(QemuOpts *opts);

  DeviceState *qdev_device_add(QemuOpts *opts, Error **errp);
-void qdev_set_id(DeviceState *dev, const char *id);
+void qdev_set_id(DeviceState *dev, char *id);
  
  #endif

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 1d59f0e59f..f933d48d3b 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1459,7 +1459,7 @@ static void create_platform_bus(VirtMachineState *vms)
  MemoryRegion *sysmem = get_system_memory();
  
  dev = qdev_new(TYPE_PLATFORM_BUS_DEVICE);

-dev->id = TYPE_PLATFORM_BUS_DEVICE;
+dev->id = g_strdup(TYPE_PLATFORM_BUS_DEVICE);
  qdev_prop_set_uint32(dev, "num_irqs", PLATFORM_BUS_NUM_IRQS);
  qdev_prop_set_uint32(dev, "mmio_size", 
vms->memmap[VIRT_PLATFORM_BUS].size);
  sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), _fatal);
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index cefc5eaa0a..d918b50a1d 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -956,6 +956,7 @@ static void device_finalize(Object *obj)
  }
  
  qemu_opts_del(dev->opts);

+g_free(dev->id);
  }
  
  static void device_class_base_init(ObjectClass *class, void *data)

diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index 7112dc3062..10e6e7c2ab 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -245,7 +245,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool 
pcie, Error **errp)
  } else {
  bus = pci_root_bus_new(ds, "pxb-internal", NULL, NULL, 0, 
TYPE_PXB_BUS);
  bds = qdev_new("pci-bridge");
-bds->id = dev_name;
+bds->id = g_strdup(dev_name);
  qdev_prop_set_uint8(bds, PCI_BRIDGE_DEV_PROP_CHASSIS_NR, pxb->bus_nr);
  qdev_prop_set_bit(bds, PCI_BRIDGE_DEV_PROP_SHPC, false);
  }
diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
index 95451414dd..960e7efcd3 100644
--- a/hw/ppc/e500.c
+++ b/hw/ppc/e500.c
@@ -1006,7 +1006,7 @@ void ppce500_init(MachineState *machine)
  /* Platform Bus Device */
  if (pmc->has_platform_bus) {
  dev = qdev_new(TYPE_PLATFORM_BUS_DEVICE);
-dev->id = TYPE_PLATFORM_BUS_DEVICE;
+dev->id = g_strdup(TYPE_PLATFORM_BUS_DEVICE);
  qdev_prop_set_uint32(dev, "num_irqs", pmc->platform_bus_num_irqs);
  qdev_prop_set_uint32(dev, "mmio_size", pmc->platform_bus_size);
  sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), _fatal);
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 034b999401..1207e57a46 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -592,7 +592,8 @@ static BusState *qbus_find(const char *path, Error **errp)
  return bus;
  }
  
-void qdev_set_id(DeviceState *dev, const char *id)

+/* Takes ownership of @id, will be freed when deleting the device */
+void qdev_set_id(DeviceState *dev, char *id)
  {
  if (id) {
  dev->id = id;
@@ -690,7 +691,7 @@ DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
  }
  }
  
-qdev_set_id(dev, qemu_opts_id(opts));

+qdev_set_id(dev, g_strdup(qemu_opts_id(opts)));
  
  /* set properties */

  if (qemu_opt_foreach(opts, set_property, dev, errp)) {



In hw/pci/pci_bridge.c

we do

   br->bus_name = dev->qdev.id

It seems bad, as we now stole dynamic pointer, that may be freed later.

--
Best regards,
Vladimir



Re: [PATCH 03/11] iotests/051: Fix typo

2021-09-24 Thread Vladimir Sementsov-Ogievskiy

24.09.2021 12:04, Kevin Wolf wrote:

The iothread isn't called 'iothread0', but 'thread0'. Depending on the
order that properties are parsed, the error message may change from the
expected one to another one saying that the iothread doesn't exist.

Signed-off-by: Kevin Wolf



Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH 02/11] iotests/245: Fix type for iothread property

2021-09-24 Thread Vladimir Sementsov-Ogievskiy

24.09.2021 12:04, Kevin Wolf wrote:

iothread is a string property, so None (= JSON null) is not a valid
value for it. Pass the empty string instead to get the default iothread.

Signed-off-by: Kevin Wolf


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH 01/11] qom: Reduce use of error_propagate()

2021-09-24 Thread Vladimir Sementsov-Ogievskiy

24.09.2021 12:04, Kevin Wolf wrote:

ERRP_GUARD() makes debugging easier by making sure that _abort
still fails at the real origin of the error instead of
error_propagate().

Signed-off-by: Kevin Wolf 
---
  qom/object.c|  7 +++
  qom/object_interfaces.c | 17 ++---
  2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/qom/object.c b/qom/object.c
index e86cb05b84..6be710bc40 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -1389,7 +1389,7 @@ bool object_property_get(Object *obj, const char *name, 
Visitor *v,
  bool object_property_set(Object *obj, const char *name, Visitor *v,
   Error **errp)
  {
-Error *err = NULL;
+ERRP_GUARD();
  ObjectProperty *prop = object_property_find_err(obj, name, errp);
  
  if (prop == NULL) {

@@ -1400,9 +1400,8 @@ bool object_property_set(Object *obj, const char *name, 
Visitor *v,
  error_setg(errp, QERR_PERMISSION_DENIED);
  return false;
  }
-prop->set(obj, v, name, prop->opaque, );
-error_propagate(errp, err);
-return !err;
+prop->set(obj, v, name, prop->opaque, errp);
+return !*errp;
  }


This is OK: prop->set doesn't return value, so we have to analyze errp to make 
our own return value.

  
  bool object_property_set_str(Object *obj, const char *name,

diff --git a/qom/object_interfaces.c b/qom/object_interfaces.c
index ad9b56b59a..80691e88cd 100644
--- a/qom/object_interfaces.c
+++ b/qom/object_interfaces.c
@@ -45,26 +45,21 @@ bool user_creatable_can_be_deleted(UserCreatable *uc)
  static void object_set_properties_from_qdict(Object *obj, const QDict *qdict,
   Visitor *v, Error **errp)
  {
+ERRP_GUARD();
  const QDictEntry *e;
-Error *local_err = NULL;
  
-if (!visit_start_struct(v, NULL, NULL, 0, _err)) {

-goto out;
+if (!visit_start_struct(v, NULL, NULL, 0, errp)) {
+return;
  }
  for (e = qdict_first(qdict); e; e = qdict_next(qdict, e)) {
-if (!object_property_set(obj, e->key, v, _err)) {
+if (!object_property_set(obj, e->key, v, errp)) {
  break;
  }
  }
-if (!local_err) {
-visit_check_struct(v, _err);
+if (!*errp) {
+visit_check_struct(v, errp);
  }
  visit_end_struct(v, NULL);
-
-out:
-if (local_err) {
-error_propagate(errp, local_err);
-}
  }
  
  void object_set_properties_from_keyval(Object *obj, const QDict *qdict,




ERRP_GUARD() + dereferencing errp is good when we use functions which don't 
return any value. So we want to check is it fail or not and we have to analyze 
errp.

But that is not the case: all called functions follow modern recommendations of 
returning some value indicating failure together with setting errp.

I think, it's better not use ERRP_GUARD where it is not needed: in ideal world, 
all functions that may fail do return value, and we don't need neither error 
propagation, nor dereferencing errp. And don't need ERRP_GUARD.  ERRP_GUARD 
will be still needed to support error_prepend()/error_append() together with 
error_fatal, but again that's not the case.

Here I suggest something like this:

static void object_set_properties_from_qdict(Object *obj, const QDict *qdict,
 Visitor *v, Error **errp)
{
const QDictEntry *e;
 
if (!visit_start_struct(v, NULL, NULL, 0, errp)) {

return;
}
for (e = qdict_first(qdict); e; e = qdict_next(qdict, e)) {
if (!object_property_set(obj, e->key, v, errp)) {
goto out;
}
}

visit_check_struct(v, errp);

out:
visit_end_struct(v, NULL);
}


--
Best regards,
Vladimir



Re: [PATCH] block: introduce max_hw_iov for use in scsi-generic

2021-09-24 Thread Kevin Wolf
Am 23.09.2021 um 15:04 hat Paolo Bonzini geschrieben:
> Linux limits the size of iovecs to 1024 (UIO_MAXIOV in the kernel
> sources, IOV_MAX in POSIX).  Because of this, on some host adapters
> requests with many iovecs are rejected with -EINVAL by the
> io_submit() or readv()/writev() system calls.
> 
> In fact, the same limit applies to SG_IO as well.  To fix both the
> EINVAL and the possible performance issues from using fewer iovecs
> than allowed by Linux (some HBAs have max_segments as low as 128),
> introduce a separate entry in BlockLimits to hold the max_segments
> value from sysfs.  This new limit is used only for SG_IO and clamped
> to bs->bl.max_iov anyway, just like max_hw_transfer is clamped to
> bs->bl.max_transfer.
> 
> Reported-by: Halil Pasic 
> Cc: Hanna Reitz 
> Cc: Kevin Wolf 
> Cc: qemu-block@nongnu.org
> Fixes: 18473467d5 ("file-posix: try BLKSECTGET on block devices too, do not 
> round to power of 2", 2021-06-25)
> Signed-off-by: Paolo Bonzini 

Thanks, applied to the block branch.

Kevin




[PATCH 11/11] Deprecate stable non-JSON -device and -object

2021-09-24 Thread Kevin Wolf
We want to switch both from QemuOpts to the keyval parser in the future,
which results in some incompatibilities, mainly around list handling.
Mark the non-JSON version of both as unstable syntax so that management
tools switch to JSON and we can later make the change without breaking
things.

Signed-off-by: Kevin Wolf 
---
 docs/about/deprecated.rst | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 3c2be84d80..42f6a478fb 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -160,6 +160,17 @@ Use ``-display sdl`` instead.
 
 Use ``-display curses`` instead.
 
+Stable non-JSON ``-device`` and ``-object`` syntax (since 6.2)
+''
+
+If you rely on a stable interface for ``-device`` and ``-object`` that doesn't
+change incompatibly between QEMU versions (e.g. because you are using the QEMU
+command line as a machine interface in scripts rather than interactively), use
+JSON syntax for these options instead.
+
+There is no intention to remove support for non-JSON syntax entirely, but
+future versions may change the way to spell some options.
+
 
 Plugin argument passing through ``arg=`` (since 6.1)
 
-- 
2.31.1




[PATCH 09/11] qdev: Avoid QemuOpts in QMP device_add

2021-09-24 Thread Kevin Wolf
Directly call qdev_device_add_from_qdict() for QMP device_add instead of
first going through QemuOpts and converting back to QDict.

Note that this changes the behaviour of device_add, though in ways that
should be considered bug fixes:

QemuOpts ignores differences between data types, so you could
successfully pass a string "123" for an integer property, or a string
"on" for a boolean property (and vice versa).  After this change, the
correct data type for the property must be used in the JSON input.

qemu_opts_from_qdict() also silently ignores any options whose value is
a QDict, QList or QNull.

To illustrate, the following QMP command was accepted before and is now
rejected for both reasons:

{ "execute": "device_add",
  "arguments": { "driver": "scsi-cd",
 "drive": { "completely": "invalid" },
 "physical_block_size": "4096" } }

Signed-off-by: Kevin Wolf 
---
 softmmu/qdev-monitor.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index c09b7430eb..8622ccade6 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -812,7 +812,8 @@ void hmp_info_qdm(Monitor *mon, const QDict *qdict)
 qdev_print_devinfos(true);
 }
 
-void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
+static void monitor_device_add(QDict *qdict, QObject **ret_data,
+   bool from_json, Error **errp)
 {
 QemuOpts *opts;
 DeviceState *dev;
@@ -825,7 +826,9 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error 
**errp)
 qemu_opts_del(opts);
 return;
 }
-dev = qdev_device_add(opts, errp);
+qemu_opts_del(opts);
+
+dev = qdev_device_add_from_qdict(qdict, from_json, errp);
 
 /*
  * Drain all pending RCU callbacks. This is done because
@@ -838,13 +841,14 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, 
Error **errp)
  */
 drain_call_rcu();
 
-if (!dev) {
-qemu_opts_del(opts);
-return;
-}
 object_unref(OBJECT(dev));
 }
 
+void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
+{
+monitor_device_add(qdict, ret_data, true, errp);
+}
+
 static DeviceState *find_device_state(const char *id, Error **errp)
 {
 Object *obj;
@@ -936,7 +940,7 @@ void hmp_device_add(Monitor *mon, const QDict *qdict)
 {
 Error *err = NULL;
 
-qmp_device_add((QDict *)qdict, NULL, );
+monitor_device_add((QDict *)qdict, NULL, false, );
 hmp_handle_error(mon, err);
 }
 
-- 
2.31.1




[PATCH 06/11] qdev: Add Error parameter to qdev_set_id()

2021-09-24 Thread Kevin Wolf
object_property_add_child() fails (with _abort) if an object with
the same name already exists. As long as QemuOpts is in use for -device
and device_add, it catches duplicate IDs before qdev_set_id() is even
called. However, for enabling non-QemuOpts code paths, we need to make
sure that the condition doesn't cause a crash, but fails gracefully.

Signed-off-by: Kevin Wolf 
---
 include/monitor/qdev.h  |  2 +-
 hw/xen/xen-legacy-backend.c |  3 ++-
 softmmu/qdev-monitor.c  | 16 ++--
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/monitor/qdev.h b/include/monitor/qdev.h
index 389287eb44..7961308c75 100644
--- a/include/monitor/qdev.h
+++ b/include/monitor/qdev.h
@@ -9,6 +9,6 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error 
**errp);
 
 int qdev_device_help(QemuOpts *opts);
 DeviceState *qdev_device_add(QemuOpts *opts, Error **errp);
-void qdev_set_id(DeviceState *dev, char *id);
+void qdev_set_id(DeviceState *dev, char *id, Error **errp);
 
 #endif
diff --git a/hw/xen/xen-legacy-backend.c b/hw/xen/xen-legacy-backend.c
index dd8ae1452d..17aca85ddc 100644
--- a/hw/xen/xen-legacy-backend.c
+++ b/hw/xen/xen-legacy-backend.c
@@ -276,7 +276,8 @@ static struct XenLegacyDevice *xen_be_get_xendev(const char 
*type, int dom,
 xendev = g_malloc0(ops->size);
 object_initialize(>qdev, ops->size, TYPE_XENBACKEND);
 OBJECT(xendev)->free = g_free;
-qdev_set_id(DEVICE(xendev), g_strdup_printf("xen-%s-%d", type, dev));
+qdev_set_id(DEVICE(xendev), g_strdup_printf("xen-%s-%d", type, dev),
+_abort);
 qdev_realize(DEVICE(xendev), xen_sysbus, _fatal);
 object_unref(OBJECT(xendev));
 
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 1207e57a46..c2af906df0 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -593,26 +593,27 @@ static BusState *qbus_find(const char *path, Error **errp)
 }
 
 /* Takes ownership of @id, will be freed when deleting the device */
-void qdev_set_id(DeviceState *dev, char *id)
+void qdev_set_id(DeviceState *dev, char *id, Error **errp)
 {
 if (id) {
 dev->id = id;
 }
 
 if (dev->id) {
-object_property_add_child(qdev_get_peripheral(), dev->id,
-  OBJECT(dev));
+object_property_try_add_child(qdev_get_peripheral(), dev->id,
+  OBJECT(dev), errp);
 } else {
 static int anon_count;
 gchar *name = g_strdup_printf("device[%d]", anon_count++);
-object_property_add_child(qdev_get_peripheral_anon(), name,
-  OBJECT(dev));
+object_property_try_add_child(qdev_get_peripheral_anon(), name,
+  OBJECT(dev), errp);
 g_free(name);
 }
 }
 
 DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
 {
+ERRP_GUARD();
 DeviceClass *dc;
 const char *driver, *path;
 DeviceState *dev = NULL;
@@ -691,7 +692,10 @@ DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
 }
 }
 
-qdev_set_id(dev, g_strdup(qemu_opts_id(opts)));
+qdev_set_id(dev, g_strdup(qemu_opts_id(opts)), errp);
+if (*errp) {
+goto err_del_dev;
+}
 
 /* set properties */
 if (qemu_opt_foreach(opts, set_property, dev, errp)) {
-- 
2.31.1




[PATCH 05/11] qdev: Make DeviceState.id independent of QemuOpts

2021-09-24 Thread Kevin Wolf
DeviceState.id is a pointer to a string that is stored in the QemuOpts
object DeviceState.opts and freed together with it. We want to create
devices without going through QemuOpts in the future, so make this a
separately allocated string.

Signed-off-by: Kevin Wolf 
---
 include/hw/qdev-core.h  | 2 +-
 include/monitor/qdev.h  | 2 +-
 hw/arm/virt.c   | 2 +-
 hw/core/qdev.c  | 1 +
 hw/pci-bridge/pci_expander_bridge.c | 2 +-
 hw/ppc/e500.c   | 2 +-
 softmmu/qdev-monitor.c  | 5 +++--
 7 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 34c8a7506a..1857d9698e 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -176,7 +176,7 @@ struct DeviceState {
 Object parent_obj;
 /*< public >*/
 
-const char *id;
+char *id;
 char *canonical_path;
 bool realized;
 bool pending_deleted_event;
diff --git a/include/monitor/qdev.h b/include/monitor/qdev.h
index eaa947d73a..389287eb44 100644
--- a/include/monitor/qdev.h
+++ b/include/monitor/qdev.h
@@ -9,6 +9,6 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error 
**errp);
 
 int qdev_device_help(QemuOpts *opts);
 DeviceState *qdev_device_add(QemuOpts *opts, Error **errp);
-void qdev_set_id(DeviceState *dev, const char *id);
+void qdev_set_id(DeviceState *dev, char *id);
 
 #endif
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 1d59f0e59f..f933d48d3b 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1459,7 +1459,7 @@ static void create_platform_bus(VirtMachineState *vms)
 MemoryRegion *sysmem = get_system_memory();
 
 dev = qdev_new(TYPE_PLATFORM_BUS_DEVICE);
-dev->id = TYPE_PLATFORM_BUS_DEVICE;
+dev->id = g_strdup(TYPE_PLATFORM_BUS_DEVICE);
 qdev_prop_set_uint32(dev, "num_irqs", PLATFORM_BUS_NUM_IRQS);
 qdev_prop_set_uint32(dev, "mmio_size", 
vms->memmap[VIRT_PLATFORM_BUS].size);
 sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), _fatal);
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index cefc5eaa0a..d918b50a1d 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -956,6 +956,7 @@ static void device_finalize(Object *obj)
 }
 
 qemu_opts_del(dev->opts);
+g_free(dev->id);
 }
 
 static void device_class_base_init(ObjectClass *class, void *data)
diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index 7112dc3062..10e6e7c2ab 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -245,7 +245,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool 
pcie, Error **errp)
 } else {
 bus = pci_root_bus_new(ds, "pxb-internal", NULL, NULL, 0, 
TYPE_PXB_BUS);
 bds = qdev_new("pci-bridge");
-bds->id = dev_name;
+bds->id = g_strdup(dev_name);
 qdev_prop_set_uint8(bds, PCI_BRIDGE_DEV_PROP_CHASSIS_NR, pxb->bus_nr);
 qdev_prop_set_bit(bds, PCI_BRIDGE_DEV_PROP_SHPC, false);
 }
diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
index 95451414dd..960e7efcd3 100644
--- a/hw/ppc/e500.c
+++ b/hw/ppc/e500.c
@@ -1006,7 +1006,7 @@ void ppce500_init(MachineState *machine)
 /* Platform Bus Device */
 if (pmc->has_platform_bus) {
 dev = qdev_new(TYPE_PLATFORM_BUS_DEVICE);
-dev->id = TYPE_PLATFORM_BUS_DEVICE;
+dev->id = g_strdup(TYPE_PLATFORM_BUS_DEVICE);
 qdev_prop_set_uint32(dev, "num_irqs", pmc->platform_bus_num_irqs);
 qdev_prop_set_uint32(dev, "mmio_size", pmc->platform_bus_size);
 sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), _fatal);
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 034b999401..1207e57a46 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -592,7 +592,8 @@ static BusState *qbus_find(const char *path, Error **errp)
 return bus;
 }
 
-void qdev_set_id(DeviceState *dev, const char *id)
+/* Takes ownership of @id, will be freed when deleting the device */
+void qdev_set_id(DeviceState *dev, char *id)
 {
 if (id) {
 dev->id = id;
@@ -690,7 +691,7 @@ DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
 }
 }
 
-qdev_set_id(dev, qemu_opts_id(opts));
+qdev_set_id(dev, g_strdup(qemu_opts_id(opts)));
 
 /* set properties */
 if (qemu_opt_foreach(opts, set_property, dev, errp)) {
-- 
2.31.1




[PATCH 07/11] qemu-option: Allow deleting opts during qemu_opts_foreach()

2021-09-24 Thread Kevin Wolf
Use QTAILQ_FOREACH_SAFE() so that the current QemuOpts can be deleted
while iterating through the whole list.

Signed-off-by: Kevin Wolf 
---
 util/qemu-option.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/util/qemu-option.c b/util/qemu-option.c
index 61cb4a97bd..eedd08929b 100644
--- a/util/qemu-option.c
+++ b/util/qemu-option.c
@@ -1126,11 +1126,11 @@ int qemu_opts_foreach(QemuOptsList *list, 
qemu_opts_loopfunc func,
   void *opaque, Error **errp)
 {
 Location loc;
-QemuOpts *opts;
+QemuOpts *opts, *next;
 int rc = 0;
 
 loc_push_none();
-QTAILQ_FOREACH(opts, >head, next) {
+QTAILQ_FOREACH_SAFE(opts, >head, next, next) {
 loc_restore(>loc);
 rc = func(opaque, opts, errp);
 if (rc) {
-- 
2.31.1




[PATCH 04/11] qdev: Avoid using string visitor for properties

2021-09-24 Thread Kevin Wolf
The only thing the string visitor adds compared to a keyval visitor is
list support. git grep for 'visit_start_list' and 'visit.*List' shows
that devices don't make use of this.

In a world with a QAPIfied command line interface, the keyval visitor is
used to parse the command line. In order to make sure that no devices
start using this feature that would make backwards compatibility harder,
just switch away from object_property_parse(), which internally uses the
string visitor, to a keyval visitor and object_property_set().

Signed-off-by: Kevin Wolf 
---
 softmmu/qdev-monitor.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 0705f00846..034b999401 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -28,6 +28,8 @@
 #include "qapi/qmp/dispatch.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qerror.h"
+#include "qapi/qmp/qstring.h"
+#include "qapi/qobject-input-visitor.h"
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
 #include "qemu/help_option.h"
@@ -198,16 +200,28 @@ static int set_property(void *opaque, const char *name, 
const char *value,
 Error **errp)
 {
 Object *obj = opaque;
+QString *val;
+Visitor *v;
+int ret;
 
 if (strcmp(name, "driver") == 0)
 return 0;
 if (strcmp(name, "bus") == 0)
 return 0;
 
-if (!object_property_parse(obj, name, value, errp)) {
-return -1;
+val = qstring_from_str(value);
+v = qobject_input_visitor_new_keyval(QOBJECT(val));
+
+if (!object_property_set(obj, name, v, errp)) {
+ret = -1;
+goto out;
 }
-return 0;
+
+ret = 0;
+out:
+visit_free(v);
+qobject_unref(val);
+return ret;
 }
 
 static const char *find_typename_by_alias(const char *alias)
-- 
2.31.1




[PATCH 10/11] vl: Enable JSON syntax for -device

2021-09-24 Thread Kevin Wolf
Like we already do for -object, introduce support for JSON syntax in
-device, which can be kept stable in the long term and guarantees that a
single code path with identical behaviour is used for both QMP and the
command line. Compared to the QemuOpts based code, the parser contains
less surprises and has support for non-scalar options (lists and
structs). Switching management tools to JSON means that we can more
easily change the "human" CLI syntax from QemuOpts to the keyval parser
later.

In the QAPI schema, a feature flag is added to the device-add command to
allow management tools to detect support for this.

Signed-off-by: Kevin Wolf 
---
 qapi/qdev.json | 15 +
 softmmu/vl.c   | 58 --
 2 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/qapi/qdev.json b/qapi/qdev.json
index b83178220b..cdc8f911b5 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -32,17 +32,23 @@
 ##
 # @device_add:
 #
+# Add a device.
+#
 # @driver: the name of the new device's driver
 #
 # @bus: the device's parent bus (device tree path)
 #
 # @id: the device's ID, must be unique
 #
-# Additional arguments depend on the type.
-#
-# Add a device.
+# Features:
+# @json-cli: If present, the "-device" command line option supports JSON
+#syntax with a structure identical to the arguments of this
+#command.
 #
 # Notes:
+#
+# Additional arguments depend on the type.
+#
 # 1. For detailed information about this command, please refer to the
 #'docs/qdev-device-use.txt' file.
 #
@@ -67,7 +73,8 @@
 ##
 { 'command': 'device_add',
   'data': {'driver': 'str', '*bus': 'str', '*id': 'str'},
-  'gen': false } # so we can get the additional arguments
+  'gen': false, # so we can get the additional arguments
+  'features': ['json-cli'] }
 
 ##
 # @device_del:
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 55ab70eb97..7596d9da06 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -144,6 +144,12 @@ typedef struct ObjectOption {
 QTAILQ_ENTRY(ObjectOption) next;
 } ObjectOption;
 
+typedef struct DeviceOption {
+QDict *opts;
+Location loc;
+QTAILQ_ENTRY(DeviceOption) next;
+} DeviceOption;
+
 static const char *cpu_option;
 static const char *mem_path;
 static const char *incoming;
@@ -151,6 +157,7 @@ static const char *loadvm;
 static const char *accelerators;
 static QDict *machine_opts_dict;
 static QTAILQ_HEAD(, ObjectOption) object_opts = 
QTAILQ_HEAD_INITIALIZER(object_opts);
+static QTAILQ_HEAD(, DeviceOption) device_opts = 
QTAILQ_HEAD_INITIALIZER(device_opts);
 static ram_addr_t maxram_size;
 static uint64_t ram_slots;
 static int display_remote;
@@ -494,21 +501,39 @@ const char *qemu_get_vm_name(void)
 return qemu_name;
 }
 
-static int default_driver_check(void *opaque, QemuOpts *opts, Error **errp)
+static void default_driver_disable(const char *driver)
 {
-const char *driver = qemu_opt_get(opts, "driver");
 int i;
 
-if (!driver)
-return 0;
+if (!driver) {
+return;
+}
+
 for (i = 0; i < ARRAY_SIZE(default_list); i++) {
 if (strcmp(default_list[i].driver, driver) != 0)
 continue;
 *(default_list[i].flag) = 0;
 }
+}
+
+static int default_driver_check(void *opaque, QemuOpts *opts, Error **errp)
+{
+const char *driver = qemu_opt_get(opts, "driver");
+
+default_driver_disable(driver);
 return 0;
 }
 
+static void default_driver_check_json(void)
+{
+DeviceOption *opt;
+
+QTAILQ_FOREACH(opt, _opts, next) {
+const char *driver = qdict_get_try_str(opt->opts, "driver");
+default_driver_disable(driver);
+}
+}
+
 static int parse_name(void *opaque, QemuOpts *opts, Error **errp)
 {
 const char *proc_name;
@@ -1311,6 +1336,7 @@ static void qemu_disable_default_devices(void)
 {
 MachineClass *machine_class = MACHINE_GET_CLASS(current_machine);
 
+default_driver_check_json();
 qemu_opts_foreach(qemu_find_opts("device"),
   default_driver_check, NULL, NULL);
 qemu_opts_foreach(qemu_find_opts("global"),
@@ -2637,6 +2663,8 @@ static void qemu_init_board(void)
 
 static void qemu_create_cli_devices(void)
 {
+DeviceOption *opt;
+
 soundhw_init();
 
 qemu_opts_foreach(qemu_find_opts("fw_cfg"),
@@ -2652,6 +2680,13 @@ static void qemu_create_cli_devices(void)
 rom_set_order_override(FW_CFG_ORDER_OVERRIDE_DEVICE);
 qemu_opts_foreach(qemu_find_opts("device"),
   device_init_func, NULL, _fatal);
+QTAILQ_FOREACH(opt, _opts, next) {
+QObject *ret_data;
+
+loc_push_restore(>loc);
+qmp_device_add(opt->opts, _data, _fatal);
+loc_pop(>loc);
+}
 rom_reset_order_override();
 }
 
@@ -3352,9 +3387,18 @@ void qemu_init(int argc, char **argv, char **envp)
 add_device_config(DEV_USB, optarg);
 break;
 case QEMU_OPTION_device:
-if 

[PATCH 02/11] iotests/245: Fix type for iothread property

2021-09-24 Thread Kevin Wolf
iothread is a string property, so None (= JSON null) is not a valid
value for it. Pass the empty string instead to get the default iothread.

Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/245 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
index bf8261eec0..9b12b42eed 100755
--- a/tests/qemu-iotests/245
+++ b/tests/qemu-iotests/245
@@ -1189,10 +1189,10 @@ class TestBlockdevReopen(iotests.QMPTestCase):
 self.run_test_iothreads('iothread0', 'iothread0')
 
 def test_iothreads_switch_backing(self):
-self.run_test_iothreads('iothread0', None)
+self.run_test_iothreads('iothread0', '')
 
 def test_iothreads_switch_overlay(self):
-self.run_test_iothreads(None, 'iothread0')
+self.run_test_iothreads('', 'iothread0')
 
 if __name__ == '__main__':
 iotests.activate_logging()
-- 
2.31.1




[PATCH 00/11] qdev: Add JSON -device and fix QMP device_add

2021-09-24 Thread Kevin Wolf
It's still a long way until we'll have QAPIfied devices, but there are
some improvements that we can already make now to make the future switch
easier.

One important part of this is having code paths without QemuOpts, which
we want to get rid of and replace with the keyval parser in the long
run. This series adds support for JSON syntax to -device, which bypasses
QemuOpts.

While we're not using QAPI yet, devices are based on QOM, so we already
do have type checks and an implied schema. JSON syntax supported now can
be supported by QAPI later and regarding command line compatibility,
actually switching to it becomes an implementation detail this way (of
course, it will still add valuable user-visible features like
introspection and documentation).

Apart from making things more future proof, this also immediately adds
a way to do non-scalar properties on the command line. nvme could have
used list support recently, and the lack of it in -device led to some
rather unnatural solution in the first version (doing the relationship
between a device and objects backwards) and loss of features in the
following. With this series, using a list as a device property should be
possible without any weird tricks.

Unfortunately, even QMP device_add goes through QemuOpts before this
series, which destroys any type safety QOM provides and also can't
support non-scalar properties. This is a bug that is fixed during this
series, but that is technically an incompatible change. A similar change
was made for netdev_add in commit db2a380c84.

libvirt needs to make sure that it actually always passes the right
type, because passing a wrong type will start to fail after this
series. I hope that libvirt already does things correctly, so this
won't cause any trouble, but if it does, there is the option of using
the QemuOpts-less code path only for JSON -device and going through a
deprecation period for QMP device_add.

Kevin Wolf (11):
  qom: Reduce use of error_propagate()
  iotests/245: Fix type for iothread property
  iotests/051: Fix typo
  qdev: Avoid using string visitor for properties
  qdev: Make DeviceState.id independent of QemuOpts
  qdev: Add Error parameter to qdev_set_id()
  qemu-option: Allow deleting opts during qemu_opts_foreach()
  qdev: Base object creation on QDict rather than QemuOpts
  qdev: Avoid QemuOpts in QMP device_add
  vl: Enable JSON syntax for -device
  Deprecate stable non-JSON -device and -object

 qapi/qdev.json  | 15 +++--
 docs/about/deprecated.rst   | 11 
 include/hw/qdev-core.h  | 10 ++--
 include/monitor/qdev.h  |  2 +-
 hw/arm/virt.c   |  2 +-
 hw/core/qdev.c  |  6 +-
 hw/net/virtio-net.c |  4 +-
 hw/pci-bridge/pci_expander_bridge.c |  2 +-
 hw/ppc/e500.c   |  2 +-
 hw/vfio/pci.c   |  4 +-
 hw/xen/xen-legacy-backend.c |  3 +-
 qom/object.c|  7 +--
 qom/object_interfaces.c | 17 ++
 softmmu/qdev-monitor.c  | 90 +
 softmmu/vl.c| 58 ---
 util/qemu-option.c  |  4 +-
 tests/qemu-iotests/051  |  2 +-
 tests/qemu-iotests/051.pc.out   |  4 +-
 tests/qemu-iotests/245  |  4 +-
 19 files changed, 161 insertions(+), 86 deletions(-)

-- 
2.31.1




[PATCH 08/11] qdev: Base object creation on QDict rather than QemuOpts

2021-09-24 Thread Kevin Wolf
QDicts are both what QMP natively uses and what the keyval parser
produces. Going through QemuOpts isn't useful for either one, so switch
the main device creation function to QDicts. By sharing more code with
the -object/object-add code path, we can even reduce the code size a
bit.

This commit doesn't remove the detour through QemuOpts from any code
path yet, but it allows the following commits to do so.

Signed-off-by: Kevin Wolf 
---
 include/hw/qdev-core.h |  8 ++---
 hw/core/qdev.c |  5 ++--
 hw/net/virtio-net.c|  4 +--
 hw/vfio/pci.c  |  4 +--
 softmmu/qdev-monitor.c | 67 +++---
 5 files changed, 41 insertions(+), 47 deletions(-)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 1857d9698e..5b3d4704a5 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -180,7 +180,7 @@ struct DeviceState {
 char *canonical_path;
 bool realized;
 bool pending_deleted_event;
-QemuOpts *opts;
+QDict *opts;
 int hotplugged;
 bool allow_unplug_during_migration;
 BusState *parent_bus;
@@ -202,7 +202,7 @@ struct DeviceListener {
  * hide a failover device depending for example on the device
  * opts.
  */
-bool (*hide_device)(DeviceListener *listener, QemuOpts *device_opts);
+bool (*hide_device)(DeviceListener *listener, const QDict *device_opts);
 QTAILQ_ENTRY(DeviceListener) link;
 };
 
@@ -831,13 +831,13 @@ void device_listener_unregister(DeviceListener *listener);
 
 /**
  * @qdev_should_hide_device:
- * @opts: QemuOpts as passed on cmdline.
+ * @opts: options QDict
  *
  * Check if a device should be added.
  * When a device is added via qdev_device_add() this will be called,
  * and return if the device should be added now or not.
  */
-bool qdev_should_hide_device(QemuOpts *opts);
+bool qdev_should_hide_device(const QDict *opts);
 
 typedef enum MachineInitPhase {
 /* current_machine is NULL.  */
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index d918b50a1d..5b889866c5 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -28,6 +28,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qapi/qapi-events-qdev.h"
+#include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qerror.h"
 #include "qapi/visitor.h"
 #include "qemu/error-report.h"
@@ -211,7 +212,7 @@ void device_listener_unregister(DeviceListener *listener)
 QTAILQ_REMOVE(_listeners, listener, link);
 }
 
-bool qdev_should_hide_device(QemuOpts *opts)
+bool qdev_should_hide_device(const QDict *opts)
 {
 DeviceListener *listener;
 
@@ -955,7 +956,7 @@ static void device_finalize(Object *obj)
 dev->canonical_path = NULL;
 }
 
-qemu_opts_del(dev->opts);
+qobject_unref(dev->opts);
 g_free(dev->id);
 }
 
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f205331dcf..5684c2b2b7 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3304,7 +3304,7 @@ static void virtio_net_migration_state_notifier(Notifier 
*notifier, void *data)
 }
 
 static bool failover_hide_primary_device(DeviceListener *listener,
- QemuOpts *device_opts)
+ const QDict *device_opts)
 {
 VirtIONet *n = container_of(listener, VirtIONet, primary_listener);
 const char *standby_id;
@@ -3312,7 +3312,7 @@ static bool failover_hide_primary_device(DeviceListener 
*listener,
 if (!device_opts) {
 return false;
 }
-standby_id = qemu_opt_get(device_opts, "failover_pair_id");
+standby_id = qdict_get_try_str(device_opts, "failover_pair_id");
 if (g_strcmp0(standby_id, n->netclient_name) != 0) {
 return false;
 }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 4feaa1cb68..5cdf1d4298 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -29,10 +29,10 @@
 #include "hw/qdev-properties.h"
 #include "hw/qdev-properties-system.h"
 #include "migration/vmstate.h"
+#include "qapi/qmp/qdict.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
-#include "qemu/option.h"
 #include "qemu/range.h"
 #include "qemu/units.h"
 #include "sysemu/kvm.h"
@@ -941,7 +941,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
 }
 
 if (vfio_opt_rom_in_denylist(vdev)) {
-if (dev->opts && qemu_opt_get(dev->opts, "rombar")) {
+if (dev->opts && qdict_haskey(dev->opts, "rombar")) {
 warn_report("Device at %s is known to cause system instability"
 " issues during option rom execution",
 vdev->vbasedev.name);
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index c2af906df0..c09b7430eb 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -196,34 +196,6 @@ static void qdev_print_devinfos(bool show_no_user)
 g_slist_free(list);
 }
 
-static int set_property(void *opaque, const char *name, const char *value,
-Error **errp)
-{
-Object *obj = opaque;
-   

[PATCH 01/11] qom: Reduce use of error_propagate()

2021-09-24 Thread Kevin Wolf
ERRP_GUARD() makes debugging easier by making sure that _abort
still fails at the real origin of the error instead of
error_propagate().

Signed-off-by: Kevin Wolf 
---
 qom/object.c|  7 +++
 qom/object_interfaces.c | 17 ++---
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/qom/object.c b/qom/object.c
index e86cb05b84..6be710bc40 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -1389,7 +1389,7 @@ bool object_property_get(Object *obj, const char *name, 
Visitor *v,
 bool object_property_set(Object *obj, const char *name, Visitor *v,
  Error **errp)
 {
-Error *err = NULL;
+ERRP_GUARD();
 ObjectProperty *prop = object_property_find_err(obj, name, errp);
 
 if (prop == NULL) {
@@ -1400,9 +1400,8 @@ bool object_property_set(Object *obj, const char *name, 
Visitor *v,
 error_setg(errp, QERR_PERMISSION_DENIED);
 return false;
 }
-prop->set(obj, v, name, prop->opaque, );
-error_propagate(errp, err);
-return !err;
+prop->set(obj, v, name, prop->opaque, errp);
+return !*errp;
 }
 
 bool object_property_set_str(Object *obj, const char *name,
diff --git a/qom/object_interfaces.c b/qom/object_interfaces.c
index ad9b56b59a..80691e88cd 100644
--- a/qom/object_interfaces.c
+++ b/qom/object_interfaces.c
@@ -45,26 +45,21 @@ bool user_creatable_can_be_deleted(UserCreatable *uc)
 static void object_set_properties_from_qdict(Object *obj, const QDict *qdict,
  Visitor *v, Error **errp)
 {
+ERRP_GUARD();
 const QDictEntry *e;
-Error *local_err = NULL;
 
-if (!visit_start_struct(v, NULL, NULL, 0, _err)) {
-goto out;
+if (!visit_start_struct(v, NULL, NULL, 0, errp)) {
+return;
 }
 for (e = qdict_first(qdict); e; e = qdict_next(qdict, e)) {
-if (!object_property_set(obj, e->key, v, _err)) {
+if (!object_property_set(obj, e->key, v, errp)) {
 break;
 }
 }
-if (!local_err) {
-visit_check_struct(v, _err);
+if (!*errp) {
+visit_check_struct(v, errp);
 }
 visit_end_struct(v, NULL);
-
-out:
-if (local_err) {
-error_propagate(errp, local_err);
-}
 }
 
 void object_set_properties_from_keyval(Object *obj, const QDict *qdict,
-- 
2.31.1




[PATCH 03/11] iotests/051: Fix typo

2021-09-24 Thread Kevin Wolf
The iothread isn't called 'iothread0', but 'thread0'. Depending on the
order that properties are parsed, the error message may change from the
expected one to another one saying that the iothread doesn't exist.

Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/051| 2 +-
 tests/qemu-iotests/051.pc.out | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/051 b/tests/qemu-iotests/051
index 7bf29343d7..1d2fa93a11 100755
--- a/tests/qemu-iotests/051
+++ b/tests/qemu-iotests/051
@@ -199,7 +199,7 @@ case "$QEMU_DEFAULT_MACHINE" in
 # virtio-blk enables the iothread only when the driver initialises the
 # device, so a second virtio-blk device can't be added even with the
 # same iothread. virtio-scsi allows this.
-run_qemu $iothread -device 
virtio-blk-pci,drive=disk,iothread=iothread0,share-rw=on
+run_qemu $iothread -device 
virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on
 run_qemu $iothread -device 
virtio-scsi,id=virtio-scsi1,iothread=thread0 -device 
scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on
 ;;
  *)
diff --git a/tests/qemu-iotests/051.pc.out b/tests/qemu-iotests/051.pc.out
index afe7632964..063e4fc584 100644
--- a/tests/qemu-iotests/051.pc.out
+++ b/tests/qemu-iotests/051.pc.out
@@ -183,9 +183,9 @@ Testing: -drive 
file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) QEMU_PROG: -device scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on: 
Cannot change iothread of active block backend
 
-Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object 
iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 
-device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device 
virtio-blk-pci,drive=disk,iothread=iothread0,share-rw=on
+Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object 
iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 
-device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device 
virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) QEMU_PROG: -device 
virtio-blk-pci,drive=disk,iothread=iothread0,share-rw=on: Cannot change 
iothread of active block backend
+(qemu) QEMU_PROG: -device 
virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on: Cannot change iothread 
of active block backend
 
 Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object 
iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 
-device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device 
virtio-scsi,id=virtio-scsi1,iothread=thread0 -device 
scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on
 QEMU X.Y.Z monitor - type 'help' for more information
-- 
2.31.1




Re: [PATCH] hw/nvme: reattach subsystem namespaces on hotplug

2021-09-24 Thread Klaus Jensen
On Sep 24 08:05, Hannes Reinecke wrote:
> On 9/23/21 10:09 PM, Klaus Jensen wrote:
> > On Sep  9 13:37, Hannes Reinecke wrote:
> > > On 9/9/21 12:47 PM, Klaus Jensen wrote:
> > > > On Sep  9 11:43, Hannes Reinecke wrote:
> > > > > With commit 5ffbaeed16 ("hw/nvme: fix controller hot unplugging")
> > > > > namespaces get moved from the controller to the subsystem if one
> > > > > is specified.
> > > > > That keeps the namespaces alive after a controller hot-unplug, but
> > > > > after a controller hotplug we have to reconnect the namespaces
> > > > > from the subsystem to the controller.
> > > > > 
> > > > > Fixes: 5ffbaeed16 ("hw/nvme: fix controller hot unplugging")
> > > > > Cc: Klaus Jensen 
> > > > > Signed-off-by: Hannes Reinecke 
> > > > > ---
> > > > >   hw/nvme/subsys.c | 8 +++-
> > > > >   1 file changed, 7 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
> > > > > index 93c35950d6..a9404f2b5e 100644
> > > > > --- a/hw/nvme/subsys.c
> > > > > +++ b/hw/nvme/subsys.c
> > > > > @@ -14,7 +14,7 @@
> > > > >   int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
> > > > >   {
> > > > >   NvmeSubsystem *subsys = n->subsys;
> > > > > -int cntlid;
> > > > > +int cntlid, nsid;
> > > > >   for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) {
> > > > >   if (!subsys->ctrls[cntlid]) {
> > > > > @@ -29,12 +29,18 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error 
> > > > > **errp)
> > > > >   subsys->ctrls[cntlid] = n;
> > > > > +for (nsid = 0; nsid < ARRAY_SIZE(subsys->namespaces); nsid++) {
> > > > > +if (subsys->namespaces[nsid]) {
> > > > > +nvme_attach_ns(n, subsys->namespaces[nsid]);
> > > > > +}
> > > > 
> > > > Thanks Hannes! I like it, keeping things simple.
> > > > 
> > > > But we should only attach namespaces that have the shared property or
> > > > have ns->attached == 0. Non-shared namespaces may already be attached to
> > > > another controller in the subsystem.
> > > > 
> > > 
> > > Well ... I tried to avoid that subject, but as you brought it up:
> > > There is a slightly tricky issue in fabrics, in that the 'controller' is
> > > independent from the 'connection'.
> > > The 'shared' bit in the CMIC setting indicates that the subsystem may
> > > have more than one _controller_. It doesn't talk about how many
> > > _connections_ a controller may support; that then is the realm of
> > > dynamic or static controllers, which we don't talk about :-).
> > > Sufficient to say, linux only implements the dynamic controller model,
> > > so every connection will be to a different controller.
> > > But you have been warned :-)
> > > 
> > > However, the 'CMIC' setting is independent on the 'NMIC' setting (ie the
> > > 'shared' bit in the namespace).
> > > Which leads to the interesting question on how exactly should one handle
> > > non-shared namespaces in subsystems for which there are multiple
> > > controllers. Especially as the NSID space is per subsystem, so each
> > > controller will be able to figure out if there are blanked-out namespaces.
> > > So to make this a sane setup I would propose to set the 'shared' option
> > > automatically whenever the controller has the 'subsys' option set.
> > > And actually, I would ditch the 'shared' option completely, and make it
> > > dependent on the 'subsys' setting for the controller.
> > > Much like we treat the 'CMIC' setting nowadays.
> > > That avoids lots of confusions, and also make the implementation _way_
> > > easier.
> > > 
> > 
> > I see your point. Unfortunately, there is no easy way to ditch shared=
> > now. But I think it'd be good enough to attach to shared automatically,
> > but not to "shared=off".
> > 
> > I've already ditched the shared parameter on my RFC refactor series and
> > always having the namespaces shared.
> > 
> 
> Okay.
> 
> > > > However...
> > > > 
> > > > The spec says that "The attach and detach operations are persistent
> > > > across all reset events.". This means that we should track those events
> > > > in the subsystem and only reattach namespaces that were attached at the
> > > > time of the "reset" event. Currently, we don't have anything mapping
> > > > that state. But the device already has to take some liberties with
> > > > regard to stuff that is considered persistent by the spec (SMART log
> > > > etc.) since we do not have any way to store stuff persistently across
> > > > qemu invocations, so I think the above is an acceptable compromise.
> > > > 
> > > Careful. 'attach' and 'detach' are MI (management interface) operations.
> > > If and how many namespaces are visible to any given controllers is
> > > actually independent on that; and, in fact, controllers might not even
> > > implement 'attach' or 'detach'.
> > > But I do agree that we don't handle the 'reset' state properly.
> > > 
> > 
> > The Namespace Attachment command has nothing to do with MI? And the QEMU
> > controller 

[PATCH 0/2] hw/nvme: fix namespace attachment on controller hotplug

2021-09-24 Thread Klaus Jensen
From: Klaus Jensen 

First patch from Hannes fixes the subsystem registration process such
that shared (but non-detached) namespaces are automatically attached to
hotplugged controllers.

The second patch changes the default for 'shared' such that namespaces
are shared by default and will thus by default be attached to hotplugged
controllers. This adds a compat property for older machine versions and
updates the documentation to reflect this.

Hannes Reinecke (1):
  hw/nvme: reattach subsystem namespaces on hotplug

Klaus Jensen (1):
  hw/nvme: change nvme-ns 'shared' default

 docs/system/devices/nvme.rst | 24 ++--
 hw/core/machine.c|  4 +++-
 hw/nvme/ns.c |  8 +---
 hw/nvme/subsys.c | 10 +-
 4 files changed, 27 insertions(+), 19 deletions(-)

-- 
2.33.0




[PATCH 2/2] hw/nvme: change nvme-ns 'shared' default

2021-09-24 Thread Klaus Jensen
From: Klaus Jensen 

Change namespaces to be shared namespaces by default (parameter
shared=on). Keep shared=off for older machine types.

Signed-off-by: Klaus Jensen 
---
 docs/system/devices/nvme.rst | 24 ++--
 hw/core/machine.c|  4 +++-
 hw/nvme/ns.c |  8 +---
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst
index bff72d1c24d0..a1c0db01f6d5 100644
--- a/docs/system/devices/nvme.rst
+++ b/docs/system/devices/nvme.rst
@@ -110,28 +110,32 @@ multipath I/O.
 This will create an NVM subsystem with two controllers. Having controllers
 linked to an ``nvme-subsys`` device allows additional ``nvme-ns`` parameters:
 
-``shared`` (default: ``off``)
+``shared`` (default: ``on`` since 6.2)
   Specifies that the namespace will be attached to all controllers in the
-  subsystem. If set to ``off`` (the default), the namespace will remain a
-  private namespace and may only be attached to a single controller at a time.
+  subsystem. If set to ``off``, the namespace will remain a private namespace
+  and may only be attached to a single controller at a time. Shared namespaces
+  are always automatically attached to all controllers (also when controllers
+  are hotplugged).
 
 ``detached`` (default: ``off``)
   If set to ``on``, the namespace will be be available in the subsystem, but
-  not attached to any controllers initially.
+  not attached to any controllers initially. A shared namespace with this set
+  to ``on`` will never be automatically attached to controllers.
 
 Thus, adding
 
 .. code-block:: console
 
-drive file=nvm-1.img,if=none,id=nvm-1
-   -device nvme-ns,drive=nvm-1,nsid=1,shared=on
+   -device nvme-ns,drive=nvm-1,nsid=1
-drive file=nvm-2.img,if=none,id=nvm-2
-   -device nvme-ns,drive=nvm-2,nsid=3,detached=on
+   -device nvme-ns,drive=nvm-2,nsid=3,shared=off,detached=on
 
-will cause NSID 1 will be a shared namespace (due to ``shared=on``) that is
-initially attached to both controllers. NSID 3 will be a private namespace
-(i.e. only attachable to a single controller at a time) and will not be
-attached to any controller initially (due to ``detached=on``).
+will cause NSID 1 will be a shared namespace that is initially attached to both
+controllers. NSID 3 will be a private namespace due to ``shared=off`` and only
+attachable to a single controller at a time. Additionally it will not be
+attached to any controller initially (due to ``detached=on``) or to hotplugged
+controllers.
 
 Optional Features
 =
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 067f42b528fd..5e2fa3e392b9 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -37,7 +37,9 @@
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-pci.h"
 
-GlobalProperty hw_compat_6_1[] = {};
+GlobalProperty hw_compat_6_1[] = {
+{ "nvme-ns", "shared", "off" },
+};
 const size_t hw_compat_6_1_len = G_N_ELEMENTS(hw_compat_6_1);
 
 GlobalProperty hw_compat_6_0[] = {
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index b7cf1494e75b..8b5f98c76180 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -465,12 +465,6 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
"linked to an nvme-subsys device");
 return;
 }
-
-if (ns->params.shared) {
-error_setg(errp, "shared requires that the nvme device is "
-   "linked to an nvme-subsys device");
-return;
-}
 } else {
 /*
  * If this namespace belongs to a subsystem (through a link on the
@@ -532,7 +526,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 static Property nvme_ns_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
 DEFINE_PROP_BOOL("detached", NvmeNamespace, params.detached, false),
-DEFINE_PROP_BOOL("shared", NvmeNamespace, params.shared, false),
+DEFINE_PROP_BOOL("shared", NvmeNamespace, params.shared, true),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
 DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
 DEFINE_PROP_UINT64("eui64", NvmeNamespace, params.eui64, 0),
-- 
2.33.0




[PATCH 1/2] hw/nvme: reattach subsystem namespaces on hotplug

2021-09-24 Thread Klaus Jensen
From: Hannes Reinecke 

With commit 5ffbaeed16 ("hw/nvme: fix controller hot unplugging")
namespaces get moved from the controller to the subsystem if one
is specified.
That keeps the namespaces alive after a controller hot-unplug, but
after a controller hotplug we have to reconnect the namespaces
from the subsystem to the controller.

Fixes: 5ffbaeed16 ("hw/nvme: fix controller hot unplugging")
Cc: Klaus Jensen 
Signed-off-by: Hannes Reinecke 
[k.jensen: only attach to shared and non-detached namespaces]
Signed-off-by: Klaus Jensen 
---
 hw/nvme/subsys.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
index 93c35950d69d..6b2e4c975f5b 100644
--- a/hw/nvme/subsys.c
+++ b/hw/nvme/subsys.c
@@ -14,7 +14,7 @@
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
 {
 NvmeSubsystem *subsys = n->subsys;
-int cntlid;
+int cntlid, nsid;
 
 for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) {
 if (!subsys->ctrls[cntlid]) {
@@ -29,12 +29,20 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
 
 subsys->ctrls[cntlid] = n;
 
+for (nsid = 1; nsid < ARRAY_SIZE(subsys->namespaces); nsid++) {
+NvmeNamespace *ns = subsys->namespaces[nsid];
+if (ns && ns->params.shared && !ns->params.detached) {
+nvme_attach_ns(n, ns);
+}
+}
+
 return cntlid;
 }
 
 void nvme_subsys_unregister_ctrl(NvmeSubsystem *subsys, NvmeCtrl *n)
 {
 subsys->ctrls[n->cntlid] = NULL;
+n->cntlid = -1;
 }
 
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
-- 
2.33.0




Re: [PATCH] block: introduce max_hw_iov for use in scsi-generic

2021-09-24 Thread Christian Borntraeger

Peter, Michael,

do we still do stable releases for QEMU or has this stopped?

Am 24.09.21 um 07:27 schrieb Paolo Bonzini:

Yes, the question is whether it still exists... Paolo El jue., 23 sept. 2021 16:48, 
Christian Borntraeger  escribió: Am 23.09.21 um 15:04 
schrieb Paolo Bonzini: > Linux limits the size of iovecs to 1024 (UIO_MAXIOV 
ZjQcmQRYFpfptBannerStart
This Message Is From an External Sender
This message came from outside your organization.
ZjQcmQRYFpfptBannerEnd
Yes, the question is whether it still exists...

Paolo

El jue., 23 sept. 2021 16:48, Christian Borntraeger mailto:borntrae...@de.ibm.com>> escribió:



Am 23.09.21 um 15:04 schrieb Paolo Bonzini:
 > Linux limits the size of iovecs to 1024 (UIO_MAXIOV in the kernel
 > sources, IOV_MAX in POSIX).  Because of this, on some host adapters
 > requests with many iovecs are rejected with -EINVAL by the
 > io_submit() or readv()/writev() system calls.
 >
 > In fact, the same limit applies to SG_IO as well.  To fix both the
 > EINVAL and the possible performance issues from using fewer iovecs
 > than allowed by Linux (some HBAs have max_segments as low as 128),
 > introduce a separate entry in BlockLimits to hold the max_segments
 > value from sysfs.  This new limit is used only for SG_IO and clamped
 > to bs->bl.max_iov anyway, just like max_hw_transfer is clamped to
 > bs->bl.max_transfer.
 >
 > Reported-by: Halil Pasic mailto:pa...@linux.ibm.com>>
 > Cc: Hanna Reitz mailto:hre...@redhat.com>>
 > Cc: Kevin Wolf mailto:kw...@redhat.com>>
 > Cc: qemu-block@nongnu.org 
 > Fixes: 18473467d5 ("file-posix: try BLKSECTGET on block devices too, do not 
round to power of 2", 2021-06-25)

This sneaked in shortly before the 6.1 release (between rc0 and rc1 I 
think).
Shouldnt that go to stable in cass this still exist?


 > Signed-off-by: Paolo Bonzini mailto:pbonz...@redhat.com>>
 > ---
 >   block/block-backend.c          | 6 ++
 >   block/file-posix.c             | 2 +-
 >   block/io.c                     | 1 +
 >   hw/scsi/scsi-generic.c         | 2 +-
 >   include/block/block_int.h      | 7 +++
 >   include/sysemu/block-backend.h | 1 +
 >   6 files changed, 17 insertions(+), 2 deletions(-)
 >
 > diff --git a/block/block-backend.c b/block/block-backend.c
 > index 6140d133e2..ba2b5ebb10 100644
 > --- a/block/block-backend.c
 > +++ b/block/block-backend.c
 > @@ -1986,6 +1986,12 @@ uint32_t blk_get_max_transfer(BlockBackend *blk)
 >       return ROUND_DOWN(max, blk_get_request_alignment(blk));
 >   }
 >
 > +int blk_get_max_hw_iov(BlockBackend *blk)
 > +{
 > +    return MIN_NON_ZERO(blk->root->bs->bl.max_hw_iov,
 > +                        blk->root->bs->bl.max_iov);
 > +}
 > +
 >   int blk_get_max_iov(BlockBackend *blk)
 >   {
 >       return blk->root->bs->bl.max_iov;
 > diff --git a/block/file-posix.c b/block/file-posix.c
 > index cb9bffe047..1567edb3d5 100644
 > --- a/block/file-posix.c
 > +++ b/block/file-posix.c
 > @@ -1273,7 +1273,7 @@ static void raw_refresh_limits(BlockDriverState 
*bs, Error **errp)
 >
 >           ret = hdev_get_max_segments(s->fd, );
 >           if (ret > 0) {
 > -            bs->bl.max_iov = ret;
 > +            bs->bl.max_hw_iov = ret;
 >           }
 >       }
 >   }
 > diff --git a/block/io.c b/block/io.c
 > index a19942718b..f38e7f81d8 100644
 > --- a/block/io.c
 > +++ b/block/io.c
 > @@ -136,6 +136,7 @@ static void bdrv_merge_limits(BlockLimits *dst, 
const BlockLimits *src)
 >       dst->min_mem_alignment = MAX(dst->min_mem_alignment,
 >                                    src->min_mem_alignment);
 >       dst->max_iov = MIN_NON_ZERO(dst->max_iov, src->max_iov);
 > +    dst->max_hw_iov = MIN_NON_ZERO(dst->max_hw_iov, src->max_hw_iov);
 >   }
 >
 >   typedef struct BdrvRefreshLimitsState {
 > diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
 > index 665baf900e..0306ccc7b1 100644
 > --- a/hw/scsi/scsi-generic.c
 > +++ b/hw/scsi/scsi-generic.c
 > @@ -180,7 +180,7 @@ static int scsi_handle_inquiry_reply(SCSIGenericReq 
*r, SCSIDevice *s, int len)
 >           page = r->req.cmd.buf[2];
 >           if (page == 0xb0) {
 >               uint64_t max_transfer = 
blk_get_max_hw_transfer(s->conf.blk);
 > -            uint32_t max_iov = blk_get_max_iov(s->conf.blk);
 > +            uint32_t max_iov = blk_get_max_hw_iov(s->conf.blk);
 >
 >               assert(max_transfer);
 >               max_transfer = MIN_NON_ZERO(max_transfer, max_iov * 
qemu_real_host_page_size)
 > diff --git a/include/block/block_int.h b/include/block/block_int.h
 > index f1a54db0f8..c31cbd034a 100644
 > --- a/include/block/block_int.h
 > +++ 

[PULL 0/3] hw/nvme updates

2021-09-24 Thread Klaus Jensen
From: Klaus Jensen 

Hi Peter,

The following changes since commit 2c3e83f92d93fbab071b8a96b8ab769b01902475:

  Merge remote-tracking branch 
'remotes/alistair23/tags/pull-riscv-to-apply-20210921' into staging (2021-09-21 
10:57:48 -0700)

are available in the Git repository at:

  git://git.infradead.org/qemu-nvme.git tags/nvme-next-pull-request

for you to fetch changes up to c53a9a91021c2f57de9ab18393d0048bd0fe90c2:

  hw/nvme: Return error for fused operations (2021-09-24 08:43:58 +0200)


hw/nvme updates



Klaus Jensen (1):
  hw/nvme: fix validation of ASQ and ACQ

Naveen Nagar (1):
  hw/nvme: fix verification of select field in namespace attachment

Pankaj Raghav (1):
  hw/nvme: Return error for fused operations

 hw/nvme/ctrl.c   | 31 ---
 hw/nvme/trace-events |  2 --
 include/block/nvme.h |  5 +
 3 files changed, 25 insertions(+), 13 deletions(-)

-- 
2.33.0




[PULL 2/3] hw/nvme: fix verification of select field in namespace attachment

2021-09-24 Thread Klaus Jensen
From: Naveen Nagar 

Fix is added to check for reserved value in select field for
namespace attachment

CC: Minwoo Im 
Signed-off-by: Naveen Nagar 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 15 ---
 include/block/nvme.h |  5 +
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index ff784851137e..dc0e7b00308e 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5191,7 +5191,7 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
 uint16_t list[NVME_CONTROLLER_LIST_SIZE] = {};
 uint32_t nsid = le32_to_cpu(req->cmd.nsid);
 uint32_t dw10 = le32_to_cpu(req->cmd.cdw10);
-bool attach = !(dw10 & 0xf);
+uint8_t sel = dw10 & 0xf;
 uint16_t *nr_ids = [0];
 uint16_t *ids = [1];
 uint16_t ret;
@@ -5224,7 +5224,8 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
 return NVME_NS_CTRL_LIST_INVALID | NVME_DNR;
 }
 
-if (attach) {
+switch (sel) {
+case NVME_NS_ATTACHMENT_ATTACH:
 if (nvme_ns(ctrl, nsid)) {
 return NVME_NS_ALREADY_ATTACHED | NVME_DNR;
 }
@@ -5235,7 +5236,10 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
 
 nvme_attach_ns(ctrl, ns);
 nvme_select_iocs_ns(ctrl, ns);
-} else {
+
+break;
+
+case NVME_NS_ATTACHMENT_DETACH:
 if (!nvme_ns(ctrl, nsid)) {
 return NVME_NS_NOT_ATTACHED | NVME_DNR;
 }
@@ -5244,6 +5248,11 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
 ns->attached--;
 
 nvme_update_dmrsl(ctrl);
+
+break;
+
+default:
+return NVME_INVALID_FIELD | NVME_DNR;
 }
 
 /*
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 77aae0117494..e3bd47bf76ab 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1154,6 +1154,11 @@ enum NvmeIdCtrlCmic {
 NVME_CMIC_MULTI_CTRL= 1 << 1,
 };
 
+enum NvmeNsAttachmentOperation {
+NVME_NS_ATTACHMENT_ATTACH = 0x0,
+NVME_NS_ATTACHMENT_DETACH = 0x1,
+};
+
 #define NVME_CTRL_SQES_MIN(sqes) ((sqes) & 0xf)
 #define NVME_CTRL_SQES_MAX(sqes) (((sqes) >> 4) & 0xf)
 #define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf)
-- 
2.33.0




[PULL 3/3] hw/nvme: Return error for fused operations

2021-09-24 Thread Klaus Jensen
From: Pankaj Raghav 

Currently, FUSED operations are not supported by QEMU. As per the 1.4 SPEC,
controller should abort the command that requested a fused operation with
an INVALID FIELD error code if they are not supported.

Changes from v1:
Added FUSE flag check also to the admin cmd processing as the FUSED
operations are mentioned in the general SQE section in the SPEC.

Signed-off-by: Pankaj Raghav 
Reviewed-by: Keith Busch 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index dc0e7b00308e..2f247a9275ca 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -3893,6 +3893,10 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 return ns->status;
 }
 
+if (NVME_CMD_FLAGS_FUSE(req->cmd.flags)) {
+return NVME_INVALID_FIELD;
+}
+
 req->ns = ns;
 
 switch (req->cmd.opcode) {
@@ -5475,6 +5479,10 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
+if (NVME_CMD_FLAGS_FUSE(req->cmd.flags)) {
+return NVME_INVALID_FIELD;
+}
+
 switch (req->cmd.opcode) {
 case NVME_ADM_CMD_DELETE_SQ:
 return nvme_del_sq(n, req);
-- 
2.33.0




[PULL 1/3] hw/nvme: fix validation of ASQ and ACQ

2021-09-24 Thread Klaus Jensen
From: Klaus Jensen 

Address 0x0 is a valid address. Fix the admin submission and completion
queue address validation to not error out on this.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/nvme/ctrl.c   | 8 
 hw/nvme/trace-events | 2 --
 2 files changed, 10 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 6baf9e0420d5..ff784851137e 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5623,14 +5623,6 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 trace_pci_nvme_err_startfail_sq();
 return -1;
 }
-if (unlikely(!asq)) {
-trace_pci_nvme_err_startfail_nbarasq();
-return -1;
-}
-if (unlikely(!acq)) {
-trace_pci_nvme_err_startfail_nbaracq();
-return -1;
-}
 if (unlikely(asq & (page_size - 1))) {
 trace_pci_nvme_err_startfail_asq_misaligned(asq);
 return -1;
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index 430eeb395b24..ff6cafd520df 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -159,8 +159,6 @@ pci_nvme_err_invalid_setfeat(uint32_t dw10) "invalid set 
features, dw10=0x%"PRIx
 pci_nvme_err_invalid_log_page(uint16_t cid, uint16_t lid) "cid %"PRIu16" lid 
0x%"PRIx16""
 pci_nvme_err_startfail_cq(void) "nvme_start_ctrl failed because there are 
non-admin completion queues"
 pci_nvme_err_startfail_sq(void) "nvme_start_ctrl failed because there are 
non-admin submission queues"
-pci_nvme_err_startfail_nbarasq(void) "nvme_start_ctrl failed because the admin 
submission queue address is null"
-pci_nvme_err_startfail_nbaracq(void) "nvme_start_ctrl failed because the admin 
completion queue address is null"
 pci_nvme_err_startfail_asq_misaligned(uint64_t addr) "nvme_start_ctrl failed 
because the admin submission queue address is misaligned: 0x%"PRIx64""
 pci_nvme_err_startfail_acq_misaligned(uint64_t addr) "nvme_start_ctrl failed 
because the admin completion queue address is misaligned: 0x%"PRIx64""
 pci_nvme_err_startfail_page_too_small(uint8_t log2ps, uint8_t maxlog2ps) 
"nvme_start_ctrl failed because the page size is too small: log2size=%u, min=%u"
-- 
2.33.0




Re: [PATCH] hw/nvme: reattach subsystem namespaces on hotplug

2021-09-24 Thread Hannes Reinecke

On 9/23/21 10:09 PM, Klaus Jensen wrote:

On Sep  9 13:37, Hannes Reinecke wrote:

On 9/9/21 12:47 PM, Klaus Jensen wrote:

On Sep  9 11:43, Hannes Reinecke wrote:

With commit 5ffbaeed16 ("hw/nvme: fix controller hot unplugging")
namespaces get moved from the controller to the subsystem if one
is specified.
That keeps the namespaces alive after a controller hot-unplug, but
after a controller hotplug we have to reconnect the namespaces
from the subsystem to the controller.

Fixes: 5ffbaeed16 ("hw/nvme: fix controller hot unplugging")
Cc: Klaus Jensen 
Signed-off-by: Hannes Reinecke 
---
  hw/nvme/subsys.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
index 93c35950d6..a9404f2b5e 100644
--- a/hw/nvme/subsys.c
+++ b/hw/nvme/subsys.c
@@ -14,7 +14,7 @@
  int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
  {
  NvmeSubsystem *subsys = n->subsys;
-int cntlid;
+int cntlid, nsid;
  
  for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) {

  if (!subsys->ctrls[cntlid]) {
@@ -29,12 +29,18 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
  
  subsys->ctrls[cntlid] = n;
  
+for (nsid = 0; nsid < ARRAY_SIZE(subsys->namespaces); nsid++) {

+if (subsys->namespaces[nsid]) {
+nvme_attach_ns(n, subsys->namespaces[nsid]);
+}


Thanks Hannes! I like it, keeping things simple.

But we should only attach namespaces that have the shared property or
have ns->attached == 0. Non-shared namespaces may already be attached to
another controller in the subsystem.



Well ... I tried to avoid that subject, but as you brought it up:
There is a slightly tricky issue in fabrics, in that the 'controller' is
independent from the 'connection'.
The 'shared' bit in the CMIC setting indicates that the subsystem may
have more than one _controller_. It doesn't talk about how many
_connections_ a controller may support; that then is the realm of
dynamic or static controllers, which we don't talk about :-).
Sufficient to say, linux only implements the dynamic controller model,
so every connection will be to a different controller.
But you have been warned :-)

However, the 'CMIC' setting is independent on the 'NMIC' setting (ie the
'shared' bit in the namespace).
Which leads to the interesting question on how exactly should one handle
non-shared namespaces in subsystems for which there are multiple
controllers. Especially as the NSID space is per subsystem, so each
controller will be able to figure out if there are blanked-out namespaces.
So to make this a sane setup I would propose to set the 'shared' option
automatically whenever the controller has the 'subsys' option set.
And actually, I would ditch the 'shared' option completely, and make it
dependent on the 'subsys' setting for the controller.
Much like we treat the 'CMIC' setting nowadays.
That avoids lots of confusions, and also make the implementation _way_
easier.



I see your point. Unfortunately, there is no easy way to ditch shared=
now. But I think it'd be good enough to attach to shared automatically,
but not to "shared=off".

I've already ditched the shared parameter on my RFC refactor series and
always having the namespaces shared.



Okay.


However...

The spec says that "The attach and detach operations are persistent
across all reset events.". This means that we should track those events
in the subsystem and only reattach namespaces that were attached at the
time of the "reset" event. Currently, we don't have anything mapping
that state. But the device already has to take some liberties with
regard to stuff that is considered persistent by the spec (SMART log
etc.) since we do not have any way to store stuff persistently across
qemu invocations, so I think the above is an acceptable compromise.


Careful. 'attach' and 'detach' are MI (management interface) operations.
If and how many namespaces are visible to any given controllers is
actually independent on that; and, in fact, controllers might not even
implement 'attach' or 'detach'.
But I do agree that we don't handle the 'reset' state properly.



The Namespace Attachment command has nothing to do with MI? And the QEMU
controller does support attaching and detaching namespaces.



No, you got me wrong. Whether a not a namespace is attached is 
independent of any 'Namespace attachment' command.
Case in point: the subsystem will be starting up with namespace already 
attached, even though no-one issued any namespace attachment command.



A potential (as good as it gets) fix would be to keep a map/list of
"persistently" attached controllers on the namespaces and re-attach
according to that when we see that controller joining the subsystem
again. CNTLID would be the obvious choice for the key here, but problem
is that we cant really use it since we assign it sequentially from the
subsystem, which now looks like a pretty bad choice. CNTLID should have
been a required property