Re: [PATCH v2] block/file-posix: Optimize for macOS

2021-03-08 Thread Akihiko Odaki
On Tue, Mar 9, 2021 at 0:37, Akihiko Odaki wrote:
>
> On Tue, Mar 9, 2021 at 0:17, Stefan Hajnoczi wrote:
> >
> > The live migration compatibility issue is still present. Migrating to
> > another host might not work if the block limits are different.
> >
> > Here is an idea for solving it:
> >
> > Modify include/hw/block/block.h:DEFINE_BLOCK_PROPERTIES_BASE() to
> > support a new value called "host". The default behavior remains
> > unchanged for live migration compatibility but now you can use "host" if
> > you know it's okay but don't care about migration compatibility.
> >
> > The downside to this approach is that users must explicitly say
> > something like --drive ...,opt_io_size=host. But it's still better than
> > the situation we have today where users must manually enter values for
> > their disk.
> >
> > Does this sound okay to everyone?
> >
> > Stefan
>
> I wonder how that change affects other block drivers implementing
> bdrv_probe_blocksizes. As far as I know, the values they report are
> already used by default, which is contrary to the default not being
> "host".
>
> Regards,
> Akihiko Odaki

Let me suggest a variant of Stefan's approach:

Modify include/hw/block/block.h:DEFINE_BLOCK_PROPERTIES_BASE() to
support a new value called "host". The default values for block size
properties may be "host" or not, but they should be consistent. If
they are "host" by default, add global properties which set
discard_granularity and opt_io_size to the old default to
hw_compat_5_2 in hw/core/machine.c. Otherwise, add global properties
which set logical_block_size and physical_block_size to "host".

Does that sound good? I'd also like to hear others' opinions on the
default value ("host" or something else). I slightly prefer "host" as
the default, because those who need live migration should be
careful enough to set proper configurations for each device. We may
also assist users who need live migration by adding a property which
defaults all block size properties to something other than "host".

Regards,
Akihiko Odaki
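
A minimal sketch of the "host" value discussed above, assuming a
property that accepts either a decimal number or the literal string
"host". The helper name resolve_block_size() and its host_probed
parameter are hypothetical and are not part of QEMU; the sketch only
illustrates the parsing rule, not how DEFINE_BLOCK_PROPERTIES_BASE()
would be changed.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not a QEMU API): resolve a block-size property
 * that is either a decimal number or the literal "host".
 * host_probed stands in for whatever the block driver reported.
 */
static int resolve_block_size(const char *value, uint32_t host_probed,
                              uint32_t *out)
{
    char *end;
    unsigned long long n;

    if (strcmp(value, "host") == 0) {
        *out = host_probed;          /* take the probed host value */
        return 0;
    }
    if (value[0] < '0' || value[0] > '9') {
        return -EINVAL;              /* rejects "", signs and words */
    }
    errno = 0;
    n = strtoull(value, &end, 10);
    if (errno != 0 || *end != '\0' || n > UINT32_MAX) {
        return -EINVAL;
    }
    *out = (uint32_t)n;
    return 0;
}
```

With this rule an explicit number always wins, so a configuration that
pins every size stays the same on any host, while "host" opts in to
host-dependent values.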



[PATCH v2 3/4] block: detect DKIOCGETBLOCKCOUNT/SIZE before use

2021-03-08 Thread Joelle van Dyne
iOS hosts do not have these defined so we fall back to the
default behaviour.

Co-authored-by: Warner Losh 
Signed-off-by: Joelle van Dyne 
---
 block/file-posix.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index d1ab3180ff..9b6d7ddda3 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2326,8 +2326,10 @@ static int64_t raw_getlength(BlockDriverState *bs)
 again:
 #endif
 if (!fstat(fd, &sb) && (S_IFCHR & sb.st_mode)) {
+size = 0;
 #ifdef DIOCGMEDIASIZE
 if (ioctl(fd, DIOCGMEDIASIZE, (off_t *)&size))
+size = 0;
 #elif defined(DIOCGPART)
 {
 struct partinfo pi;
@@ -2336,9 +2338,7 @@ again:
 else
 size = 0;
 }
-if (size == 0)
-#endif
-#if defined(__APPLE__) && defined(__MACH__)
+#elif defined(DKIOCGETBLOCKCOUNT) && defined(DKIOCGETBLOCKSIZE)
 {
 uint64_t sectors = 0;
 uint32_t sector_size = 0;
@@ -2346,19 +2346,15 @@ again:
 if (ioctl(fd, DKIOCGETBLOCKCOUNT, &sectors) == 0
&& ioctl(fd, DKIOCGETBLOCKSIZE, &sector_size) == 0) {
 size = sectors * sector_size;
-} else {
-size = lseek(fd, 0LL, SEEK_END);
-if (size < 0) {
-return -errno;
-}
 }
 }
-#else
-size = lseek(fd, 0LL, SEEK_END);
+#endif
+if (size == 0) {
+size = lseek(fd, 0LL, SEEK_END);
+}
 if (size < 0) {
 return -errno;
 }
-#endif
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
 switch(s->type) {
 case FTYPE_CD:
-- 
2.28.0
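
The control flow this patch arrives at can be sketched outside QEMU as
follows. This is a simplified illustration, not the code from
block/file-posix.c: the function name probe_length() is hypothetical,
only the Darwin ioctl pair is shown, and HAVE_SYS_DISK_H is assumed to
be supplied by the build system. The point is that the ioctl path is
compiled only when both request codes exist, and lseek() is the common
fallback whenever size is still 0.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef HAVE_SYS_DISK_H          /* assumed to come from the build system */
#include <sys/disk.h>
#endif

/* Hypothetical helper: length of fd in bytes, or -errno on failure. */
static int64_t probe_length(int fd)
{
    struct stat sb;
    int64_t size = 0;

    if (fstat(fd, &sb) == 0 && S_ISCHR(sb.st_mode)) {
#if defined(DKIOCGETBLOCKCOUNT) && defined(DKIOCGETBLOCKSIZE)
        uint64_t sectors = 0;
        uint32_t sector_size = 0;

        /* Only compiled where the host headers define both ioctls. */
        if (ioctl(fd, DKIOCGETBLOCKCOUNT, &sectors) == 0 &&
            ioctl(fd, DKIOCGETBLOCKSIZE, &sector_size) == 0) {
            size = (int64_t)(sectors * sector_size);
        }
#endif
    }
    if (size == 0) {
        /* Common fallback: no ioctl available, or it reported nothing. */
        size = lseek(fd, 0, SEEK_END);
    }
    if (size < 0) {
        return -errno;
    }
    return size;
}
```

On a host without the Darwin ioctls, such as iOS or Linux, the
preprocessor drops the middle block and every file descriptor goes
through lseek().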




[PATCH v2 2/4] block: check for sys/disk.h

2021-03-08 Thread Joelle van Dyne
Some BSD platforms do not have this header.

Signed-off-by: Joelle van Dyne 
---
 meson.build | 1 +
 block.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 0e53876f69..ba0db9fa1f 100644
--- a/meson.build
+++ b/meson.build
@@ -1153,6 +1153,7 @@ config_host_data.set('HAVE_SYS_IOCCOM_H', 
cc.has_header('sys/ioccom.h'))
 config_host_data.set('HAVE_SYS_KCOV_H', cc.has_header('sys/kcov.h'))
 config_host_data.set('HAVE_SYSTEM_FUNCTION', cc.has_function('system', prefix: 
'#include '))
 config_host_data.set('HAVE_HOST_BLOCK_DEVICE', have_host_block_device)
+config_host_data.set('HAVE_SYS_DISK_H', cc.has_header('sys/disk.h'))
 
 config_host_data.set('CONFIG_PREADV', cc.has_function('preadv', prefix: 
'#include '))
 
diff --git a/block.c b/block.c
index a1f3cecd75..b2705ad225 100644
--- a/block.c
+++ b/block.c
@@ -54,7 +54,7 @@
 #ifdef CONFIG_BSD
 #include 
 #include 
-#ifndef __DragonFly__
+#if defined(HAVE_SYS_DISK_H)
 #include 
 #endif
 #endif
-- 
2.28.0




[PATCH v2 1/4] block: feature detection for host block support

2021-03-08 Thread Joelle van Dyne
On Darwin (iOS), there are no system level APIs for directly accessing
host block devices. We detect this at configure time.

Signed-off-by: Joelle van Dyne 
---
 meson.build  |  6 +-
 qapi/block-core.json | 10 +++---
 block/file-posix.c   | 33 ++---
 3 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/meson.build b/meson.build
index 81d760d6e8..0e53876f69 100644
--- a/meson.build
+++ b/meson.build
@@ -181,7 +181,7 @@ if targetos == 'windows'
   include_directories: 
include_directories('.'))
 elif targetos == 'darwin'
   coref = dependency('appleframeworks', modules: 'CoreFoundation')
-  iokit = dependency('appleframeworks', modules: 'IOKit')
+  iokit = dependency('appleframeworks', modules: 'IOKit', required: false)
 elif targetos == 'sunos'
   socket = [cc.find_library('socket'),
 cc.find_library('nsl'),
@@ -1056,6 +1056,9 @@ if get_option('cfi')
   add_global_link_arguments(cfi_flags, native: false, language: ['c', 'cpp', 
'objc'])
 endif
 
+have_host_block_device = (targetos != 'darwin' or
+cc.has_header('IOKit/storage/IOMedia.h'))
+
 #
 # config-host.h #
 #
@@ -1149,6 +1152,7 @@ config_host_data.set('HAVE_PTY_H', cc.has_header('pty.h'))
 config_host_data.set('HAVE_SYS_IOCCOM_H', cc.has_header('sys/ioccom.h'))
 config_host_data.set('HAVE_SYS_KCOV_H', cc.has_header('sys/kcov.h'))
 config_host_data.set('HAVE_SYSTEM_FUNCTION', cc.has_function('system', prefix: 
'#include '))
+config_host_data.set('HAVE_HOST_BLOCK_DEVICE', have_host_block_device)
 
 config_host_data.set('CONFIG_PREADV', cc.has_function('preadv', prefix: 
'#include '))
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9f555d5c1d..0c2cd9e689 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -959,7 +959,8 @@
   'discriminator': 'driver',
   'data': {
   'file': 'BlockStatsSpecificFile',
-  'host_device': 'BlockStatsSpecificFile',
+  'host_device': { 'type': 'BlockStatsSpecificFile',
+   'if': 'defined(HAVE_HOST_BLOCK_DEVICE)' },
   'nvme': 'BlockStatsSpecificNvme' } }
 
 ##
@@ -2863,7 +2864,9 @@
 { 'enum': 'BlockdevDriver',
   'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
 'cloop', 'compress', 'copy-on-read', 'dmg', 'file', 'ftp', 'ftps',
-'gluster', 'host_cdrom', 'host_device', 'http', 'https', 'iscsi',
+'gluster', 'host_cdrom',
+{'name': 'host_device', 'if': 'defined(HAVE_HOST_BLOCK_DEVICE)' },
+'http', 'https', 'iscsi',
 'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
 'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
 { 'name': 'replication', 'if': 'defined(CONFIG_REPLICATION)' },
@@ -4066,7 +4069,8 @@
   'ftps':   'BlockdevOptionsCurlFtps',
   'gluster':'BlockdevOptionsGluster',
   'host_cdrom': 'BlockdevOptionsFile',
-  'host_device':'BlockdevOptionsFile',
+  'host_device': { 'type': 'BlockdevOptionsFile',
+   'if': 'defined(HAVE_HOST_BLOCK_DEVICE)' },
   'http':   'BlockdevOptionsCurlHttp',
   'https':  'BlockdevOptionsCurlHttps',
   'iscsi':  'BlockdevOptionsIscsi',
diff --git a/block/file-posix.c b/block/file-posix.c
index 05079b40ca..d1ab3180ff 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -42,6 +42,8 @@
 #include "scsi/constants.h"
 
 #if defined(__APPLE__) && (__MACH__)
+#include 
+#if defined(HAVE_HOST_BLOCK_DEVICE)
 #include 
 #include 
 #include 
@@ -52,6 +54,7 @@
 //#include 
 #include 
 #include 
+#endif /* defined(HAVE_HOST_BLOCK_DEVICE) */
 #endif
 
 #ifdef __sun__
@@ -181,7 +184,17 @@ typedef struct BDRVRawReopenState {
 bool check_cache_dropped;
 } BDRVRawReopenState;
 
-static int fd_open(BlockDriverState *bs);
+static int fd_open(BlockDriverState *bs)
+{
+BDRVRawState *s = bs->opaque;
+
+/* this is just to ensure s->fd is sane (its called by io ops) */
+if (s->fd >= 0) {
+return 0;
+}
+return -EIO;
+}
+
 static int64_t raw_getlength(BlockDriverState *bs);
 
 typedef struct RawPosixAIOData {
@@ -3032,6 +3045,7 @@ static BlockStatsSpecific 
*raw_get_specific_stats(BlockDriverState *bs)
 return stats;
 }
 
+#if defined(HAVE_HOST_BLOCK_DEVICE)
 static BlockStatsSpecific *hdev_get_specific_stats(BlockDriverState *bs)
 {
 BlockStatsSpecific *stats = g_new(BlockStatsSpecific, 1);
@@ -3041,6 +3055,7 @@ static BlockStatsSpecific 
*hdev_get_specific_stats(BlockDriverState *bs)
 
 return stats;
 }
+#endif /* HAVE_HOST_BLOCK_DEVICE */
 
 static QemuOptsList raw_create_opts = {
 .name = "raw-create-opts",
@@ -3265,6 +3280,8 @@ BlockDriver bdrv_file = {
 /***/
 /* host device */
 
+#if defined(HAVE_HOST_BLOCK_DEVICE)
+
 #if defined(__APPLE__) && defined(__MACH__)
 static kern_return_t 

Re: [PATCH v3 29/30] vl: QAPIfy -object

2021-03-08 Thread Eric Blake
On 3/8/21 10:54 AM, Kevin Wolf wrote:
> This switches the system emulator from a QemuOpts-based parser for
> -object to user_creatable_parse_str() which uses a keyval parser and
> enforces the QAPI schema.
> 
> Apart from being a cleanup, this makes non-scalar properties accessible.
> 
> This adopts a similar model as -blockdev uses: When parsing the option,
> create the ObjectOptions and queue them. At the later point where we
> used to create objects for the collected QemuOpts, the ObjectOptions
> queue is processed instead.
> 
> A complication compared to -blockdev is that object definitions are
> supported in -readconfig and -writeconfig.
> 
> After this patch, -readconfig still works, though it still goes through
> the QemuOpts parser, which means that improvements like non-scalar
> properties are still not available in config files.
> 
> -writeconfig stops working for -object. Tough luck. It has never
> supported all options (not even the common ones), so supporting one less
> isn't the end of the world. As object definitions from -readconfig still
> go through QemuOpts, they are still included in -writeconfig output,
> which at least prevents destroying your existing configuration when you
> just wanted to add another option.

Maybe worth a tweak to this paragraph now that b979c931 has landed
formally deprecating -writeconfig (all the more reason we don't care
about it).

> 
> Signed-off-by: Kevin Wolf 
> Acked-by: Peter Krempa 
> Reviewed-by: Eric Blake 

R-b stands either way.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v3 26/30] qemu-img: Use user_creatable_process_cmdline() for --object

2021-03-08 Thread Eric Blake
On 3/8/21 10:54 AM, Kevin Wolf wrote:
> This switches qemu-img from a QemuOpts-based parser for --object to
> user_creatable_process_cmdline() which uses a keyval parser and enforces
> the QAPI schema.
> 
> Apart from being a cleanup, this makes non-scalar properties accessible.
> 
> Signed-off-by: Kevin Wolf 
> Acked-by: Peter Krempa 
> ---
>  qemu-img.c | 251 ++---
>  1 file changed, 45 insertions(+), 206 deletions(-)
> 

> @@ -1353,7 +1303,7 @@ static int check_empty_sectors(BlockBackend *blk, 
> int64_t offset,
>  /*
>   * Compares two images. Exit codes:
>   *
> - * 0 - Images are identical
> + * 0 - Images are identical or the requested help was printed

Nice, but does the user-facing documentation need updating to match?

>   * 1 - Images differ
>   * >1 - Error occurred
>   */
> @@ -1423,15 +1373,21 @@ static int img_compare(int argc, char **argv)
>  case 'U':
>  force_share = true;
>  break;
> -case OPTION_OBJECT: {
> -QemuOpts *opts;
> -opts = qemu_opts_parse_noisily(_object_opts,
> -   optarg, true);
> -if (!opts) {
> -ret = 2;
> -goto out4;
> +case OPTION_OBJECT:
> +{
> +Error *local_err = NULL;
> +
> +if (!user_creatable_add_from_str(optarg, &local_err)) {
> +if (local_err) {
> +error_report_err(local_err);
> +exit(2);
> +} else {
> +/* Help was printed */
> +exit(EXIT_SUCCESS);
> +}

The commit message needs to be updated to call out that this bug fix was
intentional, preferably mentioning the commit where we broke it
(334c43e2c3).

The code is fine, though, so with an improved commit message (and maybe
some matching doc tweaks),

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
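
The exit-code rule under review can be sketched as below. This is a toy
model under stated assumptions: parse_object_sketch() is a hypothetical
stand-in for a parser that returns false without setting an error when
help was printed, and SKETCH_CONTINUE is an invented marker meaning
"keep processing options". It is not the qemu-img code.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CONTINUE (-1)    /* hypothetical: option accepted */

/*
 * Hypothetical parser: returns true on success. On failure *errp is set,
 * except when help was requested, in which case it stays NULL.
 */
static bool parse_object_sketch(const char *arg, const char **errp)
{
    *errp = NULL;
    if (strcmp(arg, "help") == 0) {
        return false;                       /* help shown, no error */
    }
    if (arg[0] == '\0') {
        *errp = "empty object definition";
        return false;
    }
    return true;
}

/* Map the parser result to the exit codes discussed in the review. */
static int object_option_exit_code(const char *arg)
{
    const char *err = NULL;

    if (parse_object_sketch(arg, &err)) {
        return SKETCH_CONTINUE;
    }
    return err ? 2 : EXIT_SUCCESS;          /* 2: error, 0: help printed */
}
```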




Re: [PATCH v3 06/30] qapi/qom: Add ObjectOptions for memory-backend-*

2021-03-08 Thread Eric Blake
On 3/8/21 10:54 AM, Kevin Wolf wrote:
> This adds a QAPI schema for the properties of the memory-backend-*
> objects.
> 
> HostMemPolicy has to be moved to an include file that can be used by the
> storage daemon, too, because ObjectOptions must be the same in all
> binaries if we don't want to compile the whole code multiple times.
> 
> Signed-off-by: Kevin Wolf 
> Acked-by: Peter Krempa 
> ---
>  qapi/common.json  |  20 
>  qapi/machine.json |  22 +
>  qapi/qom.json | 121 +-
>  3 files changed, 141 insertions(+), 22 deletions(-)
> 

> @@ -287,7 +397,10 @@
>  'cryptodev-backend-builtin',
>  'cryptodev-vhost-user',
>  'dbus-vmstate',
> -'iothread'
> +'iothread',
> +'memory-backend-file',
> +'memory-backend-memfd',
> +'memory-backend-ram'

Another leaked enum value...

>] }
>  
>  ##
> @@ -315,7 +428,11 @@
>'cryptodev-vhost-user':   { 'type': 'CryptodevVhostUserProperties',
>'if': 'defined(CONFIG_VIRTIO_CRYPTO) 
> && defined(CONFIG_VHOST_CRYPTO)' },
>'dbus-vmstate':   'DBusVMStateProperties',
> -  'iothread':   'IothreadProperties'
> +  'iothread':   'IothreadProperties',
> +  'memory-backend-file':'MemoryBackendFileProperties',
> +  'memory-backend-memfd':   { 'type': 'MemoryBackendMemfdProperties',
> +  'if': 'defined(CONFIG_LINUX)' },
> +  'memory-backend-ram': 'MemoryBackendProperties'
>} }

...when compared to the union branches.

Once fixed,
Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v3 04/30] qapi/qom: Add ObjectOptions for cryptodev-*

2021-03-08 Thread Eric Blake
On 3/8/21 10:54 AM, Kevin Wolf wrote:
> This adds a QAPI schema for the properties of the cryptodev-* objects.
> 
> These interfaces have some questionable aspects (cryptodev-backend is
> really an abstract base class without function, and the queues option
> only makes sense for cryptodev-vhost-user), but as the goal is to
> represent the existing interface in QAPI, leave these things in place.
> 
> Signed-off-by: Kevin Wolf 
> Acked-by: Peter Krempa 
> ---
>  qapi/qom.json | 35 +++
>  1 file changed, 35 insertions(+)
> 

> @@ -239,6 +267,9 @@
>  'authz-listfile',
>  'authz-pam',
>  'authz-simple',
> +'cryptodev-backend',
> +'cryptodev-backend-builtin',
> +'cryptodev-vhost-user',

Shouldn't the enum value be conditional...

>  'iothread'
>] }
>  
> @@ -262,6 +293,10 @@
>'authz-listfile': 'AuthZListFileProperties',
>'authz-pam':  'AuthZPAMProperties',
>'authz-simple':   'AuthZSimpleProperties',
> +  'cryptodev-backend':  'CryptodevBackendProperties',
> +  'cryptodev-backend-builtin':  'CryptodevBackendProperties',
> +  'cryptodev-vhost-user':   { 'type': 'CryptodevVhostUserProperties',
> +  'if': 'defined(CONFIG_VIRTIO_CRYPTO) 
> && defined(CONFIG_VHOST_CRYPTO)' },

...if the union branch is likewise?

>'iothread':   'IothreadProperties'
>} }
>  
> 

With that fixed,

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
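
Why the enum value and the union branch should carry the same condition
can be illustrated in plain C. The names below (SketchObjectType,
SKETCH_HAVE_VHOST_CRYPTO, sketch_type_name()) are hypothetical and this
is not the code the QAPI generator emits; it only shows that an enum
member compiled in without its matching table entry would be accepted
but have nothing behind it.

```c
#include <stddef.h>

/* Hypothetical feature macro; define it to enable the extra member. */
/* #define SKETCH_HAVE_VHOST_CRYPTO 1 */

typedef enum SketchObjectType {
    SKETCH_TYPE_IOTHREAD,
#if defined(SKETCH_HAVE_VHOST_CRYPTO)
    SKETCH_TYPE_CRYPTODEV_VHOST_USER,   /* same guard as its table entry */
#endif
    SKETCH_TYPE__MAX
} SketchObjectType;

/* One entry per enum value, each under the same condition as the value. */
static const char *const sketch_type_names[SKETCH_TYPE__MAX] = {
    [SKETCH_TYPE_IOTHREAD] = "iothread",
#if defined(SKETCH_HAVE_VHOST_CRYPTO)
    [SKETCH_TYPE_CRYPTODEV_VHOST_USER] = "cryptodev-vhost-user",
#endif
};

/* Returns NULL for out-of-range values. */
static const char *sketch_type_name(int t)
{
    if (t < 0 || t >= SKETCH_TYPE__MAX) {
        return NULL;
    }
    return sketch_type_names[t];
}
```

If only the table entry were guarded, the enum member would still exist
in builds without the feature and sketch_type_name() would return NULL
for it, which is the kind of "leaked" value the review points at.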




Re: [PULL 01/38] hw/block/nvme: introduce nvme-subsys device

2021-03-08 Thread Klaus Jensen
On Mar  8 18:53, Peter Maydell wrote:
> On Mon, 8 Mar 2021 at 18:46, Klaus Jensen wrote:
> >
> > On Mar  8 19:32, Paolo Bonzini wrote:
> > > On 08/03/21 13:22, Klaus Jensen wrote:
> > > >
> > > > This patch introduced a simple nvme-subsys device model.  The subsystem
> > > > will be prepared with subsystem NQN with  provided in
> > > > nvme-subsys device:
> > > >
> > > >ex) -device nvme-subsys,id=subsys0: nqn.2019-08.org.qemu:subsys0
> > >
> > > Hi Klaus, sorry for not spotting this before.  In the SCSI subsystem we
> > > moved away from using id as guest-visible data.  Keeping it as a default 
> > > is
> > > fine I guess, but would it be possible to add an nqn property to 
> > > nvme-subsys
> > > and use it if it is present instead of the id?
> > >
> > > Thanks,
> > >
> > > Paolo
> > >
> >
> > Hi Paolo,
> >
> > Thanks for pointing this out! Absolutely - we have no specific reason to
> > use 'id', so we can just change it completely to use 'nqn'.
> >
> > Peter, you want this in a v2 or did you already start integration of
> > this PR?
> 
> I haven't yet started working on this PR so if the change is trivial
> feel free to roll a v2.
> 

Thanks, I'll do that!



signature.asc
Description: PGP signature


Re: [PULL 01/38] hw/block/nvme: introduce nvme-subsys device

2021-03-08 Thread Klaus Jensen
On Mar  8 19:32, Paolo Bonzini wrote:
> On 08/03/21 13:22, Klaus Jensen wrote:
> > 
> > This patch introduced a simple nvme-subsys device model.  The subsystem
> > will be prepared with subsystem NQN with  provided in
> > nvme-subsys device:
> > 
> >ex) -device nvme-subsys,id=subsys0: nqn.2019-08.org.qemu:subsys0
> 
> Hi Klaus, sorry for not spotting this before.  In the SCSI subsystem we
> moved away from using id as guest-visible data.  Keeping it as a default is
> fine I guess, but would it be possible to add an nqn property to nvme-subsys
> and use it if it is present instead of the id?
> 
> Thanks,
> 
> Paolo
> 

Hi Paolo,

Thanks for pointing this out! Absolutely - we have no specific reason to
use 'id', so we can just change it completely to use 'nqn'.

Peter, you want this in a v2 or did you already start integration of
this PR?




Re: [PATCH] xen-block: Fix removal of backend instance via xenstore

2021-03-08 Thread Paolo Bonzini

On 08/03/21 19:14, Anthony PERARD wrote:
> On Mon, Mar 08, 2021 at 06:37:38PM +0100, Paolo Bonzini wrote:
> > On 08/03/21 18:29, Anthony PERARD wrote:
> > > > If nothing else works then I guess it's okay, but why can't you do the
> > > > xen_block_drive_destroy from e.g. an unrealize callback?
> > >
> > > I'm not sure if that's possible.
> > >
> > > xen_block_device_create/xen_block_device_destroy() is supposed to be
> > > equivalent to do those qmp commands:
> > >   blockdev-add node-name=xvdz-qcow2 driver=qcow2 
> > > file={"driver":"file","filename":"disk.qcow2","locking":"off"}
> > >   device_add id=xvdz driver=xen-disk vdev=xvdz drive=xvdz-qcow2
> > >
> > > But I tried to add a call xen_block_drive_destroy from
> > > xen_block_unrealize, but that still is called too early, it's called
> > > before object_property_del_all() which would delete "drive" and call
> > > release_drive() which would free the node.
> >
> > Can you use blockdev_mark_auto_del?  Then you don't have to call
> > xen_block_drive_destroy at all.
>
> There is no legacy_dinfo, so blockdev_mark_auto_del doesn't work.


Then I guess it's okay.  Perhaps you can rename the function to
xen_block_blockdev_destroy so that it's clear it's a blockdev and not a
drive.  Thanks,


Paolo




Re: [PULL 01/38] hw/block/nvme: introduce nvme-subsys device

2021-03-08 Thread Peter Maydell
On Mon, 8 Mar 2021 at 18:46, Klaus Jensen wrote:
>
> On Mar  8 19:32, Paolo Bonzini wrote:
> > On 08/03/21 13:22, Klaus Jensen wrote:
> > >
> > > This patch introduced a simple nvme-subsys device model.  The subsystem
> > > will be prepared with subsystem NQN with  provided in
> > > nvme-subsys device:
> > >
> > >ex) -device nvme-subsys,id=subsys0: nqn.2019-08.org.qemu:subsys0
> >
> > Hi Klaus, sorry for not spotting this before.  In the SCSI subsystem we
> > moved away from using id as guest-visible data.  Keeping it as a default is
> > fine I guess, but would it be possible to add an nqn property to nvme-subsys
> > and use it if it is present instead of the id?
> >
> > Thanks,
> >
> > Paolo
> >
>
> Hi Paolo,
>
> Thanks for pointing this out! Absolutely - we have no specific reason to
> use 'id', so we can just change it completely to use 'nqn'.
>
> Peter, you want this in a v2 or did you already start integration of
> this PR?

I haven't yet started working on this PR so if the change is trivial
feel free to roll a v2.

-- PMM



Re: [PULL 01/38] hw/block/nvme: introduce nvme-subsys device

2021-03-08 Thread Paolo Bonzini

On 08/03/21 13:22, Klaus Jensen wrote:
> This patch introduced a simple nvme-subsys device model.  The subsystem
> will be prepared with subsystem NQN with  provided in
> nvme-subsys device:
>
>    ex) -device nvme-subsys,id=subsys0: nqn.2019-08.org.qemu:subsys0

Hi Klaus, sorry for not spotting this before.  In the SCSI subsystem we
moved away from using id as guest-visible data.  Keeping it as a default
is fine I guess, but would it be possible to add an nqn property to
nvme-subsys and use it if it is present instead of the id?


Thanks,

Paolo
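
The suggestion above (prefer an explicit nqn property, fall back to the
id) could look roughly like this. It is a sketch only: build_subnqn()
and its parameters are hypothetical, and the "nqn.2019-08.org.qemu:"
prefix is taken from the example in the quoted commit message rather
than from the device code.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write the subsystem NQN into buf.
 * Uses nqn when it is set, otherwise derives a default from id.
 * Returns 0 on success, -1 if the result does not fit or id is missing.
 */
static int build_subnqn(char *buf, size_t len, const char *nqn,
                        const char *id)
{
    int n;

    if (nqn != NULL && nqn[0] != '\0') {
        n = snprintf(buf, len, "%s", nqn);          /* explicit property */
    } else if (id != NULL) {
        n = snprintf(buf, len, "nqn.2019-08.org.qemu:%s", id);
    } else {
        return -1;
    }
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

This keeps the guest-visible string under the user's control while
leaving existing command lines that only pass id working as before.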




Re: [PATCH] xen-block: Fix removal of backend instance via xenstore

2021-03-08 Thread Anthony PERARD via
On Mon, Mar 08, 2021 at 06:37:38PM +0100, Paolo Bonzini wrote:
> On 08/03/21 18:29, Anthony PERARD wrote:
> > > If nothing else works then I guess it's okay, but why can't you do the
> > > xen_block_drive_destroy from e.g. an unrealize callback?
> > 
> > I'm not sure if that's possible.
> > 
> > xen_block_device_create/xen_block_device_destroy() is supposed to be
> > equivalent to do those qmp commands:
> >  blockdev-add node-name=xvdz-qcow2 driver=qcow2 
> > file={"driver":"file","filename":"disk.qcow2","locking":"off"}
> >  device_add id=xvdz driver=xen-disk vdev=xvdz drive=xvdz-qcow2
> > 
> > But I tried to add a call xen_block_drive_destroy from
> > xen_block_unrealize, but that still is called too early, it's called
> > before object_property_del_all() which would delete "drive" and call
> > release_drive() which would free the node.
> 
> Can you use blockdev_mark_auto_del?  Then you don't have to call
> xen_block_drive_destroy at all.

There is no legacy_dinfo, so blockdev_mark_auto_del doesn't work.

-- 
Anthony PERARD



Re: [PATCH RFC 1/4] hw/block/nvme: convert dsm to aiocb

2021-03-08 Thread Klaus Jensen
On Mar  8 16:37, Stefan Hajnoczi wrote:
> On Tue, Mar 02, 2021 at 12:10:37PM +0100, Klaus Jensen wrote:
> > +static void nvme_dsm_cancel(BlockAIOCB *aiocb)
> > +{
> > +NvmeDSMAIOCB *iocb = container_of(aiocb, NvmeDSMAIOCB, common);
> > +
> > +/* break loop */
> > +iocb->curr.len = 0;
> > +iocb->curr.idx = iocb->nr;
> > +
> > +iocb->ret = -ECANCELED;
> > +
> > +if (iocb->aiocb) {
> > +blk_aio_cancel_async(iocb->aiocb);
> > +iocb->aiocb = NULL;
> > +}
> > +}
> 
> Is the case where iocb->aiocb == NULL just in case nvme_dsm_cancel() is
> called after the last discard has completed but before the BH runs? I
> want to make sure there are no other cases because nothing would call
> iocb->common.cb().
> 

Yes - that case *can* happen, right?

I modeled this after the approach in the ide trim code (hw/ide/core.c).

> >  static uint16_t nvme_dsm(NvmeCtrl *n, NvmeRequest *req)
> >  {
> >  NvmeNamespace *ns = req->ns;
> >  NvmeDsmCmd *dsm = (NvmeDsmCmd *) &req->cmd;
> > -
> >  uint32_t attr = le32_to_cpu(dsm->attributes);
> >  uint32_t nr = (le32_to_cpu(dsm->nr) & 0xff) + 1;
> > -
> >  uint16_t status = NVME_SUCCESS;
> >  
> >  trace_pci_nvme_dsm(nvme_cid(req), nvme_nsid(ns), nr, attr);
> >  
> >  if (attr & NVME_DSMGMT_AD) {
> > -int64_t offset;
> > -size_t len;
> > -NvmeDsmRange range[nr];
> > -uintptr_t *discards = (uintptr_t *)&req->opaque;
> > +NvmeDSMAIOCB *iocb = blk_aio_get(_dsm_aiocb_info, 
> > ns->blkconf.blk,
> > + nvme_misc_cb, req);
> >  
> > -status = nvme_dma(n, (uint8_t *)range, sizeof(range),
> > +iocb->req = req;
> > +iocb->bh = qemu_bh_new(nvme_dsm_bh, iocb);
> > +iocb->ret = 0;
> > +iocb->range = g_new(NvmeDsmRange, nr);
> > +iocb->nr = nr;
> > +iocb->curr.len = 0;
> > +iocb->curr.idx = 0;
> > +
> > +status = nvme_dma(n, (uint8_t *)iocb->range, sizeof(NvmeDsmRange) 
> > * nr,
> >DMA_DIRECTION_TO_DEVICE, req);
> >  if (status) {
> >  return status;
> >  }
> >  
> > -/*
> > - * AIO callbacks may be called immediately, so initialize discards 
> > to 1
> > - * to make sure the the callback does not complete the request 
> > before
> > - * all discards have been issued.
> > - */
> > -*discards = 1;
> > +nvme_dsm_aio_cb(iocb, 0);
> > +req->aiocb = &iocb->common;
> 
> Want to move this line up one just in case something in
> nvme_dsm_aio_cb() accesses req->aiocb?

Sounds reasonable! Thanks!
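
The "break loop" cancel path discussed above can be modelled with a
small state machine. This is a toy sketch, not the NVMe device code:
DsmSketch, dsm_step() and dsm_cancel() are hypothetical names, the
discard itself is only pretended, and the done flag stands in for the
point where the real code would schedule its bottom half.

```c
#include <errno.h>
#include <stddef.h>

typedef struct DsmSketch {
    size_t nr;        /* number of ranges to process */
    size_t idx;       /* next range to issue */
    int ret;          /* first error seen, or 0 */
    int in_flight;    /* nonzero while a (pretend) discard is pending */
    int done;         /* set once completion would be reported */
} DsmSketch;

/* Called once to start, then again each time a discard completes. */
static void dsm_step(DsmSketch *s, int ret)
{
    s->in_flight = 0;
    if (ret < 0 && s->ret == 0) {
        s->ret = ret;
    }
    if (s->ret < 0 || s->idx == s->nr) {
        s->done = 1;              /* nothing left to issue: finish */
        return;
    }
    s->idx++;
    s->in_flight = 1;             /* pretend the next discard was issued */
}

/* Cancel: make the next step see "no ranges left" plus an error. */
static void dsm_cancel(DsmSketch *s)
{
    s->idx = s->nr;               /* break loop */
    s->ret = -ECANCELED;
}
```

In this model a cancel that arrives while a discard is pending is
finished by that discard's completion step; a cancel that arrives after
the last step has already set done has nothing further to call, which
corresponds to the case raised in the question.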




Re: [PATCH] xen-block: Fix removal of backend instance via xenstore

2021-03-08 Thread Paolo Bonzini

On 08/03/21 18:29, Anthony PERARD wrote:
> > If nothing else works then I guess it's okay, but why can't you do the
> > xen_block_drive_destroy from e.g. an unrealize callback?
>
> I'm not sure if that's possible.
>
> xen_block_device_create/xen_block_device_destroy() is supposed to be
> equivalent to do those qmp commands:
>  blockdev-add node-name=xvdz-qcow2 driver=qcow2 
> file={"driver":"file","filename":"disk.qcow2","locking":"off"}
>  device_add id=xvdz driver=xen-disk vdev=xvdz drive=xvdz-qcow2
>
> But I tried to add a call xen_block_drive_destroy from
> xen_block_unrealize, but that still is called too early, it's called
> before object_property_del_all() which would delete "drive" and call
> release_drive() which would free the node.

Can you use blockdev_mark_auto_del?  Then you don't have to call
xen_block_drive_destroy at all.

Paolo

> So, no, I don't think we can use an unrealize callback.
>
> I thought of trying to delete the "drive" property ahead of calling
> object_unparent() but I didn't figure out how to do so and it's maybe
> not possible.
>
> So either drain_call_rcu or adding call_rcu(xen_block_drive_destroy)
> seems to be the way, but since xen_block_drive_destroy uses
> qmp_blockdev_del, it seems better to drain_call_rcu.
>
> Cheers,






Re: [PULL 00/31] Block layer patches

2021-03-08 Thread Stefan Hajnoczi
On Mon, Mar 08, 2021 at 12:08:29PM +0100, Kevin Wolf wrote:
> Am 06.03.2021 um 12:22 hat Peter Maydell geschrieben:
> > On Fri, 5 Mar 2021 at 16:55, Kevin Wolf wrote:
> > >
> > > The following changes since commit 
> > > 9a7beaad3dbba982f7a461d676b55a5c3851d312:
> > >
> > >   Merge remote-tracking branch 
> > > 'remotes/alistair/tags/pull-riscv-to-apply-20210304' into staging 
> > > (2021-03-05 10:47:46 +)
> > >
> > > are available in the Git repository at:
> > >
> > >   git://repo.or.cz/qemu/kevin.git tags/for-upstream
> > >
> > > for you to fetch changes up to 67bedc3aed5c455b629c2cb5f523b536c46adff9:
> > >
> > >   docs: qsd: Explain --export nbd,name=... default (2021-03-05 17:09:46 
> > > +0100)
> > >
> > > 
> > > Block layer patches:
> > >
> > > - qemu-storage-daemon: add --pidfile option
> > > - qemu-storage-daemon: CLI error messages include the option name now
> > > - vhost-user-blk export: Misc fixes, added test cases
> > > - docs: Improvements for qemu-storage-daemon documentation
> > > - parallels: load bitmap extension
> > > - backup-top: Don't crash on post-finalize accesses
> > > - iotests improvements
> > 
> > This failed some of the gitlab CI jobs, like this:
> > 
> > https://gitlab.com/qemu-project/qemu/-/jobs/1077335781
> > 
> > Running test qtest-x86_64/test-hmp
> > Running test qtest-x86_64/qos-test
> > qemu-storage-daemon: vu_panic: Not implemented: memfd support is missing
> > qemu-storage-daemon: vu_panic: Failed to alloc vhost inflight area
> > qemu-system-x86_64: Failed to write msg. Wrote -1 instead of 20.
> > qemu-system-x86_64: vhost_set_features failed: Invalid argument (22)
> > qemu-system-x86_64: Error starting vhost: 22
> > qemu-system-x86_64: vhost-user-blk: vhost start failed: Invalid argument
> > **
> > ERROR:../tests/qtest/libqos/virtio.c:228:qvirtio_wait_used_elem:
> > assertion failed: (g_get_monotonic_time() - start_time <= timeout_us)
> > ERROR qtest-x86_64/qos-test - Bail out!
> > ERROR:../tests/qtest/libqos/virtio.c:228:qvirtio_wait_used_elem:
> > assertion failed: (g_get_monotonic_time() - start_time <= timeout_us)
> > make: *** [run-test-159] Error 1
> > 
> > I guess some test or other is assuming the presence of
> > a host feature that isn't guaranteed to be there ?
> 
> Stefan, can you have a look? This is from the new vhost-user-blk test
> cases from your series.
> 
> If the fix isn't trivial, I'll resubmit v2 today with just the test case
> dropped and then we can add it later.

I'm testing the following commit:
https://gitlab.com/stefanha/qemu/-/pipelines/267172954

I'll look into it more tomorrow.

Stefan




Re: [PATCH] xen-block: Fix removal of backend instance via xenstore

2021-03-08 Thread Anthony PERARD via
On Mon, Mar 08, 2021 at 03:38:49PM +0100, Paolo Bonzini wrote:
> On 08/03/21 15:32, Anthony PERARD wrote:
> > From: Anthony PERARD 
> > 
> > Whenever a Xen block device is detached via xenstore, the image
> > associated with it remains open by the backend QEMU and an error is
> > logged:
> >  qemu-system-i386: failed to destroy drive: Node xvdz-qcow2 is in use
> > 
> > This happens because object_unparent() doesn't immediately free the
> > object and thus keeps a reference to the node we are trying to free.
> > The reference is held by the "drive" property and the call to
> > xen_block_drive_destroy() fails.
> > 
> > In order to fix that, we call drain_call_rcu() to run the callback
> > set up by bus_remove_child() via object_unparent().
> > 
> > Fixes: 2d24a6466154 ("device-core: use RCU for list of children of a bus")
> > 
> > Signed-off-by: Anthony PERARD 
> > ---
> > CCing people who introduced/reviewed the change to use RCU to give
> > them a chance to say if the change is fine.
> 
> If nothing else works then I guess it's okay, but why can't you do the
> xen_block_drive_destroy from e.g. an unrealize callback?

I'm not sure if that's possible.

xen_block_device_create/xen_block_device_destroy() is supposed to be
equivalent to do those qmp commands:
blockdev-add node-name=xvdz-qcow2 driver=qcow2 
file={"driver":"file","filename":"disk.qcow2","locking":"off"}
device_add id=xvdz driver=xen-disk vdev=xvdz drive=xvdz-qcow2

But I tried to add a call xen_block_drive_destroy from
xen_block_unrealize, but that still is called too early, it's called
before object_property_del_all() which would delete "drive" and call
release_drive() which would free the node.

So, no, I don't think we can use an unrealize callback.

I thought of trying to delete the "drive" property ahead of calling
object_unparent() but I didn't figure out how to do so and it's maybe
not possible.

So either drain_call_rcu or adding call_rcu(xen_block_drive_destroy)
seems to be the way, but since xen_block_drive_destroy uses
qmp_blockdev_del, it seems better to drain_call_rcu.

Cheers,

-- 
Anthony PERARD
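
The ordering problem described in this message can be reproduced with a
toy model of deferred callbacks. Everything below is hypothetical
(Node, defer_call(), drain_deferred(), node_destroy()) and deliberately
much simpler than QEMU's RCU machinery; it only shows why destroying
the node fails until the deferred release has run.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_DEFERRED 8

typedef struct Node {
    int refcnt;                   /* references still held on the node */
} Node;

typedef void (*DeferredFn)(void *opaque);

static struct {
    DeferredFn fn;
    void *opaque;
} deferred[MAX_DEFERRED];
static size_t n_deferred;

/* Queue fn(opaque) to run later, like a deferred reclamation callback. */
static int defer_call(DeferredFn fn, void *opaque)
{
    if (n_deferred == MAX_DEFERRED) {
        return -ENOMEM;
    }
    deferred[n_deferred].fn = fn;
    deferred[n_deferred].opaque = opaque;
    n_deferred++;
    return 0;
}

/* Run every queued callback now. */
static void drain_deferred(void)
{
    for (size_t i = 0; i < n_deferred; i++) {
        deferred[i].fn(deferred[i].opaque);
    }
    n_deferred = 0;
}

/* Drops the reference that the device's "drive" property would hold. */
static void node_unref(void *opaque)
{
    ((Node *)opaque)->refcnt--;
}

/* Fails while anything still references the node ("Node ... is in use"). */
static int node_destroy(Node *n)
{
    return n->refcnt > 0 ? -EBUSY : 0;
}
```

Calling node_destroy() right after queuing node_unref() reports the node
as busy; calling it after drain_deferred() succeeds, which mirrors the
effect of draining before xen_block_drive_destroy().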



[PULL v2 00/30] Block layer patches

2021-03-08 Thread Kevin Wolf
The following changes since commit 138d2931979cb7ee4a54a434a54088231f6980ff:

  Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20210308' 
into staging (2021-03-08 11:57:36 +)

are available in the Git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to ef2e38a1a1d2915b148c4a49f61626e62c46fbb6:

  blockdev: Clarify error messages pertaining to 'node-name' (2021-03-08 
14:56:55 +0100)


Block layer patches:

- qemu-storage-daemon: add --pidfile option
- qemu-storage-daemon: CLI error messages include the option name now
- vhost-user-blk export: Misc fixes
- docs: Improvements for qemu-storage-daemon documentation
- parallels: load bitmap extension
- backup-top: Don't crash on post-finalize accesses
- Improve error messages related to node-name options
- iotests improvements


Alberto Garcia (1):
  iotests: Drop deprecated 'props' from object-add

Connor Kuehl (2):
  block: Clarify error messages pertaining to 'node-name'
  blockdev: Clarify error messages pertaining to 'node-name'

Eric Blake (1):
  iotests: Fix up python style in 300

Kevin Wolf (1):
  docs: qsd: Explain --export nbd,name=... default

Max Reitz (3):
  backup: Remove nodes from job in .clean()
  backup-top: Refuse I/O in inactive state
  iotests/283: Check that finalize drops backup-top

Paolo Bonzini (2):
  storage-daemon: report unexpected arguments on the fly
  storage-daemon: include current command line option in the errors

Stefan Hajnoczi (12):
  qemu-storage-daemon: add --pidfile option
  docs: show how to spawn qemu-storage-daemon with fd passing
  docs: replace insecure /tmp examples in qsd docs
  vhost-user-blk: fix blkcfg->num_queues endianness
  libqtest: add qtest_socket_server()
  libqtest: add qtest_kill_qemu()
  libqtest: add qtest_remove_abrt_handler()
  block/export: fix blk_size double byteswap
  block/export: use VIRTIO_BLK_SECTOR_BITS
  block/export: fix vhost-user-blk export sector number calculation
  block/export: port virtio-blk discard/write zeroes input validation
  block/export: port virtio-blk read/write range check

Stefano Garzarella (1):
  blockjob: report a better error message

Vladimir Sementsov-Ogievskiy (7):
  qcow2-bitmap: make bytes_covered_by_bitmap_cluster() public
  parallels.txt: fix bitmap L1 table description
  block/parallels: BDRVParallelsState: add cluster_size field
  parallels: support bitmap extension for read-only mode
  iotests.py: add unarchive_sample_image() helper
  iotests: add parallels-read-bitmap test
  MAINTAINERS: update parallels block driver

 docs/interop/parallels.txt           |  28 +-
 docs/tools/qemu-storage-daemon.rst   |  68 -
 block/parallels.h                    |   7 +-
 include/block/dirty-bitmap.h         |   2 +
 tests/qtest/libqos/libqtest.h        |  37 +++
 block.c                              |   8 +-
 block/backup-top.c                   |  10 +
 block/backup.c                       |   1 +
 block/dirty-bitmap.c                 |  13 +
 block/export/vhost-user-blk-server.c | 150 +--
 block/parallels-ext.c                | 300 +
 block/parallels.c                    |  26 +-
 block/qcow2-bitmap.c                 |  16 +-
 blockdev.c                           |  13 +-
 blockjob.c                           |  10 +-
 hw/block/vhost-user-blk.c            |   7 +-
 storage-daemon/qemu-storage-daemon.c |  56 +++-
 tests/qtest/libqtest.c               |  82 --
 tests/qemu-iotests/iotests.py        |  10 +
 MAINTAINERS                          |   3 +
 block/meson.build                    |   3 +-
 tests/qemu-iotests/030               |   4 +-
 tests/qemu-iotests/040               |   4 +-
 tests/qemu-iotests/051.pc.out        |   6 +-
 tests/qemu-iotests/081.out           |   2 +-
 tests/qemu-iotests/085.out           |   6 +-
 tests/qemu-iotests/087               |   8 +-
 tests/qemu-iotests/087.out           |   2 +-
 tests/qemu-iotests/184               |  18 +-
 tests/qemu-iotests/206.out           |   2 +-
 tests/qemu-iotests/210.out           |   2 +-
 tests/qemu-iotests/211.out           |   2 +-
 tests/qemu-iotests/212.out           |   2 +-
 tests/qemu-iotests/213.out           |   2 +-
 tests/qemu-iotests/

[PATCH v3 26/30] qemu-img: Use user_creatable_process_cmdline() for --object

2021-03-08 Thread Kevin Wolf
This switches qemu-img from a QemuOpts-based parser for --object to
user_creatable_process_cmdline() which uses a keyval parser and enforces
the QAPI schema.

Apart from being a cleanup, this makes non-scalar properties accessible.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
---
 qemu-img.c | 251 ++---
 1 file changed, 45 insertions(+), 206 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index e2952fe955..babb5573ab 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -226,23 +226,6 @@ static void QEMU_NORETURN help(void)
 exit(EXIT_SUCCESS);
 }
 
-static QemuOptsList qemu_object_opts = {
-.name = "object",
-.implied_opt_name = "qom-type",
-.head = QTAILQ_HEAD_INITIALIZER(qemu_object_opts.head),
-.desc = {
-{ }
-},
-};
-
-static bool qemu_img_object_print_help(const char *type, QemuOpts *opts)
-{
-if (user_creatable_print_help(type, opts)) {
-exit(0);
-}
-return true;
-}
-
 /*
  * Is @optarg safe for accumulate_options()?
  * It is when multiple of them can be joined together separated by ','.
@@ -566,14 +549,9 @@ static int img_create(int argc, char **argv)
 case 'u':
 flags |= BDRV_O_NO_BACKING;
 break;
-case OPTION_OBJECT: {
-QemuOpts *opts;
-opts = qemu_opts_parse_noisily(&qemu_object_opts,
-   optarg, true);
-if (!opts) {
-goto fail;
-}
-}   break;
+case OPTION_OBJECT:
+user_creatable_process_cmdline(optarg);
+break;
 }
 }
 
@@ -589,12 +567,6 @@ static int img_create(int argc, char **argv)
 }
 optind++;
 
-if (qemu_opts_foreach(&qemu_object_opts,
-  user_creatable_add_opts_foreach,
-  qemu_img_object_print_help, &error_fatal)) {
-goto fail;
-}
-
 /* Get image size, if specified */
 if (optind < argc) {
 int64_t sval;
@@ -804,14 +776,9 @@ static int img_check(int argc, char **argv)
 case 'U':
 force_share = true;
 break;
-case OPTION_OBJECT: {
-QemuOpts *opts;
-opts = qemu_opts_parse_noisily(&qemu_object_opts,
-   optarg, true);
-if (!opts) {
-return 1;
-}
-}   break;
+case OPTION_OBJECT:
+user_creatable_process_cmdline(optarg);
+break;
 case OPTION_IMAGE_OPTS:
 image_opts = true;
 break;
@@ -831,12 +798,6 @@ static int img_check(int argc, char **argv)
 return 1;
 }
 
-if (qemu_opts_foreach(&qemu_object_opts,
-  user_creatable_add_opts_foreach,
-  qemu_img_object_print_help, &error_fatal)) {
-return 1;
-}
-
 ret = bdrv_parse_cache_mode(cache, , );
 if (ret < 0) {
 error_report("Invalid source cache option: %s", cache);
@@ -1034,14 +995,9 @@ static int img_commit(int argc, char **argv)
 return 1;
 }
 break;
-case OPTION_OBJECT: {
-QemuOpts *opts;
-opts = qemu_opts_parse_noisily(&qemu_object_opts,
-   optarg, true);
-if (!opts) {
-return 1;
-}
-}   break;
+case OPTION_OBJECT:
+user_creatable_process_cmdline(optarg);
+break;
 case OPTION_IMAGE_OPTS:
 image_opts = true;
 break;
@@ -1058,12 +1014,6 @@ static int img_commit(int argc, char **argv)
 }
 filename = argv[optind++];
 
-if (qemu_opts_foreach(&qemu_object_opts,
-  user_creatable_add_opts_foreach,
-  qemu_img_object_print_help, &error_fatal)) {
-return 1;
-}
-
 flags = BDRV_O_RDWR | BDRV_O_UNMAP;
 ret = bdrv_parse_cache_mode(cache, , );
 if (ret < 0) {
@@ -1353,7 +1303,7 @@ static int check_empty_sectors(BlockBackend *blk, int64_t 
offset,
 /*
  * Compares two images. Exit codes:
  *
- * 0 - Images are identical
+ * 0 - Images are identical or the requested help was printed
  * 1 - Images differ
  * >1 - Error occurred
  */
@@ -1423,15 +1373,21 @@ static int img_compare(int argc, char **argv)
 case 'U':
 force_share = true;
 break;
-case OPTION_OBJECT: {
-QemuOpts *opts;
-opts = qemu_opts_parse_noisily(&qemu_object_opts,
-   optarg, true);
-if (!opts) {
-ret = 2;
-goto out4;
+case OPTION_OBJECT:
+{
+Error *local_err = NULL;
+
+if (!user_creatable_add_from_str(optarg, &local_err)) {
+if (local_err) {
+error_report_err(local_err);
+exit(2);
+} 
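
The img_compare hunk above has to tell three outcomes apart, because
user_creatable_add_from_str() returns false both on error and after
printing help. A minimal sketch of that calling convention follows. The
ToyError type, toy_add_from_str() and its trivial "parsing" rules are
hypothetical stand-ins, not QEMU's Error API or parser; only the
true / false-with-error / false-without-error contract is taken from the
documentation comment in patch 25/30.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct ToyError {
    const char *msg;
} ToyError;

enum toy_outcome { TOY_CREATED, TOY_HELP_PRINTED, TOY_FAILED };

/*
 * Mimics the documented contract: true when an object was created,
 * false with *errp set on error, false with *errp left unset when
 * help was printed instead.
 */
static bool toy_add_from_str(const char *optarg, ToyError **errp)
{
    static ToyError missing_id = { "Parameter 'id' is missing" };

    if (strcmp(optarg, "help") == 0) {
        return false;               /* help "printed", no error set */
    }
    if (strstr(optarg, "id=") == NULL) {
        *errp = &missing_id;        /* real failure */
        return false;
    }
    return true;
}

/* The caller inspects the error pointer to split the two false cases. */
static enum toy_outcome toy_classify(const char *optarg)
{
    ToyError *local_err = NULL;

    if (toy_add_from_str(optarg, &local_err)) {
        return TOY_CREATED;
    }
    return local_err ? TOY_FAILED : TOY_HELP_PRINTED;
}
```

A caller such as img_compare can then map TOY_FAILED to its error exit
code and TOY_HELP_PRINTED to a successful exit.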

[PATCH v3 30/30] qom: Drop QemuOpts based interfaces

2021-03-08 Thread Kevin Wolf
user_creatable_add_opts() has only a single user left, which is a test
case. Rewrite the test to use user_creatable_add_type() instead (which
is the remaining function that doesn't require a QAPI schema) and drop
the QemuOpts related functions.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 include/qom/object_interfaces.h | 59 
 qom/object_interfaces.c | 81 -
 tests/check-qom-proplist.c  | 42 -
 3 files changed, 20 insertions(+), 162 deletions(-)

diff --git a/include/qom/object_interfaces.h b/include/qom/object_interfaces.h
index fb32330901..ac6c33ceac 100644
--- a/include/qom/object_interfaces.h
+++ b/include/qom/object_interfaces.h
@@ -99,51 +99,6 @@ Object *user_creatable_add_type(const char *type, const char 
*id,
  */
 void user_creatable_add_qapi(ObjectOptions *options, Error **errp);
 
-/**
- * user_creatable_add_opts:
- * @opts: the object definition
- * @errp: if an error occurs, a pointer to an area to store the error
- *
- * Create an instance of the user creatable object whose type
- * is defined in @opts by the 'qom-type' option, placing it
- * in the object composition tree with name provided by the
- * 'id' field. The remaining options in @opts are used to
- * initialize the object properties.
- *
- * Returns: the newly created object or NULL on error
- */
-Object *user_creatable_add_opts(QemuOpts *opts, Error **errp);
-
-
-/**
- * user_creatable_add_opts_predicate:
- * @type: the QOM type to be added
- *
- * A callback function to determine whether an object
- * of type @type should be created. Instances of this
- * callback should be passed to user_creatable_add_opts_foreach
- */
-typedef bool (*user_creatable_add_opts_predicate)(const char *type);
-
-/**
- * user_creatable_add_opts_foreach:
- * @opaque: a user_creatable_add_opts_predicate callback or NULL
- * @opts: options to create
- * @errp: unused
- *
- * An iterator callback to be used in conjunction with
- * the qemu_opts_foreach() method for creating a list of
- * objects from a set of QemuOpts
- *
- * The @opaque parameter can be passed a user_creatable_add_opts_predicate
- * callback to filter which types of object are created during iteration.
- * When it fails, report the error.
- *
- * Returns: 0 on success, -1 when an error was reported.
- */
-int user_creatable_add_opts_foreach(void *opaque,
-QemuOpts *opts, Error **errp);
-
 /**
  * user_creatable_parse_str:
  * @optarg: the object definition string as passed on the command line
@@ -190,20 +145,6 @@ bool user_creatable_add_from_str(const char *optarg, Error 
**errp);
  */
 void user_creatable_process_cmdline(const char *optarg);
 
-/**
- * user_creatable_print_help:
- * @type: the QOM type to be added
- * @opts: options to create
- *
- * Prints help if requested in @type or @opts. Note that if @type is neither
- * "help"/"?" nor a valid user creatable type, no help will be printed
- * regardless of @opts.
- *
- * Returns: true if a help option was found and help was printed, false
- * otherwise.
- */
-bool user_creatable_print_help(const char *type, QemuOpts *opts);
-
 /**
  * user_creatable_del:
  * @id: the unique ID for the object
diff --git a/qom/object_interfaces.c b/qom/object_interfaces.c
index 62d7db7629..61d6d74a26 100644
--- a/qom/object_interfaces.c
+++ b/qom/object_interfaces.c
@@ -10,13 +10,10 @@
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/qobject-output-visitor.h"
 #include "qom/object_interfaces.h"
-#include "qemu/help_option.h"
 #include "qemu/id.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
 #include "qemu/qemu-print.h"
-#include "qapi/opts-visitor.h"
-#include "qemu/config-file.h"
 
 bool user_creatable_complete(UserCreatable *uc, Error **errp)
 {
@@ -140,60 +137,6 @@ void user_creatable_add_qapi(ObjectOptions *options, Error 
**errp)
 visit_free(v);
 }
 
-Object *user_creatable_add_opts(QemuOpts *opts, Error **errp)
-{
-Visitor *v;
-QDict *pdict;
-Object *obj;
-const char *id = qemu_opts_id(opts);
-char *type = qemu_opt_get_del(opts, "qom-type");
-
-if (!type) {
-error_setg(errp, QERR_MISSING_PARAMETER, "qom-type");
-return NULL;
-}
-if (!id) {
-error_setg(errp, QERR_MISSING_PARAMETER, "id");
-qemu_opt_set(opts, "qom-type", type, &error_abort);
-g_free(type);
-return NULL;
-}
-
-qemu_opts_set_id(opts, NULL);
-pdict = qemu_opts_to_qdict(opts, NULL);
-
-v = opts_visitor_new(opts);
-obj = user_creatable_add_type(type, id, pdict, v, errp);
-visit_free(v);
-
-qemu_opts_set_id(opts, (char *) id);
-qemu_opt_set(opts, "qom-type", type, &error_abort);
-g_free(type);
-qobject_unref(pdict);
-return obj;
-}
-
-
-int user_creatable_add_opts_foreach(void *opaque, QemuOpts *opts, Error **errp)
-{
-bool (*type_opt_predicate)(const char *, QemuOpts *) = opaque;
-Object *obj 

[PATCH v3 22/30] qom: Factor out user_creatable_process_cmdline()

2021-03-08 Thread Kevin Wolf
The implementation for --object can be shared between
qemu-storage-daemon and other binaries, so move it into a function in
qom/object_interfaces.c that is accessible from everywhere.

This also requires moving the implementation of qmp_object_add() into a
new user_creatable_add_qapi(), because qom/qom-qmp-cmds.c is not linked
for tools.

user_creatable_print_help_from_qdict() can become static now.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 include/qom/object_interfaces.h  | 41 +++
 qom/object_interfaces.c  | 50 +++-
 qom/qom-qmp-cmds.c   | 20 +--
 storage-daemon/qemu-storage-daemon.c | 24 ++---
 4 files changed, 80 insertions(+), 55 deletions(-)

diff --git a/include/qom/object_interfaces.h b/include/qom/object_interfaces.h
index 5299603f50..1e6c51b541 100644
--- a/include/qom/object_interfaces.h
+++ b/include/qom/object_interfaces.h
@@ -2,6 +2,7 @@
 #define OBJECT_INTERFACES_H
 
 #include "qom/object.h"
+#include "qapi/qapi-types-qom.h"
 #include "qapi/visitor.h"
 
 #define TYPE_USER_CREATABLE "user-creatable"
@@ -86,6 +87,18 @@ Object *user_creatable_add_type(const char *type, const char 
*id,
 const QDict *qdict,
 Visitor *v, Error **errp);
 
+/**
+ * user_creatable_add_qapi:
+ * @options: the object definition
+ * @errp: if an error occurs, a pointer to an area to store the error
+ *
+ * Create an instance of the user creatable object according to the
+ * options passed in @opts as described in the QAPI schema documentation.
+ *
+ * Returns: the newly created object or NULL on error
+ */
+void user_creatable_add_qapi(ObjectOptions *options, Error **errp);
+
 /**
  * user_creatable_add_opts:
  * @opts: the object definition
@@ -131,6 +144,21 @@ typedef bool (*user_creatable_add_opts_predicate)(const 
char *type);
 int user_creatable_add_opts_foreach(void *opaque,
 QemuOpts *opts, Error **errp);
 
+/**
+ * user_creatable_process_cmdline:
+ * @optarg: the object definition string as passed on the command line
+ *
+ * Create an instance of the user creatable object by parsing optarg
+ * with a keyval parser and implicit key 'qom-type', converting the
+ * result to ObjectOptions and calling into qmp_object_add().
+ *
+ * If a help option is given, print help instead and exit.
+ *
+ * This function is only meant to be called during command line parsing.
+ * It exits the process on failure or after printing help.
+ */
+void user_creatable_process_cmdline(const char *optarg);
+
 /**
  * user_creatable_print_help:
  * @type: the QOM type to be added
@@ -145,19 +173,6 @@ int user_creatable_add_opts_foreach(void *opaque,
  */
 bool user_creatable_print_help(const char *type, QemuOpts *opts);
 
-/**
- * user_creatable_print_help_from_qdict:
- * @args: options to create
- *
- * Prints help considering the other options given in @args (if "qom-type" is
- * given and valid, print properties for the type, otherwise print valid types)
- *
- * In contrast to user_creatable_print_help(), this function can't return that
- * no help was requested. It should only be called if we know that help is
- * requested and it will always print some help.
- */
-void user_creatable_print_help_from_qdict(QDict *args);
-
 /**
  * user_creatable_del:
  * @id: the unique ID for the object
diff --git a/qom/object_interfaces.c b/qom/object_interfaces.c
index 02c3934329..2eaf9971f5 100644
--- a/qom/object_interfaces.c
+++ b/qom/object_interfaces.c
@@ -2,10 +2,13 @@
 
 #include "qemu/cutils.h"
 #include "qapi/error.h"
+#include "qapi/qapi-commands-qom.h"
+#include "qapi/qapi-visit-qom.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qerror.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qobject-input-visitor.h"
+#include "qapi/qobject-output-visitor.h"
 #include "qom/object_interfaces.h"
 #include "qemu/help_option.h"
 #include "qemu/id.h"
@@ -113,6 +116,29 @@ out:
 return obj;
 }
 
+void user_creatable_add_qapi(ObjectOptions *options, Error **errp)
+{
+Visitor *v;
+QObject *qobj;
+QDict *props;
+Object *obj;
+
+v = qobject_output_visitor_new();
+visit_type_ObjectOptions(v, NULL, &options, &error_abort);
+visit_complete(v, &qobj);
+visit_free(v);
+
+props = qobject_to(QDict, qobj);
+qdict_del(props, "qom-type");
+qdict_del(props, "id");
+
+v = qobject_input_visitor_new(QOBJECT(props));
+obj = user_creatable_add_type(ObjectType_str(options->qom_type),
+  options->id, props, v, errp);
+object_unref(obj);
+visit_free(v);
+}
+
 Object *user_creatable_add_opts(QemuOpts *opts, Error **errp)
 {
 Visitor *v;
@@ -256,7 +282,7 @@ bool user_creatable_print_help(const char *type, QemuOpts 
*opts)
 return false;
 }
 
-void user_creatable_print_help_from_qdict(QDict *args)
+static void user_creatable_print_help_from_qdict(QDict *args)
 {
   

[PATCH v3 15/30] qapi/qom: Add ObjectOptions for confidential-guest-support

2021-03-08 Thread Kevin Wolf
This adds a QAPI schema for the properties of the objects implementing
the confidential-guest-support interface.

pef-guest and s390x-pv-guest don't have any properties, so they only
need to be added to the ObjectType enum without adding a new branch to
ObjectOptions.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qapi/qom.json | 37 +
 1 file changed, 37 insertions(+)

diff --git a/qapi/qom.json b/qapi/qom.json
index 6afac9169f..ad72dbdec2 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -635,6 +635,38 @@
   'base': 'RngProperties',
   'data': { '*filename': 'str' } }
 
+##
+# @SevGuestProperties:
+#
+# Properties for sev-guest objects.
+#
+# @sev-device: SEV device to use (default: "/dev/sev")
+#
+# @dh-cert-file: guest owners DH certificate (encoded with base64)
+#
+# @session-file: guest owners session parameters (encoded with base64)
+#
+# @policy: SEV policy value (default: 0x1)
+#
+# @handle: SEV firmware handle (default: 0)
+#
+# @cbitpos: C-bit location in page table entry (default: 0)
+#
+# @reduced-phys-bits: number of bits in physical addresses that become
+# unavailable when SEV is enabled
+#
+# Since: 2.12
+##
+{ 'struct': 'SevGuestProperties',
+  'data': { '*sev-device': 'str',
+'*dh-cert-file': 'str',
+'*session-file': 'str',
+'*policy': 'uint32',
+'*handle': 'uint32',
+'*cbitpos': 'uint32',
+'reduced-phys-bits': 'uint32' },
+  'if': 'defined(CONFIG_SEV)' }
+
 ##
 # @ObjectType:
 #
@@ -663,12 +695,15 @@
 'memory-backend-file',
 'memory-backend-memfd',
 'memory-backend-ram',
+{'name': 'pef-guest', 'if': 'defined(CONFIG_PSERIES)' },
 'pr-manager-helper',
 'rng-builtin',
 'rng-egd',
 'rng-random',
 'secret',
 'secret_keyring',
+{'name': 'sev-guest', 'if': 'defined(CONFIG_SEV)' },
+'s390-pv-guest',
 'throttle-group',
 'tls-creds-anon',
 'tls-creds-psk',
@@ -720,6 +755,8 @@
   'rng-random': 'RngRandomProperties',
   'secret': 'SecretProperties',
   'secret_keyring': 'SecretKeyringProperties',
+  'sev-guest':  { 'type': 'SevGuestProperties',
+  'if': 'defined(CONFIG_SEV)' },
   'throttle-group': 'ThrottleGroupProperties',
   'tls-creds-anon': 'TlsCredsAnonProperties',
   'tls-creds-psk':  'TlsCredsPskProperties',
-- 
2.29.2
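
To make the optional members and their documented defaults in
SevGuestProperties concrete, here is a small C sketch. The struct is a
hand-written approximation of the "optional member plus has_* flag"
layout and is not the type the QAPI generator emits; the toy_* names are
invented. The default values are the ones stated in the schema comment
above. dh-cert-file and session-file are left out because the schema
documents no default for them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hand-written approximation of a QAPI-style struct: each optional
 * member is paired with a has_* flag. Not the generated QEMU type.
 */
typedef struct ToySevGuestProperties {
    bool has_sev_device;
    const char *sev_device;
    bool has_policy;
    uint32_t policy;
    bool has_handle;
    uint32_t handle;
    bool has_cbitpos;
    uint32_t cbitpos;
    uint32_t reduced_phys_bits;     /* mandatory, so no has_* flag */
} ToySevGuestProperties;

/* Fill in the defaults documented in the schema comment. */
static void toy_sev_apply_defaults(ToySevGuestProperties *p)
{
    if (!p->has_sev_device) {
        p->sev_device = "/dev/sev";
    }
    if (!p->has_policy) {
        p->policy = 0x1;
    }
    if (!p->has_handle) {
        p->handle = 0;
    }
    if (!p->has_cbitpos) {
        p->cbitpos = 0;
    }
}
```

Values the user supplied (has_* set) are left alone; only absent members
receive a default.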




[PATCH v3 16/30] qapi/qom: Add ObjectOptions for input-*

2021-03-08 Thread Kevin Wolf
This adds a QAPI schema for the properties of the input-* objects.

ui.json cannot be included in qom.json because the storage daemon can't
use it, so move GrabToggleKeys to common.json.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qapi/common.json | 12 ++
 qapi/qom.json| 59 
 qapi/ui.json | 13 +--
 3 files changed, 72 insertions(+), 12 deletions(-)

diff --git a/qapi/common.json b/qapi/common.json
index b87e7f9039..7c976296f0 100644
--- a/qapi/common.json
+++ b/qapi/common.json
@@ -185,3 +185,15 @@
 ##
 { 'enum': 'NetFilterDirection',
   'data': [ 'all', 'rx', 'tx' ] }
+
+##
+# @GrabToggleKeys:
+#
+# Keys to toggle input-linux between host and guest.
+#
+# Since: 4.0
+#
+##
+{ 'enum': 'GrabToggleKeys',
+  'data': [ 'ctrl-ctrl', 'alt-alt', 'shift-shift','meta-meta', 'scrolllock',
+'ctrl-scrolllock' ] }
diff --git a/qapi/qom.json b/qapi/qom.json
index ad72dbdec2..6b96e9b0b3 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -444,6 +444,61 @@
   'base': 'NetfilterProperties',
   'data': { '*vnet_hdr_support': 'bool' } }
 
+##
+# @InputBarrierProperties:
+#
+# Properties for input-barrier objects.
+#
+# @name: the screen name as declared in the screens section of barrier.conf
+#
+# @server: hostname of the Barrier server (default: "localhost")
+#
+# @port: TCP port of the Barrier server (default: "24800")
+#
+# @x-origin: x coordinate of the leftmost pixel on the guest screen
+#(default: "0")
+#
+# @y-origin: y coordinate of the topmost pixel on the guest screen
+#(default: "0")
+#
+# @width: the width of secondary screen in pixels (default: "1920")
+#
+# @height: the height of secondary screen in pixels (default: "1080")
+#
+# Since: 4.2
+##
+{ 'struct': 'InputBarrierProperties',
+  'data': { 'name': 'str',
+'*server': 'str',
+'*port': 'str',
+'*x-origin': 'str',
+'*y-origin': 'str',
+'*width': 'str',
+'*height': 'str' } }
+
+##
+# @InputLinuxProperties:
+#
+# Properties for input-linux objects.
+#
+# @evdev: the path of the host evdev device to use
+#
+# @grab_all: if true, grab is toggled for all devices (e.g. both keyboard and
+#mouse) instead of just one device (default: false)
+#
+# @repeat: enables auto-repeat events (default: false)
+#
+# @grab-toggle: the key or key combination that toggles device grab
+#   (default: ctrl-ctrl)
+#
+# Since: 2.6
+##
+{ 'struct': 'InputLinuxProperties',
+  'data': { 'evdev': 'str',
+'*grab_all': 'bool',
+'*repeat': 'bool',
+'*grab-toggle': 'GrabToggleKeys' } }
+
 ##
 # @IothreadProperties:
 #
@@ -691,6 +746,8 @@
 'filter-redirector',
 'filter-replay',
 'filter-rewriter',
+'input-barrier',
+'input-linux',
 'iothread',
 'memory-backend-file',
 'memory-backend-memfd',
@@ -744,6 +801,8 @@
   'filter-redirector':  'FilterRedirectorProperties',
   'filter-replay':  'NetfilterProperties',
   'filter-rewriter':'FilterRewriterProperties',
+  'input-barrier':  'InputBarrierProperties',
+  'input-linux':'InputLinuxProperties',
   'iothread':   'IothreadProperties',
   'memory-backend-file':'MemoryBackendFileProperties',
   'memory-backend-memfd':   { 'type': 'MemoryBackendMemfdProperties',
diff --git a/qapi/ui.json b/qapi/ui.json
index d08d72b439..cc1882108b 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -6,6 +6,7 @@
 # = Remote desktop
 ##
 
+{ 'include': 'common.json' }
 { 'include': 'sockets.json' }
 
 ##
@@ -1021,18 +1022,6 @@
 '*head'  : 'int',
 'events' : [ 'InputEvent' ] } }
 
-##
-# @GrabToggleKeys:
-#
-# Keys to toggle input-linux between host and guest.
-#
-# Since: 4.0
-#
-##
-{ 'enum': 'GrabToggleKeys',
-  'data': [ 'ctrl-ctrl', 'alt-alt', 'shift-shift','meta-meta', 'scrolllock',
-'ctrl-scrolllock' ] }
-
 ##
 # @DisplayGTK:
 #
-- 
2.29.2
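
A QAPI enum such as GrabToggleKeys is, in effect, a fixed set of strings
with a C enum behind it. The sketch below shows that idea with a lookup
table and a parse helper. The TOY_* identifiers and toy_grab_parse() are
invented for this example and do not match the generated QEMU names; only
the six member strings and their order come from the schema above.

```c
#include <assert.h>
#include <string.h>

/* Enum values in the order the schema lists them. */
typedef enum ToyGrabToggleKeys {
    TOY_GRAB_CTRL_CTRL,
    TOY_GRAB_ALT_ALT,
    TOY_GRAB_SHIFT_SHIFT,
    TOY_GRAB_META_META,
    TOY_GRAB_SCROLLLOCK,
    TOY_GRAB_CTRL_SCROLLLOCK,
    TOY_GRAB__MAX
} ToyGrabToggleKeys;

/* String form of each member, indexed by enum value. */
static const char *const toy_grab_names[TOY_GRAB__MAX] = {
    [TOY_GRAB_CTRL_CTRL]       = "ctrl-ctrl",
    [TOY_GRAB_ALT_ALT]         = "alt-alt",
    [TOY_GRAB_SHIFT_SHIFT]     = "shift-shift",
    [TOY_GRAB_META_META]       = "meta-meta",
    [TOY_GRAB_SCROLLLOCK]      = "scrolllock",
    [TOY_GRAB_CTRL_SCROLLLOCK] = "ctrl-scrolllock",
};

/* Return the enum value for @name, or -1 if it is not a member. */
static int toy_grab_parse(const char *name)
{
    for (int i = 0; i < TOY_GRAB__MAX; i++) {
        if (strcmp(toy_grab_names[i], name) == 0) {
            return i;
        }
    }
    return -1;
}
```

Rejecting unknown strings at parse time is what "enforcing the schema"
amounts to for an enum-typed property like grab-toggle.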




[PATCH v3 24/30] qemu-nbd: Use user_creatable_process_cmdline() for --object

2021-03-08 Thread Kevin Wolf
This switches qemu-nbd from a QemuOpts-based parser for --object to
user_creatable_process_cmdline() which uses a keyval parser and enforces
the QAPI schema.

Apart from being a cleanup, this makes non-scalar properties accessible.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qemu-nbd.c | 34 +++---
 1 file changed, 3 insertions(+), 31 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index b1b9430a8f..93ef4e288f 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -401,24 +401,6 @@ static QemuOptsList file_opts = {
 },
 };
 
-static QemuOptsList qemu_object_opts = {
-.name = "object",
-.implied_opt_name = "qom-type",
-.head = QTAILQ_HEAD_INITIALIZER(qemu_object_opts.head),
-.desc = {
-{ }
-},
-};
-
-static bool qemu_nbd_object_print_help(const char *type, QemuOpts *opts)
-{
-if (user_creatable_print_help(type, opts)) {
-exit(0);
-}
-return true;
-}
-
-
 static QCryptoTLSCreds *nbd_get_tls_creds(const char *id, bool list,
   Error **errp)
 {
@@ -594,7 +576,6 @@ int main(int argc, char **argv)
 qcrypto_init(&error_fatal);
 
 module_call_init(MODULE_INIT_QOM);
-qemu_add_opts(&qemu_object_opts);
 qemu_add_opts(&qemu_trace_opts);
 qemu_init_exec_dir(argv[0]);
 
@@ -747,14 +728,9 @@ int main(int argc, char **argv)
 case '?':
 error_report("Try `%s --help' for more information.", argv[0]);
 exit(EXIT_FAILURE);
-case QEMU_NBD_OPT_OBJECT: {
-QemuOpts *opts;
-opts = qemu_opts_parse_noisily(&qemu_object_opts,
-   optarg, true);
-if (!opts) {
-exit(EXIT_FAILURE);
-}
-}   break;
+case QEMU_NBD_OPT_OBJECT:
+user_creatable_process_cmdline(optarg);
+break;
 case QEMU_NBD_OPT_TLSCREDS:
 tlscredsid = optarg;
 break;
@@ -802,10 +778,6 @@ int main(int argc, char **argv)
 export_name = "";
 }
 
-qemu_opts_foreach(&qemu_object_opts,
-  user_creatable_add_opts_foreach,
-  qemu_nbd_object_print_help, &error_fatal);
-
 if (!trace_init_backends()) {
 exit(1);
 }
-- 
2.29.2
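
The commit message above mentions a keyval parser with an implied key.
The following toy parser illustrates what "implied key" means: a first
element without '=' is stored under a key chosen by the caller (here
"qom-type"). It is a sketch under simplifying assumptions, not QEMU's
keyval implementation: it knows nothing about ",," escapes, dotted keys
or lists, and all toy_* names and size limits are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_PAIRS 8
#define TOY_MAX_LEN   64

struct toy_pair {
    char key[TOY_MAX_LEN];
    char value[TOY_MAX_LEN];
};

/*
 * Split "a,b=c,d=e" into pairs. A first element without '=' is stored
 * under @implied_key. Returns the number of pairs, or -1 on malformed
 * or oversized input.
 */
static int toy_keyval_parse(const char *str, const char *implied_key,
                            struct toy_pair *out, int max_pairs)
{
    int n = 0;

    while (*str) {
        size_t len = strcspn(str, ",");          /* length of this element */
        const char *eq = memchr(str, '=', len);

        if (n == max_pairs) {
            return -1;
        }
        if (eq) {
            size_t klen = (size_t)(eq - str);
            size_t vlen = len - klen - 1;

            if (klen == 0 || klen >= TOY_MAX_LEN || vlen >= TOY_MAX_LEN) {
                return -1;
            }
            memcpy(out[n].key, str, klen);
            out[n].key[klen] = '\0';
            memcpy(out[n].value, eq + 1, vlen);
            out[n].value[vlen] = '\0';
        } else if (n == 0 && len > 0 && len < TOY_MAX_LEN) {
            /* Bare first element: use the implied key. */
            snprintf(out[n].key, TOY_MAX_LEN, "%s", implied_key);
            memcpy(out[n].value, str, len);
            out[n].value[len] = '\0';
        } else {
            return -1;          /* bare element anywhere but first */
        }
        n++;
        str += len;
        if (*str == ',') {
            str++;
        }
    }
    return n;
}
```

With this, "secret,id=sec0,data=abc" and
"qom-type=secret,id=sec0,data=abc" yield the same three pairs, while a
bare element in a later position is rejected.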




[PATCH v3 27/30] hmp: QAPIfy object_add

2021-03-08 Thread Kevin Wolf
This switches the HMP command object_add from a QemuOpts-based parser to
user_creatable_add_from_str() which uses a keyval parser and enforces
the QAPI schema.

Apart from being a cleanup, this makes non-scalar properties and help
accessible. In order for help to be printed to the monitor instead of
stdout, the printf() calls in the help functions are changed to
qemu_printf().

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
Reviewed-by: Dr. David Alan Gilbert 
---
 monitor/hmp-cmds.c  | 17 ++---
 qom/object_interfaces.c | 11 ++-
 hmp-commands.hx |  2 +-
 3 files changed, 9 insertions(+), 21 deletions(-)

diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 3c88a4faef..652cf9ff21 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1670,24 +1670,11 @@ void hmp_netdev_del(Monitor *mon, const QDict *qdict)
 
 void hmp_object_add(Monitor *mon, const QDict *qdict)
 {
+const char *options = qdict_get_str(qdict, "object");
 Error *err = NULL;
-QemuOpts *opts;
-Object *obj = NULL;
-
-opts = qemu_opts_from_qdict(qemu_find_opts("object"), qdict, &err);
-if (err) {
-goto end;
-}
 
-obj = user_creatable_add_opts(opts, &err);
-qemu_opts_del(opts);
-
-end:
+user_creatable_add_from_str(options, &err);
 hmp_handle_error(mon, err);
-
-if (obj) {
-object_unref(obj);
-}
 }
 
 void hmp_getfd(Monitor *mon, const QDict *qdict)
diff --git a/qom/object_interfaces.c b/qom/object_interfaces.c
index bf9f8cd2c6..6dcab60f09 100644
--- a/qom/object_interfaces.c
+++ b/qom/object_interfaces.c
@@ -14,6 +14,7 @@
 #include "qemu/id.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
+#include "qemu/qemu-print.h"
 #include "qapi/opts-visitor.h"
 #include "qemu/config-file.h"
 
@@ -221,11 +222,11 @@ static void user_creatable_print_types(void)
 {
 GSList *l, *list;
 
-printf("List of user creatable objects:\n");
+qemu_printf("List of user creatable objects:\n");
 list = object_class_get_list_sorted(TYPE_USER_CREATABLE, false);
 for (l = list; l != NULL; l = l->next) {
 ObjectClass *oc = OBJECT_CLASS(l->data);
-printf("  %s\n", object_class_get_name(oc));
+qemu_printf("  %s\n", object_class_get_name(oc));
 }
 g_slist_free(list);
 }
@@ -256,12 +257,12 @@ static bool user_creatable_print_type_properites(const 
char *type)
 }
 g_ptr_array_sort(array, (GCompareFunc)qemu_pstrcmp0);
 if (array->len > 0) {
-printf("%s options:\n", type);
+qemu_printf("%s options:\n", type);
 } else {
-printf("There are no options for %s.\n", type);
+qemu_printf("There are no options for %s.\n", type);
 }
 for (i = 0; i < array->len; i++) {
-printf("%s\n", (char *)array->pdata[i]);
+qemu_printf("%s\n", (char *)array->pdata[i]);
 }
 g_ptr_array_set_free_func(array, g_free);
 g_ptr_array_free(array, true);
diff --git a/hmp-commands.hx b/hmp-commands.hx
index d4001f9c5d..6f5d9ce2fb 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1337,7 +1337,7 @@ ERST
 
 {
 .name   = "object_add",
-.args_type  = "object:O",
+.args_type  = "object:S",
 .params = "[qom-type=]type,id=str[,prop=value][,...]",
 .help   = "create QOM object",
 .cmd= hmp_object_add,
-- 
2.29.2
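
The reason for switching the help functions from printf() to
qemu_printf() can be shown with a small model: a print helper that
writes to the current monitor when a monitor command is running and to
stdout otherwise. This is a sketch of the idea only. ToyMonitor,
toy_cur_mon and toy_printf() are hypothetical and do not reproduce
QEMU's monitor internals.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical monitor: just a text buffer that collects output. */
typedef struct ToyMonitor {
    char buf[256];
    size_t len;
} ToyMonitor;

/* The monitor currently executing a command, or NULL outside of one. */
static ToyMonitor *toy_cur_mon;

/*
 * printf()-like helper: append to the current monitor if there is one,
 * otherwise fall back to stdout. Output that does not fit into the
 * monitor buffer is truncated.
 */
static int toy_printf(const char *fmt, ...)
{
    va_list ap;
    int ret;

    va_start(ap, fmt);
    if (toy_cur_mon) {
        size_t room = sizeof(toy_cur_mon->buf) - toy_cur_mon->len;

        ret = vsnprintf(toy_cur_mon->buf + toy_cur_mon->len, room, fmt, ap);
        if (ret > 0) {
            /* vsnprintf() stores at most room - 1 characters plus NUL. */
            size_t written = (size_t)ret < room ? (size_t)ret : room - 1;

            toy_cur_mon->len += written;
        }
    } else {
        ret = vprintf(fmt, ap);
    }
    va_end(ap);
    return ret;
}
```

A help function written against toy_printf() works unchanged for both
the command line (stdout) and an HMP-style caller (monitor buffer),
which is the property the patch needs.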




[PATCH v3 29/30] vl: QAPIfy -object

2021-03-08 Thread Kevin Wolf
This switches the system emulator from a QemuOpts-based parser for
-object to user_creatable_parse_str() which uses a keyval parser and
enforces the QAPI schema.

Apart from being a cleanup, this makes non-scalar properties accessible.

This adopts a similar model as -blockdev uses: When parsing the option,
create the ObjectOptions and queue them. At the later point where we
used to create objects for the collected QemuOpts, the ObjectOptions
queue is processed instead.

A complication compared to -blockdev is that object definitions are
supported in -readconfig and -writeconfig.

After this patch, -readconfig still works, though it still goes through
the QemuOpts parser, which means that improvements like non-scalar
properties are still not available in config files.

-writeconfig stops working for -object. Tough luck. It has never
supported all options (not even the common ones), so supporting one less
isn't the end of the world. As object definitions from -readconfig still
go through QemuOpts, they are still included in -writeconfig output,
which at least prevents destroying your existing configuration when you
just wanted to add another option.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 softmmu/vl.c | 109 +++
 1 file changed, 84 insertions(+), 25 deletions(-)

diff --git a/softmmu/vl.c b/softmmu/vl.c
index 10bd8a10a3..deb061cc78 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -113,6 +113,7 @@
 #include "sysemu/replay.h"
 #include "qapi/qapi-events-run-state.h"
 #include "qapi/qapi-visit-block-core.h"
+#include "qapi/qapi-visit-qom.h"
 #include "qapi/qapi-visit-ui.h"
 #include "qapi/qapi-commands-block-core.h"
 #include "qapi/qapi-commands-migration.h"
@@ -132,6 +133,14 @@ typedef struct BlockdevOptionsQueueEntry {
 
 typedef QSIMPLEQ_HEAD(, BlockdevOptionsQueueEntry) BlockdevOptionsQueue;
 
+typedef struct ObjectOptionsQueueEntry {
+ObjectOptions *options;
+Location loc;
+QTAILQ_ENTRY(ObjectOptionsQueueEntry) next;
+} ObjectOptionsQueueEntry;
+
+typedef QTAILQ_HEAD(, ObjectOptionsQueueEntry) ObjectOptionsQueue;
+
 static const char *cpu_option;
 static const char *mem_path;
 static const char *incoming;
@@ -143,6 +152,7 @@ static int snapshot;
 static bool preconfig_requested;
 static QemuPluginList plugin_list = QTAILQ_HEAD_INITIALIZER(plugin_list);
 static BlockdevOptionsQueue bdo_queue = QSIMPLEQ_HEAD_INITIALIZER(bdo_queue);
+static ObjectOptionsQueue obj_queue = QTAILQ_HEAD_INITIALIZER(obj_queue);
 static bool nographic = false;
 static int mem_prealloc; /* force preallocation of physical target memory */
 static ram_addr_t ram_size;
@@ -1691,12 +1701,9 @@ static int machine_set_property(void *opaque,
  * cannot be created here, as it depends on the chardev
  * already existing.
  */
-static bool object_create_early(const char *type, QemuOpts *opts)
+static bool object_create_early(ObjectOptions *options)
 {
-if (user_creatable_print_help(type, opts)) {
-exit(0);
-}
-
+const char *type = ObjectType_str(options->qom_type);
 /*
  * Objects should not be made "delayed" without a reason.  If you
  * add one, state the reason in a comment!
@@ -1744,6 +1751,56 @@ static bool object_create_early(const char *type, 
QemuOpts *opts)
 return true;
 }
 
+static void object_queue_create(bool early)
+{
+ObjectOptionsQueueEntry *entry, *next;
+
+QTAILQ_FOREACH_SAFE(entry, &obj_queue, next, next) {
+if (early != object_create_early(entry->options)) {
+continue;
+}
+QTAILQ_REMOVE(&obj_queue, entry, next);
+loc_push_restore(&entry->loc);
+user_creatable_add_qapi(entry->options, &error_fatal);
+loc_pop(&entry->loc);
+qapi_free_ObjectOptions(entry->options);
+g_free(entry);
+}
+}
+
+/*
+ * -readconfig still parses things into QemuOpts. Convert any such
+ *  configurations to an ObjectOptionsQueueEntry.
+ *
+ *  This is more restricted than the normal -object parser because QemuOpts
+ *  parsed things, so no support for non-scalar properties. Help is also not
+ *  supported (but this shouldn't be requested in a config file anyway).
+ */
+static int object_readconfig_to_qapi(void *opaque, QemuOpts *opts, Error 
**errp)
+{
+ERRP_GUARD();
+ObjectOptionsQueueEntry *entry;
+ObjectOptions *options;
+QDict *args = qemu_opts_to_qdict(opts, NULL);
+Visitor *v;
+
+v = qobject_input_visitor_new_keyval(QOBJECT(args));
+visit_type_ObjectOptions(v, NULL, &options, errp);
+visit_free(v);
+qobject_unref(args);
+
+if (*errp) {
+return -1;
+}
+
+entry = g_new0(ObjectOptionsQueueEntry, 1);
+entry->options = options;
+loc_save(&entry->loc);
+QTAILQ_INSERT_TAIL(&obj_queue, entry, next);
+
+return 0;
+}
+
 static void qemu_apply_machine_options(void)
 {
 MachineClass *machine_class = MACHINE_GET_CLASS(current_machine);
@@ -1816,8 +1873,8 @@ static void qemu_create_early_backends(void)
 }
 

[PATCH v3 25/30] qom: Add user_creatable_add_from_str()

2021-03-08 Thread Kevin Wolf
This is a version of user_creatable_process_cmdline() with an Error
parameter that never calls exit() and is therefore usable in HMP.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 include/qom/object_interfaces.h | 16 
 qom/object_interfaces.c | 29 -
 2 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/include/qom/object_interfaces.h b/include/qom/object_interfaces.h
index 1e6c51b541..07511e6cff 100644
--- a/include/qom/object_interfaces.h
+++ b/include/qom/object_interfaces.h
@@ -144,6 +144,22 @@ typedef bool (*user_creatable_add_opts_predicate)(const char *type);
 int user_creatable_add_opts_foreach(void *opaque,
 QemuOpts *opts, Error **errp);
 
+/**
+ * user_creatable_add_from_str:
+ * @optarg: the object definition string as passed on the command line
+ * @errp: if an error occurs, a pointer to an area to store the error
+ *
+ * Create an instance of the user creatable object by parsing optarg
+ * with a keyval parser and implicit key 'qom-type', converting the
+ * result to ObjectOptions and calling into qmp_object_add().
+ *
+ * If a help option is given, print help instead.
+ *
+ * Returns: true when an object was successfully created, false when an error
+ * occurred (*errp is set then) or help was printed (*errp is not set).
+ */
+bool user_creatable_add_from_str(const char *optarg, Error **errp);
+
 /**
  * user_creatable_process_cmdline:
  * @optarg: the object definition string as passed on the command line
diff --git a/qom/object_interfaces.c b/qom/object_interfaces.c
index 2eaf9971f5..bf9f8cd2c6 100644
--- a/qom/object_interfaces.c
+++ b/qom/object_interfaces.c
@@ -291,26 +291,45 @@ static void user_creatable_print_help_from_qdict(QDict *args)
 }
 }
 
-void user_creatable_process_cmdline(const char *optarg)
+bool user_creatable_add_from_str(const char *optarg, Error **errp)
 {
+ERRP_GUARD();
 QDict *args;
 bool help;
 Visitor *v;
 ObjectOptions *options;
 
-args = keyval_parse(optarg, "qom-type", &help, &error_fatal);
+args = keyval_parse(optarg, "qom-type", &help, errp);
+if (*errp) {
+return false;
+}
 if (help) {
 user_creatable_print_help_from_qdict(args);
-exit(EXIT_SUCCESS);
+qobject_unref(args);
+return false;
 }
 
 v = qobject_input_visitor_new_keyval(QOBJECT(args));
-visit_type_ObjectOptions(v, NULL, &options, &error_fatal);
+visit_type_ObjectOptions(v, NULL, &options, errp);
 visit_free(v);
 qobject_unref(args);
 
-user_creatable_add_qapi(options, &error_fatal);
+if (*errp) {
+goto out;
+}
+
+user_creatable_add_qapi(options, errp);
+out:
 qapi_free_ObjectOptions(options);
+return !*errp;
+}
+
+void user_creatable_process_cmdline(const char *optarg)
+{
+if (!user_creatable_add_from_str(optarg, &error_fatal)) {
+/* Help was printed */
+exit(EXIT_SUCCESS);
+}
 }
 
 bool user_creatable_del(const char *id, Error **errp)
-- 
2.29.2
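
The return contract documented above (true on success, false with *errp set on error, false with *errp unset after printing help) can be illustrated with a small standalone sketch. This is not QEMU code: DemoError, DemoOutcome, demo_add_from_str() and demo_classify() are hypothetical names, the error object is reduced to a message buffer, and the "parser" only checks for the literal strings "help" and "id=".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object: a plain message buffer. */
typedef struct DemoError {
    char msg[128];
} DemoError;

/* The three outcomes a caller can tell apart. */
typedef enum DemoOutcome {
    DEMO_CREATED,
    DEMO_HELP_PRINTED,
    DEMO_FAILED,
} DemoOutcome;

/*
 * Never exits: reports failure through @err and lets the caller decide.
 * Returns true on success; false on error (err->msg set) or after
 * printing help (err->msg left empty).
 */
bool demo_add_from_str(const char *optarg, DemoError *err)
{
    assert(optarg != NULL && err != NULL);
    err->msg[0] = '\0';

    if (strcmp(optarg, "help") == 0) {
        puts("demo: a list of object types would be printed here");
        return false;
    }
    if (strstr(optarg, "id=") == NULL) {
        snprintf(err->msg, sizeof(err->msg), "Parameter 'id' is missing");
        return false;
    }
    return true;
}

/* A caller distinguishes "help" from "error" by inspecting the error state. */
DemoOutcome demo_classify(const char *optarg, DemoError *err)
{
    if (demo_add_from_str(optarg, err)) {
        return DEMO_CREATED;
    }
    return err->msg[0] ? DEMO_FAILED : DEMO_HELP_PRINTED;
}
```

A command-line wrapper in this style would exit successfully on DEMO_HELP_PRINTED and report the message on DEMO_FAILED, while a monitor command would simply return in both cases.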




[PATCH v3 21/30] qom: Remove user_creatable_add_dict()

2021-03-08 Thread Kevin Wolf
This function is now unused and can be removed.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 include/qom/object_interfaces.h | 18 --
 qom/object_interfaces.c | 32 
 2 files changed, 50 deletions(-)

diff --git a/include/qom/object_interfaces.h b/include/qom/object_interfaces.h
index 9b9938b8c0..5299603f50 100644
--- a/include/qom/object_interfaces.h
+++ b/include/qom/object_interfaces.h
@@ -86,24 +86,6 @@ Object *user_creatable_add_type(const char *type, const char *id,
 const QDict *qdict,
 Visitor *v, Error **errp);
 
-/**
- * user_creatable_add_dict:
- * @qdict: the object definition
- * @keyval: if true, use a keyval visitor for processing @qdict (i.e.
- *  assume that all @qdict values are strings); otherwise, use
- *  the normal QObject visitor (i.e. assume all @qdict values
- *  have the QType expected by the QOM object type)
- * @errp: if an error occurs, a pointer to an area to store the error
- *
- * Create an instance of the user creatable object that is defined by
- * @qdict.  The object type is taken from the QDict key 'qom-type', its
- * ID from the key 'id'. The remaining entries in @qdict are used to
- * initialize the object properties.
- *
- * Returns: %true on success, %false on failure.
- */
-bool user_creatable_add_dict(QDict *qdict, bool keyval, Error **errp);
-
 /**
  * user_creatable_add_opts:
  * @opts: the object definition
diff --git a/qom/object_interfaces.c b/qom/object_interfaces.c
index d4df2334b7..02c3934329 100644
--- a/qom/object_interfaces.c
+++ b/qom/object_interfaces.c
@@ -113,38 +113,6 @@ out:
 return obj;
 }
 
-bool user_creatable_add_dict(QDict *qdict, bool keyval, Error **errp)
-{
-Visitor *v;
-Object *obj;
-g_autofree char *type = NULL;
-g_autofree char *id = NULL;
-
-type = g_strdup(qdict_get_try_str(qdict, "qom-type"));
-if (!type) {
-error_setg(errp, QERR_MISSING_PARAMETER, "qom-type");
-return false;
-}
-qdict_del(qdict, "qom-type");
-
-id = g_strdup(qdict_get_try_str(qdict, "id"));
-if (!id) {
-error_setg(errp, QERR_MISSING_PARAMETER, "id");
-return false;
-}
-qdict_del(qdict, "id");
-
-if (keyval) {
-v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
-} else {
-v = qobject_input_visitor_new(QOBJECT(qdict));
-}
-obj = user_creatable_add_type(type, id, qdict, v, errp);
-visit_free(v);
-object_unref(obj);
-return !!obj;
-}
-
 Object *user_creatable_add_opts(QemuOpts *opts, Error **errp)
 {
 Visitor *v;
-- 
2.29.2




[PATCH v3 20/30] qemu-storage-daemon: Implement --object with qmp_object_add()

2021-03-08 Thread Kevin Wolf
This QAPIfies --object and ensures that QMP and the command line option
behave the same.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 storage-daemon/qemu-storage-daemon.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/storage-daemon/qemu-storage-daemon.c b/storage-daemon/qemu-storage-daemon.c
index a1bcbacf05..4ab7e73053 100644
--- a/storage-daemon/qemu-storage-daemon.c
+++ b/storage-daemon/qemu-storage-daemon.c
@@ -38,6 +38,7 @@
 #include "qapi/qapi-visit-block-core.h"
 #include "qapi/qapi-visit-block-export.h"
 #include "qapi/qapi-visit-control.h"
+#include "qapi/qapi-visit-qom.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
 #include "qapi/qobject-input-visitor.h"
@@ -134,15 +135,6 @@ enum {
 
 extern QemuOptsList qemu_chardev_opts;
 
-static QemuOptsList qemu_object_opts = {
-.name = "object",
-.implied_opt_name = "qom-type",
-.head = QTAILQ_HEAD_INITIALIZER(qemu_object_opts.head),
-.desc = {
-{ }
-},
-};
-
 static void init_qmp_commands(void)
 {
 qmp_init_marshal(&qmp_commands);
@@ -282,14 +274,22 @@ static void process_options(int argc, char *argv[])
 {
 QDict *args;
 bool help;
+Visitor *v;
+ObjectOptions *options;
 
 args = keyval_parse(optarg, "qom-type", &help, &error_fatal);
 if (help) {
 user_creatable_print_help_from_qdict(args);
 exit(EXIT_SUCCESS);
 }
-user_creatable_add_dict(args, true, _fatal);
+
+v = qobject_input_visitor_new_keyval(QOBJECT(args));
+visit_type_ObjectOptions(v, NULL, &options, &error_fatal);
+visit_free(v);
 qobject_unref(args);
+
+qmp_object_add(options, &error_fatal);
+qapi_free_ObjectOptions(options);
 break;
 }
 case OPTION_PIDFILE:
@@ -338,7 +338,6 @@ int main(int argc, char *argv[])
 
 module_call_init(MODULE_INIT_QOM);
 module_call_init(MODULE_INIT_TRACE);
-qemu_add_opts(&qemu_object_opts);
 qemu_add_opts(&qemu_trace_opts);
 qcrypto_init(&error_fatal);
 bdrv_init();
-- 
2.29.2




[PATCH v3 18/30] qapi/qom: QAPIfy object-add

2021-03-08 Thread Kevin Wolf
This converts object-add from 'gen': false to the ObjectOptions QAPI
type. As an immediate benefit, clients can now use QAPI schema
introspection for user creatable QOM objects.

It is also the first step towards making the QAPI schema the only
external interface for the creation of user creatable objects. Once all
other places (HMP and command lines of the system emulator and all
tools) go through QAPI, too, some object implementations can be
simplified because some checks (e.g. that mandatory options are set) are
already performed by QAPI, and in another step, QOM boilerplate code
could be generated from the schema.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qapi/qom.json| 11 +--
 include/qom/object_interfaces.h  |  7 ---
 hw/block/xen-block.c | 16 
 monitor/misc.c   |  2 --
 qom/qom-qmp-cmds.c   | 25 +++--
 storage-daemon/qemu-storage-daemon.c |  2 --
 6 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 0fd8563693..5b8a5da16f 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -844,13 +844,6 @@
 #
 # Create a QOM object.
 #
-# @qom-type: the class name for the object to be created
-#
-# @id: the name of the new object
-#
-# Additional arguments depend on qom-type and are passed to the backend
-# unchanged.
-#
 # Returns: Nothing on success
 #  Error if @qom-type is not a valid class name
 #
@@ -864,9 +857,7 @@
 # <- { "return": {} }
 #
 ##
-{ 'command': 'object-add',
-  'data': {'qom-type': 'str', 'id': 'str'},
-  'gen': false } # so we can get the additional arguments
+{ 'command': 'object-add', 'data': 'ObjectOptions', 'boxed': true }
 
 ##
 # @object-del:
diff --git a/include/qom/object_interfaces.h b/include/qom/object_interfaces.h
index 07d5cc8832..9b9938b8c0 100644
--- a/include/qom/object_interfaces.h
+++ b/include/qom/object_interfaces.h
@@ -196,11 +196,4 @@ bool user_creatable_del(const char *id, Error **errp);
  */
 void user_creatable_cleanup(void);
 
-/**
- * qmp_object_add:
- *
- * QMP command handler for object-add. See the QAPI schema for documentation.
- */
-void qmp_object_add(QDict *qdict, QObject **ret_data, Error **errp);
-
 #endif
diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index a3b69e2709..ac82d54063 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -836,17 +836,17 @@ static XenBlockIOThread *xen_block_iothread_create(const char *id,
 {
 ERRP_GUARD();
 XenBlockIOThread *iothread = g_new(XenBlockIOThread, 1);
-QDict *opts;
-QObject *ret_data = NULL;
+ObjectOptions *opts;
 
 iothread->id = g_strdup(id);
 
-opts = qdict_new();
-qdict_put_str(opts, "qom-type", TYPE_IOTHREAD);
-qdict_put_str(opts, "id", id);
-qmp_object_add(opts, &ret_data, errp);
-qobject_unref(opts);
-qobject_unref(ret_data);
+opts = g_new(ObjectOptions, 1);
+*opts = (ObjectOptions) {
+.qom_type = OBJECT_TYPE_IOTHREAD,
+.id = g_strdup(id),
+};
+qmp_object_add(opts, errp);
+qapi_free_ObjectOptions(opts);
 
 if (*errp) {
 g_free(iothread->id);
diff --git a/monitor/misc.c b/monitor/misc.c
index a7650ed747..42efd9e2ab 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -235,8 +235,6 @@ static void monitor_init_qmp_commands(void)
  qmp_query_qmp_schema, QCO_ALLOW_PRECONFIG);
 qmp_register_command(&qmp_commands, "device_add", qmp_device_add,
  QCO_NO_OPTIONS);
-qmp_register_command(&qmp_commands, "object-add", qmp_object_add,
- QCO_NO_OPTIONS);
 
 QTAILQ_INIT(&qmp_cap_negotiation_commands);
 qmp_register_command(&qmp_cap_negotiation_commands, "qmp_capabilities",
diff --git a/qom/qom-qmp-cmds.c b/qom/qom-qmp-cmds.c
index 19fd5e117f..e577a96adf 100644
--- a/qom/qom-qmp-cmds.c
+++ b/qom/qom-qmp-cmds.c
@@ -19,8 +19,11 @@
 #include "qapi/error.h"
 #include "qapi/qapi-commands-qdev.h"
 #include "qapi/qapi-commands-qom.h"
+#include "qapi/qapi-visit-qom.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qerror.h"
+#include "qapi/qobject-input-visitor.h"
+#include "qapi/qobject-output-visitor.h"
 #include "qemu/cutils.h"
 #include "qom/object_interfaces.h"
 #include "qom/qom-qobject.h"
@@ -223,9 +226,27 @@ ObjectPropertyInfoList *qmp_qom_list_properties(const char *typename,
 return prop_list;
 }
 
-void qmp_object_add(QDict *qdict, QObject **ret_data, Error **errp)
+void qmp_object_add(ObjectOptions *options, Error **errp)
 {
-user_creatable_add_dict(qdict, false, errp);
+Visitor *v;
+QObject *qobj;
+QDict *props;
+Object *obj;
+
+v = qobject_output_visitor_new();
+visit_type_ObjectOptions(v, NULL, &options, &error_abort);
+visit_complete(v, &qobj);
+visit_free(v);
+
+props = qobject_to(QDict, qobj);
+qdict_del(props, "qom-type");
+qdict_del(props, "id");
+
+v = 

[PATCH v3 28/30] qom: Add user_creatable_parse_str()

2021-03-08 Thread Kevin Wolf
The system emulator has a more complicated way of handling command line
options in that it reorders options before it processes them. This means
that parsing object options and creating the object happen at two
different points. Split the parsing part into a separate function that
can be reused by the system emulator command line.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 include/qom/object_interfaces.h | 15 +++
 qom/object_interfaces.c | 20 ++--
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/include/qom/object_interfaces.h b/include/qom/object_interfaces.h
index 07511e6cff..fb32330901 100644
--- a/include/qom/object_interfaces.h
+++ b/include/qom/object_interfaces.h
@@ -144,6 +144,21 @@ typedef bool (*user_creatable_add_opts_predicate)(const char *type);
 int user_creatable_add_opts_foreach(void *opaque,
 QemuOpts *opts, Error **errp);
 
+/**
+ * user_creatable_parse_str:
+ * @optarg: the object definition string as passed on the command line
+ * @errp: if an error occurs, a pointer to an area to store the error
+ *
+ * Parses the option for the user creatable object with a keyval parser and
+ * implicit key 'qom-type', converting the result to ObjectOptions.
+ *
+ * If a help option is given, print help instead.
+ *
+ * Returns: ObjectOptions on success, NULL when an error occurred (*errp is set
+ * then) or help was printed (*errp is not set).
+ */
+ObjectOptions *user_creatable_parse_str(const char *optarg, Error **errp);
+
 /**
  * user_creatable_add_from_str:
  * @optarg: the object definition string as passed on the command line
diff --git a/qom/object_interfaces.c b/qom/object_interfaces.c
index 6dcab60f09..62d7db7629 100644
--- a/qom/object_interfaces.c
+++ b/qom/object_interfaces.c
@@ -292,7 +292,7 @@ static void user_creatable_print_help_from_qdict(QDict *args)
 }
 }
 
-bool user_creatable_add_from_str(const char *optarg, Error **errp)
+ObjectOptions *user_creatable_parse_str(const char *optarg, Error **errp)
 {
 ERRP_GUARD();
 QDict *args;
@@ -302,12 +302,12 @@ bool user_creatable_add_from_str(const char *optarg, Error **errp)
 
 args = keyval_parse(optarg, "qom-type", &help, errp);
 if (*errp) {
-return false;
+return NULL;
 }
 if (help) {
 user_creatable_print_help_from_qdict(args);
 qobject_unref(args);
-return false;
+return NULL;
 }
 
 v = qobject_input_visitor_new_keyval(QOBJECT(args));
@@ -315,12 +315,20 @@ bool user_creatable_add_from_str(const char *optarg, Error **errp)
 visit_free(v);
 qobject_unref(args);
 
-if (*errp) {
-goto out;
+return options;
+}
+
+bool user_creatable_add_from_str(const char *optarg, Error **errp)
+{
+ERRP_GUARD();
+ObjectOptions *options;
+
+options = user_creatable_parse_str(optarg, errp);
+if (!options) {
+return false;
 }
 
 user_creatable_add_qapi(options, errp);
-out:
 qapi_free_ObjectOptions(options);
 return !*errp;
 }
-- 
2.29.2
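
The parse/create split described in the commit message can be sketched in isolation: options are parsed into a typed structure as soon as they are seen, queued, and only turned into objects in a later pass. The following is a simplified model, not the QEMU implementation. DemoOptions, DemoEntry and all demo_* functions are hypothetical, the parser only understands "TYPE,id=ID", "creating" an object merely counts and frees the entry, and the rule that "filter-" types belong to the late pass is used purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ObjectOptions: only a type name and an id. */
typedef struct DemoOptions {
    char *type;
    char *id;
} DemoOptions;

/* Singly linked queue of parsed but not yet created objects. */
typedef struct DemoEntry {
    DemoOptions *options;
    struct DemoEntry *next;
} DemoEntry;

void demo_free_options(DemoOptions *o)
{
    if (o) {
        free(o->type);
        free(o->id);
        free(o);
    }
}

/* Step 1: parse "TYPE,id=ID"; returns NULL on malformed input. */
DemoOptions *demo_parse_str(const char *optarg)
{
    const char *sep = strstr(optarg, ",id=");
    DemoOptions *o;
    size_t type_len;

    if (!sep || sep == optarg || sep[4] == '\0') {
        return NULL;
    }
    o = calloc(1, sizeof(*o));
    if (!o) {
        return NULL;
    }
    type_len = (size_t)(sep - optarg);
    o->type = malloc(type_len + 1);
    o->id = malloc(strlen(sep + 4) + 1);
    if (!o->type || !o->id) {
        demo_free_options(o);
        return NULL;
    }
    memcpy(o->type, optarg, type_len);
    o->type[type_len] = '\0';
    strcpy(o->id, sep + 4);
    return o;
}

/* Append parsed options at the tail, preserving command-line order. */
bool demo_queue_append(DemoEntry **head, DemoOptions *options)
{
    DemoEntry *entry;

    if (!options || !(entry = calloc(1, sizeof(*entry)))) {
        return false;
    }
    entry->options = options;
    while (*head) {
        head = &(*head)->next;
    }
    *head = entry;
    return true;
}

/* Illustrative rule: "filter-" types wait for the late pass. */
bool demo_create_early(const DemoOptions *o)
{
    assert(o && o->type);
    return strncmp(o->type, "filter-", 7) != 0;
}

/* Step 2: process and remove every queued entry of the requested phase. */
int demo_queue_create(DemoEntry **head, bool early)
{
    int created = 0;

    while (*head) {
        DemoEntry *entry = *head;

        if (demo_create_early(entry->options) != early) {
            head = &entry->next;    /* leave it queued for the other pass */
            continue;
        }
        *head = entry->next;        /* unlink, then "create" and release */
        demo_free_options(entry->options);
        free(entry);
        created++;
    }
    return created;
}
```

Calling demo_queue_create() once with early set to true and once with false drains the queue in two passes, which is the reordering behaviour the split is meant to support.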




[PATCH v3 23/30] qemu-io: Use user_creatable_process_cmdline() for --object

2021-03-08 Thread Kevin Wolf
This switches qemu-io from a QemuOpts-based parser for --object to
user_creatable_process_cmdline() which uses a keyval parser and enforces
the QAPI schema.

Apart from being a cleanup, this makes non-scalar properties accessible.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qemu-io.c | 33 +++--
 1 file changed, 3 insertions(+), 30 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index ac88d8bd40..bf902302e9 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -477,23 +477,6 @@ enum {
 OPTION_IMAGE_OPTS = 257,
 };
 
-static QemuOptsList qemu_object_opts = {
-.name = "object",
-.implied_opt_name = "qom-type",
-.head = QTAILQ_HEAD_INITIALIZER(qemu_object_opts.head),
-.desc = {
-{ }
-},
-};
-
-static bool qemu_io_object_print_help(const char *type, QemuOpts *opts)
-{
-if (user_creatable_print_help(type, opts)) {
-exit(0);
-}
-return true;
-}
-
 static QemuOptsList file_opts = {
 .name = "file",
 .implied_opt_name = "file",
@@ -550,7 +533,6 @@ int main(int argc, char **argv)
 qcrypto_init(&error_fatal);
 
 module_call_init(MODULE_INIT_QOM);
-qemu_add_opts(&qemu_object_opts);
 qemu_add_opts(&qemu_trace_opts);
 bdrv_init();
 
@@ -612,14 +594,9 @@ int main(int argc, char **argv)
 case 'U':
 force_share = true;
 break;
-case OPTION_OBJECT: {
-QemuOpts *qopts;
-qopts = qemu_opts_parse_noisily(&qemu_object_opts,
-optarg, true);
-if (!qopts) {
-exit(1);
-}
-}   break;
+case OPTION_OBJECT:
+user_creatable_process_cmdline(optarg);
+break;
 case OPTION_IMAGE_OPTS:
 imageOpts = true;
 break;
@@ -644,10 +621,6 @@ int main(int argc, char **argv)
 exit(1);
 }
 
-qemu_opts_foreach(&qemu_object_opts,
-  user_creatable_add_opts_foreach,
-  qemu_io_object_print_help, &error_fatal);
-
 if (!trace_init_backends()) {
 exit(1);
 }
-- 
2.29.2




[PATCH v3 08/30] qapi/qom: Add ObjectOptions for throttle-group

2021-03-08 Thread Kevin Wolf
This adds a QAPI schema for the properties of the throttle-group object.

The only purpose of the x-* properties is to make the nested options in
'limits' available for a command line parser that doesn't support
structs. Any parser that will use the QAPI schema will support structs,
though, so they will not be needed in the schema in the future.

To keep the conversion straightforward, add them to the schema anyway.
We can then remove the options and adjust documentation, test cases etc.
in a separate patch.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qapi/block-core.json | 27 +++
 qapi/qom.json|  7 +--
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9f555d5c1d..a67fa0cc59 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2504,6 +2504,33 @@
 '*bps-write-max' : 'int', '*bps-write-max-length' : 'int',
 '*iops-size' : 'int' } }
 
+##
+# @ThrottleGroupProperties:
+#
+# Properties for throttle-group objects.
+#
+# The options starting with x- are aliases for the same key without x- in
+# the @limits object. As indicated by the x- prefix, this is not a stable
+# interface and may be removed or changed incompatibly in the future. Use
+# @limits for a supported stable interface.
+#
+# @limits: limits to apply for this throttle group
+#
+# Since: 2.11
+##
+{ 'struct': 'ThrottleGroupProperties',
+  'data': { '*limits': 'ThrottleLimits',
+'*x-iops-total' : 'int', '*x-iops-total-max' : 'int',
+'*x-iops-total-max-length' : 'int', '*x-iops-read' : 'int',
+'*x-iops-read-max' : 'int', '*x-iops-read-max-length' : 'int',
+'*x-iops-write' : 'int', '*x-iops-write-max' : 'int',
+'*x-iops-write-max-length' : 'int', '*x-bps-total' : 'int',
+'*x-bps-total-max' : 'int', '*x-bps-total-max-length' : 'int',
+'*x-bps-read' : 'int', '*x-bps-read-max' : 'int',
+'*x-bps-read-max-length' : 'int', '*x-bps-write' : 'int',
+'*x-bps-write-max' : 'int', '*x-bps-write-max-length' : 'int',
+'*x-iops-size' : 'int' } }
+
 ##
 # @block-stream:
 #
diff --git a/qapi/qom.json b/qapi/qom.json
index 6d3b8c4fe0..0721a636f9 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -5,6 +5,7 @@
 # See the COPYING file in the top-level directory.
 
 { 'include': 'authz.json' }
+{ 'include': 'block-core.json' }
 { 'include': 'common.json' }
 
 ##
@@ -449,7 +450,8 @@
 'memory-backend-ram',
 'rng-builtin',
 'rng-egd',
-'rng-random'
+'rng-random',
+'throttle-group'
   ] }
 
 ##
@@ -484,7 +486,8 @@
   'memory-backend-ram': 'MemoryBackendProperties',
   'rng-builtin':'RngProperties',
   'rng-egd':'RngEgdProperties',
-  'rng-random': 'RngRandomProperties'
+  'rng-random': 'RngRandomProperties',
+  'throttle-group': 'ThrottleGroupProperties'
   } }
 
 ##
-- 
2.29.2




[PATCH v3 14/30] qapi/qom: Add ObjectOptions for pr-manager-helper

2021-03-08 Thread Kevin Wolf
This adds a QAPI schema for the properties of the pr-manager-helper
object.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qapi/qom.json | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/qapi/qom.json b/qapi/qom.json
index 6fe775bd83..6afac9169f 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -577,6 +577,18 @@
 '*hugetlbsize': 'size',
 '*seal': 'bool' } }
 
+##
+# @PrManagerHelperProperties:
+#
+# Properties for pr-manager-helper objects.
+#
+# @path: the path to a Unix domain socket for connecting to the external helper
+#
+# Since: 2.11
+##
+{ 'struct': 'PrManagerHelperProperties',
+  'data': { 'path': 'str' } }
+
 ##
 # @RngProperties:
 #
@@ -651,6 +663,7 @@
 'memory-backend-file',
 'memory-backend-memfd',
 'memory-backend-ram',
+'pr-manager-helper',
 'rng-builtin',
 'rng-egd',
 'rng-random',
@@ -701,6 +714,7 @@
   'memory-backend-memfd':   { 'type': 'MemoryBackendMemfdProperties',
   'if': 'defined(CONFIG_LINUX)' },
   'memory-backend-ram': 'MemoryBackendProperties',
+  'pr-manager-helper':  'PrManagerHelperProperties',
   'rng-builtin':'RngProperties',
   'rng-egd':'RngEgdProperties',
   'rng-random': 'RngRandomProperties',
-- 
2.29.2




[PATCH v3 12/30] qapi/qom: Add ObjectOptions for colo-compare

2021-03-08 Thread Kevin Wolf
This adds a QAPI schema for the properties of the colo-compare object.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qapi/qom.json | 49 +
 1 file changed, 49 insertions(+)

diff --git a/qapi/qom.json b/qapi/qom.json
index fd87896bca..a34ae43cb9 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -222,6 +222,53 @@
   'data': { 'if': 'str',
 'canbus': 'str' } }
 
+##
+# @ColoCompareProperties:
+#
+# Properties for colo-compare objects.
+#
+# @primary_in: name of the character device backend to use for the primary
+#  input (incoming packets are redirected to @outdev)
+#
+# @secondary_in: name of the character device backend to use for secondary
+#input (incoming packets are only compared to the input on
+#@primary_in and then dropped)
+#
+# @outdev: name of the character device backend to use for output
+#
+# @iothread: name of the iothread to run in
+#
+# @notify_dev: name of the character device backend to be used to communicate
+#  with the remote colo-frame (only for Xen COLO)
+#
+# @compare_timeout: the maximum time to hold a packet from @primary_in for
+#   comparison with an incoming packet on @secondary_in in
+#   milliseconds (default: 3000)
+#
+# @expired_scan_cycle: the interval at which colo-compare checks whether
+#  packets from @primary have timed out, in milliseconds
+#  (default: 3000)
+#
+# @max_queue_size: the maximum number of packets to keep in the queue for
+#  comparing with incoming packets from @secondary_in.  If the
+#  queue is full and additional packets are received, the
+#  additional packets are dropped. (default: 1024)
+#
+# @vnet_hdr_support: if true, vnet header support is enabled (default: false)
+#
+# Since: 2.8
+##
+{ 'struct': 'ColoCompareProperties',
+  'data': { 'primary_in': 'str',
+'secondary_in': 'str',
+'outdev': 'str',
+'iothread': 'str',
+'*notify_dev': 'str',
+'*compare_timeout': 'uint64',
+'*expired_scan_cycle': 'uint32',
+'*max_queue_size': 'uint32',
+'*vnet_hdr_support': 'bool' } }
+
 ##
 # @CryptodevBackendProperties:
 #
@@ -458,6 +505,7 @@
 'authz-simple',
 'can-bus',
 'can-host-socketcan',
+'colo-compare',
 'cryptodev-backend',
 'cryptodev-backend-builtin',
 'cryptodev-vhost-user',
@@ -499,6 +547,7 @@
   'authz-pam':  'AuthZPAMProperties',
   'authz-simple':   'AuthZSimpleProperties',
   'can-host-socketcan': 'CanHostSocketcanProperties',
+  'colo-compare':   'ColoCompareProperties',
   'cryptodev-backend':  'CryptodevBackendProperties',
   'cryptodev-backend-builtin':  'CryptodevBackendProperties',
   'cryptodev-vhost-user':   { 'type': 'CryptodevVhostUserProperties',
-- 
2.29.2




[PATCH v3 17/30] qapi/qom: Add ObjectOptions for x-remote-object

2021-03-08 Thread Kevin Wolf
This adds a QAPI schema for the properties of the x-remote-object
object.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qapi/qom.json | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 6b96e9b0b3..0fd8563693 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -644,6 +644,20 @@
 { 'struct': 'PrManagerHelperProperties',
   'data': { 'path': 'str' } }
 
+##
+# @RemoteObjectProperties:
+#
+# Properties for x-remote-object objects.
+#
+# @fd: file descriptor name previously passed via 'getfd' command
+#
+# @devid: the id of the device to be associated with the file descriptor
+#
+# Since: 6.0
+##
+{ 'struct': 'RemoteObjectProperties',
+  'data': { 'fd': 'str', 'devid': 'str' } }
+
 ##
 # @RngProperties:
 #
@@ -765,7 +779,8 @@
 'tls-creds-anon',
 'tls-creds-psk',
 'tls-creds-x509',
-'tls-cipher-suites'
+'tls-cipher-suites',
+'x-remote-object'
   ] }
 
 ##
@@ -820,7 +835,8 @@
   'tls-creds-anon': 'TlsCredsAnonProperties',
   'tls-creds-psk':  'TlsCredsPskProperties',
   'tls-creds-x509': 'TlsCredsX509Properties',
-  'tls-cipher-suites':  'TlsCredsProperties'
+  'tls-cipher-suites':  'TlsCredsProperties',
+  'x-remote-object':'RemoteObjectProperties'
   } }
 
 ##
-- 
2.29.2




[PATCH v3 13/30] qapi/qom: Add ObjectOptions for filter-*

2021-03-08 Thread Kevin Wolf
This adds a QAPI schema for the properties of the filter-* objects.

Some parts of the interface (in particular NetfilterProperties.position)
are very unusual for QAPI, but for now just describe the existing
interface.

net.json can't be included in qom.json because the storage daemon
doesn't have it. NetFilterDirection is still required in the new object
property definitions in qom.json, so move this enum to common.json.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qapi/common.json |  20 +++
 qapi/net.json|  20 ---
 qapi/qom.json| 143 +++
 3 files changed, 163 insertions(+), 20 deletions(-)

diff --git a/qapi/common.json b/qapi/common.json
index 2dad4fadc3..b87e7f9039 100644
--- a/qapi/common.json
+++ b/qapi/common.json
@@ -165,3 +165,23 @@
 ##
 { 'enum': 'HostMemPolicy',
   'data': [ 'default', 'preferred', 'bind', 'interleave' ] }
+
+##
+# @NetFilterDirection:
+#
+# Indicates whether a netfilter is attached to a netdev's transmit queue or
+# receive queue or both.
+#
+# @all: the filter is attached both to the receive and the transmit
+#   queue of the netdev (default).
+#
+# @rx: the filter is attached to the receive queue of the netdev,
+#  where it will receive packets sent to the netdev.
+#
+# @tx: the filter is attached to the transmit queue of the netdev,
+#  where it will receive packets sent by the netdev.
+#
+# Since: 2.5
+##
+{ 'enum': 'NetFilterDirection',
+  'data': [ 'all', 'rx', 'tx' ] }
diff --git a/qapi/net.json b/qapi/net.json
index c31748c87f..af3f5b0fda 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -492,26 +492,6 @@
 'vhost-user': 'NetdevVhostUserOptions',
 'vhost-vdpa': 'NetdevVhostVDPAOptions' } }
 
-##
-# @NetFilterDirection:
-#
-# Indicates whether a netfilter is attached to a netdev's transmit queue or
-# receive queue or both.
-#
-# @all: the filter is attached both to the receive and the transmit
-#   queue of the netdev (default).
-#
-# @rx: the filter is attached to the receive queue of the netdev,
-#  where it will receive packets sent to the netdev.
-#
-# @tx: the filter is attached to the transmit queue of the netdev,
-#  where it will receive packets sent by the netdev.
-#
-# Since: 2.5
-##
-{ 'enum': 'NetFilterDirection',
-  'data': [ 'all', 'rx', 'tx' ] }
-
 ##
 # @RxState:
 #
diff --git a/qapi/qom.json b/qapi/qom.json
index a34ae43cb9..6fe775bd83 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -313,6 +313,137 @@
   'data': { 'addr': 'str' ,
 '*id-list': 'str' } }
 
+##
+# @NetfilterInsert:
+#
+# Indicates where to insert a netfilter relative to a given other filter.
+#
+# @before: insert before the specified filter
+#
+# @behind: insert behind the specified filter
+#
+# Since: 5.0
+##
+{ 'enum': 'NetfilterInsert',
+  'data': [ 'before', 'behind' ] }
+
+##
+# @NetfilterProperties:
+#
+# Properties for objects of classes derived from netfilter.
+#
+# @netdev: id of the network device backend to filter
+#
+# @queue: indicates which queue(s) to filter (default: all)
+#
+# @status: indicates whether the filter is enabled ("on") or disabled ("off")
+#  (default: "on")
+#
+# @position: specifies where the filter should be inserted in the filter list.
+#"head" means the filter is inserted at the head of the filter list,
+#before any existing filters.
+#"tail" means the filter is inserted at the tail of the filter list,
+#behind any existing filters (default).
+#"id=<id>" means the filter is inserted before or behind the filter
+#specified by <id>, depending on the @insert property.
+#(default: "tail")
+#
+# @insert: where to insert the filter relative to the filter given in @position.
+#  Ignored if @position is "head" or "tail". (default: behind)
+#
+# Since: 2.5
+##
+{ 'struct': 'NetfilterProperties',
+  'data': { 'netdev': 'str',
+'*queue': 'NetFilterDirection',
+'*status': 'str',
+'*position': 'str',
+'*insert': 'NetfilterInsert' } }
+
+##
+# @FilterBufferProperties:
+#
+# Properties for filter-buffer objects.
+#
+# @interval: a non-zero interval in microseconds.  All packets arriving in the
+#given interval are delayed until the end of the interval.
+#
+# Since: 2.5
+##
+{ 'struct': 'FilterBufferProperties',
+  'base': 'NetfilterProperties',
+  'data': { 'interval': 'uint32' } }
+
+##
+# @FilterDumpProperties:
+#
+# Properties for filter-dump objects.
+#
+# @file: the filename where the dumped packets should be stored
+#
+# @maxlen: maximum number of bytes in a packet that are stored (default: 65536)
+#
+# Since: 2.5
+##
+{ 'struct': 'FilterDumpProperties',
+  'base': 'NetfilterProperties',
+  'data': { 'file': 'str',
+'*maxlen': 'uint32' } }
+
+##
+# @FilterMirrorProperties:
+#
+# Properties for filter-mirror objects.
+#
+# @outdev: the name of a character 

[PATCH v3 11/30] qapi/qom: Add ObjectOptions for can-*

2021-03-08 Thread Kevin Wolf
This adds a QAPI schema for the properties of the can-* objects.

can-bus doesn't have any properties, so it only needs to be added to the
ObjectType enum without adding a new branch to ObjectOptions.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qapi/qom.json | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/qapi/qom.json b/qapi/qom.json
index 512d8fce12..fd87896bca 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -207,6 +207,21 @@
   'returns': [ 'ObjectPropertyInfo' ],
   'allow-preconfig': true }
 
+##
+# @CanHostSocketcanProperties:
+#
+# Properties for can-host-socketcan objects.
+#
+# @if: interface name of the host system CAN bus to connect to
+#
+# @canbus: object ID of the can-bus object to connect to the host interface
+#
+# Since: 2.12
+##
+{ 'struct': 'CanHostSocketcanProperties',
+  'data': { 'if': 'str',
+'canbus': 'str' } }
+
 ##
 # @CryptodevBackendProperties:
 #
@@ -441,6 +456,8 @@
 'authz-listfile',
 'authz-pam',
 'authz-simple',
+'can-bus',
+'can-host-socketcan',
 'cryptodev-backend',
 'cryptodev-backend-builtin',
 'cryptodev-vhost-user',
@@ -481,6 +498,7 @@
   'authz-listfile': 'AuthZListFileProperties',
   'authz-pam':  'AuthZPAMProperties',
   'authz-simple':   'AuthZSimpleProperties',
+  'can-host-socketcan': 'CanHostSocketcanProperties',
   'cryptodev-backend':  'CryptodevBackendProperties',
   'cryptodev-backend-builtin':  'CryptodevBackendProperties',
   'cryptodev-vhost-user':   { 'type': 'CryptodevVhostUserProperties',
-- 
2.29.2




[PATCH v3 10/30] qapi/qom: Add ObjectOptions for tls-*, deprecate 'loaded'

2021-03-08 Thread Kevin Wolf
This adds a QAPI schema for the properties of the tls-* objects.

The 'loaded' property doesn't seem to make sense as an external
interface: It is automatically set to true in ucc->complete, and
explicitly setting it to true earlier just means that additional options
will be silently ignored.

In other words, the 'loaded' property is useless. Mark it as deprecated
in the schema from the start.
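
To illustrate the problem described above, here is a toy Python model of an
object whose options are applied in order. It is only a sketch: ToyCreds,
set_option() and create() are names invented for this example and are not
QEMU code. It shows that setting 'loaded' before another option makes the
later option ineffective, while leaving it out gives the expected result.

```python
class ToyCreds:
    """Toy stand-in for a tls-creds object (hypothetical, not QEMU code)."""

    def __init__(self):
        self.dir = None
        self.loaded = False
        self.loaded_from = None  # value of 'dir' at the time of loading

    def set_option(self, name, value):
        if name == "loaded":
            # Setting loaded=true triggers the load right away
            if value and not self.loaded:
                self._load()
        elif name == "dir":
            self.dir = value
        else:
            raise KeyError(name)

    def _load(self):
        self.loaded_from = self.dir
        self.loaded = True

    def complete(self):
        # Completion loads the object itself if that has not happened yet
        if not self.loaded:
            self._load()


def create(options):
    """Apply options in the given order, then complete the object."""
    obj = ToyCreds()
    for name, value in options:
        obj.set_option(name, value)
    obj.complete()
    return obj


early = create([("loaded", True), ("dir", "/etc/pki/qemu")])
normal = create([("dir", "/etc/pki/qemu")])
print(early.loaded_from)   # None: 'dir' came too late to matter
print(normal.loaded_from)  # /etc/pki/qemu
```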

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qapi/crypto.json | 98 
 qapi/qom.json| 12 +-
 2 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/qapi/crypto.json b/qapi/crypto.json
index 0fef3de66d..7116ae9a46 100644
--- a/qapi/crypto.json
+++ b/qapi/crypto.json
@@ -442,3 +442,101 @@
 { 'struct': 'SecretKeyringProperties',
   'base': 'SecretCommonProperties',
   'data': { 'serial': 'int32' } }
+
+##
+# @TlsCredsProperties:
+#
+# Properties for objects of classes derived from tls-creds.
+#
+# @verify-peer: if true the peer credentials will be verified once the
+#   handshake is completed.  This is a no-op for anonymous
+#   credentials. (default: true)
+#
+# @dir: the path of the directory that contains the credential files
+#
+# @endpoint: whether the QEMU network backend that uses the credentials will be
+#acting as a client or as a server (default: client)
+#
+# @priority: a gnutls priority string as described at
+#https://gnutls.org/manual/html_node/Priority-Strings.html
+#
+# Since: 2.5
+##
+{ 'struct': 'TlsCredsProperties',
+  'data': { '*verify-peer': 'bool',
+'*dir': 'str',
+'*endpoint': 'QCryptoTLSCredsEndpoint',
+'*priority': 'str' } }
+
+##
+# @TlsCredsAnonProperties:
+#
+# Properties for tls-creds-anon objects.
+#
+# @loaded: if true, the credentials are loaded immediately when applying this
+#  option and will ignore options that are processed later. Don't use;
+#  only provided for compatibility. (default: false)
+#
+# Features:
+# @deprecated: Member @loaded is deprecated.  Setting true doesn't make sense,
+#  and false is already the default.
+#
+# Since: 2.5
+##
+{ 'struct': 'TlsCredsAnonProperties',
+  'base': 'TlsCredsProperties',
+  'data': { '*loaded': { 'type': 'bool', 'features': ['deprecated'] } } }
+
+##
+# @TlsCredsPskProperties:
+#
+# Properties for tls-creds-psk objects.
+#
+# @loaded: if true, the credentials are loaded immediately when applying this
+#  option and will ignore options that are processed later. Don't use;
+#  only provided for compatibility. (default: false)
+#
+# @username: the username which will be sent to the server.  For clients only.
+#If absent, "qemu" is sent and the property will read back as an
+#empty string.
+#
+# Features:
+# @deprecated: Member @loaded is deprecated.  Setting true doesn't make sense,
+#  and false is already the default.
+#
+# Since: 3.0
+##
+{ 'struct': 'TlsCredsPskProperties',
+  'base': 'TlsCredsProperties',
+  'data': { '*loaded': { 'type': 'bool', 'features': ['deprecated'] },
+'*username': 'str' } }
+
+##
+# @TlsCredsX509Properties:
+#
+# Properties for tls-creds-x509 objects.
+#
+# @loaded: if true, the credentials are loaded immediately when applying this
+#  option and will ignore options that are processed later. Don't use;
+#  only provided for compatibility. (default: false)
+#
+# @sanity-check: if true, perform some sanity checks before using the
+#credentials (default: true)
+#
+# @passwordid: For the server-key.pem and client-key.pem files which contain
+#  sensitive private keys, it is possible to use an encrypted
+#  version by providing the @passwordid parameter.  This provides
+#  the ID of a previously created secret object containing the
+#  password for decryption.
+#
+# Features:
+# @deprecated: Member @loaded is deprecated.  Setting true doesn't make sense,
+#  and false is already the default.
+#
+# Since: 2.5
+##
+{ 'struct': 'TlsCredsX509Properties',
+  'base': 'TlsCredsProperties',
+  'data': { '*loaded': { 'type': 'bool', 'features': ['deprecated'] },
+'*sanity-check': 'bool',
+'*passwordid': 'str' } }
diff --git a/qapi/qom.json b/qapi/qom.json
index e4bbddd986..512d8fce12 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -454,7 +454,11 @@
 'rng-random',
 'secret',
 'secret_keyring',
-'throttle-group'
+'throttle-group',
+'tls-creds-anon',
+'tls-creds-psk',
+'tls-creds-x509',
+'tls-cipher-suites'
   ] }
 
 ##
@@ -492,7 +496,11 @@
   'rng-random': 'RngRandomProperties',
   'secret': 'SecretProperties',
   'secret_keyring': 'SecretKeyringProperties',
-  'throttle-group': 'ThrottleGroupProperties'
+  

[PATCH v3 09/30] qapi/qom: Add ObjectOptions for secret*, deprecate 'loaded'

2021-03-08 Thread Kevin Wolf
This adds a QAPI schema for the properties of the secret* objects.

The 'loaded' property doesn't seem to make sense as an external
interface: It is automatically set to true in ucc->complete, and
explicitly setting it to true earlier just means that additional options
will be silently ignored.

In other words, the 'loaded' property is useless. Mark it as deprecated
in the schema from the start.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qapi/crypto.json   | 61 ++
 qapi/qom.json  |  5 
 docs/system/deprecated.rst | 11 +++
 3 files changed, 77 insertions(+)

diff --git a/qapi/crypto.json b/qapi/crypto.json
index 2aebe6fa20..0fef3de66d 100644
--- a/qapi/crypto.json
+++ b/qapi/crypto.json
@@ -381,3 +381,64 @@
   'discriminator': 'format',
   'data': {
   'luks': 'QCryptoBlockAmendOptionsLUKS' } }
+
+##
+# @SecretCommonProperties:
+#
+# Properties for objects of classes derived from secret-common.
+#
+# @loaded: if true, the secret is loaded immediately when applying this option
+#  and will probably fail when processing the next option. Don't use;
+#  only provided for compatibility. (default: false)
+#
+# @format: the data format that the secret is provided in (default: raw)
+#
+# @keyid: the name of another secret that should be used to decrypt the
+# provided data. If not present, the data is assumed to be unencrypted.
+#
+# @iv: the random initialization vector used for encryption of this particular
+#  secret. Should be a base64 encoded string of the 16-byte IV. Mandatory
+#  if @keyid is given. Ignored if @keyid is absent.
+#
+# Features:
+# @deprecated: Member @loaded is deprecated.  Setting true doesn't make sense,
+#  and false is already the default.
+#
+# Since: 2.6
+##
+{ 'struct': 'SecretCommonProperties',
+  'data': { '*loaded': { 'type': 'bool', 'features': ['deprecated'] },
+'*format': 'QCryptoSecretFormat',
+'*keyid': 'str',
+'*iv': 'str' } }
+
+##
+# @SecretProperties:
+#
+# Properties for secret objects.
+#
+# Either @data or @file must be provided, but not both.
+#
+# @data: the data associated with the secret
+#
+# @file: the filename to load the data associated with the secret from
+#
+# Since: 2.6
+##
+{ 'struct': 'SecretProperties',
+  'base': 'SecretCommonProperties',
+  'data': { '*data': 'str',
+'*file': 'str' } }
+
+##
+# @SecretKeyringProperties:
+#
+# Properties for secret_keyring objects.
+#
+# @serial: serial number that identifies a key to get from the kernel
+#
+# Since: 5.1
+##
+{ 'struct': 'SecretKeyringProperties',
+  'base': 'SecretCommonProperties',
+  'data': { 'serial': 'int32' } }
diff --git a/qapi/qom.json b/qapi/qom.json
index 0721a636f9..e4bbddd986 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -7,6 +7,7 @@
 { 'include': 'authz.json' }
 { 'include': 'block-core.json' }
 { 'include': 'common.json' }
+{ 'include': 'crypto.json' }
 
 ##
 # = QEMU Object Model (QOM)
@@ -451,6 +452,8 @@
 'rng-builtin',
 'rng-egd',
 'rng-random',
+'secret',
+'secret_keyring',
 'throttle-group'
   ] }
 
@@ -487,6 +490,8 @@
   'rng-builtin':'RngProperties',
   'rng-egd':'RngEgdProperties',
   'rng-random': 'RngRandomProperties',
+  'secret': 'SecretProperties',
+  'secret_keyring': 'SecretKeyringProperties',
   'throttle-group': 'ThrottleGroupProperties'
   } }
 
diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
index 3dac79f600..f4e8226963 100644
--- a/docs/system/deprecated.rst
+++ b/docs/system/deprecated.rst
@@ -162,6 +162,17 @@ other options have been processed.  This will either have no effect (if
 ``opened`` was the last option) or cause errors.  The property is therefore
 useless and should not be specified.
 
+``loaded`` property of ``secret`` and ``secret_keyring`` objects (since 6.0.0)
+''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+The only effect of specifying ``loaded=on`` in the command line or QMP
+``object-add`` is that the secret is loaded immediately, possibly before all
+other options have been processed.  This will either have no effect (if
+``loaded`` was the last option) or cause options to be effectively ignored as
+if they were not given.  The property is therefore useless and should not be
+specified.
+
+
 QEMU Machine Protocol (QMP) commands
 ------------------------------------
 
-- 
2.29.2




[PATCH v3 19/30] qom: Make "object" QemuOptsList optional

2021-03-08 Thread Kevin Wolf
This code is going away anyway, but for a few more commits, we'll be in
a state where some binaries still use QemuOpts and others don't. If the
"object" QemuOptsList doesn't even exist, we don't have to remove (or
fail to remove, and therefore abort) a user creatable object from it.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qom/object_interfaces.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/qom/object_interfaces.c b/qom/object_interfaces.c
index 7661270b98..d4df2334b7 100644
--- a/qom/object_interfaces.c
+++ b/qom/object_interfaces.c
@@ -299,6 +299,7 @@ void user_creatable_print_help_from_qdict(QDict *args)
 
 bool user_creatable_del(const char *id, Error **errp)
 {
+    QemuOptsList *opts_list;
     Object *container;
     Object *obj;
 
@@ -318,8 +319,10 @@ bool user_creatable_del(const char *id, Error **errp)
      * if object was defined on the command-line, remove its corresponding
      * option group entry
      */
-    qemu_opts_del(qemu_opts_find(qemu_find_opts_err("object", &error_abort),
-                                 id));
+    opts_list = qemu_find_opts_err("object", NULL);
+    if (opts_list) {
+        qemu_opts_del(qemu_opts_find(opts_list, id));
+    }
 
     object_unparent(obj);
     return true;
-- 
2.29.2




[PATCH v3 05/30] qapi/qom: Add ObjectOptions for dbus-vmstate

2021-03-08 Thread Kevin Wolf
This adds a QAPI schema for the properties of the dbus-vmstate object.

A list represented as a comma separated string is clearly not very
QAPI-like, but for now just describe the existing interface.
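
As a rough illustration of what the existing interface implies for a client,
the following Python sketch converts between a list of IDs and the comma
separated string form. The helper names, the example IDs and the bus address
are made up for this example; how QEMU itself splits the string is not shown
here.

```python
def format_id_list(ids):
    """Join helper IDs into the comma separated string form."""
    for helper_id in ids:
        if "," in helper_id:
            # A comma inside an ID cannot be represented in this encoding
            raise ValueError(f"ID must not contain a comma: {helper_id!r}")
    return ",".join(ids)


def parse_id_list(id_list):
    """Split the comma separated string back into individual IDs."""
    if not id_list:
        return []
    return id_list.split(",")


# Hypothetical object-add arguments using the string-encoded list
arguments = {
    "qom-type": "dbus-vmstate",
    "id": "dbus-vmstate0",
    "addr": "unix:path=/tmp/example-bus",
    "id-list": format_id_list(["helper-a", "helper-b"]),
}
print(arguments["id-list"])                 # helper-a,helper-b
print(parse_id_list(arguments["id-list"]))  # ['helper-a', 'helper-b']
```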

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qapi/qom.json | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/qapi/qom.json b/qapi/qom.json
index 46c2cdc6cf..942654e05c 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -232,6 +232,22 @@
   'base': 'CryptodevBackendProperties',
   'data': { 'chardev': 'str' } }
 
+##
+# @DBusVMStateProperties:
+#
+# Properties for dbus-vmstate objects.
+#
+# @addr: the name of the DBus bus to connect to
+#
+# @id-list: a comma separated list of DBus IDs of helpers whose data should be
+#   included in the VM state on migration
+#
+# Since: 5.0
+##
+{ 'struct': 'DBusVMStateProperties',
+  'data': { 'addr': 'str' ,
+'*id-list': 'str' } }
+
 ##
 # @IothreadProperties:
 #
@@ -270,6 +286,7 @@
 'cryptodev-backend',
 'cryptodev-backend-builtin',
 'cryptodev-vhost-user',
+'dbus-vmstate',
 'iothread'
   ] }
 
@@ -297,6 +314,7 @@
   'cryptodev-backend-builtin':  'CryptodevBackendProperties',
   'cryptodev-vhost-user':   { 'type': 'CryptodevVhostUserProperties',
   'if': 'defined(CONFIG_VIRTIO_CRYPTO) && defined(CONFIG_VHOST_CRYPTO)' },
+  'dbus-vmstate':   'DBusVMStateProperties',
   'iothread':   'IothreadProperties'
   } }
 
-- 
2.29.2




[PATCH v3 07/30] qapi/qom: Add ObjectOptions for rng-*, deprecate 'opened'

2021-03-08 Thread Kevin Wolf
This adds a QAPI schema for the properties of the rng-* objects.

The 'opened' property doesn't seem to make sense as an external
interface: It is automatically set to true in ucc->complete, and
explicitly setting it to true earlier just means that trying to set
additional options will result in an error. After the property has once
been set to true (i.e. when the object construction has completed), it
can never be reset to false. In other words, the 'opened' property is
useless. Mark it as deprecated in the schema from the start.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qapi/qom.json  | 56 --
 docs/system/deprecated.rst |  9 ++
 2 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 8c0e06c198..6d3b8c4fe0 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -382,6 +382,52 @@
 '*hugetlbsize': 'size',
 '*seal': 'bool' } }
 
+##
+# @RngProperties:
+#
+# Properties for objects of classes derived from rng.
+#
+# @opened: if true, the device is opened immediately when applying this option
+#  and will probably fail when processing the next option. Don't use;
+#  only provided for compatibility. (default: false)
+#
+# Features:
+# @deprecated: Member @opened is deprecated.  Setting true doesn't make sense,
+#  and false is already the default.
+#
+# Since: 1.3
+##
+{ 'struct': 'RngProperties',
+  'data': { '*opened': { 'type': 'bool', 'features': ['deprecated'] } } }
+
+##
+# @RngEgdProperties:
+#
+# Properties for rng-egd objects.
+#
+# @chardev: the name of a character device backend that provides the connection
+#   to the RNG daemon
+#
+# Since: 1.3
+##
+{ 'struct': 'RngEgdProperties',
+  'base': 'RngProperties',
+  'data': { 'chardev': 'str' } }
+
+##
+# @RngRandomProperties:
+#
+# Properties for rng-random objects.
+#
+# @filename: the filename of the device on the host to obtain entropy from
+#(default: "/dev/urandom")
+#
+# Since: 1.3
+##
+{ 'struct': 'RngRandomProperties',
+  'base': 'RngProperties',
+  'data': { '*filename': 'str' } }
+
 ##
 # @ObjectType:
 #
@@ -400,7 +446,10 @@
 'iothread',
 'memory-backend-file',
 'memory-backend-memfd',
-'memory-backend-ram'
+'memory-backend-ram',
+'rng-builtin',
+'rng-egd',
+'rng-random'
   ] }
 
 ##
@@ -432,7 +481,10 @@
   'memory-backend-file':'MemoryBackendFileProperties',
   'memory-backend-memfd':   { 'type': 'MemoryBackendMemfdProperties',
   'if': 'defined(CONFIG_LINUX)' },
-  'memory-backend-ram': 'MemoryBackendProperties'
+  'memory-backend-ram': 'MemoryBackendProperties',
+  'rng-builtin':'RngProperties',
+  'rng-egd':'RngEgdProperties',
+  'rng-random': 'RngRandomProperties'
   } }
 
 ##
diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
index 893f3e8579..3dac79f600 100644
--- a/docs/system/deprecated.rst
+++ b/docs/system/deprecated.rst
@@ -153,6 +153,15 @@ The ``-writeconfig`` option is not able to serialize the entire contents
 of the QEMU command line.  It is thus considered a failed experiment
 and deprecated, with no current replacement.
 
+``opened`` property of ``rng-*`` objects (since 6.0.0)
+''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+The only effect of specifying ``opened=on`` in the command line or QMP
+``object-add`` is that the device is opened immediately, possibly before all
+other options have been processed.  This will either have no effect (if
+``opened`` was the last option) or cause errors.  The property is therefore
+useless and should not be specified.
+
 QEMU Machine Protocol (QMP) commands
 ------------------------------------
 
-- 
2.29.2




[PATCH v3 01/30] qapi/qom: Drop deprecated 'props' from object-add

2021-03-08 Thread Kevin Wolf
The option has been deprecated since QEMU 5.0, so remove it.
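
For clients that still generate the old form, the migration is mechanical.
The Python sketch below is an illustration only (flatten_props() is a name
invented here, not part of QEMU): it moves the entries of 'props' to the top
level and rejects duplicates, similar in spirit to what the removed C code
did on the QEMU side.

```python
def flatten_props(arguments):
    """Return a copy of object-add arguments with 'props' merged to top level."""
    args = dict(arguments)
    props = args.pop("props", None)
    if props is None:
        return args
    if not isinstance(props, dict):
        raise TypeError("'props' must be a dict")
    # The same option given both in 'props' and on the top level is an error
    conflicts = sorted(set(props) & set(args))
    if conflicts:
        raise ValueError(
            "Option in 'props' conflicts with top level: " + ", ".join(conflicts)
        )
    args.update(props)
    return args


old_style = {
    "qom-type": "rng-random",
    "id": "rng0",
    "props": {"filename": "/dev/urandom"},
}
print(flatten_props(old_style))
# {'qom-type': 'rng-random', 'id': 'rng0', 'filename': '/dev/urandom'}
```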

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qapi/qom.json|  6 +-
 docs/system/deprecated.rst   |  5 -
 docs/system/removed-features.rst |  5 +
 qom/qom-qmp-cmds.c   | 21 -
 4 files changed, 6 insertions(+), 31 deletions(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 0b0b92944b..96c91c1faf 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -211,10 +211,6 @@
 #
 # @id: the name of the new object
 #
-# @props: a dictionary of properties to be passed to the backend. Deprecated
-# since 5.0, specify the properties on the top level instead. It is an
-# error to specify the same option both on the top level and in @props.
-#
 # Additional arguments depend on qom-type and are passed to the backend
 # unchanged.
 #
@@ -232,7 +228,7 @@
 #
 ##
 { 'command': 'object-add',
-  'data': {'qom-type': 'str', 'id': 'str', '*props': 'any'},
+  'data': {'qom-type': 'str', 'id': 'str'},
   'gen': false } # so we can get the additional arguments
 
 ##
diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
index 561c916da2..893f3e8579 100644
--- a/docs/system/deprecated.rst
+++ b/docs/system/deprecated.rst
@@ -206,11 +206,6 @@ Use ``migrate-set-parameters`` and ``query-migrate-parameters`` instead.
 
 Use arguments ``base-node`` and ``top-node`` instead.
 
-``object-add`` option ``props`` (since 5.0)
-'''''''''''''''''''''''''''''''''''''''''''
-
-Specify the properties for the object as top-level arguments instead.
-
 ``query-named-block-nodes`` and ``query-block`` result dirty-bitmaps[i].status (since 4.0)
 ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
 
diff --git a/docs/system/removed-features.rst b/docs/system/removed-features.rst
index c8481cafbd..95f3fb2912 100644
--- a/docs/system/removed-features.rst
+++ b/docs/system/removed-features.rst
@@ -58,6 +58,11 @@ documentation of ``query-hotpluggable-cpus`` for additional details.
 
 Use ``blockdev-change-medium`` or ``change-vnc-password`` instead.
 
+``object-add`` option ``props`` (removed in 6.0)
+''''''''''''''''''''''''''''''''''''''''''''''''
+
+Specify the properties for the object as top-level arguments instead.
+
 Human Monitor Protocol (HMP) commands
 -------------------------------------
 
diff --git a/qom/qom-qmp-cmds.c b/qom/qom-qmp-cmds.c
index b40ac39f30..19fd5e117f 100644
--- a/qom/qom-qmp-cmds.c
+++ b/qom/qom-qmp-cmds.c
@@ -225,27 +225,6 @@ ObjectPropertyInfoList *qmp_qom_list_properties(const char *typename,
 
 void qmp_object_add(QDict *qdict, QObject **ret_data, Error **errp)
 {
-    QObject *props;
-    QDict *pdict;
-
-    props = qdict_get(qdict, "props");
-    if (props) {
-        pdict = qobject_to(QDict, props);
-        if (!pdict) {
-            error_setg(errp, QERR_INVALID_PARAMETER_TYPE, "props", "dict");
-            return;
-        }
-        qobject_ref(pdict);
-        qdict_del(qdict, "props");
-        qdict_join(qdict, pdict, false);
-        if (qdict_size(pdict) != 0) {
-            error_setg(errp, "Option in 'props' conflicts with top level");
-            qobject_unref(pdict);
-            return;
-        }
-        qobject_unref(pdict);
-    }
-
     user_creatable_add_dict(qdict, false, errp);
 }
 
-- 
2.29.2




[PATCH v3 06/30] qapi/qom: Add ObjectOptions for memory-backend-*

2021-03-08 Thread Kevin Wolf
This adds a QAPI schema for the properties of the memory-backend-*
objects.

HostMemPolicy has to be moved to an include file that can be used by the
storage daemon, too, because ObjectOptions must be the same in all
binaries if we don't want to compile the whole code multiple times.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
---
 qapi/common.json  |  20 
 qapi/machine.json |  22 +
 qapi/qom.json | 121 +-
 3 files changed, 141 insertions(+), 22 deletions(-)

diff --git a/qapi/common.json b/qapi/common.json
index 716712d4b3..2dad4fadc3 100644
--- a/qapi/common.json
+++ b/qapi/common.json
@@ -145,3 +145,23 @@
 ##
 { 'enum': 'PCIELinkWidth',
   'data': [ '1', '2', '4', '8', '12', '16', '32' ] }
+
+##
+# @HostMemPolicy:
+#
+# Host memory policy types
+#
+# @default: restore default policy, remove any nondefault policy
+#
+# @preferred: set the preferred host nodes for allocation
+#
+# @bind: a strict policy that restricts memory allocation to the
+#host nodes specified
+#
+# @interleave: memory allocations are interleaved across the set
+#  of host nodes specified
+#
+# Since: 2.1
+##
+{ 'enum': 'HostMemPolicy',
+  'data': [ 'default', 'preferred', 'bind', 'interleave' ] }
diff --git a/qapi/machine.json b/qapi/machine.json
index 330189efe3..4322aee782 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -8,6 +8,8 @@
 # = Machines
 ##
 
+{ 'include': 'common.json' }
+
 ##
 # @SysEmuTarget:
 #
@@ -897,26 +899,6 @@
'policy': 'HmatCacheWritePolicy',
'line': 'uint16' }}
 
-##
-# @HostMemPolicy:
-#
-# Host memory policy types
-#
-# @default: restore default policy, remove any nondefault policy
-#
-# @preferred: set the preferred host nodes for allocation
-#
-# @bind: a strict policy that restricts memory allocation to the
-#host nodes specified
-#
-# @interleave: memory allocations are interleaved across the set
-#  of host nodes specified
-#
-# Since: 2.1
-##
-{ 'enum': 'HostMemPolicy',
-  'data': [ 'default', 'preferred', 'bind', 'interleave' ] }
-
 ##
 # @memsave:
 #
diff --git a/qapi/qom.json b/qapi/qom.json
index 942654e05c..8c0e06c198 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -5,6 +5,7 @@
 # See the COPYING file in the top-level directory.
 
 { 'include': 'authz.json' }
+{ 'include': 'common.json' }
 
 ##
 # = QEMU Object Model (QOM)
@@ -272,6 +273,115 @@
 '*poll-grow': 'int',
 '*poll-shrink': 'int' } }
 
+##
+# @MemoryBackendProperties:
+#
+# Properties for objects of classes derived from memory-backend.
+#
+# @merge: if true, mark the memory as mergeable (default depends on the machine
+# type)
+#
+# @dump: if true, include the memory in core dumps (default depends on the
+#machine type)
+#
+# @host-nodes: the list of NUMA host nodes to bind the memory to
+#
+# @policy: the NUMA policy (default: 'default')
+#
+# @prealloc: if true, preallocate memory (default: false)
+#
+# @prealloc-threads: number of CPU threads to use for prealloc (default: 1)
+#
+# @share: if false, the memory is private to QEMU; if true, it is shared
+# (default: false)
+#
+# @size: size of the memory region in bytes
+#
+# @x-use-canonical-path-for-ramblock-id: if true, the canonical path is used
+#for ramblock-id. Disable this for 4.0
+#machine types or older to allow
+#migration with newer QEMU versions.
+#This option is considered stable
+#despite the x- prefix. (default:
+#false generally, but true for machine
+#types <= 4.0)
+#
+# Since: 2.1
+##
+{ 'struct': 'MemoryBackendProperties',
+  'data': { '*dump': 'bool',
+'*host-nodes': ['uint16'],
+'*merge': 'bool',
+'*policy': 'HostMemPolicy',
+'*prealloc': 'bool',
+'*prealloc-threads': 'uint32',
+'*share': 'bool',
+'size': 'size',
+'*x-use-canonical-path-for-ramblock-id': 'bool' } }
+
+##
+# @MemoryBackendFileProperties:
+#
+# Properties for memory-backend-file objects.
+#
+# @align: the base address alignment when QEMU mmap(2)s @mem-path. Some
+# backend stores specified by @mem-path require an alignment different
+# than the default one used by QEMU, e.g. the device DAX /dev/dax0.0
+# requires 2M alignment rather than 4K. In such cases, users can
+# specify the required alignment via this option.
+# 0 selects a default alignment (currently the page size). (default: 0)
+#
+# @discard-data: if true, the file contents can be destroyed when QEMU exits,
+#to avoid unnecessarily flushing data to the backing file. Note
+#that ``discard-data`` is only an optimization, and 

[PATCH v3 04/30] qapi/qom: Add ObjectOptions for cryptodev-*

2021-03-08 Thread Kevin Wolf
This adds a QAPI schema for the properties of the cryptodev-* objects.

These interfaces have some questionable aspects (cryptodev-backend is
really an abstract base class without function, and the queues option
only makes sense for cryptodev-vhost-user), but as the goal is to
represent the existing interface in QAPI, leave these things in place.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
---
 qapi/qom.json | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/qapi/qom.json b/qapi/qom.json
index 30ed179bc1..46c2cdc6cf 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -204,6 +204,34 @@
   'returns': [ 'ObjectPropertyInfo' ],
   'allow-preconfig': true }
 
+##
+# @CryptodevBackendProperties:
+#
+# Properties for cryptodev-backend and cryptodev-backend-builtin objects.
+#
+# @queues: the number of queues for the cryptodev backend. Ignored for
+#  cryptodev-backend and must be 1 for cryptodev-backend-builtin.
+#  (default: 1)
+#
+# Since: 2.8
+##
+{ 'struct': 'CryptodevBackendProperties',
+  'data': { '*queues': 'uint32' } }
+
+##
+# @CryptodevVhostUserProperties:
+#
+# Properties for cryptodev-vhost-user objects.
+#
+# @chardev: the name of a Unix domain socket character device that connects to
+#   the vhost-user server
+#
+# Since: 2.12
+##
+{ 'struct': 'CryptodevVhostUserProperties',
+  'base': 'CryptodevBackendProperties',
+  'data': { 'chardev': 'str' } }
+
 ##
 # @IothreadProperties:
 #
@@ -239,6 +267,9 @@
 'authz-listfile',
 'authz-pam',
 'authz-simple',
+'cryptodev-backend',
+'cryptodev-backend-builtin',
+'cryptodev-vhost-user',
 'iothread'
   ] }
 
@@ -262,6 +293,10 @@
   'authz-listfile': 'AuthZListFileProperties',
   'authz-pam':  'AuthZPAMProperties',
   'authz-simple':   'AuthZSimpleProperties',
+  'cryptodev-backend':  'CryptodevBackendProperties',
+  'cryptodev-backend-builtin':  'CryptodevBackendProperties',
+  'cryptodev-vhost-user':   { 'type': 'CryptodevVhostUserProperties',
+  'if': 'defined(CONFIG_VIRTIO_CRYPTO) && defined(CONFIG_VHOST_CRYPTO)' },
   'iothread':   'IothreadProperties'
   } }
 
-- 
2.29.2




[PATCH v3 03/30] qapi/qom: Add ObjectOptions for authz-*

2021-03-08 Thread Kevin Wolf
This adds a QAPI schema for the properties of the authz-* objects.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qapi/authz.json  | 61 +---
 qapi/qom.json| 10 +
 storage-daemon/qapi/qapi-schema.json |  1 +
 3 files changed, 67 insertions(+), 5 deletions(-)

diff --git a/qapi/authz.json b/qapi/authz.json
index 42afe752d1..51845e37cc 100644
--- a/qapi/authz.json
+++ b/qapi/authz.json
@@ -50,12 +50,63 @@
'*format': 'QAuthZListFormat'}}
 
 ##
-# @QAuthZListRuleListHack:
+# @AuthZListProperties:
 #
-# Not exposed via QMP; hack to generate QAuthZListRuleList
-# for use internally by the code.
+# Properties for authz-list objects.
+#
+# @policy: Default policy to apply when no rule matches (default: deny)
+#
+# @rules: Authorization rules based on matching user
+#
+# Since: 4.0
+##
+{ 'struct': 'AuthZListProperties',
+  'data': { '*policy': 'QAuthZListPolicy',
+'*rules': ['QAuthZListRule'] } }
+
+##
+# @AuthZListFileProperties:
+#
+# Properties for authz-listfile objects.
+#
+# @filename: File name to load the configuration from. The file must
+#contain valid JSON for AuthZListProperties.
+#
+# @refresh: If true, inotify is used to monitor the file, automatically
+#   reloading changes. If an error occurs during reloading, all
+#   authorizations will fail until the file is next successfully
+#   loaded. (default: true if the binary was built with
+#   CONFIG_INOTIFY1, false otherwise)
+#
+# Since: 4.0
+##
+{ 'struct': 'AuthZListFileProperties',
+  'data': { 'filename': 'str',
+'*refresh': 'bool' } }
+
+##
+# @AuthZPAMProperties:
+#
+# Properties for authz-pam objects.
+#
+# @service: PAM service name to use for authorization
+#
+# Since: 4.0
+##
+{ 'struct': 'AuthZPAMProperties',
+  'data': { 'service': 'str' } }
+
+##
+# @AuthZSimpleProperties:
+#
+# Properties for authz-simple objects.
+#
+# @identity: Identifies the allowed user. Its format depends on the network
+#service that authorization object is associated with. For
+#authorizing based on TLS x509 certificates, the identity must be
+#the x509 distinguished name.
 #
 # Since: 4.0
 ##
-{ 'struct': 'QAuthZListRuleListHack',
-  'data': { 'unused': ['QAuthZListRule'] } }
+{ 'struct': 'AuthZSimpleProperties',
+  'data': { 'identity': 'str' } }
diff --git a/qapi/qom.json b/qapi/qom.json
index bf2ecb34be..30ed179bc1 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -4,6 +4,8 @@
 # This work is licensed under the terms of the GNU GPL, version 2 or later.
 # See the COPYING file in the top-level directory.
 
+{ 'include': 'authz.json' }
+
 ##
 # = QEMU Object Model (QOM)
 ##
@@ -233,6 +235,10 @@
 ##
 { 'enum': 'ObjectType',
   'data': [
+'authz-list',
+'authz-listfile',
+'authz-pam',
+'authz-simple',
 'iothread'
   ] }
 
@@ -252,6 +258,10 @@
 'id': 'str' },
   'discriminator': 'qom-type',
   'data': {
+  'authz-list': 'AuthZListProperties',
+  'authz-listfile': 'AuthZListFileProperties',
+  'authz-pam':  'AuthZPAMProperties',
+  'authz-simple':   'AuthZSimpleProperties',
   'iothread':   'IothreadProperties'
   } }
 
diff --git a/storage-daemon/qapi/qapi-schema.json b/storage-daemon/qapi/qapi-schema.json
index 28117c3aac..67749d1101 100644
--- a/storage-daemon/qapi/qapi-schema.json
+++ b/storage-daemon/qapi/qapi-schema.json
@@ -26,6 +26,7 @@
 { 'include': '../../qapi/crypto.json' }
 { 'include': '../../qapi/introspect.json' }
 { 'include': '../../qapi/job.json' }
+{ 'include': '../../qapi/authz.json' }
 { 'include': '../../qapi/qom.json' }
 { 'include': '../../qapi/sockets.json' }
 { 'include': '../../qapi/transaction.json' }
-- 
2.29.2




[PATCH v3 00/30] qapi/qom: QAPIfy --object and object-add

2021-03-08 Thread Kevin Wolf
This series adds a QAPI type for the properties of all user creatable
QOM types and finally makes the --object command line option (in all
binaries) and the object-add monitor commands (in QMP and HMP) use the
new ObjectOptions union.

This change improves things in more than just one way:

1. Documentation for QOM object types has always been lacking. Adding
   the schema, we get documentation for every property.

2. It prevents bugs by performing parts of the input validation (e.g.
   checking presence of mandatory properties) already in QAPI instead of
   relying on separate manual implementations in each class.

3. It provides QAPI introspection for user creatable objects.

4. Non-scalar properties are now supported everywhere because the
   command line parsers (including HMP) use the keyval parser now.
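
To make point 2 more concrete, here is a small Python sketch of the kind of
check that moves out of the individual classes once a schema exists. It is
not the code QAPI generates: SCHEMA is a hand-written table covering only a
few object types from this series, and check_object_options() and its error
messages are invented for this example.

```python
# Hand-written excerpt; required/optional members follow the schema patches
SCHEMA = {
    "rng-egd": {"required": {"chardev"}, "optional": {"opened"}},
    "rng-random": {"required": set(), "optional": {"filename", "opened"}},
    "can-host-socketcan": {"required": {"if", "canbus"}, "optional": set()},
    "can-bus": {"required": set(), "optional": set()},
}


def check_object_options(options):
    """Validate object-add style options against the SCHEMA table."""
    opts = dict(options)
    for key in ("qom-type", "id"):
        if key not in opts:
            raise ValueError(f"Parameter '{key}' is missing")
    qom_type = opts.pop("qom-type")
    opts.pop("id")
    if qom_type not in SCHEMA:
        raise ValueError(f"Unknown qom-type '{qom_type}'")
    branch = SCHEMA[qom_type]
    missing = sorted(branch["required"] - set(opts))
    if missing:
        raise ValueError(f"Parameter '{missing[0]}' is missing")
    unknown = sorted(set(opts) - branch["required"] - branch["optional"])
    if unknown:
        raise ValueError(f"Parameter '{unknown[0]}' is unexpected")
    return True


print(check_object_options(
    {"qom-type": "rng-egd", "id": "rng0", "chardev": "chr0"}))  # True
try:
    check_object_options({"qom-type": "rng-egd", "id": "rng0"})
except ValueError as err:
    print(err)  # Parameter 'chardev' is missing
```

With a table like this, each class no longer needs its own "is the mandatory
property set?" check, which is the duplication the series aims to remove.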


If you are in the CC list and didn't expect this series, it's probably
because you're the maintainer of one of the objects for which I'm adding
a QAPI schema description. Please just have a look at the specific patch
for your object and check whether the schema and its documentation make
sense to you. You can ignore all other patches.


In a next step after this series, we can make use of the QAPI
structs in the implementation of the object and separate their
configuration from the runtime state. Specifically, the plan is to
add a .configure() callback to ObjectClass that allows configuring the
object in one place at creation time and keeping QOM property setters
only for properties that can actually be changed at runtime. Paolo made
an example of what the state could look like after this:

https://wiki.qemu.org/Features/QOM-QAPI_integration

Finally, the intention is to extend the QAPI schema to have separate
'object' entities and generate some of the code that was written
manually in the intermediate state before.


This series is available as a git tag at:

https://repo.or.cz/qemu/kevin.git qapi-object-v3


v3:
- Removed now useless QAuthZListRuleListHack
- Made some more ObjectOptions branches conditional
- Improved documentation for some properties
- Fixed 'qemu-img compare' exit code for option parsing failure

v2:
- Convert not only object-add, but all external interfaces so that the
  schema will always be enforced and mismatch between implementation and
  schema can't go unnoticed.
- Rebased, covering properties and object types added since v1 (yes,
  things do become outdated rather quickly when you touch all user
  creatable objects)
- Changed the "Since:" version number in the schema documentation to
  refer to the version when the object was introduced rather than 6.0
  where the schema will (hopefully) be added
- Probably some other minor changes

Kevin Wolf (30):
  qapi/qom: Drop deprecated 'props' from object-add
  qapi/qom: Add ObjectOptions for iothread
  qapi/qom: Add ObjectOptions for authz-*
  qapi/qom: Add ObjectOptions for cryptodev-*
  qapi/qom: Add ObjectOptions for dbus-vmstate
  qapi/qom: Add ObjectOptions for memory-backend-*
  qapi/qom: Add ObjectOptions for rng-*, deprecate 'opened'
  qapi/qom: Add ObjectOptions for throttle-group
  qapi/qom: Add ObjectOptions for secret*, deprecate 'loaded'
  qapi/qom: Add ObjectOptions for tls-*, deprecate 'loaded'
  qapi/qom: Add ObjectOptions for can-*
  qapi/qom: Add ObjectOptions for colo-compare
  qapi/qom: Add ObjectOptions for filter-*
  qapi/qom: Add ObjectOptions for pr-manager-helper
  qapi/qom: Add ObjectOptions for confidential-guest-support
  qapi/qom: Add ObjectOptions for input-*
  qapi/qom: Add ObjectOptions for x-remote-object
  qapi/qom: QAPIfy object-add
  qom: Make "object" QemuOptsList optional
  qemu-storage-daemon: Implement --object with qmp_object_add()
  qom: Remove user_creatable_add_dict()
  qom: Factor out user_creatable_process_cmdline()
  qemu-io: Use user_creatable_process_cmdline() for --object
  qemu-nbd: Use user_creatable_process_cmdline() for --object
  qom: Add user_creatable_add_from_str()
  qemu-img: Use user_creatable_process_cmdline() for --object
  hmp: QAPIfy object_add
  qom: Add user_creatable_parse_str()
  vl: QAPIfy -object
  qom: Drop QemuOpts based interfaces

 qapi/authz.json  |  61 ++-
 qapi/block-core.json |  27 ++
 qapi/common.json |  52 +++
 qapi/crypto.json | 159 +++
 qapi/machine.json|  22 +-
 qapi/net.json|  20 -
 qapi/qom.json| 644 ++-
 qapi/ui.json |  13 +-
 docs/system/deprecated.rst   |  25 +-
 docs/system/removed-features.rst |   5 +
 include/qom/object_interfaces.h  | 106 ++---
 hw/block/xen-block.c |  16 +-
 monitor/hmp-cmds.c   |  17 +-
 monitor/misc.c   |   2 -
 qemu-img.c   | 251 ++-
 qemu-io.c|  33 +-
 qemu-nbd.c  

[PATCH v3 02/30] qapi/qom: Add ObjectOptions for iothread

2021-03-08 Thread Kevin Wolf
Add an ObjectOptions union that will eventually describe the options of
all user creatable object types. As unions can't exist without any
branches, also add the first object type.

This adds a QAPI schema for the properties of the iothread object.

Signed-off-by: Kevin Wolf 
Acked-by: Peter Krempa 
Reviewed-by: Eric Blake 
---
 qapi/qom.json | 53 +++
 1 file changed, 53 insertions(+)

diff --git a/qapi/qom.json b/qapi/qom.json
index 96c91c1faf..bf2ecb34be 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -202,6 +202,59 @@
   'returns': [ 'ObjectPropertyInfo' ],
   'allow-preconfig': true }
 
+##
+# @IothreadProperties:
+#
+# Properties for iothread objects.
+#
+# @poll-max-ns: the maximum number of nanoseconds to busy wait for events.
+#   0 means polling is disabled (default: 32768 on POSIX hosts,
+#   0 otherwise)
+#
+# @poll-grow: the multiplier used to increase the polling time when the
+# algorithm detects it is missing events due to not polling long
+# enough. 0 selects a default behaviour (default: 0)
+#
+# @poll-shrink: the divisor used to decrease the polling time when the
+#   algorithm detects it is spending too long polling without
+#   encountering events. 0 selects a default behaviour (default: 0)
+#
+# Since: 2.0
+##
+{ 'struct': 'IothreadProperties',
+  'data': { '*poll-max-ns': 'int',
+'*poll-grow': 'int',
+'*poll-shrink': 'int' } }
+
+##
+# @ObjectType:
+#
+# Since: 6.0
+##
+{ 'enum': 'ObjectType',
+  'data': [
+'iothread'
+  ] }
+
+##
+# @ObjectOptions:
+#
+# Describes the options of a user creatable QOM object.
+#
+# @qom-type: the class name for the object to be created
+#
+# @id: the name of the new object
+#
+# Since: 6.0
+##
+{ 'union': 'ObjectOptions',
+  'base': { 'qom-type': 'ObjectType',
+'id': 'str' },
+  'discriminator': 'qom-type',
+  'data': {
+  'iothread':   'IothreadProperties'
+  } }
+
 ##
 # @object-add:
 #
-- 
2.29.2
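
For readers less familiar with QAPI unions: the schema above behaves much like a tagged union in C, where 'qom-type' selects which branch is valid and optional members ('*name') carry a presence flag. The following is a minimal hand-written sketch under that assumption. All names ending in "Sketch" and the helper function are hypothetical, and this is not the code the QAPI generator emits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the 'ObjectType' enum; one member so far. */
typedef enum {
    OBJECT_TYPE_SKETCH_IOTHREAD,
} ObjectTypeSketch;

/* Optional schema members are modelled as a flag plus a value. */
typedef struct {
    bool has_poll_max_ns;
    int64_t poll_max_ns;
    bool has_poll_grow;
    int64_t poll_grow;
    bool has_poll_shrink;
    int64_t poll_shrink;
} IothreadPropertiesSketch;

typedef struct {
    ObjectTypeSketch qom_type;  /* discriminator */
    const char *id;
    union {
        IothreadPropertiesSketch iothread;
    } u;
} ObjectOptionsSketch;

/* Return poll-max-ns if the caller set it, else the supplied default. */
static int64_t object_options_poll_max_ns(const ObjectOptionsSketch *opts,
                                          int64_t default_ns)
{
    assert(opts != NULL);
    if (opts->qom_type != OBJECT_TYPE_SKETCH_IOTHREAD) {
        return default_ns;
    }
    return opts->u.iothread.has_poll_max_ns ? opts->u.iothread.poll_max_ns
                                            : default_ns;
}
```

An explicit poll-max-ns of 0 (polling disabled) stays distinguishable from "not specified" because of the presence flag.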




Re: [PATCH v4] virtio-blk: Respect discard granularity

2021-03-08 Thread Stefan Hajnoczi
On Thu, Feb 25, 2021 at 09:12:39AM +0900, Akihiko Odaki wrote:
> Report the configured granularity for discard operation to the
> guest. If this is not set use the block size.
> 
> Since until now we have ignored the configured discard granularity
> and always reported the block size, let's add
> 'report-discard-granularity' property and disable it for older
> machine types to avoid migration issues.
> 
> Signed-off-by: Akihiko Odaki 
> ---
>  hw/block/virtio-blk.c  | 8 +++-
>  hw/core/machine.c  | 4 +++-
>  include/hw/virtio/virtio-blk.h | 1 +
>  3 files changed, 11 insertions(+), 2 deletions(-)

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan
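
The selection rule described in the quoted commit message can be sketched as below. This is only an illustration: the function name, the "unset" sentinel, and the use of bytes as the unit are assumptions, not the code in hw/block/virtio-blk.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: an unset granularity is represented by an all-ones value. */
#define DISCARD_GRANULARITY_UNSET_SKETCH UINT32_MAX

/*
 * Pick the discard granularity (in bytes) to report to the guest.
 * 'report' models the 'report-discard-granularity' property, which is
 * off for older machine types so they keep reporting the block size.
 */
static uint32_t reported_discard_granularity(bool report,
                                             uint32_t configured,
                                             uint32_t logical_block_size)
{
    assert(logical_block_size != 0);
    if (!report || configured == DISCARD_GRANULARITY_UNSET_SKETCH) {
        return logical_block_size;
    }
    return configured;
}
```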


signature.asc
Description: PGP signature


Re: [PATCH 00/14] deprecations: remove many old deprecations

2021-03-08 Thread Stefan Hajnoczi
On Wed, Feb 24, 2021 at 04:21:13PM +0100, Philippe Mathieu-Daudé wrote:
> On 2/24/21 3:38 PM, Peter Maydell wrote:
> > On Wed, 24 Feb 2021 at 13:21, Daniel P. Berrangé  
> > wrote:
> >>
> >> The following features have been deprecated for well over the 2
> >> release cycle we promise
> >>
> >>   ``-usbdevice`` (since 2.10.0)
> >>   ``-drive file=3Djson:{...{'driver':'file'}}`` (since 3.0)
> >>   ``-vnc acl`` (since 4.0.0)
> >>   ``-mon ...,control=3Dreadline,pretty=3Don|off`` (since 4.1)
> > 
> > Are the literal '=3D' here intended ?
> 
> No, this is a git-publish bug:
> https://github.com/stefanha/git-publish/issues/88
> 
> Apparently the fix is not yet backported to Fedora.

Thanks for reminding me. I'll roll a new git-publish release and package
it in Fedora.

Stefan




Re: [PATCH RFC 0/4] hw/block/nvme: convert ad-hoc aio tracking to aiocbs

2021-03-08 Thread Stefan Hajnoczi
On Tue, Mar 02, 2021 at 12:10:36PM +0100, Klaus Jensen wrote:
> Marking RFC, since I've not really done anything with QEMU AIOs and BHs
> on this level before, so I'd really like some block-layer eyes on it.

I took a brief look and it seems like a nice conversion of the code.

Stefan




Re: [PATCH RFC 1/4] hw/block/nvme: convert dsm to aiocb

2021-03-08 Thread Stefan Hajnoczi
On Tue, Mar 02, 2021 at 12:10:37PM +0100, Klaus Jensen wrote:
> +static void nvme_dsm_cancel(BlockAIOCB *aiocb)
> +{
> +NvmeDSMAIOCB *iocb = container_of(aiocb, NvmeDSMAIOCB, common);
> +
> +/* break loop */
> +iocb->curr.len = 0;
> +iocb->curr.idx = iocb->nr;
> +
> +iocb->ret = -ECANCELED;
> +
> +if (iocb->aiocb) {
> +blk_aio_cancel_async(iocb->aiocb);
> +iocb->aiocb = NULL;
> +}
> +}

Is the case where iocb->aiocb == NULL just in case nvme_dsm_cancel() is
called after the last discard has completed but before the BH runs? I
want to make sure there are no other cases because nothing would call
iocb->common.cb().

>  static uint16_t nvme_dsm(NvmeCtrl *n, NvmeRequest *req)
>  {
>  NvmeNamespace *ns = req->ns;
>  NvmeDsmCmd *dsm = (NvmeDsmCmd *) &req->cmd;
> -
>  uint32_t attr = le32_to_cpu(dsm->attributes);
>  uint32_t nr = (le32_to_cpu(dsm->nr) & 0xff) + 1;
> -
>  uint16_t status = NVME_SUCCESS;
>  
>  trace_pci_nvme_dsm(nvme_cid(req), nvme_nsid(ns), nr, attr);
>  
>  if (attr & NVME_DSMGMT_AD) {
> -int64_t offset;
> -size_t len;
> -NvmeDsmRange range[nr];
> -uintptr_t *discards = (uintptr_t *)&req->opaque;
> +NvmeDSMAIOCB *iocb = blk_aio_get(&nvme_dsm_aiocb_info, ns->blkconf.blk,
> + nvme_misc_cb, req);
>  
> -status = nvme_dma(n, (uint8_t *)range, sizeof(range),
> +iocb->req = req;
> +iocb->bh = qemu_bh_new(nvme_dsm_bh, iocb);
> +iocb->ret = 0;
> +iocb->range = g_new(NvmeDsmRange, nr);
> +iocb->nr = nr;
> +iocb->curr.len = 0;
> +iocb->curr.idx = 0;
> +
> +status = nvme_dma(n, (uint8_t *)iocb->range, sizeof(NvmeDsmRange) * 
> nr,
>DMA_DIRECTION_TO_DEVICE, req);
>  if (status) {
>  return status;
>  }
>  
> -/*
> - * AIO callbacks may be called immediately, so initialize discards 
> to 1
> - * to make sure the the callback does not complete the request before
> - * all discards have been issued.
> - */
> -*discards = 1;
> +nvme_dsm_aio_cb(iocb, 0);
> +req->aiocb = &iocb->common;

Want to move this line up one just in case something in
nvme_dsm_aio_cb() accesses req->aiocb?




Re: [PATCH v3 2/5] scripts/tracetool: Replace the word 'whitelist'

2021-03-08 Thread Stefan Hajnoczi
On Wed, Mar 03, 2021 at 07:46:41PM +0100, Philippe Mathieu-Daudé wrote:
> Follow the inclusive terminology from the "Conscious Language in your
> Open Source Projects" guidelines [*] and replace the words "whitelist"
> appropriately.
> 
> [*] https://github.com/conscious-lang/conscious-lang-docs/blob/main/faq.md
> 
> Reviewed-by: Daniel P. Berrangé 
> Reviewed-by: Stefan Hajnoczi 
> Reviewed-by: Alex Bennée 
> Reviewed-by: Thomas Huth 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  scripts/tracetool/__init__.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 




[PATCH] block: remove format defaults from QemuOpts in bdrv_create_file()

2021-03-08 Thread Stefano Garzarella
QemuOpts is usually created by merging the QemuOptsList of the format
and the protocol. So, when the format calls bdrv_create_file(), the 'opts'
parameter contains a QemuOptsList with a combination of format and
protocol default values.

The format properly removes its options before calling
bdrv_create_file(), but the default values remain in 'opts->list'.
So if the protocol has options with the same name (e.g. rbd has
'cluster_size', as does qcow2), it will see the default values of the
format, since for overlapping options, the format wins.

To avoid this issue, let's convert QemuOpts to QDict, so that we take
only the set options, and then convert it back to QemuOpts, using the
'create_opts' of the protocol. The new QemuOpts will then contain only the
protocol defaults.

Suggested-by: Kevin Wolf 
Signed-off-by: Stefano Garzarella 
---
 block.c | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index a1f3cecd75..be7083c7d8 100644
--- a/block.c
+++ b/block.c
@@ -670,14 +670,48 @@ out:
 
 int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
 {
+QemuOpts *protocol_opts;
 BlockDriver *drv;
+QDict *qdict;
+int ret;
 
 drv = bdrv_find_protocol(filename, true, errp);
 if (drv == NULL) {
 return -ENOENT;
 }
 
-return bdrv_create(drv, filename, opts, errp);
+if (!drv->create_opts) {
+error_setg(errp, "Driver '%s' does not support image creation",
+   drv->format_name);
+return -ENOTSUP;
+}
+
+/*
+ * 'opts' contains a QemuOptsList with a combination of format and protocol
+ * default values.
+ *
+ * The format properly removes its options, but the default values remain
+ * in 'opts->list'.  So if the protocol has options with the same name
+ * (e.g. rbd has 'cluster_size' as qcow2), it will see the default values
+ * of the format, since for overlapping options, the format wins.
+ *
+ * To avoid this issue, let's convert QemuOpts to QDict, so that we take
+ * only the set options, and then convert it back to QemuOpts, using the
+ * create_opts of the protocol. The new QemuOpts will then contain only the
+ * protocol defaults.
+ */
+qdict = qemu_opts_to_qdict(opts, NULL);
+protocol_opts = qemu_opts_from_qdict(drv->create_opts, qdict, errp);
+if (protocol_opts == NULL) {
+ret = -EINVAL;
+goto out;
+}
+
+ret = bdrv_create(drv, filename, protocol_opts, errp);
+out:
+qemu_opts_del(protocol_opts);
+qobject_unref(qdict);
+return ret;
 }
 
 int coroutine_fn bdrv_co_delete_file(BlockDriverState *bs, Error **errp)
-- 
2.29.2
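
The default-shadowing problem this patch addresses can be illustrated with a toy model. This is a sketch only: the types and the lookup function are hypothetical stand-ins for QemuOptsList and QDict, and the default values in the usage note are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry of an option list: a name and its default (toy QemuOptsList). */
typedef struct {
    const char *name;
    const char *def;
} OptDescSketch;

/* One explicitly set option (toy QDict entry). */
typedef struct {
    const char *name;
    const char *value;
} OptSetSketch;

/*
 * Resolve 'name': an explicitly set value wins; otherwise the first
 * matching default in 'list' is used. Returns NULL if neither exists.
 */
static const char *opt_get_sketch(const OptDescSketch *list, size_t nlist,
                                  const OptSetSketch *set, size_t nset,
                                  const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < nset; i++) {
        if (strcmp(set[i].name, name) == 0) {
            return set[i].value;
        }
    }
    for (i = 0; i < nlist; i++) {
        if (strcmp(list[i].name, name) == 0) {
            return list[i].def;
        }
    }
    return NULL;
}
```

With a merged list such as { cluster_size=65536 (format), cluster_size=0 (protocol) } and no explicit value, the lookup yields the format's default. Resolving the same set options against the protocol's own list yields the protocol's default, which is the effect of the QemuOpts to QDict round trip.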




Re: [PATCH v2] block/file-posix: Optimize for macOS

2021-03-08 Thread Akihiko Odaki
On Tue, Mar 9, 2021 at 0:17, Stefan Hajnoczi wrote:
>
> The live migration compatibility issue is still present. Migrating to
> another host might not work if the block limits are different.
>
> Here is an idea for solving it:
>
> Modify include/hw/block/block.h:DEFINE_BLOCK_PROPERTIES_BASE() to
> support a new value called "host". The default behavior remains
> unchanged for live migration compatibility but now you can use "host" if
> you know it's okay but don't care about migration compatibility.
>
> The downside to this approach is that users must explicitly say
> something like --drive ...,opt_io_size=host. But it's still better than
> the situation we have today where user must manually enter values for
> their disk.
>
> Does this sound okay to everyone?
>
> Stefan

I wonder how that change affects other block drivers implementing
bdrv_probe_blocksizes. As far as I know, the values they report are
already used by default, which is contrary to the default not being
"host".

Regards,
Akihiko Odaki



Re: [PATCH v2] block/file-posix: Optimize for macOS

2021-03-08 Thread Stefan Hajnoczi
On Fri, Mar 05, 2021 at 09:17:48PM +0900, Akihiko Odaki wrote:
> This commit introduces "punch hole" operation and optimizes transfer
> block size for macOS.
> 
> This commit introduces two additional members,
> discard_granularity and opt_io to BlockSizes type in
> include/block/block.h. Also, the members of the type are now
> optional. Set -1 to discard_granularity and 0 to other members
> for the default values.
> 
> Thanks to Konstantin Nazarov for detailed analysis of a flaw in an
> old version of this change:
> https://gist.github.com/akihikodaki/87df4149e7ca87f18dc56807ec5a1bc5#gistcomment-3654667
> 
> Signed-off-by: Akihiko Odaki 
> ---
>  block/file-posix.c| 40 ++--
>  block/nvme.c  |  2 ++
>  block/raw-format.c|  4 +++-
>  hw/block/block.c  | 12 ++--
>  include/block/block.h |  2 ++
>  5 files changed, 55 insertions(+), 5 deletions(-)

The live migration compatibility issue is still present. Migrating to
another host might not work if the block limits are different.

Here is an idea for solving it:

Modify include/hw/block/block.h:DEFINE_BLOCK_PROPERTIES_BASE() to
support a new value called "host". The default behavior remains
unchanged for live migration compatibility, but now you can use "host" if
you know it's okay and don't care about migration compatibility.

The downside to this approach is that users must explicitly say
something like --drive ...,opt_io_size=host. But it's still better than
the situation we have today, where users must manually enter values for
their disk.

Does this sound okay to everyone?

Stefan
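
To make the proposal concrete, a property parser accepting either a number or the literal "host" might look roughly like this. It is a sketch under assumptions: the function name and signature are hypothetical and do not correspond to an existing QEMU property type.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a block limit property. "host" selects the value probed from the
 * host; anything else must be a plain decimal number. Returns false on
 * malformed input and leaves *out untouched in that case.
 */
static bool parse_blocksize_prop_sketch(const char *str, uint32_t host_value,
                                        uint32_t *out)
{
    char *end;
    unsigned long v;

    assert(str != NULL && out != NULL);
    if (strcmp(str, "host") == 0) {
        *out = host_value;
        return true;
    }
    if (str[0] == '-') {
        return false;   /* strtoul() would silently wrap negatives */
    }
    errno = 0;
    v = strtoul(str, &end, 10);
    if (errno != 0 || end == str || *end != '\0' || v > UINT32_MAX) {
        return false;
    }
    *out = (uint32_t)v;
    return true;
}
```

A numeric value stays stable across hosts, while "host" trades migration compatibility for convenience, which is the trade-off described above.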




Re: [PATCH] xen-block: Fix removal of backend instance via xenstore

2021-03-08 Thread Paolo Bonzini

On 08/03/21 15:32, Anthony PERARD wrote:

From: Anthony PERARD 

Whenever a Xen block device is detached via xenstore, the image
associated with it remains open in the backend QEMU and an error is
logged:
 qemu-system-i386: failed to destroy drive: Node xvdz-qcow2 is in use

This happens because object_unparent() doesn't immediately free the
object and thus keeps a reference to the node we are trying to free.
The reference is held by the "drive" property and the call to
xen_block_drive_destroy() fails.

In order to fix that, we call drain_call_rcu() to run the callback
set up by bus_remove_child() via object_unparent().

Fixes: 2d24a6466154 ("device-core: use RCU for list of children of a bus")

Signed-off-by: Anthony PERARD 
---
CCing people who introduced/reviewed the change to use RCU to give
them a chance to say if the change is fine.


If nothing else works then I guess it's okay, but why can't you do the 
xen_block_drive_destroy from e.g. an unrealize callback?


Paolo


---
  hw/block/xen-block.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index a3b69e27096f..fe5f828e2d25 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -972,6 +972,15 @@ static void xen_block_device_destroy(XenBackendInstance 
*backend,
  
  object_unparent(OBJECT(xendev));
  
+/*
+ * Drain all pending RCU callbacks, as object_unparent() frees `xendev'
+ * in an RCU callback.
+ * As long as the "drive" property still exists in `xendev', we
+ * can't destroy the XenBlockDrive associated with `xendev' with
+ * xen_block_drive_destroy() below.
+ */
+drain_call_rcu();
+
  if (iothread) {
  xen_block_iothread_destroy(iothread, errp);
  if (*errp) {






[PATCH] xen-block: Fix removal of backend instance via xenstore

2021-03-08 Thread Anthony PERARD via
From: Anthony PERARD 

Whenever a Xen block device is detached via xenstore, the image
associated with it remains open in the backend QEMU and an error is
logged:
qemu-system-i386: failed to destroy drive: Node xvdz-qcow2 is in use

This happens because object_unparent() doesn't immediately free the
object and thus keeps a reference to the node we are trying to free.
The reference is held by the "drive" property and the call to
xen_block_drive_destroy() fails.

In order to fix that, we call drain_call_rcu() to run the callback
set up by bus_remove_child() via object_unparent().

Fixes: 2d24a6466154 ("device-core: use RCU for list of children of a bus")

Signed-off-by: Anthony PERARD 
---
CCing people who introduced/reviewed the change to use RCU to give
them a chance to say if the change is fine.
---
 hw/block/xen-block.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index a3b69e27096f..fe5f828e2d25 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -972,6 +972,15 @@ static void xen_block_device_destroy(XenBackendInstance 
*backend,
 
 object_unparent(OBJECT(xendev));
 
+/*
+ * Drain all pending RCU callbacks, as object_unparent() frees `xendev'
+ * in an RCU callback.
+ * As long as the "drive" property still exists in `xendev', we
+ * can't destroy the XenBlockDrive associated with `xendev' with
+ * xen_block_drive_destroy() below.
+ */
+drain_call_rcu();
+
 if (iothread) {
 xen_block_iothread_destroy(iothread, errp);
 if (*errp) {
-- 
Anthony PERARD
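
The ordering problem can be reproduced in a toy model of deferred callbacks. This is only a sketch: the queue, the node type and all function names are hypothetical and merely stand in for call_rcu(), drain_call_rcu() and the reference held by the "drive" property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void DeferredFn(void *opaque);

#define MAX_DEFERRED 8

/* Callbacks queued for later, like work handed to call_rcu(). */
static struct {
    DeferredFn *fn;
    void *opaque;
} deferred[MAX_DEFERRED];
static size_t nr_deferred;

static void defer_call(DeferredFn *fn, void *opaque)
{
    assert(nr_deferred < MAX_DEFERRED);
    deferred[nr_deferred].fn = fn;
    deferred[nr_deferred].opaque = opaque;
    nr_deferred++;
}

/* Run everything queued so far, like drain_call_rcu(). */
static void drain_deferred(void)
{
    size_t i;

    for (i = 0; i < nr_deferred; i++) {
        deferred[i].fn(deferred[i].opaque);
    }
    nr_deferred = 0;
}

typedef struct {
    int refcnt;
} NodeSketch;

/* Deferred release of the reference held on behalf of the device. */
static void node_unref(void *opaque)
{
    NodeSketch *node = opaque;

    assert(node->refcnt > 0);
    node->refcnt--;
}

/* Destroying the node only succeeds once nobody references it. */
static bool node_destroy(const NodeSketch *node)
{
    return node->refcnt == 0;
}
```

Queuing node_unref() and calling node_destroy() right away fails ("is in use"); calling drain_deferred() in between lets it succeed.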




[PULL 36/38] hw/block/nvme: support namespace attachment command

2021-03-08 Thread Klaus Jensen
From: Minwoo Im 

This patch supports the Namespace Attachment command for the pre-defined
nvme-ns device nodes.  Of course, attaching/detaching a namespace should
only be supported in case 'subsys' is given.  This is because if we detach
a namespace from a controller, somebody needs to manage the detached, but
still allocated, namespace in the NVMe subsystem.

As a command effect for the Namespace Attachment command is registered,
the host will be notified that the namespace inventory has changed, so that
the host will rescan the namespace inventory after this command.  For
example, the kernel driver manages this command effect via the passthru IOCTL.

Signed-off-by: Minwoo Im 
Reviewed-by: Keith Busch 
Reviewed-by: Klaus Jensen 
Tested-by: Klaus Jensen 
[k.jensen: rebased for dma refactor]
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-subsys.h | 10 +++
 hw/block/nvme.h|  5 
 include/block/nvme.h   |  6 +
 hw/block/nvme.c| 60 +-
 hw/block/trace-events  |  2 ++
 5 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 14627f9ccb41..ef4bec928eae 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -30,6 +30,16 @@ typedef struct NvmeSubsystem {
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
 int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp);
 
+static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys,
+uint32_t cntlid)
+{
+if (!subsys) {
+return NULL;
+}
+
+return subsys->ctrls[cntlid];
+}
+
 /*
  * Return allocated namespace of the specified nsid in the subsystem.
  */
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 85a7b5a14f4e..1287bc2cd17a 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -235,6 +235,11 @@ static inline void nvme_ns_attach(NvmeCtrl *n, 
NvmeNamespace *ns)
 n->namespaces[nvme_nsid(ns) - 1] = ns;
 }
 
+static inline void nvme_ns_detach(NvmeCtrl *n, NvmeNamespace *ns)
+{
+n->namespaces[nvme_nsid(ns) - 1] = NULL;
+}
+
 static inline NvmeCQueue *nvme_cq(NvmeRequest *req)
 {
 NvmeSQueue *sq = req->sq;
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 16d8c4c90f7e..03471a4d5abd 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -566,6 +566,7 @@ enum NvmeAdminCommands {
 NVME_ADM_CMD_ASYNC_EV_REQ   = 0x0c,
 NVME_ADM_CMD_ACTIVATE_FW= 0x10,
 NVME_ADM_CMD_DOWNLOAD_FW= 0x11,
+NVME_ADM_CMD_NS_ATTACHMENT  = 0x15,
 NVME_ADM_CMD_FORMAT_NVM = 0x80,
 NVME_ADM_CMD_SECURITY_SEND  = 0x81,
 NVME_ADM_CMD_SECURITY_RECV  = 0x82,
@@ -836,6 +837,9 @@ enum NvmeStatusCodes {
 NVME_FEAT_NOT_CHANGEABLE= 0x010e,
 NVME_FEAT_NOT_NS_SPEC   = 0x010f,
 NVME_FW_REQ_SUSYSTEM_RESET  = 0x0110,
+NVME_NS_ALREADY_ATTACHED= 0x0118,
+NVME_NS_NOT_ATTACHED= 0x011A,
+NVME_NS_CTRL_LIST_INVALID   = 0x011C,
 NVME_CONFLICTING_ATTRS  = 0x0180,
 NVME_INVALID_PROT_INFO  = 0x0181,
 NVME_WRITE_TO_RO= 0x0182,
@@ -951,6 +955,7 @@ typedef struct QEMU_PACKED NvmePSD {
 uint8_t resv[16];
 } NvmePSD;
 
+#define NVME_CONTROLLER_LIST_SIZE 2048
 #define NVME_IDENTIFY_DATA_SIZE 4096
 
 enum NvmeIdCns {
@@ -1055,6 +1060,7 @@ enum NvmeIdCtrlOacs {
 NVME_OACS_SECURITY  = 1 << 0,
 NVME_OACS_FORMAT= 1 << 1,
 NVME_OACS_FW= 1 << 2,
+NVME_OACS_NS_MGMT   = 1 << 3,
 };
 
 enum NvmeIdCtrlOncs {
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 3f86da6ebc5c..fc38c3e4629d 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -187,6 +187,7 @@ static const uint32_t nvme_cse_acs[256] = {
 [NVME_ADM_CMD_SET_FEATURES] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_NS_ATTACHMENT]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_NIC,
 };
 
 static const uint32_t nvme_cse_iocs_none[256];
@@ -3896,6 +3897,61 @@ static uint16_t nvme_aer(NvmeCtrl *n, NvmeRequest *req)
 return NVME_NO_COMPLETE;
 }
 
+static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns);
+static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
+{
+NvmeNamespace *ns;
+NvmeCtrl *ctrl;
+uint16_t list[NVME_CONTROLLER_LIST_SIZE] = {};
+uint32_t nsid = le32_to_cpu(req->cmd.nsid);
+uint32_t dw10 = le32_to_cpu(req->cmd.cdw10);
+bool attach = !(dw10 & 0xf);
+uint16_t *nr_ids = &list[0];
+uint16_t *ids = &list[1];
+uint16_t ret;
+int i;
+
+trace_pci_nvme_ns_attachment(nvme_cid(req), dw10 & 0xf);
+
+ns = nvme_subsys_ns(n->subsys, nsid);
+if (!ns) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+ret = nvme_h2c(n, (uint8_t *)list, 4096, req);
+if (ret) {
+return ret;
+}
+
+if (!*nr_ids) {
+return NVME_NS_CTRL_LIST_INVALID | NVME_DNR;
+}
+
+for (i = 0; i < *nr_ids; i++) {
+ctrl = nvme_subsys_ctrl(n->subsys, ids[i]);
+if (!ctrl) 

[PULL 33/38] hw/block/nvme: fix allocated namespace list to 256

2021-03-08 Thread Klaus Jensen
From: Minwoo Im 

Expand the allocated namespace list (subsys->namespaces) to 256 entries,
a value at least as large as NVME_MAX_NAMESPACES, which sizes the
attached namespace list in a controller.

The allocated namespace list should be at least as large as the attached
namespace list.

n->num_namespaces = NVME_MAX_NAMESPACES;

The above line sets the NN field via id->nn, so the subsystem
should also provide at least this number of namespace list entries.

Signed-off-by: Minwoo Im 
Reviewed-by: Keith Busch 
Reviewed-by: Klaus Jensen 
Tested-by: Klaus Jensen 
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-subsys.h | 2 +-
 hw/block/nvme.h| 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 574774390c4c..8a0732b22316 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -14,7 +14,7 @@
 OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
 
 #define NVME_SUBSYS_MAX_CTRLS   32
-#define NVME_SUBSYS_MAX_NAMESPACES  32
+#define NVME_SUBSYS_MAX_NAMESPACES  256
 
 typedef struct NvmeCtrl NvmeCtrl;
 typedef struct NvmeNamespace NvmeNamespace;
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index cd8d40634411..85a7b5a14f4e 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -10,6 +10,12 @@
 #define NVME_DEFAULT_ZONE_SIZE   (128 * MiB)
 #define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)
 
+/*
+ * Subsystem namespace list for allocated namespaces should be larger than
+ * attached namespace list in a controller.
+ */
+QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_SUBSYS_MAX_NAMESPACES);
+
 typedef struct NvmeParams {
 char *serial;
 uint32_t num_queues; /* deprecated since 5.1 */
-- 
2.30.1




[PULL 32/38] hw/block/nvme: fix namespaces array to 1-based

2021-03-08 Thread Klaus Jensen
From: Minwoo Im 

The subsys->namespaces array used to be sized to NVME_SUBSYS_MAX_NAMESPACES.
But subsys->namespaces is accessed with a 1-based namespace ID, which
means the very first array entry will always be empty (NULL), so the
array needs one extra entry.

Signed-off-by: Minwoo Im 
Reviewed-by: Keith Busch 
Reviewed-by: Klaus Jensen 
Tested-by: Klaus Jensen 
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-subsys.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 890d118117dc..574774390c4c 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -24,7 +24,7 @@ typedef struct NvmeSubsystem {
 
 NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS];
 /* Allocated namespaces for this subsystem */
-NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES];
+NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES + 1];
 } NvmeSubsystem;
 
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
-- 
2.30.1
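
The two sizing rules from this patch and the previous one (a table indexed by 1-based NSID, and a subsystem table at least as large as the controller's) can be sketched as follows. The constants and type names are hypothetical, and C11 _Static_assert stands in for QEMU_BUILD_BUG_ON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits; the real values live in the QEMU headers. */
#define SKETCH_MAX_NAMESPACES        256  /* per controller */
#define SKETCH_SUBSYS_MAX_NAMESPACES 256  /* per subsystem */

/* Fail the build if the subsystem cannot hold every attachable NSID. */
_Static_assert(SKETCH_MAX_NAMESPACES <= SKETCH_SUBSYS_MAX_NAMESPACES,
               "subsystem namespace list smaller than controller list");

typedef struct {
    uint32_t nsid;
} NamespaceSketch;

typedef struct {
    /* Slot 0 stays unused so that a 1-based NSID indexes directly. */
    NamespaceSketch *namespaces[SKETCH_SUBSYS_MAX_NAMESPACES + 1];
} SubsystemSketch;

/* Look up an allocated namespace; NSID 0 and out-of-range IDs yield NULL. */
static NamespaceSketch *subsys_ns_sketch(SubsystemSketch *subsys,
                                         uint32_t nsid)
{
    if (!subsys || nsid == 0 || nsid > SKETCH_SUBSYS_MAX_NAMESPACES) {
        return NULL;
    }
    return subsys->namespaces[nsid];
}
```

Without the extra slot, the highest valid NSID would index one element past the end of the array.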




[PULL 24/38] hw/block/nvme: report non-mdts command size limit for dsm

2021-03-08 Thread Klaus Jensen
From: Gollu Appalanaidu 

Dataset Management is not subject to MDTS, but exceeding a certain size
per range causes internal looping. Report this limit (DMRSL) in the NVM
command set specific identify controller data structure.

Signed-off-by: Gollu Appalanaidu 
Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.h   |  2 ++
 include/block/nvme.h  | 11 +++
 hw/block/nvme.c   | 27 +++
 hw/block/trace-events |  1 +
 4 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index f45ace0cff5b..294fac1defe3 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -171,6 +171,8 @@ typedef struct NvmeCtrl {
 QTAILQ_HEAD(, NvmeAsyncEvent) aer_queue;
 int aer_queued;
 
+uint32_t dmrsl;
+
 NvmeSubsystem   *subsys;
 
 NvmeNamespace   namespace;
diff --git a/include/block/nvme.h b/include/block/nvme.h
index b23f3ae2279f..16d8c4c90f7e 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1041,6 +1041,16 @@ typedef struct NvmeIdCtrlZoned {
 uint8_t rsvd1[4095];
 } NvmeIdCtrlZoned;
 
+typedef struct NvmeIdCtrlNvm {
+uint8_t vsl;
+uint8_t wzsl;
+uint8_t wusl;
+uint8_t dmrl;
+uint32_t dmrsl;
+uint64_t dmsl;
+uint8_t rsvd16[4080];
+} NvmeIdCtrlNvm;
+
 enum NvmeIdCtrlOacs {
 NVME_OACS_SECURITY  = 1 << 0,
 NVME_OACS_FORMAT= 1 << 1,
@@ -1396,6 +1406,7 @@ static inline void _nvme_check_size(void)
 QEMU_BUILD_BUG_ON(sizeof(NvmeEffectsLog) != 4096);
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) != 4096);
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrlZoned) != 4096);
+QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrlNvm) != 4096);
 QEMU_BUILD_BUG_ON(sizeof(NvmeLBAF) != 4);
 QEMU_BUILD_BUG_ON(sizeof(NvmeLBAFE) != 16);
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) != 4096);
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 961507cae28a..0f6400cd7274 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1780,6 +1780,10 @@ static uint16_t nvme_dsm(NvmeCtrl *n, NvmeRequest *req)
 trace_pci_nvme_dsm_deallocate(nvme_cid(req), nvme_nsid(ns), slba,
   nlb);
 
+if (nlb > n->dmrsl) {
+trace_pci_nvme_dsm_single_range_limit_exceeded(nlb, n->dmrsl);
+}
+
 offset = nvme_l2b(ns, slba);
 len = nvme_l2b(ns, nlb);
 
@@ -3199,20 +3203,24 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl *n, 
NvmeRequest *req)
 static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
-NvmeIdCtrlZoned id = {};
+uint8_t id[NVME_IDENTIFY_DATA_SIZE] = {};
 
 trace_pci_nvme_identify_ctrl_csi(c->csi);
 
-if (c->csi == NVME_CSI_NVM) {
-return nvme_rpt_empty_id_struct(n, req);
-} else if (c->csi == NVME_CSI_ZONED) {
-id.zasl = n->params.zasl;
+switch (c->csi) {
+case NVME_CSI_NVM:
+((NvmeIdCtrlNvm *)&id)->dmrsl = cpu_to_le32(n->dmrsl);
+break;
 
-return nvme_dma(n, (uint8_t *)&id, sizeof(id),
-DMA_DIRECTION_FROM_DEVICE, req);
+case NVME_CSI_ZONED:
+((NvmeIdCtrlZoned *)&id)->zasl = n->params.zasl;
+break;
+
+default:
+return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-return NVME_INVALID_FIELD | NVME_DNR;
+return nvme_dma(n, id, sizeof(id), DMA_DIRECTION_FROM_DEVICE, req);
 }
 
 static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req)
@@ -4646,6 +4654,9 @@ int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace 
*ns, Error **errp)
 
 n->namespaces[nsid - 1] = ns;
 
+n->dmrsl = MIN_NON_ZERO(n->dmrsl,
+BDRV_REQUEST_MAX_BYTES / nvme_l2b(ns, 1));
+
 return 0;
 }
 
diff --git a/hw/block/trace-events b/hw/block/trace-events
index c165ee2a97c3..8deeacc8c35c 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -51,6 +51,7 @@ pci_nvme_copy_cb(uint16_t cid) "cid %"PRIu16""
 pci_nvme_block_status(int64_t offset, int64_t bytes, int64_t pnum, int ret, 
bool zeroed) "offset %"PRId64" bytes %"PRId64" pnum %"PRId64" ret 0x%x zeroed 
%d"
 pci_nvme_dsm(uint16_t cid, uint32_t nsid, uint32_t nr, uint32_t attr) "cid 
%"PRIu16" nsid %"PRIu32" nr %"PRIu32" attr 0x%"PRIx32""
 pci_nvme_dsm_deallocate(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t 
nlb) "cid %"PRIu16" nsid %"PRIu32" slba %"PRIu64" nlb %"PRIu32""
+pci_nvme_dsm_single_range_limit_exceeded(uint32_t nlb, uint32_t dmrsl) "nlb 
%"PRIu32" dmrsl %"PRIu32""
 pci_nvme_compare(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) 
"cid %"PRIu16" nsid %"PRIu32" slba 0x%"PRIx64" nlb %"PRIu32""
 pci_nvme_compare_cb(uint16_t cid) "cid %"PRIu16""
 pci_nvme_aio_discard_cb(uint16_t cid) "cid %"PRIu16""
-- 
2.30.1
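
How the reported limit is derived in nvme_register_namespace() can be illustrated with a small stand-alone sketch. The helper names are hypothetical, the request cap is passed in as a parameter instead of using BDRV_REQUEST_MAX_BYTES, and min_non_zero_u32() only approximates what QEMU's MIN_NON_ZERO() macro does.

```c
#include <assert.h>
#include <stdint.h>

/* Smaller of two values, treating 0 as "no limit set yet". */
static uint32_t min_non_zero_u32(uint32_t a, uint32_t b)
{
    if (a == 0) {
        return b;
    }
    if (b == 0) {
        return a;
    }
    return a < b ? a : b;
}

/*
 * Fold one more namespace into the controller-wide DMRSL, expressed in
 * logical blocks: the byte cap of a single block-layer request divided
 * by this namespace's LBA size.
 */
static uint32_t update_dmrsl_sketch(uint32_t dmrsl,
                                    uint64_t max_request_bytes,
                                    uint32_t lba_size)
{
    uint64_t blocks;

    assert(lba_size != 0);
    blocks = max_request_bytes / lba_size;
    if (blocks > UINT32_MAX) {
        blocks = UINT32_MAX;
    }
    return min_non_zero_u32(dmrsl, (uint32_t)blocks);
}
```

With a made-up 1 MiB cap, a first namespace with 512-byte blocks gives 2048, and adding a namespace with 4096-byte blocks lowers the limit to 256, since the controller reports the most restrictive value.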




[PULL 29/38] hw/block/nvme: remove the req dependency in map functions

2021-03-08 Thread Klaus Jensen
From: Klaus Jensen 

The PRP and SGL mapping functions do not have any particular need for
the entire NvmeRequest as a parameter. Clean it up.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c   | 61 ++-
 hw/block/trace-events |  4 +--
 2 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 621e993e652e..fb0bc971704f 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -535,8 +535,8 @@ static inline bool nvme_addr_is_dma(NvmeCtrl *n, hwaddr 
addr)
 return !(nvme_addr_is_cmb(n, addr) || nvme_addr_is_pmr(n, addr));
 }
 
-static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
- uint32_t len, NvmeRequest *req)
+static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, uint64_t prp1,
+ uint64_t prp2, uint32_t len)
 {
 hwaddr trans_len = n->page_size - (prp1 % n->page_size);
 trans_len = MIN(len, trans_len);
@@ -546,9 +546,9 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 
 trace_pci_nvme_map_prp(trans_len, len, prp1, prp2, num_prps);
 
-nvme_sg_init(n, &req->sg, nvme_addr_is_dma(n, prp1));
+nvme_sg_init(n, sg, nvme_addr_is_dma(n, prp1));
 
-status = nvme_map_addr(n, &req->sg, prp1, trans_len);
+status = nvme_map_addr(n, sg, prp1, trans_len);
 if (status) {
 goto unmap;
 }
@@ -598,7 +598,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 }
 
 trans_len = MIN(len, n->page_size);
-status = nvme_map_addr(n, &req->sg, prp_ent, trans_len);
+status = nvme_map_addr(n, sg, prp_ent, trans_len);
 if (status) {
 goto unmap;
 }
@@ -612,7 +612,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 status = NVME_INVALID_PRP_OFFSET | NVME_DNR;
 goto unmap;
 }
-status = nvme_map_addr(n, &req->sg, prp2, len);
+status = nvme_map_addr(n, sg, prp2, len);
 if (status) {
 goto unmap;
 }
@@ -622,7 +622,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 return NVME_SUCCESS;
 
 unmap:
-nvme_sg_unmap(&req->sg);
+nvme_sg_unmap(sg);
 return status;
 }
 
@@ -632,7 +632,7 @@ unmap:
  */
 static uint16_t nvme_map_sgl_data(NvmeCtrl *n, NvmeSg *sg,
   NvmeSglDescriptor *segment, uint64_t nsgld,
-  size_t *len, NvmeRequest *req)
+  size_t *len, NvmeCmd *cmd)
 {
 dma_addr_t addr, trans_len;
 uint32_t dlen;
@@ -643,7 +643,7 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, NvmeSg *sg,
 
 switch (type) {
 case NVME_SGL_DESCR_TYPE_BIT_BUCKET:
-if (req->cmd.opcode == NVME_CMD_WRITE) {
+if (cmd->opcode == NVME_CMD_WRITE) {
 continue;
 }
 case NVME_SGL_DESCR_TYPE_DATA_BLOCK:
@@ -672,7 +672,7 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, NvmeSg *sg,
 break;
 }
 
-trace_pci_nvme_err_invalid_sgl_excess_length(nvme_cid(req));
+trace_pci_nvme_err_invalid_sgl_excess_length(dlen);
 return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
 }
 
@@ -701,7 +701,7 @@ next:
 }
 
 static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, NvmeSglDescriptor sgl,
- size_t len, NvmeRequest *req)
+ size_t len, NvmeCmd *cmd)
 {
 /*
  * Read the segment in chunks of 256 descriptors (one 4k page) to avoid
@@ -722,7 +722,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, 
NvmeSglDescriptor sgl,
 sgld = &sgl;
 addr = le64_to_cpu(sgl.addr);
 
-trace_pci_nvme_map_sgl(nvme_cid(req), NVME_SGL_TYPE(sgl.type), len);
+trace_pci_nvme_map_sgl(NVME_SGL_TYPE(sgl.type), len);
 
 nvme_sg_init(n, sg, nvme_addr_is_dma(n, addr));
 
@@ -731,7 +731,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, 
NvmeSglDescriptor sgl,
  * be mapped directly.
  */
 if (NVME_SGL_TYPE(sgl.type) == NVME_SGL_DESCR_TYPE_DATA_BLOCK) {
-status = nvme_map_sgl_data(n, sg, sgld, 1, &len, req);
+status = nvme_map_sgl_data(n, sg, sgld, 1, &len, cmd);
 if (status) {
 goto unmap;
 }
@@ -770,7 +770,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, 
NvmeSglDescriptor sgl,
 }
 
 status = nvme_map_sgl_data(n, sg, segment, SEG_CHUNK_SIZE,
-   &len, req);
+   &len, cmd);
 if (status) {
 goto unmap;
 }
@@ -796,7 +796,7 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, NvmeSg *sg, 
NvmeSglDescriptor sgl,
 switch (NVME_SGL_TYPE(last_sgld->type)) {
 case 

[PULL 38/38] hw/block/nvme: support Identify NS Attached Controller List

2021-03-08 Thread Klaus Jensen
From: Minwoo Im 

Support Identify command for Namespace attached controller list.  This
command handler will traverse the controller instances in the given
subsystem to figure out whether the specified nsid is attached to the
controllers or not.

The 4096-byte Identify data is returned with the first entry (16 bits)
indicating the number of controller ID entries.  The data can therefore
hold up to 2047 controller ID entries.
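
The list layout described above can be sketched in standalone C as follows. This is only an illustration under stated assumptions, not the QEMU code: `CTRL_LIST_ENTRIES`, `build_ctrl_list()` and the `attached` array are hypothetical names, and entries are stored in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 4096-byte Identify buffer viewed as 16-bit entries (hypothetical name). */
#define CTRL_LIST_ENTRIES (4096 / sizeof(uint16_t))

/*
 * Fill list[] with the IDs of attached controllers that are >= min_id.
 * attached[i] != 0 means controller i is attached (an assumption made for
 * this sketch). list[0] receives the number of IDs stored after it.
 */
static uint16_t build_ctrl_list(uint16_t list[CTRL_LIST_ENTRIES],
                                const uint8_t *attached, size_t nr_ctrls,
                                uint16_t min_id)
{
    uint16_t nr_ids = 0;
    size_t id;

    memset(list, 0, CTRL_LIST_ENTRIES * sizeof(list[0]));

    /* One slot is taken by the count, so at most 2047 IDs fit. */
    for (id = min_id; id < nr_ctrls && nr_ids < CTRL_LIST_ENTRIES - 1; id++) {
        if (attached[id]) {
            list[1 + nr_ids] = (uint16_t)id;
            nr_ids++;
        }
    }

    list[0] = nr_ids;
    return nr_ids;
}
```

The count occupies one of the 2048 16-bit slots, which is where the 2047-entry limit comes from.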

Signed-off-by: Minwoo Im 
Reviewed-by: Keith Busch 
Reviewed-by: Klaus Jensen 
Tested-by: Klaus Jensen 
[k.jensen: rebased for dma refactor]
Signed-off-by: Klaus Jensen 
---
 include/block/nvme.h  |  1 +
 hw/block/nvme.c   | 41 +
 hw/block/trace-events |  1 +
 3 files changed, 43 insertions(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 7ee887022aef..372d0f2799fb 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -971,6 +971,7 @@ enum NvmeIdCns {
 NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07,
 NVME_ID_CNS_NS_PRESENT_LIST   = 0x10,
 NVME_ID_CNS_NS_PRESENT= 0x11,
+NVME_ID_CNS_NS_ATTACHED_CTRL_LIST = 0x12,
 NVME_ID_CNS_CS_NS_PRESENT_LIST= 0x1a,
 NVME_ID_CNS_CS_NS_PRESENT = 0x1b,
 NVME_ID_CNS_IO_COMMAND_SET= 0x1c,
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 159cd0ca867b..3e4401128ad8 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -3329,6 +3329,45 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, 
NvmeRequest *req, bool active)
 return NVME_INVALID_CMD_SET | NVME_DNR;
 }
 
+static uint16_t nvme_identify_ns_attached_list(NvmeCtrl *n, NvmeRequest *req)
+{
+NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
+uint16_t min_id = le16_to_cpu(c->ctrlid);
+uint16_t list[NVME_CONTROLLER_LIST_SIZE] = {};
+uint16_t *ids = &list[1];
+NvmeNamespace *ns;
+NvmeCtrl *ctrl;
+int cntlid, nr_ids = 0;
+
+trace_pci_nvme_identify_ns_attached_list(min_id);
+
+if (c->nsid == NVME_NSID_BROADCAST) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+ns = nvme_subsys_ns(n->subsys, c->nsid);
+if (!ns) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+for (cntlid = min_id; cntlid < ARRAY_SIZE(n->subsys->ctrls); cntlid++) {
+ctrl = nvme_subsys_ctrl(n->subsys, cntlid);
+if (!ctrl) {
+continue;
+}
+
+if (!nvme_ns_is_attached(ctrl, ns)) {
+continue;
+}
+
+ids[nr_ids++] = cntlid;
+}
+
+list[0] = nr_ids;
+
+return nvme_c2h(n, (uint8_t *)list, sizeof(list), req);
+}
+
 static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req,
 bool active)
 {
@@ -3531,6 +3570,8 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest 
*req)
 return nvme_identify_ns(n, req, true);
 case NVME_ID_CNS_NS_PRESENT:
 return nvme_identify_ns(n, req, false);
+case NVME_ID_CNS_NS_ATTACHED_CTRL_LIST:
+return nvme_identify_ns_attached_list(n, req);
 case NVME_ID_CNS_CS_NS:
 return nvme_identify_ns_csi(n, req, true);
 case NVME_ID_CNS_CS_NS_PRESENT:
diff --git a/hw/block/trace-events b/hw/block/trace-events
index c5dba935a0c1..ef06d2ea7470 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -66,6 +66,7 @@ pci_nvme_identify(uint16_t cid, uint8_t cns, uint16_t ctrlid, 
uint8_t csi) "cid
 pci_nvme_identify_ctrl(void) "identify controller"
 pci_nvme_identify_ctrl_csi(uint8_t csi) "identify controller, csi=0x%"PRIx8""
 pci_nvme_identify_ns(uint32_t ns) "nsid %"PRIu32""
+pci_nvme_identify_ns_attached_list(uint16_t cntid) "cntid=%"PRIu16""
 pci_nvme_identify_ns_csi(uint32_t ns, uint8_t csi) "nsid=%"PRIu32", 
csi=0x%"PRIx8""
 pci_nvme_identify_nslist(uint32_t ns) "nsid %"PRIu32""
 pci_nvme_identify_nslist_csi(uint16_t ns, uint8_t csi) "nsid=%"PRIu16", 
csi=0x%"PRIx8""
-- 
2.30.1




[PULL 34/38] hw/block/nvme: support allocated namespace type

2021-03-08 Thread Klaus Jensen
From: Minwoo Im 

NVMe spec 1.4b "6.1.5. NSID and Namespace Relationships" defines the
valid namespace types:

- Unallocated: Does not exist in the NVMe subsystem
- Allocated: Exists in the NVMe subsystem
- Inactive: Not attached to the controller
- Active: Attached to the controller

This patch adds support for the allocated but not attached namespace type:

!nvme_ns(n, nsid) && nvme_subsys_ns(n->subsys, nsid)

nvme_ns() returns the attached namespace instance for the given controller
and nvme_subsys_ns() returns the allocated namespace instance in the
subsystem.
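
The relationship between the two lookups can be sketched as a small classifier. This is a standalone illustration, not code from the patch: `enum ns_state` and `ns_classify()` are hypothetical names, and the two pointer arguments stand in for the results of nvme_ns() and nvme_subsys_ns().

```c
#include <stddef.h>

/* Hypothetical states for this sketch, following the list above. */
enum ns_state {
    NS_UNALLOCATED, /* not known to the subsystem */
    NS_INACTIVE,    /* allocated, but not attached to this controller */
    NS_ACTIVE,      /* attached to this controller */
};

/*
 * ctrl_ns:   result of the per-controller lookup (nvme_ns() in the patch)
 * subsys_ns: result of the subsystem lookup (nvme_subsys_ns() in the patch)
 */
static enum ns_state ns_classify(const void *ctrl_ns, const void *subsys_ns)
{
    if (ctrl_ns) {
        return NS_ACTIVE;
    }

    if (subsys_ns) {
        return NS_INACTIVE;
    }

    return NS_UNALLOCATED;
}
```

The `NS_INACTIVE` branch corresponds to the `!nvme_ns(n, nsid) && nvme_subsys_ns(n->subsys, nsid)` condition quoted above.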

Signed-off-by: Minwoo Im 
Reviewed-by: Keith Busch 
Reviewed-by: Klaus Jensen 
Tested-by: Klaus Jensen 
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-subsys.h | 13 +
 hw/block/nvme.c| 63 +++---
 2 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 8a0732b22316..14627f9ccb41 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -30,4 +30,17 @@ typedef struct NvmeSubsystem {
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
 int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp);
 
+/*
+ * Return allocated namespace of the specified nsid in the subsystem.
+ */
+static inline NvmeNamespace *nvme_subsys_ns(NvmeSubsystem *subsys,
+uint32_t nsid)
+{
+if (!subsys) {
+return NULL;
+}
+
+return subsys->namespaces[nsid];
+}
+
 #endif /* NVME_SUBSYS_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index bfbdc8213c2b..3bfe10f5b517 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -3253,7 +3253,7 @@ static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, 
NvmeRequest *req)
 return nvme_c2h(n, id, sizeof(id), req);
 }
 
-static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req, bool active)
 {
 NvmeNamespace *ns;
 NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
@@ -3267,7 +3267,14 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, 
NvmeRequest *req)
 
 ns = nvme_ns(n, nsid);
 if (unlikely(!ns)) {
-return nvme_rpt_empty_id_struct(n, req);
+if (!active) {
+ns = nvme_subsys_ns(n->subsys, nsid);
+if (!ns) {
+return nvme_rpt_empty_id_struct(n, req);
+}
+} else {
+return nvme_rpt_empty_id_struct(n, req);
+}
 }
 
 if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) {
@@ -3277,7 +3284,8 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest 
*req)
 return NVME_INVALID_CMD_SET | NVME_DNR;
 }
 
-static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req,
+bool active)
 {
 NvmeNamespace *ns;
 NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
@@ -3291,7 +3299,14 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, 
NvmeRequest *req)
 
 ns = nvme_ns(n, nsid);
 if (unlikely(!ns)) {
-return nvme_rpt_empty_id_struct(n, req);
+if (!active) {
+ns = nvme_subsys_ns(n->subsys, nsid);
+if (!ns) {
+return nvme_rpt_empty_id_struct(n, req);
+}
+} else {
+return nvme_rpt_empty_id_struct(n, req);
+}
 }
 
 if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) {
@@ -3304,7 +3319,8 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, 
NvmeRequest *req)
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req,
+bool active)
 {
 NvmeNamespace *ns;
 NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
@@ -3329,7 +3345,14 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
NvmeRequest *req)
 for (i = 1; i <= n->num_namespaces; i++) {
 ns = nvme_ns(n, i);
 if (!ns) {
-continue;
+if (!active) {
+ns = nvme_subsys_ns(n->subsys, i);
+if (!ns) {
+continue;
+}
+} else {
+continue;
+}
 }
 if (ns->params.nsid <= min_nsid) {
 continue;
@@ -3343,7 +3366,8 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
NvmeRequest *req)
 return nvme_c2h(n, list, data_len, req);
 }
 
-static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req,
+bool active)
 {
 NvmeNamespace *ns;
 NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
@@ -3369,7 +3393,14 @@ static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, 
NvmeRequest *req)
 for (i = 1; i <= n->num_namespaces; i++) {
 ns = nvme_ns(n, i);
 if (!ns) {
-continue;
+if (!active) {
+ns = 

[PULL 35/38] hw/block/nvme: refactor nvme_select_ns_iocs

2021-03-08 Thread Klaus Jensen
From: Minwoo Im 

This patch has no functional changes.  It refactors
nvme_select_ns_iocs() to iterate over the attached namespaces of the
controller and invoke __nvme_select_ns_iocs() for each of them.

Signed-off-by: Minwoo Im 
Reviewed-by: Keith Busch 
Reviewed-by: Klaus Jensen 
Tested-by: Klaus Jensen 
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 3bfe10f5b517..3f86da6ebc5c 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4033,6 +4033,25 @@ static void nvme_ctrl_shutdown(NvmeCtrl *n)
 }
 }
 
+static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns)
+{
+ns->iocs = nvme_cse_iocs_none;
+switch (ns->csi) {
+case NVME_CSI_NVM:
+if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) {
+ns->iocs = nvme_cse_iocs_nvm;
+}
+break;
+case NVME_CSI_ZONED:
+if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) {
+ns->iocs = nvme_cse_iocs_zoned;
+} else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) {
+ns->iocs = nvme_cse_iocs_nvm;
+}
+break;
+}
+}
+
 static void nvme_select_ns_iocs(NvmeCtrl *n)
 {
 NvmeNamespace *ns;
@@ -4043,21 +4062,8 @@ static void nvme_select_ns_iocs(NvmeCtrl *n)
 if (!ns) {
 continue;
 }
-ns->iocs = nvme_cse_iocs_none;
-switch (ns->csi) {
-case NVME_CSI_NVM:
-if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) {
-ns->iocs = nvme_cse_iocs_nvm;
-}
-break;
-case NVME_CSI_ZONED:
-if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) {
-ns->iocs = nvme_cse_iocs_zoned;
-} else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) {
-ns->iocs = nvme_cse_iocs_nvm;
-}
-break;
-}
+
+__nvme_select_ns_iocs(n, ns);
 }
 }
 
-- 
2.30.1




[PULL 28/38] hw/block/nvme: try to deal with the iov/qsg duality

2021-03-08 Thread Klaus Jensen
From: Klaus Jensen 

Introduce NvmeSg and try to deal with that pesky qsg/iov duality that
haunts all the memory-related functions.
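
The idea is a flag-tagged union: one flag records that the structure has been initialized and another records which union member is live. A minimal standalone sketch, assuming stand-in types (`struct sg_list` and `struct io_vec` are hypothetical placeholders for QEMUSGList and QEMUIOVector, and the `sg_*` names are not from the patch):

```c
#include <string.h>

enum {
    SG_ALLOC = 1 << 0, /* structure has been initialized */
    SG_DMA   = 1 << 1, /* the qsg member is live, otherwise iov */
};

struct sg_list { int nsg; };  /* placeholder for QEMUSGList */
struct io_vec  { int niov; }; /* placeholder for QEMUIOVector */

struct sg {
    int flags;

    union {
        struct sg_list qsg;
        struct io_vec  iov;
    };
};

/* Pick the live member once, at init time. */
static void sg_init(struct sg *sg, int dma)
{
    memset(sg, 0, sizeof(*sg));

    if (dma) {
        sg->flags = SG_DMA;
    }

    sg->flags |= SG_ALLOC;
}

static int sg_is_dma(const struct sg *sg)
{
    return (sg->flags & SG_DMA) != 0;
}

/* Safe to call on a structure that was never initialized (flags == 0). */
static void sg_unmap(struct sg *sg)
{
    if (!(sg->flags & SG_ALLOC)) {
        return;
    }

    /* A real implementation would destroy the live member here. */
    memset(sg, 0, sizeof(*sg));
}
```

Callers then test one flag instead of checking whether each of two separate structures happens to be populated.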

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.h |  17 -
 hw/block/nvme.c | 191 ++--
 2 files changed, 117 insertions(+), 91 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 294fac1defe3..96afefa8c9fb 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -29,6 +29,20 @@ typedef struct NvmeAsyncEvent {
 NvmeAerResult result;
 } NvmeAsyncEvent;
 
+enum {
+NVME_SG_ALLOC = 1 << 0,
+NVME_SG_DMA   = 1 << 1,
+};
+
+typedef struct NvmeSg {
+int flags;
+
+union {
+QEMUSGList   qsg;
+QEMUIOVector iov;
+};
+} NvmeSg;
+
 typedef struct NvmeRequest {
 struct NvmeSQueue   *sq;
 struct NvmeNamespace*ns;
@@ -38,8 +52,7 @@ typedef struct NvmeRequest {
 NvmeCqe cqe;
 NvmeCmd cmd;
 BlockAcctCookie acct;
-QEMUSGList  qsg;
-QEMUIOVectoriov;
+NvmeSg  sg;
 QTAILQ_ENTRY(NvmeRequest)entry;
 } NvmeRequest;
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6580f5eb1746..621e993e652e 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -432,15 +432,31 @@ static void nvme_req_clear(NvmeRequest *req)
 req->status = NVME_SUCCESS;
 }
 
-static void nvme_req_exit(NvmeRequest *req)
+static inline void nvme_sg_init(NvmeCtrl *n, NvmeSg *sg, bool dma)
 {
-if (req->qsg.sg) {
-qemu_sglist_destroy(&req->qsg);
+if (dma) {
+pci_dma_sglist_init(&sg->qsg, &n->parent_obj, 0);
+sg->flags = NVME_SG_DMA;
+} else {
+qemu_iovec_init(&sg->iov, 0);
 }
 
-if (req->iov.iov) {
-qemu_iovec_destroy(&req->iov);
+sg->flags |= NVME_SG_ALLOC;
+}
+
+static inline void nvme_sg_unmap(NvmeSg *sg)
+{
+if (!(sg->flags & NVME_SG_ALLOC)) {
+return;
 }
+
+if (sg->flags & NVME_SG_DMA) {
+qemu_sglist_destroy(&sg->qsg);
+} else {
+qemu_iovec_destroy(&sg->iov);
+}
+
+memset(sg, 0x0, sizeof(*sg));
 }
 
 static uint16_t nvme_map_addr_cmb(NvmeCtrl *n, QEMUIOVector *iov, hwaddr addr,
@@ -477,8 +493,7 @@ static uint16_t nvme_map_addr_pmr(NvmeCtrl *n, QEMUIOVector 
*iov, hwaddr addr,
 return NVME_SUCCESS;
 }
 
-static uint16_t nvme_map_addr(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
-  hwaddr addr, size_t len)
+static uint16_t nvme_map_addr(NvmeCtrl *n, NvmeSg *sg, hwaddr addr, size_t len)
 {
 bool cmb = false, pmr = false;
 
@@ -495,38 +510,31 @@ static uint16_t nvme_map_addr(NvmeCtrl *n, QEMUSGList 
*qsg, QEMUIOVector *iov,
 }
 
 if (cmb || pmr) {
-if (qsg && qsg->sg) {
+if (sg->flags & NVME_SG_DMA) {
 return NVME_INVALID_USE_OF_CMB | NVME_DNR;
 }
 
-assert(iov);
-
-if (!iov->iov) {
-qemu_iovec_init(iov, 1);
-}
-
 if (cmb) {
-return nvme_map_addr_cmb(n, iov, addr, len);
+return nvme_map_addr_cmb(n, &sg->iov, addr, len);
 } else {
-return nvme_map_addr_pmr(n, iov, addr, len);
+return nvme_map_addr_pmr(n, &sg->iov, addr, len);
 }
 }
 
-if (iov && iov->iov) {
+if (!(sg->flags & NVME_SG_DMA)) {
 return NVME_INVALID_USE_OF_CMB | NVME_DNR;
 }
 
-assert(qsg);
-
-if (!qsg->sg) {
-pci_dma_sglist_init(qsg, &n->parent_obj, 1);
-}
-
-qemu_sglist_add(qsg, addr, len);
+qemu_sglist_add(&sg->qsg, addr, len);
 
 return NVME_SUCCESS;
 }
 
+static inline bool nvme_addr_is_dma(NvmeCtrl *n, hwaddr addr)
+{
+return !(nvme_addr_is_cmb(n, addr) || nvme_addr_is_pmr(n, addr));
+}
+
 static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, uint64_t prp2,
  uint32_t len, NvmeRequest *req)
 {
@@ -536,20 +544,13 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 uint16_t status;
 int ret;
 
-QEMUSGList *qsg = &req->qsg;
-QEMUIOVector *iov = &req->iov;
-
 trace_pci_nvme_map_prp(trans_len, len, prp1, prp2, num_prps);
 
-if (nvme_addr_is_cmb(n, prp1) || (nvme_addr_is_pmr(n, prp1))) {
-qemu_iovec_init(iov, num_prps);
-} else {
-pci_dma_sglist_init(qsg, &n->parent_obj, num_prps);
-}
+nvme_sg_init(n, &req->sg, nvme_addr_is_dma(n, prp1));
 
-status = nvme_map_addr(n, qsg, iov, prp1, trans_len);
+status = nvme_map_addr(n, &req->sg, prp1, trans_len);
 if (status) {
-return status;
+goto unmap;
 }
 
 len -= trans_len;
@@ -564,7 +565,8 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 ret = nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
 if (ret) {
 trace_pci_nvme_err_addr_read(prp2);
-return NVME_DATA_TRAS_ERROR;
+status = NVME_DATA_TRAS_ERROR;
+goto unmap;
 }
 

[PULL 37/38] hw/block/nvme: support changed namespace asynchronous event

2021-03-08 Thread Klaus Jensen
From: Minwoo Im 

If the namespace inventory changes for some reason (e.g., namespace
attachment/detachment), the controller can send an event notification to
the host so that it can manage its namespaces.

This patch sends the AEN to the host after namespaces are attached to or
detached from controllers.  To support clearing the event from the
controller, this patch also implements the Get Log Page command for the
Changed Namespace List log type.  To return the namespace ID list through
the command, the ID is added to the per-controller list (changed_ns_list)
whenever the namespace inventory is updated.

To indicate support for this async event, this patch sets
OAES (Optional Asynchronous Events Supported) in the Identify Controller
data structure.
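
How the Changed Namespace List log page is filled can be sketched in standalone C. This is a simplified illustration, not the code from this patch: `build_changed_nslist()`, `LOG_ENTRIES` and the byte-per-NSID `changed` array are assumptions made for the sketch (the patch itself uses a bitmap), and unlike the patch the sketch always clears the pending state when the log is read.

```c
#include <stdint.h>
#include <string.h>

#define LOG_ENTRIES 1024 /* number of 32-bit entries in the log page */

/*
 * Drain the pending-change flags into log[].
 *
 * changed[] is indexed by NSID (1-based, index 0 unused) and must hold
 * max_nsid + 1 bytes. If more than LOG_ENTRIES namespaces changed, the
 * first entry is set to 0xffffffff and the rest of the page is zeroed.
 */
static void build_changed_nslist(uint32_t log[LOG_ENTRIES],
                                 uint8_t *changed, uint32_t max_nsid)
{
    uint32_t nsid, i = 0;

    memset(log, 0, LOG_ENTRIES * sizeof(log[0]));

    for (nsid = 1; nsid <= max_nsid; nsid++) {
        if (!changed[nsid]) {
            continue;
        }

        if (i == LOG_ENTRIES) {
            /* Overflow: report "too many" instead of a partial list. */
            memset(log, 0, LOG_ENTRIES * sizeof(log[0]));
            log[0] = 0xffffffff;
            break;
        }

        log[i++] = nsid;
    }

    /* Reading the log consumes all pending changes in this sketch. */
    memset(changed, 0, (size_t)max_nsid + 1);
}
```

NSIDs are reported in ascending order because the flags are scanned from the lowest NSID upwards.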

Signed-off-by: Minwoo Im 
Reviewed-by: Keith Busch 
Reviewed-by: Klaus Jensen 
Tested-by: Klaus Jensen 
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-ns.h   |  1 +
 hw/block/nvme.h  |  4 
 include/block/nvme.h |  7 ++
 hw/block/nvme.c  | 56 
 4 files changed, 68 insertions(+)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index b0c00e115d81..318d3aebe1a8 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -53,6 +53,7 @@ typedef struct NvmeNamespace {
 uint8_t  csi;
 
 NvmeSubsystem   *subsys;
+QTAILQ_ENTRY(NvmeNamespace) entry;
 
 NvmeIdNsZoned   *id_ns_zoned;
 NvmeZone*zone_array;
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 1287bc2cd17a..4955d649c7d4 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -192,6 +192,10 @@ typedef struct NvmeCtrl {
 
 uint32_tdmrsl;
 
+/* Namespace ID is started with 1 so bitmap should be 1-based */
+#define NVME_CHANGED_NSID_SIZE  (NVME_MAX_NAMESPACES + 1)
+DECLARE_BITMAP(changed_nsids, NVME_CHANGED_NSID_SIZE);
+
 NvmeSubsystem   *subsys;
 
 NvmeNamespace   namespace;
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 03471a4d5abd..7ee887022aef 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -760,6 +760,7 @@ typedef struct QEMU_PACKED NvmeCopySourceRange {
 enum NvmeAsyncEventRequest {
 NVME_AER_TYPE_ERROR = 0,
 NVME_AER_TYPE_SMART = 1,
+NVME_AER_TYPE_NOTICE= 2,
 NVME_AER_TYPE_IO_SPECIFIC   = 6,
 NVME_AER_TYPE_VENDOR_SPECIFIC   = 7,
 NVME_AER_INFO_ERR_INVALID_DB_REGISTER   = 0,
@@ -771,6 +772,7 @@ enum NvmeAsyncEventRequest {
 NVME_AER_INFO_SMART_RELIABILITY = 0,
 NVME_AER_INFO_SMART_TEMP_THRESH = 1,
 NVME_AER_INFO_SMART_SPARE_THRESH= 2,
+NVME_AER_INFO_NOTICE_NS_ATTR_CHANGED= 0,
 };
 
 typedef struct QEMU_PACKED NvmeAerResult {
@@ -940,6 +942,7 @@ enum NvmeLogIdentifier {
 NVME_LOG_ERROR_INFO = 0x01,
 NVME_LOG_SMART_INFO = 0x02,
 NVME_LOG_FW_SLOT_INFO   = 0x03,
+NVME_LOG_CHANGED_NSLIST = 0x04,
 NVME_LOG_CMD_EFFECTS= 0x05,
 };
 
@@ -1056,6 +1059,10 @@ typedef struct NvmeIdCtrlNvm {
 uint8_t rsvd16[4080];
 } NvmeIdCtrlNvm;
 
+enum NvmeIdCtrlOaes {
+NVME_OAES_NS_ATTR   = 1 << 8,
+};
+
 enum NvmeIdCtrlOacs {
 NVME_OACS_SECURITY  = 1 << 0,
 NVME_OACS_FORMAT= 1 << 1,
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index fc38c3e4629d..159cd0ca867b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -3006,6 +3006,48 @@ static uint16_t nvme_error_info(NvmeCtrl *n, uint8_t 
rae, uint32_t buf_len,
 return nvme_c2h(n, (uint8_t *), trans_len, req);
 }
 
+static uint16_t nvme_changed_nslist(NvmeCtrl *n, uint8_t rae, uint32_t buf_len,
+uint64_t off, NvmeRequest *req)
+{
+uint32_t nslist[1024];
+uint32_t trans_len;
+int i = 0;
+uint32_t nsid;
+
+memset(nslist, 0x0, sizeof(nslist));
+trans_len = MIN(sizeof(nslist) - off, buf_len);
+
+while ((nsid = find_first_bit(n->changed_nsids, NVME_CHANGED_NSID_SIZE)) !=
+NVME_CHANGED_NSID_SIZE) {
+/*
+ * If more than 1024 namespaces, the first entry in the log page should
+ * be set to 0xffffffff and the others to 0 as spec.
+ */
+if (i == ARRAY_SIZE(nslist)) {
+memset(nslist, 0x0, sizeof(nslist));
+nslist[0] = 0xffffffff;
+break;
+}
+
+nslist[i++] = nsid;
+clear_bit(nsid, n->changed_nsids);
+}
+
+/*
+ * Remove all the remaining list entries in case returns directly due to
+ * more than 1024 namespaces.
+ */
+if (nslist[0] == 0xffffffff) {
+bitmap_zero(n->changed_nsids, NVME_CHANGED_NSID_SIZE);
+}
+
+if (!rae) {
+nvme_clear_events(n, NVME_AER_TYPE_NOTICE);
+}
+
+return nvme_c2h(n, ((uint8_t *)nslist) + off, trans_len, req);
+}
+
 static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint8_t csi, uint32_t buf_len,
  uint64_t off, NvmeRequest *req)
 {
@@ -3089,6 +3131,8 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest 

[PULL 31/38] hw/block/nvme: support namespace detach

2021-03-08 Thread Klaus Jensen
From: Minwoo Im 

Now that the nvme-subsys device is supported, we can manage namespaces
that are allocated but not attached, i.e. detached.  This patch introduces
a parameter for the nvme-ns device named 'detached'.  This parameter
indicates whether the given namespace device is detached from
an entire NVMe subsystem ('subsys' given, shared namespace) or from a
controller ('bus' given, private namespace).

- Allocated namespace

  1) Shared ns in the subsystem 'subsys0':

 -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,subsys=subsys0,detached=true

  2) Private ns for the controller 'nvme0' of the subsystem 'subsys0':

 -device nvme-subsys,id=subsys0
 -device nvme,serial=foo,id=nvme0,subsys=subsys0
 -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,bus=nvme0,detached=true

  3) (Invalid case) Controller 'nvme0' has no subsystem to manage ns:

 -device nvme,serial=foo,id=nvme0
 -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,bus=nvme0,detached=true

Signed-off-by: Minwoo Im 
Reviewed-by: Keith Busch 
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-ns.h |  1 +
 hw/block/nvme-subsys.h |  1 +
 hw/block/nvme.h| 22 ++
 hw/block/nvme-ns.c |  1 +
 hw/block/nvme.c| 41 +++--
 5 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 7af6884862b5..b0c00e115d81 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -26,6 +26,7 @@ typedef struct NvmeZone {
 } NvmeZone;
 
 typedef struct NvmeNamespaceParams {
+bool detached;
 uint32_t nsid;
 QemuUUID uuid;
 
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index ccf6a71398d3..890d118117dc 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -23,6 +23,7 @@ typedef struct NvmeSubsystem {
 uint8_t subnqn[256];
 
 NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS];
+/* Allocated namespaces for this subsystem */
 NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES];
 } NvmeSubsystem;
 
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 96afefa8c9fb..cd8d40634411 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -189,6 +189,10 @@ typedef struct NvmeCtrl {
 NvmeSubsystem   *subsys;
 
 NvmeNamespace   namespace;
+/*
+ * Attached namespaces to this controller.  If subsys is not given, all
+ * namespaces in this list will always be attached.
+ */
 NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES];
 NvmeSQueue  **sq;
 NvmeCQueue  **cq;
@@ -207,6 +211,24 @@ static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t 
nsid)
 return n->namespaces[nsid - 1];
 }
 
+static inline bool nvme_ns_is_attached(NvmeCtrl *n, NvmeNamespace *ns)
+{
+int nsid;
+
+for (nsid = 1; nsid <= n->num_namespaces; nsid++) {
+if (nvme_ns(n, nsid) == ns) {
+return true;
+}
+}
+
+return false;
+}
+
+static inline void nvme_ns_attach(NvmeCtrl *n, NvmeNamespace *ns)
+{
+n->namespaces[nvme_nsid(ns) - 1] = ns;
+}
+
 static inline NvmeCQueue *nvme_cq(NvmeRequest *req)
 {
 NvmeSQueue *sq = req->sq;
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 0e8760020483..eda6a0c003a4 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -399,6 +399,7 @@ static Property nvme_ns_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
 DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS,
  NvmeSubsystem *),
+DEFINE_PROP_BOOL("detached", NvmeNamespace, params.detached, false),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
 DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
 DEFINE_PROP_UINT16("mssrl", NvmeNamespace, params.mssrl, 128),
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 22bd8403496b..bfbdc8213c2b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -23,7 +23,7 @@
  *  max_ioqpairs=, \
  *  aerl=,aer_max_queued=, \
  *  mdts=,zoned.zasl=, \
- *  subsys= \
+ *  subsys=,detached=
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=, \
  *  subsys=
@@ -82,6 +82,13 @@
  *   controllers in the subsystem. Otherwise, `bus` must be given to attach
  *   this namespace to a specified single controller as a non-shared namespace.
  *
+ * - `detached`
+ *   Do not attach the namespace device to controllers in the NVMe subsystem
+ *   during boot-up. If not given, namespaces are all attached to all
+ *   controllers in the subsystem by default.
+ *   It's mutually exclusive with the 'bus' parameter. It's only valid in case
+ *   `subsys` is provided.
+ *
  * Setting `zoned` to true selects Zoned Command Set at the namespace.
  * In this case, the following namespace properties are available to configure
  * zoned operation:
@@ -4646,6 +4653,20 @@ static void nvme_init_state(NvmeCtrl *n)
 n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
 }
 

[PULL 21/38] hw/block/nvme: add identify trace event

2021-03-08 Thread Klaus Jensen
From: Gollu Appalanaidu 

Add a trace event for the Identify command.

Signed-off-by: Gollu Appalanaidu 
Signed-off-by: Klaus Jensen 
Reviewed-by: Minwoo Im 
---
 hw/block/nvme.c   | 3 +++
 hw/block/trace-events | 1 +
 2 files changed, 4 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index ba07e6deef5f..478168de6eab 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -3413,6 +3413,9 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest 
*req)
 {
 NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
 
+trace_pci_nvme_identify(nvme_cid(req), c->cns, le16_to_cpu(c->ctrlid),
+c->csi);
+
 switch (c->cns) {
 case NVME_ID_CNS_NS:
  /* fall through */
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 25ba51ea5405..c165ee2a97c3 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -61,6 +61,7 @@ pci_nvme_create_sq(uint64_t addr, uint16_t sqid, uint16_t 
cqid, uint16_t qsize,
 pci_nvme_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t 
size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", 
cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d"
 pci_nvme_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16""
 pci_nvme_del_cq(uint16_t cqid) "deleted completion queue, cqid=%"PRIu16""
+pci_nvme_identify(uint16_t cid, uint8_t cns, uint16_t ctrlid, uint8_t csi) 
"cid %"PRIu16" cns 0x%"PRIx8" ctrlid %"PRIu16" csi 0x%"PRIx8""
 pci_nvme_identify_ctrl(void) "identify controller"
 pci_nvme_identify_ctrl_csi(uint8_t csi) "identify controller, csi=0x%"PRIx8""
 pci_nvme_identify_ns(uint32_t ns) "nsid %"PRIu32""
-- 
2.30.1




[PULL 27/38] hw/block/nvme: fix strerror printing

2021-03-08 Thread Klaus Jensen
From: Klaus Jensen 

Fix missing sign inversion.
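
For context, completion callbacks in this code receive failures as negative errno values, while strerror() expects the positive errno. A minimal standalone sketch of the corrected pattern (the helper name `errno_text()` is hypothetical):

```c
#include <errno.h>
#include <string.h>

/*
 * ret follows the "negative errno" convention (for example -EIO), so the
 * sign must be flipped before the value is handed to strerror().
 */
static const char *errno_text(int ret)
{
    return strerror(-ret);
}
```

Passing the negative value straight to strerror() typically yields an "Unknown error" style message instead of the intended description.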

Signed-off-by: Klaus Jensen 
Reviewed-by: Minwoo Im 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 148ad3dd01e1..6580f5eb1746 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1155,7 +1155,7 @@ static void nvme_aio_err(NvmeRequest *req, int ret)
 break;
 }
 
-trace_pci_nvme_err_aio(nvme_cid(req), strerror(ret), status);
+trace_pci_nvme_err_aio(nvme_cid(req), strerror(-ret), status);
 
 error_setg_errno(&local_err, -ret, "aio failed");
 error_report_err(local_err);
-- 
2.30.1




[PULL 18/38] hw/block/nvme: deduplicate bad mdts trace event

2021-03-08 Thread Klaus Jensen
From: Klaus Jensen 

If mdts is exceeded, trace it from a single place.
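
The check being traced compares the transfer length against the page size shifted left by mdts, with mdts == 0 meaning "no limit". A standalone sketch under that reading (the function name `exceeds_mdts()` and its parameter types are assumptions, not the QEMU signature):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return non-zero when len exceeds the maximum data transfer size.
 * The limit is page_size * 2^mdts; mdts == 0 disables the check.
 */
static int exceeds_mdts(size_t len, uint32_t page_size, uint8_t mdts)
{
    return mdts && len > ((size_t)page_size << mdts);
}
```

With a 4096-byte page and mdts set to 7, the limit works out to 512 KiB.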

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c   | 6 +-
 hw/block/trace-events | 2 +-
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6a27b28f2c2d..25a7726ca05b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1075,6 +1075,7 @@ static inline uint16_t nvme_check_mdts(NvmeCtrl *n, 
size_t len)
 uint8_t mdts = n->params.mdts;
 
 if (mdts && len > n->page_size << mdts) {
+trace_pci_nvme_err_mdts(len);
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
@@ -1945,7 +1946,6 @@ static uint16_t nvme_compare(NvmeCtrl *n, NvmeRequest 
*req)
 
 status = nvme_check_mdts(n, len);
 if (status) {
-trace_pci_nvme_err_mdts(nvme_cid(req), len);
 return status;
 }
 
@@ -2048,7 +2048,6 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
 
 status = nvme_check_mdts(n, data_size);
 if (status) {
-trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
 goto invalid;
 }
 
@@ -2116,7 +2115,6 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest 
*req, bool append,
 if (!wrz) {
 status = nvme_check_mdts(n, data_size);
 if (status) {
-trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
 goto invalid;
 }
 }
@@ -2610,7 +2608,6 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, 
NvmeRequest *req)
 
 status = nvme_check_mdts(n, data_size);
 if (status) {
-trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
 return status;
 }
 
@@ -3052,7 +3049,6 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest 
*req)
 
 status = nvme_check_mdts(n, len);
 if (status) {
-trace_pci_nvme_err_mdts(nvme_cid(req), len);
 return status;
 }
 
diff --git a/hw/block/trace-events b/hw/block/trace-events
index b04f7a3e1890..e1a85661cf3f 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -114,7 +114,7 @@ pci_nvme_clear_ns_close(uint32_t state, uint64_t slba) 
"zone state=%"PRIu32", sl
 pci_nvme_clear_ns_reset(uint32_t state, uint64_t slba) "zone state=%"PRIu32", 
slba=%"PRIu64" transitioned to Empty state"
 
 # nvme traces for error conditions
-pci_nvme_err_mdts(uint16_t cid, size_t len) "cid %"PRIu16" len %zu"
+pci_nvme_err_mdts(size_t len) "len %zu"
 pci_nvme_err_req_status(uint16_t cid, uint32_t nsid, uint16_t status, uint8_t 
opc) "cid %"PRIu16" nsid %"PRIu32" status 0x%"PRIx16" opc 0x%"PRIx8""
 pci_nvme_err_addr_read(uint64_t addr) "addr 0x%"PRIx64""
 pci_nvme_err_addr_write(uint64_t addr) "addr 0x%"PRIx64""
-- 
2.30.1




[PULL 26/38] hw/block/nvme: remove block accounting for write zeroes

2021-03-08 Thread Klaus Jensen
From: Klaus Jensen 

Write Zeroes commands should not be counted in either the 'Data Units
Written' or the 'Host Write Commands' field of the SMART/Health
Information log page.

Signed-off-by: Klaus Jensen 
Reviewed-by: Minwoo Im 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 8b84342d72a8..148ad3dd01e1 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2172,7 +2172,6 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest 
*req, bool append,
  nvme_rw_cb, req);
 }
 } else {
-block_acct_start(blk_get_stats(blk), &req->acct, 0, BLOCK_ACCT_WRITE);
 req->aiocb = blk_aio_pwrite_zeroes(blk, data_offset, data_size,
BDRV_REQ_MAY_UNMAP, nvme_rw_cb,
req);
-- 
2.30.1




[PULL 23/38] hw/block/nvme: add trace event for zone read check

2021-03-08 Thread Klaus Jensen
From: Gollu Appalanaidu 

Add a trace event for the offline zone condition when checking zone
read.

Signed-off-by: Gollu Appalanaidu 
[k.jensen: split commit]
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 7b4adb906fb4..961507cae28a 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1246,6 +1246,7 @@ static uint16_t nvme_check_zone_state_for_read(NvmeZone 
*zone)
 case NVME_ZONE_STATE_READ_ONLY:
 return NVME_SUCCESS;
 case NVME_ZONE_STATE_OFFLINE:
+trace_pci_nvme_err_zone_is_offline(zone->d.zslba);
 return NVME_ZONE_OFFLINE;
 default:
 assert(false);
-- 
2.30.1




[PULL 25/38] hw/block/nvme: remove redundant len member in compare context

2021-03-08 Thread Klaus Jensen
From: Klaus Jensen 

The 'len' member of the nvme_compare_ctx struct is redundant since the
same information is available in the 'iov' member.

Signed-off-by: Klaus Jensen 
Reviewed-by: Minwoo Im 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 0f6400cd7274..8b84342d72a8 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1694,7 +1694,6 @@ static void nvme_aio_copy_in_cb(void *opaque, int ret)
 struct nvme_compare_ctx {
 QEMUIOVector iov;
 uint8_t *bounce;
-size_t len;
 };
 
 static void nvme_compare_cb(void *opaque, int ret)
@@ -1715,16 +1714,16 @@ static void nvme_compare_cb(void *opaque, int ret)
 goto out;
 }
 
-buf = g_malloc(ctx->len);
+buf = g_malloc(ctx->iov.size);
 
-status = nvme_dma(nvme_ctrl(req), buf, ctx->len, DMA_DIRECTION_TO_DEVICE,
-  req);
+status = nvme_dma(nvme_ctrl(req), buf, ctx->iov.size,
+  DMA_DIRECTION_TO_DEVICE, req);
 if (status) {
 req->status = status;
 goto out;
 }
 
-if (memcmp(buf, ctx->bounce, ctx->len)) {
+if (memcmp(buf, ctx->bounce, ctx->iov.size)) {
 req->status = NVME_CMP_FAILURE;
 }
 
@@ -1965,7 +1964,6 @@ static uint16_t nvme_compare(NvmeCtrl *n, NvmeRequest 
*req)
 
 ctx = g_new(struct nvme_compare_ctx, 1);
 ctx->bounce = bounce;
-ctx->len = len;
 
 req->opaque = ctx;
 
-- 
2.30.1




[PULL 22/38] hw/block/nvme: fix potential compilation error

2021-03-08 Thread Klaus Jensen
From: Gollu Appalanaidu 

assert may be compiled to a noop and we could end up returning an
uninitialized status.

Fix this by always returning Internal Device Error as a fallback.

Note that, as pointed out by Philippe, per commit 262a69f4282 ("osdep.h:
Prohibit disabling assert() in supported builds") this shouldn't be
possible. But clean it up so we don't worry about it again.

Signed-off-by: Gollu Appalanaidu 
[k.jensen: split commit]
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 478168de6eab..7b4adb906fb4 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1237,8 +1237,6 @@ static uint16_t nvme_check_zone_write(NvmeNamespace *ns, 
NvmeZone *zone,
 
 static uint16_t nvme_check_zone_state_for_read(NvmeZone *zone)
 {
-uint16_t status;
-
 switch (nvme_get_zone_state(zone)) {
 case NVME_ZONE_STATE_EMPTY:
 case NVME_ZONE_STATE_IMPLICITLY_OPEN:
@@ -1246,16 +1244,14 @@ static uint16_t nvme_check_zone_state_for_read(NvmeZone 
*zone)
 case NVME_ZONE_STATE_FULL:
 case NVME_ZONE_STATE_CLOSED:
 case NVME_ZONE_STATE_READ_ONLY:
-status = NVME_SUCCESS;
-break;
+return NVME_SUCCESS;
 case NVME_ZONE_STATE_OFFLINE:
-status = NVME_ZONE_OFFLINE;
-break;
+return NVME_ZONE_OFFLINE;
 default:
 assert(false);
 }
 
-return status;
+return NVME_INTERNAL_DEV_ERROR;
 }
 
 static uint16_t nvme_check_zone_read(NvmeNamespace *ns, uint64_t slba,
-- 
2.30.1




[PULL 30/38] hw/block/nvme: refactor nvme_dma

2021-03-08 Thread Klaus Jensen
From: Klaus Jensen 

The nvme_dma function doesn't just do DMA (QEMUSGList-based) memory transfers;
it also handles QEMUIOVector copies.

Introduce the NvmeTxDirection enum and rename to nvme_tx. Remove mapping
of PRPs/SGLs from nvme_tx and instead assert that they have been mapped
previously. This allows more fine-grained use in subsequent patches.

Add new (better named) helpers, nvme_{c2h,h2c}, that do both PRP/SGL
mapping and transfer.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 138 ++--
 1 file changed, 76 insertions(+), 62 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index fb0bc971704f..22bd8403496b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -862,45 +862,71 @@ static uint16_t nvme_map_dptr(NvmeCtrl *n, NvmeSg *sg, 
size_t len,
 }
 }
 
-static uint16_t nvme_dma(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
- DMADirection dir, NvmeRequest *req)
+typedef enum NvmeTxDirection {
+NVME_TX_DIRECTION_TO_DEVICE   = 0,
+NVME_TX_DIRECTION_FROM_DEVICE = 1,
+} NvmeTxDirection;
+
+static uint16_t nvme_tx(NvmeCtrl *n, NvmeSg *sg, uint8_t *ptr, uint32_t len,
+NvmeTxDirection dir)
 {
-uint16_t status = NVME_SUCCESS;
+assert(sg->flags & NVME_SG_ALLOC);
+
+if (sg->flags & NVME_SG_DMA) {
+uint64_t residual;
+
+if (dir == NVME_TX_DIRECTION_TO_DEVICE) {
+residual = dma_buf_write(ptr, len, &sg->qsg);
+} else {
+residual = dma_buf_read(ptr, len, &sg->qsg);
+}
+
+if (unlikely(residual)) {
+trace_pci_nvme_err_invalid_dma();
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+} else {
+size_t bytes;
+
+if (dir == NVME_TX_DIRECTION_TO_DEVICE) {
+bytes = qemu_iovec_to_buf(&sg->iov, 0, ptr, len);
+} else {
+bytes = qemu_iovec_from_buf(&sg->iov, 0, ptr, len);
+}
+
+if (unlikely(bytes != len)) {
+trace_pci_nvme_err_invalid_dma();
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+}
+
+return NVME_SUCCESS;
+}
+
+static inline uint16_t nvme_c2h(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
+NvmeRequest *req)
+{
+uint16_t status;
 
 status = nvme_map_dptr(n, &req->sg, len, &req->cmd);
 if (status) {
 return status;
 }
 
-if (req->sg.flags & NVME_SG_DMA) {
-uint64_t residual;
+return nvme_tx(n, &req->sg, ptr, len, NVME_TX_DIRECTION_FROM_DEVICE);
+}
 
-if (dir == DMA_DIRECTION_TO_DEVICE) {
-residual = dma_buf_write(ptr, len, &req->sg.qsg);
-} else {
-residual = dma_buf_read(ptr, len, &req->sg.qsg);
-}
+static inline uint16_t nvme_h2c(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
+NvmeRequest *req)
+{
+uint16_t status;
 
-if (unlikely(residual)) {
-trace_pci_nvme_err_invalid_dma();
-status = NVME_INVALID_FIELD | NVME_DNR;
-}
-} else {
-size_t bytes;
-
-if (dir == DMA_DIRECTION_TO_DEVICE) {
-bytes = qemu_iovec_to_buf(&req->sg.iov, 0, ptr, len);
-} else {
-bytes = qemu_iovec_from_buf(&req->sg.iov, 0, ptr, len);
-}
-
-if (unlikely(bytes != len)) {
-trace_pci_nvme_err_invalid_dma();
-status = NVME_INVALID_FIELD | NVME_DNR;
-}
+status = nvme_map_dptr(n, &req->sg, len, &req->cmd);
+if (status) {
+return status;
 }
 
-return status;
+return nvme_tx(n, &req->sg, ptr, len, NVME_TX_DIRECTION_TO_DEVICE);
 }
 
 static inline void nvme_blk_read(BlockBackend *blk, int64_t offset,
@@ -1737,8 +1763,7 @@ static void nvme_compare_cb(void *opaque, int ret)
 
 buf = g_malloc(ctx->iov.size);
 
-status = nvme_dma(nvme_ctrl(req), buf, ctx->iov.size,
-  DMA_DIRECTION_TO_DEVICE, req);
+status = nvme_h2c(nvme_ctrl(req), buf, ctx->iov.size, req);
 if (status) {
 req->status = status;
 goto out;
@@ -1774,8 +1799,7 @@ static uint16_t nvme_dsm(NvmeCtrl *n, NvmeRequest *req)
 NvmeDsmRange range[nr];
 uintptr_t *discards = (uintptr_t *)&req->opaque;
 
-status = nvme_dma(n, (uint8_t *)range, sizeof(range),
-  DMA_DIRECTION_TO_DEVICE, req);
+status = nvme_h2c(n, (uint8_t *)range, sizeof(range), req);
 if (status) {
 return status;
 }
@@ -1861,8 +1885,8 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
 
 range = g_new(NvmeCopySourceRange, nr);
 
-status = nvme_dma(n, (uint8_t *)range, nr * sizeof(NvmeCopySourceRange),
-  DMA_DIRECTION_TO_DEVICE, req);
+status = nvme_h2c(n, (uint8_t *)range, nr * sizeof(NvmeCopySourceRange),
+  req);
 if (status) {
 return status;
 }
@@ -2513,8 +2537,7 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, 
NvmeRequest *req)
 

[PULL 17/38] hw/block/nvme: document 'mdts' nvme device parameter

2021-03-08 Thread Klaus Jensen
From: Klaus Jensen 

Document the 'mdts' nvme device parameter.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 1cd82fa3c9fe..6a27b28f2c2d 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -63,6 +63,12 @@
  *   completion when there are no outstanding AERs. When the maximum number of
  *   enqueued events are reached, subsequent events will be dropped.
  *
+ * - `mdts`
+ *   Indicates the maximum data transfer size for a command that transfers data
+ *   between host-accessible memory and the controller. The value is specified
+ *   as a power of two (2^n) and is in units of the minimum memory page size
+ *   (CAP.MPSMIN). The default value is 7 (i.e. 512 KiB).
+ *
  * - `zoned.append_size_limit`
  *   The maximum I/O size in bytes that is allowed in Zone Append command.
  *   The default is 128KiB. Since internally this this value is maintained as
-- 
2.30.1
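As a quick check of the arithmetic in the documented default, the sketch below converts an mdts value into bytes. The helper name mdts_to_bytes and the 4 KiB minimum page size are assumptions made for the example; the default of 7 giving 512 KiB only holds for that page size.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum transfer size in bytes: (2^mdts) * minimum memory page size. */
static uint64_t mdts_to_bytes(uint8_t mdts, uint64_t mpsmin_bytes)
{
    /* An mdts of 0 is treated as "no limit" and reported here as 0. */
    if (mdts == 0) {
        return 0;
    }

    /* Keep the shift well-defined for this sketch. */
    assert(mdts < 32);

    return mpsmin_bytes << mdts;
}
```

With these assumptions, mdts_to_bytes(7, 4096) is 4096 << 7, that is 524288 bytes or 512 KiB.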




[PULL 19/38] hw/block/nvme: align zoned.zasl with mdts

2021-03-08 Thread Klaus Jensen
From: Klaus Jensen 

ZASL (Zone Append Size Limit) is defined exactly like MDTS (Maximum Data
Transfer Size), that is, it is a value in units of the minimum memory
page size (CAP.MPSMIN) and is reported as a power of two.

The 'mdts' nvme device parameter is specified as in the spec, but the
'zoned.append_size_limit' parameter is specified in bytes. This is
suboptimal for a number of reasons:

  1. It is just plain confusing wrt. the definition of mdts.
  2. There is a lot of complexity involved in validating the value; it
 must be a power of two, it should be larger than 4k, if it is zero
 we set it internally to mdts, but still report it as zero.
  3. While "hw/block/nvme: improve invalid zasl value reporting"
 slightly improved the handling of the parameter, the validation is
 still wrong; it does not depend on CC.MPS, it depends on
 CAP.MPSMIN. And we are not even checking that it is actually less
 than or equal to MDTS, which is kinda the *one* condition it must
 satisfy.

Fix this by defining zasl exactly like mdts and checking the one thing
that it must satisfy (that it is less than or equal to mdts). Also,
change the default value from 128KiB to 0 (aka, whatever mdts is).
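To make the new definition concrete, here is a minimal standalone sketch of the two checks described above. The function names are hypothetical and this is not the QEMU code; it only assumes that zasl and mdts share the same units (powers of two of the minimum page size) and that a zasl of 0 means "use mdts".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The one constraint on the parameter: zasl must not exceed mdts. */
static bool zasl_is_valid(uint8_t zasl, uint8_t mdts)
{
    return zasl <= mdts;
}

/*
 * Per-command check for Zone Append. A zasl of 0 disables this check;
 * the command is then limited by mdts alone (checked elsewhere).
 */
static bool append_too_large(uint64_t data_size, uint64_t page_size,
                             uint8_t zasl)
{
    /* Keep the shift well-defined for this sketch. */
    assert(zasl < 32);

    return zasl != 0 && data_size > (page_size << zasl);
}
```

For example, with a 4 KiB page size and zasl=1 the limit is 8 KiB, so an 8193-byte append is rejected while an 8192-byte one passes.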

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.h   |  4 +--
 hw/block/nvme.c   | 59 +--
 hw/block/trace-events |  2 +-
 3 files changed, 19 insertions(+), 46 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index cb2b5175f1a1..f45ace0cff5b 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -20,7 +20,7 @@ typedef struct NvmeParams {
 uint32_t aer_max_queued;
 uint8_t  mdts;
 bool use_intel_id;
-uint32_t zasl_bs;
+uint8_t  zasl;
 bool legacy_cmb;
 } NvmeParams;
 
@@ -171,8 +171,6 @@ typedef struct NvmeCtrl {
 QTAILQ_HEAD(, NvmeAsyncEvent) aer_queue;
 int aer_queued;
 
-uint8_t zasl;
-
 NvmeSubsystem   *subsys;
 
 NvmeNamespace   namespace;
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 25a7726ca05b..01be8a1620be 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -21,8 +21,8 @@
  *  cmb_size_mb=, \
  *  [pmrdev=,] \
  *  max_ioqpairs=, \
- *  aerl=, aer_max_queued=, \
- *  mdts=,zoned.append_size_limit=, \
+ *  aerl=,aer_max_queued=, \
+ *  mdts=,zoned.zasl=, \
  *  subsys= \
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=, \
@@ -69,13 +69,11 @@
  *   as a power of two (2^n) and is in units of the minimum memory page size
  *   (CAP.MPSMIN). The default value is 7 (i.e. 512 KiB).
  *
- * - `zoned.append_size_limit`
- *   The maximum I/O size in bytes that is allowed in Zone Append command.
- *   The default is 128KiB. Since internally this this value is maintained as
- *   ZASL = log2( / ), some values assigned
- *   to this property may be rounded down and result in a lower maximum ZA
- *   data size being in effect. By setting this property to 0, users can make
- *   ZASL to be equal to MDTS. This property only affects zoned namespaces.
+ * - `zoned.zasl`
+ *   Indicates the maximum data transfer size for the Zone Append command. Like
+ *   `mdts`, the value is specified as a power of two (2^n) and is in units of
+ *   the minimum memory page size (CAP.MPSMIN). The default value is 0 (i.e.
+ *   defaulting to the value of `mdts`).
  *
  * nvme namespace device parameters
  * 
@@ -2135,10 +2133,9 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest 
*req, bool append,
 goto invalid;
 }
 
-if (nvme_l2b(ns, nlb) > (n->page_size << n->zasl)) {
-trace_pci_nvme_err_append_too_large(slba, nlb, n->zasl);
-status = NVME_INVALID_FIELD;
-goto invalid;
+if (n->params.zasl && data_size > n->page_size << n->params.zasl) {
+trace_pci_nvme_err_zasl(data_size);
+return NVME_INVALID_FIELD | NVME_DNR;
 }
 
 slba = zone->w_ptr;
@@ -3212,9 +3209,8 @@ static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, 
NvmeRequest *req)
 if (c->csi == NVME_CSI_NVM) {
 return nvme_rpt_empty_id_struct(n, req);
 } else if (c->csi == NVME_CSI_ZONED) {
-if (n->params.zasl_bs) {
-id.zasl = n->zasl;
-}
+id.zasl = n->params.zasl;
+
 return nvme_dma(n, (uint8_t *)&id, sizeof(id),
 DMA_DIRECTION_FROM_DEVICE, req);
 }
@@ -4088,19 +4084,6 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 nvme_init_sq(&n->admin_sq, n, n->bar.asq, 0, 0,
  NVME_AQA_ASQS(n->bar.aqa) + 1);
 
-if (!n->params.zasl_bs) {
-n->zasl = n->params.mdts;
-} else {
-if (n->params.zasl_bs < n->page_size) {
-NVME_GUEST_ERR(pci_nvme_err_startfail_zasl_too_small,
-   

[PULL 16/38] hw/block/nvme: add broadcast nsid support flush command

2021-03-08 Thread Klaus Jensen
From: Gollu Appalanaidu 

Add support for using the broadcast nsid to issue a flush on all
namespaces through a single command.
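Fanning one command out into several asynchronous flushes needs a counter that cannot reach zero while flushes are still being issued. The sketch below is a generic, standalone illustration of the 1-initialized counter idea and is not the QEMU implementation; struct fanout and its helpers are hypothetical, and the sub-operations complete synchronously to show the case the extra reference presumably guards against.

```c
#include <assert.h>
#include <stdbool.h>

struct fanout {
    unsigned pending;   /* outstanding sub-operations, plus the guard */
    bool completed;     /* set once the parent request is completed */
};

/* Called once per finished sub-operation, and once to drop the guard. */
static void fanout_put(struct fanout *f)
{
    assert(f->pending > 0);
    if (--f->pending == 0) {
        f->completed = true;
    }
}

/* Issue n sub-operations; each one "completes" immediately here. */
static void fanout_issue(struct fanout *f, unsigned n)
{
    f->pending = 1;             /* 1-initialize: guard held by the issuer */
    f->completed = false;

    for (unsigned i = 0; i < n; i++) {
        f->pending++;
        fanout_put(f);          /* immediate completion of sub-operation i */
        assert(!f->completed);  /* the guard keeps the parent alive */
    }

    fanout_put(f);              /* drop the guard; completes if all are done */
}
```

Without the initial 1, the first sub-operation that finishes before the loop ends would drop the counter to zero and complete the parent request too early.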

Signed-off-by: Gollu Appalanaidu 
Reviewed-by: Klaus Jensen 
Acked-by: Stefan Hajnoczi 
Acked-by: Keith Busch 
Signed-off-by: Klaus Jensen 
---
 include/block/nvme.h  |   8 +++
 hw/block/nvme.c   | 124 +++---
 hw/block/trace-events |   2 +
 3 files changed, 127 insertions(+), 7 deletions(-)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 9f8eb3988c0e..b23f3ae2279f 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1062,6 +1062,14 @@ enum NvmeIdCtrlOcfs {
 NVME_OCFS_COPY_FORMAT_0 = 1 << 0,
 };
 
+enum NvmeIdctrlVwc {
+NVME_VWC_PRESENT= 1 << 0,
+NVME_VWC_NSID_BROADCAST_NO_SUPPORT  = 0 << 1,
+NVME_VWC_NSID_BROADCAST_RESERVED= 1 << 1,
+NVME_VWC_NSID_BROADCAST_CTRL_SPEC   = 2 << 1,
+NVME_VWC_NSID_BROADCAST_SUPPORT = 3 << 1,
+};
+
 enum NvmeIdCtrlFrmw {
 NVME_FRMW_SLOT1_RO = 1 << 0,
 };
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 68d80a0b4c37..1cd82fa3c9fe 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1457,6 +1457,41 @@ static void nvme_rw_cb(void *opaque, int ret)
 nvme_enqueue_req_completion(nvme_cq(req), req);
 }
 
+struct nvme_aio_flush_ctx {
+NvmeRequest *req;
+NvmeNamespace   *ns;
+BlockAcctCookie acct;
+};
+
+static void nvme_aio_flush_cb(void *opaque, int ret)
+{
+struct nvme_aio_flush_ctx *ctx = opaque;
+NvmeRequest *req = ctx->req;
+uintptr_t *num_flushes = (uintptr_t *)&req->opaque;
+
+BlockBackend *blk = ctx->ns->blkconf.blk;
+BlockAcctCookie *acct = &ctx->acct;
+BlockAcctStats *stats = blk_get_stats(blk);
+
+trace_pci_nvme_aio_flush_cb(nvme_cid(req), blk_name(blk));
+
+if (!ret) {
+block_acct_done(stats, acct);
+} else {
+block_acct_failed(stats, acct);
+nvme_aio_err(req, ret);
+}
+
+(*num_flushes)--;
+g_free(ctx);
+
+if (*num_flushes) {
+return;
+}
+
+nvme_enqueue_req_completion(nvme_cq(req), req);
+}
+
 static void nvme_aio_discard_cb(void *opaque, int ret)
 {
 NvmeRequest *req = opaque;
@@ -1940,10 +1975,56 @@ static uint16_t nvme_compare(NvmeCtrl *n, NvmeRequest 
*req)
 
 static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
 {
-block_acct_start(blk_get_stats(req->ns->blkconf.blk), &req->acct, 0,
- BLOCK_ACCT_FLUSH);
-req->aiocb = blk_aio_flush(req->ns->blkconf.blk, nvme_rw_cb, req);
-return NVME_NO_COMPLETE;
+uint32_t nsid = le32_to_cpu(req->cmd.nsid);
+uintptr_t *num_flushes = (uintptr_t *)&req->opaque;
+uint16_t status;
+struct nvme_aio_flush_ctx *ctx;
+NvmeNamespace *ns;
+
+trace_pci_nvme_flush(nvme_cid(req), nsid);
+
+if (nsid != NVME_NSID_BROADCAST) {
+req->ns = nvme_ns(n, nsid);
+if (unlikely(!req->ns)) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+block_acct_start(blk_get_stats(req->ns->blkconf.blk), &req->acct, 0,
+ BLOCK_ACCT_FLUSH);
+req->aiocb = blk_aio_flush(req->ns->blkconf.blk, nvme_rw_cb, req);
+return NVME_NO_COMPLETE;
+}
+
+/* 1-initialize; see comment in nvme_dsm */
+*num_flushes = 1;
+
+for (int i = 1; i <= n->num_namespaces; i++) {
+ns = nvme_ns(n, i);
+if (!ns) {
+continue;
+}
+
+ctx = g_new(struct nvme_aio_flush_ctx, 1);
+ctx->req = req;
+ctx->ns = ns;
+
+(*num_flushes)++;
+
+block_acct_start(blk_get_stats(ns->blkconf.blk), &ctx->acct, 0,
+ BLOCK_ACCT_FLUSH);
+blk_aio_flush(ns->blkconf.blk, nvme_aio_flush_cb, ctx);
+}
+
+/* account for the 1-initialization */
+(*num_flushes)--;
+
+if (*num_flushes) {
+status = NVME_NO_COMPLETE;
+} else {
+status = req->status;
+}
+
+return status;
 }
 
 static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
@@ -2599,6 +2680,29 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 return NVME_INVALID_NSID | NVME_DNR;
 }
 
+/*
+ * In the base NVM command set, Flush may apply to all namespaces
+ * (indicated by NSID being set to 0x). But if that feature is used
+ * along with TP 4056 (Namespace Types), it may be pretty screwed up.
+ *
+ * If NSID is indeed set to 0x, we simply cannot associate the
+ * opcode with a specific command since we cannot determine a unique I/O
+ * command set. Opcode 0x0 could have any other meaning than something
+ * equivalent to flushing and say it DOES have completely different
+ * semantics in some other command set - does an NSID of 0x then
+ * mean "for all namespaces, apply whatever command set specific command
+ * that uses the 0x0 opcode?" Or does it mean "for all namespaces, apply
+ * whatever command that uses the 0x0 opcode if, and only if, 

[PULL 15/38] hw/block/nvme: use locally assigned QEMU IEEE OUI

2021-03-08 Thread Klaus Jensen
From: Gollu Appalanaidu 

Commit 6eb7a071292a ("hw/block/nvme: change controller pci id") changed
the controller to use a Red Hat assigned PCI Device and Vendor ID, but
did not change the IEEE OUI away from the Intel IEEE OUI.

Fix that and use the locally assigned QEMU IEEE OUI instead if the
`use-intel-id` parameter is not explicitly set. Also reverse the Intel
IEEE OUI bytes.

Signed-off-by: Gollu Appalanaidu 
Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/block/nvme.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index ae7ccf643673..68d80a0b4c37 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4698,9 +4698,17 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->cntlid = cpu_to_le16(n->cntlid);
 
 id->rab = 6;
-id->ieee[0] = 0x00;
-id->ieee[1] = 0x02;
-id->ieee[2] = 0xb3;
+
+if (n->params.use_intel_id) {
+id->ieee[0] = 0xb3;
+id->ieee[1] = 0x02;
+id->ieee[2] = 0x00;
+} else {
+id->ieee[0] = 0x00;
+id->ieee[1] = 0x54;
+id->ieee[2] = 0x52;
+}
+
 id->mdts = n->params.mdts;
 id->ver = cpu_to_le32(NVME_SPEC_VER);
 id->oacs = cpu_to_le16(0);
-- 
2.30.1
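The byte values in the diff are consistent with storing the 24-bit OUI least-significant byte first. The sketch below illustrates that layout; the helper name oui_to_ieee is hypothetical, and reading the new bytes as the OUI 52:54:00 and the old ones as 00:02:b3 is inferred from the diff itself.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 24-bit OUI into a 3-byte field, least-significant byte first. */
static void oui_to_ieee(uint32_t oui, uint8_t ieee[3])
{
    ieee[0] = oui & 0xff;
    ieee[1] = (oui >> 8) & 0xff;
    ieee[2] = (oui >> 16) & 0xff;
}
```

Under this reading, oui_to_ieee(0x525400, ...) yields the bytes 0x00, 0x54, 0x52 and oui_to_ieee(0x0002b3, ...) yields 0xb3, 0x02, 0x00, matching both branches of the patch.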




[PULL 14/38] hw/block/nvme: improve invalid zasl value reporting

2021-03-08 Thread Klaus Jensen
From: Klaus Jensen 

The Zone Append Size Limit (ZASL) must be at least 4096 bytes, so
improve the user experience by adding an early parameter check in
nvme_check_constraints.

When ZASL is still too small due to the host configuring the device for
an even larger page size, convert the trace point in nvme_start_ctrl to
an NVME_GUEST_ERR such that this is logged by QEMU instead of only
traced.

Reported-by: Corne 
Cc: Dmitry Fomichev 
Reviewed-by: Dmitry Fomichev 
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 5cdf17db512c..ae7ccf643673 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -3988,8 +3988,10 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 n->zasl = n->params.mdts;
 } else {
 if (n->params.zasl_bs < n->page_size) {
-trace_pci_nvme_err_startfail_zasl_too_small(n->params.zasl_bs,
-n->page_size);
+NVME_GUEST_ERR(pci_nvme_err_startfail_zasl_too_small,
+   "Zone Append Size Limit (ZASL) of %d bytes is too "
+   "small; must be at least %d bytes",
+   n->params.zasl_bs, n->page_size);
 return -1;
 }
 n->zasl = 31 - clz32(n->params.zasl_bs / n->page_size);
@@ -4508,6 +4510,12 @@ static void nvme_check_constraints(NvmeCtrl *n, Error 
**errp)
 error_setg(errp, "zone append size limit has to be a power of 2");
 return;
 }
+
+if (n->params.zasl_bs < 4096) {
+error_setg(errp, "zone append size limit must be at least "
+   "4096 bytes");
+return;
+}
 }
 }
 
-- 
2.30.1




[PULL 20/38] hw/block/nvme: remove unnecessary endian conversion

2021-03-08 Thread Klaus Jensen
From: Gollu Appalanaidu 

Remove an unnecessary le_to_cpu conversion in Identify.

Signed-off-by: Gollu Appalanaidu 
Signed-off-by: Klaus Jensen 
Reviewed-by: Minwoo Im 
---
 hw/block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 01be8a1620be..ba07e6deef5f 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -3413,7 +3413,7 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest 
*req)
 {
 NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
 
-switch (le32_to_cpu(c->cns)) {
+switch (c->cns) {
 case NVME_ID_CNS_NS:
  /* fall through */
 case NVME_ID_CNS_NS_PRESENT:
-- 
2.30.1




[PULL 12/38] hw/block/nvme: fix Close Zone

2021-03-08 Thread Klaus Jensen
From: Dmitry Fomichev 

Implicitly and Explicitly Open zones can be closed by the Close Zone
management function. This was broken by a recent commit ("hw/block/nvme:
refactor zone resource management"), and such commands now fail with
Invalid Zone State Transition status.

Modify the nvme_zrm_close() function to make Close Zone work correctly.

Signed-off-by: Dmitry Fomichev 
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 59faebce28f9..5cdf17db512c 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1310,14 +1310,13 @@ static uint16_t nvme_zrm_finish(NvmeNamespace *ns, 
NvmeZone *zone)
 static uint16_t nvme_zrm_close(NvmeNamespace *ns, NvmeZone *zone)
 {
 switch (nvme_get_zone_state(zone)) {
-case NVME_ZONE_STATE_CLOSED:
-return NVME_SUCCESS;
-
 case NVME_ZONE_STATE_EXPLICITLY_OPEN:
 case NVME_ZONE_STATE_IMPLICITLY_OPEN:
 nvme_aor_dec_open(ns);
 nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED);
 /* fall through */
+case NVME_ZONE_STATE_CLOSED:
+return NVME_SUCCESS;
 
 default:
 return NVME_ZONE_INVAL_TRANSITION;
-- 
2.30.1




[PULL 13/38] hw/block/nvme: add missing mor/mar constraint checks

2021-03-08 Thread Klaus Jensen
From: Klaus Jensen 

Firstly, if zoned.max_active is non-zero, zoned.max_open must be less
than or equal to zoned.max_active.

Secondly, if only zoned.max_active is set, we have to explicitly set
zoned.max_open or we end up with an invalid MAR/MOR configuration. This
is an artifact of the parameters not being zeroes-based like in the
spec.

Cc: Dmitry Fomichev 
Reported-by: Gollu Appalanaidu 
Signed-off-by: Klaus Jensen 
Reviewed-by: Dmitry Fomichev 
---
 hw/block/nvme-ns.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index fd73d0321109..0e8760020483 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -163,6 +163,18 @@ static int nvme_ns_zoned_check_calc_geometry(NvmeNamespace 
*ns, Error **errp)
 return -1;
 }
 
+if (ns->params.max_active_zones) {
+if (ns->params.max_open_zones > ns->params.max_active_zones) {
+error_setg(errp, "max_open_zones (%u) exceeds max_active_zones (%u)",
+   ns->params.max_open_zones, ns->params.max_active_zones);
+return -1;
+}
+
+if (!ns->params.max_open_zones) {
+ns->params.max_open_zones = ns->params.max_active_zones;
+}
+}
+
 if (ns->params.zd_extension_size) {
 if (ns->params.zd_extension_size & 0x3f) {
 error_setg(errp,
-- 
2.30.1




[PULL 09/38] hw/block/nvme: pull write pointer advancement to separate function

2021-03-08 Thread Klaus Jensen
From: Klaus Jensen 

In preparation for Simple Copy, pull write pointer advancement into a
separate function that is independent of an NvmeRequest.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 9d85d498455c..4d6cd7755986 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1398,6 +1398,16 @@ static inline uint16_t nvme_zrm_open(NvmeNamespace *ns, 
NvmeZone *zone)
 return __nvme_zrm_open(ns, zone, false);
 }
 
+static void __nvme_advance_zone_wp(NvmeNamespace *ns, NvmeZone *zone,
+   uint32_t nlb)
+{
+zone->d.wp += nlb;
+
+if (zone->d.wp == nvme_zone_wr_boundary(zone)) {
+nvme_zrm_finish(ns, zone);
+}
+}
+
 static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req)
 {
 NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
@@ -1409,11 +1419,7 @@ static void nvme_finalize_zoned_write(NvmeNamespace *ns, 
NvmeRequest *req)
 nlb = le16_to_cpu(rw->nlb) + 1;
 zone = nvme_get_zone_by_slba(ns, slba);
 
-zone->d.wp += nlb;
-
-if (zone->d.wp == nvme_zone_wr_boundary(zone)) {
-nvme_zrm_finish(ns, zone);
-}
+__nvme_advance_zone_wp(ns, zone, nlb);
 }
 
 static inline bool nvme_is_write(NvmeRequest *req)
-- 
2.30.1




[PULL 08/38] hw/block/nvme: refactor zone resource management

2021-03-08 Thread Klaus Jensen
From: Klaus Jensen 

Zone transition handling and resource management are open-coded (and
semi-duplicated in the case of open, close and finish).

In preparation for Simple Copy command support (which also needs to open
zones for writing), consolidate into a set of 'nvme_zrm' functions and
in the process fix a bug with the controller not closing an open zone to
allow another zone to be explicitly opened.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 220 +++-
 1 file changed, 103 insertions(+), 117 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index b8070e1f7fd9..9d85d498455c 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1283,7 +1283,46 @@ static uint16_t nvme_check_zone_read(NvmeNamespace *ns, 
uint64_t slba,
 return status;
 }
 
-static void nvme_auto_transition_zone(NvmeNamespace *ns)
+static uint16_t nvme_zrm_finish(NvmeNamespace *ns, NvmeZone *zone)
+{
+switch (nvme_get_zone_state(zone)) {
+case NVME_ZONE_STATE_FULL:
+return NVME_SUCCESS;
+
+case NVME_ZONE_STATE_IMPLICITLY_OPEN:
+case NVME_ZONE_STATE_EXPLICITLY_OPEN:
+nvme_aor_dec_open(ns);
+/* fallthrough */
+case NVME_ZONE_STATE_CLOSED:
+nvme_aor_dec_active(ns);
+/* fallthrough */
+case NVME_ZONE_STATE_EMPTY:
+nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_FULL);
+return NVME_SUCCESS;
+
+default:
+return NVME_ZONE_INVAL_TRANSITION;
+}
+}
+
+static uint16_t nvme_zrm_close(NvmeNamespace *ns, NvmeZone *zone)
+{
+switch (nvme_get_zone_state(zone)) {
+case NVME_ZONE_STATE_CLOSED:
+return NVME_SUCCESS;
+
+case NVME_ZONE_STATE_EXPLICITLY_OPEN:
+case NVME_ZONE_STATE_IMPLICITLY_OPEN:
+nvme_aor_dec_open(ns);
+nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED);
+/* fall through */
+
+default:
+return NVME_ZONE_INVAL_TRANSITION;
+}
+}
+
+static void nvme_zrm_auto_transition_zone(NvmeNamespace *ns)
 {
 NvmeZone *zone;
 
@@ -1295,34 +1334,74 @@ static void nvme_auto_transition_zone(NvmeNamespace *ns)
  * Automatically close this implicitly open zone.
  */
 QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry);
-nvme_aor_dec_open(ns);
-nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED);
+nvme_zrm_close(ns, zone);
 }
 }
 }
 
-static uint16_t nvme_auto_open_zone(NvmeNamespace *ns, NvmeZone *zone)
+static uint16_t __nvme_zrm_open(NvmeNamespace *ns, NvmeZone *zone,
+bool implicit)
 {
-uint16_t status = NVME_SUCCESS;
-uint8_t zs = nvme_get_zone_state(zone);
+int act = 0;
+uint16_t status;
 
-if (zs == NVME_ZONE_STATE_EMPTY) {
-nvme_auto_transition_zone(ns);
-status = nvme_aor_check(ns, 1, 1);
-} else if (zs == NVME_ZONE_STATE_CLOSED) {
-nvme_auto_transition_zone(ns);
-status = nvme_aor_check(ns, 0, 1);
+switch (nvme_get_zone_state(zone)) {
+case NVME_ZONE_STATE_EMPTY:
+act = 1;
+
+/* fallthrough */
+
+case NVME_ZONE_STATE_CLOSED:
+nvme_zrm_auto_transition_zone(ns);
+status = nvme_aor_check(ns, act, 1);
+if (status) {
+return status;
+}
+
+if (act) {
+nvme_aor_inc_active(ns);
+}
+
+nvme_aor_inc_open(ns);
+
+if (implicit) {
+nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_IMPLICITLY_OPEN);
+return NVME_SUCCESS;
+}
+
+/* fallthrough */
+
+case NVME_ZONE_STATE_IMPLICITLY_OPEN:
+if (implicit) {
+return NVME_SUCCESS;
+}
+
+nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EXPLICITLY_OPEN);
+
+/* fallthrough */
+
+case NVME_ZONE_STATE_EXPLICITLY_OPEN:
+return NVME_SUCCESS;
+
+default:
+return NVME_ZONE_INVAL_TRANSITION;
 }
-
-return status;
 }
 
-static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req,
-  bool failed)
+static inline uint16_t nvme_zrm_auto(NvmeNamespace *ns, NvmeZone *zone)
+{
+return __nvme_zrm_open(ns, zone, true);
+}
+
+static inline uint16_t nvme_zrm_open(NvmeNamespace *ns, NvmeZone *zone)
+{
+return __nvme_zrm_open(ns, zone, false);
+}
+
+static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req)
 {
 NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
 NvmeZone *zone;
-NvmeZonedResult *res = (NvmeZonedResult *)&req->cqe;
 uint64_t slba;
 uint32_t nlb;
 
@@ -1332,47 +1411,8 @@ static void nvme_finalize_zoned_write(NvmeNamespace *ns, 
NvmeRequest *req,
 
 zone->d.wp += nlb;
 
-if (failed) {
-res->slba = 0;
-}
-
 if (zone->d.wp == nvme_zone_wr_boundary(zone)) {
-switch (nvme_get_zone_state(zone)) {
-case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-case 

[PULL 07/38] hw/block/nvme: remove unused parameter in check zone write

2021-03-08 Thread Klaus Jensen
From: Klaus Jensen 

Remove the unused NvmeCtrl parameter in nvme_check_zone_write.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 34f9be0199d5..b8070e1f7fd9 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1204,9 +1204,8 @@ static uint16_t nvme_check_zone_state_for_write(NvmeZone 
*zone)
 return NVME_INTERNAL_DEV_ERROR;
 }
 
-static uint16_t nvme_check_zone_write(NvmeCtrl *n, NvmeNamespace *ns,
-  NvmeZone *zone, uint64_t slba,
-  uint32_t nlb)
+static uint16_t nvme_check_zone_write(NvmeNamespace *ns, NvmeZone *zone,
+  uint64_t slba, uint32_t nlb)
 {
 uint64_t zcap = nvme_zone_wr_boundary(zone);
 uint16_t status;
@@ -1769,7 +1768,7 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest 
*req, bool append,
 res->slba = cpu_to_le64(slba);
 }
 
-status = nvme_check_zone_write(n, ns, zone, slba, nlb);
+status = nvme_check_zone_write(ns, zone, slba, nlb);
 if (status) {
 goto invalid;
 }
-- 
2.30.1




[PULL 11/38] hw/block/nvme: add simple copy command

2021-03-08 Thread Klaus Jensen
From: Klaus Jensen 

Add support for TP 4065a ("Simple Copy Command"), v2020.05.04
("Ratified").

The implementation uses a bounce buffer to first read in the source
logical blocks, then issue a write of that bounce buffer. The default
maximum number of source logical blocks is 128, translating to 512 KiB
for 4k logical blocks which aligns with the default value of MDTS.
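The sizing claim is simple arithmetic and can be checked directly. The helper name copy_bounce_bytes below is hypothetical; the sketch only assumes a 4 KiB logical block size and a 4 KiB minimum page size, as in the paragraph above.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to hold nlb logical blocks of lba_size bytes each. */
static uint64_t copy_bounce_bytes(uint32_t nlb, uint32_t lba_size)
{
    return (uint64_t)nlb * lba_size;
}
```

With 128 blocks of 4096 bytes this gives 524288 bytes (512 KiB), the same as 4096 << 7, that is, an MDTS of 7 with 4 KiB pages.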

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme-ns.h|   4 +
 hw/block/nvme.h   |   1 +
 hw/block/nvme-ns.c|   8 ++
 hw/block/nvme.c   | 252 +-
 hw/block/trace-events |   6 +
 5 files changed, 270 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 929e78861903..7af6884862b5 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -29,6 +29,10 @@ typedef struct NvmeNamespaceParams {
 uint32_t nsid;
 QemuUUID uuid;
 
+uint16_t mssrl;
+uint32_t mcl;
+uint8_t  msrc;
+
 bool zoned;
 bool cross_zone_read;
 uint64_t zone_size_bs;
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index b8f5f2d6ffb8..cb2b5175f1a1 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -69,6 +69,7 @@ static inline const char *nvme_io_opc_str(uint8_t opc)
 case NVME_CMD_COMPARE:  return "NVME_NVM_CMD_COMPARE";
 case NVME_CMD_WRITE_ZEROES: return "NVME_NVM_CMD_WRITE_ZEROES";
 case NVME_CMD_DSM:  return "NVME_NVM_CMD_DSM";
+case NVME_CMD_COPY: return "NVME_NVM_CMD_COPY";
 case NVME_CMD_ZONE_MGMT_SEND:   return "NVME_ZONED_CMD_MGMT_SEND";
 case NVME_CMD_ZONE_MGMT_RECV:   return "NVME_ZONED_CMD_MGMT_RECV";
 case NVME_CMD_ZONE_APPEND:  return "NVME_ZONED_CMD_ZONE_APPEND";
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 64b6a491adc3..fd73d0321109 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -67,6 +67,11 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 id_ns->nmic |= NVME_NMIC_NS_SHARED;
 }
 
+/* simple copy */
+id_ns->mssrl = cpu_to_le16(ns->params.mssrl);
+id_ns->mcl = cpu_to_le32(ns->params.mcl);
+id_ns->msrc = ns->params.msrc;
+
 return 0;
 }
 
@@ -384,6 +389,9 @@ static Property nvme_ns_props[] = {
  NvmeSubsystem *),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
 DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
+DEFINE_PROP_UINT16("mssrl", NvmeNamespace, params.mssrl, 128),
+DEFINE_PROP_UINT32("mcl", NvmeNamespace, params.mcl, 128),
+DEFINE_PROP_UINT8("msrc", NvmeNamespace, params.msrc, 127),
 DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false),
 DEFINE_PROP_SIZE("zoned.zone_size", NvmeNamespace, params.zone_size_bs,
  NVME_DEFAULT_ZONE_SIZE),
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 4d6cd7755986..59faebce28f9 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -186,6 +186,7 @@ static const uint32_t nvme_cse_iocs_nvm[256] = {
 [NVME_CMD_WRITE]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_READ] = NVME_CMD_EFF_CSUPP,
 [NVME_CMD_DSM]  = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
+[NVME_CMD_COPY] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_COMPARE]  = NVME_CMD_EFF_CSUPP,
 };
 
@@ -195,6 +196,7 @@ static const uint32_t nvme_cse_iocs_zoned[256] = {
 [NVME_CMD_WRITE]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_READ] = NVME_CMD_EFF_CSUPP,
 [NVME_CMD_DSM]  = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
+[NVME_CMD_COPY] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_COMPARE]  = NVME_CMD_EFF_CSUPP,
 [NVME_CMD_ZONE_APPEND]  = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_ZONE_MGMT_SEND]   = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
@@ -1523,6 +1525,136 @@ static void nvme_aio_zone_reset_cb(void *opaque, int ret)
 nvme_enqueue_req_completion(nvme_cq(req), req);
 }
 
+struct nvme_copy_ctx {
+int copies;
+uint8_t *bounce;
+uint32_t nlb;
+};
+
+struct nvme_copy_in_ctx {
+NvmeRequest *req;
+QEMUIOVector iov;
+};
+
+static void nvme_copy_cb(void *opaque, int ret)
+{
+NvmeRequest *req = opaque;
+NvmeNamespace *ns = req->ns;
+struct nvme_copy_ctx *ctx = req->opaque;
+
+trace_pci_nvme_copy_cb(nvme_cid(req));
+
+if (ns->params.zoned) {
+NvmeCopyCmd *copy = (NvmeCopyCmd *)&req->cmd;
+uint64_t sdlba = le64_to_cpu(copy->sdlba);
+NvmeZone *zone = nvme_get_zone_by_slba(ns, sdlba);
+
+__nvme_advance_zone_wp(ns, zone, ctx->nlb);
+}
+
+if (!ret) {
+block_acct_done(blk_get_stats(ns->blkconf.blk), &req->acct);
+} else {
+block_acct_failed(blk_get_stats(ns->blkconf.blk), &req->acct);
+nvme_aio_err(req, ret);
+}
+
+g_free(ctx->bounce);
+g_free(ctx);
+
+

[PULL 10/38] nvme: updated shared header for copy command

2021-03-08 Thread Klaus Jensen
From: Klaus Jensen 

Add new data structures and types for the Simple Copy command.

Signed-off-by: Klaus Jensen 
Reviewed-by: Minwoo Im 
Acked-by: Stefan Hajnoczi 
Reviewed-by: Keith Busch 
---
 include/block/nvme.h | 47 ++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 3db2b9b4cba7..9f8eb3988c0e 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -579,6 +579,7 @@ enum NvmeIoCommands {
 NVME_CMD_COMPARE= 0x05,
 NVME_CMD_WRITE_ZEROES   = 0x08,
 NVME_CMD_DSM= 0x09,
+NVME_CMD_COPY   = 0x19,
 NVME_CMD_ZONE_MGMT_SEND = 0x79,
 NVME_CMD_ZONE_MGMT_RECV = 0x7a,
 NVME_CMD_ZONE_APPEND= 0x7d,
@@ -724,6 +725,37 @@ typedef struct QEMU_PACKED NvmeDsmRange {
 uint64_tslba;
 } NvmeDsmRange;
 
+enum {
+NVME_COPY_FORMAT_0 = 0x0,
+};
+
+typedef struct QEMU_PACKED NvmeCopyCmd {
+uint8_t opcode;
+uint8_t flags;
+uint16_tcid;
+uint32_tnsid;
+uint32_trsvd2[4];
+NvmeCmdDptr dptr;
+uint64_tsdlba;
+uint8_t nr;
+uint8_t control[3];
+uint16_trsvd13;
+uint16_tdspec;
+uint32_treftag;
+uint16_tapptag;
+uint16_tappmask;
+} NvmeCopyCmd;
+
+typedef struct QEMU_PACKED NvmeCopySourceRange {
+uint8_t  rsvd0[8];
+uint64_t slba;
+uint16_t nlb;
+uint8_t  rsvd18[6];
+uint32_t reftag;
+uint16_t apptag;
+uint16_t appmask;
+} NvmeCopySourceRange;
+
 enum NvmeAsyncEventRequest {
 NVME_AER_TYPE_ERROR = 0,
 NVME_AER_TYPE_SMART = 1,
@@ -807,6 +839,7 @@ enum NvmeStatusCodes {
 NVME_CONFLICTING_ATTRS  = 0x0180,
 NVME_INVALID_PROT_INFO  = 0x0181,
 NVME_WRITE_TO_RO= 0x0182,
+NVME_CMD_SIZE_LIMIT = 0x0183,
 NVME_ZONE_BOUNDARY_ERROR= 0x01b8,
 NVME_ZONE_FULL  = 0x01b9,
 NVME_ZONE_READ_ONLY = 0x01ba,
@@ -994,7 +1027,7 @@ typedef struct QEMU_PACKED NvmeIdCtrl {
 uint8_t nvscc;
 uint8_t rsvd531;
 uint16_tacwu;
-uint8_t rsvd534[2];
+uint16_tocfs;
 uint32_tsgls;
 uint8_t rsvd540[228];
 uint8_t subnqn[256];
@@ -1022,6 +1055,11 @@ enum NvmeIdCtrlOncs {
 NVME_ONCS_FEATURES  = 1 << 4,
 NVME_ONCS_RESRVATIONS   = 1 << 5,
 NVME_ONCS_TIMESTAMP = 1 << 6,
+NVME_ONCS_COPY  = 1 << 8,
+};
+
+enum NvmeIdCtrlOcfs {
+NVME_OCFS_COPY_FORMAT_0 = 1 << 0,
 };
 
 enum NvmeIdCtrlFrmw {
@@ -1175,7 +1213,10 @@ typedef struct QEMU_PACKED NvmeIdNs {
 uint16_tnpdg;
 uint16_tnpda;
 uint16_tnows;
-uint8_t rsvd74[30];
+uint16_tmssrl;
+uint32_tmcl;
+uint8_t msrc;
+uint8_t rsvd81[23];
 uint8_t nguid[16];
 uint64_teui64;
 NvmeLBAFlbaf[16];
@@ -1331,6 +1372,7 @@ static inline void _nvme_check_size(void)
 QEMU_BUILD_BUG_ON(sizeof(NvmeZonedResult) != 8);
 QEMU_BUILD_BUG_ON(sizeof(NvmeCqe) != 16);
 QEMU_BUILD_BUG_ON(sizeof(NvmeDsmRange) != 16);
+QEMU_BUILD_BUG_ON(sizeof(NvmeCopySourceRange) != 32);
 QEMU_BUILD_BUG_ON(sizeof(NvmeCmd) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeDeleteQ) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeCreateCq) != 64);
@@ -1338,6 +1380,7 @@ static inline void _nvme_check_size(void)
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdentify) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeRwCmd) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeDsmCmd) != 64);
+QEMU_BUILD_BUG_ON(sizeof(NvmeCopyCmd) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeRangeType) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeErrorLog) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeFwSlotInfoLog) != 512);
-- 
2.30.1




[PULL 04/38] hw/block/nvme: support for multi-controller in subsystem

2021-03-08 Thread Klaus Jensen
From: Minwoo Im 

We have nvme-subsys and nvme devices mapped together.  To support a
multi-controller scheme in this setup, the controller identifier (id)
has to be managed.  Previously, cntlid (controller id) was always 0
because no subsystem scheme existed in which the controller id mattered.

This patch introduces a 'cntlid' attribute to the nvme controller
instance (NvmeCtrl) and has it allocated by the nvme-subsys device
mapped to the controller.  If no nvme-subsys is given to the
controller, cntlid remains 0 as before.

It also adds a 'ctrls' array to the nvme-subsys instance to manage the
controllers attached to the subsystem, with a limit of 32.  An array is
used rather than a list to stay consistent with the nvme-ns device.
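
The allocation scheme described above amounts to taking the first free slot
in a fixed-size array. A minimal standalone sketch follows, with
hypothetical type and function names rather than the QEMU ones:

```c
#include <stddef.h>

#define MAX_CTRLS 32 /* mirrors the limit of 32 mentioned above */

/* Hypothetical stand-in for the subsystem: one slot per controller id. */
struct subsys {
    void *ctrls[MAX_CTRLS];
};

/*
 * Register `ctrl` in the first free slot and return the slot index,
 * which doubles as the controller id. Returns -1 when all slots are taken.
 */
static int register_ctrl(struct subsys *s, void *ctrl)
{
    for (int id = 0; id < MAX_CTRLS; id++) {
        if (s->ctrls[id] == NULL) {
            s->ctrls[id] = ctrl;
            return id;
        }
    }
    return -1;
}
```

The first controller registered gets id 0, the second id 1, and a 33rd
registration fails.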

Signed-off-by: Minwoo Im 
Tested-by: Klaus Jensen 
Reviewed-by: Klaus Jensen 
Reviewed-by: Keith Busch 
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-subsys.h |  4 
 hw/block/nvme.h|  1 +
 hw/block/nvme-subsys.c | 21 +
 hw/block/nvme.c| 29 +
 4 files changed, 55 insertions(+)

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 40f06a4c7db0..4eba50d96a1d 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -20,6 +20,10 @@ typedef struct NvmeNamespace NvmeNamespace;
 typedef struct NvmeSubsystem {
 DeviceState parent_obj;
 uint8_t subnqn[256];
+
+NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS];
 } NvmeSubsystem;
 
+int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
+
 #endif /* NVME_SUBSYS_H */
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 04d4684601fd..b8f5f2d6ffb8 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -134,6 +134,7 @@ typedef struct NvmeCtrl {
 NvmeBus  bus;
 BlockConfconf;
 
+uint16_tcntlid;
 boolqs_created;
 uint32_tpage_size;
 uint16_tpage_bits;
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index aa82911b951c..e9d61c993c90 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -22,6 +22,27 @@
 #include "nvme.h"
 #include "nvme-subsys.h"
 
+int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
+{
+NvmeSubsystem *subsys = n->subsys;
+int cntlid;
+
+for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) {
+if (!subsys->ctrls[cntlid]) {
+break;
+}
+}
+
+if (cntlid == ARRAY_SIZE(subsys->ctrls)) {
+error_setg(errp, "no more free controller id");
+return -1;
+}
+
+subsys->ctrls[cntlid] = n;
+
+return cntlid;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
 snprintf((char *)subsys->subnqn, sizeof(subsys->subnqn),
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 84c7e2798026..4e8e15a82da0 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4439,6 +4439,9 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
 strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' ');
 strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' ');
+
+id->cntlid = cpu_to_le16(n->cntlid);
+
 id->rab = 6;
 id->ieee[0] = 0x00;
 id->ieee[1] = 0x02;
@@ -4485,6 +4488,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 id->psd[0].enlat = cpu_to_le32(0x10);
 id->psd[0].exlat = cpu_to_le32(0x4);
 
+if (n->subsys) {
+id->cmic |= NVME_CMIC_MULTI_CTRL;
+}
+
 NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
 NVME_CAP_SET_CQR(n->bar.cap, 1);
 NVME_CAP_SET_TO(n->bar.cap, 0xf);
@@ -4499,6 +4506,24 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 n->bar.intmc = n->bar.intms = 0;
 }
 
+static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
+{
+int cntlid;
+
+if (!n->subsys) {
+return 0;
+}
+
+cntlid = nvme_subsys_register_ctrl(n, errp);
+if (cntlid < 0) {
+return -1;
+}
+
+n->cntlid = cntlid;
+
+return 0;
+}
+
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
 NvmeCtrl *n = NVME(pci_dev);
@@ -4519,6 +4544,10 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 return;
 }
 
+if (nvme_init_subsys(n, errp)) {
+error_propagate(errp, local_err);
+return;
+}
 nvme_init_ctrl(n, pci_dev);
 
 /* setup a namespace if the controller drive property was given */
-- 
2.30.1




[PULL 05/38] hw/block/nvme: add NMIC enum value for Identify Namespace

2021-03-08 Thread Klaus Jensen
From: Minwoo Im 

Add the Namespace Multi-path I/O and Namespace Sharing Capabilities (NMIC)
field to support namespaces shared between controllers.

This field is at byte 30 of the Identify Namespace data structure.

Signed-off-by: Minwoo Im 
Tested-by: Klaus Jensen 
Reviewed-by: Klaus Jensen 
Reviewed-by: Keith Busch 
Signed-off-by: Klaus Jensen 
---
 include/block/nvme.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index f1d3a78658eb..3db2b9b4cba7 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1203,6 +1203,10 @@ enum NvmeNsIdentifierType {
 NVME_NIDT_CSI   = 0x04,
 };
 
+enum NvmeIdNsNmic {
+NVME_NMIC_NS_SHARED = 1 << 0,
+};
+
 enum NvmeCsi {
 NVME_CSI_NVM= 0x00,
 NVME_CSI_ZONED  = 0x02,
-- 
2.30.1




[PULL 03/38] hw/block/nvme: add CMIC enum value for Identify Controller

2021-03-08 Thread Klaus Jensen
From: Minwoo Im 

Add the Controller Multi-path I/O and Namespace Sharing Capabilities
(CMIC) field to support multi-controller in the following patches.

This field is at byte 76 of the Identify Controller data structure.

Signed-off-by: Minwoo Im 
Tested-by: Klaus Jensen 
Reviewed-by: Klaus Jensen 
Reviewed-by: Keith Busch 
Signed-off-by: Klaus Jensen 
---
 include/block/nvme.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 07cfc929368b..f1d3a78658eb 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1034,6 +1034,10 @@ enum NvmeIdCtrlLpa {
 NVME_LPA_EXTENDED = 1 << 2,
 };
 
+enum NvmeIdCtrlCmic {
+NVME_CMIC_MULTI_CTRL= 1 << 1,
+};
+
 #define NVME_CTRL_SQES_MIN(sqes) ((sqes) & 0xf)
 #define NVME_CTRL_SQES_MAX(sqes) (((sqes) >> 4) & 0xf)
 #define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf)
-- 
2.30.1




[PULL 06/38] hw/block/nvme: support for shared namespace in subsystem

2021-03-08 Thread Klaus Jensen
From: Minwoo Im 

An nvme-ns device is registered to an nvme controller device during
initialization in nvme_register_namespace() when the 'bus' property is
given, which means it is mapped to a single controller.

This patch introduces a new 'subsys' property, just as the controller
device did, to map a namespace to an NVMe subsystem.

If the 'subsys' property is given to the nvme-ns device, the namespace
belongs to the specified subsystem and is attached to all controllers in
that subsystem, with the shared namespace capability enabled in NMIC
(Namespace Multi-path I/O and Namespace Sharing Capabilities) in
Identify Namespace.

Usage:

  -device nvme-subsys,id=subsys0
  -device nvme,serial=foo,id=nvme0,subsys=subsys0
  -device nvme,serial=bar,id=nvme1,subsys=subsys0
  -device nvme,serial=baz,id=nvme2,subsys=subsys0
  -device nvme-ns,id=ns1,drive=,nsid=1,subsys=subsys0  # Shared
  -device nvme-ns,id=ns2,drive=,nsid=2,bus=nvme2   # Non-shared

  In the above example, 'ns1' will be shared to 'nvme0' and 'nvme1' in
  the same subsystem.  On the other hand, 'ns2' will be attached to the
  'nvme2' only as a private namespace in that subsystem.

By default, every namespace given the 'subsys' parameter is attached to
all controllers in the subsystem.
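
The attach-to-all behaviour can be modelled in a few lines. This is a
simplified sketch with hypothetical types and names; it tracks attachment
as flags and treats nsid 0 and out-of-range nsids as errors, which is an
assumption of the sketch.

```c
#include <stddef.h>

#define MAX_CTRLS 32
#define MAX_NS    32

/* Hypothetical controller model: attached[nsid] is nonzero when attached. */
struct ctrl {
    int attached[MAX_NS + 1];
};

/* Hypothetical subsystem model: registered controllers and used nsids. */
struct subsys {
    struct ctrl *ctrls[MAX_CTRLS];
    int ns_used[MAX_NS + 1];
};

/*
 * Register a shared namespace and attach it to every controller that is
 * currently registered in the subsystem. Returns 0 on success, -1 if the
 * nsid is invalid or already registered.
 */
static int register_shared_ns(struct subsys *s, unsigned nsid)
{
    if (nsid == 0 || nsid > MAX_NS || s->ns_used[nsid]) {
        return -1;
    }

    s->ns_used[nsid] = 1;

    for (size_t i = 0; i < MAX_CTRLS; i++) {
        if (s->ctrls[i] != NULL) {
            s->ctrls[i]->attached[nsid] = 1;
        }
    }

    return 0;
}
```

Registering the same nsid twice fails, matching the "already registered"
error path in the patch.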

Signed-off-by: Minwoo Im 
Tested-by: Klaus Jensen 
Reviewed-by: Klaus Jensen 
Reviewed-by: Keith Busch 
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-ns.h |  7 +++
 hw/block/nvme-subsys.h |  3 +++
 hw/block/nvme-ns.c | 17 ++---
 hw/block/nvme-subsys.c | 25 +
 hw/block/nvme.c| 10 +-
 5 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 293ac990e3f6..929e78861903 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -47,6 +47,8 @@ typedef struct NvmeNamespace {
 const uint32_t *iocs;
 uint8_t  csi;
 
+NvmeSubsystem   *subsys;
+
 NvmeIdNsZoned   *id_ns_zoned;
 NvmeZone*zone_array;
 QTAILQ_HEAD(, NvmeZone) exp_open_zones;
@@ -77,6 +79,11 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 return -1;
 }
 
+static inline bool nvme_ns_shared(NvmeNamespace *ns)
+{
+return !!ns->subsys;
+}
+
 static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
 {
 NvmeIdNs *id_ns = &ns->id_ns;
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 4eba50d96a1d..ccf6a71398d3 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -14,6 +14,7 @@
 OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
 
 #define NVME_SUBSYS_MAX_CTRLS   32
+#define NVME_SUBSYS_MAX_NAMESPACES  32
 
 typedef struct NvmeCtrl NvmeCtrl;
 typedef struct NvmeNamespace NvmeNamespace;
@@ -22,8 +23,10 @@ typedef struct NvmeSubsystem {
 uint8_t subnqn[256];
 
 NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS];
+NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES];
 } NvmeSubsystem;
 
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
+int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp);
 
 #endif /* NVME_SUBSYS_H */
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 93ac6e107a09..64b6a491adc3 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -63,6 +63,10 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 
 id_ns->npda = id_ns->npdg = npdg - 1;
 
+if (nvme_ns_shared(ns)) {
+id_ns->nmic |= NVME_NMIC_NS_SHARED;
+}
+
 return 0;
 }
 
@@ -363,14 +367,21 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-if (nvme_register_namespace(n, ns, errp)) {
-return;
+if (ns->subsys) {
+if (nvme_subsys_register_ns(ns, errp)) {
+return;
+}
+} else {
+if (nvme_register_namespace(n, ns, errp)) {
+return;
+}
 }
-
 }
 
 static Property nvme_ns_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
+DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS,
+ NvmeSubsystem *),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
 DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
 DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false),
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index e9d61c993c90..641de33e99fc 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -43,6 +43,31 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
 return cntlid;
 }
 
+int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp)
+{
+NvmeSubsystem *subsys = ns->subsys;
+NvmeCtrl *n;
+int i;
+
+if (subsys->namespaces[nvme_nsid(ns)]) {
+error_setg(errp, "namespace %d already registerd to subsy %s",
+   nvme_nsid(ns), subsys->parent_obj.id);
+return -1;
+}
+
+subsys->namespaces[nvme_nsid(ns)] = ns;
+
+for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+n = subsys->ctrls[i];
+
+if 

[PULL 01/38] hw/block/nvme: introduce nvme-subsys device

2021-03-08 Thread Klaus Jensen
From: Minwoo Im 

To support multi-path in the QEMU NVMe device model, we need an NVMe
subsystem hierarchy that maps controllers and namespaces to an NVMe
subsystem.

This patch introduces a simple nvme-subsys device model.  The subsystem
NQN is built from the id provided to the nvme-subsys device:

  ex) -device nvme-subsys,id=subsys0: nqn.2019-08.org.qemu:subsys0
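
The mapping from device id to NQN in the example above is a plain string
format. A minimal standalone sketch, using a hypothetical helper name:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write the subsystem NQN for device id `id` into
 * `buf`. Returns the length the full string needs, as snprintf() does,
 * so a return value >= len signals truncation.
 */
static int format_subnqn(char *buf, size_t len, const char *id)
{
    return snprintf(buf, len, "nqn.2019-08.org.qemu:%s", id);
}
```

With a 256-byte buffer, as used for subnqn in the patch, the id "subsys0"
yields "nqn.2019-08.org.qemu:subsys0".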

Signed-off-by: Minwoo Im 
Tested-by: Klaus Jensen 
Reviewed-by: Klaus Jensen 
Reviewed-by: Keith Busch 
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-subsys.h | 25 ++
 hw/block/nvme-subsys.c | 60 ++
 hw/block/nvme.c|  3 +++
 hw/block/meson.build   |  2 +-
 4 files changed, 89 insertions(+), 1 deletion(-)
 create mode 100644 hw/block/nvme-subsys.h
 create mode 100644 hw/block/nvme-subsys.c

diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
new file mode 100644
index ..40f06a4c7db0
--- /dev/null
+++ b/hw/block/nvme-subsys.h
@@ -0,0 +1,25 @@
+/*
+ * QEMU NVM Express Subsystem: nvme-subsys
+ *
+ * Copyright (c) 2021 Minwoo Im 
+ *
+ * This code is licensed under the GNU GPL v2.  Refer COPYING.
+ */
+
+#ifndef NVME_SUBSYS_H
+#define NVME_SUBSYS_H
+
+#define TYPE_NVME_SUBSYS "nvme-subsys"
+#define NVME_SUBSYS(obj) \
+OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
+
+#define NVME_SUBSYS_MAX_CTRLS   32
+
+typedef struct NvmeCtrl NvmeCtrl;
+typedef struct NvmeNamespace NvmeNamespace;
+typedef struct NvmeSubsystem {
+DeviceState parent_obj;
+uint8_t subnqn[256];
+} NvmeSubsystem;
+
+#endif /* NVME_SUBSYS_H */
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
new file mode 100644
index ..aa82911b951c
--- /dev/null
+++ b/hw/block/nvme-subsys.c
@@ -0,0 +1,60 @@
+/*
+ * QEMU NVM Express Subsystem: nvme-subsys
+ *
+ * Copyright (c) 2021 Minwoo Im 
+ *
+ * This code is licensed under the GNU GPL v2.  Refer COPYING.
+ */
+
+#include "qemu/units.h"
+#include "qemu/osdep.h"
+#include "qemu/uuid.h"
+#include "qemu/iov.h"
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-core.h"
+#include "hw/block/block.h"
+#include "block/aio.h"
+#include "block/accounting.h"
+#include "sysemu/sysemu.h"
+#include "hw/pci/pci.h"
+#include "nvme.h"
+#include "nvme-subsys.h"
+
+static void nvme_subsys_setup(NvmeSubsystem *subsys)
+{
+snprintf((char *)subsys->subnqn, sizeof(subsys->subnqn),
+ "nqn.2019-08.org.qemu:%s", subsys->parent_obj.id);
+}
+
+static void nvme_subsys_realize(DeviceState *dev, Error **errp)
+{
+NvmeSubsystem *subsys = NVME_SUBSYS(dev);
+
+nvme_subsys_setup(subsys);
+}
+
+static void nvme_subsys_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+
+set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
+
+dc->realize = nvme_subsys_realize;
+dc->desc = "Virtual NVMe subsystem";
+}
+
+static const TypeInfo nvme_subsys_info = {
+.name = TYPE_NVME_SUBSYS,
+.parent = TYPE_DEVICE,
+.class_init = nvme_subsys_class_init,
+.instance_size = sizeof(NvmeSubsystem),
+};
+
+static void nvme_subsys_register_types(void)
+{
+type_register_static(&nvme_subsys_info);
+}
+
+type_init(nvme_subsys_register_types)
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index fb83636abdc1..1950b34684cd 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -25,6 +25,7 @@
  *  mdts=,zoned.append_size_limit= \
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=
+ *  -device nvme-subsys,id=
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
  * offset 0 in BAR2 and supports only WDS, RDS and SQS for now. By default, the
@@ -38,6 +39,8 @@
  *
  * The PMR will use BAR 4/5 exclusively.
  *
+ * To place controller(s) and namespace(s) to a subsystem, then provide
+ * nvme-subsys device as above.
  *
  * nvme device parameters
  * ~~
diff --git a/hw/block/meson.build b/hw/block/meson.build
index 602ca6c8541d..83ea2d37978d 100644
--- a/hw/block/meson.build
+++ b/hw/block/meson.build
@@ -13,7 +13,7 @@ softmmu_ss.add(when: 'CONFIG_SSI_M25P80', if_true: files('m25p80.c'))
 softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c'))
 softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c'))
 softmmu_ss.add(when: 'CONFIG_SH4', if_true: files('tc58128.c'))
-softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c'))
+softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c', 'nvme-subsys.c'))
 
 specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c'))
 specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: files('vhost-user-blk.c'))
-- 
2.30.1




[PULL 02/38] hw/block/nvme: support to map controller to a subsystem

2021-03-08 Thread Klaus Jensen
From: Minwoo Im 

An nvme controller (nvme) can be mapped to an NVMe subsystem (nvme-subsys).
This patch maps a controller to a subsystem by adding a 'subsys'
parameter to the nvme device.

To map a controller to a subsystem, specify the nvme-subsys device first
and then map the subsystem to the controller:

  -device nvme-subsys,id=subsys0
  -device nvme,serial=foo,id=nvme0,subsys=subsys0

If the 'subsys' property is not given to the nvme controller, the
subsystem NQN is created from the serial (e.g., 'foo' in the above
example).  Otherwise, it is based on the subsys id (e.g., 'subsys0' in
the above example).

Signed-off-by: Minwoo Im 
Tested-by: Klaus Jensen 
Reviewed-by: Klaus Jensen 
Reviewed-by: Keith Busch 
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.h |  3 +++
 hw/block/nvme.c | 30 +-
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index dee6092bd45f..04d4684601fd 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -2,6 +2,7 @@
 #define HW_NVME_H
 
 #include "block/nvme.h"
+#include "nvme-subsys.h"
 #include "nvme-ns.h"
 
 #define NVME_MAX_NAMESPACES 256
@@ -170,6 +171,8 @@ typedef struct NvmeCtrl {
 
 uint8_t zasl;
 
+NvmeSubsystem   *subsys;
+
 NvmeNamespace   namespace;
 NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES];
 NvmeSQueue  **sq;
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 1950b34684cd..84c7e2798026 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -22,7 +22,8 @@
  *  [pmrdev=,] \
  *  max_ioqpairs=, \
  *  aerl=, aer_max_queued=, \
- *  mdts=,zoned.append_size_limit= \
+ *  mdts=,zoned.append_size_limit=, \
+ *  subsys= \
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=
  *  -device nvme-subsys,id=
@@ -44,6 +45,13 @@
  *
  * nvme device parameters
  * ~~
+ * - `subsys`
+ *   NVM Subsystem device. If given, a subsystem NQN will be initialized with
+ *given. Otherwise,  will be taken for subsystem NQN.
+ *   Also, it will enable multi controller capability represented in Identify
+ *   Controller data structure in CMIC (Controller Multi-path I/O and Namesapce
+ *   Sharing Capabilities), if given.
+ *
  * - `aerl`
  *   The Asynchronous Event Request Limit (AERL). Indicates the maximum number
  *   of concurrently outstanding Asynchronous Event Request commands support
@@ -4408,11 +4416,23 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 return 0;
 }
 
+static void nvme_init_subnqn(NvmeCtrl *n)
+{
+NvmeSubsystem *subsys = n->subsys;
+NvmeIdCtrl *id = &n->id_ctrl;
+
+if (!subsys) {
+snprintf((char *)id->subnqn, sizeof(id->subnqn),
+ "nqn.2019-08.org.qemu:%s", n->params.serial);
+} else {
+pstrcpy((char *)id->subnqn, sizeof(id->subnqn), (char*)subsys->subnqn);
+}
+}
+
 static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 {
 NvmeIdCtrl *id = &n->id_ctrl;
 uint8_t *pci_conf = pci_dev->config;
-char *subnqn;
 
 id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
 id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
@@ -4459,9 +4479,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 id->sgls = cpu_to_le32(NVME_CTRL_SGLS_SUPPORT_NO_ALIGN |
NVME_CTRL_SGLS_BITBUCKET);
 
-subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", n->params.serial);
-strpadcpy((char *)id->subnqn, sizeof(id->subnqn), subnqn, '\0');
-g_free(subnqn);
+nvme_init_subnqn(n);
 
 id->psd[0].mp = cpu_to_le16(0x9c4);
 id->psd[0].enlat = cpu_to_le32(0x10);
@@ -4553,6 +4571,8 @@ static Property nvme_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeCtrl, namespace.blkconf),
 DEFINE_PROP_LINK("pmrdev", NvmeCtrl, pmr.dev, TYPE_MEMORY_BACKEND,
  HostMemoryBackend *),
+DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS,
+ NvmeSubsystem *),
 DEFINE_PROP_STRING("serial", NvmeCtrl, params.serial),
 DEFINE_PROP_UINT32("cmb_size_mb", NvmeCtrl, params.cmb_size_mb, 0),
 DEFINE_PROP_UINT32("num_queues", NvmeCtrl, params.num_queues, 0),
-- 
2.30.1




[PULL 00/38] emulated nvme device updates

2021-03-08 Thread Klaus Jensen
From: Klaus Jensen 

The following changes since commit 91e92cad67caca3bc4b8e920ddb5c8ca64aac9e1:

  Merge remote-tracking branch 'remotes/cohuck-gitlab/tags/s390x-20210305' into staging (2021-03-05 19:04:47 +)

are available in the Git repository at:

  git://git.infradead.org/qemu-nvme.git tags/nvme-next-pull-request

for you to fetch changes up to 552dca9ce2473acfa78e65320538d4d0a07e11b2:

  hw/block/nvme: support Identify NS Attached Controller List (2021-03-08 12:28:30 +0100)


hw/block/nvme updates

* NVMe subsystem support (`-device nvme-subsys`) (Minwoo Im)
* Namespace (De|At)tachment support (Minwoo Im)
* Simple Copy command support (Klaus Jensen)
* Flush broadcast support (Gollu Appalanaidu)
* QEMUIOVector/QEMUSGList duality refactoring (Klaus Jensen)

plus various fixes from Minwoo, Gollu, Dmitry and me.



Dmitry Fomichev (1):
  hw/block/nvme: fix Close Zone

Gollu Appalanaidu (7):
  hw/block/nvme: use locally assigned QEMU IEEE OUI
  hw/block/nvme: add broadcast nsid support flush command
  hw/block/nvme: remove unnecessary endian conversion
  hw/block/nvme: add identify trace event
  hw/block/nvme: fix potential compilation error
  hw/block/nvme: add trace event for zone read check
  hw/block/nvme: report non-mdts command size limit for dsm

Klaus Jensen (16):
  hw/block/nvme: remove unused parameter in check zone write
  hw/block/nvme: refactor zone resource management
  hw/block/nvme: pull write pointer advancement to separate function
  nvme: updated shared header for copy command
  hw/block/nvme: add simple copy command
  hw/block/nvme: add missing mor/mar constraint checks
  hw/block/nvme: improve invalid zasl value reporting
  hw/block/nvme: document 'mdts' nvme device parameter
  hw/block/nvme: deduplicate bad mdts trace event
  hw/block/nvme: align zoned.zasl with mdts
  hw/block/nvme: remove redundant len member in compare context
  hw/block/nvme: remove block accounting for write zeroes
  hw/block/nvme: fix strerror printing
  hw/block/nvme: try to deal with the iov/qsg duality
  hw/block/nvme: remove the req dependency in map functions
  hw/block/nvme: refactor nvme_dma

Minwoo Im (14):
  hw/block/nvme: introduce nvme-subsys device
  hw/block/nvme: support to map controller to a subsystem
  hw/block/nvme: add CMIC enum value for Identify Controller
  hw/block/nvme: support for multi-controller in subsystem
  hw/block/nvme: add NMIC enum value for Identify Namespace
  hw/block/nvme: support for shared namespace in subsystem
  hw/block/nvme: support namespace detach
  hw/block/nvme: fix namespaces array to 1-based
  hw/block/nvme: fix allocated namespace list to 256
  hw/block/nvme: support allocated namespace type
  hw/block/nvme: refactor nvme_select_ns_iocs
  hw/block/nvme: support namespace attachment command
  hw/block/nvme: support changed namespace asynchronous event
  hw/block/nvme: support Identify NS Attached Controller List

 hw/block/nvme-ns.h |   13 +
 hw/block/nvme-subsys.h |   56 ++
 hw/block/nvme.h|   63 +-
 include/block/nvme.h   |   88 ++-
 hw/block/nvme-ns.c |   38 +-
 hw/block/nvme-subsys.c |  106 +++
 hw/block/nvme.c| 1434 +---
 hw/block/meson.build   |2 +-
 hw/block/trace-events  |   21 +-
 9 files changed, 1424 insertions(+), 397 deletions(-)
 create mode 100644 hw/block/nvme-subsys.h
 create mode 100644 hw/block/nvme-subsys.c

-- 
2.30.1




Re: block/throttle and burst bucket

2021-03-08 Thread Alberto Garcia
On Mon 01 Mar 2021 01:11:55 PM CET, Peter Lieven  wrote:
> Why we talk about throttling I still do not understand the following part in 
> util/throttle.c function throttle_compute_wait
>
>
>     if (!bkt->max) {
>     /* If bkt->max is 0 we still want to allow short bursts of I/O
>  * from the guest, otherwise every other request will be throttled
>  * and performance will suffer considerably. */
>     bucket_size = (double) bkt->avg / 10;
>     burst_bucket_size = 0;
>     } else {
>     /* If we have a burst limit then we have to wait until all I/O
>  * at burst rate has finished before throttling to bkt->avg */
>     bucket_size = bkt->max * bkt->burst_length;
>     burst_bucket_size = (double) bkt->max / 10;
>     }
>
>
> Why burst_bucket_size = bkt->max / 10?
>
> From what I understand it should be bkt->max. Otherwise we compare the
> "extra" against a tenth of the bucket capacity

1) bkt->max is the burst rate in bytes/second [*]
2) burst_bucket_size is used to decide when to start throttling (you can
   see the code at the end of throttle_compute_wait()).

The important thing is that burst_bucket_size does not actually have an
influence on the actual burst rate. Increasing that value is not going
to make the I/O faster, it just means that I/O will be throttled later.

Once the I/O is throttled, the actual burst rate is defined by how quickly
the burst bucket leaks (see throttle_leak_bucket()).

The higher burst_bucket_size is, the longer we allow the guest to exceed
the maximum rate. So we divide bkt->max by 10 in order to allow the
guest to perform 100ms' worth of I/O without being throttled.

See the commit message of 0770a7a6466cc2dbf4ac91841173ad4488e1fbc7 for
more details.
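
To make the 100ms point concrete, here is a simplified standalone model of
the branch quoted above. The struct and function names are hypothetical,
and plain doubles are used for all fields, which is an assumption of this
sketch and not the layout used in util/throttle.c.

```c
/* Hypothetical model of one leaky bucket; rates are in units per second. */
struct bucket_cfg {
    double avg;          /* average rate */
    double max;          /* burst rate, 0 if no burst limit is set */
    double burst_length; /* seconds the burst rate may be sustained */
};

/*
 * Compute the two thresholds as in the quoted snippet. One tenth of a
 * per-second rate corresponds to 100ms' worth of I/O at that rate.
 */
static void bucket_sizes(const struct bucket_cfg *b,
                         double *bucket_size, double *burst_bucket_size)
{
    if (b->max == 0) {
        /* No burst limit: still allow 100ms' worth of I/O at avg. */
        *bucket_size = b->avg / 10;
        *burst_bucket_size = 0;
    } else {
        /* Burst limit: the main bucket holds a full burst period. */
        *bucket_size = b->max * b->burst_length;
        /* Throttling to max starts after 100ms' worth of I/O at max. */
        *burst_bucket_size = b->max / 10;
    }
}
```

For avg=100, max=1000 and burst_length=5, this gives a bucket_size of 5000
and a burst_bucket_size of 100, that is, 100ms of I/O at the burst rate.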

Berto


