date:20200922

Re: [PATCH v2 0/2] block: deprecate the sheepdog driver

2020-09-22 Thread Vladimir Sementsov-Ogievskiy


22.09.2020 21:11, Neal Gompa wrote:

On Tue, Sep 22, 2020 at 1:43 PM Daniel P. Berrangé  wrote:


On Tue, Sep 22, 2020 at 01:09:06PM -0400, Neal Gompa wrote:

On Tue, Sep 22, 2020 at 12:16 PM Daniel P. Berrangé  wrote:


2 years back I proposed dropping the sheepdog mailing list from the
MAINTAINERS file, but somehow the patch never got picked up:

   https://lists.gnu.org/archive/html/qemu-block/2018-03/msg01048.html

So here I am with the same patch again.

This time I go further and deprecate the sheepdog driver entirely.
See the rationale in the second patch commit message.

Daniel P. Berrangé (2):
   block: drop moderated sheepdog mailing list from MAINTAINERS file
   block: deprecate the sheepdog block driver

  MAINTAINERS|  1 -
  block/sheepdog.c   | 15 +++
  configure  |  5 +++--
  docs/system/deprecated.rst |  9 +
  4 files changed, 27 insertions(+), 3 deletions(-)

--
2.26.2




I don't know of anyone shipping this other than Fedora, and I
certainly don't use it there.

Upstream looks like it's unmaintained now, with no commits in over two
years: https://github.com/sheepdog/sheepdog/commits/master

Can we also get a corresponding change to eliminate this from libvirt?


This is only deprecation in QEMU, the feature still exists and is
intended to be as fully functional as in previous releases.

Assuming QEMU actually deletes the feature at end of the deprecation
cycle, then libvirt would look at removing its own support.



Does that stop us from deprecating it in libvirt though?



I think not. Libvirt is not intended to support all Qemu features and
I'm sure it doesn't. Amd it can safely deprecate features even if they
are not-deprecated in Qemu.

The only possible conflict is when Qemu wants to deprecate something
that Libvirt wants to continue support for (or even can't live without).

--
Best regards,
Vladimir

Re: [PATCH v2 0/2] block: deprecate the sheepdog driver

2020-09-22 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20200922161611.2049616-1-berra...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20200922161611.2049616-1-berra...@redhat.com
Subject: [PATCH v2 0/2] block: deprecate the sheepdog driver

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
2c11635 block: deprecate the sheepdog block driver
aa3d54c block: drop moderated sheepdog mailing list from MAINTAINERS file

=== OUTPUT BEGIN ===
1/2 Checking commit aa3d54cfa28f (block: drop moderated sheepdog mailing list 
from MAINTAINERS file)
2/2 Checking commit 2c1163558579 (block: deprecate the sheepdog block driver)
ERROR: do not initialise statics to 0 or NULL
#54: FILE: block/sheepdog.c:247:
+static bool warned = false;

total: 1 errors, 0 warnings, 71 lines checked

Patch 2/2 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200922161611.2049616-1-berra...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v2 0/2] block: deprecate the sheepdog driver

2020-09-22 Thread Neal Gompa

On Tue, Sep 22, 2020 at 1:43 PM Daniel P. Berrangé  wrote:
>
> On Tue, Sep 22, 2020 at 01:09:06PM -0400, Neal Gompa wrote:
> > On Tue, Sep 22, 2020 at 12:16 PM Daniel P. Berrangé  
> > wrote:
> > >
> > > 2 years back I proposed dropping the sheepdog mailing list from the
> > > MAINTAINERS file, but somehow the patch never got picked up:
> > >
> > >   https://lists.gnu.org/archive/html/qemu-block/2018-03/msg01048.html
> > >
> > > So here I am with the same patch again.
> > >
> > > This time I go further and deprecate the sheepdog driver entirely.
> > > See the rationale in the second patch commit message.
> > >
> > > Daniel P. Berrangé (2):
> > >   block: drop moderated sheepdog mailing list from MAINTAINERS file
> > >   block: deprecate the sheepdog block driver
> > >
> > >  MAINTAINERS|  1 -
> > >  block/sheepdog.c   | 15 +++
> > >  configure  |  5 +++--
> > >  docs/system/deprecated.rst |  9 +
> > >  4 files changed, 27 insertions(+), 3 deletions(-)
> > >
> > > --
> > > 2.26.2
> > >
> > >
> >
> > I don't know of anyone shipping this other than Fedora, and I
> > certainly don't use it there.
> >
> > Upstream looks like it's unmaintained now, with no commits in over two
> > years: https://github.com/sheepdog/sheepdog/commits/master
> >
> > Can we also get a corresponding change to eliminate this from libvirt?
>
> This is only deprecation in QEMU, the feature still exists and is
> intended to be as fully functional as in previous releases.
>
> Assuming QEMU actually deletes the feature at end of the deprecation
> cycle, then libvirt would look at removing its own support.
>

Does that stop us from deprecating it in libvirt though?

-- 
真実はいつも一つ！/ Always, there's only one truth!

Re: [PATCH v2 2/2] block: deprecate the sheepdog block driver

2020-09-22 Thread Vladimir Sementsov-Ogievskiy


22.09.2020 19:16, Daniel P. Berrangé wrote:

This thread from a little over a year ago:

   http://lists.wpkg.org/pipermail/sheepdog/2019-March/thread.html

states that sheepdog is no longer actively developed. The only mentioned
users are some companies who are said to have it for legacy reasons with
plans to replace it by Ceph. There is talk about cutting out existing
features to turn it into a simple demo of how to write a distributed
block service. There is no evidence of anyone working on that idea:

   https://github.com/sheepdog/sheepdog/commits/master

No real commits to git since Jan 2018, and before then just some minor
technical debt cleanup..

There is essentially no activity on the mailing list aside from
patches to QEMU that get CC'd due to our MAINTAINERS entry.

Fedora packages for sheepdog failed to build from upstream source
because of the more strict linker that no longer merges duplicate
global symbols. Fedora patches it to add the missing "extern"
annotations and presumably other distros do to, but upstream source
remains broken.

There is only basic compile testing, no functional testing of the
driver.

Since there are no build pre-requisites the sheepdog driver is currently
enabled unconditionally. This would result in configure issuing a
deprecation warning by default for all users. Thus the configure default
is changed to disable it, requiring users to pass --enable-sheepdog to
build the driver.

Signed-off-by: Daniel P. Berrangé


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir

Re: [PATCH v2 1/2] block: drop moderated sheepdog mailing list from MAINTAINERS file

2020-09-22 Thread Vladimir Sementsov-Ogievskiy


22.09.2020 19:16, Daniel P. Berrangé wrote:

The sheepdog mailing list is setup to stop and queue messages from
non-subscribers, pending moderator approval. Unfortunately it seems
that the moderation queue is not actively dealt with. Even when messages
are approved, the sender is never added to the whitelist, so every
future mail from the same sender continues to get stopped for moderation.

MAINTAINERS entries should be responsive and not unneccessarily block
mails from QEMU contributors, so drop the sheepdog mailing list.

Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Daniel P. Berrangé


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir

Re: [PATCH v2 0/2] block: deprecate the sheepdog driver

2020-09-22 Thread Daniel P . Berrangé

On Tue, Sep 22, 2020 at 01:09:06PM -0400, Neal Gompa wrote:
> On Tue, Sep 22, 2020 at 12:16 PM Daniel P. Berrangé  
> wrote:
> >
> > 2 years back I proposed dropping the sheepdog mailing list from the
> > MAINTAINERS file, but somehow the patch never got picked up:
> >
> >   https://lists.gnu.org/archive/html/qemu-block/2018-03/msg01048.html
> >
> > So here I am with the same patch again.
> >
> > This time I go further and deprecate the sheepdog driver entirely.
> > See the rationale in the second patch commit message.
> >
> > Daniel P. Berrangé (2):
> >   block: drop moderated sheepdog mailing list from MAINTAINERS file
> >   block: deprecate the sheepdog block driver
> >
> >  MAINTAINERS|  1 -
> >  block/sheepdog.c   | 15 +++
> >  configure  |  5 +++--
> >  docs/system/deprecated.rst |  9 +
> >  4 files changed, 27 insertions(+), 3 deletions(-)
> >
> > --
> > 2.26.2
> >
> >
> 
> I don't know of anyone shipping this other than Fedora, and I
> certainly don't use it there.
> 
> Upstream looks like it's unmaintained now, with no commits in over two
> years: https://github.com/sheepdog/sheepdog/commits/master
> 
> Can we also get a corresponding change to eliminate this from libvirt?

This is only deprecation in QEMU, the feature still exists and is
intended to be as fully functional as in previous releases.

Assuming QEMU actually deletes the feature at end of the deprecation
cycle, then libvirt would look at removing its own support.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v3 00/17] hw/block/nvme: multiple namespaces support

2020-09-22 Thread Keith Busch

On Tue, Sep 22, 2020 at 07:27:30PM +0200, Klaus Jensen wrote:
> On Sep 22 08:31, Keith Busch wrote:
> > On Tue, Sep 22, 2020 at 10:45:16AM +0200, Klaus Jensen wrote:
> > > From: Klaus Jensen 
> > > 
> > > This is the next round of my patches for the nvme device.
> > > 
> > > This includes a bit of cleanup and two new features:
> > > 
> > >   * support for scatter/gather lists
> > > 
> > >   * multiple namespaces support through a new nvme-ns device
> > > 
> > > Finally, the series wraps up with changing the PCI vendor and device ID 
> > > to get
> > > rid of the internal Intel id and as a side-effect get rid of some Linux 
> > > kernel
> > > quirks that no longer applies.
> > > 
> > > "pci: pass along the return value of dma_memory_rw" has already been 
> > > posted by
> > > Philippe in another series, but since it is not applied yet, I am 
> > > including it
> > > here.
> > 
> > For the rest of the patches I haven't individually commented:
> > 
> > Reviewed-by: Keith Busch 
> 
> Sorry if I am being thick here Keith, but didn't you R-b all patches
> (except 3 and 9) in v2 yesterday?
> 
> I do not see any comments to any of the v3 patches, so I'm not sure how
> to interpret this ;)

Ha, I thought I did yesterday too, but it's not showing in my "Sent"
mail so I sent it again. But since it apparently did send, you can
interpret the 2nd as "two thumbs up!". :)

Re: [PATCH v2 0/2] block: deprecate the sheepdog driver

2020-09-22 Thread Neal Gompa

On Tue, Sep 22, 2020 at 12:16 PM Daniel P. Berrangé  wrote:
>
> 2 years back I proposed dropping the sheepdog mailing list from the
> MAINTAINERS file, but somehow the patch never got picked up:
>
>   https://lists.gnu.org/archive/html/qemu-block/2018-03/msg01048.html
>
> So here I am with the same patch again.
>
> This time I go further and deprecate the sheepdog driver entirely.
> See the rationale in the second patch commit message.
>
> Daniel P. Berrangé (2):
>   block: drop moderated sheepdog mailing list from MAINTAINERS file
>   block: deprecate the sheepdog block driver
>
>  MAINTAINERS|  1 -
>  block/sheepdog.c   | 15 +++
>  configure  |  5 +++--
>  docs/system/deprecated.rst |  9 +
>  4 files changed, 27 insertions(+), 3 deletions(-)
>
> --
> 2.26.2
>
>

I don't know of anyone shipping this other than Fedora, and I
certainly don't use it there.

Upstream looks like it's unmaintained now, with no commits in over two
years: https://github.com/sheepdog/sheepdog/commits/master

Can we also get a corresponding change to eliminate this from libvirt?

Reviewed-by: Neal Gompa 


-- 
真実はいつも一つ！/ Always, there's only one truth!

Re: [PATCH v2 2/2] block: deprecate the sheepdog block driver

2020-09-22 Thread Thomas Huth

On 22/09/2020 18.16, Daniel P. Berrangé wrote:
> This thread from a little over a year ago:
> 
>   http://lists.wpkg.org/pipermail/sheepdog/2019-March/thread.html
> 
> states that sheepdog is no longer actively developed. The only mentioned
> users are some companies who are said to have it for legacy reasons with
> plans to replace it by Ceph. There is talk about cutting out existing
> features to turn it into a simple demo of how to write a distributed
> block service. There is no evidence of anyone working on that idea:
> 
>   https://github.com/sheepdog/sheepdog/commits/master
> 
> No real commits to git since Jan 2018, and before then just some minor
> technical debt cleanup..
> 
> There is essentially no activity on the mailing list aside from
> patches to QEMU that get CC'd due to our MAINTAINERS entry.
> 
> Fedora packages for sheepdog failed to build from upstream source
> because of the more strict linker that no longer merges duplicate
> global symbols. Fedora patches it to add the missing "extern"
> annotations and presumably other distros do to, but upstream source
> remains broken.
> 
> There is only basic compile testing, no functional testing of the
> driver.
> 
> Since there are no build pre-requisites the sheepdog driver is currently
> enabled unconditionally. This would result in configure issuing a
> deprecation warning by default for all users. Thus the configure default
> is changed to disable it, requiring users to pass --enable-sheepdog to
> build the driver.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  block/sheepdog.c   | 15 +++
>  configure  |  5 +++--
>  docs/system/deprecated.rst |  9 +
>  3 files changed, 27 insertions(+), 2 deletions(-)

Reviewed-by: Thomas Huth

Re: [PATCH v2 1/2] block: drop moderated sheepdog mailing list from MAINTAINERS file

2020-09-22 Thread Thomas Huth

On 22/09/2020 18.16, Daniel P. Berrangé wrote:
> The sheepdog mailing list is setup to stop and queue messages from
> non-subscribers, pending moderator approval. Unfortunately it seems
> that the moderation queue is not actively dealt with. Even when messages
> are approved, the sender is never added to the whitelist, so every
> future mail from the same sender continues to get stopped for moderation.
> 
> MAINTAINERS entries should be responsive and not unneccessarily block
> mails from QEMU contributors, so drop the sheepdog mailing list.
> 
> Reviewed-by: Philippe Mathieu-Daudé 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  MAINTAINERS | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3d17cad19a..8e8a4fb0a8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2852,7 +2852,6 @@ F: block/rbd.c
>  Sheepdog
>  M: Liu Yuan 
>  L: qemu-block@nongnu.org
> -L: sheep...@lists.wpkg.org
>  S: Odd Fixes
>  F: block/sheepdog.c

Reviewed-by: Thomas Huth

Re: [PATCH v3 00/17] hw/block/nvme: multiple namespaces support

2020-09-22 Thread Klaus Jensen

On Sep 22 08:31, Keith Busch wrote:
> On Tue, Sep 22, 2020 at 10:45:16AM +0200, Klaus Jensen wrote:
> > From: Klaus Jensen 
> > 
> > This is the next round of my patches for the nvme device.
> > 
> > This includes a bit of cleanup and two new features:
> > 
> >   * support for scatter/gather lists
> > 
> >   * multiple namespaces support through a new nvme-ns device
> > 
> > Finally, the series wraps up with changing the PCI vendor and device ID to 
> > get
> > rid of the internal Intel id and as a side-effect get rid of some Linux 
> > kernel
> > quirks that no longer applies.
> > 
> > "pci: pass along the return value of dma_memory_rw" has already been posted 
> > by
> > Philippe in another series, but since it is not applied yet, I am including 
> > it
> > here.
> 
> For the rest of the patches I haven't individually commented:
> 
> Reviewed-by: Keith Busch 

Sorry if I am being thick here Keith, but didn't you R-b all patches
(except 3 and 9) in v2 yesterday?

I do not see any comments to any of the v3 patches, so I'm not sure how
to interpret this ;)


signature.asc
Description: PGP signature

[PATCH v2 2/2] block: deprecate the sheepdog block driver

2020-09-22 Thread Daniel P . Berrangé

This thread from a little over a year ago:

  http://lists.wpkg.org/pipermail/sheepdog/2019-March/thread.html

states that sheepdog is no longer actively developed. The only mentioned
users are some companies who are said to have it for legacy reasons with
plans to replace it by Ceph. There is talk about cutting out existing
features to turn it into a simple demo of how to write a distributed
block service. There is no evidence of anyone working on that idea:

  https://github.com/sheepdog/sheepdog/commits/master

No real commits to git since Jan 2018, and before then just some minor
technical debt cleanup..

There is essentially no activity on the mailing list aside from
patches to QEMU that get CC'd due to our MAINTAINERS entry.

Fedora packages for sheepdog failed to build from upstream source
because of the more strict linker that no longer merges duplicate
global symbols. Fedora patches it to add the missing "extern"
annotations and presumably other distros do to, but upstream source
remains broken.

There is only basic compile testing, no functional testing of the
driver.

Since there are no build pre-requisites the sheepdog driver is currently
enabled unconditionally. This would result in configure issuing a
deprecation warning by default for all users. Thus the configure default
is changed to disable it, requiring users to pass --enable-sheepdog to
build the driver.

Signed-off-by: Daniel P. Berrangé 
---
 block/sheepdog.c   | 15 +++
 configure  |  5 +++--
 docs/system/deprecated.rst |  9 +
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index cbbebc1aaf..7f68bd6a1a 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -242,6 +242,17 @@ typedef struct SheepdogInode {
  */
 #define FNV1A_64_INIT ((uint64_t)0xcbf29ce484222325ULL)
 
+static void deprecation_warning(void)
+{
+static bool warned = false;
+
+if (!warned) {
+warn_report("the sheepdog block driver is deprecated and will be "
+"removed in a future release");
+warned = true;
+}
+}
+
 /*
  * 64 bit Fowler/Noll/Vo FNV-1a hash code
  */
@@ -1548,6 +1559,8 @@ static int sd_open(BlockDriverState *bs, QDict *options, 
int flags,
 char *buf = NULL;
 QemuOpts *opts;
 
+deprecation_warning();
+
 s->bs = bs;
 s->aio_context = bdrv_get_aio_context(bs);
 
@@ -2007,6 +2020,8 @@ static int sd_co_create(BlockdevCreateOptions *options, 
Error **errp)
 
 assert(options->driver == BLOCKDEV_DRIVER_SHEEPDOG);
 
+deprecation_warning();
+
 s = g_new0(BDRVSheepdogState, 1);
 
 /* Steal SocketAddress from QAPI, set NULL to prevent double free */
diff --git a/configure b/configure
index 7564479008..c6af83f2e6 100755
--- a/configure
+++ b/configure
@@ -533,7 +533,7 @@ vdi="yes"
 vvfat="yes"
 qed="yes"
 parallels="yes"
-sheepdog="yes"
+sheepdog="no"
 libxml2=""
 debug_mutex="no"
 libpmem=""
@@ -1941,7 +1941,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   vvfat   vvfat image format support
   qed qed image format support
   parallels   parallels image format support
-  sheepdogsheepdog block driver support
+  sheepdogsheepdog block driver support (deprecated)
   crypto-afalgLinux AF_ALG crypto backend driver
   capstonecapstone disassembler support
   debug-mutex mutex debugging support
@@ -7350,6 +7350,7 @@ if test "$parallels" = "yes" ; then
   echo "CONFIG_PARALLELS=y" >> $config_host_mak
 fi
 if test "$sheepdog" = "yes" ; then
+  add_to deprecated_features "sheepdog"
   echo "CONFIG_SHEEPDOG=y" >> $config_host_mak
 fi
 if test "$pty_h" = "yes" ; then
diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
index 0cb8b01424..49b9f4b02e 100644
--- a/docs/system/deprecated.rst
+++ b/docs/system/deprecated.rst
@@ -405,6 +405,15 @@ The above, converted to the current supported format::
 
   json:{"file.driver":"rbd", "file.pool":"rbd", "file.image":"name"}
 
+``sheepdog`` driver (since 5.2.0)
+^
+
+The ``sheepdog`` block device driver is deprecated. The corresponding upstream
+server project is no longer actively maintained. Users are recommended to 
switch
+to an alternative distributed block device driver such as RBD. The
+``qemu-img convert`` command can be used to liberate existing data by moving
+it out of sheepdog volumes into an alternative storage backend.
+
 linux-user mode CPUs
 
 
-- 
2.26.2

[PATCH 03/11] util/vhost-user-server: drop unnecessary QOM cast

2020-09-22 Thread Stefan Hajnoczi

We already have access to the value with the correct type (ioc and sioc
are the same QIOChannel).

Signed-off-by: Stefan Hajnoczi 
---
 util/vhost-user-server.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 2cd0cf8277..ebe47885f5 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -337,7 +337,7 @@ static void vu_accept(QIONetListener *listener, 
QIOChannelSocket *sioc,
 server->ioc = QIO_CHANNEL(sioc);
 object_ref(OBJECT(server->ioc));
 qio_channel_attach_aio_context(server->ioc, server->ctx);
-qio_channel_set_blocking(QIO_CHANNEL(server->sioc), false, NULL);
+qio_channel_set_blocking(server->ioc, false, NULL);
 vu_client_start(server);
 }
 
-- 
2.26.2

[PATCH 10/11] block/export: report flush errors

2020-09-22 Thread Stefan Hajnoczi

Propagate the flush return value since errors are possible.

Signed-off-by: Stefan Hajnoczi 
---
 block/export/vhost-user-blk-server.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/block/export/vhost-user-blk-server.c 
b/block/export/vhost-user-blk-server.c
index b609a3e4d6..44d3c45fa2 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -78,11 +78,11 @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec 
*iov,
 return -EINVAL;
 }
 
-static void coroutine_fn vu_block_flush(VuBlockReq *req)
+static int coroutine_fn vu_block_flush(VuBlockReq *req)
 {
 VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server);
 BlockBackend *backend = vdev_blk->backend;
-blk_co_flush(backend);
+return blk_co_flush(backend);
 }
 
 static void coroutine_fn vu_block_virtio_process_req(void *opaque)
@@ -152,8 +152,11 @@ static void coroutine_fn vu_block_virtio_process_req(void 
*opaque)
 break;
 }
 case VIRTIO_BLK_T_FLUSH:
-vu_block_flush(req);
-req->in->status = VIRTIO_BLK_S_OK;
+if (vu_block_flush(req) == 0) {
+req->in->status = VIRTIO_BLK_S_OK;
+} else {
+req->in->status = VIRTIO_BLK_S_IOERR;
+}
 break;
 case VIRTIO_BLK_T_GET_ID: {
 size_t size = MIN(iov_size(&elem->in_sg[0], in_num),
-- 
2.26.2

Re: [PATCH v2 00/20] block/export: Allow exporting BDSs via FUSE

2020-09-22 Thread Daniel P . Berrangé

On Tue, Sep 22, 2020 at 12:49:12PM +0200, Max Reitz wrote:
> Based-on: <20200907182011.521007-1-kw...@redhat.com>
>   (“block/export: Add infrastructure and QAPI for block exports”)
> 
> (Though its patch 16 needs a s/= \&blk_exp_nbd/= drv/ on top.)
> 
> v1: https://lists.nongnu.org/archive/html/qemu-block/2019-12/msg00451.html
> 
> Branch: https://github.com/XanClic/qemu.git fuse-exports-v2
> Branch: https://git.xanclic.moe/XanClic/qemu.git fuse-exports-v2
> 
> 
> Hi,
> 
> Ever since I found out that you can mount FUSE filesystems on regular
> files (not just directories), I had the idea of adding FUSE block
> exports to qemu where you can export block nodes as raw images.  The
> best thing is that you’d be able to mount an image on itself, so
> whatever format it may be in, qemu lets it appear as a raw image (and
> you can then use regular tools like dd on it).
> 
> The performance is quite bad so far, but we can always try to improve it
> if the need arises.  For now I consider it mostly a cute feature to get
> easy access to the raw contents of image files in any format (without
> requiring root rights).

Aside from the iotests, so you forsee any particular use cases
where this feature is desirable / important ?

Looking at it from a security POV, I'm not thrilled about the
idea of granting QEMU permission to use the mount syscall for
seccomp or SELinux. IOW, I expect this feature won't be something
we want to expose in QEMU guests managed by libvirt, which would
limit how widely it can be used.

QEMU can export NBD. Would it make sense to do this as an NBD
client ? There's already https://libguestfs.org/nbdfuse.1.html
but IIUC that exposes it as a file within a dir. Presumably
it is not too hard to make it support exposing it directly as
a file too.

I wonder how performance compares between your native FUSE
impl in QEMU vs NBD FUSE ?

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[PATCH v2 0/2] block: deprecate the sheepdog driver

2020-09-22 Thread Daniel P . Berrangé

2 years back I proposed dropping the sheepdog mailing list from the
MAINTAINERS file, but somehow the patch never got picked up:

  https://lists.gnu.org/archive/html/qemu-block/2018-03/msg01048.html

So here I am with the same patch again.

This time I go further and deprecate the sheepdog driver entirely.
See the rationale in the second patch commit message.

Daniel P. Berrangé (2):
  block: drop moderated sheepdog mailing list from MAINTAINERS file
  block: deprecate the sheepdog block driver

 MAINTAINERS|  1 -
 block/sheepdog.c   | 15 +++
 configure  |  5 +++--
 docs/system/deprecated.rst |  9 +
 4 files changed, 27 insertions(+), 3 deletions(-)

-- 
2.26.2

[PATCH 11/11] block/export: convert vhost-user-blk server to block export API

2020-09-22 Thread Stefan Hajnoczi

Use the new QAPI block exports API instead of defining our own QOM
objects.

This is a large change because the lifecycle of VuBlockDev needs to
follow BlockExportDriver. QOM properties are replaced by QAPI options
objects.

VuBlockDev is renamed VuBlkExport and contains a BlockExport field.
Several fields can be dropped since BlockExport already has equivalents.

The file names and meson build integration will be adjusted in a future
patch. libvhost-user should probably be built as a static library that
is linked into QEMU instead of as a .c file that results in duplicate
compilation.

The new command-line syntax is:

  $ qemu-storage-daemon \
  --blockdev file,node-name=drive0,filename=test.img \
  --export 
vhost-user-blk,node-name=drive0,id=export0,unix-socket=/tmp/vhost-user-blk.sock

Note that unix-socket is optional because we may wish to accept chardevs
too in the future.

Signed-off-by: Stefan Hajnoczi 
---
 qapi/block-export.json   |  19 +-
 block/export/vhost-user-blk-server.h |  23 +-
 block/export/export.c|   8 +-
 block/export/vhost-user-blk-server.c | 461 ---
 block/export/meson.build |   1 +
 block/meson.build|   1 -
 6 files changed, 156 insertions(+), 357 deletions(-)

diff --git a/qapi/block-export.json b/qapi/block-export.json
index ace0d66e17..840dcbe833 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -84,6 +84,19 @@
   'data': { '*name': 'str', '*description': 'str',
 '*bitmap': 'str' } }
 
+##
+# @BlockExportOptionsVhostUserBlk:
+#
+# A vhost-user-blk block export.
+#
+# @unix-socket: Path where the vhost-user UNIX domain socket will be created.
+# @logical-block-size: Logical block size in bytes.
+#
+# Since: 5.2
+##
+{ 'struct': 'BlockExportOptionsVhostUserBlk',
+  'data': { '*unix-socket': 'str', '*logical-block-size': 'size' } }
+
 ##
 # @NbdServerAddOptions:
 #
@@ -180,11 +193,12 @@
 # An enumeration of block export types
 #
 # @nbd: NBD export
+# @vhost-user-blk: vhost-user-blk export (since 5.2)
 #
 # Since: 4.2
 ##
 { 'enum': 'BlockExportType',
-  'data': [ 'nbd' ] }
+  'data': [ 'nbd', 'vhost-user-blk' ] }
 
 ##
 # @BlockExportOptions:
@@ -213,7 +227,8 @@
 '*writethrough': 'bool' },
   'discriminator': 'type',
   'data': {
-  'nbd': 'BlockExportOptionsNbd'
+  'nbd': 'BlockExportOptionsNbd',
+  'vhost-user-blk': 'BlockExportOptionsVhostUserBlk'
} }
 
 ##
diff --git a/block/export/vhost-user-blk-server.h 
b/block/export/vhost-user-blk-server.h
index f06f37c4c8..fcf46fc8a5 100644
--- a/block/export/vhost-user-blk-server.h
+++ b/block/export/vhost-user-blk-server.h
@@ -10,27 +10,10 @@
 
 #ifndef VHOST_USER_BLK_SERVER_H
 #define VHOST_USER_BLK_SERVER_H
-#include "util/vhost-user-server.h"
 
-typedef struct VuBlockDev VuBlockDev;
-#define TYPE_VHOST_USER_BLK_SERVER "vhost-user-blk-server"
-#define VHOST_USER_BLK_SERVER(obj) \
-   OBJECT_CHECK(VuBlockDev, obj, TYPE_VHOST_USER_BLK_SERVER)
+#include "block/export.h"
 
-/* vhost user block device */
-struct VuBlockDev {
-Object parent_obj;
-char *node_name;
-SocketAddress *addr;
-AioContext *ctx;
-VuServer vu_server;
-bool running;
-uint32_t blk_size;
-BlockBackend *backend;
-QIOChannelSocket *sioc;
-QTAILQ_ENTRY(VuBlockDev) next;
-struct virtio_blk_config blkcfg;
-bool writable;
-};
+/* For block/export/export.c */
+extern const BlockExportDriver blk_exp_vhost_user_blk;
 
 #endif /* VHOST_USER_BLK_SERVER_H */
diff --git a/block/export/export.c b/block/export/export.c
index 87516d1e05..d0c7590911 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -17,6 +17,9 @@
 #include "sysemu/block-backend.h"
 #include "block/export.h"
 #include "block/nbd.h"
+#if CONFIG_LINUX
+#include "block/export/vhost-user-blk-server.h"
+#endif
 #include "qapi/error.h"
 #include "qapi/qapi-commands-block-export.h"
 #include "qapi/qapi-events-block-export.h"
@@ -24,6 +27,9 @@
 
 static const BlockExportDriver *blk_exp_drivers[] = {
 &blk_exp_nbd,
+#if CONFIG_LINUX
+&blk_exp_vhost_user_blk,
+#endif
 };
 
 /* Only accessed from the main thread */
@@ -123,7 +129,7 @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error 
**errp)
 assert(drv->instance_size >= sizeof(BlockExport));
 exp = g_malloc0(drv->instance_size);
 *exp = (BlockExport) {
-.drv= &blk_exp_nbd,
+.drv= drv,
 .refcount   = 1,
 .user_owned = true,
 .id = g_strdup(export->id),
diff --git a/block/export/vhost-user-blk-server.c 
b/block/export/vhost-user-blk-server.c
index 44d3c45fa2..9908b3287e 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -11,6 +11,9 @@
  */
 #include "qemu/osdep.h"
 #include "block/block.h"
+#include "contrib/libvhost-user/libvhost-user.h"
+#include "standard-headers/linux/virtio_blk.h"
+#include "util/vhost-user-server.h"
 #include "vh

[PATCH v2 1/2] block: drop moderated sheepdog mailing list from MAINTAINERS file

2020-09-22 Thread Daniel P . Berrangé

The sheepdog mailing list is setup to stop and queue messages from
non-subscribers, pending moderator approval. Unfortunately it seems
that the moderation queue is not actively dealt with. Even when messages
are approved, the sender is never added to the whitelist, so every
future mail from the same sender continues to get stopped for moderation.

MAINTAINERS entries should be responsive and not unneccessarily block
mails from QEMU contributors, so drop the sheepdog mailing list.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Daniel P. Berrangé 
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3d17cad19a..8e8a4fb0a8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2852,7 +2852,6 @@ F: block/rbd.c
 Sheepdog
 M: Liu Yuan 
 L: qemu-block@nongnu.org
-L: sheep...@lists.wpkg.org
 S: Odd Fixes
 F: block/sheepdog.c
 
-- 
2.26.2

[PATCH 02/11] util/vhost-user-server: s/fileds/fields/ typo fix

2020-09-22 Thread Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi 
---
 util/vhost-user-server.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 7b50a2b1fd..2cd0cf8277 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -407,7 +407,7 @@ bool vhost_user_server_start(VuServer *server,
 return false;
 }
 
-/* zero out unspecified fileds */
+/* zero out unspecified fields */
 *server = (VuServer) {
 .listener  = listener,
 .vu_iface  = vu_iface,
-- 
2.26.2

[PATCH 09/11] util/vhost-user-server: rework vu_client_trip() coroutine lifecycle

2020-09-22 Thread Stefan Hajnoczi

The vu_client_trip() coroutine is leaked during AioContext switching. It
is also unsafe to destroy the vu_dev in panic_cb() since its callers
still access it in some cases.

Rework the lifecycle to solve these safety issues.

Signed-off-by: Stefan Hajnoczi 
---
 util/vhost-user-server.h |  29 ++--
 block/export/vhost-user-blk-server.c |   9 +-
 util/vhost-user-server.c | 245 +++
 3 files changed, 155 insertions(+), 128 deletions(-)

diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
index 92177fc911..0da4c2cc4c 100644
--- a/util/vhost-user-server.h
+++ b/util/vhost-user-server.h
@@ -19,34 +19,36 @@
 #include "qapi/error.h"
 #include "standard-headers/linux/virtio_blk.h"
 
+/* A kick fd that we monitor on behalf of libvhost-user */
 typedef struct VuFdWatch {
 VuDev *vu_dev;
 int fd; /*kick fd*/
 void *pvt;
 vu_watch_cb cb;
-bool processing;
 QTAILQ_ENTRY(VuFdWatch) next;
 } VuFdWatch;
 
-typedef struct VuServer VuServer;
-
-struct VuServer {
+/**
+ * VuServer:
+ * A vhost-user server instance with user-defined VuDevIface callbacks.
+ * Vhost-user device backends can be implemented using VuServer. VuDevIface
+ * callbacks and virtqueue kicks run in the given AioContext.
+ */
+typedef struct {
 QIONetListener *listener;
+QEMUBH *restart_listener_bh;
 AioContext *ctx;
 int max_queues;
 const VuDevIface *vu_iface;
+
+/* Protected by ctx lock */
 VuDev vu_dev;
 QIOChannel *ioc; /* The I/O channel with the client */
 QIOChannelSocket *sioc; /* The underlying data channel with the client */
-/* IOChannel for fd provided via VHOST_USER_SET_SLAVE_REQ_FD */
-QIOChannel *ioc_slave;
-QIOChannelSocket *sioc_slave;
-Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
 QTAILQ_HEAD(, VuFdWatch) vu_fd_watches;
-/* restart coroutine co_trip if AIOContext is changed */
-bool aio_context_changed;
-bool processing_msg;
-};
+
+Coroutine *co_trip; /* coroutine for processing VhostUserMsg */
+} VuServer;
 
 bool vhost_user_server_start(VuServer *server,
  SocketAddress *unix_socket,
@@ -57,6 +59,7 @@ bool vhost_user_server_start(VuServer *server,
 
 void vhost_user_server_stop(VuServer *server);
 
-void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx);
+void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);
+void vhost_user_server_detach_aio_context(VuServer *server);
 
 #endif /* VHOST_USER_SERVER_H */
diff --git a/block/export/vhost-user-blk-server.c 
b/block/export/vhost-user-blk-server.c
index f2bfddbf26..b609a3e4d6 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -313,18 +313,13 @@ static const VuDevIface vu_block_iface = {
 static void blk_aio_attached(AioContext *ctx, void *opaque)
 {
 VuBlockDev *vub_dev = opaque;
-aio_context_acquire(ctx);
-vhost_user_server_set_aio_context(&vub_dev->vu_server, ctx);
-aio_context_release(ctx);
+vhost_user_server_attach_aio_context(&vub_dev->vu_server, ctx);
 }
 
 static void blk_aio_detach(void *opaque)
 {
 VuBlockDev *vub_dev = opaque;
-AioContext *ctx = vub_dev->vu_server.ctx;
-aio_context_acquire(ctx);
-vhost_user_server_set_aio_context(&vub_dev->vu_server, NULL);
-aio_context_release(ctx);
+vhost_user_server_detach_aio_context(&vub_dev->vu_server);
 }
 
 static void
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index ec555abcb2..d8b8c08b5f 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -9,8 +9,50 @@
  */
 #include "qemu/osdep.h"
 #include "qemu/main-loop.h"
+#include "block/aio-wait.h"
 #include "vhost-user-server.h"
 
+/*
+ * Theory of operation:
+ *
+ * VuServer is started and stopped by vhost_user_server_start() and
+ * vhost_user_server_stop() from the main loop thread. Starting the server
+ * opens a vhost-user UNIX domain socket and listens for incoming connections.
+ * Only one connection is allowed at a time.
+ *
+ * The connection is handled by the vu_client_trip() coroutine in the
+ * VuServer->ctx AioContext. The coroutine consists of a vu_dispatch() loop
+ * where libvhost-user calls vu_message_read() to receive the next vhost-user
+ * protocol messages over the UNIX domain socket.
+ *
+ * When virtqueues are set up libvhost-user calls set_watch() to monitor kick
+ * fds. These fds are also handled in the VuServer->ctx AioContext.
+ *
+ * Both vu_client_trip() and kick fd monitoring can be stopped by shutting down
+ * the socket connection. Shutting down the socket connection causes
+ * vu_message_read() to fail since no more data can be received from the 
socket.
+ * After vu_dispatch() fails, vu_client_trip() calls vu_deinit() to stop
+ * libvhost-user before terminating the coroutine. vu_deinit() calls
+ * remove_watch() to stop monitoring kick fds and this stops virtqueue
+ * processing.
+ *
+ * Whe

[PATCH 06/11] util/vhost-user-server: drop unused DevicePanicNotifier

2020-09-22 Thread Stefan Hajnoczi

The device panic notifier callback is not used. Drop it.

Signed-off-by: Stefan Hajnoczi 
---
 util/vhost-user-server.h | 3 ---
 block/export/vhost-user-blk-server.c | 3 +--
 util/vhost-user-server.c | 6 --
 3 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h
index 5232f96718..92177fc911 100644
--- a/util/vhost-user-server.h
+++ b/util/vhost-user-server.h
@@ -29,12 +29,10 @@ typedef struct VuFdWatch {
 } VuFdWatch;
 
 typedef struct VuServer VuServer;
-typedef void DevicePanicNotifierFn(VuServer *server);
 
 struct VuServer {
 QIONetListener *listener;
 AioContext *ctx;
-DevicePanicNotifierFn *device_panic_notifier;
 int max_queues;
 const VuDevIface *vu_iface;
 VuDev vu_dev;
@@ -54,7 +52,6 @@ bool vhost_user_server_start(VuServer *server,
  SocketAddress *unix_socket,
  AioContext *ctx,
  uint16_t max_queues,
- DevicePanicNotifierFn *device_panic_notifier,
  const VuDevIface *vu_iface,
  Error **errp);
 
diff --git a/block/export/vhost-user-blk-server.c 
b/block/export/vhost-user-blk-server.c
index ef07a87eb1..f2bfddbf26 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -439,8 +439,7 @@ static void vhost_user_blk_server_start(VuBlockDev 
*vu_block_device,
 ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend));
 
 if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx,
- VHOST_USER_BLK_MAX_QUEUES,
- NULL, &vu_block_iface,
+ VHOST_USER_BLK_MAX_QUEUES, &vu_block_iface,
  errp)) {
 goto error;
 }
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index ebb850731b..892815827d 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -81,10 +81,6 @@ static void panic_cb(VuDev *vu_dev, const char *buf)
 close_client(server);
 }
 
-if (server->device_panic_notifier) {
-server->device_panic_notifier(server);
-}
-
 /*
  * Set the callback function for network listener so another
  * vhost-user client can connect to this server
@@ -385,7 +381,6 @@ bool vhost_user_server_start(VuServer *server,
  SocketAddress *socket_addr,
  AioContext *ctx,
  uint16_t max_queues,
- DevicePanicNotifierFn *device_panic_notifier,
  const VuDevIface *vu_iface,
  Error **errp)
 {
@@ -402,7 +397,6 @@ bool vhost_user_server_start(VuServer *server,
 .vu_iface  = vu_iface,
 .max_queues= max_queues,
 .ctx   = ctx,
-.device_panic_notifier = device_panic_notifier,
 };
 
 qio_net_listener_set_name(server->listener, "vhost-user-backend-listener");
-- 
2.26.2

[PATCH 04/11] util/vhost-user-server: drop unnecessary watch deletion

2020-09-22 Thread Stefan Hajnoczi

Explicitly deleting watches is not necessary since libvhost-user calls
remove_watch() during vu_deinit(). Add an assertion to check this
though.

Signed-off-by: Stefan Hajnoczi 
---
 util/vhost-user-server.c | 19 ---
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index ebe47885f5..ebb850731b 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -48,21 +48,6 @@ static void close_client(VuServer *server)
 /* When this is set vu_client_trip will stop new processing vhost-user 
message */
 server->sioc = NULL;
 
-VuFdWatch *vu_fd_watch, *next;
-QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
-aio_set_fd_handler(server->ioc->ctx, vu_fd_watch->fd, true, NULL,
-   NULL, NULL, NULL);
-}
-
-while (!QTAILQ_EMPTY(&server->vu_fd_watches)) {
-QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) {
-if (!vu_fd_watch->processing) {
-QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next);
-g_free(vu_fd_watch);
-}
-}
-}
-
 while (server->processing_msg) {
 if (server->ioc->read_coroutine) {
 server->ioc->read_coroutine = NULL;
@@ -73,6 +58,10 @@ static void close_client(VuServer *server)
 }
 
 vu_deinit(&server->vu_dev);
+
+/* vu_deinit() should have called remove_watch() */
+assert(QTAILQ_EMPTY(&server->vu_fd_watches));
+
 object_unref(OBJECT(sioc));
 object_unref(OBJECT(server->ioc));
 }
-- 
2.26.2

[PATCH 07/11] util/vhost-user-server: fix memory leak in vu_message_read()

2020-09-22 Thread Stefan Hajnoczi

fds[] is leaked when qio_channel_readv_full() fails.

Use vmsg->fds[] instead of keeping a local fds[] array. Then we can
reuse goto fail to clean up fds. vmsg->fd_num must be zeroed before the
loop to make this safe.

Signed-off-by: Stefan Hajnoczi 
---
 util/vhost-user-server.c | 50 ++--
 1 file changed, 23 insertions(+), 27 deletions(-)

diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 892815827d..5a60e2ca2a 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -100,21 +100,11 @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg 
*vmsg)
 };
 int rc, read_bytes = 0;
 Error *local_err = NULL;
-/*
- * Store fds/nfds returned from qio_channel_readv_full into
- * temporary variables.
- *
- * VhostUserMsg is a packed structure, gcc will complain about passing
- * pointer to a packed structure member if we pass &VhostUserMsg.fd_num
- * and &VhostUserMsg.fds directly when calling qio_channel_readv_full,
- * thus two temporary variables nfds and fds are used here.
- */
-size_t nfds = 0, nfds_t = 0;
 const size_t max_fds = G_N_ELEMENTS(vmsg->fds);
-int *fds_t = NULL;
 VuServer *server = container_of(vu_dev, VuServer, vu_dev);
 QIOChannel *ioc = server->ioc;
 
+vmsg->fd_num = 0;
 if (!ioc) {
 error_report_err(local_err);
 goto fail;
@@ -122,41 +112,47 @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg 
*vmsg)
 
 assert(qemu_in_coroutine());
 do {
+size_t nfds = 0;
+int *fds = NULL;
+
 /*
  * qio_channel_readv_full may have short reads, keeping calling it
  * until getting VHOST_USER_HDR_SIZE or 0 bytes in total
  */
-rc = qio_channel_readv_full(ioc, &iov, 1, &fds_t, &nfds_t, &local_err);
+rc = qio_channel_readv_full(ioc, &iov, 1, &fds, &nfds, &local_err);
 if (rc < 0) {
 if (rc == QIO_CHANNEL_ERR_BLOCK) {
+assert(local_err == NULL);
 qio_channel_yield(ioc, G_IO_IN);
 continue;
 } else {
 error_report_err(local_err);
-return false;
+goto fail;
 }
 }
-read_bytes += rc;
-if (nfds_t > 0) {
-if (nfds + nfds_t > max_fds) {
+
+if (nfds > 0) {
+if (vmsg->fd_num + nfds > max_fds) {
 error_report("A maximum of %zu fds are allowed, "
  "however got %lu fds now",
- max_fds, nfds + nfds_t);
+ max_fds, vmsg->fd_num + nfds);
+g_free(fds);
 goto fail;
 }
-memcpy(vmsg->fds + nfds, fds_t,
-   nfds_t *sizeof(vmsg->fds[0]));
-nfds += nfds_t;
-g_free(fds_t);
+memcpy(vmsg->fds + vmsg->fd_num, fds, nfds * sizeof(vmsg->fds[0]));
+vmsg->fd_num += nfds;
+g_free(fds);
 }
-if (read_bytes == VHOST_USER_HDR_SIZE || rc == 0) {
-break;
+
+if (rc == 0) { /* socket closed */
+goto fail;
 }
-iov.iov_base = (char *)vmsg + read_bytes;
-iov.iov_len = VHOST_USER_HDR_SIZE - read_bytes;
-} while (true);
 
-vmsg->fd_num = nfds;
+iov.iov_base += rc;
+iov.iov_len -= rc;
+read_bytes += rc;
+} while (read_bytes != VHOST_USER_HDR_SIZE);
+
 /* qio_channel_readv_full will make socket fds blocking, unblock them */
 vmsg_unblock_fds(vmsg);
 if (vmsg->size > sizeof(vmsg->payload)) {
-- 
2.26.2

[PATCH 08/11] util/vhost-user-server: check EOF when reading payload

2020-09-22 Thread Stefan Hajnoczi

Unexpected EOF is an error that must be reported.

Signed-off-by: Stefan Hajnoczi 
---
 util/vhost-user-server.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 5a60e2ca2a..ec555abcb2 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -169,8 +169,10 @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg 
*vmsg)
 };
 if (vmsg->size) {
 rc = qio_channel_readv_all_eof(ioc, &iov_payload, 1, &local_err);
-if (rc == -1) {
-error_report_err(local_err);
+if (rc != 1) {
+if (local_err) {
+error_report_err(local_err);
+}
 goto fail;
 }
 }
-- 
2.26.2

[PATCH 05/11] block/export: consolidate request structs into VuBlockReq

2020-09-22 Thread Stefan Hajnoczi

Only one struct is needed per request. Drop req_data and the separate
VuBlockReq instance. Instead let vu_queue_pop() allocate everything at
once.

This fixes the req_data memory leak in vu_block_virtio_process_req().

Signed-off-by: Stefan Hajnoczi 
---
 block/export/vhost-user-blk-server.c | 68 +---
 1 file changed, 21 insertions(+), 47 deletions(-)

diff --git a/block/export/vhost-user-blk-server.c 
b/block/export/vhost-user-blk-server.c
index fb7764a730..ef07a87eb1 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -25,7 +25,7 @@ struct virtio_blk_inhdr {
 };
 
 typedef struct VuBlockReq {
-VuVirtqElement *elem;
+VuVirtqElement elem;
 int64_t sector_num;
 size_t size;
 struct virtio_blk_inhdr *in;
@@ -39,14 +39,10 @@ static void vu_block_req_complete(VuBlockReq *req)
 VuDev *vu_dev = &req->server->vu_dev;
 
 /* IO size with 1 extra status byte */
-vu_queue_push(vu_dev, req->vq, req->elem, req->size + 1);
+vu_queue_push(vu_dev, req->vq, &req->elem, req->size + 1);
 vu_queue_notify(vu_dev, req->vq);
 
-if (req->elem) {
-free(req->elem);
-}
-
-g_free(req);
+free(req);
 }
 
 static VuBlockDev *get_vu_block_device_by_server(VuServer *server)
@@ -89,20 +85,12 @@ static void coroutine_fn vu_block_flush(VuBlockReq *req)
 blk_co_flush(backend);
 }
 
-struct req_data {
-VuServer *server;
-VuVirtq *vq;
-VuVirtqElement *elem;
-};
-
 static void coroutine_fn vu_block_virtio_process_req(void *opaque)
 {
-struct req_data *data = opaque;
-VuServer *server = data->server;
-VuVirtq *vq = data->vq;
-VuVirtqElement *elem = data->elem;
+VuBlockReq *req = opaque;
+VuServer *server = req->server;
+VuVirtqElement *elem = &req->elem;
 uint32_t type;
-VuBlockReq *req;
 
 VuBlockDev *vdev_blk = get_vu_block_device_by_server(server);
 BlockBackend *backend = vdev_blk->backend;
@@ -111,18 +99,13 @@ static void coroutine_fn vu_block_virtio_process_req(void 
*opaque)
 struct iovec *out_iov = elem->out_sg;
 unsigned in_num = elem->in_num;
 unsigned out_num = elem->out_num;
+
 /* refer to hw/block/virtio_blk.c */
 if (elem->out_num < 1 || elem->in_num < 1) {
 error_report("virtio-blk request missing headers");
-free(elem);
-return;
+goto err;
 }
 
-req = g_new0(VuBlockReq, 1);
-req->server = server;
-req->vq = vq;
-req->elem = elem;
-
 if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out,
 sizeof(req->out)) != sizeof(req->out))) {
 error_report("virtio-blk request outhdr too short");
@@ -202,36 +185,27 @@ static void coroutine_fn vu_block_virtio_process_req(void 
*opaque)
 
 err:
 free(elem);
-g_free(req);
-return;
 }
 
 static void vu_block_process_vq(VuDev *vu_dev, int idx)
 {
-VuServer *server;
-VuVirtq *vq;
-struct req_data *req_data;
+VuServer *server = container_of(vu_dev, VuServer, vu_dev);
+VuVirtq *vq = vu_get_queue(vu_dev, idx);
 
-server = container_of(vu_dev, VuServer, vu_dev);
-assert(server);
-
-vq = vu_get_queue(vu_dev, idx);
-assert(vq);
-VuVirtqElement *elem;
 while (1) {
-elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) +
-sizeof(VuBlockReq));
-if (elem) {
-req_data = g_new0(struct req_data, 1);
-req_data->server = server;
-req_data->vq = vq;
-req_data->elem = elem;
-Coroutine *co = qemu_coroutine_create(vu_block_virtio_process_req,
-  req_data);
-aio_co_enter(server->ioc->ctx, co);
-} else {
+VuBlockReq *req;
+
+req = vu_queue_pop(vu_dev, vq, sizeof(VuBlockReq));
+if (!req) {
 break;
 }
+
+req->server = server;
+req->vq = vq;
+
+Coroutine *co =
+qemu_coroutine_create(vu_block_virtio_process_req, req);
+qemu_coroutine_enter(co);
 }
 }
 
-- 
2.26.2

[PATCH 01/11] block/export: shorten serial string to fit

2020-09-22 Thread Stefan Hajnoczi

The serial field is only 20 bytes long. Shorten the string so it fits.

This fixes the following compiler warning:

  ../block/export/vhost-user-blk-server.c:178:50: error: ‘%s’ directive output 
truncated writing 21 bytes into a region of size 20 [-Werror=format-truncation=]
  178 | snprintf(elem->in_sg[0].iov_base, size, "%s", 
"vhost_user_blk_server");
  |  ^~   
~~~

Signed-off-by: Stefan Hajnoczi 
---
 block/export/vhost-user-blk-server.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/export/vhost-user-blk-server.c 
b/block/export/vhost-user-blk-server.c
index ec78130f09..fb7764a730 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -175,7 +175,7 @@ static void coroutine_fn vu_block_virtio_process_req(void 
*opaque)
 case VIRTIO_BLK_T_GET_ID: {
 size_t size = MIN(iov_size(&elem->in_sg[0], in_num),
   VIRTIO_BLK_ID_BYTES);
-snprintf(elem->in_sg[0].iov_base, size, "%s", "vhost_user_blk_server");
+snprintf(elem->in_sg[0].iov_base, size, "%s", "vhost_user_blk");
 req->in->status = VIRTIO_BLK_S_OK;
 req->size = elem->in_sg[0].iov_len;
 break;
-- 
2.26.2

[PATCH 00/11] block/export: convert vhost-user-blk-server to block exports API

2020-09-22 Thread Stefan Hajnoczi

This patch series converts Coiby Xu's vhost-user-blk-server from a QOM object
to the block exports API. The block exports API provides a standard QMP and
command-line interface for managing block exports (NBD, FUSE, vhost-user-blk,
etc). A fair amount of init/shutdown code is removed because the block exports
API already takes care of that functionality.

Most of the patches are vhost-user-blk-server cleanups.

The syntax for launching qemu-storage-daemon is:

  $ qemu-storage-daemon \
  --blockdev file,node-name=drive0,filename=test.img \
  --export 
vhost-user-blk,node-name=drive0,id=export0,writable=on,unix-socket=/tmp/vhost-user-blk.sock

QEMU can connect to the vhost-user-blk export like this:

  $ qemu-system-x86_64 \
  -M accel=kvm,memory-backend=mem \
  -m 1G \
  -object memory-backend-memfd,size=1G,id=mem \
  -cpu host \
  -chardev socket,path=/tmp/vhost-user-blk.sock,id=char0 \
  -device vhost-user-blk-pci,chardev=char0

Based-on: 20200918080912.321299-1-coiby...@gmail.com ("[PATCH v10 0/7] 
vhost-user block device backend implementation")
Based-on: 20200907182011.521007-1-kw...@redhat.com ("[PATCH 00/29] 
block/export: Add infrastructure and QAPI for block exports")

Stefan Hajnoczi (11):
  block/export: shorten serial string to fit
  util/vhost-user-server: s/fileds/fields/ typo fix
  util/vhost-user-server: drop unnecessary QOM cast
  util/vhost-user-server: drop unnecessary watch deletion
  block/export: consolidate request structs into VuBlockReq
  util/vhost-user-server: drop unused DevicePanicNotifier
  util/vhost-user-server: fix memory leak in vu_message_read()
  util/vhost-user-server: check EOF when reading payload
  util/vhost-user-server: rework vu_client_trip() coroutine lifecycle
  block/export: report flush errors
  block/export: convert vhost-user-blk server to block export API

 qapi/block-export.json   |  19 +-
 block/export/vhost-user-blk-server.h |  23 +-
 util/vhost-user-server.h |  32 +-
 block/export/export.c|   8 +-
 block/export/vhost-user-blk-server.c | 534 ---
 util/vhost-user-server.c | 322 
 block/export/meson.build |   1 +
 block/meson.build|   1 -
 8 files changed, 360 insertions(+), 580 deletions(-)

-- 
2.26.2

Re: [PATCH v2 01/20] configure: Detect libfuse

2020-09-22 Thread Paolo Bonzini

On 22/09/20 17:37, Max Reitz wrote:
> On 22.09.20 13:14, Thomas Huth wrote:
> 
> [...]
> 
>> Could you turn this immediately into a meson test instead? See e.g.
>> https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg07112.html as
>> an example for how to do this.
> 
> Done (I think).  Until I send v3, the updated version lives here:
> 
> https://git.xanclic.moe/XanClic/qemu/commits/branch/fuse-exports-next
> https://github.com/XanClic/qemu/commits/fuse-exports-next

Looks good.

Paolo



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v2 01/20] configure: Detect libfuse

2020-09-22 Thread Max Reitz

On 22.09.20 13:14, Thomas Huth wrote:

[...]

> Could you turn this immediately into a meson test instead? See e.g.
> https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg07112.html as
> an example for how to do this.

Done (I think).  Until I send v3, the updated version lives here:

https://git.xanclic.moe/XanClic/qemu/commits/branch/fuse-exports-next
https://github.com/XanClic/qemu/commits/fuse-exports-next

I’ve replaced the compile checks (on test programs) by simpler version
checks (>=3.1 for libfuse itself, >=3.8 for fuse-lseek).

Max

signature.asc
Description: OpenPGP digital signature

Re: [PATCH v3 00/17] hw/block/nvme: multiple namespaces support

2020-09-22 Thread Keith Busch

On Tue, Sep 22, 2020 at 10:45:16AM +0200, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> This is the next round of my patches for the nvme device.
> 
> This includes a bit of cleanup and two new features:
> 
>   * support for scatter/gather lists
> 
>   * multiple namespaces support through a new nvme-ns device
> 
> Finally, the series wraps up with changing the PCI vendor and device ID to get
> rid of the internal Intel id and as a side-effect get rid of some Linux kernel
> quirks that no longer applies.
> 
> "pci: pass along the return value of dma_memory_rw" has already been posted by
> Philippe in another series, but since it is not applied yet, I am including it
> here.

For the rest of the patches I haven't individually commented:

Reviewed-by: Keith Busch

Re: [PATCH v6 2/2] nvme: allow cmb and pmr to be enabled on same device

2020-09-22 Thread Keith Busch

On Tue, Sep 22, 2020 at 11:04:25AM +0200, Klaus Jensen wrote:
> On Aug 17 08:29, Klaus Jensen wrote:
> > On Jul 30 00:50, Klaus Jensen wrote:
> > > On Jul 29 15:01, Andrzej Jakowski wrote:
> > > > So far it was not possible to have CMB and PMR emulated on the same
> > > > device, because BAR2 was used exclusively either of PMR or CMB. This
> > > > patch places CMB at BAR4 offset so it not conflicts with MSI-X vectors.
> > > > 
> > > > Signed-off-by: Andrzej Jakowski 
> > > > ---
> > > 
> > > Well, I'm certainly happy now. LGTM!
> > > 
> > > Reviewed-by: Klaus Jensen 
> > > 
> > 
> > Are anyone willing to chip in with another review on this?
> > 
> 
> I think this patch is ready (and have been for some time) for inclusion,
> but would really like an additional review on this; preferably from
> Keith, since he is the one that originally mentioned that we could do
> something like this.

Yes, this does look good to me too.

Reviewed-by: Keith Busch

Re: [RFC PATCH 3/6] hw/sd/sdcard: Do not use legal address '0' for INVALID_ADDRESS

2020-09-22 Thread Markus Armbruster

Paolo Bonzini  writes:

> I think we can just bite the bullet and bump the version number. Just like
> not all boards are created equal in terms of migration compatibility,
> neither are all devices.
>
> Unfortunately pflash is among those that need some care, but we have much
> more leeway with sdhci-pci.

No objection.

Re: [PATCH 0/1] block: future of sheepdog storage driver ?

2020-09-22 Thread Markus Armbruster

Thomas Huth  writes:

> On 22/09/2020 11.01, Daniel P. Berrangé wrote:
> [...]
>> Does someone have a compelling reason for QEMU to keep supporting
>> this driver, other than the fact that it already exists ?
>> 
>> If not, it looks like a candidate for deprecating in QEMU with a
>> view to later removing it.
>
> I think you gave enough examples why this is a good candidate for
> deprecation. So I'd say: Simply send a patch to deprecate it. That's
> what's our deprecation process is good for. If someone still cares for
> sheepdog, they then can speak up and we can revert the patch that put it
> on the deprecation list.

Concur.

Re: [PATCH 1/1] block: drop moderated sheepdog mailing list from MAINTAINERS file

2020-09-22 Thread Philippe Mathieu-Daudé

On 9/22/20 11:01 AM, Daniel P. Berrangé wrote:
> The sheepdog mailing list is setup to stop and queue messages from
> non-subscribers, pending moderator approval. Unfortunately it seems
> that the moderation queue is not actively dealt with. Even when messages
> are approved, the sender is never added to the whitelist, so every
> future mail from the same sender continues to get stopped for moderation.
> 
> MAINTAINERS entries should be responsive and not uneccessarily block

Typo "unnecessarily".

Reviewed-by: Philippe Mathieu-Daudé 

> mails from QEMU contributors, so drop the sheepdog mailing list.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  MAINTAINERS | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3d17cad19a..8e8a4fb0a8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2852,7 +2852,6 @@ F: block/rbd.c
>  Sheepdog
>  M: Liu Yuan 
>  L: qemu-block@nongnu.org
> -L: sheep...@lists.wpkg.org
>  S: Odd Fixes
>  F: block/sheepdog.c
>  
>

Re: [PATCH v8 5/7] copy-on-read: limit guest writes to base in COR driver

2020-09-22 Thread Andrey Shinkevich


On 04.09.2020 16:59, Vladimir Sementsov-Ogievskiy wrote:

04.09.2020 15:50, Max Reitz wrote:

On 28.08.20 18:52, Andrey Shinkevich wrote:

Limit the guest's COR operations by the base node in the backing chain
during a stream job.


I don’t understand.   Shouldn’t we limit the areas where we set the COR
flag?


Signed-off-by: Andrey Shinkevich 
---
  block/copy-on-read.c | 49 
+

  1 file changed, 49 insertions(+)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 1f858bb..ecbd1f8 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -57,6 +57,37 @@ static BlockDriverState 
*get_base_by_name(BlockDriverState *bs,

  return base_bs;
  }
  +/*
+ * Returns 1 if the block is allocated in a node between 
cor_filter_bs->file->bs

+ * and the base_bs in the backing chain. Otherwise, returns 0.
+ * The COR operation is allowed if the base_bs is not specified - 
return 1.

+ * Returns negative errno on failure.
+ */
+static int node_above_base(BlockDriverState *cor_filter_bs, 
uint64_t offset,

+   uint64_t bytes)
+{
+    int ret;
+    int64_t dummy;
+    BlockDriverState *file = NULL;
+    BDRVStateCOR *state = cor_filter_bs->opaque;
+
+    if (!state->base_bs) {
+    return 1;
+    }
+
+    ret = bdrv_block_status_above(cor_filter_bs->file->bs, 
state->base_bs,

+  offset, bytes, &dummy, NULL, &file);
+    if (ret < 0) {
+    return ret;
+    }
+
+    if (file) {


Why check file and not the return value?


+    return 1;
+    }
+
+    return 0;


“dummy” should really not be called that way, it should be evaluated
whether it’s smaller than bytes.

First, [offset, offset + dummy) may not be allocated above the base –
but [offset + dummy, offset + bytes) may be.  Then this function returns
0 here, even though there is something in that range that’s allocated.

Second, in that case we still shouldn’t return 1, but return the
shrunken offset instead.  Or, if there are multiple distinct allocated
areas, they should probably even all be copied separately.


(But all of that of course only if this function is intended to be used
to limit where we set the COR flag, because I don’t understand why we’d
want to limit where something can be written.)



Agree to all.

1. Write path shouldn't be changed: it's a copy-on-read filter.

2. On read we need is_allocated_above-driven loop, to insert the flag 
only to regions allocated above base
 (and other regions we read just without the flag, don't skip them). 
qiov_offset will help very well.


3. Like in many other places, let's ignore  errors (and just add the 
flag if block_status fails)



If "block_status" fails, the stream job does not copy. Shall we keep the 
same behavior in the cor_co_preadv_part()?



Andrey




+}
+
  static int cor_open(BlockDriverState *bs, QDict *options, int flags,
  Error **errp)
  {
@@ -153,6 +184,12 @@ static int coroutine_fn 
cor_co_pwritev_part(BlockDriverState *bs,

  QEMUIOVector *qiov,
  size_t qiov_offset, 
int flags)

  {
+    int ret = node_above_base(bs, offset, bytes);
+
+    if (!ret || ret < 0) {
+    return ret;
+    }
+
  return bdrv_co_pwritev_part(bs->file, offset, bytes, qiov, 
qiov_offset,

  flags);
  }
@@ -162,6 +199,12 @@ static int coroutine_fn 
cor_co_pwrite_zeroes(BlockDriverState *bs,
   int64_t offset, int 
bytes,

BdrvRequestFlags flags)
  {
+    int ret = node_above_base(bs, offset, bytes);
+
+    if (!ret || ret < 0) {
+    return ret;
+    }
+
  return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
  }
  @@ -178,6 +221,12 @@ static int coroutine_fn 
cor_co_pwritev_compressed(BlockDriverState *bs,

    uint64_t bytes,
QEMUIOVector *qiov)
  {
+    int ret = node_above_base(bs, offset, bytes);
+
+    if (!ret || ret < 0) {
+    return ret;
+    }
+
  return bdrv_co_pwritev(bs->file, offset, bytes, qiov,
 BDRV_REQ_WRITE_COMPRESSED);
  }

Re: [PATCH] hw: ide: check the pointer before do dma memory unmap

2020-09-22 Thread Li Qiang

Philippe Mathieu-Daudé  于2020年9月22日周二 下午6:46写道：
>
> On 9/22/20 12:37 PM, Li Qiang wrote:
> > Philippe Mathieu-Daudé  于2020年9月22日周二 下午4:19写道：
> >>
> >> On 9/22/20 3:34 AM, Alexander Bulekov wrote:
> >>> On 200815 0020, Li Qiang wrote:
>  In 'map_page' we need to check the return value of
>  'dma_memory_map' to ensure the we actully maped something.
>  Otherwise, we will hit an assert in 'address_space_unmap'.
>  This is because we can't find the MR with the NULL buffer.
>  This is the LP#1884693:
> 
>  -->https://bugs.launchpad.net/qemu/+bug/1884693
> 
>  Reported-by: Alexander Bulekov 
>  Signed-off-by: Li Qiang 
> >>>
> >>> I'm not very familiar with the IDE code, but this seems like a simple
> >>> null-ptr check, and Li has not received a response in over a month.
> >>
> >> Yeah well it is not an easy bug... I spent few hours but at some
> >> point it became too AHCI specific. I wanted to understand the bug
> >> to answer the "Why do we get there?" "Can we get there with real
> >> hardware?" questions, to be able to discern if this patch is OK,
> >> or if it is hiding bugs and what we really use here is an assert().
> >
> > Hi Philippe,
> > I think you have complicated this issue. The root cause is that
> > 'dma_memory_map' maybe fail.
> > The gpa is from guest and can be any value so this is expected.
> > It can return NULL pointer (no map) or it can be do partially
> > mapped(len < wanted).
> > Though in most situation the map result is 'ret == NULL and len <
> > wanted'. It may also has '
> > ret != NULL and len < wanted' I think.
>
> Then this form is easier to review to my taste:
>
> -- >8 --
> @@ -250,7 +250,7 @@ static void map_page(AddressSpace *as, uint8_t
> **ptr, uint64_t addr,
>  }
>
>  *ptr = dma_memory_map(as, addr, &len, DMA_DIRECTION_FROM_DEVICE);
> -if (len < wanted) {
> +if (*ptr && len < wanted) {
>  dma_memory_unmap(as, *ptr, len, DMA_DIRECTION_FROM_DEVICE, len);
>  *ptr = NULL;
>  }

Yes, in this case your patch is more clean and less code change.
But in generic case, my origin patch is more easy to understand and also
It is easy to do resource clean or trace (such as 'ahci_populate_sglist').

Anyway just let John do the choice.

Thanks,
Li Qiang

> ---
>
> >
> > The assert is come from that we pass NULL to 'dma_memory_unmap'.
> >
> > So the standard usage of 'dma_memory_map' I think is first check if
> > the return value to ensure it is not NULL.
> > Then check whether it mapped the len as the caller expected.
> >
> > There are several places in the code base that doesn't following this
> > usage which I think it is wrong.
> >
> > Thanks,
> > Li Qiang
> >
> >>
> >>>
> >>> Reviewed-by: Alexander Bulekov 
> >>>
>  ---
>   hw/ide/ahci.c | 5 +
>   1 file changed, 5 insertions(+)
> 
>  diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
>  index 009120f88b..63e9fccdbe 100644
>  --- a/hw/ide/ahci.c
>  +++ b/hw/ide/ahci.c
>  @@ -250,6 +250,11 @@ static void map_page(AddressSpace *as, uint8_t 
>  **ptr, uint64_t addr,
>   }
> 
>   *ptr = dma_memory_map(as, addr, &len, DMA_DIRECTION_FROM_DEVICE);
>  +
>  +if (!*ptr) {
>  +return;
>  +}
>  +
>   if (len < wanted) {
>   dma_memory_unmap(as, *ptr, len, DMA_DIRECTION_FROM_DEVICE, len);
>   *ptr = NULL;
>  --
>  2.17.1
> 
> >>>
> >>
> >
>

Re: [PATCH v2 01/20] configure: Detect libfuse

2020-09-22 Thread Max Reitz

On 22.09.20 13:14, Thomas Huth wrote:

[...]

> Could you turn this immediately into a meson test instead? See e.g.
> https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg07112.html as
> an example for how to do this.

Sure!

Max



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v2] qemu/atomic.h: prefix qemu_ to solve collisions

2020-09-22 Thread Paolo Bonzini

On 22/09/20 10:58, Stefan Hajnoczi wrote:
> clang's C11 atomic_fetch_*() functions only take a C11 atomic type
> pointer argument. QEMU uses direct types (int, etc) and this causes a
> compiler error when a QEMU code calls these functions in a source file
> that also included  via a system header file:
> 
>   $ CC=clang CXX=clang++ ./configure ... && make
>   ../util/async.c:79:17: error: address argument to atomic operation must be 
> a pointer to _Atomic type ('unsigned int *' invalid)
> 
> Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
> used by . Prefix QEMU's APIs with qemu_ so that atomic.h
> and  can co-exist.
> 
> This patch was generated using:
> 
>   $ git grep -h -o '\ sort -u >/tmp/changed_identifiers
>   $ for identifier in $(sed -i "s%\<$identifier\>%qemu_$identifier%g" \
>$(git grep -I -l "\<$identifier\>")
> done
> 
> I manually fixed line-wrap issues and misaligned rST tables.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
> v2:
>  * The diff of my manual fixups is available here:
>https://vmsplice.net/~stefan/atomic-namespace-pre-fixups.diff
>- Dropping #ifndef qemu_atomic_fetch_add in atomic.h
>- atomic_##X(haddr, val) glue macros not caught by grep
>- Keep atomic_add-bench name
>- C preprocessor backslash-newline ('\') column alignment
>- Line wrapping
>  * Use grep -I to avoid accidentally modifying binary files (RISC-V
>OpenSBI ELFs) [Eric Blake]
>  * Tweak .gitorder to show atomic.h changes first [Eric Blake]
>  * Update grep commands in commit description so reviewers can reproduce
>mechanical changes [Eric Blake]

I think the reviews crossed, are you going to respin using a qatomic_
prefix?

Paolo

Re: [PATCH v2 01/20] configure: Detect libfuse

2020-09-22 Thread Paolo Bonzini

On 22/09/20 13:14, Thomas Huth wrote:
> On 22/09/2020 12.49, Max Reitz wrote:
>> Signed-off-by: Max Reitz 
>> ---
>>  configure   | 34 ++
>>  meson.build |  6 ++
>>  2 files changed, 40 insertions(+)
>>
>> diff --git a/configure b/configure
>> index ce27eafb0a..21c31e4694 100755
>> --- a/configure
>> +++ b/configure
>> @@ -538,6 +538,7 @@ meson=""
>>  ninja=""
>>  skip_meson=no
>>  gettext=""
>> +fuse=""
>>  
>>  bogus_os="no"
>>  malloc_trim=""
>> @@ -1621,6 +1622,10 @@ for opt do
>>;;
>>--disable-libdaxctl) libdaxctl=no
>>;;
>> +  --enable-fuse) fuse=yes
>> +  ;;
>> +  --disable-fuse) fuse=no
>> +  ;;
>>*)
>>echo "ERROR: unknown option $opt"
>>echo "Try '$0 --help' for more information"
>> @@ -1945,6 +1950,7 @@ disabled with --disable-FEATURE, default is enabled if 
>> available:
>>xkbcommon   xkbcommon support
>>rng-nonedummy RNG, avoid using /dev/(u)random and getrandom()
>>libdaxctl   libdaxctl support
>> +  fusefuse block device export
>>  
>>  NOTE: The object files are built at the place where configure is launched
>>  EOF
>> @@ -6206,6 +6212,28 @@ but not implemented on your system"
>>  fi
>>  fi
>>  
>> +##
>> +# FUSE support
>> +
>> +if test "$fuse" != "no"; then
>> +  cat > $TMPC <> +#define FUSE_USE_VERSION 31
>> +#include 
>> +#include 
>> +int main(void) { return 0; }
>> +EOF
>> +  fuse_cflags=$(pkg-config --cflags fuse3)
>> +  fuse_libs=$(pkg-config --libs fuse3)
>> +  if compile_prog "$fuse_cflags" "$fuse_libs"; then
>> +fuse=yes
>> +  else
>> +if test "$fuse" = "yes"; then
>> +  feature_not_found "fuse"
>> +fi
>> +fuse=no
>> +  fi
>> +fi
> 
> Could you turn this immediately into a meson test instead? See e.g.
> https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg07112.html as
> an example for how to do this.

For pkg-config in fact it's even simpler and documented in
docs/devel/build-system.rst.

Paolo

Re: [PATCH v2 01/20] configure: Detect libfuse

2020-09-22 Thread Thomas Huth

On 22/09/2020 12.49, Max Reitz wrote:
> Signed-off-by: Max Reitz 
> ---
>  configure   | 34 ++
>  meson.build |  6 ++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/configure b/configure
> index ce27eafb0a..21c31e4694 100755
> --- a/configure
> +++ b/configure
> @@ -538,6 +538,7 @@ meson=""
>  ninja=""
>  skip_meson=no
>  gettext=""
> +fuse=""
>  
>  bogus_os="no"
>  malloc_trim=""
> @@ -1621,6 +1622,10 @@ for opt do
>;;
>--disable-libdaxctl) libdaxctl=no
>;;
> +  --enable-fuse) fuse=yes
> +  ;;
> +  --disable-fuse) fuse=no
> +  ;;
>*)
>echo "ERROR: unknown option $opt"
>echo "Try '$0 --help' for more information"
> @@ -1945,6 +1950,7 @@ disabled with --disable-FEATURE, default is enabled if 
> available:
>xkbcommon   xkbcommon support
>rng-nonedummy RNG, avoid using /dev/(u)random and getrandom()
>libdaxctl   libdaxctl support
> +  fusefuse block device export
>  
>  NOTE: The object files are built at the place where configure is launched
>  EOF
> @@ -6206,6 +6212,28 @@ but not implemented on your system"
>  fi
>  fi
>  
> +##
> +# FUSE support
> +
> +if test "$fuse" != "no"; then
> +  cat > $TMPC < +#define FUSE_USE_VERSION 31
> +#include 
> +#include 
> +int main(void) { return 0; }
> +EOF
> +  fuse_cflags=$(pkg-config --cflags fuse3)
> +  fuse_libs=$(pkg-config --libs fuse3)
> +  if compile_prog "$fuse_cflags" "$fuse_libs"; then
> +fuse=yes
> +  else
> +if test "$fuse" = "yes"; then
> +  feature_not_found "fuse"
> +fi
> +fuse=no
> +  fi
> +fi

Could you turn this immediately into a meson test instead? See e.g.
https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg07112.html as
an example for how to do this.

 Thomas

[PATCH v2 18/20] iotests: Allow testing FUSE exports

2020-09-22 Thread Max Reitz

This pretends FUSE exports are a kind of protocol.  As such, they are
always tested under the format node.  This is probably the best way to
test them, actually, because this will generate more I/O load and more
varied patterns.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/check |   6 ++
 tests/qemu-iotests/common.filter |   5 +-
 tests/qemu-iotests/common.rc | 124 +++
 3 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 467a7cf1b7..07232138d7 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -270,6 +270,7 @@ image protocol options
 -rbdtest rbd
 -sheepdog   test sheepdog
 -nbdtest nbd
+-fuse   test fuse
 -sshtest ssh
 -nfstest nfs
 
@@ -382,6 +383,11 @@ testlist options
 xpand=false
 ;;
 
+-fuse)
+IMGPROTO=fuse
+xpand=false
+;;
+
 -ssh)
 IMGPROTO=ssh
 xpand=false
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 838ed15793..172ea5752e 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -44,7 +44,8 @@ _filter_qom_path()
 _filter_testdir()
 {
 $SED -e "s#$TEST_DIR/#TEST_DIR/#g" \
- -e "s#$SOCK_DIR/#SOCK_DIR/#g"
+ -e "s#$SOCK_DIR/#SOCK_DIR/#g" \
+ -e "s#SOCK_DIR/fuse-#TEST_DIR/#g"
 }
 
 # replace occurrences of the actual IMGFMT value with IMGFMT
@@ -127,6 +128,7 @@ _filter_img_create_filenames()
 -e "s#$IMGPROTO:$TEST_DIR#TEST_DIR#g" \
 -e "s#$TEST_DIR#TEST_DIR#g" \
 -e "s#$SOCK_DIR#SOCK_DIR#g" \
+-e 's#SOCK_DIR/fuse-#TEST_DIR/#g' \
 -e "s#$IMGFMT#IMGFMT#g" \
 -e 's#nbd:127.0.0.1:[0-9]\\+#TEST_DIR/t.IMGFMT#g' \
 -e 's#nbd+unix:///\??socket=SOCK_DIR/nbd#TEST_DIR/t.IMGFMT#g'
@@ -227,6 +229,7 @@ _filter_img_info()
 -e "s#$IMGFMT#IMGFMT#g" \
 -e 's#nbd://127.0.0.1:[0-9]\\+$#TEST_DIR/t.IMGFMT#g' \
 -e 's#nbd+unix:///\??socket=SOCK_DIR/nbd#TEST_DIR/t.IMGFMT#g' \
+-e 's#SOCK_DIR/fuse-#TEST_DIR/#g' \
 -e "/encrypted: yes/d" \
 -e "/cluster_size: [0-9]\\+/d" \
 -e "/table_size: [0-9]\\+/d" \
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index e4751d4985..e17f813f06 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -257,6 +257,9 @@ if [ "$IMGOPTSSYNTAX" = "true" ]; then
 TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
 TEST_IMG="$DRIVER,file.driver=nbd,file.type=unix"
 TEST_IMG="$TEST_IMG,file.path=$SOCK_DIR/nbd"
+elif [ "$IMGPROTO" = "fuse" ]; then
+TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
+TEST_IMG="$DRIVER,file.filename=$SOCK_DIR/fuse-t.$IMGFMT"
 elif [ "$IMGPROTO" = "ssh" ]; then
 TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
 
TEST_IMG="$DRIVER,file.driver=ssh,file.host=127.0.0.1,file.path=$TEST_IMG_FILE"
@@ -273,6 +276,9 @@ else
 elif [ "$IMGPROTO" = "nbd" ]; then
 TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
 TEST_IMG="nbd+unix:///?socket=$SOCK_DIR/nbd"
+elif [ "$IMGPROTO" = "fuse" ]; then
+TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
+TEST_IMG="$SOCK_DIR/fuse-t.$IMGFMT"
 elif [ "$IMGPROTO" = "ssh" ]; then
 TEST_IMG_FILE=$TEST_DIR/t.$IMGFMT
 
REMOTE_TEST_DIR="ssh://\\($USER@\\)\\?127.0.0.1\\(:[0-9]\\+\\)\\?$TEST_DIR"
@@ -288,6 +294,9 @@ fi
 ORIG_TEST_IMG_FILE=$TEST_IMG_FILE
 ORIG_TEST_IMG="$TEST_IMG"
 
+FUSE_PIDS=()
+FUSE_EXPORTS=()
+
 if [ -z "$TEST_DIR" ]; then
 TEST_DIR=$PWD/scratch
 fi
@@ -357,6 +366,10 @@ _test_img_to_test_img_file()
 echo "$1"
 ;;
 
+fuse)
+echo "$1" | sed -e "s#$SOCK_DIR/fuse-#$TEST_DIR/#"
+;;
+
 nfs)
 echo "$1" | sed -e "s#nfs://127.0.0.1##"
 ;;
@@ -385,6 +398,11 @@ _make_test_img()
 local opts_param=false
 local misc_params=()
 
+if [[ $IMGPROTO == fuse && $TEST_IMG == $SOCK_DIR/fuse-* ]]; then
+# The caller may be trying to overwrite an existing image
+_rm_test_img "$TEST_IMG"
+fi
+
 if [ -z "$TEST_IMG_FILE" ]; then
 img_name=$TEST_IMG
 elif [ "$IMGOPTSSYNTAX" != "true" -a \
@@ -469,11 +487,105 @@ _make_test_img()
 eval "$QEMU_NBD -v -t -k '$SOCK_DIR/nbd' -f $IMGFMT -e 42 -x '' 
$TEST_IMG_FILE >/dev/null &"
 sleep 1 # FIXME: qemu-nbd needs to be listening before we continue
 fi
+
+if [ $IMGPROTO = "fuse" -a -f "$img_name" ]; then
+local export_mp
+local pid
+local pidfile
+local timeout
+
+export_mp=$(echo "$img_name" | sed -e "s#$TEST_DIR/#$SOCK_DIR/fuse-#")
+if ! echo "$export_mp" | grep -q "^$SOCK_DIR"; then
+echo 'Cannot use FUSE exports with images outside of TEST_DIR' >&2
+

[PATCH v2 15/20] iotests/287: Clean up subshell test image

2020-09-22 Thread Max Reitz

287 creates an image in a subshell (thanks to the pipe) to see whether
that is possible with compression_type=zstd.  If _make_test_img were to
modify any global state, this global state would then be lost before we
could cleanup the image.

When using FUSE as the test protocol, this global state is important, so
clean up the image before the state is lost.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/287 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/287 b/tests/qemu-iotests/287
index f98a4cadc1..036cc09e82 100755
--- a/tests/qemu-iotests/287
+++ b/tests/qemu-iotests/287
@@ -51,8 +51,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 CLUSTER_SIZE=65536
 
 # Check if we can run this test.
-if IMGOPTS='compression_type=zstd' _make_test_img 64M |
-grep "Invalid parameter 'zstd'"; then
+output=$(_make_test_img -o 'compression_type=zstd' 64M; _cleanup_test_img)
+if echo "$output" | grep -q "Invalid parameter 'zstd'"; then
 _notrun "ZSTD is disabled"
 fi
 
-- 
2.26.2

[PATCH v2 17/20] iotests: Give access to the qemu-storage-daemon

2020-09-22 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/check | 11 +++
 tests/qemu-iotests/common.rc | 17 +
 2 files changed, 28 insertions(+)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index e14a1f354d..467a7cf1b7 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -644,6 +644,17 @@ if [ -z $QEMU_NBD_PROG ]; then
 fi
 export QEMU_NBD_PROG="$(type -p "$QEMU_NBD_PROG")"
 
+if [ -z "$QEMU_STGD_PROG" ]; then
+if [ -x "$build_iotests/qemu-storage-daemon" ]; then
+export QEMU_STGD_PROG="$build_iotests/qemu-storage-daemon"
+elif [ -x "$build_root/storage-daemon/qemu-storage-daemon" ]; then
+export QEMU_STGD_PROG="$build_root/storage-daemon/qemu-storage-daemon"
+else
+_init_error "qemu-storage-daemon not found"
+fi
+fi
+export QEMU_STGD_PROG="$(type -p "$QEMU_STGD_PROG")"
+
 if [ -x "$build_iotests/socket_scm_helper" ]
 then
 export SOCKET_SCM_HELPER="$build_iotests/socket_scm_helper"
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 23f46da2db..e4751d4985 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -124,6 +124,7 @@ fi
 : ${VALGRIND_QEMU_IMG=$VALGRIND_QEMU}
 : ${VALGRIND_QEMU_IO=$VALGRIND_QEMU}
 : ${VALGRIND_QEMU_NBD=$VALGRIND_QEMU}
+: ${VALGRIND_QEMU_STGD=$VALGRIND_QEMU}
 
 # The Valgrind own parameters may be set with
 # its environment variable VALGRIND_OPTS, e.g.
@@ -211,6 +212,21 @@ _qemu_nbd_wrapper()
 return $RETVAL
 }
 
+_qemu_storage_daemon_wrapper()
+{
+local VALGRIND_LOGFILE="${TEST_DIR}"/$$.valgrind
+(
+if [ -n "${QEMU_STGD_NEED_PID}" ]; then
+echo $BASHPID > "${QEMU_TEST_DIR}/qemu-storage-daemon.pid"
+fi
+VALGRIND_QEMU="${VALGRIND_QEMU_STGD}" _qemu_proc_exec 
"${VALGRIND_LOGFILE}" \
+"$QEMU_STGD_PROG" $QEMU_STGD_OPTIONS "$@"
+)
+RETVAL=$?
+_qemu_proc_valgrind_log "${VALGRIND_LOGFILE}" $RETVAL
+return $RETVAL
+}
+
 # Valgrind bug #409141 https://bugs.kde.org/show_bug.cgi?id=409141
 # Until valgrind 3.16+ is ubiquitous, we must work around a hang in
 # valgrind when issuing sigkill. Disable valgrind for this invocation.
@@ -223,6 +239,7 @@ export QEMU=_qemu_wrapper
 export QEMU_IMG=_qemu_img_wrapper
 export QEMU_IO=_qemu_io_wrapper
 export QEMU_NBD=_qemu_nbd_wrapper
+export QEMU_STGD=_qemu_storage_daemon_wrapper
 
 if [ "$IMGOPTSSYNTAX" = "true" ]; then
 DRIVER="driver=$IMGFMT"
-- 
2.26.2

[PATCH v2 14/20] iotests: Let _make_test_img guess $TEST_IMG_FILE

2020-09-22 Thread Max Reitz

When most iotests want to create a test image that is named differently
from the default $TEST_IMG, they do something like this:

TEST_IMG="$TEST_IMG.base" _make_test_img $options

This works fine with the "file" protocol, but not so much for anything
else: _make_test_img tries to create an image under $TEST_IMG_FILE
first, and only under $TEST_IMG if the former is not set; and on
everything but "file", $TEST_IMG_FILE is set.

There are two ways we can fix this: First, we could make all tests
adjust not only TEST_IMG, but also TEST_IMG_FILE if that is present
(e.g. with something like _set_test_img_suffix $suffix that would affect
not only TEST_IMG but also TEST_IMG_FILE, if necessary).  This is a
pretty clean solution, and this is maybe what we should have done from
the start.

But it would also require changes to most existing bash tests.  So the
alternative is this: Let _make_test_img see whether $TEST_IMG_FILE still
points to the original value.  If so, it is possible that the caller has
adjusted $TEST_IMG but not $TEST_IMG_FILE.  In such a case, we can (for
most protocols) derive the corresponding $TEST_IMG_FILE value from
$TEST_IMG value and thus work around what technically is the caller
misbehaving.

This second solution is less clean, but it is robust against people
keeping their old habit of adjusting TEST_IMG only, and requires much
less changes.  So this patch implements it.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/common.rc | 40 +---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 494490a272..23f46da2db 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -268,6 +268,7 @@ else
 TEST_IMG=$IMGPROTO:$TEST_DIR/t.$IMGFMT
 fi
 fi
+ORIG_TEST_IMG_FILE=$TEST_IMG_FILE
 ORIG_TEST_IMG="$TEST_IMG"
 
 if [ -z "$TEST_DIR" ]; then
@@ -330,6 +331,30 @@ _get_data_file()
 | sed -e "s#\\\$TEST_IMG#$1#"
 }
 
+# Translate a $TEST_IMG to its corresponding $TEST_IMG_FILE for
+# different protocols
+_test_img_to_test_img_file()
+{
+case "$IMGPROTO" in
+file)
+echo "$1"
+;;
+
+nfs)
+echo "$1" | sed -e "s#nfs://127.0.0.1##"
+;;
+
+ssh)
+echo "$1" | \
+sed -e "s#ssh://\\($USER@\\)\\?127.0.0.1\\(:[0-9]\\+\\)\\?##"
+;;
+
+*)
+return 1
+;;
+esac
+}
+
 _make_test_img()
 {
 # extra qemu-img options can be added by tests
@@ -343,10 +368,19 @@ _make_test_img()
 local opts_param=false
 local misc_params=()
 
-if [ -n "$TEST_IMG_FILE" ]; then
-img_name=$TEST_IMG_FILE
-else
+if [ -z "$TEST_IMG_FILE" ]; then
 img_name=$TEST_IMG
+elif [ "$IMGOPTSSYNTAX" != "true" -a \
+   "$TEST_IMG_FILE" = "$ORIG_TEST_IMG_FILE" ]; then
+# Handle cases of tests only updating TEST_IMG, but not TEST_IMG_FILE
+img_name=$(_test_img_to_test_img_file "$TEST_IMG")
+if [ "$?" != 0 ]; then
+img_name=$TEST_IMG_FILE
+fi
+else
+# $TEST_IMG_FILE is not the default value, so it definitely has been
+# modified by the test
+img_name=$TEST_IMG_FILE
 fi
 
 if [ -n "$IMGOPTS" ]; then
-- 
2.26.2

[PATCH v2 13/20] iotests: Restrict some Python tests to file

2020-09-22 Thread Max Reitz

Most Python tests are restricted to the file protocol (without
explicitly saying so), but these are the ones that would break
./check -fuse -qcow2.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/206 | 3 ++-
 tests/qemu-iotests/242 | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/206 b/tests/qemu-iotests/206
index 11bc51f256..0a3ee5ef00 100755
--- a/tests/qemu-iotests/206
+++ b/tests/qemu-iotests/206
@@ -23,7 +23,8 @@
 import iotests
 from iotests import imgfmt
 
-iotests.script_initialize(supported_fmts=['qcow2'])
+iotests.script_initialize(supported_fmts=['qcow2'],
+  supported_protocols=['file'])
 iotests.verify_working_luks()
 
 with iotests.FilePath('t.qcow2') as disk_path, \
diff --git a/tests/qemu-iotests/242 b/tests/qemu-iotests/242
index 64f1bd95e4..a16de3085f 100755
--- a/tests/qemu-iotests/242
+++ b/tests/qemu-iotests/242
@@ -24,7 +24,8 @@ import struct
 from iotests import qemu_img_create, qemu_io, qemu_img_pipe, \
 file_path, img_info_log, log, filter_qemu_io
 
-iotests.script_initialize(supported_fmts=['qcow2'])
+iotests.script_initialize(supported_fmts=['qcow2'],
+  supported_protocols=['file'])
 
 disk = file_path('disk')
 chunk = 256 * 1024
-- 
2.26.2

[PATCH v2 20/20] iotests/308: Add test for FUSE exports

2020-09-22 Thread Max Reitz

We have good coverage of the normal I/O paths now, but what remains is a
test that tests some more special cases: Exporting an image on itself
(thus turning a formatted image into a raw one), some error cases, and
non-writable and non-growable exports.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/308 | 339 +
 tests/qemu-iotests/308.out |  97 +++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 437 insertions(+)
 create mode 100755 tests/qemu-iotests/308
 create mode 100644 tests/qemu-iotests/308.out

diff --git a/tests/qemu-iotests/308 b/tests/qemu-iotests/308
new file mode 100755
index 00..b30f4400f6
--- /dev/null
+++ b/tests/qemu-iotests/308
@@ -0,0 +1,339 @@
+#!/usr/bin/env bash
+#
+# Test FUSE exports (in ways that are not captured by the generic
+# tests)
+#
+# Copyright (C) 2020 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+seq=$(basename "$0")
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+_cleanup()
+{
+_cleanup_qemu
+_cleanup_test_img
+rmdir "$EXT_MP" 2>/dev/null
+rm -f "$EXT_MP"
+rm -f "$COPIED_IMG"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+# Generic format, but needs a plain filename
+_supported_fmt generic
+if [ "$IMGOPTSSYNTAX" = "true" ]; then
+_unsupported_fmt $IMGFMT
+fi
+# We need the image to have exactly the specified size, and VPC does
+# not allow that by default
+_unsupported_fmt vpc
+
+_supported_proto file # We create the FUSE export manually
+_supported_os Linux # We need /dev/urandom
+
+# $1: Export ID
+# $2: Options (beyond the node-name and ID)
+# $3: Expected return value (defaults to 'return')
+# $4: Node to export (defaults to 'node-format')
+fuse_export_add()
+{
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'block-export-add',
+  'arguments': {
+  'type': 'fuse',
+  'id': '$1',
+  'node-name': '${4:-node-format}',
+  $2
+  } }" \
+"${3:-return}" \
+| _filter_imgfmt
+}
+
+# $1: Export ID
+fuse_export_del()
+{
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'block-export-del',
+  'arguments': {
+  'id': '$1'
+  } }" \
+'return'
+
+_send_qemu_cmd $QEMU_HANDLE \
+'' \
+'BLOCK_EXPORT_DELETED'
+}
+
+# Return the length of the protocol file
+# $1: Protocol node export mount point
+# $2: Original file (to compare)
+get_proto_len()
+{
+len1=$(stat -c '%s' "$1")
+len2=$(stat -c '%s' "$2")
+
+if [ "$len1" != "$len2" ]; then
+echo 'ERROR: Length of export and original differ:' >&2
+echo "$len1 != $len2" >&2
+else
+echo '(OK: Lengths of export and original are the same)' >&2
+fi
+
+echo "$len1"
+}
+
+COPIED_IMG="$TEST_IMG.copy"
+EXT_MP="$TEST_IMG.fuse"
+
+echo '=== Set up ==='
+
+# Create image with random data
+_make_test_img 64M
+$QEMU_IO -c 'write -s /dev/urandom 0 64M' "$TEST_IMG" | _filter_qemu_io
+
+_launch_qemu
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'qmp_capabilities'}" \
+'return'
+
+# Separate blockdev-add calls for format and protocol so we can remove
+# the format layer later on
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'blockdev-add',
+  'arguments': {
+  'driver': 'file',
+  'node-name': 'node-protocol',
+  'filename': '$TEST_IMG'
+  } }" \
+'return'
+
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'blockdev-add',
+  'arguments': {
+  'driver': '$IMGFMT',
+  'node-name': 'node-format',
+  'file': 'node-protocol'
+  } }" \
+'return'
+
+echo
+echo '=== Mountpoint not present ==='
+
+rmdir "$EXT_MP" 2>/dev/null
+rm -f "$EXT_MP"
+output=$(fuse_export_add 'export-err' "'mountpoint': '$EXT_MP'" error)
+
+if echo "$output" | grep -q "Invalid parameter 'fuse'"; then
+_notrun 'No FUSE support'
+fi
+
+echo "$output"
+
+echo
+echo '=== Mountpoint is a directory ==='
+
+mkdir "$EXT_MP"
+fuse_export_add 'export-err' "'mountpoint': '$EXT_MP'" error
+rmdir "$EXT_MP"
+
+echo
+echo '=== Mountpoint is a regular file ==='
+
+touch "$EXT_MP"
+fuse_export_add 'export-mp' "'mountpoint': '$EXT_MP'"
+
+# Check that the export presents the same data as the origin

[PATCH v2 11/20] iotests: Derive image names from $TEST_IMG

2020-09-22 Thread Max Reitz

Avoid creating images with custom filenames in $TEST_DIR, because
non-file protocols may want to keep $TEST_IMG (and all other test
images) in some other directory.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/200 | 3 +--
 tests/qemu-iotests/200.out | 4 ++--
 tests/qemu-iotests/229 | 3 +--
 tests/qemu-iotests/229.out | 6 +++---
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/tests/qemu-iotests/200 b/tests/qemu-iotests/200
index 59f7854b9f..a7aabbd032 100755
--- a/tests/qemu-iotests/200
+++ b/tests/qemu-iotests/200
@@ -44,8 +44,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2 qed
 _supported_proto file
 
-BACKING_IMG="${TEST_DIR}/backing.img"
-TEST_IMG="${TEST_DIR}/test.img"
+BACKING_IMG="$TEST_IMG.base"
 
 TEST_IMG="$BACKING_IMG" _make_test_img 512M
 _make_test_img -F $IMGFMT -b "$BACKING_IMG" 512M
diff --git a/tests/qemu-iotests/200.out b/tests/qemu-iotests/200.out
index a6776070e4..5883f16ac3 100644
--- a/tests/qemu-iotests/200.out
+++ b/tests/qemu-iotests/200.out
@@ -1,6 +1,6 @@
 QA output created by 200
-Formatting 'TEST_DIR/backing.img', fmt=IMGFMT size=536870912
-Formatting 'TEST_DIR/test.img', fmt=IMGFMT size=536870912 
backing_file=TEST_DIR/backing.img backing_fmt=IMGFMT
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=536870912
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=536870912 
backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
 wrote 314572800/314572800 bytes at offset 512
 300 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
diff --git a/tests/qemu-iotests/229 b/tests/qemu-iotests/229
index 89a5359f32..5f759fa587 100755
--- a/tests/qemu-iotests/229
+++ b/tests/qemu-iotests/229
@@ -51,8 +51,7 @@ _supported_os Linux
 _unsupported_imgopts data_file
 
 
-DEST_IMG="$TEST_DIR/d.$IMGFMT"
-TEST_IMG="$TEST_DIR/b.$IMGFMT"
+DEST_IMG="$TEST_IMG.dest"
 BLKDEBUG_CONF="$TEST_DIR/blkdebug.conf"
 
 _make_test_img 2M
diff --git a/tests/qemu-iotests/229.out b/tests/qemu-iotests/229.out
index 4de6dfaa28..7eed393013 100644
--- a/tests/qemu-iotests/229.out
+++ b/tests/qemu-iotests/229.out
@@ -1,6 +1,6 @@
 QA output created by 229
-Formatting 'TEST_DIR/b.IMGFMT', fmt=IMGFMT size=2097152
-Formatting 'TEST_DIR/d.IMGFMT', fmt=IMGFMT size=2097152
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2097152
+Formatting 'TEST_DIR/t.IMGFMT.dest', fmt=IMGFMT size=2097152
 wrote 2097152/2097152 bytes at offset 0
 2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 {'execute': 'qmp_capabilities'}
@@ -8,7 +8,7 @@ wrote 2097152/2097152 bytes at offset 0
 
 === Starting drive-mirror, causing error & stop  ===
 
-{'execute': 'drive-mirror', 'arguments': {'device': 'testdisk', 'format': 
'IMGFMT', 'target': 'blkdebug:TEST_DIR/blkdebug.conf:TEST_DIR/d.IMGFMT', 
'sync': 'full', 'mode': 'existing', 'on-source-error': 'stop', 
'on-target-error': 'stop' }}
+{'execute': 'drive-mirror', 'arguments': {'device': 'testdisk', 'format': 
'IMGFMT', 'target': 'blkdebug:TEST_DIR/blkdebug.conf:TEST_DIR/t.IMGFMT.dest', 
'sync': 'full', 'mode': 'existing', 'on-source-error': 'stop', 
'on-target-error': 'stop' }}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "created", "id": "testdisk"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "running", "id": "testdisk"}}
 {"return": {}}
-- 
2.26.2

[PATCH v2 09/20] iotests: Use convert -n in some cases

2020-09-22 Thread Max Reitz

qemu-img convert (without -n) can often be replaced by a combination of
_make_test_img + qemu-img convert -n.  Doing so allows converting to
protocols that do not allow direct file creation, such as FUSE exports.
The only problem is that for formats other than qcow2 and qed (qcow1 at
least), this may lead to high disk usage for some reason, so we cannot
do it everywhere.

But we can do it in 028 and 089, so let us do that so they can run on
FUSE exports.  Also, in 028 this allows us to remove a 9-line comment
that used to explain why we cannot safely filter drive-backup's image
creation output.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/028 | 14 --
 tests/qemu-iotests/028.out |  3 +++
 tests/qemu-iotests/089 |  3 ++-
 tests/qemu-iotests/089.out |  1 +
 4 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028
index 6dd3ae09a3..864dc4a4e2 100755
--- a/tests/qemu-iotests/028
+++ b/tests/qemu-iotests/028
@@ -116,16 +116,10 @@ else
 QEMU_COMM_TIMEOUT=1
 fi
 
-# Silence output since it contains the disk image path and QEMU's readline
-# character echoing makes it very hard to filter the output. Plus, there
-# is no telling how many times the command will repeat before succeeding.
-# (Note that creating the image results in a "Formatting..." message over
-# stdout, which is the same channel the monitor uses.  We cannot reliably
-# wait for it because the monitor output may interact with it in such a
-# way that _timed_wait_for cannot read it.  However, once the block job is
-# done, we know that the "Formatting..." message must have appeared
-# already, so the output is still deterministic.)
-silent=y _send_qemu_cmd $h "drive_backup disk ${TEST_IMG}.copy" "(qemu)"
+TEST_IMG="$TEST_IMG.copy" _make_test_img $image_size
+_send_qemu_cmd $h "drive_backup -n disk ${TEST_IMG}.copy" "(qemu)" \
+| _filter_imgfmt
+
 silent=y qemu_cmd_repeat=20 _send_qemu_cmd $h "info block-jobs" "No active 
jobs"
 _send_qemu_cmd $h "info block-jobs" "No active jobs"
 _send_qemu_cmd $h 'quit' ""
diff --git a/tests/qemu-iotests/028.out b/tests/qemu-iotests/028.out
index 5a68de5c46..e580488216 100644
--- a/tests/qemu-iotests/028.out
+++ b/tests/qemu-iotests/028.out
@@ -468,6 +468,9 @@ No errors were found on the image.
 
 block-backup
 
+Formatting 'TEST_DIR/t.IMGFMT.copy', fmt=IMGFMT size=4294968832
+QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) drive_backup -n disk TEST_DIR/t.IMGFMT.copy
 (qemu) info block-jobs
 No active jobs
 === IO: pattern 195
diff --git a/tests/qemu-iotests/089 b/tests/qemu-iotests/089
index 66c5415abe..03a2ccf1e8 100755
--- a/tests/qemu-iotests/089
+++ b/tests/qemu-iotests/089
@@ -62,7 +62,8 @@ TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE
 $QEMU_IO -c 'write -P 42 0 512' -c 'write -P 23 512 512' \
  -c 'write -P 66 1024 512' "$TEST_IMG.base" | _filter_qemu_io
 
-$QEMU_IMG convert -f raw -O $IMGFMT "$TEST_IMG.base" "$TEST_IMG"
+_make_test_img $IMG_SIZE
+$QEMU_IMG convert -f raw -O $IMGFMT -n "$TEST_IMG.base" "$TEST_IMG"
 
 $QEMU_IO_PROG --cache $CACHEMODE --aio $AIOMODE \
  -c 'read -P 42 0 512' -c 'read -P 23 512 512' \
diff --git a/tests/qemu-iotests/089.out b/tests/qemu-iotests/089.out
index 15682c2886..c53fc4823a 100644
--- a/tests/qemu-iotests/089.out
+++ b/tests/qemu-iotests/089.out
@@ -9,6 +9,7 @@ wrote 512/512 bytes at offset 512
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 512/512 bytes at offset 1024
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 512/512 bytes at offset 512
-- 
2.26.2

[PATCH v2 08/20] iotests: Do not pipe _make_test_img

2020-09-22 Thread Max Reitz

Executing _make_test_img as part of a pipe will undo all variable
changes it has done.  As such, this could not work with FUSE (because
we want to remember all of our exports and their qemu instances).

Replace the pipe by a temporary file in 071 and 174 (the two tests that
can run on FUSE).

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/071 | 19 +++
 tests/qemu-iotests/174 | 10 +-
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/071 b/tests/qemu-iotests/071
index 88faebcc1d..18fe9054b0 100755
--- a/tests/qemu-iotests/071
+++ b/tests/qemu-iotests/071
@@ -61,8 +61,17 @@ echo
 echo "=== Testing blkverify through filename ==="
 echo
 
-TEST_IMG="$TEST_IMG.base" IMGFMT="raw" _make_test_img --no-opts $IMG_SIZE |\
-_filter_imgfmt
+# _make_test_img may set variables that we need to retain.  Everything
+# in a pipe is executed in a subshell, so doing so would throw away
+# all changes.  Therefore, we have to store the output in some temp
+# file and filter that.
+scratch_out="$TEST_DIR/img-create.out"
+
+TEST_IMG="$TEST_IMG.base" IMGFMT="raw" _make_test_img --no-opts $IMG_SIZE \
+>"$scratch_out"
+_filter_imgfmt <"$scratch_out"
+rm -f "$scratch_out"
+
 _make_test_img $IMG_SIZE
 $QEMU_IO -c "open -o 
driver=raw,file.driver=blkverify,file.raw.filename=$TEST_IMG.base $TEST_IMG" \
  -c 'read 0 512' -c 'write -P 42 0x38000 512' -c 'read -P 42 0x38000 
512' | _filter_qemu_io
@@ -76,8 +85,10 @@ echo
 echo "=== Testing blkverify through file blockref ==="
 echo
 
-TEST_IMG="$TEST_IMG.base" IMGFMT="raw" _make_test_img --no-opts $IMG_SIZE |\
-_filter_imgfmt
+TEST_IMG="$TEST_IMG.base" IMGFMT="raw" _make_test_img --no-opts $IMG_SIZE \
+>"$scratch_out"
+_filter_imgfmt <"$scratch_out"
+
 _make_test_img $IMG_SIZE
 $QEMU_IO -c "open -o 
driver=raw,file.driver=blkverify,file.raw.filename=$TEST_IMG.base,file.test.driver=$IMGFMT,file.test.file.filename=$TEST_IMG"
 \
  -c 'read 0 512' -c 'write -P 42 0x38000 512' -c 'read -P 42 0x38000 
512' | _filter_qemu_io
diff --git a/tests/qemu-iotests/174 b/tests/qemu-iotests/174
index e2f14a38c6..1b0dd2e8b7 100755
--- a/tests/qemu-iotests/174
+++ b/tests/qemu-iotests/174
@@ -40,7 +40,15 @@ _unsupported_fmt raw
 
 
 size=256K
-IMGFMT=raw IMGKEYSECRET= _make_test_img --no-opts $size | _filter_imgfmt
+
+# _make_test_img may set variables that we need to retain.  Everything
+# in a pipe is executed in a subshell, so doing so would throw away
+# all changes.  Therefore, we have to store the output in some temp
+# file and filter that.
+scratch_out="$TEST_DIR/img-create.out"
+IMGFMT=raw IMGKEYSECRET= _make_test_img --no-opts $size >"$scratch_out"
+_filter_imgfmt <"$scratch_out"
+rm -f "$scratch_out"
 
 echo
 echo "== reading wrong format should fail =="
-- 
2.26.2

[PATCH v2 19/20] iotests: Enable fuse for many tests

2020-09-22 Thread Max Reitz

Many tests (that do not support generic protocols) can run just fine
with FUSE-exported images, so allow them to.  Note that this is no
attempt at being definitely complete.  There are some tests that might
be modified to run on FUSE, but this patch still skips them.  This patch
only tries to pick the rather low-hanging fruits.

Note that 221 and 250 only pass when .lseek is correctly implemented,
which is only possible with a libfuse that is 3.8 or newer.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/025 | 2 +-
 tests/qemu-iotests/026 | 2 +-
 tests/qemu-iotests/028 | 2 +-
 tests/qemu-iotests/031 | 2 +-
 tests/qemu-iotests/034 | 2 +-
 tests/qemu-iotests/036 | 2 +-
 tests/qemu-iotests/037 | 2 +-
 tests/qemu-iotests/038 | 2 +-
 tests/qemu-iotests/039 | 2 +-
 tests/qemu-iotests/046 | 2 +-
 tests/qemu-iotests/050 | 2 +-
 tests/qemu-iotests/054 | 2 +-
 tests/qemu-iotests/060 | 2 +-
 tests/qemu-iotests/071 | 2 +-
 tests/qemu-iotests/079 | 2 +-
 tests/qemu-iotests/080 | 2 +-
 tests/qemu-iotests/089 | 2 +-
 tests/qemu-iotests/090 | 2 +-
 tests/qemu-iotests/091 | 2 +-
 tests/qemu-iotests/095 | 2 +-
 tests/qemu-iotests/097 | 2 +-
 tests/qemu-iotests/098 | 2 +-
 tests/qemu-iotests/102 | 2 +-
 tests/qemu-iotests/103 | 2 +-
 tests/qemu-iotests/106 | 2 +-
 tests/qemu-iotests/107 | 2 +-
 tests/qemu-iotests/108 | 2 +-
 tests/qemu-iotests/111 | 2 +-
 tests/qemu-iotests/112 | 2 +-
 tests/qemu-iotests/115 | 2 +-
 tests/qemu-iotests/117 | 2 +-
 tests/qemu-iotests/120 | 2 +-
 tests/qemu-iotests/121 | 2 +-
 tests/qemu-iotests/127 | 2 +-
 tests/qemu-iotests/133 | 2 +-
 tests/qemu-iotests/137 | 2 +-
 tests/qemu-iotests/138 | 2 +-
 tests/qemu-iotests/140 | 2 +-
 tests/qemu-iotests/154 | 2 +-
 tests/qemu-iotests/161 | 2 +-
 tests/qemu-iotests/171 | 2 +-
 tests/qemu-iotests/175 | 2 +-
 tests/qemu-iotests/176 | 2 +-
 tests/qemu-iotests/177 | 2 +-
 tests/qemu-iotests/179 | 2 +-
 tests/qemu-iotests/183 | 2 +-
 tests/qemu-iotests/186 | 2 +-
 tests/qemu-iotests/187 | 2 +-
 tests/qemu-iotests/191 | 2 +-
 tests/qemu-iotests/195 | 2 +-
 tests/qemu-iotests/200 | 2 +-
 tests/qemu-iotests/204 | 2 +-
 tests/qemu-iotests/214 | 2 +-
 tests/qemu-iotests/217 | 2 +-
 tests/qemu-iotests/220 | 2 +-
 tests/qemu-iotests/221 | 2 +-
 tests/qemu-iotests/229 | 2 +-
 tests/qemu-iotests/247 | 2 +-
 tests/qemu-iotests/249 | 2 +-
 tests/qemu-iotests/250 | 2 +-
 tests/qemu-iotests/252 | 2 +-
 tests/qemu-iotests/265 | 2 +-
 tests/qemu-iotests/268 | 2 +-
 tests/qemu-iotests/272 | 2 +-
 tests/qemu-iotests/273 | 2 +-
 tests/qemu-iotests/279 | 2 +-
 tests/qemu-iotests/286 | 2 +-
 tests/qemu-iotests/287 | 2 +-
 tests/qemu-iotests/289 | 2 +-
 tests/qemu-iotests/290 | 2 +-
 tests/qemu-iotests/291 | 2 +-
 tests/qemu-iotests/292 | 2 +-
 tests/qemu-iotests/293 | 2 +-
 tests/qemu-iotests/294 | 2 +-
 tests/qemu-iotests/305 | 2 +-
 75 files changed, 75 insertions(+), 75 deletions(-)

diff --git a/tests/qemu-iotests/025 b/tests/qemu-iotests/025
index e05d833452..1569d912f4 100755
--- a/tests/qemu-iotests/025
+++ b/tests/qemu-iotests/025
@@ -38,7 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 _supported_fmt raw qcow2 qed luks
-_supported_proto file sheepdog rbd nfs
+_supported_proto file sheepdog rbd nfs fuse
 
 echo "=== Creating image"
 echo
diff --git a/tests/qemu-iotests/026 b/tests/qemu-iotests/026
index b9713eb591..9ecc5880b1 100755
--- a/tests/qemu-iotests/026
+++ b/tests/qemu-iotests/026
@@ -41,7 +41,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 # Currently only qcow2 supports rebasing
 _supported_fmt qcow2
-_supported_proto file
+_supported_proto file fuse
 _default_cache_mode writethrough
 _supported_cache_modes writethrough none
 # The refcount table tests expect a certain minimum width for refcount entries
diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028
index 864dc4a4e2..57d34aae99 100755
--- a/tests/qemu-iotests/028
+++ b/tests/qemu-iotests/028
@@ -46,7 +46,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 # Any format supporting backing files except vmdk and qcow which do not support
 # smaller backing files.
 _supported_fmt qcow2 qed
-_supported_proto file
+_supported_proto file fuse
 _supported_os Linux
 
 # Choose a size that is not necessarily a cluster size multiple for image
diff --git a/tests/qemu-iotests/031 b/tests/qemu-iotests/031
index 646ecd593f..2bcbc5886e 100755
--- a/tests/qemu-iotests/031
+++ b/tests/qemu-iotests/031
@@ -39,7 +39,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 # This tests qcow2-specific low-level functionality
 _supported_fmt qcow2
-_supported_proto file
+_supported_proto file fuse
 # We want to test compat=0.10, which does not support external data
 # files or refcount widths other than 16
 _unsupported_imgopts data_file 'refcount_bits=\([^1]\|.\([^6]\|$\)\)'
diff --git a/tests/qemu-iotests/034 b/tests/qemu-iotests/034
index ac2d687c71..08f7aea6d5 100755
--- a/tests/qemu-iotests/034
+++ b/tests/qemu-iotests/034
@@ -37,7 +37,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common

[PATCH v2 07/20] iotests: Do not needlessly filter _make_test_img

2020-09-22 Thread Max Reitz

In most cases, _make_test_img does not need a _filter_imgfmt on top.  It
does that by itself.

(The exception is when IMGFMT has been overwritten but TEST_IMG has not.
In such cases, we do need a _filter_imgfmt on top to filter the test's
original IMGFMT from TEST_IMG.)

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/161 | 12 ++--
 tests/qemu-iotests/175 |  6 +++---
 tests/qemu-iotests/249 |  6 +++---
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/tests/qemu-iotests/161 b/tests/qemu-iotests/161
index e270976d87..bbf7dbbc5c 100755
--- a/tests/qemu-iotests/161
+++ b/tests/qemu-iotests/161
@@ -48,9 +48,9 @@ _supported_os Linux
 IMG_SIZE=1M
 
 # Create the images
-TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE | _filter_imgfmt
-TEST_IMG="$TEST_IMG.int" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT | 
_filter_imgfmt
-_make_test_img -b "$TEST_IMG.int" -F $IMGFMT -F $IMGFMT | _filter_imgfmt
+TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE
+TEST_IMG="$TEST_IMG.int" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT
+_make_test_img -b "$TEST_IMG.int" -F $IMGFMT -F $IMGFMT
 
 # First test: reopen $TEST.IMG changing the detect-zeroes option on
 # its backing file ($TEST_IMG.int).
@@ -105,9 +105,9 @@ echo
 echo "*** Commit and then change an option on the backing file"
 echo
 # Create the images again
-TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE | _filter_imgfmt
-TEST_IMG="$TEST_IMG.int" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT| 
_filter_imgfmt
-_make_test_img -b "$TEST_IMG.int" -F $IMGFMT | _filter_imgfmt
+TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE
+TEST_IMG="$TEST_IMG.int" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT
+_make_test_img -b "$TEST_IMG.int" -F $IMGFMT
 
 _launch_qemu -drive if=none,file="${TEST_IMG}"
 _send_qemu_cmd $QEMU_HANDLE \
diff --git a/tests/qemu-iotests/175 b/tests/qemu-iotests/175
index 00a626aa63..c3c2aed653 100755
--- a/tests/qemu-iotests/175
+++ b/tests/qemu-iotests/175
@@ -89,20 +89,20 @@ min_blocks=$(stat -c '%b' "$TEST_DIR/empty")
 
 echo
 echo "== creating image with default preallocation =="
-_make_test_img -o extent_size_hint=0 $size | _filter_imgfmt
+_make_test_img -o extent_size_hint=0 $size
 stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks 
$min_blocks $size
 
 for mode in off full falloc; do
 echo
 echo "== creating image with preallocation $mode =="
-_make_test_img -o preallocation=$mode,extent_size_hint=0 $size | 
_filter_imgfmt
+_make_test_img -o preallocation=$mode,extent_size_hint=0 $size
 stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks 
$min_blocks $size
 done
 
 for new_size in 4096 1048576; do
 echo
 echo "== resize empty image with block_resize =="
-_make_test_img -o extent_size_hint=0 0 | _filter_imgfmt
+_make_test_img -o extent_size_hint=0 0
 _block_resize $TEST_IMG $new_size >/dev/null
 stat -c "size=%s, blocks=%b" $TEST_IMG | _filter_blocks $extra_blocks 
$min_blocks $new_size
 done
diff --git a/tests/qemu-iotests/249 b/tests/qemu-iotests/249
index 68f13ed328..a9aa9303eb 100755
--- a/tests/qemu-iotests/249
+++ b/tests/qemu-iotests/249
@@ -48,9 +48,9 @@ _supported_os Linux
 IMG_SIZE=1M
 
 # Create the images: base <- int <- active
-TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE | _filter_imgfmt
-TEST_IMG="$TEST_IMG.int" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT | 
_filter_imgfmt
-_make_test_img -b "$TEST_IMG.int" -F $IMGFMT | _filter_imgfmt
+TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE
+TEST_IMG="$TEST_IMG.int" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT
+_make_test_img -b "$TEST_IMG.int" -F $IMGFMT
 
 # Launch QEMU with these two drives:
 # none0: base (read-only)
-- 
2.26.2

[PATCH v2 10/20] iotests/046: Avoid renaming images

2020-09-22 Thread Max Reitz

This generally does not work on non-file protocols.  It is better to
create the image with the final name from the start, and most tests do
this already.  Let 046 follow suit.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/046 | 5 +++--
 tests/qemu-iotests/046.out | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/046 b/tests/qemu-iotests/046
index 88b3363c19..40a9f30087 100755
--- a/tests/qemu-iotests/046
+++ b/tests/qemu-iotests/046
@@ -47,6 +47,8 @@ size=128M
 echo
 echo "== creating backing file for COW tests =="
 
+TEST_IMG_SAVE=$TEST_IMG
+TEST_IMG="$TEST_IMG.base"
 _make_test_img $size
 
 backing_io()
@@ -67,8 +69,7 @@ backing_io()
 
 backing_io 0 32 write | $QEMU_IO "$TEST_IMG" | _filter_qemu_io
 
-mv "$TEST_IMG" "$TEST_IMG.base"
-
+TEST_IMG=$TEST_IMG_SAVE
 _make_test_img -b "$TEST_IMG.base" -F $IMGFMT 6G
 
 echo
diff --git a/tests/qemu-iotests/046.out b/tests/qemu-iotests/046.out
index b022bcddd5..66ad987ab3 100644
--- a/tests/qemu-iotests/046.out
+++ b/tests/qemu-iotests/046.out
@@ -1,7 +1,7 @@
 QA output created by 046
 
 == creating backing file for COW tests ==
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134217728
 wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 65536/65536 bytes at offset 65536
-- 
2.26.2

[PATCH v2 16/20] storage-daemon: Call bdrv_close_all() on exit

2020-09-22 Thread Max Reitz

Otherwise, exports and block devices are not properly shut down and
closed, unless the users explicitly issues blockdev-del and
block-export-del commands for each of them.

Signed-off-by: Max Reitz 
---
 storage-daemon/qemu-storage-daemon.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/storage-daemon/qemu-storage-daemon.c 
b/storage-daemon/qemu-storage-daemon.c
index e6157ff518..0b6d469751 100644
--- a/storage-daemon/qemu-storage-daemon.c
+++ b/storage-daemon/qemu-storage-daemon.c
@@ -324,6 +324,9 @@ int main(int argc, char *argv[])
 main_loop_wait(false);
 }
 
+bdrv_drain_all_begin();
+bdrv_close_all();
+
 monitor_cleanup();
 qemu_chr_cleanup();
 user_creatable_cleanup();
-- 
2.26.2

[PATCH v2 12/20] iotests/091: Use _cleanup_qemu instad of "wait"

2020-09-22 Thread Max Reitz

If the test environment has some other child processes running (like a
storage daemon that provides a FUSE export), then "wait" will never
finish.  Use wait=yes _cleanup_qemu instead.

(We need to discard the output so there is no change to the reference
output.)

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/091 | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/091 b/tests/qemu-iotests/091
index 68fbfd777b..8a4ce5b7e2 100755
--- a/tests/qemu-iotests/091
+++ b/tests/qemu-iotests/091
@@ -96,7 +96,8 @@ _send_qemu_cmd $h2 'qemu-io disk flush' "(qemu)"
 _send_qemu_cmd $h2 'quit' ""
 _send_qemu_cmd $h1 'quit' ""
 
-wait
+wait=yes _cleanup_qemu >/dev/null
+
 echo "Check image pattern"
 ${QEMU_IO} -c "read -P 0x22 0 4M" "${TEST_IMG}" | _filter_testdir | 
_filter_qemu_io
 
-- 
2.26.2

[PATCH v2 00/20] block/export: Allow exporting BDSs via FUSE

2020-09-22 Thread Max Reitz

Based-on: <20200907182011.521007-1-kw...@redhat.com>
  (“block/export: Add infrastructure and QAPI for block exports”)

(Though its patch 16 needs a s/= \&blk_exp_nbd/= drv/ on top.)

v1: https://lists.nongnu.org/archive/html/qemu-block/2019-12/msg00451.html

Branch: https://github.com/XanClic/qemu.git fuse-exports-v2
Branch: https://git.xanclic.moe/XanClic/qemu.git fuse-exports-v2


Hi,

Ever since I found out that you can mount FUSE filesystems on regular
files (not just directories), I had the idea of adding FUSE block
exports to qemu where you can export block nodes as raw images.  The
best thing is that you’d be able to mount an image on itself, so
whatever format it may be in, qemu lets it appear as a raw image (and
you can then use regular tools like dd on it).

The performance is quite bad so far, but we can always try to improve it
if the need arises.  For now I consider it mostly a cute feature to get
easy access to the raw contents of image files in any format (without
requiring root rights).

In this version (as opposed to v1 linked above), I integrated the FUSE
export code into Kevin’s proposed common infrastructure for block
exports.


This series does the following:

First, add the FUSE export module (block/export/fuse.c) that implements
the basic file access functions.  (Note that you need libfuse 3.8.0 or
later for SEEK_HOLE/SEEK_DATA.)

Second, it allows using FUSE exports as a protocol in the iotests and
makes many iotests work with it.  (The file node is exported by a
background qemu instance to $SOCK_DIR.)

This gives us a lot of coverage for, well, not free (it does take twelve
patches), but for cheap; but there are still some more specialized
things we want to test, so third and last, this series adds an iotest
dedicated to FUSE exports.


(Note that as opposed to v1, I did run the iotests this time.)


Some notable changes from v1:
- Integrated everything into Kevin’s block-export infrastructure
- Use the storage daemon instead of full QEMU to provide FUSE exports
  when running the iotests with -fuse
- meson rebase
- Some other rebase conflicts


git-backport-diff against v1:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/20:[0007] [FC] 'configure: Detect libfuse'
002/20:[0255] [FC] 'fuse: Allow exporting BDSs via FUSE'
003/20:[0062] [FC] 'fuse: Implement standard FUSE operations'
004/20:[0018] [FC] 'fuse: Allow growable exports'
005/20:[0016] [FC] 'fuse: (Partially) implement fallocate()'
006/20:[0008] [FC] 'fuse: Implement hole detection through lseek'
007/20:[0036] [FC] 'iotests: Do not needlessly filter _make_test_img'
008/20:[] [--] 'iotests: Do not pipe _make_test_img'
009/20:[0012] [FC] 'iotests: Use convert -n in some cases'
010/20:[0006] [FC] 'iotests: Avoid renaming images'
011/20:[0008] [FC] 'iotests: Derive image names from $TEST_IMG'
012/20:[] [--] 'iotests/091: Use _cleanup_qemu instad of "wait"'
013/20:[0008] [FC] 'iotests: Restrict some Python tests to file'
014/20:[0010] [FC] 'iotests: Let _make_test_img guess $TEST_IMG_FILE'
015/20:[down] 'iotests/287: Clean up subshell test image'
016/20:[down] 'storage-daemon: Call bdrv_close_all() on exit'
017/20:[down] 'iotests: Give access to the qemu-storage-daemon'
018/20:[0042] [FC] 'iotests: Allow testing FUSE exports'
019/20:[0026] [FC] 'iotests: Enable fuse for many tests'
020/20:[0104] [FC] 'iotests/281: Add test for FUSE exports'


Max Reitz (20):
  configure: Detect libfuse
  fuse: Allow exporting BDSs via FUSE
  fuse: Implement standard FUSE operations
  fuse: Allow growable exports
  fuse: (Partially) implement fallocate()
  fuse: Implement hole detection through lseek
  iotests: Do not needlessly filter _make_test_img
  iotests: Do not pipe _make_test_img
  iotests: Use convert -n in some cases
  iotests/046: Avoid renaming images
  iotests: Derive image names from $TEST_IMG
  iotests/091: Use _cleanup_qemu instad of "wait"
  iotests: Restrict some Python tests to file
  iotests: Let _make_test_img guess $TEST_IMG_FILE
  iotests/287: Clean up subshell test image
  storage-daemon: Call bdrv_close_all() on exit
  iotests: Give access to the qemu-storage-daemon
  iotests: Allow testing FUSE exports
  iotests: Enable fuse for many tests
  iotests/308: Add test for FUSE exports

 configure|  66 +++
 qapi/block-export.json   |  28 +-
 include/block/fuse.h |  30 ++
 block.c  |   1 +
 block/export/export.c|   4 +
 block/export/fuse.c  | 645 +++
 storage-daemon/qemu-storage-daemon.c |   3 +
 block/export/meson.build |   1 +
 meson.build  |   7 +
 tests/qemu-iotests/025   |   2 +-
 tests/qemu-iotests/026   |   2 +-
 tests/qemu-iot

[PATCH v2 06/20] fuse: Implement hole detection through lseek

2020-09-22 Thread Max Reitz

This is a relatively new feature in libfuse (available since 3.8.0,
which was released in November 2019), so we have to let configure check
whether it is available before making use of it.

Signed-off-by: Max Reitz 
---
 configure   | 32 +++
 block/export/fuse.c | 77 +
 meson.build |  1 +
 3 files changed, 110 insertions(+)

diff --git a/configure b/configure
index 21c31e4694..6b9184b62a 100755
--- a/configure
+++ b/configure
@@ -6226,11 +6226,39 @@ EOF
   fuse_libs=$(pkg-config --libs fuse3)
   if compile_prog "$fuse_cflags" "$fuse_libs"; then
 fuse=yes
+
+cat > $TMPC <
+#include 
+#include 
+#include 
+#include 
+static void fuse_lseek(fuse_req_t req, fuse_ino_t inode, off_t offset,
+   int whence, struct fuse_file_info *fi)
+{
+if (whence == SEEK_DATA || whence == SEEK_HOLE) {
+fuse_reply_lseek(req, offset);
+} else {
+fuse_reply_err(req, EINVAL);
+}
+}
+const struct fuse_lowlevel_ops fuse_ops = {
+.lseek = fuse_lseek,
+};
+int main(void) { return 0; }
+EOF
+if compile_prog "$fuse_cflags" "$fuse_libs"; then
+  fuse_lseek=yes
+else
+  fuse_lseek=no
+fi
   else
 if test "$fuse" = "yes"; then
   feature_not_found "fuse"
 fi
 fuse=no
+fuse_lseek=no
   fi
 fi
 
@@ -7425,6 +7453,10 @@ if test "$fuse" = "yes"; then
   echo "CONFIG_FUSE=y" >> $config_host_mak
   echo "FUSE_CFLAGS=$fuse_cflags" >> $config_host_mak
   echo "FUSE_LIBS=$fuse_libs" >> $config_host_mak
+
+  if test "$fuse_lseek" = "yes"; then
+echo "CONFIG_FUSE_LSEEK=y" >> $config_host_mak
+  fi
 fi
 
 if test "$tcg_interpreter" = "yes"; then
diff --git a/block/export/fuse.c b/block/export/fuse.c
index 7dfb4eb280..ee90f5a185 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -548,6 +548,80 @@ static void fuse_flush(fuse_req_t req, fuse_ino_t inode,
 fuse_reply_err(req, ret < 0 ? -ret : 0);
 }
 
+#ifdef CONFIG_FUSE_LSEEK
+/**
+ * Let clients inquire allocation status.
+ */
+static void fuse_lseek(fuse_req_t req, fuse_ino_t inode, off_t offset,
+   int whence, struct fuse_file_info *fi)
+{
+FuseExport *exp = fuse_req_userdata(req);
+
+if (whence != SEEK_HOLE && whence != SEEK_DATA) {
+fuse_reply_err(req, EINVAL);
+return;
+}
+
+while (true) {
+int64_t pnum;
+int ret;
+
+ret = bdrv_block_status_above(blk_bs(exp->common.blk), NULL,
+  offset, INT64_MAX, &pnum, NULL, NULL);
+if (ret < 0) {
+fuse_reply_err(req, -ret);
+return;
+}
+
+if (!pnum && (ret & BDRV_BLOCK_EOF)) {
+int64_t blk_len;
+
+/*
+ * If blk_getlength() rounds (e.g. by sectors), then the
+ * export length will be rounded, too.  However,
+ * bdrv_block_status_above() may return EOF at unaligned
+ * offsets.  We must not let this become visible and thus
+ * always simulate a hole between @offset (the real EOF)
+ * and @blk_len (the client-visible EOF).
+ */
+
+blk_len = blk_getlength(exp->common.blk);
+if (blk_len < 0) {
+fuse_reply_err(req, -blk_len);
+return;
+}
+
+if (offset > blk_len || whence == SEEK_DATA) {
+fuse_reply_err(req, ENXIO);
+} else {
+fuse_reply_lseek(req, offset);
+}
+return;
+}
+
+if (ret & BDRV_BLOCK_DATA) {
+if (whence == SEEK_DATA) {
+fuse_reply_lseek(req, offset);
+return;
+}
+} else {
+if (whence == SEEK_HOLE) {
+fuse_reply_lseek(req, offset);
+return;
+}
+}
+
+/* Safety check against infinite loops */
+if (!pnum) {
+fuse_reply_err(req, ENXIO);
+return;
+}
+
+offset += pnum;
+}
+}
+#endif
+
 static const struct fuse_lowlevel_ops fuse_ops = {
 .lookup = fuse_lookup,
 .getattr= fuse_getattr,
@@ -557,6 +631,9 @@ static const struct fuse_lowlevel_ops fuse_ops = {
 .write  = fuse_write,
 .fallocate  = fuse_fallocate,
 .flush  = fuse_flush,
+#ifdef CONFIG_FUSE_LSEEK
+.lseek  = fuse_lseek,
+#endif
 };
 
 const BlockExportDriver blk_exp_fuse = {
diff --git a/meson.build b/meson.build
index 85addd8562..1db6a46d14 100644
--- a/meson.build
+++ b/meson.build
@@ -1537,6 +1537,7 @@ summary_info += {'thread sanitizer':  
config_host.has_key('CONFIG_TSAN')}
 summary_info += {'rng-none':  config_host.has_key('CONFIG_RNG_NONE')}
 summary_info += {'Linux keyring': 
config_host.has_key('CONFIG_SECRET_KEYRING')}
 summary_info += {'fuse exports':  config_host.has_key('CONFIG_FUSE')}
+summary_info += {'fuse lseek':config_host.

[PATCH v2 04/20] fuse: Allow growable exports

2020-09-22 Thread Max Reitz

These will behave more like normal files in that writes beyond the EOF
will automatically grow the export size.

Signed-off-by: Max Reitz 
---
 qapi/block-export.json |  6 +-
 block/export/fuse.c| 12 +++-
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/qapi/block-export.json b/qapi/block-export.json
index cb5bd54cbf..cb26daa98b 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -183,10 +183,14 @@
 # @mountpoint: Path on which to export the block device via FUSE.
 #  This must point to an existing regular file.
 #
+# @growable: Whether writes beyond the EOF should grow the block node
+#accordingly. (default: false)
+#
 # Since: 5.2
 ##
 { 'struct': 'BlockExportOptionsFuse',
-  'data': { 'mountpoint': 'str' },
+  'data': { 'mountpoint': 'str',
+'*growable': 'bool' },
   'if': 'defined(CONFIG_FUSE)' }
 
 ##
diff --git a/block/export/fuse.c b/block/export/fuse.c
index 8fc667231d..f3a84579ba 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -45,6 +45,7 @@ typedef struct FuseExport {
 
 char *mountpoint;
 bool writable;
+bool growable;
 } FuseExport;
 
 static GHashTable *exports;
@@ -101,6 +102,7 @@ static int fuse_export_create(BlockExport *blk_exp,
 
 exp->mountpoint = g_strdup(args->mountpoint);
 exp->writable = blk_exp_args->writable;
+exp->growable = args->growable;
 
 ret = setup_fuse_export(exp, args->mountpoint, errp);
 if (ret < 0) {
@@ -436,7 +438,15 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, 
const char *buf,
 
 size = MIN(size, BDRV_REQUEST_MAX_BYTES);
 if (offset + size > length) {
-size = length - offset;
+if (exp->growable) {
+ret = fuse_do_truncate(exp, offset + size, PREALLOC_MODE_OFF);
+if (ret < 0) {
+fuse_reply_err(req, -ret);
+return;
+}
+} else {
+size = length - offset;
+}
 }
 
 ret = blk_pwrite(exp->common.blk, offset, buf, size, 0);
-- 
2.26.2

[PATCH v2 05/20] fuse: (Partially) implement fallocate()

2020-09-22 Thread Max Reitz

This allows allocating areas after the (old) EOF as part of a growing
resize, writing zeroes, and discarding.

Signed-off-by: Max Reitz 
---
 block/export/fuse.c | 79 +
 1 file changed, 79 insertions(+)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index f3a84579ba..7dfb4eb280 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -457,6 +457,84 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, 
const char *buf,
 }
 }
 
+/**
+ * Let clients perform various fallocate() operations.
+ */
+static void fuse_fallocate(fuse_req_t req, fuse_ino_t inode, int mode,
+   off_t offset, off_t length,
+   struct fuse_file_info *fi)
+{
+FuseExport *exp = fuse_req_userdata(req);
+int64_t blk_len;
+int ret;
+
+if (!exp->writable) {
+fuse_reply_err(req, EACCES);
+return;
+}
+
+blk_len = blk_getlength(exp->common.blk);
+if (blk_len < 0) {
+fuse_reply_err(req, -blk_len);
+return;
+}
+
+if (mode & FALLOC_FL_KEEP_SIZE) {
+length = MIN(length, blk_len - offset);
+}
+
+if (mode & FALLOC_FL_PUNCH_HOLE) {
+if (!(mode & FALLOC_FL_KEEP_SIZE)) {
+fuse_reply_err(req, EINVAL);
+return;
+}
+
+do {
+int size = MIN(length, BDRV_REQUEST_MAX_BYTES);
+
+ret = blk_pdiscard(exp->common.blk, offset, size);
+length -= size;
+} while (ret == 0 && length > 0);
+} else if (mode & FALLOC_FL_ZERO_RANGE) {
+if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + length > blk_len) {
+ret = fuse_do_truncate(exp, offset + length, PREALLOC_MODE_OFF);
+if (ret < 0) {
+fuse_reply_err(req, -ret);
+return;
+}
+}
+
+do {
+int size = MIN(length, BDRV_REQUEST_MAX_BYTES);
+
+ret = blk_pwrite_zeroes(exp->common.blk,
+offset, size, 0);
+length -= size;
+} while (ret == 0 && length > 0);
+} else if (!mode) {
+/* We can only fallocate at the EOF with a truncate */
+if (offset < blk_len) {
+fuse_reply_err(req, EOPNOTSUPP);
+return;
+}
+
+if (offset > blk_len) {
+/* No preallocation needed here */
+ret = fuse_do_truncate(exp, offset, PREALLOC_MODE_OFF);
+if (ret < 0) {
+fuse_reply_err(req, -ret);
+return;
+}
+}
+
+ret = fuse_do_truncate(exp, offset + length, PREALLOC_MODE_FALLOC);
+} else {
+ret = -EOPNOTSUPP;
+}
+
+fuse_reply_err(req, ret < 0 ? -ret : 0);
+}
+
 /**
  * Let clients flush the exported image.
  */
@@ -477,6 +555,7 @@ static const struct fuse_lowlevel_ops fuse_ops = {
 .open   = fuse_open,
 .read   = fuse_read,
 .write  = fuse_write,
+.fallocate  = fuse_fallocate,
 .flush  = fuse_flush,
 };
 
-- 
2.26.2

[PATCH v2 02/20] fuse: Allow exporting BDSs via FUSE

2020-09-22 Thread Max Reitz

block-export-add type=fuse allows mounting block graph nodes via FUSE on
some existing regular file.  That file should then appears like a raw
disk image, and accesses to it result in accesses to the exported BDS.

Right now, we only implement the necessary block export functions to set
it up and shut it down.  We do not implement any access functions, so
accessing the mount point only results in errors.  This will be
addressed by a followup patch.

We keep a hash table of exported mount points, because we want to be
able to detect when users try to use a mount point twice.  This is
because we invoke stat() to check whether the given mount point is a
regular file, but if that file is served by ourselves (because it is
already used as a mount point), then this stat() would have to be served
by ourselves, too, which is impossible to do while we (as the caller)
are waiting for it to settle.  Therefore, keep track of mount point
paths to at least catch the most obvious instances of that problem.

Signed-off-by: Max Reitz 
---
 qapi/block-export.json   |  24 +++-
 include/block/fuse.h |  30 +
 block.c  |   1 +
 block/export/export.c|   4 +
 block/export/fuse.c  | 253 +++
 block/export/meson.build |   1 +
 6 files changed, 311 insertions(+), 2 deletions(-)
 create mode 100644 include/block/fuse.h
 create mode 100644 block/export/fuse.c

diff --git a/qapi/block-export.json b/qapi/block-export.json
index 0c045dfe7b..cb5bd54cbf 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -174,6 +174,21 @@
 ##
 { 'command': 'nbd-server-stop' }
 
+##
+# @BlockExportOptionsFuse:
+#
+# Options for exporting a block graph node on some (file) mountpoint
+# as a raw image.
+#
+# @mountpoint: Path on which to export the block device via FUSE.
+#  This must point to an existing regular file.
+#
+# Since: 5.2
+##
+{ 'struct': 'BlockExportOptionsFuse',
+  'data': { 'mountpoint': 'str' },
+  'if': 'defined(CONFIG_FUSE)' }
+
 ##
 # @BlockExportType:
 #
@@ -181,10 +196,13 @@
 #
 # @nbd: NBD export
 #
+# @fuse: Fuse export (since: 5.2)
+#
 # Since: 4.2
 ##
 { 'enum': 'BlockExportType',
-  'data': [ 'nbd' ] }
+  'data': [ 'nbd',
+{ 'name': 'fuse', 'if': 'defined(CONFIG_FUSE)' } ] }
 
 ##
 # @BlockExportOptions:
@@ -213,7 +231,9 @@
 '*writethrough': 'bool' },
   'discriminator': 'type',
   'data': {
-  'nbd': 'BlockExportOptionsNbd'
+  'nbd': 'BlockExportOptionsNbd',
+  'fuse': { 'type': 'BlockExportOptionsFuse',
+'if': 'defined(CONFIG_FUSE)' }
} }
 
 ##
diff --git a/include/block/fuse.h b/include/block/fuse.h
new file mode 100644
index 00..ffa91fe364
--- /dev/null
+++ b/include/block/fuse.h
@@ -0,0 +1,30 @@
+/*
+ * Present a block device as a raw image through FUSE
+ *
+ * Copyright (c) 2020 Max Reitz 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; under version 2 or later of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef BLOCK_FUSE_H
+#define BLOCK_FUSE_H
+
+#ifdef CONFIG_FUSE
+
+#include "block/export.h"
+
+extern const BlockExportDriver blk_exp_fuse;
+
+#endif /* CONFIG_FUSE */
+
+#endif
diff --git a/block.c b/block.c
index acde53f92a..7bf45f6ede 100644
--- a/block.c
+++ b/block.c
@@ -26,6 +26,7 @@
 #include "block/trace.h"
 #include "block/block_int.h"
 #include "block/blockjob.h"
+#include "block/fuse.h"
 #include "block/nbd.h"
 #include "block/qdict.h"
 #include "qemu/error-report.h"
diff --git a/block/export/export.c b/block/export/export.c
index 64d39a6934..ba866d6ba7 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -16,6 +16,7 @@
 #include "block/block.h"
 #include "sysemu/block-backend.h"
 #include "block/export.h"
+#include "block/fuse.h"
 #include "block/nbd.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-block-export.h"
@@ -24,6 +25,9 @@
 
 static const BlockExportDriver *blk_exp_drivers[] = {
 &blk_exp_nbd,
+#ifdef CONFIG_FUSE
+&blk_exp_fuse,
+#endif
 };
 
 /* Only accessed from the main thread */
diff --git a/block/export/fuse.c b/block/export/fuse.c
new file mode 100644
index 00..75f11d2514
--- /dev/null
+++ b/block/export/fuse.c
@@ -0,0 +1,253 @@
+/*
+ * Present a block device as a raw image through FUSE
+ *
+ * Copyright (c) 2020 Max Reitz 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Fou

[PATCH v2 03/20] fuse: Implement standard FUSE operations

2020-09-22 Thread Max Reitz

This makes the export actually useful instead of only producing errors
whenever it is accessed.

Signed-off-by: Max Reitz 
---
 block/export/fuse.c | 226 
 1 file changed, 226 insertions(+)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index 75f11d2514..8fc667231d 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -32,6 +32,10 @@
 #include 
 
 
+/* Prevent overly long bounce buffer allocations */
+#define FUSE_MAX_BOUNCE_BYTES (MIN(BDRV_REQUEST_MAX_BYTES, 64 * 1024 * 1024))
+
+
 typedef struct FuseExport {
 BlockExport common;
 
@@ -241,7 +245,229 @@ static bool is_regular_file(const char *path, Error 
**errp)
 return true;
 }
 
+
+/**
+ * Let clients look up files.  Always return ENOENT because we only
+ * care about the mountpoint itself.
+ */
+static void fuse_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
+{
+fuse_reply_err(req, ENOENT);
+}
+
+/**
+ * Let clients get file attributes (i.e., stat() the file).
+ */
+static void fuse_getattr(fuse_req_t req, fuse_ino_t inode,
+ struct fuse_file_info *fi)
+{
+struct stat statbuf;
+int64_t length, allocated_blocks;
+time_t now = time(NULL);
+ImageInfo *info = NULL;
+FuseExport *exp = fuse_req_userdata(req);
+mode_t mode;
+Error *local_error = NULL;
+
+length = blk_getlength(exp->common.blk);
+if (length < 0) {
+fuse_reply_err(req, -length);
+return;
+}
+
+bdrv_query_image_info(blk_bs(exp->common.blk), &info, &local_error);
+if (local_error) {
+allocated_blocks = DIV_ROUND_UP(length, 512);
+} else {
+allocated_blocks = DIV_ROUND_UP(info->actual_size, 512);
+}
+
+qapi_free_ImageInfo(info);
+error_free(local_error);
+local_error = NULL;
+
+mode = S_IFREG | S_IRUSR;
+if (exp->writable) {
+mode |= S_IWUSR;
+}
+
+statbuf = (struct stat) {
+.st_ino = inode,
+.st_mode= mode,
+.st_nlink   = 1,
+.st_uid = getuid(),
+.st_gid = getgid(),
+.st_size= length,
+.st_blksize = blk_bs(exp->common.blk)->bl.request_alignment,
+.st_blocks  = allocated_blocks,
+.st_atime   = now,
+.st_mtime   = now,
+.st_ctime   = now,
+};
+
+fuse_reply_attr(req, &statbuf, 1.);
+}
+
+static int fuse_do_truncate(const FuseExport *exp, int64_t size,
+PreallocMode prealloc)
+{
+uint64_t blk_perm, blk_shared_perm;
+int ret;
+
+blk_get_perm(exp->common.blk, &blk_perm, &blk_shared_perm);
+
+ret = blk_set_perm(exp->common.blk, blk_perm | BLK_PERM_RESIZE,
+   blk_shared_perm, NULL);
+if (ret < 0) {
+return ret;
+}
+
+ret = blk_truncate(exp->common.blk, size, true, prealloc, 0, NULL);
+
+/* Must succeed, because we are only giving up the RESIZE permission */
+blk_set_perm(exp->common.blk, blk_perm, blk_shared_perm, &error_abort);
+
+return ret;
+}
+
+/**
+ * Let clients set file attributes.  Only resizing is supported.
+ */
+static void fuse_setattr(fuse_req_t req, fuse_ino_t inode, struct stat 
*statbuf,
+ int to_set, struct fuse_file_info *fi)
+{
+FuseExport *exp = fuse_req_userdata(req);
+int ret;
+
+if (!exp->writable) {
+fuse_reply_err(req, EACCES);
+return;
+}
+
+if (to_set & ~FUSE_SET_ATTR_SIZE) {
+fuse_reply_err(req, ENOTSUP);
+return;
+}
+
+ret = fuse_do_truncate(exp, statbuf->st_size, PREALLOC_MODE_OFF);
+if (ret < 0) {
+fuse_reply_err(req, -ret);
+return;
+}
+
+fuse_getattr(req, inode, fi);
+}
+
+/**
+ * Let clients open a file (i.e., the exported image).
+ */
+static void fuse_open(fuse_req_t req, fuse_ino_t inode,
+  struct fuse_file_info *fi)
+{
+fuse_reply_open(req, fi);
+}
+
+/**
+ * Handle client reads from the exported image.
+ */
+static void fuse_read(fuse_req_t req, fuse_ino_t inode,
+  size_t size, off_t offset, struct fuse_file_info *fi)
+{
+FuseExport *exp = fuse_req_userdata(req);
+int64_t length;
+void *buf;
+int ret;
+
+/**
+ * Clients will expect short reads at EOF, so we have to limit
+ * offset+size to the image length.
+ */
+length = blk_getlength(exp->common.blk);
+if (length < 0) {
+fuse_reply_err(req, -length);
+return;
+}
+
+size = MIN(size, FUSE_MAX_BOUNCE_BYTES);
+if (offset + size > length) {
+size = length - offset;
+}
+
+buf = qemu_try_blockalign(blk_bs(exp->common.blk), size);
+if (!buf) {
+fuse_reply_err(req, ENOMEM);
+return;
+}
+
+ret = blk_pread(exp->common.blk, offset, buf, size);
+if (ret >= 0) {
+fuse_reply_buf(req, buf, size);
+} else {
+fuse_reply_err(req, -ret);
+}
+
+qemu_vfree(buf);
+}
+
+/**
+ * Handle client writes to

Re: [PATCH] hw: ide: check the pointer before do dma memory unmap

2020-09-22 Thread Philippe Mathieu-Daudé

On 9/22/20 12:37 PM, Li Qiang wrote:
> Philippe Mathieu-Daudé  于2020年9月22日周二 下午4:19写道：
>>
>> On 9/22/20 3:34 AM, Alexander Bulekov wrote:
>>> On 200815 0020, Li Qiang wrote:
 In 'map_page' we need to check the return value of
 'dma_memory_map' to ensure the we actully maped something.
 Otherwise, we will hit an assert in 'address_space_unmap'.
 This is because we can't find the MR with the NULL buffer.
 This is the LP#1884693:

 -->https://bugs.launchpad.net/qemu/+bug/1884693

 Reported-by: Alexander Bulekov 
 Signed-off-by: Li Qiang 
>>>
>>> I'm not very familiar with the IDE code, but this seems like a simple
>>> null-ptr check, and Li has not received a response in over a month.
>>
>> Yeah well it is not an easy bug... I spent few hours but at some
>> point it became too AHCI specific. I wanted to understand the bug
>> to answer the "Why do we get there?" "Can we get there with real
>> hardware?" questions, to be able to discern if this patch is OK,
>> or if it is hiding bugs and what we really use here is an assert().
> 
> Hi Philippe,
> I think you have complicated this issue. The root cause is that
> 'dma_memory_map' maybe fail.
> The gpa is from guest and can be any value so this is expected.
> It can return NULL pointer (no map) or it can be do partially
> mapped(len < wanted).
> Though in most situation the map result is 'ret == NULL and len <
> wanted'. It may also has '
> ret != NULL and len < wanted' I think.

Then this form is easier to review to my taste:

-- >8 --
@@ -250,7 +250,7 @@ static void map_page(AddressSpace *as, uint8_t
**ptr, uint64_t addr,
 }

 *ptr = dma_memory_map(as, addr, &len, DMA_DIRECTION_FROM_DEVICE);
-if (len < wanted) {
+if (*ptr && len < wanted) {
 dma_memory_unmap(as, *ptr, len, DMA_DIRECTION_FROM_DEVICE, len);
 *ptr = NULL;
 }
---

> 
> The assert is come from that we pass NULL to 'dma_memory_unmap'.
> 
> So the standard usage of 'dma_memory_map' I think is first check if
> the return value to ensure it is not NULL.
> Then check whether it mapped the len as the caller expected.
> 
> There are several places in the code base that doesn't following this
> usage which I think it is wrong.
> 
> Thanks,
> Li Qiang
> 
>>
>>>
>>> Reviewed-by: Alexander Bulekov 
>>>
 ---
  hw/ide/ahci.c | 5 +
  1 file changed, 5 insertions(+)

 diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
 index 009120f88b..63e9fccdbe 100644
 --- a/hw/ide/ahci.c
 +++ b/hw/ide/ahci.c
 @@ -250,6 +250,11 @@ static void map_page(AddressSpace *as, uint8_t **ptr, 
 uint64_t addr,
  }

  *ptr = dma_memory_map(as, addr, &len, DMA_DIRECTION_FROM_DEVICE);
 +
 +if (!*ptr) {
 +return;
 +}
 +
  if (len < wanted) {
  dma_memory_unmap(as, *ptr, len, DMA_DIRECTION_FROM_DEVICE, len);
  *ptr = NULL;
 --
 2.17.1

>>>
>>
>

[PATCH v2 01/20] configure: Detect libfuse

2020-09-22 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 configure   | 34 ++
 meson.build |  6 ++
 2 files changed, 40 insertions(+)

diff --git a/configure b/configure
index ce27eafb0a..21c31e4694 100755
--- a/configure
+++ b/configure
@@ -538,6 +538,7 @@ meson=""
 ninja=""
 skip_meson=no
 gettext=""
+fuse=""
 
 bogus_os="no"
 malloc_trim=""
@@ -1621,6 +1622,10 @@ for opt do
   ;;
   --disable-libdaxctl) libdaxctl=no
   ;;
+  --enable-fuse) fuse=yes
+  ;;
+  --disable-fuse) fuse=no
+  ;;
   *)
   echo "ERROR: unknown option $opt"
   echo "Try '$0 --help' for more information"
@@ -1945,6 +1950,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   xkbcommon   xkbcommon support
   rng-nonedummy RNG, avoid using /dev/(u)random and getrandom()
   libdaxctl   libdaxctl support
+  fusefuse block device export
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -6206,6 +6212,28 @@ but not implemented on your system"
 fi
 fi
 
+##
+# FUSE support
+
+if test "$fuse" != "no"; then
+  cat > $TMPC <
+#include 
+int main(void) { return 0; }
+EOF
+  fuse_cflags=$(pkg-config --cflags fuse3)
+  fuse_libs=$(pkg-config --libs fuse3)
+  if compile_prog "$fuse_cflags" "$fuse_libs"; then
+fuse=yes
+  else
+if test "$fuse" = "yes"; then
+  feature_not_found "fuse"
+fi
+fuse=no
+  fi
+fi
+
 ##
 # End of CC checks
 # After here, no more $cc or $ld runs
@@ -7393,6 +7421,12 @@ if test "$secret_keyring" = "yes" ; then
   echo "CONFIG_SECRET_KEYRING=y" >> $config_host_mak
 fi
 
+if test "$fuse" = "yes"; then
+  echo "CONFIG_FUSE=y" >> $config_host_mak
+  echo "FUSE_CFLAGS=$fuse_cflags" >> $config_host_mak
+  echo "FUSE_LIBS=$fuse_libs" >> $config_host_mak
+fi
+
 if test "$tcg_interpreter" = "yes"; then
   QEMU_INCLUDES="-iquote ${source_path}/tcg/tci $QEMU_INCLUDES"
 elif test "$ARCH" = "sparc64" ; then
diff --git a/meson.build b/meson.build
index 86e1cca0ad..85addd8562 100644
--- a/meson.build
+++ b/meson.build
@@ -436,6 +436,11 @@ if 'CONFIG_TASN1' in config_host
 endif
 keyutils = dependency('libkeyutils', required: false,
   method: 'pkg-config', static: enable_static)
+libfuse = not_found
+if 'CONFIG_FUSE' in config_host
+  libfuse = declare_dependency(compile_args: 
config_host['FUSE_CFLAGS'].split(),
+   link_args: config_host['FUSE_LIBS'].split())
+endif
 
 has_gettid = cc.has_function('gettid')
 
@@ -1531,6 +1536,7 @@ endif
 summary_info += {'thread sanitizer':  config_host.has_key('CONFIG_TSAN')}
 summary_info += {'rng-none':  config_host.has_key('CONFIG_RNG_NONE')}
 summary_info += {'Linux keyring': 
config_host.has_key('CONFIG_SECRET_KEYRING')}
+summary_info += {'fuse exports':  config_host.has_key('CONFIG_FUSE')}
 summary(summary_info, bool_yn: true)
 
 if not supported_cpus.contains(cpu)
-- 
2.26.2

Re: [PATCH v2] fdc: check null block pointer before r/w data transfer

2020-09-22 Thread Li Qiang

P J P  于2020年9月22日周二 下午5:29写道：
>
> From: Prasad J Pandit 
>
> While transferring data via fdctrl_read/write_data() routines,
> check that current drive does not have a null block pointer.
> Avoid null pointer dereference.
>
>  -> https://ruhr-uni-bochum.sciebo.de/s/NNWP2GfwzYKeKwE?path=%2Ffdc_nullptr1
> ==1658854==Hint: address points to the zero page.
> #0 blk_inc_in_flight block/block-backend.c:1327
> #1 blk_prw block/block-backend.c:1299
> #2 blk_pwrite block/block-backend.c:1464
> #3 fdctrl_write_data hw/block/fdc.c:2418
> #4 fdctrl_write hw/block/fdc.c:962
> #5 portio_write ioport.c:205
> #6 memory_region_write_accessor memory.c:483
> #7 access_with_adjusted_size memory.c:544
> #8 memory_region_dispatch_write memory.c:1476
>
> Reported-by: Ruhr-University 
> Signed-off-by: Prasad J Pandit 

Reviewed-by: Li Qiang 

> ---
>  hw/block/fdc.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> Update v2: treat NULL blk pointer as an error
>   -> https://lists.nongnu.org/archive/html/qemu-devel/2020-09/msg06642.html
>
> diff --git a/hw/block/fdc.c b/hw/block/fdc.c
> index 224bac504f..bf968dc66f 100644
> --- a/hw/block/fdc.c
> +++ b/hw/block/fdc.c
> @@ -1923,7 +1923,8 @@ static uint32_t fdctrl_read_data(FDCtrl *fdctrl)
> fd_sector(cur_drv));
>  return 0;
>  }
> -if (blk_pread(cur_drv->blk, fd_offset(cur_drv), fdctrl->fifo,
> +if (!cur_drv->blk
> +|| blk_pread(cur_drv->blk, fd_offset(cur_drv), fdctrl->fifo,
>BDRV_SECTOR_SIZE)
>  < 0) {
>  FLOPPY_DPRINTF("error getting sector %d\n",
> @@ -2427,7 +2428,8 @@ static void fdctrl_write_data(FDCtrl *fdctrl, uint32_t 
> value)
>  if (pos == FD_SECTOR_LEN - 1 ||
>  fdctrl->data_pos == fdctrl->data_len) {
>  cur_drv = get_cur_drv(fdctrl);
> -if (blk_pwrite(cur_drv->blk, fd_offset(cur_drv), fdctrl->fifo,
> +if (!cur_drv->blk
> +|| blk_pwrite(cur_drv->blk, fd_offset(cur_drv), fdctrl->fifo,
> BDRV_SECTOR_SIZE, 0) < 0) {
>  FLOPPY_DPRINTF("error writing sector %d\n",
> fd_sector(cur_drv));
> --
> 2.26.2
>

Re: [PATCH] hw: ide: check the pointer before do dma memory unmap

2020-09-22 Thread Li Qiang

Philippe Mathieu-Daudé  于2020年9月22日周二 下午4:19写道：
>
> On 9/22/20 3:34 AM, Alexander Bulekov wrote:
> > On 200815 0020, Li Qiang wrote:
> >> In 'map_page' we need to check the return value of
> >> 'dma_memory_map' to ensure the we actully maped something.
> >> Otherwise, we will hit an assert in 'address_space_unmap'.
> >> This is because we can't find the MR with the NULL buffer.
> >> This is the LP#1884693:
> >>
> >> -->https://bugs.launchpad.net/qemu/+bug/1884693
> >>
> >> Reported-by: Alexander Bulekov 
> >> Signed-off-by: Li Qiang 
> >
> > I'm not very familiar with the IDE code, but this seems like a simple
> > null-ptr check, and Li has not received a response in over a month.
>
> Yeah well it is not an easy bug... I spent few hours but at some
> point it became too AHCI specific. I wanted to understand the bug
> to answer the "Why do we get there?" "Can we get there with real
> hardware?" questions, to be able to discern if this patch is OK,
> or if it is hiding bugs and what we really use here is an assert().

Hi Philippe,
I think you have complicated this issue. The root cause is that
'dma_memory_map' maybe fail.
The gpa is from guest and can be any value so this is expected.
It can return NULL pointer (no map) or it can be do partially
mapped(len < wanted).
Though in most situation the map result is 'ret == NULL and len <
wanted'. It may also has '
ret != NULL and len < wanted' I think.

The assert is come from that we pass NULL to 'dma_memory_unmap'.

So the standard usage of 'dma_memory_map' I think is first check if
the return value to ensure it is not NULL.
Then check whether it mapped the len as the caller expected.

There are several places in the code base that doesn't following this
usage which I think it is wrong.

Thanks,
Li Qiang

>
> >
> > Reviewed-by: Alexander Bulekov 
> >
> >> ---
> >>  hw/ide/ahci.c | 5 +
> >>  1 file changed, 5 insertions(+)
> >>
> >> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> >> index 009120f88b..63e9fccdbe 100644
> >> --- a/hw/ide/ahci.c
> >> +++ b/hw/ide/ahci.c
> >> @@ -250,6 +250,11 @@ static void map_page(AddressSpace *as, uint8_t **ptr, 
> >> uint64_t addr,
> >>  }
> >>
> >>  *ptr = dma_memory_map(as, addr, &len, DMA_DIRECTION_FROM_DEVICE);
> >> +
> >> +if (!*ptr) {
> >> +return;
> >> +}
> >> +
> >>  if (len < wanted) {
> >>  dma_memory_unmap(as, *ptr, len, DMA_DIRECTION_FROM_DEVICE, len);
> >>  *ptr = NULL;
> >> --
> >> 2.17.1
> >>
> >
>

Re: [PATCH 0/1] block: future of sheepdog storage driver ?

2020-09-22 Thread Thomas Huth

On 22/09/2020 11.01, Daniel P. Berrangé wrote:
[...]
> Does someone have a compelling reason for QEMU to keep supporting
> this driver, other than the fact that it already exists ?
> 
> If not, it looks like a candidate for deprecating in QEMU with a
> view to later removing it.

I think you gave enough examples why this is a good candidate for
deprecation. So I'd say: Simply send a patch to deprecate it. That's
what's our deprecation process is good for. If someone still cares for
sheepdog, they then can speak up and we can revert the patch that put it
on the deprecation list.

 Thomas

Re: [PATCH v2 0/6] block/nvme: Map doorbells pages write-only, remove magic from nvme_init

2020-09-22 Thread Fam Zheng

On Tue, 2020-09-22 at 10:38 +0200, Philippe Mathieu-Daudé wrote:
> Instead of mapping 8K of I/O + doorbells as RW during the whole
> execution, maps I/O temporarly at init, and map doorbells WO.
> 
> Replace various magic values by slighly more explicit macros from
> "block/nvme.h".
> 
> Since v1: Fix uninitialized regs* (patchew)
> 
> Philippe Mathieu-Daudé (6):
>   util/vfio-helpers: Pass page protections to qemu_vfio_pci_map_bar()
>   block/nvme: Map doorbells pages write-only
>   block/nvme: Reduce I/O registers scope
>   block/nvme: Drop NVMeRegs structure, directly use NvmeBar
>   block/nvme: Use register definitions from 'block/nvme.h'
>   block/nvme: Replace magic value by SCALE_MS definition
> 
>  include/qemu/vfio-helpers.h |  2 +-
>  block/nvme.c| 73 +
> 
>  util/vfio-helpers.c |  4 +-
>  3 files changed, 44 insertions(+), 35 deletions(-)
> 

Reviewed-by: Fam Zheng

Re: [PATCH 2/6] block/nvme: Map doorbells pages write-only

2020-09-22 Thread Fam Zheng

On Tue, 2020-09-22 at 10:41 +0200, Philippe Mathieu-Daudé wrote:
> Hi Fam,
> 
> +Paolo?
> 
> On 9/22/20 10:18 AM, Fam Zheng wrote:
> > On Mon, 2020-09-21 at 18:29 +0200, Philippe Mathieu-Daudé wrote:
> > > Per the datasheet sections 3.1.13/3.1.14:
> > >   "The host should not read the doorbell registers."
> > > 
> > > As we don't need read access, map the doorbells with write-only
> > > permission. We keep a reference to this mapped address in the
> > > BDRVNVMeState structure.
> > 
> > Besides looking more correct in access mode, is there any side
> > effect
> > of WO mapping?
> 
> TBH I don't have enough knowledge to answer this question.
> I tested successfully on X86. I'm writing more tests.

The reason I'm asking is more because x86 pages are either RO or RW. So
I'd like to see if there's a practical reason behind this patch (I have
no idea about the effects on MTRR and/or IOMMU).

Fam

> 
> > 
> > Fam
> > 
> > > 
> > > Signed-off-by: Philippe Mathieu-Daudé 
> > > ---
> > >  block/nvme.c | 29 +++--
> > >  1 file changed, 19 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/block/nvme.c b/block/nvme.c
> > > index 5a4dc6a722a..3c834da8fec 100644
> > > --- a/block/nvme.c
> > > +++ b/block/nvme.c
> > > @@ -31,7 +31,7 @@
> > >  #define NVME_SQ_ENTRY_BYTES 64
> > >  #define NVME_CQ_ENTRY_BYTES 16
> > >  #define NVME_QUEUE_SIZE 128
> > > -#define NVME_BAR_SIZE 8192
> > > +#define NVME_DOORBELL_SIZE 4096
> > >  
> > >  /*
> > >   * We have to leave one slot empty as that is the full queue
> > > case
> > > where
> > > @@ -84,10 +84,6 @@ typedef struct {
> > >  /* Memory mapped registers */
> > >  typedef volatile struct {
> > >  NvmeBar ctrl;
> > > -struct {
> > > -uint32_t sq_tail;
> > > -uint32_t cq_head;
> > > -} doorbells[];
> > >  } NVMeRegs;
> > >  
> > >  #define INDEX_ADMIN 0
> > > @@ -103,6 +99,11 @@ struct BDRVNVMeState {
> > >  AioContext *aio_context;
> > >  QEMUVFIOState *vfio;
> > >  NVMeRegs *regs;
> > > +/* Memory mapped registers */
> > > +volatile struct {
> > > +uint32_t sq_tail;
> > > +uint32_t cq_head;
> > > +} *doorbells;
> > >  /* The submission/completion queue pairs.
> > >   * [0]: admin queue.
> > >   * [1..]: io queues.
> > > @@ -247,14 +248,14 @@ static NVMeQueuePair
> > > *nvme_create_queue_pair(BDRVNVMeState *s,
> > >  error_propagate(errp, local_err);
> > >  goto fail;
> > >  }
> > > -q->sq.doorbell = &s->regs->doorbells[idx * s-
> > > > doorbell_scale].sq_tail;
> > > 
> > > +q->sq.doorbell = &s->doorbells[idx * s-
> > > >doorbell_scale].sq_tail;
> > >  
> > >  nvme_init_queue(s, &q->cq, size, NVME_CQ_ENTRY_BYTES,
> > > &local_err);
> > >  if (local_err) {
> > >  error_propagate(errp, local_err);
> > >  goto fail;
> > >  }
> > > -q->cq.doorbell = &s->regs->doorbells[idx * s-
> > > > doorbell_scale].cq_head;
> > > 
> > > +q->cq.doorbell = &s->doorbells[idx * s-
> > > >doorbell_scale].cq_head;
> > >  
> > >  return q;
> > >  fail:
> > > @@ -712,13 +713,12 @@ static int nvme_init(BlockDriverState *bs,
> > > const char *device, int namespace,
> > >  goto out;
> > >  }
> > >  
> > > -s->regs = qemu_vfio_pci_map_bar(s->vfio, 0, 0,
> > > NVME_BAR_SIZE,
> > > +s->regs = qemu_vfio_pci_map_bar(s->vfio, 0, 0,
> > > sizeof(NvmeBar),
> > >  PROT_READ | PROT_WRITE,
> > > errp);
> > >  if (!s->regs) {
> > >  ret = -EINVAL;
> > >  goto out;
> > >  }
> > > -
> > >  /* Perform initialize sequence as described in NVMe spec
> > > "7.6.1
> > >   * Initialization". */
> > >  
> > > @@ -748,6 +748,13 @@ static int nvme_init(BlockDriverState *bs,
> > > const
> > > char *device, int namespace,
> > >  }
> > >  }
> > >  
> > > +s->doorbells = qemu_vfio_pci_map_bar(s->vfio, 0,
> > > sizeof(NvmeBar),
> > > + NVME_DOORBELL_SIZE,
> > > PROT_WRITE, errp);
> > > +if (!s->doorbells) {
> > > +ret = -EINVAL;
> > > +goto out;
> > > +}
> > > +
> > >  /* Set up admin queue. */
> > >  s->queues = g_new(NVMeQueuePair *, 1);
> > >  s->queues[INDEX_ADMIN] = nvme_create_queue_pair(s,
> > > aio_context,
> > > 0,
> > > @@ -873,7 +880,9 @@ static void nvme_close(BlockDriverState *bs)
> > > &s-
> > > >irq_notifier[MSIX_SHARED_IRQ_IDX],
> > > false, NULL, NULL);
> > >  event_notifier_cleanup(&s-
> > > >irq_notifier[MSIX_SHARED_IRQ_IDX]);
> > > -qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->regs, 0,
> > > NVME_BAR_SIZE);
> > > +qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->doorbells,
> > > +sizeof(NvmeBar),
> > > NVME_DOORBELL_SIZE);
> > > +qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->regs, 0,
> > > sizeof(NvmeBar));
> > >  qemu_vfio_close(s->vfio);
> > >  
> > >

Re: [PATCH v2] qemu/atomic.h: prefix qemu_ to solve collisions

2020-09-22 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20200922085838.230505-1-stefa...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20200922085838.230505-1-stefa...@redhat.com
Subject: [PATCH v2] qemu/atomic.h: prefix qemu_ to solve  
collisions

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

From https://github.com/patchew-project/qemu
 * [new tag] patchew/20200922085838.230505-1-stefa...@redhat.com -> 
patchew/20200922085838.230505-1-stefa...@redhat.com
Switched to a new branch 'test'
1e6457d qemu/atomic.h: prefix qemu_ to solve  collisions

=== OUTPUT BEGIN ===
WARNING: Block comments use a trailing */ on a separate line
#2501: FILE: hw/virtio/vhost.c:92:
+ * but it's easier to use qemu_atomic_* than roll our own. */

ERROR: Macros with multiple statements should be enclosed in a do - while loop
#2973: FILE: include/qemu/atomic.h:152:
+#define qemu_atomic_rcu_read__nocheck(ptr, valptr)  \
 __atomic_load(ptr, valptr, __ATOMIC_RELAXED);   \
 smp_read_barrier_depends();

ERROR: space required before that '*' (ctx:VxB)
#3128: FILE: include/qemu/atomic.h:347:
+#define qemu_atomic_read__nocheck(p) (*(__typeof__(*(p)) volatile*) (p))
  ^

ERROR: Use of volatile is usually wrong, please add a comment
#3128: FILE: include/qemu/atomic.h:347:
+#define qemu_atomic_read__nocheck(p) (*(__typeof__(*(p)) volatile*) (p))

ERROR: space required before that '*' (ctx:VxB)
#3130: FILE: include/qemu/atomic.h:349:
+((*(__typeof__(*(p)) volatile*) (p)) = (i))
  ^

ERROR: Use of volatile is usually wrong, please add a comment
#3130: FILE: include/qemu/atomic.h:349:
+((*(__typeof__(*(p)) volatile*) (p)) = (i))

ERROR: space required after that ',' (ctx:VxV)
#3135: FILE: include/qemu/atomic.h:352:
+#define qemu_atomic_set(ptr, i) qemu_atomic_set__nocheck(ptr,i)
 ^

ERROR: memory barrier without comment
#3210: FILE: include/qemu/atomic.h:410:
+#define qemu_atomic_xchg(ptr, i) (smp_mb(), __sync_lock_test_and_set(ptr, i))

WARNING: Block comments use a leading /* on a separate line
#3285: FILE: include/qemu/atomic.h:462:
+/* qemu_atomic_mb_read/set semantics map Java volatile variables. They are

WARNING: Block comments use a leading /* on a separate line
#6402: FILE: util/bitmap.c:214:
+/* If we avoided the full barrier in qemu_atomic_or(), issue a

WARNING: Block comments use a leading /* on a separate line
#7438: FILE: util/rcu.c:85:
+/* Instead of using qemu_atomic_mb_set for index->waiting, and

WARNING: Block comments use a leading /* on a separate line
#7464: FILE: util/rcu.c:154:
+/* In either case, the qemu_atomic_mb_set below blocks stores that free

total: 7 errors, 5 warnings, 6520 lines checked

Commit 1e6457dfb499 (qemu/atomic.h: prefix qemu_ to solve  
collisions) has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200922085838.230505-1-stefa...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH 2/6] block/nvme: Map doorbells pages write-only

2020-09-22 Thread Philippe Mathieu-Daudé

On 9/22/20 11:04 AM, Paolo Bonzini wrote:
> On 22/09/20 10:41, Philippe Mathieu-Daudé wrote:
>>> Besides looking more correct in access mode, is there any side effect
>>> of WO mapping?
>> TBH I don't have enough knowledge to answer this question.
>> I tested successfully on X86. I'm writing more tests.
> 
> No problem with doing this, but PROT_WRITE does not work at all on x86.
> :)  PROT_EXEC works if you have a machine with PKRU, but PROT_WRITE
> silently becomes PROT_READ|PROT_WRITE because the processor does not
> support it.

Ah this is why it works the same way in my testing.

I'll run tests on ARM.

Thanks,

Phil.

> 
> Paolo
>

[PATCH qemu 4/4] iotests: add test for bitmap mirror

2020-09-22 Thread Fabian Grünbichler

heavily based on/practically forked off iotest 257 for bitmap backups,
but:

- no writes to filter node 'mirror-top' between completion and
finalization, as those seem to deadlock?
- no inclusion of not-yet-available full/top sync modes in combination
with bitmaps
- extra set of reference/test mirrors to verify that writes in parallel
with active mirror work

intentionally keeping copyright and ownership of original test case to
honor provenance.

Signed-off-by: Fabian Grünbichler 
---
 tests/qemu-iotests/306 |  546 +++
 tests/qemu-iotests/306.out | 2846 
 tests/qemu-iotests/group   |1 +
 3 files changed, 3393 insertions(+)
 create mode 100755 tests/qemu-iotests/306
 create mode 100644 tests/qemu-iotests/306.out

diff --git a/tests/qemu-iotests/306 b/tests/qemu-iotests/306
new file mode 100755
index 00..1bb8bd4138
--- /dev/null
+++ b/tests/qemu-iotests/306
@@ -0,0 +1,546 @@
+#!/usr/bin/env python3
+#
+# Test bitmap-sync mirrors (incremental, differential, and partials)
+#
+# Copyright (c) 2019 John Snow for Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+# owner=js...@redhat.com
+
+import math
+import os
+
+import iotests
+from iotests import log, qemu_img
+
+SIZE = 64 * 1024 * 1024
+GRANULARITY = 64 * 1024
+
+
+class Pattern:
+def __init__(self, byte, offset, size=GRANULARITY):
+self.byte = byte
+self.offset = offset
+self.size = size
+
+def bits(self, granularity):
+lower = self.offset // granularity
+upper = (self.offset + self.size - 1) // granularity
+return set(range(lower, upper + 1))
+
+
+class PatternGroup:
+"""Grouping of Pattern objects. Initialize with an iterable of Patterns."""
+def __init__(self, patterns):
+self.patterns = patterns
+
+def bits(self, granularity):
+"""Calculate the unique bits dirtied by this pattern grouping"""
+res = set()
+for pattern in self.patterns:
+res |= pattern.bits(granularity)
+return res
+
+
+GROUPS = [
+PatternGroup([
+# Batch 0: 4 clusters
+Pattern('0x49', 0x000),
+Pattern('0x6c', 0x010),   # 1M
+Pattern('0x6f', 0x200),   # 32M
+Pattern('0x76', 0x3ff)]), # 64M - 64K
+PatternGroup([
+# Batch 1: 6 clusters (3 new)
+Pattern('0x65', 0x000),   # Full overwrite
+Pattern('0x77', 0x00f8000),   # Partial-left (1M-32K)
+Pattern('0x72', 0x2008000),   # Partial-right (32M+32K)
+Pattern('0x69', 0x3fe)]), # Adjacent-left (64M - 128K)
+PatternGroup([
+# Batch 2: 7 clusters (3 new)
+Pattern('0x74', 0x001),   # Adjacent-right
+Pattern('0x69', 0x00e8000),   # Partial-left  (1M-96K)
+Pattern('0x6e', 0x2018000),   # Partial-right (32M+96K)
+Pattern('0x67', 0x3fe,
+2*GRANULARITY)]), # Overwrite [(64M-128K)-64M)
+PatternGroup([
+# Batch 3: 8 clusters (5 new)
+# Carefully chosen such that nothing re-dirties the one cluster
+# that copies out successfully before failure in Group #1.
+Pattern('0xaa', 0x001,
+3*GRANULARITY),   # Overwrite and 2x Adjacent-right
+Pattern('0xbb', 0x00d8000),   # Partial-left (1M-160K)
+Pattern('0xcc', 0x2028000),   # Partial-right (32M+160K)
+Pattern('0xdd', 0x3fc)]), # New; leaving a gap to the right
+]
+
+
+class EmulatedBitmap:
+def __init__(self, granularity=GRANULARITY):
+self._bits = set()
+self.granularity = granularity
+
+def dirty_bits(self, bits):
+self._bits |= set(bits)
+
+def dirty_group(self, n):
+self.dirty_bits(GROUPS[n].bits(self.granularity))
+
+def clear(self):
+self._bits = set()
+
+def clear_bits(self, bits):
+self._bits -= set(bits)
+
+def clear_bit(self, bit):
+self.clear_bits({bit})
+
+def clear_group(self, n):
+self.clear_bits(GROUPS[n].bits(self.granularity))
+
+@property
+def first_bit(self):
+return sorted(self.bits)[0]
+
+@property
+def bits(self):
+return self._bits
+
+@property
+def count(self):
+return len(self.bits)
+
+def compare(self, qmp_bitmap):
+"""
+Print a nice human-readable message checking that a bitmap as rep

[PATCH v2] fdc: check null block pointer before r/w data transfer

2020-09-22 Thread P J P

From: Prasad J Pandit 

While transferring data via fdctrl_read/write_data() routines,
check that current drive does not have a null block pointer.
Avoid null pointer dereference.

 -> https://ruhr-uni-bochum.sciebo.de/s/NNWP2GfwzYKeKwE?path=%2Ffdc_nullptr1
==1658854==Hint: address points to the zero page.
#0 blk_inc_in_flight block/block-backend.c:1327
#1 blk_prw block/block-backend.c:1299
#2 blk_pwrite block/block-backend.c:1464
#3 fdctrl_write_data hw/block/fdc.c:2418
#4 fdctrl_write hw/block/fdc.c:962
#5 portio_write ioport.c:205
#6 memory_region_write_accessor memory.c:483
#7 access_with_adjusted_size memory.c:544
#8 memory_region_dispatch_write memory.c:1476

Reported-by: Ruhr-University 
Signed-off-by: Prasad J Pandit 
---
 hw/block/fdc.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

Update v2: treat NULL blk pointer as an error
  -> https://lists.nongnu.org/archive/html/qemu-devel/2020-09/msg06642.html

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 224bac504f..bf968dc66f 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -1923,7 +1923,8 @@ static uint32_t fdctrl_read_data(FDCtrl *fdctrl)
fd_sector(cur_drv));
 return 0;
 }
-if (blk_pread(cur_drv->blk, fd_offset(cur_drv), fdctrl->fifo,
+if (!cur_drv->blk
+|| blk_pread(cur_drv->blk, fd_offset(cur_drv), fdctrl->fifo,
   BDRV_SECTOR_SIZE)
 < 0) {
 FLOPPY_DPRINTF("error getting sector %d\n",
@@ -2427,7 +2428,8 @@ static void fdctrl_write_data(FDCtrl *fdctrl, uint32_t 
value)
 if (pos == FD_SECTOR_LEN - 1 ||
 fdctrl->data_pos == fdctrl->data_len) {
 cur_drv = get_cur_drv(fdctrl);
-if (blk_pwrite(cur_drv->blk, fd_offset(cur_drv), fdctrl->fifo,
+if (!cur_drv->blk
+|| blk_pwrite(cur_drv->blk, fd_offset(cur_drv), fdctrl->fifo,
BDRV_SECTOR_SIZE, 0) < 0) {
 FLOPPY_DPRINTF("error writing sector %d\n",
fd_sector(cur_drv));
--
2.26.2

[PATCH qemu 3/4] mirror: move some checks to qmp

2020-09-22 Thread Fabian Grünbichler

and assert the passing conditions in block/mirror.c. while incremental
mode was never available for drive-mirror, it makes the interface more
in line with  backup block jobs.

Signed-off-by: Fabian Grünbichler 
---
 block/mirror.c | 28 +---
 blockdev.c | 29 +
 2 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index bc4f4563d9..fe70f9b93c 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1592,31 +1592,13 @@ static BlockJob *mirror_start_job(
 Error *local_err = NULL;
 int ret;
 
-if (sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
-error_setg(errp, "Sync mode '%s' not supported",
-   MirrorSyncMode_str(sync_mode));
-return NULL;
-} else if (sync_mode == MIRROR_SYNC_MODE_BITMAP) {
-if (!bitmap) {
-error_setg(errp, "Must provide a valid bitmap name for '%s'"
-   " sync mode",
-   MirrorSyncMode_str(sync_mode));
-return NULL;
-}
-} else if (bitmap) {
-error_setg(errp,
-   "sync mode '%s' is not compatible with bitmaps",
-   MirrorSyncMode_str(sync_mode));
-return NULL;
-}
+/* QMP interface protects us from these cases */
+assert(sync_mode != MIRROR_SYNC_MODE_INCREMENTAL);
+assert((bitmap && sync_mode == MIRROR_SYNC_MODE_BITMAP) ||
+   (!bitmap && sync_mode != MIRROR_SYNC_MODE_BITMAP));
+assert(!(bitmap && granularity));
 
 if (bitmap) {
-if (granularity) {
-error_setg(errp, "granularity (%d)"
-   "cannot be specified when a bitmap is provided",
-   granularity);
-return NULL;
-}
 granularity = bdrv_dirty_bitmap_granularity(bitmap);
 
 if (bitmap_mode != BITMAP_SYNC_MODE_NEVER) {
diff --git a/blockdev.c b/blockdev.c
index 0fd30a392d..3b090cc749 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2998,7 +2998,36 @@ static void blockdev_mirror_common(const char *job_id, 
BlockDriverState *bs,
 sync = MIRROR_SYNC_MODE_FULL;
 }
 
+if ((sync == MIRROR_SYNC_MODE_BITMAP) ||
+(sync == MIRROR_SYNC_MODE_INCREMENTAL)) {
+/* done before desugaring 'incremental' to print the right message */
+if (!has_bitmap) {
+error_setg(errp, "Must provide a valid bitmap name for "
+   "'%s' sync mode", MirrorSyncMode_str(sync));
+return;
+}
+}
+
+if (sync == MIRROR_SYNC_MODE_INCREMENTAL) {
+if (has_bitmap_mode &&
+bitmap_mode != BITMAP_SYNC_MODE_ON_SUCCESS) {
+error_setg(errp, "Bitmap sync mode must be '%s' "
+   "when using sync mode '%s'",
+   BitmapSyncMode_str(BITMAP_SYNC_MODE_ON_SUCCESS),
+   MirrorSyncMode_str(sync));
+return;
+}
+has_bitmap_mode = true;
+sync = MIRROR_SYNC_MODE_BITMAP;
+bitmap_mode = BITMAP_SYNC_MODE_ON_SUCCESS;
+}
+
 if (has_bitmap) {
+if (sync != MIRROR_SYNC_MODE_BITMAP) {
+error_setg(errp, "Sync mode '%s' not supported with bitmap.",
+   MirrorSyncMode_str(sync));
+return;
+}
 if (granularity) {
 error_setg(errp, "Granularity and bitmap cannot both be set");
 return;
-- 
2.20.1

Re: [PATCH] qemu/atomic.h: prefix qemu_ to solve collisions

2020-09-22 Thread Stefan Hajnoczi

On Mon, Sep 21, 2020 at 04:29:10PM -0500, Eric Blake wrote:
> On 9/21/20 11:23 AM, Stefan Hajnoczi wrote:

Thanks for the review! Your feedback prompted me to do this more
systematically. I fixed the command-lines and published a diff of just
the manual changes I made on top of the mechanical changes (see v2).

> > clang's C11 atomic_fetch_*() functions only take a C11 atomic type
> > pointer argument. QEMU uses direct types (int, etc) and this causes a
> > compiler error when a QEMU code calls these functions in a source file
> > that also included  via a system header file:
> > 
> >$ CC=clang CXX=clang++ ./configure ... && make
> >../util/async.c:79:17: error: address argument to atomic operation must 
> > be a pointer to _Atomic type ('unsigned int *' invalid)
> > 
> > Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
> > used by . Prefix QEMU's APIs with qemu_ so that atomic.h
> > and  can co-exist.
> > 
> > This patch was generated using:
> > 
> >$ git diff | grep -o '\ > >/tmp/changed_identifiers
> 
> Missing a step in the recipe: namely, you probably modified
> include/qemu/atomic*.h prior to running 'git diff' (so that you actually had
> input to feed to grep -o).  But spelling it 'git diff HEAD^
> include/qemu/atomic*.h | ...' does indeed give me a sane list of identifiers
> that looks like what you touched in the rest of the patch.

Yes, I edited the file first and then used this command-line. The one
you posted it better :).

> 
> >$ for identifier in $( 
> Also not quite the right recipe, based on the file name used in the line
> above.

Yes, "64" is when I realized the original grep expression hadn't matched
the atomic64 APIs.

These commands only show the gist of it. It involved a few manual steps.

> 
> > sed -i "s%\<$identifier\>%qemu_$identifier%" $(git grep -l 
> > "\<$identifier\>") \
> >  done
> > 
> 
> Fortunately, running "git grep -c '\ state of the tree gives me a list that is somewhat close to yours, where the
> obvious difference in line counts is explained by:
> 
> > I manually fixed line-wrap issues and misaligned rST tables.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> 
> First, focusing on the change summary:
> 
> >   docs/devel/lockcnt.txt|  14 +-
> >   docs/devel/rcu.txt|  40 +--
> >   accel/tcg/atomic_template.h   |  20 +-
> >   include/block/aio-wait.h  |   4 +-
> >   include/block/aio.h   |   8 +-
> >   include/exec/cpu_ldst.h   |   2 +-
> >   include/exec/exec-all.h   |   6 +-
> >   include/exec/log.h|   6 +-
> >   include/exec/memory.h |   2 +-
> >   include/exec/ram_addr.h   |  27 +-
> >   include/exec/ramlist.h|   2 +-
> >   include/exec/tb-lookup.h  |   4 +-
> >   include/hw/core/cpu.h |   2 +-
> >   include/qemu/atomic.h | 258 +++---
> >   include/qemu/atomic128.h  |   6 +-
> 
> These two are the most important for the sake of this patch; perhaps it's
> worth a temporary override of your git orderfile if you have to respin, to
> list them first?

Will do in v2.

> 
> >   include/qemu/bitops.h |   2 +-
> >   include/qemu/coroutine.h  |   2 +-
> >   include/qemu/log.h|   6 +-
> >   include/qemu/queue.h  |   8 +-
> >   include/qemu/rcu.h|  10 +-
> >   include/qemu/rcu_queue.h  | 103 +++---
> 
> Presumably, this and any other file with an odd number of changes was due to
> a difference in lines after reformatting long lines.

Yes, line-wrapping required many changes in this file.

> 
> >   include/qemu/seqlock.h|   8 +-
> ...
> 
> >   util/stats64.c|  34 +-
> >   docs/devel/atomics.rst| 326 +-
> >   .../opensbi-riscv32-generic-fw_dynamic.elf| Bin 558668 -> 558698 bytes
> >   .../opensbi-riscv64-generic-fw_dynamic.elf| Bin 620424 -> 620454 bytes
> 
> Why are we regenerating .elf files in this patch?  Is your change even
> correct for those two files?

Thanks for noticing this! The git-grep(1) man page documents the -I
option for skipping binary files. I thought this was the default.

Will fix in v2.

> 
> >   scripts/kernel-doc|   2 +-
> >   tcg/aarch64/tcg-target.c.inc  |   2 +-
> >   tcg/mips/tcg-target.c.inc |   2 +-
> >   tcg/ppc/tcg-target.c.inc  |   6 +-
> >   tcg/sparc/tcg-target.c.inc|   5 +-
> >   135 files changed, 1195 insertions(+), 1130 deletions(-)
> 
> I don't spot accel/tcg/atomic_common.c.inc in the list (which decla

Re: [PATCH 2/6] block/nvme: Map doorbells pages write-only

2020-09-22 Thread Paolo Bonzini

On 22/09/20 10:41, Philippe Mathieu-Daudé wrote:
>> Besides looking more correct in access mode, is there any side effect
>> of WO mapping?
> TBH I don't have enough knowledge to answer this question.
> I tested successfully on X86. I'm writing more tests.

No problem with doing this, but PROT_WRITE does not work at all on x86.
:)  PROT_EXEC works if you have a machine with PKRU, but PROT_WRITE
silently becomes PROT_READ|PROT_WRITE because the processor does not
support it.

Paolo

[PATCH qemu 2/4] drive-mirror: add support for conditional and always bitmap sync modes

2020-09-22 Thread Fabian Grünbichler

From: John Snow 

Teach mirror two new tricks for using bitmaps:

Always: no matter what, we synchronize the copy_bitmap back to the
sync_bitmap. In effect, this allows us resume a failed mirror at a later
date, since the target bdrv should be exactly in the state referenced by
the bitmap.

Conditional: On success only, we sync the bitmap. This is akin to
incremental backup modes; we can use this bitmap to later refresh a
successfully created mirror, or possibly re-try the whole failed mirror
if we are able to rollback the target to the state before starting the
mirror.

Based on original work by John Snow.

Signed-off-by: Fabian Grünbichler 
---
 block/mirror.c | 28 
 blockdev.c |  3 +++
 2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index d64c8203ef..bc4f4563d9 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -654,8 +654,6 @@ static int mirror_exit_common(Job *job)
 bdrv_unfreeze_backing_chain(mirror_top_bs, target_bs);
 }
 
-bdrv_release_dirty_bitmap(s->dirty_bitmap);
-
 /* Make sure that the source BDS doesn't go away during bdrv_replace_node,
  * before we can call bdrv_drained_end */
 bdrv_ref(src);
@@ -755,6 +753,18 @@ static int mirror_exit_common(Job *job)
 blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
 blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort);
 
+if (s->sync_bitmap) {
+if (s->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS ||
+(s->bitmap_mode == BITMAP_SYNC_MODE_ON_SUCCESS &&
+ job->ret == 0 && ret == 0)) {
+/* Success; synchronize copy back to sync. */
+bdrv_clear_dirty_bitmap(s->sync_bitmap, NULL);
+bdrv_dirty_bitmap_merge_internal(s->sync_bitmap, s->dirty_bitmap,
+ NULL, true);
+}
+}
+bdrv_release_dirty_bitmap(s->dirty_bitmap);
+
 bs_opaque->job = NULL;
 
 bdrv_drained_end(src);
@@ -1592,10 +1602,6 @@ static BlockJob *mirror_start_job(
" sync mode",
MirrorSyncMode_str(sync_mode));
 return NULL;
-} else if (bitmap_mode != BITMAP_SYNC_MODE_NEVER) {
-error_setg(errp,
-   "Bitmap Sync Mode '%s' is not supported by Mirror",
-   BitmapSyncMode_str(bitmap_mode));
 }
 } else if (bitmap) {
 error_setg(errp,
@@ -1612,6 +1618,12 @@ static BlockJob *mirror_start_job(
 return NULL;
 }
 granularity = bdrv_dirty_bitmap_granularity(bitmap);
+
+if (bitmap_mode != BITMAP_SYNC_MODE_NEVER) {
+if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_DEFAULT, errp)) {
+return NULL;
+}
+}
 } else if (granularity == 0) {
 granularity = bdrv_get_default_bitmap_granularity(target);
 }
@@ -1781,8 +1793,8 @@ static BlockJob *mirror_start_job(
 }
 
 if (s->sync_mode == MIRROR_SYNC_MODE_BITMAP) {
-bdrv_merge_dirty_bitmap(s->dirty_bitmap, s->sync_bitmap,
-NULL, &local_err);
+bdrv_dirty_bitmap_merge_internal(s->dirty_bitmap, s->sync_bitmap,
+ NULL, true);
 if (local_err) {
 goto fail;
 }
diff --git a/blockdev.c b/blockdev.c
index 6baa1a33f5..0fd30a392d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3019,6 +3019,9 @@ static void blockdev_mirror_common(const char *job_id, 
BlockDriverState *bs,
 if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_ALLOW_RO, errp)) {
 return;
 }
+} else if (has_bitmap_mode) {
+error_setg(errp, "Cannot specify bitmap sync mode without a bitmap");
+return;
 }
 
 if (!has_replaces) {
-- 
2.20.1

[PATCH qemu 1/4] drive-mirror: add support for sync=bitmap mode=never

2020-09-22 Thread Fabian Grünbichler

From: John Snow 

This patch adds support for the "BITMAP" sync mode to drive-mirror and
blockdev-mirror. It adds support only for the BitmapSyncMode "never,"
because it's the simplest mode.

This mode simply uses a user-provided bitmap as an initial copy
manifest, and then does not clear any bits in the bitmap at the
conclusion of the operation.

Any new writes dirtied during the operation are copied out, in contrast
to backup. Note that whether these writes are reflected in the bitmap
at the conclusion of the operation depends on whether that bitmap is
actually recording!

This patch was originally based on one by Ma Haocong, but it has since
been modified pretty heavily.

Suggested-by: Ma Haocong 
Signed-off-by: Ma Haocong 
Signed-off-by: John Snow 
Signed-off-by: Fabian Grünbichler 
---
 include/block/block_int.h   |  4 +-
 block/mirror.c  | 98 ++---
 blockdev.c  | 39 +--
 tests/test-block-iothread.c |  4 +-
 qapi/block-core.json| 29 +--
 5 files changed, 145 insertions(+), 29 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 38cad9d15c..8b7cec67ad 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1230,7 +1230,9 @@ void mirror_start(const char *job_id, BlockDriverState 
*bs,
   BlockDriverState *target, const char *replaces,
   int creation_flags, int64_t speed,
   uint32_t granularity, int64_t buf_size,
-  MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
+  MirrorSyncMode mode, BdrvDirtyBitmap *bitmap,
+  BitmapSyncMode bitmap_mode,
+  BlockMirrorBackingMode backing_mode,
   bool zero_target,
   BlockdevOnError on_source_error,
   BlockdevOnError on_target_error,
diff --git a/block/mirror.c b/block/mirror.c
index 26acf4af6f..d64c8203ef 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -50,7 +50,7 @@ typedef struct MirrorBlockJob {
 BlockDriverState *to_replace;
 /* Used to block operations on the drive-mirror-replace target */
 Error *replace_blocker;
-bool is_none_mode;
+MirrorSyncMode sync_mode;
 BlockMirrorBackingMode backing_mode;
 /* Whether the target image requires explicit zero-initialization */
 bool zero_target;
@@ -65,6 +65,8 @@ typedef struct MirrorBlockJob {
 size_t buf_size;
 int64_t bdev_length;
 unsigned long *cow_bitmap;
+BdrvDirtyBitmap *sync_bitmap;
+BitmapSyncMode bitmap_mode;
 BdrvDirtyBitmap *dirty_bitmap;
 BdrvDirtyBitmapIter *dbi;
 uint8_t *buf;
@@ -677,7 +679,8 @@ static int mirror_exit_common(Job *job)
 bdrv_child_refresh_perms(mirror_top_bs, mirror_top_bs->backing,
  &error_abort);
 if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
-BlockDriverState *backing = s->is_none_mode ? src : s->base;
+BlockDriverState *backing;
+backing = s->sync_mode == MIRROR_SYNC_MODE_NONE ? src : s->base;
 BlockDriverState *unfiltered_target = bdrv_skip_filters(target_bs);
 
 if (bdrv_cow_bs(unfiltered_target) != backing) {
@@ -774,6 +777,16 @@ static void mirror_abort(Job *job)
 assert(ret == 0);
 }
 
+/* Always called after commit/abort. */
+static void mirror_clean(Job *job)
+{
+MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
+
+if (s->sync_bitmap) {
+bdrv_dirty_bitmap_set_busy(s->sync_bitmap, false);
+}
+}
+
 static void coroutine_fn mirror_throttle(MirrorBlockJob *s)
 {
 int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -955,7 +968,8 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
 mirror_free_init(s);
 
 s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
-if (!s->is_none_mode) {
+if ((s->sync_mode == MIRROR_SYNC_MODE_TOP) ||
+(s->sync_mode == MIRROR_SYNC_MODE_FULL)) {
 ret = mirror_dirty_init(s);
 if (ret < 0 || job_is_cancelled(&s->common.job)) {
 goto immediate_exit;
@@ -1188,6 +1202,7 @@ static const BlockJobDriver mirror_job_driver = {
 .run= mirror_run,
 .prepare= mirror_prepare,
 .abort  = mirror_abort,
+.clean  = mirror_clean,
 .pause  = mirror_pause,
 .complete   = mirror_complete,
 },
@@ -1203,6 +1218,7 @@ static const BlockJobDriver commit_active_job_driver = {
 .run= mirror_run,
 .prepare= mirror_prepare,
 .abort  = mirror_abort,
+.clean  = mirror_clean,
 .pause  = mirror_pause,
 .complete   = mirror_complete,
 },
@@ -1550,7 +1566,10 @@ static BlockJob *mirror_start_job(

[PATCH qemu 0/4] mirror: implement incremental and bitmap modes

2020-09-22 Thread Fabian Grünbichler

based on John's in-progress work from last year, this series introduces
incremental drive-/block-dev mirror support using bitmaps with three bitmap
modes.

changes since RFC:
- rebased on current master
- squashed patches 2-4
- re-ordered patches 5/6, and moved all test code to patch 6

NOTE: patch #2 is still requiring a S-O-B by John before applying!

Fabian Grünbichler (2):
  mirror: move some checks to qmp
  iotests: add test for bitmap mirror

John Snow (2):
  drive-mirror: add support for sync=bitmap mode=never
  drive-mirror: add support for conditional and always bitmap sync modes

 include/block/block_int.h   |4 +-
 block/mirror.c  |   96 +-
 blockdev.c  |   71 +-
 tests/test-block-iothread.c |4 +-
 qapi/block-core.json|   29 +-
 tests/qemu-iotests/306  |  546 +++
 tests/qemu-iotests/306.out  | 2846 +++
 tests/qemu-iotests/group|1 +
 8 files changed, 3566 insertions(+), 31 deletions(-)
 create mode 100755 tests/qemu-iotests/306
 create mode 100644 tests/qemu-iotests/306.out

-- 
2.20.1

[PATCH v3 16/17] pci: allocate pci id for nvme

2020-09-22 Thread Klaus Jensen

From: Klaus Jensen 

The emulated nvme device (hw/block/nvme.c) is currently using an
internal Intel device id.

Prepare to change that by allocating a device id under the 1b36 (Red
Hat, Inc.) vendor id.

Signed-off-by: Klaus Jensen 
Acked-by: Gerd Hoffmann 
Reviewed-by: Maxim Levitsky 
Reviewed-by: Keith Busch 
---
 docs/specs/nvme.txt| 23 +++
 docs/specs/pci-ids.txt |  1 +
 include/hw/pci/pci.h   |  1 +
 MAINTAINERS|  1 +
 4 files changed, 26 insertions(+)
 create mode 100644 docs/specs/nvme.txt

diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt
new file mode 100644
index ..56d393884e7a
--- /dev/null
+++ b/docs/specs/nvme.txt
@@ -0,0 +1,23 @@
+NVM Express Controller
+==
+
+The nvme device (-device nvme) emulates an NVM Express Controller.
+
+
+Reference Specifications
+
+
+The device currently implements most mandatory features of NVMe v1.3d, see
+
+  https://nvmexpress.org/resources/specifications/
+
+for the specification.
+
+
+Known issues
+
+
+* The accounting numbers in the SMART/Health are reset across power cycles
+
+* Interrupt Coalescing is not supported and is disabled by default in volation
+  of the specification.
diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
index 4d53e5c7d9d5..abbdbca6be38 100644
--- a/docs/specs/pci-ids.txt
+++ b/docs/specs/pci-ids.txt
@@ -63,6 +63,7 @@ PCI devices (other than virtio):
 1b36:000b  PCIe Expander Bridge (-device pxb-pcie)
 1b36:000d  PCI xhci usb host adapter
 1b36:000f  mdpy (mdev sample device), linux/samples/vfio-mdev/mdpy.c
+1b36:0010  PCIe NVMe device (-device nvme)
 
 All these devices are documented in docs/specs.
 
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 0ff3feec1573..fb3ea21ee1d6 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -106,6 +106,7 @@ extern bool pci_available;
 #define PCI_DEVICE_ID_REDHAT_XHCI0x000d
 #define PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE 0x000e
 #define PCI_DEVICE_ID_REDHAT_MDPY0x000f
+#define PCI_DEVICE_ID_REDHAT_NVME0x0010
 #define PCI_DEVICE_ID_REDHAT_QXL 0x0100
 
 #define FMT_PCIBUS  PRIx64
diff --git a/MAINTAINERS b/MAINTAINERS
index 3d17cad19aa0..3f3babeca663 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1880,6 +1880,7 @@ L: qemu-block@nongnu.org
 S: Supported
 F: hw/block/nvme*
 F: tests/qtest/nvme-test.c
+F: docs/specs/nvme.txt
 T: git git://git.infradead.org/qemu-nvme.git nvme-next
 
 megasas
-- 
2.28.0

Re: [PATCH v6 2/2] nvme: allow cmb and pmr to be enabled on same device

2020-09-22 Thread Klaus Jensen

On Aug 17 08:29, Klaus Jensen wrote:
> On Jul 30 00:50, Klaus Jensen wrote:
> > On Jul 29 15:01, Andrzej Jakowski wrote:
> > > So far it was not possible to have CMB and PMR emulated on the same
> > > device, because BAR2 was used exclusively either of PMR or CMB. This
> > > patch places CMB at BAR4 offset so it not conflicts with MSI-X vectors.
> > > 
> > > Signed-off-by: Andrzej Jakowski 
> > > ---
> > 
> > Well, I'm certainly happy now. LGTM!
> > 
> > Reviewed-by: Klaus Jensen 
> > 
> 
> Are anyone willing to chip in with another review on this?
> 

I think this patch is ready (and have been for some time) for inclusion,
but would really like an additional review on this; preferably from
Keith, since he is the one that originally mentioned that we could do
something like this.

I've mentioned it before, but I would prefer that the MSI-X stuff was in
BAR0 instead of mixing it with the CMB, but that's bikeshedding and my
R-b still holds of course.

signature.asc
Description: PGP signature

[PATCH 0/1] block: future of sheepdog storage driver ?

2020-09-22 Thread Daniel P . Berrangé

2 years back I proposed dropping the sheepdog mailing list from the
MAINTAINERS file, but somehow the patch never got picked up:

  https://lists.gnu.org/archive/html/qemu-block/2018-03/msg01048.html

So here I am with the same patch again.

I further looked at the sheepdog project though, and I'm wondering
if we actually want to keep the sheepdog storage driver at all.

This thread from a little over a year ago:

  http://lists.wpkg.org/pipermail/sheepdog/2019-March/thread.html

clearly states that sheepdog is no longer actively developed. The only
mentioned users are some companies who are said to have it for legacy
reasons with plans to replace it by Ceph. There is talk about cutting
out existing features to turn it into a simple demo of how to write a
distributed block service. There is no evidence of anyone working on
that idea.

  https://github.com/sheepdog/sheepdog/commits/master

No real commits to git since Jan 2018 and that's just dealing with
technical debt.

There is essentially no activity on the mailing list aside from
patches to QEMU that get CC'd due to our MAINTAINERS entry, if and
when someone processes the moderator queue.

Fedora packages for sheepdog failed to build from upstream source
because of the more strict linker that no longer merges duplicate
global symbols. So we patch it to add the missing "extern" annotations.
Upstream source remains broken for everyone else.

AFAIK, none of our containers or VMs include the sheepdog server
packages, so we have no testing coverage for it in CI that I see.

Does someone have a compelling reason for QEMU to keep supporting
this driver, other than the fact that it already exists ?

If not, it looks like a candidate for deprecating in QEMU with a
view to later removing it.

Daniel P. Berrangé (1):
  block: drop moderated sheepdog mailing list from MAINTAINERS file

 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

-- 
2.26.2

[PATCH v3 14/17] hw/block/nvme: refactor identify active namespace id list

2020-09-22 Thread Klaus Jensen

From: Klaus Jensen 

Prepare to support inactive namespaces.

Signed-off-by: Klaus Jensen 
Reviewed-by: Maxim Levitsky 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 1fdd5c9c5ab0..f61076178a6d 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1473,7 +1473,7 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
NvmeRequest *req)
 uint32_t min_nsid = le32_to_cpu(c->nsid);
 uint32_t *list;
 uint16_t ret;
-int i, j = 0;
+int j = 0;
 
 trace_pci_nvme_identify_nslist(min_nsid);
 
@@ -1488,11 +1488,11 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
NvmeRequest *req)
 }
 
 list = g_malloc0(data_len);
-for (i = 0; i < n->num_namespaces; i++) {
-if (i < min_nsid) {
+for (int i = 1; i <= n->num_namespaces; i++) {
+if (i <= min_nsid) {
 continue;
 }
-list[j++] = cpu_to_le32(i + 1);
+list[j++] = cpu_to_le32(i);
 if (j == data_len / sizeof(uint32_t)) {
 break;
 }
-- 
2.28.0

[PATCH 1/1] block: drop moderated sheepdog mailing list from MAINTAINERS file

2020-09-22 Thread Daniel P . Berrangé

The sheepdog mailing list is setup to stop and queue messages from
non-subscribers, pending moderator approval. Unfortunately it seems
that the moderation queue is not actively dealt with. Even when messages
are approved, the sender is never added to the whitelist, so every
future mail from the same sender continues to get stopped for moderation.

MAINTAINERS entries should be responsive and not uneccessarily block
mails from QEMU contributors, so drop the sheepdog mailing list.

Signed-off-by: Daniel P. Berrangé 
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3d17cad19a..8e8a4fb0a8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2852,7 +2852,6 @@ F: block/rbd.c
 Sheepdog
 M: Liu Yuan 
 L: qemu-block@nongnu.org
-L: sheep...@lists.wpkg.org
 S: Odd Fixes
 F: block/sheepdog.c
 
-- 
2.26.2

[PATCH v3 17/17] hw/block/nvme: change controller pci id

2020-09-22 Thread Klaus Jensen

From: Klaus Jensen 

There are two reasons for changing this:

  1. The nvme device currently uses an internal Intel device id.

  2. Since commits "nvme: fix write zeroes offset and count" and "nvme:
 support multiple namespaces" the controller device no longer has
 the quirks that the Linux kernel think it has.

 As the quirks are applied based on pci vendor and device id, change
 them to get rid of the quirks.

To keep backward compatibility, add a new 'use-intel-id' parameter to
the nvme device to force use of the Intel vendor and device id. This is
off by default but add a compat property to set this for 5.1 machines
and older. If a 5.1 machine is booted (or the use-intel-id parameter is
explicitly set to true), the Linux kernel will just apply these
unnecessary quirks:

  1. NVME_QUIRK_IDENTIFY_CNS which says that the device does not support
 anything else than values 0x0 and 0x1 for CNS (Identify Namespace
 and Identify Namespace). With multiple namespace support, this just
 means that the kernel will "scan" namespaces instead of using
 "Active Namespace ID list" (CNS 0x2).

  2. NVME_QUIRK_DISABLE_WRITE_ZEROES. The nvme device started out with a
 broken Write Zeroes implementation which has since been fixed in
 commit 9d6459d21a6e ("nvme: fix write zeroes offset and count").

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
Reviewed-by: Maxim Levitsky 
---
 hw/block/nvme.h   |  1 +
 hw/block/nvme.c   | 12 ++--
 hw/core/machine.c |  1 +
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index d96ec15cdffb..e080a2318a50 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -15,6 +15,7 @@ typedef struct NvmeParams {
 uint8_t  aerl;
 uint32_t aer_max_queued;
 uint8_t  mdts;
+bool use_intel_id;
 } NvmeParams;
 
 typedef struct NvmeAsyncEvent {
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d6749fdd96e1..da8344f196a8 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2683,6 +2683,15 @@ static void nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev, Error **errp)
 
 pci_conf[PCI_INTERRUPT_PIN] = 1;
 pci_config_set_prog_interface(pci_conf, 0x2);
+
+if (n->params.use_intel_id) {
+pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_INTEL);
+pci_config_set_device_id(pci_conf, 0x5845);
+} else {
+pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_REDHAT);
+pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_REDHAT_NVME);
+}
+
 pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_EXPRESS);
 pcie_endpoint_cap_init(pci_dev, 0x80);
 
@@ -2836,6 +2845,7 @@ static Property nvme_props[] = {
 DEFINE_PROP_UINT8("aerl", NvmeCtrl, params.aerl, 3),
 DEFINE_PROP_UINT32("aer_max_queued", NvmeCtrl, params.aer_max_queued, 64),
 DEFINE_PROP_UINT8("mdts", NvmeCtrl, params.mdts, 7),
+DEFINE_PROP_BOOL("use-intel-id", NvmeCtrl, params.use_intel_id, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -2852,8 +2862,6 @@ static void nvme_class_init(ObjectClass *oc, void *data)
 pc->realize = nvme_realize;
 pc->exit = nvme_exit;
 pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
-pc->vendor_id = PCI_VENDOR_ID_INTEL;
-pc->device_id = 0x5845;
 pc->revision = 2;
 
 set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
diff --git a/hw/core/machine.c b/hw/core/machine.c
index ea26d612374d..944120cf19c8 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -34,6 +34,7 @@ GlobalProperty hw_compat_5_1[] = {
 { "vhost-user-scsi", "num_queues", "1"},
 { "virtio-blk-device", "num-queues", "1"},
 { "virtio-scsi-device", "num_queues", "1"},
+{ "nvme", "use-intel-id", "on"},
 };
 const size_t hw_compat_5_1_len = G_N_ELEMENTS(hw_compat_5_1);
 
-- 
2.28.0

[PATCH v3 13/17] hw/block/nvme: add support for sgl bit bucket descriptor

2020-09-22 Thread Klaus Jensen

From: Gollu Appalanaidu 

This adds support for SGL descriptor type 0x1 (bit bucket descriptor).
See the NVM Express v1.3d specification, Section 4.4 ("Scatter Gather
List (SGL)").

Signed-off-by: Gollu Appalanaidu 
Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index c5d09ff1edf5..1fdd5c9c5ab0 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -430,6 +430,10 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList 
*qsg,
 uint8_t type = NVME_SGL_TYPE(segment[i].type);
 
 switch (type) {
+case NVME_SGL_DESCR_TYPE_BIT_BUCKET:
+if (req->cmd.opcode == NVME_CMD_WRITE) {
+continue;
+}
 case NVME_SGL_DESCR_TYPE_DATA_BLOCK:
 break;
 case NVME_SGL_DESCR_TYPE_SEGMENT:
@@ -440,6 +444,7 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList 
*qsg,
 }
 
 dlen = le32_to_cpu(segment[i].len);
+
 if (!dlen) {
 continue;
 }
@@ -460,6 +465,11 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList 
*qsg,
 }
 
 trans_len = MIN(*len, dlen);
+
+if (type == NVME_SGL_DESCR_TYPE_BIT_BUCKET) {
+goto next;
+}
+
 addr = le64_to_cpu(segment[i].addr);
 
 if (UINT64_MAX - addr < dlen) {
@@ -471,6 +481,7 @@ static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList 
*qsg,
 return status;
 }
 
+next:
 *len -= trans_len;
 }
 
@@ -540,7 +551,8 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg, 
QEMUIOVector *iov,
 seg_len = le32_to_cpu(sgld->len);
 
 /* check the length of the (Last) Segment descriptor */
-if (!seg_len || seg_len & 0xf) {
+if ((!seg_len || seg_len & 0xf) &&
+(NVME_SGL_TYPE(sgld->type) != NVME_SGL_DESCR_TYPE_BIT_BUCKET)) {
 return NVME_INVALID_SGL_SEG_DESCR | NVME_DNR;
 }
 
@@ -577,19 +589,27 @@ static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList 
*qsg, QEMUIOVector *iov,
 
 last_sgld = &segment[nsgld - 1];
 
-/* if the segment ends with a Data Block, then we are done */
-if (NVME_SGL_TYPE(last_sgld->type) == NVME_SGL_DESCR_TYPE_DATA_BLOCK) {
+/*
+ * If the segment ends with a Data Block or Bit Bucket Descriptor Type,
+ * then we are done.
+ */
+switch (NVME_SGL_TYPE(last_sgld->type)) {
+case NVME_SGL_DESCR_TYPE_DATA_BLOCK:
+case NVME_SGL_DESCR_TYPE_BIT_BUCKET:
 status = nvme_map_sgl_data(n, qsg, iov, segment, nsgld, &len, req);
 if (status) {
 goto unmap;
 }
 
 goto out;
+
+default:
+break;
 }
 
 /*
- * If the last descriptor was not a Data Block, then the current
- * segment must not be a Last Segment.
+ * If the last descriptor was not a Data Block or Bit Bucket, then the
+ * current segment must not be a Last Segment.
  */
 if (NVME_SGL_TYPE(sgld->type) == NVME_SGL_DESCR_TYPE_LAST_SEGMENT) {
 status = NVME_INVALID_SGL_SEG_DESCR | NVME_DNR;
@@ -2656,7 +2676,8 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->nn = cpu_to_le32(n->num_namespaces);
 id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROES | NVME_ONCS_TIMESTAMP |
NVME_ONCS_FEATURES);
-id->sgls = cpu_to_le32(NVME_CTRL_SGLS_SUPPORT_NO_ALIGN);
+id->sgls = cpu_to_le32(NVME_CTRL_SGLS_SUPPORT_NO_ALIGN |
+   NVME_CTRL_SGLS_BITBUCKET);
 
 subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", n->params.serial);
 strpadcpy((char *)id->subnqn, sizeof(id->subnqn), subnqn, '\0');
-- 
2.28.0

[PATCH v3 15/17] hw/block/nvme: support multiple namespaces

2020-09-22 Thread Klaus Jensen

From: Klaus Jensen 

This adds support for multiple namespaces by introducing a new 'nvme-ns'
device model. The nvme device creates a bus named from the device name
('id'). The nvme-ns devices then connect to this and registers
themselves with the nvme device.

This changes how an nvme device is created. Example with two namespaces:

  -drive file=nvme0n1.img,if=none,id=disk1
  -drive file=nvme0n2.img,if=none,id=disk2
  -device nvme,serial=deadbeef,id=nvme0
  -device nvme-ns,drive=disk1,bus=nvme0,nsid=1
  -device nvme-ns,drive=disk2,bus=nvme0,nsid=2

The drive property is kept on the nvme device to keep the change
backward compatible, but the property is now optional. Specifying a
drive for the nvme device will always create the namespace with nsid 1.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
Reviewed-by: Minwoo Im 
---
 hw/block/nvme-ns.h|  74 +
 hw/block/nvme.h   |  46 
 hw/block/nvme-ns.c| 167 
 hw/block/nvme.c   | 249 +++---
 hw/block/meson.build  |   2 +-
 hw/block/trace-events |   6 +-
 6 files changed, 428 insertions(+), 116 deletions(-)
 create mode 100644 hw/block/nvme-ns.h
 create mode 100644 hw/block/nvme-ns.c

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
new file mode 100644
index ..83734f4606e1
--- /dev/null
+++ b/hw/block/nvme-ns.h
@@ -0,0 +1,74 @@
+/*
+ * QEMU NVM Express Virtual Namespace
+ *
+ * Copyright (c) 2019 CNEX Labs
+ * Copyright (c) 2020 Samsung Electronics
+ *
+ * Authors:
+ *  Klaus Jensen  
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef NVME_NS_H
+#define NVME_NS_H
+
+#define TYPE_NVME_NS "nvme-ns"
+#define NVME_NS(obj) \
+OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS)
+
+typedef struct NvmeNamespaceParams {
+uint32_t nsid;
+} NvmeNamespaceParams;
+
+typedef struct NvmeNamespace {
+DeviceState  parent_obj;
+BlockConfblkconf;
+int32_t  bootindex;
+int64_t  size;
+NvmeIdNs id_ns;
+
+NvmeNamespaceParams params;
+} NvmeNamespace;
+
+static inline uint32_t nvme_nsid(NvmeNamespace *ns)
+{
+if (ns) {
+return ns->params.nsid;
+}
+
+return -1;
+}
+
+static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
+{
+NvmeIdNs *id_ns = &ns->id_ns;
+return &id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
+}
+
+static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns)
+{
+return nvme_ns_lbaf(ns)->ds;
+}
+
+/* calculate the number of LBAs that the namespace can accomodate */
+static inline uint64_t nvme_ns_nlbas(NvmeNamespace *ns)
+{
+return ns->size >> nvme_ns_lbads(ns);
+}
+
+/* convert an LBA to the equivalent in bytes */
+static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t lba)
+{
+return lba << nvme_ns_lbads(ns);
+}
+
+typedef struct NvmeCtrl NvmeCtrl;
+
+int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
+void nvme_ns_drain(NvmeNamespace *ns);
+void nvme_ns_flush(NvmeNamespace *ns);
+
+#endif /* NVME_NS_H */
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index f355eccb323b..d96ec15cdffb 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -2,6 +2,9 @@
 #define HW_NVME_H
 
 #include "block/nvme.h"
+#include "nvme-ns.h"
+
+#define NVME_MAX_NAMESPACES 256
 
 typedef struct NvmeParams {
 char *serial;
@@ -90,26 +93,12 @@ typedef struct NvmeCQueue {
 QTAILQ_HEAD(, NvmeRequest) req_list;
 } NvmeCQueue;
 
-typedef struct NvmeNamespace {
-NvmeIdNsid_ns;
-} NvmeNamespace;
+#define TYPE_NVME_BUS "nvme-bus"
+#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS)
 
-static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
-{
-NvmeIdNs *id_ns = &ns->id_ns;
-return &id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
-}
-
-static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns)
-{
-return nvme_ns_lbaf(ns)->ds;
-}
-
-/* convert an LBA to the equivalent in bytes */
-static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t lba)
-{
-return lba << nvme_ns_lbads(ns);
-}
+typedef struct NvmeBus {
+BusState parent_bus;
+} NvmeBus;
 
 #define TYPE_NVME "nvme"
 #define NVME(obj) \
@@ -121,6 +110,7 @@ typedef struct NvmeFeatureVal {
 uint16_t temp_thresh_low;
 };
 uint32_tasync_config;
+uint32_tvwc;
 } NvmeFeatureVal;
 
 typedef struct NvmeCtrl {
@@ -128,8 +118,9 @@ typedef struct NvmeCtrl {
 MemoryRegion iomem;
 MemoryRegion ctrl_mem;
 NvmeBar  bar;
-BlockConfconf;
 NvmeParams   params;
+NvmeBus  bus;
+BlockConfconf;
 
 boolqs_created;
 uint32_tpage_size;
@@ -140,7 +131,6 @@ typedef struct NvmeCtrl {
 uint32_treg_size;
 uint32_tnum_namespaces;
 uint32_tmax_q_ents;
-uint64_tns_size;
 uint8_t outstanding_aers;
 uint8_t *cmbuf;
 uint32_tirq_status;
@@ -156,7 +146,8 @@ typedef struc

[PATCH v3 12/17] hw/block/nvme: add support for scatter gather lists

2020-09-22 Thread Klaus Jensen

From: Klaus Jensen 

For now, support the Data Block, Segment and Last Segment descriptor
types.

See NVM Express 1.3d, Section 4.4 ("Scatter Gather List (SGL)").

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 include/block/nvme.h  |   6 +-
 hw/block/nvme.c   | 329 ++
 hw/block/trace-events |   4 +
 3 files changed, 279 insertions(+), 60 deletions(-)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 65e68a82c897..58647bcdad0b 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -412,9 +412,9 @@ typedef union NvmeCmdDptr {
 } NvmeCmdDptr;
 
 enum NvmePsdt {
-PSDT_PRP = 0x0,
-PSDT_SGL_MPTR_CONTIGUOUS = 0x1,
-PSDT_SGL_MPTR_SGL= 0x2,
+NVME_PSDT_PRP = 0x0,
+NVME_PSDT_SGL_MPTR_CONTIGUOUS = 0x1,
+NVME_PSDT_SGL_MPTR_SGL= 0x2,
 };
 
 typedef struct QEMU_PACKED NvmeCmd {
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 3b901efd1ec0..c5d09ff1edf5 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -413,13 +413,262 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 return NVME_SUCCESS;
 }
 
-static uint16_t nvme_dma_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
- uint64_t prp1, uint64_t prp2, DMADirection dir,
+/*
+ * Map 'nsgld' data descriptors from 'segment'. The function will subtract the
+ * number of bytes mapped in len.
+ */
+static uint16_t nvme_map_sgl_data(NvmeCtrl *n, QEMUSGList *qsg,
+  QEMUIOVector *iov,
+  NvmeSglDescriptor *segment, uint64_t nsgld,
+  size_t *len, NvmeRequest *req)
+{
+dma_addr_t addr, trans_len;
+uint32_t dlen;
+uint16_t status;
+
+for (int i = 0; i < nsgld; i++) {
+uint8_t type = NVME_SGL_TYPE(segment[i].type);
+
+switch (type) {
+case NVME_SGL_DESCR_TYPE_DATA_BLOCK:
+break;
+case NVME_SGL_DESCR_TYPE_SEGMENT:
+case NVME_SGL_DESCR_TYPE_LAST_SEGMENT:
+return NVME_INVALID_NUM_SGL_DESCRS | NVME_DNR;
+default:
+return NVME_SGL_DESCR_TYPE_INVALID | NVME_DNR;
+}
+
+dlen = le32_to_cpu(segment[i].len);
+if (!dlen) {
+continue;
+}
+
+if (*len == 0) {
+/*
+ * All data has been mapped, but the SGL contains additional
+ * segments and/or descriptors. The controller might accept
+ * ignoring the rest of the SGL.
+ */
+uint16_t sgls = le16_to_cpu(n->id_ctrl.sgls);
+if (sgls & NVME_CTRL_SGLS_EXCESS_LENGTH) {
+break;
+}
+
+trace_pci_nvme_err_invalid_sgl_excess_length(nvme_cid(req));
+return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
+}
+
+trans_len = MIN(*len, dlen);
+addr = le64_to_cpu(segment[i].addr);
+
+if (UINT64_MAX - addr < dlen) {
+return NVME_DATA_SGL_LEN_INVALID | NVME_DNR;
+}
+
+status = nvme_map_addr(n, qsg, iov, addr, trans_len);
+if (status) {
+return status;
+}
+
+*len -= trans_len;
+}
+
+return NVME_SUCCESS;
+}
+
+static uint16_t nvme_map_sgl(NvmeCtrl *n, QEMUSGList *qsg, QEMUIOVector *iov,
+ NvmeSglDescriptor sgl, size_t len,
  NvmeRequest *req)
+{
+/*
+ * Read the segment in chunks of 256 descriptors (one 4k page) to avoid
+ * dynamically allocating a potentially huge SGL. The spec allows the SGL
+ * to be larger (as in number of bytes required to describe the SGL
+ * descriptors and segment chain) than the command transfer size, so it is
+ * not bounded by MDTS.
+ */
+const int SEG_CHUNK_SIZE = 256;
+
+NvmeSglDescriptor segment[SEG_CHUNK_SIZE], *sgld, *last_sgld;
+uint64_t nsgld;
+uint32_t seg_len;
+uint16_t status;
+bool sgl_in_cmb = false;
+hwaddr addr;
+int ret;
+
+sgld = &sgl;
+addr = le64_to_cpu(sgl.addr);
+
+trace_pci_nvme_map_sgl(nvme_cid(req), NVME_SGL_TYPE(sgl.type), len);
+
+/*
+ * If the entire transfer can be described with a single data block it can
+ * be mapped directly.
+ */
+if (NVME_SGL_TYPE(sgl.type) == NVME_SGL_DESCR_TYPE_DATA_BLOCK) {
+status = nvme_map_sgl_data(n, qsg, iov, sgld, 1, &len, req);
+if (status) {
+goto unmap;
+}
+
+goto out;
+}
+
+/*
+ * If the segment is located in the CMB, the submission queue of the
+ * request must also reside there.
+ */
+if (nvme_addr_is_cmb(n, addr)) {
+if (!nvme_addr_is_cmb(n, req->sq->dma_addr)) {
+return NVME_INVALID_USE_OF_CMB | NVME_DNR;
+}
+
+sgl_in_cmb = true;
+}
+
+for (;;) {
+switch (NVME_SGL_TYPE(sgld->type)) {
+case NVME_SGL_DESCR_TYPE_SEGMENT:
+ca

[PATCH v3 11/17] hw/block/nvme: harden cmb access

2020-09-22 Thread Klaus Jensen

From: Klaus Jensen 

Since the controller has only supported PRPs so far it has not been
required to check the ending address (addr + len - 1) of the CMB access
for validity since it has been guaranteed to be in range of the CMB.

This changes when the controller adds support for SGLs (next patch), so
add that check.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 7c9ea792483c..3b901efd1ec0 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -142,7 +142,12 @@ static inline void *nvme_addr_to_cmb(NvmeCtrl *n, hwaddr 
addr)
 
 static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
 {
-if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr)) {
+hwaddr hi = addr + size - 1;
+if (hi < addr) {
+return 1;
+}
+
+if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr) && nvme_addr_is_cmb(n, hi)) {
 memcpy(buf, nvme_addr_to_cmb(n, addr), size);
 return 0;
 }
-- 
2.28.0

[PATCH v3 08/17] hw/block/nvme: add symbolic command name to trace events

2020-09-22 Thread Klaus Jensen

From: Klaus Jensen 

Add the symbolic command name to the pci_nvme_{io,admin}_cmd and
pci_nvme_rw trace events.

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.h   | 28 
 hw/block/nvme.c   |  8 +---
 hw/block/trace-events |  6 +++---
 3 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 1675c1e0755c..ce9e931420d7 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -32,6 +32,34 @@ typedef struct NvmeRequest {
 QTAILQ_ENTRY(NvmeRequest)entry;
 } NvmeRequest;
 
+static inline const char *nvme_adm_opc_str(uint8_t opc)
+{
+switch (opc) {
+case NVME_ADM_CMD_DELETE_SQ:return "NVME_ADM_CMD_DELETE_SQ";
+case NVME_ADM_CMD_CREATE_SQ:return "NVME_ADM_CMD_CREATE_SQ";
+case NVME_ADM_CMD_GET_LOG_PAGE: return "NVME_ADM_CMD_GET_LOG_PAGE";
+case NVME_ADM_CMD_DELETE_CQ:return "NVME_ADM_CMD_DELETE_CQ";
+case NVME_ADM_CMD_CREATE_CQ:return "NVME_ADM_CMD_CREATE_CQ";
+case NVME_ADM_CMD_IDENTIFY: return "NVME_ADM_CMD_IDENTIFY";
+case NVME_ADM_CMD_ABORT:return "NVME_ADM_CMD_ABORT";
+case NVME_ADM_CMD_SET_FEATURES: return "NVME_ADM_CMD_SET_FEATURES";
+case NVME_ADM_CMD_GET_FEATURES: return "NVME_ADM_CMD_GET_FEATURES";
+case NVME_ADM_CMD_ASYNC_EV_REQ: return "NVME_ADM_CMD_ASYNC_EV_REQ";
+default:return "NVME_ADM_CMD_UNKNOWN";
+}
+}
+
+static inline const char *nvme_io_opc_str(uint8_t opc)
+{
+switch (opc) {
+case NVME_CMD_FLUSH:return "NVME_NVM_CMD_FLUSH";
+case NVME_CMD_WRITE:return "NVME_NVM_CMD_WRITE";
+case NVME_CMD_READ: return "NVME_NVM_CMD_READ";
+case NVME_CMD_WRITE_ZEROES: return "NVME_NVM_CMD_WRITE_ZEROES";
+default:return "NVME_NVM_CMD_UNKNOWN";
+}
+}
+
 typedef struct NvmeSQueue {
 struct NvmeCtrl *ctrl;
 uint16_tsqid;
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 638e3b1ccac8..bae43276bd6f 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -678,7 +678,8 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
 enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
 uint16_t status;
 
-trace_pci_nvme_rw(is_write ? "write" : "read", nlb, data_size, slba);
+trace_pci_nvme_rw(nvme_cid(req), nvme_io_opc_str(rw->opcode), nlb,
+  data_size, slba);
 
 status = nvme_check_mdts(n, data_size);
 if (status) {
@@ -727,7 +728,7 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 uint32_t nsid = le32_to_cpu(req->cmd.nsid);
 
 trace_pci_nvme_io_cmd(nvme_cid(req), nsid, nvme_sqid(req),
-  req->cmd.opcode);
+  req->cmd.opcode, nvme_io_opc_str(req->cmd.opcode));
 
 if (unlikely(nsid == 0 || nsid > n->num_namespaces)) {
 trace_pci_nvme_err_invalid_ns(nsid, n->num_namespaces);
@@ -1584,7 +1585,8 @@ static uint16_t nvme_aer(NvmeCtrl *n, NvmeRequest *req)
 
 static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req)
 {
-trace_pci_nvme_admin_cmd(nvme_cid(req), nvme_sqid(req), req->cmd.opcode);
+trace_pci_nvme_admin_cmd(nvme_cid(req), nvme_sqid(req), req->cmd.opcode,
+ nvme_adm_opc_str(req->cmd.opcode));
 
 switch (req->cmd.opcode) {
 case NVME_ADM_CMD_DELETE_SQ:
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 5589db4a014f..024786f4833c 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -36,9 +36,9 @@ pci_nvme_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, 
prp1=0x%"PRIx64" prp2
 pci_nvme_map_addr(uint64_t addr, uint64_t len) "addr 0x%"PRIx64" len %"PRIu64""
 pci_nvme_map_addr_cmb(uint64_t addr, uint64_t len) "addr 0x%"PRIx64" len 
%"PRIu64""
 pci_nvme_map_prp(uint64_t trans_len, uint32_t len, uint64_t prp1, uint64_t 
prp2, int num_prps) "trans_len %"PRIu64" len %"PRIu32" prp1 0x%"PRIx64" prp2 
0x%"PRIx64" num_prps %d"
-pci_nvme_io_cmd(uint16_t cid, uint32_t nsid, uint16_t sqid, uint8_t opcode) 
"cid %"PRIu16" nsid %"PRIu32" sqid %"PRIu16" opc 0x%"PRIx8""
-pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode) "cid %"PRIu16" 
sqid %"PRIu16" opc 0x%"PRIx8""
-pci_nvme_rw(const char *verb, uint32_t blk_count, uint64_t byte_count, 
uint64_t lba) "%s %"PRIu32" blocks (%"PRIu64" bytes) from LBA %"PRIu64""
+pci_nvme_io_cmd(uint16_t cid, uint32_t nsid, uint16_t sqid, uint8_t opcode, 
const char *opname) "cid %"PRIu16" nsid %"PRIu32" sqid %"PRIu16" opc 0x%"PRIx8" 
opname '%s'"
+pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode, const char 
*opname) "cid %"PRIu16" sqid %"PRIu16" opc 0x%"PRIx8" opname '%s'"
+pci_nvme_rw(uint16_t cid, const char *verb, uint32_t nlb, uint64_t count, 
uint64_t lba) "cid %"PRIu16" '%s' nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64""
 pci_nvme_rw_cb(uint16_t cid) "cid %"PR

[PATCH v3 04/17] hw/block/nvme: commonize nvme_rw error handling

2020-09-22 Thread Klaus Jensen

From: Klaus Jensen 

Move common error handling to a label.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 744ff3d86c22..a76a6464d6a1 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -687,20 +687,18 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
 status = nvme_check_mdts(n, data_size);
 if (status) {
 trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
-block_acct_invalid(blk_get_stats(n->conf.blk), acct);
-return status;
+goto invalid;
 }
 
 status = nvme_check_bounds(n, ns, slba, nlb);
 if (status) {
 trace_pci_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
-block_acct_invalid(blk_get_stats(n->conf.blk), acct);
-return status;
+goto invalid;
 }
 
-if (nvme_map_dptr(n, data_size, req)) {
-block_acct_invalid(blk_get_stats(n->conf.blk), acct);
-return NVME_INVALID_FIELD | NVME_DNR;
+status = nvme_map_dptr(n, data_size, req);
+if (status) {
+goto invalid;
 }
 
 if (req->qsg.nsg > 0) {
@@ -722,6 +720,10 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
 }
 
 return NVME_NO_COMPLETE;
+
+invalid:
+block_acct_invalid(blk_get_stats(n->conf.blk), acct);
+return status;
 }
 
 static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
-- 
2.28.0

[PATCH v3 09/17] hw/block/nvme: refactor aio submission

2020-09-22 Thread Klaus Jensen

From: Klaus Jensen 

This pulls block layer aio submission/completion to common functions.

For completions, additionally map an AIO error to the Unrecovered Read
and Write Fault status codes.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.h   |  14 +
 hw/block/nvme.c   | 136 ++
 hw/block/trace-events |   4 +-
 3 files changed, 114 insertions(+), 40 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index ce9e931420d7..f355eccb323b 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -171,4 +171,18 @@ static inline uint64_t nvme_ns_nlbas(NvmeCtrl *n, 
NvmeNamespace *ns)
 return n->ns_size >> nvme_ns_lbads(ns);
 }
 
+static inline NvmeCQueue *nvme_cq(NvmeRequest *req)
+{
+NvmeSQueue *sq = req->sq;
+NvmeCtrl *n = sq->ctrl;
+
+return n->cq[sq->cqid];
+}
+
+static inline NvmeCtrl *nvme_ctrl(NvmeRequest *req)
+{
+NvmeSQueue *sq = req->sq;
+return sq->ctrl;
+}
+
 #endif /* HW_NVME_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index bae43276bd6f..aa13809eaab2 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -614,30 +614,110 @@ static inline uint16_t nvme_check_bounds(NvmeCtrl *n, 
NvmeNamespace *ns,
 static void nvme_rw_cb(void *opaque, int ret)
 {
 NvmeRequest *req = opaque;
-NvmeSQueue *sq = req->sq;
-NvmeCtrl *n = sq->ctrl;
-NvmeCQueue *cq = n->cq[sq->cqid];
+NvmeCtrl *n = nvme_ctrl(req);
 
-trace_pci_nvme_rw_cb(nvme_cid(req));
+BlockBackend *blk = n->conf.blk;
+BlockAcctCookie *acct = &req->acct;
+BlockAcctStats *stats = blk_get_stats(blk);
+
+Error *local_err = NULL;
+
+trace_pci_nvme_rw_cb(nvme_cid(req), blk_name(blk));
 
 if (!ret) {
-block_acct_done(blk_get_stats(n->conf.blk), &req->acct);
+block_acct_done(stats, acct);
 req->status = NVME_SUCCESS;
 } else {
-block_acct_failed(blk_get_stats(n->conf.blk), &req->acct);
-req->status = NVME_INTERNAL_DEV_ERROR;
+uint16_t status;
+
+block_acct_failed(stats, acct);
+
+switch (req->cmd.opcode) {
+case NVME_CMD_READ:
+status = NVME_UNRECOVERED_READ;
+break;
+case NVME_CMD_FLUSH:
+case NVME_CMD_WRITE:
+case NVME_CMD_WRITE_ZEROES:
+status = NVME_WRITE_FAULT;
+break;
+default:
+status = NVME_INTERNAL_DEV_ERROR;
+break;
+}
+
+trace_pci_nvme_err_aio(nvme_cid(req), strerror(ret), status);
+
+error_setg_errno(&local_err, -ret, "aio failed");
+error_report_err(local_err);
+
+req->status = status;
 }
 
-nvme_enqueue_req_completion(cq, req);
+nvme_enqueue_req_completion(nvme_cq(req), req);
+}
+
+static uint16_t nvme_do_aio(BlockBackend *blk, int64_t offset, size_t len,
+NvmeRequest *req)
+{
+BlockAcctCookie *acct = &req->acct;
+BlockAcctStats *stats = blk_get_stats(blk);
+
+bool is_write = false;
+
+trace_pci_nvme_do_aio(nvme_cid(req), req->cmd.opcode,
+  nvme_io_opc_str(req->cmd.opcode), blk_name(blk),
+  offset, len);
+
+switch (req->cmd.opcode) {
+case NVME_CMD_FLUSH:
+block_acct_start(stats, acct, 0, BLOCK_ACCT_FLUSH);
+req->aiocb = blk_aio_flush(blk, nvme_rw_cb, req);
+break;
+
+case NVME_CMD_WRITE_ZEROES:
+block_acct_start(stats, acct, len, BLOCK_ACCT_WRITE);
+req->aiocb = blk_aio_pwrite_zeroes(blk, offset, len,
+   BDRV_REQ_MAY_UNMAP, nvme_rw_cb,
+   req);
+break;
+
+case NVME_CMD_WRITE:
+is_write = true;
+
+/* fallthrough */
+
+case NVME_CMD_READ:
+block_acct_start(stats, acct, len,
+ is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
+
+if (req->qsg.sg) {
+if (is_write) {
+req->aiocb = dma_blk_write(blk, &req->qsg, offset,
+   BDRV_SECTOR_SIZE, nvme_rw_cb, req);
+} else {
+req->aiocb = dma_blk_read(blk, &req->qsg, offset,
+  BDRV_SECTOR_SIZE, nvme_rw_cb, req);
+}
+} else {
+if (is_write) {
+req->aiocb = blk_aio_pwritev(blk, offset, &req->iov, 0,
+ nvme_rw_cb, req);
+} else {
+req->aiocb = blk_aio_preadv(blk, offset, &req->iov, 0,
+nvme_rw_cb, req);
+}
+}
+
+break;
+}
+
+return NVME_NO_COMPLETE;
 }
 
 static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
 {
-block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
- BLOCK_ACCT_FLUSH);
-req->aiocb = blk_aio_flush(n->conf.blk, nvme_rw_cb, req);
-
-return NVME_NO_COMPLETE;
+return nvme_d

[PATCH v3 10/17] hw/block/nvme: default request status to success

2020-09-22 Thread Klaus Jensen

From: Klaus Jensen 

Make the default request status NVME_SUCCESS so only error status codes
have to be set.

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index aa13809eaab2..7c9ea792483c 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -230,6 +230,7 @@ static void nvme_req_clear(NvmeRequest *req)
 {
 req->ns = NULL;
 memset(&req->cqe, 0x0, sizeof(req->cqe));
+req->status = NVME_SUCCESS;
 }
 
 static void nvme_req_exit(NvmeRequest *req)
@@ -546,8 +547,6 @@ static void nvme_process_aers(void *opaque)
 result->log_page = event->result.log_page;
 g_free(event);
 
-req->status = NVME_SUCCESS;
-
 trace_pci_nvme_aer_post_cqe(result->event_type, result->event_info,
 result->log_page);
 
@@ -626,7 +625,6 @@ static void nvme_rw_cb(void *opaque, int ret)
 
 if (!ret) {
 block_acct_done(stats, acct);
-req->status = NVME_SUCCESS;
 } else {
 uint16_t status;
 
-- 
2.28.0

[PATCH v3 03/17] hw/block/nvme: handle dma errors

2020-09-22 Thread Klaus Jensen

From: Klaus Jensen 

Handling DMA errors gracefully is required for the device to pass the
block/011 test ("disable PCI device while doing I/O") in the blktests
suite.

With this patch the device sets the Controller Fatal Status bit in the
CSTS register when failing to read from a submission queue or writing to
a completion queue; expecting the host to reset the controller.

If DMA errors occur at any other point in the execution of the command
(say, while mapping the PRPs), the command is aborted with a Data
Transfer Error status code.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c   | 41 +++--
 hw/block/trace-events |  3 +++
 2 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 63078f600920..744ff3d86c22 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -140,14 +140,14 @@ static inline void *nvme_addr_to_cmb(NvmeCtrl *n, hwaddr 
addr)
 return &n->cmbuf[addr - n->ctrl_mem.addr];
 }
 
-static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
+static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
 {
 if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr)) {
 memcpy(buf, nvme_addr_to_cmb(n, addr), size);
-return;
+return 0;
 }
 
-pci_dma_read(&n->parent_obj, addr, buf, size);
+return pci_dma_read(&n->parent_obj, addr, buf, size);
 }
 
 static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
@@ -307,6 +307,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 int num_prps = (len >> n->page_bits) + 1;
 uint16_t status;
 bool prp_list_in_cmb = false;
+int ret;
 
 QEMUSGList *qsg = &req->qsg;
 QEMUIOVector *iov = &req->iov;
@@ -347,7 +348,11 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 
 nents = (len + n->page_size - 1) >> n->page_bits;
 prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
-nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
+ret = nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
+if (ret) {
+trace_pci_nvme_err_addr_read(prp2);
+return NVME_DATA_TRAS_ERROR;
+}
 while (len != 0) {
 uint64_t prp_ent = le64_to_cpu(prp_list[i]);
 
@@ -364,8 +369,12 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, uint64_t prp1, 
uint64_t prp2,
 i = 0;
 nents = (len + n->page_size - 1) >> n->page_bits;
 prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
-nvme_addr_read(n, prp_ent, (void *)prp_list,
-prp_trans);
+ret = nvme_addr_read(n, prp_ent, (void *)prp_list,
+ prp_trans);
+if (ret) {
+trace_pci_nvme_err_addr_read(prp_ent);
+return NVME_DATA_TRAS_ERROR;
+}
 prp_ent = le64_to_cpu(prp_list[i]);
 }
 
@@ -457,6 +466,7 @@ static void nvme_post_cqes(void *opaque)
 NvmeCQueue *cq = opaque;
 NvmeCtrl *n = cq->ctrl;
 NvmeRequest *req, *next;
+int ret;
 
 QTAILQ_FOREACH_SAFE(req, &cq->req_list, entry, next) {
 NvmeSQueue *sq;
@@ -466,15 +476,21 @@ static void nvme_post_cqes(void *opaque)
 break;
 }
 
-QTAILQ_REMOVE(&cq->req_list, req, entry);
 sq = req->sq;
 req->cqe.status = cpu_to_le16((req->status << 1) | cq->phase);
 req->cqe.sq_id = cpu_to_le16(sq->sqid);
 req->cqe.sq_head = cpu_to_le16(sq->head);
 addr = cq->dma_addr + cq->tail * n->cqe_size;
+ret = pci_dma_write(&n->parent_obj, addr, (void *)&req->cqe,
+sizeof(req->cqe));
+if (ret) {
+trace_pci_nvme_err_addr_write(addr);
+trace_pci_nvme_err_cfs();
+n->bar.csts = NVME_CSTS_FAILED;
+break;
+}
+QTAILQ_REMOVE(&cq->req_list, req, entry);
 nvme_inc_cq_tail(cq);
-pci_dma_write(&n->parent_obj, addr, (void *)&req->cqe,
-sizeof(req->cqe));
 nvme_req_exit(req);
 QTAILQ_INSERT_TAIL(&sq->req_list, req, entry);
 }
@@ -1611,7 +1627,12 @@ static void nvme_process_sq(void *opaque)
 
 while (!(nvme_sq_empty(sq) || QTAILQ_EMPTY(&sq->req_list))) {
 addr = sq->dma_addr + sq->head * n->sqe_size;
-nvme_addr_read(n, addr, (void *)&cmd, sizeof(cmd));
+if (nvme_addr_read(n, addr, (void *)&cmd, sizeof(cmd))) {
+trace_pci_nvme_err_addr_read(addr);
+trace_pci_nvme_err_cfs();
+n->bar.csts = NVME_CSTS_FAILED;
+break;
+}
 nvme_inc_sq_head(sq);
 
 req = QTAILQ_FIRST(&sq->req_list);
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 8ff4cbc4932c..5589db4a014f 10064

[PATCH v3 07/17] hw/block/nvme: fix endian conversion

2020-09-22 Thread Klaus Jensen

From: Klaus Jensen 

The raw NLB field is a 16 bit value, so use le16_to_cpu instead of
le32_to_cpu and cast to uint32_t before incrementing the value to not
wrap around.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/block/nvme.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 25b34b3701db..638e3b1ccac8 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -645,7 +645,7 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest 
*req)
 NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
 NvmeNamespace *ns = req->ns;
 uint64_t slba = le64_to_cpu(rw->slba);
-uint32_t nlb  = le16_to_cpu(rw->nlb) + 1;
+uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
 uint64_t offset = nvme_l2b(ns, slba);
 uint32_t count = nvme_l2b(ns, nlb);
 uint16_t status;
@@ -669,7 +669,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
 NvmeNamespace *ns = req->ns;
-uint32_t nlb  = le32_to_cpu(rw->nlb) + 1;
+uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
 uint64_t slba = le64_to_cpu(rw->slba);
 
 uint64_t data_size = nvme_l2b(ns, nlb);
-- 
2.28.0

[PATCH v3 06/17] hw/block/nvme: add a lba to bytes helper

2020-09-22 Thread Klaus Jensen

From: Klaus Jensen 

Add the nvme_l2b helper and use it for converting NLB and SLBA to byte
counts and offsets.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.h |  6 ++
 hw/block/nvme.c | 12 
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 52ba794f2e9a..1675c1e0755c 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -77,6 +77,12 @@ static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns)
 return nvme_ns_lbaf(ns)->ds;
 }
 
+/* convert an LBA to the equivalent in bytes */
+static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t lba)
+{
+return lba << nvme_ns_lbads(ns);
+}
+
 #define TYPE_NVME "nvme"
 #define NVME(obj) \
 OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index bc8690fa7aaa..25b34b3701db 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -644,12 +644,10 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, 
NvmeRequest *req)
 {
 NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
 NvmeNamespace *ns = req->ns;
-const uint8_t lba_index = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
-const uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds;
 uint64_t slba = le64_to_cpu(rw->slba);
 uint32_t nlb  = le16_to_cpu(rw->nlb) + 1;
-uint64_t offset = slba << data_shift;
-uint32_t count = nlb << data_shift;
+uint64_t offset = nvme_l2b(ns, slba);
+uint32_t count = nvme_l2b(ns, nlb);
 uint16_t status;
 
 trace_pci_nvme_write_zeroes(nvme_cid(req), slba, nlb);
@@ -674,10 +672,8 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
 uint32_t nlb  = le32_to_cpu(rw->nlb) + 1;
 uint64_t slba = le64_to_cpu(rw->slba);
 
-uint8_t lba_index  = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
-uint8_t data_shift = ns->id_ns.lbaf[lba_index].ds;
-uint64_t data_size = (uint64_t)nlb << data_shift;
-uint64_t data_offset = slba << data_shift;
+uint64_t data_size = nvme_l2b(ns, nlb);
+uint64_t data_offset = nvme_l2b(ns, slba);
 int is_write = rw->opcode == NVME_CMD_WRITE ? 1 : 0;
 enum BlockAcctType acct = is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
 uint16_t status;
-- 
2.28.0

[PATCH v3 02/17] pci: pass along the return value of dma_memory_rw

2020-09-22 Thread Klaus Jensen

From: Klaus Jensen 

Some devices might want to know the return value of dma_memory_rw, so
pass it along instead of ignoring it.

There are no existing users of the return value, so this patch should be
safe.

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Michael S. Tsirkin 
Acked-by: Keith Busch 
---
 include/hw/pci/pci.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index c13ae1f8580b..0ff3feec1573 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -785,8 +785,7 @@ static inline AddressSpace *pci_get_address_space(PCIDevice 
*dev)
 static inline int pci_dma_rw(PCIDevice *dev, dma_addr_t addr,
  void *buf, dma_addr_t len, DMADirection dir)
 {
-dma_memory_rw(pci_get_address_space(dev), addr, buf, len, dir);
-return 0;
+return dma_memory_rw(pci_get_address_space(dev), addr, buf, len, dir);
 }
 
 static inline int pci_dma_read(PCIDevice *dev, dma_addr_t addr,
-- 
2.28.0

[PATCH v3 05/17] hw/block/nvme: alignment style fixes

2020-09-22 Thread Klaus Jensen

From: Klaus Jensen 

Style fixes.

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index a76a6464d6a1..bc8690fa7aaa 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -634,7 +634,7 @@ static void nvme_rw_cb(void *opaque, int ret)
 static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
 {
 block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
- BLOCK_ACCT_FLUSH);
+ BLOCK_ACCT_FLUSH);
 req->aiocb = blk_aio_flush(n->conf.blk, nvme_rw_cb, req);
 
 return NVME_NO_COMPLETE;
@@ -663,7 +663,7 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest 
*req)
 block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
  BLOCK_ACCT_WRITE);
 req->aiocb = blk_aio_pwrite_zeroes(n->conf.blk, offset, count,
-BDRV_REQ_MAY_UNMAP, nvme_rw_cb, req);
+   BDRV_REQ_MAY_UNMAP, nvme_rw_cb, req);
 return NVME_NO_COMPLETE;
 }
 
@@ -803,7 +803,7 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeRequest *req)
 }
 
 static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr,
-uint16_t sqid, uint16_t cqid, uint16_t size)
+ uint16_t sqid, uint16_t cqid, uint16_t size)
 {
 int i;
 NvmeCQueue *cq;
@@ -1058,7 +1058,8 @@ static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeRequest *req)
 }
 
 static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
-uint16_t cqid, uint16_t vector, uint16_t size, uint16_t irq_enabled)
+ uint16_t cqid, uint16_t vector, uint16_t size,
+ uint16_t irq_enabled)
 {
 int ret;
 
@@ -1118,7 +1119,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest 
*req)
 
 cq = g_malloc0(sizeof(*cq));
 nvme_init_cq(cq, n, prp1, cqid, vector, qsize + 1,
-NVME_CQ_FLAGS_IEN(qflags));
+ NVME_CQ_FLAGS_IEN(qflags));
 
 /*
  * It is only required to set qs_created when creating a completion queue;
@@ -1520,7 +1521,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest 
*req)
 }
 
 if (((n->temperature >= n->features.temp_thresh_hi) ||
-(n->temperature <= n->features.temp_thresh_low)) &&
+ (n->temperature <= n->features.temp_thresh_low)) &&
 NVME_AEC_SMART(n->features.async_config) & NVME_SMART_TEMPERATURE) 
{
 nvme_enqueue_event(n, NVME_AER_TYPE_SMART,
NVME_AER_INFO_SMART_TEMP_THRESH,
@@ -1770,9 +1771,9 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 n->cqe_size = 1 << NVME_CC_IOCQES(n->bar.cc);
 n->sqe_size = 1 << NVME_CC_IOSQES(n->bar.cc);
 nvme_init_cq(&n->admin_cq, n, n->bar.acq, 0, 0,
-NVME_AQA_ACQS(n->bar.aqa) + 1, 1);
+ NVME_AQA_ACQS(n->bar.aqa) + 1, 1);
 nvme_init_sq(&n->admin_sq, n, n->bar.asq, 0, 0,
-NVME_AQA_ASQS(n->bar.aqa) + 1);
+ NVME_AQA_ASQS(n->bar.aqa) + 1);
 
 nvme_set_timestamp(n, 0ULL);
 
@@ -1782,7 +1783,7 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 }
 
 static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data,
-unsigned size)
+   unsigned size)
 {
 if (unlikely(offset & (sizeof(uint32_t) - 1))) {
 NVME_GUEST_ERR(pci_nvme_ub_mmiowr_misaligned32,
@@ -1925,7 +1926,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
"invalid write to PMRSWTP register, ignored");
 return;
 case 0xE14: /* TODO PMRMSC */
- break;
+break;
 default:
 NVME_GUEST_ERR(pci_nvme_ub_mmiowr_invalid,
"invalid MMIO write,"
@@ -2101,7 +2102,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int 
val)
 }
 
 static void nvme_mmio_write(void *opaque, hwaddr addr, uint64_t data,
-unsigned size)
+unsigned size)
 {
 NvmeCtrl *n = (NvmeCtrl *)opaque;
 
@@ -2125,7 +2126,7 @@ static const MemoryRegionOps nvme_mmio_ops = {
 };
 
 static void nvme_cmb_write(void *opaque, hwaddr addr, uint64_t data,
-unsigned size)
+   unsigned size)
 {
 NvmeCtrl *n = (NvmeCtrl *)opaque;
 stn_le_p(&n->cmbuf[addr], size, data);
-- 
2.28.0

[PATCH v3 01/17] hw/block/nvme: fix typo in trace event

2020-09-22 Thread Klaus Jensen

From: Klaus Jensen 

Fix a typo in the sq doorbell trace event.

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Keith Busch 
---
 hw/block/trace-events | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/trace-events b/hw/block/trace-events
index ec94c56a4165..8ff4cbc4932c 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -70,7 +70,7 @@ pci_nvme_enqueue_req_completion(uint16_t cid, uint16_t cqid, 
uint16_t status) "c
 pci_nvme_mmio_read(uint64_t addr) "addr 0x%"PRIx64""
 pci_nvme_mmio_write(uint64_t addr, uint64_t data) "addr 0x%"PRIx64" data 
0x%"PRIx64""
 pci_nvme_mmio_doorbell_cq(uint16_t cqid, uint16_t new_head) "cqid %"PRIu16" 
new_head %"PRIu16""
-pci_nvme_mmio_doorbell_sq(uint16_t sqid, uint16_t new_tail) "cqid %"PRIu16" 
new_tail %"PRIu16""
+pci_nvme_mmio_doorbell_sq(uint16_t sqid, uint16_t new_tail) "sqid %"PRIu16" 
new_tail %"PRIu16""
 pci_nvme_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, 
interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 pci_nvme_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, 
interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64""
 pci_nvme_mmio_cfg(uint64_t data) "wrote MMIO, config controller 
config=0x%"PRIx64""
-- 
2.28.0

[PATCH v3 00/17] hw/block/nvme: multiple namespaces support

2020-09-22 Thread Klaus Jensen

From: Klaus Jensen 

This is the next round of my patches for the nvme device.

This includes a bit of cleanup and two new features:

  * support for scatter/gather lists

  * multiple namespaces support through a new nvme-ns device

Finally, the series wraps up with changing the PCI vendor and device ID to get
rid of the internal Intel id and as a side-effect get rid of some Linux kernel
quirks that no longer applies.

"pci: pass along the return value of dma_memory_rw" has already been posted by
Philippe in another series, but since it is not applied yet, I am including it
here.

Changes for v3
~~

  * hw/block/nvme: handle dma errors
Do not retry DMA, just set Controller Fatal Status (CFS). This causes
the Linux kernel to most likely disable the controller when running the
blktests block/011 test case, which causes some havoc when running
blktests, but I have submitted a patch for blktests to fix this. (Keith)

  * hw/block/nvme: refactor aio submission
Dropped the unneeded nvme_req_is_write function. (Keith)

  * Added R-b's from Keith and Philippe.

Changes for v2
~~

  * Added a new patch ("hw/block/nvme: fix typo in trace event") that does what
it says on the tin.

  * Dropped the "hw/block/nvme: support multiple parallel aios per request"
patch (Keith).

  * hw/block/nvme: add symbolic command name to trace events
Changed to single quote (Philippe)

  * hw/block/nvme: default request status to success
Commit message typo fixed (Philippe)

  * hw/block/nvme: change controller pci id
Do NOT bump the device id for the legacy Intel id (David)

Gollu Appalanaidu (1):
  hw/block/nvme: add support for sgl bit bucket descriptor

Klaus Jensen (16):
  hw/block/nvme: fix typo in trace event
  pci: pass along the return value of dma_memory_rw
  hw/block/nvme: handle dma errors
  hw/block/nvme: commonize nvme_rw error handling
  hw/block/nvme: alignment style fixes
  hw/block/nvme: add a lba to bytes helper
  hw/block/nvme: fix endian conversion
  hw/block/nvme: add symbolic command name to trace events
  hw/block/nvme: refactor aio submission
  hw/block/nvme: default request status to success
  hw/block/nvme: harden cmb access
  hw/block/nvme: add support for scatter gather lists
  hw/block/nvme: refactor identify active namespace id list
  hw/block/nvme: support multiple namespaces
  pci: allocate pci id for nvme
  hw/block/nvme: change controller pci id

 docs/specs/nvme.txt|  23 ++
 docs/specs/pci-ids.txt |   1 +
 hw/block/nvme-ns.h |  74 
 hw/block/nvme.h|  83 +++-
 include/block/nvme.h   |   6 +-
 include/hw/pci/pci.h   |   4 +-
 hw/block/nvme-ns.c | 167 
 hw/block/nvme.c| 848 ++---
 hw/core/machine.c  |   1 +
 MAINTAINERS|   1 +
 hw/block/meson.build   |   2 +-
 hw/block/trace-events  |  23 +-
 12 files changed, 978 insertions(+), 255 deletions(-)
 create mode 100644 docs/specs/nvme.txt
 create mode 100644 hw/block/nvme-ns.h
 create mode 100644 hw/block/nvme-ns.c

-- 
2.28.0

Re: [PATCH 2/6] block/nvme: Map doorbells pages write-only

2020-09-22 Thread Philippe Mathieu-Daudé

Hi Fam,

+Paolo?

On 9/22/20 10:18 AM, Fam Zheng wrote:
> On Mon, 2020-09-21 at 18:29 +0200, Philippe Mathieu-Daudé wrote:
>> Per the datasheet sections 3.1.13/3.1.14:
>>   "The host should not read the doorbell registers."
>>
>> As we don't need read access, map the doorbells with write-only
>> permission. We keep a reference to this mapped address in the
>> BDRVNVMeState structure.
> 
> Besides looking more correct in access mode, is there any side effect
> of WO mapping?

TBH I don't have enough knowledge to answer this question.
I tested successfully on X86. I'm writing more tests.

> 
> Fam
> 
>>
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  block/nvme.c | 29 +++--
>>  1 file changed, 19 insertions(+), 10 deletions(-)
>>
>> diff --git a/block/nvme.c b/block/nvme.c
>> index 5a4dc6a722a..3c834da8fec 100644
>> --- a/block/nvme.c
>> +++ b/block/nvme.c
>> @@ -31,7 +31,7 @@
>>  #define NVME_SQ_ENTRY_BYTES 64
>>  #define NVME_CQ_ENTRY_BYTES 16
>>  #define NVME_QUEUE_SIZE 128
>> -#define NVME_BAR_SIZE 8192
>> +#define NVME_DOORBELL_SIZE 4096
>>  
>>  /*
>>   * We have to leave one slot empty as that is the full queue case
>> where
>> @@ -84,10 +84,6 @@ typedef struct {
>>  /* Memory mapped registers */
>>  typedef volatile struct {
>>  NvmeBar ctrl;
>> -struct {
>> -uint32_t sq_tail;
>> -uint32_t cq_head;
>> -} doorbells[];
>>  } NVMeRegs;
>>  
>>  #define INDEX_ADMIN 0
>> @@ -103,6 +99,11 @@ struct BDRVNVMeState {
>>  AioContext *aio_context;
>>  QEMUVFIOState *vfio;
>>  NVMeRegs *regs;
>> +/* Memory mapped registers */
>> +volatile struct {
>> +uint32_t sq_tail;
>> +uint32_t cq_head;
>> +} *doorbells;
>>  /* The submission/completion queue pairs.
>>   * [0]: admin queue.
>>   * [1..]: io queues.
>> @@ -247,14 +248,14 @@ static NVMeQueuePair
>> *nvme_create_queue_pair(BDRVNVMeState *s,
>>  error_propagate(errp, local_err);
>>  goto fail;
>>  }
>> -q->sq.doorbell = &s->regs->doorbells[idx * s-
>>> doorbell_scale].sq_tail;
>> +q->sq.doorbell = &s->doorbells[idx * s->doorbell_scale].sq_tail;
>>  
>>  nvme_init_queue(s, &q->cq, size, NVME_CQ_ENTRY_BYTES,
>> &local_err);
>>  if (local_err) {
>>  error_propagate(errp, local_err);
>>  goto fail;
>>  }
>> -q->cq.doorbell = &s->regs->doorbells[idx * s-
>>> doorbell_scale].cq_head;
>> +q->cq.doorbell = &s->doorbells[idx * s->doorbell_scale].cq_head;
>>  
>>  return q;
>>  fail:
>> @@ -712,13 +713,12 @@ static int nvme_init(BlockDriverState *bs,
>> const char *device, int namespace,
>>  goto out;
>>  }
>>  
>> -s->regs = qemu_vfio_pci_map_bar(s->vfio, 0, 0, NVME_BAR_SIZE,
>> +s->regs = qemu_vfio_pci_map_bar(s->vfio, 0, 0, sizeof(NvmeBar),
>>  PROT_READ | PROT_WRITE, errp);
>>  if (!s->regs) {
>>  ret = -EINVAL;
>>  goto out;
>>  }
>> -
>>  /* Perform initialize sequence as described in NVMe spec "7.6.1
>>   * Initialization". */
>>  
>> @@ -748,6 +748,13 @@ static int nvme_init(BlockDriverState *bs, const
>> char *device, int namespace,
>>  }
>>  }
>>  
>> +s->doorbells = qemu_vfio_pci_map_bar(s->vfio, 0,
>> sizeof(NvmeBar),
>> + NVME_DOORBELL_SIZE,
>> PROT_WRITE, errp);
>> +if (!s->doorbells) {
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>>  /* Set up admin queue. */
>>  s->queues = g_new(NVMeQueuePair *, 1);
>>  s->queues[INDEX_ADMIN] = nvme_create_queue_pair(s, aio_context,
>> 0,
>> @@ -873,7 +880,9 @@ static void nvme_close(BlockDriverState *bs)
>> &s->irq_notifier[MSIX_SHARED_IRQ_IDX],
>> false, NULL, NULL);
>>  event_notifier_cleanup(&s->irq_notifier[MSIX_SHARED_IRQ_IDX]);
>> -qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->regs, 0,
>> NVME_BAR_SIZE);
>> +qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->doorbells,
>> +sizeof(NvmeBar), NVME_DOORBELL_SIZE);
>> +qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->regs, 0,
>> sizeof(NvmeBar));
>>  qemu_vfio_close(s->vfio);
>>  
>>  g_free(s->device);
>

Re: [PATCH v2 0/6] block/nvme: Map doorbells pages write-only, remove magic from nvme_init

2020-09-22 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20200922083821.578519-1-phi...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20200922083821.578519-1-phi...@redhat.com
Subject: [PATCH v2 0/6] block/nvme: Map doorbells pages write-only, remove 
magic from nvme_init

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  
patchew/1600485023-263643-1-git-send-email-lei@intel.com -> 
patchew/1600485023-263643-1-git-send-email-lei@intel.com
 - [tag update]  patchew/20200918092203.20384-1-chen.zh...@intel.com -> 
patchew/20200918092203.20384-1-chen.zh...@intel.com
 - [tag update]  patchew/20200921144957.979989-1-lviv...@redhat.com -> 
patchew/20200921144957.979989-1-lviv...@redhat.com
 - [tag update]  patchew/20200921174320.46062-1-th...@redhat.com -> 
patchew/20200921174320.46062-1-th...@redhat.com
 - [tag update]  patchew/20200921221045.699690-1-ehabk...@redhat.com -> 
patchew/20200921221045.699690-1-ehabk...@redhat.com
 * [new tag] patchew/20200922083821.578519-1-phi...@redhat.com -> 
patchew/20200922083821.578519-1-phi...@redhat.com
Switched to a new branch 'test'
ba5ba7b block/nvme: Replace magic value by SCALE_MS definition
7975c85 block/nvme: Use register definitions from 'block/nvme.h'
0ce328b block/nvme: Drop NVMeRegs structure, directly use NvmeBar
554f68e block/nvme: Reduce I/O registers scope
5cfb159 block/nvme: Map doorbells pages write-only
2064703 util/vfio-helpers: Pass page protections to qemu_vfio_pci_map_bar()

=== OUTPUT BEGIN ===
1/6 Checking commit 2064703ff8dc (util/vfio-helpers: Pass page protections to 
qemu_vfio_pci_map_bar())
2/6 Checking commit 5cfb15932539 (block/nvme: Map doorbells pages write-only)
3/6 Checking commit 554f68e25abc (block/nvme: Reduce I/O registers scope)
4/6 Checking commit 0ce328b78083 (block/nvme: Drop NVMeRegs structure, directly 
use NvmeBar)
ERROR: Use of volatile is usually wrong, please add a comment
#43: FILE: block/nvme.c:692:
+volatile NvmeBar *regs = NULL;

total: 1 errors, 0 warnings, 62 lines checked

Patch 4/6 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

5/6 Checking commit 7975c85b3904 (block/nvme: Use register definitions from 
'block/nvme.h')
6/6 Checking commit ba5ba7b65742 (block/nvme: Replace magic value by SCALE_MS 
definition)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200922083821.578519-1-phi...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

1 2 >

1 - 100 of 114 matches

Mail list logo