[PATCH v4] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-20 Thread Andrzej Jakowski
This patch introduces support for PMR as defined in the NVMe 1.4 spec. The user
can now specify a pmrdev option that should point to a HostMemoryBackend. The
pmrdev memory region will subsequently be exposed as PCI BAR 2 in the emulated
NVMe device. The guest OS can perform MMIO reads and writes to the PMR region,
which will stay persistent across system reboots.

Signed-off-by: Andrzej Jakowski 
Reviewed-by: Klaus Jensen 
---
v3:
 - replaced qemu_msync() use with qemu_ram_writeback() to allow pmem_persist()
   or qemu_msync() to be called depending on configuration [4] (Stefan)
 - rephrased comments to improve clarity and fixed code style issues [4]
   (Stefan, Klaus)

v2:
 - reworked PMR to use HostMemoryBackend instead of directly mapping PMR
   backend file into qemu [1] (Stefan)

v1:
 - provided support for Bit 1 from the PMRWBM register instead of Bit 0 to
   ensure improved performance in a virtualized environment [2] (Stefan)

 - added a check that the PMR size is a power of two [3] (David)

 - addressed cross compilation build problems reported by CI environment

[1]: 
https://lore.kernel.org/qemu-devel/20200306223853.37958-1-andrzej.jakow...@linux.intel.com/
[2]: 
https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4-2019.06.10-Ratified.pdf
[3]: 
https://lore.kernel.org/qemu-devel/20200218224811.30050-1-andrzej.jakow...@linux.intel.com/
[4]: 
https://lore.kernel.org/qemu-devel/20200318200303.11322-1-andrzej.jakow...@linux.intel.com/
---
Persistent Memory Region (PMR) is a new optional feature defined in the NVMe 1.4
specification. This patch implements initial support for it in the emulated NVMe
device.
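
For illustration, a minimal command-line sketch exercising the new option (the
object id, backing paths and size below are placeholders, not taken from the
patch; per the v1 changelog above, the backing file must be a power of two in
size):

    -drive file=nvme.img,if=none,id=nvme0 \
    -object memory-backend-file,id=pmr0,share=on,mem-path=/tmp/pmr.bin,size=16M \
    -device nvme,drive=nvme0,serial=deadbeef,pmrdev=pmr0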
---
 hw/block/Makefile.objs |   2 +-
 hw/block/nvme.c        | 109 ++++++++++++++++++++++++
 hw/block/nvme.h        |   2 +
 hw/block/trace-events  |   4 +
 include/block/nvme.h   | 172 +++++++++++++++++++++++++++++++++
 5 files changed, 288 insertions(+), 1 deletion(-)

diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
index 4b4a2b338d..47960b5f0d 100644
--- a/hw/block/Makefile.objs
+++ b/hw/block/Makefile.objs
@@ -7,12 +7,12 @@ common-obj-$(CONFIG_PFLASH_CFI02) += pflash_cfi02.o
 common-obj-$(CONFIG_XEN) += xen-block.o
 common-obj-$(CONFIG_ECC) += ecc.o
 common-obj-$(CONFIG_ONENAND) += onenand.o
-common-obj-$(CONFIG_NVME_PCI) += nvme.o
 common-obj-$(CONFIG_SWIM) += swim.o
 
 common-obj-$(CONFIG_SH4) += tc58128.o
 
 obj-$(CONFIG_VIRTIO_BLK) += virtio-blk.o
 obj-$(CONFIG_VHOST_USER_BLK) += vhost-user-blk.o
+obj-$(CONFIG_NVME_PCI) += nvme.o
 
 obj-y += dataplane/
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d28335cbf3..9b453423cf 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -19,10 +19,19 @@
  *  -drive file=,if=none,id=
  *  -device nvme,drive=,serial=,id=, \
  *  cmb_size_mb=, \
+ *  [pmrdev=,] \
  *  num_queues=
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
  * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
+ *
+ * The cmb_size_mb= and pmrdev= options are mutually exclusive due to the
+ * limited number of available BARs. cmb_size_mb= takes precedence over
+ * pmrdev= when both are provided.
+ * Enabling pmr emulation can be achieved by pointing to memory-backend-file.
+ * For example:
+ * -object memory-backend-file,id=,share=on,mem-path=, \
+ *  size=  -device nvme,...,pmrdev=
  */
 
 #include "qemu/osdep.h"
@@ -35,7 +44,9 @@
 #include "sysemu/sysemu.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
+#include "sysemu/hostmem.h"
 #include "sysemu/block-backend.h"
+#include "exec/ram_addr.h"
 
 #include "qemu/log.h"
 #include "qemu/module.h"
@@ -1141,6 +1152,26 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 NVME_GUEST_ERR(nvme_ub_mmiowr_cmbsz_readonly,
"invalid write to read only CMBSZ, ignored");
 return;
+case 0xE00: /* PMRCAP */
+NVME_GUEST_ERR(nvme_ub_mmiowr_pmrcap_readonly,
+   "invalid write to PMRCAP register, ignored");
+return;
+case 0xE04: /* TODO PMRCTL */
+break;
+case 0xE08: /* PMRSTS */
+NVME_GUEST_ERR(nvme_ub_mmiowr_pmrsts_readonly,
+   "invalid write to PMRSTS register, ignored");
+return;
+case 0xE0C: /* PMREBS */
+NVME_GUEST_ERR(nvme_ub_mmiowr_pmrebs_readonly,
+   "invalid write to PMREBS register, ignored");
+return;
+case 0xE10: /* PMRSWTP */
+NVME_GUEST_ERR(nvme_ub_mmiowr_pmrswtp_readonly,
+   "invalid write to PMRSWTP register, ignored");
+return;
+case 0xE14: /* TODO PMRMSC */
+ break;
 default:
 NVME_GUEST_ERR(nvme_ub_mmiowr_invalid,
"invalid MMIO write,"
@@ -1169,6 +1200,16 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr 
addr, unsigned size)
 }
 
 if (addr < sizeof(n->bar)) {
+/*
+ * When PMRWBM bit 1 is set then a read from
+ * PMRSTS should ensure that prior writes
+ * made it to persistent media

Re: discard and v2 qcow2 images

2020-03-20 Thread Alberto Garcia
On Fri 20 Mar 2020 08:35:44 PM CET, Eric Blake  wrote:
>> This flag is however only supported when qcow_version >= 3. In older
>> images the cluster is simply deallocated, exposing any possible
>> previous data from the backing file.
>
> Discard is advisory, and has no requirements that discarded data read
> back as zero.  However, if write zeroes uses discard under the hood,
> then THAT usage must guarantee reading back as zero.

write_zeroes doesn't seem to use discard in any case, so no problem
there.

>> @@ -3763,6 +3763,10 @@ static coroutine_fn int 
>> qcow2_co_pdiscard(BlockDriverState *bs,
>>   int ret;
>>   BDRVQcow2State *s = bs->opaque;
>>   
>> +if (s->qcow_version < 3) {
>> +return -ENOTSUP;
>> +}
>> +
>
> This changes it so you no longer see stale data, but doesn't change
> the fact that you don't read zeroes (just that your stale data is now
> from the current layer instead of the backing layer, since we did
> nothing at all).

discard can already fail if the request is not aligned; in that case you
get -ENOTSUP and keep the same data as before.

What's different in this case is that you can actually get stale data,
which doesn't seem like a desirable outcome.

Berto



Re: discard and v2 qcow2 images

2020-03-20 Thread Eric Blake

On 3/20/20 1:58 PM, Alberto Garcia wrote:

Hi,

when full_discard is false in discard_in_l2_slice() then the selected
cluster should be deallocated and it should read back as zeroes. This
is done by clearing the cluster offset field and setting OFLAG_ZERO in
the L2 entry.

This flag is however only supported when qcow_version >= 3. In older
images the cluster is simply deallocated, exposing any possible
previous data from the backing file.


Discard is advisory, and has no requirements that discarded data read 
back as zero.  However, if write zeroes uses discard under the hood, 
then THAT usage must guarantee reading back as zero.




This can be trivially reproduced like this:

qemu-img create -f qcow2 backing.img 64k
qemu-io -c 'write -P 0xff 0 64k' backing.img
qemu-img create -f qcow2 -o compat=0.10 -b backing.img top.img
qemu-io -c 'write -P 0x01 0 64k' top.img

After this, top.img is filled with 0x01. Now we issue a discard
command:

qemu-io -c 'discard 0 64k' top.img

top.img should now read as zeroes, but instead you get the data from
the backing file (0xff). If top.img was created with compat=1.1
instead (the default) then it would read as zeroes after the discard.


I'd argue that this is undesirable behavior, but not a bug.



This seems like a bug to me, and I would simply forbid using discard
in this case (see below). The other user of full_discard = false is
qcow2_snapshot_create() but I think that one is safe and should be
allowed?

--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3763,6 +3763,10 @@ static coroutine_fn int 
qcow2_co_pdiscard(BlockDriverState *bs,
  int ret;
  BDRVQcow2State *s = bs->opaque;
  
+if (s->qcow_version < 3) {

+return -ENOTSUP;
+}
+


This changes it so you no longer see stale data, but doesn't change the 
fact that you don't read zeroes (just that your stale data is now from 
the current layer instead of the backing layer, since we did nothing at 
all).


I'm not opposed to the patch, per se, but am not convinced that this is 
a problem to worry about.



  if (!QEMU_IS_ALIGNED(offset | bytes, s->cluster_size)) {
  assert(bytes < s->cluster_size);
  /* Ignore partial clusters, except for the special case of the

Berto



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v3] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-20 Thread Stefan Hajnoczi
On Fri, Mar 20, 2020 at 5:48 PM Andrzej Jakowski
 wrote:
>
> On 3/20/20 8:45 AM, Stefan Hajnoczi wrote:
> > Please use qemu_ram_writeback() so that pmem_persist() and qemu_msync()
> > are used as appropriate.
>
> Thx!
> qemu_ram_writeback() doesn't return any status. How can I know whether the
> actual msync succeeds?

If the warn_report() message that is already printed by
qemu_ram_writeback() is insufficient in terms of error reporting, I
suggest propagating the return value from qemu_ram_writeback() and
qemu_ram_block_writeback().
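
Concretely, that suggestion would amount to something like the following
(hypothetical prototypes for illustration only; today both helpers return
void and only warn internally):

    /* hypothetical: let callers see the qemu_msync()/pmem failure */
    int qemu_ram_writeback(RAMBlock *block, ram_addr_t start, ram_addr_t length);
    int qemu_ram_block_writeback(RAMBlock *block);

so that the PMRSTS read handler in nvme_mmio_read() could react to a failed
writeback instead of silently continuing.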

> Also, qemu_ram_writeback() requires me to add #include "exec/ram_addr.h".
> After including it, when I compile the code I get the following error:
>
> In file included from hw/block/nvme.c:49:
> /root/sources/pmr/qemu/include/exec/ram_addr.h:23:10: fatal error: cpu.h: No 
> such file or directory
>23 | #include "cpu.h"
>   |  ^~~
> compilation terminated.
> make: *** [/root/sources/pmr/qemu/rules.mak:69: hw/block/nvme.o] Error 1
>
> Why is this happening, and what should be changed?

Generally object files are built as part of common-obj-y in
Makefile.objs.  These object files are built only once across all QEMU
targets (e.g. qemu-system-x86_64, qemu-system-arm, ...).

Some code embeds target-specific information and is therefore not
suitable for common-obj-y.  These object files are built as part of
obj-y in Makefile.objs.

You can fix this compilation issue by changing hw/block/Makefile.objs
to look like this:

diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
index 4b4a2b338d..12d5d5dac6 100644
--- a/hw/block/Makefile.objs
+++ b/hw/block/Makefile.objs
@@ -7,11 +7,11 @@ common-obj-$(CONFIG_PFLASH_CFI02) += pflash_cfi02.o
 common-obj-$(CONFIG_XEN) += xen-block.o
 common-obj-$(CONFIG_ECC) += ecc.o
 common-obj-$(CONFIG_ONENAND) += onenand.o
-common-obj-$(CONFIG_NVME_PCI) += nvme.o
 common-obj-$(CONFIG_SWIM) += swim.o

 common-obj-$(CONFIG_SH4) += tc58128.o

+obj-$(CONFIG_NVME_PCI) += nvme.o
 obj-$(CONFIG_VIRTIO_BLK) += virtio-blk.o
 obj-$(CONFIG_VHOST_USER_BLK) += vhost-user-blk.o

Stefan



discard and v2 qcow2 images

2020-03-20 Thread Alberto Garcia
Hi,

when full_discard is false in discard_in_l2_slice() then the selected
cluster should be deallocated and it should read back as zeroes. This
is done by clearing the cluster offset field and setting OFLAG_ZERO in
the L2 entry.

This flag is however only supported when qcow_version >= 3. In older
images the cluster is simply deallocated, exposing any possible
previous data from the backing file.

This can be trivially reproduced like this:

   qemu-img create -f qcow2 backing.img 64k
   qemu-io -c 'write -P 0xff 0 64k' backing.img
   qemu-img create -f qcow2 -o compat=0.10 -b backing.img top.img
   qemu-io -c 'write -P 0x01 0 64k' top.img

After this, top.img is filled with 0x01. Now we issue a discard
command:

   qemu-io -c 'discard 0 64k' top.img

top.img should now read as zeroes, but instead you get the data from
the backing file (0xff). If top.img was created with compat=1.1
instead (the default) then it would read as zeroes after the discard.
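
A quick way to see what actually gets read back (illustrative qemu-io commands
using its pattern-verifying read; not part of the original report):

   qemu-io -c 'read -P 0xff 0 64k' top.img   # passes on compat=0.10: backing data shows through
   qemu-io -c 'read -P 0x00 0 64k' top.img   # this is what you would expect to pass instead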

This seems like a bug to me, and I would simply forbid using discard
in this case (see below). The other user of full_discard = false is
qcow2_snapshot_create() but I think that one is safe and should be
allowed?

--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3763,6 +3763,10 @@ static coroutine_fn int 
qcow2_co_pdiscard(BlockDriverState *bs,
 int ret;
 BDRVQcow2State *s = bs->opaque;
 
+if (s->qcow_version < 3) {
+return -ENOTSUP;
+}
+
 if (!QEMU_IS_ALIGNED(offset | bytes, s->cluster_size)) {
 assert(bytes < s->cluster_size);
 /* Ignore partial clusters, except for the special case of the

Berto



[PATCH] block: Avoid memleak on qcow2 image info failure

2020-03-20 Thread Eric Blake
If we fail to get bitmap info, we must not leak the encryption info.

Fixes: b8968c875f403
Fixes: Coverity CID 1421894
Signed-off-by: Eric Blake 
---
 block/qcow2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index d44b45633dbb..e08917ed8462 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -4811,6 +4811,7 @@ static ImageInfoSpecific 
*qcow2_get_specific_info(BlockDriverState *bs,
 if (local_err) {
 error_propagate(errp, local_err);
 qapi_free_ImageInfoSpecific(spec_info);
+qapi_free_QCryptoBlockInfo(encrypt_info);
 return NULL;
 }
 *spec_info->u.qcow2.data = (ImageInfoSpecificQCow2){
-- 
2.25.1




Re: [Qemu-devel] [PULL 3/4] qcow2: Add list of bitmaps to ImageInfoSpecificQCow2

2020-03-20 Thread Eric Blake

On 3/20/20 12:57 PM, Peter Maydell wrote:

On Mon, 11 Feb 2019 at 20:57, Eric Blake  wrote:


From: Andrey Shinkevich 

In the 'Format specific information' section of the 'qemu-img info'
command output, the supplemental information about existing QCOW2
bitmaps will be shown, such as a bitmap name, flags and granularity:


Hi; Coverity has just noticed an issue (CID 1421894) with this change:




+Qcow2BitmapInfoList *bitmaps;
+bitmaps = qcow2_get_bitmap_info_list(bs, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+qapi_free_ImageInfoSpecific(spec_info);
+return NULL;


If we take this error-exit codepath, then we never free the
memory allocated by the earlier call to qcrypto_block_get_info().


Fix sent.

Hmm - it would be nice if the QAPI generator could declare all QAPI 
types as g_autoptr compatible, so we could simplify our cleanup paths to 
not have to worry about calling qapi_free_FOO() on all paths.  But while 
the memory leak fix is a one-liner safe for 5.0, switching to g_autoptr 
is a bigger task that would be 5.1 material.
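
For illustration, a rough sketch of what such cleanup paths could look like
(hypothetical; the QAPI generator does not emit g_autoptr() cleanup
declarations for these types today):

    /* hypothetical: assumes g_autoptr() support for QAPI-generated types */
    g_autoptr(QCryptoBlockInfo) encrypt_info = NULL;
    g_autoptr(ImageInfoSpecific) spec_info = g_new(ImageInfoSpecific, 1);

    if (local_err) {
        error_propagate(errp, local_err);
        return NULL;                        /* both freed automatically here */
    }
    return g_steal_pointer(&spec_info);     /* hand ownership to the caller */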


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v3] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-20 Thread Andrzej Jakowski
On 3/20/20 8:45 AM, Stefan Hajnoczi wrote:
> Please use qemu_ram_writeback() so that pmem_persist() and qemu_msync()
> are used as appropriate.

Thx!
qemu_ram_writeback() doesn't return any status. How can I know whether the
actual msync succeeds?

Also, qemu_ram_writeback() requires me to add #include "exec/ram_addr.h".
After including it, when I compile the code I get the following error:

In file included from hw/block/nvme.c:49:
/root/sources/pmr/qemu/include/exec/ram_addr.h:23:10: fatal error: cpu.h: No 
such file or directory
   23 | #include "cpu.h"
  |  ^~~
compilation terminated.
make: *** [/root/sources/pmr/qemu/rules.mak:69: hw/block/nvme.o] Error 1

Why is this happening, and what should be changed?



Re: [Qemu-devel] [PULL 3/4] qcow2: Add list of bitmaps to ImageInfoSpecificQCow2

2020-03-20 Thread Peter Maydell
On Mon, 11 Feb 2019 at 20:57, Eric Blake  wrote:
>
> From: Andrey Shinkevich 
>
> In the 'Format specific information' section of the 'qemu-img info'
> command output, the supplemental information about existing QCOW2
> bitmaps will be shown, such as a bitmap name, flags and granularity:

Hi; Coverity has just noticed an issue (CID 1421894) with this change:

> diff --git a/block/qcow2.c b/block/qcow2.c
> index bcb80d0270c..65a54c9ac65 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -4387,7 +4387,7 @@ static ImageInfoSpecific 
> *qcow2_get_specific_info(BlockDriverState *bs,
>  spec_info = g_new(ImageInfoSpecific, 1);
>  *spec_info = (ImageInfoSpecific){
>  .type  = IMAGE_INFO_SPECIFIC_KIND_QCOW2,
> -.u.qcow2.data = g_new(ImageInfoSpecificQCow2, 1),
> +.u.qcow2.data = g_new0(ImageInfoSpecificQCow2, 1),
>  };
>  if (s->qcow_version == 2) {
>  *spec_info->u.qcow2.data = (ImageInfoSpecificQCow2){
> @@ -4395,6 +4395,13 @@ static ImageInfoSpecific 
> *qcow2_get_specific_info(BlockDriverState *bs,
>  .refcount_bits  = s->refcount_bits,
>  };
>  } else if (s->qcow_version == 3) {
> +Qcow2BitmapInfoList *bitmaps;
> +bitmaps = qcow2_get_bitmap_info_list(bs, &local_err);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +qapi_free_ImageInfoSpecific(spec_info);
> +return NULL;

If we take this error-exit codepath, then we never free the
memory allocated by the earlier call to qcrypto_block_get_info().

> +}
>  *spec_info->u.qcow2.data = (ImageInfoSpecificQCow2){
>  .compat = g_strdup("1.1"),
>  .lazy_refcounts = s->compatible_features &
> @@ -4404,6 +4411,8 @@ static ImageInfoSpecific 
> *qcow2_get_specific_info(BlockDriverState *bs,
>QCOW2_INCOMPAT_CORRUPT,
>  .has_corrupt= true,
>  .refcount_bits  = s->refcount_bits,
> +.has_bitmaps= !!bitmaps,
> +.bitmaps= bitmaps,
>  };
>  } else {
>  /* if this assertion fails, this probably means a new version was
> --
> 2.20.1

thanks
-- PMM



Re: [PATCH v9 02/10] scripts: Coccinelle script to use ERRP_AUTO_PROPAGATE()

2020-03-20 Thread Markus Armbruster
Vladimir Sementsov-Ogievskiy  writes:

> 20.03.2020 16:58, Markus Armbruster wrote:
>> Vladimir Sementsov-Ogievskiy  writes:
[...]
>>> I will not be surprised, if we missed some more interesting cases :)
>>> But we should proceed. What is our plan? Will you queue v10 for 5.1?
>>
>> v10's PATCH 1+2 look ready.  The error.h comment update could perhaps
>> use some polish; I've focused my attention elsewhere.
>>
>> PATCH 8-9 are generated.  They should never be rebased, always be
>> regenerated.  We compare regenerated patches to posted ones to make sure
>> they are still sane, and the R-bys are still valid.  I can take care of
>> the comparing.
>>
>> I'd like to have a pull request ready when the tree reopens for general
>> development.  Let's use the time until then to get more generated
>> patches out for review.
>>
>> If I queue up patches in my tree, we shift the responsibility for
>> regenerating patches from you to me, and create a coordination issue:
>> you'll want to base patch submissions on the branch I use to queue this
>> work, and that's going to be awkward when I rebase / regenerate that
>> branch.  I think it's simpler to queue up in your tree until we're ready
>> for a pull request.
>>
>> When you post more patches, use
>>
>>  Based-on: <20200317151625.20797-1-vsement...@virtuozzo.com>
>>
>> so that Patchew applies them on top of this series.  Hmm, probably won't
>> do, as PATCH 9 already conflicts.
>>
>> You could instead repost PATCH 1+2 with each batch.  I hope that's not
>> too confusing.
>>
>> I trust you'll keep providing a tag reviewers can pull.
>>
>> I suggest to ask maintainers to leave merging these patches to me, in
>> cover letters.
>>
>> Makes sense?
>>
>
> Hmm.
>
> I remember what Kevin said about freeze period: maintainers will queue
> a lot of patches in their "next" branches, and send pull requests at start
> of next developing period. This highly possible will drop r-bs I can get now.
> And reviewers will have to review twice.
>
> And for the same reason, it's bad idea to queue in your branch a lot of 
> patches
> from different subsystems during freeze.
>
> So, just postpone this all up to next development phase?

Okay.  I hope we can process generated patches at a brisk pace then.




Re: [PATCH v10 2/9] scripts: Coccinelle script to use ERRP_AUTO_PROPAGATE()

2020-03-20 Thread Markus Armbruster
Vladimir Sementsov-Ogievskiy  writes:

> Script adds ERRP_AUTO_PROPAGATE macro invocation where appropriate and
> does corresponding changes in code (look for details in
> include/qapi/error.h)
>
> Usage example:
> spatch --sp-file scripts/coccinelle/auto-propagated-errp.cocci \
>  --macro-file scripts/cocci-macro-file.h --in-place --no-show-diff \
>  --max-width 80 FILES...
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>
> Cc: Eric Blake 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> Cc: Greg Kurz 
> Cc: Christian Schoenebeck 
> Cc: Stefan Hajnoczi 
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Paul Durrant 
> Cc: "Philippe Mathieu-Daudé" 
> Cc: Laszlo Ersek 
> Cc: Gerd Hoffmann 
> Cc: Stefan Berger 
> Cc: Markus Armbruster 
> Cc: Michael Roth 
> Cc: qemu-de...@nongnu.org
> Cc: qemu-block@nongnu.org
> Cc: xen-de...@lists.xenproject.org
>
>  scripts/coccinelle/auto-propagated-errp.cocci | 336 ++
>  include/qapi/error.h  |   3 +
>  MAINTAINERS   |   1 +
>  3 files changed, 340 insertions(+)
>  create mode 100644 scripts/coccinelle/auto-propagated-errp.cocci
>
> diff --git a/scripts/coccinelle/auto-propagated-errp.cocci 
> b/scripts/coccinelle/auto-propagated-errp.cocci
> new file mode 100644
> index 00..5188b07006
> --- /dev/null
> +++ b/scripts/coccinelle/auto-propagated-errp.cocci
> @@ -0,0 +1,336 @@
> +// Use ERRP_AUTO_PROPAGATE (see include/qapi/error.h)
> +//
> +// Copyright (c) 2020 Virtuozzo International GmbH.
> +//
> +// This program is free software; you can redistribute it and/or
> +// modify it under the terms of the GNU General Public License as
> +// published by the Free Software Foundation; either version 2 of the
> +// License, or (at your option) any later version.
> +//
> +// This program is distributed in the hope that it will be useful,
> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +// GNU General Public License for more details.
> +//
> +// You should have received a copy of the GNU General Public License
> +// along with this program.  If not, see
> +// .
> +//
> +// Usage example:
> +// spatch --sp-file scripts/coccinelle/auto-propagated-errp.cocci \
> +//  --macro-file scripts/cocci-macro-file.h --in-place \
> +//  --no-show-diff --max-width 80 FILES...
> +//
> +// Note: --max-width 80 is needed because coccinelle default is less
> +// than 80, and without this parameter coccinelle may reindent some
> +// lines which fit into 80 characters but not to coccinelle default,
> +// which in turn produces extra patch hunks for no reason.
> +
> +// Switch unusual Error ** parameter names to errp
> +// (this is necessary to use ERRP_AUTO_PROPAGATE).
> +//
> +// Disable optional_qualifier to skip functions with
> +// "Error *const *errp" parameter.
> +//
> +// Skip functions with "assert(_errp && *_errp)" statement, because
> +// that signals unusual semantics, and the parameter name may well
> +// serve a purpose. (like nbd_iter_channel_error()).
> +//
> +// Skip util/error.c to not touch, for example, error_propagate() and
> +// error_propagate_prepend().
> +@ depends on !(file in "util/error.c") disable optional_qualifier@
> +identifier fn;
> +identifier _errp != errp;
> +@@
> +
> + fn(...,
> +-   Error **_errp
> ++   Error **errp
> +,...)
> + {
> +(
> + ... when != assert(_errp && *_errp)
> +&
> + <...
> +-_errp
> ++errp
> + ...>
> +)
> + }
> +
> +// Add invocation of ERRP_AUTO_PROPAGATE to errp-functions where
> +// necessary
> +//
> +// Note, that without "when any" the final "..." does not mach
> +// something matched by previous pattern, i.e. the rule will not match
> +// double error_prepend in control flow like in
> +// vfio_set_irq_signaling().
> +//
> +// Note, "exists" says that we want apply rule even if it does not
> +// match on all possible control flows (otherwise, it will not match
> +// standard pattern when error_propagate() call is in if branch).
> +@ disable optional_qualifier exists@
> +identifier fn, local_err;
> +symbol errp;
> +@@
> +
> + fn(..., Error **errp, ...)
> + {
> ++   ERRP_AUTO_PROPAGATE();
> +...  when != ERRP_AUTO_PROPAGATE();
> +(
> +(
> +error_append_hint(errp, ...);
> +|
> +error_prepend(errp, ...);
> +|
> +error_vprepend(errp, ...);
> +)
> +... when any
> +|
> +Error *local_err = NULL;
> +...
> +(
> +error_propagate_prepend(errp, local_err, ...);
> +|
> +error_propagate(errp, local_err);
> +)
> +...
> +)
> + }
> +
> +// Warn when several Error * definitions are in the control flow.
> +// This rule is not chained to rule1 and less restrictive, to cover more
> +// functions to warn (even those we are not going to convert).
> +//
> +// Note, that even with one (or zero) Error * definition in the each
> +// control flow we may have several (in total) Error * definitions in
> +// the 

Re: [RFC (fix for 5.0?)] block/io: do not do pointer arithmetic on void *

2020-03-20 Thread Stefan Hajnoczi
On Wed, Mar 18, 2020 at 02:26:54PM +, Daniel P. Berrangé wrote:
> On Wed, Mar 18, 2020 at 05:22:53PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > Signed-off-by: Vladimir Sementsov-Ogievskiy 
> > ---
> > 
> > Hi all!
> > 
> > C standard doesn't allow pointer arithmetic on void *.
> > Still, gcc allows it as an extension:
> >  https://gcc.gnu.org/onlinedocs/gcc-4.8.0/gcc/Pointer-Arith.html
> > 
> > I can create a series of patches like this. Do we need it?
> 
> I don't think so, we only care about gcc & clang.

I agree with Dan.

Stefan




Re: [PATCH v3] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-20 Thread Stefan Hajnoczi
On Wed, Mar 18, 2020 at 01:03:03PM -0700, Andrzej Jakowski wrote:
> This patch introduces support for PMR that has been defined as part of NVMe 
> 1.4
> spec. User can now specify a pmrdev option that should point to 
> HostMemoryBackend.
> pmrdev memory region will subsequently be exposed as PCI BAR 2 in emulated 
> NVMe
> device. Guest OS can perform mmio read and writes to the PMR region that will 
> stay
> persistent across system reboot.
> 
> Signed-off-by: Andrzej Jakowski 
> ---
> v2:
>  - reworked PMR to use HostMemoryBackend instead of directly mapping PMR
>backend file into qemu [1] (Stefan)
> 
> v1:
>  - provided support for Bit 1 from PMRWBM register instead of Bit 0 to ensure
>improved performance in virtualized environment [2] (Stefan)
> 
>  - added check if pmr size is power of two in size [3] (David)
> 
>  - addressed cross compilation build problems reported by CI environment
> 
> [1]: 
> https://lore.kernel.org/qemu-devel/20200306223853.37958-1-andrzej.jakow...@linux.intel.com/
> [2]: 
> https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4-2019.06.10-Ratified.pdf
> [3]: 
> https://lore.kernel.org/qemu-devel/20200218224811.30050-1-andrzej.jakow...@linux.intel.com/
> ---
> Persistent Memory Region (PMR) is a new optional feature provided in NVMe 1.4
> specification. This patch implements initial support for it in NVMe driver.
> ---
>  hw/block/nvme.c   | 117 +++-
>  hw/block/nvme.h   |   2 +
>  hw/block/trace-events |   5 ++
>  include/block/nvme.h  | 172 ++
>  4 files changed, 294 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index d28335cbf3..70fd09d293 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -19,10 +19,18 @@
>   *  -drive file=,if=none,id=
>   *  -device nvme,drive=,serial=,id=, \
>   *  cmb_size_mb=, \
> + *  [pmrdev=,] \
>   *  num_queues=
>   *
>   * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
>   * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
> + *
> + * Either cmb or pmr - due to limitation in available BAR indexes.

This isn't 100% clear.  Does this mean "cmb_size_mb= and pmrdev= are
mutually exclusive due to limited availability of BARs"?  Please
rephrase.

> + * pmr_file file needs to be power of two in size.
> + * Enabling pmr emulation can be achieved by pointing to memory-backend-file.
> + * For example:
> + * -object memory-backend-file,id=,share=on,mem-path=, \
> + *  size=  -device nvme,...,pmrdev=
>   */
>  
>  #include "qemu/osdep.h"
> @@ -35,7 +43,9 @@
>  #include "sysemu/sysemu.h"
>  #include "qapi/error.h"
>  #include "qapi/visitor.h"
> +#include "sysemu/hostmem.h"
>  #include "sysemu/block-backend.h"
> +#include "exec/ramblock.h"
>  
>  #include "qemu/log.h"
>  #include "qemu/module.h"
> @@ -1141,6 +1151,26 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
> uint64_t data,
>  NVME_GUEST_ERR(nvme_ub_mmiowr_cmbsz_readonly,
> "invalid write to read only CMBSZ, ignored");
>  return;
> +case 0xE00: /* PMRCAP */
> +NVME_GUEST_ERR(nvme_ub_mmiowr_pmrcap_readonly,
> +   "invalid write to PMRCAP register, ignored");
> +return;
> +case 0xE04: /* TODO PMRCTL */
> +break;
> +case 0xE08: /* PMRSTS */
> +NVME_GUEST_ERR(nvme_ub_mmiowr_pmrsts_readonly,
> +   "invalid write to PMRSTS register, ignored");
> +return;
> +case 0xE0C: /* PMREBS */
> +NVME_GUEST_ERR(nvme_ub_mmiowr_pmrebs_readonly,
> +   "invalid write to PMREBS register, ignored");
> +return;
> +case 0xE10: /* PMRSWTP */
> +NVME_GUEST_ERR(nvme_ub_mmiowr_pmrswtp_readonly,
> +   "invalid write to PMRSWTP register, ignored");
> +return;
> +case 0xE14: /* TODO PMRMSC */
> + break;
>  default:
>  NVME_GUEST_ERR(nvme_ub_mmiowr_invalid,
> "invalid MMIO write,"
> @@ -1169,6 +1199,23 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr 
> addr, unsigned size)
>  }
>  
>  if (addr < sizeof(n->bar)) {
> +/*
> + * When PMRWBM bit 1 is set then read from
> + * from PMRSTS should ensure prior writes
> + * made it to persistent media
> + */
> +if (addr == 0xE08 &&
> +(NVME_PMRCAP_PMRWBM(n->bar.pmrcap) & 0x02) >> 1) {
> +int status;
> +
> +status = qemu_msync((void *)n->pmrdev->mr.ram_block->host,
> +n->pmrdev->size,
> +n->pmrdev->mr.ram_block->fd);

Please use qemu_ram_writeback() so that pmem_persist() and qemu_msync()
are used as appropriate.
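
For reference, the replacement could look roughly like this (a sketch against
the hunk quoted above; the exact offset/length handling may differ in the next
revision):

    if (addr == 0xE08 &&
        (NVME_PMRCAP_PMRWBM(n->bar.pmrcap) & 0x02)) {
        /* write back the whole PMR backing region before returning PMRSTS */
        qemu_ram_writeback(n->pmrdev->mr.ram_block, 0, n->pmrdev->size);
    }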

Stefan




Re: [PATCH v9 02/10] scripts: Coccinelle script to use ERRP_AUTO_PROPAGATE()

2020-03-20 Thread Vladimir Sementsov-Ogievskiy

20.03.2020 16:58, Markus Armbruster wrote:

Vladimir Sementsov-Ogievskiy  writes:


19.03.2020 13:45, Markus Armbruster wrote:

Vladimir Sementsov-Ogievskiy  writes:

[...]

So, understanding that there no such cases in the whole tree, and even
if your patch works faster on the whole tree, I still don't want to
drop inheritance, because it's just a correct thing to do. Yes, we've
added  helper. It helps to avoid some problems. Pair-inheritance
helps to avoid another problems. I understand, that there still may
other, not-covered problems, but better to be as safe as possible. And
inheritance here is native and correct thing to do, even with our 
additional helper. What do you think?


I wouldn't call it correct.  It's still unreliable, but less so than
without the function name constraint.  That makes it less wrong.


Agree.



100% reliable would be nice, but not at any cost.  Something we're
reasonably confident to get right should be good enough.

To be confident, we need to understand the script's limitations, and how
to compensate for them.  I figure we do now.  You too?



I will not be surprised, if we missed some more interesting cases :)
But we should proceed. What is our plan? Will you queue v10 for 5.1?


v10's PATCH 1+2 look ready.  The error.h comment update could perhaps
use some polish; I've focused my attention elsewhere.

PATCH 8-9 are generated.  They should never be rebased, always be
regenerated.  We compare regenerated patches to posted ones to make sure
they are still sane, and the R-bys are still valid.  I can take care of
the comparing.

I'd like to have a pull request ready when the tree reopens for general
development.  Let's use the time until then to get more generated
patches out for review.

If I queue up patches in my tree, we shift the responsibility for
regenerating patches from you to me, and create a coordination issue:
you'll want to base patch submissions on the branch I use to queue this
work, and that's going to be awkward when I rebase / regenerate that
branch.  I think it's simpler to queue up in your tree until we're ready
for a pull request.

When you post more patches, use

 Based-on: <20200317151625.20797-1-vsement...@virtuozzo.com>

so that Patchew applies them on top of this series.  Hmm, probably won't
do, as PATCH 9 already conflicts.

You could instead repost PATCH 1+2 with each batch.  I hope that's not
too confusing.

I trust you'll keep providing a tag reviewers can pull.

I suggest to ask maintainers to leave merging these patches to me, in
cover letters.

Makes sense?



Hmm.

I remember what Kevin said about the freeze period: maintainers will queue
a lot of patches in their "next" branches and send pull requests at the start
of the next development period. That will very likely drop the r-bs I can get
now, and reviewers will have to review twice.

And for the same reason, it's a bad idea to queue a lot of patches from
different subsystems in your branch during the freeze.

So, just postpone all of this to the next development phase?


--
Best regards,
Vladimir



Re: [PATCH v9 02/10] scripts: Coccinelle script to use ERRP_AUTO_PROPAGATE()

2020-03-20 Thread Markus Armbruster
Vladimir Sementsov-Ogievskiy  writes:

> 19.03.2020 13:45, Markus Armbruster wrote:
>> Vladimir Sementsov-Ogievskiy  writes:
[...]
>>> So, understanding that there no such cases in the whole tree, and even
>>> if your patch works faster on the whole tree, I still don't want to
>>> drop inheritance, because it's just a correct thing to do. Yes, we've
>>> added  helper. It helps to avoid some problems. Pair-inheritance
>>> helps to avoid another problems. I understand, that there still may
>>> other, not-covered problems, but better to be as safe as possible. And
>>> inheritance here is native and correct thing to do, even with our 
>>> additional helper. What do you think?
>>
>> I wouldn't call it correct.  It's still unreliable, but less so than
>> without the function name constraint.  That makes it less wrong.
>
> Agree.
>
>>
>> 100% reliable would be nice, but not at any cost.  Something we're
>> reasonably confident to get right should be good enough.
>>
>> To be confident, we need to understand the script's limitations, and how
>> to compensate for them.  I figure we do now.  You too?
>>
>
> I will not be surprised, if we missed some more interesting cases :)
> But we should proceed. What is our plan? Will you queue v10 for 5.1?

v10's PATCH 1+2 look ready.  The error.h comment update could perhaps
use some polish; I've focused my attention elsewhere.

PATCH 8-9 are generated.  They should never be rebased, always be
regenerated.  We compare regenerated patches to posted ones to make sure
they are still sane, and the R-bys are still valid.  I can take care of
the comparing.

I'd like to have a pull request ready when the tree reopens for general
development.  Let's use the time until then to get more generated
patches out for review.

If I queue up patches in my tree, we shift the responsibility for
regenerating patches from you to me, and create a coordination issue:
you'll want to base patch submissions on the branch I use to queue this
work, and that's going to be awkward when I rebase / regenerate that
branch.  I think it's simpler to queue up in your tree until we're ready
for a pull request.

When you post more patches, use

Based-on: <20200317151625.20797-1-vsement...@virtuozzo.com>

so that Patchew applies them on top of this series.  Hmm, probably won't
do, as PATCH 9 already conflicts.

You could instead repost PATCH 1+2 with each batch.  I hope that's not
too confusing.

I trust you'll keep providing a tag reviewers can pull.

I suggest to ask maintainers to leave merging these patches to me, in
cover letters.

Makes sense?




Re: [PATCH v3 2/2] lockable: replaced locks with lock guard macros where appropriate

2020-03-20 Thread Dr. David Alan Gilbert
* Paolo Bonzini (pbonz...@redhat.com) wrote:
> On 20/03/20 13:56, Dr. David Alan Gilbert wrote:
> >> According to https://wiki.qemu.org/ToDo/LockGuards cases that are trivial 
> >> (no
> >> conditional logic) shouldn't be replaced.
> > OK
> 
> I don't think that has to be either-or.  Trivial lock/unlock sequences
> are not the first ones that should be converted, but there's an
> advantage in having a single patch that converts all possible uses of a
> lock.  Trivial sequences certainly do not belong in a bigger patch like
> this, as they would make the patch even bigger.
> 
> > So for what you've already got there,
> > 
> > For migration:
> > Acked-by: Dr. David Alan Gilbert 
> > 
> 
> Can you just extract that and queue it yourself (for 5.1 probably)?


I can, although it would be easier if Daniel did that; there's no rush
given it's for 5.1

> Paolo
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v3 2/2] lockable: replaced locks with lock guard macros where appropriate

2020-03-20 Thread Paolo Bonzini
On 20/03/20 13:56, Dr. David Alan Gilbert wrote:
>> According to https://wiki.qemu.org/ToDo/LockGuards cases that are trivial (no
>> conditional logic) shouldn't be replaced.
> OK

I don't think that has to be either-or.  Trivial lock/unlock sequences
are not the first ones that should be converted, but there's an
advantage in having a single patch that converts all possible uses of a
lock.  Trivial sequences certainly do not belong in a bigger patch like
this, as they would make the patch even bigger.

> So for what you've already got there,
> 
> For migration:
> Acked-by: Dr. David Alan Gilbert 
> 

Can you just extract that and queue it yourself (for 5.1 probably)?

Paolo




Re: [PATCH v3 2/2] lockable: replaced locks with lock guard macros where appropriate

2020-03-20 Thread Dr. David Alan Gilbert
* Daniel Brodsky (dnbrd...@gmail.com) wrote:
> On Fri, Mar 20, 2020 at 5:34 AM Dr. David Alan Gilbert
>  wrote:
> >
> > * dnbrd...@gmail.com (dnbrd...@gmail.com) wrote:
> > > From: Daniel Brodsky 
> > >
> > > - ran regexp "qemu_mutex_lock\(.*\).*\n.*if" to find targets
> > > - replaced result with QEMU_LOCK_GUARD if all unlocks at function end
> > > - replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end
> > >
> > > Signed-off-by: Daniel Brodsky 
> > > ---
> > >  block/iscsi.c | 11 +++---
> > >  block/nfs.c   | 51 ---
> > >  cpus-common.c | 13 +--
> > >  hw/display/qxl.c  | 43 +---
> > >  hw/vfio/platform.c|  5 ++---
> > >  migration/migration.c |  3 +--
> > >  migration/multifd.c   |  8 +++
> > >  migration/ram.c   |  3 +--
> > >  monitor/misc.c|  4 +---
> > >  ui/spice-display.c| 14 ++--
> > >  util/log.c|  4 ++--
> > >  util/qemu-timer.c | 17 +++
> > >  util/rcu.c|  8 +++
> > >  util/thread-pool.c|  3 +--
> > >  util/vfio-helpers.c   |  4 ++--
> > >  15 files changed, 84 insertions(+), 107 deletions(-)
> > >
> >
> > 
> >
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index c1d88ace7f..2f0bd6d8b4 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -1652,11 +1652,10 @@ static void migrate_fd_cleanup_bh(void *opaque)
> > >
> > >  void migrate_set_error(MigrationState *s, const Error *error)
> > >  {
> > > -qemu_mutex_lock(>error_mutex);
> > > +QEMU_LOCK_GUARD(>error_mutex);
> > >  if (!s->error) {
> > >  s->error = error_copy(error);
> > >  }
> > > -qemu_mutex_unlock(>error_mutex);
> > >  }
> >
> > There are some other places in migration.c that would really benefit;
> > for example, migrate_send_rp_message, if you use a LOCK_QUARD
> > there, then you can remove the 'int ret', and the goto error.
> > In postcopy_pause, the locks on qemu_file_lock would work well using the
> > WITH_QEMU_LOCK_GUARD.
> 
> I did a basic pass through for targets and that one didn't come up. I can add
> more replacements later, but there are ~300 mutex locks that might be worth
> replacing and going through them manually in one go is too tedious.

Sure; the send_rp_message case is quite a nice example of where the
guard makes the code simpler.

> > >  void migrate_fd_error(MigrationState *s, const Error *error)
> > > diff --git a/migration/multifd.c b/migration/multifd.c
> > > index cb6a4a3ab8..9123c111a3 100644
> > > --- a/migration/multifd.c
> > > +++ b/migration/multifd.c
> > > @@ -894,11 +894,11 @@ void multifd_recv_sync_main(void)
> > >  for (i = 0; i < migrate_multifd_channels(); i++) {
> > >  MultiFDRecvParams *p = _recv_state->params[i];
> > >
> > > -qemu_mutex_lock(>mutex);
> > > -if (multifd_recv_state->packet_num < p->packet_num) {
> > > -multifd_recv_state->packet_num = p->packet_num;
> > > +WITH_QEMU_LOCK_GUARD(>mutex) {
> > > +if (multifd_recv_state->packet_num < p->packet_num) {
> > > +multifd_recv_state->packet_num = p->packet_num;
> > > +}
> > >  }
> > > -qemu_mutex_unlock(>mutex);
> > >  trace_multifd_recv_sync_main_signal(p->id);
> > >  qemu_sem_post(>sem_sync);
> > >  }
> >
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index c12cfdbe26..87a670cfbf 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -1368,7 +1368,7 @@ static RAMBlock *unqueue_page(RAMState *rs, 
> > > ram_addr_t *offset)
> > >  return NULL;
> > >  }
> > >
> > > -qemu_mutex_lock(>src_page_req_mutex);
> > > +QEMU_LOCK_GUARD(>src_page_req_mutex);
> > >  if (!QSIMPLEQ_EMPTY(>src_page_requests)) {
> > >  struct RAMSrcPageRequest *entry =
> > >  QSIMPLEQ_FIRST(>src_page_requests);
> > > @@ -1385,7 +1385,6 @@ static RAMBlock *unqueue_page(RAMState *rs, 
> > > ram_addr_t *offset)
> > >  migration_consume_urgent_request();
> > >  }
> > >  }
> > > -qemu_mutex_unlock(>src_page_req_mutex);
> > >
> > >  return block;
> > >  }
> >
> > Is there a reason you didn't do the matching pair at the bottom of
> > ram_save_queue_pages ?
> >
> > Dave
> >
> 
> According to https://wiki.qemu.org/ToDo/LockGuards cases that are trivial (no
> conditional logic) shouldn't be replaced.

OK

So for what you've already got there,

For migration:
Acked-by: Dr. David Alan Gilbert 

> > > diff --git a/monitor/misc.c b/monitor/misc.c
> > > index 6c45fa490f..9723b466cd 100644
> > > --- a/monitor/misc.c
> > > +++ b/monitor/misc.c
> > > @@ -1473,7 +1473,7 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool 
> > > has_fdset_id, int64_t fdset_id,
> > >  MonFdsetFd *mon_fdset_fd;
> > >  AddfdInfo *fdinfo;
> > >
> > > -qemu_mutex_lock(_fdsets_lock);
> 

Re: [PATCH v3 2/2] lockable: replaced locks with lock guard macros where appropriate

2020-03-20 Thread Daniel Brodsky
On Fri, Mar 20, 2020 at 5:34 AM Dr. David Alan Gilbert
 wrote:
>
> * dnbrd...@gmail.com (dnbrd...@gmail.com) wrote:
> > From: Daniel Brodsky 
> >
> > - ran regexp "qemu_mutex_lock\(.*\).*\n.*if" to find targets
> > - replaced result with QEMU_LOCK_GUARD if all unlocks at function end
> > - replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end
> >
> > Signed-off-by: Daniel Brodsky 
> > ---
> >  block/iscsi.c | 11 +++---
> >  block/nfs.c   | 51 ---
> >  cpus-common.c | 13 +--
> >  hw/display/qxl.c  | 43 +---
> >  hw/vfio/platform.c|  5 ++---
> >  migration/migration.c |  3 +--
> >  migration/multifd.c   |  8 +++
> >  migration/ram.c   |  3 +--
> >  monitor/misc.c|  4 +---
> >  ui/spice-display.c| 14 ++--
> >  util/log.c|  4 ++--
> >  util/qemu-timer.c | 17 +++
> >  util/rcu.c|  8 +++
> >  util/thread-pool.c|  3 +--
> >  util/vfio-helpers.c   |  4 ++--
> >  15 files changed, 84 insertions(+), 107 deletions(-)
> >
>
> 
>
> > diff --git a/migration/migration.c b/migration/migration.c
> > index c1d88ace7f..2f0bd6d8b4 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1652,11 +1652,10 @@ static void migrate_fd_cleanup_bh(void *opaque)
> >
> >  void migrate_set_error(MigrationState *s, const Error *error)
> >  {
> > -qemu_mutex_lock(>error_mutex);
> > +QEMU_LOCK_GUARD(>error_mutex);
> >  if (!s->error) {
> >  s->error = error_copy(error);
> >  }
> > -qemu_mutex_unlock(>error_mutex);
> >  }
>
> There are some other places in migration.c that would really benefit;
> for example, migrate_send_rp_message, if you use a LOCK_QUARD
> there, then you can remove the 'int ret', and the goto error.
> In postcopy_pause, the locks on qemu_file_lock would work well using the
> WITH_QEMU_LOCK_GUARD.

I did a basic pass looking for targets and that one didn't come up. I can add
more replacements later, but there are ~300 mutex locks that might be worth
replacing, and going through them manually in one go is too tedious.

> >  void migrate_fd_error(MigrationState *s, const Error *error)
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index cb6a4a3ab8..9123c111a3 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -894,11 +894,11 @@ void multifd_recv_sync_main(void)
> >  for (i = 0; i < migrate_multifd_channels(); i++) {
> >  MultiFDRecvParams *p = _recv_state->params[i];
> >
> > -qemu_mutex_lock(>mutex);
> > -if (multifd_recv_state->packet_num < p->packet_num) {
> > -multifd_recv_state->packet_num = p->packet_num;
> > +WITH_QEMU_LOCK_GUARD(>mutex) {
> > +if (multifd_recv_state->packet_num < p->packet_num) {
> > +multifd_recv_state->packet_num = p->packet_num;
> > +}
> >  }
> > -qemu_mutex_unlock(>mutex);
> >  trace_multifd_recv_sync_main_signal(p->id);
> >  qemu_sem_post(>sem_sync);
> >  }
>
> > diff --git a/migration/ram.c b/migration/ram.c
> > index c12cfdbe26..87a670cfbf 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1368,7 +1368,7 @@ static RAMBlock *unqueue_page(RAMState *rs, 
> > ram_addr_t *offset)
> >  return NULL;
> >  }
> >
> > -qemu_mutex_lock(>src_page_req_mutex);
> > +QEMU_LOCK_GUARD(>src_page_req_mutex);
> >  if (!QSIMPLEQ_EMPTY(>src_page_requests)) {
> >  struct RAMSrcPageRequest *entry =
> >  QSIMPLEQ_FIRST(>src_page_requests);
> > @@ -1385,7 +1385,6 @@ static RAMBlock *unqueue_page(RAMState *rs, 
> > ram_addr_t *offset)
> >  migration_consume_urgent_request();
> >  }
> >  }
> > -qemu_mutex_unlock(>src_page_req_mutex);
> >
> >  return block;
> >  }
>
> Is there a reason you didn't do the matching pair at the bottom of
> ram_save_queue_pages ?
>
> Dave
>

According to https://wiki.qemu.org/ToDo/LockGuards cases that are trivial (no
conditional logic) shouldn't be replaced.

> > diff --git a/monitor/misc.c b/monitor/misc.c
> > index 6c45fa490f..9723b466cd 100644
> > --- a/monitor/misc.c
> > +++ b/monitor/misc.c
> > @@ -1473,7 +1473,7 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool 
> > has_fdset_id, int64_t fdset_id,
> >  MonFdsetFd *mon_fdset_fd;
> >  AddfdInfo *fdinfo;
> >
> > -qemu_mutex_lock(_fdsets_lock);
> > +QEMU_LOCK_GUARD(_fdsets_lock);
> >  if (has_fdset_id) {
> >  QLIST_FOREACH(mon_fdset, _fdsets, next) {
> >  /* Break if match found or match impossible due to ordering by 
> > ID */
> > @@ -1494,7 +1494,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool 
> > has_fdset_id, int64_t fdset_id,
> >  if (fdset_id < 0) {
> >  error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "fdset-id",
> > "a 

[PATCH v4 2/2] lockable: replaced locks with lock guard macros where appropriate

2020-03-20 Thread dnbrdsky
From: Daniel Brodsky 

- ran regexp "qemu_mutex_lock\(.*\).*\n.*if" to find targets
- replaced result with QEMU_LOCK_GUARD if all unlocks at function end
- replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end

Signed-off-by: Daniel Brodsky 
---
 block/iscsi.c |  7 ++
 block/nfs.c   | 51 ---
 cpus-common.c | 14 +---
 hw/display/qxl.c  | 43 +---
 hw/vfio/platform.c|  5 ++---
 migration/migration.c |  3 +--
 migration/multifd.c   |  8 +++
 migration/ram.c   |  3 +--
 monitor/misc.c|  4 +---
 ui/spice-display.c| 14 ++--
 util/log.c|  4 ++--
 util/qemu-timer.c | 17 +++
 util/rcu.c|  8 +++
 util/thread-pool.c|  3 +--
 util/vfio-helpers.c   |  5 ++---
 15 files changed, 83 insertions(+), 106 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 682abd8e09..f86a53c096 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1395,20 +1395,17 @@ static void iscsi_nop_timed_event(void *opaque)
 {
 IscsiLun *iscsilun = opaque;
 
-qemu_mutex_lock(&iscsilun->mutex);
+QEMU_LOCK_GUARD(&iscsilun->mutex);
 if (iscsi_get_nops_in_flight(iscsilun->iscsi) >= MAX_NOP_FAILURES) {
 error_report("iSCSI: NOP timeout. Reconnecting...");
 iscsilun->request_timed_out = true;
 } else if (iscsi_nop_out_async(iscsilun->iscsi, NULL, NULL, 0, NULL) != 0) 
{
 error_report("iSCSI: failed to sent NOP-Out. Disabling NOP messages.");
-goto out;
+return;
 }
 
 timer_mod(iscsilun->nop_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 
NOP_INTERVAL);
 iscsi_set_events(iscsilun);
-
-out:
-qemu_mutex_unlock(&iscsilun->mutex);
 }
 
 static void iscsi_readcapacity_sync(IscsiLun *iscsilun, Error **errp)
diff --git a/block/nfs.c b/block/nfs.c
index 9a6311e270..09a78aede8 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -273,15 +273,14 @@ static int coroutine_fn nfs_co_preadv(BlockDriverState 
*bs, uint64_t offset,
 nfs_co_init_task(bs, &task);
 task.iov = iov;
 
-qemu_mutex_lock(&client->mutex);
-if (nfs_pread_async(client->context, client->fh,
-offset, bytes, nfs_co_generic_cb, &task) != 0) {
-qemu_mutex_unlock(&client->mutex);
-return -ENOMEM;
-}
+WITH_QEMU_LOCK_GUARD(&client->mutex) {
+if (nfs_pread_async(client->context, client->fh,
+offset, bytes, nfs_co_generic_cb, &task) != 0) {
+return -ENOMEM;
+}

-nfs_set_events(client);
-qemu_mutex_unlock(&client->mutex);
+nfs_set_events(client);
+}
 while (!task.complete) {
 qemu_coroutine_yield();
 }
@@ -320,19 +319,18 @@ static int coroutine_fn nfs_co_pwritev(BlockDriverState 
*bs, uint64_t offset,
 buf = iov->iov[0].iov_base;
 }
 
-qemu_mutex_lock(&client->mutex);
-if (nfs_pwrite_async(client->context, client->fh,
- offset, bytes, buf,
- nfs_co_generic_cb, &task) != 0) {
-qemu_mutex_unlock(&client->mutex);
-if (my_buffer) {
-g_free(buf);
+WITH_QEMU_LOCK_GUARD(&client->mutex) {
+if (nfs_pwrite_async(client->context, client->fh,
+ offset, bytes, buf,
+ nfs_co_generic_cb, &task) != 0) {
+if (my_buffer) {
+g_free(buf);
+}
+return -ENOMEM;
 }
-return -ENOMEM;
-}

-nfs_set_events(client);
-qemu_mutex_unlock(&client->mutex);
+nfs_set_events(client);
+}
 while (!task.complete) {
 qemu_coroutine_yield();
 }
@@ -355,15 +353,14 @@ static int coroutine_fn nfs_co_flush(BlockDriverState *bs)
 
 nfs_co_init_task(bs, &task);
 
-qemu_mutex_lock(&client->mutex);
-if (nfs_fsync_async(client->context, client->fh, nfs_co_generic_cb,
-&task) != 0) {
-qemu_mutex_unlock(&client->mutex);
-return -ENOMEM;
-}
+WITH_QEMU_LOCK_GUARD(&client->mutex) {
+if (nfs_fsync_async(client->context, client->fh, nfs_co_generic_cb,
+&task) != 0) {
+return -ENOMEM;
+}

-nfs_set_events(client);
-qemu_mutex_unlock(&client->mutex);
+nfs_set_events(client);
+}
 while (!task.complete) {
 qemu_coroutine_yield();
 }
diff --git a/cpus-common.c b/cpus-common.c
index eaf590cb38..55d5df8923 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -22,6 +22,7 @@
 #include "exec/cpu-common.h"
 #include "hw/core/cpu.h"
 #include "sysemu/cpus.h"
+#include "qemu/lockable.h"
 
 static QemuMutex qemu_cpu_list_lock;
 static QemuCond exclusive_cond;
@@ -71,7 +72,7 @@ static int cpu_get_free_index(void)
 
 void cpu_list_add(CPUState *cpu)
 {
-qemu_mutex_lock(&qemu_cpu_list_lock);
+QEMU_LOCK_GUARD(&qemu_cpu_list_lock);
 if (cpu->cpu_index == UNASSIGNED_CPU_INDEX) {
 cpu->cpu_index = cpu_get_free_index();
 assert(cpu->cpu_index != UNASSIGNED_CPU_INDEX);
@@ -79,15 +80,13 @@ void 

Re: [PATCH v3 2/2] lockable: replaced locks with lock guard macros where appropriate

2020-03-20 Thread Dr. David Alan Gilbert
* dnbrd...@gmail.com (dnbrd...@gmail.com) wrote:
> From: Daniel Brodsky 
> 
> - ran regexp "qemu_mutex_lock\(.*\).*\n.*if" to find targets
> - replaced result with QEMU_LOCK_GUARD if all unlocks at function end
> - replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end
> 
> Signed-off-by: Daniel Brodsky 
> ---
>  block/iscsi.c | 11 +++---
>  block/nfs.c   | 51 ---
>  cpus-common.c | 13 +--
>  hw/display/qxl.c  | 43 +---
>  hw/vfio/platform.c|  5 ++---
>  migration/migration.c |  3 +--
>  migration/multifd.c   |  8 +++
>  migration/ram.c   |  3 +--
>  monitor/misc.c|  4 +---
>  ui/spice-display.c| 14 ++--
>  util/log.c|  4 ++--
>  util/qemu-timer.c | 17 +++
>  util/rcu.c|  8 +++
>  util/thread-pool.c|  3 +--
>  util/vfio-helpers.c   |  4 ++--
>  15 files changed, 84 insertions(+), 107 deletions(-)
> 



> diff --git a/migration/migration.c b/migration/migration.c
> index c1d88ace7f..2f0bd6d8b4 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1652,11 +1652,10 @@ static void migrate_fd_cleanup_bh(void *opaque)
>  
>  void migrate_set_error(MigrationState *s, const Error *error)
>  {
> -qemu_mutex_lock(>error_mutex);
> +QEMU_LOCK_GUARD(>error_mutex);
>  if (!s->error) {
>  s->error = error_copy(error);
>  }
> -qemu_mutex_unlock(>error_mutex);
>  }

There are some other places in migration.c that would really benefit;
for example, migrate_send_rp_message: if you use a LOCK_GUARD
there, then you can remove the 'int ret' and the goto error.
In postcopy_pause, the locks on qemu_file_lock would work well using
WITH_QEMU_LOCK_GUARD.
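
The shape being suggested here is roughly the following (a sketch of the
pattern only, not the actual migrate_send_rp_message() body; it assumes the
rp_mutex the function already takes):

    WITH_QEMU_LOCK_GUARD(&mis->rp_mutex) {
        if (some_error_condition) {      /* placeholder for the real checks */
            return -EIO;                 /* guard drops the lock on every exit path */
        }
        /* ... build and send the message while holding the lock ... */
    }
    return 0;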

>  void migrate_fd_error(MigrationState *s, const Error *error)
> diff --git a/migration/multifd.c b/migration/multifd.c
> index cb6a4a3ab8..9123c111a3 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -894,11 +894,11 @@ void multifd_recv_sync_main(void)
>  for (i = 0; i < migrate_multifd_channels(); i++) {
>  MultiFDRecvParams *p = _recv_state->params[i];
>  
> -qemu_mutex_lock(>mutex);
> -if (multifd_recv_state->packet_num < p->packet_num) {
> -multifd_recv_state->packet_num = p->packet_num;
> +WITH_QEMU_LOCK_GUARD(>mutex) {
> +if (multifd_recv_state->packet_num < p->packet_num) {
> +multifd_recv_state->packet_num = p->packet_num;
> +}
>  }
> -qemu_mutex_unlock(>mutex);
>  trace_multifd_recv_sync_main_signal(p->id);
>  qemu_sem_post(>sem_sync);
>  }

> diff --git a/migration/ram.c b/migration/ram.c
> index c12cfdbe26..87a670cfbf 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1368,7 +1368,7 @@ static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t 
> *offset)
>  return NULL;
>  }
>  
> -qemu_mutex_lock(>src_page_req_mutex);
> +QEMU_LOCK_GUARD(>src_page_req_mutex);
>  if (!QSIMPLEQ_EMPTY(>src_page_requests)) {
>  struct RAMSrcPageRequest *entry =
>  QSIMPLEQ_FIRST(>src_page_requests);
> @@ -1385,7 +1385,6 @@ static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t 
> *offset)
>  migration_consume_urgent_request();
>  }
>  }
> -qemu_mutex_unlock(>src_page_req_mutex);
>  
>  return block;
>  }

Is there a reason you didn't do the matching pair at the bottom of
ram_save_queue_pages?

Dave

> diff --git a/monitor/misc.c b/monitor/misc.c
> index 6c45fa490f..9723b466cd 100644
> --- a/monitor/misc.c
> +++ b/monitor/misc.c
> @@ -1473,7 +1473,7 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool 
> has_fdset_id, int64_t fdset_id,
>  MonFdsetFd *mon_fdset_fd;
>  AddfdInfo *fdinfo;
>  
> -qemu_mutex_lock(_fdsets_lock);
> +QEMU_LOCK_GUARD(_fdsets_lock);
>  if (has_fdset_id) {
>  QLIST_FOREACH(mon_fdset, _fdsets, next) {
>  /* Break if match found or match impossible due to ordering by 
> ID */
> @@ -1494,7 +1494,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool 
> has_fdset_id, int64_t fdset_id,
>  if (fdset_id < 0) {
>  error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "fdset-id",
> "a non-negative value");
> -qemu_mutex_unlock(_fdsets_lock);
>  return NULL;
>  }
>  /* Use specified fdset ID */
> @@ -1545,7 +1544,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool 
> has_fdset_id, int64_t fdset_id,
>  fdinfo->fdset_id = mon_fdset->id;
>  fdinfo->fd = mon_fdset_fd->fd;
>  
> -qemu_mutex_unlock(_fdsets_lock);
>  return fdinfo;
>  }
>  
> diff --git a/ui/spice-display.c b/ui/spice-display.c
> index 6babe24909..19632fdf6c 100644
> --- a/ui/spice-display.c
> +++ b/ui/spice-display.c
> @@ -18,6 +18,7 @@
>  #include "qemu/osdep.h"
>  #include 

Re: [PATCH v2 2/2] lockable: replaced locks with lock guard macros where appropriate

2020-03-20 Thread Daniel Brodsky
On Fri, Mar 20, 2020 at 5:06 AM Paolo Bonzini  wrote:
>
> On 20/03/20 00:34, dnbrd...@gmail.com wrote:
> > index 682abd8e09..89f8a656a4 100644
> > --- a/block/iscsi.c
> > +++ b/block/iscsi.c
> > @@ -1086,7 +1086,7 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState 
> > *bs,
> >  acb->task->expxferlen = acb->ioh->dxfer_len;
> >
> >  data.size = 0;
> > -qemu_mutex_lock(>mutex);
> > +QEMU_LOCK_GUARD(>mutex);
> >  if (acb->task->xfer_dir == SCSI_XFER_WRITE) {
> >  if (acb->ioh->iovec_count == 0) {
> >  data.data = acb->ioh->dxferp;
> > @@ -1102,7 +1102,6 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState 
> > *bs,
> >   iscsi_aio_ioctl_cb,
> >   (data.size > 0) ?  : NULL,
> >   acb) != 0) {
> > -qemu_mutex_unlock(>mutex);
> >  scsi_free_scsi_task(acb->task);
> >  qemu_aio_unref(acb);
> >  return NULL;
>
> Not exactly the same, should be okay but it also should be noted in the
> changelog.

I'm going to drop this change in the next version; I don't want this patch to
include cases with possible side effects, as I skipped other ones like this
already.

> >  void cpu_list_remove(CPUState *cpu)
> >  {
> > -qemu_mutex_lock(_cpu_list_lock);
> > +QEMU_LOCK_GUARD(_cpu_list_lock);
> >  if (!QTAILQ_IN_USE(cpu, node)) {
> >  /* there is nothing to undo since cpu_exec_init() hasn't been 
> > called */
> >  qemu_mutex_unlock(_cpu_list_lock);
>
>
> Missed unlock.
>
> Otherwise looks good.
>
> Paolo
>
Thanks for the review; I'll address the issues you pointed out in the next version.

Daniel



[PATCH v3 2/2] lockable: replaced locks with lock guard macros where appropriate

2020-03-20 Thread dnbrdsky
From: Daniel Brodsky 

- ran regexp "qemu_mutex_lock\(.*\).*\n.*if" to find targets
- replaced result with QEMU_LOCK_GUARD if all unlocks at function end
- replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end
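
As a generic illustration of the two conversions listed above (a sketch, not a
hunk from this series; do_something_locked()/do_something_unlocked() are
placeholders and the mutex is assumed to be initialized elsewhere):

    static QemuMutex lock;
    static void do_something_locked(void);
    static void do_something_unlocked(void);

    static void all_unlocks_at_end(void)
    {
        QEMU_LOCK_GUARD(&lock);          /* released when the function returns */
        do_something_locked();
    }

    static void unlock_before_end(void)
    {
        WITH_QEMU_LOCK_GUARD(&lock) {    /* released at the end of this block */
            do_something_locked();
        }
        do_something_unlocked();
    }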

Signed-off-by: Daniel Brodsky 
---
 block/iscsi.c | 11 +++---
 block/nfs.c   | 51 ---
 cpus-common.c | 13 +--
 hw/display/qxl.c  | 43 +---
 hw/vfio/platform.c|  5 ++---
 migration/migration.c |  3 +--
 migration/multifd.c   |  8 +++
 migration/ram.c   |  3 +--
 monitor/misc.c|  4 +---
 ui/spice-display.c| 14 ++--
 util/log.c|  4 ++--
 util/qemu-timer.c | 17 +++
 util/rcu.c|  8 +++
 util/thread-pool.c|  3 +--
 util/vfio-helpers.c   |  4 ++--
 15 files changed, 84 insertions(+), 107 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 682abd8e09..89f8a656a4 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1086,7 +1086,7 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
 acb->task->expxferlen = acb->ioh->dxfer_len;
 
 data.size = 0;
-qemu_mutex_lock(&iscsilun->mutex);
+QEMU_LOCK_GUARD(&iscsilun->mutex);
 if (acb->task->xfer_dir == SCSI_XFER_WRITE) {
 if (acb->ioh->iovec_count == 0) {
 data.data = acb->ioh->dxferp;
@@ -1102,7 +1102,6 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
  iscsi_aio_ioctl_cb,
  (data.size > 0) ? &data : NULL,
  acb) != 0) {
-qemu_mutex_unlock(&iscsilun->mutex);
 scsi_free_scsi_task(acb->task);
 qemu_aio_unref(acb);
 return NULL;
@@ -1122,7 +1121,6 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
 }
 
 iscsi_set_events(iscsilun);
-qemu_mutex_unlock(&iscsilun->mutex);
 
 return &acb->common;
 }
@@ -1395,20 +1393,17 @@ static void iscsi_nop_timed_event(void *opaque)
 {
 IscsiLun *iscsilun = opaque;
 
-qemu_mutex_lock(&iscsilun->mutex);
+QEMU_LOCK_GUARD(&iscsilun->mutex);
 if (iscsi_get_nops_in_flight(iscsilun->iscsi) >= MAX_NOP_FAILURES) {
 error_report("iSCSI: NOP timeout. Reconnecting...");
 iscsilun->request_timed_out = true;
 } else if (iscsi_nop_out_async(iscsilun->iscsi, NULL, NULL, 0, NULL) != 0) 
{
 error_report("iSCSI: failed to sent NOP-Out. Disabling NOP messages.");
-goto out;
+return;
 }
 
 timer_mod(iscsilun->nop_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 
NOP_INTERVAL);
 iscsi_set_events(iscsilun);
-
-out:
-qemu_mutex_unlock(&iscsilun->mutex);
 }
 
 static void iscsi_readcapacity_sync(IscsiLun *iscsilun, Error **errp)
diff --git a/block/nfs.c b/block/nfs.c
index 9a6311e270..09a78aede8 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -273,15 +273,14 @@ static int coroutine_fn nfs_co_preadv(BlockDriverState 
*bs, uint64_t offset,
 nfs_co_init_task(bs, &task);
 task.iov = iov;
 
-qemu_mutex_lock(&client->mutex);
-if (nfs_pread_async(client->context, client->fh,
-offset, bytes, nfs_co_generic_cb, &task) != 0) {
-qemu_mutex_unlock(&client->mutex);
-return -ENOMEM;
-}
+WITH_QEMU_LOCK_GUARD(&client->mutex) {
+if (nfs_pread_async(client->context, client->fh,
+offset, bytes, nfs_co_generic_cb, &task) != 0) {
+return -ENOMEM;
+}
 
-nfs_set_events(client);
-qemu_mutex_unlock(&client->mutex);
+nfs_set_events(client);
+}
 while (!task.complete) {
 qemu_coroutine_yield();
 }
@@ -320,19 +319,18 @@ static int coroutine_fn nfs_co_pwritev(BlockDriverState 
*bs, uint64_t offset,
 buf = iov->iov[0].iov_base;
 }
 
-qemu_mutex_lock(&client->mutex);
-if (nfs_pwrite_async(client->context, client->fh,
- offset, bytes, buf,
- nfs_co_generic_cb, &task) != 0) {
-qemu_mutex_unlock(&client->mutex);
-if (my_buffer) {
-g_free(buf);
+WITH_QEMU_LOCK_GUARD(&client->mutex) {
+if (nfs_pwrite_async(client->context, client->fh,
+ offset, bytes, buf,
+ nfs_co_generic_cb, &task) != 0) {
+if (my_buffer) {
+g_free(buf);
+}
+return -ENOMEM;
 }
-return -ENOMEM;
-}
 
-nfs_set_events(client);
-qemu_mutex_unlock(&client->mutex);
+nfs_set_events(client);
+}
 while (!task.complete) {
 qemu_coroutine_yield();
 }
@@ -355,15 +353,14 @@ static int coroutine_fn nfs_co_flush(BlockDriverState *bs)
 
 nfs_co_init_task(bs, &task);
 
-qemu_mutex_lock(&client->mutex);
-if (nfs_fsync_async(client->context, client->fh, nfs_co_generic_cb,
-&task) != 0) {
-qemu_mutex_unlock(&client->mutex);
-return -ENOMEM;
-}
+WITH_QEMU_LOCK_GUARD(&client->mutex) {
+if (nfs_fsync_async(client->context, client->fh, nfs_co_generic_cb,
+ 

Re: [PATCH v2 2/2] lockable: replaced locks with lock guard macros where appropriate

2020-03-20 Thread Paolo Bonzini
On 20/03/20 00:34, dnbrd...@gmail.com wrote:
> index 682abd8e09..89f8a656a4 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -1086,7 +1086,7 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
>  acb->task->expxferlen = acb->ioh->dxfer_len;
>  
>  data.size = 0;
> -qemu_mutex_lock(&iscsilun->mutex);
> +QEMU_LOCK_GUARD(&iscsilun->mutex);
>  if (acb->task->xfer_dir == SCSI_XFER_WRITE) {
>  if (acb->ioh->iovec_count == 0) {
>  data.data = acb->ioh->dxferp;
> @@ -1102,7 +1102,6 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
>   iscsi_aio_ioctl_cb,
>   (data.size > 0) ? &data : NULL,
>   acb) != 0) {
> -qemu_mutex_unlock(&iscsilun->mutex);
>  scsi_free_scsi_task(acb->task);
>  qemu_aio_unref(acb);
>  return NULL;

Not exactly the same; it should be okay, but it should also be noted in the
changelog.
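
The difference being pointed out, spelled out as a hedged sketch: with
QEMU_LOCK_GUARD the mutex is no longer dropped before the error-path cleanup;
it is only released when the function returns.  The surrounding call name is
reconstructed from block/iscsi.c and is not shown in the hunk above:

    if (iscsi_scsi_command_async(iscsilun->iscsi, iscsilun->lun, acb->task,
                                 iscsi_aio_ioctl_cb,
                                 (data.size > 0) ? &data : NULL,
                                 acb) != 0) {
        /* these now run with iscsilun->mutex still held ... */
        scsi_free_scsi_task(acb->task);
        qemu_aio_unref(acb);
        return NULL;   /* ... and the guard releases the mutex here */
    }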

>  void cpu_list_remove(CPUState *cpu)
>  {
> -qemu_mutex_lock(&qemu_cpu_list_lock);
> +QEMU_LOCK_GUARD(&qemu_cpu_list_lock);
>  if (!QTAILQ_IN_USE(cpu, node)) {
>  /* there is nothing to undo since cpu_exec_init() hasn't been called 
> */
>  qemu_mutex_unlock(&qemu_cpu_list_lock);

Missed unlock.
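
To spell out the problem: the leftover qemu_mutex_unlock() on the early-return
path would now pair with the automatic release done by QEMU_LOCK_GUARD when the
function returns, i.e. the lock would be dropped twice on that path.  A hedged
sketch of the corrected function, based on the hunk quoted above (remainder
elided):

    void cpu_list_remove(CPUState *cpu)
    {
        QEMU_LOCK_GUARD(&qemu_cpu_list_lock);
        if (!QTAILQ_IN_USE(cpu, node)) {
            /* there is nothing to undo since cpu_exec_init() hasn't been called */
            return;   /* the guard releases qemu_cpu_list_lock here */
        }
        /* ... rest of the function unchanged ... */
    }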

> 
> @@ -667,14 +668,13 @@ int qemu_vfio_dma_reset_temporary(QEMUVFIOState *s)
>  .size = QEMU_VFIO_IOVA_MAX - s->high_water_mark,
>  };
>  trace_qemu_vfio_dma_reset_temporary(s);
> -qemu_mutex_lock(&s->lock);
> +QEMU_LOCK_GUARD(&s->lock);
>  if (ioctl(s->container, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
>  error_report("VFIO_UNMAP_DMA failed: %s", strerror(errno));
>  qemu_mutex_unlock(&s->lock);
>  return -errno;
>  }
>  s->high_water_mark = QEMU_VFIO_IOVA_MAX;
> -qemu_mutex_unlock(&s->lock);
>  return 0;

Missed unlock.

Otherwise looks good.

Paolo




Re: [PATCH RESEND v3 2/4] virtio-scsi: default num_queues to -smp N

2020-03-20 Thread Cornelia Huck
On Fri, 20 Mar 2020 10:30:39 +
Stefan Hajnoczi  wrote:

> Automatically size the number of virtio-scsi-pci, vhost-scsi-pci, and
> vhost-user-scsi-pci request virtqueues to match the number of vCPUs.
> Other transports continue to default to 1 request virtqueue.
> 
> A 1:1 virtqueue:vCPU mapping ensures that completion interrupts are
> handled on the same vCPU that submitted the request.  No IPI is
> necessary to complete an I/O request and performance is improved.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  hw/core/machine.c   |  3 +++
>  hw/scsi/vhost-scsi.c|  3 ++-
>  hw/scsi/vhost-user-scsi.c   |  3 ++-
>  hw/scsi/virtio-scsi.c   |  6 +-
>  hw/virtio/vhost-scsi-pci.c  | 10 --
>  hw/virtio/vhost-user-scsi-pci.c | 10 --
>  hw/virtio/virtio-scsi-pci.c | 10 --
>  include/hw/virtio/virtio-scsi.h |  2 ++
>  8 files changed, 38 insertions(+), 9 deletions(-)

(...)

> diff --git a/hw/virtio/vhost-scsi-pci.c b/hw/virtio/vhost-scsi-pci.c
> index 5dce640eaf..a0b7cdc1ac 100644
> --- a/hw/virtio/vhost-scsi-pci.c
> +++ b/hw/virtio/vhost-scsi-pci.c
> @@ -17,6 +17,7 @@
>  #include "qemu/osdep.h"
>  
>  #include "standard-headers/linux/virtio_pci.h"
> +#include "hw/boards.h"
>  #include "hw/qdev-properties.h"
>  #include "hw/virtio/vhost-scsi.h"
>  #include "qapi/error.h"
> @@ -47,10 +48,15 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy 
> *vpci_dev, Error **errp)
>  {
>  VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
>  DeviceState *vdev = DEVICE(&dev->vdev);
> -VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
> +VirtIOSCSIConf *conf = &dev->vdev.parent_obj.parent_obj.conf;
> +
> +/* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
> +if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
> +conf->num_queues = current_machine->smp.cpus;

I don't recall the discussion from previous versions of this patch
set... do we need to bound this by the maximum number of virtqueues? It
seems unlikely that something will break from my reading of the code,
but it might still be nicer.
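
One possible way to bound it, purely as a hedged sketch (reusing
VIRTIO_SCSI_VQ_NUM_FIXED from patch 1/4 and the auto value from this patch;
not something proposed in this series):

    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
        conf->num_queues = MIN(current_machine->smp.cpus,
                               VIRTIO_QUEUE_MAX - VIRTIO_SCSI_VQ_NUM_FIXED);
    }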

> +}
>  
>  if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
> -vpci_dev->nvectors = vs->conf.num_queues + 3;
> +vpci_dev->nvectors = conf->num_queues + VIRTIO_SCSI_VQ_NUM_FIXED + 1;

You might want to do the 3 -> NUM_FIXED + 1 resolution already in the
previous patch; but as you touch this line anyway, I'd just keep this
if you don't need to respin.

>  }
>  
>  qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));

(Same comments apply to the two other cases below.)




[PATCH RESEND v3 4/4] vhost-user-blk: default num_queues to -smp N

2020-03-20 Thread Stefan Hajnoczi
Automatically size the number of request virtqueues to match the number
of vCPUs.  This ensures that completion interrupts are handled on the
same vCPU that submitted the request.  No IPI is necessary to complete
an I/O request and performance is improved.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Cornelia Huck 
---
 hw/block/vhost-user-blk.c  | 6 +-
 hw/core/machine.c  | 1 +
 hw/virtio/vhost-user-blk-pci.c | 6 ++
 include/hw/virtio/vhost-user-blk.h | 2 ++
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 12925a47ec..5c275af935 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -403,6 +403,9 @@ static void vhost_user_blk_device_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
+if (s->num_queues == VHOST_USER_BLK_AUTO_NUM_QUEUES) {
+s->num_queues = 1;
+}
 if (!s->num_queues || s->num_queues > VIRTIO_QUEUE_MAX) {
 error_setg(errp, "vhost-user-blk: invalid number of IO queues");
 return;
@@ -511,7 +514,8 @@ static const VMStateDescription vmstate_vhost_user_blk = {
 
 static Property vhost_user_blk_properties[] = {
 DEFINE_PROP_CHR("chardev", VHostUserBlk, chardev),
-DEFINE_PROP_UINT16("num-queues", VHostUserBlk, num_queues, 1),
+DEFINE_PROP_UINT16("num-queues", VHostUserBlk, num_queues,
+   VHOST_USER_BLK_AUTO_NUM_QUEUES),
 DEFINE_PROP_UINT32("queue-size", VHostUserBlk, queue_size, 128),
 DEFINE_PROP_BIT("config-wce", VHostUserBlk, config_wce, 0, true),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/core/machine.c b/hw/core/machine.c
index c993b8f489..13d00abdc4 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -38,6 +38,7 @@ GlobalProperty hw_compat_4_2[] = {
 { "virtio-scsi-device", "seg_max_adjust", "off"},
 { "vhost-blk-device", "seg_max_adjust", "off"},
 { "vhost-scsi", "num_queues", "1"},
+{ "vhost-user-blk", "num-queues", "1"},
 { "vhost-user-scsi", "num_queues", "1"},
 { "usb-host", "suppress-remote-wake", "off" },
 { "usb-redir", "suppress-remote-wake", "off" },
diff --git a/hw/virtio/vhost-user-blk-pci.c b/hw/virtio/vhost-user-blk-pci.c
index 8d3d766427..846fec83ac 100644
--- a/hw/virtio/vhost-user-blk-pci.c
+++ b/hw/virtio/vhost-user-blk-pci.c
@@ -19,6 +19,7 @@
 #include "qemu/osdep.h"
 
 #include "standard-headers/linux/virtio_pci.h"
+#include "hw/boards.h"
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/vhost-user-blk.h"
 #include "hw/pci/pci.h"
@@ -54,6 +55,11 @@ static void vhost_user_blk_pci_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 VHostUserBlkPCI *dev = VHOST_USER_BLK_PCI(vpci_dev);
 DeviceState *vdev = DEVICE(&dev->vdev);
 
+/* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
+if (dev->vdev.num_queues == VHOST_USER_BLK_AUTO_NUM_QUEUES) {
+dev->vdev.num_queues = current_machine->smp.cpus;
+}
+
 if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
 vpci_dev->nvectors = dev->vdev.num_queues + 1;
 }
diff --git a/include/hw/virtio/vhost-user-blk.h 
b/include/hw/virtio/vhost-user-blk.h
index 05ea0ad183..c28027c7c8 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -25,6 +25,8 @@
 #define VHOST_USER_BLK(obj) \
 OBJECT_CHECK(VHostUserBlk, (obj), TYPE_VHOST_USER_BLK)
 
+#define VHOST_USER_BLK_AUTO_NUM_QUEUES UINT16_MAX
+
 typedef struct VHostUserBlk {
 VirtIODevice parent_obj;
 CharBackend chardev;
-- 
2.24.1



[PATCH RESEND v3 1/4] virtio-scsi: introduce a constant for fixed virtqueues

2020-03-20 Thread Stefan Hajnoczi
The event and control virtqueues are always present, regardless of the
multi-queue configuration.  Define a constant so that virtqueue number
calculations are easier to read.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Cornelia Huck 
---
 hw/scsi/vhost-user-scsi.c   | 2 +-
 hw/scsi/virtio-scsi.c   | 7 ---
 include/hw/virtio/virtio-scsi.h | 3 +++
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a01bf63a08..e9752baa89 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -115,7 +115,7 @@ static void vhost_user_scsi_realize(DeviceState *dev, Error 
**errp)
 goto free_virtio;
 }
 
-vsc->dev.nvqs = 2 + vs->conf.num_queues;
+vsc->dev.nvqs = VIRTIO_SCSI_VQ_NUM_FIXED + vs->conf.num_queues;
 vsc->dev.vqs = g_new0(struct vhost_virtqueue, vsc->dev.nvqs);
 vsc->dev.vq_index = 0;
 vsc->dev.backend_features = 0;
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 472bbd233b..427ad83c50 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -191,7 +191,7 @@ static void virtio_scsi_save_request(QEMUFile *f, 
SCSIRequest *sreq)
 VirtIOSCSIReq *req = sreq->hba_private;
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(req->dev);
 VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
-uint32_t n = virtio_get_queue_index(req->vq) - 2;
+uint32_t n = virtio_get_queue_index(req->vq) - VIRTIO_SCSI_VQ_NUM_FIXED;
 
 assert(n < vs->conf.num_queues);
 qemu_put_be32s(f, &n);
@@ -892,10 +892,11 @@ void virtio_scsi_common_realize(DeviceState *dev,
 sizeof(VirtIOSCSIConfig));
 
 if (s->conf.num_queues == 0 ||
-s->conf.num_queues > VIRTIO_QUEUE_MAX - 2) {
+s->conf.num_queues > VIRTIO_QUEUE_MAX - VIRTIO_SCSI_VQ_NUM_FIXED) {
 error_setg(errp, "Invalid number of queues (= %" PRIu32 "), "
  "must be a positive integer less than %d.",
-   s->conf.num_queues, VIRTIO_QUEUE_MAX - 2);
+   s->conf.num_queues,
+   VIRTIO_QUEUE_MAX - VIRTIO_SCSI_VQ_NUM_FIXED);
 virtio_cleanup(vdev);
 return;
 }
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 24e768909d..9f293bcb80 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -36,6 +36,9 @@
 #define VIRTIO_SCSI_MAX_TARGET  255
 #define VIRTIO_SCSI_MAX_LUN 16383
 
+/* Number of virtqueues that are always present */
+#define VIRTIO_SCSI_VQ_NUM_FIXED2
+
 typedef struct virtio_scsi_cmd_req VirtIOSCSICmdReq;
 typedef struct virtio_scsi_cmd_resp VirtIOSCSICmdResp;
 typedef struct virtio_scsi_ctrl_tmf_req VirtIOSCSICtrlTMFReq;
-- 
2.24.1



[PATCH RESEND v3 2/4] virtio-scsi: default num_queues to -smp N

2020-03-20 Thread Stefan Hajnoczi
Automatically size the number of virtio-scsi-pci, vhost-scsi-pci, and
vhost-user-scsi-pci request virtqueues to match the number of vCPUs.
Other transports continue to default to 1 request virtqueue.

A 1:1 virtqueue:vCPU mapping ensures that completion interrupts are
handled on the same vCPU that submitted the request.  No IPI is
necessary to complete an I/O request and performance is improved.

Signed-off-by: Stefan Hajnoczi 
---
 hw/core/machine.c   |  3 +++
 hw/scsi/vhost-scsi.c|  3 ++-
 hw/scsi/vhost-user-scsi.c   |  3 ++-
 hw/scsi/virtio-scsi.c   |  6 +-
 hw/virtio/vhost-scsi-pci.c  | 10 --
 hw/virtio/vhost-user-scsi-pci.c | 10 --
 hw/virtio/virtio-scsi-pci.c | 10 --
 include/hw/virtio/virtio-scsi.h |  2 ++
 8 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 9e8c06036f..4bbcec8fbd 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -33,8 +33,11 @@ GlobalProperty hw_compat_4_2[] = {
 { "virtio-scsi-device", "virtqueue_size", "128"},
 { "virtio-blk-device", "x-enable-wce-if-config-wce", "off" },
 { "virtio-blk-device", "seg-max-adjust", "off"},
+{ "virtio-scsi-device", "num_queues", "1"},
 { "virtio-scsi-device", "seg_max_adjust", "off"},
 { "vhost-blk-device", "seg_max_adjust", "off"},
+{ "vhost-scsi", "num_queues", "1"},
+{ "vhost-user-scsi", "num_queues", "1"},
 { "usb-host", "suppress-remote-wake", "off" },
 { "usb-redir", "suppress-remote-wake", "off" },
 { "qxl", "revision", "4" },
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index f052377b7e..8cb7e3825f 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -272,7 +272,8 @@ static Property vhost_scsi_properties[] = {
 DEFINE_PROP_STRING("vhostfd", VirtIOSCSICommon, conf.vhostfd),
 DEFINE_PROP_STRING("wwpn", VirtIOSCSICommon, conf.wwpn),
 DEFINE_PROP_UINT32("boot_tpgt", VirtIOSCSICommon, conf.boot_tpgt, 0),
-DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues, 1),
+DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues,
+   VIRTIO_SCSI_AUTO_NUM_QUEUES),
 DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSICommon, conf.virtqueue_size,
128),
 DEFINE_PROP_BOOL("seg_max_adjust", VirtIOSCSICommon, conf.seg_max_adjust,
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index e9752baa89..f0a7e76280 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -163,7 +163,8 @@ static void vhost_user_scsi_unrealize(DeviceState *dev, 
Error **errp)
 static Property vhost_user_scsi_properties[] = {
 DEFINE_PROP_CHR("chardev", VirtIOSCSICommon, conf.chardev),
 DEFINE_PROP_UINT32("boot_tpgt", VirtIOSCSICommon, conf.boot_tpgt, 0),
-DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues, 1),
+DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues,
+   VIRTIO_SCSI_AUTO_NUM_QUEUES),
 DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSICommon, conf.virtqueue_size,
128),
 DEFINE_PROP_UINT32("max_sectors", VirtIOSCSICommon, conf.max_sectors,
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 427ad83c50..3bf97836d9 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -891,6 +891,9 @@ void virtio_scsi_common_realize(DeviceState *dev,
 virtio_init(vdev, "virtio-scsi", VIRTIO_ID_SCSI,
 sizeof(VirtIOSCSIConfig));
 
+if (s->conf.num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
+s->conf.num_queues = 1;
+}
 if (s->conf.num_queues == 0 ||
 s->conf.num_queues > VIRTIO_QUEUE_MAX - VIRTIO_SCSI_VQ_NUM_FIXED) {
 error_setg(errp, "Invalid number of queues (= %" PRIu32 "), "
@@ -964,7 +967,8 @@ static void virtio_scsi_device_unrealize(DeviceState *dev, 
Error **errp)
 }
 
 static Property virtio_scsi_properties[] = {
-DEFINE_PROP_UINT32("num_queues", VirtIOSCSI, parent_obj.conf.num_queues, 
1),
+DEFINE_PROP_UINT32("num_queues", VirtIOSCSI, parent_obj.conf.num_queues,
+   VIRTIO_SCSI_AUTO_NUM_QUEUES),
 DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSI,
  parent_obj.conf.virtqueue_size, 256),
 DEFINE_PROP_BOOL("seg_max_adjust", VirtIOSCSI,
diff --git a/hw/virtio/vhost-scsi-pci.c b/hw/virtio/vhost-scsi-pci.c
index 5dce640eaf..a0b7cdc1ac 100644
--- a/hw/virtio/vhost-scsi-pci.c
+++ b/hw/virtio/vhost-scsi-pci.c
@@ -17,6 +17,7 @@
 #include "qemu/osdep.h"
 
 #include "standard-headers/linux/virtio_pci.h"
+#include "hw/boards.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/vhost-scsi.h"
 #include "qapi/error.h"
@@ -47,10 +48,15 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 {
 VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
 DeviceState *vdev = DEVICE(&dev->vdev);
-VirtIOSCSICommon *vs = 

[PATCH RESEND v3 3/4] virtio-blk: default num_queues to -smp N

2020-03-20 Thread Stefan Hajnoczi
Automatically size the number of virtio-blk-pci request virtqueues to
match the number of vCPUs.  Other transports continue to default to 1
request virtqueue.

A 1:1 virtqueue:vCPU mapping ensures that completion interrupts are
handled on the same vCPU that submitted the request.  No IPI is
necessary to complete an I/O request and performance is improved.

Performance improves from 78k to 104k IOPS on a 32 vCPU guest with 101
virtio-blk-pci devices (ioengine=libaio, iodepth=1, bs=4k, rw=randread
with NVMe storage).

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Cornelia Huck 
---
 hw/block/virtio-blk.c  | 6 +-
 hw/core/machine.c  | 1 +
 hw/virtio/virtio-blk-pci.c | 9 -
 include/hw/virtio/virtio-blk.h | 2 ++
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 142863a3b2..ed5df71779 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1135,6 +1135,9 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 error_setg(errp, "Device needs media, but drive is empty");
 return;
 }
+if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
+conf->num_queues = 1;
+}
 if (!conf->num_queues) {
 error_setg(errp, "num-queues property must be larger than 0");
 return;
@@ -1271,7 +1274,8 @@ static Property virtio_blk_properties[] = {
 #endif
 DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
 true),
-DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1),
+DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues,
+   VIRTIO_BLK_AUTO_NUM_QUEUES),
 DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 256),
 DEFINE_PROP_BOOL("seg-max-adjust", VirtIOBlock, conf.seg_max_adjust, true),
 DEFINE_PROP_LINK("iothread", VirtIOBlock, conf.iothread, TYPE_IOTHREAD,
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 4bbcec8fbd..c993b8f489 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -31,6 +31,7 @@
 GlobalProperty hw_compat_4_2[] = {
 { "virtio-blk-device", "queue-size", "128"},
 { "virtio-scsi-device", "virtqueue_size", "128"},
+{ "virtio-blk-device", "num-queues", "1"},
 { "virtio-blk-device", "x-enable-wce-if-config-wce", "off" },
 { "virtio-blk-device", "seg-max-adjust", "off"},
 { "virtio-scsi-device", "num_queues", "1"},
diff --git a/hw/virtio/virtio-blk-pci.c b/hw/virtio/virtio-blk-pci.c
index efb2c22a1d..bf628de675 100644
--- a/hw/virtio/virtio-blk-pci.c
+++ b/hw/virtio/virtio-blk-pci.c
@@ -17,6 +17,7 @@
 
 #include "qemu/osdep.h"
 
+#include "hw/boards.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio-blk.h"
 #include "virtio-pci.h"
@@ -50,9 +51,15 @@ static void virtio_blk_pci_realize(VirtIOPCIProxy *vpci_dev, 
Error **errp)
 {
 VirtIOBlkPCI *dev = VIRTIO_BLK_PCI(vpci_dev);
 DeviceState *vdev = DEVICE(&dev->vdev);
+VirtIOBlkConf *conf = &dev->vdev.conf;
+
+/* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
+if (conf->num_queues == VIRTIO_BLK_AUTO_NUM_QUEUES) {
+conf->num_queues = current_machine->smp.cpus;
+}
 
 if (vpci_dev->nvectors == DEV_NVECTORS_UNSPECIFIED) {
-vpci_dev->nvectors = dev->vdev.conf.num_queues + 1;
+vpci_dev->nvectors = conf->num_queues + 1;
 }
 
 qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 1e62f869b2..4e5e903f4a 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -30,6 +30,8 @@ struct virtio_blk_inhdr
 unsigned char status;
 };
 
+#define VIRTIO_BLK_AUTO_NUM_QUEUES UINT16_MAX
+
 struct VirtIOBlkConf
 {
 BlockConf conf;
-- 
2.24.1



[PATCH RESEND v3 0/4] virtio-pci: enable blk and scsi multi-queue by default

2020-03-20 Thread Stefan Hajnoczi
v3:
 * Add new performance results that demonstrate the scalability
 * Mention that this is PCI-specific [Cornelia]
v2:
 * Let the virtio-DEVICE-pci device select num-queues because the optimal
   multi-queue configuration may differ between virtio-pci, virtio-mmio, and
   virtio-ccw [Cornelia]

Enabling multi-queue on virtio-pci storage devices improves performance on SMP
guests because the completion interrupt is handled on the vCPU that submitted
the I/O request.  This avoids IPIs inside the guest.

Note that performance is unchanged in these cases:
1. Uniprocessor guests.  They don't have IPIs.
2. Application threads might be scheduled on the sole vCPU that handles
   completion interrupts purely by chance.  (This is one reason why benchmark
   results can vary noticeably between runs.)
3. Users may bind the application to the vCPU that handles completion
   interrupts.

Set the number of queues to the number of vCPUs by default on virtio-blk and
virtio-scsi PCI devices.  Older machine types continue to default to 1 queue
for live migration compatibility.
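
As a usage illustration (hypothetical command lines, not part of the series):
on a new machine type the PCI devices size their request virtqueues
automatically, and the property can still be set explicitly to keep the old
single-queue behaviour:

    # default: request virtqueues sized to the number of vCPUs
    qemu-system-x86_64 -smp 4 \
        -drive file=disk.img,if=none,id=drive0 \
        -device virtio-blk-pci,drive=drive0

    # explicit override: back to a single request virtqueue
    qemu-system-x86_64 -smp 4 \
        -drive file=disk.img,if=none,id=drive0 \
        -device virtio-blk-pci,drive=drive0,num-queues=1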

Random read performance:
        IOPS
q=1      78k
q=32    104k  +33%

Boot time:
        Duration
q=1      51s
q=32     1m41s  +98%

Guest configuration: 32 vCPUs, 101 virtio-blk-pci disks

Previously measured results on a 4 vCPU guest were also positive but showed a
smaller 1-4% performance improvement.  They are no longer valid because
significant event loop optimizations have been merged.

Stefan Hajnoczi (4):
  virtio-scsi: introduce a constant for fixed virtqueues
  virtio-scsi: default num_queues to -smp N
  virtio-blk: default num_queues to -smp N
  vhost-user-blk: default num_queues to -smp N

 hw/block/vhost-user-blk.c  |  6 +-
 hw/block/virtio-blk.c  |  6 +-
 hw/core/machine.c  |  5 +
 hw/scsi/vhost-scsi.c   |  3 ++-
 hw/scsi/vhost-user-scsi.c  |  5 +++--
 hw/scsi/virtio-scsi.c  | 13 +
 hw/virtio/vhost-scsi-pci.c | 10 --
 hw/virtio/vhost-user-blk-pci.c |  6 ++
 hw/virtio/vhost-user-scsi-pci.c| 10 --
 hw/virtio/virtio-blk-pci.c |  9 -
 hw/virtio/virtio-scsi-pci.c| 10 --
 include/hw/virtio/vhost-user-blk.h |  2 ++
 include/hw/virtio/virtio-blk.h |  2 ++
 include/hw/virtio/virtio-scsi.h|  5 +
 14 files changed, 76 insertions(+), 16 deletions(-)

-- 
2.24.1



Re: [PATCH 3/3] iotests: Increase pause_wait() timeout

2020-03-20 Thread Philippe Mathieu-Daudé

On 3/13/20 9:36 AM, Kevin Wolf wrote:

Waiting for only 1 second proved to be too short on a loaded system,
resulting in false positives when testing pull requests. Increase the
timeout a bit to make this less likely.

Signed-off-by: Kevin Wolf 
---
  tests/qemu-iotests/iotests.py | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index b859c303a2..7bc4934cd2 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -925,7 +925,7 @@ class QMPTestCase(unittest.TestCase):
  self.assert_qmp(event, 'data/type', 'mirror')
  
  def pause_wait(self, job_id='job0'):

-with Timeout(1, "Timeout waiting for job to pause"):
+with Timeout(3, "Timeout waiting for job to pause"):
  while True:
  result = self.vm.qmp('query-block-jobs')
  found = False



I wonder if this might be more accurate:

  load_timeout = math.ceil(os.getloadavg()[0])
  with Timeout(1 + load_timeout, "Timeout waiting for job to pause"):
...

Anyhow:
Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 2/3] python/qemu: Kill QEMU process if 'quit' doesn't work

2020-03-20 Thread Philippe Mathieu-Daudé

On 3/13/20 9:36 AM, Kevin Wolf wrote:

With a QEMU bug, it can happen that the QEMU process doesn't react to a
'quit' QMP command. If we got an exception during previous QMP
communication (e.g. iotests Timeout expiring), we could also be in an
inconsistent state where after sending 'quit' we immediately read an old
response and close the socket even though the 'quit' command wasn't
processed yet. Both cases would lead to a hanging test.

Fix this by waiting for the QEMU process to exit after sending 'quit'
with a timeout, and if it doesn't happen within three seconds, send
SIGKILL.

Signed-off-by: Kevin Wolf 
---
  python/qemu/machine.py | 1 +
  1 file changed, 1 insertion(+)

diff --git a/python/qemu/machine.py b/python/qemu/machine.py
index 183d8f3d38..c837ee8723 100644
--- a/python/qemu/machine.py
+++ b/python/qemu/machine.py
@@ -358,6 +358,7 @@ class QEMUMachine(object):
  if not has_quit:
  self._qmp.cmd('quit')
  self._qmp.close()
+self._popen.wait(timeout=3)
  except:
  self._popen.kill()
  self._popen.wait()



Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 0/3] iotests: Fix intermittent 030 hang

2020-03-20 Thread Philippe Mathieu-Daudé

On 3/13/20 9:36 AM, Kevin Wolf wrote:

Peter ran into a 030 hang while testing a pull request. This turned out
to be two bugs in the test suite at once: First was the test failing
because a timeout was apparently too short, second was that the timeout
would actually cause the test to hang instead of failing. This series
should fix both.

Kevin Wolf (3):
   iotests.py: Enable faulthandler
   python/qemu: Kill QEMU process if 'quit' doesn't work
   iotests: Increase pause_wait() timeout

  python/qemu/machine.py| 1 +
  tests/qemu-iotests/iotests.py | 5 -
  2 files changed, 5 insertions(+), 1 deletion(-)



Tested-by: Philippe Mathieu-Daudé 




Re: [PATCH] aio-posix: fix io_uring with external events

2020-03-20 Thread Stefano Garzarella
On Thu, Mar 19, 2020 at 04:35:59PM +, Stefan Hajnoczi wrote:
> When external event sources are disabled fdmon-io_uring falls back to
> fdmon-poll.  The ->need_wait() callback needs to watch for this so it
> can return true when external event sources are disabled.
> 
> It is also necessary to call ->wait() when AioHandlers have changed
> because io_uring is asynchronous and we must submit new sqes.
> 
> Both of these changes to ->need_wait() together fix tests/test-aio -p
> /aio/external-client, which failed with:
> 
>   test-aio: tests/test-aio.c:404: test_aio_external_client: Assertion 
> `aio_poll(ctx, false)' failed.
> 
> Reported-by: Julia Suvorova 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  util/fdmon-io_uring.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
> index 893b79b622..7e143ef515 100644
> --- a/util/fdmon-io_uring.c
> +++ b/util/fdmon-io_uring.c
> @@ -290,7 +290,18 @@ static int fdmon_io_uring_wait(AioContext *ctx, 
> AioHandlerList *ready_list,
>  
>  static bool fdmon_io_uring_need_wait(AioContext *ctx)
>  {
> -return io_uring_cq_ready(&ctx->fdmon_io_uring);
> +/* Have io_uring events completed? */
> +if (io_uring_cq_ready(&ctx->fdmon_io_uring)) {
> +return true;
> +}
> +
> +/* Do we need to submit new io_uring sqes? */
> +if (!QSLIST_EMPTY_RCU(&ctx->submit_list)) {
> +return true;
> +}
> +
> +/* Are we falling back to fdmon-poll? */
> +return atomic_read(&ctx->external_disable_cnt);
>  }
>  
>  static const FDMonOps fdmon_io_uring_ops = {
> -- 
> 2.24.1
> 

Reviewed-by: Stefano Garzarella