Re: [PATCH v1 5/9] hw/virtio: introduce virtio_device_should_start

2022-11-22 Thread Christian Borntraeger

Am 21.11.22 um 23:37 schrieb Michael S. Tsirkin:
[...]

qemu-system-x86_64: ../hw/virtio/vhost-vsock-common.c:203: 
vhost_vsock_common_pre_save: Assertion `!vhost_dev_is_started(&vvc->vhost_dev)' 
failed.
2022-11-15 16:38:46.096+: shutting down, reason=crashed


Alex were you able to replicate? Just curious.


Ping?



Re: [RFC 4/7] migration: Split save_live_pending() into state_pending_*

2022-11-22 Thread Dr. David Alan Gilbert
* Juan Quintela (quint...@redhat.com) wrote:
> We split the function into to:
> 
> - state_pending_estimate: We estimate the remaining state size without
>   stopping the machine.
> 
> - state pending_exact: We calculate the exact amount of remaining
>   state.
> 
> The only "device" that implements different functions for _estimate()
> and _exact() is ram.
> 
> Signed-off-by: Juan Quintela 

Yeh I think that's OK; I'm a little worried whether you end up calling
the two functions in migration_iteration_run a lot, but still


Reviewed-by: Dr. David Alan Gilbert 

> ---
>  docs/devel/migration.rst   | 18 ++
>  docs/devel/vfio-migration.rst  |  4 ++--
>  include/migration/register.h   | 12 
>  migration/savevm.h |  8 ++--
>  hw/s390x/s390-stattrib.c   |  7 ---
>  hw/vfio/migration.c|  9 +
>  migration/block-dirty-bitmap.c | 11 ++-
>  migration/block.c  | 11 ++-
>  migration/migration.c  | 13 +
>  migration/ram.c| 31 ---
>  migration/savevm.c | 34 +-
>  hw/vfio/trace-events   |  2 +-
>  migration/trace-events |  7 ---
>  13 files changed, 114 insertions(+), 53 deletions(-)
> 
> diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
> index 3e9656d8e0..6f65c23b47 100644
> --- a/docs/devel/migration.rst
> +++ b/docs/devel/migration.rst
> @@ -482,15 +482,17 @@ An iterative device must provide:
>- A ``load_setup`` function that initialises the data structures on the
>  destination.
>  
> -  - A ``save_live_pending`` function that is called repeatedly and must
> -indicate how much more data the iterative data must save.  The core
> -migration code will use this to determine when to pause the CPUs
> -and complete the migration.
> +  - A ``state_pending_exact`` function that indicates how much more
> +data we must save.  The core migration code will use this to
> +determine when to pause the CPUs and complete the migration.
>  
> -  - A ``save_live_iterate`` function (called after ``save_live_pending``
> -when there is significant data still to be sent).  It should send
> -a chunk of data until the point that stream bandwidth limits tell it
> -to stop.  Each call generates one section.
> +  - A ``state_pending_estimate`` function that indicates how much more
> +data we must save.  When the estimated amount is smaller than the
> +threshold, we call ``state_pending_exact``.
> +
> +  - A ``save_live_iterate`` function should send a chunk of data until
> +the point that stream bandwidth limits tell it to stop.  Each call
> +generates one section.
>  
>- A ``save_live_complete_precopy`` function that must transmit the
>  last section for the device containing any remaining data.
> diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
> index 9ff6163c88..673057c90d 100644
> --- a/docs/devel/vfio-migration.rst
> +++ b/docs/devel/vfio-migration.rst
> @@ -28,7 +28,7 @@ VFIO implements the device hooks for the iterative approach 
> as follows:
>  * A ``load_setup`` function that sets up the migration region on the
>destination and sets _RESUMING flag in the VFIO device state.
>  
> -* A ``save_live_pending`` function that reads pending_bytes from the vendor
> +* A ``state_pending_exact`` function that reads pending_bytes from the vendor
>driver, which indicates the amount of data that the vendor driver has yet 
> to
>save for the VFIO device.
>  
> @@ -114,7 +114,7 @@ Live migration save path
>  (RUNNING, _SETUP, _RUNNING|_SAVING)
>|
>  (RUNNING, _ACTIVE, _RUNNING|_SAVING)
> - If device is active, get pending_bytes by .save_live_pending()
> + If device is active, get pending_bytes by .state_pending_exact()
>If total pending_bytes >= threshold_size, call .save_live_iterate()
>Data of VFIO device for pre-copy phase is copied
>  Iterate till total pending bytes converge and are less than threshold
> diff --git a/include/migration/register.h b/include/migration/register.h
> index 5b5424ed8f..313b8e1c3b 100644
> --- a/include/migration/register.h
> +++ b/include/migration/register.h
> @@ -46,9 +46,7 @@ typedef struct SaveVMHandlers {
>  
>  /* This runs outside the iothread lock!  */
>  int (*save_setup)(QEMUFile *f, void *opaque);
> -void (*save_live_pending)(void *opaque,  uint64_t threshold_size,
> -  uint64_t *rest_precopy, uint64_t 
> *rest_postcopy);
> -/* Note for save_live_pending:
> +/* Note for state_pending_*:
>   * - res_precopy is for data which must be migrated in precopy
>   * phase or in stopped state, in other words - before target
>   * vm start
> @@ -59,7 +57,13 @@ typedef str

Re: [PATCH] vhost: mask VIRTIO_F_RING_RESET for vhost and vhost-user devices

2022-11-22 Thread Eugenio Perez Martin
On Tue, Nov 22, 2022 at 4:13 AM Jason Wang  wrote:
>
> On Mon, Nov 21, 2022 at 6:11 PM Stefano Garzarella  
> wrote:
> >
> > Commit 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
> > enabled VIRTIO_F_RING_RESET by default for all virtio devices.
> >
> > This feature is not currently emulated by QEMU, so for vhost and
> > vhost-user devices we need to make sure it is supported by the offloaded
> > device emulation (in-kernel or in another process).
> > To do this we need to add VIRTIO_F_RING_RESET to the features bitmap
> > passed to vhost_get_features(). This way it will be masked if the device
> > does not support it.
> >
> > This issue was initially discovered with vhost-vsock and vhost-user-vsock,
> > and then also tested with vhost-user-rng which confirmed the same issue.
> > They fail when sending features through VHOST_SET_FEATURES ioctl or
> > VHOST_USER_SET_FEATURES message, since VIRTIO_F_RING_RESET is negotiated
> > by the guest (Linux >= v6.0), but not supported by the device.
> >
> > Fixes: 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1318
> > Signed-off-by: Stefano Garzarella 
>
> Acked-by: Jason Wang 
>
> > ---
> >
> > To prevent this problem in the future, perhaps we should provide a function
> > (e.g. vhost_device_get_features) where we go to mask all non-device-specific
> > features (e.g VIRTIO_F_*, VIRTIO_RING_F_*) that are not emulated by QEMU but
> > we expect them to be emulated by the vhost or vhost-user devices.
>
> Adding Eugenio, this might not be true if we want to use shadow
> virtqueue for emulating some features?
>

I think we should be able to introduce that in the (hypothetical)
vhost_device_get_features if SVQ starts emulating a ring feature,
isn't it? I think a first version not aware of SVQ is fine anyway, and
to introduce it later should be easy.

Thanks!

> Thanks
>
> > Then we can call it in all .get_features callbacks just before return the
> > features.
> >
> > What do you think?
> >
> > But maybe better to do that for the next release, I will send an RFC.
> >
> > Thanks,
> > Stefano
> > ---
> >  hw/block/vhost-user-blk.c  |  1 +
> >  hw/net/vhost_net.c |  1 +
> >  hw/scsi/vhost-scsi.c   |  1 +
> >  hw/scsi/vhost-user-scsi.c  |  1 +
> >  hw/virtio/vhost-user-fs.c  |  1 +
> >  hw/virtio/vhost-user-gpio.c|  1 +
> >  hw/virtio/vhost-user-i2c.c |  1 +
> >  hw/virtio/vhost-user-rng.c | 11 +--
> >  hw/virtio/vhost-vsock-common.c |  1 +
> >  net/vhost-vdpa.c   |  1 +
> >  10 files changed, 18 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > index 16ad400889..0d5190accf 100644
> > --- a/hw/block/vhost-user-blk.c
> > +++ b/hw/block/vhost-user-blk.c
> > @@ -52,6 +52,7 @@ static const int user_feature_bits[] = {
> >  VIRTIO_F_NOTIFY_ON_EMPTY,
> >  VIRTIO_F_RING_PACKED,
> >  VIRTIO_F_IOMMU_PLATFORM,
> > +VIRTIO_F_RING_RESET,
> >  VHOST_INVALID_FEATURE_BIT
> >  };
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index feda448878..26e4930676 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -75,6 +75,7 @@ static const int user_feature_bits[] = {
> >  VIRTIO_NET_F_MTU,
> >  VIRTIO_F_IOMMU_PLATFORM,
> >  VIRTIO_F_RING_PACKED,
> > +VIRTIO_F_RING_RESET,
> >  VIRTIO_NET_F_RSS,
> >  VIRTIO_NET_F_HASH_REPORT,
> >
> > diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
> > index bdf337a7a2..6a0fd0dfb1 100644
> > --- a/hw/scsi/vhost-scsi.c
> > +++ b/hw/scsi/vhost-scsi.c
> > @@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
> >  VIRTIO_RING_F_INDIRECT_DESC,
> >  VIRTIO_RING_F_EVENT_IDX,
> >  VIRTIO_SCSI_F_HOTPLUG,
> > +VIRTIO_F_RING_RESET,
> >  VHOST_INVALID_FEATURE_BIT
> >  };
> >
> > diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> > index bc37317d55..b7a71a802c 100644
> > --- a/hw/scsi/vhost-user-scsi.c
> > +++ b/hw/scsi/vhost-user-scsi.c
> > @@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
> >  VIRTIO_RING_F_INDIRECT_DESC,
> >  VIRTIO_RING_F_EVENT_IDX,
> >  VIRTIO_SCSI_F_HOTPLUG,
> > +VIRTIO_F_RING_RESET,
> >  VHOST_INVALID_FEATURE_BIT
> >  };
> >
> > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > index 1c40f42045..dc4014cdef 100644
> > --- a/hw/virtio/vhost-user-fs.c
> > +++ b/hw/virtio/vhost-user-fs.c
> > @@ -32,6 +32,7 @@ static const int user_feature_bits[] = {
> >  VIRTIO_F_NOTIFY_ON_EMPTY,
> >  VIRTIO_F_RING_PACKED,
> >  VIRTIO_F_IOMMU_PLATFORM,
> > +VIRTIO_F_RING_RESET,
> >
> >  VHOST_INVALID_FEATURE_BIT
> >  };
> > diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
> > index 677d1c7730..5851cb3bc9 100644
> > --- a/hw/virtio/vhost-user-gpio.c
> > +++ b/hw/virtio/vhost-user-gpio.c
> > @@ -24,6 +24,7 @@ static const int feature_bits[] = 

Re: [RFC 2/7] migration: No save_live_pending() method uses the QEMUFile parameter

2022-11-22 Thread Dr. David Alan Gilbert
* Juan Quintela (quint...@redhat.com) wrote:
> So remove it everywhere.
> 
> Signed-off-by: Juan Quintela 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  include/migration/register.h   | 6 ++
>  migration/savevm.h | 2 +-
>  hw/s390x/s390-stattrib.c   | 2 +-
>  hw/vfio/migration.c| 6 ++
>  migration/block-dirty-bitmap.c | 5 ++---
>  migration/block.c  | 2 +-
>  migration/migration.c  | 3 +--
>  migration/ram.c| 2 +-
>  migration/savevm.c | 5 ++---
>  9 files changed, 13 insertions(+), 20 deletions(-)
> 
> diff --git a/include/migration/register.h b/include/migration/register.h
> index 1950fee6a8..5b5424ed8f 100644
> --- a/include/migration/register.h
> +++ b/include/migration/register.h
> @@ -46,10 +46,8 @@ typedef struct SaveVMHandlers {
>  
>  /* This runs outside the iothread lock!  */
>  int (*save_setup)(QEMUFile *f, void *opaque);
> -void (*save_live_pending)(QEMUFile *f, void *opaque,
> -  uint64_t threshold_size,
> -  uint64_t *rest_precopy,
> -  uint64_t *rest_postcopy);
> +void (*save_live_pending)(void *opaque,  uint64_t threshold_size,
> +  uint64_t *rest_precopy, uint64_t 
> *rest_postcopy);
>  /* Note for save_live_pending:
>   * - res_precopy is for data which must be migrated in precopy
>   * phase or in stopped state, in other words - before target
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 9bd55c336c..98fae6f9b3 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -40,7 +40,7 @@ void qemu_savevm_state_cleanup(void);
>  void qemu_savevm_state_complete_postcopy(QEMUFile *f);
>  int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
> bool inactivate_disks);
> -void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
> +void qemu_savevm_state_pending(uint64_t max_size,
> uint64_t *res_precopy, uint64_t 
> *res_postcopy);
>  void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
>  void qemu_savevm_send_open_return_path(QEMUFile *f);
> diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
> index ee60b53da4..9b74eeadf3 100644
> --- a/hw/s390x/s390-stattrib.c
> +++ b/hw/s390x/s390-stattrib.c
> @@ -182,7 +182,7 @@ static int cmma_save_setup(QEMUFile *f, void *opaque)
>  return 0;
>  }
>  
> -static void cmma_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
> +static void cmma_save_pending(void *opaque, uint64_t max_size,
>uint64_t *res_precopy, uint64_t *res_postcopy)
>  {
>  S390StAttribState *sas = S390_STATTRIB(opaque);
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 3423f113f0..760d5f3c5c 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -456,10 +456,8 @@ static void vfio_save_cleanup(void *opaque)
>  trace_vfio_save_cleanup(vbasedev->name);
>  }
>  
> -static void vfio_save_pending(QEMUFile *f, void *opaque,
> -  uint64_t threshold_size,
> -  uint64_t *res_precopy,
> -  uint64_t *res_postcopy)
> +static void vfio_save_pending(void *opaque,  uint64_t threshold_size,
> +  uint64_t *res_precopy, uint64_t *res_postcopy)
>  {
>  VFIODevice *vbasedev = opaque;
>  VFIOMigration *migration = vbasedev->migration;
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index dfea546330..a445bdc3c3 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -761,9 +761,8 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void 
> *opaque)
>  return 0;
>  }
>  
> -static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
> -  uint64_t max_size,
> -  uint64_t *res_precopy,
> +static void dirty_bitmap_save_pending(void *opaque, uint64_t max_size,
> +  uint64_t *res_precopy, 
>uint64_t *res_postcopy)
>  {
>  DBMSaveState *s = &((DBMState *)opaque)->save;
> diff --git a/migration/block.c b/migration/block.c
> index 4ae8f837b0..b3d680af75 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -862,7 +862,7 @@ static int block_save_complete(QEMUFile *f, void *opaque)
>  return 0;
>  }
>  
> -static void block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
> +static void block_save_pending(void *opaque, uint64_t max_size,
> uint64_t *res_precopy,
> uint64_t *res_postcopy)
>  {
> diff --git a/migration/migration.c b/migration/migration.c
> index 440aa62f16..038fc58a96 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ 

Re: [PATCH for-7.2 1/5] hw/nvme: fix aio cancel in format

2022-11-22 Thread Keith Busch
On Tue, Nov 22, 2022 at 09:13:44AM +0100, Klaus Jensen wrote:
> There are several bugs in the async cancel code for the Format command.
> 
> Firstly, cancelling a format operation neglects to set iocb->ret as well
> as clearing the iocb->aiocb after cancelling the underlying aiocb which
> causes the aio callback to ignore the cancellation. Trivial fix.
> 
> Secondly, and worse, because the request is queued up for posting to the
> CQ in a bottom half, if the cancellation is due to the submission queue
> being deleted (which calls blk_aio_cancel), the req structure is
> deallocated in nvme_del_sq prior to the bottom half being schedulued.
> 
> Fix this by simply removing the bottom half, there is no reason to defer
> it anyway.

I thought for sure I'd find a reason defered execution was needed, but
hey, it looks perfectly fine with this change!
 
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index ac3885ce5079..26b53469328f 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -5756,14 +5756,15 @@ typedef struct NvmeFormatAIOCB {
>  uint8_t pil;
>  } NvmeFormatAIOCB;

I think you can remove the QEMUBH member from the above struct with this
patch.

Otherwise looks good.

Reviewed-by: Keith Busch 



Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci

2022-11-22 Thread Peter Maydell
On Tue, 22 Nov 2022 at 15:04, Philippe Mathieu-Daudé  wrote:
>
> On 21/11/22 17:42, Max Filippov wrote:
> > On Mon, Nov 21, 2022 at 6:01 AM Markus Armbruster  wrote:
> >>   .../xtensa/core-dsp3400/xtensa-modules.c.inc  | 136 +-
> >>   target/xtensa/core-lx106/xtensa-modules.c.inc |  16 +--
> >
> > These files are generated and were imported from xtensa configuration
> > overlays, they're not supposed to be changed.
>
> Tools can get the repository file list using 'git ls-files', which
> itself support file pattern exclusion [*].
>
> We can create i.e. 'scripts/imported-files.txt' with:
>
>linux-headers/
>target/hexagon/imported/
>target/xtensa/core*
>tests/tcg/mips/user/

Good idea. scripts/clean-header-guards.pl also has these to add
to the exclude list:

include/standard-headers/
pc-bios/
tests/tcg/
tests/multiboot/

thanks
-- PMM



Re: [PATCH v2 2/2] block/vmdk: Simplify vmdk_co_create() to return directly

2022-11-22 Thread Philippe Mathieu-Daudé

On 22/11/22 14:49, Markus Armbruster wrote:

Cc: Fam Zheng 
Cc: Kevin Wolf 
Cc: Hanna Reitz 
Cc: qemu-block@nongnu.org
Signed-off-by: Markus Armbruster 
---
  block/vmdk.c | 28 +++-
  1 file changed, 11 insertions(+), 17 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci

2022-11-22 Thread Philippe Mathieu-Daudé

On 21/11/22 17:42, Max Filippov wrote:

On Mon, Nov 21, 2022 at 6:01 AM Markus Armbruster  wrote:

  .../xtensa/core-dsp3400/xtensa-modules.c.inc  | 136 +-
  target/xtensa/core-lx106/xtensa-modules.c.inc |  16 +--


These files are generated and were imported from xtensa configuration
overlays, they're not supposed to be changed.


Tools can get the repository file list using 'git ls-files', which
itself support file pattern exclusion [*].

We can create i.e. 'scripts/imported-files.txt' with:

  linux-headers/
  target/hexagon/imported/
  target/xtensa/core*
  tests/tcg/mips/user/

Then use 'git ls-files --exclude-from=scripts/imported-files.txt' ...

[*] 
https://git-scm.com/docs/git-ls-files#Documentation/git-ls-files.txt---exclude-fromltfilegt





Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci

2022-11-22 Thread Philippe Mathieu-Daudé

On 22/11/22 09:58, Markus Armbruster wrote:

Thomas Huth  writes:


On 21/11/2022 17.32, Markus Armbruster wrote:

Philippe Mathieu-Daudé  writes:


On 21/11/22 15:36, Peter Maydell wrote:

On Mon, 21 Nov 2022 at 14:03, Markus Armbruster  wrote:


Tweak the semantic patch to drop redundant parenthesis around the
return expression.

Coccinelle drops comments in hw/rdma/vmw/pvrdma_cmd.c; restored
manually.

Coccinelle messes up vmdk_co_create(), not sure why.  Transformed
manually.

Line breaks in target/avr/cpu.h and hw/rdma/vmw/pvrdma_cmd.c tidied up
manually.

Whitespace in fuse_reply_iov() tidied up manually.

checkpatch.pl complains "return of an errno should typically be -ve"
two times for hw/9pfs/9p-synth.c.  Preexisting, the patch merely makes
it visible to checkpatch.pl.

checkpatch.pl complains "return is not a function, parentheses are not
required" three times for target/mips/tcg/dsp_helper.c.  False
positives.

Signed-off-by: Markus Armbruster 



.../user/ase/msa/bit-count/test_msa_nloc_b.c  |   9 +-
.../user/ase/msa/bit-count/test_msa_nloc_d.c  |   9 +-

[snip long list of other mips test files]


328 files changed, 989 insertions(+), 2099 deletions(-)

This patch seems to almost entirely be huge because of these
mips test case files. Are they specific to QEMU or are they
effectively a 3rd-party import that it doesn't make sense
to make local changes to?


They are imported and will unlikely be modified.


Not obvious to me from git-log.

Should I drop the changes to tests/tcg/mips/?


I'd say yes. At least move them to a separate patch.


Possible status of tests/tcg/mips/:

1. Imported, should not be modified

Drop from the patch.

2. Not imported, should be modified

2a. To be reviewed separately from the remainder of the patch

 Split off.

2b. Likewise, but nobody will care to review, realistically

 Split off and merge anyway, or drop.  I'd go for the latter.

2c. To be reviewed together with the remainder of the patch

 Keep as is.

Which one is it?


"1. Imported, should not be modified" please :)



Re: [RFC 1/7] migration: Remove res_compatible parameter

2022-11-22 Thread Dr. David Alan Gilbert
* Juan Quintela (quint...@redhat.com) wrote:
> It was only used for RAM, and in that case, it means that this amount
> of data was sent for memory.  Just delete the field in all callers.
> 
> Signed-off-by: Juan Quintela 
> ---
>  include/migration/register.h   | 20 ++--
>  migration/savevm.h |  4 +---
>  hw/s390x/s390-stattrib.c   |  6 ++
>  hw/vfio/migration.c| 10 --
>  migration/block-dirty-bitmap.c |  7 +++
>  migration/block.c  |  7 +++
>  migration/migration.c  |  9 -
>  migration/ram.c|  8 +++-
>  migration/savevm.c | 14 +-
>  hw/vfio/trace-events   |  2 +-
>  migration/trace-events |  2 +-
>  11 files changed, 37 insertions(+), 52 deletions(-)
> 
> diff --git a/include/migration/register.h b/include/migration/register.h
> index c1dcff0f90..1950fee6a8 100644
> --- a/include/migration/register.h
> +++ b/include/migration/register.h
> @@ -48,18 +48,18 @@ typedef struct SaveVMHandlers {
>  int (*save_setup)(QEMUFile *f, void *opaque);
>  void (*save_live_pending)(QEMUFile *f, void *opaque,
>uint64_t threshold_size,
> -  uint64_t *res_precopy_only,
> -  uint64_t *res_compatible,
> -  uint64_t *res_postcopy_only);
> +  uint64_t *rest_precopy,
> +  uint64_t *rest_postcopy);
>  /* Note for save_live_pending:
> - * - res_precopy_only is for data which must be migrated in precopy phase
> - * or in stopped state, in other words - before target vm start
> - * - res_compatible is for data which may be migrated in any phase
> - * - res_postcopy_only is for data which must be migrated in postcopy 
> phase
> - * or in stopped state, in other words - after source vm stop
> + * - res_precopy is for data which must be migrated in precopy
> + * phase or in stopped state, in other words - before target
> + * vm start
> + * - res_postcopy is for data which must be migrated in postcopy
> + * phase or in stopped state, in other words - after source vm
> + * stop
>   *
> - * Sum of res_postcopy_only, res_compatible and res_postcopy_only is the
> - * whole amount of pending data.
> + * Sum of res_precopy and res_postcopy is the whole amount of
> + * pending data.

I'm not sure if this is correct; I think we really do have three
different kinds of device:
  a) Those that don't know how to postcopy
  b) Those that can only send data in postcopy (which I think is what
the dirty-bitmap block stuff does)
  c) Devices that know how to postcopy, like RAM
  


> diff --git a/migration/ram.c b/migration/ram.c
> index dc1de9ddbc..20167e1102 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3435,9 +3435,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>  }
>  
>  static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
> - uint64_t *res_precopy_only,
> - uint64_t *res_compatible,
> - uint64_t *res_postcopy_only)
> + uint64_t *res_precopy, uint64_t *res_postcopy)
>  {
>  RAMState **temp = opaque;
>  RAMState *rs = *temp;
> @@ -3457,9 +3455,9 @@ static void ram_save_pending(QEMUFile *f, void *opaque, 
> uint64_t max_size,
>  
>  if (migrate_postcopy_ram()) {
>  /* We can do postcopy, and all the data is postcopiable */
> -*res_compatible += remaining_size;
> +*res_postcopy += remaining_size;

migrate_postcopy_ram() says that postcopy is enabled, not active, so here you 
are saying that
ram is 'postcopy only' according to your definition above.
I don't think it actually changes behaviour though at the moment; but it
doesn't feel right to assign that to something that says 'postcopy only'
migration_iteration_run doesn't enforce the postcopy-only part very
hard; it's just part of the threshold as far as I can tell.

Dave

>  } else {
> -*res_precopy_only += remaining_size;
> +*res_precopy += remaining_size;
>  }
>  }

> diff --git a/migration/migration.c b/migration/migration.c
> index bb8bbddfe4..440aa62f16 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3734,15 +3734,14 @@ typedef enum {
>   */
>  static MigIterateState migration_iteration_run(MigrationState *s)
>  {
> -uint64_t pending_size, pend_pre, pend_compat, pend_post;
> +uint64_t pending_size, pend_pre, pend_post;
>  bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
>  
>  qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre,
> -  &pend_compat, &pend_post);
> -pending_size = pend_pre + pend_compat + pend_post;
> +  &pend_po

Re: [PATCH v2 2/2] block/vmdk: Simplify vmdk_co_create() to return directly

2022-11-22 Thread Peter Maydell
On Tue, 22 Nov 2022 at 13:51, Markus Armbruster  wrote:
>
> Cc: Fam Zheng 
> Cc: Kevin Wolf 
> Cc: Hanna Reitz 
> Cc: qemu-block@nongnu.org
> Signed-off-by: Markus Armbruster 
> ---
>  block/vmdk.c | 28 +++-
>  1 file changed, 11 insertions(+), 17 deletions(-)
>

Reviewed-by: Peter Maydell 

thanks
-- PMM



[PATCH v2 2/2] block/vmdk: Simplify vmdk_co_create() to return directly

2022-11-22 Thread Markus Armbruster
Cc: Fam Zheng 
Cc: Kevin Wolf 
Cc: Hanna Reitz 
Cc: qemu-block@nongnu.org
Signed-off-by: Markus Armbruster 
---
 block/vmdk.c | 28 +++-
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 26376352b9..bac3d8db50 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -2821,7 +2821,6 @@ static BlockBackend *vmdk_co_create_cb(int64_t size, int 
idx,
 static int coroutine_fn vmdk_co_create(BlockdevCreateOptions *create_options,
Error **errp)
 {
-int ret;
 BlockdevCreateOptionsVmdk *opts;
 
 opts = &create_options->u.vmdk;
@@ -2829,24 +2828,19 @@ static int coroutine_fn 
vmdk_co_create(BlockdevCreateOptions *create_options,
 /* Validate options */
 if (!QEMU_IS_ALIGNED(opts->size, BDRV_SECTOR_SIZE)) {
 error_setg(errp, "Image size must be a multiple of 512 bytes");
-ret = -EINVAL;
-goto out;
+return -EINVAL;
 }
 
-ret = vmdk_co_do_create(opts->size,
-opts->subformat,
-opts->adapter_type,
-opts->backing_file,
-opts->hwversion,
-opts->toolsversion,
-false,
-opts->zeroed_grain,
-vmdk_co_create_cb,
-opts, errp);
-return ret;
-
-out:
-return ret;
+return vmdk_co_do_create(opts->size,
+ opts->subformat,
+ opts->adapter_type,
+ opts->backing_file,
+ opts->hwversion,
+ opts->toolsversion,
+ false,
+ opts->zeroed_grain,
+ vmdk_co_create_cb,
+ opts, errp);
 }
 
 static void vmdk_close(BlockDriverState *bs)
-- 
2.37.3




Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci

2022-11-22 Thread Peter Maydell
On Tue, 22 Nov 2022 at 13:27, Markus Armbruster  wrote:
>
> Peter Maydell  writes:
> > The obvious answer is "you might have got your manual tweaking
> > wrong". A purely mechanised patch I can review by looking at
> > the script and maybe eyeballing a few instances of the change;
> > a change that is 99% mechanised and 1% hand-written I need to
> > run through to find the hand-written parts.
>
> Define "handwritten" :)
>
> If reverting unwanted line-breaks and blank lines counts, then I can
> make two patches, one straight from Coccinelle, and one that reverts the
> unwanted crap.  The first one will be larger and more annoying to review
> than this one.  A clear loss in my book, but I'm the patch submitter,
> not a patch reviewer, so my book doesn't matter.
>
> Else, we're down to one file, which I already offered to split off.
>
> > But mostly this patch is hard to review for its sheer size,
> > mechanical changes or not. A 3000 line patchmail is so big that
> > the UI on my mail client gets pretty unwieldy.
>
> With the manual one split off, target/xtensa/ dropped as requested by
> Max, and tests/tcg/mips/ dropped because its status is unclear (and I
> start to find it hard to care), we're down to
>
>  28 files changed, 81 insertions(+), 221 deletions(-)

Yes, this is much better and "I hand tweaked these things"
is reasonable in a patch that big. It's the combination
of the ginormous multi-thousand-line patch and the hand
tweaking that was the really awkward part.

thanks
-- PMM



Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci

2022-11-22 Thread Markus Armbruster
Peter Maydell  writes:

> On Tue, 22 Nov 2022 at 08:58, Markus Armbruster  wrote:
>> I don't think complete detailed review is necessary or even sensible.
>>
>> Review should start with the Coccinelle script:
>>
>> // replace 'R = X; return R;' with 'return X;'
>> @@
>> identifier VAR;
>> expression E;
>> type T;
>> identifier F;
>> @@
>>  T F(...)
>>  {
>>  ...
>> -T VAR;
>>  ... when != VAR
>>
>> -VAR = (E);
>> -return VAR;
>> +return E;
>>  ... when != VAR
>>  }
>>
>> What could go wrong?  Not a rhetorical question!
>
> The obvious answer is "you might have got your manual tweaking
> wrong". A purely mechanised patch I can review by looking at
> the script and maybe eyeballing a few instances of the change;
> a change that is 99% mechanised and 1% hand-written I need to
> run through to find the hand-written parts.

Define "handwritten" :)

If reverting unwanted line-breaks and blank lines counts, then I can
make two patches, one straight from Coccinelle, and one that reverts the
unwanted crap.  The first one will be larger and more annoying to review
than this one.  A clear loss in my book, but I'm the patch submitter,
not a patch reviewer, so my book doesn't matter.

Else, we're down to one file, which I already offered to split off.

> But mostly this patch is hard to review for its sheer size,
> mechanical changes or not. A 3000 line patchmail is so big that
> the UI on my mail client gets pretty unwieldy.

With the manual one split off, target/xtensa/ dropped as requested by
Max, and tests/tcg/mips/ dropped because its status is unclear (and I
start to find it hard to care), we're down to

 28 files changed, 81 insertions(+), 221 deletions(-)

This will be v2.




Re: [PATCH v8 0/4] block: small refactorings

2022-11-22 Thread Kevin Wolf
Am 07.11.2022 um 17:35 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Hi all!
> 
> Here is 4-more simple already reviewed patches from
>  "[PATCH v5 00/45] Transactional block-graph modifying API" [1]
> 
> Called v8 because first part of [1] was recently merged as
>  "[PATCH v7 for-7.2 00/15] block: cleanup backing and file handling"

Thanks, applied to block-next.

Kevin




Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci

2022-11-22 Thread Peter Maydell
On Tue, 22 Nov 2022 at 08:58, Markus Armbruster  wrote:
> I don't think complete detailed review is necessary or even sensible.
>
> Review should start with the Coccinelle script:
>
> // replace 'R = X; return R;' with 'return X;'
> @@
> identifier VAR;
> expression E;
> type T;
> identifier F;
> @@
>  T F(...)
>  {
>  ...
> -T VAR;
>  ... when != VAR
>
> -VAR = (E);
> -return VAR;
> +return E;
>  ... when != VAR
>  }
>
> What could go wrong?  Not a rhetorical question!

The obvious answer is "you might have got your manual tweaking
wrong". A purely mechanised patch I can review by looking at
the script and maybe eyeballing a few instances of the change;
a change that is 99% mechanised and 1% hand-written I need to
run through to find the hand-written parts.

But mostly this patch is hard to review for its sheer size,
mechanical changes or not. A 3000 line patchmail is so big that
the UI on my mail client gets pretty unwieldy.

-- PMM



[PULL 1/8] vhost: mask VIRTIO_F_RING_RESET for vhost and vhost-user devices

2022-11-22 Thread Michael S. Tsirkin
From: Stefano Garzarella 

Commit 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
enabled VIRTIO_F_RING_RESET by default for all virtio devices.

This feature is not currently emulated by QEMU, so for vhost and
vhost-user devices we need to make sure it is supported by the offloaded
device emulation (in-kernel or in another process).
To do this we need to add VIRTIO_F_RING_RESET to the features bitmap
passed to vhost_get_features(). This way it will be masked if the device
does not support it.

This issue was initially discovered with vhost-vsock and vhost-user-vsock,
and then also tested with vhost-user-rng which confirmed the same issue.
They fail when sending features through VHOST_SET_FEATURES ioctl or
VHOST_USER_SET_FEATURES message, since VIRTIO_F_RING_RESET is negotiated
by the guest (Linux >= v6.0), but not supported by the device.

Fixes: 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1318
Signed-off-by: Stefano Garzarella 
Message-Id: <20221121101101.29400-1-sgarz...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Stefan Hajnoczi 
Acked-by: Raphael Norwitz 
Acked-by: Jason Wang 
---
 hw/block/vhost-user-blk.c  |  1 +
 hw/net/vhost_net.c |  1 +
 hw/scsi/vhost-scsi.c   |  1 +
 hw/scsi/vhost-user-scsi.c  |  1 +
 hw/virtio/vhost-user-fs.c  |  1 +
 hw/virtio/vhost-user-gpio.c|  1 +
 hw/virtio/vhost-user-i2c.c |  1 +
 hw/virtio/vhost-user-rng.c | 11 +--
 hw/virtio/vhost-vsock-common.c |  1 +
 net/vhost-vdpa.c   |  1 +
 10 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 16ad400889..0d5190accf 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -52,6 +52,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_NOTIFY_ON_EMPTY,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
+VIRTIO_F_RING_RESET,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index feda448878..26e4930676 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -75,6 +75,7 @@ static const int user_feature_bits[] = {
 VIRTIO_NET_F_MTU,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
+VIRTIO_F_RING_RESET,
 VIRTIO_NET_F_RSS,
 VIRTIO_NET_F_HASH_REPORT,
 
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index bdf337a7a2..6a0fd0dfb1 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
+VIRTIO_F_RING_RESET,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index bc37317d55..b7a71a802c 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
+VIRTIO_F_RING_RESET,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 1c40f42045..dc4014cdef 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -32,6 +32,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_NOTIFY_ON_EMPTY,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
+VIRTIO_F_RING_RESET,
 
 VHOST_INVALID_FEATURE_BIT
 };
diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
index 677d1c7730..5851cb3bc9 100644
--- a/hw/virtio/vhost-user-gpio.c
+++ b/hw/virtio/vhost-user-gpio.c
@@ -24,6 +24,7 @@ static const int feature_bits[] = {
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_GPIO_F_IRQ,
+VIRTIO_F_RING_RESET,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-i2c.c b/hw/virtio/vhost-user-i2c.c
index 864eba695e..1c9f3d20dc 100644
--- a/hw/virtio/vhost-user-i2c.c
+++ b/hw/virtio/vhost-user-i2c.c
@@ -16,6 +16,7 @@
 
 static const int feature_bits[] = {
 VIRTIO_I2C_F_ZERO_LENGTH_REQUEST,
+VIRTIO_F_RING_RESET,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-rng.c b/hw/virtio/vhost-user-rng.c
index 8b47287875..f9084cde58 100644
--- a/hw/virtio/vhost-user-rng.c
+++ b/hw/virtio/vhost-user-rng.c
@@ -16,6 +16,11 @@
 #include "qemu/error-report.h"
 #include "standard-headers/linux/virtio_ids.h"
 
+static const int feature_bits[] = {
+VIRTIO_F_RING_RESET,
+VHOST_INVALID_FEATURE_BIT
+};
+
 static void vu_rng_start(VirtIODevice *vdev)
 {
 VHostUserRNG *rng = VHOST_USER_RNG(vdev);
@@ -106,8 +111,10 @@ static void vu_rng_set_status(VirtIODevice *vdev, uint8_t 
status)
 static uint64_t vu_rng_get_features(VirtIODevice *vdev,
 uint64_t requested_features, Error **errp)
 {
-/* No feature bits used yet */

Re: [PATCH v4 11/11] block/dirty-bitmap: convert coroutine-only functions to generated_co_wrapper_simple

2022-11-22 Thread Kevin Wolf
Am 16.11.2022 um 13:22 hat Emanuele Giuseppe Esposito geschrieben:
> bdrv_can_store_new_dirty_bitmap and bdrv_remove_persistent_dirty_bitmap
> check if they are running in a coroutine, directly calling the
> coroutine callback if it's the case.
> Except that no coroutine calls such functions, therefore that check
> can be removed, and function creation can be offloaded to
> g_c_w_simple.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 

Reviewed-by: Kevin Wolf 




Re: [PATCH v4 10/11] block: convert bdrv_create to generated_co_wrapper_simple

2022-11-22 Thread Kevin Wolf
Am 16.11.2022 um 13:22 hat Emanuele Giuseppe Esposito geschrieben:
> This function is never called in coroutine context, therefore
> instead of manually creating a new coroutine, delegate it to the
> block-coroutine-wrapper script, defining it as g_c_w_simple.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 

At first I thought that "never called in coroutine context" was not
entirely true because of paths like this:

qcow2_co_create() -> bdrv_open_blockdev_ref() -> bdrv_open_inherit() ->
bdrv_append_temp_snapshot() -> bdrv_create().

The only reason why it doesn't happen is that with a BlockdevRef, it's
impossible to get BDRV_O_SNAPSHOT set, so bdrv_append_temp_snapshot()
can't actually be called when you come from bdrv_open_blockdev_ref().

Of course, opening images in a coroutine is likely to already do similar
things. We should probably drop out of coroutine context for bdrv_open
to be safe. In practice, I suspect it might work anyway because nothing
is going to wait on the current coroutine, but maybe better not to rely
on it.

Anyway, not a problem of your patch, it's just something it made me
think of.

> diff --git a/block.c b/block.c
> index 7a4c3eb540..d3e168408a 100644
> --- a/block.c
> +++ b/block.c
> @@ -528,7 +528,7 @@ typedef struct CreateCo {
>  Error *err;
>  } CreateCo;
>  
> -static int coroutine_fn bdrv_co_create(BlockDriver *drv, const char 
> *filename,
> +int coroutine_fn bdrv_co_create(BlockDriver *drv, const char *filename,
> QemuOpts *opts, Error **errp)

The indentation is off now. With this fixed:

Reviewed-by: Kevin Wolf 




Re: [PATCH v4 09/11] block: bdrv_create_file is a coroutine_fn

2022-11-22 Thread Emanuele Giuseppe Esposito



Am 22/11/2022 um 09:58 schrieb Kevin Wolf:
> Am 16.11.2022 um 13:22 hat Emanuele Giuseppe Esposito geschrieben:
>> It is always called in coroutine_fn callbacks, therefore
>> it can directly call bdrv_co_create().
>>
>> Signed-off-by: Emanuele Giuseppe Esposito 
>> ---
>>  block.c| 6 --
>>  include/block/block-global-state.h | 3 ++-
>>  2 files changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index c610a32e77..7a4c3eb540 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -534,6 +534,7 @@ static int coroutine_fn bdrv_co_create(BlockDriver *drv, 
>> const char *filename,
>>  int ret;
>>  char *filename_copy;
>>  GLOBAL_STATE_CODE();
>> +assert(qemu_in_coroutine());
> 
> We don't generally assert this, otherwise it would have to be in every
> coroutine_fn.

That was my plan for the serie "Protect the block layer with a rwlock:
part 3", where I convert BlockDriver callbacks in coroutine, and thus
assert there because I know all the callers are coroutine_fn.

> 
>>  assert(*errp == NULL);
>>  assert(drv);
>>  
>> @@ -725,7 +726,8 @@ out:
>>  return ret;
>>  }
>>  
>> -int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
>> +int coroutine_fn bdrv_create_file(const char *filename, QemuOpts *opts,
>> +  Error **errp)
> 
> Should it be renamed as bdrv_co_create_file()?
> 
Ok (when I don't answer just assume that I agree).

Thank you,
Emanuele
>>  {
>>  QemuOpts *protocol_opts;
>>  BlockDriver *drv;
>> @@ -766,7 +768,7 @@ int bdrv_create_file(const char *filename, QemuOpts 
>> *opts, Error **errp)
>>  goto out;
>>  }
>>  
>> -ret = bdrv_create(drv, filename, protocol_opts, errp);
>> +ret = bdrv_co_create(drv, filename, protocol_opts, errp);
>>  out:
>>  qemu_opts_del(protocol_opts);
>>  qobject_unref(qdict);
> 
> Kevin
> 




Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci

2022-11-22 Thread Markus Armbruster
Thomas Huth  writes:

> On 21/11/2022 17.32, Markus Armbruster wrote:
>> Philippe Mathieu-Daudé  writes:
>> 
>>> On 21/11/22 15:36, Peter Maydell wrote:
 On Mon, 21 Nov 2022 at 14:03, Markus Armbruster  wrote:
>
> Tweak the semantic patch to drop redundant parenthesis around the
> return expression.
>
> Coccinelle drops comments in hw/rdma/vmw/pvrdma_cmd.c; restored
> manually.
>
> Coccinelle messes up vmdk_co_create(), not sure why.  Transformed
> manually.
>
> Line breaks in target/avr/cpu.h and hw/rdma/vmw/pvrdma_cmd.c tidied up
> manually.
>
> Whitespace in fuse_reply_iov() tidied up manually.
>
> checkpatch.pl complains "return of an errno should typically be -ve"
> two times for hw/9pfs/9p-synth.c.  Preexisting, the patch merely makes
> it visible to checkpatch.pl.
>
> checkpatch.pl complains "return is not a function, parentheses are not
> required" three times for target/mips/tcg/dsp_helper.c.  False
> positives.
>
> Signed-off-by: Markus Armbruster 

>.../user/ase/msa/bit-count/test_msa_nloc_b.c  |   9 +-
>.../user/ase/msa/bit-count/test_msa_nloc_d.c  |   9 +-
 [snip long list of other mips test files]

>328 files changed, 989 insertions(+), 2099 deletions(-)
 This patch seems to almost entirely be huge because of these
 mips test case files. Are they specific to QEMU or are they
 effectively a 3rd-party import that it doesn't make sense
 to make local changes to?
>>>
>>> They are imported and will unlikely be modified.
>> 
>> Not obvious to me from git-log.
>> 
>> Should I drop the changes to tests/tcg/mips/?
>
> I'd say yes. At least move them to a separate patch.

Possible status of tests/tcg/mips/:

1. Imported, should not be modified

   Drop from the patch.

2. Not imported, should be modified

2a. To be reviewed separately from the remainder of the patch

Split off.

2b. Likewise, but nobody will care to review, realistically

Split off and merge anyway, or drop.  I'd go for the latter.

2c. To be reviewed together with the remainder of the patch

Keep as is.

Which one is it?

>  Otherwise reviewing 
> this patch here is no fun at all.

I don't think complete detailed review is necessary or even sensible.

Review should start with the Coccinelle script:

// replace 'R = X; return R;' with 'return X;'
@@
identifier VAR;
expression E;
type T;
identifier F;
@@
 T F(...)
 {
 ...
-T VAR;
 ... when != VAR

-VAR = (E);
-return VAR;
+return E;
 ... when != VAR
 }

What could go wrong?  Not a rhetorical question!

The simple part is executing the rule.  It'll *delete* variable VAR, and
*preserve* expression E.

The tricky part is deciding whether the rule matches, in particular
proving that there are no other uses of VAR.  If Coccinelle gets that
wrong, the code no longer compiles *unless* there is another, global VAR
of suitable type.

Turns out Coccinelle does get it wrong for vmdk_co_create(), and the
compiler duly rejects it.  I don't understand why it gets it wrong.

To help understand the script, and as a sanity check, reviewing some of
its patches is advisable.  Reviewing all of them feels impractical.




Re: [PATCH v4 09/11] block: bdrv_create_file is a coroutine_fn

2022-11-22 Thread Kevin Wolf
Am 16.11.2022 um 13:22 hat Emanuele Giuseppe Esposito geschrieben:
> It is always called in coroutine_fn callbacks, therefore
> it can directly call bdrv_co_create().
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  block.c| 6 --
>  include/block/block-global-state.h | 3 ++-
>  2 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/block.c b/block.c
> index c610a32e77..7a4c3eb540 100644
> --- a/block.c
> +++ b/block.c
> @@ -534,6 +534,7 @@ static int coroutine_fn bdrv_co_create(BlockDriver *drv, 
> const char *filename,
>  int ret;
>  char *filename_copy;
>  GLOBAL_STATE_CODE();
> +assert(qemu_in_coroutine());

We don't generally assert this, otherwise it would have to be in every
coroutine_fn.

>  assert(*errp == NULL);
>  assert(drv);
>  
> @@ -725,7 +726,8 @@ out:
>  return ret;
>  }
>  
> -int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
> +int coroutine_fn bdrv_create_file(const char *filename, QemuOpts *opts,
> +  Error **errp)

Should it be renamed as bdrv_co_create_file()?

>  {
>  QemuOpts *protocol_opts;
>  BlockDriver *drv;
> @@ -766,7 +768,7 @@ int bdrv_create_file(const char *filename, QemuOpts 
> *opts, Error **errp)
>  goto out;
>  }
>  
> -ret = bdrv_create(drv, filename, protocol_opts, errp);
> +ret = bdrv_co_create(drv, filename, protocol_opts, errp);
>  out:
>  qemu_opts_del(protocol_opts);
>  qobject_unref(qdict);

Kevin




Re: [PATCH v4 08/11] block: distinguish between bdrv_create running in coroutine and not

2022-11-22 Thread Kevin Wolf
Am 16.11.2022 um 13:22 hat Emanuele Giuseppe Esposito geschrieben:
> Call two different functions depending on whether bdrv_create
> is in coroutine or not, following the same pattern as
> generated_co_wrapper functions.
> 
> This allows to also call the coroutine function directly,
> without using CreateCo or relying in bdrv_create().
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  block.c | 76 -
>  1 file changed, 38 insertions(+), 38 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 577639c7e0..c610a32e77 100644
> --- a/block.c
> +++ b/block.c
> @@ -528,66 +528,66 @@ typedef struct CreateCo {
>  Error *err;
>  } CreateCo;
>  
> -static void coroutine_fn bdrv_create_co_entry(void *opaque)
> +static int coroutine_fn bdrv_co_create(BlockDriver *drv, const char 
> *filename,
> +   QemuOpts *opts, Error **errp)
>  {
> -Error *local_err = NULL;
>  int ret;
> +char *filename_copy;
> +GLOBAL_STATE_CODE();
> +assert(*errp == NULL);

Your use of *errp in this function will crash if NULL is passed. You
need ERRP_GUARD() first before you can do this.

> +assert(drv);
> +
> +if (!drv->bdrv_co_create_opts) {
> +error_setg(errp, "Driver '%s' does not support image creation",
> +   drv->format_name);
> +return -ENOTSUP;
> +}
>  
> +filename_copy = g_strdup(filename);

It's only preserved from the pre-patch state, but I don't think this is
necessary? We know that the string will stay around until the function
returns, and the parameter of drv->bdrv_co_create_opts is const char*,
so it must not be modified either.

Maybe worth a cleanup patch on top to just use filename directly?

> +ret = drv->bdrv_co_create_opts(drv, filename_copy, opts, errp);
> +
> +if (ret < 0 && !*errp) {
> +error_setg_errno(errp, -ret, "Could not create image");
> +}
> +
> +g_free(filename_copy);
> +return ret;
> +}

Kevin




Re: [PATCH maybe-7.2 1/3] hw/i2c: only schedule pending master when bus is idle

2022-11-22 Thread Klaus Jensen
On Nov 17 14:40, Cédric Le Goater wrote:
> On 11/17/22 12:58, Klaus Jensen wrote:
> > On Nov 17 09:01, Cédric Le Goater wrote:
> > > On 11/17/22 08:37, Klaus Jensen wrote:
> > > > On Nov 17 07:56, Cédric Le Goater wrote:
> > > > > On 11/17/22 07:40, Klaus Jensen wrote:
> > > > > > On Nov 16 16:58, Cédric Le Goater wrote:
> > > > > > > On 11/16/22 09:43, Klaus Jensen wrote:
> > > > > > > > From: Klaus Jensen 
> > > > > > > > 
> > > > > > > > It is not given that the current master will release the bus 
> > > > > > > > after a
> > > > > > > > transfer ends. Only schedule a pending master if the bus is 
> > > > > > > > idle.
> > > > > > > > 
> > > > > > > > Fixes: 37fa5ca42623 ("hw/i2c: support multiple masters")
> > > > > > > > Signed-off-by: Klaus Jensen 
> > > > > > > > ---
> > > > > > > >  hw/i2c/aspeed_i2c.c  |  2 ++
> > > > > > > >  hw/i2c/core.c| 37 
> > > > > > > > ++---
> > > > > > > >  include/hw/i2c/i2c.h |  2 ++
> > > > > > > >  3 files changed, 26 insertions(+), 15 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/hw/i2c/aspeed_i2c.c b/hw/i2c/aspeed_i2c.c
> > > > > > > > index c166fd20fa11..1f071a3811f7 100644
> > > > > > > > --- a/hw/i2c/aspeed_i2c.c
> > > > > > > > +++ b/hw/i2c/aspeed_i2c.c
> > > > > > > > @@ -550,6 +550,8 @@ static void 
> > > > > > > > aspeed_i2c_bus_handle_cmd(AspeedI2CBus *bus, uint64_t value)
> > > > > > > >  }
> > > > > > > >  SHARED_ARRAY_FIELD_DP32(bus->regs, reg_cmd, 
> > > > > > > > M_STOP_CMD, 0);
> > > > > > > >  aspeed_i2c_set_state(bus, I2CD_IDLE);
> > > > > > > > +
> > > > > > > > +i2c_schedule_pending_master(bus->bus);
> > > > > > > 
> > > > > > > Shouldn't it be i2c_bus_release() ?
> > > > > > > 
> > > > > > 
> > > > > > The reason for having both i2c_bus_release() and
> > > > > > i2c_schedule_pending_master() is that i2c_bus_release() sort of 
> > > > > > pairs
> > > > > > with i2c_bus_master(). They either set or clear the bus->bh member.
> > > > > > 
> > > > > > In the current design, the controller (in this case the Aspeed I2C) 
> > > > > > is
> > > > > > an "implicit" master (it does not have a bottom half driving it), so
> > > > > > there is no bus->bh to clear.
> > > > > > 
> > > > > > I should (and will) write some documentation on the asynchronous 
> > > > > > API.
> > > > > 
> > > > > I found the routine names confusing. Thanks for the clarification.
> > > > > 
> > > > > Maybe we could do this rename  :
> > > > > 
> > > > > i2c_bus_release() -> i2c_bus_release_and_clear()
> > > > > i2c_schedule_pending_master() -> i2c_bus_release()
> > > > > 
> > > > > and keep i2c_schedule_pending_master() internal the I2C core 
> > > > > subsystem.
> > > > > 
> > > > 
> > > > How about renaming i2c_bus_master to i2c_bus_acquire() such that it
> > > > pairs with i2c_bus_release().
> > > 
> > > Looks good to me.
> > > 
> > > > And then add an i2c_bus_yield() to be used by the controller? I think we
> > > > should be able to assert in i2c_bus_yield() that bus->bh is NULL. But
> > > > I'll take a closer look at that.
> > > 
> > > I am using your i2c-echo slave device to track regressions in the Aspeed
> > > machines. May be we could merge it for tests ?
> > > 
> > 
> > Oh, cool.
> > 
> > Sure, I'd be happy to help "maintain" it ;)
> 
> And so, I am seeing errors with the little POC you sent.
> 
> without:
>   console: echo slave-24c02 0x1064 > /sys/bus/i2c/devices/i2c-3/new_device
>   console: # echo slave-24c02 0x1064 > /sys/bus/i2c/devices/i2c-3/new_device
>   console: [4.252431] i2c i2c-3: new_device: Instantiated device 
> slave-24c02 at 0x64
>   console: i2cset -y 3 0x42 0x64 0x00 0xaa i
>   /console: # i2cset -y 3 0x42 0x64 0x00 0xaa i
>   console: # hexdump /sys/bus/i2c/devices/3-1064/slave-eeprom
>   console: 000 ffaa       
>   console: poweroff
>   console: 010        
>   console: *
>   console: 100
> 
> with:
>   console: echo slave-24c02 0x1064 > /sys/bus/i2c/devices/i2c-3/new_device
>   console: # echo slave-24c02 0x1064 > /sys/bus/i2c/devices/i2c-3/new_device
>   console: [4.413210] i2c i2c-3: new_device: Instantiated device 
> slave-24c02 at 0x64
>   console: i2cset -y 3 0x42 0x64 0x00 0xaa i
>   console: # i2cset -y 3 0x42 0x64 0x00 0xaa i
>   console: # hexdump /sys/bus/i2c/devices/3-1064/slave-eeprom
>   console: 000        
>   console: *
>   console: 100
> C.

Right.

This is because the i2c-echo device is scheduling the bottom half
initially on its own. What happens is that the bottom half gets queued
up in the pending masters list instead of being scheduling directly. And
since the i2c controller is idle, the bottom half is never scheduled.

Fixing i2c_bus_acquire() to schedulue it directly if the bus is free
seems the proper way to do it. I'll include that in v2.

While it is not directly invalid, the 

Re: [PATCH v4 05/11] block-coroutine-wrapper.py: default to main loop aiocontext if function does not have a BlockDriverState parameter

2022-11-22 Thread Kevin Wolf
Am 21.11.2022 um 16:52 hat Emanuele Giuseppe Esposito geschrieben:
> Am 21/11/2022 um 16:30 schrieb Kevin Wolf:
> > Am 16.11.2022 um 13:22 hat Emanuele Giuseppe Esposito geschrieben:
> >> Basically BdrvPollCo->bs is only used by bdrv_poll_co(), and the
> >> functions that it uses are both using bdrv_get_aio_context, that
> >> defaults to qemu_get_aio_context() if bs is NULL.
> >>
> >> Therefore pass NULL to BdrvPollCo to automatically generate a function
> >> that create and runs a coroutine in the main loop.
> >>
> >> Signed-off-by: Emanuele Giuseppe Esposito 
> > 
> > It happens to work, but it's kind of ugly to call bdrv_coroutine_enter()
> > and BDRV_POLL_WHILE() with a NULL bs.
> > 
> > How hard would it be to generate code that doesn't use these functions,
> > but directly aio_co_enter() and AIO_WAIT_WHILE() for wrappers that are
> > not related to a BDS?
> > 
> 
> At this point, I would get rid of s->poll_state.bs and instead use
> s->poll_state.aio_context. Then call directly aio_co_enter and
> AIO_WAIT_WHILE, as you suggested but just everywhere, without
> differentiating the cases.

Oh, yes, that's better.

> Then we would have something similar to what it is currently done with bs:
> 
> if t == 'BlockDriverState *':
> bs = 'bdrv_get_aio_context(bs)'
> elif t == 'BdrvChild *':
> bs = 'bdrv_get_aio_context(child->bs)'
> elif t == 'BlockBackend *':
> bs = 'bdrv_get_aio_context(blk_bs(blk))'

blk_get_aio_context(blk) seems more correct.

> else:
> bs = 'qemu_get_aio_context()'
> 
> I haven't tried it yet, but it should work.

Kevin




Re: [PATCH for-7.2] block-backend: avoid bdrv_unregister_buf() NULL pointer deref

2022-11-22 Thread Kevin Wolf
Am 21.11.2022 um 22:19 hat Stefan Hajnoczi geschrieben:
> bdrv_*() APIs expect a valid BlockDriverState. Calling them with bs=NULL
> leads to undefined behavior.
> 
> Jonathan Cameron reported this following NULL pointer dereference when a
> VM with a virtio-blk device and a memory-backend-file object is
> terminated:
> 1. qemu_cleanup() closes all drives, setting blk->root to NULL
> 2. qemu_cleanup() calls user_creatable_cleanup(), which results in a RAM
>block notifier callback because the memory-backend-file is destroyed.
> 3. blk_unregister_buf() is called by virtio-blk's BlockRamRegistrar
>notifier callback and undefined behavior occurs.
> 
> Fixes: baf422684d73 ("virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization 
> hint")
> Co-authored-by: Jonathan Cameron 
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Kevin Wolf 

This raises some questions, though. What happens if the graph isn't
static between creation and deletion of the device? Do we need to do
something with registered buffers when a node is attached to or detached
from an existing device?

Kevin




[PATCH for-7.2 3/5] hw/nvme: fix aio cancel in zone reset

2022-11-22 Thread Klaus Jensen
From: Klaus Jensen 

If the zone reset operation is cancelled but the block unmap operation
completes normally, the callback will continue resetting the next zone
since it neglects to check iocb->ret which will have been set to
-ECANCELED. Make sure that this is checked and bail out if an error is
present.

Secondly, fix a potential use-after-free by removing the bottom half and
enqueuing the completion directly.

Fixes: 63d96e4ffd71 ("hw/nvme: reimplement zone reset to allow cancellation")
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 36 +++-
 1 file changed, 11 insertions(+), 25 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index fc129b8d1a93..558ccea154c2 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -3712,7 +3712,6 @@ typedef struct NvmeZoneResetAIOCB {
 BlockAIOCB common;
 BlockAIOCB *aiocb;
 NvmeRequest *req;
-QEMUBH *bh;
 int ret;
 
 bool all;
@@ -3741,17 +3740,6 @@ static const AIOCBInfo nvme_zone_reset_aiocb_info = {
 .cancel_async = nvme_zone_reset_cancel,
 };
 
-static void nvme_zone_reset_bh(void *opaque)
-{
-NvmeZoneResetAIOCB *iocb = opaque;
-
-iocb->common.cb(iocb->common.opaque, iocb->ret);
-
-qemu_bh_delete(iocb->bh);
-iocb->bh = NULL;
-qemu_aio_unref(iocb);
-}
-
 static void nvme_zone_reset_cb(void *opaque, int ret);
 
 static void nvme_zone_reset_epilogue_cb(void *opaque, int ret)
@@ -3762,14 +3750,8 @@ static void nvme_zone_reset_epilogue_cb(void *opaque, 
int ret)
 int64_t moff;
 int count;
 
-if (ret < 0) {
-nvme_zone_reset_cb(iocb, ret);
-return;
-}
-
-if (!ns->lbaf.ms) {
-nvme_zone_reset_cb(iocb, 0);
-return;
+if (ret < 0 || iocb->ret < 0 || !ns->lbaf.ms) {
+goto out;
 }
 
 moff = nvme_moff(ns, iocb->zone->d.zslba);
@@ -3779,6 +3761,9 @@ static void nvme_zone_reset_epilogue_cb(void *opaque, int 
ret)
 BDRV_REQ_MAY_UNMAP,
 nvme_zone_reset_cb, iocb);
 return;
+
+out:
+nvme_zone_reset_cb(iocb, ret);
 }
 
 static void nvme_zone_reset_cb(void *opaque, int ret)
@@ -3787,7 +3772,9 @@ static void nvme_zone_reset_cb(void *opaque, int ret)
 NvmeRequest *req = iocb->req;
 NvmeNamespace *ns = req->ns;
 
-if (ret < 0) {
+if (iocb->ret < 0) {
+goto done;
+} else if (ret < 0) {
 iocb->ret = ret;
 goto done;
 }
@@ -3835,9 +3822,9 @@ static void nvme_zone_reset_cb(void *opaque, int ret)
 
 done:
 iocb->aiocb = NULL;
-if (iocb->bh) {
-qemu_bh_schedule(iocb->bh);
-}
+
+iocb->common.cb(iocb->common.opaque, iocb->ret);
+qemu_aio_unref(iocb);
 }
 
 static uint16_t nvme_zone_mgmt_send_zrwa_flush(NvmeCtrl *n, NvmeZone *zone,
@@ -3942,7 +3929,6 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, 
NvmeRequest *req)
nvme_misc_cb, req);
 
 iocb->req = req;
-iocb->bh = qemu_bh_new(nvme_zone_reset_bh, iocb);
 iocb->ret = 0;
 iocb->all = all;
 iocb->idx = zone_idx;
-- 
2.38.1




[PATCH for-7.2 4/5] hw/nvme: fix aio cancel in dsm

2022-11-22 Thread Klaus Jensen
From: Klaus Jensen 

When the DSM operation is cancelled asynchronously, we set iocb->ret to
-ECANCELED. However, the callback function only checks the return value
of the completed aio, which may have completed succesfully prior to the
cancellation and thus the callback ends up continuing the dsm operation
instead of bailing out. Fix this.

Secondly, fix a potential use-after-free by removing the bottom half and
enqueuing the completion directly.

Fixes: d7d1474fd85d ("hw/nvme: reimplement dsm to allow cancellation")
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 34 --
 1 file changed, 8 insertions(+), 26 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 558ccea154c2..458c85d47cce 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -2329,7 +2329,6 @@ typedef struct NvmeDSMAIOCB {
 BlockAIOCB common;
 BlockAIOCB *aiocb;
 NvmeRequest *req;
-QEMUBH *bh;
 int ret;
 
 NvmeDsmRange *range;
@@ -2351,7 +2350,7 @@ static void nvme_dsm_cancel(BlockAIOCB *aiocb)
 } else {
 /*
  * We only reach this if nvme_dsm_cancel() has already been called or
- * the command ran to completion and nvme_dsm_bh is scheduled to run.
+ * the command ran to completion.
  */
 assert(iocb->idx == iocb->nr);
 }
@@ -2362,17 +2361,6 @@ static const AIOCBInfo nvme_dsm_aiocb_info = {
 .cancel_async = nvme_dsm_cancel,
 };
 
-static void nvme_dsm_bh(void *opaque)
-{
-NvmeDSMAIOCB *iocb = opaque;
-
-iocb->common.cb(iocb->common.opaque, iocb->ret);
-
-qemu_bh_delete(iocb->bh);
-iocb->bh = NULL;
-qemu_aio_unref(iocb);
-}
-
 static void nvme_dsm_cb(void *opaque, int ret);
 
 static void nvme_dsm_md_cb(void *opaque, int ret)
@@ -2384,16 +2372,10 @@ static void nvme_dsm_md_cb(void *opaque, int ret)
 uint64_t slba;
 uint32_t nlb;
 
-if (ret < 0) {
-iocb->ret = ret;
+if (ret < 0 || iocb->ret < 0 || !ns->lbaf.ms) {
 goto done;
 }
 
-if (!ns->lbaf.ms) {
-nvme_dsm_cb(iocb, 0);
-return;
-}
-
 range = &iocb->range[iocb->idx - 1];
 slba = le64_to_cpu(range->slba);
 nlb = le32_to_cpu(range->nlb);
@@ -2406,7 +2388,6 @@ static void nvme_dsm_md_cb(void *opaque, int ret)
 ret = nvme_block_status_all(ns, slba, nlb, BDRV_BLOCK_ZERO);
 if (ret) {
 if (ret < 0) {
-iocb->ret = ret;
 goto done;
 }
 
@@ -2420,8 +2401,7 @@ static void nvme_dsm_md_cb(void *opaque, int ret)
 return;
 
 done:
-iocb->aiocb = NULL;
-qemu_bh_schedule(iocb->bh);
+nvme_dsm_cb(iocb, ret);
 }
 
 static void nvme_dsm_cb(void *opaque, int ret)
@@ -2434,7 +2414,9 @@ static void nvme_dsm_cb(void *opaque, int ret)
 uint64_t slba;
 uint32_t nlb;
 
-if (ret < 0) {
+if (iocb->ret < 0) {
+goto done;
+} else if (ret < 0) {
 iocb->ret = ret;
 goto done;
 }
@@ -2468,7 +2450,8 @@ next:
 
 done:
 iocb->aiocb = NULL;
-qemu_bh_schedule(iocb->bh);
+iocb->common.cb(iocb->common.opaque, iocb->ret);
+qemu_aio_unref(iocb);
 }
 
 static uint16_t nvme_dsm(NvmeCtrl *n, NvmeRequest *req)
@@ -2486,7 +2469,6 @@ static uint16_t nvme_dsm(NvmeCtrl *n, NvmeRequest *req)
  nvme_misc_cb, req);
 
 iocb->req = req;
-iocb->bh = qemu_bh_new(nvme_dsm_bh, iocb);
 iocb->ret = 0;
 iocb->range = g_new(NvmeDsmRange, nr);
 iocb->nr = nr;
-- 
2.38.1




[PATCH for-7.2 2/5] hw/nvme: fix aio cancel in flush

2022-11-22 Thread Klaus Jensen
From: Klaus Jensen 

Make sure that iocb->aiocb is NULL'ed when cancelling.

Fix a potential use-after-free by removing the bottom half and enqueuing
the completion directly.

Fixes: 38f4ac65ac88 ("hw/nvme: reimplement flush to allow cancellation")
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 21 ++---
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 26b53469328f..fc129b8d1a93 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -3160,7 +3160,6 @@ typedef struct NvmeFlushAIOCB {
 BlockAIOCB common;
 BlockAIOCB *aiocb;
 NvmeRequest *req;
-QEMUBH *bh;
 int ret;
 
 NvmeNamespace *ns;
@@ -3176,6 +3175,7 @@ static void nvme_flush_cancel(BlockAIOCB *acb)
 
 if (iocb->aiocb) {
 blk_aio_cancel_async(iocb->aiocb);
+iocb->aiocb = NULL;
 }
 }
 
@@ -3185,6 +3185,8 @@ static const AIOCBInfo nvme_flush_aiocb_info = {
 .get_aio_context = nvme_get_aio_context,
 };
 
+static void nvme_do_flush(NvmeFlushAIOCB *iocb);
+
 static void nvme_flush_ns_cb(void *opaque, int ret)
 {
 NvmeFlushAIOCB *iocb = opaque;
@@ -3206,13 +3208,11 @@ static void nvme_flush_ns_cb(void *opaque, int ret)
 }
 
 out:
-iocb->aiocb = NULL;
-qemu_bh_schedule(iocb->bh);
+nvme_do_flush(iocb);
 }
 
-static void nvme_flush_bh(void *opaque)
+static void nvme_do_flush(NvmeFlushAIOCB *iocb)
 {
-NvmeFlushAIOCB *iocb = opaque;
 NvmeRequest *req = iocb->req;
 NvmeCtrl *n = nvme_ctrl(req);
 int i;
@@ -3239,14 +3239,8 @@ static void nvme_flush_bh(void *opaque)
 return;
 
 done:
-qemu_bh_delete(iocb->bh);
-iocb->bh = NULL;
-
 iocb->common.cb(iocb->common.opaque, iocb->ret);
-
 qemu_aio_unref(iocb);
-
-return;
 }
 
 static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
@@ -3258,7 +3252,6 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
 iocb = qemu_aio_get(&nvme_flush_aiocb_info, NULL, nvme_misc_cb, req);
 
 iocb->req = req;
-iocb->bh = qemu_bh_new(nvme_flush_bh, iocb);
 iocb->ret = 0;
 iocb->ns = NULL;
 iocb->nsid = 0;
@@ -3280,13 +3273,11 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest 
*req)
 }
 
 req->aiocb = &iocb->common;
-qemu_bh_schedule(iocb->bh);
+nvme_do_flush(iocb);
 
 return NVME_NO_COMPLETE;
 
 out:
-qemu_bh_delete(iocb->bh);
-iocb->bh = NULL;
 qemu_aio_unref(iocb);
 
 return status;
-- 
2.38.1




[PATCH for-7.2 5/5] hw/nvme: remove copy bh scheduling

2022-11-22 Thread Klaus Jensen
From: Klaus Jensen 

Fix a potential use-after-free by removing the bottom half and enqueuing
the completion directly.

Fixes: 796d20681d9b ("hw/nvme: reimplement the copy command to allow aio 
cancellation")
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 63 +++---
 1 file changed, 14 insertions(+), 49 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 458c85d47cce..bbbab522aa7a 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -2552,7 +2552,6 @@ typedef struct NvmeCopyAIOCB {
 BlockAIOCB common;
 BlockAIOCB *aiocb;
 NvmeRequest *req;
-QEMUBH *bh;
 int ret;
 
 void *ranges;
@@ -2590,9 +2589,8 @@ static const AIOCBInfo nvme_copy_aiocb_info = {
 .cancel_async = nvme_copy_cancel,
 };
 
-static void nvme_copy_bh(void *opaque)
+static void nvme_copy_done(NvmeCopyAIOCB *iocb)
 {
-NvmeCopyAIOCB *iocb = opaque;
 NvmeRequest *req = iocb->req;
 NvmeNamespace *ns = req->ns;
 BlockAcctStats *stats = blk_get_stats(ns->blkconf.blk);
@@ -2604,9 +2602,6 @@ static void nvme_copy_bh(void *opaque)
 qemu_iovec_destroy(&iocb->iov);
 g_free(iocb->bounce);
 
-qemu_bh_delete(iocb->bh);
-iocb->bh = NULL;
-
 if (iocb->ret < 0) {
 block_acct_failed(stats, &iocb->acct.read);
 block_acct_failed(stats, &iocb->acct.write);
@@ -2619,7 +2614,7 @@ static void nvme_copy_bh(void *opaque)
 qemu_aio_unref(iocb);
 }
 
-static void nvme_copy_cb(void *opaque, int ret);
+static void nvme_do_copy(NvmeCopyAIOCB *iocb);
 
 static void nvme_copy_source_range_parse_format0(void *ranges, int idx,
  uint64_t *slba, uint32_t *nlb,
@@ -2731,7 +2726,7 @@ static void nvme_copy_out_completed_cb(void *opaque, int 
ret)
 iocb->idx++;
 iocb->slba += nlb;
 out:
-nvme_copy_cb(iocb, iocb->ret);
+nvme_do_copy(iocb);
 }
 
 static void nvme_copy_out_cb(void *opaque, int ret)
@@ -2743,16 +2738,8 @@ static void nvme_copy_out_cb(void *opaque, int ret)
 size_t mlen;
 uint8_t *mbounce;
 
-if (ret < 0) {
-iocb->ret = ret;
+if (ret < 0 || iocb->ret < 0 || !ns->lbaf.ms) {
 goto out;
-} else if (iocb->ret < 0) {
-goto out;
-}
-
-if (!ns->lbaf.ms) {
-nvme_copy_out_completed_cb(iocb, 0);
-return;
 }
 
 nvme_copy_source_range_parse(iocb->ranges, iocb->idx, iocb->format, NULL,
@@ -2771,7 +2758,7 @@ static void nvme_copy_out_cb(void *opaque, int ret)
 return;
 
 out:
-nvme_copy_cb(iocb, ret);
+nvme_copy_out_completed_cb(iocb, ret);
 }
 
 static void nvme_copy_in_completed_cb(void *opaque, int ret)
@@ -2865,15 +2852,9 @@ static void nvme_copy_in_completed_cb(void *opaque, int 
ret)
 
 invalid:
 req->status = status;
-iocb->aiocb = NULL;
-if (iocb->bh) {
-qemu_bh_schedule(iocb->bh);
-}
-
-return;
-
+iocb->ret = -1;
 out:
-nvme_copy_cb(iocb, ret);
+nvme_do_copy(iocb);
 }
 
 static void nvme_copy_in_cb(void *opaque, int ret)
@@ -2884,16 +2865,8 @@ static void nvme_copy_in_cb(void *opaque, int ret)
 uint64_t slba;
 uint32_t nlb;
 
-if (ret < 0) {
-iocb->ret = ret;
+if (ret < 0 || iocb->ret < 0 || !ns->lbaf.ms) {
 goto out;
-} else if (iocb->ret < 0) {
-goto out;
-}
-
-if (!ns->lbaf.ms) {
-nvme_copy_in_completed_cb(iocb, 0);
-return;
 }
 
 nvme_copy_source_range_parse(iocb->ranges, iocb->idx, iocb->format, &slba,
@@ -2909,12 +2882,11 @@ static void nvme_copy_in_cb(void *opaque, int ret)
 return;
 
 out:
-nvme_copy_cb(iocb, iocb->ret);
+nvme_copy_in_completed_cb(iocb, ret);
 }
 
-static void nvme_copy_cb(void *opaque, int ret)
+static void nvme_do_copy(NvmeCopyAIOCB *iocb)
 {
-NvmeCopyAIOCB *iocb = opaque;
 NvmeRequest *req = iocb->req;
 NvmeNamespace *ns = req->ns;
 uint64_t slba;
@@ -2922,10 +2894,7 @@ static void nvme_copy_cb(void *opaque, int ret)
 size_t len;
 uint16_t status;
 
-if (ret < 0) {
-iocb->ret = ret;
-goto done;
-} else if (iocb->ret < 0) {
+if (iocb->ret < 0) {
 goto done;
 }
 
@@ -2972,14 +2941,11 @@ static void nvme_copy_cb(void *opaque, int ret)
 
 invalid:
 req->status = status;
+iocb->ret = -1;
 done:
-iocb->aiocb = NULL;
-if (iocb->bh) {
-qemu_bh_schedule(iocb->bh);
-}
+nvme_copy_done(iocb);
 }
 
-
 static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeNamespace *ns = req->ns;
@@ -3049,7 +3015,6 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
 }
 
 iocb->req = req;
-iocb->bh = qemu_bh_new(nvme_copy_bh, iocb);
 iocb->ret = 0;
 iocb->nr = nr;
 iocb->idx = 0;
@@ -3066,7 +3031,7 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
  BLOCK_ACCT_WRITE);
 
 req->aiocb = &iocb->common;
-nvme_copy_cb(iocb, 0);
+nvme_do_copy(iocb);
 
 return NVME_NO_COMPLETE;
 
-- 
2.3

[PATCH for-7.2 1/5] hw/nvme: fix aio cancel in format

2022-11-22 Thread Klaus Jensen
From: Klaus Jensen 

There are several bugs in the async cancel code for the Format command.

Firstly, cancelling a format operation neglects to set iocb->ret as well
as clearing the iocb->aiocb after cancelling the underlying aiocb which
causes the aio callback to ignore the cancellation. Trivial fix.

Secondly, and worse, because the request is queued up for posting to the
CQ in a bottom half, if the cancellation is due to the submission queue
being deleted (which calls blk_aio_cancel), the req structure is
deallocated in nvme_del_sq prior to the bottom half being schedulued.

Fix this by simply removing the bottom half, there is no reason to defer
it anyway.

Fixes: 3bcf26d3d619 ("hw/nvme: reimplement format nvm to allow cancellation")
Reported-by: Jonathan Derrick 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index ac3885ce5079..26b53469328f 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5756,14 +5756,15 @@ typedef struct NvmeFormatAIOCB {
 uint8_t pil;
 } NvmeFormatAIOCB;
 
-static void nvme_format_bh(void *opaque);
-
 static void nvme_format_cancel(BlockAIOCB *aiocb)
 {
 NvmeFormatAIOCB *iocb = container_of(aiocb, NvmeFormatAIOCB, common);
 
+iocb->ret = -ECANCELED;
+
 if (iocb->aiocb) {
 blk_aio_cancel_async(iocb->aiocb);
+iocb->aiocb = NULL;
 }
 }
 
@@ -5787,13 +5788,17 @@ static void nvme_format_set(NvmeNamespace *ns, uint8_t 
lbaf, uint8_t mset,
 nvme_ns_init_format(ns);
 }
 
+static void nvme_do_format(NvmeFormatAIOCB *iocb);
+
 static void nvme_format_ns_cb(void *opaque, int ret)
 {
 NvmeFormatAIOCB *iocb = opaque;
 NvmeNamespace *ns = iocb->ns;
 int bytes;
 
-if (ret < 0) {
+if (iocb->ret < 0) {
+goto done;
+} else if (ret < 0) {
 iocb->ret = ret;
 goto done;
 }
@@ -5817,8 +5822,7 @@ static void nvme_format_ns_cb(void *opaque, int ret)
 iocb->offset = 0;
 
 done:
-iocb->aiocb = NULL;
-qemu_bh_schedule(iocb->bh);
+nvme_do_format(iocb);
 }
 
 static uint16_t nvme_format_check(NvmeNamespace *ns, uint8_t lbaf, uint8_t pi)
@@ -5842,9 +5846,8 @@ static uint16_t nvme_format_check(NvmeNamespace *ns, 
uint8_t lbaf, uint8_t pi)
 return NVME_SUCCESS;
 }
 
-static void nvme_format_bh(void *opaque)
+static void nvme_do_format(NvmeFormatAIOCB *iocb)
 {
-NvmeFormatAIOCB *iocb = opaque;
 NvmeRequest *req = iocb->req;
 NvmeCtrl *n = nvme_ctrl(req);
 uint32_t dw10 = le32_to_cpu(req->cmd.cdw10);
@@ -5882,11 +5885,7 @@ static void nvme_format_bh(void *opaque)
 return;
 
 done:
-qemu_bh_delete(iocb->bh);
-iocb->bh = NULL;
-
 iocb->common.cb(iocb->common.opaque, iocb->ret);
-
 qemu_aio_unref(iocb);
 }
 
@@ -5905,7 +5904,6 @@ static uint16_t nvme_format(NvmeCtrl *n, NvmeRequest *req)
 iocb = qemu_aio_get(&nvme_format_aiocb_info, NULL, nvme_misc_cb, req);
 
 iocb->req = req;
-iocb->bh = qemu_bh_new(nvme_format_bh, iocb);
 iocb->ret = 0;
 iocb->ns = NULL;
 iocb->nsid = 0;
@@ -5934,14 +5932,13 @@ static uint16_t nvme_format(NvmeCtrl *n, NvmeRequest 
*req)
 }
 
 req->aiocb = &iocb->common;
-qemu_bh_schedule(iocb->bh);
+nvme_do_format(iocb);
 
 return NVME_NO_COMPLETE;
 
 out:
-qemu_bh_delete(iocb->bh);
-iocb->bh = NULL;
 qemu_aio_unref(iocb);
+
 return status;
 }
 
-- 
2.38.1




[PATCH for-7.2 0/5] hw/nvme: aio cancel fixes

2022-11-22 Thread Klaus Jensen
From: Klaus Jensen 

A new blktests nvme test unearthed some bad bugs in the asynchronous
cancellation handling.

Fix this for all commands that implement async_cancel(). The fix is the
same for all commands: remove the deferred enqueuing (a bottom half
scheduling) of the request completion.

Klaus Jensen (5):
  hw/nvme: fix aio cancel in format
  hw/nvme: fix aio cancel in flush
  hw/nvme: fix aio cancel in zone reset
  hw/nvme: fix aio cancel in dsm
  hw/nvme: remove copy bh scheduling

 hw/nvme/ctrl.c | 181 ++---
 1 file changed, 51 insertions(+), 130 deletions(-)

-- 
2.38.1