Re: [PATCH v1 5/9] hw/virtio: introduce virtio_device_should_start
Am 21.11.22 um 23:37 schrieb Michael S. Tsirkin: [...] qemu-system-x86_64: ../hw/virtio/vhost-vsock-common.c:203: vhost_vsock_common_pre_save: Assertion `!vhost_dev_is_started(&vvc->vhost_dev)' failed. 2022-11-15 16:38:46.096+: shutting down, reason=crashed Alex were you able to replicate? Just curious. Ping?
Re: [RFC 4/7] migration: Split save_live_pending() into state_pending_*
* Juan Quintela (quint...@redhat.com) wrote: > We split the function into to: > > - state_pending_estimate: We estimate the remaining state size without > stopping the machine. > > - state pending_exact: We calculate the exact amount of remaining > state. > > The only "device" that implements different functions for _estimate() > and _exact() is ram. > > Signed-off-by: Juan Quintela Yeh I think that's OK; I'm a little worried whether you end up calling the two functions in migration_iteration_run a lot, but still Reviewed-by: Dr. David Alan Gilbert > --- > docs/devel/migration.rst | 18 ++ > docs/devel/vfio-migration.rst | 4 ++-- > include/migration/register.h | 12 > migration/savevm.h | 8 ++-- > hw/s390x/s390-stattrib.c | 7 --- > hw/vfio/migration.c| 9 + > migration/block-dirty-bitmap.c | 11 ++- > migration/block.c | 11 ++- > migration/migration.c | 13 + > migration/ram.c| 31 --- > migration/savevm.c | 34 +- > hw/vfio/trace-events | 2 +- > migration/trace-events | 7 --- > 13 files changed, 114 insertions(+), 53 deletions(-) > > diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst > index 3e9656d8e0..6f65c23b47 100644 > --- a/docs/devel/migration.rst > +++ b/docs/devel/migration.rst > @@ -482,15 +482,17 @@ An iterative device must provide: >- A ``load_setup`` function that initialises the data structures on the > destination. > > - - A ``save_live_pending`` function that is called repeatedly and must > -indicate how much more data the iterative data must save. The core > -migration code will use this to determine when to pause the CPUs > -and complete the migration. > + - A ``state_pending_exact`` function that indicates how much more > +data we must save. The core migration code will use this to > +determine when to pause the CPUs and complete the migration. > > - - A ``save_live_iterate`` function (called after ``save_live_pending`` > -when there is significant data still to be sent). It should send > -a chunk of data until the point that stream bandwidth limits tell it > -to stop. Each call generates one section. > + - A ``state_pending_estimate`` function that indicates how much more > +data we must save. When the estimated amount is smaller than the > +threshold, we call ``state_pending_exact``. > + > + - A ``save_live_iterate`` function should send a chunk of data until > +the point that stream bandwidth limits tell it to stop. Each call > +generates one section. > >- A ``save_live_complete_precopy`` function that must transmit the > last section for the device containing any remaining data. > diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst > index 9ff6163c88..673057c90d 100644 > --- a/docs/devel/vfio-migration.rst > +++ b/docs/devel/vfio-migration.rst > @@ -28,7 +28,7 @@ VFIO implements the device hooks for the iterative approach > as follows: > * A ``load_setup`` function that sets up the migration region on the >destination and sets _RESUMING flag in the VFIO device state. > > -* A ``save_live_pending`` function that reads pending_bytes from the vendor > +* A ``state_pending_exact`` function that reads pending_bytes from the vendor >driver, which indicates the amount of data that the vendor driver has yet > to >save for the VFIO device. > > @@ -114,7 +114,7 @@ Live migration save path > (RUNNING, _SETUP, _RUNNING|_SAVING) >| > (RUNNING, _ACTIVE, _RUNNING|_SAVING) > - If device is active, get pending_bytes by .save_live_pending() > + If device is active, get pending_bytes by .state_pending_exact() >If total pending_bytes >= threshold_size, call .save_live_iterate() >Data of VFIO device for pre-copy phase is copied > Iterate till total pending bytes converge and are less than threshold > diff --git a/include/migration/register.h b/include/migration/register.h > index 5b5424ed8f..313b8e1c3b 100644 > --- a/include/migration/register.h > +++ b/include/migration/register.h > @@ -46,9 +46,7 @@ typedef struct SaveVMHandlers { > > /* This runs outside the iothread lock! */ > int (*save_setup)(QEMUFile *f, void *opaque); > -void (*save_live_pending)(void *opaque, uint64_t threshold_size, > - uint64_t *rest_precopy, uint64_t > *rest_postcopy); > -/* Note for save_live_pending: > +/* Note for state_pending_*: > * - res_precopy is for data which must be migrated in precopy > * phase or in stopped state, in other words - before target > * vm start > @@ -59,7 +57,13 @@ typedef str
Re: [PATCH] vhost: mask VIRTIO_F_RING_RESET for vhost and vhost-user devices
On Tue, Nov 22, 2022 at 4:13 AM Jason Wang wrote: > > On Mon, Nov 21, 2022 at 6:11 PM Stefano Garzarella > wrote: > > > > Commit 69e1c14aa2 ("virtio: core: vq reset feature negotation support") > > enabled VIRTIO_F_RING_RESET by default for all virtio devices. > > > > This feature is not currently emulated by QEMU, so for vhost and > > vhost-user devices we need to make sure it is supported by the offloaded > > device emulation (in-kernel or in another process). > > To do this we need to add VIRTIO_F_RING_RESET to the features bitmap > > passed to vhost_get_features(). This way it will be masked if the device > > does not support it. > > > > This issue was initially discovered with vhost-vsock and vhost-user-vsock, > > and then also tested with vhost-user-rng which confirmed the same issue. > > They fail when sending features through VHOST_SET_FEATURES ioctl or > > VHOST_USER_SET_FEATURES message, since VIRTIO_F_RING_RESET is negotiated > > by the guest (Linux >= v6.0), but not supported by the device. > > > > Fixes: 69e1c14aa2 ("virtio: core: vq reset feature negotation support") > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1318 > > Signed-off-by: Stefano Garzarella > > Acked-by: Jason Wang > > > --- > > > > To prevent this problem in the future, perhaps we should provide a function > > (e.g. vhost_device_get_features) where we go to mask all non-device-specific > > features (e.g VIRTIO_F_*, VIRTIO_RING_F_*) that are not emulated by QEMU but > > we expect them to be emulated by the vhost or vhost-user devices. > > Adding Eugenio, this might not be true if we want to use shadow > virtqueue for emulating some features? > I think we should be able to introduce that in the (hypothetical) vhost_device_get_features if SVQ starts emulating a ring feature, isn't it? I think a first version not aware of SVQ is fine anyway, and to introduce it later should be easy. Thanks! > Thanks > > > Then we can call it in all .get_features callbacks just before return the > > features. > > > > What do you think? > > > > But maybe better to do that for the next release, I will send an RFC. > > > > Thanks, > > Stefano > > --- > > hw/block/vhost-user-blk.c | 1 + > > hw/net/vhost_net.c | 1 + > > hw/scsi/vhost-scsi.c | 1 + > > hw/scsi/vhost-user-scsi.c | 1 + > > hw/virtio/vhost-user-fs.c | 1 + > > hw/virtio/vhost-user-gpio.c| 1 + > > hw/virtio/vhost-user-i2c.c | 1 + > > hw/virtio/vhost-user-rng.c | 11 +-- > > hw/virtio/vhost-vsock-common.c | 1 + > > net/vhost-vdpa.c | 1 + > > 10 files changed, 18 insertions(+), 2 deletions(-) > > > > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c > > index 16ad400889..0d5190accf 100644 > > --- a/hw/block/vhost-user-blk.c > > +++ b/hw/block/vhost-user-blk.c > > @@ -52,6 +52,7 @@ static const int user_feature_bits[] = { > > VIRTIO_F_NOTIFY_ON_EMPTY, > > VIRTIO_F_RING_PACKED, > > VIRTIO_F_IOMMU_PLATFORM, > > +VIRTIO_F_RING_RESET, > > VHOST_INVALID_FEATURE_BIT > > }; > > > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c > > index feda448878..26e4930676 100644 > > --- a/hw/net/vhost_net.c > > +++ b/hw/net/vhost_net.c > > @@ -75,6 +75,7 @@ static const int user_feature_bits[] = { > > VIRTIO_NET_F_MTU, > > VIRTIO_F_IOMMU_PLATFORM, > > VIRTIO_F_RING_PACKED, > > +VIRTIO_F_RING_RESET, > > VIRTIO_NET_F_RSS, > > VIRTIO_NET_F_HASH_REPORT, > > > > diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c > > index bdf337a7a2..6a0fd0dfb1 100644 > > --- a/hw/scsi/vhost-scsi.c > > +++ b/hw/scsi/vhost-scsi.c > > @@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = { > > VIRTIO_RING_F_INDIRECT_DESC, > > VIRTIO_RING_F_EVENT_IDX, > > VIRTIO_SCSI_F_HOTPLUG, > > +VIRTIO_F_RING_RESET, > > VHOST_INVALID_FEATURE_BIT > > }; > > > > diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c > > index bc37317d55..b7a71a802c 100644 > > --- a/hw/scsi/vhost-user-scsi.c > > +++ b/hw/scsi/vhost-user-scsi.c > > @@ -36,6 +36,7 @@ static const int user_feature_bits[] = { > > VIRTIO_RING_F_INDIRECT_DESC, > > VIRTIO_RING_F_EVENT_IDX, > > VIRTIO_SCSI_F_HOTPLUG, > > +VIRTIO_F_RING_RESET, > > VHOST_INVALID_FEATURE_BIT > > }; > > > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c > > index 1c40f42045..dc4014cdef 100644 > > --- a/hw/virtio/vhost-user-fs.c > > +++ b/hw/virtio/vhost-user-fs.c > > @@ -32,6 +32,7 @@ static const int user_feature_bits[] = { > > VIRTIO_F_NOTIFY_ON_EMPTY, > > VIRTIO_F_RING_PACKED, > > VIRTIO_F_IOMMU_PLATFORM, > > +VIRTIO_F_RING_RESET, > > > > VHOST_INVALID_FEATURE_BIT > > }; > > diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c > > index 677d1c7730..5851cb3bc9 100644 > > --- a/hw/virtio/vhost-user-gpio.c > > +++ b/hw/virtio/vhost-user-gpio.c > > @@ -24,6 +24,7 @@ static const int feature_bits[] =
Re: [RFC 2/7] migration: No save_live_pending() method uses the QEMUFile parameter
* Juan Quintela (quint...@redhat.com) wrote: > So remove it everywhere. > > Signed-off-by: Juan Quintela Reviewed-by: Dr. David Alan Gilbert > --- > include/migration/register.h | 6 ++ > migration/savevm.h | 2 +- > hw/s390x/s390-stattrib.c | 2 +- > hw/vfio/migration.c| 6 ++ > migration/block-dirty-bitmap.c | 5 ++--- > migration/block.c | 2 +- > migration/migration.c | 3 +-- > migration/ram.c| 2 +- > migration/savevm.c | 5 ++--- > 9 files changed, 13 insertions(+), 20 deletions(-) > > diff --git a/include/migration/register.h b/include/migration/register.h > index 1950fee6a8..5b5424ed8f 100644 > --- a/include/migration/register.h > +++ b/include/migration/register.h > @@ -46,10 +46,8 @@ typedef struct SaveVMHandlers { > > /* This runs outside the iothread lock! */ > int (*save_setup)(QEMUFile *f, void *opaque); > -void (*save_live_pending)(QEMUFile *f, void *opaque, > - uint64_t threshold_size, > - uint64_t *rest_precopy, > - uint64_t *rest_postcopy); > +void (*save_live_pending)(void *opaque, uint64_t threshold_size, > + uint64_t *rest_precopy, uint64_t > *rest_postcopy); > /* Note for save_live_pending: > * - res_precopy is for data which must be migrated in precopy > * phase or in stopped state, in other words - before target > diff --git a/migration/savevm.h b/migration/savevm.h > index 9bd55c336c..98fae6f9b3 100644 > --- a/migration/savevm.h > +++ b/migration/savevm.h > @@ -40,7 +40,7 @@ void qemu_savevm_state_cleanup(void); > void qemu_savevm_state_complete_postcopy(QEMUFile *f); > int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only, > bool inactivate_disks); > -void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size, > +void qemu_savevm_state_pending(uint64_t max_size, > uint64_t *res_precopy, uint64_t > *res_postcopy); > void qemu_savevm_send_ping(QEMUFile *f, uint32_t value); > void qemu_savevm_send_open_return_path(QEMUFile *f); > diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c > index ee60b53da4..9b74eeadf3 100644 > --- a/hw/s390x/s390-stattrib.c > +++ b/hw/s390x/s390-stattrib.c > @@ -182,7 +182,7 @@ static int cmma_save_setup(QEMUFile *f, void *opaque) > return 0; > } > > -static void cmma_save_pending(QEMUFile *f, void *opaque, uint64_t max_size, > +static void cmma_save_pending(void *opaque, uint64_t max_size, >uint64_t *res_precopy, uint64_t *res_postcopy) > { > S390StAttribState *sas = S390_STATTRIB(opaque); > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > index 3423f113f0..760d5f3c5c 100644 > --- a/hw/vfio/migration.c > +++ b/hw/vfio/migration.c > @@ -456,10 +456,8 @@ static void vfio_save_cleanup(void *opaque) > trace_vfio_save_cleanup(vbasedev->name); > } > > -static void vfio_save_pending(QEMUFile *f, void *opaque, > - uint64_t threshold_size, > - uint64_t *res_precopy, > - uint64_t *res_postcopy) > +static void vfio_save_pending(void *opaque, uint64_t threshold_size, > + uint64_t *res_precopy, uint64_t *res_postcopy) > { > VFIODevice *vbasedev = opaque; > VFIOMigration *migration = vbasedev->migration; > diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c > index dfea546330..a445bdc3c3 100644 > --- a/migration/block-dirty-bitmap.c > +++ b/migration/block-dirty-bitmap.c > @@ -761,9 +761,8 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void > *opaque) > return 0; > } > > -static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque, > - uint64_t max_size, > - uint64_t *res_precopy, > +static void dirty_bitmap_save_pending(void *opaque, uint64_t max_size, > + uint64_t *res_precopy, >uint64_t *res_postcopy) > { > DBMSaveState *s = &((DBMState *)opaque)->save; > diff --git a/migration/block.c b/migration/block.c > index 4ae8f837b0..b3d680af75 100644 > --- a/migration/block.c > +++ b/migration/block.c > @@ -862,7 +862,7 @@ static int block_save_complete(QEMUFile *f, void *opaque) > return 0; > } > > -static void block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size, > +static void block_save_pending(void *opaque, uint64_t max_size, > uint64_t *res_precopy, > uint64_t *res_postcopy) > { > diff --git a/migration/migration.c b/migration/migration.c > index 440aa62f16..038fc58a96 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@
Re: [PATCH for-7.2 1/5] hw/nvme: fix aio cancel in format
On Tue, Nov 22, 2022 at 09:13:44AM +0100, Klaus Jensen wrote: > There are several bugs in the async cancel code for the Format command. > > Firstly, cancelling a format operation neglects to set iocb->ret as well > as clearing the iocb->aiocb after cancelling the underlying aiocb which > causes the aio callback to ignore the cancellation. Trivial fix. > > Secondly, and worse, because the request is queued up for posting to the > CQ in a bottom half, if the cancellation is due to the submission queue > being deleted (which calls blk_aio_cancel), the req structure is > deallocated in nvme_del_sq prior to the bottom half being schedulued. > > Fix this by simply removing the bottom half, there is no reason to defer > it anyway. I thought for sure I'd find a reason defered execution was needed, but hey, it looks perfectly fine with this change! > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c > index ac3885ce5079..26b53469328f 100644 > --- a/hw/nvme/ctrl.c > +++ b/hw/nvme/ctrl.c > @@ -5756,14 +5756,15 @@ typedef struct NvmeFormatAIOCB { > uint8_t pil; > } NvmeFormatAIOCB; I think you can remove the QEMUBH member from the above struct with this patch. Otherwise looks good. Reviewed-by: Keith Busch
Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci
On Tue, 22 Nov 2022 at 15:04, Philippe Mathieu-Daudé wrote: > > On 21/11/22 17:42, Max Filippov wrote: > > On Mon, Nov 21, 2022 at 6:01 AM Markus Armbruster wrote: > >> .../xtensa/core-dsp3400/xtensa-modules.c.inc | 136 +- > >> target/xtensa/core-lx106/xtensa-modules.c.inc | 16 +-- > > > > These files are generated and were imported from xtensa configuration > > overlays, they're not supposed to be changed. > > Tools can get the repository file list using 'git ls-files', which > itself support file pattern exclusion [*]. > > We can create i.e. 'scripts/imported-files.txt' with: > >linux-headers/ >target/hexagon/imported/ >target/xtensa/core* >tests/tcg/mips/user/ Good idea. scripts/clean-header-guards.pl also has these to add to the exclude list: include/standard-headers/ pc-bios/ tests/tcg/ tests/multiboot/ thanks -- PMM
Re: [PATCH v2 2/2] block/vmdk: Simplify vmdk_co_create() to return directly
On 22/11/22 14:49, Markus Armbruster wrote: Cc: Fam Zheng Cc: Kevin Wolf Cc: Hanna Reitz Cc: qemu-block@nongnu.org Signed-off-by: Markus Armbruster --- block/vmdk.c | 28 +++- 1 file changed, 11 insertions(+), 17 deletions(-) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci
On 21/11/22 17:42, Max Filippov wrote: On Mon, Nov 21, 2022 at 6:01 AM Markus Armbruster wrote: .../xtensa/core-dsp3400/xtensa-modules.c.inc | 136 +- target/xtensa/core-lx106/xtensa-modules.c.inc | 16 +-- These files are generated and were imported from xtensa configuration overlays, they're not supposed to be changed. Tools can get the repository file list using 'git ls-files', which itself support file pattern exclusion [*]. We can create i.e. 'scripts/imported-files.txt' with: linux-headers/ target/hexagon/imported/ target/xtensa/core* tests/tcg/mips/user/ Then use 'git ls-files --exclude-from=scripts/imported-files.txt' ... [*] https://git-scm.com/docs/git-ls-files#Documentation/git-ls-files.txt---exclude-fromltfilegt
Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci
On 22/11/22 09:58, Markus Armbruster wrote: Thomas Huth writes: On 21/11/2022 17.32, Markus Armbruster wrote: Philippe Mathieu-Daudé writes: On 21/11/22 15:36, Peter Maydell wrote: On Mon, 21 Nov 2022 at 14:03, Markus Armbruster wrote: Tweak the semantic patch to drop redundant parenthesis around the return expression. Coccinelle drops comments in hw/rdma/vmw/pvrdma_cmd.c; restored manually. Coccinelle messes up vmdk_co_create(), not sure why. Transformed manually. Line breaks in target/avr/cpu.h and hw/rdma/vmw/pvrdma_cmd.c tidied up manually. Whitespace in fuse_reply_iov() tidied up manually. checkpatch.pl complains "return of an errno should typically be -ve" two times for hw/9pfs/9p-synth.c. Preexisting, the patch merely makes it visible to checkpatch.pl. checkpatch.pl complains "return is not a function, parentheses are not required" three times for target/mips/tcg/dsp_helper.c. False positives. Signed-off-by: Markus Armbruster .../user/ase/msa/bit-count/test_msa_nloc_b.c | 9 +- .../user/ase/msa/bit-count/test_msa_nloc_d.c | 9 +- [snip long list of other mips test files] 328 files changed, 989 insertions(+), 2099 deletions(-) This patch seems to almost entirely be huge because of these mips test case files. Are they specific to QEMU or are they effectively a 3rd-party import that it doesn't make sense to make local changes to? They are imported and will unlikely be modified. Not obvious to me from git-log. Should I drop the changes to tests/tcg/mips/? I'd say yes. At least move them to a separate patch. Possible status of tests/tcg/mips/: 1. Imported, should not be modified Drop from the patch. 2. Not imported, should be modified 2a. To be reviewed separately from the remainder of the patch Split off. 2b. Likewise, but nobody will care to review, realistically Split off and merge anyway, or drop. I'd go for the latter. 2c. To be reviewed together with the remainder of the patch Keep as is. Which one is it? "1. Imported, should not be modified" please :)
Re: [RFC 1/7] migration: Remove res_compatible parameter
* Juan Quintela (quint...@redhat.com) wrote: > It was only used for RAM, and in that case, it means that this amount > of data was sent for memory. Just delete the field in all callers. > > Signed-off-by: Juan Quintela > --- > include/migration/register.h | 20 ++-- > migration/savevm.h | 4 +--- > hw/s390x/s390-stattrib.c | 6 ++ > hw/vfio/migration.c| 10 -- > migration/block-dirty-bitmap.c | 7 +++ > migration/block.c | 7 +++ > migration/migration.c | 9 - > migration/ram.c| 8 +++- > migration/savevm.c | 14 +- > hw/vfio/trace-events | 2 +- > migration/trace-events | 2 +- > 11 files changed, 37 insertions(+), 52 deletions(-) > > diff --git a/include/migration/register.h b/include/migration/register.h > index c1dcff0f90..1950fee6a8 100644 > --- a/include/migration/register.h > +++ b/include/migration/register.h > @@ -48,18 +48,18 @@ typedef struct SaveVMHandlers { > int (*save_setup)(QEMUFile *f, void *opaque); > void (*save_live_pending)(QEMUFile *f, void *opaque, >uint64_t threshold_size, > - uint64_t *res_precopy_only, > - uint64_t *res_compatible, > - uint64_t *res_postcopy_only); > + uint64_t *rest_precopy, > + uint64_t *rest_postcopy); > /* Note for save_live_pending: > - * - res_precopy_only is for data which must be migrated in precopy phase > - * or in stopped state, in other words - before target vm start > - * - res_compatible is for data which may be migrated in any phase > - * - res_postcopy_only is for data which must be migrated in postcopy > phase > - * or in stopped state, in other words - after source vm stop > + * - res_precopy is for data which must be migrated in precopy > + * phase or in stopped state, in other words - before target > + * vm start > + * - res_postcopy is for data which must be migrated in postcopy > + * phase or in stopped state, in other words - after source vm > + * stop > * > - * Sum of res_postcopy_only, res_compatible and res_postcopy_only is the > - * whole amount of pending data. > + * Sum of res_precopy and res_postcopy is the whole amount of > + * pending data. I'm not sure if this is correct; I think we really do have three different kinds of device: a) Those that don't know how to postcopy b) Those that can only send data in postcopy (which I think is what the dirty-bitmap block stuff does) c) Devices that know how to postcopy, like RAM > diff --git a/migration/ram.c b/migration/ram.c > index dc1de9ddbc..20167e1102 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -3435,9 +3435,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque) > } > > static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size, > - uint64_t *res_precopy_only, > - uint64_t *res_compatible, > - uint64_t *res_postcopy_only) > + uint64_t *res_precopy, uint64_t *res_postcopy) > { > RAMState **temp = opaque; > RAMState *rs = *temp; > @@ -3457,9 +3455,9 @@ static void ram_save_pending(QEMUFile *f, void *opaque, > uint64_t max_size, > > if (migrate_postcopy_ram()) { > /* We can do postcopy, and all the data is postcopiable */ > -*res_compatible += remaining_size; > +*res_postcopy += remaining_size; migrate_postcopy_ram() says that postcopy is enabled, not active, so here you are saying that ram is 'postcopy only' according to your definition above. I don't think it actually changes behaviour though at the moment; but it doesn't feel right to assign that to something that says 'postcopy only' migration_iteration_run doesn't enforce the postcopy-only part very hard; it's just part of the threshold as far as I can tell. Dave > } else { > -*res_precopy_only += remaining_size; > +*res_precopy += remaining_size; > } > } > diff --git a/migration/migration.c b/migration/migration.c > index bb8bbddfe4..440aa62f16 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -3734,15 +3734,14 @@ typedef enum { > */ > static MigIterateState migration_iteration_run(MigrationState *s) > { > -uint64_t pending_size, pend_pre, pend_compat, pend_post; > +uint64_t pending_size, pend_pre, pend_post; > bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE; > > qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre, > - &pend_compat, &pend_post); > -pending_size = pend_pre + pend_compat + pend_post; > + &pend_po
Re: [PATCH v2 2/2] block/vmdk: Simplify vmdk_co_create() to return directly
On Tue, 22 Nov 2022 at 13:51, Markus Armbruster wrote: > > Cc: Fam Zheng > Cc: Kevin Wolf > Cc: Hanna Reitz > Cc: qemu-block@nongnu.org > Signed-off-by: Markus Armbruster > --- > block/vmdk.c | 28 +++- > 1 file changed, 11 insertions(+), 17 deletions(-) > Reviewed-by: Peter Maydell thanks -- PMM
[PATCH v2 2/2] block/vmdk: Simplify vmdk_co_create() to return directly
Cc: Fam Zheng Cc: Kevin Wolf Cc: Hanna Reitz Cc: qemu-block@nongnu.org Signed-off-by: Markus Armbruster --- block/vmdk.c | 28 +++- 1 file changed, 11 insertions(+), 17 deletions(-) diff --git a/block/vmdk.c b/block/vmdk.c index 26376352b9..bac3d8db50 100644 --- a/block/vmdk.c +++ b/block/vmdk.c @@ -2821,7 +2821,6 @@ static BlockBackend *vmdk_co_create_cb(int64_t size, int idx, static int coroutine_fn vmdk_co_create(BlockdevCreateOptions *create_options, Error **errp) { -int ret; BlockdevCreateOptionsVmdk *opts; opts = &create_options->u.vmdk; @@ -2829,24 +2828,19 @@ static int coroutine_fn vmdk_co_create(BlockdevCreateOptions *create_options, /* Validate options */ if (!QEMU_IS_ALIGNED(opts->size, BDRV_SECTOR_SIZE)) { error_setg(errp, "Image size must be a multiple of 512 bytes"); -ret = -EINVAL; -goto out; +return -EINVAL; } -ret = vmdk_co_do_create(opts->size, -opts->subformat, -opts->adapter_type, -opts->backing_file, -opts->hwversion, -opts->toolsversion, -false, -opts->zeroed_grain, -vmdk_co_create_cb, -opts, errp); -return ret; - -out: -return ret; +return vmdk_co_do_create(opts->size, + opts->subformat, + opts->adapter_type, + opts->backing_file, + opts->hwversion, + opts->toolsversion, + false, + opts->zeroed_grain, + vmdk_co_create_cb, + opts, errp); } static void vmdk_close(BlockDriverState *bs) -- 2.37.3
Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci
On Tue, 22 Nov 2022 at 13:27, Markus Armbruster wrote: > > Peter Maydell writes: > > The obvious answer is "you might have got your manual tweaking > > wrong". A purely mechanised patch I can review by looking at > > the script and maybe eyeballing a few instances of the change; > > a change that is 99% mechanised and 1% hand-written I need to > > run through to find the hand-written parts. > > Define "handwritten" :) > > If reverting unwanted line-breaks and blank lines counts, then I can > make two patches, one straight from Coccinelle, and one that reverts the > unwanted crap. The first one will be larger and more annoying to review > than this one. A clear loss in my book, but I'm the patch submitter, > not a patch reviewer, so my book doesn't matter. > > Else, we're down to one file, which I already offered to split off. > > > But mostly this patch is hard to review for its sheer size, > > mechanical changes or not. A 3000 line patchmail is so big that > > the UI on my mail client gets pretty unwieldy. > > With the manual one split off, target/xtensa/ dropped as requested by > Max, and tests/tcg/mips/ dropped because its status is unclear (and I > start to find it hard to care), we're down to > > 28 files changed, 81 insertions(+), 221 deletions(-) Yes, this is much better and "I hand tweaked these things" is reasonable in a patch that big. It's the combination of the ginormous multi-thousand-line patch and the hand tweaking that was the really awkward part. thanks -- PMM
Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci
Peter Maydell writes: > On Tue, 22 Nov 2022 at 08:58, Markus Armbruster wrote: >> I don't think complete detailed review is necessary or even sensible. >> >> Review should start with the Coccinelle script: >> >> // replace 'R = X; return R;' with 'return X;' >> @@ >> identifier VAR; >> expression E; >> type T; >> identifier F; >> @@ >> T F(...) >> { >> ... >> -T VAR; >> ... when != VAR >> >> -VAR = (E); >> -return VAR; >> +return E; >> ... when != VAR >> } >> >> What could go wrong? Not a rhetorical question! > > The obvious answer is "you might have got your manual tweaking > wrong". A purely mechanised patch I can review by looking at > the script and maybe eyeballing a few instances of the change; > a change that is 99% mechanised and 1% hand-written I need to > run through to find the hand-written parts. Define "handwritten" :) If reverting unwanted line-breaks and blank lines counts, then I can make two patches, one straight from Coccinelle, and one that reverts the unwanted crap. The first one will be larger and more annoying to review than this one. A clear loss in my book, but I'm the patch submitter, not a patch reviewer, so my book doesn't matter. Else, we're down to one file, which I already offered to split off. > But mostly this patch is hard to review for its sheer size, > mechanical changes or not. A 3000 line patchmail is so big that > the UI on my mail client gets pretty unwieldy. With the manual one split off, target/xtensa/ dropped as requested by Max, and tests/tcg/mips/ dropped because its status is unclear (and I start to find it hard to care), we're down to 28 files changed, 81 insertions(+), 221 deletions(-) This will be v2.
Re: [PATCH v8 0/4] block: small refactorings
Am 07.11.2022 um 17:35 hat Vladimir Sementsov-Ogievskiy geschrieben: > Hi all! > > Here is 4-more simple already reviewed patches from > "[PATCH v5 00/45] Transactional block-graph modifying API" [1] > > Called v8 because first part of [1] was recently merged as > "[PATCH v7 for-7.2 00/15] block: cleanup backing and file handling" Thanks, applied to block-next. Kevin
Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci
On Tue, 22 Nov 2022 at 08:58, Markus Armbruster wrote: > I don't think complete detailed review is necessary or even sensible. > > Review should start with the Coccinelle script: > > // replace 'R = X; return R;' with 'return X;' > @@ > identifier VAR; > expression E; > type T; > identifier F; > @@ > T F(...) > { > ... > -T VAR; > ... when != VAR > > -VAR = (E); > -return VAR; > +return E; > ... when != VAR > } > > What could go wrong? Not a rhetorical question! The obvious answer is "you might have got your manual tweaking wrong". A purely mechanised patch I can review by looking at the script and maybe eyeballing a few instances of the change; a change that is 99% mechanised and 1% hand-written I need to run through to find the hand-written parts. But mostly this patch is hard to review for its sheer size, mechanical changes or not. A 3000 line patchmail is so big that the UI on my mail client gets pretty unwieldy. -- PMM
[PULL 1/8] vhost: mask VIRTIO_F_RING_RESET for vhost and vhost-user devices
From: Stefano Garzarella Commit 69e1c14aa2 ("virtio: core: vq reset feature negotation support") enabled VIRTIO_F_RING_RESET by default for all virtio devices. This feature is not currently emulated by QEMU, so for vhost and vhost-user devices we need to make sure it is supported by the offloaded device emulation (in-kernel or in another process). To do this we need to add VIRTIO_F_RING_RESET to the features bitmap passed to vhost_get_features(). This way it will be masked if the device does not support it. This issue was initially discovered with vhost-vsock and vhost-user-vsock, and then also tested with vhost-user-rng which confirmed the same issue. They fail when sending features through VHOST_SET_FEATURES ioctl or VHOST_USER_SET_FEATURES message, since VIRTIO_F_RING_RESET is negotiated by the guest (Linux >= v6.0), but not supported by the device. Fixes: 69e1c14aa2 ("virtio: core: vq reset feature negotation support") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1318 Signed-off-by: Stefano Garzarella Message-Id: <20221121101101.29400-1-sgarz...@redhat.com> Reviewed-by: Michael S. Tsirkin Signed-off-by: Michael S. Tsirkin Reviewed-by: Stefan Hajnoczi Acked-by: Raphael Norwitz Acked-by: Jason Wang --- hw/block/vhost-user-blk.c | 1 + hw/net/vhost_net.c | 1 + hw/scsi/vhost-scsi.c | 1 + hw/scsi/vhost-user-scsi.c | 1 + hw/virtio/vhost-user-fs.c | 1 + hw/virtio/vhost-user-gpio.c| 1 + hw/virtio/vhost-user-i2c.c | 1 + hw/virtio/vhost-user-rng.c | 11 +-- hw/virtio/vhost-vsock-common.c | 1 + net/vhost-vdpa.c | 1 + 10 files changed, 18 insertions(+), 2 deletions(-) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index 16ad400889..0d5190accf 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -52,6 +52,7 @@ static const int user_feature_bits[] = { VIRTIO_F_NOTIFY_ON_EMPTY, VIRTIO_F_RING_PACKED, VIRTIO_F_IOMMU_PLATFORM, +VIRTIO_F_RING_RESET, VHOST_INVALID_FEATURE_BIT }; diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index feda448878..26e4930676 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -75,6 +75,7 @@ static const int user_feature_bits[] = { VIRTIO_NET_F_MTU, VIRTIO_F_IOMMU_PLATFORM, VIRTIO_F_RING_PACKED, +VIRTIO_F_RING_RESET, VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT, diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c index bdf337a7a2..6a0fd0dfb1 100644 --- a/hw/scsi/vhost-scsi.c +++ b/hw/scsi/vhost-scsi.c @@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = { VIRTIO_RING_F_INDIRECT_DESC, VIRTIO_RING_F_EVENT_IDX, VIRTIO_SCSI_F_HOTPLUG, +VIRTIO_F_RING_RESET, VHOST_INVALID_FEATURE_BIT }; diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c index bc37317d55..b7a71a802c 100644 --- a/hw/scsi/vhost-user-scsi.c +++ b/hw/scsi/vhost-user-scsi.c @@ -36,6 +36,7 @@ static const int user_feature_bits[] = { VIRTIO_RING_F_INDIRECT_DESC, VIRTIO_RING_F_EVENT_IDX, VIRTIO_SCSI_F_HOTPLUG, +VIRTIO_F_RING_RESET, VHOST_INVALID_FEATURE_BIT }; diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c index 1c40f42045..dc4014cdef 100644 --- a/hw/virtio/vhost-user-fs.c +++ b/hw/virtio/vhost-user-fs.c @@ -32,6 +32,7 @@ static const int user_feature_bits[] = { VIRTIO_F_NOTIFY_ON_EMPTY, VIRTIO_F_RING_PACKED, VIRTIO_F_IOMMU_PLATFORM, +VIRTIO_F_RING_RESET, VHOST_INVALID_FEATURE_BIT }; diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c index 677d1c7730..5851cb3bc9 100644 --- a/hw/virtio/vhost-user-gpio.c +++ b/hw/virtio/vhost-user-gpio.c @@ -24,6 +24,7 @@ static const int feature_bits[] = { VIRTIO_RING_F_INDIRECT_DESC, VIRTIO_RING_F_EVENT_IDX, VIRTIO_GPIO_F_IRQ, +VIRTIO_F_RING_RESET, VHOST_INVALID_FEATURE_BIT }; diff --git a/hw/virtio/vhost-user-i2c.c b/hw/virtio/vhost-user-i2c.c index 864eba695e..1c9f3d20dc 100644 --- a/hw/virtio/vhost-user-i2c.c +++ b/hw/virtio/vhost-user-i2c.c @@ -16,6 +16,7 @@ static const int feature_bits[] = { VIRTIO_I2C_F_ZERO_LENGTH_REQUEST, +VIRTIO_F_RING_RESET, VHOST_INVALID_FEATURE_BIT }; diff --git a/hw/virtio/vhost-user-rng.c b/hw/virtio/vhost-user-rng.c index 8b47287875..f9084cde58 100644 --- a/hw/virtio/vhost-user-rng.c +++ b/hw/virtio/vhost-user-rng.c @@ -16,6 +16,11 @@ #include "qemu/error-report.h" #include "standard-headers/linux/virtio_ids.h" +static const int feature_bits[] = { +VIRTIO_F_RING_RESET, +VHOST_INVALID_FEATURE_BIT +}; + static void vu_rng_start(VirtIODevice *vdev) { VHostUserRNG *rng = VHOST_USER_RNG(vdev); @@ -106,8 +111,10 @@ static void vu_rng_set_status(VirtIODevice *vdev, uint8_t status) static uint64_t vu_rng_get_features(VirtIODevice *vdev, uint64_t requested_features, Error **errp) { -/* No feature bits used yet */
Re: [PATCH v4 11/11] block/dirty-bitmap: convert coroutine-only functions to generated_co_wrapper_simple
Am 16.11.2022 um 13:22 hat Emanuele Giuseppe Esposito geschrieben: > bdrv_can_store_new_dirty_bitmap and bdrv_remove_persistent_dirty_bitmap > check if they are running in a coroutine, directly calling the > coroutine callback if it's the case. > Except that no coroutine calls such functions, therefore that check > can be removed, and function creation can be offloaded to > g_c_w_simple. > > Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Kevin Wolf
Re: [PATCH v4 10/11] block: convert bdrv_create to generated_co_wrapper_simple
Am 16.11.2022 um 13:22 hat Emanuele Giuseppe Esposito geschrieben: > This function is never called in coroutine context, therefore > instead of manually creating a new coroutine, delegate it to the > block-coroutine-wrapper script, defining it as g_c_w_simple. > > Signed-off-by: Emanuele Giuseppe Esposito At first I thought that "never called in coroutine context" was not entirely true because of paths like this: qcow2_co_create() -> bdrv_open_blockdev_ref() -> bdrv_open_inherit() -> bdrv_append_temp_snapshot() -> bdrv_create(). The only reason why it doesn't happen is that with a BlockdevRef, it's impossible to get BDRV_O_SNAPSHOT set, so bdrv_append_temp_snapshot() can't actually be called when you come from bdrv_open_blockdev_ref(). Of course, opening images in a coroutine is likely to already do similar things. We should probably drop out of coroutine context for bdrv_open to be safe. In practice, I suspect it might work anyway because nothing is going to wait on the current coroutine, but maybe better not to rely on it. Anyway, not a problem of your patch, it's just something it made me think of. > diff --git a/block.c b/block.c > index 7a4c3eb540..d3e168408a 100644 > --- a/block.c > +++ b/block.c > @@ -528,7 +528,7 @@ typedef struct CreateCo { > Error *err; > } CreateCo; > > -static int coroutine_fn bdrv_co_create(BlockDriver *drv, const char > *filename, > +int coroutine_fn bdrv_co_create(BlockDriver *drv, const char *filename, > QemuOpts *opts, Error **errp) The indentation is off now. With this fixed: Reviewed-by: Kevin Wolf
Re: [PATCH v4 09/11] block: bdrv_create_file is a coroutine_fn
Am 22/11/2022 um 09:58 schrieb Kevin Wolf: > Am 16.11.2022 um 13:22 hat Emanuele Giuseppe Esposito geschrieben: >> It is always called in coroutine_fn callbacks, therefore >> it can directly call bdrv_co_create(). >> >> Signed-off-by: Emanuele Giuseppe Esposito >> --- >> block.c| 6 -- >> include/block/block-global-state.h | 3 ++- >> 2 files changed, 6 insertions(+), 3 deletions(-) >> >> diff --git a/block.c b/block.c >> index c610a32e77..7a4c3eb540 100644 >> --- a/block.c >> +++ b/block.c >> @@ -534,6 +534,7 @@ static int coroutine_fn bdrv_co_create(BlockDriver *drv, >> const char *filename, >> int ret; >> char *filename_copy; >> GLOBAL_STATE_CODE(); >> +assert(qemu_in_coroutine()); > > We don't generally assert this, otherwise it would have to be in every > coroutine_fn. That was my plan for the serie "Protect the block layer with a rwlock: part 3", where I convert BlockDriver callbacks in coroutine, and thus assert there because I know all the callers are coroutine_fn. > >> assert(*errp == NULL); >> assert(drv); >> >> @@ -725,7 +726,8 @@ out: >> return ret; >> } >> >> -int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp) >> +int coroutine_fn bdrv_create_file(const char *filename, QemuOpts *opts, >> + Error **errp) > > Should it be renamed as bdrv_co_create_file()? > Ok (when I don't answer just assume that I agree). Thank you, Emanuele >> { >> QemuOpts *protocol_opts; >> BlockDriver *drv; >> @@ -766,7 +768,7 @@ int bdrv_create_file(const char *filename, QemuOpts >> *opts, Error **errp) >> goto out; >> } >> >> -ret = bdrv_create(drv, filename, protocol_opts, errp); >> +ret = bdrv_co_create(drv, filename, protocol_opts, errp); >> out: >> qemu_opts_del(protocol_opts); >> qobject_unref(qdict); > > Kevin >
Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci
Thomas Huth writes: > On 21/11/2022 17.32, Markus Armbruster wrote: >> Philippe Mathieu-Daudé writes: >> >>> On 21/11/22 15:36, Peter Maydell wrote: On Mon, 21 Nov 2022 at 14:03, Markus Armbruster wrote: > > Tweak the semantic patch to drop redundant parenthesis around the > return expression. > > Coccinelle drops comments in hw/rdma/vmw/pvrdma_cmd.c; restored > manually. > > Coccinelle messes up vmdk_co_create(), not sure why. Transformed > manually. > > Line breaks in target/avr/cpu.h and hw/rdma/vmw/pvrdma_cmd.c tidied up > manually. > > Whitespace in fuse_reply_iov() tidied up manually. > > checkpatch.pl complains "return of an errno should typically be -ve" > two times for hw/9pfs/9p-synth.c. Preexisting, the patch merely makes > it visible to checkpatch.pl. > > checkpatch.pl complains "return is not a function, parentheses are not > required" three times for target/mips/tcg/dsp_helper.c. False > positives. > > Signed-off-by: Markus Armbruster >.../user/ase/msa/bit-count/test_msa_nloc_b.c | 9 +- >.../user/ase/msa/bit-count/test_msa_nloc_d.c | 9 +- [snip long list of other mips test files] >328 files changed, 989 insertions(+), 2099 deletions(-) This patch seems to almost entirely be huge because of these mips test case files. Are they specific to QEMU or are they effectively a 3rd-party import that it doesn't make sense to make local changes to? >>> >>> They are imported and will unlikely be modified. >> >> Not obvious to me from git-log. >> >> Should I drop the changes to tests/tcg/mips/? > > I'd say yes. At least move them to a separate patch. Possible status of tests/tcg/mips/: 1. Imported, should not be modified Drop from the patch. 2. Not imported, should be modified 2a. To be reviewed separately from the remainder of the patch Split off. 2b. Likewise, but nobody will care to review, realistically Split off and merge anyway, or drop. I'd go for the latter. 2c. To be reviewed together with the remainder of the patch Keep as is. Which one is it? > Otherwise reviewing > this patch here is no fun at all. I don't think complete detailed review is necessary or even sensible. Review should start with the Coccinelle script: // replace 'R = X; return R;' with 'return X;' @@ identifier VAR; expression E; type T; identifier F; @@ T F(...) { ... -T VAR; ... when != VAR -VAR = (E); -return VAR; +return E; ... when != VAR } What could go wrong? Not a rhetorical question! The simple part is executing the rule. It'll *delete* variable VAR, and *preserve* expression E. The tricky part is deciding whether the rule matches, in particular proving that there are no other uses of VAR. If Coccinelle gets that wrong, the code no longer compiles *unless* there is another, global VAR of suitable type. Turns out Coccinelle does get it wrong for vmdk_co_create(), and the compiler duly rejects it. I don't understand why it gets it wrong. To help understand the script, and as a sanity check, reviewing some of its patches is advisable. Reviewing all of them feels impractical.
Re: [PATCH v4 09/11] block: bdrv_create_file is a coroutine_fn
Am 16.11.2022 um 13:22 hat Emanuele Giuseppe Esposito geschrieben: > It is always called in coroutine_fn callbacks, therefore > it can directly call bdrv_co_create(). > > Signed-off-by: Emanuele Giuseppe Esposito > --- > block.c| 6 -- > include/block/block-global-state.h | 3 ++- > 2 files changed, 6 insertions(+), 3 deletions(-) > > diff --git a/block.c b/block.c > index c610a32e77..7a4c3eb540 100644 > --- a/block.c > +++ b/block.c > @@ -534,6 +534,7 @@ static int coroutine_fn bdrv_co_create(BlockDriver *drv, > const char *filename, > int ret; > char *filename_copy; > GLOBAL_STATE_CODE(); > +assert(qemu_in_coroutine()); We don't generally assert this, otherwise it would have to be in every coroutine_fn. > assert(*errp == NULL); > assert(drv); > > @@ -725,7 +726,8 @@ out: > return ret; > } > > -int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp) > +int coroutine_fn bdrv_create_file(const char *filename, QemuOpts *opts, > + Error **errp) Should it be renamed as bdrv_co_create_file()? > { > QemuOpts *protocol_opts; > BlockDriver *drv; > @@ -766,7 +768,7 @@ int bdrv_create_file(const char *filename, QemuOpts > *opts, Error **errp) > goto out; > } > > -ret = bdrv_create(drv, filename, protocol_opts, errp); > +ret = bdrv_co_create(drv, filename, protocol_opts, errp); > out: > qemu_opts_del(protocol_opts); > qobject_unref(qdict); Kevin
Re: [PATCH v4 08/11] block: distinguish between bdrv_create running in coroutine and not
Am 16.11.2022 um 13:22 hat Emanuele Giuseppe Esposito geschrieben: > Call two different functions depending on whether bdrv_create > is in coroutine or not, following the same pattern as > generated_co_wrapper functions. > > This allows to also call the coroutine function directly, > without using CreateCo or relying in bdrv_create(). > > Signed-off-by: Emanuele Giuseppe Esposito > --- > block.c | 76 - > 1 file changed, 38 insertions(+), 38 deletions(-) > > diff --git a/block.c b/block.c > index 577639c7e0..c610a32e77 100644 > --- a/block.c > +++ b/block.c > @@ -528,66 +528,66 @@ typedef struct CreateCo { > Error *err; > } CreateCo; > > -static void coroutine_fn bdrv_create_co_entry(void *opaque) > +static int coroutine_fn bdrv_co_create(BlockDriver *drv, const char > *filename, > + QemuOpts *opts, Error **errp) > { > -Error *local_err = NULL; > int ret; > +char *filename_copy; > +GLOBAL_STATE_CODE(); > +assert(*errp == NULL); Your use of *errp in this function will crash if NULL is passed. You need ERRP_GUARD() first before you can do this. > +assert(drv); > + > +if (!drv->bdrv_co_create_opts) { > +error_setg(errp, "Driver '%s' does not support image creation", > + drv->format_name); > +return -ENOTSUP; > +} > > +filename_copy = g_strdup(filename); It's only preserved from the pre-patch state, but I don't think this is necessary? We know that the string will stay around until the function returns, and the parameter of drv->bdrv_co_create_opts is const char*, so it must not be modified either. Maybe worth a cleanup patch on top to just use filename directly? > +ret = drv->bdrv_co_create_opts(drv, filename_copy, opts, errp); > + > +if (ret < 0 && !*errp) { > +error_setg_errno(errp, -ret, "Could not create image"); > +} > + > +g_free(filename_copy); > +return ret; > +} Kevin
Re: [PATCH maybe-7.2 1/3] hw/i2c: only schedule pending master when bus is idle
On Nov 17 14:40, Cédric Le Goater wrote: > On 11/17/22 12:58, Klaus Jensen wrote: > > On Nov 17 09:01, Cédric Le Goater wrote: > > > On 11/17/22 08:37, Klaus Jensen wrote: > > > > On Nov 17 07:56, Cédric Le Goater wrote: > > > > > On 11/17/22 07:40, Klaus Jensen wrote: > > > > > > On Nov 16 16:58, Cédric Le Goater wrote: > > > > > > > On 11/16/22 09:43, Klaus Jensen wrote: > > > > > > > > From: Klaus Jensen > > > > > > > > > > > > > > > > It is not given that the current master will release the bus > > > > > > > > after a > > > > > > > > transfer ends. Only schedule a pending master if the bus is > > > > > > > > idle. > > > > > > > > > > > > > > > > Fixes: 37fa5ca42623 ("hw/i2c: support multiple masters") > > > > > > > > Signed-off-by: Klaus Jensen > > > > > > > > --- > > > > > > > > hw/i2c/aspeed_i2c.c | 2 ++ > > > > > > > > hw/i2c/core.c| 37 > > > > > > > > ++--- > > > > > > > > include/hw/i2c/i2c.h | 2 ++ > > > > > > > > 3 files changed, 26 insertions(+), 15 deletions(-) > > > > > > > > > > > > > > > > diff --git a/hw/i2c/aspeed_i2c.c b/hw/i2c/aspeed_i2c.c > > > > > > > > index c166fd20fa11..1f071a3811f7 100644 > > > > > > > > --- a/hw/i2c/aspeed_i2c.c > > > > > > > > +++ b/hw/i2c/aspeed_i2c.c > > > > > > > > @@ -550,6 +550,8 @@ static void > > > > > > > > aspeed_i2c_bus_handle_cmd(AspeedI2CBus *bus, uint64_t value) > > > > > > > > } > > > > > > > > SHARED_ARRAY_FIELD_DP32(bus->regs, reg_cmd, > > > > > > > > M_STOP_CMD, 0); > > > > > > > > aspeed_i2c_set_state(bus, I2CD_IDLE); > > > > > > > > + > > > > > > > > +i2c_schedule_pending_master(bus->bus); > > > > > > > > > > > > > > Shouldn't it be i2c_bus_release() ? > > > > > > > > > > > > > > > > > > > The reason for having both i2c_bus_release() and > > > > > > i2c_schedule_pending_master() is that i2c_bus_release() sort of > > > > > > pairs > > > > > > with i2c_bus_master(). They either set or clear the bus->bh member. > > > > > > > > > > > > In the current design, the controller (in this case the Aspeed I2C) > > > > > > is > > > > > > an "implicit" master (it does not have a bottom half driving it), so > > > > > > there is no bus->bh to clear. > > > > > > > > > > > > I should (and will) write some documentation on the asynchronous > > > > > > API. > > > > > > > > > > I found the routine names confusing. Thanks for the clarification. > > > > > > > > > > Maybe we could do this rename : > > > > > > > > > > i2c_bus_release() -> i2c_bus_release_and_clear() > > > > > i2c_schedule_pending_master() -> i2c_bus_release() > > > > > > > > > > and keep i2c_schedule_pending_master() internal the I2C core > > > > > subsystem. > > > > > > > > > > > > > How about renaming i2c_bus_master to i2c_bus_acquire() such that it > > > > pairs with i2c_bus_release(). > > > > > > Looks good to me. > > > > > > > And then add an i2c_bus_yield() to be used by the controller? I think we > > > > should be able to assert in i2c_bus_yield() that bus->bh is NULL. But > > > > I'll take a closer look at that. > > > > > > I am using your i2c-echo slave device to track regressions in the Aspeed > > > machines. May be we could merge it for tests ? > > > > > > > Oh, cool. > > > > Sure, I'd be happy to help "maintain" it ;) > > And so, I am seeing errors with the little POC you sent. > > without: > console: echo slave-24c02 0x1064 > /sys/bus/i2c/devices/i2c-3/new_device > console: # echo slave-24c02 0x1064 > /sys/bus/i2c/devices/i2c-3/new_device > console: [4.252431] i2c i2c-3: new_device: Instantiated device > slave-24c02 at 0x64 > console: i2cset -y 3 0x42 0x64 0x00 0xaa i > /console: # i2cset -y 3 0x42 0x64 0x00 0xaa i > console: # hexdump /sys/bus/i2c/devices/3-1064/slave-eeprom > console: 000 ffaa > console: poweroff > console: 010 > console: * > console: 100 > > with: > console: echo slave-24c02 0x1064 > /sys/bus/i2c/devices/i2c-3/new_device > console: # echo slave-24c02 0x1064 > /sys/bus/i2c/devices/i2c-3/new_device > console: [4.413210] i2c i2c-3: new_device: Instantiated device > slave-24c02 at 0x64 > console: i2cset -y 3 0x42 0x64 0x00 0xaa i > console: # i2cset -y 3 0x42 0x64 0x00 0xaa i > console: # hexdump /sys/bus/i2c/devices/3-1064/slave-eeprom > console: 000 > console: * > console: 100 > C. Right. This is because the i2c-echo device is scheduling the bottom half initially on its own. What happens is that the bottom half gets queued up in the pending masters list instead of being scheduling directly. And since the i2c controller is idle, the bottom half is never scheduled. Fixing i2c_bus_acquire() to schedulue it directly if the bus is free seems the proper way to do it. I'll include that in v2. While it is not directly invalid, the
Re: [PATCH v4 05/11] block-coroutine-wrapper.py: default to main loop aiocontext if function does not have a BlockDriverState parameter
Am 21.11.2022 um 16:52 hat Emanuele Giuseppe Esposito geschrieben: > Am 21/11/2022 um 16:30 schrieb Kevin Wolf: > > Am 16.11.2022 um 13:22 hat Emanuele Giuseppe Esposito geschrieben: > >> Basically BdrvPollCo->bs is only used by bdrv_poll_co(), and the > >> functions that it uses are both using bdrv_get_aio_context, that > >> defaults to qemu_get_aio_context() if bs is NULL. > >> > >> Therefore pass NULL to BdrvPollCo to automatically generate a function > >> that create and runs a coroutine in the main loop. > >> > >> Signed-off-by: Emanuele Giuseppe Esposito > > > > It happens to work, but it's kind of ugly to call bdrv_coroutine_enter() > > and BDRV_POLL_WHILE() with a NULL bs. > > > > How hard would it be to generate code that doesn't use these functions, > > but directly aio_co_enter() and AIO_WAIT_WHILE() for wrappers that are > > not related to a BDS? > > > > At this point, I would get rid of s->poll_state.bs and instead use > s->poll_state.aio_context. Then call directly aio_co_enter and > AIO_WAIT_WHILE, as you suggested but just everywhere, without > differentiating the cases. Oh, yes, that's better. > Then we would have something similar to what it is currently done with bs: > > if t == 'BlockDriverState *': > bs = 'bdrv_get_aio_context(bs)' > elif t == 'BdrvChild *': > bs = 'bdrv_get_aio_context(child->bs)' > elif t == 'BlockBackend *': > bs = 'bdrv_get_aio_context(blk_bs(blk))' blk_get_aio_context(blk) seems more correct. > else: > bs = 'qemu_get_aio_context()' > > I haven't tried it yet, but it should work. Kevin
Re: [PATCH for-7.2] block-backend: avoid bdrv_unregister_buf() NULL pointer deref
Am 21.11.2022 um 22:19 hat Stefan Hajnoczi geschrieben: > bdrv_*() APIs expect a valid BlockDriverState. Calling them with bs=NULL > leads to undefined behavior. > > Jonathan Cameron reported this following NULL pointer dereference when a > VM with a virtio-blk device and a memory-backend-file object is > terminated: > 1. qemu_cleanup() closes all drives, setting blk->root to NULL > 2. qemu_cleanup() calls user_creatable_cleanup(), which results in a RAM >block notifier callback because the memory-backend-file is destroyed. > 3. blk_unregister_buf() is called by virtio-blk's BlockRamRegistrar >notifier callback and undefined behavior occurs. > > Fixes: baf422684d73 ("virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization > hint") > Co-authored-by: Jonathan Cameron > Signed-off-by: Stefan Hajnoczi Reviewed-by: Kevin Wolf This raises some questions, though. What happens if the graph isn't static between creation and deletion of the device? Do we need to do something with registered buffers when a node is attached to or detached from an existing device? Kevin
[PATCH for-7.2 3/5] hw/nvme: fix aio cancel in zone reset
From: Klaus Jensen If the zone reset operation is cancelled but the block unmap operation completes normally, the callback will continue resetting the next zone since it neglects to check iocb->ret which will have been set to -ECANCELED. Make sure that this is checked and bail out if an error is present. Secondly, fix a potential use-after-free by removing the bottom half and enqueuing the completion directly. Fixes: 63d96e4ffd71 ("hw/nvme: reimplement zone reset to allow cancellation") Signed-off-by: Klaus Jensen --- hw/nvme/ctrl.c | 36 +++- 1 file changed, 11 insertions(+), 25 deletions(-) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index fc129b8d1a93..558ccea154c2 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -3712,7 +3712,6 @@ typedef struct NvmeZoneResetAIOCB { BlockAIOCB common; BlockAIOCB *aiocb; NvmeRequest *req; -QEMUBH *bh; int ret; bool all; @@ -3741,17 +3740,6 @@ static const AIOCBInfo nvme_zone_reset_aiocb_info = { .cancel_async = nvme_zone_reset_cancel, }; -static void nvme_zone_reset_bh(void *opaque) -{ -NvmeZoneResetAIOCB *iocb = opaque; - -iocb->common.cb(iocb->common.opaque, iocb->ret); - -qemu_bh_delete(iocb->bh); -iocb->bh = NULL; -qemu_aio_unref(iocb); -} - static void nvme_zone_reset_cb(void *opaque, int ret); static void nvme_zone_reset_epilogue_cb(void *opaque, int ret) @@ -3762,14 +3750,8 @@ static void nvme_zone_reset_epilogue_cb(void *opaque, int ret) int64_t moff; int count; -if (ret < 0) { -nvme_zone_reset_cb(iocb, ret); -return; -} - -if (!ns->lbaf.ms) { -nvme_zone_reset_cb(iocb, 0); -return; +if (ret < 0 || iocb->ret < 0 || !ns->lbaf.ms) { +goto out; } moff = nvme_moff(ns, iocb->zone->d.zslba); @@ -3779,6 +3761,9 @@ static void nvme_zone_reset_epilogue_cb(void *opaque, int ret) BDRV_REQ_MAY_UNMAP, nvme_zone_reset_cb, iocb); return; + +out: +nvme_zone_reset_cb(iocb, ret); } static void nvme_zone_reset_cb(void *opaque, int ret) @@ -3787,7 +3772,9 @@ static void nvme_zone_reset_cb(void *opaque, int ret) NvmeRequest *req = iocb->req; NvmeNamespace *ns = req->ns; -if (ret < 0) { +if (iocb->ret < 0) { +goto done; +} else if (ret < 0) { iocb->ret = ret; goto done; } @@ -3835,9 +3822,9 @@ static void nvme_zone_reset_cb(void *opaque, int ret) done: iocb->aiocb = NULL; -if (iocb->bh) { -qemu_bh_schedule(iocb->bh); -} + +iocb->common.cb(iocb->common.opaque, iocb->ret); +qemu_aio_unref(iocb); } static uint16_t nvme_zone_mgmt_send_zrwa_flush(NvmeCtrl *n, NvmeZone *zone, @@ -3942,7 +3929,6 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) nvme_misc_cb, req); iocb->req = req; -iocb->bh = qemu_bh_new(nvme_zone_reset_bh, iocb); iocb->ret = 0; iocb->all = all; iocb->idx = zone_idx; -- 2.38.1
[PATCH for-7.2 4/5] hw/nvme: fix aio cancel in dsm
From: Klaus Jensen When the DSM operation is cancelled asynchronously, we set iocb->ret to -ECANCELED. However, the callback function only checks the return value of the completed aio, which may have completed succesfully prior to the cancellation and thus the callback ends up continuing the dsm operation instead of bailing out. Fix this. Secondly, fix a potential use-after-free by removing the bottom half and enqueuing the completion directly. Fixes: d7d1474fd85d ("hw/nvme: reimplement dsm to allow cancellation") Signed-off-by: Klaus Jensen --- hw/nvme/ctrl.c | 34 -- 1 file changed, 8 insertions(+), 26 deletions(-) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 558ccea154c2..458c85d47cce 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -2329,7 +2329,6 @@ typedef struct NvmeDSMAIOCB { BlockAIOCB common; BlockAIOCB *aiocb; NvmeRequest *req; -QEMUBH *bh; int ret; NvmeDsmRange *range; @@ -2351,7 +2350,7 @@ static void nvme_dsm_cancel(BlockAIOCB *aiocb) } else { /* * We only reach this if nvme_dsm_cancel() has already been called or - * the command ran to completion and nvme_dsm_bh is scheduled to run. + * the command ran to completion. */ assert(iocb->idx == iocb->nr); } @@ -2362,17 +2361,6 @@ static const AIOCBInfo nvme_dsm_aiocb_info = { .cancel_async = nvme_dsm_cancel, }; -static void nvme_dsm_bh(void *opaque) -{ -NvmeDSMAIOCB *iocb = opaque; - -iocb->common.cb(iocb->common.opaque, iocb->ret); - -qemu_bh_delete(iocb->bh); -iocb->bh = NULL; -qemu_aio_unref(iocb); -} - static void nvme_dsm_cb(void *opaque, int ret); static void nvme_dsm_md_cb(void *opaque, int ret) @@ -2384,16 +2372,10 @@ static void nvme_dsm_md_cb(void *opaque, int ret) uint64_t slba; uint32_t nlb; -if (ret < 0) { -iocb->ret = ret; +if (ret < 0 || iocb->ret < 0 || !ns->lbaf.ms) { goto done; } -if (!ns->lbaf.ms) { -nvme_dsm_cb(iocb, 0); -return; -} - range = &iocb->range[iocb->idx - 1]; slba = le64_to_cpu(range->slba); nlb = le32_to_cpu(range->nlb); @@ -2406,7 +2388,6 @@ static void nvme_dsm_md_cb(void *opaque, int ret) ret = nvme_block_status_all(ns, slba, nlb, BDRV_BLOCK_ZERO); if (ret) { if (ret < 0) { -iocb->ret = ret; goto done; } @@ -2420,8 +2401,7 @@ static void nvme_dsm_md_cb(void *opaque, int ret) return; done: -iocb->aiocb = NULL; -qemu_bh_schedule(iocb->bh); +nvme_dsm_cb(iocb, ret); } static void nvme_dsm_cb(void *opaque, int ret) @@ -2434,7 +2414,9 @@ static void nvme_dsm_cb(void *opaque, int ret) uint64_t slba; uint32_t nlb; -if (ret < 0) { +if (iocb->ret < 0) { +goto done; +} else if (ret < 0) { iocb->ret = ret; goto done; } @@ -2468,7 +2450,8 @@ next: done: iocb->aiocb = NULL; -qemu_bh_schedule(iocb->bh); +iocb->common.cb(iocb->common.opaque, iocb->ret); +qemu_aio_unref(iocb); } static uint16_t nvme_dsm(NvmeCtrl *n, NvmeRequest *req) @@ -2486,7 +2469,6 @@ static uint16_t nvme_dsm(NvmeCtrl *n, NvmeRequest *req) nvme_misc_cb, req); iocb->req = req; -iocb->bh = qemu_bh_new(nvme_dsm_bh, iocb); iocb->ret = 0; iocb->range = g_new(NvmeDsmRange, nr); iocb->nr = nr; -- 2.38.1
[PATCH for-7.2 2/5] hw/nvme: fix aio cancel in flush
From: Klaus Jensen Make sure that iocb->aiocb is NULL'ed when cancelling. Fix a potential use-after-free by removing the bottom half and enqueuing the completion directly. Fixes: 38f4ac65ac88 ("hw/nvme: reimplement flush to allow cancellation") Signed-off-by: Klaus Jensen --- hw/nvme/ctrl.c | 21 ++--- 1 file changed, 6 insertions(+), 15 deletions(-) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 26b53469328f..fc129b8d1a93 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -3160,7 +3160,6 @@ typedef struct NvmeFlushAIOCB { BlockAIOCB common; BlockAIOCB *aiocb; NvmeRequest *req; -QEMUBH *bh; int ret; NvmeNamespace *ns; @@ -3176,6 +3175,7 @@ static void nvme_flush_cancel(BlockAIOCB *acb) if (iocb->aiocb) { blk_aio_cancel_async(iocb->aiocb); +iocb->aiocb = NULL; } } @@ -3185,6 +3185,8 @@ static const AIOCBInfo nvme_flush_aiocb_info = { .get_aio_context = nvme_get_aio_context, }; +static void nvme_do_flush(NvmeFlushAIOCB *iocb); + static void nvme_flush_ns_cb(void *opaque, int ret) { NvmeFlushAIOCB *iocb = opaque; @@ -3206,13 +3208,11 @@ static void nvme_flush_ns_cb(void *opaque, int ret) } out: -iocb->aiocb = NULL; -qemu_bh_schedule(iocb->bh); +nvme_do_flush(iocb); } -static void nvme_flush_bh(void *opaque) +static void nvme_do_flush(NvmeFlushAIOCB *iocb) { -NvmeFlushAIOCB *iocb = opaque; NvmeRequest *req = iocb->req; NvmeCtrl *n = nvme_ctrl(req); int i; @@ -3239,14 +3239,8 @@ static void nvme_flush_bh(void *opaque) return; done: -qemu_bh_delete(iocb->bh); -iocb->bh = NULL; - iocb->common.cb(iocb->common.opaque, iocb->ret); - qemu_aio_unref(iocb); - -return; } static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req) @@ -3258,7 +3252,6 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req) iocb = qemu_aio_get(&nvme_flush_aiocb_info, NULL, nvme_misc_cb, req); iocb->req = req; -iocb->bh = qemu_bh_new(nvme_flush_bh, iocb); iocb->ret = 0; iocb->ns = NULL; iocb->nsid = 0; @@ -3280,13 +3273,11 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req) } req->aiocb = &iocb->common; -qemu_bh_schedule(iocb->bh); +nvme_do_flush(iocb); return NVME_NO_COMPLETE; out: -qemu_bh_delete(iocb->bh); -iocb->bh = NULL; qemu_aio_unref(iocb); return status; -- 2.38.1
[PATCH for-7.2 5/5] hw/nvme: remove copy bh scheduling
From: Klaus Jensen Fix a potential use-after-free by removing the bottom half and enqueuing the completion directly. Fixes: 796d20681d9b ("hw/nvme: reimplement the copy command to allow aio cancellation") Signed-off-by: Klaus Jensen --- hw/nvme/ctrl.c | 63 +++--- 1 file changed, 14 insertions(+), 49 deletions(-) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 458c85d47cce..bbbab522aa7a 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -2552,7 +2552,6 @@ typedef struct NvmeCopyAIOCB { BlockAIOCB common; BlockAIOCB *aiocb; NvmeRequest *req; -QEMUBH *bh; int ret; void *ranges; @@ -2590,9 +2589,8 @@ static const AIOCBInfo nvme_copy_aiocb_info = { .cancel_async = nvme_copy_cancel, }; -static void nvme_copy_bh(void *opaque) +static void nvme_copy_done(NvmeCopyAIOCB *iocb) { -NvmeCopyAIOCB *iocb = opaque; NvmeRequest *req = iocb->req; NvmeNamespace *ns = req->ns; BlockAcctStats *stats = blk_get_stats(ns->blkconf.blk); @@ -2604,9 +2602,6 @@ static void nvme_copy_bh(void *opaque) qemu_iovec_destroy(&iocb->iov); g_free(iocb->bounce); -qemu_bh_delete(iocb->bh); -iocb->bh = NULL; - if (iocb->ret < 0) { block_acct_failed(stats, &iocb->acct.read); block_acct_failed(stats, &iocb->acct.write); @@ -2619,7 +2614,7 @@ static void nvme_copy_bh(void *opaque) qemu_aio_unref(iocb); } -static void nvme_copy_cb(void *opaque, int ret); +static void nvme_do_copy(NvmeCopyAIOCB *iocb); static void nvme_copy_source_range_parse_format0(void *ranges, int idx, uint64_t *slba, uint32_t *nlb, @@ -2731,7 +2726,7 @@ static void nvme_copy_out_completed_cb(void *opaque, int ret) iocb->idx++; iocb->slba += nlb; out: -nvme_copy_cb(iocb, iocb->ret); +nvme_do_copy(iocb); } static void nvme_copy_out_cb(void *opaque, int ret) @@ -2743,16 +2738,8 @@ static void nvme_copy_out_cb(void *opaque, int ret) size_t mlen; uint8_t *mbounce; -if (ret < 0) { -iocb->ret = ret; +if (ret < 0 || iocb->ret < 0 || !ns->lbaf.ms) { goto out; -} else if (iocb->ret < 0) { -goto out; -} - -if (!ns->lbaf.ms) { -nvme_copy_out_completed_cb(iocb, 0); -return; } nvme_copy_source_range_parse(iocb->ranges, iocb->idx, iocb->format, NULL, @@ -2771,7 +2758,7 @@ static void nvme_copy_out_cb(void *opaque, int ret) return; out: -nvme_copy_cb(iocb, ret); +nvme_copy_out_completed_cb(iocb, ret); } static void nvme_copy_in_completed_cb(void *opaque, int ret) @@ -2865,15 +2852,9 @@ static void nvme_copy_in_completed_cb(void *opaque, int ret) invalid: req->status = status; -iocb->aiocb = NULL; -if (iocb->bh) { -qemu_bh_schedule(iocb->bh); -} - -return; - +iocb->ret = -1; out: -nvme_copy_cb(iocb, ret); +nvme_do_copy(iocb); } static void nvme_copy_in_cb(void *opaque, int ret) @@ -2884,16 +2865,8 @@ static void nvme_copy_in_cb(void *opaque, int ret) uint64_t slba; uint32_t nlb; -if (ret < 0) { -iocb->ret = ret; +if (ret < 0 || iocb->ret < 0 || !ns->lbaf.ms) { goto out; -} else if (iocb->ret < 0) { -goto out; -} - -if (!ns->lbaf.ms) { -nvme_copy_in_completed_cb(iocb, 0); -return; } nvme_copy_source_range_parse(iocb->ranges, iocb->idx, iocb->format, &slba, @@ -2909,12 +2882,11 @@ static void nvme_copy_in_cb(void *opaque, int ret) return; out: -nvme_copy_cb(iocb, iocb->ret); +nvme_copy_in_completed_cb(iocb, ret); } -static void nvme_copy_cb(void *opaque, int ret) +static void nvme_do_copy(NvmeCopyAIOCB *iocb) { -NvmeCopyAIOCB *iocb = opaque; NvmeRequest *req = iocb->req; NvmeNamespace *ns = req->ns; uint64_t slba; @@ -2922,10 +2894,7 @@ static void nvme_copy_cb(void *opaque, int ret) size_t len; uint16_t status; -if (ret < 0) { -iocb->ret = ret; -goto done; -} else if (iocb->ret < 0) { +if (iocb->ret < 0) { goto done; } @@ -2972,14 +2941,11 @@ static void nvme_copy_cb(void *opaque, int ret) invalid: req->status = status; +iocb->ret = -1; done: -iocb->aiocb = NULL; -if (iocb->bh) { -qemu_bh_schedule(iocb->bh); -} +nvme_copy_done(iocb); } - static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req) { NvmeNamespace *ns = req->ns; @@ -3049,7 +3015,6 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req) } iocb->req = req; -iocb->bh = qemu_bh_new(nvme_copy_bh, iocb); iocb->ret = 0; iocb->nr = nr; iocb->idx = 0; @@ -3066,7 +3031,7 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req) BLOCK_ACCT_WRITE); req->aiocb = &iocb->common; -nvme_copy_cb(iocb, 0); +nvme_do_copy(iocb); return NVME_NO_COMPLETE; -- 2.3
[PATCH for-7.2 1/5] hw/nvme: fix aio cancel in format
From: Klaus Jensen There are several bugs in the async cancel code for the Format command. Firstly, cancelling a format operation neglects to set iocb->ret as well as clearing the iocb->aiocb after cancelling the underlying aiocb which causes the aio callback to ignore the cancellation. Trivial fix. Secondly, and worse, because the request is queued up for posting to the CQ in a bottom half, if the cancellation is due to the submission queue being deleted (which calls blk_aio_cancel), the req structure is deallocated in nvme_del_sq prior to the bottom half being schedulued. Fix this by simply removing the bottom half, there is no reason to defer it anyway. Fixes: 3bcf26d3d619 ("hw/nvme: reimplement format nvm to allow cancellation") Reported-by: Jonathan Derrick Signed-off-by: Klaus Jensen --- hw/nvme/ctrl.c | 27 --- 1 file changed, 12 insertions(+), 15 deletions(-) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index ac3885ce5079..26b53469328f 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -5756,14 +5756,15 @@ typedef struct NvmeFormatAIOCB { uint8_t pil; } NvmeFormatAIOCB; -static void nvme_format_bh(void *opaque); - static void nvme_format_cancel(BlockAIOCB *aiocb) { NvmeFormatAIOCB *iocb = container_of(aiocb, NvmeFormatAIOCB, common); +iocb->ret = -ECANCELED; + if (iocb->aiocb) { blk_aio_cancel_async(iocb->aiocb); +iocb->aiocb = NULL; } } @@ -5787,13 +5788,17 @@ static void nvme_format_set(NvmeNamespace *ns, uint8_t lbaf, uint8_t mset, nvme_ns_init_format(ns); } +static void nvme_do_format(NvmeFormatAIOCB *iocb); + static void nvme_format_ns_cb(void *opaque, int ret) { NvmeFormatAIOCB *iocb = opaque; NvmeNamespace *ns = iocb->ns; int bytes; -if (ret < 0) { +if (iocb->ret < 0) { +goto done; +} else if (ret < 0) { iocb->ret = ret; goto done; } @@ -5817,8 +5822,7 @@ static void nvme_format_ns_cb(void *opaque, int ret) iocb->offset = 0; done: -iocb->aiocb = NULL; -qemu_bh_schedule(iocb->bh); +nvme_do_format(iocb); } static uint16_t nvme_format_check(NvmeNamespace *ns, uint8_t lbaf, uint8_t pi) @@ -5842,9 +5846,8 @@ static uint16_t nvme_format_check(NvmeNamespace *ns, uint8_t lbaf, uint8_t pi) return NVME_SUCCESS; } -static void nvme_format_bh(void *opaque) +static void nvme_do_format(NvmeFormatAIOCB *iocb) { -NvmeFormatAIOCB *iocb = opaque; NvmeRequest *req = iocb->req; NvmeCtrl *n = nvme_ctrl(req); uint32_t dw10 = le32_to_cpu(req->cmd.cdw10); @@ -5882,11 +5885,7 @@ static void nvme_format_bh(void *opaque) return; done: -qemu_bh_delete(iocb->bh); -iocb->bh = NULL; - iocb->common.cb(iocb->common.opaque, iocb->ret); - qemu_aio_unref(iocb); } @@ -5905,7 +5904,6 @@ static uint16_t nvme_format(NvmeCtrl *n, NvmeRequest *req) iocb = qemu_aio_get(&nvme_format_aiocb_info, NULL, nvme_misc_cb, req); iocb->req = req; -iocb->bh = qemu_bh_new(nvme_format_bh, iocb); iocb->ret = 0; iocb->ns = NULL; iocb->nsid = 0; @@ -5934,14 +5932,13 @@ static uint16_t nvme_format(NvmeCtrl *n, NvmeRequest *req) } req->aiocb = &iocb->common; -qemu_bh_schedule(iocb->bh); +nvme_do_format(iocb); return NVME_NO_COMPLETE; out: -qemu_bh_delete(iocb->bh); -iocb->bh = NULL; qemu_aio_unref(iocb); + return status; } -- 2.38.1
[PATCH for-7.2 0/5] hw/nvme: aio cancel fixes
From: Klaus Jensen A new blktests nvme test unearthed some bad bugs in the asynchronous cancellation handling. Fix this for all commands that implement async_cancel(). The fix is the same for all commands: remove the deferred enqueuing (a bottom half scheduling) of the request completion. Klaus Jensen (5): hw/nvme: fix aio cancel in format hw/nvme: fix aio cancel in flush hw/nvme: fix aio cancel in zone reset hw/nvme: fix aio cancel in dsm hw/nvme: remove copy bh scheduling hw/nvme/ctrl.c | 181 ++--- 1 file changed, 51 insertions(+), 130 deletions(-) -- 2.38.1