Re: [PATCH] vhost: mask VIRTIO_F_RING_RESET for vhost and vhost-user devices

2022-11-21 Thread Raphael Norwitz



> On Nov 21, 2022, at 5:11 AM, Stefano Garzarella  wrote:
> 
> Commit 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
> enabled VIRTIO_F_RING_RESET by default for all virtio devices.
> 
> This feature is not currently emulated by QEMU, so for vhost and
> vhost-user devices we need to make sure it is supported by the offloaded
> device emulation (in-kernel or in another process).
> To do this we need to add VIRTIO_F_RING_RESET to the features bitmap
> passed to vhost_get_features(). This way it will be masked if the device
> does not support it.
> 
> This issue was initially discovered with vhost-vsock and vhost-user-vsock,
> and then also tested with vhost-user-rng which confirmed the same issue.
> They fail when sending features through VHOST_SET_FEATURES ioctl or
> VHOST_USER_SET_FEATURES message, since VIRTIO_F_RING_RESET is negotiated
> by the guest (Linux >= v6.0), but not supported by the device.
> 
> Fixes: 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1318
> Signed-off-by: Stefano Garzarella 

Looks good. For vhost-user-blk and vhost-user-scsi:

Acked-by: Raphael Norwitz 




Re: [PATCH] vhost: mask VIRTIO_F_RING_RESET for vhost and vhost-user devices

2022-11-21 Thread Jason Wang
On Mon, Nov 21, 2022 at 6:11 PM Stefano Garzarella  wrote:
>
> Commit 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
> enabled VIRTIO_F_RING_RESET by default for all virtio devices.
>
> This feature is not currently emulated by QEMU, so for vhost and
> vhost-user devices we need to make sure it is supported by the offloaded
> device emulation (in-kernel or in another process).
> To do this we need to add VIRTIO_F_RING_RESET to the features bitmap
> passed to vhost_get_features(). This way it will be masked if the device
> does not support it.
>
> This issue was initially discovered with vhost-vsock and vhost-user-vsock,
> and then also tested with vhost-user-rng which confirmed the same issue.
> They fail when sending features through VHOST_SET_FEATURES ioctl or
> VHOST_USER_SET_FEATURES message, since VIRTIO_F_RING_RESET is negotiated
> by the guest (Linux >= v6.0), but not supported by the device.
>
> Fixes: 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1318
> Signed-off-by: Stefano Garzarella 

Acked-by: Jason Wang 

> ---
>
> To prevent this problem in the future, perhaps we should provide a function
> (e.g. vhost_device_get_features) that masks all non-device-specific
> features (e.g. VIRTIO_F_*, VIRTIO_RING_F_*) that are not emulated by QEMU
> but that we expect to be emulated by the vhost or vhost-user devices.

Adding Eugenio, this might not be true if we want to use shadow
virtqueue for emulating some features?

Thanks

> Then we can call it in all .get_features callbacks just before returning
> the features.
>
> What do you think?
>
> But maybe better to do that for the next release, I will send an RFC.
>
> Thanks,
> Stefano
> ---
>  hw/block/vhost-user-blk.c  |  1 +
>  hw/net/vhost_net.c |  1 +
>  hw/scsi/vhost-scsi.c   |  1 +
>  hw/scsi/vhost-user-scsi.c  |  1 +
>  hw/virtio/vhost-user-fs.c  |  1 +
>  hw/virtio/vhost-user-gpio.c|  1 +
>  hw/virtio/vhost-user-i2c.c |  1 +
>  hw/virtio/vhost-user-rng.c | 11 +--
>  hw/virtio/vhost-vsock-common.c |  1 +
>  net/vhost-vdpa.c   |  1 +
>  10 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 16ad400889..0d5190accf 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -52,6 +52,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_F_NOTIFY_ON_EMPTY,
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_IOMMU_PLATFORM,
> +VIRTIO_F_RING_RESET,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index feda448878..26e4930676 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -75,6 +75,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_NET_F_MTU,
>  VIRTIO_F_IOMMU_PLATFORM,
>  VIRTIO_F_RING_PACKED,
> +VIRTIO_F_RING_RESET,
>  VIRTIO_NET_F_RSS,
>  VIRTIO_NET_F_HASH_REPORT,
>
> diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
> index bdf337a7a2..6a0fd0dfb1 100644
> --- a/hw/scsi/vhost-scsi.c
> +++ b/hw/scsi/vhost-scsi.c
> @@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
>  VIRTIO_RING_F_INDIRECT_DESC,
>  VIRTIO_RING_F_EVENT_IDX,
>  VIRTIO_SCSI_F_HOTPLUG,
> +VIRTIO_F_RING_RESET,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index bc37317d55..b7a71a802c 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_RING_F_INDIRECT_DESC,
>  VIRTIO_RING_F_EVENT_IDX,
>  VIRTIO_SCSI_F_HOTPLUG,
> +VIRTIO_F_RING_RESET,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index 1c40f42045..dc4014cdef 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -32,6 +32,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_F_NOTIFY_ON_EMPTY,
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_IOMMU_PLATFORM,
> +VIRTIO_F_RING_RESET,
>
>  VHOST_INVALID_FEATURE_BIT
>  };
> diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
> index 677d1c7730..5851cb3bc9 100644
> --- a/hw/virtio/vhost-user-gpio.c
> +++ b/hw/virtio/vhost-user-gpio.c
> @@ -24,6 +24,7 @@ static const int feature_bits[] = {
>  VIRTIO_RING_F_INDIRECT_DESC,
>  VIRTIO_RING_F_EVENT_IDX,
>  VIRTIO_GPIO_F_IRQ,
> +VIRTIO_F_RING_RESET,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/virtio/vhost-user-i2c.c b/hw/virtio/vhost-user-i2c.c
> index 864eba695e..1c9f3d20dc 100644
> --- a/hw/virtio/vhost-user-i2c.c
> +++ b/hw/virtio/vhost-user-i2c.c
> @@ -16,6 +16,7 @@
>
>  static const int feature_bits[] = {
>  VIRTIO_I2C_F_ZERO_LENGTH_REQUEST,
> +VIRTIO_F_RING_RESET,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git 

Re: [PATCH v1 5/9] hw/virtio: introduce virtio_device_should_start

2022-11-21 Thread Michael S. Tsirkin
On Tue, Nov 15, 2022 at 05:46:58PM +0100, Christian Borntraeger wrote:
> 
> 
> > On 15.11.22 at 17:40, Christian Borntraeger wrote:
> > 
> > 
> > On 15.11.22 at 17:05, Alex Bennée wrote:
> > > 
> > > Christian Borntraeger  writes:
> > > 
> > > > On 15.11.22 at 15:31, Alex Bennée wrote:
> > > > > "Michael S. Tsirkin"  writes:
> > > > > 
> > > > > > On Mon, Nov 14, 2022 at 06:15:30PM +0100, Christian Borntraeger 
> > > > > > wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On 14.11.22 at 18:10, Michael S. Tsirkin wrote:
> > > > > > > > On Mon, Nov 14, 2022 at 05:55:09PM +0100, Christian Borntraeger 
> > > > > > > > wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On 14.11.22 at 17:37, Michael S. Tsirkin wrote:
> > > > > > > > > > On Mon, Nov 14, 2022 at 05:18:53PM +0100, Christian 
> > > > > > > > > > Borntraeger wrote:
> > > > > > > > > > > On 08.11.22 at 10:23, Alex Bennée wrote:
> > > > > > > > > > > > The previous fix to virtio_device_started revealed a 
> > > > > > > > > > > > problem in its
> > > > > > > > > > > > use by both the core and the device code. The core code 
> > > > > > > > > > > > should be able
> > > > > > > > > > > > to handle the device "starting" while the VM isn't 
> > > > > > > > > > > > running to handle
> > > > > > > > > > > > the restoration of migration state. To solve this dual 
> > > > > > > > > > > > use introduce a
> > > > > > > > > > > > new helper for use by the vhost-user backends who all 
> > > > > > > > > > > > use it to feed a
> > > > > > > > > > > > should_start variable.
> > > > > > > > > > > > 
> > > > > > > > > > > > We can also pick up a change vhost_user_blk_set_status 
> > > > > > > > > > > > while we are at
> > > > > > > > > > > > it which follows the same pattern.
> > > > > > > > > > > > 
> > > > > > > > > > > > Fixes: 9f6bcfd99f (hw/virtio: move vm_running check to 
> > > > > > > > > > > > virtio_device_started)
> > > > > > > > > > > > Fixes: 27ba7b027f (hw/virtio: add boilerplate for 
> > > > > > > > > > > > vhost-user-gpio device)
> > > > > > > > > > > > Signed-off-by: Alex Bennée 
> > > > > > > > > > > > Cc: "Michael S. Tsirkin" 
> > > > > > > > > > > 
> > > > > > > > > > > Hmmm, is this
> > > > > > > > > > > commit 259d69c00b67c02a67f3bdbeeea71c2c0af76c35
> > > > > > > > > > > Author: Alex Bennée 
> > > > > > > > > > > AuthorDate: Mon Nov 7 12:14:07 2022 +
> > > > > > > > > > > Commit: Michael S. Tsirkin 
> > > > > > > > > > > CommitDate: Mon Nov 7 14:08:18 2022 -0500
> > > > > > > > > > > 
> > > > > > > > > > >     hw/virtio: introduce virtio_device_should_start
> > > > > > > > > > > 
> > > > > > > > > > > and older version?
> > > > > > > > > > 
> > > > > > > > > > This is what got merged:
> > > > > > > > > > https://lore.kernel.org/r/20221107121407.1010913-1-alex.bennee%40linaro.org
> > > > > > > > > > This patch was sent after I merged the RFC.
> > > > > > > > > > I think the only difference is the commit log but I might 
> > > > > > > > > > be missing
> > > > > > > > > > something.
> > > > > > > > > > 
> > > > > > > > > > > This does not seem to fix the regression that I have 
> > > > > > > > > > > reported.
> > > > > > > > > > 
> > > > > > > > > > This was applied on top of 9f6bcfd99f which IIUC does, 
> > > > > > > > > > right?
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > QEMU master still fails for me for suspend/resume to disk:
> > > > > > > > > 
> > > > > > > > > #0  0x03ff8e3980a6 in __pthread_kill_implementation () at 
> > > > > > > > > /lib64/libc.so.6
> > > > > > > > > #1  0x03ff8e348580 in raise () at /lib64/libc.so.6
> > > > > > > > > #2  0x03ff8e32b5c0 in abort () at /lib64/libc.so.6
> > > > > > > > > #3  0x03ff8e3409da in __assert_fail_base () at 
> > > > > > > > > /lib64/libc.so.6
> > > > > > > > > #4  0x03ff8e340a4e in  () at /lib64/libc.so.6
> > > > > > > > > #5 0x02aa1ffa8966 in vhost_vsock_common_pre_save
> > > > > > > > > (opaque=) at
> > > > > > > > > ../hw/virtio/vhost-vsock-common.c:203
> > > > > > > > > #6  0x02aa1fe5e0ee in vmstate_save_state_v
> > > > > > > > >    (f=f@entry=0x2aa21bdc170, vmsd=0x2aa204ac5f0
> > > > > > > > > , opaque=0x2aa21bac9f8,
> > > > > > > > > vmdesc=vmdesc@entry=0x3fddc08eb30,
> > > > > > > > > version_id=version_id@entry=0) at ../migration/vmstate.c:329
> > > > > > > > > #7 0x02aa1fe5ebf8 in vmstate_save_state
> > > > > > > > > (f=f@entry=0x2aa21bdc170, vmsd=,
> > > > > > > > > opaque=, 
> > > > > > > > > vmdesc_id=vmdesc_id@entry=0x3fddc08eb30)
> > > > > > > > > at ../migration/vmstate.c:317
> > > > > > > > > #8 0x02aa1fe75bd0 in vmstate_save 
> > > > > > > > > (f=f@entry=0x2aa21bdc170,
> > > > > > > > > se=se@entry=0x2aa21bdbe90, vmdesc=vmdesc@entry=0x3fddc08eb30) 
> > > > > > > > > at
> > > > > > > > > ../migration/savevm.c:908
> > > > > > > > > #9 0x02aa1fe79584 in
> > > > > > > > > qemu_savevm_state_complete_precopy_non_iterable
> > > > > > > > > (f=f@entry=0x2aa21bdc170, 

Re: [PATCH for-7.2] block-backend: avoid bdrv_unregister_buf() NULL pointer deref

2022-11-21 Thread Philippe Mathieu-Daudé

On 21/11/22 22:19, Stefan Hajnoczi wrote:

bdrv_*() APIs expect a valid BlockDriverState. Calling them with bs=NULL
leads to undefined behavior.

Jonathan Cameron reported this following NULL pointer dereference when a
VM with a virtio-blk device and a memory-backend-file object is
terminated:
1. qemu_cleanup() closes all drives, setting blk->root to NULL
2. qemu_cleanup() calls user_creatable_cleanup(), which results in a RAM
block notifier callback because the memory-backend-file is destroyed.
3. blk_unregister_buf() is called by virtio-blk's BlockRamRegistrar
notifier callback and undefined behavior occurs.

Fixes: baf422684d73 ("virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization 
hint")
Co-authored-by: Jonathan Cameron 
Signed-off-by: Stefan Hajnoczi 
---
  block/block-backend.c | 15 +--
  1 file changed, 13 insertions(+), 2 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




[PATCH for-7.2] block-backend: avoid bdrv_unregister_buf() NULL pointer deref

2022-11-21 Thread Stefan Hajnoczi
bdrv_*() APIs expect a valid BlockDriverState. Calling them with bs=NULL
leads to undefined behavior.

Jonathan Cameron reported this following NULL pointer dereference when a
VM with a virtio-blk device and a memory-backend-file object is
terminated:
1. qemu_cleanup() closes all drives, setting blk->root to NULL
2. qemu_cleanup() calls user_creatable_cleanup(), which results in a RAM
   block notifier callback because the memory-backend-file is destroyed.
3. blk_unregister_buf() is called by virtio-blk's BlockRamRegistrar
   notifier callback and undefined behavior occurs.

Fixes: baf422684d73 ("virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization 
hint")
Co-authored-by: Jonathan Cameron 
Signed-off-by: Stefan Hajnoczi 
---
 block/block-backend.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index b48c91f4e1..d98a96ff37 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2576,14 +2576,25 @@ static void blk_root_drained_end(BdrvChild *child, int 
*drained_end_counter)
 
 bool blk_register_buf(BlockBackend *blk, void *host, size_t size, Error **errp)
 {
+BlockDriverState *bs = blk_bs(blk);
+
 GLOBAL_STATE_CODE();
-return bdrv_register_buf(blk_bs(blk), host, size, errp);
+
+if (bs) {
+return bdrv_register_buf(bs, host, size, errp);
+}
+return true;
 }
 
 void blk_unregister_buf(BlockBackend *blk, void *host, size_t size)
 {
+BlockDriverState *bs = blk_bs(blk);
+
 GLOBAL_STATE_CODE();
-bdrv_unregister_buf(blk_bs(blk), host, size);
+
+if (bs) {
+bdrv_unregister_buf(bs, host, size);
+}
 }
 
 int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
-- 
2.38.1




Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci

2022-11-21 Thread Markus Armbruster
Max Filippov  writes:

> On Mon, Nov 21, 2022 at 6:01 AM Markus Armbruster  wrote:
>>  .../xtensa/core-dsp3400/xtensa-modules.c.inc  | 136 +-
>>  target/xtensa/core-lx106/xtensa-modules.c.inc |  16 +--
>
> These files are generated and were imported from xtensa configuration
> overlays, they're not supposed to be changed.

Will drop.  Thanks!




Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci

2022-11-21 Thread Max Filippov
On Mon, Nov 21, 2022 at 6:01 AM Markus Armbruster  wrote:
>  .../xtensa/core-dsp3400/xtensa-modules.c.inc  | 136 +-
>  target/xtensa/core-lx106/xtensa-modules.c.inc |  16 +--

These files are generated and were imported from xtensa configuration
overlays, they're not supposed to be changed.

-- 
Thanks.
-- Max



Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci

2022-11-21 Thread Thomas Huth

On 21/11/2022 17.32, Markus Armbruster wrote:

Philippe Mathieu-Daudé  writes:


On 21/11/22 15:36, Peter Maydell wrote:

On Mon, 21 Nov 2022 at 14:03, Markus Armbruster  wrote:


Tweak the semantic patch to drop redundant parenthesis around the
return expression.

Coccinelle drops comments in hw/rdma/vmw/pvrdma_cmd.c; restored
manually.

Coccinelle messes up vmdk_co_create(), not sure why.  Transformed
manually.

Line breaks in target/avr/cpu.h and hw/rdma/vmw/pvrdma_cmd.c tidied up
manually.

Whitespace in fuse_reply_iov() tidied up manually.

checkpatch.pl complains "return of an errno should typically be -ve"
two times for hw/9pfs/9p-synth.c.  Preexisting, the patch merely makes
it visible to checkpatch.pl.

checkpatch.pl complains "return is not a function, parentheses are not
required" three times for target/mips/tcg/dsp_helper.c.  False
positives.

Signed-off-by: Markus Armbruster 



   .../user/ase/msa/bit-count/test_msa_nloc_b.c  |   9 +-
   .../user/ase/msa/bit-count/test_msa_nloc_d.c  |   9 +-

[snip long list of other mips test files]


   328 files changed, 989 insertions(+), 2099 deletions(-)

This patch seems to almost entirely be huge because of these
mips test case files. Are they specific to QEMU or are they
effectively a 3rd-party import that it doesn't make sense
to make local changes to?


They are imported and will unlikely be modified.


Not obvious to me from git-log.

Should I drop the changes to tests/tcg/mips/?


I'd say yes. At least move them to a separate patch. Otherwise reviewing 
this patch here is no fun at all.


 Thomas




Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci

2022-11-21 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> On 21/11/22 15:36, Peter Maydell wrote:
>> On Mon, 21 Nov 2022 at 14:03, Markus Armbruster  wrote:
>>>
>>> Tweak the semantic patch to drop redundant parenthesis around the
>>> return expression.
>>>
>>> Coccinelle drops comments in hw/rdma/vmw/pvrdma_cmd.c; restored
>>> manually.
>>>
>>> Coccinelle messes up vmdk_co_create(), not sure why.  Transformed
>>> manually.
>>>
>>> Line breaks in target/avr/cpu.h and hw/rdma/vmw/pvrdma_cmd.c tidied up
>>> manually.
>>>
>>> Whitespace in fuse_reply_iov() tidied up manually.
>>>
>>> checkpatch.pl complains "return of an errno should typically be -ve"
>>> two times for hw/9pfs/9p-synth.c.  Preexisting, the patch merely makes
>>> it visible to checkpatch.pl.
>>>
>>> checkpatch.pl complains "return is not a function, parentheses are not
>>> required" three times for target/mips/tcg/dsp_helper.c.  False
>>> positives.
>>>
>>> Signed-off-by: Markus Armbruster 
>> 
>>>   .../user/ase/msa/bit-count/test_msa_nloc_b.c  |   9 +-
>>>   .../user/ase/msa/bit-count/test_msa_nloc_d.c  |   9 +-
>> [snip long list of other mips test files]
>> 
>>>   328 files changed, 989 insertions(+), 2099 deletions(-)
>> This patch seems to almost entirely be huge because of these
>> mips test case files. Are they specific to QEMU or are they
>> effectively a 3rd-party import that it doesn't make sense
>> to make local changes to?
>
> They are imported and will unlikely be modified.

Not obvious to me from git-log.

Should I drop the changes to tests/tcg/mips/?




Re: [PATCH] vhost: mask VIRTIO_F_RING_RESET for vhost and vhost-user devices

2022-11-21 Thread Stefan Hajnoczi
On Mon, Nov 21, 2022 at 11:11:01AM +0100, Stefano Garzarella wrote:
> Commit 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
> enabled VIRTIO_F_RING_RESET by default for all virtio devices.
> 
> This feature is not currently emulated by QEMU, so for vhost and
> vhost-user devices we need to make sure it is supported by the offloaded
> device emulation (in-kernel or in another process).
> To do this we need to add VIRTIO_F_RING_RESET to the features bitmap
> passed to vhost_get_features(). This way it will be masked if the device
> does not support it.
> 
> This issue was initially discovered with vhost-vsock and vhost-user-vsock,
> and then also tested with vhost-user-rng which confirmed the same issue.
> They fail when sending features through VHOST_SET_FEATURES ioctl or
> VHOST_USER_SET_FEATURES message, since VIRTIO_F_RING_RESET is negotiated
> by the guest (Linux >= v6.0), but not supported by the device.
> 
> Fixes: 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1318
> Signed-off-by: Stefano Garzarella 
> ---
> 
> To prevent this problem in the future, perhaps we should provide a function
> (e.g. vhost_device_get_features) that masks all non-device-specific
> features (e.g. VIRTIO_F_*, VIRTIO_RING_F_*) that are not emulated by QEMU
> but that we expect to be emulated by the vhost or vhost-user devices.
> Then we can call it in all .get_features callbacks just before returning
> the features.
> 
> What do you think?
> 
> But maybe better to do that for the next release, I will send an RFC.

This patch looks good for 7.2.

I agree that in the long run this needs to be more robust so vhost
devices don't break every time a new feature bit is added.

> 
> Thanks,
> Stefano
> ---
>  hw/block/vhost-user-blk.c  |  1 +
>  hw/net/vhost_net.c |  1 +
>  hw/scsi/vhost-scsi.c   |  1 +
>  hw/scsi/vhost-user-scsi.c  |  1 +
>  hw/virtio/vhost-user-fs.c  |  1 +
>  hw/virtio/vhost-user-gpio.c|  1 +
>  hw/virtio/vhost-user-i2c.c |  1 +
>  hw/virtio/vhost-user-rng.c | 11 +--
>  hw/virtio/vhost-vsock-common.c |  1 +
>  net/vhost-vdpa.c   |  1 +
>  10 files changed, 18 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH 00/19] Migration patches for 8.8

2022-11-21 Thread Juan Quintela
Juan Quintela  wrote:
> Hi
>
> Based-on:  <20221121125907.62469-1-quint...@redhat.com>

The subject should say 8.0, not 8.8.

Sorry, Juan.

>
> These are the patches that I had to drop from the last PULL request because
> they weren't fixes:
> - AVX2 is dropped, intel posted a fix, I have to redo it
> - Fix for out of order channels is out
>   Daniel nacked it and I need to redo it
>
> Juan Quintela (4):
>   multifd: Create page_size fields into both MultiFD{Recv,Send}Params
>   multifd: Create page_count fields into both MultiFD{Recv,Send}Params
>   migration: Export ram_transferred_ram()
>   migration: Export ram_release_page()
>
> Peter Xu (15):
>   migration: Take bitmap mutex when completing ram migration
>   migration: Add postcopy_preempt_active()
>   migration: Cleanup xbzrle zero page cache update logic
>   migration: Trivial cleanup save_page_header() on same block check
>   migration: Remove RAMState.f references in compression code
>   migration: Yield bitmap_mutex properly when sending/sleeping
>   migration: Use atomic ops properly for page accountings
>   migration: Teach PSS about host page
>   migration: Introduce pss_channel
>   migration: Add pss_init()
>   migration: Make PageSearchStatus part of RAMState
>   migration: Move last_sent_block into PageSearchStatus
>   migration: Send requested page directly in rp-return thread
>   migration: Remove old preempt code around state maintainance
>   migration: Drop rs->f
>
>  migration/migration.h|   7 -
>  migration/multifd.h  |   8 +
>  migration/ram.h  |  23 ++
>  migration/migration.c|  47 +--
>  migration/multifd-zlib.c |  14 +-
>  migration/multifd-zstd.c |  12 +-
>  migration/multifd.c  |  27 +-
>  migration/ram.c  | 735 ++-
>  8 files changed, 422 insertions(+), 451 deletions(-)




Re: [PATCH RESEND v3 2/2] virtio: remove unnecessary host_features in ->get_features()

2022-11-21 Thread Cornelia Huck
On Mon, Nov 21 2022, Stefan Hajnoczi  wrote:

> Since at least commit 6b8f1020540c27246277377aa2c3331ad2bfb160 ("virtio:
> move host_features") the ->get_features() function has been called with
> host_features as an argument.
>
> Some devices manually add host_features in ->get_features() although the
> features argument already contains host_features. Make all devices
> consistent by dropping the unnecessary code.
>
> Cc: Cornelia Huck 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  hw/block/virtio-blk.c   | 3 ---
>  hw/char/virtio-serial-bus.c | 1 -
>  hw/net/virtio-net.c | 3 ---
>  hw/scsi/vhost-scsi-common.c | 3 ---
>  hw/scsi/virtio-scsi.c   | 4 
>  hw/virtio/virtio-balloon.c  | 2 --
>  6 files changed, 16 deletions(-)

Reviewed-by: Cornelia Huck 




Re: [PATCH RESEND v3 1/2] virtio: document ->host_features usage in vdc->get_features() callback

2022-11-21 Thread Cornelia Huck
On Mon, Nov 21 2022, Stefan Hajnoczi  wrote:

> Suggested-by: Cornelia Huck 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/hw/virtio/virtio.h | 7 +++
>  1 file changed, 7 insertions(+)

Reviewed-by: Cornelia Huck 




Re: [PATCH v4 07/11] block/vmdk: add missing coroutine_fn annotations

2022-11-21 Thread Emanuele Giuseppe Esposito



On 21/11/2022 at 17:01, Kevin Wolf wrote:
> On 16.11.2022 at 13:22, Emanuele Giuseppe Esposito wrote:
>> These functions end up calling bdrv_create() implemented as 
>> generated_co_wrapper
>> functions.
>> In addition, they also happen to be always called in coroutine context,
>> meaning all callers are coroutine_fn.
>> This means that the g_c_w function will enter the qemu_in_coroutine()
>> case and eventually suspend (or in other words call qemu_coroutine_yield()).
>> Therefore we need to mark such functions coroutine_fn too.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito 
> 
> Just one remark about patch ordering: This doesn't require the
> g_c_w_simple patches, so wouldn't it make more sense to move the
> g_c_w_simple right before the first patch that actually makes use of
> them?

I thought about it, but then I thought it was too far from the bdrv_create
patches.
Both g_c_w_simple and this patch are needed by bdrv_create_file, so I
thought that if you read this one first you wouldn't have a clue what
the end goal is.

Anyways, I'll change the order.

Thank you,
Emanuele

> 
> Reviewed-by: Kevin Wolf 
> 




Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci

2022-11-21 Thread Philippe Mathieu-Daudé

On 21/11/22 15:36, Peter Maydell wrote:

On Mon, 21 Nov 2022 at 14:03, Markus Armbruster  wrote:


Tweak the semantic patch to drop redundant parenthesis around the
return expression.

Coccinelle drops comments in hw/rdma/vmw/pvrdma_cmd.c; restored
manually.

Coccinelle messes up vmdk_co_create(), not sure why.  Transformed
manually.

Line breaks in target/avr/cpu.h and hw/rdma/vmw/pvrdma_cmd.c tidied up
manually.

Whitespace in fuse_reply_iov() tidied up manually.

checkpatch.pl complains "return of an errno should typically be -ve"
two times for hw/9pfs/9p-synth.c.  Preexisting, the patch merely makes
it visible to checkpatch.pl.

checkpatch.pl complains "return is not a function, parentheses are not
required" three times for target/mips/tcg/dsp_helper.c.  False
positives.

Signed-off-by: Markus Armbruster 



  .../user/ase/msa/bit-count/test_msa_nloc_b.c  |   9 +-
  .../user/ase/msa/bit-count/test_msa_nloc_d.c  |   9 +-

[snip long list of other mips test files]


  328 files changed, 989 insertions(+), 2099 deletions(-)


This patch seems to almost entirely be huge because of these
mips test case files. Are they specific to QEMU or are they
effectively a 3rd-party import that it doesn't make sense
to make local changes to?


They are imported and will unlikely be modified.



Re: [PATCH v4 07/11] block/vmdk: add missing coroutine_fn annotations

2022-11-21 Thread Kevin Wolf
On 16.11.2022 at 13:22, Emanuele Giuseppe Esposito wrote:
> These functions end up calling bdrv_create() implemented as 
> generated_co_wrapper
> functions.
> In addition, they also happen to be always called in coroutine context,
> meaning all callers are coroutine_fn.
> This means that the g_c_w function will enter the qemu_in_coroutine()
> case and eventually suspend (or in other words call qemu_coroutine_yield()).
> Therefore we need to mark such functions coroutine_fn too.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 

Just one remark about patch ordering: This doesn't require the
g_c_w_simple patches, so wouldn't it make more sense to move the
g_c_w_simple right before the first patch that actually makes use of
them?

Reviewed-by: Kevin Wolf 




Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci

2022-11-21 Thread Markus Armbruster
Peter Maydell  writes:

> On Mon, 21 Nov 2022 at 14:03, Markus Armbruster  wrote:
>>
>> Tweak the semantic patch to drop redundant parenthesis around the
>> return expression.
>>
>> Coccinelle drops comments in hw/rdma/vmw/pvrdma_cmd.c; restored
>> manually.
>>
>> Coccinelle messes up vmdk_co_create(), not sure why.  Transformed
>> manually.
>>
>> Line breaks in target/avr/cpu.h and hw/rdma/vmw/pvrdma_cmd.c tidied up
>> manually.
>>
>> Whitespace in fuse_reply_iov() tidied up manually.
>>
>> checkpatch.pl complains "return of an errno should typically be -ve"
>> two times for hw/9pfs/9p-synth.c.  Preexisting, the patch merely makes
>> it visible to checkpatch.pl.
>>
>> checkpatch.pl complains "return is not a function, parentheses are not
>> required" three times for target/mips/tcg/dsp_helper.c.  False
>> positives.
>>
>> Signed-off-by: Markus Armbruster 
>
>>  .../user/ase/msa/bit-count/test_msa_nloc_b.c  |   9 +-
>>  .../user/ase/msa/bit-count/test_msa_nloc_d.c  |   9 +-
> [snip long list of other mips test files]
>
>>  328 files changed, 989 insertions(+), 2099 deletions(-)
>
> This patch seems to almost entirely be huge because of these
> mips test case files. Are they specific to QEMU or are they
> effectively a 3rd-party import that it doesn't make sense
> to make local changes to?

Good question.

> I definitely don't think you should put manual tidying changes
> in the same patch as 3000 lines of automated changes: that
> is in my opinion un-reviewable.

I can split off the manual cleanup of vmdk_co_create().

The others are really just "don't delete this comment", "don't insert
a line containing just spaces", and "don't reflow this expression, keep
the line breaks right where they are".




Re: [PATCH v4 06/11] block-coroutine-wrapper.py: support also basic return types

2022-11-21 Thread Kevin Wolf
On 16.11.2022 at 13:22, Emanuele Giuseppe Esposito wrote:
> Extend the regex to cover also return type, pointers included.
> This implies that the value returned by the function cannot be
> a simple "int" anymore, but the custom return type.
> Therefore remove poll_state->ret and instead use a per-function
> custom "ret" field.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 

Reviewed-by: Kevin Wolf 




Re: [PULL 0/8] Next patches

2022-11-21 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.2 for any 
user-visible changes.




[PATCH v3 3/3] virtio: remove unnecessary host_features in ->get_features()

2022-11-21 Thread Stefan Hajnoczi
Since at least commit 6b8f1020540c27246277377aa2c3331ad2bfb160 ("virtio:
move host_features") the ->get_features() function has been called with
host_features as an argument.

Some devices manually add host_features in ->get_features() although the
features argument already contains host_features. Make all devices
consistent by dropping the unnecessary code.

Cc: Cornelia Huck 
Signed-off-by: Stefan Hajnoczi 
---
 hw/block/virtio-blk.c   | 3 ---
 hw/char/virtio-serial-bus.c | 1 -
 hw/net/virtio-net.c | 3 ---
 hw/scsi/vhost-scsi-common.c | 3 ---
 hw/scsi/virtio-scsi.c   | 4 
 hw/virtio/virtio-balloon.c  | 2 --
 6 files changed, 16 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index f717550fdc..a744012fec 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -983,9 +983,6 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, 
uint64_t features,
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
 
-/* Firstly sync all virtio-blk possible supported features */
-features |= s->host_features;
-
 virtio_add_feature(&features, VIRTIO_BLK_F_SEG_MAX);
 virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
 virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 7d4601cb5d..1414fb85ae 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -557,7 +557,6 @@ static uint64_t get_features(VirtIODevice *vdev, uint64_t 
features,
 
 vser = VIRTIO_SERIAL(vdev);
 
-features |= vser->host_features;
 if (vser->bus.max_nr_ports > 1) {
 virtio_add_feature(&features, VIRTIO_CONSOLE_F_MULTIPORT);
 }
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index aba12759d5..35b45eb6e3 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -777,9 +777,6 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, 
uint64_t features,
 VirtIONet *n = VIRTIO_NET(vdev);
 NetClientState *nc = qemu_get_queue(n->nic);
 
-/* Firstly sync all virtio-net possible supported features */
-features |= n->host_features;
-
 virtio_add_feature(&features, VIRTIO_NET_F_MAC);
 
 if (!peer_has_vnet_hdr(n)) {
diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
index 767f827e55..8b26f90aa1 100644
--- a/hw/scsi/vhost-scsi-common.c
+++ b/hw/scsi/vhost-scsi-common.c
@@ -124,9 +124,6 @@ uint64_t vhost_scsi_common_get_features(VirtIODevice *vdev, uint64_t features,
 {
 VHostSCSICommon *vsc = VHOST_SCSI_COMMON(vdev);
 
-/* Turn on predefined features supported by this device */
-features |= vsc->host_features;
-
 return vhost_get_features(&vsc->dev, vsc->feature_bits, features);
 }
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 6f6e2e32ba..40e10322fb 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -817,10 +817,6 @@ static uint64_t virtio_scsi_get_features(VirtIODevice *vdev,
  uint64_t requested_features,
  Error **errp)
 {
-VirtIOSCSI *s = VIRTIO_SCSI(vdev);
-
-/* Firstly sync all virtio-scsi possible supported features */
-requested_features |= s->host_features;
 return requested_features;
 }
 
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 73ac5eb675..0e9ca71b15 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -796,8 +796,6 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
 Error **errp)
 {
-VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
-f |= dev->host_features;
 virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
 
 return f;
-- 
2.38.1




Re: [PATCH v4 05/11] block-coroutine-wrapper.py: default to main loop aiocontext if function does not have a BlockDriverState parameter

2022-11-21 Thread Emanuele Giuseppe Esposito



On 21/11/2022 16:30, Kevin Wolf wrote:
> On 16.11.2022 at 13:22, Emanuele Giuseppe Esposito wrote:
>> Basically BdrvPollCo->bs is only used by bdrv_poll_co(), and the
>> functions that it uses are both using bdrv_get_aio_context, that
>> defaults to qemu_get_aio_context() if bs is NULL.
>>
>> Therefore pass NULL to BdrvPollCo to automatically generate a function
>> that create and runs a coroutine in the main loop.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito 
> 
> It happens to work, but it's kind of ugly to call bdrv_coroutine_enter()
> and BDRV_POLL_WHILE() with a NULL bs.
> 
> How hard would it be to generate code that doesn't use these functions,
> but directly aio_co_enter() and AIO_WAIT_WHILE() for wrappers that are
> not related to a BDS?
> 

At this point, I would get rid of s->poll_state.bs and instead use
s->poll_state.aio_context. Then call directly aio_co_enter and
AIO_WAIT_WHILE, as you suggested but just everywhere, without
differentiating the cases.

Then we would have something similar to what it is currently done with bs:

if t == 'BlockDriverState *':
bs = 'bdrv_get_aio_context(bs)'
elif t == 'BdrvChild *':
bs = 'bdrv_get_aio_context(child->bs)'
elif t == 'BlockBackend *':
bs = 'bdrv_get_aio_context(blk_bs(blk))'
else:
bs = 'qemu_get_aio_context()'

I haven't tried it yet, but it should work.

Thank you,
Emanuele




[PATCH RESEND v3 2/2] virtio: remove unnecessary host_features in ->get_features()

2022-11-21 Thread Stefan Hajnoczi
Since at least commit 6b8f1020540c27246277377aa2c3331ad2bfb160 ("virtio:
move host_features") the ->get_features() function has been called with
host_features as an argument.

Some devices manually add host_features in ->get_features() although the
features argument already contains host_features. Make all devices
consistent by dropping the unnecessary code.

Cc: Cornelia Huck 
Signed-off-by: Stefan Hajnoczi 
---
 hw/block/virtio-blk.c   | 3 ---
 hw/char/virtio-serial-bus.c | 1 -
 hw/net/virtio-net.c | 3 ---
 hw/scsi/vhost-scsi-common.c | 3 ---
 hw/scsi/virtio-scsi.c   | 4 
 hw/virtio/virtio-balloon.c  | 2 --
 6 files changed, 16 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index f717550fdc..a744012fec 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -983,9 +983,6 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features,
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
 
-/* Firstly sync all virtio-blk possible supported features */
-features |= s->host_features;
-
 virtio_add_feature(&features, VIRTIO_BLK_F_SEG_MAX);
 virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
 virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 7d4601cb5d..1414fb85ae 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -557,7 +557,6 @@ static uint64_t get_features(VirtIODevice *vdev, uint64_t features,
 
 vser = VIRTIO_SERIAL(vdev);
 
-features |= vser->host_features;
 if (vser->bus.max_nr_ports > 1) {
 virtio_add_feature(&features, VIRTIO_CONSOLE_F_MULTIPORT);
 }
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index aba12759d5..35b45eb6e3 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -777,9 +777,6 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
 VirtIONet *n = VIRTIO_NET(vdev);
 NetClientState *nc = qemu_get_queue(n->nic);
 
-/* Firstly sync all virtio-net possible supported features */
-features |= n->host_features;
-
 virtio_add_feature(&features, VIRTIO_NET_F_MAC);
 
 if (!peer_has_vnet_hdr(n)) {
diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
index 767f827e55..8b26f90aa1 100644
--- a/hw/scsi/vhost-scsi-common.c
+++ b/hw/scsi/vhost-scsi-common.c
@@ -124,9 +124,6 @@ uint64_t vhost_scsi_common_get_features(VirtIODevice *vdev, uint64_t features,
 {
 VHostSCSICommon *vsc = VHOST_SCSI_COMMON(vdev);
 
-/* Turn on predefined features supported by this device */
-features |= vsc->host_features;
-
 return vhost_get_features(&vsc->dev, vsc->feature_bits, features);
 }
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 6f6e2e32ba..40e10322fb 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -817,10 +817,6 @@ static uint64_t virtio_scsi_get_features(VirtIODevice *vdev,
  uint64_t requested_features,
  Error **errp)
 {
-VirtIOSCSI *s = VIRTIO_SCSI(vdev);
-
-/* Firstly sync all virtio-scsi possible supported features */
-requested_features |= s->host_features;
 return requested_features;
 }
 
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 73ac5eb675..0e9ca71b15 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -796,8 +796,6 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
 Error **errp)
 {
-VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
-f |= dev->host_features;
 virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
 
 return f;
-- 
2.38.1




[PATCH RESEND v3 1/2] virtio: document ->host_features usage in vdc->get_features() callback

2022-11-21 Thread Stefan Hajnoczi
Suggested-by: Cornelia Huck 
Signed-off-by: Stefan Hajnoczi 
---
 include/hw/virtio/virtio.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index a973811cbf..b6e09c6d4b 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -138,9 +138,16 @@ struct VirtioDeviceClass {
 /* This is what a VirtioDevice must implement */
 DeviceRealize realize;
 DeviceUnrealize unrealize;
+
+/*
+ * Called with vdev->host_features in requested_features. Returns device
+ * feature bits to be stored in vdev->host_features after factoring in
+ * device-specific feature bits.
+ */
 uint64_t (*get_features)(VirtIODevice *vdev,
  uint64_t requested_features,
  Error **errp);
+
 uint64_t (*bad_features)(VirtIODevice *vdev);
 void (*set_features)(VirtIODevice *vdev, uint64_t val);
 int (*validate_features)(VirtIODevice *vdev);
-- 
2.38.1




[PATCH RESEND v3 0/2] virtio: remove unnecessary host_features in ->get_features()

2022-11-21 Thread Stefan Hajnoczi
v3 (resend):
- On top of master instead of the qemu.git/staging tree :)
v3:
- Use an informal comment instead of a full doc comment [Cornelia]
v2:
- Document vdc->get_features() callback [Cornelia]

The vdc->get_features() callbacks are a little inconsistent in how they use
vdev->host_features. This is because the function's behavior changed over time.
Clean things up.

Stefan Hajnoczi (2):
  virtio: document ->host_features usage in vdc->get_features() callback
  virtio: remove unnecessary host_features in ->get_features()

 include/hw/virtio/virtio.h  | 7 +++
 hw/block/virtio-blk.c   | 3 ---
 hw/char/virtio-serial-bus.c | 1 -
 hw/net/virtio-net.c | 3 ---
 hw/scsi/vhost-scsi-common.c | 3 ---
 hw/scsi/virtio-scsi.c   | 4 
 hw/virtio/virtio-balloon.c  | 2 --
 7 files changed, 7 insertions(+), 16 deletions(-)

-- 
2.38.1




[PATCH v3 2/3] virtio: document ->host_features usage in vdc->get_features() callback

2022-11-21 Thread Stefan Hajnoczi
Suggested-by: Cornelia Huck 
Signed-off-by: Stefan Hajnoczi 
---
 include/hw/virtio/virtio.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index a973811cbf..b6e09c6d4b 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -138,9 +138,16 @@ struct VirtioDeviceClass {
 /* This is what a VirtioDevice must implement */
 DeviceRealize realize;
 DeviceUnrealize unrealize;
+
+/*
+ * Called with vdev->host_features in requested_features. Returns device
+ * feature bits to be stored in vdev->host_features after factoring in
+ * device-specific feature bits.
+ */
 uint64_t (*get_features)(VirtIODevice *vdev,
  uint64_t requested_features,
  Error **errp);
+
 uint64_t (*bad_features)(VirtIODevice *vdev);
 void (*set_features)(VirtIODevice *vdev, uint64_t val);
 int (*validate_features)(VirtIODevice *vdev);
-- 
2.38.1




[PATCH v3 1/3] target/ppc: Fix build warnings when building with 'disable-tcg'

2022-11-21 Thread Stefan Hajnoczi
From: Vaibhav Jain 

Kowshik reported that building qemu with GCC 12.2.1 for 'ppc64-softmmu'
target is failing due to following build warnings:


 ../target/ppc/cpu_init.c:7018:13: error: 'ppc_restore_state_to_opc' defined but not used [-Werror=unused-function]
 7018 | static void ppc_restore_state_to_opc(CPUState *cs,


Fix this by wrapping these function definitions in '#ifdef CONFIG_TCG' so that
they are only defined if qemu is compiled with '--enable-tcg'.

Reported-by: Kowshik Jois B S 
Fixes: 61bd1d2942 ("target/ppc: Convert to tcg_ops restore_state_to_opc")
Fixes: 670f1da374 ("target/ppc: Implement hashst and hashchk")
Fixes: 53ae2aeb94 ("target/ppc: Implement hashstp and hashchkp")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1319
Signed-off-by: Vaibhav Jain 
Reviewed-by: Greg Kurz 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Kowshik Jois B S 
Message-Id: <20221116131743.658708-1-vaib...@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza 
---
 target/ppc/cpu_init.c| 2 ++
 target/ppc/excp_helper.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 32e94153d1..cbf0081374 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -7015,6 +7015,7 @@ static vaddr ppc_cpu_get_pc(CPUState *cs)
 return cpu->env.nip;
 }
 
+#ifdef CONFIG_TCG
 static void ppc_restore_state_to_opc(CPUState *cs,
  const TranslationBlock *tb,
  const uint64_t *data)
@@ -7023,6 +7024,7 @@ static void ppc_restore_state_to_opc(CPUState *cs,
 
 cpu->env.nip = data[0];
 }
+#endif /* CONFIG_TCG */
 
 static bool ppc_cpu_has_work(CPUState *cs)
 {
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index a05a2ed595..94adcb766b 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -2842,6 +2842,7 @@ void helper_td(CPUPPCState *env, target_ulong arg1, target_ulong arg2,
 #endif
 #endif
 
+#ifdef CONFIG_TCG
 static uint32_t helper_SIMON_LIKE_32_64(uint32_t x, uint64_t key, uint32_t lane)
 {
 const uint16_t c = 0xfffc;
@@ -2924,6 +2925,7 @@ HELPER_HASH(HASHST, env->spr[SPR_HASHKEYR], true)
 HELPER_HASH(HASHCHK, env->spr[SPR_HASHKEYR], false)
 HELPER_HASH(HASHSTP, env->spr[SPR_HASHPKEYR], true)
 HELPER_HASH(HASHCHKP, env->spr[SPR_HASHPKEYR], false)
+#endif /* CONFIG_TCG */
 
 #if !defined(CONFIG_USER_ONLY)
 
-- 
2.38.1




[PATCH v3 0/3] virtio: remove unnecessary host_features in ->get_features()

2022-11-21 Thread Stefan Hajnoczi
v3:
- Use an informal comment instead of a full doc comment [Cornelia]
v2:
- Document vdc->get_features() callback [Cornelia]

The vdc->get_features() callbacks are a little inconsistent in how they use
vdev->host_features. This is because the function's behavior changed over time.
Clean things up.

Stefan Hajnoczi (2):
  virtio: document ->host_features usage in vdc->get_features() callback
  virtio: remove unnecessary host_features in ->get_features()

Vaibhav Jain (1):
  target/ppc: Fix build warnings when building with 'disable-tcg'

 include/hw/virtio/virtio.h  | 7 +++
 hw/block/virtio-blk.c   | 3 ---
 hw/char/virtio-serial-bus.c | 1 -
 hw/net/virtio-net.c | 3 ---
 hw/scsi/vhost-scsi-common.c | 3 ---
 hw/scsi/virtio-scsi.c   | 4 
 hw/virtio/virtio-balloon.c  | 2 --
 target/ppc/cpu_init.c   | 2 ++
 target/ppc/excp_helper.c| 2 ++
 9 files changed, 11 insertions(+), 16 deletions(-)

-- 
2.38.1




Re: [PATCH v4 05/11] block-coroutine-wrapper.py: default to main loop aiocontext if function does not have a BlockDriverState parameter

2022-11-21 Thread Kevin Wolf
On 16.11.2022 at 13:22, Emanuele Giuseppe Esposito wrote:
> Basically BdrvPollCo->bs is only used by bdrv_poll_co(), and the
> functions that it uses are both using bdrv_get_aio_context, that
> defaults to qemu_get_aio_context() if bs is NULL.
> 
> Therefore pass NULL to BdrvPollCo to automatically generate a function
> that create and runs a coroutine in the main loop.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 

It happens to work, but it's kind of ugly to call bdrv_coroutine_enter()
and BDRV_POLL_WHILE() with a NULL bs.

How hard would it be to generate code that doesn't use these functions,
but directly aio_co_enter() and AIO_WAIT_WHILE() for wrappers that are
not related to a BDS?

Kevin




Re: [PATCH 0/6] Protect the block layer with a rwlock: part 2

2022-11-21 Thread Emanuele Giuseppe Esposito
Ok, as I expected, simple changes in a previous based-on series provoke a
cascade of changes that inevitably affect these patches too.

While I strongly suggest having an initial look at these patches,
because it gives an idea of what I am trying to accomplish, I would not
go looking at nitpicks and trivial errors that came up from the based-on
series (i.e. "just as in the previous series, fix this").

The order of the series is:
1. Still more coroutine and various fixes in block layer
2. Protect the block layer with a rwlock: part 1
3. Protect the block layer with a rwlock: part 2
4. Protect the block layer with a rwlock: part 3

Thank you,
Emanuele

On 16/11/2022 14:53, Emanuele Giuseppe Esposito wrote:
> Please read "Protect the block layer with a rwlock: part 1" for an additional
> introduction and aim of this series.
> 
> This second part aims to add the graph rdlock to the BlockDriver functions
> that already run in coroutine context and are classified as IO.
> Such functions will recursively traverse the BlockDriverState graph, therefore
> they need to be protected with the rdlock.
> 
> Based-on: <20221116134850.3051419-1-eespo...@redhat.com>
> 
> Thank you,
> Emanuele
> 
> Emanuele Giuseppe Esposito (6):
>   block: assert that bdrv_co_create is always called with graph rdlock
> taken
>   block: assert that BlockDriver->bdrv_co_{amend/create} are called with
> graph rdlock taken
>   block: assert that BlockDriver->bdrv_co_copy_range_{from/to} is always
> called with graph rdlock taken
>   block/dirty-bitmap: assert that BlockDriver->bdrv_co_*_dirty_bitmap
> are always called with graph rdlock taken
>   block/io: assert that BlockDriver->bdrv_co_*_snapshot_* are always
> called with graph rdlock taken
>   block: assert that BlockDriver->bdrv_co_delete_file is always called
> with graph rdlock taken
> 
>  block.c  |  2 ++
>  block/amend.c|  1 +
>  block/block-backend.c|  2 ++
>  block/create.c   |  1 +
>  block/dirty-bitmap.c |  2 ++
>  block/io.c   |  7 +++
>  include/block/block_int-common.h | 14 +-
>  qemu-img.c   |  4 +++-
>  8 files changed, 31 insertions(+), 2 deletions(-)
> 




Re: [PATCH 00/15] Protect the block layer with a rwlock: part 3

2022-11-21 Thread Emanuele Giuseppe Esposito
Ok, as I expected, simple changes in a previous based-on series provoke a
cascade of changes that inevitably affect these patches too.

While I strongly suggest having an initial look at these patches,
because it gives an idea of what I am trying to accomplish, I would not
go looking at nitpicks and trivial errors that came up from the based-on
series (i.e. "just as in the previous series, fix this").

The order of the series is:
1. Still more coroutine and various fixes in block layer
2. Protect the block layer with a rwlock: part 1
3. Protect the block layer with a rwlock: part 2
4. Protect the block layer with a rwlock: part 3

Thank you,
Emanuele

On 16/11/2022 15:07, Emanuele Giuseppe Esposito wrote:
> Please read "Protect the block layer with a rwlock: part 1" and
> "Protect the block layer with a rwlock: part 2" for an
> additional introduction and aim of this series.
> 
> In this serie, we cover the remaining BlockDriver IO callbacks that were not
> running in coroutine, therefore not using the graph rdlock.
> Therefore convert them to coroutines, using either g_c_w or a new
> variant introduced in this serie (see below).
> 
> We need to convert these callbacks into coroutine because non-coroutine code
> is tied to the main thread, even though it will still delegate I/O accesses to
> the iothread (via the bdrv_coroutine_enter call in generated_co_wrappers).
> Making callbacks run in coroutines provides more flexibility, because they run
> entirely in iothreads and can use CoMutexes for mutual exclusion.
> 
> Here we introduce generated_co_wrapper_simple, a simplification of g_c_w that
> only considers the case where the caller is not in a coroutine.
> This simplifies and clarifies a lot when the caller is a coroutine or not, and
> in the future will hopefully replace g_c_w.
> 
> While we are at it, try to directly call the _co_ counterpart of a g_c_w when
> we know already that the function always run in a coroutine.
> 
> Based-on: <20221116135331.3052923-1-eespo...@redhat.com>
> 
> Thank you,
> Emanuele
> 
> Emanuele Giuseppe Esposito (15):
>   block/qed: add missing graph rdlock in qed_need_check_timer_entry
>   block: rename refresh_total_sectors in bdrv_refresh_total_sectors
>   block-backend: use bdrv_getlength instead of blk_getlength
>   block: convert bdrv_refresh_total_sectors in generated_co_wrapper
>   block: use bdrv_co_refresh_total_sectors when possible
>   block: convert bdrv_get_allocated_file_size in
> generated_co_wrapper_simple
>   block: convert bdrv_get_info in generated_co_wrapper
>   block: convert bdrv_is_inserted in generated_co_wrapper_simple
>   block-coroutine-wrapper: support void functions
>   block: convert bdrv_eject in generated_co_wrapper_simple
>   block: convert bdrv_lock_medium in generated_co_wrapper_simple
>   block: convert bdrv_debug_event in generated_co_wrapper
>   block: convert bdrv_io_plug in generated_co_wrapper_simple
>   block: convert bdrv_io_unplug in generated_co_wrapper_simple
>   block: rename newly converted BlockDriver IO coroutine functions
> 
>  block.c| 93 +++---
>  block/blkdebug.c   |  4 +-
>  block/blkio.c  |  6 +-
>  block/blklogwrites.c   |  2 +-
>  block/blkreplay.c  |  2 +-
>  block/blkverify.c  |  2 +-
>  block/block-backend.c  | 43 --
>  block/commit.c |  4 +-
>  block/copy-on-read.c   | 12 ++--
>  block/crypto.c |  6 +-
>  block/curl.c   |  8 +--
>  block/file-posix.c | 48 +++
>  block/file-win32.c |  8 +--
>  block/filter-compress.c| 10 ++--
>  block/gluster.c| 16 ++---
>  block/io.c | 78 +
>  block/iscsi.c  |  8 +--
>  block/meson.build  |  1 +
>  block/mirror.c | 17 --
>  block/nbd.c|  6 +-
>  block/nfs.c|  2 +-
>  block/null.c   |  8 +--
>  block/nvme.c   |  6 +-
>  block/preallocate.c|  2 +-
>  block/qcow.c   |  2 +-
>  block/qcow2-refcount.c |  2 +-
>  block/qcow2.c  |  6 +-
>  block/qed.c|  7 ++-
>  block/quorum.c |  2 +-
>  block/raw-format.c | 14 ++---
>  block/rbd.c|  4 +-
>  block/replication.c|  2 +-
>  block/ssh.c|  2 +-
>  block/stream.c |  4 +-
>  block/throttle.c   |  2 +-
>  block/vdi.c|  2 +-
>  block/vhdx.c   |  2 +-
>  block/vmdk.c   |  4 +-
>  block/vpc.c|  2 +-
>  blockdev.c |  8 

Re: [PATCH 00/20] Protect the block layer with a rwlock: part 1

2022-11-21 Thread Emanuele Giuseppe Esposito
Ok, as I expected, simple changes in a previous based-on series provoke a
cascade of changes that inevitably affect these patches too.

While I strongly suggest having an initial look at these patches,
because it gives an idea of what I am trying to accomplish, I would not
go looking at nitpicks and trivial errors that came up from the based-on
series (i.e. "just as in the previous series, fix this").

The order of the series is:
1. Still more coroutine and various fixes in block layer
2. Protect the block layer with a rwlock: part 1
3. Protect the block layer with a rwlock: part 2
4. Protect the block layer with a rwlock: part 3

Thank you,
Emanuele

On 16/11/2022 14:48, Emanuele Giuseppe Esposito wrote:
> This serie is the first of four series that aim to introduce and use a new
> graph rwlock in the QEMU block layer.
> The aim is to replace the current AioContext lock with much fine-grained 
> locks,
> aimed to protect only specific data.
> Currently the AioContext lock is used pretty much everywhere, and it's not
> even clear what it is protecting exactly.
> 
> The aim of the rwlock is to cover graph modifications: more precisely,
> when a BlockDriverState parent or child list is modified or read, since it can
> be concurrently accessed by the main loop and iothreads.
> 
> The main assumption is that the main loop is the only one allowed to perform
> graph modifications, and so far this has always been held by the current code.
> 
> The rwlock is inspired from cpus-common.c implementation, and aims to
> reduce cacheline bouncing by having per-aiocontext counter of readers.
> All details and implementation of the lock are in patch 1.
> 
> We distinguish between writer (main loop, under BQL) that modifies the
> graph, and readers (all other coroutines running in various AioContext),
> that go through the graph edges, reading ->parents and->children.
> The writer (main loop)  has an "exclusive" access, so it first waits for
> current read to finish, and then prevents incoming ones from
> entering while it has the exclusive access.
> The readers (coroutines in multiple AioContext) are free to
> access the graph as long the writer is not modifying the graph.
> In case it is, they go in a CoQueue and sleep until the writer
> is done.
> 
> In this first serie, my aim is to introduce the lock (patches 1-3,6), cover 
> the
> main graph writer (patch 4), define assertions (patch 5) and start using the
> read lock in the generated_co_wrapper functions (7-20).
> Such functions recursively traverse the BlockDriverState graph, so they must
> take the graph rdlock.
> 
> We distinguish two cases related to the generated_co_wrapper (often shortened
> to g_c_w):
> - qemu_in_coroutine(), which means the function is already running in a
>   coroutine. This means we don't take the lock, because the coroutine must
>   have taken it when it started
> - !qemu_in_coroutine(), which means we need to create a new coroutine that
>   performs the operation requested. In this case we take the rdlock as soon as
>   the coroutine starts, and release only before finishing.
> 
> In this and following series, we try to follow the following locking pattern:
> - bdrv_co_* functions that call BlockDriver callbacks always expect the lock
>   to be taken, therefore they assert.
> - blk_co_* functions usually call blk_wait_while_drained(), therefore they 
> must
>   take the lock internally. Therefore we introduce generated_co_wrapper_blk,
>   which does not take the rdlock when starting the coroutine.
> 
> The long term goal of this series is to eventually replace the AioContext 
> lock,
> so that we can get rid of it once and for all.
> 
> This serie is based on v4 of "Still more coroutine and various fixes in block 
> layer".
> 
> Based-on: <20221116122241.2856527-1-eespo...@redhat.com>
> 
> Thank you,
> Emanuele
> 
> Emanuele Giuseppe Esposito (19):
>   graph-lock: introduce BdrvGraphRWlock structure
>   async: register/unregister aiocontext in graph lock list
>   block.c: wrlock in bdrv_replace_child_noperm
>   block: remove unnecessary assert_bdrv_graph_writable()
>   block: assert that graph read and writes are performed correctly
>   graph-lock: implement WITH_GRAPH_RDLOCK_GUARD and GRAPH_RDLOCK_GUARD
> macros
>   block-coroutine-wrapper.py: take the graph rdlock in bdrv_* functions
>   block-backend: introduce new generated_co_wrapper_blk annotation
>   block-gen: assert that {bdrv/blk}_co_truncate is always called with
> graph rdlock taken
>   block-gen: assert that bdrv_co_{check/invalidate_cache} are always
> called with graph rdlock taken
>   block-gen: assert that bdrv_co_pwrite is always called with graph
> rdlock taken
>   block-gen: assert that bdrv_co_pwrite_{zeros/sync} is always called
> with graph rdlock taken
>   block-gen: assert that bdrv_co_pread is always called with graph
> rdlock taken
>   block-gen: assert that {bdrv/blk}_co_flush is always called with graph
> rdlock taken
>   

Re: [PATCH] cleanup: Tweak and re-run return_directly.cocci

2022-11-21 Thread Peter Maydell
On Mon, 21 Nov 2022 at 14:03, Markus Armbruster  wrote:
>
> Tweak the semantic patch to drop redundant parenthesis around the
> return expression.
>
> Coccinelle drops comments in hw/rdma/vmw/pvrdma_cmd.c; restored
> manually.
>
> Coccinelle messes up vmdk_co_create(), not sure why.  Transformed
> manually.
>
> Line breaks in target/avr/cpu.h and hw/rdma/vmw/pvrdma_cmd.c tidied up
> manually.
>
> Whitespace in fuse_reply_iov() tidied up manually.
>
> checkpatch.pl complains "return of an errno should typically be -ve"
> two times for hw/9pfs/9p-synth.c.  Preexisting, the patch merely makes
> it visible to checkpatch.pl.
>
> checkpatch.pl complains "return is not a function, parentheses are not
> required" three times for target/mips/tcg/dsp_helper.c.  False
> positives.
>
> Signed-off-by: Markus Armbruster 

>  .../user/ase/msa/bit-count/test_msa_nloc_b.c  |   9 +-
>  .../user/ase/msa/bit-count/test_msa_nloc_d.c  |   9 +-
[snip long list of other mips test files]

>  328 files changed, 989 insertions(+), 2099 deletions(-)

This patch seems to almost entirely be huge because of these
mips test case files. Are they specific to QEMU or are they
effectively a 3rd-party import that it doesn't make sense
to make local changes to?

I definitely don't think you should put manual tidying changes
in the same patch as 3000 lines of automated changes: that
is in my opinion un-reviewable.

thanks
-- PMM



Re: [PATCH RFC 2/3] hw/i2c: add mctp core

2022-11-21 Thread Matt Johnston
On Fri, Nov 18, 2022 at 08:01:40AM +0100, Klaus Jensen wrote:
> On Nov 18 13:56, Jeremy Kerr wrote:
> > Hi Klaus,
> > 
> > > Add an abstract MCTP over I2C endpoint model. This implements MCTP
> > > control message handling as well as handling the actual I2C transport
> > > (packetization).
> > > 
> > With those changes, I can get control protocol going, and multi-packet
> > messages work. There's a couple of failures from unsupported commands,
> > but otherwise looks good:
> > 
> >   # mctp addr add 8 dev mctpi2c6
> >   # mctp link set mctpi2c6 up
> >   # mctp link set mctpi2c6 mtu 254
> >   # systemctl restart mctpd
> >   # busctl --no-pager call xyz.openbmc_project.MCTP \
> > /xyz/openbmc_project/mctp au.com.CodeConstruct.MCTP \
> > SetupEndpoint say mctpi2c6 1 0x1d

Hi Klaus,

Thanks for the MCTP model, it's useful here.

I needed the following patch to be able to call SetupEndpoint again when a
device has already been assigned an EID. That tries a Set Endpoint ID/
Get Endpoint ID, addressed to EID 0.

Cheers,
Matt

---
From cb7ad91474367f8e47bdaf03aba9a822f2648f41 Mon Sep 17 00:00:00 2001
From: Matt Johnston 
Date: Mon, 21 Nov 2022 15:10:13 +0800
Subject: [PATCH] i2c/mctp: Allow receiving messages to dest eid 0

The Null Destination ID, 0, is used for MCTP control messages when
addressing by physical ID. That is used for Get Endpoint ID and
Set Endpoint ID when querying/assigning an EID to an endpoint.

Signed-off-by: Matt Johnston 
---
 hw/i2c/mctp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/i2c/mctp.c b/hw/i2c/mctp.c
index 1775deb46f..9d9e519ba9 100644
--- a/hw/i2c/mctp.c
+++ b/hw/i2c/mctp.c
@@ -258,7 +258,8 @@ static int i2c_mctp_event_cb(I2CSlave *i2c, enum i2c_event event)
 goto drop;
 }
 
-if (pkt->mctp.hdr.eid.dest != mctp->my_eid) {
+if (!(pkt->mctp.hdr.eid.dest == mctp->my_eid
+|| pkt->mctp.hdr.eid.dest == 0)) {
 trace_i2c_mctp_drop_invalid_eid(pkt->mctp.hdr.eid.dest,
 mctp->my_eid);
 goto drop;
-- 
2.37.2




[PATCH 03/19] migration: Export ram_transferred_ram()

2022-11-21 Thread Juan Quintela
Signed-off-by: Juan Quintela 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: David Edmondson 
Reviewed-by: Leonardo Bras 
---
 migration/ram.h | 2 ++
 migration/ram.c | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/ram.h b/migration/ram.h
index c7af65ac74..e844966f69 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -65,6 +65,8 @@ int ram_load_postcopy(QEMUFile *f, int channel);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 
+void ram_transferred_add(uint64_t bytes);
+
 int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
 bool ramblock_recv_bitmap_test_byte_offset(RAMBlock *rb, uint64_t byte_offset);
 void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
diff --git a/migration/ram.c b/migration/ram.c
index 1338e47665..2cbe707bfc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -422,7 +422,7 @@ uint64_t ram_bytes_remaining(void)
 
 MigrationStats ram_counters;
 
-static void ram_transferred_add(uint64_t bytes)
+void ram_transferred_add(uint64_t bytes)
 {
 if (runstate_is_running()) {
 ram_counters.precopy_bytes += bytes;
-- 
2.38.1




[PATCH 00/19] Migration patches for 8.8

2022-11-21 Thread Juan Quintela
Hi

Based-on:  <20221121125907.62469-1-quint...@redhat.com>

These are the patches that I had to drop from the last PULL request because they
weren't fixes:
- AVX2 is dropped, Intel posted a fix, I have to redo it
- Fix for out of order channels is out
  Daniel nacked it and I need to redo it

Juan Quintela (4):
  multifd: Create page_size fields into both MultiFD{Recv,Send}Params
  multifd: Create page_count fields into both MultiFD{Recv,Send}Params
  migration: Export ram_transferred_ram()
  migration: Export ram_release_page()

Peter Xu (15):
  migration: Take bitmap mutex when completing ram migration
  migration: Add postcopy_preempt_active()
  migration: Cleanup xbzrle zero page cache update logic
  migration: Trivial cleanup save_page_header() on same block check
  migration: Remove RAMState.f references in compression code
  migration: Yield bitmap_mutex properly when sending/sleeping
  migration: Use atomic ops properly for page accountings
  migration: Teach PSS about host page
  migration: Introduce pss_channel
  migration: Add pss_init()
  migration: Make PageSearchStatus part of RAMState
  migration: Move last_sent_block into PageSearchStatus
  migration: Send requested page directly in rp-return thread
  migration: Remove old preempt code around state maintainance
  migration: Drop rs->f

 migration/migration.h|   7 -
 migration/multifd.h  |   8 +
 migration/ram.h  |  23 ++
 migration/migration.c|  47 +--
 migration/multifd-zlib.c |  14 +-
 migration/multifd-zstd.c |  12 +-
 migration/multifd.c  |  27 +-
 migration/ram.c  | 735 ++-
 8 files changed, 422 insertions(+), 451 deletions(-)

-- 
2.38.1




[PATCH 15/19] migration: Make PageSearchStatus part of RAMState

2022-11-21 Thread Juan Quintela
From: Peter Xu 

We used to allocate the PSS structure on the stack for precopy when
sending pages.  Make it static, so that it describes per-channel ram
migration status.

Here we declare RAM_CHANNEL_MAX instances, preparing for postcopy to use
them, even though this patch does not yet start using the 2nd instance.

This should not have any functional change per se, but it already starts to
export PSS information via the RAMState, so that e.g. one PSS channel can
start to reference the other PSS channel.

Always protect PSS access using the same RAMState.bitmap_mutex.  We
already do so, so no code change is needed, just some comment updates.
Maybe we should consider renaming bitmap_mutex some day, as it's going to
be a more commonly used and bigger mutex for ram states, but leave that
for later.
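As a rough illustration of the locking rule described above, here is a
minimal, self-contained sketch (not QEMU code; RAM_CHANNEL_MAX and the
field layout are simplified stand-ins) of per-channel PSS instances
embedded in the shared state and guarded by a single bitmap_mutex:

```c
#include <assert.h>
#include <pthread.h>

#define RAM_CHANNEL_MAX 2   /* assumption: precopy + postcopy channels */

/* simplified stand-in for QEMU's PageSearchStatus */
typedef struct {
    unsigned long page;
    int complete_round;
} PageSearchStatus;

/* simplified stand-in for RAMState: the PSS array now lives inside the
 * shared state, one instance per channel, all guarded by bitmap_mutex */
typedef struct {
    pthread_mutex_t bitmap_mutex;
    PageSearchStatus pss[RAM_CHANNEL_MAX];
} RAMState;

/* one channel may reference another channel's PSS, but only under the lock */
static unsigned long peek_channel_page(RAMState *rs, int channel)
{
    pthread_mutex_lock(&rs->bitmap_mutex);
    unsigned long page = rs->pss[channel].page;
    pthread_mutex_unlock(&rs->bitmap_mutex);
    return page;
}
```

The point of the sketch is only the ownership change: the PSS is reachable
from the shared state, so any cross-channel reference must happen under
the same mutex that protects the dirty bitmap.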

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 112 ++--
 1 file changed, 61 insertions(+), 51 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index d81bf7b183..3194997738 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -85,6 +85,46 @@
 
 XBZRLECacheStats xbzrle_counters;
 
+/* used by the search for pages to send */
+struct PageSearchStatus {
+/* The migration channel used for a specific host page */
+QEMUFile*pss_channel;
+/* Current block being searched */
+RAMBlock*block;
+/* Current page to search from */
+unsigned long page;
+/* Set once we wrap around */
+bool complete_round;
+/*
+ * [POSTCOPY-ONLY] Whether current page is explicitly requested by
+ * postcopy.  When set, the request is "urgent" because the dest QEMU
+ * threads are waiting for us.
+ */
+bool postcopy_requested;
+/*
+ * [POSTCOPY-ONLY] The target channel to use to send current page.
+ *
+ * Note: This may _not_ match with the value in postcopy_requested
+ * above. Let's imagine the case where the postcopy request is exactly
+ * the page that we're sending in progress during precopy. In this case
+ * we'll have postcopy_requested set to true but the target channel
+ * will be the precopy channel (so that we don't split brain on that
+ * specific page since the precopy channel already contains partial of
+ * that page data).
+ *
+ * Besides that specific use case, postcopy_target_channel should
+ * always be equal to postcopy_requested, because by default we send
+ * postcopy pages via postcopy preempt channel.
+ */
+bool postcopy_target_channel;
+/* Whether we're sending a host page */
+bool  host_page_sending;
+/* The start/end of current host page.  Invalid if 
host_page_sending==false */
+unsigned long host_page_start;
+unsigned long host_page_end;
+};
+typedef struct PageSearchStatus PageSearchStatus;
+
 /* struct contains XBZRLE cache and a static page
used by the compression */
 static struct {
@@ -319,6 +359,11 @@ typedef struct {
 struct RAMState {
 /* QEMUFile used for this migration */
 QEMUFile *f;
+/*
+ * PageSearchStatus structures for the channels when send pages.
+ * Protected by the bitmap_mutex.
+ */
+PageSearchStatus pss[RAM_CHANNEL_MAX];
 /* UFFD file descriptor, used in 'write-tracking' migration */
 int uffdio_fd;
 /* Last block that we have visited searching for dirty pages */
@@ -362,7 +407,12 @@ struct RAMState {
 uint64_t target_page_count;
 /* number of dirty bits in the bitmap */
 uint64_t migration_dirty_pages;
-/* Protects modification of the bitmap and migration dirty pages */
+/*
+ * Protects:
+ * - dirty/clear bitmap
+ * - migration_dirty_pages
+ * - pss structures
+ */
 QemuMutex bitmap_mutex;
 /* The RAMBlock used in the last src_page_requests */
 RAMBlock *last_req_rb;
@@ -451,46 +501,6 @@ void dirty_sync_missed_zero_copy(void)
 ram_counters.dirty_sync_missed_zero_copy++;
 }
 
-/* used by the search for pages to send */
-struct PageSearchStatus {
-/* The migration channel used for a specific host page */
-QEMUFile*pss_channel;
-/* Current block being searched */
-RAMBlock*block;
-/* Current page to search from */
-unsigned long page;
-/* Set once we wrap around */
-bool complete_round;
-/*
- * [POSTCOPY-ONLY] Whether current page is explicitly requested by
- * postcopy.  When set, the request is "urgent" because the dest QEMU
- * threads are waiting for us.
- */
-bool postcopy_requested;
-/*
- * [POSTCOPY-ONLY] The target channel to use to send current page.
- *
- * Note: This may _not_ match with the value in postcopy_requested
- * above. Let's imagine the case where the postcopy request is exactly
- * the page that we're sending in progress during precopy. In this case
- * 

[PATCH 07/19] migration: Cleanup xbzrle zero page cache update logic

2022-11-21 Thread Juan Quintela
From: Peter Xu 

The major change is to replace "!save_page_use_compression()" with
"xbzrle_enabled" to make it clear.

Reasonings:

(1) When compression is enabled, "!save_page_use_compression()" is exactly
the same as checking "xbzrle_enabled".

(2) When compression is disabled, "!save_page_use_compression()" always
returns true.  We used to try calling the xbzrle code, but after this
change we won't, and we shouldn't need to.

While at it, drop the xbzrle_enabled check in xbzrle_cache_zero_page(),
because with this change it's no longer needed.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 00a2e30322..7124ff531c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -741,10 +741,6 @@ void mig_throttle_counter_reset(void)
  */
 static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr)
 {
-if (!rs->xbzrle_enabled) {
-return;
-}
-
 /* We don't care if this fails to allocate a new cache page
  * as long as it updated an old one */
 cache_insert(XBZRLE.cache, current_addr, XBZRLE.zero_target_page,
@@ -2301,7 +2297,7 @@ static int ram_save_target_page(RAMState *rs, 
PageSearchStatus *pss)
 /* Must let xbzrle know, otherwise a previous (now 0'd) cached
  * page would be stale
  */
-if (!save_page_use_compression(rs)) {
+if (rs->xbzrle_enabled) {
 XBZRLE_cache_lock();
 xbzrle_cache_zero_page(rs, block->offset + offset);
 XBZRLE_cache_unlock();
-- 
2.38.1




[PATCH 18/19] migration: Remove old preempt code around state maintenance

2022-11-21 Thread Juan Quintela
From: Peter Xu 

With the new code to send pages in the rp-return thread, there's little
point in keeping lots of the old code for maintaining the preempt state
in the migration thread, because the new way should always be faster.

Since we'll always send pages in the rp-return thread anyway, we don't
need that logic to maintain preempt state anymore, because now we
serialize things using the mutex directly instead of using those fields.

It's unfortunate to have had that code for only a short period, but it
was an intermediate step through which we noticed the next bottleneck in
the migration thread.  The best we can do now is to drop the unnecessary
code, as the new code is stable, to reduce the maintenance burden.  It's
actually a good thing, because the new "sending page in rp-return thread"
model is (IMHO) both cleaner and better performing.

Remove the old code that was responsible for maintaining preempt states;
in the meantime also remove the x-postcopy-preempt-break-huge parameter,
because with concurrent sender threads we don't really need to break huge
pages anymore.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/migration.h |   7 -
 migration/migration.c |   2 -
 migration/ram.c   | 291 +-
 3 files changed, 3 insertions(+), 297 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index cdad8aceaa..ae4ffd3454 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -340,13 +340,6 @@ struct MigrationState {
 bool send_configuration;
 /* Whether we send section footer during migration */
 bool send_section_footer;
-/*
- * Whether we allow break sending huge pages when postcopy preempt is
- * enabled.  When disabled, we won't interrupt precopy within sending a
- * host huge page, which is the old behavior of vanilla postcopy.
- * NOTE: this parameter is ignored if postcopy preempt is not enabled.
- */
-bool postcopy_preempt_break_huge;
 
 /* Needed by postcopy-pause state */
 QemuSemaphore postcopy_pause_sem;
diff --git a/migration/migration.c b/migration/migration.c
index c1d4d76d0c..c3490c495d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -4402,8 +4402,6 @@ static Property migration_properties[] = {
 DEFINE_PROP_SIZE("announce-step", MigrationState,
   parameters.announce_step,
   DEFAULT_MIGRATE_ANNOUNCE_STEP),
-DEFINE_PROP_BOOL("x-postcopy-preempt-break-huge", MigrationState,
-  postcopy_preempt_break_huge, true),
 DEFINE_PROP_STRING("tls-creds", MigrationState, parameters.tls_creds),
 DEFINE_PROP_STRING("tls-hostname", MigrationState, 
parameters.tls_hostname),
 DEFINE_PROP_STRING("tls-authz", MigrationState, parameters.tls_authz),
diff --git a/migration/ram.c b/migration/ram.c
index 16ade7cb70..1ae093fb61 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -97,28 +97,6 @@ struct PageSearchStatus {
 unsigned long page;
 /* Set once we wrap around */
 bool complete_round;
-/*
- * [POSTCOPY-ONLY] Whether current page is explicitly requested by
- * postcopy.  When set, the request is "urgent" because the dest QEMU
- * threads are waiting for us.
- */
-bool postcopy_requested;
-/*
- * [POSTCOPY-ONLY] The target channel to use to send current page.
- *
- * Note: This may _not_ match with the value in postcopy_requested
- * above. Let's imagine the case where the postcopy request is exactly
- * the page that we're sending in progress during precopy. In this case
- * we'll have postcopy_requested set to true but the target channel
- * will be the precopy channel (so that we don't split brain on that
- * specific page since the precopy channel already contains partial of
- * that page data).
- *
- * Besides that specific use case, postcopy_target_channel should
- * always be equal to postcopy_requested, because by default we send
- * postcopy pages via postcopy preempt channel.
- */
-bool postcopy_target_channel;
 /* Whether we're sending a host page */
 bool  host_page_sending;
 /* The start/end of current host page.  Invalid if 
host_page_sending==false */
@@ -343,20 +321,6 @@ struct RAMSrcPageRequest {
 QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
 };
 
-typedef struct {
-/*
- * Cached ramblock/offset values if preempted.  They're only meaningful if
- * preempted==true below.
- */
-RAMBlock *ram_block;
-unsigned long ram_page;
-/*
- * Whether a postcopy preemption just happened.  Will be reset after
- * precopy recovered to background migration.
- */
-bool preempted;
-} PostcopyPreemptState;
-
 /* State of RAM for migration */
 struct RAMState {
 /* QEMUFile used for this migration */
@@ -419,14 +383,6 @@ struct RAMState {
 

[PATCH 19/19] migration: Drop rs->f

2022-11-21 Thread Juan Quintela
From: Peter Xu 

Now with rs->pss we can already cache channels in pss->pss_channel.  That
pss_channel contains more information than rs->f because it's per-channel.
So rs->f can be replaced by rs->pss[RAM_CHANNEL_PRECOPY].pss_channel,
while rs->f itself is a bit vague now.

Note that vanilla postcopy still sends pages via pss[RAM_CHANNEL_PRECOPY];
that's slightly confusing, but it reflects reality.

After the replacement we can safely drop rs->f.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 1ae093fb61..334309f1c6 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -323,8 +323,6 @@ struct RAMSrcPageRequest {
 
 /* State of RAM for migration */
 struct RAMState {
-/* QEMUFile used for this migration */
-QEMUFile *f;
 /*
  * PageSearchStatus structures for the channels when send pages.
  * Protected by the bitmap_mutex.
@@ -2532,8 +2530,6 @@ static int ram_find_and_save_block(RAMState *rs)
 }
 
 if (found) {
-/* Cache rs->f in pss_channel (TODO: remove rs->f) */
-pss->pss_channel = rs->f;
 pages = ram_save_host_page(rs, pss);
 }
 } while (!pages && again);
@@ -3089,7 +3085,7 @@ static void ram_state_resume_prepare(RAMState *rs, 
QEMUFile *out)
 ram_state_reset(rs);
 
 /* Update RAMState cache of output QEMUFile */
-rs->f = out;
+rs->pss[RAM_CHANNEL_PRECOPY].pss_channel = out;
 
 trace_ram_state_resume_prepare(pages);
 }
@@ -3180,7 +3176,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 return -1;
 }
 }
-(*rsp)->f = f;
+(*rsp)->pss[RAM_CHANNEL_PRECOPY].pss_channel = f;
 
 WITH_RCU_READ_LOCK_GUARD() {
 qemu_put_be64(f, ram_bytes_total_common(true) | 
RAM_SAVE_FLAG_MEM_SIZE);
@@ -3315,7 +3311,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 out:
 if (ret >= 0
 && migration_is_setup_or_active(migrate_get_current()->state)) {
-ret = multifd_send_sync_main(rs->f);
+ret = multifd_send_sync_main(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel);
 if (ret < 0) {
 return ret;
 }
@@ -3385,7 +3381,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 return ret;
 }
 
-ret = multifd_send_sync_main(rs->f);
+ret = multifd_send_sync_main(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel);
 if (ret < 0) {
 return ret;
 }
-- 
2.38.1




[PATCH 10/19] migration: Yield bitmap_mutex properly when sending/sleeping

2022-11-21 Thread Juan Quintela
From: Peter Xu 

Don't take the bitmap mutex when sending pages, or when being throttled
by migration_rate_limit() (which is a bit tricky to call here in the ram
code, but still seems helpful).

This prepares for the possibility of concurrently sending pages in more
than one thread using ram_save_host_page(): all threads may need the
bitmap_mutex to operate on the bitmaps, so a sendmsg() or any kind of
qemu_sem_wait() blocking one thread will not block the others from
progressing.
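The yield pattern this commit introduces can be sketched in a
self-contained form (plain pthreads and hypothetical names for
illustration; QEMU uses QemuMutex and the slow part is
ram_save_target_page() plus migration_rate_limit()):

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t bitmap_mutex = PTHREAD_MUTEX_INITIALIZER;
static int pages_sent;

/* stand-in for the expensive part (sendmsg()/rate limiting in QEMU) */
static void slow_send_page(void)
{
    pages_sent++;
}

/* Caller holds bitmap_mutex on entry.  Drop it across the slow send so
 * the other sender thread can scan/clear the bitmap meanwhile, then
 * re-take it before returning, as the commit message describes.  The
 * lock is only yielded in preempt mode, when two senders exist. */
static void send_page_yielding(int preempt_active)
{
    if (preempt_active) {
        pthread_mutex_unlock(&bitmap_mutex);
    }
    slow_send_page();          /* may block; lock is not held here */
    if (preempt_active) {
        pthread_mutex_lock(&bitmap_mutex);
    }
}
```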

Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 46 +++---
 1 file changed, 35 insertions(+), 11 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 6e3dc845c5..5379164749 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2452,9 +2452,14 @@ static void postcopy_preempt_reset_channel(RAMState *rs)
  * a host page in which case the remainder of the hostpage is sent.
  * Only dirty target pages are sent. Note that the host page size may
  * be a huge page for this block.
+ *
  * The saving stops at the boundary of the used_length of the block
  * if the RAMBlock isn't a multiple of the host page size.
  *
+ * The caller must be with ram_state.bitmap_mutex held to call this
+ * function.  Note that this function can temporarily release the lock, but
+ * when the function is returned it'll make sure the lock is still held.
+ *
  * Returns the number of pages written or negative on error
  *
  * @rs: current RAM state
@@ -2462,6 +2467,7 @@ static void postcopy_preempt_reset_channel(RAMState *rs)
  */
 static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
 {
+bool page_dirty, preempt_active = postcopy_preempt_active();
 int tmppages, pages = 0;
 size_t pagesize_bits =
 qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
@@ -2485,22 +2491,40 @@ static int ram_save_host_page(RAMState *rs, 
PageSearchStatus *pss)
 break;
 }
 
+page_dirty = migration_bitmap_clear_dirty(rs, pss->block, pss->page);
+
 /* Check the pages is dirty and if it is send it */
-if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
+if (page_dirty) {
+/*
+ * Properly yield the lock only in postcopy preempt mode
+ * because both migration thread and rp-return thread can
+ * operate on the bitmaps.
+ */
+if (preempt_active) {
+qemu_mutex_unlock(&rs->bitmap_mutex);
+}
 tmppages = ram_save_target_page(rs, pss);
-if (tmppages < 0) {
-return tmppages;
+if (tmppages >= 0) {
+pages += tmppages;
+/*
+ * Allow rate limiting to happen in the middle of huge pages if
+ * something is sent in the current iteration.
+ */
+if (pagesize_bits > 1 && tmppages > 0) {
+migration_rate_limit();
+}
 }
-
-pages += tmppages;
-/*
- * Allow rate limiting to happen in the middle of huge pages if
- * something is sent in the current iteration.
- */
-if (pagesize_bits > 1 && tmppages > 0) {
-migration_rate_limit();
+if (preempt_active) {
+qemu_mutex_lock(&rs->bitmap_mutex);
 }
+} else {
+tmppages = 0;
+}
+
+if (tmppages < 0) {
+return tmppages;
 }
+
 pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
 } while ((pss->page < hostpage_boundary) &&
  offset_in_ramblock(pss->block,
-- 
2.38.1




[PATCH 17/19] migration: Send requested page directly in rp-return thread

2022-11-21 Thread Juan Quintela
From: Peter Xu 

With all the facilities ready, send the requested page directly in the
rp-return thread rather than queuing it in the request queue, if and only
if postcopy preempt is enabled.  It can do so because it uses a separate
channel for sending urgent pages.  The only shared data is the bitmap,
and it's protected by the bitmap_mutex.

Note that since we're moving ownership of the urgent channel from the
migration thread to the rp thread, the rp thread is now responsible for
managing the qemufile, e.g. properly closing it when migration pauses.
For this, let migration_release_from_dst_file() cover shutdown of the
urgent channel too, renaming it to migration_release_dst_files() to
better show what it does.
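The decision this commit makes in the rp-return thread can be condensed
into a hedged, self-contained sketch (hypothetical function names that
only mirror the shape of ram_save_queue_pages(); not QEMU's actual code):

```c
#include <assert.h>
#include <stdbool.h>

static int queued, sent_directly;

/* stand-ins for the two paths: the old queue and the new direct send */
static void queue_page_request(void)
{
    queued++;
}

static int send_host_page_urgent(void)
{
    sent_directly++;
    return 0;               /* 0 on success, as QEMU-style return codes */
}

/* With postcopy preempt active, the rp-return thread sends the page
 * itself over the urgent channel; otherwise it queues the request for
 * the migration thread as before. */
static int handle_page_request(bool preempt_active)
{
    if (preempt_active) {
        return send_host_page_urgent();
    }
    queue_page_request();
    return 0;
}
```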

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/migration.c |  35 +++--
 migration/ram.c   | 112 ++
 2 files changed, 131 insertions(+), 16 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index de83c50f51..c1d4d76d0c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2848,8 +2848,11 @@ static int migrate_handle_rp_resume_ack(MigrationState 
*s, uint32_t value)
 return 0;
 }
 
-/* Release ms->rp_state.from_dst_file in a safe way */
-static void migration_release_from_dst_file(MigrationState *ms)
+/*
+ * Release ms->rp_state.from_dst_file (and postcopy_qemufile_src if
+ * existed) in a safe way.
+ */
+static void migration_release_dst_files(MigrationState *ms)
 {
 QEMUFile *file;
 
@@ -2862,6 +2865,18 @@ static void 
migration_release_from_dst_file(MigrationState *ms)
 ms->rp_state.from_dst_file = NULL;
 }
 
+/*
+ * Do the same to postcopy fast path socket too if there is.  No
+ * locking needed because this qemufile should only be managed by
+ * return path thread.
+ */
+if (ms->postcopy_qemufile_src) {
+migration_ioc_unregister_yank_from_file(ms->postcopy_qemufile_src);
+qemu_file_shutdown(ms->postcopy_qemufile_src);
+qemu_fclose(ms->postcopy_qemufile_src);
+ms->postcopy_qemufile_src = NULL;
+}
+
 qemu_fclose(file);
 }
 
@@ -3006,7 +3021,7 @@ out:
  * Maybe there is something we can do: it looks like a
  * network down issue, and we pause for a recovery.
  */
-migration_release_from_dst_file(ms);
+migration_release_dst_files(ms);
 rp = NULL;
 if (postcopy_pause_return_path_thread(ms)) {
 /*
@@ -3024,7 +3039,7 @@ out:
 }
 
 trace_source_return_path_thread_end();
-migration_release_from_dst_file(ms);
+migration_release_dst_files(ms);
 rcu_unregister_thread();
 return NULL;
 }
@@ -3547,18 +3562,6 @@ static MigThrError postcopy_pause(MigrationState *s)
 qemu_file_shutdown(file);
 qemu_fclose(file);
 
-/*
- * Do the same to postcopy fast path socket too if there is.  No
- * locking needed because no racer as long as we do this before setting
- * status to paused.
- */
-if (s->postcopy_qemufile_src) {
-migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
-qemu_file_shutdown(s->postcopy_qemufile_src);
-qemu_fclose(s->postcopy_qemufile_src);
-s->postcopy_qemufile_src = NULL;
-}
-
migrate_set_state(&s->state, s->state,
   MIGRATION_STATUS_POSTCOPY_PAUSED);
 
diff --git a/migration/ram.c b/migration/ram.c
index 1233ff53ac..16ade7cb70 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -546,6 +546,8 @@ static QemuThread *decompress_threads;
 static QemuMutex decomp_done_lock;
 static QemuCond decomp_done_cond;
 
+static int ram_save_host_page_urgent(PageSearchStatus *pss);
+
 static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock 
*block,
  ram_addr_t offset, uint8_t *source_buf);
 
@@ -560,6 +562,16 @@ static void pss_init(PageSearchStatus *pss, RAMBlock *rb, 
ram_addr_t page)
 pss->complete_round = false;
 }
 
+/*
+ * Check whether two PSSs are actively sending the same page.  Return true
+ * if it is, false otherwise.
+ */
+static bool pss_overlap(PageSearchStatus *pss1, PageSearchStatus *pss2)
+{
+return pss1->host_page_sending && pss2->host_page_sending &&
+(pss1->host_page_start == pss2->host_page_start);
+}
+
 static void *do_data_compress(void *opaque)
 {
 CompressParam *param = opaque;
@@ -2260,6 +2272,57 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t 
start, ram_addr_t len)
 return -1;
 }
 
+/*
+ * When with postcopy preempt, we send back the page directly in the
+ * rp-return thread.
+ */
+if (postcopy_preempt_active()) {
+ram_addr_t page_start = start >> TARGET_PAGE_BITS;
+size_t page_size = 

[PATCH 12/19] migration: Teach PSS about host page

2022-11-21 Thread Juan Quintela
From: Peter Xu 

Migration code has a lot to do with host pages.  Teaching the PSS core
about the idea of a host page helps a lot and makes the code cleaner.
Meanwhile, this prepares for future changes that can leverage the new PSS
helpers that this patch introduces to send a host page in another thread.

Three more fields are introduced for this:

  (1) host_page_sending: this is set to true when QEMU is sending a host
  page, false otherwise.

  (2) host_page_{start|end}: these point to the start/end of host page
  we're sending, and it's only valid when host_page_sending==true.

For example, when we look up the next dirty page in the ramblock with
host_page_sending==true, we won't look for anything beyond the current
host page boundary.  This can be slightly more efficient than the current
code, which sets pss->page to the next dirty bit (possibly past the
current host page boundary) and resets it to the host page boundary if it
finds it has gone beyond that.

With the above, we can easily make migration_bitmap_find_dirty()
self-contained by updating pss->page properly.  The rs parameter is
removed because it's not even used in the old code.

When sending a host page, we should use the pss helpers like this:

  - pss_host_page_prepare(pss): called before sending host page
  - pss_within_range(pss): are we still working on the current host page?
  - pss_host_page_finish(pss): called after sending a host page

Then we can use ram_save_target_page() to save one small page.

Currently ram_save_host_page() is still the only user.  If another
function to send a host page (e.g. in the return path thread) is added in
the future, it should follow the same style.
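A minimal stand-alone sketch of how a caller might use the three helpers
(simplified types and a hypothetical 4-guest-pages-per-host-page ratio
for illustration; QEMU derives the ratio from qemu_ram_pagesize()):

```c
#include <assert.h>
#include <stdbool.h>

/* minimal stand-in for QEMU's PSS host-page tracking fields */
typedef struct {
    unsigned long page;            /* current target page index */
    bool host_page_sending;
    unsigned long host_page_start; /* valid while host_page_sending */
    unsigned long host_page_end;
} PageSearchStatus;

/* assumption: 4 target pages per host page, power of two */
#define GUEST_PAGES_PER_HOST_PAGE 4UL

/* called before sending a host page: record its boundaries */
static void pss_host_page_prepare(PageSearchStatus *pss)
{
    pss->host_page_sending = true;
    pss->host_page_start = pss->page & ~(GUEST_PAGES_PER_HOST_PAGE - 1);
    pss->host_page_end = pss->host_page_start + GUEST_PAGES_PER_HOST_PAGE;
}

/* are we still within the host page currently being sent? */
static bool pss_within_range(PageSearchStatus *pss)
{
    return pss->page >= pss->host_page_start &&
           pss->page < pss->host_page_end;
}

/* called after the host page has been fully sent */
static void pss_host_page_finish(PageSearchStatus *pss)
{
    pss->host_page_sending = false;
    pss->host_page_start = pss->host_page_end = 0;
}
```

Between prepare and finish, the caller would send each dirty target page
(ram_save_target_page() in QEMU) while pss_within_range() holds.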

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 95 +++--
 1 file changed, 76 insertions(+), 19 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index f4cd9038f4..4d7b50ef79 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -481,6 +481,11 @@ struct PageSearchStatus {
  * postcopy pages via postcopy preempt channel.
  */
 bool postcopy_target_channel;
+/* Whether we're sending a host page */
+bool  host_page_sending;
+/* The start/end of current host page.  Only valid if 
host_page_sending==true */
+unsigned long host_page_start;
+unsigned long host_page_end;
 };
 typedef struct PageSearchStatus PageSearchStatus;
 
@@ -858,26 +863,38 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
**current_data,
 }
 
 /**
- * migration_bitmap_find_dirty: find the next dirty page from start
+ * pss_find_next_dirty: find the next dirty page of current ramblock
  *
- * Returns the page offset within memory region of the start of a dirty page
+ * This function updates pss->page to point to the next dirty page index
+ * within the ramblock to migrate, or the end of ramblock when nothing
+ * found.  Note that when pss->host_page_sending==true it means we're
+ * during sending a host page, so we won't look for dirty page that is
+ * outside the host page boundary.
  *
- * @rs: current RAM state
- * @rb: RAMBlock where to search for dirty pages
- * @start: page where we start the search
+ * @pss: the current page search status
  */
-static inline
-unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
-  unsigned long start)
+static void pss_find_next_dirty(PageSearchStatus *pss)
 {
+RAMBlock *rb = pss->block;
 unsigned long size = rb->used_length >> TARGET_PAGE_BITS;
 unsigned long *bitmap = rb->bmap;
 
 if (ramblock_is_ignored(rb)) {
-return size;
+/* Points directly to the end, so we know no dirty page */
+pss->page = size;
+return;
 }
 
-return find_next_bit(bitmap, size, start);
+/*
+ * If during sending a host page, only look for dirty pages within the
+ * current host page being send.
+ */
+if (pss->host_page_sending) {
+assert(pss->host_page_end);
+size = MIN(size, pss->host_page_end);
+}
+
+pss->page = find_next_bit(bitmap, size, pss->page);
 }
 
 static void migration_clear_memory_region_dirty_bitmap(RAMBlock *rb,
@@ -1563,7 +1580,9 @@ static bool find_dirty_block(RAMState *rs, 
PageSearchStatus *pss, bool *again)
 pss->postcopy_requested = false;
 pss->postcopy_target_channel = RAM_CHANNEL_PRECOPY;
 
-pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
+/* Update pss->page for the next dirty bit in ramblock */
+pss_find_next_dirty(pss);
+
 if (pss->complete_round && pss->block == rs->last_seen_block &&
 pss->page >= rs->last_page) {
 /*
@@ -2452,6 +2471,44 @@ static void postcopy_preempt_reset_channel(RAMState *rs)
 }
 }
 
+/* Should be called before sending a host page */
+static void pss_host_page_prepare(PageSearchStatus *pss)
+{
+/* How many guest 

[PATCH 08/19] migration: Trivial cleanup save_page_header() on same block check

2022-11-21 Thread Juan Quintela
From: Peter Xu 

The 2nd check on RAM_SAVE_FLAG_CONTINUE is a bit redundant.  Use a boolean
to be clearer.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7124ff531c..41475431fc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -661,14 +661,15 @@ static size_t save_page_header(RAMState *rs, QEMUFile *f, 
 RAMBlock *block,
ram_addr_t offset)
 {
 size_t size, len;
+bool same_block = (block == rs->last_sent_block);
 
-if (block == rs->last_sent_block) {
+if (same_block) {
 offset |= RAM_SAVE_FLAG_CONTINUE;
 }
 qemu_put_be64(f, offset);
 size = 8;
 
-if (!(offset & RAM_SAVE_FLAG_CONTINUE)) {
+if (!same_block) {
 len = strlen(block->idstr);
 qemu_put_byte(f, len);
 qemu_put_buffer(f, (uint8_t *)block->idstr, len);
-- 
2.38.1




[PATCH 02/19] multifd: Create page_count fields into both MultiFD{Recv,Send}Params

2022-11-21 Thread Juan Quintela
We were recalculating it left and right.  We plan to change those values
in the next patches.
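A minimal sketch of the idea (hypothetical struct subset; the 512 KiB
packet size is an assumption for illustration only): compute the
per-packet page count once at setup and cache it in the params, instead
of recomputing MULTIFD_PACKET_SIZE / page_size at every packet:

```c
#include <assert.h>
#include <stdint.h>

/* assumed packet size for illustration */
#define MULTIFD_PACKET_SIZE (512 * 1024)

/* hypothetical subset of MultiFD{Recv,Send}Params */
typedef struct {
    uint32_t page_size;   /* guest page size */
    uint32_t page_count;  /* cached: number of pages in a full packet */
} MultiFDParams;

/* done once per channel at setup, as in multifd_save_setup()/load_setup() */
static void multifd_params_init(MultiFDParams *p, uint32_t page_size)
{
    p->page_size = page_size;
    p->page_count = MULTIFD_PACKET_SIZE / page_size;
}
```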

Signed-off-by: Juan Quintela 
Reviewed-by: Leonardo Bras 
---
 migration/multifd.h | 4 
 migration/multifd.c | 7 ---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 86fb9982b3..e2802a9ce2 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -82,6 +82,8 @@ typedef struct {
 uint32_t packet_len;
 /* guest page size */
 uint32_t page_size;
+/* number of pages in a full packet */
+uint32_t page_count;
 /* multifd flags for sending ram */
 int write_flags;
 
@@ -147,6 +149,8 @@ typedef struct {
 uint32_t packet_len;
 /* guest page size */
 uint32_t page_size;
+/* number of pages in a full packet */
+uint32_t page_count;
 
 /* syncs main thread and channels */
 QemuSemaphore sem_sync;
diff --git a/migration/multifd.c b/migration/multifd.c
index efffa77a76..b8dc559d24 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -279,7 +279,6 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
 {
 MultiFDPacket_t *packet = p->packet;
-uint32_t page_count = MULTIFD_PACKET_SIZE / p->page_size;
 RAMBlock *block;
 int i;
 
@@ -306,10 +305,10 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams 
*p, Error **errp)
  * If we received a packet that is 100 times bigger than expected
  * just stop migration.  It is a magic number.
  */
-if (packet->pages_alloc > page_count) {
+if (packet->pages_alloc > p->page_count) {
 error_setg(errp, "multifd: received packet "
"with size %u and expected a size of %u",
-   packet->pages_alloc, page_count) ;
+   packet->pages_alloc, p->page_count) ;
 return -1;
 }
 
@@ -944,6 +943,7 @@ int multifd_save_setup(Error **errp)
 p->iov = g_new0(struct iovec, page_count + 1);
 p->normal = g_new0(ram_addr_t, page_count);
 p->page_size = qemu_target_page_size();
+p->page_count = page_count;
 
 if (migrate_use_zero_copy_send()) {
 p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
@@ -1191,6 +1191,7 @@ int multifd_load_setup(Error **errp)
 p->name = g_strdup_printf("multifdrecv_%d", i);
 p->iov = g_new0(struct iovec, page_count);
 p->normal = g_new0(ram_addr_t, page_count);
+p->page_count = page_count;
 p->page_size = qemu_target_page_size();
 }
 
-- 
2.38.1




[PATCH 04/19] migration: Export ram_release_page()

2022-11-21 Thread Juan Quintela
Signed-off-by: Juan Quintela 
Reviewed-by: Leonardo Bras 
---
 migration/ram.h | 1 +
 migration/ram.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/ram.h b/migration/ram.h
index e844966f69..038d52f49f 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -66,6 +66,7 @@ int ram_load_postcopy(QEMUFile *f, int channel);
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 
 void ram_transferred_add(uint64_t bytes);
+void ram_release_page(const char *rbname, uint64_t offset);
 
 int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
 bool ramblock_recv_bitmap_test_byte_offset(RAMBlock *rb, uint64_t byte_offset);
diff --git a/migration/ram.c b/migration/ram.c
index 2cbe707bfc..8aad17c429 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1234,7 +1234,7 @@ static void migration_bitmap_sync_precopy(RAMState *rs)
 }
 }
 
-static void ram_release_page(const char *rbname, uint64_t offset)
+void ram_release_page(const char *rbname, uint64_t offset)
 {
 if (!migrate_release_ram() || !migration_in_postcopy()) {
 return;
-- 
2.38.1




[PATCH 14/19] migration: Add pss_init()

2022-11-21 Thread Juan Quintela
From: Peter Xu 

Helper to init PSS structures.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 571d780987..d81bf7b183 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -542,6 +542,14 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream 
*stream, RAMBlock *block,
 static void postcopy_preempt_restore(RAMState *rs, PageSearchStatus *pss,
  bool postcopy_requested);
 
+/* NOTE: page is the PFN not real ram_addr_t. */
+static void pss_init(PageSearchStatus *pss, RAMBlock *rb, ram_addr_t page)
+{
+pss->block = rb;
+pss->page = page;
+pss->complete_round = false;
+}
+
 static void *do_data_compress(void *opaque)
 {
 CompressParam *param = opaque;
@@ -2650,9 +2658,7 @@ static int ram_find_and_save_block(RAMState *rs)
 rs->last_page = 0;
 }
 
-pss.block = rs->last_seen_block;
-pss.page = rs->last_page;
-pss.complete_round = false;
+pss_init(&pss, rs->last_seen_block, rs->last_page);
 
 do {
 again = true;
-- 
2.38.1




[PATCH 06/19] migration: Add postcopy_preempt_active()

2022-11-21 Thread Juan Quintela
From: Peter Xu 

Add a helper to check whether postcopy preempt is enabled and currently active.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index cc72c24c18..00a2e30322 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -162,6 +162,11 @@ out:
 return ret;
 }
 
+static bool postcopy_preempt_active(void)
+{
+return migrate_postcopy_preempt() && migration_in_postcopy();
+}
+
 bool ramblock_is_ignored(RAMBlock *block)
 {
 return !qemu_ram_is_migratable(block) ||
@@ -2433,7 +2438,7 @@ static void postcopy_preempt_choose_channel(RAMState *rs, PageSearchStatus *pss)
 /* We need to make sure rs->f always points to the default channel elsewhere */
 static void postcopy_preempt_reset_channel(RAMState *rs)
 {
-if (migrate_postcopy_preempt() && migration_in_postcopy()) {
+if (postcopy_preempt_active()) {
 rs->postcopy_channel = RAM_CHANNEL_PRECOPY;
 rs->f = migrate_get_current()->to_dst_file;
 trace_postcopy_preempt_reset_channel();
@@ -2471,7 +2476,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
 return 0;
 }
 
-if (migrate_postcopy_preempt() && migration_in_postcopy()) {
+if (postcopy_preempt_active()) {
 postcopy_preempt_choose_channel(rs, pss);
 }
 
-- 
2.38.1




[PATCH 05/19] migration: Take bitmap mutex when completing ram migration

2022-11-21 Thread Juan Quintela
From: Peter Xu 

Any call to ram_find_and_save_block() needs to take the bitmap mutex.  We
used to not take it for most of ram_save_complete() because we thought
we're the only one left using the bitmap, but it's not true after the
preempt full patchset applied, since the return path can be taking it too.
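To illustrate the locking pattern this patch enforces, here is a minimal, self-contained sketch. `FakeRAMState`, `lock()`, `unlock()` and `find_and_save_block()` are hypothetical stand-ins, not QEMU's types: the point is only that the whole drain loop in the completion path runs with the bitmap mutex held, since the return path may be taking it concurrently.

```c
#include <assert.h>

/* Hypothetical stand-ins for RAMState and its bitmap mutex. */
typedef struct {
    int bitmap_mutex_held;  /* models qemu_mutex_lock/unlock nesting */
    int pages_left;
} FakeRAMState;

static void lock(FakeRAMState *rs)   { rs->bitmap_mutex_held++; }
static void unlock(FakeRAMState *rs) { rs->bitmap_mutex_held--; }

static int find_and_save_block(FakeRAMState *rs)
{
    assert(rs->bitmap_mutex_held > 0);  /* caller must hold the mutex */
    if (rs->pages_left > 0) {
        rs->pages_left--;
        return 1;
    }
    return 0;
}

static int save_complete(FakeRAMState *rs)
{
    int sent = 0;

    lock(rs);  /* mirrors qemu_mutex_lock(&rs->bitmap_mutex) in the patch */
    while (find_and_save_block(rs) > 0) {
        sent++;
    }
    unlock(rs);
    return sent;
}
```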

Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 8aad17c429..cc72c24c18 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3406,6 +3406,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 /* try transferring iterative blocks of memory */
 
 /* flush all remaining blocks regardless of rate limiting */
+qemu_mutex_lock(&rs->bitmap_mutex);
 while (true) {
 int pages;
 
@@ -3419,6 +3420,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 break;
 }
 }
+qemu_mutex_unlock(&rs->bitmap_mutex);
 
 flush_compressed_data(rs);
 ram_control_after_iterate(f, RAM_CONTROL_FINISH);
-- 
2.38.1




[PATCH 16/19] migration: Move last_sent_block into PageSearchStatus

2022-11-21 Thread Juan Quintela
From: Peter Xu 

Since we use PageSearchStatus to represent a channel, it makes perfect
sense to keep last_sent_block (aka, leverage RAM_SAVE_FLAG_CONTINUE) to be
per-channel rather than global because each channel can be sending
different pages on ramblocks.

Hence move it from RAMState into PageSearchStatus.
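A minimal sketch of why the state must be per-channel (`Channel` and this toy `save_page_header()` are illustrative stand-ins, not QEMU's code): the RAM_SAVE_FLAG_CONTINUE optimisation may elide the block id only when *this* channel's previous page came from the same block, so each channel has to track its own last sent block.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-channel state, mirroring pss->last_sent_block. */
typedef struct {
    const char *last_sent_block;
} Channel;

/* Returns the header size in bytes: 8 for the offset word, plus a
 * length byte and the block id string when the block changed on this
 * channel (otherwise the CONTINUE flag lets us elide the id). */
static int save_page_header(Channel *ch, const char *block)
{
    int size = 8;

    if (ch->last_sent_block == NULL ||
        strcmp(ch->last_sent_block, block) != 0) {
        size += 1 + (int)strlen(block);
        ch->last_sent_block = block;
    }
    return size;
}
```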

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 71 -
 1 file changed, 41 insertions(+), 30 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 3194997738..1233ff53ac 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -89,6 +89,8 @@ XBZRLECacheStats xbzrle_counters;
 struct PageSearchStatus {
 /* The migration channel used for a specific host page */
 QEMUFile*pss_channel;
+/* Last block from where we have sent data */
+RAMBlock *last_sent_block;
 /* Current block being searched */
 RAMBlock*block;
 /* Current page to search from */
@@ -368,8 +370,6 @@ struct RAMState {
 int uffdio_fd;
 /* Last block that we have visited searching for dirty pages */
 RAMBlock *last_seen_block;
-/* Last block from where we have sent data */
-RAMBlock *last_sent_block;
 /* Last dirty target page we have sent */
 ram_addr_t last_page;
 /* last ram version we have seen */
@@ -684,16 +684,17 @@ exit:
  *
  * Returns the number of bytes written
  *
- * @f: QEMUFile where to send the data
+ * @pss: current PSS channel status
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  *  in the lower bits, it contains flags
  */
-static size_t save_page_header(RAMState *rs, QEMUFile *f,  RAMBlock *block,
+static size_t save_page_header(PageSearchStatus *pss, RAMBlock *block,
ram_addr_t offset)
 {
 size_t size, len;
-bool same_block = (block == rs->last_sent_block);
+bool same_block = (block == pss->last_sent_block);
+QEMUFile *f = pss->pss_channel;
 
 if (same_block) {
 offset |= RAM_SAVE_FLAG_CONTINUE;
@@ -706,7 +707,7 @@ static size_t save_page_header(RAMState *rs, QEMUFile *f,  RAMBlock *block,
 qemu_put_byte(f, len);
 qemu_put_buffer(f, (uint8_t *)block->idstr, len);
 size += 1 + len;
-rs->last_sent_block = block;
+pss->last_sent_block = block;
 }
 return size;
 }
@@ -790,17 +791,19 @@ static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr)
  *  -1 means that xbzrle would be longer than normal
  *
  * @rs: current RAM state
+ * @pss: current PSS channel
  * @current_data: pointer to the address of the page contents
  * @current_addr: addr of the page
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_xbzrle_page(RAMState *rs, QEMUFile *file,
+static int save_xbzrle_page(RAMState *rs, PageSearchStatus *pss,
 uint8_t **current_data, ram_addr_t current_addr,
 RAMBlock *block, ram_addr_t offset)
 {
 int encoded_len = 0, bytes_xbzrle;
 uint8_t *prev_cached_page;
+QEMUFile *file = pss->pss_channel;
 
 if (!cache_is_cached(XBZRLE.cache, current_addr,
  ram_counters.dirty_sync_count)) {
@@ -865,7 +868,7 @@ static int save_xbzrle_page(RAMState *rs, QEMUFile *file,
 }
 
 /* Send XBZRLE based compressed page */
-bytes_xbzrle = save_page_header(rs, file, block,
+bytes_xbzrle = save_page_header(pss, block,
 offset | RAM_SAVE_FLAG_XBZRLE);
 qemu_put_byte(file, ENCODING_FLAG_XBZRLE);
 qemu_put_be16(file, encoded_len);
@@ -1296,19 +1299,19 @@ void ram_release_page(const char *rbname, uint64_t offset)
  * Returns the size of data written to the file, 0 means the page is not
  * a zero page
  *
- * @rs: current RAM state
- * @file: the file where the data is saved
+ * @pss: current PSS channel
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_zero_page_to_file(RAMState *rs, QEMUFile *file,
+static int save_zero_page_to_file(PageSearchStatus *pss,
   RAMBlock *block, ram_addr_t offset)
 {
 uint8_t *p = block->host + offset;
+QEMUFile *file = pss->pss_channel;
 int len = 0;
 
 if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
-len += save_page_header(rs, file, block, offset | RAM_SAVE_FLAG_ZERO);
+len += save_page_header(pss, block, offset | RAM_SAVE_FLAG_ZERO);
 qemu_put_byte(file, 0);
 len += 1;
 ram_release_page(block->idstr, offset);
@@ -1321,14 +1324,14 @@ static int save_zero_page_to_file(RAMState *rs, QEMUFile *file,
  *
  * Returns the number of pages written.
  *
- * @rs: current RAM state
+ * @pss: current PSS channel
  * 

[PATCH 01/19] multifd: Create page_size fields into both MultiFD{Recv, Send}Params

2022-11-21 Thread Juan Quintela
We were calling qemu_target_page_size() left and right.

Signed-off-by: Juan Quintela 
Reviewed-by: Leonardo Bras 
---
 migration/multifd.h  |  4 
 migration/multifd-zlib.c | 14 ++
 migration/multifd-zstd.c | 12 +---
 migration/multifd.c  | 18 --
 4 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 519f498643..86fb9982b3 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -80,6 +80,8 @@ typedef struct {
 bool registered_yank;
 /* packet allocated len */
 uint32_t packet_len;
+/* guest page size */
+uint32_t page_size;
 /* multifd flags for sending ram */
 int write_flags;
 
@@ -143,6 +145,8 @@ typedef struct {
 QIOChannel *c;
 /* packet allocated len */
 uint32_t packet_len;
+/* guest page size */
+uint32_t page_size;
 
 /* syncs main thread and channels */
 QemuSemaphore sem_sync;
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 18213a9513..37770248e1 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -116,7 +116,6 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp)
 static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
 {
 struct zlib_data *z = p->data;
-size_t page_size = qemu_target_page_size();
 z_stream *zs = &z->zs;
 uint32_t out_size = 0;
 int ret;
@@ -135,8 +134,8 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
  * with compression. zlib does not guarantee that this is safe,
  * therefore copy the page before calling deflate().
  */
-memcpy(z->buf, p->pages->block->host + p->normal[i], page_size);
-zs->avail_in = page_size;
+memcpy(z->buf, p->pages->block->host + p->normal[i], p->page_size);
+zs->avail_in = p->page_size;
 zs->next_in = z->buf;
 
 zs->avail_out = available;
@@ -242,12 +241,11 @@ static void zlib_recv_cleanup(MultiFDRecvParams *p)
 static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
 struct zlib_data *z = p->data;
-size_t page_size = qemu_target_page_size();
 z_stream *zs = &z->zs;
 uint32_t in_size = p->next_packet_size;
 /* we measure the change of total_out */
 uint32_t out_size = zs->total_out;
-uint32_t expected_size = p->normal_num * page_size;
+uint32_t expected_size = p->normal_num * p->page_size;
 uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
 int ret;
 int i;
@@ -274,7 +272,7 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
 flush = Z_SYNC_FLUSH;
 }
 
-zs->avail_out = page_size;
+zs->avail_out = p->page_size;
 zs->next_out = p->host + p->normal[i];
 
 /*
@@ -288,8 +286,8 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
 do {
 ret = inflate(zs, flush);
 } while (ret == Z_OK && zs->avail_in
- && (zs->total_out - start) < page_size);
-if (ret == Z_OK && (zs->total_out - start) < page_size) {
+ && (zs->total_out - start) < p->page_size);
+if (ret == Z_OK && (zs->total_out - start) < p->page_size) {
 error_setg(errp, "multifd %u: inflate generated too few output",
p->id);
 return -1;
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index d788d309f2..f4a8e1ed1f 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -113,7 +113,6 @@ static void zstd_send_cleanup(MultiFDSendParams *p, Error **errp)
 static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
 {
 struct zstd_data *z = p->data;
-size_t page_size = qemu_target_page_size();
 int ret;
 uint32_t i;
 
@@ -128,7 +127,7 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
 flush = ZSTD_e_flush;
 }
 z->in.src = p->pages->block->host + p->normal[i];
-z->in.size = page_size;
+z->in.size = p->page_size;
 z->in.pos = 0;
 
 /*
@@ -241,8 +240,7 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
 uint32_t in_size = p->next_packet_size;
 uint32_t out_size = 0;
-size_t page_size = qemu_target_page_size();
-uint32_t expected_size = p->normal_num * page_size;
+uint32_t expected_size = p->normal_num * p->page_size;
 uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
 struct zstd_data *z = p->data;
 int ret;
@@ -265,7 +263,7 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
 
 for (i = 0; i < p->normal_num; i++) {
 z->out.dst = p->host + p->normal[i];
-z->out.size = page_size;
+z->out.size = p->page_size;
 z->out.pos = 0;
 
 /*
@@ -279,8 +277,8 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
 do {
 ret 

[PATCH 09/19] migration: Remove RAMState.f references in compression code

2022-11-21 Thread Juan Quintela
From: Peter Xu 

Remove references to RAMState.f in compress_page_with_multi_thread() and
flush_compressed_data().

Compression code by default isn't compatible with having >1 channels (or it
won't currently know which channel to flush the compressed data), so to
make it simple we always flush on the default to_dst_file port until
someone wants to add >1 ports support, as rs->f right now can really
change (after postcopy preempt is introduced).

There should be no functional change at all after patch applied, since as
long as rs->f referenced in compression code, it must be to_dst_file.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 41475431fc..6e3dc845c5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1461,6 +1461,7 @@ static bool save_page_use_compression(RAMState *rs);
 
 static void flush_compressed_data(RAMState *rs)
 {
+MigrationState *ms = migrate_get_current();
 int idx, len, thread_count;
 
 if (!save_page_use_compression(rs)) {
@@ -1479,7 +1480,7 @@ static void flush_compressed_data(RAMState *rs)
 for (idx = 0; idx < thread_count; idx++) {
 qemu_mutex_lock(&comp_param[idx].mutex);
 if (!comp_param[idx].quit) {
-len = qemu_put_qemu_file(rs->f, comp_param[idx].file);
+len = qemu_put_qemu_file(ms->to_dst_file, comp_param[idx].file);
 /*
  * it's safe to fetch zero_page without holding comp_done_lock
  * as there is no further request submitted to the thread,
@@ -1498,11 +1499,11 @@ static inline void set_compress_params(CompressParam *param, RAMBlock *block,
 param->offset = offset;
 }
 
-static int compress_page_with_multi_thread(RAMState *rs, RAMBlock *block,
-   ram_addr_t offset)
+static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset)
 {
 int idx, thread_count, bytes_xmit = -1, pages = -1;
 bool wait = migrate_compress_wait_thread();
+MigrationState *ms = migrate_get_current();
 
 thread_count = migrate_compress_threads();
 qemu_mutex_lock(&comp_done_lock);
@@ -1510,7 +1511,8 @@ retry:
 for (idx = 0; idx < thread_count; idx++) {
 if (comp_param[idx].done) {
 comp_param[idx].done = false;
-bytes_xmit = qemu_put_qemu_file(rs->f, comp_param[idx].file);
+bytes_xmit = qemu_put_qemu_file(ms->to_dst_file,
+comp_param[idx].file);
 qemu_mutex_lock(&comp_param[idx].mutex);
 set_compress_params(&comp_param[idx], block, offset);
 qemu_cond_signal(&comp_param[idx].cond);
@@ -2263,7 +2265,7 @@ static bool save_compress_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
 return false;
 }
 
-if (compress_page_with_multi_thread(rs, block, offset) > 0) {
+if (compress_page_with_multi_thread(block, offset) > 0) {
 return true;
 }
 
-- 
2.38.1




[PATCH 13/19] migration: Introduce pss_channel

2022-11-21 Thread Juan Quintela
From: Peter Xu 

Introduce pss_channel for PageSearchStatus, define it as "the migration
channel to be used to transfer this host page".

We used to have rs->f, which is a mirror to MigrationState.to_dst_file.

After postcopy preempt initial version, rs->f can be dynamically changed
depending on which channel we want to use.

But that later work still doesn't grant full concurrency of sending pages
in e.g. different threads, because rs->f can either be the PRECOPY channel
or POSTCOPY channel.  This needs to be per-thread too.

PageSearchStatus is actually a good piece of struct which we can leverage
if we want to have multiple threads sending pages.  Sending a single guest
page may not make sense, so we make the granule to be "host page", and in
the PSS structure we allow specify a QEMUFile* to migrate a specific host
page.  Then we open the possibility to specify different channels in
different threads with different PSS structures.

The PSS prefix can be slightly misleading here because e.g. for the
upcoming usage of postcopy channel/thread it's not "searching" (or,
scanning) at all but sending the explicit page that was requested.  However
since PSS existed for some years keep it as-is until someone complains.

This patch mostly (simply) replace rs->f with pss->pss_channel only. No
functional change intended for this patch yet.  But it does prepare to
finally drop rs->f, and make ram_save_guest_page() thread safe.
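A rough sketch of the idea (`PSS` and `send_page()` are hypothetical stand-ins, with a plain byte buffer standing in for a QEMUFile channel): once each PSS carries its own channel, two senders write through their own state and no longer contend on a shared rs->f.

```c
#include <assert.h>

/* Hypothetical PageSearchStatus: the channel travels with the search
 * state, so different threads can use different PSS structures. */
typedef struct {
    char *pss_channel;  /* stands in for QEMUFile *pss_channel */
    int pos;
} PSS;

/* Stands in for qemu_put_byte(pss->pss_channel, ...): every write goes
 * through the channel owned by this PSS, never a global rs->f. */
static void send_page(PSS *pss, char byte)
{
    pss->pss_channel[pss->pos++] = byte;
}
```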

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 70 +++--
 1 file changed, 38 insertions(+), 32 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 4d7b50ef79..571d780987 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -453,6 +453,8 @@ void dirty_sync_missed_zero_copy(void)
 
 /* used by the search for pages to send */
 struct PageSearchStatus {
+/* The migration channel used for a specific host page */
+QEMUFile*pss_channel;
 /* Current block being searched */
 RAMBlock*block;
 /* Current page to search from */
@@ -775,9 +777,9 @@ static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr)
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
-ram_addr_t current_addr, RAMBlock *block,
-ram_addr_t offset)
+static int save_xbzrle_page(RAMState *rs, QEMUFile *file,
+uint8_t **current_data, ram_addr_t current_addr,
+RAMBlock *block, ram_addr_t offset)
 {
 int encoded_len = 0, bytes_xbzrle;
 uint8_t *prev_cached_page;
@@ -845,11 +847,11 @@ static int save_xbzrle_page(RAMState *rs, uint8_t **current_data,
 }
 
 /* Send XBZRLE based compressed page */
-bytes_xbzrle = save_page_header(rs, rs->f, block,
+bytes_xbzrle = save_page_header(rs, file, block,
 offset | RAM_SAVE_FLAG_XBZRLE);
-qemu_put_byte(rs->f, ENCODING_FLAG_XBZRLE);
-qemu_put_be16(rs->f, encoded_len);
-qemu_put_buffer(rs->f, XBZRLE.encoded_buf, encoded_len);
+qemu_put_byte(file, ENCODING_FLAG_XBZRLE);
+qemu_put_be16(file, encoded_len);
+qemu_put_buffer(file, XBZRLE.encoded_buf, encoded_len);
 bytes_xbzrle += encoded_len + 1 + 2;
 /*
  * Like compressed_size (please see update_compress_thread_counts),
@@ -1305,9 +1307,10 @@ static int save_zero_page_to_file(RAMState *rs, QEMUFile *file,
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_zero_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
+static int save_zero_page(RAMState *rs, QEMUFile *file, RAMBlock *block,
+  ram_addr_t offset)
 {
-int len = save_zero_page_to_file(rs, rs->f, block, offset);
+int len = save_zero_page_to_file(rs, file, block, offset);
 
 if (len) {
 stat64_add(_atomic_counters.duplicate, 1);
@@ -1324,15 +1327,15 @@ static int save_zero_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
  *
  * Return true if the pages has been saved, otherwise false is returned.
  */
-static bool control_save_page(RAMState *rs, RAMBlock *block, ram_addr_t offset,
-  int *pages)
+static bool control_save_page(PageSearchStatus *pss, RAMBlock *block,
+  ram_addr_t offset, int *pages)
 {
 uint64_t bytes_xmit = 0;
 int ret;
 
 *pages = -1;
-ret = ram_control_save_page(rs->f, block->offset, offset, TARGET_PAGE_SIZE,
-&bytes_xmit);
+ret = ram_control_save_page(pss->pss_channel, block->offset, offset,
+TARGET_PAGE_SIZE, &bytes_xmit);
 if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
 return false;
 }

[PATCH 11/19] migration: Use atomic ops properly for page accountings

2022-11-21 Thread Juan Quintela
From: Peter Xu 

To prepare for thread-safety on page accountings, at least below counters
need to be accessed only atomically, they are:

ram_counters.transferred
ram_counters.duplicate
ram_counters.normal
ram_counters.postcopy_bytes

There are a lot of other counters but they won't be accessed outside
migration thread, then they're still safe to be accessed without atomic
ops.
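The pattern can be sketched with C11 atomics (illustrative only; QEMU's Stat64 is its own implementation, which also handles hosts without native 64-bit atomics):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Sketch of MigrationAtomicStats: fields shared across threads are
 * only touched through atomic ops, like stat64_add()/stat64_get(). */
typedef struct {
    atomic_uint_least64_t transferred;
    atomic_uint_least64_t duplicate;
} AtomicStats;

/* Illustrative equivalents of QEMU's stat64_add()/stat64_get(). */
static void stat_add(atomic_uint_least64_t *s, uint64_t v)
{
    atomic_fetch_add(s, v);
}

static uint64_t stat_get(atomic_uint_least64_t *s)
{
    return atomic_load(s);
}
```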

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.h   | 20 
 migration/migration.c | 10 +-
 migration/multifd.c   |  4 ++--
 migration/ram.c   | 40 
 4 files changed, 51 insertions(+), 23 deletions(-)

diff --git a/migration/ram.h b/migration/ram.h
index 038d52f49f..81cbb0947c 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -32,7 +32,27 @@
 #include "qapi/qapi-types-migration.h"
 #include "exec/cpu-common.h"
 #include "io/channel.h"
+#include "qemu/stats64.h"
 
+/*
+ * These are the migration statistic counters that need to be updated using
+ * atomic ops (can be accessed by more than one thread).  Here since we
+ * cannot modify MigrationStats directly to use Stat64 as it was defined in
+ * the QAPI scheme, we define an internal structure to hold them, and we
+ * propagate the real values when QMP queries happen.
+ *
+ * IOW, the corresponding fields within ram_counters on these specific
+ * fields will be always zero and not being used at all; they're just
+ * placeholders to make it QAPI-compatible.
+ */
+typedef struct {
+Stat64 transferred;
+Stat64 duplicate;
+Stat64 normal;
+Stat64 postcopy_bytes;
+} MigrationAtomicStats;
+
+extern MigrationAtomicStats ram_atomic_counters;
 extern MigrationStats ram_counters;
 extern XBZRLECacheStats xbzrle_counters;
 extern CompressionStats compression_counters;
diff --git a/migration/migration.c b/migration/migration.c
index f485eea5fb..de83c50f51 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1049,13 +1049,13 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 
 info->has_ram = true;
 info->ram = g_malloc0(sizeof(*info->ram));
-info->ram->transferred = ram_counters.transferred;
+info->ram->transferred = stat64_get(&ram_atomic_counters.transferred);
 info->ram->total = ram_bytes_total();
-info->ram->duplicate = ram_counters.duplicate;
+info->ram->duplicate = stat64_get(&ram_atomic_counters.duplicate);
 /* legacy value.  It is not used anymore */
 info->ram->skipped = 0;
-info->ram->normal = ram_counters.normal;
-info->ram->normal_bytes = ram_counters.normal * page_size;
+info->ram->normal = stat64_get(&ram_atomic_counters.normal);
+info->ram->normal_bytes = info->ram->normal * page_size;
 info->ram->mbps = s->mbps;
 info->ram->dirty_sync_count = ram_counters.dirty_sync_count;
 info->ram->dirty_sync_missed_zero_copy =
@@ -1066,7 +1066,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 info->ram->pages_per_second = s->pages_per_second;
 info->ram->precopy_bytes = ram_counters.precopy_bytes;
 info->ram->downtime_bytes = ram_counters.downtime_bytes;
-info->ram->postcopy_bytes = ram_counters.postcopy_bytes;
+info->ram->postcopy_bytes = stat64_get(&ram_atomic_counters.postcopy_bytes);
 
 if (migrate_use_xbzrle()) {
 info->has_xbzrle_cache = true;
diff --git a/migration/multifd.c b/migration/multifd.c
index b8dc559d24..000ca4d4ec 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -432,7 +432,7 @@ static int multifd_send_pages(QEMUFile *f)
 transferred = ((uint64_t) pages->num) * p->page_size + p->packet_len;
 qemu_file_acct_rate_limit(f, transferred);
 ram_counters.multifd_bytes += transferred;
-ram_counters.transferred += transferred;
+stat64_add(&ram_atomic_counters.transferred, transferred);
 qemu_mutex_unlock(&p->mutex);
 qemu_sem_post(&p->sem);
 
@@ -624,7 +624,7 @@ int multifd_send_sync_main(QEMUFile *f)
 p->pending_job++;
 qemu_file_acct_rate_limit(f, p->packet_len);
 ram_counters.multifd_bytes += p->packet_len;
-ram_counters.transferred += p->packet_len;
+stat64_add(&ram_atomic_counters.transferred, p->packet_len);
 qemu_mutex_unlock(&p->mutex);
 qemu_sem_post(&p->sem);
 
diff --git a/migration/ram.c b/migration/ram.c
index 5379164749..f4cd9038f4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -425,18 +425,25 @@ uint64_t ram_bytes_remaining(void)
0;
 }
 
+/*
+ * NOTE: not all stats in ram_counters are used in reality.  See comments
+ * for struct MigrationAtomicStats.  The ultimate result of ram migration
+ * counters will be a merged version with both ram_counters and the atomic
+ * fields in ram_atomic_counters.
+ */
 MigrationStats ram_counters;
+MigrationAtomicStats ram_atomic_counters;
 
 void 

Re: [PATCH v4 01/11] block-copy: add missing coroutine_fn annotations

2022-11-21 Thread Emanuele Giuseppe Esposito



Am 21/11/2022 um 12:50 schrieb Kevin Wolf:
> Am 21.11.2022 um 09:51 hat Emanuele Giuseppe Esposito geschrieben:
>>
>>
>> Am 21/11/2022 um 09:32 schrieb Emanuele Giuseppe Esposito:
>>>
>>>
>>> Am 18/11/2022 um 20:05 schrieb Kevin Wolf:
 Am 16.11.2022 um 13:22 hat Emanuele Giuseppe Esposito geschrieben:
> These functions end up calling bdrv_common_block_status_above(), a
> generated_co_wrapper function.
> In addition, they also happen to be always called in coroutine context,
> meaning all callers are coroutine_fn.
> This means that the g_c_w function will enter the qemu_in_coroutine()
> case and eventually suspend (or in other words call qemu_coroutine_yield()).
> Therefore we need to mark such functions coroutine_fn too.
>
> Signed-off-by: Emanuele Giuseppe Esposito 

 Ideally, we'd convert them to new wrappers bdrv_co_is_allocated() and
 bdrv_co_block_status_above() instead of having to argue that they always
 take the coroutine path in g_c_w.
>>>
>>> Ok so basically I should introduce bdrv_co_is_allocated, because so far
>>> in this and next series I never thought about creating it.
>>> Since these functions will be eventually split anyways, I agree let's
>>> start doing this now.
>>
>> Actually bdrv_is_allocated would be a g_c_w functions in itself, that
>> calls another g_c_w and it is probably called by functions that are or
>> will be g_c_w.
> 
> I'm not sure if I understand. bdrv_is_allocated() is essentially a g_c_w
> function today, just indirectly. But we have callers that know that they
> are running in a coroutine (which is why you're adding coroutine_fn to
> them), so they shouldn't call a g_c_w function, but directly the
> coroutine version of the function.
> 
> The only reason why you can't currently do that is that
> bdrv_is_allocated() exists as a wrapper around the g_c_w function
> bdrv_common_block_status_above(), but the same wrapper doesn't exist for
> the pure coroutine version bdrv_co_common_block_status_above().
> 
> All I'm suggesting is introducing a bdrv_co_is_allocated() that is a
> wrapper directly around bdrv_co_common_block_status_above(), so that
> the functions you're marking as coroutine_fn can use it instead of
> calling g_c_w. This should be about 10 lines of code.
> 
> I'm not implying that you should convert any other callers in this
> patch, or that you should touch bdrv_is_allocated() at all.
> 
>> Is this actually the scope of this series? I think switching this
>> specific function and its callers or similar will require a lot of
>> efforts, and if I do it here it won't cover all the cases for sure.
>>
>> Wouldn't it be better to do these kind of things in a different serie
>> using Paolo's vrc tool?
> 
> I'm not sure what the scope of this series is, because you already do
> introduce new wrappers in other patches of the series. I assumed it's
> just to improve the situation a little, with no claim of being
> exhaustive.
> 
> Finding and fully converting all callers might indeed be a job for
> something like vrc, but here I'm only looking at local consistency in
> functions where you're adding coroutine_fn.
> 

Oh ok now I see what you mean. I was thinking (and did in "[PATCH 04/15]
block: convert bdrv_refresh_total_sectors in generated_co_wrapper") to
instead convert all callers in g_c_w, and that ended up being a big pain.

I'll also correct the patch I mentioned above.

Thank you,
Emanuele




[PULL 1/8] migration/channel-block: fix return value for qio_channel_block_{readv, writev}

2022-11-21 Thread Juan Quintela
From: Fiona Ebner 

in the error case. The documentation in include/io/channel.h states
that -1 or QIO_CHANNEL_ERR_BLOCK should be returned upon error. Simply
passing along the return value from the bdrv-functions has the
potential to confuse the call sites. Non-blocking mode is not
implemented currently, so -1 it is.
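The fix can be sketched as a small mapping helper (hypothetical name; in the real patch the mapping is done inline and the error detail goes through Error **errp): the raw -errno from bdrv_readv_vmstate()/bdrv_writev_vmstate() must never leak out as the QIOChannel return value.

```c
#include <assert.h>

/* Hypothetical helper: QIOChannel readv/writev must return -1 (or
 * QIO_CHANNEL_ERR_BLOCK for non-blocking, unimplemented here) on
 * error, not whatever negative errno the bdrv_* call produced. */
static long qio_ret_from_bdrv(long bdrv_ret)
{
    if (bdrv_ret < 0) {
        return -1;  /* details reported via Error **errp in real code */
    }
    return bdrv_ret;  /* byte count on success */
}
```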

Signed-off-by: Fiona Ebner 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/channel-block.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/migration/channel-block.c b/migration/channel-block.c
index c55c8c93ce..f4ab53acdb 100644
--- a/migration/channel-block.c
+++ b/migration/channel-block.c
@@ -62,7 +62,8 @@ qio_channel_block_readv(QIOChannel *ioc,
 qemu_iovec_init_external(&qiov, (struct iovec *)iov, niov);
 ret = bdrv_readv_vmstate(bioc->bs, &qiov, bioc->offset);
 if (ret < 0) {
-return ret;
+error_setg_errno(errp, -ret, "bdrv_readv_vmstate failed");
+return -1;
 }
 
 bioc->offset += qiov.size;
@@ -86,7 +87,8 @@ qio_channel_block_writev(QIOChannel *ioc,
 qemu_iovec_init_external(&qiov, (struct iovec *)iov, niov);
 ret = bdrv_writev_vmstate(bioc->bs, &qiov, bioc->offset);
 if (ret < 0) {
-return ret;
+error_setg_errno(errp, -ret, "bdrv_writev_vmstate failed");
+return -1;
 }
 
 bioc->offset += qiov.size;
-- 
2.38.1




[PULL 0/8] Next patches

2022-11-21 Thread Juan Quintela
The following changes since commit a082fab9d259473a9d5d53307cf83b1223301181:

  Merge tag 'pull-ppc-20221117' of https://gitlab.com/danielhb/qemu into staging (2022-11-17 12:39:38 -0500)

are available in the Git repository at:

  https://gitlab.com/juan.quintela/qemu.git tags/next-pull-request

for you to fetch changes up to b5280437a7f49cf617cdd99bbbe2c7bd1652408b:

  migration: Block migration comment or code is wrong (2022-11-21 11:58:10 +0100)


Migration PULL request (take 3)

Hi

Drop everything that is not a bug fix:
- fixes by peter
- fix comment on block creation (me)
- fix return values from qio_channel_block()

Please, apply.

(take 1)
It includes:
- Leonardo fix for zero_copy flush
- Fiona fix for return value of readv/writev
- Peter Xu cleanups
- Peter Xu preempt patches
- Patches ready from zero page (me)
- AVX2 support (ling)
- fix for slow networking and reordering of first packets (manish)

Please, apply.



Fiona Ebner (1):
  migration/channel-block: fix return value for
qio_channel_block_{readv,writev}

Juan Quintela (1):
  migration: Block migration comment or code is wrong

Leonardo Bras (1):
  migration/multifd/zero-copy: Create helper function for flushing

Peter Xu (5):
  migration: Fix possible infinite loop of ram save process
  migration: Fix race on qemu_file_shutdown()
  migration: Disallow postcopy preempt to be used with compress
  migration: Use non-atomic ops for clear log bitmap
  migration: Disable multifd explicitly with compression

 include/exec/ram_addr.h   | 11 +-
 include/exec/ramblock.h   |  3 +++
 include/qemu/bitmap.h |  1 +
 migration/block.c |  4 ++--
 migration/channel-block.c |  6 --
 migration/migration.c | 18 
 migration/multifd.c   | 30 --
 migration/qemu-file.c | 27 ---
 migration/ram.c   | 27 ++-
 util/bitmap.c | 45 +++
 10 files changed, 139 insertions(+), 33 deletions(-)

-- 
2.38.1




[PULL 7/8] migration: Disable multifd explicitly with compression

2022-11-21 Thread Juan Quintela
From: Peter Xu 

The multifd thread model does not work with compression, so disable it explicitly.

Note that previously, even though we could enable both of them, nothing
would go wrong, because the compression code has higher priority so the
multifd feature was simply ignored.  Now we will fail even earlier, at
config time, so the user is better aware of the consequence.

Note that there is a slight chance of breaking existing users, but let's
assume they are not the majority and not serious users, or they would
already have found that multifd is not working.

With that, we can safely drop the check in ram_save_target_page() for using
multifd, because when multifd=on then compression=off, so the removed
check on save_page_use_compression() would always return false anyway.

Signed-off-by: Peter Xu 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/migration.c |  7 +++
 migration/ram.c   | 11 +--
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index f3ed77a7d0..f485eea5fb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1350,6 +1350,13 @@ static bool migrate_caps_check(bool *cap_list,
 }
 }
 
+if (cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
+if (cap_list[MIGRATION_CAPABILITY_COMPRESS]) {
+error_setg(errp, "Multifd is not compatible with compress");
+return false;
+}
+}
+
 return true;
 }
 
diff --git a/migration/ram.c b/migration/ram.c
index 1d42414ecc..1338e47665 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2305,13 +2305,12 @@ static int ram_save_target_page(RAMState *rs, 
PageSearchStatus *pss)
 }
 
 /*
- * Do not use multifd for:
- * 1. Compression as the first page in the new block should be posted out
- *before sending the compressed page
- * 2. In postcopy as one whole host page should be placed
+ * Do not use multifd in postcopy as one whole host page should be
+ * placed.  Meanwhile postcopy requires atomic update of pages, so even
+ * if host page size == guest page size the dest guest during run may
+ * still see partially copied pages which is data corruption.
  */
-if (!save_page_use_compression(rs) && migrate_use_multifd()
-&& !migration_in_postcopy()) {
+if (migrate_use_multifd() && !migration_in_postcopy()) {
 return ram_save_multifd_page(rs, block, offset);
 }
 
-- 
2.38.1




[PULL 8/8] migration: Block migration comment or code is wrong

2022-11-21 Thread Juan Quintela
And it appears that what is wrong is the code. During the bulk stage we
need to make sure that some block is dirty, but play no games with
max_size at all.

Signed-off-by: Juan Quintela 
Reviewed-by: Stefan Hajnoczi 
---
 migration/block.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index 3577c815a9..4347da1526 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -880,8 +880,8 @@ static void block_save_pending(QEMUFile *f, void *opaque, 
uint64_t max_size,
 blk_mig_unlock();
 
 /* Report at least one block pending during bulk phase */
-if (pending <= max_size && !block_mig_state.bulk_completed) {
-pending = max_size + BLK_MIG_BLOCK_SIZE;
+if (!pending && !block_mig_state.bulk_completed) {
+pending = BLK_MIG_BLOCK_SIZE;
 }
 
 trace_migration_block_save_pending(pending);
-- 
2.38.1




[PULL 2/8] migration/multifd/zero-copy: Create helper function for flushing

2022-11-21 Thread Juan Quintela
From: Leonardo Bras 

Move flushing code from multifd_send_sync_main() to a new helper, and call
it in multifd_send_sync_main().

Signed-off-by: Leonardo Bras 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/multifd.c | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 586ddc9d65..509bbbe3bf 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -566,6 +566,23 @@ void multifd_save_cleanup(void)
 multifd_send_state = NULL;
 }
 
+static int multifd_zero_copy_flush(QIOChannel *c)
+{
+int ret;
+Error *err = NULL;
+
+ret = qio_channel_flush(c, &err);
+if (ret < 0) {
+error_report_err(err);
+return -1;
+}
+if (ret == 1) {
+dirty_sync_missed_zero_copy();
+}
+
+return ret;
+}
+
 int multifd_send_sync_main(QEMUFile *f)
 {
 int i;
@@ -616,17 +633,8 @@ int multifd_send_sync_main(QEMUFile *f)
 qemu_mutex_unlock(&p->mutex);
 qemu_sem_post(&p->sem);
 
-if (flush_zero_copy && p->c) {
-int ret;
-Error *err = NULL;
-
-ret = qio_channel_flush(p->c, &err);
-if (ret < 0) {
-error_report_err(err);
-return -1;
-} else if (ret == 1) {
-dirty_sync_missed_zero_copy();
-}
+if (flush_zero_copy && p->c && (multifd_zero_copy_flush(p->c) < 0)) {
+return -1;
 }
 }
 for (i = 0; i < migrate_multifd_channels(); i++) {
-- 
2.38.1




[PULL 6/8] migration: Use non-atomic ops for clear log bitmap

2022-11-21 Thread Juan Quintela
From: Peter Xu 

Since we already have bitmap_mutex to protect either the dirty bitmap or
the clear log bitmap, we don't need atomic operations to set/clear/test on
the clear log bitmap.  Switch all ops from atomic to non-atomic
versions, and meanwhile touch up the comments to show which lock is in charge.

Introduce a non-atomic version of bitmap_test_and_clear_atomic(), mostly the
same as the atomic version but simplified in a few places, e.g. dropping the
"old_bits" variable and the explicit memory barriers.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 include/exec/ram_addr.h | 11 +-
 include/exec/ramblock.h |  3 +++
 include/qemu/bitmap.h   |  1 +
 util/bitmap.c   | 45 +
 4 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 1500680458..f4fb6a2111 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -42,7 +42,8 @@ static inline long clear_bmap_size(uint64_t pages, uint8_t 
shift)
 }
 
 /**
- * clear_bmap_set: set clear bitmap for the page range
+ * clear_bmap_set: set clear bitmap for the page range.  Must be with
+ * bitmap_mutex held.
  *
  * @rb: the ramblock to operate on
  * @start: the start page number
@@ -55,12 +56,12 @@ static inline void clear_bmap_set(RAMBlock *rb, uint64_t 
start,
 {
 uint8_t shift = rb->clear_bmap_shift;
 
-bitmap_set_atomic(rb->clear_bmap, start >> shift,
-  clear_bmap_size(npages, shift));
+bitmap_set(rb->clear_bmap, start >> shift, clear_bmap_size(npages, shift));
 }
 
 /**
- * clear_bmap_test_and_clear: test clear bitmap for the page, clear if set
+ * clear_bmap_test_and_clear: test clear bitmap for the page, clear if set.
+ * Must be with bitmap_mutex held.
  *
  * @rb: the ramblock to operate on
  * @page: the page number to check
@@ -71,7 +72,7 @@ static inline bool clear_bmap_test_and_clear(RAMBlock *rb, 
uint64_t page)
 {
 uint8_t shift = rb->clear_bmap_shift;
 
-return bitmap_test_and_clear_atomic(rb->clear_bmap, page >> shift, 1);
+return bitmap_test_and_clear(rb->clear_bmap, page >> shift, 1);
 }
 
 static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 6cbedf9e0c..adc03df59c 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -53,6 +53,9 @@ struct RAMBlock {
  * and split clearing of dirty bitmap on the remote node (e.g.,
  * KVM).  The bitmap will be set only when doing global sync.
  *
+ * It is only used during src side of ram migration, and it is
+ * protected by the global ram_state.bitmap_mutex.
+ *
  * NOTE: this bitmap is different comparing to the other bitmaps
  * in that one bit can represent multiple guest pages (which is
  * decided by the `clear_bmap_shift' variable below).  On
diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 82a1d2f41f..3ccb00865f 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -253,6 +253,7 @@ void bitmap_set(unsigned long *map, long i, long len);
 void bitmap_set_atomic(unsigned long *map, long i, long len);
 void bitmap_clear(unsigned long *map, long start, long nr);
 bool bitmap_test_and_clear_atomic(unsigned long *map, long start, long nr);
+bool bitmap_test_and_clear(unsigned long *map, long start, long nr);
 void bitmap_copy_and_clear_atomic(unsigned long *dst, unsigned long *src,
   long nr);
 unsigned long bitmap_find_next_zero_area(unsigned long *map,
diff --git a/util/bitmap.c b/util/bitmap.c
index f81d8057a7..8d12e90a5a 100644
--- a/util/bitmap.c
+++ b/util/bitmap.c
@@ -240,6 +240,51 @@ void bitmap_clear(unsigned long *map, long start, long nr)
 }
 }
 
+bool bitmap_test_and_clear(unsigned long *map, long start, long nr)
+{
+unsigned long *p = map + BIT_WORD(start);
+const long size = start + nr;
+int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
+unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
+bool dirty = false;
+
+assert(start >= 0 && nr >= 0);
+
+/* First word */
+if (nr - bits_to_clear > 0) {
+if ((*p) & mask_to_clear) {
+dirty = true;
+}
+*p &= ~mask_to_clear;
+nr -= bits_to_clear;
+bits_to_clear = BITS_PER_LONG;
+p++;
+}
+
+/* Full words */
+if (bits_to_clear == BITS_PER_LONG) {
+while (nr >= BITS_PER_LONG) {
+if (*p) {
+dirty = true;
+*p = 0;
+}
+nr -= BITS_PER_LONG;
+p++;
+}
+}
+
+/* Last word */
+if (nr) {
+mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
+if ((*p) & mask_to_clear) {
+dirty = true;
+}
+*p &= ~mask_to_clear;
+}
+
+return dirty;
+}
+
 bool 

[PULL 3/8] migration: Fix possible infinite loop of ram save process

2022-11-21 Thread Juan Quintela
From: Peter Xu 

When starting the ram saving procedure (especially at the completion phase),
always set last_seen_block to non-NULL to make sure we can always correctly
detect the case where "we've migrated all the dirty pages".

Then we'll guarantee both last_seen_block and pss.block are always valid
before the loop starts.

See the comment in the code for some details.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index dc1de9ddbc..1d42414ecc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2546,14 +2546,22 @@ static int ram_find_and_save_block(RAMState *rs)
 return pages;
 }
 
+/*
+ * Always keep last_seen_block/last_page valid during this procedure,
+ * because find_dirty_block() relies on these values (e.g., we compare
+ * last_seen_block with pss.block to see whether we searched all the
+ * ramblocks) to detect the completion of migration.  Having NULL value
+ * of last_seen_block can conditionally cause below loop to run forever.
+ */
+if (!rs->last_seen_block) {
+rs->last_seen_block = QLIST_FIRST_RCU(&ram_list.blocks);
+rs->last_page = 0;
+}
+
 pss.block = rs->last_seen_block;
 pss.page = rs->last_page;
 pss.complete_round = false;
 
-if (!pss.block) {
-pss.block = QLIST_FIRST_RCU(&ram_list.blocks);
-}
-
 do {
 again = true;
 found = get_queued_page(rs, &pss);
-- 
2.38.1




[PULL 5/8] migration: Disallow postcopy preempt to be used with compress

2022-11-21 Thread Juan Quintela
From: Peter Xu 

The preempt mode requires the capability to assign a channel for each
page, while the compression logic will currently assign pages to different
compress threads/local channels, so potentially they're incompatible.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/migration.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 739bb683f3..f3ed77a7d0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1337,6 +1337,17 @@ static bool migrate_caps_check(bool *cap_list,
 error_setg(errp, "Postcopy preempt requires postcopy-ram");
 return false;
 }
+
+/*
+ * Preempt mode requires urgent pages to be sent in separate
+ * channel, OTOH compression logic will disorder all pages into
+ * different compression channels, which is not compatible with the
+ * preempt assumptions on channel assignments.
+ */
+if (cap_list[MIGRATION_CAPABILITY_COMPRESS]) {
+error_setg(errp, "Postcopy preempt not compatible with compress");
+return false;
+}
 }
 
 return true;
-- 
2.38.1




[PULL 4/8] migration: Fix race on qemu_file_shutdown()

2022-11-21 Thread Juan Quintela
From: Peter Xu 

In qemu_file_shutdown(), there's a possible race with the current order of
operations.  There are two major things to do:

  (1) Do real shutdown() (e.g. shutdown() syscall on socket)
  (2) Update qemufile's last_error

We must do (2) before (1) otherwise there can be a race condition like:

  page receiver                       other thread
  -------------                       ------------
  qemu_get_buffer()
                                      do shutdown()
    returns 0 (buffer all zero)
    (meanwhile we didn't check this retcode)
  try to detect IO error
    last_error==NULL, IO okay
  install ALL-ZERO page
                                      set last_error
  --> guest crash!

To fix this, we could also check the retval of qemu_get_buffer(), but not all
APIs can be properly checked and ultimately we still need to go back to
qemu_file_get_error().  E.g. qemu_get_byte() doesn't return an error.

Maybe some day a rework of the qemufile API is really needed, but for now keep
using qemu_file_get_error() and fix it by not allowing that race condition
to happen.  Here shutdown() is indeed special because the last_error is
emulated.  For real -EIO errors it'll always be set when e.g. a sendmsg()
error triggers, so we won't miss those ones; only shutdown() is a bit tricky
here.

Cc: Daniel P. Berrange 
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/qemu-file.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 4f400c2e52..2d5f74ffc2 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -79,6 +79,30 @@ int qemu_file_shutdown(QEMUFile *f)
 int ret = 0;
 
 f->shutdown = true;
+
+/*
+ * We must set qemufile error before the real shutdown(), otherwise
+ * there can be a race window where we thought IO all went through
+ * (because last_error==NULL) but actually IO has already stopped.
+ *
+ * If without correct ordering, the race can happen like this:
+ *
+ *  page receiver                     other thread
+ *  -------------                     ------------
+ *  qemu_get_buffer()
+ *                                    do shutdown()
+ *    returns 0 (buffer all zero)
+ *    (we didn't check this retcode)
+ *  try to detect IO error
+ *    last_error==NULL, IO okay
+ *  install ALL-ZERO page
+ *                                    set last_error
+ *  --> guest crash!
+ */
+if (!f->last_error) {
+qemu_file_set_error(f, -EIO);
+}
+
 if (!qio_channel_has_feature(f->ioc,
  QIO_CHANNEL_FEATURE_SHUTDOWN)) {
 return -ENOSYS;
@@ -88,9 +112,6 @@ int qemu_file_shutdown(QEMUFile *f)
 ret = -EIO;
 }
 
-if (!f->last_error) {
-qemu_file_set_error(f, -EIO);
-}
 return ret;
 }
 
-- 
2.38.1




Re: [PATCH v4 04/11] block-coroutine-wrapper.py: introduce generated_co_wrapper_simple

2022-11-21 Thread Kevin Wolf
On 16.11.2022 at 13:22, Emanuele Giuseppe Esposito wrote:
> This new annotation creates just a function wrapper that creates
> a new coroutine. It assumes the caller is not a coroutine.
> 
> This is much better than g_c_w, because it is clear whether the caller
> is a coroutine or not, and provides the advantage of automating
> the code creation. In the future all g_c_w functions will be
> substituted with g_c_w_simple.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 

$ mypy --strict scripts/block-coroutine-wrapper.py
scripts/block-coroutine-wrapper.py:31: error: Function is missing a return type annotation
scripts/block-coroutine-wrapper.py:90: error: Missing type parameters for generic type "Iterator"
scripts/block-coroutine-wrapper.py:110: error: "FuncDecl" has no attribute "co_name"
scripts/block-coroutine-wrapper.py:111: error: "FuncDecl" has no attribute "struct_name"
scripts/block-coroutine-wrapper.py:112: error: "FuncDecl" has no attribute "bs"
scripts/block-coroutine-wrapper.py:135: error: "FuncDecl" has no attribute "co_name"
scripts/block-coroutine-wrapper.py:136: error: "FuncDecl" has no attribute "struct_name"
scripts/block-coroutine-wrapper.py:137: error: "FuncDecl" has no attribute "bs"
scripts/block-coroutine-wrapper.py:160: error: "FuncDecl" has no attribute "co_name"
scripts/block-coroutine-wrapper.py:161: error: "FuncDecl" has no attribute "co_name"
scripts/block-coroutine-wrapper.py:172: error: "FuncDecl" has no attribute "bs"
scripts/block-coroutine-wrapper.py:173: error: "FuncDecl" has no attribute "struct_name"
scripts/block-coroutine-wrapper.py:174: error: "FuncDecl" has no attribute "struct_name"
scripts/block-coroutine-wrapper.py:218: error: Call to untyped function "gen_header" in typed context
Found 14 errors in 1 file (checked 1 source file)

The first two and the last one aren't the fault of this patch, but the
others are. When you add new attributes, they should be declared in
FuncDecl.__init__(). And actually, that even seems like a better place
to initialise them with the right value than gen_wrapper().

>  include/block/block-common.h   |  1 +
>  scripts/block-coroutine-wrapper.py | 87 ++
>  2 files changed, 66 insertions(+), 22 deletions(-)
> 
> diff --git a/include/block/block-common.h b/include/block/block-common.h
> index 297704c1e9..8ae750c7cf 100644
> --- a/include/block/block-common.h
> +++ b/include/block/block-common.h
> @@ -43,6 +43,7 @@
>   * Read more in docs/devel/block-coroutine-wrapper.rst
>   */
>  #define generated_co_wrapper
> +#define generated_co_wrapper_simple
>  
>  /* block.c */
>  typedef struct BlockDriver BlockDriver;
> diff --git a/scripts/block-coroutine-wrapper.py 
> b/scripts/block-coroutine-wrapper.py
> index 08be813407..f88ef53964 100644
> --- a/scripts/block-coroutine-wrapper.py
> +++ b/scripts/block-coroutine-wrapper.py
> @@ -62,10 +62,15 @@ def __init__(self, param_decl: str) -> None:
>  
>  
>  class FuncDecl:
> -def __init__(self, return_type: str, name: str, args: str) -> None:
> +def __init__(self, return_type: str, name: str, args: str,
> + variant: str) -> None:
>  self.return_type = return_type.strip()
>  self.name = name.strip()
>  self.args = [ParamDecl(arg.strip()) for arg in args.split(',')]
> +self.create_only_co = False
> +
> +if variant == '_simple':
> +self.create_only_co = True
>  
>  def gen_list(self, format: str) -> str:
>  return ', '.join(format.format_map(arg.__dict__) for arg in 
> self.args)
> @@ -75,7 +80,8 @@ def gen_block(self, format: str) -> str:
>  
>  
>  # Match wrappers declared with a generated_co_wrapper mark
> -func_decl_re = re.compile(r'^int\s*generated_co_wrapper\s*'
> +func_decl_re = re.compile(r'^int\s*generated_co_wrapper'
> +  r'(?P(_[a-z][a-z0-9_]*)?)\s*'
>r'(?P[a-z][a-z0-9_]*)'
>r'\((?P[^)]*)\);$', re.MULTILINE)
>  
> @@ -84,7 +90,8 @@ def func_decl_iter(text: str) -> Iterator:
>  for m in func_decl_re.finditer(text):
>  yield FuncDecl(return_type='int',
> name=m.group('wrapper_name'),
> -   args=m.group('args'))
> +   args=m.group('args'),
> +   variant=m.group('variant'))
>  
>  
>  def snake_to_camel(func_name: str) -> str:
> @@ -97,6 +104,51 @@ def snake_to_camel(func_name: str) -> str:
>  return ''.join(words)
>  
>  
> +# Checks if we are already in coroutine
> +def create_g_c_w(func: FuncDecl) -> str:
> +name = func.co_name
> +struct_name = func.struct_name
> +return f"""\
> +int {func.name}({ func.gen_list('{decl}') })
> +{{
> +if (qemu_in_coroutine()) {{
> +return {name}({ func.gen_list('{name}') });
> +}} else {{
> +{struct_name} s = {{
> +.poll_state.bs = {func.bs},
> +.poll_state.in_progress = true,
> +
> +{ 

Re: [PATCH v4 01/11] block-copy: add missing coroutine_fn annotations

2022-11-21 Thread Kevin Wolf
On 21.11.2022 at 09:51, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 21/11/2022 at 09:32, Emanuele Giuseppe Esposito wrote:
> > 
> > 
> > On 18/11/2022 at 20:05, Kevin Wolf wrote:
> >> On 16.11.2022 at 13:22, Emanuele Giuseppe Esposito wrote:
> >>> These functions end up calling bdrv_common_block_status_above(), a
> >>> generated_co_wrapper function.
> >>> In addition, they also happen to be always called in coroutine context,
> >>> meaning all callers are coroutine_fn.
> >>> This means that the g_c_w function will enter the qemu_in_coroutine()
> >>> case and eventually suspend (or in other words call 
> >>> qemu_coroutine_yield()).
> >>> Therefore we need to mark such functions coroutine_fn too.
> >>>
> >>> Signed-off-by: Emanuele Giuseppe Esposito 
> >>
> >> Ideally, we'd convert them to new wrappers bdrv_co_is_allocated() and
> >> bdrv_co_block_status_above() instead of having to argue that they always
> >> take the coroutine path in g_c_w.
> > 
> Ok so basically I should introduce bdrv_co_is_allocated, because so far
> in this and the next series I never thought about creating it.
> Since these functions will eventually be split anyway, I agree let's
> start doing this now.
> 
> Actually bdrv_is_allocated would be a g_c_w function in itself: it
> calls another g_c_w and is probably called by functions that are or
> will be g_c_w.

I'm not sure if I understand. bdrv_is_allocated() is essentially a g_c_w
function today, just indirectly. But we have callers that know that they
are running in a coroutine (which is why you're adding coroutine_fn to
them), so they shouldn't call a g_c_w function, but directly the
coroutine version of the function.

The only reason why you can't currently do that is that
bdrv_is_allocated() exists as a wrapper around the g_c_w function
bdrv_common_block_status_above(), but the same wrapper doesn't exist for
the pure coroutine version bdrv_co_common_block_status_above().

All I'm suggesting is introducing a bdrv_co_is_allocated() that is a
wrapper directly around bdrv_co_common_block_status_above(), so that
the functions you're marking as coroutine_fn can use it instead of
calling g_c_w. This should be about 10 lines of code.

I'm not implying that you should convert any other callers in this
patch, or that you should touch bdrv_is_allocated() at all.
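
The dispatch pattern under discussion can be modeled outside QEMU. Below is a toy sketch: all names are invented, and a real generated_co_wrapper checks qemu_in_coroutine() and, outside coroutine context, spawns a coroutine and polls instead of flipping a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a generated_co_wrapper: the "coroutine context" here is
 * just a flag standing in for qemu_in_coroutine(). */
static bool in_coroutine;
static int direct_calls, dispatched_calls;

/* The pure coroutine_fn version: must only run in coroutine context. */
static int co_is_allocated(void)
{
    assert(in_coroutine);
    direct_calls++;
    return 1;
}

/* The g_c_w-style wrapper: re-checks the context on every call. */
static int is_allocated(void)
{
    dispatched_calls++;
    if (in_coroutine) {
        return co_is_allocated();
    }
    /* The real wrapper would create a coroutine and poll until done. */
    in_coroutine = true;
    int ret = co_is_allocated();
    in_coroutine = false;
    return ret;
}
```

A caller that is itself marked coroutine_fn can call co_is_allocated() directly and skip the runtime dispatch, which is exactly the point of exposing a bdrv_co_is_allocated()-style wrapper.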

> Is this actually the scope of this series? I think switching this
> specific function and its callers or similar will require a lot of
> effort, and if I do it here it won't cover all the cases for sure.
> 
> Wouldn't it be better to do these kinds of things in a different series
> using Paolo's vrc tool?

I'm not sure what the scope of this series is, because you already do
introduce new wrappers in other patches of the series. I assumed it's
just to improve the situation a little, with no claim of being
exhaustive.

Finding and fully converting all callers might indeed be a job for
something like vrc, but here I'm only looking at local consistency in
functions where you're adding coroutine_fn.

Kevin




[PATCH] vhost: mask VIRTIO_F_RING_RESET for vhost and vhost-user devices

2022-11-21 Thread Stefano Garzarella
Commit 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
enabled VIRTIO_F_RING_RESET by default for all virtio devices.

This feature is not currently emulated by QEMU, so for vhost and
vhost-user devices we need to make sure it is supported by the offloaded
device emulation (in-kernel or in another process).
To do this we need to add VIRTIO_F_RING_RESET to the features bitmap
passed to vhost_get_features(). This way it will be masked if the device
does not support it.
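
The masking that vhost_get_features() performs can be modeled in isolation. The sketch below is a simplified stand-in, not the QEMU function; the bit numbers match the virtio spec, but the list is illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define VHOST_INVALID_FEATURE_BIT ((unsigned)-1)

/* Feature bit numbers from the virtio spec (illustrative subset). */
enum {
    VIRTIO_RING_F_INDIRECT_DESC = 28,
    VIRTIO_RING_F_EVENT_IDX     = 29,
    VIRTIO_F_RING_RESET         = 40,
};

/* Per-device list of bits the backend must ack, sentinel-terminated. */
static const unsigned feature_bits[] = {
    VIRTIO_RING_F_INDIRECT_DESC,
    VIRTIO_RING_F_EVENT_IDX,
    VIRTIO_F_RING_RESET,
    VHOST_INVALID_FEATURE_BIT
};

/*
 * For every bit in the list, clear it unless the backend offered it;
 * bits absent from the list pass through untouched (QEMU emulates
 * those itself).
 */
static uint64_t mask_features(const unsigned *bits, uint64_t backend_features,
                              uint64_t features)
{
    for (; *bits != VHOST_INVALID_FEATURE_BIT; bits++) {
        uint64_t mask = 1ULL << *bits;
        if (!(backend_features & mask)) {
            features &= ~mask;
        }
    }
    return features;
}
```

Because bits absent from the list pass through unmasked, forgetting to list VIRTIO_F_RING_RESET means the guest-negotiated bit reaches a backend that never acked it, which is precisely the failure described above.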

This issue was initially discovered with vhost-vsock and vhost-user-vsock,
and then also tested with vhost-user-rng which confirmed the same issue.
They fail when sending features through VHOST_SET_FEATURES ioctl or
VHOST_USER_SET_FEATURES message, since VIRTIO_F_RING_RESET is negotiated
by the guest (Linux >= v6.0), but not supported by the device.

Fixes: 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1318
Signed-off-by: Stefano Garzarella 
---

To prevent this problem in the future, perhaps we should provide a function
(e.g. vhost_device_get_features) in which we mask all non-device-specific
features (e.g. VIRTIO_F_*, VIRTIO_RING_F_*) that are not emulated by QEMU but
that we expect to be emulated by the vhost or vhost-user backend.
Then we can call it in all .get_features callbacks just before returning the
features.
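
For illustration, such a helper might look roughly like the sketch below; every name and the exact mask are assumptions drawn from this proposal, not existing QEMU API.

```c
#include <assert.h>
#include <stdint.h>

#define BIT64(n) (1ULL << (n))

/* Hypothetical set of transport-level features QEMU does not emulate for
 * vhost devices, so the offloaded backend must ack them (assumed subset:
 * VIRTIO_F_VERSION_1 is bit 32, VIRTIO_F_RING_RESET is bit 40). */
#define VHOST_COMMON_FEATURES (BIT64(32) | BIT64(40))

/*
 * Sketch of the proposed vhost_device_get_features(): keep a common
 * transport feature only if the backend supports it, and leave the
 * device-specific bits alone.
 */
static uint64_t vhost_device_get_features(uint64_t backend_features,
                                          uint64_t features)
{
    uint64_t common = features & VHOST_COMMON_FEATURES;

    features &= ~VHOST_COMMON_FEATURES;
    return features | (common & backend_features);
}
```

Each device's .get_features callback could then funnel its result through the one shared helper instead of repeating the transport bits in every per-device list.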

What do you think?

But maybe better to do that for the next release, I will send an RFC.

Thanks,
Stefano
---
 hw/block/vhost-user-blk.c  |  1 +
 hw/net/vhost_net.c |  1 +
 hw/scsi/vhost-scsi.c   |  1 +
 hw/scsi/vhost-user-scsi.c  |  1 +
 hw/virtio/vhost-user-fs.c  |  1 +
 hw/virtio/vhost-user-gpio.c|  1 +
 hw/virtio/vhost-user-i2c.c |  1 +
 hw/virtio/vhost-user-rng.c | 11 +--
 hw/virtio/vhost-vsock-common.c |  1 +
 net/vhost-vdpa.c   |  1 +
 10 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 16ad400889..0d5190accf 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -52,6 +52,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_NOTIFY_ON_EMPTY,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
+VIRTIO_F_RING_RESET,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index feda448878..26e4930676 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -75,6 +75,7 @@ static const int user_feature_bits[] = {
 VIRTIO_NET_F_MTU,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
+VIRTIO_F_RING_RESET,
 VIRTIO_NET_F_RSS,
 VIRTIO_NET_F_HASH_REPORT,
 
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index bdf337a7a2..6a0fd0dfb1 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
+VIRTIO_F_RING_RESET,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index bc37317d55..b7a71a802c 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
+VIRTIO_F_RING_RESET,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 1c40f42045..dc4014cdef 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -32,6 +32,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_NOTIFY_ON_EMPTY,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
+VIRTIO_F_RING_RESET,
 
 VHOST_INVALID_FEATURE_BIT
 };
diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
index 677d1c7730..5851cb3bc9 100644
--- a/hw/virtio/vhost-user-gpio.c
+++ b/hw/virtio/vhost-user-gpio.c
@@ -24,6 +24,7 @@ static const int feature_bits[] = {
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_GPIO_F_IRQ,
+VIRTIO_F_RING_RESET,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-i2c.c b/hw/virtio/vhost-user-i2c.c
index 864eba695e..1c9f3d20dc 100644
--- a/hw/virtio/vhost-user-i2c.c
+++ b/hw/virtio/vhost-user-i2c.c
@@ -16,6 +16,7 @@
 
 static const int feature_bits[] = {
 VIRTIO_I2C_F_ZERO_LENGTH_REQUEST,
+VIRTIO_F_RING_RESET,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-rng.c b/hw/virtio/vhost-user-rng.c
index 8b47287875..f9084cde58 100644
--- a/hw/virtio/vhost-user-rng.c
+++ b/hw/virtio/vhost-user-rng.c
@@ -16,6 +16,11 @@
 #include "qemu/error-report.h"
 #include "standard-headers/linux/virtio_ids.h"
 
+static const int feature_bits[] = {
+VIRTIO_F_RING_RESET,
+VHOST_INVALID_FEATURE_BIT
+};
+
 static void vu_rng_start(VirtIODevice *vdev)
 {
 VHostUserRNG *rng = VHOST_USER_RNG(vdev);

Re: [PATCH v4 01/11] block-copy: add missing coroutine_fn annotations

2022-11-21 Thread Emanuele Giuseppe Esposito



On 21/11/2022 at 09:32, Emanuele Giuseppe Esposito wrote:
> 
> 
> On 18/11/2022 at 20:05, Kevin Wolf wrote:
>> On 16.11.2022 at 13:22, Emanuele Giuseppe Esposito wrote:
>>> These functions end up calling bdrv_common_block_status_above(), a
>>> generated_co_wrapper function.
>>> In addition, they also happen to be always called in coroutine context,
>>> meaning all callers are coroutine_fn.
>>> This means that the g_c_w function will enter the qemu_in_coroutine()
>>> case and eventually suspend (or in other words call qemu_coroutine_yield()).
>>> Therefore we need to mark such functions coroutine_fn too.
>>>
>>> Signed-off-by: Emanuele Giuseppe Esposito 
>>
>> Ideally, we'd convert them to new wrappers bdrv_co_is_allocated() and
>> bdrv_co_block_status_above() instead of having to argue that they always
>> take the coroutine path in g_c_w.
>>
> 
> Ok so basically I should introduce bdrv_co_is_allocated, because so far
> in this and the next series I never thought about creating it.
> Since these functions will eventually be split anyway, I agree let's
> start doing this now.

Actually bdrv_is_allocated would be a g_c_w function in itself: it
calls another g_c_w and is probably called by functions that are or
will be g_c_w.
Is this actually the scope of this series? I think switching this
specific function and its callers or similar will require a lot of
effort, and if I do it here it won't cover all the cases for sure.

Wouldn't it be better to do these kinds of things in a different series
using Paolo's vrc tool?

> Thank you,
> Emanuele
> 




Re: [PATCH v4 01/11] block-copy: add missing coroutine_fn annotations

2022-11-21 Thread Emanuele Giuseppe Esposito



On 18/11/2022 at 20:05, Kevin Wolf wrote:
> On 16.11.2022 at 13:22, Emanuele Giuseppe Esposito wrote:
>> These functions end up calling bdrv_common_block_status_above(), a
>> generated_co_wrapper function.
>> In addition, they also happen to be always called in coroutine context,
>> meaning all callers are coroutine_fn.
>> This means that the g_c_w function will enter the qemu_in_coroutine()
>> case and eventually suspend (or in other words call qemu_coroutine_yield()).
>> Therefore we need to mark such functions coroutine_fn too.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito 
> 
> Ideally, we'd convert them to new wrappers bdrv_co_is_allocated() and
> bdrv_co_block_status_above() instead of having to argue that they always
> take the coroutine path in g_c_w.
> 

Ok so basically I should introduce bdrv_co_is_allocated, because so far
in this and the next series I never thought about creating it.
Since these functions will eventually be split anyway, I agree let's
start doing this now.

Thank you,
Emanuele