Re: [PULL 11/35] arm/Kconfig: Do not build TCG-only boards on a KVM-only build

2023-05-08 Thread Thomas Huth

On 04/05/2023 14.27, Fabiano Rosas wrote:

Thomas Huth  writes:


On 02/05/2023 14.14, Peter Maydell wrote:

From: Fabiano Rosas 

Move all the CONFIG_FOO=y from default.mak into "default y if TCG"
statements in Kconfig. That way they won't be selected when
CONFIG_TCG=n.

I'm leaving CONFIG_ARM_VIRT in default.mak because it allows us to
keep the two default.mak files not empty and keep aarch64-default.mak
including arm-default.mak. That way we don't surprise anyone that's
used to altering these files.

With this change we can start building with --disable-tcg.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Richard Henderson 
Message-id: 20230426180013.14814-12-faro...@suse.de
Signed-off-by: Peter Maydell 
---
   configs/devices/aarch64-softmmu/default.mak |  4 --
   configs/devices/arm-softmmu/default.mak | 37 --
   hw/arm/Kconfig  | 42 -
   3 files changed, 41 insertions(+), 42 deletions(-)

diff --git a/configs/devices/aarch64-softmmu/default.mak b/configs/devices/aarch64-softmmu/default.mak
index cf43ac8da11..70e05a197dc 100644
--- a/configs/devices/aarch64-softmmu/default.mak
+++ b/configs/devices/aarch64-softmmu/default.mak
@@ -2,7 +2,3 @@
   
   # We support all the 32 bit boards so need all their config

   include ../arm-softmmu/default.mak
-
-CONFIG_XLNX_ZYNQMP_ARM=y
-CONFIG_XLNX_VERSAL=y
-CONFIG_SBSA_REF=y
diff --git a/configs/devices/arm-softmmu/default.mak b/configs/devices/arm-softmmu/default.mak
index cb3e5aea657..647fbce88d3 100644
--- a/configs/devices/arm-softmmu/default.mak
+++ b/configs/devices/arm-softmmu/default.mak
@@ -4,40 +4,3 @@
   # CONFIG_TEST_DEVICES=n
   
   CONFIG_ARM_VIRT=y

-CONFIG_CUBIEBOARD=y
-CONFIG_EXYNOS4=y
-CONFIG_HIGHBANK=y
-CONFIG_INTEGRATOR=y
-CONFIG_FSL_IMX31=y
-CONFIG_MUSICPAL=y
-CONFIG_MUSCA=y
-CONFIG_CHEETAH=y
-CONFIG_SX1=y
-CONFIG_NSERIES=y
-CONFIG_STELLARIS=y
-CONFIG_STM32VLDISCOVERY=y
-CONFIG_REALVIEW=y
-CONFIG_VERSATILE=y
-CONFIG_VEXPRESS=y
-CONFIG_ZYNQ=y
-CONFIG_MAINSTONE=y
-CONFIG_GUMSTIX=y
-CONFIG_SPITZ=y
-CONFIG_TOSA=y
-CONFIG_Z2=y
-CONFIG_NPCM7XX=y
-CONFIG_COLLIE=y
-CONFIG_ASPEED_SOC=y
-CONFIG_NETDUINO2=y
-CONFIG_NETDUINOPLUS2=y
-CONFIG_OLIMEX_STM32_H405=y
-CONFIG_MPS2=y
-CONFIG_RASPI=y
-CONFIG_DIGIC=y
-CONFIG_SABRELITE=y
-CONFIG_EMCRAFT_SF2=y
-CONFIG_MICROBIT=y
-CONFIG_FSL_IMX25=y
-CONFIG_FSL_IMX7=y
-CONFIG_FSL_IMX6UL=y
-CONFIG_ALLWINNER_H3=y
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 87c1a29c912..2d7c4579559 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -35,20 +35,24 @@ config ARM_VIRT
   
   config CHEETAH

   bool
+default y if TCG && ARM
   select OMAP
   select TSC210X
   
   config CUBIEBOARD

   bool
+default y if TCG && ARM
   select ALLWINNER_A10

...

   Hi!

Sorry for not noticing this earlier, but I have to say that I really dislike
this change, since it very much changes the way we did our machine
configuration so far.
Until now, you could simply go to configs/devices/*-softmmu/*.mak and only
select the machines you wanted to have with "...=y" and delete everything
else. Now you have to know *all* the machines that you do *not* want to have
in your build and disable them with "...=n" in that file. That's quite ugly,
especially for the arm target that has so many machines. (ok, you could also
do a "--without-default-devices" configuration to get rid of the machines,
but that also disables all other kinds of devices that you then have to
specify manually).



Would leaving the CONFIGs as 'n', but commented out in the .mak files be
of any help? If I understand your use case, you were probably just
deleting the CONFIG=y for the boards you don't want. So now you'd be
uncommenting the CONFIG=n instead.

Alternatively, we could revert the .mak part of this change, convert
default.mak into tcg.mak and kvm.mak, and use those transparently
depending on whether --disable-tcg is present in the configure line.

But there's probably a better way still that I'm not seeing here, let's
see what others think.


I pondered it for a while, but I also don't have a better solution, so
yes, I guess that "# CONFIG_xxx=n" idea is likely still the best solution
right now.


 Thomas





Re: [PATCH RESEND] vhost: fix possible wrap in SVQ descriptor ring

2023-05-08 Thread Hawkins Jiawei
Hi Eugenio,
Thanks for reviewing.

On 2023/5/9 1:26, Eugenio Perez Martin wrote:
> On Sat, May 6, 2023 at 5:01 PM Hawkins Jiawei  wrote:
>>
>> QEMU invokes vhost_svq_add() when adding a guest's element into SVQ.
>> In vhost_svq_add(), it uses vhost_svq_available_slots() to check
>> whether QEMU can add the element into the SVQ. If there is
>> enough space, then QEMU combines some out descriptors and
>> some in descriptors into one descriptor chain, and adds it into
>> svq->vring.desc by vhost_svq_vring_write_descs().
>>
>> Yet the problem is that `svq->shadow_avail_idx - svq->shadow_used_idx`
>> in vhost_svq_available_slots() returns the number of occupied elements,
>> or the number of descriptor chains, instead of the number of occupied
>> descriptors, which may cause wrapping in the SVQ descriptor ring.
>>
>> Here is an example. In vhost_handle_guest_kick(), QEMU forwards
>> as many available buffers as possible to the device by virtqueue_pop()
>> and vhost_svq_add_element(). virtqueue_pop() returns a guest's element,
>> and QEMU uses vhost_svq_add_element(), a wrapper around vhost_svq_add(),
>> to add this element into SVQ. If QEMU invokes virtqueue_pop() and
>> vhost_svq_add_element() `svq->vring.num` times, vhost_svq_available_slots()
>> thinks QEMU just ran out of slots and everything should work fine.
>> But in fact, virtqueue_pop() returns `svq->vring.num` elements, or
>> descriptor chains, which can span more than `svq->vring.num` descriptors
>> due to guest memory fragmentation, and this causes wrapping in the SVQ
>> descriptor ring.
>>
>
> The bug is valid even before marking the descriptors used. If the
> guest memory is fragmented, SVQ must add chains so it can try to add
> more descriptors than possible.

I will add this to the commit message in the v2 patch.

>
>> Therefore, this patch adds `num_free` field in VhostShadowVirtqueue
>> structure, updates this field in vhost_svq_add() and
>> vhost_svq_get_buf(), to record the number of free descriptors.
>> Then we can avoid wrap in SVQ descriptor ring by refactoring
>> vhost_svq_available_slots().
>>
>> Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding")
>> Signed-off-by: Hawkins Jiawei 
>> ---
>>   hw/virtio/vhost-shadow-virtqueue.c | 9 -
>>   hw/virtio/vhost-shadow-virtqueue.h | 3 +++
>>   2 files changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>> index 8361e70d1b..e1c6952b10 100644
>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>> @@ -68,7 +68,7 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp)
>>*/
>>   static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
>>   {
>> -return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx);
>> +return svq->num_free;
>>   }
>>
>>   /**
>> @@ -263,6 +263,9 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
>>   return -EINVAL;
>>   }
>>
>> +/* Update the size of SVQ vring free descriptors */
>> +svq->num_free -= ndescs;
>> +
>>   svq->desc_state[qemu_head].elem = elem;
>>   svq->desc_state[qemu_head].ndescs = ndescs;
>>   vhost_svq_kick(svq);
>> @@ -450,6 +453,9 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
>>   svq->desc_next[last_used_chain] = svq->free_head;
>>   svq->free_head = used_elem.id;
>>
>> +/* Update the size of SVQ vring free descriptors */
>
> No need for this comment.
>
> Apart from that,
>
> Acked-by: Eugenio Pérez 
>

Thanks for your suggestion. I will remove the
comment in v2 patch, with this tag on.


>> +svq->num_free += num;
>> +
>>   *len = used_elem.len;
>>   return g_steal_pointer(&svq->desc_state[used_elem.id].elem);
>>   }
>> @@ -659,6 +665,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
>>   svq->iova_tree = iova_tree;
>>
>>   svq->vring.num = virtio_queue_get_num(vdev, virtio_get_queue_index(vq));
>> +svq->num_free = svq->vring.num;
>>   driver_size = vhost_svq_driver_area_size(svq);
>>   device_size = vhost_svq_device_area_size(svq);
>>   svq->vring.desc = qemu_memalign(qemu_real_host_page_size(), driver_size);
>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
>> index 926a4897b1..6efe051a70 100644
>> --- a/hw/virtio/vhost-shadow-virtqueue.h
>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
>> @@ -107,6 +107,9 @@ typedef struct VhostShadowVirtqueue {
>>
>>   /* Next head to consume from the device */
>>   uint16_t last_used_idx;
>> +
>> +/* Size of SVQ vring free descriptors */
>> +uint16_t num_free;
>>   } VhostShadowVirtqueue;
>>
>>   bool vhost_svq_valid_features(uint64_t features, Error **errp);
>> --
>> 2.25.1
>>
>
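
The accounting problem discussed in this thread can be illustrated with a
small stand-alone model. This is only a sketch under stated assumptions: the
ToySvq type and the toy_* functions are hypothetical and are not QEMU code.
They only mirror the two ways of computing available slots, by in-flight
chains (the old check) and by free descriptors (the num_free field the patch
adds).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a shadow virtqueue; not QEMU's struct. */
typedef struct {
    uint16_t num;              /* ring size, in descriptors */
    uint16_t shadow_avail_idx; /* chains made available so far */
    uint16_t shadow_used_idx;  /* chains returned by the device so far */
    uint16_t num_free;         /* free descriptors (field added by the patch) */
} ToySvq;

void toy_svq_init(ToySvq *svq, uint16_t num)
{
    svq->num = num;
    svq->shadow_avail_idx = 0;
    svq->shadow_used_idx = 0;
    svq->num_free = num;
}

/* Old accounting: ring size minus the number of in-flight chains. */
uint16_t toy_slots_by_chains(const ToySvq *svq)
{
    return svq->num - (uint16_t)(svq->shadow_avail_idx - svq->shadow_used_idx);
}

/* New accounting: the number of free descriptors, tracked directly. */
uint16_t toy_slots_by_descs(const ToySvq *svq)
{
    return svq->num_free;
}

/* Add one chain made of ndescs descriptors; refuse if descriptors run out. */
bool toy_svq_add(ToySvq *svq, uint16_t ndescs)
{
    if (ndescs > toy_slots_by_descs(svq)) {
        return false;
    }
    svq->num_free -= ndescs;
    svq->shadow_avail_idx++;
    return true;
}

/* The device hands back one chain made of ndescs descriptors. */
void toy_svq_get_buf(ToySvq *svq, uint16_t ndescs)
{
    assert(ndescs <= svq->num - svq->num_free);
    svq->num_free += ndescs;
    svq->shadow_used_idx++;
}
```

With a ring of 4 descriptors and two chains of 2 descriptors each, the
chain-based count still reports 2 free slots although no descriptor is left,
while the descriptor-based count reports 0.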



Re: [PATCH v3 3/3] tests/qtest: Don't run cdrom boot tests if no accelerator is present

2023-05-08 Thread Thomas Huth

On 08/05/2023 20.16, Fabiano Rosas wrote:

On a build configured with --disable-tcg --enable-xen, it is possible
to produce a QEMU binary with neither TCG nor KVM support. Skip the cdrom
boot tests if that's the case.

Fixes: 0c1ae3ff9d ("tests/qtest: Fix tests when no KVM or TCG are present")
Signed-off-by: Fabiano Rosas 
---
  tests/qtest/cdrom-test.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/tests/qtest/cdrom-test.c b/tests/qtest/cdrom-test.c
index 26a2400181..31d3bacd8c 100644
--- a/tests/qtest/cdrom-test.c
+++ b/tests/qtest/cdrom-test.c
@@ -130,6 +130,11 @@ static void test_cdboot(gconstpointer data)
  
  static void add_x86_tests(void)

  {
+if (!qtest_has_accel("tcg") && !qtest_has_accel("kvm")) {
+g_test_skip("No KVM or TCG accelerator available, skipping boot tests");
+return;
+}
+
  qtest_add_data_func("cdrom/boot/default", "-cdrom ", test_cdboot);
  qtest_add_data_func("cdrom/boot/virtio-scsi",
  "-device virtio-scsi -device scsi-cd,drive=cdr "
@@ -176,6 +181,11 @@ static void add_x86_tests(void)
  
  static void add_s390x_tests(void)

  {
+if (!qtest_has_accel("tcg") && !qtest_has_accel("kvm")) {
+g_test_skip("No KVM or TCG accelerator available, skipping boot tests");
+return;
+}
+
  qtest_add_data_func("cdrom/boot/default", "-cdrom ", test_cdboot);
  qtest_add_data_func("cdrom/boot/virtio-scsi",
  "-device virtio-scsi -device scsi-cd,drive=cdr "


Reviewed-by: Thomas Huth 




Re: [PATCH 2/4] vhost-user: Interface for migration state transfer

2023-05-08 Thread Eugenio Perez Martin
On Mon, May 8, 2023 at 10:10 PM Stefan Hajnoczi  wrote:
>
> On Thu, Apr 20, 2023 at 03:29:44PM +0200, Eugenio Pérez wrote:
> > On Wed, 2023-04-19 at 07:21 -0400, Stefan Hajnoczi wrote:
> > > On Wed, 19 Apr 2023 at 07:10, Hanna Czenczek  wrote:
> > > > On 18.04.23 09:54, Eugenio Perez Martin wrote:
> > > > > On Mon, Apr 17, 2023 at 9:21 PM Stefan Hajnoczi 
> > > > > wrote:
> > > > > > On Mon, 17 Apr 2023 at 15:08, Eugenio Perez Martin 
> > > > > > 
> > > > > > wrote:
> > > > > > > On Mon, Apr 17, 2023 at 7:14 PM Stefan Hajnoczi 
> > > > > > > 
> > > > > > > wrote:
> > > > > > > > On Thu, Apr 13, 2023 at 12:14:24PM +0200, Eugenio Perez Martin
> > > > > > > > wrote:
> > > > > > > > > On Wed, Apr 12, 2023 at 11:06 PM Stefan Hajnoczi <
> > > > > > > > > stefa...@redhat.com> wrote:
> > > > > > > > > > On Tue, Apr 11, 2023 at 05:05:13PM +0200, Hanna Czenczek 
> > > > > > > > > > wrote:
> > > > > > > > > > > > So-called "internal" virtio-fs migration refers to transporting the
> > > > > > > > > > > > back-end's (virtiofsd's) state through qemu's migration stream.  To do
> > > > > > > > > > > > this, we need to be able to transfer virtiofsd's internal state to and
> > > > > > > > > > > > from virtiofsd.
> > > > > > > > > > > >
> > > > > > > > > > > > Because virtiofsd's internal state will not be too large, we believe it
> > > > > > > > > > > > is best to transfer it as a single binary blob after the streaming
> > > > > > > > > > > > phase.  Because this method should be useful to other vhost-user
> > > > > > > > > > > > implementations, too, it is introduced as a general-purpose addition to
> > > > > > > > > > > > the protocol, not limited to vhost-user-fs.
> > > > > > > > > > > >
> > > > > > > > > > > > These are the additions to the protocol:
> > > > > > > > > > > > - New vhost-user protocol feature VHOST_USER_PROTOCOL_F_MIGRATORY_STATE:
> > > > > > > > > > > >   This feature signals support for transferring state, and is added so
> > > > > > > > > > > >   that migration can fail early when the back-end has no support.
> > > > > > > > > > > >
> > > > > > > > > > > > - SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a pipe
> > > > > > > > > > > >   over which to transfer the state.  The front-end sends an FD to the
> > > > > > > > > > > >   back-end into/from which it can write/read its state, and the back-end
> > > > > > > > > > > >   can decide to either use it, or reply with a different FD for the
> > > > > > > > > > > >   front-end to override the front-end's choice.
> > > > > > > > > > > >   The front-end creates a simple pipe to transfer the state, but maybe
> > > > > > > > > > > >   the back-end already has an FD into/from which it has to write/read
> > > > > > > > > > > >   its state, in which case it will want to override the simple pipe.
> > > > > > > > > > > >   Conversely, maybe in the future we find a way to have the front-end
> > > > > > > > > > > >   get an immediate FD for the migration stream (in some cases), in which
> > > > > > > > > > > >   case we will want to send this to the back-end instead of creating a
> > > > > > > > > > > >   pipe.
> > > > > > > > > > > >   Hence the negotiation: If one side has a better idea than a plain
> > > > > > > > > > > >   pipe, we will want to use that.
> > > > > > > > > > > >
> > > > > > > > > > > > - CHECK_DEVICE_STATE: After the state has been transferred through the
> > > > > > > > > > > >   pipe (the end indicated by EOF), the front-end invokes this function
> > > > > > > > > > > >   to verify success.  There is no in-band way (through the pipe) to
> > > > > > > > > > > >   indicate failure, so we need to check explicitly.
> > > > > > > > > > > >
> > > > > > > > > > > > Once the transfer pipe has been established via SET_DEVICE_STATE_FD
> > > > > > > > > > > > (which includes establishing the direction of transfer and migration
> > > > > > > > > > > > phase), the sending side writes its data into the pipe, and the reading
> > > > > > > > > > > > side reads it until it sees an EOF.  Then, the front-end will check for
> > > > > > > > > >

[PATCH] Fix QEMU crash caused when NUMA nodes exceed available CPUs

2023-05-08 Thread Yin Wang
The command "qemu-system-riscv64 -machine virt
-m 2G -smp 1 -numa node,mem=1G -numa node,mem=1G"
triggers this crash, because it requests more NUMA nodes than CPUs.
This commit fixes the issue by adding a parameter check that rejects
such configurations.

Signed-off-by: Yin Wang 
---
 hw/core/numa.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index d8d36b16d8..ff249369be 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -168,6 +168,13 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
 numa_info[nodenr].present = true;
 max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
 ms->numa_state->num_nodes++;
+if (ms->smp.max_cpus < ms->numa_state->num_nodes) {
+error_setg(errp,
+   "Number of NUMA nodes:(%d)"
+   " is larger than number of CPUs:(%d)",
+   ms->numa_state->num_nodes, ms->smp.max_cpus);
+return;
+}
 }
 
 static
-- 
2.34.1
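
The check added by the patch above can be sketched in isolation as follows.
This is a minimal sketch, not the QEMU implementation: the function
toy_check_numa_nodes() and its buffer-based error reporting are hypothetical
stand-ins for the error_setg() call in the diff, and only the comparison
itself is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns true when the node count is acceptable.
 * On failure it writes a message into err, standing in for error_setg().
 */
bool toy_check_numa_nodes(unsigned int max_cpus, unsigned int num_nodes,
                          char *err, size_t err_len)
{
    assert(err != NULL && err_len > 0);

    /* Same comparison as the patch: more nodes than CPUs is rejected. */
    if (max_cpus < num_nodes) {
        snprintf(err, err_len,
                 "Number of NUMA nodes (%u) is larger than number of CPUs (%u)",
                 num_nodes, max_cpus);
        return false;
    }
    return true;
}
```

For the command line in the commit message ("-smp 1" with two "-numa node"
options), this corresponds to max_cpus = 1 and num_nodes = 2, which the check
rejects.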




Re: [PULL 00/13] Compression code patches

2023-05-08 Thread Richard Henderson

On 5/8/23 19:51, Juan Quintela wrote:

The following changes since commit 792f77f376adef944f9a03e601f6ad90c2f891b2:

   Merge tag 'pull-loongarch-20230506' of https://gitlab.com/gaosong/qemu into staging (2023-05-06 08:11:52 +0100)

are available in the Git repository at:

   https://gitlab.com/juan.quintela/qemu.git  tags/compression-code-pull-request

for you to fetch changes up to c323518a7aab1c01740a468671b7f2b517d3bca6:

   migration: Initialize and cleanup decompression in migration.c (2023-05-08 15:25:27 +0200)


Migration PULL request (20230508 edition, take 2)

Hi

This is just the compression bits of the Migration PULL request for
20230428.  Only change is that we don't run the compression tests by
default.

The problem already exists in the compression code.  The tests just show
that it doesn't work.

- Add migration tests for (old) compress migration code (lukas)
- Make compression code independent of ram.c (lukas)
- Move compression code into ram-compress.c (lukas)

Please apply, Juan.


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as
appropriate.


r~




Re: [PATCH 2/4] vhost-user: Interface for migration state transfer

2023-05-08 Thread Eugenio Perez Martin
On Mon, May 8, 2023 at 9:12 PM Stefan Hajnoczi  wrote:
>
> On Thu, Apr 20, 2023 at 03:27:51PM +0200, Eugenio Pérez wrote:
> > On Tue, 2023-04-18 at 16:40 -0400, Stefan Hajnoczi wrote:
> > > On Tue, 18 Apr 2023 at 14:31, Eugenio Perez Martin 
> > > wrote:
> > > > On Tue, Apr 18, 2023 at 7:59 PM Stefan Hajnoczi  
> > > > wrote:
> > > > > On Tue, Apr 18, 2023 at 10:09:30AM +0200, Eugenio Perez Martin wrote:
> > > > > > On Mon, Apr 17, 2023 at 9:33 PM Stefan Hajnoczi 
> > > > > > wrote:
> > > > > > > On Mon, 17 Apr 2023 at 15:10, Eugenio Perez Martin <
> > > > > > > epere...@redhat.com> wrote:
> > > > > > > > > On Mon, Apr 17, 2023 at 5:38 PM Stefan Hajnoczi wrote:
> > > > > > > > > On Thu, Apr 13, 2023 at 12:14:24PM +0200, Eugenio Perez Martin
> > > > > > > > > wrote:
> > > > > > > > > > On Wed, Apr 12, 2023 at 11:06 PM Stefan Hajnoczi <
> > > > > > > > > > stefa...@redhat.com> wrote:
> > > > > > > > > > > On Tue, Apr 11, 2023 at 05:05:13PM +0200, Hanna Czenczek
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > So-called "internal" virtio-fs migration refers to transporting the
> > > > > > > > > > > > > back-end's (virtiofsd's) state through qemu's migration stream.  To do
> > > > > > > > > > > > > this, we need to be able to transfer virtiofsd's internal state to and
> > > > > > > > > > > > > from virtiofsd.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Because virtiofsd's internal state will not be too large, we believe it
> > > > > > > > > > > > > is best to transfer it as a single binary blob after the streaming
> > > > > > > > > > > > > phase.  Because this method should be useful to other vhost-user
> > > > > > > > > > > > > implementations, too, it is introduced as a general-purpose addition to
> > > > > > > > > > > > > the protocol, not limited to vhost-user-fs.
> > > > > > > > > > > > >
> > > > > > > > > > > > > These are the additions to the protocol:
> > > > > > > > > > > > > - New vhost-user protocol feature VHOST_USER_PROTOCOL_F_MIGRATORY_STATE:
> > > > > > > > > > > > >   This feature signals support for transferring state, and is added so
> > > > > > > > > > > > >   that migration can fail early when the back-end has no support.
> > > > > > > > > > > > >
> > > > > > > > > > > > > - SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a pipe
> > > > > > > > > > > > >   over which to transfer the state.  The front-end sends an FD to the
> > > > > > > > > > > > >   back-end into/from which it can write/read its state, and the back-end
> > > > > > > > > > > > >   can decide to either use it, or reply with a different FD for the
> > > > > > > > > > > > >   front-end to override the front-end's choice.
> > > > > > > > > > > > >   The front-end creates a simple pipe to transfer the state, but maybe
> > > > > > > > > > > > >   the back-end already has an FD into/from which it has to write/read
> > > > > > > > > > > > >   its state, in which case it will want to override the simple pipe.
> > > > > > > > > > > > >   Conversely, maybe in the future we find a way to have the front-end
> > > > > > > > > > > > >   get an immediate FD for the migration stream (in some cases), in which
> > > > > > > > > > > > >   case we will want to send this to the back-end instead of creating a
> > > > > > > > > > > > >   pipe.
> > > > > > > > > > > > >   Hence the negotiation: If one side has a better idea than a plain
> > > > > > > > > > > > >   pipe, we will want to use that.
> > > > > > > > > > > > >
> > > > > > > > > > > > > - CHECK_DEVICE_STATE: After the state has been transferred through the
> > > > > > > > > > > > >   pipe (the end indicated by EOF), the front-end invokes this function
> > > > > > > > > > > > >   to verify success.  There is no in-band way (through the pipe) to
> > > > > > > > > > > > >   indicate failure, so we need to check explicitly.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Once the transfer pipe has been established via SET_DEVICE_STATE_FD
> > > > > > > > > > > > > (which includes establishing the direction of transfer and
Re: [PATCH] virtio-net: not enable vq reset feature unconditionally

2023-05-08 Thread Jason Wang
On Tue, May 9, 2023 at 1:32 AM Eugenio Perez Martin  wrote:
>
> On Mon, May 8, 2023 at 12:22 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, May 08, 2023 at 11:09:46AM +0200, Eugenio Perez Martin wrote:
> > > On Sat, May 6, 2023 at 4:25 AM Xuan Zhuo  
> > > wrote:
> > > >
> > > > > On Thu,  4 May 2023 12:14:47 +0200, Eugenio Pérez wrote:
> > > > > > The commit 93a97dc5200a ("virtio-net: enable vq reset feature")
> > > > > > unconditionally enables the vq reset feature as long as the device is
> > > > > > emulated. This makes it impossible to actually disable the feature, and
> > > > > > it causes migration problems from qemu versions earlier than 7.2.
> > > > > >
> > > > > > The entire final commit is unneeded, as the device system already
> > > > > > enables or disables the feature properly.
> > > > >
> > > > > This reverts commit 93a97dc5200a95e63b99cb625f20b7ae802ba413.
> > > > > Fixes: 93a97dc5200a ("virtio-net: enable vq reset feature")
> > > > > Signed-off-by: Eugenio Pérez 
> > > > >
> > > > > ---
> > > > > > Tested by checking feature bit at /sys/devices/pci.../virtio0/features
> > > > > enabling and disabling queue_reset virtio-net feature and vhost=on/off
> > > > > on net device backend.
> > > >
> > > > Do you mean that this feature cannot be closed?
> > > >
> > > > I tried to close in the guest, it was successful.
> > > >
> > >
> > > I'm not sure what you mean with close. If the device dataplane is
> > > emulated in qemu (vhost=off), I'm not able to make the device not
> > > offer it.
> > >
> > > > > In addition, in this case, could you try to fix the problem instead of
> > > > > reverting directly?
> > > >
> > >
> > > I'm not following this. The revert is not to always disable the feature.
> > >
> > > By default, the feature is enabled. If cmdline states queue_reset=on,
> > > the feature is enabled. That is true both before and after applying
> > > this patch.
> > >
> > > However, in qemu master, queue_reset=off keeps enabling this feature
> > > on the device. It happens that there is a commit explicitly doing
> > > that, so I'm reverting it.
> > >
> > > Let me know if that makes sense to you.
> > >
> > > Thanks!
> >
> >
> > question is this:
> >
> > DEFINE_PROP_BIT64("queue_reset", _state, _field, \
> >   VIRTIO_F_RING_RESET, true)
> >
> >
> >
> > don't we need compat for 7.2 and back for this property?
> >
>
> I think that part is already covered by commit 69e1c14aa222 ("virtio:
> core: vq reset feature negotation support"). In that regard, maybe we
> can simplify the patch message simply stating that queue_reset=off
> does not work.
>
> Thanks!

Ack

Thanks

>




Re: [PATCH v3 0/4] hw/arm/virt: Support dirty ring

2023-05-08 Thread Gavin Shan

Hi Paolo,

On 5/9/23 12:21 PM, Gavin Shan wrote:

This series intends to support the dirty ring for live migration on arm64. The
dirty ring uses a discrete buffer to track dirty pages. The arm64 particularity
is that a backup bitmap tracks dirty pages when there is no running-vcpu
context. It's known that the backup bitmap needs to be synchronized when the
KVM device "kvm-arm-gicv3" or "arm-its-kvm" has been enabled. The backup
bitmap is collected in the last stage of migration. The policy here is to
always enable the backup bitmap extension. This introduces the overhead of
synchronizing the backup bitmap in the last stage of migration even when those
two devices aren't used. However, the overhead should be very small and
acceptable. The benefit is to support future cases where those two devices are
used without modifying the code.

PATCH[1] add migration last stage indicator
PATCH[2] synchronize the backup bitmap in the last stage of migration
PATCH[3] add helper kvm_dirty_ring_init() to enable dirty ring
PATCH[4] enable dirty ring for arm64

v2: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg01342.html
v1: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg00434.html
RFCv1: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg00171.html

Testing
=======
(1) kvm-unit-tests/its-pending-migration and kvm-unit-tests/its-migration with
 dirty ring or normal dirty page tracking mechanism. All test cases passed.

 QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \
 ./its-pending-migration

 QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \
 ./its-migration

 QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \
 ./its-pending-migration

 QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \
 ./its-migration

(2) Combinations of migration, post-copy migration, e1000e and virtio-net
 devices. All test cases passed.

 -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown  \
 -device e1000e,bus=pcie.5,netdev=net0,mac=52:54:00:f1:26:a0

 -netdev tap,id=vnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
 -device virtio-net-pci,bus=pcie.6,netdev=vnet0,mac=52:54:00:f1:26:b0

Changelog
=========
v3:
   * Rebase for QEMU v8.1.0 (Gavin)
v2:
   * Drop PATCH[v1 1/6] to synchronize linux-headers (Gavin)
   * More restrictive comments about struct MemoryListener::log_sync_global (PeterX)
   * Always enable the backup bitmap extension (PeterM)
v1:
   * Combine two patches into one PATCH[v1 2/6] for the last stage indicator (PeterX)
   * Drop the secondary bitmap and use the original one directly (Juan)
   * Avoid "goto out" in helper kvm_dirty_ring_init() (Juan)

Gavin Shan (4):
   migration: Add last stage indicator to global dirty log
   kvm: Synchronize the backup bitmap in the last stage
   kvm: Add helper kvm_dirty_ring_init()
   kvm: Enable dirty ring for arm64

  accel/kvm/kvm-all.c  | 108 ---
  include/exec/memory.h|   7 ++-
  include/sysemu/kvm_int.h |   1 +
  migration/dirtyrate.c|   4 +-
  migration/ram.c  |  20 
  softmmu/memory.c |  10 ++--
  6 files changed, 101 insertions(+), 49 deletions(-)



Could you please take a look and queue this series for QEMU v8.1 if it looks
good? Peter Maydell suggested [1] merging the v2 series for QEMU v8.1. There is
no difference between v2 and v3 except for the rebase conflicts fixed in v3.

[1] https://lists.nongnu.org/archive/html/qemu-arm/2023-03/msg00551.html

Thanks,
Gavin




[PATCH v3 4/4] kvm: Enable dirty ring for arm64

2023-05-08 Thread Gavin Shan
arm64 uses a different capability from x86 to enable the dirty ring, which
is KVM_CAP_DIRTY_LOG_RING_ACQ_REL. Besides, arm64 also needs the backup
bitmap extension (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) when the 'kvm-arm-gicv3'
or 'arm-its-kvm' device is enabled. Here the extension is always enabled,
which introduces the unnecessary overhead of the last stage of dirty log
synchronization when those two devices aren't used, but the overhead should
be very small and acceptable. The benefit is to cover future cases where those
two devices are used without modifying the code.

Signed-off-by: Gavin Shan 
Reviewed-by: Juan Quintela 
Tested-by: Zhenyu Zhang 
---
 accel/kvm/kvm-all.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 5d0de9d0a8..7679f397ae 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1466,6 +1466,7 @@ static int kvm_dirty_ring_init(KVMState *s)
 {
 uint32_t ring_size = s->kvm_dirty_ring_size;
 uint64_t ring_bytes = ring_size * sizeof(struct kvm_dirty_gfn);
+unsigned int capability = KVM_CAP_DIRTY_LOG_RING;
 int ret;
 
 s->kvm_dirty_ring_size = 0;
@@ -1480,7 +1481,12 @@ static int kvm_dirty_ring_init(KVMState *s)
  * Read the max supported pages. Fall back to dirty logging mode
  * if the dirty ring isn't supported.
  */
-ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
+ret = kvm_vm_check_extension(s, capability);
+if (ret <= 0) {
+capability = KVM_CAP_DIRTY_LOG_RING_ACQ_REL;
+ret = kvm_vm_check_extension(s, capability);
+}
+
 if (ret <= 0) {
 warn_report("KVM dirty ring not available, using bitmap method");
 return 0;
@@ -1493,13 +1499,26 @@ static int kvm_dirty_ring_init(KVMState *s)
 return -EINVAL;
 }
 
-ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes);
+ret = kvm_vm_enable_cap(s, capability, 0, ring_bytes);
 if (ret) {
 error_report("Enabling of KVM dirty ring failed: %s. "
  "Suggested minimum value is 1024.", strerror(-ret));
 return -EIO;
 }
 
+/* Enable the backup bitmap if it is supported */
+ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP);
+if (ret > 0) {
+ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP, 0);
+if (ret) {
+error_report("Enabling of KVM dirty ring's backup bitmap failed: "
+ "%s. ", strerror(-ret));
+return -EIO;
+}
+
+s->kvm_dirty_ring_with_bitmap = true;
+}
+
 s->kvm_dirty_ring_size = ring_size;
 s->kvm_dirty_ring_bytes = ring_bytes;
 
-- 
2.23.0
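
The probing order in the patch above (try KVM_CAP_DIRTY_LOG_RING first, fall
back to KVM_CAP_DIRTY_LOG_RING_ACQ_REL, then enable the backup bitmap if the
host offers it) can be sketched as a stand-alone model. Everything named
Toy*/toy_* below is hypothetical: the enum values stand in for the real
KVM_CAP_* constants, and ToyHost replaces the kvm_vm_check_extension() query
with a lookup table. Ring size validation and the actual enable calls are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the KVM_CAP_* constants used by the patch. */
enum ToyCap {
    TOY_CAP_DIRTY_LOG_RING,
    TOY_CAP_DIRTY_LOG_RING_ACQ_REL,
    TOY_CAP_DIRTY_LOG_RING_WITH_BITMAP,
    TOY_CAP_COUNT
};

/* Hypothetical host model: value reported per capability, 0 if unsupported. */
typedef struct {
    int reported[TOY_CAP_COUNT];
} ToyHost;

typedef struct {
    bool ring_enabled;    /* false means: fall back to the bitmap method */
    enum ToyCap ring_cap; /* capability the ring was enabled with */
    bool with_bitmap;     /* backup bitmap extension enabled */
} ToyResult;

/* Models kvm_vm_check_extension(): a positive value means supported. */
static int toy_check_extension(const ToyHost *host, enum ToyCap cap)
{
    return host->reported[cap];
}

ToyResult toy_dirty_ring_init(const ToyHost *host)
{
    ToyResult res = { false, TOY_CAP_DIRTY_LOG_RING, false };
    enum ToyCap cap = TOY_CAP_DIRTY_LOG_RING;
    int ret;

    assert(host != NULL);

    /* First choice, then the acquire/release variant as the fallback. */
    ret = toy_check_extension(host, cap);
    if (ret <= 0) {
        cap = TOY_CAP_DIRTY_LOG_RING_ACQ_REL;
        ret = toy_check_extension(host, cap);
    }
    if (ret <= 0) {
        return res; /* no dirty ring: keep using the bitmap method */
    }

    res.ring_enabled = true;
    res.ring_cap = cap;

    /* The backup bitmap is enabled whenever the host supports it. */
    if (toy_check_extension(host, TOY_CAP_DIRTY_LOG_RING_WITH_BITMAP) > 0) {
        res.with_bitmap = true;
    }
    return res;
}
```

A host that reports only the ACQ_REL capability plus the bitmap extension ends
up with the ring enabled through the fallback and with_bitmap set. A host
reporting neither ring capability leaves ring_enabled false.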




[PATCH v3 2/4] kvm: Synchronize the backup bitmap in the last stage

2023-05-08 Thread Gavin Shan
In the last stage of live migration or memory slot removal, the
backup bitmap needs to be synchronized when it has been enabled.

Signed-off-by: Gavin Shan 
Reviewed-by: Peter Xu 
Tested-by: Zhenyu Zhang 
---
 accel/kvm/kvm-all.c  | 11 +++
 include/sysemu/kvm_int.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 870abad826..c3aaabf304 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1361,6 +1361,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
  */
 if (kvm_state->kvm_dirty_ring_size) {
 kvm_dirty_ring_reap_locked(kvm_state, NULL);
+if (kvm_state->kvm_dirty_ring_with_bitmap) {
+kvm_slot_sync_dirty_pages(mem);
+kvm_slot_get_dirty_log(kvm_state, mem);
+}
 } else {
 kvm_slot_get_dirty_log(kvm_state, mem);
 }
@@ -1582,6 +1586,12 @@ static void kvm_log_sync_global(MemoryListener *l, bool last_stage)
 mem = &kml->slots[i];
 if (mem->memory_size && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
 kvm_slot_sync_dirty_pages(mem);
+
+if (s->kvm_dirty_ring_with_bitmap && last_stage &&
+kvm_slot_get_dirty_log(s, mem)) {
+kvm_slot_sync_dirty_pages(mem);
+}
+
 /*
  * This is not needed by KVM_GET_DIRTY_LOG because the
  * ioctl will unconditionally overwrite the whole region.
@@ -3710,6 +3720,7 @@ static void kvm_accel_instance_init(Object *obj)
 s->kernel_irqchip_split = ON_OFF_AUTO_AUTO;
 /* KVM dirty ring is by default off */
 s->kvm_dirty_ring_size = 0;
+s->kvm_dirty_ring_with_bitmap = false;
 s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN;
 s->notify_window = 0;
 s->xen_version = 0;
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index a641c974ea..511b42bde5 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -115,6 +115,7 @@ struct KVMState
 } *as;
 uint64_t kvm_dirty_ring_bytes;  /* Size of the per-vcpu dirty ring */
 uint32_t kvm_dirty_ring_size;   /* Number of dirty GFNs per ring */
+bool kvm_dirty_ring_with_bitmap;
 struct KVMDirtyRingReaper reaper;
 NotifyVmexitOption notify_vmexit;
 uint32_t notify_window;
-- 
2.23.0




[PATCH v3 1/4] migration: Add last stage indicator to global dirty log

2023-05-08 Thread Gavin Shan
The global dirty log synchronization is used when KVM and dirty ring
are enabled. There is a particularity for ARM64 where the backup
bitmap is used to track dirty pages in non-running-vcpu situations.
It means the dirty ring works with the combination of ring buffer
and backup bitmap. The dirty bits in the backup bitmap need to be
collected in the last stage of live migration.

In order to identify the last stage of live migration and pass it
down, an extra parameter is added to the relevant functions and
callbacks. This last stage indicator isn't used until the dirty
ring is enabled in the subsequent patches.

No functional change intended.
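
As a rough standalone sketch of how the indicator is passed down (the
struct and function names below are hypothetical and only mirror the
shape of MemoryListener::log_sync_global, they are not the QEMU
definitions): the global sync walks the listeners and forwards
last_stage to each callback unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical listener; only the callback shape follows the patch. */
struct sketch_listener {
    void (*log_sync_global)(struct sketch_listener *l, bool last_stage);
    int syncs;            /* total number of syncs seen */
    int last_stage_syncs; /* syncs flagged as the last stage */
};

/* Example callback that just records what it was told. */
static void sketch_count_sync(struct sketch_listener *l, bool last_stage)
{
    l->syncs++;
    if (last_stage) {
        l->last_stage_syncs++;
    }
}

/* Forward the last stage indicator to every listener that has a callback. */
static void sketch_global_dirty_log_sync(struct sketch_listener **listeners,
                                         size_t n, bool last_stage)
{
    assert(listeners != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (listeners[i]->log_sync_global) {
            listeners[i]->log_sync_global(listeners[i], last_stage);
        }
    }
}
```

Callers that are not part of the final migration stage, such as the
dirty rate calculation, would simply pass false.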

Signed-off-by: Gavin Shan 
Reviewed-by: Peter Xu 
Tested-by: Zhenyu Zhang 
---
 accel/kvm/kvm-all.c   |  2 +-
 include/exec/memory.h |  7 +--
 migration/dirtyrate.c |  4 ++--
 migration/ram.c   | 20 ++--
 softmmu/memory.c  | 10 +-
 5 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index cf3a88d90e..870abad826 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1563,7 +1563,7 @@ static void kvm_log_sync(MemoryListener *listener,
 kvm_slots_unlock();
 }
 
-static void kvm_log_sync_global(MemoryListener *l)
+static void kvm_log_sync_global(MemoryListener *l, bool last_stage)
 {
 KVMMemoryListener *kml = container_of(l, KVMMemoryListener, listener);
 KVMState *s = kvm_state;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index e45ce6061f..df01b0ef8a 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -934,8 +934,11 @@ struct MemoryListener {
  * its @log_sync must be NULL.  Vice versa.
  *
  * @listener: The #MemoryListener.
+ * @last_stage: The last stage to synchronize the log during migration.
+ * The caller should guarantee that the synchronization with true for
+ * @last_stage is triggered once after all VCPUs have been stopped.
  */
-void (*log_sync_global)(MemoryListener *listener);
+void (*log_sync_global)(MemoryListener *listener, bool last_stage);
 
 /**
  * @log_clear:
@@ -2423,7 +2426,7 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr,
  *
  * Synchronizes the dirty page log for all address spaces.
  */
-void memory_global_dirty_log_sync(void);
+void memory_global_dirty_log_sync(bool last_stage);
 
 /**
  * memory_global_dirty_log_sync: synchronize the dirty log for all memory
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 180ba38c7a..486085a9cf 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -102,7 +102,7 @@ void global_dirty_log_change(unsigned int flag, bool start)
 static void global_dirty_log_sync(unsigned int flag, bool one_shot)
 {
 qemu_mutex_lock_iothread();
-memory_global_dirty_log_sync();
+memory_global_dirty_log_sync(false);
 if (one_shot) {
 memory_global_dirty_log_stop(flag);
 }
@@ -554,7 +554,7 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
  * skip it unconditionally and start dirty tracking
  * from 2'round of log sync
  */
-memory_global_dirty_log_sync();
+memory_global_dirty_log_sync(false);
 
 /*
  * reset page protect manually and unconditionally.
diff --git a/migration/ram.c b/migration/ram.c
index 5e7bf20ca5..6154e4f18b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1204,7 +1204,7 @@ static void migration_trigger_throttle(RAMState *rs)
 }
 }
 
-static void migration_bitmap_sync(RAMState *rs)
+static void migration_bitmap_sync(RAMState *rs, bool last_stage)
 {
 RAMBlock *block;
 int64_t end_time;
@@ -1216,7 +1216,7 @@ static void migration_bitmap_sync(RAMState *rs)
 }
 
 trace_migration_bitmap_sync_start();
-memory_global_dirty_log_sync();
+memory_global_dirty_log_sync(last_stage);
 
 qemu_mutex_lock(&rs->bitmap_mutex);
 WITH_RCU_READ_LOCK_GUARD() {
@@ -1251,7 +1251,7 @@ static void migration_bitmap_sync(RAMState *rs)
 }
 }
 
-static void migration_bitmap_sync_precopy(RAMState *rs)
+static void migration_bitmap_sync_precopy(RAMState *rs, bool last_stage)
 {
 Error *local_err = NULL;
 
@@ -1264,7 +1264,7 @@ static void migration_bitmap_sync_precopy(RAMState *rs)
 local_err = NULL;
 }
 
-migration_bitmap_sync(rs);
+migration_bitmap_sync(rs, last_stage);
 
 if (precopy_notify(PRECOPY_NOTIFY_AFTER_BITMAP_SYNC, &local_err)) {
 error_report_err(local_err);
@@ -2924,7 +2924,7 @@ void ram_postcopy_send_discard_bitmap(MigrationState *ms)
 RCU_READ_LOCK_GUARD();
 
 /* This should be our last sync, the src is now paused */
-migration_bitmap_sync(rs);
+migration_bitmap_sync(rs, false);
 
 /* Easiest way to make sure we don't resume in the middle of a host-page */
 rs->pss[RAM_CHANNEL_PRECOPY].last_sent_block = NULL;
@@ -3115,7 +3115,7 @@ static void ram_init_bitmaps(RAMState *rs)
 /* We don't use dirty log with backgrou

[PATCH v3 0/4] hw/arm/virt: Support dirty ring

2023-05-08 Thread Gavin Shan
This series adds dirty ring support for live migration on arm64. The
dirty ring uses a discrete buffer to track dirty pages. The arm64
particularity is that a backup bitmap tracks dirty pages when there is
no running vCPU context. The backup bitmap is known to need
synchronization when the KVM device "kvm-arm-gicv3" or "arm-its-kvm" has
been enabled, and it is collected in the last stage of migration. The
policy here is to always enable the backup bitmap extension. This
introduces the overhead of synchronizing the backup bitmap in the last
stage of migration even when those two devices aren't used, but that
overhead should be very small and acceptable. The benefit is that future
cases where those two devices are used are supported without modifying
the code.

PATCH[1] add migration last stage indicator
PATCH[2] synchronize the backup bitmap in the last stage of migration
PATCH[3] add helper kvm_dirty_ring_init() to enable dirty ring
PATCH[4] enable dirty ring for arm64

   v2: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg01342.html
   v1: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg00434.html
RFCv1: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg00171.html

Testing
=======
(1) kvm-unit-tests/its-pending-migration and kvm-unit-tests/its-migration with
dirty ring or normal dirty page tracking mechanism. All test cases passed.

QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \
./its-pending-migration

QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \
./its-migration

QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \
./its-pending-migration

QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \
./its-migration

(2) Combinations of migration, post-copy migration, e1000e and virtio-net
devices. All test cases passed.

-netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown  \
-device e1000e,bus=pcie.5,netdev=net0,mac=52:54:00:f1:26:a0

-netdev tap,id=vnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-device virtio-net-pci,bus=pcie.6,netdev=vnet0,mac=52:54:00:f1:26:b0

Changelog
=========
v3:
  * Rebase for QEMU v8.1.0                                                   (Gavin)
v2:
  * Drop PATCH[v1 1/6] to synchronize linux-headers                          (Gavin)
  * More restrictive comments about struct MemoryListener::log_sync_global   (PeterX)
  * Always enable the backup bitmap extension                                (PeterM)
v1:
  * Combine two patches into one PATCH[v1 2/6] for the last stage indicator  (PeterX)
  * Drop the secondary bitmap and use the original one directly              (Juan)
  * Avoid "goto out" in helper kvm_dirty_ring_init()                         (Juan)

Gavin Shan (4):
  migration: Add last stage indicator to global dirty log
  kvm: Synchronize the backup bitmap in the last stage
  kvm: Add helper kvm_dirty_ring_init()
  kvm: Enable dirty ring for arm64

 accel/kvm/kvm-all.c  | 108 ---
 include/exec/memory.h|   7 ++-
 include/sysemu/kvm_int.h |   1 +
 migration/dirtyrate.c|   4 +-
 migration/ram.c  |  20 
 softmmu/memory.c |  10 ++--
 6 files changed, 101 insertions(+), 49 deletions(-)

-- 
2.23.0




[PATCH v3 3/4] kvm: Add helper kvm_dirty_ring_init()

2023-05-08 Thread Gavin Shan
Multiple capabilities are associated with the dirty ring on different
architectures: KVM_CAP_DIRTY_{LOG_RING, LOG_RING_ACQ_REL} for x86 and
arm64 respectively. More work will be needed to support the dirty ring
on arm64.

Let's add the helper kvm_dirty_ring_init() to enable the dirty ring. With
this, the code looks a bit cleaner.

No functional change intended.
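
The control flow of the helper can be sketched outside of QEMU as
follows. This is only an illustration under stated assumptions: struct
sketch_state, sketch_dirty_ring_init() and the 16-byte entry size are
hypothetical, and max_bytes stands in for the value the capability check
would return. The point is that the helper returns 0 both when the ring
is not requested and when it is unsupported (fall back to the bitmap),
and a negative errno only on a real error, so the caller needs a single
"ret < 0" check instead of nested branches.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of one ring entry, for this sketch only. */
#define SKETCH_ENTRY_BYTES 16u

/* Hypothetical state; only the two fields the helper touches. */
struct sketch_state {
    uint32_t ring_size;  /* requested number of entries per ring */
    uint64_t ring_bytes; /* resulting ring size in bytes */
};

/*
 * max_bytes plays the role of the capability check result: <= 0 means
 * the dirty ring is unsupported, otherwise it is the maximum ring size
 * in bytes.
 */
static int sketch_dirty_ring_init(struct sketch_state *s, int max_bytes)
{
    uint32_t ring_size;
    uint64_t ring_bytes;

    assert(s != NULL);
    ring_size = s->ring_size;
    ring_bytes = (uint64_t)ring_size * SKETCH_ENTRY_BYTES;

    /* Start from "disabled" and only commit the sizes on success. */
    s->ring_size = 0;
    s->ring_bytes = 0;

    if (!ring_size) {
        return 0;       /* dirty ring not requested */
    }
    if (max_bytes <= 0) {
        return 0;       /* unsupported: fall back to the bitmap method */
    }
    if (ring_bytes > (uint64_t)max_bytes) {
        return -EINVAL; /* requested ring is too big */
    }

    s->ring_size = ring_size;
    s->ring_bytes = ring_bytes;
    return 0;
}
```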

Signed-off-by: Gavin Shan 
Reviewed-by: Peter Xu 
Tested-by: Zhenyu Zhang 
---
 accel/kvm/kvm-all.c | 76 -
 1 file changed, 47 insertions(+), 29 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index c3aaabf304..5d0de9d0a8 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1462,6 +1462,50 @@ static int kvm_dirty_ring_reaper_init(KVMState *s)
 return 0;
 }
 
+static int kvm_dirty_ring_init(KVMState *s)
+{
+uint32_t ring_size = s->kvm_dirty_ring_size;
+uint64_t ring_bytes = ring_size * sizeof(struct kvm_dirty_gfn);
+int ret;
+
+s->kvm_dirty_ring_size = 0;
+s->kvm_dirty_ring_bytes = 0;
+
+/* Bail if the dirty ring size isn't specified */
+if (!ring_size) {
+return 0;
+}
+
+/*
+ * Read the max supported pages. Fall back to dirty logging mode
+ * if the dirty ring isn't supported.
+ */
+ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
+if (ret <= 0) {
+warn_report("KVM dirty ring not available, using bitmap method");
+return 0;
+}
+
+if (ring_bytes > ret) {
+error_report("KVM dirty ring size %" PRIu32 " too big "
+ "(maximum is %ld).  Please use a smaller value.",
+ ring_size, (long)ret / sizeof(struct kvm_dirty_gfn));
+return -EINVAL;
+}
+
+ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes);
+if (ret) {
+error_report("Enabling of KVM dirty ring failed: %s. "
+ "Suggested minimum value is 1024.", strerror(-ret));
+return -EIO;
+}
+
+s->kvm_dirty_ring_size = ring_size;
+s->kvm_dirty_ring_bytes = ring_bytes;
+
+return 0;
+}
+
 static void kvm_region_add(MemoryListener *listener,
MemoryRegionSection *section)
 {
@@ -2531,35 +2575,9 @@ static int kvm_init(MachineState *ms)
  * Enable KVM dirty ring if supported, otherwise fall back to
  * dirty logging mode
  */
-if (s->kvm_dirty_ring_size > 0) {
-uint64_t ring_bytes;
-
-ring_bytes = s->kvm_dirty_ring_size * sizeof(struct kvm_dirty_gfn);
-
-/* Read the max supported pages */
-ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
-if (ret > 0) {
-if (ring_bytes > ret) {
-error_report("KVM dirty ring size %" PRIu32 " too big "
- "(maximum is %ld).  Please use a smaller value.",
- s->kvm_dirty_ring_size,
- (long)ret / sizeof(struct kvm_dirty_gfn));
-ret = -EINVAL;
-goto err;
-}
-
-ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes);
-if (ret) {
-error_report("Enabling of KVM dirty ring failed: %s. "
- "Suggested minimum value is 1024.", strerror(-ret));
-goto err;
-}
-
-s->kvm_dirty_ring_bytes = ring_bytes;
- } else {
- warn_report("KVM dirty ring not available, using bitmap method");
- s->kvm_dirty_ring_size = 0;
-}
+ret = kvm_dirty_ring_init(s);
+if (ret < 0) {
+goto err;
 }
 
 /*
-- 
2.23.0




Re: [PATCH v10 1/8] memory: prevent dma-reentracy issues

2023-05-08 Thread Song Gao




On 2023/5/8 at 9:12 PM, Thomas Huth wrote:

On 08/05/2023 15.03, Song Gao wrote:

Hi, Thomas

On 2023/5/8 at 5:33 PM, Thomas Huth wrote:

On 06/05/2023 11.25, Song Gao wrote:

  Hi Alexander

On 2023/4/28 at 5:14 PM, Thomas Huth wrote:

On 28/04/2023 11.11, Alexander Bulekov wrote:

On 230428 1015, Thomas Huth wrote:

On 28/04/2023 10.12, Daniel P. Berrangé wrote:

On Thu, Apr 27, 2023 at 05:10:06PM -0400, Alexander Bulekov wrote:
Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.

...
This patch causes the loongarch virtual machine to fail to start the slave cpu.


 ./build/qemu-system-loongarch64 -machine virt -m 8G -cpu la464 \
     -smp 4 -bios QEMU_EFI.fd -kernel vmlinuz.efi -initrd ramdisk \
     -serial stdio -monitor telnet:localhost:4495,server,nowait \
     -append "root=/dev/ram rdinit=/sbin/init console=ttyS0,115200" --nographic




qemu-system-loongarch64: warning: Blocked re-entrant IO on MemoryRegion: loongarch_ipi_iocsr at addr: 0x24


Oh, another spot that needs special handling ... I see Alexander 
already sent a patch (thanks!), but anyway, this is a good 
indication that we're missing some test coverage in the CI.


Are there any loongarch kernel images available for public download
somewhere? If so, we really should add an avocado regression test
for this - since as far as I can see, we don't have any tests for
loongarch in tests/avocado yet?



We can get some binaries at:
https://github.com/yangxiaojuan-loongson/qemu-binary

I'm not sure that avocado testing can be done using just the kernel.

Is a full loongarch system required?


No, you don't need a full distro installation, just a kernel with 
ramdisk (which is also available there) is good enough for a basic 
test, e.g. just check whether the kernel boots to a certain point is 
good enough to provide a basic sanity test. If you then can also get 
even into a shell (of the ramdisk), you can check some additional 
stuff in the sysfs or "dmesg" output, see for example 
tests/avocado/machine_s390_ccw_virtio.py which does such checks with a 
kernel and initrd on s390x.



Thanks for your suggestion.

We will add a basic loongarch test to tests/avocado.

Thanks.
Song Gao




Re: ssl fips self check fails with 7.2.0 on x86 TCG

2023-05-08 Thread Patrick Venture
Verified it was https://gitlab.com/qemu-project/qemu/-/issues/1471

On Thu, May 4, 2023 at 12:03 PM Patrick Venture  wrote:

> Hi,
>
> I just finished rebasing my team onto 7.2.0 and now I'm seeing
> https://boringssl.googlesource.com/boringssl/+/master/crypto/fipsmodule/self_check/self_check.c#361
> fail.
>
> I applied
> https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg00260.html and
> it's still failing.
>
> Is anyone else seeing this issue or have suggestions on how to debug it?
>
> I haven't yet tried with 8.0.0 but that's my next step, although it also
> needs the float32_exp3 patch.
>
> Patrick
>


Re: RE: [PATCH] virtio-crypto: fix NULL pointer dereference in virtio_crypto_free_request

2023-05-08 Thread zhenwei pi




On 5/9/23 09:02, Gonglei (Arei) wrote:




-Original Message-
From: Mauro Matteo Cascella [mailto:mcasc...@redhat.com]
Sent: Monday, May 8, 2023 11:02 PM
To: qemu-devel@nongnu.org
Cc: m...@redhat.com; Gonglei (Arei) ;
pizhen...@bytedance.com; ta...@zju.edu.cn; mcasc...@redhat.com
Subject: [PATCH] virtio-crypto: fix NULL pointer dereference in
virtio_crypto_free_request

Ensure op_info is not NULL in case of QCRYPTODEV_BACKEND_ALG_SYM
algtype.

Fixes: 02ed3e7c ("virtio-crypto: zeroize the key material before free")


I have to say the Fixes tag is incorrect. The bug was introduced by commit
0e660a6f90a, which changed the semantic meaning of request->flag.

Regards,
-Gonglei



Hi Mauro

I agree with Lei. Could you please change the Fixes tag as Lei suggested?

--
zhenwei pi



RE: [PATCH] virtio-crypto: fix NULL pointer dereference in virtio_crypto_free_request

2023-05-08 Thread Gonglei (Arei)



> -Original Message-
> From: Mauro Matteo Cascella [mailto:mcasc...@redhat.com]
> Sent: Monday, May 8, 2023 11:02 PM
> To: qemu-devel@nongnu.org
> Cc: m...@redhat.com; Gonglei (Arei) ;
> pizhen...@bytedance.com; ta...@zju.edu.cn; mcasc...@redhat.com
> Subject: [PATCH] virtio-crypto: fix NULL pointer dereference in
> virtio_crypto_free_request
> 
> Ensure op_info is not NULL in case of QCRYPTODEV_BACKEND_ALG_SYM
> algtype.
> 
> Fixes: 02ed3e7c ("virtio-crypto: zeroize the key material before free")

I have to say the Fixes tag is incorrect. The bug was introduced by commit
0e660a6f90a, which changed the semantic meaning of request->flag.

Regards,
-Gonglei




Re: [PATCH v5 0/3] NUMA: Apply cluster-NUMA-node boundary for aarch64 and riscv machines

2023-05-08 Thread Gavin Shan

Hi Paolo,

On 5/9/23 10:27 AM, Gavin Shan wrote:

For the arm64 and riscv architectures, the driver (/base/arch_topology.c) is
used to populate the CPU topology in the Linux guest. It's required that
the CPUs in one cluster don't span multiple NUMA nodes. Otherwise, the Linux
scheduling domain can't be sorted out, as the following warning message
indicates. To avoid confusion, this series warns about such irregular
configurations.

-smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \
-numa node,nodeid=0,cpus=0-1,memdev=ram0\
-numa node,nodeid=1,cpus=2-3,memdev=ram1\
-numa node,nodeid=2,cpus=4-5,memdev=ram2\

[ cut here ]
WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 build_sched_domains+0x284/0x910
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1
pstate: 0045 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : build_sched_domains+0x284/0x910
lr : build_sched_domains+0x184/0x910
sp : 8804bd50
x29: 8804bd50 x28: 0002 x27: 
x26: 89cf9a80 x25:  x24: 89cbf840
x23: 80325000 x22: 005df800 x21: 8a4ce508
x20:  x19: 80324440 x18: 0014
x17: 388925c0 x16: 5386a066 x15: 9c10cc2e
x14: 01c0 x13: 0001 x12: 7fffb1a0
x11: 7fffb180 x10: 8a4ce508 x9 : 0041
x8 : 8a4ce500 x7 : 8a4cf920 x6 : 0001
x5 : 0001 x4 : 0007 x3 : 0002
x2 : 1000 x1 : 8a4cf928 x0 : 0001
Call trace:
 build_sched_domains+0x284/0x910
 sched_init_domains+0xac/0xe0
 sched_init_smp+0x48/0xc8
 kernel_init_freeable+0x140/0x1ac
 kernel_init+0x28/0x140
 ret_from_fork+0x10/0x20

PATCH[1] Warn about the irregular configuration if required
PATCH[2] Enable the validation for aarch64 machines
PATCH[3] Enable the validation for riscv machines

v4: https://lists.nongnu.org/archive/html/qemu-arm/2023-04/msg00232.html
v3: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg01226.html
v2: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg01080.html
v1: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg00886.html

Changelog
=========
v5:
   * Rebase for QEMU v8.1.0    (Gavin)
   * Pick ack-b's from Igor    (Gavin)


[...]



Gavin Shan (3):
   numa: Validate cluster and NUMA node boundary if required
   hw/arm: Validate cluster and NUMA node boundary
   hw/riscv: Validate cluster and NUMA node boundary

  hw/arm/sbsa-ref.c   |  2 ++
  hw/arm/virt.c   |  2 ++
  hw/core/machine.c   | 42 ++
  hw/riscv/spike.c|  2 ++
  hw/riscv/virt.c |  2 ++
  include/hw/boards.h |  1 +
  6 files changed, 51 insertions(+)



When v4 was reviewed by Igor, it was mentioned that you've been handling
hw/core/machine changes recently. Could you please help to queue this series
for QEMU v8.1 if it looks good to you?

Thanks,
Gavin




[PATCH v5 0/3] NUMA: Apply cluster-NUMA-node boundary for aarch64 and riscv machines

2023-05-08 Thread Gavin Shan
For the arm64 and riscv architectures, the driver (/base/arch_topology.c) is
used to populate the CPU topology in the Linux guest. It's required that
the CPUs in one cluster don't span multiple NUMA nodes. Otherwise, the Linux
scheduling domain can't be sorted out, as the following warning message
indicates. To avoid confusion, this series warns about such irregular
configurations.

   -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \
   -numa node,nodeid=0,cpus=0-1,memdev=ram0\
   -numa node,nodeid=1,cpus=2-3,memdev=ram1\
   -numa node,nodeid=2,cpus=4-5,memdev=ram2\

   [ cut here ]
   WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 build_sched_domains+0x284/0x910
   Modules linked in:
   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1
   pstate: 0045 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
   pc : build_sched_domains+0x284/0x910
   lr : build_sched_domains+0x184/0x910
   sp : 8804bd50
   x29: 8804bd50 x28: 0002 x27: 
   x26: 89cf9a80 x25:  x24: 89cbf840
   x23: 80325000 x22: 005df800 x21: 8a4ce508
   x20:  x19: 80324440 x18: 0014
   x17: 388925c0 x16: 5386a066 x15: 9c10cc2e
   x14: 01c0 x13: 0001 x12: 7fffb1a0
   x11: 7fffb180 x10: 8a4ce508 x9 : 0041
   x8 : 8a4ce500 x7 : 8a4cf920 x6 : 0001
   x5 : 0001 x4 : 0007 x3 : 0002
   x2 : 1000 x1 : 8a4cf928 x0 : 0001
   Call trace:
build_sched_domains+0x284/0x910
sched_init_domains+0xac/0xe0
sched_init_smp+0x48/0xc8
kernel_init_freeable+0x140/0x1ac
kernel_init+0x28/0x140
ret_from_fork+0x10/0x20

PATCH[1] Warn about the irregular configuration if required
PATCH[2] Enable the validation for aarch64 machines
PATCH[3] Enable the validation for riscv machines

v4: https://lists.nongnu.org/archive/html/qemu-arm/2023-04/msg00232.html
v3: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg01226.html
v2: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg01080.html
v1: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg00886.html

Changelog
=========
v5:
  * Rebase for QEMU v8.1.0                                    (Gavin)
  * Pick ack-b's from Igor                                    (Gavin)
v4:
  * Pick r-b and ack-b from Daniel/Philippe   (Gavin)
  * Replace local variable @len with possible_cpus->len in
validate_cpu_cluster_to_numa_boundary()   (Philippe)
v3:
  * Validate cluster-to-NUMA instead of socket-to-NUMA
boundary  (Gavin)
  * Move the switch from MachineState to MachineClass (Philippe)
  * Warning instead of rejecting the irregular configuration  (Daniel)
  * Comments to mention cluster-to-NUMA is platform instead
of architectural choice   (Drew)
  * Drop PATCH[v2 1/4] related to qtests/numa-test(Gavin)
v2:
  * Fix socket-NUMA-node boundary issues in qtests/numa-test  (Gavin)
  * Add helper set_numa_socket_boundary() and validate the
boundary in the generic path  (Philippe)

Gavin Shan (3):
  numa: Validate cluster and NUMA node boundary if required
  hw/arm: Validate cluster and NUMA node boundary
  hw/riscv: Validate cluster and NUMA node boundary

 hw/arm/sbsa-ref.c   |  2 ++
 hw/arm/virt.c   |  2 ++
 hw/core/machine.c   | 42 ++
 hw/riscv/spike.c|  2 ++
 hw/riscv/virt.c |  2 ++
 include/hw/boards.h |  1 +
 6 files changed, 51 insertions(+)

-- 
2.23.0




[PATCH v5 2/3] hw/arm: Validate cluster and NUMA node boundary

2023-05-08 Thread Gavin Shan
There are two NUMA-aware ARM machines: 'virt' and 'sbsa-ref'. Both of
them are required to follow the cluster-NUMA-node boundary. Enable the
validation to warn about the irregular configuration where multiple
CPUs in one cluster have been associated with different NUMA nodes.

Signed-off-by: Gavin Shan 
Acked-by: Igor Mammedov 
---
 hw/arm/sbsa-ref.c | 2 ++
 hw/arm/virt.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index 0b93558dde..efb380e7c8 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -864,6 +864,8 @@ static void sbsa_ref_class_init(ObjectClass *oc, void *data)
 mc->possible_cpu_arch_ids = sbsa_ref_possible_cpu_arch_ids;
 mc->cpu_index_to_instance_props = sbsa_ref_cpu_index_to_props;
 mc->get_default_cpu_node_id = sbsa_ref_get_default_cpu_node_id;
+/* platform instead of architectural choice */
+mc->cpu_cluster_has_numa_boundary = true;
 }
 
 static const TypeInfo sbsa_ref_info = {
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b99ae18501..5c88b78aab 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -3032,6 +3032,8 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
 mc->smp_props.clusters_supported = true;
 mc->auto_enable_numa_with_memhp = true;
 mc->auto_enable_numa_with_memdev = true;
+/* platform instead of architectural choice */
+mc->cpu_cluster_has_numa_boundary = true;
 mc->default_ram_id = "mach-virt.ram";
 
 object_class_property_add(oc, "acpi", "OnOffAuto",
-- 
2.23.0




[PATCH v5 3/3] hw/riscv: Validate cluster and NUMA node boundary

2023-05-08 Thread Gavin Shan
There are two NUMA-aware RISC-V machines: 'virt' and 'spike'. Both of
them are required to follow the cluster-NUMA-node boundary. Enable the
validation to warn about the irregular configuration where multiple
CPUs in one cluster have been associated with different NUMA nodes.

Signed-off-by: Gavin Shan 
Reviewed-by: Daniel Henrique Barboza 
Acked-by: Igor Mammedov 
---
 hw/riscv/spike.c | 2 ++
 hw/riscv/virt.c  | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c
index 2c5546560a..81f7e53aed 100644
--- a/hw/riscv/spike.c
+++ b/hw/riscv/spike.c
@@ -354,6 +354,8 @@ static void spike_machine_class_init(ObjectClass *oc, void *data)
 mc->cpu_index_to_instance_props = riscv_numa_cpu_index_to_props;
 mc->get_default_cpu_node_id = riscv_numa_get_default_cpu_node_id;
 mc->numa_mem_supported = true;
+/* platform instead of architectural choice */
+mc->cpu_cluster_has_numa_boundary = true;
 mc->default_ram_id = "riscv.spike.ram";
 object_class_property_add_str(oc, "signature", NULL, spike_set_signature);
 object_class_property_set_description(oc, "signature",
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 4e3efbee16..84a2bca460 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -1678,6 +1678,8 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
 mc->cpu_index_to_instance_props = riscv_numa_cpu_index_to_props;
 mc->get_default_cpu_node_id = riscv_numa_get_default_cpu_node_id;
 mc->numa_mem_supported = true;
+/* platform instead of architectural choice */
+mc->cpu_cluster_has_numa_boundary = true;
 mc->default_ram_id = "riscv_virt_board.ram";
 assert(!mc->get_hotplug_handler);
 mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
-- 
2.23.0




[PATCH v5 1/3] numa: Validate cluster and NUMA node boundary if required

2023-05-08 Thread Gavin Shan
For some architectures like ARM64, multiple CPUs in one cluster can be
associated with different NUMA nodes, which is an irregular configuration
because it doesn't occur in a bare-metal environment. The irregular
configuration causes the Linux guest to misbehave, as the following warning
messages indicate.

  -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \
  -numa node,nodeid=0,cpus=0-1,memdev=ram0\
  -numa node,nodeid=1,cpus=2-3,memdev=ram1\
  -numa node,nodeid=2,cpus=4-5,memdev=ram2\

  [ cut here ]
  WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 build_sched_domains+0x284/0x910
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1
  pstate: 0045 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : build_sched_domains+0x284/0x910
  lr : build_sched_domains+0x184/0x910
  sp : 8804bd50
  x29: 8804bd50 x28: 0002 x27: 
  x26: 89cf9a80 x25:  x24: 89cbf840
  x23: 80325000 x22: 005df800 x21: 8a4ce508
  x20:  x19: 80324440 x18: 0014
  x17: 388925c0 x16: 5386a066 x15: 9c10cc2e
  x14: 01c0 x13: 0001 x12: 7fffb1a0
  x11: 7fffb180 x10: 8a4ce508 x9 : 0041
  x8 : 8a4ce500 x7 : 8a4cf920 x6 : 0001
  x5 : 0001 x4 : 0007 x3 : 0002
  x2 : 1000 x1 : 8a4cf928 x0 : 0001
  Call trace:
   build_sched_domains+0x284/0x910
   sched_init_domains+0xac/0xe0
   sched_init_smp+0x48/0xc8
   kernel_init_freeable+0x140/0x1ac
   kernel_init+0x28/0x140
   ret_from_fork+0x10/0x20

Improve the situation by warning when multiple CPUs in one cluster have
been associated with different NUMA nodes. However, one NUMA node is
allowed to be associated with different clusters.
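
The check itself is a pairwise comparison, which can be sketched as a
standalone function. This is a simplified illustration, not the code in
the patch: struct sketch_cpu and sketch_count_cluster_numa_conflicts()
are hypothetical names, every CPU is assumed to carry valid socket,
cluster and node IDs, and the function counts conflicting pairs instead
of printing a warning for each one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU topology record, for this sketch only. */
struct sketch_cpu {
    int socket_id;
    int cluster_id;
    int node_id;
};

/*
 * Count the CPU pairs that share a socket and a cluster but have been
 * assigned to different NUMA nodes. CPUs from different clusters that
 * share a NUMA node are fine and are not counted.
 */
static size_t sketch_count_cluster_numa_conflicts(const struct sketch_cpu *cpus,
                                                  size_t len)
{
    size_t conflicts = 0;

    assert(cpus != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        for (size_t j = i + 1; j < len; j++) {
            if (cpus[i].socket_id == cpus[j].socket_id &&
                cpus[i].cluster_id == cpus[j].cluster_id &&
                cpus[i].node_id != cpus[j].node_id) {
                conflicts++;
            }
        }
    }

    return conflicts;
}
```

Assuming the command line above places CPUs 0-2 in socket 0 and CPUs
3-5 in socket 1, each with a single cluster, the sketch finds
conflicting pairs in both clusters, whereas assigning one NUMA node per
socket yields none.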

Signed-off-by: Gavin Shan 
Acked-by: Philippe Mathieu-Daudé 
Acked-by: Igor Mammedov 
---
 hw/core/machine.c   | 42 ++
 include/hw/boards.h |  1 +
 2 files changed, 43 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 47a34841a5..b718d89441 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1261,6 +1261,45 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
 g_string_free(s, true);
 }
 
+static void validate_cpu_cluster_to_numa_boundary(MachineState *ms)
+{
+MachineClass *mc = MACHINE_GET_CLASS(ms);
+NumaState *state = ms->numa_state;
+const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+const CPUArchId *cpus = possible_cpus->cpus;
+int i, j;
+
+if (state->num_nodes <= 1 || possible_cpus->len <= 1) {
+return;
+}
+
+/*
+ * The Linux scheduling domain can't be parsed when the multiple CPUs
+ * in one cluster have been associated with different NUMA nodes. However,
+ * it's fine to associate one NUMA node with CPUs in different clusters.
+ */
+for (i = 0; i < possible_cpus->len; i++) {
+for (j = i + 1; j < possible_cpus->len; j++) {
+if (cpus[i].props.has_socket_id &&
+cpus[i].props.has_cluster_id &&
+cpus[i].props.has_node_id &&
+cpus[j].props.has_socket_id &&
+cpus[j].props.has_cluster_id &&
+cpus[j].props.has_node_id &&
+cpus[i].props.socket_id == cpus[j].props.socket_id &&
+cpus[i].props.cluster_id == cpus[j].props.cluster_id &&
+cpus[i].props.node_id != cpus[j].props.node_id) {
+warn_report("CPU-%d and CPU-%d in socket-%ld-cluster-%ld "
+ "have been associated with node-%ld and node-%ld "
+ "respectively. It can cause OSes like Linux to "
+ "misbehave", i, j, cpus[i].props.socket_id,
+ cpus[i].props.cluster_id, cpus[i].props.node_id,
+ cpus[j].props.node_id);
+}
+}
+}
+}
+
 MemoryRegion *machine_consume_memdev(MachineState *machine,
  HostMemoryBackend *backend)
 {
@@ -1346,6 +1385,9 @@ void machine_run_board_init(MachineState *machine, const char *mem_path, Error *
 numa_complete_configuration(machine);
 if (machine->numa_state->num_nodes) {
 machine_numa_finish_cpu_init(machine);
+if (machine_class->cpu_cluster_has_numa_boundary) {
+validate_cpu_cluster_to_numa_boundary(machine);
+}
 }
 }
 
diff --git a/include/hw/boards.h b/include/hw/boards.h
index f4117fdb9a..f609cc9aed 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -273,6 +273,7 @@ struct MachineClass {
 bool nvdimm_supported;
 bool numa_mem_supported;
 bool a

Re: [PATCH 05/22] hw/arm: Select VIRTIO_NET for virt machine

2023-05-08 Thread Paolo Bonzini
On Thu, 4 May 2023 at 14:56, Fabiano Rosas wrote:

>
> It's a bit hard to maintain the original intention with just
> documentation. Couldn't we require that --without-default-devices always
> be accompanied by --with-devices?


Maybe, but why would it be bad to just patch the default .mak file?

> And more to the point of Peter's
> question, couldn't we just leave the defaults off unconditionally when
> --without-default-devices is passed without --with-devices?
>

No, for example RHEL adds a lot of devices and is perfectly usable without
--nodefaults, but we still use --without-default-devices because we want
any new config to be opt in, unless it's always needed.

> The coupling of -nodefaults with --without-default-devices is a bit
> redundant. If we're choosing to not build some devices, then the QEMU
> binary should already know that.
>

--without-default-devices is not about choosing to not build some devices;
it is about making non-selected devices opt-in rather than opt-out.

Paolo


> Just to be clear, -nodefaults by itself still makes sense because we can
> have a simple command line for those using QEMU directly while allowing
> the management layer to fine tune the devices.
>
> In the long run, I think we need to add some configure option that gives
> us pure allnoconfig so we can have that in the CI and catch these CONFIG
> issues before merging. There's no reason to merge a new CONFIG if it
> will then be impossible to turn it off.
>
>


Re: [PULL 11/35] arm/Kconfig: Do not build TCG-only boards on a KVM-only build

2023-05-08 Thread Paolo Bonzini
On Thu, May 4, 2023, 14:27 Fabiano Rosas  wrote:

> Thomas Huth  writes:
>
> > On 02/05/2023 14.14, Peter Maydell wrote:
> >> From: Fabiano Rosas 
> >>
> >> Move all the CONFIG_FOO=y from default.mak into "default y if TCG"
> >> statements in Kconfig. That way they won't be selected when
> >> CONFIG_TCG=n.
> >>
> >> I'm leaving CONFIG_ARM_VIRT in default.mak because it allows us to
> >> keep the two default.mak files not empty and keep aarch64-default.mak
> >> including arm-default.mak. That way we don't surprise anyone that's
> >> used to altering these files.
> >>
> >> With this change we can start building with --disable-tcg.
> >>
> >> Signed-off-by: Fabiano Rosas 
> >> Reviewed-by: Richard Henderson 
> >> Message-id: 20230426180013.14814-12-faro...@suse.de
> >> Signed-off-by: Peter Maydell 
> >> ---
> >>   configs/devices/aarch64-softmmu/default.mak |  4 --
> >>   configs/devices/arm-softmmu/default.mak | 37 --
> >>   hw/arm/Kconfig  | 42 -
> >>   3 files changed, 41 insertions(+), 42 deletions(-)
> >>
> >> diff --git a/configs/devices/aarch64-softmmu/default.mak b/configs/devices/aarch64-softmmu/default.mak
> >> index cf43ac8da11..70e05a197dc 100644
> >> --- a/configs/devices/aarch64-softmmu/default.mak
> >> +++ b/configs/devices/aarch64-softmmu/default.mak
> >> @@ -2,7 +2,3 @@
> >>
> >>   # We support all the 32 bit boards so need all their config
> >>   include ../arm-softmmu/default.mak
> >> -
> >> -CONFIG_XLNX_ZYNQMP_ARM=y
> >> -CONFIG_XLNX_VERSAL=y
> >> -CONFIG_SBSA_REF=y
> >> diff --git a/configs/devices/arm-softmmu/default.mak b/configs/devices/arm-softmmu/default.mak
> >> index cb3e5aea657..647fbce88d3 100644
> >> --- a/configs/devices/arm-softmmu/default.mak
> >> +++ b/configs/devices/arm-softmmu/default.mak
> >> @@ -4,40 +4,3 @@
> >>   # CONFIG_TEST_DEVICES=n
> >>
> >>   CONFIG_ARM_VIRT=y
> >> -CONFIG_CUBIEBOARD=y
> >> -CONFIG_EXYNOS4=y
> >> -CONFIG_HIGHBANK=y
> >> -CONFIG_INTEGRATOR=y
> >> -CONFIG_FSL_IMX31=y
> >> -CONFIG_MUSICPAL=y
> >> -CONFIG_MUSCA=y
> >> -CONFIG_CHEETAH=y
> >> -CONFIG_SX1=y
> >> -CONFIG_NSERIES=y
> >> -CONFIG_STELLARIS=y
> >> -CONFIG_STM32VLDISCOVERY=y
> >> -CONFIG_REALVIEW=y
> >> -CONFIG_VERSATILE=y
> >> -CONFIG_VEXPRESS=y
> >> -CONFIG_ZYNQ=y
> >> -CONFIG_MAINSTONE=y
> >> -CONFIG_GUMSTIX=y
> >> -CONFIG_SPITZ=y
> >> -CONFIG_TOSA=y
> >> -CONFIG_Z2=y
> >> -CONFIG_NPCM7XX=y
> >> -CONFIG_COLLIE=y
> >> -CONFIG_ASPEED_SOC=y
> >> -CONFIG_NETDUINO2=y
> >> -CONFIG_NETDUINOPLUS2=y
> >> -CONFIG_OLIMEX_STM32_H405=y
> >> -CONFIG_MPS2=y
> >> -CONFIG_RASPI=y
> >> -CONFIG_DIGIC=y
> >> -CONFIG_SABRELITE=y
> >> -CONFIG_EMCRAFT_SF2=y
> >> -CONFIG_MICROBIT=y
> >> -CONFIG_FSL_IMX25=y
> >> -CONFIG_FSL_IMX7=y
> >> -CONFIG_FSL_IMX6UL=y
> >> -CONFIG_ALLWINNER_H3=y
> >> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> >> index 87c1a29c912..2d7c4579559 100644
> >> --- a/hw/arm/Kconfig
> >> +++ b/hw/arm/Kconfig
> >> @@ -35,20 +35,24 @@ config ARM_VIRT
> >>
> >>   config CHEETAH
> >>   bool
> >> +default y if TCG && ARM
> >>   select OMAP
> >>   select TSC210X
> >>
> >>   config CUBIEBOARD
> >>   bool
> >> +default y if TCG && ARM
> >>   select ALLWINNER_A10
> > ...
> >
> >   Hi!
> >
> > Sorry for not noticing this earlier, but I have to say that I really dislike
> > this change, since it very much changes the way we did our machine
> > configuration so far.
> > Until now, you could simply go to configs/devices/*-softmmu/*.mak and only
> > select the machines you wanted to have with "...=y" and delete everything
> > else. Now you have to know *all* the machines that you do *not* want to have
> > in your build and disable them with "...=n" in that file. That's quite ugly,
> > especially for the arm target that has so many machines. (ok, you could also
> > do a "--without-default-devices" configuration to get rid of the machines,
> > but that also disables all other kinds of devices that you then have to
> > specify manually).
> >
>
> Would leaving the CONFIGs as 'n', but commented out in the .mak files be
> of any help? If I understand your use case, you were probably just
> deleting the CONFIG=y for the boards you don't want. So now you'd be
> uncommenting the CONFIG=n instead.


Yes, that would help, though it is likely to bitrot. I would also change the
"if TCG" part to "depends on TCG && ARM", which will break loudly if
someone sets the config to y with the wrong accelerator or in the wrong
file.

Once this is done for ARM we can extend it to other .mak files for
consistency.

Paolo


> Alternatively, we could revert the .mak part of this change, convert
> default.mak into tcg.mak and kvm.mak, and use those transparently
> depending on whether --disable-tcg is present in the configure line.
>
> But there's probably a better way still that I'm not seeing here, let's
> see what others think.
>
>


Re: [PATCH v20 11/21] qapi/s390x/cpu topology: CPU_POLARIZATION_CHANGE qapi event

2023-05-08 Thread Nina Schoetterl-Glausch
On Tue, 2023-04-25 at 18:14 +0200, Pierre Morel wrote:
> When the guest asks to change the polarization this change
> is forwarded to the upper layer using QAPI.
> The upper layer is supposed to take according decisions concerning
> CPU provisioning.
> 
> Signed-off-by: Pierre Morel 
> ---
>  qapi/machine-target.json | 33 +
>  hw/s390x/cpu-topology.c  |  2 ++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/qapi/machine-target.json b/qapi/machine-target.json
> index 3b7a0b77f4..ffde2e9cbd 100644
> --- a/qapi/machine-target.json
> +++ b/qapi/machine-target.json
> @@ -391,3 +391,36 @@
>'features': [ 'unstable' ],
>'if': { 'all': [ 'TARGET_S390X' , 'CONFIG_KVM' ] }
>  }
> +
> +##
> +# @CPU_POLARIZATION_CHANGE:
> +#
> +# Emitted when the guest asks to change the polarization.
> +#
> +# @polarization: polarization specified by the guest
> +#
> +# Features:
> +# @unstable: This command may still be modified.
> +#
> +# The guest can tell the host (via the PTF instruction) whether the
> +# CPUs should be provisioned using horizontal or vertical polarization.
> +#
> +# On horizontal polarization the host is expected to provision all vCPUs
> +# equally.
> +# On vertical polarization the host can provision each vCPU differently.
> +# The guest will get information on the details of the provisioning
> +# the next time it uses the STSI(15) instruction.
> +#
> +# Since: 8.1
> +#
> +# Example:
> +#
> +# <- { "event": "CPU_POLARIZATION_CHANGE",
> +#  "data": { "polarization": 0 },

I think you'd be getting "horizontal" instead of 0.

> +#  "timestamp": { "seconds": 1401385907, "microseconds": 422329 } }
> +##
> +{ 'event': 'CPU_POLARIZATION_CHANGE',
> +  'data': { 'polarization': 'CpuS390Polarization' },
> +  'features': [ 'unstable' ],
> +  'if': { 'all': [ 'TARGET_S390X', 'CONFIG_KVM' ] }
> +}
> diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
> index e5fb976594..e8b140d623 100644
> --- a/hw/s390x/cpu-topology.c
> +++ b/hw/s390x/cpu-topology.c
> @@ -17,6 +17,7 @@
>  #include "hw/s390x/s390-virtio-ccw.h"
>  #include "hw/s390x/cpu-topology.h"
>  #include "qapi/qapi-commands-machine-target.h"
> +#include "qapi/qapi-events-machine-target.h"
>  
>  /*
>   * s390_topology is used to keep the topology information.
> @@ -138,6 +139,7 @@ void s390_handle_ptf(S390CPU *cpu, uint8_t r1, uintptr_t ra)
>  } else {
>  s390_topology.vertical_polarization = !!fc;
>  s390_cpu_topology_set_changed(true);
> +qapi_event_send_cpu_polarization_change(fc);

I'm not sure I like the implicit conversion of the function code to the enum
value.
How about you do
qapi_event_send_cpu_polarization_change(s390_topology.polarization);
and rename vertical_polarization and change its type to the enum.
You can then also do

+CpuS390Polarization polarization = S390_CPU_POLARIZATION_HORIZONTAL;
+switch (fc) {
+case S390_CPU_POLARIZATION_VERTICAL:
+polarization = S390_CPU_POLARIZATION_VERTICAL;
+/* fallthrough */
+case S390_CPU_POLARIZATION_HORIZONTAL:
+if (s390_topology.polarization == polarization) {

and use the value for the assignment further down, too.
>  setcc(cpu, 0);
>  }
>  break;
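
As an illustration of the explicit mapping suggested above, here is a minimal
standalone sketch. The enum, the function-code constants and the helper
ptf_fc_to_polarization() are stand-ins invented for this example; they are not
the QAPI-generated names or code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the QAPI-generated enum; values assumed for illustration. */
typedef enum {
    POLARIZATION_HORIZONTAL = 0,
    POLARIZATION_VERTICAL = 1,
} Polarization;

/* Hypothetical PTF function codes, kept separate from the enum on purpose. */
#define PTF_FC_HORIZONTAL 0u
#define PTF_FC_VERTICAL   1u

/*
 * Map a guest-supplied function code to the enum explicitly.
 * Returns false for codes that do not name a polarization.
 */
static bool ptf_fc_to_polarization(uint64_t fc, Polarization *out)
{
    assert(out != NULL);
    switch (fc) {
    case PTF_FC_HORIZONTAL:
        *out = POLARIZATION_HORIZONTAL;
        return true;
    case PTF_FC_VERTICAL:
        *out = POLARIZATION_VERTICAL;
        return true;
    default:
        return false;
    }
}
```

With a helper of this shape, the event would be sent with the converted enum
value, so a later change to either numbering cannot silently alter what the
management layer sees.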




Re: [PATCH 11/11] cutils: Improve qemu_strtosz handling of fractions

2023-05-08 Thread Eric Blake
On Mon, May 08, 2023 at 03:03:43PM -0500, Eric Blake wrote:
> We have several limitations and bugs worth fixing; they are
> inter-related enough that it is not worth splitting this patch into
> smaller pieces:
> 
> * ".5k" should work to specify 512, just as "0.5k" does
> * "1.9999k" and "1." + "9"*50 + "k" should both produce the same
>   result of 2048 after rounding
> * "1." + "0"*350 + "1B" should not be treated the same as "1.0B";
>   underflow in the fraction should not be lost
> * "7.99e99" and "7.99e999" look similar, but our code was doing a
>   read-out-of-bounds on the latter because it was not expecting ERANGE
>   due to overflow. While we document that scientific notation is not
>   supported, and the previous patch actually fixed
>   qemu_strtod_finite() to no longer return ERANGE overflows, it is
>   easier to pre-filter than to try and determine after the fact if
>   strtod() consumed more than we wanted.  Note that this is a
>   low-level semantic change (when endptr is not NULL, we can now
>   successfully parse with a scale of 'E' and then report trailing
>   junk, instead of failing outright with EINVAL); but an earlier
>   commit already argued that this is not a high-level semantic change
>   since the only caller passing in a non-NULL endptr also checks that
>   the tail is whitespace-only.
> 
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1629

Also,

Fixes: cf923b78 ("utils: Improve qemu_strtosz() to have 64 bits of precision", 6.0.0)
Fixes: 7625a1ed ("utils: Use fixed-point arithmetic in qemu_strtosz", 6.0.0)

> Signed-off-by: Eric Blake 
> ---
>  tests/unit/test-cutils.c | 51 +++
>  util/cutils.c| 89 
>  2 files changed, 88 insertions(+), 52 deletions(-)
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v1 6/9] KVM: x86: Add Heki hypervisor support

2023-05-08 Thread Wei Liu
On Fri, May 05, 2023 at 05:20:43PM +0200, Mickaël Salaün wrote:
> From: Madhavan T. Venkataraman 
> 
> Each supported hypervisor in x86 implements a struct x86_hyper_init to
> define the init functions for the hypervisor.  Define a new init_heki()
> entry point in struct x86_hyper_init.  Hypervisors that support Heki
> must define this init_heki() function.  Call init_heki() of the chosen
> hypervisor in init_hypervisor_platform().
> 
> Create a heki_hypervisor structure that each hypervisor can fill
> with its data and functions. This will allow the Heki feature to work
> in a hypervisor agnostic way.
> 
> Declare and initialize a "heki_hypervisor" structure for KVM so KVM can
> support Heki.  Define the init_heki() function for KVM.  In init_heki(),
> set the hypervisor field in the generic "heki" structure to the KVM
> "heki_hypervisor".  After this point, generic Heki code can access the
> KVM Heki data and functions.
> 
[...]
> +static void kvm_init_heki(void)
> +{
> + long err;
> +
> + if (!kvm_para_available())
> + /* Cannot make KVM hypercalls. */
> + return;
> +
> + err = kvm_hypercall3(KVM_HC_LOCK_MEM_PAGE_RANGES, -1, -1, -1);

Why not do a proper version check or capability check here? If the ABI
or supported features ever change then we have something to rely on?

Thanks,
Wei.



Re: [PATCH 07/11] numa: Check for qemu_strtosz_MiB error

2023-05-08 Thread Eric Blake
On Mon, May 08, 2023 at 03:03:39PM -0500, Eric Blake wrote:
> As shown in the previous commit, qemu_strtosz_MiB sometimes leaves the
> result value untoutched (we have to audit further to learn that in

untouched

> that case, the QAPI generator says that visit_type_NumaOptions() will
> have zero-initialized it), and sometimes leaves it with the value of a
> partial parse before -EINVAL occurs because of trailing garbage.
> Rather than blindly treating any string the user may throw at us as
> valid, we should check for parse failures.
> 
> Fixes: cc001888 ("numa: fixup parsed NumaNodeOptions earlier", v2.11.0)
> Signed-off-by: Eric Blake 
> ---

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
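
The caller-side pattern the commit message argues for, sketched in standalone
C. parse_size_mib() is a simplified, hypothetical stand-in (plain decimal MiB
only), not qemu_strtosz_MiB(), and set_node_mem() is likewise invented for the
example; the point is only that the parsed value is used after the return code
has been checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a size parser: accepts a plain decimal number
 * of MiB and nothing else. On failure *result is left untouched.
 */
static int parse_size_mib(const char *str, uint64_t *result)
{
    char *end;
    unsigned long long mib;

    assert(result != NULL);
    if (str == NULL || *str < '0' || *str > '9') {
        return -EINVAL;
    }
    errno = 0;
    mib = strtoull(str, &end, 10);
    if (errno == ERANGE || mib > (UINT64_MAX >> 20)) {
        return -ERANGE;
    }
    if (*end != '\0') {
        return -EINVAL;   /* trailing garbage */
    }
    *result = (uint64_t)mib << 20;
    return 0;
}

/* Caller pattern: act on the return value, never on a possibly stale value. */
static int set_node_mem(const char *str, uint64_t *node_mem)
{
    uint64_t bytes;
    int ret = parse_size_mib(str, &bytes);

    if (ret < 0) {
        return ret;
    }
    *node_mem = bytes;
    return 0;
}
```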




Re: [PATCH v1 5/9] KVM: x86: Add new hypercall to lock control registers

2023-05-08 Thread Wei Liu
On Fri, May 05, 2023 at 05:20:42PM +0200, Mickaël Salaün wrote:
> This enables guests to lock their CR0 and CR4 registers with a subset of
> X86_CR0_WP, X86_CR4_SMEP, X86_CR4_SMAP, X86_CR4_UMIP, X86_CR4_FSGSBASE
> and X86_CR4_CET flags.
> 
> The new KVM_HC_LOCK_CR_UPDATE hypercall takes two arguments.  The first
> is to identify the control register, and the second is a bit mask to
> pin (i.e. mark as read-only).
> 
> These register flags should already be pinned by Linux guests, but once
> compromised, this self-protection mechanism could be disabled, which is
> not the case with this dedicated hypercall.
> 
> Cc: Borislav Petkov 
> Cc: Dave Hansen 
> Cc: H. Peter Anvin 
> Cc: Ingo Molnar 
> Cc: Kees Cook 
> Cc: Madhavan T. Venkataraman 
> Cc: Paolo Bonzini 
> Cc: Sean Christopherson 
> Cc: Thomas Gleixner 
> Cc: Vitaly Kuznetsov 
> Cc: Wanpeng Li 
> Signed-off-by: Mickaël Salaün 
> Link: https://lore.kernel.org/r/20230505152046.6575-6-...@digikod.net
[...]
>   hw_cr4 = (cr4_read_shadow() & X86_CR4_MCE) | (cr4 & ~X86_CR4_MCE);
>   if (is_unrestricted_guest(vcpu))
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ffab64d08de3..a529455359ac 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7927,11 +7927,77 @@ static unsigned long emulator_get_cr(struct x86_emulate_ctxt *ctxt, int cr)
>   return value;
>  }
>  
> +#ifdef CONFIG_HEKI
> +
> +extern unsigned long cr4_pinned_mask;
> +

Can this be moved to a header file?

> +static int heki_lock_cr(struct kvm *const kvm, const unsigned long cr,
> + unsigned long pin)
> +{
> + if (!pin)
> + return -KVM_EINVAL;
> +
> + switch (cr) {
> + case 0:
> + /* Cf. arch/x86/kernel/cpu/common.c */
> + if (!(pin & X86_CR0_WP))
> + return -KVM_EINVAL;
> +
> + if ((read_cr0() & pin) != pin)
> + return -KVM_EINVAL;
> +
> + atomic_long_or(pin, &kvm->heki_pinned_cr0);
> + return 0;
> + case 4:
> + /* Checks for irrelevant bits. */
> + if ((pin & cr4_pinned_mask) != pin)
> + return -KVM_EINVAL;
> +

It is enforcing the host mask on the guest, right? If the guest's set is a
superset of the host's then it will get rejected.


> + /* Ignores bits not present in host. */
> + pin &= __read_cr4();
> + atomic_long_or(pin, &kvm->heki_pinned_cr4);
> + return 0;
> + }
> + return -KVM_EINVAL;
> +}
> +
> +int heki_check_cr(const struct kvm *const kvm, const unsigned long cr,
> +   const unsigned long val)
> +{
> + unsigned long pinned;
> +
> + switch (cr) {
> + case 0:
> + pinned = atomic_long_read(&kvm->heki_pinned_cr0);
> + if ((val & pinned) != pinned) {
> + pr_warn_ratelimited(
> + "heki-kvm: Blocked CR0 update: 0x%lx\n", val);

I think if the message contains the VM and VCPU identifier it will
become more useful.

Thanks,
Wei.
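
To make the two checks discussed above concrete, here is a simplified userspace
sketch of the mask logic. struct pin_state, pin_request() and update_allowed()
are names invented for this example, and unlike the patch it uses no atomics.
The second test case in the usage note shows the behaviour questioned in the
review: a request that is a superset of the allowed mask is rejected outright.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit positions follow the x86 CR4 layout; names are local to this sketch. */
#define SK_CR4_SMEP (1UL << 20)
#define SK_CR4_SMAP (1UL << 21)

/* Hypothetical per-VM state: bits the guest asked to pin. */
struct pin_state {
    unsigned long pinned;
};

/* Reject requests naming bits outside the mask of pinnable bits. */
static bool pin_request(struct pin_state *st, unsigned long allowed,
                        unsigned long pin)
{
    assert(st != NULL);
    if (pin == 0 || (pin & allowed) != pin) {
        return false;
    }
    st->pinned |= pin;
    return true;
}

/* A register update is acceptable only if it keeps every pinned bit set. */
static bool update_allowed(const struct pin_state *st, unsigned long val)
{
    assert(st != NULL);
    return (val & st->pinned) == st->pinned;
}
```

For instance, pinning SMEP with an allowed mask of SMEP|SMAP succeeds, while
asking for SMEP|SMAP when only SMEP is allowed fails as a whole rather than
pinning the permitted subset.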



Re: [PATCH 0/4] vhost-user-fs: Internal migration

2023-05-08 Thread Stefan Hajnoczi
On Fri, May 05, 2023 at 02:51:55PM +0200, Hanna Czenczek wrote:
> On 05.05.23 11:53, Eugenio Perez Martin wrote:
> > On Fri, May 5, 2023 at 11:03 AM Hanna Czenczek  wrote:
> > > On 04.05.23 23:14, Stefan Hajnoczi wrote:
> > > > On Thu, 4 May 2023 at 13:39, Hanna Czenczek  wrote:
> 
> [...]
> 
> > > > All state is lost and the Device Initialization process
> > > > must be followed to make the device operational again.
> > > > 
> > > > Existing vhost-user backends don't implement SET_STATUS 0 (it's new).
> > > > 
> > > > It's messy and not your fault. I think QEMU should solve this by
> > > > treating stateful devices differently from non-stateful devices. That
> > > > way existing vhost-user backends continue to work and new stateful
> > > > devices can also be supported.
> > > It’s my understanding that SET_STATUS 0/RESET_DEVICE is problematic for
> > > stateful devices.  In a previous email, you wrote that these should
> > > implement SUSPEND+RESUME so qemu can use those instead.  But those are
> > > separate things, so I assume we just use SET_STATUS 0 when stopping the
> > > VM because this happens to also stop processing vrings as a side effect?
> > > 
> > > I.e. I understand “treating stateful devices differently” to mean that
> > > qemu should use SUSPEND+RESUME instead of SET_STATUS 0 when the back-end
> > > supports it, and stateful back-ends should support it.
> > > 
> > Honestly I cannot think of any use case where the vhost-user backend
> > did not ignore set_status(0) and had to retrieve vq states. So maybe
> > we can totally remove that call from qemu?
> 
> I don’t know so I can’t really say; but I don’t quite understand why qemu
> would reset a device at any point but perhaps VM reset (and even then I’d
> expect the post-reset guest to just reset the device on boot by itself,
> too).

DPDK stores the Device Status field value and uses it later:
https://github.com/DPDK/dpdk/blob/main/lib/vhost/vhost_user.c#L2791

While DPDK performs no immediate action upon SET_STATUS 0, omitting the
message will change the behavior of other DPDK code like
virtio_is_ready().

Changing the semantics of the vhost-user protocol in a way that's not
backwards compatible is something we should avoid unless there is no
other way.

The fundamental problem is that QEMU's vhost code is designed to reset
vhost devices because it assumes they are stateless. If an F_SUSPEND
protocol feature bit is added, then it becomes possible to detect new
backends and suspend/resume them rather than reset them.

That's the solution that I favor because it's backwards compatible and
the same model can be applied to stateful vDPA devices in the future.

Stefan
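
A rough sketch of the feature-bit approach described above. The bit number and
all names here are hypothetical; no F_SUSPEND value is assigned in this thread,
and this is not QEMU's vhost code. It only shows the shape of the decision: old
back-ends that do not advertise the bit keep getting the existing reset
behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit number; no such value is assigned in this thread. */
#define SKETCH_PROTOCOL_F_SUSPEND 63

static_assert(SKETCH_PROTOCOL_F_SUSPEND < 64,
              "bit must fit in a 64-bit feature mask");

typedef enum {
    STOP_BY_RESET,     /* legacy path: back-end treated as stateless */
    STOP_BY_SUSPEND,   /* new path: back-end keeps its state */
} StopMethod;

/* Pick the stop method from the negotiated protocol feature bits. */
static StopMethod choose_stop_method(uint64_t protocol_features)
{
    if (protocol_features & (UINT64_C(1) << SKETCH_PROTOCOL_F_SUSPEND)) {
        return STOP_BY_SUSPEND;
    }
    return STOP_BY_RESET;
}
```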




Re: [PATCH 2/4] vhost-user: Interface for migration state transfer

2023-05-08 Thread Stefan Hajnoczi
On Thu, Apr 20, 2023 at 03:29:44PM +0200, Eugenio Pérez wrote:
> On Wed, 2023-04-19 at 07:21 -0400, Stefan Hajnoczi wrote:
> > On Wed, 19 Apr 2023 at 07:10, Hanna Czenczek  wrote:
> > > On 18.04.23 09:54, Eugenio Perez Martin wrote:
> > > > On Mon, Apr 17, 2023 at 9:21 PM Stefan Hajnoczi wrote:
> > > > > On Mon, 17 Apr 2023 at 15:08, Eugenio Perez Martin wrote:
> > > > > > On Mon, Apr 17, 2023 at 7:14 PM Stefan Hajnoczi wrote:
> > > > > > > On Thu, Apr 13, 2023 at 12:14:24PM +0200, Eugenio Perez Martin wrote:
> > > > > > > > On Wed, Apr 12, 2023 at 11:06 PM Stefan Hajnoczi <stefa...@redhat.com> wrote:
> > > > > > > > > On Tue, Apr 11, 2023 at 05:05:13PM +0200, Hanna Czenczek wrote:
> > > > > > > > > > So-called "internal" virtio-fs migration refers to transporting the
> > > > > > > > > > back-end's (virtiofsd's) state through qemu's migration stream.  To do
> > > > > > > > > > this, we need to be able to transfer virtiofsd's internal state to and
> > > > > > > > > > from virtiofsd.
> > > > > > > > > >
> > > > > > > > > > Because virtiofsd's internal state will not be too large, we believe it
> > > > > > > > > > is best to transfer it as a single binary blob after the streaming
> > > > > > > > > > phase.  Because this method should be useful to other vhost-user
> > > > > > > > > > implementations, too, it is introduced as a general-purpose addition to
> > > > > > > > > > the protocol, not limited to vhost-user-fs.
> > > > > > > > > >
> > > > > > > > > > These are the additions to the protocol:
> > > > > > > > > > - New vhost-user protocol feature VHOST_USER_PROTOCOL_F_MIGRATORY_STATE:
> > > > > > > > > >   This feature signals support for transferring state, and is added so
> > > > > > > > > >   that migration can fail early when the back-end has no support.
> > > > > > > > > >
> > > > > > > > > > - SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a pipe
> > > > > > > > > >   over which to transfer the state.  The front-end sends an FD to the
> > > > > > > > > >   back-end into/from which it can write/read its state, and the back-end
> > > > > > > > > >   can decide to either use it, or reply with a different FD for the
> > > > > > > > > >   front-end to override the front-end's choice.
> > > > > > > > > >   The front-end creates a simple pipe to transfer the state, but maybe
> > > > > > > > > >   the back-end already has an FD into/from which it has to write/read
> > > > > > > > > >   its state, in which case it will want to override the simple pipe.
> > > > > > > > > >   Conversely, maybe in the future we find a way to have the front-end
> > > > > > > > > >   get an immediate FD for the migration stream (in some cases), in which
> > > > > > > > > >   case we will want to send this to the back-end instead of creating a
> > > > > > > > > >   pipe.
> > > > > > > > > >   Hence the negotiation: If one side has a better idea than a plain
> > > > > > > > > >   pipe, we will want to use that.
> > > > > > > > > >
> > > > > > > > > > - CHECK_DEVICE_STATE: After the state has been transferred through the
> > > > > > > > > >   pipe (the end indicated by EOF), the front-end invokes this function
> > > > > > > > > >   to verify success.  There is no in-band way (through the pipe) to
> > > > > > > > > >   indicate failure, so we need to check explicitly.
> > > > > > > > > >
> > > > > > > > > > Once the transfer pipe has been established via SET_DEVICE_STATE_FD
> > > > > > > > > > (which includes establishing the direction of transfer and migration
> > > > > > > > > > phase), the sending side writes its data into the pipe, and the reading
> > > > > > > > > > side reads it until it sees an EOF.  Then, the front-end will check for
> > > > > > > > > > success via CHECK_DEVICE_STATE, which on the destination side includes
> > > > > > > > > > checking for integrity (i.e. errors during deserialization).
> > > > > > > > > >
> > > > > > > > > > Suggested-by: Stefan Hajnoczi 
> > > > > > > > > > Signed-off-by: Hanna Czenczek 
> > > > > > > > > > ---
> > > > > > > > > >   include/hw/virtio/vhost-backend.h |  24 +
> > > > > > > > > >   include/hw/virtio/vhost.h |  79 

Re: [PATCH v1 3/9] virt: Implement Heki common code

2023-05-08 Thread Wei Liu
On Fri, May 05, 2023 at 05:20:40PM +0200, Mickaël Salaün wrote:
> From: Madhavan T. Venkataraman 
> 
> Hypervisor Enforced Kernel Integrity (Heki) is a feature that will use
> the hypervisor to enhance guest virtual machine security.
> 
> Configuration
> =============
> 
> Define the config variables for the feature. This feature depends on
> support from the architecture as well as the hypervisor.
> 
> Enabling HEKI
> =============
> 
> Define a kernel command line parameter "heki" to turn the feature on or
> off. By default, Heki is on.

For such a newfangled feature can we have it off by default? Especially
when there are unsolved issues around dynamically loaded code.

> 
[...]
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 3604074a878b..5cf5a7a97811 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -297,6 +297,7 @@ config X86
>   select FUNCTION_ALIGNMENT_4B
>   imply IMA_SECURE_AND_OR_TRUSTED_BOOT    if EFI
>   select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
> + select ARCH_SUPPORTS_HEKI   if X86_64

Why is there a restriction on X86_64?

>  
>  config INSTRUCTION_DECODER
>   def_bool y
> diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
> index a6e8373a5170..42ef1e33b8a5 100644
> --- a/arch/x86/include/asm/sections.h
> +++ b/arch/x86/include/asm/sections.h
[...]
>  
> +#ifdef CONFIG_HEKI
> +
> +/*
> + * Gather all of the statically defined sections so heki_late_init() can
> + * protect these sections in the host page table.
> + *
> + * The sections are defined under "SECTIONS" in vmlinux.lds.S
> + * Keep this array in sync with SECTIONS.
> + */

This seems a bit fragile, because it requires constant attention from
people who care about this functionality. Can this table be
automatically generated?

Thanks,
Wei.

> +struct heki_va_range __initdata heki_va_ranges[] = {
> + {
> + .va_start = _stext,
> + .va_end = _etext,
> + .attributes = HEKI_ATTR_MEM_NOWRITE | HEKI_ATTR_MEM_EXEC,
> + },
> + {
> + .va_start = __start_rodata,
> + .va_end = __end_rodata,
> + .attributes = HEKI_ATTR_MEM_NOWRITE,
> + },
> +#ifdef CONFIG_UNWINDER_ORC
> + {
> + .va_start = __start_orc_unwind_ip,
> + .va_end = __stop_orc_unwind_ip,
> + .attributes = HEKI_ATTR_MEM_NOWRITE,
> + },
> + {
> + .va_start = __start_orc_unwind,
> + .va_end = __stop_orc_unwind,
> + .attributes = HEKI_ATTR_MEM_NOWRITE,
> + },
> + {
> + .va_start = orc_lookup,
> + .va_end = orc_lookup_end,
> + .attributes = HEKI_ATTR_MEM_NOWRITE,
> + },
> +#endif /* CONFIG_UNWINDER_ORC */
> +};
> +



[PATCH 04/11] test-cutils: Add coverage of qemu_strtod

2023-05-08 Thread Eric Blake
Plenty more corner cases of strtod proper, but this covers the bulk of
what our wrappers do. In particular, it demonstrates the difference on
when *value is left uninitialized, which an upcoming patch will
normalize.

Signed-off-by: Eric Blake 
---
 tests/unit/test-cutils.c | 435 +++
 1 file changed, 435 insertions(+)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 1eeaf21ae22..4c096c6fc70 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -25,6 +25,8 @@
  * THE SOFTWARE.
  */

+#include <math.h>
+
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
 #include "qemu/units.h"
@@ -2044,6 +2046,414 @@ static void test_qemu_strtou64_full_max(void)
 g_free(str);
 }

+static void test_qemu_strtod_simple(void)
+{
+const char *str;
+const char *endptr;
+int err;
+double res;
+
+/* no radix or exponent */
+str = "1";
+endptr = NULL;
+res = 999;
+err = qemu_strtod(str, &endptr, &res);
+g_assert_cmpint(err, ==, 0);
+g_assert_cmpfloat(res, ==, 1.0);
+g_assert_true(endptr == str + 1);
+
+/* leading space and sign */
+str = " -0.0";
+endptr = NULL;
+res = 999;
+err = qemu_strtod(str, &endptr, &res);
+g_assert_cmpint(err, ==, 0);
+g_assert_cmpfloat(res, ==, -0.0);
+g_assert_true(signbit(res));
+g_assert_true(endptr == str + 5);
+
+/* fraction only */
+str = "+.5";
+endptr = NULL;
+res = 999;
+err = qemu_strtod(str, &endptr, &res);
+g_assert_cmpint(err, ==, 0);
+g_assert_cmpfloat(res, ==, 0.5);
+g_assert_true(endptr == str + 3);
+
+/* exponent */
+str = "1.e+1";
+endptr = NULL;
+res = 999;
+err = qemu_strtod(str, &endptr, &res);
+g_assert_cmpint(err, ==, 0);
+g_assert_cmpfloat(res, ==, 10.0);
+g_assert_true(endptr == str + 5);
+}
+
+static void test_qemu_strtod_einval(void)
+{
+const char *str;
+const char *endptr;
+int err;
+double res;
+
+/* empty */
+str = "";
+endptr = NULL;
+res = 999;
+err = qemu_strtod(str, &endptr, &res);
+g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_true(endptr == str);
+
+/* NULL */
+str = NULL;
+endptr = "random";
+res = 999;
+err = qemu_strtod(str, &endptr, &res);
+g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpfloat(res, ==, 999.0);
+g_assert_null(endptr);
+
+/* not recognizable */
+str = " junk";
+endptr = NULL;
+res = 999;
+err = qemu_strtod(str, &endptr, &res);
+g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_true(endptr == str);
+}
+
+static void test_qemu_strtod_erange(void)
+{
+const char *str;
+const char *endptr;
+int err;
+double res;
+
+/* overflow */
+str = "9e999";
+endptr = NULL;
+res = 999;
+err = qemu_strtod(str, &endptr, &res);
+g_assert_cmpint(err, ==, -ERANGE);
+g_assert_cmpfloat(res, ==, HUGE_VAL);
+g_assert_true(endptr == str + 5);
+
+str = "-9e+999";
+endptr = NULL;
+res = 999;
+err = qemu_strtod(str, &endptr, &res);
+g_assert_cmpint(err, ==, -ERANGE);
+g_assert_cmpfloat(res, ==, -HUGE_VAL);
+g_assert_true(endptr == str + 7);
+
+/* underflow */
+str = "-9e-999";
+endptr = NULL;
+res = 999;
+err = qemu_strtod(str, &endptr, &res);
+g_assert_cmpint(err, ==, -ERANGE);
+g_assert_cmpfloat(res, >=, -DBL_MIN);
+g_assert_cmpfloat(res, <=, -0.0);
+g_assert_true(signbit(res));
+g_assert_true(endptr == str + 7);
+}
+
+static void test_qemu_strtod_nonfinite(void)
+{
+const char *str;
+const char *endptr;
+int err;
+double res;
+
+/* infinity */
+str = "inf";
+endptr = NULL;
+res = 999;
+err = qemu_strtod(str, &endptr, &res);
+g_assert_cmpint(err, ==, 0);
+g_assert_true(isinf(res));
+g_assert_false(signbit(res));
+g_assert_true(endptr == str + 3);
+
+str = "-infinity";
+endptr = NULL;
+res = 999;
+err = qemu_strtod(str, &endptr, &res);
+g_assert_cmpint(err, ==, 0);
+g_assert_true(isinf(res));
+g_assert_true(signbit(res));
+g_assert_true(endptr == str + 9);
+
+/* not a number */
+str = " NaN";
+endptr = NULL;
+res = 999;
+err = qemu_strtod(str, &endptr, &res);
+g_assert_cmpint(err, ==, 0);
+g_assert_true(isnan(res));
+g_assert_true(endptr == str + 4);
+}
+
+static void test_qemu_strtod_trailing(void)
+{
+const char *str;
+const char *endptr;
+int err;
+double res;
+
+/* trailing whitespace */
+str = "1. ";
+endptr = NULL;
+res = 999;
+err = qemu_strtod(str, &endptr, &res);
+g_assert_cmpint(err, ==, 0);
+g_assert_cmpfloat(res, ==, 1.0);
+g_assert_true(endptr == str + 2);
+
+endptr = NULL;
+res = 999;
+err = qemu_strtod(str, NULL, &res);
+g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpfloat(res, ==, 

[PATCH 01/11] test-cutils: Avoid g_assert in unit tests

2023-05-08 Thread Eric Blake
glib documentation[1] is clear: g_assert() should be avoided in unit
tests because it is ineffective if G_DISABLE_ASSERT is defined; unit
tests should stick to constructs based on g_assert_true() instead.
Note that since commit 262a69f428, we intentionally state that you
cannot define G_DISABLE_ASSERT while building qemu; but our code
can be copied to other projects without that restriction, so we should
be consistent.

For most of the replacements in this patch, using g_assert_cmpstr()
would be a regression in quality - although it would helpfully display
the string contents of both pointers on test failure, here, we really
do care about pointer equality, not just string content equality.  But
when a NULL pointer is expected, g_assert_null works fine.

[1] https://libsoup.org/glib/glib-Testing.html#g-assert

Signed-off-by: Eric Blake 
---
 tests/unit/test-cutils.c | 324 +++
 1 file changed, 162 insertions(+), 162 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 3c4f8754202..0202ac0d5b3 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -1,7 +1,7 @@
 /*
  * cutils.c unit-tests
  *
- * Copyright (C) 2013 Red Hat Inc.
+ * Copyright Red Hat
  *
  * Authors:
  *  Eduardo Habkost 
@@ -40,7 +40,7 @@ static void test_parse_uint_null(void)

 g_assert_cmpint(r, ==, -EINVAL);
 g_assert_cmpint(i, ==, 0);
-g_assert(endptr == NULL);
+g_assert_null(endptr);
 }

 static void test_parse_uint_empty(void)
@@ -55,7 +55,7 @@ static void test_parse_uint_empty(void)

 g_assert_cmpint(r, ==, -EINVAL);
 g_assert_cmpint(i, ==, 0);
-g_assert(endptr == str);
+g_assert_true(endptr == str);
 }

 static void test_parse_uint_whitespace(void)
@@ -70,7 +70,7 @@ static void test_parse_uint_whitespace(void)

 g_assert_cmpint(r, ==, -EINVAL);
 g_assert_cmpint(i, ==, 0);
-g_assert(endptr == str);
+g_assert_true(endptr == str);
 }


@@ -86,7 +86,7 @@ static void test_parse_uint_invalid(void)

 g_assert_cmpint(r, ==, -EINVAL);
 g_assert_cmpint(i, ==, 0);
-g_assert(endptr == str);
+g_assert_true(endptr == str);
 }


@@ -102,7 +102,7 @@ static void test_parse_uint_trailing(void)

 g_assert_cmpint(r, ==, 0);
 g_assert_cmpint(i, ==, 123);
-g_assert(endptr == str + 3);
+g_assert_true(endptr == str + 3);
 }

 static void test_parse_uint_correct(void)
@@ -117,7 +117,7 @@ static void test_parse_uint_correct(void)

 g_assert_cmpint(r, ==, 0);
 g_assert_cmpint(i, ==, 123);
-g_assert(endptr == str + strlen(str));
+g_assert_true(endptr == str + strlen(str));
 }

 static void test_parse_uint_octal(void)
@@ -132,7 +132,7 @@ static void test_parse_uint_octal(void)

 g_assert_cmpint(r, ==, 0);
 g_assert_cmpint(i, ==, 0123);
-g_assert(endptr == str + strlen(str));
+g_assert_true(endptr == str + strlen(str));
 }

 static void test_parse_uint_decimal(void)
@@ -147,7 +147,7 @@ static void test_parse_uint_decimal(void)

 g_assert_cmpint(r, ==, 0);
 g_assert_cmpint(i, ==, 123);
-g_assert(endptr == str + strlen(str));
+g_assert_true(endptr == str + strlen(str));
 }


@@ -163,7 +163,7 @@ static void test_parse_uint_llong_max(void)

 g_assert_cmpint(r, ==, 0);
 g_assert_cmpint(i, ==, (unsigned long long)LLONG_MAX + 1);
-g_assert(endptr == str + strlen(str));
+g_assert_true(endptr == str + strlen(str));

 g_free(str);
 }
@@ -180,7 +180,7 @@ static void test_parse_uint_overflow(void)

 g_assert_cmpint(r, ==, -ERANGE);
 g_assert_cmpint(i, ==, ULLONG_MAX);
-g_assert(endptr == str + strlen(str));
+g_assert_true(endptr == str + strlen(str));
 }

 static void test_parse_uint_negative(void)
@@ -195,7 +195,7 @@ static void test_parse_uint_negative(void)

 g_assert_cmpint(r, ==, -ERANGE);
 g_assert_cmpint(i, ==, 0);
-g_assert(endptr == str + strlen(str));
+g_assert_true(endptr == str + strlen(str));
 }


@@ -235,7 +235,7 @@ static void test_qemu_strtoi_correct(void)

 g_assert_cmpint(err, ==, 0);
 g_assert_cmpint(res, ==, 12345);
-g_assert(endptr == str + 5);
+g_assert_true(endptr == str + 5);
 }

 static void test_qemu_strtoi_null(void)
@@ -248,7 +248,7 @@ static void test_qemu_strtoi_null(void)
 err = qemu_strtoi(NULL, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert(endptr == NULL);
+g_assert_null(endptr);
 }

 static void test_qemu_strtoi_empty(void)
@@ -262,7 +262,7 @@ static void test_qemu_strtoi_empty(void)
 err = qemu_strtoi(str, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert(endptr == str);
+g_assert_true(endptr == str);
 }

 static void test_qemu_strtoi_whitespace(void)
@@ -276,7 +276,7 @@ static void test_qemu_strtoi_whitespace(void)
 err = qemu_strtoi(str, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert(endptr == str);
+g_assert_true(endptr == str);
 }

 static void test_qemu_st

[PATCH 09/11] cutils: Set value in all integral qemu_strto* error paths

2023-05-08 Thread Eric Blake
Our goal in writing qemu_strtoi() and friends is to have an interface
harder to abuse than libc's strtol().  Leaving the return value
uninitialized on some error paths does not lend itself well to this
goal; and our documentation wasn't helpful on the matter.

Note that the previous patch changed all qemu_strtosz() EINVAL error
paths to slam value to 0 rather than stay uninitialized, even when the
EINVAL error occurs because of trailing junk.  But for the remaining
integral qemu_strto*, it's easier to return the parsed value than to
force things back to zero, in part because of how check_strtox_error
works; and doing so creates less churn in the testsuite.

Here, the list of affected callers is much longer ('git grep
"qemu_strto[ui]" *.c **/*.c | grep -v tests/ |wc -l' outputs 87,
although a few of those are the implementation in cutils.c), so
touching as little as possible is the wisest course of action.
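
As a rough illustration of the intended contract (every path stores
something in the result), here is a minimal sketch built on libc
strtol().  The name sketch_strtoi() is hypothetical, and this is not
the code in util/cutils.c, which goes through check_strtox_error():

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: parse an int and store a defined value in
 * *result on every path, including the error paths.
 */
static int sketch_strtoi(const char *nptr, const char **endptr, int base,
                         int *result)
{
    char *ep = (char *)nptr;
    long val;
    int saved;

    if (!nptr) {
        *result = 0;            /* NULL input: 0, never left untouched */
        if (endptr) {
            *endptr = NULL;
        }
        return -EINVAL;
    }

    errno = 0;
    val = strtol(nptr, &ep, base);
    saved = errno;

    /* Clamp to int, since long may be wider than int. */
    if (val < INT_MIN) {
        val = INT_MIN;
        saved = ERANGE;
    } else if (val > INT_MAX) {
        val = INT_MAX;
        saved = ERANGE;
    }
    *result = (int)val;         /* 0 when nothing was converted */

    if (endptr) {
        *endptr = ep;
    }
    if (ep == nptr) {
        return -EINVAL;         /* no conversion performed */
    }
    if (!endptr && *ep) {
        return -EINVAL;         /* trailing junk; prefix value is kept */
    }
    return saved == ERANGE ? -ERANGE : 0;
}
```

With this shape, a caller that ignores the return value still sees 0
for a NULL or empty string and the parsed prefix for "123xxx".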

Signed-off-by: Eric Blake 
---
 tests/unit/test-cutils.c | 24 +++
 util/cutils.c| 42 +---
 2 files changed, 38 insertions(+), 28 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 9cf00a810e4..2cb33e41ae4 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -250,7 +250,7 @@ static void test_qemu_strtoi_null(void)
 err = qemu_strtoi(NULL, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpint(res, ==, 999);
+g_assert_cmpint(res, ==, 0);
 g_assert_null(endptr);
 }

@@ -479,7 +479,7 @@ static void test_qemu_strtoi_full_null(void)
 err = qemu_strtoi(NULL, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpint(res, ==, 999);
+g_assert_cmpint(res, ==, 0);
 g_assert_null(endptr);
 }

@@ -557,7 +557,7 @@ static void test_qemu_strtoui_null(void)
 err = qemu_strtoui(NULL, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpuint(res, ==, 999);
+g_assert_cmpuint(res, ==, 0);
 g_assert_null(endptr);
 }

@@ -784,7 +784,7 @@ static void test_qemu_strtoui_full_null(void)
 err = qemu_strtoui(NULL, NULL, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpuint(res, ==, 999);
+g_assert_cmpuint(res, ==, 0);
 }

 static void test_qemu_strtoui_full_empty(void)
@@ -860,7 +860,7 @@ static void test_qemu_strtol_null(void)
 err = qemu_strtol(NULL, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpint(res, ==, 999);
+g_assert_cmpint(res, ==, 0);
 g_assert_null(endptr);
 }

@@ -1087,7 +1087,7 @@ static void test_qemu_strtol_full_null(void)
 err = qemu_strtol(NULL, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpint(res, ==, 999);
+g_assert_cmpint(res, ==, 0);
 g_assert_null(endptr);
 }

@@ -1165,7 +1165,7 @@ static void test_qemu_strtoul_null(void)
 err = qemu_strtoul(NULL, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpuint(res, ==, 999);
+g_assert_cmpuint(res, ==, 0);
 g_assert_null(endptr);
 }

@@ -1390,7 +1390,7 @@ static void test_qemu_strtoul_full_null(void)
 err = qemu_strtoul(NULL, NULL, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpuint(res, ==, 999);
+g_assert_cmpuint(res, ==, 0);
 }

 static void test_qemu_strtoul_full_empty(void)
@@ -1466,7 +1466,7 @@ static void test_qemu_strtoi64_null(void)
 err = qemu_strtoi64(NULL, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpint(res, ==, 999);
+g_assert_cmpint(res, ==, 0);
 g_assert_null(endptr);
 }

@@ -1691,7 +1691,7 @@ static void test_qemu_strtoi64_full_null(void)
 err = qemu_strtoi64(NULL, NULL, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpint(res, ==, 999);
+g_assert_cmpint(res, ==, 0);
 }

 static void test_qemu_strtoi64_full_empty(void)
@@ -1769,7 +1769,7 @@ static void test_qemu_strtou64_null(void)
 err = qemu_strtou64(NULL, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpuint(res, ==, 999);
+g_assert_cmpuint(res, ==, 0);
 g_assert_null(endptr);
 }

@@ -1994,7 +1994,7 @@ static void test_qemu_strtou64_full_null(void)
 err = qemu_strtou64(NULL, NULL, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpuint(res, ==, 999);
+g_assert_cmpuint(res, ==, 0);
 }

 static void test_qemu_strtou64_full_empty(void)
diff --git a/util/cutils.c b/util/cutils.c
index 8bacf349383..83948926ec9 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -384,12 +384,13 @@ static int check_strtox_error(const char *nptr, char *ep,
  *
  * @nptr may be null, and no conversion is performed then.
  *
- * If no conversion is performed, store @nptr in *@endptr and return
- * -EINVAL.
+ * If no conversion is performed, store @nptr in *@endptr, 0 in
+ * @result, and return -EINVAL.
  *
  * If @endptr is null, and the string isn't fully converted, return
- * -EINVAL.  This

[PATCH 03/11] test-cutils: Test integral qemu_strto* value on failures

2023-05-08 Thread Eric Blake
We are inconsistent on the contents of *value after a strto* parse
failure.  I found the following behaviors:

- parse_uint() and parse_uint_full(), which document that *value is
  slammed to 0 on all EINVAL failures and 0 or ULLONG_MAX on ERANGE
  failures, and have unit tests for that (note that parse_uint requires
  non-NULL endptr, and does not fail with EINVAL for trailing junk)

- qemu_strtosz(), which leaves *value untouched on all failures (both
  EINVAL and ERANGE), and has unit tests but not documentation for
  that

- qemu_strtoi() and other integral friends, which document *value on
  ERANGE failures but leave it unspecified on EINVAL (other than implicitly
  by comparison to libc strto*); there, *value is untouched for NULL
  string, slammed to 0 on no conversion, and left at the prefix value
  on NULL endptr; unit tests do not consistently check the value

- qemu_strtod(), which documents *value on ERANGE failures but leaves it
  unspecified on EINVAL; there, *value is untouched for NULL string,
  slammed to 0.0 for no conversion, and left at the prefix value on
  NULL endptr; there are no unit tests (other than indirectly through
  qemu_strtosz)

- qemu_strtod_finite(), which documents *value on ERANGE failures but
  leaves it unspecified on EINVAL; there, *value is left at the prefix for
  'inf' or 'nan' and untouched in all other cases; there are no unit
  tests (other than indirectly through qemu_strtosz)

Upcoming patches will change behaviors for consistency, but it's best
to first have more unit test coverage to see the impact of those
changes.
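
For reference, the libc baseline that the "prefix value" and "no
conversion" cases above are compared against can be sketched as
follows.  The function name is made up for illustration; only
standard strtol() behavior is exercised:

```c
#include <assert.h>
#include <stdlib.h>

/* Plain libc strtol() behavior that the qemu_strto* wrappers build on. */
static void demo_libc_strtol(void)
{
    const char *junk = "123xxx";
    const char *none = "xxx";
    char *end;
    long val;

    /* Trailing junk: the prefix is converted and end stops at the junk. */
    val = strtol(junk, &end, 0);
    assert(val == 123);
    assert(end == junk + 3);

    /* No conversion: the result is 0 and end points back at the input. */
    val = strtol(none, &end, 0);
    assert(val == 0);
    assert(end == none);
}
```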

Signed-off-by: Eric Blake 
---
 tests/unit/test-cutils.c | 58 +++-
 1 file changed, 51 insertions(+), 7 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 38bd3990207..1eeaf21ae22 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -248,6 +248,7 @@ static void test_qemu_strtoi_null(void)
 err = qemu_strtoi(NULL, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpint(res, ==, 999);
 g_assert_null(endptr);
 }

@@ -262,6 +263,7 @@ static void test_qemu_strtoi_empty(void)
 err = qemu_strtoi(str, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpint(res, ==, 0);
 g_assert_true(endptr == str);
 }

@@ -276,6 +278,7 @@ static void test_qemu_strtoi_whitespace(void)
 err = qemu_strtoi(str, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpint(res, ==, 0);
 g_assert_true(endptr == str);
 }

@@ -290,6 +293,7 @@ static void test_qemu_strtoi_invalid(void)
 err = qemu_strtoi(str, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpint(res, ==, 0);
 g_assert_true(endptr == str);
 }

@@ -473,6 +477,7 @@ static void test_qemu_strtoi_full_null(void)
 err = qemu_strtoi(NULL, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpint(res, ==, 999);
 g_assert_null(endptr);
 }

@@ -485,6 +490,7 @@ static void test_qemu_strtoi_full_empty(void)
 err = qemu_strtoi(str, NULL, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpint(res, ==, 0);
 }

 static void test_qemu_strtoi_full_negative(void)
@@ -502,18 +508,19 @@ static void test_qemu_strtoi_full_negative(void)
 static void test_qemu_strtoi_full_trailing(void)
 {
 const char *str = "123xxx";
-int res;
+int res = 999;
 int err;

 err = qemu_strtoi(str, NULL, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpint(res, ==, 123);
 }

 static void test_qemu_strtoi_full_max(void)
 {
 char *str = g_strdup_printf("%d", INT_MAX);
-int res;
+int res = 999;
 int err;

 err = qemu_strtoi(str, NULL, 0, &res);
@@ -548,6 +555,7 @@ static void test_qemu_strtoui_null(void)
 err = qemu_strtoui(NULL, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpuint(res, ==, 999);
 g_assert_null(endptr);
 }

@@ -562,6 +570,7 @@ static void test_qemu_strtoui_empty(void)
 err = qemu_strtoui(str, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpuint(res, ==, 0);
 g_assert_true(endptr == str);
 }

@@ -576,6 +585,7 @@ static void test_qemu_strtoui_whitespace(void)
 err = qemu_strtoui(str, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpuint(res, ==, 0);
 g_assert_true(endptr == str);
 }

@@ -590,6 +600,7 @@ static void test_qemu_strtoui_invalid(void)
 err = qemu_strtoui(str, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpuint(res, ==, 0);
 g_assert_true(endptr == str);
 }

@@ -771,6 +782,7 @@ static void test_qemu_strtoui_full_null(void)
 err = qemu_strtoui(NULL, NULL, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpuint(res, ==, 999);
 }

 static void test_qemu_strtoui_full_empty(void)
@@ -782,7 +794,9 @@ static void test_qemu_strtoui_full_empty(void)
 err = qemu_strtoui(str

[PATCH 02/11] test-cutils: Use g_assert_cmpuint where appropriate

2023-05-08 Thread Eric Blake
When debugging test failures, seeing unsigned values as large positive
values rather than negative values matters (assuming that the bug in
glib 2.76 [1] where g_assert_cmpuint displays signed instead of
unsigned values will eventually be fixed).  No impact when the test is
passing, but using a consistent style will matter more in upcoming
test additions.  Also, some tests are better with cmphex.

While at it, fix some spacing and minor typing issues spotted nearby.

[1] https://gitlab.gnome.org/GNOME/glib/-/issues/2997

Signed-off-by: Eric Blake 
---
 tests/unit/test-cutils.c | 148 +++
 1 file changed, 74 insertions(+), 74 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 0202ac0d5b3..38bd3990207 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -39,7 +39,7 @@ static void test_parse_uint_null(void)
 r = parse_uint(NULL, &i, &endptr, 0);

 g_assert_cmpint(r, ==, -EINVAL);
-g_assert_cmpint(i, ==, 0);
+g_assert_cmpuint(i, ==, 0);
 g_assert_null(endptr);
 }

@@ -54,7 +54,7 @@ static void test_parse_uint_empty(void)
 r = parse_uint(str, &i, &endptr, 0);

 g_assert_cmpint(r, ==, -EINVAL);
-g_assert_cmpint(i, ==, 0);
+g_assert_cmpuint(i, ==, 0);
 g_assert_true(endptr == str);
 }

@@ -69,7 +69,7 @@ static void test_parse_uint_whitespace(void)
 r = parse_uint(str, &i, &endptr, 0);

 g_assert_cmpint(r, ==, -EINVAL);
-g_assert_cmpint(i, ==, 0);
+g_assert_cmpuint(i, ==, 0);
 g_assert_true(endptr == str);
 }

@@ -85,7 +85,7 @@ static void test_parse_uint_invalid(void)
 r = parse_uint(str, &i, &endptr, 0);

 g_assert_cmpint(r, ==, -EINVAL);
-g_assert_cmpint(i, ==, 0);
+g_assert_cmpuint(i, ==, 0);
 g_assert_true(endptr == str);
 }

@@ -101,7 +101,7 @@ static void test_parse_uint_trailing(void)
 r = parse_uint(str, &i, &endptr, 0);

 g_assert_cmpint(r, ==, 0);
-g_assert_cmpint(i, ==, 123);
+g_assert_cmpuint(i, ==, 123);
 g_assert_true(endptr == str + 3);
 }

@@ -116,7 +116,7 @@ static void test_parse_uint_correct(void)
 r = parse_uint(str, &i, &endptr, 0);

 g_assert_cmpint(r, ==, 0);
-g_assert_cmpint(i, ==, 123);
+g_assert_cmpuint(i, ==, 123);
 g_assert_true(endptr == str + strlen(str));
 }

@@ -131,7 +131,7 @@ static void test_parse_uint_octal(void)
 r = parse_uint(str, &i, &endptr, 0);

 g_assert_cmpint(r, ==, 0);
-g_assert_cmpint(i, ==, 0123);
+g_assert_cmpuint(i, ==, 0123);
 g_assert_true(endptr == str + strlen(str));
 }

@@ -146,7 +146,7 @@ static void test_parse_uint_decimal(void)
 r = parse_uint(str, &i, &endptr, 10);

 g_assert_cmpint(r, ==, 0);
-g_assert_cmpint(i, ==, 123);
+g_assert_cmpuint(i, ==, 123);
 g_assert_true(endptr == str + strlen(str));
 }

@@ -162,7 +162,7 @@ static void test_parse_uint_llong_max(void)
 r = parse_uint(str, &i, &endptr, 0);

 g_assert_cmpint(r, ==, 0);
-g_assert_cmpint(i, ==, (unsigned long long)LLONG_MAX + 1);
+g_assert_cmpuint(i, ==, (unsigned long long)LLONG_MAX + 1);
 g_assert_true(endptr == str + strlen(str));

 g_free(str);
@@ -179,7 +179,7 @@ static void test_parse_uint_overflow(void)
 r = parse_uint(str, &i, &endptr, 0);

 g_assert_cmpint(r, ==, -ERANGE);
-g_assert_cmpint(i, ==, ULLONG_MAX);
+g_assert_cmpuint(i, ==, ULLONG_MAX);
 g_assert_true(endptr == str + strlen(str));
 }

@@ -194,7 +194,7 @@ static void test_parse_uint_negative(void)
 r = parse_uint(str, &i, &endptr, 0);

 g_assert_cmpint(r, ==, -ERANGE);
-g_assert_cmpint(i, ==, 0);
+g_assert_cmpuint(i, ==, 0);
 g_assert_true(endptr == str + strlen(str));
 }

@@ -208,7 +208,7 @@ static void test_parse_uint_full_trailing(void)
 r = parse_uint_full(str, &i, 0);

 g_assert_cmpint(r, ==, -EINVAL);
-g_assert_cmpint(i, ==, 0);
+g_assert_cmpuint(i, ==, 0);
 }

 static void test_parse_uint_full_correct(void)
@@ -220,7 +220,7 @@ static void test_parse_uint_full_correct(void)
 r = parse_uint_full(str, &i, 0);

 g_assert_cmpint(r, ==, 0);
-g_assert_cmpint(i, ==, 123);
+g_assert_cmpuint(i, ==, 123);
 }

 static void test_qemu_strtoi_correct(void)
@@ -428,7 +428,7 @@ static void test_qemu_strtoi_underflow(void)
 int res = 999;
 int err;

-err  = qemu_strtoi(str, &endptr, 0, &res);
+err = qemu_strtoi(str, &endptr, 0, &res);

 g_assert_cmpint(err, ==, -ERANGE);
 g_assert_cmpint(res, ==, INT_MIN);
@@ -479,10 +479,10 @@ static void test_qemu_strtoi_full_null(void)
 static void test_qemu_strtoi_full_empty(void)
 {
 const char *str = "";
-int res = 999L;
+int res = 999;
 int err;

-err =  qemu_strtoi(str, NULL, 0, &res);
+err = qemu_strtoi(str, NULL, 0, &res);

 g_assert_cmpint(err, ==, -EINVAL);
 }
@@ -728,7 +728,7 @@ static void test_qemu_strtoui_underflow(void)
 unsigned int res = 999;
 int err;

-err  = qemu_strtoui(str, &endptr,

[PATCH 08/11] cutils: Set value in all qemu_strtosz* error paths

2023-05-08 Thread Eric Blake
Making callers determine whether or not *value was populated on error
is not nice for usability.  Pre-patch, we have unit tests that check
that *result is left unchanged on most EINVAL errors and set to 0 on
many ERANGE errors.  This is subtly different from libc strtoumax()
behavior which returns UINT64_MAX on ERANGE errors, as well as
different from our parse_uint() which slams to 0 on EINVAL on the
grounds that we want our functions to be harder to mis-use than
strtoumax().

Let's audit callers:

- hw/core/numa.c:parse_numa() fixed in the previous patch to check for
  errors

- migration/migration-hmp-cmds.c:hmp_migrate_set_parameter(),
  monitor/hmp.c:monitor_parse_arguments(),
  qapi/opts-visitor.c:opts_type_size(),
  qapi/qobject-input-visitor.c:qobject_input_type_size_keyval(),
  qemu-img.c:cvtnum_full(), qemu-io-cmds.c:cvtnum(),
  target/i386/cpu.c:x86_cpu_parse_featurestr(), and
  util/qemu-option.c:parse_option_size() appear to reject all failures
  (although some with distinct messages for ERANGE as opposed to
  EINVAL), so it doesn't matter what is in the value parameter on
  error.

- All remaining callers are in the testsuite, where we can tweak our
  expectations to match our new desired behavior.

Advancing to the end of the string parsed on overflow (ERANGE), while
still returning 0, makes sense (UINT64_MAX as a size is unlikely to be
useful); likewise, our size parsing code is complex enough that it's
easier to always return 0 when endptr is NULL but trailing garbage was
found, rather than trying to return the value of the prefix actually
parsed (no current caller cared about the value of the prefix).
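
The libc behavior being contrasted here can be sketched with
strtoumax() directly.  The function name is hypothetical, and the
check uses UINTMAX_MAX, which equals UINT64_MAX only where uintmax_t
is 64 bits wide:

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>

/* libc reports overflow with ERANGE and a saturated return value. */
static void demo_strtoumax_overflow(void)
{
    const char *str = "99999999999999999999999999";
    char *end;
    uintmax_t val;

    errno = 0;
    val = strtoumax(str, &end, 10);
    assert(errno == ERANGE);
    assert(val == UINTMAX_MAX);   /* saturates rather than returning 0 */
    assert(*end == '\0');         /* the whole digit string was consumed */
}
```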

Signed-off-by: Eric Blake 
---
 tests/unit/test-cutils.c | 72 
 util/cutils.c| 17 +++---
 2 files changed, 48 insertions(+), 41 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 9fa6fb042e8..9cf00a810e4 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -2684,7 +2684,7 @@ static void test_qemu_strtosz_float(void)
 res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EINVAL /* FIXME 0 */);
-g_assert_cmpuint(res, ==, 0xbaadf00d /* FIXME 512 */);
+g_assert_cmpuint(res, ==, 0 /* FIXME 512 */);
 g_assert_true(endptr == str /* FIXME + 4 */);

 /* For convenience, we permit values that are not byte-exact */
@@ -2736,7 +2736,7 @@ static void test_qemu_strtosz_invalid(void)
 res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmphex(res, ==, 0xbaadf00d);
+g_assert_cmpuint(res, ==, 0);
 g_assert_null(endptr);

 str = "";
@@ -2744,7 +2744,7 @@ static void test_qemu_strtosz_invalid(void)
 res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmphex(res, ==, 0xbaadf00d);
+g_assert_cmpuint(res, ==, 0);
 g_assert_true(endptr == str);

 str = " \t ";
@@ -2752,7 +2752,7 @@ static void test_qemu_strtosz_invalid(void)
 res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmphex(res, ==, 0xbaadf00d);
+g_assert_cmpuint(res, ==, 0);
 g_assert_true(endptr == str);

 str = ".";
@@ -2760,14 +2760,14 @@ static void test_qemu_strtosz_invalid(void)
 res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmphex(res, ==, 0xbaadf00d);
+g_assert_cmpuint(res, ==, 0);
 g_assert(endptr == str);

 str = " .";
 endptr = NULL;
 err = qemu_strtosz(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmphex(res, ==, 0xbaadf00d);
+g_assert_cmpuint(res, ==, 0);
 g_assert(endptr == str);

 str = "crap";
@@ -2775,7 +2775,7 @@ static void test_qemu_strtosz_invalid(void)
 res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmphex(res, ==, 0xbaadf00d);
+g_assert_cmpuint(res, ==, 0);
 g_assert_true(endptr == str);

 str = "inf";
@@ -2783,7 +2783,7 @@ static void test_qemu_strtosz_invalid(void)
 res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmphex(res, ==, 0xbaadf00d);
+g_assert_cmpuint(res, ==, 0);
 g_assert_true(endptr == str);

 str = "NaN";
@@ -2791,7 +2791,7 @@ static void test_qemu_strtosz_invalid(void)
 res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmphex(res, ==, 0xbaadf00d);
+g_assert_cmpuint(res, ==, 0);
 g_assert_true(endptr == str);

 /* Fractional values require scale larger than bytes */
@@ -2800,7 +2800,7 @@ static void test_qemu_strtosz_invalid(void)
 res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EI

[PATCH 05/11] test-cutils: Prepare for upcoming semantic change in qemu_strtosz

2023-05-08 Thread Eric Blake
A quick search for 'qemu_strtosz' in the code base shows that outside
of the testsuite, the ONLY place that passes a non-NULL pointer to
@endptr of any variant of a size parser is in hmp.c (the 'o' parser of
monitor_parse_arguments), and that particular caller warns of
"extraneous characters at the end of line" unless the trailing bytes
are purely whitespace.  Thus, it makes no semantic difference at the
high level whether we parse "1.5e1k" as "1" + ".5e1" + "k" (an attempt
to use scientific notation in strtod with a scaling suffix of 'k' with
no trailing junk, but which qemu_strtosz says should fail with
EINVAL), or as "1.5e" + "1k" (a valid size with scaling suffix of 'e'
for exabytes, followed by two junk bytes) - either way, any user
passing such a string will get an error message about a parse failure.
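
The two candidate splits follow from how strtod() treats an 'e': it
is consumed as an exponent only when digits follow it.  A small
sketch of that libc behavior (the function name is made up, and the
default "C" locale is assumed):

```c
#include <assert.h>
#include <stdlib.h>

/* How libc strtod() handles an 'e' with and without exponent digits. */
static void demo_strtod_exponent(void)
{
    const char *with_exp = "1.5e1k";
    const char *bare_e = "1.5ek";
    char *end;
    double val;

    /* "e1" is a valid exponent, so strtod() consumes "1.5e1". */
    val = strtod(with_exp, &end);
    assert(val == 15.0);
    assert(end == with_exp + 5);    /* stops at 'k' */

    /* A bare 'e' is not an exponent, so parsing stops before it. */
    val = strtod(bare_e, &end);
    assert(val == 1.5);
    assert(end == bare_e + 3);      /* stops at 'e' */
}
```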

However, an upcoming patch to qemu_strtosz will fix other corner case
bugs in handling the fractional portion of a size, and in doing so, it
is easier to declare that qemu_strtosz() itself stops parsing at the
first 'e' rather than blindly consuming whatever strtod() will
recognize.  Once that is fixed, the difference will be visible at the
low level (getting a valid parse with trailing garbage when @endptr is
non-NULL, while continuing to get -EINVAL when @endptr is NULL); this
is easier to demonstrate by moving the affected strings from
test_qemu_strtosz_invalid() (which declares them as always -EINVAL) to
test_qemu_strtosz_trailing() (where @endptr affects behavior, for now
with FIXME comments).

Note that a similar argument could be made for having "0x1.5" or
"0x1M" parse as 0x1 with ".5" or "M" as trailing junk, instead of
blindly treating it as -EINVAL; however, as these cases do not suffer
from the same problems as floating point, they are not worth changing
at this time.

Signed-off-by: Eric Blake 
---
 tests/unit/test-cutils.c | 42 ++--
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 4c096c6fc70..afae2ee5331 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -2745,21 +2745,6 @@ static void test_qemu_strtosz_invalid(void)
 g_assert_cmphex(res, ==, 0xbaadf00d);
 g_assert_true(endptr == str);

-/* No floating point exponents */
-str = "1.5e1k";
-endptr = NULL;
-err = qemu_strtosz(str, &endptr, &res);
-g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmphex(res, ==, 0xbaadf00d);
-g_assert_true(endptr == str);
-
-str = "1.5E+0k";
-endptr = NULL;
-err = qemu_strtosz(str, &endptr, &res);
-g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmphex(res, ==, 0xbaadf00d);
-g_assert_true(endptr == str);
-
 /* No hex fractions */
 str = "0x1.8k";
 endptr = NULL;
@@ -2863,6 +2848,33 @@ static void test_qemu_strtosz_trailing(void)
 err = qemu_strtosz(str, NULL, &res);
 g_assert_cmpint(err, ==, -EINVAL);
 g_assert_cmphex(res, ==, 0xbaadf00d);
+
+/* FIXME should stop parse after 'e'. No floating point exponents */
+str = "1.5e1k";
+endptr = NULL;
+res = 0xbaadf00d;
+err = qemu_strtosz(str, &endptr, &res);
+g_assert_cmpint(err, ==, -EINVAL /* FIXME 0 */);
+g_assert_cmphex(res, ==, 0xbaadf00d /* FIXME EiB * 1.5 */);
+g_assert_true(endptr == str /* FIXME + 4 */);
+
+res = 0xbaadf00d;
+err = qemu_strtosz(str, NULL, &res);
+g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpint(res, ==, 0xbaadf00d);
+
+str = "1.5E+0k";
+endptr = NULL;
+res = 0xbaadf00d;
+err = qemu_strtosz(str, &endptr, &res);
+g_assert_cmpint(err, ==, -EINVAL /* FIXME 0 */);
+g_assert_cmphex(res, ==, 0xbaadf00d /* FIXME EiB * 1.5 */);
+g_assert_true(endptr == str /* FIXME + 4 */);
+
+res = 0xbaadf00d;
+err = qemu_strtosz(str, NULL, &res);
+g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmphex(res, ==, 0xbaadf00d);
 }

 static void test_qemu_strtosz_erange(void)
-- 
2.40.1




[PATCH 10/11] cutils: Improve qemu_strtod* error paths

2023-05-08 Thread Eric Blake
Previous patches changed all integral qemu_strto*() error paths to
guarantee that *value is never left uninitialized.  Do likewise for
qemu_strtod.  Also, tighten qemu_strtod_finite() to never return a
non-finite value (prior to this patch, we were rejecting "inf" with
-EINVAL and unspecified result 0.0, but failing "9e999" with -ERANGE
and HUGE_VAL - which is infinite on IEEE machines - despite our
function claiming to recognize only finite values).

Auditing callers, we have no external callers of qemu_strtod, and
among the callers of qemu_strtod_finite:

- qapi/qobject-input-visitor.c:qobject_input_type_number_keyval() and
  qapi/string-input-visitor.c:parse_type_number() which reject all
  errors (does not matter what we store)

- util/cutils.c:do_strtosz() incorrectly assumes that *endptr points
  to '.' on all failures (that is, it is not distinguishing between
  EINVAL and ERANGE), and therefore still does the WRONG THING for
  "9.9e999".  The change here does not fix that (a later patch will
  tackle this more systematically), but at least the value of endptr
  is less likely to be out of bounds on overflow

- our testsuite, which we can update to match what we document
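
To make the tightened contract concrete, here is a minimal sketch of
a finite-only wrapper around libc strtod().  The name
sketch_strtod_finite() is hypothetical, and the body is an assumption
about one way to meet the behavior described above, not the code in
util/cutils.c:

```c
#include <assert.h>
#include <errno.h>
#include <math.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: never hand back a non-finite value, and always
 * store something in *result.
 */
static int sketch_strtod_finite(const char *nptr, const char **endptr,
                                double *result)
{
    char *ep = (char *)nptr;
    double val;
    int saved;

    if (!nptr) {
        *result = 0.0;
        if (endptr) {
            *endptr = NULL;
        }
        return -EINVAL;
    }

    errno = 0;
    val = strtod(nptr, &ep);
    saved = errno;

    /* "inf", "nan", and overflow to HUGE_VAL are all rejected alike. */
    if (ep == nptr || !isfinite(val)) {
        *result = 0.0;
        if (endptr) {
            *endptr = nptr;
        }
        return -EINVAL;
    }

    *result = val;
    if (endptr) {
        *endptr = ep;
    } else if (*ep) {
        return -EINVAL;         /* trailing junk; prefix value is kept */
    }
    return saved == ERANGE ? -ERANGE : 0;   /* underflow still reported */
}
```

Treating overflow the same as "inf" keeps the function's promise of
finite results at the cost of no longer reporting -ERANGE for "9e999".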

Signed-off-by: Eric Blake 
---
 tests/unit/test-cutils.c | 57 +---
 util/cutils.c| 32 +-
 2 files changed, 55 insertions(+), 34 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index 2cb33e41ae4..f781997aef7 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -2105,6 +2105,7 @@ static void test_qemu_strtod_einval(void)
 err = qemu_strtod(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EINVAL);
 g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
 g_assert_true(endptr == str);

 /* NULL */
@@ -2113,7 +2114,8 @@ static void test_qemu_strtod_einval(void)
 res = 999;
 err = qemu_strtod(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpfloat(res, ==, 999.0);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
 g_assert_null(endptr);

 /* not recognizable */
@@ -2123,6 +2125,7 @@ static void test_qemu_strtod_einval(void)
 err = qemu_strtod(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EINVAL);
 g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
 g_assert_true(endptr == str);
 }

@@ -2309,7 +2312,8 @@ static void test_qemu_strtod_finite_einval(void)
 res = 999;
 err = qemu_strtod_finite(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpfloat(res, ==, 999.0);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
 g_assert_true(endptr == str);

 /* NULL */
@@ -2318,7 +2322,8 @@ static void test_qemu_strtod_finite_einval(void)
 res = 999;
 err = qemu_strtod_finite(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpfloat(res, ==, 999.0);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
 g_assert_null(endptr);

 /* not recognizable */
@@ -2327,7 +2332,8 @@ static void test_qemu_strtod_finite_einval(void)
 res = 999;
 err = qemu_strtod_finite(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpfloat(res, ==, 999.0);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
 g_assert_true(endptr == str);
 }

@@ -2338,24 +2344,26 @@ static void test_qemu_strtod_finite_erange(void)
 int err;
 double res;

-/* overflow */
+/* overflow turns into EINVAL */
 str = "9e999";
 endptr = NULL;
 res = 999;
 err = qemu_strtod_finite(str, &endptr, &res);
-g_assert_cmpint(err, ==, -ERANGE);
-g_assert_cmpfloat(res, ==, HUGE_VAL);
-g_assert_true(endptr == str + 5);
+g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
+g_assert_true(endptr == str);

 str = "-9e+999";
 endptr = NULL;
 res = 999;
 err = qemu_strtod_finite(str, &endptr, &res);
-g_assert_cmpint(err, ==, -ERANGE);
-g_assert_cmpfloat(res, ==, -HUGE_VAL);
-g_assert_true(endptr == str + 7);
+g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
+g_assert_true(endptr == str);

-/* underflow */
+/* underflow is still possible */
 str = "-9e-999";
 endptr = NULL;
 res = 999;
@@ -2380,7 +2388,8 @@ static void test_qemu_strtod_finite_nonfinite(void)
 res = 999;
 err = qemu_strtod_finite(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EINVAL);
-g_assert_cmpfloat(res, ==, 999.0);
+g_assert_cmpfloat(res, ==, 0.0);
+g_assert_false(signbit(res));
 g_assert_true(endptr == str);

 str = "-infinity";
@@ -2388,7 +2397,8 @@ static void test_qemu_strtod_finite_nonfinite(void)
 res = 999;
 err = qemu_strtod_finite(str, &endptr, &res);
   

[PATCH 06/11] test-cutils: Add more coverage to qemu_strtosz

2023-05-08 Thread Eric Blake
Add some more strings that the user might send our way.  In
particular, some of these additions include FIXME comments showing
where our parser doesn't quite behave the way we want.

Signed-off-by: Eric Blake 
---
 tests/unit/test-cutils.c | 226 +--
 1 file changed, 215 insertions(+), 11 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index afae2ee5331..9fa6fb042e8 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -2478,14 +2478,14 @@ static void test_qemu_strtosz_simple(void)
 g_assert_cmpuint(res, ==, 8);
 g_assert_true(endptr == str + 2);

-/* Leading space is ignored */
-str = " 12345";
+/* Leading space and + are ignored */
+str = " +12345";
 endptr = str;
 res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
 g_assert_cmpint(err, ==, 0);
 g_assert_cmpuint(res, ==, 12345);
-g_assert_true(endptr == str + 6);
+g_assert_true(endptr == str + 7);

 res = 0xbaadf00d;
 err = qemu_strtosz(str, NULL, &res);
@@ -2564,13 +2564,13 @@ static void test_qemu_strtosz_hex(void)
 g_assert_cmpuint(res, ==, 171);
 g_assert_true(endptr == str + 4);

-str = "0xae";
+str = " +0xae";
 endptr = str;
 res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
 g_assert_cmpint(err, ==, 0);
 g_assert_cmpuint(res, ==, 174);
-g_assert_true(endptr == str + 4);
+g_assert_true(endptr == str + 6);
 }

 static void test_qemu_strtosz_units(void)
@@ -2669,14 +2669,23 @@ static void test_qemu_strtosz_float(void)
 g_assert_cmpuint(res, ==, 1);
 g_assert_true(endptr == str + 4);

-/* An empty fraction is tolerated */
-str = "1.k";
+/* An empty fraction tail is tolerated */
+str = " 1.k";
 endptr = str;
 res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
 g_assert_cmpint(err, ==, 0);
 g_assert_cmpuint(res, ==, 1024);
-g_assert_true(endptr == str + 3);
+g_assert_true(endptr == str + 4);
+
+/* FIXME An empty fraction head should be tolerated */
+str = " .5k";
+endptr = str;
+res = 0xbaadf00d;
+err = qemu_strtosz(str, &endptr, &res);
+g_assert_cmpint(err, ==, -EINVAL /* FIXME 0 */);
+g_assert_cmpuint(res, ==, 0xbaadf00d /* FIXME 512 */);
+g_assert_true(endptr == str /* FIXME + 4 */);

 /* For convenience, we permit values that are not byte-exact */
 str = "12.345M";
@@ -2686,6 +2695,32 @@ static void test_qemu_strtosz_float(void)
 g_assert_cmpint(err, ==, 0);
 g_assert_cmpuint(res, ==, (uint64_t) (12.345 * MiB + 0.5));
 g_assert_true(endptr == str + 7);
+
+/* FIXME Fraction tail should round correctly */
+str = "1.k";
+endptr = str;
+res = 0xbaadf00d;
+err = qemu_strtosz(str, &endptr, &res);
+g_assert_cmpint(err, ==, 0);
+g_assert_cmpint(res, ==, 1024 /* FIXME 2048 */);
+g_assert_true(endptr == str + 55);
+
+/* FIXME ERANGE underflow in the fraction tail should not matter for 'k' */
+str = "1."
+"00"
+"00"
+"00"
+"00"
+"00"
+"00"
+"00"
+"1k";
+endptr = str;
+res = 0xbaadf00d;
+err = qemu_strtosz(str, &endptr, &res);
+g_assert_cmpint(err, ==, 0);
+g_assert_cmpuint(res, ==, 1 /* FIXME 1024 */);
+g_assert_true(endptr == str + 354);
 }

 static void test_qemu_strtosz_invalid(void)
@@ -2693,10 +2728,20 @@ static void test_qemu_strtosz_invalid(void)
 const char *str;
 const char *endptr;
 int err;
-uint64_t res = 0xbaadf00d;
+uint64_t res;
+
+/* Must parse at least one digit */
+str = NULL;
+endptr = "somewhere";
+res = 0xbaadf00d;
+err = qemu_strtosz(str, &endptr, &res);
+g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmphex(res, ==, 0xbaadf00d);
+g_assert_null(endptr);

 str = "";
 endptr = NULL;
+res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EINVAL);
 g_assert_cmphex(res, ==, 0xbaadf00d);
@@ -2704,13 +2749,30 @@ static void test_qemu_strtosz_invalid(void)

 str = " \t ";
 endptr = NULL;
+res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
 g_assert_cmpint(err, ==, -EINVAL);
 g_assert_cmphex(res, ==, 0xbaadf00d);
 g_assert_true(endptr == str);

+str = ".";
+endptr = NULL;
+res = 0xbaadf00d;
+err = qemu_strtosz(str, &endptr, &res);
+g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmphex(res, ==, 0xbaadf00d);
+g_assert(endptr == str);
+
+str = " .";
+endptr = N

[PATCH 07/11] numa: Check for qemu_strtosz_MiB error

2023-05-08 Thread Eric Blake
As shown in the previous commit, qemu_strtosz_MiB sometimes leaves the
result value untouched (we have to audit further to learn that in
that case, the QAPI generator says that visit_type_NumaOptions() will
have zero-initialized it), and sometimes leaves it with the value of a
partial parse before -EINVAL occurs because of trailing garbage.
Rather than blindly treating any string the user may throw at us as
valid, we should check for parse failures.

Fixes: cc001888 ("numa: fixup parsed NumaNodeOptions earlier", v2.11.0)
Signed-off-by: Eric Blake 
---
 hw/core/numa.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index d8d36b16d80..f08956ddb0f 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -531,10 +531,17 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
 /* Fix up legacy suffix-less format */
 if ((object->type == NUMA_OPTIONS_TYPE_NODE) && object->u.node.has_mem) {
 const char *mem_str = qemu_opt_get(opts, "mem");
-qemu_strtosz_MiB(mem_str, NULL, &object->u.node.mem);
+int ret = qemu_strtosz_MiB(mem_str, NULL, &object->u.node.mem);
+
+if (ret < 0) {
+error_setg_errno(&err, -ret, "could not parse memory size '%s'",
+ mem_str);
+}
 }

-set_numa_options(ms, object, &err);
+if (!err) {
+set_numa_options(ms, object, &err);
+}

 qapi_free_NumaOptions(object);
 if (err) {
-- 
2.40.1




[PATCH 11/11] cutils: Improve qemu_strtosz handling of fractions

2023-05-08 Thread Eric Blake
We have several limitations and bugs worth fixing; they are
inter-related enough that it is not worth splitting this patch into
smaller pieces:

* ".5k" should work to specify 512, just as "0.5k" does
* "1.k" and "1." + "9"*50 + "k" should both produce the same
  result of 2048 after rounding
* "1." + "0"*350 + "1B" should not be treated the same as "1.0B";
  underflow in the fraction should not be lost
* "7.99e99" and "7.99e999" look similar, but our code was doing a
  read-out-of-bounds on the latter because it was not expecting ERANGE
  due to overflow. While we document that scientific notation is not
  supported, and the previous patch actually fixed
  qemu_strtod_finite() to no longer return ERANGE overflows, it is
  easier to pre-filter than to try and determine after the fact if
  strtod() consumed more than we wanted.  Note that this is a
  low-level semantic change (when endptr is not NULL, we can now
  successfully parse with a scale of 'E' and then report trailing
  junk, instead of failing outright with EINVAL); but an earlier
  commit already argued that this is not a high-level semantic change
  since the only caller passing in a non-NULL endptr also checks that
  the tail is whitespace-only.
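
The pre-filtering mentioned in the last bullet can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the code in util/cutils.c: it measures the leading run of digits with at most one '.', and hands only that span to strtod(), so an exponent such as "e999" is never seen by the C library. The names decimal_span() and parse_plain_decimal() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical pre-filter (not QEMU's code): return the length of the
 * leading "digits[.digits]" span of s.  Anything after that span,
 * including an exponent, is left for the caller to treat as a suffix
 * or as trailing junk.
 */
static size_t decimal_span(const char *s)
{
    size_t n;

    assert(s != NULL);
    n = strspn(s, "0123456789");
    if (s[n] == '.') {
        n++;
        n += strspn(s + n, "0123456789");
    }
    return n;
}

/*
 * Parse only the pre-filtered span.  Returns the parsed value and stores
 * the number of consumed characters in *consumed; *consumed is 0 when no
 * digit was found (for example "" or a lone ".") or on allocation failure.
 */
static double parse_plain_decimal(const char *s, size_t *consumed)
{
    size_t n = decimal_span(s);
    double val;
    char *buf;

    assert(consumed != NULL);
    *consumed = 0;
    if (n == 0 || (n == 1 && s[0] == '.')) {
        return 0.0;
    }
    buf = malloc(n + 1);
    if (!buf) {
        return 0.0;
    }
    memcpy(buf, s, n);
    buf[n] = '\0';
    /* buf holds only digits and at most one '.', so overflow is impossible */
    val = strtod(buf, NULL);
    free(buf);
    *consumed = n;
    return val;
}
```

With this shape, "7.99e999" yields 7.99 with four characters consumed, and the caller sees 'e' as the next character to interpret as a scale suffix or as junk. The sketch assumes the default "C" locale, where '.' is the decimal point.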

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1629
Signed-off-by: Eric Blake 
---
 tests/unit/test-cutils.c | 51 +++
 util/cutils.c| 89 
 2 files changed, 88 insertions(+), 52 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index f781997aef7..1fb9d5323ab 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -2693,14 +2693,14 @@ static void test_qemu_strtosz_float(void)
 g_assert_cmpuint(res, ==, 1024);
 g_assert_true(endptr == str + 4);

-/* FIXME An empty fraction head should be tolerated */
+/* An empty fraction head is tolerated */
 str = " .5k";
 endptr = str;
 res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
-g_assert_cmpint(err, ==, -EINVAL /* FIXME 0 */);
-g_assert_cmpuint(res, ==, 0 /* FIXME 512 */);
-g_assert_true(endptr == str /* FIXME + 4 */);
+g_assert_cmpint(err, ==, 0);
+g_assert_cmpuint(res, ==, 512);
+g_assert_true(endptr == str + 4);

 /* For convenience, we permit values that are not byte-exact */
 str = "12.345M";
@@ -2711,16 +2711,16 @@ static void test_qemu_strtosz_float(void)
 g_assert_cmpuint(res, ==, (uint64_t) (12.345 * MiB + 0.5));
 g_assert_true(endptr == str + 7);

-/* FIXME Fraction tail should round correctly */
+/* Fraction tail can round up */
 str = "1.k";
 endptr = str;
 res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
 g_assert_cmpint(err, ==, 0);
-g_assert_cmpint(res, ==, 1024 /* FIXME 2048 */);
+g_assert_cmpuint(res, ==, 2048);
 g_assert_true(endptr == str + 55);

-/* FIXME ERANGE underflow in the fraction tail should not matter for 'k' */
+/* ERANGE underflow in the fraction tail does not matter for 'k' */
 str = "1."
 "00"
 "00"
@@ -2734,7 +2734,7 @@ static void test_qemu_strtosz_float(void)
 res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
 g_assert_cmpint(err, ==, 0);
-g_assert_cmpuint(res, ==, 1 /* FIXME 1024 */);
+g_assert_cmpuint(res, ==, 1024);
 g_assert_true(endptr == str + 354);
 }

@@ -2826,16 +2826,16 @@ static void test_qemu_strtosz_invalid(void)
 g_assert_cmpuint(res, ==, 0);
 g_assert_true(endptr == str);

-/* FIXME Fraction tail can cause ERANGE overflow */
+/* Fraction tail can cause ERANGE overflow */
 str = "15.E";
 endptr = str;
 res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
-g_assert_cmpint(err, ==, 0 /* FIXME -ERANGE */);
-g_assert_cmpuint(res, ==, 15ULL * EiB /* FIXME 0 */);
-g_assert_true(endptr == str + 56 /* FIXME str */);
+g_assert_cmpint(err, ==, -ERANGE);
+g_assert_cmpuint(res, ==, 0);
+g_assert_true(endptr == str + 56);

-/* FIXME ERANGE underflow in the fraction tail should matter for 'B' */
+/* ERANGE underflow in the fraction tail matters for 'B' */
 str = "1."
 "00"
 "00"
@@ -2848,9 +2848,9 @@ static void test_qemu_strtosz_invalid(void)
 endptr = str;
 res = 0xbaadf00d;
 err = qemu_strtosz(str, &endptr, &res);
-g_assert_cmpint(err, ==, 0 /* FIXME -EINVAL */);
-g_assert_cmpuint(res, ==, 1 /* FIXME 0 */);
-g_assert_true(endptr == str + 354 /* FIXME str */);
+g_assert_cmpint(err, ==, -EINVAL);
+g_assert_cmpuint(res, ==, 0);
+g_assert_true(endptr == str);

 /* No hex fractions */
 str = "

[PATCH 00/11] Fix qemu_strtosz() read-out-of-bounds

2023-05-08 Thread Eric Blake
This series blew up in my face when Hanna first pointed me to
https://gitlab.com/qemu-project/qemu/-/issues/1629

Basically, 'qemu-img dd bs=9.9e999' killed a sanitized build because
of a read-out-of-bounds (".9e999" parses as infinity, but qemu_strtosz
wasn't expecting ERANGE failure).
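
For readers unfamiliar with the failure mode, the following minimal sketch shows how plain strtod() reports overflow: it returns plus or minus HUGE_VAL and sets errno to ERANGE, which a caller has to check for explicitly. The helper name parse_double_checked() is hypothetical and is not part of QEMU; it only illustrates the check that was missing.

```c
#include <assert.h>
#include <errno.h>
#include <math.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of QEMU): parse a double and report
 * whether strtod() signalled overflow.
 *
 * Returns 0 on success, -ERANGE on overflow, and -EINVAL if no
 * characters were consumed.  Underflow (ERANGE with a tiny result)
 * is deliberately treated as success in this sketch.
 */
static int parse_double_checked(const char *s, double *out)
{
    char *end;

    assert(s != NULL && out != NULL);
    errno = 0;
    *out = strtod(s, &end);
    if (end == s) {
        return -EINVAL;
    }
    if (errno == ERANGE && (*out == HUGE_VAL || *out == -HUGE_VAL)) {
        return -ERANGE;
    }
    return 0;
}
```

Passing ".9e999" to this helper takes the -ERANGE branch, which is the case the old fraction handling did not anticipate.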

The overall diffstat is big, mainly because the unit tests needed a
LOT of work before I felt comfortable tweaking semantics in something
that is so essential to command-line and QMP parsing.

Eric Blake (11):
  test-cutils: Avoid g_assert in unit tests
  test-cutils: Use g_assert_cmpuint where appropriate
  test-cutils: Test integral qemu_strto* value on failures
  test-cutils: Add coverage of qemu_strtod
  test-cutils: Prepare for upcoming semantic change in qemu_strtosz
  test-cutils: Add more coverage to qemu_strtosz
  numa: Check for qemu_strtosz_MiB error
  cutils: Set value in all qemu_strtosz* error paths
  cutils: Set value in all integral qemu_strto* error paths
  cutils: Improve qemu_strtod* error paths
  cutils: Improve qemu_strtosz handling of fractions

 hw/core/numa.c   |   11 +-
 tests/unit/test-cutils.c | 1213 ++
 util/cutils.c|  180 --
 3 files changed, 1090 insertions(+), 314 deletions(-)


base-commit: 792f77f376adef944f9a03e601f6ad90c2f891b2
-- 
2.40.1




Re: [PATCH v20 10/21] machine: adding s390 topology to info hotpluggable-cpus

2023-05-08 Thread Nina Schoetterl-Glausch
On Tue, 2023-04-25 at 18:14 +0200, Pierre Morel wrote:
> S390 topology adds books and drawers topology containers.
> Let's add these to the HMP information for hotpluggable cpus.
> 
> Signed-off-by: Pierre Morel 

Reviewed-by: Nina Schoetterl-Glausch 

if you fix the nits below.
> ---
>  hw/core/machine-hmp-cmds.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
> index c3e55ef9e9..971212242d 100644
> --- a/hw/core/machine-hmp-cmds.c
> +++ b/hw/core/machine-hmp-cmds.c
> @@ -71,6 +71,12 @@ void hmp_hotpluggable_cpus(Monitor *mon, const QDict 
> *qdict)
>  if (c->has_node_id) {
>  monitor_printf(mon, "node-id: \"%" PRIu64 "\"\n", 
> c->node_id);
>  }
> +if (c->has_drawer_id) {
> +monitor_printf(mon, "drawer_id: \"%" PRIu64 "\"\n", 
> c->drawer_id);

Use '-' instead of '_' here ("drawer-id") unless there is some reason to
be inconsistent.
> +}
> +if (c->has_book_id) {
> +monitor_printf(mon, "  book_id: \"%" PRIu64 "\"\n", 
> c->book_id);

Same here.

> +}
>  if (c->has_socket_id) {
>  monitor_printf(mon, "socket-id: \"%" PRIu64 "\"\n", 
> c->socket_id);
>  }




Re: [PATCH v20 08/21] qapi/s390x/cpu topology: set-cpu-topology qmp command

2023-05-08 Thread Nina Schoetterl-Glausch
On Tue, 2023-04-25 at 18:14 +0200, Pierre Morel wrote:
> The modification of the CPU attributes are done through a monitor
> command.
> 
> It allows to move the core inside the topology tree to optimize
> the cache usage in the case the host's hypervisor previously
> moved the CPU.
> 
> The same command allows to modify the CPU attributes modifiers
> like polarization entitlement and the dedicated attribute to notify
> the guest if the host admin modified scheduling or dedication of a vCPU.
> 
> With this knowledge the guest has the possibility to optimize the
> usage of the vCPUs.
> 
> The command has a feature unstable for the moment.
> 
> Signed-off-by: Pierre Morel 

Logic is sound, minor stuff below.

> ---
>  qapi/machine-target.json |  37 +++
>  hw/s390x/cpu-topology.c  | 136 +++
>  2 files changed, 173 insertions(+)
> 
> diff --git a/qapi/machine-target.json b/qapi/machine-target.json
> index 42a6a40333..3b7a0b77f4 100644
> --- a/qapi/machine-target.json
> +++ b/qapi/machine-target.json
> @@ -4,6 +4,8 @@
>  # This work is licensed under the terms of the GNU GPL, version 2 or later.
>  # See the COPYING file in the top-level directory.
>  
> +{ 'include': 'machine-common.json' }
> +
>  ##
>  # @CpuModelInfo:
>  #
> @@ -354,3 +356,38 @@
>  { 'enum': 'CpuS390Polarization',
>'prefix': 'S390_CPU_POLARIZATION',
>'data': [ 'horizontal', 'vertical' ] }
> +
> +##
> +# @set-cpu-topology:
> +#
> +# @core-id: the vCPU ID to be moved
> +# @socket-id: optional destination socket where to move the vCPU
> +# @book-id: optional destination book where to move the vCPU
> +# @drawer-id: optional destination drawer where to move the vCPU
> +# @entitlement: optional entitlement
> +# @dedicated: optional, if the vCPU is dedicated to a real CPU
> +#
> +# Features:
> +# @unstable: This command may still be modified.
> +#
> +# Modifies the topology by moving the CPU inside the topology
> +# tree or by changing a modifier attribute of a CPU.
> +# Default value for optional parameter is the current value
> +# used by the CPU.
> +#
> +# Returns: Nothing on success, the reason on failure.
> +#
> +# Since: 8.1
> +##
> +{ 'command': 'set-cpu-topology',
> +  'data': {
> +  'core-id': 'uint16',
> +  '*socket-id': 'uint16',
> +  '*book-id': 'uint16',
> +  '*drawer-id': 'uint16',
> +  '*entitlement': 'CpuS390Entitlement',
> +  '*dedicated': 'bool'
> +  },
> +  'features': [ 'unstable' ],
> +  'if': { 'all': [ 'TARGET_S390X' , 'CONFIG_KVM' ] }
> +}
> diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
> index d9cd3dc3ce..e5fb976594 100644
> --- a/hw/s390x/cpu-topology.c
> +++ b/hw/s390x/cpu-topology.c
> @@ -16,6 +16,7 @@
>  #include "target/s390x/cpu.h"
>  #include "hw/s390x/s390-virtio-ccw.h"
>  #include "hw/s390x/cpu-topology.h"
> +#include "qapi/qapi-commands-machine-target.h"
>  
>  /*
>   * s390_topology is used to keep the topology information.
> @@ -261,6 +262,27 @@ static bool s390_topology_check(uint16_t socket_id, 
> uint16_t book_id,
>  return true;
>  }
>  
> +/**
> + * s390_topology_need_report
> + * @cpu: Current cpu
> + * @drawer_id: future drawer ID
> + * @book_id: future book ID
> + * @socket_id: future socket ID

Entitlement and dedicated are missing here.

> + *
> + * A modified topology change report is needed if the topology
> + * tree or the topology attributes change.
> + */
> +static int s390_topology_need_report(S390CPU *cpu, int drawer_id,

I'd prefer a bool return type.

> +   int book_id, int socket_id,
> +   uint16_t entitlement, bool dedicated)
> +{
> +return cpu->env.drawer_id != drawer_id ||
> +   cpu->env.book_id != book_id ||
> +   cpu->env.socket_id != socket_id ||
> +   cpu->env.entitlement != entitlement ||
> +   cpu->env.dedicated != dedicated;
> +}
> +
>  /**
>   * s390_update_cpu_props:
>   * @ms: the machine state
> @@ -330,3 +352,117 @@ void s390_topology_setup_cpu(MachineState *ms, S390CPU 
> *cpu, Error **errp)
>  /* topology tree is reflected in props */
>  s390_update_cpu_props(ms, cpu);
>  }
> +
> +static void s390_change_topology(uint16_t core_id,
> + bool has_socket_id, uint16_t socket_id,
> + bool has_book_id, uint16_t book_id,
> + bool has_drawer_id, uint16_t drawer_id,
> + bool has_entitlement, uint16_t entitlement,

I would keep the enum type for entitlement.

> + bool has_dedicated, bool dedicated,
> + Error **errp)
> +{
> +MachineState *ms = current_machine;
> +int old_socket_entry;
> +int new_socket_entry;
> +int report_needed;
> +S390CPU *cpu;
> +ERRP_GUARD();
> +
> +if (core_id >= ms->smp.max_cpus) {
> +error_setg(errp, "Core-id %d out of range!", core_id);
>

Re: [PATCH] hw/net: Move xilinx_ethlite.c to the target-independent source set

2023-05-08 Thread Francisco Iglesias
On [2023 May 08] Mon 14:03:14, Thomas Huth wrote:
> Now that the tswap() functions are available for target-independent
> code, too, we can move xilinx_ethlite.c from specific_ss to softmmu_ss
> to avoid that we have to compile this file multiple times.
> 
> Signed-off-by: Thomas Huth 

Reviewed-by: Francisco Iglesias 

> ---
>  hw/net/xilinx_ethlite.c | 2 +-
>  hw/net/meson.build  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/net/xilinx_ethlite.c b/hw/net/xilinx_ethlite.c
> index 99c22819ea..89f4f3b254 100644
> --- a/hw/net/xilinx_ethlite.c
> +++ b/hw/net/xilinx_ethlite.c
> @@ -25,7 +25,7 @@
>  #include "qemu/osdep.h"
>  #include "qemu/module.h"
>  #include "qom/object.h"
> -#include "cpu.h" /* FIXME should not use tswap* */
> +#include "exec/tswap.h"
>  #include "hw/sysbus.h"
>  #include "hw/irq.h"
>  #include "hw/qdev-properties.h"
> diff --git a/hw/net/meson.build b/hw/net/meson.build
> index e2be0654a1..a7860c5efe 100644
> --- a/hw/net/meson.build
> +++ b/hw/net/meson.build
> @@ -43,7 +43,7 @@ softmmu_ss.add(when: 'CONFIG_NPCM7XX', if_true: 
> files('npcm7xx_emc.c'))
>  softmmu_ss.add(when: 'CONFIG_ETRAXFS', if_true: files('etraxfs_eth.c'))
>  softmmu_ss.add(when: 'CONFIG_COLDFIRE', if_true: files('mcf_fec.c'))
>  specific_ss.add(when: 'CONFIG_PSERIES', if_true: files('spapr_llan.c'))
> -specific_ss.add(when: 'CONFIG_XILINX_ETHLITE', if_true: 
> files('xilinx_ethlite.c'))
> +softmmu_ss.add(when: 'CONFIG_XILINX_ETHLITE', if_true: 
> files('xilinx_ethlite.c'))
>  
>  softmmu_ss.add(when: 'CONFIG_VIRTIO_NET', if_true: files('net_rx_pkt.c'))
>  specific_ss.add(when: 'CONFIG_VIRTIO_NET', if_true: files('virtio-net.c'))
> -- 
> 2.31.1
> 
> 



Re: [PATCH 0/4] vhost-user-fs: Internal migration

2023-05-08 Thread Eugenio Perez Martin
On Mon, May 8, 2023 at 7:51 PM Eugenio Perez Martin  wrote:
>
> On Mon, May 8, 2023 at 7:00 PM Hanna Czenczek  wrote:
> >
> > On 05.05.23 16:37, Hanna Czenczek wrote:
> > > On 05.05.23 16:26, Eugenio Perez Martin wrote:
> > >> On Fri, May 5, 2023 at 11:51 AM Hanna Czenczek 
> > >> wrote:
> > >>> (By the way, thanks for the explanations :))
> > >>>
> > >>> On 05.05.23 11:03, Hanna Czenczek wrote:
> >  On 04.05.23 23:14, Stefan Hajnoczi wrote:
> > >>> [...]
> > >>>
> > > I think it's better to change QEMU's vhost code
> > > to leave stateful devices suspended (but not reset) across
> > > vhost_dev_stop() -> vhost_dev_start(), maybe by introducing
> > > vhost_dev_suspend() and vhost_dev_resume(). Have you thought about
> > > this aspect?
> >  Yes and no; I mean, I haven’t in detail, but I thought this is what’s
> >  meant by suspending instead of resetting when the VM is stopped.
> > >>> So, now looking at vhost_dev_stop(), one problem I can see is that
> > >>> depending on the back-end, different operations it does will do
> > >>> different things.
> > >>>
> > >>> It tries to stop the whole device via vhost_ops->vhost_dev_start(),
> > >>> which for vDPA will suspend the device, but for vhost-user will
> > >>> reset it
> > >>> (if F_STATUS is there).
> > >>>
> > >>> It disables all vrings, which doesn’t mean stopping, but may be
> > >>> necessary, too.  (I haven’t yet really understood the use of disabled
> > >>> vrings, I heard that virtio-net would have a need for it.)
> > >>>
> > >>> It then also stops all vrings, though, so that’s OK.  And because this
> > >>> will always do GET_VRING_BASE, this is actually always the same
> > >>> regardless of transport.
> > >>>
> > >>> Finally (for this purpose), it resets the device status via
> > >>> vhost_ops->vhost_reset_status().  This is only implemented on vDPA, and
> > >>> this is what resets the device there.
> > >>>
> > >>>
> > >>> So vhost-user resets the device in .vhost_dev_start, but vDPA only does
> > >>> so in .vhost_reset_status.  It would seem better to me if vhost-user
> > >>> would also reset the device only in .vhost_reset_status, not in
> > >>> .vhost_dev_start.  .vhost_dev_start seems precisely like the place to
> > >>> run SUSPEND/RESUME.
> > >>>
> > >> I think the same. I just saw It's been proposed at [1].
> > >>
> > >>> Another question I have (but this is basically what I wrote in my last
> > >>> email) is why we even call .vhost_reset_status here.  If the device
> > >>> and/or all of the vrings are already stopped, why do we need to reset
> > >>> it?  Naïvely, I had assumed we only really need to reset the device if
> > >>> the guest changes, so that a new guest driver sees a freshly
> > >>> initialized
> > >>> device.
> > >>>
> > >> I don't know why we didn't need to call it :). I'm assuming the
> > >> previous vhost-user net did fine resetting vq indexes, using
> > >> VHOST_USER_SET_VRING_BASE. But I don't know about more complex
> > >> devices.
> > >>
> > >> The guest can reset the device, or write 0 to the PCI config status,
> > >> at any time. How does virtiofs handle it, being stateful?
> > >
> > > Honestly a good question because virtiofsd implements neither
> > > SET_STATUS nor RESET_DEVICE.  I’ll have to investigate that.
> > >
> > > I think when the guest resets the device, SET_VRING_BASE always comes
> > > along some way or another, so that’s how the vrings are reset.  Maybe
> > > the internal state is reset only following more high-level FUSE
> > > commands like INIT.
> >
> > So a meeting and one session of looking-into-the-code later:
> >
> > We reset every virt queue on GET_VRING_BASE, which is wrong, but happens
> > to serve the purpose.  (German is currently on that.)
> >
> > In our meeting, German said the reset would occur when the memory
> > regions are changed, but I can’t see that in the code.
>
> That would imply that the status is reset when the guest's memory is
> added or removed?
>
> > I think it only
> > happens implicitly through the SET_VRING_BASE call, which resets the
> > internal avail/used pointers.
> >
> > [This doesn’t seem different from libvhost-user, though, which
> > implements neither SET_STATUS nor RESET_DEVICE, and which pretends to
> > reset the device on RESET_OWNER, but really doesn’t (its
> > vu_reset_device_exec() function just disables all vrings, doesn’t reset
> > or even stop them).]
> >
> > Consequently, the internal state is never reset.  It would be cleared on
> > a FUSE Destroy message, but if you just force-reset the system, the
> > state remains into the next reboot.  Not even FUSE Init clears it, which
> > seems weird.  It happens to work because it’s still the same filesystem,
> > so the existing state fits, but it kind of seems dangerous to keep e.g.
> > files open.  I don’t think it’s really exploitable because everything
> > still goes through the guest kernel, but, well.  We should clear the
> > state on Init, and probably also implement SET_STATUS 

[PATCH v2 6/6] multifd: Add colo support

2023-05-08 Thread Lukas Straub
Like in the normal ram_load() path, put the received pages into the
colo cache and mark the pages in the bitmap so that they will be
flushed to the guest later.
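
The idea can be modelled in a few lines. The sketch below is a toy under stated assumptions, not the patch itself: struct toy_block, TOY_PAGE_SIZE, and toy_recv_page() are hypothetical stand-ins for the RAM block, page size, and receive path. It only shows the two cases: before COLO is active a page goes to both cache and guest, and afterwards it goes to the cache and is marked dirty for a later flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { TOY_PAGE_SIZE = 4, TOY_NUM_PAGES = 4 };

/* Toy stand-in for a RAM block: guest memory, COLO cache, dirty bitmap. */
struct toy_block {
    unsigned char guest[TOY_NUM_PAGES * TOY_PAGE_SIZE];
    unsigned char cache[TOY_NUM_PAGES * TOY_PAGE_SIZE];
    bool dirty[TOY_NUM_PAGES];
};

/*
 * Store one received page.  The page always lands in the cache.  While
 * COLO is not yet active it is also copied to the guest, so both stay in
 * sync and no dirty bit is needed.  Once COLO is active the page is only
 * marked dirty, so that a later flush can apply it to the guest.
 */
static void toy_recv_page(struct toy_block *b, size_t page,
                          const unsigned char *data, bool in_colo_state)
{
    size_t off = page * TOY_PAGE_SIZE;

    assert(b != NULL && data != NULL && page < TOY_NUM_PAGES);
    memcpy(b->cache + off, data, TOY_PAGE_SIZE);
    if (in_colo_state) {
        b->dirty[page] = true;
    } else {
        memcpy(b->guest + off, data, TOY_PAGE_SIZE);
    }
}
```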

Signed-off-by: Lukas Straub 
---
 migration/multifd-colo.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/migration/multifd-colo.c b/migration/multifd-colo.c
index c035d15e87..305a1b7000 100644
--- a/migration/multifd-colo.c
+++ b/migration/multifd-colo.c
@@ -15,13 +15,41 @@
 #include "ram.h"
 #include "multifd.h"
 #include "io/channel-socket.h"
+#include "migration/colo.h"
 
 #define MULTIFD_INTERNAL
 #include "multifd-internal.h"
 
 static int multifd_colo_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
-return multifd_recv_state->ops->recv_pages(p, errp);
+int ret = 0;
+
+/*
+ * While we're still in precopy mode, we copy received pages to both guest
+ * and cache. No need to set dirty bits, since guest and cache memory are
+ * in sync.
+ */
+if (migration_incoming_in_colo_state()) {
+colo_record_bitmap(p->block, p->normal, p->normal_num);
+}
+
+p->host = p->block->colo_cache;
+ret = multifd_recv_state->ops->recv_pages(p, errp);
+if (ret != 0) {
+p->host = p->block->host;
+return ret;
+}
+
+if (!migration_incoming_in_colo_state()) {
+for (int i = 0; i < p->normal_num; i++) {
+void *guest = p->block->host + p->normal[i];
+void *cache = p->host + p->normal[i];
+memcpy(guest, cache, p->page_size);
+}
+}
+
+p->host = p->block->host;
+return ret;
 }
 
 int multifd_colo_load_setup(Error **errp)
-- 
2.39.2




Re: [PATCH 2/4] vhost-user: Interface for migration state transfer

2023-05-08 Thread Stefan Hajnoczi
On Thu, Apr 20, 2023 at 03:27:51PM +0200, Eugenio Pérez wrote:
> On Tue, 2023-04-18 at 16:40 -0400, Stefan Hajnoczi wrote:
> > On Tue, 18 Apr 2023 at 14:31, Eugenio Perez Martin 
> > wrote:
> > > On Tue, Apr 18, 2023 at 7:59 PM Stefan Hajnoczi  
> > > wrote:
> > > > On Tue, Apr 18, 2023 at 10:09:30AM +0200, Eugenio Perez Martin wrote:
> > > > > On Mon, Apr 17, 2023 at 9:33 PM Stefan Hajnoczi 
> > > > > wrote:
> > > > > > On Mon, 17 Apr 2023 at 15:10, Eugenio Perez Martin <
> > > > > > epere...@redhat.com> wrote:
> > > > > > > On Mon, Apr 17, 2023 at 5:38 PM Stefan Hajnoczi 
> > > > > > > > wrote:
> > > > > > > > On Thu, Apr 13, 2023 at 12:14:24PM +0200, Eugenio Perez Martin
> > > > > > > > wrote:
> > > > > > > > > On Wed, Apr 12, 2023 at 11:06 PM Stefan Hajnoczi <
> > > > > > > > > stefa...@redhat.com> wrote:
> > > > > > > > > > On Tue, Apr 11, 2023 at 05:05:13PM +0200, Hanna Czenczek
> > > > > > > > > > wrote:
> > > > > > > > > > > So-called "internal" virtio-fs migration refers to
> > > > > > > > > > > transporting the
> > > > > > > > > > > back-end's (virtiofsd's) state through qemu's migration
> > > > > > > > > > > stream.  To do
> > > > > > > > > > > this, we need to be able to transfer virtiofsd's internal
> > > > > > > > > > > state to and
> > > > > > > > > > > from virtiofsd.
> > > > > > > > > > > 
> > > > > > > > > > > Because virtiofsd's internal state will not be too large, 
> > > > > > > > > > > we
> > > > > > > > > > > believe it
> > > > > > > > > > > is best to transfer it as a single binary blob after the
> > > > > > > > > > > streaming
> > > > > > > > > > > phase.  Because this method should be useful to other 
> > > > > > > > > > > vhost-
> > > > > > > > > > > user
> > > > > > > > > > > implementations, too, it is introduced as a 
> > > > > > > > > > > general-purpose
> > > > > > > > > > > addition to
> > > > > > > > > > > the protocol, not limited to vhost-user-fs.
> > > > > > > > > > > 
> > > > > > > > > > > These are the additions to the protocol:
> > > > > > > > > > > - New vhost-user protocol feature
> > > > > > > > > > > VHOST_USER_PROTOCOL_F_MIGRATORY_STATE:
> > > > > > > > > > >   This feature signals support for transferring state, and
> > > > > > > > > > > is added so
> > > > > > > > > > >   that migration can fail early when the back-end has no
> > > > > > > > > > > support.
> > > > > > > > > > > 
> > > > > > > > > > > - SET_DEVICE_STATE_FD function: Front-end and back-end
> > > > > > > > > > > negotiate a pipe
> > > > > > > > > > >   over which to transfer the state.  The front-end sends 
> > > > > > > > > > > an
> > > > > > > > > > > FD to the
> > > > > > > > > > >   back-end into/from which it can write/read its state, 
> > > > > > > > > > > and
> > > > > > > > > > > the back-end
> > > > > > > > > > >   can decide to either use it, or reply with a different 
> > > > > > > > > > > FD
> > > > > > > > > > > for the
> > > > > > > > > > >   front-end to override the front-end's choice.
> > > > > > > > > > >   The front-end creates a simple pipe to transfer the 
> > > > > > > > > > > state,
> > > > > > > > > > > but maybe
> > > > > > > > > > >   the back-end already has an FD into/from which it has to
> > > > > > > > > > > write/read
> > > > > > > > > > >   its state, in which case it will want to override the
> > > > > > > > > > > simple pipe.
> > > > > > > > > > >   Conversely, maybe in the future we find a way to have 
> > > > > > > > > > > the
> > > > > > > > > > > front-end
> > > > > > > > > > >   get an immediate FD for the migration stream (in some
> > > > > > > > > > > cases), in which
> > > > > > > > > > >   case we will want to send this to the back-end instead 
> > > > > > > > > > > of
> > > > > > > > > > > creating a
> > > > > > > > > > >   pipe.
> > > > > > > > > > >   Hence the negotiation: If one side has a better idea 
> > > > > > > > > > > than
> > > > > > > > > > > a plain
> > > > > > > > > > >   pipe, we will want to use that.
> > > > > > > > > > > 
> > > > > > > > > > > - CHECK_DEVICE_STATE: After the state has been transferred
> > > > > > > > > > > through the
> > > > > > > > > > >   pipe (the end indicated by EOF), the front-end invokes
> > > > > > > > > > > this function
> > > > > > > > > > >   to verify success.  There is no in-band way (through the
> > > > > > > > > > > pipe) to
> > > > > > > > > > >   indicate failure, so we need to check explicitly.
> > > > > > > > > > > 
> > > > > > > > > > > Once the transfer pipe has been established via
> > > > > > > > > > > SET_DEVICE_STATE_FD
> > > > > > > > > > > (which includes establishing the direction of transfer and
> > > > > > > > > > > migration
> > > > > > > > > > > phase), the sending side writes its data into the pipe, 
> > > > > > > > > > > and
> > > > > > > > > > > the reading
> > > > > > > > > > > side reads it until it sees an EOF.  Then, the front-end
> > > > > > > > > > > will check for
> > > > > > > > > > > success via CHECK_DEVICE_STATE, which on the destination
> > > > > > > > > > > 

[PATCH v2 3/6] multifd: Introduce multifd-internal.h

2023-05-08 Thread Lukas Straub
Introduce multifd-internal.h so code that would normally go into
multifd.c can go into an extra file. This way, multifd.c hopefully
won't grow to 4000 lines like ram.c.

This will be used in the next commits to add colo support to multifd.

Signed-off-by: Lukas Straub 
---
 migration/multifd-internal.h | 34 ++
 migration/multifd.c  | 15 ---
 2 files changed, 38 insertions(+), 11 deletions(-)
 create mode 100644 migration/multifd-internal.h

diff --git a/migration/multifd-internal.h b/migration/multifd-internal.h
new file mode 100644
index 00..6eeaa028e7
--- /dev/null
+++ b/migration/multifd-internal.h
@@ -0,0 +1,34 @@
+/*
+ * Internal Multifd header
+ *
+ * Copyright (c) 2019-2020 Red Hat Inc
+ *
+ * Authors:
+ *  Juan Quintela 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifdef QEMU_MIGRATION_MULTIFD_INTERNAL_H
+#error Only include this header directly
+#endif
+#define QEMU_MIGRATION_MULTIFD_INTERNAL_H
+
+#ifndef MULTIFD_INTERNAL
+#error This header is internal to multifd
+#endif
+
+struct MultiFDRecvState {
+MultiFDRecvParams *params;
+/* number of created threads */
+int count;
+/* syncs main thread and channels */
+QemuSemaphore sem_sync;
+/* global number of generated multifd packets */
+uint64_t packet_num;
+/* multifd ops */
+MultiFDMethods *ops;
+};
+
+extern struct MultiFDRecvState *multifd_recv_state;
diff --git a/migration/multifd.c b/migration/multifd.c
index 4e71c19292..f6bad69b6c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -31,6 +31,9 @@
 #include "io/channel-socket.h"
 #include "yank_functions.h"
 
+#define MULTIFD_INTERNAL
+#include "multifd-internal.h"
+
 /* Multiple fd's */
 
 #define MULTIFD_MAGIC 0x11223344U
@@ -967,17 +970,7 @@ int multifd_save_setup(Error **errp)
 return 0;
 }
 
-struct {
-MultiFDRecvParams *params;
-/* number of created threads */
-int count;
-/* syncs main thread and channels */
-QemuSemaphore sem_sync;
-/* global number of generated multifd packets */
-uint64_t packet_num;
-/* multifd ops */
-MultiFDMethods *ops;
-} *multifd_recv_state;
+struct MultiFDRecvState *multifd_recv_state;
 
 static void multifd_recv_terminate_threads(Error *err)
 {
-- 
2.39.2





[PATCH v2 4/6] multifd: Introduce an overridable recv_pages method

2023-05-08 Thread Lukas Straub
This allows overriding the behaviour around recv_pages. Think of
it as a "multifd_colo" child class of multifd.

This will be used in the next commits to add colo support to multifd.
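
The "child class" idea can be illustrated with a plain function pointer. This is only a minimal sketch, assuming simplified stand-in types; RecvState, base_recv_pages and the setup helpers below are hypothetical names for illustration, not the QEMU definitions:

```c
/* Hypothetical stand-in for the receive state; not the QEMU struct. */
typedef struct RecvState {
    /* overridable method, chosen once at setup time */
    int (*recv_pages)(int pages);
} RecvState;

/* "Base class" behaviour: pretend to receive the pages. */
static int base_recv_pages(int pages)
{
    return pages;
}

/* Counts how often the overriding method ran (for illustration only). */
static int colo_calls;

/* "Child class" behaviour: do extra work, then delegate to the base. */
static int colo_recv_pages(int pages)
{
    colo_calls++;
    return base_recv_pages(pages);
}

static void setup_normal(RecvState *s)
{
    s->recv_pages = base_recv_pages;
}

static void setup_colo(RecvState *s)
{
    s->recv_pages = colo_recv_pages;
}

/* The receive path only ever goes through the pointer. */
static int receive(RecvState *s, int pages)
{
    return s->recv_pages(pages);
}
```

The receive path stays unchanged whichever setup function ran, which is the property the patch relies on.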

Signed-off-by: Lukas Straub 
---
 migration/meson.build|  1 +
 migration/multifd-colo.c | 39 +
 migration/multifd-internal.h |  5 
 migration/multifd.c  | 48 
 4 files changed, 83 insertions(+), 10 deletions(-)
 create mode 100644 migration/multifd-colo.c

diff --git a/migration/meson.build b/migration/meson.build
index da1897fadf..22ab6c6d73 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -23,6 +23,7 @@ softmmu_ss.add(files(
   'migration.c',
   'multifd.c',
   'multifd-zlib.c',
+  'multifd-colo.c',
   'options.c',
   'postcopy-ram.c',
   'savevm.c',
diff --git a/migration/multifd-colo.c b/migration/multifd-colo.c
new file mode 100644
index 00..c035d15e87
--- /dev/null
+++ b/migration/multifd-colo.c
@@ -0,0 +1,39 @@
+/*
+ * multifd colo implementation
+ *
+ * Copyright (c) Lukas Straub 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "exec/target_page.h"
+#include "exec/ramblock.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "ram.h"
+#include "multifd.h"
+#include "io/channel-socket.h"
+
+#define MULTIFD_INTERNAL
+#include "multifd-internal.h"
+
+static int multifd_colo_recv_pages(MultiFDRecvParams *p, Error **errp)
+{
+return multifd_recv_state->ops->recv_pages(p, errp);
+}
+
+int multifd_colo_load_setup(Error **errp)
+{
+int ret;
+
+ret = _multifd_load_setup(errp);
+if (ret) {
+return ret;
+}
+
+multifd_recv_state->recv_pages = multifd_colo_recv_pages;
+
+return 0;
+}
diff --git a/migration/multifd-internal.h b/migration/multifd-internal.h
index 6eeaa028e7..82357f1d88 100644
--- a/migration/multifd-internal.h
+++ b/migration/multifd-internal.h
@@ -29,6 +29,11 @@ struct MultiFDRecvState {
 uint64_t packet_num;
 /* multifd ops */
 MultiFDMethods *ops;
+/* overridable recv method */
+int (*recv_pages)(MultiFDRecvParams *p, Error **errp);
 };
 
 extern struct MultiFDRecvState *multifd_recv_state;
+
+int _multifd_load_setup(Error **errp);
+int multifd_colo_load_setup(Error **errp);
diff --git a/migration/multifd.c b/migration/multifd.c
index f6bad69b6c..fb5e8859de 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1126,7 +1126,7 @@ static void *multifd_recv_thread(void *opaque)
 qemu_mutex_unlock(&p->mutex);
 
 if (p->normal_num) {
-ret = multifd_recv_state->ops->recv_pages(p, &local_err);
+ret = multifd_recv_state->recv_pages(p, &local_err);
 if (ret != 0) {
 break;
 }
@@ -1152,20 +1152,12 @@ static void *multifd_recv_thread(void *opaque)
 return NULL;
 }
 
-int multifd_load_setup(Error **errp)
+int _multifd_load_setup(Error **errp)
 {
 int thread_count;
 uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
 uint8_t i;
 
-/*
- * Return successfully if multiFD recv state is already initialised
- * or multiFD is not enabled.
- */
-if (multifd_recv_state || !migrate_multifd()) {
-return 0;
-}
-
 thread_count = migrate_multifd_channels();
 multifd_recv_state = g_malloc0(sizeof(*multifd_recv_state));
 multifd_recv_state->params = g_new0(MultiFDRecvParams, thread_count);
@@ -1204,6 +1196,42 @@ int multifd_load_setup(Error **errp)
 return 0;
 }
 
+static int multifd_normal_recv_pages(MultiFDRecvParams *p, Error **errp)
+{
+return multifd_recv_state->ops->recv_pages(p, errp);
+}
+
+static int multifd_normal_load_setup(Error **errp)
+{
+int ret;
+
+ret = _multifd_load_setup(errp);
+if (ret) {
+return ret;
+}
+
+multifd_recv_state->recv_pages = multifd_normal_recv_pages;
+
+return 0;
+}
+
+int multifd_load_setup(Error **errp)
+{
+/*
+ * Return successfully if multiFD recv state is already initialised
+ * or multiFD is not enabled.
+ */
+if (multifd_recv_state || !migrate_multifd()) {
+return 0;
+}
+
+if (migrate_colo()) {
+return multifd_colo_load_setup(errp);
+} else {
+return multifd_normal_load_setup(errp);
+}
+}
+
 bool multifd_recv_all_channels_created(void)
 {
 int thread_count = migrate_multifd_channels();
-- 
2.39.2





[PATCH v2 1/6] ram: Add public helper to set colo bitmap

2023-05-08 Thread Lukas Straub
The overhead of the mutex in non-multifd mode is negligible,
because in that case it's just a single thread taking the mutex.

This will be used in the next commits to add colo support to multifd.
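
As a rough illustration of the pattern (take the lock once per batch, then count only the bits that were newly set), here is a minimal sketch. It assumes 4 KiB pages, a small fixed-size bitmap and a pthread mutex; all names are hypothetical stand-ins, not the QEMU helpers:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS 12        /* assumption: 4 KiB pages */
#define BITS_PER_WORD 64

static pthread_mutex_t bitmap_mutex = PTHREAD_MUTEX_INITIALIZER;
static uint64_t bitmap[4];  /* room for 256 pages */
static uint64_t dirty_pages;

/* Set bit nr and return its previous value. Caller holds bitmap_mutex. */
static bool test_and_set(uint64_t *map, size_t nr)
{
    uint64_t mask = UINT64_C(1) << (nr % BITS_PER_WORD);
    uint64_t *word = &map[nr / BITS_PER_WORD];
    bool old = (*word & mask) != 0;

    *word |= mask;
    return old;
}

/* Record a batch of page offsets, taking the mutex once for the batch. */
static void record_bitmap(const uint64_t *offsets, size_t n)
{
    pthread_mutex_lock(&bitmap_mutex);
    for (size_t i = 0; i < n; i++) {
        /* only pages that were not already marked bump the counter */
        dirty_pages += !test_and_set(bitmap, offsets[i] >> PAGE_BITS);
    }
    pthread_mutex_unlock(&bitmap_mutex);
}
```

With a single caller the lock is never contended, so the cost is one lock/unlock pair per batch.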

Signed-off-by: Lukas Straub 
---
 migration/ram.c | 17 ++---
 migration/ram.h |  1 +
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 5e7bf20ca5..2d3fd2112a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3633,6 +3633,18 @@ static ram_addr_t host_page_offset_from_ram_block_offset(RAMBlock *block,
 return ((uintptr_t)block->host + offset) & (block->page_size - 1);
 }
 
+void colo_record_bitmap(RAMBlock *block, ram_addr_t *normal, uint normal_num)
+{
+qemu_mutex_lock(&ram_state->bitmap_mutex);
+for (int i = 0; i < normal_num; i++) {
+ram_addr_t offset = normal[i];
+ram_state->migration_dirty_pages += !test_and_set_bit(
+offset >> TARGET_PAGE_BITS,
+block->bmap);
+}
+qemu_mutex_unlock(&ram_state->bitmap_mutex);
+}
+
 static inline void *colo_cache_from_block_offset(RAMBlock *block,
  ram_addr_t offset, bool record_bitmap)
 {
@@ -3650,9 +3662,8 @@ static inline void *colo_cache_from_block_offset(RAMBlock *block,
 * It help us to decide which pages in ram cache should be flushed
 * into VM's RAM later.
 */
-if (record_bitmap &&
-!test_and_set_bit(offset >> TARGET_PAGE_BITS, block->bmap)) {
-ram_state->migration_dirty_pages++;
+if (record_bitmap) {
+colo_record_bitmap(block, &offset, 1);
 }
 return block->colo_cache + offset;
 }
diff --git a/migration/ram.h b/migration/ram.h
index 6fffbeb5f1..887d1fbae6 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -82,6 +82,7 @@ int colo_init_ram_cache(void);
 void colo_flush_ram_cache(void);
 void colo_release_ram_cache(void);
 void colo_incoming_start_dirty_log(void);
+void colo_record_bitmap(RAMBlock *block, ram_addr_t *normal, uint normal_num);
 
 /* Background snapshot */
 bool ram_write_tracking_available(void);
-- 
2.39.2





[PATCH v2 2/6] ram: Let colo_flush_ram_cache take the bitmap_mutex

2023-05-08 Thread Lukas Straub
This is not strictly required: colo_flush_ram_cache does not run
concurrently with the multifd threads, since the cache is only flushed
after everything has been received. But it makes me more comfortable.

This will be used in the next commits to add colo support to multifd.

Signed-off-by: Lukas Straub 
---
 migration/ram.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 2d3fd2112a..f9e7aeda12 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4230,6 +4230,7 @@ void colo_flush_ram_cache(void)
 unsigned long offset = 0;
 
 memory_global_dirty_log_sync();
+qemu_mutex_lock(&ram_state->bitmap_mutex);
 WITH_RCU_READ_LOCK_GUARD() {
 RAMBLOCK_FOREACH_NOT_IGNORED(block) {
 ramblock_sync_dirty_bitmap(ram_state, block);
@@ -4264,6 +4265,7 @@ void colo_flush_ram_cache(void)
 }
 }
 }
+qemu_mutex_unlock(&ram_state->bitmap_mutex);
 trace_colo_flush_ram_cache_end();
 }
 
-- 
2.39.2





[PATCH v2 5/6] multifd: Add the ramblock to MultiFDRecvParams

2023-05-08 Thread Lukas Straub
This will be used in the next commits to add colo support to multifd.

Signed-off-by: Lukas Straub 
---
 migration/multifd.c | 11 +--
 migration/multifd.h |  2 ++
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index fb5e8859de..fddbf86596 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -284,7 +284,6 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
 {
 MultiFDPacket_t *packet = p->packet;
-RAMBlock *block;
 int i;
 
 packet->magic = be32_to_cpu(packet->magic);
@@ -334,21 +333,21 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
 
 /* make sure that ramblock is 0 terminated */
 packet->ramblock[255] = 0;
-block = qemu_ram_block_by_name(packet->ramblock);
-if (!block) {
+p->block = qemu_ram_block_by_name(packet->ramblock);
+if (!p->block) {
 error_setg(errp, "multifd: unknown ram block %s",
packet->ramblock);
 return -1;
 }
 
-p->host = block->host;
+p->host = p->block->host;
 for (i = 0; i < p->normal_num; i++) {
 uint64_t offset = be64_to_cpu(packet->offset[i]);
 
-if (offset > (block->used_length - p->page_size)) {
+if (offset > (p->block->used_length - p->page_size)) {
 error_setg(errp, "multifd: offset too long %" PRIu64
" (max " RAM_ADDR_FMT ")",
-   offset, block->used_length);
+   offset, p->block->used_length);
 return -1;
 }
 p->normal[i] = offset;
diff --git a/migration/multifd.h b/migration/multifd.h
index 7cfc265148..a835643b48 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -175,6 +175,8 @@ typedef struct {
 uint32_t next_packet_size;
 /* packets sent through this channel */
 uint64_t num_packets;
+/* ramblock */
+RAMBlock *block;
 /* ramblock host address */
 uint8_t *host;
 /* non zero pages recv through this channel */
-- 
2.39.2





[PATCH v2 0/6] multifd: Add colo support

2023-05-08 Thread Lukas Straub
Hello Everyone,
These patches add support for colo to multifd.

-v2:
 - Split out addition of p->block 
 - Add more comments

Lukas Straub (6):
  ram: Add public helper to set colo bitmap
  ram: Let colo_flush_ram_cache take the bitmap_mutex
  multifd: Introduce multifd-internal.h
  multifd: Introduce an overridable recv_pages method
  multifd: Add the ramblock to MultiFDRecvParams
  multifd: Add colo support

 migration/meson.build|  1 +
 migration/multifd-colo.c | 67 
 migration/multifd-internal.h | 39 +++
 migration/multifd.c  | 74 +++-
 migration/multifd.h  |  2 +
 migration/ram.c  | 19 +++--
 migration/ram.h  |  1 +
 7 files changed, 173 insertions(+), 30 deletions(-)
 create mode 100644 migration/multifd-colo.c
 create mode 100644 migration/multifd-internal.h

-- 
2.39.2




[PULL 12/13] ram-compress.c: Make target independent

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

Make ram-compress.c target independent.

Signed-off-by: Lukas Straub 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/meson.build|  3 ++-
 migration/ram-compress.c | 17 ++---
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/migration/meson.build b/migration/meson.build
index 2090af8e85..75de868bb7 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -23,6 +23,8 @@ softmmu_ss.add(files(
   'migration.c',
   'multifd.c',
   'multifd-zlib.c',
+  'multifd-zlib.c',
+  'ram-compress.c',
   'options.c',
   'postcopy-ram.c',
   'savevm.c',
@@ -40,5 +42,4 @@ softmmu_ss.add(when: zstd, if_true: files('multifd-zstd.c'))
 specific_ss.add(when: 'CONFIG_SOFTMMU',
 if_true: files('dirtyrate.c',
'ram.c',
-   'ram-compress.c',
'target.c'))
diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index 3d2a4a6329..06254d8c69 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -35,7 +35,8 @@
 #include "migration.h"
 #include "options.h"
 #include "io/channel-null.h"
-#include "exec/ram_addr.h"
+#include "exec/target_page.h"
+#include "exec/ramblock.h"
 
 CompressionStats compression_counters;
 
@@ -156,7 +157,7 @@ int compress_threads_save_setup(void)
 qemu_cond_init(&comp_done_cond);
 qemu_mutex_init(&comp_done_lock);
 for (i = 0; i < thread_count; i++) {
-comp_param[i].originbuf = g_try_malloc(TARGET_PAGE_SIZE);
+comp_param[i].originbuf = g_try_malloc(qemu_target_page_size());
 if (!comp_param[i].originbuf) {
 goto exit;
 }
@@ -192,11 +193,12 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
uint8_t *source_buf)
 {
 uint8_t *p = block->host + offset;
+size_t page_size = qemu_target_page_size();
 int ret;
 
 assert(qemu_file_buffer_empty(f));
 
-if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
+if (buffer_is_zero(p, page_size)) {
 return RES_ZEROPAGE;
 }
 
@@ -205,8 +207,8 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
  * so that we can catch up the error during compression and
  * decompression
  */
-memcpy(source_buf, p, TARGET_PAGE_SIZE);
-ret = qemu_put_compression_data(f, stream, source_buf, TARGET_PAGE_SIZE);
+memcpy(source_buf, p, page_size);
+ret = qemu_put_compression_data(f, stream, source_buf, page_size);
 if (ret < 0) {
 qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
 error_report("compressed data failed!");
@@ -336,7 +338,7 @@ static void *do_data_decompress(void *opaque)
 param->des = 0;
 qemu_mutex_unlock(&param->mutex);
 
-pagesize = TARGET_PAGE_SIZE;
+pagesize = qemu_target_page_size();
 
 ret = qemu_uncompress_data(&param->stream, des, pagesize,
param->compbuf, len);
@@ -439,7 +441,8 @@ int compress_threads_load_setup(QEMUFile *f)
 goto exit;
 }
 
-decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
+size_t compbuf_size = compressBound(qemu_target_page_size());
+decomp_param[i].compbuf = g_malloc0(compbuf_size);
 qemu_mutex_init(&decomp_param[i].mutex);
 qemu_cond_init(&decomp_param[i].cond);
 decomp_param[i].done = true;
-- 
2.40.0




[PULL 11/13] ram compress: Assert that the file buffer matches the result

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

Before this series, "nothing to send" was handled by the file buffer
being empty. Now it is tracked via param->result.

Assert that the file buffer state matches the result.
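
The invariant being asserted can be sketched in isolation. This is only an illustration under simplified assumptions: Param and buffer_empty below are hypothetical stand-ins for the per-thread state and for a "writable buffer is empty" check, not the QEMU types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in result codes mirroring the three cases discussed above. */
typedef enum { RES_NONE, RES_ZEROPAGE, RES_COMPRESS } Result;

/* Hypothetical per-thread state: bytes buffered plus the recorded result. */
typedef struct {
    size_t buffered;
    Result result;
} Param;

static bool buffer_empty(const Param *p)
{
    return p->buffered == 0;
}

/* Send whatever the worker produced; returns the number of bytes "sent". */
static size_t send_queued(Param *p)
{
    size_t sent = 0;

    if (p->result == RES_ZEROPAGE) {
        /* a zero page never writes into the buffer */
        assert(buffer_empty(p));
        sent = 1;
    } else if (p->result == RES_COMPRESS) {
        /* compressed output must be waiting in the buffer */
        assert(!buffer_empty(p));
        sent = p->buffered;
        p->buffered = 0;
    }

    /* after sending, nothing may be left over for the next page */
    assert(buffer_empty(p));
    p->result = RES_NONE;
    return sent;
}
```

If the buffer and the recorded result ever disagree, the assertion fires at the point of use instead of corrupting the stream later.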

Signed-off-by: Lukas Straub 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/qemu-file.c| 11 +++
 migration/qemu-file.h|  1 +
 migration/ram-compress.c |  5 +
 migration/ram.c  |  2 ++
 4 files changed, 19 insertions(+)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index f4cfd05c67..61fb580342 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -870,6 +870,17 @@ int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src)
 return len;
 }
 
+/*
+ * Check if the writable buffer is empty
+ */
+
+bool qemu_file_buffer_empty(QEMUFile *file)
+{
+assert(qemu_file_is_writable(file));
+
+return !file->iovcnt;
+}
+
 /*
  * Get a string whose length is determined by a single preceding byte
  * A preallocated 256 byte buffer must be passed in.
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 4f26bf6961..4ee58a87dd 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -113,6 +113,7 @@ size_t coroutine_mixed_fn qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, s
 ssize_t qemu_put_compression_data(QEMUFile *f, z_stream *stream,
   const uint8_t *p, size_t size);
 int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src);
+bool qemu_file_buffer_empty(QEMUFile *file);
 
 /*
  * Note that you can only peek continuous bytes from where the current pointer
diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index c25562f12d..3d2a4a6329 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -194,6 +194,8 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
 uint8_t *p = block->host + offset;
 int ret;
 
+assert(qemu_file_buffer_empty(f));
+
 if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
 return RES_ZEROPAGE;
 }
@@ -208,6 +210,7 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
 if (ret < 0) {
 qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
 error_report("compressed data failed!");
+qemu_fflush(f);
 return RES_NONE;
 }
 return RES_COMPRESS;
@@ -239,6 +242,7 @@ void flush_compressed_data(int (send_queued_data(CompressParam *)))
 if (!comp_param[idx].quit) {
 CompressParam *param = &comp_param[idx];
 send_queued_data(param);
+assert(qemu_file_buffer_empty(param->file));
 compress_reset_result(param);
 }
 qemu_mutex_unlock(&comp_param[idx].mutex);
@@ -268,6 +272,7 @@ retry:
 qemu_mutex_lock(&param->mutex);
 param->done = false;
 send_queued_data(param);
+assert(qemu_file_buffer_empty(param->file));
 compress_reset_result(param);
 set_compress_params(param, block, offset);
 
diff --git a/migration/ram.c b/migration/ram.c
index 009681d213..ee4ab31f25 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1321,11 +1321,13 @@ static int send_queued_data(CompressParam *param)
 assert(block == pss->last_sent_block);
 
 if (param->result == RES_ZEROPAGE) {
+assert(qemu_file_buffer_empty(param->file));
 len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_ZERO);
 qemu_put_byte(file, 0);
 len += 1;
 ram_release_page(block->idstr, offset);
 } else if (param->result == RES_COMPRESS) {
+assert(!qemu_file_buffer_empty(param->file));
 len += save_page_header(pss, file, block,
 offset | RAM_SAVE_FLAG_COMPRESS_PAGE);
 len += qemu_put_qemu_file(file, param->file);
-- 
2.40.0




[PULL 13/13] migration: Initialize and cleanup decompression in migration.c

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

This fixes compress with colo.

Signed-off-by: Lukas Straub 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/migration.c | 9 +
 migration/ram.c   | 5 -
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 232e387109..0ee07802a5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -26,6 +26,7 @@
 #include "sysemu/cpu-throttle.h"
 #include "rdma.h"
 #include "ram.h"
+#include "ram-compress.h"
 #include "migration/global_state.h"
 #include "migration/misc.h"
 #include "migration.h"
@@ -228,6 +229,7 @@ void migration_incoming_state_destroy(void)
 struct MigrationIncomingState *mis = migration_incoming_get_current();
 
 multifd_load_cleanup();
+compress_threads_load_cleanup();
 
 if (mis->to_src_file) {
 /* Tell source that we are done */
@@ -500,6 +502,12 @@ process_incoming_migration_co(void *opaque)
 Error *local_err = NULL;
 
 assert(mis->from_src_file);
+
+if (compress_threads_load_setup(mis->from_src_file)) {
+error_report("Failed to setup decompress threads");
+goto fail;
+}
+
 mis->migration_incoming_co = qemu_coroutine_self();
 mis->largest_page_size = qemu_ram_pagesize_largest();
 postcopy_state_set(POSTCOPY_INCOMING_NONE);
@@ -565,6 +573,7 @@ fail:
 qemu_fclose(mis->from_src_file);
 
 multifd_load_cleanup();
+compress_threads_load_cleanup();
 
 exit(EXIT_FAILURE);
 }
diff --git a/migration/ram.c b/migration/ram.c
index ee4ab31f25..f78e9912cd 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3558,10 +3558,6 @@ void colo_release_ram_cache(void)
  */
 static int ram_load_setup(QEMUFile *f, void *opaque)
 {
-if (compress_threads_load_setup(f)) {
-return -1;
-}
-
 xbzrle_load_setup();
 ramblock_recv_map_init();
 
@@ -3577,7 +3573,6 @@ static int ram_load_cleanup(void *opaque)
 }
 
 xbzrle_load_cleanup();
-compress_threads_load_cleanup();
 
 RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
 g_free(rb->receivedmap);
-- 
2.40.0




[PULL 01/13] qtest/migration-test.c: Add tests with compress enabled

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

There have never been tests for migration with compress enabled.

Add suitable tests, testing with compress-wait-thread = false
too.

Signed-off-by: Lukas Straub 
Acked-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 tests/qtest/migration-test.c | 109 +++
 1 file changed, 109 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index be73ec3c06..ea0d3fad2a 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -406,6 +406,41 @@ static void migrate_set_parameter_str(QTestState *who, const char *parameter,
 migrate_check_parameter_str(who, parameter, value);
 }
 
+static long long migrate_get_parameter_bool(QTestState *who,
+   const char *parameter)
+{
+QDict *rsp;
+int result;
+
+rsp = wait_command(who, "{ 'execute': 'query-migrate-parameters' }");
+result = qdict_get_bool(rsp, parameter);
+qobject_unref(rsp);
+return !!result;
+}
+
+static void migrate_check_parameter_bool(QTestState *who, const char *parameter,
+int value)
+{
+int result;
+
+result = migrate_get_parameter_bool(who, parameter);
+g_assert_cmpint(result, ==, value);
+}
+
+static void migrate_set_parameter_bool(QTestState *who, const char *parameter,
+  int value)
+{
+QDict *rsp;
+
+rsp = qtest_qmp(who,
+"{ 'execute': 'migrate-set-parameters',"
+"'arguments': { %s: %i } }",
+parameter, value);
+g_assert(qdict_haskey(rsp, "return"));
+qobject_unref(rsp);
+migrate_check_parameter_bool(who, parameter, value);
+}
+
 static void migrate_ensure_non_converge(QTestState *who)
 {
 /* Can't converge with 1ms downtime + 3 mbs bandwidth limit */
@@ -1524,6 +1559,70 @@ static void test_precopy_unix_xbzrle(void)
 test_precopy_common(&args);
 }
 
+static void *
+test_migrate_compress_start(QTestState *from,
+QTestState *to)
+{
+migrate_set_parameter_int(from, "compress-level", 1);
+migrate_set_parameter_int(from, "compress-threads", 4);
+migrate_set_parameter_bool(from, "compress-wait-thread", true);
+migrate_set_parameter_int(to, "decompress-threads", 4);
+
+migrate_set_capability(from, "compress", true);
+migrate_set_capability(to, "compress", true);
+
+return NULL;
+}
+
+static void test_precopy_unix_compress(void)
+{
+g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+MigrateCommon args = {
+.connect_uri = uri,
+.listen_uri = uri,
+.start_hook = test_migrate_compress_start,
+/*
+ * Test that no invalid thread state is left over from
+ * the previous iteration.
+ */
+.iterations = 2,
+};
+
+test_precopy_common(&args);
+}
+
+static void *
+test_migrate_compress_nowait_start(QTestState *from,
+   QTestState *to)
+{
+migrate_set_parameter_int(from, "compress-level", 9);
+migrate_set_parameter_int(from, "compress-threads", 1);
+migrate_set_parameter_bool(from, "compress-wait-thread", false);
+migrate_set_parameter_int(to, "decompress-threads", 1);
+
+migrate_set_capability(from, "compress", true);
+migrate_set_capability(to, "compress", true);
+
+return NULL;
+}
+
+static void test_precopy_unix_compress_nowait(void)
+{
+g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+MigrateCommon args = {
+.connect_uri = uri,
+.listen_uri = uri,
+.start_hook = test_migrate_compress_nowait_start,
+/*
+ * Test that no invalid thread state is left over from
+ * the previous iteration.
+ */
+.iterations = 2,
+};
+
+test_precopy_common(&args);
+}
+
 static void test_precopy_tcp_plain(void)
 {
 MigrateCommon args = {
@@ -2537,6 +2636,16 @@ int main(int argc, char **argv)
 qtest_add_func("/migration/bad_dest", test_baddest);
 qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain);
 qtest_add_func("/migration/precopy/unix/xbzrle", test_precopy_unix_xbzrle);
+/*
+ * Compression fails from time to time.
+ * Put test here but don't enable it until everything is fixed.
+ */
+if (getenv("QEMU_TEST_FLAKY_TESTS")) {
+qtest_add_func("/migration/precopy/unix/compress/wait",
+   test_precopy_unix_compress);
+qtest_add_func("/migration/precopy/unix/compress/nowait",
+   test_precopy_unix_compress_nowait);
+}
 #ifdef CONFIG_GNUTLS
 qtest_add_func("/migration/precopy/unix/tls/psk",
test_precopy_unix_tls_psk);
-- 
2.40.0




[PULL 08/13] ram.c: Remove last ram.c dependency from the core compress code

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

Make compression interfaces take send_queued_data() as an argument.
Remove save_page_use_compression() from flush_compressed_data().

This removes the last ram.c dependency from the core compress code.
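
The decoupling can be sketched as follows: the core loop receives the send function as an argument, while the caller keeps the "is compression in use" check and the concrete sender. A minimal illustration with hypothetical names (Param, flush_all, caller_flush), not the QEMU code:

```c
/* Hypothetical per-thread state: number of bytes queued for sending. */
typedef struct Param {
    int queued;
} Param;

typedef int (*SendFn)(Param *);

/* Core code: knows nothing about how or where data is sent. */
static int flush_all(Param *params, int n, SendFn send)
{
    int total = 0;

    for (int i = 0; i < n; i++) {
        total += send(&params[i]);
        params[i].queued = 0;   /* reset after the data was handed off */
    }
    return total;
}

/* Caller side: owns the policy check and the concrete send function. */
static int enabled = 1;

static int send_queued(Param *p)
{
    return p->queued;
}

static int caller_flush(Param *params, int n)
{
    if (!enabled) {
        return 0;               /* policy stays with the caller */
    }
    return flush_all(params, n, send_queued);
}
```

Because flush_all only sees a function pointer, it can live in a separate file without depending on the caller's internals.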

Signed-off-by: Lukas Straub 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index d1c24eff21..0cce65dfa5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1545,13 +1545,10 @@ static int send_queued_data(CompressParam *param)
 return len;
 }
 
-static void flush_compressed_data(RAMState *rs)
+static void flush_compressed_data(int (send_queued_data(CompressParam *)))
 {
 int idx, thread_count;
 
-if (!save_page_use_compression(rs)) {
-return;
-}
 thread_count = migrate_compress_threads();
 
 qemu_mutex_lock(&comp_done_lock);
@@ -1573,6 +1570,15 @@ static void flush_compressed_data(RAMState *rs)
 }
 }
 
+static void ram_flush_compressed_data(RAMState *rs)
+{
+if (!save_page_use_compression(rs)) {
+return;
+}
+
+flush_compressed_data(send_queued_data);
+}
+
 static inline void set_compress_params(CompressParam *param, RAMBlock *block,
ram_addr_t offset)
 {
@@ -1581,7 +1587,8 @@ static inline void set_compress_params(CompressParam *param, RAMBlock *block,
 param->trigger = true;
 }
 
-static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset)
+static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset,
+int (send_queued_data(CompressParam *)))
 {
 int idx, thread_count, pages = -1;
 bool wait = migrate_compress_wait_thread();
@@ -1672,7 +1679,7 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
  * Also If xbzrle is on, stop using the data compression at this
  * point. In theory, xbzrle can do better than compression.
  */
-flush_compressed_data(rs);
+ram_flush_compressed_data(rs);
 
 /* Hit the end of the list */
 pss->block = QLIST_FIRST_RCU(&ram_list.blocks);
@@ -2362,11 +2369,11 @@ static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
  * much CPU resource.
  */
 if (block != pss->last_sent_block) {
-flush_compressed_data(rs);
+ram_flush_compressed_data(rs);
 return false;
 }
 
-if (compress_page_with_multi_thread(block, offset) > 0) {
+if (compress_page_with_multi_thread(block, offset, send_queued_data) > 0) {
 return true;
 }
 
@@ -3412,7 +3419,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
  * page is sent in one chunk.
  */
 if (migrate_postcopy_ram()) {
-flush_compressed_data(rs);
+ram_flush_compressed_data(rs);
 }
 
 /*
@@ -3507,7 +3514,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 }
 qemu_mutex_unlock(&rs->bitmap_mutex);
 
-flush_compressed_data(rs);
+ram_flush_compressed_data(rs);
 ram_control_after_iterate(f, RAM_CONTROL_FINISH);
 }
 
-- 
2.40.0




[PULL 09/13] ram.c: Move core compression code into its own file

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

No functional changes intended.

Signed-off-by: Lukas Straub 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/meson.build|   5 +-
 migration/ram-compress.c | 274 +++
 migration/ram-compress.h |  65 ++
 migration/ram.c  | 262 +
 4 files changed, 344 insertions(+), 262 deletions(-)
 create mode 100644 migration/ram-compress.c
 create mode 100644 migration/ram-compress.h

diff --git a/migration/meson.build b/migration/meson.build
index da1897fadf..2090af8e85 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -38,4 +38,7 @@ endif
 softmmu_ss.add(when: zstd, if_true: files('multifd-zstd.c'))
 
 specific_ss.add(when: 'CONFIG_SOFTMMU',
-if_true: files('dirtyrate.c', 'ram.c', 'target.c'))
+if_true: files('dirtyrate.c',
+   'ram.c',
+   'ram-compress.c',
+   'target.c'))
diff --git a/migration/ram-compress.c b/migration/ram-compress.c
new file mode 100644
index 00..d9bc67d075
--- /dev/null
+++ b/migration/ram-compress.c
@@ -0,0 +1,274 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2011-2015 Red Hat Inc
+ *
+ * Authors:
+ *  Juan Quintela 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+
+#include "ram-compress.h"
+
+#include "qemu/error-report.h"
+#include "migration.h"
+#include "options.h"
+#include "io/channel-null.h"
+#include "exec/ram_addr.h"
+
+CompressionStats compression_counters;
+
+static CompressParam *comp_param;
+static QemuThread *compress_threads;
+/* comp_done_cond is used to wake up the migration thread when
+ * one of the compression threads has finished the compression.
+ * comp_done_lock is used to co-work with comp_done_cond.
+ */
+static QemuMutex comp_done_lock;
+static QemuCond comp_done_cond;
+
+static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
+   RAMBlock *block, ram_addr_t offset,
+   uint8_t *source_buf);
+
+static void *do_data_compress(void *opaque)
+{
+CompressParam *param = opaque;
+RAMBlock *block;
+ram_addr_t offset;
+CompressResult result;
+
+qemu_mutex_lock(&param->mutex);
+while (!param->quit) {
+if (param->trigger) {
+block = param->block;
+offset = param->offset;
+param->trigger = false;
+qemu_mutex_unlock(&param->mutex);
+
+result = do_compress_ram_page(param->file, &param->stream,
+  block, offset, param->originbuf);
+
+qemu_mutex_lock(&comp_done_lock);
+param->done = true;
+param->result = result;
+qemu_cond_signal(&comp_done_cond);
+qemu_mutex_unlock(&comp_done_lock);
+
+qemu_mutex_lock(&param->mutex);
+} else {
+qemu_cond_wait(&param->cond, &param->mutex);
+}
+}
+qemu_mutex_unlock(&param->mutex);
+
+return NULL;
+}
+
+void compress_threads_save_cleanup(void)
+{
+int i, thread_count;
+
+if (!migrate_compress() || !comp_param) {
+return;
+}
+
+thread_count = migrate_compress_threads();
+for (i = 0; i < thread_count; i++) {
+/*
+ * we use it as a indicator which shows if the thread is
+ * properly init'd or not
+ */
+if (!comp_param[i].file) {
+break;
+}
+
+qemu_mutex_lock(&comp_param[i].mutex);
+comp_param[i].quit = true;
+qemu_cond_signal(&comp_param[i].cond);
+qemu_mutex_unlock(&comp_param[i].mutex);
+
+qemu_thread_join(compress_threads + i);
+qemu_mutex_destroy(&comp_param[i].mutex);
+qemu_cond_destroy(&comp

[PULL 02/13] qtest/migration-test.c: Add postcopy tests with compress enabled

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

Add postcopy tests with compress enabled to ensure nothing breaks
with the refactoring in the next commits.

preempt+compress is blocked, so no test needed for that case.

Signed-off-by: Lukas Straub 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 tests/qtest/migration-test.c | 85 +++-
 1 file changed, 55 insertions(+), 30 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index ea0d3fad2a..8a5df84624 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1127,6 +1127,36 @@ test_migrate_tls_x509_finish(QTestState *from,
 #endif /* CONFIG_TASN1 */
 #endif /* CONFIG_GNUTLS */
 
+static void *
+test_migrate_compress_start(QTestState *from,
+QTestState *to)
+{
+migrate_set_parameter_int(from, "compress-level", 1);
+migrate_set_parameter_int(from, "compress-threads", 4);
+migrate_set_parameter_bool(from, "compress-wait-thread", true);
+migrate_set_parameter_int(to, "decompress-threads", 4);
+
+migrate_set_capability(from, "compress", true);
+migrate_set_capability(to, "compress", true);
+
+return NULL;
+}
+
+static void *
+test_migrate_compress_nowait_start(QTestState *from,
+   QTestState *to)
+{
+migrate_set_parameter_int(from, "compress-level", 9);
+migrate_set_parameter_int(from, "compress-threads", 1);
+migrate_set_parameter_bool(from, "compress-wait-thread", false);
+migrate_set_parameter_int(to, "decompress-threads", 1);
+
+migrate_set_capability(from, "compress", true);
+migrate_set_capability(to, "compress", true);
+
+return NULL;
+}
+
 static int migrate_postcopy_prepare(QTestState **from_ptr,
 QTestState **to_ptr,
 MigrateCommon *args)
@@ -1204,6 +1234,15 @@ static void test_postcopy(void)
 test_postcopy_common(&args);
 }
 
+static void test_postcopy_compress(void)
+{
+MigrateCommon args = {
+.start_hook = test_migrate_compress_start
+};
+
+test_postcopy_common(&args);
+}
+
 static void test_postcopy_preempt(void)
 {
 MigrateCommon args = {
@@ -1305,6 +1344,15 @@ static void test_postcopy_recovery(void)
 test_postcopy_recovery_common(&args);
 }
 
+static void test_postcopy_recovery_compress(void)
+{
+MigrateCommon args = {
+.start_hook = test_migrate_compress_start
+};
+
+test_postcopy_recovery_common(&args);
+}
+
 #ifdef CONFIG_GNUTLS
 static void test_postcopy_recovery_tls_psk(void)
 {
@@ -1338,6 +1386,7 @@ static void test_postcopy_preempt_all(void)
 
 test_postcopy_recovery_common(&args);
 }
+
 #endif
 
 static void test_baddest(void)
@@ -1559,21 +1608,6 @@ static void test_precopy_unix_xbzrle(void)
 test_precopy_common(&args);
 }
 
-static void *
-test_migrate_compress_start(QTestState *from,
-QTestState *to)
-{
-migrate_set_parameter_int(from, "compress-level", 1);
-migrate_set_parameter_int(from, "compress-threads", 4);
-migrate_set_parameter_bool(from, "compress-wait-thread", true);
-migrate_set_parameter_int(to, "decompress-threads", 4);
-
-migrate_set_capability(from, "compress", true);
-migrate_set_capability(to, "compress", true);
-
-return NULL;
-}
-
 static void test_precopy_unix_compress(void)
 {
 g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
@@ -1591,21 +1625,6 @@ static void test_precopy_unix_compress(void)
 test_precopy_common(&args);
 }
 
-static void *
-test_migrate_compress_nowait_start(QTestState *from,
-   QTestState *to)
-{
-migrate_set_parameter_int(from, "compress-level", 9);
-migrate_set_parameter_int(from, "compress-threads", 1);
-migrate_set_parameter_bool(from, "compress-wait-thread", false);
-migrate_set_parameter_int(to, "decompress-threads", 1);
-
-migrate_set_capability(from, "compress", true);
-migrate_set_capability(to, "compress", true);
-
-return NULL;
-}
-
 static void test_precopy_unix_compress_nowait(void)
 {
 g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
@@ -2631,6 +2650,12 @@ int main(int argc, char **argv)
 qtest_add_func("/migration/postcopy/preempt/plain", test_postcopy_preempt);
 qtest_add_func("/migration/postcopy/preempt/recovery/plain",
test_postcopy_preempt_recovery);
+if (getenv("QEMU_TEST_FLAKY_TESTS")) {
+qtest_add_func("/migration/postcopy/compress/plain",
+   test_postcopy_compress);
+qtest_add_func("/migration/postcopy/recovery/compress/plain",
+   test_postcopy_recovery_compress);
+}
 }
 
 qtest_add_func("/migration/bad_dest", test_baddest);
-- 
2.40.0




[PULL 00/13] Compression code patches

2023-05-08 Thread Juan Quintela
The following changes since commit 792f77f376adef944f9a03e601f6ad90c2f891b2:

  Merge tag 'pull-loongarch-20230506' of https://gitlab.com/gaosong/qemu into staging (2023-05-06 08:11:52 +0100)

are available in the Git repository at:

  https://gitlab.com/juan.quintela/qemu.git tags/compression-code-pull-request

for you to fetch changes up to c323518a7aab1c01740a468671b7f2b517d3bca6:

  migration: Initialize and cleanup decompression in migration.c (2023-05-08 15:25:27 +0200)


Migration PULL request (20230508 edition, take 2)

Hi

This is just the compression bits of the Migration PULL request for
20230428.  The only change is that we don't run the compression tests
by default.

The problem already exists in the compression code.  The tests just
show that it doesn't work.

- Add migration tests for (old) compress migration code (lukas)
- Make compression code independent of ram.c (lukas)
- Move compression code into ram-compress.c (lukas)

Please apply, Juan.



Lukas Straub (13):
  qtest/migration-test.c: Add tests with compress enabled
  qtest/migration-test.c: Add postcopy tests with compress enabled
  ram.c: Let the compress threads return a CompressResult enum
  ram.c: Dont change param->block in the compress thread
  ram.c: Reset result after sending queued data
  ram.c: Do not call save_page_header() from compress threads
  ram.c: Call update_compress_thread_counts from
compress_send_queued_data
  ram.c: Remove last ram.c dependency from the core compress code
  ram.c: Move core compression code into its own file
  ram.c: Move core decompression code into its own file
  ram compress: Assert that the file buffer matches the result
  ram-compress.c: Make target independent
  migration: Initialize and cleanup decompression in migration.c

 migration/meson.build|   6 +-
 migration/migration.c|   9 +
 migration/qemu-file.c|  11 +
 migration/qemu-file.h|   1 +
 migration/ram-compress.c | 485 +
 migration/ram-compress.h |  70 +
 migration/ram.c  | 502 +++
 tests/qtest/migration-test.c | 134 ++
 8 files changed, 758 insertions(+), 460 deletions(-)
 create mode 100644 migration/ram-compress.c
 create mode 100644 migration/ram-compress.h

-- 
2.40.0




[PULL 07/13] ram.c: Call update_compress_thread_counts from compress_send_queued_data

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

This makes the core compress code more independent of ram.c.

Signed-off-by: Lukas Straub 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index c52602b70d..d1c24eff21 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1540,12 +1540,14 @@ static int send_queued_data(CompressParam *param)
 abort();
 }
 
+update_compress_thread_counts(param, len);
+
 return len;
 }
 
 static void flush_compressed_data(RAMState *rs)
 {
-int idx, len, thread_count;
+int idx, thread_count;
 
 if (!save_page_use_compression(rs)) {
 return;
@@ -1564,15 +1566,8 @@ static void flush_compressed_data(RAMState *rs)
 qemu_mutex_lock(&comp_param[idx].mutex);
 if (!comp_param[idx].quit) {
 CompressParam *param = &comp_param[idx];
-len = send_queued_data(param);
+send_queued_data(param);
 compress_reset_result(param);
-
-/*
- * it's safe to fetch zero_page without holding comp_done_lock
- * as there is no further request submitted to the thread,
- * i.e, the thread should be waiting for a request at this point.
- */
-update_compress_thread_counts(param, len);
 }
 qemu_mutex_unlock(&comp_param[idx].mutex);
 }
@@ -1588,7 +1583,7 @@ static inline void set_compress_params(CompressParam *param, RAMBlock *block,
 
 static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset)
 {
-int idx, thread_count, bytes_xmit = -1, pages = -1;
+int idx, thread_count, pages = -1;
 bool wait = migrate_compress_wait_thread();
 
 thread_count = migrate_compress_threads();
@@ -1599,11 +1594,10 @@ retry:
 CompressParam *param = &comp_param[idx];
 qemu_mutex_lock(&param->mutex);
 param->done = false;
-bytes_xmit = send_queued_data(param);
+send_queued_data(param);
 compress_reset_result(param);
 set_compress_params(param, block, offset);
 
-update_compress_thread_counts(param, bytes_xmit);
 qemu_cond_signal(&param->cond);
 qemu_mutex_unlock(&param->mutex);
 pages = 1;
-- 
2.40.0




[PULL 04/13] ram.c: Dont change param->block in the compress thread

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

Instead, introduce an extra parameter to trigger the compress thread.
Now, when the compress thread is done, we know which RAMBlock and
offset it compressed.

This will be used in the next commits to move save_page_header()
out of compress code.

Signed-off-by: Lukas Straub 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7bc05fc034..b552a9e538 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -492,6 +492,7 @@ typedef enum CompressResult CompressResult;
 struct CompressParam {
 bool done;
 bool quit;
+bool trigger;
 CompressResult result;
 QEMUFile *file;
 QemuMutex mutex;
@@ -565,10 +566,10 @@ static void *do_data_compress(void *opaque)
 
 qemu_mutex_lock(&param->mutex);
 while (!param->quit) {
-if (param->block) {
+if (param->trigger) {
 block = param->block;
 offset = param->offset;
-param->block = NULL;
+param->trigger = false;
 qemu_mutex_unlock(&param->mutex);
 
 result = do_compress_ram_page(param->file, &param->stream,
@@ -1545,6 +1546,7 @@ static inline void set_compress_params(CompressParam *param, RAMBlock *block,
 {
 param->block = block;
 param->offset = offset;
+param->trigger = true;
 }
 
 static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset)
-- 
2.40.0
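
The effect of the separate trigger flag can be sketched outside QEMU. This is only an illustration under simplifying assumptions: the names `Request`, `request_submit` and `request_poll` are hypothetical, all locking and condition variables are left out, and a string stands in for the RAMBlock. It shows that once the request is signalled by its own flag, `block` and `offset` survive the worker's run and can be read back afterwards.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for CompressParam (no locking). */
typedef struct Request {
    bool trigger;        /* raised by the submitter, cleared by the worker */
    bool done;           /* raised by the worker when it has finished */
    const char *block;   /* identifies the page; left intact by the worker */
    size_t offset;
} Request;

/* Submitter side: describe the page, then raise the trigger. */
static void request_submit(Request *req, const char *block, size_t offset)
{
    req->block = block;
    req->offset = offset;
    req->done = false;
    req->trigger = true;
}

/*
 * Worker side: one iteration of the loop body.
 * Returns true if a request was consumed, false if nothing was queued
 * (the real thread would wait on its condition variable in that case).
 */
static bool request_poll(Request *req)
{
    if (!req->trigger) {
        return false;
    }
    req->trigger = false;   /* consume the request, keep block/offset */
    /* ... compression of (block, offset) would happen here ... */
    req->done = true;
    return true;
}
```

With the previous scheme, where `block` itself was reset to NULL to consume the request, the last assertion-style check (reading `block` back after completion) would not be possible.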




[PULL 06/13] ram.c: Do not call save_page_header() from compress threads

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

save_page_header() accesses several global variables, so calling it
from multiple threads is pretty ugly.

Instead, call save_page_header() before writing out the compressed
data from the compress buffer to the migration stream.

This also makes the core compress code more independent of ram.c.

Signed-off-by: Lukas Straub 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 44 +++-
 1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 4e14e3bb94..c52602b70d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1465,17 +1465,13 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
RAMBlock *block, ram_addr_t offset,
uint8_t *source_buf)
 {
-RAMState *rs = ram_state;
-PageSearchStatus *pss = &rs->pss[RAM_CHANNEL_PRECOPY];
 uint8_t *p = block->host + offset;
 int ret;
 
-if (save_zero_page_to_file(pss, f, block, offset)) {
+if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
 return RES_ZEROPAGE;
 }
 
-save_page_header(pss, f, block, offset | RAM_SAVE_FLAG_COMPRESS_PAGE);
-
 /*
  * copy it to a internal buffer to avoid it being modified by VM
  * so that we can catch up the error during compression and
@@ -1515,9 +1511,40 @@ static inline void compress_reset_result(CompressParam *param)
 param->offset = 0;
 }
 
+static int send_queued_data(CompressParam *param)
+{
+PageSearchStatus *pss = &ram_state->pss[RAM_CHANNEL_PRECOPY];
+MigrationState *ms = migrate_get_current();
+QEMUFile *file = ms->to_dst_file;
+int len = 0;
+
+RAMBlock *block = param->block;
+ram_addr_t offset = param->offset;
+
+if (param->result == RES_NONE) {
+return 0;
+}
+
+assert(block == pss->last_sent_block);
+
+if (param->result == RES_ZEROPAGE) {
+len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_ZERO);
+qemu_put_byte(file, 0);
+len += 1;
+ram_release_page(block->idstr, offset);
+} else if (param->result == RES_COMPRESS) {
+len += save_page_header(pss, file, block,
+offset | RAM_SAVE_FLAG_COMPRESS_PAGE);
+len += qemu_put_qemu_file(file, param->file);
+} else {
+abort();
+}
+
+return len;
+}
+
 static void flush_compressed_data(RAMState *rs)
 {
-MigrationState *ms = migrate_get_current();
 int idx, len, thread_count;
 
 if (!save_page_use_compression(rs)) {
@@ -1537,7 +1564,7 @@ static void flush_compressed_data(RAMState *rs)
 qemu_mutex_lock(&comp_param[idx].mutex);
 if (!comp_param[idx].quit) {
 CompressParam *param = &comp_param[idx];
-len = qemu_put_qemu_file(ms->to_dst_file, param->file);
+len = send_queued_data(param);
 compress_reset_result(param);
 
 /*
@@ -1563,7 +1590,6 @@ static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset)
 {
 int idx, thread_count, bytes_xmit = -1, pages = -1;
 bool wait = migrate_compress_wait_thread();
-MigrationState *ms = migrate_get_current();
 
 thread_count = migrate_compress_threads();
 qemu_mutex_lock(&comp_done_lock);
@@ -1573,7 +1599,7 @@ retry:
 CompressParam *param = &comp_param[idx];
 qemu_mutex_lock(&param->mutex);
 param->done = false;
-bytes_xmit = qemu_put_qemu_file(ms->to_dst_file, param->file);
+bytes_xmit = send_queued_data(param);
 compress_reset_result(param);
 set_compress_params(param, block, offset);
 
-- 
2.40.0
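
The division of labour introduced here (worker threads only produce a result and a payload, the sending thread writes the header) can be sketched in isolation. This is a rough model, not the QEMU code: `Queued`, `send_queued`, the one-byte tags `TAG_ZERO` and `TAG_COMPRESS`, and the flat output buffer are all assumptions made for the example, and where the patch calls abort() on an unknown result this sketch just returns 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum CompressResult {
    RES_NONE = 0,
    RES_ZEROPAGE = 1,
    RES_COMPRESS = 2
} CompressResult;

/* Hypothetical one-byte header tags; not the real stream format. */
enum { TAG_ZERO = 1, TAG_COMPRESS = 2 };

/* Hypothetical stand-in for the per-thread result slot. */
typedef struct Queued {
    CompressResult result;
    const uint8_t *payload;   /* compressed bytes produced by the worker */
    size_t payload_len;
} Queued;

/*
 * Called only from the sending thread: write header, then body.
 * Returns the number of bytes written, or 0 if nothing was queued
 * or the output buffer is too small.
 */
static size_t send_queued(const Queued *q, uint8_t *out, size_t cap)
{
    size_t len = 0;

    switch (q->result) {
    case RES_NONE:
        return 0;                      /* explicit "nothing to send" */
    case RES_ZEROPAGE:
        if (cap < 2) {
            return 0;
        }
        out[len++] = TAG_ZERO;         /* header */
        out[len++] = 0;                /* single fill byte */
        return len;
    case RES_COMPRESS:
        if (cap < 1 + q->payload_len) {
            return 0;
        }
        out[len++] = TAG_COMPRESS;     /* header */
        memcpy(out + len, q->payload, q->payload_len);
        return len + q->payload_len;
    }
    return 0;
}
```

Because the header is written by a single thread, any global state it touches no longer needs to be safe against concurrent access from the workers.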




[PULL 10/13] ram.c: Move core decompression code into its own file

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

No functional changes intended.

Signed-off-by: Lukas Straub 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram-compress.c | 203 ++
 migration/ram-compress.h |   5 +
 migration/ram.c  | 204 ---
 3 files changed, 208 insertions(+), 204 deletions(-)

diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index d9bc67d075..c25562f12d 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -48,6 +48,24 @@ static QemuThread *compress_threads;
 static QemuMutex comp_done_lock;
 static QemuCond comp_done_cond;
 
+struct DecompressParam {
+bool done;
+bool quit;
+QemuMutex mutex;
+QemuCond cond;
+void *des;
+uint8_t *compbuf;
+int len;
+z_stream stream;
+};
+typedef struct DecompressParam DecompressParam;
+
+static QEMUFile *decomp_file;
+static DecompressParam *decomp_param;
+static QemuThread *decompress_threads;
+static QemuMutex decomp_done_lock;
+static QemuCond decomp_done_cond;
+
 static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
RAMBlock *block, ram_addr_t offset,
uint8_t *source_buf);
@@ -272,3 +290,188 @@ retry:
 
 return pages;
 }
+
+/* return the size after decompression, or negative value on error */
+static int
+qemu_uncompress_data(z_stream *stream, uint8_t *dest, size_t dest_len,
+ const uint8_t *source, size_t source_len)
+{
+int err;
+
+err = inflateReset(stream);
+if (err != Z_OK) {
+return -1;
+}
+
+stream->avail_in = source_len;
+stream->next_in = (uint8_t *)source;
+stream->avail_out = dest_len;
+stream->next_out = dest;
+
+err = inflate(stream, Z_NO_FLUSH);
+if (err != Z_STREAM_END) {
+return -1;
+}
+
+return stream->total_out;
+}
+
+static void *do_data_decompress(void *opaque)
+{
+DecompressParam *param = opaque;
+unsigned long pagesize;
+uint8_t *des;
+int len, ret;
+
+qemu_mutex_lock(&param->mutex);
+while (!param->quit) {
+if (param->des) {
+des = param->des;
+len = param->len;
+param->des = 0;
+qemu_mutex_unlock(&param->mutex);
+
+pagesize = TARGET_PAGE_SIZE;
+
+ret = qemu_uncompress_data(&param->stream, des, pagesize,
+   param->compbuf, len);
+if (ret < 0 && migrate_get_current()->decompress_error_check) {
+error_report("decompress data failed");
+qemu_file_set_error(decomp_file, ret);
+}
+
+qemu_mutex_lock(&decomp_done_lock);
+param->done = true;
+qemu_cond_signal(&decomp_done_cond);
+qemu_mutex_unlock(&decomp_done_lock);
+
+qemu_mutex_lock(&param->mutex);
+} else {
+qemu_cond_wait(&param->cond, &param->mutex);
+}
+}
+qemu_mutex_unlock(&param->mutex);
+
+return NULL;
+}
+
+int wait_for_decompress_done(void)
+{
+int idx, thread_count;
+
+if (!migrate_compress()) {
+return 0;
+}
+
+thread_count = migrate_decompress_threads();
+qemu_mutex_lock(&decomp_done_lock);
+for (idx = 0; idx < thread_count; idx++) {
+while (!decomp_param[idx].done) {
+qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
+}
+}
+qemu_mutex_unlock(&decomp_done_lock);
+return qemu_file_get_error(decomp_file);
+}
+
+void compress_threads_load_cleanup(void)
+{
+int i, thread_count;
+
+if (!migrate_compress()) {
+return;
+}
+thread_count = migrate_decompress_threads();
+for (i = 0; i < thread_count; i++) {
+/*
+ * we use it as a indicator which shows if the thread is
+ * properly init'd or not
+ */
+if (!decomp_param[i].compbuf) {
+break;
+}
+
+qemu_mutex_lock(&decomp_param[i].mutex);
+decomp_param[i].quit = true;
+qemu_cond_signal(&decomp_param[i].cond);
+qemu_mutex_unlock(&decomp_param[i].mutex);
+}
+for (i = 0; i < thread_count; i++) {
+if (!decomp_param[i].compbuf) {
+break;
+}
+
+qemu_thread_join(decompress_threads + i);
+qemu_mutex_destroy(&decomp_param[i].mutex);
+qemu_cond_destroy(&decomp_param[i].cond);
+inflateEnd(&decomp_param[i].stream);
+g_free(decomp_param[i].compbuf);
+decomp_param[i].compbuf = NULL;
+}
+g_free(decompress_threads);
+g_free(decomp_param);
+decompress_threads = NULL;
+decomp_param = NULL;
+decomp_file = NULL;
+}
+
+int compress_threads_load_setup(QEMUFile *f)
+{
+int i, thread_count;
+
+if (!migrate_compress()) {
+return 0;
+}
+
+thread_count = migrate_decompress_threads();
+decompress_thr

[PULL 03/13] ram.c: Let the compress threads return a CompressResult enum

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

This will be used in the next commits to move save_page_header()
out of compress code.

Signed-off-by: Lukas Straub 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 34 ++
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 5e7bf20ca5..7bc05fc034 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -482,10 +482,17 @@ MigrationOps *migration_ops;
 
 CompressionStats compression_counters;
 
+enum CompressResult {
+RES_NONE = 0,
+RES_ZEROPAGE = 1,
+RES_COMPRESS = 2
+};
+typedef enum CompressResult CompressResult;
+
 struct CompressParam {
 bool done;
 bool quit;
-bool zero_page;
+CompressResult result;
 QEMUFile *file;
 QemuMutex mutex;
 QemuCond cond;
@@ -527,8 +534,9 @@ static QemuCond decomp_done_cond;
 
 static int ram_save_host_page_urgent(PageSearchStatus *pss);
 
-static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
- ram_addr_t offset, uint8_t *source_buf);
+static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
+   RAMBlock *block, ram_addr_t offset,
+   uint8_t *source_buf);
 
 /* NOTE: page is the PFN not real ram_addr_t. */
 static void pss_init(PageSearchStatus *pss, RAMBlock *rb, ram_addr_t page)
@@ -553,7 +561,7 @@ static void *do_data_compress(void *opaque)
 CompressParam *param = opaque;
 RAMBlock *block;
 ram_addr_t offset;
-bool zero_page;
+CompressResult result;
 
 qemu_mutex_lock(&param->mutex);
 while (!param->quit) {
@@ -563,12 +571,12 @@ static void *do_data_compress(void *opaque)
 param->block = NULL;
 qemu_mutex_unlock(&param->mutex);
 
-zero_page = do_compress_ram_page(param->file, &param->stream,
- block, offset, param->originbuf);
+result = do_compress_ram_page(param->file, &param->stream,
+  block, offset, param->originbuf);
 
 qemu_mutex_lock(&comp_done_lock);
 param->done = true;
-param->zero_page = zero_page;
+param->result = result;
 qemu_cond_signal(&comp_done_cond);
 qemu_mutex_unlock(&comp_done_lock);
 
@@ -1452,8 +1460,9 @@ static int ram_save_multifd_page(QEMUFile *file, RAMBlock *block,
 return 1;
 }
 
-static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
- ram_addr_t offset, uint8_t *source_buf)
+static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
+   RAMBlock *block, ram_addr_t offset,
+   uint8_t *source_buf)
 {
 RAMState *rs = ram_state;
 PageSearchStatus *pss = &rs->pss[RAM_CHANNEL_PRECOPY];
@@ -1461,7 +1470,7 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
 int ret;
 
 if (save_zero_page_to_file(pss, f, block, offset)) {
-return true;
+return RES_ZEROPAGE;
 }
 
 save_page_header(pss, f, block, offset | RAM_SAVE_FLAG_COMPRESS_PAGE);
@@ -1476,8 +1485,9 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
 if (ret < 0) {
 qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
 error_report("compressed data failed!");
+return RES_NONE;
 }
-return false;
+return RES_COMPRESS;
 }
 
 static void
@@ -1485,7 +1495,7 @@ update_compress_thread_counts(const CompressParam *param, int bytes_xmit)
 {
 ram_transferred_add(bytes_xmit);
 
-if (param->zero_page) {
+if (param->result == RES_ZEROPAGE) {
 stat64_add(&mig_stats.zero_pages, 1);
 return;
 }
-- 
2.40.0
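
To see why a three-valued result is more useful than the old bool, here is a small stand-alone sketch. It assumes a hypothetical `classify_page` helper, a plain byte loop in place of the real zero-page detection, and a callback in place of the real compressor; only the enum itself is taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum CompressResult {
    RES_NONE = 0,       /* nothing usable was produced (e.g. an error) */
    RES_ZEROPAGE = 1,   /* the page was all zeroes */
    RES_COMPRESS = 2    /* compressed data is waiting in the buffer */
} CompressResult;

/* Hypothetical compressor: returns bytes produced, negative on error. */
typedef int (*compress_fn)(const uint8_t *page, size_t len);

/* Plain byte loop standing in for the real zero-page check. */
static int page_is_zero(const uint8_t *page, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (page[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/* Worker-side decision: report what happened, do not send anything. */
static CompressResult classify_page(const uint8_t *page, size_t len,
                                    compress_fn compress)
{
    if (page_is_zero(page, len)) {
        return RES_ZEROPAGE;
    }
    if (compress(page, len) < 0) {
        return RES_NONE;    /* a bool could not tell this from success */
    }
    return RES_COMPRESS;
}

/* Two toy compressors, one that succeeds and one that always fails. */
static int compress_ok(const uint8_t *page, size_t len)
{
    (void)page;
    return (int)len;
}

static int compress_fail(const uint8_t *page, size_t len)
{
    (void)page;
    (void)len;
    return -1;
}
```

The consumer of the result can then treat RES_NONE as "nothing queued" without having to inspect the output buffer.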




[PULL 05/13] ram.c: Reset result after sending queued data

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

And take the param->mutex lock for the whole section to ensure
thread-safety.
Now, it is explicitly clear if there is no queued data to send.
Before, this was handled by param->file stream being empty and thus
qemu_put_qemu_file() not sending anything.

This will be used in the next commits to move save_page_header()
out of compress code.

Signed-off-by: Lukas Straub 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 32 ++--
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index b552a9e538..4e14e3bb94 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1508,6 +1508,13 @@ update_compress_thread_counts(const CompressParam *param, int bytes_xmit)
 
 static bool save_page_use_compression(RAMState *rs);
 
+static inline void compress_reset_result(CompressParam *param)
+{
+param->result = RES_NONE;
+param->block = NULL;
+param->offset = 0;
+}
+
 static void flush_compressed_data(RAMState *rs)
 {
 MigrationState *ms = migrate_get_current();
@@ -1529,13 +1536,16 @@ static void flush_compressed_data(RAMState *rs)
 for (idx = 0; idx < thread_count; idx++) {
 qemu_mutex_lock(&comp_param[idx].mutex);
 if (!comp_param[idx].quit) {
-len = qemu_put_qemu_file(ms->to_dst_file, comp_param[idx].file);
+CompressParam *param = &comp_param[idx];
+len = qemu_put_qemu_file(ms->to_dst_file, param->file);
+compress_reset_result(param);
+
 /*
  * it's safe to fetch zero_page without holding comp_done_lock
  * as there is no further request submitted to the thread,
  * i.e, the thread should be waiting for a request at this point.
  */
-update_compress_thread_counts(&comp_param[idx], len);
+update_compress_thread_counts(param, len);
 }
 qemu_mutex_unlock(&comp_param[idx].mutex);
 }
@@ -1560,15 +1570,17 @@ static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset)
 retry:
 for (idx = 0; idx < thread_count; idx++) {
 if (comp_param[idx].done) {
-comp_param[idx].done = false;
-bytes_xmit = qemu_put_qemu_file(ms->to_dst_file,
-comp_param[idx].file);
-qemu_mutex_lock(&comp_param[idx].mutex);
-set_compress_params(&comp_param[idx], block, offset);
-qemu_cond_signal(&comp_param[idx].cond);
-qemu_mutex_unlock(&comp_param[idx].mutex);
+CompressParam *param = &comp_param[idx];
+qemu_mutex_lock(&param->mutex);
+param->done = false;
+bytes_xmit = qemu_put_qemu_file(ms->to_dst_file, param->file);
+compress_reset_result(param);
+set_compress_params(param, block, offset);
+
+update_compress_thread_counts(param, bytes_xmit);
+qemu_cond_signal(&param->cond);
+qemu_mutex_unlock(&param->mutex);
 pages = 1;
-update_compress_thread_counts(&comp_param[idx], bytes_xmit);
 break;
 }
 }
-- 
2.40.0




Re: Machine x-remote property auto-shutdown

2023-05-08 Thread Jag Raman
Hi Markus,

Please see the comments inline below.

> On May 5, 2023, at 10:58 AM, Markus Armbruster  wrote:
> 
> I stumbled over this property, looked closer, and now I'm confused.
> 
> Like most QOM properties, x-remote.auto-shutdown is virtually
> undocumented.  All we have is this comment in vfio-user-obj.c:
> 
>/**
> * Usage: add options:
> * -machine x-remote,vfio-user=on,auto-shutdown=on
> * -device ,id=
> * -object x-vfio-user-server,id=,type=unix,path=,
> * device=
> *
> * Note that x-vfio-user-server object must be used with x-remote machine 
> only.
> * This server could only support PCI devices for now.
> *
> * type - SocketAddress type - presently "unix" alone is supported. 
> Required
> *option
> *
> * path - named unix socket, it will be created by the server. It is
> *a required option
> *
> * device - id of a device on the server, a required option. PCI devices
> *  alone are supported presently.
> *
> * notes - x-vfio-user-server could block IO and monitor during the
> * initialization phase.
> */
> 
> This differs from docs/system/multi-process.rst, which has
> 
>  - Example command-line for the remote process is as follows:
> 
>  /usr/bin/qemu-system-x86_64\
>  -machine x-remote  \
>  -device lsi53c895a,id=lsi0 \
>  -drive id=drive_image2,file=/build/ol7-nvme-test-1.qcow2   \
>  -device scsi-hd,id=drive2,drive=drive_image2,bus=lsi0.0,scsi-id=0  \
>  -object x-remote-object,id=robj1,devid=lsi0,fd=4,
> 
> No mention of auto-shutdown here.
> 
> It points to docs/devel/qemu-multiprocess, which doesn't exist.  I guess
> it's docs/devel/multi-process.rst.  Please fix that.  Anyway, no mention

Sorry about this. I will fix it.

> of auto-shutdown there, either.
> 
> Let's try code instead.  The only use of the property is here:
> 
>static bool vfu_object_auto_shutdown(void)
>{
>bool auto_shutdown = true;
>Error *local_err = NULL;
> 
>if (!current_machine) {
>return auto_shutdown;
>}
> 
>auto_shutdown = object_property_get_bool(OBJECT(current_machine),
> "auto-shutdown",
> &local_err);
> 
>/*
> * local_err would be set if no such property exists - safe to ignore.
> * Unlikely scenario as auto-shutdown is always defined for
> * TYPE_REMOTE_MACHINE, and  TYPE_VFU_OBJECT only works with
> * TYPE_REMOTE_MACHINE
> */
>if (local_err) {
>auto_shutdown = true;
>error_free(local_err);
>}
> 
>return auto_shutdown;
>}
> 
> The comment suggests auto-shutdown should always be set with machine
> TYPE_REMOTE_MACHINE, i.e. -machine x-remote basically requires
> auto-shutdown=on.  Why isn't it the default then?  Why is it even
> configurable?  Use cases?

The "auto-shutdown" property tells the server if it should continue running
after all the clients disconnect or if it should shut down automatically after
the last client disconnects.

The user can set this property to "off" when the server serves multiple
QEMU clients. The server process will continue to run after the last
client disconnects, waiting for more clients to connect in the future.
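
As a rough model of that behaviour (the names `ServerAction` and `on_client_disconnect` are made up for this sketch and do not exist in QEMU), the decision on each disconnect comes down to:

```c
#include <stdbool.h>

/* Hypothetical outcome of handling a client disconnect. */
typedef enum ServerAction {
    SERVER_KEEP_RUNNING,
    SERVER_SHUT_DOWN
} ServerAction;

/*
 * Decide what the server process does after a client has gone away.
 * remaining_clients is the number of clients still connected.
 */
static ServerAction on_client_disconnect(unsigned remaining_clients,
                                         bool auto_shutdown)
{
    if (remaining_clients == 0 && auto_shutdown) {
        return SERVER_SHUT_DOWN;      /* last client left, property is on */
    }
    return SERVER_KEEP_RUNNING;       /* wait for further clients */
}
```

So with auto-shutdown=off the process stays up even with zero clients, which is the multi-client case described above.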

> 
> Anyway, vfu_object_auto_shutdown() returns
> 
> (1) true when we don't have a current machine
> 
> (2) true when getting the current machine's auto-shutdown property fails
> 
> (3) the value of its auto-shutdown property otherwise
> 
> Two uses:
> 
> * In vfu_object_finalize():
> 
>if (!k->nr_devs && vfu_object_auto_shutdown()) {
>qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
>}
> 
>  I guess this requests shutdown when the last TYPE_VFU_OBJECT dies.
> 
>  SHUTDOWN_CAUSE_GUEST_SHUTDOWN is documented as
> 
># @guest-shutdown: Guest shutdown/suspend request, via ACPI or other
>#  hardware-specific means
> 
>  Can't say whether it's the right one to use here.
> 
> * In VFU_OBJECT_ERROR():
> 
>/**
> * VFU_OBJECT_ERROR - reports an error message. If auto_shutdown
> * is set, it aborts the machine on error. Otherwise, it logs an
> * error message without aborting.
> */
>//
>#define VFU_OBJECT_ERROR(o, fmt, ...) \
>{ \
>if (vfu_object_auto_shutdown()) { \
>error_setg(&error_abort, (fmt), ## __VA_ARGS__);  \
>} else {  \
>error_report((fmt), ## __VA_ARGS__);   

[PATCH v3 2/3] target/arm: Select CONFIG_ARM_V7M when TCG is enabled

2023-05-08 Thread Fabiano Rosas
We cannot allow this config to be disabled at the moment as not all of
the relevant code is protected by it.

Commit 29d9efca16 ("arm/Kconfig: Do not build TCG-only boards on a
KVM-only build") moved the CONFIGs of several boards to Kconfig, so it
is now possible that nothing selects ARM_V7M (e.g. when doing a
--without-default-devices build).

Return the CONFIG_ARM_V7M entry to a state where it is always selected
whenever TCG is available.

Fixes: 29d9efca16 ("arm/Kconfig: Do not build TCG-only boards on a KVM-only build")
Signed-off-by: Fabiano Rosas 
---
 target/arm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/arm/Kconfig b/target/arm/Kconfig
index 3fffdcb61b..5947366f6e 100644
--- a/target/arm/Kconfig
+++ b/target/arm/Kconfig
@@ -1,6 +1,7 @@
 config ARM
 bool
 select ARM_COMPATIBLE_SEMIHOSTING if TCG
+select ARM_V7M if TCG
 
 config AARCH64
 bool
-- 
2.35.3




[PATCH v3 0/3] target/arm: disable-tcg and without-default-devices fixes

2023-05-08 Thread Fabiano Rosas
Changed the cdrom test to apply to only the x86 and s390x cdrom boot
tests.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/860488769

v2:
https://lore.kernel.org/r/20230505123524.23401-1-faro...@suse.de

v1:
https://lore.kernel.org/r/20230503193833.29047-1-faro...@suse.de

Here's the fix for the cdrom test failure that we discussed on the
list, plus 2 fixes for the --without-default-devices build.

When I moved the board CONFIGs from default.mak to Kconfig, it became
possible (due to --without-default-devices) to disable the CONFIGs for
all the boards that require ARM_V7M. That breaks the build because
ARM_V7M is required to be always set.

Fabiano Rosas (3):
  target/arm: Select SEMIHOSTING when using TCG
  target/arm: Select CONFIG_ARM_V7M when TCG is enabled
  tests/qtest: Don't run cdrom boot tests if no accelerator is present

 target/arm/Kconfig   |  9 ++---
 tests/qtest/cdrom-test.c | 10 ++
 2 files changed, 12 insertions(+), 7 deletions(-)

-- 
2.35.3




[PATCH v3 3/3] tests/qtest: Don't run cdrom boot tests if no accelerator is present

2023-05-08 Thread Fabiano Rosas
On a build configured with: --disable-tcg --enable-xen it is possible
to produce a QEMU binary with neither TCG nor KVM support. Skip the
cdrom boot tests if that's the case.

Fixes: 0c1ae3ff9d ("tests/qtest: Fix tests when no KVM or TCG are present")
Signed-off-by: Fabiano Rosas 
---
 tests/qtest/cdrom-test.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/tests/qtest/cdrom-test.c b/tests/qtest/cdrom-test.c
index 26a2400181..31d3bacd8c 100644
--- a/tests/qtest/cdrom-test.c
+++ b/tests/qtest/cdrom-test.c
@@ -130,6 +130,11 @@ static void test_cdboot(gconstpointer data)
 
 static void add_x86_tests(void)
 {
+if (!qtest_has_accel("tcg") && !qtest_has_accel("kvm")) {
+g_test_skip("No KVM or TCG accelerator available, skipping boot 
tests");
+return;
+}
+
 qtest_add_data_func("cdrom/boot/default", "-cdrom ", test_cdboot);
 qtest_add_data_func("cdrom/boot/virtio-scsi",
 "-device virtio-scsi -device scsi-cd,drive=cdr "
@@ -176,6 +181,11 @@ static void add_x86_tests(void)
 
 static void add_s390x_tests(void)
 {
+if (!qtest_has_accel("tcg") && !qtest_has_accel("kvm")) {
+g_test_skip("No KVM or TCG accelerator available, skipping boot tests");
+return;
+}
+
 qtest_add_data_func("cdrom/boot/default", "-cdrom ", test_cdboot);
 qtest_add_data_func("cdrom/boot/virtio-scsi",
 "-device virtio-scsi -device scsi-cd,drive=cdr "
-- 
2.35.3




[PATCH v3 1/3] target/arm: Select SEMIHOSTING when using TCG

2023-05-08 Thread Fabiano Rosas
Semihosting has been made a 'default y' entry in Kconfig, which does
not work because when building --without-default-devices, the
semihosting code would not be available.

Make semihosting unconditional when TCG is present.

Fixes: 29d9efca16 ("arm/Kconfig: Do not build TCG-only boards on a KVM-only build")
Signed-off-by: Fabiano Rosas 
---
 target/arm/Kconfig | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/target/arm/Kconfig b/target/arm/Kconfig
index 39f05b6420..3fffdcb61b 100644
--- a/target/arm/Kconfig
+++ b/target/arm/Kconfig
@@ -1,13 +1,7 @@
 config ARM
 bool
+select ARM_COMPATIBLE_SEMIHOSTING if TCG
 
 config AARCH64
 bool
 select ARM
-
-# This config exists just so we can make SEMIHOSTING default when TCG
-# is selected without also changing it for other architectures.
-config ARM_SEMIHOSTING
-bool
-default y if TCG && ARM
-select ARM_COMPATIBLE_SEMIHOSTING
-- 
2.35.3




Re: [PATCH] virtio-net: not enable vq reset feature unconditionally

2023-05-08 Thread Michael S. Tsirkin
On Mon, May 08, 2023 at 07:31:35PM +0200, Eugenio Perez Martin wrote:
> On Mon, May 8, 2023 at 12:22 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, May 08, 2023 at 11:09:46AM +0200, Eugenio Perez Martin wrote:
> > > On Sat, May 6, 2023 at 4:25 AM Xuan Zhuo  
> > > wrote:
> > > >
> > > > On Thu,  4 May 2023 12:14:47 +0200, =?utf-8?q?Eugenio_P=C3=A9rez?= 
> > > >  wrote:
> > > > > The commit 93a97dc5200a ("virtio-net: enable vq reset feature") 
> > > > > enables
> > > > > unconditionally vq reset feature as long as the device is emulated.
> > > > > This makes it impossible to actually disable the feature, and it causes
> > > > > migration problems from QEMU versions older than 7.2.
> > > > >
> > > > > The entire final commit is unneeded as the device system already enables or
> > > > > disables the feature properly.
> > > > >
> > > > > This reverts commit 93a97dc5200a95e63b99cb625f20b7ae802ba413.
> > > > > Fixes: 93a97dc5200a ("virtio-net: enable vq reset feature")
> > > > > Signed-off-by: Eugenio Pérez 
> > > > >
> > > > > ---
> > > > > Tested by checking feature bit at  
> > > > > /sys/devices/pci.../virtio0/features
> > > > > enabling and disabling queue_reset virtio-net feature and vhost=on/off
> > > > > on net device backend.
> > > >
> > > > Do you mean that this feature cannot be closed?
> > > >
> > > > I tried to close in the guest, it was successful.
> > > >
> > >
> > > I'm not sure what you mean with close. If the device dataplane is
> > > emulated in qemu (vhost=off), I'm not able to make the device not
> > > offer it.
> > >
> > > > In addition, in this case, could you try to repair the problem instead 
> > > > of
> > > > directly revert.
> > > >
> > >
> > > I'm not following this. The revert is not to always disable the feature.
> > >
> > > By default, the feature is enabled. If cmdline states queue_reset=on,
> > > the feature is enabled. That is true both before and after applying
> > > this patch.
> > >
> > > However, in qemu master, queue_reset=off keeps enabling this feature
> > > on the device. It happens that there is a commit explicitly doing
> > > that, so I'm reverting it.
> > >
> > > Let me know if that makes sense to you.
> > >
> > > Thanks!
> >
> >
> > question is this:
> >
> > DEFINE_PROP_BIT64("queue_reset", _state, _field, \
> >   VIRTIO_F_RING_RESET, true)
> >
> >
> >
> > don't we need compat for 7.2 and back for this property?
> >
> 
> I think that part is already covered by commit 69e1c14aa222 ("virtio:
> core: vq reset feature negotation support"). In that regard, maybe we
> can simplify the patch message simply stating that queue_reset=off
> does not work.
> 
> Thanks!

that compat for 7.1 and not 7.2 though? is that correct?




Re: [PATCH 0/4] vhost-user-fs: Internal migration

2023-05-08 Thread Eugenio Perez Martin
On Mon, May 8, 2023 at 7:00 PM Hanna Czenczek  wrote:
>
> On 05.05.23 16:37, Hanna Czenczek wrote:
> > On 05.05.23 16:26, Eugenio Perez Martin wrote:
> >> On Fri, May 5, 2023 at 11:51 AM Hanna Czenczek 
> >> wrote:
> >>> (By the way, thanks for the explanations :))
> >>>
> >>> On 05.05.23 11:03, Hanna Czenczek wrote:
>  On 04.05.23 23:14, Stefan Hajnoczi wrote:
> >>> [...]
> >>>
> > I think it's better to change QEMU's vhost code
> > to leave stateful devices suspended (but not reset) across
> > vhost_dev_stop() -> vhost_dev_start(), maybe by introducing
> > vhost_dev_suspend() and vhost_dev_resume(). Have you thought about
> > this aspect?
>  Yes and no; I mean, I haven’t in detail, but I thought this is what’s
>  meant by suspending instead of resetting when the VM is stopped.
> >>> So, now looking at vhost_dev_stop(), one problem I can see is that
> >>> depending on the back-end, different operations it does will do
> >>> different things.
> >>>
> >>> It tries to stop the whole device via vhost_ops->vhost_dev_start(),
> >>> which for vDPA will suspend the device, but for vhost-user will
> >>> reset it
> >>> (if F_STATUS is there).
> >>>
> >>> It disables all vrings, which doesn’t mean stopping, but may be
> >>> necessary, too.  (I haven’t yet really understood the use of disabled
> >>> vrings, I heard that virtio-net would have a need for it.)
> >>>
> >>> It then also stops all vrings, though, so that’s OK.  And because this
> >>> will always do GET_VRING_BASE, this is actually always the same
> >>> regardless of transport.
> >>>
> >>> Finally (for this purpose), it resets the device status via
> >>> vhost_ops->vhost_reset_status().  This is only implemented on vDPA, and
> >>> this is what resets the device there.
> >>>
> >>>
> >>> So vhost-user resets the device in .vhost_dev_start, but vDPA only does
> >>> so in .vhost_reset_status.  It would seem better to me if vhost-user
> >>> would also reset the device only in .vhost_reset_status, not in
> >>> .vhost_dev_start.  .vhost_dev_start seems precisely like the place to
> >>> run SUSPEND/RESUME.
> >>>
> >> I think the same. I just saw It's been proposed at [1].
> >>
> >>> Another question I have (but this is basically what I wrote in my last
> >>> email) is why we even call .vhost_reset_status here.  If the device
> >>> and/or all of the vrings are already stopped, why do we need to reset
> >>> it?  Naïvely, I had assumed we only really need to reset the device if
> >>> the guest changes, so that a new guest driver sees a freshly
> >>> initialized
> >>> device.
> >>>
> >> I don't know why we didn't need to call it :). I'm assuming the
> >> previous vhost-user net did fine resetting vq indexes, using
> >> VHOST_USER_SET_VRING_BASE. But I don't know about more complex
> >> devices.
> >>
> >> The guest can reset the device, or write 0 to the PCI config status,
> >> at any time. How does virtiofs handle it, being stateful?
> >
> > Honestly a good question because virtiofsd implements neither
> > SET_STATUS nor RESET_DEVICE.  I’ll have to investigate that.
> >
> > I think when the guest resets the device, SET_VRING_BASE always comes
> > along some way or another, so that’s how the vrings are reset.  Maybe
> > the internal state is reset only following more high-level FUSE
> > commands like INIT.
>
> So a meeting and one session of looking-into-the-code later:
>
> We reset every virt queue on GET_VRING_BASE, which is wrong, but happens
> to serve the purpose.  (German is currently on that.)
>
> In our meeting, German said the reset would occur when the memory
> regions are changed, but I can’t see that in the code.

That would imply that the status is reset when the guest's memory is
added or removed?

> I think it only
> happens implicitly through the SET_VRING_BASE call, which resets the
> internal avail/used pointers.
>
> [This doesn’t seem different from libvhost-user, though, which
> implements neither SET_STATUS nor RESET_DEVICE, and which pretends to
> reset the device on RESET_OWNER, but really doesn’t (its
> vu_reset_device_exec() function just disables all vrings, doesn’t reset
> or even stop them).]
>
> Consequently, the internal state is never reset.  It would be cleared on
> a FUSE Destroy message, but if you just force-reset the system, the
> state remains into the next reboot.  Not even FUSE Init clears it, which
> seems weird.  It happens to work because it’s still the same filesystem,
> so the existing state fits, but it kind of seems dangerous to keep e.g.
> files open.  I don’t think it’s really exploitable because everything
> still goes through the guest kernel, but, well.  We should clear the
> state on Init, and probably also implement SET_STATUS and clear the
> state there.
>

I see. That's in the line of assuming GET_VRING_BASE is the last
message received from qemu.

Thanks!




Re: [PATCH v4 51/57] tcg/sparc64: Use atom_and_align_for_opc

2023-05-08 Thread Richard Henderson

On 5/5/23 14:20, Peter Maydell wrote:

On Wed, 3 May 2023 at 08:13, Richard Henderson
 wrote:


Signed-off-by: Richard Henderson 
---
  tcg/sparc64/tcg-target.c.inc | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tcg/sparc64/tcg-target.c.inc b/tcg/sparc64/tcg-target.c.inc
index bb23038529..4f9ec02b1f 100644
--- a/tcg/sparc64/tcg-target.c.inc
+++ b/tcg/sparc64/tcg-target.c.inc
@@ -1028,11 +1028,13 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
  {
  TCGLabelQemuLdst *ldst = NULL;
  MemOp opc = get_memop(oi);
-unsigned a_bits = get_alignment_bits(opc);
-unsigned s_bits = opc & MO_SIZE;
+MemOp s_bits = opc & MO_SIZE;
+MemOp a_bits, atom_a, atom_u;
  unsigned a_mask;

  /* We don't support unaligned accesses. */
+a_bits = atom_and_align_for_opc(s, &atom_a, &atom_u, opc,
+MO_ATOM_IFALIGN, false);
  a_bits = MAX(a_bits, s_bits);
  a_mask = (1u << a_bits) - 1;

--


No changes to HostAddress struct again?


Again, no use of alignment outside of prepare_host_addr.
No 128-bit operations, and all host operations aligned.


r~
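
For anyone following the a_bits/a_mask arithmetic in the hunk above, here is a minimal standalone sketch. It is not QEMU code: required_align_mask() and is_aligned() are hypothetical helpers, and the plain unsigned log2 values are assumptions standing in for the size and alignment fields of MemOp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of QEMU.
 * a_bits and s_bits are log2 values (3 means 8 bytes).
 */
static unsigned required_align_mask(unsigned a_bits, unsigned s_bits)
{
    /* A host without unaligned accesses needs at least natural alignment. */
    unsigned bits = a_bits > s_bits ? a_bits : s_bits;

    assert(bits < 32);
    return (1u << bits) - 1;
}

/* An address is acceptable when none of the low mask bits are set. */
static bool is_aligned(uint64_t addr, unsigned a_mask)
{
    return (addr & a_mask) == 0;
}
```

With no explicit alignment request (a_bits of 0) and an 8-byte access (s_bits of 3), the mask comes out as 7, so only addresses that are a multiple of 8 pass the check.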




Re: [PATCH v4 49/57] tcg/riscv: Use atom_and_align_for_opc

2023-05-08 Thread Richard Henderson

On 5/5/23 14:19, Peter Maydell wrote:

On Wed, 3 May 2023 at 08:13, Richard Henderson
 wrote:


Signed-off-by: Richard Henderson 
---
  tcg/riscv/tcg-target.c.inc | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 37870c89fc..4dd33c73e8 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -910,8 +910,12 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, TCGReg *pbase,
  {
  TCGLabelQemuLdst *ldst = NULL;
  MemOp opc = get_memop(oi);
-unsigned a_bits = get_alignment_bits(opc);
-unsigned a_mask = (1u << a_bits) - 1;
+MemOp a_bits, atom_a, atom_u;
+unsigned a_mask;
+
+a_bits = atom_and_align_for_opc(s, &atom_a, &atom_u, opc,
+MO_ATOM_IFALIGN, false);
+a_mask = (1u << a_bits) - 1;

  #ifdef CONFIG_SOFTMMU
  unsigned s_bits = opc & MO_SIZE;


Same remark as for ppc.


Because the alignment was not required outside of prepare_host_addr.
RISC-V does not have 128-bit memory operations of any kind.


r~




Re: [PATCH v4 48/57] tcg/ppc: Use atom_and_align_for_opc

2023-05-08 Thread Richard Henderson

On 5/5/23 14:18, Peter Maydell wrote:

On Wed, 3 May 2023 at 08:13, Richard Henderson
 wrote:


Signed-off-by: Richard Henderson 
---
  tcg/ppc/tcg-target.c.inc | 17 -
  1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index f0a4118bbb..60375804cd 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -2034,7 +2034,22 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h,
  {
  TCGLabelQemuLdst *ldst = NULL;
  MemOp opc = get_memop(oi);
-unsigned a_bits = get_alignment_bits(opc);
+MemOp a_bits, atom_a, atom_u;
+
+/*
+ * Book II, Section 1.4, Single-Copy Atomicity, specifies:
+ *
+ * Before 3.0, "An access that is not atomic is performed as a set of
+ * smaller disjoint atomic accesses. In general, the number and alignment
+ * of these accesses are implementation-dependent."  Thus MO_ATOM_IFALIGN.
+ *
+ * As of 3.0, "the non-atomic access is performed as described in
+ * the corresponding list", which matches MO_ATOM_SUBALIGN.
+ */
+a_bits = atom_and_align_for_opc(s, &atom_a, &atom_u, opc,
+have_isa_3_00 ? MO_ATOM_SUBALIGN
+  : MO_ATOM_IFALIGN,
+false);



Why doesn't this patch have changes to a HostAddress struct
like all the other archs ?


Because the alignment was only required here, within prepare_host_addr.
The Power LQ instruction allows unaligned input, unlike x86 VMOVDQA.


r~




Re: [PATCH] virtio-net: not enable vq reset feature unconditionally

2023-05-08 Thread Eugenio Perez Martin
On Mon, May 8, 2023 at 12:22 PM Michael S. Tsirkin  wrote:
>
> On Mon, May 08, 2023 at 11:09:46AM +0200, Eugenio Perez Martin wrote:
> > On Sat, May 6, 2023 at 4:25 AM Xuan Zhuo  wrote:
> > >
> > > On Thu,  4 May 2023 12:14:47 +0200, =?utf-8?q?Eugenio_P=C3=A9rez?= 
> > >  wrote:
> > > > The commit 93a97dc5200a ("virtio-net: enable vq reset feature") enables
> > > > unconditionally vq reset feature as long as the device is emulated.
> > > > This makes it impossible to actually disable the feature, and it causes
> > > > migration problems from QEMU versions older than 7.2.
> > > >
> > > > The entire final commit is unneeded as the device system already enables or
> > > > disables the feature properly.
> > > >
> > > > This reverts commit 93a97dc5200a95e63b99cb625f20b7ae802ba413.
> > > > Fixes: 93a97dc5200a ("virtio-net: enable vq reset feature")
> > > > Signed-off-by: Eugenio Pérez 
> > > >
> > > > ---
> > > > Tested by checking feature bit at  /sys/devices/pci.../virtio0/features
> > > > enabling and disabling queue_reset virtio-net feature and vhost=on/off
> > > > on net device backend.
> > >
> > > Do you mean that this feature cannot be closed?
> > >
> > > I tried to close in the guest, it was successful.
> > >
> >
> > I'm not sure what you mean with close. If the device dataplane is
> > emulated in qemu (vhost=off), I'm not able to make the device not
> > offer it.
> >
> > > In addition, in this case, could you try to repair the problem instead of
> > > directly revert.
> > >
> >
> > I'm not following this. The revert is not to always disable the feature.
> >
> > By default, the feature is enabled. If cmdline states queue_reset=on,
> > the feature is enabled. That is true both before and after applying
> > this patch.
> >
> > However, in qemu master, queue_reset=off keeps enabling this feature
> > on the device. It happens that there is a commit explicitly doing
> > that, so I'm reverting it.
> >
> > Let me know if that makes sense to you.
> >
> > Thanks!
>
>
> question is this:
>
> DEFINE_PROP_BIT64("queue_reset", _state, _field, \
>   VIRTIO_F_RING_RESET, true)
>
>
>
> don't we need compat for 7.2 and back for this property?
>

I think that part is already covered by commit 69e1c14aa222 ("virtio:
core: vq reset feature negotation support"). In that regard, maybe we
can simplify the patch message simply stating that queue_reset=off
does not work.

Thanks!
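
To make the queue_reset=off problem concrete, here is a rough sketch of the behaviour described above. It is only a model under stated assumptions: features_from_property(), get_features_forced() and offers_ring_reset() are hypothetical names, not functions in hw/net/virtio-net.c. The only thing taken from the virtio specification is the bit number of VIRTIO_F_RING_RESET.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number from the virtio specification. */
#define VIRTIO_F_RING_RESET 40

/* Hypothetical model of a property-controlled feature bit. */
static uint64_t features_from_property(uint64_t features, bool queue_reset)
{
    if (queue_reset) {
        features |= 1ULL << VIRTIO_F_RING_RESET;
    } else {
        features &= ~(1ULL << VIRTIO_F_RING_RESET);
    }
    return features;
}

/* Models the reverted behaviour: the bit is forced on afterwards. */
static uint64_t get_features_forced(uint64_t features)
{
    return features | (1ULL << VIRTIO_F_RING_RESET);
}

/* True when the device would offer the ring reset feature. */
static bool offers_ring_reset(uint64_t features)
{
    return (features >> VIRTIO_F_RING_RESET) & 1;
}
```

In this model, features_from_property(0, false) does not offer the bit, but passing that result through get_features_forced() offers it again, which is the "off keeps enabling the feature" symptom. Dropping the forced step leaves the property as the only switch.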




Re: [PATCH RESEND] vhost: fix possible wrap in SVQ descriptor ring

2023-05-08 Thread Eugenio Perez Martin
On Sat, May 6, 2023 at 5:01 PM Hawkins Jiawei  wrote:
>
> QEMU invokes vhost_svq_add() when adding a guest's element into SVQ.
> In vhost_svq_add(), it uses vhost_svq_available_slots() to check
> whether QEMU can add the element into the SVQ. If there is
> enough space, then QEMU combines some out descriptors and
> some in descriptors into one descriptor chain, and add it into
> svq->vring.desc by vhost_svq_vring_write_descs().
>
> Yet the problem is that `svq->shadow_avail_idx - svq->shadow_used_idx`
> in vhost_svq_available_slots() returns the number of occupied elements,
> or the number of descriptor chains, instead of the number of occupied
> descriptors, which may cause wrapping in the SVQ descriptor ring.
>
> Here is an example. In vhost_handle_guest_kick(), QEMU forwards
> as many available buffers as possible to the device via virtqueue_pop()
> and vhost_svq_add_element(). virtqueue_pop() returns a guest's element,
> and vhost_svq_add_element(), a wrapper around vhost_svq_add(), adds
> this element into the SVQ. If QEMU invokes virtqueue_pop() and
> vhost_svq_add_element() `svq->vring.num` times, vhost_svq_available_slots()
> thinks QEMU just ran out of slots and everything should work fine.
> But in fact, virtqueue_pop() returns `svq->vring.num` elements or
> descriptor chains, which can amount to more than `svq->vring.num`
> descriptors due to guest memory fragmentation, and this causes wrapping
> in the SVQ descriptor ring.
>

The bug is valid even before marking the descriptors used. If the
guest memory is fragmented, SVQ must add chains so it can try to add
more descriptors than possible.

> Therefore, this patch adds `num_free` field in VhostShadowVirtqueue
> structure, updates this field in vhost_svq_add() and
> vhost_svq_get_buf(), to record the number of free descriptors.
> Then we can avoid wrap in SVQ descriptor ring by refactoring
> vhost_svq_available_slots().
>
> Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding")
> Signed-off-by: Hawkins Jiawei 
> ---
>  hw/virtio/vhost-shadow-virtqueue.c | 9 -
>  hw/virtio/vhost-shadow-virtqueue.h | 3 +++
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 8361e70d1b..e1c6952b10 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -68,7 +68,7 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp)
>   */
>  static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
>  {
> -return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx);
> +return svq->num_free;
>  }
>
>  /**
> @@ -263,6 +263,9 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
>  return -EINVAL;
>  }
>
> +/* Update the size of SVQ vring free descriptors */
> +svq->num_free -= ndescs;
> +
>  svq->desc_state[qemu_head].elem = elem;
>  svq->desc_state[qemu_head].ndescs = ndescs;
>  vhost_svq_kick(svq);
> @@ -450,6 +453,9 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
>  svq->desc_next[last_used_chain] = svq->free_head;
>  svq->free_head = used_elem.id;
>
> +/* Update the size of SVQ vring free descriptors */

No need for this comment.

Apart from that,

Acked-by: Eugenio Pérez 

> +svq->num_free += num;
> +
>  *len = used_elem.len;
>  return g_steal_pointer(&svq->desc_state[used_elem.id].elem);
>  }
> @@ -659,6 +665,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
>  svq->iova_tree = iova_tree;
>
>  svq->vring.num = virtio_queue_get_num(vdev, virtio_get_queue_index(vq));
> +svq->num_free = svq->vring.num;
>  driver_size = vhost_svq_driver_area_size(svq);
>  device_size = vhost_svq_device_area_size(svq);
>  svq->vring.desc = qemu_memalign(qemu_real_host_page_size(), driver_size);
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 926a4897b1..6efe051a70 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -107,6 +107,9 @@ typedef struct VhostShadowVirtqueue {
>
>  /* Next head to consume from the device */
>  uint16_t last_used_idx;
> +
> +/* Size of SVQ vring free descriptors */
> +uint16_t num_free;
>  } VhostShadowVirtqueue;
>
>  bool vhost_svq_valid_features(uint64_t features, Error **errp);
> --
> 2.25.1
>
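
The difference between counting chains and counting descriptors can be sketched with a small standalone model. This is an illustration only, assuming a simplified ring: RingModel and the ring_*() and slots_*() helpers are hypothetical and are not the VhostShadowVirtqueue code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for VhostShadowVirtqueue. */
typedef struct {
    uint16_t num;        /* ring size in descriptors */
    uint16_t avail_idx;  /* chains made available so far */
    uint16_t used_idx;   /* chains returned by the device */
    uint16_t num_free;   /* free descriptors, tracked explicitly */
} RingModel;

static void ring_init(RingModel *r, uint16_t num)
{
    r->num = num;
    r->avail_idx = 0;
    r->used_idx = 0;
    r->num_free = num;
}

/* Old estimate: counts chains in flight, not descriptors. */
static uint16_t slots_by_chains(const RingModel *r)
{
    return r->num - (uint16_t)(r->avail_idx - r->used_idx);
}

/* New accounting: counts free descriptors. */
static uint16_t slots_by_descs(const RingModel *r)
{
    return r->num_free;
}

/* Add one chain of ndescs descriptors; 0 on success, -1 if it does not fit. */
static int ring_add(RingModel *r, uint16_t ndescs)
{
    if (ndescs > slots_by_descs(r)) {
        return -1;
    }
    r->num_free -= ndescs;
    r->avail_idx++;
    return 0;
}

/* The device returned one chain of ndescs descriptors. */
static void ring_get_buf(RingModel *r, uint16_t ndescs)
{
    assert(r->num_free + ndescs <= r->num);
    r->num_free += ndescs;
    r->used_idx++;
}
```

With a ring of 4 descriptors and one chain of 3 in flight, slots_by_chains() still reports 3 free slots while slots_by_descs() reports 1, so a second chain of 2 descriptors is only rejected by the descriptor-based check.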




Re: [PATCH 0/4] vhost-user-fs: Internal migration

2023-05-08 Thread Hanna Czenczek

On 05.05.23 16:37, Hanna Czenczek wrote:

On 05.05.23 16:26, Eugenio Perez Martin wrote:
On Fri, May 5, 2023 at 11:51 AM Hanna Czenczek  
wrote:

(By the way, thanks for the explanations :))

On 05.05.23 11:03, Hanna Czenczek wrote:

On 04.05.23 23:14, Stefan Hajnoczi wrote:

[...]


I think it's better to change QEMU's vhost code
to leave stateful devices suspended (but not reset) across
vhost_dev_stop() -> vhost_dev_start(), maybe by introducing
vhost_dev_suspend() and vhost_dev_resume(). Have you thought about
this aspect?

Yes and no; I mean, I haven’t in detail, but I thought this is what’s
meant by suspending instead of resetting when the VM is stopped.

So, now looking at vhost_dev_stop(), one problem I can see is that
depending on the back-end, different operations it does will do
different things.

It tries to stop the whole device via vhost_ops->vhost_dev_start(),
which for vDPA will suspend the device, but for vhost-user will 
reset it

(if F_STATUS is there).

It disables all vrings, which doesn’t mean stopping, but may be
necessary, too.  (I haven’t yet really understood the use of disabled
vrings, I heard that virtio-net would have a need for it.)

It then also stops all vrings, though, so that’s OK.  And because this
will always do GET_VRING_BASE, this is actually always the same
regardless of transport.

Finally (for this purpose), it resets the device status via
vhost_ops->vhost_reset_status().  This is only implemented on vDPA, and
this is what resets the device there.


So vhost-user resets the device in .vhost_dev_start, but vDPA only does
so in .vhost_reset_status.  It would seem better to me if vhost-user
would also reset the device only in .vhost_reset_status, not in
.vhost_dev_start.  .vhost_dev_start seems precisely like the place to
run SUSPEND/RESUME.


I think the same. I just saw It's been proposed at [1].


Another question I have (but this is basically what I wrote in my last
email) is why we even call .vhost_reset_status here.  If the device
and/or all of the vrings are already stopped, why do we need to reset
it?  Naïvely, I had assumed we only really need to reset the device if
the guest changes, so that a new guest driver sees a freshly 
initialized

device.


I don't know why we didn't need to call it :). I'm assuming the
previous vhost-user net did fine resetting vq indexes, using
VHOST_USER_SET_VRING_BASE. But I don't know about more complex
devices.

The guest can reset the device, or write 0 to the PCI config status,
at any time. How does virtiofs handle it, being stateful?


Honestly a good question because virtiofsd implements neither 
SET_STATUS nor RESET_DEVICE.  I’ll have to investigate that.


I think when the guest resets the device, SET_VRING_BASE always comes 
along some way or another, so that’s how the vrings are reset.  Maybe 
the internal state is reset only following more high-level FUSE 
commands like INIT.


So a meeting and one session of looking-into-the-code later:

We reset every virt queue on GET_VRING_BASE, which is wrong, but happens 
to serve the purpose.  (German is currently on that.)


In our meeting, German said the reset would occur when the memory 
regions are changed, but I can’t see that in the code.  I think it only 
happens implicitly through the SET_VRING_BASE call, which resets the 
internal avail/used pointers.


[This doesn’t seem different from libvhost-user, though, which 
implements neither SET_STATUS nor RESET_DEVICE, and which pretends to 
reset the device on RESET_OWNER, but really doesn’t (its 
vu_reset_device_exec() function just disables all vrings, doesn’t reset 
or even stop them).]


Consequently, the internal state is never reset.  It would be cleared on 
a FUSE Destroy message, but if you just force-reset the system, the 
state remains into the next reboot.  Not even FUSE Init clears it, which 
seems weird.  It happens to work because it’s still the same filesystem, 
so the existing state fits, but it kind of seems dangerous to keep e.g. 
files open.  I don’t think it’s really exploitable because everything 
still goes through the guest kernel, but, well.  We should clear the 
state on Init, and probably also implement SET_STATUS and clear the 
state there.


Hanna




Re: [PATCH v2 00/12] simpletrace: refactor and general improvements

2023-05-08 Thread Mads Ynddal


> 
> I was curious how Mads is using simpletrace for an internal (to
> Samsung?) project.
> 

I was just tracing the NVMe emulation to get some metrics. The code is all
upstream or a part of this patchset. The rest is tracing configs.



Re: [PATCH 00/13] Migration PULL request (20230508 edition)

2023-05-08 Thread Richard Henderson

On 5/8/23 16:26, Juan Quintela wrote:

Hi

This is just the compression bits of the Migration PULL request for
20230428.  Only change is that we don't run the compression tests by
default.

The problem already exists in the compression code.  The tests just show
that it doesn't work.

Please apply, Juan.


Missing request-pull data.


r~



Lukas Straub (13):
   qtest/migration-test.c: Add tests with compress enabled
   qtest/migration-test.c: Add postcopy tests with compress enabled
   ram.c: Let the compress threads return a CompressResult enum
   ram.c: Dont change param->block in the compress thread
   ram.c: Reset result after sending queued data
   ram.c: Do not call save_page_header() from compress threads
   ram.c: Call update_compress_thread_counts from
 compress_send_queued_data
   ram.c: Remove last ram.c dependency from the core compress code
   ram.c: Move core compression code into its own file
   ram.c: Move core decompression code into its own file
   ram compress: Assert that the file buffer matches the result
   ram-compress.c: Make target independent
   migration: Initialize and cleanup decompression in migration.c

  migration/meson.build|   6 +-
  migration/migration.c|   9 +
  migration/qemu-file.c|  11 +
  migration/qemu-file.h|   1 +
  migration/ram-compress.c | 485 +
  migration/ram-compress.h |  70 +
  migration/ram.c  | 502 +++
  tests/qtest/migration-test.c | 134 ++
  8 files changed, 758 insertions(+), 460 deletions(-)
  create mode 100644 migration/ram-compress.c
  create mode 100644 migration/ram-compress.h






Re: [RFC v2 1/1] migration: Update error description whenever migration fails

2023-05-08 Thread Thomas Huth

 Hi!

On 08/05/2023 17.32, tejus.gk wrote:

There are places in the code where the migration is marked failed with
MIGRATION_STATUS_FAILED, but the failiure reason is never updated. Hence


s/failiure/failure/


libvirt doesn't know why the migration failed when it queries for it.

Signed-off-by: tejus.gk 


The Signed-off-by line should contain the proper name...
Is "tejus.gk" really the correct spelling of your name (with only lowercase 
letters and a dot in it)? If not, please update the line, thanks!


 Thomas




Re: [PATCH v4 52/57] tcg/i386: Honor 64-bit atomicity in 32-bit mode

2023-05-08 Thread Richard Henderson

On 5/5/23 14:27, Peter Maydell wrote:

On Wed, 3 May 2023 at 08:18, Richard Henderson
 wrote:


Use the fpu to perform 64-bit loads and stores.

Signed-off-by: Richard Henderson 




@@ -2091,7 +2095,20 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
  datalo = datahi;
  datahi = t;
  }
-if (h.base == datalo || h.index == datalo) {
+if (h.atom == MO_64) {
+/*
+ * Atomicity requires that we use a single 8-byte load.
+ * For simplicity and code size, always use the FPU for this.
+ * Similar insns using SSE/AVX are merely larger.


I'm surprised there's no performance penalty for throwing old-school
FPU insns into what is presumably otherwise code that's only
using modern SSE.


I have no idea about performance.  We don't require SSE for TCG at the moment.


I assume the caller has arranged that the top of the stack
is trashable at this point?


The entire fpu stack is call-clobbered.


r~




Re: [PATCH] target/m68k: Fix gen_load_fp for OS_LONG

2023-05-08 Thread Laurent Vivier

Le 08/05/2023 à 16:08, Richard Henderson a écrit :

Case was accidentally dropped in b7a94da9550b.

Signed-off-by: Richard Henderson 
---
  target/m68k/translate.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index 744eb3748b..44d852b106 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -959,6 +959,7 @@ static void gen_load_fp(DisasContext *s, int opsize, TCGv addr, TCGv_ptr fp,
  switch (opsize) {
  case OS_BYTE:
  case OS_WORD:
+case OS_LONG:
  tcg_gen_qemu_ld_tl(tmp, addr, index, opsize | MO_SIGN | MO_TE);
  gen_helper_exts32(cpu_env, fp, tmp);
  break;


Tested-by: Laurent Vivier 
Reviewed-by: Laurent Vivier 



[RFC v2 1/1] migration: Update error description whenever migration fails

2023-05-08 Thread tejus.gk
There are places in the code where the migration is marked failed with
MIGRATION_STATUS_FAILED, but the failiure reason is never updated. Hence
libvirt doesn't know why the migration failed when it queries for it.

Signed-off-by: tejus.gk 
---
 migration/migration.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 232e387109..87101eed5c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1660,15 +1660,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 } else if (strstart(uri, "fd:", &p)) {
 fd_start_outgoing_migration(s, p, &local_err);
 } else {
-if (!(has_resume && resume)) {
-yank_unregister_instance(MIGRATION_YANK_INSTANCE);
-}
-error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri",
+error_setg(&local_err, QERR_INVALID_PARAMETER_VALUE, "uri",
"a valid migration protocol");
-migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
-  MIGRATION_STATUS_FAILED);
 block_cleanup_parameters();
-return;
 }
 
 if (local_err) {
@@ -2050,7 +2044,7 @@ migration_wait_main_channel(MigrationState *ms)
  * Switch from normal iteration to postcopy
  * Returns non-0 on error
  */
-static int postcopy_start(MigrationState *ms)
+static int postcopy_start(MigrationState *ms, Error **errp)
 {
 int ret;
 QIOChannelBuffer *bioc;
@@ -2165,7 +2159,7 @@ static int postcopy_start(MigrationState *ms)
  */
 ret = qemu_file_get_error(ms->to_dst_file);
 if (ret) {
-error_report("postcopy_start: Migration stream errored (pre package)");
+error_setg(errp, "postcopy_start: Migration stream errored (pre package)");
 goto fail_closefb;
 }
 
@@ -2202,7 +2196,7 @@ static int postcopy_start(MigrationState *ms)
 
 ret = qemu_file_get_error(ms->to_dst_file);
 if (ret) {
-error_report("postcopy_start: Migration stream errored");
+error_setg(errp, "postcopy_start: Migration stream errored");
 migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
   MIGRATION_STATUS_FAILED);
 }
@@ -2719,6 +2713,7 @@ typedef enum {
 static MigIterateState migration_iteration_run(MigrationState *s)
 {
 uint64_t must_precopy, can_postcopy;
+Error *local_err = NULL;
 bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
 
 qemu_savevm_state_pending_estimate(&must_precopy, &can_postcopy);
@@ -2741,8 +2736,9 @@ static MigIterateState migration_iteration_run(MigrationState *s)
 /* Still a significant amount to transfer */
 if (!in_postcopy && must_precopy <= s->threshold_size &&
 qatomic_read(&s->start_postcopy)) {
-if (postcopy_start(s)) {
-error_report("%s: postcopy failed to start", __func__);
+if (postcopy_start(s, &local_err)) {
+migrate_set_error(s, local_err);
+error_report_err(local_err);
 }
 return MIG_ITERATE_SKIP;
 }
@@ -3232,8 +3228,10 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
  */
 if (migrate_postcopy_ram() || migrate_return_path()) {
 if (open_return_path_on_source(s, !resume)) {
-error_report("Unable to open return-path for postcopy");
+error_setg(&local_err, "Unable to open return-path for postcopy");
 migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+migrate_set_error(s, local_err);
+error_report_err(local_err);
 migrate_fd_cleanup(s);
 return;
 }
-- 
2.22.3
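
As an illustration of the pattern this patch moves towards (the callee fills in an error object and the caller stores it in the migration state, instead of the callee only printing it), here is a rough standalone sketch. Everything in it is a hypothetical stand-in: MiniError, MiniMigration and the mini_*() functions are not QEMU's Error API, and the ownership handling is simplified compared with what migrate_set_error() and error_report_err() do.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; QEMU's real Error API is richer than this. */
typedef struct {
    char msg[128];
} MiniError;

typedef struct {
    int failed;
    MiniError *error;   /* last failure reason, available to a later query */
} MiniMigration;

/* Set *errp once, if the caller asked for an error and none is set yet. */
static void mini_error_set(MiniError **errp, const char *msg)
{
    if (errp && !*errp) {
        *errp = calloc(1, sizeof(**errp));
        if (*errp) {
            snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
        }
    }
}

/* The callee fills errp instead of only printing to stderr. */
static int mini_postcopy_start(int stream_ok, MiniError **errp)
{
    if (!stream_ok) {
        mini_error_set(errp, "migration stream errored");
        return -1;
    }
    return 0;
}

/* The caller records the reason in the state so a query can report it. */
static void mini_iteration(MiniMigration *s, int stream_ok)
{
    MiniError *local_err = NULL;

    if (mini_postcopy_start(stream_ok, &local_err)) {
        s->failed = 1;
        free(s->error);
        s->error = local_err;   /* ownership moves to the state */
    }
}
```

The point of the sketch is only the data flow: once the reason lives in the state object, whoever queries the migration status can see why it failed, which is what libvirt is missing today.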




[RFC v2 0/1] migration: Update error description whenever migration fails

2023-05-08 Thread tejus.gk
Hi everyone, 

Thanks for the reviews. This is the v2 patchset based on the reviews
received for the previous one.

Links to previous patchsets:
v1: https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg00868.html


tejus.gk (1):
  migration: Update error description whenever migration fails

 migration/migration.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

-- 
2.22.3




[PATCH 04/13] ram.c: Dont change param->block in the compress thread

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

Instead, introduce an extra parameter to trigger the compress thread.
Now, when the compress thread is done, we know what RAMBlock and
offset it did compress.

This will be used in the next commits to move save_page_header()
out of compress code.
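
As a rough illustration of the idea (not the code from this patch), the
sketch below uses a hypothetical job_slot struct in place of CompressParam
and leaves out the mutex and condition variable. The worker clears only
the trigger flag, so block and offset stay readable once the job is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CompressParam; locking is omitted for brevity. */
struct job_slot {
    bool trigger;        /* set by the producer, cleared by the worker */
    const char *block;   /* stand-in for RAMBlock * */
    size_t offset;
};

/* Producer side: publish a job and raise the trigger. */
static void slot_submit(struct job_slot *s, const char *block, size_t offset)
{
    s->block = block;
    s->offset = offset;
    s->trigger = true;
}

/*
 * Worker side: consume the trigger only. block and offset are left
 * untouched so the producer can still read them after the job is done.
 * Returns true if a job was taken.
 */
static bool slot_take(struct job_slot *s)
{
    if (!s->trigger) {
        return false;
    }
    s->trigger = false;
    return true;
}
```

When param->block doubled as the wake-up signal, the worker had to set it
to NULL and that information was lost; a separate flag avoids this.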

Signed-off-by: Lukas Straub 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7bc05fc034..b552a9e538 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -492,6 +492,7 @@ typedef enum CompressResult CompressResult;
 struct CompressParam {
 bool done;
 bool quit;
+bool trigger;
 CompressResult result;
 QEMUFile *file;
 QemuMutex mutex;
@@ -565,10 +566,10 @@ static void *do_data_compress(void *opaque)
 
 qemu_mutex_lock(&param->mutex);
 while (!param->quit) {
-if (param->block) {
+if (param->trigger) {
 block = param->block;
 offset = param->offset;
-param->block = NULL;
+param->trigger = false;
 qemu_mutex_unlock(&param->mutex);
 
 result = do_compress_ram_page(param->file, &param->stream,
@@ -1545,6 +1546,7 @@ static inline void set_compress_params(CompressParam *param, RAMBlock *block,
 {
 param->block = block;
 param->offset = offset;
+param->trigger = true;
 }
 
 static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset)
-- 
2.40.0




[PATCH 10/13] ram.c: Move core decompression code into its own file

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

No functional changes intended.

Signed-off-by: Lukas Straub 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram-compress.c | 203 ++
 migration/ram-compress.h |   5 +
 migration/ram.c  | 204 ---
 3 files changed, 208 insertions(+), 204 deletions(-)

diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index d9bc67d075..c25562f12d 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -48,6 +48,24 @@ static QemuThread *compress_threads;
 static QemuMutex comp_done_lock;
 static QemuCond comp_done_cond;
 
+struct DecompressParam {
+bool done;
+bool quit;
+QemuMutex mutex;
+QemuCond cond;
+void *des;
+uint8_t *compbuf;
+int len;
+z_stream stream;
+};
+typedef struct DecompressParam DecompressParam;
+
+static QEMUFile *decomp_file;
+static DecompressParam *decomp_param;
+static QemuThread *decompress_threads;
+static QemuMutex decomp_done_lock;
+static QemuCond decomp_done_cond;
+
 static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
RAMBlock *block, ram_addr_t offset,
uint8_t *source_buf);
@@ -272,3 +290,188 @@ retry:
 
 return pages;
 }
+
+/* return the size after decompression, or negative value on error */
+static int
+qemu_uncompress_data(z_stream *stream, uint8_t *dest, size_t dest_len,
+ const uint8_t *source, size_t source_len)
+{
+int err;
+
+err = inflateReset(stream);
+if (err != Z_OK) {
+return -1;
+}
+
+stream->avail_in = source_len;
+stream->next_in = (uint8_t *)source;
+stream->avail_out = dest_len;
+stream->next_out = dest;
+
+err = inflate(stream, Z_NO_FLUSH);
+if (err != Z_STREAM_END) {
+return -1;
+}
+
+return stream->total_out;
+}
+
+static void *do_data_decompress(void *opaque)
+{
+DecompressParam *param = opaque;
+unsigned long pagesize;
+uint8_t *des;
+int len, ret;
+
+qemu_mutex_lock(&param->mutex);
+while (!param->quit) {
+if (param->des) {
+des = param->des;
+len = param->len;
+param->des = 0;
+qemu_mutex_unlock(&param->mutex);
+
+pagesize = TARGET_PAGE_SIZE;
+
+ret = qemu_uncompress_data(&param->stream, des, pagesize,
+   param->compbuf, len);
+if (ret < 0 && migrate_get_current()->decompress_error_check) {
+error_report("decompress data failed");
+qemu_file_set_error(decomp_file, ret);
+}
+
+qemu_mutex_lock(&decomp_done_lock);
+param->done = true;
+qemu_cond_signal(&decomp_done_cond);
+qemu_mutex_unlock(&decomp_done_lock);
+
+qemu_mutex_lock(&param->mutex);
+} else {
+qemu_cond_wait(&param->cond, &param->mutex);
+}
+}
+qemu_mutex_unlock(&param->mutex);
+
+return NULL;
+}
+
+int wait_for_decompress_done(void)
+{
+int idx, thread_count;
+
+if (!migrate_compress()) {
+return 0;
+}
+
+thread_count = migrate_decompress_threads();
+qemu_mutex_lock(&decomp_done_lock);
+for (idx = 0; idx < thread_count; idx++) {
+while (!decomp_param[idx].done) {
+qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
+}
+}
+qemu_mutex_unlock(&decomp_done_lock);
+return qemu_file_get_error(decomp_file);
+}
+
+void compress_threads_load_cleanup(void)
+{
+int i, thread_count;
+
+if (!migrate_compress()) {
+return;
+}
+thread_count = migrate_decompress_threads();
+for (i = 0; i < thread_count; i++) {
+/*
+ * we use it as a indicator which shows if the thread is
+ * properly init'd or not
+ */
+if (!decomp_param[i].compbuf) {
+break;
+}
+
+qemu_mutex_lock(&decomp_param[i].mutex);
+decomp_param[i].quit = true;
+qemu_cond_signal(&decomp_param[i].cond);
+qemu_mutex_unlock(&decomp_param[i].mutex);
+}
+for (i = 0; i < thread_count; i++) {
+if (!decomp_param[i].compbuf) {
+break;
+}
+
+qemu_thread_join(decompress_threads + i);
+qemu_mutex_destroy(&decomp_param[i].mutex);
+qemu_cond_destroy(&decomp_param[i].cond);
+inflateEnd(&decomp_param[i].stream);
+g_free(decomp_param[i].compbuf);
+decomp_param[i].compbuf = NULL;
+}
+g_free(decompress_threads);
+g_free(decomp_param);
+decompress_threads = NULL;
+decomp_param = NULL;
+decomp_file = NULL;
+}
+
+int compress_threads_load_setup(QEMUFile *f)
+{
+int i, thread_count;
+
+if (!migrate_compress()) {
+return 0;
+}
+
+thread_count = migrate_decompress_threads();
+decompress_thr

[PATCH 12/13] ram-compress.c: Make target independent

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

Make ram-compress.c target independent.

Signed-off-by: Lukas Straub 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/meson.build|  3 ++-
 migration/ram-compress.c | 17 ++---
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/migration/meson.build b/migration/meson.build
index 2090af8e85..75de868bb7 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -23,6 +23,8 @@ softmmu_ss.add(files(
   'migration.c',
   'multifd.c',
   'multifd-zlib.c',
+  'multifd-zlib.c',
+  'ram-compress.c',
   'options.c',
   'postcopy-ram.c',
   'savevm.c',
@@ -40,5 +42,4 @@ softmmu_ss.add(when: zstd, if_true: files('multifd-zstd.c'))
 specific_ss.add(when: 'CONFIG_SOFTMMU',
 if_true: files('dirtyrate.c',
'ram.c',
-   'ram-compress.c',
'target.c'))
diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index 3d2a4a6329..06254d8c69 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -35,7 +35,8 @@
 #include "migration.h"
 #include "options.h"
 #include "io/channel-null.h"
-#include "exec/ram_addr.h"
+#include "exec/target_page.h"
+#include "exec/ramblock.h"
 
 CompressionStats compression_counters;
 
@@ -156,7 +157,7 @@ int compress_threads_save_setup(void)
 qemu_cond_init(&comp_done_cond);
 qemu_mutex_init(&comp_done_lock);
 for (i = 0; i < thread_count; i++) {
-comp_param[i].originbuf = g_try_malloc(TARGET_PAGE_SIZE);
+comp_param[i].originbuf = g_try_malloc(qemu_target_page_size());
 if (!comp_param[i].originbuf) {
 goto exit;
 }
@@ -192,11 +193,12 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
uint8_t *source_buf)
 {
 uint8_t *p = block->host + offset;
+size_t page_size = qemu_target_page_size();
 int ret;
 
 assert(qemu_file_buffer_empty(f));
 
-if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
+if (buffer_is_zero(p, page_size)) {
 return RES_ZEROPAGE;
 }
 
@@ -205,8 +207,8 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
  * so that we can catch up the error during compression and
  * decompression
  */
-memcpy(source_buf, p, TARGET_PAGE_SIZE);
-ret = qemu_put_compression_data(f, stream, source_buf, TARGET_PAGE_SIZE);
+memcpy(source_buf, p, page_size);
+ret = qemu_put_compression_data(f, stream, source_buf, page_size);
 if (ret < 0) {
 qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
 error_report("compressed data failed!");
@@ -336,7 +338,7 @@ static void *do_data_decompress(void *opaque)
 param->des = 0;
 qemu_mutex_unlock(&param->mutex);
 
-pagesize = TARGET_PAGE_SIZE;
+pagesize = qemu_target_page_size();
 
 ret = qemu_uncompress_data(&param->stream, des, pagesize,
param->compbuf, len);
@@ -439,7 +441,8 @@ int compress_threads_load_setup(QEMUFile *f)
 goto exit;
 }
 
-decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
+size_t compbuf_size = compressBound(qemu_target_page_size());
+decomp_param[i].compbuf = g_malloc0(compbuf_size);
 qemu_mutex_init(&decomp_param[i].mutex);
 qemu_cond_init(&decomp_param[i].cond);
 decomp_param[i].done = true;
-- 
2.40.0




[PATCH 07/13] ram.c: Call update_compress_thread_counts from compress_send_queued_data

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

This makes the core compress code more independent of ram.c.

Signed-off-by: Lukas Straub 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index c52602b70d..d1c24eff21 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1540,12 +1540,14 @@ static int send_queued_data(CompressParam *param)
 abort();
 }
 
+update_compress_thread_counts(param, len);
+
 return len;
 }
 
 static void flush_compressed_data(RAMState *rs)
 {
-int idx, len, thread_count;
+int idx, thread_count;
 
 if (!save_page_use_compression(rs)) {
 return;
@@ -1564,15 +1566,8 @@ static void flush_compressed_data(RAMState *rs)
 qemu_mutex_lock(&comp_param[idx].mutex);
 if (!comp_param[idx].quit) {
 CompressParam *param = &comp_param[idx];
-len = send_queued_data(param);
+send_queued_data(param);
 compress_reset_result(param);
-
-/*
- * it's safe to fetch zero_page without holding comp_done_lock
- * as there is no further request submitted to the thread,
- * i.e, the thread should be waiting for a request at this point.
- */
-update_compress_thread_counts(param, len);
 }
 qemu_mutex_unlock(&comp_param[idx].mutex);
 }
@@ -1588,7 +1583,7 @@ static inline void set_compress_params(CompressParam *param, RAMBlock *block,
 
 static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset)
 {
-int idx, thread_count, bytes_xmit = -1, pages = -1;
+int idx, thread_count, pages = -1;
 bool wait = migrate_compress_wait_thread();
 
 thread_count = migrate_compress_threads();
@@ -1599,11 +1594,10 @@ retry:
 CompressParam *param = &comp_param[idx];
 qemu_mutex_lock(&param->mutex);
 param->done = false;
-bytes_xmit = send_queued_data(param);
+send_queued_data(param);
 compress_reset_result(param);
 set_compress_params(param, block, offset);
 
-update_compress_thread_counts(param, bytes_xmit);
 qemu_cond_signal(&param->cond);
 qemu_mutex_unlock(&param->mutex);
 pages = 1;
-- 
2.40.0




[PATCH 05/13] ram.c: Reset result after sending queued data

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

And take the param->mutex lock for the whole section to ensure
thread-safety.
Now, it is explicitly clear if there is no queued data to send.
Before, this was handled by param->file stream being empty and thus
qemu_put_qemu_file() not sending anything.

This will be used in the next commits to move save_page_header()
out of compress code.
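
A minimal sketch of the send-then-reset pattern, assuming a hypothetical
slot struct whose queued_len field stands in for the contents of
param->file; the enum values mirror CompressResult from this series.
After the reset, RES_NONE states directly that nothing is queued, so a
second call sends nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the enum from the series; the struct is a hypothetical stand-in. */
enum compress_result { RES_NONE = 0, RES_ZEROPAGE = 1, RES_COMPRESS = 2 };

struct slot {
    enum compress_result result;
    size_t queued_len;   /* bytes the worker left in its private buffer */
};

/*
 * Send whatever is queued, then reset the slot. Returns the number of
 * bytes that would have been forwarded to the destination stream.
 * In the real code this would run with the per-slot mutex held.
 */
static size_t slot_send_and_reset(struct slot *s)
{
    size_t sent = 0;

    if (s->result != RES_NONE) {
        sent = s->queued_len;
    }
    s->result = RES_NONE;
    s->queued_len = 0;
    return sent;
}
```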

Signed-off-by: Lukas Straub 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 32 ++--
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index b552a9e538..4e14e3bb94 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1508,6 +1508,13 @@ update_compress_thread_counts(const CompressParam *param, int bytes_xmit)
 
 static bool save_page_use_compression(RAMState *rs);
 
+static inline void compress_reset_result(CompressParam *param)
+{
+param->result = RES_NONE;
+param->block = NULL;
+param->offset = 0;
+}
+
 static void flush_compressed_data(RAMState *rs)
 {
 MigrationState *ms = migrate_get_current();
@@ -1529,13 +1536,16 @@ static void flush_compressed_data(RAMState *rs)
 for (idx = 0; idx < thread_count; idx++) {
 qemu_mutex_lock(&comp_param[idx].mutex);
 if (!comp_param[idx].quit) {
-len = qemu_put_qemu_file(ms->to_dst_file, comp_param[idx].file);
+CompressParam *param = &comp_param[idx];
+len = qemu_put_qemu_file(ms->to_dst_file, param->file);
+compress_reset_result(param);
+
 /*
  * it's safe to fetch zero_page without holding comp_done_lock
  * as there is no further request submitted to the thread,
  * i.e, the thread should be waiting for a request at this point.
  */
-update_compress_thread_counts(&comp_param[idx], len);
+update_compress_thread_counts(param, len);
 }
 qemu_mutex_unlock(&comp_param[idx].mutex);
 }
@@ -1560,15 +1570,17 @@ static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset)
 retry:
 for (idx = 0; idx < thread_count; idx++) {
 if (comp_param[idx].done) {
-comp_param[idx].done = false;
-bytes_xmit = qemu_put_qemu_file(ms->to_dst_file,
-comp_param[idx].file);
-qemu_mutex_lock(&comp_param[idx].mutex);
-set_compress_params(&comp_param[idx], block, offset);
-qemu_cond_signal(&comp_param[idx].cond);
-qemu_mutex_unlock(&comp_param[idx].mutex);
+CompressParam *param = &comp_param[idx];
+qemu_mutex_lock(&param->mutex);
+param->done = false;
+bytes_xmit = qemu_put_qemu_file(ms->to_dst_file, param->file);
+compress_reset_result(param);
+set_compress_params(param, block, offset);
+
+update_compress_thread_counts(param, bytes_xmit);
+qemu_cond_signal(&param->cond);
+qemu_mutex_unlock(&param->mutex);
 pages = 1;
-update_compress_thread_counts(&comp_param[idx], bytes_xmit);
 break;
 }
 }
-- 
2.40.0




[PATCH 03/13] ram.c: Let the compress threads return a CompressResult enum

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

This will be used in the next commits to move save_page_header()
out of compress code.

Signed-off-by: Lukas Straub 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/ram.c | 34 ++
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 5e7bf20ca5..7bc05fc034 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -482,10 +482,17 @@ MigrationOps *migration_ops;
 
 CompressionStats compression_counters;
 
+enum CompressResult {
+RES_NONE = 0,
+RES_ZEROPAGE = 1,
+RES_COMPRESS = 2
+};
+typedef enum CompressResult CompressResult;
+
 struct CompressParam {
 bool done;
 bool quit;
-bool zero_page;
+CompressResult result;
 QEMUFile *file;
 QemuMutex mutex;
 QemuCond cond;
@@ -527,8 +534,9 @@ static QemuCond decomp_done_cond;
 
 static int ram_save_host_page_urgent(PageSearchStatus *pss);
 
-static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
- ram_addr_t offset, uint8_t *source_buf);
+static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
+   RAMBlock *block, ram_addr_t offset,
+   uint8_t *source_buf);
 
 /* NOTE: page is the PFN not real ram_addr_t. */
 static void pss_init(PageSearchStatus *pss, RAMBlock *rb, ram_addr_t page)
@@ -553,7 +561,7 @@ static void *do_data_compress(void *opaque)
 CompressParam *param = opaque;
 RAMBlock *block;
 ram_addr_t offset;
-bool zero_page;
+CompressResult result;
 
 qemu_mutex_lock(&param->mutex);
 while (!param->quit) {
@@ -563,12 +571,12 @@ static void *do_data_compress(void *opaque)
 param->block = NULL;
 qemu_mutex_unlock(&param->mutex);
 
-zero_page = do_compress_ram_page(param->file, &param->stream,
- block, offset, param->originbuf);
+result = do_compress_ram_page(param->file, &param->stream,
+  block, offset, param->originbuf);
 
 qemu_mutex_lock(&comp_done_lock);
 param->done = true;
-param->zero_page = zero_page;
+param->result = result;
 qemu_cond_signal(&comp_done_cond);
 qemu_mutex_unlock(&comp_done_lock);
 
@@ -1452,8 +1460,9 @@ static int ram_save_multifd_page(QEMUFile *file, RAMBlock *block,
 return 1;
 }
 
-static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
- ram_addr_t offset, uint8_t *source_buf)
+static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
+   RAMBlock *block, ram_addr_t offset,
+   uint8_t *source_buf)
 {
 RAMState *rs = ram_state;
 PageSearchStatus *pss = &rs->pss[RAM_CHANNEL_PRECOPY];
@@ -1461,7 +1470,7 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
 int ret;
 
 if (save_zero_page_to_file(pss, f, block, offset)) {
-return true;
+return RES_ZEROPAGE;
 }
 
 save_page_header(pss, f, block, offset | RAM_SAVE_FLAG_COMPRESS_PAGE);
@@ -1476,8 +1485,9 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
 if (ret < 0) {
 qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
 error_report("compressed data failed!");
+return RES_NONE;
 }
-return false;
+return RES_COMPRESS;
 }
 
 static void
@@ -1485,7 +1495,7 @@ update_compress_thread_counts(const CompressParam *param, int bytes_xmit)
 {
 ram_transferred_add(bytes_xmit);
 
-if (param->zero_page) {
+if (param->result == RES_ZEROPAGE) {
 stat64_add(&mig_stats.zero_pages, 1);
 return;
 }
-- 
2.40.0




[PATCH 13/13] migration: Initialize and cleanup decompression in migration.c

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

This fixes compress with colo.

Signed-off-by: Lukas Straub 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/migration.c | 9 +
 migration/ram.c   | 5 -
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 232e387109..0ee07802a5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -26,6 +26,7 @@
 #include "sysemu/cpu-throttle.h"
 #include "rdma.h"
 #include "ram.h"
+#include "ram-compress.h"
 #include "migration/global_state.h"
 #include "migration/misc.h"
 #include "migration.h"
@@ -228,6 +229,7 @@ void migration_incoming_state_destroy(void)
 struct MigrationIncomingState *mis = migration_incoming_get_current();
 
 multifd_load_cleanup();
+compress_threads_load_cleanup();
 
 if (mis->to_src_file) {
 /* Tell source that we are done */
@@ -500,6 +502,12 @@ process_incoming_migration_co(void *opaque)
 Error *local_err = NULL;
 
 assert(mis->from_src_file);
+
+if (compress_threads_load_setup(mis->from_src_file)) {
+error_report("Failed to setup decompress threads");
+goto fail;
+}
+
 mis->migration_incoming_co = qemu_coroutine_self();
 mis->largest_page_size = qemu_ram_pagesize_largest();
 postcopy_state_set(POSTCOPY_INCOMING_NONE);
@@ -565,6 +573,7 @@ fail:
 qemu_fclose(mis->from_src_file);
 
 multifd_load_cleanup();
+compress_threads_load_cleanup();
 
 exit(EXIT_FAILURE);
 }
diff --git a/migration/ram.c b/migration/ram.c
index ee4ab31f25..f78e9912cd 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3558,10 +3558,6 @@ void colo_release_ram_cache(void)
  */
 static int ram_load_setup(QEMUFile *f, void *opaque)
 {
-if (compress_threads_load_setup(f)) {
-return -1;
-}
-
 xbzrle_load_setup();
 ramblock_recv_map_init();
 
@@ -3577,7 +3573,6 @@ static int ram_load_cleanup(void *opaque)
 }
 
 xbzrle_load_cleanup();
-compress_threads_load_cleanup();
 
 RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
 g_free(rb->receivedmap);
-- 
2.40.0




[PATCH 11/13] ram compress: Assert that the file buffer matches the result

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

Before this series, "nothing to send" was handled by the file buffer
being empty. Now it is tracked via param->result.

Assert that the file buffer state matches the result.
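
The invariant can be sketched as a small predicate. The helper name
result_matches_buffer and the buffered_bytes argument are hypothetical;
only the enum values come from this series. Treating RES_NONE like
RES_ZEROPAGE (buffer must be empty) is an assumption of this sketch,
suggested by the qemu_fflush() added on the error path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum compress_result { RES_NONE = 0, RES_ZEROPAGE = 1, RES_COMPRESS = 2 };

/*
 * Hypothetical helper: returns true when the buffer state agrees with the
 * result, i.e. only RES_COMPRESS may leave bytes in the buffer.
 */
static bool result_matches_buffer(enum compress_result result,
                                  size_t buffered_bytes)
{
    bool buffer_empty = (buffered_bytes == 0);

    if (result == RES_COMPRESS) {
        return !buffer_empty;
    }
    return buffer_empty;
}
```

A caller would wrap this in assert() at the points where the result is
consumed, which is roughly what the hunks below do with
qemu_file_buffer_empty().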

Signed-off-by: Lukas Straub 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/qemu-file.c| 11 +++
 migration/qemu-file.h|  1 +
 migration/ram-compress.c |  5 +
 migration/ram.c  |  2 ++
 4 files changed, 19 insertions(+)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index f4cfd05c67..61fb580342 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -870,6 +870,17 @@ int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src)
 return len;
 }
 
+/*
+ * Check if the writable buffer is empty
+ */
+
+bool qemu_file_buffer_empty(QEMUFile *file)
+{
+assert(qemu_file_is_writable(file));
+
+return !file->iovcnt;
+}
+
 /*
  * Get a string whose length is determined by a single preceding byte
  * A preallocated 256 byte buffer must be passed in.
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 4f26bf6961..4ee58a87dd 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -113,6 +113,7 @@ size_t coroutine_mixed_fn qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, s
 ssize_t qemu_put_compression_data(QEMUFile *f, z_stream *stream,
   const uint8_t *p, size_t size);
 int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src);
+bool qemu_file_buffer_empty(QEMUFile *file);
 
 /*
  * Note that you can only peek continuous bytes from where the current pointer
diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index c25562f12d..3d2a4a6329 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -194,6 +194,8 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
 uint8_t *p = block->host + offset;
 int ret;
 
+assert(qemu_file_buffer_empty(f));
+
 if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
 return RES_ZEROPAGE;
 }
@@ -208,6 +210,7 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
 if (ret < 0) {
 qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
 error_report("compressed data failed!");
+qemu_fflush(f);
 return RES_NONE;
 }
 return RES_COMPRESS;
@@ -239,6 +242,7 @@ void flush_compressed_data(int (send_queued_data(CompressParam *)))
 if (!comp_param[idx].quit) {
 CompressParam *param = &comp_param[idx];
 send_queued_data(param);
+assert(qemu_file_buffer_empty(param->file));
 compress_reset_result(param);
 }
 qemu_mutex_unlock(&comp_param[idx].mutex);
@@ -268,6 +272,7 @@ retry:
 qemu_mutex_lock(&param->mutex);
 param->done = false;
 send_queued_data(param);
+assert(qemu_file_buffer_empty(param->file));
 compress_reset_result(param);
 set_compress_params(param, block, offset);
 
diff --git a/migration/ram.c b/migration/ram.c
index 009681d213..ee4ab31f25 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1321,11 +1321,13 @@ static int send_queued_data(CompressParam *param)
 assert(block == pss->last_sent_block);
 
 if (param->result == RES_ZEROPAGE) {
+assert(qemu_file_buffer_empty(param->file));
 len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_ZERO);
 qemu_put_byte(file, 0);
 len += 1;
 ram_release_page(block->idstr, offset);
 } else if (param->result == RES_COMPRESS) {
+assert(!qemu_file_buffer_empty(param->file));
 len += save_page_header(pss, file, block,
 offset | RAM_SAVE_FLAG_COMPRESS_PAGE);
 len += qemu_put_qemu_file(file, param->file);
-- 
2.40.0




[PATCH 09/13] ram.c: Move core compression code into its own file

2023-05-08 Thread Juan Quintela
From: Lukas Straub 

No functional changes intended.

Signed-off-by: Lukas Straub 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
---
 migration/meson.build|   5 +-
 migration/ram-compress.c | 274 +++
 migration/ram-compress.h |  65 ++
 migration/ram.c  | 262 +
 4 files changed, 344 insertions(+), 262 deletions(-)
 create mode 100644 migration/ram-compress.c
 create mode 100644 migration/ram-compress.h

diff --git a/migration/meson.build b/migration/meson.build
index da1897fadf..2090af8e85 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -38,4 +38,7 @@ endif
 softmmu_ss.add(when: zstd, if_true: files('multifd-zstd.c'))
 
 specific_ss.add(when: 'CONFIG_SOFTMMU',
-if_true: files('dirtyrate.c', 'ram.c', 'target.c'))
+if_true: files('dirtyrate.c',
+   'ram.c',
+   'ram-compress.c',
+   'target.c'))
diff --git a/migration/ram-compress.c b/migration/ram-compress.c
new file mode 100644
index 0000000000..d9bc67d075
--- /dev/null
+++ b/migration/ram-compress.c
@@ -0,0 +1,274 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2011-2015 Red Hat Inc
+ *
+ * Authors:
+ *  Juan Quintela 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+
+#include "ram-compress.h"
+
+#include "qemu/error-report.h"
+#include "migration.h"
+#include "options.h"
+#include "io/channel-null.h"
+#include "exec/ram_addr.h"
+
+CompressionStats compression_counters;
+
+static CompressParam *comp_param;
+static QemuThread *compress_threads;
+/* comp_done_cond is used to wake up the migration thread when
+ * one of the compression threads has finished the compression.
+ * comp_done_lock is used to co-work with comp_done_cond.
+ */
+static QemuMutex comp_done_lock;
+static QemuCond comp_done_cond;
+
+static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
+   RAMBlock *block, ram_addr_t offset,
+   uint8_t *source_buf);
+
+static void *do_data_compress(void *opaque)
+{
+CompressParam *param = opaque;
+RAMBlock *block;
+ram_addr_t offset;
+CompressResult result;
+
+qemu_mutex_lock(&param->mutex);
+while (!param->quit) {
+if (param->trigger) {
+block = param->block;
+offset = param->offset;
+param->trigger = false;
+qemu_mutex_unlock(&param->mutex);
+
+result = do_compress_ram_page(param->file, &param->stream,
+  block, offset, param->originbuf);
+
+qemu_mutex_lock(&comp_done_lock);
+param->done = true;
+param->result = result;
+qemu_cond_signal(&comp_done_cond);
+qemu_mutex_unlock(&comp_done_lock);
+
+qemu_mutex_lock(&param->mutex);
+} else {
+qemu_cond_wait(&param->cond, &param->mutex);
+}
+}
+qemu_mutex_unlock(&param->mutex);
+
+return NULL;
+}
+
+void compress_threads_save_cleanup(void)
+{
+int i, thread_count;
+
+if (!migrate_compress() || !comp_param) {
+return;
+}
+
+thread_count = migrate_compress_threads();
+for (i = 0; i < thread_count; i++) {
+/*
+ * we use it as a indicator which shows if the thread is
+ * properly init'd or not
+ */
+if (!comp_param[i].file) {
+break;
+}
+
+qemu_mutex_lock(&comp_param[i].mutex);
+comp_param[i].quit = true;
+qemu_cond_signal(&comp_param[i].cond);
+qemu_mutex_unlock(&comp_param[i].mutex);
+
+qemu_thread_join(compress_threads + i);
+qemu_mutex_destroy(&comp_param[i].mutex);
+qemu_cond_destroy(&comp
