Re: [PULL 11/35] arm/Kconfig: Do not build TCG-only boards on a KVM-only build
On 04/05/2023 14.27, Fabiano Rosas wrote: Thomas Huth writes: On 02/05/2023 14.14, Peter Maydell wrote: From: Fabiano Rosas Move all the CONFIG_FOO=y from default.mak into "default y if TCG" statements in Kconfig. That way they won't be selected when CONFIG_TCG=n. I'm leaving CONFIG_ARM_VIRT in default.mak because it allows us to keep the two default.mak files not empty and keep aarch64-default.mak including arm-default.mak. That way we don't surprise anyone that's used to altering these files. With this change we can start building with --disable-tcg. Signed-off-by: Fabiano Rosas Reviewed-by: Richard Henderson Message-id: 20230426180013.14814-12-faro...@suse.de Signed-off-by: Peter Maydell --- configs/devices/aarch64-softmmu/default.mak | 4 -- configs/devices/arm-softmmu/default.mak | 37 -- hw/arm/Kconfig | 42 - 3 files changed, 41 insertions(+), 42 deletions(-) diff --git a/configs/devices/aarch64-softmmu/default.mak b/configs/devices/aarch64-softmmu/default.mak index cf43ac8da11..70e05a197dc 100644 --- a/configs/devices/aarch64-softmmu/default.mak +++ b/configs/devices/aarch64-softmmu/default.mak @@ -2,7 +2,3 @@ # We support all the 32 bit boards so need all their config include ../arm-softmmu/default.mak - -CONFIG_XLNX_ZYNQMP_ARM=y -CONFIG_XLNX_VERSAL=y -CONFIG_SBSA_REF=y diff --git a/configs/devices/arm-softmmu/default.mak b/configs/devices/arm-softmmu/default.mak index cb3e5aea657..647fbce88d3 100644 --- a/configs/devices/arm-softmmu/default.mak +++ b/configs/devices/arm-softmmu/default.mak @@ -4,40 +4,3 @@ # CONFIG_TEST_DEVICES=n CONFIG_ARM_VIRT=y -CONFIG_CUBIEBOARD=y -CONFIG_EXYNOS4=y -CONFIG_HIGHBANK=y -CONFIG_INTEGRATOR=y -CONFIG_FSL_IMX31=y -CONFIG_MUSICPAL=y -CONFIG_MUSCA=y -CONFIG_CHEETAH=y -CONFIG_SX1=y -CONFIG_NSERIES=y -CONFIG_STELLARIS=y -CONFIG_STM32VLDISCOVERY=y -CONFIG_REALVIEW=y -CONFIG_VERSATILE=y -CONFIG_VEXPRESS=y -CONFIG_ZYNQ=y -CONFIG_MAINSTONE=y -CONFIG_GUMSTIX=y -CONFIG_SPITZ=y -CONFIG_TOSA=y -CONFIG_Z2=y -CONFIG_NPCM7XX=y 
-CONFIG_COLLIE=y -CONFIG_ASPEED_SOC=y -CONFIG_NETDUINO2=y -CONFIG_NETDUINOPLUS2=y -CONFIG_OLIMEX_STM32_H405=y -CONFIG_MPS2=y -CONFIG_RASPI=y -CONFIG_DIGIC=y -CONFIG_SABRELITE=y -CONFIG_EMCRAFT_SF2=y -CONFIG_MICROBIT=y -CONFIG_FSL_IMX25=y -CONFIG_FSL_IMX7=y -CONFIG_FSL_IMX6UL=y -CONFIG_ALLWINNER_H3=y diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig index 87c1a29c912..2d7c4579559 100644 --- a/hw/arm/Kconfig +++ b/hw/arm/Kconfig @@ -35,20 +35,24 @@ config ARM_VIRT config CHEETAH bool +default y if TCG && ARM select OMAP select TSC210X config CUBIEBOARD bool +default y if TCG && ARM select ALLWINNER_A10 ... Hi! Sorry for not noticing this earlier, but I have to say that I really dislike this change, since it very much changes the way we did our machine configuration so far. Until now, you could simply go to configs/devices/*-softmmu/*.mak and only select the machines you wanted to have with "...=y" and delete everything else. Now you have to know *all* the machines that you do *not* want to have in your build and disable them with "...=n" in that file. That's quite ugly, especially for the arm target that has so many machines. (ok, you could also do a "--without-default-devices" configuration to get rid of the machines, but that also disables all other kind of devices that you then have to specify manually). Would leaving the CONFIGs as 'n', but commented out in the .mak files be of any help? If I understand your use case, you were probably just deleting the CONFIG=y for the boards you don't want. So now you'd be uncommenting the CONFIG=n instead. Alternatively, we could revert the .mak part of this change, convert default.mak into tcg.mak and kvm.mak, and use those transparently depending on whether --disable-tcg is present in the configure line. But there's probably a better way still that I'm not seeing here, let's see what others think. 
I pondered about it for a while, but I also don't have a better solution, so yes, I guess that "# CONFIG_xxx=n" idea is likely still the best solution right now. Thomas
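For concreteness, Thomas's "# CONFIG_xxx=n" idea would leave the .mak file looking roughly like this (hypothetical fragment; board names taken from the diff above, actual layout to be decided):

```makefile
# configs/devices/arm-softmmu/default.mak (sketch of the "# CONFIG_xxx=n" idea)
# Boards are now enabled via "default y if TCG" in hw/arm/Kconfig.
# To drop a board from your build, uncomment the corresponding line:
# CONFIG_CUBIEBOARD=n
# CONFIG_EXYNOS4=n
# CONFIG_HIGHBANK=n
```

That preserves the old workflow: instead of deleting `CONFIG_FOO=y` lines, a downstream build uncomments the `CONFIG_FOO=n` lines for the boards it does not want.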
Re: [PATCH RESEND] vhost: fix possible wrap in SVQ descriptor ring
Hi Eugenio, Thanks for reviewing. On 2023/5/9 1:26, Eugenio Perez Martin wrote: > On Sat, May 6, 2023 at 5:01 PM Hawkins Jiawei wrote: >> >> QEMU invokes vhost_svq_add() when adding a guest's element into SVQ. >> In vhost_svq_add(), it uses vhost_svq_available_slots() to check >> whether QEMU can add the element into the SVQ. If there is >> enough space, then QEMU combines some out descriptors and >> some in descriptors into one descriptor chain, and adds it into >> svq->vring.desc by vhost_svq_vring_write_descs(). >> >> Yet the problem is that `svq->shadow_avail_idx - svq->shadow_used_idx` >> in vhost_svq_available_slots() returns the number of occupied elements, >> or the number of descriptor chains, instead of the number of occupied >> descriptors, which may cause wrapping in the SVQ descriptor ring. >> >> Here is an example. In vhost_handle_guest_kick(), QEMU forwards >> as many available buffers to the device by virtqueue_pop() and >> vhost_svq_add_element(). virtqueue_pop() returns a guest's element, >> and uses vhost_svq_add_element(), a wrapper to vhost_svq_add(), to >> add this element into SVQ. If QEMU invokes virtqueue_pop() and >> vhost_svq_add_element() `svq->vring.num` times, vhost_svq_available_slots() >> thinks QEMU just ran out of slots and everything should work fine. >> But in fact, virtqueue_pop() returns `svq->vring.num` elements or >> descriptor chains, more than `svq->vring.num` descriptors, due to >> guest memory fragmentation, and this causes wrapping in the SVQ descriptor ring. >> > > The bug is valid even before marking the descriptors used. If the > guest memory is fragmented, SVQ must add chains, so it can try to add > more descriptors than possible. I will add this in the commit message in the v2 patch. > >> Therefore, this patch adds a `num_free` field to the VhostShadowVirtqueue >> structure, and updates this field in vhost_svq_add() and >> vhost_svq_get_buf() to record the number of free descriptors.
>> Then we can avoid wrap in SVQ descriptor ring by refactoring >> vhost_svq_available_slots(). >> >> Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding") >> Signed-off-by: Hawkins Jiawei >> --- >> hw/virtio/vhost-shadow-virtqueue.c | 9 - >> hw/virtio/vhost-shadow-virtqueue.h | 3 +++ >> 2 files changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/hw/virtio/vhost-shadow-virtqueue.c >> b/hw/virtio/vhost-shadow-virtqueue.c >> index 8361e70d1b..e1c6952b10 100644 >> --- a/hw/virtio/vhost-shadow-virtqueue.c >> +++ b/hw/virtio/vhost-shadow-virtqueue.c >> @@ -68,7 +68,7 @@ bool vhost_svq_valid_features(uint64_t features, Error >> **errp) >>*/ >> static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq) >> { >> -return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx); >> +return svq->num_free; >> } >> >> /** >> @@ -263,6 +263,9 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const >> struct iovec *out_sg, >> return -EINVAL; >> } >> >> +/* Update the size of SVQ vring free descriptors */ >> +svq->num_free -= ndescs; >> + >> svq->desc_state[qemu_head].elem = elem; >> svq->desc_state[qemu_head].ndescs = ndescs; >> vhost_svq_kick(svq); >> @@ -450,6 +453,9 @@ static VirtQueueElement >> *vhost_svq_get_buf(VhostShadowVirtqueue *svq, >> svq->desc_next[last_used_chain] = svq->free_head; >> svq->free_head = used_elem.id; >> >> +/* Update the size of SVQ vring free descriptors */ > > No need for this comment. > > Apart from that, > > Acked-by: Eugenio Pérez > Thanks for your suggestion. I will remove the comment in v2 patch, with this tag on. 
>> +svq->num_free += num; >> + >> *len = used_elem.len; >> return g_steal_pointer(&svq->desc_state[used_elem.id].elem); >> } >> @@ -659,6 +665,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, >> VirtIODevice *vdev, >> svq->iova_tree = iova_tree; >> >> svq->vring.num = virtio_queue_get_num(vdev, >> virtio_get_queue_index(vq)); >> +svq->num_free = svq->vring.num; >> driver_size = vhost_svq_driver_area_size(svq); >> device_size = vhost_svq_device_area_size(svq); >> svq->vring.desc = qemu_memalign(qemu_real_host_page_size(), >> driver_size); >> diff --git a/hw/virtio/vhost-shadow-virtqueue.h >> b/hw/virtio/vhost-shadow-virtqueue.h >> index 926a4897b1..6efe051a70 100644 >> --- a/hw/virtio/vhost-shadow-virtqueue.h >> +++ b/hw/virtio/vhost-shadow-virtqueue.h >> @@ -107,6 +107,9 @@ typedef struct VhostShadowVirtqueue { >> >> /* Next head to consume from the device */ >> uint16_t last_used_idx; >> + >> +/* Size of SVQ vring free descriptors */ >> +uint16_t num_free; >> } VhostShadowVirtqueue; >> >> bool vhost_svq_valid_features(uint64_t features, Error **errp); >> -- >> 2.25.1 >> >
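The accounting mistake fixed above can be reproduced outside of QEMU with a toy model (not QEMU code; the names only mirror the SVQ fields discussed). With 2-descriptor chains, counting occupied chains overestimates the free space and writes past the ring, while the `num_free` accounting stops exactly at the ring size:

```python
RING_NUM = 8  # plays the role of svq->vring.num

def free_slots_by_chains(shadow_avail_idx, shadow_used_idx):
    # Old vhost_svq_available_slots(): counts occupied *chains*, not descriptors.
    return RING_NUM - (shadow_avail_idx - shadow_used_idx)

def simulate(descs_per_chain, use_num_free):
    used = 0        # descriptors actually written into the ring
    avail_idx = 0   # chains forwarded so far (shadow_avail_idx)
    num_free = RING_NUM
    wrapped = False
    while True:
        free = num_free if use_num_free else free_slots_by_chains(avail_idx, 0)
        if free < descs_per_chain:
            break                       # ring reported as full, stop adding
        used += descs_per_chain
        avail_idx += 1
        num_free -= descs_per_chain
        if used > RING_NUM:
            wrapped = True              # wrote more descriptors than the ring holds
            break
    return used, wrapped

# Fragmented guest memory: every element needs a 2-descriptor chain.
print(simulate(2, use_num_free=False))  # (10, True)  -> old check lets the ring wrap
print(simulate(2, use_num_free=True))   # (8, False)  -> num_free stops at the ring size
```

With 1-descriptor chains both schemes agree, which is why the bug only shows up under guest memory fragmentation.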
Re: [PATCH v3 3/3] tests/qtest: Don't run cdrom boot tests if no accelerator is present
On 08/05/2023 20.16, Fabiano Rosas wrote: On a build configured with: --disable-tcg --enable-xen it is possible to produce a QEMU binary with no TCG nor KVM support. Skip the cdrom boot tests if that's the case. Fixes: 0c1ae3ff9d ("tests/qtest: Fix tests when no KVM or TCG are present") Signed-off-by: Fabiano Rosas --- tests/qtest/cdrom-test.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/tests/qtest/cdrom-test.c b/tests/qtest/cdrom-test.c index 26a2400181..31d3bacd8c 100644 --- a/tests/qtest/cdrom-test.c +++ b/tests/qtest/cdrom-test.c @@ -130,6 +130,11 @@ static void test_cdboot(gconstpointer data) static void add_x86_tests(void) { +if (!qtest_has_accel("tcg") && !qtest_has_accel("kvm")) { +g_test_skip("No KVM or TCG accelerator available, skipping boot tests"); +return; +} + qtest_add_data_func("cdrom/boot/default", "-cdrom ", test_cdboot); qtest_add_data_func("cdrom/boot/virtio-scsi", "-device virtio-scsi -device scsi-cd,drive=cdr " @@ -176,6 +181,11 @@ static void add_x86_tests(void) static void add_s390x_tests(void) { +if (!qtest_has_accel("tcg") && !qtest_has_accel("kvm")) { +g_test_skip("No KVM or TCG accelerator available, skipping boot tests"); +return; +} + qtest_add_data_func("cdrom/boot/default", "-cdrom ", test_cdboot); qtest_add_data_func("cdrom/boot/virtio-scsi", "-device virtio-scsi -device scsi-cd,drive=cdr " Reviewed-by: Thomas Huth
Re: [PATCH 2/4] vhost-user: Interface for migration state transfer
On Mon, May 8, 2023 at 10:10 PM Stefan Hajnoczi wrote: > > On Thu, Apr 20, 2023 at 03:29:44PM +0200, Eugenio Pérez wrote: > > On Wed, 2023-04-19 at 07:21 -0400, Stefan Hajnoczi wrote: > > > On Wed, 19 Apr 2023 at 07:10, Hanna Czenczek wrote: > > > > On 18.04.23 09:54, Eugenio Perez Martin wrote: > > > > > On Mon, Apr 17, 2023 at 9:21 PM Stefan Hajnoczi > > > > > wrote: > > > > > > On Mon, 17 Apr 2023 at 15:08, Eugenio Perez Martin > > > > > > > > > > > > wrote: > > > > > > > On Mon, Apr 17, 2023 at 7:14 PM Stefan Hajnoczi > > > > > > > > > > > > > > wrote: > > > > > > > > On Thu, Apr 13, 2023 at 12:14:24PM +0200, Eugenio Perez Martin > > > > > > > > wrote: > > > > > > > > > On Wed, Apr 12, 2023 at 11:06 PM Stefan Hajnoczi < > > > > > > > > > stefa...@redhat.com> wrote: > > > > > > > > > > On Tue, Apr 11, 2023 at 05:05:13PM +0200, Hanna Czenczek > > > > > > > > > > wrote: > > > > > > > > > > > So-called "internal" virtio-fs migration refers to > > > > > > > > > > > transporting the > > > > > > > > > > > back-end's (virtiofsd's) state through qemu's migration > > > > > > > > > > > stream. To do > > > > > > > > > > > this, we need to be able to transfer virtiofsd's internal > > > > > > > > > > > state to and > > > > > > > > > > > from virtiofsd. > > > > > > > > > > > > > > > > > > > > > > Because virtiofsd's internal state will not be too large, > > > > > > > > > > > we > > > > > > > > > > > believe it > > > > > > > > > > > is best to transfer it as a single binary blob after the > > > > > > > > > > > streaming > > > > > > > > > > > phase. Because this method should be useful to other > > > > > > > > > > > vhost- > > > > > > > > > > > user > > > > > > > > > > > implementations, too, it is introduced as a > > > > > > > > > > > general-purpose > > > > > > > > > > > addition to > > > > > > > > > > > the protocol, not limited to vhost-user-fs. 
> > > > > > > > > > > > > > > > > > > > > > These are the additions to the protocol: > > > > > > > > > > > - New vhost-user protocol feature > > > > > > > > > > > VHOST_USER_PROTOCOL_F_MIGRATORY_STATE: > > > > > > > > > > >This feature signals support for transferring state, > > > > > > > > > > > and is > > > > > > > > > > > added so > > > > > > > > > > >that migration can fail early when the back-end has no > > > > > > > > > > > support. > > > > > > > > > > > > > > > > > > > > > > - SET_DEVICE_STATE_FD function: Front-end and back-end > > > > > > > > > > > negotiate a pipe > > > > > > > > > > >over which to transfer the state. The front-end sends > > > > > > > > > > > an > > > > > > > > > > > FD to the > > > > > > > > > > >back-end into/from which it can write/read its state, > > > > > > > > > > > and > > > > > > > > > > > the back-end > > > > > > > > > > >can decide to either use it, or reply with a different > > > > > > > > > > > FD > > > > > > > > > > > for the > > > > > > > > > > >front-end to override the front-end's choice. > > > > > > > > > > >The front-end creates a simple pipe to transfer the > > > > > > > > > > > state, > > > > > > > > > > > but maybe > > > > > > > > > > >the back-end already has an FD into/from which it has > > > > > > > > > > > to > > > > > > > > > > > write/read > > > > > > > > > > >its state, in which case it will want to override the > > > > > > > > > > > simple pipe. > > > > > > > > > > >Conversely, maybe in the future we find a way to have > > > > > > > > > > > the > > > > > > > > > > > front-end > > > > > > > > > > >get an immediate FD for the migration stream (in some > > > > > > > > > > > cases), in which > > > > > > > > > > >case we will want to send this to the back-end instead > > > > > > > > > > > of > > > > > > > > > > > creating a > > > > > > > > > > >pipe. 
> > > > > > > > > > >Hence the negotiation: If one side has a better idea > > > > > > > > > > > than a > > > > > > > > > > > plain > > > > > > > > > > >pipe, we will want to use that. > > > > > > > > > > > > > > > > > > > > > > - CHECK_DEVICE_STATE: After the state has been transferred > > > > > > > > > > > through the > > > > > > > > > > >pipe (the end indicated by EOF), the front-end invokes > > > > > > > > > > > this > > > > > > > > > > > function > > > > > > > > > > >to verify success. There is no in-band way (through > > > > > > > > > > > the > > > > > > > > > > > pipe) to > > > > > > > > > > >indicate failure, so we need to check explicitly. > > > > > > > > > > > > > > > > > > > > > > Once the transfer pipe has been established via > > > > > > > > > > > SET_DEVICE_STATE_FD > > > > > > > > > > > (which includes establishing the direction of transfer and > > > > > > > > > > > migration > > > > > > > > > > > phase), the sending side writes its data into the pipe, > > > > > > > > > > > and > > > > > > > > > > > the reading > > > > > > > > > > > side reads it until it sees an EOF. Then, the front-end > > > > > > > > > > > will > > > > > > > > > > > check for > > > > > > > > > >
[PATCH] Fix QEMU crash caused when NUMA nodes exceed available CPUs
The command "qemu-system-riscv64 -machine virt -m 2G -smp 1 -numa node,mem=1G -numa node,mem=1G" triggers a QEMU crash, because it declares more NUMA nodes than there are available CPUs. This commit fixes the issue by adding a parameter check. Signed-off-by: Yin Wang --- hw/core/numa.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/hw/core/numa.c b/hw/core/numa.c index d8d36b16d8..ff249369be 100644 --- a/hw/core/numa.c +++ b/hw/core/numa.c @@ -168,6 +168,13 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, numa_info[nodenr].present = true; max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1); ms->numa_state->num_nodes++; +if (ms->smp.max_cpus < ms->numa_state->num_nodes) { +error_setg(errp, + "Number of NUMA nodes:(%d)" + " is larger than number of CPUs:(%d)", + ms->numa_state->num_nodes, ms->smp.max_cpus); +return; +} } static -- 2.34.1
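The check added by the patch can be modeled as follows (a simplified sketch of hw/core/numa.c's parse_numa_node() logic, not the actual QEMU code):

```python
def parse_numa_nodes(max_cpus, node_count):
    """Model of the per-node check: each '-numa node' option bumps
    num_nodes, and the count must never exceed the maximum CPUs."""
    num_nodes = 0
    for _ in range(node_count):
        num_nodes += 1
        if max_cpus < num_nodes:
            # mirrors the error_setg() message in the patch
            raise ValueError(
                f"Number of NUMA nodes:({num_nodes}) is larger than "
                f"number of CPUs:({max_cpus})")
    return num_nodes

# -smp 1 with two "-numa node" options now fails cleanly instead of crashing:
try:
    parse_numa_nodes(max_cpus=1, node_count=2)
except ValueError as e:
    print(e)  # Number of NUMA nodes:(2) is larger than number of CPUs:(1)
```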
Re: [PULL 00/13] Compression code patches
On 5/8/23 19:51, Juan Quintela wrote: The following changes since commit 792f77f376adef944f9a03e601f6ad90c2f891b2: Merge tag 'pull-loongarch-20230506' of https://gitlab.com/gaosong/qemu into staging (2023-05-06 08:11:52 +0100) are available in the Git repository at: https://gitlab.com/juan.quintela/qemu.git tags/compression-code-pull-request for you to fetch changes up to c323518a7aab1c01740a468671b7f2b517d3bca6: migration: Initialize and cleanup decompression in migration.c (2023-05-08 15:25:27 +0200) Migration PULL request (20230508 edition, take 2) Hi This is just the compression bits of the Migration PULL request for 20230428. The only change is that we don't run the compression tests by default. The problem already exists with the compression code; the test just shows that it doesn't work. - Add migration tests for (old) compress migration code (lukas) - Make compression code independent of ram.c (lukas) - Move compression code into ram-compress.c (lukas) Please apply, Juan. Applied, thanks. Please update https://wiki.qemu.org/ChangeLog/8.1 as appropriate. r~
Re: [PATCH 2/4] vhost-user: Interface for migration state transfer
On Mon, May 8, 2023 at 9:12 PM Stefan Hajnoczi wrote: > > On Thu, Apr 20, 2023 at 03:27:51PM +0200, Eugenio Pérez wrote: > > On Tue, 2023-04-18 at 16:40 -0400, Stefan Hajnoczi wrote: > > > On Tue, 18 Apr 2023 at 14:31, Eugenio Perez Martin > > > wrote: > > > > On Tue, Apr 18, 2023 at 7:59 PM Stefan Hajnoczi > > > > wrote: > > > > > On Tue, Apr 18, 2023 at 10:09:30AM +0200, Eugenio Perez Martin wrote: > > > > > > On Mon, Apr 17, 2023 at 9:33 PM Stefan Hajnoczi > > > > > > wrote: > > > > > > > On Mon, 17 Apr 2023 at 15:10, Eugenio Perez Martin < > > > > > > > epere...@redhat.com> wrote: > > > > > > > > On Mon, Apr 17, 2023 at 5:38 PM Stefan Hajnoczi > > > > > > > > > > > > > > > > wrote: > > > > > > > > > On Thu, Apr 13, 2023 at 12:14:24PM +0200, Eugenio Perez Martin > > > > > > > > > wrote: > > > > > > > > > > On Wed, Apr 12, 2023 at 11:06 PM Stefan Hajnoczi < > > > > > > > > > > stefa...@redhat.com> wrote: > > > > > > > > > > > On Tue, Apr 11, 2023 at 05:05:13PM +0200, Hanna Czenczek > > > > > > > > > > > wrote: > > > > > > > > > > > > So-called "internal" virtio-fs migration refers to > > > > > > > > > > > > transporting the > > > > > > > > > > > > back-end's (virtiofsd's) state through qemu's migration > > > > > > > > > > > > stream. To do > > > > > > > > > > > > this, we need to be able to transfer virtiofsd's > > > > > > > > > > > > internal > > > > > > > > > > > > state to and > > > > > > > > > > > > from virtiofsd. > > > > > > > > > > > > > > > > > > > > > > > > Because virtiofsd's internal state will not be too > > > > > > > > > > > > large, we > > > > > > > > > > > > believe it > > > > > > > > > > > > is best to transfer it as a single binary blob after the > > > > > > > > > > > > streaming > > > > > > > > > > > > phase. 
Because this method should be useful to other > > > > > > > > > > > > vhost- > > > > > > > > > > > > user > > > > > > > > > > > > implementations, too, it is introduced as a > > > > > > > > > > > > general-purpose > > > > > > > > > > > > addition to > > > > > > > > > > > > the protocol, not limited to vhost-user-fs. > > > > > > > > > > > > > > > > > > > > > > > > These are the additions to the protocol: > > > > > > > > > > > > - New vhost-user protocol feature > > > > > > > > > > > > VHOST_USER_PROTOCOL_F_MIGRATORY_STATE: > > > > > > > > > > > > This feature signals support for transferring state, > > > > > > > > > > > > and > > > > > > > > > > > > is added so > > > > > > > > > > > > that migration can fail early when the back-end has no > > > > > > > > > > > > support. > > > > > > > > > > > > > > > > > > > > > > > > - SET_DEVICE_STATE_FD function: Front-end and back-end > > > > > > > > > > > > negotiate a pipe > > > > > > > > > > > > over which to transfer the state. The front-end > > > > > > > > > > > > sends an > > > > > > > > > > > > FD to the > > > > > > > > > > > > back-end into/from which it can write/read its state, > > > > > > > > > > > > and > > > > > > > > > > > > the back-end > > > > > > > > > > > > can decide to either use it, or reply with a > > > > > > > > > > > > different FD > > > > > > > > > > > > for the > > > > > > > > > > > > front-end to override the front-end's choice. > > > > > > > > > > > > The front-end creates a simple pipe to transfer the > > > > > > > > > > > > state, > > > > > > > > > > > > but maybe > > > > > > > > > > > > the back-end already has an FD into/from which it has > > > > > > > > > > > > to > > > > > > > > > > > > write/read > > > > > > > > > > > > its state, in which case it will want to override the > > > > > > > > > > > > simple pipe. 
> > > > > > > > > > > > Conversely, maybe in the future we find a way to have > > > > > > > > > > > > the > > > > > > > > > > > > front-end > > > > > > > > > > > > get an immediate FD for the migration stream (in some > > > > > > > > > > > > cases), in which > > > > > > > > > > > > case we will want to send this to the back-end > > > > > > > > > > > > instead of > > > > > > > > > > > > creating a > > > > > > > > > > > > pipe. > > > > > > > > > > > > Hence the negotiation: If one side has a better idea > > > > > > > > > > > > than > > > > > > > > > > > > a plain > > > > > > > > > > > > pipe, we will want to use that. > > > > > > > > > > > > > > > > > > > > > > > > - CHECK_DEVICE_STATE: After the state has been > > > > > > > > > > > > transferred > > > > > > > > > > > > through the > > > > > > > > > > > > pipe (the end indicated by EOF), the front-end invokes > > > > > > > > > > > > this function > > > > > > > > > > > > to verify success. There is no in-band way (through > > > > > > > > > > > > the > > > > > > > > > > > > pipe) to > > > > > > > > > > > > indicate failure, so we need to check explicitly. > > > > > > > > > > > > > > > > > > > > > > > > Once the transfer pipe has been established via > > > > > > > > > > > > SET_DEVICE_STATE_FD > > > > > > > > > > > > (which includes establishing the direction of transfer > > > > > > > > > > > > and
Re: [PATCH] virtio-net: not enable vq reset feature unconditionally
On Tue, May 9, 2023 at 1:32 AM Eugenio Perez Martin wrote: > > On Mon, May 8, 2023 at 12:22 PM Michael S. Tsirkin wrote: > > > > On Mon, May 08, 2023 at 11:09:46AM +0200, Eugenio Perez Martin wrote: > > > On Sat, May 6, 2023 at 4:25 AM Xuan Zhuo > > > wrote: > > > > > > > > On Thu, 4 May 2023 12:14:47 +0200, =?utf-8?q?Eugenio_P=C3=A9rez?= > > > > wrote: > > > > > The commit 93a97dc5200a ("virtio-net: enable vq reset feature") > > > > > unconditionally > > > > > enables the vq reset feature as long as the device is emulated. > > > > > This makes it impossible to actually disable the feature, and it causes > > > > > migration problems from QEMU versions prior to 7.2. > > > > > > > > > > The entire final commit is unneeded as the device system already enables or > > > > > disables the feature properly. > > > > > > > > > > This reverts commit 93a97dc5200a95e63b99cb625f20b7ae802ba413. > > > > > Fixes: 93a97dc5200a ("virtio-net: enable vq reset feature") > > > > > Signed-off-by: Eugenio Pérez > > > > > > > > > > --- > > > > > Tested by checking feature bit at > > > > > /sys/devices/pci.../virtio0/features > > > > > enabling and disabling queue_reset virtio-net feature and vhost=on/off > > > > > on net device backend. > > > > > > > > Do you mean that this feature cannot be closed? > > > > > > > > I tried to close in the guest, it was successful. > > > > > > > > > > I'm not sure what you mean with close. If the device dataplane is > > > emulated in qemu (vhost=off), I'm not able to make the device not > > > offer it. > > > > > > > In addition, in this case, could you try to repair the problem instead > > > > of > > > > directly reverting it. > > > > > > > > > > I'm not following this. The revert is not to always disable the feature. > > > > > > By default, the feature is enabled. If cmdline states queue_reset=on, > > > the feature is enabled. That is true both before and after applying > > > this patch.
> > > > > > However, in qemu master, queue_reset=off keeps enabling this feature > > > on the device. It happens that there is a commit explicitly doing > > > that, so I'm reverting it. > > > > > > Let me know if that makes sense to you. > > > > > > Thanks! > > > > > > question is this: > > > > DEFINE_PROP_BIT64("queue_reset", _state, _field, \ > > VIRTIO_F_RING_RESET, true) > > > > > > > > don't we need compat for 7.2 and back for this property? > > > > I think that part is already covered by commit 69e1c14aa222 ("virtio: > core: vq reset feature negotation support"). In that regard, maybe we > can simplify the patch message simply stating that queue_reset=off > does not work. > > Thanks! Ack Thanks >
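The test procedure Eugenio describes, checking the feature bit under /sys/devices/.../virtio0/features in the guest, can be scripted along these lines. Assumptions: the sysfs "features" file is a string of '0'/'1' characters indexed by feature bit, VIRTIO_F_RING_RESET is bit 40 as in virtio 1.2, and the exact device path varies per machine:

```python
VIRTIO_F_RING_RESET = 40  # virtio 1.2 ring reset feature bit

def has_ring_reset(features_bitstring: str) -> bool:
    """Check whether a sysfs-style feature bitstring offers ring reset."""
    bits = features_bitstring.strip()
    return len(bits) > VIRTIO_F_RING_RESET and bits[VIRTIO_F_RING_RESET] == "1"

# In a guest you would read the string from e.g.
# /sys/devices/pci0000:00/.../virtio0/features (path varies per device).
sample = "0" * 40 + "1" + "0" * 23
print(has_ring_reset(sample))    # True:  bit 40 is set
print(has_ring_reset("1" * 40))  # False: the string ends before bit 40
```

With queue_reset=off on the QEMU command line, the expectation after this revert is that the guest-side check returns False.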
Re: [PATCH v3 0/4] hw/arm/virt: Support dirty ring
Hi Paolo, On 5/9/23 12:21 PM, Gavin Shan wrote: This series intends to support dirty ring for live migration for arm64. The dirty ring uses discrete buffers to track dirty pages. For arm64, the specialty is to use a backup bitmap to track dirty pages when there is no running-vcpu context. It's known that the backup bitmap needs to be synchronized when the KVM device "kvm-arm-gicv3" or "arm-its-kvm" has been enabled. The backup bitmap is collected in the last stage of migration. The policy here is to always enable the backup bitmap extension. This introduces overhead to synchronize the backup bitmap in the last stage of migration even when those two devices aren't used. However, the overhead should be very small and acceptable. The benefit is to support future cases where those two devices are used, without modifying the code. PATCH[1] add migration last stage indicator PATCH[2] synchronize the backup bitmap in the last stage of migration PATCH[3] add helper kvm_dirty_ring_init() to enable dirty ring PATCH[4] enable dirty ring for arm64 v2: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg01342.html v1: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg00434.html RFCv1: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg00171.html Testing === (1) kvm-unit-tests/its-pending-migration and kvm-unit-tests/its-migration with dirty ring or normal dirty page tracking mechanism. All test cases passed. QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \ ./its-pending-migration QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \ ./its-migration QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \ ./its-pending-migration QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \ ./its-migration (2) Combinations of migration, post-copy migration, e1000e and virtio-net devices. All test cases passed.
-netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \ -device e1000e,bus=pcie.5,netdev=net0,mac=52:54:00:f1:26:a0 -netdev tap,id=vnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \ -device virtio-net-pci,bus=pcie.6,netdev=vnet0,mac=52:54:00:f1:26:b0 Changelog = v3: * Rebase for QEMU v8.1.0 (Gavin) v2: * Drop PATCH[v1 1/6] to synchronize linux-headers (Gavin) * More restrictive comments about struct MemoryListener::log_sync_global (PeterX) * Always enable the backup bitmap extension (PeterM) v1: * Combine two patches into one PATCH[v1 2/6] for the last stage indicator (PeterX) * Drop the secondary bitmap and use the original one directly (Juan) * Avoid "goto out" in helper kvm_dirty_ring_init() (Juan) Gavin Shan (4): migration: Add last stage indicator to global dirty log kvm: Synchronize the backup bitmap in the last stage kvm: Add helper kvm_dirty_ring_init() kvm: Enable dirty ring for arm64 accel/kvm/kvm-all.c | 108 --- include/exec/memory.h| 7 ++- include/sysemu/kvm_int.h | 1 + migration/dirtyrate.c| 4 +- migration/ram.c | 20 softmmu/memory.c | 10 ++-- 6 files changed, 101 insertions(+), 49 deletions(-) Could you please help to take a look and queue this series for QEMU v8.1 if it looks good? Peter Maydell suggested [1] merging the v2 series into QEMU v8.1; there is no difference between v2 and v3 except the fixed rebase conflicts in v3. [1] https://lists.nongnu.org/archive/html/qemu-arm/2023-03/msg00551.html Thanks, Gavin
[PATCH v3 4/4] kvm: Enable dirty ring for arm64
arm64 uses a different capability from x86 to enable the dirty ring: KVM_CAP_DIRTY_LOG_RING_ACQ_REL. Besides, arm64 also needs the backup bitmap extension (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) when the 'kvm-arm-gicv3' or 'arm-its-kvm' device is enabled. Here the extension is always enabled, which introduces unnecessary overhead for the last stage of dirty log synchronization when those two devices aren't used, but the overhead should be very small and acceptable. The benefit is to cover future cases where those two devices are used, without modifying the code. Signed-off-by: Gavin Shan Reviewed-by: Juan Quintela Tested-by: Zhenyu Zhang --- accel/kvm/kvm-all.c | 23 +-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 5d0de9d0a8..7679f397ae 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1466,6 +1466,7 @@ static int kvm_dirty_ring_init(KVMState *s) { uint32_t ring_size = s->kvm_dirty_ring_size; uint64_t ring_bytes = ring_size * sizeof(struct kvm_dirty_gfn); +unsigned int capability = KVM_CAP_DIRTY_LOG_RING; int ret; s->kvm_dirty_ring_size = 0; @@ -1480,7 +1481,12 @@ static int kvm_dirty_ring_init(KVMState *s) * Read the max supported pages. Fall back to dirty logging mode * if the dirty ring isn't supported. */ -ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING); +ret = kvm_vm_check_extension(s, capability); +if (ret <= 0) { +capability = KVM_CAP_DIRTY_LOG_RING_ACQ_REL; +ret = kvm_vm_check_extension(s, capability); +} + if (ret <= 0) { warn_report("KVM dirty ring not available, using bitmap method"); return 0; @@ -1493,13 +1499,26 @@ static int kvm_dirty_ring_init(KVMState *s) return -EINVAL; } -ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes); +ret = kvm_vm_enable_cap(s, capability, 0, ring_bytes); if (ret) { error_report("Enabling of KVM dirty ring failed: %s.
" "Suggested minimum value is 1024.", strerror(-ret)); return -EIO; } +/* Enable the backup bitmap if it is supported */ +ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP); +if (ret > 0) { +ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP, 0); +if (ret) { +error_report("Enabling of KVM dirty ring's backup bitmap failed: " + "%s. ", strerror(-ret)); +return -EIO; +} + +s->kvm_dirty_ring_with_bitmap = true; +} + s->kvm_dirty_ring_size = ring_size; s->kvm_dirty_ring_bytes = ring_bytes; -- 2.23.0
[PATCH v3 2/4] kvm: Synchronize the backup bitmap in the last stage
In the last stage of live migration or memory slot removal, the backup bitmap needs to be synchronized when it has been enabled. Signed-off-by: Gavin Shan Reviewed-by: Peter Xu Tested-by: Zhenyu Zhang --- accel/kvm/kvm-all.c | 11 +++ include/sysemu/kvm_int.h | 1 + 2 files changed, 12 insertions(+) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 870abad826..c3aaabf304 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1361,6 +1361,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, */ if (kvm_state->kvm_dirty_ring_size) { kvm_dirty_ring_reap_locked(kvm_state, NULL); +if (kvm_state->kvm_dirty_ring_with_bitmap) { +kvm_slot_sync_dirty_pages(mem); +kvm_slot_get_dirty_log(kvm_state, mem); +} } else { kvm_slot_get_dirty_log(kvm_state, mem); } @@ -1582,6 +1586,12 @@ static void kvm_log_sync_global(MemoryListener *l, bool last_stage) mem = &kml->slots[i]; if (mem->memory_size && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { kvm_slot_sync_dirty_pages(mem); + +if (s->kvm_dirty_ring_with_bitmap && last_stage && +kvm_slot_get_dirty_log(s, mem)) { +kvm_slot_sync_dirty_pages(mem); +} + /* * This is not needed by KVM_GET_DIRTY_LOG because the * ioctl will unconditionally overwrite the whole region. 
@@ -3710,6 +3720,7 @@ static void kvm_accel_instance_init(Object *obj) s->kernel_irqchip_split = ON_OFF_AUTO_AUTO; /* KVM dirty ring is by default off */ s->kvm_dirty_ring_size = 0; +s->kvm_dirty_ring_with_bitmap = false; s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN; s->notify_window = 0; s->xen_version = 0; diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h index a641c974ea..511b42bde5 100644 --- a/include/sysemu/kvm_int.h +++ b/include/sysemu/kvm_int.h @@ -115,6 +115,7 @@ struct KVMState } *as; uint64_t kvm_dirty_ring_bytes; /* Size of the per-vcpu dirty ring */ uint32_t kvm_dirty_ring_size; /* Number of dirty GFNs per ring */ +bool kvm_dirty_ring_with_bitmap; struct KVMDirtyRingReaper reaper; NotifyVmexitOption notify_vmexit; uint32_t notify_window; -- 2.23.0
[PATCH v3 1/4] migration: Add last stage indicator to global dirty log
The global dirty log synchronization is used when KVM and dirty ring are enabled. There is a particularity for ARM64 where the backup bitmap is used to track dirty pages in non-running-vcpu situations. It means the dirty ring works with the combination of ring buffer and backup bitmap. The dirty bits in the backup bitmap need to be collected in the last stage of live migration. In order to identify the last stage of live migration and pass it down, an extra parameter is added to the relevant functions and callbacks. This last stage indicator isn't used until the dirty ring is enabled in the subsequent patches. No functional change intended. Signed-off-by: Gavin Shan Reviewed-by: Peter Xu Tested-by: Zhenyu Zhang --- accel/kvm/kvm-all.c | 2 +- include/exec/memory.h | 7 +-- migration/dirtyrate.c | 4 ++-- migration/ram.c | 20 ++-- softmmu/memory.c | 10 +- 5 files changed, 23 insertions(+), 20 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index cf3a88d90e..870abad826 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1563,7 +1563,7 @@ static void kvm_log_sync(MemoryListener *listener, kvm_slots_unlock(); } -static void kvm_log_sync_global(MemoryListener *l) +static void kvm_log_sync_global(MemoryListener *l, bool last_stage) { KVMMemoryListener *kml = container_of(l, KVMMemoryListener, listener); KVMState *s = kvm_state; diff --git a/include/exec/memory.h b/include/exec/memory.h index e45ce6061f..df01b0ef8a 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -934,8 +934,11 @@ struct MemoryListener { * its @log_sync must be NULL. Vice versa. * * @listener: The #MemoryListener. + * @last_stage: The last stage to synchronize the log during migration. + * The caller should guarantee that the synchronization with true for + * @last_stage is triggered only once after all VCPUs have been stopped. 
*/ -void (*log_sync_global)(MemoryListener *listener); +void (*log_sync_global)(MemoryListener *listener, bool last_stage); /** * @log_clear: @@ -2423,7 +2426,7 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr, * * Synchronizes the dirty page log for all address spaces. */ -void memory_global_dirty_log_sync(void); +void memory_global_dirty_log_sync(bool last_stage); /** * memory_global_dirty_log_sync: synchronize the dirty log for all memory diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c index 180ba38c7a..486085a9cf 100644 --- a/migration/dirtyrate.c +++ b/migration/dirtyrate.c @@ -102,7 +102,7 @@ void global_dirty_log_change(unsigned int flag, bool start) static void global_dirty_log_sync(unsigned int flag, bool one_shot) { qemu_mutex_lock_iothread(); -memory_global_dirty_log_sync(); +memory_global_dirty_log_sync(false); if (one_shot) { memory_global_dirty_log_stop(flag); } @@ -554,7 +554,7 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config) * skip it unconditionally and start dirty tracking * from 2'round of log sync */ -memory_global_dirty_log_sync(); +memory_global_dirty_log_sync(false); /* * reset page protect manually and unconditionally. 
diff --git a/migration/ram.c b/migration/ram.c index 5e7bf20ca5..6154e4f18b 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1204,7 +1204,7 @@ static void migration_trigger_throttle(RAMState *rs) } } -static void migration_bitmap_sync(RAMState *rs) +static void migration_bitmap_sync(RAMState *rs, bool last_stage) { RAMBlock *block; int64_t end_time; @@ -1216,7 +1216,7 @@ static void migration_bitmap_sync(RAMState *rs) } trace_migration_bitmap_sync_start(); -memory_global_dirty_log_sync(); +memory_global_dirty_log_sync(last_stage); qemu_mutex_lock(&rs->bitmap_mutex); WITH_RCU_READ_LOCK_GUARD() { @@ -1251,7 +1251,7 @@ static void migration_bitmap_sync(RAMState *rs) } } -static void migration_bitmap_sync_precopy(RAMState *rs) +static void migration_bitmap_sync_precopy(RAMState *rs, bool last_stage) { Error *local_err = NULL; @@ -1264,7 +1264,7 @@ static void migration_bitmap_sync_precopy(RAMState *rs) local_err = NULL; } -migration_bitmap_sync(rs); +migration_bitmap_sync(rs, last_stage); if (precopy_notify(PRECOPY_NOTIFY_AFTER_BITMAP_SYNC, &local_err)) { error_report_err(local_err); @@ -2924,7 +2924,7 @@ void ram_postcopy_send_discard_bitmap(MigrationState *ms) RCU_READ_LOCK_GUARD(); /* This should be our last sync, the src is now paused */ -migration_bitmap_sync(rs); +migration_bitmap_sync(rs, false); /* Easiest way to make sure we don't resume in the middle of a host-page */ rs->pss[RAM_CHANNEL_PRECOPY].last_sent_block = NULL; @@ -3115,7 +3115,7 @@ static void ram_init_bitmaps(RAMState *rs) /* We don't use dirty log with backgrou
[PATCH v3 0/4] hw/arm/virt: Support dirty ring
This series intends to support the dirty ring for live migration on arm64. The dirty ring uses a discrete buffer to track dirty pages. For arm64, the specialty is to use a backup bitmap to track dirty pages when there is no running-vcpu context. It's known that the backup bitmap needs to be synchronized when the KVM device "kvm-arm-gicv3" or "arm-its-kvm" has been enabled. The backup bitmap is collected in the last stage of migration. The policy here is to always enable the backup bitmap extension. This introduces the overhead of synchronizing the backup bitmap in the last stage of migration even when those two devices aren't used. However, the overhead should be very small and acceptable. The benefit is to support future cases where those two devices are used without modifying the code. PATCH[1] add migration last stage indicator PATCH[2] synchronize the backup bitmap in the last stage of migration PATCH[3] add helper kvm_dirty_ring_init() to enable dirty ring PATCH[4] enable dirty ring for arm64 v2: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg01342.html v1: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg00434.html RFCv1: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg00171.html Testing === (1) kvm-unit-tests/its-pending-migration and kvm-unit-tests/its-migration with dirty ring or normal dirty page tracking mechanism. All test cases passed. QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \ ./its-pending-migration QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \ ./its-migration QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \ ./its-pending-migration QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \ ./its-migration (2) Combinations of migration, post-copy migration, e1000e and virtio-net devices. All test cases passed. 
-netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \ -device e1000e,bus=pcie.5,netdev=net0,mac=52:54:00:f1:26:a0 -netdev tap,id=vnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \ -device virtio-net-pci,bus=pcie.6,netdev=vnet0,mac=52:54:00:f1:26:b0 Changelog = v3: * Rebase for QEMU v8.1.0 (Gavin) v2: * Drop PATCH[v1 1/6] to synchronize linux-headers (Gavin) * More restrictive comments about struct MemoryListener::log_sync_global (PeterX) * Always enable the backup bitmap extension (PeterM) v1: * Combine two patches into one PATCH[v1 2/6] for the last stage indicator (PeterX) * Drop the secondary bitmap and use the original one directly (Juan) * Avoid "goto out" in helper kvm_dirty_ring_init() (Juan) Gavin Shan (4): migration: Add last stage indicator to global dirty log kvm: Synchronize the backup bitmap in the last stage kvm: Add helper kvm_dirty_ring_init() kvm: Enable dirty ring for arm64 accel/kvm/kvm-all.c | 108 --- include/exec/memory.h| 7 ++- include/sysemu/kvm_int.h | 1 + migration/dirtyrate.c| 4 +- migration/ram.c | 20 softmmu/memory.c | 10 ++-- 6 files changed, 101 insertions(+), 49 deletions(-) -- 2.23.0
[PATCH v3 3/4] kvm: Add helper kvm_dirty_ring_init()
There are multiple capabilities associated with the dirty ring for different architectures: KVM_CAP_DIRTY_LOG_RING for x86 and KVM_CAP_DIRTY_LOG_RING_ACQ_REL for arm64, and there will be more to be done in order to support the dirty ring for arm64. Let's add a helper, kvm_dirty_ring_init(), to enable the dirty ring. With this, the code looks a bit cleaner. No functional change intended. Signed-off-by: Gavin Shan Reviewed-by: Peter Xu Tested-by: Zhenyu Zhang --- accel/kvm/kvm-all.c | 76 - 1 file changed, 47 insertions(+), 29 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index c3aaabf304..5d0de9d0a8 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1462,6 +1462,50 @@ static int kvm_dirty_ring_reaper_init(KVMState *s) return 0; } +static int kvm_dirty_ring_init(KVMState *s) +{ +uint32_t ring_size = s->kvm_dirty_ring_size; +uint64_t ring_bytes = ring_size * sizeof(struct kvm_dirty_gfn); +int ret; + +s->kvm_dirty_ring_size = 0; +s->kvm_dirty_ring_bytes = 0; + +/* Bail if the dirty ring size isn't specified */ +if (!ring_size) { +return 0; +} + +/* + * Read the max supported pages. Fall back to dirty logging mode + * if the dirty ring isn't supported. + */ +ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING); +if (ret <= 0) { +warn_report("KVM dirty ring not available, using bitmap method"); +return 0; +} + +if (ring_bytes > ret) { +error_report("KVM dirty ring size %" PRIu32 " too big " + "(maximum is %ld). Please use a smaller value.", + ring_size, (long)ret / sizeof(struct kvm_dirty_gfn)); +return -EINVAL; +} + +ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes); +if (ret) { +error_report("Enabling of KVM dirty ring failed: %s. 
" + "Suggested minimum value is 1024.", strerror(-ret)); +return -EIO; +} + +s->kvm_dirty_ring_size = ring_size; +s->kvm_dirty_ring_bytes = ring_bytes; + +return 0; +} + static void kvm_region_add(MemoryListener *listener, MemoryRegionSection *section) { @@ -2531,35 +2575,9 @@ static int kvm_init(MachineState *ms) * Enable KVM dirty ring if supported, otherwise fall back to * dirty logging mode */ -if (s->kvm_dirty_ring_size > 0) { -uint64_t ring_bytes; - -ring_bytes = s->kvm_dirty_ring_size * sizeof(struct kvm_dirty_gfn); - -/* Read the max supported pages */ -ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING); -if (ret > 0) { -if (ring_bytes > ret) { -error_report("KVM dirty ring size %" PRIu32 " too big " - "(maximum is %ld). Please use a smaller value.", - s->kvm_dirty_ring_size, - (long)ret / sizeof(struct kvm_dirty_gfn)); -ret = -EINVAL; -goto err; -} - -ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes); -if (ret) { -error_report("Enabling of KVM dirty ring failed: %s. " - "Suggested minimum value is 1024.", strerror(-ret)); -goto err; -} - -s->kvm_dirty_ring_bytes = ring_bytes; - } else { - warn_report("KVM dirty ring not available, using bitmap method"); - s->kvm_dirty_ring_size = 0; -} +ret = kvm_dirty_ring_init(s); +if (ret < 0) { +goto err; } /* -- 2.23.0
Re: [PATCH v10 1/8] memory: prevent dma-reentracy issues
On 2023/5/8 9:12 PM, Thomas Huth wrote: On 08/05/2023 15.03, Song Gao wrote: Hi, Thomas On 2023/5/8 5:33 PM, Thomas Huth wrote: On 06/05/2023 11.25, Song Gao wrote: Hi Alexander On 2023/4/28 5:14 PM, Thomas Huth wrote: On 28/04/2023 11.11, Alexander Bulekov wrote: On 230428 1015, Thomas Huth wrote: On 28/04/2023 10.12, Daniel P. Berrangé wrote: On Thu, Apr 27, 2023 at 05:10:06PM -0400, Alexander Bulekov wrote: Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA. ... This patch causes the loongarch virtual machine to fail to start the slave cpu. ./build/qemu-system-loongarch64 -machine virt -m 8G -cpu la464 \ -smp 4 -bios QEMU_EFI.fd -kernel vmlinuz.efi -initrd ramdisk \ -serial stdio -monitor telnet:localhost:4495,server,nowait \ -append "root=/dev/ram rdinit=/sbin/init console=ttyS0,115200" --nographic qemu-system-loongarch64: warning: Blocked re-entrant IO on MemoryRegion: loongarch_ipi_iocsr at addr: 0x24 Oh, another spot that needs special handling ... I see Alexander already sent a patch (thanks!), but anyway, this is a good indication that we're missing some test coverage in the CI. Are there any loongarch kernel images available for public download somewhere? If so, we really should add an avocado regression test for this - since as far as I can see, we don't have any tests for loongarch in tests/avocado yet? we can get some binaries at: https://github.com/yangxiaojuan-loongson/qemu-binary > I'm not sure that avocado testing can be done using just the kernel. Is a full loongarch system required? No, you don't need a full distro installation, just a kernel with ramdisk (which is also available there) is good enough for a basic test, e.g. just checking whether the kernel boots to a certain point is good enough to provide a basic sanity test. 
If you can then even get into a shell (of the ramdisk), you can check some additional stuff in the sysfs or "dmesg" output; see for example tests/avocado/machine_s390_ccw_virtio.py, which does such checks with a kernel and initrd on s390x. Thanks for your suggestion. We will add a basic loongarch test in tests/avocado. Thanks. Song Gao
Re: ssl fips self check fails with 7.2.0 on x86 TCG
Verified it was https://gitlab.com/qemu-project/qemu/-/issues/1471 On Thu, May 4, 2023 at 12:03 PM Patrick Venture wrote: > Hi, > > I just finished rebasing my team onto 7.2.0 and now I'm seeing > https://boringssl.googlesource.com/boringssl/+/master/crypto/fipsmodule/self_check/self_check.c#361 > fail. > > I applied > https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg00260.html and > it's still failing. > > Is anyone else seeing this issue or have suggestions on how to debug it? > > I haven't yet tried with 8.0.0 but that's my next step, although it also > needs the float32_exp3 patch. > > Patrick >
Re: RE: [PATCH] virtio-crypto: fix NULL pointer dereference in virtio_crypto_free_request
On 5/9/23 09:02, Gonglei (Arei) wrote: -Original Message- From: Mauro Matteo Cascella [mailto:mcasc...@redhat.com] Sent: Monday, May 8, 2023 11:02 PM To: qemu-devel@nongnu.org Cc: m...@redhat.com; Gonglei (Arei) ; pizhen...@bytedance.com; ta...@zju.edu.cn; mcasc...@redhat.com Subject: [PATCH] virtio-crypto: fix NULL pointer dereference in virtio_crypto_free_request Ensure op_info is not NULL in case of QCRYPTODEV_BACKEND_ALG_SYM algtype. Fixes: 02ed3e7c ("virtio-crypto: zeroize the key material before free") I have to say the Fixes tag is incorrect. The bug was introduced by commit 0e660a6f90a, which changed the semantic meaning of request->flag. Regards, -Gonglei Hi Mauro Agree with Lei, could you please change the Fixes tag as Lei suggested? -- zhenwei pi
RE: [PATCH] virtio-crypto: fix NULL pointer dereference in virtio_crypto_free_request
> -Original Message- > From: Mauro Matteo Cascella [mailto:mcasc...@redhat.com] > Sent: Monday, May 8, 2023 11:02 PM > To: qemu-devel@nongnu.org > Cc: m...@redhat.com; Gonglei (Arei) ; > pizhen...@bytedance.com; ta...@zju.edu.cn; mcasc...@redhat.com > Subject: [PATCH] virtio-crypto: fix NULL pointer dereference in > virtio_crypto_free_request > > Ensure op_info is not NULL in case of QCRYPTODEV_BACKEND_ALG_SYM > algtype. > > Fixes: 02ed3e7c ("virtio-crypto: zeroize the key material before free") I have to say the Fixes tag is incorrect. The bug was introduced by commit 0e660a6f90a, which changed the semantic meaning of request->flag. Regards, -Gonglei
Re: [PATCH v5 0/3] NUMA: Apply cluster-NUMA-node boundary for aarch64 and riscv machines
Hi Paolo, On 5/9/23 10:27 AM, Gavin Shan wrote: For the arm64 and riscv architectures, the driver (/base/arch_topology.c) is used to populate the CPU topology in the Linux guest. It's required that the CPUs in one cluster can't span multiple NUMA nodes. Otherwise, the Linux scheduling domain can't be sorted out, as the following warning message indicates. To avoid the unexpected confusion, this series attempts to warn about such irregular configurations. -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \ -numa node,nodeid=0,cpus=0-1,memdev=ram0\ -numa node,nodeid=1,cpus=2-3,memdev=ram1\ -numa node,nodeid=2,cpus=4-5,memdev=ram2\ [ cut here ] WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 build_sched_domains+0x284/0x910 Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1 pstate: 0045 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : build_sched_domains+0x284/0x910 lr : build_sched_domains+0x184/0x910 sp : 8804bd50 x29: 8804bd50 x28: 0002 x27: x26: 89cf9a80 x25: x24: 89cbf840 x23: 80325000 x22: 005df800 x21: 8a4ce508 x20: x19: 80324440 x18: 0014 x17: 388925c0 x16: 5386a066 x15: 9c10cc2e x14: 01c0 x13: 0001 x12: 7fffb1a0 x11: 7fffb180 x10: 8a4ce508 x9 : 0041 x8 : 8a4ce500 x7 : 8a4cf920 x6 : 0001 x5 : 0001 x4 : 0007 x3 : 0002 x2 : 1000 x1 : 8a4cf928 x0 : 0001 Call trace: build_sched_domains+0x284/0x910 sched_init_domains+0xac/0xe0 sched_init_smp+0x48/0xc8 kernel_init_freeable+0x140/0x1ac kernel_init+0x28/0x140 ret_from_fork+0x10/0x20 PATCH[1] Warn about the irregular configuration if required PATCH[2] Enable the validation for aarch64 machines PATCH[3] Enable the validation for riscv machines v4: https://lists.nongnu.org/archive/html/qemu-arm/2023-04/msg00232.html v3: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg01226.html v2: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg01080.html v1: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg00886.html Changelog = v5: * Rebase for QEMU
v8.1.0 (Gavin) * Pick ack-b's from Igor (Gavin) [...] Gavin Shan (3): numa: Validate cluster and NUMA node boundary if required hw/arm: Validate cluster and NUMA node boundary hw/riscv: Validate cluster and NUMA node boundary hw/arm/sbsa-ref.c | 2 ++ hw/arm/virt.c | 2 ++ hw/core/machine.c | 42 ++ hw/riscv/spike.c| 2 ++ hw/riscv/virt.c | 2 ++ include/hw/boards.h | 1 + 6 files changed, 51 insertions(+) When v4 was reviewed by Igor, it was mentioned that you're handling hw/core/machine changes recently. Could you please help to queue this series for QEMU v8.1 if it looks good to you? Thanks, Gavin
[PATCH v5 0/3] NUMA: Apply cluster-NUMA-node boundary for aarch64 and riscv machines
For the arm64 and riscv architectures, the driver (/base/arch_topology.c) is used to populate the CPU topology in the Linux guest. It's required that the CPUs in one cluster can't span multiple NUMA nodes. Otherwise, the Linux scheduling domain can't be sorted out, as the following warning message indicates. To avoid the unexpected confusion, this series attempts to warn about such irregular configurations. -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \ -numa node,nodeid=0,cpus=0-1,memdev=ram0\ -numa node,nodeid=1,cpus=2-3,memdev=ram1\ -numa node,nodeid=2,cpus=4-5,memdev=ram2\ [ cut here ] WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 build_sched_domains+0x284/0x910 Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1 pstate: 0045 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : build_sched_domains+0x284/0x910 lr : build_sched_domains+0x184/0x910 sp : 8804bd50 x29: 8804bd50 x28: 0002 x27: x26: 89cf9a80 x25: x24: 89cbf840 x23: 80325000 x22: 005df800 x21: 8a4ce508 x20: x19: 80324440 x18: 0014 x17: 388925c0 x16: 5386a066 x15: 9c10cc2e x14: 01c0 x13: 0001 x12: 7fffb1a0 x11: 7fffb180 x10: 8a4ce508 x9 : 0041 x8 : 8a4ce500 x7 : 8a4cf920 x6 : 0001 x5 : 0001 x4 : 0007 x3 : 0002 x2 : 1000 x1 : 8a4cf928 x0 : 0001 Call trace: build_sched_domains+0x284/0x910 sched_init_domains+0xac/0xe0 sched_init_smp+0x48/0xc8 kernel_init_freeable+0x140/0x1ac kernel_init+0x28/0x140 ret_from_fork+0x10/0x20 PATCH[1] Warn about the irregular configuration if required PATCH[2] Enable the validation for aarch64 machines PATCH[3] Enable the validation for riscv machines v4: https://lists.nongnu.org/archive/html/qemu-arm/2023-04/msg00232.html v3: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg01226.html v2: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg01080.html v1: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg00886.html Changelog = v5: * Rebase for QEMU v8.1.0 (Gavin) * Pick ack-b's from Igor (Gavin) v4: *
Pick r-b and ack-b from Daniel/Philippe (Gavin) * Replace local variable @len with possible_cpus->len in validate_cpu_cluster_to_numa_boundary() (Philippe) v3: * Validate cluster-to-NUMA instead of socket-to-NUMA boundary (Gavin) * Move the switch from MachineState to MachineClass (Philippe) * Warning instead of rejecting the irregular configuration (Daniel) * Comments to mention cluster-to-NUMA is platform instead of architectural choice (Drew) * Drop PATCH[v2 1/4] related to qtests/numa-test(Gavin) v2: * Fix socket-NUMA-node boundary issues in qtests/numa-test (Gavin) * Add helper set_numa_socket_boundary() and validate the boundary in the generic path (Philippe) Gavin Shan (3): numa: Validate cluster and NUMA node boundary if required hw/arm: Validate cluster and NUMA node boundary hw/riscv: Validate cluster and NUMA node boundary hw/arm/sbsa-ref.c | 2 ++ hw/arm/virt.c | 2 ++ hw/core/machine.c | 42 ++ hw/riscv/spike.c| 2 ++ hw/riscv/virt.c | 2 ++ include/hw/boards.h | 1 + 6 files changed, 51 insertions(+) -- 2.23.0
[PATCH v5 2/3] hw/arm: Validate cluster and NUMA node boundary
There are two NUMA-aware ARM machines: 'virt' and 'sbsa-ref'. Both of them are required to follow the cluster-NUMA-node boundary. Enable the validation to warn about the irregular configuration where multiple CPUs in one cluster have been associated with different NUMA nodes. Signed-off-by: Gavin Shan Acked-by: Igor Mammedov --- hw/arm/sbsa-ref.c | 2 ++ hw/arm/virt.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c index 0b93558dde..efb380e7c8 100644 --- a/hw/arm/sbsa-ref.c +++ b/hw/arm/sbsa-ref.c @@ -864,6 +864,8 @@ static void sbsa_ref_class_init(ObjectClass *oc, void *data) mc->possible_cpu_arch_ids = sbsa_ref_possible_cpu_arch_ids; mc->cpu_index_to_instance_props = sbsa_ref_cpu_index_to_props; mc->get_default_cpu_node_id = sbsa_ref_get_default_cpu_node_id; +/* platform instead of architectural choice */ +mc->cpu_cluster_has_numa_boundary = true; } static const TypeInfo sbsa_ref_info = { diff --git a/hw/arm/virt.c b/hw/arm/virt.c index b99ae18501..5c88b78aab 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -3032,6 +3032,8 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) mc->smp_props.clusters_supported = true; mc->auto_enable_numa_with_memhp = true; mc->auto_enable_numa_with_memdev = true; +/* platform instead of architectural choice */ +mc->cpu_cluster_has_numa_boundary = true; mc->default_ram_id = "mach-virt.ram"; object_class_property_add(oc, "acpi", "OnOffAuto", -- 2.23.0
[PATCH v5 3/3] hw/riscv: Validate cluster and NUMA node boundary
There are two NUMA-aware RISC-V machines: 'virt' and 'spike'. Both of them are required to follow the cluster-NUMA-node boundary. Enable the validation to warn about the irregular configuration where multiple CPUs in one cluster have been associated with different NUMA nodes. Signed-off-by: Gavin Shan Reviewed-by: Daniel Henrique Barboza Acked-by: Igor Mammedov --- hw/riscv/spike.c | 2 ++ hw/riscv/virt.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/hw/riscv/spike.c b/hw/riscv/spike.c index 2c5546560a..81f7e53aed 100644 --- a/hw/riscv/spike.c +++ b/hw/riscv/spike.c @@ -354,6 +354,8 @@ static void spike_machine_class_init(ObjectClass *oc, void *data) mc->cpu_index_to_instance_props = riscv_numa_cpu_index_to_props; mc->get_default_cpu_node_id = riscv_numa_get_default_cpu_node_id; mc->numa_mem_supported = true; +/* platform instead of architectural choice */ +mc->cpu_cluster_has_numa_boundary = true; mc->default_ram_id = "riscv.spike.ram"; object_class_property_add_str(oc, "signature", NULL, spike_set_signature); object_class_property_set_description(oc, "signature", diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c index 4e3efbee16..84a2bca460 100644 --- a/hw/riscv/virt.c +++ b/hw/riscv/virt.c @@ -1678,6 +1678,8 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) mc->cpu_index_to_instance_props = riscv_numa_cpu_index_to_props; mc->get_default_cpu_node_id = riscv_numa_get_default_cpu_node_id; mc->numa_mem_supported = true; +/* platform instead of architectural choice */ +mc->cpu_cluster_has_numa_boundary = true; mc->default_ram_id = "riscv_virt_board.ram"; assert(!mc->get_hotplug_handler); mc->get_hotplug_handler = virt_machine_get_hotplug_handler; -- 2.23.0
[PATCH v5 1/3] numa: Validate cluster and NUMA node boundary if required
For some architectures like ARM64, multiple CPUs in one cluster can be associated with different NUMA nodes, which is an irregular configuration because we shouldn't have this in a bare-metal environment. The irregular configuration causes the Linux guest to misbehave, as the following warning messages indicate. -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \ -numa node,nodeid=0,cpus=0-1,memdev=ram0\ -numa node,nodeid=1,cpus=2-3,memdev=ram1\ -numa node,nodeid=2,cpus=4-5,memdev=ram2\ [ cut here ] WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 build_sched_domains+0x284/0x910 Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1 pstate: 0045 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : build_sched_domains+0x284/0x910 lr : build_sched_domains+0x184/0x910 sp : 8804bd50 x29: 8804bd50 x28: 0002 x27: x26: 89cf9a80 x25: x24: 89cbf840 x23: 80325000 x22: 005df800 x21: 8a4ce508 x20: x19: 80324440 x18: 0014 x17: 388925c0 x16: 5386a066 x15: 9c10cc2e x14: 01c0 x13: 0001 x12: 7fffb1a0 x11: 7fffb180 x10: 8a4ce508 x9 : 0041 x8 : 8a4ce500 x7 : 8a4cf920 x6 : 0001 x5 : 0001 x4 : 0007 x3 : 0002 x2 : 1000 x1 : 8a4cf928 x0 : 0001 Call trace: build_sched_domains+0x284/0x910 sched_init_domains+0xac/0xe0 sched_init_smp+0x48/0xc8 kernel_init_freeable+0x140/0x1ac kernel_init+0x28/0x140 ret_from_fork+0x10/0x20 Improve the situation to warn when multiple CPUs in one cluster have been associated with different NUMA nodes. However, one NUMA node is allowed to be associated with different clusters. 
Signed-off-by: Gavin Shan Acked-by: Philippe Mathieu-Daudé Acked-by: Igor Mammedov --- hw/core/machine.c | 42 ++ include/hw/boards.h | 1 + 2 files changed, 43 insertions(+) diff --git a/hw/core/machine.c b/hw/core/machine.c index 47a34841a5..b718d89441 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -1261,6 +1261,45 @@ static void machine_numa_finish_cpu_init(MachineState *machine) g_string_free(s, true); } +static void validate_cpu_cluster_to_numa_boundary(MachineState *ms) +{ +MachineClass *mc = MACHINE_GET_CLASS(ms); +NumaState *state = ms->numa_state; +const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms); +const CPUArchId *cpus = possible_cpus->cpus; +int i, j; + +if (state->num_nodes <= 1 || possible_cpus->len <= 1) { +return; +} + +/* + * The Linux scheduling domain can't be parsed when the multiple CPUs + * in one cluster have been associated with different NUMA nodes. However, + * it's fine to associate one NUMA node with CPUs in different clusters. + */ +for (i = 0; i < possible_cpus->len; i++) { +for (j = i + 1; j < possible_cpus->len; j++) { +if (cpus[i].props.has_socket_id && +cpus[i].props.has_cluster_id && +cpus[i].props.has_node_id && +cpus[j].props.has_socket_id && +cpus[j].props.has_cluster_id && +cpus[j].props.has_node_id && +cpus[i].props.socket_id == cpus[j].props.socket_id && +cpus[i].props.cluster_id == cpus[j].props.cluster_id && +cpus[i].props.node_id != cpus[j].props.node_id) { +warn_report("CPU-%d and CPU-%d in socket-%ld-cluster-%ld " + "have been associated with node-%ld and node-%ld " + "respectively. 
It can cause OSes like Linux to " + "misbehave", i, j, cpus[i].props.socket_id, + cpus[i].props.cluster_id, cpus[i].props.node_id, + cpus[j].props.node_id); +} +} +} +} + MemoryRegion *machine_consume_memdev(MachineState *machine, HostMemoryBackend *backend) { @@ -1346,6 +1385,9 @@ void machine_run_board_init(MachineState *machine, const char *mem_path, Error * numa_complete_configuration(machine); if (machine->numa_state->num_nodes) { machine_numa_finish_cpu_init(machine); +if (machine_class->cpu_cluster_has_numa_boundary) { +validate_cpu_cluster_to_numa_boundary(machine); +} } } diff --git a/include/hw/boards.h b/include/hw/boards.h index f4117fdb9a..f609cc9aed 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -273,6 +273,7 @@ struct MachineClass { bool nvdimm_supported; bool numa_mem_supported; bool a
Re: [PATCH 05/22] hw/arm: Select VIRTIO_NET for virt machine
Il gio 4 mag 2023, 14:56 Fabiano Rosas ha scritto: > > It's a bit hard to maintain the original intention with just > documentation. Couldn't we require that --without-default-devices always > be accompanied by --with-devices? Maybe, but why would it be bad to just patch the default .mak file? And more to the point of Peter's > question, couldn't we just leave the defaults off unconditionally when > --without-default-devices is passed without --with-devices? > No, for example RHEL adds a lot of devices and is perfectly usable without --nodefaults, but we still use --without-default-devices because we want any new config to be opt in, unless it's always needed. The coupling of -nodefaults with --without-default-devices is a bit > redundant. If we're choosing to not build some devices, then the QEMU > binary should already know that. > --without-default-devices is not about choosing to not build some devices; it is about making non-selected devices opt-in rather than opt-out. Paolo > Just to be clear, -nodefaults by itself still makes sense because we can > have a simple command line for those using QEMU directly while allowing > the management layer to fine tune the devices. > > In the long run, I think we need to add some configure option that gives > us pure allnoconfig so we can have that in the CI and catch these CONFIG > issues before merging. There's no reason to merge a new CONFIG if it > will then be impossible to turn it off. > >
Re: [PULL 11/35] arm/Kconfig: Do not build TCG-only boards on a KVM-only build
Il gio 4 mag 2023, 14:27 Fabiano Rosas ha scritto: > Thomas Huth writes: > > > On 02/05/2023 14.14, Peter Maydell wrote: > >> From: Fabiano Rosas > >> > >> Move all the CONFIG_FOO=y from default.mak into "default y if TCG" > >> statements in Kconfig. That way they won't be selected when > >> CONFIG_TCG=n. > >> > >> I'm leaving CONFIG_ARM_VIRT in default.mak because it allows us to > >> keep the two default.mak files not empty and keep aarch64-default.mak > >> including arm-default.mak. That way we don't surprise anyone that's > >> used to altering these files. > >> > >> With this change we can start building with --disable-tcg. > >> > >> Signed-off-by: Fabiano Rosas > >> Reviewed-by: Richard Henderson > >> Message-id: 20230426180013.14814-12-faro...@suse.de > >> Signed-off-by: Peter Maydell > >> --- > >> configs/devices/aarch64-softmmu/default.mak | 4 -- > >> configs/devices/arm-softmmu/default.mak | 37 -- > >> hw/arm/Kconfig | 42 - > >> 3 files changed, 41 insertions(+), 42 deletions(-) > >> > >> diff --git a/configs/devices/aarch64-softmmu/default.mak > b/configs/devices/aarch64-softmmu/default.mak > >> index cf43ac8da11..70e05a197dc 100644 > >> --- a/configs/devices/aarch64-softmmu/default.mak > >> +++ b/configs/devices/aarch64-softmmu/default.mak > >> @@ -2,7 +2,3 @@ > >> > >> # We support all the 32 bit boards so need all their config > >> include ../arm-softmmu/default.mak > >> - > >> -CONFIG_XLNX_ZYNQMP_ARM=y > >> -CONFIG_XLNX_VERSAL=y > >> -CONFIG_SBSA_REF=y > >> diff --git a/configs/devices/arm-softmmu/default.mak > b/configs/devices/arm-softmmu/default.mak > >> index cb3e5aea657..647fbce88d3 100644 > >> --- a/configs/devices/arm-softmmu/default.mak > >> +++ b/configs/devices/arm-softmmu/default.mak > >> @@ -4,40 +4,3 @@ > >> # CONFIG_TEST_DEVICES=n > >> > >> CONFIG_ARM_VIRT=y > >> -CONFIG_CUBIEBOARD=y > >> -CONFIG_EXYNOS4=y > >> -CONFIG_HIGHBANK=y > >> -CONFIG_INTEGRATOR=y > >> -CONFIG_FSL_IMX31=y > >> -CONFIG_MUSICPAL=y > >> -CONFIG_MUSCA=y > >> 
-CONFIG_CHEETAH=y > >> -CONFIG_SX1=y > >> -CONFIG_NSERIES=y > >> -CONFIG_STELLARIS=y > >> -CONFIG_STM32VLDISCOVERY=y > >> -CONFIG_REALVIEW=y > >> -CONFIG_VERSATILE=y > >> -CONFIG_VEXPRESS=y > >> -CONFIG_ZYNQ=y > >> -CONFIG_MAINSTONE=y > >> -CONFIG_GUMSTIX=y > >> -CONFIG_SPITZ=y > >> -CONFIG_TOSA=y > >> -CONFIG_Z2=y > >> -CONFIG_NPCM7XX=y > >> -CONFIG_COLLIE=y > >> -CONFIG_ASPEED_SOC=y > >> -CONFIG_NETDUINO2=y > >> -CONFIG_NETDUINOPLUS2=y > >> -CONFIG_OLIMEX_STM32_H405=y > >> -CONFIG_MPS2=y > >> -CONFIG_RASPI=y > >> -CONFIG_DIGIC=y > >> -CONFIG_SABRELITE=y > >> -CONFIG_EMCRAFT_SF2=y > >> -CONFIG_MICROBIT=y > >> -CONFIG_FSL_IMX25=y > >> -CONFIG_FSL_IMX7=y > >> -CONFIG_FSL_IMX6UL=y > >> -CONFIG_ALLWINNER_H3=y > >> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig > >> index 87c1a29c912..2d7c4579559 100644 > >> --- a/hw/arm/Kconfig > >> +++ b/hw/arm/Kconfig > >> @@ -35,20 +35,24 @@ config ARM_VIRT > >> > >> config CHEETAH > >> bool > >> +default y if TCG && ARM > >> select OMAP > >> select TSC210X > >> > >> config CUBIEBOARD > >> bool > >> +default y if TCG && ARM > >> select ALLWINNER_A10 > > ... > > > > Hi! > > > > Sorry for not noticing this earlier, but I have to say that I really > dislike > > this change, since it very much changes the way we did our machine > > configuration so far. > > Until now, you could simply go to configs/devices/*-softmmu/*.mak and > only > > select the machines you wanted to have with "...=y" and delete > everything > > else. Now you have to know *all* the machines that you do *not* want to > have > > in your build and disable them with "...=n" in that file. That's quite > ugly, > > especially for the arm target that has so many machines. (ok, you could > also > > do a "--without-default-devices" configuration to get rid of the > machines, > > but that also disables all other kind of devices that you then have to > > specify manually). > > > > Would leaving the CONFIGs as 'n', but commented out in the .mak files be > of any help? 
If I understand your use case, you were probably just > deleting the CONFIG=y for the boards you don't want. So now you'd be > uncommenting the CONFIG=n instead. Yes, that would help—though it is likely to bitrot. I would also change the "if TCG" part to "depends on TCG && ARM", which will break loudly if someone sets the config to y with the wrong accelerator or in the wrong file. Once this is done for ARM we can extend it to other .mak files for consistency. Paolo > Alternatively, we could revert the .mak part of this change, convert > default.mak into tcg.mak and kvm.mak, and use those transparently > depending on whether --disable-tcg is present in the configure line. > > But there's probably a better way still that I'm not seeing here, let's > see what others think. > >
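For illustration, Paolo's suggestion would turn a board entry like CHEETAH into something along these lines (a sketch only; each board's select list and exact dependencies differ):

```kconfig
config CHEETAH
    bool
    default y
    depends on TCG && ARM
    select OMAP
    select TSC210X
```

With "depends on TCG && ARM" instead of "default y if TCG", a stray CONFIG_CHEETAH=y in a KVM-only build (or in the wrong target's .mak file) fails Kconfig resolution loudly rather than being silently accepted.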
Re: [PATCH v20 11/21] qapi/s390x/cpu topology: CPU_POLARIZATION_CHANGE qapi event
On Tue, 2023-04-25 at 18:14 +0200, Pierre Morel wrote: > When the guest asks to change the polarization this change > is forwarded to the upper layer using QAPI. > The upper layer is supposed to take according decisions concerning > CPU provisioning. > > Signed-off-by: Pierre Morel > --- > qapi/machine-target.json | 33 + > hw/s390x/cpu-topology.c | 2 ++ > 2 files changed, 35 insertions(+) > > diff --git a/qapi/machine-target.json b/qapi/machine-target.json > index 3b7a0b77f4..ffde2e9cbd 100644 > --- a/qapi/machine-target.json > +++ b/qapi/machine-target.json > @@ -391,3 +391,36 @@ >'features': [ 'unstable' ], >'if': { 'all': [ 'TARGET_S390X' , 'CONFIG_KVM' ] } > } > + > +## > +# @CPU_POLARIZATION_CHANGE: > +# > +# Emitted when the guest asks to change the polarization. > +# > +# @polarization: polarization specified by the guest > +# > +# Features: > +# @unstable: This command may still be modified. > +# > +# The guest can tell the host (via the PTF instruction) whether the > +# CPUs should be provisioned using horizontal or vertical polarization. > +# > +# On horizontal polarization the host is expected to provision all vCPUs > +# equally. > +# On vertical polarization the host can provision each vCPU differently. > +# The guest will get information on the details of the provisioning > +# the next time it uses the STSI(15) instruction. > +# > +# Since: 8.1 > +# > +# Example: > +# > +# <- { "event": "CPU_POLARIZATION_CHANGE", > +# "data": { "polarization": 0 }, I think you'd be getting "horizontal" instead of 0. 
> +#      "timestamp": { "seconds": 1401385907, "microseconds": 422329 } }
> +##
> +{ 'event': 'CPU_POLARIZATION_CHANGE',
> +  'data': { 'polarization': 'CpuS390Polarization' },
> +  'features': [ 'unstable' ],
> +  'if': { 'all': [ 'TARGET_S390X', 'CONFIG_KVM' ] }
> +}
> diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c
> index e5fb976594..e8b140d623 100644
> --- a/hw/s390x/cpu-topology.c
> +++ b/hw/s390x/cpu-topology.c
> @@ -17,6 +17,7 @@
>  #include "hw/s390x/s390-virtio-ccw.h"
>  #include "hw/s390x/cpu-topology.h"
>  #include "qapi/qapi-commands-machine-target.h"
> +#include "qapi/qapi-events-machine-target.h"
>
>  /*
>   * s390_topology is used to keep the topology information.
> @@ -138,6 +139,7 @@ void s390_handle_ptf(S390CPU *cpu, uint8_t r1, uintptr_t ra)
>      } else {
>          s390_topology.vertical_polarization = !!fc;
>          s390_cpu_topology_set_changed(true);
> +        qapi_event_send_cpu_polarization_change(fc);

I'm not sure I like the implicit conversion of the function code to the enum value. How about you do

    qapi_event_send_cpu_polarization_change(s390_topology.polarization);

and rename vertical_polarization and change its type to the enum. You can then also do

+    CpuS390Polarization polarization = S390_CPU_POLARIZATION_HORIZONTAL;
+    switch (fc) {
+    case S390_CPU_POLARIZATION_VERTICAL:
+        polarization = S390_CPU_POLARIZATION_VERTICAL;
+        /* fallthrough */
+    case S390_CPU_POLARIZATION_HORIZONTAL:
+        if (s390_topology.polarization == polarization) {

and use the value for the assignment further down, too.

>          setcc(cpu, 0);
>      }
>      break;
Re: [PATCH 11/11] cutils: Improve qemu_strtosz handling of fractions
On Mon, May 08, 2023 at 03:03:43PM -0500, Eric Blake wrote: > We have several limitations and bugs worth fixing; they are > inter-related enough that it is not worth splitting this patch into > smaller pieces: > > * ".5k" should work to specify 512, just as "0.5k" does > * "1.k" and "1." + "9"*50 + "k" should both produce the same > result of 2048 after rounding > * "1." + "0"*350 + "1B" should not be treated the same as "1.0B"; > underflow in the fraction should not be lost > * "7.99e99" and "7.99e999" look similar, but our code was doing a > read-out-of-bounds on the latter because it was not expecting ERANGE > due to overflow. While we document that scientific notation is not > supported, and the previous patch actually fixed > qemu_strtod_finite() to no longer return ERANGE overflows, it is > easier to pre-filter than to try and determine after the fact if > strtod() consumed more than we wanted. Note that this is a > low-level semantic change (when endptr is not NULL, we can now > successfully parse with a scale of 'E' and then report trailing > junk, instead of failing outright with EINVAL); but an earlier > commit already argued that this is not a high-level semantic change > since the only caller passing in a non-NULL endptr also checks that > the tail is whitespace-only. > > Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1629 Also, Fixes: cf923b78 ("utils: Improve qemu_strtosz() to have 64 bits of precision", 6.0.0) Fixes: 7625a1ed ("utils: Use fixed-point arithmetic in qemu_strtosz", 6.0.0) > Signed-off-by: Eric Blake > --- > tests/unit/test-cutils.c | 51 +++ > util/cutils.c| 89 > 2 files changed, 88 insertions(+), 52 deletions(-) > -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
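For readers following along, the behavior the patch aims for can be sketched in isolation. This is a hypothetical, much-simplified model, not QEMU's qemu_strtosz() (which uses fixed-point arithmetic to keep 64 bits of precision); it only handles an optional 'k'/'K' suffix, but shows the ".5k" == "0.5k" == 512 semantics and round-to-nearest behavior discussed above:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical toy parser: decimal number with optional fraction and
 * an optional 'k'/'K' (1024) suffix.  Unlike strtol()-style parsing,
 * ".5k" is accepted just like "0.5k", and the scaled value is rounded
 * half-up to the nearest byte.
 */
static int parse_size_k(const char *str, uint64_t *result)
{
    char *end;
    double val;

    errno = 0;
    val = strtod(str, &end);
    if (end == str || errno == ERANGE || val < 0) {
        return -EINVAL;     /* nothing parsed, out of range, or negative */
    }
    if (*end == 'k' || *end == 'K') {
        val *= 1024;
        end++;
    }
    if (*end != '\0') {
        return -EINVAL;     /* trailing junk */
    }
    *result = (uint64_t)(val + 0.5);    /* round half-up, no libm needed */
    return 0;
}
```

The real implementation cannot lean on strtod() this directly (precision loss, ERANGE handling, scientific-notation pre-filtering), which is exactly what the patch series addresses.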
Re: [PATCH v1 6/9] KVM: x86: Add Heki hypervisor support
On Fri, May 05, 2023 at 05:20:43PM +0200, Mickaël Salaün wrote: > From: Madhavan T. Venkataraman > > Each supported hypervisor in x86 implements a struct x86_hyper_init to > define the init functions for the hypervisor. Define a new init_heki() > entry point in struct x86_hyper_init. Hypervisors that support Heki > must define this init_heki() function. Call init_heki() of the chosen > hypervisor in init_hypervisor_platform(). > > Create a heki_hypervisor structure that each hypervisor can fill > with its data and functions. This will allow the Heki feature to work > in a hypervisor agnostic way. > > Declare and initialize a "heki_hypervisor" structure for KVM so KVM can > support Heki. Define the init_heki() function for KVM. In init_heki(), > set the hypervisor field in the generic "heki" structure to the KVM > "heki_hypervisor". After this point, generic Heki code can access the > KVM Heki data and functions. > [...] > +static void kvm_init_heki(void) > +{ > + long err; > + > + if (!kvm_para_available()) > + /* Cannot make KVM hypercalls. */ > + return; > + > + err = kvm_hypercall3(KVM_HC_LOCK_MEM_PAGE_RANGES, -1, -1, -1); Why not do a proper version check or capability check here? If the ABI or supported features ever change then we have something to rely on? Thanks, Wei.
Re: [PATCH 07/11] numa: Check for qemu_strtosz_MiB error
On Mon, May 08, 2023 at 03:03:39PM -0500, Eric Blake wrote:
> As shown in the previous commit, qemu_strtosz_MiB sometimes leaves the
> result value untoutched (we have to audit further to learn that in

untouched

> that case, the QAPI generator says that visit_type_NumaOptions() will
> have zero-initialized it), and sometimes leaves it with the value of a
> partial parse before -EINVAL occurs because of trailing garbage.
> Rather than blindly treating any string the user may throw at us as
> valid, we should check for parse failures.
>
> Fiuxes: cc001888 ("numa: fixup parsed NumaNodeOptions earlier", v2.11.0)
> Signed-off-by: Eric Blake
> ---

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Re: [PATCH v1 5/9] KVM: x86: Add new hypercall to lock control registers
On Fri, May 05, 2023 at 05:20:42PM +0200, Mickaël Salaün wrote: > This enables guests to lock their CR0 and CR4 registers with a subset of > X86_CR0_WP, X86_CR4_SMEP, X86_CR4_SMAP, X86_CR4_UMIP, X86_CR4_FSGSBASE > and X86_CR4_CET flags. > > The new KVM_HC_LOCK_CR_UPDATE hypercall takes two arguments. The first > is to identify the control register, and the second is a bit mask to > pin (i.e. mark as read-only). > > These register flags should already be pinned by Linux guests, but once > compromised, this self-protection mechanism could be disabled, which is > not the case with this dedicated hypercall. > > Cc: Borislav Petkov > Cc: Dave Hansen > Cc: H. Peter Anvin > Cc: Ingo Molnar > Cc: Kees Cook > Cc: Madhavan T. Venkataraman > Cc: Paolo Bonzini > Cc: Sean Christopherson > Cc: Thomas Gleixner > Cc: Vitaly Kuznetsov > Cc: Wanpeng Li > Signed-off-by: Mickaël Salaün > Link: https://lore.kernel.org/r/20230505152046.6575-6-...@digikod.net [...] > hw_cr4 = (cr4_read_shadow() & X86_CR4_MCE) | (cr4 & ~X86_CR4_MCE); > if (is_unrestricted_guest(vcpu)) > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index ffab64d08de3..a529455359ac 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -7927,11 +7927,77 @@ static unsigned long emulator_get_cr(struct > x86_emulate_ctxt *ctxt, int cr) > return value; > } > > +#ifdef CONFIG_HEKI > + > +extern unsigned long cr4_pinned_mask; > + Can this be moved to a header file? > +static int heki_lock_cr(struct kvm *const kvm, const unsigned long cr, > + unsigned long pin) > +{ > + if (!pin) > + return -KVM_EINVAL; > + > + switch (cr) { > + case 0: > + /* Cf. arch/x86/kernel/cpu/common.c */ > + if (!(pin & X86_CR0_WP)) > + return -KVM_EINVAL; > + > + if ((read_cr0() & pin) != pin) > + return -KVM_EINVAL; > + > + atomic_long_or(pin, &kvm->heki_pinned_cr0); > + return 0; > + case 4: > + /* Checks for irrelevant bits. 
*/ > + if ((pin & cr4_pinned_mask) != pin) > + return -KVM_EINVAL; > + It is enforcing the host mask on the guest, right? If the guest's set is a super set of the host's then it will get rejected. > + /* Ignores bits not present in host. */ > + pin &= __read_cr4(); > + atomic_long_or(pin, &kvm->heki_pinned_cr4); > + return 0; > + } > + return -KVM_EINVAL; > +} > + > +int heki_check_cr(const struct kvm *const kvm, const unsigned long cr, > + const unsigned long val) > +{ > + unsigned long pinned; > + > + switch (cr) { > + case 0: > + pinned = atomic_long_read(&kvm->heki_pinned_cr0); > + if ((val & pinned) != pinned) { > + pr_warn_ratelimited( > + "heki-kvm: Blocked CR0 update: 0x%lx\n", val); I think if the message contains the VM and VCPU identifier it will become more useful. Thanks, Wei.
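The core rule being reviewed here (once a bit is pinned, no update may clear it) can be modeled in a few lines of user-space C. This is illustrative only, not the kernel's heki_check_cr():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Write Protect bit of CR0, as in the x86 architecture. */
#define X86_CR0_WP (1UL << 16)

/*
 * An update to a pinned control register is allowed only if every
 * pinned bit is still set in the proposed new value.
 */
static bool cr_update_allowed(uint64_t pinned, uint64_t new_val)
{
    return (new_val & pinned) == pinned;
}
```

The hypercall in the patch accumulates bits into the pinned mask (atomic_long_or), so pinning is one-way: a compromised guest kernel cannot later un-pin a flag.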
Re: [PATCH 0/4] vhost-user-fs: Internal migration
On Fri, May 05, 2023 at 02:51:55PM +0200, Hanna Czenczek wrote: > On 05.05.23 11:53, Eugenio Perez Martin wrote: > > On Fri, May 5, 2023 at 11:03 AM Hanna Czenczek wrote: > > > On 04.05.23 23:14, Stefan Hajnoczi wrote: > > > > On Thu, 4 May 2023 at 13:39, Hanna Czenczek wrote: > > [...] > > > > > All state is lost and the Device Initialization process > > > > must be followed to make the device operational again. > > > > > > > > Existing vhost-user backends don't implement SET_STATUS 0 (it's new). > > > > > > > > It's messy and not your fault. I think QEMU should solve this by > > > > treating stateful devices differently from non-stateful devices. That > > > > way existing vhost-user backends continue to work and new stateful > > > > devices can also be supported. > > > It’s my understanding that SET_STATUS 0/RESET_DEVICE is problematic for > > > stateful devices. In a previous email, you wrote that these should > > > implement SUSPEND+RESUME so qemu can use those instead. But those are > > > separate things, so I assume we just use SET_STATUS 0 when stopping the > > > VM because this happens to also stop processing vrings as a side effect? > > > > > > I.e. I understand “treating stateful devices differently” to mean that > > > qemu should use SUSPEND+RESUME instead of SET_STATUS 0 when the back-end > > > supports it, and stateful back-ends should support it. > > > > > Honestly I cannot think of any use case where the vhost-user backend > > did not ignore set_status(0) and had to retrieve vq states. So maybe > > we can totally remove that call from qemu? > > I don’t know so I can’t really say; but I don’t quite understand why qemu > would reset a device at any point but perhaps VM reset (and even then I’d > expect the post-reset guest to just reset the device on boot by itself, > too). 
DPDK stores the Device Status field value and uses it later: https://github.com/DPDK/dpdk/blob/main/lib/vhost/vhost_user.c#L2791 While DPDK performs no immediate action upon SET_STATUS 0, omitting the message will change the behavior of other DPDK code like virtio_is_ready(). Changing the semantics of the vhost-user protocol in a way that's not backwards compatible is something we should avoid unless there is no other way. The fundamental problem is that QEMU's vhost code is designed to reset vhost devices because it assumes they are stateless. If an F_SUSPEND protocol feature bit is added, then it becomes possible to detect new backends and suspend/resume them rather than reset them. That's the solution that I favor because it's backwards compatible and the same model can be applied to stateful vDPA devices in the future. Stefan signature.asc Description: PGP signature
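A minimal sketch of the backward-compatible detection Stefan describes. The feature bit number below is hypothetical; the real value would only be assigned once F_SUSPEND is added to the vhost-user specification:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical protocol feature bit -- stand-in for a future
 * VHOST_USER_PROTOCOL_F_SUSPEND in the vhost-user spec. */
#define VHOST_USER_PROTOCOL_F_SUSPEND 17

enum stop_method {
    STOP_BY_RESET,      /* legacy, stateless backends */
    STOP_BY_SUSPEND,    /* new, stateful backends */
};

/*
 * Old backends never advertise the bit and keep being reset, exactly
 * as today; backends that negotiate it get suspended/resumed instead,
 * preserving their internal state across a VM stop.
 */
static enum stop_method pick_stop_method(uint64_t protocol_features)
{
    if (protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_SUSPEND)) {
        return STOP_BY_SUSPEND;
    }
    return STOP_BY_RESET;
}
```

This is the sense in which the proposal is backwards compatible: the behavior only changes for backends that explicitly opt in.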
Re: [PATCH 2/4] vhost-user: Interface for migration state transfer
On Thu, Apr 20, 2023 at 03:29:44PM +0200, Eugenio Pérez wrote: > On Wed, 2023-04-19 at 07:21 -0400, Stefan Hajnoczi wrote: > > On Wed, 19 Apr 2023 at 07:10, Hanna Czenczek wrote: > > > On 18.04.23 09:54, Eugenio Perez Martin wrote: > > > > On Mon, Apr 17, 2023 at 9:21 PM Stefan Hajnoczi > > > > wrote: > > > > > On Mon, 17 Apr 2023 at 15:08, Eugenio Perez Martin > > > > > > > > > > wrote: > > > > > > On Mon, Apr 17, 2023 at 7:14 PM Stefan Hajnoczi > > > > > > > > > > > > wrote: > > > > > > > On Thu, Apr 13, 2023 at 12:14:24PM +0200, Eugenio Perez Martin > > > > > > > wrote: > > > > > > > > On Wed, Apr 12, 2023 at 11:06 PM Stefan Hajnoczi < > > > > > > > > stefa...@redhat.com> wrote: > > > > > > > > > On Tue, Apr 11, 2023 at 05:05:13PM +0200, Hanna Czenczek > > > > > > > > > wrote: > > > > > > > > > > So-called "internal" virtio-fs migration refers to > > > > > > > > > > transporting the > > > > > > > > > > back-end's (virtiofsd's) state through qemu's migration > > > > > > > > > > stream. To do > > > > > > > > > > this, we need to be able to transfer virtiofsd's internal > > > > > > > > > > state to and > > > > > > > > > > from virtiofsd. > > > > > > > > > > > > > > > > > > > > Because virtiofsd's internal state will not be too large, we > > > > > > > > > > believe it > > > > > > > > > > is best to transfer it as a single binary blob after the > > > > > > > > > > streaming > > > > > > > > > > phase. Because this method should be useful to other vhost- > > > > > > > > > > user > > > > > > > > > > implementations, too, it is introduced as a general-purpose > > > > > > > > > > addition to > > > > > > > > > > the protocol, not limited to vhost-user-fs. 
> > > > > > > > > > > > > > > > > > > > These are the additions to the protocol: > > > > > > > > > > - New vhost-user protocol feature > > > > > > > > > > VHOST_USER_PROTOCOL_F_MIGRATORY_STATE: > > > > > > > > > >This feature signals support for transferring state, and > > > > > > > > > > is > > > > > > > > > > added so > > > > > > > > > >that migration can fail early when the back-end has no > > > > > > > > > > support. > > > > > > > > > > > > > > > > > > > > - SET_DEVICE_STATE_FD function: Front-end and back-end > > > > > > > > > > negotiate a pipe > > > > > > > > > >over which to transfer the state. The front-end sends an > > > > > > > > > > FD to the > > > > > > > > > >back-end into/from which it can write/read its state, and > > > > > > > > > > the back-end > > > > > > > > > >can decide to either use it, or reply with a different FD > > > > > > > > > > for the > > > > > > > > > >front-end to override the front-end's choice. > > > > > > > > > >The front-end creates a simple pipe to transfer the > > > > > > > > > > state, > > > > > > > > > > but maybe > > > > > > > > > >the back-end already has an FD into/from which it has to > > > > > > > > > > write/read > > > > > > > > > >its state, in which case it will want to override the > > > > > > > > > > simple pipe. > > > > > > > > > >Conversely, maybe in the future we find a way to have the > > > > > > > > > > front-end > > > > > > > > > >get an immediate FD for the migration stream (in some > > > > > > > > > > cases), in which > > > > > > > > > >case we will want to send this to the back-end instead of > > > > > > > > > > creating a > > > > > > > > > >pipe. > > > > > > > > > >Hence the negotiation: If one side has a better idea > > > > > > > > > > than a > > > > > > > > > > plain > > > > > > > > > >pipe, we will want to use that. 
> > > > > > > > > > > > > > > > > > > > - CHECK_DEVICE_STATE: After the state has been transferred > > > > > > > > > > through the > > > > > > > > > >pipe (the end indicated by EOF), the front-end invokes > > > > > > > > > > this > > > > > > > > > > function > > > > > > > > > >to verify success. There is no in-band way (through the > > > > > > > > > > pipe) to > > > > > > > > > >indicate failure, so we need to check explicitly. > > > > > > > > > > > > > > > > > > > > Once the transfer pipe has been established via > > > > > > > > > > SET_DEVICE_STATE_FD > > > > > > > > > > (which includes establishing the direction of transfer and > > > > > > > > > > migration > > > > > > > > > > phase), the sending side writes its data into the pipe, and > > > > > > > > > > the reading > > > > > > > > > > side reads it until it sees an EOF. Then, the front-end > > > > > > > > > > will > > > > > > > > > > check for > > > > > > > > > > success via CHECK_DEVICE_STATE, which on the destination > > > > > > > > > > side > > > > > > > > > > includes > > > > > > > > > > checking for integrity (i.e. errors during deserialization). > > > > > > > > > > > > > > > > > > > > Suggested-by: Stefan Hajnoczi > > > > > > > > > > Signed-off-by: Hanna Czenczek > > > > > > > > > > --- > > > > > > > > > > include/hw/virtio/vhost-backend.h | 24 + > > > > > > > > > > include/hw/virtio/vhost.h | 79
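The transfer model proposed above (sender writes the state blob into the negotiated fd, receiver reads until EOF, success checked afterwards via CHECK_DEVICE_STATE) can be illustrated in plain POSIX terms. This is a sketch of the mechanism, not QEMU or virtiofsd code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Read a device-state blob from fd until EOF, growing the buffer as
 * needed.  Returns the blob length and stores the buffer in *out, or
 * returns -1 on error.  There is no in-band failure signal, so the
 * front-end would still confirm success out of band.
 */
static ssize_t read_state_blob(int fd, char **out)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    ssize_t n;

    if (!buf) {
        return -1;
    }
    while ((n = read(fd, buf + len, cap - len)) > 0) {
        len += (size_t)n;
        if (len == cap) {               /* grow the buffer */
            char *tmp = realloc(buf, cap * 2);
            if (!tmp) {
                free(buf);
                return -1;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (n < 0) {                        /* read error */
        free(buf);
        return -1;
    }
    *out = buf;                         /* n == 0 means EOF: transfer done */
    return (ssize_t)len;
}
```

The negotiation in SET_DEVICE_STATE_FD only decides *which* fd this loop runs on; the read-until-EOF framing is the same whether the fd is a simple pipe or something the back-end supplied.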
Re: [PATCH v1 3/9] virt: Implement Heki common code
On Fri, May 05, 2023 at 05:20:40PM +0200, Mickaël Salaün wrote: > From: Madhavan T. Venkataraman > > Hypervisor Enforced Kernel Integrity (Heki) is a feature that will use > the hypervisor to enhance guest virtual machine security. > > Configuration > = > > Define the config variables for the feature. This feature depends on > support from the architecture as well as the hypervisor. > > Enabling HEKI > = > > Define a kernel command line parameter "heki" to turn the feature on or > off. By default, Heki is on. For such a newfangled feature can we have it off by default? Especially when there are unsolved issues around dynamically loaded code. > [...] > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 3604074a878b..5cf5a7a97811 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -297,6 +297,7 @@ config X86 > select FUNCTION_ALIGNMENT_4B > imply IMA_SECURE_AND_OR_TRUSTED_BOOTif EFI > select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE > + select ARCH_SUPPORTS_HEKI if X86_64 Why is there a restriction on X86_64? > > config INSTRUCTION_DECODER > def_bool y > diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h > index a6e8373a5170..42ef1e33b8a5 100644 > --- a/arch/x86/include/asm/sections.h > +++ b/arch/x86/include/asm/sections.h [...] > > +#ifdef CONFIG_HEKI > + > +/* > + * Gather all of the statically defined sections so heki_late_init() can > + * protect these sections in the host page table. > + * > + * The sections are defined under "SECTIONS" in vmlinux.lds.S > + * Keep this array in sync with SECTIONS. > + */ This seems a bit fragile, because it requires constant attention from people who care about this functionality. Can this table be automatically generated? Thanks, Wei. 
> +struct heki_va_range __initdata heki_va_ranges[] = { > + { > + .va_start = _stext, > + .va_end = _etext, > + .attributes = HEKI_ATTR_MEM_NOWRITE | HEKI_ATTR_MEM_EXEC, > + }, > + { > + .va_start = __start_rodata, > + .va_end = __end_rodata, > + .attributes = HEKI_ATTR_MEM_NOWRITE, > + }, > +#ifdef CONFIG_UNWINDER_ORC > + { > + .va_start = __start_orc_unwind_ip, > + .va_end = __stop_orc_unwind_ip, > + .attributes = HEKI_ATTR_MEM_NOWRITE, > + }, > + { > + .va_start = __start_orc_unwind, > + .va_end = __stop_orc_unwind, > + .attributes = HEKI_ATTR_MEM_NOWRITE, > + }, > + { > + .va_start = orc_lookup, > + .va_end = orc_lookup_end, > + .attributes = HEKI_ATTR_MEM_NOWRITE, > + }, > +#endif /* CONFIG_UNWINDER_ORC */ > +}; > +
[PATCH 04/11] test-cutils: Add coverage of qemu_strtod
Plenty more corner cases of strtod proper, but this covers the bulk of what our wrappers do. In particular, it demonstrates the difference on when *value is left uninitialized, which an upcoming patch will normalize. Signed-off-by: Eric Blake --- tests/unit/test-cutils.c | 435 +++ 1 file changed, 435 insertions(+) diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c index 1eeaf21ae22..4c096c6fc70 100644 --- a/tests/unit/test-cutils.c +++ b/tests/unit/test-cutils.c @@ -25,6 +25,8 @@ * THE SOFTWARE. */ +#include + #include "qemu/osdep.h" #include "qemu/cutils.h" #include "qemu/units.h" @@ -2044,6 +2046,414 @@ static void test_qemu_strtou64_full_max(void) g_free(str); } +static void test_qemu_strtod_simple(void) +{ +const char *str; +const char *endptr; +int err; +double res; + +/* no radix or exponent */ +str = "1"; +endptr = NULL; +res = 999; +err = qemu_strtod(str, &endptr, &res); +g_assert_cmpint(err, ==, 0); +g_assert_cmpfloat(res, ==, 1.0); +g_assert_true(endptr == str + 1); + +/* leading space and sign */ +str = " -0.0"; +endptr = NULL; +res = 999; +err = qemu_strtod(str, &endptr, &res); +g_assert_cmpint(err, ==, 0); +g_assert_cmpfloat(res, ==, -0.0); +g_assert_true(signbit(res)); +g_assert_true(endptr == str + 5); + +/* fraction only */ +str = "+.5"; +endptr = NULL; +res = 999; +err = qemu_strtod(str, &endptr, &res); +g_assert_cmpint(err, ==, 0); +g_assert_cmpfloat(res, ==, 0.5); +g_assert_true(endptr == str + 3); + +/* exponent */ +str = "1.e+1"; +endptr = NULL; +res = 999; +err = qemu_strtod(str, &endptr, &res); +g_assert_cmpint(err, ==, 0); +g_assert_cmpfloat(res, ==, 10.0); +g_assert_true(endptr == str + 5); +} + +static void test_qemu_strtod_einval(void) +{ +const char *str; +const char *endptr; +int err; +double res; + +/* empty */ +str = ""; +endptr = NULL; +res = 999; +err = qemu_strtod(str, &endptr, &res); +g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpfloat(res, ==, 0.0); +g_assert_true(endptr == str); + +/* NULL */ +str = NULL; +endptr 
= "random"; +res = 999; +err = qemu_strtod(str, &endptr, &res); +g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpfloat(res, ==, 999.0); +g_assert_null(endptr); + +/* not recognizable */ +str = " junk"; +endptr = NULL; +res = 999; +err = qemu_strtod(str, &endptr, &res); +g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpfloat(res, ==, 0.0); +g_assert_true(endptr == str); +} + +static void test_qemu_strtod_erange(void) +{ +const char *str; +const char *endptr; +int err; +double res; + +/* overflow */ +str = "9e999"; +endptr = NULL; +res = 999; +err = qemu_strtod(str, &endptr, &res); +g_assert_cmpint(err, ==, -ERANGE); +g_assert_cmpfloat(res, ==, HUGE_VAL); +g_assert_true(endptr == str + 5); + +str = "-9e+999"; +endptr = NULL; +res = 999; +err = qemu_strtod(str, &endptr, &res); +g_assert_cmpint(err, ==, -ERANGE); +g_assert_cmpfloat(res, ==, -HUGE_VAL); +g_assert_true(endptr == str + 7); + +/* underflow */ +str = "-9e-999"; +endptr = NULL; +res = 999; +err = qemu_strtod(str, &endptr, &res); +g_assert_cmpint(err, ==, -ERANGE); +g_assert_cmpfloat(res, >=, -DBL_MIN); +g_assert_cmpfloat(res, <=, -0.0); +g_assert_true(signbit(res)); +g_assert_true(endptr == str + 7); +} + +static void test_qemu_strtod_nonfinite(void) +{ +const char *str; +const char *endptr; +int err; +double res; + +/* infinity */ +str = "inf"; +endptr = NULL; +res = 999; +err = qemu_strtod(str, &endptr, &res); +g_assert_cmpint(err, ==, 0); +g_assert_true(isinf(res)); +g_assert_false(signbit(res)); +g_assert_true(endptr == str + 3); + +str = "-infinity"; +endptr = NULL; +res = 999; +err = qemu_strtod(str, &endptr, &res); +g_assert_cmpint(err, ==, 0); +g_assert_true(isinf(res)); +g_assert_true(signbit(res)); +g_assert_true(endptr == str + 9); + +/* not a number */ +str = " NaN"; +endptr = NULL; +res = 999; +err = qemu_strtod(str, &endptr, &res); +g_assert_cmpint(err, ==, 0); +g_assert_true(isnan(res)); +g_assert_true(endptr == str + 4); +} + +static void test_qemu_strtod_trailing(void) +{ +const char *str; 
+const char *endptr; +int err; +double res; + +/* trailing whitespace */ +str = "1. "; +endptr = NULL; +res = 999; +err = qemu_strtod(str, &endptr, &res); +g_assert_cmpint(err, ==, 0); +g_assert_cmpfloat(res, ==, 1.0); +g_assert_true(endptr == str + 2); + +endptr = NULL; +res = 999; +err = qemu_strtod(str, NULL, &res); +g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpfloat(res, ==,
[PATCH 01/11] test-cutils: Avoid g_assert in unit tests
glib documentation[1] is clear: g_assert() should be avoided in unit tests because it is ineffective if G_DISABLE_ASSERT is defined; unit tests should stick to constructs based on g_assert_true() instead. Note that since commit 262a69f428, we intentionally state that you cannot define G_DISABLE_ASSERT that while building qemu; but our code can be copied to other projects without that restriction, so we should be consistent. For most of the replacements in this patch, using g_assert_cmpstr() would be a regression in quality - although it would helpfully display the string contents of both pointers on test failure, here, we really do care about pointer equality, not just string content equality. But when a NULL pointer is expected, g_assert_null works fine. [1] https://libsoup.org/glib/glib-Testing.html#g-assert Signed-off-by: Eric Blake --- tests/unit/test-cutils.c | 324 +++ 1 file changed, 162 insertions(+), 162 deletions(-) diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c index 3c4f8754202..0202ac0d5b3 100644 --- a/tests/unit/test-cutils.c +++ b/tests/unit/test-cutils.c @@ -1,7 +1,7 @@ /* * cutils.c unit-tests * - * Copyright (C) 2013 Red Hat Inc. 
+ * Copyright Red Hat * * Authors: * Eduardo Habkost @@ -40,7 +40,7 @@ static void test_parse_uint_null(void) g_assert_cmpint(r, ==, -EINVAL); g_assert_cmpint(i, ==, 0); -g_assert(endptr == NULL); +g_assert_null(endptr); } static void test_parse_uint_empty(void) @@ -55,7 +55,7 @@ static void test_parse_uint_empty(void) g_assert_cmpint(r, ==, -EINVAL); g_assert_cmpint(i, ==, 0); -g_assert(endptr == str); +g_assert_true(endptr == str); } static void test_parse_uint_whitespace(void) @@ -70,7 +70,7 @@ static void test_parse_uint_whitespace(void) g_assert_cmpint(r, ==, -EINVAL); g_assert_cmpint(i, ==, 0); -g_assert(endptr == str); +g_assert_true(endptr == str); } @@ -86,7 +86,7 @@ static void test_parse_uint_invalid(void) g_assert_cmpint(r, ==, -EINVAL); g_assert_cmpint(i, ==, 0); -g_assert(endptr == str); +g_assert_true(endptr == str); } @@ -102,7 +102,7 @@ static void test_parse_uint_trailing(void) g_assert_cmpint(r, ==, 0); g_assert_cmpint(i, ==, 123); -g_assert(endptr == str + 3); +g_assert_true(endptr == str + 3); } static void test_parse_uint_correct(void) @@ -117,7 +117,7 @@ static void test_parse_uint_correct(void) g_assert_cmpint(r, ==, 0); g_assert_cmpint(i, ==, 123); -g_assert(endptr == str + strlen(str)); +g_assert_true(endptr == str + strlen(str)); } static void test_parse_uint_octal(void) @@ -132,7 +132,7 @@ static void test_parse_uint_octal(void) g_assert_cmpint(r, ==, 0); g_assert_cmpint(i, ==, 0123); -g_assert(endptr == str + strlen(str)); +g_assert_true(endptr == str + strlen(str)); } static void test_parse_uint_decimal(void) @@ -147,7 +147,7 @@ static void test_parse_uint_decimal(void) g_assert_cmpint(r, ==, 0); g_assert_cmpint(i, ==, 123); -g_assert(endptr == str + strlen(str)); +g_assert_true(endptr == str + strlen(str)); } @@ -163,7 +163,7 @@ static void test_parse_uint_llong_max(void) g_assert_cmpint(r, ==, 0); g_assert_cmpint(i, ==, (unsigned long long)LLONG_MAX + 1); -g_assert(endptr == str + strlen(str)); +g_assert_true(endptr == str + 
strlen(str)); g_free(str); } @@ -180,7 +180,7 @@ static void test_parse_uint_overflow(void) g_assert_cmpint(r, ==, -ERANGE); g_assert_cmpint(i, ==, ULLONG_MAX); -g_assert(endptr == str + strlen(str)); +g_assert_true(endptr == str + strlen(str)); } static void test_parse_uint_negative(void) @@ -195,7 +195,7 @@ static void test_parse_uint_negative(void) g_assert_cmpint(r, ==, -ERANGE); g_assert_cmpint(i, ==, 0); -g_assert(endptr == str + strlen(str)); +g_assert_true(endptr == str + strlen(str)); } @@ -235,7 +235,7 @@ static void test_qemu_strtoi_correct(void) g_assert_cmpint(err, ==, 0); g_assert_cmpint(res, ==, 12345); -g_assert(endptr == str + 5); +g_assert_true(endptr == str + 5); } static void test_qemu_strtoi_null(void) @@ -248,7 +248,7 @@ static void test_qemu_strtoi_null(void) err = qemu_strtoi(NULL, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert(endptr == NULL); +g_assert_null(endptr); } static void test_qemu_strtoi_empty(void) @@ -262,7 +262,7 @@ static void test_qemu_strtoi_empty(void) err = qemu_strtoi(str, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert(endptr == str); +g_assert_true(endptr == str); } static void test_qemu_strtoi_whitespace(void) @@ -276,7 +276,7 @@ static void test_qemu_strtoi_whitespace(void) err = qemu_strtoi(str, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert(endptr == str); +g_assert_true(endptr == str); } static void test_qemu_st
[PATCH 09/11] cutils: Set value in all integral qemu_strto* error paths
Our goal in writing qemu_strtoi() and friends is to have an interface harder to abuse than libc's strtol(). Leaving the return value uninitialized on some error paths does not lend itself well to this goal; and our documentation wasn't helpful on the matter. Note that the previous patch changed all qemu_strtosz() EINVAL error paths to slam value to 0 rather than stay uninitialized, even when the EINVAL error occurs because of trailing junk. But for the remaining integral qemu_strto*, it's easier to return the parsed value than to force things back to zero, in part because of how check_strtox_error works; and doing so creates less churn in the testsuite. Here, the list of affected callers is much longer ('git grep "qemu_strto[ui]" *.c **/*.c | grep -v tests/ |wc -l' outputs 87, although a few of those are the implementation in cutils.c), so touching as little as possible is the wisest course of action. Signed-off-by: Eric Blake --- tests/unit/test-cutils.c | 24 +++ util/cutils.c| 42 +--- 2 files changed, 38 insertions(+), 28 deletions(-) diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c index 9cf00a810e4..2cb33e41ae4 100644 --- a/tests/unit/test-cutils.c +++ b/tests/unit/test-cutils.c @@ -250,7 +250,7 @@ static void test_qemu_strtoi_null(void) err = qemu_strtoi(NULL, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmpint(res, ==, 999); +g_assert_cmpint(res, ==, 0); g_assert_null(endptr); } @@ -479,7 +479,7 @@ static void test_qemu_strtoi_full_null(void) err = qemu_strtoi(NULL, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmpint(res, ==, 999); +g_assert_cmpint(res, ==, 0); g_assert_null(endptr); } @@ -557,7 +557,7 @@ static void test_qemu_strtoui_null(void) err = qemu_strtoui(NULL, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmpuint(res, ==, 999); +g_assert_cmpuint(res, ==, 0); g_assert_null(endptr); } @@ -784,7 +784,7 @@ static void test_qemu_strtoui_full_null(void) err = qemu_strtoui(NULL, NULL, 
0, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmpuint(res, ==, 999); +g_assert_cmpuint(res, ==, 0); } static void test_qemu_strtoui_full_empty(void) @@ -860,7 +860,7 @@ static void test_qemu_strtol_null(void) err = qemu_strtol(NULL, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmpint(res, ==, 999); +g_assert_cmpint(res, ==, 0); g_assert_null(endptr); } @@ -1087,7 +1087,7 @@ static void test_qemu_strtol_full_null(void) err = qemu_strtol(NULL, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmpint(res, ==, 999); +g_assert_cmpint(res, ==, 0); g_assert_null(endptr); } @@ -1165,7 +1165,7 @@ static void test_qemu_strtoul_null(void) err = qemu_strtoul(NULL, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmpuint(res, ==, 999); +g_assert_cmpuint(res, ==, 0); g_assert_null(endptr); } @@ -1390,7 +1390,7 @@ static void test_qemu_strtoul_full_null(void) err = qemu_strtoul(NULL, NULL, 0, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmpuint(res, ==, 999); +g_assert_cmpuint(res, ==, 0); } static void test_qemu_strtoul_full_empty(void) @@ -1466,7 +1466,7 @@ static void test_qemu_strtoi64_null(void) err = qemu_strtoi64(NULL, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmpint(res, ==, 999); +g_assert_cmpint(res, ==, 0); g_assert_null(endptr); } @@ -1691,7 +1691,7 @@ static void test_qemu_strtoi64_full_null(void) err = qemu_strtoi64(NULL, NULL, 0, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmpint(res, ==, 999); +g_assert_cmpint(res, ==, 0); } static void test_qemu_strtoi64_full_empty(void) @@ -1769,7 +1769,7 @@ static void test_qemu_strtou64_null(void) err = qemu_strtou64(NULL, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmpuint(res, ==, 999); +g_assert_cmpuint(res, ==, 0); g_assert_null(endptr); } @@ -1994,7 +1994,7 @@ static void test_qemu_strtou64_full_null(void) err = qemu_strtou64(NULL, NULL, 0, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmpuint(res, ==, 
999); +g_assert_cmpuint(res, ==, 0); } static void test_qemu_strtou64_full_empty(void) diff --git a/util/cutils.c b/util/cutils.c index 8bacf349383..83948926ec9 100644 --- a/util/cutils.c +++ b/util/cutils.c @@ -384,12 +384,13 @@ static int check_strtox_error(const char *nptr, char *ep, * * @nptr may be null, and no conversion is performed then. * - * If no conversion is performed, store @nptr in *@endptr and return - * -EINVAL. + * If no conversion is performed, store @nptr in *@endptr, 0 in + * @result, and return -EINVAL. * * If @endptr is null, and the string isn't fully converted, return - * -EINVAL. This
[PATCH 03/11] test-cutils: Test integral qemu_strto* value on failures
We are inconsistent on the contents of *value after a strto* parse failure. I found the following behaviors: - parse_uint() and parse_uint_full(), which document that *value is slammed to 0 on all EINVAL failures and 0 or UINT_MAX on ERANGE failures, and has unit tests for that (note that parse_uint requires non-NULL endptr, and does not fail with EINVAL for trailing junk) - qemu_strtosz(), which leaves *value untouched on all failures (both EINVAL and ERANGE), and has unit tests but not documentation for that - qemu_strtoi() and other integral friends, which document *value on ERANGE failures but is unspecified on EINVAL (other than implicitly by comparison to libc strto*); there, *value is untouched for NULL string, slammed to 0 on no conversion, and left at the prefix value on NULL endptr; unit tests do not consistently check the value - qemu_strtod(), which documents *value on ERANGE failures but is unspecified on EINVAL; there, *value is untouched for NULL string, slammed to 0.0 for no conversion, and left at the prefix value on NULL endptr; there are no unit tests (other than indirectly through qemu_strtosz) - qemu_strtod_finite(), which documents *value on ERANGE failures but is unspecified on EINVAL; there, *value is left at the prefix for 'inf' or 'nan' and untouched in all other cases; there are no unit tests (other than indirectly through qemu_strtosz) Upcoming patches will change behaviors for consistency, but it's best to first have more unit test coverage to see the impact of those changes. 
Signed-off-by: Eric Blake --- tests/unit/test-cutils.c | 58 +++- 1 file changed, 51 insertions(+), 7 deletions(-) diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c index 38bd3990207..1eeaf21ae22 100644 --- a/tests/unit/test-cutils.c +++ b/tests/unit/test-cutils.c @@ -248,6 +248,7 @@ static void test_qemu_strtoi_null(void) err = qemu_strtoi(NULL, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpint(res, ==, 999); g_assert_null(endptr); } @@ -262,6 +263,7 @@ static void test_qemu_strtoi_empty(void) err = qemu_strtoi(str, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpint(res, ==, 0); g_assert_true(endptr == str); } @@ -276,6 +278,7 @@ static void test_qemu_strtoi_whitespace(void) err = qemu_strtoi(str, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpint(res, ==, 0); g_assert_true(endptr == str); } @@ -290,6 +293,7 @@ static void test_qemu_strtoi_invalid(void) err = qemu_strtoi(str, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpint(res, ==, 0); g_assert_true(endptr == str); } @@ -473,6 +477,7 @@ static void test_qemu_strtoi_full_null(void) err = qemu_strtoi(NULL, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpint(res, ==, 999); g_assert_null(endptr); } @@ -485,6 +490,7 @@ static void test_qemu_strtoi_full_empty(void) err = qemu_strtoi(str, NULL, 0, &res); g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpint(res, ==, 0); } static void test_qemu_strtoi_full_negative(void) @@ -502,18 +508,19 @@ static void test_qemu_strtoi_full_negative(void) static void test_qemu_strtoi_full_trailing(void) { const char *str = "123xxx"; -int res; +int res = 999; int err; err = qemu_strtoi(str, NULL, 0, &res); g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpint(res, ==, 123); } static void test_qemu_strtoi_full_max(void) { char *str = g_strdup_printf("%d", INT_MAX); -int res; +int res = 999; int err; err = qemu_strtoi(str, NULL, 0, &res); @@ -548,6 +555,7 @@ static void 
test_qemu_strtoui_null(void) err = qemu_strtoui(NULL, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpuint(res, ==, 999); g_assert_null(endptr); } @@ -562,6 +570,7 @@ static void test_qemu_strtoui_empty(void) err = qemu_strtoui(str, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpuint(res, ==, 0); g_assert_true(endptr == str); } @@ -576,6 +585,7 @@ static void test_qemu_strtoui_whitespace(void) err = qemu_strtoui(str, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpuint(res, ==, 0); g_assert_true(endptr == str); } @@ -590,6 +600,7 @@ static void test_qemu_strtoui_invalid(void) err = qemu_strtoui(str, &endptr, 0, &res); g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpuint(res, ==, 0); g_assert_true(endptr == str); } @@ -771,6 +782,7 @@ static void test_qemu_strtoui_full_null(void) err = qemu_strtoui(NULL, NULL, 0, &res); g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpuint(res, ==, 999); } static void test_qemu_strtoui_full_empty(void) @@ -782,7 +794,9 @@ static void test_qemu_strtoui_full_empty(void) err = qemu_strtoui(str
[PATCH 02/11] test-cutils: Use g_assert_cmpuint where appropriate
When debugging test failures, seeing unsigned values as large positive values rather than negative values matters (assuming that the bug in glib 2.76 [1] where g_assert_cmpuint displays signed instead of unsigned values will eventually be fixed). No impact when the test is passing, but using a consistent style will matter more in upcoming test additions. Also, some tests are better with cmphex. While at it, fix some spacing and minor typing issues spotted nearby. [1] https://gitlab.gnome.org/GNOME/glib/-/issues/2997 Signed-off-by: Eric Blake --- tests/unit/test-cutils.c | 148 +++ 1 file changed, 74 insertions(+), 74 deletions(-) diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c index 0202ac0d5b3..38bd3990207 100644 --- a/tests/unit/test-cutils.c +++ b/tests/unit/test-cutils.c @@ -39,7 +39,7 @@ static void test_parse_uint_null(void) r = parse_uint(NULL, &i, &endptr, 0); g_assert_cmpint(r, ==, -EINVAL); -g_assert_cmpint(i, ==, 0); +g_assert_cmpuint(i, ==, 0); g_assert_null(endptr); } @@ -54,7 +54,7 @@ static void test_parse_uint_empty(void) r = parse_uint(str, &i, &endptr, 0); g_assert_cmpint(r, ==, -EINVAL); -g_assert_cmpint(i, ==, 0); +g_assert_cmpuint(i, ==, 0); g_assert_true(endptr == str); } @@ -69,7 +69,7 @@ static void test_parse_uint_whitespace(void) r = parse_uint(str, &i, &endptr, 0); g_assert_cmpint(r, ==, -EINVAL); -g_assert_cmpint(i, ==, 0); +g_assert_cmpuint(i, ==, 0); g_assert_true(endptr == str); } @@ -85,7 +85,7 @@ static void test_parse_uint_invalid(void) r = parse_uint(str, &i, &endptr, 0); g_assert_cmpint(r, ==, -EINVAL); -g_assert_cmpint(i, ==, 0); +g_assert_cmpuint(i, ==, 0); g_assert_true(endptr == str); } @@ -101,7 +101,7 @@ static void test_parse_uint_trailing(void) r = parse_uint(str, &i, &endptr, 0); g_assert_cmpint(r, ==, 0); -g_assert_cmpint(i, ==, 123); +g_assert_cmpuint(i, ==, 123); g_assert_true(endptr == str + 3); } @@ -116,7 +116,7 @@ static void test_parse_uint_correct(void) r = parse_uint(str, &i, &endptr, 0); 
g_assert_cmpint(r, ==, 0); -g_assert_cmpint(i, ==, 123); +g_assert_cmpuint(i, ==, 123); g_assert_true(endptr == str + strlen(str)); } @@ -131,7 +131,7 @@ static void test_parse_uint_octal(void) r = parse_uint(str, &i, &endptr, 0); g_assert_cmpint(r, ==, 0); -g_assert_cmpint(i, ==, 0123); +g_assert_cmpuint(i, ==, 0123); g_assert_true(endptr == str + strlen(str)); } @@ -146,7 +146,7 @@ static void test_parse_uint_decimal(void) r = parse_uint(str, &i, &endptr, 10); g_assert_cmpint(r, ==, 0); -g_assert_cmpint(i, ==, 123); +g_assert_cmpuint(i, ==, 123); g_assert_true(endptr == str + strlen(str)); } @@ -162,7 +162,7 @@ static void test_parse_uint_llong_max(void) r = parse_uint(str, &i, &endptr, 0); g_assert_cmpint(r, ==, 0); -g_assert_cmpint(i, ==, (unsigned long long)LLONG_MAX + 1); +g_assert_cmpuint(i, ==, (unsigned long long)LLONG_MAX + 1); g_assert_true(endptr == str + strlen(str)); g_free(str); @@ -179,7 +179,7 @@ static void test_parse_uint_overflow(void) r = parse_uint(str, &i, &endptr, 0); g_assert_cmpint(r, ==, -ERANGE); -g_assert_cmpint(i, ==, ULLONG_MAX); +g_assert_cmpuint(i, ==, ULLONG_MAX); g_assert_true(endptr == str + strlen(str)); } @@ -194,7 +194,7 @@ static void test_parse_uint_negative(void) r = parse_uint(str, &i, &endptr, 0); g_assert_cmpint(r, ==, -ERANGE); -g_assert_cmpint(i, ==, 0); +g_assert_cmpuint(i, ==, 0); g_assert_true(endptr == str + strlen(str)); } @@ -208,7 +208,7 @@ static void test_parse_uint_full_trailing(void) r = parse_uint_full(str, &i, 0); g_assert_cmpint(r, ==, -EINVAL); -g_assert_cmpint(i, ==, 0); +g_assert_cmpuint(i, ==, 0); } static void test_parse_uint_full_correct(void) @@ -220,7 +220,7 @@ static void test_parse_uint_full_correct(void) r = parse_uint_full(str, &i, 0); g_assert_cmpint(r, ==, 0); -g_assert_cmpint(i, ==, 123); +g_assert_cmpuint(i, ==, 123); } static void test_qemu_strtoi_correct(void) @@ -428,7 +428,7 @@ static void test_qemu_strtoi_underflow(void) int res = 999; int err; -err = qemu_strtoi(str, &endptr, 0, 
&res); +err = qemu_strtoi(str, &endptr, 0, &res); g_assert_cmpint(err, ==, -ERANGE); g_assert_cmpint(res, ==, INT_MIN); @@ -479,10 +479,10 @@ static void test_qemu_strtoi_full_null(void) static void test_qemu_strtoi_full_empty(void) { const char *str = ""; -int res = 999L; +int res = 999; int err; -err = qemu_strtoi(str, NULL, 0, &res); +err = qemu_strtoi(str, NULL, 0, &res); g_assert_cmpint(err, ==, -EINVAL); } @@ -728,7 +728,7 @@ static void test_qemu_strtoui_underflow(void) unsigned int res = 999; int err; -err = qemu_strtoui(str, &endptr,
[PATCH 08/11] cutils: Set value in all qemu_strtosz* error paths
Making callers determine whether or not *value was populated on error is not nice for usability. Pre-patch, we have unit tests that check that *result is left unchanged on most EINVAL errors and set to 0 on many ERANGE errors. This is subtly different from libc strtoumax() behavior which returns UINT64_MAX on ERANGE errors, as well as different from our parse_uint() which slams to 0 on EINVAL on the grounds that we want our functions to be harder to mis-use than strtoumax(). Let's audit callers: - hw/core/numa.c:parse_numa() fixed in the previous patch to check for errors - migration/migration-hmp-cmds.c:hmp_migrate_set_parameter(), monitor/hmp.c:monitor_parse_arguments(), qapi/opts-visitor.c:opts_type_size(), qapi/qobject-input-visitor.c:qobject_input_type_size_keyval(), qemu-img.c:cvtnum_full(), qemu-io-cmds.c:cvtnum(), target/i386/cpu.c:x86_cpu_parse_featurestr(), and util/qemu-option.c:parse_option_size() appear to reject all failures (although some with distinct messages for ERANGE as opposed to EINVAL), so it doesn't matter what is in the value parameter on error. - All remaining callers are in the testsuite, where we can tweak our expectations to match our new desired behavior. Advancing to the end of the string parsed on overflow (ERANGE), while still returning 0, makes sense (UINT64_MAX as a size is unlikely to be useful); likewise, our size parsing code is complex enough that it's easier to always return 0 when endptr is NULL but trailing garbage was found, rather than trying to return the value of the prefix actually parsed (no current caller cared about the value of the prefix). 
Signed-off-by: Eric Blake --- tests/unit/test-cutils.c | 72 util/cutils.c| 17 +++--- 2 files changed, 48 insertions(+), 41 deletions(-) diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c index 9fa6fb042e8..9cf00a810e4 100644 --- a/tests/unit/test-cutils.c +++ b/tests/unit/test-cutils.c @@ -2684,7 +2684,7 @@ static void test_qemu_strtosz_float(void) res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, -EINVAL /* FIXME 0 */); -g_assert_cmpuint(res, ==, 0xbaadf00d /* FIXME 512 */); +g_assert_cmpuint(res, ==, 0 /* FIXME 512 */); g_assert_true(endptr == str /* FIXME + 4 */); /* For convenience, we permit values that are not byte-exact */ @@ -2736,7 +2736,7 @@ static void test_qemu_strtosz_invalid(void) res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmphex(res, ==, 0xbaadf00d); +g_assert_cmpuint(res, ==, 0); g_assert_null(endptr); str = ""; @@ -2744,7 +2744,7 @@ static void test_qemu_strtosz_invalid(void) res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmphex(res, ==, 0xbaadf00d); +g_assert_cmpuint(res, ==, 0); g_assert_true(endptr == str); str = " \t "; @@ -2752,7 +2752,7 @@ static void test_qemu_strtosz_invalid(void) res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmphex(res, ==, 0xbaadf00d); +g_assert_cmpuint(res, ==, 0); g_assert_true(endptr == str); str = "."; @@ -2760,14 +2760,14 @@ static void test_qemu_strtosz_invalid(void) res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmphex(res, ==, 0xbaadf00d); +g_assert_cmpuint(res, ==, 0); g_assert(endptr == str); str = " ."; endptr = NULL; err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmphex(res, ==, 0xbaadf00d); +g_assert_cmpuint(res, ==, 0); g_assert(endptr == str); str = "crap"; @@ -2775,7 +2775,7 @@ static void 
test_qemu_strtosz_invalid(void) res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmphex(res, ==, 0xbaadf00d); +g_assert_cmpuint(res, ==, 0); g_assert_true(endptr == str); str = "inf"; @@ -2783,7 +2783,7 @@ static void test_qemu_strtosz_invalid(void) res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmphex(res, ==, 0xbaadf00d); +g_assert_cmpuint(res, ==, 0); g_assert_true(endptr == str); str = "NaN"; @@ -2791,7 +2791,7 @@ static void test_qemu_strtosz_invalid(void) res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmphex(res, ==, 0xbaadf00d); +g_assert_cmpuint(res, ==, 0); g_assert_true(endptr == str); /* Fractional values require scale larger than bytes */ @@ -2800,7 +2800,7 @@ static void test_qemu_strtosz_invalid(void) res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, -EI
[PATCH 05/11] test-cutils: Prepare for upcoming semantic change in qemu_strtosz
A quick search for 'qemu_strtosz' in the code base shows that outside of the testsuite, the ONLY place that passes a non-NULL pointer to @endptr of any variant of a size parser is in hmp.c (the 'o' parser of monitor_parse_arguments), and that particular caller warns of "extraneous characters at the end of line" unless the trailing bytes are purely whitespace. Thus, it makes no semantic difference at the high level whether we parse "1.5e1k" as "1" + ".5e1" + "k" (an attempt to use scientific notation in strtod with a scaling suffix of 'k' with no trailing junk, but which qemu_strtosz says should fail with EINVAL), or as "1.5e" + "1k" (a valid size with scaling suffix of 'e' for exabytes, followed by two junk bytes) - either way, any user passing such a string will get an error message about a parse failure. However, an upcoming patch to qemu_strtosz will fix other corner case bugs in handling the fractional portion of a size, and in doing so, it is easier to declare that qemu_strtosz() itself stops parsing at the first 'e' rather than blindly consuming whatever strtod() will recognize. Once that is fixed, the difference will be visible at the low level (getting a valid parse with trailing garbage when @endptr is non-NULL, while continuing to get -EINVAL when @endptr is NULL); this is easier to demonstrate by moving the affected strings from test_qemu_strtosz_invalid() (which declares them as always -EINVAL) to test_qemu_strtosz_trailing() (where @endptr affects behavior, for now with FIXME comments). Note that a similar argument could be made for having "0x1.5" or "0x1M" parse as 0x1 with ".5" or "M" as trailing junk, instead of blindly treating it as -EINVAL; however, as these cases do not suffer from the same problems as floating point, they are not worth changing at this time. 
Signed-off-by: Eric Blake --- tests/unit/test-cutils.c | 42 ++-- 1 file changed, 27 insertions(+), 15 deletions(-) diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c index 4c096c6fc70..afae2ee5331 100644 --- a/tests/unit/test-cutils.c +++ b/tests/unit/test-cutils.c @@ -2745,21 +2745,6 @@ static void test_qemu_strtosz_invalid(void) g_assert_cmphex(res, ==, 0xbaadf00d); g_assert_true(endptr == str); -/* No floating point exponents */ -str = "1.5e1k"; -endptr = NULL; -err = qemu_strtosz(str, &endptr, &res); -g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmphex(res, ==, 0xbaadf00d); -g_assert_true(endptr == str); - -str = "1.5E+0k"; -endptr = NULL; -err = qemu_strtosz(str, &endptr, &res); -g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmphex(res, ==, 0xbaadf00d); -g_assert_true(endptr == str); - /* No hex fractions */ str = "0x1.8k"; endptr = NULL; @@ -2863,6 +2848,33 @@ static void test_qemu_strtosz_trailing(void) err = qemu_strtosz(str, NULL, &res); g_assert_cmpint(err, ==, -EINVAL); g_assert_cmphex(res, ==, 0xbaadf00d); + +/* FIXME should stop parse after 'e'. No floating point exponents */ +str = "1.5e1k"; +endptr = NULL; +res = 0xbaadf00d; +err = qemu_strtosz(str, &endptr, &res); +g_assert_cmpint(err, ==, -EINVAL /* FIXME 0 */); +g_assert_cmphex(res, ==, 0xbaadf00d /* FIXME EiB * 1.5 */); +g_assert_true(endptr == str /* FIXME + 4 */); + +res = 0xbaadf00d; +err = qemu_strtosz(str, NULL, &res); +g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpint(res, ==, 0xbaadf00d); + +str = "1.5E+0k"; +endptr = NULL; +res = 0xbaadf00d; +err = qemu_strtosz(str, &endptr, &res); +g_assert_cmpint(err, ==, -EINVAL /* FIXME 0 */); +g_assert_cmphex(res, ==, 0xbaadf00d /* FIXME EiB * 1.5 */); +g_assert_true(endptr == str /* FIXME + 4 */); + +res = 0xbaadf00d; +err = qemu_strtosz(str, NULL, &res); +g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmphex(res, ==, 0xbaadf00d); } static void test_qemu_strtosz_erange(void) -- 2.40.1
[PATCH 10/11] cutils: Improve qemu_strtod* error paths
Previous patches changed all integral qemu_strto*() error paths to guarantee that *value is never left uninitialized. Do likewise for qemu_strtod. Also, tighten qemu_strtod_finite() to never return a non-finite value (prior to this patch, we were rejecting "inf" with -EINVAL and unspecified result 0.0, but failing "9e999" with -ERANGE and HUGE_VAL - which is infinite on IEEE machines - despite our function claiming to recognize only finite values). Auditing callers, we have no external callers of qemu_strtod, and among the callers of qemu_strtod_finite: - qapi/qobject-input-visitor.c:qobject_input_type_number_keyval() and qapi/string-input-visitor.c:parse_type_number() which reject all errors (does not matter what we store) - utils/cutils.c:do_strtosz() incorrectly assumes that *endptr points to '.' on all failures (that is, it is not distinguishing between EINVAL and ERANGE; and therefore still does the WRONG THING for "9.9e999". The change here does not fix that (a later patch will tackle this more systematically), but at least the value of endptr is less likely to be out of bounds on overflow - our testsuite, which we can update to match what we document Signed-off-by: Eric Blake --- tests/unit/test-cutils.c | 57 +--- util/cutils.c| 32 +- 2 files changed, 55 insertions(+), 34 deletions(-) diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c index 2cb33e41ae4..f781997aef7 100644 --- a/tests/unit/test-cutils.c +++ b/tests/unit/test-cutils.c @@ -2105,6 +2105,7 @@ static void test_qemu_strtod_einval(void) err = qemu_strtod(str, &endptr, &res); g_assert_cmpint(err, ==, -EINVAL); g_assert_cmpfloat(res, ==, 0.0); +g_assert_false(signbit(res)); g_assert_true(endptr == str); /* NULL */ @@ -2113,7 +2114,8 @@ static void test_qemu_strtod_einval(void) res = 999; err = qemu_strtod(str, &endptr, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmpfloat(res, ==, 999.0); +g_assert_cmpfloat(res, ==, 0.0); +g_assert_false(signbit(res)); g_assert_null(endptr); /* 
not recognizable */ @@ -2123,6 +2125,7 @@ static void test_qemu_strtod_einval(void) err = qemu_strtod(str, &endptr, &res); g_assert_cmpint(err, ==, -EINVAL); g_assert_cmpfloat(res, ==, 0.0); +g_assert_false(signbit(res)); g_assert_true(endptr == str); } @@ -2309,7 +2312,8 @@ static void test_qemu_strtod_finite_einval(void) res = 999; err = qemu_strtod_finite(str, &endptr, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmpfloat(res, ==, 999.0); +g_assert_cmpfloat(res, ==, 0.0); +g_assert_false(signbit(res)); g_assert_true(endptr == str); /* NULL */ @@ -2318,7 +2322,8 @@ static void test_qemu_strtod_finite_einval(void) res = 999; err = qemu_strtod_finite(str, &endptr, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmpfloat(res, ==, 999.0); +g_assert_cmpfloat(res, ==, 0.0); +g_assert_false(signbit(res)); g_assert_null(endptr); /* not recognizable */ @@ -2327,7 +2332,8 @@ static void test_qemu_strtod_finite_einval(void) res = 999; err = qemu_strtod_finite(str, &endptr, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmpfloat(res, ==, 999.0); +g_assert_cmpfloat(res, ==, 0.0); +g_assert_false(signbit(res)); g_assert_true(endptr == str); } @@ -2338,24 +2344,26 @@ static void test_qemu_strtod_finite_erange(void) int err; double res; -/* overflow */ +/* overflow turns into EINVAL */ str = "9e999"; endptr = NULL; res = 999; err = qemu_strtod_finite(str, &endptr, &res); -g_assert_cmpint(err, ==, -ERANGE); -g_assert_cmpfloat(res, ==, HUGE_VAL); -g_assert_true(endptr == str + 5); +g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpfloat(res, ==, 0.0); +g_assert_false(signbit(res)); +g_assert_true(endptr == str); str = "-9e+999"; endptr = NULL; res = 999; err = qemu_strtod_finite(str, &endptr, &res); -g_assert_cmpint(err, ==, -ERANGE); -g_assert_cmpfloat(res, ==, -HUGE_VAL); -g_assert_true(endptr == str + 7); +g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpfloat(res, ==, 0.0); +g_assert_false(signbit(res)); +g_assert_true(endptr == str); -/* underflow */ +/* 
underflow is still possible */ str = "-9e-999"; endptr = NULL; res = 999; @@ -2380,7 +2388,8 @@ static void test_qemu_strtod_finite_nonfinite(void) res = 999; err = qemu_strtod_finite(str, &endptr, &res); g_assert_cmpint(err, ==, -EINVAL); -g_assert_cmpfloat(res, ==, 999.0); +g_assert_cmpfloat(res, ==, 0.0); +g_assert_false(signbit(res)); g_assert_true(endptr == str); str = "-infinity"; @@ -2388,7 +2397,8 @@ static void test_qemu_strtod_finite_nonfinite(void) res = 999; err = qemu_strtod_finite(str, &endptr, &res);
[PATCH 06/11] test-cutils: Add more coverage to qemu_strtosz
Add some more strings that the user might send our way. In particular, some of these additions include FIXME comments showing where our parser doesn't quite behave the way we want. Signed-off-by: Eric Blake --- tests/unit/test-cutils.c | 226 +-- 1 file changed, 215 insertions(+), 11 deletions(-) diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c index afae2ee5331..9fa6fb042e8 100644 --- a/tests/unit/test-cutils.c +++ b/tests/unit/test-cutils.c @@ -2478,14 +2478,14 @@ static void test_qemu_strtosz_simple(void) g_assert_cmpuint(res, ==, 8); g_assert_true(endptr == str + 2); -/* Leading space is ignored */ -str = " 12345"; +/* Leading space and + are ignored */ +str = " +12345"; endptr = str; res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, 0); g_assert_cmpuint(res, ==, 12345); -g_assert_true(endptr == str + 6); +g_assert_true(endptr == str + 7); res = 0xbaadf00d; err = qemu_strtosz(str, NULL, &res); @@ -2564,13 +2564,13 @@ static void test_qemu_strtosz_hex(void) g_assert_cmpuint(res, ==, 171); g_assert_true(endptr == str + 4); -str = "0xae"; +str = " +0xae"; endptr = str; res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, 0); g_assert_cmpuint(res, ==, 174); -g_assert_true(endptr == str + 4); +g_assert_true(endptr == str + 6); } static void test_qemu_strtosz_units(void) @@ -2669,14 +2669,23 @@ static void test_qemu_strtosz_float(void) g_assert_cmpuint(res, ==, 1); g_assert_true(endptr == str + 4); -/* An empty fraction is tolerated */ -str = "1.k"; +/* An empty fraction tail is tolerated */ +str = " 1.k"; endptr = str; res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, 0); g_assert_cmpuint(res, ==, 1024); -g_assert_true(endptr == str + 3); +g_assert_true(endptr == str + 4); + +/* FIXME An empty fraction head should be tolerated */ +str = " .5k"; +endptr = str; +res = 0xbaadf00d; +err = qemu_strtosz(str, &endptr, &res); +g_assert_cmpint(err, ==, -EINVAL /* 
FIXME 0 */); +g_assert_cmpuint(res, ==, 0xbaadf00d /* FIXME 512 */); +g_assert_true(endptr == str /* FIXME + 4 */); /* For convenience, we permit values that are not byte-exact */ str = "12.345M"; @@ -2686,6 +2695,32 @@ static void test_qemu_strtosz_float(void) g_assert_cmpint(err, ==, 0); g_assert_cmpuint(res, ==, (uint64_t) (12.345 * MiB + 0.5)); g_assert_true(endptr == str + 7); + +/* FIXME Fraction tail should round correctly */ +str = "1.k"; +endptr = str; +res = 0xbaadf00d; +err = qemu_strtosz(str, &endptr, &res); +g_assert_cmpint(err, ==, 0); +g_assert_cmpint(res, ==, 1024 /* FIXME 2048 */); +g_assert_true(endptr == str + 55); + +/* FIXME ERANGE underflow in the fraction tail should not matter for 'k' */ +str = "1." +"00" +"00" +"00" +"00" +"00" +"00" +"00" +"1k"; +endptr = str; +res = 0xbaadf00d; +err = qemu_strtosz(str, &endptr, &res); +g_assert_cmpint(err, ==, 0); +g_assert_cmpuint(res, ==, 1 /* FIXME 1024 */); +g_assert_true(endptr == str + 354); } static void test_qemu_strtosz_invalid(void) @@ -2693,10 +2728,20 @@ static void test_qemu_strtosz_invalid(void) const char *str; const char *endptr; int err; -uint64_t res = 0xbaadf00d; +uint64_t res; + +/* Must parse at least one digit */ +str = NULL; +endptr = "somewhere"; +res = 0xbaadf00d; +err = qemu_strtosz(str, &endptr, &res); +g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmphex(res, ==, 0xbaadf00d); +g_assert_null(endptr); str = ""; endptr = NULL; +res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, -EINVAL); g_assert_cmphex(res, ==, 0xbaadf00d); @@ -2704,13 +2749,30 @@ static void test_qemu_strtosz_invalid(void) str = " \t "; endptr = NULL; +res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, -EINVAL); g_assert_cmphex(res, ==, 0xbaadf00d); g_assert_true(endptr == str); +str = "."; +endptr = NULL; +res = 0xbaadf00d; +err = qemu_strtosz(str, &endptr, &res); +g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmphex(res, ==, 0xbaadf00d); 
+g_assert(endptr == str); + +str = " ."; +endptr = N
[PATCH 07/11] numa: Check for qemu_strtosz_MiB error
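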
As shown in the previous commit, qemu_strtosz_MiB sometimes leaves the result value untouched (we have to audit further to learn that in that case, the QAPI generator says that visit_type_NumaOptions() will have zero-initialized it), and sometimes leaves it with the value of a partial parse before -EINVAL occurs because of trailing garbage. Rather than blindly treating any string the user may throw at us as valid, we should check for parse failures. Fixes: cc001888 ("numa: fixup parsed NumaNodeOptions earlier", v2.11.0) Signed-off-by: Eric Blake --- hw/core/numa.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/hw/core/numa.c b/hw/core/numa.c index d8d36b16d80..f08956ddb0f 100644 --- a/hw/core/numa.c +++ b/hw/core/numa.c @@ -531,10 +531,17 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp) /* Fix up legacy suffix-less format */ if ((object->type == NUMA_OPTIONS_TYPE_NODE) && object->u.node.has_mem) { const char *mem_str = qemu_opt_get(opts, "mem"); -qemu_strtosz_MiB(mem_str, NULL, &object->u.node.mem); +int ret = qemu_strtosz_MiB(mem_str, NULL, &object->u.node.mem); + +if (ret < 0) { +error_setg_errno(&err, -ret, "could not parse memory size '%s'", + mem_str); +} } -set_numa_options(ms, object, &err); +if (!err) { +set_numa_options(ms, object, &err); +} qapi_free_NumaOptions(object); if (err) { -- 2.40.1
[PATCH 11/11] cutils: Improve qemu_strtosz handling of fractions
We have several limitations and bugs worth fixing; they are inter-related enough that it is not worth splitting this patch into smaller pieces: * ".5k" should work to specify 512, just as "0.5k" does * "1.k" and "1." + "9"*50 + "k" should both produce the same result of 2048 after rounding * "1." + "0"*350 + "1B" should not be treated the same as "1.0B"; underflow in the fraction should not be lost * "7.99e99" and "7.99e999" look similar, but our code was doing a read-out-of-bounds on the latter because it was not expecting ERANGE due to overflow. While we document that scientific notation is not supported, and the previous patch actually fixed qemu_strtod_finite() to no longer return ERANGE overflows, it is easier to pre-filter than to try and determine after the fact if strtod() consumed more than we wanted. Note that this is a low-level semantic change (when endptr is not NULL, we can now successfully parse with a scale of 'E' and then report trailing junk, instead of failing outright with EINVAL); but an earlier commit already argued that this is not a high-level semantic change since the only caller passing in a non-NULL endptr also checks that the tail is whitespace-only. 
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1629 Signed-off-by: Eric Blake --- tests/unit/test-cutils.c | 51 +++ util/cutils.c| 89 2 files changed, 88 insertions(+), 52 deletions(-) diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c index f781997aef7..1fb9d5323ab 100644 --- a/tests/unit/test-cutils.c +++ b/tests/unit/test-cutils.c @@ -2693,14 +2693,14 @@ static void test_qemu_strtosz_float(void) g_assert_cmpuint(res, ==, 1024); g_assert_true(endptr == str + 4); -/* FIXME An empty fraction head should be tolerated */ +/* An empty fraction head is tolerated */ str = " .5k"; endptr = str; res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); -g_assert_cmpint(err, ==, -EINVAL /* FIXME 0 */); -g_assert_cmpuint(res, ==, 0 /* FIXME 512 */); -g_assert_true(endptr == str /* FIXME + 4 */); +g_assert_cmpint(err, ==, 0); +g_assert_cmpuint(res, ==, 512); +g_assert_true(endptr == str + 4); /* For convenience, we permit values that are not byte-exact */ str = "12.345M"; @@ -2711,16 +2711,16 @@ static void test_qemu_strtosz_float(void) g_assert_cmpuint(res, ==, (uint64_t) (12.345 * MiB + 0.5)); g_assert_true(endptr == str + 7); -/* FIXME Fraction tail should round correctly */ +/* Fraction tail can round up */ str = "1.k"; endptr = str; res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, 0); -g_assert_cmpint(res, ==, 1024 /* FIXME 2048 */); +g_assert_cmpuint(res, ==, 2048); g_assert_true(endptr == str + 55); -/* FIXME ERANGE underflow in the fraction tail should not matter for 'k' */ +/* ERANGE underflow in the fraction tail does not matter for 'k' */ str = "1." 
"00" "00" @@ -2734,7 +2734,7 @@ static void test_qemu_strtosz_float(void) res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, 0); -g_assert_cmpuint(res, ==, 1 /* FIXME 1024 */); +g_assert_cmpuint(res, ==, 1024); g_assert_true(endptr == str + 354); } @@ -2826,16 +2826,16 @@ static void test_qemu_strtosz_invalid(void) g_assert_cmpuint(res, ==, 0); g_assert_true(endptr == str); -/* FIXME Fraction tail can cause ERANGE overflow */ +/* Fraction tail can cause ERANGE overflow */ str = "15.E"; endptr = str; res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); -g_assert_cmpint(err, ==, 0 /* FIXME -ERANGE */); -g_assert_cmpuint(res, ==, 15ULL * EiB /* FIXME 0 */); -g_assert_true(endptr == str + 56 /* FIXME str */); +g_assert_cmpint(err, ==, -ERANGE); +g_assert_cmpuint(res, ==, 0); +g_assert_true(endptr == str + 56); -/* FIXME ERANGE underflow in the fraction tail should matter for 'B' */ +/* ERANGE underflow in the fraction tail matters for 'B' */ str = "1." "00" "00" @@ -2848,9 +2848,9 @@ static void test_qemu_strtosz_invalid(void) endptr = str; res = 0xbaadf00d; err = qemu_strtosz(str, &endptr, &res); -g_assert_cmpint(err, ==, 0 /* FIXME -EINVAL */); -g_assert_cmpuint(res, ==, 1 /* FIXME 0 */); -g_assert_true(endptr == str + 354 /* FIXME str */); +g_assert_cmpint(err, ==, -EINVAL); +g_assert_cmpuint(res, ==, 0); +g_assert_true(endptr == str); /* No hex fractions */ str = "
[PATCH 00/11] Fix qemu_strtosz() read-out-of-bounds
This series blew up in my face when Hanna first pointed me to https://gitlab.com/qemu-project/qemu/-/issues/1629 Basically, 'qemu-img dd bs=9.9e999' killed a sanitized build because of a read-out-of-bounds (".9e999" parses as infinity, but qemu_strtosz wasn't expecting ERANGE failure). The overall diffstat is big, mainly because the unit tests needed a LOT of work before I felt comfortable tweaking semantics in something that is so essential to command-line and QMP parsing. Eric Blake (11): test-cutils: Avoid g_assert in unit tests test-cutils: Use g_assert_cmpuint where appropriate test-cutils: Test integral qemu_strto* value on failures test-cutils: Add coverage of qemu_strtod test-cutils: Prepare for upcoming semantic change in qemu_strtosz test-cutils: Add more coverage to qemu_strtosz numa: Check for qemu_strtosz_MiB error cutils: Set value in all qemu_strtosz* error paths cutils: Set value in all integral qemu_strto* error paths cutils: Improve qemu_strtod* error paths cutils: Improve qemu_strtosz handling of fractions hw/core/numa.c | 11 +- tests/unit/test-cutils.c | 1213 ++ util/cutils.c| 180 -- 3 files changed, 1090 insertions(+), 314 deletions(-) base-commit: 792f77f376adef944f9a03e601f6ad90c2f891b2 -- 2.40.1
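The root cause described here — strtod() reporting overflow via ERANGE while still consuming the whole number — is easy to reproduce with plain libc. The helper below is a standalone sketch (not QEMU code) showing why a caller must check errno (or isfinite()) before trusting the value:

```c
#include <assert.h>
#include <errno.h>
#include <math.h>
#include <stdlib.h>

/* Plain-libc demonstration of the failure mode: on overflow, strtod
 * still consumes the entire number, returns HUGE_VAL, and reports the
 * problem only through errno == ERANGE.  A caller that checks neither
 * errno nor isfinite() will carry on with an "infinite" size. */
static int strtod_overflows(const char *s)
{
    char *end;
    errno = 0;
    double v = strtod(s, &end);
    return end != s && errno == ERANGE && v == HUGE_VAL;
}
```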
Re: [PATCH v20 10/21] machine: adding s390 topology to info hotpluggable-cpus
On Tue, 2023-04-25 at 18:14 +0200, Pierre Morel wrote: > S390 topology adds books and drawers topology containers. > Let's add these to the HMP information for hotpluggable cpus. > > Signed-off-by: Pierre Morel Reviewed-by: Nina Schoetterl-Glausch if you fix the nits below. > --- > hw/core/machine-hmp-cmds.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c > index c3e55ef9e9..971212242d 100644 > --- a/hw/core/machine-hmp-cmds.c > +++ b/hw/core/machine-hmp-cmds.c > @@ -71,6 +71,12 @@ void hmp_hotpluggable_cpus(Monitor *mon, const QDict > *qdict) > if (c->has_node_id) { > monitor_printf(mon, "node-id: \"%" PRIu64 "\"\n", > c->node_id); > } > +if (c->has_drawer_id) { > +monitor_printf(mon, "drawer_id: \"%" PRIu64 "\"\n", > c->drawer_id); use - instead here ^ unless there is some reason to be inconsistent. > +} > +if (c->has_book_id) { > +monitor_printf(mon, " book_id: \"%" PRIu64 "\"\n", > c->book_id); Same here. > +} > if (c->has_socket_id) { > monitor_printf(mon, "socket-id: \"%" PRIu64 "\"\n", > c->socket_id); > }
Re: [PATCH v20 08/21] qapi/s390x/cpu topology: set-cpu-topology qmp command
On Tue, 2023-04-25 at 18:14 +0200, Pierre Morel wrote: > The modification of the CPU attributes are done through a monitor > command. > > It allows to move the core inside the topology tree to optimize > the cache usage in the case the host's hypervisor previously > moved the CPU. > > The same command allows to modify the CPU attributes modifiers > like polarization entitlement and the dedicated attribute to notify > the guest if the host admin modified scheduling or dedication of a vCPU. > > With this knowledge the guest has the possibility to optimize the > usage of the vCPUs. > > The command has a feature unstable for the moment. > > Signed-off-by: Pierre Morel Logic is sound, minor stuff below. > --- > qapi/machine-target.json | 37 +++ > hw/s390x/cpu-topology.c | 136 +++ > 2 files changed, 173 insertions(+) > > diff --git a/qapi/machine-target.json b/qapi/machine-target.json > index 42a6a40333..3b7a0b77f4 100644 > --- a/qapi/machine-target.json > +++ b/qapi/machine-target.json > @@ -4,6 +4,8 @@ > # This work is licensed under the terms of the GNU GPL, version 2 or later. > # See the COPYING file in the top-level directory. > > +{ 'include': 'machine-common.json' } > + > ## > # @CpuModelInfo: > # > @@ -354,3 +356,38 @@ > { 'enum': 'CpuS390Polarization', >'prefix': 'S390_CPU_POLARIZATION', >'data': [ 'horizontal', 'vertical' ] } > + > +## > +# @set-cpu-topology: > +# > +# @core-id: the vCPU ID to be moved > +# @socket-id: optional destination socket where to move the vCPU > +# @book-id: optional destination book where to move the vCPU > +# @drawer-id: optional destination drawer where to move the vCPU > +# @entitlement: optional entitlement > +# @dedicated: optional, if the vCPU is dedicated to a real CPU > +# > +# Features: > +# @unstable: This command may still be modified. > +# > +# Modifies the topology by moving the CPU inside the topology > +# tree or by changing a modifier attribute of a CPU. 
> +# Default value for optional parameter is the current value > +# used by the CPU. > +# > +# Returns: Nothing on success, the reason on failure. > +# > +# Since: 8.1 > +## > +{ 'command': 'set-cpu-topology', > + 'data': { > + 'core-id': 'uint16', > + '*socket-id': 'uint16', > + '*book-id': 'uint16', > + '*drawer-id': 'uint16', > + '*entitlement': 'CpuS390Entitlement', > + '*dedicated': 'bool' > + }, > + 'features': [ 'unstable' ], > + 'if': { 'all': [ 'TARGET_S390X' , 'CONFIG_KVM' ] } > +} > diff --git a/hw/s390x/cpu-topology.c b/hw/s390x/cpu-topology.c > index d9cd3dc3ce..e5fb976594 100644 > --- a/hw/s390x/cpu-topology.c > +++ b/hw/s390x/cpu-topology.c > @@ -16,6 +16,7 @@ > #include "target/s390x/cpu.h" > #include "hw/s390x/s390-virtio-ccw.h" > #include "hw/s390x/cpu-topology.h" > +#include "qapi/qapi-commands-machine-target.h" > > /* > * s390_topology is used to keep the topology information. > @@ -261,6 +262,27 @@ static bool s390_topology_check(uint16_t socket_id, > uint16_t book_id, > return true; > } > > +/** > + * s390_topology_need_report > + * @cpu: Current cpu > + * @drawer_id: future drawer ID > + * @book_id: future book ID > + * @socket_id: future socket ID Entitlement and dedicated are missing here. > + * > + * A modified topology change report is needed if the topology > + * tree or the topology attributes change. > + */ > +static int s390_topology_need_report(S390CPU *cpu, int drawer_id, I'd prefer a bool return type. 
> + int book_id, int socket_id, > + uint16_t entitlement, bool dedicated) > +{ > +return cpu->env.drawer_id != drawer_id || > + cpu->env.book_id != book_id || > + cpu->env.socket_id != socket_id || > + cpu->env.entitlement != entitlement || > + cpu->env.dedicated != dedicated; > +} > + > /** > * s390_update_cpu_props: > * @ms: the machine state > @@ -330,3 +352,117 @@ void s390_topology_setup_cpu(MachineState *ms, S390CPU > *cpu, Error **errp) > /* topology tree is reflected in props */ > s390_update_cpu_props(ms, cpu); > } > + > +static void s390_change_topology(uint16_t core_id, > + bool has_socket_id, uint16_t socket_id, > + bool has_book_id, uint16_t book_id, > + bool has_drawer_id, uint16_t drawer_id, > + bool has_entitlement, uint16_t entitlement, I would keep the enum type for entitlement. > + bool has_dedicated, bool dedicated, > + Error **errp) > +{ > +MachineState *ms = current_machine; > +int old_socket_entry; > +int new_socket_entry; > +int report_needed; > +S390CPU *cpu; > +ERRP_GUARD(); > + > +if (core_id >= ms->smp.max_cpus) { > +error_setg(errp, "Core-id %d out of range!", core_id); >
Re: [PATCH] hw/net: Move xilinx_ethlite.c to the target-independent source set
On [2023 May 08] Mon 14:03:14, Thomas Huth wrote: > Now that the tswap() functions are available for target-independent > code, too, we can move xilinx_ethlite.c from specific_ss to softmmu_ss > to avoid that we have to compile this file multiple times. > > Signed-off-by: Thomas Huth Reviewed-by: Francisco Iglesias > --- > hw/net/xilinx_ethlite.c | 2 +- > hw/net/meson.build | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/hw/net/xilinx_ethlite.c b/hw/net/xilinx_ethlite.c > index 99c22819ea..89f4f3b254 100644 > --- a/hw/net/xilinx_ethlite.c > +++ b/hw/net/xilinx_ethlite.c > @@ -25,7 +25,7 @@ > #include "qemu/osdep.h" > #include "qemu/module.h" > #include "qom/object.h" > -#include "cpu.h" /* FIXME should not use tswap* */ > +#include "exec/tswap.h" > #include "hw/sysbus.h" > #include "hw/irq.h" > #include "hw/qdev-properties.h" > diff --git a/hw/net/meson.build b/hw/net/meson.build > index e2be0654a1..a7860c5efe 100644 > --- a/hw/net/meson.build > +++ b/hw/net/meson.build > @@ -43,7 +43,7 @@ softmmu_ss.add(when: 'CONFIG_NPCM7XX', if_true: > files('npcm7xx_emc.c')) > softmmu_ss.add(when: 'CONFIG_ETRAXFS', if_true: files('etraxfs_eth.c')) > softmmu_ss.add(when: 'CONFIG_COLDFIRE', if_true: files('mcf_fec.c')) > specific_ss.add(when: 'CONFIG_PSERIES', if_true: files('spapr_llan.c')) > -specific_ss.add(when: 'CONFIG_XILINX_ETHLITE', if_true: > files('xilinx_ethlite.c')) > +softmmu_ss.add(when: 'CONFIG_XILINX_ETHLITE', if_true: > files('xilinx_ethlite.c')) > > softmmu_ss.add(when: 'CONFIG_VIRTIO_NET', if_true: files('net_rx_pkt.c')) > specific_ss.add(when: 'CONFIG_VIRTIO_NET', if_true: files('virtio-net.c')) > -- > 2.31.1 > >
Re: [PATCH 0/4] vhost-user-fs: Internal migration
On Mon, May 8, 2023 at 7:51 PM Eugenio Perez Martin wrote: > > On Mon, May 8, 2023 at 7:00 PM Hanna Czenczek wrote: > > > > On 05.05.23 16:37, Hanna Czenczek wrote: > > > On 05.05.23 16:26, Eugenio Perez Martin wrote: > > >> On Fri, May 5, 2023 at 11:51 AM Hanna Czenczek > > >> wrote: > > >>> (By the way, thanks for the explanations :)) > > >>> > > >>> On 05.05.23 11:03, Hanna Czenczek wrote: > > On 04.05.23 23:14, Stefan Hajnoczi wrote: > > >>> [...] > > >>> > > > I think it's better to change QEMU's vhost code > > > to leave stateful devices suspended (but not reset) across > > > vhost_dev_stop() -> vhost_dev_start(), maybe by introducing > > > vhost_dev_suspend() and vhost_dev_resume(). Have you thought about > > > this aspect? > > Yes and no; I mean, I haven’t in detail, but I thought this is what’s > > meant by suspending instead of resetting when the VM is stopped. > > >>> So, now looking at vhost_dev_stop(), one problem I can see is that > > >>> depending on the back-end, different operations it does will do > > >>> different things. > > >>> > > >>> It tries to stop the whole device via vhost_ops->vhost_dev_start(), > > >>> which for vDPA will suspend the device, but for vhost-user will > > >>> reset it > > >>> (if F_STATUS is there). > > >>> > > >>> It disables all vrings, which doesn’t mean stopping, but may be > > >>> necessary, too. (I haven’t yet really understood the use of disabled > > >>> vrings, I heard that virtio-net would have a need for it.) > > >>> > > >>> It then also stops all vrings, though, so that’s OK. And because this > > >>> will always do GET_VRING_BASE, this is actually always the same > > >>> regardless of transport. > > >>> > > >>> Finally (for this purpose), it resets the device status via > > >>> vhost_ops->vhost_reset_status(). This is only implemented on vDPA, and > > >>> this is what resets the device there. 
> > >>> > > >>> > > >>> So vhost-user resets the device in .vhost_dev_start, but vDPA only does > > >>> so in .vhost_reset_status. It would seem better to me if vhost-user > > >>> would also reset the device only in .vhost_reset_status, not in > > >>> .vhost_dev_start. .vhost_dev_start seems precisely like the place to > > >>> run SUSPEND/RESUME. > > >>> > > >> I think the same. I just saw It's been proposed at [1]. > > >> > > >>> Another question I have (but this is basically what I wrote in my last > > >>> email) is why we even call .vhost_reset_status here. If the device > > >>> and/or all of the vrings are already stopped, why do we need to reset > > >>> it? Naïvely, I had assumed we only really need to reset the device if > > >>> the guest changes, so that a new guest driver sees a freshly > > >>> initialized > > >>> device. > > >>> > > >> I don't know why we didn't need to call it :). I'm assuming the > > >> previous vhost-user net did fine resetting vq indexes, using > > >> VHOST_USER_SET_VRING_BASE. But I don't know about more complex > > >> devices. > > >> > > >> The guest can reset the device, or write 0 to the PCI config status, > > >> at any time. How does virtiofs handle it, being stateful? > > > > > > Honestly a good question because virtiofsd implements neither > > > SET_STATUS nor RESET_DEVICE. I’ll have to investigate that. > > > > > > I think when the guest resets the device, SET_VRING_BASE always comes > > > along some way or another, so that’s how the vrings are reset. Maybe > > > the internal state is reset only following more high-level FUSE > > > commands like INIT. > > > > So a meeting and one session of looking-into-the-code later: > > > > We reset every virt queue on GET_VRING_BASE, which is wrong, but happens > > to serve the purpose. (German is currently on that.) > > > > In our meeting, German said the reset would occur when the memory > > regions are changed, but I can’t see that in the code. 
> > That would imply that the status is reset when the guest's memory is > added or removed? > > > I think it only > > happens implicitly through the SET_VRING_BASE call, which resets the > > internal avail/used pointers. > > > > [This doesn’t seem different from libvhost-user, though, which > > implements neither SET_STATUS nor RESET_DEVICE, and which pretends to > > reset the device on RESET_OWNER, but really doesn’t (its > > vu_reset_device_exec() function just disables all vrings, doesn’t reset > > or even stop them).] > > > > Consequently, the internal state is never reset. It would be cleared on > > a FUSE Destroy message, but if you just force-reset the system, the > > state remains into the next reboot. Not even FUSE Init clears it, which > > seems weird. It happens to work because it’s still the same filesystem, > > so the existing state fits, but it kind of seems dangerous to keep e.g. > > files open. I don’t think it’s really exploitable because everything > > still goes through the guest kernel, but, well. We should clear the > > state on Init, and probably also implement SET_STATUS
[PATCH v2 6/6] multifd: Add colo support
Like in the normal ram_load() path, put the received pages into the colo cache and mark the pages in the bitmap so that they will be flushed to the guest later. Signed-off-by: Lukas Straub --- migration/multifd-colo.c | 30 +- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/migration/multifd-colo.c b/migration/multifd-colo.c index c035d15e87..305a1b7000 100644 --- a/migration/multifd-colo.c +++ b/migration/multifd-colo.c @@ -15,13 +15,41 @@ #include "ram.h" #include "multifd.h" #include "io/channel-socket.h" +#include "migration/colo.h" #define MULTIFD_INTERNAL #include "multifd-internal.h" static int multifd_colo_recv_pages(MultiFDRecvParams *p, Error **errp) { -return multifd_recv_state->ops->recv_pages(p, errp); +int ret = 0; + +/* + * While we're still in precopy mode, we copy received pages to both guest + * and cache. No need to set dirty bits, since guest and cache memory are + * in sync. + */ +if (migration_incoming_in_colo_state()) { +colo_record_bitmap(p->block, p->normal, p->normal_num); +} + +p->host = p->block->colo_cache; +ret = multifd_recv_state->ops->recv_pages(p, errp); +if (ret != 0) { +p->host = p->block->host; +return ret; +} + +if (!migration_incoming_in_colo_state()) { +for (int i = 0; i < p->normal_num; i++) { +void *guest = p->block->host + p->normal[i]; +void *cache = p->host + p->normal[i]; +memcpy(guest, cache, p->page_size); +} +} + +p->host = p->block->host; +return ret; } int multifd_colo_load_setup(Error **errp) -- 2.39.2
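For reference, the precopy branch in the patch above boils down to copying each received page from the colo cache into guest memory at the same offset, keeping the two in sync. A simplified standalone sketch of that loop (plain pointers instead of QEMU's RAMBlock/MultiFDRecvParams, names invented):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of the precopy path: each received page lives in
 * the cache buffer, and is also copied to guest memory at the same
 * offset so both copies stay identical. */
static void sync_cache_to_guest(uint8_t *guest, const uint8_t *cache,
                                const uint64_t *offsets, size_t n,
                                size_t page_size)
{
    for (size_t i = 0; i < n; i++) {
        memcpy(guest + offsets[i], cache + offsets[i], page_size);
    }
}
```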
Re: [PATCH 2/4] vhost-user: Interface for migration state transfer
On Thu, Apr 20, 2023 at 03:27:51PM +0200, Eugenio Pérez wrote: > On Tue, 2023-04-18 at 16:40 -0400, Stefan Hajnoczi wrote: > > On Tue, 18 Apr 2023 at 14:31, Eugenio Perez Martin > > wrote: > > > On Tue, Apr 18, 2023 at 7:59 PM Stefan Hajnoczi > > > wrote: > > > > On Tue, Apr 18, 2023 at 10:09:30AM +0200, Eugenio Perez Martin wrote: > > > > > On Mon, Apr 17, 2023 at 9:33 PM Stefan Hajnoczi > > > > > wrote: > > > > > > On Mon, 17 Apr 2023 at 15:10, Eugenio Perez Martin < > > > > > > epere...@redhat.com> wrote: > > > > > > > On Mon, Apr 17, 2023 at 5:38 PM Stefan Hajnoczi > > > > > > > > > > > > > > wrote: > > > > > > > > On Thu, Apr 13, 2023 at 12:14:24PM +0200, Eugenio Perez Martin > > > > > > > > wrote: > > > > > > > > > On Wed, Apr 12, 2023 at 11:06 PM Stefan Hajnoczi < > > > > > > > > > stefa...@redhat.com> wrote: > > > > > > > > > > On Tue, Apr 11, 2023 at 05:05:13PM +0200, Hanna Czenczek > > > > > > > > > > wrote: > > > > > > > > > > > So-called "internal" virtio-fs migration refers to > > > > > > > > > > > transporting the > > > > > > > > > > > back-end's (virtiofsd's) state through qemu's migration > > > > > > > > > > > stream. To do > > > > > > > > > > > this, we need to be able to transfer virtiofsd's internal > > > > > > > > > > > state to and > > > > > > > > > > > from virtiofsd. > > > > > > > > > > > > > > > > > > > > > > Because virtiofsd's internal state will not be too large, > > > > > > > > > > > we > > > > > > > > > > > believe it > > > > > > > > > > > is best to transfer it as a single binary blob after the > > > > > > > > > > > streaming > > > > > > > > > > > phase. Because this method should be useful to other > > > > > > > > > > > vhost- > > > > > > > > > > > user > > > > > > > > > > > implementations, too, it is introduced as a > > > > > > > > > > > general-purpose > > > > > > > > > > > addition to > > > > > > > > > > > the protocol, not limited to vhost-user-fs. 
> > > > > > > > > > > > > > > > > > > > > > These are the additions to the protocol: > > > > > > > > > > > - New vhost-user protocol feature > > > > > > > > > > > VHOST_USER_PROTOCOL_F_MIGRATORY_STATE: > > > > > > > > > > > This feature signals support for transferring state, and > > > > > > > > > > > is added so > > > > > > > > > > > that migration can fail early when the back-end has no > > > > > > > > > > > support. > > > > > > > > > > > > > > > > > > > > > > - SET_DEVICE_STATE_FD function: Front-end and back-end > > > > > > > > > > > negotiate a pipe > > > > > > > > > > > over which to transfer the state. The front-end sends > > > > > > > > > > > an > > > > > > > > > > > FD to the > > > > > > > > > > > back-end into/from which it can write/read its state, > > > > > > > > > > > and > > > > > > > > > > > the back-end > > > > > > > > > > > can decide to either use it, or reply with a different > > > > > > > > > > > FD > > > > > > > > > > > for the > > > > > > > > > > > front-end to override the front-end's choice. > > > > > > > > > > > The front-end creates a simple pipe to transfer the > > > > > > > > > > > state, > > > > > > > > > > > but maybe > > > > > > > > > > > the back-end already has an FD into/from which it has to > > > > > > > > > > > write/read > > > > > > > > > > > its state, in which case it will want to override the > > > > > > > > > > > simple pipe. > > > > > > > > > > > Conversely, maybe in the future we find a way to have > > > > > > > > > > > the > > > > > > > > > > > front-end > > > > > > > > > > > get an immediate FD for the migration stream (in some > > > > > > > > > > > cases), in which > > > > > > > > > > > case we will want to send this to the back-end instead > > > > > > > > > > > of > > > > > > > > > > > creating a > > > > > > > > > > > pipe. > > > > > > > > > > > Hence the negotiation: If one side has a better idea > > > > > > > > > > > than > > > > > > > > > > > a plain > > > > > > > > > > > pipe, we will want to use that. 
> > > > > > > > > > > > > > > > > > > > > > - CHECK_DEVICE_STATE: After the state has been transferred > > > > > > > > > > > through the > > > > > > > > > > > pipe (the end indicated by EOF), the front-end invokes > > > > > > > > > > > this function > > > > > > > > > > > to verify success. There is no in-band way (through the > > > > > > > > > > > pipe) to > > > > > > > > > > > indicate failure, so we need to check explicitly. > > > > > > > > > > > > > > > > > > > > > > Once the transfer pipe has been established via > > > > > > > > > > > SET_DEVICE_STATE_FD > > > > > > > > > > > (which includes establishing the direction of transfer and > > > > > > > > > > > migration > > > > > > > > > > > phase), the sending side writes its data into the pipe, > > > > > > > > > > > and > > > > > > > > > > > the reading > > > > > > > > > > > side reads it until it sees an EOF. Then, the front-end > > > > > > > > > > > will check for > > > > > > > > > > > success via CHECK_DEVICE_STATE, which on the destination > > > > > > > > > > >
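The pipe-based mechanism quoted above — the sender streams its state and signals completion by closing its end, the reader consumes until EOF, and success is checked out of band afterwards (CHECK_DEVICE_STATE) — can be sketched in plain POSIX C. This is a toy single-process model, not the vhost-user implementation:

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Toy model of the state transfer: the sender writes its whole state
 * into a pipe and closes the write end; the receiver reads until
 * read() returns 0 (EOF), the only in-band end-of-state signal.
 * Returns the number of bytes received, or -1 on error.  A real
 * implementation would have sender and receiver in separate
 * processes; small states fit in the pipe buffer here. */
static ssize_t transfer_state(const char *state, size_t len,
                              char *out, size_t out_cap)
{
    int fds[2];
    if (pipe(fds) < 0) {
        return -1;
    }
    if (write(fds[1], state, len) != (ssize_t)len) {
        close(fds[1]);
        close(fds[0]);
        return -1;
    }
    close(fds[1]);               /* sender done -> reader will see EOF */

    size_t got = 0;
    ssize_t n;
    while (got < out_cap && (n = read(fds[0], out + got, out_cap - got)) > 0) {
        got += (size_t)n;
    }
    close(fds[0]);
    return (ssize_t)got;
}
```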
[PATCH v2 3/6] multifd: Introduce multifd-internal.h
Introduce multifd-internal.h so code that would normally go into multifd.c can go into an extra file. This way, multifd.c hopefully won't grow to 4000 lines like ram.c This will be used in the next commits to add colo support to multifd. Signed-off-by: Lukas Straub --- migration/multifd-internal.h | 34 ++ migration/multifd.c | 15 --- 2 files changed, 38 insertions(+), 11 deletions(-) create mode 100644 migration/multifd-internal.h diff --git a/migration/multifd-internal.h b/migration/multifd-internal.h new file mode 100644 index 00..6eeaa028e7 --- /dev/null +++ b/migration/multifd-internal.h @@ -0,0 +1,34 @@ +/* + * Internal Multifd header + * + * Copyright (c) 2019-2020 Red Hat Inc + * + * Authors: + * Juan Quintela + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#ifdef QEMU_MIGRATION_MULTIFD_INTERNAL_H +#error Only include this header directly +#endif +#define QEMU_MIGRATION_MULTIFD_INTERNAL_H + +#ifndef MULTIFD_INTERNAL +#error This header is internal to multifd +#endif + +struct MultiFDRecvState { +MultiFDRecvParams *params; +/* number of created threads */ +int count; +/* syncs main thread and channels */ +QemuSemaphore sem_sync; +/* global number of generated multifd packets */ +uint64_t packet_num; +/* multifd ops */ +MultiFDMethods *ops; +}; + +extern struct MultiFDRecvState *multifd_recv_state; diff --git a/migration/multifd.c b/migration/multifd.c index 4e71c19292..f6bad69b6c 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -31,6 +31,9 @@ #include "io/channel-socket.h" #include "yank_functions.h" +#define MULTIFD_INTERNAL +#include "multifd-internal.h" + /* Multiple fd's */ #define MULTIFD_MAGIC 0x11223344U @@ -967,17 +970,7 @@ int multifd_save_setup(Error **errp) return 0; } -struct { -MultiFDRecvParams *params; -/* number of created threads */ -int count; -/* syncs main thread and channels */ -QemuSemaphore sem_sync; -/* global number of generated 
multifd packets */ -uint64_t packet_num; /* multifd ops */ -MultiFDMethods *ops; -} *multifd_recv_state; +struct MultiFDRecvState *multifd_recv_state; static void multifd_recv_terminate_threads(Error *err) { -- 2.39.2
[PATCH v2 4/6] multifd: Introduce an overridable recv_pages method
This allows to override the behaviour around recv_pages. Think of it like a "multifd_colo" child class of multifd. This will be used in the next commits to add colo support to multifd. Signed-off-by: Lukas Straub --- migration/meson.build| 1 + migration/multifd-colo.c | 39 + migration/multifd-internal.h | 5 migration/multifd.c | 48 4 files changed, 83 insertions(+), 10 deletions(-) create mode 100644 migration/multifd-colo.c diff --git a/migration/meson.build b/migration/meson.build index da1897fadf..22ab6c6d73 100644 --- a/migration/meson.build +++ b/migration/meson.build @@ -23,6 +23,7 @@ softmmu_ss.add(files( 'migration.c', 'multifd.c', 'multifd-zlib.c', + 'multifd-colo.c', 'options.c', 'postcopy-ram.c', 'savevm.c', diff --git a/migration/multifd-colo.c b/migration/multifd-colo.c new file mode 100644 index 00..c035d15e87 --- /dev/null +++ b/migration/multifd-colo.c @@ -0,0 +1,39 @@ +/* + * multifd colo implementation + * + * Copyright (c) Lukas Straub + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. 
+ */ + +#include "qemu/osdep.h" +#include "exec/target_page.h" +#include "exec/ramblock.h" +#include "qemu/error-report.h" +#include "qapi/error.h" +#include "ram.h" +#include "multifd.h" +#include "io/channel-socket.h" + +#define MULTIFD_INTERNAL +#include "multifd-internal.h" + +static int multifd_colo_recv_pages(MultiFDRecvParams *p, Error **errp) +{ +return multifd_recv_state->ops->recv_pages(p, errp); +} + +int multifd_colo_load_setup(Error **errp) +{ +int ret; + +ret = _multifd_load_setup(errp); +if (ret) { +return ret; +} + +multifd_recv_state->recv_pages = multifd_colo_recv_pages; + +return 0; +} diff --git a/migration/multifd-internal.h b/migration/multifd-internal.h index 6eeaa028e7..82357f1d88 100644 --- a/migration/multifd-internal.h +++ b/migration/multifd-internal.h @@ -29,6 +29,11 @@ struct MultiFDRecvState { uint64_t packet_num; /* multifd ops */ MultiFDMethods *ops; +/* overridable recv method */ +int (*recv_pages)(MultiFDRecvParams *p, Error **errp); }; extern struct MultiFDRecvState *multifd_recv_state; + +int _multifd_load_setup(Error **errp); +int multifd_colo_load_setup(Error **errp); diff --git a/migration/multifd.c b/migration/multifd.c index f6bad69b6c..fb5e8859de 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -1126,7 +1126,7 @@ static void *multifd_recv_thread(void *opaque) qemu_mutex_unlock(&p->mutex); if (p->normal_num) { -ret = multifd_recv_state->ops->recv_pages(p, &local_err); +ret = multifd_recv_state->recv_pages(p, &local_err); if (ret != 0) { break; } @@ -1152,20 +1152,12 @@ static void *multifd_recv_thread(void *opaque) return NULL; } -int multifd_load_setup(Error **errp) +int _multifd_load_setup(Error **errp) { int thread_count; uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size(); uint8_t i; -/* - * Return successfully if multiFD recv state is already initialised - * or multiFD is not enabled. 
- */ - -if (multifd_recv_state || !migrate_multifd()) { -return 0; -} - thread_count = migrate_multifd_channels(); multifd_recv_state = g_malloc0(sizeof(*multifd_recv_state)); multifd_recv_state->params = g_new0(MultiFDRecvParams, thread_count); @@ -1204,6 +1196,42 @@ int multifd_load_setup(Error **errp) return 0; } +static int multifd_normal_recv_pages(MultiFDRecvParams *p, Error **errp) +{ +return multifd_recv_state->ops->recv_pages(p, errp); +} + +static int multifd_normal_load_setup(Error **errp) +{ +int ret; + +ret = _multifd_load_setup(errp); +if (ret) { +return ret; +} + +multifd_recv_state->recv_pages = multifd_normal_recv_pages; + +return 0; +} + +int multifd_load_setup(Error **errp) +{ +/* + * Return successfully if multiFD recv state is already initialised + * or multiFD is not enabled. + */ +if (multifd_recv_state || !migrate_multifd()) { +return 0; +} + +if (migrate_colo()) { +return multifd_colo_load_setup(errp); +} else { +return multifd_normal_load_setup(errp); +} +} + bool multifd_recv_all_channels_created(void) { int thread_count = migrate_multifd_channels(); -- 2.39.2
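The "child class" idea in this patch is plain C function-pointer dispatch: the receive state carries a recv_pages hook, and setup points it at either the normal implementation or a colo wrapper that does extra work before delegating. A minimal standalone sketch of the pattern (all names invented, not QEMU's structures):

```c
#include <assert.h>

/* Minimal override pattern: the state struct holds a recv_pages
 * function pointer; setup installs either the normal implementation
 * or a colo variant that adds bookkeeping and then delegates. */
typedef struct RecvState RecvState;
struct RecvState {
    int (*recv_pages)(RecvState *s, int n);
    int pages_received;
    int colo_records;
};

static int normal_recv_pages(RecvState *s, int n)
{
    s->pages_received += n;
    return 0;
}

static int colo_recv_pages(RecvState *s, int n)
{
    s->colo_records += n;            /* colo-specific extra work */
    return normal_recv_pages(s, n);  /* then the shared path */
}

static void load_setup(RecvState *s, int use_colo)
{
    s->pages_received = 0;
    s->colo_records = 0;
    s->recv_pages = use_colo ? colo_recv_pages : normal_recv_pages;
}
```

Callers always go through `s->recv_pages(...)`, so the dispatch point stays in one place regardless of which variant was installed.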
[PATCH v2 1/6] ram: Add public helper to set colo bitmap
The overhead of the mutex in non-multifd mode is negligible, because in that case it's just the single thread taking the mutex. This will be used in the next commits to add colo support to multifd. Signed-off-by: Lukas Straub --- migration/ram.c | 17 ++--- migration/ram.h | 1 + 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 5e7bf20ca5..2d3fd2112a 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3633,6 +3633,18 @@ static ram_addr_t host_page_offset_from_ram_block_offset(RAMBlock *block, return ((uintptr_t)block->host + offset) & (block->page_size - 1); } +void colo_record_bitmap(RAMBlock *block, ram_addr_t *normal, uint normal_num) +{ +qemu_mutex_lock(&ram_state->bitmap_mutex); +for (int i = 0; i < normal_num; i++) { +ram_addr_t offset = normal[i]; +ram_state->migration_dirty_pages += !test_and_set_bit( +offset >> TARGET_PAGE_BITS, +block->bmap); +} +qemu_mutex_unlock(&ram_state->bitmap_mutex); +} + static inline void *colo_cache_from_block_offset(RAMBlock *block, ram_addr_t offset, bool record_bitmap) { @@ -3650,9 +3662,8 @@ static inline void *colo_cache_from_block_offset(RAMBlock *block, * It help us to decide which pages in ram cache should be flushed * into VM's RAM later. */ -if (record_bitmap && -!test_and_set_bit(offset >> TARGET_PAGE_BITS, block->bmap)) { -ram_state->migration_dirty_pages++; +if (record_bitmap) { +colo_record_bitmap(block, &offset, 1); } return block->colo_cache + offset; } diff --git a/migration/ram.h b/migration/ram.h index 6fffbeb5f1..887d1fbae6 100644 --- a/migration/ram.h +++ b/migration/ram.h @@ -82,6 +82,7 @@ int colo_init_ram_cache(void); void colo_flush_ram_cache(void); void colo_release_ram_cache(void); void colo_incoming_start_dirty_log(void); +void colo_record_bitmap(RAMBlock *block, ram_addr_t *normal, uint normal_num); /* Background snapshot */ bool ram_write_tracking_available(void); -- 2.39.2
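The counting trick in colo_record_bitmap relies on test_and_set_bit returning the bit's previous value, so a page already marked dirty is not counted twice. A simplified, lock-free standalone sketch of the same idiom (invented helper names, not the kernel/QEMU bitops implementation):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified dirty accounting: test_and_set_bit_simple returns the
 * bit's previous value, so record_dirty only counts pages that were
 * not already marked -- the same "+= !test_and_set_bit(...)" idiom as
 * the patch above, minus the mutex. */
#define PAGE_SHIFT 12
#define BITS_PER_LONG (8 * sizeof(unsigned long))

static int test_and_set_bit_simple(size_t nr, unsigned long *map)
{
    unsigned long mask = 1UL << (nr % BITS_PER_LONG);
    int old = (map[nr / BITS_PER_LONG] & mask) != 0;
    map[nr / BITS_PER_LONG] |= mask;
    return old;
}

static uint64_t record_dirty(unsigned long *bmap, const uint64_t *offsets,
                             size_t n)
{
    uint64_t newly_dirty = 0;
    for (size_t i = 0; i < n; i++) {
        newly_dirty += !test_and_set_bit_simple(offsets[i] >> PAGE_SHIFT,
                                                bmap);
    }
    return newly_dirty;
}
```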
[PATCH v2 2/6] ram: Let colo_flush_ram_cache take the bitmap_mutex
This is not strictly required: colo_flush_ram_cache does not run concurrently with the multifd threads, since the cache is only flushed after everything has been received. But it makes me more comfortable. This will be used in the next commits to add colo support to multifd. Signed-off-by: Lukas Straub --- migration/ram.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index 2d3fd2112a..f9e7aeda12 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4230,6 +4230,7 @@ void colo_flush_ram_cache(void) unsigned long offset = 0; memory_global_dirty_log_sync(); +qemu_mutex_lock(&ram_state->bitmap_mutex); WITH_RCU_READ_LOCK_GUARD() { RAMBLOCK_FOREACH_NOT_IGNORED(block) { ramblock_sync_dirty_bitmap(ram_state, block); @@ -4264,6 +4265,7 @@ void colo_flush_ram_cache(void) } } } +qemu_mutex_unlock(&ram_state->bitmap_mutex); trace_colo_flush_ram_cache_end(); } -- 2.39.2
[PATCH v2 5/6] multifd: Add the ramblock to MultiFDRecvParams
This will be used in the next commits to add colo support to multifd. Signed-off-by: Lukas Straub --- migration/multifd.c | 11 +-- migration/multifd.h | 2 ++ 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index fb5e8859de..fddbf86596 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -284,7 +284,6 @@ static void multifd_send_fill_packet(MultiFDSendParams *p) static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp) { MultiFDPacket_t *packet = p->packet; -RAMBlock *block; int i; packet->magic = be32_to_cpu(packet->magic); @@ -334,21 +333,21 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp) /* make sure that ramblock is 0 terminated */ packet->ramblock[255] = 0; -block = qemu_ram_block_by_name(packet->ramblock); -if (!block) { +p->block = qemu_ram_block_by_name(packet->ramblock); +if (!p->block) { error_setg(errp, "multifd: unknown ram block %s", packet->ramblock); return -1; } -p->host = block->host; +p->host = p->block->host; for (i = 0; i < p->normal_num; i++) { uint64_t offset = be64_to_cpu(packet->offset[i]); -if (offset > (block->used_length - p->page_size)) { +if (offset > (p->block->used_length - p->page_size)) { error_setg(errp, "multifd: offset too long %" PRIu64 " (max " RAM_ADDR_FMT ")", - offset, block->used_length); + offset, p->block->used_length); return -1; } p->normal[i] = offset; diff --git a/migration/multifd.h b/migration/multifd.h index 7cfc265148..a835643b48 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -175,6 +175,8 @@ typedef struct { uint32_t next_packet_size; /* packets sent through this channel */ uint64_t num_packets; +/* ramblock */ +RAMBlock *block; /* ramblock host address */ uint8_t *host; /* non zero pages recv through this channel */ -- 2.39.2
[PATCH v2 0/6] multifd: Add colo support
Hello Everyone, These patches add support for colo to multifd. -v2: - Split out addition of p->block - Add more comments Lukas Straub (6): ram: Add public helper to set colo bitmap ram: Let colo_flush_ram_cache take the bitmap_mutex multifd: Introduce multifd-internal.h multifd: Introduce an overridable revc_pages method multifd: Add the ramblock to MultiFDRecvParams multifd: Add colo support migration/meson.build| 1 + migration/multifd-colo.c | 67 migration/multifd-internal.h | 39 +++ migration/multifd.c | 74 +++- migration/multifd.h | 2 + migration/ram.c | 19 +++-- migration/ram.h | 1 + 7 files changed, 173 insertions(+), 30 deletions(-) create mode 100644 migration/multifd-colo.c create mode 100644 migration/multifd-internal.h -- 2.39.2
[PULL 12/13] ram-compress.c: Make target independent
From: Lukas Straub Make ram-compress.c target independent. Signed-off-by: Lukas Straub Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/meson.build| 3 ++- migration/ram-compress.c | 17 ++--- 2 files changed, 12 insertions(+), 8 deletions(-) diff --git a/migration/meson.build b/migration/meson.build index 2090af8e85..75de868bb7 100644 --- a/migration/meson.build +++ b/migration/meson.build @@ -23,6 +23,8 @@ softmmu_ss.add(files( 'migration.c', 'multifd.c', 'multifd-zlib.c', + 'multifd-zlib.c', + 'ram-compress.c', 'options.c', 'postcopy-ram.c', 'savevm.c', @@ -40,5 +42,4 @@ softmmu_ss.add(when: zstd, if_true: files('multifd-zstd.c')) specific_ss.add(when: 'CONFIG_SOFTMMU', if_true: files('dirtyrate.c', 'ram.c', - 'ram-compress.c', 'target.c')) diff --git a/migration/ram-compress.c b/migration/ram-compress.c index 3d2a4a6329..06254d8c69 100644 --- a/migration/ram-compress.c +++ b/migration/ram-compress.c @@ -35,7 +35,8 @@ #include "migration.h" #include "options.h" #include "io/channel-null.h" -#include "exec/ram_addr.h" +#include "exec/target_page.h" +#include "exec/ramblock.h" CompressionStats compression_counters; @@ -156,7 +157,7 @@ int compress_threads_save_setup(void) qemu_cond_init(&comp_done_cond); qemu_mutex_init(&comp_done_lock); for (i = 0; i < thread_count; i++) { -comp_param[i].originbuf = g_try_malloc(TARGET_PAGE_SIZE); +comp_param[i].originbuf = g_try_malloc(qemu_target_page_size()); if (!comp_param[i].originbuf) { goto exit; } @@ -192,11 +193,12 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream, uint8_t *source_buf) { uint8_t *p = block->host + offset; +size_t page_size = qemu_target_page_size(); int ret; assert(qemu_file_buffer_empty(f)); -if (buffer_is_zero(p, TARGET_PAGE_SIZE)) { +if (buffer_is_zero(p, page_size)) { return RES_ZEROPAGE; } @@ -205,8 +207,8 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream, * so that we can catch up the error 
during compression and * decompression */ -memcpy(source_buf, p, TARGET_PAGE_SIZE); -ret = qemu_put_compression_data(f, stream, source_buf, TARGET_PAGE_SIZE); +memcpy(source_buf, p, page_size); +ret = qemu_put_compression_data(f, stream, source_buf, page_size); if (ret < 0) { qemu_file_set_error(migrate_get_current()->to_dst_file, ret); error_report("compressed data failed!"); @@ -336,7 +338,7 @@ static void *do_data_decompress(void *opaque) param->des = 0; qemu_mutex_unlock(&param->mutex); -pagesize = TARGET_PAGE_SIZE; +pagesize = qemu_target_page_size(); ret = qemu_uncompress_data(&param->stream, des, pagesize, param->compbuf, len); @@ -439,7 +441,8 @@ int compress_threads_load_setup(QEMUFile *f) goto exit; } -decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE)); +size_t compbuf_size = compressBound(qemu_target_page_size()); +decomp_param[i].compbuf = g_malloc0(compbuf_size); qemu_mutex_init(&decomp_param[i].mutex); qemu_cond_init(&decomp_param[i].cond); decomp_param[i].done = true; -- 2.40.0
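The core of this patch is mechanical: every compile-time TARGET_PAGE_SIZE use becomes a call to the runtime qemu_target_page_size() query, which is what lets the file be compiled once for all targets. A minimal sketch of the pattern, with a fixed stand-in for the accessor (these names are illustrative, not QEMU's API):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for qemu_target_page_size(): in QEMU this asks the target at
 * run time; a fixed 4 KiB value models it here. */
static size_t page_size(void)
{
    return 4096;
}

/* Buffers are sized from the runtime value on each call, so the same
 * object file works regardless of the target's page size. */
unsigned char *alloc_page_buf(void)
{
    return calloc(1, page_size());
}

unsigned char *copy_page(const unsigned char *src)
{
    unsigned char *dst = alloc_page_buf();
    if (dst) {
        memcpy(dst, src, page_size());
    }
    return dst;
}
```

The trade-off is a function call where a constant used to be; for a per-page compression path that cost is negligible next to the compression itself.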
[PULL 11/13] ram compress: Assert that the file buffer matches the result
From: Lukas Straub Before this series, "nothing to send" was handled by the file buffer being empty. Now it is tracked via param->result. Assert that the file buffer state matches the result. Signed-off-by: Lukas Straub Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/qemu-file.c| 11 +++ migration/qemu-file.h| 1 + migration/ram-compress.c | 5 + migration/ram.c | 2 ++ 4 files changed, 19 insertions(+) diff --git a/migration/qemu-file.c b/migration/qemu-file.c index f4cfd05c67..61fb580342 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -870,6 +870,17 @@ int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src) return len; } +/* + * Check if the writable buffer is empty + */ + +bool qemu_file_buffer_empty(QEMUFile *file) +{ +assert(qemu_file_is_writable(file)); + +return !file->iovcnt; +} + /* * Get a string whose length is determined by a single preceding byte * A preallocated 256 byte buffer must be passed in. diff --git a/migration/qemu-file.h b/migration/qemu-file.h index 4f26bf6961..4ee58a87dd 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -113,6 +113,7 @@ size_t coroutine_mixed_fn qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, s ssize_t qemu_put_compression_data(QEMUFile *f, z_stream *stream, const uint8_t *p, size_t size); int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src); +bool qemu_file_buffer_empty(QEMUFile *file); /* * Note that you can only peek continuous bytes from where the current pointer diff --git a/migration/ram-compress.c b/migration/ram-compress.c index c25562f12d..3d2a4a6329 100644 --- a/migration/ram-compress.c +++ b/migration/ram-compress.c @@ -194,6 +194,8 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream, uint8_t *p = block->host + offset; int ret; +assert(qemu_file_buffer_empty(f)); + if (buffer_is_zero(p, TARGET_PAGE_SIZE)) { return RES_ZEROPAGE; } @@ -208,6 +210,7 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream, if 
(ret < 0) { qemu_file_set_error(migrate_get_current()->to_dst_file, ret); error_report("compressed data failed!"); +qemu_fflush(f); return RES_NONE; } return RES_COMPRESS; @@ -239,6 +242,7 @@ void flush_compressed_data(int (send_queued_data(CompressParam *))) if (!comp_param[idx].quit) { CompressParam *param = &comp_param[idx]; send_queued_data(param); +assert(qemu_file_buffer_empty(param->file)); compress_reset_result(param); } qemu_mutex_unlock(&comp_param[idx].mutex); @@ -268,6 +272,7 @@ retry: qemu_mutex_lock(&param->mutex); param->done = false; send_queued_data(param); +assert(qemu_file_buffer_empty(param->file)); compress_reset_result(param); set_compress_params(param, block, offset); diff --git a/migration/ram.c b/migration/ram.c index 009681d213..ee4ab31f25 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1321,11 +1321,13 @@ static int send_queued_data(CompressParam *param) assert(block == pss->last_sent_block); if (param->result == RES_ZEROPAGE) { +assert(qemu_file_buffer_empty(param->file)); len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_ZERO); qemu_put_byte(file, 0); len += 1; ram_release_page(block->idstr, offset); } else if (param->result == RES_COMPRESS) { +assert(!qemu_file_buffer_empty(param->file)); len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_COMPRESS_PAGE); len += qemu_put_qemu_file(file, param->file); -- 2.40.0
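The invariant this patch asserts is small but useful: only a RES_COMPRESS result may leave bytes in the per-thread file buffer, while RES_NONE and RES_ZEROPAGE must find it empty. A toy model of that state machine (File and CompressResult here are simplified stand-ins, not the real QEMUFile or migration types):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { RES_NONE, RES_ZEROPAGE, RES_COMPRESS } CompressResult;

typedef struct {
    size_t iovcnt;            /* pending buffered vectors, like QEMUFile */
} File;

bool file_buffer_empty(const File *f)
{
    return f->iovcnt == 0;
}

/* Consume a finished compression job. The invariant checked here is the
 * one the patch adds: only RES_COMPRESS leaves data in the buffer. */
size_t send_result(File *f, CompressResult res)
{
    switch (res) {
    case RES_NONE:
        assert(file_buffer_empty(f));
        return 0;
    case RES_ZEROPAGE:
        assert(file_buffer_empty(f));
        return 1;             /* a zero page is just a one-byte marker */
    case RES_COMPRESS:
        assert(!file_buffer_empty(f));
        f->iovcnt = 0;        /* flush the buffered compressed data */
        return 1;
    }
    return 0;
}
```

In the real code the zero-page and compressed-page branches also write a page header; the sketch keeps only the buffer-state bookkeeping that the new assertions guard.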
[PULL 13/13] migration: Initialize and cleanup decompression in migration.c
From: Lukas Straub This fixes compress with colo. Signed-off-by: Lukas Straub Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/migration.c | 9 + migration/ram.c | 5 - 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 232e387109..0ee07802a5 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -26,6 +26,7 @@ #include "sysemu/cpu-throttle.h" #include "rdma.h" #include "ram.h" +#include "ram-compress.h" #include "migration/global_state.h" #include "migration/misc.h" #include "migration.h" @@ -228,6 +229,7 @@ void migration_incoming_state_destroy(void) struct MigrationIncomingState *mis = migration_incoming_get_current(); multifd_load_cleanup(); +compress_threads_load_cleanup(); if (mis->to_src_file) { /* Tell source that we are done */ @@ -500,6 +502,12 @@ process_incoming_migration_co(void *opaque) Error *local_err = NULL; assert(mis->from_src_file); + +if (compress_threads_load_setup(mis->from_src_file)) { +error_report("Failed to setup decompress threads"); +goto fail; +} + mis->migration_incoming_co = qemu_coroutine_self(); mis->largest_page_size = qemu_ram_pagesize_largest(); postcopy_state_set(POSTCOPY_INCOMING_NONE); @@ -565,6 +573,7 @@ fail: qemu_fclose(mis->from_src_file); multifd_load_cleanup(); +compress_threads_load_cleanup(); exit(EXIT_FAILURE); } diff --git a/migration/ram.c b/migration/ram.c index ee4ab31f25..f78e9912cd 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3558,10 +3558,6 @@ void colo_release_ram_cache(void) */ static int ram_load_setup(QEMUFile *f, void *opaque) { -if (compress_threads_load_setup(f)) { -return -1; -} - xbzrle_load_setup(); ramblock_recv_map_init(); @@ -3577,7 +3573,6 @@ static int ram_load_cleanup(void *opaque) } xbzrle_load_cleanup(); -compress_threads_load_cleanup(); RAMBLOCK_FOREACH_NOT_IGNORED(rb) { g_free(rb->receivedmap); -- 2.40.0
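The fix works by moving decompression setup and cleanup up a layer, so both the normal teardown path and the failure path run them, and paths that bypass ram_load_setup() (such as colo) still get a working decompressor. A sketch of that lifecycle under an assumed idempotent-cleanup convention (the names are invented stand-ins for compress_threads_load_setup/cleanup, and an int array stands in for the thread pool):

```c
#include <assert.h>
#include <stdlib.h>

static int *decomp_threads;   /* stands in for the decompression thread pool */

/* Called once when the incoming connection starts. */
int load_setup(void)
{
    decomp_threads = calloc(4, sizeof(int));
    return decomp_threads ? 0 : -1;
}

/* Called from both the destroy path and the failure path, so it must be
 * safe to run more than once: the second call sees NULL and does nothing. */
void load_cleanup(void)
{
    free(decomp_threads);
    decomp_threads = NULL;
}
```

Making cleanup idempotent is what lets the patch call it from migration_incoming_state_destroy() and the process_incoming_migration_co() failure path without tracking which one ran first.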
[PULL 01/13] qtest/migration-test.c: Add tests with compress enabled
From: Lukas Straub There have never been tests for migration with compress enabled. Add suitable tests, testing with compress-wait-thread = false too. Signed-off-by: Lukas Straub Acked-by: Peter Xu Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- tests/qtest/migration-test.c | 109 +++ 1 file changed, 109 insertions(+) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index be73ec3c06..ea0d3fad2a 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -406,6 +406,41 @@ static void migrate_set_parameter_str(QTestState *who, const char *parameter, migrate_check_parameter_str(who, parameter, value); } +static long long migrate_get_parameter_bool(QTestState *who, + const char *parameter) +{ +QDict *rsp; +int result; + +rsp = wait_command(who, "{ 'execute': 'query-migrate-parameters' }"); +result = qdict_get_bool(rsp, parameter); +qobject_unref(rsp); +return !!result; +} + +static void migrate_check_parameter_bool(QTestState *who, const char *parameter, +int value) +{ +int result; + +result = migrate_get_parameter_bool(who, parameter); +g_assert_cmpint(result, ==, value); +} + +static void migrate_set_parameter_bool(QTestState *who, const char *parameter, + int value) +{ +QDict *rsp; + +rsp = qtest_qmp(who, +"{ 'execute': 'migrate-set-parameters'," +"'arguments': { %s: %i } }", +parameter, value); +g_assert(qdict_haskey(rsp, "return")); +qobject_unref(rsp); +migrate_check_parameter_bool(who, parameter, value); +} + static void migrate_ensure_non_converge(QTestState *who) { /* Can't converge with 1ms downtime + 3 mbs bandwidth limit */ @@ -1524,6 +1559,70 @@ static void test_precopy_unix_xbzrle(void) test_precopy_common(&args); } +static void * +test_migrate_compress_start(QTestState *from, +QTestState *to) +{ +migrate_set_parameter_int(from, "compress-level", 1); +migrate_set_parameter_int(from, "compress-threads", 4); +migrate_set_parameter_bool(from, "compress-wait-thread", true); +migrate_set_parameter_int(to,
"decompress-threads", 4); + +migrate_set_capability(from, "compress", true); +migrate_set_capability(to, "compress", true); + +return NULL; +} + +static void test_precopy_unix_compress(void) +{ +g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs); +MigrateCommon args = { +.connect_uri = uri, +.listen_uri = uri, +.start_hook = test_migrate_compress_start, +/* + * Test that no invalid thread state is left over from + * the previous iteration. + */ +.iterations = 2, +}; + +test_precopy_common(&args); +} + +static void * +test_migrate_compress_nowait_start(QTestState *from, + QTestState *to) +{ +migrate_set_parameter_int(from, "compress-level", 9); +migrate_set_parameter_int(from, "compress-threads", 1); +migrate_set_parameter_bool(from, "compress-wait-thread", false); +migrate_set_parameter_int(to, "decompress-threads", 1); + +migrate_set_capability(from, "compress", true); +migrate_set_capability(to, "compress", true); + +return NULL; +} + +static void test_precopy_unix_compress_nowait(void) +{ +g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs); +MigrateCommon args = { +.connect_uri = uri, +.listen_uri = uri, +.start_hook = test_migrate_compress_nowait_start, +/* + * Test that no invalid thread state is left over from + * the previous iteration. + */ +.iterations = 2, +}; + +test_precopy_common(&args); +} + static void test_precopy_tcp_plain(void) { MigrateCommon args = { @@ -2537,6 +2636,16 @@ int main(int argc, char **argv) qtest_add_func("/migration/bad_dest", test_baddest); qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain); qtest_add_func("/migration/precopy/unix/xbzrle", test_precopy_unix_xbzrle); +/* + * Compression fails from time to time. + * Put test here but don't enable it until everything is fixed. 
+ */ +if (getenv("QEMU_TEST_FLAKY_TESTS")) { +qtest_add_func("/migration/precopy/unix/compress/wait", + test_precopy_unix_compress); +qtest_add_func("/migration/precopy/unix/compress/nowait", + test_precopy_unix_compress_nowait); +} #ifdef CONFIG_GNUTLS qtest_add_func("/migration/precopy/unix/tls/psk", test_precopy_unix_tls_psk); -- 2.40.0
[PULL 08/13] ram.c: Remove last ram.c dependency from the core compress code
From: Lukas Straub Make compression interfaces take send_queued_data() as an argument. Remove save_page_use_compression() from flush_compressed_data(). This removes the last ram.c dependency from the core compress code. Signed-off-by: Lukas Straub Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram.c | 27 +-- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index d1c24eff21..0cce65dfa5 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1545,13 +1545,10 @@ static int send_queued_data(CompressParam *param) return len; } -static void flush_compressed_data(RAMState *rs) +static void flush_compressed_data(int (send_queued_data(CompressParam *))) { int idx, thread_count; -if (!save_page_use_compression(rs)) { -return; -} thread_count = migrate_compress_threads(); qemu_mutex_lock(&comp_done_lock); @@ -1573,6 +1570,15 @@ static void flush_compressed_data(RAMState *rs) } } +static void ram_flush_compressed_data(RAMState *rs) +{ +if (!save_page_use_compression(rs)) { +return; +} + +flush_compressed_data(send_queued_data); +} + static inline void set_compress_params(CompressParam *param, RAMBlock *block, ram_addr_t offset) { @@ -1581,7 +1587,8 @@ static inline void set_compress_params(CompressParam *param, RAMBlock *block, param->trigger = true; } -static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset) +static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset, +int (send_queued_data(CompressParam *))) { int idx, thread_count, pages = -1; bool wait = migrate_compress_wait_thread(); @@ -1672,7 +1679,7 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss) * Also If xbzrle is on, stop using the data compression at this * point. In theory, xbzrle can do better than compression. 
*/ -flush_compressed_data(rs); +ram_flush_compressed_data(rs); /* Hit the end of the list */ pss->block = QLIST_FIRST_RCU(&ram_list.blocks); @@ -2362,11 +2369,11 @@ static bool save_compress_page(RAMState *rs, PageSearchStatus *pss, * much CPU resource. */ if (block != pss->last_sent_block) { -flush_compressed_data(rs); +ram_flush_compressed_data(rs); return false; } -if (compress_page_with_multi_thread(block, offset) > 0) { +if (compress_page_with_multi_thread(block, offset, send_queued_data) > 0) { return true; } @@ -3412,7 +3419,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) * page is sent in one chunk. */ if (migrate_postcopy_ram()) { -flush_compressed_data(rs); +ram_flush_compressed_data(rs); } /* @@ -3507,7 +3514,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque) } qemu_mutex_unlock(&rs->bitmap_mutex); -flush_compressed_data(rs); +ram_flush_compressed_data(rs); ram_control_after_iterate(f, RAM_CONTROL_FINISH); } -- 2.40.0
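Passing send_queued_data() in as a parameter is a standard way to break a layering dependency: the core loop sees only a function value and no longer references ram.c symbols. A stripped-down sketch of the idea (the types and the stub sender are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int pending;              /* queued bytes waiting to be sent */
} CompressParam;

/* Core flush loop: takes the send hook as a parameter, so this function
 * has no compile-time dependency on whoever provides the hook. */
int flush_all(CompressParam *params, size_t n, int (*send)(CompressParam *))
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += send(&params[i]);
    }
    return total;
}

/* Caller-side hook, standing in for ram.c's send_queued_data(). */
static int send_stub(CompressParam *p)
{
    int sent = p->pending;
    p->pending = 0;
    return sent;
}
```

QEMU spells the parameter as int (send_queued_data(CompressParam *)), a function-typed parameter that C adjusts to the same function pointer type used here.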
[PULL 09/13] ram.c: Move core compression code into its own file
From: Lukas Straub No functional changes intended. Signed-off-by: Lukas Straub Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/meson.build| 5 +- migration/ram-compress.c | 274 +++ migration/ram-compress.h | 65 ++ migration/ram.c | 262 + 4 files changed, 344 insertions(+), 262 deletions(-) create mode 100644 migration/ram-compress.c create mode 100644 migration/ram-compress.h diff --git a/migration/meson.build b/migration/meson.build index da1897fadf..2090af8e85 100644 --- a/migration/meson.build +++ b/migration/meson.build @@ -38,4 +38,7 @@ endif softmmu_ss.add(when: zstd, if_true: files('multifd-zstd.c')) specific_ss.add(when: 'CONFIG_SOFTMMU', -if_true: files('dirtyrate.c', 'ram.c', 'target.c')) +if_true: files('dirtyrate.c', + 'ram.c', + 'ram-compress.c', + 'target.c')) diff --git a/migration/ram-compress.c b/migration/ram-compress.c new file mode 100644 index 00..d9bc67d075 --- /dev/null +++ b/migration/ram-compress.c @@ -0,0 +1,274 @@ +/* + * QEMU System Emulator + * + * Copyright (c) 2003-2008 Fabrice Bellard + * Copyright (c) 2011-2015 Red Hat Inc + * + * Authors: + * Juan Quintela + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "qemu/osdep.h" +#include "qemu/cutils.h" + +#include "ram-compress.h" + +#include "qemu/error-report.h" +#include "migration.h" +#include "options.h" +#include "io/channel-null.h" +#include "exec/ram_addr.h" + +CompressionStats compression_counters; + +static CompressParam *comp_param; +static QemuThread *compress_threads; +/* comp_done_cond is used to wake up the migration thread when + * one of the compression threads has finished the compression. + * comp_done_lock is used to co-work with comp_done_cond. + */ +static QemuMutex comp_done_lock; +static QemuCond comp_done_cond; + +static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream, + RAMBlock *block, ram_addr_t offset, + uint8_t *source_buf); + +static void *do_data_compress(void *opaque) +{ +CompressParam *param = opaque; +RAMBlock *block; +ram_addr_t offset; +CompressResult result; + +qemu_mutex_lock(&param->mutex); +while (!param->quit) { +if (param->trigger) { +block = param->block; +offset = param->offset; +param->trigger = false; +qemu_mutex_unlock(&param->mutex); + +result = do_compress_ram_page(param->file, &param->stream, + block, offset, param->originbuf); + +qemu_mutex_lock(&comp_done_lock); +param->done = true; +param->result = result; +qemu_cond_signal(&comp_done_cond); +qemu_mutex_unlock(&comp_done_lock); + +qemu_mutex_lock(&param->mutex); +} else { +qemu_cond_wait(&param->cond, &param->mutex); +} +} +qemu_mutex_unlock(&param->mutex); + +return NULL; +} + +void compress_threads_save_cleanup(void) +{ +int i, thread_count; + +if (!migrate_compress() || !comp_param) { +return; +} + +thread_count = migrate_compress_threads(); +for (i = 0; i < thread_count; i++) { +/* + * we use it as a indicator which shows if the thread is
+ * properly init'd or not + */ +if (!comp_param[i].file) { +break; +} + +qemu_mutex_lock(&comp_param[i].mutex); +comp_param[i].quit = true; +qemu_cond_signal(&comp_param[i].cond); +qemu_mutex_unlock(&comp_param[i].mutex); + +qemu_thread_join(compress_threads + i); +qemu_mutex_destroy(&comp_param[i].mutex); +qemu_cond_destroy(&comp
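The do_data_compress() loop that moves here follows a classic triggered-worker shape: the thread sleeps on a condition variable, a trigger flag (not the wakeup itself) carries the request, and quit ends the loop. Below is a self-contained POSIX-threads sketch of that shape; the names echo CompressParam, but the structure is simplified and a single condition variable stands in for QEMU's param->cond plus comp_done_cond pair.

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool trigger, quit, done;
    int input, result;
} Worker;

/* Worker loop: the flag, not the wakeup, is authoritative, so a request
 * submitted before the thread first sleeps is still picked up. */
static void *worker_loop(void *opaque)
{
    Worker *w = opaque;
    pthread_mutex_lock(&w->mutex);
    while (!w->quit) {
        if (w->trigger) {
            int in = w->input;
            w->trigger = false;
            pthread_mutex_unlock(&w->mutex);

            int out = in * 2;          /* stand-in for compressing a page */

            pthread_mutex_lock(&w->mutex);
            w->result = out;
            w->done = true;
            pthread_cond_broadcast(&w->cond);
        } else {
            pthread_cond_wait(&w->cond, &w->mutex);
        }
    }
    pthread_mutex_unlock(&w->mutex);
    return NULL;
}

/* Submit one job and wait for its completion. */
int run_job(Worker *w, int input)
{
    pthread_mutex_lock(&w->mutex);
    w->input = input;
    w->done = false;
    w->trigger = true;
    pthread_cond_broadcast(&w->cond);
    while (!w->done) {
        pthread_cond_wait(&w->cond, &w->mutex);
    }
    int r = w->result;
    pthread_mutex_unlock(&w->mutex);
    return r;
}

void stop_worker(Worker *w, pthread_t tid)
{
    pthread_mutex_lock(&w->mutex);
    w->quit = true;
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->mutex);
    pthread_join(tid, NULL);
}
```

Because the worker re-tests the trigger flag after every wakeup, spurious condvar wakeups are harmless; that is the same property the real loop relies on.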
[PULL 02/13] qtest/migration-test.c: Add postcopy tests with compress enabled
From: Lukas Straub Add postcopy tests with compress enabled to ensure nothing breaks with the refactoring in the next commits. preempt+compress is blocked, so no test needed for that case. Signed-off-by: Lukas Straub Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- tests/qtest/migration-test.c | 85 +++- 1 file changed, 55 insertions(+), 30 deletions(-) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index ea0d3fad2a..8a5df84624 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -1127,6 +1127,36 @@ test_migrate_tls_x509_finish(QTestState *from, #endif /* CONFIG_TASN1 */ #endif /* CONFIG_GNUTLS */ +static void * +test_migrate_compress_start(QTestState *from, +QTestState *to) +{ +migrate_set_parameter_int(from, "compress-level", 1); +migrate_set_parameter_int(from, "compress-threads", 4); +migrate_set_parameter_bool(from, "compress-wait-thread", true); +migrate_set_parameter_int(to, "decompress-threads", 4); + +migrate_set_capability(from, "compress", true); +migrate_set_capability(to, "compress", true); + +return NULL; +} + +static void * +test_migrate_compress_nowait_start(QTestState *from, + QTestState *to) +{ +migrate_set_parameter_int(from, "compress-level", 9); +migrate_set_parameter_int(from, "compress-threads", 1); +migrate_set_parameter_bool(from, "compress-wait-thread", false); +migrate_set_parameter_int(to, "decompress-threads", 1); + +migrate_set_capability(from, "compress", true); +migrate_set_capability(to, "compress", true); + +return NULL; +} + static int migrate_postcopy_prepare(QTestState **from_ptr, QTestState **to_ptr, MigrateCommon *args) @@ -1204,6 +1234,15 @@ static void test_postcopy(void) test_postcopy_common(&args); } +static void test_postcopy_compress(void) +{ +MigrateCommon args = { +.start_hook = test_migrate_compress_start +}; + +test_postcopy_common(&args); +} + static void test_postcopy_preempt(void) { MigrateCommon args = { @@ -1305,6 +1344,15 @@ static void 
test_postcopy_recovery(void) test_postcopy_recovery_common(&args); } +static void test_postcopy_recovery_compress(void) +{ +MigrateCommon args = { +.start_hook = test_migrate_compress_start +}; + +test_postcopy_recovery_common(&args); +} + #ifdef CONFIG_GNUTLS static void test_postcopy_recovery_tls_psk(void) { @@ -1338,6 +1386,7 @@ static void test_postcopy_preempt_all(void) test_postcopy_recovery_common(&args); } + #endif static void test_baddest(void) @@ -1559,21 +1608,6 @@ static void test_precopy_unix_xbzrle(void) test_precopy_common(&args); } -static void * -test_migrate_compress_start(QTestState *from, -QTestState *to) -{ -migrate_set_parameter_int(from, "compress-level", 1); -migrate_set_parameter_int(from, "compress-threads", 4); -migrate_set_parameter_bool(from, "compress-wait-thread", true); -migrate_set_parameter_int(to, "decompress-threads", 4); - -migrate_set_capability(from, "compress", true); -migrate_set_capability(to, "compress", true); - -return NULL; -} - static void test_precopy_unix_compress(void) { g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs); @@ -1591,21 +1625,6 @@ static void test_precopy_unix_compress(void) test_precopy_common(&args); } -static void * -test_migrate_compress_nowait_start(QTestState *from, - QTestState *to) -{ -migrate_set_parameter_int(from, "compress-level", 9); -migrate_set_parameter_int(from, "compress-threads", 1); -migrate_set_parameter_bool(from, "compress-wait-thread", false); -migrate_set_parameter_int(to, "decompress-threads", 1); - -migrate_set_capability(from, "compress", true); -migrate_set_capability(to, "compress", true); - -return NULL; -} - static void test_precopy_unix_compress_nowait(void) { g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs); @@ -2631,6 +2650,12 @@ int main(int argc, char **argv) qtest_add_func("/migration/postcopy/preempt/plain", test_postcopy_preempt); qtest_add_func("/migration/postcopy/preempt/recovery/plain", test_postcopy_preempt_recovery); +if 
(getenv("QEMU_TEST_FLAKY_TESTS")) { +qtest_add_func("/migration/postcopy/compress/plain", + test_postcopy_compress); +qtest_add_func("/migration/postcopy/recovery/compress/plain", + test_postcopy_recovery_compress); +} } qtest_add_func("/migration/bad_dest", test_baddest); -- 2.40.0
[PULL 00/13] Compression code patches
The following changes since commit 792f77f376adef944f9a03e601f6ad90c2f891b2: Merge tag 'pull-loongarch-20230506' of https://gitlab.com/gaosong/qemu into staging (2023-05-06 08:11:52 +0100) are available in the Git repository at: https://gitlab.com/juan.quintela/qemu.git tags/compression-code-pull-request for you to fetch changes up to c323518a7aab1c01740a468671b7f2b517d3bca6: migration: Initialize and cleanup decompression in migration.c (2023-05-08 15:25:27 +0200) Migration PULL request (20230508 edition, take 2) Hi This is just the compression bits of the Migration PULL request for 20230428. The only change is that we don't run the compression tests by default. The problem already exists in the compression code; the tests just show that it doesn't work. - Add migration tests for (old) compress migration code (lukas) - Make compression code independent of ram.c (lukas) - Move compression code into ram-compress.c (lukas) Please apply, Juan. Lukas Straub (13): qtest/migration-test.c: Add tests with compress enabled qtest/migration-test.c: Add postcopy tests with compress enabled ram.c: Let the compress threads return a CompressResult enum ram.c: Dont change param->block in the compress thread ram.c: Reset result after sending queued data ram.c: Do not call save_page_header() from compress threads ram.c: Call update_compress_thread_counts from compress_send_queued_data ram.c: Remove last ram.c dependency from the core compress code ram.c: Move core compression code into its own file ram.c: Move core decompression code into its own file ram compress: Assert that the file buffer matches the result ram-compress.c: Make target independent migration: Initialize and cleanup decompression in migration.c migration/meson.build| 6 +- migration/migration.c| 9 + migration/qemu-file.c| 11 + migration/qemu-file.h| 1 + migration/ram-compress.c | 485 + migration/ram-compress.h | 70 + migration/ram.c | 502 +++ tests/qtest/migration-test.c | 134 ++ 8 files changed, 758 insertions(+), 460
deletions(-) create mode 100644 migration/ram-compress.c create mode 100644 migration/ram-compress.h -- 2.40.0
[PULL 07/13] ram.c: Call update_compress_thread_counts from compress_send_queued_data
From: Lukas Straub This makes the core compress code more independent of ram.c. Signed-off-by: Lukas Straub Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram.c | 18 ++ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index c52602b70d..d1c24eff21 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1540,12 +1540,14 @@ static int send_queued_data(CompressParam *param) abort(); } +update_compress_thread_counts(param, len); + return len; } static void flush_compressed_data(RAMState *rs) { -int idx, len, thread_count; +int idx, thread_count; if (!save_page_use_compression(rs)) { return; @@ -1564,15 +1566,8 @@ static void flush_compressed_data(RAMState *rs) qemu_mutex_lock(&comp_param[idx].mutex); if (!comp_param[idx].quit) { CompressParam *param = &comp_param[idx]; -len = send_queued_data(param); +send_queued_data(param); compress_reset_result(param); - -/* - * it's safe to fetch zero_page without holding comp_done_lock - * as there is no further request submitted to the thread, - * i.e, the thread should be waiting for a request at this point. - */ -update_compress_thread_counts(param, len); } qemu_mutex_unlock(&comp_param[idx].mutex); } @@ -1588,7 +1583,7 @@ static inline void set_compress_params(CompressParam *param, RAMBlock *block, static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset) { -int idx, thread_count, bytes_xmit = -1, pages = -1; +int idx, thread_count, pages = -1; bool wait = migrate_compress_wait_thread(); thread_count = migrate_compress_threads(); @@ -1599,11 +1594,10 @@ retry: CompressParam *param = &comp_param[idx]; qemu_mutex_lock(&param->mutex); param->done = false; -bytes_xmit = send_queued_data(param); +send_queued_data(param); compress_reset_result(param); set_compress_params(param, block, offset); -update_compress_thread_counts(param, bytes_xmit); qemu_cond_signal(&param->cond); qemu_mutex_unlock(&param->mutex); pages = 1; -- 2.40.0
[PULL 04/13] ram.c: Don't change param->block in the compress thread
From: Lukas Straub Instead, introduce an extra parameter to trigger the compress thread. Now, when the compress thread is done, we know which RAMBlock and offset it compressed. This will be used in the next commits to move save_page_header() out of compress code. Signed-off-by: Lukas Straub Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 7bc05fc034..b552a9e538 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -492,6 +492,7 @@ typedef enum CompressResult CompressResult; struct CompressParam { bool done; bool quit; +bool trigger; CompressResult result; QEMUFile *file; QemuMutex mutex; @@ -565,10 +566,10 @@ static void *do_data_compress(void *opaque) qemu_mutex_lock(&param->mutex); while (!param->quit) { -if (param->block) { +if (param->trigger) { block = param->block; offset = param->offset; -param->block = NULL; +param->trigger = false; qemu_mutex_unlock(&param->mutex); result = do_compress_ram_page(param->file, &param->stream, @@ -1545,6 +1546,7 @@ static inline void set_compress_params(CompressParam *param, RAMBlock *block, { param->block = block; param->offset = offset; +param->trigger = true; } static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset) -- 2.40.0
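The core idea of this patch — replacing the implicit "param->block is non-NULL" convention with an explicit trigger flag, so the worker can re-arm itself without clobbering its input description — can be sketched as a standalone program. This is a hedged, simplified analogue of QEMU's CompressParam using plain pthreads and a single lock (QEMU uses a separate comp_done_lock); all names here are illustrative, not QEMU API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Simplified analogue of CompressParam: the explicit "trigger" flag starts
 * the worker, so "input" stays intact after the work is done (unlike the
 * old scheme of clearing the input pointer to re-arm the thread). */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool trigger;
    bool quit;
    bool done;
    int input;
    int result;
} WorkerParam;

static void *worker(void *opaque)
{
    WorkerParam *p = opaque;

    pthread_mutex_lock(&p->mutex);
    while (!p->quit) {
        if (p->trigger) {
            int in = p->input;
            p->trigger = false;      /* re-arm without clobbering input */
            pthread_mutex_unlock(&p->mutex);

            int res = in * 2;        /* stand-in for do_compress_ram_page() */

            pthread_mutex_lock(&p->mutex);
            p->result = res;
            p->done = true;
            pthread_cond_signal(&p->cond);
        } else {
            pthread_cond_wait(&p->cond, &p->mutex);
        }
    }
    pthread_mutex_unlock(&p->mutex);
    return NULL;
}

int run_one_job(int input)
{
    WorkerParam p = { .mutex = PTHREAD_MUTEX_INITIALIZER,
                      .cond = PTHREAD_COND_INITIALIZER };
    pthread_t tid;

    pthread_create(&tid, NULL, worker, &p);

    pthread_mutex_lock(&p.mutex);
    p.input = input;
    p.trigger = true;                /* like set_compress_params() */
    pthread_cond_signal(&p.cond);
    while (!p.done) {
        pthread_cond_wait(&p.cond, &p.mutex);
    }
    /* p.input is still valid: we know exactly what the worker processed */
    assert(p.input == input);
    int result = p.result;
    p.quit = true;
    pthread_cond_signal(&p.cond);
    pthread_mutex_unlock(&p.mutex);

    pthread_join(tid, NULL);
    return result;
}
```

The flag-plus-predicate-loop shape avoids lost wakeups: the submitter sets `trigger` before signaling, and the worker checks the flag before ever waiting.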
[PULL 06/13] ram.c: Do not call save_page_header() from compress threads
From: Lukas Straub save_page_header() accesses several global variables, so calling it from multiple threads is pretty ugly. Instead, call save_page_header() before writing out the compressed data from the compress buffer to the migration stream. This also makes the core compress code more independent of ram.c. Signed-off-by: Lukas Straub Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram.c | 44 +++- 1 file changed, 35 insertions(+), 9 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 4e14e3bb94..c52602b70d 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1465,17 +1465,13 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block, ram_addr_t offset, uint8_t *source_buf) { -RAMState *rs = ram_state; -PageSearchStatus *pss = &rs->pss[RAM_CHANNEL_PRECOPY]; uint8_t *p = block->host + offset; int ret; -if (save_zero_page_to_file(pss, f, block, offset)) { +if (buffer_is_zero(p, TARGET_PAGE_SIZE)) { return RES_ZEROPAGE; } -save_page_header(pss, f, block, offset | RAM_SAVE_FLAG_COMPRESS_PAGE); - /* * copy it to a internal buffer to avoid it being modified by VM * so that we can catch up the error during compression and @@ -1515,9 +1511,40 @@ static inline void compress_reset_result(CompressParam *param) param->offset = 0; } +static int send_queued_data(CompressParam *param) +{ +PageSearchStatus *pss = &ram_state->pss[RAM_CHANNEL_PRECOPY]; +MigrationState *ms = migrate_get_current(); +QEMUFile *file = ms->to_dst_file; +int len = 0; + +RAMBlock *block = param->block; +ram_addr_t offset = param->offset; + +if (param->result == RES_NONE) { +return 0; +} + +assert(block == pss->last_sent_block); + +if (param->result == RES_ZEROPAGE) { +len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_ZERO); +qemu_put_byte(file, 0); +len += 1; +ram_release_page(block->idstr, offset); +} else if (param->result == RES_COMPRESS) { +len += save_page_header(pss, file, block, +offset | RAM_SAVE_FLAG_COMPRESS_PAGE); +len += qemu_put_qemu_file(file, param->file); +} else { +abort(); +} + +return len; +} + static void flush_compressed_data(RAMState *rs) { -MigrationState *ms = migrate_get_current(); int idx, len, thread_count; if (!save_page_use_compression(rs)) { @@ -1537,7 +1564,7 @@ static void flush_compressed_data(RAMState *rs) qemu_mutex_lock(&comp_param[idx].mutex); if (!comp_param[idx].quit) { CompressParam *param = &comp_param[idx]; -len = qemu_put_qemu_file(ms->to_dst_file, param->file); +len = send_queued_data(param); compress_reset_result(param); /* @@ -1563,7 +1590,6 @@ static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset) { int idx, thread_count, bytes_xmit = -1, pages = -1; bool wait = migrate_compress_wait_thread(); -MigrationState *ms = migrate_get_current(); thread_count = migrate_compress_threads(); qemu_mutex_lock(&comp_done_lock); @@ -1573,7 +1599,7 @@ retry: CompressParam *param = &comp_param[idx]; qemu_mutex_lock(&param->mutex); param->done = false; -bytes_xmit = qemu_put_qemu_file(ms->to_dst_file, param->file); +bytes_xmit = send_queued_data(param); compress_reset_result(param); set_compress_params(param, block, offset); -- 2.40.0
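The patch above replaces the save_zero_page_to_file() call in the compress thread with a plain buffer_is_zero() test, deferring the actual zero-page header to send_queued_data(). QEMU's real buffer_is_zero() is heavily optimized (word-at-a-time, SIMD); the sketch below only captures its semantics, using the memcmp-against-itself-shifted trick, and is not the QEMU implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Naive semantic model of a zero-page test: true iff every byte is 0.
 * Checks the first byte, then compares the buffer against itself shifted
 * by one byte; overlapping reads are fine for memcmp. */
bool buffer_is_zero_naive(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    return len == 0 ||
           (p[0] == 0 && memcmp(p, p + 1, len - 1) == 0);
}
```

A zero page detected this way costs one header plus one byte on the stream, versus a full compressed payload.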
[PULL 10/13] ram.c: Move core decompression code into its own file
From: Lukas Straub No functional changes intended. Signed-off-by: Lukas Straub Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram-compress.c | 203 ++ migration/ram-compress.h | 5 + migration/ram.c | 204 --- 3 files changed, 208 insertions(+), 204 deletions(-) diff --git a/migration/ram-compress.c b/migration/ram-compress.c index d9bc67d075..c25562f12d 100644 --- a/migration/ram-compress.c +++ b/migration/ram-compress.c @@ -48,6 +48,24 @@ static QemuThread *compress_threads; static QemuMutex comp_done_lock; static QemuCond comp_done_cond; +struct DecompressParam { +bool done; +bool quit; +QemuMutex mutex; +QemuCond cond; +void *des; +uint8_t *compbuf; +int len; +z_stream stream; +}; +typedef struct DecompressParam DecompressParam; + +static QEMUFile *decomp_file; +static DecompressParam *decomp_param; +static QemuThread *decompress_threads; +static QemuMutex decomp_done_lock; +static QemuCond decomp_done_cond; + static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block, ram_addr_t offset, uint8_t *source_buf); @@ -272,3 +290,188 @@ retry: return pages; } + +/* return the size after decompression, or negative value on error */ +static int +qemu_uncompress_data(z_stream *stream, uint8_t *dest, size_t dest_len, + const uint8_t *source, size_t source_len) +{ +int err; + +err = inflateReset(stream); +if (err != Z_OK) { +return -1; +} + +stream->avail_in = source_len; +stream->next_in = (uint8_t *)source; +stream->avail_out = dest_len; +stream->next_out = dest; + +err = inflate(stream, Z_NO_FLUSH); +if (err != Z_STREAM_END) { +return -1; +} + +return stream->total_out; +} + +static void *do_data_decompress(void *opaque) +{ +DecompressParam *param = opaque; +unsigned long pagesize; +uint8_t *des; +int len, ret; + +qemu_mutex_lock(&param->mutex); +while (!param->quit) { +if (param->des) { +des = param->des; +len = param->len; +param->des = 0; +qemu_mutex_unlock(&param->mutex); + +pagesize = TARGET_PAGE_SIZE; + +ret = qemu_uncompress_data(&param->stream, des, pagesize, + param->compbuf, len); +if (ret < 0 && migrate_get_current()->decompress_error_check) { +error_report("decompress data failed"); +qemu_file_set_error(decomp_file, ret); +} + +qemu_mutex_lock(&decomp_done_lock); +param->done = true; +qemu_cond_signal(&decomp_done_cond); +qemu_mutex_unlock(&decomp_done_lock); + +qemu_mutex_lock(&param->mutex); +} else { +qemu_cond_wait(&param->cond, &param->mutex); +} +} +qemu_mutex_unlock(&param->mutex); + +return NULL; +} + +int wait_for_decompress_done(void) +{ +int idx, thread_count; + +if (!migrate_compress()) { +return 0; +} + +thread_count = migrate_decompress_threads(); +qemu_mutex_lock(&decomp_done_lock); +for (idx = 0; idx < thread_count; idx++) { +while (!decomp_param[idx].done) { +qemu_cond_wait(&decomp_done_cond, &decomp_done_lock); +} +} +qemu_mutex_unlock(&decomp_done_lock); +return qemu_file_get_error(decomp_file); +} + +void compress_threads_load_cleanup(void) +{ +int i, thread_count; + +if (!migrate_compress()) { +return; +} +thread_count = migrate_decompress_threads(); +for (i = 0; i < thread_count; i++) { +/* + * we use it as a indicator which shows if the thread is + * properly init'd or not + */ +if (!decomp_param[i].compbuf) { +break; +} + +qemu_mutex_lock(&decomp_param[i].mutex); +decomp_param[i].quit = true; +qemu_cond_signal(&decomp_param[i].cond); +qemu_mutex_unlock(&decomp_param[i].mutex); +} +for (i = 0; i < thread_count; i++) { +if (!decomp_param[i].compbuf) { +break; +} + +qemu_thread_join(decompress_threads + i); +qemu_mutex_destroy(&decomp_param[i].mutex); +qemu_cond_destroy(&decomp_param[i].cond); +inflateEnd(&decomp_param[i].stream); +g_free(decomp_param[i].compbuf); +decomp_param[i].compbuf = NULL; +} +g_free(decompress_threads); +g_free(decomp_param); +decompress_threads = NULL; +decomp_param = NULL; +decomp_file = NULL; +} + +int compress_threads_load_setup(QEMUFile *f) +{ +int i, thread_count; + +if (!migrate_compress()) { +return 0; } + +thread_count = migrate_decompress_threads(); +decompress_thr
[PULL 03/13] ram.c: Let the compress threads return a CompressResult enum
From: Lukas Straub This will be used in the next commits to move save_page_header() out of compress code. Signed-off-by: Lukas Straub Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram.c | 34 ++ 1 file changed, 22 insertions(+), 12 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 5e7bf20ca5..7bc05fc034 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -482,10 +482,17 @@ MigrationOps *migration_ops; CompressionStats compression_counters; +enum CompressResult { +RES_NONE = 0, +RES_ZEROPAGE = 1, +RES_COMPRESS = 2 +}; +typedef enum CompressResult CompressResult; + struct CompressParam { bool done; bool quit; -bool zero_page; +CompressResult result; QEMUFile *file; QemuMutex mutex; QemuCond cond; @@ -527,8 +534,9 @@ static QemuCond decomp_done_cond; static int ram_save_host_page_urgent(PageSearchStatus *pss); -static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block, - ram_addr_t offset, uint8_t *source_buf); +static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream, + RAMBlock *block, ram_addr_t offset, + uint8_t *source_buf); /* NOTE: page is the PFN not real ram_addr_t. */ static void pss_init(PageSearchStatus *pss, RAMBlock *rb, ram_addr_t page) @@ -553,7 +561,7 @@ static void *do_data_compress(void *opaque) CompressParam *param = opaque; RAMBlock *block; ram_addr_t offset; -bool zero_page; +CompressResult result; qemu_mutex_lock(&param->mutex); while (!param->quit) { @@ -563,12 +571,12 @@ static void *do_data_compress(void *opaque) param->block = NULL; qemu_mutex_unlock(&param->mutex); -zero_page = do_compress_ram_page(param->file, &param->stream, - block, offset, param->originbuf); +result = do_compress_ram_page(param->file, &param->stream, + block, offset, param->originbuf); qemu_mutex_lock(&comp_done_lock); param->done = true; -param->zero_page = zero_page; +param->result = result; qemu_cond_signal(&comp_done_cond); qemu_mutex_unlock(&comp_done_lock); @@ -1452,8 +1460,9 @@ static int ram_save_multifd_page(QEMUFile *file, RAMBlock *block, return 1; } -static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block, - ram_addr_t offset, uint8_t *source_buf) +static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream, + RAMBlock *block, ram_addr_t offset, + uint8_t *source_buf) { RAMState *rs = ram_state; PageSearchStatus *pss = &rs->pss[RAM_CHANNEL_PRECOPY]; @@ -1461,7 +1470,7 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block, int ret; if (save_zero_page_to_file(pss, f, block, offset)) { -return true; +return RES_ZEROPAGE; } save_page_header(pss, f, block, offset | RAM_SAVE_FLAG_COMPRESS_PAGE); @@ -1476,8 +1485,9 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block, if (ret < 0) { qemu_file_set_error(migrate_get_current()->to_dst_file, ret); error_report("compressed data failed!"); +return RES_NONE; } -return false; +return RES_COMPRESS; } static void @@ -1485,7 +1495,7 @@ update_compress_thread_counts(const CompressParam *param, int bytes_xmit) { ram_transferred_add(bytes_xmit); -if (param->result == RES_ZEROPAGE) { stat64_add(&mig_stats.zero_pages, 1); return; } -- 2.40.0
[PULL 05/13] ram.c: Reset result after sending queued data
From: Lukas Straub And take the param->mutex lock for the whole section to ensure thread-safety. Now, it is explicitly clear if there is no queued data to send. Before, this was handled by the param->file stream being empty and thus qemu_put_qemu_file() not sending anything. This will be used in the next commits to move save_page_header() out of compress code. Signed-off-by: Lukas Straub Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram.c | 32 ++-- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index b552a9e538..4e14e3bb94 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1508,6 +1508,13 @@ update_compress_thread_counts(const CompressParam *param, int bytes_xmit) static bool save_page_use_compression(RAMState *rs); +static inline void compress_reset_result(CompressParam *param) +{ +param->result = RES_NONE; +param->block = NULL; +param->offset = 0; +} + static void flush_compressed_data(RAMState *rs) { MigrationState *ms = migrate_get_current(); @@ -1529,13 +1536,16 @@ static void flush_compressed_data(RAMState *rs) for (idx = 0; idx < thread_count; idx++) { qemu_mutex_lock(&comp_param[idx].mutex); if (!comp_param[idx].quit) { -len = qemu_put_qemu_file(ms->to_dst_file, comp_param[idx].file); +CompressParam *param = &comp_param[idx]; +len = qemu_put_qemu_file(ms->to_dst_file, param->file); +compress_reset_result(param); + /* * it's safe to fetch zero_page without holding comp_done_lock * as there is no further request submitted to the thread, * i.e, the thread should be waiting for a request at this point. */ -update_compress_thread_counts(&comp_param[idx], len); +update_compress_thread_counts(param, len); } qemu_mutex_unlock(&comp_param[idx].mutex); } @@ -1560,15 +1570,17 @@ static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset) retry: for (idx = 0; idx < thread_count; idx++) { if (comp_param[idx].done) { -comp_param[idx].done = false; -bytes_xmit = qemu_put_qemu_file(ms->to_dst_file, -comp_param[idx].file); -qemu_mutex_lock(&comp_param[idx].mutex); -set_compress_params(&comp_param[idx], block, offset); -qemu_cond_signal(&comp_param[idx].cond); -qemu_mutex_unlock(&comp_param[idx].mutex); +CompressParam *param = &comp_param[idx]; +qemu_mutex_lock(&param->mutex); +param->done = false; +bytes_xmit = qemu_put_qemu_file(ms->to_dst_file, param->file); +compress_reset_result(param); +set_compress_params(param, block, offset); + +update_compress_thread_counts(param, bytes_xmit); +qemu_cond_signal(&param->cond); +qemu_mutex_unlock(&param->mutex); pages = 1; -update_compress_thread_counts(&comp_param[idx], bytes_xmit); break; } } -- 2.40.0
Re: Machine x-remote property auto-shutdown
Hi Markus, Please see the comments inline below. > On May 5, 2023, at 10:58 AM, Markus Armbruster wrote: > > I stumbled over this property, looked closer, and now I'm confused. > > Like most QOM properties, x-remote.auto-shutdown is virtually > undocumented. All we have is this comment in vfio-user-obj.c: > >/** > * Usage: add options: > * -machine x-remote,vfio-user=on,auto-shutdown=on > * -device ,id= > * -object x-vfio-user-server,id=,type=unix,path=, > * device= > * > * Note that x-vfio-user-server object must be used with x-remote machine > only. > * This server could only support PCI devices for now. > * > * type - SocketAddress type - presently "unix" alone is supported. > Required > *option > * > * path - named unix socket, it will be created by the server. It is > *a required option > * > * device - id of a device on the server, a required option. PCI devices > * alone are supported presently. > * > * notes - x-vfio-user-server could block IO and monitor during the > * initialization phase. > */ > > This differs from docs/system/multi-process.rst, which has > > - Example command-line for the remote process is as follows: > > /usr/bin/qemu-system-x86_64\ > -machine x-remote \ > -device lsi53c895a,id=lsi0 \ > -drive id=drive_image2,file=/build/ol7-nvme-test-1.qcow2 \ > -device scsi-hd,id=drive2,drive=drive_image2,bus=lsi0.0,scsi-id=0 \ > -object x-remote-object,id=robj1,devid=lsi0,fd=4, > > No mention of auto-shutdown here. > > It points to docs/devel/qemu-multiprocess, which doesn't exist. I guess > it's docs/devel/multi-process.rst. Please fix that. Anyway, no mention Sorry about this. I will fix it. > of auto-shutdown there, either. > > Let's try code instead. 
The only use of the property is here: > >static bool vfu_object_auto_shutdown(void) >{ >bool auto_shutdown = true; >Error *local_err = NULL; > >if (!current_machine) { >return auto_shutdown; >} > >auto_shutdown = object_property_get_bool(OBJECT(current_machine), > "auto-shutdown", > &local_err); > >/* > * local_err would be set if no such property exists - safe to ignore. > * Unlikely scenario as auto-shutdown is always defined for > * TYPE_REMOTE_MACHINE, and TYPE_VFU_OBJECT only works with > * TYPE_REMOTE_MACHINE > */ >if (local_err) { >auto_shutdown = true; >error_free(local_err); >} > >return auto_shutdown; >} > > The comment suggests auto-shutdown should always be set with machine > TYPE_REMOTE_MACHINE, i.e. -machine x-remote basically requires > auto-shutdown=on. Why isn't it the default then? Why is it even > configurable? Use cases? The "auto-shutdown" property tells the server if it should continue running after all the clients disconnect or if it should shut down automatically after the last client disconnects. The user can set this property to "off" when the server serves multiple QEMU clients. The server process will continue to run after the last client disconnects, waiting for more clients to connect in the future. > > Anyway, vfu_object_auto_shutdown() returns > > (1) true when we don't have a current machine > > (2) true when getting the current machine's auto-shutdown property fails > > (3) the value of its auto-shutdown property otherwise > > Two uses: > > * In vfu_object_finalize(): > >if (!k->nr_devs && vfu_object_auto_shutdown()) { >qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN); >} > > I guess this requests shutdown when the last TYPE_VFU_OBJECT dies. > > SHUTDOWN_CAUSE_GUEST_SHUTDOWN is documented as > ># @guest-shutdown: Guest shutdown/suspend request, via ACPI or other ># hardware-specific means > > Can't say whether it's the right one to use here. 
> > * In VFU_OBJECT_ERROR(): > >/** > * VFU_OBJECT_ERROR - reports an error message. If auto_shutdown > * is set, it aborts the machine on error. Otherwise, it logs an > * error message without aborting. > */ >// >#define VFU_OBJECT_ERROR(o, fmt, ...) \ >{ \ >if (vfu_object_auto_shutdown()) { \ >error_setg(&error_abort, (fmt), ## __VA_ARGS__); \ >} else { \ >error_report((fmt), ## __VA_ARGS__);
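The fallback logic of vfu_object_auto_shutdown() quoted in the message above — read a bool property, and default to true whenever the lookup fails — is a small pattern worth isolating. Below is a standalone, hedged sketch of just that defaulting behavior; get_bool_prop() is a hypothetical stand-in for object_property_get_bool() plus its Error handling, not QEMU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for object_property_get_bool(): returns false and
 * sets *err when the property does not exist on the object. */
static bool get_bool_prop(const char *name, bool *err)
{
    if (strcmp(name, "auto-shutdown") == 0) {
        *err = false;
        return false;            /* property exists, configured "off" */
    }
    *err = true;                 /* no such property */
    return false;
}

/* Mirrors vfu_object_auto_shutdown(): honor the configured value when the
 * property exists, otherwise fall back to the safe default (true). */
bool auto_shutdown_with_default(const char *name)
{
    bool err = false;
    bool v = get_bool_prop(name, &err);
    return err ? true : v;
}
```

The point of the default is conservatism: if the machine type has no such property (the "unlikely scenario" in the comment), the server behaves as if auto-shutdown were on.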
[PATCH v3 2/3] target/arm: Select CONFIG_ARM_V7M when TCG is enabled
We cannot allow this config to be disabled at the moment as not all of the relevant code is protected by it. Commit 29d9efca16 ("arm/Kconfig: Do not build TCG-only boards on a KVM-only build") moved the CONFIGs of several boards to Kconfig, so it is now possible that nothing selects ARM_V7M (e.g. when doing a --without-default-devices build). Return the CONFIG_ARM_V7M entry to a state where it is always selected whenever TCG is available. Fixes: 29d9efca16 ("arm/Kconfig: Do not build TCG-only boards on a KVM-only build") Signed-off-by: Fabiano Rosas --- target/arm/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/target/arm/Kconfig b/target/arm/Kconfig index 3fffdcb61b..5947366f6e 100644 --- a/target/arm/Kconfig +++ b/target/arm/Kconfig @@ -1,6 +1,7 @@ config ARM bool select ARM_COMPATIBLE_SEMIHOSTING if TCG +select ARM_V7M if TCG config AARCH64 bool -- 2.35.3
[PATCH v3 0/3] target/arm: disable-tcg and without-default-devices fixes
Changed the cdrom test fix to apply only to the x86 and s390x cdrom boot tests. CI run: https://gitlab.com/farosas/qemu/-/pipelines/860488769 v2: https://lore.kernel.org/r/20230505123524.23401-1-faro...@suse.de v1: https://lore.kernel.org/r/20230503193833.29047-1-faro...@suse.de Here's the fix for the cdrom test failure that we discussed on the list, plus 2 fixes for the --without-default-devices build. When I moved the board CONFIGs from default.mak to Kconfig, it became possible (due to --without-default-devices) to disable the CONFIGs for all the boards that require ARM_V7M. That breaks the build because ARM_V7M is required to always be set. Fabiano Rosas (3): target/arm: Select SEMIHOSTING when using TCG target/arm: Select CONFIG_ARM_V7M when TCG is enabled tests/qtest: Don't run cdrom boot tests if no accelerator is present target/arm/Kconfig | 9 ++--- tests/qtest/cdrom-test.c | 10 ++ 2 files changed, 12 insertions(+), 7 deletions(-) -- 2.35.3
[PATCH v3 3/3] tests/qtest: Don't run cdrom boot tests if no accelerator is present
On a build configured with: --disable-tcg --enable-xen it is possible to produce a QEMU binary with no TCG nor KVM support. Skip the cdrom boot tests if that's the case. Fixes: 0c1ae3ff9d ("tests/qtest: Fix tests when no KVM or TCG are present") Signed-off-by: Fabiano Rosas --- tests/qtest/cdrom-test.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/tests/qtest/cdrom-test.c b/tests/qtest/cdrom-test.c index 26a2400181..31d3bacd8c 100644 --- a/tests/qtest/cdrom-test.c +++ b/tests/qtest/cdrom-test.c @@ -130,6 +130,11 @@ static void test_cdboot(gconstpointer data) static void add_x86_tests(void) { +if (!qtest_has_accel("tcg") && !qtest_has_accel("kvm")) { +g_test_skip("No KVM or TCG accelerator available, skipping boot tests"); +return; +} + qtest_add_data_func("cdrom/boot/default", "-cdrom ", test_cdboot); qtest_add_data_func("cdrom/boot/virtio-scsi", "-device virtio-scsi -device scsi-cd,drive=cdr " @@ -176,6 +181,11 @@ static void add_x86_tests(void) static void add_s390x_tests(void) { +if (!qtest_has_accel("tcg") && !qtest_has_accel("kvm")) { +g_test_skip("No KVM or TCG accelerator available, skipping boot tests"); +return; +} + qtest_add_data_func("cdrom/boot/default", "-cdrom ", test_cdboot); qtest_add_data_func("cdrom/boot/virtio-scsi", "-device virtio-scsi -device scsi-cd,drive=cdr " -- 2.35.3
[PATCH v3 1/3] target/arm: Select SEMIHOSTING when using TCG
Semihosting has been made a 'default y' entry in Kconfig, which does not work because when building --without-default-devices, the semihosting code would not be available. Make semihosting unconditional when TCG is present. Fixes: 29d9efca16 ("arm/Kconfig: Do not build TCG-only boards on a KVM-only build") Signed-off-by: Fabiano Rosas --- target/arm/Kconfig | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/target/arm/Kconfig b/target/arm/Kconfig index 39f05b6420..3fffdcb61b 100644 --- a/target/arm/Kconfig +++ b/target/arm/Kconfig @@ -1,13 +1,7 @@ config ARM bool +select ARM_COMPATIBLE_SEMIHOSTING if TCG config AARCH64 bool select ARM - -# This config exists just so we can make SEMIHOSTING default when TCG -# is selected without also changing it for other architectures. -config ARM_SEMIHOSTING -bool -default y if TCG && ARM -select ARM_COMPATIBLE_SEMIHOSTING -- 2.35.3
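The distinction this series relies on is the difference between Kconfig's `default` and `select` idioms. A hedged sketch of the two (symbol names match the patches; the SOME_BOARD entry is illustrative):

```kconfig
# "default y if TCG" is only a suggested value: --without-default-devices
# (or a user config) can still turn the symbol off, which is what broke
# the build. "select ... if TCG" unconditionally forces the selected
# symbol on whenever the selecting symbol is enabled.
config ARM
    bool
    select ARM_COMPATIBLE_SEMIHOSTING if TCG
    select ARM_V7M if TCG

config SOME_BOARD
    bool
    default y if TCG    # a preference only; may be disabled
```

So `select` is the right tool for symbols the build genuinely cannot do without, while `default` suits optional boards and devices.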
Re: [PATCH] virtio-net: not enable vq reset feature unconditionally
On Mon, May 08, 2023 at 07:31:35PM +0200, Eugenio Perez Martin wrote: > On Mon, May 8, 2023 at 12:22 PM Michael S. Tsirkin wrote: > > > > On Mon, May 08, 2023 at 11:09:46AM +0200, Eugenio Perez Martin wrote: > > > On Sat, May 6, 2023 at 4:25 AM Xuan Zhuo > > > wrote: > > > > > > > > On Thu, 4 May 2023 12:14:47 +0200, =?utf-8?q?Eugenio_P=C3=A9rez?= > > > > wrote: > > > > > The commit 93a97dc5200a ("virtio-net: enable vq reset feature") > > > > > enables > > > > > unconditionally vq reset feature as long as the device is emulated. > > > > > This makes impossible to actually disable the feature, and it causes > > > > > migration problems from qemu version previous than 7.2. > > > > > > > > > > The entire final commit is unneeded as device system already enable or > > > > > disable the feature properly. > > > > > > > > > > This reverts commit 93a97dc5200a95e63b99cb625f20b7ae802ba413. > > > > > Fixes: 93a97dc5200a ("virtio-net: enable vq reset feature") > > > > > Signed-off-by: Eugenio Pérez > > > > > > > > > > --- > > > > > Tested by checking feature bit at > > > > > /sys/devices/pci.../virtio0/features > > > > > enabling and disabling queue_reset virtio-net feature and vhost=on/off > > > > > on net device backend. > > > > > > > > Do you mean that this feature cannot be closed? > > > > > > > > I tried to close in the guest, it was successful. > > > > > > > > > > I'm not sure what you mean with close. If the device dataplane is > > > emulated in qemu (vhost=off), I'm not able to make the device not > > > offer it. > > > > > > > In addition, in this case, could you try to repair the problem instead > > > > of > > > > directly revert. > > > > > > > > > > I'm not following this. The revert is not to always disable the feature. > > > > > > By default, the feature is enabled. If cmdline states queue_reset=on, > > > the feature is enabled. That is true both before and after applying > > > this patch. 
> > > > > > However, in qemu master, queue_reset=off keeps enabling this feature > > > on the device. It happens that there is a commit explicitly doing > > > that, so I'm reverting it. > > > > > > Let me know if that makes sense to you. > > > > > > Thanks! > > > > > > question is this: > > > > DEFINE_PROP_BIT64("queue_reset", _state, _field, \ > > VIRTIO_F_RING_RESET, true) > > > > > > > > don't we need compat for 7.2 and back for this property? > > > > I think that part is already covered by commit 69e1c14aa222 ("virtio: > core: vq reset feature negotation support"). In that regard, maybe we > can simplify the patch message simply stating that queue_reset=off > does not work. > > Thanks! that compat for 7.1 and not 7.2 though? is that correct?
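The property under discussion, `DEFINE_PROP_BIT64("queue_reset", ..., VIRTIO_F_RING_RESET, true)`, toggles one bit in the device's 64-bit host-features word; `queue_reset=off` should clear that bit so the guest can never negotiate it. A minimal sketch of the bit manipulation (VIRTIO_F_RING_RESET is feature bit 40 in the virtio spec; the helper names below mirror QEMU's but are written here as standalone illustrations):

```c
#include <stdbool.h>
#include <stdint.h>

/* VIRTIO_F_RING_RESET is feature bit 40 in the virtio specification. */
enum { VIRTIO_F_RING_RESET = 40 };

/* Test whether a feature bit is offered in a 64-bit feature word. */
static inline bool virtio_has_feature(uint64_t features, unsigned bit)
{
    return (features >> bit) & 1;
}

/* What queue_reset=off should amount to: mask the bit out of the
 * host-features word before negotiation. */
static inline uint64_t virtio_clear_feature(uint64_t features, unsigned bit)
{
    return features & ~(UINT64_C(1) << bit);
}
```

The bug being reverted was exactly that the bit was re-set unconditionally after the property had been applied, so clearing it via `queue_reset=off` had no effect.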
Re: [PATCH 0/4] vhost-user-fs: Internal migration
On Mon, May 8, 2023 at 7:00 PM Hanna Czenczek wrote: > > On 05.05.23 16:37, Hanna Czenczek wrote: > > On 05.05.23 16:26, Eugenio Perez Martin wrote: > >> On Fri, May 5, 2023 at 11:51 AM Hanna Czenczek > >> wrote: > >>> (By the way, thanks for the explanations :)) > >>> > >>> On 05.05.23 11:03, Hanna Czenczek wrote: > On 04.05.23 23:14, Stefan Hajnoczi wrote: > >>> [...] > >>> > > I think it's better to change QEMU's vhost code > > to leave stateful devices suspended (but not reset) across > > vhost_dev_stop() -> vhost_dev_start(), maybe by introducing > > vhost_dev_suspend() and vhost_dev_resume(). Have you thought about > > this aspect? > Yes and no; I mean, I haven’t in detail, but I thought this is what’s > meant by suspending instead of resetting when the VM is stopped. > >>> So, now looking at vhost_dev_stop(), one problem I can see is that > >>> depending on the back-end, different operations it does will do > >>> different things. > >>> > >>> It tries to stop the whole device via vhost_ops->vhost_dev_start(), > >>> which for vDPA will suspend the device, but for vhost-user will > >>> reset it > >>> (if F_STATUS is there). > >>> > >>> It disables all vrings, which doesn’t mean stopping, but may be > >>> necessary, too. (I haven’t yet really understood the use of disabled > >>> vrings, I heard that virtio-net would have a need for it.) > >>> > >>> It then also stops all vrings, though, so that’s OK. And because this > >>> will always do GET_VRING_BASE, this is actually always the same > >>> regardless of transport. > >>> > >>> Finally (for this purpose), it resets the device status via > >>> vhost_ops->vhost_reset_status(). This is only implemented on vDPA, and > >>> this is what resets the device there. > >>> > >>> > >>> So vhost-user resets the device in .vhost_dev_start, but vDPA only does > >>> so in .vhost_reset_status. It would seem better to me if vhost-user > >>> would also reset the device only in .vhost_reset_status, not in > >>> .vhost_dev_start. 
.vhost_dev_start seems precisely like the place to > >>> run SUSPEND/RESUME. > >>> > >> I think the same. I just saw It's been proposed at [1]. > >> > >>> Another question I have (but this is basically what I wrote in my last > >>> email) is why we even call .vhost_reset_status here. If the device > >>> and/or all of the vrings are already stopped, why do we need to reset > >>> it? Naïvely, I had assumed we only really need to reset the device if > >>> the guest changes, so that a new guest driver sees a freshly > >>> initialized > >>> device. > >>> > >> I don't know why we didn't need to call it :). I'm assuming the > >> previous vhost-user net did fine resetting vq indexes, using > >> VHOST_USER_SET_VRING_BASE. But I don't know about more complex > >> devices. > >> > >> The guest can reset the device, or write 0 to the PCI config status, > >> at any time. How does virtiofs handle it, being stateful? > > > > Honestly a good question because virtiofsd implements neither > > SET_STATUS nor RESET_DEVICE. I’ll have to investigate that. > > > > I think when the guest resets the device, SET_VRING_BASE always comes > > along some way or another, so that’s how the vrings are reset. Maybe > > the internal state is reset only following more high-level FUSE > > commands like INIT. > > So a meeting and one session of looking-into-the-code later: > > We reset every virt queue on GET_VRING_BASE, which is wrong, but happens > to serve the purpose. (German is currently on that.) > > In our meeting, German said the reset would occur when the memory > regions are changed, but I can’t see that in the code. That would imply that the status is reset when the guest's memory is added or removed? > I think it only > happens implicitly through the SET_VRING_BASE call, which resets the > internal avail/used pointers. 
> > [This doesn’t seem different from libvhost-user, though, which > implements neither SET_STATUS nor RESET_DEVICE, and which pretends to > reset the device on RESET_OWNER, but really doesn’t (its > vu_reset_device_exec() function just disables all vrings, doesn’t reset > or even stop them).] > > Consequently, the internal state is never reset. It would be cleared on > a FUSE Destroy message, but if you just force-reset the system, the > state remains into the next reboot. Not even FUSE Init clears it, which > seems weird. It happens to work because it’s still the same filesystem, > so the existing state fits, but it kind of seems dangerous to keep e.g. > files open. I don’t think it’s really exploitable because everything > still goes through the guest kernel, but, well. We should clear the > state on Init, and probably also implement SET_STATUS and clear the > state there. > I see. That's in the line of assuming GET_VRING_BASE is the last message received from qemu. Thanks!
Re: [PATCH v4 51/57] tcg/sparc64: Use atom_and_align_for_opc
On 5/5/23 14:20, Peter Maydell wrote: On Wed, 3 May 2023 at 08:13, Richard Henderson wrote: Signed-off-by: Richard Henderson --- tcg/sparc64/tcg-target.c.inc | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tcg/sparc64/tcg-target.c.inc b/tcg/sparc64/tcg-target.c.inc index bb23038529..4f9ec02b1f 100644 --- a/tcg/sparc64/tcg-target.c.inc +++ b/tcg/sparc64/tcg-target.c.inc @@ -1028,11 +1028,13 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h, { TCGLabelQemuLdst *ldst = NULL; MemOp opc = get_memop(oi); -unsigned a_bits = get_alignment_bits(opc); -unsigned s_bits = opc & MO_SIZE; +MemOp s_bits = opc & MO_SIZE; +MemOp a_bits, atom_a, atom_u; unsigned a_mask; /* We don't support unaligned accesses. */ +a_bits = atom_and_align_for_opc(s, &atom_a, &atom_u, opc, +MO_ATOM_IFALIGN, false); a_bits = MAX(a_bits, s_bits); a_mask = (1u << a_bits) - 1; -- No changes to HostAddress struct again? Again, no use of alignment outside of prepare_host_addr. No 128-bit operations, and all host operations aligned. r~
Re: [PATCH v4 49/57] tcg/riscv: Use atom_and_align_for_opc
On 5/5/23 14:19, Peter Maydell wrote: On Wed, 3 May 2023 at 08:13, Richard Henderson wrote: Signed-off-by: Richard Henderson --- tcg/riscv/tcg-target.c.inc | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc index 37870c89fc..4dd33c73e8 100644 --- a/tcg/riscv/tcg-target.c.inc +++ b/tcg/riscv/tcg-target.c.inc @@ -910,8 +910,12 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, TCGReg *pbase, { TCGLabelQemuLdst *ldst = NULL; MemOp opc = get_memop(oi); -unsigned a_bits = get_alignment_bits(opc); -unsigned a_mask = (1u << a_bits) - 1; +MemOp a_bits, atom_a, atom_u; +unsigned a_mask; + +a_bits = atom_and_align_for_opc(s, &atom_a, &atom_u, opc, +MO_ATOM_IFALIGN, false); +a_mask = (1u << a_bits) - 1; #ifdef CONFIG_SOFTMMU unsigned s_bits = opc & MO_SIZE; Same remark as for ppc. Because the alignment was not required outside of prepare_host_addr. RISC-V does not have 128-bit memory operations of any kind. r~
Re: [PATCH v4 48/57] tcg/ppc: Use atom_and_align_for_opc
On 5/5/23 14:18, Peter Maydell wrote: On Wed, 3 May 2023 at 08:13, Richard Henderson wrote: Signed-off-by: Richard Henderson --- tcg/ppc/tcg-target.c.inc | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc index f0a4118bbb..60375804cd 100644 --- a/tcg/ppc/tcg-target.c.inc +++ b/tcg/ppc/tcg-target.c.inc @@ -2034,7 +2034,22 @@ static TCGLabelQemuLdst *prepare_host_addr(TCGContext *s, HostAddress *h, { TCGLabelQemuLdst *ldst = NULL; MemOp opc = get_memop(oi); -unsigned a_bits = get_alignment_bits(opc); +MemOp a_bits, atom_a, atom_u; + +/* + * Book II, Section 1.4, Single-Copy Atomicity, specifies: + * + * Before 3.0, "An access that is not atomic is performed as a set of + * smaller disjoint atomic accesses. In general, the number and alignment + * of these accesses are implementation-dependent." Thus MO_ATOM_IFALIGN. + * + * As of 3.0, "the non-atomic access is performed as described in + * the corresponding list", which matches MO_ATOM_SUBALIGN. + */ +a_bits = atom_and_align_for_opc(s, &atom_a, &atom_u, opc, +have_isa_3_00 ? MO_ATOM_SUBALIGN + : MO_ATOM_IFALIGN, +false); Why doesn't this patch have changes to a HostAddress struct like all the other archs ? Because the alignment was only required here, within prepare_host_addr. The Power LQ instruction allows unaligned input, unlike x86 VMOVDQA. r~
Re: [PATCH] virtio-net: not enable vq reset feature unconditionally
On Mon, May 8, 2023 at 12:22 PM Michael S. Tsirkin wrote: > > On Mon, May 08, 2023 at 11:09:46AM +0200, Eugenio Perez Martin wrote: > > On Sat, May 6, 2023 at 4:25 AM Xuan Zhuo wrote: > > > > > > On Thu, 4 May 2023 12:14:47 +0200, =?utf-8?q?Eugenio_P=C3=A9rez?= > > > wrote: > > > > The commit 93a97dc5200a ("virtio-net: enable vq reset feature") enables > > > > unconditionally vq reset feature as long as the device is emulated. > > > > This makes impossible to actually disable the feature, and it causes > > > > migration problems from qemu version previous than 7.2. > > > > > > > > The entire final commit is unneeded as device system already enable or > > > > disable the feature properly. > > > > > > > > This reverts commit 93a97dc5200a95e63b99cb625f20b7ae802ba413. > > > > Fixes: 93a97dc5200a ("virtio-net: enable vq reset feature") > > > > Signed-off-by: Eugenio Pérez > > > > > > > > --- > > > > Tested by checking feature bit at /sys/devices/pci.../virtio0/features > > > > enabling and disabling queue_reset virtio-net feature and vhost=on/off > > > > on net device backend. > > > > > > Do you mean that this feature cannot be closed? > > > > > > I tried to close in the guest, it was successful. > > > > > > > I'm not sure what you mean with close. If the device dataplane is > > emulated in qemu (vhost=off), I'm not able to make the device not > > offer it. > > > > > In addition, in this case, could you try to repair the problem instead of > > > directly revert. > > > > > > > I'm not following this. The revert is not to always disable the feature. > > > > By default, the feature is enabled. If cmdline states queue_reset=on, > > the feature is enabled. That is true both before and after applying > > this patch. > > > > However, in qemu master, queue_reset=off keeps enabling this feature > > on the device. It happens that there is a commit explicitly doing > > that, so I'm reverting it. > > > > Let me know if that makes sense to you. > > > > Thanks! 
> > > question is this: > > DEFINE_PROP_BIT64("queue_reset", _state, _field, \ > VIRTIO_F_RING_RESET, true) > > > > don't we need compat for 7.2 and back for this property? > I think that part is already covered by commit 69e1c14aa222 ("virtio: core: vq reset feature negotation support"). In that regard, maybe we can simplify the patch message simply stating that queue_reset=off does not work. Thanks!
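The invariant the revert restores is plain virtio feature negotiation: the driver can only accept bits the device actually offers, so queue_reset=off must clear the bit from the offered set. A minimal sketch of that negotiation — helper names and the property plumbing here are illustrative, not QEMU's; only the VIRTIO_F_RING_RESET bit number is taken from the virtio 1.2 spec:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_RING_RESET 40  /* bit number, virtio 1.2 spec */

/* Hypothetical helper: compute the feature set a device offers, honoring
 * a queue_reset=on/off property instead of forcing the bit on. */
static uint64_t device_offered_features(uint64_t base, bool queue_reset)
{
    if (queue_reset) {
        base |= 1ULL << VIRTIO_F_RING_RESET;
    } else {
        base &= ~(1ULL << VIRTIO_F_RING_RESET); /* the path the bug bypassed */
    }
    return base;
}

/* Negotiation: the driver can only ack bits the device offered. */
static uint64_t negotiate(uint64_t offered, uint64_t driver_wants)
{
    return offered & driver_wants;
}

static int demo(void)
{
    /* queue_reset=off: even an eager driver must not end up with the bit */
    uint64_t offered = device_offered_features(0, false);
    uint64_t acked = negotiate(offered, 1ULL << VIRTIO_F_RING_RESET);
    return (int)((acked >> VIRTIO_F_RING_RESET) & 1);
}
```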
Re: [PATCH RESEND] vhost: fix possible wrap in SVQ descriptor ring
On Sat, May 6, 2023 at 5:01 PM Hawkins Jiawei wrote: > > QEMU invokes vhost_svq_add() when adding a guest's element into SVQ. > In vhost_svq_add(), it uses vhost_svq_available_slots() to check > whether QEMU can add the element into the SVQ. If there is > enough space, then QEMU combines some out descriptors and > some in descriptors into one descriptor chain, and add it into > svq->vring.desc by vhost_svq_vring_write_descs(). > > Yet the problem is that, `svq->shadow_avail_idx - svq->shadow_used_idx` > in vhost_svq_available_slots() return the number of occupied elements, > or the number of descriptor chains, instead of the number of occupied > descriptors, which may cause wrapping in SVQ descriptor ring. > > Here is an example. In vhost_handle_guest_kick(), QEMU forwards > as many available buffers to device by virtqueue_pop() and > vhost_svq_add_element(). virtqueue_pop() return a guest's element, > and use vhost_svq_add_elemnt(), a wrapper to vhost_svq_add(), to > add this element into SVQ. If QEMU invokes virtqueue_pop() and > vhost_svq_add_element() `svq->vring.num` times, vhost_svq_available_slots() > thinks QEMU just ran out of slots and everything should work fine. > But in fact, virtqueue_pop() return `svq-vring.num` elements or > descriptor chains, more than `svq->vring.num` descriptors, due to > guest memory fragmentation, and this cause wrapping in SVQ descriptor ring. > The bug is valid even before marking the descriptors used. If the guest memory is fragmented, SVQ must add chains so it can try to add more descriptors than possible. > Therefore, this patch adds `num_free` field in VhostShadowVirtqueue > structure, updates this field in vhost_svq_add() and > vhost_svq_get_buf(), to record the number of free descriptors. > Then we can avoid wrap in SVQ descriptor ring by refactoring > vhost_svq_available_slots(). 
> > Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding") > Signed-off-by: Hawkins Jiawei > --- > hw/virtio/vhost-shadow-virtqueue.c | 9 - > hw/virtio/vhost-shadow-virtqueue.h | 3 +++ > 2 files changed, 11 insertions(+), 1 deletion(-) > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c > b/hw/virtio/vhost-shadow-virtqueue.c > index 8361e70d1b..e1c6952b10 100644 > --- a/hw/virtio/vhost-shadow-virtqueue.c > +++ b/hw/virtio/vhost-shadow-virtqueue.c > @@ -68,7 +68,7 @@ bool vhost_svq_valid_features(uint64_t features, Error > **errp) > */ > static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq) > { > -return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx); > +return svq->num_free; > } > > /** > @@ -263,6 +263,9 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct > iovec *out_sg, > return -EINVAL; > } > > +/* Update the size of SVQ vring free descriptors */ > +svq->num_free -= ndescs; > + > svq->desc_state[qemu_head].elem = elem; > svq->desc_state[qemu_head].ndescs = ndescs; > vhost_svq_kick(svq); > @@ -450,6 +453,9 @@ static VirtQueueElement > *vhost_svq_get_buf(VhostShadowVirtqueue *svq, > svq->desc_next[last_used_chain] = svq->free_head; > svq->free_head = used_elem.id; > > +/* Update the size of SVQ vring free descriptors */ No need for this comment. 
Apart from that, Acked-by: Eugenio Pérez > +svq->num_free += num; > + > *len = used_elem.len; > return g_steal_pointer(&svq->desc_state[used_elem.id].elem); > } > @@ -659,6 +665,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, > VirtIODevice *vdev, > svq->iova_tree = iova_tree; > > svq->vring.num = virtio_queue_get_num(vdev, virtio_get_queue_index(vq)); > +svq->num_free = svq->vring.num; > driver_size = vhost_svq_driver_area_size(svq); > device_size = vhost_svq_device_area_size(svq); > svq->vring.desc = qemu_memalign(qemu_real_host_page_size(), driver_size); > diff --git a/hw/virtio/vhost-shadow-virtqueue.h > b/hw/virtio/vhost-shadow-virtqueue.h > index 926a4897b1..6efe051a70 100644 > --- a/hw/virtio/vhost-shadow-virtqueue.h > +++ b/hw/virtio/vhost-shadow-virtqueue.h > @@ -107,6 +107,9 @@ typedef struct VhostShadowVirtqueue { > > /* Next head to consume from the device */ > uint16_t last_used_idx; > + > +/* Size of SVQ vring free descriptors */ > +uint16_t num_free; > } VhostShadowVirtqueue; > > bool vhost_svq_valid_features(uint64_t features, Error **errp); > -- > 2.25.1 >
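The accounting bug and its fix can be modeled in a few lines: the old check counted occupied *elements* (`shadow_avail_idx - shadow_used_idx`), while each element may occupy several descriptors when guest memory is fragmented. A sketch with illustrative names, not the QEMU ones:

```c
#include <assert.h>
#include <stdint.h>

/* Minimal model of the SVQ accounting fix: track free *descriptors*,
 * not free element slots. */
typedef struct {
    uint16_t num;       /* ring size in descriptors */
    uint16_t num_free;  /* free descriptors, as the patch adds */
} RingModel;

static uint16_t available_slots(const RingModel *r)
{
    return r->num_free;
}

/* Adding one element consumes as many descriptors as its sg chain has. */
static int ring_add(RingModel *r, uint16_t ndescs)
{
    if (available_slots(r) < ndescs) {
        return -1;          /* would wrap the descriptor ring */
    }
    r->num_free -= ndescs;
    return 0;
}

static void ring_get_buf(RingModel *r, uint16_t ndescs)
{
    r->num_free += ndescs;  /* device used the chain; descriptors are free */
}

static int demo(void)
{
    RingModel r = { .num = 4, .num_free = 4 };
    int ok1 = ring_add(&r, 3);  /* 0: fits */
    int ok2 = ring_add(&r, 2);  /* -1: an element-count check would allow it */
    ring_get_buf(&r, 3);
    int ok3 = ring_add(&r, 2);  /* 0: fits again after the chain is used */
    return ok1 == 0 && ok2 == -1 && ok3 == 0;
}
```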
Re: [PATCH 0/4] vhost-user-fs: Internal migration
On 05.05.23 16:37, Hanna Czenczek wrote: On 05.05.23 16:26, Eugenio Perez Martin wrote: On Fri, May 5, 2023 at 11:51 AM Hanna Czenczek wrote: (By the way, thanks for the explanations :)) On 05.05.23 11:03, Hanna Czenczek wrote: On 04.05.23 23:14, Stefan Hajnoczi wrote: [...] I think it's better to change QEMU's vhost code to leave stateful devices suspended (but not reset) across vhost_dev_stop() -> vhost_dev_start(), maybe by introducing vhost_dev_suspend() and vhost_dev_resume(). Have you thought about this aspect? Yes and no; I mean, I haven’t in detail, but I thought this is what’s meant by suspending instead of resetting when the VM is stopped. So, now looking at vhost_dev_stop(), one problem I can see is that depending on the back-end, different operations it does will do different things. It tries to stop the whole device via vhost_ops->vhost_dev_start(), which for vDPA will suspend the device, but for vhost-user will reset it (if F_STATUS is there). It disables all vrings, which doesn’t mean stopping, but may be necessary, too. (I haven’t yet really understood the use of disabled vrings, I heard that virtio-net would have a need for it.) It then also stops all vrings, though, so that’s OK. And because this will always do GET_VRING_BASE, this is actually always the same regardless of transport. Finally (for this purpose), it resets the device status via vhost_ops->vhost_reset_status(). This is only implemented on vDPA, and this is what resets the device there. So vhost-user resets the device in .vhost_dev_start, but vDPA only does so in .vhost_reset_status. It would seem better to me if vhost-user would also reset the device only in .vhost_reset_status, not in .vhost_dev_start. .vhost_dev_start seems precisely like the place to run SUSPEND/RESUME. I think the same. I just saw It's been proposed at [1]. Another question I have (but this is basically what I wrote in my last email) is why we even call .vhost_reset_status here. 
If the device and/or all of the vrings are already stopped, why do we need to reset it? Naïvely, I had assumed we only really need to reset the device if the guest changes, so that a new guest driver sees a freshly initialized device. I don't know why we didn't need to call it :). I'm assuming the previous vhost-user net did fine resetting vq indexes, using VHOST_USER_SET_VRING_BASE. But I don't know about more complex devices. The guest can reset the device, or write 0 to the PCI config status, at any time. How does virtiofs handle it, being stateful? Honestly a good question because virtiofsd implements neither SET_STATUS nor RESET_DEVICE. I’ll have to investigate that. I think when the guest resets the device, SET_VRING_BASE always comes along some way or another, so that’s how the vrings are reset. Maybe the internal state is reset only following more high-level FUSE commands like INIT. So a meeting and one session of looking-into-the-code later: We reset every virt queue on GET_VRING_BASE, which is wrong, but happens to serve the purpose. (German is currently on that.) In our meeting, German said the reset would occur when the memory regions are changed, but I can’t see that in the code. I think it only happens implicitly through the SET_VRING_BASE call, which resets the internal avail/used pointers. [This doesn’t seem different from libvhost-user, though, which implements neither SET_STATUS nor RESET_DEVICE, and which pretends to reset the device on RESET_OWNER, but really doesn’t (its vu_reset_device_exec() function just disables all vrings, doesn’t reset or even stop them).] Consequently, the internal state is never reset. It would be cleared on a FUSE Destroy message, but if you just force-reset the system, the state remains into the next reboot. Not even FUSE Init clears it, which seems weird. It happens to work because it’s still the same filesystem, so the existing state fits, but it kind of seems dangerous to keep e.g. files open. 
I don’t think it’s really exploitable because everything still goes through the guest kernel, but, well. We should clear the state on Init, and probably also implement SET_STATUS and clear the state there. Hanna
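A sketch of the suggested fix: clear per-session state both on a driver status write of 0 (device reset via SET_STATUS) and on FUSE Init, so a rebooted guest never inherits stale open files. All names here are hypothetical stand-ins, none are from virtiofsd:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FDS 4

typedef struct {
    int open_fds[MAX_FDS];
    size_t num_open;
} BackendState;

static void backend_reset_state(BackendState *s)
{
    /* A real backend would close() each fd here. */
    s->num_open = 0;
}

static void handle_set_status(BackendState *s, int status)
{
    if (status == 0) {          /* driver wrote 0: full device reset */
        backend_reset_state(s);
    }
}

static void handle_fuse_init(BackendState *s)
{
    backend_reset_state(s);     /* new guest driver: stale handles must go */
}

static int demo(void)
{
    BackendState s = { .num_open = 2 };
    handle_set_status(&s, 0);
    size_t after_reset = s.num_open;
    s.num_open = 3;
    handle_fuse_init(&s);
    return after_reset == 0 && s.num_open == 0;
}
```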
Re: [PATCH v2 00/12] simpletrace: refactor and general improvements
> > I was curious how Mads is using simpletrace for an internal (to > Samsung?) project. > I was just tracing the NVMe emulation to get some metrics. The code is all upstream or a part of this patchset. The rest is tracing configs.
Re: [PATCH 00/13] Migration PULL request (20230508 edition)
On 5/8/23 16:26, Juan Quintela wrote: Hi This is just the compression bits of the Migration PULL request for 20230428. The only change is that we don't run the compression tests by default. The problem already exists in the compression code; the tests just show that it doesn't work. Please apply, Juan. Missing request-pull data. r~ Lukas Straub (13): qtest/migration-test.c: Add tests with compress enabled qtest/migration-test.c: Add postcopy tests with compress enabled ram.c: Let the compress threads return a CompressResult enum ram.c: Dont change param->block in the compress thread ram.c: Reset result after sending queued data ram.c: Do not call save_page_header() from compress threads ram.c: Call update_compress_thread_counts from compress_send_queued_data ram.c: Remove last ram.c dependency from the core compress code ram.c: Move core compression code into its own file ram.c: Move core decompression code into its own file ram compress: Assert that the file buffer matches the result ram-compress.c: Make target independent migration: Initialize and cleanup decompression in migration.c migration/meson.build| 6 +- migration/migration.c| 9 + migration/qemu-file.c| 11 + migration/qemu-file.h| 1 + migration/ram-compress.c | 485 + migration/ram-compress.h | 70 + migration/ram.c | 502 +++ tests/qtest/migration-test.c | 134 ++ 8 files changed, 758 insertions(+), 460 deletions(-) create mode 100644 migration/ram-compress.c create mode 100644 migration/ram-compress.h
Re: [RFC v2 1/1] migration: Update error description whenever migration fails
Hi! On 08/05/2023 17.32, tejus.gk wrote: There are places in the code where the migration is marked failed with MIGRATION_STATUS_FAILED, but the failiure reason is never updated. Hence s/failiure/failure/ libvirt doesn't know why the migration failed when it queries for it. Signed-off-by: tejus.gk The Signed-off-by line should contain the proper name... Is "tejus.gk" really the correct spelling of your name (with only lowercase letters and a dot in it)? If not, please update the line, thanks! Thomas
Re: [PATCH v4 52/57] tcg/i386: Honor 64-bit atomicity in 32-bit mode
On 5/5/23 14:27, Peter Maydell wrote: On Wed, 3 May 2023 at 08:18, Richard Henderson wrote: Use the fpu to perform 64-bit loads and stores. Signed-off-by: Richard Henderson @@ -2091,7 +2095,20 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi, datalo = datahi; datahi = t; } -if (h.base == datalo || h.index == datalo) { +if (h.atom == MO_64) { +/* + * Atomicity requires that we use use a single 8-byte load. + * For simplicity and code size, always use the FPU for this. + * Similar insns using SSE/AVX are merely larger. I'm surprised there's no performance penalty for throwing old-school FPU insns into what is presumably otherwise code that's only using modern SSE. I have no idea about performance. We don't require SSE for TCG at the moment. I assume the caller has arranged that the top of the stack is trashable at this point? The entire fpu stack is call-clobbered. r~
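The portable way to state the guarantee the backend is providing: an MO_64 access must be a single-copy-atomic 8-byte load or store. On a 32-bit x86 host TCG obtains this from the x87 fild/fistp pair (or SSE); in C11 terms the required semantic is simply a relaxed 64-bit atomic access:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Sketch of the semantic only, not the backend code: the whole 8 bytes
 * must be read/written in one indivisible access, never as two 4-byte
 * halves that a concurrent vCPU could observe torn. */
static uint64_t load_8_atomic(_Atomic uint64_t *p)
{
    return atomic_load_explicit(p, memory_order_relaxed);
}

static void store_8_atomic(_Atomic uint64_t *p, uint64_t v)
{
    atomic_store_explicit(p, v, memory_order_relaxed);
}

static int demo(void)
{
    _Atomic uint64_t x = 0;
    store_8_atomic(&x, 0xdeadbeefcafebabeULL);
    return load_8_atomic(&x) == 0xdeadbeefcafebabeULL;
}
```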
Re: [PATCH] target/m68k: Fix gen_load_fp for OS_LONG
Le 08/05/2023 à 16:08, Richard Henderson a écrit : Case was accidentally dropped in b7a94da9550b. Signed-off-by: Richard Henderson --- target/m68k/translate.c | 1 + 1 file changed, 1 insertion(+) diff --git a/target/m68k/translate.c b/target/m68k/translate.c index 744eb3748b..44d852b106 100644 --- a/target/m68k/translate.c +++ b/target/m68k/translate.c @@ -959,6 +959,7 @@ static void gen_load_fp(DisasContext *s, int opsize, TCGv addr, TCGv_ptr fp, switch (opsize) { case OS_BYTE: case OS_WORD: +case OS_LONG: tcg_gen_qemu_ld_tl(tmp, addr, index, opsize | MO_SIGN | MO_TE); gen_helper_exts32(cpu_env, fp, tmp); break; Tested-by: Laurent Vivier Reviewed-by: Laurent Vivier
[RFC v2 1/1] migration: Update error description whenever migration fails
There are places in the code where the migration is marked failed with MIGRATION_STATUS_FAILED, but the failiure reason is never updated. Hence libvirt doesn't know why the migration failed when it queries for it. Signed-off-by: tejus.gk --- migration/migration.c | 24 +++- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 232e387109..87101eed5c 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1660,15 +1660,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, } else if (strstart(uri, "fd:", &p)) { fd_start_outgoing_migration(s, p, &local_err); } else { -if (!(has_resume && resume)) { -yank_unregister_instance(MIGRATION_YANK_INSTANCE); -} -error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "uri", +error_setg(&local_err, QERR_INVALID_PARAMETER_VALUE, "uri", "a valid migration protocol"); -migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, - MIGRATION_STATUS_FAILED); block_cleanup_parameters(); -return; } if (local_err) { @@ -2050,7 +2044,7 @@ migration_wait_main_channel(MigrationState *ms) * Switch from normal iteration to postcopy * Returns non-0 on error */ -static int postcopy_start(MigrationState *ms) +static int postcopy_start(MigrationState *ms, Error **errp) { int ret; QIOChannelBuffer *bioc; @@ -2165,7 +2159,7 @@ static int postcopy_start(MigrationState *ms) */ ret = qemu_file_get_error(ms->to_dst_file); if (ret) { -error_report("postcopy_start: Migration stream errored (pre package)"); +error_setg(errp, "postcopy_start: Migration stream errored (pre package)"); goto fail_closefb; } @@ -2202,7 +2196,7 @@ static int postcopy_start(MigrationState *ms) ret = qemu_file_get_error(ms->to_dst_file); if (ret) { -error_report("postcopy_start: Migration stream errored"); +error_setg(errp, "postcopy_start: Migration stream errored"); migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE, MIGRATION_STATUS_FAILED); } @@ -2719,6 +2713,7 @@ typedef enum { static 
MigIterateState migration_iteration_run(MigrationState *s) { uint64_t must_precopy, can_postcopy; +Error *local_err = NULL; bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE; qemu_savevm_state_pending_estimate(&must_precopy, &can_postcopy); @@ -2741,8 +2736,9 @@ static MigIterateState migration_iteration_run(MigrationState *s) /* Still a significant amount to transfer */ if (!in_postcopy && must_precopy <= s->threshold_size && qatomic_read(&s->start_postcopy)) { -if (postcopy_start(s)) { -error_report("%s: postcopy failed to start", __func__); +if (postcopy_start(s, &local_err)) { +migrate_set_error(s, local_err); +error_report_err(local_err); } return MIG_ITERATE_SKIP; } @@ -3232,8 +3228,10 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) */ if (migrate_postcopy_ram() || migrate_return_path()) { if (open_return_path_on_source(s, !resume)) { -error_report("Unable to open return-path for postcopy"); +error_setg(&local_err, "Unable to open return-path for postcopy"); migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED); +migrate_set_error(s, local_err); +error_report_err(local_err); migrate_fd_cleanup(s); return; } -- 2.22.3
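The shape of the conversion above, reduced to its essentials: the failing function fills in an error through an `Error **` parameter instead of printing it, and the caller both records it (so a later query can report why migration failed) and reports it. This sketch mimics the `error_setg()`/`migrate_set_error()` pattern with a simplified `Error` type; it is not QEMU code:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Error { char msg[256]; } Error;

/* Simplified analogue of QEMU's error_setg(): allocate and fill *errp. */
static void error_setg_(Error **errp, const char *fmt, ...)
{
    va_list ap;
    *errp = malloc(sizeof(Error));
    va_start(ap, fmt);
    vsnprintf((*errp)->msg, sizeof((*errp)->msg), fmt, ap);
    va_end(ap);
}

static Error *stored_migration_error;  /* stands in for MigrationState.error */

/* The failing function no longer prints; it hands the error up. */
static int postcopy_start_model(int stream_ok, Error **errp)
{
    if (!stream_ok) {
        error_setg_(errp, "postcopy_start: Migration stream errored");
        return -1;
    }
    return 0;
}

static int demo(void)
{
    Error *local_err = NULL;
    if (postcopy_start_model(0, &local_err)) {
        stored_migration_error = local_err;  /* migrate_set_error() analogue */
    }
    return stored_migration_error &&
           strstr(stored_migration_error->msg, "stream errored") != NULL;
}
```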
[RFC v2 0/1] migration: Update error description whenever migration fails
Hi everyone, Thanks for the reviews. This is the v2 patchset based on the reviews received for the previous one. Links to previous patchsets: v1: https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg00868.html tejus.gk (1): migration: Update error description whenever migration fails migration/migration.c | 24 +++- 1 file changed, 11 insertions(+), 13 deletions(-) -- 2.22.3
[PATCH 04/13] ram.c: Dont change param->block in the compress thread
From: Lukas Straub Instead introduce an extra parameter to trigger the compress thread. Now, when the compress thread is done, we know which RAMBlock and offset it compressed. This will be used in the next commits to move save_page_header() out of compress code. Signed-off-by: Lukas Straub Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 7bc05fc034..b552a9e538 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -492,6 +492,7 @@ typedef enum CompressResult CompressResult; struct CompressParam { bool done; bool quit; +bool trigger; CompressResult result; QEMUFile *file; QemuMutex mutex; @@ -565,10 +566,10 @@ static void *do_data_compress(void *opaque) qemu_mutex_lock(&param->mutex); while (!param->quit) { -if (param->block) { +if (param->trigger) { block = param->block; offset = param->offset; -param->block = NULL; +param->trigger = false; qemu_mutex_unlock(&param->mutex); result = do_compress_ram_page(param->file, &param->stream, @@ -1545,6 +1546,7 @@ static inline void set_compress_params(CompressParam *param, RAMBlock *block, { param->block = block; param->offset = offset; +param->trigger = true; } static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset) -- 2.40.0
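The protocol change in this patch, modeled with plain pthreads: the submitter raises an explicit `trigger` flag instead of signalling work via `block != NULL`, so `block` and `offset` remain readable after the worker finishes. Illustrative stand-ins only — ints instead of RAMBlock*/ram_addr_t, and one condition variable where QEMU uses a separate comp_done_cond:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool trigger, quit, done;
    int block, offset;      /* stand-ins for RAMBlock* / ram_addr_t */
    int last_compressed;    /* observable after the thread completes */
} Param;

static void *worker(void *opaque)
{
    Param *p = opaque;

    pthread_mutex_lock(&p->mutex);
    while (!p->quit) {
        if (p->trigger) {
            int block = p->block, offset = p->offset;
            p->trigger = false;               /* block/offset stay valid */
            pthread_mutex_unlock(&p->mutex);
            p->last_compressed = block + offset;  /* stand-in "compress" */
            pthread_mutex_lock(&p->mutex);
            p->done = true;
            pthread_cond_broadcast(&p->cond);
        } else {
            pthread_cond_wait(&p->cond, &p->mutex);
        }
    }
    pthread_mutex_unlock(&p->mutex);
    return NULL;
}

static int run_once(int block, int offset)
{
    Param p = { .mutex = PTHREAD_MUTEX_INITIALIZER,
                .cond = PTHREAD_COND_INITIALIZER };
    pthread_t th;

    pthread_create(&th, NULL, worker, &p);

    pthread_mutex_lock(&p.mutex);
    p.block = block;
    p.offset = offset;
    p.done = false;
    p.trigger = true;                /* the explicit signal */
    pthread_cond_broadcast(&p.cond);
    while (!p.done) {
        pthread_cond_wait(&p.cond, &p.mutex);
    }
    p.quit = true;
    pthread_cond_broadcast(&p.cond);
    pthread_mutex_unlock(&p.mutex);

    pthread_join(th, NULL);
    return p.last_compressed;
}
```

Because all flags are checked under the mutex, a broadcast sent before the other side waits is never lost.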
[PATCH 10/13] ram.c: Move core decompression code into its own file
From: Lukas Straub No functional changes intended. Signed-off-by: Lukas Straub Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram-compress.c | 203 ++ migration/ram-compress.h | 5 + migration/ram.c | 204 --- 3 files changed, 208 insertions(+), 204 deletions(-) diff --git a/migration/ram-compress.c b/migration/ram-compress.c index d9bc67d075..c25562f12d 100644 --- a/migration/ram-compress.c +++ b/migration/ram-compress.c @@ -48,6 +48,24 @@ static QemuThread *compress_threads; static QemuMutex comp_done_lock; static QemuCond comp_done_cond; +struct DecompressParam { +bool done; +bool quit; +QemuMutex mutex; +QemuCond cond; +void *des; +uint8_t *compbuf; +int len; +z_stream stream; +}; +typedef struct DecompressParam DecompressParam; + +static QEMUFile *decomp_file; +static DecompressParam *decomp_param; +static QemuThread *decompress_threads; +static QemuMutex decomp_done_lock; +static QemuCond decomp_done_cond; + static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block, ram_addr_t offset, uint8_t *source_buf); @@ -272,3 +290,188 @@ retry: return pages; } + +/* return the size after decompression, or negative value on error */ +static int +qemu_uncompress_data(z_stream *stream, uint8_t *dest, size_t dest_len, + const uint8_t *source, size_t source_len) +{ +int err; + +err = inflateReset(stream); +if (err != Z_OK) { +return -1; +} + +stream->avail_in = source_len; +stream->next_in = (uint8_t *)source; +stream->avail_out = dest_len; +stream->next_out = dest; + +err = inflate(stream, Z_NO_FLUSH); +if (err != Z_STREAM_END) { +return -1; +} + +return stream->total_out; +} + +static void *do_data_decompress(void *opaque) +{ +DecompressParam *param = opaque; +unsigned long pagesize; +uint8_t *des; +int len, ret; + +qemu_mutex_lock(&param->mutex); +while (!param->quit) { +if (param->des) { +des = param->des; +len = param->len; +param->des = 0; +qemu_mutex_unlock(&param->mutex); + +pagesize =
TARGET_PAGE_SIZE; + +ret = qemu_uncompress_data(&param->stream, des, pagesize, + param->compbuf, len); +if (ret < 0 && migrate_get_current()->decompress_error_check) { +error_report("decompress data failed"); +qemu_file_set_error(decomp_file, ret); +} + +qemu_mutex_lock(&decomp_done_lock); +param->done = true; +qemu_cond_signal(&decomp_done_cond); +qemu_mutex_unlock(&decomp_done_lock); + +qemu_mutex_lock(&param->mutex); +} else { +qemu_cond_wait(&param->cond, &param->mutex); +} +} +qemu_mutex_unlock(&param->mutex); + +return NULL; +} + +int wait_for_decompress_done(void) +{ +int idx, thread_count; + +if (!migrate_compress()) { +return 0; +} + +thread_count = migrate_decompress_threads(); +qemu_mutex_lock(&decomp_done_lock); +for (idx = 0; idx < thread_count; idx++) { +while (!decomp_param[idx].done) { +qemu_cond_wait(&decomp_done_cond, &decomp_done_lock); +} +} +qemu_mutex_unlock(&decomp_done_lock); +return qemu_file_get_error(decomp_file); +} + +void compress_threads_load_cleanup(void) +{ +int i, thread_count; + +if (!migrate_compress()) { +return; +} +thread_count = migrate_decompress_threads(); +for (i = 0; i < thread_count; i++) { +/* + * we use it as a indicator which shows if the thread is + * properly init'd or not + */ +if (!decomp_param[i].compbuf) { +break; +} + +qemu_mutex_lock(&decomp_param[i].mutex); +decomp_param[i].quit = true; +qemu_cond_signal(&decomp_param[i].cond); +qemu_mutex_unlock(&decomp_param[i].mutex); +} +for (i = 0; i < thread_count; i++) { +if (!decomp_param[i].compbuf) { +break; +} + +qemu_thread_join(decompress_threads + i); +qemu_mutex_destroy(&decomp_param[i].mutex); +qemu_cond_destroy(&decomp_param[i].cond); +inflateEnd(&decomp_param[i].stream); +g_free(decomp_param[i].compbuf); +decomp_param[i].compbuf = NULL; +} +g_free(decompress_threads); +g_free(decomp_param); +decompress_threads = NULL; +decomp_param = NULL; +decomp_file = NULL; +} + +int compress_threads_load_setup(QEMUFile *f) +{ +int i, thread_count; + +if (!migrate_compress()) { +return 0; +}
+ +thread_count = migrate_decompress_threads(); +decompress_thr
[PATCH 12/13] ram-compress.c: Make target independent
From: Lukas Straub Make ram-compress.c target independent. Signed-off-by: Lukas Straub Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/meson.build| 3 ++- migration/ram-compress.c | 17 ++--- 2 files changed, 12 insertions(+), 8 deletions(-) diff --git a/migration/meson.build b/migration/meson.build index 2090af8e85..75de868bb7 100644 --- a/migration/meson.build +++ b/migration/meson.build @@ -23,6 +23,8 @@ softmmu_ss.add(files( 'migration.c', 'multifd.c', 'multifd-zlib.c', + 'multifd-zlib.c', + 'ram-compress.c', 'options.c', 'postcopy-ram.c', 'savevm.c', @@ -40,5 +42,4 @@ softmmu_ss.add(when: zstd, if_true: files('multifd-zstd.c')) specific_ss.add(when: 'CONFIG_SOFTMMU', if_true: files('dirtyrate.c', 'ram.c', - 'ram-compress.c', 'target.c')) diff --git a/migration/ram-compress.c b/migration/ram-compress.c index 3d2a4a6329..06254d8c69 100644 --- a/migration/ram-compress.c +++ b/migration/ram-compress.c @@ -35,7 +35,8 @@ #include "migration.h" #include "options.h" #include "io/channel-null.h" -#include "exec/ram_addr.h" +#include "exec/target_page.h" +#include "exec/ramblock.h" CompressionStats compression_counters; @@ -156,7 +157,7 @@ int compress_threads_save_setup(void) qemu_cond_init(&comp_done_cond); qemu_mutex_init(&comp_done_lock); for (i = 0; i < thread_count; i++) { -comp_param[i].originbuf = g_try_malloc(TARGET_PAGE_SIZE); +comp_param[i].originbuf = g_try_malloc(qemu_target_page_size()); if (!comp_param[i].originbuf) { goto exit; } @@ -192,11 +193,12 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream, uint8_t *source_buf) { uint8_t *p = block->host + offset; +size_t page_size = qemu_target_page_size(); int ret; assert(qemu_file_buffer_empty(f)); -if (buffer_is_zero(p, TARGET_PAGE_SIZE)) { +if (buffer_is_zero(p, page_size)) { return RES_ZEROPAGE; } @@ -205,8 +207,8 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream, * so that we can catch up the error 
during compression and * decompression */ -memcpy(source_buf, p, TARGET_PAGE_SIZE); -ret = qemu_put_compression_data(f, stream, source_buf, TARGET_PAGE_SIZE); +memcpy(source_buf, p, page_size); +ret = qemu_put_compression_data(f, stream, source_buf, page_size); if (ret < 0) { qemu_file_set_error(migrate_get_current()->to_dst_file, ret); error_report("compressed data failed!"); @@ -336,7 +338,7 @@ static void *do_data_decompress(void *opaque) param->des = 0; qemu_mutex_unlock(&param->mutex); -pagesize = TARGET_PAGE_SIZE; +pagesize = qemu_target_page_size(); ret = qemu_uncompress_data(&param->stream, des, pagesize, param->compbuf, len); @@ -439,7 +441,8 @@ int compress_threads_load_setup(QEMUFile *f) goto exit; } -decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE)); +size_t compbuf_size = compressBound(qemu_target_page_size()); +decomp_param[i].compbuf = g_malloc0(compbuf_size); qemu_mutex_init(&decomp_param[i].mutex); qemu_cond_init(&decomp_param[i].cond); decomp_param[i].done = true; -- 2.40.0
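The pattern of this patch in miniature: code moving into a target-independent object must fetch the page size at run time (qemu_target_page_size()) rather than baking in the compile-time TARGET_PAGE_SIZE constant. A sketch with stand-in names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the value qemu_target_page_size() returns; in real code it
 * is fixed once per target at startup, not a compile-time constant. */
static size_t target_page_size = 4096;

static size_t qemu_target_page_size_model(void)
{
    return target_page_size;
}

/* Simplified stand-in for QEMU's buffer_is_zero(). */
static bool buffer_is_zero_model(const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/* Target-independent zero-page check: page size comes from a call. */
static bool is_zero_page(const unsigned char *page)
{
    return buffer_is_zero_model(page, qemu_target_page_size_model());
}

static int demo(void)
{
    static unsigned char page[4096];        /* zero-initialized */
    int zero_before = is_zero_page(page);   /* 1 */
    page[100] = 1;
    int zero_after = is_zero_page(page);    /* 0 */
    return zero_before * 10 + zero_after;
}
```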
[PATCH 07/13] ram.c: Call update_compress_thread_counts from compress_send_queued_data
From: Lukas Straub This makes the core compress code more independent of ram.c. Signed-off-by: Lukas Straub Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/ram.c | 18 ++ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index c52602b70d..d1c24eff21 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1540,12 +1540,14 @@ static int send_queued_data(CompressParam *param) abort(); } +update_compress_thread_counts(param, len); + return len; } static void flush_compressed_data(RAMState *rs) { -int idx, len, thread_count; +int idx, thread_count; if (!save_page_use_compression(rs)) { return; @@ -1564,15 +1566,8 @@ static void flush_compressed_data(RAMState *rs) qemu_mutex_lock(&comp_param[idx].mutex); if (!comp_param[idx].quit) { CompressParam *param = &comp_param[idx]; -len = send_queued_data(param); +send_queued_data(param); compress_reset_result(param); - -/* - * it's safe to fetch zero_page without holding comp_done_lock - * as there is no further request submitted to the thread, - * i.e, the thread should be waiting for a request at this point. - */ -update_compress_thread_counts(param, len); } qemu_mutex_unlock(&comp_param[idx].mutex); } @@ -1588,7 +1583,7 @@ static inline void set_compress_params(CompressParam *param, RAMBlock *block, static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset) { -int idx, thread_count, bytes_xmit = -1, pages = -1; +int idx, thread_count, pages = -1; bool wait = migrate_compress_wait_thread(); thread_count = migrate_compress_threads(); @@ -1599,11 +1594,10 @@ retry: CompressParam *param = &comp_param[idx]; qemu_mutex_lock(&param->mutex); param->done = false; -bytes_xmit = send_queued_data(param); +send_queued_data(param); compress_reset_result(param); set_compress_params(param, block, offset); -update_compress_thread_counts(param, bytes_xmit); qemu_cond_signal(&param->cond); qemu_mutex_unlock(&param->mutex); pages = 1; -- 2.40.0
[PATCH 05/13] ram.c: Reset result after sending queued data
From: Lukas Straub

And take the param->mutex lock for the whole section to ensure
thread-safety. Now, it is explicitly clear if there is no queued
data to send. Before, this was handled by param->file stream being
empty and thus qemu_put_qemu_file() not sending anything.

This will be used in the next commits to move save_page_header()
out of compress code.

Signed-off-by: Lukas Straub
Reviewed-by: Juan Quintela
Signed-off-by: Juan Quintela
---
 migration/ram.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index b552a9e538..4e14e3bb94 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1508,6 +1508,13 @@ update_compress_thread_counts(const CompressParam *param, int bytes_xmit)
 
 static bool save_page_use_compression(RAMState *rs);
 
+static inline void compress_reset_result(CompressParam *param)
+{
+    param->result = RES_NONE;
+    param->block = NULL;
+    param->offset = 0;
+}
+
 static void flush_compressed_data(RAMState *rs)
 {
     MigrationState *ms = migrate_get_current();
@@ -1529,13 +1536,16 @@ static void flush_compressed_data(RAMState *rs)
     for (idx = 0; idx < thread_count; idx++) {
         qemu_mutex_lock(&comp_param[idx].mutex);
         if (!comp_param[idx].quit) {
-            len = qemu_put_qemu_file(ms->to_dst_file, comp_param[idx].file);
+            CompressParam *param = &comp_param[idx];
+            len = qemu_put_qemu_file(ms->to_dst_file, param->file);
+            compress_reset_result(param);
+
             /*
              * it's safe to fetch zero_page without holding comp_done_lock
              * as there is no further request submitted to the thread,
              * i.e, the thread should be waiting for a request at this point.
              */
-            update_compress_thread_counts(&comp_param[idx], len);
+            update_compress_thread_counts(param, len);
         }
         qemu_mutex_unlock(&comp_param[idx].mutex);
     }
@@ -1560,15 +1570,17 @@ static int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset)
 retry:
     for (idx = 0; idx < thread_count; idx++) {
         if (comp_param[idx].done) {
-            comp_param[idx].done = false;
-            bytes_xmit = qemu_put_qemu_file(ms->to_dst_file,
-                                            comp_param[idx].file);
-            qemu_mutex_lock(&comp_param[idx].mutex);
-            set_compress_params(&comp_param[idx], block, offset);
-            qemu_cond_signal(&comp_param[idx].cond);
-            qemu_mutex_unlock(&comp_param[idx].mutex);
+            CompressParam *param = &comp_param[idx];
+            qemu_mutex_lock(&param->mutex);
+            param->done = false;
+            bytes_xmit = qemu_put_qemu_file(ms->to_dst_file, param->file);
+            compress_reset_result(param);
+            set_compress_params(param, block, offset);
+
+            update_compress_thread_counts(param, bytes_xmit);
+            qemu_cond_signal(&param->cond);
+            qemu_mutex_unlock(&param->mutex);
             pages = 1;
-            update_compress_thread_counts(&comp_param[idx], bytes_xmit);
             break;
         }
     }
-- 
2.40.0
[PATCH 03/13] ram.c: Let the compress threads return a CompressResult enum
From: Lukas Straub

This will be used in the next commits to move save_page_header()
out of compress code.

Signed-off-by: Lukas Straub
Reviewed-by: Juan Quintela
Signed-off-by: Juan Quintela
---
 migration/ram.c | 34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 5e7bf20ca5..7bc05fc034 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -482,10 +482,17 @@ MigrationOps *migration_ops;
 
 CompressionStats compression_counters;
 
+enum CompressResult {
+    RES_NONE = 0,
+    RES_ZEROPAGE = 1,
+    RES_COMPRESS = 2
+};
+typedef enum CompressResult CompressResult;
+
 struct CompressParam {
     bool done;
     bool quit;
-    bool zero_page;
+    CompressResult result;
     QEMUFile *file;
     QemuMutex mutex;
     QemuCond cond;
@@ -527,8 +534,9 @@ static QemuCond decomp_done_cond;
 
 static int ram_save_host_page_urgent(PageSearchStatus *pss);
 
-static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
-                                 ram_addr_t offset, uint8_t *source_buf);
+static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
+                                           RAMBlock *block, ram_addr_t offset,
+                                           uint8_t *source_buf);
 
 /* NOTE: page is the PFN not real ram_addr_t. */
 static void pss_init(PageSearchStatus *pss, RAMBlock *rb, ram_addr_t page)
@@ -553,7 +561,7 @@ static void *do_data_compress(void *opaque)
     CompressParam *param = opaque;
     RAMBlock *block;
     ram_addr_t offset;
-    bool zero_page;
+    CompressResult result;
 
     qemu_mutex_lock(&param->mutex);
     while (!param->quit) {
@@ -563,12 +571,12 @@ static void *do_data_compress(void *opaque)
             param->block = NULL;
             qemu_mutex_unlock(&param->mutex);
 
-            zero_page = do_compress_ram_page(param->file, &param->stream,
-                                             block, offset, param->originbuf);
+            result = do_compress_ram_page(param->file, &param->stream,
+                                          block, offset, param->originbuf);
 
             qemu_mutex_lock(&comp_done_lock);
             param->done = true;
-            param->zero_page = zero_page;
+            param->result = result;
             qemu_cond_signal(&comp_done_cond);
             qemu_mutex_unlock(&comp_done_lock);
 
@@ -1452,8 +1460,9 @@ static int ram_save_multifd_page(QEMUFile *file, RAMBlock *block,
     return 1;
 }
 
-static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
-                                 ram_addr_t offset, uint8_t *source_buf)
+static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
+                                           RAMBlock *block, ram_addr_t offset,
+                                           uint8_t *source_buf)
 {
     RAMState *rs = ram_state;
     PageSearchStatus *pss = &rs->pss[RAM_CHANNEL_PRECOPY];
@@ -1461,7 +1470,7 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
     int ret;
 
     if (save_zero_page_to_file(pss, f, block, offset)) {
-        return true;
+        return RES_ZEROPAGE;
     }
 
     save_page_header(pss, f, block, offset | RAM_SAVE_FLAG_COMPRESS_PAGE);
@@ -1476,8 +1485,9 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
     if (ret < 0) {
         qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
         error_report("compressed data failed!");
+        return RES_NONE;
     }
-    return false;
+    return RES_COMPRESS;
 }
 
 static void
@@ -1485,7 +1495,7 @@ update_compress_thread_counts(const CompressParam *param, int bytes_xmit)
 {
     ram_transferred_add(bytes_xmit);
 
-    if (param->zero_page) {
+    if (param->result == RES_ZEROPAGE) {
         stat64_add(&mig_stats.zero_pages, 1);
         return;
     }
-- 
2.40.0
[PATCH 13/13] migration: Initialize and cleanup decompression in migration.c
From: Lukas Straub

This fixes compression with COLO.

Signed-off-by: Lukas Straub
Reviewed-by: Juan Quintela
Signed-off-by: Juan Quintela
---
 migration/migration.c | 9 +++++++++
 migration/ram.c       | 5 -----
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 232e387109..0ee07802a5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -26,6 +26,7 @@
 #include "sysemu/cpu-throttle.h"
 #include "rdma.h"
 #include "ram.h"
+#include "ram-compress.h"
 #include "migration/global_state.h"
 #include "migration/misc.h"
 #include "migration.h"
@@ -228,6 +229,7 @@ void migration_incoming_state_destroy(void)
     struct MigrationIncomingState *mis = migration_incoming_get_current();
 
     multifd_load_cleanup();
+    compress_threads_load_cleanup();
 
     if (mis->to_src_file) {
         /* Tell source that we are done */
@@ -500,6 +502,12 @@ process_incoming_migration_co(void *opaque)
     Error *local_err = NULL;
 
     assert(mis->from_src_file);
+
+    if (compress_threads_load_setup(mis->from_src_file)) {
+        error_report("Failed to setup decompress threads");
+        goto fail;
+    }
+
     mis->migration_incoming_co = qemu_coroutine_self();
     mis->largest_page_size = qemu_ram_pagesize_largest();
     postcopy_state_set(POSTCOPY_INCOMING_NONE);
@@ -565,6 +573,7 @@ fail:
     qemu_fclose(mis->from_src_file);
 
     multifd_load_cleanup();
+    compress_threads_load_cleanup();
 
     exit(EXIT_FAILURE);
 }
diff --git a/migration/ram.c b/migration/ram.c
index ee4ab31f25..f78e9912cd 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3558,10 +3558,6 @@ void colo_release_ram_cache(void)
  */
 static int ram_load_setup(QEMUFile *f, void *opaque)
 {
-    if (compress_threads_load_setup(f)) {
-        return -1;
-    }
-
     xbzrle_load_setup();
     ramblock_recv_map_init();
 
@@ -3577,7 +3573,6 @@ static int ram_load_cleanup(void *opaque)
     }
 
     xbzrle_load_cleanup();
-    compress_threads_load_cleanup();
 
     RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
         g_free(rb->receivedmap);
-- 
2.40.0
[PATCH 11/13] ram compress: Assert that the file buffer matches the result
From: Lukas Straub

Before this series, "nothing to send" was handled by the file buffer
being empty. Now it is tracked via param->result.

Assert that the file buffer state matches the result.

Signed-off-by: Lukas Straub
Reviewed-by: Juan Quintela
Signed-off-by: Juan Quintela
---
 migration/qemu-file.c    | 11 +++++++++++
 migration/qemu-file.h    |  1 +
 migration/ram-compress.c |  5 +++++
 migration/ram.c          |  2 ++
 4 files changed, 19 insertions(+)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index f4cfd05c67..61fb580342 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -870,6 +870,17 @@ int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src)
     return len;
 }
 
+/*
+ * Check if the writable buffer is empty
+ */
+
+bool qemu_file_buffer_empty(QEMUFile *file)
+{
+    assert(qemu_file_is_writable(file));
+
+    return !file->iovcnt;
+}
+
 /*
  * Get a string whose length is determined by a single preceding byte
  * A preallocated 256 byte buffer must be passed in.
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 4f26bf6961..4ee58a87dd 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -113,6 +113,7 @@ size_t coroutine_mixed_fn qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, s
 ssize_t qemu_put_compression_data(QEMUFile *f, z_stream *stream,
                                   const uint8_t *p, size_t size);
 int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src);
+bool qemu_file_buffer_empty(QEMUFile *file);
 
 /*
  * Note that you can only peek continuous bytes from where the current pointer
diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index c25562f12d..3d2a4a6329 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -194,6 +194,8 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
     uint8_t *p = block->host + offset;
     int ret;
 
+    assert(qemu_file_buffer_empty(f));
+
     if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
         return RES_ZEROPAGE;
     }
@@ -208,6 +210,7 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
     if (ret < 0) {
         qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
         error_report("compressed data failed!");
+        qemu_fflush(f);
         return RES_NONE;
     }
     return RES_COMPRESS;
@@ -239,6 +242,7 @@ void flush_compressed_data(int (send_queued_data(CompressParam *)))
         if (!comp_param[idx].quit) {
             CompressParam *param = &comp_param[idx];
             send_queued_data(param);
+            assert(qemu_file_buffer_empty(param->file));
             compress_reset_result(param);
         }
         qemu_mutex_unlock(&comp_param[idx].mutex);
@@ -268,6 +272,7 @@ retry:
             qemu_mutex_lock(&param->mutex);
             param->done = false;
             send_queued_data(param);
+            assert(qemu_file_buffer_empty(param->file));
             compress_reset_result(param);
             set_compress_params(param, block, offset);
 
diff --git a/migration/ram.c b/migration/ram.c
index 009681d213..ee4ab31f25 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1321,11 +1321,13 @@ static int send_queued_data(CompressParam *param)
     assert(block == pss->last_sent_block);
 
     if (param->result == RES_ZEROPAGE) {
+        assert(qemu_file_buffer_empty(param->file));
         len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_ZERO);
         qemu_put_byte(file, 0);
         len += 1;
         ram_release_page(block->idstr, offset);
     } else if (param->result == RES_COMPRESS) {
+        assert(!qemu_file_buffer_empty(param->file));
         len += save_page_header(pss, file, block,
                                 offset | RAM_SAVE_FLAG_COMPRESS_PAGE);
         len += qemu_put_qemu_file(file, param->file);
-- 
2.40.0
[PATCH 09/13] ram.c: Move core compression code into its own file
From: Lukas Straub

No functional changes intended.

Signed-off-by: Lukas Straub
Reviewed-by: Juan Quintela
Signed-off-by: Juan Quintela
---
 migration/meson.build    |   5 +-
 migration/ram-compress.c | 274 +++++++++++++++++++++++++++++++++++++++
 migration/ram-compress.h |  65 ++++++++++
 migration/ram.c          | 262 +------------------------------------
 4 files changed, 344 insertions(+), 262 deletions(-)
 create mode 100644 migration/ram-compress.c
 create mode 100644 migration/ram-compress.h

diff --git a/migration/meson.build b/migration/meson.build
index da1897fadf..2090af8e85 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -38,4 +38,7 @@ endif
 softmmu_ss.add(when: zstd, if_true: files('multifd-zstd.c'))
 
 specific_ss.add(when: 'CONFIG_SOFTMMU',
-                if_true: files('dirtyrate.c', 'ram.c', 'target.c'))
+                if_true: files('dirtyrate.c',
+                               'ram.c',
+                               'ram-compress.c',
+                               'target.c'))
diff --git a/migration/ram-compress.c b/migration/ram-compress.c
new file mode 100644
index 0000000000..d9bc67d075
--- /dev/null
+++ b/migration/ram-compress.c
@@ -0,0 +1,274 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2011-2015 Red Hat Inc
+ *
+ * Authors:
+ *  Juan Quintela
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+
+#include "ram-compress.h"
+
+#include "qemu/error-report.h"
+#include "migration.h"
+#include "options.h"
+#include "io/channel-null.h"
+#include "exec/ram_addr.h"
+
+CompressionStats compression_counters;
+
+static CompressParam *comp_param;
+static QemuThread *compress_threads;
+/* comp_done_cond is used to wake up the migration thread when
+ * one of the compression threads has finished the compression.
+ * comp_done_lock is used to co-work with comp_done_cond.
+ */
+static QemuMutex comp_done_lock;
+static QemuCond comp_done_cond;
+
+static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
+                                           RAMBlock *block, ram_addr_t offset,
+                                           uint8_t *source_buf);
+
+static void *do_data_compress(void *opaque)
+{
+    CompressParam *param = opaque;
+    RAMBlock *block;
+    ram_addr_t offset;
+    CompressResult result;
+
+    qemu_mutex_lock(&param->mutex);
+    while (!param->quit) {
+        if (param->trigger) {
+            block = param->block;
+            offset = param->offset;
+            param->trigger = false;
+            qemu_mutex_unlock(&param->mutex);
+
+            result = do_compress_ram_page(param->file, &param->stream,
+                                          block, offset, param->originbuf);
+
+            qemu_mutex_lock(&comp_done_lock);
+            param->done = true;
+            param->result = result;
+            qemu_cond_signal(&comp_done_cond);
+            qemu_mutex_unlock(&comp_done_lock);
+
+            qemu_mutex_lock(&param->mutex);
+        } else {
+            qemu_cond_wait(&param->cond, &param->mutex);
+        }
+    }
+    qemu_mutex_unlock(&param->mutex);
+
+    return NULL;
+}
+
+void compress_threads_save_cleanup(void)
+{
+    int i, thread_count;
+
+    if (!migrate_compress() || !comp_param) {
+        return;
+    }
+
+    thread_count = migrate_compress_threads();
+    for (i = 0; i < thread_count; i++) {
+        /*
+         * we use it as a indicator which shows if the thread is
+         * properly init'd or not
+         */
+        if (!comp_param[i].file) {
+            break;
+        }
+
+        qemu_mutex_lock(&comp_param[i].mutex);
+        comp_param[i].quit = true;
+        qemu_cond_signal(&comp_param[i].cond);
+        qemu_mutex_unlock(&comp_param[i].mutex);
+
+        qemu_thread_join(compress_threads + i);
+        qemu_mutex_destroy(&comp_param[i].mutex);
+        qemu_cond_destroy(&comp