Re: [PATCH v4 03/10] migration: migrate 'inc' command option is deprecated.
Juan Quintela writes: > Markus Armbruster wrote: >> Juan Quintela writes: >> >>> Markus Armbruster wrote: Juan Quintela writes: >>> So what I want, I want to remove -i/-b in the next version (9.0?). For >>> the other, I want to remove it, but I don't care if the code is around >>> in "deprecated" state for another couple of years if there are still >>> people that feel that they want it. >>> >>> This is the reason that I put a pointer for -i/-b to >>> @block/@block-incremental. They are "perfect" replacements. >>> >>> I can put here to use blockdev-mirror + NBD, but the replacement is not >>> so direct. >>> >>> Does this make sense? >> >> I see where you're coming from. Now let's change perspective for a >> minute: what's the purpose of deprecating stuff? >> >> We normally deprecate with intent to remove. >> >> We don't remove right away, because we promised to first deprecate for a >> grace period, so users can adjust in an orderly manner. The deprecation >> serves as signal "you need to adjust". The documentation that comes >> with it should help with the adjustment. It's commonly of the form "use >> $alternative instead". The alternative is often a direct replacement, >> but not always. There could even be no replacement at all. We don't >> promise replacements, we promise an orderly process, so users can >> adjust. >> >> Sometimes, we don't have firm plans to remove, but are more like "maybe >> remove when gets in the way". We could soften the "you need to adjust" >> signal in documentation, but I doubt that's a good idea. Regardless, >> the need to help users adjust remains. >> >> Back to your patches. There are two separate interfaces to block >> migration, and both are deprecated at the end of the series: >> >> 1. Migration parameter @block-incremental & friends >> >>Not in the way, content to keep around for longer if it helps users. >> >>The deprecation documentation advises to use block-mirror with NBD >>instead. All good. >> >> 2. QMP migrate parameters @inc and @blk >> >>Firm intent to remove as soon as the grace period expires, because >>it's in the way. >> >>The deprecation documentation advises to use interface 1 instead. >>But that's deprecated, too! >> >>Insufficiently careful readers will miss that the replacement is >>deprecated, and just use it. Risks surprise when the replacement >>goes away, too. >> >>More careful readers will realize that this advises to use something >>we elsewhere advise not to use. Contradiction! Confusion ensues. >> >>On further reflection, these readers might conclude that the >>*combined* advice is to use block-mirror with NBD instead. This is >>correct. >> >>So why not tell them? >> >>Perhaps you'd like to give more nuanced advice, like "you should move >>to block-mirror with NBD, but if that's not practical for you, you >>should at least move to @block-incremental & friends, which will >>likely stick around for longer." That's fine. All I'm asking for is >>to not make things more confusing than they need to be :) >> >> [...] > > Telling this in deprecated.rst is enough? or you want me to put it also > in the error/warn messages and qapi? Let's make all of them point to blockdev-mirror with NBD. If you think mentioning @block-incremental & friends would be useful in some or all places would be useful, go ahead, I don't mind.
[PATCH v2 3/3] hw/nvme: Add SPDM over DOE support
From: Wilfred Mallawa Setup Data Object Exchance (DOE) as an extended capability for the NVME controller and connect SPDM to it (CMA) to it. Signed-off-by: Wilfred Mallawa Signed-off-by: Alistair Francis --- docs/specs/index.rst| 1 + docs/specs/spdm.rst | 114 include/hw/pci/pci_device.h | 5 ++ include/hw/pci/pcie_doe.h | 3 + hw/nvme/ctrl.c | 53 + 5 files changed, 176 insertions(+) create mode 100644 docs/specs/spdm.rst diff --git a/docs/specs/index.rst b/docs/specs/index.rst index e58be38c41..c398541388 100644 --- a/docs/specs/index.rst +++ b/docs/specs/index.rst @@ -24,3 +24,4 @@ guest hardware that is specific to QEMU. acpi_erst sev-guest-firmware fw_cfg + spdm diff --git a/docs/specs/spdm.rst b/docs/specs/spdm.rst new file mode 100644 index 00..dfdc3cbb4d --- /dev/null +++ b/docs/specs/spdm.rst @@ -0,0 +1,114 @@ +== +QEMU Security Protocols and Data Models (SPDM) Support +== + +SPDM enables authentication, attestation and key exchange to assist in +providing infrastructure security enablement. It's a standard published +by the `DMTF`_. + +QEMU supports connecting to a SPDM Responder implementation. This allows an +external application to emulate the SPDM Responder logic for an SPDM device. + +Setting up a SPDM server + + +When using QEMU with SPDM devices QEMU will connect to a server which +implements the SPDM functionality. + +SPDM-Utils +-- + +You can use `SPDM Utils`_ to emulate a Responder. + +SPDM-Utils is a Linux applications to manage, test and develop devices +supporting DMTF Security Protocol and Data Model (SPDM). It is written in Rust +and utilises libspdm. + +To use SPDM-Utils you will need to do the followoing: + + 1. `Build SPDM Utils`_ + 2. `Generate the certificates`_ + 3. `Run it as a server`_ + +spdm-emu + + +You can use `spdm emu`_ to model the +SPDM responder. + +.. code-block:: shell + +$ cd spdm-emu +$ git submodule init; git submodule update --recursive +$ mkdir build; cd build +$ cmake -DARCH=x64 -DTOOLCHAIN=GCC -DTARGET=Debug -DCRYPTO=openssl .. +$ make -j32 +$ make copy_sample_key # Build certificates, required for SPDM authentication. + +It is worth noting that the certificates should be in compliance with +PCIe r6.1 sec 6.31.3. This means you will need to add the following to +openssl.cnf + +.. code-block:: + +subjectAltName = otherName:2.23.147;UTF8:Vendor=1b36:Device=0010:CC=010802:REV=02:SSVID=1af4:SSID=1100 +2.23.147 = ASN1:OID:2.23.147 + +and then manually regenerate some certificates with: + +.. code-block:: shell + +openssl req -nodes -newkey ec:param.pem -keyout end_responder.key -out end_responder.req -sha384 -batch -subj "/CN=DMTF libspdm ECP384 responder cert" +openssl x509 -req -in end_responder.req -out end_responder.cert -CA inter.cert -CAkey inter.key -sha384 -days 3650 -set_serial 3 -extensions v3_end -extfile ../openssl.cnf +openssl asn1parse -in end_responder.cert -out end_responder.cert.der +cat ca.cert.der inter.cert.der end_responder.cert.der > bundle_responder.certchain.der + +You can use SPDM-Utils instead as it will generate the correct certificates +automatically. + +The responder can then be launched with + +.. code-block:: shell + +$ cd bin +$ ./spdm_responder_emu --trans PCI_DOE + +Connecting an SPDM NVMe device +== + +Once a SPDM server is running we can start QEMU and connect to the server. + +For an NVMe device first let's setup a block we can use + +.. code-block:: shell + +$ cd qemu-spdm/linux/image +$ dd if=/dev/zero of=blknvme bs=1M count=2096 # 2GB NNMe Drive + +Then you can add this to your QEMU command line: + +.. code-block:: shell + +-drive file=blknvme,if=none,id=mynvme,format=raw \ +-device nvme,drive=mynvme,serial=deadbeef,spdm=2323 + +At which point QEMU will try to connect to the SPDM server. + + +.. _DMTF: + https://www.dmtf.org/standards/SPDM + +.. _SPDM Utils: + https://github.com/westerndigitalcorporation/spdm-utils + +.. _spdm emu: + https://github.com/dmtf/spdm-emu + +.. _Build SPDM Utils: + https://github.com/westerndigitalcorporation/spdm-utils#building + +.. _Generate the certificates: + https://github.com/westerndigitalcorporation/spdm-utils#generate-mutable-certificates + +.. _Run it as a server: + https://github.com/westerndigitalcorporation/spdm-utils#qemu-spdm-device-emulation diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h index d3dd0f64b2..b8379c78f1 100644 --- a/include/hw/pci/pci_device.h +++ b/include/hw/pci/pci_device.h @@ -3,6 +3,7 @@ #include "hw/pci/pci.h" #include "hw/pci/pcie.h" +#include "hw/pci/pcie_doe.h" #define TYPE_PCI_DEVICE "pci-device" typedef struct PCIDeviceClass PCIDeviceClass; @@
[PATCH v2 0/3] Initial support for SPDM Responders
The Security Protocol and Data Model (SPDM) Specification defines messages, data objects, and sequences for performing message exchanges over a variety of transport and physical media. - https://www.dmtf.org/sites/default/files/standards/documents/DSP0274_1.3.0.pdf SPDM currently supports PCIe DOE and MCTP transports, but it can be extended to support others in the future. This series adds support to QEMU to connect to an external SPDM instance. SPDM support can be added to any QEMU device by exposing a TCP socket to a SPDM server. The server can then implement the SPDM decoding/encoding support, generally using libspdm [2]. This is similar to how the current TPM implementation works and means that the heavy lifting of setting up certificate chains, capabilities, measurements and complex crypto can be done outside QEMU by a well supported and tested library. This series implements socket support and exposes SPDM for a NVMe device. 2: https://github.com/DMTF/libspdm v2: - Add cover letter - A few code fixes based on comments - Document SPDM-Utils - A few tweaks and clarifications to the documentation Alistair Francis (1): hw/pci: Add all Data Object Types defined in PCIe r6.0 Huai-Cheng Kuo (1): backends: Initial support for SPDM socket support Wilfred Mallawa (1): hw/nvme: Add SPDM over DOE support docs/specs/index.rst | 1 + docs/specs/spdm.rst | 114 ++ include/hw/pci/pci_device.h | 5 + include/hw/pci/pcie_doe.h| 5 + include/sysemu/spdm-socket.h | 44 +++ backends/spdm-socket.c | 216 +++ hw/nvme/ctrl.c | 53 + backends/Kconfig | 4 + backends/meson.build | 2 + 9 files changed, 444 insertions(+) create mode 100644 docs/specs/spdm.rst create mode 100644 include/sysemu/spdm-socket.h create mode 100644 backends/spdm-socket.c -- 2.41.0
[PATCH v2 2/3] backends: Initial support for SPDM socket support
From: Huai-Cheng Kuo SPDM enables authentication, attestation and key exchange to assist in providing infrastructure security enablement. It's a standard published by the DMTF [1]. SPDM supports multiple transports, including PCIe DOE and MCTP. This patch adds support to QEMU to connect to an external SPDM instance. SPDM support can be added to any QEMU device by exposing a TCP socket to a SPDM server. The server can then implement the SPDM decoding/encoding support, generally using libspdm [2]. This is similar to how the current TPM implementation works and means that the heavy lifting of setting up certificate chains, capabilities, measurements and complex crypto can be done outside QEMU by a well supported and tested library. 1: https://www.dmtf.org/standards/SPDM 2: https://github.com/DMTF/libspdm Signed-off-by: Huai-Cheng Kuo Signed-off-by: Chris Browy Co-developed-by: Jonathan Cameron Signed-off-by: Jonathan Cameron [ Changes by WM - Bug fixes from testing ] Signed-off-by: Wilfred Mallawa [ Changes by AF: - Convert to be more QEMU-ified - Move to backends as it isn't PCIe specific ] Signed-off-by: Alistair Francis --- include/sysemu/spdm-socket.h | 44 +++ backends/spdm-socket.c | 216 +++ backends/Kconfig | 4 + backends/meson.build | 2 + 4 files changed, 266 insertions(+) create mode 100644 include/sysemu/spdm-socket.h create mode 100644 backends/spdm-socket.c diff --git a/include/sysemu/spdm-socket.h b/include/sysemu/spdm-socket.h new file mode 100644 index 00..24e6fccb83 --- /dev/null +++ b/include/sysemu/spdm-socket.h @@ -0,0 +1,44 @@ +/* + * QEMU SPDM socket support + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#ifndef SPDM_REQUESTER_H +#define SPDM_REQUESTER_H + +int spdm_socket_connect(uint16_t port, Error **errp); +uint32_t spdm_socket_rsp(const int socket, uint32_t transport_type, + void *req, uint32_t req_len, + void *rsp, uint32_t rsp_len); +void spdm_socket_close(const int socket, uint32_t transport_type); + +#define SPDM_SOCKET_COMMAND_NORMAL0x0001 +#define SPDM_SOCKET_COMMAND_OOB_ENCAP_KEY_UPDATE 0x8001 +#define SPDM_SOCKET_COMMAND_CONTINUE 0xFFFD +#define SPDM_SOCKET_COMMAND_SHUTDOWN 0xFFFE +#define SPDM_SOCKET_COMMAND_UNKOWN0x +#define SPDM_SOCKET_COMMAND_TEST 0xDEAD + +#define SPDM_SOCKET_TRANSPORT_TYPE_MCTP 0x01 +#define SPDM_SOCKET_TRANSPORT_TYPE_PCI_DOE0x02 + +#define SPDM_SOCKET_MAX_MESSAGE_BUFFER_SIZE 0x1200 + +#endif diff --git a/backends/spdm-socket.c b/backends/spdm-socket.c new file mode 100644 index 00..d0663d696c --- /dev/null +++ b/backends/spdm-socket.c @@ -0,0 +1,216 @@ +/* SPDX-License-Identifier: BSD-3-Clause */ +/* + * QEMU SPDM socket support + * + * This is based on: + * https://github.com/DMTF/spdm-emu/blob/07c0a838bcc1c6207c656ac75885c0603e344b6f/spdm_emu/spdm_emu_common/command.c + * but has been re-written to match QEMU style + * + * Copyright (c) 2021, DMTF. All rights reserved. + * Copyright (c) 2023. Western Digital Corporation or its affiliates. + */ + +#include "qemu/osdep.h" +#include "sysemu/spdm-socket.h" +#include "qapi/error.h" + +static bool read_bytes(const int socket, uint8_t *buffer, + size_t number_of_bytes) +{ +ssize_t number_received = 0; +ssize_t result; + +while (number_received < number_of_bytes) { +result = recv(socket, buffer + number_received, + number_of_bytes - number_received, 0); +if (result <= 0) { +return false; +} +number_received += result; +} +return true; +} + +static bool read_data32(const int socket, uint32_t *data) +{ +bool result; + +result = read_bytes(socket, (uint8_t *)data, sizeof(uint32_t)); +if (!result) { +
[PATCH v2 1/3] hw/pci: Add all Data Object Types defined in PCIe r6.0
Add all of the defined protocols/features from the PCIe-SIG r6.0 "Table 6-32 PCI-SIG defined Data Object Types (Vendor ID = 0001h)" table. Signed-off-by: Alistair Francis --- include/hw/pci/pcie_doe.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/hw/pci/pcie_doe.h b/include/hw/pci/pcie_doe.h index 87dc17dcef..15d94661f9 100644 --- a/include/hw/pci/pcie_doe.h +++ b/include/hw/pci/pcie_doe.h @@ -46,6 +46,8 @@ REG32(PCI_DOE_CAP_STATUS, 0) /* PCI-SIG defined Data Object Types - r6.0 Table 6-32 */ #define PCI_SIG_DOE_DISCOVERY 0x00 +#define PCI_SIG_DOE_CMA 0x01 +#define PCI_SIG_DOE_SECURED_CMA 0x02 #define PCI_DOE_DW_SIZE_MAX (1 << 18) #define PCI_DOE_PROTOCOL_NUM_MAX256 -- 2.41.0
Re: [PATCH 2/2] virtio: Drop out of coroutine context in virtio_load()
05.09.2023 17:50, Kevin Wolf wrote: virtio_load() as a whole should run in coroutine context because it reads from the migration stream and we don't want this to block. However, it calls virtio_set_features_nocheck() and devices don't expect their .set_features callback to run in a coroutine and therefore call functions that may not be called in coroutine context. To fix this, drop out of coroutine context for calling virtio_set_features_nocheck(). ... Cc: qemu-sta...@nongnu.org Buglink: https://issues.redhat.com/browse/RHEL-832 Signed-off-by: Kevin Wolf It looks like this change caused an interesting regression, https://gitlab.com/qemu-project/qemu/-/issues/1933 at least in -stable. Can you take a look please? BTW, Kevin, do you have account @gitlab? Thanks, /mjt
Re: [PULL 00/25] Python patches
On Mon, Oct 16, 2023 at 3:21 PM Stefan Hajnoczi wrote: > > Applied, thanks. > > Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any > user-visible changes. Hi Vladimir: all done! I've created a MR to backport your changes to the standalone repo here: https://gitlab.com/qemu-project/python-qemu-qmp/-/merge_requests/30 - please give it a quick glance to make sure I didn't butcher anything in transit. Thanks, --js
Re: [PULL 00/38] Migration 20231016 patches
Stefan Hajnoczi writes: > On Mon, 16 Oct 2023 at 13:13, Fabiano Rosas wrote: >> >> Stefan Hajnoczi writes: >> >> > On Mon, 16 Oct 2023 at 06:11, Juan Quintela wrote: >> >> >> >> The following changes since commit >> >> 63011373ad22c794a013da69663c03f1297a5c56: >> >> >> >> Merge tag 'pull-riscv-to-apply-20231012-1' of >> >> https://github.com/alistair23/qemu into staging (2023-10-12 10:24:44 >> >> -0400) >> >> >> >> are available in the Git repository at: >> >> >> >> https://gitlab.com/juan.quintela/qemu.git >> >> tags/migration-20231016-pull-request >> >> >> >> for you to fetch changes up to f39b0f42753635b0f2d8b00a26d11bb197bf51e2: >> >> >> >> migration/multifd: Clarify Error usage in multifd_channel_connect >> >> (2023-10-16 11:01:33 +0200) >> >> >> >> >> >> Migration Pull request (20231016) >> >> >> >> In this pull request: >> >> - rdma cleanups >> >> - removal of QEMUFileHook >> >> - test for analyze-migration.py >> >> - test for multifd file >> >> - multifd cleanups >> >> - available switchover bandwidth >> >> - lots of cleanups. >> >> >> >> CI: https://gitlab.com/juan.quintela/qemu/-/pipelines/1037878829 >> >> >> >> Please, apply. >> > >> > This CI failure looks migration-related: >> > >> > MALLOC_PERTURB_=96 >> > PYTHON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/pyvenv/bin/python3 >> > QTEST_QEMU_BINARY=./qemu-system-i386 >> > G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh >> > QTEST_QEMU_IMG=./qemu-img >> > QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon >> > /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/migration-test >> > --tap -k >> > ― ✀ >> > ― >> > stderr: >> > ** >> > ERROR:../tests/qtest/migration-test.c:1969:file_offset_finish_hook: >> > assertion failed (cpu_to_be32(*p) == QEMU_VM_FILE_MAGIC): (3 == >> > 1363498573) >> >> That's the test for the file: transport which got merged in the last >> PR. I'll look into it. > > I have dropped the 20231016 pull request for now and the tests passed > without it. Maybe there is an interaction with the test you recently > added and this pull request? Sorry, I expressed myself poorly. The test _is_ what is breaking this pull request. The feature was already merged and is working fine. I commented with a fix on the patch that adds the test. [PULL 11/38] tests/qtest: migration-test: Add tests for file-based migration https://lore.kernel.org/r/20231016100706.2551-12-quint...@redhat.com
[PULL 0/1] Block patches
The following changes since commit 63011373ad22c794a013da69663c03f1297a5c56: Merge tag 'pull-riscv-to-apply-20231012-1' of https://github.com/alistair23/qemu into staging (2023-10-12 10:24:44 -0400) are available in the Git repository at: https://gitlab.com/stefanha/qemu.git tags/block-pull-request for you to fetch changes up to 071d6d107db2e26dde9bb15457c74956c88ec5b4: virtio-blk: don't start dataplane during the stop of dataplane (2023-10-16 15:39:13 -0400) Pull request Contains a virtio-blk IOThread fix. hujian (1): virtio-blk: don't start dataplane during the stop of dataplane hw/block/virtio-blk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- 2.41.0
[PULL 1/1] virtio-blk: don't start dataplane during the stop of dataplane
From: hujian During the stop of dataplane for virtio-blk, virtio_bus_cleanup_host_notifier() is be called to clean up notifier at the end, if polled ioeventfd, virtio_blk_handle_output() is used to handle io request. But due to s->dataplane_disabled is false, it will be returned directly, which drops io request. Backtrace: ->virtio_blk_data_plane_stop ->virtio_bus_cleanup_host_notifier ->virtio_queue_host_notifier_read ->virtio_queue_notify_vq ->vq->handle_output ->virtio_blk_handle_output ->if (s->dataplane && !s->dataplane_stoped) ->if (!s->dataplane_disabled) ->return * ->virtio_blk_handle_output_do The above problem can occur when using "virsh reset" cmdline to reset guest, while guest does io. To fix this problem, don't try to start dataplane if s->stopping is true, and io would be handled by virtio_blk_handle_vq(). Signed-off-by: hujian Message-id: 202310111414266586...@zte.com.cn Signed-off-by: Stefan Hajnoczi --- hw/block/virtio-blk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 39e7f23fab..c2d59389cb 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -1166,7 +1166,7 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq) { VirtIOBlock *s = (VirtIOBlock *)vdev; -if (s->dataplane && !s->dataplane_started) { +if (s->dataplane && !s->dataplane_started && !s->stopping) { /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start * dataplane here instead of waiting for .set_status(). */ -- 2.41.0
Re: [PULL 00/25] Python patches
Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any user-visible changes. signature.asc Description: PGP signature
Re: [PULL v2 0/2] hw/ufs: fixes
Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any user-visible changes. signature.asc Description: PGP signature
Re: [PULL 00/26] Block layer patches
Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any user-visible changes. signature.asc Description: PGP signature
Re: [PULL 00/38] Migration 20231016 patches
On Mon, 16 Oct 2023 at 13:13, Fabiano Rosas wrote: > > Stefan Hajnoczi writes: > > > On Mon, 16 Oct 2023 at 06:11, Juan Quintela wrote: > >> > >> The following changes since commit > >> 63011373ad22c794a013da69663c03f1297a5c56: > >> > >> Merge tag 'pull-riscv-to-apply-20231012-1' of > >> https://github.com/alistair23/qemu into staging (2023-10-12 10:24:44 -0400) > >> > >> are available in the Git repository at: > >> > >> https://gitlab.com/juan.quintela/qemu.git > >> tags/migration-20231016-pull-request > >> > >> for you to fetch changes up to f39b0f42753635b0f2d8b00a26d11bb197bf51e2: > >> > >> migration/multifd: Clarify Error usage in multifd_channel_connect > >> (2023-10-16 11:01:33 +0200) > >> > >> > >> Migration Pull request (20231016) > >> > >> In this pull request: > >> - rdma cleanups > >> - removal of QEMUFileHook > >> - test for analyze-migration.py > >> - test for multifd file > >> - multifd cleanups > >> - available switchover bandwidth > >> - lots of cleanups. > >> > >> CI: https://gitlab.com/juan.quintela/qemu/-/pipelines/1037878829 > >> > >> Please, apply. > > > > This CI failure looks migration-related: > > > > MALLOC_PERTURB_=96 > > PYTHON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/pyvenv/bin/python3 > > QTEST_QEMU_BINARY=./qemu-system-i386 > > G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh > > QTEST_QEMU_IMG=./qemu-img > > QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon > > /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/migration-test > > --tap -k > > ― ✀ > > ― > > stderr: > > ** > > ERROR:../tests/qtest/migration-test.c:1969:file_offset_finish_hook: > > assertion failed (cpu_to_be32(*p) == QEMU_VM_FILE_MAGIC): (3 == > > 1363498573) > > That's the test for the file: transport which got merged in the last > PR. I'll look into it. I have dropped the 20231016 pull request for now and the tests passed without it. Maybe there is an interaction with the test you recently added and this pull request? Stefan
Re: [PULL 11/38] tests/qtest: migration-test: Add tests for file-based migration
Juan Quintela writes: > From: Fabiano Rosas > > Add basic tests for file-based migration. > > Note that we cannot use test_precopy_common because that routine > expects it to be possible to run the migration live. With the file > transport there is no live migration because we must wait for the > source to finish writing the migration data to the file before the > destination can start reading. Add a new migration function > specifically to handle the file migration. > > Reviewed-by: Peter Xu > Reviewed-by: Juan Quintela > Signed-off-by: Fabiano Rosas > Signed-off-by: Juan Quintela > Message-ID: <20230712190742.22294-7-faro...@suse.de> > --- > tests/qtest/migration-test.c | 147 +++ > 1 file changed, 147 insertions(+) > > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c > index cef5081f8c..da02b6d692 100644 > --- a/tests/qtest/migration-test.c > +++ b/tests/qtest/migration-test.c > @@ -68,6 +68,10 @@ static bool got_dst_resume; > > #define ANALYZE_SCRIPT "scripts/analyze-migration.py" > > +#define QEMU_VM_FILE_MAGIC 0x5145564d > +#define FILE_TEST_FILENAME "migfile" > +#define FILE_TEST_OFFSET 0x1000 > + > #if defined(__linux__) > #include > #include > @@ -884,6 +888,7 @@ static void test_migrate_end(QTestState *from, QTestState > *to, bool test_dest) > cleanup("migsocket"); > cleanup("src_serial"); > cleanup("dest_serial"); > +cleanup(FILE_TEST_FILENAME); > } > > #ifdef CONFIG_GNUTLS > @@ -1667,6 +1672,70 @@ finish: > test_migrate_end(from, to, args->result == MIG_TEST_SUCCEED); > } > > +static void test_file_common(MigrateCommon *args, bool stop_src) > +{ > +QTestState *from, *to; > +void *data_hook = NULL; > +g_autofree char *connect_uri = g_strdup(args->connect_uri); > + > +if (test_migrate_start(, , args->listen_uri, >start)) { > +return; > +} > + > +/* > + * File migration is never live. We can keep the source VM running > + * during migration, but the destination will not be running > + * concurrently. > + */ > +g_assert_false(args->live); > + > +if (args->start_hook) { > +data_hook = args->start_hook(from, to); > +} > + > +migrate_ensure_converge(from); > +wait_for_serial("src_serial"); > + > +if (stop_src) { > +qtest_qmp_assert_success(from, "{ 'execute' : 'stop'}"); > +if (!got_src_stop) { > +qtest_qmp_eventwait(from, "STOP"); > +} > +} > + > +if (args->result == MIG_TEST_QMP_ERROR) { > +migrate_qmp_fail(from, connect_uri, "{}"); > +goto finish; > +} > + > +migrate_qmp(from, connect_uri, "{}"); > +wait_for_migration_complete(from); > + > +/* > + * We need to wait for the source to finish before starting the > + * destination. > + */ > +migrate_incoming_qmp(to, connect_uri, "{}"); > +wait_for_migration_complete(to); > + > +if (stop_src) { > +qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}"); > +} > + > +if (!got_dst_resume) { > +qtest_qmp_eventwait(to, "RESUME"); > +} > + > +wait_for_serial("dest_serial"); > + > +finish: > +if (args->finish_hook) { > +args->finish_hook(from, to, data_hook); > +} > + > +test_migrate_end(from, to, args->result == MIG_TEST_SUCCEED); > +} > + > static void test_precopy_unix_plain(void) > { > g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs); > @@ -1862,6 +1931,76 @@ static void test_precopy_unix_compress_nowait(void) > test_precopy_common(); > } > > +static void test_precopy_file(void) > +{ > +g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs, > + FILE_TEST_FILENAME); > +MigrateCommon args = { > +.connect_uri = uri, > +.listen_uri = "defer", > +}; > + > +test_file_common(, true); > +} > + > +static void file_offset_finish_hook(QTestState *from, QTestState *to, > +void *opaque) > +{ > +#if defined(__linux__) > +g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, > FILE_TEST_FILENAME); > +size_t size = FILE_TEST_OFFSET + sizeof(QEMU_VM_FILE_MAGIC); > +uintptr_t *addr, *p; > +int fd; > + > +fd = open(path, O_RDONLY); > +g_assert(fd != -1); > +addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0); > +g_assert(addr != MAP_FAILED); > + > +/* > + * Ensure the skipped offset contains zeros and the migration > + * stream starts at the right place. > + */ > +p = addr; > +while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) { > +g_assert(*p == 0); > +p++; > +} > +g_assert_cmpint(cpu_to_be32(*p), ==, QEMU_VM_FILE_MAGIC); This truncates to 32-bits, so it breaks on a BE host. We need this: -->8-- >From ea0c2d1c988add48d9754891a9fc7f6854a9718a Mon Sep 17 00:00:00 2001 From: Fabiano Rosas Date: Mon, 16 Oct
Re: [RFC PATCH 00/78] Strict disable implicit fallthrough
On Mon, 16 Oct 2023, 18:04 Peter Maydell, wrote: > On Mon, 16 Oct 2023 at 15:58, Manos Pitsidianakis > wrote: > > > > Hello Peter, > > > > On Mon, 16 Oct 2023, 17:13 Peter Maydell, > wrote: > >> > >> On Fri, 13 Oct 2023 at 13:42, Markus Armbruster > wrote: > >> > > >> > Emmanouil Pitsidianakis writes: > >> > > >> > > Hello, > >> > > > >> > > This RFC is inspired by the kernel's move to > -Wimplicit-fallthrough=3 > >> > > back in 2019.[0] > >> > > We take one step (or two) further by increasing it to 5 which > rejects > >> > > fall through comments and requires an attribute statement. > >> > > > >> > > [0]: > >> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a035d552a93b > >> > > > >> > > The line differences are not many, but they spread all over > different > >> > > subsystems, architectures and devices. An attempt has been made to > split > >> > > them in cohesive patches to aid post-RFC review. Part of the RFC is > to > >> > > determine whether these patch divisions needs improvement. > >> > > > >> > > Main questions this RFC poses > >> > > = > >> > > > >> > > - Is this change desirable and net-positive. > >> > > >> > Unwanted fallthrough is an easy mistake to make, and > >> > -Wimplicit-fallthrough=N helps avoid it. The question is how far up > we > >> > need to push N. Right now we're at N=2. Has unwanted fallthrough > been > >> > a problem? > >> > >> Mmm, this is my opinion I think. We have a mechanism for > >> catching "forgot the 'break'" already (our =2 setting) and > >> a way to say "intentional" in a fairly natural way (add the > >> comment). Does pushing N up any further gain us anything > >> except a load of churn? > >> > >> Also, the compiler is not the only thing that processes our > >> code: Coverity also looks for "unexpected fallthrough" issues, > >> so if we wanted to switch away from our current practice we > >> should check whether what we're switching to is an idiom > >> that Coverity recognises. > > > > > > It is a code style change as the cover letter mentions, it's not related > to the static analysis itself. > > Yes, exactly. As a code style change it needs a fairly high level > of justification for the code churn, and the cover letter > doesn't really provide one... > As I state in the cover letter, I personally find that using one macro instead of a comment regex feels more consistent. But your view is valid as well! Let's consider the RFC retracted then. -- Manos >
Re: [PATCH v2] block/file-posix: fix update_zones_wp() caller
On Fri, Aug 25, 2023 at 12:05:56PM +0800, Sam Li wrote: > When the zoned request fail, it needs to update only the wp of > the target zones for not disrupting the in-flight writes on > these other zones. The wp is updated successfully after the > request completes. > > Fixed the callers with right offset and nr_zones. > > Signed-off-by: Sam Li > --- > block/file-posix.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
Re: [PULL 00/38] Migration 20231016 patches
Stefan Hajnoczi writes: > On Mon, 16 Oct 2023 at 06:11, Juan Quintela wrote: >> >> The following changes since commit 63011373ad22c794a013da69663c03f1297a5c56: >> >> Merge tag 'pull-riscv-to-apply-20231012-1' of >> https://github.com/alistair23/qemu into staging (2023-10-12 10:24:44 -0400) >> >> are available in the Git repository at: >> >> https://gitlab.com/juan.quintela/qemu.git >> tags/migration-20231016-pull-request >> >> for you to fetch changes up to f39b0f42753635b0f2d8b00a26d11bb197bf51e2: >> >> migration/multifd: Clarify Error usage in multifd_channel_connect >> (2023-10-16 11:01:33 +0200) >> >> >> Migration Pull request (20231016) >> >> In this pull request: >> - rdma cleanups >> - removal of QEMUFileHook >> - test for analyze-migration.py >> - test for multifd file >> - multifd cleanups >> - available switchover bandwidth >> - lots of cleanups. >> >> CI: https://gitlab.com/juan.quintela/qemu/-/pipelines/1037878829 >> >> Please, apply. > > This CI failure looks migration-related: > > MALLOC_PERTURB_=96 > PYTHON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/pyvenv/bin/python3 > QTEST_QEMU_BINARY=./qemu-system-i386 > G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh > QTEST_QEMU_IMG=./qemu-img > QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon > /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/migration-test > --tap -k > ― ✀ ― > stderr: > ** > ERROR:../tests/qtest/migration-test.c:1969:file_offset_finish_hook: > assertion failed (cpu_to_be32(*p) == QEMU_VM_FILE_MAGIC): (3 == > 1363498573) That's the test for the file: transport which got merged in the last PR. I'll look into it.
Re: [PULL 00/38] Migration 20231016 patches
On Mon, 16 Oct 2023 at 06:11, Juan Quintela wrote: > > The following changes since commit 63011373ad22c794a013da69663c03f1297a5c56: > > Merge tag 'pull-riscv-to-apply-20231012-1' of > https://github.com/alistair23/qemu into staging (2023-10-12 10:24:44 -0400) > > are available in the Git repository at: > > https://gitlab.com/juan.quintela/qemu.git > tags/migration-20231016-pull-request > > for you to fetch changes up to f39b0f42753635b0f2d8b00a26d11bb197bf51e2: > > migration/multifd: Clarify Error usage in multifd_channel_connect > (2023-10-16 11:01:33 +0200) > > > Migration Pull request (20231016) > > In this pull request: > - rdma cleanups > - removal of QEMUFileHook > - test for analyze-migration.py > - test for multifd file > - multifd cleanups > - available switchover bandwidth > - lots of cleanups. > > CI: https://gitlab.com/juan.quintela/qemu/-/pipelines/1037878829 > > Please, apply. This CI failure looks migration-related: MALLOC_PERTURB_=96 PYTHON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/pyvenv/bin/python3 QTEST_QEMU_BINARY=./qemu-system-i386 G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/migration-test --tap -k ― ✀ ― stderr: ** ERROR:../tests/qtest/migration-test.c:1969:file_offset_finish_hook: assertion failed (cpu_to_be32(*p) == QEMU_VM_FILE_MAGIC): (3 == 1363498573) https://gitlab.com/qemu-project/qemu/-/jobs/5301793548 Stefan > > > > Dmitry Frolov (1): > migration: fix RAMBlock add NULL check > > Elena Ufimtseva (3): > migration: check for rate_limit_max for RATE_LIMIT_DISABLED > multifd: fix counters in multifd_send_thread > multifd: reset next_packet_len after sending pages > > Fabiano Rosas (13): > migration: Fix analyze-migration.py 'configuration' parsing > migration: Add capability parsing to analyze-migration.py > migration: Fix analyze-migration.py when ignore-shared is used > migration: Fix analyze-migration read operation signedness > tests/qtest/migration: Add a test for the analyze-migration script > tests/qtest: migration-test: Add tests for file-based migration > migration/ram: Remove RAMState from xbzrle_cache_zero_page > migration/ram: Stop passing QEMUFile around in save_zero_page > migration/ram: Move xbzrle zero page handling into save_zero_page > migration/ram: Merge save_zero_page functions > migration/multifd: Remove direct "socket" references > migration/multifd: Unify multifd_send_thread error paths > migration/multifd: Clarify Error usage in multifd_channel_connect > > Fiona Ebner (1): > migration: hold the BQL during setup > > Juan Quintela (15): > migration: Non multifd migration don't care about multifd flushes > migration: Create migrate_rdma() > migration/rdma: Unfold ram_control_before_iterate() > migration/rdma: Unfold ram_control_after_iterate() > migration/rdma: Remove all uses of RAM_CONTROL_HOOK > migration/rdma: Unfold hook_ram_load() > migration/rdma: Create rdma_control_save_page() > qemu-file: Remove QEMUFileHooks > migration/rdma: Move rdma constants from qemu-file.h to rdma.h > migration/rdma: Remove qemu_ prefix from exported functions > migration/rdma: Check sooner if we are in postcopy for save_page() > migration/rdma: Use i as for index instead of idx > migration/rdma: Declare for index variables local > migration/rdma: Remove all "ret" variables that are used only once > migration: Improve json and formatting > > Nikolay Borisov (2): > migration: Add the configuration vmstate to the json writer > migration/ram: Refactor precopy ram loading code > > Peter Xu (1): > migration: Allow user to specify available switchover bandwidth > > Philippe Mathieu-Daudé (1): > migration: Use g_autofree to simplify ram_dirty_bitmap_reload() > > Wei Wang (1): > migration: refactor migration_completion > > qapi/migration.json| 41 - > include/migration/register.h | 2 +- > migration/migration.h | 4 +- > migration/options.h| 2 + > migration/qemu-file.h | 49 -- > migration/rdma.h | 42 + > migration/block-dirty-bitmap.c | 3 - > migration/block.c | 5 - > migration/migration-hmp-cmds.c | 14 ++ > migration/migration-
[PATCH 07/12] hw/xen: update Xen console to XenDevice model
This allows (non-primary) console devices to be created on the command line. Signed-off-by: David Woodhouse --- hw/char/trace-events| 8 + hw/char/xen_console.c | 502 +++- hw/xen/xen-legacy-backend.c | 1 - 3 files changed, 381 insertions(+), 130 deletions(-) diff --git a/hw/char/trace-events b/hw/char/trace-events index babf4d35ea..7a398c82a5 100644 --- a/hw/char/trace-events +++ b/hw/char/trace-events @@ -105,3 +105,11 @@ cadence_uart_baudrate(unsigned baudrate) "baudrate %u" # sh_serial.c sh_serial_read(char *id, unsigned size, uint64_t offs, uint64_t val) " %s size %d offs 0x%02" PRIx64 " -> 0x%02" PRIx64 sh_serial_write(char *id, unsigned size, uint64_t offs, uint64_t val) "%s size %d offs 0x%02" PRIx64 " <- 0x%02" PRIx64 + +# xen_console.c +xen_console_connect(unsigned int idx, unsigned int ring_ref, unsigned int port, unsigned int limit) "idx %u ring_ref %u port %u limit %u" +xen_console_disconnect(unsigned int idx) "idx %u" +xen_console_unrealize(unsigned int idx) "idx %u" +xen_console_realize(unsigned int idx, const char *chrdev) "idx %u chrdev %s" +xen_console_device_create(unsigned int idx) "idx %u" +xen_console_device_destroy(unsigned int idx) "idx %u" diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c index 810dae3f44..bd20be116c 100644 --- a/hw/char/xen_console.c +++ b/hw/char/xen_console.c @@ -20,15 +20,19 @@ */ #include "qemu/osdep.h" +#include "qemu/cutils.h" #include #include #include "qapi/error.h" #include "sysemu/sysemu.h" #include "chardev/char-fe.h" -#include "hw/xen/xen-legacy-backend.h" - +#include "hw/xen/xen-backend.h" +#include "hw/xen/xen-bus-helper.h" +#include "hw/qdev-properties.h" +#include "hw/qdev-properties-system.h" #include "hw/xen/interface/io/console.h" +#include "trace.h" struct buffer { uint8_t *data; @@ -39,16 +43,22 @@ struct buffer { }; struct XenConsole { -struct XenLegacyDevice xendev; /* must be first */ +struct XenDevice xendev; /* must be first */ +XenEventChannel *event_channel; +int dev; struct buffer buffer; -char console[XEN_BUFSIZE]; -int ring_ref; +char *fe_path; +unsigned int ring_ref; void *sring; CharBackend chr; int backlog; }; +typedef struct XenConsole XenConsole; + +#define TYPE_XEN_CONSOLE_DEVICE "xen-console" +OBJECT_DECLARE_SIMPLE_TYPE(XenConsole, XEN_CONSOLE_DEVICE) -static void buffer_append(struct XenConsole *con) +static bool buffer_append(XenConsole *con) { struct buffer *buffer = >buffer; XENCONS_RING_IDX cons, prod, size; @@ -60,7 +70,7 @@ static void buffer_append(struct XenConsole *con) size = prod - cons; if ((size == 0) || (size > sizeof(intf->out))) -return; +return false; if ((buffer->capacity - buffer->size) < size) { buffer->capacity += (size + 1024); @@ -73,7 +83,7 @@ static void buffer_append(struct XenConsole *con) xen_mb(); intf->out_cons = cons; -xen_pv_send_notify(>xendev); +xen_device_notify_event_channel(XEN_DEVICE(con), con->event_channel, NULL); if (buffer->max_capacity && buffer->size > buffer->max_capacity) { @@ -89,6 +99,7 @@ static void buffer_append(struct XenConsole *con) if (buffer->consumed > buffer->max_capacity - over) buffer->consumed = buffer->max_capacity - over; } +return true; } static void buffer_advance(struct buffer *buffer, size_t len) @@ -100,7 +111,7 @@ static void buffer_advance(struct buffer *buffer, size_t len) } } -static int ring_free_bytes(struct XenConsole *con) +static int ring_free_bytes(XenConsole *con) { struct xencons_interface *intf = con->sring; XENCONS_RING_IDX cons, prod, space; @@ -118,13 +129,13 @@ static int ring_free_bytes(struct XenConsole *con) static int xencons_can_receive(void *opaque) { -struct XenConsole *con = opaque; +XenConsole *con = opaque; return ring_free_bytes(con); } static void xencons_receive(void *opaque, const uint8_t *buf, int len) { -struct XenConsole *con = opaque; +XenConsole *con = opaque; struct xencons_interface *intf = con->sring; XENCONS_RING_IDX prod; int i, max; @@ -141,10 +152,10 @@ static void xencons_receive(void *opaque, const uint8_t *buf, int len) } xen_wmb(); intf->in_prod = prod; -xen_pv_send_notify(>xendev); +xen_device_notify_event_channel(XEN_DEVICE(con), con->event_channel, NULL); } -static void xencons_send(struct XenConsole *con) +static bool xencons_send(XenConsole *con) { ssize_t len, size; @@ -159,174 +170,407 @@ static void xencons_send(struct XenConsole *con) if (len < 1) { if (!con->backlog) { con->backlog = 1; -xen_pv_printf(>xendev, 1, - "backlog piling up, nobody listening?\n"); }
[PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode
From: David Woodhouse The primary console is special because the toolstack maps a page at a fixed GFN and also allocates the guest-side event channel. Add support for that in emulated mode, so that we can have a primary console. Add a *very* rudimentary stub of foriegnmem ops for emulated mode, which supports literally nothing except a single-page mapping of the console page. This might as well have been a hack in the xen_console driver, but this way at least the special-casing is kept within the Xen emulation code, and it gives us a hook for a more complete implementation if/when we ever do need one. Now at last we can boot the Xen PV shim and run PV kernels in QEMU. Signed-off-by: David Woodhouse --- hw/char/xen_console.c | 12 ++- hw/i386/kvm/meson.build | 1 + hw/i386/kvm/trace-events | 2 + hw/i386/kvm/xen-stubs.c | 5 + hw/i386/kvm/xen_gnttab.c | 32 +- hw/i386/kvm/xen_primary_console.c | 167 ++ hw/i386/kvm/xen_primary_console.h | 22 hw/i386/kvm/xen_xenstore.c| 9 ++ target/i386/kvm/xen-emu.c | 23 +++- 9 files changed, 266 insertions(+), 7 deletions(-) create mode 100644 hw/i386/kvm/xen_primary_console.c create mode 100644 hw/i386/kvm/xen_primary_console.h diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c index 1a0f5ed3e1..dfc4be0aa1 100644 --- a/hw/char/xen_console.c +++ b/hw/char/xen_console.c @@ -32,6 +32,7 @@ #include "hw/qdev-properties.h" #include "hw/qdev-properties-system.h" #include "hw/xen/interface/io/console.h" +#include "hw/i386/kvm/xen_primary_console.h" #include "trace.h" struct buffer { @@ -334,8 +335,8 @@ static char *xen_console_get_name(XenDevice *xendev, Error **errp) XenConsole *con = XEN_CONSOLE_DEVICE(xendev); if (con->dev == -1) { +int idx = (xen_mode == XEN_EMULATE) ? 0 : 1; char name[11]; -int idx = 1; /* Theoretically we could go up to INT_MAX here but that's overkill */ while (idx < 100) { @@ -386,10 +387,13 @@ static void xen_console_realize(XenDevice *xendev, Error **errp) * be mapped directly as foreignmem (not a grant ref), and the guest port * was allocated *for* the guest by the toolstack. The guest gets these * through HVMOP_get_param and can use the console long before it's got - * XenStore up and running. We cannot create those for a Xen guest. + * XenStore up and running. We cannot create those for a true Xen guest, + * but we can for Xen emulation. */ if (!con->dev) { -if (xen_device_frontend_scanf(xendev, "ring-ref", "%u", ) != 1 || +if (xen_mode == XEN_EMULATE) { +xen_primary_console_create(); +} else if (xen_device_frontend_scanf(xendev, "ring-ref", "%u", ) != 1 || xen_device_frontend_scanf(xendev, "port", "%u", ) != 1) { error_setg(errp, "cannot create primary Xen console"); return; @@ -404,7 +408,7 @@ static void xen_console_realize(XenDevice *xendev, Error **errp) } /* No normal PV driver initialization for the primary console */ -if (!con->dev) { +if (!con->dev && xen_mode != XEN_EMULATE) { xen_console_connect(xendev, errp); } } diff --git a/hw/i386/kvm/meson.build b/hw/i386/kvm/meson.build index ab143d6474..a4a2e23c06 100644 --- a/hw/i386/kvm/meson.build +++ b/hw/i386/kvm/meson.build @@ -9,6 +9,7 @@ i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files( 'xen_evtchn.c', 'xen_gnttab.c', 'xen_xenstore.c', + 'xen_primary_console.c', 'xenstore_impl.c', )) diff --git a/hw/i386/kvm/trace-events b/hw/i386/kvm/trace-events index e4c82de6f3..67bf7f174e 100644 --- a/hw/i386/kvm/trace-events +++ b/hw/i386/kvm/trace-events @@ -18,3 +18,5 @@ xenstore_watch(const char *path, const char *token) "path %s token %s" xenstore_unwatch(const char *path, const char *token) "path %s token %s" xenstore_reset_watches(void) "" xenstore_watch_event(const char *path, const char *token) "path %s token %s" +xen_primary_console_create(void) "" +xen_primary_console_reset(int port) "port %u" diff --git a/hw/i386/kvm/xen-stubs.c b/hw/i386/kvm/xen-stubs.c index ae406e0b02..10068970fe 100644 --- a/hw/i386/kvm/xen-stubs.c +++ b/hw/i386/kvm/xen-stubs.c @@ -15,6 +15,7 @@ #include "qapi/qapi-commands-misc-target.h" #include "xen_evtchn.h" +#include "xen_primary_console.h" void xen_evtchn_snoop_msi(PCIDevice *dev, bool is_msix, unsigned int vector, uint64_t addr, uint32_t data, bool is_masked) @@ -30,6 +31,10 @@ bool xen_evtchn_deliver_pirq_msi(uint64_t address, uint32_t data) return false; } +void xen_primary_console_create(void) +{ +} + #ifdef TARGET_I386 EvtchnInfoList *qmp_xen_event_list(Error **errp) { diff --git a/hw/i386/kvm/xen_gnttab.c b/hw/i386/kvm/xen_gnttab.c index 21c30e3659..ea201cd582 100644 --- a/hw/i386/kvm/xen_gnttab.c +++ b/hw/i386/kvm/xen_gnttab.c @@ -25,6
[PATCH 05/12] hw/xen: populate store frontend nodes with XenStore PFN/port
From: David Woodhouse This is kind of redundant since without being able to get these through ome other method (HVMOP_get_param) the guest wouldn't be able to access XenStore in order to find them. But Xen populates them, and it does allow guests to *rebind* to the event channel port after a reset. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c index d2b311109b..3300e0614a 100644 --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -1432,6 +1432,7 @@ static void alloc_guest_port(XenXenstoreState *s) int xen_xenstore_reset(void) { XenXenstoreState *s = xen_xenstore_singleton; +GList *perms; int err; if (!s) { @@ -1459,6 +1460,15 @@ int xen_xenstore_reset(void) } s->be_port = err; +/* Create frontend store nodes */ +perms = g_list_append(NULL, xs_perm_as_string(XS_PERM_NONE, DOMID_QEMU)); +perms = g_list_append(perms, xs_perm_as_string(XS_PERM_READ, xen_domid)); + +relpath_printf(s, perms, "store/ring-ref", "%lu", XEN_SPECIAL_PFN(XENSTORE)); +relpath_printf(s, perms, "store/port", "%u", s->be_port); + +g_list_free_full(perms, g_free); + /* * We don't actually access the guest's page through the grant, because * this isn't real Xen, and we can just use the page we gave it in the -- 2.40.1
[PATCH 04/12] i386/xen: advertise XEN_HVM_CPUID_UPCALL_VECTOR in CPUID
From: David Woodhouse This will allow Linux guests (since v6.0) to use the per-vCPU upcall vector delivered as MSI through the local APIC. Signed-off-by: David Woodhouse --- target/i386/kvm/kvm.c | 4 1 file changed, 4 insertions(+) diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index f6c7f7e268..8bdbdcdffe 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -1887,6 +1887,10 @@ int kvm_arch_init_vcpu(CPUState *cs) c->eax |= XEN_HVM_CPUID_VCPU_ID_PRESENT; c->ebx = cs->cpu_index; } + +if (cs->kvm_state->xen_version >= XEN_VERSION(4, 17)) { +c->eax |= XEN_HVM_CPUID_UPCALL_VECTOR; +} } r = kvm_xen_init_vcpu(cs); -- 2.40.1
[PATCH 02/12] hw/xen: select kernel mode for per-vCPU event channel upcall vector
From: David Woodhouse A guest which has configured the per-vCPU upcall vector may set the HVM_PARAM_CALLBACK_IRQ param to fairly much anything other than zero. For example, Linux v6.0+ after commit b1c3497e604 ("x86/xen: Add support for HVMOP_set_evtchn_upcall_vector") will just do this after setting the vector: /* Trick toolstack to think we are enlightened. */ if (!cpu) rc = xen_set_callback_via(1); That's explicitly setting the delivery to GSI#, but it's supposed to be overridden by the per-vCPU vector setting. This mostly works in QEMU *except* for the logic to enable the in-kernel handling of event channels, which falsely determines that the kernel cannot accelerate GSI delivery in this case. Add a kvm_xen_has_vcpu_callback_vector() to report whether vCPU#0 has the vector set, and use that in xen_evtchn_set_callback_param() to enable the kernel acceleration features even when the param *appears* to be set to target a GSI. Preserve the Xen behaviour that when HVM_PARAM_CALLBACK_IRQ is set to *zero* the event channel delivery is disabled completely. (Which is what that bizarre guest behaviour is working round in the first place.) Fixes: 91cce756179 ("hw/xen: Add xen_evtchn device for event channel emulation") Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_evtchn.c | 6 ++ include/sysemu/kvm_xen.h | 1 + target/i386/kvm/xen-emu.c | 7 +++ 3 files changed, 14 insertions(+) diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c index 4df973022c..d72dca6591 100644 --- a/hw/i386/kvm/xen_evtchn.c +++ b/hw/i386/kvm/xen_evtchn.c @@ -490,6 +490,12 @@ int xen_evtchn_set_callback_param(uint64_t param) break; } +/* If the guest has set a per-vCPU callback vector, prefer that. */ +if (gsi && kvm_xen_has_vcpu_callback_vector()) { +in_kernel = kvm_xen_has_cap(EVTCHN_SEND); +gsi = 0; +} + if (!ret) { /* If vector delivery was turned *off* then tell the kernel */ if ((s->callback_param >> CALLBACK_VIA_TYPE_SHIFT) == diff --git a/include/sysemu/kvm_xen.h b/include/sysemu/kvm_xen.h index 595abfbe40..961c702c4e 100644 --- a/include/sysemu/kvm_xen.h +++ b/include/sysemu/kvm_xen.h @@ -22,6 +22,7 @@ int kvm_xen_soft_reset(void); uint32_t kvm_xen_get_caps(void); void *kvm_xen_get_vcpu_info_hva(uint32_t vcpu_id); +bool kvm_xen_has_vcpu_callback_vector(void); void kvm_xen_inject_vcpu_callback_vector(uint32_t vcpu_id, int type); void kvm_xen_set_callback_asserted(void); int kvm_xen_set_vcpu_virq(uint32_t vcpu_id, uint16_t virq, uint16_t port); diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c index b49a840438..477e93cd92 100644 --- a/target/i386/kvm/xen-emu.c +++ b/target/i386/kvm/xen-emu.c @@ -424,6 +424,13 @@ void kvm_xen_set_callback_asserted(void) } } +bool kvm_xen_has_vcpu_callback_vector(void) +{ +CPUState *cs = qemu_get_cpu(0); + +return cs && !!X86_CPU(cs)->env.xen_vcpu_callback_vector; +} + void kvm_xen_inject_vcpu_callback_vector(uint32_t vcpu_id, int type) { CPUState *cs = qemu_get_cpu(vcpu_id); -- 2.40.1
[PATCH 0/12] Get Xen PV shim running in qemu
I hadn't got round to getting the PV shim running yet; I thought it would need work on the multiboot loader. Turns out it doesn't. I *did* need to fix a couple of brown-paper-bag bugs in the per-vCPU upcall vector support, and implement Xen console support though. Now I can test PV guests: $ qemu-system-x86_64 --accel kvm,xen-version=0x40011,kernel-irqchip=split \ -chardev stdio,mux=on,id=char0 -device xen-console,chardev=char0 \ -drive file=${GUEST_IMAGE},if=xen -display none -m 1G \ -kernel ~/git/xen/xen/xen -initrd ~/git/linux/arch/x86/boot/bzImage \ -append "loglvl=all -- console=hvc0 root=/dev/xvda1" blockdev.c | 15 +++- hw/block/xen-block.c | 26 +- hw/char/trace-events | 8 ++ hw/char/xen_console.c | 522 -- hw/i386/kvm/meson.build| 1 + hw/i386/kvm/trace-events | 2 + hw/i386/kvm/xen-stubs.c| 5 ++ hw/i386/kvm/xen_evtchn.c | 6 ++ hw/i386/kvm/xen_gnttab.c | 32 ++- hw/i386/kvm/xen_primary_console.c | 167 ++ hw/i386/kvm/xen_primary_console.h | 22 + hw/i386/kvm/xen_xenstore.c | 21 - hw/xen/xen-backend.c | 81 + hw/xen/xen-bus.c | 21 - hw/xen/xen-legacy-backend.c| 1 - include/hw/xen/interface/arch-arm.h| 37 include/hw/xen/interface/arch-x86/cpuid.h | 31 +++ include/hw/xen/interface/arch-x86/xen-x86_32.h | 19 +--- include/hw/xen/interface/arch-x86/xen-x86_64.h | 19 +--- include/hw/xen/interface/arch-x86/xen.h| 26 +- include/hw/xen/interface/event_channel.h | 19 +--- include/hw/xen/interface/features.h| 19 +--- include/hw/xen/interface/grant_table.h | 19 +--- include/hw/xen/interface/hvm/hvm_op.h | 19 +--- include/hw/xen/interface/hvm/params.h | 19 +--- include/hw/xen/interface/io/blkif.h| 27 ++ include/hw/xen/interface/io/console.h | 19 +--- include/hw/xen/interface/io/fbif.h | 19 +--- include/hw/xen/interface/io/kbdif.h| 19 +--- include/hw/xen/interface/io/netif.h| 25 ++ include/hw/xen/interface/io/protocols.h| 19 +--- include/hw/xen/interface/io/ring.h | 49 +- include/hw/xen/interface/io/usbif.h| 19 +--- include/hw/xen/interface/io/xenbus.h | 19 +--- include/hw/xen/interface/io/xs_wire.h | 36 include/hw/xen/interface/memory.h | 30 +++ include/hw/xen/interface/physdev.h | 23 + include/hw/xen/interface/sched.h | 19 +--- include/hw/xen/interface/trace.h | 19 +--- include/hw/xen/interface/vcpu.h| 19 +--- include/hw/xen/interface/version.h | 19 +--- include/hw/xen/interface/xen-compat.h | 19 +--- include/hw/xen/interface/xen.h | 19 +--- include/hw/xen/xen-backend.h | 4 + include/hw/xen/xen-bus.h | 2 + include/sysemu/kvm_xen.h | 1 + target/i386/kvm/kvm.c | 4 + target/i386/kvm/xen-emu.c | 35 ++-- 48 files changed, 941 insertions(+), 680 deletions(-)
[PATCH 08/12] hw/xen: do not repeatedly try to create a failing backend device
From: David Woodhouse If xen_backend_device_create() fails to instantiate a device, the XenBus code will just keep trying over and over again each time the bus is re-enumerated, as long as the backend appears online and in XenbusStateInitialising. The only thing which prevents the XenBus code from recreating duplicates of devices which already exist, is the fact that xen_device_realize() sets the backend state to XenbusStateInitWait. If the attempt to create the device doesn't get *that* far, that's when it will keep getting retried. My first thought was to handle errors by setting the backend state to XenbusStateClosed, but that doesn't work for XenConsole which wants to *ignore* any device of type != "ioemu" completely. So, make xen_backend_device_create() *keep* the XenBackendInstance for a failed device, and provide a new xen_backend_exists() function to allow xen_bus_type_enumerate() to check whether one already exists before creating a new one. Signed-off-by: David Woodhouse --- hw/xen/xen-backend.c | 27 +-- hw/xen/xen-bus.c | 3 ++- include/hw/xen/xen-backend.h | 1 + 3 files changed, 24 insertions(+), 7 deletions(-) diff --git a/hw/xen/xen-backend.c b/hw/xen/xen-backend.c index 5b0fb76eae..b9bf70a9f5 100644 --- a/hw/xen/xen-backend.c +++ b/hw/xen/xen-backend.c @@ -101,6 +101,24 @@ static XenBackendInstance *xen_backend_list_find(XenDevice *xendev) return NULL; } +bool xen_backend_exists(const char *type, const char *name) +{ +const XenBackendImpl *impl = xen_backend_table_lookup(type); +XenBackendInstance *backend; + +if (!impl) { +return false; +} + +QLIST_FOREACH(backend, _list, entry) { +if (backend->impl == impl && !strcmp(backend->name, name)) { +return true; +} +} + +return false; +} + static void xen_backend_list_remove(XenBackendInstance *backend) { QLIST_REMOVE(backend, entry); @@ -122,11 +140,6 @@ void xen_backend_device_create(XenBus *xenbus, const char *type, backend->name = g_strdup(name); impl->create(backend, opts, errp); -if (*errp) { -g_free(backend->name); -g_free(backend); -return; -} backend->impl = impl; xen_backend_list_add(backend); @@ -165,7 +178,9 @@ bool xen_backend_try_device_destroy(XenDevice *xendev, Error **errp) } impl = backend->impl; -impl->destroy(backend, errp); +if (backend->xendev) { +impl->destroy(backend, errp); +} xen_backend_list_remove(backend); g_free(backend->name); diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c index cc524ed92c..0da2aa219a 100644 --- a/hw/xen/xen-bus.c +++ b/hw/xen/xen-bus.c @@ -209,7 +209,8 @@ static void xen_bus_type_enumerate(XenBus *xenbus, const char *type) NULL, "%u", ) != 1) online = 0; -if (online && state == XenbusStateInitialising) { +if (online && state == XenbusStateInitialising && +!xen_backend_exists(type, backend[i])) { Error *local_err = NULL; xen_bus_backend_create(xenbus, type, backend[i], backend_path, diff --git a/include/hw/xen/xen-backend.h b/include/hw/xen/xen-backend.h index aac2fd454d..0f01631ae7 100644 --- a/include/hw/xen/xen-backend.h +++ b/include/hw/xen/xen-backend.h @@ -33,6 +33,7 @@ XenDevice *xen_backend_get_device(XenBackendInstance *backend); void xen_backend_register(const XenBackendInfo *info); const char **xen_backend_get_types(unsigned int *nr); +bool xen_backend_exists(const char *type, const char *name); void xen_backend_device_create(XenBus *xenbus, const char *type, const char *name, QDict *opts, Error **errp); bool xen_backend_try_device_destroy(XenDevice *xendev, Error **errp); -- 2.40.1
[PATCH 11/12] hw/xen: automatically assign device index to block devices
From: David Woodhouse There's no need to force the user to assign a vdev. We can automatically assign one, starting at xvda and searching until we find the first disk name that's unused. This means we can now allow '-drive if=xen,file=xxx' to work without an explicit separate -driver argument, just like if=virtio. Signed-off-by: David Woodhouse --- blockdev.c | 15 --- hw/block/xen-block.c | 25 + 2 files changed, 37 insertions(+), 3 deletions(-) diff --git a/blockdev.c b/blockdev.c index 325b7a3bef..9dec4c9c74 100644 --- a/blockdev.c +++ b/blockdev.c @@ -255,13 +255,13 @@ void drive_check_orphaned(void) * Ignore default drives, because we create certain default * drives unconditionally, then leave them unclaimed. Not the * users fault. - * Ignore IF_VIRTIO, because it gets desugared into -device, - * so we can leave failing to -device. + * Ignore IF_VIRTIO or IF_XEN, because it gets desugared into + * -device, so we can leave failing to -device. * Ignore IF_NONE, because leaving unclaimed IF_NONE remains * available for device_add is a feature. */ if (dinfo->is_default || dinfo->type == IF_VIRTIO -|| dinfo->type == IF_NONE) { +|| dinfo->type == IF_XEN || dinfo->type == IF_NONE) { continue; } if (!blk_get_attached_dev(blk)) { @@ -977,6 +977,15 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type, qemu_opt_set(devopts, "driver", "virtio-blk", _abort); qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"), _abort); +} else if (type == IF_XEN) { +QemuOpts *devopts; +devopts = qemu_opts_create(qemu_find_opts("device"), NULL, 0, + _abort); +qemu_opt_set(devopts, "driver", + (media == MEDIA_CDROM) ? "xen-cdrom" : "xen-disk", + _abort); +qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"), + _abort); } filename = qemu_opt_get(legacy_opts, "file"); diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c index 9262338535..20fa783cbe 100644 --- a/hw/block/xen-block.c +++ b/hw/block/xen-block.c @@ -34,6 +34,31 @@ static char *xen_block_get_name(XenDevice *xendev, Error **errp) XenBlockDevice *blockdev = XEN_BLOCK_DEVICE(xendev); XenBlockVdev *vdev = >props.vdev; +if (blockdev->props.vdev.type == XEN_BLOCK_VDEV_TYPE_INVALID) { +char name[11]; +int disk = 0; +unsigned long idx; + +/* Find an unoccupied device name */ +while (disk < (1 << 20)) { +if (disk < (1 << 4)) { +idx = (202 << 8) | (disk << 4); +} else { +idx = (1 << 28) | (disk << 8); +} +snprintf(name, sizeof(name), "%lu", idx); +if (!xen_backend_exists("qdisk", name)) { +vdev->type = XEN_BLOCK_VDEV_TYPE_XVD; +vdev->partition = 0; +vdev->disk = disk; +vdev->number = idx; +return g_strdup(name); +} +disk++; +} +error_setg(errp, "cannot find device vdev for block device"); +return NULL; +} return g_strdup_printf("%lu", vdev->number); } -- 2.40.1
[PATCH 10/12] hw/xen: automatically assign device index to console devices
From: David Woodhouse Now that we can reliably tell whether a given device already exists, we can allow the user to add console devices on the command line with just '-device xen-console,chardev=foo'. Start at 1, because we can't add the *primary* console; that's special because the toolstack has to set up the guest end of that. Signed-off-by: David Woodhouse --- hw/char/xen_console.c | 16 1 file changed, 16 insertions(+) diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c index 2825b8c511..1a0f5ed3e1 100644 --- a/hw/char/xen_console.c +++ b/hw/char/xen_console.c @@ -333,6 +333,22 @@ static char *xen_console_get_name(XenDevice *xendev, Error **errp) { XenConsole *con = XEN_CONSOLE_DEVICE(xendev); +if (con->dev == -1) { +char name[11]; +int idx = 1; + +/* Theoretically we could go up to INT_MAX here but that's overkill */ +while (idx < 100) { +snprintf(name, sizeof(name), "%u", idx); +if (!xen_backend_exists("console", name)) { +con->dev = idx; +return g_strdup(name); +} +idx++; +} +error_setg(errp, "cannot find device index for console device"); +return NULL; +} return g_strdup_printf("%u", con->dev); } -- 2.40.1
[PATCH 03/12] include: update Xen public headers to Xen 4.17.2 release
From: David Woodhouse ... in order to advertise the XEN_HVM_CPUID_UPCALL_VECTOR feature, which will come in a subsequent commit. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c| 2 +- include/hw/xen/interface/arch-arm.h | 37 +++--- include/hw/xen/interface/arch-x86/cpuid.h | 31 +--- .../hw/xen/interface/arch-x86/xen-x86_32.h| 19 +-- .../hw/xen/interface/arch-x86/xen-x86_64.h| 19 +-- include/hw/xen/interface/arch-x86/xen.h | 26 ++ include/hw/xen/interface/event_channel.h | 19 +-- include/hw/xen/interface/features.h | 19 +-- include/hw/xen/interface/grant_table.h| 19 +-- include/hw/xen/interface/hvm/hvm_op.h | 19 +-- include/hw/xen/interface/hvm/params.h | 19 +-- include/hw/xen/interface/io/blkif.h | 27 -- include/hw/xen/interface/io/console.h | 19 +-- include/hw/xen/interface/io/fbif.h| 19 +-- include/hw/xen/interface/io/kbdif.h | 19 +-- include/hw/xen/interface/io/netif.h | 25 +++--- include/hw/xen/interface/io/protocols.h | 19 +-- include/hw/xen/interface/io/ring.h| 49 ++- include/hw/xen/interface/io/usbif.h | 19 +-- include/hw/xen/interface/io/xenbus.h | 19 +-- include/hw/xen/interface/io/xs_wire.h | 36 ++ include/hw/xen/interface/memory.h | 30 +--- include/hw/xen/interface/physdev.h| 23 ++--- include/hw/xen/interface/sched.h | 19 +-- include/hw/xen/interface/trace.h | 19 +-- include/hw/xen/interface/vcpu.h | 19 +-- include/hw/xen/interface/version.h| 19 +-- include/hw/xen/interface/xen-compat.h | 19 +-- include/hw/xen/interface/xen.h| 19 +-- 29 files changed, 124 insertions(+), 523 deletions(-) diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c index 660d0b72f9..d2b311109b 100644 --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -331,7 +331,7 @@ static void xs_error(XenXenstoreState *s, unsigned int id, const char *errstr = NULL; for (unsigned int i = 0; i < ARRAY_SIZE(xsd_errors); i++) { -struct xsd_errors *xsd_error = _errors[i]; +const struct xsd_errors *xsd_error = _errors[i]; if (xsd_error->errnum == errnum) { errstr = xsd_error->errstring; diff --git a/include/hw/xen/interface/arch-arm.h b/include/hw/xen/interface/arch-arm.h index 94b31511dd..1528ced509 100644 --- a/include/hw/xen/interface/arch-arm.h +++ b/include/hw/xen/interface/arch-arm.h @@ -1,26 +1,9 @@ +/* SPDX-License-Identifier: MIT */ /** * arch-arm.h * * Guest OS interface to ARM Xen. * - * Permission is hereby granted, free of charge, to any person obtaining a copy - * of this software and associated documentation files (the "Software"), to - * deal in the Software without restriction, including without limitation the - * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or - * sell copies of the Software, and to permit persons to whom the Software is - * furnished to do so, subject to the following conditions: - * - * The above copyright notice and this permission notice shall be included in - * all copies or substantial portions of the Software. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE - * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER - * DEALINGS IN THE SOFTWARE. - * * Copyright 2011 (C) Citrix Systems */ @@ -361,6 +344,7 @@ typedef uint64_t xen_callback_t; #define PSR_DBG_MASK(1<<9)/* arm64: Debug Exception mask */ #define PSR_IT_MASK (0x0600fc00) /* Thumb If-Then Mask */ #define PSR_JAZELLE (1<<24) /* Jazelle Mode */ +#define PSR_Z (1<<30) /* Zero condition flag */ /* 32 bit modes */ #define PSR_MODE_USR 0x10 @@ -383,7 +367,15 @@ typedef uint64_t xen_callback_t; #define PSR_MODE_EL1t 0x04 #define PSR_MODE_EL0t 0x00 -#define PSR_GUEST32_INIT (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC) +/* + * We set PSR_Z to be able to boot Linux kernel versions with an invalid + * encoding of the first 8 NOP instructions. See commit a92882a4d270 in + * Linux. + * + * Note that PSR_Z is also set by U-Boot and QEMU -kernel when loading + * zImage kernels on aarch32. + */ +#define PSR_GUEST32_INIT
[PATCH 06/12] hw/xen: add get_frontend_path() method to XenDeviceClass
From: David Woodhouse The primary Xen console is special. The guest's side is set up for it by the toolstack automatically and not by the standard PV init sequence. Accordingly, its *frontend* doesn't appear in …/device/console/0 either; instead it appears under …/console in the guest's XenStore node. To allow the Xen console driver to override the frontend path for the primary console, add a method to the XenDeviceClass which can be used instead of the standard xen_device_get_frontend_path() Signed-off-by: David Woodhouse --- hw/xen/xen-bus.c | 10 +- include/hw/xen/xen-bus.h | 2 ++ 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c index ece8ec40cd..cc524ed92c 100644 --- a/hw/xen/xen-bus.c +++ b/hw/xen/xen-bus.c @@ -711,8 +711,16 @@ static void xen_device_frontend_create(XenDevice *xendev, Error **errp) { ERRP_GUARD(); XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev))); +XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev); -xendev->frontend_path = xen_device_get_frontend_path(xendev); +if (xendev_class->get_frontend_path) { +xendev->frontend_path = xendev_class->get_frontend_path(xendev, errp); +if (!xendev->frontend_path) { +return; +} +} else { +xendev->frontend_path = xen_device_get_frontend_path(xendev); +} /* * The frontend area may have already been created by a legacy diff --git a/include/hw/xen/xen-bus.h b/include/hw/xen/xen-bus.h index f435898164..eb440880b5 100644 --- a/include/hw/xen/xen-bus.h +++ b/include/hw/xen/xen-bus.h @@ -33,6 +33,7 @@ struct XenDevice { }; typedef struct XenDevice XenDevice; +typedef char *(*XenDeviceGetFrontendPath)(XenDevice *xendev, Error **errp); typedef char *(*XenDeviceGetName)(XenDevice *xendev, Error **errp); typedef void (*XenDeviceRealize)(XenDevice *xendev, Error **errp); typedef void (*XenDeviceFrontendChanged)(XenDevice *xendev, @@ -46,6 +47,7 @@ struct XenDeviceClass { /*< public >*/ const char *backend; const char *device; +XenDeviceGetFrontendPath get_frontend_path; XenDeviceGetName get_name; XenDeviceRealize realize; XenDeviceFrontendChanged frontend_changed; -- 2.40.1
[PATCH 09/12] hw/xen: prevent duplicate device registrations
From: David Woodhouse Ensure that we have a XenBackendInstance for every device regardless of whether it was "discovered" in XenStore or created directly in QEMU. This allows the backend_list to be a source of truth about whether a given backend exists, and allows us to reject duplicates. This also cleans up the fact that backend drivers were calling xen_backend_set_device() with a XenDevice immediately after calling qdev_realize_and_unref() on it, when it wasn't theirs to play with any more. Signed-off-by: David Woodhouse --- hw/block/xen-block.c | 1 - hw/char/xen_console.c| 2 +- hw/xen/xen-backend.c | 78 ++-- hw/xen/xen-bus.c | 8 include/hw/xen/xen-backend.h | 3 ++ 5 files changed, 69 insertions(+), 23 deletions(-) diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c index a07cd7eb5d..9262338535 100644 --- a/hw/block/xen-block.c +++ b/hw/block/xen-block.c @@ -975,7 +975,6 @@ static void xen_block_device_create(XenBackendInstance *backend, goto fail; } -xen_backend_set_device(backend, xendev); return; fail: diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c index bd20be116c..2825b8c511 100644 --- a/hw/char/xen_console.c +++ b/hw/char/xen_console.c @@ -468,7 +468,7 @@ static void xen_console_device_create(XenBackendInstance *backend, Chardev *cd = NULL; struct qemu_xs_handle *xsh = xenbus->xsh; -if (qemu_strtoul(name, NULL, 10, )) { +if (qemu_strtoul(name, NULL, 10, ) || number >= INT_MAX) { error_setg(errp, "failed to parse name '%s'", name); goto fail; } diff --git a/hw/xen/xen-backend.c b/hw/xen/xen-backend.c index b9bf70a9f5..dcb4329258 100644 --- a/hw/xen/xen-backend.c +++ b/hw/xen/xen-backend.c @@ -101,22 +101,28 @@ static XenBackendInstance *xen_backend_list_find(XenDevice *xendev) return NULL; } -bool xen_backend_exists(const char *type, const char *name) +static XenBackendInstance *xen_backend_lookup(const XenBackendImpl *impl, const char *name) { -const XenBackendImpl *impl = xen_backend_table_lookup(type); XenBackendInstance *backend; -if (!impl) { -return false; -} - QLIST_FOREACH(backend, _list, entry) { if (backend->impl == impl && !strcmp(backend->name, name)) { -return true; +return backend; } } -return false; +return NULL; +} + +bool xen_backend_exists(const char *type, const char *name) +{ +const XenBackendImpl *impl = xen_backend_table_lookup(type); + +if (!impl) { +return false; +} + +return !!xen_backend_lookup(impl, name); } static void xen_backend_list_remove(XenBackendInstance *backend) @@ -138,11 +144,10 @@ void xen_backend_device_create(XenBus *xenbus, const char *type, backend = g_new0(XenBackendInstance, 1); backend->xenbus = xenbus; backend->name = g_strdup(name); - -impl->create(backend, opts, errp); - backend->impl = impl; xen_backend_list_add(backend); + +impl->create(backend, opts, errp); } XenBus *xen_backend_get_bus(XenBackendInstance *backend) @@ -155,13 +160,6 @@ const char *xen_backend_get_name(XenBackendInstance *backend) return backend->name; } -void xen_backend_set_device(XenBackendInstance *backend, -XenDevice *xendev) -{ -g_assert(!backend->xendev); -backend->xendev = xendev; -} - XenDevice *xen_backend_get_device(XenBackendInstance *backend) { return backend->xendev; @@ -178,9 +176,7 @@ bool xen_backend_try_device_destroy(XenDevice *xendev, Error **errp) } impl = backend->impl; -if (backend->xendev) { -impl->destroy(backend, errp); -} +impl->destroy(backend, errp); xen_backend_list_remove(backend); g_free(backend->name); @@ -188,3 +184,43 @@ bool xen_backend_try_device_destroy(XenDevice *xendev, Error **errp) return true; } + +bool xen_backend_device_realized(XenDevice *xendev, Error **errp) +{ +XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev); +const char *type = xendev_class->backend ? : object_get_typename(OBJECT(xendev)); +const XenBackendImpl *impl = xen_backend_table_lookup(type); +XenBackendInstance *backend; + +if (!impl) { +return false; +} + +backend = xen_backend_lookup(impl, xendev->name); +if (backend) { +if (backend->xendev && backend->xendev != xendev) { +error_setg(errp, "device %s/%s already exists", type, xendev->name); +return false; +} +backend->xendev = xendev; +return true; +} + +backend = g_new0(XenBackendInstance, 1); +backend->xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev))); +backend->xendev = xendev; +backend->name = g_strdup(xendev->name); +backend->impl = impl; + +xen_backend_list_add(backend); +return true; +} + +void
[PATCH 01/12] i386/xen: fix per-vCPU upcall vector for Xen emulation
From: David Woodhouse The per-vCPU upcall vector support had two problems. Firstly it was using the wrong hypercall argument and would always return -EFAULT. And secondly it was using the wrong ioctl() to pass the vector to the kernel and thus the *kernel* would always return -EINVAL. Linux doesn't (yet) use this mode so it went without decent testing for a while. Fixes: 105b47fdf2d0 ("i386/xen: implement HVMOP_set_evtchn_upcall_vector") Signed-off-by: David Woodhouse --- target/i386/kvm/xen-emu.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c index c4fa84a982..b49a840438 100644 --- a/target/i386/kvm/xen-emu.c +++ b/target/i386/kvm/xen-emu.c @@ -306,7 +306,7 @@ static int kvm_xen_set_vcpu_callback_vector(CPUState *cs) trace_kvm_xen_set_vcpu_callback(cs->cpu_index, vector); -return kvm_vcpu_ioctl(cs, KVM_XEN_HVM_SET_ATTR, ); +return kvm_vcpu_ioctl(cs, KVM_XEN_VCPU_SET_ATTR, ); } static void do_set_vcpu_callback_vector(CPUState *cs, run_on_cpu_data data) @@ -849,8 +849,7 @@ static bool kvm_xen_hcall_hvm_op(struct kvm_xen_exit *exit, X86CPU *cpu, int ret = -ENOSYS; switch (cmd) { case HVMOP_set_evtchn_upcall_vector: -ret = kvm_xen_hcall_evtchn_upcall_vector(exit, cpu, - exit->u.hcall.params[0]); +ret = kvm_xen_hcall_evtchn_upcall_vector(exit, cpu, arg); break; case HVMOP_pagetable_dying: -- 2.40.1
Re: [RFC PATCH v3 02/78] block: add fallthrough pseudo-keyword
On Fri, Oct 13, 2023 at 11:45:30AM +0300, Emmanouil Pitsidianakis wrote: > In preparation of raising -Wimplicit-fallthrough to 5, replace all > fall-through comments with the fallthrough attribute pseudo-keyword. > > Signed-off-by: Emmanouil Pitsidianakis > --- > block/block-copy.c| 1 + > block/file-posix.c| 1 + > block/io.c| 1 + > block/iscsi.c | 1 + > block/qcow2-cluster.c | 5 - > block/vhdx.c | 17 + > 6 files changed, 21 insertions(+), 5 deletions(-) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
Re: [RFC PATCH 00/78] Strict disable implicit fallthrough
On Mon, 16 Oct 2023 at 15:58, Manos Pitsidianakis wrote: > > Hello Peter, > > On Mon, 16 Oct 2023, 17:13 Peter Maydell, wrote: >> >> On Fri, 13 Oct 2023 at 13:42, Markus Armbruster wrote: >> > >> > Emmanouil Pitsidianakis writes: >> > >> > > Hello, >> > > >> > > This RFC is inspired by the kernel's move to -Wimplicit-fallthrough=3 >> > > back in 2019.[0] >> > > We take one step (or two) further by increasing it to 5 which rejects >> > > fall through comments and requires an attribute statement. >> > > >> > > [0]: >> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a035d552a93b >> > > >> > > The line differences are not many, but they spread all over different >> > > subsystems, architectures and devices. An attempt has been made to split >> > > them in cohesive patches to aid post-RFC review. Part of the RFC is to >> > > determine whether these patch divisions needs improvement. >> > > >> > > Main questions this RFC poses >> > > = >> > > >> > > - Is this change desirable and net-positive. >> > >> > Unwanted fallthrough is an easy mistake to make, and >> > -Wimplicit-fallthrough=N helps avoid it. The question is how far up we >> > need to push N. Right now we're at N=2. Has unwanted fallthrough been >> > a problem? >> >> Mmm, this is my opinion I think. We have a mechanism for >> catching "forgot the 'break'" already (our =2 setting) and >> a way to say "intentional" in a fairly natural way (add the >> comment). Does pushing N up any further gain us anything >> except a load of churn? >> >> Also, the compiler is not the only thing that processes our >> code: Coverity also looks for "unexpected fallthrough" issues, >> so if we wanted to switch away from our current practice we >> should check whether what we're switching to is an idiom >> that Coverity recognises. > > > It is a code style change as the cover letter mentions, it's not related to > the static analysis itself. Yes, exactly. As a code style change it needs a fairly high level of justification for the code churn, and the cover letter doesn't really provide one... thanks -- PMM
Re: [RFC PATCH v3 08/78] hw/block: add fallthrough pseudo-keyword
On Fri, Oct 13, 2023 at 11:45:36AM +0300, Emmanouil Pitsidianakis wrote: > In preparation of raising -Wimplicit-fallthrough to 5, replace all > fall-through comments with the fallthrough attribute pseudo-keyword. > > Signed-off-by: Emmanouil Pitsidianakis > --- > hw/block/dataplane/xen-block.c | 4 ++-- > hw/block/m25p80.c | 2 +- > hw/block/onenand.c | 2 +- > hw/block/pflash_cfi01.c| 1 + > hw/block/pflash_cfi02.c| 6 -- > 5 files changed, 9 insertions(+), 6 deletions(-) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
Re: [RFC PATCH 00/78] Strict disable implicit fallthrough
Hello Peter, On Mon, 16 Oct 2023, 17:13 Peter Maydell, wrote: > On Fri, 13 Oct 2023 at 13:42, Markus Armbruster wrote: > > > > Emmanouil Pitsidianakis writes: > > > > > Hello, > > > > > > This RFC is inspired by the kernel's move to -Wimplicit-fallthrough=3 > > > back in 2019.[0] > > > We take one step (or two) further by increasing it to 5 which rejects > > > fall through comments and requires an attribute statement. > > > > > > [0]: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a035d552a93b > > > > > > The line differences are not many, but they spread all over different > > > subsystems, architectures and devices. An attempt has been made to > split > > > them in cohesive patches to aid post-RFC review. Part of the RFC is to > > > determine whether these patch divisions needs improvement. > > > > > > Main questions this RFC poses > > > = > > > > > > - Is this change desirable and net-positive. > > > > Unwanted fallthrough is an easy mistake to make, and > > -Wimplicit-fallthrough=N helps avoid it. The question is how far up we > > need to push N. Right now we're at N=2. Has unwanted fallthrough been > > a problem? > > Mmm, this is my opinion I think. We have a mechanism for > catching "forgot the 'break'" already (our =2 setting) and > a way to say "intentional" in a fairly natural way (add the > comment). Does pushing N up any further gain us anything > except a load of churn? > > Also, the compiler is not the only thing that processes our > code: Coverity also looks for "unexpected fallthrough" issues, > so if we wanted to switch away from our current practice we > should check whether what we're switching to is an idiom > that Coverity recognises. > It is a code style change as the cover letter mentions, it's not related to the static analysis itself. -- Manos >
Re: [PATCH v2] block/file-posix: fix update_zones_wp() caller
Sam Li 于2023年8月25日周五 12:06写道: > > When the zoned request fail, it needs to update only the wp of > the target zones for not disrupting the in-flight writes on > these other zones. The wp is updated successfully after the > request completes. > > Fixed the callers with right offset and nr_zones. > > Signed-off-by: Sam Li > --- > block/file-posix.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) Ping? > > diff --git a/block/file-posix.c b/block/file-posix.c > index b16e9c21a1..55e7f06a2f 100644 > --- a/block/file-posix.c > +++ b/block/file-posix.c > @@ -2522,7 +2522,8 @@ out: > } > } else { > if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) { > -update_zones_wp(bs, s->fd, 0, 1); > +/* write and append write are not allowed to cross zone > bounaries */ > +update_zones_wp(bs, s->fd, offset, 1); > } > } > > @@ -3472,7 +3473,7 @@ static int coroutine_fn > raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, > len >> BDRV_SECTOR_BITS); > ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, ); > if (ret != 0) { > -update_zones_wp(bs, s->fd, offset, i); > +update_zones_wp(bs, s->fd, offset, nrz); > error_report("ioctl %s failed %d", op_name, ret); > return ret; > } > -- > 2.40.1 >
Re: [RFC PATCH 00/78] Strict disable implicit fallthrough
On Fri, 13 Oct 2023 at 13:42, Markus Armbruster wrote: > > Emmanouil Pitsidianakis writes: > > > Hello, > > > > This RFC is inspired by the kernel's move to -Wimplicit-fallthrough=3 > > back in 2019.[0] > > We take one step (or two) further by increasing it to 5 which rejects > > fall through comments and requires an attribute statement. > > > > [0]: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a035d552a93b > > > > The line differences are not many, but they spread all over different > > subsystems, architectures and devices. An attempt has been made to split > > them in cohesive patches to aid post-RFC review. Part of the RFC is to > > determine whether these patch divisions needs improvement. > > > > Main questions this RFC poses > > = > > > > - Is this change desirable and net-positive. > > Unwanted fallthrough is an easy mistake to make, and > -Wimplicit-fallthrough=N helps avoid it. The question is how far up we > need to push N. Right now we're at N=2. Has unwanted fallthrough been > a problem? Mmm, this is my opinion I think. We have a mechanism for catching "forgot the 'break'" already (our =2 setting) and a way to say "intentional" in a fairly natural way (add the comment). Does pushing N up any further gain us anything except a load of churn? Also, the compiler is not the only thing that processes our code: Coverity also looks for "unexpected fallthrough" issues, so if we wanted to switch away from our current practice we should check whether what we're switching to is an idiom that Coverity recognises. thanks -- PMM
Re: [PATCH v4 03/10] migration: migrate 'inc' command option is deprecated.
Markus Armbruster wrote: > Juan Quintela writes: > >> Markus Armbruster wrote: >>> Juan Quintela writes: >> So what I want, I want to remove -i/-b in the next version (9.0?). For >> the other, I want to remove it, but I don't care if the code is around >> in "deprecated" state for another couple of years if there are still >> people that feel that they want it. >> >> This is the reason that I put a pointer for -i/-b to >> @block/@block-incremental. They are "perfect" replacements. >> >> I can put here to use blockdev-mirror + NBD, but the replacement is not >> so direct. >> >> Does this make sense? > > I see where you're coming from. Now let's change perspective for a > minute: what's the purpose of deprecating stuff? > > We normally deprecate with intent to remove. > > We don't remove right away, because we promised to first deprecate for a > grace period, so users can adjust in an orderly manner. The deprecation > serves as signal "you need to adjust". The documentation that comes > with it should help with the adjustment. It's commonly of the form "use > $alternative instead". The alternative is often a direct replacement, > but not always. There could even be no replacement at all. We don't > promise replacements, we promise an orderly process, so users can > adjust. > > Sometimes, we don't have firm plans to remove, but are more like "maybe > remove when gets in the way". We could soften the "you need to adjust" > signal in documentation, but I doubt that's a good idea. Regardless, > the need to help users adjust remains. > > Back to your patches. There are two separate interfaces to block > migration, and both are deprecated at the end of the series: > > 1. Migration parameter @block-incremental & friends > >Not in the way, content to keep around for longer if it helps users. > >The deprecation documentation advises to use block-mirror with NBD >instead. All good. > > 2. QMP migrate parameters @inc and @blk > >Firm intent to remove as soon as the grace period expires, because >it's in the way. > >The deprecation documentation advises to use interface 1 instead. >But that's deprecated, too! > >Insufficiently careful readers will miss that the replacement is >deprecated, and just use it. Risks surprise when the replacement >goes away, too. > >More careful readers will realize that this advises to use something >we elsewhere advise not to use. Contradiction! Confusion ensues. > >On further reflection, these readers might conclude that the >*combined* advice is to use block-mirror with NBD instead. This is >correct. > >So why not tell them? > >Perhaps you'd like to give more nuanced advice, like "you should move >to block-mirror with NBD, but if that's not practical for you, you >should at least move to @block-incremental & friends, which will >likely stick around for longer." That's fine. All I'm asking for is >to not make things more confusing than they need to be :) > > [...] Telling this in deprecated.rst is enough? or you want me to put it also in the error/warn messages and qapi? Later, Juan.
Re: -drive if=none: can't we make this the default?
Almost everyone mentions -blockdev as a replacement for -drive. But I have to remind several issues with it: 1. While documentation has improved a lot, -blockdev is still mostly unknown to the masses. 2. -blockdev is just too verbose, one have to specify a lot of parameters just to do a simple thing which is solved with an extra parameter with -drive. Including various backing stores/chains for qcow2 files - this is terrible for using things manually from command line 3. -blockdev does not work with -snapshot 4. Something else I forgot while typing all the above :) In my view, -blockdev is not a substitute for -drive, not at all, and it is very user-unfriendly. This is why -drive seems to be a good trade-off between things like -hda (which is just too simplistic) and -blockdev which which is way too verbose and lacks some automatic sugar like -snapshot. /mjt
Re: [PATCH v4 03/10] migration: migrate 'inc' command option is deprecated.
Juan Quintela writes: > Markus Armbruster wrote: >> Juan Quintela writes: >> >>> Set the 'block_incremental' migration parameter to 'true' instead. >>> >>> Reviewed-by: Thomas Huth >>> Acked-by: Stefan Hajnoczi >>> Signed-off-by: Juan Quintela >>> >>> --- >>> >>> Improve documentation and style (thanks Markus) >>> --- >>> docs/about/deprecated.rst | 7 +++ >>> qapi/migration.json | 8 +++- >>> migration/migration.c | 6 ++ >>> 3 files changed, 20 insertions(+), 1 deletion(-) >>> >>> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst >>> index 1c4d7f36f0..1b6b2870cf 100644 >>> --- a/docs/about/deprecated.rst >>> +++ b/docs/about/deprecated.rst >>> @@ -452,3 +452,10 @@ Migration >>> ``skipped`` field in Migration stats has been deprecated. It hasn't >>> been used for more than 10 years. >>> >>> +``inc`` migrate command option (since 8.2) >>> +'' >>> + >>> +The new way to modify migration is using migration parameters. >>> +``inc`` functionality can be achieved by setting the >>> +``block-incremental`` migration parameter to ``true``. >>> + >>> diff --git a/qapi/migration.json b/qapi/migration.json >>> index 6865fea3c5..56bbd55b87 100644 >>> --- a/qapi/migration.json >>> +++ b/qapi/migration.json >>> @@ -1492,6 +1492,11 @@ >>> # >>> # @resume: resume one paused migration, default "off". (since 3.0) >>> # >>> +# Features: >>> +# >>> +# @deprecated: Member @inc is deprecated. Use migration parameter >>> +# @block-incremental instead. >> >> This is fine now. It becomes bad advice in PATCH 05, which deprecates >> @block-incremental. Two solutions: >> >> 1. Change this patch to point to an alternative that will *not* be >> deprecated. > > Ok, clearly I am not explaining myself properly O:-) > > History of block migration: > * In the beggining there was -b and -i migrate options > There was the only way to do storage of migration. > * We moved to use parameters and capabilities for migration > So we created @block-incremental and @block. > But we didn't remove the command line options (for backward > compatibility). > * We were asked to modify migration so some storaged was migrated and > some was not migrated during migration. But block people found that > it was a good idea to allow storage migration without migrating the > vm, so they created this blockdev-mirror mechanism that is shinny, > funny, faster, better. > > So now we have old code that basically nobody uses (the last big user > was COLO, but now it can use multifd). So we want to drop it, but we > don't care about a direct replacement. > > So, why I am interested in removing this? > - @block and @block-incremental: If you don't use block migration, their > existence don't bother you. They are "quite" independent of the rest > of the migration code (they could be better integrated, but not big > trouble here). > - migrate options -i/-b: This ones hurt us each time that we need to > do changing in options. Notice that we have "perfect" replacements > with @block and @block-incremental, exactly the same > result/semantics/... > You can see the trobles in the RFC patches > > * [PATCH v4 07/10] [RFC] migration: Make -i/-b an error for hmp and qmp > * [PATCH v4 08/10] [RFC] migration: Remove helpers needed for -i/-b > migrate options > > So what I want, I want to remove -i/-b in the next version (9.0?). For > the other, I want to remove it, but I don't care if the code is around > in "deprecated" state for another couple of years if there are still > people that feel that they want it. > > This is the reason that I put a pointer for -i/-b to > @block/@block-incremental. They are "perfect" replacements. > > I can put here to use blockdev-mirror + NBD, but the replacement is not > so direct. > > Does this make sense? I see where you're coming from. Now let's change perspective for a minute: what's the purpose of deprecating stuff? We normally deprecate with intent to remove. We don't remove right away, because we promised to first deprecate for a grace period, so users can adjust in an orderly manner. The deprecation serves as signal "you need to adjust". The documentation that comes with it should help with the adjustment. It's commonly of the form "use $alternative instead". The alternative is often a direct replacement, but not always. There could even be no replacement at all. We don't promise replacements, we promise an orderly process, so users can adjust. Sometimes, we don't have firm plans to remove, but are more like "maybe remove when gets in the way". We could soften the "you need to adjust" signal in documentation, but I doubt that's a good idea. Regardless, the need to help users adjust remains. Back to your patches. There are two separate interfaces to block migration, and both are deprecated at the end of the series: 1. Migration parameter @block-incremental & friends Not in the
[PULL 37/38] migration/multifd: Unify multifd_send_thread error paths
From: Fabiano Rosas The preferred usage of the Error type is to always set both the return code and the error when a failure happens. As all code called from the send thread follows this pattern, we'll always have the return code and the error set at the same time. Aside from the convention, in this piece of code this must be the case, otherwise the if (ret != 0) would be exiting the thread without calling multifd_send_terminate_threads() which is incorrect. Unify both paths to make it clear that both are taken when there's an error. Signed-off-by: Fabiano Rosas Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231012134343.23757-3-faro...@suse.de> --- migration/multifd.c | 12 +++- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 8e9a5ee394..c92955de41 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -753,19 +753,13 @@ static void *multifd_send_thread(void *opaque) } out: -if (local_err) { +if (ret) { +assert(local_err); trace_multifd_send_error(p->id); multifd_send_terminate_threads(local_err); -error_free(local_err); -} - -/* - * Error happen, I will exit, but I can't just leave, tell - * who pay attention to me. - */ -if (ret != 0) { qemu_sem_post(>sem_sync); qemu_sem_post(_send_state->channels_ready); +error_free(local_err); } qemu_mutex_lock(>mutex); -- 2.41.0
[PULL 32/38] migration/ram: Remove RAMState from xbzrle_cache_zero_page
From: Fabiano Rosas 'rs' is not used in that function. It's a leftover from commit 9360447d34 ("ram: Use MigrationStats for statistics"). Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231011184604.32364-4-faro...@suse.de> --- migration/ram.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 2bd51d6bb5..535172d9be 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -570,7 +570,6 @@ void mig_throttle_counter_reset(void) /** * xbzrle_cache_zero_page: insert a zero page in the XBZRLE cache * - * @rs: current RAM state * @current_addr: address for the zero page * * Update the xbzrle cache to reflect a page that's been sent as all 0. @@ -579,7 +578,7 @@ void mig_throttle_counter_reset(void) * As a bonus, if the page wasn't in the cache it gets added so that * when a small write is made into the 0'd page it gets XBZRLE sent. */ -static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr) +static void xbzrle_cache_zero_page(ram_addr_t current_addr) { /* We don't care if this fails to allocate a new cache page * as long as it updated an old one */ @@ -2146,7 +2145,7 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss) */ if (rs->xbzrle_started) { XBZRLE_cache_lock(); -xbzrle_cache_zero_page(rs, block->offset + offset); +xbzrle_cache_zero_page(block->offset + offset); XBZRLE_cache_unlock(); } return res; -- 2.41.0
[PULL 06/38] migration: Fix analyze-migration.py 'configuration' parsing
From: Fabiano Rosas The 'configuration' state subsections are currently not being parsed and the script fails when analyzing an aarch64 stream: Traceback (most recent call last): File "./scripts/analyze-migration.py", line 625, in dump.read(dump_memory = args.memory) File "./scripts/analyze-migration.py", line 571, in read raise Exception("Unknown section type: %d" % section_type) Exception: Unknown section type: 5 Signed-off-by: Fabiano Rosas Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231009184326.15777-3-faro...@suse.de> --- scripts/analyze-migration.py | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py index 082424558b..24687db497 100755 --- a/scripts/analyze-migration.py +++ b/scripts/analyze-migration.py @@ -261,12 +261,21 @@ def getDict(self): class ConfigurationSection(object): -def __init__(self, file): +def __init__(self, file, desc): self.file = file +self.desc = desc def read(self): -name_len = self.file.read32() -name = self.file.readstr(len = name_len) +if self.desc: +version_id = self.desc['version'] +section = VMSDSection(self.file, version_id, self.desc, + 'configuration') +section.read() +else: +# backward compatibility for older streams that don't have +# the configuration section in the json +name_len = self.file.read32() +name = self.file.readstr(len = name_len) class VMSDFieldGeneric(object): def __init__(self, desc, file): @@ -532,7 +541,8 @@ def read(self, desc_only = False, dump_memory = False, write_memory = False): if section_type == self.QEMU_VM_EOF: break elif section_type == self.QEMU_VM_CONFIGURATION: -section = ConfigurationSection(file) +config_desc = self.vmsd_desc.get('configuration') +section = ConfigurationSection(file, config_desc) section.read() elif section_type == self.QEMU_VM_SECTION_START or section_type == self.QEMU_VM_SECTION_FULL: section_id = file.read32() -- 2.41.0
[PULL 14/38] migration: Create migrate_rdma()
Helper to say if we are doing a migration over rdma. Reviewed-by: Peter Xu Reviewed-by: Li Zhijian Signed-off-by: Juan Quintela Message-ID: <20231011203527.9061-2-quint...@redhat.com> --- migration/migration.h | 2 ++ migration/options.h | 1 + migration/migration.c | 1 + migration/options.c | 7 +++ migration/rdma.c | 4 +++- 5 files changed, 14 insertions(+), 1 deletion(-) diff --git a/migration/migration.h b/migration/migration.h index 974897a8d0..ae82004892 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -469,6 +469,8 @@ struct MigrationState { * switchover has been received. */ bool switchover_acked; +/* Is this a rdma migration */ +bool rdma_migration; }; void migrate_set_state(int *state, int old_state, int new_state); diff --git a/migration/options.h b/migration/options.h index 93ee938ab8..237f2d6b4a 100644 --- a/migration/options.h +++ b/migration/options.h @@ -56,6 +56,7 @@ bool migrate_zero_copy_send(void); bool migrate_multifd_flush_after_each_section(void); bool migrate_postcopy(void); +bool migrate_rdma(void); bool migrate_tls(void); /* capabilities helpers */ diff --git a/migration/migration.c b/migration/migration.c index 79fa11e3f6..6ba5e145ac 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1452,6 +1452,7 @@ int migrate_init(MigrationState *s, Error **errp) s->iteration_initial_bytes = 0; s->threshold_size = 0; s->switchover_acked = false; +s->rdma_migration = false; /* * set mig_stats compression_counters memory to zero for a * new migration diff --git a/migration/options.c b/migration/options.c index 546cbe3106..42fb818956 100644 --- a/migration/options.c +++ b/migration/options.c @@ -378,6 +378,13 @@ bool migrate_postcopy(void) return migrate_postcopy_ram() || migrate_dirty_bitmaps(); } +bool migrate_rdma(void) +{ +MigrationState *s = migrate_get_current(); + +return s->rdma_migration; +} + bool migrate_tls(void) { MigrationState *s = migrate_get_current(); diff --git a/migration/rdma.c b/migration/rdma.c index f6fc226c9b..f155f3e1c8 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -4113,6 +4113,7 @@ static void rdma_accept_incoming_migration(void *opaque) void rdma_start_incoming_migration(const char *host_port, Error **errp) { +MigrationState *s = migrate_get_current(); int ret; RDMAContext *rdma; @@ -4144,7 +4145,7 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp) } trace_rdma_start_incoming_migration_after_rdma_listen(); - +s->rdma_migration = true; qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration, NULL, (void *)(intptr_t)rdma); return; @@ -4220,6 +4221,7 @@ void rdma_start_outgoing_migration(void *opaque, trace_rdma_start_outgoing_migration_after_rdma_connect(); s->to_dst_file = rdma_new_output(rdma); +s->rdma_migration = true; migrate_fd_connect(s, NULL); return; return_path_err: -- 2.41.0
[PULL 27/38] migration: Improve json and formatting
Reviewed-by: Markus Armbruster Signed-off-by: Juan Quintela Message-ID: <20231013104736.31722-2-quint...@redhat.com> --- qapi/migration.json | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/qapi/migration.json b/qapi/migration.json index 360e609f66..db3df12d6c 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -73,7 +73,7 @@ { 'struct': 'MigrationStats', 'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' , 'duplicate': 'int', - 'skipped': { 'type': 'int', 'features': ['deprecated'] }, + 'skipped': { 'type': 'int', 'features': [ 'deprecated' ] }, 'normal': 'int', 'normal-bytes': 'int', 'dirty-pages-rate': 'int', 'mbps': 'number', 'dirty-sync-count': 'int', @@ -440,10 +440,9 @@ # compress and xbzrle are both on, compress only takes effect in # the ram bulk stage, after that, it will be disabled and only # xbzrle takes effect, this can help to minimize migration -# traffic. The feature is disabled by default. (since 2.4 ) +# traffic. The feature is disabled by default. (since 2.4) # -# @events: generate events for each migration state change (since 2.4 -# ) +# @events: generate events for each migration state change (since 2.4) # # @auto-converge: If enabled, QEMU will automatically throttle down # the guest to speed up convergence of RAM migration. (since 1.6) -- 2.41.0
[PULL 33/38] migration/ram: Stop passing QEMUFile around in save_zero_page
From: Fabiano Rosas We don't need the QEMUFile when we're already passing the PageSearchStatus. Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231011184604.32364-5-faro...@suse.de> --- migration/ram.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 535172d9be..2ec28c4507 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1147,10 +1147,11 @@ void ram_release_page(const char *rbname, uint64_t offset) * @block: block that contains the page we want to send * @offset: offset inside the block for the page */ -static int save_zero_page_to_file(PageSearchStatus *pss, QEMUFile *file, - RAMBlock *block, ram_addr_t offset) +static int save_zero_page_to_file(PageSearchStatus *pss, RAMBlock *block, + ram_addr_t offset) { uint8_t *p = block->host + offset; +QEMUFile *file = pss->pss_channel; int len = 0; if (buffer_is_zero(p, TARGET_PAGE_SIZE)) { @@ -1171,10 +1172,10 @@ static int save_zero_page_to_file(PageSearchStatus *pss, QEMUFile *file, * @block: block that contains the page we want to send * @offset: offset inside the block for the page */ -static int save_zero_page(PageSearchStatus *pss, QEMUFile *f, RAMBlock *block, +static int save_zero_page(PageSearchStatus *pss, RAMBlock *block, ram_addr_t offset) { -int len = save_zero_page_to_file(pss, f, block, offset); +int len = save_zero_page_to_file(pss, block, offset); if (len) { stat64_add(_stats.zero_pages, 1); @@ -2138,7 +2139,7 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss) return 1; } -res = save_zero_page(pss, pss->pss_channel, block, offset); +res = save_zero_page(pss, block, offset); if (res > 0) { /* Must let xbzrle know, otherwise a previous (now 0'd) cached * page would be stale -- 2.41.0
[PULL 34/38] migration/ram: Move xbzrle zero page handling into save_zero_page
From: Fabiano Rosas It makes a bit more sense to have the zero page handling of xbzrle right where we save the zero page. Also invert the exit condition to remove one level of indentation which makes the next patch easier to grasp. Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231011184604.32364-6-faro...@suse.de> --- migration/ram.c | 35 --- 1 file changed, 20 insertions(+), 15 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 2ec28c4507..229cad5c74 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1168,21 +1168,34 @@ static int save_zero_page_to_file(PageSearchStatus *pss, RAMBlock *block, * * Returns the number of pages written. * + * @rs: current RAM state * @pss: current PSS channel * @block: block that contains the page we want to send * @offset: offset inside the block for the page */ -static int save_zero_page(PageSearchStatus *pss, RAMBlock *block, +static int save_zero_page(RAMState *rs, PageSearchStatus *pss, RAMBlock *block, ram_addr_t offset) { int len = save_zero_page_to_file(pss, block, offset); -if (len) { -stat64_add(_stats.zero_pages, 1); -ram_transferred_add(len); -return 1; +if (!len) { +return -1; } -return -1; + +stat64_add(_stats.zero_pages, 1); +ram_transferred_add(len); + +/* + * Must let xbzrle know, otherwise a previous (now 0'd) cached + * page would be stale. + */ +if (rs->xbzrle_started) { +XBZRLE_cache_lock(); +xbzrle_cache_zero_page(block->offset + offset); +XBZRLE_cache_unlock(); +} + +return 1; } /* @@ -2139,16 +2152,8 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss) return 1; } -res = save_zero_page(pss, block, offset); +res = save_zero_page(rs, pss, block, offset); if (res > 0) { -/* Must let xbzrle know, otherwise a previous (now 0'd) cached - * page would be stale - */ -if (rs->xbzrle_started) { -XBZRLE_cache_lock(); -xbzrle_cache_zero_page(block->offset + offset); -XBZRLE_cache_unlock(); -} return res; } -- 2.41.0
[PULL 26/38] migration/rdma: Remove all "ret" variables that are used only once
Change code that is: int ret; ... ret = foo(); if (ret[ < 0]?) { to: if (foo()[ < 0]) { Reviewed-by: Fabiano Rosas Reviewed-by: Li Zhijian Signed-off-by: Juan Quintela Message-ID: <20231011203527.9061-14-quint...@redhat.com> --- migration/rdma.c | 29 - 1 file changed, 8 insertions(+), 21 deletions(-) diff --git a/migration/rdma.c b/migration/rdma.c index 09015fbd1a..2a1852ec7f 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -1107,7 +1107,6 @@ err_alloc_pd_cq: static int qemu_rdma_alloc_qp(RDMAContext *rdma) { struct ibv_qp_init_attr attr = { 0 }; -int ret; attr.cap.max_send_wr = RDMA_SIGNALED_SEND_MAX; attr.cap.max_recv_wr = 3; @@ -1117,8 +1116,7 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma) attr.recv_cq = rdma->recv_cq; attr.qp_type = IBV_QPT_RC; -ret = rdma_create_qp(rdma->cm_id, rdma->pd, ); -if (ret < 0) { +if (rdma_create_qp(rdma->cm_id, rdma->pd, ) < 0) { return -1; } @@ -1130,8 +1128,8 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma) static bool rdma_support_odp(struct ibv_context *dev) { struct ibv_device_attr_ex attr = {0}; -int ret = ibv_query_device_ex(dev, NULL, ); -if (ret) { + +if (ibv_query_device_ex(dev, NULL, )) { return false; } @@ -1508,7 +1506,6 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma, struct ibv_comp_channel *comp_channel) { struct rdma_cm_event *cm_event; -int ret; /* * Coroutine doesn't start until migration_fd_process_incoming() @@ -1544,8 +1541,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma, } if (pfds[1].revents) { -ret = rdma_get_cm_event(rdma->channel, _event); -if (ret < 0) { +if (rdma_get_cm_event(rdma->channel, _event) < 0) { return -1; } @@ -2317,12 +2313,10 @@ static int qemu_rdma_write(RDMAContext *rdma, uint64_t current_addr = block_offset + offset; uint64_t index = rdma->current_index; uint64_t chunk = rdma->current_chunk; -int ret; /* If we cannot merge it, we flush the current buffer first. */ if (!qemu_rdma_buffer_mergeable(rdma, current_addr, len)) { -ret = qemu_rdma_write_flush(rdma, errp); -if (ret < 0) { +if (qemu_rdma_write_flush(rdma, errp) < 0) { return -1; } rdma->current_length = 0; @@ -2936,7 +2930,6 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc, static int qemu_rdma_drain_cq(RDMAContext *rdma) { Error *err = NULL; -int ret; if (qemu_rdma_write_flush(rdma, ) < 0) { error_report_err(err); @@ -2944,8 +2937,7 @@ static int qemu_rdma_drain_cq(RDMAContext *rdma) } while (rdma->nb_sent) { -ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL); -if (ret < 0) { +if (qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL) < 0) { error_report("rdma migration: complete polling error!"); return -1; } @@ -3323,12 +3315,10 @@ static void rdma_accept_incoming_migration(void *opaque); static void rdma_cm_poll_handler(void *opaque) { RDMAContext *rdma = opaque; -int ret; struct rdma_cm_event *cm_event; MigrationIncomingState *mis = migration_incoming_get_current(); -ret = rdma_get_cm_event(rdma->channel, _event); -if (ret < 0) { +if (rdma_get_cm_event(rdma->channel, _event) < 0) { error_report("get_cm_event failed %d", errno); return; } @@ -4053,14 +4043,11 @@ static QEMUFile *rdma_new_output(RDMAContext *rdma) static void rdma_accept_incoming_migration(void *opaque) { RDMAContext *rdma = opaque; -int ret; QEMUFile *f; Error *local_err = NULL; trace_qemu_rdma_accept_incoming_migration(); -ret = qemu_rdma_accept(rdma); - -if (ret < 0) { +if (qemu_rdma_accept(rdma) < 0) { error_report("RDMA ERROR: Migration initialization failed"); return; } -- 2.41.0
[PULL 25/38] migration/rdma: Declare for index variables local
Declare all variables that are only used inside a for loop inside the for statement. This makes clear that they are not used outside of the for loop. Reviewed-by: Fabiano Rosas Reviewed-by: Li Zhijian Signed-off-by: Juan Quintela Message-ID: <20231011203527.9061-13-quint...@redhat.com> --- migration/rdma.c | 44 ++-- 1 file changed, 18 insertions(+), 26 deletions(-) diff --git a/migration/rdma.c b/migration/rdma.c index 5f6a771e8e..09015fbd1a 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -559,10 +559,8 @@ static void rdma_add_block(RDMAContext *rdma, const char *block_name, local->block = g_new0(RDMALocalBlock, local->nb_blocks + 1); if (local->nb_blocks) { -int x; - if (rdma->blockmap) { -for (x = 0; x < local->nb_blocks; x++) { +for (int x = 0; x < local->nb_blocks; x++) { g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)old[x].offset); g_hash_table_insert(rdma->blockmap, @@ -649,15 +647,12 @@ static void rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block) { RDMALocalBlocks *local = >local_ram_blocks; RDMALocalBlock *old = local->block; -int x; if (rdma->blockmap) { g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)block->offset); } if (block->pmr) { -int j; - -for (j = 0; j < block->nb_chunks; j++) { +for (int j = 0; j < block->nb_chunks; j++) { if (!block->pmr[j]) { continue; } @@ -687,7 +682,7 @@ static void rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block) block->block_name = NULL; if (rdma->blockmap) { -for (x = 0; x < local->nb_blocks; x++) { +for (int x = 0; x < local->nb_blocks; x++) { g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)old[x].offset); } @@ -705,7 +700,7 @@ static void rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block) memcpy(local->block + block->index, old + (block->index + 1), sizeof(RDMALocalBlock) * (local->nb_blocks - (block->index + 1))); -for (x = block->index; x < local->nb_blocks - 1; x++) { +for (int x = block->index; x < local->nb_blocks - 1; x++) { local->block[x].index--; } } @@ -725,7 +720,7 @@ static void rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block) local->nb_blocks--; if (local->nb_blocks && rdma->blockmap) { -for (x = 0; x < local->nb_blocks; x++) { +for (int x = 0; x < local->nb_blocks; x++) { g_hash_table_insert(rdma->blockmap, (void *)(uintptr_t)local->block[x].offset, >block[x]); @@ -828,12 +823,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp) * Otherwise, there are no guarantees until the bug is fixed in linux. */ if (!verbs) { -int num_devices, x; +int num_devices; struct ibv_device **dev_list = ibv_get_device_list(_devices); bool roce_found = false; bool ib_found = false; -for (x = 0; x < num_devices; x++) { +for (int x = 0; x < num_devices; x++) { verbs = ibv_open_device(dev_list[x]); /* * ibv_open_device() is not documented to set errno. If @@ -925,7 +920,6 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp) char port_str[16]; struct rdma_cm_event *cm_event; char ip[40] = "unknown"; -struct rdma_addrinfo *e; if (rdma->host == NULL || !strcmp(rdma->host, "")) { error_setg(errp, "RDMA ERROR: RDMA hostname has not been set"); @@ -957,7 +951,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp) } /* Try all addresses, saving the first error in @err */ -for (e = res; e != NULL; e = e->ai_next) { +for (struct rdma_addrinfo *e = res; e != NULL; e = e->ai_next) { Error **local_errp = err ? NULL : inet_ntop(e->ai_family, @@ -2777,7 +2771,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc, RDMAContext *rdma; int ret; ssize_t done = 0; -size_t i, len; +size_t len; RCU_READ_LOCK_GUARD(); rdma = qatomic_rcu_read(>rdmaout); @@ -2803,7 +2797,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc, return -1; } -for (i = 0; i < niov; i++) { +for (int i = 0; i < niov; i++) { size_t remaining = iov[i].iov_len; uint8_t * data = (void *)iov[i].iov_base; while (remaining) { @@ -2866,7 +2860,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc, RDMAControlHeader head; int ret; ssize_t done = 0; -size_t i, len; +size_t len;
[PULL 31/38] migration/ram: Refactor precopy ram loading code
From: Nikolay Borisov Extract the ramblock parsing code into a routine that operates on the sequence of headers from the stream and another the parses the individual ramblock. This makes ram_load_precopy() easier to comprehend. Signed-off-by: Nikolay Borisov Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Peter Xu Reviewed-by: Juan Quintela Signed-off-by: Fabiano Rosas Signed-off-by: Juan Quintela Message-ID: <20231011184604.32364-3-faro...@suse.de> --- migration/ram.c | 146 +++- 1 file changed, 82 insertions(+), 64 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index a9bc6ae1f1..2bd51d6bb5 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3884,6 +3884,85 @@ void colo_flush_ram_cache(void) trace_colo_flush_ram_cache_end(); } +static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length) +{ +int ret = 0; +/* ADVISE is earlier, it shows the source has the postcopy capability on */ +bool postcopy_advised = migration_incoming_postcopy_advised(); + +assert(block); + +if (!qemu_ram_is_migratable(block)) { +error_report("block %s should not be migrated !", block->idstr); +return -EINVAL; +} + +if (length != block->used_length) { +Error *local_err = NULL; + +ret = qemu_ram_resize(block, length, _err); +if (local_err) { +error_report_err(local_err); +} +} +/* For postcopy we need to check hugepage sizes match */ +if (postcopy_advised && migrate_postcopy_ram() && +block->page_size != qemu_host_page_size) { +uint64_t remote_page_size = qemu_get_be64(f); +if (remote_page_size != block->page_size) { +error_report("Mismatched RAM page size %s " + "(local) %zd != %" PRId64, block->idstr, + block->page_size, remote_page_size); +ret = -EINVAL; +} +} +if (migrate_ignore_shared()) { +hwaddr addr = qemu_get_be64(f); +if (migrate_ram_is_ignored(block) && +block->mr->addr != addr) { +error_report("Mismatched GPAs for block %s " + "%" PRId64 "!= %" PRId64, block->idstr, + (uint64_t)addr, (uint64_t)block->mr->addr); +ret = -EINVAL; +} +} +ret = rdma_block_notification_handle(f, block->idstr); +if (ret < 0) { +qemu_file_set_error(f, ret); +} + +return ret; +} + +static int parse_ramblocks(QEMUFile *f, ram_addr_t total_ram_bytes) +{ +int ret = 0; + +/* Synchronize RAM block list */ +while (!ret && total_ram_bytes) { +RAMBlock *block; +char id[256]; +ram_addr_t length; +int len = qemu_get_byte(f); + +qemu_get_buffer(f, (uint8_t *)id, len); +id[len] = 0; +length = qemu_get_be64(f); + +block = qemu_ram_block_by_name(id); +if (block) { +ret = parse_ramblock(f, block, length); +} else { +error_report("Unknown ramblock \"%s\", cannot accept " + "migration", id); +ret = -EINVAL; +} +total_ram_bytes -= length; +} + +return ret; +} + /** * ram_load_precopy: load pages in precopy case * @@ -3898,14 +3977,13 @@ static int ram_load_precopy(QEMUFile *f) { MigrationIncomingState *mis = migration_incoming_get_current(); int flags = 0, ret = 0, invalid_flags = 0, len = 0, i = 0; -/* ADVISE is earlier, it shows the source has the postcopy capability on */ -bool postcopy_advised = migration_incoming_postcopy_advised(); + if (!migrate_compress()) { invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE; } while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) { -ram_addr_t addr, total_ram_bytes; +ram_addr_t addr; void *host = NULL, *host_bak = NULL; uint8_t ch; @@ -3976,67 +4054,7 @@ static int ram_load_precopy(QEMUFile *f) switch (flags & ~RAM_SAVE_FLAG_CONTINUE) { case RAM_SAVE_FLAG_MEM_SIZE: -/* Synchronize RAM block list */ -total_ram_bytes = addr; -while (!ret && total_ram_bytes) { -RAMBlock *block; -char id[256]; -ram_addr_t length; - -len = qemu_get_byte(f); -qemu_get_buffer(f, (uint8_t *)id, len); -id[len] = 0; -length = qemu_get_be64(f); - -block = qemu_ram_block_by_name(id); -if (block && !qemu_ram_is_migratable(block)) { -error_report("block %s should not be migrated !", id); -ret = -EINVAL; -} else if (block) { -if (length != block->used_length) { -Error *local_err = NULL; - -ret = qemu_ram_resize(block, length, -
[PULL 09/38] migration: Fix analyze-migration read operation signedness
From: Fabiano Rosas The migration code uses unsigned values for 16, 32 and 64-bit operations. Fix the script to do the same. This was causing an issue when parsing the migration stream generated on the ppc64 target because one of instance_ids was larger than the 32bit signed maximum: Traceback (most recent call last): File "/home/fabiano/kvm/qemu/build/scripts/analyze-migration.py", line 658, in dump.read(dump_memory = args.memory) File "/home/fabiano/kvm/qemu/build/scripts/analyze-migration.py", line 592, in read classdesc = self.section_classes[section_key] KeyError: ('spapr_iommu', -2147483648) Signed-off-by: Fabiano Rosas Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231009184326.15777-6-faro...@suse.de> --- scripts/analyze-migration.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py index 56ab04dd2d..de506cb8bf 100755 --- a/scripts/analyze-migration.py +++ b/scripts/analyze-migration.py @@ -38,13 +38,13 @@ def __init__(self, filename): self.file = open(self.filename, "rb") def read64(self): -return int.from_bytes(self.file.read(8), byteorder='big', signed=True) +return int.from_bytes(self.file.read(8), byteorder='big', signed=False) def read32(self): -return int.from_bytes(self.file.read(4), byteorder='big', signed=True) +return int.from_bytes(self.file.read(4), byteorder='big', signed=False) def read16(self): -return int.from_bytes(self.file.read(2), byteorder='big', signed=True) +return int.from_bytes(self.file.read(2), byteorder='big', signed=False) def read8(self): return int.from_bytes(self.file.read(1), byteorder='big', signed=True) -- 2.41.0
[PULL 19/38] migration/rdma: Create rdma_control_save_page()
The only user of ram_control_save_page() and save_page() hook was rdma. Just move the function to rdma.c, rename it to rdma_control_save_page(). Reviewed-by: Peter Xu Reviewed-by: Li Zhijian Signed-off-by: Juan Quintela Message-ID: <20231011203527.9061-7-quint...@redhat.com> --- migration/qemu-file.h | 12 migration/rdma.h | 10 ++ migration/qemu-file.c | 20 migration/ram.c | 4 ++-- migration/rdma.c | 19 ++- 5 files changed, 30 insertions(+), 35 deletions(-) diff --git a/migration/qemu-file.h b/migration/qemu-file.h index 80c30631dc..60510a2819 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -36,17 +36,7 @@ #define RAM_CONTROL_ROUND 1 #define RAM_CONTROL_FINISH3 -/* - * This function allows override of where the RAM page - * is saved (such as RDMA, for example.) - */ -typedef int (QEMURamSaveFunc)(QEMUFile *f, - ram_addr_t block_offset, - ram_addr_t offset, - size_t size); - typedef struct QEMUFileHooks { -QEMURamSaveFunc *save_page; } QEMUFileHooks; QEMUFile *qemu_file_new_input(QIOChannel *ioc); @@ -125,8 +115,6 @@ int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size); #define RAM_SAVE_CONTROL_NOT_SUPP -1000 #define RAM_SAVE_CONTROL_DELAYED -2000 -int ram_control_save_page(QEMUFile *f, ram_addr_t block_offset, - ram_addr_t offset, size_t size); QIOChannel *qemu_file_get_ioc(QEMUFile *file); #endif diff --git a/migration/rdma.h b/migration/rdma.h index 8df8b4089a..09a16c1e3c 100644 --- a/migration/rdma.h +++ b/migration/rdma.h @@ -17,6 +17,8 @@ #ifndef QEMU_MIGRATION_RDMA_H #define QEMU_MIGRATION_RDMA_H +#include "exec/memory.h" + void rdma_start_outgoing_migration(void *opaque, const char *host_port, Error **errp); @@ -28,6 +30,8 @@ int qemu_rdma_registration_handle(QEMUFile *f); int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags); int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags); int rdma_block_notification_handle(QEMUFile *f, const char *name); +int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset, + ram_addr_t offset, size_t size); #else static inline int qemu_rdma_registration_handle(QEMUFile *f) { return 0; } @@ -37,5 +41,11 @@ static inline int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags) { return 0; } static inline int rdma_block_notification_handle(QEMUFile *f, const char *name) { return 0; } +static inline +int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset, + ram_addr_t offset, size_t size) +{ +return RAM_SAVE_CONTROL_NOT_SUPP; +} #endif #endif diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 4a414b8976..745eaf7a5b 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -298,26 +298,6 @@ void qemu_fflush(QEMUFile *f) f->iovcnt = 0; } -int ram_control_save_page(QEMUFile *f, ram_addr_t block_offset, - ram_addr_t offset, size_t size) -{ -if (f->hooks && f->hooks->save_page) { -int ret = f->hooks->save_page(f, block_offset, offset, size); -/* - * RAM_SAVE_CONTROL_* are negative values - */ -if (ret != RAM_SAVE_CONTROL_DELAYED && -ret != RAM_SAVE_CONTROL_NOT_SUPP) { -if (ret < 0) { -qemu_file_set_error(f, ret); -} -} -return ret; -} - -return RAM_SAVE_CONTROL_NOT_SUPP; -} - /* * Attempt to fill the buffer from the underlying file * Returns the number of bytes read, or negative value for an error. diff --git a/migration/ram.c b/migration/ram.c index 8c462276cd..3b4b09f6ff 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1197,8 +1197,8 @@ static bool control_save_page(PageSearchStatus *pss, RAMBlock *block, { int ret; -ret = ram_control_save_page(pss->pss_channel, block->offset, offset, -TARGET_PAGE_SIZE); +ret = rdma_control_save_page(pss->pss_channel, block->offset, offset, + TARGET_PAGE_SIZE); if (ret == RAM_SAVE_CONTROL_NOT_SUPP) { return false; } diff --git a/migration/rdma.c b/migration/rdma.c index 0b1cb03b2b..f66bd939d7 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3314,6 +3314,24 @@ err: return -1; } +int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset, + ram_addr_t offset, size_t size) +{ +if (!migrate_rdma()) { +return RAM_SAVE_CONTROL_NOT_SUPP; +} + +int ret = qemu_rdma_save_page(f, block_offset, offset, size); + +if (ret != RAM_SAVE_CONTROL_DELAYED && +ret != RAM_SAVE_CONTROL_NOT_SUPP) { +if (ret < 0) { +qemu_file_set_error(f, ret); +} +} +return ret; +} + static void
[PULL 20/38] qemu-file: Remove QEMUFileHooks
The only user was rdma, and its use is gone. Reviewed-by: Peter Xu Reviewed-by: Li Zhijian Signed-off-by: Juan Quintela Message-ID: <20231011203527.9061-8-quint...@redhat.com> --- migration/qemu-file.h | 4 migration/qemu-file.c | 6 -- migration/rdma.c | 9 - 3 files changed, 19 deletions(-) diff --git a/migration/qemu-file.h b/migration/qemu-file.h index 60510a2819..0b22d8335f 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -36,12 +36,8 @@ #define RAM_CONTROL_ROUND 1 #define RAM_CONTROL_FINISH3 -typedef struct QEMUFileHooks { -} QEMUFileHooks; - QEMUFile *qemu_file_new_input(QIOChannel *ioc); QEMUFile *qemu_file_new_output(QIOChannel *ioc); -void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks); int qemu_fclose(QEMUFile *f); /* diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 745eaf7a5b..3fb25148d1 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -38,7 +38,6 @@ #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64) struct QEMUFile { -const QEMUFileHooks *hooks; QIOChannel *ioc; bool is_writable; @@ -133,11 +132,6 @@ QEMUFile *qemu_file_new_input(QIOChannel *ioc) return qemu_file_new_impl(ioc, false); } -void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks) -{ -f->hooks = hooks; -} - /* * Get last error for stream f with optional Error* * diff --git a/migration/rdma.c b/migration/rdma.c index f66bd939d7..9883b0a250 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -4003,13 +4003,6 @@ err: return -1; } -static const QEMUFileHooks rdma_read_hooks = { -}; - -static const QEMUFileHooks rdma_write_hooks = { -}; - - static void qio_channel_rdma_finalize(Object *obj) { QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(obj); @@ -4061,7 +4054,6 @@ static QEMUFile *rdma_new_input(RDMAContext *rdma) rioc->file = qemu_file_new_input(QIO_CHANNEL(rioc)); rioc->rdmain = rdma; rioc->rdmaout = rdma->return_path; -qemu_file_set_hooks(rioc->file, _read_hooks); return rioc->file; } @@ -4073,7 +4065,6 @@ static QEMUFile *rdma_new_output(RDMAContext *rdma) rioc->file = qemu_file_new_output(QIO_CHANNEL(rioc)); rioc->rdmaout = rdma; rioc->rdmain = rdma->return_path; -qemu_file_set_hooks(rioc->file, _write_hooks); return rioc->file; } -- 2.41.0
[PULL 17/38] migration/rdma: Remove all uses of RAM_CONTROL_HOOK
Instead of going through ram_control_load_hook(), call qemu_rdma_registration_handle() directly. Reviewed-by: Li Zhijian Signed-off-by: Juan Quintela Message-ID: <20231011203527.9061-5-quint...@redhat.com> --- migration/qemu-file.h | 1 - migration/rdma.h | 3 +++ migration/ram.c | 5 - migration/rdma.c | 12 +++- 4 files changed, 14 insertions(+), 7 deletions(-) diff --git a/migration/qemu-file.h b/migration/qemu-file.h index 35e671a01e..14ff0d9cc4 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -41,7 +41,6 @@ typedef int (QEMURamHookFunc)(QEMUFile *f, uint64_t flags, void *data); */ #define RAM_CONTROL_SETUP 0 #define RAM_CONTROL_ROUND 1 -#define RAM_CONTROL_HOOK 2 #define RAM_CONTROL_FINISH3 #define RAM_CONTROL_BLOCK_REG 4 diff --git a/migration/rdma.h b/migration/rdma.h index c13b94c782..8bd277efb9 100644 --- a/migration/rdma.h +++ b/migration/rdma.h @@ -24,10 +24,13 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp); #ifdef CONFIG_RDMA +int qemu_rdma_registration_handle(QEMUFile *f); int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags); int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags); #else static inline +int qemu_rdma_registration_handle(QEMUFile *f) { return 0; } +static inline int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags) { return 0; } static inline int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags) { return 0; } diff --git a/migration/ram.c b/migration/ram.c index f1ddc1f9fa..f6ea1831b5 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4075,7 +4075,10 @@ static int ram_load_precopy(QEMUFile *f) } break; case RAM_SAVE_FLAG_HOOK: -ram_control_load_hook(f, RAM_CONTROL_HOOK, NULL); +ret = qemu_rdma_registration_handle(f); +if (ret < 0) { +qemu_file_set_error(f, ret); +} break; default: error_report("Unknown combination of migration flags: 0x%x", flags); diff --git a/migration/rdma.c b/migration/rdma.c index 4b32d375ec..5c20f425a9 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3522,7 +3522,7 @@ static int dest_ram_sort_func(const void *a, const void *b) * * Keep doing this until the source tells us to stop. */ -static int qemu_rdma_registration_handle(QEMUFile *f) +int qemu_rdma_registration_handle(QEMUFile *f) { RDMAControlHeader reg_resp = { .len = sizeof(RDMARegisterResult), .type = RDMA_CONTROL_REGISTER_RESULT, @@ -3534,7 +3534,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f) }; RDMAControlHeader blocks = { .type = RDMA_CONTROL_RAM_BLOCKS_RESULT, .repeat = 1 }; -QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f)); +QIOChannelRDMA *rioc; Error *err = NULL; RDMAContext *rdma; RDMALocalBlocks *local; @@ -3550,7 +3550,12 @@ static int qemu_rdma_registration_handle(QEMUFile *f) int count = 0; int i = 0; +if (!migrate_rdma()) { +return 0; +} + RCU_READ_LOCK_GUARD(); +rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f)); rdma = qatomic_rcu_read(>rdmain); if (!rdma) { @@ -3841,9 +3846,6 @@ static int rdma_load_hook(QEMUFile *f, uint64_t flags, void *data) case RAM_CONTROL_BLOCK_REG: return rdma_block_notification_handle(f, data); -case RAM_CONTROL_HOOK: -return qemu_rdma_registration_handle(f); - default: /* Shouldn't be called with any other values */ abort(); -- 2.41.0
[PULL 35/38] migration/ram: Merge save_zero_page functions
From: Fabiano Rosas We don't need to do this in two pieces. One single function makes it easier to grasp, specially since it removes the indirection on the return value handling. Reviewed-by: Peter Xu Signed-off-by: Fabiano Rosas Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231011184604.32364-7-faro...@suse.de> --- migration/ram.c | 46 +- 1 file changed, 13 insertions(+), 33 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 229cad5c74..c844151ee9 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1137,32 +1137,6 @@ void ram_release_page(const char *rbname, uint64_t offset) ram_discard_range(rbname, offset, TARGET_PAGE_SIZE); } -/** - * save_zero_page_to_file: send the zero page to the file - * - * Returns the size of data written to the file, 0 means the page is not - * a zero page - * - * @pss: current PSS channel - * @block: block that contains the page we want to send - * @offset: offset inside the block for the page - */ -static int save_zero_page_to_file(PageSearchStatus *pss, RAMBlock *block, - ram_addr_t offset) -{ -uint8_t *p = block->host + offset; -QEMUFile *file = pss->pss_channel; -int len = 0; - -if (buffer_is_zero(p, TARGET_PAGE_SIZE)) { -len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_ZERO); -qemu_put_byte(file, 0); -len += 1; -ram_release_page(block->idstr, offset); -} -return len; -} - /** * save_zero_page: send the zero page to the stream * @@ -1176,12 +1150,19 @@ static int save_zero_page_to_file(PageSearchStatus *pss, RAMBlock *block, static int save_zero_page(RAMState *rs, PageSearchStatus *pss, RAMBlock *block, ram_addr_t offset) { -int len = save_zero_page_to_file(pss, block, offset); +uint8_t *p = block->host + offset; +QEMUFile *file = pss->pss_channel; +int len = 0; -if (!len) { -return -1; +if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) { +return 0; } +len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_ZERO); +qemu_put_byte(file, 0); +len += 1; +ram_release_page(block->idstr, offset); + stat64_add(_stats.zero_pages, 1); ram_transferred_add(len); @@ -1195,7 +1176,7 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss, RAMBlock *block, XBZRLE_cache_unlock(); } -return 1; +return len; } /* @@ -2152,9 +2133,8 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss) return 1; } -res = save_zero_page(rs, pss, block, offset); -if (res > 0) { -return res; +if (save_zero_page(rs, pss, block, offset)) { +return 1; } /* -- 2.41.0
[PULL 28/38] migration: check for rate_limit_max for RATE_LIMIT_DISABLED
From: Elena Ufimtseva In migration rate limiting atomic operations are used to read the rate limit variables and transferred bytes and they are expensive. Check first if rate_limit_max is equal to RATE_LIMIT_DISABLED and return false immediately if so. Note that with this patch we will also will stop flushing by not calling qemu_fflush() from migration_transferred_bytes() if the migration rate is not exceeded. This should be fine since migration thread calls in the loop migration_update_counters from migration_rate_limit() that calls the migration_transferred_bytes() and flushes there. Signed-off-by: Elena Ufimtseva Reviewed-by: Fabiano Rosas Reviewed-by: Peter Xu Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231011184358.97349-2-elena.ufimts...@oracle.com> --- migration/migration-stats.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/migration/migration-stats.c b/migration/migration-stats.c index 84e11e6dd8..4cc989d975 100644 --- a/migration/migration-stats.c +++ b/migration/migration-stats.c @@ -24,14 +24,15 @@ bool migration_rate_exceeded(QEMUFile *f) return true; } +uint64_t rate_limit_max = migration_rate_get(); +if (rate_limit_max == RATE_LIMIT_DISABLED) { +return false; +} + uint64_t rate_limit_start = stat64_get(_stats.rate_limit_start); uint64_t rate_limit_current = migration_transferred_bytes(f); uint64_t rate_limit_used = rate_limit_current - rate_limit_start; -uint64_t rate_limit_max = stat64_get(_stats.rate_limit_max); -if (rate_limit_max == RATE_LIMIT_DISABLED) { -return false; -} if (rate_limit_max > 0 && rate_limit_used > rate_limit_max) { return true; } -- 2.41.0
[PULL 13/38] migration: Non multifd migration don't care about multifd flushes
RDMA was having trouble because migrate_multifd_flush_after_each_section() can only be true or false, but we don't want to send any flush when we are not in multifd migration. CC: Fabiano Rosas Reviewed-by: Li Zhijian Reviewed-by: Peter Xu Signed-off-by: Juan Quintela Message-ID: <20231011205548.10571-2-quint...@redhat.com> --- migration/ram.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index d3d9c8b65b..acb8f95f00 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1395,7 +1395,8 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss) pss->page = 0; pss->block = QLIST_NEXT_RCU(pss->block, next); if (!pss->block) { -if (!migrate_multifd_flush_after_each_section()) { +if (migrate_multifd() && +!migrate_multifd_flush_after_each_section()) { QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel; int ret = multifd_send_sync_main(f); if (ret < 0) { @@ -3072,7 +3073,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) return ret; } -if (!migrate_multifd_flush_after_each_section()) { +if (migrate_multifd() && !migrate_multifd_flush_after_each_section()) { qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); } @@ -3184,7 +3185,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) out: if (ret >= 0 && migration_is_setup_or_active(migrate_get_current()->state)) { -if (migrate_multifd_flush_after_each_section()) { +if (migrate_multifd() && migrate_multifd_flush_after_each_section()) { ret = multifd_send_sync_main(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel); if (ret < 0) { return ret; @@ -3261,7 +3262,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque) return ret; } -if (!migrate_multifd_flush_after_each_section()) { +if (migrate_multifd() && !migrate_multifd_flush_after_each_section()) { qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); } qemu_put_be64(f, RAM_SAVE_FLAG_EOS); @@ -3768,7 +3769,8 @@ int ram_load_postcopy(QEMUFile *f, int channel) break; case RAM_SAVE_FLAG_EOS: /* normal exit */ -if (migrate_multifd_flush_after_each_section()) { +if (migrate_multifd() && +migrate_multifd_flush_after_each_section()) { multifd_recv_sync_main(); } break; @@ -4046,7 +4048,8 @@ static int ram_load_precopy(QEMUFile *f) break; case RAM_SAVE_FLAG_EOS: /* normal exit */ -if (migrate_multifd_flush_after_each_section()) { +if (migrate_multifd() && +migrate_multifd_flush_after_each_section()) { multifd_recv_sync_main(); } break; -- 2.41.0
[PULL 02/38] migration: Use g_autofree to simplify ram_dirty_bitmap_reload()
From: Philippe Mathieu-Daudé Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Markus Armbruster Reviewed-by: Fabiano Rosas Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231011023627.86691-1-phi...@linaro.org> --- migration/ram.c | 17 ++--- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 2f5ce4d60b..24d91de8b3 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4159,7 +4159,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) int ret = -EINVAL; /* from_dst_file is always valid because we're within rp_thread */ QEMUFile *file = s->rp_state.from_dst_file; -unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS; +g_autofree unsigned long *le_bitmap = NULL; +unsigned long nbits = block->used_length >> TARGET_PAGE_BITS; uint64_t local_size = DIV_ROUND_UP(nbits, 8); uint64_t size, end_mark; RAMState *rs = ram_state; @@ -4188,8 +4189,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) error_report("%s: ramblock '%s' bitmap size mismatch " "(0x%"PRIx64" != 0x%"PRIx64")", __func__, block->idstr, size, local_size); -ret = -EINVAL; -goto out; +return -EINVAL; } size = qemu_get_buffer(file, (uint8_t *)le_bitmap, local_size); @@ -4200,15 +4200,13 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) error_report("%s: read bitmap failed for ramblock '%s': %d" " (size 0x%"PRIx64", got: 0x%"PRIx64")", __func__, block->idstr, ret, local_size, size); -ret = -EIO; -goto out; +return -EIO; } if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) { error_report("%s: ramblock '%s' end mark incorrect: 0x%"PRIx64, __func__, block->idstr, end_mark); -ret = -EINVAL; -goto out; +return -EINVAL; } /* @@ -4240,10 +4238,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) */ migration_rp_kick(s); -ret = 0; -out: -g_free(le_bitmap); -return ret; +return 0; } static int ram_resume_prepare(MigrationState *s, void *opaque) -- 2.41.0
[PULL 04/38] migration: fix RAMBlock add NULL check
From: Dmitry Frolov qemu_ram_block_from_host() may return NULL, which will be dereferenced w/o check. Usualy return value is checked for this function. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Dmitry Frolov Reviewed-by: Fabiano Rosas Reviewed-by: Peter Xu Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231010104851.802947-1-fro...@swemel.ru> --- migration/ram.c | 5 + 1 file changed, 5 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index 24d91de8b3..e8df4dc862 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4285,6 +4285,11 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host, RAMBlock *rb = qemu_ram_block_from_host(host, false, ); Error *err = NULL; +if (!rb) { +error_report("RAM block not found"); +return; +} + if (migrate_ram_is_ignored(rb)) { return; } -- 2.41.0
[PULL 29/38] multifd: fix counters in multifd_send_thread
From: Elena Ufimtseva Previous commit cbec7eb76879d419e7dbf531ee2506ec0722e825 "migration/multifd: Compute transferred bytes correctly" removed accounting for packet_len in non-rdma case, but the next_packet_size only accounts for pages, not for the header packet (normal_pages * PAGE_SIZE) that is being sent as iov[0]. The packet_len part should be added to account for the size of MultiFDPacket and the array of the offsets. Signed-off-by: Elena Ufimtseva Reviewed-by: Fabiano Rosas Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231011184358.97349-4-elena.ufimts...@oracle.com> --- migration/multifd.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 0f6b203877..e6e0013c16 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -714,8 +714,6 @@ static void *multifd_send_thread(void *opaque) if (ret != 0) { break; } -stat64_add(_stats.multifd_bytes, p->packet_len); -stat64_add(_stats.transferred, p->packet_len); } else { /* Send header using the same writev call */ p->iov[0].iov_len = p->packet_len; @@ -728,8 +726,10 @@ static void *multifd_send_thread(void *opaque) break; } -stat64_add(_stats.multifd_bytes, p->next_packet_size); -stat64_add(_stats.transferred, p->next_packet_size); +stat64_add(_stats.multifd_bytes, + p->next_packet_size + p->packet_len); +stat64_add(_stats.transferred, + p->next_packet_size + p->packet_len); qemu_mutex_lock(>mutex); p->pending_job--; qemu_mutex_unlock(>mutex); -- 2.41.0
[PULL 08/38] migration: Fix analyze-migration.py when ignore-shared is used
From: Fabiano Rosas The script is currently broken when the x-ignore-shared capability is used: Traceback (most recent call last): File "./scripts/analyze-migration.py", line 656, in dump.read(dump_memory = args.memory) File "./scripts/analyze-migration.py", line 593, in read section.read() File "./scripts/analyze-migration.py", line 163, in read self.name = self.file.readstr(len = namelen) File "./scripts/analyze-migration.py", line 53, in readstr return self.readvar(len).decode('utf-8') UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 55: invalid start byte We're currently adding data to the middle of the ram section depending on the presence of the capability. As a consequence, any code loading the ram section needs to know about capabilities so it can interpret the stream. Skip the byte that's added when x-ignore-shared is used to fix the script. Signed-off-by: Fabiano Rosas Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231009184326.15777-5-faro...@suse.de> --- scripts/analyze-migration.py | 5 + 1 file changed, 5 insertions(+) diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py index c700fed64d..56ab04dd2d 100755 --- a/scripts/analyze-migration.py +++ b/scripts/analyze-migration.py @@ -123,6 +123,7 @@ def __init__(self, file, version_id, ramargs, section_key): self.TARGET_PAGE_SIZE = ramargs['page_size'] self.dump_memory = ramargs['dump_memory'] self.write_memory = ramargs['write_memory'] +self.ignore_shared = ramargs['ignore_shared'] self.sizeinfo = collections.OrderedDict() self.data = collections.OrderedDict() self.data['section sizes'] = self.sizeinfo @@ -169,6 +170,8 @@ def read(self): f.truncate(0) f.truncate(len) self.files[self.name] = f +if self.ignore_shared: +mr_addr = self.file.read64() flags &= ~self.RAM_SAVE_FLAG_MEM_SIZE if flags & self.RAM_SAVE_FLAG_COMPRESS: @@ -572,6 +575,7 @@ def read(self, desc_only = False, dump_memory = False, write_memory = False): ramargs['page_size'] = self.vmsd_desc['page_size'] ramargs['dump_memory'] = dump_memory ramargs['write_memory'] = write_memory +ramargs['ignore_shared'] = False self.section_classes[('ram',0)][1] = ramargs while True: @@ -582,6 +586,7 @@ def read(self, desc_only = False, dump_memory = False, write_memory = False): config_desc = self.vmsd_desc.get('configuration') section = ConfigurationSection(file, config_desc) section.read() +ramargs['ignore_shared'] = section.has_capability('x-ignore-shared') elif section_type == self.QEMU_VM_SECTION_START or section_type == self.QEMU_VM_SECTION_FULL: section_id = file.read32() name = file.readstr() -- 2.41.0
[PULL 00/38] Migration 20231016 patches
The following changes since commit 63011373ad22c794a013da69663c03f1297a5c56: Merge tag 'pull-riscv-to-apply-20231012-1' of https://github.com/alistair23/qemu into staging (2023-10-12 10:24:44 -0400) are available in the Git repository at: https://gitlab.com/juan.quintela/qemu.git tags/migration-20231016-pull-request for you to fetch changes up to f39b0f42753635b0f2d8b00a26d11bb197bf51e2: migration/multifd: Clarify Error usage in multifd_channel_connect (2023-10-16 11:01:33 +0200) Migration Pull request (20231016) In this pull request: - rdma cleanups - removal of QEMUFileHook - test for analyze-migration.py - test for multifd file - multifd cleanups - available switchover bandwidth - lots of cleanups. CI: https://gitlab.com/juan.quintela/qemu/-/pipelines/1037878829 Please, apply. Dmitry Frolov (1): migration: fix RAMBlock add NULL check Elena Ufimtseva (3): migration: check for rate_limit_max for RATE_LIMIT_DISABLED multifd: fix counters in multifd_send_thread multifd: reset next_packet_len after sending pages Fabiano Rosas (13): migration: Fix analyze-migration.py 'configuration' parsing migration: Add capability parsing to analyze-migration.py migration: Fix analyze-migration.py when ignore-shared is used migration: Fix analyze-migration read operation signedness tests/qtest/migration: Add a test for the analyze-migration script tests/qtest: migration-test: Add tests for file-based migration migration/ram: Remove RAMState from xbzrle_cache_zero_page migration/ram: Stop passing QEMUFile around in save_zero_page migration/ram: Move xbzrle zero page handling into save_zero_page migration/ram: Merge save_zero_page functions migration/multifd: Remove direct "socket" references migration/multifd: Unify multifd_send_thread error paths migration/multifd: Clarify Error usage in multifd_channel_connect Fiona Ebner (1): migration: hold the BQL during setup Juan Quintela (15): migration: Non multifd migration don't care about multifd flushes migration: Create migrate_rdma() migration/rdma: Unfold ram_control_before_iterate() migration/rdma: Unfold ram_control_after_iterate() migration/rdma: Remove all uses of RAM_CONTROL_HOOK migration/rdma: Unfold hook_ram_load() migration/rdma: Create rdma_control_save_page() qemu-file: Remove QEMUFileHooks migration/rdma: Move rdma constants from qemu-file.h to rdma.h migration/rdma: Remove qemu_ prefix from exported functions migration/rdma: Check sooner if we are in postcopy for save_page() migration/rdma: Use i as for index instead of idx migration/rdma: Declare for index variables local migration/rdma: Remove all "ret" variables that are used only once migration: Improve json and formatting Nikolay Borisov (2): migration: Add the configuration vmstate to the json writer migration/ram: Refactor precopy ram loading code Peter Xu (1): migration: Allow user to specify available switchover bandwidth Philippe Mathieu-Daudé (1): migration: Use g_autofree to simplify ram_dirty_bitmap_reload() Wei Wang (1): migration: refactor migration_completion qapi/migration.json| 41 - include/migration/register.h | 2 +- migration/migration.h | 4 +- migration/options.h| 2 + migration/qemu-file.h | 49 -- migration/rdma.h | 42 + migration/block-dirty-bitmap.c | 3 - migration/block.c | 5 - migration/migration-hmp-cmds.c | 14 ++ migration/migration-stats.c| 9 +- migration/migration.c | 199 + migration/multifd.c| 101 +-- migration/options.c| 35 migration/qemu-file.c | 61 +-- migration/ram.c| 306 ++--- migration/rdma.c | 259 migration/savevm.c | 22 ++- tests/qtest/migration-test.c | 207 ++ migration/trace-events | 33 ++-- scripts/analyze-migration.py | 67 +++- tests/qtest/meson.build| 2 + 21 files changed, 895 insertions(+), 568 deletions(-) -- 2.41.0
[PULL 36/38] migration/multifd: Remove direct "socket" references
From: Fabiano Rosas We're about to enable support for other transports in multifd, so remove direct references to sockets. Signed-off-by: Fabiano Rosas Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231012134343.23757-2-faro...@suse.de> --- migration/multifd.c | 22 -- 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index c45f5015f8..8e9a5ee394 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -510,6 +510,11 @@ static void multifd_send_terminate_threads(Error *err) } } +static int multifd_send_channel_destroy(QIOChannel *send) +{ +return socket_send_channel_destroy(send); +} + void multifd_save_cleanup(void) { int i; @@ -532,7 +537,7 @@ void multifd_save_cleanup(void) if (p->registered_yank) { migration_ioc_unregister_yank(p->c); } -socket_send_channel_destroy(p->c); +multifd_send_channel_destroy(p->c); p->c = NULL; qemu_mutex_destroy(>mutex); qemu_sem_destroy(>sem); @@ -890,20 +895,25 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p, static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque) { MultiFDSendParams *p = opaque; -QIOChannel *sioc = QIO_CHANNEL(qio_task_get_source(task)); +QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task)); Error *local_err = NULL; trace_multifd_new_send_channel_async(p->id); if (!qio_task_propagate_error(task, _err)) { -p->c = sioc; +p->c = ioc; qio_channel_set_delay(p->c, false); p->running = true; -if (multifd_channel_connect(p, sioc, local_err)) { +if (multifd_channel_connect(p, ioc, local_err)) { return; } } -multifd_new_send_channel_cleanup(p, sioc, local_err); +multifd_new_send_channel_cleanup(p, ioc, local_err); +} + +static void multifd_new_send_channel_create(gpointer opaque) +{ +socket_send_channel_create(multifd_new_send_channel_async, opaque); } int multifd_save_setup(Error **errp) @@ -952,7 +962,7 @@ int multifd_save_setup(Error **errp) p->write_flags = 0; } -socket_send_channel_create(multifd_new_send_channel_async, p); +multifd_new_send_channel_create(p); } for (i = 0; i < thread_count; i++) { -- 2.41.0
[PULL 30/38] multifd: reset next_packet_len after sending pages
From: Elena Ufimtseva Sometimes multifd sends just sync packet with no pages (normal_num is 0). In this case the old value is being preserved and being accounted for while only packet_len is being transferred. Reset it to 0 after sending and accounting for. Signed-off-by: Elena Ufimtseva Reviewed-by: Fabiano Rosas Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231011184358.97349-5-elena.ufimts...@oracle.com> --- migration/multifd.c | 1 + 1 file changed, 1 insertion(+) diff --git a/migration/multifd.c b/migration/multifd.c index e6e0013c16..c45f5015f8 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -730,6 +730,7 @@ static void *multifd_send_thread(void *opaque) p->next_packet_size + p->packet_len); stat64_add(_stats.transferred, p->next_packet_size + p->packet_len); +p->next_packet_size = 0; qemu_mutex_lock(>mutex); p->pending_job--; qemu_mutex_unlock(>mutex); -- 2.41.0
[PULL 23/38] migration/rdma: Check sooner if we are in postcopy for save_page()
Reviewed-by: Peter Xu Reviewed-by: Li Zhijian Signed-off-by: Juan Quintela Message-ID: <20231011203527.9061-11-quint...@redhat.com> --- migration/rdma.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/migration/rdma.c b/migration/rdma.c index c147c94b08..e973579a52 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3240,10 +3240,6 @@ static int qemu_rdma_save_page(QEMUFile *f, ram_addr_t block_offset, RDMAContext *rdma; int ret; -if (migration_in_postcopy()) { -return RAM_SAVE_CONTROL_NOT_SUPP; -} - RCU_READ_LOCK_GUARD(); rdma = qatomic_rcu_read(>rdmaout); @@ -3317,7 +3313,7 @@ err: int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset, ram_addr_t offset, size_t size) { -if (!migrate_rdma()) { +if (!migrate_rdma() || migration_in_postcopy()) { return RAM_SAVE_CONTROL_NOT_SUPP; } -- 2.41.0
[PULL 16/38] migration/rdma: Unfold ram_control_after_iterate()
Once there: - Remove unused data parameter - unfold it in its callers - change all callers to call qemu_rdma_registration_stop() - We need to call QIO_CHANNEL_RDMA() after we check for migrate_rdma() Reviewed-by: Li Zhijian Signed-off-by: Juan Quintela Message-ID: <20231011203527.9061-4-quint...@redhat.com> --- migration/qemu-file.h | 2 -- migration/rdma.h | 3 +++ migration/qemu-file.c | 12 migration/ram.c | 17 ++--- migration/rdma.c | 9 - 5 files changed, 21 insertions(+), 22 deletions(-) diff --git a/migration/qemu-file.h b/migration/qemu-file.h index d6a370c569..35e671a01e 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -55,7 +55,6 @@ typedef int (QEMURamSaveFunc)(QEMUFile *f, size_t size); typedef struct QEMUFileHooks { -QEMURamHookFunc *after_ram_iterate; QEMURamHookFunc *hook_ram_load; QEMURamSaveFunc *save_page; } QEMUFileHooks; @@ -126,7 +125,6 @@ void qemu_fflush(QEMUFile *f); void qemu_file_set_blocking(QEMUFile *f, bool block); int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size); -void ram_control_after_iterate(QEMUFile *f, uint64_t flags); void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data); /* Whenever this is found in the data stream, the flags diff --git a/migration/rdma.h b/migration/rdma.h index 670c67a8cb..c13b94c782 100644 --- a/migration/rdma.h +++ b/migration/rdma.h @@ -25,8 +25,11 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp); #ifdef CONFIG_RDMA int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags); +int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags); #else static inline int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags) { return 0; } +static inline +int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags) { return 0; } #endif #endif diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 5e2d73fd68..e7dba2a849 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -298,18 +298,6 @@ void qemu_fflush(QEMUFile *f) f->iovcnt = 0; } -void ram_control_after_iterate(QEMUFile *f, uint64_t flags) -{ -int ret = 0; - -if (f->hooks && f->hooks->after_ram_iterate) { -ret = f->hooks->after_ram_iterate(f, flags, NULL); -if (ret < 0) { -qemu_file_set_error(f, ret); -} -} -} - void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data) { if (f->hooks && f->hooks->hook_ram_load) { diff --git a/migration/ram.c b/migration/ram.c index 6592431a4e..f1ddc1f9fa 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3065,7 +3065,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque) if (ret < 0) { qemu_file_set_error(f, ret); } -ram_control_after_iterate(f, RAM_CONTROL_SETUP); + +ret = qemu_rdma_registration_stop(f, RAM_CONTROL_SETUP); +if (ret < 0) { +qemu_file_set_error(f, ret); +} migration_ops = g_malloc0(sizeof(MigrationOps)); migration_ops->ram_save_target_page = ram_save_target_page_legacy; @@ -3187,7 +3191,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) * Must occur before EOS (or any QEMUFile operation) * because of RDMA protocol. */ -ram_control_after_iterate(f, RAM_CONTROL_ROUND); +ret = qemu_rdma_registration_stop(f, RAM_CONTROL_ROUND); +if (ret < 0) { +qemu_file_set_error(f, ret); +} out: if (ret >= 0 @@ -3260,7 +3267,11 @@ static int ram_save_complete(QEMUFile *f, void *opaque) qemu_mutex_unlock(>bitmap_mutex); ram_flush_compressed_data(rs); -ram_control_after_iterate(f, RAM_CONTROL_FINISH); + +int ret = qemu_rdma_registration_stop(f, RAM_CONTROL_FINISH); +if (ret < 0) { +qemu_file_set_error(f, ret); +} } if (ret < 0) { diff --git a/migration/rdma.c b/migration/rdma.c index 3d74ad6db0..4b32d375ec 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3878,20 +3878,20 @@ int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags) * Inform dest that dynamic registrations are done for now. * First, flush writes, if any. */ -static int qemu_rdma_registration_stop(QEMUFile *f, - uint64_t flags, void *data) +int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags) { -QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f)); +QIOChannelRDMA *rioc; Error *err = NULL; RDMAContext *rdma; RDMAControlHeader head = { .len = 0, .repeat = 1 }; int ret; -if (migration_in_postcopy()) { +if (!migrate_rdma() || migration_in_postcopy()) { return 0; } RCU_READ_LOCK_GUARD(); +rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f)); rdma = qatomic_rcu_read(>rdmaout); if (!rdma) { return -1; @@ -3999,7 +3999,6 @@ static const QEMUFileHooks rdma_read_hooks = { }; static const
[PULL 38/38] migration/multifd: Clarify Error usage in multifd_channel_connect
From: Fabiano Rosas The function is currently called from two sites, one always gives it a NULL Error and the other always gives it a non-NULL Error. In the non-NULL case, all it does it trace the error and return. One of the callers already have tracing, add a tracepoint to the other and stop passing the error into the function. Cc: Markus Armbruster Signed-off-by: Fabiano Rosas Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231012134343.23757-4-faro...@suse.de> --- migration/multifd.c| 60 -- migration/trace-events | 3 ++- 2 files changed, 30 insertions(+), 33 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index c92955de41..1fe53d3b98 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -775,7 +775,7 @@ out: static bool multifd_channel_connect(MultiFDSendParams *p, QIOChannel *ioc, -Error *error); +Error **errp); static void multifd_tls_outgoing_handshake(QIOTask *task, gpointer opaque) @@ -784,21 +784,22 @@ static void multifd_tls_outgoing_handshake(QIOTask *task, QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task)); Error *err = NULL; -if (qio_task_propagate_error(task, )) { -trace_multifd_tls_outgoing_handshake_error(ioc, error_get_pretty(err)); -} else { +if (!qio_task_propagate_error(task, )) { trace_multifd_tls_outgoing_handshake_complete(ioc); +if (multifd_channel_connect(p, ioc, )) { +return; +} } -if (!multifd_channel_connect(p, ioc, err)) { -/* - * Error happen, mark multifd_send_thread status as 'quit' although it - * is not created, and then tell who pay attention to me. - */ -p->quit = true; -qemu_sem_post(_send_state->channels_ready); -qemu_sem_post(>sem_sync); -} +trace_multifd_tls_outgoing_handshake_error(ioc, error_get_pretty(err)); + +/* + * Error happen, mark multifd_send_thread status as 'quit' although it + * is not created, and then tell who pay attention to me. + */ +p->quit = true; +qemu_sem_post(_send_state->channels_ready); +qemu_sem_post(>sem_sync); } static void *multifd_tls_handshake_thread(void *opaque) @@ -814,7 +815,7 @@ static void *multifd_tls_handshake_thread(void *opaque) return NULL; } -static void multifd_tls_channel_connect(MultiFDSendParams *p, +static bool multifd_tls_channel_connect(MultiFDSendParams *p, QIOChannel *ioc, Error **errp) { @@ -824,7 +825,7 @@ static void multifd_tls_channel_connect(MultiFDSendParams *p, tioc = migration_tls_client_create(ioc, hostname, errp); if (!tioc) { -return; +return false; } object_unref(OBJECT(ioc)); @@ -834,31 +835,25 @@ static void multifd_tls_channel_connect(MultiFDSendParams *p, qemu_thread_create(>thread, "multifd-tls-handshake-worker", multifd_tls_handshake_thread, p, QEMU_THREAD_JOINABLE); +return true; } static bool multifd_channel_connect(MultiFDSendParams *p, QIOChannel *ioc, -Error *error) +Error **errp) { trace_multifd_set_outgoing_channel( ioc, object_get_typename(OBJECT(ioc)), -migrate_get_current()->hostname, error); +migrate_get_current()->hostname); -if (error) { -return false; -} if (migrate_channel_requires_tls_upgrade(ioc)) { -multifd_tls_channel_connect(p, ioc, ); -if (!error) { -/* - * tls_channel_connect will call back to this - * function after the TLS handshake, - * so we mustn't call multifd_send_thread until then - */ -return true; -} else { -return false; -} +/* + * tls_channel_connect will call back to this + * function after the TLS handshake, + * so we mustn't call multifd_send_thread until then + */ +return multifd_tls_channel_connect(p, ioc, errp); + } else { migration_ioc_register_yank(ioc); p->registered_yank = true; @@ -897,11 +892,12 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque) p->c = ioc; qio_channel_set_delay(p->c, false); p->running = true; -if (multifd_channel_connect(p, ioc, local_err)) { +if (multifd_channel_connect(p, ioc, _err)) { return; } } +trace_multifd_new_send_channel_async_error(p->id, local_err); multifd_new_send_channel_cleanup(p, ioc, local_err);
[PULL 07/38] migration: Add capability parsing to analyze-migration.py
From: Fabiano Rosas The script is broken when the configuration/capabilities section is present. Add support for parsing the capabilities so we can fix it in the next patch. Signed-off-by: Fabiano Rosas Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231009184326.15777-4-faro...@suse.de> --- scripts/analyze-migration.py | 38 1 file changed, 38 insertions(+) diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py index 24687db497..c700fed64d 100755 --- a/scripts/analyze-migration.py +++ b/scripts/analyze-migration.py @@ -264,6 +264,24 @@ class ConfigurationSection(object): def __init__(self, file, desc): self.file = file self.desc = desc +self.caps = [] + +def parse_capabilities(self, vmsd_caps): +if not vmsd_caps: +return + +ncaps = vmsd_caps.data['caps_count'].data +self.caps = vmsd_caps.data['capabilities'] + +if type(self.caps) != list: +self.caps = [self.caps] + +if len(self.caps) != ncaps: +raise Exception("Number of capabilities doesn't match " +"caps_count field") + +def has_capability(self, cap): +return any([str(c) == cap for c in self.caps]) def read(self): if self.desc: @@ -271,6 +289,8 @@ def read(self): section = VMSDSection(self.file, version_id, self.desc, 'configuration') section.read() +self.parse_capabilities( +section.data.get("configuration/capabilities")) else: # backward compatibility for older streams that don't have # the configuration section in the json @@ -297,6 +317,23 @@ def read(self): self.data = self.file.readvar(size) return self.data +class VMSDFieldCap(object): +def __init__(self, desc, file): +self.file = file +self.desc = desc +self.data = "" + +def __repr__(self): +return self.data + +def __str__(self): +return self.data + +def read(self): +len = self.file.read8() +self.data = self.file.readstr(len) + + class VMSDFieldInt(VMSDFieldGeneric): def __init__(self, desc, file): super(VMSDFieldInt, self).__init__(desc, file) @@ -471,6 +508,7 @@ def getDict(self): "unused_buffer" : VMSDFieldGeneric, "bitmap" : VMSDFieldGeneric, "struct" : VMSDFieldStruct, +"capability": VMSDFieldCap, "unknown" : VMSDFieldGeneric, } -- 2.41.0
[PULL 01/38] migration: refactor migration_completion
From: Wei Wang Current migration_completion function is a bit long. Refactor the long implementation into different subfunctions: - migration_completion_precopy: completion code related to precopy - migration_completion_postcopy: completion code related to postcopy Rename await_return_path_close_on_source to close_return_path_on_source: It is renamed to match with open_return_path_on_source. This improves readability and is easier for future updates (e.g. add new subfunctions when completion code related to new features are needed). No functional changes intended. Signed-off-by: Wei Wang Reviewed-by: Peter Xu Reviewed-by: Isaku Yamahata Reviewed-by: Juan Quintela Message-ID: <20230804093053.5037-1-wei.w.w...@intel.com> Signed-off-by: Juan Quintela --- migration/migration.c | 167 -- 1 file changed, 94 insertions(+), 73 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 1c6c81ad49..0e1002d017 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -99,7 +99,7 @@ static int migration_maybe_pause(MigrationState *s, int *current_active_state, int new_state); static void migrate_fd_cancel(MigrationState *s); -static int await_return_path_close_on_source(MigrationState *s); +static int close_return_path_on_source(MigrationState *s); static bool migration_needs_multiple_sockets(void) { @@ -1191,7 +1191,7 @@ static void migrate_fd_cleanup(MigrationState *s) * We already cleaned up to_dst_file, so errors from the return * path might be due to that, ignore them. */ -await_return_path_close_on_source(s); +close_return_path_on_source(s); assert(!migration_is_active(s)); @@ -2049,8 +2049,7 @@ static int open_return_path_on_source(MigrationState *ms) return 0; } -/* Returns 0 if the RP was ok, otherwise there was an error on the RP */ -static int await_return_path_close_on_source(MigrationState *ms) +static int close_return_path_on_source(MigrationState *ms) { int ret; @@ -2317,6 +2316,87 @@ static int migration_maybe_pause(MigrationState *s, return s->state == new_state ? 0 : -EINVAL; } +static int migration_completion_precopy(MigrationState *s, +int *current_active_state) +{ +int ret; + +qemu_mutex_lock_iothread(); +s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); +qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); + +s->vm_old_state = runstate_get(); +global_state_store(); + +ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE); +trace_migration_completion_vm_stop(ret); +if (ret < 0) { +goto out_unlock; +} + +ret = migration_maybe_pause(s, current_active_state, +MIGRATION_STATUS_DEVICE); +if (ret < 0) { +goto out_unlock; +} + +/* + * Inactivate disks except in COLO, and track that we have done so in order + * to remember to reactivate them if migration fails or is cancelled. + */ +s->block_inactive = !migrate_colo(); +migration_rate_set(RATE_LIMIT_DISABLED); +ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false, + s->block_inactive); +out_unlock: +qemu_mutex_unlock_iothread(); +return ret; +} + +static void migration_completion_postcopy(MigrationState *s) +{ +trace_migration_completion_postcopy_end(); + +qemu_mutex_lock_iothread(); +qemu_savevm_state_complete_postcopy(s->to_dst_file); +qemu_mutex_unlock_iothread(); + +/* + * Shutdown the postcopy fast path thread. This is only needed when dest + * QEMU binary is old (7.1/7.2). QEMU 8.0+ doesn't need this. + */ +if (migrate_postcopy_preempt() && s->preempt_pre_7_2) { +postcopy_preempt_shutdown_file(s); +} + +trace_migration_completion_postcopy_end_after_complete(); +} + +static void migration_completion_failed(MigrationState *s, +int current_active_state) +{ +if (s->block_inactive && (s->state == MIGRATION_STATUS_ACTIVE || + s->state == MIGRATION_STATUS_DEVICE)) { +/* + * If not doing postcopy, vm_start() will be called: let's + * regain control on images. + */ +Error *local_err = NULL; + +qemu_mutex_lock_iothread(); +bdrv_activate_all(_err); +if (local_err) { +error_report_err(local_err); +} else { +s->block_inactive = false; +} +qemu_mutex_unlock_iothread(); +} + +migrate_set_state(>state, current_active_state, + MIGRATION_STATUS_FAILED); +} + /** * migration_completion: Used by migration_thread when there's not much left. * The caller 'breaks' the loop when this returns. @@ -2325,62 +2405,22 @@ static int
[PULL 21/38] migration/rdma: Move rdma constants from qemu-file.h to rdma.h
Reviewed-by: Peter Xu Reviewed-by: Li Zhijian Signed-off-by: Juan Quintela Message-ID: <20231011203527.9061-9-quint...@redhat.com> --- migration/qemu-file.h | 17 - migration/rdma.h | 16 migration/ram.c | 2 +- 3 files changed, 17 insertions(+), 18 deletions(-) diff --git a/migration/qemu-file.h b/migration/qemu-file.h index 0b22d8335f..a29c37b0d0 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -29,13 +29,6 @@ #include "exec/cpu-common.h" #include "io/channel.h" -/* - * Constants used by ram_control_* hooks - */ -#define RAM_CONTROL_SETUP 0 -#define RAM_CONTROL_ROUND 1 -#define RAM_CONTROL_FINISH3 - QEMUFile *qemu_file_new_input(QIOChannel *ioc); QEMUFile *qemu_file_new_output(QIOChannel *ioc); int qemu_fclose(QEMUFile *f); @@ -101,16 +94,6 @@ void qemu_fflush(QEMUFile *f); void qemu_file_set_blocking(QEMUFile *f, bool block); int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size); -/* Whenever this is found in the data stream, the flags - * will be passed to ram_control_load_hook in the incoming-migration - * side. This lets before_ram_iterate/after_ram_iterate add - * transport-specific sections to the RAM migration data. - */ -#define RAM_SAVE_FLAG_HOOK 0x80 - -#define RAM_SAVE_CONTROL_NOT_SUPP -1000 -#define RAM_SAVE_CONTROL_DELAYED -2000 - QIOChannel *qemu_file_get_ioc(QEMUFile *file); #endif diff --git a/migration/rdma.h b/migration/rdma.h index 09a16c1e3c..1ff3718a76 100644 --- a/migration/rdma.h +++ b/migration/rdma.h @@ -24,6 +24,22 @@ void rdma_start_outgoing_migration(void *opaque, const char *host_port, void rdma_start_incoming_migration(const char *host_port, Error **errp); +/* + * Constants used by rdma return codes + */ +#define RAM_CONTROL_SETUP 0 +#define RAM_CONTROL_ROUND 1 +#define RAM_CONTROL_FINISH3 + +/* + * Whenever this is found in the data stream, the flags + * will be passed to rdma functions in the incoming-migration + * side. + */ +#define RAM_SAVE_FLAG_HOOK 0x80 + +#define RAM_SAVE_CONTROL_NOT_SUPP -1000 +#define RAM_SAVE_CONTROL_DELAYED -2000 #ifdef CONFIG_RDMA int qemu_rdma_registration_handle(QEMUFile *f); diff --git a/migration/ram.c b/migration/ram.c index 3b4b09f6ff..6a4aed2a75 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -89,7 +89,7 @@ #define RAM_SAVE_FLAG_EOS 0x10 #define RAM_SAVE_FLAG_CONTINUE 0x20 #define RAM_SAVE_FLAG_XBZRLE 0x40 -/* 0x80 is reserved in qemu-file.h for RAM_SAVE_FLAG_HOOK */ +/* 0x80 is reserved in rdma.h for RAM_SAVE_FLAG_HOOK */ #define RAM_SAVE_FLAG_COMPRESS_PAGE0x100 #define RAM_SAVE_FLAG_MULTIFD_FLUSH0x200 /* We can't use any flag that is bigger than 0x200 */ -- 2.41.0
[PULL 22/38] migration/rdma: Remove qemu_ prefix from exported functions
Functions are long enough even without this. Reviewed-by: Peter Xu Reviewed-by: Li Zhijian Signed-off-by: Juan Quintela Message-ID: <20231011203527.9061-10-quint...@redhat.com> --- migration/rdma.h | 12 ++-- migration/ram.c| 14 +++--- migration/rdma.c | 40 +++- migration/trace-events | 28 ++-- 4 files changed, 46 insertions(+), 48 deletions(-) diff --git a/migration/rdma.h b/migration/rdma.h index 1ff3718a76..30b15b4466 100644 --- a/migration/rdma.h +++ b/migration/rdma.h @@ -42,19 +42,19 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp); #define RAM_SAVE_CONTROL_DELAYED -2000 #ifdef CONFIG_RDMA -int qemu_rdma_registration_handle(QEMUFile *f); -int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags); -int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags); +int rdma_registration_handle(QEMUFile *f); +int rdma_registration_start(QEMUFile *f, uint64_t flags); +int rdma_registration_stop(QEMUFile *f, uint64_t flags); int rdma_block_notification_handle(QEMUFile *f, const char *name); int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset, ram_addr_t offset, size_t size); #else static inline -int qemu_rdma_registration_handle(QEMUFile *f) { return 0; } +int rdma_registration_handle(QEMUFile *f) { return 0; } static inline -int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags) { return 0; } +int rdma_registration_start(QEMUFile *f, uint64_t flags) { return 0; } static inline -int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags) { return 0; } +int rdma_registration_stop(QEMUFile *f, uint64_t flags) { return 0; } static inline int rdma_block_notification_handle(QEMUFile *f, const char *name) { return 0; } static inline diff --git a/migration/ram.c b/migration/ram.c index 6a4aed2a75..a9bc6ae1f1 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3061,12 +3061,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque) } } -ret = qemu_rdma_registration_start(f, RAM_CONTROL_SETUP); +ret = rdma_registration_start(f, RAM_CONTROL_SETUP); if (ret < 0) { qemu_file_set_error(f, ret); } -ret = qemu_rdma_registration_stop(f, RAM_CONTROL_SETUP); +ret = rdma_registration_stop(f, RAM_CONTROL_SETUP); if (ret < 0) { qemu_file_set_error(f, ret); } @@ -3131,7 +3131,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) /* Read version before ram_list.blocks */ smp_rmb(); -ret = qemu_rdma_registration_start(f, RAM_CONTROL_ROUND); +ret = rdma_registration_start(f, RAM_CONTROL_ROUND); if (ret < 0) { qemu_file_set_error(f, ret); } @@ -3191,7 +3191,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) * Must occur before EOS (or any QEMUFile operation) * because of RDMA protocol. */ -ret = qemu_rdma_registration_stop(f, RAM_CONTROL_ROUND); +ret = rdma_registration_stop(f, RAM_CONTROL_ROUND); if (ret < 0) { qemu_file_set_error(f, ret); } @@ -3242,7 +3242,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque) migration_bitmap_sync_precopy(rs, true); } -ret = qemu_rdma_registration_start(f, RAM_CONTROL_FINISH); +ret = rdma_registration_start(f, RAM_CONTROL_FINISH); if (ret < 0) { qemu_file_set_error(f, ret); } @@ -3268,7 +3268,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque) ram_flush_compressed_data(rs); -int ret = qemu_rdma_registration_stop(f, RAM_CONTROL_FINISH); +int ret = rdma_registration_stop(f, RAM_CONTROL_FINISH); if (ret < 0) { qemu_file_set_error(f, ret); } @@ -4077,7 +4077,7 @@ static int ram_load_precopy(QEMUFile *f) } break; case RAM_SAVE_FLAG_HOOK: -ret = qemu_rdma_registration_handle(f); +ret = rdma_registration_handle(f); if (ret < 0) { qemu_file_set_error(f, ret); } diff --git a/migration/rdma.c b/migration/rdma.c index 9883b0a250..c147c94b08 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3540,7 +3540,7 @@ static int dest_ram_sort_func(const void *a, const void *b) * * Keep doing this until the source tells us to stop. */ -int qemu_rdma_registration_handle(QEMUFile *f) +int rdma_registration_handle(QEMUFile *f) { RDMAControlHeader reg_resp = { .len = sizeof(RDMARegisterResult), .type = RDMA_CONTROL_REGISTER_RESULT, @@ -3586,7 +3586,7 @@ int qemu_rdma_registration_handle(QEMUFile *f) local = >local_ram_blocks; do { -trace_qemu_rdma_registration_handle_wait(); +trace_rdma_registration_handle_wait(); ret = qemu_rdma_exchange_recv(rdma, , RDMA_CONTROL_NONE, ); @@ -3606,9 +3606,9 @@
[PULL 10/38] tests/qtest/migration: Add a test for the analyze-migration script
From: Fabiano Rosas Add a smoke test that migrates to a file and gives it to the script. It should catch the most annoying errors such as changes in the ram flags. After code has been merged it becomes way harder to figure out what is causing the script to fail, the person making the change is the most likely to know right away what the problem is. Signed-off-by: Fabiano Rosas Acked-by: Thomas Huth Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231009184326.15777-7-faro...@suse.de> --- tests/qtest/migration-test.c | 60 tests/qtest/meson.build | 2 ++ 2 files changed, 62 insertions(+) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index 8eb2053dbb..cef5081f8c 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -66,6 +66,8 @@ static bool got_dst_resume; */ #define DIRTYLIMIT_TOLERANCE_RANGE 25 /* MB/s */ +#define ANALYZE_SCRIPT "scripts/analyze-migration.py" + #if defined(__linux__) #include #include @@ -1501,6 +1503,61 @@ static void test_baddest(void) test_migrate_end(from, to, false); } +#ifndef _WIN32 +static void test_analyze_script(void) +{ +MigrateStart args = { +.opts_source = "-uuid ----", +}; +QTestState *from, *to; +g_autofree char *uri = NULL; +g_autofree char *file = NULL; +int pid, wstatus; +const char *python = g_getenv("PYTHON"); + +if (!python) { +g_test_skip("PYTHON variable not set"); +return; +} + +/* dummy url */ +if (test_migrate_start(, , "tcp:127.0.0.1:0", )) { +return; +} + +/* + * Setting these two capabilities causes the "configuration" + * vmstate to include subsections for them. The script needs to + * parse those subsections properly. + */ +migrate_set_capability(from, "validate-uuid", true); +migrate_set_capability(from, "x-ignore-shared", true); + +file = g_strdup_printf("%s/migfile", tmpfs); +uri = g_strdup_printf("exec:cat > %s", file); + +migrate_ensure_converge(from); +migrate_qmp(from, uri, "{}"); +wait_for_migration_complete(from); + +pid = fork(); +if (!pid) { +close(1); +open("/dev/null", O_WRONLY); +execl(python, python, ANALYZE_SCRIPT, "-f", file, NULL); +g_assert_not_reached(); +} + +g_assert(waitpid(pid, , 0) == pid); +if (WIFEXITED(wstatus) && WEXITSTATUS(wstatus) != 0) { +g_test_message("Failed to analyze the migration stream"); +g_test_fail(); +} +test_migrate_end(from, to, false); +cleanup("migfile"); +} +#endif + static void test_precopy_common(MigrateCommon *args) { QTestState *from, *to; @@ -2837,6 +2894,9 @@ int main(int argc, char **argv) } qtest_add_func("/migration/bad_dest", test_baddest); +#ifndef _WIN32 +qtest_add_func("/migration/analyze-script", test_analyze_script); +#endif qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain); qtest_add_func("/migration/precopy/unix/xbzrle", test_precopy_unix_xbzrle); /* diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build index 66795cfcd2..d6022ebd64 100644 --- a/tests/qtest/meson.build +++ b/tests/qtest/meson.build @@ -357,6 +357,8 @@ foreach dir : target_dirs test_deps += [qsd] endif + qtest_env.set('PYTHON', python.full_path()) + foreach test : target_qtests # Executables are shared across targets, declare them only the first time we # encounter them -- 2.41.0
[PULL 24/38] migration/rdma: Use i as for index instead of idx
Once there, all the uses are local to the for, so declare the variable inside the for statement. Reviewed-by: Fabiano Rosas Reviewed-by: Li Zhijian Signed-off-by: Juan Quintela Message-ID: <20231011203527.9061-12-quint...@redhat.com> --- migration/rdma.c | 49 ++-- 1 file changed, 22 insertions(+), 27 deletions(-) diff --git a/migration/rdma.c b/migration/rdma.c index e973579a52..5f6a771e8e 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -2354,7 +2354,6 @@ static int qemu_rdma_write(RDMAContext *rdma, static void qemu_rdma_cleanup(RDMAContext *rdma) { Error *err = NULL; -int idx; if (rdma->cm_id && rdma->connected) { if ((rdma->errored || @@ -2381,12 +2380,12 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) g_free(rdma->dest_blocks); rdma->dest_blocks = NULL; -for (idx = 0; idx < RDMA_WRID_MAX; idx++) { -if (rdma->wr_data[idx].control_mr) { +for (int i = 0; i < RDMA_WRID_MAX; i++) { +if (rdma->wr_data[i].control_mr) { rdma->total_registrations--; -ibv_dereg_mr(rdma->wr_data[idx].control_mr); +ibv_dereg_mr(rdma->wr_data[i].control_mr); } -rdma->wr_data[idx].control_mr = NULL; +rdma->wr_data[i].control_mr = NULL; } if (rdma->local_ram_blocks.block) { @@ -2452,7 +2451,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp) { -int ret, idx; +int ret; /* * Will be validated against destination's actual capabilities @@ -2480,18 +2479,17 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp) /* Build the hash that maps from offset to RAMBlock */ rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal); -for (idx = 0; idx < rdma->local_ram_blocks.nb_blocks; idx++) { +for (int i = 0; i < rdma->local_ram_blocks.nb_blocks; i++) { g_hash_table_insert(rdma->blockmap, -(void *)(uintptr_t)rdma->local_ram_blocks.block[idx].offset, ->local_ram_blocks.block[idx]); +(void *)(uintptr_t)rdma->local_ram_blocks.block[i].offset, +>local_ram_blocks.block[i]); } -for (idx = 0; idx < RDMA_WRID_MAX; idx++) { -ret = qemu_rdma_reg_control(rdma, idx); +for (int i = 0; i < RDMA_WRID_MAX; i++) { +ret = qemu_rdma_reg_control(rdma, i); if (ret < 0) { -error_setg(errp, - "RDMA ERROR: rdma migration: error registering %d control!", - idx); +error_setg(errp, "RDMA ERROR: rdma migration: error " + "registering %d control!", i); goto err_rdma_source_init; } } @@ -2625,16 +2623,16 @@ err_rdma_source_connect: static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp) { Error *err = NULL; -int ret, idx; +int ret; struct rdma_cm_id *listen_id; char ip[40] = "unknown"; struct rdma_addrinfo *res, *e; char port_str[16]; int reuse = 1; -for (idx = 0; idx < RDMA_WRID_MAX; idx++) { -rdma->wr_data[idx].control_len = 0; -rdma->wr_data[idx].control_curr = NULL; +for (int i = 0; i < RDMA_WRID_MAX; i++) { +rdma->wr_data[i].control_len = 0; +rdma->wr_data[i].control_curr = NULL; } if (!rdma->host || !rdma->host[0]) { @@ -2723,11 +2721,9 @@ err_dest_init_create_listen_id: static void qemu_rdma_return_path_dest_init(RDMAContext *rdma_return_path, RDMAContext *rdma) { -int idx; - -for (idx = 0; idx < RDMA_WRID_MAX; idx++) { -rdma_return_path->wr_data[idx].control_len = 0; -rdma_return_path->wr_data[idx].control_curr = NULL; +for (int i = 0; i < RDMA_WRID_MAX; i++) { +rdma_return_path->wr_data[i].control_len = 0; +rdma_return_path->wr_data[i].control_curr = NULL; } /*the CM channel and CM id is shared*/ @@ -3376,7 +3372,6 @@ static int qemu_rdma_accept(RDMAContext *rdma) struct rdma_cm_event *cm_event; struct ibv_context *verbs; int ret; -int idx; ret = rdma_get_cm_event(rdma->channel, _event); if (ret < 0) { @@ -3462,10 +3457,10 @@ static int qemu_rdma_accept(RDMAContext *rdma) qemu_rdma_init_ram_blocks(rdma); -for (idx = 0; idx < RDMA_WRID_MAX; idx++) { -ret = qemu_rdma_reg_control(rdma, idx); +for (int i = 0; i < RDMA_WRID_MAX; i++) { +ret = qemu_rdma_reg_control(rdma, i); if (ret < 0) { -error_report("rdma: error registering %d control", idx); +error_report("rdma: error registering %d control", i); goto err_rdma_dest_wait; } } -- 2.41.0
[PULL 18/38] migration/rdma: Unfold hook_ram_load()
There is only one flag called with: RAM_CONTROL_BLOCK_REG. Reviewed-by: Li Zhijian Signed-off-by: Juan Quintela Message-ID: <20231011203527.9061-6-quint...@redhat.com> --- migration/qemu-file.h | 11 --- migration/rdma.h | 3 +++ migration/qemu-file.c | 10 -- migration/ram.c | 6 -- migration/rdma.c | 34 +++--- 5 files changed, 18 insertions(+), 46 deletions(-) diff --git a/migration/qemu-file.h b/migration/qemu-file.h index 14ff0d9cc4..80c30631dc 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -29,20 +29,12 @@ #include "exec/cpu-common.h" #include "io/channel.h" -/* - * This function provides hooks around different - * stages of RAM migration. - * 'data' is call specific data associated with the 'flags' value - */ -typedef int (QEMURamHookFunc)(QEMUFile *f, uint64_t flags, void *data); - /* * Constants used by ram_control_* hooks */ #define RAM_CONTROL_SETUP 0 #define RAM_CONTROL_ROUND 1 #define RAM_CONTROL_FINISH3 -#define RAM_CONTROL_BLOCK_REG 4 /* * This function allows override of where the RAM page @@ -54,7 +46,6 @@ typedef int (QEMURamSaveFunc)(QEMUFile *f, size_t size); typedef struct QEMUFileHooks { -QEMURamHookFunc *hook_ram_load; QEMURamSaveFunc *save_page; } QEMUFileHooks; @@ -124,8 +115,6 @@ void qemu_fflush(QEMUFile *f); void qemu_file_set_blocking(QEMUFile *f, bool block); int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size); -void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data); - /* Whenever this is found in the data stream, the flags * will be passed to ram_control_load_hook in the incoming-migration * side. This lets before_ram_iterate/after_ram_iterate add diff --git a/migration/rdma.h b/migration/rdma.h index 8bd277efb9..8df8b4089a 100644 --- a/migration/rdma.h +++ b/migration/rdma.h @@ -27,6 +27,7 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp); int qemu_rdma_registration_handle(QEMUFile *f); int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags); int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags); +int rdma_block_notification_handle(QEMUFile *f, const char *name); #else static inline int qemu_rdma_registration_handle(QEMUFile *f) { return 0; } @@ -34,5 +35,7 @@ static inline int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags) { return 0; } static inline int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags) { return 0; } +static inline +int rdma_block_notification_handle(QEMUFile *f, const char *name) { return 0; } #endif #endif diff --git a/migration/qemu-file.c b/migration/qemu-file.c index e7dba2a849..4a414b8976 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -298,16 +298,6 @@ void qemu_fflush(QEMUFile *f) f->iovcnt = 0; } -void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data) -{ -if (f->hooks && f->hooks->hook_ram_load) { -int ret = f->hooks->hook_ram_load(f, flags, data); -if (ret < 0) { -qemu_file_set_error(f, ret); -} -} -} - int ram_control_save_page(QEMUFile *f, ram_addr_t block_offset, ram_addr_t offset, size_t size) { diff --git a/migration/ram.c b/migration/ram.c index f6ea1831b5..8c462276cd 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4025,8 +4025,10 @@ static int ram_load_precopy(QEMUFile *f) ret = -EINVAL; } } -ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG, - block->idstr); +ret = rdma_block_notification_handle(f, block->idstr); +if (ret < 0) { +qemu_file_set_error(f, ret); +} } else { error_report("Unknown ramblock \"%s\", cannot " "accept migration", id); diff --git a/migration/rdma.c b/migration/rdma.c index 5c20f425a9..0b1cb03b2b 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3799,22 +3799,23 @@ err: } /* Destination: - * Called via a ram_control_load_hook during the initial RAM load section which - * lists the RAMBlocks by name. This lets us know the order of the RAMBlocks - * on the source. - * We've already built our local RAMBlock list, but not yet sent the list to - * the source. + * Called during the initial RAM load section which lists the + * RAMBlocks by name. This lets us know the order of the RAMBlocks on + * the source. We've already built our local RAMBlock list, but not + * yet sent the list to the source. */ -static int -rdma_block_notification_handle(QEMUFile *f, const char *name) +int rdma_block_notification_handle(QEMUFile *f, const char *name) { -RDMAContext *rdma; -QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f)); int curr;
[PULL 15/38] migration/rdma: Unfold ram_control_before_iterate()
Once there: - Remove unused data parameter - unfold it in its callers. - change all callers to call qemu_rdma_registration_start() - We need to call QIO_CHANNEL_RDMA() after we check for migrate_rdma() Reviewed-by: Li Zhijian Reviewed-by: Fabiano Rosas Signed-off-by: Juan Quintela Message-ID: <20231011203527.9061-3-quint...@redhat.com> --- migration/qemu-file.h | 2 -- migration/rdma.h | 7 +++ migration/qemu-file.c | 13 + migration/ram.c | 16 +--- migration/rdma.c | 12 5 files changed, 25 insertions(+), 25 deletions(-) diff --git a/migration/qemu-file.h b/migration/qemu-file.h index 03e718c264..d6a370c569 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -55,7 +55,6 @@ typedef int (QEMURamSaveFunc)(QEMUFile *f, size_t size); typedef struct QEMUFileHooks { -QEMURamHookFunc *before_ram_iterate; QEMURamHookFunc *after_ram_iterate; QEMURamHookFunc *hook_ram_load; QEMURamSaveFunc *save_page; @@ -127,7 +126,6 @@ void qemu_fflush(QEMUFile *f); void qemu_file_set_blocking(QEMUFile *f, bool block); int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size); -void ram_control_before_iterate(QEMUFile *f, uint64_t flags); void ram_control_after_iterate(QEMUFile *f, uint64_t flags); void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data); diff --git a/migration/rdma.h b/migration/rdma.h index de2ba09dc5..670c67a8cb 100644 --- a/migration/rdma.h +++ b/migration/rdma.h @@ -22,4 +22,11 @@ void rdma_start_outgoing_migration(void *opaque, const char *host_port, void rdma_start_incoming_migration(const char *host_port, Error **errp); + +#ifdef CONFIG_RDMA +int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags); +#else +static inline +int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags) { return 0; } +#endif #endif diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 7fb659296f..5e2d73fd68 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -32,6 +32,7 @@ #include "trace.h" #include "options.h" #include "qapi/error.h" +#include "rdma.h" #define IO_BUF_SIZE 32768 #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64) @@ -297,18 +298,6 @@ void qemu_fflush(QEMUFile *f) f->iovcnt = 0; } -void ram_control_before_iterate(QEMUFile *f, uint64_t flags) -{ -int ret = 0; - -if (f->hooks && f->hooks->before_ram_iterate) { -ret = f->hooks->before_ram_iterate(f, flags, NULL); -if (ret < 0) { -qemu_file_set_error(f, ret); -} -} -} - void ram_control_after_iterate(QEMUFile *f, uint64_t flags) { int ret = 0; diff --git a/migration/ram.c b/migration/ram.c index acb8f95f00..6592431a4e 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -59,6 +59,7 @@ #include "qemu/iov.h" #include "multifd.h" #include "sysemu/runstate.h" +#include "rdma.h" #include "options.h" #include "sysemu/dirtylimit.h" #include "sysemu/kvm.h" @@ -3060,7 +3061,10 @@ static int ram_save_setup(QEMUFile *f, void *opaque) } } -ram_control_before_iterate(f, RAM_CONTROL_SETUP); +ret = qemu_rdma_registration_start(f, RAM_CONTROL_SETUP); +if (ret < 0) { +qemu_file_set_error(f, ret); +} ram_control_after_iterate(f, RAM_CONTROL_SETUP); migration_ops = g_malloc0(sizeof(MigrationOps)); @@ -3123,7 +3127,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) /* Read version before ram_list.blocks */ smp_rmb(); -ram_control_before_iterate(f, RAM_CONTROL_ROUND); +ret = qemu_rdma_registration_start(f, RAM_CONTROL_ROUND); +if (ret < 0) { +qemu_file_set_error(f, ret); +} t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); i = 0; @@ -3228,7 +3235,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque) migration_bitmap_sync_precopy(rs, true); } -ram_control_before_iterate(f, RAM_CONTROL_FINISH); +ret = qemu_rdma_registration_start(f, RAM_CONTROL_FINISH); +if (ret < 0) { +qemu_file_set_error(f, ret); +} /* try transferring iterative blocks of memory */ diff --git a/migration/rdma.c b/migration/rdma.c index f155f3e1c8..3d74ad6db0 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3850,18 +3850,15 @@ static int rdma_load_hook(QEMUFile *f, uint64_t flags, void *data) } } -static int qemu_rdma_registration_start(QEMUFile *f, -uint64_t flags, void *data) +int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags) { -QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f)); -RDMAContext *rdma; - -if (migration_in_postcopy()) { +if (!migrate_rdma() || migration_in_postcopy()) { return 0; } +QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f)); RCU_READ_LOCK_GUARD(); -rdma =
[PULL 03/38] migration: Allow user to specify available switchover bandwidth
From: Peter Xu Migration bandwidth is a very important value to live migration. It's because it's one of the major factors that we'll make decision on when to switchover to destination in a precopy process. This value is currently estimated by QEMU during the whole live migration process by monitoring how fast we were sending the data. This can be the most accurate bandwidth if in the ideal world, where we're always feeding unlimited data to the migration channel, and then it'll be limited to the bandwidth that is available. However in reality it may be very different, e.g., over a 10Gbps network we can see query-migrate showing migration bandwidth of only a few tens of MB/s just because there are plenty of other things the migration thread might be doing. For example, the migration thread can be busy scanning zero pages, or it can be fetching dirty bitmap from other external dirty sources (like vhost or KVM). It means we may not be pushing data as much as possible to migration channel, so the bandwidth estimated from "how many data we sent in the channel" can be dramatically inaccurate sometimes. With that, the decision to switchover will be affected, by assuming that we may not be able to switchover at all with such a low bandwidth, but in reality we can. The migration may not even converge at all with the downtime specified, with that wrong estimation of bandwidth, keeping iterations forever with a low estimation of bandwidth. The issue is QEMU itself may not be able to avoid those uncertainties on measuing the real "available migration bandwidth". At least not something I can think of so far. One way to fix this is when the user is fully aware of the available bandwidth, then we can allow the user to help providing an accurate value. For example, if the user has a dedicated channel of 10Gbps for migration for this specific VM, the user can specify this bandwidth so QEMU can always do the calculation based on this fact, trusting the user as long as specified. It may not be the exact bandwidth when switching over (in which case qemu will push migration data as fast as possible), but much better than QEMU trying to wildly guess, especially when very wrong. A new parameter "avail-switchover-bandwidth" is introduced just for this. So when the user specified this parameter, instead of trusting the estimated value from QEMU itself (based on the QEMUFile send speed), it trusts the user more by using this value to decide when to switchover, assuming that we'll have such bandwidth available then. Note that specifying this value will not throttle the bandwidth for switchover yet, so QEMU will always use the full bandwidth possible for sending switchover data, assuming that should always be the most important way to use the network at that time. This can resolve issues like "unconvergence migration" which is caused by hilarious low "migration bandwidth" detected for whatever reason. Reported-by: Zhiyi Guo Reviewed-by: Joao Martins Reviewed-by: Juan Quintela Signed-off-by: Peter Xu Signed-off-by: Juan Quintela Message-ID: <20231010221922.40638-1-pet...@redhat.com> --- qapi/migration.json| 34 +- migration/migration.h | 2 +- migration/options.h| 1 + migration/migration-hmp-cmds.c | 14 ++ migration/migration.c | 24 +--- migration/options.c| 28 migration/trace-events | 2 +- 7 files changed, 99 insertions(+), 6 deletions(-) diff --git a/qapi/migration.json b/qapi/migration.json index d7dfaa5db9..360e609f66 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -758,6 +758,16 @@ # @max-bandwidth: to set maximum speed for migration. maximum speed # in bytes per second. (Since 2.8) # +# @avail-switchover-bandwidth: to set the available bandwidth that +# migration can use during switchover phase. NOTE! This does not +# limit the bandwidth during switchover, but only for calculations when +# making decisions to switchover. By default, this value is zero, +# which means QEMU will estimate the bandwidth automatically. This can +# be set when the estimated value is not accurate, while the user is +# able to guarantee such bandwidth is available when switching over. +# When specified correctly, this can make the switchover decision much +# more accurate. (Since 8.2) +# # @downtime-limit: set maximum tolerated downtime for migration. # maximum downtime in milliseconds (Since 2.8) # @@ -839,7 +849,7 @@ 'cpu-throttle-initial', 'cpu-throttle-increment', 'cpu-throttle-tailslow', 'tls-creds', 'tls-hostname', 'tls-authz', 'max-bandwidth', - 'downtime-limit', + 'avail-switchover-bandwidth', 'downtime-limit', { 'name': 'x-checkpoint-delay', 'features': [ 'unstable' ] }, 'block-incremental',
[PULL 11/38] tests/qtest: migration-test: Add tests for file-based migration
From: Fabiano Rosas Add basic tests for file-based migration. Note that we cannot use test_precopy_common because that routine expects it to be possible to run the migration live. With the file transport there is no live migration because we must wait for the source to finish writing the migration data to the file before the destination can start reading. Add a new migration function specifically to handle the file migration. Reviewed-by: Peter Xu Reviewed-by: Juan Quintela Signed-off-by: Fabiano Rosas Signed-off-by: Juan Quintela Message-ID: <20230712190742.22294-7-faro...@suse.de> --- tests/qtest/migration-test.c | 147 +++ 1 file changed, 147 insertions(+) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index cef5081f8c..da02b6d692 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -68,6 +68,10 @@ static bool got_dst_resume; #define ANALYZE_SCRIPT "scripts/analyze-migration.py" +#define QEMU_VM_FILE_MAGIC 0x5145564d +#define FILE_TEST_FILENAME "migfile" +#define FILE_TEST_OFFSET 0x1000 + #if defined(__linux__) #include #include @@ -884,6 +888,7 @@ static void test_migrate_end(QTestState *from, QTestState *to, bool test_dest) cleanup("migsocket"); cleanup("src_serial"); cleanup("dest_serial"); +cleanup(FILE_TEST_FILENAME); } #ifdef CONFIG_GNUTLS @@ -1667,6 +1672,70 @@ finish: test_migrate_end(from, to, args->result == MIG_TEST_SUCCEED); } +static void test_file_common(MigrateCommon *args, bool stop_src) +{ +QTestState *from, *to; +void *data_hook = NULL; +g_autofree char *connect_uri = g_strdup(args->connect_uri); + +if (test_migrate_start(, , args->listen_uri, >start)) { +return; +} + +/* + * File migration is never live. We can keep the source VM running + * during migration, but the destination will not be running + * concurrently. + */ +g_assert_false(args->live); + +if (args->start_hook) { +data_hook = args->start_hook(from, to); +} + +migrate_ensure_converge(from); +wait_for_serial("src_serial"); + +if (stop_src) { +qtest_qmp_assert_success(from, "{ 'execute' : 'stop'}"); +if (!got_src_stop) { +qtest_qmp_eventwait(from, "STOP"); +} +} + +if (args->result == MIG_TEST_QMP_ERROR) { +migrate_qmp_fail(from, connect_uri, "{}"); +goto finish; +} + +migrate_qmp(from, connect_uri, "{}"); +wait_for_migration_complete(from); + +/* + * We need to wait for the source to finish before starting the + * destination. + */ +migrate_incoming_qmp(to, connect_uri, "{}"); +wait_for_migration_complete(to); + +if (stop_src) { +qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}"); +} + +if (!got_dst_resume) { +qtest_qmp_eventwait(to, "RESUME"); +} + +wait_for_serial("dest_serial"); + +finish: +if (args->finish_hook) { +args->finish_hook(from, to, data_hook); +} + +test_migrate_end(from, to, args->result == MIG_TEST_SUCCEED); +} + static void test_precopy_unix_plain(void) { g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs); @@ -1862,6 +1931,76 @@ static void test_precopy_unix_compress_nowait(void) test_precopy_common(); } +static void test_precopy_file(void) +{ +g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs, + FILE_TEST_FILENAME); +MigrateCommon args = { +.connect_uri = uri, +.listen_uri = "defer", +}; + +test_file_common(, true); +} + +static void file_offset_finish_hook(QTestState *from, QTestState *to, +void *opaque) +{ +#if defined(__linux__) +g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME); +size_t size = FILE_TEST_OFFSET + sizeof(QEMU_VM_FILE_MAGIC); +uintptr_t *addr, *p; +int fd; + +fd = open(path, O_RDONLY); +g_assert(fd != -1); +addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0); +g_assert(addr != MAP_FAILED); + +/* + * Ensure the skipped offset contains zeros and the migration + * stream starts at the right place. + */ +p = addr; +while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) { +g_assert(*p == 0); +p++; +} +g_assert_cmpint(cpu_to_be32(*p), ==, QEMU_VM_FILE_MAGIC); + +munmap(addr, size); +close(fd); +#endif +} + +static void test_precopy_file_offset(void) +{ +g_autofree char *uri = g_strdup_printf("file:%s/%s,offset=%d", tmpfs, + FILE_TEST_FILENAME, + FILE_TEST_OFFSET); +MigrateCommon args = { +.connect_uri = uri, +.listen_uri = "defer", +.finish_hook = file_offset_finish_hook, +}; + +test_file_common(, false); +} + +static void
[PULL 05/38] migration: Add the configuration vmstate to the json writer
From: Nikolay Borisov Make the migration json writer part of MigrationState struct, allowing the 'configuration' object be serialized to json. This will facilitate the parsing of the 'configuration' object in the next patch that fixes analyze-migration.py for arm. Signed-off-by: Nikolay Borisov Signed-off-by: Fabiano Rosas Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231009184326.15777-2-faro...@suse.de> --- migration/migration.c | 1 + migration/savevm.c| 20 2 files changed, 17 insertions(+), 4 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index ed04ca3b1c..98151b1424 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1442,6 +1442,7 @@ int migrate_init(MigrationState *s, Error **errp) error_free(s->error); s->error = NULL; s->hostname = NULL; +s->vmdesc = NULL; migrate_set_state(>state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP); diff --git a/migration/savevm.c b/migration/savevm.c index 497ce02bd7..bce698b0af 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1217,13 +1217,27 @@ void qemu_savevm_non_migratable_list(strList **reasons) void qemu_savevm_state_header(QEMUFile *f) { +MigrationState *s = migrate_get_current(); + +s->vmdesc = json_writer_new(false); + trace_savevm_state_header(); qemu_put_be32(f, QEMU_VM_FILE_MAGIC); qemu_put_be32(f, QEMU_VM_FILE_VERSION); -if (migrate_get_current()->send_configuration) { +if (s->send_configuration) { qemu_put_byte(f, QEMU_VM_CONFIGURATION); -vmstate_save_state(f, _configuration, _state, 0); + +/* + * This starts the main json object and is paired with the + * json_writer_end_object in + * qemu_savevm_state_complete_precopy_non_iterable + */ +json_writer_start_object(s->vmdesc, NULL); + +json_writer_start_object(s->vmdesc, "configuration"); +vmstate_save_state(f, _configuration, _state, s->vmdesc); +json_writer_end_object(s->vmdesc); } } @@ -1272,8 +1286,6 @@ void qemu_savevm_state_setup(QEMUFile *f) Error *local_err = NULL; int ret; -ms->vmdesc = json_writer_new(false); -json_writer_start_object(ms->vmdesc, NULL); json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size()); json_writer_start_array(ms->vmdesc, "devices"); -- 2.41.0
[PULL 12/38] migration: hold the BQL during setup
From: Fiona Ebner This is intended to be a semantic revert of commit 9b09503752 ("migration: run setup callbacks out of big lock"). There have been so many changes since that commit (e.g. a new setup callback dirty_bitmap_save_setup() that also needs to be adapted now), it's easier to do the revert manually. For snapshots, the bdrv_writev_vmstate() function is used during setup (in QIOChannelBlock backing the QEMUFile), but not holding the BQL while calling it could lead to an assertion failure. To understand how, first note the following: 1. Generated coroutine wrappers for block layer functions spawn the coroutine and use AIO_WAIT_WHILE()/aio_poll() to wait for it. 2. If the host OS switches threads at an inconvenient time, it can happen that a bottom half scheduled for the main thread's AioContext is executed as part of a vCPU thread's aio_poll(). An example leading to the assertion failure is as follows: main thread: 1. A snapshot-save QMP command gets issued. 2. snapshot_save_job_bh() is scheduled. vCPU thread: 3. aio_poll() for the main thread's AioContext is called (e.g. when the guest writes to a pflash device, as part of blk_pwrite which is a generated coroutine wrapper). 4. snapshot_save_job_bh() is executed as part of aio_poll(). 3. qemu_savevm_state() is called. 4. qemu_mutex_unlock_iothread() is called. Now qemu_get_current_aio_context() returns 0x0. 5. bdrv_writev_vmstate() is executed during the usual savevm setup via qemu_fflush(). But this function is a generated coroutine wrapper, so it uses AIO_WAIT_WHILE. There, the assertion assert(qemu_get_current_aio_context() == qemu_get_aio_context()); will fail. To fix it, ensure that the BQL is held during setup. While it would only be needed for snapshots, adapting migration too avoids additional logic for conditional locking/unlocking in the setup callbacks. Writing the header could (in theory) also trigger qemu_fflush() and thus bdrv_writev_vmstate(), so the locked section also covers the qemu_savevm_state_header() call, even for migration for consistency. The section around multifd_send_sync_main() needs to be unlocked to avoid a deadlock. In particular, the multifd_save_setup() function calls socket_send_channel_create() using multifd_new_send_channel_async() as a callback and then waits for the callback to signal via the channels_ready semaphore. The connection happens via qio_task_run_in_thread(), but the callback is only executed via qio_task_thread_result() which is scheduled for the main event loop. Without unlocking the section, the main thread would never get to process the task result and the callback meaning there would be no signal via the channels_ready semaphore. The comment in ram_init_bitmaps() was introduced by 4987783400 ("migration: fix incorrect memory_global_dirty_log_start outside BQL") and is removed, because it referred to the qemu_mutex_lock_iothread() call. Signed-off-by: Fiona Ebner Reviewed-by: Fabiano Rosas Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela Message-ID: <20231013105839.415989-1-f.eb...@proxmox.com> --- include/migration/register.h | 2 +- migration/block-dirty-bitmap.c | 3 --- migration/block.c | 5 - migration/migration.c | 6 ++ migration/ram.c| 6 +++--- migration/savevm.c | 2 -- 6 files changed, 10 insertions(+), 14 deletions(-) diff --git a/include/migration/register.h b/include/migration/register.h index 2b12c6adec..fed1d04a3c 100644 --- a/include/migration/register.h +++ b/include/migration/register.h @@ -25,6 +25,7 @@ typedef struct SaveVMHandlers { * used to perform early checks. */ int (*save_prepare)(void *opaque, Error **errp); +int (*save_setup)(QEMUFile *f, void *opaque); void (*save_cleanup)(void *opaque); int (*save_live_complete_postcopy)(QEMUFile *f, void *opaque); int (*save_live_complete_precopy)(QEMUFile *f, void *opaque); @@ -50,7 +51,6 @@ typedef struct SaveVMHandlers { int (*save_live_iterate)(QEMUFile *f, void *opaque); /* This runs outside the iothread lock! */ -int (*save_setup)(QEMUFile *f, void *opaque); /* Note for save_live_pending: * must_precopy: * - must be migrated in precopy or in stopped state diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c index 032fc5f405..03cb2e72ee 100644 --- a/migration/block-dirty-bitmap.c +++ b/migration/block-dirty-bitmap.c @@ -1214,9 +1214,7 @@ static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque) DBMSaveState *s = &((DBMState *)opaque)->save; SaveBitmapState *dbms = NULL; -qemu_mutex_lock_iothread(); if (init_dirty_bitmap_migration(s) < 0) { -qemu_mutex_unlock_iothread(); return -1; } @@ -1224,7 +1222,6 @@ static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque) send_bitmap_start(f, s, dbms); } qemu_put_bitmap_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS); -
Re: -drive if=none: can't we make this the default?
On 10/14/23 21:16, Michael Tokarev wrote: Can't we make -drive if=none the default? Yes, I know current default is ide, and whole world have to use if=none explicitly to undo this. I think at this point we can deprecate if=ide default and switch to if=none in the next release. I think it will be a welcome change. I think if anything we should have no default at all. But if I had my way: 1) if=none would be deprecated (but with a much longer cycle than 1 year, probably), and everything that uses it would have to use -blockdev. 2) -drive would be limited to a very small set of suboptions (file, cache, if, and the ones in qemu_common_drive_opts) and anything that specifies the driver would go through -blockdev. Paolo
Re: -drive if=none: can't we make this the default?
On Sat, Oct 14, 2023 at 10:16:16PM +0300, Michael Tokarev wrote: > Can't we make -drive if=none the default? > > Yes, I know current default is ide, and whole world have to use if=none > explicitly > to undo this. I think at this point we can deprecate if=ide default and > switch to > if=none in the next release. I think it will be a welcome change. IMHO we'd be better off investing more effort in pushing people towards -blockdev though better documentation of the latter. With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH v3 0/8] qemu-img: rebase: add compression support
On 10/2/23 09:35, Andrey Drobyshev wrote: > On 9/19/23 20:57, Andrey Drobyshev wrote: >> v2 --> v3: >> * Patch 3/8: fixed logic in the if statement, so that we align on blk >>when blk_old_backing == NULL; >> * Patch 4/8: comment fix; >> * Patch 5/8: comment fix; dropped redundant "if (blk_new_backing)" >>statements. >> >> v2: https://lists.nongnu.org/archive/html/qemu-block/2023-09/msg00448.html >> >> Andrey Drobyshev (8): >> qemu-img: rebase: stop when reaching EOF of old backing file >> qemu-iotests: 024: add rebasing test case for overlay_size > >> backing_size >> qemu-img: rebase: use backing files' BlockBackend for buffer alignment >> qemu-img: add chunk size parameter to compare_buffers() >> qemu-img: rebase: avoid unnecessary COW operations >> iotests/{024, 271}: add testcases for qemu-img rebase >> qemu-img: add compression option to rebase subcommand >> iotests: add tests for "qemu-img rebase" with compression >> >> docs/tools/qemu-img.rst| 6 +- >> qemu-img-cmds.hx | 4 +- >> qemu-img.c | 136 ++ >> tests/qemu-iotests/024 | 117 ++ >> tests/qemu-iotests/024.out | 73 >> tests/qemu-iotests/271 | 131 + >> tests/qemu-iotests/271.out | 82 ++ >> tests/qemu-iotests/314 | 165 + >> tests/qemu-iotests/314.out | 75 + >> 9 files changed, 752 insertions(+), 37 deletions(-) >> create mode 100755 tests/qemu-iotests/314 >> create mode 100644 tests/qemu-iotests/314.out >> > > Ping Friendly ping
Re: [PATCH v4 03/10] migration: migrate 'inc' command option is deprecated.
Markus Armbruster wrote: > Juan Quintela writes: > >> Set the 'block_incremental' migration parameter to 'true' instead. >> >> Reviewed-by: Thomas Huth >> Acked-by: Stefan Hajnoczi >> Signed-off-by: Juan Quintela >> >> --- >> >> Improve documentation and style (thanks Markus) >> --- >> docs/about/deprecated.rst | 7 +++ >> qapi/migration.json | 8 +++- >> migration/migration.c | 6 ++ >> 3 files changed, 20 insertions(+), 1 deletion(-) >> >> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst >> index 1c4d7f36f0..1b6b2870cf 100644 >> --- a/docs/about/deprecated.rst >> +++ b/docs/about/deprecated.rst >> @@ -452,3 +452,10 @@ Migration >> ``skipped`` field in Migration stats has been deprecated. It hasn't >> been used for more than 10 years. >> >> +``inc`` migrate command option (since 8.2) >> +'' >> + >> +The new way to modify migration is using migration parameters. >> +``inc`` functionality can be achieved by setting the >> +``block-incremental`` migration parameter to ``true``. >> + >> diff --git a/qapi/migration.json b/qapi/migration.json >> index 6865fea3c5..56bbd55b87 100644 >> --- a/qapi/migration.json >> +++ b/qapi/migration.json >> @@ -1492,6 +1492,11 @@ >> # >> # @resume: resume one paused migration, default "off". (since 3.0) >> # >> +# Features: >> +# >> +# @deprecated: Member @inc is deprecated. Use migration parameter >> +# @block-incremental instead. > > This is fine now. It becomes bad advice in PATCH 05, which deprecates > @block-incremental. Two solutions: > > 1. Change this patch to point to an alternative that will *not* be > deprecated. Ok, clearly I am not explaining myself properly O:-) History of block migration: * In the beggining there was -b and -i migrate options There was the only way to do storage of migration. * We moved to use parameters and capabilities for migration So we created @block-incremental and @block. But we didn't remove the command line options (for backward compatibility). * We were asked to modify migration so some storaged was migrated and some was not migrated during migration. But block people found that it was a good idea to allow storage migration without migrating the vm, so they created this blockdev-mirror mechanism that is shinny, funny, faster, better. So now we have old code that basically nobody uses (the last big user was COLO, but now it can use multifd). So we want to drop it, but we don't care about a direct replacement. So, why I am interested in removing this? - @block and @block-incremental: If you don't use block migration, their existence don't bother you. They are "quite" independent of the rest of the migration code (they could be better integrated, but not big trouble here). - migrate options -i/-b: This ones hurt us each time that we need to do changing in options. Notice that we have "perfect" replacements with @block and @block-incremental, exactly the same result/semantics/... You can see the trobles in the RFC patches * [PATCH v4 07/10] [RFC] migration: Make -i/-b an error for hmp and qmp * [PATCH v4 08/10] [RFC] migration: Remove helpers needed for -i/-b migrate options So what I want, I want to remove -i/-b in the next version (9.0?). For the other, I want to remove it, but I don't care if the code is around in "deprecated" state for another couple of years if there are still people that feel that they want it. This is the reason that I put a pointer for -i/-b to @block/@block-incremental. They are "perfect" replacements. I can put here to use blockdev-mirror + NBD, but the replacement is not so direct. Does this make sense? > 2. Change PATCH 05. > > Same end result. > >> +# >> # Returns: nothing on success >> # >> # Since: 0.14 >> @@ -1513,7 +1518,8 @@ >> # <- { "return": {} } >> ## >> { 'command': 'migrate', >> - 'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', >> + 'data': {'uri': 'str', '*blk': 'bool', >> + '*inc': { 'type': 'bool', 'features': [ 'deprecated' ] }, >> '*detach': 'bool', '*resume': 'bool' } } >> >> ## >> diff --git a/migration/migration.c b/migration/migration.c >> index 1c6c81ad49..ac4897fe0d 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -1601,6 +1601,12 @@ static bool migrate_prepare(MigrationState *s, bool >> blk, bool blk_inc, >> { >> Error *local_err = NULL; >> >> +if (blk_inc) { >> +warn_report("@inc/-i migrate option is deprecated, set the" > > This is either about QMP migrate's parameter "inc", or HMP migrate's > flags -i. Needs to be @inc. I want about the "-i" command option in other place. > In the former case, we want something like "parameter 'inc' is > deprecated". This one. > In the latter case, we want something like "-i is deprecated". Ok, changing. > Trying to do both in a single message results in a sub-par message. If > you want to do better,
Re: [PATCH v5] migration: hold the BQL during setup
Fiona Ebner wrote: queued. > This is intended to be a semantic revert of commit 9b09503752 > ("migration: run setup callbacks out of big lock"). There have been so > many changes since that commit (e.g. a new setup callback > dirty_bitmap_save_setup() that also needs to be adapted now), it's > easier to do the revert manually. > > For snapshots, the bdrv_writev_vmstate() function is used during setup > (in QIOChannelBlock backing the QEMUFile), but not holding the BQL > while calling it could lead to an assertion failure. To understand > how, first note the following: > > 1. Generated coroutine wrappers for block layer functions spawn the > coroutine and use AIO_WAIT_WHILE()/aio_poll() to wait for it. > 2. If the host OS switches threads at an inconvenient time, it can > happen that a bottom half scheduled for the main thread's AioContext > is executed as part of a vCPU thread's aio_poll(). > > An example leading to the assertion failure is as follows: > > main thread: > 1. A snapshot-save QMP command gets issued. > 2. snapshot_save_job_bh() is scheduled. > > vCPU thread: > 3. aio_poll() for the main thread's AioContext is called (e.g. when > the guest writes to a pflash device, as part of blk_pwrite which is a > generated coroutine wrapper). > 4. snapshot_save_job_bh() is executed as part of aio_poll(). > 3. qemu_savevm_state() is called. > 4. qemu_mutex_unlock_iothread() is called. Now > qemu_get_current_aio_context() returns 0x0. > 5. bdrv_writev_vmstate() is executed during the usual savevm setup > via qemu_fflush(). But this function is a generated coroutine wrapper, > so it uses AIO_WAIT_WHILE. There, the assertion > assert(qemu_get_current_aio_context() == qemu_get_aio_context()); > will fail. > > To fix it, ensure that the BQL is held during setup. While it would > only be needed for snapshots, adapting migration too avoids additional > logic for conditional locking/unlocking in the setup callbacks. > Writing the header could (in theory) also trigger qemu_fflush() and > thus bdrv_writev_vmstate(), so the locked section also covers the > qemu_savevm_state_header() call, even for migration for consistency. > > The section around multifd_send_sync_main() needs to be unlocked to > avoid a deadlock. In particular, the multifd_save_setup() function calls > socket_send_channel_create() using multifd_new_send_channel_async() as a > callback and then waits for the callback to signal via the > channels_ready semaphore. The connection happens via