Re: [PATCH v4 03/10] migration: migrate 'inc' command option is deprecated.

2023-10-16 Thread Markus Armbruster
Juan Quintela  writes:

> Markus Armbruster  wrote:
>> Juan Quintela  writes:
>>
>>> Markus Armbruster  wrote:
 Juan Quintela  writes:
>>> So what I want is to remove -i/-b in the next version (9.0?).  For the
>>> other, I want to remove it too, but I don't care if the code stays around
>>> in "deprecated" state for another couple of years if there are still
>>> people who feel that they want it.
>>>
>>> This is the reason that I put a pointer for -i/-b to
>>> @block/@block-incremental.  They are "perfect" replacements.
>>>
>>> I could point here to blockdev-mirror + NBD, but that replacement is not
>>> as direct.
>>>
>>> Does this make sense?
>>
>> I see where you're coming from.  Now let's change perspective for a
>> minute: what's the purpose of deprecating stuff?
>>
>> We normally deprecate with intent to remove.
>>
>> We don't remove right away, because we promised to first deprecate for a
>> grace period, so users can adjust in an orderly manner.  The deprecation
>> serves as signal "you need to adjust".  The documentation that comes
>> with it should help with the adjustment.  It's commonly of the form "use
>> $alternative instead".  The alternative is often a direct replacement,
>> but not always.  There could even be no replacement at all.  We don't
>> promise replacements, we promise an orderly process, so users can
>> adjust.
>>
>> Sometimes, we don't have firm plans to remove, but are more like "maybe
>> remove when it gets in the way".  We could soften the "you need to adjust"
>> signal in documentation, but I doubt that's a good idea.  Regardless,
>> the need to help users adjust remains.
>>
>> Back to your patches.  There are two separate interfaces to block
>> migration, and both are deprecated at the end of the series:
>>
>> 1. Migration parameter @block-incremental & friends
>>
>>Not in the way, content to keep around for longer if it helps users.
>>
>>The deprecation documentation advises to use block-mirror with NBD
>>instead.  All good.
>>
>> 2. QMP migrate parameters @inc and @blk
>>
>>Firm intent to remove as soon as the grace period expires, because
>>it's in the way.
>>
>>The deprecation documentation advises to use interface 1 instead.
>>But that's deprecated, too!
>>
>>Insufficiently careful readers will miss that the replacement is
>>deprecated, and just use it.  Risks surprise when the replacement
>>goes away, too.
>>
>>More careful readers will realize that this advises to use something
>>we elsewhere advise not to use.  Contradiction!  Confusion ensues.
>>
>>On further reflection, these readers might conclude that the
>>*combined* advice is to use block-mirror with NBD instead.  This is
>>correct.
>>
>>So why not tell them?
>>
>>Perhaps you'd like to give more nuanced advice, like "you should move
>>to block-mirror with NBD, but if that's not practical for you, you
>>should at least move to @block-incremental & friends, which will
>>likely stick around for longer."  That's fine.  All I'm asking for is
>>to not make things more confusing than they need to be :)
>>
>> [...]
>
> Is telling this in deprecated.rst enough, or do you want me to put it also
> in the error/warn messages and QAPI?

Let's make all of them point to blockdev-mirror with NBD.  If you think
mentioning @block-incremental & friends in some or all places would be
useful, go ahead, I don't mind.




[PATCH v2 3/3] hw/nvme: Add SPDM over DOE support

2023-10-16 Thread Alistair Francis
From: Wilfred Mallawa 

Set up Data Object Exchange (DOE) as an extended capability for the NVMe
controller and connect SPDM (CMA) to it.

Signed-off-by: Wilfred Mallawa 
Signed-off-by: Alistair Francis 
---
 docs/specs/index.rst|   1 +
 docs/specs/spdm.rst | 114 
 include/hw/pci/pci_device.h |   5 ++
 include/hw/pci/pcie_doe.h   |   3 +
 hw/nvme/ctrl.c  |  53 +
 5 files changed, 176 insertions(+)
 create mode 100644 docs/specs/spdm.rst

diff --git a/docs/specs/index.rst b/docs/specs/index.rst
index e58be38c41..c398541388 100644
--- a/docs/specs/index.rst
+++ b/docs/specs/index.rst
@@ -24,3 +24,4 @@ guest hardware that is specific to QEMU.
acpi_erst
sev-guest-firmware
fw_cfg
+   spdm
diff --git a/docs/specs/spdm.rst b/docs/specs/spdm.rst
new file mode 100644
index 00..dfdc3cbb4d
--- /dev/null
+++ b/docs/specs/spdm.rst
@@ -0,0 +1,114 @@
+=======================================================
+QEMU Security Protocols and Data Models (SPDM) Support
+=======================================================
+
+SPDM enables authentication, attestation and key exchange to assist in
+providing infrastructure security enablement. It's a standard published
+by the `DMTF`_.
+
+QEMU supports connecting to a SPDM Responder implementation. This allows an
+external application to emulate the SPDM Responder logic for an SPDM device.
+
+Setting up a SPDM server
+========================
+
+When using QEMU with SPDM devices, QEMU will connect to a server which
+implements the SPDM functionality.
+
+SPDM-Utils
+----------
+
+You can use `SPDM Utils`_ to emulate a Responder.
+
+SPDM-Utils is a Linux application to manage, test and develop devices
+supporting DMTF Security Protocol and Data Model (SPDM). It is written in Rust
+and utilises libspdm.
+
+To use SPDM-Utils, you will need to do the following:
+
+ 1. `Build SPDM Utils`_
+ 2. `Generate the certificates`_
+ 3. `Run it as a server`_
+
+spdm-emu
+--------
+
+You can use `spdm emu`_ to model the
+SPDM responder.
+
+.. code-block:: shell
+
+$ cd spdm-emu
+$ git submodule init; git submodule update --recursive
+$ mkdir build; cd build
+$ cmake -DARCH=x64 -DTOOLCHAIN=GCC -DTARGET=Debug -DCRYPTO=openssl ..
+$ make -j32
+$ make copy_sample_key # Build certificates, required for SPDM authentication.
+
+It is worth noting that the certificates should be in compliance with
+PCIe r6.1 sec 6.31.3. This means you will need to add the following to
+openssl.cnf
+
+.. code-block::
+
+subjectAltName = otherName:2.23.147;UTF8:Vendor=1b36:Device=0010:CC=010802:REV=02:SSVID=1af4:SSID=1100
+2.23.147 = ASN1:OID:2.23.147
+
+and then manually regenerate some certificates with:
+
+.. code-block:: shell
+
+openssl req -nodes -newkey ec:param.pem -keyout end_responder.key -out end_responder.req -sha384 -batch -subj "/CN=DMTF libspdm ECP384 responder cert"
+openssl x509 -req -in end_responder.req -out end_responder.cert -CA inter.cert -CAkey inter.key -sha384 -days 3650 -set_serial 3 -extensions v3_end -extfile ../openssl.cnf
+openssl asn1parse -in end_responder.cert -out end_responder.cert.der
+cat ca.cert.der inter.cert.der end_responder.cert.der > bundle_responder.certchain.der
+
+You can use SPDM-Utils instead as it will generate the correct certificates
+automatically.
+
+The responder can then be launched with
+
+.. code-block:: shell
+
+$ cd bin
+$ ./spdm_responder_emu --trans PCI_DOE
+
+Connecting an SPDM NVMe device
+==============================
+
+Once an SPDM server is running, we can start QEMU and connect to the server.
+
+For an NVMe device, first let's set up a block device we can use:
+
+.. code-block:: shell
+
+$ cd qemu-spdm/linux/image
+$ dd if=/dev/zero of=blknvme bs=1M count=2096 # 2GB NVMe drive
+
+Then you can add this to your QEMU command line:
+
+.. code-block:: shell
+
+-drive file=blknvme,if=none,id=mynvme,format=raw \
+-device nvme,drive=mynvme,serial=deadbeef,spdm=2323
+
+At which point QEMU will try to connect to the SPDM server.
+
+
+.. _DMTF:
+   https://www.dmtf.org/standards/SPDM
+
+.. _SPDM Utils:
+   https://github.com/westerndigitalcorporation/spdm-utils
+
+.. _spdm emu:
+   https://github.com/dmtf/spdm-emu
+
+.. _Build SPDM Utils:
+   https://github.com/westerndigitalcorporation/spdm-utils#building
+
+.. _Generate the certificates:
+   https://github.com/westerndigitalcorporation/spdm-utils#generate-mutable-certificates
+
+.. _Run it as a server:
+   https://github.com/westerndigitalcorporation/spdm-utils#qemu-spdm-device-emulation
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index d3dd0f64b2..b8379c78f1 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -3,6 +3,7 @@
 
 #include "hw/pci/pci.h"
 #include "hw/pci/pcie.h"
+#include "hw/pci/pcie_doe.h"
 
 #define TYPE_PCI_DEVICE "pci-device"
 typedef struct PCIDeviceClass PCIDeviceClass;
@@ 

[PATCH v2 0/3] Initial support for SPDM Responders

2023-10-16 Thread Alistair Francis
The Security Protocol and Data Model (SPDM) Specification defines
messages, data objects, and sequences for performing message exchanges
over a variety of transport and physical media.
 - https://www.dmtf.org/sites/default/files/standards/documents/DSP0274_1.3.0.pdf

SPDM currently supports PCIe DOE and MCTP transports, but it can be
extended to support others in the future. This series adds
support to QEMU to connect to an external SPDM instance.

SPDM support can be added to any QEMU device by exposing a
TCP socket to a SPDM server. The server can then implement the SPDM
decoding/encoding support, generally using libspdm [2].

This is similar to how the current TPM implementation works and means
that the heavy lifting of setting up certificate chains, capabilities,
measurements and complex crypto can be done outside QEMU by a well
supported and tested library.

This series implements socket support and exposes SPDM for an NVMe device.

2: https://github.com/DMTF/libspdm

v2:
 - Add cover letter
 - A few code fixes based on comments
 - Document SPDM-Utils
 - A few tweaks and clarifications to the documentation

Alistair Francis (1):
  hw/pci: Add all Data Object Types defined in PCIe r6.0

Huai-Cheng Kuo (1):
  backends: Initial support for SPDM socket support

Wilfred Mallawa (1):
  hw/nvme: Add SPDM over DOE support

 docs/specs/index.rst |   1 +
 docs/specs/spdm.rst  | 114 ++
 include/hw/pci/pci_device.h  |   5 +
 include/hw/pci/pcie_doe.h|   5 +
 include/sysemu/spdm-socket.h |  44 +++
 backends/spdm-socket.c   | 216 +++
 hw/nvme/ctrl.c   |  53 +
 backends/Kconfig |   4 +
 backends/meson.build |   2 +
 9 files changed, 444 insertions(+)
 create mode 100644 docs/specs/spdm.rst
 create mode 100644 include/sysemu/spdm-socket.h
 create mode 100644 backends/spdm-socket.c

-- 
2.41.0




[PATCH v2 2/3] backends: Initial support for SPDM socket support

2023-10-16 Thread Alistair Francis
From: Huai-Cheng Kuo 

SPDM enables authentication, attestation and key exchange to assist in
providing infrastructure security enablement. It's a standard published
by the DMTF [1].

SPDM supports multiple transports, including PCIe DOE and MCTP.
This patch adds support to QEMU to connect to an external SPDM
instance.

SPDM support can be added to any QEMU device by exposing a
TCP socket to a SPDM server. The server can then implement the SPDM
decoding/encoding support, generally using libspdm [2].

This is similar to how the current TPM implementation works and means
that the heavy lifting of setting up certificate chains, capabilities,
measurements and complex crypto can be done outside QEMU by a well
supported and tested library.

1: https://www.dmtf.org/standards/SPDM
2: https://github.com/DMTF/libspdm

Signed-off-by: Huai-Cheng Kuo 
Signed-off-by: Chris Browy 
Co-developed-by: Jonathan Cameron 
Signed-off-by: Jonathan Cameron 
[ Changes by WM
 - Bug fixes from testing
]
Signed-off-by: Wilfred Mallawa 
[ Changes by AF:
 - Convert to be more QEMU-ified
 - Move to backends as it isn't PCIe specific
]
Signed-off-by: Alistair Francis 
---
 include/sysemu/spdm-socket.h |  44 +++
 backends/spdm-socket.c   | 216 +++
 backends/Kconfig |   4 +
 backends/meson.build |   2 +
 4 files changed, 266 insertions(+)
 create mode 100644 include/sysemu/spdm-socket.h
 create mode 100644 backends/spdm-socket.c

diff --git a/include/sysemu/spdm-socket.h b/include/sysemu/spdm-socket.h
new file mode 100644
index 00..24e6fccb83
--- /dev/null
+++ b/include/sysemu/spdm-socket.h
@@ -0,0 +1,44 @@
+/*
+ * QEMU SPDM socket support
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef SPDM_REQUESTER_H
+#define SPDM_REQUESTER_H
+
+int spdm_socket_connect(uint16_t port, Error **errp);
+uint32_t spdm_socket_rsp(const int socket, uint32_t transport_type,
+ void *req, uint32_t req_len,
+ void *rsp, uint32_t rsp_len);
+void spdm_socket_close(const int socket, uint32_t transport_type);
+
+#define SPDM_SOCKET_COMMAND_NORMAL                0x0001
+#define SPDM_SOCKET_COMMAND_OOB_ENCAP_KEY_UPDATE  0x8001
+#define SPDM_SOCKET_COMMAND_CONTINUE              0xFFFD
+#define SPDM_SOCKET_COMMAND_SHUTDOWN              0xFFFE
+#define SPDM_SOCKET_COMMAND_UNKOWN                0xFFFF
+#define SPDM_SOCKET_COMMAND_TEST                  0xDEAD
+
+#define SPDM_SOCKET_TRANSPORT_TYPE_MCTP           0x01
+#define SPDM_SOCKET_TRANSPORT_TYPE_PCI_DOE        0x02
+
+#define SPDM_SOCKET_MAX_MESSAGE_BUFFER_SIZE   0x1200
+
+#endif
diff --git a/backends/spdm-socket.c b/backends/spdm-socket.c
new file mode 100644
index 00..d0663d696c
--- /dev/null
+++ b/backends/spdm-socket.c
@@ -0,0 +1,216 @@
+/* SPDX-License-Identifier: BSD-3-Clause */
+/*
+ * QEMU SPDM socket support
+ *
+ * This is based on:
+ * https://github.com/DMTF/spdm-emu/blob/07c0a838bcc1c6207c656ac75885c0603e344b6f/spdm_emu/spdm_emu_common/command.c
+ * but has been re-written to match QEMU style
+ *
+ * Copyright (c) 2021, DMTF. All rights reserved.
+ * Copyright (c) 2023. Western Digital Corporation or its affiliates.
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/spdm-socket.h"
+#include "qapi/error.h"
+
+static bool read_bytes(const int socket, uint8_t *buffer,
+   size_t number_of_bytes)
+{
+ssize_t number_received = 0;
+ssize_t result;
+
+while (number_received < number_of_bytes) {
+result = recv(socket, buffer + number_received,
+  number_of_bytes - number_received, 0);
+if (result <= 0) {
+return false;
+}
+number_received += result;
+}
+return true;
+}
+
+static bool read_data32(const int socket, uint32_t *data)
+{
+bool result;
+
+result = read_bytes(socket, (uint8_t *)data, sizeof(uint32_t));
+if (!result) {
+ 
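
For reference, a minimal sketch (not part of this patch) of how a device model
might drive these helpers, assuming spdm_socket_connect() returns a socket fd
(negative on error) and spdm_socket_rsp() returns the response length (0 on
failure); the request buffer and its length are hypothetical placeholders:

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "sysemu/spdm-socket.h"

/* Illustration only: exchange one SPDM request over the PCI DOE transport. */
static bool example_spdm_doe_exchange(uint16_t port, void *req,
                                      uint32_t req_len, Error **errp)
{
    uint8_t rsp[SPDM_SOCKET_MAX_MESSAGE_BUFFER_SIZE];
    uint32_t rsp_len;
    int fd;

    fd = spdm_socket_connect(port, errp);
    if (fd < 0) {
        return false; /* errp is assumed to be set by spdm_socket_connect() */
    }

    rsp_len = spdm_socket_rsp(fd, SPDM_SOCKET_TRANSPORT_TYPE_PCI_DOE,
                              req, req_len, rsp, sizeof(rsp));

    spdm_socket_close(fd, SPDM_SOCKET_TRANSPORT_TYPE_PCI_DOE);

    return rsp_len != 0;
}

A real device would more likely connect once at realize time and keep the
socket open for its lifetime instead of reconnecting for every request.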

[PATCH v2 1/3] hw/pci: Add all Data Object Types defined in PCIe r6.0

2023-10-16 Thread Alistair Francis
Add all of the defined protocols/features from the PCI-SIG r6.0
"Table 6-32 PCI-SIG defined Data Object Types (Vendor ID = 0001h)"
table.

Signed-off-by: Alistair Francis 
---
 include/hw/pci/pcie_doe.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/hw/pci/pcie_doe.h b/include/hw/pci/pcie_doe.h
index 87dc17dcef..15d94661f9 100644
--- a/include/hw/pci/pcie_doe.h
+++ b/include/hw/pci/pcie_doe.h
@@ -46,6 +46,8 @@ REG32(PCI_DOE_CAP_STATUS, 0)
 
 /* PCI-SIG defined Data Object Types - r6.0 Table 6-32 */
 #define PCI_SIG_DOE_DISCOVERY   0x00
+#define PCI_SIG_DOE_CMA 0x01
+#define PCI_SIG_DOE_SECURED_CMA 0x02
 
 #define PCI_DOE_DW_SIZE_MAX (1 << 18)
 #define PCI_DOE_PROTOCOL_NUM_MAX256
-- 
2.41.0




Re: [PATCH 2/2] virtio: Drop out of coroutine context in virtio_load()

2023-10-16 Thread Michael Tokarev

05.09.2023 17:50, Kevin Wolf wrote:

virtio_load() as a whole should run in coroutine context because it
reads from the migration stream and we don't want this to block.

However, it calls virtio_set_features_nocheck() and devices don't
expect their .set_features callback to run in a coroutine and therefore
call functions that may not be called in coroutine context. To fix this,
drop out of coroutine context for calling virtio_set_features_nocheck().

...

Cc: qemu-sta...@nongnu.org
Buglink: https://issues.redhat.com/browse/RHEL-832
Signed-off-by: Kevin Wolf 


It looks like this change caused an interesting regression,
  https://gitlab.com/qemu-project/qemu/-/issues/1933
at least in -stable.  Can you take a look please?

BTW, Kevin, do you have an account @gitlab?

Thanks,

/mjt



Re: [PULL 00/25] Python patches

2023-10-16 Thread John Snow
On Mon, Oct 16, 2023 at 3:21 PM Stefan Hajnoczi  wrote:
>
> Applied, thanks.
>
> Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
> user-visible changes.

Hi Vladimir: all done!

I've created a MR to backport your changes to the standalone repo
here: https://gitlab.com/qemu-project/python-qemu-qmp/-/merge_requests/30
- please give it a quick glance to make sure I didn't butcher anything
in transit.

Thanks,
--js




Re: [PULL 00/38] Migration 20231016 patches

2023-10-16 Thread Fabiano Rosas
Stefan Hajnoczi  writes:

> On Mon, 16 Oct 2023 at 13:13, Fabiano Rosas  wrote:
>>
>> Stefan Hajnoczi  writes:
>>
>> > On Mon, 16 Oct 2023 at 06:11, Juan Quintela  wrote:
>> >>
>> >> The following changes since commit 
>> >> 63011373ad22c794a013da69663c03f1297a5c56:
>> >>
>> >>   Merge tag 'pull-riscv-to-apply-20231012-1' of 
>> >> https://github.com/alistair23/qemu into staging (2023-10-12 10:24:44 
>> >> -0400)
>> >>
>> >> are available in the Git repository at:
>> >>
>> >>   https://gitlab.com/juan.quintela/qemu.git 
>> >> tags/migration-20231016-pull-request
>> >>
>> >> for you to fetch changes up to f39b0f42753635b0f2d8b00a26d11bb197bf51e2:
>> >>
>> >>   migration/multifd: Clarify Error usage in multifd_channel_connect 
>> >> (2023-10-16 11:01:33 +0200)
>> >>
>> >> 
>> >> Migration Pull request (20231016)
>> >>
>> >> In this pull request:
>> >> - rdma cleanups
>> >> - removal of QEMUFileHook
>> >> - test for analyze-migration.py
>> >> - test for multifd file
>> >> - multifd cleanups
>> >> - available switchover bandwidth
>> >> - lots of cleanups.
>> >>
>> >> CI: https://gitlab.com/juan.quintela/qemu/-/pipelines/1037878829
>> >>
>> >> Please, apply.
>> >
>> > This CI failure looks migration-related:
>> >
>> > MALLOC_PERTURB_=96
>> > PYTHON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/pyvenv/bin/python3
>> > QTEST_QEMU_BINARY=./qemu-system-i386
>> > G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
>> > QTEST_QEMU_IMG=./qemu-img
>> > QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
>> > /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/migration-test
>> > --tap -k
>> > ― ✀ 
>> > ―
>> > stderr:
>> > **
>> > ERROR:../tests/qtest/migration-test.c:1969:file_offset_finish_hook:
>> > assertion failed (cpu_to_be32(*p) == QEMU_VM_FILE_MAGIC): (3 ==
>> > 1363498573)
>>
>> That's the test for the file: transport which got merged in the last
>> PR. I'll look into it.
>
> I have dropped the 20231016 pull request for now and the tests passed
> without it. Maybe there is an interaction with the test you recently
> added and this pull request?

Sorry, I expressed myself poorly. The test _is_ what is breaking this
pull request. The feature was already merged and is working fine.

I commented with a fix on the patch that adds the test.

[PULL 11/38] tests/qtest: migration-test: Add tests for file-based
migration
https://lore.kernel.org/r/20231016100706.2551-12-quint...@redhat.com



[PULL 0/1] Block patches

2023-10-16 Thread Stefan Hajnoczi
The following changes since commit 63011373ad22c794a013da69663c03f1297a5c56:

  Merge tag 'pull-riscv-to-apply-20231012-1' of 
https://github.com/alistair23/qemu into staging (2023-10-12 10:24:44 -0400)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 071d6d107db2e26dde9bb15457c74956c88ec5b4:

  virtio-blk: don't start dataplane during the stop of dataplane (2023-10-16 
15:39:13 -0400)


Pull request

Contains a virtio-blk IOThread fix.



hujian (1):
  virtio-blk: don't start dataplane during the stop of dataplane

 hw/block/virtio-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.41.0




[PULL 1/1] virtio-blk: don't start dataplane during the stop of dataplane

2023-10-16 Thread Stefan Hajnoczi
From: hujian 

During the stop of dataplane for virtio-blk, virtio_bus_cleanup_host_notifier()
is called to clean up the notifier at the end. If the ioeventfd is polled,
virtio_blk_handle_output() is used to handle the io request. But because
s->dataplane_disabled is false, it returns directly, which drops the io
request.
Backtrace:
->virtio_blk_data_plane_stop
  ->virtio_bus_cleanup_host_notifier
    ->virtio_queue_host_notifier_read
      ->virtio_queue_notify_vq
        ->vq->handle_output
          ->virtio_blk_handle_output
            ->if (s->dataplane && !s->dataplane_started)
              ->if (!s->dataplane_disabled)
                ->return *
            ->virtio_blk_handle_output_do
The above problem can occur when using the "virsh reset" command to reset the
guest while the guest does io.
To fix this problem, don't try to start dataplane if s->stopping is true; io
will then be handled by virtio_blk_handle_vq().

Signed-off-by: hujian 
Message-id: 202310111414266586...@zte.com.cn
Signed-off-by: Stefan Hajnoczi 
---
 hw/block/virtio-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 39e7f23fab..c2d59389cb 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1166,7 +1166,7 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOBlock *s = (VirtIOBlock *)vdev;
 
-if (s->dataplane && !s->dataplane_started) {
+if (s->dataplane && !s->dataplane_started && !s->stopping) {
 /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
  * dataplane here instead of waiting for .set_status().
  */
-- 
2.41.0




Re: [PULL 00/25] Python patches

2023-10-16 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PULL v2 0/2] hw/ufs: fixes

2023-10-16 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PULL 00/26] Block layer patches

2023-10-16 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PULL 00/38] Migration 20231016 patches

2023-10-16 Thread Stefan Hajnoczi
On Mon, 16 Oct 2023 at 13:13, Fabiano Rosas  wrote:
>
> Stefan Hajnoczi  writes:
>
> > On Mon, 16 Oct 2023 at 06:11, Juan Quintela  wrote:
> >>
> >> The following changes since commit 
> >> 63011373ad22c794a013da69663c03f1297a5c56:
> >>
> >>   Merge tag 'pull-riscv-to-apply-20231012-1' of 
> >> https://github.com/alistair23/qemu into staging (2023-10-12 10:24:44 -0400)
> >>
> >> are available in the Git repository at:
> >>
> >>   https://gitlab.com/juan.quintela/qemu.git 
> >> tags/migration-20231016-pull-request
> >>
> >> for you to fetch changes up to f39b0f42753635b0f2d8b00a26d11bb197bf51e2:
> >>
> >>   migration/multifd: Clarify Error usage in multifd_channel_connect 
> >> (2023-10-16 11:01:33 +0200)
> >>
> >> 
> >> Migration Pull request (20231016)
> >>
> >> In this pull request:
> >> - rdma cleanups
> >> - removal of QEMUFileHook
> >> - test for analyze-migration.py
> >> - test for multifd file
> >> - multifd cleanups
> >> - available switchover bandwidth
> >> - lots of cleanups.
> >>
> >> CI: https://gitlab.com/juan.quintela/qemu/-/pipelines/1037878829
> >>
> >> Please, apply.
> >
> > This CI failure looks migration-related:
> >
> > MALLOC_PERTURB_=96
> > PYTHON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/pyvenv/bin/python3
> > QTEST_QEMU_BINARY=./qemu-system-i386
> > G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
> > QTEST_QEMU_IMG=./qemu-img
> > QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
> > /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/migration-test
> > --tap -k
> > ― ✀ 
> > ―
> > stderr:
> > **
> > ERROR:../tests/qtest/migration-test.c:1969:file_offset_finish_hook:
> > assertion failed (cpu_to_be32(*p) == QEMU_VM_FILE_MAGIC): (3 ==
> > 1363498573)
>
> That's the test for the file: transport which got merged in the last
> PR. I'll look into it.

I have dropped the 20231016 pull request for now and the tests passed
without it. Maybe there is an interaction with the test you recently
added and this pull request?

Stefan



Re: [PULL 11/38] tests/qtest: migration-test: Add tests for file-based migration

2023-10-16 Thread Fabiano Rosas
Juan Quintela  writes:

> From: Fabiano Rosas 
>
> Add basic tests for file-based migration.
>
> Note that we cannot use test_precopy_common because that routine
> expects it to be possible to run the migration live. With the file
> transport there is no live migration because we must wait for the
> source to finish writing the migration data to the file before the
> destination can start reading. Add a new migration function
> specifically to handle the file migration.
>
> Reviewed-by: Peter Xu 
> Reviewed-by: Juan Quintela 
> Signed-off-by: Fabiano Rosas 
> Signed-off-by: Juan Quintela 
> Message-ID: <20230712190742.22294-7-faro...@suse.de>
> ---
>  tests/qtest/migration-test.c | 147 +++
>  1 file changed, 147 insertions(+)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index cef5081f8c..da02b6d692 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -68,6 +68,10 @@ static bool got_dst_resume;
>  
>  #define ANALYZE_SCRIPT "scripts/analyze-migration.py"
>  
> +#define QEMU_VM_FILE_MAGIC 0x5145564d
> +#define FILE_TEST_FILENAME "migfile"
> +#define FILE_TEST_OFFSET 0x1000
> +
>  #if defined(__linux__)
>  #include 
>  #include 
> @@ -884,6 +888,7 @@ static void test_migrate_end(QTestState *from, QTestState *to, bool test_dest)
>  cleanup("migsocket");
>  cleanup("src_serial");
>  cleanup("dest_serial");
> +cleanup(FILE_TEST_FILENAME);
>  }
>  
>  #ifdef CONFIG_GNUTLS
> @@ -1667,6 +1672,70 @@ finish:
>  test_migrate_end(from, to, args->result == MIG_TEST_SUCCEED);
>  }
>  
> +static void test_file_common(MigrateCommon *args, bool stop_src)
> +{
> +QTestState *from, *to;
> +void *data_hook = NULL;
> +g_autofree char *connect_uri = g_strdup(args->connect_uri);
> +
> +if (test_migrate_start(&from, &to, args->listen_uri, &args->start)) {
> +return;
> +}
> +
> +/*
> + * File migration is never live. We can keep the source VM running
> + * during migration, but the destination will not be running
> + * concurrently.
> + */
> +g_assert_false(args->live);
> +
> +if (args->start_hook) {
> +data_hook = args->start_hook(from, to);
> +}
> +
> +migrate_ensure_converge(from);
> +wait_for_serial("src_serial");
> +
> +if (stop_src) {
> +qtest_qmp_assert_success(from, "{ 'execute' : 'stop'}");
> +if (!got_src_stop) {
> +qtest_qmp_eventwait(from, "STOP");
> +}
> +}
> +
> +if (args->result == MIG_TEST_QMP_ERROR) {
> +migrate_qmp_fail(from, connect_uri, "{}");
> +goto finish;
> +}
> +
> +migrate_qmp(from, connect_uri, "{}");
> +wait_for_migration_complete(from);
> +
> +/*
> + * We need to wait for the source to finish before starting the
> + * destination.
> + */
> +migrate_incoming_qmp(to, connect_uri, "{}");
> +wait_for_migration_complete(to);
> +
> +if (stop_src) {
> +qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");
> +}
> +
> +if (!got_dst_resume) {
> +qtest_qmp_eventwait(to, "RESUME");
> +}
> +
> +wait_for_serial("dest_serial");
> +
> +finish:
> +if (args->finish_hook) {
> +args->finish_hook(from, to, data_hook);
> +}
> +
> +test_migrate_end(from, to, args->result == MIG_TEST_SUCCEED);
> +}
> +
>  static void test_precopy_unix_plain(void)
>  {
>  g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
> @@ -1862,6 +1931,76 @@ static void test_precopy_unix_compress_nowait(void)
>  test_precopy_common(&args);
>  }
>  
> +static void test_precopy_file(void)
> +{
> +g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
> +   FILE_TEST_FILENAME);
> +MigrateCommon args = {
> +.connect_uri = uri,
> +.listen_uri = "defer",
> +};
> +
> +test_file_common(&args, true);
> +}
> +
> +static void file_offset_finish_hook(QTestState *from, QTestState *to,
> +void *opaque)
> +{
> +#if defined(__linux__)
> +g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
> +size_t size = FILE_TEST_OFFSET + sizeof(QEMU_VM_FILE_MAGIC);
> +uintptr_t *addr, *p;
> +int fd;
> +
> +fd = open(path, O_RDONLY);
> +g_assert(fd != -1);
> +addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
> +g_assert(addr != MAP_FAILED);
> +
> +/*
> + * Ensure the skipped offset contains zeros and the migration
> + * stream starts at the right place.
> + */
> +p = addr;
> +while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
> +g_assert(*p == 0);
> +p++;
> +}
> +g_assert_cmpint(cpu_to_be32(*p), ==, QEMU_VM_FILE_MAGIC);

This truncates to 32-bits, so it breaks on a BE host. We need this:

-->8--
>From ea0c2d1c988add48d9754891a9fc7f6854a9718a Mon Sep 17 00:00:00 2001
From: Fabiano Rosas 
Date: Mon, 16 Oct 

Re: [RFC PATCH 00/78] Strict disable implicit fallthrough

2023-10-16 Thread Manos Pitsidianakis
On Mon, 16 Oct 2023, 18:04 Peter Maydell,  wrote:

> On Mon, 16 Oct 2023 at 15:58, Manos Pitsidianakis
>  wrote:
> >
> > Hello Peter,
> >
> > On Mon, 16 Oct 2023, 17:13 Peter Maydell, 
> wrote:
> >>
> >> On Fri, 13 Oct 2023 at 13:42, Markus Armbruster 
> wrote:
> >> >
> >> > Emmanouil Pitsidianakis  writes:
> >> >
> >> > > Hello,
> >> > >
> >> > > This RFC is inspired by the kernel's move to
> -Wimplicit-fallthrough=3
> >> > > back in 2019.[0]
> >> > > We take one step (or two) further by increasing it to 5 which
> rejects
> >> > > fall through comments and requires an attribute statement.
> >> > >
> >> > > [0]:
> >> > >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a035d552a93b
> >> > >
> >> > > The line differences are not many, but they spread all over
> different
> >> > > subsystems, architectures and devices. An attempt has been made to
> split
> >> > > them in cohesive patches to aid post-RFC review. Part of the RFC is
> to
> >> > > determine whether these patch divisions needs improvement.
> >> > >
> >> > > Main questions this RFC poses
> >> > > =
> >> > >
> >> > > - Is this change desirable and net-positive.
> >> >
> >> > Unwanted fallthrough is an easy mistake to make, and
> >> > -Wimplicit-fallthrough=N helps avoid it.  The question is how far up
> we
> >> > need to push N.  Right now we're at N=2.  Has unwanted fallthrough
> been
> >> > a problem?
> >>
> >> Mmm, this is my opinion I think. We have a mechanism for
> >> catching "forgot the 'break'" already (our =2 setting) and
> >> a way to say "intentional" in a fairly natural way (add the
> >> comment). Does pushing N up any further gain us anything
> >> except a load of churn?
> >>
> >> Also, the compiler is not the only thing that processes our
> >> code: Coverity also looks for "unexpected fallthrough" issues,
> >> so if we wanted to switch away from our current practice we
> >> should check whether what we're switching to is an idiom
> >> that Coverity recognises.
> >
> >
> > It is a code style change as the cover letter mentions, it's not related
> to the static analysis itself.
>
> Yes, exactly. As a code style change it needs a fairly high level
> of justification for the code churn, and the cover letter
> doesn't really provide one...
>


As I state in the cover letter, I personally find that using one macro
instead of a comment regex feels more consistent. But your view is valid as
well!

Let's consider the RFC retracted then.

--
Manos

>


Re: [PATCH v2] block/file-posix: fix update_zones_wp() caller

2023-10-16 Thread Stefan Hajnoczi
On Fri, Aug 25, 2023 at 12:05:56PM +0800, Sam Li wrote:
> When a zoned request fails, it needs to update only the wp of
> the target zones, so as not to disrupt the in-flight writes on
> the other zones. The wp is updated successfully after the
> request completes.
> 
> Fix the callers to pass the right offset and nr_zones.
> 
> Signed-off-by: Sam Li 
> ---
>  block/file-posix.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PULL 00/38] Migration 20231016 patches

2023-10-16 Thread Fabiano Rosas
Stefan Hajnoczi  writes:

> On Mon, 16 Oct 2023 at 06:11, Juan Quintela  wrote:
>>
>> The following changes since commit 63011373ad22c794a013da69663c03f1297a5c56:
>>
>>   Merge tag 'pull-riscv-to-apply-20231012-1' of 
>> https://github.com/alistair23/qemu into staging (2023-10-12 10:24:44 -0400)
>>
>> are available in the Git repository at:
>>
>>   https://gitlab.com/juan.quintela/qemu.git 
>> tags/migration-20231016-pull-request
>>
>> for you to fetch changes up to f39b0f42753635b0f2d8b00a26d11bb197bf51e2:
>>
>>   migration/multifd: Clarify Error usage in multifd_channel_connect 
>> (2023-10-16 11:01:33 +0200)
>>
>> 
>> Migration Pull request (20231016)
>>
>> In this pull request:
>> - rdma cleanups
>> - removal of QEMUFileHook
>> - test for analyze-migration.py
>> - test for multifd file
>> - multifd cleanups
>> - available switchover bandwidth
>> - lots of cleanups.
>>
>> CI: https://gitlab.com/juan.quintela/qemu/-/pipelines/1037878829
>>
>> Please, apply.
>
> This CI failure looks migration-related:
>
> MALLOC_PERTURB_=96
> PYTHON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/pyvenv/bin/python3
> QTEST_QEMU_BINARY=./qemu-system-i386
> G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
> QTEST_QEMU_IMG=./qemu-img
> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
> /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/migration-test
> --tap -k
> ― ✀ ―
> stderr:
> **
> ERROR:../tests/qtest/migration-test.c:1969:file_offset_finish_hook:
> assertion failed (cpu_to_be32(*p) == QEMU_VM_FILE_MAGIC): (3 ==
> 1363498573)

That's the test for the file: transport which got merged in the last
PR. I'll look into it.




Re: [PULL 00/38] Migration 20231016 patches

2023-10-16 Thread Stefan Hajnoczi
On Mon, 16 Oct 2023 at 06:11, Juan Quintela  wrote:
>
> The following changes since commit 63011373ad22c794a013da69663c03f1297a5c56:
>
>   Merge tag 'pull-riscv-to-apply-20231012-1' of 
> https://github.com/alistair23/qemu into staging (2023-10-12 10:24:44 -0400)
>
> are available in the Git repository at:
>
>   https://gitlab.com/juan.quintela/qemu.git 
> tags/migration-20231016-pull-request
>
> for you to fetch changes up to f39b0f42753635b0f2d8b00a26d11bb197bf51e2:
>
>   migration/multifd: Clarify Error usage in multifd_channel_connect 
> (2023-10-16 11:01:33 +0200)
>
> 
> Migration Pull request (20231016)
>
> In this pull request:
> - rdma cleanups
> - removal of QEMUFileHook
> - test for analyze-migration.py
> - test for multifd file
> - multifd cleanups
> - available switchover bandwidth
> - lots of cleanups.
>
> CI: https://gitlab.com/juan.quintela/qemu/-/pipelines/1037878829
>
> Please, apply.

This CI failure looks migration-related:

MALLOC_PERTURB_=96
PYTHON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/pyvenv/bin/python3
QTEST_QEMU_BINARY=./qemu-system-i386
G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
QTEST_QEMU_IMG=./qemu-img
QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/migration-test
--tap -k
― ✀ ―
stderr:
**
ERROR:../tests/qtest/migration-test.c:1969:file_offset_finish_hook:
assertion failed (cpu_to_be32(*p) == QEMU_VM_FILE_MAGIC): (3 ==
1363498573)

https://gitlab.com/qemu-project/qemu/-/jobs/5301793548

Stefan

>
> 
>
> Dmitry Frolov (1):
>   migration: fix RAMBlock add NULL check
>
> Elena Ufimtseva (3):
>   migration: check for rate_limit_max for RATE_LIMIT_DISABLED
>   multifd: fix counters in multifd_send_thread
>   multifd: reset next_packet_len after sending pages
>
> Fabiano Rosas (13):
>   migration: Fix analyze-migration.py 'configuration' parsing
>   migration: Add capability parsing to analyze-migration.py
>   migration: Fix analyze-migration.py when ignore-shared is used
>   migration: Fix analyze-migration read operation signedness
>   tests/qtest/migration: Add a test for the analyze-migration script
>   tests/qtest: migration-test: Add tests for file-based migration
>   migration/ram: Remove RAMState from xbzrle_cache_zero_page
>   migration/ram: Stop passing QEMUFile around in save_zero_page
>   migration/ram: Move xbzrle zero page handling into save_zero_page
>   migration/ram: Merge save_zero_page functions
>   migration/multifd: Remove direct "socket" references
>   migration/multifd: Unify multifd_send_thread error paths
>   migration/multifd: Clarify Error usage in multifd_channel_connect
>
> Fiona Ebner (1):
>   migration: hold the BQL during setup
>
> Juan Quintela (15):
>   migration: Non multifd migration don't care about multifd flushes
>   migration: Create migrate_rdma()
>   migration/rdma: Unfold ram_control_before_iterate()
>   migration/rdma: Unfold ram_control_after_iterate()
>   migration/rdma: Remove all uses of RAM_CONTROL_HOOK
>   migration/rdma: Unfold hook_ram_load()
>   migration/rdma: Create rdma_control_save_page()
>   qemu-file: Remove QEMUFileHooks
>   migration/rdma: Move rdma constants from qemu-file.h to rdma.h
>   migration/rdma: Remove qemu_ prefix from exported functions
>   migration/rdma: Check sooner if we are in postcopy for save_page()
>   migration/rdma: Use i as for index instead of idx
>   migration/rdma: Declare for index variables local
>   migration/rdma: Remove all "ret" variables that are used only once
>   migration: Improve json and formatting
>
> Nikolay Borisov (2):
>   migration: Add the configuration vmstate to the json writer
>   migration/ram: Refactor precopy ram loading code
>
> Peter Xu (1):
>   migration: Allow user to specify available switchover bandwidth
>
> Philippe Mathieu-Daudé (1):
>   migration: Use g_autofree to simplify ram_dirty_bitmap_reload()
>
> Wei Wang (1):
>   migration: refactor migration_completion
>
>  qapi/migration.json|  41 -
>  include/migration/register.h   |   2 +-
>  migration/migration.h  |   4 +-
>  migration/options.h|   2 +
>  migration/qemu-file.h  |  49 --
>  migration/rdma.h   |  42 +
>  migration/block-dirty-bitmap.c |   3 -
>  migration/block.c  |   5 -
>  migration/migration-hmp-cmds.c |  14 ++
>  migration/migration-

[PATCH 07/12] hw/xen: update Xen console to XenDevice model

2023-10-16 Thread David Woodhouse
This allows (non-primary) console devices to be created on the command
line.

Signed-off-by: David Woodhouse 
---
 hw/char/trace-events|   8 +
 hw/char/xen_console.c   | 502 +++-
 hw/xen/xen-legacy-backend.c |   1 -
 3 files changed, 381 insertions(+), 130 deletions(-)

diff --git a/hw/char/trace-events b/hw/char/trace-events
index babf4d35ea..7a398c82a5 100644
--- a/hw/char/trace-events
+++ b/hw/char/trace-events
@@ -105,3 +105,11 @@ cadence_uart_baudrate(unsigned baudrate) "baudrate %u"
 # sh_serial.c
sh_serial_read(char *id, unsigned size, uint64_t offs, uint64_t val) " %s size %d offs 0x%02" PRIx64 " -> 0x%02" PRIx64
sh_serial_write(char *id, unsigned size, uint64_t offs, uint64_t val) "%s size %d offs 0x%02" PRIx64 " <- 0x%02" PRIx64
+
+# xen_console.c
+xen_console_connect(unsigned int idx, unsigned int ring_ref, unsigned int port, unsigned int limit) "idx %u ring_ref %u port %u limit %u"
+xen_console_disconnect(unsigned int idx) "idx %u"
+xen_console_unrealize(unsigned int idx) "idx %u"
+xen_console_realize(unsigned int idx, const char *chrdev) "idx %u chrdev %s"
+xen_console_device_create(unsigned int idx) "idx %u"
+xen_console_device_destroy(unsigned int idx) "idx %u"
diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c
index 810dae3f44..bd20be116c 100644
--- a/hw/char/xen_console.c
+++ b/hw/char/xen_console.c
@@ -20,15 +20,19 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/cutils.h"
 #include 
 #include 
 
 #include "qapi/error.h"
 #include "sysemu/sysemu.h"
 #include "chardev/char-fe.h"
-#include "hw/xen/xen-legacy-backend.h"
-
+#include "hw/xen/xen-backend.h"
+#include "hw/xen/xen-bus-helper.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-properties-system.h"
 #include "hw/xen/interface/io/console.h"
+#include "trace.h"
 
 struct buffer {
 uint8_t *data;
@@ -39,16 +43,22 @@ struct buffer {
 };
 
 struct XenConsole {
-struct XenLegacyDevice  xendev;  /* must be first */
+struct XenDevice  xendev;  /* must be first */
+XenEventChannel   *event_channel;
+int   dev;
 struct buffer buffer;
-char  console[XEN_BUFSIZE];
-int   ring_ref;
+char  *fe_path;
+unsigned int  ring_ref;
 void  *sring;
 CharBackend   chr;
 int   backlog;
 };
+typedef struct XenConsole XenConsole;
+
+#define TYPE_XEN_CONSOLE_DEVICE "xen-console"
+OBJECT_DECLARE_SIMPLE_TYPE(XenConsole, XEN_CONSOLE_DEVICE)
 
-static void buffer_append(struct XenConsole *con)
+static bool buffer_append(XenConsole *con)
 {
struct buffer *buffer = &con->buffer;
 XENCONS_RING_IDX cons, prod, size;
@@ -60,7 +70,7 @@ static void buffer_append(struct XenConsole *con)
 
 size = prod - cons;
 if ((size == 0) || (size > sizeof(intf->out)))
-return;
+return false;
 
 if ((buffer->capacity - buffer->size) < size) {
 buffer->capacity += (size + 1024);
@@ -73,7 +83,7 @@ static void buffer_append(struct XenConsole *con)
 
 xen_mb();
 intf->out_cons = cons;
-xen_pv_send_notify(&con->xendev);
+xen_device_notify_event_channel(XEN_DEVICE(con), con->event_channel, NULL);
 
 if (buffer->max_capacity &&
 buffer->size > buffer->max_capacity) {
@@ -89,6 +99,7 @@ static void buffer_append(struct XenConsole *con)
 if (buffer->consumed > buffer->max_capacity - over)
 buffer->consumed = buffer->max_capacity - over;
 }
+return true;
 }
 
@@ -100,7 +111,7 @@ static void buffer_advance(struct buffer *buffer, size_t len)
len)
 }
 }
 
-static int ring_free_bytes(struct XenConsole *con)
+static int ring_free_bytes(XenConsole *con)
 {
 struct xencons_interface *intf = con->sring;
 XENCONS_RING_IDX cons, prod, space;
@@ -118,13 +129,13 @@ static int ring_free_bytes(struct XenConsole *con)
 
 static int xencons_can_receive(void *opaque)
 {
-struct XenConsole *con = opaque;
+XenConsole *con = opaque;
 return ring_free_bytes(con);
 }
 
 static void xencons_receive(void *opaque, const uint8_t *buf, int len)
 {
-struct XenConsole *con = opaque;
+XenConsole *con = opaque;
 struct xencons_interface *intf = con->sring;
 XENCONS_RING_IDX prod;
 int i, max;
@@ -141,10 +152,10 @@ static void xencons_receive(void *opaque, const uint8_t *buf, int len)
 }
 xen_wmb();
 intf->in_prod = prod;
-xen_pv_send_notify(>xendev);
+xen_device_notify_event_channel(XEN_DEVICE(con), con->event_channel, NULL);
 }
 
-static void xencons_send(struct XenConsole *con)
+static bool xencons_send(XenConsole *con)
 {
 ssize_t len, size;
 
@@ -159,174 +170,407 @@ static void xencons_send(struct XenConsole *con)
 if (len < 1) {
 if (!con->backlog) {
 con->backlog = 1;
-xen_pv_printf(&con->xendev, 1,
-  "backlog piling up, nobody listening?\n");
 }
 

[PATCH 12/12] hw/xen: add support for Xen primary console in emulated mode

2023-10-16 Thread David Woodhouse
From: David Woodhouse 

The primary console is special because the toolstack maps a page at a
fixed GFN and also allocates the guest-side event channel. Add support
for that in emulated mode, so that we can have a primary console.

Add a *very* rudimentary stub of foreignmem ops for emulated mode, which
supports literally nothing except a single-page mapping of the console
page. This might as well have been a hack in the xen_console driver, but
this way at least the special-casing is kept within the Xen emulation
code, and it gives us a hook for a more complete implementation if/when
we ever do need one.

Now at last we can boot the Xen PV shim and run PV kernels in QEMU.

Signed-off-by: David Woodhouse 
---
 hw/char/xen_console.c |  12 ++-
 hw/i386/kvm/meson.build   |   1 +
 hw/i386/kvm/trace-events  |   2 +
 hw/i386/kvm/xen-stubs.c   |   5 +
 hw/i386/kvm/xen_gnttab.c  |  32 +-
 hw/i386/kvm/xen_primary_console.c | 167 ++
 hw/i386/kvm/xen_primary_console.h |  22 
 hw/i386/kvm/xen_xenstore.c|   9 ++
 target/i386/kvm/xen-emu.c |  23 +++-
 9 files changed, 266 insertions(+), 7 deletions(-)
 create mode 100644 hw/i386/kvm/xen_primary_console.c
 create mode 100644 hw/i386/kvm/xen_primary_console.h

diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c
index 1a0f5ed3e1..dfc4be0aa1 100644
--- a/hw/char/xen_console.c
+++ b/hw/char/xen_console.c
@@ -32,6 +32,7 @@
 #include "hw/qdev-properties.h"
 #include "hw/qdev-properties-system.h"
 #include "hw/xen/interface/io/console.h"
+#include "hw/i386/kvm/xen_primary_console.h"
 #include "trace.h"
 
 struct buffer {
@@ -334,8 +335,8 @@ static char *xen_console_get_name(XenDevice *xendev, Error **errp)
 XenConsole *con = XEN_CONSOLE_DEVICE(xendev);
 
 if (con->dev == -1) {
+int idx = (xen_mode == XEN_EMULATE) ? 0 : 1;
 char name[11];
-int idx = 1;
 
 /* Theoretically we could go up to INT_MAX here but that's overkill */
 while (idx < 100) {
@@ -386,10 +387,13 @@ static void xen_console_realize(XenDevice *xendev, Error **errp)
  * be mapped directly as foreignmem (not a grant ref), and the guest port
  * was allocated *for* the guest by the toolstack. The guest gets these
  * through HVMOP_get_param and can use the console long before it's got
- * XenStore up and running. We cannot create those for a Xen guest.
+ * XenStore up and running. We cannot create those for a true Xen guest,
+ * but we can for Xen emulation.
  */
 if (!con->dev) {
-if (xen_device_frontend_scanf(xendev, "ring-ref", "%u", &ring_ref) != 1 ||
+if (xen_mode == XEN_EMULATE) {
+xen_primary_console_create();
+} else if (xen_device_frontend_scanf(xendev, "ring-ref", "%u", &ring_ref) != 1 ||
 xen_device_frontend_scanf(xendev, "port", "%u", &port) != 1) {
 error_setg(errp, "cannot create primary Xen console");
 return;
@@ -404,7 +408,7 @@ static void xen_console_realize(XenDevice *xendev, Error **errp)
 }
 
 /* No normal PV driver initialization for the primary console */
-if (!con->dev) {
+if (!con->dev && xen_mode != XEN_EMULATE) {
 xen_console_connect(xendev, errp);
 }
 }
diff --git a/hw/i386/kvm/meson.build b/hw/i386/kvm/meson.build
index ab143d6474..a4a2e23c06 100644
--- a/hw/i386/kvm/meson.build
+++ b/hw/i386/kvm/meson.build
@@ -9,6 +9,7 @@ i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files(
   'xen_evtchn.c',
   'xen_gnttab.c',
   'xen_xenstore.c',
+  'xen_primary_console.c',
   'xenstore_impl.c',
   ))
 
diff --git a/hw/i386/kvm/trace-events b/hw/i386/kvm/trace-events
index e4c82de6f3..67bf7f174e 100644
--- a/hw/i386/kvm/trace-events
+++ b/hw/i386/kvm/trace-events
@@ -18,3 +18,5 @@ xenstore_watch(const char *path, const char *token) "path %s token %s"
 xenstore_unwatch(const char *path, const char *token) "path %s token %s"
 xenstore_reset_watches(void) ""
 xenstore_watch_event(const char *path, const char *token) "path %s token %s"
+xen_primary_console_create(void) ""
+xen_primary_console_reset(int port) "port %u"
diff --git a/hw/i386/kvm/xen-stubs.c b/hw/i386/kvm/xen-stubs.c
index ae406e0b02..10068970fe 100644
--- a/hw/i386/kvm/xen-stubs.c
+++ b/hw/i386/kvm/xen-stubs.c
@@ -15,6 +15,7 @@
 #include "qapi/qapi-commands-misc-target.h"
 
 #include "xen_evtchn.h"
+#include "xen_primary_console.h"
 
 void xen_evtchn_snoop_msi(PCIDevice *dev, bool is_msix, unsigned int vector,
   uint64_t addr, uint32_t data, bool is_masked)
@@ -30,6 +31,10 @@ bool xen_evtchn_deliver_pirq_msi(uint64_t address, uint32_t data)
 return false;
 }
 
+void xen_primary_console_create(void)
+{
+}
+
 #ifdef TARGET_I386
 EvtchnInfoList *qmp_xen_event_list(Error **errp)
 {
diff --git a/hw/i386/kvm/xen_gnttab.c b/hw/i386/kvm/xen_gnttab.c
index 21c30e3659..ea201cd582 100644
--- a/hw/i386/kvm/xen_gnttab.c
+++ b/hw/i386/kvm/xen_gnttab.c
@@ -25,6 

[PATCH 05/12] hw/xen: populate store frontend nodes with XenStore PFN/port

2023-10-16 Thread David Woodhouse
From: David Woodhouse 

This is kind of redundant since, without being able to get these through
some other method (HVMOP_get_param), the guest wouldn't be able to access
XenStore in order to find them. But Xen populates them, and it does
allow guests to *rebind* to the event channel port after a reset.

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_xenstore.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
index d2b311109b..3300e0614a 100644
--- a/hw/i386/kvm/xen_xenstore.c
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -1432,6 +1432,7 @@ static void alloc_guest_port(XenXenstoreState *s)
 int xen_xenstore_reset(void)
 {
 XenXenstoreState *s = xen_xenstore_singleton;
+GList *perms;
 int err;
 
 if (!s) {
@@ -1459,6 +1460,15 @@ int xen_xenstore_reset(void)
 }
 s->be_port = err;
 
+/* Create frontend store nodes */
+perms = g_list_append(NULL, xs_perm_as_string(XS_PERM_NONE, DOMID_QEMU));
+perms = g_list_append(perms, xs_perm_as_string(XS_PERM_READ, xen_domid));
+
+relpath_printf(s, perms, "store/ring-ref", "%lu", XEN_SPECIAL_PFN(XENSTORE));
+relpath_printf(s, perms, "store/port", "%u", s->be_port);
+
+g_list_free_full(perms, g_free);
+
 /*
  * We don't actually access the guest's page through the grant, because
  * this isn't real Xen, and we can just use the page we gave it in the
-- 
2.40.1




[PATCH 04/12] i386/xen: advertise XEN_HVM_CPUID_UPCALL_VECTOR in CPUID

2023-10-16 Thread David Woodhouse
From: David Woodhouse 

This will allow Linux guests (since v6.0) to use the per-vCPU upcall
vector delivered as MSI through the local APIC.

Signed-off-by: David Woodhouse 
---
 target/i386/kvm/kvm.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f6c7f7e268..8bdbdcdffe 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1887,6 +1887,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
 c->eax |= XEN_HVM_CPUID_VCPU_ID_PRESENT;
 c->ebx = cs->cpu_index;
 }
+
+if (cs->kvm_state->xen_version >= XEN_VERSION(4, 17)) {
+c->eax |= XEN_HVM_CPUID_UPCALL_VECTOR;
+}
 }
 
 r = kvm_xen_init_vcpu(cs);
-- 
2.40.1




[PATCH 02/12] hw/xen: select kernel mode for per-vCPU event channel upcall vector

2023-10-16 Thread David Woodhouse
From: David Woodhouse 

A guest which has configured the per-vCPU upcall vector may set the
HVM_PARAM_CALLBACK_IRQ param to fairly much anything other than zero.

For example, Linux v6.0+ after commit b1c3497e604 ("x86/xen: Add support
for HVMOP_set_evtchn_upcall_vector") will just do this after setting the
vector:

   /* Trick toolstack to think we are enlightened. */
   if (!cpu)
   rc = xen_set_callback_via(1);

That's explicitly setting the delivery to GSI#, but it's supposed to be
overridden by the per-vCPU vector setting. This mostly works in QEMU
*except* for the logic to enable the in-kernel handling of event channels,
which falsely determines that the kernel cannot accelerate GSI delivery
in this case.

Add a kvm_xen_has_vcpu_callback_vector() to report whether vCPU#0 has
the vector set, and use that in xen_evtchn_set_callback_param() to
enable the kernel acceleration features even when the param *appears*
to be set to target a GSI.

Preserve the Xen behaviour that when HVM_PARAM_CALLBACK_IRQ is set to
*zero* the event channel delivery is disabled completely. (Which is
what that bizarre guest behaviour is working round in the first place.)

Fixes: 91cce756179 ("hw/xen: Add xen_evtchn device for event channel emulation")
Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 6 ++
 include/sysemu/kvm_xen.h  | 1 +
 target/i386/kvm/xen-emu.c | 7 +++
 3 files changed, 14 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 4df973022c..d72dca6591 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -490,6 +490,12 @@ int xen_evtchn_set_callback_param(uint64_t param)
 break;
 }
 
+/* If the guest has set a per-vCPU callback vector, prefer that. */
+if (gsi && kvm_xen_has_vcpu_callback_vector()) {
+in_kernel = kvm_xen_has_cap(EVTCHN_SEND);
+gsi = 0;
+}
+
 if (!ret) {
 /* If vector delivery was turned *off* then tell the kernel */
 if ((s->callback_param >> CALLBACK_VIA_TYPE_SHIFT) ==
diff --git a/include/sysemu/kvm_xen.h b/include/sysemu/kvm_xen.h
index 595abfbe40..961c702c4e 100644
--- a/include/sysemu/kvm_xen.h
+++ b/include/sysemu/kvm_xen.h
@@ -22,6 +22,7 @@
 int kvm_xen_soft_reset(void);
 uint32_t kvm_xen_get_caps(void);
 void *kvm_xen_get_vcpu_info_hva(uint32_t vcpu_id);
+bool kvm_xen_has_vcpu_callback_vector(void);
 void kvm_xen_inject_vcpu_callback_vector(uint32_t vcpu_id, int type);
 void kvm_xen_set_callback_asserted(void);
 int kvm_xen_set_vcpu_virq(uint32_t vcpu_id, uint16_t virq, uint16_t port);
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index b49a840438..477e93cd92 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -424,6 +424,13 @@ void kvm_xen_set_callback_asserted(void)
 }
 }
 
+bool kvm_xen_has_vcpu_callback_vector(void)
+{
+CPUState *cs = qemu_get_cpu(0);
+
+return cs && !!X86_CPU(cs)->env.xen_vcpu_callback_vector;
+}
+
 void kvm_xen_inject_vcpu_callback_vector(uint32_t vcpu_id, int type)
 {
 CPUState *cs = qemu_get_cpu(vcpu_id);
-- 
2.40.1




[PATCH 0/12] Get Xen PV shim running in qemu

2023-10-16 Thread David Woodhouse
I hadn't got round to getting the PV shim running yet; I thought it would
need work on the multiboot loader. Turns out it doesn't. I *did* need to
fix a couple of brown-paper-bag bugs in the per-vCPU upcall vector support,
and implement Xen console support though. Now I can test PV guests:

 $ qemu-system-x86_64 --accel kvm,xen-version=0x40011,kernel-irqchip=split \
   -chardev stdio,mux=on,id=char0 -device xen-console,chardev=char0 \
   -drive file=${GUEST_IMAGE},if=xen -display none -m 1G \
   -kernel ~/git/xen/xen/xen -initrd ~/git/linux/arch/x86/boot/bzImage \
   -append "loglvl=all -- console=hvc0 root=/dev/xvda1"

 blockdev.c |  15 +++-
 hw/block/xen-block.c   |  26 +-
 hw/char/trace-events   |   8 ++
 hw/char/xen_console.c  | 522 
--
 hw/i386/kvm/meson.build|   1 +
 hw/i386/kvm/trace-events   |   2 +
 hw/i386/kvm/xen-stubs.c|   5 ++
 hw/i386/kvm/xen_evtchn.c   |   6 ++
 hw/i386/kvm/xen_gnttab.c   |  32 ++-
 hw/i386/kvm/xen_primary_console.c  | 167 
++
 hw/i386/kvm/xen_primary_console.h  |  22 +
 hw/i386/kvm/xen_xenstore.c |  21 -
 hw/xen/xen-backend.c   |  81 +
 hw/xen/xen-bus.c   |  21 -
 hw/xen/xen-legacy-backend.c|   1 -
 include/hw/xen/interface/arch-arm.h|  37 
 include/hw/xen/interface/arch-x86/cpuid.h  |  31 +++
 include/hw/xen/interface/arch-x86/xen-x86_32.h |  19 +---
 include/hw/xen/interface/arch-x86/xen-x86_64.h |  19 +---
 include/hw/xen/interface/arch-x86/xen.h|  26 +-
 include/hw/xen/interface/event_channel.h   |  19 +---
 include/hw/xen/interface/features.h|  19 +---
 include/hw/xen/interface/grant_table.h |  19 +---
 include/hw/xen/interface/hvm/hvm_op.h  |  19 +---
 include/hw/xen/interface/hvm/params.h  |  19 +---
 include/hw/xen/interface/io/blkif.h|  27 ++
 include/hw/xen/interface/io/console.h  |  19 +---
 include/hw/xen/interface/io/fbif.h |  19 +---
 include/hw/xen/interface/io/kbdif.h|  19 +---
 include/hw/xen/interface/io/netif.h|  25 ++
 include/hw/xen/interface/io/protocols.h|  19 +---
 include/hw/xen/interface/io/ring.h |  49 +-
 include/hw/xen/interface/io/usbif.h|  19 +---
 include/hw/xen/interface/io/xenbus.h   |  19 +---
 include/hw/xen/interface/io/xs_wire.h  |  36 
 include/hw/xen/interface/memory.h  |  30 +++
 include/hw/xen/interface/physdev.h |  23 +
 include/hw/xen/interface/sched.h   |  19 +---
 include/hw/xen/interface/trace.h   |  19 +---
 include/hw/xen/interface/vcpu.h|  19 +---
 include/hw/xen/interface/version.h |  19 +---
 include/hw/xen/interface/xen-compat.h  |  19 +---
 include/hw/xen/interface/xen.h |  19 +---
 include/hw/xen/xen-backend.h   |   4 +
 include/hw/xen/xen-bus.h   |   2 +
 include/sysemu/kvm_xen.h   |   1 +
 target/i386/kvm/kvm.c  |   4 +
 target/i386/kvm/xen-emu.c  |  35 ++--
 48 files changed, 941 insertions(+), 680 deletions(-)





[PATCH 08/12] hw/xen: do not repeatedly try to create a failing backend device

2023-10-16 Thread David Woodhouse
From: David Woodhouse 

If xen_backend_device_create() fails to instantiate a device, the XenBus
code will just keep trying over and over again each time the bus is
re-enumerated, as long as the backend appears online and in
XenbusStateInitialising.

The only thing which prevents the XenBus code from recreating duplicates
of devices which already exist, is the fact that xen_device_realize()
sets the backend state to XenbusStateInitWait. If the attempt to create
the device doesn't get *that* far, that's when it will keep getting
retried.

My first thought was to handle errors by setting the backend state to
XenbusStateClosed, but that doesn't work for XenConsole which wants to
*ignore* any device of type != "ioemu" completely.

So, make xen_backend_device_create() *keep* the XenBackendInstance for a
failed device, and provide a new xen_backend_exists() function to allow
xen_bus_type_enumerate() to check whether one already exists before
creating a new one.

Signed-off-by: David Woodhouse 
---
 hw/xen/xen-backend.c | 27 +--
 hw/xen/xen-bus.c |  3 ++-
 include/hw/xen/xen-backend.h |  1 +
 3 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/hw/xen/xen-backend.c b/hw/xen/xen-backend.c
index 5b0fb76eae..b9bf70a9f5 100644
--- a/hw/xen/xen-backend.c
+++ b/hw/xen/xen-backend.c
@@ -101,6 +101,24 @@ static XenBackendInstance *xen_backend_list_find(XenDevice 
*xendev)
 return NULL;
 }
 
+bool xen_backend_exists(const char *type, const char *name)
+{
+const XenBackendImpl *impl = xen_backend_table_lookup(type);
+XenBackendInstance *backend;
+
+if (!impl) {
+return false;
+}
+
+QLIST_FOREACH(backend, &backend_list, entry) {
+if (backend->impl == impl && !strcmp(backend->name, name)) {
+return true;
+}
+}
+
+return false;
+}
+
 static void xen_backend_list_remove(XenBackendInstance *backend)
 {
 QLIST_REMOVE(backend, entry);
@@ -122,11 +140,6 @@ void xen_backend_device_create(XenBus *xenbus, const char 
*type,
 backend->name = g_strdup(name);
 
 impl->create(backend, opts, errp);
-if (*errp) {
-g_free(backend->name);
-g_free(backend);
-return;
-}
 
 backend->impl = impl;
 xen_backend_list_add(backend);
@@ -165,7 +178,9 @@ bool xen_backend_try_device_destroy(XenDevice *xendev, 
Error **errp)
 }
 
 impl = backend->impl;
-impl->destroy(backend, errp);
+if (backend->xendev) {
+impl->destroy(backend, errp);
+}
 
 xen_backend_list_remove(backend);
 g_free(backend->name);
diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index cc524ed92c..0da2aa219a 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -209,7 +209,8 @@ static void xen_bus_type_enumerate(XenBus *xenbus, const 
char *type)
   NULL, "%u", ) != 1)
 online = 0;
 
-if (online && state == XenbusStateInitialising) {
+if (online && state == XenbusStateInitialising &&
+!xen_backend_exists(type, backend[i])) {
 Error *local_err = NULL;
 
 xen_bus_backend_create(xenbus, type, backend[i], backend_path,
diff --git a/include/hw/xen/xen-backend.h b/include/hw/xen/xen-backend.h
index aac2fd454d..0f01631ae7 100644
--- a/include/hw/xen/xen-backend.h
+++ b/include/hw/xen/xen-backend.h
@@ -33,6 +33,7 @@ XenDevice *xen_backend_get_device(XenBackendInstance 
*backend);
 void xen_backend_register(const XenBackendInfo *info);
 const char **xen_backend_get_types(unsigned int *nr);
 
+bool xen_backend_exists(const char *type, const char *name);
 void xen_backend_device_create(XenBus *xenbus, const char *type,
const char *name, QDict *opts, Error **errp);
 bool xen_backend_try_device_destroy(XenDevice *xendev, Error **errp);
-- 
2.40.1




[PATCH 11/12] hw/xen: automatically assign device index to block devices

2023-10-16 Thread David Woodhouse
From: David Woodhouse 

There's no need to force the user to assign a vdev. We can automatically
assign one, starting at xvda and searching until we find the first disk
name that's unused.

This means we can now allow '-drive if=xen,file=xxx' to work without an
explicit separate -device argument, just like if=virtio.
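
For illustration, a minimal invocation this enables could look like the
following (the image path and accelerator options are placeholders, not taken
from this patch):

 $ qemu-system-x86_64 --accel kvm,xen-version=0x40011,kernel-irqchip=split \
   -drive if=xen,file=guest.qcow2 -display none -m 1G

The vdev name is then picked automatically, starting from xvda.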

Signed-off-by: David Woodhouse 
---
 blockdev.c   | 15 ---
 hw/block/xen-block.c | 25 +
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 325b7a3bef..9dec4c9c74 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -255,13 +255,13 @@ void drive_check_orphaned(void)
  * Ignore default drives, because we create certain default
  * drives unconditionally, then leave them unclaimed.  Not the
  * users fault.
- * Ignore IF_VIRTIO, because it gets desugared into -device,
- * so we can leave failing to -device.
+ * Ignore IF_VIRTIO or IF_XEN, because it gets desugared into
+ * -device, so we can leave failing to -device.
  * Ignore IF_NONE, because leaving unclaimed IF_NONE remains
  * available for device_add is a feature.
  */
 if (dinfo->is_default || dinfo->type == IF_VIRTIO
-|| dinfo->type == IF_NONE) {
+|| dinfo->type == IF_XEN || dinfo->type == IF_NONE) {
 continue;
 }
 if (!blk_get_attached_dev(blk)) {
@@ -977,6 +977,15 @@ DriveInfo *drive_new(QemuOpts *all_opts, 
BlockInterfaceType block_default_type,
 qemu_opt_set(devopts, "driver", "virtio-blk", _abort);
 qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"),
  _abort);
+} else if (type == IF_XEN) {
+QemuOpts *devopts;
+devopts = qemu_opts_create(qemu_find_opts("device"), NULL, 0,
+   _abort);
+qemu_opt_set(devopts, "driver",
+ (media == MEDIA_CDROM) ? "xen-cdrom" : "xen-disk",
+ _abort);
+qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"),
+ _abort);
 }
 
 filename = qemu_opt_get(legacy_opts, "file");
diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index 9262338535..20fa783cbe 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -34,6 +34,31 @@ static char *xen_block_get_name(XenDevice *xendev, Error 
**errp)
 XenBlockDevice *blockdev = XEN_BLOCK_DEVICE(xendev);
 XenBlockVdev *vdev = &blockdev->props.vdev;
 
+if (blockdev->props.vdev.type == XEN_BLOCK_VDEV_TYPE_INVALID) {
+char name[11];
+int disk = 0;
+unsigned long idx;
+
+/* Find an unoccupied device name */
+while (disk < (1 << 20)) {
+if (disk < (1 << 4)) {
+idx = (202 << 8) | (disk << 4);
+} else {
+idx = (1 << 28) | (disk << 8);
+}
+snprintf(name, sizeof(name), "%lu", idx);
+if (!xen_backend_exists("qdisk", name)) {
+vdev->type = XEN_BLOCK_VDEV_TYPE_XVD;
+vdev->partition = 0;
+vdev->disk = disk;
+vdev->number = idx;
+return g_strdup(name);
+}
+disk++;
+}
+error_setg(errp, "cannot find device vdev for block device");
+return NULL;
+}
 return g_strdup_printf("%lu", vdev->number);
 }
 
-- 
2.40.1




[PATCH 10/12] hw/xen: automatically assign device index to console devices

2023-10-16 Thread David Woodhouse
From: David Woodhouse 

Now that we can reliably tell whether a given device already exists, we
can allow the user to add console devices on the command line with just
'-device xen-console,chardev=foo'.

Start at 1, because we can't add the *primary* console; that's special
because the toolstack has to set up the guest end of that.
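
For example (the chardev id and backend are placeholders, not taken from this
patch), an additional console could then be added with something like:

 $ qemu-system-x86_64 ... -chardev pty,id=con1 -device xen-console,chardev=con1

and it would be assigned the first free index, i.e. 1.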

Signed-off-by: David Woodhouse 
---
 hw/char/xen_console.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c
index 2825b8c511..1a0f5ed3e1 100644
--- a/hw/char/xen_console.c
+++ b/hw/char/xen_console.c
@@ -333,6 +333,22 @@ static char *xen_console_get_name(XenDevice *xendev, Error 
**errp)
 {
 XenConsole *con = XEN_CONSOLE_DEVICE(xendev);
 
+if (con->dev == -1) {
+char name[11];
+int idx = 1;
+
+/* Theoretically we could go up to INT_MAX here but that's overkill */
+while (idx < 100) {
+snprintf(name, sizeof(name), "%u", idx);
+if (!xen_backend_exists("console", name)) {
+con->dev = idx;
+return g_strdup(name);
+}
+idx++;
+}
+error_setg(errp, "cannot find device index for console device");
+return NULL;
+}
 return g_strdup_printf("%u", con->dev);
 }
 
-- 
2.40.1




[PATCH 03/12] include: update Xen public headers to Xen 4.17.2 release

2023-10-16 Thread David Woodhouse
From: David Woodhouse 

... in order to advertise the XEN_HVM_CPUID_UPCALL_VECTOR feature,
which will come in a subsequent commit.

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_xenstore.c|  2 +-
 include/hw/xen/interface/arch-arm.h   | 37 +++---
 include/hw/xen/interface/arch-x86/cpuid.h | 31 +---
 .../hw/xen/interface/arch-x86/xen-x86_32.h| 19 +--
 .../hw/xen/interface/arch-x86/xen-x86_64.h| 19 +--
 include/hw/xen/interface/arch-x86/xen.h   | 26 ++
 include/hw/xen/interface/event_channel.h  | 19 +--
 include/hw/xen/interface/features.h   | 19 +--
 include/hw/xen/interface/grant_table.h| 19 +--
 include/hw/xen/interface/hvm/hvm_op.h | 19 +--
 include/hw/xen/interface/hvm/params.h | 19 +--
 include/hw/xen/interface/io/blkif.h   | 27 --
 include/hw/xen/interface/io/console.h | 19 +--
 include/hw/xen/interface/io/fbif.h| 19 +--
 include/hw/xen/interface/io/kbdif.h   | 19 +--
 include/hw/xen/interface/io/netif.h   | 25 +++---
 include/hw/xen/interface/io/protocols.h   | 19 +--
 include/hw/xen/interface/io/ring.h| 49 ++-
 include/hw/xen/interface/io/usbif.h   | 19 +--
 include/hw/xen/interface/io/xenbus.h  | 19 +--
 include/hw/xen/interface/io/xs_wire.h | 36 ++
 include/hw/xen/interface/memory.h | 30 +---
 include/hw/xen/interface/physdev.h| 23 ++---
 include/hw/xen/interface/sched.h  | 19 +--
 include/hw/xen/interface/trace.h  | 19 +--
 include/hw/xen/interface/vcpu.h   | 19 +--
 include/hw/xen/interface/version.h| 19 +--
 include/hw/xen/interface/xen-compat.h | 19 +--
 include/hw/xen/interface/xen.h| 19 +--
 29 files changed, 124 insertions(+), 523 deletions(-)

diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
index 660d0b72f9..d2b311109b 100644
--- a/hw/i386/kvm/xen_xenstore.c
+++ b/hw/i386/kvm/xen_xenstore.c
@@ -331,7 +331,7 @@ static void xs_error(XenXenstoreState *s, unsigned int id,
 const char *errstr = NULL;
 
 for (unsigned int i = 0; i < ARRAY_SIZE(xsd_errors); i++) {
-struct xsd_errors *xsd_error = &xsd_errors[i];
+const struct xsd_errors *xsd_error = &xsd_errors[i];
 
 if (xsd_error->errnum == errnum) {
 errstr = xsd_error->errstring;
diff --git a/include/hw/xen/interface/arch-arm.h 
b/include/hw/xen/interface/arch-arm.h
index 94b31511dd..1528ced509 100644
--- a/include/hw/xen/interface/arch-arm.h
+++ b/include/hw/xen/interface/arch-arm.h
@@ -1,26 +1,9 @@
+/* SPDX-License-Identifier: MIT */
 /**
  * arch-arm.h
  *
  * Guest OS interface to ARM Xen.
  *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to
- * deal in the Software without restriction, including without limitation the
- * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
- * sell copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
- * DEALINGS IN THE SOFTWARE.
- *
  * Copyright 2011 (C) Citrix Systems
  */
 
@@ -361,6 +344,7 @@ typedef uint64_t xen_callback_t;
 #define PSR_DBG_MASK(1<<9)/* arm64: Debug Exception mask */
 #define PSR_IT_MASK (0x0600fc00)  /* Thumb If-Then Mask */
 #define PSR_JAZELLE (1<<24)   /* Jazelle Mode */
+#define PSR_Z   (1<<30)   /* Zero condition flag */
 
 /* 32 bit modes */
 #define PSR_MODE_USR 0x10
@@ -383,7 +367,15 @@ typedef uint64_t xen_callback_t;
 #define PSR_MODE_EL1t 0x04
 #define PSR_MODE_EL0t 0x00
 
-#define PSR_GUEST32_INIT  (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC)
+/*
+ * We set PSR_Z to be able to boot Linux kernel versions with an invalid
+ * encoding of the first 8 NOP instructions. See commit a92882a4d270 in
+ * Linux.
+ *
+ * Note that PSR_Z is also set by U-Boot and QEMU -kernel when loading
+ * zImage kernels on aarch32.
+ */
+#define PSR_GUEST32_INIT 

[PATCH 06/12] hw/xen: add get_frontend_path() method to XenDeviceClass

2023-10-16 Thread David Woodhouse
From: David Woodhouse 

The primary Xen console is special. The guest's side is set up for it by
the toolstack automatically and not by the standard PV init sequence.

Accordingly, its *frontend* doesn't appear in …/device/console/0 either;
instead it appears under …/console in the guest's XenStore node.

To allow the Xen console driver to override the frontend path for the
primary console, add a method to the XenDeviceClass which can be used
instead of the standard xen_device_get_frontend_path()
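
As a rough sketch (not part of this patch; the device type, path layout and
function names are illustrative only), a backend could hook the new method
like this:

static char *my_console_get_frontend_path(XenDevice *xendev, Error **errp)
{
    /* primary console: the frontend lives directly under the guest's node */
    return g_strdup_printf("/local/domain/%u/console", xendev->frontend_id);
}

static void my_console_class_init(ObjectClass *class, void *data)
{
    XenDeviceClass *xendev_class = XEN_DEVICE_CLASS(class);

    /* devices that leave this NULL keep using xen_device_get_frontend_path() */
    xendev_class->get_frontend_path = my_console_get_frontend_path;
}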

Signed-off-by: David Woodhouse 
---
 hw/xen/xen-bus.c | 10 +-
 include/hw/xen/xen-bus.h |  2 ++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index ece8ec40cd..cc524ed92c 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -711,8 +711,16 @@ static void xen_device_frontend_create(XenDevice *xendev, 
Error **errp)
 {
 ERRP_GUARD();
 XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev)));
+XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev);
 
-xendev->frontend_path = xen_device_get_frontend_path(xendev);
+if (xendev_class->get_frontend_path) {
+xendev->frontend_path = xendev_class->get_frontend_path(xendev, errp);
+if (!xendev->frontend_path) {
+return;
+}
+} else {
+xendev->frontend_path = xen_device_get_frontend_path(xendev);
+}
 
 /*
  * The frontend area may have already been created by a legacy
diff --git a/include/hw/xen/xen-bus.h b/include/hw/xen/xen-bus.h
index f435898164..eb440880b5 100644
--- a/include/hw/xen/xen-bus.h
+++ b/include/hw/xen/xen-bus.h
@@ -33,6 +33,7 @@ struct XenDevice {
 };
 typedef struct XenDevice XenDevice;
 
+typedef char *(*XenDeviceGetFrontendPath)(XenDevice *xendev, Error **errp);
 typedef char *(*XenDeviceGetName)(XenDevice *xendev, Error **errp);
 typedef void (*XenDeviceRealize)(XenDevice *xendev, Error **errp);
 typedef void (*XenDeviceFrontendChanged)(XenDevice *xendev,
@@ -46,6 +47,7 @@ struct XenDeviceClass {
 /*< public >*/
 const char *backend;
 const char *device;
+XenDeviceGetFrontendPath get_frontend_path;
 XenDeviceGetName get_name;
 XenDeviceRealize realize;
 XenDeviceFrontendChanged frontend_changed;
-- 
2.40.1




[PATCH 09/12] hw/xen: prevent duplicate device registrations

2023-10-16 Thread David Woodhouse
From: David Woodhouse 

Ensure that we have a XenBackendInstance for every device regardless
of whether it was "discovered" in XenStore or created directly in QEMU.

This allows the backend_list to be a source of truth about whether a
given backend exists, and allows us to reject duplicates.

This also cleans up the fact that backend drivers were calling
xen_backend_set_device() with a XenDevice immediately after calling
qdev_realize_and_unref() on it, when it wasn't theirs to play with any
more.

Signed-off-by: David Woodhouse 
---
 hw/block/xen-block.c |  1 -
 hw/char/xen_console.c|  2 +-
 hw/xen/xen-backend.c | 78 ++--
 hw/xen/xen-bus.c |  8 
 include/hw/xen/xen-backend.h |  3 ++
 5 files changed, 69 insertions(+), 23 deletions(-)

diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index a07cd7eb5d..9262338535 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -975,7 +975,6 @@ static void xen_block_device_create(XenBackendInstance 
*backend,
 goto fail;
 }
 
-xen_backend_set_device(backend, xendev);
 return;
 
 fail:
diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c
index bd20be116c..2825b8c511 100644
--- a/hw/char/xen_console.c
+++ b/hw/char/xen_console.c
@@ -468,7 +468,7 @@ static void xen_console_device_create(XenBackendInstance 
*backend,
 Chardev *cd = NULL;
 struct qemu_xs_handle *xsh = xenbus->xsh;
 
-if (qemu_strtoul(name, NULL, 10, &number)) {
+if (qemu_strtoul(name, NULL, 10, &number) || number >= INT_MAX) {
 error_setg(errp, "failed to parse name '%s'", name);
 goto fail;
 }
diff --git a/hw/xen/xen-backend.c b/hw/xen/xen-backend.c
index b9bf70a9f5..dcb4329258 100644
--- a/hw/xen/xen-backend.c
+++ b/hw/xen/xen-backend.c
@@ -101,22 +101,28 @@ static XenBackendInstance 
*xen_backend_list_find(XenDevice *xendev)
 return NULL;
 }
 
-bool xen_backend_exists(const char *type, const char *name)
+static XenBackendInstance *xen_backend_lookup(const XenBackendImpl *impl, 
const char *name)
 {
-const XenBackendImpl *impl = xen_backend_table_lookup(type);
 XenBackendInstance *backend;
 
-if (!impl) {
-return false;
-}
-
 QLIST_FOREACH(backend, &backend_list, entry) {
 if (backend->impl == impl && !strcmp(backend->name, name)) {
-return true;
+return backend;
 }
 }
 
-return false;
+return NULL;
+}
+
+bool xen_backend_exists(const char *type, const char *name)
+{
+const XenBackendImpl *impl = xen_backend_table_lookup(type);
+
+if (!impl) {
+return false;
+}
+
+return !!xen_backend_lookup(impl, name);
 }
 
 static void xen_backend_list_remove(XenBackendInstance *backend)
@@ -138,11 +144,10 @@ void xen_backend_device_create(XenBus *xenbus, const char 
*type,
 backend = g_new0(XenBackendInstance, 1);
 backend->xenbus = xenbus;
 backend->name = g_strdup(name);
-
-impl->create(backend, opts, errp);
-
 backend->impl = impl;
 xen_backend_list_add(backend);
+
+impl->create(backend, opts, errp);
 }
 
 XenBus *xen_backend_get_bus(XenBackendInstance *backend)
@@ -155,13 +160,6 @@ const char *xen_backend_get_name(XenBackendInstance 
*backend)
 return backend->name;
 }
 
-void xen_backend_set_device(XenBackendInstance *backend,
-XenDevice *xendev)
-{
-g_assert(!backend->xendev);
-backend->xendev = xendev;
-}
-
 XenDevice *xen_backend_get_device(XenBackendInstance *backend)
 {
 return backend->xendev;
@@ -178,9 +176,7 @@ bool xen_backend_try_device_destroy(XenDevice *xendev, 
Error **errp)
 }
 
 impl = backend->impl;
-if (backend->xendev) {
-impl->destroy(backend, errp);
-}
+impl->destroy(backend, errp);
 
 xen_backend_list_remove(backend);
 g_free(backend->name);
@@ -188,3 +184,43 @@ bool xen_backend_try_device_destroy(XenDevice *xendev, 
Error **errp)
 
 return true;
 }
+
+bool xen_backend_device_realized(XenDevice *xendev, Error **errp)
+{
+XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev);
+const char *type = xendev_class->backend ? : 
object_get_typename(OBJECT(xendev));
+const XenBackendImpl *impl = xen_backend_table_lookup(type);
+XenBackendInstance *backend;
+
+if (!impl) {
+return false;
+}
+
+backend = xen_backend_lookup(impl, xendev->name);
+if (backend) {
+if (backend->xendev && backend->xendev != xendev) {
+error_setg(errp, "device %s/%s already exists", type, 
xendev->name);
+return false;
+}
+backend->xendev = xendev;
+return true;
+}
+
+backend = g_new0(XenBackendInstance, 1);
+backend->xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev)));
+backend->xendev = xendev;
+backend->name = g_strdup(xendev->name);
+backend->impl = impl;
+
+xen_backend_list_add(backend);
+return true;
+}
+
+void 

[PATCH 01/12] i386/xen: fix per-vCPU upcall vector for Xen emulation

2023-10-16 Thread David Woodhouse
From: David Woodhouse 

The per-vCPU upcall vector support had two problems. Firstly it was
using the wrong hypercall argument and would always return -EFAULT.
And secondly it was using the wrong ioctl() to pass the vector to
the kernel and thus the *kernel* would always return -EINVAL.

Linux doesn't (yet) use this mode so it went without decent testing
for a while.

Fixes: 105b47fdf2d0 ("i386/xen: implement HVMOP_set_evtchn_upcall_vector")
Signed-off-by: David Woodhouse 
---
 target/i386/kvm/xen-emu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index c4fa84a982..b49a840438 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -306,7 +306,7 @@ static int kvm_xen_set_vcpu_callback_vector(CPUState *cs)
 
 trace_kvm_xen_set_vcpu_callback(cs->cpu_index, vector);
 
-return kvm_vcpu_ioctl(cs, KVM_XEN_HVM_SET_ATTR, &xva);
+return kvm_vcpu_ioctl(cs, KVM_XEN_VCPU_SET_ATTR, &xva);
 }
 
 static void do_set_vcpu_callback_vector(CPUState *cs, run_on_cpu_data data)
@@ -849,8 +849,7 @@ static bool kvm_xen_hcall_hvm_op(struct kvm_xen_exit *exit, 
X86CPU *cpu,
 int ret = -ENOSYS;
 switch (cmd) {
 case HVMOP_set_evtchn_upcall_vector:
-ret = kvm_xen_hcall_evtchn_upcall_vector(exit, cpu,
- exit->u.hcall.params[0]);
+ret = kvm_xen_hcall_evtchn_upcall_vector(exit, cpu, arg);
 break;
 
 case HVMOP_pagetable_dying:
-- 
2.40.1




Re: [RFC PATCH v3 02/78] block: add fallthrough pseudo-keyword

2023-10-16 Thread Stefan Hajnoczi
On Fri, Oct 13, 2023 at 11:45:30AM +0300, Emmanouil Pitsidianakis wrote:
> In preparation of raising -Wimplicit-fallthrough to 5, replace all
> fall-through comments with the fallthrough attribute pseudo-keyword.
> 
> Signed-off-by: Emmanouil Pitsidianakis 
> ---
>  block/block-copy.c|  1 +
>  block/file-posix.c|  1 +
>  block/io.c|  1 +
>  block/iscsi.c |  1 +
>  block/qcow2-cluster.c |  5 -
>  block/vhdx.c  | 17 +
>  6 files changed, 21 insertions(+), 5 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [RFC PATCH 00/78] Strict disable implicit fallthrough

2023-10-16 Thread Peter Maydell
On Mon, 16 Oct 2023 at 15:58, Manos Pitsidianakis
 wrote:
>
> Hello Peter,
>
> On Mon, 16 Oct 2023, 17:13 Peter Maydell,  wrote:
>>
>> On Fri, 13 Oct 2023 at 13:42, Markus Armbruster  wrote:
>> >
>> > Emmanouil Pitsidianakis  writes:
>> >
>> > > Hello,
>> > >
>> > > This RFC is inspired by the kernel's move to -Wimplicit-fallthrough=3
>> > > back in 2019.[0]
>> > > We take one step (or two) further by increasing it to 5 which rejects
>> > > fall through comments and requires an attribute statement.
>> > >
>> > > [0]:
>> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a035d552a93b
>> > >
>> > > The line differences are not many, but they spread all over different
>> > > subsystems, architectures and devices. An attempt has been made to split
>> > > them in cohesive patches to aid post-RFC review. Part of the RFC is to
>> > > determine whether these patch divisions needs improvement.
>> > >
>> > > Main questions this RFC poses
>> > > =
>> > >
>> > > - Is this change desirable and net-positive.
>> >
>> > Unwanted fallthrough is an easy mistake to make, and
>> > -Wimplicit-fallthrough=N helps avoid it.  The question is how far up we
>> > need to push N.  Right now we're at N=2.  Has unwanted fallthrough been
>> > a problem?
>>
>> Mmm, this is my opinion I think. We have a mechanism for
>> catching "forgot the 'break'" already (our =2 setting) and
>> a way to say "intentional" in a fairly natural way (add the
>> comment). Does pushing N up any further gain us anything
>> except a load of churn?
>>
>> Also, the compiler is not the only thing that processes our
>> code: Coverity also looks for "unexpected fallthrough" issues,
>> so if we wanted to switch away from our current practice we
>> should check whether what we're switching to is an idiom
>> that Coverity recognises.
>
>
> It is a code style change as the cover letter mentions, it's not related to 
> the static analysis itself.

Yes, exactly. As a code style change it needs a fairly high level
of justification for the code churn, and the cover letter
doesn't really provide one...

thanks
-- PMM



Re: [RFC PATCH v3 08/78] hw/block: add fallthrough pseudo-keyword

2023-10-16 Thread Stefan Hajnoczi
On Fri, Oct 13, 2023 at 11:45:36AM +0300, Emmanouil Pitsidianakis wrote:
> In preparation of raising -Wimplicit-fallthrough to 5, replace all
> fall-through comments with the fallthrough attribute pseudo-keyword.
> 
> Signed-off-by: Emmanouil Pitsidianakis 
> ---
>  hw/block/dataplane/xen-block.c | 4 ++--
>  hw/block/m25p80.c  | 2 +-
>  hw/block/onenand.c | 2 +-
>  hw/block/pflash_cfi01.c| 1 +
>  hw/block/pflash_cfi02.c| 6 --
>  5 files changed, 9 insertions(+), 6 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [RFC PATCH 00/78] Strict disable implicit fallthrough

2023-10-16 Thread Manos Pitsidianakis
Hello Peter,

On Mon, 16 Oct 2023, 17:13 Peter Maydell,  wrote:

> On Fri, 13 Oct 2023 at 13:42, Markus Armbruster  wrote:
> >
> > Emmanouil Pitsidianakis  writes:
> >
> > > Hello,
> > >
> > > This RFC is inspired by the kernel's move to -Wimplicit-fallthrough=3
> > > back in 2019.[0]
> > > We take one step (or two) further by increasing it to 5 which rejects
> > > fall through comments and requires an attribute statement.
> > >
> > > [0]:
> > >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a035d552a93b
> > >
> > > The line differences are not many, but they spread all over different
> > > subsystems, architectures and devices. An attempt has been made to
> split
> > > them in cohesive patches to aid post-RFC review. Part of the RFC is to
> > > determine whether these patch divisions needs improvement.
> > >
> > > Main questions this RFC poses
> > > =
> > >
> > > - Is this change desirable and net-positive.
> >
> > Unwanted fallthrough is an easy mistake to make, and
> > -Wimplicit-fallthrough=N helps avoid it.  The question is how far up we
> > need to push N.  Right now we're at N=2.  Has unwanted fallthrough been
> > a problem?
>
> Mmm, this is my opinion I think. We have a mechanism for
> catching "forgot the 'break'" already (our =2 setting) and
> a way to say "intentional" in a fairly natural way (add the
> comment). Does pushing N up any further gain us anything
> except a load of churn?
>
> Also, the compiler is not the only thing that processes our
> code: Coverity also looks for "unexpected fallthrough" issues,
> so if we wanted to switch away from our current practice we
> should check whether what we're switching to is an idiom
> that Coverity recognises.
>

It is a code style change as the cover letter mentions, it's not related to
the static analysis itself.

--
Manos

>


Re: [PATCH v2] block/file-posix: fix update_zones_wp() caller

2023-10-16 Thread Sam Li
Sam Li wrote on Friday, 25 August 2023 at 12:06:
>
> When the zoned request fails, it needs to update only the wp of
> the target zones for not disrupting the in-flight writes on
> these other zones. The wp is updated successfully after the
> request completes.
>
> Fix the callers with the right offset and nr_zones.
>
> Signed-off-by: Sam Li 
> ---
>  block/file-posix.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Ping?

>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index b16e9c21a1..55e7f06a2f 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2522,7 +2522,8 @@ out:
>  }
>  } else {
>  if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
> -update_zones_wp(bs, s->fd, 0, 1);
> +/* write and append write are not allowed to cross zone boundaries */
> +update_zones_wp(bs, s->fd, offset, 1);
>  }
>  }
>
> @@ -3472,7 +3473,7 @@ static int coroutine_fn 
> raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
>  len >> BDRV_SECTOR_BITS);
>  ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
>  if (ret != 0) {
> -update_zones_wp(bs, s->fd, offset, i);
> +update_zones_wp(bs, s->fd, offset, nrz);
>  error_report("ioctl %s failed %d", op_name, ret);
>  return ret;
>  }
> --
> 2.40.1
>



Re: [RFC PATCH 00/78] Strict disable implicit fallthrough

2023-10-16 Thread Peter Maydell
On Fri, 13 Oct 2023 at 13:42, Markus Armbruster  wrote:
>
> Emmanouil Pitsidianakis  writes:
>
> > Hello,
> >
> > This RFC is inspired by the kernel's move to -Wimplicit-fallthrough=3
> > back in 2019.[0]
> > We take one step (or two) further by increasing it to 5 which rejects
> > fall through comments and requires an attribute statement.
> >
> > [0]:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a035d552a93b
> >
> > The line differences are not many, but they spread all over different
> > subsystems, architectures and devices. An attempt has been made to split
> > them in cohesive patches to aid post-RFC review. Part of the RFC is to
> > determine whether these patch divisions needs improvement.
> >
> > Main questions this RFC poses
> > =
> >
> > - Is this change desirable and net-positive.
>
> Unwanted fallthrough is an easy mistake to make, and
> -Wimplicit-fallthrough=N helps avoid it.  The question is how far up we
> need to push N.  Right now we're at N=2.  Has unwanted fallthrough been
> a problem?

Mmm, this is my opinion I think. We have a mechanism for
catching "forgot the 'break'" already (our =2 setting) and
a way to say "intentional" in a fairly natural way (add the
comment). Does pushing N up any further gain us anything
except a load of churn?
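
For illustration, the two idioms being compared, assuming the series'
proposed fallthrough pseudo-keyword (a macro expanding to
__attribute__((fallthrough))); the case labels and helpers below are made up:

    switch (op) {
    case OP_ADD:
        do_add();
        /* fall through */   /* accepted today at -Wimplicit-fallthrough=2 */
    case OP_STORE:
        do_store();
        fallthrough;         /* the only form accepted at level 5 */
    default:
        break;
    }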

Also, the compiler is not the only thing that processes our
code: Coverity also looks for "unexpected fallthrough" issues,
so if we wanted to switch away from our current practice we
should check whether what we're switching to is an idiom
that Coverity recognises.

thanks
-- PMM



Re: [PATCH v4 03/10] migration: migrate 'inc' command option is deprecated.

2023-10-16 Thread Juan Quintela
Markus Armbruster  wrote:
> Juan Quintela  writes:
>
>> Markus Armbruster  wrote:
>>> Juan Quintela  writes:
>> So what I want, I want to remove -i/-b in the next version (9.0?).  For
>> the other, I want to remove it, but I don't care if the code is around
>> in "deprecated" state for another couple of years if there are still
>> people that feel that they want it.
>>
>> This is the reason that I put a pointer for -i/-b to
>> @block/@block-incremental.  They are "perfect" replacements.
>>
>> I can put here to use blockdev-mirror + NBD, but the replacement is not
>> so direct.
>>
>> Does this make sense?
>
> I see where you're coming from.  Now let's change perspective for a
> minute: what's the purpose of deprecating stuff?
>
> We normally deprecate with intent to remove.
>
> We don't remove right away, because we promised to first deprecate for a
> grace period, so users can adjust in an orderly manner.  The deprecation
> serves as signal "you need to adjust".  The documentation that comes
> with it should help with the adjustment.  It's commonly of the form "use
> $alternative instead".  The alternative is often a direct replacement,
> but not always.  There could even be no replacement at all.  We don't
> promise replacements, we promise an orderly process, so users can
> adjust.
>
> Sometimes, we don't have firm plans to remove, but are more like "maybe
> remove when gets in the way".  We could soften the "you need to adjust"
> signal in documentation, but I doubt that's a good idea.  Regardless,
> the need to help users adjust remains.
>
> Back to your patches.  There are two separate interfaces to block
> migration, and both are deprecated at the end of the series:
>
> 1. Migration parameter @block-incremental & friends
>
>Not in the way, content to keep around for longer if it helps users.
>
>The deprecation documentation advises to use block-mirror with NBD
>instead.  All good.
>
> 2. QMP migrate parameters @inc and @blk
>
>Firm intent to remove as soon as the grace period expires, because
>it's in the way.
>
>The deprecation documentation advises to use interface 1 instead.
>But that's deprecated, too!
>
>Insufficiently careful readers will miss that the replacement is
>deprecated, and just use it.  Risks surprise when the replacement
>goes away, too.
>
>More careful readers will realize that this advises to use something
>we elsewhere advise not to use.  Contradiction!  Confusion ensues.
>
>On further reflection, these readers might conclude that the
>*combined* advice is to use block-mirror with NBD instead.  This is
>correct.
>
>So why not tell them?
>
>Perhaps you'd like to give more nuanced advice, like "you should move
>to block-mirror with NBD, but if that's not practical for you, you
>should at least move to @block-incremental & friends, which will
>likely stick around for longer."  That's fine.  All I'm asking for is
>to not make things more confusing than they need to be :)
>
> [...]

Telling this in deprecated.rst is enough?  or you want me to put it also
in the error/warn messages and qapi?

Later, Juan.




Re: -drive if=none: can't we make this the default?

2023-10-16 Thread Michael Tokarev

Almost everyone mentions -blockdev as a replacement for -drive.

But I have to remind several issues with it:

1. While documentation has improved a lot, -blockdev is still mostly unknown
   to the masses.

2. -blockdev is just too verbose, one has to specify a lot of parameters just to
   do a simple thing which is solved with an extra parameter with -drive (see
   the comparison sketch right after this list).  Including various backing
   stores/chains for qcow2 files - this is terrible for using things manually
   from the command line.

3. -blockdev does not work with -snapshot

4. Something else I forgot while typing all the above :)
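
For a concrete comparison (illustrative only; the file name, node names and
the virtio-blk frontend are placeholders, not something either option
requires):

  -drive file=disk.qcow2,if=virtio

versus roughly:

  -blockdev driver=file,filename=disk.qcow2,node-name=proto0 \
  -blockdev driver=qcow2,file=proto0,node-name=fmt0 \
  -device virtio-blk-pci,drive=fmt0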

In my view, -blockdev is not a substitute for -drive, not at all, and it is
very user-unfriendly.  This is why -drive seems to be a good trade-off between
things like -hda (which is just too simplistic) and -blockdev, which is
way too verbose and lacks some automatic sugar like -snapshot.

/mjt



Re: [PATCH v4 03/10] migration: migrate 'inc' command option is deprecated.

2023-10-16 Thread Markus Armbruster
Juan Quintela  writes:

> Markus Armbruster  wrote:
>> Juan Quintela  writes:
>>
>>> Set the 'block_incremental' migration parameter to 'true' instead.
>>>
>>> Reviewed-by: Thomas Huth 
>>> Acked-by: Stefan Hajnoczi 
>>> Signed-off-by: Juan Quintela 
>>>
>>> ---
>>>
>>> Improve documentation and style (thanks Markus)
>>> ---
>>>  docs/about/deprecated.rst | 7 +++
>>>  qapi/migration.json   | 8 +++-
>>>  migration/migration.c | 6 ++
>>>  3 files changed, 20 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
>>> index 1c4d7f36f0..1b6b2870cf 100644
>>> --- a/docs/about/deprecated.rst
>>> +++ b/docs/about/deprecated.rst
>>> @@ -452,3 +452,10 @@ Migration
>>>  ``skipped`` field in Migration stats has been deprecated.  It hasn't
>>>  been used for more than 10 years.
>>>  
>>> +``inc`` migrate command option (since 8.2)
>>> +''
>>> +
>>> +The new way to modify migration is using migration parameters.
>>> +``inc`` functionality can be achieved by setting the
>>> +``block-incremental`` migration parameter to ``true``.
>>> +
>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>> index 6865fea3c5..56bbd55b87 100644
>>> --- a/qapi/migration.json
>>> +++ b/qapi/migration.json
>>> @@ -1492,6 +1492,11 @@
>>>  #
>>>  # @resume: resume one paused migration, default "off". (since 3.0)
>>>  #
>>> +# Features:
>>> +#
>>> +# @deprecated: Member @inc is deprecated.  Use migration parameter
>>> +# @block-incremental instead.
>>
>> This is fine now.  It becomes bad advice in PATCH 05, which deprecates
>> @block-incremental.  Two solutions:
>>
>> 1. Change this patch to point to an alternative that will *not* be
>> deprecated.
>
> Ok, clearly I am not explaining myself properly O:-)
>
> History of block migration:
> * In the beginning there were the -b and -i migrate options.
>   They were the only way to do storage migration.
> * We moved to use parameters and capabilities for migration
>   So we created @block-incremental and @block.
>   But we didn't remove the command line options (for backward
>   compatibility).
> * We were asked to modify migration so some storage was migrated and
>   some was not migrated during migration.  But block people found that
>   it was a good idea to allow storage migration without migrating the
>   vm, so they created this blockdev-mirror mechanism that is shiny,
>   funny, faster, better.
>
> So now we have old code that basically nobody uses (the last big user
> was COLO, but now it can use multifd).  So we want to drop it, but we
> don't care about a direct replacement.
>
> So, why am I interested in removing this?
> - @block and @block-incremental: If you don't use block migration, their
>   existence doesn't bother you.  They are "quite" independent of the rest
>   of the migration code (they could be better integrated, but not big
>   trouble here).
> - migrate options -i/-b: These hurt us each time we need to
>   change options.  Notice that we have "perfect" replacements
>   with @block and @block-incremental, exactly the same
>   result/semantics/...
>   You can see the troubles in the RFC patches
>
>  * [PATCH v4 07/10] [RFC] migration: Make -i/-b an error for hmp and qmp
>  * [PATCH v4 08/10] [RFC] migration: Remove helpers needed for -i/-b migrate options
>
> So what I want, I want to remove -i/-b in the next version (9.0?).  For
> the other, I want to remove it, but I don't care if the code is around
> in "deprecated" state for another couple of years if there are still
> people that feel that they want it.
>
> This is the reason that I put a pointer for -i/-b to
> @block/@block-incremental.  They are "perfect" replacements.
>
> I can put here to use blockdev-mirror + NBD, but the replacement is not
> so direct.
>
> Does this make sense?

I see where you're coming from.  Now let's change perspective for a
minute: what's the purpose of deprecating stuff?

We normally deprecate with intent to remove.

We don't remove right away, because we promised to first deprecate for a
grace period, so users can adjust in an orderly manner.  The deprecation
serves as signal "you need to adjust".  The documentation that comes
with it should help with the adjustment.  It's commonly of the form "use
$alternative instead".  The alternative is often a direct replacement,
but not always.  There could even be no replacement at all.  We don't
promise replacements, we promise an orderly process, so users can
adjust.

Sometimes, we don't have firm plans to remove, but are more like "maybe
remove when gets in the way".  We could soften the "you need to adjust"
signal in documentation, but I doubt that's a good idea.  Regardless,
the need to help users adjust remains.

Back to your patches.  There are two separate interfaces to block
migration, and both are deprecated at the end of the series:

1. Migration parameter @block-incremental & friends

   Not in the 

[PULL 37/38] migration/multifd: Unify multifd_send_thread error paths

2023-10-16 Thread Juan Quintela
From: Fabiano Rosas 

The preferred usage of the Error type is to always set both the return
code and the error when a failure happens. As all code called from the
send thread follows this pattern, we'll always have the return code
and the error set at the same time.

Aside from the convention, in this piece of code this must be the
case, otherwise the if (ret != 0) would be exiting the thread without
calling multifd_send_terminate_threads() which is incorrect.

Unify both paths to make it clear that both are taken when there's an
error.
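
For reference, a minimal sketch of the convention being relied on here (the
function names are made up; error_setg() is the usual helper):

static int send_one_packet(Error **errp)
{
    if (write_failed()) {
        /* on failure: set *errp AND return non-zero, never just one of them */
        error_setg(errp, "writing packet failed");
        return -1;
    }
    return 0;
}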

Signed-off-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231012134343.23757-3-faro...@suse.de>
---
 migration/multifd.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 8e9a5ee394..c92955de41 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -753,19 +753,13 @@ static void *multifd_send_thread(void *opaque)
 }
 
 out:
-if (local_err) {
+if (ret) {
+assert(local_err);
 trace_multifd_send_error(p->id);
 multifd_send_terminate_threads(local_err);
-error_free(local_err);
-}
-
-/*
- * Error happen, I will exit, but I can't just leave, tell
- * who pay attention to me.
- */
-if (ret != 0) {
 qemu_sem_post(&p->sem_sync);
 qemu_sem_post(&multifd_send_state->channels_ready);
+error_free(local_err);
 }
 
 qemu_mutex_lock(&p->mutex);
-- 
2.41.0




[PULL 32/38] migration/ram: Remove RAMState from xbzrle_cache_zero_page

2023-10-16 Thread Juan Quintela
From: Fabiano Rosas 

'rs' is not used in that function. It's a leftover from commit
9360447d34 ("ram: Use MigrationStats for statistics").

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231011184604.32364-4-faro...@suse.de>
---
 migration/ram.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 2bd51d6bb5..535172d9be 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -570,7 +570,6 @@ void mig_throttle_counter_reset(void)
 /**
  * xbzrle_cache_zero_page: insert a zero page in the XBZRLE cache
  *
- * @rs: current RAM state
  * @current_addr: address for the zero page
  *
  * Update the xbzrle cache to reflect a page that's been sent as all 0.
@@ -579,7 +578,7 @@ void mig_throttle_counter_reset(void)
  * As a bonus, if the page wasn't in the cache it gets added so that
  * when a small write is made into the 0'd page it gets XBZRLE sent.
  */
-static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr)
+static void xbzrle_cache_zero_page(ram_addr_t current_addr)
 {
 /* We don't care if this fails to allocate a new cache page
  * as long as it updated an old one */
@@ -2146,7 +2145,7 @@ static int ram_save_target_page_legacy(RAMState *rs, 
PageSearchStatus *pss)
  */
 if (rs->xbzrle_started) {
 XBZRLE_cache_lock();
-xbzrle_cache_zero_page(rs, block->offset + offset);
+xbzrle_cache_zero_page(block->offset + offset);
 XBZRLE_cache_unlock();
 }
 return res;
-- 
2.41.0




[PULL 06/38] migration: Fix analyze-migration.py 'configuration' parsing

2023-10-16 Thread Juan Quintela
From: Fabiano Rosas 

The 'configuration' state subsections are currently not being parsed
and the script fails when analyzing an aarch64 stream:

Traceback (most recent call last):
  File "./scripts/analyze-migration.py", line 625, in 
dump.read(dump_memory = args.memory)
  File "./scripts/analyze-migration.py", line 571, in read
raise Exception("Unknown section type: %d" % section_type)
Exception: Unknown section type: 5
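
For reference, the failure can be reproduced by pointing the script at a
saved aarch64 migration stream, e.g. (the file name is just an example):

  $ ./scripts/analyze-migration.py -f aarch64-vm.migstream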

Signed-off-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231009184326.15777-3-faro...@suse.de>
---
 scripts/analyze-migration.py | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
index 082424558b..24687db497 100755
--- a/scripts/analyze-migration.py
+++ b/scripts/analyze-migration.py
@@ -261,12 +261,21 @@ def getDict(self):
 
 
 class ConfigurationSection(object):
-def __init__(self, file):
+def __init__(self, file, desc):
 self.file = file
+self.desc = desc
 
 def read(self):
-name_len = self.file.read32()
-name = self.file.readstr(len = name_len)
+if self.desc:
+version_id = self.desc['version']
+section = VMSDSection(self.file, version_id, self.desc,
+  'configuration')
+section.read()
+else:
+# backward compatibility for older streams that don't have
+# the configuration section in the json
+name_len = self.file.read32()
+name = self.file.readstr(len = name_len)
 
 class VMSDFieldGeneric(object):
 def __init__(self, desc, file):
@@ -532,7 +541,8 @@ def read(self, desc_only = False, dump_memory = False, 
write_memory = False):
 if section_type == self.QEMU_VM_EOF:
 break
 elif section_type == self.QEMU_VM_CONFIGURATION:
-section = ConfigurationSection(file)
+config_desc = self.vmsd_desc.get('configuration')
+section = ConfigurationSection(file, config_desc)
 section.read()
 elif section_type == self.QEMU_VM_SECTION_START or section_type == 
self.QEMU_VM_SECTION_FULL:
 section_id = file.read32()
-- 
2.41.0




[PULL 14/38] migration: Create migrate_rdma()

2023-10-16 Thread Juan Quintela
Helper to say if we are doing a migration over rdma.

Reviewed-by: Peter Xu 
Reviewed-by: Li Zhijian 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-2-quint...@redhat.com>
---
 migration/migration.h | 2 ++
 migration/options.h   | 1 +
 migration/migration.c | 1 +
 migration/options.c   | 7 +++
 migration/rdma.c  | 4 +++-
 5 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/migration/migration.h b/migration/migration.h
index 974897a8d0..ae82004892 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -469,6 +469,8 @@ struct MigrationState {
  * switchover has been received.
  */
 bool switchover_acked;
+/* Is this a rdma migration */
+bool rdma_migration;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/options.h b/migration/options.h
index 93ee938ab8..237f2d6b4a 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -56,6 +56,7 @@ bool migrate_zero_copy_send(void);
 
 bool migrate_multifd_flush_after_each_section(void);
 bool migrate_postcopy(void);
+bool migrate_rdma(void);
 bool migrate_tls(void);
 
 /* capabilities helpers */
diff --git a/migration/migration.c b/migration/migration.c
index 79fa11e3f6..6ba5e145ac 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1452,6 +1452,7 @@ int migrate_init(MigrationState *s, Error **errp)
 s->iteration_initial_bytes = 0;
 s->threshold_size = 0;
 s->switchover_acked = false;
+s->rdma_migration = false;
 /*
  * set mig_stats compression_counters memory to zero for a
  * new migration
diff --git a/migration/options.c b/migration/options.c
index 546cbe3106..42fb818956 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -378,6 +378,13 @@ bool migrate_postcopy(void)
 return migrate_postcopy_ram() || migrate_dirty_bitmaps();
 }
 
+bool migrate_rdma(void)
+{
+MigrationState *s = migrate_get_current();
+
+return s->rdma_migration;
+}
+
 bool migrate_tls(void)
 {
 MigrationState *s = migrate_get_current();
diff --git a/migration/rdma.c b/migration/rdma.c
index f6fc226c9b..f155f3e1c8 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -4113,6 +4113,7 @@ static void rdma_accept_incoming_migration(void *opaque)
 
 void rdma_start_incoming_migration(const char *host_port, Error **errp)
 {
+MigrationState *s = migrate_get_current();
 int ret;
 RDMAContext *rdma;
 
@@ -4144,7 +4145,7 @@ void rdma_start_incoming_migration(const char *host_port, 
Error **errp)
 }
 
 trace_rdma_start_incoming_migration_after_rdma_listen();
-
+s->rdma_migration = true;
 qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration,
 NULL, (void *)(intptr_t)rdma);
 return;
@@ -4220,6 +4221,7 @@ void rdma_start_outgoing_migration(void *opaque,
 trace_rdma_start_outgoing_migration_after_rdma_connect();
 
 s->to_dst_file = rdma_new_output(rdma);
+s->rdma_migration = true;
 migrate_fd_connect(s, NULL);
 return;
 return_path_err:
-- 
2.41.0




[PULL 27/38] migration: Improve json and formatting

2023-10-16 Thread Juan Quintela
Reviewed-by: Markus Armbruster 
Signed-off-by: Juan Quintela 
Message-ID: <20231013104736.31722-2-quint...@redhat.com>
---
 qapi/migration.json | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 360e609f66..db3df12d6c 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -73,7 +73,7 @@
 { 'struct': 'MigrationStats',
   'data': {'transferred': 'int', 'remaining': 'int', 'total': 'int' ,
'duplicate': 'int',
-   'skipped': { 'type': 'int', 'features': ['deprecated'] },
+   'skipped': { 'type': 'int', 'features': [ 'deprecated' ] },
'normal': 'int',
'normal-bytes': 'int', 'dirty-pages-rate': 'int',
'mbps': 'number', 'dirty-sync-count': 'int',
@@ -440,10 +440,9 @@
 # compress and xbzrle are both on, compress only takes effect in
 # the ram bulk stage, after that, it will be disabled and only
 # xbzrle takes effect, this can help to minimize migration
-# traffic.  The feature is disabled by default.  (since 2.4 )
+# traffic.  The feature is disabled by default.  (since 2.4)
 #
-# @events: generate events for each migration state change (since 2.4
-# )
+# @events: generate events for each migration state change (since 2.4)
 #
 # @auto-converge: If enabled, QEMU will automatically throttle down
 # the guest to speed up convergence of RAM migration.  (since 1.6)
-- 
2.41.0




[PULL 33/38] migration/ram: Stop passing QEMUFile around in save_zero_page

2023-10-16 Thread Juan Quintela
From: Fabiano Rosas 

We don't need the QEMUFile when we're already passing the
PageSearchStatus.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231011184604.32364-5-faro...@suse.de>
---
 migration/ram.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 535172d9be..2ec28c4507 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1147,10 +1147,11 @@ void ram_release_page(const char *rbname, uint64_t 
offset)
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_zero_page_to_file(PageSearchStatus *pss, QEMUFile *file,
-  RAMBlock *block, ram_addr_t offset)
+static int save_zero_page_to_file(PageSearchStatus *pss, RAMBlock *block,
+  ram_addr_t offset)
 {
 uint8_t *p = block->host + offset;
+QEMUFile *file = pss->pss_channel;
 int len = 0;
 
 if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
@@ -1171,10 +1172,10 @@ static int save_zero_page_to_file(PageSearchStatus 
*pss, QEMUFile *file,
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_zero_page(PageSearchStatus *pss, QEMUFile *f, RAMBlock *block,
+static int save_zero_page(PageSearchStatus *pss, RAMBlock *block,
   ram_addr_t offset)
 {
-int len = save_zero_page_to_file(pss, f, block, offset);
+int len = save_zero_page_to_file(pss, block, offset);
 
 if (len) {
 stat64_add(&mig_stats.zero_pages, 1);
@@ -2138,7 +2139,7 @@ static int ram_save_target_page_legacy(RAMState *rs, 
PageSearchStatus *pss)
 return 1;
 }
 
-res = save_zero_page(pss, pss->pss_channel, block, offset);
+res = save_zero_page(pss, block, offset);
 if (res > 0) {
 /* Must let xbzrle know, otherwise a previous (now 0'd) cached
  * page would be stale
-- 
2.41.0




[PULL 34/38] migration/ram: Move xbzrle zero page handling into save_zero_page

2023-10-16 Thread Juan Quintela
From: Fabiano Rosas 

It makes a bit more sense to have the zero page handling of xbzrle
right where we save the zero page.

Also invert the exit condition to remove one level of indentation
which makes the next patch easier to grasp.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231011184604.32364-6-faro...@suse.de>
---
 migration/ram.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 2ec28c4507..229cad5c74 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1168,21 +1168,34 @@ static int save_zero_page_to_file(PageSearchStatus 
*pss, RAMBlock *block,
  *
  * Returns the number of pages written.
  *
+ * @rs: current RAM state
  * @pss: current PSS channel
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_zero_page(PageSearchStatus *pss, RAMBlock *block,
+static int save_zero_page(RAMState *rs, PageSearchStatus *pss, RAMBlock *block,
   ram_addr_t offset)
 {
 int len = save_zero_page_to_file(pss, block, offset);
 
-if (len) {
-stat64_add(&mig_stats.zero_pages, 1);
-ram_transferred_add(len);
-return 1;
+if (!len) {
+return -1;
 }
-return -1;
+
+stat64_add(&mig_stats.zero_pages, 1);
+ram_transferred_add(len);
+
+/*
+ * Must let xbzrle know, otherwise a previous (now 0'd) cached
+ * page would be stale.
+ */
+if (rs->xbzrle_started) {
+XBZRLE_cache_lock();
+xbzrle_cache_zero_page(block->offset + offset);
+XBZRLE_cache_unlock();
+}
+
+return 1;
 }
 
 /*
@@ -2139,16 +2152,8 @@ static int ram_save_target_page_legacy(RAMState *rs, 
PageSearchStatus *pss)
 return 1;
 }
 
-res = save_zero_page(pss, block, offset);
+res = save_zero_page(rs, pss, block, offset);
 if (res > 0) {
-/* Must let xbzrle know, otherwise a previous (now 0'd) cached
- * page would be stale
- */
-if (rs->xbzrle_started) {
-XBZRLE_cache_lock();
-xbzrle_cache_zero_page(block->offset + offset);
-XBZRLE_cache_unlock();
-}
 return res;
 }
 
-- 
2.41.0




[PULL 26/38] migration/rdma: Remove all "ret" variables that are used only once

2023-10-16 Thread Juan Quintela
Change code that is:

int ret;
...

ret = foo();
if (ret[ < 0]?) {

to:

if (foo()[ < 0]) {

Reviewed-by: Fabiano Rosas 
Reviewed-by: Li Zhijian 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-14-quint...@redhat.com>
---
 migration/rdma.c | 29 -
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 09015fbd1a..2a1852ec7f 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1107,7 +1107,6 @@ err_alloc_pd_cq:
 static int qemu_rdma_alloc_qp(RDMAContext *rdma)
 {
 struct ibv_qp_init_attr attr = { 0 };
-int ret;
 
 attr.cap.max_send_wr = RDMA_SIGNALED_SEND_MAX;
 attr.cap.max_recv_wr = 3;
@@ -1117,8 +1116,7 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
 attr.recv_cq = rdma->recv_cq;
 attr.qp_type = IBV_QPT_RC;
 
-ret = rdma_create_qp(rdma->cm_id, rdma->pd, &attr);
-if (ret < 0) {
+if (rdma_create_qp(rdma->cm_id, rdma->pd, &attr) < 0) {
 return -1;
 }
 
@@ -1130,8 +1128,8 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
 static bool rdma_support_odp(struct ibv_context *dev)
 {
 struct ibv_device_attr_ex attr = {0};
-int ret = ibv_query_device_ex(dev, NULL, &attr);
-if (ret) {
+
+if (ibv_query_device_ex(dev, NULL, &attr)) {
 return false;
 }
 
@@ -1508,7 +1506,6 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
struct ibv_comp_channel *comp_channel)
 {
 struct rdma_cm_event *cm_event;
-int ret;
 
 /*
  * Coroutine doesn't start until migration_fd_process_incoming()
@@ -1544,8 +1541,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
 }
 
 if (pfds[1].revents) {
-ret = rdma_get_cm_event(rdma->channel, &cm_event);
-if (ret < 0) {
+if (rdma_get_cm_event(rdma->channel, &cm_event) < 0) {
 return -1;
 }
 
@@ -2317,12 +2313,10 @@ static int qemu_rdma_write(RDMAContext *rdma,
 uint64_t current_addr = block_offset + offset;
 uint64_t index = rdma->current_index;
 uint64_t chunk = rdma->current_chunk;
-int ret;
 
 /* If we cannot merge it, we flush the current buffer first. */
 if (!qemu_rdma_buffer_mergeable(rdma, current_addr, len)) {
-ret = qemu_rdma_write_flush(rdma, errp);
-if (ret < 0) {
+if (qemu_rdma_write_flush(rdma, errp) < 0) {
 return -1;
 }
 rdma->current_length = 0;
@@ -2936,7 +2930,6 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
 static int qemu_rdma_drain_cq(RDMAContext *rdma)
 {
 Error *err = NULL;
-int ret;
 
 if (qemu_rdma_write_flush(rdma, &err) < 0) {
 error_report_err(err);
@@ -2944,8 +2937,7 @@ static int qemu_rdma_drain_cq(RDMAContext *rdma)
 }
 
 while (rdma->nb_sent) {
-ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL);
-if (ret < 0) {
+if (qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL) < 0) {
 error_report("rdma migration: complete polling error!");
 return -1;
 }
@@ -3323,12 +3315,10 @@ static void rdma_accept_incoming_migration(void 
*opaque);
 static void rdma_cm_poll_handler(void *opaque)
 {
 RDMAContext *rdma = opaque;
-int ret;
 struct rdma_cm_event *cm_event;
 MigrationIncomingState *mis = migration_incoming_get_current();
 
-ret = rdma_get_cm_event(rdma->channel, &cm_event);
-if (ret < 0) {
+if (rdma_get_cm_event(rdma->channel, &cm_event) < 0) {
 error_report("get_cm_event failed %d", errno);
 return;
 }
@@ -4053,14 +4043,11 @@ static QEMUFile *rdma_new_output(RDMAContext *rdma)
 static void rdma_accept_incoming_migration(void *opaque)
 {
 RDMAContext *rdma = opaque;
-int ret;
 QEMUFile *f;
 Error *local_err = NULL;
 
 trace_qemu_rdma_accept_incoming_migration();
-ret = qemu_rdma_accept(rdma);
-
-if (ret < 0) {
+if (qemu_rdma_accept(rdma) < 0) {
 error_report("RDMA ERROR: Migration initialization failed");
 return;
 }
-- 
2.41.0




[PULL 25/38] migration/rdma: Declare for index variables local

2023-10-16 Thread Juan Quintela
Declare all variables that are only used inside a for loop inside the
for statement.

This makes clear that they are not used outside of the for loop.

Reviewed-by: Fabiano Rosas 
Reviewed-by: Li Zhijian 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-13-quint...@redhat.com>
---
 migration/rdma.c | 44 ++--
 1 file changed, 18 insertions(+), 26 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 5f6a771e8e..09015fbd1a 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -559,10 +559,8 @@ static void rdma_add_block(RDMAContext *rdma, const char 
*block_name,
 local->block = g_new0(RDMALocalBlock, local->nb_blocks + 1);
 
 if (local->nb_blocks) {
-int x;
-
 if (rdma->blockmap) {
-for (x = 0; x < local->nb_blocks; x++) {
+for (int x = 0; x < local->nb_blocks; x++) {
 g_hash_table_remove(rdma->blockmap,
 (void *)(uintptr_t)old[x].offset);
 g_hash_table_insert(rdma->blockmap,
@@ -649,15 +647,12 @@ static void rdma_delete_block(RDMAContext *rdma, 
RDMALocalBlock *block)
 {
 RDMALocalBlocks *local = &rdma->local_ram_blocks;
 RDMALocalBlock *old = local->block;
-int x;
 
 if (rdma->blockmap) {
 g_hash_table_remove(rdma->blockmap, (void *)(uintptr_t)block->offset);
 }
 if (block->pmr) {
-int j;
-
-for (j = 0; j < block->nb_chunks; j++) {
+for (int j = 0; j < block->nb_chunks; j++) {
 if (!block->pmr[j]) {
 continue;
 }
@@ -687,7 +682,7 @@ static void rdma_delete_block(RDMAContext *rdma, 
RDMALocalBlock *block)
 block->block_name = NULL;
 
 if (rdma->blockmap) {
-for (x = 0; x < local->nb_blocks; x++) {
+for (int x = 0; x < local->nb_blocks; x++) {
 g_hash_table_remove(rdma->blockmap,
 (void *)(uintptr_t)old[x].offset);
 }
@@ -705,7 +700,7 @@ static void rdma_delete_block(RDMAContext *rdma, 
RDMALocalBlock *block)
 memcpy(local->block + block->index, old + (block->index + 1),
 sizeof(RDMALocalBlock) *
 (local->nb_blocks - (block->index + 1)));
-for (x = block->index; x < local->nb_blocks - 1; x++) {
+for (int x = block->index; x < local->nb_blocks - 1; x++) {
 local->block[x].index--;
 }
 }
@@ -725,7 +720,7 @@ static void rdma_delete_block(RDMAContext *rdma, 
RDMALocalBlock *block)
 local->nb_blocks--;
 
 if (local->nb_blocks && rdma->blockmap) {
-for (x = 0; x < local->nb_blocks; x++) {
+for (int x = 0; x < local->nb_blocks; x++) {
 g_hash_table_insert(rdma->blockmap,
 (void *)(uintptr_t)local->block[x].offset,
 &local->block[x]);
@@ -828,12 +823,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct 
ibv_context *verbs, Error **errp)
  * Otherwise, there are no guarantees until the bug is fixed in linux.
  */
 if (!verbs) {
-int num_devices, x;
+int num_devices;
 struct ibv_device **dev_list = ibv_get_device_list(&num_devices);
 bool roce_found = false;
 bool ib_found = false;
 
-for (x = 0; x < num_devices; x++) {
+for (int x = 0; x < num_devices; x++) {
 verbs = ibv_open_device(dev_list[x]);
 /*
  * ibv_open_device() is not documented to set errno.  If
@@ -925,7 +920,6 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error 
**errp)
 char port_str[16];
 struct rdma_cm_event *cm_event;
 char ip[40] = "unknown";
-struct rdma_addrinfo *e;
 
 if (rdma->host == NULL || !strcmp(rdma->host, "")) {
 error_setg(errp, "RDMA ERROR: RDMA hostname has not been set");
@@ -957,7 +951,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error 
**errp)
 }
 
 /* Try all addresses, saving the first error in @err */
-for (e = res; e != NULL; e = e->ai_next) {
+for (struct rdma_addrinfo *e = res; e != NULL; e = e->ai_next) {
 Error **local_errp = err ? NULL : &err;
 
 inet_ntop(e->ai_family,
@@ -2777,7 +2771,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 RDMAContext *rdma;
 int ret;
 ssize_t done = 0;
-size_t i, len;
+size_t len;
 
 RCU_READ_LOCK_GUARD();
 rdma = qatomic_rcu_read(&rioc->rdmaout);
@@ -2803,7 +2797,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 return -1;
 }
 
-for (i = 0; i < niov; i++) {
+for (int i = 0; i < niov; i++) {
 size_t remaining = iov[i].iov_len;
 uint8_t * data = (void *)iov[i].iov_base;
 while (remaining) {
@@ -2866,7 +2860,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
 RDMAControlHeader head;
 int ret;
 ssize_t done = 0;
-size_t i, len;
+size_t len;
 
 

[PULL 31/38] migration/ram: Refactor precopy ram loading code

2023-10-16 Thread Juan Quintela
From: Nikolay Borisov 

Extract the ramblock parsing code into a routine that operates on the
sequence of headers from the stream and another that parses the
individual ramblock. This makes ram_load_precopy() easier to
comprehend.

Signed-off-by: Nikolay Borisov 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20231011184604.32364-3-faro...@suse.de>
---
 migration/ram.c | 146 +++-
 1 file changed, 82 insertions(+), 64 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index a9bc6ae1f1..2bd51d6bb5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3884,6 +3884,85 @@ void colo_flush_ram_cache(void)
 trace_colo_flush_ram_cache_end();
 }
 
+static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length)
+{
+int ret = 0;
+/* ADVISE is earlier, it shows the source has the postcopy capability on */
+bool postcopy_advised = migration_incoming_postcopy_advised();
+
+assert(block);
+
+if (!qemu_ram_is_migratable(block)) {
+error_report("block %s should not be migrated !", block->idstr);
+return -EINVAL;
+}
+
+if (length != block->used_length) {
+Error *local_err = NULL;
+
+ret = qemu_ram_resize(block, length, &local_err);
+if (local_err) {
+error_report_err(local_err);
+}
+}
+/* For postcopy we need to check hugepage sizes match */
+if (postcopy_advised && migrate_postcopy_ram() &&
+block->page_size != qemu_host_page_size) {
+uint64_t remote_page_size = qemu_get_be64(f);
+if (remote_page_size != block->page_size) {
+error_report("Mismatched RAM page size %s "
+ "(local) %zd != %" PRId64, block->idstr,
+ block->page_size, remote_page_size);
+ret = -EINVAL;
+}
+}
+if (migrate_ignore_shared()) {
+hwaddr addr = qemu_get_be64(f);
+if (migrate_ram_is_ignored(block) &&
+block->mr->addr != addr) {
+error_report("Mismatched GPAs for block %s "
+ "%" PRId64 "!= %" PRId64, block->idstr,
+ (uint64_t)addr, (uint64_t)block->mr->addr);
+ret = -EINVAL;
+}
+}
+ret = rdma_block_notification_handle(f, block->idstr);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
+
+return ret;
+}
+
+static int parse_ramblocks(QEMUFile *f, ram_addr_t total_ram_bytes)
+{
+int ret = 0;
+
+/* Synchronize RAM block list */
+while (!ret && total_ram_bytes) {
+RAMBlock *block;
+char id[256];
+ram_addr_t length;
+int len = qemu_get_byte(f);
+
+qemu_get_buffer(f, (uint8_t *)id, len);
+id[len] = 0;
+length = qemu_get_be64(f);
+
+block = qemu_ram_block_by_name(id);
+if (block) {
+ret = parse_ramblock(f, block, length);
+} else {
+error_report("Unknown ramblock \"%s\", cannot accept "
+ "migration", id);
+ret = -EINVAL;
+}
+total_ram_bytes -= length;
+}
+
+return ret;
+}
+
 /**
  * ram_load_precopy: load pages in precopy case
  *
@@ -3898,14 +3977,13 @@ static int ram_load_precopy(QEMUFile *f)
 {
 MigrationIncomingState *mis = migration_incoming_get_current();
 int flags = 0, ret = 0, invalid_flags = 0, len = 0, i = 0;
-/* ADVISE is earlier, it shows the source has the postcopy capability on */
-bool postcopy_advised = migration_incoming_postcopy_advised();
+
 if (!migrate_compress()) {
 invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE;
 }
 
 while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
-ram_addr_t addr, total_ram_bytes;
+ram_addr_t addr;
 void *host = NULL, *host_bak = NULL;
 uint8_t ch;
 
@@ -3976,67 +4054,7 @@ static int ram_load_precopy(QEMUFile *f)
 
 switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
 case RAM_SAVE_FLAG_MEM_SIZE:
-/* Synchronize RAM block list */
-total_ram_bytes = addr;
-while (!ret && total_ram_bytes) {
-RAMBlock *block;
-char id[256];
-ram_addr_t length;
-
-len = qemu_get_byte(f);
-qemu_get_buffer(f, (uint8_t *)id, len);
-id[len] = 0;
-length = qemu_get_be64(f);
-
-block = qemu_ram_block_by_name(id);
-if (block && !qemu_ram_is_migratable(block)) {
-error_report("block %s should not be migrated !", id);
-ret = -EINVAL;
-} else if (block) {
-if (length != block->used_length) {
-Error *local_err = NULL;
-
-ret = qemu_ram_resize(block, length,
-   

[PULL 09/38] migration: Fix analyze-migration read operation signedness

2023-10-16 Thread Juan Quintela
From: Fabiano Rosas 

The migration code uses unsigned values for 16, 32 and 64-bit
operations. Fix the script to do the same.

This was causing an issue when parsing the migration stream generated
on the ppc64 target, because one of the instance_ids was larger than
the 32-bit signed maximum:

Traceback (most recent call last):
  File "/home/fabiano/kvm/qemu/build/scripts/analyze-migration.py", line 658, 
in 
dump.read(dump_memory = args.memory)
  File "/home/fabiano/kvm/qemu/build/scripts/analyze-migration.py", line 592, 
in read
classdesc = self.section_classes[section_key]
KeyError: ('spapr_iommu', -2147483648)
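
As a minimal standalone illustration (not part of the patch) of why the
signedness matters: the same big-endian bytes decode either to a large
unsigned value or, with the top bit set, to a negative one, so the
section key no longer matches what QEMU wrote.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* A 32-bit big-endian value with the top bit set, as an
         * instance_id on ppc64 can have. */
        const uint8_t buf[4] = { 0x80, 0x00, 0x00, 0x00 };
        uint32_t u = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                     ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];

        printf("unsigned: %u\n", u);          /* 2147483648 */
        printf("signed:   %d\n", (int32_t)u); /* typically -2147483648 */
        return 0;
    }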

Signed-off-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231009184326.15777-6-faro...@suse.de>
---
 scripts/analyze-migration.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
index 56ab04dd2d..de506cb8bf 100755
--- a/scripts/analyze-migration.py
+++ b/scripts/analyze-migration.py
@@ -38,13 +38,13 @@ def __init__(self, filename):
 self.file = open(self.filename, "rb")
 
 def read64(self):
-return int.from_bytes(self.file.read(8), byteorder='big', signed=True)
+return int.from_bytes(self.file.read(8), byteorder='big', signed=False)
 
 def read32(self):
-return int.from_bytes(self.file.read(4), byteorder='big', signed=True)
+return int.from_bytes(self.file.read(4), byteorder='big', signed=False)
 
 def read16(self):
-return int.from_bytes(self.file.read(2), byteorder='big', signed=True)
+return int.from_bytes(self.file.read(2), byteorder='big', signed=False)
 
 def read8(self):
 return int.from_bytes(self.file.read(1), byteorder='big', signed=True)
-- 
2.41.0




[PULL 19/38] migration/rdma: Create rdma_control_save_page()

2023-10-16 Thread Juan Quintela
The only user of ram_control_save_page() and the save_page() hook was
rdma. Move the function to rdma.c and rename it to
rdma_control_save_page().

Reviewed-by: Peter Xu 
Reviewed-by: Li Zhijian 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-7-quint...@redhat.com>
---
 migration/qemu-file.h | 12 
 migration/rdma.h  | 10 ++
 migration/qemu-file.c | 20 
 migration/ram.c   |  4 ++--
 migration/rdma.c  | 19 ++-
 5 files changed, 30 insertions(+), 35 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 80c30631dc..60510a2819 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -36,17 +36,7 @@
 #define RAM_CONTROL_ROUND 1
 #define RAM_CONTROL_FINISH 3
 
-/*
- * This function allows override of where the RAM page
- * is saved (such as RDMA, for example.)
- */
-typedef int (QEMURamSaveFunc)(QEMUFile *f,
-  ram_addr_t block_offset,
-  ram_addr_t offset,
-  size_t size);
-
 typedef struct QEMUFileHooks {
-QEMURamSaveFunc *save_page;
 } QEMUFileHooks;
 
 QEMUFile *qemu_file_new_input(QIOChannel *ioc);
@@ -125,8 +115,6 @@ int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
 #define RAM_SAVE_CONTROL_NOT_SUPP -1000
 #define RAM_SAVE_CONTROL_DELAYED  -2000
 
-int ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
-  ram_addr_t offset, size_t size);
 QIOChannel *qemu_file_get_ioc(QEMUFile *file);
 
 #endif
diff --git a/migration/rdma.h b/migration/rdma.h
index 8df8b4089a..09a16c1e3c 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -17,6 +17,8 @@
 #ifndef QEMU_MIGRATION_RDMA_H
 #define QEMU_MIGRATION_RDMA_H
 
+#include "exec/memory.h"
+
 void rdma_start_outgoing_migration(void *opaque, const char *host_port,
Error **errp);
 
@@ -28,6 +30,8 @@ int qemu_rdma_registration_handle(QEMUFile *f);
 int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags);
 int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags);
 int rdma_block_notification_handle(QEMUFile *f, const char *name);
+int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
+   ram_addr_t offset, size_t size);
 #else
 static inline
 int qemu_rdma_registration_handle(QEMUFile *f) { return 0; }
@@ -37,5 +41,11 @@ static inline
 int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags) { return 0; }
 static inline
 int rdma_block_notification_handle(QEMUFile *f, const char *name) { return 0; }
+static inline
+int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
+   ram_addr_t offset, size_t size)
+{
+return RAM_SAVE_CONTROL_NOT_SUPP;
+}
 #endif
 #endif
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 4a414b8976..745eaf7a5b 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -298,26 +298,6 @@ void qemu_fflush(QEMUFile *f)
 f->iovcnt = 0;
 }
 
-int ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
-  ram_addr_t offset, size_t size)
-{
-if (f->hooks && f->hooks->save_page) {
-int ret = f->hooks->save_page(f, block_offset, offset, size);
-/*
- * RAM_SAVE_CONTROL_* are negative values
- */
-if (ret != RAM_SAVE_CONTROL_DELAYED &&
-ret != RAM_SAVE_CONTROL_NOT_SUPP) {
-if (ret < 0) {
-qemu_file_set_error(f, ret);
-}
-}
-return ret;
-}
-
-return RAM_SAVE_CONTROL_NOT_SUPP;
-}
-
 /*
  * Attempt to fill the buffer from the underlying file
  * Returns the number of bytes read, or negative value for an error.
diff --git a/migration/ram.c b/migration/ram.c
index 8c462276cd..3b4b09f6ff 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1197,8 +1197,8 @@ static bool control_save_page(PageSearchStatus *pss, 
RAMBlock *block,
 {
 int ret;
 
-ret = ram_control_save_page(pss->pss_channel, block->offset, offset,
-TARGET_PAGE_SIZE);
+ret = rdma_control_save_page(pss->pss_channel, block->offset, offset,
+ TARGET_PAGE_SIZE);
 if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
 return false;
 }
diff --git a/migration/rdma.c b/migration/rdma.c
index 0b1cb03b2b..f66bd939d7 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3314,6 +3314,24 @@ err:
 return -1;
 }
 
+int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
+   ram_addr_t offset, size_t size)
+{
+if (!migrate_rdma()) {
+return RAM_SAVE_CONTROL_NOT_SUPP;
+}
+
+int ret = qemu_rdma_save_page(f, block_offset, offset, size);
+
+if (ret != RAM_SAVE_CONTROL_DELAYED &&
+ret != RAM_SAVE_CONTROL_NOT_SUPP) {
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
+}
+return ret;
+}
+
 static void 

[PULL 20/38] qemu-file: Remove QEMUFileHooks

2023-10-16 Thread Juan Quintela
The only user was rdma, and its use is gone.

Reviewed-by: Peter Xu 
Reviewed-by: Li Zhijian 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-8-quint...@redhat.com>
---
 migration/qemu-file.h | 4 
 migration/qemu-file.c | 6 --
 migration/rdma.c  | 9 -
 3 files changed, 19 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 60510a2819..0b22d8335f 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -36,12 +36,8 @@
 #define RAM_CONTROL_ROUND 1
 #define RAM_CONTROL_FINISH3
 
-typedef struct QEMUFileHooks {
-} QEMUFileHooks;
-
 QEMUFile *qemu_file_new_input(QIOChannel *ioc);
 QEMUFile *qemu_file_new_output(QIOChannel *ioc);
-void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks);
 int qemu_fclose(QEMUFile *f);
 
 /*
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 745eaf7a5b..3fb25148d1 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -38,7 +38,6 @@
 #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
 
 struct QEMUFile {
-const QEMUFileHooks *hooks;
 QIOChannel *ioc;
 bool is_writable;
 
@@ -133,11 +132,6 @@ QEMUFile *qemu_file_new_input(QIOChannel *ioc)
 return qemu_file_new_impl(ioc, false);
 }
 
-void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks)
-{
-f->hooks = hooks;
-}
-
 /*
  * Get last error for stream f with optional Error*
  *
diff --git a/migration/rdma.c b/migration/rdma.c
index f66bd939d7..9883b0a250 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -4003,13 +4003,6 @@ err:
 return -1;
 }
 
-static const QEMUFileHooks rdma_read_hooks = {
-};
-
-static const QEMUFileHooks rdma_write_hooks = {
-};
-
-
 static void qio_channel_rdma_finalize(Object *obj)
 {
 QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(obj);
@@ -4061,7 +4054,6 @@ static QEMUFile *rdma_new_input(RDMAContext *rdma)
 rioc->file = qemu_file_new_input(QIO_CHANNEL(rioc));
 rioc->rdmain = rdma;
 rioc->rdmaout = rdma->return_path;
-qemu_file_set_hooks(rioc->file, &rdma_read_hooks);
 
 return rioc->file;
 }
@@ -4073,7 +4065,6 @@ static QEMUFile *rdma_new_output(RDMAContext *rdma)
 rioc->file = qemu_file_new_output(QIO_CHANNEL(rioc));
 rioc->rdmaout = rdma;
 rioc->rdmain = rdma->return_path;
-qemu_file_set_hooks(rioc->file, &rdma_write_hooks);
 
 return rioc->file;
 }
-- 
2.41.0




[PULL 17/38] migration/rdma: Remove all uses of RAM_CONTROL_HOOK

2023-10-16 Thread Juan Quintela
Instead of going through ram_control_load_hook(), call
qemu_rdma_registration_handle() directly.

Reviewed-by: Li Zhijian 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-5-quint...@redhat.com>
---
 migration/qemu-file.h |  1 -
 migration/rdma.h  |  3 +++
 migration/ram.c   |  5 -
 migration/rdma.c  | 12 +++-
 4 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 35e671a01e..14ff0d9cc4 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -41,7 +41,6 @@ typedef int (QEMURamHookFunc)(QEMUFile *f, uint64_t flags, 
void *data);
  */
 #define RAM_CONTROL_SETUP 0
 #define RAM_CONTROL_ROUND 1
-#define RAM_CONTROL_HOOK  2
 #define RAM_CONTROL_FINISH 3
 #define RAM_CONTROL_BLOCK_REG 4
 
diff --git a/migration/rdma.h b/migration/rdma.h
index c13b94c782..8bd277efb9 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -24,10 +24,13 @@ void rdma_start_incoming_migration(const char *host_port, 
Error **errp);
 
 
 #ifdef CONFIG_RDMA
+int qemu_rdma_registration_handle(QEMUFile *f);
 int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags);
 int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags);
 #else
 static inline
+int qemu_rdma_registration_handle(QEMUFile *f) { return 0; }
+static inline
 int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags) { return 0; }
 static inline
 int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags) { return 0; }
diff --git a/migration/ram.c b/migration/ram.c
index f1ddc1f9fa..f6ea1831b5 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4075,7 +4075,10 @@ static int ram_load_precopy(QEMUFile *f)
 }
 break;
 case RAM_SAVE_FLAG_HOOK:
-ram_control_load_hook(f, RAM_CONTROL_HOOK, NULL);
+ret = qemu_rdma_registration_handle(f);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
 break;
 default:
 error_report("Unknown combination of migration flags: 0x%x", 
flags);
diff --git a/migration/rdma.c b/migration/rdma.c
index 4b32d375ec..5c20f425a9 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3522,7 +3522,7 @@ static int dest_ram_sort_func(const void *a, const void 
*b)
  *
  * Keep doing this until the source tells us to stop.
  */
-static int qemu_rdma_registration_handle(QEMUFile *f)
+int qemu_rdma_registration_handle(QEMUFile *f)
 {
 RDMAControlHeader reg_resp = { .len = sizeof(RDMARegisterResult),
.type = RDMA_CONTROL_REGISTER_RESULT,
@@ -3534,7 +3534,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
  };
 RDMAControlHeader blocks = { .type = RDMA_CONTROL_RAM_BLOCKS_RESULT,
  .repeat = 1 };
-QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
+QIOChannelRDMA *rioc;
 Error *err = NULL;
 RDMAContext *rdma;
 RDMALocalBlocks *local;
@@ -3550,7 +3550,12 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 int count = 0;
 int i = 0;
 
+if (!migrate_rdma()) {
+return 0;
+}
+
 RCU_READ_LOCK_GUARD();
+rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
 rdma = qatomic_rcu_read(&rioc->rdmain);
 
 if (!rdma) {
@@ -3841,9 +3846,6 @@ static int rdma_load_hook(QEMUFile *f, uint64_t flags, 
void *data)
 case RAM_CONTROL_BLOCK_REG:
 return rdma_block_notification_handle(f, data);
 
-case RAM_CONTROL_HOOK:
-return qemu_rdma_registration_handle(f);
-
 default:
 /* Shouldn't be called with any other values */
 abort();
-- 
2.41.0




[PULL 35/38] migration/ram: Merge save_zero_page functions

2023-10-16 Thread Juan Quintela
From: Fabiano Rosas 

We don't need to do this in two pieces. A single function makes it
easier to grasp, especially since it removes the indirection in the
return value handling.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231011184604.32364-7-faro...@suse.de>
---
 migration/ram.c | 46 +-
 1 file changed, 13 insertions(+), 33 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 229cad5c74..c844151ee9 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1137,32 +1137,6 @@ void ram_release_page(const char *rbname, uint64_t 
offset)
 ram_discard_range(rbname, offset, TARGET_PAGE_SIZE);
 }
 
-/**
- * save_zero_page_to_file: send the zero page to the file
- *
- * Returns the size of data written to the file, 0 means the page is not
- * a zero page
- *
- * @pss: current PSS channel
- * @block: block that contains the page we want to send
- * @offset: offset inside the block for the page
- */
-static int save_zero_page_to_file(PageSearchStatus *pss, RAMBlock *block,
-  ram_addr_t offset)
-{
-uint8_t *p = block->host + offset;
-QEMUFile *file = pss->pss_channel;
-int len = 0;
-
-if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
-len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_ZERO);
-qemu_put_byte(file, 0);
-len += 1;
-ram_release_page(block->idstr, offset);
-}
-return len;
-}
-
 /**
  * save_zero_page: send the zero page to the stream
  *
@@ -1176,12 +1150,19 @@ static int save_zero_page_to_file(PageSearchStatus 
*pss, RAMBlock *block,
 static int save_zero_page(RAMState *rs, PageSearchStatus *pss, RAMBlock *block,
   ram_addr_t offset)
 {
-int len = save_zero_page_to_file(pss, block, offset);
+uint8_t *p = block->host + offset;
+QEMUFile *file = pss->pss_channel;
+int len = 0;
 
-if (!len) {
-return -1;
+if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
+return 0;
 }
 
+len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_ZERO);
+qemu_put_byte(file, 0);
+len += 1;
+ram_release_page(block->idstr, offset);
+
 stat64_add(&mig_stats.zero_pages, 1);
 ram_transferred_add(len);
 
@@ -1195,7 +1176,7 @@ static int save_zero_page(RAMState *rs, PageSearchStatus 
*pss, RAMBlock *block,
 XBZRLE_cache_unlock();
 }
 
-return 1;
+return len;
 }
 
 /*
@@ -2152,9 +2133,8 @@ static int ram_save_target_page_legacy(RAMState *rs, 
PageSearchStatus *pss)
 return 1;
 }
 
-res = save_zero_page(rs, pss, block, offset);
-if (res > 0) {
-return res;
+if (save_zero_page(rs, pss, block, offset)) {
+return 1;
 }
 
 /*
-- 
2.41.0




[PULL 28/38] migration: check for rate_limit_max for RATE_LIMIT_DISABLED

2023-10-16 Thread Juan Quintela
From: Elena Ufimtseva 

In migration rate limiting, atomic operations are used to read the
rate limit variables and the transferred byte counters, and they are
expensive. Check first whether rate_limit_max is equal to
RATE_LIMIT_DISABLED and return false immediately if so.

Note that with this patch we also stop flushing (qemu_fflush() is no
longer reached through migration_transferred_bytes()) when the
migration rate is not exceeded. This should be fine, since the
migration thread calls migration_update_counters() in its loop via
migration_rate_limit(), which calls migration_transferred_bytes() and
flushes there.
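
As a sketch of where the flush still happens on that path (call chain
only, simplified from migration.c):

    /*
     * migration_thread()
     *   -> migration_rate_limit()
     *        -> migration_update_counters()
     *             -> migration_transferred_bytes(f)   (does qemu_fflush(f))
     */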

Signed-off-by: Elena Ufimtseva 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231011184358.97349-2-elena.ufimts...@oracle.com>
---
 migration/migration-stats.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/migration/migration-stats.c b/migration/migration-stats.c
index 84e11e6dd8..4cc989d975 100644
--- a/migration/migration-stats.c
+++ b/migration/migration-stats.c
@@ -24,14 +24,15 @@ bool migration_rate_exceeded(QEMUFile *f)
 return true;
 }
 
+uint64_t rate_limit_max = migration_rate_get();
+if (rate_limit_max == RATE_LIMIT_DISABLED) {
+return false;
+}
+
 uint64_t rate_limit_start = stat64_get(&mig_stats.rate_limit_start);
 uint64_t rate_limit_current = migration_transferred_bytes(f);
 uint64_t rate_limit_used = rate_limit_current - rate_limit_start;
-uint64_t rate_limit_max = stat64_get(&mig_stats.rate_limit_max);
 
-if (rate_limit_max == RATE_LIMIT_DISABLED) {
-return false;
-}
 if (rate_limit_max > 0 && rate_limit_used > rate_limit_max) {
 return true;
 }
-- 
2.41.0




[PULL 13/38] migration: Non multifd migration don't care about multifd flushes

2023-10-16 Thread Juan Quintela
RDMA was having trouble because
migrate_multifd_flush_after_each_section() can only be true or false,
but we don't want to send any flush when we are not in multifd
migration.
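
A hypothetical helper (not something this patch adds) makes the
combined condition that the callers now open-code explicit:

    /* Illustration only, not in the tree: a per-section flush is only
     * meaningful when multifd is enabled at all, and then only if the
     * capability asks for a flush after every section. */
    static bool multifd_flush_after_each_section(void)
    {
        return migrate_multifd() &&
               migrate_multifd_flush_after_each_section();
    }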

CC: Fabiano Rosas 
Reviewed-by: Li Zhijian 
Reviewed-by: Peter Xu 
Signed-off-by: Juan Quintela 
Message-ID: <20231011205548.10571-2-quint...@redhat.com>
---
 migration/ram.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index d3d9c8b65b..acb8f95f00 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1395,7 +1395,8 @@ static int find_dirty_block(RAMState *rs, 
PageSearchStatus *pss)
 pss->page = 0;
 pss->block = QLIST_NEXT_RCU(pss->block, next);
 if (!pss->block) {
-if (!migrate_multifd_flush_after_each_section()) {
+if (migrate_multifd() &&
+!migrate_multifd_flush_after_each_section()) {
 QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
 int ret = multifd_send_sync_main(f);
 if (ret < 0) {
@@ -3072,7 +3073,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 return ret;
 }
 
-if (!migrate_multifd_flush_after_each_section()) {
+if (migrate_multifd() && !migrate_multifd_flush_after_each_section()) {
 qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
 }
 
@@ -3184,7 +3185,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 out:
 if (ret >= 0
 && migration_is_setup_or_active(migrate_get_current()->state)) {
-if (migrate_multifd_flush_after_each_section()) {
+if (migrate_multifd() && migrate_multifd_flush_after_each_section()) {
 ret = 
multifd_send_sync_main(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel);
 if (ret < 0) {
 return ret;
@@ -3261,7 +3262,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 return ret;
 }
 
-if (!migrate_multifd_flush_after_each_section()) {
+if (migrate_multifd() && !migrate_multifd_flush_after_each_section()) {
 qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
 }
 qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
@@ -3768,7 +3769,8 @@ int ram_load_postcopy(QEMUFile *f, int channel)
 break;
 case RAM_SAVE_FLAG_EOS:
 /* normal exit */
-if (migrate_multifd_flush_after_each_section()) {
+if (migrate_multifd() &&
+migrate_multifd_flush_after_each_section()) {
 multifd_recv_sync_main();
 }
 break;
@@ -4046,7 +4048,8 @@ static int ram_load_precopy(QEMUFile *f)
 break;
 case RAM_SAVE_FLAG_EOS:
 /* normal exit */
-if (migrate_multifd_flush_after_each_section()) {
+if (migrate_multifd() &&
+migrate_multifd_flush_after_each_section()) {
 multifd_recv_sync_main();
 }
 break;
-- 
2.41.0




[PULL 02/38] migration: Use g_autofree to simplify ram_dirty_bitmap_reload()

2023-10-16 Thread Juan Quintela
From: Philippe Mathieu-Daudé 
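
For readers not familiar with it, a minimal standalone illustration of
what g_autofree buys here (not part of the patch): the buffer is freed
automatically when the variable goes out of scope, so the early returns
no longer need an "out:" label with a manual g_free().

    #include <glib.h>
    #include <stdio.h>

    static int demo(gboolean fail)
    {
        g_autofree guint8 *buf = g_malloc0(16);

        if (fail) {
            return -1;    /* buf is g_free()d automatically here */
        }
        buf[0] = 42;
        return buf[0];    /* ...and here */
    }

    int main(void)
    {
        printf("%d %d\n", demo(TRUE), demo(FALSE));
        return 0;
    }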

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Markus Armbruster 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231011023627.86691-1-phi...@linaro.org>
---
 migration/ram.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 2f5ce4d60b..24d91de8b3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4159,7 +4159,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock 
*block)
 int ret = -EINVAL;
 /* from_dst_file is always valid because we're within rp_thread */
 QEMUFile *file = s->rp_state.from_dst_file;
-unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS;
+g_autofree unsigned long *le_bitmap = NULL;
+unsigned long nbits = block->used_length >> TARGET_PAGE_BITS;
 uint64_t local_size = DIV_ROUND_UP(nbits, 8);
 uint64_t size, end_mark;
 RAMState *rs = ram_state;
@@ -4188,8 +4189,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock 
*block)
 error_report("%s: ramblock '%s' bitmap size mismatch "
  "(0x%"PRIx64" != 0x%"PRIx64")", __func__,
  block->idstr, size, local_size);
-ret = -EINVAL;
-goto out;
+return -EINVAL;
 }
 
 size = qemu_get_buffer(file, (uint8_t *)le_bitmap, local_size);
@@ -4200,15 +4200,13 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock 
*block)
 error_report("%s: read bitmap failed for ramblock '%s': %d"
  " (size 0x%"PRIx64", got: 0x%"PRIx64")",
  __func__, block->idstr, ret, local_size, size);
-ret = -EIO;
-goto out;
+return -EIO;
 }
 
 if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) {
 error_report("%s: ramblock '%s' end mark incorrect: 0x%"PRIx64,
  __func__, block->idstr, end_mark);
-ret = -EINVAL;
-goto out;
+return -EINVAL;
 }
 
 /*
@@ -4240,10 +4238,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock 
*block)
  */
 migration_rp_kick(s);
 
-ret = 0;
-out:
-g_free(le_bitmap);
-return ret;
+return 0;
 }
 
 static int ram_resume_prepare(MigrationState *s, void *opaque)
-- 
2.41.0




[PULL 04/38] migration: fix RAMBlock add NULL check

2023-10-16 Thread Juan Quintela
From: Dmitry Frolov 

qemu_ram_block_from_host() may return NULL, which would then be
dereferenced without a check. The return value of this function is
usually checked.
Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Dmitry Frolov 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231010104851.802947-1-fro...@swemel.ru>
---
 migration/ram.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 24d91de8b3..e8df4dc862 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4285,6 +4285,11 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier 
*n, void *host,
 RAMBlock *rb = qemu_ram_block_from_host(host, false, &offset);
 Error *err = NULL;
 
+if (!rb) {
+error_report("RAM block not found");
+return;
+}
+
 if (migrate_ram_is_ignored(rb)) {
 return;
 }
-- 
2.41.0




[PULL 29/38] multifd: fix counters in multifd_send_thread

2023-10-16 Thread Juan Quintela
From: Elena Ufimtseva 

Previous commit cbec7eb76879d419e7dbf531ee2506ec0722e825
"migration/multifd: Compute transferred bytes correctly"
removed accounting for packet_len in the non-rdma case, but
next_packet_size only accounts for the pages (normal_pages * PAGE_SIZE),
not for the packet header that is sent as iov[0]. The packet_len part
should be added back to account for the size of MultiFDPacket and the
array of the offsets.
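
In other words (sketch only; the actual hunk is below): each multifd
send is one writev where iov[0] carries the packet header and the
remaining iovs carry the page data, so both parts belong in the
counters.

    /* Accounting sketch, assuming the iov layout described above:
     *   iov[0]      -> packet header, p->packet_len bytes
     *   iov[1..n]   -> page data, p->next_packet_size bytes
     */
    uint64_t sent = p->packet_len + p->next_packet_size;

    stat64_add(&mig_stats.multifd_bytes, sent);
    stat64_add(&mig_stats.transferred, sent);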

Signed-off-by: Elena Ufimtseva 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231011184358.97349-4-elena.ufimts...@oracle.com>
---
 migration/multifd.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 0f6b203877..e6e0013c16 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -714,8 +714,6 @@ static void *multifd_send_thread(void *opaque)
 if (ret != 0) {
 break;
 }
-stat64_add(&mig_stats.multifd_bytes, p->packet_len);
-stat64_add(&mig_stats.transferred, p->packet_len);
 } else {
 /* Send header using the same writev call */
 p->iov[0].iov_len = p->packet_len;
@@ -728,8 +726,10 @@ static void *multifd_send_thread(void *opaque)
 break;
 }
 
-stat64_add(_stats.multifd_bytes, p->next_packet_size);
-stat64_add(_stats.transferred, p->next_packet_size);
+stat64_add(&mig_stats.multifd_bytes,
+   p->next_packet_size + p->packet_len);
+stat64_add(&mig_stats.transferred,
+   p->next_packet_size + p->packet_len);
 qemu_mutex_lock(&p->mutex);
 p->pending_job--;
 qemu_mutex_unlock(&p->mutex);
-- 
2.41.0




[PULL 08/38] migration: Fix analyze-migration.py when ignore-shared is used

2023-10-16 Thread Juan Quintela
From: Fabiano Rosas 

The script is currently broken when the x-ignore-shared capability is
used:

Traceback (most recent call last):
  File "./scripts/analyze-migration.py", line 656, in 
dump.read(dump_memory = args.memory)
  File "./scripts/analyze-migration.py", line 593, in read
section.read()
  File "./scripts/analyze-migration.py", line 163, in read
self.name = self.file.readstr(len = namelen)
  File "./scripts/analyze-migration.py", line 53, in readstr
return self.readvar(len).decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 55: 
invalid start byte

We're currently adding data to the middle of the ram section depending
on the presence of the capability. As a consequence, any code loading
the ram section needs to know about capabilities so it can interpret
the stream.

Skip the field that's added when x-ignore-shared is used, to fix the
script.
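
For reference, the per-RAMBlock layout inside the RAM_SAVE_FLAG_MEM_SIZE
record that the script has to walk is roughly the following (see the
parsing side in migration/ram.c; the last field is the one skipped
here):

    /*
     * Approximate per-RAMBlock layout in the MEM_SIZE record:
     *
     *   1 byte       idstr length
     *   n bytes      idstr
     *   8 bytes BE   used_length
     *   8 bytes BE   page size      (only for postcopy with hugepages)
     *   8 bytes BE   block address  (only when x-ignore-shared is set)
     */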

Signed-off-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231009184326.15777-5-faro...@suse.de>
---
 scripts/analyze-migration.py | 5 +
 1 file changed, 5 insertions(+)

diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
index c700fed64d..56ab04dd2d 100755
--- a/scripts/analyze-migration.py
+++ b/scripts/analyze-migration.py
@@ -123,6 +123,7 @@ def __init__(self, file, version_id, ramargs, section_key):
 self.TARGET_PAGE_SIZE = ramargs['page_size']
 self.dump_memory = ramargs['dump_memory']
 self.write_memory = ramargs['write_memory']
+self.ignore_shared = ramargs['ignore_shared']
 self.sizeinfo = collections.OrderedDict()
 self.data = collections.OrderedDict()
 self.data['section sizes'] = self.sizeinfo
@@ -169,6 +170,8 @@ def read(self):
 f.truncate(0)
 f.truncate(len)
 self.files[self.name] = f
+if self.ignore_shared:
+mr_addr = self.file.read64()
 flags &= ~self.RAM_SAVE_FLAG_MEM_SIZE
 
 if flags & self.RAM_SAVE_FLAG_COMPRESS:
@@ -572,6 +575,7 @@ def read(self, desc_only = False, dump_memory = False, 
write_memory = False):
 ramargs['page_size'] = self.vmsd_desc['page_size']
 ramargs['dump_memory'] = dump_memory
 ramargs['write_memory'] = write_memory
+ramargs['ignore_shared'] = False
 self.section_classes[('ram',0)][1] = ramargs
 
 while True:
@@ -582,6 +586,7 @@ def read(self, desc_only = False, dump_memory = False, 
write_memory = False):
 config_desc = self.vmsd_desc.get('configuration')
 section = ConfigurationSection(file, config_desc)
 section.read()
+ramargs['ignore_shared'] = 
section.has_capability('x-ignore-shared')
 elif section_type == self.QEMU_VM_SECTION_START or section_type == 
self.QEMU_VM_SECTION_FULL:
 section_id = file.read32()
 name = file.readstr()
-- 
2.41.0




[PULL 00/38] Migration 20231016 patches

2023-10-16 Thread Juan Quintela
The following changes since commit 63011373ad22c794a013da69663c03f1297a5c56:

  Merge tag 'pull-riscv-to-apply-20231012-1' of 
https://github.com/alistair23/qemu into staging (2023-10-12 10:24:44 -0400)

are available in the Git repository at:

  https://gitlab.com/juan.quintela/qemu.git tags/migration-20231016-pull-request

for you to fetch changes up to f39b0f42753635b0f2d8b00a26d11bb197bf51e2:

  migration/multifd: Clarify Error usage in multifd_channel_connect (2023-10-16 
11:01:33 +0200)


Migration Pull request (20231016)

In this pull request:
- rdma cleanups
- removal of QEMUFileHook
- test for analyze-migration.py
- test for multifd file
- multifd cleanups
- available switchover bandwidth
- lots of cleanups.

CI: https://gitlab.com/juan.quintela/qemu/-/pipelines/1037878829

Please, apply.



Dmitry Frolov (1):
  migration: fix RAMBlock add NULL check

Elena Ufimtseva (3):
  migration: check for rate_limit_max for RATE_LIMIT_DISABLED
  multifd: fix counters in multifd_send_thread
  multifd: reset next_packet_len after sending pages

Fabiano Rosas (13):
  migration: Fix analyze-migration.py 'configuration' parsing
  migration: Add capability parsing to analyze-migration.py
  migration: Fix analyze-migration.py when ignore-shared is used
  migration: Fix analyze-migration read operation signedness
  tests/qtest/migration: Add a test for the analyze-migration script
  tests/qtest: migration-test: Add tests for file-based migration
  migration/ram: Remove RAMState from xbzrle_cache_zero_page
  migration/ram: Stop passing QEMUFile around in save_zero_page
  migration/ram: Move xbzrle zero page handling into save_zero_page
  migration/ram: Merge save_zero_page functions
  migration/multifd: Remove direct "socket" references
  migration/multifd: Unify multifd_send_thread error paths
  migration/multifd: Clarify Error usage in multifd_channel_connect

Fiona Ebner (1):
  migration: hold the BQL during setup

Juan Quintela (15):
  migration: Non multifd migration don't care about multifd flushes
  migration: Create migrate_rdma()
  migration/rdma: Unfold ram_control_before_iterate()
  migration/rdma: Unfold ram_control_after_iterate()
  migration/rdma: Remove all uses of RAM_CONTROL_HOOK
  migration/rdma: Unfold hook_ram_load()
  migration/rdma: Create rdma_control_save_page()
  qemu-file: Remove QEMUFileHooks
  migration/rdma: Move rdma constants from qemu-file.h to rdma.h
  migration/rdma: Remove qemu_ prefix from exported functions
  migration/rdma: Check sooner if we are in postcopy for save_page()
  migration/rdma: Use i as for index instead of idx
  migration/rdma: Declare for index variables local
  migration/rdma: Remove all "ret" variables that are used only once
  migration: Improve json and formatting

Nikolay Borisov (2):
  migration: Add the configuration vmstate to the json writer
  migration/ram: Refactor precopy ram loading code

Peter Xu (1):
  migration: Allow user to specify available switchover bandwidth

Philippe Mathieu-Daudé (1):
  migration: Use g_autofree to simplify ram_dirty_bitmap_reload()

Wei Wang (1):
  migration: refactor migration_completion

 qapi/migration.json|  41 -
 include/migration/register.h   |   2 +-
 migration/migration.h  |   4 +-
 migration/options.h|   2 +
 migration/qemu-file.h  |  49 --
 migration/rdma.h   |  42 +
 migration/block-dirty-bitmap.c |   3 -
 migration/block.c  |   5 -
 migration/migration-hmp-cmds.c |  14 ++
 migration/migration-stats.c|   9 +-
 migration/migration.c  | 199 +
 migration/multifd.c| 101 +--
 migration/options.c|  35 
 migration/qemu-file.c  |  61 +--
 migration/ram.c| 306 ++---
 migration/rdma.c   | 259 
 migration/savevm.c |  22 ++-
 tests/qtest/migration-test.c   | 207 ++
 migration/trace-events |  33 ++--
 scripts/analyze-migration.py   |  67 +++-
 tests/qtest/meson.build|   2 +
 21 files changed, 895 insertions(+), 568 deletions(-)

-- 
2.41.0




[PULL 36/38] migration/multifd: Remove direct "socket" references

2023-10-16 Thread Juan Quintela
From: Fabiano Rosas 

We're about to enable support for other transports in multifd, so
remove direct references to sockets.
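
As an illustration of where this is heading (the file transport below
is hypothetical; only the socket path exists after this patch), the new
wrappers leave a single spot that has to learn about another channel
type:

    /* Hypothetical sketch: migrate_to_file() and file_send_channel_create()
     * are made up here for illustration. */
    static void multifd_new_send_channel_create(gpointer opaque)
    {
        if (migrate_to_file()) {
            file_send_channel_create(multifd_new_send_channel_async, opaque);
        } else {
            socket_send_channel_create(multifd_new_send_channel_async, opaque);
        }
    }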

Signed-off-by: Fabiano Rosas 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231012134343.23757-2-faro...@suse.de>
---
 migration/multifd.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index c45f5015f8..8e9a5ee394 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -510,6 +510,11 @@ static void multifd_send_terminate_threads(Error *err)
 }
 }
 
+static int multifd_send_channel_destroy(QIOChannel *send)
+{
+return socket_send_channel_destroy(send);
+}
+
 void multifd_save_cleanup(void)
 {
 int i;
@@ -532,7 +537,7 @@ void multifd_save_cleanup(void)
 if (p->registered_yank) {
 migration_ioc_unregister_yank(p->c);
 }
-socket_send_channel_destroy(p->c);
+multifd_send_channel_destroy(p->c);
 p->c = NULL;
 qemu_mutex_destroy(&p->mutex);
 qemu_sem_destroy(&p->sem);
@@ -890,20 +895,25 @@ static void 
multifd_new_send_channel_cleanup(MultiFDSendParams *p,
 static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
 {
 MultiFDSendParams *p = opaque;
-QIOChannel *sioc = QIO_CHANNEL(qio_task_get_source(task));
+QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
 Error *local_err = NULL;
 
 trace_multifd_new_send_channel_async(p->id);
 if (!qio_task_propagate_error(task, _err)) {
-p->c = sioc;
+p->c = ioc;
 qio_channel_set_delay(p->c, false);
 p->running = true;
-if (multifd_channel_connect(p, sioc, local_err)) {
+if (multifd_channel_connect(p, ioc, local_err)) {
 return;
 }
 }
 
-multifd_new_send_channel_cleanup(p, sioc, local_err);
+multifd_new_send_channel_cleanup(p, ioc, local_err);
+}
+
+static void multifd_new_send_channel_create(gpointer opaque)
+{
+socket_send_channel_create(multifd_new_send_channel_async, opaque);
 }
 
 int multifd_save_setup(Error **errp)
@@ -952,7 +962,7 @@ int multifd_save_setup(Error **errp)
 p->write_flags = 0;
 }
 
-socket_send_channel_create(multifd_new_send_channel_async, p);
+multifd_new_send_channel_create(p);
 }
 
 for (i = 0; i < thread_count; i++) {
-- 
2.41.0




[PULL 30/38] multifd: reset next_packet_len after sending pages

2023-10-16 Thread Juan Quintela
From: Elena Ufimtseva 

Sometimes multifd sends just a sync packet with no pages
(normal_num is 0). In this case the old next_packet_size value is
preserved and accounted for, while only packet_len is actually
transferred.
Reset it to 0 after sending and accounting for it.

Signed-off-by: Elena Ufimtseva 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231011184358.97349-5-elena.ufimts...@oracle.com>
---
 migration/multifd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/multifd.c b/migration/multifd.c
index e6e0013c16..c45f5015f8 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -730,6 +730,7 @@ static void *multifd_send_thread(void *opaque)
p->next_packet_size + p->packet_len);
 stat64_add(&mig_stats.transferred,
p->next_packet_size + p->packet_len);
+p->next_packet_size = 0;
 qemu_mutex_lock(&p->mutex);
 p->pending_job--;
 qemu_mutex_unlock(&p->mutex);
-- 
2.41.0




[PULL 23/38] migration/rdma: Check sooner if we are in postcopy for save_page()

2023-10-16 Thread Juan Quintela
Reviewed-by: Peter Xu 
Reviewed-by: Li Zhijian 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-11-quint...@redhat.com>
---
 migration/rdma.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index c147c94b08..e973579a52 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3240,10 +3240,6 @@ static int qemu_rdma_save_page(QEMUFile *f, ram_addr_t 
block_offset,
 RDMAContext *rdma;
 int ret;
 
-if (migration_in_postcopy()) {
-return RAM_SAVE_CONTROL_NOT_SUPP;
-}
-
 RCU_READ_LOCK_GUARD();
 rdma = qatomic_rcu_read(&rioc->rdmaout);
 
@@ -3317,7 +3313,7 @@ err:
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
ram_addr_t offset, size_t size)
 {
-if (!migrate_rdma()) {
+if (!migrate_rdma() || migration_in_postcopy()) {
 return RAM_SAVE_CONTROL_NOT_SUPP;
 }
 
-- 
2.41.0




[PULL 16/38] migration/rdma: Unfold ram_control_after_iterate()

2023-10-16 Thread Juan Quintela
Once there:
- Remove the unused data parameter
- Unfold it in its callers
- Change all callers to call qemu_rdma_registration_stop()
- Call QIO_CHANNEL_RDMA() only after the migrate_rdma() check

Reviewed-by: Li Zhijian 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-4-quint...@redhat.com>
---
 migration/qemu-file.h |  2 --
 migration/rdma.h  |  3 +++
 migration/qemu-file.c | 12 
 migration/ram.c   | 17 ++---
 migration/rdma.c  |  9 -
 5 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index d6a370c569..35e671a01e 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -55,7 +55,6 @@ typedef int (QEMURamSaveFunc)(QEMUFile *f,
   size_t size);
 
 typedef struct QEMUFileHooks {
-QEMURamHookFunc *after_ram_iterate;
 QEMURamHookFunc *hook_ram_load;
 QEMURamSaveFunc *save_page;
 } QEMUFileHooks;
@@ -126,7 +125,6 @@ void qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
 int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
 
-void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data);
 
 /* Whenever this is found in the data stream, the flags
diff --git a/migration/rdma.h b/migration/rdma.h
index 670c67a8cb..c13b94c782 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -25,8 +25,11 @@ void rdma_start_incoming_migration(const char *host_port, 
Error **errp);
 
 #ifdef CONFIG_RDMA
 int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags);
+int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags);
 #else
 static inline
 int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags) { return 0; }
+static inline
+int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags) { return 0; }
 #endif
 #endif
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 5e2d73fd68..e7dba2a849 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -298,18 +298,6 @@ void qemu_fflush(QEMUFile *f)
 f->iovcnt = 0;
 }
 
-void ram_control_after_iterate(QEMUFile *f, uint64_t flags)
-{
-int ret = 0;
-
-if (f->hooks && f->hooks->after_ram_iterate) {
-ret = f->hooks->after_ram_iterate(f, flags, NULL);
-if (ret < 0) {
-qemu_file_set_error(f, ret);
-}
-}
-}
-
 void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data)
 {
 if (f->hooks && f->hooks->hook_ram_load) {
diff --git a/migration/ram.c b/migration/ram.c
index 6592431a4e..f1ddc1f9fa 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3065,7 +3065,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
-ram_control_after_iterate(f, RAM_CONTROL_SETUP);
+
+ret = qemu_rdma_registration_stop(f, RAM_CONTROL_SETUP);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
 
 migration_ops = g_malloc0(sizeof(MigrationOps));
 migration_ops->ram_save_target_page = ram_save_target_page_legacy;
@@ -3187,7 +3191,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
  * Must occur before EOS (or any QEMUFile operation)
  * because of RDMA protocol.
  */
-ram_control_after_iterate(f, RAM_CONTROL_ROUND);
+ret = qemu_rdma_registration_stop(f, RAM_CONTROL_ROUND);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
 
 out:
 if (ret >= 0
@@ -3260,7 +3267,11 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 qemu_mutex_unlock(&rs->bitmap_mutex);
 
 ram_flush_compressed_data(rs);
-ram_control_after_iterate(f, RAM_CONTROL_FINISH);
+
+int ret = qemu_rdma_registration_stop(f, RAM_CONTROL_FINISH);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
 }
 
 if (ret < 0) {
diff --git a/migration/rdma.c b/migration/rdma.c
index 3d74ad6db0..4b32d375ec 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3878,20 +3878,20 @@ int qemu_rdma_registration_start(QEMUFile *f, uint64_t 
flags)
  * Inform dest that dynamic registrations are done for now.
  * First, flush writes, if any.
  */
-static int qemu_rdma_registration_stop(QEMUFile *f,
-   uint64_t flags, void *data)
+int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags)
 {
-QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
+QIOChannelRDMA *rioc;
 Error *err = NULL;
 RDMAContext *rdma;
 RDMAControlHeader head = { .len = 0, .repeat = 1 };
 int ret;
 
-if (migration_in_postcopy()) {
+if (!migrate_rdma() || migration_in_postcopy()) {
 return 0;
 }
 
 RCU_READ_LOCK_GUARD();
+rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
 rdma = qatomic_rcu_read(&rioc->rdmaout);
 if (!rdma) {
 return -1;
@@ -3999,7 +3999,6 @@ static const QEMUFileHooks rdma_read_hooks = {
 };
 
 static const 

[PULL 38/38] migration/multifd: Clarify Error usage in multifd_channel_connect

2023-10-16 Thread Juan Quintela
From: Fabiano Rosas 

The function is currently called from two sites: one always gives it a
NULL Error and the other always gives it a non-NULL Error.

In the non-NULL case, all it does is trace the error and return. One
of the callers already has tracing; add a tracepoint to the other and
stop passing the error into the function.
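
After the change the function follows the usual Error ** convention:
success/failure is the return value and the error object is only filled
on failure, so it can be traced and cleaned up in one place by the
caller (sketch of the calling pattern, using names from the patch):

    Error *local_err = NULL;

    if (!multifd_channel_connect(p, ioc, &local_err)) {
        trace_multifd_new_send_channel_async_error(p->id, local_err);
        multifd_new_send_channel_cleanup(p, ioc, local_err);
    }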

Cc: Markus Armbruster 
Signed-off-by: Fabiano Rosas 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231012134343.23757-4-faro...@suse.de>
---
 migration/multifd.c| 60 --
 migration/trace-events |  3 ++-
 2 files changed, 30 insertions(+), 33 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index c92955de41..1fe53d3b98 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -775,7 +775,7 @@ out:
 
 static bool multifd_channel_connect(MultiFDSendParams *p,
 QIOChannel *ioc,
-Error *error);
+Error **errp);
 
 static void multifd_tls_outgoing_handshake(QIOTask *task,
gpointer opaque)
@@ -784,21 +784,22 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
 QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
 Error *err = NULL;
 
-if (qio_task_propagate_error(task, &err)) {
-trace_multifd_tls_outgoing_handshake_error(ioc, error_get_pretty(err));
-} else {
+if (!qio_task_propagate_error(task, &err)) {
 trace_multifd_tls_outgoing_handshake_complete(ioc);
+if (multifd_channel_connect(p, ioc, &err)) {
+return;
+}
 }
 
-if (!multifd_channel_connect(p, ioc, err)) {
-/*
- * Error happen, mark multifd_send_thread status as 'quit' although it
- * is not created, and then tell who pay attention to me.
- */
-p->quit = true;
-qemu_sem_post(&multifd_send_state->channels_ready);
-qemu_sem_post(&p->sem_sync);
-}
+trace_multifd_tls_outgoing_handshake_error(ioc, error_get_pretty(err));
+
+/*
+ * Error happen, mark multifd_send_thread status as 'quit' although it
+ * is not created, and then tell who pay attention to me.
+ */
+p->quit = true;
+qemu_sem_post(&multifd_send_state->channels_ready);
+qemu_sem_post(&p->sem_sync);
 }
 
 static void *multifd_tls_handshake_thread(void *opaque)
@@ -814,7 +815,7 @@ static void *multifd_tls_handshake_thread(void *opaque)
 return NULL;
 }
 
-static void multifd_tls_channel_connect(MultiFDSendParams *p,
+static bool multifd_tls_channel_connect(MultiFDSendParams *p,
 QIOChannel *ioc,
 Error **errp)
 {
@@ -824,7 +825,7 @@ static void multifd_tls_channel_connect(MultiFDSendParams 
*p,
 
 tioc = migration_tls_client_create(ioc, hostname, errp);
 if (!tioc) {
-return;
+return false;
 }
 
 object_unref(OBJECT(ioc));
@@ -834,31 +835,25 @@ static void multifd_tls_channel_connect(MultiFDSendParams 
*p,
 qemu_thread_create(&p->thread, "multifd-tls-handshake-worker",
multifd_tls_handshake_thread, p,
QEMU_THREAD_JOINABLE);
+return true;
 }
 
 static bool multifd_channel_connect(MultiFDSendParams *p,
 QIOChannel *ioc,
-Error *error)
+Error **errp)
 {
 trace_multifd_set_outgoing_channel(
 ioc, object_get_typename(OBJECT(ioc)),
-migrate_get_current()->hostname, error);
+migrate_get_current()->hostname);
 
-if (error) {
-return false;
-}
 if (migrate_channel_requires_tls_upgrade(ioc)) {
-multifd_tls_channel_connect(p, ioc, &error);
-if (!error) {
-/*
- * tls_channel_connect will call back to this
- * function after the TLS handshake,
- * so we mustn't call multifd_send_thread until then
- */
-return true;
-} else {
-return false;
-}
+/*
+ * tls_channel_connect will call back to this
+ * function after the TLS handshake,
+ * so we mustn't call multifd_send_thread until then
+ */
+return multifd_tls_channel_connect(p, ioc, errp);
+
 } else {
 migration_ioc_register_yank(ioc);
 p->registered_yank = true;
@@ -897,11 +892,12 @@ static void multifd_new_send_channel_async(QIOTask *task, 
gpointer opaque)
 p->c = ioc;
 qio_channel_set_delay(p->c, false);
 p->running = true;
-if (multifd_channel_connect(p, ioc, local_err)) {
+if (multifd_channel_connect(p, ioc, &local_err)) {
 return;
 }
 }
 
+trace_multifd_new_send_channel_async_error(p->id, local_err);
 multifd_new_send_channel_cleanup(p, ioc, local_err);
 

[PULL 07/38] migration: Add capability parsing to analyze-migration.py

2023-10-16 Thread Juan Quintela
From: Fabiano Rosas 

The script is broken when the configuration/capabilities section is
present. Add support for parsing the capabilities so we can fix it in
the next patch.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231009184326.15777-4-faro...@suse.de>
---
 scripts/analyze-migration.py | 38 
 1 file changed, 38 insertions(+)

diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
index 24687db497..c700fed64d 100755
--- a/scripts/analyze-migration.py
+++ b/scripts/analyze-migration.py
@@ -264,6 +264,24 @@ class ConfigurationSection(object):
 def __init__(self, file, desc):
 self.file = file
 self.desc = desc
+self.caps = []
+
+def parse_capabilities(self, vmsd_caps):
+if not vmsd_caps:
+return
+
+ncaps = vmsd_caps.data['caps_count'].data
+self.caps = vmsd_caps.data['capabilities']
+
+if type(self.caps) != list:
+self.caps = [self.caps]
+
+if len(self.caps) != ncaps:
+raise Exception("Number of capabilities doesn't match "
+"caps_count field")
+
+def has_capability(self, cap):
+return any([str(c) == cap for c in self.caps])
 
 def read(self):
 if self.desc:
@@ -271,6 +289,8 @@ def read(self):
 section = VMSDSection(self.file, version_id, self.desc,
   'configuration')
 section.read()
+self.parse_capabilities(
+section.data.get("configuration/capabilities"))
 else:
 # backward compatibility for older streams that don't have
 # the configuration section in the json
@@ -297,6 +317,23 @@ def read(self):
 self.data = self.file.readvar(size)
 return self.data
 
+class VMSDFieldCap(object):
+def __init__(self, desc, file):
+self.file = file
+self.desc = desc
+self.data = ""
+
+def __repr__(self):
+return self.data
+
+def __str__(self):
+return self.data
+
+def read(self):
+len = self.file.read8()
+self.data = self.file.readstr(len)
+
+
 class VMSDFieldInt(VMSDFieldGeneric):
 def __init__(self, desc, file):
 super(VMSDFieldInt, self).__init__(desc, file)
@@ -471,6 +508,7 @@ def getDict(self):
 "unused_buffer" : VMSDFieldGeneric,
 "bitmap" : VMSDFieldGeneric,
 "struct" : VMSDFieldStruct,
+"capability": VMSDFieldCap,
 "unknown" : VMSDFieldGeneric,
 }
 
-- 
2.41.0




[PULL 01/38] migration: refactor migration_completion

2023-10-16 Thread Juan Quintela
From: Wei Wang 

The current migration_completion() function is a bit long. Refactor the
implementation into separate subfunctions:
- migration_completion_precopy: completion code related to precopy
- migration_completion_postcopy: completion code related to postcopy

Rename await_return_path_close_on_source to
close_return_path_on_source, so that it matches
open_return_path_on_source.

This improves readability and makes future updates easier (e.g. adding
new subfunctions when completion code for new features is needed). No
functional changes intended.

Signed-off-by: Wei Wang 
Reviewed-by: Peter Xu 
Reviewed-by: Isaku Yamahata 
Reviewed-by: Juan Quintela 
Message-ID: <20230804093053.5037-1-wei.w.w...@intel.com>
Signed-off-by: Juan Quintela 
---
 migration/migration.c | 167 --
 1 file changed, 94 insertions(+), 73 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 1c6c81ad49..0e1002d017 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -99,7 +99,7 @@ static int migration_maybe_pause(MigrationState *s,
  int *current_active_state,
  int new_state);
 static void migrate_fd_cancel(MigrationState *s);
-static int await_return_path_close_on_source(MigrationState *s);
+static int close_return_path_on_source(MigrationState *s);
 
 static bool migration_needs_multiple_sockets(void)
 {
@@ -1191,7 +1191,7 @@ static void migrate_fd_cleanup(MigrationState *s)
  * We already cleaned up to_dst_file, so errors from the return
  * path might be due to that, ignore them.
  */
-await_return_path_close_on_source(s);
+close_return_path_on_source(s);
 
 assert(!migration_is_active(s));
 
@@ -2049,8 +2049,7 @@ static int open_return_path_on_source(MigrationState *ms)
 return 0;
 }
 
-/* Returns 0 if the RP was ok, otherwise there was an error on the RP */
-static int await_return_path_close_on_source(MigrationState *ms)
+static int close_return_path_on_source(MigrationState *ms)
 {
 int ret;
 
@@ -2317,6 +2316,87 @@ static int migration_maybe_pause(MigrationState *s,
 return s->state == new_state ? 0 : -EINVAL;
 }
 
+static int migration_completion_precopy(MigrationState *s,
+int *current_active_state)
+{
+int ret;
+
+qemu_mutex_lock_iothread();
+s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
+
+s->vm_old_state = runstate_get();
+global_state_store();
+
+ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+trace_migration_completion_vm_stop(ret);
+if (ret < 0) {
+goto out_unlock;
+}
+
+ret = migration_maybe_pause(s, current_active_state,
+MIGRATION_STATUS_DEVICE);
+if (ret < 0) {
+goto out_unlock;
+}
+
+/*
+ * Inactivate disks except in COLO, and track that we have done so in order
+ * to remember to reactivate them if migration fails or is cancelled.
+ */
+s->block_inactive = !migrate_colo();
+migration_rate_set(RATE_LIMIT_DISABLED);
+ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
+ s->block_inactive);
+out_unlock:
+qemu_mutex_unlock_iothread();
+return ret;
+}
+
+static void migration_completion_postcopy(MigrationState *s)
+{
+trace_migration_completion_postcopy_end();
+
+qemu_mutex_lock_iothread();
+qemu_savevm_state_complete_postcopy(s->to_dst_file);
+qemu_mutex_unlock_iothread();
+
+/*
+ * Shutdown the postcopy fast path thread.  This is only needed when dest
+ * QEMU binary is old (7.1/7.2).  QEMU 8.0+ doesn't need this.
+ */
+if (migrate_postcopy_preempt() && s->preempt_pre_7_2) {
+postcopy_preempt_shutdown_file(s);
+}
+
+trace_migration_completion_postcopy_end_after_complete();
+}
+
+static void migration_completion_failed(MigrationState *s,
+int current_active_state)
+{
+if (s->block_inactive && (s->state == MIGRATION_STATUS_ACTIVE ||
+  s->state == MIGRATION_STATUS_DEVICE)) {
+/*
+ * If not doing postcopy, vm_start() will be called: let's
+ * regain control on images.
+ */
+Error *local_err = NULL;
+
+qemu_mutex_lock_iothread();
+bdrv_activate_all(&local_err);
+if (local_err) {
+error_report_err(local_err);
+} else {
+s->block_inactive = false;
+}
+qemu_mutex_unlock_iothread();
+}
+
+migrate_set_state(&s->state, current_active_state,
+  MIGRATION_STATUS_FAILED);
+}
+
 /**
  * migration_completion: Used by migration_thread when there's not much left.
  *   The caller 'breaks' the loop when this returns.
@@ -2325,62 +2405,22 @@ static int 

[PULL 21/38] migration/rdma: Move rdma constants from qemu-file.h to rdma.h

2023-10-16 Thread Juan Quintela
Reviewed-by: Peter Xu 
Reviewed-by: Li Zhijian 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-9-quint...@redhat.com>
---
 migration/qemu-file.h | 17 -
 migration/rdma.h  | 16 
 migration/ram.c   |  2 +-
 3 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 0b22d8335f..a29c37b0d0 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -29,13 +29,6 @@
 #include "exec/cpu-common.h"
 #include "io/channel.h"
 
-/*
- * Constants used by ram_control_* hooks
- */
-#define RAM_CONTROL_SETUP 0
-#define RAM_CONTROL_ROUND 1
-#define RAM_CONTROL_FINISH3
-
 QEMUFile *qemu_file_new_input(QIOChannel *ioc);
 QEMUFile *qemu_file_new_output(QIOChannel *ioc);
 int qemu_fclose(QEMUFile *f);
@@ -101,16 +94,6 @@ void qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
 int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
 
-/* Whenever this is found in the data stream, the flags
- * will be passed to ram_control_load_hook in the incoming-migration
- * side. This lets before_ram_iterate/after_ram_iterate add
- * transport-specific sections to the RAM migration data.
- */
-#define RAM_SAVE_FLAG_HOOK 0x80
-
-#define RAM_SAVE_CONTROL_NOT_SUPP -1000
-#define RAM_SAVE_CONTROL_DELAYED  -2000
-
 QIOChannel *qemu_file_get_ioc(QEMUFile *file);
 
 #endif
diff --git a/migration/rdma.h b/migration/rdma.h
index 09a16c1e3c..1ff3718a76 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -24,6 +24,22 @@ void rdma_start_outgoing_migration(void *opaque, const char 
*host_port,
 
 void rdma_start_incoming_migration(const char *host_port, Error **errp);
 
+/*
+ * Constants used by rdma return codes
+ */
+#define RAM_CONTROL_SETUP 0
+#define RAM_CONTROL_ROUND 1
+#define RAM_CONTROL_FINISH3
+
+/*
+ * Whenever this is found in the data stream, the flags
+ * will be passed to rdma functions in the incoming-migration
+ * side.
+ */
+#define RAM_SAVE_FLAG_HOOK 0x80
+
+#define RAM_SAVE_CONTROL_NOT_SUPP -1000
+#define RAM_SAVE_CONTROL_DELAYED  -2000
 
 #ifdef CONFIG_RDMA
 int qemu_rdma_registration_handle(QEMUFile *f);
diff --git a/migration/ram.c b/migration/ram.c
index 3b4b09f6ff..6a4aed2a75 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -89,7 +89,7 @@
 #define RAM_SAVE_FLAG_EOS  0x10
 #define RAM_SAVE_FLAG_CONTINUE 0x20
 #define RAM_SAVE_FLAG_XBZRLE   0x40
-/* 0x80 is reserved in qemu-file.h for RAM_SAVE_FLAG_HOOK */
+/* 0x80 is reserved in rdma.h for RAM_SAVE_FLAG_HOOK */
 #define RAM_SAVE_FLAG_COMPRESS_PAGE0x100
 #define RAM_SAVE_FLAG_MULTIFD_FLUSH0x200
 /* We can't use any flag that is bigger than 0x200 */
-- 
2.41.0




[PULL 22/38] migration/rdma: Remove qemu_ prefix from exported functions

2023-10-16 Thread Juan Quintela
Functions are long enough even without this.

Reviewed-by: Peter Xu 
Reviewed-by: Li Zhijian 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-10-quint...@redhat.com>
---
 migration/rdma.h   | 12 ++--
 migration/ram.c| 14 +++---
 migration/rdma.c   | 40 +++-
 migration/trace-events | 28 ++--
 4 files changed, 46 insertions(+), 48 deletions(-)

diff --git a/migration/rdma.h b/migration/rdma.h
index 1ff3718a76..30b15b4466 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -42,19 +42,19 @@ void rdma_start_incoming_migration(const char *host_port, 
Error **errp);
 #define RAM_SAVE_CONTROL_DELAYED  -2000
 
 #ifdef CONFIG_RDMA
-int qemu_rdma_registration_handle(QEMUFile *f);
-int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags);
-int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags);
+int rdma_registration_handle(QEMUFile *f);
+int rdma_registration_start(QEMUFile *f, uint64_t flags);
+int rdma_registration_stop(QEMUFile *f, uint64_t flags);
 int rdma_block_notification_handle(QEMUFile *f, const char *name);
 int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
ram_addr_t offset, size_t size);
 #else
 static inline
-int qemu_rdma_registration_handle(QEMUFile *f) { return 0; }
+int rdma_registration_handle(QEMUFile *f) { return 0; }
 static inline
-int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags) { return 0; }
+int rdma_registration_start(QEMUFile *f, uint64_t flags) { return 0; }
 static inline
-int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags) { return 0; }
+int rdma_registration_stop(QEMUFile *f, uint64_t flags) { return 0; }
 static inline
 int rdma_block_notification_handle(QEMUFile *f, const char *name) { return 0; }
 static inline
diff --git a/migration/ram.c b/migration/ram.c
index 6a4aed2a75..a9bc6ae1f1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3061,12 +3061,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 }
 }
 
-ret = qemu_rdma_registration_start(f, RAM_CONTROL_SETUP);
+ret = rdma_registration_start(f, RAM_CONTROL_SETUP);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
 
-ret = qemu_rdma_registration_stop(f, RAM_CONTROL_SETUP);
+ret = rdma_registration_stop(f, RAM_CONTROL_SETUP);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
@@ -3131,7 +3131,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 /* Read version before ram_list.blocks */
 smp_rmb();
 
-ret = qemu_rdma_registration_start(f, RAM_CONTROL_ROUND);
+ret = rdma_registration_start(f, RAM_CONTROL_ROUND);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
@@ -3191,7 +3191,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
  * Must occur before EOS (or any QEMUFile operation)
  * because of RDMA protocol.
  */
-ret = qemu_rdma_registration_stop(f, RAM_CONTROL_ROUND);
+ret = rdma_registration_stop(f, RAM_CONTROL_ROUND);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
@@ -3242,7 +3242,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 migration_bitmap_sync_precopy(rs, true);
 }
 
-ret = qemu_rdma_registration_start(f, RAM_CONTROL_FINISH);
+ret = rdma_registration_start(f, RAM_CONTROL_FINISH);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
@@ -3268,7 +3268,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 
 ram_flush_compressed_data(rs);
 
-int ret = qemu_rdma_registration_stop(f, RAM_CONTROL_FINISH);
+int ret = rdma_registration_stop(f, RAM_CONTROL_FINISH);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
@@ -4077,7 +4077,7 @@ static int ram_load_precopy(QEMUFile *f)
 }
 break;
 case RAM_SAVE_FLAG_HOOK:
-ret = qemu_rdma_registration_handle(f);
+ret = rdma_registration_handle(f);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
diff --git a/migration/rdma.c b/migration/rdma.c
index 9883b0a250..c147c94b08 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3540,7 +3540,7 @@ static int dest_ram_sort_func(const void *a, const void 
*b)
  *
  * Keep doing this until the source tells us to stop.
  */
-int qemu_rdma_registration_handle(QEMUFile *f)
+int rdma_registration_handle(QEMUFile *f)
 {
 RDMAControlHeader reg_resp = { .len = sizeof(RDMARegisterResult),
.type = RDMA_CONTROL_REGISTER_RESULT,
@@ -3586,7 +3586,7 @@ int qemu_rdma_registration_handle(QEMUFile *f)
 
local = &rdma->local_ram_blocks;
 do {
-trace_qemu_rdma_registration_handle_wait();
+trace_rdma_registration_handle_wait();
 
ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_NONE, &err);
 
@@ -3606,9 +3606,9 @@ 

[PULL 10/38] tests/qtest/migration: Add a test for the analyze-migration script

2023-10-16 Thread Juan Quintela
From: Fabiano Rosas 

Add a smoke test that migrates to a file and gives it to the
script. It should catch the most annoying errors such as changes in
the ram flags.

After code has been merged it becomes much harder to figure out what is
causing the script to fail; the person making the change is the most
likely to know right away what the problem is.
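
For reference, the check the test performs can also be run by hand against
a migration stream saved to a file (the file path here is made up):

    (qemu) migrate "exec:cat > /tmp/migfile"
    $ python3 scripts/analyze-migration.py -f /tmp/migfile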

Signed-off-by: Fabiano Rosas 
Acked-by: Thomas Huth 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231009184326.15777-7-faro...@suse.de>
---
 tests/qtest/migration-test.c | 60 
 tests/qtest/meson.build  |  2 ++
 2 files changed, 62 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 8eb2053dbb..cef5081f8c 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -66,6 +66,8 @@ static bool got_dst_resume;
  */
 #define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
 
+#define ANALYZE_SCRIPT "scripts/analyze-migration.py"
+
 #if defined(__linux__)
 #include 
 #include 
@@ -1501,6 +1503,61 @@ static void test_baddest(void)
 test_migrate_end(from, to, false);
 }
 
+#ifndef _WIN32
+static void test_analyze_script(void)
+{
+MigrateStart args = {
+.opts_source = "-uuid ----",
+};
+QTestState *from, *to;
+g_autofree char *uri = NULL;
+g_autofree char *file = NULL;
+int pid, wstatus;
+const char *python = g_getenv("PYTHON");
+
+if (!python) {
+g_test_skip("PYTHON variable not set");
+return;
+}
+
+/* dummy url */
+if (test_migrate_start(&from, &to, "tcp:127.0.0.1:0", &args)) {
+return;
+}
+
+/*
+ * Setting these two capabilities causes the "configuration"
+ * vmstate to include subsections for them. The script needs to
+ * parse those subsections properly.
+ */
+migrate_set_capability(from, "validate-uuid", true);
+migrate_set_capability(from, "x-ignore-shared", true);
+
+file = g_strdup_printf("%s/migfile", tmpfs);
+uri = g_strdup_printf("exec:cat > %s", file);
+
+migrate_ensure_converge(from);
+migrate_qmp(from, uri, "{}");
+wait_for_migration_complete(from);
+
+pid = fork();
+if (!pid) {
+close(1);
+open("/dev/null", O_WRONLY);
+execl(python, python, ANALYZE_SCRIPT, "-f", file, NULL);
+g_assert_not_reached();
+}
+
+g_assert(waitpid(pid, &wstatus, 0) == pid);
+if (WIFEXITED(wstatus) && WEXITSTATUS(wstatus) != 0) {
+g_test_message("Failed to analyze the migration stream");
+g_test_fail();
+}
+test_migrate_end(from, to, false);
+cleanup("migfile");
+}
+#endif
+
 static void test_precopy_common(MigrateCommon *args)
 {
 QTestState *from, *to;
@@ -2837,6 +2894,9 @@ int main(int argc, char **argv)
 }
 
 qtest_add_func("/migration/bad_dest", test_baddest);
+#ifndef _WIN32
+qtest_add_func("/migration/analyze-script", test_analyze_script);
+#endif
 qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain);
 qtest_add_func("/migration/precopy/unix/xbzrle", test_precopy_unix_xbzrle);
 /*
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 66795cfcd2..d6022ebd64 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -357,6 +357,8 @@ foreach dir : target_dirs
 test_deps += [qsd]
   endif
 
+  qtest_env.set('PYTHON', python.full_path())
+
   foreach test : target_qtests
 # Executables are shared across targets, declare them only the first time 
we
 # encounter them
-- 
2.41.0




[PULL 24/38] migration/rdma: Use i as for index instead of idx

2023-10-16 Thread Juan Quintela
Once there, all the uses are local to the for loop, so declare the
variable inside the for statement.

Reviewed-by: Fabiano Rosas 
Reviewed-by: Li Zhijian 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-12-quint...@redhat.com>
---
 migration/rdma.c | 49 ++--
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index e973579a52..5f6a771e8e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2354,7 +2354,6 @@ static int qemu_rdma_write(RDMAContext *rdma,
 static void qemu_rdma_cleanup(RDMAContext *rdma)
 {
 Error *err = NULL;
-int idx;
 
 if (rdma->cm_id && rdma->connected) {
 if ((rdma->errored ||
@@ -2381,12 +2380,12 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
 g_free(rdma->dest_blocks);
 rdma->dest_blocks = NULL;
 
-for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
-if (rdma->wr_data[idx].control_mr) {
+for (int i = 0; i < RDMA_WRID_MAX; i++) {
+if (rdma->wr_data[i].control_mr) {
 rdma->total_registrations--;
-ibv_dereg_mr(rdma->wr_data[idx].control_mr);
+ibv_dereg_mr(rdma->wr_data[i].control_mr);
 }
-rdma->wr_data[idx].control_mr = NULL;
+rdma->wr_data[i].control_mr = NULL;
 }
 
 if (rdma->local_ram_blocks.block) {
@@ -2452,7 +2451,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
 
 static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
 {
-int ret, idx;
+int ret;
 
 /*
  * Will be validated against destination's actual capabilities
@@ -2480,18 +2479,17 @@ static int qemu_rdma_source_init(RDMAContext *rdma, 
bool pin_all, Error **errp)
 
 /* Build the hash that maps from offset to RAMBlock */
 rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
-for (idx = 0; idx < rdma->local_ram_blocks.nb_blocks; idx++) {
+for (int i = 0; i < rdma->local_ram_blocks.nb_blocks; i++) {
 g_hash_table_insert(rdma->blockmap,
-(void *)(uintptr_t)rdma->local_ram_blocks.block[idx].offset,
-&rdma->local_ram_blocks.block[idx]);
+(void *)(uintptr_t)rdma->local_ram_blocks.block[i].offset,
+&rdma->local_ram_blocks.block[i]);
 }
 
-for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
-ret = qemu_rdma_reg_control(rdma, idx);
+for (int i = 0; i < RDMA_WRID_MAX; i++) {
+ret = qemu_rdma_reg_control(rdma, i);
 if (ret < 0) {
-error_setg(errp,
-   "RDMA ERROR: rdma migration: error registering %d 
control!",
-   idx);
+error_setg(errp, "RDMA ERROR: rdma migration: error "
+   "registering %d control!", i);
 goto err_rdma_source_init;
 }
 }
@@ -2625,16 +2623,16 @@ err_rdma_source_connect:
 static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 {
 Error *err = NULL;
-int ret, idx;
+int ret;
 struct rdma_cm_id *listen_id;
 char ip[40] = "unknown";
 struct rdma_addrinfo *res, *e;
 char port_str[16];
 int reuse = 1;
 
-for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
-rdma->wr_data[idx].control_len = 0;
-rdma->wr_data[idx].control_curr = NULL;
+for (int i = 0; i < RDMA_WRID_MAX; i++) {
+rdma->wr_data[i].control_len = 0;
+rdma->wr_data[i].control_curr = NULL;
 }
 
 if (!rdma->host || !rdma->host[0]) {
@@ -2723,11 +2721,9 @@ err_dest_init_create_listen_id:
 static void qemu_rdma_return_path_dest_init(RDMAContext *rdma_return_path,
 RDMAContext *rdma)
 {
-int idx;
-
-for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
-rdma_return_path->wr_data[idx].control_len = 0;
-rdma_return_path->wr_data[idx].control_curr = NULL;
+for (int i = 0; i < RDMA_WRID_MAX; i++) {
+rdma_return_path->wr_data[i].control_len = 0;
+rdma_return_path->wr_data[i].control_curr = NULL;
 }
 
 /*the CM channel and CM id is shared*/
@@ -3376,7 +3372,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 struct rdma_cm_event *cm_event;
 struct ibv_context *verbs;
 int ret;
-int idx;
 
ret = rdma_get_cm_event(rdma->channel, &cm_event);
 if (ret < 0) {
@@ -3462,10 +3457,10 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 
 qemu_rdma_init_ram_blocks(rdma);
 
-for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
-ret = qemu_rdma_reg_control(rdma, idx);
+for (int i = 0; i < RDMA_WRID_MAX; i++) {
+ret = qemu_rdma_reg_control(rdma, i);
 if (ret < 0) {
-error_report("rdma: error registering %d control", idx);
+error_report("rdma: error registering %d control", i);
 goto err_rdma_dest_wait;
 }
 }
-- 
2.41.0




[PULL 18/38] migration/rdma: Unfold hook_ram_load()

2023-10-16 Thread Juan Quintela
It is only ever called with one flag: RAM_CONTROL_BLOCK_REG.

Reviewed-by: Li Zhijian 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-6-quint...@redhat.com>
---
 migration/qemu-file.h | 11 ---
 migration/rdma.h  |  3 +++
 migration/qemu-file.c | 10 --
 migration/ram.c   |  6 --
 migration/rdma.c  | 34 +++---
 5 files changed, 18 insertions(+), 46 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 14ff0d9cc4..80c30631dc 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -29,20 +29,12 @@
 #include "exec/cpu-common.h"
 #include "io/channel.h"
 
-/*
- * This function provides hooks around different
- * stages of RAM migration.
- * 'data' is call specific data associated with the 'flags' value
- */
-typedef int (QEMURamHookFunc)(QEMUFile *f, uint64_t flags, void *data);
-
 /*
  * Constants used by ram_control_* hooks
  */
 #define RAM_CONTROL_SETUP 0
 #define RAM_CONTROL_ROUND 1
 #define RAM_CONTROL_FINISH3
-#define RAM_CONTROL_BLOCK_REG 4
 
 /*
  * This function allows override of where the RAM page
@@ -54,7 +46,6 @@ typedef int (QEMURamSaveFunc)(QEMUFile *f,
   size_t size);
 
 typedef struct QEMUFileHooks {
-QEMURamHookFunc *hook_ram_load;
 QEMURamSaveFunc *save_page;
 } QEMUFileHooks;
 
@@ -124,8 +115,6 @@ void qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
 int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
 
-void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data);
-
 /* Whenever this is found in the data stream, the flags
  * will be passed to ram_control_load_hook in the incoming-migration
  * side. This lets before_ram_iterate/after_ram_iterate add
diff --git a/migration/rdma.h b/migration/rdma.h
index 8bd277efb9..8df8b4089a 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -27,6 +27,7 @@ void rdma_start_incoming_migration(const char *host_port, 
Error **errp);
 int qemu_rdma_registration_handle(QEMUFile *f);
 int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags);
 int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags);
+int rdma_block_notification_handle(QEMUFile *f, const char *name);
 #else
 static inline
 int qemu_rdma_registration_handle(QEMUFile *f) { return 0; }
@@ -34,5 +35,7 @@ static inline
 int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags) { return 0; }
 static inline
 int qemu_rdma_registration_stop(QEMUFile *f, uint64_t flags) { return 0; }
+static inline
+int rdma_block_notification_handle(QEMUFile *f, const char *name) { return 0; }
 #endif
 #endif
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index e7dba2a849..4a414b8976 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -298,16 +298,6 @@ void qemu_fflush(QEMUFile *f)
 f->iovcnt = 0;
 }
 
-void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data)
-{
-if (f->hooks && f->hooks->hook_ram_load) {
-int ret = f->hooks->hook_ram_load(f, flags, data);
-if (ret < 0) {
-qemu_file_set_error(f, ret);
-}
-}
-}
-
 int ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
   ram_addr_t offset, size_t size)
 {
diff --git a/migration/ram.c b/migration/ram.c
index f6ea1831b5..8c462276cd 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4025,8 +4025,10 @@ static int ram_load_precopy(QEMUFile *f)
 ret = -EINVAL;
 }
 }
-ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG,
-  block->idstr);
+ret = rdma_block_notification_handle(f, block->idstr);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
 } else {
 error_report("Unknown ramblock \"%s\", cannot "
  "accept migration", id);
diff --git a/migration/rdma.c b/migration/rdma.c
index 5c20f425a9..0b1cb03b2b 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3799,22 +3799,23 @@ err:
 }
 
 /* Destination:
- * Called via a ram_control_load_hook during the initial RAM load section which
- * lists the RAMBlocks by name.  This lets us know the order of the RAMBlocks
- * on the source.
- * We've already built our local RAMBlock list, but not yet sent the list to
- * the source.
+ * Called during the initial RAM load section which lists the
+ * RAMBlocks by name.  This lets us know the order of the RAMBlocks on
+ * the source.  We've already built our local RAMBlock list, but not
+ * yet sent the list to the source.
  */
-static int
-rdma_block_notification_handle(QEMUFile *f, const char *name)
+int rdma_block_notification_handle(QEMUFile *f, const char *name)
 {
-RDMAContext *rdma;
-QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
 int curr;
 

[PULL 15/38] migration/rdma: Unfold ram_control_before_iterate()

2023-10-16 Thread Juan Quintela
Once there:
- Remove the unused data parameter.
- Unfold it in its callers.
- Change all callers to call qemu_rdma_registration_start().
- Call QIO_CHANNEL_RDMA() only after we check for migrate_rdma().

Reviewed-by: Li Zhijian 
Reviewed-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20231011203527.9061-3-quint...@redhat.com>
---
 migration/qemu-file.h |  2 --
 migration/rdma.h  |  7 +++
 migration/qemu-file.c | 13 +
 migration/ram.c   | 16 +---
 migration/rdma.c  | 12 
 5 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 03e718c264..d6a370c569 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -55,7 +55,6 @@ typedef int (QEMURamSaveFunc)(QEMUFile *f,
   size_t size);
 
 typedef struct QEMUFileHooks {
-QEMURamHookFunc *before_ram_iterate;
 QEMURamHookFunc *after_ram_iterate;
 QEMURamHookFunc *hook_ram_load;
 QEMURamSaveFunc *save_page;
@@ -127,7 +126,6 @@ void qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
 int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
 
-void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_load_hook(QEMUFile *f, uint64_t flags, void *data);
 
diff --git a/migration/rdma.h b/migration/rdma.h
index de2ba09dc5..670c67a8cb 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -22,4 +22,11 @@ void rdma_start_outgoing_migration(void *opaque, const char 
*host_port,
 
 void rdma_start_incoming_migration(const char *host_port, Error **errp);
 
+
+#ifdef CONFIG_RDMA
+int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags);
+#else
+static inline
+int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags) { return 0; }
+#endif
 #endif
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 7fb659296f..5e2d73fd68 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -32,6 +32,7 @@
 #include "trace.h"
 #include "options.h"
 #include "qapi/error.h"
+#include "rdma.h"
 
 #define IO_BUF_SIZE 32768
 #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
@@ -297,18 +298,6 @@ void qemu_fflush(QEMUFile *f)
 f->iovcnt = 0;
 }
 
-void ram_control_before_iterate(QEMUFile *f, uint64_t flags)
-{
-int ret = 0;
-
-if (f->hooks && f->hooks->before_ram_iterate) {
-ret = f->hooks->before_ram_iterate(f, flags, NULL);
-if (ret < 0) {
-qemu_file_set_error(f, ret);
-}
-}
-}
-
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags)
 {
 int ret = 0;
diff --git a/migration/ram.c b/migration/ram.c
index acb8f95f00..6592431a4e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -59,6 +59,7 @@
 #include "qemu/iov.h"
 #include "multifd.h"
 #include "sysemu/runstate.h"
+#include "rdma.h"
 #include "options.h"
 #include "sysemu/dirtylimit.h"
 #include "sysemu/kvm.h"
@@ -3060,7 +3061,10 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 }
 }
 
-ram_control_before_iterate(f, RAM_CONTROL_SETUP);
+ret = qemu_rdma_registration_start(f, RAM_CONTROL_SETUP);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
 ram_control_after_iterate(f, RAM_CONTROL_SETUP);
 
 migration_ops = g_malloc0(sizeof(MigrationOps));
@@ -3123,7 +3127,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 /* Read version before ram_list.blocks */
 smp_rmb();
 
-ram_control_before_iterate(f, RAM_CONTROL_ROUND);
+ret = qemu_rdma_registration_start(f, RAM_CONTROL_ROUND);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
 
 t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 i = 0;
@@ -3228,7 +3235,10 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 migration_bitmap_sync_precopy(rs, true);
 }
 
-ram_control_before_iterate(f, RAM_CONTROL_FINISH);
+ret = qemu_rdma_registration_start(f, RAM_CONTROL_FINISH);
+if (ret < 0) {
+qemu_file_set_error(f, ret);
+}
 
 /* try transferring iterative blocks of memory */
 
diff --git a/migration/rdma.c b/migration/rdma.c
index f155f3e1c8..3d74ad6db0 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3850,18 +3850,15 @@ static int rdma_load_hook(QEMUFile *f, uint64_t flags, 
void *data)
 }
 }
 
-static int qemu_rdma_registration_start(QEMUFile *f,
-uint64_t flags, void *data)
+int qemu_rdma_registration_start(QEMUFile *f, uint64_t flags)
 {
-QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
-RDMAContext *rdma;
-
-if (migration_in_postcopy()) {
+if (!migrate_rdma() || migration_in_postcopy()) {
 return 0;
 }
 
+QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
 RCU_READ_LOCK_GUARD();
-rdma = 

[PULL 03/38] migration: Allow user to specify available switchover bandwidth

2023-10-16 Thread Juan Quintela
From: Peter Xu 

Migration bandwidth is a very important value for live migration, because
it is one of the major factors we use to decide when to switch over to the
destination in a precopy process.

This value is currently estimated by QEMU during the whole live migration
process by monitoring how fast we are sending the data.  This would be the
most accurate bandwidth in an ideal world, where we are always feeding
unlimited data to the migration channel so that it is only limited by the
bandwidth that is available.

However in reality it may be very different, e.g., over a 10Gbps network we
can see query-migrate showing migration bandwidth of only a few tens of
MB/s just because there are plenty of other things the migration thread
might be doing.  For example, the migration thread can be busy scanning
zero pages, or it can be fetching dirty bitmap from other external dirty
sources (like vhost or KVM).  It means we may not be pushing as much data
as possible to the migration channel, so the bandwidth estimated from "how
much data we sent in the channel" can be dramatically inaccurate sometimes.

With that, the switchover decision will be affected: we assume that we may
not be able to switch over at all with such a low bandwidth, while in
reality we can.

With that wrong bandwidth estimation, the migration may not even converge
within the specified downtime, and it will keep iterating forever.

The issue is that QEMU itself may not be able to avoid those uncertainties
when measuring the real "available migration bandwidth".  At least I cannot
think of a way so far.

One way to fix this is, when the user is fully aware of the available
bandwidth, to allow the user to provide an accurate value.

For example, if the user has a dedicated channel of 10Gbps for migration
for this specific VM, the user can specify this bandwidth so QEMU can
always do the calculation based on this fact, trusting the user as long as
specified.  It may not be the exact bandwidth when switching over (in which
case qemu will push migration data as fast as possible), but much better
than QEMU trying to wildly guess, especially when very wrong.

A new parameter "avail-switchover-bandwidth" is introduced just for this.
So when the user specified this parameter, instead of trusting the
estimated value from QEMU itself (based on the QEMUFile send speed), it
trusts the user more by using this value to decide when to switchover,
assuming that we'll have such bandwidth available then.

Note that specifying this value will not throttle the bandwidth for
switchover yet, so QEMU will always use the full bandwidth possible for
sending switchover data, assuming that should always be the most important
way to use the network at that time.

This can resolve issues like a migration that never converges because a
ridiculously low "migration bandwidth" was detected for whatever reason.
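
For example, with a dedicated 10Gbps link for migration the user could
provide the value like this (bytes per second; a hypothetical QMP
invocation, not part of the patch itself):

    { "execute": "migrate-set-parameters",
      "arguments": { "avail-switchover-bandwidth": 1250000000 } }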

Reported-by: Zhiyi Guo 
Reviewed-by: Joao Martins 
Reviewed-by: Juan Quintela 
Signed-off-by: Peter Xu 
Signed-off-by: Juan Quintela 
Message-ID: <20231010221922.40638-1-pet...@redhat.com>
---
 qapi/migration.json| 34 +-
 migration/migration.h  |  2 +-
 migration/options.h|  1 +
 migration/migration-hmp-cmds.c | 14 ++
 migration/migration.c  | 24 +---
 migration/options.c| 28 
 migration/trace-events |  2 +-
 7 files changed, 99 insertions(+), 6 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index d7dfaa5db9..360e609f66 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -758,6 +758,16 @@
 # @max-bandwidth: to set maximum speed for migration.  maximum speed
 # in bytes per second.  (Since 2.8)
 #
+# @avail-switchover-bandwidth: to set the available bandwidth that
+# migration can use during switchover phase.  NOTE!  This does not
+# limit the bandwidth during switchover, but only for calculations when
+# making decisions to switchover.  By default, this value is zero,
+# which means QEMU will estimate the bandwidth automatically.  This can
+# be set when the estimated value is not accurate, while the user is
+# able to guarantee such bandwidth is available when switching over.
+# When specified correctly, this can make the switchover decision much
+# more accurate.  (Since 8.2)
+#
 # @downtime-limit: set maximum tolerated downtime for migration.
 # maximum downtime in milliseconds (Since 2.8)
 #
@@ -839,7 +849,7 @@
'cpu-throttle-initial', 'cpu-throttle-increment',
'cpu-throttle-tailslow',
'tls-creds', 'tls-hostname', 'tls-authz', 'max-bandwidth',
-   'downtime-limit',
+   'avail-switchover-bandwidth', 'downtime-limit',
{ 'name': 'x-checkpoint-delay', 'features': [ 'unstable' ] },
'block-incremental',

[PULL 11/38] tests/qtest: migration-test: Add tests for file-based migration

2023-10-16 Thread Juan Quintela
From: Fabiano Rosas 

Add basic tests for file-based migration.

Note that we cannot use test_precopy_common because that routine
expects it to be possible to run the migration live. With the file
transport there is no live migration because we must wait for the
source to finish writing the migration data to the file before the
destination can start reading. Add a new migration function
specifically to handle the file migration.
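
At the QMP level the resulting sequence looks roughly like this (file path
made up; the destination is started with '-incoming defer'):

    # source
    { "execute": "migrate",
      "arguments": { "uri": "file:/tmp/migfile" } }
    # ... wait until query-migrate reports "completed" ...

    # destination
    { "execute": "migrate-incoming",
      "arguments": { "uri": "file:/tmp/migfile" } }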

Reviewed-by: Peter Xu 
Reviewed-by: Juan Quintela 
Signed-off-by: Fabiano Rosas 
Signed-off-by: Juan Quintela 
Message-ID: <20230712190742.22294-7-faro...@suse.de>
---
 tests/qtest/migration-test.c | 147 +++
 1 file changed, 147 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index cef5081f8c..da02b6d692 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -68,6 +68,10 @@ static bool got_dst_resume;
 
 #define ANALYZE_SCRIPT "scripts/analyze-migration.py"
 
+#define QEMU_VM_FILE_MAGIC 0x5145564d
+#define FILE_TEST_FILENAME "migfile"
+#define FILE_TEST_OFFSET 0x1000
+
 #if defined(__linux__)
 #include 
 #include 
@@ -884,6 +888,7 @@ static void test_migrate_end(QTestState *from, QTestState 
*to, bool test_dest)
 cleanup("migsocket");
 cleanup("src_serial");
 cleanup("dest_serial");
+cleanup(FILE_TEST_FILENAME);
 }
 
 #ifdef CONFIG_GNUTLS
@@ -1667,6 +1672,70 @@ finish:
 test_migrate_end(from, to, args->result == MIG_TEST_SUCCEED);
 }
 
+static void test_file_common(MigrateCommon *args, bool stop_src)
+{
+QTestState *from, *to;
+void *data_hook = NULL;
+g_autofree char *connect_uri = g_strdup(args->connect_uri);
+
+if (test_migrate_start(, , args->listen_uri, >start)) {
+return;
+}
+
+/*
+ * File migration is never live. We can keep the source VM running
+ * during migration, but the destination will not be running
+ * concurrently.
+ */
+g_assert_false(args->live);
+
+if (args->start_hook) {
+data_hook = args->start_hook(from, to);
+}
+
+migrate_ensure_converge(from);
+wait_for_serial("src_serial");
+
+if (stop_src) {
+qtest_qmp_assert_success(from, "{ 'execute' : 'stop'}");
+if (!got_src_stop) {
+qtest_qmp_eventwait(from, "STOP");
+}
+}
+
+if (args->result == MIG_TEST_QMP_ERROR) {
+migrate_qmp_fail(from, connect_uri, "{}");
+goto finish;
+}
+
+migrate_qmp(from, connect_uri, "{}");
+wait_for_migration_complete(from);
+
+/*
+ * We need to wait for the source to finish before starting the
+ * destination.
+ */
+migrate_incoming_qmp(to, connect_uri, "{}");
+wait_for_migration_complete(to);
+
+if (stop_src) {
+qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");
+}
+
+if (!got_dst_resume) {
+qtest_qmp_eventwait(to, "RESUME");
+}
+
+wait_for_serial("dest_serial");
+
+finish:
+if (args->finish_hook) {
+args->finish_hook(from, to, data_hook);
+}
+
+test_migrate_end(from, to, args->result == MIG_TEST_SUCCEED);
+}
+
 static void test_precopy_unix_plain(void)
 {
 g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
@@ -1862,6 +1931,76 @@ static void test_precopy_unix_compress_nowait(void)
 test_precopy_common();
 }
 
+static void test_precopy_file(void)
+{
+g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
+   FILE_TEST_FILENAME);
+MigrateCommon args = {
+.connect_uri = uri,
+.listen_uri = "defer",
+};
+
+test_file_common(&args, true);
+}
+
+static void file_offset_finish_hook(QTestState *from, QTestState *to,
+void *opaque)
+{
+#if defined(__linux__)
+g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, 
FILE_TEST_FILENAME);
+size_t size = FILE_TEST_OFFSET + sizeof(QEMU_VM_FILE_MAGIC);
+uintptr_t *addr, *p;
+int fd;
+
+fd = open(path, O_RDONLY);
+g_assert(fd != -1);
+addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
+g_assert(addr != MAP_FAILED);
+
+/*
+ * Ensure the skipped offset contains zeros and the migration
+ * stream starts at the right place.
+ */
+p = addr;
+while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
+g_assert(*p == 0);
+p++;
+}
+g_assert_cmpint(cpu_to_be32(*p), ==, QEMU_VM_FILE_MAGIC);
+
+munmap(addr, size);
+close(fd);
+#endif
+}
+
+static void test_precopy_file_offset(void)
+{
+g_autofree char *uri = g_strdup_printf("file:%s/%s,offset=%d", tmpfs,
+   FILE_TEST_FILENAME,
+   FILE_TEST_OFFSET);
+MigrateCommon args = {
+.connect_uri = uri,
+.listen_uri = "defer",
+.finish_hook = file_offset_finish_hook,
+};
+
+test_file_common(&args, false);
+}
+
+static void 

[PULL 05/38] migration: Add the configuration vmstate to the json writer

2023-10-16 Thread Juan Quintela
From: Nikolay Borisov 

Make the migration json writer part of the MigrationState struct, allowing
the 'configuration' object to be serialized to json.

This will facilitate the parsing of the 'configuration' object in the
next patch that fixes analyze-migration.py for arm.
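
With this change the vmdesc JSON starts roughly like the sketch below (the
outer keys come from the json writer calls in this series, the inner ones
from vmstate_save_state()'s JSON output; contents are abbreviated):

    {
        "configuration": {
            "vmsd_name": "configuration",
            "version": 1,
            "fields": [ ... ]
        },
        "page_size": 4096,
        "devices": [ ... ]
    }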

Signed-off-by: Nikolay Borisov 
Signed-off-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231009184326.15777-2-faro...@suse.de>
---
 migration/migration.c |  1 +
 migration/savevm.c| 20 
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index ed04ca3b1c..98151b1424 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1442,6 +1442,7 @@ int migrate_init(MigrationState *s, Error **errp)
 error_free(s->error);
 s->error = NULL;
 s->hostname = NULL;
+s->vmdesc = NULL;
 
migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 497ce02bd7..bce698b0af 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1217,13 +1217,27 @@ void qemu_savevm_non_migratable_list(strList **reasons)
 
 void qemu_savevm_state_header(QEMUFile *f)
 {
+MigrationState *s = migrate_get_current();
+
+s->vmdesc = json_writer_new(false);
+
 trace_savevm_state_header();
 qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
 qemu_put_be32(f, QEMU_VM_FILE_VERSION);
 
-if (migrate_get_current()->send_configuration) {
+if (s->send_configuration) {
 qemu_put_byte(f, QEMU_VM_CONFIGURATION);
-vmstate_save_state(f, &vmstate_configuration, &savevm_state, 0);
+
+/*
+ * This starts the main json object and is paired with the
+ * json_writer_end_object in
+ * qemu_savevm_state_complete_precopy_non_iterable
+ */
+json_writer_start_object(s->vmdesc, NULL);
+
+json_writer_start_object(s->vmdesc, "configuration");
+vmstate_save_state(f, &vmstate_configuration, &savevm_state, s->vmdesc);
+json_writer_end_object(s->vmdesc);
 }
 }
 
@@ -1272,8 +1286,6 @@ void qemu_savevm_state_setup(QEMUFile *f)
 Error *local_err = NULL;
 int ret;
 
-ms->vmdesc = json_writer_new(false);
-json_writer_start_object(ms->vmdesc, NULL);
 json_writer_int64(ms->vmdesc, "page_size", qemu_target_page_size());
 json_writer_start_array(ms->vmdesc, "devices");
 
-- 
2.41.0




[PULL 12/38] migration: hold the BQL during setup

2023-10-16 Thread Juan Quintela
From: Fiona Ebner 

This is intended to be a semantic revert of commit 9b09503752
("migration: run setup callbacks out of big lock"). There have been so
many changes since that commit (e.g. a new setup callback
dirty_bitmap_save_setup() that also needs to be adapted now), it's
easier to do the revert manually.

For snapshots, the bdrv_writev_vmstate() function is used during setup
(in QIOChannelBlock backing the QEMUFile), but not holding the BQL
while calling it could lead to an assertion failure. To understand
how, first note the following:

1. Generated coroutine wrappers for block layer functions spawn the
coroutine and use AIO_WAIT_WHILE()/aio_poll() to wait for it.
2. If the host OS switches threads at an inconvenient time, it can
happen that a bottom half scheduled for the main thread's AioContext
is executed as part of a vCPU thread's aio_poll().

An example leading to the assertion failure is as follows:

main thread:
1. A snapshot-save QMP command gets issued.
2. snapshot_save_job_bh() is scheduled.

vCPU thread:
3. aio_poll() for the main thread's AioContext is called (e.g. when
the guest writes to a pflash device, as part of blk_pwrite which is a
generated coroutine wrapper).
4. snapshot_save_job_bh() is executed as part of aio_poll().
5. qemu_savevm_state() is called.
6. qemu_mutex_unlock_iothread() is called. Now
qemu_get_current_aio_context() returns 0x0.
7. bdrv_writev_vmstate() is executed during the usual savevm setup
via qemu_fflush(). But this function is a generated coroutine wrapper,
so it uses AIO_WAIT_WHILE. There, the assertion
assert(qemu_get_current_aio_context() == qemu_get_aio_context());
will fail.

To fix it, ensure that the BQL is held during setup. While it would
only be needed for snapshots, adapting migration too avoids additional
logic for conditional locking/unlocking in the setup callbacks.
Writing the header could (in theory) also trigger qemu_fflush() and
thus bdrv_writev_vmstate(), so the locked section also covers the
qemu_savevm_state_header() call, even for migration for consistency.

The section around multifd_send_sync_main() needs to be unlocked to
avoid a deadlock. In particular, the multifd_save_setup() function calls
socket_send_channel_create() using multifd_new_send_channel_async() as a
callback and then waits for the callback to signal via the
channels_ready semaphore. The connection happens via
qio_task_run_in_thread(), but the callback is only executed via
qio_task_thread_result() which is scheduled for the main event loop.
Without unlocking the section, the main thread would never get to
process the task result and the callback meaning there would be no
signal via the channels_ready semaphore.

The comment in ram_init_bitmaps() was introduced by 4987783400
("migration: fix incorrect memory_global_dirty_log_start outside BQL")
and is removed, because it referred to the qemu_mutex_lock_iothread()
call.
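
A minimal sketch of the resulting locking around the setup phase
(simplified; per the description above, the multifd sync inside the RAM
setup callback temporarily drops the lock):

    qemu_mutex_lock_iothread();
    qemu_savevm_state_header(s->to_dst_file);
    qemu_savevm_state_setup(s->to_dst_file);
    qemu_mutex_unlock_iothread();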

Signed-off-by: Fiona Ebner 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Signed-off-by: Juan Quintela 
Message-ID: <20231013105839.415989-1-f.eb...@proxmox.com>
---
 include/migration/register.h   | 2 +-
 migration/block-dirty-bitmap.c | 3 ---
 migration/block.c  | 5 -
 migration/migration.c  | 6 ++
 migration/ram.c| 6 +++---
 migration/savevm.c | 2 --
 6 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index 2b12c6adec..fed1d04a3c 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -25,6 +25,7 @@ typedef struct SaveVMHandlers {
  * used to perform early checks.
  */
 int (*save_prepare)(void *opaque, Error **errp);
+int (*save_setup)(QEMUFile *f, void *opaque);
 void (*save_cleanup)(void *opaque);
 int (*save_live_complete_postcopy)(QEMUFile *f, void *opaque);
 int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);
@@ -50,7 +51,6 @@ typedef struct SaveVMHandlers {
 int (*save_live_iterate)(QEMUFile *f, void *opaque);
 
 /* This runs outside the iothread lock!  */
-int (*save_setup)(QEMUFile *f, void *opaque);
 /* Note for save_live_pending:
  * must_precopy:
  * - must be migrated in precopy or in stopped state
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 032fc5f405..03cb2e72ee 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -1214,9 +1214,7 @@ static int dirty_bitmap_save_setup(QEMUFile *f, void 
*opaque)
 DBMSaveState *s = &((DBMState *)opaque)->save;
 SaveBitmapState *dbms = NULL;
 
-qemu_mutex_lock_iothread();
 if (init_dirty_bitmap_migration(s) < 0) {
-qemu_mutex_unlock_iothread();
 return -1;
 }
 
@@ -1224,7 +1222,6 @@ static int dirty_bitmap_save_setup(QEMUFile *f, void 
*opaque)
 send_bitmap_start(f, s, dbms);
 }
 qemu_put_bitmap_flags(f, DIRTY_BITMAP_MIG_FLAG_EOS);
-

Re: -drive if=none: can't we make this the default?

2023-10-16 Thread Paolo Bonzini

On 10/14/23 21:16, Michael Tokarev wrote:

> Can't we make -drive if=none the default?
>
> Yes, I know current default is ide, and whole world have to use if=none
> explicitly to undo this.  I think at this point we can deprecate if=ide
> default and switch to if=none in the next release.  I think it will be
> a welcome change.


I think if anything we should have no default at all.  But if I had my way:

1) if=none would be deprecated (but with a much longer cycle than 1 
year, probably), and everything that uses it would have to use -blockdev.


2) -drive would be limited to a very small set of suboptions (file, 
cache, if, and the ones in qemu_common_drive_opts) and anything that 
specifies the driver would go through -blockdev.


Paolo




Re: -drive if=none: can't we make this the default?

2023-10-16 Thread Daniel P . Berrangé
On Sat, Oct 14, 2023 at 10:16:16PM +0300, Michael Tokarev wrote:
> Can't we make -drive if=none the default?
> 
> Yes, I know current default is ide, and whole world have to use if=none 
> explicitly
> to undo this.  I think at this point we can deprecate if=ide default and 
> switch to
> if=none in the next release.  I think it will be a welcome change.

IMHO we'd be better off investing more effort in pushing people towards
-blockdev through better documentation of the latter.
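
For illustration, the legacy shorthand next to the explicit -blockdev form
(image file and node names are made up):

    # legacy shorthand
    -drive file=disk.qcow2,if=none,id=disk0 \
    -device virtio-blk-pci,drive=disk0

    # explicit -blockdev form
    -blockdev driver=file,filename=disk.qcow2,node-name=file0 \
    -blockdev driver=qcow2,file=file0,node-name=fmt0 \
    -device virtio-blk-pci,drive=fmt0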

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v3 0/8] qemu-img: rebase: add compression support

2023-10-16 Thread Andrey Drobyshev
On 10/2/23 09:35, Andrey Drobyshev wrote:
> On 9/19/23 20:57, Andrey Drobyshev wrote:
>> v2 --> v3:
>>  * Patch 3/8: fixed logic in the if statement, so that we align on blk
>>when blk_old_backing == NULL;
>>  * Patch 4/8: comment fix;
>>  * Patch 5/8: comment fix; dropped redundant "if (blk_new_backing)"
>>statements.
>>
>> v2: https://lists.nongnu.org/archive/html/qemu-block/2023-09/msg00448.html
>>
>> Andrey Drobyshev (8):
>>   qemu-img: rebase: stop when reaching EOF of old backing file
>>   qemu-iotests: 024: add rebasing test case for overlay_size >
>> backing_size
>>   qemu-img: rebase: use backing files' BlockBackend for buffer alignment
>>   qemu-img: add chunk size parameter to compare_buffers()
>>   qemu-img: rebase: avoid unnecessary COW operations
>>   iotests/{024, 271}: add testcases for qemu-img rebase
>>   qemu-img: add compression option to rebase subcommand
>>   iotests: add tests for "qemu-img rebase" with compression
>>
>>  docs/tools/qemu-img.rst|   6 +-
>>  qemu-img-cmds.hx   |   4 +-
>>  qemu-img.c | 136 ++
>>  tests/qemu-iotests/024 | 117 ++
>>  tests/qemu-iotests/024.out |  73 
>>  tests/qemu-iotests/271 | 131 +
>>  tests/qemu-iotests/271.out |  82 ++
>>  tests/qemu-iotests/314 | 165 +
>>  tests/qemu-iotests/314.out |  75 +
>>  9 files changed, 752 insertions(+), 37 deletions(-)
>>  create mode 100755 tests/qemu-iotests/314
>>  create mode 100644 tests/qemu-iotests/314.out
>>
> 
> Ping

Friendly ping



Re: [PATCH v4 03/10] migration: migrate 'inc' command option is deprecated.

2023-10-16 Thread Juan Quintela
Markus Armbruster  wrote:
> Juan Quintela  writes:
>
>> Set the 'block_incremental' migration parameter to 'true' instead.
>>
>> Reviewed-by: Thomas Huth 
>> Acked-by: Stefan Hajnoczi 
>> Signed-off-by: Juan Quintela 
>>
>> ---
>>
>> Improve documentation and style (thanks Markus)
>> ---
>>  docs/about/deprecated.rst | 7 +++
>>  qapi/migration.json   | 8 +++-
>>  migration/migration.c | 6 ++
>>  3 files changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
>> index 1c4d7f36f0..1b6b2870cf 100644
>> --- a/docs/about/deprecated.rst
>> +++ b/docs/about/deprecated.rst
>> @@ -452,3 +452,10 @@ Migration
>>  ``skipped`` field in Migration stats has been deprecated.  It hasn't
>>  been used for more than 10 years.
>>  
>> +``inc`` migrate command option (since 8.2)
>> +''
>> +
>> +The new way to modify migration is using migration parameters.
>> +``inc`` functionality can be achieved by setting the
>> +``block-incremental`` migration parameter to ``true``.
>> +
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 6865fea3c5..56bbd55b87 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -1492,6 +1492,11 @@
>>  #
>>  # @resume: resume one paused migration, default "off". (since 3.0)
>>  #
>> +# Features:
>> +#
>> +# @deprecated: Member @inc is deprecated.  Use migration parameter
>> +# @block-incremental instead.
>
> This is fine now.  It becomes bad advice in PATCH 05, which deprecates
> @block-incremental.  Two solutions:
>
> 1. Change this patch to point to an alternative that will *not* be
> deprecated.

Ok, clearly I am not explaining myself properly O:-)

History of block migration:
* In the beginning there were the -b and -i migrate options.
  They were the only way to do storage migration.
* We moved to use parameters and capabilities for migration
  So we created @block-incremental and @block.
  But we didn't remove the command line options (for backward
  compatibility).
* We were asked to modify migration so that some storage was migrated
  and some was not migrated during migration.  But the block people
  found that it was a good idea to allow storage migration without
  migrating the VM, so they created the blockdev-mirror mechanism,
  which is shiny, faster, better.

So now we have old code that basically nobody uses (the last big user
was COLO, but now it can use multifd).  So we want to drop it, but we
don't care about a direct replacement.

So, why am I interested in removing this?
- @block and @block-incremental: If you don't use block migration, their
  existence doesn't bother you.  They are "quite" independent of the rest
  of the migration code (they could be better integrated, but that is no
  big trouble here).
- migrate options -i/-b: These ones hurt us each time that we need to
  change the options.  Notice that we have "perfect" replacements
  with @block and @block-incremental, with exactly the same
  result/semantics/...
  You can see the troubles in the RFC patches

 * [PATCH v4 07/10] [RFC] migration: Make -i/-b an error for hmp and qmp
 * [PATCH v4 08/10] [RFC] migration: Remove helpers needed for -i/-b 
migrate options

So what I want, I want to remove -i/-b in the next version (9.0?).  For
the other, I want to remove it, but I don't care if the code is around
in "deprecated" state for another couple of years if there are still
people that feel that they want it.

This is the reason that I put a pointer for -i/-b to
@block/@block-incremental.  They are "perfect" replacements.
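
For concreteness, the capability/parameter form referred to here is
roughly the following (hypothetical QMP session, issued before the
migrate command):

    { "execute": "migrate-set-capabilities",
      "arguments": { "capabilities": [
          { "capability": "block", "state": true } ] } }
    { "execute": "migrate-set-parameters",
      "arguments": { "block-incremental": true } }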

I can put here to use blockdev-mirror + NBD, but the replacement is not
so direct.

Does this make sense?
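
A rough sketch of that blockdev-mirror + NBD flow (node names, export id,
host and port are made up; waiting for the mirror job to become ready and
the actual switchover are omitted):

    # destination VM
    { "execute": "nbd-server-start",
      "arguments": { "addr": { "type": "inet",
                               "data": { "host": "0.0.0.0",
                                         "port": "10809" } } } }
    { "execute": "block-export-add",
      "arguments": { "type": "nbd", "id": "exp0",
                     "node-name": "dst-disk", "writable": true } }

    # source VM
    { "execute": "blockdev-add",
      "arguments": { "driver": "nbd", "node-name": "mirror-tgt",
                     "server": { "type": "inet",
                                 "host": "dst-host", "port": "10809" },
                     "export": "exp0" } }
    { "execute": "blockdev-mirror",
      "arguments": { "job-id": "mirror0", "device": "src-disk",
                     "target": "mirror-tgt", "sync": "full" } }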


> 2. Change PATCH 05.
>
> Same end result.
>
>> +#
>>  # Returns: nothing on success
>>  #
>>  # Since: 0.14
>> @@ -1513,7 +1518,8 @@
>>  # <- { "return": {} }
>>  ##
>>  { 'command': 'migrate',
>> -  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool',
>> +  'data': {'uri': 'str', '*blk': 'bool',
>> +   '*inc': { 'type': 'bool', 'features': [ 'deprecated' ] },
>> '*detach': 'bool', '*resume': 'bool' } }
>>  
>>  ##
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 1c6c81ad49..ac4897fe0d 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1601,6 +1601,12 @@ static bool migrate_prepare(MigrationState *s, bool 
>> blk, bool blk_inc,
>>  {
>>  Error *local_err = NULL;
>>  
>> +if (blk_inc) {
>> +warn_report("@inc/-i migrate option is deprecated, set the"
>
> This is either about QMP migrate's parameter "inc", or HMP migrate's
> flags -i.

Needs to be @inc.  I warn about the "-i" command option in another place.

> In the former case, we want something like "parameter 'inc' is
> deprecated".

This one.

> In the latter case, we want something like "-i is deprecated".

Ok, changing.

> Trying to do both in a single message results in a sub-par message.  If
> you want to do better, 

Re: [PATCH v5] migration: hold the BQL during setup

2023-10-16 Thread Juan Quintela
Fiona Ebner  wrote:

queued.

> This is intended to be a semantic revert of commit 9b09503752
> ("migration: run setup callbacks out of big lock"). There have been so
> many changes since that commit (e.g. a new setup callback
> dirty_bitmap_save_setup() that also needs to be adapted now), it's
> easier to do the revert manually.
>
> For snapshots, the bdrv_writev_vmstate() function is used during setup
> (in QIOChannelBlock backing the QEMUFile), but not holding the BQL
> while calling it could lead to an assertion failure. To understand
> how, first note the following:
>
> 1. Generated coroutine wrappers for block layer functions spawn the
> coroutine and use AIO_WAIT_WHILE()/aio_poll() to wait for it.
> 2. If the host OS switches threads at an inconvenient time, it can
> happen that a bottom half scheduled for the main thread's AioContext
> is executed as part of a vCPU thread's aio_poll().
>
> An example leading to the assertion failure is as follows:
>
> main thread:
> 1. A snapshot-save QMP command gets issued.
> 2. snapshot_save_job_bh() is scheduled.
>
> vCPU thread:
> 3. aio_poll() for the main thread's AioContext is called (e.g. when
> the guest writes to a pflash device, as part of blk_pwrite which is a
> generated coroutine wrapper).
> 4. snapshot_save_job_bh() is executed as part of aio_poll().
> 3. qemu_savevm_state() is called.
> 4. qemu_mutex_unlock_iothread() is called. Now
> qemu_get_current_aio_context() returns 0x0.
> 5. bdrv_writev_vmstate() is executed during the usual savevm setup
> via qemu_fflush(). But this function is a generated coroutine wrapper,
> so it uses AIO_WAIT_WHILE. There, the assertion
> assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> will fail.
>
> To fix it, ensure that the BQL is held during setup. While it would
> only be needed for snapshots, adapting migration too avoids additional
> logic for conditional locking/unlocking in the setup callbacks.
> Writing the header could (in theory) also trigger qemu_fflush() and
> thus bdrv_writev_vmstate(), so the locked section also covers the
> qemu_savevm_state_header() call, even for migration for consistency.
>
> The section around multifd_send_sync_main() needs to be unlocked to
> avoid a deadlock. In particular, the multifd_save_setup() function calls
> socket_send_channel_create() using multifd_new_send_channel_async() as a
> callback and then waits for the callback to signal via the
> channels_ready semaphore. The connection happens via