Re: [PATCH] linux-aio: add IO_CMD_FDSYNC command support

2024-03-10 Thread Prasad Pandit
Hello Kevin,

On Fri, 8 Mar 2024 at 17:38, Prasad Pandit  wrote:
> I'm trying to test it against the Fedora-26 kernel, which was < 4.13.0, and 
> did not support the AIO_FDSYNC call.

[PATCH v2] -> 
https://lists.nongnu.org/archive/html/qemu-devel/2024-03/msg02495.html

* I've sent v2 of this patch which checks the return value from
'laio_co_submit' function and returns if it is >= zero(0).

* I tested this and previous version of this patch on host kernels
which support IO_CMD_FDSYNC and which don't.
1) When kernel supports IO_CMD_FDSYNC, everything works well. No issues.

   2) When kernel does _not_ support IO_CMD_FDSYNC
- With [PATCH v1], guest does not boot, instead it opens a rescue shell
- With [PATCH v2], guest boots and seems to work fine. But
after some time guest kernel threads seem to hang and show traces like
===
INFO: task kworker/u2:0:9 blocked for more than 245 seconds.
INFO: task (tmpfiles):482 blocked for more than 123 seconds.
INFO: task (tmpfiles):482 blocked for more than 368 seconds.
INFO: task systemd-random-:477 blocked for more than 368 seconds.
[  492.932383]   Not tainted 6.6.7-100.fc37.x86_64 #1
[  492.935404] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
===

* I'm not yet sure how to fix this. I'd appreciate if you've suggestions.

Thank you.
---
  - Prasad




[PATCH v2] linux-aio: add IO_CMD_FDSYNC command support

2024-03-10 Thread Prasad Pandit
From: Prasad Pandit 

Libaio defines IO_CMD_FDSYNC command to sync all outstanding
asynchronous I/O operations, by flushing out file data to the
disk storage.

Enable linux-aio to submit such aio request. This helps to
reduce latency induced via pthread_create calls by
thread-pool (aio=threads).

Signed-off-by: Prasad Pandit 
---
 block/file-posix.c | 12 
 block/linux-aio.c  |  5 -
 2 files changed, 16 insertions(+), 1 deletion(-)

v2: if IO_CMD_FDSYNC is not supported by the kernel,
fallback on thread-pool flush.
  -> https://lists.nongnu.org/archive/html/qemu-devel/2024-03/msg01986.html

diff --git a/block/file-posix.c b/block/file-posix.c
index 35684f7e21..4f2195d01d 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2599,6 +2599,18 @@ static int coroutine_fn 
raw_co_flush_to_disk(BlockDriverState *bs)
 if (raw_check_linux_io_uring(s)) {
 return luring_co_submit(bs, s->fd, 0, NULL, QEMU_AIO_FLUSH);
 }
+#endif
+#ifdef CONFIG_LINUX_AIO
+if (raw_check_linux_aio(s)) {
+ret = laio_co_submit(s->fd, 0, NULL, QEMU_AIO_FLUSH, 0);
+if (ret >= 0) {
+/*
+ * if AIO_FLUSH is supported return
+ * else fallback on thread-pool flush.
+ */
+return ret;
+}
+}
 #endif
 return raw_thread_pool_submit(handle_aiocb_flush, &acb);
 }
diff --git a/block/linux-aio.c b/block/linux-aio.c
index ec05d946f3..d940d029e3 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -384,6 +384,9 @@ static int laio_do_submit(int fd, struct qemu_laiocb 
*laiocb, off_t offset,
 case QEMU_AIO_READ:
 io_prep_preadv(iocbs, fd, qiov->iov, qiov->niov, offset);
 break;
+case QEMU_AIO_FLUSH:
+io_prep_fdsync(iocbs, fd);
+break;
 /* Currently Linux kernel does not support other operations */
 default:
 fprintf(stderr, "%s: invalid AIO request type 0x%x.\n",
@@ -412,7 +415,7 @@ int coroutine_fn laio_co_submit(int fd, uint64_t offset, 
QEMUIOVector *qiov,
 AioContext *ctx = qemu_get_current_aio_context();
 struct qemu_laiocb laiocb = {
 .co = qemu_coroutine_self(),
-.nbytes = qiov->size,
+.nbytes = qiov ? qiov->size : 0,
 .ctx= aio_get_linux_aio(ctx),
 .ret= -EINPROGRESS,
 .is_read= (type == QEMU_AIO_READ),
-- 
2.44.0




[PATCH v6 2/3] backends: Initial support for SPDM socket support

2024-03-10 Thread Alistair Francis
From: Huai-Cheng Kuo 

SPDM enables authentication, attestation and key exchange to assist in
providing infrastructure security enablement. It's a standard published
by the DMTF [1].

SPDM supports multiple transports, including PCIe DOE and MCTP.
This patch adds support to QEMU to connect to an external SPDM
instance.

SPDM support can be added to any QEMU device by exposing a
TCP socket to a SPDM server. The server can then implement the SPDM
decoding/encoding support, generally using libspdm [2].

This is similar to how the current TPM implementation works and means
that the heavy lifting of setting up certificate chains, capabilities,
measurements and complex crypto can be done outside QEMU by a well
supported and tested library.

1: https://www.dmtf.org/standards/SPDM
2: https://github.com/DMTF/libspdm

Signed-off-by: Huai-Cheng Kuo 
Signed-off-by: Chris Browy 
Co-developed-by: Jonathan Cameron 
Signed-off-by: Jonathan Cameron 
[ Changes by WM
 - Bug fixes from testing
]
Signed-off-by: Wilfred Mallawa 
[ Changes by AF:
 - Convert to be more QEMU-ified
 - Move to backends as it isn't PCIe specific
]
Signed-off-by: Alistair Francis 
---
 MAINTAINERS  |   6 +
 include/sysemu/spdm-socket.h |  74 
 backends/spdm-socket.c   | 216 +++
 backends/Kconfig |   4 +
 backends/meson.build |   2 +
 5 files changed, 302 insertions(+)
 create mode 100644 include/sysemu/spdm-socket.h
 create mode 100644 backends/spdm-socket.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 4d96f855de..0a8ffa8fe0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3396,6 +3396,12 @@ F: tests/qtest/*tpm*
 F: docs/specs/tpm.rst
 T: git https://github.com/stefanberger/qemu-tpm.git tpm-next
 
+SPDM
+M: Alistair Francis 
+S: Maintained
+F: backends/spdm-socket.c
+F: include/sysemu/spdm-socket.h
+
 Checkpatch
 S: Odd Fixes
 F: scripts/checkpatch.pl
diff --git a/include/sysemu/spdm-socket.h b/include/sysemu/spdm-socket.h
new file mode 100644
index 00..5d8bd9aa4e
--- /dev/null
+++ b/include/sysemu/spdm-socket.h
@@ -0,0 +1,74 @@
+/*
+ * QEMU SPDM socket support
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef SPDM_REQUESTER_H
+#define SPDM_REQUESTER_H
+
+/**
+ * spdm_socket_connect: connect to an external SPDM socket
+ * @port: port to connect to
+ * @errp: error object handle
+ *
+ * This will connect to an external SPDM socket server. On error
+ * it will return -1 and errp will be set. On success this function
+ * will return the socket number.
+ */
+int spdm_socket_connect(uint16_t port, Error **errp);
+
+/**
+ * spdm_socket_rsp: send and receive a message to a SPDM server
+ * @socket: socket returned from spdm_socket_connect()
+ * @transport_type: SPDM_SOCKET_TRANSPORT_TYPE_* macro
+ * @req: request buffer
+ * @req_len: request buffer length
+ * @rsp: response buffer
+ * @rsp_len: response buffer length
+ *
+ * Send platform data to a SPDM server on socket and then receive
+ * a response.
+ */
+uint32_t spdm_socket_rsp(const int socket, uint32_t transport_type,
+ void *req, uint32_t req_len,
+ void *rsp, uint32_t rsp_len);
+
+/**
+ * spdm_socket_close: send a shutdown command to the server
+ * @socket: socket returned from spdm_socket_connect()
+ * @transport_type: SPDM_SOCKET_TRANSPORT_TYPE_* macro
+ *
+ * This will issue a shutdown command to the server.
+ */
+void spdm_socket_close(const int socket, uint32_t transport_type);
+
+#define SPDM_SOCKET_COMMAND_NORMAL0x0001
+#define SPDM_SOCKET_COMMAND_OOB_ENCAP_KEY_UPDATE  0x8001
+#define SPDM_SOCKET_COMMAND_CONTINUE  0xFFFD
+#define SPDM_SOCKET_COMMAND_SHUTDOWN  0xFFFE
+#define SPDM_SOCKET_COMMAND_UNKOWN0x
+#define SPDM_SOCKET_COMMAND_TEST  0xDEAD
+
+#define SPDM_SOCKET_TRANSPORT_TYPE_MCTP   0x01
+#define SPDM_SOCKET_TRANSPORT_TYPE_PCI_D

[PATCH v6 3/3] hw/nvme: Add SPDM over DOE support

2024-03-10 Thread Alistair Francis
From: Wilfred Mallawa 

Setup Data Object Exchance (DOE) as an extended capability for the NVME
controller and connect SPDM to it (CMA) to it.

Signed-off-by: Wilfred Mallawa 
Signed-off-by: Alistair Francis 
Reviewed-by: Jonathan Cameron 
Acked-by: Klaus Jensen 
---
 docs/specs/index.rst|   1 +
 docs/specs/spdm.rst | 122 
 include/hw/pci/pci_device.h |   5 ++
 include/hw/pci/pcie_doe.h   |   3 +
 hw/nvme/ctrl.c  |  57 +
 5 files changed, 188 insertions(+)
 create mode 100644 docs/specs/spdm.rst

diff --git a/docs/specs/index.rst b/docs/specs/index.rst
index 1484e3e760..e2d907959a 100644
--- a/docs/specs/index.rst
+++ b/docs/specs/index.rst
@@ -29,6 +29,7 @@ guest hardware that is specific to QEMU.
edu
ivshmem-spec
pvpanic
+   spdm
standard-vga
virt-ctlr
vmcoreinfo
diff --git a/docs/specs/spdm.rst b/docs/specs/spdm.rst
new file mode 100644
index 00..06b5ccac09
--- /dev/null
+++ b/docs/specs/spdm.rst
@@ -0,0 +1,122 @@
+==
+QEMU Security Protocols and Data Models (SPDM) Support
+==
+
+SPDM enables authentication, attestation and key exchange to assist in
+providing infrastructure security enablement. It's a standard published
+by the `DMTF`_.
+
+QEMU supports connecting to a SPDM responder implementation. This allows an
+external application to emulate the SPDM responder logic for an SPDM device.
+
+Setting up a SPDM server
+
+
+When using QEMU with SPDM devices QEMU will connect to a server which
+implements the SPDM functionality.
+
+SPDM-Utils
+--
+
+You can use `SPDM Utils`_ to emulate a responder. This is the simplest method.
+
+SPDM-Utils is a Linux applications to manage, test and develop devices
+supporting DMTF Security Protocol and Data Model (SPDM). It is written in Rust
+and utilises libspdm.
+
+To use SPDM-Utils you will need to do the following steps. Details are included
+in the SPDM-Utils README.
+
+ 1. `Build libspdm`_
+ 2. `Build SPDM Utils`_
+ 3. `Run it as a server`_
+
+spdm-emu
+
+
+You can use `spdm emu`_ to model the
+SPDM responder.
+
+.. code-block:: shell
+
+$ cd spdm-emu
+$ git submodule init; git submodule update --recursive
+$ mkdir build; cd build
+$ cmake -DARCH=x64 -DTOOLCHAIN=GCC -DTARGET=Debug -DCRYPTO=openssl ..
+$ make -j32
+$ make copy_sample_key # Build certificates, required for SPDM 
authentication.
+
+It is worth noting that the certificates should be in compliance with
+PCIe r6.1 sec 6.31.3. This means you will need to add the following to
+openssl.cnf
+
+.. code-block::
+
+subjectAltName = 
otherName:2.23.147;UTF8:Vendor=1b36:Device=0010:CC=010802:REV=02:SSVID=1af4:SSID=1100
+2.23.147 = ASN1:OID:2.23.147
+
+and then manually regenerate some certificates with:
+
+.. code-block:: shell
+
+$ openssl req -nodes -newkey ec:param.pem -keyout end_responder.key \
+-out end_responder.req -sha384 -batch \
+-subj "/CN=DMTF libspdm ECP384 responder cert"
+
+$ openssl x509 -req -in end_responder.req -out end_responder.cert \
+-CA inter.cert -CAkey inter.key -sha384 -days 3650 -set_serial 3 \
+-extensions v3_end -extfile ../openssl.cnf
+
+$ openssl asn1parse -in end_responder.cert -out end_responder.cert.der
+
+$ cat ca.cert.der inter.cert.der end_responder.cert.der > 
bundle_responder.certchain.der
+
+You can use SPDM-Utils instead as it will generate the correct certificates
+automatically.
+
+The responder can then be launched with
+
+.. code-block:: shell
+
+$ cd bin
+$ ./spdm_responder_emu --trans PCI_DOE
+
+Connecting an SPDM NVMe device
+==
+
+Once a SPDM server is running we can start QEMU and connect to the server.
+
+For an NVMe device first let's setup a block we can use
+
+.. code-block:: shell
+
+$ cd qemu-spdm/linux/image
+$ dd if=/dev/zero of=blknvme bs=1M count=2096 # 2GB NNMe Drive
+
+Then you can add this to your QEMU command line:
+
+.. code-block:: shell
+
+-drive file=blknvme,if=none,id=mynvme,format=raw \
+-device nvme,drive=mynvme,serial=deadbeef,spdm=2323
+
+At which point QEMU will try to connect to the SPDM server.
+
+
+.. _DMTF:
+   https://www.dmtf.org/standards/SPDM
+
+.. _SPDM Utils:
+   https://github.com/westerndigitalcorporation/spdm-utils
+
+.. _spdm emu:
+   https://github.com/dmtf/spdm-emu
+
+.. _Build libspdm:
+   
https://github.com/westerndigitalcorporation/spdm-utils?tab=readme-ov-file#build-libspdm
+
+.. _Build SPDM Utils:
+   
https://github.com/westerndigitalcorporation/spdm-utils?tab=readme-ov-file#build-the-binary
+
+.. _Run it as a server:
+   
https://github.com/westerndigitalcorporation/spdm-utils#qemu-spdm-device-emulation
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index d3dd0f64b2..b8379c78f1 100644
--- a/include/hw/pc

[PATCH v6 1/3] hw/pci: Add all Data Object Types defined in PCIe r6.0

2024-03-10 Thread Alistair Francis
Add all of the defined protocols/features from the PCIe-SIG r6.0
"Table 6-32 PCI-SIG defined Data Object Types (Vendor ID = 0001h)"
table.

Signed-off-by: Alistair Francis 
Reviewed-by: Jonathan Cameron 
---
 include/hw/pci/pcie_doe.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/hw/pci/pcie_doe.h b/include/hw/pci/pcie_doe.h
index 87dc17dcef..15d94661f9 100644
--- a/include/hw/pci/pcie_doe.h
+++ b/include/hw/pci/pcie_doe.h
@@ -46,6 +46,8 @@ REG32(PCI_DOE_CAP_STATUS, 0)
 
 /* PCI-SIG defined Data Object Types - r6.0 Table 6-32 */
 #define PCI_SIG_DOE_DISCOVERY   0x00
+#define PCI_SIG_DOE_CMA 0x01
+#define PCI_SIG_DOE_SECURED_CMA 0x02
 
 #define PCI_DOE_DW_SIZE_MAX (1 << 18)
 #define PCI_DOE_PROTOCOL_NUM_MAX256
-- 
2.44.0




[PATCH v6 0/3] Initial support for SPDM Responders

2024-03-10 Thread Alistair Francis
The Security Protocol and Data Model (SPDM) Specification defines
messages, data objects, and sequences for performing message exchanges
over a variety of transport and physical media.
 - 
https://www.dmtf.org/sites/default/files/standards/documents/DSP0274_1.3.0.pdf

SPDM currently supports PCIe DOE and MCTP transports, but it can be
extended to support others in the future. This series adds
support to QEMU to connect to an external SPDM instance.

SPDM support can be added to any QEMU device by exposing a
TCP socket to a SPDM server. The server can then implement the SPDM
decoding/encoding support, generally using libspdm [1].

This is similar to how the current TPM implementation works and means
that the heavy lifting of setting up certificate chains, capabilities,
measurements and complex crypto can be done outside QEMU by a well
supported and tested library.

This series implements socket support and exposes SPDM for a NVMe device.

1: https://github.com/DMTF/libspdm

v6:
 - Add documentation to public functions
 - Rename socket variable to spdm_socket
 - Don't override errp
 - Correctly return false from nvme_init_pci() on error
v5:
 - Update MAINTAINERS
v4:
 - Rebase
v3:
 - Spelling fixes
 - Support for SPDM-Utils
v2:
 - Add cover letter
 - A few code fixes based on comments
 - Document SPDM-Utils
 - A few tweaks and clarifications to the documentation

Alistair Francis (1):
  hw/pci: Add all Data Object Types defined in PCIe r6.0

Huai-Cheng Kuo (1):
  backends: Initial support for SPDM socket support

Wilfred Mallawa (1):
  hw/nvme: Add SPDM over DOE support

 MAINTAINERS  |   6 +
 docs/specs/index.rst |   1 +
 docs/specs/spdm.rst  | 122 
 include/hw/pci/pci_device.h  |   5 +
 include/hw/pci/pcie_doe.h|   5 +
 include/sysemu/spdm-socket.h |  74 
 backends/spdm-socket.c   | 216 +++
 hw/nvme/ctrl.c   |  57 +
 backends/Kconfig |   4 +
 backends/meson.build |   2 +
 10 files changed, 492 insertions(+)
 create mode 100644 docs/specs/spdm.rst
 create mode 100644 include/sysemu/spdm-socket.h
 create mode 100644 backends/spdm-socket.c

-- 
2.44.0




Re: [PATCH v2 00/10] mirror: allow switching from background to active mode

2024-03-10 Thread Peter Krempa
On Thu, Mar 07, 2024 at 22:42:56 +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 04.03.24 14:09, Peter Krempa wrote:
> > On Mon, Mar 04, 2024 at 11:48:54 +0100, Kevin Wolf wrote:
> > > Am 28.02.2024 um 19:07 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > On 03.11.23 18:56, Markus Armbruster wrote:
> > > > > Kevin Wolf  writes:

[...]

> 
> I only see the following differencies:
> 
> 1. block-job-complete documents that it completes the job synchronously.. But 
> looking at mirror code I see it just set s->should_complete = true, which 
> will be then handled asynchronously.. So I doubt that documentation is 
> correct.
> 
> 2. block-job-complete will trigger final graph changes. block-job-cancel will 
> not.
> 
> Is [2] really useful? Seems yes: in case of some failure before starting 
> migration target, we'd like to continue executing source. So, no reason to 
> break block-graph in source, better keep it unchanged.
> 
> But I think, such behavior better be setup by mirror-job start parameter, 
> rather then by special option for cancel (or even compelete) command, useful 
> only for mirror.

Libvirt's API design was based on the qemu quirk and thus we allow users
to do the decision after starting the job, thus anything you pick needs
to allow us to do this at "completion" time.

Libvirt can adapt to any option that will give us the above semantics
(extra parameter at completion time, different completion command or
extra command to switch job properties right before completion), but to
be honest all of these feel like they would be more hassle than keeping
'block-job-cancel' around from qemu's side.




Re: [PATCH] hw/ide/ahci: Rename ahci_internal.h to ahci-internal.h

2024-03-10 Thread BALATON Zoltan

On Tue, 27 Feb 2024, Philippe Mathieu-Daudé wrote:

On 27/2/24 14:13, BALATON Zoltan wrote:

Other headers now use dash instead of underscore. Rename
ahci_internal.h accordingly for consistency.

Signed-off-by: BALATON Zoltan 
---
  hw/ide/{ahci_internal.h => ahci-internal.h} | 0
  hw/ide/ahci.c   | 2 +-
  hw/ide/ich.c| 2 +-
  3 files changed, 2 insertions(+), 2 deletions(-)
  rename hw/ide/{ahci_internal.h => ahci-internal.h} (100%)


Reviewed-by: Philippe Mathieu-Daudé 


Was this also queued somewhere? I haven't seen it merged neither with the 
trivial nor the misc-hw pull requests.


Regards,
BALATON Zoltan

Re: [PATCH 0/2] hw/nvme: fix hibernation-based migration

2024-03-10 Thread Klaus Jensen
On Mar 10 12:07, Klaus Jensen wrote:
> Julien Grall, in #2184, reported that hibernation-based migration with
> hw/nvme is broken when suspending on a pre 6.0 QEMU and resuming on a
> more recent one. This is because the BAR layout was changed in 6.0.
> 
> Fix this by adding a machine compatibility parameter that restores the
> old behavior.
> 
> Signed-off-by: Klaus Jensen 
> ---
> Klaus Jensen (2):
>   hw/nvme: generalize the mbar size helper
>   hw/nvme: add machine compatibility parameter to enable msix exclusive 
> bar
> 
>  hw/core/machine.c |  1 +
>  hw/nvme/ctrl.c| 73 
> ---
>  hw/nvme/nvme.h|  1 +
>  3 files changed, 50 insertions(+), 25 deletions(-)
> ---
> base-commit: f901bf11b3ddf852e591593b09b8aa7a177f9a0b
> change-id: 20240310-fix-msix-exclusive-bar-d65564414a2c
> 
> Best regards,
> -- 
> Klaus Jensen 
> 

Whoops, forgot Fixes: and Resolves: tags.

Will add that on the pull.


signature.asc
Description: PGP signature


[PATCH 0/2] hw/nvme: fix hibernation-based migration

2024-03-10 Thread Klaus Jensen
Julien Grall, in #2184, reported that hibernation-based migration with
hw/nvme is broken when suspending on a pre 6.0 QEMU and resuming on a
more recent one. This is because the BAR layout was changed in 6.0.

Fix this by adding a machine compatibility parameter that restores the
old behavior.

Signed-off-by: Klaus Jensen 
---
Klaus Jensen (2):
  hw/nvme: generalize the mbar size helper
  hw/nvme: add machine compatibility parameter to enable msix exclusive bar

 hw/core/machine.c |  1 +
 hw/nvme/ctrl.c| 73 ---
 hw/nvme/nvme.h|  1 +
 3 files changed, 50 insertions(+), 25 deletions(-)
---
base-commit: f901bf11b3ddf852e591593b09b8aa7a177f9a0b
change-id: 20240310-fix-msix-exclusive-bar-d65564414a2c

Best regards,
-- 
Klaus Jensen 




[PATCH 1/2] hw/nvme: generalize the mbar size helper

2024-03-10 Thread Klaus Jensen
From: Klaus Jensen 

Generalize the mbar size helper such that it can handle cases where the
MSI-X table and PBA are expected to be in an exclusive bar.

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 76fe0397045b..8cca8a762124 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8003,13 +8003,18 @@ static void nvme_init_pmr(NvmeCtrl *n, PCIDevice 
*pci_dev)
 memory_region_set_enabled(&n->pmr.dev->mr, false);
 }
 
-static uint64_t nvme_bar_size(unsigned total_queues, unsigned total_irqs,
-  unsigned *msix_table_offset,
-  unsigned *msix_pba_offset)
+static uint64_t nvme_mbar_size(unsigned total_queues, unsigned total_irqs,
+   unsigned *msix_table_offset,
+   unsigned *msix_pba_offset)
 {
-uint64_t bar_size, msix_table_size, msix_pba_size;
+uint64_t bar_size, msix_table_size;
 
 bar_size = sizeof(NvmeBar) + 2 * total_queues * NVME_DB_SIZE;
+
+if (total_irqs == 0) {
+goto out;
+}
+
 bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
 
 if (msix_table_offset) {
@@ -8024,11 +8029,10 @@ static uint64_t nvme_bar_size(unsigned total_queues, 
unsigned total_irqs,
 *msix_pba_offset = bar_size;
 }
 
-msix_pba_size = QEMU_ALIGN_UP(total_irqs, 64) / 8;
-bar_size += msix_pba_size;
+bar_size += QEMU_ALIGN_UP(total_irqs, 64) / 8;
 
-bar_size = pow2ceil(bar_size);
-return bar_size;
+out:
+return pow2ceil(bar_size);
 }
 
 static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset)
@@ -8036,7 +8040,7 @@ static void nvme_init_sriov(NvmeCtrl *n, PCIDevice 
*pci_dev, uint16_t offset)
 uint16_t vf_dev_id = n->params.use_intel_id ?
  PCI_DEVICE_ID_INTEL_NVME : PCI_DEVICE_ID_REDHAT_NVME;
 NvmePriCtrlCap *cap = &n->pri_ctrl_cap;
-uint64_t bar_size = nvme_bar_size(le16_to_cpu(cap->vqfrsm),
+uint64_t bar_size = nvme_mbar_size(le16_to_cpu(cap->vqfrsm),
   le16_to_cpu(cap->vifrsm),
   NULL, NULL);
 
@@ -8098,8 +8102,8 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev, Error **errp)
 }
 
 /* add one to max_ioqpairs to account for the admin queue pair */
-bar_size = nvme_bar_size(n->params.max_ioqpairs + 1, n->params.msix_qsize,
- &msix_table_offset, &msix_pba_offset);
+bar_size = nvme_mbar_size(n->params.max_ioqpairs + 1, n->params.msix_qsize,
+  &msix_table_offset, &msix_pba_offset);
 
 memory_region_init(&n->bar0, OBJECT(n), "nvme-bar0", bar_size);
 memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n, "nvme",

-- 
2.43.0




[PATCH 2/2] hw/nvme: add machine compatibility parameter to enable msix exclusive bar

2024-03-10 Thread Klaus Jensen
From: Klaus Jensen 

Commit 1901b4967c3f ("hw/block/nvme: move msix table and pba to BAR 0")
moved the MSI-X table and PBA to BAR 0 to make room for enabling CMR and
PMR at the same time. As reported by Julien Grall in #2184, this breaks
migration through system hibernation.

Add a machine compatibility parameter and set it on machines pre 6.0 to
enable the old behavior automatically, restoring the hibernation
migration support.

Reported-by: Julien Grall 
Tested-by: Julien Grall 
Signed-off-by: Klaus Jensen 
---
 hw/core/machine.c |  1 +
 hw/nvme/ctrl.c| 51 +++
 hw/nvme/nvme.h|  1 +
 3 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 9ac5d5389a6c..f3012bca1370 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -100,6 +100,7 @@ GlobalProperty hw_compat_5_2[] = {
 { "PIIX4_PM", "smm-compat", "on"},
 { "virtio-blk-device", "report-discard-granularity", "off" },
 { "virtio-net-pci-base", "vectors", "3"},
+{ "nvme", "msix-exclusive-bar", "on"},
 };
 const size_t hw_compat_5_2_len = G_N_ELEMENTS(hw_compat_5_2);
 
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 8cca8a762124..649467723e7e 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -7798,6 +7798,11 @@ static bool nvme_check_params(NvmeCtrl *n, Error **errp)
 }
 
 if (n->pmr.dev) {
+if (params->msix_exclusive_bar) {
+error_setg(errp, "not enough BARs available to enable PMR");
+return false;
+}
+
 if (host_memory_backend_is_mapped(n->pmr.dev)) {
 error_setg(errp, "can't use already busy memdev: %s",

object_get_canonical_path_component(OBJECT(n->pmr.dev)));
@@ -8101,24 +8106,36 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev, Error **errp)
 pcie_ari_init(pci_dev, 0x100);
 }
 
-/* add one to max_ioqpairs to account for the admin queue pair */
-bar_size = nvme_mbar_size(n->params.max_ioqpairs + 1, n->params.msix_qsize,
-  &msix_table_offset, &msix_pba_offset);
-
-memory_region_init(&n->bar0, OBJECT(n), "nvme-bar0", bar_size);
-memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n, "nvme",
-  msix_table_offset);
-memory_region_add_subregion(&n->bar0, 0, &n->iomem);
-
-if (pci_is_vf(pci_dev)) {
-pcie_sriov_vf_register_bar(pci_dev, 0, &n->bar0);
-} else {
+if (n->params.msix_exclusive_bar && !pci_is_vf(pci_dev)) {
+bar_size = nvme_mbar_size(n->params.max_ioqpairs + 1, 0, NULL, NULL);
+memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n, "nvme",
+  bar_size);
 pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
- PCI_BASE_ADDRESS_MEM_TYPE_64, &n->bar0);
+ PCI_BASE_ADDRESS_MEM_TYPE_64, &n->iomem);
+ret = msix_init_exclusive_bar(pci_dev, n->params.msix_qsize, 4, errp);
+} else {
+/* add one to max_ioqpairs to account for the admin queue pair */
+bar_size = nvme_mbar_size(n->params.max_ioqpairs + 1,
+  n->params.msix_qsize, &msix_table_offset,
+  &msix_pba_offset);
+
+memory_region_init(&n->bar0, OBJECT(n), "nvme-bar0", bar_size);
+memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n, "nvme",
+  msix_table_offset);
+memory_region_add_subregion(&n->bar0, 0, &n->iomem);
+
+if (pci_is_vf(pci_dev)) {
+pcie_sriov_vf_register_bar(pci_dev, 0, &n->bar0);
+} else {
+pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_TYPE_64, &n->bar0);
+}
+
+ret = msix_init(pci_dev, n->params.msix_qsize,
+&n->bar0, 0, msix_table_offset,
+&n->bar0, 0, msix_pba_offset, 0, errp);
 }
-ret = msix_init(pci_dev, n->params.msix_qsize,
-&n->bar0, 0, msix_table_offset,
-&n->bar0, 0, msix_pba_offset, 0, errp);
+
 if (ret == -ENOTSUP) {
 /* report that msix is not supported, but do not error out */
 warn_report_err(*errp);
@@ -8416,6 +8433,8 @@ static Property nvme_props[] = {
   params.sriov_max_vi_per_vf, 0),
 DEFINE_PROP_UINT8("sriov_max_vq_per_vf", NvmeCtrl,
   params.sriov_max_vq_per_vf, 0),
+DEFINE_PROP_BOOL("msix-exclusive-bar", NvmeCtrl, params.msix_exclusive_bar,
+ false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 5f2ae7b28b9c..eccf14acc906 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -522,6 +522,7 @@ typedef struct NvmeParams {
 uint16_t sriov_vi_flexible;
 uint8_t  sriov_max_vq_per_vf;
 uint8_t  sriov_max_vi_per_v