Re: [PATCH v7 3/3] hw/nvme: Add SPDM over DOE support

2024-06-13 Thread Wilfred Mallawa
On Fri, 2024-06-14 at 11:28 +1000, Alistair Francis wrote:
> From: Wilfred Mallawa 
> 
> Setup Data Object Exchance (DOE) as an extended capability for the
> NVME
small typo here 邏️ [s/Setup Data Object Exchance/Setup Data Object
Exchange]

Wilfred
> controller and connect SPDM to it (CMA) to it.
> 
> Signed-off-by: Wilfred Mallawa 
> Signed-off-by: Alistair Francis 
> Reviewed-by: Jonathan Cameron 
> Acked-by: Klaus Jensen 
> ---
>  docs/specs/index.rst    |   1 +
>  docs/specs/spdm.rst | 134
> 
>  include/hw/pci/pci_device.h |   7 ++
>  include/hw/pci/pcie_doe.h   |   3 +
>  hw/nvme/ctrl.c  |  60 
>  5 files changed, 205 insertions(+)
>  create mode 100644 docs/specs/spdm.rst
> 
> diff --git a/docs/specs/index.rst b/docs/specs/index.rst
> index 1484e3e760..e2d907959a 100644
> --- a/docs/specs/index.rst
> +++ b/docs/specs/index.rst
> @@ -29,6 +29,7 @@ guest hardware that is specific to QEMU.
>     edu
>     ivshmem-spec
>     pvpanic
> +   spdm
>     standard-vga
>     virt-ctlr
>     vmcoreinfo
> diff --git a/docs/specs/spdm.rst b/docs/specs/spdm.rst
> new file mode 100644
> index 00..f7de080ff0
> --- /dev/null
> +++ b/docs/specs/spdm.rst
> @@ -0,0 +1,134 @@
> +==
> +QEMU Security Protocols and Data Models (SPDM) Support
> +==
> +
> +SPDM enables authentication, attestation and key exchange to assist
> in
> +providing infrastructure security enablement. It's a standard
> published
> +by the `DMTF`_.
> +
> +QEMU supports connecting to a SPDM responder implementation. This
> allows an
> +external application to emulate the SPDM responder logic for an SPDM
> device.
> +
> +Setting up a SPDM server
> +
> +
> +When using QEMU with SPDM devices QEMU will connect to a server
> which
> +implements the SPDM functionality.
> +
> +SPDM-Utils
> +--
> +
> +You can use `SPDM Utils`_ to emulate a responder. This is the
> simplest method.
> +
> +SPDM-Utils is a Linux applications to manage, test and develop
> devices
> +supporting DMTF Security Protocol and Data Model (SPDM). It is
> written in Rust
> +and utilises libspdm.
> +
> +To use SPDM-Utils you will need to do the following steps. Details
> are included
> +in the SPDM-Utils README.
> +
> + 1. `Build libspdm`_
> + 2. `Build SPDM Utils`_
> + 3. `Run it as a server`_
> +
> +spdm-emu
> +
> +
> +You can use `spdm emu`_ to model the
> +SPDM responder.
> +
> +.. code-block:: shell
> +
> +    $ cd spdm-emu
> +    $ git submodule init; git submodule update --recursive
> +    $ mkdir build; cd build
> +    $ cmake -DARCH=x64 -DTOOLCHAIN=GCC -DTARGET=Debug -
> DCRYPTO=openssl ..
> +    $ make -j32
> +    $ make copy_sample_key # Build certificates, required for SPDM
> authentication.
> +
> +It is worth noting that the certificates should be in compliance
> with
> +PCIe r6.1 sec 6.31.3. This means you will need to add the following
> to
> +openssl.cnf
> +
> +.. code-block::
> +
> +    subjectAltName =
> otherName:2.23.147;UTF8:Vendor=1b36:Device=0010:CC=010802:REV=02:SSVI
> D=1af4:SSID=1100
> +    2.23.147 = ASN1:OID:2.23.147
> +
> +and then manually regenerate some certificates with:
> +
> +.. code-block:: shell
> +
> +    $ openssl req -nodes -newkey ec:param.pem -keyout
> end_responder.key \
> +    -out end_responder.req -sha384 -batch \
> +    -subj "/CN=DMTF libspdm ECP384 responder cert"
> +
> +    $ openssl x509 -req -in end_responder.req -out
> end_responder.cert \
> +    -CA inter.cert -CAkey inter.key -sha384 -days 3650 -
> set_serial 3 \
> +    -extensions v3_end -extfile ../openssl.cnf
> +
> +    $ openssl asn1parse -in end_responder.cert -out
> end_responder.cert.der
> +
> +    $ cat ca.cert.der inter.cert.der end_responder.cert.der >
> bundle_responder.certchain.der
> +
> +You can use SPDM-Utils instead as it will generate the correct
> certificates
> +automatically.
> +
> +The responder can then be launched with
> +
> +.. code-block:: shell
> +
> +    $ cd bin
> +    $ ./spdm_responder_emu --trans PCI_DOE
> +
> +Connecting an SPDM NVMe device
> +==
> +
> +Once a SPDM server is running we can start QEMU and connect to the
> server.
> +
> +For an NVMe device first let's setup a block we can use
> +
> +.. code-block:: shell
> +
> +    $ cd qemu-spdm/linux/image
> +    $ dd if=/dev/zero of=blknvme bs=1M count=2096 # 2GB NNMe Drive
> +
> +Then you can add this to your QEMU command line:
> +
> +.. code-block:: shell
> +
> +    -drive file=blknvme,if=none,id=mynvme,format=raw \
> +    -device nvme,drive=mynvme,serial=deadbeef,spdm_port=2323
> +
> +At which point QEMU will try to connect to the SPDM server.
> +
> +Note that if using x64-64 you will want to use the q35 machine
> instead
> +of the default. So the entire QEMU command might look like this
> +
> +.. code-block:: shell
> +
> +    

Re: [PATCH v7 1/3] hw/pci: Add all Data Object Types defined in PCIe r6.0

2024-06-13 Thread Wilfred Mallawa
Reviewed-by: Wilfred Mallawa 

On Fri, 2024-06-14 at 11:28 +1000, Alistair Francis wrote:
> Add all of the defined protocols/features from the PCIe-SIG r6.0
> "Table 6-32 PCI-SIG defined Data Object Types (Vendor ID = 0001h)"
> table.
> 
> Signed-off-by: Alistair Francis 
> Reviewed-by: Jonathan Cameron 
> ---
>  include/hw/pci/pcie_doe.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/hw/pci/pcie_doe.h b/include/hw/pci/pcie_doe.h
> index 87dc17dcef..15d94661f9 100644
> --- a/include/hw/pci/pcie_doe.h
> +++ b/include/hw/pci/pcie_doe.h
> @@ -46,6 +46,8 @@ REG32(PCI_DOE_CAP_STATUS, 0)
>  
>  /* PCI-SIG defined Data Object Types - r6.0 Table 6-32 */
>  #define PCI_SIG_DOE_DISCOVERY   0x00
> +#define PCI_SIG_DOE_CMA 0x01
> +#define PCI_SIG_DOE_SECURED_CMA 0x02
>  
>  #define PCI_DOE_DW_SIZE_MAX (1 << 18)
>  #define PCI_DOE_PROTOCOL_NUM_MAX    256




[PATCH v7 2/3] backends: Initial support for SPDM socket support

2024-06-13 Thread Alistair Francis
From: Huai-Cheng Kuo 

SPDM enables authentication, attestation and key exchange to assist in
providing infrastructure security enablement. It's a standard published
by the DMTF [1].

SPDM supports multiple transports, including PCIe DOE and MCTP.
This patch adds support to QEMU to connect to an external SPDM
instance.

SPDM support can be added to any QEMU device by exposing a
TCP socket to a SPDM server. The server can then implement the SPDM
decoding/encoding support, generally using libspdm [2].

This is similar to how the current TPM implementation works and means
that the heavy lifting of setting up certificate chains, capabilities,
measurements and complex crypto can be done outside QEMU by a well
supported and tested library.

1: https://www.dmtf.org/standards/SPDM
2: https://github.com/DMTF/libspdm

Signed-off-by: Huai-Cheng Kuo 
Signed-off-by: Chris Browy 
Co-developed-by: Jonathan Cameron 
Signed-off-by: Jonathan Cameron 
[ Changes by WM
 - Bug fixes from testing
]
Signed-off-by: Wilfred Mallawa 
[ Changes by AF:
 - Convert to be more QEMU-ified
 - Move to backends as it isn't PCIe specific
]
Signed-off-by: Alistair Francis 
---
 MAINTAINERS  |   6 +
 include/sysemu/spdm-socket.h |  74 
 backends/spdm-socket.c   | 216 +++
 backends/Kconfig |   4 +
 backends/meson.build |   2 +
 5 files changed, 302 insertions(+)
 create mode 100644 include/sysemu/spdm-socket.h
 create mode 100644 backends/spdm-socket.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 951556224a..aeac108586 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3394,6 +3394,12 @@ F: tests/qtest/*tpm*
 F: docs/specs/tpm.rst
 T: git https://github.com/stefanberger/qemu-tpm.git tpm-next
 
+SPDM
+M: Alistair Francis 
+S: Maintained
+F: backends/spdm-socket.c
+F: include/sysemu/spdm-socket.h
+
 Checkpatch
 S: Odd Fixes
 F: scripts/checkpatch.pl
diff --git a/include/sysemu/spdm-socket.h b/include/sysemu/spdm-socket.h
new file mode 100644
index 00..5d8bd9aa4e
--- /dev/null
+++ b/include/sysemu/spdm-socket.h
@@ -0,0 +1,74 @@
+/*
+ * QEMU SPDM socket support
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef SPDM_REQUESTER_H
+#define SPDM_REQUESTER_H
+
+/**
+ * spdm_socket_connect: connect to an external SPDM socket
+ * @port: port to connect to
+ * @errp: error object handle
+ *
+ * This will connect to an external SPDM socket server. On error
+ * it will return -1 and errp will be set. On success this function
+ * will return the socket number.
+ */
+int spdm_socket_connect(uint16_t port, Error **errp);
+
+/**
+ * spdm_socket_rsp: send and receive a message to a SPDM server
+ * @socket: socket returned from spdm_socket_connect()
+ * @transport_type: SPDM_SOCKET_TRANSPORT_TYPE_* macro
+ * @req: request buffer
+ * @req_len: request buffer length
+ * @rsp: response buffer
+ * @rsp_len: response buffer length
+ *
+ * Send platform data to a SPDM server on socket and then receive
+ * a response.
+ */
+uint32_t spdm_socket_rsp(const int socket, uint32_t transport_type,
+ void *req, uint32_t req_len,
+ void *rsp, uint32_t rsp_len);
+
+/**
+ * spdm_socket_close: send a shutdown command to the server
+ * @socket: socket returned from spdm_socket_connect()
+ * @transport_type: SPDM_SOCKET_TRANSPORT_TYPE_* macro
+ *
+ * This will issue a shutdown command to the server.
+ */
+void spdm_socket_close(const int socket, uint32_t transport_type);
+
+#define SPDM_SOCKET_COMMAND_NORMAL0x0001
+#define SPDM_SOCKET_COMMAND_OOB_ENCAP_KEY_UPDATE  0x8001
+#define SPDM_SOCKET_COMMAND_CONTINUE  0xFFFD
+#define SPDM_SOCKET_COMMAND_SHUTDOWN  0xFFFE
+#define SPDM_SOCKET_COMMAND_UNKOWN0x
+#define SPDM_SOCKET_COMMAND_TEST  0xDEAD
+
+#define SPDM_SOCKET_TRANSPORT_TYPE_MCTP   0x01
+#define 

[PATCH v7 0/3] Initial support for SPDM Responders

2024-06-13 Thread Alistair Francis
The Security Protocol and Data Model (SPDM) Specification defines
messages, data objects, and sequences for performing message exchanges
over a variety of transport and physical media.
 - 
https://www.dmtf.org/sites/default/files/standards/documents/DSP0274_1.3.0.pdf

SPDM currently supports PCIe DOE and MCTP transports, but it can be
extended to support others in the future. This series adds
support to QEMU to connect to an external SPDM instance.

SPDM support can be added to any QEMU device by exposing a
TCP socket to a SPDM server. The server can then implement the SPDM
decoding/encoding support, generally using libspdm [1].

This is similar to how the current TPM implementation works and means
that the heavy lifting of setting up certificate chains, capabilities,
measurements and complex crypto can be done outside QEMU by a well
supported and tested library.

This series implements socket support and exposes SPDM for a NVMe device.

1: https://github.com/DMTF/libspdm

v7:
 - Fixup checkpatch failures
 - Fixup test failures
 - Rename port name to be clearer
v6:
 - Add documentation to public functions
 - Rename socket variable to spdm_socket
 - Don't override errp
 - Correctly return false from nvme_init_pci() on error
v5:
 - Update MAINTAINERS
v4:
 - Rebase
v3:
 - Spelling fixes
 - Support for SPDM-Utils
v2:
 - Add cover letter
 - A few code fixes based on comments
 - Document SPDM-Utils
 - A few tweaks and clarifications to the documentation

Alistair Francis (1):
  hw/pci: Add all Data Object Types defined in PCIe r6.0

Huai-Cheng Kuo (1):
  backends: Initial support for SPDM socket support

Wilfred Mallawa (1):
  hw/nvme: Add SPDM over DOE support

 MAINTAINERS  |   6 +
 docs/specs/index.rst |   1 +
 docs/specs/spdm.rst  | 134 ++
 include/hw/pci/pci_device.h  |   7 ++
 include/hw/pci/pcie_doe.h|   5 +
 include/sysemu/spdm-socket.h |  74 
 backends/spdm-socket.c   | 216 +++
 hw/nvme/ctrl.c   |  60 ++
 backends/Kconfig |   4 +
 backends/meson.build |   2 +
 10 files changed, 509 insertions(+)
 create mode 100644 docs/specs/spdm.rst
 create mode 100644 include/sysemu/spdm-socket.h
 create mode 100644 backends/spdm-socket.c

-- 
2.45.2




[PATCH v7 1/3] hw/pci: Add all Data Object Types defined in PCIe r6.0

2024-06-13 Thread Alistair Francis
Add all of the defined protocols/features from the PCIe-SIG r6.0
"Table 6-32 PCI-SIG defined Data Object Types (Vendor ID = 0001h)"
table.

Signed-off-by: Alistair Francis 
Reviewed-by: Jonathan Cameron 
---
 include/hw/pci/pcie_doe.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/hw/pci/pcie_doe.h b/include/hw/pci/pcie_doe.h
index 87dc17dcef..15d94661f9 100644
--- a/include/hw/pci/pcie_doe.h
+++ b/include/hw/pci/pcie_doe.h
@@ -46,6 +46,8 @@ REG32(PCI_DOE_CAP_STATUS, 0)
 
 /* PCI-SIG defined Data Object Types - r6.0 Table 6-32 */
 #define PCI_SIG_DOE_DISCOVERY   0x00
+#define PCI_SIG_DOE_CMA 0x01
+#define PCI_SIG_DOE_SECURED_CMA 0x02
 
 #define PCI_DOE_DW_SIZE_MAX (1 << 18)
 #define PCI_DOE_PROTOCOL_NUM_MAX256
-- 
2.45.2




[PATCH v7 3/3] hw/nvme: Add SPDM over DOE support

2024-06-13 Thread Alistair Francis
From: Wilfred Mallawa 

Setup Data Object Exchance (DOE) as an extended capability for the NVME
controller and connect SPDM to it (CMA) to it.

Signed-off-by: Wilfred Mallawa 
Signed-off-by: Alistair Francis 
Reviewed-by: Jonathan Cameron 
Acked-by: Klaus Jensen 
---
 docs/specs/index.rst|   1 +
 docs/specs/spdm.rst | 134 
 include/hw/pci/pci_device.h |   7 ++
 include/hw/pci/pcie_doe.h   |   3 +
 hw/nvme/ctrl.c  |  60 
 5 files changed, 205 insertions(+)
 create mode 100644 docs/specs/spdm.rst

diff --git a/docs/specs/index.rst b/docs/specs/index.rst
index 1484e3e760..e2d907959a 100644
--- a/docs/specs/index.rst
+++ b/docs/specs/index.rst
@@ -29,6 +29,7 @@ guest hardware that is specific to QEMU.
edu
ivshmem-spec
pvpanic
+   spdm
standard-vga
virt-ctlr
vmcoreinfo
diff --git a/docs/specs/spdm.rst b/docs/specs/spdm.rst
new file mode 100644
index 00..f7de080ff0
--- /dev/null
+++ b/docs/specs/spdm.rst
@@ -0,0 +1,134 @@
+==
+QEMU Security Protocols and Data Models (SPDM) Support
+==
+
+SPDM enables authentication, attestation and key exchange to assist in
+providing infrastructure security enablement. It's a standard published
+by the `DMTF`_.
+
+QEMU supports connecting to a SPDM responder implementation. This allows an
+external application to emulate the SPDM responder logic for an SPDM device.
+
+Setting up a SPDM server
+
+
+When using QEMU with SPDM devices QEMU will connect to a server which
+implements the SPDM functionality.
+
+SPDM-Utils
+--
+
+You can use `SPDM Utils`_ to emulate a responder. This is the simplest method.
+
+SPDM-Utils is a Linux applications to manage, test and develop devices
+supporting DMTF Security Protocol and Data Model (SPDM). It is written in Rust
+and utilises libspdm.
+
+To use SPDM-Utils you will need to do the following steps. Details are included
+in the SPDM-Utils README.
+
+ 1. `Build libspdm`_
+ 2. `Build SPDM Utils`_
+ 3. `Run it as a server`_
+
+spdm-emu
+
+
+You can use `spdm emu`_ to model the
+SPDM responder.
+
+.. code-block:: shell
+
+$ cd spdm-emu
+$ git submodule init; git submodule update --recursive
+$ mkdir build; cd build
+$ cmake -DARCH=x64 -DTOOLCHAIN=GCC -DTARGET=Debug -DCRYPTO=openssl ..
+$ make -j32
+$ make copy_sample_key # Build certificates, required for SPDM 
authentication.
+
+It is worth noting that the certificates should be in compliance with
+PCIe r6.1 sec 6.31.3. This means you will need to add the following to
+openssl.cnf
+
+.. code-block::
+
+subjectAltName = 
otherName:2.23.147;UTF8:Vendor=1b36:Device=0010:CC=010802:REV=02:SSVID=1af4:SSID=1100
+2.23.147 = ASN1:OID:2.23.147
+
+and then manually regenerate some certificates with:
+
+.. code-block:: shell
+
+$ openssl req -nodes -newkey ec:param.pem -keyout end_responder.key \
+-out end_responder.req -sha384 -batch \
+-subj "/CN=DMTF libspdm ECP384 responder cert"
+
+$ openssl x509 -req -in end_responder.req -out end_responder.cert \
+-CA inter.cert -CAkey inter.key -sha384 -days 3650 -set_serial 3 \
+-extensions v3_end -extfile ../openssl.cnf
+
+$ openssl asn1parse -in end_responder.cert -out end_responder.cert.der
+
+$ cat ca.cert.der inter.cert.der end_responder.cert.der > 
bundle_responder.certchain.der
+
+You can use SPDM-Utils instead as it will generate the correct certificates
+automatically.
+
+The responder can then be launched with
+
+.. code-block:: shell
+
+$ cd bin
+$ ./spdm_responder_emu --trans PCI_DOE
+
+Connecting an SPDM NVMe device
+==
+
+Once a SPDM server is running we can start QEMU and connect to the server.
+
+For an NVMe device first let's setup a block we can use
+
+.. code-block:: shell
+
+$ cd qemu-spdm/linux/image
+$ dd if=/dev/zero of=blknvme bs=1M count=2096 # 2GB NNMe Drive
+
+Then you can add this to your QEMU command line:
+
+.. code-block:: shell
+
+-drive file=blknvme,if=none,id=mynvme,format=raw \
+-device nvme,drive=mynvme,serial=deadbeef,spdm_port=2323
+
+At which point QEMU will try to connect to the SPDM server.
+
+Note that if using x64-64 you will want to use the q35 machine instead
+of the default. So the entire QEMU command might look like this
+
+.. code-block:: shell
+
+qemu-system-x86_64 -M q35 \
+--kernel bzImage \
+-drive file=rootfs.ext2,if=virtio,format=raw \
+-append "root=/dev/vda console=ttyS0" \
+-net none -nographic \
+-drive file=blknvme,if=none,id=mynvme,format=raw \
+-device nvme,drive=mynvme,serial=deadbeef,spdm_port=2323
+
+.. _DMTF:
+   https://www.dmtf.org/standards/SPDM
+
+.. _SPDM Utils:
+   https://github.com/westerndigitalcorporation/spdm-utils
+
+.. _spdm emu:
+   

Re: [PATCH v5 5/5] iotests: Add `vvfat` tests

2024-06-13 Thread Kevin Wolf
Am 13.06.2024 um 16:07 hat Amjad Alsharafi geschrieben:
> On Wed, Jun 12, 2024 at 08:43:26PM +0800, Amjad Alsharafi wrote:
> > Added several tests to verify the implementation of the vvfat driver.
> > 
> > We needed a way to interact with it, so created a basic `fat16.py` driver
> > that handled writing correct sectors for us.
> > 
> > Added `vvfat` to the non-generic formats, as its not a normal image format.
> > 
> > Signed-off-by: Amjad Alsharafi 
> > ---
> >  tests/qemu-iotests/check   |   2 +-
> >  tests/qemu-iotests/fat16.py| 673 +
> >  tests/qemu-iotests/testenv.py  |   2 +-
> >  tests/qemu-iotests/tests/vvfat | 457 
> >  tests/qemu-iotests/tests/vvfat.out |   5 +
> >  5 files changed, 1137 insertions(+), 2 deletions(-)
> >  create mode 100644 tests/qemu-iotests/fat16.py
> >  create mode 100755 tests/qemu-iotests/tests/vvfat
> >  create mode 100755 tests/qemu-iotests/tests/vvfat.out
> 
> Btw Kevin, I'm not sure if I should add myself to the MAINTAINERS file
> for the new files (fat16.py and vvfat) or not

You can if you really want to, but there is no need.

Basically if you see yourself maintaining these files for a long time
instead of just contributing them now and you want to be contacted
when they are changed in the future, that's when you could consider it.
But the patches will still go through my or Hanna's tree and we already
cover all of qemu-iotests, so we don't have to have a formal
maintainership definition for these files specifically.

Kevin




Re: [PULL 0/8] Block layer patches

2024-06-13 Thread Richard Henderson

On 6/11/24 10:36, Kevin Wolf wrote:

The following changes since commit 80e8f0602168f451a93e71cbb1d59e93d745e62e:

   Merge tag 'bsd-user-misc-2024q2-pull-request' of gitlab.com:bsdimp/qemu into 
staging (2024-06-09 11:21:55 -0700)

are available in the Git repository at:

   https://repo.or.cz/qemu/kevin.git  tags/for-upstream

for you to fetch changes up to 3ab0f063e58ed9224237d69c4211ca83335164c4:

   crypto/block: drop qcrypto_block_open() n_threads argument (2024-06-10 
11:05:43 +0200)


Block layer patches

- crypto: Fix crash when used with multiqueue devices
- linux-aio: add IO_CMD_FDSYNC command support
- copy-before-write: Avoid integer overflows for timeout > 4s
- Fix crash with QMP block_resize and iothreads
- qemu-io: add cvtnum() error handling for zone commands
- Code cleanup


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.


r~






Re: [PATCH 07/20] qapi/parser: add semantic 'kind' parameter to QAPIDoc.Section

2024-06-13 Thread Markus Armbruster
John Snow  writes:

> On Thu, May 16, 2024, 2:18 AM Markus Armbruster  wrote:
>
>> John Snow  writes:
>>
>> > When iterating all_sections, this is helpful to be able to distinguish
>> > "members" from "features"; the only other way to do so is to
>> > cross-reference these sections against QAPIDoc.args or QAPIDoc.features,
>> > but if the desired end goal for QAPIDoc is to remove everything except
>> > all_sections, we need *something* accessible to distinguish them.
>> >
>> > To keep types simple, add this semantic parameter to the base Section
>> > and not just ArgSection; we can use this to filter out paragraphs and
>> > tagged sections, too.
>> >
>> > Signed-off-by: John Snow 
>> > ---
>> >  scripts/qapi/parser.py | 25 -
>> >  1 file changed, 16 insertions(+), 9 deletions(-)
>> >
>> > diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>> > index 161768b8b96..cf4cbca1c1f 100644
>> > --- a/scripts/qapi/parser.py
>> > +++ b/scripts/qapi/parser.py
>> > @@ -613,21 +613,27 @@ class QAPIDoc:
>> >
>> >  class Section:
>> >  # pylint: disable=too-few-public-methods
>> > -def __init__(self, info: QAPISourceInfo,
>> > - tag: Optional[str] = None):
>> > +def __init__(
>> > +self,
>> > +info: QAPISourceInfo,
>> > +tag: Optional[str] = None,
>> > +kind: str = 'paragraph',
>> > +):
>> >  # section source info, i.e. where it begins
>> >  self.info = info
>> >  # section tag, if any ('Returns', '@name', ...)
>> >  self.tag = tag
>> >  # section text without tag
>> >  self.text = ''
>> > +# section type - {paragraph, feature, member, tagged}
>> > +self.kind = kind
>>
>> Hmm.  .kind is almost redundant with .tag.
>>
>
> Almost, yes. But the crucial bit is members/features as you notice. That's
> the real necessity here that saves a lot of code when relying on *only*
> all_sections.
>
> (If you want to remove the other fields leaving only all_sections behind,
> this is strictly necessary.)
>
>
>> Untagged section:.kind is 'paragraph', .tag is None
>>
>> Member description:  .kind is 'member', .tag matches @NAME
>>
>> Feature description: .kind is 'feature', .tag matches @NAME
>
>
>> Tagged section:  .kind is 'tagged', .tag matches
>>   r'Returns|Errors|Since|Notes?|Examples?|TODO'
>>
>> .kind can directly be derived from .tag except for member and feature
>> descriptions.  And you want to tell these two apart in a straightforward
>> manner in later patches, as you explain in your commit message.
>>
>> If .kind is 'member' or 'feature', then self must be an ArgSection.
>> Worth a comment?  An assertion?
>>
>
> No real need. The classes don't differ much in practice so there's not much
> benefit, and asserting it won't help the static typer out anyway because it
> can't remember the inference from string->type anyway.
>
> If you wanted to be FANCY, we could use string literal typing on the field
> and restrict valid values per-class, but that's so needless not even I'm
> tempted by it.
>
>
>> Some time back, I considered changing .tag for member and feature
>> descriptions to suitable strings, like your 'member' and 'feature', and
>> move the member / feature name into ArgSection.  I didn't, because the
>> benefit wasn't worth the churn at the time.  Perhaps it's worth it now.
>> Would it result in simpler code than your solution?
>>
>
> Not considerably, I think. Would just be shuffling around which field names
> I touch and where/when.

The way .tag works irks me.  I might explore the change I proposed just
to see whether I hate the result less.  On top of your work, to not
annoy you without need.

> It might actually just add some lines where I have to assert isinstance to
> do type narrowing in the generator.
>
>> Terminology nit: the section you call 'paragraph' isn't actually a
>> paragraph: it could be several paragraphs.  Best to call it 'untagged',
>> as in .ensure_untagged_section().
>>
>
> Oh, I hate when you make a good point. I was avoiding the term because I'm
> removing Notes and Examples, and we have plans to eliminate Since ... the
> tagged sections are almost going away entirely, leaving just TODO, which we
> ignore.
>
> Uhm, can I name it paragraphs? :) or open to other suggestions, incl.
> untagged if that's still your preference.

'untagged' is more consistent with existing names and comments:
.ensure_untagged_section(), "additional (non-argument) sections,
possibly tagged", ...

When all tags but 'TODO' are gone, the concept "tagged vs. untagged
section" ceases to make sense, I guess.  We can tweak the names and
comments accordingly then.

>
>> >
>> >  def append_line(self, line: str) -> None:
>> >  self.text += line + '\n'
>> >
>> >  class ArgSection(Section):
>> > -def __init__(self, info: QAPISourceInfo, 

Re: [PATCH 06/20] qapi/parser: fix comment parsing immediately following a doc block

2024-06-13 Thread Markus Armbruster
John Snow  writes:

> On Thu, May 16, 2024 at 2:01 AM Markus Armbruster  wrote:
>
>> John Snow  writes:
>>
>> > If a comment immediately follows a doc block, the parser doesn't ignore
>> > that token appropriately. Fix that.
>>
>> Reproducer?
>>
>> >
>> > Signed-off-by: John Snow 
>> > ---
>> >  scripts/qapi/parser.py | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>> > index 41b9319e5cb..161768b8b96 100644
>> > --- a/scripts/qapi/parser.py
>> > +++ b/scripts/qapi/parser.py
>> > @@ -587,7 +587,7 @@ def get_doc(self) -> 'QAPIDoc':
>> >  line = self.get_doc_line()
>> >  first = False
>> >
>> > -self.accept(False)
>> > +self.accept()
>> >  doc.end()
>> >  return doc
>>
>> Can't judge the fix without understanding the problem, and the problem
>> will be easier to understand for me with a reproducer.
>>
>
> audio.json:
>
> ```
> ##
> # = Audio
> ##
>
> ##
> # @AudiodevPerDirectionOptions:
> #
> # General audio backend options that are used for both playback and
> # recording.
> #
> ```
>
> Modify this excerpt to have a comment after the "= Audio" header, say for
> instance if you were to take out that intro paragraph and transform it into
> a comment that preceded the AudiodevPerDirectionOptions doc block.
>
> e.g.
>
> ```
> ##
> # = Audio
> ##
>
> # Lorem Ipsum
>
> ##
> # @AudiodevPerDirectionOptions:
> ```
>
> the parser breaks because the line I changed that primes the next token is
> still set to "not ignore comments", but that breaks the parser and gives a
> rather unhelpful message:
>
> ../qapi/audio.json:13:1: junk after '##' at start of documentation comment

When ._parse()'s loop sees a comment token in .tok, it expects it to be
the start of a doc comment block (because any other comments are to be
skipped).  It then calls .get_doc(), which expects the same.  The
unexpected comment token then triggers .get_doc()'s check for junk after
'##'.

> Encountered when converting developer commentary from documentation
> paragraphs to mere QAPI comments.

Your fix is correct.

It's actually a regression.  Please add

Fixes: 3d035cd2cca6 (qapi: Rewrite doc comment parser)

to the commit message.

Let's add a reproducer to tests/qapi-schema/doc-good.json right in this
patch, e.g.

diff --git a/tests/qapi-schema/doc-good.json b/tests/qapi-schema/doc-good.json
index de38a386e8..8b39eb946a 100644
--- a/tests/qapi-schema/doc-good.json
+++ b/tests/qapi-schema/doc-good.json
@@ -55,6 +55,8 @@
 # - {braces}
 ##
 
+# Not a doc comment
+
 ##
 # @Enum:
 #




Re: [PATCH v5 5/5] iotests: Add `vvfat` tests

2024-06-13 Thread Amjad Alsharafi
On Wed, Jun 12, 2024 at 08:43:26PM +0800, Amjad Alsharafi wrote:
> Added several tests to verify the implementation of the vvfat driver.
> 
> We needed a way to interact with it, so created a basic `fat16.py` driver
> that handled writing correct sectors for us.
> 
> Added `vvfat` to the non-generic formats, as its not a normal image format.
> 
> Signed-off-by: Amjad Alsharafi 
> ---
>  tests/qemu-iotests/check   |   2 +-
>  tests/qemu-iotests/fat16.py| 673 +
>  tests/qemu-iotests/testenv.py  |   2 +-
>  tests/qemu-iotests/tests/vvfat | 457 
>  tests/qemu-iotests/tests/vvfat.out |   5 +
>  5 files changed, 1137 insertions(+), 2 deletions(-)
>  create mode 100644 tests/qemu-iotests/fat16.py
>  create mode 100755 tests/qemu-iotests/tests/vvfat
>  create mode 100755 tests/qemu-iotests/tests/vvfat.out
> 
> diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
> index 56d88ca423..545f9ec7bd 100755
> --- a/tests/qemu-iotests/check
> +++ b/tests/qemu-iotests/check
> @@ -84,7 +84,7 @@ def make_argparser() -> argparse.ArgumentParser:
>  p.set_defaults(imgfmt='raw', imgproto='file')
>  
>  format_list = ['raw', 'bochs', 'cloop', 'parallels', 'qcow', 'qcow2',
> -   'qed', 'vdi', 'vpc', 'vhdx', 'vmdk', 'luks', 'dmg']
> +   'qed', 'vdi', 'vpc', 'vhdx', 'vmdk', 'luks', 'dmg', 
> 'vvfat']
>  g_fmt = p.add_argument_group(
>  '  image format options',
>  'The following options set the IMGFMT environment variable. '
> diff --git a/tests/qemu-iotests/fat16.py b/tests/qemu-iotests/fat16.py
> new file mode 100644
> index 00..abe4ce1f8f
> --- /dev/null
> +++ b/tests/qemu-iotests/fat16.py
> @@ -0,0 +1,673 @@
> +# A simple FAT16 driver that is used to test the `vvfat` driver in QEMU.
> +#
> +# Copyright (C) 2024 Amjad Alsharafi 
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see .
> +
> +from typing import List
> +import string
> +
> +SECTOR_SIZE = 512
> +DIRENTRY_SIZE = 32
> +ALLOWED_FILE_CHARS = set(
> +"!#$%&'()-@^_`{}~" + string.digits + string.ascii_uppercase
> +)
> +
> +
> +class MBR:
> +def __init__(self, data: bytes):
> +assert len(data) == 512
> +self.partition_table = []
> +for i in range(4):
> +partition = data[446 + i * 16 : 446 + (i + 1) * 16]
> +self.partition_table.append(
> +{
> +"status": partition[0],
> +"start_head": partition[1],
> +"start_sector": partition[2] & 0x3F,
> +"start_cylinder": ((partition[2] & 0xC0) << 2)
> +  | partition[3],
> +"type": partition[4],
> +"end_head": partition[5],
> +"end_sector": partition[6] & 0x3F,
> +"end_cylinder": ((partition[6] & 0xC0) << 2) | 
> partition[7],
> +"start_lba": int.from_bytes(partition[8:12], "little"),
> +"size": int.from_bytes(partition[12:16], "little"),
> +}
> +)
> +
> +def __str__(self):
> +return "\n".join(
> +[
> +f"{i}: {partition}"
> +for i, partition in enumerate(self.partition_table)
> +]
> +)
> +
> +
> +class FatBootSector:
> +def __init__(self, data: bytes):
> +assert len(data) == 512
> +self.bytes_per_sector = int.from_bytes(data[11:13], "little")
> +self.sectors_per_cluster = data[13]
> +self.reserved_sectors = int.from_bytes(data[14:16], "little")
> +self.fat_count = data[16]
> +self.root_entries = int.from_bytes(data[17:19], "little")
> +total_sectors_16 = int.from_bytes(data[19:21], "little")
> +self.media_descriptor = data[21]
> +self.sectors_per_fat = int.from_bytes(data[22:24], "little")
> +self.sectors_per_track = int.from_bytes(data[24:26], "little")
> +self.heads = int.from_bytes(data[26:28], "little")
> +self.hidden_sectors = int.from_bytes(data[28:32], "little")
> +total_sectors_32 = int.from_bytes(data[32:36], "little")
> +assert (
> +total_sectors_16 == 0 or total_sectors_32 == 0
> +), "Both total sectors (16 and 32) fields are non-zero"

Re: [PATCH RESEND v7 10/12] hostmem: add a new memory backend based on POSIX shm_open()

2024-06-13 Thread Stefano Garzarella

On Wed, Jun 12, 2024 at 03:20:48PM GMT, Markus Armbruster wrote:

Stefano Garzarella  writes:


shm_open() creates and opens a new POSIX shared memory object.
A POSIX shared memory object allows creating memory backend with an
associated file descriptor that can be shared with external processes
(e.g. vhost-user).

The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.

This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but this is allocated
for a short time interval and shm_unlink() is called right after
shm_open(). After that, only fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.

In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.

Acked-by: David Hildenbrand 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 


[...]


diff --git a/qapi/qom.json b/qapi/qom.json
index 9b8f6a7ab5..94e4458288 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -601,8 +601,8 @@
 #
 # @share: if false, the memory is private to QEMU; if true, it is
 # shared (default false for backends memory-backend-file and
-# memory-backend-ram, true for backends memory-backend-epc and
-# memory-backend-memfd)
+# memory-backend-ram, true for backends memory-backend-epc,
+# memory-backend-memfd, and memory-backend-shm)
 #
 # @reserve: if true, reserve swap space (or huge pages) if applicable
 # (default: true) (since 6.1)
@@ -721,6 +721,22 @@
 '*hugetlbsize': 'size',
 '*seal': 'bool' } }

+##
+# @MemoryBackendShmProperties:
+#
+# Properties for memory-backend-shm objects.
+#
+# Setting @share boolean option (defined in the base type) to false
+# will cause a failure during allocation because it is not
+# supported by this backend.


This is QMP reference documentation.  "Failure during allocation" feels
like unnecessary detail there.  Maybe "This memory backend support only
shared memory, which is the default."


I'll fix in v8!




+#
+# Since: 9.1
+##
+{ 'struct': 'MemoryBackendShmProperties',
+  'base': 'MemoryBackendProperties',
+  'data': { },
+  'if': 'CONFIG_POSIX' }
+
 ##
 # @MemoryBackendEpcProperties:
 #
@@ -1049,6 +1065,8 @@
 { 'name': 'memory-backend-memfd',
   'if': 'CONFIG_LINUX' },
 'memory-backend-ram',
+{ 'name': 'memory-backend-shm',
+  'if': 'CONFIG_POSIX' },
 'pef-guest',
 { 'name': 'pr-manager-helper',
   'if': 'CONFIG_LINUX' },
@@ -1121,6 +1139,8 @@
   'memory-backend-memfd':   { 'type': 'MemoryBackendMemfdProperties',
   'if': 'CONFIG_LINUX' },
   'memory-backend-ram': 'MemoryBackendProperties',
+  'memory-backend-shm': { 'type': 'MemoryBackendShmProperties',
+  'if': 'CONFIG_POSIX' },
   'pr-manager-helper':  { 'type': 'PrManagerHelperProperties',
   'if': 'CONFIG_LINUX' },
   'qtest':  'QtestProperties',


[...]

Other than that, QAPI schema
Acked-by: Markus Armbruster 



Thanks for the review!
Stefano




Re: [PATCH v4 5/5] iotests: add backup-discard-source

2024-06-13 Thread Kevin Wolf
Am 12.06.2024 um 21:21 hat Vladimir Sementsov-Ogievskiy geschrieben:
> On 11.06.24 20:49, Kevin Wolf wrote:
> > Am 13.03.2024 um 16:28 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > Add test for a new backup option: discard-source.
> > > 
> > > Signed-off-by: Vladimir Sementsov-Ogievskiy 
> > > Reviewed-by: Fiona Ebner 
> > > Tested-by: Fiona Ebner 
> > 
> > This test fails for me, and it already does so after this commit that
> > introduced it. I haven't checked what get_actual_size(), but I'm running
> > on XFS, so its preallocation could be causing this. We generally avoid
> > checking the number of allocated blocks in image files for this reason.
> > 
> 
> Hmm right, I see that relying on allocated size is bad thing. Better
> is to check block status, to see how many qcow2 clusters are
> allocated.
> 
> Do we have some qmp command to get such information? The simplest way
> I see is to add dirty to temp block-node, and then check its dirty
> count through query-named-block-nodes

Hm, does it have to be QMP in a running QEMU process? I'm not sure what
the best way would be there.

Otherwise, my approach would be 'qemu-img check' or even 'qemu-img map',
depending on what you want to verify with it.

Kevin




[PATCH v6 06/10] block/nvme: add reservation command protocol constants

2024-06-13 Thread Changqi Lu
Add constants for the NVMe persistent command protocol.
The constants include the reservation command opcode and
reservation type values defined in section 7 of the NVMe
2.0 specification.

Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
---
 include/block/nvme.h | 61 
 1 file changed, 61 insertions(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index bb231d0b9a..da6ccb0f3b 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -633,6 +633,10 @@ enum NvmeIoCommands {
 NVME_CMD_WRITE_ZEROES   = 0x08,
 NVME_CMD_DSM= 0x09,
 NVME_CMD_VERIFY = 0x0c,
+NVME_CMD_RESV_REGISTER  = 0x0d,
+NVME_CMD_RESV_REPORT= 0x0e,
+NVME_CMD_RESV_ACQUIRE   = 0x11,
+NVME_CMD_RESV_RELEASE   = 0x15,
 NVME_CMD_IO_MGMT_RECV   = 0x12,
 NVME_CMD_COPY   = 0x19,
 NVME_CMD_IO_MGMT_SEND   = 0x1d,
@@ -641,6 +645,63 @@ enum NvmeIoCommands {
 NVME_CMD_ZONE_APPEND= 0x7d,
 };
 
+typedef enum {
+NVME_RESV_REGISTER_ACTION_REGISTER  = 0x00,
+NVME_RESV_REGISTER_ACTION_UNREGISTER= 0x01,
+NVME_RESV_REGISTER_ACTION_REPLACE   = 0x02,
+} NvmeReservationRegisterAction;
+
+typedef enum {
+NVME_RESV_RELEASE_ACTION_RELEASE= 0x00,
+NVME_RESV_RELEASE_ACTION_CLEAR  = 0x01,
+} NvmeReservationReleaseAction;
+
+typedef enum {
+NVME_RESV_ACQUIRE_ACTION_ACQUIRE= 0x00,
+NVME_RESV_ACQUIRE_ACTION_PREEMPT= 0x01,
+NVME_RESV_ACQUIRE_ACTION_PREEMPT_AND_ABORT  = 0x02,
+} NvmeReservationAcquireAction;
+
+typedef enum {
+NVME_RESV_WRITE_EXCLUSIVE   = 0x01,
+NVME_RESV_EXCLUSIVE_ACCESS  = 0x02,
+NVME_RESV_WRITE_EXCLUSIVE_REGS_ONLY = 0x03,
+NVME_RESV_EXCLUSIVE_ACCESS_REGS_ONLY= 0x04,
+NVME_RESV_WRITE_EXCLUSIVE_ALL_REGS  = 0x05,
+NVME_RESV_EXCLUSIVE_ACCESS_ALL_REGS = 0x06,
+} NvmeResvType;
+
+typedef enum {
+NVME_RESV_PTPL_NO_CHANGE = 0x00,
+NVME_RESV_PTPL_DISABLE   = 0x02,
+NVME_RESV_PTPL_ENABLE= 0x03,
+} NvmeResvPTPL;
+
+typedef enum NVMEPrCap {
+/* Persist Through Power Loss */
+NVME_PR_CAP_PTPL = 1 << 0,
+/* Write Exclusive reservation type */
+NVME_PR_CAP_WR_EX = 1 << 1,
+/* Exclusive Access reservation type */
+NVME_PR_CAP_EX_AC = 1 << 2,
+/* Write Exclusive Registrants Only reservation type */
+NVME_PR_CAP_WR_EX_RO = 1 << 3,
+/* Exclusive Access Registrants Only reservation type */
+NVME_PR_CAP_EX_AC_RO = 1 << 4,
+/* Write Exclusive All Registrants reservation type */
+NVME_PR_CAP_WR_EX_AR = 1 << 5,
+/* Exclusive Access All Registrants reservation type */
+NVME_PR_CAP_EX_AC_AR = 1 << 6,
+
+NVME_PR_CAP_ALL = (NVME_PR_CAP_PTPL |
+  NVME_PR_CAP_WR_EX |
+  NVME_PR_CAP_EX_AC |
+  NVME_PR_CAP_WR_EX_RO |
+  NVME_PR_CAP_EX_AC_RO |
+  NVME_PR_CAP_WR_EX_AR |
+  NVME_PR_CAP_EX_AC_AR),
+} NvmePrCap;
+
 typedef struct QEMU_PACKED NvmeDeleteQ {
 uint8_t opcode;
 uint8_t flags;
-- 
2.20.1




[PATCH v6 08/10] hw/nvme: enable ONCS and rescap function

2024-06-13 Thread Changqi Lu
This commit enables ONCS to support the reservation
function at the controller level. Also enables rescap
function in the namespace by detecting the supported reservation
function in the backend driver.

Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
---
 hw/nvme/ctrl.c | 3 ++-
 hw/nvme/ns.c   | 5 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 127c3d2383..182307a48b 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8248,7 +8248,8 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->nn = cpu_to_le32(NVME_MAX_NAMESPACES);
 id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROES | NVME_ONCS_TIMESTAMP |
NVME_ONCS_FEATURES | NVME_ONCS_DSM |
-   NVME_ONCS_COMPARE | NVME_ONCS_COPY);
+   NVME_ONCS_COMPARE | NVME_ONCS_COPY |
+   NVME_ONCS_RESRVATIONS);
 
 /*
  * NOTE: If this device ever supports a command set that does NOT use 0x0
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index ea8db175db..320c9bf658 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -20,6 +20,7 @@
 #include "qemu/bitops.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/block-backend.h"
+#include "block/block_int.h"
 
 #include "nvme.h"
 #include "trace.h"
@@ -33,6 +34,7 @@ void nvme_ns_init_format(NvmeNamespace *ns)
 BlockDriverInfo bdi;
 int npdg, ret;
 int64_t nlbas;
+uint8_t blk_pr_cap;
 
 ns->lbaf = id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
 ns->lbasz = 1 << ns->lbaf.ds;
@@ -55,6 +57,9 @@ void nvme_ns_init_format(NvmeNamespace *ns)
 }
 
 id_ns->npda = id_ns->npdg = npdg - 1;
+
+blk_pr_cap = blk_bs(ns->blkconf.blk)->file->bs->bl.pr_cap;
+id_ns->rescap = block_pr_cap_to_nvme(blk_pr_cap);
 }
 
 static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
-- 
2.20.1




[PATCH v6 04/10] scsi/util: add helper functions for persistent reservation types conversion

2024-06-13 Thread Changqi Lu
This commit introduces two helper functions
that facilitate the conversion between the
persistent reservation types used in the SCSI
protocol and those used in the block layer.

Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
---
 include/scsi/utils.h |  8 +
 scsi/utils.c | 81 
 2 files changed, 89 insertions(+)

diff --git a/include/scsi/utils.h b/include/scsi/utils.h
index d5c8efa16e..89a0b082fb 100644
--- a/include/scsi/utils.h
+++ b/include/scsi/utils.h
@@ -1,6 +1,8 @@
 #ifndef SCSI_UTILS_H
 #define SCSI_UTILS_H
 
+#include "block/block-common.h"
+#include "scsi/constants.h"
 #ifdef CONFIG_LINUX
 #include 
 #endif
@@ -135,6 +137,12 @@ uint32_t scsi_data_cdb_xfer(uint8_t *buf);
 uint32_t scsi_cdb_xfer(uint8_t *buf);
 int scsi_cdb_length(uint8_t *buf);
 
+BlockPrType scsi_pr_type_to_block(SCSIPrType type);
+SCSIPrType block_pr_type_to_scsi(BlockPrType type);
+
+uint8_t scsi_pr_cap_to_block(uint16_t scsi_pr_cap);
+uint16_t block_pr_cap_to_scsi(uint8_t block_pr_cap);
+
 /* Linux SG_IO interface.  */
 #ifdef CONFIG_LINUX
 #define SG_ERR_DRIVER_TIMEOUT  0x06
diff --git a/scsi/utils.c b/scsi/utils.c
index 357b036671..0dfdeb499d 100644
--- a/scsi/utils.c
+++ b/scsi/utils.c
@@ -658,3 +658,84 @@ int scsi_sense_from_host_status(uint8_t host_status,
 }
 return GOOD;
 }
+
+BlockPrType scsi_pr_type_to_block(SCSIPrType type)
+{
+switch (type) {
+case SCSI_PR_WRITE_EXCLUSIVE:
+return BLK_PR_WRITE_EXCLUSIVE;
+case SCSI_PR_EXCLUSIVE_ACCESS:
+return BLK_PR_EXCLUSIVE_ACCESS;
+case SCSI_PR_WRITE_EXCLUSIVE_REGS_ONLY:
+return BLK_PR_WRITE_EXCLUSIVE_REGS_ONLY;
+case SCSI_PR_EXCLUSIVE_ACCESS_REGS_ONLY:
+return BLK_PR_EXCLUSIVE_ACCESS_REGS_ONLY;
+case SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS:
+return BLK_PR_WRITE_EXCLUSIVE_ALL_REGS;
+case SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS:
+return BLK_PR_EXCLUSIVE_ACCESS_ALL_REGS;
+}
+
+return 0;
+}
+
+SCSIPrType block_pr_type_to_scsi(BlockPrType type)
+{
+switch (type) {
+case BLK_PR_WRITE_EXCLUSIVE:
+return SCSI_PR_WRITE_EXCLUSIVE;
+case BLK_PR_EXCLUSIVE_ACCESS:
+return SCSI_PR_EXCLUSIVE_ACCESS;
+case BLK_PR_WRITE_EXCLUSIVE_REGS_ONLY:
+return SCSI_PR_WRITE_EXCLUSIVE_REGS_ONLY;
+case BLK_PR_EXCLUSIVE_ACCESS_REGS_ONLY:
+return SCSI_PR_EXCLUSIVE_ACCESS_REGS_ONLY;
+case BLK_PR_WRITE_EXCLUSIVE_ALL_REGS:
+return SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS;
+case BLK_PR_EXCLUSIVE_ACCESS_ALL_REGS:
+return SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS;
+}
+
+return 0;
+}
+
+
+uint8_t scsi_pr_cap_to_block(uint16_t scsi_pr_cap)
+{
+uint8_t res = 0;
+
+res |= (scsi_pr_cap & SCSI_PR_CAP_WR_EX) ?
+   BLK_PR_CAP_WR_EX : 0;
+res |= (scsi_pr_cap & SCSI_PR_CAP_EX_AC) ?
+   BLK_PR_CAP_EX_AC : 0;
+res |= (scsi_pr_cap & SCSI_PR_CAP_WR_EX_RO) ?
+   BLK_PR_CAP_WR_EX_RO : 0;
+res |= (scsi_pr_cap & SCSI_PR_CAP_EX_AC_RO) ?
+   BLK_PR_CAP_EX_AC_RO : 0;
+res |= (scsi_pr_cap & SCSI_PR_CAP_WR_EX_AR) ?
+   BLK_PR_CAP_WR_EX_AR : 0;
+res |= (scsi_pr_cap & SCSI_PR_CAP_EX_AC_AR) ?
+   BLK_PR_CAP_EX_AC_AR : 0;
+
+return res;
+}
+
+uint16_t block_pr_cap_to_scsi(uint8_t block_pr_cap)
+{
+uint16_t res = 0;
+
+res |= (block_pr_cap & BLK_PR_CAP_WR_EX) ?
+  SCSI_PR_CAP_WR_EX : 0;
+res |= (block_pr_cap & BLK_PR_CAP_EX_AC) ?
+  SCSI_PR_CAP_EX_AC : 0;
+res |= (block_pr_cap & BLK_PR_CAP_WR_EX_RO) ?
+  SCSI_PR_CAP_WR_EX_RO : 0;
+res |= (block_pr_cap & BLK_PR_CAP_EX_AC_RO) ?
+  SCSI_PR_CAP_EX_AC_RO : 0;
+res |= (block_pr_cap & BLK_PR_CAP_WR_EX_AR) ?
+  SCSI_PR_CAP_WR_EX_AR : 0;
+res |= (block_pr_cap & BLK_PR_CAP_EX_AC_AR) ?
+  SCSI_PR_CAP_EX_AC_AR : 0;
+
+return res;
+}
-- 
2.20.1




[PATCH v6 01/10] block: add persistent reservation in/out api

2024-06-13 Thread Changqi Lu
Add persistent reservation in/out operations
at the block level. The following operations
are included:

- read_keys:retrieves the list of registered keys.
- read_reservation: retrieves the current reservation status.
- register: registers a new reservation key.
- reserve:  initiates a reservation for a specific key.
- release:  releases a reservation for a specific key.
- clear:clears all existing reservations.
- preempt:  preempts a reservation held by another key.

Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
---
 block/block-backend.c | 403 ++
 block/io.c| 163 
 include/block/block-common.h  |  40 +++
 include/block/block-io.h  |  20 ++
 include/block/block_int-common.h  |  84 +++
 include/sysemu/block-backend-io.h |  24 ++
 6 files changed, 734 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index db6f9b92a3..b74aaba23f 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1770,6 +1770,409 @@ BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned 
long int req, void *buf,
 return blk_aio_prwv(blk, req, 0, buf, blk_aio_ioctl_entry, 0, cb, opaque);
 }
 
+typedef struct BlkPrInCo {
+BlockBackend *blk;
+uint32_t *generation;
+uint32_t num_keys;
+BlockPrType *type;
+uint64_t *keys;
+int ret;
+} BlkPrInCo;
+
+typedef struct BlkPrInCB {
+BlockAIOCB common;
+BlkPrInCo prco;
+bool has_returned;
+} BlkPrInCB;
+
+static const AIOCBInfo blk_pr_in_aiocb_info = {
+.aiocb_size = sizeof(BlkPrInCB),
+};
+
+static void blk_pr_in_complete(BlkPrInCB *acb)
+{
+if (acb->has_returned) {
+acb->common.cb(acb->common.opaque, acb->prco.ret);
+
+/* This is paired with blk_inc_in_flight() in blk_aio_pr_in(). */
+blk_dec_in_flight(acb->prco.blk);
+qemu_aio_unref(acb);
+}
+}
+
+static void blk_pr_in_complete_bh(void *opaque)
+{
+BlkPrInCB *acb = opaque;
+assert(acb->has_returned);
+blk_pr_in_complete(acb);
+}
+
+static BlockAIOCB *blk_aio_pr_in(BlockBackend *blk, uint32_t *generation,
+ uint32_t num_keys, BlockPrType *type,
+ uint64_t *keys, CoroutineEntry co_entry,
+ BlockCompletionFunc *cb, void *opaque)
+{
+BlkPrInCB *acb;
+Coroutine *co;
+
+/* This is paired with blk_dec_in_flight() in blk_pr_in_complete(). */
+blk_inc_in_flight(blk);
+acb = blk_aio_get(_pr_in_aiocb_info, blk, cb, opaque);
+acb->prco = (BlkPrInCo) {
+.blk= blk,
+.generation = generation,
+.num_keys   = num_keys,
+.type   = type,
+.ret= NOT_DONE,
+.keys   = keys,
+};
+acb->has_returned = false;
+
+co = qemu_coroutine_create(co_entry, acb);
+aio_co_enter(qemu_get_current_aio_context(), co);
+
+acb->has_returned = true;
+if (acb->prco.ret != NOT_DONE) {
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
+ blk_pr_in_complete_bh, acb);
+}
+
+return >common;
+}
+
+/* To be called between exactly one pair of blk_inc/dec_in_flight() */
+static int coroutine_fn
+blk_aio_pr_do_read_keys(BlockBackend *blk, uint32_t *generation,
+uint32_t num_keys, uint64_t *keys)
+{
+IO_CODE();
+
+blk_wait_while_drained(blk);
+GRAPH_RDLOCK_GUARD();
+
+if (!blk_co_is_available(blk)) {
+return -ENOMEDIUM;
+}
+
+return bdrv_co_pr_read_keys(blk_bs(blk), generation, num_keys, keys);
+}
+
+static void coroutine_fn blk_aio_pr_read_keys_entry(void *opaque)
+{
+BlkPrInCB *acb = opaque;
+BlkPrInCo *prco = >prco;
+
+prco->ret = blk_aio_pr_do_read_keys(prco->blk, prco->generation,
+prco->num_keys, prco->keys);
+blk_pr_in_complete(acb);
+}
+
+BlockAIOCB *blk_aio_pr_read_keys(BlockBackend *blk, uint32_t *generation,
+ uint32_t num_keys, uint64_t *keys,
+ BlockCompletionFunc *cb, void *opaque)
+{
+IO_CODE();
+return blk_aio_pr_in(blk, generation, num_keys, NULL, keys,
+ blk_aio_pr_read_keys_entry, cb, opaque);
+}
+
+/* To be called between exactly one pair of blk_inc/dec_in_flight() */
+static int coroutine_fn
+blk_aio_pr_do_read_reservation(BlockBackend *blk, uint32_t *generation,
+   uint64_t *key, BlockPrType *type)
+{
+IO_CODE();
+
+blk_wait_while_drained(blk);
+GRAPH_RDLOCK_GUARD();
+
+if (!blk_co_is_available(blk)) {
+return -ENOMEDIUM;
+}
+
+return bdrv_co_pr_read_reservation(blk_bs(blk), generation, key, type);
+}
+
+static void coroutine_fn blk_aio_pr_read_reservation_entry(void *opaque)
+{
+BlkPrInCB *acb = opaque;
+BlkPrInCo *prco = >prco;
+
+

[PATCH v6 02/10] block/raw: add persistent reservation in/out driver

2024-06-13 Thread Changqi Lu
Add persistent reservation in/out operations for raw driver.
The following methods are implemented: bdrv_co_pr_read_keys,
bdrv_co_pr_read_reservation, bdrv_co_pr_register, bdrv_co_pr_reserve,
bdrv_co_pr_release, bdrv_co_pr_clear and bdrv_co_pr_preempt.

Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
---
 block/raw-format.c | 56 ++
 1 file changed, 56 insertions(+)

diff --git a/block/raw-format.c b/block/raw-format.c
index ac7e8495f6..3746bc1bd3 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -454,6 +454,55 @@ raw_co_ioctl(BlockDriverState *bs, unsigned long int req, 
void *buf)
 return bdrv_co_ioctl(bs->file->bs, req, buf);
 }
 
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_pr_read_keys(BlockDriverState *bs, uint32_t *generation,
+uint32_t num_keys, uint64_t *keys)
+{
+
+return bdrv_co_pr_read_keys(bs->file->bs, generation, num_keys, keys);
+}
+
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_pr_read_reservation(BlockDriverState *bs, uint32_t *generation,
+   uint64_t *key, BlockPrType *type)
+{
+return bdrv_co_pr_read_reservation(bs->file->bs, generation, key, type);
+}
+
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_pr_register(BlockDriverState *bs, uint64_t old_key,
+   uint64_t new_key, BlockPrType type,
+   bool ptpl, bool ignore_key)
+{
+return bdrv_co_pr_register(bs->file->bs, old_key, new_key,
+   type, ptpl, ignore_key);
+}
+
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_pr_reserve(BlockDriverState *bs, uint64_t key, BlockPrType type)
+{
+return bdrv_co_pr_reserve(bs->file->bs, key, type);
+}
+
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_pr_release(BlockDriverState *bs, uint64_t key, BlockPrType type)
+{
+return bdrv_co_pr_release(bs->file->bs, key, type);
+}
+
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_pr_clear(BlockDriverState *bs, uint64_t key)
+{
+return bdrv_co_pr_clear(bs->file->bs, key);
+}
+
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_pr_preempt(BlockDriverState *bs, uint64_t old_key,
+  uint64_t new_key, BlockPrType type, bool abort)
+{
+return bdrv_co_pr_preempt(bs->file->bs, old_key, new_key, type, abort);
+}
+
 static int GRAPH_RDLOCK raw_has_zero_init(BlockDriverState *bs)
 {
 return bdrv_has_zero_init(bs->file->bs);
@@ -672,6 +721,13 @@ BlockDriver bdrv_raw = {
 .strong_runtime_opts  = raw_strong_runtime_opts,
 .mutable_opts = mutable_opts,
 .bdrv_cancel_in_flight = raw_cancel_in_flight,
+.bdrv_co_pr_read_keys= raw_co_pr_read_keys,
+.bdrv_co_pr_read_reservation = raw_co_pr_read_reservation,
+.bdrv_co_pr_register = raw_co_pr_register,
+.bdrv_co_pr_reserve  = raw_co_pr_reserve,
+.bdrv_co_pr_release  = raw_co_pr_release,
+.bdrv_co_pr_clear= raw_co_pr_clear,
+.bdrv_co_pr_preempt  = raw_co_pr_preempt,
 };
 
 static void bdrv_raw_init(void)
-- 
2.20.1




[PATCH v6 00/10] Support persistent reservation operations

2024-06-13 Thread Changqi Lu
Hi,

patch v6 has been modified.

v5->v6:
- Add relevant comments in the io layer.

v4->v5:
- Fixed a memory leak bug at hw/nvme/ctrl.c.

v3->v4:
- At the nvme layer, the two patches of enabling the ONCS
  function and enabling rescap are combined into one.
- At the nvme layer, add helper functions for pr capacity
  conversion between the block layer and the nvme layer.

v2->v3:
In v2 Persist Through Power Loss(PTPL) is enable default.
In v3 PTPL is supported, which is passed as a parameter.

v1->v2:
- Add sg_persist --report-capabilities for SCSI protocol and enable
  oncs and rescap for NVMe protocol.
- Add persistent reservation capabilities constants and helper functions for
  SCSI and NVMe protocol.
- Add comments for necessary APIs.

v1:
- Add seven APIs about persistent reservation command for block layer.
  These APIs including reading keys, reading reservations, registering,
  reserving, releasing, clearing and preempting.
- Add the necessary pr-related operation APIs for both the
  SCSI protocol and NVMe protocol at the device layer.
- Add scsi driver at the driver layer to verify the functions

Changqi Lu (10):
  block: add persistent reservation in/out api
  block/raw: add persistent reservation in/out driver
  scsi/constant: add persistent reservation in/out protocol constants
  scsi/util: add helper functions for persistent reservation types
conversion
  hw/scsi: add persistent reservation in/out api for scsi device
  block/nvme: add reservation command protocol constants
  hw/nvme: add helper functions for converting reservation types
  hw/nvme: enable ONCS and rescap function
  hw/nvme: add reservation protocal command
  block/iscsi: add persistent reservation in/out driver

 block/block-backend.c | 403 +++
 block/io.c| 163 +++
 block/iscsi.c | 443 ++
 block/raw-format.c|  56 
 hw/nvme/ctrl.c| 326 +-
 hw/nvme/ns.c  |   5 +
 hw/nvme/nvme.h|  84 ++
 hw/scsi/scsi-disk.c   | 352 
 include/block/block-common.h  |  40 +++
 include/block/block-io.h  |  20 ++
 include/block/block_int-common.h  |  84 ++
 include/block/nvme.h  |  98 +++
 include/scsi/constants.h  |  52 
 include/scsi/utils.h  |   8 +
 include/sysemu/block-backend-io.h |  24 ++
 scsi/utils.c  |  81 ++
 16 files changed, 2237 insertions(+), 2 deletions(-)

-- 
2.20.1




[PATCH v6 05/10] hw/scsi: add persistent reservation in/out api for scsi device

2024-06-13 Thread Changqi Lu
Add persistent reservation in/out operations in the
SCSI device layer. By introducing the persistent
reservation in/out api, this enables the SCSI device
to perform reservation-related tasks, including querying
keys, querying reservation status, registering reservation
keys, initiating and releasing reservations, as well as
clearing and preempting reservations held by other keys.

These operations are crucial for management and control of
shared storage resources in a persistent manner.

Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
---
 hw/scsi/scsi-disk.c | 352 
 1 file changed, 352 insertions(+)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 4bd7af9d0c..0e964dbd87 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -32,6 +32,7 @@
 #include "migration/vmstate.h"
 #include "hw/scsi/emulation.h"
 #include "scsi/constants.h"
+#include "scsi/utils.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/blockdev.h"
 #include "hw/block/block.h"
@@ -42,6 +43,7 @@
 #include "qemu/cutils.h"
 #include "trace.h"
 #include "qom/object.h"
+#include "block/block_int.h"
 
 #ifdef __linux
 #include 
@@ -1474,6 +1476,346 @@ static void scsi_disk_emulate_read_data(SCSIRequest 
*req)
 scsi_req_complete(>req, GOOD);
 }
 
+typedef struct SCSIPrReadKeys {
+uint32_t generation;
+uint32_t num_keys;
+uint64_t *keys;
+void *req;
+} SCSIPrReadKeys;
+
+typedef struct SCSIPrReadReservation {
+uint32_t generation;
+uint64_t key;
+BlockPrType type;
+void *req;
+} SCSIPrReadReservation;
+
+static void scsi_pr_read_keys_complete(void *opaque, int ret)
+{
+int num_keys;
+uint8_t *buf;
+SCSIPrReadKeys *blk_keys = (SCSIPrReadKeys *)opaque;
+SCSIDiskReq *r = (SCSIDiskReq *)blk_keys->req;
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+assert(blk_get_aio_context(s->qdev.conf.blk) ==
+qemu_get_current_aio_context());
+
+assert(r->req.aiocb != NULL);
+r->req.aiocb = NULL;
+
+if (scsi_disk_req_check_error(r, ret, true)) {
+goto done;
+}
+
+buf = scsi_req_get_buf(>req);
+num_keys = MIN(blk_keys->num_keys, ret);
+blk_keys->generation = cpu_to_be32(blk_keys->generation);
+memcpy([0], _keys->generation, 4);
+for (int i = 0; i < num_keys; i++) {
+blk_keys->keys[i] = cpu_to_be64(blk_keys->keys[i]);
+memcpy([8 + i * 8], _keys->keys[i], 8);
+}
+num_keys = cpu_to_be32(num_keys * 8);
+memcpy([4], _keys, 4);
+
+scsi_req_data(>req, r->buflen);
+done:
+scsi_req_unref(>req);
+g_free(blk_keys->keys);
+g_free(blk_keys);
+}
+
+static int scsi_disk_emulate_pr_read_keys(SCSIRequest *req)
+{
+SCSIPrReadKeys *blk_keys;
+SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
+int buflen = MIN(r->req.cmd.xfer, r->buflen);
+int num_keys = (buflen - sizeof(uint32_t) * 2) / sizeof(uint64_t);
+
+blk_keys = g_new0(SCSIPrReadKeys, 1);
+blk_keys->generation = 0;
+/* num_keys is the maximum number of keys that can be transmitted */
+blk_keys->num_keys = num_keys;
+blk_keys->keys = g_malloc(sizeof(uint64_t) * num_keys);
+blk_keys->req = r;
+
+/* The request is used as the AIO opaque value, so add a ref.  */
+scsi_req_ref(>req);
+r->req.aiocb = blk_aio_pr_read_keys(s->qdev.conf.blk, 
_keys->generation,
+blk_keys->num_keys, blk_keys->keys,
+scsi_pr_read_keys_complete, blk_keys);
+return 0;
+}
+
+static void scsi_pr_read_reservation_complete(void *opaque, int ret)
+{
+uint8_t *buf;
+uint32_t additional_len = 0;
+SCSIPrReadReservation *blk_rsv = (SCSIPrReadReservation *)opaque;
+SCSIDiskReq *r = (SCSIDiskReq *)blk_rsv->req;
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+assert(blk_get_aio_context(s->qdev.conf.blk) ==
+qemu_get_current_aio_context());
+
+assert(r->req.aiocb != NULL);
+r->req.aiocb = NULL;
+
+if (scsi_disk_req_check_error(r, ret, true)) {
+goto done;
+}
+
+buf = scsi_req_get_buf(>req);
+blk_rsv->generation = cpu_to_be32(blk_rsv->generation);
+memcpy([0], _rsv->generation, 4);
+if (ret) {
+additional_len = cpu_to_be32(16);
+blk_rsv->key = cpu_to_be64(blk_rsv->key);
+memcpy([8], _rsv->key, 8);
+buf[21] = block_pr_type_to_scsi(blk_rsv->type) & 0xf;
+} else {
+additional_len = cpu_to_be32(0);
+}
+
+memcpy([4], _len, 4);
+scsi_req_data(>req, r->buflen);
+
+done:
+scsi_req_unref(>req);
+g_free(blk_rsv);
+}
+
+static int scsi_disk_emulate_pr_read_reservation(SCSIRequest *req)
+{
+SCSIPrReadReservation *blk_rsv;
+SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
+
+blk_rsv = 

[PATCH v6 10/10] block/iscsi: add persistent reservation in/out driver

2024-06-13 Thread Changqi Lu
Add persistent reservation in/out operations for iscsi driver.
The following methods are implemented: bdrv_co_pr_read_keys,
bdrv_co_pr_read_reservation, bdrv_co_pr_register, bdrv_co_pr_reserve,
bdrv_co_pr_release, bdrv_co_pr_clear and bdrv_co_pr_preempt.

Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
---
 block/iscsi.c | 443 ++
 1 file changed, 443 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index 2ff14b7472..d94ebe35bd 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -96,6 +96,7 @@ typedef struct IscsiLun {
 unsigned long *allocmap_valid;
 long allocmap_size;
 int cluster_size;
+uint8_t pr_cap;
 bool use_16_for_rw;
 bool write_protected;
 bool lbpme;
@@ -280,6 +281,8 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
 iTask->err_code = -error;
 iTask->err_str = g_strdup(iscsi_get_error(iscsi));
 }
+} else if (status == SCSI_STATUS_RESERVATION_CONFLICT) {
+iTask->err_code = -EBADE;
 }
 }
 }
@@ -1792,6 +1795,52 @@ static void iscsi_save_designator(IscsiLun *lun,
 }
 }
 
+static void iscsi_get_pr_cap_sync(IscsiLun *iscsilun, Error **errp)
+{
+struct scsi_task *task = NULL;
+struct scsi_persistent_reserve_in_report_capabilities *rc = NULL;
+int retries = ISCSI_CMD_RETRIES;
+int xferlen = sizeof(struct 
scsi_persistent_reserve_in_report_capabilities);
+
+do {
+if (task != NULL) {
+scsi_free_scsi_task(task);
+task = NULL;
+}
+
+task = iscsi_persistent_reserve_in_sync(iscsilun->iscsi,
+   iscsilun->lun, SCSI_PR_IN_REPORT_CAPABILITIES, xferlen);
+if (task != NULL && task->status == SCSI_STATUS_GOOD) {
+rc = scsi_datain_unmarshall(task);
+if (rc == NULL) {
+error_setg(errp,
+"iSCSI: Failed to unmarshall report capabilities data.");
+} else {
+iscsilun->pr_cap =
+scsi_pr_cap_to_block(rc->persistent_reservation_type_mask);
+iscsilun->pr_cap |= (rc->ptpl_a) ? BLK_PR_CAP_PTPL : 0;
+}
+break;
+}
+
+if (task != NULL && task->status == SCSI_STATUS_CHECK_CONDITION
+&& task->sense.key == SCSI_SENSE_UNIT_ATTENTION) {
+break;
+}
+
+} while (task != NULL && task->status == SCSI_STATUS_CHECK_CONDITION
+ && task->sense.key == SCSI_SENSE_UNIT_ATTENTION
+ && retries-- > 0);
+
+if (task == NULL || task->status != SCSI_STATUS_GOOD) {
+error_setg(errp, "iSCSI: failed to send report capabilities command");
+}
+
+if (task) {
+scsi_free_scsi_task(task);
+}
+}
+
 static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
   Error **errp)
 {
@@ -2024,6 +2073,11 @@ static int iscsi_open(BlockDriverState *bs, QDict 
*options, int flags,
 bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;
 }
 
+iscsi_get_pr_cap_sync(iscsilun, _err);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+ret = -EINVAL;
+}
 out:
 qemu_opts_del(opts);
 g_free(initiator_name);
@@ -2110,6 +2164,8 @@ static void iscsi_refresh_limits(BlockDriverState *bs, 
Error **errp)
 bs->bl.opt_transfer = pow2floor(iscsilun->bl.opt_xfer_len *
 iscsilun->block_size);
 }
+
+bs->bl.pr_cap = iscsilun->pr_cap;
 }
 
 /* Note that this will not re-establish a connection with an iSCSI target - it
@@ -2408,6 +2464,385 @@ out_unlock:
 return r;
 }
 
+static int coroutine_fn
+iscsi_co_pr_read_keys(BlockDriverState *bs, uint32_t *generation,
+  uint32_t num_keys, uint64_t *keys)
+{
+IscsiLun *iscsilun = bs->opaque;
+QEMUIOVector qiov;
+struct IscsiTask iTask;
+int xferlen = sizeof(struct scsi_persistent_reserve_in_read_keys) +
+  sizeof(uint64_t) * num_keys;
+uint8_t *buf = g_malloc0(xferlen);
+int32_t num_collect_keys = 0;
+int r = 0;
+
+qemu_iovec_init_buf(, buf, xferlen);
+iscsi_co_init_iscsitask(iscsilun, );
+qemu_mutex_lock(>mutex);
+retry:
+iTask.task = iscsi_persistent_reserve_in_task(iscsilun->iscsi,
+ iscsilun->lun, SCSI_PR_IN_READ_KEYS, xferlen,
+ iscsi_co_generic_cb, );
+
+if (iTask.task == NULL) {
+qemu_mutex_unlock(>mutex);
+return -ENOMEM;
+}
+
+scsi_task_set_iov_in(iTask.task, (struct scsi_iovec *)qiov.iov, qiov.niov);
+iscsi_co_wait_for_task(, iscsilun);
+
+if (iTask.task != NULL) {
+scsi_free_scsi_task(iTask.task);
+iTask.task = NULL;
+}
+
+if (iTask.do_retry) {
+iTask.complete = 0;
+goto retry;
+}
+
+if (iTask.status != SCSI_STATUS_GOOD) {
+ 

[PATCH v6 09/10] hw/nvme: add reservation protocal command

2024-06-13 Thread Changqi Lu
Add reservation acquire, reservation register,
reservation release and reservation report commands
in the nvme device layer.

By introducing these commands, this enables the nvme
device to perform reservation-related tasks, including
querying keys, querying reservation status, registering
reservation keys, initiating and releasing reservations,
as well as clearing and preempting reservations held by
other keys.

These commands are crucial for management and control of
shared storage resources in a persistent manner.
Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
---
 hw/nvme/ctrl.c   | 323 ++-
 hw/nvme/nvme.h   |   4 +
 include/block/nvme.h |  37 +
 3 files changed, 363 insertions(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 182307a48b..44e0bd5c63 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -294,6 +294,10 @@ static const uint32_t nvme_cse_iocs_nvm[256] = {
 [NVME_CMD_COMPARE]  = NVME_CMD_EFF_CSUPP,
 [NVME_CMD_IO_MGMT_RECV] = NVME_CMD_EFF_CSUPP,
 [NVME_CMD_IO_MGMT_SEND] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
+[NVME_CMD_RESV_REGISTER]= NVME_CMD_EFF_CSUPP,
+[NVME_CMD_RESV_REPORT]  = NVME_CMD_EFF_CSUPP,
+[NVME_CMD_RESV_ACQUIRE] = NVME_CMD_EFF_CSUPP,
+[NVME_CMD_RESV_RELEASE] = NVME_CMD_EFF_CSUPP,
 };
 
 static const uint32_t nvme_cse_iocs_zoned[256] = {
@@ -308,6 +312,10 @@ static const uint32_t nvme_cse_iocs_zoned[256] = {
 [NVME_CMD_ZONE_APPEND]  = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_ZONE_MGMT_SEND]   = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 [NVME_CMD_ZONE_MGMT_RECV]   = NVME_CMD_EFF_CSUPP,
+[NVME_CMD_RESV_REGISTER]= NVME_CMD_EFF_CSUPP,
+[NVME_CMD_RESV_REPORT]  = NVME_CMD_EFF_CSUPP,
+[NVME_CMD_RESV_ACQUIRE] = NVME_CMD_EFF_CSUPP,
+[NVME_CMD_RESV_RELEASE] = NVME_CMD_EFF_CSUPP,
 };
 
 static void nvme_process_sq(void *opaque);
@@ -1745,6 +1753,7 @@ static void nvme_aio_err(NvmeRequest *req, int ret)
 
 switch (req->cmd.opcode) {
 case NVME_CMD_READ:
+case NVME_CMD_RESV_REPORT:
 status = NVME_UNRECOVERED_READ;
 break;
 case NVME_CMD_FLUSH:
@@ -1752,6 +1761,9 @@ static void nvme_aio_err(NvmeRequest *req, int ret)
 case NVME_CMD_WRITE_ZEROES:
 case NVME_CMD_ZONE_APPEND:
 case NVME_CMD_COPY:
+case NVME_CMD_RESV_REGISTER:
+case NVME_CMD_RESV_ACQUIRE:
+case NVME_CMD_RESV_RELEASE:
 status = NVME_WRITE_FAULT;
 break;
 default:
@@ -2127,7 +2139,10 @@ static inline bool nvme_is_write(NvmeRequest *req)
 
 return rw->opcode == NVME_CMD_WRITE ||
rw->opcode == NVME_CMD_ZONE_APPEND ||
-   rw->opcode == NVME_CMD_WRITE_ZEROES;
+   rw->opcode == NVME_CMD_WRITE_ZEROES ||
+   rw->opcode == NVME_CMD_RESV_REGISTER ||
+   rw->opcode == NVME_CMD_RESV_ACQUIRE ||
+   rw->opcode == NVME_CMD_RESV_RELEASE;
 }
 
 static void nvme_misc_cb(void *opaque, int ret)
@@ -2692,6 +2707,304 @@ static uint16_t nvme_verify(NvmeCtrl *n, NvmeRequest 
*req)
 return NVME_NO_COMPLETE;
 }
 
+typedef struct NvmeKeyInfo {
+uint64_t cr_key;
+uint64_t nr_key;
+} NvmeKeyInfo;
+
+static uint16_t nvme_resv_register(NvmeCtrl *n, NvmeRequest *req)
+{
+int ret;
+NvmeKeyInfo key_info;
+NvmeNamespace *ns = req->ns;
+uint32_t cdw10 = le32_to_cpu(req->cmd.cdw10);
+bool ignore_key = cdw10 >> 3 & 0x1;
+uint8_t action = cdw10 & 0x7;
+uint8_t ptpl = cdw10 >> 30 & 0x3;
+bool aptpl;
+
+switch (ptpl) {
+case NVME_RESV_PTPL_NO_CHANGE:
+aptpl = (ns->id_ns.rescap & NVME_PR_CAP_PTPL) ? true : false;
+break;
+case NVME_RESV_PTPL_DISABLE:
+aptpl = false;
+break;
+case NVME_RESV_PTPL_ENABLE:
+aptpl = true;
+break;
+default:
+return NVME_INVALID_FIELD;
+}
+
+ret = nvme_h2c(n, (uint8_t *)_info, sizeof(NvmeKeyInfo), req);
+if (ret) {
+return ret;
+}
+
+switch (action) {
+case NVME_RESV_REGISTER_ACTION_REGISTER:
+req->aiocb = blk_aio_pr_register(ns->blkconf.blk, 0,
+ key_info.nr_key, 0, aptpl,
+ ignore_key, nvme_misc_cb,
+ req);
+break;
+case NVME_RESV_REGISTER_ACTION_UNREGISTER:
+req->aiocb = blk_aio_pr_register(ns->blkconf.blk, key_info.cr_key, 0,
+ 0, aptpl, ignore_key,
+ nvme_misc_cb, req);
+break;
+case NVME_RESV_REGISTER_ACTION_REPLACE:
+req->aiocb = blk_aio_pr_register(ns->blkconf.blk, key_info.cr_key,
+ key_info.nr_key, 0, aptpl, ignore_key,
+ nvme_misc_cb, req);
+break;
+default:
+return 

[PATCH v6 07/10] hw/nvme: add helper functions for converting reservation types

2024-06-13 Thread Changqi Lu
This commit introduces two helper functions
that facilitate the conversion between the
reservation types used in the NVME protocol
and those used in the block layer.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
---
 hw/nvme/nvme.h | 80 ++
 1 file changed, 80 insertions(+)

diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index bed8191bd5..b1ad27c8f2 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -474,6 +474,86 @@ static inline const char *nvme_io_opc_str(uint8_t opc)
 }
 }
 
+static inline NvmeResvType block_pr_type_to_nvme(BlockPrType type)
+{
+switch (type) {
+case BLK_PR_WRITE_EXCLUSIVE:
+return NVME_RESV_WRITE_EXCLUSIVE;
+case BLK_PR_EXCLUSIVE_ACCESS:
+return NVME_RESV_EXCLUSIVE_ACCESS;
+case BLK_PR_WRITE_EXCLUSIVE_REGS_ONLY:
+return NVME_RESV_WRITE_EXCLUSIVE_REGS_ONLY;
+case BLK_PR_EXCLUSIVE_ACCESS_REGS_ONLY:
+return NVME_RESV_EXCLUSIVE_ACCESS_REGS_ONLY;
+case BLK_PR_WRITE_EXCLUSIVE_ALL_REGS:
+return NVME_RESV_WRITE_EXCLUSIVE_ALL_REGS;
+case BLK_PR_EXCLUSIVE_ACCESS_ALL_REGS:
+return NVME_RESV_EXCLUSIVE_ACCESS_ALL_REGS;
+}
+
+return 0;
+}
+
+static inline BlockPrType nvme_pr_type_to_block(NvmeResvType type)
+{
+switch (type) {
+case NVME_RESV_WRITE_EXCLUSIVE:
+return BLK_PR_WRITE_EXCLUSIVE;
+case NVME_RESV_EXCLUSIVE_ACCESS:
+return BLK_PR_EXCLUSIVE_ACCESS;
+case NVME_RESV_WRITE_EXCLUSIVE_REGS_ONLY:
+return BLK_PR_WRITE_EXCLUSIVE_REGS_ONLY;
+case NVME_RESV_EXCLUSIVE_ACCESS_REGS_ONLY:
+return BLK_PR_EXCLUSIVE_ACCESS_REGS_ONLY;
+case NVME_RESV_WRITE_EXCLUSIVE_ALL_REGS:
+return BLK_PR_WRITE_EXCLUSIVE_ALL_REGS;
+case NVME_RESV_EXCLUSIVE_ACCESS_ALL_REGS:
+return BLK_PR_EXCLUSIVE_ACCESS_ALL_REGS;
+}
+
+return 0;
+}
+
+static inline uint8_t nvme_pr_cap_to_block(uint16_t nvme_pr_cap)
+{
+uint8_t res = 0;
+
+res |= (nvme_pr_cap & NVME_PR_CAP_WR_EX) ?
+   BLK_PR_CAP_WR_EX : 0;
+res |= (nvme_pr_cap & NVME_PR_CAP_EX_AC) ?
+   BLK_PR_CAP_EX_AC : 0;
+res |= (nvme_pr_cap & NVME_PR_CAP_WR_EX_RO) ?
+   BLK_PR_CAP_WR_EX_RO : 0;
+res |= (nvme_pr_cap & NVME_PR_CAP_EX_AC_RO) ?
+   BLK_PR_CAP_EX_AC_RO : 0;
+res |= (nvme_pr_cap & NVME_PR_CAP_WR_EX_AR) ?
+   BLK_PR_CAP_WR_EX_AR : 0;
+res |= (nvme_pr_cap & NVME_PR_CAP_EX_AC_AR) ?
+   BLK_PR_CAP_EX_AC_AR : 0;
+
+return res;
+}
+
+static inline uint8_t block_pr_cap_to_nvme(uint8_t block_pr_cap)
+{
+uint16_t res = 0;
+
+res |= (block_pr_cap & BLK_PR_CAP_WR_EX) ?
+  NVME_PR_CAP_WR_EX : 0;
+res |= (block_pr_cap & BLK_PR_CAP_EX_AC) ?
+  NVME_PR_CAP_EX_AC : 0;
+res |= (block_pr_cap & BLK_PR_CAP_WR_EX_RO) ?
+  NVME_PR_CAP_WR_EX_RO : 0;
+res |= (block_pr_cap & BLK_PR_CAP_EX_AC_RO) ?
+  NVME_PR_CAP_EX_AC_RO : 0;
+res |= (block_pr_cap & BLK_PR_CAP_WR_EX_AR) ?
+  NVME_PR_CAP_WR_EX_AR : 0;
+res |= (block_pr_cap & BLK_PR_CAP_EX_AC_AR) ?
+  NVME_PR_CAP_EX_AC_AR : 0;
+
+return res;
+}
+
 typedef struct NvmeSQueue {
 struct NvmeCtrl *ctrl;
 uint16_tsqid;
-- 
2.20.1




[PATCH v6 03/10] scsi/constant: add persistent reservation in/out protocol constants

2024-06-13 Thread Changqi Lu
Add constants for the persistent reservation in/out protocol
in the scsi/constant module. The constants include the persistent
reservation command, type, and scope values defined in sections
6.13 and 6.14 of the SCSI Primary Commands-4 (SPC-4) specification.

Signed-off-by: Changqi Lu 
Signed-off-by: zhenwei pi 
---
 include/scsi/constants.h | 52 
 1 file changed, 52 insertions(+)

diff --git a/include/scsi/constants.h b/include/scsi/constants.h
index 9b98451912..922a314535 100644
--- a/include/scsi/constants.h
+++ b/include/scsi/constants.h
@@ -319,4 +319,56 @@
 #define IDENT_DESCR_TGT_DESCR_SIZE 32
 #define XCOPY_BLK2BLK_SEG_DESC_SIZE 28
 
+typedef enum {
+SCSI_PR_WRITE_EXCLUSIVE = 0x01,
+SCSI_PR_EXCLUSIVE_ACCESS= 0x03,
+SCSI_PR_WRITE_EXCLUSIVE_REGS_ONLY   = 0x05,
+SCSI_PR_EXCLUSIVE_ACCESS_REGS_ONLY  = 0x06,
+SCSI_PR_WRITE_EXCLUSIVE_ALL_REGS= 0x07,
+SCSI_PR_EXCLUSIVE_ACCESS_ALL_REGS   = 0x08,
+} SCSIPrType;
+
+typedef enum {
+SCSI_PR_LU_SCOPE  = 0x00,
+} SCSIPrScope;
+
+typedef enum {
+SCSI_PR_OUT_REGISTER = 0x0,
+SCSI_PR_OUT_RESERVE  = 0x1,
+SCSI_PR_OUT_RELEASE  = 0x2,
+SCSI_PR_OUT_CLEAR= 0x3,
+SCSI_PR_OUT_PREEMPT  = 0x4,
+SCSI_PR_OUT_PREEMPT_AND_ABORT= 0x5,
+SCSI_PR_OUT_REG_AND_IGNORE_KEY   = 0x6,
+SCSI_PR_OUT_REG_AND_MOVE = 0x7,
+} SCSIPrOutAction;
+
+typedef enum {
+SCSI_PR_IN_READ_KEYS = 0x0,
+SCSI_PR_IN_READ_RESERVATION  = 0x1,
+SCSI_PR_IN_REPORT_CAPABILITIES   = 0x2,
+} SCSIPrInAction;
+
+typedef enum {
+/* Exclusive Access All Registrants reservation type */
+SCSI_PR_CAP_EX_AC_AR = 1 << 0,
+/* Write Exclusive reservation type */
+SCSI_PR_CAP_WR_EX = 1 << 9,
+/* Exclusive Access reservation type */
+SCSI_PR_CAP_EX_AC = 1 << 11,
+/* Write Exclusive Registrants Only reservation type */
+SCSI_PR_CAP_WR_EX_RO = 1 << 13,
+/* Exclusive Access Registrants Only reservation type */
+SCSI_PR_CAP_EX_AC_RO = 1 << 14,
+/* Write Exclusive All Registrants reservation type */
+SCSI_PR_CAP_WR_EX_AR = 1 << 15,
+
+SCSI_PR_CAP_ALL = (SCSI_PR_CAP_EX_AC_AR |
+  SCSI_PR_CAP_WR_EX |
+  SCSI_PR_CAP_EX_AC |
+  SCSI_PR_CAP_WR_EX_RO |
+  SCSI_PR_CAP_EX_AC_RO |
+  SCSI_PR_CAP_WR_EX_AR),
+} SCSIPrCap;
+
 #endif
-- 
2.20.1