date:20221026

Re: qemu iotest 161 and make check

2022-10-26 Thread Christian Borntraeger





Am 31.03.22 um 10:25 schrieb Christian Borntraeger:



Am 31.03.22 um 09:44 schrieb Christian Borntraeger:



Am 21.02.22 um 11:27 schrieb Christian Borntraeger:


Am 10.02.22 um 18:44 schrieb Vladimir Sementsov-Ogievskiy:

10.02.2022 20:13, Thomas Huth wrote:

On 10/02/2022 15.51, Christian Borntraeger wrote:



Am 10.02.22 um 15:47 schrieb Vladimir Sementsov-Ogievskiy:

10.02.2022 10:57, Christian Borntraeger wrote:

Hello,

I do see spurious failures of 161 in our CI, but only when I use
make check with parallelism (-j).
I have not yet figured out which other testcase could interfere

@@ -34,6 +34,8 @@
  *** Commit and then change an option on the backing file

  Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1048576
+qemu-img: TEST_DIR/t.IMGFMT.base: Failed to get "write" lock


FWIW, qemu_lock_fd_test returns -11 (EAGAIN)
and raw_check_lock_bytes spits this error.



And its coming from here (ret is 0)

int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive)
{
     int ret;
     struct flock fl = {
     .l_whence = SEEK_SET,
     .l_start  = start,
     .l_len    = len,
     .l_type   = exclusive ? F_WRLCK : F_RDLCK,
     };
     qemu_probe_lock_ops();
     ret = fcntl(fd, fcntl_op_getlk, );
     if (ret == -1) {
     return -errno;
     } else {
->    return fl.l_type == F_UNLCK ? 0 : -EAGAIN;
     }
}




Is this just some overload situation that we do not recover because we do not 
handle EAGAIN any special.


Restarted my investigation. Looks like the file lock from qemu is not fully 
cleaned up when the process is gone.
Something like
diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu
index 0f1fecc68e..b28a6c187c 100644
--- a/tests/qemu-iotests/common.qemu
+++ b/tests/qemu-iotests/common.qemu
@@ -403,4 +403,5 @@ _cleanup_qemu()
 unset QEMU_IN[$i]
 unset QEMU_OUT[$i]
 done
+sleep 0.5
 }


makes the problem go away.

Looks like we do use the OFD variant of the file lock, so any clone, fork etc 
will keep the lock.

So I tested the following:

diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu
index 0f1fecc68e..01bdb05575 100644
--- a/tests/qemu-iotests/common.qemu
+++ b/tests/qemu-iotests/common.qemu
@@ -388,7 +388,7 @@ _cleanup_qemu()
 kill -KILL ${QEMU_PID} 2>/dev/null
 fi
 if [ -n "${QEMU_PID}" ]; then
-wait ${QEMU_PID} 2>/dev/null # silent kill
+wait 2>/dev/null # silent kill
 fi
 fi


And this also helps. Still trying to find out what clone/fork happens here.

Re: [PATCH v3 02/16] hw/i386/amd_iommu: Omit errp for pci_add_capability

2022-10-26 Thread Markus Armbruster

Akihiko Odaki  writes:

> Omitting errp for pci_add_capability() causes it to abort if
> capabilities overlap. This behavior is appropriate heare because all of

Typo: here

Same for later patches.

> the capabilities set in this device are defined in the program and
> their overlap should not happen unless there is a programming error.
>
> Signed-off-by: Akihiko Odaki 

Otherwise fine.  Thank you!

Re: [PATCH 5/7] block/nfs: Fix 32-bit Windows build

2022-10-26 Thread Bin Meng

Hi Kevin,

On Sat, Sep 24, 2022 at 9:19 AM Bin Meng  wrote:
>
> Hi,
>
> On Wed, Sep 21, 2022 at 8:10 PM Meng, Bin  wrote:
> >
> > -Original Message-
> > From: Philippe Mathieu-Daudé  On Behalf 
> > Of Philippe Mathieu-Daudé
> > Sent: Sunday, September 18, 2022 5:32 AM
> > To: Bin Meng ; qemu-de...@nongnu.org
> > Cc: Meng, Bin ; Hanna Reitz ; 
> > Kevin Wolf ; Peter Lieven ; 
> > qemu-block@nongnu.org
> > Subject: Re: [PATCH 5/7] block/nfs: Fix 32-bit Windows build
> >
> > [Please note: This e-mail is from an EXTERNAL e-mail address]
> >
> > On 8/9/22 15:28, Bin Meng wrote:
> > > From: Bin Meng 
> > >
> > > libnfs.h declares nfs_fstat() as the following for win32:
> > >
> > >int nfs_fstat(struct nfs_context *nfs, struct nfsfh *nfsfh,
> > >  struct __stat64 *st);
> > >
> > > The 'st' parameter should be of type 'struct __stat64'. The codes
> > > happen to build successfully for 64-bit Windows, but it does not build
> > > for 32-bit Windows.
> > >
> > > Fixes: 6542aa9c75bc ("block: add native support for NFS")
> > > Fixes: 18a8056e0bc7 ("block/nfs: cache allocated filesize for
> > > read-only files")
> > > Signed-off-by: Bin Meng 
> > > ---
> > >
> > >   block/nfs.c | 8 
> > >   1 file changed, 8 insertions(+)
> > >
> > > diff --git a/block/nfs.c b/block/nfs.c index 444c40b458..d5d67937dd
> > > 100644
> > > --- a/block/nfs.c
> > > +++ b/block/nfs.c
> > > @@ -418,7 +418,11 @@ static int64_t nfs_client_open(NFSClient *client, 
> > > BlockdevOptionsNfs *opts,
> > >  int flags, int open_flags, Error **errp)
> > >   {
> > >   int64_t ret = -EINVAL;
> > > +#ifdef _WIN32
> > > +struct __stat64 st;
> > > +#else
> > >   struct stat st;
> > > +#endif
> > >   char *file = NULL, *strp = NULL;
> > >
> > >   qemu_mutex_init(>mutex); @@ -781,7 +785,11 @@ static int
> > > nfs_reopen_prepare(BDRVReopenState *state,
> > > BlockReopenQueue *queue, Error **errp)
> > >   {
> > >   NFSClient *client = state->bs->opaque;
> > > +#ifdef _WIN32
> > > +struct __stat64 st;
> > > +#else
> > >   struct stat st;
> > > +#endif
> > >   int ret = 0;
> > >
> > >   if (state->flags & BDRV_O_RDWR && bdrv_is_read_only(state->bs))
> > > {
> >
> > What about the field in struct NFSRPC?
> >
> > NFSRPC::struct stat is used in nfs_get_allocated_file_size_cb() and 
> > nfs_get_allocated_file_size() that are not built on win32, so there is no 
> > problem.
> >
>
> Any further comments?
>

Will you queue this patch via the block tree?

Regards,
Bin

Re: [PATCH v3 14/16] hw/vfio/pci: Omit errp for pci_add_capability

2022-10-26 Thread Alex Williamson

On Thu, 27 Oct 2022 05:15:25 +0900
Akihiko Odaki  wrote:

> The code generating errors in pci_add_capability has a comment which
> says:
> > Verify that capabilities don't overlap.  Note: device assignment
> > depends on this check to verify that the device is not broken.
> > Should never trigger for emulated devices, but it's helpful for
> > debugging these.  
> 
> Indeed vfio has some code that passes capability offsets and sizes from
> a physical device, but it explicitly pays attention so that the
> capabilities never overlap. Therefore, in pci_add_capability(), we can
> always assert that capabilities never overlap, and that is what happens
> when omitting errp.
> 
> Signed-off-by: Akihiko Odaki 
> ---
>  hw/vfio/pci-quirks.c | 15 +++
>  hw/vfio/pci.c| 14 +-
>  2 files changed, 8 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> index f0147a050a..e94fd273ea 100644
> --- a/hw/vfio/pci-quirks.c
> +++ b/hw/vfio/pci-quirks.c
> @@ -1530,7 +1530,7 @@ const PropertyInfo qdev_prop_nv_gpudirect_clique = {
>  static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp)
>  {
>  PCIDevice *pdev = >pdev;
> -int ret, pos = 0xC8;
> +int pos = 0xC8;
>  
>  if (vdev->nv_gpudirect_clique == 0xFF) {
>  return 0;
> @@ -1547,11 +1547,7 @@ static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice 
> *vdev, Error **errp)
>  return -EINVAL;
>  }
>  
> -ret = pci_add_capability(pdev, PCI_CAP_ID_VNDR, pos, 8, errp);
> -if (ret < 0) {
> -error_prepend(errp, "Failed to add NVIDIA GPUDirect cap: ");
> -return ret;
> -}
> +pci_add_capability(pdev, PCI_CAP_ID_VNDR, pos, 8);
>  
>  memset(vdev->emulated_config_bits + pos, 0xFF, 8);
>  pos += PCI_CAP_FLAGS;
> @@ -1718,12 +1714,7 @@ static int vfio_add_vmd_shadow_cap(VFIOPCIDevice 
> *vdev, Error **errp)
>  return -EFAULT;
>  }
>  
> -ret = pci_add_capability(>pdev, PCI_CAP_ID_VNDR, pos,
> - VMD_SHADOW_CAP_LEN, errp);
> -if (ret < 0) {
> -error_prepend(errp, "Failed to add VMD MEMBAR Shadow cap: ");
> -return ret;
> -}
> +pci_add_capability(>pdev, PCI_CAP_ID_VNDR, pos, 
> VMD_SHADOW_CAP_LEN);


For these, I guess we're assuming that since they're added first via
vfio_add_virt_caps() that we cannot create an overlap?

>  
>  memset(vdev->emulated_config_bits + pos, 0xFF, VMD_SHADOW_CAP_LEN);
>  pos += PCI_CAP_FLAGS;
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 939dcc3d4a..2b653d01e3 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1826,7 +1826,7 @@ static void vfio_add_emulated_long(VFIOPCIDevice *vdev, 
> int pos,
>  vfio_set_long_bits(vdev->emulated_config_bits + pos, mask, mask);
>  }
>  
> -static int vfio_setup_pcie_cap(VFIOPCIDevice *vdev, int pos, uint8_t size,
> +static int vfio_setup_pcie_cap(VFIOPCIDevice *vdev, uint8_t pos, uint8_t 
> size,
> Error **errp)
>  {
>  uint16_t flags;
> @@ -1943,11 +1943,7 @@ static int vfio_setup_pcie_cap(VFIOPCIDevice *vdev, 
> int pos, uint8_t size,
> 1, PCI_EXP_FLAGS_VERS);
>  }
>  
> -pos = pci_add_capability(>pdev, PCI_CAP_ID_EXP, pos, size,
> - errp);
> -if (pos < 0) {
> -return pos;
> -}
> +pos = pci_add_capability(>pdev, PCI_CAP_ID_EXP, pos, size);
>  
>  vdev->pdev.exp.exp_cap = pos;
>  
> @@ -2045,14 +2041,14 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, 
> uint8_t pos, Error **errp)
>  case PCI_CAP_ID_PM:
>  vfio_check_pm_reset(vdev, pos);
>  vdev->pm_cap = pos;
> -ret = pci_add_capability(pdev, cap_id, pos, size, errp);
> +pci_add_capability(pdev, cap_id, pos, size);
>  break;
>  case PCI_CAP_ID_AF:
>  vfio_check_af_flr(vdev, pos);
> -ret = pci_add_capability(pdev, cap_id, pos, size, errp);
> +pci_add_capability(pdev, cap_id, pos, size);
>  break;
>  default:
> -ret = pci_add_capability(pdev, cap_id, pos, size, errp);
> +pci_add_capability(pdev, cap_id, pos, size);
>  break;
>  }

And for these we've calculated a size to make sure that we don't bump
into the next capability, but how do you account for the MSI and MSI-X
cases above this chunk where vfio's overlap prevention can't be used?
Thanks,

Alex

Re: [PATCH v4 0/7] ppc/e500: Add support for two types of flash, cleanup

2022-10-26 Thread Daniel Henrique Barboza





On 10/26/22 16:51, B wrote:



Am 26. Oktober 2022 17:18:14 UTC schrieb Daniel Henrique Barboza 
:

Hi,

Since this is being sent to qemu-ppc and has to do with e500 I decided to
take a look. I acked the e500 related patches, 5 and 7. Patch 6 LGTM as well
but I'd rather not ack it it's SD specific code.

I'll send a PowerPC pull request this week. I can grab this series via the ppc
tree if someone with SD authority acks patch 6.


This would be awesome. If we can't reach consensus on the eSDHC device until 
then perhaps you could pull everything but the last two patches?



That's fair enough. Just applied 1-5 to ppc-next.

I'll send a PR most likely Friday. If patch 6 gets an ACK until then I'll
pick 6 and 7 as well.

I'll be offline start of next week so this will be my last PR before the freeze.
If a patch 6 ACK arrives after Friday we'll need the SD maintainers to pick 
those
in before the freeze. Patch 7 has my ACK so feel free to take it.



Thanks Daniel for generally absorbing any patches floating around which look 
remotely useful for the ppc tree. This is rewarding.



Glad I'm able to help!


Daniel



Best regards,
Bernhard



Thanks,


Daniel





On 10/18/22 18:01, Bernhard Beschow wrote:

Cover letter:
~

This series adds support for -pflash and direct SD card access to the
PPC e500 boards. The idea is to increase compatibility with "real" firmware
images where only the bare minimum of drivers is compiled in.

The series is structured as follows:

Patches 1-4 perform some general cleanup which paves the way for the rest of
the series.

Patch 5 adds -pflash handling where memory-mapped flash can be added on
user's behalf. That is, the flash memory region in the eLBC is only added if
the -pflash argument is supplied. Note that the cfi01 device model becomes
stricter in checking the size of the emulated flash space.

Patches 6 and 7 add a new device model - the Freescale eSDHC - to the e500
boards which was missing so far.

User documentation is also added as the new features become available.

Tesing done:
* `qeu-system-ppc -M ppce500 -cpu e500mc -m 256 -kernel uImage -append
"console=ttyS0 rootwait root=/dev/mtdblock0 nokaslr" -drive
if=pflash,file=rootfs.ext2,format=raw`
* `qemu-system-ppc -M ppce500 -cpu e500mc -m 256 -kernel uImage -append
"console=ttyS0 rootwait root=/dev/mmcblk0" -device sd-card,drive=mydrive -drive
id=mydrive,if=none,file=rootfs.ext2,format=raw`

The load was created using latest Buildroot with `make
qemu_ppc_e500mc_defconfig` where the rootfs was configured to be of ext2 type.
In both cases it was possible to log in and explore the root file system.

v4:
~~~
Zoltan:
- Don't suggest presence of qemu-system-ppc32 in documentation

Bin:
- New patch: docs/system/ppc/ppce500: Use qemu-system-ppc64 across the board(s)

Peter:
- Inline pflash_cfi01_register() rather than modify it (similar to v2)

v3:
~~~
Phil:
- Also add power-of-2 fix to pflash_cfi02
- Resolve cfi01-specific assertion in e500 code
- Resolve unused define in eSDHC device model
- Resolve redundant alignment checks in eSDHC device model

Bin:
- Add dedicated flash chapter to documentation

Bernhard:
- Use is_power_of_2() instead of ctpop64() for better readability
- Only instantiate eSDHC device model in ppce500 (not used in MPC8544DS)
- Rebase onto gitlab.com/danielhb/qemu/tree/ppc-next

v2:
~~~
Bin:
- Add source for MPC8544DS platform bus' memory map in commit message.
- Keep "ESDHC" in comment referring to Linux driver.
- Use "qemu-system-ppc{64|32} in documentation.
- Use g_autofree in device tree code.
- Remove unneeded device tree properties.
- Error out if pflash size doesn't fit into eLBC memory window.
- Remove unused ESDHC defines.
- Define macro ESDHC_WML for register offset with magic constant.
- Fix some whitespace issues when adding eSDHC device to e500.

Phil:
- Fix tense in commit message.

Bernhard Beschow (7):
docs/system/ppc/ppce500: Use qemu-system-ppc64 across the board(s)
hw/block/pflash_cfi0{1,2}: Error out if device length isn't a power of
  two
hw/sd/sdhci-internal: Unexport ESDHC defines
hw/sd/sdhci: Rename ESDHC_* defines to USDHC_*
hw/ppc/e500: Implement pflash handling
hw/sd/sdhci: Implement Freescale eSDHC device model
hw/ppc/e500: Add Freescale eSDHC to e500plat

   docs/system/ppc/ppce500.rst |  38 +++-
   hw/block/pflash_cfi01.c |   8 +-
   hw/block/pflash_cfi02.c |   5 +
   hw/ppc/Kconfig  |   2 +
   hw/ppc/e500.c   | 114 +-
   hw/ppc/e500.h   |   1 +
   hw/ppc/e500plat.c   |   1 +
   hw/sd/sdhci-internal.h  |  20 
   hw/sd/sdhci.c   | 183 +++-
   include/hw/sd/sdhci.h   |   3 +
   10 files changed, 324 insertions(+), 51 deletions(-)

[PATCH v3 08/16] hw/pci/pci_bridge: Omit errp for pci_add_capability

2022-10-26 Thread Akihiko Odaki

Omitting errp for pci_add_capability() causes it to abort if
capabilities overlap. A caller of pci_bridge_ssvid_init(), which calls
pci_add_capability() in turn, is expected to ensure that will not
happen.

Signed-off-by: Akihiko Odaki 
---
 hw/pci-bridge/i82801b11.c  | 14 ++
 hw/pci-bridge/pcie_root_port.c |  7 +--
 hw/pci-bridge/xio3130_downstream.c |  8 ++--
 hw/pci-bridge/xio3130_upstream.c   |  8 ++--
 hw/pci/pci_bridge.c| 21 ++---
 include/hw/pci/pci_bridge.h|  5 ++---
 6 files changed, 15 insertions(+), 48 deletions(-)

diff --git a/hw/pci-bridge/i82801b11.c b/hw/pci-bridge/i82801b11.c
index f28181e210..f45dcdbacc 100644
--- a/hw/pci-bridge/i82801b11.c
+++ b/hw/pci-bridge/i82801b11.c
@@ -61,21 +61,11 @@ typedef struct I82801b11Bridge {
 
 static void i82801b11_bridge_realize(PCIDevice *d, Error **errp)
 {
-int rc;
-
 pci_bridge_initfn(d, TYPE_PCI_BUS);
 
-rc = pci_bridge_ssvid_init(d, I82801ba_SSVID_OFFSET,
-   I82801ba_SSVID_SVID, I82801ba_SSVID_SSID,
-   errp);
-if (rc < 0) {
-goto err_bridge;
-}
+pci_bridge_ssvid_init(d, I82801ba_SSVID_OFFSET,
+  I82801ba_SSVID_SVID, I82801ba_SSVID_SSID);
 pci_config_set_prog_interface(d->config, PCI_CLASS_BRIDGE_PCI_INF_SUB);
-return;
-
-err_bridge:
-pci_bridge_exitfn(d);
 }
 
 static const VMStateDescription i82801b11_bridge_dev_vmstate = {
diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c
index 460e48269d..a9d8c2adb4 100644
--- a/hw/pci-bridge/pcie_root_port.c
+++ b/hw/pci-bridge/pcie_root_port.c
@@ -74,12 +74,7 @@ static void rp_realize(PCIDevice *d, Error **errp)
 }
 pcie_port_init_reg(d);
 
-rc = pci_bridge_ssvid_init(d, rpc->ssvid_offset, dc->vendor_id,
-   rpc->ssid, errp);
-if (rc < 0) {
-error_append_hint(errp, "Can't init SSV ID, error %d\n", rc);
-goto err_bridge;
-}
+pci_bridge_ssvid_init(d, rpc->ssvid_offset, dc->vendor_id, rpc->ssid);
 
 if (rpc->interrupts_init) {
 rc = rpc->interrupts_init(d, errp);
diff --git a/hw/pci-bridge/xio3130_downstream.c 
b/hw/pci-bridge/xio3130_downstream.c
index 05e2b06c0c..eea3d3a2df 100644
--- a/hw/pci-bridge/xio3130_downstream.c
+++ b/hw/pci-bridge/xio3130_downstream.c
@@ -81,12 +81,8 @@ static void xio3130_downstream_realize(PCIDevice *d, Error 
**errp)
 goto err_bridge;
 }
 
-rc = pci_bridge_ssvid_init(d, XIO3130_SSVID_OFFSET,
-   XIO3130_SSVID_SVID, XIO3130_SSVID_SSID,
-   errp);
-if (rc < 0) {
-goto err_msi;
-}
+pci_bridge_ssvid_init(d, XIO3130_SSVID_OFFSET,
+  XIO3130_SSVID_SVID, XIO3130_SSVID_SSID);
 
 rc = pcie_cap_init(d, XIO3130_EXP_OFFSET, PCI_EXP_TYPE_DOWNSTREAM,
p->port, errp);
diff --git a/hw/pci-bridge/xio3130_upstream.c b/hw/pci-bridge/xio3130_upstream.c
index 5ff46ef050..d954906d79 100644
--- a/hw/pci-bridge/xio3130_upstream.c
+++ b/hw/pci-bridge/xio3130_upstream.c
@@ -71,12 +71,8 @@ static void xio3130_upstream_realize(PCIDevice *d, Error 
**errp)
 goto err_bridge;
 }
 
-rc = pci_bridge_ssvid_init(d, XIO3130_SSVID_OFFSET,
-   XIO3130_SSVID_SVID, XIO3130_SSVID_SSID,
-   errp);
-if (rc < 0) {
-goto err_msi;
-}
+pci_bridge_ssvid_init(d, XIO3130_SSVID_OFFSET,
+  XIO3130_SSVID_SVID, XIO3130_SSVID_SSID);
 
 rc = pcie_cap_init(d, XIO3130_EXP_OFFSET, PCI_EXP_TYPE_UPSTREAM,
p->port, errp);
diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index da34c8ebcd..30032fed64 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -42,21 +42,15 @@
 #define PCI_SSVID_SVID  4
 #define PCI_SSVID_SSID  6
 
-int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
-  uint16_t svid, uint16_t ssid,
-  Error **errp)
+void pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
+   uint16_t svid, uint16_t ssid)
 {
-int pos;
+uint8_t pos;
 
-pos = pci_add_capability(dev, PCI_CAP_ID_SSVID, offset,
- PCI_SSVID_SIZEOF, errp);
-if (pos < 0) {
-return pos;
-}
+pos = pci_add_capability(dev, PCI_CAP_ID_SSVID, offset, PCI_SSVID_SIZEOF);
 
 pci_set_word(dev->config + pos + PCI_SSVID_SVID, svid);
 pci_set_word(dev->config + pos + PCI_SSVID_SSID, ssid);
-return pos;
 }
 
 /* Accessor function to get parent bridge device from pci bus. */
@@ -455,11 +449,8 @@ int pci_bridge_qemu_reserve_cap_init(PCIDevice *dev, int 
cap_offset,
 .mem_pref_64 = cpu_to_le64(res_reserve.mem_pref_64)
 };
 
-int offset = pci_add_capability(dev, PCI_CAP_ID_VNDR,
-

[PATCH v3 11/16] msix: Omit errp for pci_add_capability

2022-10-26 Thread Akihiko Odaki

Omitting errp for pci_add_capability() causes it to abort if
capabilities overlap. A caller of msix_init(), which calls
pci_add_capability() in turn, is expected to ensure that will not
happen.

Signed-off-by: Akihiko Odaki 
---
 hw/pci/msix.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 1e381a9813..28af83403b 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -311,7 +311,7 @@ int msix_init(struct PCIDevice *dev, unsigned short 
nentries,
   uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos,
   Error **errp)
 {
-int cap;
+uint8_t cap;
 unsigned table_size, pba_size;
 uint8_t *config;
 
@@ -340,11 +340,7 @@ int msix_init(struct PCIDevice *dev, unsigned short 
nentries,
 return -EINVAL;
 }
 
-cap = pci_add_capability(dev, PCI_CAP_ID_MSIX,
-  cap_pos, MSIX_CAP_LENGTH, errp);
-if (cap < 0) {
-return cap;
-}
+cap = pci_add_capability(dev, PCI_CAP_ID_MSIX, cap_pos, MSIX_CAP_LENGTH);
 
 dev->msix_cap = cap;
 dev->cap_present |= QEMU_PCI_CAP_MSIX;
-- 
2.37.3

[PATCH v3 05/16] eepro100: Omit errp for pci_add_capability

2022-10-26 Thread Akihiko Odaki

Omitting errp for pci_add_capability() causes it to abort if
capabilities overlap. This behavior is appropriate heare because all of
the capabilities set in this device are defined in the program and
their overlap should not happen unless there is a programming error.

Signed-off-by: Akihiko Odaki 
---
 hw/net/eepro100.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index 679f52f80f..bf2ecdded9 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -549,12 +549,7 @@ static void e100_pci_reset(EEPRO100State *s, Error **errp)
 if (info->power_management) {
 /* Power Management Capabilities */
 int cfg_offset = 0xdc;
-int r = pci_add_capability(>dev, PCI_CAP_ID_PM,
-   cfg_offset, PCI_PM_SIZEOF,
-   errp);
-if (r < 0) {
-return;
-}
+pci_add_capability(>dev, PCI_CAP_ID_PM, cfg_offset, PCI_PM_SIZEOF);
 
 pci_set_word(pci_conf + cfg_offset + PCI_PM_PMC, 0x7e21);
 #if 0 /* TODO: replace dummy code for power management emulation. */
-- 
2.37.3

[PATCH v3 10/16] pci/shpc: Omit errp for pci_add_capability

2022-10-26 Thread Akihiko Odaki

Omitting errp for pci_add_capability() causes it to abort if
capabilities overlap. A caller of shpc_init(), which calls
pci_add_capability() in turn, is expected to ensure that will not
happen.

Signed-off-by: Akihiko Odaki 
---
 hw/pci-bridge/pci_bridge_dev.c  |  2 +-
 hw/pci-bridge/pcie_pci_bridge.c |  2 +-
 hw/pci/shpc.c   | 23 ++-
 include/hw/pci/shpc.h   |  3 +--
 4 files changed, 9 insertions(+), 21 deletions(-)

diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
index 657a06ddbe..4b6d1876eb 100644
--- a/hw/pci-bridge/pci_bridge_dev.c
+++ b/hw/pci-bridge/pci_bridge_dev.c
@@ -66,7 +66,7 @@ static void pci_bridge_dev_realize(PCIDevice *dev, Error 
**errp)
 dev->config[PCI_INTERRUPT_PIN] = 0x1;
 memory_region_init(_dev->bar, OBJECT(dev), "shpc-bar",
shpc_bar_size(dev));
-err = shpc_init(dev, >sec_bus, _dev->bar, 0, errp);
+err = shpc_init(dev, >sec_bus, _dev->bar, 0);
 if (err) {
 goto shpc_error;
 }
diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c
index df5dfdd139..99778e3e24 100644
--- a/hw/pci-bridge/pcie_pci_bridge.c
+++ b/hw/pci-bridge/pcie_pci_bridge.c
@@ -42,7 +42,7 @@ static void pcie_pci_bridge_realize(PCIDevice *d, Error 
**errp)
 d->config[PCI_INTERRUPT_PIN] = 0x1;
 memory_region_init(_br->shpc_bar, OBJECT(d), "shpc-bar",
shpc_bar_size(d));
-rc = shpc_init(d, >sec_bus, _br->shpc_bar, 0, errp);
+rc = shpc_init(d, >sec_bus, _br->shpc_bar, 0);
 if (rc) {
 goto error;
 }
diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
index e71f3a7483..5b3228c793 100644
--- a/hw/pci/shpc.c
+++ b/hw/pci/shpc.c
@@ -440,16 +440,11 @@ static void shpc_cap_update_dword(PCIDevice *d)
 }
 
 /* Add SHPC capability to the config space for the device. */
-static int shpc_cap_add_config(PCIDevice *d, Error **errp)
+static void shpc_cap_add_config(PCIDevice *d)
 {
 uint8_t *config;
-int config_offset;
-config_offset = pci_add_capability(d, PCI_CAP_ID_SHPC,
-   0, SHPC_CAP_LENGTH,
-   errp);
-if (config_offset < 0) {
-return config_offset;
-}
+uint8_t config_offset;
+config_offset = pci_add_capability(d, PCI_CAP_ID_SHPC, 0, SHPC_CAP_LENGTH);
 config = d->config + config_offset;
 
 pci_set_byte(config + SHPC_CAP_DWORD_SELECT, 0);
@@ -459,7 +454,6 @@ static int shpc_cap_add_config(PCIDevice *d, Error **errp)
 /* Make dword select and data writable. */
 pci_set_byte(d->wmask + config_offset + SHPC_CAP_DWORD_SELECT, 0xff);
 pci_set_long(d->wmask + config_offset + SHPC_CAP_DWORD_DATA, 0x);
-return 0;
 }
 
 static uint64_t shpc_mmio_read(void *opaque, hwaddr addr,
@@ -584,18 +578,13 @@ void shpc_device_unplug_request_cb(HotplugHandler 
*hotplug_dev,
 }
 
 /* Initialize the SHPC structure in bridge's BAR. */
-int shpc_init(PCIDevice *d, PCIBus *sec_bus, MemoryRegion *bar,
-  unsigned offset, Error **errp)
+int shpc_init(PCIDevice *d, PCIBus *sec_bus, MemoryRegion *bar, unsigned 
offset)
 {
-int i, ret;
+int i;
 int nslots = SHPC_MAX_SLOTS; /* TODO: qdev property? */
 SHPCDevice *shpc = d->shpc = g_malloc0(sizeof(*d->shpc));
 shpc->sec_bus = sec_bus;
-ret = shpc_cap_add_config(d, errp);
-if (ret) {
-g_free(d->shpc);
-return ret;
-}
+shpc_cap_add_config(d);
 if (nslots < SHPC_MIN_SLOTS) {
 return 0;
 }
diff --git a/include/hw/pci/shpc.h b/include/hw/pci/shpc.h
index d5683b7399..18ab16ec9f 100644
--- a/include/hw/pci/shpc.h
+++ b/include/hw/pci/shpc.h
@@ -38,8 +38,7 @@ struct SHPCDevice {
 
 void shpc_reset(PCIDevice *d);
 int shpc_bar_size(PCIDevice *dev);
-int shpc_init(PCIDevice *dev, PCIBus *sec_bus, MemoryRegion *bar,
-  unsigned off, Error **errp);
+int shpc_init(PCIDevice *dev, PCIBus *sec_bus, MemoryRegion *bar, unsigned 
off);
 void shpc_cleanup(PCIDevice *dev, MemoryRegion *bar);
 void shpc_free(PCIDevice *dev);
 void shpc_cap_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int len);
-- 
2.37.3

[PATCH v3 03/16] ahci: Omit errp for pci_add_capability

2022-10-26 Thread Akihiko Odaki

Omitting errp for pci_add_capability() causes it to abort if
capabilities overlap. This behavior is appropriate heare because all of
the capabilities set in this device are defined in the program and
their overlap should not happen unless there is a programming error.

Signed-off-by: Akihiko Odaki 
---
 hw/ide/ich.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/hw/ide/ich.c b/hw/ide/ich.c
index 1007a51fcb..3b478b01f8 100644
--- a/hw/ide/ich.c
+++ b/hw/ide/ich.c
@@ -106,7 +106,7 @@ static void pci_ich9_ahci_init(Object *obj)
 static void pci_ich9_ahci_realize(PCIDevice *dev, Error **errp)
 {
 struct AHCIPCIState *d;
-int sata_cap_offset;
+uint8_t sata_cap_offset;
 uint8_t *sata_cap;
 d = ICH9_AHCI(dev);
 int ret;
@@ -130,11 +130,7 @@ static void pci_ich9_ahci_realize(PCIDevice *dev, Error 
**errp)
  >ahci.mem);
 
 sata_cap_offset = pci_add_capability(dev, PCI_CAP_ID_SATA,
-  ICH9_SATA_CAP_OFFSET, SATA_CAP_SIZE,
-  errp);
-if (sata_cap_offset < 0) {
-return;
-}
+  ICH9_SATA_CAP_OFFSET, SATA_CAP_SIZE);
 
 sata_cap = dev->config + sata_cap_offset;
 pci_set_word(sata_cap + SATA_CAP_REV, 0x10);
-- 
2.37.3

[PATCH v3 16/16] pci: Remove legacy errp from pci_add_capability

2022-10-26 Thread Akihiko Odaki

Signed-off-by: Akihiko Odaki 
---
 hw/pci/pci.c | 29 +++--
 include/hw/pci/pci.h | 12 ++--
 2 files changed, 9 insertions(+), 32 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 8ee2171011..8ff71e4553 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2513,38 +2513,23 @@ static void pci_del_option_rom(PCIDevice *pdev)
 }
 
 /*
- * On success, pci_add_capability_legacy() returns a positive value
- * that the offset of the pci capability.
- * On failure, it sets an error and returns a negative error
- * code.
+ * pci_add_capability() returns a positive value that the offset of the pci
+ * capability.
  */
-int pci_add_capability_legacy(PCIDevice *pdev, uint8_t cap_id,
-  uint8_t offset, uint8_t size,
-  Error **errp)
+uint8_t pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
+   uint8_t offset, uint8_t size)
 {
 uint8_t *config;
-int i, overlapping_cap;
+int i;
 
 if (!offset) {
 offset = pci_find_space(pdev, size);
 /* out of PCI config space is programming error */
 assert(offset);
 } else {
-/* Verify that capabilities don't overlap.  Note: device assignment
- * depends on this check to verify that the device is not broken.
- * Should never trigger for emulated devices, but it's helpful
- * for debugging these. */
+/* Verify that capabilities don't overlap. */
 for (i = offset; i < offset + size; i++) {
-overlapping_cap = pci_find_capability_at_offset(pdev, i);
-if (overlapping_cap) {
-error_setg(errp, "%s:%02x:%02x.%x "
-   "Attempt to add PCI capability %x at offset "
-   "%x overlaps existing capability %x at offset %x",
-   pci_root_bus_path(pdev), pci_dev_bus_num(pdev),
-   PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
-   cap_id, offset, overlapping_cap, i);
-return -EINVAL;
-}
+assert(!pci_find_capability_at_offset(pdev, i));
 }
 }
 
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 51fd106f16..2a5d4b329f 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -2,7 +2,6 @@
 #define QEMU_PCI_H
 
 #include "exec/memory.h"
-#include "qapi/error.h"
 #include "sysemu/dma.h"
 
 /* PCI includes legacy ISA access.  */
@@ -391,15 +390,8 @@ void pci_register_vga(PCIDevice *pci_dev, MemoryRegion 
*mem,
 void pci_unregister_vga(PCIDevice *pci_dev);
 pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num);
 
-int pci_add_capability_legacy(PCIDevice *pdev, uint8_t cap_id,
-  uint8_t offset, uint8_t size,
-  Error **errp);
-
-#define PCI_ADD_CAPABILITY_VA(pdev, cap_id, offset, size, errp, ...) \
-pci_add_capability_legacy(pdev, cap_id, offset, size, errp)
-
-#define pci_add_capability(...) \
-PCI_ADD_CAPABILITY_VA(__VA_ARGS__, _abort)
+uint8_t pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
+   uint8_t offset, uint8_t size);
 
 void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);
 
-- 
2.37.3

[PATCH v3 15/16] virtio-pci: Omit errp for pci_add_capability

2022-10-26 Thread Akihiko Odaki

Omitting errp for pci_add_capability() causes it to abort if
capabilities overlap. This behavior is appropriate heare because all of
the capabilities set in this device are defined in the program and
their overlap should not happen unless there is a programming error.

Signed-off-by: Akihiko Odaki 
---
 hw/virtio/virtio-pci.c | 9 ++---
 include/hw/virtio/virtio-pci.h | 2 +-
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index c37bdc77ea..b393ff01be 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1154,8 +1154,7 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
 PCIDevice *dev = >pci_dev;
 int offset;
 
-offset = pci_add_capability(dev, PCI_CAP_ID_VNDR, 0,
-cap->cap_len, _abort);
+offset = pci_add_capability(dev, PCI_CAP_ID_VNDR, 0, cap->cap_len);
 
 assert(cap->cap_len >= sizeof *cap);
 memcpy(dev->config + offset + PCI_CAP_FLAGS, >cap_len,
@@ -1864,11 +1863,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 
 pcie_endpoint_cap_init(pci_dev, 0);
 
-pos = pci_add_capability(pci_dev, PCI_CAP_ID_PM, 0,
- PCI_PM_SIZEOF, errp);
-if (pos < 0) {
-return;
-}
+pos = pci_add_capability(pci_dev, PCI_CAP_ID_PM, 0, PCI_PM_SIZEOF);
 
 pci_dev->exp.pm_cap = pos;
 
diff --git a/include/hw/virtio/virtio-pci.h b/include/hw/virtio/virtio-pci.h
index 2446dcd9ae..9f3736723c 100644
--- a/include/hw/virtio/virtio-pci.h
+++ b/include/hw/virtio/virtio-pci.h
@@ -141,7 +141,7 @@ struct VirtIOPCIProxy {
 uint32_t msix_bar_idx;
 uint32_t modern_io_bar_idx;
 uint32_t modern_mem_bar_idx;
-int config_cap;
+uint8_t config_cap;
 uint32_t flags;
 bool disable_modern;
 bool ignore_backend_features;
-- 
2.37.3

[PATCH v3 09/16] pcie: Omit errp for pci_add_capability

2022-10-26 Thread Akihiko Odaki

Omitting errp for pci_add_capability() causes it to abort if
capabilities overlap. A caller of a PCIe function which calls
pci_add_capability() in turn is expected to ensure that will not
happen.

Signed-off-by: Akihiko Odaki 
Acked-by: Jonathan Cameron  (for CXL parts)
---
 docs/pcie_sriov.txt|  4 +--
 hw/display/bochs-display.c |  4 +--
 hw/net/e1000e.c|  4 +--
 hw/pci-bridge/cxl_downstream.c |  9 ++
 hw/pci-bridge/cxl_upstream.c   |  8 ++---
 hw/pci-bridge/pcie_pci_bridge.c|  6 +---
 hw/pci-bridge/pcie_root_port.c |  9 +-
 hw/pci-bridge/xio3130_downstream.c |  7 +---
 hw/pci-bridge/xio3130_upstream.c   |  7 +---
 hw/pci-host/designware.c   |  3 +-
 hw/pci-host/xilinx-pcie.c  |  4 +--
 hw/pci/pcie.c  | 52 --
 hw/usb/hcd-xhci-pci.c  |  3 +-
 hw/virtio/virtio-pci.c |  3 +-
 include/hw/pci/pcie.h  | 11 +++
 15 files changed, 35 insertions(+), 99 deletions(-)

diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
index 11158dbf88..728a73ba7b 100644
--- a/docs/pcie_sriov.txt
+++ b/docs/pcie_sriov.txt
@@ -49,7 +49,7 @@ setting up a BAR for a VF.
pci_your_pf_dev_realize( ... )
{
   ...
-  int ret = pcie_endpoint_cap_init(d, 0x70);
+  pcie_endpoint_cap_init(d, 0x70);
   ...
   pcie_ari_init(d, 0x100, 1);
   ...
@@ -79,7 +79,7 @@ setting up a BAR for a VF.
pci_your_vf_dev_realize( ... )
{
   ...
-  int ret = pcie_endpoint_cap_init(d, 0x60);
+  pcie_endpoint_cap_init(d, 0x60);
   ...
   pcie_ari_init(d, 0x100, 1);
   ...
diff --git a/hw/display/bochs-display.c b/hw/display/bochs-display.c
index 8ed734b195..111cabcfb3 100644
--- a/hw/display/bochs-display.c
+++ b/hw/display/bochs-display.c
@@ -265,7 +265,6 @@ static void bochs_display_realize(PCIDevice *dev, Error 
**errp)
 {
 BochsDisplayState *s = BOCHS_DISPLAY(dev);
 Object *obj = OBJECT(dev);
-int ret;
 
 if (s->vgamem < 4 * MiB) {
 error_setg(errp, "bochs-display: video memory too small");
@@ -302,8 +301,7 @@ static void bochs_display_realize(PCIDevice *dev, Error 
**errp)
 }
 
 if (pci_bus_is_express(pci_get_bus(dev))) {
-ret = pcie_endpoint_cap_init(dev, 0x80);
-assert(ret > 0);
+pcie_endpoint_cap_init(dev, 0x80);
 } else {
 dev->cap_present &= ~QEMU_PCI_CAP_EXPRESS;
 }
diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index e433b8f9a5..aea4305c43 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -462,9 +462,7 @@ static void e1000e_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 
 e1000e_init_msix(s);
 
-if (pcie_endpoint_cap_v1_init(pci_dev, e1000e_pcie_offset) < 0) {
-hw_error("Failed to initialize PCIe capability");
-}
+pcie_endpoint_cap_v1_init(pci_dev, e1000e_pcie_offset);
 
 ret = msi_init(PCI_DEVICE(s), 0xD0, 1, true, false, NULL);
 if (ret) {
diff --git a/hw/pci-bridge/cxl_downstream.c b/hw/pci-bridge/cxl_downstream.c
index a361e519d0..1980dd9c6c 100644
--- a/hw/pci-bridge/cxl_downstream.c
+++ b/hw/pci-bridge/cxl_downstream.c
@@ -155,12 +155,8 @@ static void cxl_dsp_realize(PCIDevice *d, Error **errp)
 goto err_bridge;
 }
 
-rc = pcie_cap_init(d, CXL_DOWNSTREAM_PORT_EXP_OFFSET,
-   PCI_EXP_TYPE_DOWNSTREAM, p->port,
-   errp);
-if (rc < 0) {
-goto err_msi;
-}
+pcie_cap_init(d, CXL_DOWNSTREAM_PORT_EXP_OFFSET,
+  PCI_EXP_TYPE_DOWNSTREAM, p->port);
 
 pcie_cap_flr_init(d);
 pcie_cap_deverr_init(d);
@@ -195,7 +191,6 @@ static void cxl_dsp_realize(PCIDevice *d, Error **errp)
 pcie_chassis_del_slot(s);
  err_pcie_cap:
 pcie_cap_exit(d);
- err_msi:
 msi_uninit(d);
  err_bridge:
 pci_bridge_exitfn(d);
diff --git a/hw/pci-bridge/cxl_upstream.c b/hw/pci-bridge/cxl_upstream.c
index a83a3e81e4..26f27ba681 100644
--- a/hw/pci-bridge/cxl_upstream.c
+++ b/hw/pci-bridge/cxl_upstream.c
@@ -138,11 +138,8 @@ static void cxl_usp_realize(PCIDevice *d, Error **errp)
 goto err_bridge;
 }
 
-rc = pcie_cap_init(d, CXL_UPSTREAM_PORT_PCIE_CAP_OFFSET,
-   PCI_EXP_TYPE_UPSTREAM, p->port, errp);
-if (rc < 0) {
-goto err_msi;
-}
+pcie_cap_init(d, CXL_UPSTREAM_PORT_PCIE_CAP_OFFSET,
+  PCI_EXP_TYPE_UPSTREAM, p->port);
 
 pcie_cap_flr_init(d);
 pcie_cap_deverr_init(d);
@@ -165,7 +162,6 @@ static void cxl_usp_realize(PCIDevice *d, Error **errp)
 
 err_cap:
 pcie_cap_exit(d);
-err_msi:
 msi_uninit(d);
 err_bridge:
 pci_bridge_exitfn(d);
diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c
index 1cd917a459..df5dfdd139 100644
--- a/hw/pci-bridge/pcie_pci_bridge.c
+++ b/hw/pci-bridge/pcie_pci_bridge.c
@@ -47,10 +47,7 @@ static void pcie_pci_bridge_realize(PCIDevice *d, Error 
**errp)
 goto error;
 }
 
-rc =

[PATCH v3 07/16] msi: Omit errp for pci_add_capability

2022-10-26 Thread Akihiko Odaki

Omitting errp for pci_add_capability() causes it to abort if
capabilities overlap. A caller of msi_init(), which calls
pci_add_capability() in turn, is expected to ensure that will not
happen.

Signed-off-by: Akihiko Odaki 
---
 hw/pci/msi.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/hw/pci/msi.c b/hw/pci/msi.c
index 058d1d1ef1..5283a08b5a 100644
--- a/hw/pci/msi.c
+++ b/hw/pci/msi.c
@@ -194,7 +194,6 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
 unsigned int vectors_order;
 uint16_t flags;
 uint8_t cap_size;
-int config_offset;
 
 if (!msi_nonbroken) {
 error_setg(errp, "MSI is not supported by interrupt controller");
@@ -221,13 +220,7 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
 }
 
 cap_size = msi_cap_sizeof(flags);
-config_offset = pci_add_capability(dev, PCI_CAP_ID_MSI, offset,
-cap_size, errp);
-if (config_offset < 0) {
-return config_offset;
-}
-
-dev->msi_cap = config_offset;
+dev->msi_cap = pci_add_capability(dev, PCI_CAP_ID_MSI, offset, cap_size);
 dev->cap_present |= QEMU_PCI_CAP_MSI;
 
 pci_set_word(dev->config + msi_flags_off(dev), flags);
-- 
2.37.3

[PATCH v3 06/16] hw/nvme: Omit errp for pci_add_capability

2022-10-26 Thread Akihiko Odaki

Omitting errp for pci_add_capability() causes it to abort if
capabilities overlap. This behavior is appropriate heare because all of
the capabilities set in this device are defined in the program and
their overlap should not happen unless there is a programming error.

Signed-off-by: Akihiko Odaki 
---
 hw/nvme/ctrl.c | 14 ++
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 87aeba0564..ff4e2beea6 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -7325,17 +7325,9 @@ static void nvme_init_sriov(NvmeCtrl *n, PCIDevice 
*pci_dev, uint16_t offset)
   PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size);
 }
 
-static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset)
+static void nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset)
 {
-Error *err = NULL;
-int ret;
-
-ret = pci_add_capability(pci_dev, PCI_CAP_ID_PM, offset,
- PCI_PM_SIZEOF, );
-if (err) {
-error_report_err(err);
-return ret;
-}
+pci_add_capability(pci_dev, PCI_CAP_ID_PM, offset, PCI_PM_SIZEOF);
 
 pci_set_word(pci_dev->config + offset + PCI_PM_PMC,
  PCI_PM_CAP_VER_1_2);
@@ -7343,8 +7335,6 @@ static int nvme_add_pm_capability(PCIDevice *pci_dev, 
uint8_t offset)
  PCI_PM_CTRL_NO_SOFT_RESET);
 pci_set_word(pci_dev->wmask + offset + PCI_PM_CTRL,
  PCI_PM_CTRL_STATE_MASK);
-
-return 0;
 }
 
 static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
-- 
2.37.3

[PATCH v3 14/16] hw/vfio/pci: Omit errp for pci_add_capability

2022-10-26 Thread Akihiko Odaki

The code generating errors in pci_add_capability has a comment which
says:
> Verify that capabilities don't overlap.  Note: device assignment
> depends on this check to verify that the device is not broken.
> Should never trigger for emulated devices, but it's helpful for
> debugging these.

Indeed vfio has some code that passes capability offsets and sizes from
a physical device, but it explicitly pays attention so that the
capabilities never overlap. Therefore, in pci_add_capability(), we can
always assert that capabilities never overlap, and that is what happens
when omitting errp.

Signed-off-by: Akihiko Odaki 
---
 hw/vfio/pci-quirks.c | 15 +++
 hw/vfio/pci.c| 14 +-
 2 files changed, 8 insertions(+), 21 deletions(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index f0147a050a..e94fd273ea 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1530,7 +1530,7 @@ const PropertyInfo qdev_prop_nv_gpudirect_clique = {
 static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp)
 {
 PCIDevice *pdev = >pdev;
-int ret, pos = 0xC8;
+int pos = 0xC8;
 
 if (vdev->nv_gpudirect_clique == 0xFF) {
 return 0;
@@ -1547,11 +1547,7 @@ static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice 
*vdev, Error **errp)
 return -EINVAL;
 }
 
-ret = pci_add_capability(pdev, PCI_CAP_ID_VNDR, pos, 8, errp);
-if (ret < 0) {
-error_prepend(errp, "Failed to add NVIDIA GPUDirect cap: ");
-return ret;
-}
+pci_add_capability(pdev, PCI_CAP_ID_VNDR, pos, 8);
 
 memset(vdev->emulated_config_bits + pos, 0xFF, 8);
 pos += PCI_CAP_FLAGS;
@@ -1718,12 +1714,7 @@ static int vfio_add_vmd_shadow_cap(VFIOPCIDevice *vdev, 
Error **errp)
 return -EFAULT;
 }
 
-ret = pci_add_capability(>pdev, PCI_CAP_ID_VNDR, pos,
- VMD_SHADOW_CAP_LEN, errp);
-if (ret < 0) {
-error_prepend(errp, "Failed to add VMD MEMBAR Shadow cap: ");
-return ret;
-}
+pci_add_capability(>pdev, PCI_CAP_ID_VNDR, pos, VMD_SHADOW_CAP_LEN);
 
 memset(vdev->emulated_config_bits + pos, 0xFF, VMD_SHADOW_CAP_LEN);
 pos += PCI_CAP_FLAGS;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 939dcc3d4a..2b653d01e3 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1826,7 +1826,7 @@ static void vfio_add_emulated_long(VFIOPCIDevice *vdev, 
int pos,
 vfio_set_long_bits(vdev->emulated_config_bits + pos, mask, mask);
 }
 
-static int vfio_setup_pcie_cap(VFIOPCIDevice *vdev, int pos, uint8_t size,
+static int vfio_setup_pcie_cap(VFIOPCIDevice *vdev, uint8_t pos, uint8_t size,
Error **errp)
 {
 uint16_t flags;
@@ -1943,11 +1943,7 @@ static int vfio_setup_pcie_cap(VFIOPCIDevice *vdev, int 
pos, uint8_t size,
1, PCI_EXP_FLAGS_VERS);
 }
 
-pos = pci_add_capability(>pdev, PCI_CAP_ID_EXP, pos, size,
- errp);
-if (pos < 0) {
-return pos;
-}
+pos = pci_add_capability(>pdev, PCI_CAP_ID_EXP, pos, size);
 
 vdev->pdev.exp.exp_cap = pos;
 
@@ -2045,14 +2041,14 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, 
uint8_t pos, Error **errp)
 case PCI_CAP_ID_PM:
 vfio_check_pm_reset(vdev, pos);
 vdev->pm_cap = pos;
-ret = pci_add_capability(pdev, cap_id, pos, size, errp);
+pci_add_capability(pdev, cap_id, pos, size);
 break;
 case PCI_CAP_ID_AF:
 vfio_check_af_flr(vdev, pos);
-ret = pci_add_capability(pdev, cap_id, pos, size, errp);
+pci_add_capability(pdev, cap_id, pos, size);
 break;
 default:
-ret = pci_add_capability(pdev, cap_id, pos, size, errp);
+pci_add_capability(pdev, cap_id, pos, size);
 break;
 }
 
-- 
2.37.3

[PATCH v3 13/16] hw/pci-bridge/pcie_pci_bridge: Omit errp for pci_add_capability

2022-10-26 Thread Akihiko Odaki

Omitting errp for pci_add_capability() causes it to abort if
capabilities overlap. This behavior is appropriate heare because all of
the capabilities set in this device are defined in the program and
their overlap should not happen unless there is a programming error.

Signed-off-by: Akihiko Odaki 
---
 hw/pci-bridge/pcie_pci_bridge.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c
index 99778e3e24..1b839465e7 100644
--- a/hw/pci-bridge/pcie_pci_bridge.c
+++ b/hw/pci-bridge/pcie_pci_bridge.c
@@ -35,7 +35,7 @@ static void pcie_pci_bridge_realize(PCIDevice *d, Error 
**errp)
 {
 PCIBridge *br = PCI_BRIDGE(d);
 PCIEPCIBridge *pcie_br = PCIE_PCI_BRIDGE_DEV(d);
-int rc, pos;
+int rc;
 
 pci_bridge_initfn(d, TYPE_PCI_BUS);
 
@@ -49,12 +49,8 @@ static void pcie_pci_bridge_realize(PCIDevice *d, Error 
**errp)
 
 pcie_cap_init(d, 0, PCI_EXP_TYPE_PCI_BRIDGE, 0);
 
-pos = pci_add_capability(d, PCI_CAP_ID_PM, 0, PCI_PM_SIZEOF, errp);
-if (pos < 0) {
-goto pm_error;
-}
-d->exp.pm_cap = pos;
-pci_set_word(d->config + pos + PCI_PM_PMC, 0x3);
+d->exp.pm_cap = pci_add_capability(d, PCI_CAP_ID_PM, 0, PCI_PM_SIZEOF);
+pci_set_word(d->config + d->exp.pm_cap + PCI_PM_PMC, 0x3);
 
 pcie_cap_arifwd_init(d);
 pcie_cap_deverr_init(d);
@@ -85,7 +81,6 @@ static void pcie_pci_bridge_realize(PCIDevice *d, Error 
**errp)
 msi_error:
 pcie_aer_exit(d);
 aer_error:
-pm_error:
 pcie_cap_exit(d);
 shpc_cleanup(d, _br->shpc_bar);
 error:
-- 
2.37.3

[PATCH v3 12/16] pci/slotid: Omit errp for pci_add_capability

2022-10-26 Thread Akihiko Odaki

Omitting errp for pci_add_capability() causes it to abort if
capabilities overlap. A caller of slotid_cap_init(), which calls
pci_add_capability() in turn, is expected to ensure that will not
happen.

Signed-off-by: Akihiko Odaki 
---
 hw/pci/slotid_cap.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/hw/pci/slotid_cap.c b/hw/pci/slotid_cap.c
index 36d021b4a6..5da8c82133 100644
--- a/hw/pci/slotid_cap.c
+++ b/hw/pci/slotid_cap.c
@@ -12,7 +12,7 @@ int slotid_cap_init(PCIDevice *d, int nslots,
 unsigned offset,
 Error **errp)
 {
-int cap;
+uint8_t cap;
 
 if (!chassis) {
 error_setg(errp, "Bridge chassis not specified. Each bridge is 
required"
@@ -24,11 +24,7 @@ int slotid_cap_init(PCIDevice *d, int nslots,
 return -EINVAL;
 }
 
-cap = pci_add_capability(d, PCI_CAP_ID_SLOTID, offset,
- SLOTID_CAP_LENGTH, errp);
-if (cap < 0) {
-return cap;
-}
+cap = pci_add_capability(d, PCI_CAP_ID_SLOTID, offset, SLOTID_CAP_LENGTH);
 /* We make each chassis unique, this way each bridge is First in Chassis */
 d->config[cap + PCI_SID_ESR] = PCI_SID_ESR_FIC |
 (nslots << SLOTID_NSLOTS_SHIFT);
-- 
2.37.3

[PATCH v3 04/16] e1000e: Omit errp for pci_add_capability

2022-10-26 Thread Akihiko Odaki

Omitting errp for pci_add_capability() causes it to abort if
capabilities overlap. This behavior is appropriate heare because all of
the capabilities set in this device are defined in the program and
their overlap should not happen unless there is a programming error.

Signed-off-by: Akihiko Odaki 
---
 hw/net/e1000e.c | 18 +++---
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index ac96f7665a..e433b8f9a5 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -377,17 +377,10 @@ e1000e_gen_dsn(uint8_t *mac)
(uint64_t)(mac[0])  << 56;
 }
 
-static int
+static void
 e1000e_add_pm_capability(PCIDevice *pdev, uint8_t offset, uint16_t pmc)
 {
-Error *local_err = NULL;
-int ret = pci_add_capability(pdev, PCI_CAP_ID_PM, offset,
- PCI_PM_SIZEOF, _err);
-
-if (local_err) {
-error_report_err(local_err);
-return ret;
-}
+pci_add_capability(pdev, PCI_CAP_ID_PM, offset, PCI_PM_SIZEOF);
 
 pci_set_word(pdev->config + offset + PCI_PM_PMC,
  PCI_PM_CAP_VER_1_1 |
@@ -400,8 +393,6 @@ e1000e_add_pm_capability(PCIDevice *pdev, uint8_t offset, 
uint16_t pmc)
 
 pci_set_word(pdev->w1cmask + offset + PCI_PM_CTRL,
  PCI_PM_CTRL_PME_STATUS);
-
-return ret;
 }
 
 static void e1000e_write_config(PCIDevice *pci_dev, uint32_t address,
@@ -480,10 +471,7 @@ static void e1000e_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 trace_e1000e_msi_init_fail(ret);
 }
 
-if (e1000e_add_pm_capability(pci_dev, e1000e_pmrb_offset,
-  PCI_PM_CAP_DSI) < 0) {
-hw_error("Failed to initialize PM capability");
-}
+e1000e_add_pm_capability(pci_dev, e1000e_pmrb_offset, PCI_PM_CAP_DSI);
 
 if (pcie_aer_init(pci_dev, PCI_ERR_VER, e1000e_aer_offset,
   PCI_ERR_SIZEOF, NULL) < 0) {
-- 
2.37.3

[PATCH v3 02/16] hw/i386/amd_iommu: Omit errp for pci_add_capability

2022-10-26 Thread Akihiko Odaki

Omitting errp for pci_add_capability() causes it to abort if
capabilities overlap. This behavior is appropriate heare because all of
the capabilities set in this device are defined in the program and
their overlap should not happen unless there is a programming error.

Signed-off-by: Akihiko Odaki 
---
 hw/i386/amd_iommu.c | 21 -
 1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 725f69095b..8a88cbea0a 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1539,7 +1539,6 @@ static void amdvi_sysbus_reset(DeviceState *dev)
 
 static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
 {
-int ret = 0;
 AMDVIState *s = AMD_IOMMU_DEVICE(dev);
 MachineState *ms = MACHINE(qdev_get_machine());
 PCMachineState *pcms = PC_MACHINE(ms);
@@ -1553,23 +1552,11 @@ static void amdvi_sysbus_realize(DeviceState *dev, 
Error **errp)
 if (!qdev_realize(DEVICE(>pci), >qbus, errp)) {
 return;
 }
-ret = pci_add_capability(>pci.dev, AMDVI_CAPAB_ID_SEC, 0,
- AMDVI_CAPAB_SIZE, errp);
-if (ret < 0) {
-return;
-}
-s->capab_offset = ret;
+s->capab_offset = pci_add_capability(>pci.dev, AMDVI_CAPAB_ID_SEC, 0,
+ AMDVI_CAPAB_SIZE);
 
-ret = pci_add_capability(>pci.dev, PCI_CAP_ID_MSI, 0,
- AMDVI_CAPAB_REG_SIZE, errp);
-if (ret < 0) {
-return;
-}
-ret = pci_add_capability(>pci.dev, PCI_CAP_ID_HT, 0,
- AMDVI_CAPAB_REG_SIZE, errp);
-if (ret < 0) {
-return;
-}
+pci_add_capability(>pci.dev, PCI_CAP_ID_MSI, 0, AMDVI_CAPAB_REG_SIZE);
+pci_add_capability(>pci.dev, PCI_CAP_ID_HT, 0, AMDVI_CAPAB_REG_SIZE);
 
 /* Pseudo address space under root PCI bus. */
 x86ms->ioapic_as = amdvi_host_dma_iommu(bus, s, AMDVI_IOAPIC_SB_DEVID);
-- 
2.37.3

[PATCH v3 00/16] pci: Abort if pci_add_capability fails

2022-10-26 Thread Akihiko Odaki

pci_add_capability appears most PCI devices. Its error handling required
lots of code, and led to inconsistent behaviors such as:
- passing error_abort
- passing error_fatal
- asserting the returned value
- propagating the error to the caller
- skipping the rest of the function
- just ignoring

The code generating errors in pci_add_capability had a comment which
says:
> Verify that capabilities don't overlap.  Note: device assignment
> depends on this check to verify that the device is not broken.
> Should never trigger for emulated devices, but it's helpful for
> debugging these.

Indeed vfio has some code that passes capability offsets and sizes from
a physical device, but it explicitly pays attention so that the
capabilities never overlap. Therefore, we can always assert that
capabilities never overlap when pci_add_capability is called, resolving
these inconsistencies.

v3:
- Correct patch split between virtio-pci and pci (Markus Armbruster)
- Add messages for individual patches (Markus Armbruster)
- Acked-by: Jonathan Cameron 

v2: Split the patch (Markus Armbruster)

Akihiko Odaki (16):
  pci: Allow to omit errp for pci_add_capability
  hw/i386/amd_iommu: Omit errp for pci_add_capability
  ahci: Omit errp for pci_add_capability
  e1000e: Omit errp for pci_add_capability
  eepro100: Omit errp for pci_add_capability
  hw/nvme: Omit errp for pci_add_capability
  msi: Omit errp for pci_add_capability
  hw/pci/pci_bridge: Omit errp for pci_add_capability
  pcie: Omit errp for pci_add_capability
  pci/shpc: Omit errp for pci_add_capability
  msix: Omit errp for pci_add_capability
  pci/slotid: Omit errp for pci_add_capability
  hw/pci-bridge/pcie_pci_bridge: Omit errp for pci_add_capability
  hw/vfio/pci: Omit errp for pci_add_capability
  virtio-pci: Omit errp for pci_add_capability
  pci: Remove legacy errp from pci_add_capability

 docs/pcie_sriov.txt|  4 +--
 hw/display/bochs-display.c |  4 +--
 hw/i386/amd_iommu.c| 21 +++-
 hw/ide/ich.c   |  8 ++---
 hw/net/e1000e.c| 22 +++--
 hw/net/eepro100.c  |  7 +---
 hw/nvme/ctrl.c | 14 ++--
 hw/pci-bridge/cxl_downstream.c |  9 ++
 hw/pci-bridge/cxl_upstream.c   |  8 ++---
 hw/pci-bridge/i82801b11.c  | 14 ++--
 hw/pci-bridge/pci_bridge_dev.c |  2 +-
 hw/pci-bridge/pcie_pci_bridge.c| 19 +++
 hw/pci-bridge/pcie_root_port.c | 16 ++---
 hw/pci-bridge/xio3130_downstream.c | 15 ++---
 hw/pci-bridge/xio3130_upstream.c   | 15 ++---
 hw/pci-host/designware.c   |  3 +-
 hw/pci-host/xilinx-pcie.c  |  4 +--
 hw/pci/msi.c   |  9 +-
 hw/pci/msix.c  |  8 ++---
 hw/pci/pci.c   | 29 -
 hw/pci/pci_bridge.c| 21 
 hw/pci/pcie.c  | 52 --
 hw/pci/shpc.c  | 23 -
 hw/pci/slotid_cap.c|  8 ++---
 hw/usb/hcd-xhci-pci.c  |  3 +-
 hw/vfio/pci-quirks.c   | 15 ++---
 hw/vfio/pci.c  | 14 +++-
 hw/virtio/virtio-pci.c | 12 ++-
 include/hw/pci/pci.h   |  5 ++-
 include/hw/pci/pci_bridge.h|  5 ++-
 include/hw/pci/pcie.h  | 11 +++
 include/hw/pci/shpc.h  |  3 +-
 include/hw/virtio/virtio-pci.h |  2 +-
 33 files changed, 99 insertions(+), 306 deletions(-)

-- 
2.37.3

[PATCH v3 01/16] pci: Allow to omit errp for pci_add_capability

2022-10-26 Thread Akihiko Odaki

pci_add_capability appears most PCI devices. Its error handling required
lots of code, and led to inconsistent behaviors such as:
- passing error_abort
- passing error_fatal
- asserting the returned value
- propagating the error to the caller
- skipping the rest of the function
- just ignoring

The code generating errors in pci_add_capability had a comment which
says:
> Verify that capabilities don't overlap.  Note: device assignment
> depends on this check to verify that the device is not broken.
> Should never trigger for emulated devices, but it's helpful for
> debugging these.

Indeed vfio has some code that passes capability offsets and sizes from
a physical device, but it explicitly pays attention so that the
capabilities never overlap. Therefore, we can always assert that
capabilities never overlap when pci_add_capability is called, resolving
these inconsistencies.

Such an implementation of pci_add_capability will not have errp
parameter. However, there are so many callers of pci_add_capability
that it does not make sense to amend all of them at once to match
with the new signature. Instead, this change will allow callers of
pci_add_capability to omit errp as the first step.

Signed-off-by: Akihiko Odaki 
---
 hw/pci/pci.c |  8 
 include/hw/pci/pci.h | 13 ++---
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 2f450f6a72..8ee2171011 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2513,14 +2513,14 @@ static void pci_del_option_rom(PCIDevice *pdev)
 }
 
 /*
- * On success, pci_add_capability() returns a positive value
+ * On success, pci_add_capability_legacy() returns a positive value
  * that the offset of the pci capability.
  * On failure, it sets an error and returns a negative error
  * code.
  */
-int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
-   uint8_t offset, uint8_t size,
-   Error **errp)
+int pci_add_capability_legacy(PCIDevice *pdev, uint8_t cap_id,
+  uint8_t offset, uint8_t size,
+  Error **errp)
 {
 uint8_t *config;
 int i, overlapping_cap;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index b54b6ef88f..51fd106f16 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -2,6 +2,7 @@
 #define QEMU_PCI_H
 
 #include "exec/memory.h"
+#include "qapi/error.h"
 #include "sysemu/dma.h"
 
 /* PCI includes legacy ISA access.  */
@@ -390,9 +391,15 @@ void pci_register_vga(PCIDevice *pci_dev, MemoryRegion 
*mem,
 void pci_unregister_vga(PCIDevice *pci_dev);
 pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num);
 
-int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
-   uint8_t offset, uint8_t size,
-   Error **errp);
+int pci_add_capability_legacy(PCIDevice *pdev, uint8_t cap_id,
+  uint8_t offset, uint8_t size,
+  Error **errp);
+
+#define PCI_ADD_CAPABILITY_VA(pdev, cap_id, offset, size, errp, ...) \
+pci_add_capability_legacy(pdev, cap_id, offset, size, errp)
+
+#define pci_add_capability(...) \
+PCI_ADD_CAPABILITY_VA(__VA_ARGS__, _abort)
 
 void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);
 
-- 
2.37.3

Re: [PATCH v4 0/7] ppc/e500: Add support for two types of flash, cleanup

2022-10-26 Thread B




Am 26. Oktober 2022 17:18:14 UTC schrieb Daniel Henrique Barboza 
:
>Hi,
>
>Since this is being sent to qemu-ppc and has to do with e500 I decided to
>take a look. I acked the e500 related patches, 5 and 7. Patch 6 LGTM as well
>but I'd rather not ack it it's SD specific code.
>
>I'll send a PowerPC pull request this week. I can grab this series via the ppc
>tree if someone with SD authority acks patch 6.

This would be awesome. If we can't reach consensus on the eSDHC device until 
then perhaps you could pull everything but the last two patches?

Thanks Daniel for generally absorbing any patches floating around which look 
remotely useful for the ppc tree. This is rewarding.

Best regards,
Bernhard
>
>
>Thanks,
>
>
>Daniel
>
>
>
>
>
>On 10/18/22 18:01, Bernhard Beschow wrote:
>> Cover letter:
>> ~
>> 
>> This series adds support for -pflash and direct SD card access to the
>> PPC e500 boards. The idea is to increase compatibility with "real" firmware
>> images where only the bare minimum of drivers is compiled in.
>> 
>> The series is structured as follows:
>> 
>> Patches 1-4 perform some general cleanup which paves the way for the rest of
>> the series.
>> 
>> Patch 5 adds -pflash handling where memory-mapped flash can be added on
>> user's behalf. That is, the flash memory region in the eLBC is only added if
>> the -pflash argument is supplied. Note that the cfi01 device model becomes
>> stricter in checking the size of the emulated flash space.
>> 
>> Patches 6 and 7 add a new device model - the Freescale eSDHC - to the e500
>> boards which was missing so far.
>> 
>> User documentation is also added as the new features become available.
>> 
>> Tesing done:
>> * `qeu-system-ppc -M ppce500 -cpu e500mc -m 256 -kernel uImage -append
>> "console=ttyS0 rootwait root=/dev/mtdblock0 nokaslr" -drive
>> if=pflash,file=rootfs.ext2,format=raw`
>> * `qemu-system-ppc -M ppce500 -cpu e500mc -m 256 -kernel uImage -append
>> "console=ttyS0 rootwait root=/dev/mmcblk0" -device sd-card,drive=mydrive 
>> -drive
>> id=mydrive,if=none,file=rootfs.ext2,format=raw`
>> 
>> The load was created using latest Buildroot with `make
>> qemu_ppc_e500mc_defconfig` where the rootfs was configured to be of ext2 
>> type.
>> In both cases it was possible to log in and explore the root file system.
>> 
>> v4:
>> ~~~
>> Zoltan:
>> - Don't suggest presence of qemu-system-ppc32 in documentation
>> 
>> Bin:
>> - New patch: docs/system/ppc/ppce500: Use qemu-system-ppc64 across the 
>> board(s)
>> 
>> Peter:
>> - Inline pflash_cfi01_register() rather than modify it (similar to v2)
>> 
>> v3:
>> ~~~
>> Phil:
>> - Also add power-of-2 fix to pflash_cfi02
>> - Resolve cfi01-specific assertion in e500 code
>> - Resolve unused define in eSDHC device model
>> - Resolve redundant alignment checks in eSDHC device model
>> 
>> Bin:
>> - Add dedicated flash chapter to documentation
>> 
>> Bernhard:
>> - Use is_power_of_2() instead of ctpop64() for better readability
>> - Only instantiate eSDHC device model in ppce500 (not used in MPC8544DS)
>> - Rebase onto gitlab.com/danielhb/qemu/tree/ppc-next
>> 
>> v2:
>> ~~~
>> Bin:
>> - Add source for MPC8544DS platform bus' memory map in commit message.
>> - Keep "ESDHC" in comment referring to Linux driver.
>> - Use "qemu-system-ppc{64|32} in documentation.
>> - Use g_autofree in device tree code.
>> - Remove unneeded device tree properties.
>> - Error out if pflash size doesn't fit into eLBC memory window.
>> - Remove unused ESDHC defines.
>> - Define macro ESDHC_WML for register offset with magic constant.
>> - Fix some whitespace issues when adding eSDHC device to e500.
>> 
>> Phil:
>> - Fix tense in commit message.
>> 
>> Bernhard Beschow (7):
>>docs/system/ppc/ppce500: Use qemu-system-ppc64 across the board(s)
>>hw/block/pflash_cfi0{1,2}: Error out if device length isn't a power of
>>  two
>>hw/sd/sdhci-internal: Unexport ESDHC defines
>>hw/sd/sdhci: Rename ESDHC_* defines to USDHC_*
>>hw/ppc/e500: Implement pflash handling
>>hw/sd/sdhci: Implement Freescale eSDHC device model
>>hw/ppc/e500: Add Freescale eSDHC to e500plat
>> 
>>   docs/system/ppc/ppce500.rst |  38 +++-
>>   hw/block/pflash_cfi01.c |   8 +-
>>   hw/block/pflash_cfi02.c |   5 +
>>   hw/ppc/Kconfig  |   2 +
>>   hw/ppc/e500.c   | 114 +-
>>   hw/ppc/e500.h   |   1 +
>>   hw/ppc/e500plat.c   |   1 +
>>   hw/sd/sdhci-internal.h  |  20 
>>   hw/sd/sdhci.c   | 183 +++-
>>   include/hw/sd/sdhci.h   |   3 +
>>   10 files changed, 324 insertions(+), 51 deletions(-)
>>

Re: [PATCH v2 14/43] hw/intc/i8259: Introduce i8259 proxy "isa-pic"

2022-10-26 Thread B




Am 24. Oktober 2022 07:35:48 UTC schrieb "Philippe Mathieu-Daudé" 
:
>Hi Bernhard,
>
>On 22/10/22 17:04, Bernhard Beschow wrote:
>> Having an i8259 proxy allows for ISA PICs to be created and wired up in
>> southbridges. This is especially interesting for PIIX3 for two reasons:
>> First, the southbridge doesn't need to care about the virtualization
>> technology used (KVM, TCG, Xen) due to in-IRQs (where devices get
>> attached) and out-IRQs (which will trigger the IRQs of the respective
>> virtzalization technology) are separated. Second, since the in-IRQs are
>> populated with fully initialized qemu_irq's, they can already be wired
>> up inside PIIX3.
>> 
>> Signed-off-by: Bernhard Beschow 
>> ---
>>   hw/intc/i8259.c | 27 +++
>>   include/hw/intc/i8259.h | 14 ++
>>   2 files changed, 41 insertions(+)
>
>> +static void isapic_set_irq(void *opaque, int irq, int level)
>> +{
>> +ISAPICState *s = opaque;
>> +
>> +qemu_set_irq(s->out_irqs[irq], level);
>> +}
>> +
>> +static void isapic_init(Object *obj)
>> +{
>> +ISAPICState *s = ISA_PIC(obj);
>> +
>> +qdev_init_gpio_in(DEVICE(s), isapic_set_irq, ISA_NUM_IRQS);
>> +qdev_init_gpio_out(DEVICE(s), s->out_irqs, ISA_NUM_IRQS);
>> +
>> +for (int i = 0; i < ISA_NUM_IRQS; ++i) {
>> +s->in_irqs[i] = qdev_get_gpio_in(DEVICE(s), i);
>> +}
>> +}
>> +
>> +static const TypeInfo isapic_info = {
>> +.name  = TYPE_ISA_PIC,
>> +.parent= TYPE_ISA_DEVICE,
>> +.instance_size = sizeof(ISAPICState),
>> +.instance_init = isapic_init,
>> +};
>
>> --- a/include/hw/intc/i8259.h
>> +++ b/include/hw/intc/i8259.h
>> @@ -1,6 +1,20 @@
>>   #ifndef HW_I8259_H
>>   #define HW_I8259_H
>>   +#include "qom/object.h"
>> +#include "hw/isa/isa.h"
>> +#include "qemu/typedefs.h"
>> +
>> +#define TYPE_ISA_PIC "isa-pic"
>> +OBJECT_DECLARE_SIMPLE_TYPE(ISAPICState, ISA_PIC)
>> +
>> +struct ISAPICState {
>> +ISADevice parent_obj;
>> +
>> +qemu_irq in_irqs[ISA_NUM_IRQS];
>> +qemu_irq out_irqs[ISA_NUM_IRQS];
>> +};
>
>There is nothing I8259 / ISA specific in your model.
>
>What about adding a generic qdev proxy-irq (having a configurable
>number of IRQs to proxy)? See for example hw/core/split-irq.c.

Will do!

Best regards,
Bernhard
>
>Regards,
>
>Phil.

[PULL 06/13] block: add BDRV_REQ_REGISTERED_BUF request flag

2022-10-26 Thread Stefan Hajnoczi

Block drivers may optimize I/O requests accessing buffers previously
registered with bdrv_register_buf(). Checking whether all elements of a
request's QEMUIOVector are within previously registered buffers is
expensive, so we need a hint from the user to avoid costly checks.

Add a BDRV_REQ_REGISTERED_BUF request flag to indicate that all
QEMUIOVector elements in an I/O request are known to be within
previously registered buffers.

Always pass the flag through to driver read/write functions. There is
little harm in passing the flag to a driver that does not use it.
Passing the flag to drivers avoids changes across many block drivers.
Filter drivers would need to explicitly support the flag and pass
through to their children when the children support it. That's a lot of
code changes and it's hard to remember to do that everywhere, leading to
silent reduced performance when the flag is accidentally dropped.

The only problematic scenario with the approach in this patch is when a
driver passes the flag through to internal I/O requests that don't use
the same I/O buffer. In that case the hint may be set when it should
actually be clear. This is a rare case though so the risk is low.

Some drivers have assert(!flags), which no longer works when
BDRV_REQ_REGISTERED_BUF is passed in. These assertions aren't very
useful anyway since the functions are called almost exclusively by
bdrv_driver_preadv/pwritev() so if we get flags handling right there
then the assertion is not needed.

Signed-off-by: Stefan Hajnoczi 
Message-id: 20221013185908.1297568-7-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/block/block-common.h |  9 ++
 block.c  | 14 +
 block/blkverify.c|  4 +--
 block/crypto.c   |  4 +--
 block/file-posix.c   |  1 -
 block/gluster.c  |  1 -
 block/io.c   | 61 ++--
 block/mirror.c   |  2 ++
 block/nbd.c  |  1 -
 block/parallels.c|  1 -
 block/qcow.c |  2 --
 block/qed.c  |  1 -
 block/raw-format.c   |  2 ++
 block/replication.c  |  1 -
 block/ssh.c  |  1 -
 block/vhdx.c |  1 -
 16 files changed, 69 insertions(+), 37 deletions(-)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index fdb7306e78..061606e867 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -80,6 +80,15 @@ typedef enum {
  */
 BDRV_REQ_MAY_UNMAP  = 0x4,
 
+/*
+ * An optimization hint when all QEMUIOVector elements are within
+ * previously registered bdrv_register_buf() memory ranges.
+ *
+ * Code that replaces the user's QEMUIOVector elements with bounce buffers
+ * must take care to clear this flag.
+ */
+BDRV_REQ_REGISTERED_BUF = 0x8,
+
 BDRV_REQ_FUA= 0x10,
 BDRV_REQ_WRITE_COMPRESSED   = 0x20,
 
diff --git a/block.c b/block.c
index 1fbf6b9e69..c69be2cfe3 100644
--- a/block.c
+++ b/block.c
@@ -1641,6 +1641,20 @@ static int bdrv_open_driver(BlockDriverState *bs, 
BlockDriver *drv,
 goto open_failed;
 }
 
+assert(!(bs->supported_read_flags & ~BDRV_REQ_MASK));
+assert(!(bs->supported_write_flags & ~BDRV_REQ_MASK));
+
+/*
+ * Always allow the BDRV_REQ_REGISTERED_BUF optimization hint. This saves
+ * drivers that pass read/write requests through to a child the trouble of
+ * declaring support explicitly.
+ *
+ * Drivers must not propagate this flag accidentally when they initiate I/O
+ * to a bounce buffer. That case should be rare though.
+ */
+bs->supported_read_flags |= BDRV_REQ_REGISTERED_BUF;
+bs->supported_write_flags |= BDRV_REQ_REGISTERED_BUF;
+
 ret = refresh_total_sectors(bs, bs->total_sectors);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Could not refresh total sector count");
diff --git a/block/blkverify.c b/block/blkverify.c
index 020b1ae7b6..f36fd6aeb2 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -235,8 +235,8 @@ blkverify_co_preadv(BlockDriverState *bs, int64_t offset, 
int64_t bytes,
 qemu_iovec_init(_qiov, qiov->niov);
 qemu_iovec_clone(_qiov, qiov, buf);
 
-ret = blkverify_co_prwv(bs, , offset, bytes, qiov, _qiov, flags,
-false);
+ret = blkverify_co_prwv(bs, , offset, bytes, qiov, _qiov,
+flags & ~BDRV_REQ_REGISTERED_BUF, false);
 
 cmp_offset = qemu_iovec_compare(qiov, _qiov);
 if (cmp_offset != -1) {
diff --git a/block/crypto.c b/block/crypto.c
index 7a57774b76..c7365598a7 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -410,7 +410,6 @@ block_crypto_co_preadv(BlockDriverState *bs, int64_t 
offset, int64_t bytes,
 uint64_t sector_size = qcrypto_block_get_sector_size(crypto->block);
 uint64_t payload_offset = qcrypto_block_get_payload_offset(crypto->block);
 
-

[PULL 09/13] block: add BlockRAMRegistrar

2022-10-26 Thread Stefan Hajnoczi

Emulated devices and other BlockBackend users wishing to take advantage
of blk_register_buf() all have the same repetitive job: register
RAMBlocks with the BlockBackend using RAMBlockNotifier.

Add a BlockRAMRegistrar API to do this. A later commit will use this
from hw/block/virtio-blk.c.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Stefano Garzarella 
Message-id: 20221013185908.1297568-10-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS  |  1 +
 include/sysemu/block-ram-registrar.h | 37 ++
 block/block-ram-registrar.c  | 58 
 block/meson.build|  1 +
 4 files changed, 97 insertions(+)
 create mode 100644 include/sysemu/block-ram-registrar.h
 create mode 100644 block/block-ram-registrar.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 8b4363c0ff..cda146ebbb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2510,6 +2510,7 @@ F: block*
 F: block/
 F: hw/block/
 F: include/block/
+F: include/sysemu/block-*.h
 F: qemu-img*
 F: docs/tools/qemu-img.rst
 F: qemu-io*
diff --git a/include/sysemu/block-ram-registrar.h 
b/include/sysemu/block-ram-registrar.h
new file mode 100644
index 00..d8b2f7942b
--- /dev/null
+++ b/include/sysemu/block-ram-registrar.h
@@ -0,0 +1,37 @@
+/*
+ * BlockBackend RAM Registrar
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef BLOCK_RAM_REGISTRAR_H
+#define BLOCK_RAM_REGISTRAR_H
+
+#include "exec/ramlist.h"
+
+/**
+ * struct BlockRAMRegistrar:
+ *
+ * Keeps RAMBlock memory registered with a BlockBackend using
+ * blk_register_buf() including hotplugged memory.
+ *
+ * Emulated devices or other BlockBackend users initialize a BlockRAMRegistrar
+ * with blk_ram_registrar_init() before submitting I/O requests with the
+ * BDRV_REQ_REGISTERED_BUF flag set.
+ */
+typedef struct {
+BlockBackend *blk;
+RAMBlockNotifier notifier;
+bool ok;
+} BlockRAMRegistrar;
+
+void blk_ram_registrar_init(BlockRAMRegistrar *r, BlockBackend *blk);
+void blk_ram_registrar_destroy(BlockRAMRegistrar *r);
+
+/* Have all RAMBlocks been registered successfully? */
+static inline bool blk_ram_registrar_ok(BlockRAMRegistrar *r)
+{
+return r->ok;
+}
+
+#endif /* BLOCK_RAM_REGISTRAR_H */
diff --git a/block/block-ram-registrar.c b/block/block-ram-registrar.c
new file mode 100644
index 00..25dbafa789
--- /dev/null
+++ b/block/block-ram-registrar.c
@@ -0,0 +1,58 @@
+/*
+ * BlockBackend RAM Registrar
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/block-backend.h"
+#include "sysemu/block-ram-registrar.h"
+#include "qapi/error.h"
+
+static void ram_block_added(RAMBlockNotifier *n, void *host, size_t size,
+size_t max_size)
+{
+BlockRAMRegistrar *r = container_of(n, BlockRAMRegistrar, notifier);
+Error *err = NULL;
+
+if (!r->ok) {
+return; /* don't try again if we've already failed */
+}
+
+if (!blk_register_buf(r->blk, host, max_size, )) {
+error_report_err(err);
+ram_block_notifier_remove(>notifier);
+r->ok = false;
+}
+}
+
+static void ram_block_removed(RAMBlockNotifier *n, void *host, size_t size,
+  size_t max_size)
+{
+BlockRAMRegistrar *r = container_of(n, BlockRAMRegistrar, notifier);
+blk_unregister_buf(r->blk, host, max_size);
+}
+
+void blk_ram_registrar_init(BlockRAMRegistrar *r, BlockBackend *blk)
+{
+r->blk = blk;
+r->notifier = (RAMBlockNotifier){
+.ram_block_added = ram_block_added,
+.ram_block_removed = ram_block_removed,
+
+/*
+ * .ram_block_resized() is not necessary because we use the max_size
+ * value that does not change across resize.
+ */
+};
+r->ok = true;
+
+ram_block_notifier_add(>notifier);
+}
+
+void blk_ram_registrar_destroy(BlockRAMRegistrar *r)
+{
+if (r->ok) {
+ram_block_notifier_remove(>notifier);
+}
+}
diff --git a/block/meson.build b/block/meson.build
index 500878f082..b7c68b83a3 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -46,6 +46,7 @@ block_ss.add(files(
 ), zstd, zlib, gnutls)
 
 softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
+softmmu_ss.add(files('block-ram-registrar.c'))
 
 if get_option('qcow1').allowed()
   block_ss.add(files('qcow.c'))
-- 
2.37.3

[PULL 02/13] blkio: add libblkio block driver

2022-10-26 Thread Stefan Hajnoczi

libblkio (https://gitlab.com/libblkio/libblkio/) is a library for
high-performance disk I/O. It currently supports io_uring,
virtio-blk-vhost-user, and virtio-blk-vhost-vdpa with additional drivers
under development.

One of the reasons for developing libblkio is that other applications
besides QEMU can use it. This will be particularly useful for
virtio-blk-vhost-user which applications may wish to use for connecting
to qemu-storage-daemon.

libblkio also gives us an opportunity to develop in Rust behind a C API
that is easy to consume from QEMU.

This commit adds io_uring, nvme-io_uring, virtio-blk-vhost-user, and
virtio-blk-vhost-vdpa BlockDrivers to QEMU using libblkio. It will be
easy to add other libblkio drivers since they will share the majority of
code.

For now I/O buffers are copied through bounce buffers if the libblkio
driver requires it. Later commits add an optimization for
pre-registering guest RAM to avoid bounce buffers.

The syntax is:

  --blockdev 
io_uring,node-name=drive0,filename=test.img,readonly=on|off,cache.direct=on|off

  --blockdev 
nvme-io_uring,node-name=drive0,filename=/dev/ng0n1,readonly=on|off,cache.direct=on

  --blockdev 
virtio-blk-vhost-vdpa,node-name=drive0,path=/dev/vdpa...,readonly=on|off,cache.direct=on

  --blockdev 
virtio-blk-vhost-user,node-name=drive0,path=vhost-user-blk.sock,readonly=on|off,cache.direct=on

Signed-off-by: Stefan Hajnoczi 
Acked-by: Markus Armbruster 
Reviewed-by: Stefano Garzarella 
Message-id: 20221013185908.1297568-3-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS   |   6 +
 meson_options.txt |   2 +
 qapi/block-core.json  |  77 +++-
 meson.build   |   9 +
 block/blkio.c | 831 ++
 tests/qtest/modules-test.c|   3 +
 block/meson.build |   1 +
 scripts/meson-buildoptions.sh |   3 +
 8 files changed, 928 insertions(+), 4 deletions(-)
 create mode 100644 block/blkio.c

diff --git a/MAINTAINERS b/MAINTAINERS
index e3d5b7e09c..8b4363c0ff 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3415,6 +3415,12 @@ L: qemu-block@nongnu.org
 S: Maintained
 F: block/vdi.c
 
+blkio
+M: Stefan Hajnoczi 
+L: qemu-block@nongnu.org
+S: Maintained
+F: block/blkio.c
+
 iSCSI
 M: Ronnie Sahlberg 
 M: Paolo Bonzini 
diff --git a/meson_options.txt b/meson_options.txt
index 79c6af18d5..66128178bf 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -117,6 +117,8 @@ option('bzip2', type : 'feature', value : 'auto',
description: 'bzip2 support for DMG images')
 option('cap_ng', type : 'feature', value : 'auto',
description: 'cap_ng support')
+option('blkio', type : 'feature', value : 'auto',
+   description: 'libblkio block device driver')
 option('bpf', type : 'feature', value : 'auto',
 description: 'eBPF support')
 option('cocoa', type : 'feature', value : 'auto',
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 882b266532..cb5079e645 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2951,11 +2951,18 @@
 'file', 'snapshot-access', 'ftp', 'ftps', 'gluster',
 {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
 {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
-'http', 'https', 'iscsi',
-'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
-'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
+'http', 'https',
+{ 'name': 'io_uring', 'if': 'CONFIG_BLKIO' },
+'iscsi',
+'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme',
+{ 'name': 'nvme-io_uring', 'if': 'CONFIG_BLKIO' },
+'parallels', 'preallocate', 'qcow', 'qcow2', 'qed', 'quorum',
+'raw', 'rbd',
 { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
-'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
+'ssh', 'throttle', 'vdi', 'vhdx',
+{ 'name': 'virtio-blk-vhost-user', 'if': 'CONFIG_BLKIO' },
+{ 'name': 'virtio-blk-vhost-vdpa', 'if': 'CONFIG_BLKIO' },
+'vmdk', 'vpc', 'vvfat' ] }
 
 ##
 # @BlockdevOptionsFile:
@@ -3678,6 +3685,58 @@
 '*debug': 'int',
 '*logfile': 'str' } }
 
+##
+# @BlockdevOptionsIoUring:
+#
+# Driver specific block device options for the io_uring backend.
+#
+# @filename: path to the image file
+#
+# Since: 7.2
+##
+{ 'struct': 'BlockdevOptionsIoUring',
+  'data': { 'filename': 'str' },
+  'if': 'CONFIG_BLKIO' }
+
+##
+# @BlockdevOptionsNvmeIoUring:
+#
+# Driver specific block device options for the nvme-io_uring backend.
+#
+# @filename: path to the image file
+#
+# Since: 7.2
+##
+{ 'struct': 'BlockdevOptionsNvmeIoUring',
+  'data': { 'filename': 'str' },
+  'if': 'CONFIG_BLKIO' }
+
+##
+# @BlockdevOptionsVirtioBlkVhostUser:
+#
+# Driver specific block device options for the virtio-blk-vhost-user backend.
+#
+#

[PULL 10/13] exec/cpu-common: add qemu_ram_get_fd()

2022-10-26 Thread Stefan Hajnoczi

Add a function to get the file descriptor for a RAMBlock. Device
emulation code typically uses the MemoryRegion APIs but vhost-style code
may use RAMBlock directly for sharing guest memory with another process.

This new API will be used by the libblkio block driver so it can share
guest memory via .bdrv_register_buf().

Signed-off-by: Stefan Hajnoczi 
Message-id: 20221013185908.1297568-11-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/exec/cpu-common.h | 1 +
 softmmu/physmem.c | 5 +
 2 files changed, 6 insertions(+)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index c493510ee9..6feaa40ca7 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -92,6 +92,7 @@ void qemu_ram_set_uf_zeroable(RAMBlock *rb);
 bool qemu_ram_is_migratable(RAMBlock *rb);
 void qemu_ram_set_migratable(RAMBlock *rb);
 void qemu_ram_unset_migratable(RAMBlock *rb);
+int qemu_ram_get_fd(RAMBlock *rb);
 
 size_t qemu_ram_pagesize(RAMBlock *block);
 size_t qemu_ram_pagesize_largest(void);
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 56e03e07b5..d9578ccfd4 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1748,6 +1748,11 @@ void qemu_ram_unset_migratable(RAMBlock *rb)
 rb->flags &= ~RAM_MIGRATABLE;
 }
 
+int qemu_ram_get_fd(RAMBlock *rb)
+{
+return rb->fd;
+}
+
 /* Called with iothread lock held.  */
 void qemu_ram_set_idstr(RAMBlock *new_block, const char *name, DeviceState 
*dev)
 {
-- 
2.37.3

[PULL 00/13] Block patches

2022-10-26 Thread Stefan Hajnoczi

The following changes since commit 79fc2fb685f35a5e71e23629760ef4025d6aba31:

  Merge tag 'trivial-branch-for-7.2-pull-request' of 
https://gitlab.com/laurent_vivier/qemu into staging (2022-10-25 11:37:17 -0400)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to baf422684d73c7bf38e2c18815e18d44fcf395b6:

  virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint (2022-10-26 
14:56:42 -0400)


Pull request



Stefan Hajnoczi (13):
  coroutine: add flag to re-queue at front of CoQueue
  blkio: add libblkio block driver
  numa: call ->ram_block_removed() in ram_block_notifer_remove()
  block: pass size to bdrv_unregister_buf()
  block: use BdrvRequestFlags type for supported flag fields
  block: add BDRV_REQ_REGISTERED_BUF request flag
  block: return errors from bdrv_register_buf()
  numa: use QLIST_FOREACH_SAFE() for RAM block notifiers
  block: add BlockRAMRegistrar
  exec/cpu-common: add qemu_ram_get_fd()
  stubs: add qemu_ram_block_from_host() and qemu_ram_get_fd()
  blkio: implement BDRV_REQ_REGISTERED_BUF optimization
  virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint

 MAINTAINERS |7 +
 meson_options.txt   |2 +
 qapi/block-core.json|   77 +-
 meson.build |9 +
 include/block/block-common.h|9 +
 include/block/block-global-state.h  |   10 +-
 include/block/block_int-common.h|   15 +-
 include/exec/cpu-common.h   |1 +
 include/hw/virtio/virtio-blk.h  |2 +
 include/qemu/coroutine.h|   15 +-
 include/sysemu/block-backend-global-state.h |4 +-
 include/sysemu/block-ram-registrar.h|   37 +
 block.c |   14 +
 block/blkio.c   | 1008 +++
 block/blkverify.c   |4 +-
 block/block-backend.c   |8 +-
 block/block-ram-registrar.c |   58 ++
 block/crypto.c  |4 +-
 block/file-posix.c  |1 -
 block/gluster.c |1 -
 block/io.c  |  101 +-
 block/mirror.c  |2 +
 block/nbd.c |1 -
 block/nvme.c|   20 +-
 block/parallels.c   |1 -
 block/qcow.c|2 -
 block/qed.c |1 -
 block/raw-format.c  |2 +
 block/replication.c |1 -
 block/ssh.c |1 -
 block/vhdx.c|1 -
 hw/block/virtio-blk.c   |   39 +-
 hw/core/numa.c  |   26 +-
 qemu-img.c  |6 +-
 softmmu/physmem.c   |5 +
 stubs/physmem.c |   13 +
 tests/qtest/modules-test.c  |3 +
 util/qemu-coroutine-lock.c  |9 +-
 util/vfio-helpers.c |5 +-
 block/meson.build   |2 +
 scripts/meson-buildoptions.sh   |3 +
 stubs/meson.build   |1 +
 42 files changed, 1435 insertions(+), 96 deletions(-)
 create mode 100644 include/sysemu/block-ram-registrar.h
 create mode 100644 block/blkio.c
 create mode 100644 block/block-ram-registrar.c
 create mode 100644 stubs/physmem.c

-- 
2.37.3

Re: [PATCH v7 00/13] blkio: add libblkio BlockDriver

2022-10-26 Thread Stefan Hajnoczi

On Thu, Oct 13, 2022 at 02:58:55PM -0400, Stefan Hajnoczi wrote:
> v7:
> - Add nvme-io_uring and virtio-blk-vhost-user syntax examples to commit 
> description [Markus]
> - Add missing nvme-io_uring QAPI [Markus, Alberto]
> - Rename mem-regions-pinned to may-pin-mem-regions [Alberto]
> - Fix value/bs->bl.max_iov mix-up [Stefano]
> v6:
> - Add untested nvme-io_uring driver. Please test in your nested NVMe 
> environment, Alberto. [Alberto]
> - Map blkio mem regions only when necessary to reduce conflicts with RAM 
> discard [Alberto]
> - Reduce duplication by having a single blkio_virtio_blk_common_open() 
> function [Alberto]
> - Avoid duplication in BlockDriver definitions using a macro [Alberto]
> - Avoid ram block registrar segfault [Stefano]
> - Use QLIST_FOREACH_SAFE() in ram block notifier code so callbacks can remove 
> themselves
> v5:
> - Drop "RFC" since libblkio 1.0 has been released and the library API is 
> stable
> - Disable BDRV_REQ_REGISTERED_BUF if we run out of blkio_mem_regions. The
>   bounce buffer slow path is taken when there are not enough blkio_mem_regions
>   to cover guest RAM. [Hanna & David Hildenbrand]
> - Call ram_block_discard_disable() when mem-region-pinned property is true or
>   absent [David Hildenbrand]
> - Use a bounce buffer pool instead of allocating/freeing a buffer for each
>   request. This reduces the number of blkio_mem_regions required for bounce
>   buffers to 1 and avoids frequent blkio_mem_region_map/unmap() calls.
> - Switch to .bdrv_co_*() instead of .bdrv_aio_*(). Needed for the bounce 
> buffer
>   pool's CoQueue.
> v4:
> - Patch 1:
>   - Add virtio-blk-vhost-user driver [Kevin]
>   - Drop .bdrv_parse_filename() and .bdrv_needs_filename for 
> virtio-blk-vhost-vdpa [Stefano]
>   - Add copyright and license header [Hanna]
>   - Drop .bdrv_parse_filename() in favor of --blockdev or json: [Hanna]
>   - Clarify that "filename" is always non-NULL for io_uring [Hanna]
>   - Check that virtio-blk-vhost-vdpa "path" option is non-NULL [Hanna]
>   - Fix virtio-blk-vhost-vdpa cache.direct=off logic [Hanna]
>   - Use macros for driver names [Hanna]
>   - Assert that the driver name is valid [Hanna]
>   - Update "readonly" property name to "read-only" [Hanna]
>   - Call blkio_detach_aio_context() in blkio_close() [Hanna]
>   - Avoid uint32_t * to int * casts in blkio_refresh_limits() [Hanna]
>   - Remove write zeroes and discard from the todo list [Hanna]
>   - Use PRIu32 instead of %d for uint32_t [Hanna]
>   - Fix error messages with buf-alignment instead of optimal-io-size [Hanna]
>   - Call map/unmap APIs since libblkio alloc/free APIs no longer do that
>   - Update QAPI schema QEMU version to 7.2
> - Patch 5:
>   - Expand the BDRV_REQ_REGISTERED_BUF flag passthrough and drop 
> assert(!flags)
> in drivers [Hanna]
> - Patch 7:
>   - Fix BLK->BDRV typo [Hanna]
>   - Make BlockRAMRegistrar handle failure [Hanna]
> - Patch 8:
>   - Replace memory_region_get_fd() approach with qemu_ram_get_fd()
> - Patch 10:
>   - Use (void)ret; to discard unused return value [Hanna]
>   - libblkio's blkio_unmap_mem_region() API no longer has a return value
>   - Check for registered bufs that cross RAMBlocks [Hanna]
> - Patch 11:
>   - Handle bdrv_register_buf() errors [Hanna]
> v3:
> - Add virtio-blk-vhost-vdpa for vdpa-blk devices including VDUSE
> - Add discard and write zeroes support
> - Rebase and adopt latest libblkio APIs
> v2:
> - Add BDRV_REQ_REGISTERED_BUF to bs.supported_write_flags [Stefano]
> - Use new blkioq_get_num_completions() API
> - Implement .bdrv_refresh_limits()
> 
> This patch series adds a QEMU BlockDriver for libblkio
> (https://gitlab.com/libblkio/libblkio/), a library for high-performance block
> device I/O. This work was presented at KVM Forum 2022 and slides are available
> here:
> https://static.sched.com/hosted_files/kvmforum2022/8c/libblkio-kvm-forum-2022.pdf
> 
> The second patch adds the core BlockDriver and most of the libblkio API usage.
> Three libblkio drivers are included:
> - io_uring
> - virtio-blk-vhost-user
> - virtio-blk-vhost-vdpa
> 
> The remainder of the patch series reworks the existing QEMU 
> bdrv_register_buf()
> API so virtio-blk emulation efficiently map guest RAM for libblkio - some
> libblkio drivers require that I/O buffer memory is pre-registered (think VFIO,
> vhost, etc).
> 
> Vladimir requested performance results that show the effect of the
> BDRV_REQ_REGISTERED_BUF flag. I ran the patches against qemu-storage-daemon's
> vhost-user-blk export with iodepth=1 bs=512 to see the per-request overhead 
> due
> to bounce buffer allocation/mapping:
> 
> Name   IOPS   Error
> bounce-buf  4373.81 ± 0.01%
> registered-buf 13062.80 ± 0.67%
> 
> The BDRV_REQ_REGISTERED_BUF optimization version is about 3x faster.
> 
> See the BlockDriver struct in block/blkio.c for a list of APIs that still need
> to be implemented. The core functionality is covered.
>

[PULL 04/13] block: pass size to bdrv_unregister_buf()

2022-10-26 Thread Stefan Hajnoczi

The only implementor of bdrv_register_buf() is block/nvme.c, where the
size is not needed when unregistering a buffer. This is because
util/vfio-helpers.c can look up mappings by address.

Future block drivers that implement bdrv_register_buf() may not be able
to do their job given only the buffer address. Add a size argument to
bdrv_unregister_buf().

Also document the assumptions about
bdrv_register_buf()/bdrv_unregister_buf() calls. The same 
values that were given to bdrv_register_buf() must be given to
bdrv_unregister_buf().

gcc 11.2.1 emits a spurious warning that img_bench()'s buf_size local
variable might be uninitialized, so it's necessary to silence the
compiler.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Stefano Garzarella 
Message-id: 20221013185908.1297568-5-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/block/block-global-state.h  | 5 -
 include/block/block_int-common.h| 2 +-
 include/sysemu/block-backend-global-state.h | 2 +-
 block/block-backend.c   | 4 ++--
 block/io.c  | 6 +++---
 block/nvme.c| 2 +-
 qemu-img.c  | 4 ++--
 7 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/include/block/block-global-state.h 
b/include/block/block-global-state.h
index 21265e3966..7901f35863 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -243,9 +243,12 @@ void bdrv_del_child(BlockDriverState *parent, BdrvChild 
*child, Error **errp);
  * Register/unregister a buffer for I/O. For example, VFIO drivers are
  * interested to know the memory areas that would later be used for I/O, so
  * that they can prepare IOMMU mapping etc., to get better performance.
+ *
+ * Buffers must not overlap and they must be unregistered with the same  values that they were registered with.
  */
 void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size);
-void bdrv_unregister_buf(BlockDriverState *bs, void *host);
+void bdrv_unregister_buf(BlockDriverState *bs, void *host, size_t size);
 
 void bdrv_cancel_in_flight(BlockDriverState *bs);
 
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 8947abab76..b7a7cbd3a5 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -435,7 +435,7 @@ struct BlockDriver {
  * DMA mapping for hot buffers.
  */
 void (*bdrv_register_buf)(BlockDriverState *bs, void *host, size_t size);
-void (*bdrv_unregister_buf)(BlockDriverState *bs, void *host);
+void (*bdrv_unregister_buf)(BlockDriverState *bs, void *host, size_t size);
 
 /*
  * This field is modified only under the BQL, and is part of
diff --git a/include/sysemu/block-backend-global-state.h 
b/include/sysemu/block-backend-global-state.h
index 415f0c91d7..97f7dad2c3 100644
--- a/include/sysemu/block-backend-global-state.h
+++ b/include/sysemu/block-backend-global-state.h
@@ -107,7 +107,7 @@ void blk_io_limits_update_group(BlockBackend *blk, const 
char *group);
 void blk_set_force_allow_inactivate(BlockBackend *blk);
 
 void blk_register_buf(BlockBackend *blk, void *host, size_t size);
-void blk_unregister_buf(BlockBackend *blk, void *host);
+void blk_unregister_buf(BlockBackend *blk, void *host, size_t size);
 
 const BdrvChild *blk_root(BlockBackend *blk);
 
diff --git a/block/block-backend.c b/block/block-backend.c
index aa4adf06ae..ae42474891 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2551,10 +2551,10 @@ void blk_register_buf(BlockBackend *blk, void *host, 
size_t size)
 bdrv_register_buf(blk_bs(blk), host, size);
 }
 
-void blk_unregister_buf(BlockBackend *blk, void *host)
+void blk_unregister_buf(BlockBackend *blk, void *host, size_t size)
 {
 GLOBAL_STATE_CODE();
-bdrv_unregister_buf(blk_bs(blk), host);
+bdrv_unregister_buf(blk_bs(blk), host, size);
 }
 
 int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in,
diff --git a/block/io.c b/block/io.c
index d30073036e..cca402bf7b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3275,16 +3275,16 @@ void bdrv_register_buf(BlockDriverState *bs, void 
*host, size_t size)
 }
 }
 
-void bdrv_unregister_buf(BlockDriverState *bs, void *host)
+void bdrv_unregister_buf(BlockDriverState *bs, void *host, size_t size)
 {
 BdrvChild *child;
 
 GLOBAL_STATE_CODE();
 if (bs->drv && bs->drv->bdrv_unregister_buf) {
-bs->drv->bdrv_unregister_buf(bs, host);
+bs->drv->bdrv_unregister_buf(bs, host, size);
 }
 QLIST_FOREACH(child, >children, next) {
-bdrv_unregister_buf(child->bs, host);
+bdrv_unregister_buf(child->bs, host, size);
 }
 }
 
diff --git a/block/nvme.c b/block/nvme.c
index 2b24f95164..94b76b16f2 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -1602,7 +1602,7 @@ static void nvme_register_buf(BlockDriverState *bs, void 
*host, size_t size)
 }
 }
 
-static void

[PULL 01/13] coroutine: add flag to re-queue at front of CoQueue

2022-10-26 Thread Stefan Hajnoczi

When a coroutine wakes up it may determine that it must re-queue.
Normally coroutines are pushed onto the back of the CoQueue, but for
fairness it may be necessary to push it onto the front of the CoQueue.

Add a flag to specify that the coroutine should be pushed onto the front
of the CoQueue. A later patch will use this to ensure fairness in the
bounce buffer CoQueue used by the blkio BlockDriver.

Signed-off-by: Stefan Hajnoczi 
Message-id: 20221013185908.1297568-2-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/qemu/coroutine.h   | 15 +--
 util/qemu-coroutine-lock.c |  9 +++--
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index aae33cce17..608fe45dcf 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -198,14 +198,25 @@ typedef struct CoQueue {
  */
 void qemu_co_queue_init(CoQueue *queue);
 
+typedef enum {
+/*
+ * Enqueue at front instead of back. Use this to re-queue a request when
+ * its wait condition is not satisfied after being woken up.
+ */
+CO_QUEUE_WAIT_FRONT = 0x1,
+} CoQueueWaitFlags;
+
 /**
  * Adds the current coroutine to the CoQueue and transfers control to the
  * caller of the coroutine.  The mutex is unlocked during the wait and
  * locked again afterwards.
  */
 #define qemu_co_queue_wait(queue, lock) \
-qemu_co_queue_wait_impl(queue, QEMU_MAKE_LOCKABLE(lock))
-void coroutine_fn qemu_co_queue_wait_impl(CoQueue *queue, QemuLockable *lock);
+qemu_co_queue_wait_impl(queue, QEMU_MAKE_LOCKABLE(lock), 0)
+#define qemu_co_queue_wait_flags(queue, lock, flags) \
+qemu_co_queue_wait_impl(queue, QEMU_MAKE_LOCKABLE(lock), (flags))
+void coroutine_fn qemu_co_queue_wait_impl(CoQueue *queue, QemuLockable *lock,
+  CoQueueWaitFlags flags);
 
 /**
  * Removes the next coroutine from the CoQueue, and queue it to run after
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 15c82d9348..45c6b57374 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -39,10 +39,15 @@ void qemu_co_queue_init(CoQueue *queue)
 QSIMPLEQ_INIT(>entries);
 }
 
-void coroutine_fn qemu_co_queue_wait_impl(CoQueue *queue, QemuLockable *lock)
+void coroutine_fn qemu_co_queue_wait_impl(CoQueue *queue, QemuLockable *lock,
+  CoQueueWaitFlags flags)
 {
 Coroutine *self = qemu_coroutine_self();
-QSIMPLEQ_INSERT_TAIL(>entries, self, co_queue_next);
+if (flags & CO_QUEUE_WAIT_FRONT) {
+QSIMPLEQ_INSERT_HEAD(>entries, self, co_queue_next);
+} else {
+QSIMPLEQ_INSERT_TAIL(>entries, self, co_queue_next);
+}
 
 if (lock) {
 qemu_lockable_unlock(lock);
-- 
2.37.3

[PULL 03/13] numa: call ->ram_block_removed() in ram_block_notifer_remove()

2022-10-26 Thread Stefan Hajnoczi

When a RAMBlockNotifier is added, ->ram_block_added() is called with all
existing RAMBlocks. There is no equivalent ->ram_block_removed() call
when a RAMBlockNotifier is removed.

The util/vfio-helpers.c code (the sole user of RAMBlockNotifier) is fine
with this asymmetry because it does not rely on RAMBlockNotifier for
cleanup. It walks its internal list of DMA mappings and unmaps them by
itself.

Future users of RAMBlockNotifier may not have an internal data structure
that records added RAMBlocks so they will need ->ram_block_removed()
callbacks.

This patch makes ram_block_notifier_remove() symmetric with respect to
callbacks. Now util/vfio-helpers.c needs to unmap remaining DMA mappings
after ram_block_notifier_remove() has been called. This is necessary
since users like block/nvme.c may create additional DMA mappings that do
not originate from the RAMBlockNotifier.

Reviewed-by: David Hildenbrand 
Signed-off-by: Stefan Hajnoczi 
Message-id: 20221013185908.1297568-4-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 hw/core/numa.c  | 17 +
 util/vfio-helpers.c |  5 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 26d8e5f616..31e6fe1caa 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -822,6 +822,19 @@ static int ram_block_notify_add_single(RAMBlock *rb, void 
*opaque)
 return 0;
 }
 
+static int ram_block_notify_remove_single(RAMBlock *rb, void *opaque)
+{
+const ram_addr_t max_size = qemu_ram_get_max_length(rb);
+const ram_addr_t size = qemu_ram_get_used_length(rb);
+void *host = qemu_ram_get_host_addr(rb);
+RAMBlockNotifier *notifier = opaque;
+
+if (host) {
+notifier->ram_block_removed(notifier, host, size, max_size);
+}
+return 0;
+}
+
 void ram_block_notifier_add(RAMBlockNotifier *n)
 {
 QLIST_INSERT_HEAD(_list.ramblock_notifiers, n, next);
@@ -835,6 +848,10 @@ void ram_block_notifier_add(RAMBlockNotifier *n)
 void ram_block_notifier_remove(RAMBlockNotifier *n)
 {
 QLIST_REMOVE(n, next);
+
+if (n->ram_block_removed) {
+qemu_ram_foreach_block(ram_block_notify_remove_single, n);
+}
 }
 
 void ram_block_notify_add(void *host, size_t size, size_t max_size)
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 5ba01177bf..0d1520caac 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -847,10 +847,13 @@ void qemu_vfio_close(QEMUVFIOState *s)
 if (!s) {
 return;
 }
+
+ram_block_notifier_remove(>ram_notifier);
+
 for (i = 0; i < s->nr_mappings; ++i) {
 qemu_vfio_undo_mapping(s, >mappings[i], NULL);
 }
-ram_block_notifier_remove(>ram_notifier);
+
 g_free(s->usable_iova_ranges);
 s->nb_iova_ranges = 0;
 qemu_vfio_reset(s);
-- 
2.37.3

[PULL 05/13] block: use BdrvRequestFlags type for supported flag fields

2022-10-26 Thread Stefan Hajnoczi

Use the enum type so GDB displays the enum members instead of printing a
numeric constant.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Stefano Garzarella 
Message-id: 20221013185908.1297568-6-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/block/block_int-common.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index b7a7cbd3a5..19798d0e77 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -1051,7 +1051,7 @@ struct BlockDriverState {
 /*
  * Flags honored during pread
  */
-unsigned int supported_read_flags;
+BdrvRequestFlags supported_read_flags;
 /*
  * Flags honored during pwrite (so far: BDRV_REQ_FUA,
  * BDRV_REQ_WRITE_UNCHANGED).
@@ -1069,12 +1069,12 @@ struct BlockDriverState {
  * flag), or they have to explicitly take the WRITE permission for
  * their children.
  */
-unsigned int supported_write_flags;
+BdrvRequestFlags supported_write_flags;
 /*
  * Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
  * BDRV_REQ_MAY_UNMAP, BDRV_REQ_WRITE_UNCHANGED)
  */
-unsigned int supported_zero_flags;
+BdrvRequestFlags supported_zero_flags;
 /*
  * Flags honoured during truncate (so far: BDRV_REQ_ZERO_WRITE).
  *
@@ -1082,7 +1082,7 @@ struct BlockDriverState {
  * that any added space reads as all zeros. If this can't be guaranteed,
  * the operation must fail.
  */
-unsigned int supported_truncate_flags;
+BdrvRequestFlags supported_truncate_flags;
 
 /* the following member gives a name to every node on the bs graph. */
 char node_name[32];
-- 
2.37.3

[PULL 11/13] stubs: add qemu_ram_block_from_host() and qemu_ram_get_fd()

2022-10-26 Thread Stefan Hajnoczi

The blkio block driver will need to look up the file descriptor for a
given pointer. This is possible in softmmu builds where the RAMBlock API
is available for querying guest RAM.

Add stubs so tools like qemu-img that link the block layer still build
successfully. In this case there is no guest RAM but that is fine.
Bounce buffers and their file descriptors will be allocated with
libblkio's blkio_alloc_mem_region() so we won't rely on QEMU's
qemu_ram_get_fd() in that case.

Signed-off-by: Stefan Hajnoczi 
Message-id: 20221013185908.1297568-12-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 stubs/physmem.c   | 13 +
 stubs/meson.build |  1 +
 2 files changed, 14 insertions(+)
 create mode 100644 stubs/physmem.c

diff --git a/stubs/physmem.c b/stubs/physmem.c
new file mode 100644
index 00..1fc5f2df29
--- /dev/null
+++ b/stubs/physmem.c
@@ -0,0 +1,13 @@
+#include "qemu/osdep.h"
+#include "exec/cpu-common.h"
+
+RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
+   ram_addr_t *offset)
+{
+return NULL;
+}
+
+int qemu_ram_get_fd(RAMBlock *rb)
+{
+return -1;
+}
diff --git a/stubs/meson.build b/stubs/meson.build
index d8f3fd5c44..4314161f5f 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -29,6 +29,7 @@ stub_ss.add(files('migr-blocker.c'))
 stub_ss.add(files('module-opts.c'))
 stub_ss.add(files('monitor.c'))
 stub_ss.add(files('monitor-core.c'))
+stub_ss.add(files('physmem.c'))
 stub_ss.add(files('qemu-timer-notify-cb.c'))
 stub_ss.add(files('qmp_memory_device.c'))
 stub_ss.add(files('qmp-command-available.c'))
-- 
2.37.3

[PULL 13/13] virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint

2022-10-26 Thread Stefan Hajnoczi

Register guest RAM using BlockRAMRegistrar and set the
BDRV_REQ_REGISTERED_BUF flag so block drivers can optimize memory
accesses in I/O requests.

This is for vdpa-blk, vhost-user-blk, and other I/O interfaces that rely
on DMA mapping/unmapping.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Stefano Garzarella 
Message-id: 20221013185908.1297568-14-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/hw/virtio/virtio-blk.h |  2 ++
 hw/block/virtio-blk.c  | 39 ++
 2 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index d311c57cca..7f589b4146 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -19,6 +19,7 @@
 #include "hw/block/block.h"
 #include "sysemu/iothread.h"
 #include "sysemu/block-backend.h"
+#include "sysemu/block-ram-registrar.h"
 #include "qom/object.h"
 
 #define TYPE_VIRTIO_BLK "virtio-blk-device"
@@ -64,6 +65,7 @@ struct VirtIOBlock {
 struct VirtIOBlockDataPlane *dataplane;
 uint64_t host_features;
 size_t config_size;
+BlockRAMRegistrar blk_ram_registrar;
 };
 
 typedef struct VirtIOBlockReq {
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 8131ec2dbc..f717550fdc 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -21,6 +21,7 @@
 #include "hw/block/block.h"
 #include "hw/qdev-properties.h"
 #include "sysemu/blockdev.h"
+#include "sysemu/block-ram-registrar.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/runstate.h"
 #include "hw/virtio/virtio-blk.h"
@@ -362,12 +363,14 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
 }
 }
 
-static inline void submit_requests(BlockBackend *blk, MultiReqBuffer *mrb,
+static inline void submit_requests(VirtIOBlock *s, MultiReqBuffer *mrb,
int start, int num_reqs, int niov)
 {
+BlockBackend *blk = s->blk;
 QEMUIOVector *qiov = >reqs[start]->qiov;
 int64_t sector_num = mrb->reqs[start]->sector_num;
 bool is_write = mrb->is_write;
+BdrvRequestFlags flags = 0;
 
 if (num_reqs > 1) {
 int i;
@@ -398,12 +401,18 @@ static inline void submit_requests(BlockBackend *blk, 
MultiReqBuffer *mrb,
   num_reqs - 1);
 }
 
+if (blk_ram_registrar_ok(>blk_ram_registrar)) {
+flags |= BDRV_REQ_REGISTERED_BUF;
+}
+
 if (is_write) {
-blk_aio_pwritev(blk, sector_num << BDRV_SECTOR_BITS, qiov, 0,
-virtio_blk_rw_complete, mrb->reqs[start]);
+blk_aio_pwritev(blk, sector_num << BDRV_SECTOR_BITS, qiov,
+flags, virtio_blk_rw_complete,
+mrb->reqs[start]);
 } else {
-blk_aio_preadv(blk, sector_num << BDRV_SECTOR_BITS, qiov, 0,
-   virtio_blk_rw_complete, mrb->reqs[start]);
+blk_aio_preadv(blk, sector_num << BDRV_SECTOR_BITS, qiov,
+   flags, virtio_blk_rw_complete,
+   mrb->reqs[start]);
 }
 }
 
@@ -425,14 +434,14 @@ static int multireq_compare(const void *a, const void *b)
 }
 }
 
-static void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
+static void virtio_blk_submit_multireq(VirtIOBlock *s, MultiReqBuffer *mrb)
 {
 int i = 0, start = 0, num_reqs = 0, niov = 0, nb_sectors = 0;
 uint32_t max_transfer;
 int64_t sector_num = 0;
 
 if (mrb->num_reqs == 1) {
-submit_requests(blk, mrb, 0, 1, -1);
+submit_requests(s, mrb, 0, 1, -1);
 mrb->num_reqs = 0;
 return;
 }
@@ -452,11 +461,11 @@ static void virtio_blk_submit_multireq(BlockBackend *blk, 
MultiReqBuffer *mrb)
  * 3. merge would exceed maximum transfer length of backend device
  */
 if (sector_num + nb_sectors != req->sector_num ||
-niov > blk_get_max_iov(blk) - req->qiov.niov ||
+niov > blk_get_max_iov(s->blk) - req->qiov.niov ||
 req->qiov.size > max_transfer ||
 nb_sectors > (max_transfer -
   req->qiov.size) / BDRV_SECTOR_SIZE) {
-submit_requests(blk, mrb, start, num_reqs, niov);
+submit_requests(s, mrb, start, num_reqs, niov);
 num_reqs = 0;
 }
 }
@@ -472,7 +481,7 @@ static void virtio_blk_submit_multireq(BlockBackend *blk, 
MultiReqBuffer *mrb)
 num_reqs++;
 }
 
-submit_requests(blk, mrb, start, num_reqs, niov);
+submit_requests(s, mrb, start, num_reqs, niov);
 mrb->num_reqs = 0;
 }
 
@@ -487,7 +496,7 @@ static void virtio_blk_handle_flush(VirtIOBlockReq *req, 
MultiReqBuffer *mrb)
  * Make sure all outstanding writes are posted to the backing device.
  */
 if (mrb->is_write && mrb->num_reqs > 0) {
-virtio_blk_submit_multireq(s->blk, mrb);
+virtio_blk_submit_multireq(s, mrb);
 }
 blk_aio_flush(s->blk,

[PULL 07/13] block: return errors from bdrv_register_buf()

2022-10-26 Thread Stefan Hajnoczi

Registering an I/O buffer is only a performance optimization hint but it
is still necessary to return errors when it fails.

Later patches will need to detect errors when registering buffers but an
immediate advantage is that error_report() calls are no longer needed in
block driver .bdrv_register_buf() functions.

Signed-off-by: Stefan Hajnoczi 
Message-id: 20221013185908.1297568-8-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/block/block-global-state.h  |  5 ++-
 include/block/block_int-common.h|  5 ++-
 include/sysemu/block-backend-global-state.h |  2 +-
 block/block-backend.c   |  4 +--
 block/io.c  | 34 +++--
 block/nvme.c| 18 +--
 qemu-img.c  |  2 +-
 7 files changed, 52 insertions(+), 18 deletions(-)

diff --git a/include/block/block-global-state.h 
b/include/block/block-global-state.h
index 7901f35863..eba4ed23b4 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -246,8 +246,11 @@ void bdrv_del_child(BlockDriverState *parent, BdrvChild 
*child, Error **errp);
  *
  * Buffers must not overlap and they must be unregistered with the same  values that they were registered with.
+ *
+ * Returns: true on success, false on failure
  */
-void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size);
+bool bdrv_register_buf(BlockDriverState *bs, void *host, size_t size,
+   Error **errp);
 void bdrv_unregister_buf(BlockDriverState *bs, void *host, size_t size);
 
 void bdrv_cancel_in_flight(BlockDriverState *bs);
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 19798d0e77..9c569be162 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -433,8 +433,11 @@ struct BlockDriver {
  * that it can do IOMMU mapping with VFIO etc., in order to get better
  * performance. In the case of VFIO drivers, this callback is used to do
  * DMA mapping for hot buffers.
+ *
+ * Returns: true on success, false on failure
  */
-void (*bdrv_register_buf)(BlockDriverState *bs, void *host, size_t size);
+bool (*bdrv_register_buf)(BlockDriverState *bs, void *host, size_t size,
+  Error **errp);
 void (*bdrv_unregister_buf)(BlockDriverState *bs, void *host, size_t size);
 
 /*
diff --git a/include/sysemu/block-backend-global-state.h 
b/include/sysemu/block-backend-global-state.h
index 97f7dad2c3..6858e39cb6 100644
--- a/include/sysemu/block-backend-global-state.h
+++ b/include/sysemu/block-backend-global-state.h
@@ -106,7 +106,7 @@ void blk_io_limits_enable(BlockBackend *blk, const char 
*group);
 void blk_io_limits_update_group(BlockBackend *blk, const char *group);
 void blk_set_force_allow_inactivate(BlockBackend *blk);
 
-void blk_register_buf(BlockBackend *blk, void *host, size_t size);
+bool blk_register_buf(BlockBackend *blk, void *host, size_t size, Error 
**errp);
 void blk_unregister_buf(BlockBackend *blk, void *host, size_t size);
 
 const BdrvChild *blk_root(BlockBackend *blk);
diff --git a/block/block-backend.c b/block/block-backend.c
index ae42474891..4f59664397 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2545,10 +2545,10 @@ static void blk_root_drained_end(BdrvChild *child, int 
*drained_end_counter)
 }
 }
 
-void blk_register_buf(BlockBackend *blk, void *host, size_t size)
+bool blk_register_buf(BlockBackend *blk, void *host, size_t size, Error **errp)
 {
 GLOBAL_STATE_CODE();
-bdrv_register_buf(blk_bs(blk), host, size);
+return bdrv_register_buf(blk_bs(blk), host, size, errp);
 }
 
 void blk_unregister_buf(BlockBackend *blk, void *host, size_t size)
diff --git a/block/io.c b/block/io.c
index 4207648db6..a9673465dd 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3277,17 +3277,45 @@ void bdrv_io_unplug(BlockDriverState *bs)
 }
 }
 
-void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size)
+/* Helper that undoes bdrv_register_buf() when it fails partway through */
+static void bdrv_register_buf_rollback(BlockDriverState *bs,
+   void *host,
+   size_t size,
+   BdrvChild *final_child)
+{
+BdrvChild *child;
+
+QLIST_FOREACH(child, >children, next) {
+if (child == final_child) {
+break;
+}
+
+bdrv_unregister_buf(child->bs, host, size);
+}
+
+if (bs->drv && bs->drv->bdrv_unregister_buf) {
+bs->drv->bdrv_unregister_buf(bs, host, size);
+}
+}
+
+bool bdrv_register_buf(BlockDriverState *bs, void *host, size_t size,
+   Error **errp)
 {
 BdrvChild *child;
 
 GLOBAL_STATE_CODE();
 if (bs->drv && bs->drv->bdrv_register_buf) {
-bs->drv->bdrv_register_buf(bs, host, size);
+if

[PULL 08/13] numa: use QLIST_FOREACH_SAFE() for RAM block notifiers

2022-10-26 Thread Stefan Hajnoczi

Make list traversal work when a callback removes a notifier
mid-traversal. This is a cleanup to prevent bugs in the future.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Message-id: 20221013185908.1297568-9-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 hw/core/numa.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 31e6fe1caa..ea24a5fa8c 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -857,8 +857,9 @@ void ram_block_notifier_remove(RAMBlockNotifier *n)
 void ram_block_notify_add(void *host, size_t size, size_t max_size)
 {
 RAMBlockNotifier *notifier;
+RAMBlockNotifier *next;
 
-QLIST_FOREACH(notifier, _list.ramblock_notifiers, next) {
+QLIST_FOREACH_SAFE(notifier, _list.ramblock_notifiers, next, next) {
 if (notifier->ram_block_added) {
 notifier->ram_block_added(notifier, host, size, max_size);
 }
@@ -868,8 +869,9 @@ void ram_block_notify_add(void *host, size_t size, size_t 
max_size)
 void ram_block_notify_remove(void *host, size_t size, size_t max_size)
 {
 RAMBlockNotifier *notifier;
+RAMBlockNotifier *next;
 
-QLIST_FOREACH(notifier, _list.ramblock_notifiers, next) {
+QLIST_FOREACH_SAFE(notifier, _list.ramblock_notifiers, next, next) {
 if (notifier->ram_block_removed) {
 notifier->ram_block_removed(notifier, host, size, max_size);
 }
@@ -879,8 +881,9 @@ void ram_block_notify_remove(void *host, size_t size, 
size_t max_size)
 void ram_block_notify_resize(void *host, size_t old_size, size_t new_size)
 {
 RAMBlockNotifier *notifier;
+RAMBlockNotifier *next;
 
-QLIST_FOREACH(notifier, _list.ramblock_notifiers, next) {
+QLIST_FOREACH_SAFE(notifier, _list.ramblock_notifiers, next, next) {
 if (notifier->ram_block_resized) {
 notifier->ram_block_resized(notifier, host, old_size, new_size);
 }
-- 
2.37.3

[PULL 12/13] blkio: implement BDRV_REQ_REGISTERED_BUF optimization

2022-10-26 Thread Stefan Hajnoczi

Avoid bounce buffers when QEMUIOVector elements are within previously
registered bdrv_register_buf() buffers.

The idea is that emulated storage controllers will register guest RAM
using bdrv_register_buf() and set the BDRV_REQ_REGISTERED_BUF on I/O
requests. Therefore no blkio_map_mem_region() calls are necessary in the
performance-critical I/O code path.

This optimization doesn't apply if the I/O buffer is internally
allocated by QEMU (e.g. qcow2 metadata). There we still take the slow
path because BDRV_REQ_REGISTERED_BUF is not set.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Stefano Garzarella 
Message-id: 20221013185908.1297568-13-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 block/blkio.c | 183 +-
 1 file changed, 180 insertions(+), 3 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index b0cfd74b36..82f26eedd2 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -11,9 +11,13 @@
 #include "qemu/osdep.h"
 #include 
 #include "block/block_int.h"
+#include "exec/memory.h"
+#include "exec/cpu-common.h" /* for qemu_ram_get_fd() */
 #include "qapi/error.h"
+#include "qemu/error-report.h"
 #include "qapi/qmp/qdict.h"
 #include "qemu/module.h"
+#include "exec/memory.h" /* for ram_block_discard_disable() */
 
 /*
  * Keep the QEMU BlockDriver names identical to the libblkio driver names.
@@ -73,6 +77,12 @@ typedef struct {
 
 /* Can we skip adding/deleting blkio_mem_regions? */
 bool needs_mem_regions;
+
+/* Are file descriptors necessary for blkio_mem_regions? */
+bool needs_mem_region_fd;
+
+/* Are madvise(MADV_DONTNEED)-style operations unavailable? */
+bool may_pin_mem_regions;
 } BDRVBlkioState;
 
 /* Called with s->bounce_lock held */
@@ -347,7 +357,8 @@ blkio_co_preadv(BlockDriverState *bs, int64_t offset, 
int64_t bytes,
 .coroutine = qemu_coroutine_self(),
 };
 BDRVBlkioState *s = bs->opaque;
-bool use_bounce_buffer = s->needs_mem_regions;
+bool use_bounce_buffer =
+s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
 BlkioBounceBuf bounce;
 struct iovec *iov = qiov->iov;
 int iovcnt = qiov->niov;
@@ -390,7 +401,8 @@ static int coroutine_fn blkio_co_pwritev(BlockDriverState 
*bs, int64_t offset,
 .coroutine = qemu_coroutine_self(),
 };
 BDRVBlkioState *s = bs->opaque;
-bool use_bounce_buffer = s->needs_mem_regions;
+bool use_bounce_buffer =
+s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
 BlkioBounceBuf bounce;
 struct iovec *iov = qiov->iov;
 int iovcnt = qiov->niov;
@@ -473,6 +485,130 @@ static void blkio_io_unplug(BlockDriverState *bs)
 }
 }
 
+typedef enum {
+BMRR_OK,
+BMRR_SKIP,
+BMRR_FAIL,
+} BlkioMemRegionResult;
+
+/*
+ * Produce a struct blkio_mem_region for a given address and size.
+ *
+ * This function produces identical results when called multiple times with the
+ * same arguments. This property is necessary because blkio_unmap_mem_region()
+ * must receive the same struct blkio_mem_region field values that were passed
+ * to blkio_map_mem_region().
+ */
+static BlkioMemRegionResult
+blkio_mem_region_from_host(BlockDriverState *bs,
+   void *host, size_t size,
+   struct blkio_mem_region *region,
+   Error **errp)
+{
+BDRVBlkioState *s = bs->opaque;
+int fd = -1;
+ram_addr_t fd_offset = 0;
+
+if (((uintptr_t)host | size) % s->mem_region_alignment) {
+error_setg(errp, "unaligned buf %p with size %zu", host, size);
+return BMRR_FAIL;
+}
+
+/* Attempt to find the fd for the underlying memory */
+if (s->needs_mem_region_fd) {
+RAMBlock *ram_block;
+RAMBlock *end_block;
+ram_addr_t offset;
+
+/*
+ * bdrv_register_buf() is called with the BQL held so mr lives at least
+ * until this function returns.
+ */
+ram_block = qemu_ram_block_from_host(host, false, _offset);
+if (ram_block) {
+fd = qemu_ram_get_fd(ram_block);
+}
+if (fd == -1) {
+/*
+ * Ideally every RAMBlock would have an fd. pc-bios and other
+ * things don't. Luckily they are usually not I/O buffers and we
+ * can just ignore them.
+ */
+return BMRR_SKIP;
+}
+
+/* Make sure the fd covers the entire range */
+end_block = qemu_ram_block_from_host(host + size - 1, false, );
+if (ram_block != end_block) {
+error_setg(errp, "registered buffer at %p with size %zu extends "
+   "beyond RAMBlock", host, size);
+return BMRR_FAIL;
+}
+}
+
+*region = (struct blkio_mem_region){
+.addr = host,
+.len = size,
+.fd = fd,
+.fd_offset = fd_offset,
+};
+return BMRR_OK;
+}
+
+static bool blkio_register_buf(BlockDriverState *bs,

Re: [PULL 00/16] aspeed queue

2022-10-26 Thread Stefan Hajnoczi

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.2 for any 
user-visible changes.


signature.asc
Description: PGP signature

[PULL v2 12/28] qapi job: Elide redundant has_FOO in generated C

2022-10-26 Thread Markus Armbruster

The has_FOO for pointer-valued FOO are redundant, except for arrays.
They are also a nuisance to work with.  Recent commit "qapi: Start to
elide redundant has_FOO in generated C" provided the means to elide
them step by step.  This is the step for qapi/job.json.

Said commit explains the transformation in more detail.  The invariant
violations mentioned there do not occur here.

Cc: John Snow 
Cc: Vladimir Sementsov-Ogievskiy 
Cc: qemu-block@nongnu.org
Signed-off-by: Markus Armbruster 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20221018062849.3420573-13-arm...@redhat.com>
---
 job-qmp.c  | 3 +--
 scripts/qapi/schema.py | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/job-qmp.c b/job-qmp.c
index d498fc89c0..9e26fa899f 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -156,8 +156,7 @@ static JobInfo *job_query_single_locked(Job *job, Error 
**errp)
 .status = job->status,
 .current_progress   = progress_current,
 .total_progress = progress_total,
-.has_error  = !!job->err,
-.error  = job->err ? \
+.error  = job->err ?
   g_strdup(error_get_pretty(job->err)) : NULL,
 };
 
diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
index 07e2a0f263..ff73fdb0b3 100644
--- a/scripts/qapi/schema.py
+++ b/scripts/qapi/schema.py
@@ -759,7 +759,6 @@ def need_has(self):
 assert self.type
 # Temporary hack to support dropping the has_FOO in reviewable chunks
 opt_out = [
-'qapi/job.json',
 'qapi/machine.json',
 'qapi/machine-target.json',
 'qapi/migration.json',
-- 
2.37.3

[PULL v2 08/28] qapi block: Elide redundant has_FOO in generated C

2022-10-26 Thread Markus Armbruster

The has_FOO for pointer-valued FOO are redundant, except for arrays.
They are also a nuisance to work with.  Recent commit "qapi: Start to
elide redundant has_FOO in generated C" provided the means to elide
them step by step.  This is the step for qapi/block*.json.

Said commit explains the transformation in more detail.

There is one instance of the Invariant violation mentioned there:
qcow2_signal_corruption() passes false, "" when node_name is an empty
string.  Take care to pass NULL then.

Additionally, helper bdrv_latency_histogram_stats() loses its output
parameters and returns a value instead.

Cc: Kevin Wolf 
Cc: Hanna Reitz 
Cc: qemu-block@nongnu.org
Signed-off-by: Markus Armbruster 
Message-Id: <20221018062849.3420573-9-arm...@redhat.com>
[block/rbd.c fixed for #ifndef LIBRBD_SUPPORTS_ENCRYPTION]
---
 block/block-backend.c  |   2 +-
 block/copy-before-write.c  |   2 +-
 block/dirty-bitmap.c   |   1 -
 block/export/export.c  |   2 +-
 block/export/vduse-blk.c   |   3 +-
 block/gluster.c|   3 -
 block/monitor/block-hmp-cmds.c |  48 --
 block/qapi-sysemu.c|  73 +-
 block/qapi.c   |  62 +---
 block/qcow.c   |  10 +-
 block/qcow2.c  |  18 ++--
 block/qed.c|   2 +-
 block/quorum.c |   2 +-
 block/rbd.c|  17 ++--
 block/ssh.c|   2 +-
 blockdev-nbd.c |   9 +-
 blockdev.c | 170 +
 blockjob.c |   2 -
 monitor/hmp-cmds.c |   3 +-
 nbd/server.c   |   2 +-
 qemu-img.c |  13 ++-
 qemu-nbd.c |   2 -
 scripts/qapi/schema.py |   3 -
 23 files changed, 171 insertions(+), 280 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index aa4adf06ae..b0ef2ea174 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1861,7 +1861,7 @@ static void send_qmp_error_event(BlockBackend *blk,
 BlockDriverState *bs = blk_bs(blk);
 
 optype = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
-qapi_event_send_block_io_error(blk_name(blk), !!bs,
+qapi_event_send_block_io_error(blk_name(blk),
bs ? bdrv_get_node_name(bs) : NULL, optype,
action, blk_iostatus_is_enabled(blk),
error == ENOSPC, strerror(error));
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index afbdd04489..330c812737 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -433,7 +433,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
 return -EINVAL;
 }
 
-if (opts->has_bitmap) {
+if (opts->bitmap) {
 bitmap = block_dirty_bitmap_lookup(opts->bitmap->node,
opts->bitmap->name, NULL, errp);
 if (!bitmap) {
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index bf3dc0512a..9c39550698 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -541,7 +541,6 @@ BlockDirtyInfoList 
*bdrv_query_dirty_bitmaps(BlockDriverState *bs)
 
 info->count = bdrv_get_dirty_count(bm);
 info->granularity = bdrv_dirty_bitmap_granularity(bm);
-info->has_name = !!bm->name;
 info->name = g_strdup(bm->name);
 info->recording = bdrv_dirty_bitmap_recording(bm);
 info->busy = bdrv_dirty_bitmap_busy(bm);
diff --git a/block/export/export.c b/block/export/export.c
index 4744862915..da7480372c 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -114,7 +114,7 @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error 
**errp)
 ctx = bdrv_get_aio_context(bs);
 aio_context_acquire(ctx);
 
-if (export->has_iothread) {
+if (export->iothread) {
 IOThread *iothread;
 AioContext *new_ctx;
 Error **set_context_errp;
diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c
index f101c24c3f..350d6fdaf0 100644
--- a/block/export/vduse-blk.c
+++ b/block/export/vduse-blk.c
@@ -265,8 +265,7 @@ static int vduse_blk_exp_create(BlockExport *exp, 
BlockExportOptions *opts,
 }
 vblk_exp->num_queues = num_queues;
 vblk_exp->handler.blk = exp->blk;
-vblk_exp->handler.serial = g_strdup(vblk_opts->has_serial ?
-vblk_opts->serial : "");
+vblk_exp->handler.serial = g_strdup(vblk_opts->serial ?: "");
 vblk_exp->handler.logical_block_size = logical_block_size;
 vblk_exp->handler.writable = opts->writable;
 
diff --git a/block/gluster.c b/block/gluster.c
index bb1144cf6a..ee40eac28f 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -830,7 +830,6 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict 
*options,
 s->logfile = g_strdup(logfile ? logfile :

[PULL v2 24/28] qapi transaction: Elide redundant has_FOO in generated C

2022-10-26 Thread Markus Armbruster

The has_FOO for pointer-valued FOO are redundant, except for arrays.
They are also a nuisance to work with.  Recent commit "qapi: Start to
elide redundant has_FOO in generated C" provided the means to elide
them step by step.  This is the step for qapi/transaction.json.

Said commit explains the transformation in more detail.  The invariant
violations mentioned there do not occur here.

Cc: Kevin Wolf 
Cc: Hanna Reitz 
Cc: qemu-block@nongnu.org
Signed-off-by: Markus Armbruster 
Message-Id: <20221018062849.3420573-25-arm...@redhat.com>
---
 blockdev.c | 3 +--
 scripts/qapi/schema.py | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 6a3b8cdbc6..0900a38ca6 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1048,7 +1048,7 @@ static void blockdev_do_action(TransactionAction *action, 
Error **errp)
 
 list.value = action;
 list.next = NULL;
-qmp_transaction(, false, NULL, errp);
+qmp_transaction(, NULL, errp);
 }
 
 void qmp_blockdev_snapshot_sync(const char *device, const char *node_name,
@@ -2288,7 +2288,6 @@ static TransactionProperties *get_transaction_properties(
  * Always run under BQL.
  */
 void qmp_transaction(TransactionActionList *dev_list,
- bool has_props,
  struct TransactionProperties *props,
  Error **errp)
 {
diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
index f0726af876..3673296ad8 100644
--- a/scripts/qapi/schema.py
+++ b/scripts/qapi/schema.py
@@ -759,7 +759,6 @@ def need_has(self):
 assert self.type
 # Temporary hack to support dropping the has_FOO in reviewable chunks
 opt_out = [
-'qapi/transaction.json',
 'qapi/ui.json',
 'qapi/virtio.json',
 'qga/qapi-schema.json']
-- 
2.37.3

Re: [PATCH v4 0/7] ppc/e500: Add support for two types of flash, cleanup

2022-10-26 Thread Daniel Henrique Barboza


Hi,

Since this is being sent to qemu-ppc and has to do with e500 I decided to
take a look. I acked the e500 related patches, 5 and 7. Patch 6 LGTM as well
but I'd rather not ack it it's SD specific code.

I'll send a PowerPC pull request this week. I can grab this series via the ppc
tree if someone with SD authority acks patch 6.


Thanks,


Daniel





On 10/18/22 18:01, Bernhard Beschow wrote:

Cover letter:
~

This series adds support for -pflash and direct SD card access to the
PPC e500 boards. The idea is to increase compatibility with "real" firmware
images where only the bare minimum of drivers is compiled in.

The series is structured as follows:

Patches 1-4 perform some general cleanup which paves the way for the rest of
the series.

Patch 5 adds -pflash handling where memory-mapped flash can be added on
user's behalf. That is, the flash memory region in the eLBC is only added if
the -pflash argument is supplied. Note that the cfi01 device model becomes
stricter in checking the size of the emulated flash space.

Patches 6 and 7 add a new device model - the Freescale eSDHC - to the e500
boards which was missing so far.

User documentation is also added as the new features become available.

Tesing done:
* `qeu-system-ppc -M ppce500 -cpu e500mc -m 256 -kernel uImage -append
"console=ttyS0 rootwait root=/dev/mtdblock0 nokaslr" -drive
if=pflash,file=rootfs.ext2,format=raw`
* `qemu-system-ppc -M ppce500 -cpu e500mc -m 256 -kernel uImage -append
"console=ttyS0 rootwait root=/dev/mmcblk0" -device sd-card,drive=mydrive -drive
id=mydrive,if=none,file=rootfs.ext2,format=raw`

The load was created using latest Buildroot with `make
qemu_ppc_e500mc_defconfig` where the rootfs was configured to be of ext2 type.
In both cases it was possible to log in and explore the root file system.

v4:
~~~
Zoltan:
- Don't suggest presence of qemu-system-ppc32 in documentation

Bin:
- New patch: docs/system/ppc/ppce500: Use qemu-system-ppc64 across the board(s)

Peter:
- Inline pflash_cfi01_register() rather than modify it (similar to v2)

v3:
~~~
Phil:
- Also add power-of-2 fix to pflash_cfi02
- Resolve cfi01-specific assertion in e500 code
- Resolve unused define in eSDHC device model
- Resolve redundant alignment checks in eSDHC device model

Bin:
- Add dedicated flash chapter to documentation

Bernhard:
- Use is_power_of_2() instead of ctpop64() for better readability
- Only instantiate eSDHC device model in ppce500 (not used in MPC8544DS)
- Rebase onto gitlab.com/danielhb/qemu/tree/ppc-next

v2:
~~~
Bin:
- Add source for MPC8544DS platform bus' memory map in commit message.
- Keep "ESDHC" in comment referring to Linux driver.
- Use "qemu-system-ppc{64|32} in documentation.
- Use g_autofree in device tree code.
- Remove unneeded device tree properties.
- Error out if pflash size doesn't fit into eLBC memory window.
- Remove unused ESDHC defines.
- Define macro ESDHC_WML for register offset with magic constant.
- Fix some whitespace issues when adding eSDHC device to e500.

Phil:
- Fix tense in commit message.

Bernhard Beschow (7):
   docs/system/ppc/ppce500: Use qemu-system-ppc64 across the board(s)
   hw/block/pflash_cfi0{1,2}: Error out if device length isn't a power of
 two
   hw/sd/sdhci-internal: Unexport ESDHC defines
   hw/sd/sdhci: Rename ESDHC_* defines to USDHC_*
   hw/ppc/e500: Implement pflash handling
   hw/sd/sdhci: Implement Freescale eSDHC device model
   hw/ppc/e500: Add Freescale eSDHC to e500plat

  docs/system/ppc/ppce500.rst |  38 +++-
  hw/block/pflash_cfi01.c |   8 +-
  hw/block/pflash_cfi02.c |   5 +
  hw/ppc/Kconfig  |   2 +
  hw/ppc/e500.c   | 114 +-
  hw/ppc/e500.h   |   1 +
  hw/ppc/e500plat.c   |   1 +
  hw/sd/sdhci-internal.h  |  20 
  hw/sd/sdhci.c   | 183 +++-
  include/hw/sd/sdhci.h   |   3 +
  10 files changed, 324 insertions(+), 51 deletions(-)

Re: [PATCH v4 7/7] hw/ppc/e500: Add Freescale eSDHC to e500plat

2022-10-26 Thread Daniel Henrique Barboza





On 10/18/22 18:01, Bernhard Beschow wrote:

Adds missing functionality to e500plat machine which increases the
chance of given "real" firmware images to access SD cards.

Signed-off-by: Bernhard Beschow 
---


Reviewed-by: Daniel Henrique Barboza 


  docs/system/ppc/ppce500.rst | 12 
  hw/ppc/Kconfig  |  1 +
  hw/ppc/e500.c   | 35 ++-
  hw/ppc/e500.h   |  1 +
  hw/ppc/e500plat.c   |  1 +
  5 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/docs/system/ppc/ppce500.rst b/docs/system/ppc/ppce500.rst
index 38f8ceb0cf..c9fe0915dc 100644
--- a/docs/system/ppc/ppce500.rst
+++ b/docs/system/ppc/ppce500.rst
@@ -19,6 +19,7 @@ The ``ppce500`` machine supports the following devices:
  * Power-off functionality via one GPIO pin
  * 1 Freescale MPC8xxx PCI host controller
  * VirtIO devices via PCI bus
+* 1 Freescale Enhanced Secure Digital Host controller (eSDHC)
  * 1 Freescale Enhanced Triple Speed Ethernet controller (eTSEC)
  
  Hardware configuration information

@@ -181,3 +182,14 @@ as follows:
-drive if=pflash,file=/path/to/rootfs.ext2,format=raw \
-append "rootwait root=/dev/mtdblock0"
  
+Alternatively, the root file system can also reside on an emulated SD card

+whose size must again be a power of two:
+
+.. code-block:: bash
+
+  $ qemu-system-ppc64 -M ppce500 -cpu e500mc -smp 4 -m 2G \
+  -display none -serial stdio \
+  -kernel vmlinux \
+  -device sd-card,drive=mydrive \
+  -drive id=mydrive,if=none,file=/path/to/rootfs.ext2,format=raw \
+  -append "rootwait root=/dev/mmcblk0"
diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
index 769a1ead1c..6e31f568ba 100644
--- a/hw/ppc/Kconfig
+++ b/hw/ppc/Kconfig
@@ -129,6 +129,7 @@ config E500
  select PFLASH_CFI01
  select PLATFORM_BUS
  select PPCE500_PCI
+select SDHCI
  select SERIAL
  select MPC_I2C
  select FDT_PPC
diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
index 73198adac8..15d1f5ea00 100644
--- a/hw/ppc/e500.c
+++ b/hw/ppc/e500.c
@@ -48,6 +48,7 @@
  #include "hw/net/fsl_etsec/etsec.h"
  #include "hw/i2c/i2c.h"
  #include "hw/irq.h"
+#include "hw/sd/sdhci.h"
  
  #define EPAPR_MAGIC(0x45504150)

  #define DTC_LOAD_PAD   0x180
@@ -66,11 +67,14 @@
  #define MPC8544_SERIAL1_REGS_OFFSET 0x4600ULL
  #define MPC8544_PCI_REGS_OFFSET0x8000ULL
  #define MPC8544_PCI_REGS_SIZE  0x1000ULL
+#define MPC85XX_ESDHC_REGS_OFFSET  0x2e000ULL
+#define MPC85XX_ESDHC_REGS_SIZE0x1000ULL
  #define MPC8544_UTIL_OFFSET0xeULL
  #define MPC8XXX_GPIO_OFFSET0x000FF000ULL
  #define MPC8544_I2C_REGS_OFFSET0x3000ULL
  #define MPC8XXX_GPIO_IRQ   47
  #define MPC8544_I2C_IRQ43
+#define MPC85XX_ESDHC_IRQ  72
  #define RTC_REGS_OFFSET0x68
  
  #define PLATFORM_CLK_FREQ_HZ   (400 * 1000 * 1000)

@@ -203,6 +207,22 @@ static void dt_i2c_create(void *fdt, const char *soc, 
const char *mpic,
  g_free(i2c);
  }
  
+static void dt_sdhc_create(void *fdt, const char *parent, const char *mpic)

+{
+hwaddr mmio = MPC85XX_ESDHC_REGS_OFFSET;
+hwaddr size = MPC85XX_ESDHC_REGS_SIZE;
+int irq = MPC85XX_ESDHC_IRQ;
+g_autofree char *name = NULL;
+
+name = g_strdup_printf("%s/sdhc@%" PRIx64, parent, mmio);
+qemu_fdt_add_subnode(fdt, name);
+qemu_fdt_setprop(fdt, name, "sdhci,auto-cmd12", NULL, 0);
+qemu_fdt_setprop_phandle(fdt, name, "interrupt-parent", mpic);
+qemu_fdt_setprop_cells(fdt, name, "bus-width", 4);
+qemu_fdt_setprop_cells(fdt, name, "interrupts", irq, 0x2);
+qemu_fdt_setprop_cells(fdt, name, "reg", mmio, size);
+qemu_fdt_setprop_string(fdt, name, "compatible", "fsl,esdhc");
+}
  
  typedef struct PlatformDevtreeData {

  void *fdt;
@@ -553,6 +573,10 @@ static int ppce500_load_device_tree(PPCE500MachineState 
*pms,
  
  dt_rtc_create(fdt, "i2c", "rtc");
  
+/* sdhc */

+if (pmc->has_esdhc) {
+dt_sdhc_create(fdt, soc, mpic);
+}
  
  gutil = g_strdup_printf("%s/global-utilities@%llx", soc,

  MPC8544_UTIL_OFFSET);
@@ -982,7 +1006,8 @@ void ppce500_init(MachineState *machine)
 0, qdev_get_gpio_in(mpicdev, 42), 399193,
 serial_hd(1), DEVICE_BIG_ENDIAN);
  }
-/* I2C */
+
+/* I2C */
  dev = qdev_new("mpc-i2c");
  s = SYS_BUS_DEVICE(dev);
  sysbus_realize_and_unref(s, _fatal);
@@ -992,6 +1017,14 @@ void ppce500_init(MachineState *machine)
  i2c = (I2CBus *)qdev_get_child_bus(dev, "i2c");
  i2c_slave_create_simple(i2c, "ds1338", RTC_REGS_OFFSET);
  
+/* eSDHC */

+if (pmc->has_esdhc) {
+dev = qdev_new(TYPE_FSL_ESDHC);
+s = SYS_BUS_DEVICE(dev);
+sysbus_realize_and_unref(s, _fatal);
+sysbus_mmio_map(s, 0, pmc->ccsrbar_base + MPC85XX_ESDHC_REGS_OFFSET);
+sysbus_connect_irq(s, 0,

Re: [PATCH v4 5/7] hw/ppc/e500: Implement pflash handling

2022-10-26 Thread Daniel Henrique Barboza





On 10/18/22 18:01, Bernhard Beschow wrote:

Allows e500 boards to have their root file system reside on flash using
only builtin devices located in the eLBC memory region.

Note that the flash memory area is only created when a -pflash argument is
given, and that the size is determined by the given file. The idea is to
put users into control.

Signed-off-by: Bernhard Beschow 
---


Reviewed-by: Daniel Henrique Barboza 


  docs/system/ppc/ppce500.rst | 16 
  hw/ppc/Kconfig  |  1 +
  hw/ppc/e500.c   | 79 +
  3 files changed, 96 insertions(+)

diff --git a/docs/system/ppc/ppce500.rst b/docs/system/ppc/ppce500.rst
index 7b5eb3c4ee..38f8ceb0cf 100644
--- a/docs/system/ppc/ppce500.rst
+++ b/docs/system/ppc/ppce500.rst
@@ -165,3 +165,19 @@ if “-device eTSEC” is given to QEMU:
  .. code-block:: bash
  
-netdev tap,ifname=tap0,script=no,downscript=no,id=net0 -device eTSEC,netdev=net0

+
+Root file system on flash drive
+---
+
+Rather than using a root file system on ram disk, it is possible to have it on
+CFI flash. Given an ext2 image whose size must be a power of two, it can be 
used
+as follows:
+
+.. code-block:: bash
+
+  $ qemu-system-ppc64 -M ppce500 -cpu e500mc -smp 4 -m 2G \
+  -display none -serial stdio \
+  -kernel vmlinux \
+  -drive if=pflash,file=/path/to/rootfs.ext2,format=raw \
+  -append "rootwait root=/dev/mtdblock0"
+
diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
index 791fe78a50..769a1ead1c 100644
--- a/hw/ppc/Kconfig
+++ b/hw/ppc/Kconfig
@@ -126,6 +126,7 @@ config E500
  select ETSEC
  select GPIO_MPC8XXX
  select OPENPIC
+select PFLASH_CFI01
  select PLATFORM_BUS
  select PPCE500_PCI
  select SERIAL
diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
index 3e950ea3ba..73198adac8 100644
--- a/hw/ppc/e500.c
+++ b/hw/ppc/e500.c
@@ -23,8 +23,10 @@
  #include "e500-ccsr.h"
  #include "net/net.h"
  #include "qemu/config-file.h"
+#include "hw/block/flash.h"
  #include "hw/char/serial.h"
  #include "hw/pci/pci.h"
+#include "sysemu/block-backend-io.h"
  #include "sysemu/sysemu.h"
  #include "sysemu/kvm.h"
  #include "sysemu/reset.h"
@@ -267,6 +269,31 @@ static void sysbus_device_create_devtree(SysBusDevice 
*sbdev, void *opaque)
  }
  }
  
+static void create_devtree_flash(SysBusDevice *sbdev,

+ PlatformDevtreeData *data)
+{
+g_autofree char *name = NULL;
+uint64_t num_blocks = object_property_get_uint(OBJECT(sbdev),
+   "num-blocks",
+   _fatal);
+uint64_t sector_length = object_property_get_uint(OBJECT(sbdev),
+  "sector-length",
+  _fatal);
+uint64_t bank_width = object_property_get_uint(OBJECT(sbdev),
+   "width",
+   _fatal);
+hwaddr flashbase = 0;
+hwaddr flashsize = num_blocks * sector_length;
+void *fdt = data->fdt;
+
+name = g_strdup_printf("%s/nor@%" PRIx64, data->node, flashbase);
+qemu_fdt_add_subnode(fdt, name);
+qemu_fdt_setprop_string(fdt, name, "compatible", "cfi-flash");
+qemu_fdt_setprop_sized_cells(fdt, name, "reg",
+ 1, flashbase, 1, flashsize);
+qemu_fdt_setprop_cell(fdt, name, "bank-width", bank_width);
+}
+
  static void platform_bus_create_devtree(PPCE500MachineState *pms,
  void *fdt, const char *mpic)
  {
@@ -276,6 +303,8 @@ static void platform_bus_create_devtree(PPCE500MachineState 
*pms,
  uint64_t addr = pmc->platform_bus_base;
  uint64_t size = pmc->platform_bus_size;
  int irq_start = pmc->platform_bus_first_irq;
+SysBusDevice *sbdev;
+bool ambiguous;
  
  /* Create a /platform node that we can put all devices into */
  
@@ -302,6 +331,13 @@ static void platform_bus_create_devtree(PPCE500MachineState *pms,

  /* Loop through all dynamic sysbus devices and create nodes for them */
  foreach_dynamic_sysbus_device(sysbus_device_create_devtree, );
  
+sbdev = SYS_BUS_DEVICE(object_resolve_path_type("", TYPE_PFLASH_CFI01,

+));
+if (sbdev) {
+assert(!ambiguous);
+create_devtree_flash(sbdev, );
+}
+
  g_free(node);
  }
  
@@ -856,6 +892,7 @@ void ppce500_init(MachineState *machine)

  unsigned int pci_irq_nrs[PCI_NUM_PINS] = {1, 2, 3, 4};
  IrqLines *irqs;
  DeviceState *dev, *mpicdev;
+DriveInfo *dinfo;
  CPUPPCState *firstenv = NULL;
  MemoryRegion *ccsr_addr_space;
  SysBusDevice *s;
@@ -1024,6 +1061,48 @@ void ppce500_init(MachineState *machine)
  pmc->platform_bus_base,

Re: [PATCH 3/3] qemu-img: Speed up checksum

2022-10-26 Thread Hanna Reitz


On 01.09.22 16:32, Nir Soffer wrote:

Add coroutine based loop inspired by `qemu-img convert` design.

Changes compared to `qemu-img convert`:

- State for the entire image is kept in ImgChecksumState

- State for single worker coroutine is kept in ImgChecksumworker.

- "Writes" are always in-order, ensured using a queue.

- Calling block status once per image extent, when the current extent is
   consumed by the workers.

- Using 1m buffer size - testings shows that this gives best read
   performance both with buffered and direct I/O.


Why does patch 1 then choose to use 2 MB?


- Number of coroutines is not configurable. Testing does not show
   improvement when using more than 8 coroutines.

- Progress include entire image, not only the allocated state.

Comparing to the simple read loop shows that this version is up to 4.67
times faster when computing a checksum for an image full of zeroes. For
real images it is 1.59 times faster with direct I/O, and with buffered
I/O there is no difference.

Test results on Dell PowerEdge R640 in a CentOS Stream 9 container:

| image| size | i/o   | before | after  | change |
|--|--|---||||
| zero [1] |   6g | buffered  | 1.600s ±0.014s | 0.342s ±0.016s |  x4.67 |
| zero |   6g | direct| 4.684s ±0.093s | 2.211s ±0.009s |  x2.12 |
| real [2] |   6g | buffered  | 1.841s ±0.075s | 1.806s ±0.036s |  x1.02 |
| real |   6g | direct| 3.094s ±0.079s | 1.947s ±0.017s |  x1.59 |
| nbd  [3] |   6g | buffered  | 2.455s ±0.183s | 1.808s ±0.016s |  x1.36 |
| nbd  |   6g | direct| 3.540s ±0.020s | 1.749s ±0.018s |  x2.02 |

[1] raw image full of zeroes
[2] raw fedora 35 image with additional random data, 50% full
[3] image [2] exported by qemu-nbd via unix socket

Signed-off-by: Nir Soffer 
---
  qemu-img.c | 343 +
  1 file changed, 270 insertions(+), 73 deletions(-)


Looks good!

Just a couple of style comments below.


diff --git a/qemu-img.c b/qemu-img.c
index 7edcfe4bc8..bfa8e2862f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1613,48 +1613,288 @@ out:
  qemu_vfree(buf2);
  blk_unref(blk2);
  out2:
  blk_unref(blk1);
  out3:
  qemu_progress_end();
  return ret;
  }
  
  #ifdef CONFIG_BLKHASH

+
+#define CHECKSUM_COROUTINES 8
+#define CHECKSUM_BUF_SIZE (1 * MiB)
+#define CHECKSUM_ZERO_SIZE MIN(16 * GiB, SIZE_MAX)
+
+typedef struct ImgChecksumState ImgChecksumState;
+
+typedef struct ImgChecksumWorker {
+QTAILQ_ENTRY(ImgChecksumWorker) entry;
+ImgChecksumState *state;
+Coroutine *co;
+uint8_t *buf;
+
+/* The current chunk. */
+int64_t offset;
+int64_t length;
+bool zero;
+
+/* Always true for zero extent, false for data extent. Set to true
+ * when reading the chunk completes. */


Qemu codestyle requires /* and */ to be on separate lines for multi-line 
comments (see checkpatch.pl).



+bool ready;
+} ImgChecksumWorker;
+
+struct ImgChecksumState {
+const char *filename;
+BlockBackend *blk;
+BlockDriverState *bs;
+int64_t total_size;
+
+/* Current extent, modified in checksum_co_next. */
+int64_t offset;
+int64_t length;
+bool zero;
+
+int running_coroutines;
+CoMutex lock;
+ImgChecksumWorker workers[CHECKSUM_COROUTINES];
+
+/* Ensure in-order updates. Update are scheduled at the tail of the
+ * queue and processed from the head of the queue when a worker is
+ * ready. */


Qemu codestyle requires /* and */ to be on separate lines for multi-line 
comments.



+QTAILQ_HEAD(, ImgChecksumWorker) update_queue;
+
+struct blkhash *hash;
+int ret;
+};


[...]


+static coroutine_fn bool checksum_co_next(ImgChecksumWorker *w)
+{
+ImgChecksumState *s = w->state;
+
+qemu_co_mutex_lock(>lock);
+
+if (s->offset == s->total_size || s->ret != -EINPROGRESS) {
+qemu_co_mutex_unlock(>lock);
+return false;
+}
+
+if (s->length == 0 && checksum_block_status(s)) {


I’d prefer `checksum_block_status(s) < 0` so that it is clear that 
negative values indicate errors.  Otherwise, based on this code alone, 
it looks to me more like `checksum_block_status()` returned a boolean, 
where `false` would generally indicate an error, which is confusing.


Same in other places below.


+qemu_co_mutex_unlock(>lock);
+return false;
+}


[...]


+/* Enter the next worker coroutine if the worker is ready. */
+static void coroutine_fn checksum_co_enter_next(ImgChecksumWorker *w)
+{
+ImgChecksumState *s = w->state;
+ImgChecksumWorker *next;
+
+if (!QTAILQ_EMPTY(>update_queue)) {
+next = QTAILQ_FIRST(>update_queue);
+if (next->ready)
+qemu_coroutine_enter(next->co);


Qemu codestyle requires braces here.

Hanna

Re: [PATCH 2/3] iotests: Test qemu-img checksum

2022-10-26 Thread Hanna Reitz


On 01.09.22 16:32, Nir Soffer wrote:

Add simple tests creating an image with all kinds of extents, different
formats, different backing chain, different protocol, and different
image options. Since all images have the same guest visible content they
must have the same checksum.

To help debugging in case of failures, the output includes a json map of
every test image.

Signed-off-by: Nir Soffer 
---
  tests/qemu-iotests/tests/qemu-img-checksum| 149 ++
  .../qemu-iotests/tests/qemu-img-checksum.out  |  74 +
  2 files changed, 223 insertions(+)
  create mode 100755 tests/qemu-iotests/tests/qemu-img-checksum
  create mode 100644 tests/qemu-iotests/tests/qemu-img-checksum.out

diff --git a/tests/qemu-iotests/tests/qemu-img-checksum 
b/tests/qemu-iotests/tests/qemu-img-checksum
new file mode 100755
index 00..3a85ba33f2
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-checksum
@@ -0,0 +1,149 @@
+#!/usr/bin/env python3
+# group: rw auto quick
+#
+# Test cases for qemu-img checksum.
+#
+# Copyright (C) 2022 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+import re
+
+import iotests
+
+from iotests import (
+filter_testfiles,
+qemu_img,
+qemu_img_log,
+qemu_io,
+qemu_nbd_popen,
+)
+
+
+def checksum_available():
+out = qemu_img("--help").stdout
+return re.search(r"\bchecksum .+ filename\b", out) is not None
+
+
+if not checksum_available():
+iotests.notrun("checksum command not available")
+
+iotests.script_initialize(
+supported_fmts=["raw", "qcow2"],
+supported_cache_modes=["none", "writeback"],


It doesn’t work with writeback, though, because it uses -T none below.

Which by the way is a heavy cost, because I usually run tests in tmpfs, 
where this won’t work.  Is there any way of not doing the -T none below?



+supported_protocols=["file", "nbd"],
+required_fmts=["raw", "qcow2"],
+)
+
+print()
+print("=== Test images ===")
+print()
+
+disk_raw = iotests.file_path('raw')
+qemu_img("create", "-f", "raw", disk_raw, "10m")
+qemu_io("-f", "raw",
+"-c", "write -P 0x1 0 2m",  # data
+"-c", "write -P 0x0 2m 2m", # data with zeroes
+"-c", "write -z 4m 2m", # zero allocated
+"-c", "write -z -u 6m 2m",  # zero hole
+# unallocated
+disk_raw)
+print(filter_testfiles(disk_raw))
+qemu_img_log("map", "--output", "json", disk_raw)
+
+disk_qcow2 = iotests.file_path('qcow2')
+qemu_img("create", "-f", "qcow2", disk_qcow2, "10m")
+qemu_io("-f", "qcow2",
+"-c", "write -P 0x1 0 2m",  # data
+"-c", "write -P 0x0 2m 2m", # data with zeroes
+"-c", "write -z 4m 2m", # zero allocated
+"-c", "write -z -u 6m 2m",  # zero hole
+# unallocated
+disk_qcow2)
+print(filter_testfiles(disk_qcow2))
+qemu_img_log("map", "--output", "json", disk_qcow2)


This isn’t how iotests work, generally.  When run with -qcow2 -file, it 
should only test qcow2 on file, not raw on file, not raw on nbd. Perhaps 
this way this test could even support other formats than qcow2 and raw.


Hanna

Re: [PATCH 1/3] qemu-img: Add checksum command

2022-10-26 Thread Hanna Reitz


On 01.09.22 16:32, Nir Soffer wrote:

The checksum command compute a checksum for disk image content using the
blkhash library[1]. The blkhash library is not packaged yet, but it is
available via copr[2].

Example run:

 $ ./qemu-img checksum -p fedora-35.qcow2
 6e5c00c995056319d52395f8d91c7f84725ae3da69ffcba4de4c7d22cff713a5  
fedora-35.qcow2

The block checksum is constructed by splitting the image to fixed sized
blocks and computing a digest of every block. The image checksum is the
digest of the all block digests.

The checksum uses internally the "sha256" algorithm but it cannot be
compared with checksums created by other tools such as `sha256sum`.

The blkhash library supports sparse images, zero detection, and
optimizes zero block hashing (they are practically free). The library
uses multiple threads to speed up the computation.

Comparing to `sha256sum`, `qemu-img checksum` is 3.5-4800[3] times
faster, depending on the amount of data in the image:

 $ ./qemu-img info /scratch/50p.raw
 file format: raw
 virtual size: 6 GiB (6442450944 bytes)
 disk size: 2.91 GiB

 $ hyperfine -w2 -r5 -p "sleep 1" "./qemu-img checksum /scratch/50p.raw" \
  "sha256sum /scratch/50p.raw"
 Benchmark 1: ./qemu-img checksum /scratch/50p.raw
   Time (mean ± σ):  1.849 s ±  0.037 s[User: 7.764 s, System: 
0.962 s]
   Range (min … max):1.813 s …  1.908 s5 runs

 Benchmark 2: sha256sum /scratch/50p.raw
   Time (mean ± σ): 14.585 s ±  0.072 s[User: 13.537 s, System: 
1.003 s]
   Range (min … max):   14.501 s … 14.697 s5 runs

 Summary
   './qemu-img checksum /scratch/50p.raw' ran
 7.89 ± 0.16 times faster than 'sha256sum /scratch/50p.raw'

The new command is available only when `blkhash` is available during
build. To test the new command please install the `blkhash-devel`
package:

 $ dnf copr enable nsoffer/blkhash
 $ sudo dnf install blkhash-devel

[1] https://gitlab.com/nirs/blkhash
[2] https://copr.fedorainfracloud.org/coprs/nsoffer/blkhash/
[3] Computing checksum for 8T empty image: qemu-img checksum: 3.7s,
 sha256sum (estimate): 17,749s

Signed-off-by: Nir Soffer 
---
  docs/tools/qemu-img.rst |  22 +
  meson.build |  10 ++-
  meson_options.txt   |   2 +
  qemu-img-cmds.hx|   8 ++
  qemu-img.c  | 191 
  5 files changed, 232 insertions(+), 1 deletion(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 85a6e05b35..8be9c45cbf 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -347,20 +347,42 @@ Command description:
  Check completed, image is corrupted
3
  Check completed, image has leaked clusters, but is not corrupted
63
  Checks are not supported by the image format
  
If ``-r`` is specified, exit codes representing the image state refer to the

state after (the attempt at) repairing it. That is, a successful ``-r all``
will yield the exit code 0, independently of the image state before.
  
+.. option:: checksum [--object OBJECTDEF] [--image-opts] [-f FMT] [-T SRC_CACHE] [-p] FILENAME

+
+  Print a checksum for image *FILENAME* guest visible content.


Why not say which kind of checksum it is?


 Images with
+  different format or settings wil have the same checksum.


s/wil/will/


+
+  The format is probed unless you specify it by ``-f``.
+
+  The checksum is computed for guest visible content. Allocated areas full of
+  zeroes, zero clusters, and unallocated areas are read as zeros so they will
+  have the same checksum. Images with single or multiple files or backing files
+  will have the same checksums if the guest will see the same content when
+  reading the image.
+
+  Image metadata that is not visible to the guest such as dirty bitmaps does
+  not affect the checksum.
+
+  Computing a checksum requires a read-only image. You cannot compute a
+  checksum of an active image used by a guest,


Makes me ask: Why not?  Other subcommands have the -U flag for this.


 but you can compute a checksum
+  of a guest during pull mode incremental backup using NBD URL.
+
+  The checksum is not compatible with other tools such as *sha256sum*.


Why not?  I can see it differs even for raw images, but why?  I would 
have very much assumed that this gives me exactly what sha256sum in the 
guest on the guest device would yield.



+
  .. option:: commit [--object OBJECTDEF] [--image-opts] [-q] [-f FMT] [-t 
CACHE] [-b BASE] [-r RATE_LIMIT] [-d] [-p] FILENAME
  
Commit the changes recorded in *FILENAME* in its base image or backing file.

If the backing file is smaller than the snapshot, then the backing file 
will be
resized to be the same size as the snapshot.  If the snapshot is smaller 
than
the backing file,

Re: [PATCH v2 19/43] hw/isa/piix3: Allow board to provide PCI interrupt routes

2022-10-26 Thread Bernhard Beschow

Hi Phil,

Am 25. Oktober 2022 23:34:15 UTC schrieb "Philippe Mathieu-Daudé" 
:
>On 22/10/22 17:04, Bernhard Beschow wrote:
>> PIIX3 initializes the PIRQx route control registers to the default
>> values as described in the 82371AB PCI-TO-ISA/IDE XCELERATOR (PIIX4)
>> April 1997 manual. PIIX4, however, initializes the routes according to
>> the Malta™ User’s Manual, ch 6.6, which are IRQs 10 and 11. In order to
>> allow the reset methods to be consolidated, allow board code to specify
>> the routes.
>> 
>> Signed-off-by: Bernhard Beschow 
>> ---
>>   hw/isa/piix3.c| 12 
>>   include/hw/southbridge/piix.h |  1 +
>>   2 files changed, 9 insertions(+), 4 deletions(-)
>> 
>> diff --git a/hw/isa/piix3.c b/hw/isa/piix3.c
>> index aa32f43e4a..c6a8f1f27d 100644
>> --- a/hw/isa/piix3.c
>> +++ b/hw/isa/piix3.c
>> @@ -168,10 +168,10 @@ static void piix3_reset(DeviceState *dev)
>>   pci_conf[0x4c] = 0x4d;
>>   pci_conf[0x4e] = 0x03;
>>   pci_conf[0x4f] = 0x00;
>> -pci_conf[0x60] = 0x80;
>> -pci_conf[0x61] = 0x80;
>> -pci_conf[0x62] = 0x80;
>> -pci_conf[0x63] = 0x80;
>
>These values are the correct reset ones however (also for PIIX4).
>
>The problem is the Malta machine abuse of the PIIX4 model. IOW
>this doesn't seem the correct approach, we should fix how Malta
>use the PIIX4 (in the emulated tiny boot loader). I'll try to
>write smth before reviewing the rest of this series, because
>it might simplify your refactor.

Indeed my first approach for the refactor was to implement MachineClass::reset 
for Malta where the Malta-specific values would be set. I could redo that if 
you want. Just let me know.

Best regards,
Bernhard
>
>> +pci_conf[PIIX_PIRQCA] = d->pci_irq_reset_mappings[0];
>> +pci_conf[PIIX_PIRQCB] = d->pci_irq_reset_mappings[1];
>> +pci_conf[PIIX_PIRQCC] = d->pci_irq_reset_mappings[2];
>> +pci_conf[PIIX_PIRQCD] = d->pci_irq_reset_mappings[3];
>>   pci_conf[0x69] = 0x02;
>>   pci_conf[0x70] = 0x80;
>>   pci_conf[0x76] = 0x0c;
>> @@ -383,6 +383,10 @@ static void pci_piix3_init(Object *obj)
>> static Property pci_piix3_props[] = {
>>   DEFINE_PROP_UINT32("smb_io_base", PIIX3State, smb_io_base, 0),
>> +DEFINE_PROP_UINT8("pirqa", PIIX3State, pci_irq_reset_mappings[0], 0x80),
>> +DEFINE_PROP_UINT8("pirqb", PIIX3State, pci_irq_reset_mappings[1], 0x80),
>> +DEFINE_PROP_UINT8("pirqc", PIIX3State, pci_irq_reset_mappings[2], 0x80),
>> +DEFINE_PROP_UINT8("pirqd", PIIX3State, pci_irq_reset_mappings[3], 0x80),
>>   DEFINE_PROP_BOOL("has-acpi", PIIX3State, has_acpi, true),
>>   DEFINE_PROP_BOOL("has-usb", PIIX3State, has_usb, true),
>>   DEFINE_PROP_BOOL("smm-enabled", PIIX3State, smm_enabled, false),
>> diff --git a/include/hw/southbridge/piix.h b/include/hw/southbridge/piix.h
>> index 1f22eb1444..df3e0084c5 100644
>> --- a/include/hw/southbridge/piix.h
>> +++ b/include/hw/southbridge/piix.h
>> @@ -54,6 +54,7 @@ struct PIIXState {
>> /* This member isn't used. Just for save/load compatibility */
>>   int32_t pci_irq_levels_vmstate[PIIX_NUM_PIRQS];
>> +uint8_t pci_irq_reset_mappings[PIIX_NUM_PIRQS];
>> ISAPICState pic;
>>   RTCState rtc;
>

[PATCH v1] block/rbd: Add support for layered encryption

2022-10-26 Thread Or Ozeri

Starting from ceph Reef, RBD has built-in support for layered encryption,
where each ancestor image (in a cloned image setting) can be possibly
encrypted using a unique passphrase.

A new function, rbd_encryption_load2, was added to librbd API.
This new function supports an array of passphrases (via "spec" structs).

This commit extends the qemu rbd driver API to use this new librbd API,
in order to support this new layered encryption feature.

Signed-off-by: Or Ozeri 
---
 block/rbd.c  | 134 ++-
 qapi/block-core.json |  33 ++-
 2 files changed, 163 insertions(+), 4 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index f826410f40..09953687c9 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -71,6 +71,16 @@ static const char rbd_luks2_header_verification[
 'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2
 };
 
+static const char rbd_layered_luks_header_verification[
+RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
+'R', 'B', 'D', 'L', 0xBA, 0xBE, 0, 1
+};
+
+static const char rbd_layered_luks2_header_verification[
+RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
+'R', 'B', 'D', 'L', 0xBA, 0xBE, 0, 2
+};
+
 typedef enum {
 RBD_AIO_READ,
 RBD_AIO_WRITE,
@@ -470,6 +480,9 @@ static int qemu_rbd_encryption_load(rbd_image_t image,
 size_t passphrase_len;
 rbd_encryption_luks1_format_options_t luks_opts;
 rbd_encryption_luks2_format_options_t luks2_opts;
+#ifdef LIBRBD_SUPPORTS_ENCRYPTION_LOAD2
+rbd_encryption_luks_format_options_t luks_all_opts;
+#endif
 rbd_encryption_format_t format;
 rbd_encryption_options_t opts;
 size_t opts_size;
@@ -505,6 +518,23 @@ static int qemu_rbd_encryption_load(rbd_image_t image,
 luks2_opts.passphrase_size = passphrase_len;
 break;
 }
+#ifdef LIBRBD_SUPPORTS_ENCRYPTION_LOAD2
+case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS_ALL: {
+memset(_all_opts, 0, sizeof(luks_all_opts));
+format = RBD_ENCRYPTION_FORMAT_LUKS;
+opts = _all_opts;
+opts_size = sizeof(luks_all_opts);
+r = qemu_rbd_convert_luks_options(
+
qapi_RbdEncryptionOptionsLUKSAll_base(>u.luks_all),
+, _len, errp);
+if (r < 0) {
+return r;
+}
+luks_all_opts.passphrase = passphrase;
+luks_all_opts.passphrase_size = passphrase_len;
+break;
+}
+#endif
 default: {
 r = -ENOTSUP;
 error_setg_errno(
@@ -522,6 +552,87 @@ static int qemu_rbd_encryption_load(rbd_image_t image,
 
 return 0;
 }
+
+#ifdef LIBRBD_SUPPORTS_ENCRYPTION_LOAD2
+static int qemu_rbd_encryption_load2(rbd_image_t image,
+ RbdEncryptionOptions *encrypt,
+ Error **errp)
+{
+int r = 0;
+int encryption_options_count = 1;
+int spec_count = 0;
+int passphrase_count = 0;
+int i;
+RbdEncryptionOptions *curr_encrypt;
+rbd_encryption_spec_t *specs;
+rbd_encryption_spec_t *curr_spec;
+rbd_encryption_luks_format_options_t* luks_all_opts;
+char **passphrases;
+char **curr_passphrase;
+
+/* count encryption options */
+for (curr_encrypt = encrypt; curr_encrypt->has_parent;
+ curr_encrypt = curr_encrypt->parent, ++encryption_options_count) {
+}
+
+specs = g_new0(rbd_encryption_spec_t, encryption_options_count);
+passphrases = g_new0(char*, encryption_options_count);
+
+curr_encrypt = encrypt;
+for (i = 0; i < encryption_options_count; ++i) {
+if (curr_encrypt->format != RBD_IMAGE_ENCRYPTION_FORMAT_LUKS_ALL) {
+r = -ENOTSUP;
+error_setg_errno(
+errp, -r, "unknown image encryption format: %u",
+curr_encrypt->format);
+goto exit;
+}
+
+curr_spec = [i];
+curr_passphrase = [i];
+curr_spec->format = RBD_ENCRYPTION_FORMAT_LUKS;
+curr_spec->opts_size = sizeof(rbd_encryption_luks_format_options_t);
+
+luks_all_opts = g_new0(rbd_encryption_luks_format_options_t, 1);
+curr_spec->opts = luks_all_opts;
+++spec_count;
+memset(luks_all_opts, 0, sizeof(rbd_encryption_luks_format_options_t));
+
+r = qemu_rbd_convert_luks_options(
+qapi_RbdEncryptionOptionsLUKSAll_base(
+_encrypt->u.luks_all),
+curr_passphrase, _all_opts->passphrase_size,
+errp);
+if (r < 0) {
+goto exit;
+}
+
+++passphrase_count;
+luks_all_opts->passphrase = *curr_passphrase;
+
+curr_encrypt = curr_encrypt->parent;
+}
+
+r = rbd_encryption_load2(image, specs, spec_count);
+if (r < 0) {
+error_setg_errno(errp, -r, "encryption load (2) fail");
+goto exit;
+}
+
+exit:
+for (i = 0; i < spec_count; ++i) {
+

51 matches

Mail list logo