Re: [PATCH 07/13] RFC migration: icp/server is a mess

2023-10-20 Thread Greg Kurz
On Fri, 20 Oct 2023 17:49:38 +1000
"Nicholas Piggin"  wrote:

> On Fri Oct 20, 2023 at 7:39 AM AEST, Greg Kurz wrote:
> > On Thu, 19 Oct 2023 21:08:25 +0200
> > Juan Quintela  wrote:
> >
> > > Current code does:
> > > - register pre_2_10_vmstate_dummy_icp with "icp/server" and instance
> > >   depending on cpu number
> > > - for newer machines, it registers vmstate_icp with "icp/server" name
> > >   and instance 0
> > > - now it unregisters "icp/server" for the 1st instance.
> > > 
> > > This is wrong at many levels:
> > > - we shouldn't have two VMStateDescriptions with the same name
> > > - In case this is the only solution that we can come up with, it needs to
> > >   be:
> > >   * register pre_2_10_vmstate_dummy_icp
> > >   * unregister pre_2_10_vmstate_dummy_icp
> > >   * register real vmstate_icp
> > > 
> > > As the initialization of this machine is already complex enough, I
> > > need help from PPC maintainers to fix this.
> > > 
> > > Volunteers?
> > > 
> > > CC: Cedric Le Goater 
> > > CC: Daniel Henrique Barboza 
> > > CC: David Gibson 
> > > CC: Greg Kurz 
> > > 
> > > Signed-off-by: Juan Quintela 
> > > ---
> > >  hw/ppc/spapr.c | 7 ++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index cb840676d3..8531d13492 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -143,7 +143,12 @@ static bool pre_2_10_vmstate_dummy_icp_needed(void *opaque)
> > >  }
> > >  
> > >  static const VMStateDescription pre_2_10_vmstate_dummy_icp = {
> > > -    .name = "icp/server",
> > > +    /*
> > > +     * Hack ahead.  We can't have two devices with the same name and
> > > +     * instance id.  So I rename this to pass make check.
> > > +     * Real help from people who know the hardware is needed.
> > > +     */
> > > +    .name = "pre-2.10-icp/server",
> > >      .version_id = 1,
> > >      .minimum_version_id = 1,
> > >      .needed = pre_2_10_vmstate_dummy_icp_needed,
> >
> > I guess this fix is acceptable as well and a lot simpler than
> > reverting the hack actually. Outcome is the same : drop
> > compat with pseries-2.9 and older.
> >
> > Reviewed-by: Greg Kurz 
> 
> So the reason we can't have duplicate names registered, aside from it
> surely going bad if we actually send or receive a stream at the point
> they are registered, is the duplicate check introduced in patch 9? But
> before that, this hack does seem to actually work because the duplicate
> is unregistered right away.
> 

Correct.

> If I understand the workaround, there is an asymmetry in the migration
> sequence in that receiving an unexpected object would cause a failure,
> but going from newer to older would just skip some "expected" objects
> and that didn't cause a problem. So you only have to deal with ignoring
> the unexpected ones going from older to newer.
> 

Correct.

> Side question, is it possible to flag the problem of *not* receiving
> an object that you did expect? That might be a source of bugs too.
> 

AFAICR we try to only migrate state that differs from reset : the
destination cannot really assume it will receive anything for a
given device.
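
FWIW, the asymmetry can be sketched roughly as follows. This is a minimal
model, not QEMU's actual loader: the struct and function names are made up
for illustration. The destination walks the incoming sections and fails on
any (name, instance id) it has no handler for, while a handler that never
sees a section is not an error and simply stays at its reset state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a handler registered on the destination. */
struct sketch_handler {
    const char *name;
    int instance_id;
    bool loaded;
};

/* Hypothetical, simplified model of a section found in the stream. */
struct sketch_section {
    const char *name;
    int instance_id;
};

/* Returns 0 on success, -1 if the stream carries a section nobody registered. */
static int sketch_load_stream(struct sketch_handler *h, size_t nh,
                              const struct sketch_section *s, size_t ns)
{
    assert(h != NULL && s != NULL);

    for (size_t i = 0; i < ns; i++) {
        bool found = false;

        for (size_t j = 0; j < nh; j++) {
            if (h[j].instance_id == s[i].instance_id &&
                strcmp(h[j].name, s[i].name) == 0) {
                h[j].loaded = true;
                found = true;
                break;
            }
        }
        if (!found) {
            return -1;  /* unexpected section: the load fails */
        }
    }
    /* Handlers still marked !loaded are fine: they keep their reset state. */
    return 0;
}
```

In this model, flagging a section that was expected but never received would
mean checking the loaded flags after the loop, which is exactly what cannot
be done if only state that differs from reset is sent.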

> Anyway, I wonder if we could fix this spapr problem by adding a special
> case wild card instance matcher to ignore it? It's still a bit hacky
> but maybe a bit nicer. I don't mind deprecating the machine soon if
> you want to clear the wildcard hack away soon, but it would be nice to
> separate the deprecation and removal from the fix, if possible.
> 
> This patch is not tested but hopefully helps illustrate the idea.
> 

I'm not sure this will fly with older QEMUs that don't know about
VMSTATE_INSTANCE_ID_WILD... but I'll let Juan comment on that.

> Thanks,
> Nick
> 

Cheers,

--
Greg

> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 1a31fb7293..8ce03edefa 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -1205,6 +1205,7 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
>  bool vmstate_save_needed(const VMStateDescription *vmsd, void *opaque);
>  
>  #define  VMSTATE_INSTANCE_ID_ANY  -1
> +#define  VMSTATE_INSTANCE_ID_WILD -2
>  
>  /* Returns: 0 on success, -1 on failure */
>  int vmstate_register_with_alia

Re: [PATCH 07/13] RFC migration: icp/server is a mess

2023-10-20 Thread Greg Kurz
On Fri, 20 Oct 2023 09:30:44 +0200
Juan Quintela  wrote:

> Greg Kurz  wrote:
> > On Thu, 19 Oct 2023 21:08:25 +0200
> > Juan Quintela  wrote:
> >
> >> Current code does:
> >> - register pre_2_10_vmstate_dummy_icp with "icp/server" and instance
> >>   depending on cpu number
> >> - for newer machines, it registers vmstate_icp with "icp/server" name
> >>   and instance 0
> >> - now it unregisters "icp/server" for the 1st instance.
> >> 
> >> This is wrong at many levels:
> >> - we shouldn't have two VMStateDescriptions with the same name
> >> - In case this is the only solution that we can come up with, it needs to
> >>   be:
> >>   * register pre_2_10_vmstate_dummy_icp
> >>   * unregister pre_2_10_vmstate_dummy_icp
> >>   * register real vmstate_icp
> >> 
> >> As the initialization of this machine is already complex enough, I
> >> need help from PPC maintainers to fix this.
> >> 
> >> Volunteers?
> >> 
> >> CC: Cedric Le Goater 
> >> CC: Daniel Henrique Barboza 
> >> CC: David Gibson 
> >> CC: Greg Kurz 
> >> 
> >> Signed-off-by: Juan Quintela 
> >> ---
> >>  hw/ppc/spapr.c | 7 ++-
> >>  1 file changed, 6 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index cb840676d3..8531d13492 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -143,7 +143,12 @@ static bool pre_2_10_vmstate_dummy_icp_needed(void *opaque)
> >>  }
> >>  
> >>  static const VMStateDescription pre_2_10_vmstate_dummy_icp = {
> >> -    .name = "icp/server",
> >> +    /*
> >> +     * Hack ahead.  We can't have two devices with the same name and
> >> +     * instance id.  So I rename this to pass make check.
> >> +     * Real help from people who know the hardware is needed.
> >> +     */
> >> +    .name = "pre-2.10-icp/server",
> >>      .version_id = 1,
> >>      .minimum_version_id = 1,
> >>      .needed = pre_2_10_vmstate_dummy_icp_needed,
> >
> > I guess this fix is acceptable as well and a lot simpler than
> > reverting the hack actually. Outcome is the same : drop
> > compat with pseries-2.9 and older.
> >
> > Reviewed-by: Greg Kurz 
> 
> I fully agree with you here.
> The other option given on this thread is to deprecate those machines, but I
> would like to have this series sooner than 2 releases.

Yeah and, especially, the deprecation of all these machine types is
itself a massive chunk of work, as it will require identifying and
removing other related workarounds as well. Given that pretty much
everyone working on PPC/PAPR has moved away, can the community handle
such a big change?

>  And what ppc is
> doing here is (and has always been) a hack and an abuse of how
> vmstate registration is supposed to work.
> 

Sorry again... We should have involved migration experts at the time. :-)

> Thanks, Juan.
> 

Cheers,

-- 
Greg



Re: [PATCH 07/13] RFC migration: icp/server is a mess

2023-10-19 Thread Greg Kurz
On Thu, 19 Oct 2023 21:08:25 +0200
Juan Quintela  wrote:

> Current code does:
> - register pre_2_10_vmstate_dummy_icp with "icp/server" and instance
>   depending on cpu number
> - for newer machines, it registers vmstate_icp with "icp/server" name
>   and instance 0
> - now it unregisters "icp/server" for the 1st instance.
> 
> This is wrong at many levels:
> - we shouldn't have two VMStateDescriptions with the same name
> - In case this is the only solution that we can come up with, it needs to
>   be:
>   * register pre_2_10_vmstate_dummy_icp
>   * unregister pre_2_10_vmstate_dummy_icp
>   * register real vmstate_icp
> 
> As the initialization of this machine is already complex enough, I
> need help from PPC maintainers to fix this.
> 
> Volunteers?
> 
> CC: Cedric Le Goater 
> CC: Daniel Henrique Barboza 
> CC: David Gibson 
> CC: Greg Kurz 
> 
> Signed-off-by: Juan Quintela 
> ---
>  hw/ppc/spapr.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index cb840676d3..8531d13492 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -143,7 +143,12 @@ static bool pre_2_10_vmstate_dummy_icp_needed(void *opaque)
>  }
>  
>  static const VMStateDescription pre_2_10_vmstate_dummy_icp = {
> -    .name = "icp/server",
> +    /*
> +     * Hack ahead.  We can't have two devices with the same name and
> +     * instance id.  So I rename this to pass make check.
> +     * Real help from people who know the hardware is needed.
> +     */
> +    .name = "pre-2.10-icp/server",
>      .version_id = 1,
>      .minimum_version_id = 1,
>      .needed = pre_2_10_vmstate_dummy_icp_needed,

I guess this fix is acceptable as well and a lot simpler than
reverting the hack actually. Outcome is the same : drop
compat with pseries-2.9 and older.

Reviewed-by: Greg Kurz 

-- 
Greg



Re: [PATCH 07/13] RFC migration: icp/server is a mess

2023-10-19 Thread Greg Kurz
Hi Juan,

On Thu, 19 Oct 2023 21:08:25 +0200
Juan Quintela  wrote:

> Current code does:
> - register pre_2_10_vmstate_dummy_icp with "icp/server" and instance
>   depending on cpu number
> - for newer machines, it registers vmstate_icp with "icp/server" name
>   and instance 0
> - now it unregisters "icp/server" for the 1st instance.
> 

Heh, I remember this hack... it was needed because some rework in
the interrupt controller broke migration.

> This is wrong at many levels:
> - we shouldn't have two VMStateDescriptions with the same name

I don't know how bad it is. The idea here is to send extra
state in the stream because older QEMUs expect it (but won't use
it), so it made sense to keep the same name.

> - In case this is the only solution that we can come up with, it needs to
>   be:
>   * register pre_2_10_vmstate_dummy_icp
>   * unregister pre_2_10_vmstate_dummy_icp
>   * register real vmstate_icp
> 
> As the initialization of this machine is already complex enough, I
> need help from PPC maintainers to fix this.
> 

What about dropping all this code, i.e. basically reverting 46f7afa37096
("spapr: fix migration of ICPState objects from/to older QEMU") ?

Unless we still care to migrate pseries machine types from 2017 of
course...

> Volunteers?
> 

I haven't worked on PPC for almost two years, so I certainly have neither the
time nor the motivation to fix this. At best, I might be able to answer some
questions or review someone else's patch that gets rid of the offending code.

Cheers,

--
Greg


> CC: Cedric Le Goater 
> CC: Daniel Henrique Barboza 
> CC: David Gibson 
> CC: Greg Kurz 
> 
> Signed-off-by: Juan Quintela 
> ---
>  hw/ppc/spapr.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index cb840676d3..8531d13492 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -143,7 +143,12 @@ static bool pre_2_10_vmstate_dummy_icp_needed(void *opaque)
>  }
>  
>  static const VMStateDescription pre_2_10_vmstate_dummy_icp = {
> -    .name = "icp/server",
> +    /*
> +     * Hack ahead.  We can't have two devices with the same name and
> +     * instance id.  So I rename this to pass make check.
> +     * Real help from people who know the hardware is needed.
> +     */
> +    .name = "pre-2.10-icp/server",
>      .version_id = 1,
>      .minimum_version_id = 1,
>      .needed = pre_2_10_vmstate_dummy_icp_needed,



-- 
Greg



Re: [PATCH v5 8/9] fsdev: Use ThrottleDirection instread of bool is_write

2023-08-07 Thread Greg Kurz
On Fri, 28 Jul 2023 10:20:05 +0800
zhenwei pi  wrote:

> The 'bool is_write' style is obsolete in the throttle framework; adapt
> fsdev to the new style.
> 
> Cc: Greg Kurz 
> Reviewed-by: Hanna Czenczek 
> Signed-off-by: zhenwei pi 

Reviewed-by: Greg Kurz 

> ---
>  fsdev/qemu-fsdev-throttle.c | 14 +++---
>  fsdev/qemu-fsdev-throttle.h |  4 ++--
>  hw/9pfs/cofile.c            |  4 ++--
>  3 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/fsdev/qemu-fsdev-throttle.c b/fsdev/qemu-fsdev-throttle.c
> index 1c137d6f0f..d912da906d 100644
> --- a/fsdev/qemu-fsdev-throttle.c
> +++ b/fsdev/qemu-fsdev-throttle.c
> @@ -94,22 +94,22 @@ void fsdev_throttle_init(FsThrottle *fst)
>      }
>  }
>  
> -void coroutine_fn fsdev_co_throttle_request(FsThrottle *fst, bool is_write,
> +void coroutine_fn fsdev_co_throttle_request(FsThrottle *fst,
> +                                            ThrottleDirection direction,
>                                              struct iovec *iov, int iovcnt)
>  {
> -    ThrottleDirection direction = is_write ? THROTTLE_WRITE : THROTTLE_READ;
> -
> +    assert(direction < THROTTLE_MAX);
>      if (throttle_enabled(&fst->cfg)) {
>          if (throttle_schedule_timer(&fst->ts, &fst->tt, direction) ||
> -            !qemu_co_queue_empty(&fst->throttled_reqs[is_write])) {
> -            qemu_co_queue_wait(&fst->throttled_reqs[is_write], NULL);
> +            !qemu_co_queue_empty(&fst->throttled_reqs[direction])) {
> +            qemu_co_queue_wait(&fst->throttled_reqs[direction], NULL);
>          }
>  
>          throttle_account(&fst->ts, direction, iov_size(iov, iovcnt));
>  
> -        if (!qemu_co_queue_empty(&fst->throttled_reqs[is_write]) &&
> +        if (!qemu_co_queue_empty(&fst->throttled_reqs[direction]) &&
>              !throttle_schedule_timer(&fst->ts, &fst->tt, direction)) {
> -            qemu_co_queue_next(&fst->throttled_reqs[is_write]);
> +            qemu_co_queue_next(&fst->throttled_reqs[direction]);
>          }
>      }
>  }
> diff --git a/fsdev/qemu-fsdev-throttle.h b/fsdev/qemu-fsdev-throttle.h
> index a21aecddc7..daa8ca2494 100644
> --- a/fsdev/qemu-fsdev-throttle.h
> +++ b/fsdev/qemu-fsdev-throttle.h
> @@ -23,14 +23,14 @@ typedef struct FsThrottle {
>      ThrottleState ts;
>      ThrottleTimers tt;
>      ThrottleConfig cfg;
> -    CoQueue  throttled_reqs[2];
> +    CoQueue  throttled_reqs[THROTTLE_MAX];
>  } FsThrottle;
>  
>  int fsdev_throttle_parse_opts(QemuOpts *, FsThrottle *, Error **);
>  
>  void fsdev_throttle_init(FsThrottle *);
>  
> -void coroutine_fn fsdev_co_throttle_request(FsThrottle *, bool ,
> +void coroutine_fn fsdev_co_throttle_request(FsThrottle *, ThrottleDirection ,
>                                              struct iovec *, int);
>  
>  void fsdev_throttle_cleanup(FsThrottle *);
> diff --git a/hw/9pfs/cofile.c b/hw/9pfs/cofile.c
> index 9c5344039e..71174c3e4a 100644
> --- a/hw/9pfs/cofile.c
> +++ b/hw/9pfs/cofile.c
> @@ -252,7 +252,7 @@ int coroutine_fn v9fs_co_pwritev(V9fsPDU *pdu, V9fsFidState *fidp,
>      if (v9fs_request_cancelled(pdu)) {
>          return -EINTR;
>      }
> -    fsdev_co_throttle_request(s->ctx.fst, true, iov, iovcnt);
> +    fsdev_co_throttle_request(s->ctx.fst, THROTTLE_WRITE, iov, iovcnt);
>      v9fs_co_run_in_worker(
>          {
>              err = s->ops->pwritev(&s->ctx, &fidp->fs, iov, iovcnt, offset);
> @@ -272,7 +272,7 @@ int coroutine_fn v9fs_co_preadv(V9fsPDU *pdu, V9fsFidState *fidp,
>      if (v9fs_request_cancelled(pdu)) {
>          return -EINTR;
>      }
> -    fsdev_co_throttle_request(s->ctx.fst, false, iov, iovcnt);
> +    fsdev_co_throttle_request(s->ctx.fst, THROTTLE_READ, iov, iovcnt);
>      v9fs_co_run_in_worker(
>          {
>              err = s->ops->preadv(&s->ctx, &fidp->fs, iov, iovcnt, offset);



-- 
Greg



Re: [PATCH v4 19/19] Drop duplicate #include

2023-01-19 Thread Greg Kurz
On Thu, 19 Jan 2023 07:59:59 +0100
Markus Armbruster  wrote:

> Tracked down with the help of scripts/clean-includes.
> 
> Signed-off-by: Markus Armbruster 
> ---

For ppc changes.

Reviewed-by: Greg Kurz 

>  include/hw/arm/fsl-imx6ul.h   | 1 -
>  include/hw/arm/fsl-imx7.h     | 1 -
>  backends/tpm/tpm_emulator.c   | 1 -
>  hw/acpi/piix4.c               | 1 -
>  hw/alpha/dp264.c              | 1 -
>  hw/arm/virt.c                 | 1 -
>  hw/arm/xlnx-versal.c          | 1 -
>  hw/block/pflash_cfi01.c       | 1 -
>  hw/core/machine.c             | 1 -
>  hw/hppa/machine.c             | 1 -
>  hw/i386/acpi-build.c          | 1 -
>  hw/loongarch/acpi-build.c     | 1 -
>  hw/misc/macio/cuda.c          | 1 -
>  hw/misc/macio/pmu.c           | 1 -
>  hw/net/xilinx_axienet.c       | 1 -
>  hw/ppc/ppc405_uc.c            | 2 --
>  hw/ppc/ppc440_bamboo.c        | 1 -
>  hw/ppc/spapr_drc.c            | 1 -
>  hw/rdma/vmw/pvrdma_dev_ring.c | 1 -
>  hw/remote/machine.c           | 1 -
>  hw/remote/remote-obj.c        | 1 -
>  hw/rtc/mc146818rtc.c          | 1 -
>  hw/s390x/virtio-ccw-serial.c  | 1 -
>  migration/postcopy-ram.c      | 2 --
>  softmmu/dirtylimit.c          | 1 -
>  softmmu/runstate.c            | 1 -
>  softmmu/vl.c                  | 1 -
>  target/loongarch/translate.c  | 1 -
>  target/mips/tcg/translate.c   | 1 -
>  target/nios2/translate.c      | 2 --
>  tests/unit/test-cutils.c      | 1 -
>  ui/gtk.c                      | 1 -
>  util/oslib-posix.c            | 4 ----
>  33 files changed, 39 deletions(-)
> 
> diff --git a/include/hw/arm/fsl-imx6ul.h b/include/hw/arm/fsl-imx6ul.h
> index 7812e516a5..1952cb984d 100644
> --- a/include/hw/arm/fsl-imx6ul.h
> +++ b/include/hw/arm/fsl-imx6ul.h
> @@ -30,7 +30,6 @@
>  #include "hw/timer/imx_gpt.h"
>  #include "hw/timer/imx_epit.h"
>  #include "hw/i2c/imx_i2c.h"
> -#include "hw/gpio/imx_gpio.h"
>  #include "hw/sd/sdhci.h"
>  #include "hw/ssi/imx_spi.h"
>  #include "hw/net/imx_fec.h"
> diff --git a/include/hw/arm/fsl-imx7.h b/include/hw/arm/fsl-imx7.h
> index 4e5e071864..355bd8ea83 100644
> --- a/include/hw/arm/fsl-imx7.h
> +++ b/include/hw/arm/fsl-imx7.h
> @@ -32,7 +32,6 @@
>  #include "hw/timer/imx_gpt.h"
>  #include "hw/timer/imx_epit.h"
>  #include "hw/i2c/imx_i2c.h"
> -#include "hw/gpio/imx_gpio.h"
>  #include "hw/sd/sdhci.h"
>  #include "hw/ssi/imx_spi.h"
>  #include "hw/net/imx_fec.h"
> diff --git a/backends/tpm/tpm_emulator.c b/backends/tpm/tpm_emulator.c
> index 49cc3d749d..2b440d2c9a 100644
> --- a/backends/tpm/tpm_emulator.c
> +++ b/backends/tpm/tpm_emulator.c
> @@ -35,7 +35,6 @@
>  #include "sysemu/runstate.h"
>  #include "sysemu/tpm_backend.h"
>  #include "sysemu/tpm_util.h"
> -#include "sysemu/runstate.h"
>  #include "tpm_int.h"
>  #include "tpm_ioctl.h"
>  #include "migration/blocker.h"
> diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
> index 0a81f1ad93..df39f91294 100644
> --- a/hw/acpi/piix4.c
> +++ b/hw/acpi/piix4.c
> @@ -35,7 +35,6 @@
>  #include "sysemu/xen.h"
>  #include "qapi/error.h"
>  #include "qemu/range.h"
> -#include "hw/acpi/pcihp.h"
>  #include "hw/acpi/cpu_hotplug.h"
>  #include "hw/acpi/cpu.h"
>  #include "hw/hotplug.h"
> diff --git a/hw/alpha/dp264.c b/hw/alpha/dp264.c
> index c502c8c62a..4161f559a7 100644
> --- a/hw/alpha/dp264.c
> +++ b/hw/alpha/dp264.c
> @@ -18,7 +18,6 @@
>  #include "net/net.h"
>  #include "qemu/cutils.h"
>  #include "qemu/datadir.h"
> -#include "net/net.h"
>  
>  static uint64_t cpu_alpha_superpage_to_phys(void *opaque, uint64_t addr)
>  {
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index ea2413a0ba..d3849d7233 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -33,7 +33,6 @@
>  #include "qemu/units.h"
>  #include "qemu/option.h"
>  #include "monitor/qdev.h"
> -#include "qapi/error.h"
>  #include "hw/sysbus.h"
>  #include "hw/arm/boot.h"
>  #include "hw/arm/primecell.h"
> diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
> index 57276e1506..69b1b99e93 100644
> --- a/hw/arm/xlnx-versal.c
> +++ b/hw/arm/xlnx-versal.c
> @@ -22,7 +22,6 @@
>  #include "hw/misc/unimp.h"
>  #include "hw/arm/xlnx-versal.h"
>  #include "qemu/log.h"
> -#include "hw/sysbus.h"
>  
>  #define XLNX_VERSAL_ACPU_TYPE ARM_CPU_TYPE_

Re: [RFC PATCH-for-8.0 07/10] hw/virtio: Directly access cached VirtIODevice::access_is_big_endian

2022-12-13 Thread Greg Kurz
On Tue, 13 Dec 2022 00:05:14 +0100
Philippe Mathieu-Daudé  wrote:

> Since the device endianness doesn't change at runtime,
> use the cached value instead of evaluating it on each call.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/hw/virtio/virtio-access.h | 44 +++
>  1 file changed, 22 insertions(+), 22 deletions(-)
> 
> diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h
> index 07aae69042..985f39fe16 100644
> --- a/include/hw/virtio/virtio-access.h
> +++ b/include/hw/virtio/virtio-access.h
> @@ -43,7 +43,7 @@ static inline uint16_t virtio_lduw_phys(VirtIODevice *vdev, hwaddr pa)
>  {
>      AddressSpace *dma_as = vdev->dma_as;
>  
> -    if (virtio_access_is_big_endian(vdev)) {
> +    if (vdev->access_is_big_endian) {

For x86, virtio_access_is_big_endian() expands to :

static inline bool virtio_access_is_big_endian(VirtIODevice *vdev)
{
    return false;
}

When I added these memory accessors, there was a strong requirement from MST
that x86 shouldn't have to pay for an out-of-line check when it is known that
everything is always little endian. Not sure exactly what you're trying to
achieve with VirtIODevice::access_is_big_endian but this shouldn't mess with
this fast path.

>          return lduw_be_phys(dma_as, pa);
>      }
>      return lduw_le_phys(dma_as, pa);
> @@ -53,7 +53,7 @@ static inline uint32_t virtio_ldl_phys(VirtIODevice *vdev, hwaddr pa)
>  {
>      AddressSpace *dma_as = vdev->dma_as;
>  
> -    if (virtio_access_is_big_endian(vdev)) {
> +    if (vdev->access_is_big_endian) {
>          return ldl_be_phys(dma_as, pa);
>      }
>      return ldl_le_phys(dma_as, pa);
> @@ -63,7 +63,7 @@ static inline uint64_t virtio_ldq_phys(VirtIODevice *vdev, hwaddr pa)
>  {
>      AddressSpace *dma_as = vdev->dma_as;
>  
> -    if (virtio_access_is_big_endian(vdev)) {
> +    if (vdev->access_is_big_endian) {
>          return ldq_be_phys(dma_as, pa);
>      }
>      return ldq_le_phys(dma_as, pa);
> @@ -74,7 +74,7 @@ static inline void virtio_stw_phys(VirtIODevice *vdev, hwaddr pa,
>  {
>      AddressSpace *dma_as = vdev->dma_as;
>  
> -    if (virtio_access_is_big_endian(vdev)) {
> +    if (vdev->access_is_big_endian) {
>          stw_be_phys(dma_as, pa, value);
>      } else {
>          stw_le_phys(dma_as, pa, value);
> @@ -86,7 +86,7 @@ static inline void virtio_stl_phys(VirtIODevice *vdev, hwaddr pa,
>  {
>      AddressSpace *dma_as = vdev->dma_as;
>  
> -    if (virtio_access_is_big_endian(vdev)) {
> +    if (vdev->access_is_big_endian) {
>          stl_be_phys(dma_as, pa, value);
>      } else {
>          stl_le_phys(dma_as, pa, value);
> @@ -95,7 +95,7 @@ static inline void virtio_stl_phys(VirtIODevice *vdev, hwaddr pa,
>  
>  static inline void virtio_stw_p(VirtIODevice *vdev, void *ptr, uint16_t v)
>  {
> -    if (virtio_access_is_big_endian(vdev)) {
> +    if (vdev->access_is_big_endian) {
>          stw_be_p(ptr, v);
>      } else {
>          stw_le_p(ptr, v);
> @@ -104,7 +104,7 @@ static inline void virtio_stw_p(VirtIODevice *vdev, void *ptr, uint16_t v)
>  
>  static inline void virtio_stl_p(VirtIODevice *vdev, void *ptr, uint32_t v)
>  {
> -    if (virtio_access_is_big_endian(vdev)) {
> +    if (vdev->access_is_big_endian) {
>          stl_be_p(ptr, v);
>      } else {
>          stl_le_p(ptr, v);
> @@ -113,7 +113,7 @@ static inline void virtio_stl_p(VirtIODevice *vdev, void *ptr, uint32_t v)
>  
>  static inline void virtio_stq_p(VirtIODevice *vdev, void *ptr, uint64_t v)
>  {
> -    if (virtio_access_is_big_endian(vdev)) {
> +    if (vdev->access_is_big_endian) {
>          stq_be_p(ptr, v);
>      } else {
>          stq_le_p(ptr, v);
> @@ -122,7 +122,7 @@ static inline void virtio_stq_p(VirtIODevice *vdev, void *ptr, uint64_t v)
>  
>  static inline int virtio_lduw_p(VirtIODevice *vdev, const void *ptr)
>  {
> -    if (virtio_access_is_big_endian(vdev)) {
> +    if (vdev->access_is_big_endian) {
>          return lduw_be_p(ptr);
>      } else {
>          return lduw_le_p(ptr);
> @@ -131,7 +131,7 @@ static inline int virtio_lduw_p(VirtIODevice *vdev, const void *ptr)
>  
>  static inline int virtio_ldl_p(VirtIODevice *vdev, const void *ptr)
>  {
> -    if (virtio_access_is_big_endian(vdev)) {
> +    if (vdev->access_is_big_endian) {
>          return ldl_be_p(ptr);
>      } else {
>          return ldl_le_p(ptr);
> @@ -140,7 +140,7 @@ static inline int virtio_ldl_p(VirtIODevice *vdev, const void *ptr)
>  
>  static inline uint64_t virtio_ldq_p(VirtIODevice *vdev, const void *ptr)
>  {
> -    if (virtio_access_is_big_endian(vdev)) {
> +    if (vdev->access_is_big_endian) {
>          return ldq_be_p(ptr);
>      } else {
>          return ldq_le_p(ptr);
> @@ -150,9 +150,9 @@ static inline uint64_t virtio_ldq_p(VirtIODevice *vdev, const void *ptr)
>  static inline uint16_t virtio_tswap16(VirtIODevice *vdev, uint16_t s)
>  {
>  #if HOST_BIG_ENDIAN
> -return 

Re: [PATCH 1/1] 9pfs: avoid iterator invalidation in v9fs_mark_fids_unreclaim

2022-09-27 Thread Greg Kurz
On Tue, 27 Sep 2022 19:14:33 +0200
Christian Schoenebeck  wrote:

> On Tuesday, 27 September 2022 15:05:13 CEST Linus Heckemann wrote:
> > Christian Schoenebeck  writes:
> > > Ah, you sent this fix as a separate patch on top. I actually just meant
> > > that you would take my already queued patch as the latest version (just
> > > because I had made some minor changes on my end) and adjust that patch
> > > further as v4.
> > > 
> > > Anyway, there are still some things to do here, so maybe you can send your
> > > patch squashed in the next round ...
> > 
> > I see, will do!
> > 
> > >> @Christian: I still haven't been able to reproduce the issue that this
> > >> commit is supposed to fix (I tried building KDE too, no problems), so
> > >> it's a bit of a shot in the dark. It certainly still runs and I think it
> > >> should fix the issue, but it would be great if you could test it.
> > > 
> > > No worries about reproduction, I will definitely test this thoroughly. ;-)
> > > 
> > >>  hw/9pfs/9p.c | 46 ++
> > >>  1 file changed, 30 insertions(+), 16 deletions(-)
> > >> 
> > >> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > >> index f4c1e37202..825c39e122 100644
> > >> --- a/hw/9pfs/9p.c
> > >> +++ b/hw/9pfs/9p.c
> > >> @@ -522,33 +522,47 @@ static int coroutine_fn v9fs_mark_fids_unreclaim(V9fsPDU *pdu, V9fsPath *path)
> > >>      V9fsFidState *fidp;
> > >>      gpointer fid;
> > >>      GHashTableIter iter;
> > >> 
> > >> +    /*
> > >> +     * The most common case is probably that we have exactly one
> > >> +     * fid for the given path, so preallocate exactly one.
> > >> +     */
> > >> +    GArray *to_reopen = g_array_sized_new(FALSE, FALSE, sizeof(V9fsFidState*), 1);
> > >> +    gint i;
> > > 
> > > Please use `g_autoptr(GArray)` instead of `GArray *`, that avoids the need
> > > for explicit calls to g_array_free() below.
> > 
> > Good call. I'm not familiar with glib, so I didn't know about this :)
> > 
> > >> -fidp->flags |= FID_NON_RECLAIMABLE;
> > > 
> > > Why did you remove that? It should still be marked as FID_NON_RECLAIMABLE,
> > > no?
> > Indeed, that was an accident.
> > 
> > >> +/*
> > >> + * Ensure the fid survives a potential clunk request during
> > >> + * v9fs_reopen_fid or put_fid.
> > >> + */
> > >> +fidp->ref++;
> > > 
> > > Hmm, bumping the refcount here makes sense, as the 2nd loop may be
> > > interrupted and the fid otherwise disappear in between, but ...
> > > 
> > >> +g_array_append_val(to_reopen, fidp);
> > >> 
> > >>  }
> > >> 
> > >> +}
> > >> 
> > >> -/* We're done with this fid */
> > >> +for (i=0; i < to_reopen->len; i++) {
> > >> +fidp = g_array_index(to_reopen, V9fsFidState*, i);
> > >> +/* reopen the file/dir if already closed */
> > >> +err = v9fs_reopen_fid(pdu, fidp);
> > >> +if (err < 0) {
> > >> +put_fid(pdu, fidp);
> > >> +g_array_free(to_reopen, TRUE);
> > >> +return err;
> > > 
> > > > ... this return would then leak all the remaining fids that you have
> > > > already bumped the refcount for above.
> > 
> > You're right. I think the best way around it, though it feels ugly, is
> > to add a third loop in an "out:".
> 
> Either that, or continuing the loop to the end. Not that this would become 
> much prettier. I must admit I also don't really have a good idea for a clean 
> solution in this case.
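
For what it's worth, the "third loop" variant could look roughly like the
sketch below. It is a plain C model with made-up names (sketch_fid,
sketch_reopen, sketch_put), not the 9p code itself, and it assumes the first
pass has already taken one reference on every collected fid. On the first
reopen failure it stops reopening but still drops the references held on the
remaining entries, so nothing leaks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for V9fsFidState: a refcount and a failure switch. */
struct sketch_fid {
    int ref;
    int fail_reopen;
};

/* Stand-in for the reopen step: fails when the fid is flagged to fail. */
static int sketch_reopen(struct sketch_fid *f)
{
    return f->fail_reopen ? -1 : 0;
}

/* Stand-in for put_fid(): drops one reference. */
static void sketch_put(struct sketch_fid *f)
{
    assert(f->ref > 0);
    f->ref--;
}

/* Second pass over fids whose refcount was bumped while collecting them. */
static int sketch_reopen_all(struct sketch_fid **to_reopen, size_t n)
{
    int err = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        err = sketch_reopen(to_reopen[i]);
        sketch_put(to_reopen[i]);   /* done with this fid either way */
        if (err < 0) {
            i++;                    /* this one has been put already */
            break;
        }
    }

    /* The "out:" part: release the fids that were never reached. */
    for (; i < n; i++) {
        sketch_put(to_reopen[i]);
    }
    return err;
}
```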
> 
> > > Also: I noticed that your changes in virtfs_reset() would need the same
> > > 2-loop hack to avoid hash iterator invalidation, as it would also call
> > > put_fid() inside the loop and be prone to hash iterator invalidation
> > > otherwise.
> > Good point. Will do.
> > 
> > One more thing has occurred to me. I think the reclaiming/reopening
> > logic will misbehave in the following sequence of events:
> > 
> > 1. QEMU reclaims an open fid, losing the file handle
> > 2. The file referred to by the fid is replaced with a different file
> >(e.g. via rename or symlink) outside QEMU
> > 3. The file is accessed again by the guest, causing QEMU to reopen a
> >_different file_ from before without the guest having performed any
> >operations that should cause this to happen.
> > 
> > This is neither introduced nor resolved by my changes. Am I overlooking
> > something that avoids this (be it documentation that directories exposed
> > via 9p should not be touched by the host), or is this a real issue? I'm
> > thinking one could at least detect it by saving inode numbers in
> > V9fsFidState and comparing them when reopening, but recovering from such
> > a situation seems difficult.
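
The detection idea could be sketched like this. It is only an illustration:
sketch_file_id and the two helpers are hypothetical and no such field exists
in V9fsFidState today. The caller would fstat() the descriptor before it is
reclaimed, save the identity, then fstat() again after reopening and compare.
The device number is saved next to the inode number because inode numbers are
only unique per filesystem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical identity record that would be kept with the fid. */
struct sketch_file_id {
    dev_t dev;
    ino_t ino;
};

/* Remember which file a descriptor referred to (st comes from fstat()). */
static void sketch_save_id(const struct stat *st, struct sketch_file_id *id)
{
    assert(st != NULL && id != NULL);
    id->dev = st->st_dev;
    id->ino = st->st_ino;
}

/* After reopening: does the new descriptor still refer to the same file? */
static bool sketch_same_file(const struct stat *st,
                             const struct sketch_file_id *id)
{
    assert(st != NULL && id != NULL);
    return st->st_dev == id->dev && st->st_ino == id->ino;
}
```

This only detects the replacement; what to report to the guest when the
check fails is a separate question.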
> 
> Well, in that specific scenario when rename/move happens outside of QEMU then
> yes, this might happen unfortunately. The point of this "reclaim fid" stuff is
> to deal with the fact that there is an upper limit on systems for the max.
> amount of 

Re: [PATCH v3] 9pfs: use GHashTable for fid table

2022-09-08 Thread Greg Kurz
On Thu, 08 Sep 2022 18:10:28 +0200
Linus Heckemann  wrote:

> (sorry for the dup @Greg, forgot to reply-all)
> 
> Greg Kurz  writes:
> >> > g_hash_table_steal_extended() [1] actually allows to do just that.
> >> 
> >> g_hash_table_steal_extended unfortunately isn't available since it was
> >> introduced in glib 2.58 and we're maintaining compatibility to 2.56.
> >> 
> >
> > Ha... this could be addressed through conditional compilation, e.g.:
> 
> It still won't compile, because we set GLIB_VERSION_MAX_ALLOWED in
> glib-compat.h and it would require a compat wrapper as described

ah drat, you're right !

> there. I think that's a bit much for this far more marginal performance
> change. I'm happy to resubmit with the TODO comment though if you like?

Either that or Christian may add it when merging.

Cheers,

--
Greg



Re: [PATCH v3] 9pfs: use GHashTable for fid table

2022-09-08 Thread Greg Kurz
On Thu,  8 Sep 2022 13:23:53 +0200
Linus Heckemann  wrote:

> The previous implementation would iterate over the fid table for
> lookup operations, resulting in an operation with O(n) complexity on
> the number of open files and poor cache locality -- for every open,
> stat, read, write, etc operation.
> 
> This change uses a hashtable for this instead, significantly improving
> the performance of the 9p filesystem. The runtime of NixOS's simple
> installer test, which copies ~122k files totalling ~1.8GiB from 9p,
> decreased by a factor of about 10.
> 
> Signed-off-by: Linus Heckemann 
> Reviewed-by: Philippe Mathieu-Daudé 
> Reviewed-by: Greg Kurz 
> ---
> 
> Greg Kurz writes:
> > The comment above should be adapted to the new situation : no need
> 
> I've removed it completely, since the logic is simple enough that only
> the shortened comment below remains necessary.
> 
> > With the new logic, this should just be:
> 
> now is :)
> 
> > g_hash_table_steal_extended() [1] actually allows to do just that.
> 
> g_hash_table_steal_extended unfortunately isn't available since it was
> introduced in glib 2.58 and we're maintaining compatibility to 2.56.
> 

Ha... this could be addressed through conditional compilation, e.g.:

static V9fsFidState *clunk_fid(V9fsState *s, int32_t fid)
{
    V9fsFidState *fidp;

#if GLIB_CHECK_VERSION(2,58,0)
    if (!g_hash_table_steal_extended(s->fids, GINT_TO_POINTER(fid),
                                     NULL, (gpointer *) &fidp)) {
        return NULL;
    }
#else
    fidp = g_hash_table_lookup(s->fids, GINT_TO_POINTER(fid));
    if (fidp) {
        g_hash_table_remove(s->fids, GINT_TO_POINTER(fid));
    } else {
        return NULL;
    }
#endif

    fidp->clunked = true;
    return fidp;
}

or simply leave a TODO comment so that we don't forget.


> > You could just call g_hash_table_iter_remove() here
> 
> Applied this suggestion, thanks!
> 
> 
> > Well... finding at least one clunked fid state in this table is
> > definitely a bug so I'll keep the BUG_ON() anyway.
> 
> Christian Schoenebeck writes:
> > Yeah, I think you are right, it would feel odd. Just drop BUG_ON() for
> > now.
> 
> I still prefer dropping it, but if we were to keep it I think it should
> be in v9fs_reclaim_fd where we iterate and can thus check the whole
> table.
> 

IMO the relevant aspect isn't really about checking the whole table, but
rather about never getting a clunked fid out of this table and passing it on.

> 
> Greg Kurz and Philippe Mathieu-Daudé write:
> > [patch versioning]
> 
> Whoops. I used -v2 on git send-email, which just ignored the option,
> rather than git format-patch, by accident. This one _should_ now be v3!
> 
> 

v3 it is and LGTM ! No big deal with the BUG_ON(), given the improvement.

My R-b stands. Thanks Linus !

>  hw/9pfs/9p.c | 140 +--
>  hw/9pfs/9p.h |   2 +-
>  2 files changed, 70 insertions(+), 72 deletions(-)
> 
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index aebadeaa03..98a475e560 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -282,33 +282,31 @@ static V9fsFidState *coroutine_fn get_fid(V9fsPDU *pdu, 
> int32_t fid)
>  V9fsFidState *f;
>  V9fsState *s = pdu->s;
>  
> -QSIMPLEQ_FOREACH(f, &s->fid_list, next) {
> -BUG_ON(f->clunked);
> -if (f->fid == fid) {
> -/*
> - * Update the fid ref upfront so that
> - * we don't get reclaimed when we yield
> - * in open later.
> - */
> -f->ref++;
> -/*
> - * check whether we need to reopen the
> - * file. We might have closed the fd
> - * while trying to free up some file
> - * descriptors.
> - */
> -err = v9fs_reopen_fid(pdu, f);
> -if (err < 0) {
> -f->ref--;
> -return NULL;
> -}
> -/*
> - * Mark the fid as referenced so that the LRU
> - * reclaim won't close the file descriptor
> - */
> -f->flags |= FID_REFERENCED;
> -return f;
> +f = g_hash_table_lookup(s->fids, GINT_TO_POINTER(fid));
> +if (f) {
> +/*
> + * Update the fid ref upfront so that
> + * we don't get reclaimed when we yield
> + * in open later.
> + */
> +f->ref++;
> +/*
> + * check whether we need to reopen the
> + * file. We might have closed the fd
> + * while trying to free up some file
> + * descriptors.
> + */
> + 

Re: [PATCH] 9pfs: use GHashMap for fid table

2022-09-05 Thread Greg Kurz
Hi Linus,

Thanks for this promising change !

On Mon, 05 Sep 2022 10:51:10 +0200
Linus Heckemann  wrote:

> Hi all, thanks for your reviews.
> 
> > @@ -4226,7 +4232,7 @@ int v9fs_device_realize_common(V9fsState *s, const 
> > V9fsTransport *t,
> >  s->ctx.fmode = fse->fmode;
> >  s->ctx.dmode = fse->dmode;
> > 
> > -QSIMPLEQ_INIT(&s->fid_list);
> > +s->fids = g_hash_table_new(NULL, NULL);
> >  qemu_co_rwlock_init(&s->rename_lock);
> > 
> >  if (s->ops->init(&s->ctx, errp) < 0) {
> 
> I noticed that the hash table may be leaked as is. I'll address this in
> the next submission.
> 

Note that this rollback should go in v9fs_device_unrealize_common(),
which is assumed to be idempotent and is already called on the
error path of v9fs_device_realize_common().
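
To illustrate what idempotent means here, a minimal standalone sketch of
the pattern. All names below are hypothetical stand-ins (this is not the
QEMU code): "table" plays the role of s->fids and plain calloc()/free()
replace the GLib calls.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device state; "table" mimics s->fids. */
struct dev_state {
    int *table;
};

/* Safe to call repeatedly, and on a state whose realize failed halfway. */
static void dev_unrealize(struct dev_state *s)
{
    free(s->table);     /* free(NULL) is a no-op */
    s->table = NULL;    /* a second call sees NULL and does nothing */
}

/* Returns 0 on success, -1 on failure; all error paths go through unrealize. */
static int dev_realize(struct dev_state *s, int fail_late)
{
    s->table = calloc(16, sizeof(*s->table));
    if (!s->table) {
        goto out;
    }
    if (fail_late) {    /* simulates a later init step failing */
        goto out;
    }
    return 0;
out:
    dev_unrealize(s);
    return -1;
}
```

With GLib, g_clear_pointer(&s->fids, g_hash_table_destroy) would be one
way to get the same "destroy and reset to NULL" behaviour in a single call.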

> 
> Philippe Mathieu-Daudé  writes:
> > [Style nitpicking]
> 
> Applied these changes and will include them in the next version of the patch.
> 
> Christian Schoenebeck  writes:
> > > @@ -317,12 +315,9 @@ static V9fsFidState *alloc_fid(V9fsState *s, int32_t
> > > fid) {
> > >  V9fsFidState *f;
> > > 
> > > -QSIMPLEQ_FOREACH(f, &s->fid_list, next) {
> > > +if (g_hash_table_contains(s->fids, GINT_TO_POINTER(fid))) {
> > >  /* If fid is already there return NULL */
> > > -BUG_ON(f->clunked);
> > > -if (f->fid == fid) {
> > > -return NULL;
> > > -}
> > > +return NULL;
> >
> > Probably retaining BUG_ON(f->clunked) here?
> 
> I decided not to since this was a sanity check that was happening for
> _each_ fid, but only up to the one we were looking for. This seemed
> inconsistent and awkward to me, so I dropped it completely (and the
> invariant that no clunked fids remain in the table still seems to hold
> -- it's fairly trivial to check, in that the clunked flag is only set
> in two places, both of which also remove the map entry). My preference
> would be to leave it out, but I'd also be fine with restoring it for
> just the one we're looking for, or maybe moving the check to when we're
> iterating over the whole table, e.g. in v9fs_reclaim_fd. Thoughts?
> 

Well... finding at least one clunked fid state in this table is
definitely a bug so I'll keep the BUG_ON() anyway.

> > > @@ -424,12 +419,11 @@ static V9fsFidState *clunk_fid(V9fsState *s, int32_t
> > > fid) {
> > >  V9fsFidState *fidp;
> > > 
> > > -QSIMPLEQ_FOREACH(fidp, &s->fid_list, next) {
> > > -if (fidp->fid == fid) {
> > > -QSIMPLEQ_REMOVE(&s->fid_list, fidp, V9fsFidState, next);
> > > -fidp->clunked = true;
> > > -return fidp;
> > > -}
> > > +fidp = g_hash_table_lookup(s->fids, GINT_TO_POINTER(fid));
> > > +if (fidp) {
> > > +g_hash_table_remove(s->fids, GINT_TO_POINTER(fid));
> > > +fidp->clunked = true;
> > > +return fidp;
> >
> > We can't get rid of the double lookup here, can we? Surprisingly I don't 
> > find 
> > a lookup function on the iterator based API.
> 
> It seems you're not the only one who had that idea:
> https://gitlab.gnome.org/GNOME/glib/-/issues/613
> 
> In this case, I think an extended remove function which returns the
> values that were present would be even nicer. But neither exists at this
> time (and that issue is pretty old), I guess we're stuck with this for
> now.

g_hash_table_steal_extended() [1] actually allows to do just that. Since the
hash table is allocated with g_hash_table_new() and doesn't care for destroy
functions, the code change would be something like:

@@ -424,12 +419,10 @@ static V9fsFidState *clunk_fid(V9fsState *s, int32_t fid)
 {
     V9fsFidState *fidp;
 
-    QSIMPLEQ_FOREACH(fidp, &s->fid_list, next) {
-        if (fidp->fid == fid) {
-            QSIMPLEQ_REMOVE(&s->fid_list, fidp, V9fsFidState, next);
-            fidp->clunked = true;
-            return fidp;
-        }
+    if (g_hash_table_steal_extended(s->fids, GINT_TO_POINTER(fid), NULL,
+                                    (gpointer *) &fidp)) {
+        fidp->clunked = true;
+        return fidp;
     }
     return NULL;
 }

[1] 
https://developer-old.gnome.org/glib/stable/glib-Hash-Tables.html#g-hash-table-steal-extended

Cheers,

--
Greg

> 
> 
> Daniel P. Berrangé writes:
> > In $SUBJECT it is called GHashTable, not GHashMap
> 
> Indeed, good catch. Will fix in the next version.
> 
> Linus




Re: [PATCH 02/18] block: Add a 'flags' param to blk_pread()

2022-05-17 Thread Greg Kurz
On Tue, 17 May 2022 12:35:08 +0100
Alberto Faria  wrote:

> For consistency with other I/O functions, and in preparation to
> implement it using generated_co_wrapper.
> 
> Callers were updated using this Coccinelle script:
> 
> @@ expression blk, offset, buf, bytes; @@
> - blk_pread(blk, offset, buf, bytes)
> + blk_pread(blk, offset, buf, bytes, 0)
> 
> It had no effect on hw/block/nand.c, presumably due to the #if, so that
> file was updated manually.
> 
> Overly-long lines were then fixed by hand.
> 
> Signed-off-by: Alberto Faria 
> ---

Reviewed-by: Greg Kurz 

>  block.c   |  2 +-
>  block/block-backend.c |  5 +++--
>  block/commit.c|  2 +-
>  block/export/fuse.c   |  2 +-
>  hw/arm/allwinner-h3.c |  2 +-
>  hw/arm/aspeed.c   |  2 +-
>  hw/block/block.c  |  2 +-
>  hw/block/fdc.c|  6 +++---
>  hw/block/hd-geometry.c|  2 +-
>  hw/block/m25p80.c |  2 +-
>  hw/block/nand.c   | 12 ++--
>  hw/block/onenand.c| 12 ++--
>  hw/ide/atapi.c|  4 ++--
>  hw/misc/mac_via.c |  2 +-
>  hw/misc/sifive_u_otp.c|  4 ++--
>  hw/nvram/eeprom_at24c.c   |  2 +-
>  hw/nvram/spapr_nvram.c|  2 +-
>  hw/nvram/xlnx-bbram.c |  2 +-
>  hw/nvram/xlnx-efuse.c |  2 +-
>  hw/ppc/pnv_pnor.c |  2 +-
>  hw/sd/sd.c|  2 +-
>  include/sysemu/block-backend-io.h |  3 ++-
>  migration/block.c |  4 ++--
>  nbd/server.c  |  4 ++--
>  qemu-img.c| 12 ++--
>  qemu-io-cmds.c|  2 +-
>  tests/unit/test-block-iothread.c  |  4 ++--
>  27 files changed, 52 insertions(+), 50 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 0fd830e2e2..ed701b4889 100644
> --- a/block.c
> +++ b/block.c
> @@ -1037,7 +1037,7 @@ static int find_image_format(BlockBackend *file, const 
> char *filename,
>  return ret;
>  }
>  
> -ret = blk_pread(file, 0, buf, sizeof(buf));
> +ret = blk_pread(file, 0, buf, sizeof(buf), 0);
>  if (ret < 0) {
>  error_setg_errno(errp, -ret, "Could not read image for determining 
> its "
>   "format");
> diff --git a/block/block-backend.c b/block/block-backend.c
> index c1c367bf9e..da89450861 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1567,14 +1567,15 @@ BlockAIOCB *blk_aio_pwrite_zeroes(BlockBackend *blk, 
> int64_t offset,
>  flags | BDRV_REQ_ZERO_WRITE, cb, opaque);
>  }
>  
> -int blk_pread(BlockBackend *blk, int64_t offset, void *buf, int bytes)
> +int blk_pread(BlockBackend *blk, int64_t offset, void *buf, int bytes,
> +  BdrvRequestFlags flags)
>  {
>  int ret;
>  QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes);
>  IO_OR_GS_CODE();
>  
>  blk_inc_in_flight(blk);
> -ret = blk_do_preadv(blk, offset, bytes, &qiov, 0);
> +ret = blk_do_preadv(blk, offset, bytes, &qiov, flags);
>  blk_dec_in_flight(blk);
>  
>  return ret;
> diff --git a/block/commit.c b/block/commit.c
> index 851d1c557a..e5b3ad08da 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -527,7 +527,7 @@ int bdrv_commit(BlockDriverState *bs)
>  goto ro_cleanup;
>  }
>  if (ret) {
> -ret = blk_pread(src, offset, buf, n);
> +ret = blk_pread(src, offset, buf, n, 0);
>  if (ret < 0) {
>  goto ro_cleanup;
>  }
> diff --git a/block/export/fuse.c b/block/export/fuse.c
> index e80b24a867..dcf8f225f3 100644
> --- a/block/export/fuse.c
> +++ b/block/export/fuse.c
> @@ -554,7 +554,7 @@ static void fuse_read(fuse_req_t req, fuse_ino_t inode,
>  return;
>  }
>  
> -ret = blk_pread(exp->common.blk, offset, buf, size);
> +ret = blk_pread(exp->common.blk, offset, buf, size, 0);
>  if (ret >= 0) {
>  fuse_reply_buf(req, buf, size);
>  } else {
> diff --git a/hw/arm/allwinner-h3.c b/hw/arm/allwinner-h3.c
> index 318ed4348c..788083b6fa 100644
> --- a/hw/arm/allwinner-h3.c
> +++ b/hw/arm/allwinner-h3.c
> @@ -174,7 +174,7 @@ void allwinner_h3_bootrom_setup(AwH3State *s, 
> BlockBackend *blk)
>  const int64_t rom_size = 32 * KiB;
>  g_autofree uint8_t *buffer = g_new0(uint8_t, rom_size);
>  
> -if (blk_pread(blk, 8 * KiB, buffer, rom_size) < 0) {
> +if (blk_pread

Re: [PATCH 01/18] block: Make blk_{pread,pwrite}() return 0 on success

2022-05-17 Thread Greg Kurz
On Tue, 17 May 2022 12:35:07 +0100
Alberto Faria  wrote:

> They currently return the value of their 'bytes' parameter on success.
> 
> Make them return 0 instead, for consistency with other I/O functions and
> in preparation to implement them using generated_co_wrapper. This also
> makes it clear that short reads/writes are not possible.
> 
> Signed-off-by: Alberto Faria 
> ---
>  block.c  |  8 +---
>  block/block-backend.c|  7 ++-
>  block/qcow.c |  6 +++---
>  hw/block/m25p80.c|  2 +-
>  hw/misc/mac_via.c|  2 +-
>  hw/misc/sifive_u_otp.c   |  2 +-
>  hw/nvram/eeprom_at24c.c  |  4 ++--
>  hw/nvram/spapr_nvram.c   | 12 ++--
>  hw/ppc/pnv_pnor.c|  2 +-

For PPC and sPAPR parts

Reviewed-by: Greg Kurz 

>  qemu-img.c   | 17 +++--
>  qemu-io-cmds.c   | 18 --
>  tests/unit/test-block-iothread.c |  4 ++--
>  12 files changed, 43 insertions(+), 41 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 2c0080..0fd830e2e2 100644
> --- a/block.c
> +++ b/block.c
> @@ -1045,14 +1045,16 @@ static int find_image_format(BlockBackend *file, 
> const char *filename,
>  return ret;
>  }
>  
> -drv = bdrv_probe_all(buf, ret, filename);
> +drv = bdrv_probe_all(buf, sizeof(buf), filename);
>  if (!drv) {
>  error_setg(errp, "Could not determine image format: No compatible "
> "driver found");
> -ret = -ENOENT;
> +*pdrv = NULL;
> +return -ENOENT;
>  }
> +
>  *pdrv = drv;
> -return ret;
> +return 0;
>  }
>  
>  /**
> diff --git a/block/block-backend.c b/block/block-backend.c
> index e0e1aff4b1..c1c367bf9e 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1577,19 +1577,16 @@ int blk_pread(BlockBackend *blk, int64_t offset, void 
> *buf, int bytes)
>  ret = blk_do_preadv(blk, offset, bytes, &qiov, 0);
>  blk_dec_in_flight(blk);
>  
> -return ret < 0 ? ret : bytes;
> +return ret;
>  }
>  
>  int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int bytes,
> BdrvRequestFlags flags)
>  {
> -int ret;
>  QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes);
>  IO_OR_GS_CODE();
>  
> -ret = blk_pwritev_part(blk, offset, bytes, &qiov, 0, flags);
> -
> -return ret < 0 ? ret : bytes;
> +return blk_pwritev_part(blk, offset, bytes, &qiov, 0, flags);
>  }
>  
>  int64_t blk_getlength(BlockBackend *blk)
> diff --git a/block/qcow.c b/block/qcow.c
> index c646d6b16d..25a43353c1 100644
> --- a/block/qcow.c
> +++ b/block/qcow.c
> @@ -891,14 +891,14 @@ static int coroutine_fn 
> qcow_co_create(BlockdevCreateOptions *opts,
>  
>  /* write all the data */
>  ret = blk_pwrite(qcow_blk, 0, &header, sizeof(header), 0);
> -if (ret != sizeof(header)) {
> +if (ret < 0) {
>  goto exit;
>  }
>  
>  if (qcow_opts->has_backing_file) {
>  ret = blk_pwrite(qcow_blk, sizeof(header),
>   qcow_opts->backing_file, backing_filename_len, 0);
> -if (ret != backing_filename_len) {
> +if (ret < 0) {
>  goto exit;
>  }
>  }
> @@ -908,7 +908,7 @@ static int coroutine_fn 
> qcow_co_create(BlockdevCreateOptions *opts,
>   i++) {
>  ret = blk_pwrite(qcow_blk, header_size + BDRV_SECTOR_SIZE * i,
>   tmp, BDRV_SECTOR_SIZE, 0);
> -if (ret != BDRV_SECTOR_SIZE) {
> +if (ret < 0) {
>  g_free(tmp);
>  goto exit;
>  }
> diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
> index 7d3d8b12e0..bd58c07bb6 100644
> --- a/hw/block/m25p80.c
> +++ b/hw/block/m25p80.c
> @@ -1506,7 +1506,7 @@ static void m25p80_realize(SSIPeripheral *ss, Error 
> **errp)
>  trace_m25p80_binding(s);
>  s->storage = blk_blockalign(s->blk, s->size);
>  
> -if (blk_pread(s->blk, 0, s->storage, s->size) != s->size) {
> +if (blk_pread(s->blk, 0, s->storage, s->size) < 0) {
>  error_setg(errp, "failed to read the initial flash content");
>  return;
>  }
> diff --git a/hw/misc/mac_via.c b/hw/misc/mac_via.c
> index 525e38ce93..0515d1818e 100644
> --- a/hw/misc/mac_via.c
> +++ b/hw/misc/mac_via.c
> @@ -1030,7 +1030,7 @@ static void mos6522_q800_via1_realize(DeviceState *dev, 
> Error **errp)
>  }
>  
>  len = 

Re: [PATCH 2/3] 9pfs: Use g_new() & friends where that makes obvious sense

2022-03-15 Thread Greg Kurz
On Mon, 14 Mar 2022 17:01:07 +0100
Markus Armbruster  wrote:

> g_new(T, n) is neater than g_malloc(sizeof(T) * n).  It's also safer,
> for two reasons.  One, it catches multiplication overflowing size_t.
> Two, it returns T * rather than void *, which lets the compiler catch
> more type errors.
> 
> This commit only touches allocations with size arguments of the form
> sizeof(T).
> 
> Patch created mechanically with:
> 
> $ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \
>--macro-file scripts/cocci-macro-file.h FILES...
> 
> Except this uncovers a typing error:
> 
> ../hw/9pfs/9p.c:855:13: warning: incompatible pointer types assigning to 
> 'QpfEntry *' from 'QppEntry *' [-Wincompatible-pointer-types]
>   val = g_new0(QppEntry, 1);
>   ^ ~~~
> 1 warning generated.
> 
> Harmless, because QppEntry is larger than QpfEntry.  Fix to allocate a
> QpfEntry instead.
> 
> Cc: Greg Kurz 
> Cc: Christian Schoenebeck 
> Signed-off-by: Markus Armbruster 
> ---

Reviewed-by: Greg Kurz 

>  hw/9pfs/9p-proxy.c   | 2 +-
>  hw/9pfs/9p-synth.c   | 4 ++--
>  hw/9pfs/9p.c | 8 
>  hw/9pfs/codir.c  | 6 +++---
>  tests/qtest/virtio-9p-test.c | 4 ++--
>  5 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/9pfs/9p-proxy.c b/hw/9pfs/9p-proxy.c
> index 8b4b5cf7dc..4c5e0fc217 100644
> --- a/hw/9pfs/9p-proxy.c
> +++ b/hw/9pfs/9p-proxy.c
> @@ -1187,7 +1187,7 @@ static int proxy_parse_opts(QemuOpts *opts, 
> FsDriverEntry *fs, Error **errp)
>  
>  static int proxy_init(FsContext *ctx, Error **errp)
>  {
> -V9fsProxy *proxy = g_malloc(sizeof(V9fsProxy));
> +V9fsProxy *proxy = g_new(V9fsProxy, 1);
>  int sock_id;
>  
>  if (ctx->export_flags & V9FS_PROXY_SOCK_NAME) {
> diff --git a/hw/9pfs/9p-synth.c b/hw/9pfs/9p-synth.c
> index b3080e415b..d99d263985 100644
> --- a/hw/9pfs/9p-synth.c
> +++ b/hw/9pfs/9p-synth.c
> @@ -49,7 +49,7 @@ static V9fsSynthNode *v9fs_add_dir_node(V9fsSynthNode 
> *parent, int mode,
>  
>  /* Add directory type and remove write bits */
>  mode = ((mode & 0777) | S_IFDIR) & ~(S_IWUSR | S_IWGRP | S_IWOTH);
> -node = g_malloc0(sizeof(V9fsSynthNode));
> +node = g_new0(V9fsSynthNode, 1);
>  if (attr) {
>  /* We are adding .. or . entries */
>  node->attr = attr;
> @@ -128,7 +128,7 @@ int qemu_v9fs_synth_add_file(V9fsSynthNode *parent, int 
> mode,
>  }
>  /* Add file type and remove write bits */
>  mode = ((mode & 0777) | S_IFREG);
> -node = g_malloc0(sizeof(V9fsSynthNode));
> +node = g_new0(V9fsSynthNode, 1);
>  node->attr = &node->actual_attr;
>  node->attr->inode  = synth_node_count++;
>  node->attr->nlink  = 1;
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index a6d6b3f835..8e9d4aea73 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -324,7 +324,7 @@ static V9fsFidState *alloc_fid(V9fsState *s, int32_t fid)
>  return NULL;
>  }
>  }
> -f = g_malloc0(sizeof(V9fsFidState));
> +f = g_new0(V9fsFidState, 1);
>  f->fid = fid;
>  f->fid_type = P9_FID_NONE;
>  f->ref = 1;
> @@ -804,7 +804,7 @@ static int qid_inode_prefix_hash_bits(V9fsPDU *pdu, dev_t 
> dev)
>  
>  val = qht_lookup(&pdu->s->qpd_table, &lookup, hash);
>  if (!val) {
> -val = g_malloc0(sizeof(QpdEntry));
> +val = g_new0(QpdEntry, 1);
>  *val = lookup;
>  affix = affixForIndex(pdu->s->qp_affix_next);
>  val->prefix_bits = affix.bits;
> @@ -852,7 +852,7 @@ static int qid_path_fullmap(V9fsPDU *pdu, const struct 
> stat *stbuf,
>  return -ENFILE;
>  }
>  
> -val = g_malloc0(sizeof(QppEntry));
> +val = g_new0(QpfEntry, 1);
>  *val = lookup;
>  
>  /* new unique inode and device combo */
> @@ -928,7 +928,7 @@ static int qid_path_suffixmap(V9fsPDU *pdu, const struct 
> stat *stbuf,
>  return -ENFILE;
>  }
>  
> -val = g_malloc0(sizeof(QppEntry));
> +val = g_new0(QppEntry, 1);
>  *val = lookup;
>  
>  /* new unique inode affix and device combo */
> diff --git a/hw/9pfs/codir.c b/hw/9pfs/codir.c
> index 75148bc985..93ba44fb75 100644
> --- a/hw/9pfs/codir.c
> +++ b/hw/9pfs/codir.c
> @@ -141,9 +141,9 @@ static int do_readdir_many(V9fsPDU *pdu, V9fsFidState 
> *fidp,
>  
>  /* append next node to result chain */
>  if (!e) {
> -*entries = e = g_malloc0(sizeof(V9fsDirEnt));
> +  

Re: [PATCH v2 0/3] virtio: increase VIRTQUEUE_MAX_SIZE to 32k

2021-10-08 Thread Greg Kurz
On Thu, 7 Oct 2021 16:42:49 +0100
Stefan Hajnoczi  wrote:

> On Thu, Oct 07, 2021 at 02:51:55PM +0200, Christian Schoenebeck wrote:
> > On Donnerstag, 7. Oktober 2021 07:23:59 CEST Stefan Hajnoczi wrote:
> > > On Mon, Oct 04, 2021 at 09:38:00PM +0200, Christian Schoenebeck wrote:
> > > > At the moment the maximum transfer size with virtio is limited to 4M
> > > > (1024 * PAGE_SIZE). This series raises this limit to its maximum
> > > > theoretical possible transfer size of 128M (32k pages) according to the
> > > > virtio specs:
> > > > 
> > > > > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-240006
> > > Hi Christian,
> > > I took a quick look at the code:
> > > 


Hi,

Thanks Stefan for sharing virtio expertise and helping Christian !

> > > - The Linux 9p driver restricts descriptor chains to 128 elements
> > >   (net/9p/trans_virtio.c:VIRTQUEUE_NUM)
> > 
> > Yes, that's the limitation that I am about to remove (WIP); current kernel 
> > patches:
> > https://lore.kernel.org/netdev/cover.1632327421.git.linux_...@crudebyte.com/
> 
> I haven't read the patches yet but I'm concerned that today the driver
> is pretty well-behaved and this new patch series introduces a spec
> violation. Not fixing existing spec violations is okay, but adding new
> ones is a red flag. I think we need to figure out a clean solution.
> 
> > > - The QEMU 9pfs code passes iovecs directly to preadv(2) and will fail
> > >   with EINVAL when called with more than IOV_MAX iovecs
> > >   (hw/9pfs/9p.c:v9fs_read())
> > 
> > Hmm, which makes me wonder why I never encountered this error during 
> > testing.
> > 
> > Most people will use the 9p qemu 'local' fs driver backend in practice, so 
> > that v9fs_read() call would translate for most people to this 
> > implementation 
> > on QEMU side (hw/9p/9p-local.c):
> > 
> > static ssize_t local_preadv(FsContext *ctx, V9fsFidOpenState *fs,
> > const struct iovec *iov,
> > int iovcnt, off_t offset)
> > {
> > #ifdef CONFIG_PREADV
> > return preadv(fs->fd, iov, iovcnt, offset);
> > #else
> > int err = lseek(fs->fd, offset, SEEK_SET);
> > if (err == -1) {
> > return err;
> > } else {
> > return readv(fs->fd, iov, iovcnt);
> > }
> > #endif
> > }
> > 
> > > Unless I misunderstood the code, neither side can take advantage of the
> > > new 32k descriptor chain limit?
> > > 
> > > Thanks,
> > > Stefan
> > 
> > I need to check that when I have some more time. One possible explanation 
> > might be that preadv() already has this wrapped into a loop in its 
> > implementation to circumvent a limit like IOV_MAX. It might be another "it 
> > works, but not portable" issue, but not sure.
> >
> > There are still a bunch of other issues I have to resolve. If you look at
> > net/9p/client.c on kernel side, you'll notice that it basically does this 
> > ATM
> > 
> > kmalloc(msize);
> > 

Note that this is done twice : once for the T message (client request) and once
for the R message (server answer). The 9p driver could adjust the size of the T
message to what's really needed instead of allocating the full msize. R message
size is not known though.

> > for every 9p request. So not only does it allocate much more memory for 
> > every 
> > request than actually required (i.e. say 9pfs was mounted with msize=8M, 
> > then 
> > a 9p request that actually would just need 1k would nevertheless allocate 
> > 8M), 
> > but also it allocates > PAGE_SIZE, which obviously may fail at any time.
> 
> The PAGE_SIZE limitation sounds like a kmalloc() vs vmalloc() situation.
> 
> I saw zerocopy code in the 9p guest driver but didn't investigate when
> it's used. Maybe that should be used for large requests (file
> reads/writes)?

This is the case already : zero-copy is only used for reads/writes/readdir
if the requested size is 1k or more.

Also you'll note that in this case, the 9p driver doesn't allocate msize
for the T/R messages but only 4k, which is largely enough to hold the
header.

/*
 * We allocate a inline protocol data of only 4k bytes.
 * The actual content is passed in zero-copy fashion.
 */
req = p9_client_prepare_req(c, type, P9_ZC_HDR_SZ, fmt, ap);

and

/* size of header for zero copy read/write */
#define P9_ZC_HDR_SZ 4096
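
Put differently, the inline buffer size depends on whether the request
takes the zero-copy path. A rough standalone sketch of that decision, not
the kernel code: inline_buf_size(), zc_capable and ZC_MIN_PAYLOAD are
hypothetical names, and the threshold is just the "1k or more" from above.

```c
#include <assert.h>
#include <stddef.h>

#define P9_ZC_HDR_SZ   4096  /* same value as the kernel define above */
#define ZC_MIN_PAYLOAD 1024  /* hypothetical name for the "1k or more" rule */

/*
 * Size of the inline buffer allocated for one message.
 * zc_capable: non-zero if the transport supports zero-copy requests.
 */
static size_t inline_buf_size(size_t payload, size_t msize, int zc_capable)
{
    if (zc_capable && payload >= ZC_MIN_PAYLOAD) {
        return P9_ZC_HDR_SZ;    /* header only, data travels zero-copy */
    }
    return msize;               /* regular path: full msize */
}
```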

A huge msize only makes sense for Twrite, Rread and Rreaddir because
of the amount of data they convey. All other messages certainly fit
in a couple of kilobytes only (sorry, don't remember the numbers).

A first change should be to allocate MIN(XXX, msize) for the
regular non-zc case, where XXX could be a reasonable fixed
value (8k?). In the case of T messages, it is even possible
to adjust the size to exactly what's needed, à la snprintf(NULL, 0, ...).
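
Both ideas in a minimal userspace sketch, only to show the pattern. This
is not 9p client code: SMALL_MSG_CAP, non_zc_alloc_size() and
format_exact() are made-up names, and the "%s:%u" format is an arbitrary
stand-in for marshalling a T message.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap for regular (non zero-copy) messages, the "8k?" above. */
#define SMALL_MSG_CAP 8192

/* Buffer size for the non-zc case: MIN(SMALL_MSG_CAP, msize). */
static size_t non_zc_alloc_size(size_t msize)
{
    return msize < SMALL_MSG_CAP ? msize : SMALL_MSG_CAP;
}

/*
 * Two-pass formatting: the first snprintf() call only measures, the
 * second one fills a buffer of exactly the measured size.
 * Returns a malloc'ed string the caller must free, or NULL on failure.
 */
static char *format_exact(const char *name, unsigned int fid, size_t *len)
{
    int n = snprintf(NULL, 0, "%s:%u", name, fid);
    char *buf;

    if (n < 0) {
        return NULL;
    }
    buf = malloc((size_t)n + 1);
    if (!buf) {
        return NULL;
    }
    snprintf(buf, (size_t)n + 1, "%s:%u", name, fid);
    *len = (size_t)n;
    return buf;
}
```

The measuring pass costs a second walk over the arguments, which should
be cheap compared to allocating a full msize for every request.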

> virtio-blk/scsi don't memcpy data into a new buffer, they
> directly access page cache or O_DIRECT pinned pages.
> 
> Stefan

Cheers,

--
Greg



Re: [PATCH v2 1/3] virtio: turn VIRTQUEUE_MAX_SIZE into a variable

2021-10-05 Thread Greg Kurz
On Mon, 4 Oct 2021 21:38:04 +0200
Christian Schoenebeck  wrote:

> Refactor VIRTQUEUE_MAX_SIZE to effectively become a runtime
> variable per virtio user.
> 
> Reasons:
> 
> (1) VIRTQUEUE_MAX_SIZE should reflect the absolute theoretical
> maximum queue size possible. Which is actually the maximum
> queue size allowed by the virtio protocol. The appropriate
> value for VIRTQUEUE_MAX_SIZE would therefore be 32768:
> 
> 
> https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-240006
> 
> Apparently VIRTQUEUE_MAX_SIZE was instead defined with a
> more or less arbitrary value of 1024 in the past, which
> limits the maximum transfer size with virtio to 4M
> (more precise: 1024 * PAGE_SIZE, with the latter typically
> being 4k).
> 
> (2) Additionally the current value of 1024 poses a hidden limit,
> invisible to guest, which causes a system hang with the
> following QEMU error if guest tries to exceed it:
> 
> virtio: too many write descriptors in indirect table
> 
> (3) Unfortunately not all virtio users in QEMU would currently
> work correctly with the new value of 32768.
> 
> So let's turn this hard coded global value into a runtime
> variable as a first step in this commit, configurable for each
> virtio user by passing a corresponding value with virtio_init()
> call.
> 
> Signed-off-by: Christian Schoenebeck 
> ---

Reviewed-by: Greg Kurz 

>  hw/9pfs/virtio-9p-device.c |  3 ++-
>  hw/block/vhost-user-blk.c  |  2 +-
>  hw/block/virtio-blk.c  |  3 ++-
>  hw/char/virtio-serial-bus.c|  2 +-
>  hw/display/virtio-gpu-base.c   |  2 +-
>  hw/input/virtio-input.c|  2 +-
>  hw/net/virtio-net.c| 15 ---
>  hw/scsi/virtio-scsi.c  |  2 +-
>  hw/virtio/vhost-user-fs.c  |  2 +-
>  hw/virtio/vhost-user-i2c.c |  3 ++-
>  hw/virtio/vhost-vsock-common.c |  2 +-
>  hw/virtio/virtio-balloon.c |  4 ++--
>  hw/virtio/virtio-crypto.c  |  3 ++-
>  hw/virtio/virtio-iommu.c   |  2 +-
>  hw/virtio/virtio-mem.c |  2 +-
>  hw/virtio/virtio-pmem.c|  2 +-
>  hw/virtio/virtio-rng.c |  2 +-
>  hw/virtio/virtio.c | 35 +++---
>  include/hw/virtio/virtio.h |  5 -
>  19 files changed, 57 insertions(+), 36 deletions(-)
> 
> diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
> index 54ee93b71f..cd5d95dd51 100644
> --- a/hw/9pfs/virtio-9p-device.c
> +++ b/hw/9pfs/virtio-9p-device.c
> @@ -216,7 +216,8 @@ static void virtio_9p_device_realize(DeviceState *dev, 
> Error **errp)
>  }
>  
>  v->config_size = sizeof(struct virtio_9p_config) + strlen(s->fsconf.tag);
> -virtio_init(vdev, "virtio-9p", VIRTIO_ID_9P, v->config_size);
> +virtio_init(vdev, "virtio-9p", VIRTIO_ID_9P, v->config_size,
> +VIRTQUEUE_MAX_SIZE);
>  v->vq = virtio_add_queue(vdev, MAX_REQ, handle_9p_output);
>  }
>  
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index ba13cb87e5..336f56705c 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -491,7 +491,7 @@ static void vhost_user_blk_device_realize(DeviceState 
> *dev, Error **errp)
>  }
>  
>  virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
> -sizeof(struct virtio_blk_config));
> +sizeof(struct virtio_blk_config), VIRTQUEUE_MAX_SIZE);
>  
>  s->virtqs = g_new(VirtQueue *, s->num_queues);
>  for (i = 0; i < s->num_queues; i++) {
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index f139cd7cc9..9c0f46815c 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -1213,7 +1213,8 @@ static void virtio_blk_device_realize(DeviceState *dev, 
> Error **errp)
>  
>  virtio_blk_set_config_size(s, s->host_features);
>  
> -virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK, s->config_size);
> +virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK, s->config_size,
> +VIRTQUEUE_MAX_SIZE);
>  
>  s->blk = conf->conf.blk;
>  s->rq = NULL;
> diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
> index f01ec2137c..9ad915 100644
> --- a/hw/char/virtio-serial-bus.c
> +++ b/hw/char/virtio-serial-bus.c
> @@ -1045,7 +1045,7 @@ static void virtio_serial_device_realize(DeviceState 
> *dev, Error **errp)
>  config_size = offsetof(struct virtio_console_config, emerg_wr);
>  }
>  virtio_init(vdev, "virtio-serial", VIRTIO_ID_CONSOLE,
> - 

Re: [PATCH v2 2/3] virtio: increase VIRTQUEUE_MAX_SIZE to 32k

2021-10-05 Thread Greg Kurz
On Tue, 5 Oct 2021 03:16:07 -0400
"Michael S. Tsirkin"  wrote:

> On Mon, Oct 04, 2021 at 09:38:08PM +0200, Christian Schoenebeck wrote:
> > Raise the maximum possible virtio transfer size to 128M
> > (more precisely: 32k * PAGE_SIZE). See previous commit for a
> > more detailed explanation for the reasons of this change.
> > 
> > For not breaking any virtio user, all virtio users transition
> > to using the new macro VIRTQUEUE_LEGACY_MAX_SIZE instead of
> > VIRTQUEUE_MAX_SIZE, so they are all still using the old value
> > of 1k with this commit.
> > 
> > On the long-term, each virtio user should subsequently either
> > switch from VIRTQUEUE_LEGACY_MAX_SIZE to VIRTQUEUE_MAX_SIZE
> > after checking that they support the new value of 32k, or
> > otherwise they should replace the VIRTQUEUE_LEGACY_MAX_SIZE
> > macro by an appropriate value supported by them.
> > 
> > Signed-off-by: Christian Schoenebeck 
> 
> 
> I don't think we need this. Legacy isn't descriptive either.  Just leave
> VIRTQUEUE_MAX_SIZE alone, and come up with a new name for 32k.
> 

Yes, I agree. Only virtio-9p is going to benefit from the new
size in the short/medium term, so it looks a bit excessive to
patch all devices. Also, you end up reverting the name change
in the last patch for virtio-9p... which is an indication
that this patch does too much.

Introduce the new macro in virtio-9p and use it only there.
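
Something along these lines, as a standalone sketch. The macro name
VIRTIO_9P_QUEUE_MAX_SIZE and the helper max_transfer_bytes() are
hypothetical; the helper is only there to spell out the 4M vs 128M
arithmetic from the commit message, assuming 4k pages and one page per
descriptor.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTQUEUE_MAX_SIZE        1024   /* existing value, left alone */
#define VIRTIO_9P_QUEUE_MAX_SIZE  32768  /* hypothetical name for the 32k limit */

/* Upper bound on one transfer, assuming one page per descriptor. */
static uint64_t max_transfer_bytes(uint32_t queue_size, uint32_t page_size)
{
    return (uint64_t)queue_size * page_size;
}
```

With 4k pages this gives 1024 * 4096 = 4M for the existing limit and
32768 * 4096 = 128M for the new one.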

> > ---
> >  hw/9pfs/virtio-9p-device.c |  2 +-
> >  hw/block/vhost-user-blk.c  |  6 +++---
> >  hw/block/virtio-blk.c  |  6 +++---
> >  hw/char/virtio-serial-bus.c|  2 +-
> >  hw/input/virtio-input.c|  2 +-
> >  hw/net/virtio-net.c| 12 ++--
> >  hw/scsi/virtio-scsi.c  |  2 +-
> >  hw/virtio/vhost-user-fs.c  |  6 +++---
> >  hw/virtio/vhost-user-i2c.c |  2 +-
> >  hw/virtio/vhost-vsock-common.c |  2 +-
> >  hw/virtio/virtio-balloon.c |  2 +-
> >  hw/virtio/virtio-crypto.c  |  2 +-
> >  hw/virtio/virtio-iommu.c   |  2 +-
> >  hw/virtio/virtio-mem.c |  2 +-
> >  hw/virtio/virtio-mmio.c|  4 ++--
> >  hw/virtio/virtio-pmem.c|  2 +-
> >  hw/virtio/virtio-rng.c |  3 ++-
> >  include/hw/virtio/virtio.h | 20 +++-
> >  18 files changed, 49 insertions(+), 30 deletions(-)
> > 
> > diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
> > index cd5d95dd51..9013e7df6e 100644
> > --- a/hw/9pfs/virtio-9p-device.c
> > +++ b/hw/9pfs/virtio-9p-device.c
> > @@ -217,7 +217,7 @@ static void virtio_9p_device_realize(DeviceState *dev, 
> > Error **errp)
> >  
> >  v->config_size = sizeof(struct virtio_9p_config) + 
> > strlen(s->fsconf.tag);
> >  virtio_init(vdev, "virtio-9p", VIRTIO_ID_9P, v->config_size,
> > -VIRTQUEUE_MAX_SIZE);
> > +VIRTQUEUE_LEGACY_MAX_SIZE);
> >  v->vq = virtio_add_queue(vdev, MAX_REQ, handle_9p_output);
> >  }
> >  
> > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > index 336f56705c..e5e45262ab 100644
> > --- a/hw/block/vhost-user-blk.c
> > +++ b/hw/block/vhost-user-blk.c
> > @@ -480,9 +480,9 @@ static void vhost_user_blk_device_realize(DeviceState 
> > *dev, Error **errp)
> >  error_setg(errp, "queue size must be non-zero");
> >  return;
> >  }
> > -if (s->queue_size > VIRTQUEUE_MAX_SIZE) {
> > +if (s->queue_size > VIRTQUEUE_LEGACY_MAX_SIZE) {
> >  error_setg(errp, "queue size must not exceed %d",
> > -   VIRTQUEUE_MAX_SIZE);
> > +   VIRTQUEUE_LEGACY_MAX_SIZE);
> >  return;
> >  }
> >  
> > @@ -491,7 +491,7 @@ static void vhost_user_blk_device_realize(DeviceState 
> > *dev, Error **errp)
> >  }
> >  
> >  virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
> > -sizeof(struct virtio_blk_config), VIRTQUEUE_MAX_SIZE);
> > +sizeof(struct virtio_blk_config), 
> > VIRTQUEUE_LEGACY_MAX_SIZE);
> >  
> >  s->virtqs = g_new(VirtQueue *, s->num_queues);
> >  for (i = 0; i < s->num_queues; i++) {
> > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> > index 9c0f46815c..5883e3e7db 100644
> > --- a/hw/block/virtio-blk.c
> > +++ b/hw/block/virtio-blk.c
> > @@ -1171,10 +1171,10 @@ static void virtio_blk_device_realize(DeviceState 
> > *dev, Error **errp)
> >  return;
> >  }
> >  if (!is_power_of_2(conf->queue_size) ||
> > -conf->queue_size > VIRTQUEUE_MAX_SIZE) {
> > +conf->queue_size > VIRTQUEUE_LEGACY_MAX_SIZE) {
> >  error_setg(errp, "invalid queue-size property (%" PRIu16 "), "
> > "must be a power of 2 (max %d)",
> > -   conf->queue_size, VIRTQUEUE_MAX_SIZE);
> > +   conf->queue_size, VIRTQUEUE_LEGACY_MAX_SIZE);
> >  return;
> >  }
> >  
> > @@ -1214,7 +1214,7 @@ static void virtio_blk_device_realize(DeviceState 
> > *dev, Error **errp)
> >  virtio_blk_set_config_size(s, 

Re: [PATCH v2 0/4] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci

2021-05-17 Thread Greg Kurz
On Wed, 12 May 2021 17:05:53 +0100
Stefan Hajnoczi  wrote:

> On Fri, May 07, 2021 at 06:59:01PM +0200, Greg Kurz wrote:
> > Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
> > a serious slow down may be observed on setups with a big enough number
> > of vCPUs.
> > 
> > Example with a pseries guest on a bi-POWER9 socket system (128 HW threads):
> > 
> > vCPUs   virtio-scsi  virtio-blk
> > 
> > 1       0m20.922s    0m21.346s
> > 2       0m21.230s    0m20.350s
> > 4       0m21.761s    0m20.997s
> > 8       0m22.770s    0m20.051s
> > 16      0m22.038s    0m19.994s
> > 32      0m22.928s    0m20.803s
> > 64      0m26.583s    0m22.953s
> > 128     0m41.273s    0m32.333s
> > 256     2m4.727s     1m16.924s
> > 384     6m5.563s     3m26.186s
> > 
> > Both perf and gprof indicate that QEMU is hogging CPUs when setting up
> > the ioeventfds:
> > 
> >  67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
> >   9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
> >   8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
> > =>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
> > =>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
> >   0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single
> > 
> > address_space_update_ioeventfds() is called when committing an MR
> > transaction, i.e. for each ioeventfd with the current code base,
> > and it internally loops on all ioeventfds:
> > 
> > static void address_space_update_ioeventfds(AddressSpace *as)
> > {
> > [...]
> >     FOR_EACH_FLAT_RANGE(fr, view) {
> >         for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
> > 
> > This means that the setup of ioeventfds for these devices has
> > quadratic time complexity.
> > 
> > This series simply changes the device models to extend the transaction
> > to all virtqueues, as was already done in the generic code with
> > 710fccf80d78 ("virtio: improve virtio devices initialization
> > time").
> > 
> > Only virtio-scsi and virtio-blk are covered here, but a similar change
> > might also be beneficial to other device types such as vhost-scsi-pci,
> > vhost-user-scsi-pci and vhost-user-blk-pci.
> > 
> > vCPUs   virtio-scsi  virtio-blk
> > 
> > 1       0m21.271s    0m22.076s
> > 2       0m20.912s    0m19.716s
> > 4       0m20.508s    0m19.310s
> > 8       0m21.374s    0m20.273s
> > 16      0m21.559s    0m21.374s
> > 32      0m22.532s    0m21.271s
> > 64      0m26.550s    0m22.007s
> > 128     0m29.115s    0m27.446s
> > 256     0m44.752s    0m41.004s
> > 384     1m2.884s     0m58.023s
> > 
> > This should fix https://bugzilla.redhat.com/show_bug.cgi?id=1927108
> > which reported the issue for virtio-scsi-pci.
> > 
> > Changes since v1:
> > - Add some comments (Stefan)
> > - Drop optimization on the error path in patch 2 (Stefan)
> > 
> > Changes since RFC:
> > 
> > As suggested by Stefan, simplify the code by directly beginning and
> > committing the memory transaction from the device model, without all
> > the virtio specific proxying code and no changes needed in the memory
> > subsystem.
> > 
> > Greg Kurz (4):
> >   virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
> >   virtio-blk: Configure all host notifiers in a single MR transaction
> >   virtio-scsi: Set host notifiers and callbacks separately
> >   virtio-scsi: Configure all host notifiers in a single MR transaction
> > 
> >  hw/block/dataplane/virtio-blk.c | 45 -
> >  hw/scsi/virtio-scsi-dataplane.c | 72 -
> >  2 files changed, 97 insertions(+), 20 deletions(-)
> > 
> > -- 
> > 2.26.3
> > 
> 
> Thanks, applied to my block tree:
> https://gitlab.com/stefanha/qemu/commits/block
> 

Hi Stefan,

It seems that Michael already merged the previous version of this
patch set in his latest PR.

https://gitlab.com/qemu-project/qemu/-/commit/6005ee07c380cbde44292f5f6c96e7daa70f4f7d

It is thus missing the v1->v2 changes. Basically some comments to
clarify the optimization we're doing with the MR transaction and
the removal of the optimization on an error path.

The optimization on the error path isn't needed indeed but it
doesn't hurt. No need to change that now that the patches are
upstream.

I can post a follow-up patch to add the missing comments though.
While here, I'd even add these comments in the generic
virtio_device_*_ioeventfd_impl() calls as well, since they already
have the very same optimization.

Anyway, I guess you can drop the patches from your tree.

Cheers,

--
Greg

> Stefan





[PATCH v2 4/4] virtio-scsi: Configure all host notifiers in a single MR transaction

2021-05-07 Thread Greg Kurz
This allows the virtio-scsi-pci device to batch the setup of all its
host notifiers. This significantly improves boot time of VMs with a
high number of vCPUs, e.g. from 6m5.563s down to 1m2.884s for a
pseries machine with 384 vCPUs.

Note that memory_region_transaction_commit() must be called before
virtio_bus_cleanup_host_notifier() because the latter might close
ioeventfds that the transaction still assumes to be around when it
commits.

Signed-off-by: Greg Kurz 
Reviewed-by: Stefan Hajnoczi 
---
 hw/scsi/virtio-scsi-dataplane.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index b2cb3d9dcc64..18eb824c97f5 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -152,6 +152,12 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 goto fail_guest_notifiers;
 }
 
+/*
+ * Batch all the host notifiers in a single transaction to avoid
+ * quadratic time complexity in address_space_update_ioeventfds().
+ */
+memory_region_transaction_begin();
+
 rc = virtio_scsi_set_host_notifier(s, vs->ctrl_vq, 0);
 if (rc != 0) {
 goto fail_host_notifiers;
@@ -173,6 +179,8 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 vq_init_count++;
 }
 
+memory_region_transaction_commit();
+
 aio_context_acquire(s->ctx);
 virtio_queue_aio_set_host_notifier_handler(vs->ctrl_vq, s->ctx,
                                            virtio_scsi_data_plane_handle_ctrl);
@@ -192,6 +200,15 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 fail_host_notifiers:
 for (i = 0; i < vq_init_count; i++) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+}
+
+/*
+ * The transaction expects the ioeventfds to be open when it
+ * commits. Do it now, before the cleanup loop.
+ */
+memory_region_transaction_commit();
+
+for (i = 0; i < vq_init_count; i++) {
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
 }
 k->set_guest_notifiers(qbus->parent, vs->conf.num_queues + 2, false);
@@ -229,8 +246,23 @@ void virtio_scsi_dataplane_stop(VirtIODevice *vdev)
 
 blk_drain_all(); /* ensure there are no in-flight requests */
 
+/*
+ * Batch all the host notifiers in a single transaction to avoid
+ * quadratic time complexity in address_space_update_ioeventfds().
+ */
+memory_region_transaction_begin();
+
 for (i = 0; i < vs->conf.num_queues + 2; i++) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+}
+
+/*
+ * The transaction expects the ioeventfds to be open when it
+ * commits. Do it now, before the cleanup loop.
+ */
+memory_region_transaction_commit();
+
+for (i = 0; i < vs->conf.num_queues + 2; i++) {
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
 }
 
-- 
2.26.3
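
The ordering constraint from the commit message above (commit the transaction first, close the ioeventfds afterwards) can be pictured with a toy model. This is only a sketch under stated assumptions, not QEMU code: ToyBus, toy_unassign(), toy_cleanup(), toy_commit() and toy_stop() are hypothetical names, and the model simply assumes, as the commit message says, that the commit still expects every queued notifier fd to be open.

```c
#include <assert.h>

#define TOY_NVQS 4

/* Toy model, not QEMU code: all names here are hypothetical. */
typedef struct {
    int fd_open[TOY_NVQS];      /* 1 while the notifier fd is open */
    int pending_del[TOY_NVQS];  /* removal queued until the commit */
    int stale_accesses;         /* closed fds touched by a commit */
} ToyBus;

/* Queue the removal of notifier i; nothing is applied yet. */
static void toy_unassign(ToyBus *bus, int i)
{
    bus->pending_del[i] = 1;
}

/* Close the fd of notifier i. */
static void toy_cleanup(ToyBus *bus, int i)
{
    bus->fd_open[i] = 0;
}

/* Apply queued removals; each one is assumed to need its fd still open. */
static void toy_commit(ToyBus *bus)
{
    int i;

    for (i = 0; i < TOY_NVQS; i++) {
        if (bus->pending_del[i]) {
            if (!bus->fd_open[i]) {
                bus->stale_accesses++;
            }
            bus->pending_del[i] = 0;
        }
    }
}

/* Tear down all notifiers; returns how many closed fds the commit touched. */
static int toy_stop(int commit_before_cleanup)
{
    ToyBus bus = { {0}, {0}, 0 };
    int i;

    for (i = 0; i < TOY_NVQS; i++) {
        bus.fd_open[i] = 1;
    }
    for (i = 0; i < TOY_NVQS; i++) {
        toy_unassign(&bus, i);
    }
    if (commit_before_cleanup) {
        toy_commit(&bus);
    }
    for (i = 0; i < TOY_NVQS; i++) {
        toy_cleanup(&bus, i);
    }
    if (!commit_before_cleanup) {
        toy_commit(&bus);
    }
    return bus.stale_accesses;
}
```

In this model, toy_stop(1) follows the order used by the patch and never touches a closed fd, while toy_stop(0) closes the fds first and the late commit hits every one of them. That is why the single loop had to be split in two around the commit.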




[PATCH v2 2/4] virtio-blk: Configure all host notifiers in a single MR transaction

2021-05-07 Thread Greg Kurz
This allows the virtio-blk-pci device to batch the setup of all its
host notifiers. This significantly improves boot time of VMs with a
high number of vCPUs, e.g. from 3m26.186s down to 0m58.023s for a
pseries machine with 384 vCPUs.

Note that memory_region_transaction_commit() must be called before
virtio_bus_cleanup_host_notifier() because the latter might close
ioeventfds that the transaction still assumes to be around when it
commits.

Signed-off-by: Greg Kurz 
---
 hw/block/dataplane/virtio-blk.c | 34 +
 1 file changed, 34 insertions(+)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index d7b5c95d26d9..55efded716e2 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -198,19 +198,38 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 goto fail_guest_notifiers;
 }
 
+/*
+ * Batch all the host notifiers in a single transaction to avoid
+ * quadratic time complexity in address_space_update_ioeventfds().
+ */
+memory_region_transaction_begin();
+
 /* Set up virtqueue notify */
 for (i = 0; i < nvqs; i++) {
 r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, true);
 if (r != 0) {
+int j = i;
+
 fprintf(stderr, "virtio-blk failed to set host notifier (%d)\n", r);
 while (i--) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+}
+
+/*
+ * The transaction expects the ioeventfds to be open when it
+ * commits. Do it now, before the cleanup loop.
+ */
+memory_region_transaction_commit();
+
+while (j--) {
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
 }
 goto fail_host_notifiers;
 }
 }
 
+memory_region_transaction_commit();
+
 s->starting = false;
 vblk->dataplane_started = true;
 trace_virtio_blk_data_plane_start(s);
@@ -312,8 +331,23 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 
 aio_context_release(s->ctx);
 
+/*
+ * Batch all the host notifiers in a single transaction to avoid
+ * quadratic time complexity in address_space_update_ioeventfds().
+ */
+memory_region_transaction_begin();
+
 for (i = 0; i < nvqs; i++) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+}
+
+/*
+ * The transaction expects the ioeventfds to be open when it
+ * commits. Do it now, before the cleanup loop.
+ */
+memory_region_transaction_commit();
+
+for (i = 0; i < nvqs; i++) {
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
 }
 
-- 
2.26.3




[PATCH v2 3/4] virtio-scsi: Set host notifiers and callbacks separately

2021-05-07 Thread Greg Kurz
Host notifiers are guaranteed to be idle until the callbacks are
hooked up with virtio_queue_aio_set_host_notifier_handler(). They
thus don't need to be set or unset with the AioContext lock held.

Do this outside the critical section, like virtio-blk already
does: basically, downgrade virtio_scsi_vring_init() to only set up
the host notifier, and set the callback in the caller.

This will allow batching the addition/deletion of ioeventfds in
a single memory transaction, which is expected to greatly
improve initialization time.

Signed-off-by: Greg Kurz 
Reviewed-by: Stefan Hajnoczi 
---
 hw/scsi/virtio-scsi-dataplane.c | 40 ++---
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 4ad879340645..b2cb3d9dcc64 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -94,8 +94,7 @@ static bool virtio_scsi_data_plane_handle_event(VirtIODevice *vdev,
 return progress;
 }
 
-static int virtio_scsi_vring_init(VirtIOSCSI *s, VirtQueue *vq, int n,
-  VirtIOHandleAIOOutput fn)
+static int virtio_scsi_set_host_notifier(VirtIOSCSI *s, VirtQueue *vq, int n)
 {
 BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s)));
 int rc;
@@ -109,7 +108,6 @@ static int virtio_scsi_vring_init(VirtIOSCSI *s, VirtQueue *vq, int n,
 return rc;
 }
 
-virtio_queue_aio_set_host_notifier_handler(vq, s->ctx, fn);
 return 0;
 }
 
@@ -154,38 +152,44 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 goto fail_guest_notifiers;
 }
 
-aio_context_acquire(s->ctx);
-rc = virtio_scsi_vring_init(s, vs->ctrl_vq, 0,
-virtio_scsi_data_plane_handle_ctrl);
-if (rc) {
-goto fail_vrings;
+rc = virtio_scsi_set_host_notifier(s, vs->ctrl_vq, 0);
+if (rc != 0) {
+goto fail_host_notifiers;
 }
 
 vq_init_count++;
-rc = virtio_scsi_vring_init(s, vs->event_vq, 1,
-virtio_scsi_data_plane_handle_event);
-if (rc) {
-goto fail_vrings;
+rc = virtio_scsi_set_host_notifier(s, vs->event_vq, 1);
+if (rc != 0) {
+goto fail_host_notifiers;
 }
 
 vq_init_count++;
+
 for (i = 0; i < vs->conf.num_queues; i++) {
-rc = virtio_scsi_vring_init(s, vs->cmd_vqs[i], i + 2,
-virtio_scsi_data_plane_handle_cmd);
+rc = virtio_scsi_set_host_notifier(s, vs->cmd_vqs[i], i + 2);
 if (rc) {
-goto fail_vrings;
+goto fail_host_notifiers;
 }
 vq_init_count++;
 }
 
+aio_context_acquire(s->ctx);
+virtio_queue_aio_set_host_notifier_handler(vs->ctrl_vq, s->ctx,
+                                           virtio_scsi_data_plane_handle_ctrl);
+virtio_queue_aio_set_host_notifier_handler(vs->event_vq, s->ctx,
+                                           virtio_scsi_data_plane_handle_event);
+
+for (i = 0; i < vs->conf.num_queues; i++) {
+virtio_queue_aio_set_host_notifier_handler(vs->cmd_vqs[i], s->ctx,
+                                           virtio_scsi_data_plane_handle_cmd);
+}
+
 s->dataplane_starting = false;
 s->dataplane_started = true;
 aio_context_release(s->ctx);
 return 0;
 
-fail_vrings:
-aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
-aio_context_release(s->ctx);
+fail_host_notifiers:
 for (i = 0; i < vq_init_count; i++) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
-- 
2.26.3




[PATCH v2 1/4] virtio-blk: Fix rollback path in virtio_blk_data_plane_start()

2021-05-07 Thread Greg Kurz
When dataplane multiqueue support was added in QEMU 2.7, the path
that would roll back guest notifiers assignment in case of error
simply got dropped.

Later on, when Error was added to blk_set_aio_context() in QEMU 4.1,
another error path was introduced, but it fails to roll back both
host and guest notifiers.

It seems cleaner to fix the rollback path in one go. The patch is
simple enough that it can be adjusted if backported to a pre-4.1
QEMU.

Fixes: 51b04ac5c6a6 ("virtio-blk: dataplane multiqueue support")
Cc: stefa...@redhat.com
Fixes: 97896a4887a0 ("block: Add Error to blk_set_aio_context()")
Cc: kw...@redhat.com
Signed-off-by: Greg Kurz 
Reviewed-by: Stefan Hajnoczi 
---
 hw/block/dataplane/virtio-blk.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index e9050c8987e7..d7b5c95d26d9 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -207,7 +207,7 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
 }
-goto fail_guest_notifiers;
+goto fail_host_notifiers;
 }
 }
 
@@ -221,7 +221,7 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 aio_context_release(old_context);
 if (r < 0) {
 error_report_err(local_err);
-goto fail_guest_notifiers;
+goto fail_aio_context;
 }
 
 /* Process queued requests before the ones in vring */
@@ -245,6 +245,13 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 aio_context_release(s->ctx);
 return 0;
 
+  fail_aio_context:
+for (i = 0; i < nvqs; i++) {
+virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
+}
+  fail_host_notifiers:
+k->set_guest_notifiers(qbus->parent, nvqs, false);
   fail_guest_notifiers:
 /*
  * If we failed to set up the guest notifiers queued requests will be
-- 
2.26.3
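
The label ladder added by this patch (fail_aio_context falling through to fail_host_notifiers, then to fail_guest_notifiers) can be tried out in isolation. The following is a minimal sketch with hypothetical names (ToyDevice, toy_stage_setup(), toy_start(), toy_active_after_start()); it only mirrors the shape of the rollback, where each label undoes the stages set up before the failing one, and is not the QEMU code itself.

```c
#include <assert.h>

/* Hypothetical three-stage setup; any one stage can be forced to fail. */
enum { STAGE_GUEST, STAGE_HOST, STAGE_AIO, STAGE_COUNT };

typedef struct {
    int active[STAGE_COUNT];    /* 1 while the stage is set up */
} ToyDevice;

/* Set up one stage, or fail without side effects if it is the chosen one. */
static int toy_stage_setup(ToyDevice *d, int stage, int fail_at)
{
    if (stage == fail_at) {
        return -1;
    }
    d->active[stage] = 1;
    return 0;
}

static void toy_stage_teardown(ToyDevice *d, int stage)
{
    d->active[stage] = 0;
}

/* fail_at < 0 means that no stage fails. */
static int toy_start(ToyDevice *d, int fail_at)
{
    if (toy_stage_setup(d, STAGE_GUEST, fail_at) < 0) {
        goto fail_guest_notifiers;
    }
    if (toy_stage_setup(d, STAGE_HOST, fail_at) < 0) {
        goto fail_host_notifiers;
    }
    if (toy_stage_setup(d, STAGE_AIO, fail_at) < 0) {
        goto fail_aio_context;
    }
    return 0;

    /* Each label undoes the stage completed just before the failure. */
fail_aio_context:
    toy_stage_teardown(d, STAGE_HOST);
fail_host_notifiers:
    toy_stage_teardown(d, STAGE_GUEST);
fail_guest_notifiers:
    return -1;
}

/* Number of stages still set up after toy_start() returns. */
static int toy_active_after_start(int fail_at)
{
    ToyDevice d = { {0, 0, 0} };
    int i, n = 0;

    toy_start(&d, fail_at);
    for (i = 0; i < STAGE_COUNT; i++) {
        n += d.active[i];
    }
    return n;
}
```

Whatever stage fails, toy_active_after_start() reports nothing left behind. Jumping straight to the last label from every error site, as the code did before this patch, would leave the earlier stages set up.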




[PATCH v2 0/4] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci

2021-05-07 Thread Greg Kurz
Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
a serious slow down may be observed on setups with a big enough number
of vCPUs.

Example with a pseries guest on a bi-POWER9 socket system (128 HW threads):

vCPUs   virtio-scsi  virtio-blk

1       0m20.922s    0m21.346s
2       0m21.230s    0m20.350s
4       0m21.761s    0m20.997s
8       0m22.770s    0m20.051s
16      0m22.038s    0m19.994s
32      0m22.928s    0m20.803s
64      0m26.583s    0m22.953s
128     0m41.273s    0m32.333s
256     2m4.727s     1m16.924s
384     6m5.563s     3m26.186s

Both perf and gprof indicate that QEMU is hogging CPUs when setting up
the ioeventfds:

 67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
  9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
  8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
=>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
=>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
  0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single

address_space_update_ioeventfds() is called when committing an MR
transaction, i.e. for each ioeventfd with the current code base,
and it internally loops on all ioeventfds:

static void address_space_update_ioeventfds(AddressSpace *as)
{
[...]
    FOR_EACH_FLAT_RANGE(fr, view) {
        for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {

This means that the setup of ioeventfds for these devices has
quadratic time complexity.

This series simply changes the device models to extend the transaction
to all virtqueues, as was already done in the generic code with
710fccf80d78 ("virtio: improve virtio devices initialization
time").
Only virtio-scsi and virtio-blk are covered here, but a similar change
might also be beneficial to other device types such as host-scsi-pci,
vhost-user-scsi-pci and vhost-user-blk-pci.

vCPUs   virtio-scsi  virtio-blk

1       0m21.271s    0m22.076s
2       0m20.912s    0m19.716s
4       0m20.508s    0m19.310s
8       0m21.374s    0m20.273s
16      0m21.559s    0m21.374s
32      0m22.532s    0m21.271s
64      0m26.550s    0m22.007s
128     0m29.115s    0m27.446s
256     0m44.752s    0m41.004s
384     1m2.884s     0m58.023s

This should fix https://bugzilla.redhat.com/show_bug.cgi?id=1927108
which reported the issue for virtio-scsi-pci.

Changes since v1:
- Add some comments (Stefan)
- Drop optimization on the error path in patch 2 (Stefan)

Changes since RFC:

As suggested by Stefan, simplify the code by directly beginning and
committing the memory transaction from the device model, without all
the virtio specific proxying code and no changes needed in the memory
subsystem.

Greg Kurz (4):
  virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
  virtio-blk: Configure all host notifiers in a single MR transaction
  virtio-scsi: Set host notifiers and callbacks separately
  virtio-scsi: Configure all host notifiers in a single MR transaction

 hw/block/dataplane/virtio-blk.c | 45 -
 hw/scsi/virtio-scsi-dataplane.c | 72 -
 2 files changed, 97 insertions(+), 20 deletions(-)

-- 
2.26.3





Re: [PATCH 11/23] hw/intc/xics: Avoid dynamic stack allocation

2021-05-06 Thread Greg Kurz
On Wed,  5 May 2021 23:10:35 +0200
Philippe Mathieu-Daudé  wrote:

> Use autofree heap allocation instead of variable-length
> array on the stack.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/intc/xics.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index 68f9d44feb4..c293d00d5c4 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -566,8 +566,8 @@ static void ics_reset_irq(ICSIRQState *irq)
>  static void ics_reset(DeviceState *dev)
>  {
>  ICSState *ics = ICS(dev);
> +g_autofree uint8_t *flags = g_malloc(ics->nr_irqs);

I would have made it g_new(uint8_t, ics->nr_irqs) so that changes
in the type of 'flags' that could potentially change the allocated
size are safely detected.

This is unlikely though, so:

Reviewed-by: Greg Kurz 

>  int i;
> -uint8_t flags[ics->nr_irqs];
>  
>  for (i = 0; i < ics->nr_irqs; i++) {
>  flags[i] = ics->irqs[i].flags;
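
The point about a typed allocation can be shown without GLib. This is only a sketch under stated assumptions: TOY_NEW() and TOY_NEW_SIZE() are hypothetical stand-ins that imitate the shape of a typed allocator such as g_new(), and toy_sum_flags() is an illustrative helper, not the ics_reset() code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a typed allocator: the byte count is derived
 * from the element type and the result is a typed pointer, not void *.
 */
#define TOY_NEW(type, n) ((type *)malloc(sizeof(type) * (size_t)(n)))

/* Bytes that a typed allocation of n elements asks for. */
#define TOY_NEW_SIZE(type, n) (sizeof(type) * (size_t)(n))

/* Copies n flag bytes into a typed heap buffer and returns their sum. */
static unsigned toy_sum_flags(const uint8_t *src, int n)
{
    uint8_t *flags = TOY_NEW(uint8_t, n);
    unsigned sum = 0;
    int i;

    if (flags == NULL) {
        return 0;
    }
    for (i = 0; i < n; i++) {
        flags[i] = src[i];
        sum += flags[i];
    }
    free(flags);
    return sum;
}
```

If the element type were later widened, a plain malloc(n) would silently keep allocating n bytes. With the typed form the size follows the type, and assigning the typed result to a pointer of a different element type draws a compiler diagnostic.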




Re: [for-6.1 2/4] virtio-blk: Configure all host notifiers in a single MR transaction

2021-05-05 Thread Greg Kurz
On Wed, 5 May 2021 11:14:23 +0100
Stefan Hajnoczi  wrote:

> On Wed, Apr 07, 2021 at 04:34:59PM +0200, Greg Kurz wrote:
> > This allows the virtio-blk-pci device to batch the setup of all its
> > host notifiers. This significantly improves boot time of VMs with a
> > high number of vCPUs, e.g. from 3m26.186s down to 0m58.023s for a
> > pseries machine with 384 vCPUs.
> > 
> > Note that memory_region_transaction_commit() must be called before
> > virtio_bus_cleanup_host_notifier() because the latter might close
> > ioeventfds that the transaction still assumes to be around when it
> > commits.
> > 
> > Signed-off-by: Greg Kurz 
> > ---
> >  hw/block/dataplane/virtio-blk.c | 25 +
> >  1 file changed, 25 insertions(+)
> > 
> > diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> > index d7b5c95d26d9..cd81893d1d01 100644
> > --- a/hw/block/dataplane/virtio-blk.c
> > +++ b/hw/block/dataplane/virtio-blk.c
> > @@ -198,19 +198,30 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
> >  goto fail_guest_notifiers;
> >  }
> >  
> > +memory_region_transaction_begin();
> 
> This call is optional and an optimization. It would be nice to have a
> comment here explaining that - otherwise people may wonder why this is
> necessary.
> 

Indeed. Same goes for patch 4 actually.

Michael,

Should I re-post an updated version of this series, or can the
comments be added in a follow-up?

> > +
> >  /* Set up virtqueue notify */
> >  for (i = 0; i < nvqs; i++) {
> >  r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, true);
> >  if (r != 0) {
> > +int j = i;
> > +
> >  fprintf(stderr, "virtio-blk failed to set host notifier (%d)\n", r);
> >  while (i--) {
> >  virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
> > +}
> > +
> > +memory_region_transaction_commit();
> > +
> > +while (j--) {
> >  virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
> >  }
> >  goto fail_host_notifiers;
> >  }
> >  }
> >  
> > +memory_region_transaction_commit();
> > +
> >  s->starting = false;
> >  vblk->dataplane_started = true;
> >  trace_virtio_blk_data_plane_start(s);
> > @@ -246,8 +257,15 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
> >  return 0;
> >  
> >fail_aio_context:
> > +memory_region_transaction_begin();
> 
> Probably unnecessary since this is an error code path. We don't need to
> optimize it.
> 
> Doesn't hurt though.

True. I can drop this if I repost.




Re: [PATCH 1/5] hw/ppc/spapr_iommu: Register machine reset handler

2021-04-27 Thread Greg Kurz
On Tue, 27 Apr 2021 11:20:07 +0200
Philippe Mathieu-Daudé  wrote:

> On 4/27/21 3:45 AM, David Gibson wrote:
> > On Sat, Apr 24, 2021 at 06:22:25PM +0200, Philippe Mathieu-Daudé wrote:
> >> The TYPE_SPAPR_TCE_TABLE device is bus-less, thus isn't reset
> >> automatically.  Register a reset handler to get reset with the
> >> machine.
> >>
> >> It doesn't seem to be an issue because it is that way since the
> >> device QDev'ifycation 8 years ago, in commit a83000f5e3f
> >> ("spapr-tce: make sPAPRTCETable a proper device").
> >> Still, correct to have a proper API usage.
> > 
> > So, the reason this works now is that we explicitly call
> > device_reset() on the TCE table from the TCE tables "owner", either a
> > PHB (spapr_phb_reset()) or a VIO device (spapr_vio_quiesce_one()).
> > 
> > I think we want either that, or the register_reset(), not both.
> 
> rtas_quiesce() seems to call a DeviceClass::reset() on the
> children of TYPE_SPAPR_VIO_BUS:
> 
> Abstract TYPE_VIO_SPAPR_DEVICE has the TYPE_SPAPR_VIO_BUS bus_type,
> and registers the spapr_vio_busdev_reset() handler, which calls
> spapr_vio_quiesce_one()...
> 
> So either we already have 2 resets, or the bus is never reset?
> 

rtas_quiesce() is called when the guest definitively transitions
from the SLOF FW to the OS. It isn't actually a true reset path,
even if it needs to reset a few devices.

On the other hand, your patch would _really_ cause the TCE table
device to be reset twice at machine reset AFAICT.

> The bus is created in spapr_machine_init():
> 
> /* Set up VIO bus */
> spapr->vio_bus = spapr_vio_bus_init();
> 
> TYPE_SPAPR_MACHINE class registers spapr_machine_reset(), which
> manually calls qemu_devices_reset() and spapr_drc_reset_all(),
> but I can't understand if a callee resets vio_bus...

The vio_bus *is* reset:

#0  0x000100629a98 in spapr_vio_busdev_reset (qdev=0x10165c400) at /home/greg/Work/qemu/qemu-virtiofs/include/hw/ppc/spapr_vio.h:31
#1  0x0001009fd32c in device_transitional_reset (obj=0x10165c400) at /home/greg/Work/qemu/qemu-virtiofs/include/hw/qdev-core.h:17
#2  0x000100a00e24 in resettable_phase_hold (obj=0x10165c400, opaque=<optimized out>, type=<optimized out>) at ../../hw/core/resettable.c:182
#3  0x0001009f9108 in bus_reset_child_foreach (obj=<optimized out>, cb=0x100a00cc0 <resettable_phase_hold>, opaque=0x0, type=<optimized out>) at ../../hw/core/bus.c:97
#4  0x000100a00db8 in resettable_child_foreach (rc=0x1014f5400, type=RESET_TYPE_COLD, opaque=0x0, cb=0x100a00cc0 <resettable_phase_hold>, obj=0x10156e600) at ../../hw/core/resettable.c:96
#5  0x000100a00db8 in resettable_phase_hold (obj=0x10156e600, opaque=<optimized out>, type=<optimized out>) at ../../hw/core/resettable.c:173
#6  0x0001009fcaa8 in device_reset_child_foreach (obj=<optimized out>, cb=0x100a00cc0 <resettable_phase_hold>, opaque=0x0, type=<optimized out>) at ../../hw/core/qdev.c:366
#7  0x000100a00db8 in resettable_child_foreach (rc=0x1013eef90, type=RESET_TYPE_COLD, opaque=0x0, cb=0x100a00cc0 <resettable_phase_hold>, obj=0x10164a0e0) at ../../hw/core/resettable.c:96
#8  0x000100a00db8 in resettable_phase_hold (obj=0x10164a0e0, opaque=<optimized out>, type=<optimized out>) at ../../hw/core/resettable.c:173
#9  0x0001009f9108 in bus_reset_child_foreach (obj=<optimized out>, cb=0x100a00cc0 <resettable_phase_hold>, opaque=0x0, type=<optimized out>) at ../../hw/core/bus.c:97
#10 0x000100a00db8 in resettable_child_foreach (rc=0x1012b1a00, type=RESET_TYPE_COLD, opaque=0x0, cb=0x100a00cc0 <resettable_phase_hold>, obj=0x10154d4b0) at ../../hw/core/resettable.c:96
#11 0x000100a00db8 in resettable_phase_hold (obj=obj@entry=0x10154d4b0, opaque=opaque@entry=0x0, type=type@entry=RESET_TYPE_COLD) at ../../hw/core/resettable.c:173
#12 0x000100a01794 in resettable_assert_reset (obj=0x10154d4b0, type=<optimized out>) at ../../hw/core/resettable.c:60
#13 0x000100a01c60 in resettable_reset (obj=0x10154d4b0, type=<optimized out>) at ../../hw/core/resettable.c:45
#14 0x000100a020ec in resettable_cold_reset_fn (opaque=<optimized out>) at ../../hw/core/resettable.c:269
#15 0x000100a00718 in qemu_devices_reset () at ../../hw/core/reset.c:69
#16 0x000100624024 in spapr_machine_reset (machine=0x101545480) at ../../hw/ppc/spapr.c:1587
#17 0x0001007b8128 in qemu_system_reset (reason=<optimized out>) at ../../softmmu/runstate.c:442
#18 0x0001007b8fa8 in main_loop_should_exit () at ../../softmmu/runstate.c:687
#19 0x0001007b8fa8 in qemu_main_loop () at ../../softmmu/runstate.c:721
#20 0x0001002f5150 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../../softmmu/main.c:50

And it seems rtas_quiesce() could just do bus_cold_reset(>bus)
rather than open-coding the walk of vio_bus children.



Re: [for-6.1 0/4] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci

2021-04-19 Thread Greg Kurz
Ping ?

On Wed, 7 Apr 2021 16:34:57 +0200
Greg Kurz  wrote:

> Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
> a serious slow down may be observed on setups with a big enough number
> of vCPUs.
> 
> Example with a pseries guest on a bi-POWER9 socket system (128 HW threads):
> 
> vCPUs   virtio-scsi  virtio-blk
> 
> 1       0m20.922s    0m21.346s
> 2       0m21.230s    0m20.350s
> 4       0m21.761s    0m20.997s
> 8       0m22.770s    0m20.051s
> 16      0m22.038s    0m19.994s
> 32      0m22.928s    0m20.803s
> 64      0m26.583s    0m22.953s
> 128     0m41.273s    0m32.333s
> 256     2m4.727s     1m16.924s
> 384     6m5.563s     3m26.186s
> 
> Both perf and gprof indicate that QEMU is hogging CPUs when setting up
> the ioeventfds:
> 
>  67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
>   9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
>   8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
> =>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
> =>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
>   0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single
> 
> address_space_update_ioeventfds() is called when committing an MR
> transaction, i.e. for each ioeventfd with the current code base,
> and it internally loops on all ioeventfds:
> 
> static void address_space_update_ioeventfds(AddressSpace *as)
> {
> [...]
>     FOR_EACH_FLAT_RANGE(fr, view) {
>         for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
> 
> This means that the setup of ioeventfds for these devices has
> quadratic time complexity.
> 
> This series simply changes the device models to extend the transaction
> to all virtqueues, as was already done in the generic code with
> 710fccf80d78 ("virtio: improve virtio devices initialization
> time").
> 
> Only virtio-scsi and virtio-blk are covered here, but a similar change
> might also be beneficial to other device types such as vhost-scsi-pci,
> vhost-user-scsi-pci and vhost-user-blk-pci.
> 
> vCPUs   virtio-scsi  virtio-blk
> 
> 1       0m21.271s    0m22.076s
> 2       0m20.912s    0m19.716s
> 4       0m20.508s    0m19.310s
> 8       0m21.374s    0m20.273s
> 16      0m21.559s    0m21.374s
> 32      0m22.532s    0m21.271s
> 64      0m26.550s    0m22.007s
> 128     0m29.115s    0m27.446s
> 256     0m44.752s    0m41.004s
> 384     1m2.884s     0m58.023s
> 
> This should fix https://bugzilla.redhat.com/show_bug.cgi?id=1927108
> which reported the issue for virtio-scsi-pci.
> 
> Changes since RFC:
> 
> As suggested by Stefan, simplify the code by directly beginning and
> committing the memory transaction from the device model, without all
> the virtio specific proxying code and no changes needed in the memory
> subsystem.
> 
> Greg Kurz (4):
>   virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
>   virtio-blk: Configure all host notifiers in a single MR transaction
>   virtio-scsi: Set host notifiers and callbacks separately
>   virtio-scsi: Configure all host notifiers in a single MR transaction
> 
>  hw/block/dataplane/virtio-blk.c | 36 +++--
>  hw/scsi/virtio-scsi-dataplane.c | 56 ++---
>  2 files changed, 72 insertions(+), 20 deletions(-)
> 




[for-6.1 1/4] virtio-blk: Fix rollback path in virtio_blk_data_plane_start()

2021-04-07 Thread Greg Kurz
When dataplane multiqueue support was added in QEMU 2.7, the path
that would roll back guest notifiers assignment in case of error
simply got dropped.

Later on, when Error was added to blk_set_aio_context() in QEMU 4.1,
another error path was introduced, but it fails to roll back both
host and guest notifiers.

It seems cleaner to fix the rollback path in one go. The patch is
simple enough that it can be adjusted if backported to a pre-4.1
QEMU.

Fixes: 51b04ac5c6a6 ("virtio-blk: dataplane multiqueue support")
Cc: stefa...@redhat.com
Fixes: 97896a4887a0 ("block: Add Error to blk_set_aio_context()")
Cc: kw...@redhat.com
Signed-off-by: Greg Kurz 
Reviewed-by: Stefan Hajnoczi 
---
 hw/block/dataplane/virtio-blk.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index e9050c8987e7..d7b5c95d26d9 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -207,7 +207,7 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
 }
-goto fail_guest_notifiers;
+goto fail_host_notifiers;
 }
 }
 
@@ -221,7 +221,7 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 aio_context_release(old_context);
 if (r < 0) {
 error_report_err(local_err);
-goto fail_guest_notifiers;
+goto fail_aio_context;
 }
 
 /* Process queued requests before the ones in vring */
@@ -245,6 +245,13 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 aio_context_release(s->ctx);
 return 0;
 
+  fail_aio_context:
+for (i = 0; i < nvqs; i++) {
+virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
+}
+  fail_host_notifiers:
+k->set_guest_notifiers(qbus->parent, nvqs, false);
   fail_guest_notifiers:
 /*
  * If we failed to set up the guest notifiers queued requests will be
-- 
2.26.3




[for-6.1 4/4] virtio-scsi: Configure all host notifiers in a single MR transaction

2021-04-07 Thread Greg Kurz
This allows the virtio-scsi-pci device to batch the setup of all its
host notifiers. This significantly improves boot time of VMs with a
high number of vCPUs, e.g. from 6m5.563s down to 1m2.884s for a
pseries machine with 384 vCPUs.

Note that memory_region_transaction_commit() must be called before
virtio_bus_cleanup_host_notifier() because the latter might close
ioeventfds that the transaction still assumes to be around when it
commits.

Signed-off-by: Greg Kurz 
---
 hw/scsi/virtio-scsi-dataplane.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index b2cb3d9dcc64..28e003250a11 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -152,6 +152,8 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 goto fail_guest_notifiers;
 }
 
+memory_region_transaction_begin();
+
 rc = virtio_scsi_set_host_notifier(s, vs->ctrl_vq, 0);
 if (rc != 0) {
 goto fail_host_notifiers;
@@ -173,6 +175,8 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 vq_init_count++;
 }
 
+memory_region_transaction_commit();
+
 aio_context_acquire(s->ctx);
 virtio_queue_aio_set_host_notifier_handler(vs->ctrl_vq, s->ctx,
 virtio_scsi_data_plane_handle_ctrl);
@@ -192,6 +196,11 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 fail_host_notifiers:
 for (i = 0; i < vq_init_count; i++) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+}
+
+memory_region_transaction_commit();
+
+for (i = 0; i < vq_init_count; i++) {
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
 }
 k->set_guest_notifiers(qbus->parent, vs->conf.num_queues + 2, false);
@@ -229,8 +238,15 @@ void virtio_scsi_dataplane_stop(VirtIODevice *vdev)
 
 blk_drain_all(); /* ensure there are no in-flight requests */
 
+memory_region_transaction_begin();
+
 for (i = 0; i < vs->conf.num_queues + 2; i++) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+}
+
+memory_region_transaction_commit();
+
+for (i = 0; i < vs->conf.num_queues + 2; i++) {
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
 }
 
-- 
2.26.3




[for-6.1 2/4] virtio-blk: Configure all host notifiers in a single MR transaction

2021-04-07 Thread Greg Kurz
This allows the virtio-blk-pci device to batch the setup of all its
host notifiers. This significantly improves boot time of VMs with a
high number of vCPUs, e.g. from 3m26.186s down to 0m58.023s for a
pseries machine with 384 vCPUs.

Note that memory_region_transaction_commit() must be called before
virtio_bus_cleanup_host_notifier() because the latter might close
ioeventfds that the transaction still assumes to be around when it
commits.

Signed-off-by: Greg Kurz 
---
 hw/block/dataplane/virtio-blk.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index d7b5c95d26d9..cd81893d1d01 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -198,19 +198,30 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 goto fail_guest_notifiers;
 }
 
+memory_region_transaction_begin();
+
 /* Set up virtqueue notify */
 for (i = 0; i < nvqs; i++) {
 r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, true);
 if (r != 0) {
+int j = i;
+
 fprintf(stderr, "virtio-blk failed to set host notifier (%d)\n", r);
 while (i--) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+}
+
+memory_region_transaction_commit();
+
+while (j--) {
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
 }
 goto fail_host_notifiers;
 }
 }
 
+memory_region_transaction_commit();
+
 s->starting = false;
 vblk->dataplane_started = true;
 trace_virtio_blk_data_plane_start(s);
@@ -246,8 +257,15 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 return 0;
 
   fail_aio_context:
+memory_region_transaction_begin();
+
 for (i = 0; i < nvqs; i++) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+}
+
+memory_region_transaction_commit();
+
+for (i = 0; i < nvqs; i++) {
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
 }
   fail_host_notifiers:
@@ -312,8 +330,15 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 
 aio_context_release(s->ctx);
 
+memory_region_transaction_begin();
+
 for (i = 0; i < nvqs; i++) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+}
+
+memory_region_transaction_commit();
+
+for (i = 0; i < nvqs; i++) {
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
 }
 
-- 
2.26.3




[for-6.1 3/4] virtio-scsi: Set host notifiers and callbacks separately

2021-04-07 Thread Greg Kurz
Host notifiers are guaranteed to be idle until the callbacks are
hooked up with virtio_queue_aio_set_host_notifier_handler(). They
thus don't need to be set or unset with the AioContext lock held.

Do this outside the critical section, like virtio-blk already
does: basically, downgrade virtio_scsi_vring_init() to only
set up the host notifier, and set the callback in the caller.

This will allow batching the addition/deletion of ioeventfds in
a single memory transaction, which is expected to greatly
improve initialization time.

Signed-off-by: Greg Kurz 
---
 hw/scsi/virtio-scsi-dataplane.c | 40 ++---
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 4ad879340645..b2cb3d9dcc64 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -94,8 +94,7 @@ static bool virtio_scsi_data_plane_handle_event(VirtIODevice *vdev,
 return progress;
 }
 
-static int virtio_scsi_vring_init(VirtIOSCSI *s, VirtQueue *vq, int n,
-  VirtIOHandleAIOOutput fn)
+static int virtio_scsi_set_host_notifier(VirtIOSCSI *s, VirtQueue *vq, int n)
 {
 BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s)));
 int rc;
@@ -109,7 +108,6 @@ static int virtio_scsi_vring_init(VirtIOSCSI *s, VirtQueue *vq, int n,
 return rc;
 }
 
-virtio_queue_aio_set_host_notifier_handler(vq, s->ctx, fn);
 return 0;
 }
 
@@ -154,38 +152,44 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 goto fail_guest_notifiers;
 }
 
-aio_context_acquire(s->ctx);
-rc = virtio_scsi_vring_init(s, vs->ctrl_vq, 0,
-virtio_scsi_data_plane_handle_ctrl);
-if (rc) {
-goto fail_vrings;
+rc = virtio_scsi_set_host_notifier(s, vs->ctrl_vq, 0);
+if (rc != 0) {
+goto fail_host_notifiers;
 }
 
 vq_init_count++;
-rc = virtio_scsi_vring_init(s, vs->event_vq, 1,
-virtio_scsi_data_plane_handle_event);
-if (rc) {
-goto fail_vrings;
+rc = virtio_scsi_set_host_notifier(s, vs->event_vq, 1);
+if (rc != 0) {
+goto fail_host_notifiers;
 }
 
 vq_init_count++;
+
 for (i = 0; i < vs->conf.num_queues; i++) {
-rc = virtio_scsi_vring_init(s, vs->cmd_vqs[i], i + 2,
-virtio_scsi_data_plane_handle_cmd);
+rc = virtio_scsi_set_host_notifier(s, vs->cmd_vqs[i], i + 2);
 if (rc) {
-goto fail_vrings;
+goto fail_host_notifiers;
 }
 vq_init_count++;
 }
 
+aio_context_acquire(s->ctx);
+virtio_queue_aio_set_host_notifier_handler(vs->ctrl_vq, s->ctx,
+virtio_scsi_data_plane_handle_ctrl);
+virtio_queue_aio_set_host_notifier_handler(vs->event_vq, s->ctx,
+virtio_scsi_data_plane_handle_event);
+
+for (i = 0; i < vs->conf.num_queues; i++) {
+virtio_queue_aio_set_host_notifier_handler(vs->cmd_vqs[i], s->ctx,
+virtio_scsi_data_plane_handle_cmd);
+}
+
 s->dataplane_starting = false;
 s->dataplane_started = true;
 aio_context_release(s->ctx);
 return 0;
 
-fail_vrings:
-aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
-aio_context_release(s->ctx);
+fail_host_notifiers:
 for (i = 0; i < vq_init_count; i++) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
-- 
2.26.3




[for-6.1 0/4] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci

2021-04-07 Thread Greg Kurz
Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
a serious slowdown may be observed on setups with a large enough number
of vCPUs.

Example with a pseries guest on a dual-socket POWER9 system (128 HW threads):

vCPUs   virtio-scsi   virtio-blk

1       0m20.922s     0m21.346s
2       0m21.230s     0m20.350s
4       0m21.761s     0m20.997s
8       0m22.770s     0m20.051s
16      0m22.038s     0m19.994s
32      0m22.928s     0m20.803s
64      0m26.583s     0m22.953s
128     0m41.273s     0m32.333s
256     2m4.727s      1m16.924s
384     6m5.563s      3m26.186s

Both perf and gprof indicate that QEMU is hogging CPUs when setting up
the ioeventfds:

 67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
  9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
  8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
=>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
=>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
  0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single

address_space_update_ioeventfds() is called when committing an MR
transaction, i.e. for each ioeventfd with the current code base,
and it internally loops on all ioeventfds:

static void address_space_update_ioeventfds(AddressSpace *as)
{
[...]
FOR_EACH_FLAT_RANGE(fr, view) {
for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {

This means that the setup of ioeventfds for these devices has
quadratic time complexity.
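
To make the quadratic behaviour concrete, here is a minimal toy model in C.
It is only a sketch and not QEMU code: the toy_* names are made up for
illustration, and the "cost" simply counts how many ioeventfd entries a
commit would walk, assuming each commit rescans everything registered so far.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model, not QEMU code: every commit walks all the ioeventfds
 * registered so far, which is what makes per-notifier commits quadratic.
 */
static size_t toy_cost_commit_per_notifier(size_t nvqs)
{
    size_t registered = 0;
    size_t cost = 0;

    for (size_t i = 0; i < nvqs; i++) {
        registered++;       /* add one ioeventfd... */
        cost += registered; /* ...and commit: rescan all of them */
    }
    return cost;            /* nvqs * (nvqs + 1) / 2 */
}

/* Same additions, but a single commit once all notifiers are set. */
static size_t toy_cost_single_commit(size_t nvqs)
{
    size_t registered = 0;

    for (size_t i = 0; i < nvqs; i++) {
        registered++;       /* add only, the commit is deferred */
    }
    return registered;      /* one commit, one scan: nvqs */
}
```

With 384 virtqueues, the first model walks 73920 entries while the second
walks 384. The real cost per entry is of course much higher than a counter
increment, so this only illustrates the growth rate, not the timings.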

This series simply changes the device models to extend the transaction
to all virtqueues, as was already done in the generic code with
710fccf80d78 ("virtio: improve virtio devices initialization
time").

Only virtio-scsi and virtio-blk are covered here, but a similar change
might also be beneficial to other device types such as vhost-scsi-pci,
vhost-user-scsi-pci and vhost-user-blk-pci.

With this series applied:

vCPUs   virtio-scsi   virtio-blk

1       0m21.271s     0m22.076s
2       0m20.912s     0m19.716s
4       0m20.508s     0m19.310s
8       0m21.374s     0m20.273s
16      0m21.559s     0m21.374s
32      0m22.532s     0m21.271s
64      0m26.550s     0m22.007s
128     0m29.115s     0m27.446s
256     0m44.752s     0m41.004s
384     1m2.884s      0m58.023s

This should fix https://bugzilla.redhat.com/show_bug.cgi?id=1927108
which reported the issue for virtio-scsi-pci.

Changes since RFC:

As suggested by Stefan, simplify the code by directly beginning and
committing the memory transaction from the device model, without all
the virtio-specific proxying code and with no changes needed in the
memory subsystem.

Greg Kurz (4):
  virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
  virtio-blk: Configure all host notifiers in a single MR transaction
  virtio-scsi: Set host notifiers and callbacks separately
  virtio-scsi: Configure all host notifiers in a single MR transaction

 hw/block/dataplane/virtio-blk.c | 36 +++--
 hw/scsi/virtio-scsi-dataplane.c | 56 ++---
 2 files changed, 72 insertions(+), 20 deletions(-)

-- 
2.26.3





Re: [RFC 3/8] virtio: Add API to batch set host notifiers

2021-03-31 Thread Greg Kurz
On Wed, 31 Mar 2021 15:47:45 +0100
Stefan Hajnoczi  wrote:

> On Tue, Mar 30, 2021 at 04:17:32PM +0200, Greg Kurz wrote:
> > On Tue, 30 Mar 2021 14:55:42 +0100
> > Stefan Hajnoczi  wrote:
> > 
> > > On Tue, Mar 30, 2021 at 12:17:40PM +0200, Greg Kurz wrote:
> > > > On Mon, 29 Mar 2021 18:10:57 +0100
> > > > Stefan Hajnoczi  wrote:
> > > > > On Thu, Mar 25, 2021 at 04:07:30PM +0100, Greg Kurz wrote:
> > > > > > @@ -315,6 +338,10 @@ static void 
> > > > > > virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
> > > > > >  
> > > > > >  for (i = 0; i < nvqs; i++) {
> > > > > >  virtio_bus_set_host_notifier(bus, i + n_offset, false);
> > > > > > +}
> > > > > > +/* Let address_space_update_ioeventfds() run before closing 
> > > > > > ioeventfds */
> > > > > 
> > > > > assert(memory_region_transaction_depth == 0)?
> > > > > 
> > > > 
> > > > Hmm... apart from the fact that memory_region_transaction_depth is
> > > > a memory internal thing that shouldn't be exposed here, it seems to
> > > > me that memory_region_transaction_depth can be != 0 when, e.g. when
> > > > batching is used... or am I missing something?
> > > > 
> > > > I was actually thinking of adding some asserts for that in the
> > > > memory_region_*_eventfd_full() functions introduced by patch 1.
> > > > 
> > > > if (!transaction) {
> > > > memory_region_transaction_begin();
> > > > }
> > > > assert(memory_region_transaction_depth != 0);
> > > 
> > > In that case is it safe to call virtio_bus_cleanup_host_notifier()
> > > below? I thought it depends on the transaction committing first.
> > > 
> > 
> > Yes because the transaction ends...
> > 
> > > > 
> > > > > > +virtio_bus_set_host_notifier_commit(bus);
> > ...here ^^
> > 
> > > > > > +for (i = 0; i < nvqs; i++) {
> > > > > >  virtio_bus_cleanup_host_notifier(bus, i + n_offset);
> > > > > >  }
> > > > > >  }
> 
> That contradicts what you said above: "it seems to me that
> memory_region_transaction_depth can be != 0 when, e.g. when batching is
> used".
> 
> If memory_region_transaction_depth can be != 0 when this function is
> entered then memory_region_transaction_commit() will have no effect:
> 
>   void memory_region_transaction_commit(void)
>   {
>   AddressSpace *as;
> 
>   assert(memory_region_transaction_depth);
>   assert(qemu_mutex_iothread_locked());
> 
>   --memory_region_transaction_depth;
>   if (!memory_region_transaction_depth) {

memory_region_transaction_depth should be equal to 1 when
entering the function, not 0... which is the case when
batching.

>   ^--- we won't take this branch!
> 
> So the code after memory_region_transaction_commit() cannot assume that
> anything was actually committed.
> 

Right, and nothing in the current code base seems to prevent
memory_region_*_eventfd() from being called within an ongoing
transaction, actually. It looks like we might want to fix that
first.

> That's why I asked about adding assert(memory_region_transaction_depth
> == 0) to guarantee that our commit takes effect immediately so that it's
> safe to call virtio_bus_cleanup_host_notifier().
> 

Yes, it was just misplaced and I didn't get the intent at first :)

> Stefan
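
The nesting behaviour discussed above can be sketched with a small toy
model in C. This is not the QEMU implementation: toy_depth stands in for
memory_region_transaction_depth, and the toy_* helpers are hypothetical
names used only to show that an inner commit applies nothing.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned toy_depth;    /* stands in for memory_region_transaction_depth */
static bool toy_pending;      /* an ioeventfd update is queued */
static unsigned toy_applied;  /* number of times queued updates were applied */

static void toy_transaction_begin(void)
{
    toy_depth++;
}

static void toy_queue_ioeventfd_update(void)
{
    toy_pending = true;
}

/* Returns true only if this commit was the outermost one. */
static bool toy_transaction_commit(void)
{
    assert(toy_depth > 0);
    if (--toy_depth != 0) {
        return false;         /* still nested: nothing is applied yet */
    }
    if (toy_pending) {
        toy_pending = false;
        toy_applied++;
    }
    return true;
}
```

If a caller already holds a transaction, the inner commit returns false and
the update stays queued, so code that closes eventfds right after it cannot
assume that anything was committed.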





Re: [RFC 3/8] virtio: Add API to batch set host notifiers

2021-03-30 Thread Greg Kurz
On Tue, 30 Mar 2021 14:55:42 +0100
Stefan Hajnoczi  wrote:

> On Tue, Mar 30, 2021 at 12:17:40PM +0200, Greg Kurz wrote:
> > On Mon, 29 Mar 2021 18:10:57 +0100
> > Stefan Hajnoczi  wrote:
> > > On Thu, Mar 25, 2021 at 04:07:30PM +0100, Greg Kurz wrote:
> > > > @@ -315,6 +338,10 @@ static void 
> > > > virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
> > > >  
> > > >  for (i = 0; i < nvqs; i++) {
> > > >  virtio_bus_set_host_notifier(bus, i + n_offset, false);
> > > > +}
> > > > +/* Let address_space_update_ioeventfds() run before closing 
> > > > ioeventfds */
> > > 
> > > assert(memory_region_transaction_depth == 0)?
> > > 
> > 
> > Hmm... apart from the fact that memory_region_transaction_depth is
> > a memory internal thing that shouldn't be exposed here, it seems to
> > me that memory_region_transaction_depth can be != 0 when, e.g. when
> > batching is used... or am I missing something?
> > 
> > I was actually thinking of adding some asserts for that in the
> > memory_region_*_eventfd_full() functions introduced by patch 1.
> > 
> > if (!transaction) {
> > memory_region_transaction_begin();
> > }
> > assert(memory_region_transaction_depth != 0);
> 
> In that case is it safe to call virtio_bus_cleanup_host_notifier()
> below? I thought it depends on the transaction committing first.
> 

Yes because the transaction ends...

> > 
> > > > +virtio_bus_set_host_notifier_commit(bus);
...here ^^

> > > > +for (i = 0; i < nvqs; i++) {
> > > >  virtio_bus_cleanup_host_notifier(bus, i + n_offset);
> > > >  }
> > > >  }





Re: [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci

2021-03-30 Thread Greg Kurz
On Mon, 29 Mar 2021 18:35:16 +0100
Stefan Hajnoczi  wrote:

> On Thu, Mar 25, 2021 at 04:07:27PM +0100, Greg Kurz wrote:
> > Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
> > a serious slow down may be observed on setups with a big enough number
> > of vCPUs.
> > 
> > Example with a pseries guest on a bi-POWER9 socket system (128 HW threads):
> > 
> > 1   0m20.922s   0m21.346s
> > 2   0m21.230s   0m20.350s
> > 4   0m21.761s   0m20.997s
> > 8   0m22.770s   0m20.051s
> > 16  0m22.038s   0m19.994s
> > 32  0m22.928s   0m20.803s
> > 64  0m26.583s   0m22.953s
> > 128 0m41.273s   0m32.333s
> > 256 2m4.727s    1m16.924s
> > 384 6m5.563s    3m26.186s
> > 
> > Both perf and gprof indicate that QEMU is hogging CPUs when setting up
> > the ioeventfds:
> > 
> >  67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
> >   9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
> >   8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
> > =>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
> > =>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
> >   0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single
> > 
> > address_space_update_ioeventfds() is called when committing an MR
> > transaction, i.e. for each ioeventfd with the current code base,
> > and it internally loops on all ioeventfds:
> > 
> > static void address_space_update_ioeventfds(AddressSpace *as)
> > {
> > [...]
> > FOR_EACH_FLAT_RANGE(fr, view) {
> > for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
> > 
> > This means that the setup of ioeventfds for these devices has
> > quadratic time complexity.
> > 
> > This series introduces generic APIs to allow batch creation and deletion
> > of ioeventfds, and converts virtio-blk and virtio-scsi to use them. This
> > greatly improves the numbers:
> > 
> > 1   0m21.271s   0m22.076s
> > 2   0m20.912s   0m19.716s
> > 4   0m20.508s   0m19.310s
> > 8   0m21.374s   0m20.273s
> > 16  0m21.559s   0m21.374s
> > 32  0m22.532s   0m21.271s
> > 64  0m26.550s   0m22.007s
> > 128 0m29.115s   0m27.446s
> > 256 0m44.752s   0m41.004s
> > 384 1m2.884s    0m58.023s
> 
> Excellent numbers!
> 
> I wonder if the code can be simplified since
> memory_region_transaction_begin/end() supports nesting. Why not call
> them directly from the device model instead of introducing callbacks in
> core virtio and virtio-pci code?
> 

It seems a bit awkward that the device model should assume a memory
transaction is needed to set up host notifiers, which are ioeventfds
under the hood, although the device doesn't know that.

> Also, do you think there are other opportunities to have a long
> transaction to batch up machine init, device hotplug, etc? It's not
> clear to me when transactions must be ended. Clearly it's necessary to

The transaction *must* be ended before calling
virtio_bus_cleanup_host_notifier() because
address_space_add_del_ioeventfds(), called when
finishing the transaction, needs the "to-be-closed"
eventfds to be still open, otherwise the KVM_IOEVENTFD 
ioctl() might fail with EBADF.

See this change in patch 3:

@@ -315,6 +338,10 @@ static void 
virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
 
 for (i = 0; i < nvqs; i++) {
 virtio_bus_set_host_notifier(bus, i + n_offset, false);
+}
+/* Let address_space_update_ioeventfds() run before closing ioeventfds */
+virtio_bus_set_host_notifier_commit(bus);
+for (i = 0; i < nvqs; i++) {
 virtio_bus_cleanup_host_notifier(bus, i + n_offset);
 }
 }
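
The ordering constraint can be illustrated with a toy model in C. It is
only a sketch under simplifying assumptions, not QEMU or KVM code: struct
toy_bus and the toy_* helpers are hypothetical, "unset" merely queues a
removal, "cleanup" closes the eventfd, and "commit" fails with -EBADF if a
queued removal refers to an eventfd that was already closed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_NVQS 4

struct toy_bus {
    bool fd_open[TOY_NVQS];      /* is the eventfd still open? */
    bool pending_del[TOY_NVQS];  /* removal queued in the transaction */
};

static void toy_init(struct toy_bus *b)
{
    for (int n = 0; n < TOY_NVQS; n++) {
        b->fd_open[n] = true;
        b->pending_del[n] = false;
    }
}

/* Models virtio_bus_set_host_notifier(bus, n, false): queue the removal. */
static void toy_unset(struct toy_bus *b, int n)
{
    b->pending_del[n] = true;
}

/* Models virtio_bus_cleanup_host_notifier(bus, n): close the eventfd. */
static void toy_cleanup(struct toy_bus *b, int n)
{
    b->fd_open[n] = false;
}

/* Models the commit: every queued removal needs a still-open eventfd. */
static int toy_commit(struct toy_bus *b)
{
    int ret = 0;

    for (int n = 0; n < TOY_NVQS; n++) {
        if (b->pending_del[n]) {
            if (!b->fd_open[n]) {
                ret = -EBADF;    /* the fd was closed too early */
            }
            b->pending_del[n] = false;
        }
    }
    return ret;
}
```

Unsetting all notifiers, committing, and only then cleaning up returns 0 in
this model, whereas cleaning up inside the first loop makes the commit
return -EBADF.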

Maybe I should provide more details on why we're doing that?

> end the transaction if we need to do something that depends on the
> MemoryRegion, eventfd, etc being updated. But most of the time there is
> no immediate need to end the transaction and more code could share the
> same transaction before we go back to the event loop or vcpu thread.
> 

I can't tell for all scenarios that involve memory transactions, but
it seems this is definitely not the case for ioeventfds: the rest
of the code expects the transaction to be complete.

> Stefan

Thanks for the review!

Cheers,

--
Greg




Re: [RFC 4/8] virtio-pci: Batch add/del ioeventfds in a single MR transaction

2021-03-30 Thread Greg Kurz
On Mon, 29 Mar 2021 18:24:40 +0100
Stefan Hajnoczi  wrote:

> On Thu, Mar 25, 2021 at 04:07:31PM +0100, Greg Kurz wrote:
> > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > index 1b1942d521cc..0279e5671bcb 100644
> > --- a/softmmu/memory.c
> > +++ b/softmmu/memory.c
> > @@ -2368,7 +2368,7 @@ void memory_region_add_eventfd_full(MemoryRegion *mr,
> >  if (size) {
> >  adjust_endianness(mr, &mrfd.data, size_memop(size) | MO_TE);
> >  }
> > -if (transaction) {
> > +if (!transaction) {
> >  memory_region_transaction_begin();
> >  }
> >  for (i = 0; i < mr->ioeventfd_nb; ++i) {
> > @@ -2383,7 +2383,7 @@ void memory_region_add_eventfd_full(MemoryRegion *mr,
> >  sizeof(*mr->ioeventfds) * (mr->ioeventfd_nb-1 - i));
> >  mr->ioeventfds[i] = mrfd;
> >  ioeventfd_update_pending |= mr->enabled;
> > -if (transaction) {
> > +if (!transaction) {
> >  memory_region_transaction_commit();
> >  }
> 
> Looks like these two hunks belong in a previous patch.

And they are actually wrong... we *do* want a nested
transaction if 'transaction' is true :) This is a
leftover I thought I had removed but obviously not...




Re: [RFC 3/8] virtio: Add API to batch set host notifiers

2021-03-30 Thread Greg Kurz
On Mon, 29 Mar 2021 18:10:57 +0100
Stefan Hajnoczi  wrote:

> On Thu, Mar 25, 2021 at 04:07:30PM +0100, Greg Kurz wrote:
> > Introduce VirtioBusClass methods to begin and commit a transaction
> > of setting/unsetting host notifiers. These handlers will be implemented
> > by virtio-pci to batch addition and deletion of ioeventfds for multiqueue
> > devices like virtio-scsi-pci or virtio-blk-pci.
> > 
> > Convert virtio_bus_set_host_notifiers() to use these handlers. Note that
> > virtio_bus_cleanup_host_notifier() closes eventfds, which could still be
> > passed to the KVM_IOEVENTFD ioctl() when the transaction ends and fail
> > with EBADF. The cleanup of the host notifiers is thus pushed to a
> > separate loop in virtio_bus_unset_and_cleanup_host_notifiers(), after
> > transaction commit.
> > 
> > Signed-off-by: Greg Kurz 
> > ---
> >  include/hw/virtio/virtio-bus.h |  4 
> >  hw/virtio/virtio-bus.c | 34 ++
> >  2 files changed, 38 insertions(+)
> > 
> > diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
> > index 6d1e4ee3e886..99704b2c090a 100644
> > --- a/include/hw/virtio/virtio-bus.h
> > +++ b/include/hw/virtio/virtio-bus.h
> > @@ -82,6 +82,10 @@ struct VirtioBusClass {
> >   */
> >  int (*ioeventfd_assign)(DeviceState *d, EventNotifier *notifier,
> >  int n, bool assign);
> > +
> > +void (*ioeventfd_assign_begin)(DeviceState *d);
> > +void (*ioeventfd_assign_commit)(DeviceState *d);
> 
> Please add doc comments for these new functions.
> 

Will do.

> > +
> >  /*
> >   * Whether queue number n is enabled.
> >   */
> > diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> > index c9e7cdb5c161..156484c4ca14 100644
> > --- a/hw/virtio/virtio-bus.c
> > +++ b/hw/virtio/virtio-bus.c
> > @@ -295,6 +295,28 @@ int virtio_bus_set_host_notifier(VirtioBusState *bus, 
> > int n, bool assign)
> >  return r;
> >  }
> >  
> > +static void virtio_bus_set_host_notifier_begin(VirtioBusState *bus)
> > +{
> > +VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
> > +DeviceState *proxy = DEVICE(BUS(bus)->parent);
> > +
> > +if (k->ioeventfd_assign_begin) {
> > +assert(k->ioeventfd_assign_commit);
> > +k->ioeventfd_assign_begin(proxy);
> > +}
> > +}
> > +
> > +static void virtio_bus_set_host_notifier_commit(VirtioBusState *bus)
> > +{
> > +VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
> > +DeviceState *proxy = DEVICE(BUS(bus)->parent);
> > +
> > +if (k->ioeventfd_assign_commit) {
> > +assert(k->ioeventfd_assign_begin);
> > +k->ioeventfd_assign_commit(proxy);
> > +}
> > +}
> > +
> >  void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
> >  {
> >  VirtIODevice *vdev = virtio_bus_get_device(bus);
> > @@ -308,6 +330,7 @@ void virtio_bus_cleanup_host_notifier(VirtioBusState 
> > *bus, int n)
> >  event_notifier_cleanup(notifier);
> >  }
> >  
> > +/* virtio_bus_set_host_notifier_begin() must have been called */
> >  static void virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState 
> > *bus,
> >  int nvqs, int 
> > n_offset)
> >  {
> > @@ -315,6 +338,10 @@ static void 
> > virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
> >  
> >  for (i = 0; i < nvqs; i++) {
> >  virtio_bus_set_host_notifier(bus, i + n_offset, false);
> > +}
> > +/* Let address_space_update_ioeventfds() run before closing ioeventfds 
> > */
> 
> assert(memory_region_transaction_depth == 0)?
> 

Hmm... apart from the fact that memory_region_transaction_depth is
a memory internal thing that shouldn't be exposed here, it seems to
me that memory_region_transaction_depth can be != 0 when, e.g. when
batching is used... or am I missing something?

I was actually thinking of adding some asserts for that in the
memory_region_*_eventfd_full() functions introduced by patch 1.

if (!transaction) {
memory_region_transaction_begin();
}
assert(memory_region_transaction_depth != 0);

> > +virtio_bus_set_host_notifier_commit(bus);
> > +for (i = 0; i < nvqs; i++) {
> >  virtio_bus_cleanup_host_notifier(bus, i + n_offset);
> >  }
> >  }
> > @@ -327,17 +354,24 @@ int virtio_bus_set_host_notifiers(VirtioBusState 
> >

Re: [RFC 1/8] memory: Allow eventfd add/del without starting a transaction

2021-03-30 Thread Greg Kurz
On Mon, 29 Mar 2021 18:03:49 +0100
Stefan Hajnoczi  wrote:

> On Thu, Mar 25, 2021 at 04:07:28PM +0100, Greg Kurz wrote:
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 5728a681b27d..98ed552e001c 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -1848,13 +1848,25 @@ void 
> > memory_region_clear_flush_coalesced(MemoryRegion *mr);
> >   * @match_data: whether to match against @data, instead of just @addr
> >   * @data: the data to match against the guest write
> >   * @e: event notifier to be triggered when @addr, @size, and @data all 
> > match.
> > + * @transaction: whether to start a transaction for the change
> 
> "start" is unclear. Does it begin a transaction and return with the
> transaction unfinished? I think instead the function performs the
> eventfd addition within a transaction. It would be nice to clarify this.
> 

What about: 

 * @transaction: if true, the eventfd is added within a nested transaction,
 *   if false, it is up to the caller to ensure this is called
 *   within a transaction.

> >   **/
> > -void memory_region_add_eventfd(MemoryRegion *mr,
> > -   hwaddr addr,
> > -   unsigned size,
> > -   bool match_data,
> > -   uint64_t data,
> > -   EventNotifier *e);
> > +void memory_region_add_eventfd_full(MemoryRegion *mr,
> > +hwaddr addr,
> > +unsigned size,
> > +bool match_data,
> > +uint64_t data,
> > +EventNotifier *e,
> > +bool transaction);
> > +
> > +static inline void memory_region_add_eventfd(MemoryRegion *mr,
> > + hwaddr addr,
> > + unsigned size,
> > + bool match_data,
> > + uint64_t data,
> > + EventNotifier *e)
> > +{
> > +memory_region_add_eventfd_full(mr, addr, size, match_data, data, e, 
> > true);
> > +}
> >  
> >  /**
> >   * memory_region_del_eventfd: Cancel an eventfd.
> > @@ -1868,13 +1880,25 @@ void memory_region_add_eventfd(MemoryRegion *mr,
> >   * @match_data: whether to match against @data, instead of just @addr
> >   * @data: the data to match against the guest write
> >   * @e: event notifier to be triggered when @addr, @size, and @data all 
> > match.
> > + * @transaction: whether to start a transaction for the change
> 
> Same here.

and:

 * @transaction: if true, the eventfd is cancelled within a nested transaction,
 *   if false, it is up to the caller to ensure this is called
 *   within a transaction.

?




Re: [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci

2021-03-25 Thread Greg Kurz
On Thu, 25 Mar 2021 17:43:10 +
Stefan Hajnoczi  wrote:

> On Thu, Mar 25, 2021 at 01:05:16PM -0400, Michael S. Tsirkin wrote:
> > On Thu, Mar 25, 2021 at 04:07:27PM +0100, Greg Kurz wrote:
> > > Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
> > > a serious slow down may be observed on setups with a big enough number
> > > of vCPUs.
> > > 
> > > Example with a pseries guest on a bi-POWER9 socket system (128 HW threads):
> > > 
> > > 1   0m20.922s   0m21.346s
> > > 2   0m21.230s   0m20.350s
> > > 4   0m21.761s   0m20.997s
> > > 8   0m22.770s   0m20.051s
> > > 16  0m22.038s   0m19.994s
> > > 32  0m22.928s   0m20.803s
> > > 64  0m26.583s   0m22.953s
> > > 128 0m41.273s   0m32.333s
> > > 256 2m4.727s    1m16.924s
> > > 384 6m5.563s    3m26.186s
> > > 
> > > Both perf and gprof indicate that QEMU is hogging CPUs when setting up
> > > the ioeventfds:
> > > 
> > >  67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
> > >   9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
> > >   8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
> > > =>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
> > > =>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
> > >   0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single
> > > 
> > > address_space_update_ioeventfds() is called when committing an MR
> > > transaction, i.e. for each ioeventfd with the current code base,
> > > and it internally loops on all ioeventfds:
> > > 
> > > static void address_space_update_ioeventfds(AddressSpace *as)
> > > {
> > > [...]
> > > FOR_EACH_FLAT_RANGE(fr, view) {
> > > for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
> > > 
> > > This means that the setup of ioeventfds for these devices has
> > > quadratic time complexity.
> > > 
> > > This series introduces generic APIs to allow batch creation and deletion
> > > of ioeventfds, and converts virtio-blk and virtio-scsi to use them. This
> > > greatly improves the numbers:
> > > 
> > > 1   0m21.271s   0m22.076s
> > > 2   0m20.912s   0m19.716s
> > > 4   0m20.508s   0m19.310s
> > > 8   0m21.374s   0m20.273s
> > > 16  0m21.559s   0m21.374s
> > > 32  0m22.532s   0m21.271s
> > > 64  0m26.550s   0m22.007s
> > > 128 0m29.115s   0m27.446s
> > > 256 0m44.752s   0m41.004s
> > > 384 1m2.884s    0m58.023s
> > > 
> > > The series deliberately spans over multiple subsystems for easier
> > > review and experimenting. It also does some preliminary fixes on
> > > the way. It is thus posted as an RFC for now, but if the general
> > > idea is acceptable, I guess a non-RFC could be posted and maybe
> > > extend the feature to some other devices that might suffer from
> > > similar scaling issues, e.g. vhost-scsi-pci, vhost-user-scsi-pci
> > > and vhost-user-blk-pci, even if I haven't checked.
> > > 
> > > This should fix https://bugzilla.redhat.com/show_bug.cgi?id=1927108
> > > which reported the issue for virtio-scsi-pci.
> > 
> > 
> > Series looks ok from a quick look ...
> > 
> > this is a regression isn't it?
> > So I guess we'll need that in 6.0 or revert the # of vqs
> > change for now ...
> 
> Commit 9445e1e15e66c19e42bea942ba810db28052cd05 ("virtio-blk-pci:
> default num_queues to -smp N") was already released in QEMU 5.2.0. It is
> not a QEMU 6.0 regression.
> 

Oh you're right, I've just checked and QEMU 5.2.0 has the same problem.

> I'll review this series on Monday.
> 

Thanks !

> Stefan





Re: [RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci

2021-03-25 Thread Greg Kurz
On Thu, 25 Mar 2021 13:05:16 -0400
"Michael S. Tsirkin"  wrote:

> On Thu, Mar 25, 2021 at 04:07:27PM +0100, Greg Kurz wrote:
> > Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
> > a serious slow down may be observed on setups with a big enough number
> > of vCPUs.
> > 
> > Example with a pseries guest on a bi-POWER9 socket system (128 HW threads):
> > 
> > 1   0m20.922s   0m21.346s
> > 2   0m21.230s   0m20.350s
> > 4   0m21.761s   0m20.997s
> > 8   0m22.770s   0m20.051s
> > 16  0m22.038s   0m19.994s
> > 32  0m22.928s   0m20.803s
> > 64  0m26.583s   0m22.953s
> > 128 0m41.273s   0m32.333s
> > 256 2m4.727s    1m16.924s
> > 384 6m5.563s    3m26.186s
> > 
> > Both perf and gprof indicate that QEMU is hogging CPUs when setting up
> > the ioeventfds:
> > 
> >  67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
> >   9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
> >   8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
> > =>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
> > =>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
> >   0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single
> > 
> > address_space_update_ioeventfds() is called when committing an MR
> > transaction, i.e. for each ioeventfd with the current code base,
> > and it internally loops on all ioeventfds:
> > 
> > static void address_space_update_ioeventfds(AddressSpace *as)
> > {
> > [...]
> > FOR_EACH_FLAT_RANGE(fr, view) {
> > for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
> > 
> > This means that the setup of ioeventfds for these devices has
> > quadratic time complexity.
> > 
> > This series introduces generic APIs to allow batch creation and deletion
> > of ioeventfds, and converts virtio-blk and virtio-scsi to use them. This
> > greatly improves the numbers:
> > 
> > 1   0m21.271s   0m22.076s
> > 2   0m20.912s   0m19.716s
> > 4   0m20.508s   0m19.310s
> > 8   0m21.374s   0m20.273s
> > 16  0m21.559s   0m21.374s
> > 32  0m22.532s   0m21.271s
> > 64  0m26.550s   0m22.007s
> > 128 0m29.115s   0m27.446s
> > 256 0m44.752s   0m41.004s
> > 384 1m2.884s0m58.023s
> > 
> > The series deliberately spans over multiple subsystems for easier
> > review and experimenting. It also does some preliminary fixes on
> > the way. It is thus posted as an RFC for now, but if the general
> > idea is acceptable, I guess a non-RFC could be posted and maybe
> > extend the feature to some other devices that might suffer from
> > similar scaling issues, e.g. vhost-scsi-pci, vhost-user-scsi-pci
> > and vhost-user-blk-pci, even if I haven't checked.
> > 
> > This should fix https://bugzilla.redhat.com/show_bug.cgi?id=1927108
> > which reported the issue for virtio-scsi-pci.
> 
> 
> Series looks ok from a quick look ...
> 
> this is a regression isn't it?

This is a regression only if we consider the defaults. Manually setting
num_queues to a high value already affects existing devices.

> So I guess we'll need that in 6.0 or revert the # of vqs
> change for now ...
> 

Not sure it is safe to merge these fixes this late... also,
as said above, I've only tested virtio-scsi and virtio-blk
but I believe the vhost-user-* variants might be affected too.

Reverting the # of vqs would really be a pity IMHO. What
about mitigating the effects? E.g. enforce the previous
behavior only if # vcpus > 64?

> > Greg Kurz (8):
> >   memory: Allow eventfd add/del without starting a transaction
> >   virtio: Introduce virtio_bus_set_host_notifiers()
> >   virtio: Add API to batch set host notifiers
> >   virtio-pci: Batch add/del ioeventfds in a single MR transaction
> >   virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
> >   virtio-blk: Use virtio_bus_set_host_notifiers()
> >   virtio-scsi: Set host notifiers and callbacks separately
> >   virtio-scsi: Use virtio_bus_set_host_notifiers()
> > 
> >  hw/virtio/virtio-pci.h  |  1 +
> >  include/exec/memory.h   | 48 --
> >  include/hw/virtio/virtio-bus.h  |  7 
> >  hw/block/dataplane/virtio-blk.c | 26 +---
> >  hw/scsi/virtio-scsi-dataplane.c | 68 ++--
> >  hw/virtio/virtio-bus.c  | 70 +
> >  hw/virtio/virtio-pci.c  | 53 +
> >  softmmu/memory.c| 42 
> >  8 files changed, 225 insertions(+), 90 deletions(-)
> > 
> > -- 
> > 2.26.3
> > 
> 




[RFC 6/8] virtio-blk: Use virtio_bus_set_host_notifiers()

2021-03-25 Thread Greg Kurz
This allows the virtio-blk-pci device to batch additions and deletions
of host notifiers. This significantly improves boot time of VMs with a
high number of vCPUs, e.g. from 3m26.408s down to 0m59.923s for a pseries
machine with 384 vCPUs.

Signed-off-by: Greg Kurz 
---
 hw/block/dataplane/virtio-blk.c | 25 ++---
 1 file changed, 6 insertions(+), 19 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index d7b5c95d26d9..fd2a60010285 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -172,6 +172,7 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 VirtIOBlockDataPlane *s = vblk->dataplane;
 BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vblk)));
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+VirtioBusState *bus = VIRTIO_BUS(qbus);
 AioContext *old_context;
 unsigned i;
 unsigned nvqs = s->conf->num_queues;
@@ -199,16 +200,9 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 }
 
 /* Set up virtqueue notify */
-for (i = 0; i < nvqs; i++) {
-r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, true);
-if (r != 0) {
-fprintf(stderr, "virtio-blk failed to set host notifier (%d)\n", r);
-while (i--) {
-virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
-virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
-}
-goto fail_host_notifiers;
-}
+r = virtio_bus_set_host_notifiers(bus, nvqs, 0, true);
+if (r != 0) {
+goto fail_host_notifiers;
 }
 
 s->starting = false;
@@ -246,10 +240,7 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 return 0;
 
   fail_aio_context:
-for (i = 0; i < nvqs; i++) {
-virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
-virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
-}
+virtio_bus_set_host_notifiers(bus, nvqs, 0, false);
   fail_host_notifiers:
 k->set_guest_notifiers(qbus->parent, nvqs, false);
   fail_guest_notifiers:
@@ -287,7 +278,6 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 VirtIOBlockDataPlane *s = vblk->dataplane;
 BusState *qbus = qdev_get_parent_bus(DEVICE(vblk));
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
-unsigned i;
 unsigned nvqs = s->conf->num_queues;
 
 if (!vblk->dataplane_started || s->stopping) {
@@ -312,10 +302,7 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 
 aio_context_release(s->ctx);
 
-for (i = 0; i < nvqs; i++) {
-virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
-virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
-}
+virtio_bus_set_host_notifiers(VIRTIO_BUS(qbus), nvqs, 0, false);
 
 qemu_bh_cancel(s->bh);
 notify_guest_bh(s); /* final chance to notify guest */
-- 
2.26.3




[RFC 8/8] virtio-scsi: Use virtio_bus_set_host_notifiers()

2021-03-25 Thread Greg Kurz
This allows the virtio-scsi-pci device to batch additions and deletions
of host notifiers. This significantly improves boot time of VMs with a
high number of vCPUs, e.g. from 6m13.969s down to 1m4.268s for a pseries
machine with 384 vCPUs.

Signed-off-by: Greg Kurz 
---
 hw/scsi/virtio-scsi-dataplane.c | 28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 11b53ab767be..eec2b6e19a5b 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -141,6 +141,7 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
 VirtIOSCSI *s = VIRTIO_SCSI(vdev);
+VirtioBusState *bus = VIRTIO_BUS(qbus);
 
 if (s->dataplane_started ||
 s->dataplane_starting ||
@@ -171,12 +172,9 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 
 vq_init_count++;
 
-for (i = 0; i < vs->conf.num_queues; i++) {
-rc = virtio_scsi_set_host_notifier(s, vs->cmd_vqs[i], i + 2);
-if (rc) {
-goto fail_host_notifiers;
-}
-vq_init_count++;
+rc = virtio_bus_set_host_notifiers(bus, vs->conf.num_queues, 2, true);
+if (rc) {
+goto fail_host_notifiers;
 }
 
 aio_context_acquire(s->ctx);
@@ -196,10 +194,13 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 return 0;
 
 fail_host_notifiers:
-for (i = 0; i < vq_init_count; i++) {
-virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
-virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
-}
+/*
+ * Only host notifiers for ctrl_vq and event_vq can be set at
+ * this point. Notifiers for cmd_vqs[] have been reverted by
+ * virtio_bus_set_host_notifiers() already.
+ */
+assert(vq_init_count <= 2);
+virtio_bus_set_host_notifiers(bus, vq_init_count, 0, false);
 k->set_guest_notifiers(qbus->parent, vs->conf.num_queues + 2, false);
 fail_guest_notifiers:
 s->dataplane_fenced = true;
@@ -215,7 +216,6 @@ void virtio_scsi_dataplane_stop(VirtIODevice *vdev)
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
 VirtIOSCSI *s = VIRTIO_SCSI(vdev);
-int i;
 
 if (!s->dataplane_started || s->dataplane_stopping) {
 return;
@@ -235,10 +235,8 @@ void virtio_scsi_dataplane_stop(VirtIODevice *vdev)
 
 blk_drain_all(); /* ensure there are no in-flight requests */
 
-for (i = 0; i < vs->conf.num_queues + 2; i++) {
-virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
-virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
-}
+virtio_bus_set_host_notifiers(VIRTIO_BUS(qbus), vs->conf.num_queues + 2, 0,
+  false);
 
 /* Clean up guest notifier (irq) */
 k->set_guest_notifiers(qbus->parent, vs->conf.num_queues + 2, false);
-- 
2.26.3




[RFC 3/8] virtio: Add API to batch set host notifiers

2021-03-25 Thread Greg Kurz
Introduce VirtioBusClass methods to begin and commit a transaction
of setting/unsetting host notifiers. These handlers will be implemented
by virtio-pci to batch addition and deletion of ioeventfds for multiqueue
devices like virtio-scsi-pci or virtio-blk-pci.

Convert virtio_bus_set_host_notifiers() to use these handlers. Note that
virtio_bus_cleanup_host_notifier() closes eventfds, which could still be
passed to the KVM_IOEVENTFD ioctl() when the transaction ends and fail
with EBADF. The cleanup of the host notifiers is thus pushed to a
separate loop in virtio_bus_unset_and_cleanup_host_notifiers(), after
transaction commit.
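
To illustrate why the cleanup has to wait for the commit, here is a toy
model of the ordering, not the actual QEMU code. All toy_* names are
made up for the example; the "kernel" is reduced to a loop that rejects
any pending deletion whose file descriptor is already closed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_NVQS 4

static bool toy_fd_open[TOY_NVQS];      /* hypothetical eventfd state */
static bool toy_pending_del[TOY_NVQS];  /* deletions queued in the transaction */

static void toy_open_all(void)
{
    for (int i = 0; i < TOY_NVQS; i++) {
        toy_fd_open[i] = true;
        toy_pending_del[i] = false;
    }
}

/* Commit: hand every queued deletion to the "kernel", which needs a live fd. */
static int toy_commit(void)
{
    int rc = 0;

    for (int i = 0; i < TOY_NVQS; i++) {
        if (toy_pending_del[i]) {
            if (!toy_fd_open[i]) {
                rc = -EBADF;
            }
            toy_pending_del[i] = false;
        }
    }
    return rc;
}

/* Tear down all notifiers, closing fds either before or after the commit. */
static int toy_teardown(bool close_before_commit)
{
    int rc;

    for (int i = 0; i < TOY_NVQS; i++) {
        assert(toy_fd_open[i]);
        toy_pending_del[i] = true;
        if (close_before_commit) {
            toy_fd_open[i] = false;
        }
    }
    rc = toy_commit();
    for (int i = 0; i < TOY_NVQS; i++) {
        toy_fd_open[i] = false;     /* the separate cleanup loop */
    }
    return rc;
}
```

With the fds closed in the same loop as the unset, the commit sees stale
descriptors; with the cleanup moved to its own loop after the commit, it
does not.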

Signed-off-by: Greg Kurz 
---
 include/hw/virtio/virtio-bus.h |  4 
 hw/virtio/virtio-bus.c | 34 ++
 2 files changed, 38 insertions(+)

diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
index 6d1e4ee3e886..99704b2c090a 100644
--- a/include/hw/virtio/virtio-bus.h
+++ b/include/hw/virtio/virtio-bus.h
@@ -82,6 +82,10 @@ struct VirtioBusClass {
  */
 int (*ioeventfd_assign)(DeviceState *d, EventNotifier *notifier,
 int n, bool assign);
+
+void (*ioeventfd_assign_begin)(DeviceState *d);
+void (*ioeventfd_assign_commit)(DeviceState *d);
+
 /*
  * Whether queue number n is enabled.
  */
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index c9e7cdb5c161..156484c4ca14 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -295,6 +295,28 @@ int virtio_bus_set_host_notifier(VirtioBusState *bus, int n, bool assign)
 return r;
 }
 
+static void virtio_bus_set_host_notifier_begin(VirtioBusState *bus)
+{
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
+DeviceState *proxy = DEVICE(BUS(bus)->parent);
+
+if (k->ioeventfd_assign_begin) {
+assert(k->ioeventfd_assign_commit);
+k->ioeventfd_assign_begin(proxy);
+}
+}
+
+static void virtio_bus_set_host_notifier_commit(VirtioBusState *bus)
+{
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
+DeviceState *proxy = DEVICE(BUS(bus)->parent);
+
+if (k->ioeventfd_assign_commit) {
+assert(k->ioeventfd_assign_begin);
+k->ioeventfd_assign_commit(proxy);
+}
+}
+
 void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
 {
 VirtIODevice *vdev = virtio_bus_get_device(bus);
@@ -308,6 +330,7 @@ void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
 event_notifier_cleanup(notifier);
 }
 
+/* virtio_bus_set_host_notifier_begin() must have been called */
 static void virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
 int nvqs, int n_offset)
 {
@@ -315,6 +338,10 @@ static void virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
 
 for (i = 0; i < nvqs; i++) {
 virtio_bus_set_host_notifier(bus, i + n_offset, false);
+}
+/* Let address_space_update_ioeventfds() run before closing ioeventfds */
+virtio_bus_set_host_notifier_commit(bus);
+for (i = 0; i < nvqs; i++) {
 virtio_bus_cleanup_host_notifier(bus, i + n_offset);
 }
 }
@@ -327,17 +354,24 @@ int virtio_bus_set_host_notifiers(VirtioBusState *bus, int nvqs, int n_offset,
 int rc;
 
 if (assign) {
+virtio_bus_set_host_notifier_begin(bus);
+
 for (i = 0; i < nvqs; i++) {
 rc = virtio_bus_set_host_notifier(bus, i + n_offset, true);
 if (rc != 0) {
 warn_report_once("%s: Failed to set host notifier (%s).\n",
  vdev->name, strerror(-rc));
 
+/* This also calls virtio_bus_set_host_notifier_commit() */
 virtio_bus_unset_and_cleanup_host_notifiers(bus, i, n_offset);
 return rc;
 }
 }
+
+virtio_bus_set_host_notifier_commit(bus);
 } else {
+virtio_bus_set_host_notifier_begin(bus);
+/* This also calls virtio_bus_set_host_notifier_commit() */
 virtio_bus_unset_and_cleanup_host_notifiers(bus, nvqs, n_offset);
 }
 
-- 
2.26.3




[RFC 7/8] virtio-scsi: Set host notifiers and callbacks separately

2021-03-25 Thread Greg Kurz
Host notifiers are guaranteed to be idle until the callbacks are
hooked up with virtio_queue_aio_set_host_notifier_handler(). They
thus don't need to be set or unset with the AioContext lock held.

Do this outside the critical section, like virtio-blk already
does: basically split virtio_scsi_vring_init() into two
functions, one to set/unset the host notifier and one for the
aio handler.

A further improvement is to convert virtio-scsi-pci to use the
virtio_bus_set_host_notifiers() API in order to batch setup
and teardown of ioeventfds. This is expected to significantly
reduce boot time of VMs with a high number of vCPUs.

Signed-off-by: Greg Kurz 
---
 hw/scsi/virtio-scsi-dataplane.c | 46 -
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 4ad879340645..11b53ab767be 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -94,8 +94,7 @@ static bool virtio_scsi_data_plane_handle_event(VirtIODevice *vdev,
 return progress;
 }
 
-static int virtio_scsi_vring_init(VirtIOSCSI *s, VirtQueue *vq, int n,
-  VirtIOHandleAIOOutput fn)
+static int virtio_scsi_set_host_notifier(VirtIOSCSI *s, VirtQueue *vq, int n)
 {
 BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s)));
 int rc;
@@ -109,10 +108,15 @@ static int virtio_scsi_vring_init(VirtIOSCSI *s, VirtQueue *vq, int n,
 return rc;
 }
 
-virtio_queue_aio_set_host_notifier_handler(vq, s->ctx, fn);
 return 0;
 }
 
+static void virtio_scsi_set_host_notifier_handler(VirtIOSCSI *s, VirtQueue *vq,
+  VirtIOHandleAIOOutput fn)
+{
+virtio_queue_aio_set_host_notifier_handler(vq, s->ctx, fn);
+}
+
 /* Context: BH in IOThread */
 static void virtio_scsi_dataplane_stop_bh(void *opaque)
 {
@@ -154,38 +158,44 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 goto fail_guest_notifiers;
 }
 
-aio_context_acquire(s->ctx);
-rc = virtio_scsi_vring_init(s, vs->ctrl_vq, 0,
-virtio_scsi_data_plane_handle_ctrl);
-if (rc) {
-goto fail_vrings;
+rc = virtio_scsi_set_host_notifier(s, vs->ctrl_vq, 0);
+if (rc != 0) {
+goto fail_host_notifiers;
 }
 
 vq_init_count++;
-rc = virtio_scsi_vring_init(s, vs->event_vq, 1,
-virtio_scsi_data_plane_handle_event);
-if (rc) {
-goto fail_vrings;
+rc = virtio_scsi_set_host_notifier(s, vs->event_vq, 1);
+if (rc != 0) {
+goto fail_host_notifiers;
 }
 
 vq_init_count++;
+
 for (i = 0; i < vs->conf.num_queues; i++) {
-rc = virtio_scsi_vring_init(s, vs->cmd_vqs[i], i + 2,
-virtio_scsi_data_plane_handle_cmd);
+rc = virtio_scsi_set_host_notifier(s, vs->cmd_vqs[i], i + 2);
 if (rc) {
-goto fail_vrings;
+goto fail_host_notifiers;
 }
 vq_init_count++;
 }
 
+aio_context_acquire(s->ctx);
+virtio_scsi_set_host_notifier_handler(s, vs->ctrl_vq,
+  virtio_scsi_data_plane_handle_ctrl);
+virtio_scsi_set_host_notifier_handler(s, vs->event_vq,
+  virtio_scsi_data_plane_handle_event);
+
+for (i = 0; i < vs->conf.num_queues; i++) {
+virtio_scsi_set_host_notifier_handler(s, vs->cmd_vqs[i],
+  virtio_scsi_data_plane_handle_cmd);
+}
+
 s->dataplane_starting = false;
 s->dataplane_started = true;
 aio_context_release(s->ctx);
 return 0;
 
-fail_vrings:
-aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
-aio_context_release(s->ctx);
+fail_host_notifiers:
 for (i = 0; i < vq_init_count; i++) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
-- 
2.26.3




[RFC 5/8] virtio-blk: Fix rollback path in virtio_blk_data_plane_start()

2021-03-25 Thread Greg Kurz
When dataplane multiqueue support was added in QEMU 2.7, the path
that rolled back the guest notifier assignment in case of error
simply got dropped.

Later on, when Error was added to blk_set_aio_context() in QEMU 4.1,
another error path was introduced, but it omits to roll back both
host and guest notifiers.

It seems cleaner to fix the rollback path in one go. The patch is
simple enough that it can be adjusted if backported to a pre-4.1
QEMU.
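
For readers not familiar with the idiom, the fixed rollback path is a
label ladder where each label undoes one earlier step and falls through
to the next. A stripped-down sketch follows; toy_data_plane_start() and
its flags are invented for the illustration and only mirror the label
names used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

static bool toy_guest_notifiers;    /* hypothetical "guest notifiers set" flag */
static bool toy_host_notifiers;     /* hypothetical "host notifiers set" flag */

/*
 * fail_step selects which step fails: 0 = none, 1 = guest notifiers,
 * 2 = host notifiers, 3 = AioContext switch.
 */
static int toy_data_plane_start(int fail_step)
{
    assert(fail_step >= 0 && fail_step <= 3);

    if (fail_step == 1) {
        goto fail_guest_notifiers;
    }
    toy_guest_notifiers = true;

    if (fail_step == 2) {
        goto fail_host_notifiers;
    }
    toy_host_notifiers = true;

    if (fail_step == 3) {
        goto fail_aio_context;
    }
    return 0;

fail_aio_context:
    toy_host_notifiers = false;     /* undo step 2 */
fail_host_notifiers:
    toy_guest_notifiers = false;    /* undo step 1 */
fail_guest_notifiers:
    return -1;
}
```

Jumping to the wrong label, as the old code did, skips one or both of
the undo steps and leaves notifiers assigned after a failed start.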

Fixes: 51b04ac5c6a6 ("virtio-blk: dataplane multiqueue support")
Cc: stefa...@redhat.com
Fixes: 97896a4887a0 ("block: Add Error to blk_set_aio_context()")
Cc: kw...@redhat.com
Signed-off-by: Greg Kurz 
---
 hw/block/dataplane/virtio-blk.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index e9050c8987e7..d7b5c95d26d9 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -207,7 +207,7 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
 }
-goto fail_guest_notifiers;
+goto fail_host_notifiers;
 }
 }
 
@@ -221,7 +221,7 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 aio_context_release(old_context);
 if (r < 0) {
 error_report_err(local_err);
-goto fail_guest_notifiers;
+goto fail_aio_context;
 }
 
 /* Process queued requests before the ones in vring */
@@ -245,6 +245,13 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 aio_context_release(s->ctx);
 return 0;
 
+  fail_aio_context:
+for (i = 0; i < nvqs; i++) {
+virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
+}
+  fail_host_notifiers:
+k->set_guest_notifiers(qbus->parent, nvqs, false);
   fail_guest_notifiers:
 /*
  * If we failed to set up the guest notifiers queued requests will be
-- 
2.26.3




[RFC 4/8] virtio-pci: Batch add/del ioeventfds in a single MR transaction

2021-03-25 Thread Greg Kurz
Implement the ioeventfd_assign_begin() and ioeventfd_assign_commit()
handlers of VirtioBusClass. Basically track that a transaction was
already requested by the device and use this information to prevent
the memory code from generating a transaction for each individual eventfd.

Devices that want to benefit from this batching feature must be converted
to use the virtio_bus_set_host_notifiers() API.
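
As a rough illustration of the tracking described above, here is a toy
model of the flag. ToyProxy and the toy_* helpers are hypothetical and
only count commits; they stand in for the proxy state and the memory
transaction calls, nothing more.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyProxy {
    bool assign_started;    /* true between begin and commit */
    int commits;            /* transactions committed so far */
} ToyProxy;

static void toy_assign_begin(ToyProxy *p)
{
    assert(!p->assign_started);     /* batches do not nest */
    p->assign_started = true;
}

static void toy_assign_commit(ToyProxy *p)
{
    assert(p->assign_started);
    p->commits++;                   /* one commit for the whole batch */
    p->assign_started = false;
}

/* Assign one ioeventfd; it commits on its own only when no batch is open. */
static void toy_assign(ToyProxy *p)
{
    bool transaction = !p->assign_started;

    if (transaction) {
        p->commits++;
    }
}

/* Count the commits needed to assign nvqs ioeventfds. */
static int toy_commits_for(int nvqs, bool batched)
{
    ToyProxy p = { false, 0 };

    if (batched) {
        toy_assign_begin(&p);
    }
    for (int i = 0; i < nvqs; i++) {
        toy_assign(&p);
    }
    if (batched) {
        toy_assign_commit(&p);
    }
    return p.commits;
}
```

In this model a device that is not converted still gets one commit per
queue, while a converted one gets a single commit regardless of the
number of queues.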

Signed-off-by: Greg Kurz 
---
 hw/virtio/virtio-pci.h |  1 +
 hw/virtio/virtio-pci.c | 53 +-
 softmmu/memory.c   |  4 ++--
 3 files changed, 40 insertions(+), 18 deletions(-)

diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index d7d5d403a948..a1b3f1bc45c9 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -141,6 +141,7 @@ struct VirtIOPCIProxy {
 bool disable_modern;
 bool ignore_backend_features;
 OnOffAuto disable_legacy;
+bool ioeventfd_assign_started;
 uint32_t class_code;
 uint32_t nvectors;
 uint32_t dfselect;
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 883045a22354..0a8738c69541 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -243,47 +243,66 @@ static int virtio_pci_ioeventfd_assign(DeviceState *d, EventNotifier *notifier,
 hwaddr modern_addr = virtio_pci_queue_mem_mult(proxy) *
  virtio_get_queue_index(vq);
 hwaddr legacy_addr = VIRTIO_PCI_QUEUE_NOTIFY;
+bool transaction = !proxy->ioeventfd_assign_started;
 
 if (assign) {
 if (modern) {
 if (fast_mmio) {
-memory_region_add_eventfd(modern_mr, modern_addr, 0,
-  false, n, notifier);
+memory_region_add_eventfd_full(modern_mr, modern_addr, 0,
+   false, n, notifier, transaction);
 } else {
-memory_region_add_eventfd(modern_mr, modern_addr, 2,
-  false, n, notifier);
+memory_region_add_eventfd_full(modern_mr, modern_addr, 2,
+   false, n, notifier, transaction);
 }
 if (modern_pio) {
-memory_region_add_eventfd(modern_notify_mr, 0, 2,
-  true, n, notifier);
+memory_region_add_eventfd_full(modern_notify_mr, 0, 2,
+   true, n, notifier, transaction);
 }
 }
 if (legacy) {
-memory_region_add_eventfd(legacy_mr, legacy_addr, 2,
-  true, n, notifier);
+memory_region_add_eventfd_full(legacy_mr, legacy_addr, 2,
+   true, n, notifier, transaction);
 }
 } else {
 if (modern) {
 if (fast_mmio) {
-memory_region_del_eventfd(modern_mr, modern_addr, 0,
-  false, n, notifier);
+memory_region_del_eventfd_full(modern_mr, modern_addr, 0,
+   false, n, notifier, transaction);
 } else {
-memory_region_del_eventfd(modern_mr, modern_addr, 2,
-  false, n, notifier);
+memory_region_del_eventfd_full(modern_mr, modern_addr, 2,
+   false, n, notifier, transaction);
 }
 if (modern_pio) {
-memory_region_del_eventfd(modern_notify_mr, 0, 2,
-  true, n, notifier);
+memory_region_del_eventfd_full(modern_notify_mr, 0, 2,
+   true, n, notifier, transaction);
 }
 }
 if (legacy) {
-memory_region_del_eventfd(legacy_mr, legacy_addr, 2,
-  true, n, notifier);
+memory_region_del_eventfd_full(legacy_mr, legacy_addr, 2,
+   true, n, notifier, transaction);
 }
 }
 return 0;
 }
 
+static void virtio_pci_ioeventfd_assign_begin(DeviceState *d)
+{
+VirtIOPCIProxy *proxy = to_virtio_pci_proxy(d);
+
+assert(!proxy->ioeventfd_assign_started);
+proxy->ioeventfd_assign_started = true;
+memory_region_transaction_begin();
+}
+
+static void virtio_pci_ioeventfd_assign_commit(DeviceState *d)
+{
+VirtIOPCIProxy *proxy = to_virtio_pci_proxy(d);
+
+assert(proxy->ioeventfd_assign_started);
+memory_region_transaction_commit();
+proxy->ioeventfd_assign_started = false;
+}
+
 static void virtio_pci_start_ioeventfd(VirtIOPCIProxy *proxy)
 {
 virtio_bus_start_ioeventfd(&proxy->bus);
@@ -2161,6 +2180,8 @@ static void virtio_pci_bus_class_init(ObjectClass *klass, void *data)
 k->query_nvectors = virtio_pci_q

[RFC 2/8] virtio: Introduce virtio_bus_set_host_notifiers()

2021-03-25 Thread Greg Kurz
Multiqueue devices such as virtio-scsi or virtio-blk all open-code the
same pattern to set up/tear down host notifiers of the request virtqueues.
Consolidate the pattern in a new virtio_bus_set_host_notifiers() API.
Since virtio-scsi and virtio-blk both fall back to userspace if host
notifiers can't be set, e.g. file descriptor exhaustion, go for a
warning rather than an error. Also make it oneshot to avoid flooding
the logs since the message would be very likely the same for all
virtqueues.
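
The pattern being consolidated can be sketched in isolation as follows.
This is a simplified model, not the code from the patch: the toy_*
helpers and the toy_fail_at knob are made up to simulate a failure such
as file descriptor exhaustion, and the warning is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_MAX_VQS 16

static bool toy_notifier_set[TOY_MAX_VQS];
static int toy_fail_at = -1;    /* queue index that fails, -1 for none */

/* Set or unset the notifier of a single queue. */
static int toy_set_host_notifier(int n, bool assign)
{
    assert(n >= 0 && n < TOY_MAX_VQS);
    if (assign && n == toy_fail_at) {
        return -EMFILE;
    }
    toy_notifier_set[n] = assign;
    return 0;
}

/* Set or unset nvqs consecutive notifiers starting at queue n_offset. */
static int toy_set_host_notifiers(int nvqs, int n_offset, bool assign)
{
    int i;
    int rc;

    if (!assign) {
        for (i = 0; i < nvqs; i++) {
            toy_set_host_notifier(i + n_offset, false);
        }
        return 0;
    }

    for (i = 0; i < nvqs; i++) {
        rc = toy_set_host_notifier(i + n_offset, true);
        if (rc != 0) {
            /* Roll back only the i notifiers that were actually set */
            while (i--) {
                toy_set_host_notifier(i + n_offset, false);
            }
            return rc;
        }
    }
    return 0;
}
```

The n_offset argument is what lets a device like virtio-scsi skip its
control and event queues and batch only the command queues.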

Devices will be converted to use virtio_bus_set_host_notifiers() in
separate patches.

Signed-off-by: Greg Kurz 
---
 include/hw/virtio/virtio-bus.h |  3 +++
 hw/virtio/virtio-bus.c | 36 ++
 2 files changed, 39 insertions(+)

diff --git a/include/hw/virtio/virtio-bus.h b/include/hw/virtio/virtio-bus.h
index ef8abe49c5a1..6d1e4ee3e886 100644
--- a/include/hw/virtio/virtio-bus.h
+++ b/include/hw/virtio/virtio-bus.h
@@ -154,5 +154,8 @@ void virtio_bus_release_ioeventfd(VirtioBusState *bus);
 int virtio_bus_set_host_notifier(VirtioBusState *bus, int n, bool assign);
 /* Tell the bus that the ioeventfd handler is no longer required. */
 void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n);
+/* Call virtio_bus_set_host_notifier() for several consecutive vqs */
+int virtio_bus_set_host_notifiers(VirtioBusState *bus, int nvqs, int n_offset,
+  bool assign);
 
 #endif /* VIRTIO_BUS_H */
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index d6332d45c3b2..c9e7cdb5c161 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -308,6 +308,42 @@ void virtio_bus_cleanup_host_notifier(VirtioBusState *bus, int n)
 event_notifier_cleanup(notifier);
 }
 
+static void virtio_bus_unset_and_cleanup_host_notifiers(VirtioBusState *bus,
+int nvqs, int n_offset)
+{
+int i;
+
+for (i = 0; i < nvqs; i++) {
+virtio_bus_set_host_notifier(bus, i + n_offset, false);
+virtio_bus_cleanup_host_notifier(bus, i + n_offset);
+}
+}
+
+int virtio_bus_set_host_notifiers(VirtioBusState *bus, int nvqs, int n_offset,
+  bool assign)
+{
+VirtIODevice *vdev = virtio_bus_get_device(bus);
+int i;
+int rc;
+
+if (assign) {
+for (i = 0; i < nvqs; i++) {
+rc = virtio_bus_set_host_notifier(bus, i + n_offset, true);
+if (rc != 0) {
+warn_report_once("%s: Failed to set host notifier (%s).\n",
+ vdev->name, strerror(-rc));
+
+virtio_bus_unset_and_cleanup_host_notifiers(bus, i, n_offset);
+return rc;
+}
+}
+} else {
+virtio_bus_unset_and_cleanup_host_notifiers(bus, nvqs, n_offset);
+}
+
+return 0;
+}
+
 static char *virtio_bus_get_dev_path(DeviceState *dev)
 {
 BusState *bus = qdev_get_parent_bus(dev);
-- 
2.26.3




[RFC 0/8] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci

2021-03-25 Thread Greg Kurz
Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
a serious slow down may be observed on setups with a big enough number
of vCPUs.

Example with a pseries guest on a bi-POWER9 socket system (128 HW threads):

1   0m20.922s   0m21.346s
2   0m21.230s   0m20.350s
4   0m21.761s   0m20.997s
8   0m22.770s   0m20.051s
16  0m22.038s   0m19.994s
32  0m22.928s   0m20.803s
64  0m26.583s   0m22.953s
128 0m41.273s   0m32.333s
256 2m4.727s   1m16.924s
384 6m5.563s   3m26.186s

Both perf and gprof indicate that QEMU is hogging CPUs when setting up
the ioeventfds:

 67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
  9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
  8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
=>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
=>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
  0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single

address_space_update_ioeventfds() is called when committing an MR
transaction, i.e. for each ioeventfd with the current code base,
and it internally loops on all ioeventfds:

static void address_space_update_ioeventfds(AddressSpace *as)
{
[...]
FOR_EACH_FLAT_RANGE(fr, view) {
for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {

This means that the setup of ioeventfds for these devices has
quadratic time complexity.
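
A back-of-the-envelope model of that cost, assuming each commit visits
every ioeventfd registered so far and ignoring everything else the
commit does. The function names are invented for the illustration.

```c
#include <assert.h>

/* Toy cost model: one commit walks every ioeventfd registered so far. */
static unsigned long commit_cost(unsigned long registered)
{
    return registered;
}

/* One transaction per ioeventfd: 1 + 2 + ... + n visits in total. */
static unsigned long cost_per_eventfd_commit(unsigned long n)
{
    unsigned long total = 0;

    for (unsigned long i = 1; i <= n; i++) {
        total += commit_cost(i);
    }
    assert(total == n * (n + 1) / 2);   /* closed form of the sum */
    return total;
}

/* One transaction for the whole batch: a single walk over n ioeventfds. */
static unsigned long cost_batched_commit(unsigned long n)
{
    return commit_cost(n);
}
```

For 384 queues the model gives 73920 visits with one commit per
ioeventfd versus 384 with a single batched commit.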

This series introduces generic APIs to allow batch creation and deletion
of ioeventfds, and converts virtio-blk and virtio-scsi to use them. This
greatly improves the numbers:

1   0m21.271s   0m22.076s
2   0m20.912s   0m19.716s
4   0m20.508s   0m19.310s
8   0m21.374s   0m20.273s
16  0m21.559s   0m21.374s
32  0m22.532s   0m21.271s
64  0m26.550s   0m22.007s
128 0m29.115s   0m27.446s
256 0m44.752s   0m41.004s
384 1m2.884s0m58.023s

The series deliberately spans over multiple subsystems for easier
review and experimenting. It also does some preliminary fixes on
the way. It is thus posted as an RFC for now, but if the general
idea is acceptable, I guess a non-RFC could be posted and maybe
extend the feature to some other devices that might suffer from
similar scaling issues, e.g. vhost-scsi-pci, vhost-user-scsi-pci
and vhost-user-blk-pci, even if I haven't checked.

This should fix https://bugzilla.redhat.com/show_bug.cgi?id=1927108
which reported the issue for virtio-scsi-pci.

Greg Kurz (8):
  memory: Allow eventfd add/del without starting a transaction
  virtio: Introduce virtio_bus_set_host_notifiers()
  virtio: Add API to batch set host notifiers
  virtio-pci: Batch add/del ioeventfds in a single MR transaction
  virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
  virtio-blk: Use virtio_bus_set_host_notifiers()
  virtio-scsi: Set host notifiers and callbacks separately
  virtio-scsi: Use virtio_bus_set_host_notifiers()

 hw/virtio/virtio-pci.h  |  1 +
 include/exec/memory.h   | 48 --
 include/hw/virtio/virtio-bus.h  |  7 
 hw/block/dataplane/virtio-blk.c | 26 +---
 hw/scsi/virtio-scsi-dataplane.c | 68 ++--
 hw/virtio/virtio-bus.c  | 70 +
 hw/virtio/virtio-pci.c  | 53 +
 softmmu/memory.c| 42 
 8 files changed, 225 insertions(+), 90 deletions(-)

-- 
2.26.3





[RFC 1/8] memory: Allow eventfd add/del without starting a transaction

2021-03-25 Thread Greg Kurz
Each addition or deletion of an eventfd happens in its own MR
transaction. This doesn't scale well with multiqueue devices
that do 1:1 queue:vCPU mapping (e.g. virtio-scsi-pci or
virtio-blk-pci): these devices typically create at least one
eventfd per queue and memory_region_transaction_commit(),
which runs at the end of each transaction, also loops on eventfds,
resulting in a quadratic time complexity. This calls for batching: a
device should be able to add or delete its eventfds in a single
transaction.

Prepare ground for this by introducing extended versions of
memory_region_add_eventfd() and memory_region_del_eventfd()
that take an extra bool argument to control if a transaction
should be started or not.
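
The shape of the change, reduced to a toy: an extended function takes
the extra bool and the old name becomes a thin wrapper passing true.
The toy_* functions and counters below are stand-ins made up for the
example; the nesting counter is an assumption about how an enclosing
transaction absorbs the update, not a copy of the memory code.

```c
#include <assert.h>
#include <stdbool.h>

static int toy_depth;       /* current transaction nesting depth */
static int toy_commits;     /* outermost commits performed so far */
static int toy_eventfds;    /* eventfds registered so far */

static void toy_transaction_begin(void)
{
    toy_depth++;
}

static void toy_transaction_commit(void)
{
    assert(toy_depth > 0);
    if (--toy_depth == 0) {
        toy_commits++;      /* the expensive update happens here */
    }
}

/* Extended version: the caller decides whether a transaction is started. */
static void toy_add_eventfd_full(bool transaction)
{
    if (transaction) {
        toy_transaction_begin();
    }
    toy_eventfds++;
    if (transaction) {
        toy_transaction_commit();
    }
}

/* Old entry point: same behavior as before, always its own transaction. */
static inline void toy_add_eventfd(void)
{
    toy_add_eventfd_full(true);
}
```

Existing callers keep using the wrapper unchanged, which is why no
behavior change is expected until a device opts in to batching.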

No behavior change at this point.

Signed-off-by: Greg Kurz 
---
 include/exec/memory.h | 48 ---
 softmmu/memory.c  | 42 ++---
 2 files changed, 62 insertions(+), 28 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 5728a681b27d..98ed552e001c 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1848,13 +1848,25 @@ void memory_region_clear_flush_coalesced(MemoryRegion *mr);
  * @match_data: whether to match against @data, instead of just @addr
  * @data: the data to match against the guest write
  * @e: event notifier to be triggered when @addr, @size, and @data all match.
+ * @transaction: whether to start a transaction for the change
  **/
-void memory_region_add_eventfd(MemoryRegion *mr,
-   hwaddr addr,
-   unsigned size,
-   bool match_data,
-   uint64_t data,
-   EventNotifier *e);
+void memory_region_add_eventfd_full(MemoryRegion *mr,
+hwaddr addr,
+unsigned size,
+bool match_data,
+uint64_t data,
+EventNotifier *e,
+bool transaction);
+
+static inline void memory_region_add_eventfd(MemoryRegion *mr,
+ hwaddr addr,
+ unsigned size,
+ bool match_data,
+ uint64_t data,
+ EventNotifier *e)
+{
+memory_region_add_eventfd_full(mr, addr, size, match_data, data, e, true);
+}
 
 /**
  * memory_region_del_eventfd: Cancel an eventfd.
@@ -1868,13 +1880,25 @@ void memory_region_add_eventfd(MemoryRegion *mr,
  * @match_data: whether to match against @data, instead of just @addr
  * @data: the data to match against the guest write
  * @e: event notifier to be triggered when @addr, @size, and @data all match.
+ * @transaction: whether to start a transaction for the change
  */
-void memory_region_del_eventfd(MemoryRegion *mr,
-   hwaddr addr,
-   unsigned size,
-   bool match_data,
-   uint64_t data,
-   EventNotifier *e);
+void memory_region_del_eventfd_full(MemoryRegion *mr,
+hwaddr addr,
+unsigned size,
+bool match_data,
+uint64_t data,
+EventNotifier *e,
+bool transaction);
+
+static inline void memory_region_del_eventfd(MemoryRegion *mr,
+ hwaddr addr,
+ unsigned size,
+ bool match_data,
+ uint64_t data,
+ EventNotifier *e)
+{
+memory_region_del_eventfd_full(mr, addr, size, match_data, data, e, true);
+}
 
 /**
  * memory_region_add_subregion: Add a subregion to a container.
diff --git a/softmmu/memory.c b/softmmu/memory.c
index d4493ef9e430..1b1942d521cc 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2341,12 +2341,13 @@ void memory_region_clear_flush_coalesced(MemoryRegion *mr)
 
 static bool userspace_eventfd_warning;
 
-void memory_region_add_eventfd(MemoryRegion *mr,
-   hwaddr addr,
-   unsigned size,
-   bool match_data,
-   uint64_t data,
-   EventNotifier *e)
+void memory_region_add_eventfd_full(MemoryRegion *mr,
+hwaddr addr,
+unsigned size,
+bool match_data

Re: [PATCH v2] MAINTAINERS: Fix the location of tools manuals

2021-03-09 Thread Greg Kurz
On Tue, 9 Mar 2021 20:48:40 +0100
Thomas Huth  wrote:

> On 09/03/2021 18.41, Wainer dos Santos Moschetta wrote:
> > Hi,
> > 
> > Any issue that prevents this from being queued?
> 
> Maybe it's just not clear who should take the patch ... CC:-ing qemu-trivial 
> and qemu-block now, since I think it could go through the trivial or block 
> tree.
> 

For the virtfs change:

Acked-by: Greg Kurz 

> > On 2/4/21 10:59 AM, Philippe Mathieu-Daudé wrote:
> >> On 2/4/21 2:54 PM, Wainer dos Santos Moschetta wrote:
> >>> The qemu-img.rst, qemu-nbd.rst, virtfs-proxy-helper.rst, 
> >>> qemu-trace-stap.rst,
> >>> and virtiofsd.rst manuals were moved to docs/tools, so this update 
> >>> MAINTAINERS
> >>> accordingly.
> >>>
> >>> Fixes: a08b4a9fe6c ("docs: Move tools documentation to tools manual")
> >>> Signed-off-by: Wainer dos Santos Moschetta 
> >>> ---
> >>> v1: was "MAINTAINERS: Fix the location of virtiofsd.rst"
> >>> v2: Fixed the location of all files [philmd]
> >>>
> >>>   MAINTAINERS | 10 +-
> >>>   1 file changed, 5 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/MAINTAINERS b/MAINTAINERS
> >>> index 00626941f1..174425a941 100644
> >>> --- a/MAINTAINERS
> >>> +++ b/MAINTAINERS
> >>> @@ -1829,7 +1829,7 @@ S: Odd Fixes
> >>>   F: hw/9pfs/
> >>>   X: hw/9pfs/xen-9p*
> >>>   F: fsdev/
> >>> -F: docs/interop/virtfs-proxy-helper.rst
> >>> +F: docs/tools/virtfs-proxy-helper.rst
> 
> FWIW:
> Reviewed-by: Thomas Huth 
> 




Re: [PATCH-for-6.0 2/3] hw/virtio: Build most of virtio devices as arch-independent objects

2020-11-05 Thread Greg Kurz
On Thu,  5 Nov 2020 13:43:52 +0100
Philippe Mathieu-Daudé  wrote:

> VirtIO devices shouldn't be arch-specific. Some device have to
> use PAGE_SIZE definition or access to CPUState. Keep building
> them as arch-specific objects. Move all we can to libcommon.fa.
> 

This patch breaks the build:

$ ./configure && make
...

[890/2578] Compiling C object libcommon.fa.p/hw_virtio_virtio-mem.c.o
FAILED: libcommon.fa.p/hw_virtio_virtio-mem.c.o 
cc -Ilibcommon.fa.p -I. -I.. -I../slirp -I../slirp/src -Iqapi -Itrace -Iui 
-Iui/shader -I/usr/include/SDL2 -I/usr/include/pixman-1 -I/usr/include/libpng16 
-I/usr/include/capstone -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include 
-fdiagnostics-color=auto -pipe -Wall -Winvalid-pch -Werror -std=gnu99 -O2 -g 
-U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -m64 -mcx16 -D_GNU_SOURCE 
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv -Wold-style-declaration 
-Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self 
-Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels 
-Wexpansion-to-defined -Wno-missing-include-dirs -Wno-shift-negative-value 
-Wno-psabi -fstack-protector-strong -isystem /var/tmp/qemu/linux-headers 
-isystem linux-headers -iquote /var/tmp/qemu/tcg/i386 -iquote . -iquote 
/var/tmp/qemu -iquote /var/tmp/qemu/accel/tcg -iquote /var/tmp/qemu/include 
-iquote /var/tmp/qemu/disas/libvixl -pthread -fPIC -D_REENTRANT -Wno-undef 
-D_DEFAULT_SOURCE -D_XOPEN_SOURCE=600 -DNCURSES_WIDECHAR -MD -MQ 
libcommon.fa.p/hw_virtio_virtio-mem.c.o -MF 
libcommon.fa.p/hw_virtio_virtio-mem.c.o.d -o 
libcommon.fa.p/hw_virtio_virtio-mem.c.o -c ../hw/virtio/virtio-mem.c
In file included from ../hw/virtio/virtio-mem.c:28:
/var/tmp/qemu/include/exec/ram_addr.h:23:10: fatal error: cpu.h: No such file or directory
   23 | #include "cpu.h"
  |  ^~~
compilation terminated.

Unless I'm missing something, legacy devices need to be built per
target because of endianness.

> Suggested-by: Stefan Hajnoczi 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/9pfs/meson.build|  2 +-
>  hw/block/dataplane/meson.build |  2 +-
>  hw/block/meson.build   |  2 +-
>  hw/char/meson.build|  2 +-
>  hw/net/meson.build |  2 +-
>  hw/scsi/meson.build|  2 +-
>  hw/virtio/meson.build  | 15 +--
>  7 files changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/9pfs/meson.build b/hw/9pfs/meson.build
> index cc094262122..ac964be15ce 100644
> --- a/hw/9pfs/meson.build
> +++ b/hw/9pfs/meson.build
> @@ -17,4 +17,4 @@
>  fs_ss.add(when: 'CONFIG_XEN', if_true: files('xen-9p-backend.c'))
>  softmmu_ss.add_all(when: 'CONFIG_9PFS', if_true: fs_ss)
>  
> -specific_ss.add(when: 'CONFIG_VIRTIO_9P', if_true: 
> files('virtio-9p-device.c'))
> +softmmu_ss.add(when: 'CONFIG_VIRTIO_9P', if_true: 
> files('virtio-9p-device.c'))
> diff --git a/hw/block/dataplane/meson.build b/hw/block/dataplane/meson.build
> index 12c6a264f10..e2f3721ce24 100644
> --- a/hw/block/dataplane/meson.build
> +++ b/hw/block/dataplane/meson.build
> @@ -1,2 +1,2 @@
> -specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c'))
> +softmmu_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c'))
>  specific_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c'))
> diff --git a/hw/block/meson.build b/hw/block/meson.build
> index 602ca6c8541..497592c33ac 100644
> --- a/hw/block/meson.build
> +++ b/hw/block/meson.build
> @@ -15,7 +15,7 @@
>  softmmu_ss.add(when: 'CONFIG_SH4', if_true: files('tc58128.c'))
>  softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 
> 'nvme-ns.c'))
>  
> -specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c'))
> +softmmu_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c'))
>  specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
> files('vhost-user-blk.c'))
>  
>  subdir('dataplane')
> diff --git a/hw/char/meson.build b/hw/char/meson.build
> index 196ac91fa29..7496594ea07 100644
> --- a/hw/char/meson.build
> +++ b/hw/char/meson.build
> @@ -37,5 +37,5 @@
>  
>  specific_ss.add(when: 'CONFIG_HTIF', if_true: files('riscv_htif.c'))
>  specific_ss.add(when: 'CONFIG_TERMINAL3270', if_true: 
> files('terminal3270.c'))
> -specific_ss.add(when: 'CONFIG_VIRTIO', if_true: files('virtio-serial-bus.c'))
> +softmmu_ss.add(when: 'CONFIG_VIRTIO', if_true: files('virtio-serial-bus.c'))
>  specific_ss.add(when: 'CONFIG_PSERIES', if_true: files('spapr_vty.c'))
> diff --git a/hw/net/meson.build b/hw/net/meson.build
> index 4a7051b54a0..c795af23eee 100644
> --- a/hw/net/meson.build
> +++ b/hw/net/meson.build
> @@ -43,7 +43,7 @@
>  specific_ss.add(when: 'CONFIG_XILINX_ETHLITE', if_true: 
> files('xilinx_ethlite.c'))
>  
>  softmmu_ss.add(when: 'CONFIG_VIRTIO_NET', if_true: files('net_rx_pkt.c'))
> -specific_ss.add(when: 

[PATCH] block: Move bdrv_drain_all_end_quiesce() to block_int.h

2020-10-28 Thread Greg Kurz
This function is really an internal helper for bdrv_close(). Update its
doc comment to make this clear and make the function private.

Signed-off-by: Greg Kurz 
---

As suggested by Stefan here:

https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg08235.html
---
 include/block/block.h |6 --
 include/block/block_int.h |9 +
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 809987017631..d16c401cb44e 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -779,12 +779,6 @@ void bdrv_drained_end(BlockDriverState *bs);
  */
 void bdrv_drained_end_no_poll(BlockDriverState *bs, int *drained_end_counter);
 
-/**
- * End all quiescent sections started by bdrv_drain_all_begin(). This is
- * only needed when deleting a BDS before bdrv_drain_all_end() is called.
- */
-void bdrv_drain_all_end_quiesce(BlockDriverState *bs);
-
 /**
  * End a quiescent section started by bdrv_subtree_drained_begin().
  */
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 38cad9d15c50..95d9333be14f 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1407,4 +1407,13 @@ static inline BlockDriverState *bdrv_primary_bs(BlockDriverState *bs)
 return child_bs(bdrv_primary_child(bs));
 }
 
+/**
+ * End all quiescent sections started by bdrv_drain_all_begin(). This is
+ * needed when deleting a BDS before bdrv_drain_all_end() is called.
+ *
+ * NOTE: this is an internal helper for bdrv_close() *only*. No one else
+ * should call it.
+ */
+void bdrv_drain_all_end_quiesce(BlockDriverState *bs);
+
 #endif /* BLOCK_INT_H */





Re: [PATCH v2] block: End quiescent sections when a BDS is deleted

2020-10-27 Thread Greg Kurz
On Tue, 27 Oct 2020 13:54:04 +
Stefan Hajnoczi  wrote:

> On Fri, Oct 23, 2020 at 05:01:10PM +0200, Greg Kurz wrote:
> > +/**
> > + * End all quiescent sections started by bdrv_drain_all_begin(). This is
> > + * only needed when deleting a BDS before bdrv_drain_all_end() is called.
> > + */
> > +void bdrv_drain_all_end_quiesce(BlockDriverState *bs);
> 
> This function is only called from block.c. Can it be moved to the
> private block_int.h header?
> 

Ha, I wasn't aware of block_int.h... It seems to be a very good idea.

> The code is not clear on whether bdrv_drain_all_end_quiesce() is an API
> that others can use or an internal helper function that must only be
> called by bdrv_close(). I came to the conclusion that the latter is true
> after reviewing the patch.
> 

Yes it is.

> Please update the bdrv_drain_all_end_quiesce() doc comment to clarify
> that this function is an internal helper for bdrv_close() - no one else
> needs to worry about it.

I'll do that.

Thanks for the suggestions Stefan.

Cheers,

--
Greg




[PATCH v2] block: End quiescent sections when a BDS is deleted

2020-10-23 Thread Greg Kurz
If a BDS gets deleted during blk_drain_all(), it might miss a
call to bdrv_do_drained_end(). This means missing a call to
aio_enable_external() and the AIO context remains disabled
forever. This can cause a device to become unresponsive and to
disrupt the guest execution, i.e. hang, loop forever or worse.

This scenario is quite easy to encounter with virtio-scsi
on POWER when punching multiple blockdev-create QMP commands
while the guest is booting and it is still running the SLOF
firmware. This happens because SLOF disables/re-enables PCI
devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND
register after the initial probe/feature negotiation, as it
tends to work with a single device at a time at various stages
like probing and running block/network bootloaders without
doing a full reset in-between. This naturally generates many
dataplane stops and starts, and thus many drain sections that
can race with blockdev_create_run(). In the end, SLOF bails
out.

It is somewhat reproducible on x86, but it requires generating
artificial dataplane start/stop activity with stop/cont QMP
commands. In this case, SeaBIOS ends up looping forever,
waiting for the virtio-scsi device to send a response to
a command it never received.

Add a helper that pairs all previously called bdrv_do_drained_begin()
with a bdrv_do_drained_end() and call it from bdrv_close().
While at it, update the "/bdrv-drain/graph-change/drain_all"
test in test-bdrv-drain so that it can catch the issue.

BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441
Signed-off-by: Greg Kurz 
---
v2: - use g_assert() in core code (checkpatch)
- rename helper to bdrv_drain_all_end_quiesce() (Kevin)
---
 block.c |9 +
 block/io.c  |   13 +
 include/block/block.h   |6 ++
 tests/test-bdrv-drain.c |1 +
 4 files changed, 29 insertions(+)

diff --git a/block.c b/block.c
index 430edf79bb10..ee5b28a9798d 100644
--- a/block.c
+++ b/block.c
@@ -4458,6 +4458,15 @@ static void bdrv_close(BlockDriverState *bs)
 }
 QLIST_INIT(&bs->aio_notifiers);
 bdrv_drained_end(bs);
+
+/*
+ * If we're still inside some bdrv_drain_all_begin()/end() sections, end
+ * them now since this BDS won't exist anymore when bdrv_drain_all_end()
+ * gets called.
+ */
+if (bs->quiesce_counter) {
+bdrv_drain_all_end_quiesce(bs);
+}
 }
 
 void bdrv_close_all(void)
diff --git a/block/io.c b/block/io.c
index 54f0968aee27..c3a345a911e6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -633,6 +633,19 @@ void bdrv_drain_all_begin(void)
 }
 }
 
+void bdrv_drain_all_end_quiesce(BlockDriverState *bs)
+{
+int drained_end_counter = 0;
+
+g_assert(bs->quiesce_counter > 0);
+g_assert(!bs->refcnt);
+
+while (bs->quiesce_counter) {
+bdrv_do_drained_end(bs, false, NULL, true, &drained_end_counter);
+}
+BDRV_POLL_WHILE(bs, qatomic_read(&drained_end_counter) > 0);
+}
+
 void bdrv_drain_all_end(void)
 {
 BlockDriverState *bs = NULL;
diff --git a/include/block/block.h b/include/block/block.h
index d16c401cb44e..809987017631 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -779,6 +779,12 @@ void bdrv_drained_end(BlockDriverState *bs);
  */
 void bdrv_drained_end_no_poll(BlockDriverState *bs, int *drained_end_counter);
 
+/**
+ * End all quiescent sections started by bdrv_drain_all_begin(). This is
+ * only needed when deleting a BDS before bdrv_drain_all_end() is called.
+ */
+void bdrv_drain_all_end_quiesce(BlockDriverState *bs);
+
 /**
  * End a quiescent section started by bdrv_subtree_drained_begin().
  */
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 1595bbc92e9e..8a29e33e004a 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -594,6 +594,7 @@ static void test_graph_change_drain_all(void)
 
 g_assert_cmpint(bs_b->quiesce_counter, ==, 0);
 g_assert_cmpint(b_s->drain_count, ==, 0);
+g_assert_cmpint(qemu_get_aio_context()->external_disable_cnt, ==, 0);
 
 bdrv_unref(bs_b);
 blk_unref(blk_b);
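
The counter balancing described in the commit message can be illustrated
with a small stand-alone model. This is only a sketch under stated
assumptions: ToyBDS, ToyAioContext and every toy_* function are
hypothetical stand-ins, not QEMU code, and the model keeps nothing but
the two counters involved (quiesce_counter and external_disable_cnt).

```c
#include <assert.h>

/* Toy stand-ins, NOT the QEMU types: only the two counters that matter. */
typedef struct ToyAioContext {
    int external_disable_cnt;   /* > 0: external event sources are disabled */
} ToyAioContext;

typedef struct ToyBDS {
    ToyAioContext *ctx;
    int quiesce_counter;        /* number of drained sections still open */
} ToyBDS;

/* Open one drained section: quiesce the node, disable external events. */
static void toy_drained_begin(ToyBDS *bs)
{
    bs->quiesce_counter++;
    bs->ctx->external_disable_cnt++;
}

/* Close one drained section: the exact inverse of toy_drained_begin(). */
static void toy_drained_end(ToyBDS *bs)
{
    assert(bs->quiesce_counter > 0);
    bs->quiesce_counter--;
    bs->ctx->external_disable_cnt--;
}

/* Hypothetical analogue of the new helper: end every section still open. */
static void toy_drain_all_end_quiesce(ToyBDS *bs)
{
    while (bs->quiesce_counter) {
        toy_drained_end(bs);
    }
}

/* Hypothetical analogue of the close path, with and without the fix. */
static void toy_close(ToyBDS *bs, int apply_fix)
{
    if (apply_fix && bs->quiesce_counter) {
        toy_drain_all_end_quiesce(bs);
    }
}

/*
 * Drain a node `sections` times, close it, and return how many
 * "disable" calls were left unbalanced on the context.
 */
static int toy_leaked_disables(int sections, int apply_fix)
{
    ToyAioContext ctx = { 0 };
    ToyBDS bs = { &ctx, 0 };
    int i;

    for (i = 0; i < sections; i++) {
        toy_drained_begin(&bs);
    }
    toy_close(&bs, apply_fix);
    return ctx.external_disable_cnt;
}
```

In this model, closing a node that is still inside three drained sections
leaves the context disabled three times without the fix, and zero times
with it, which is the condition the extra assertion in the test checks.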





Re: [PATCH] block: End quiescent sections when a BDS is deleted

2020-10-23 Thread Greg Kurz
On Fri, 23 Oct 2020 16:18:05 +0200
Kevin Wolf  wrote:

> Am 23.10.2020 um 12:41 hat Greg Kurz geschrieben:
> > If a BDS gets deleted during blk_drain_all(), it might miss a
> > call to bdrv_do_drained_end(). This means missing a call to
> > aio_enable_external() and the AIO context remains disabled
> > forever. This can cause a device to become unresponsive and to
> > disrupt the guest execution, i.e. hang, loop forever or worse.
> > 
> > This scenario is quite easy to encounter with virtio-scsi
> > on POWER when punching multiple blockdev-create QMP commands
> > while the guest is booting and it is still running the SLOF
> > firmware. This happens because SLOF disables/re-enables PCI
> > devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND
> > register after the initial probe/feature negotiation, as it
> > tends to work with a single device at a time at various stages
> > like probing and running block/network bootloaders without
> > doing a full reset in-between. This naturally generates many
> > dataplane stops and starts, and thus many drain sections that
> > can race with blockdev_create_run(). In the end, SLOF bails
> > out.
> > 
> > It is somewhat reproducible on x86, but it requires generating
> > artificial dataplane start/stop activity with stop/cont QMP
> > commands. In this case, SeaBIOS ends up looping forever,
> > waiting for the virtio-scsi device to send a response to
> > a command it never received.
> > 
> > Add a helper that pairs all previously called bdrv_do_drained_begin()
> > with a bdrv_do_drained_end() and call it from bdrv_close().
> > While at it, update the "/bdrv-drain/graph-change/drain_all"
> > test in test-bdrv-drain so that it can catch the issue.
> > 
> > BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441
> > Signed-off-by: Greg Kurz 
> > ---
> >  block.c |9 +
> >  block/io.c  |   13 +
> >  include/block/block.h   |6 ++
> >  tests/test-bdrv-drain.c |1 +
> >  4 files changed, 29 insertions(+)
> > 
> > diff --git a/block.c b/block.c
> > index 430edf79bb10..ddcb36dd48a8 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -4458,6 +4458,15 @@ static void bdrv_close(BlockDriverState *bs)
> >  }
> >  QLIST_INIT(&bs->aio_notifiers);
> >  bdrv_drained_end(bs);
> > +
> > +/*
> > + * If we're still inside some bdrv_drain_all_begin()/end() sections, 
> > end
> > + * them now since this BDS won't exist anymore when 
> > bdrv_drain_all_end()
> > + * gets called.
> > + */
> > +if (bs->quiesce_counter) {
> > +bdrv_drained_end_quiesce(bs);
> > +}
> >  }
> >  
> >  void bdrv_close_all(void)
> > diff --git a/block/io.c b/block/io.c
> > index 54f0968aee27..8a0da06bbb14 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -633,6 +633,19 @@ void bdrv_drain_all_begin(void)
> >  }
> >  }
> >  
> > +void bdrv_drained_end_quiesce(BlockDriverState *bs)
> 
> I think the name should make clear that this is meant as a counterpart
> for bdrv_drain_all_begin(), so maybe bdrv_drain_all_end_quiesce()?
> 

Sure.

> (The function is not suitable for any other kinds of drain because the
> parameters it passes to bdrv_do_drained_end() are only the same as for
> bdrv_drain_all_begin().)
> 
> > +{
> > +int drained_end_counter = 0;
> > +
> > +g_assert_cmpint(bs->quiesce_counter, >, 0);
> > +g_assert_cmpint(bs->refcnt, ==, 0);
> 
> By the way, I didn't know about the problem with these macros either.
> 
> > +while (bs->quiesce_counter) {
> > +bdrv_do_drained_end(bs, false, NULL, true, &drained_end_counter);
> > +}
> > +BDRV_POLL_WHILE(bs, qatomic_read(&drained_end_counter) > 0);
> > +}
> > +
> >  void bdrv_drain_all_end(void)
> >  {
> >  BlockDriverState *bs = NULL;
> > diff --git a/include/block/block.h b/include/block/block.h
> > index d16c401cb44e..c0ce6e690081 100644
> > --- a/include/block/block.h
> > +++ b/include/block/block.h
> > @@ -779,6 +779,12 @@ void bdrv_drained_end(BlockDriverState *bs);
> >   */
> >  void bdrv_drained_end_no_poll(BlockDriverState *bs, int 
> > *drained_end_counter);
> >  
> > +/**
> > + * End all quiescent sections started by bdrv_drain_all_begin(). This is
> > + * only needed when deleting a BDS before bdrv_drain_all_end() is called.
> > + */
> > +void bdrv_drained_end_quiesce(BlockDriverStat

Re: [PATCH] block: End quiescent sections when a BDS is deleted

2020-10-23 Thread Greg Kurz
On Fri, 23 Oct 2020 03:48:39 -0700
 wrote:

> Patchew URL: 
> https://patchew.org/QEMU/160344969243.4091343.14371338409686732879.st...@bahia.lan/
> 
> 
> 
> Hi,
> 
> This series seems to have some coding style problems. See output below for
> more information:
> 
> Type: series
> Message-id: 160344969243.4091343.14371338409686732879.st...@bahia.lan
> Subject: [PATCH] block: End quiescent sections when a BDS is deleted
> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> git rev-parse base > /dev/null || exit 0
> git config --local diff.renamelimit 0
> git config --local diff.renames True
> git config --local diff.algorithm histogram
> ./scripts/checkpatch.pl --mailback base..
> === TEST SCRIPT END ===
> 
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> From https://github.com/patchew-project/qemu
>  * [new tag] 
> patchew/160344969243.4091343.14371338409686732879.st...@bahia.lan -> 
> patchew/160344969243.4091343.14371338409686732879.st...@bahia.lan
>  - [tag update]  patchew/20201023101222.250147-1-kw...@redhat.com -> 
> patchew/20201023101222.250147-1-kw...@redhat.com
> Switched to a new branch 'test'
> f9501f8 block: End quiescent sections when a BDS is deleted
> 
> === OUTPUT BEGIN ===
> ERROR: Use g_assert or g_assert_not_reached
> #73: FILE: block/io.c:640:
> +g_assert_cmpint(bs->quiesce_counter, >, 0);
> 

Ah... sorry I wasn't aware of the story behind glib commit a6a875068779,
I'll post a v2 that uses g_assert() instead.

> ERROR: Use g_assert or g_assert_not_reached
> #74: FILE: block/io.c:641:
> +g_assert_cmpint(bs->refcnt, ==, 0);
> 
> total: 2 errors, 0 warnings, 53 lines checked
> 
> Commit f9501f846de1 (block: End quiescent sections when a BDS is deleted) has 
> style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> === OUTPUT END ===
> 
> Test command exited with code: 1
> 
> 
> The full log is available at
> http://patchew.org/logs/160344969243.4091343.14371338409686732879.st...@bahia.lan/testing.checkpatch/?type=message.
> ---
> Email generated automatically by Patchew [https://patchew.org/].
> Please send your feedback to patchew-de...@redhat.com



[PATCH] block: End quiescent sections when a BDS is deleted

2020-10-23 Thread Greg Kurz
If a BDS gets deleted during blk_drain_all(), it might miss a
call to bdrv_do_drained_end(). This means missing a call to
aio_enable_external() and the AIO context remains disabled
forever. This can cause a device to become unresponsive and to
disrupt the guest execution, i.e. hang, loop forever or worse.

This scenario is quite easy to encounter with virtio-scsi
on POWER when punching multiple blockdev-create QMP commands
while the guest is booting and it is still running the SLOF
firmware. This happens because SLOF disables/re-enables PCI
devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND
register after the initial probe/feature negotiation, as it
tends to work with a single device at a time at various stages
like probing and running block/network bootloaders without
doing a full reset in-between. This naturally generates many
dataplane stops and starts, and thus many drain sections that
can race with blockdev_create_run(). In the end, SLOF bails
out.

It is somewhat reproducible on x86, but it requires generating
artificial dataplane start/stop activity with stop/cont QMP
commands. In this case, SeaBIOS ends up looping forever,
waiting for the virtio-scsi device to send a response to
a command it never received.

Add a helper that pairs all previously called bdrv_do_drained_begin()
with a bdrv_do_drained_end() and call it from bdrv_close().
While at it, update the "/bdrv-drain/graph-change/drain_all"
test in test-bdrv-drain so that it can catch the issue.

BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441
Signed-off-by: Greg Kurz 
---
 block.c |9 +
 block/io.c  |   13 +
 include/block/block.h   |6 ++
 tests/test-bdrv-drain.c |1 +
 4 files changed, 29 insertions(+)

diff --git a/block.c b/block.c
index 430edf79bb10..ddcb36dd48a8 100644
--- a/block.c
+++ b/block.c
@@ -4458,6 +4458,15 @@ static void bdrv_close(BlockDriverState *bs)
 }
 QLIST_INIT(&bs->aio_notifiers);
 bdrv_drained_end(bs);
+
+/*
+ * If we're still inside some bdrv_drain_all_begin()/end() sections, end
+ * them now since this BDS won't exist anymore when bdrv_drain_all_end()
+ * gets called.
+ */
+if (bs->quiesce_counter) {
+bdrv_drained_end_quiesce(bs);
+}
 }
 
 void bdrv_close_all(void)
diff --git a/block/io.c b/block/io.c
index 54f0968aee27..8a0da06bbb14 100644
--- a/block/io.c
+++ b/block/io.c
@@ -633,6 +633,19 @@ void bdrv_drain_all_begin(void)
 }
 }
 
+void bdrv_drained_end_quiesce(BlockDriverState *bs)
+{
+int drained_end_counter = 0;
+
+g_assert_cmpint(bs->quiesce_counter, >, 0);
+g_assert_cmpint(bs->refcnt, ==, 0);
+
+while (bs->quiesce_counter) {
+bdrv_do_drained_end(bs, false, NULL, true, &drained_end_counter);
+}
+BDRV_POLL_WHILE(bs, qatomic_read(&drained_end_counter) > 0);
+}
+
 void bdrv_drain_all_end(void)
 {
 BlockDriverState *bs = NULL;
diff --git a/include/block/block.h b/include/block/block.h
index d16c401cb44e..c0ce6e690081 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -779,6 +779,12 @@ void bdrv_drained_end(BlockDriverState *bs);
  */
 void bdrv_drained_end_no_poll(BlockDriverState *bs, int *drained_end_counter);
 
+/**
+ * End all quiescent sections started by bdrv_drain_all_begin(). This is
+ * only needed when deleting a BDS before bdrv_drain_all_end() is called.
+ */
+void bdrv_drained_end_quiesce(BlockDriverState *bs);
+
 /**
  * End a quiescent section started by bdrv_subtree_drained_begin().
  */
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 1595bbc92e9e..8a29e33e004a 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -594,6 +594,7 @@ static void test_graph_change_drain_all(void)
 
 g_assert_cmpint(bs_b->quiesce_counter, ==, 0);
 g_assert_cmpint(b_s->drain_count, ==, 0);
+g_assert_cmpint(qemu_get_aio_context()->external_disable_cnt, ==, 0);
 
 bdrv_unref(bs_b);
 blk_unref(blk_b);





Re: [PATCH v2 13/13] block/qed: bdrv_qed_do_open: deal with errp

2020-09-18 Thread Greg Kurz
On Thu, 17 Sep 2020 22:55:19 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> Set errp always on failure. Generic bdrv_open_driver supports driver
> functions which can return negative value and forget to set errp.
> That's a strange thing.. Let's improve bdrv_qed_do_open to not behave
> this way. This allows simplifying the code in
> bdrv_qed_co_invalidate_cache().
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Alberto Garcia 
> ---

Reviewed-by: Greg Kurz 

>  block/qed.c | 24 +++-
>  1 file changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/block/qed.c b/block/qed.c
> index b27e7546ca..f45c640513 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -393,6 +393,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState 
> *bs, QDict *options,
>  
>  ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
>  if (ret < 0) {
> +error_setg(errp, "Failed to read QED header");
>  return ret;
>  }
>  qed_header_le_to_cpu(&le_header, &s->header);
> @@ -408,25 +409,30 @@ static int coroutine_fn 
> bdrv_qed_do_open(BlockDriverState *bs, QDict *options,
>  return -ENOTSUP;
>  }
>  if (!qed_is_cluster_size_valid(s->header.cluster_size)) {
> +error_setg(errp, "QED cluster size is invalid");
>  return -EINVAL;
>  }
>  
>  /* Round down file size to the last cluster */
>  file_size = bdrv_getlength(bs->file->bs);
>  if (file_size < 0) {
> +error_setg(errp, "Failed to get file length");
>  return file_size;
>  }
>  s->file_size = qed_start_of_cluster(s, file_size);
>  
>  if (!qed_is_table_size_valid(s->header.table_size)) {
> +error_setg(errp, "QED table size is invalid");
>  return -EINVAL;
>  }
>  if (!qed_is_image_size_valid(s->header.image_size,
>   s->header.cluster_size,
>   s->header.table_size)) {
> +error_setg(errp, "QED image size is invalid");
>  return -EINVAL;
>  }
>  if (!qed_check_table_offset(s, s->header.l1_table_offset)) {
> +error_setg(errp, "QED table offset is invalid");
>  return -EINVAL;
>  }
>  
> @@ -438,6 +444,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState 
> *bs, QDict *options,
>  
>  /* Header size calculation must not overflow uint32_t */
>  if (s->header.header_size > UINT32_MAX / s->header.cluster_size) {
> +error_setg(errp, "QED header size is too large");
>  return -EINVAL;
>  }
>  
> @@ -445,6 +452,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState 
> *bs, QDict *options,
>  if ((uint64_t)s->header.backing_filename_offset +
>  s->header.backing_filename_size >
>  s->header.cluster_size * s->header.header_size) {
> +error_setg(errp, "QED backing filename offset is invalid");
>  return -EINVAL;
>  }
>  
> @@ -453,6 +461,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState 
> *bs, QDict *options,
>bs->auto_backing_file,
>sizeof(bs->auto_backing_file));
>  if (ret < 0) {
> +error_setg(errp, "Failed to read backing filename");
>  return ret;
>  }
>  pstrcpy(bs->backing_file, sizeof(bs->backing_file),
> @@ -475,6 +484,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState 
> *bs, QDict *options,
>  
>  ret = qed_write_header_sync(s);
>  if (ret) {
> +error_setg(errp, "Failed to update header");
>  return ret;
>  }
>  
> @@ -487,6 +497,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState 
> *bs, QDict *options,
>  
>  ret = qed_read_l1_table_sync(s);
>  if (ret) {
> +error_setg(errp, "Failed to read L1 table");
>  goto out;
>  }
>  
> @@ -503,6 +514,7 @@ static int coroutine_fn bdrv_qed_do_open(BlockDriverState 
> *bs, QDict *options,
>  
>  ret = qed_check(s, &result, true);
>  if (ret) {
> +error_setg(errp, "Image corrupted");
>  goto out;
>  }
>  }
> @@ -1537,22 +1549,16 @@ static void coroutine_fn 
> bdrv_qed_co_invalidate_cache(BlockDriverState *bs,
>Error **errp)
>  {
>  BDRVQEDState *s = bs->opaque;
> -Er

Re: [PATCH v2 12/13] block/qcow2: simplify qcow2_co_invalidate_cache()

2020-09-18 Thread Greg Kurz
On Fri, 18 Sep 2020 19:01:34 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> 18.09.2020 18:51, Alberto Garcia wrote:
> > On Fri 18 Sep 2020 05:30:06 PM CEST, Greg Kurz wrote:
> >>> qcow2_do_open correctly sets errp on each failure path. So, we can
> >>> simplify code in qcow2_co_invalidate_cache() and drop explicit error
> >>> propagation. We should use ERRP_GUARD() (according to the comment in
> >>> include/qapi/error.h) together with error_append() call which we add to
> >>> avoid problems with error_fatal.
> >>>
> >>
> >> The wording gives the impression that we add error_append() to avoid 
> >> problems
> >> with error_fatal which is certainly not true. Also it isn't _append() but
> >> _prepend() :)
> >>
> >> What about ?
> >>
> >> "Add ERRP_GUARD() as mandated by the documentation in include/qapi/error.h
> >>   to avoid problems with the error_prepend() call if errp is
> >>   &error_fatal."
> 
> OK for me.
> 
> > 
> > I had to go to the individual error functions to see what "it doesn't
> > work with &error_fatal" actually means.
> > 
> > So in a case like qcow2_do_open() which has:
> > 
> > error_setg(errp, ...)
> > error_append_hint(errp, ...)
> > 
> > As far as I can see this works just fine without ERRP_GUARD() and with
> > error_fatal, the difference is that if we don't use the guard then the
> > process exits during error_setg(), and if we use the guard it exits
> > during the implicit error_propagate() call triggered by its destruction
> > at the end of the function. In this latter case the printed error
> > message would include the hint.
> > 
> 
> Yes the only problem is that without ERRP_GUARD we lose the hint in case of 
> error_fatal.
> 

Yeah, so rather:

"Add ERRP_GUARD() as mandated by the documentation in include/qapi/error.h
 so that error_prepend() is actually called even if errp is &error_fatal."

Cheers,

--
Greg
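
The behaviour discussed in this thread can be sketched with a stand-alone
model. Everything below is a hypothetical stand-in (ToyError, the toy_*
functions, the "Could not reopen layer: " prefix) and not the QEMU Error
API; exit() is simulated with longjmp() so that the message that would be
printed can be inspected. The guarded variant spells out by hand roughly
what the guard is understood to arrange: redirect a NULL or fatal errp to
a local error, then propagate it when the function returns.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-ins for the Error API; simplified assumptions throughout. */
typedef struct ToyError {
    char msg[128];
} ToyError;

static ToyError *toy_error_fatal;  /* sentinel: only its address is used */
static char toy_last_report[128];  /* message "printed" before the exit */
static jmp_buf toy_exit_point;     /* longjmp() here stands in for exit(1) */

static void toy_error_set(ToyError **errp, const char *msg)
{
    static ToyError storage;       /* one live error is enough here */

    if (!errp) {
        return;                    /* caller is not interested in errors */
    }
    if (errp == &toy_error_fatal) {
        /* Fatal destination: report now and never return to the caller. */
        snprintf(toy_last_report, sizeof(toy_last_report), "%s", msg);
        longjmp(toy_exit_point, 1);
    }
    snprintf(storage.msg, sizeof(storage.msg), "%s", msg);
    *errp = &storage;
}

static void toy_error_prepend(ToyError **errp, const char *prefix)
{
    size_t plen = strlen(prefix);
    size_t mlen;

    if (!errp || !*errp) {
        return;
    }
    mlen = strlen((*errp)->msg);
    if (plen + mlen + 1 > sizeof((*errp)->msg)) {
        return;                    /* sketch only: skip if it does not fit */
    }
    memmove((*errp)->msg + plen, (*errp)->msg, mlen + 1);
    memcpy((*errp)->msg, prefix, plen);
}

/* Always fails: sets an error and returns a negative value. */
static int toy_do_open(ToyError **errp)
{
    toy_error_set(errp, "image is corrupt");
    return -1;
}

/* No guard: with a fatal errp the prepend below is never reached. */
static void toy_invalidate_unguarded(ToyError **errp)
{
    if (toy_do_open(errp) < 0) {
        toy_error_prepend(errp, "Could not reopen layer: ");
    }
}

/* Hand-written equivalent of the guard: collect locally, propagate last. */
static void toy_invalidate_guarded(ToyError **errp)
{
    ToyError *local = NULL;
    ToyError **orig = errp;

    if (!errp || errp == &toy_error_fatal) {
        errp = &local;
    }
    if (toy_do_open(errp) < 0) {
        toy_error_prepend(errp, "Could not reopen layer: ");
    }
    if (errp != orig && local) {
        toy_error_set(orig, local->msg);  /* the fatal report happens here */
    }
}

/* Run one variant against the fatal destination; return what it reports. */
static const char *toy_report_for(void (*fn)(ToyError **))
{
    toy_last_report[0] = '\0';
    if (setjmp(toy_exit_point) == 0) {
        fn(&toy_error_fatal);
    }
    return toy_last_report;
}
```

Calling toy_report_for() on the unguarded variant yields the bare message,
while the guarded variant yields the message with its prefix, which is the
difference the proposed commit message wording tries to capture.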



Re: [PATCH v2 12/13] block/qcow2: simplify qcow2_co_invalidate_cache()

2020-09-18 Thread Greg Kurz
On Thu, 17 Sep 2020 22:55:18 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> qcow2_do_open correctly sets errp on each failure path. So, we can
> simplify code in qcow2_co_invalidate_cache() and drop explicit error
> propagation. We should use ERRP_GUARD() (according to the comment in
> include/qapi/error.h) together with error_append() call which we add to
> avoid problems with error_fatal.
> 

The wording gives the impression that we add error_append() to avoid problems
with error_fatal which is certainly not true. Also it isn't _append() but
_prepend() :)

What about:

"Add ERRP_GUARD() as mandated by the documentation in include/qapi/error.h
 to avoid problems with the error_prepend() call if errp is &error_fatal."

With that fixed,

Reviewed-by: Greg Kurz 

> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/qcow2.c | 13 -
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 2b6ec4b757..cd5f48d3fb 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2702,11 +2702,11 @@ static void qcow2_close(BlockDriverState *bs)
>  static void coroutine_fn qcow2_co_invalidate_cache(BlockDriverState *bs,
> Error **errp)
>  {
> +ERRP_GUARD();
>  BDRVQcow2State *s = bs->opaque;
>  int flags = s->flags;
>  QCryptoBlock *crypto = NULL;
>  QDict *options;
> -Error *local_err = NULL;
>  int ret;
>  
>  /*
> @@ -2724,16 +2724,11 @@ static void coroutine_fn 
> qcow2_co_invalidate_cache(BlockDriverState *bs,
>  
>  flags &= ~BDRV_O_INACTIVE;
>  qemu_co_mutex_lock(&s->lock);
> -ret = qcow2_do_open(bs, options, flags, &local_err);
> +ret = qcow2_do_open(bs, options, flags, errp);
>  qemu_co_mutex_unlock(&s->lock);
>  qobject_unref(options);
> -if (local_err) {
> -error_propagate_prepend(errp, local_err,
> -"Could not reopen qcow2 layer: ");
> -bs->drv = NULL;
> -return;
> -} else if (ret < 0) {
> -error_setg_errno(errp, -ret, "Could not reopen qcow2 layer");
> +if (ret < 0) {
> +error_prepend(errp, "Could not reopen qcow2 layer: ");
>  bs->drv = NULL;
>  return;
>  }




Re: [PATCH v2 09/13] block/qcow2-bitmap: improve qcow2_load_dirty_bitmaps() interface

2020-09-18 Thread Greg Kurz
On Thu, 17 Sep 2020 22:55:15 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> It's recommended for bool functions with errp to return true on success
> and false on failure. Non-standard interfaces don't help to understand
> the code. The change is also needed to reduce error propagation.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---

With the documentation change suggested by Berto,

Reviewed-by: Greg Kurz 
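
For readers skimming the thread, the convention referred to in the commit
message can be sketched as follows. This is a hypothetical stand-alone
example, not the qcow2 code: toy_load_bitmaps(), its parameters and the
use of a negative count to trigger a failure are all assumptions made for
illustration. The return value reports success only, and the optional
out-parameter carries the "header updated" hint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical loader following the convention: true on success, false
 * on failure. A negative nb_bitmaps is an artificial failure trigger.
 */
static bool toy_load_bitmaps(int nb_bitmaps, bool writable,
                             bool *header_updated, const char **errmsg)
{
    if (header_updated) {
        *header_updated = false;   /* defined on every path, even failure */
    }
    if (nb_bitmaps < 0) {
        if (errmsg) {
            *errmsg = "invalid bitmap count";
        }
        return false;
    }
    if (nb_bitmaps == 0) {
        return true;               /* nothing to do is still success */
    }
    if (writable && header_updated) {
        *header_updated = true;    /* pretend the header was rewritten */
    }
    return true;
}
```

With this shape a caller can tell "failed" apart from "succeeded without
touching the header", which a single bool return value cannot express.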

>  block/qcow2.h|  3 ++-
>  block/qcow2-bitmap.c | 25 ++---
>  block/qcow2.c|  6 ++
>  3 files changed, 18 insertions(+), 16 deletions(-)
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 6eac088f1c..3c64dcda33 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -972,7 +972,8 @@ void qcow2_cache_discard(Qcow2Cache *c, void *table);
>  int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>void **refcount_table,
>int64_t *refcount_table_size);
> -bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp);
> +bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, bool *header_updated,
> +  Error **errp);
>  bool qcow2_get_bitmap_info_list(BlockDriverState *bs,
>  Qcow2BitmapInfoList **info_list, Error 
> **errp);
>  int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp);
> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> index 4f6138f544..500175f4e8 100644
> --- a/block/qcow2-bitmap.c
> +++ b/block/qcow2-bitmap.c
> @@ -962,25 +962,26 @@ static void set_readonly_helper(gpointer bitmap, 
> gpointer value)
>  bdrv_dirty_bitmap_set_readonly(bitmap, (bool)value);
>  }
>  
> -/* qcow2_load_dirty_bitmaps()
> - * Return value is a hint for caller: true means that the Qcow2 header was
> - * updated. (false doesn't mean that the header should be updated by the
> - * caller, it just means that updating was not needed or the image cannot be
> - * written to).
> - * On failure the function returns false.
> +/*
> + * Return true on success, false on failure. Anyway, if header_updated
> + * provided set it appropriately.
>   */
> -bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp)
> +bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, bool *header_updated,
> +  Error **errp)
>  {
>  BDRVQcow2State *s = bs->opaque;
>  Qcow2BitmapList *bm_list;
>  Qcow2Bitmap *bm;
>  GSList *created_dirty_bitmaps = NULL;
> -bool header_updated = false;
>  bool needs_update = false;
>  
> +if (header_updated) {
> +*header_updated = false;
> +}
> +
>  if (s->nb_bitmaps == 0) {
>  /* No bitmaps - nothing to do */
> -return false;
> +return true;
>  }
>  
>  bm_list = bitmap_list_load(bs, s->bitmap_directory_offset,
> @@ -1036,7 +1037,9 @@ bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, 
> Error **errp)
>  error_setg_errno(errp, -ret, "Can't update bitmap directory");
>  goto fail;
>  }
> -header_updated = true;
> +if (header_updated) {
> +*header_updated = true;
> +}
>  }
>  
>  if (!can_write(bs)) {
> @@ -1047,7 +1050,7 @@ bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, 
> Error **errp)
>  g_slist_free(created_dirty_bitmaps);
>  bitmap_list_free(bm_list);
>  
> -return header_updated;
> +return true;
>  
>  fail:
>  g_slist_foreach(created_dirty_bitmaps, release_dirty_bitmap_helper, bs);
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 8c89c98978..c4b86df7c0 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1297,7 +1297,6 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
> *bs, QDict *options,
>  unsigned int len, i;
>  int ret = 0;
>  QCowHeader header;
> -Error *local_err = NULL;
>  uint64_t ext_end;
>  uint64_t l1_vm_state_index;
>  bool update_header = false;
> @@ -1785,9 +1784,8 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
> *bs, QDict *options,
>  
>  if (!(bdrv_get_flags(bs) & BDRV_O_INACTIVE)) {
>  /* It's case 1, 2 or 3.2. Or 3.1 which is BUG in management layer. */
> -bool header_updated = qcow2_load_dirty_bitmaps(bs, &local_err);
> -if (local_err != NULL) {
> -error_propagate(errp, local_err);
> +bool header_updated;
> +if (!qcow2_load_dirty_bitmaps(bs, &header_updated, errp)) {
>  ret = -EINVAL;
>  goto fail;
>  }
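
The interface change in this patch can be sketched in isolation as follows.
This is a toy, not the qcow2 code: toy_load_bitmaps() and its parameters are
made up for illustration, and the condition for updating the header is
simplified to "the image is writable".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy with the same contract as the new interface:
 * return true on success, false on failure; if header_updated is
 * provided, it is always set, even on the early-return paths.
 */
static bool toy_load_bitmaps(int nb_bitmaps, bool writable,
                             bool *header_updated)
{
    if (header_updated) {
        *header_updated = false;
    }

    if (nb_bitmaps < 0) {
        return false;           /* failure */
    }
    if (nb_bitmaps == 0) {
        return true;            /* nothing to do is still a success */
    }

    if (writable && header_updated) {
        *header_updated = true; /* simplified: pretend we rewrote it */
    }
    return true;
}
```

With the old interface a false return was ambiguous (failure, or success
without a header update); here the two pieces of information travel
separately and the caller can pass NULL when it does not care about the hint.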




Re: [PATCH 11/14] block/qcow2-bitmap: return startus from qcow2_store_persistent_dirty_bitmaps

2020-09-11 Thread Greg Kurz
On Fri, 11 Sep 2020 13:18:32 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> 11.09.2020 12:38, Greg Kurz wrote:
> > s/startus/status
> > 
> > On Wed,  9 Sep 2020 21:59:27 +0300
> > Vladimir Sementsov-Ogievskiy  wrote:
> > 
> >> It's better to return status together with setting errp. It makes
> >> possible to avoid error propagation.
> >>
> >> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> >> ---
> >>   block/qcow2.h|  2 +-
> >>   block/qcow2-bitmap.c | 13 ++---
> >>   2 files changed, 7 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/block/qcow2.h b/block/qcow2.h
> >> index e7e662533b..49824be5c6 100644
> >> --- a/block/qcow2.h
> >> +++ b/block/qcow2.h
> >> @@ -972,7 +972,7 @@ bool qcow2_get_bitmap_info_list(BlockDriverState *bs,
> >>   Qcow2BitmapInfoList **info_list, Error 
> >> **errp);
> >>   int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp);
> >>   int qcow2_truncate_bitmaps_check(BlockDriverState *bs, Error **errp);
> >> -void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
> >> +bool qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
> >> bool release_stored, Error 
> >> **errp);
> >>   int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp);
> >>   bool qcow2_co_can_store_new_dirty_bitmap(BlockDriverState *bs,
> >> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> >> index f58923fce3..5eeff1cb1c 100644
> >> --- a/block/qcow2-bitmap.c
> >> +++ b/block/qcow2-bitmap.c
> >> @@ -1524,9 +1524,10 @@ out:
> >>* readonly to begin with, and whether we opened directly or reopened to 
> >> that
> >>* state shouldn't matter for the state we get afterward.
> >>*/
> >> -void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
> >> +bool qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
> >> bool release_stored, Error 
> >> **errp)
> >>   {
> >> +ERRP_GUARD();
> > 
> > Maybe worth mentioning in the changelog that this ERRP_GUARD() fixes
> > an error_prepend(errp, ...) not visible in the patch context.
> 
> Ah yes. Actually this is occasional thing I didn't want to include into this
> patch (and into this part I). So we can just drop it and leave for part II or
> part III, or add a note into commit message
> 
> > 
> > Anyway,
> > 
> > Reviewed-by: Greg Kurz 
> 
> Thanks a lot for reviewing :)
> 

Don't mention it :)

> Hmm.. With this series I understand the following:
> 
> 1. There is no sense in simply applying scripts/coccinelle/errp-guard.cocci
> to the whole code-base, because:
> 
>    - it produces a lot of "if (*errp)" in places where it is really simple to
>      avoid error propagation at all, like in this series
>    - reviewing is the hardest part of the process
> 
> So, if we have to review these changes anyway, it's better to invest a bit
> more time into patch creation, and make the code correspond to our modern
> error API recommendations.
> 
> 2. So, the process turns into the following steps:
> 
>    - apply scripts/coccinelle/errp-guard.cocci
>    - look through patches and do obvious refactorings (like this series)
>    - keep ERRP_GUARD where necessary (appending info to error, or where
>      refactoring of function return status is too invasive and not simple)
> 

I've started to follow this process for the spapr code and, indeed, I
can come up with better changes by refactoring some code manually.
Some of these changes are not obvious enough to be made by someone
who doesn't know the code, so I tend to agree with your arguments
in 1.

This is also why I didn't review patches 10, 13 and 14: they looked
like they require a better understanding of the corresponding code
than I have.

> 3. Obviously, that's too much for me :) Of course, I will invest some time
> into making the series like this one, and reviewing them, but I can't do it
> for weeks and months. (My original cunning plan to simply push ~100
> generated commits with my s-o-b and become the greatest contributor
> failed:)
> 

Ha ha :D ... as a consolation prize, maybe we can reach a fair number
of r-b by reviewing each other's _simple_ cleanups ;-)

> The good thing is that now, with ERRP_GUARD finally merged, we can produce 
> parallel series like this, and they will be processed in parallel by 
> different maintainers (and Markus will have to merge series for subsystems 

Re: [PATCH 12/14] block/qcow2: read_cache_sizes: return status value

2020-09-11 Thread Greg Kurz
On Wed,  9 Sep 2020 21:59:28 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> It's better to return status together with setting errp. It allows to
> reduce error propagation.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---

Reviewed-by: Greg Kurz 

>  block/qcow2.c | 19 +++++++++----------
>  1 file changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index c2cd9434cc..31dd28d19e 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -869,7 +869,7 @@ static void qcow2_attach_aio_context(BlockDriverState *bs,
>  cache_clean_timer_init(bs, new_context);
>  }
>  
> -static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
> +static bool read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
>   uint64_t *l2_cache_size,
>   uint64_t *l2_cache_entry_size,
>   uint64_t *refcount_cache_size, Error **errp)
> @@ -907,16 +907,16 @@ static void read_cache_sizes(BlockDriverState *bs, 
> QemuOpts *opts,
>  error_setg(errp, QCOW2_OPT_CACHE_SIZE ", " 
> QCOW2_OPT_L2_CACHE_SIZE
> " and " QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not be 
> set "
> "at the same time");
> -return;
> +return false;
>  } else if (l2_cache_size_set &&
> (l2_cache_max_setting > combined_cache_size)) {
>  error_setg(errp, QCOW2_OPT_L2_CACHE_SIZE " may not exceed "
> QCOW2_OPT_CACHE_SIZE);
> -return;
> +return false;
>  } else if (*refcount_cache_size > combined_cache_size) {
>  error_setg(errp, QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not exceed "
> QCOW2_OPT_CACHE_SIZE);
> -return;
> +return false;
>  }
>  
>  if (l2_cache_size_set) {
> @@ -955,8 +955,10 @@ static void read_cache_sizes(BlockDriverState *bs, 
> QemuOpts *opts,
>  error_setg(errp, "L2 cache entry size must be a power of two "
> "between %d and the cluster size (%d)",
> 1 << MIN_CLUSTER_BITS, s->cluster_size);
> -return;
> +return false;
>  }
> +
> +return true;
>  }
>  
>  typedef struct Qcow2ReopenState {
> @@ -983,7 +985,6 @@ static int qcow2_update_options_prepare(BlockDriverState 
> *bs,
>  int i;
>  const char *encryptfmt;
>  QDict *encryptopts = NULL;
> -Error *local_err = NULL;
>  int ret;
>  
>  qdict_extract_subqdict(options, &encryptopts, "encrypt.");
> @@ -996,10 +997,8 @@ static int qcow2_update_options_prepare(BlockDriverState 
> *bs,
>  }
>  
>  /* get L2 table/refcount block cache size from command line options */
> -read_cache_sizes(bs, opts, &l2_cache_size, &l2_cache_entry_size,
> - &refcount_cache_size, &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> +if (!read_cache_sizes(bs, opts, &l2_cache_size, &l2_cache_entry_size,
> +  &refcount_cache_size, errp)) {
>  ret = -EINVAL;
>  goto fail;
>  }




Re: [PATCH 11/14] block/qcow2-bitmap: return startus from qcow2_store_persistent_dirty_bitmaps

2020-09-11 Thread Greg Kurz
s/startus/status

On Wed,  9 Sep 2020 21:59:27 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> It's better to return status together with setting errp. It makes
> possible to avoid error propagation.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/qcow2.h|  2 +-
>  block/qcow2-bitmap.c | 13 ++---
>  2 files changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index e7e662533b..49824be5c6 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -972,7 +972,7 @@ bool qcow2_get_bitmap_info_list(BlockDriverState *bs,
>  Qcow2BitmapInfoList **info_list, Error 
> **errp);
>  int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp);
>  int qcow2_truncate_bitmaps_check(BlockDriverState *bs, Error **errp);
> -void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
> +bool qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
>bool release_stored, Error **errp);
>  int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp);
>  bool qcow2_co_can_store_new_dirty_bitmap(BlockDriverState *bs,
> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> index f58923fce3..5eeff1cb1c 100644
> --- a/block/qcow2-bitmap.c
> +++ b/block/qcow2-bitmap.c
> @@ -1524,9 +1524,10 @@ out:
>   * readonly to begin with, and whether we opened directly or reopened to that
>   * state shouldn't matter for the state we get afterward.
>   */
> -void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
> +bool qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
>bool release_stored, Error **errp)
>  {
> +ERRP_GUARD();

Maybe worth mentioning in the changelog that this ERRP_GUARD() fixes
an error_prepend(errp, ...) not visible in the patch context.

Anyway,

Reviewed-by: Greg Kurz 

>  BdrvDirtyBitmap *bitmap;
>  BDRVQcow2State *s = bs->opaque;
>  uint32_t new_nb_bitmaps = s->nb_bitmaps;
> @@ -1546,7 +1547,7 @@ void 
> qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
>  bm_list = bitmap_list_load(bs, s->bitmap_directory_offset,
> s->bitmap_directory_size, errp);
>  if (bm_list == NULL) {
> -return;
> +return false;
>  }
>  }
>  
> @@ -1661,7 +1662,7 @@ success:
>  }
>  
>  bitmap_list_free(bm_list);
> -return;
> +return true;
>  
>  fail:
>  QSIMPLEQ_FOREACH(bm, bm_list, entry) {
> @@ -1679,16 +1680,14 @@ fail:
>  }
>  
>  bitmap_list_free(bm_list);
> +return false;
>  }
>  
>  int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp)
>  {
>  BdrvDirtyBitmap *bitmap;
> -Error *local_err = NULL;
>  
> -qcow2_store_persistent_dirty_bitmaps(bs, false, &local_err);
> -if (local_err != NULL) {
> -error_propagate(errp, local_err);
> +if (!qcow2_store_persistent_dirty_bitmaps(bs, false, errp)) {
>  return -EINVAL;
>  }
>  




Re: [PATCH 09/14] block/qcow2: qcow2_get_specific_info(): drop error propagation

2020-09-11 Thread Greg Kurz
On Wed,  9 Sep 2020 21:59:25 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> Don't use error propagation in qcow2_get_specific_info(). For this
> refactor qcow2_get_bitmap_info_list, its current interface rather

... interface is rather

> weird.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---

Reviewed-by: Greg Kurz 

>  block/qcow2.h|  4 ++--
>  block/qcow2-bitmap.c | 27 +--
>  block/qcow2.c| 10 +++---
>  3 files changed, 18 insertions(+), 23 deletions(-)
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 065ec3df0b..ac6a2d3e2a 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -967,8 +967,8 @@ int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, 
> BdrvCheckResult *res,
>void **refcount_table,
>int64_t *refcount_table_size);
>  bool qcow2_load_dirty_bitmaps(BlockDriverState *bs, Error **errp);
> -Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
> -Error **errp);
> +bool qcow2_get_bitmap_info_list(BlockDriverState *bs,
> +Qcow2BitmapInfoList **info_list, Error 
> **errp);
>  int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp);
>  int qcow2_truncate_bitmaps_check(BlockDriverState *bs, Error **errp);
>  void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> index 8c34b2aef7..9b14c0791f 100644
> --- a/block/qcow2-bitmap.c
> +++ b/block/qcow2-bitmap.c
> @@ -1090,30 +1090,29 @@ static Qcow2BitmapInfoFlagsList 
> *get_bitmap_info_flags(uint32_t flags)
>  /*
>   * qcow2_get_bitmap_info_list()
>   * Returns a list of QCOW2 bitmap details.
> - * In case of no bitmaps, the function returns NULL and
> - * the @errp parameter is not set.
> - * When bitmap information can not be obtained, the function returns
> - * NULL and the @errp parameter is set.
> + * On success return true with bm_list set (probably to NULL, if no bitmaps),
> + * on failure return false with errp set.
>   */
> -Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
> -Error **errp)
> +bool qcow2_get_bitmap_info_list(BlockDriverState *bs,
> +Qcow2BitmapInfoList **info_list, Error 
> **errp)
>  {
>  BDRVQcow2State *s = bs->opaque;
>  Qcow2BitmapList *bm_list;
>  Qcow2Bitmap *bm;
> -Qcow2BitmapInfoList *list = NULL;
> -Qcow2BitmapInfoList **plist = &list;
>  
>  if (s->nb_bitmaps == 0) {
> -return NULL;
> +*info_list = NULL;
> +return true;
>  }
>  
>  bm_list = bitmap_list_load(bs, s->bitmap_directory_offset,
> s->bitmap_directory_size, errp);
> -if (bm_list == NULL) {
> -return NULL;
> +if (!bm_list) {
> +return false;
>  }
>  
> +*info_list = NULL;
> +
>  QSIMPLEQ_FOREACH(bm, bm_list, entry) {
>  Qcow2BitmapInfo *info = g_new0(Qcow2BitmapInfo, 1);
>  Qcow2BitmapInfoList *obj = g_new0(Qcow2BitmapInfoList, 1);
> @@ -1121,13 +1120,13 @@ Qcow2BitmapInfoList 
> *qcow2_get_bitmap_info_list(BlockDriverState *bs,
>  info->name = g_strdup(bm->name);
>  info->flags = get_bitmap_info_flags(bm->flags & ~BME_RESERVED_FLAGS);
>  obj->value = info;
> -*plist = obj;
> -plist = &obj->next;
> +*info_list = obj;
> +info_list = &obj->next;
>  }
>  
>  bitmap_list_free(bm_list);
>  
> -return list;
> +return true;
>  }
>  
>  int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp)
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 10175fa399..eb7c82120c 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -5056,12 +5056,10 @@ static ImageInfoSpecific 
> *qcow2_get_specific_info(BlockDriverState *bs,
>  BDRVQcow2State *s = bs->opaque;
>  ImageInfoSpecific *spec_info;
>  QCryptoBlockInfo *encrypt_info = NULL;
> -Error *local_err = NULL;
>  
>  if (s->crypto != NULL) {
> -encrypt_info = qcrypto_block_get_info(s->crypto, &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> +encrypt_info = qcrypto_block_get_info(s->crypto, errp);
> +if (!encrypt_info) {
>  return NULL;
>  }
>  }
> @@ -5078,9 +5076,7 @@ static ImageInfoSpecific 
> *qcow2_get_specific_info(BlockDriverState *bs,
>  };
>  } else if (s->qcow_version == 3) {
>  Qcow2BitmapInfoList *bitmaps;
> -bitmaps = qcow2_get_bitmap_info_list(bs, &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> +if (!qcow2_get_bitmap_info_list(bs, &bitmaps, errp)) {
>  qapi_free_ImageInfoSpecific(spec_info);
>  qapi_free_QCryptoBlockInfo(encrypt_info);
>  return NULL;
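
The "*info_list = obj; info_list = &obj->next;" idiom in the bitmap code above
may look odd at first. A standalone sketch, with a made-up ToyNode type and
toy_* helpers that are not part of QEMU, shows how the out-parameter doubles
as a tail pointer while the function returns a plain success status:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical singly linked list node, for illustration only. */
typedef struct ToyNode {
    int value;
    struct ToyNode *next;
} ToyNode;

/*
 * On success return true with *list set (to NULL when n == 0).
 * The list parameter is advanced to always point at the slot where
 * the next node must be stored, so appending is O(1) per element.
 */
static bool toy_build_list(const int *values, size_t n, ToyNode **list)
{
    *list = NULL;

    for (size_t i = 0; i < n; i++) {
        ToyNode *obj = calloc(1, sizeof(*obj));

        if (!obj) {
            abort();            /* mimic g_new0(), which never returns NULL */
        }
        obj->value = values[i];
        *list = obj;            /* fill the current tail slot */
        list = &obj->next;      /* the new tail slot is obj->next */
    }
    return true;
}

static void toy_free_list(ToyNode *node)
{
    while (node) {
        ToyNode *next = node->next;

        free(node);
        node = next;
    }
}
```

The caller's own pointer still refers to the head afterwards, because only
the callee's copy of the parameter is advanced.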




Re: [PATCH 08/14] blockjob: return status from block_job_set_speed()

2020-09-11 Thread Greg Kurz
On Wed,  9 Sep 2020 21:59:24 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> Better to return status together with setting errp. It allows to avoid
> error propagation in the caller.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---

Reviewed-by: Greg Kurz 

>  include/block/blockjob.h |  2 +-
>  blockjob.c   | 18 --
>  2 files changed, 9 insertions(+), 11 deletions(-)
> 
> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> index 35faa3aa26..d200f33c10 100644
> --- a/include/block/blockjob.h
> +++ b/include/block/blockjob.h
> @@ -139,7 +139,7 @@ bool block_job_has_bdrv(BlockJob *job, BlockDriverState 
> *bs);
>   * Set a rate-limiting parameter for the job; the actual meaning may
>   * vary depending on the job type.
>   */
> -void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp);
> +bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp);
>  
>  /**
>   * block_job_query:
> diff --git a/blockjob.c b/blockjob.c
> index 470facfd47..afddf7a1fb 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -254,28 +254,30 @@ static bool job_timer_pending(Job *job)
>  return timer_pending(&job->sleep_timer);
>  }
>  
> -void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
> +bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
>  {
>  int64_t old_speed = job->speed;
>  
> -if (job_apply_verb(&job->job, JOB_VERB_SET_SPEED, errp)) {
> -return;
> +if (job_apply_verb(&job->job, JOB_VERB_SET_SPEED, errp) < 0) {
> +return false;
>  }
>  if (speed < 0) {
>  error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "speed",
> "a non-negative value");
> -return;
> +return false;
>  }
>  
>  ratelimit_set_speed(&job->limit, speed, BLOCK_JOB_SLICE_TIME);
>  
>  job->speed = speed;
>  if (speed && speed <= old_speed) {
> -return;
> +return true;
>  }
>  
>  /* kick only if a timer is pending */
>  job_enter_cond(&job->job, job_timer_pending);
> +
> +return true;
>  }
>  
>  int64_t block_job_ratelimit_get_delay(BlockJob *job, uint64_t n)
> @@ -448,12 +450,8 @@ void *block_job_create(const char *job_id, const 
> BlockJobDriver *driver,
>  
>  /* Only set speed when necessary to avoid NotSupported error */
>  if (speed != 0) {
> -Error *local_err = NULL;
> -
> -block_job_set_speed(job, speed, &local_err);
> -if (local_err) {
> +if (!block_job_set_speed(job, speed, errp)) {
>  job_early_fail(&job->job);
> -error_propagate(errp, local_err);
>  return NULL;
>  }
>  }




Re: [PATCH 07/14] block/blklogwrites: drop error propagation

2020-09-11 Thread Greg Kurz
On Fri, 11 Sep 2020 07:30:37 +0200
Markus Armbruster  wrote:

> Markus Armbruster  writes:
> 
> > Greg Kurz  writes:
> >
> >> On Wed,  9 Sep 2020 21:59:23 +0300
> >> Vladimir Sementsov-Ogievskiy  wrote:
> >>
> >>> It's simple to avoid error propagation in blk_log_writes_open(), we
> >>> just need to refactor blk_log_writes_find_cur_log_sector() a bit.
> >>> 
> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> >>> ---
> >>>  block/blklogwrites.c | 23 +++
> >>>  1 file changed, 11 insertions(+), 12 deletions(-)
> >>> 
> >>> diff --git a/block/blklogwrites.c b/block/blklogwrites.c
> >>> index 7ef046cee9..c7da507b2d 100644
> >>> --- a/block/blklogwrites.c
> >>> +++ b/block/blklogwrites.c
> >>> @@ -96,10 +96,10 @@ static inline bool 
> >>> blk_log_writes_sector_size_valid(uint32_t sector_size)
> >>>  sector_size < (1ull << 24);
> >>>  }
> >>>  
> >>> -static uint64_t blk_log_writes_find_cur_log_sector(BdrvChild *log,
> >>> -   uint32_t sector_size,
> >>> -   uint64_t nr_entries,
> >>> -   Error **errp)
> >>> +static int64_t blk_log_writes_find_cur_log_sector(BdrvChild *log,
> >>> +  uint32_t sector_size,
> >>> +  uint64_t nr_entries,
> >>> +  Error **errp)
> >>>  {
> >>>  uint64_t cur_sector = 1;
> >>>  uint64_t cur_idx = 0;
> >>> @@ -112,13 +112,13 @@ static uint64_t 
> >>> blk_log_writes_find_cur_log_sector(BdrvChild *log,
> >>>  if (read_ret < 0) {
> >>>  error_setg_errno(errp, -read_ret,
> >>>   "Failed to read log entry %"PRIu64, 
> >>> cur_idx);
> >>> -return (uint64_t)-1ull;
> >>> +return read_ret;
> >>
> >> This changes the error status of blk_log_writes_open() from -EINVAL to
> >> whatever is returned by bdrv_pread(). I guess this is an improvement
> >> worth to be mentioned in the changelog.
> >>
> >>>  }
> >>>  
> >>>  if (cur_entry.flags & ~cpu_to_le64(LOG_FLAG_MASK)) {
> >>>  error_setg(errp, "Invalid flags 0x%"PRIx64" in log entry 
> >>> %"PRIu64,
> >>> le64_to_cpu(cur_entry.flags), cur_idx);
> >>> -return (uint64_t)-1ull;
> >>> +return -EINVAL;
> >>>  }
> >>>  
> >>>  /* Account for the sector of the entry itself */
> >>> @@ -143,7 +143,6 @@ static int blk_log_writes_open(BlockDriverState *bs, 
> >>> QDict *options, int flags,
> >>>  {
> >>>  BDRVBlkLogWritesState *s = bs->opaque;
> >>>  QemuOpts *opts;
> >>> -Error *local_err = NULL;
> >>>  int ret;
> >>>  uint64_t log_sector_size;
> >>>  bool log_append;
> >>> @@ -215,15 +214,15 @@ static int blk_log_writes_open(BlockDriverState 
> >>> *bs, QDict *options, int flags,
> >>>  s->nr_entries = 0;
> >>>  
> >>>  if (blk_log_writes_sector_size_valid(log_sector_size)) {
> >>> -s->cur_log_sector =
> >>> +int64_t cur_log_sector =
> >>>  blk_log_writes_find_cur_log_sector(s->log_file, 
> >>> log_sector_size,
> >>> -le64_to_cpu(log_sb.nr_entries), 
> >>> &local_err);
> >>> -if (local_err) {
> >>> -ret = -EINVAL;
> >>> -error_propagate(errp, local_err);
> >>> +le64_to_cpu(log_sb.nr_entries), 
> >>> errp);
> >>> +if (cur_log_sector < 0) {
> >>> +ret = cur_log_sector;
> >>
> >> This is converting int64_t to int, which is usually not recommended.
> >> In practice this is probably okay because cur_log_sector is supposed
> >> to be a negative errno (ie. an int) in this case.
> >
> > It isn't: bl

Re: [PATCH 07/14] block/blklogwrites: drop error propagation

2020-09-10 Thread Greg Kurz
On Wed,  9 Sep 2020 21:59:23 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> It's simple to avoid error propagation in blk_log_writes_open(), we
> just need to refactor blk_log_writes_find_cur_log_sector() a bit.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/blklogwrites.c | 23 +++
>  1 file changed, 11 insertions(+), 12 deletions(-)
> 
> diff --git a/block/blklogwrites.c b/block/blklogwrites.c
> index 7ef046cee9..c7da507b2d 100644
> --- a/block/blklogwrites.c
> +++ b/block/blklogwrites.c
> @@ -96,10 +96,10 @@ static inline bool 
> blk_log_writes_sector_size_valid(uint32_t sector_size)
>  sector_size < (1ull << 24);
>  }
>  
> -static uint64_t blk_log_writes_find_cur_log_sector(BdrvChild *log,
> -   uint32_t sector_size,
> -   uint64_t nr_entries,
> -   Error **errp)
> +static int64_t blk_log_writes_find_cur_log_sector(BdrvChild *log,
> +  uint32_t sector_size,
> +  uint64_t nr_entries,
> +  Error **errp)
>  {
>  uint64_t cur_sector = 1;
>  uint64_t cur_idx = 0;
> @@ -112,13 +112,13 @@ static uint64_t 
> blk_log_writes_find_cur_log_sector(BdrvChild *log,
>  if (read_ret < 0) {
>  error_setg_errno(errp, -read_ret,
>   "Failed to read log entry %"PRIu64, cur_idx);
> -return (uint64_t)-1ull;
> +return read_ret;

This changes the error status of blk_log_writes_open() from -EINVAL to
whatever is returned by bdrv_pread(). I guess this is an improvement
worth mentioning in the changelog.

>  }
>  
>  if (cur_entry.flags & ~cpu_to_le64(LOG_FLAG_MASK)) {
>  error_setg(errp, "Invalid flags 0x%"PRIx64" in log entry 
> %"PRIu64,
> le64_to_cpu(cur_entry.flags), cur_idx);
> -return (uint64_t)-1ull;
> +return -EINVAL;
>  }
>  
>  /* Account for the sector of the entry itself */
> @@ -143,7 +143,6 @@ static int blk_log_writes_open(BlockDriverState *bs, 
> QDict *options, int flags,
>  {
>  BDRVBlkLogWritesState *s = bs->opaque;
>  QemuOpts *opts;
> -Error *local_err = NULL;
>  int ret;
>  uint64_t log_sector_size;
>  bool log_append;
> @@ -215,15 +214,15 @@ static int blk_log_writes_open(BlockDriverState *bs, 
> QDict *options, int flags,
>  s->nr_entries = 0;
>  
>  if (blk_log_writes_sector_size_valid(log_sector_size)) {
> -s->cur_log_sector =
> +int64_t cur_log_sector =
>  blk_log_writes_find_cur_log_sector(s->log_file, 
> log_sector_size,
> -le64_to_cpu(log_sb.nr_entries), 
> &local_err);
> -if (local_err) {
> -ret = -EINVAL;
> -error_propagate(errp, local_err);
> +le64_to_cpu(log_sb.nr_entries), errp);
> +if (cur_log_sector < 0) {
> +ret = cur_log_sector;

This is converting int64_t to int, which is usually not recommended.
In practice this is probably okay because cur_log_sector is supposed
to be a negative errno (ie. an int) in this case.

Maybe make this clear with an assert(cur_log_sector >= INT_MIN) ?

>  goto fail_log;
>  }
>  
> +s->cur_log_sector = cur_log_sector;
>  s->nr_entries = le64_to_cpu(log_sb.nr_entries);
>  }
>  } else {
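
To make the int64_t-to-int remark above concrete, here is a small standalone
sketch. The helper name toy_status_from_sector() is invented for the example;
only the assert() mirrors the suggestion made in the review.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: a callee returned either a non-negative sector
 * number or a negative errno in an int64_t. Split that into an int
 * status and a uint64_t sector.
 */
static int toy_status_from_sector(int64_t cur_log_sector,
                                  uint64_t *sector_out)
{
    if (cur_log_sector < 0) {
        /*
         * A negative errno always fits in an int; the assert documents
         * that assumption before the narrowing conversion.
         */
        assert(cur_log_sector >= INT_MIN);
        return (int)cur_log_sector;
    }

    *sector_out = (uint64_t)cur_log_sector;
    return 0;
}
```

The assert costs nothing in practice and tells the next reader that the
narrowing is intentional.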




Re: [PATCH 06/14] block/mirror: drop extra error propagation in commit_active_start()

2020-09-10 Thread Greg Kurz
On Wed,  9 Sep 2020 21:59:22 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> Let's check return value of mirror_start_job to check for failure
> instead of local_err.
> 
> Rename ret to job, as ret is usually integer variable.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---

Reviewed-by: Greg Kurz 

>  block/mirror.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index ca250f765d..529ba12b2a 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1788,8 +1788,7 @@ BlockJob *commit_active_start(const char *job_id, 
> BlockDriverState *bs,
>bool auto_complete, Error **errp)
>  {
>  bool base_read_only;
> -Error *local_err = NULL;
> -BlockJob *ret;
> +BlockJob *job;
>  
>  base_read_only = bdrv_is_read_only(base);
>  
> @@ -1799,19 +1798,18 @@ BlockJob *commit_active_start(const char *job_id, 
> BlockDriverState *bs,
>  }
>  }
>  
> -ret = mirror_start_job(
> +job = mirror_start_job(
>   job_id, bs, creation_flags, base, NULL, speed, 0, 0,
>   MIRROR_LEAVE_BACKING_CHAIN, false,
>   on_error, on_error, true, cb, opaque,
>   &commit_active_job_driver, false, base, auto_complete,
>   filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND,
> - &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> + errp);
> +if (!job) {
>  goto error_restore_flags;
>  }
>  
> -return ret;
> +return job;
>  
>  error_restore_flags:
>  /* ignore error and errp for bdrv_reopen, because we want to propagate




Re: [PATCH 05/14] block: drop extra error propagation for bdrv_set_backing_hd

2020-09-10 Thread Greg Kurz
On Wed,  9 Sep 2020 21:59:21 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> bdrv_set_backing_hd now returns status, let's use it.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---

Reviewed-by: Greg Kurz 

>  block.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 9b624b2535..c5e3a1927e 100644
> --- a/block.c
> +++ b/block.c
> @@ -3011,11 +3011,9 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
> *parent_options,
>  
>  /* Hook up the backing file link; drop our reference, bs owns the
>   * backing_hd reference now */
> -bdrv_set_backing_hd(bs, backing_hd, &local_err);
> +ret = bdrv_set_backing_hd(bs, backing_hd, errp);
>  bdrv_unref(backing_hd);
> -if (local_err) {
> -error_propagate(errp, local_err);
> -ret = -EINVAL;
> +if (ret < 0) {
>  goto free_exit;
>  }
>  




Re: [PATCH 04/14] blockdev: fix drive_backup_prepare() missed error

2020-09-10 Thread Greg Kurz
On Wed,  9 Sep 2020 21:59:20 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> We leak local_err and don't report failure to the caller. It's
> definitely wrong, let's fix.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---

Reviewed-by: Greg Kurz 

>  blockdev.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 36bef6b188..74259527c1 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1797,8 +1797,7 @@ static void drive_backup_prepare(BlkActionState 
> *common, Error **errp)
>  aio_context_acquire(aio_context);
>  
>  if (set_backing_hd) {
> -bdrv_set_backing_hd(target_bs, source, &local_err);
> -if (local_err) {
> +if (bdrv_set_backing_hd(target_bs, source, errp) < 0) {
>  goto unref;
>  }
>  }
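
The bug fixed by this patch is easy to miss in the diff, so here is a toy
reproduction. ToyError and the toy_* functions are hypothetical and only
model the control flow; they are not the blockdev code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for QEMU's Error object; NOT the real API. */
typedef struct ToyError {
    const char *msg;
} ToyError;

/* Callee: returns a negative value and sets *errp on failure. */
static int toy_set_backing(int fail, ToyError **errp)
{
    if (!fail) {
        return 0;
    }
    if (errp) {
        *errp = malloc(sizeof(**errp));
        assert(*errp);
        (*errp)->msg = "cannot set backing file";
    }
    return -1;
}

/* Old shape: the error lands in local_err and never reaches the caller. */
static void toy_prepare_old(int fail, ToyError **errp)
{
    ToyError *local_err = NULL;

    (void)errp;                 /* never written to: this is the bug */
    toy_set_backing(fail, &local_err);
    if (local_err) {
        free(local_err);        /* the toy frees it; the real code leaked it */
        return;
    }
}

/* New shape: pass errp straight through and test the return value. */
static void toy_prepare_new(int fail, ToyError **errp)
{
    if (toy_set_backing(fail, errp) < 0) {
        return;
    }
}
```

With the old shape the caller sees no error at all, which is the "don't
report failure to the caller" part of the commit message.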




Re: [PATCH 03/14] block: check return value of bdrv_open_child and drop error propagation

2020-09-10 Thread Greg Kurz
On Wed,  9 Sep 2020 21:59:19 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> This patch is generated by cocci script:
> 
> @@
> symbol bdrv_open_child, errp, local_err;
> expression file;
> @@
> 
>   file = bdrv_open_child(...,
> -&local_err
> +errp
> );
> - if (local_err)
> + if (!file)
>   {
>   ...
> - error_propagate(errp, local_err);
>   ...
>   }
> 
> with command
> 
> spatch --sp-file x.cocci --macro-file scripts/cocci-macro-file.h \
> --in-place --no-show-diff --max-width 80 --use-gitgrep block
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---

Reviewed-by: Greg Kurz 

>  block/blkdebug.c |  6 ++
>  block/blklogwrites.c | 10 --
>  block/blkreplay.c|  6 ++
>  block/blkverify.c| 11 ---
>  block/qcow2.c|  5 ++---
>  block/quorum.c   |  6 ++
>  6 files changed, 16 insertions(+), 28 deletions(-)
> 
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index 9c08d8a005..61aaee9879 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -464,7 +464,6 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  {
>  BDRVBlkdebugState *s = bs->opaque;
>  QemuOpts *opts;
> -Error *local_err = NULL;
>  int ret;
>  uint64_t align;
>  
> @@ -494,10 +493,9 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  bs->file = bdrv_open_child(qemu_opt_get(opts, "x-image"), options, 
> "image",
> bs, &child_of_bds,
> BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
> -   false, &local_err);
> -if (local_err) {
> +   false, errp);
> +if (!bs->file) {
>  ret = -EINVAL;
> -error_propagate(errp, local_err);
>  goto out;
>  }
>  
> diff --git a/block/blklogwrites.c b/block/blklogwrites.c
> index 57315f56b4..7ef046cee9 100644
> --- a/block/blklogwrites.c
> +++ b/block/blklogwrites.c
> @@ -157,19 +157,17 @@ static int blk_log_writes_open(BlockDriverState *bs, 
> QDict *options, int flags,
>  /* Open the file */
>  bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
> BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY, 
> false,
> -   &local_err);
> -if (local_err) {
> +   errp);
> +if (!bs->file) {
>  ret = -EINVAL;
> -error_propagate(errp, local_err);
>  goto fail;
>  }
>  
>  /* Open the log file */
>  s->log_file = bdrv_open_child(NULL, options, "log", bs, &child_of_bds,
> -  BDRV_CHILD_METADATA, false, &local_err);
> -if (local_err) {
> +  BDRV_CHILD_METADATA, false, errp);
> +if (!s->log_file) {
>  ret = -EINVAL;
> -error_propagate(errp, local_err);
>  goto fail;
>  }
>  
> diff --git a/block/blkreplay.c b/block/blkreplay.c
> index 30a0f5d57a..4a247752fd 100644
> --- a/block/blkreplay.c
> +++ b/block/blkreplay.c
> @@ -23,16 +23,14 @@ typedef struct Request {
>  static int blkreplay_open(BlockDriverState *bs, QDict *options, int flags,
>Error **errp)
>  {
> -Error *local_err = NULL;
>  int ret;
>  
>  /* Open the image file */
>  bs->file = bdrv_open_child(NULL, options, "image", bs, &child_of_bds,
> BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
> -   false, &local_err);
> -if (local_err) {
> +   false, errp);
> +if (!bs->file) {
>  ret = -EINVAL;
> -error_propagate(errp, local_err);
>  goto fail;
>  }
>  
> diff --git a/block/blkverify.c b/block/blkverify.c
> index 4aed53ab59..95ae73e2aa 100644
> --- a/block/blkverify.c
> +++ b/block/blkverify.c
> @@ -112,7 +112,6 @@ static int blkverify_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  {
>  BDRVBlkverifyState *s = bs->opaque;
>  QemuOpts *opts;
> -Error *local_err = NULL;
>  int ret;
>  
>  opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
> @@ -125,20 +124,18 @@ static int blkverify_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  bs->file = bdrv_open_child(qemu_opt_get(opts, "x-raw"), options, "raw",
> bs, &child_of_bds,
> BDRV_CHILD_FILTERED | BDRV_C

Re: [PATCH 02/14] block: use return status of bdrv_append()

2020-09-10 Thread Greg Kurz
On Wed,  9 Sep 2020 21:59:18 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> Now bdrv_append returns status and we can drop all the local_err things
> around it.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---

Reviewed-by: Greg Kurz 

Just one suggestion for a follow-up below...

>  block.c |  5 +
>  block/backup-top.c  | 20 
>  block/commit.c  |  5 +
>  block/mirror.c  |  6 ++
>  blockdev.c  |  4 +---
>  tests/test-bdrv-graph-mod.c |  6 +++---
>  6 files changed, 16 insertions(+), 30 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 6d35449027..9b624b2535 100644
> --- a/block.c
> +++ b/block.c
> @@ -3156,7 +3156,6 @@ static BlockDriverState 
> *bdrv_append_temp_snapshot(BlockDriverState *bs,
>  int64_t total_size;
>  QemuOpts *opts = NULL;
>  BlockDriverState *bs_snapshot = NULL;
> -Error *local_err = NULL;
>  int ret;
>  
>  /* if snapshot, we create a temporary backing file and open it
> @@ -3203,9 +3202,7 @@ static BlockDriverState 
> *bdrv_append_temp_snapshot(BlockDriverState *bs,
>   * order to be able to return one, we have to increase
>   * bs_snapshot's refcount here */
>  bdrv_ref(bs_snapshot);
> -bdrv_append(bs_snapshot, bs, &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> +if (bdrv_append(bs_snapshot, bs, errp) < 0) {
>  bs_snapshot = NULL;
>  goto out;
>  }
> diff --git a/block/backup-top.c b/block/backup-top.c
> index af2f20f346..de9d5e1634 100644
> --- a/block/backup-top.c
> +++ b/block/backup-top.c
> @@ -192,7 +192,7 @@ BlockDriverState *bdrv_backup_top_append(BlockDriverState 
> *source,
>   BlockCopyState **bcs,
>   Error **errp)
>  {
> -Error *local_err = NULL;
> +ERRP_GUARD();
>  BDRVBackupTopState *state;
>  BlockDriverState *top;
>  bool appended = false;
> @@ -225,9 +225,8 @@ BlockDriverState *bdrv_backup_top_append(BlockDriverState 
> *source,
>  bdrv_drained_begin(source);
>  
>  bdrv_ref(top);
> -bdrv_append(top, source, &local_err);
> -if (local_err) {
> -error_prepend(&local_err, "Cannot append backup-top filter: ");
> +if (bdrv_append(top, source, errp) < 0) {
> +error_prepend(errp, "Cannot append backup-top filter: ");
>  goto fail;
>  }
>  appended = true;
> @@ -237,18 +236,16 @@ BlockDriverState 
> *bdrv_backup_top_append(BlockDriverState *source,
>   * we want.
>   */
>  state->active = true;
> -bdrv_child_refresh_perms(top, top->backing, &local_err);
> -if (local_err) {
> -error_prepend(&local_err,
> -  "Cannot set permissions for backup-top filter: ");
> +if (bdrv_child_refresh_perms(top, top->backing, errp) < 0) {
> +error_prepend(errp, "Cannot set permissions for backup-top filter: 
> ");
>  goto fail;
>  }
>  
>  state->cluster_size = cluster_size;
>  state->bcs = block_copy_state_new(top->backing, state->target,
> -  cluster_size, write_flags, &local_err);
> -if (local_err) {
> -error_prepend(&local_err, "Cannot create block-copy-state: ");
> +  cluster_size, write_flags, errp);
> +if (!state->bcs) {
> +error_prepend(errp, "Cannot create block-copy-state: ");
>  goto fail;
>  }
>  *bcs = state->bcs;
> @@ -266,7 +263,6 @@ fail:
>  }
>  
>  bdrv_drained_end(source);
> -error_propagate(errp, local_err);
>  
>  return NULL;
>  }
> diff --git a/block/commit.c b/block/commit.c
> index 7732d02dfe..7720d4729b 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -253,7 +253,6 @@ void commit_start(const char *job_id, BlockDriverState 
> *bs,
>  CommitBlockJob *s;
>  BlockDriverState *iter;
>  BlockDriverState *commit_top_bs = NULL;
> -Error *local_err = NULL;
>  int ret;
>  

... this is unrelated but while reviewing I've noticed that the ret
variable isn't really needed.

>  assert(top != bs);
> @@ -292,10 +291,8 @@ void commit_start(const char *job_id, BlockDriverState 
> *bs,
>  
>  commit_top_bs->total_sectors = top->total_sectors;
>  
> -bdrv_append(commit_top_bs, top, &local_err);
> -if (local_err) {
> +if (bdrv_append(commit_top_bs, top, errp) < 0) {
>  commit_top_bs = NULL;
> -error_propagate(errp, local_err);
>  got

Re: [PATCH 01/14] block: return status from bdrv_append and friends

2020-09-10 Thread Greg Kurz
On Wed,  9 Sep 2020 21:59:17 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> The recommended use of qemu error api assumes returning status together
> with setting errp and avoid void functions with errp parameter. Let's
> improve bdrv_append and some friends to reduce error-propagation
> overhead in further patches.
> 
> Choose int return status, because bdrv_replace_node() has call to
> bdrv_check_update_perm(), which reports int status, which seems correct
> to propagate.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---

Reviewed-by: Greg Kurz 

>  include/block/block.h | 12 ++--
>  block.c   | 39 ---
>  2 files changed, 30 insertions(+), 21 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index 6e36154061..03b3cee8f8 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -336,10 +336,10 @@ int bdrv_create(BlockDriver *drv, const char* filename,
>  int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp);
>  
>  BlockDriverState *bdrv_new(void);
> -void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
> - Error **errp);
> -void bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
> -   Error **errp);
> +int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
> +Error **errp);
> +int bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
> +  Error **errp);
>  
>  int bdrv_parse_aio(const char *mode, int *flags);
>  int bdrv_parse_cache_mode(const char *mode, int *flags, bool *writethrough);
> @@ -351,8 +351,8 @@ BdrvChild *bdrv_open_child(const char *filename,
> BdrvChildRole child_role,
> bool allow_none, Error **errp);
>  BlockDriverState *bdrv_open_blockdev_ref(BlockdevRef *ref, Error **errp);
> -void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
> - Error **errp);
> +int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
> +Error **errp);
>  int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
> const char *bdref_key, Error **errp);
>  BlockDriverState *bdrv_open(const char *filename, const char *reference,
> diff --git a/block.c b/block.c
> index 2ba76b2c36..6d35449027 100644
> --- a/block.c
> +++ b/block.c
> @@ -2866,14 +2866,15 @@ static BdrvChildRole 
> bdrv_backing_role(BlockDriverState *bs)
>   * Sets the backing file link of a BDS. A new reference is created; callers
>   * which don't need their own reference any more must call bdrv_unref().
>   */
> -void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
> +int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
>   Error **errp)
>  {
> +int ret = 0;
>  bool update_inherits_from = bdrv_chain_contains(bs, backing_hd) &&
>  bdrv_inherits_from_recursive(backing_hd, bs);
>  
>  if (bdrv_is_backing_chain_frozen(bs, backing_bs(bs), errp)) {
> -return;
> +return -EPERM;
>  }
>  
>  if (backing_hd) {
> @@ -2891,15 +2892,22 @@ void bdrv_set_backing_hd(BlockDriverState *bs, 
> BlockDriverState *backing_hd,
>  
>  bs->backing = bdrv_attach_child(bs, backing_hd, "backing", &child_of_bds,
>  bdrv_backing_role(bs), errp);
> +if (!bs->backing) {
> +ret = -EINVAL;
> +goto out;
> +}
> +
>  /* If backing_hd was already part of bs's backing chain, and
>   * inherits_from pointed recursively to bs then let's update it to
>   * point directly to bs (else it will become NULL). */
> -if (bs->backing && update_inherits_from) {
> +if (update_inherits_from) {
>  backing_hd->inherits_from = bs;
>  }
>  
>  out:
>  bdrv_refresh_limits(bs, NULL);
> +
> +return ret;
>  }
>  
>  /*
> @@ -4517,8 +4525,8 @@ static bool should_update_child(BdrvChild *c, 
> BlockDriverState *to)
>  return ret;
>  }
>  
> -void bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
> -   Error **errp)
> +int bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
> +  Error **errp)
>  {
>  BdrvChild *c, *next;
>  GSList *list = NULL, *p;
> @@ -4540,6 +4548,7 @@ void bdrv_replace_node(BlockDriverState *from, 
> BlockDriverState *to,
>  continue;
>  }
>  if (c->frozen) {
> +ret = -

[PATCH] block: Avoid stale pointer dereference in blk_get_aio_context()

2020-07-09 Thread Greg Kurz
It is possible for blk_remove_bs() to race with blk_drain_all(), causing
the latter to dereference a stale blk->root pointer:


  blk_remove_bs(blk)
   bdrv_root_unref_child(blk->root)
    child_bs = blk->root->bs
    bdrv_detach_child(blk->root)
     ...
     g_free(blk->root) <== blk->root becomes stale
    bdrv_unref(child_bs) <-- yield at some point

A blk_drain_all() can be triggered by some guest action in the
meantime, e.g. on POWER, SLOF might disable bus mastering on
a virtio-scsi-pci device:

  virtio_write_config()
   virtio_pci_stop_ioeventfd()
    virtio_bus_stop_ioeventfd()
     virtio_scsi_dataplane_stop()
      blk_drain_all()
       blk_get_aio_context()
       bs = blk->root ? blk->root->bs : NULL
            ^
            stale

Then, depending on one's luck, QEMU either crashes with SEGV or
hits the assertion in blk_get_aio_context().

blk->root is set by blk_insert_bs() which calls bdrv_root_attach_child()
first. The blk_remove_bs() function should roll back the changes made
by blk_insert_bs() in the opposite order (or it should be documented
somewhere why this isn't the case). Clear blk->root before calling
bdrv_root_unref_child() in blk_remove_bs().

Signed-off-by: Greg Kurz 
---
 block/block-backend.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 6936b25c836c..0bf0188133e3 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -808,6 +808,7 @@ void blk_remove_bs(BlockBackend *blk)
 {
 ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
 BlockDriverState *bs;
+BdrvChild *root;
 
 notifier_list_notify(&blk->remove_bs_notifiers, blk);
 if (tgm->throttle_state) {
@@ -825,8 +826,9 @@ void blk_remove_bs(BlockBackend *blk)
  * to avoid that and a potential QEMU crash.
  */
 blk_drain(blk);
-bdrv_root_unref_child(blk->root);
+root = blk->root;
 blk->root = NULL;
+bdrv_root_unref_child(root);
 }
 
 /*





Re: [PATCH v4 10/45] qemu-option: Simplify around find_default_by_name()

2020-07-07 Thread Greg Kurz
On Tue,  7 Jul 2020 18:05:38 +0200
Markus Armbruster  wrote:

> Signed-off-by: Markus Armbruster 
> Reviewed-by: Eric Blake 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---

Reviewed-by: Greg Kurz 

>  util/qemu-option.c | 18 +-
>  1 file changed, 5 insertions(+), 13 deletions(-)
> 
> diff --git a/util/qemu-option.c b/util/qemu-option.c
> index 14e211ddd8..e7b540a21b 100644
> --- a/util/qemu-option.c
> +++ b/util/qemu-option.c
> @@ -277,7 +277,6 @@ static void qemu_opt_del_all(QemuOpts *opts, const char 
> *name)
>  const char *qemu_opt_get(QemuOpts *opts, const char *name)
>  {
>  QemuOpt *opt;
> -const char *def_val;
>  
>  if (opts == NULL) {
>  return NULL;
> @@ -285,12 +284,10 @@ const char *qemu_opt_get(QemuOpts *opts, const char 
> *name)
>  
>  opt = qemu_opt_find(opts, name);
>  if (!opt) {
> -def_val = find_default_by_name(opts, name);
> -if (def_val) {
> -return def_val;
> -}
> +return find_default_by_name(opts, name);
>  }
> -return opt ? opt->str : NULL;
> +
> +return opt->str;
>  }
>  
>  void qemu_opt_iter_init(QemuOptsIter *iter, QemuOpts *opts, const char *name)
> @@ -319,8 +316,7 @@ const char *qemu_opt_iter_next(QemuOptsIter *iter)
>  char *qemu_opt_get_del(QemuOpts *opts, const char *name)
>  {
>  QemuOpt *opt;
> -const char *def_val;
> -char *str = NULL;
> +char *str;
>  
>  if (opts == NULL) {
>  return NULL;
> @@ -328,11 +324,7 @@ char *qemu_opt_get_del(QemuOpts *opts, const char *name)
>  
>  opt = qemu_opt_find(opts, name);
>  if (!opt) {
> -def_val = find_default_by_name(opts, name);
> -if (def_val) {
> -str = g_strdup(def_val);
> -}
> -return str;
> +return g_strdup(find_default_by_name(opts, name));
>  }
>  str = opt->str;
>  opt->str = NULL;




Re: [PATCH v4 02/45] error: Improve error.h's big comment

2020-07-07 Thread Greg Kurz
On Tue,  7 Jul 2020 18:05:30 +0200
Markus Armbruster  wrote:

> Add headlines to the big comment.
> 
> Explain examples for NULL, &error_abort and &error_fatal argument
> better.
> 
> Tweak rationale for error_propagate_prepend().
> 
> Signed-off-by: Markus Armbruster 
> ---

Reviewed-by: Greg Kurz 

>  include/qapi/error.h | 51 +++-
>  1 file changed, 36 insertions(+), 15 deletions(-)
> 
> diff --git a/include/qapi/error.h b/include/qapi/error.h
> index e8960eaad5..6d079c58b7 100644
> --- a/include/qapi/error.h
> +++ b/include/qapi/error.h
> @@ -15,6 +15,8 @@
>  /*
>   * Error reporting system loosely patterned after Glib's GError.
>   *
> + * = Creating errors =
> + *
>   * Create an error:
>   * error_setg(&err, "situation normal, all fouled up");
>   *
> @@ -27,6 +29,8 @@
>   * error_setg(&err, "invalid quark\n" // WRONG!
>   *"Valid quarks are up, down, strange, charm, top, bottom.");
>   *
> + * = Reporting and destroying errors =
> + *
>   * Report an error to the current monitor if we have one, else stderr:
>   * error_report_err(err);
>   * This frees the error object.
> @@ -40,6 +44,30 @@
>   * error_free(err);
>   * Note that this loses hints added with error_append_hint().
>   *
> + * Call a function ignoring errors:
> + * foo(arg, NULL);
> + * This is more concise than
> + * Error *err = NULL;
> + * foo(arg, &err);
> + * error_free(err); // don't do this
> + *
> + * Call a function aborting on errors:
> + * foo(arg, &error_abort);
> + * This is more concise and fails more nicely than
> + * Error *err = NULL;
> + * foo(arg, &err);
> + * assert(!err); // don't do this
> + *
> + * Call a function treating errors as fatal:
> + * foo(arg, &error_fatal);
> + * This is more concise than
> + * Error *err = NULL;
> + * foo(arg, &err);
> + * if (err) { // don't do this
> + * error_report_err(err);
> + * exit(1);
> + * }
> + *
>   * Handle an error without reporting it (just for completeness):
>   * error_free(err);
>   *
> @@ -47,6 +75,11 @@
>   * reporting it (primarily useful in testsuites):
>   * error_free_or_abort(&err);
>   *
> + * = Passing errors around =
> + *
> + * Errors get passed to the caller through the conventional @errp
> + * parameter.
> + *
>   * Pass an existing error to the caller:
>   * error_propagate(errp, err);
>   * where Error **errp is a parameter, by convention the last one.
> @@ -54,11 +87,10 @@
>   * Pass an existing error to the caller with the message modified:
>   * error_propagate_prepend(errp, err,
>   * "Could not frobnicate '%s': ", name);
> - *
> - * Avoid
> - * error_propagate(errp, err);
> + * This is more concise than
> + * error_propagate(errp, err); // don't do this
>   * error_prepend(errp, "Could not frobnicate '%s': ", name);
> - * because this fails to prepend when @errp is &error_fatal.
> + * and works even when @errp is &error_fatal.
>   *
>   * Create a new error and pass it to the caller:
>   * error_setg(errp, "situation normal, all fouled up");
> @@ -70,15 +102,6 @@
>   * handle the error...
>   * }
>   *
> - * Call a function ignoring errors:
> - * foo(arg, NULL);
> - *
> - * Call a function aborting on errors:
> - * foo(arg, &error_abort);
> - *
> - * Call a function treating errors as fatal:
> - * foo(arg, &error_fatal);
> - *
>   * Receive an error and pass it on to the caller:
>   * Error *err = NULL;
>   * foo(arg, &err);
> @@ -86,8 +109,6 @@
>   * handle the error...
>   * error_propagate(errp, err);
>   * }
> - * where Error **errp is a parameter, by convention the last one.
> - *
>   * Do *not* "optimize" this to
>   * foo(arg, errp);
>   * if (*errp) { // WRONG!




Re: [PATCH v3 08/44] qemu-option: Factor out helper find_default_by_name()

2020-07-07 Thread Greg Kurz
On Mon,  6 Jul 2020 10:09:14 +0200
Markus Armbruster  wrote:

> Signed-off-by: Markus Armbruster 
> Reviewed-by: Eric Blake 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---
>  util/qemu-option.c | 47 ++
>  1 file changed, 27 insertions(+), 20 deletions(-)
> 
> diff --git a/util/qemu-option.c b/util/qemu-option.c
> index 1df55bc881..14e211ddd8 100644
> --- a/util/qemu-option.c
> +++ b/util/qemu-option.c
> @@ -142,6 +142,13 @@ static const QemuOptDesc *find_desc_by_name(const 
> QemuOptDesc *desc,
>  return NULL;
>  }
>  
> +static const char *find_default_by_name(QemuOpts *opts, const char *name)
> +{
> +const QemuOptDesc *desc = find_desc_by_name(opts->list->desc, name);
> +
> +return desc ? desc->def_value_str : NULL;
> +}
> +
>  void parse_option_size(const char *name, const char *value,
> uint64_t *ret, Error **errp)
>  {
> @@ -270,7 +277,7 @@ static void qemu_opt_del_all(QemuOpts *opts, const char 
> *name)
>  const char *qemu_opt_get(QemuOpts *opts, const char *name)
>  {
>  QemuOpt *opt;
> -const QemuOptDesc *desc;
> +const char *def_val;
>  
>  if (opts == NULL) {
>  return NULL;
> @@ -278,9 +285,9 @@ const char *qemu_opt_get(QemuOpts *opts, const char *name)
>  
>  opt = qemu_opt_find(opts, name);
>  if (!opt) {
> -desc = find_desc_by_name(opts->list->desc, name);
> -if (desc && desc->def_value_str) {
> -return desc->def_value_str;
> +def_val = find_default_by_name(opts, name);
> +if (def_val) {
> +return def_val;
>  }
>  }
>  return opt ? opt->str : NULL;
> @@ -312,7 +319,7 @@ const char *qemu_opt_iter_next(QemuOptsIter *iter)
>  char *qemu_opt_get_del(QemuOpts *opts, const char *name)
>  {
>  QemuOpt *opt;
> -const QemuOptDesc *desc;
> +const char *def_val;
>  char *str = NULL;
>  
>  if (opts == NULL) {
> @@ -321,9 +328,9 @@ char *qemu_opt_get_del(QemuOpts *opts, const char *name)
>  
>  opt = qemu_opt_find(opts, name);
>  if (!opt) {
> -desc = find_desc_by_name(opts->list->desc, name);
> -if (desc && desc->def_value_str) {
> -str = g_strdup(desc->def_value_str);
> +def_val = find_default_by_name(opts, name);
> +if (def_val) {
> +str = g_strdup(def_val);
>  }
>  return str;
>  }


This could possibly be abbreviated to:

if (!opt) {
return g_strdup(find_default_by_name(opts, name));
}

since g_strdup(NULL) returns NULL, but the more verbose version
is nice as well and it is consistent with the other changes, so:

Reviewed-by: Greg Kurz 

> @@ -349,7 +356,7 @@ static bool qemu_opt_get_bool_helper(QemuOpts *opts, 
> const char *name,
>   bool defval, bool del)
>  {
>  QemuOpt *opt;
> -const QemuOptDesc *desc;
> +const char *def_val;
>  bool ret = defval;
>  
>  if (opts == NULL) {
> @@ -358,9 +365,9 @@ static bool qemu_opt_get_bool_helper(QemuOpts *opts, 
> const char *name,
>  
>  opt = qemu_opt_find(opts, name);
>  if (opt == NULL) {
> -desc = find_desc_by_name(opts->list->desc, name);
> -if (desc && desc->def_value_str) {
> -parse_option_bool(name, desc->def_value_str, &ret, &error_abort);
> +def_val = find_default_by_name(opts, name);
> +if (def_val) {
> +parse_option_bool(name, def_val, &ret, &error_abort);
>  }
>  return ret;
>  }
> @@ -386,7 +393,7 @@ static uint64_t qemu_opt_get_number_helper(QemuOpts 
> *opts, const char *name,
> uint64_t defval, bool del)
>  {
>  QemuOpt *opt;
> -const QemuOptDesc *desc;
> +const char *def_val;
>  uint64_t ret = defval;
>  
>  if (opts == NULL) {
> @@ -395,9 +402,9 @@ static uint64_t qemu_opt_get_number_helper(QemuOpts 
> *opts, const char *name,
>  
>  opt = qemu_opt_find(opts, name);
>  if (opt == NULL) {
> -desc = find_desc_by_name(opts->list->desc, name);
> -if (desc && desc->def_value_str) {
> -parse_option_number(name, desc->def_value_str, &ret, 
> &error_abort);
> +def_val = find_default_by_name(opts, name);
> +if (def_val) {
> +parse_option_number(name, def_val, &ret, &error_abort);
>  }
>  return ret;
>  }
> @@ -424,7 +431,7 @@ static uint64_t qemu_opt_get_size_helper(QemuOpts *opts, 
> const char *name,
> 

Re: [PATCH v3 07/44] qemu-option: Make uses of find_desc_by_name() more similar

2020-07-07 Thread Greg Kurz
On Mon,  6 Jul 2020 10:09:13 +0200
Markus Armbruster  wrote:

> This is to make the next commit easier to review.
> 
> Signed-off-by: Markus Armbruster 
> Reviewed-by: Eric Blake 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---

Reviewed-by: Greg Kurz 

>  util/qemu-option.c | 32 ++--
>  1 file changed, 18 insertions(+), 14 deletions(-)
> 
> diff --git a/util/qemu-option.c b/util/qemu-option.c
> index fd1fd23521..1df55bc881 100644
> --- a/util/qemu-option.c
> +++ b/util/qemu-option.c
> @@ -270,6 +270,7 @@ static void qemu_opt_del_all(QemuOpts *opts, const char 
> *name)
>  const char *qemu_opt_get(QemuOpts *opts, const char *name)
>  {
>  QemuOpt *opt;
> +const QemuOptDesc *desc;
>  
>  if (opts == NULL) {
>  return NULL;
> @@ -277,7 +278,7 @@ const char *qemu_opt_get(QemuOpts *opts, const char *name)
>  
>  opt = qemu_opt_find(opts, name);
>  if (!opt) {
> -const QemuOptDesc *desc = find_desc_by_name(opts->list->desc, name);
> +desc = find_desc_by_name(opts->list->desc, name);
>  if (desc && desc->def_value_str) {
>  return desc->def_value_str;
>  }
> @@ -348,6 +349,7 @@ static bool qemu_opt_get_bool_helper(QemuOpts *opts, 
> const char *name,
>   bool defval, bool del)
>  {
>  QemuOpt *opt;
> +const QemuOptDesc *desc;
>  bool ret = defval;
>  
>  if (opts == NULL) {
> @@ -356,7 +358,7 @@ static bool qemu_opt_get_bool_helper(QemuOpts *opts, 
> const char *name,
>  
>  opt = qemu_opt_find(opts, name);
>  if (opt == NULL) {
> -const QemuOptDesc *desc = find_desc_by_name(opts->list->desc, name);
> +desc = find_desc_by_name(opts->list->desc, name);
>  if (desc && desc->def_value_str) {
>  parse_option_bool(name, desc->def_value_str, &ret, &error_abort);
>  }
> @@ -384,6 +386,7 @@ static uint64_t qemu_opt_get_number_helper(QemuOpts 
> *opts, const char *name,
> uint64_t defval, bool del)
>  {
>  QemuOpt *opt;
> +const QemuOptDesc *desc;
>  uint64_t ret = defval;
>  
>  if (opts == NULL) {
> @@ -392,7 +395,7 @@ static uint64_t qemu_opt_get_number_helper(QemuOpts 
> *opts, const char *name,
>  
>  opt = qemu_opt_find(opts, name);
>  if (opt == NULL) {
> -const QemuOptDesc *desc = find_desc_by_name(opts->list->desc, name);
> +desc = find_desc_by_name(opts->list->desc, name);
>  if (desc && desc->def_value_str) {
>  parse_option_number(name, desc->def_value_str, &ret, 
> &error_abort);
>  }
> @@ -421,6 +424,7 @@ static uint64_t qemu_opt_get_size_helper(QemuOpts *opts, 
> const char *name,
>   uint64_t defval, bool del)
>  {
>  QemuOpt *opt;
> +const QemuOptDesc *desc;
>  uint64_t ret = defval;
>  
>  if (opts == NULL) {
> @@ -429,7 +433,7 @@ static uint64_t qemu_opt_get_size_helper(QemuOpts *opts, 
> const char *name,
>  
>  opt = qemu_opt_find(opts, name);
>  if (opt == NULL) {
> -const QemuOptDesc *desc = find_desc_by_name(opts->list->desc, name);
> +desc = find_desc_by_name(opts->list->desc, name);
>  if (desc && desc->def_value_str) {
>  parse_option_size(name, desc->def_value_str, &ret, &error_abort);
>  }
> @@ -540,18 +544,18 @@ void qemu_opt_set_bool(QemuOpts *opts, const char 
> *name, bool val,
> Error **errp)
>  {
>  QemuOpt *opt;
> -const QemuOptDesc *desc = opts->list->desc;
> +const QemuOptDesc *desc;
>  
> -opt = g_malloc0(sizeof(*opt));
> -opt->desc = find_desc_by_name(desc, name);
> -if (!opt->desc && !opts_accepts_any(opts)) {
> +desc = find_desc_by_name(opts->list->desc, name);
> +if (!desc && !opts_accepts_any(opts)) {
>  error_setg(errp, QERR_INVALID_PARAMETER, name);
> -g_free(opt);
>  return;
>  }
>  
> +opt = g_malloc0(sizeof(*opt));
>  opt->name = g_strdup(name);
>  opt->opts = opts;
> +opt->desc = desc;
>  opt->value.boolean = !!val;
>  opt->str = g_strdup(val ? "on" : "off");
>  QTAILQ_INSERT_TAIL(&opts->head, opt, next);
> @@ -561,18 +565,18 @@ void qemu_opt_set_number(QemuOpts *opts, const char 
> *name, int64_t val,
>   Error **errp)
>  {
>  QemuOpt *opt;
> -const QemuOptDesc *desc = opts->list->desc;
> +const QemuO

Re: [PATCH v3 06/44] qemu-option: Check return value instead of @err where convenient

2020-07-07 Thread Greg Kurz
On Mon, 06 Jul 2020 22:01:38 +0200
Markus Armbruster  wrote:

> Greg Kurz  writes:
> 
> > On Mon,  6 Jul 2020 10:09:12 +0200
> > Markus Armbruster  wrote:
> >
> >> Convert uses like
> >> 
> >> opts = qemu_opts_create(..., &err);
> >> if (err) {
> >> ...
> >> }
> >> 
> >> to
> >> 
> >> opts = qemu_opts_create(..., &err);
> >
> > The patch doesn't strictly do that since it also converts &err to errp.
> 
> Yes, and that's actually why I do it.  I'll change the commit message to
> say so:
> 
>to
>
>opts = qemu_opts_create(..., errp);
> 

Ok.

> > This is okay because most of the changes also drop the associated
> > error_propagate(), with the exception of block/parallels.c for which
> > I had to check how local_err is used. As already noted by Vladimir
> > earlier, this generates a harmless "no-op error_propagate", but it
> > could be worth mentioning that in the changelog for future reviews :)
> 
> Yes, error_propagate() becomes a no-op for one out of three error paths
> through it.  There's similar "partial no-opification" elsewhere in this
> series, notably in PATCH 36.
> 
> Concrete suggestions for improving the commit message further are
> welcome!
> 

What about this ?

The change in block/parallels.c doesn't provide any clue on the usage
of local_err. As usual it is expected to be equal to NULL at the time
qemu_opts_create() is called, and the goto on the error path jumps
here:

fail_options:
error_propagate(errp, local_err);

So, if qemu_opts_create() fails, we end up doing error_propagate(errp, NULL)
which is a harmless no-op.

> >> if (!opts) {
> >> ...
> >> }
> >> 
> >> Eliminate error_propagate() that are now unnecessary.  Delete @err
> >> that are now unused.
> >> 
> >> Signed-off-by: Markus Armbruster 
> >> Reviewed-by: Eric Blake 
> >> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> >> ---
> >>  block/parallels.c  |  4 ++--
> >>  blockdev.c |  5 ++---
> >>  qdev-monitor.c |  5 ++---
> >>  util/qemu-config.c | 10 --
> >>  util/qemu-option.c | 12 
> >
> > Maybe some other potential candidates ?
> >
> > chardev/char.c:
> >
> >    opts = qemu_opts_create(qemu_find_opts("chardev"), label, 1, &local_err);
> > if (local_err) {
> > error_report_err(local_err);
> > return NULL;
> > }
> >
> > monitor/hmp-cmds.c:
> >
> > opts = qemu_opts_from_qdict(qemu_find_opts("netdev"), qdict, &err);
> > if (err) {
> > goto out;
> > }
> >
> >
> > opts = qemu_opts_from_qdict(qemu_find_opts("object"), qdict, &err);
> > if (err) {
> > goto end;
> > }
> 
> Don't fit my clarified commit message, because I can't replace &err by
> errp there.
> 

Sure.

> I found these:
> 
>   diff --git a/block/blkdebug.c b/block/blkdebug.c
>   index 7194bc7f06..471b597dfe 100644
>   --- a/block/blkdebug.c
>   +++ b/block/blkdebug.c
>   @@ -294,17 +294,13 @@ static int read_config(BDRVBlkdebugState *s, const 
> char *filename,
> 
>d.s = s;
>d.action = ACTION_INJECT_ERROR;
>   -qemu_opts_foreach(&inject_error_opts, add_rule, &d, &local_err);
>   -if (local_err) {
>   -error_propagate(errp, local_err);
>   +if (qemu_opts_foreach(&inject_error_opts, add_rule, &d, errp)) {
>ret = -EINVAL;
>goto fail;
>}
> 
>d.action = ACTION_SET_STATE;
>   -qemu_opts_foreach(&set_state_opts, add_rule, &d, &local_err);
>   -if (local_err) {
>   -error_propagate(errp, local_err);
>   +if (qemu_opts_foreach(&set_state_opts, add_rule, &d, errp)) {
>ret = -EINVAL;
>goto fail;
>}
> 
> However, I really need to get a pull request out...  Can patch them
> later.
> 

Yeah, and we will probably find a few more, but certainly not many
after this colossal effort of yours.

> > With or without the extra changes:
> >
> > Reviewed-by: Greg Kurz 
> 
> Thanks!
> 

My pleasure.



Re: [PATCH v3 06/44] qemu-option: Check return value instead of @err where convenient

2020-07-06 Thread Greg Kurz
On Mon,  6 Jul 2020 10:09:12 +0200
Markus Armbruster  wrote:

> Convert uses like
> 
> opts = qemu_opts_create(..., &err);
> if (err) {
> ...
> }
> 
> to
> 
> opts = qemu_opts_create(..., &err);

The patch doesn't strictly do that since it also converts &err to errp.
This is okay because most of the changes also drop the associated
error_propagate(), with the exception of block/parallels.c for which
I had to check how local_err is used. As already noted by Vladimir
earlier, this generates a harmless "no-op error_propagate", but it
could be worth mentioning that in the changelog for future reviews :)

> if (!opts) {
> ...
> }
> 
> Eliminate error_propagate() that are now unnecessary.  Delete @err
> that are now unused.
> 
> Signed-off-by: Markus Armbruster 
> Reviewed-by: Eric Blake 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/parallels.c  |  4 ++--
>  blockdev.c |  5 ++---
>  qdev-monitor.c |  5 ++---
>  util/qemu-config.c | 10 --
>  util/qemu-option.c | 12 

Maybe some other potential candidates ?

chardev/char.c:

    opts = qemu_opts_create(qemu_find_opts("chardev"), label, 1, &local_err);
if (local_err) {
error_report_err(local_err);
return NULL;
}

monitor/hmp-cmds.c:

opts = qemu_opts_from_qdict(qemu_find_opts("netdev"), qdict, &err);
if (err) {
goto out;
}


opts = qemu_opts_from_qdict(qemu_find_opts("object"), qdict, &err);
if (err) {
goto end;
}

With or without the extra changes:

Reviewed-by: Greg Kurz 

>  5 files changed, 14 insertions(+), 22 deletions(-)
> 
> diff --git a/block/parallels.c b/block/parallels.c
> index 63a1cde8af..f26f03c926 100644
> --- a/block/parallels.c
> +++ b/block/parallels.c
> @@ -824,8 +824,8 @@ static int parallels_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  }
>  }
>  
> -opts = qemu_opts_create(&parallels_runtime_opts, NULL, 0, &local_err);
> -if (local_err != NULL) {
> +opts = qemu_opts_create(&parallels_runtime_opts, NULL, 0, errp);
> +if (!opts) {
>  goto fail_options;
>  }
>  
> diff --git a/blockdev.c b/blockdev.c
> index 31d5eaf6bf..b52ed9de86 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -504,9 +504,8 @@ static BlockBackend *blockdev_init(const char *file, 
> QDict *bs_opts,
>  /* Check common options by copying from bs_opts to opts, all other 
> options
>   * stay in bs_opts for processing by bdrv_open(). */
>  id = qdict_get_try_str(bs_opts, "id");
> -opts = qemu_opts_create(&qemu_common_drive_opts, id, 1, &error);
> -if (error) {
> -error_propagate(errp, error);
> +opts = qemu_opts_create(&qemu_common_drive_opts, id, 1, errp);
> +if (!opts) {
>  goto err_no_opts;
>  }
>  
> diff --git a/qdev-monitor.c b/qdev-monitor.c
> index 13a13a811a..079cb6001e 100644
> --- a/qdev-monitor.c
> +++ b/qdev-monitor.c
> @@ -799,9 +799,8 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, 
> Error **errp)
>  QemuOpts *opts;
>  DeviceState *dev;
>  
> -opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict, &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> +opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict, errp);
> +if (!opts) {
>  return;
>  }
>  if (!monitor_cur_is_qmp() && qdev_device_help(opts)) {
> diff --git a/util/qemu-config.c b/util/qemu-config.c
> index 772f5a219e..c0d0e9b8ef 100644
> --- a/util/qemu-config.c
> +++ b/util/qemu-config.c
> @@ -493,9 +493,8 @@ static void config_parse_qdict_section(QDict *options, 
> QemuOptsList *opts,
>  goto out;
>  }
>  
> -subopts = qemu_opts_create(opts, NULL, 0, &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> +subopts = qemu_opts_create(opts, NULL, 0, errp);
> +if (!subopts) {
>  goto out;
>  }
>  
> @@ -538,10 +537,9 @@ static void config_parse_qdict_section(QDict *options, 
> QemuOptsList *opts,
>  }
>  
>  opt_name = g_strdup_printf("%s.%u", opts->name, i++);
> -subopts = qemu_opts_create(opts, opt_name, 1, &local_err);
> +subopts = qemu_opts_create(opts, opt_name, 1, errp);
>  g_free(opt_name);
> -if (local_err) {
> -error_propagate(errp, local_err);
> +if (!subopts) {
>  goto out;
>  }
>  
> diff --git a/util/qemu-option.c b/util/qemu-option.c
> index 0ebfd97a98..fd1fd23521 100644
> --- a/util/qemu-option.c
> +++ b/util/qemu

Re: [PATCH v3 04/44] macio: Tidy up error handling in macio_newworld_realize()

2020-07-06 Thread Greg Kurz
On Mon,  6 Jul 2020 10:09:10 +0200
Markus Armbruster  wrote:

> macio_newworld_realize() effectively ignores ns->gpio realization
> errors, leaking the Error object.  Fortunately, macio_gpio_realize()
> can't actually fail.  Tidy up.
> 
> Cc: Mark Cave-Ayland 
> Cc: David Gibson 
> Signed-off-by: Markus Armbruster 
> Reviewed-by: Eric Blake 
> Acked-by: David Gibson 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---

Reviewed-by: Greg Kurz 

>  hw/misc/macio/macio.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/misc/macio/macio.c b/hw/misc/macio/macio.c
> index 42414797e2..be66bb7758 100644
> --- a/hw/misc/macio/macio.c
> +++ b/hw/misc/macio/macio.c
> @@ -334,7 +334,9 @@ static void macio_newworld_realize(PCIDevice *d, Error 
> **errp)
>   &error_abort);
>  memory_region_add_subregion(&s->bar, 0x50,
>  sysbus_mmio_get_region(sysbus_dev, 0));
> -qdev_realize(DEVICE(&ns->gpio), BUS(&s->macio_bus), &err);
> +if (!qdev_realize(DEVICE(&ns->gpio), BUS(&s->macio_bus), errp)) {
> +return;
> +}
>  
>  /* PMU */
>  object_initialize_child(OBJECT(s), "pmu", &s->pmu, TYPE_VIA_PMU);




Re: [PATCH v3 03/44] qdev: Use returned bool to check for qdev_realize() etc. failure

2020-07-06 Thread Greg Kurz
On Mon, 06 Jul 2020 13:35:19 +0200
Markus Armbruster  wrote:

> Greg Kurz  writes:
> 
> > On Mon,  6 Jul 2020 10:09:09 +0200
> > Markus Armbruster  wrote:
> >
> >> Convert
> >> 
> >> foo(..., &err);
> >> if (err) {
> >> ...
> >> }
> >> 
> >> to
> >> 
> >> if (!foo(..., &err)) {
> >> ...
> >> }
> >> 
> >> for qdev_realize(), qdev_realize_and_unref(), qbus_realize() and their
> >> wrappers isa_realize_and_unref(), pci_realize_and_unref(),
> >> sysbus_realize(), sysbus_realize_and_unref(), usb_realize_and_unref().
> >> Coccinelle script:
> >> 
> >> @@
> >> identifier fun = {
> >> isa_realize_and_unref, pci_realize_and_unref, qbus_realize,
> >> qdev_realize, qdev_realize_and_unref, sysbus_realize,
> >> sysbus_realize_and_unref, usb_realize_and_unref
> >> };
> >> expression list args, args2;
> >> typedef Error;
> >> Error *err;
> >> @@
> >> -fun(args, &err, args2);
> >> -if (err)
> >> +if (!fun(args, &err, args2))
> >>  {
> >>  ...
> >>  }
> >> 
> >> Chokes on hw/arm/musicpal.c's lcd_refresh() with the unhelpful error
> >> message "no position information".  Nothing to convert there; skipped.
> >> 
> >> Fails to convert hw/arm/armsse.c, because Coccinelle gets confused by
> >> ARMSSE being used both as typedef and function-like macro there.
> >> Converted manually.
> >> 
> >> A few line breaks tidied up manually.
> >> 
> >> Signed-off-by: Markus Armbruster 
> >> Reviewed-by: Eric Blake 
> >> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> >> ---
> >
> > FWIW I had posted an R-b for this patch in v1 
> > (20200629124037.2b9a2...@bahia.lan).
> 
> When I sliced and diced my patches for v2, I dropped R-bys for patches
> substantially altered.  This one was borderline: the patch does strictly
> less, and the work it no longer does is done by later patches.
> 
> Example: v1's first hunk
> 
> diff --git a/hw/arm/allwinner-a10.c b/hw/arm/allwinner-a10.c
> index 52e0d83760..3e45aa4141 100644
> --- a/hw/arm/allwinner-a10.c
> +++ b/hw/arm/allwinner-a10.c
> @@ -72,17 +72,12 @@ static void aw_a10_realize(DeviceState *dev, Error 
> **errp)
>  {
>  AwA10State *s = AW_A10(dev);
>  SysBusDevice *sysbusdev;
> -Error *err = NULL;
> 
> -qdev_realize(DEVICE(&s->cpu), NULL, &err);
> -if (err != NULL) {
> -error_propagate(errp, err);
> +if (!qdev_realize(DEVICE(&s->cpu), NULL, errp)) {
>  return;
>  }
> 
> -sysbus_realize(SYS_BUS_DEVICE(&s->intc), &err);
> -if (err != NULL) {
> -error_propagate(errp, err);
> +if (!sysbus_realize(SYS_BUS_DEVICE(&s->intc), errp)) {
>  return;
>  }
>  sysbusdev = SYS_BUS_DEVICE(&s->intc);
> 
> became
> 
> diff --git a/hw/arm/allwinner-a10.c b/hw/arm/allwinner-a10.c
> index 52e0d83760..e1acffe5f6 100644
> --- a/hw/arm/allwinner-a10.c
> +++ b/hw/arm/allwinner-a10.c
> @@ -74,14 +74,12 @@ static void aw_a10_realize(DeviceState *dev, Error 
> **errp)
>  SysBusDevice *sysbusdev;
>  Error *err = NULL;
> 
> -qdev_realize(DEVICE(&s->cpu), NULL, &err);
> -if (err != NULL) {
> +if (!qdev_realize(DEVICE(&s->cpu), NULL, &err)) {
>  error_propagate(errp, err);
>  return;
>  }
> 
> -sysbus_realize(SYS_BUS_DEVICE(&s->intc), &err);
> -if (err != NULL) {
> +if (!sysbus_realize(SYS_BUS_DEVICE(&s->intc), &err)) {
>  error_propagate(errp, err);
>  return;
>  }
> 
> 
> in v2 and v3.  The two error_propagate() and the local variable now go
> away only in PATCH v3 33.
> 
> Would you like me to record your R-by for the patch's current version?
> 

I've reviewed it again, so, yes, please do.



Re: [PATCH v3 03/44] qdev: Use returned bool to check for qdev_realize() etc. failure

2020-07-06 Thread Greg Kurz
On Mon,  6 Jul 2020 10:09:09 +0200
Markus Armbruster  wrote:

> Convert
> 
> foo(..., &err);
> if (err) {
> ...
> }
> 
> to
> 
> if (!foo(..., &err)) {
> ...
> }
> 
> for qdev_realize(), qdev_realize_and_unref(), qbus_realize() and their
> wrappers isa_realize_and_unref(), pci_realize_and_unref(),
> sysbus_realize(), sysbus_realize_and_unref(), usb_realize_and_unref().
> Coccinelle script:
> 
> @@
> identifier fun = {
> isa_realize_and_unref, pci_realize_and_unref, qbus_realize,
> qdev_realize, qdev_realize_and_unref, sysbus_realize,
> sysbus_realize_and_unref, usb_realize_and_unref
> };
> expression list args, args2;
> typedef Error;
> Error *err;
> @@
> -fun(args, &err, args2);
> -if (err)
> +if (!fun(args, &err, args2))
>  {
>  ...
>  }
> 
> Chokes on hw/arm/musicpal.c's lcd_refresh() with the unhelpful error
> message "no position information".  Nothing to convert there; skipped.
> 
> Fails to convert hw/arm/armsse.c, because Coccinelle gets confused by
> ARMSSE being used both as typedef and function-like macro there.
> Converted manually.
> 
> A few line breaks tidied up manually.
> 
> Signed-off-by: Markus Armbruster 
> Reviewed-by: Eric Blake 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---

FWIW I had posted an R-b for this patch in v1 
(20200629124037.2b9a2...@bahia.lan).

>  hw/arm/allwinner-a10.c  | 15 +++
>  hw/arm/armsse.c | 78 +++--
>  hw/arm/armv7m.c |  9 ++--
>  hw/arm/aspeed_ast2600.c | 51 +++--
>  hw/arm/aspeed_soc.c | 45 +++
>  hw/arm/bcm2835_peripherals.c| 45 +++
>  hw/arm/bcm2836.c|  9 ++--
>  hw/arm/cubieboard.c |  3 +-
>  hw/arm/digic.c  |  9 ++--
>  hw/arm/digic_boards.c   |  3 +-
>  hw/arm/fsl-imx25.c  | 33 +-
>  hw/arm/fsl-imx31.c  | 24 --
>  hw/arm/fsl-imx6.c   | 36 +--
>  hw/arm/msf2-soc.c   | 15 +++
>  hw/arm/nrf51_soc.c  | 18 +++-
>  hw/arm/stm32f205_soc.c  | 21 +++--
>  hw/arm/stm32f405_soc.c  | 24 --
>  hw/arm/xlnx-zynqmp.c| 45 +++
>  hw/block/fdc.c  |  3 +-
>  hw/block/xen-block.c|  3 +-
>  hw/char/serial-pci-multi.c  |  3 +-
>  hw/char/serial-pci.c|  3 +-
>  hw/char/serial.c|  6 +--
>  hw/core/cpu.c   |  3 +-
>  hw/cpu/a15mpcore.c  |  3 +-
>  hw/cpu/a9mpcore.c   | 15 +++
>  hw/cpu/arm11mpcore.c| 12 ++---
>  hw/cpu/realview_mpcore.c|  6 +--
>  hw/display/virtio-gpu-pci.c |  4 +-
>  hw/display/virtio-vga.c |  3 +-
>  hw/intc/armv7m_nvic.c   |  6 +--
>  hw/intc/pnv_xive.c  |  6 +--
>  hw/intc/realview_gic.c  |  3 +-
>  hw/intc/spapr_xive.c|  6 +--
>  hw/intc/xics.c  |  3 +-
>  hw/intc/xive.c  |  3 +-
>  hw/isa/piix4.c  |  3 +-
>  hw/microblaze/xlnx-zynqmp-pmu.c |  6 +--
>  hw/mips/cps.c   | 12 ++---
>  hw/misc/macio/cuda.c|  3 +-
>  hw/misc/macio/macio.c   | 18 +++-
>  hw/misc/macio/pmu.c |  3 +-
>  hw/pci-host/pnv_phb3.c  |  9 ++--
>  hw/pci-host/pnv_phb4.c  |  3 +-
>  hw/pci-host/pnv_phb4_pec.c  |  3 +-
>  hw/ppc/e500.c   |  3 +-
>  hw/ppc/pnv.c| 39 ++---
>  hw/ppc/pnv_core.c   |  3 +-
>  hw/ppc/pnv_psi.c|  6 +--
>  hw/ppc/spapr_cpu_core.c |  3 +-
>  hw/ppc/spapr_irq.c  |  3 +-
>  hw/riscv/opentitan.c|  6 +--
>  hw/riscv/sifive_e.c |  3 +-
>  hw/riscv/sifive_u.c |  3 +-
>  hw/s390x/event-facility.c   | 10 ++---
>  hw/s390x/s390-pci-bus.c |  3 +-
>  hw/s390x/sclp.c |  3 +-
>  hw/s390x/virtio-ccw-crypto.c|  3 +-
>  hw/s390x/virtio-ccw-rng.c   |  3 +-
>  hw/scsi/scsi-bus.c  |  3 +-
>  hw/sd/aspeed_sdhci.c|  3 +-
>  hw/sd/ssi-sd.c  |  3 +-
>  hw/usb/bus.c|  3 +-
>  hw/virtio/virtio-rng-pci.c  |  3 +-
>  qdev-monitor.c  |  3 +-
>  65 files changed, 248 insertions(+), 495 deletions(-)
> 
> diff --git a/hw/arm/allwinner-a10.c b/hw/arm/allwinner-a10.c
> index 52e0d83760..e1acffe5f6 100644
> --- a/hw/arm/allwinner-a10.c
> +++ b/hw/arm/allwinner-a10.c
> @@ -74,14 +74,12 @@ static void aw_a10_realize(DeviceState *dev, Error **errp)
>  SysBusDevice *sysbusdev;
>  Error *err = NULL;
>  
> -qdev_realize(DEVICE(&s->cpu), NULL, &err);
> -if (err != NULL) {
> +if (!qdev_realize(DEVICE(&s->cpu), NULL, &err)) {
>  error_propagate(errp, err);
>  return;
>  }
>  

Re: [PATCH 03/46] qdev: Smooth error checking of qdev_realize() & friends

2020-06-29 Thread Greg Kurz
On Wed, 24 Jun 2020 18:43:01 +0200
Markus Armbruster  wrote:

> Convert
> 
> foo(..., &err);
> if (err) {
> ...
> }
> 
> to
> 
> if (!foo(..., &err)) {
> ...
> }
> 
> for qdev_realize(), qdev_realize_and_unref(), qbus_realize() and their
> wrappers isa_realize_and_unref(), pci_realize_and_unref(),
> sysbus_realize(), sysbus_realize_and_unref(), usb_realize_and_unref().
> Coccinelle script:
> 
> @@
> identifier fun = {isa_realize_and_unref, pci_realize_and_unref, 
> qbus_realize, qdev_realize, qdev_realize_and_unref, sysbus_realize, 
> sysbus_realize_and_unref, usb_realize_and_unref};
> expression list args, args2;
> typedef Error;
> Error *err;
> identifier errp;
> @@
> -  fun(args, &err, args2);
> -  if (err) {
> +  if (!fun(args, errp, args2)) {
>  ... when != err
> -error_propagate(errp, err);
>  ...
>  }
> 
> @@
> identifier fun = {isa_realize_and_unref, pci_realize_and_unref, 
> qbus_realize, qdev_realize, qdev_realize_and_unref, sysbus_realize, 
> sysbus_realize_and_unref, usb_realize_and_unref};
> expression list args, args2;
> typedef Error;
> Error *err;
> @@
> -  fun(args, &err, args2);
> -  if (err) {
> +  if (!fun(args, &err, args2)) {
>  ...
>  }
> 
> Fails to convert hw/arm/armsse.c, because Coccinelle gets confused by
> ARMSSE being used both as typedef and function-like macro there.
> Convert manually.
> 
> Eliminate error_propagate() that are now unnecessary.  Delete @err
> that are now unused.  Clean up whitespace.
> 
> Signed-off-by: Markus Armbruster 
> ---

Reviewed-by: Greg Kurz 

>  hw/arm/allwinner-a10.c  |  21 ++-
>  hw/arm/armsse.c | 104 
>  hw/arm/armv7m.c |  12 +---
>  hw/arm/aspeed_ast2600.c |  68 ++---
>  hw/arm/aspeed_soc.c |  60 +-
>  hw/arm/bcm2835_peripherals.c|  60 +-
>  hw/arm/bcm2836.c|  12 +---
>  hw/arm/cubieboard.c |   3 +-
>  hw/arm/digic.c  |  12 +---
>  hw/arm/digic_boards.c   |   3 +-
>  hw/arm/fsl-imx25.c  |  44 --
>  hw/arm/fsl-imx31.c  |  32 +++---
>  hw/arm/fsl-imx6.c   |  48 ---
>  hw/arm/msf2-soc.c   |  21 ++-
>  hw/arm/nrf51_soc.c  |  24 ++--
>  hw/arm/stm32f205_soc.c  |  29 +++--
>  hw/arm/stm32f405_soc.c  |  32 +++---
>  hw/arm/xlnx-zynqmp.c|  61 +--
>  hw/block/fdc.c  |   4 +-
>  hw/block/xen-block.c|   3 +-
>  hw/char/serial-pci-multi.c  |   5 +-
>  hw/char/serial-pci.c|   5 +-
>  hw/char/serial.c|  10 +--
>  hw/core/cpu.c   |   3 +-
>  hw/cpu/a15mpcore.c  |   5 +-
>  hw/cpu/a9mpcore.c   |  21 ++-
>  hw/cpu/arm11mpcore.c|  17 ++
>  hw/cpu/realview_mpcore.c|   9 +--
>  hw/display/virtio-gpu-pci.c |   6 +-
>  hw/display/virtio-vga.c |   5 +-
>  hw/intc/armv7m_nvic.c   |   9 +--
>  hw/intc/pnv_xive.c  |   8 +--
>  hw/intc/realview_gic.c  |   5 +-
>  hw/intc/spapr_xive.c|   8 +--
>  hw/intc/xics.c  |   5 +-
>  hw/intc/xive.c  |   3 +-
>  hw/isa/piix4.c  |   5 +-
>  hw/microblaze/xlnx-zynqmp-pmu.c |   9 +--
>  hw/mips/cps.c   |  17 ++
>  hw/misc/macio/cuda.c|   5 +-
>  hw/misc/macio/macio.c   |  25 ++--
>  hw/misc/macio/pmu.c |   5 +-
>  hw/pci-host/pnv_phb3.c  |  13 +---
>  hw/pci-host/pnv_phb4.c  |   5 +-
>  hw/pci-host/pnv_phb4_pec.c  |   5 +-
>  hw/ppc/e500.c   |   5 +-
>  hw/ppc/pnv.c|  53 
>  hw/ppc/pnv_core.c   |   4 +-
>  hw/ppc/pnv_psi.c|   9 +--
>  hw/ppc/spapr_cpu_core.c |   3 +-
>  hw/ppc/spapr_irq.c  |   5 +-
>  hw/riscv/opentitan.c|   9 +--
>  hw/riscv/sifive_e.c |   6 +-
>  hw/riscv/sifive_u.c |   5 +-
>  hw/s390x/event-facility.c   |  13 ++--
>  hw/s390x/s390-pci-bus.c |   3 +-
>  hw/s390x/sclp.c |   3 +-
>  hw/s390x/virtio-ccw-crypto.c|   5 +-
>  hw/s390x/virtio-ccw-rng.c   |   5 +-
>  hw/scsi/scsi-bus.c  |   4 +-
>  hw/sd/aspeed_sdhci.c|   4 +-
>  hw/sd/ssi-sd.c  |

Re: [PATCH 02/46] error: Document Error API usage rules

2020-06-25 Thread Greg Kurz
On Wed, 24 Jun 2020 18:43:00 +0200
Markus Armbruster  wrote:

> This merely codifies existing practice, with one exception: the rule
> advising against returning void, where existing practice is mixed.
> 
> When the Error API was created, we adopted the (unwritten) rule to
> return void when the function returns no useful value on success,
> unlike GError, which recommends to return true on success and false on
> error then.
> 
> When a function returns a distinct error value, say false, a checked
> call that passes the error up looks like
> 
> if (!frobnicate(..., errp)) {
> handle the error...
> }
> 
> When it returns void, we need
> 
> Error *err = NULL;
> 
> frobnicate(..., &err);
> if (err) {
> handle the error...
> error_propagate(errp, err);
> }
> 
> Not only is this more verbose, it also creates an Error object even
> when @errp is null, &error_abort or &error_fatal.
> 
> People got tired of the additional boilerplate, and started to ignore
> the unwritten rule.  The result is confusion among developers about
> the preferred usage.
> 

This confusion is reinforced by the fact that the standard pattern:

    error_setg(errp, ...);
    error_append_hint(errp, ...);

doesn't work when errp is &error_fatal, which is typical when handling
an invalid command line argument: it is valuable to suggest something
sensible to the user, but error_setg() exits before we can do so.

Fortunately, Vladimir's work will address that and eliminate the
temptation to work around the issue with more boilerplate :)
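
To make the problem concrete, here is a small, self-contained C simulation. It is only a sketch under stated assumptions: every toy_* name and both check_quark_* functions are hypothetical, &toy_fatal merely plays the role of &error_fatal, and instead of really calling exit(1) the toy records what the user would have seen at that point.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for QEMU's Error object; NOT the real implementation. */
typedef struct Error {
    char msg[128];
} Error;

Error *toy_fatal;        /* &toy_fatal plays the role of &error_fatal */
char toy_reported[128];  /* what the user would have seen before exit */
int toy_exited;          /* set where real code would call exit(1) */

void toy_reset(void)
{
    toy_reported[0] = '\0';
    toy_exited = 0;
}

/* Record the message and mark the simulated process as terminated. */
static void toy_report_and_exit(Error *err)
{
    snprintf(toy_reported, sizeof(toy_reported), "%s", err->msg);
    toy_exited = 1;
    free(err);
}

static void toy_error_setg(Error **errp, const char *msg)
{
    Error *err;

    if (!errp || toy_exited) {
        return;
    }
    err = calloc(1, sizeof(*err));
    if (!err) {
        abort();
    }
    snprintf(err->msg, sizeof(err->msg), "%s", msg);
    if (errp == &toy_fatal) {
        toy_report_and_exit(err);   /* too early: no hint attached yet */
    } else {
        *errp = err;
    }
}

static void toy_error_append_hint(Error **errp, const char *hint)
{
    size_t len;

    if (!errp || !*errp || toy_exited) {
        return;     /* after a real exit(1) this code is never reached */
    }
    len = strlen((*errp)->msg);
    snprintf((*errp)->msg + len, sizeof((*errp)->msg) - len, "\n%s", hint);
}

static void toy_error_propagate(Error **dst, Error *local_err)
{
    if (!local_err) {
        return;
    }
    if (dst == &toy_fatal) {
        toy_report_and_exit(local_err);
    } else if (dst && !*dst) {
        *dst = local_err;
    } else {
        free(local_err);
    }
}

/* Standard pattern: the hint is lost when errp is the fatal sentinel. */
void check_quark_naive(Error **errp)
{
    toy_error_setg(errp, "invalid quark");
    toy_error_append_hint(errp, "Valid quarks are up and down.");
}

/* Boilerplate workaround: build the full error locally, then propagate. */
void check_quark_local(Error **errp)
{
    Error *local_err = NULL;

    toy_error_setg(&local_err, "invalid quark");
    toy_error_append_hint(&local_err, "Valid quarks are up and down.");
    toy_error_propagate(errp, local_err);
}
```

With &toy_fatal, check_quark_naive() reports only the bare message, while check_quark_local() reports the message together with the hint, at the cost of the extra local variable and propagation.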

> The written rule will hopefully reduce the confusion.
> 
> The remainder of this series will update a substantial amount of code
> to honor the rule.
> 
> Signed-off-by: Markus Armbruster 
> ---

Reviewed-by: Greg Kurz 

>  include/qapi/error.h | 26 ++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/include/qapi/error.h b/include/qapi/error.h
> index 1a5ea25e12..c3d84d610a 100644
> --- a/include/qapi/error.h
> +++ b/include/qapi/error.h
> @@ -15,6 +15,32 @@
>  /*
>   * Error reporting system loosely patterned after Glib's GError.
>   *
> + * Rules:
> + *
> + * - Functions that use Error to report errors have an Error **errp
> + *   parameter.  It should be the last parameter, except for functions
> + *   taking variable arguments.
> + *
> + * - You may pass NULL to not receive the error, &error_abort to abort
> + *   on error, &error_fatal to exit(1) on error, or a pointer to a
> + *   variable containing NULL to receive the error.
> + *
> + * - The value of @errp should not affect control flow.
> + *
> + * - On success, the function should not use @errp.  On failure, it
> + *   should set a new error, e.g. with error_setg(errp, ...), or
> + *   propagate an existing one, e.g. with error_propagate(errp, ...).
> + *
> + * - Whenever practical, also return a value that indicates success /
> + *   failure.  This can make the error checking more concise, and can
> + *   avoid useless error object creation and destruction.  Note that
> + *   we still have many functions returning void.  We recommend
> + *   • bool-valued functions return true on success / false on failure,
> + *   • pointer-valued functions return non-null / null pointer, and
> + *   • integer-valued functions return non-negative / negative.
> + *
> + * How to:
> + *
>   * Create an error:
>   * error_setg(errp, "situation normal, all fouled up");
>   *




Re: [PATCH 01/46] error: Improve examples in error.h's big comment

2020-06-25 Thread Greg Kurz
On Wed, 24 Jun 2020 18:42:59 +0200
Markus Armbruster  wrote:

> Show errp instead of &err where &err is actually unusual.  Add a
> missing declaration.  Add a second error pileup example.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  include/qapi/error.h | 19 +++
>  1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/include/qapi/error.h b/include/qapi/error.h
> index ad5b6e896d..1a5ea25e12 100644
> --- a/include/qapi/error.h
> +++ b/include/qapi/error.h
> @@ -16,15 +16,15 @@
>   * Error reporting system loosely patterned after Glib's GError.
>   *
>   * Create an error:
> - * error_setg(&err, "situation normal, all fouled up");
> + * error_setg(errp, "situation normal, all fouled up");
>   *
>   * Create an error and add additional explanation:
> - * error_setg(&err, "invalid quark");
> - * error_append_hint(&err, "Valid quarks are up, down, strange, "
> + * error_setg(errp, "invalid quark");
> + * error_append_hint(errp, "Valid quarks are up, down, strange, "
>   *   "charm, top, bottom.\n");
>   *
>   * Do *not* contract this to
> - * error_setg(&err, "invalid quark\n"
> + * error_setg(errp, "invalid quark\n" // WRONG!
>   *"Valid quarks are up, down, strange, charm, top, bottom.");
>   *
>   * Report an error to the current monitor if we have one, else stderr:
> @@ -108,12 +108,23 @@
>   * }
>   *
>   * Do *not* "optimize" this to
> + * Error *err = NULL;
>   * foo(arg, &err);
>   * bar(arg, &err); // WRONG!
>   * if (err) {
>   * handle the error...
>   * }
>   * because this may pass a non-null err to bar().
> + *
> + * Likewise, do *not*
> + * Error *err = NULL;
> + * if (cond1) {
> + * error_setg(err, ...);

s/err/&err/

> + * }
> + * if (cond2) {
> + * error_setg(err, ...); // WRONG!

ditto

With that fixed:

Reviewed-by: Greg Kurz 

> + * }
> + * because this may pass a non-null err to error_setg().
>   */
>  
>  #ifndef ERROR_H
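
As an illustration of the pileup case discussed above, here is a minimal, self-contained C sketch. It is not QEMU code: toy_error_setg(), toy_pileups, validate_wrong() and validate_ok() are hypothetical names. As far as I know the real error_setg() asserts when *errp is already set; the toy only counts such attempts so that the effect can be observed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for QEMU's Error object; NOT the real implementation. */
typedef struct Error {
    char msg[64];
} Error;

/* Counts attempts to set an error on top of an already pending one. */
int toy_pileups;

static void toy_error_setg(Error **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    if (*errp) {
        toy_pileups++;      /* assumption: real code would abort here */
        return;
    }
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* WRONG: the second call may receive a non-null err. */
Error *validate_wrong(int cond1, int cond2)
{
    Error *err = NULL;

    if (cond1) {
        toy_error_setg(&err, "cond1 failed");
    }
    if (cond2) {
        toy_error_setg(&err, "cond2 failed");   /* pileup if cond1 fired */
    }
    return err;
}

/* One possible fix: report only the first failure. */
Error *validate_ok(int cond1, int cond2)
{
    Error *err = NULL;

    if (cond1) {
        toy_error_setg(&err, "cond1 failed");
    } else if (cond2) {
        toy_error_setg(&err, "cond2 failed");
    }
    return err;
}
```

Reporting only the first failure is just one way out; returning right after the first error is set would work equally well.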




Re: [PATCH v10 1/9] error: auto propagated local_err

2020-06-24 Thread Greg Kurz
On Wed, 24 Jun 2020 18:53:05 +0200
Markus Armbruster  wrote:

> Greg Kurz  writes:
> 
> > On Mon, 15 Jun 2020 07:21:03 +0200
> > Markus Armbruster  wrote:
> >
> >> Greg Kurz  writes:
> >> 
> >> > On Tue, 17 Mar 2020 18:16:17 +0300
> >> > Vladimir Sementsov-Ogievskiy  wrote:
> >> >
> >> >> Introduce a new ERRP_AUTO_PROPAGATE macro, to be used at start of
> >> >> functions with an errp OUT parameter.
> >> >> 
> >> >> It has three goals:
> >> >> 
> >> >> 1. Fix issue with error_fatal and error_prepend/error_append_hint: user
> >> >> can't see this additional information, because exit() happens in
> >> >> error_setg earlier than information is added. [Reported by Greg Kurz]
> >> >> 
> >> >
> >> > I have more of these coming and I'd really like to use 
> >> > ERRP_AUTO_PROPAGATE.
> >> >
> >> > It seems we have a consensus on the macro itself but this series is gated
> >> > by the conversion of the existing code base.
> >> >
> >> > What about merging this patch separately so that people can start using
> >> > it at least ?
> >> 
> >> Please give me a few more days to finish the work I feel should go in
> >> before the conversion.  With any luck, Vladimir can then rebase /
> >> recreate the conversion easily, and you can finally use the macro for
> >> your own work.
> >> 
> >
> > Sure. Thanks.
> 
> Just posted "[PATCH 00/46] Less clumsy error checking".  The sheer size
> of the thing and the length of its dependency chain explains why it took
> me so long.  I feel bad about delaying you all the same.  Apologies!
> 

No problem. This series of yours is impressive. Putting an end to the
hijacking of the Error ** argument is really a beneficial move.

> I hope we can converge quickly enough to get Vladimir's work on top
> ready in time for the soft freeze.
> 

I'll find some cycles for reviewing.

Cheers,

--
Greg



Re: [PATCH v10 1/9] error: auto propagated local_err

2020-06-15 Thread Greg Kurz
On Mon, 15 Jun 2020 07:21:03 +0200
Markus Armbruster  wrote:

> Greg Kurz  writes:
> 
> > On Tue, 17 Mar 2020 18:16:17 +0300
> > Vladimir Sementsov-Ogievskiy  wrote:
> >
> >> Introduce a new ERRP_AUTO_PROPAGATE macro, to be used at start of
> >> functions with an errp OUT parameter.
> >> 
> >> It has three goals:
> >> 
> >> 1. Fix issue with error_fatal and error_prepend/error_append_hint: user
> >> can't see this additional information, because exit() happens in
> >> error_setg earlier than information is added. [Reported by Greg Kurz]
> >> 
> >
> > I have more of these coming and I'd really like to use ERRP_AUTO_PROPAGATE.
> >
> > It seems we have a consensus on the macro itself but this series is gated
> > by the conversion of the existing code base.
> >
> > What about merging this patch separately so that people can start using
> > it at least ?
> 
> Please give me a few more days to finish the work I feel should go in
> before the conversion.  With any luck, Vladimir can then rebase /
> recreate the conversion easily, and you can finally use the macro for
> your own work.
> 

Sure. Thanks.



Re: [PATCH v10 1/9] error: auto propagated local_err

2020-06-10 Thread Greg Kurz
On Tue, 17 Mar 2020 18:16:17 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> Introduce a new ERRP_AUTO_PROPAGATE macro, to be used at start of
> functions with an errp OUT parameter.
> 
> It has three goals:
> 
> 1. Fix issue with error_fatal and error_prepend/error_append_hint: user
> can't see this additional information, because exit() happens in
> error_setg earlier than information is added. [Reported by Greg Kurz]
> 

I have more of these coming and I'd really like to use ERRP_AUTO_PROPAGATE.

It seems we have a consensus on the macro itself but this series is gated
by the conversion of the existing code base.

What about merging this patch separately so that people can start using
it at least ?

> 2. Fix issue with error_abort and error_propagate: when we wrap
> error_abort by local_err+error_propagate, the resulting coredump will
> refer to error_propagate and not to the place where error happened.
> (the macro itself doesn't fix the issue, but it allows us to [3.] drop
> the local_err+error_propagate pattern, which will definitely fix the
> issue) [Reported by Kevin Wolf]
> 
> 3. Drop local_err+error_propagate pattern, which is used to workaround
> void functions with errp parameter, when caller wants to know resulting
> status. (Note: actually these functions could be merely updated to
> return int error code).
> 
> To achieve these goals, later patches will add invocations
> of this macro at the start of functions with either use
> error_prepend/error_append_hint (solving 1) or which use
> local_err+error_propagate to check errors, switching those
> functions to use *errp instead (solving 2 and 3).
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Paul Durrant 
> Reviewed-by: Greg Kurz 
> Reviewed-by: Eric Blake 
> ---
> 
> Cc: Eric Blake 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> Cc: Greg Kurz 
> Cc: Christian Schoenebeck 
> Cc: Stefan Hajnoczi 
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Paul Durrant 
> Cc: "Philippe Mathieu-Daudé" 
> Cc: Laszlo Ersek 
> Cc: Gerd Hoffmann 
> Cc: Stefan Berger 
> Cc: Markus Armbruster 
> Cc: Michael Roth 
> Cc: qemu-de...@nongnu.org
> Cc: qemu-block@nongnu.org
> Cc: xen-de...@lists.xenproject.org
> 
>  include/qapi/error.h | 205 ---
>  1 file changed, 173 insertions(+), 32 deletions(-)
> 
> diff --git a/include/qapi/error.h b/include/qapi/error.h
> index ad5b6e896d..30140d9bfe 100644
> --- a/include/qapi/error.h
> +++ b/include/qapi/error.h
> @@ -15,6 +15,8 @@
>  /*
>   * Error reporting system loosely patterned after Glib's GError.
>   *
> + * = Deal with Error object =
> + *
>   * Create an error:
>   * error_setg(&err, "situation normal, all fouled up");
>   *
> @@ -47,28 +49,91 @@
>   * reporting it (primarily useful in testsuites):
>   * error_free_or_abort(&err);
>   *
> - * Pass an existing error to the caller:
> - * error_propagate(errp, err);
> - * where Error **errp is a parameter, by convention the last one.
> + * = Deal with Error ** function parameter =
>   *
> - * Pass an existing error to the caller with the message modified:
> - * error_propagate_prepend(errp, err);
> + * A function may use the error system to return errors. In this case, the
> + * function defines an Error **errp parameter, by convention the last one 
> (with
> + * exceptions for functions using ... or va_list).
>   *
> - * Avoid
> - * error_propagate(errp, err);
> - * error_prepend(errp, "Could not frobnicate '%s': ", name);
> - * because this fails to prepend when @errp is &error_fatal.
> + * The caller may then pass in the following errp values:
>   *
> - * Create a new error and pass it to the caller:
> + * 1. &error_abort
> + *Any error will result in abort().
> + * 2. &error_fatal
> + *Any error will result in exit() with a non-zero status.
> + * 3. NULL
> + *No error reporting through errp parameter.
> + * 4. The address of a NULL-initialized Error *err
> + *Any error will populate errp with an error object.
> + *
> + * The following rules then implement the correct semantics desired by the
> + * caller.
> + *
> + * Create a new error to pass to the caller:
>   * error_setg(errp, "situation normal, all fouled up");
>   *
> - * Call a function and receive an error from it:
> + * Calling another errp-based function:
> + * f(..., errp);
> + *
> + * == Checking success of subcall ==
> + *
> + * If a function returns a value indicating an error in addition to setting
> + * errp (which is recommended), then you don't need any additional code, just
> + * do:
> + *
> + * int ret = f(..., errp);
> + * 

Re: [PATCH v8 01/10] error: auto propagated local_err

2020-03-06 Thread Greg Kurz
On Fri,  6 Mar 2020 08:15:27 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> Here is introduced ERRP_AUTO_PROPAGATE macro, to be used at start of
> functions with an errp OUT parameter.
> 
> It has three goals:
> 
> 1. Fix issue with error_fatal and error_prepend/error_append_hint: user
> can't see this additional information, because exit() happens in
> error_setg earlier than information is added. [Reported by Greg Kurz]
> 
> 2. Fix issue with error_abort and error_propagate: when we wrap
> error_abort by local_err+error_propagate, the resulting coredump will
> refer to error_propagate and not to the place where error happened.
> (the macro itself doesn't fix the issue, but it allows us to [3.] drop
> the local_err+error_propagate pattern, which will definitely fix the
> issue) [Reported by Kevin Wolf]
> 
> 3. Drop local_err+error_propagate pattern, which is used to workaround
> void functions with errp parameter, when caller wants to know resulting
> status. (Note: actually these functions could be merely updated to
> return int error code).
> 
> To achieve these goals, later patches will add invocations
> of this macro at the start of functions with either use
> error_prepend/error_append_hint (solving 1) or which use
> local_err+error_propagate to check errors, switching those
> functions to use *errp instead (solving 2 and 3).
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 

Thanks for this impressive work Vladimir !

Reviewed-by: Greg Kurz 

> Cc: Eric Blake 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> Cc: Greg Kurz 
> Cc: Christian Schoenebeck 
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Paul Durrant 
> Cc: Stefan Hajnoczi 
> Cc: "Philippe Mathieu-Daudé" 
> Cc: Laszlo Ersek 
> Cc: Gerd Hoffmann 
> Cc: Stefan Berger 
> Cc: Markus Armbruster 
> Cc: Michael Roth 
> Cc: qemu-block@nongnu.org
> Cc: qemu-de...@nongnu.org
> Cc: xen-de...@lists.xenproject.org
> 
>  include/qapi/error.h | 203 ---
>  1 file changed, 170 insertions(+), 33 deletions(-)
> 
> diff --git a/include/qapi/error.h b/include/qapi/error.h
> index ad5b6e896d..bb9bcf02fb 100644
> --- a/include/qapi/error.h
> +++ b/include/qapi/error.h
> @@ -15,6 +15,8 @@
>  /*
>   * Error reporting system loosely patterned after Glib's GError.
>   *
> + * = Deal with Error object =
> + *
>   * Create an error:
>   * error_setg(&err, "situation normal, all fouled up");
>   *
> @@ -47,28 +49,88 @@
>   * reporting it (primarily useful in testsuites):
>   * error_free_or_abort(&err);
>   *
> - * Pass an existing error to the caller:
> - * error_propagate(errp, err);
> - * where Error **errp is a parameter, by convention the last one.
> + * = Deal with Error ** function parameter =
>   *
> - * Pass an existing error to the caller with the message modified:
> - * error_propagate_prepend(errp, err);
> + * Function may use error system to return errors. In this case function
> + * defines Error **errp parameter, which should be the last one (except for
> + * functions which varidic argument list), which has the following API:
>   *
> - * Avoid
> - * error_propagate(errp, err);
> - * error_prepend(errp, "Could not frobnicate '%s': ", name);
> - * because this fails to prepend when @errp is &error_fatal.
> + * Caller may pass as errp:
> + * 1. &error_abort
> + *This means abort on any error
> + * 2. &error_fatal
> + *Exit with non-zero return code on error
> + * 3. NULL
> + *Ignore errors
> + * 4. Another value
> + *On error allocate error object and set errp
>   *
> - * Create a new error and pass it to the caller:
> - * error_setg(errp, "situation normal, all fouled up");
> + * Error API functions with Error ** (like error_setg) argument supports 
> these
> + * rules, so user functions just need to use them appropriately (read below).
>   *
> - * Call a function and receive an error from it:
> + * Simple pass error to the caller:
> + * error_setg(errp, "Some error");
> + *
> + * Subcall of another errp-based function, passing the error to the caller
> + * f(..., errp);
> + *
> + * == Checking success of subcall ==
> + *
> + * If function returns error code in addition to errp (which is recommended),
> + * you don't need any additional code, just do:
> + * int ret = f(..., errp);
> + * if (ret < 0) {
> + * ... handle error ...
> + * return ret;
> + * }
> + *
> + * If function returns nothing (which is not recommended API) and the only 
> way
> + * to check success is checking errp, we must care about cases [1-3] above. 
> We
> + * need to use

Re: [PATCH v2 8/8] virtfs-proxy-helper: Convert documentation to rST

2020-01-24 Thread Greg Kurz
On Fri, 24 Jan 2020 16:26:06 +
Peter Maydell  wrote:

> The virtfs-proxy-helper documentation is currently in
> fsdev/virtfs-proxy-helper.texi in Texinfo format, which we
> present to the user as:
>  * a virtfs-proxy-helper manpage
>  * but not (unusually for QEMU) part of the HTML docs
> 
> Convert the documentation to rST format that lives in
> the docs/ subdirectory, and present it to the user as:
>  * a virtfs-proxy-helper manpage
>  * part of the interop/ Sphinx manual
> 
> There are minor formatting changes to suit Sphinx, but no
> content changes. In particular I've split the -u and -g
> options into each having their own description text.
> 
> Signed-off-by: Peter Maydell 
> ---

Thanks !

Acked-by: Greg Kurz 

>  Makefile |  7 ++-
>  MAINTAINERS  |  1 +
>  docs/interop/conf.py |  5 +-
>  docs/interop/index.rst   |  1 +
>  docs/interop/virtfs-proxy-helper.rst | 72 
>  fsdev/virtfs-proxy-helper.texi   | 63 
>  6 files changed, 81 insertions(+), 68 deletions(-)
>  create mode 100644 docs/interop/virtfs-proxy-helper.rst
>  delete mode 100644 fsdev/virtfs-proxy-helper.texi
> 
> diff --git a/Makefile b/Makefile
> index 5dded94bf63..e08882fd49f 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -354,7 +354,7 @@ DOCS+=docs/interop/qemu-ga-ref.html 
> docs/interop/qemu-ga-ref.txt docs/interop/qe
>  DOCS+=docs/qemu-cpu-models.7
>  DOCS+=$(MANUAL_BUILDDIR)/index.html
>  ifdef CONFIG_VIRTFS
> -DOCS+=fsdev/virtfs-proxy-helper.1
> +DOCS+=$(MANUAL_BUILDDIR)/interop/virtfs-proxy-helper.1
>  endif
>  ifdef CONFIG_TRACE_SYSTEMTAP
>  DOCS+=$(MANUAL_BUILDDIR)/interop/qemu-trace-stap.1
> @@ -859,7 +859,7 @@ endif
>  endif
>  ifdef CONFIG_VIRTFS
>   $(INSTALL_DIR) "$(DESTDIR)$(mandir)/man1"
> - $(INSTALL_DATA) fsdev/virtfs-proxy-helper.1 "$(DESTDIR)$(mandir)/man1"
> + $(INSTALL_DATA) $(MANUAL_BUILDDIR)/interop/virtfs-proxy-helper.1 
> "$(DESTDIR)$(mandir)/man1"
>  endif
>  
>  install-datadir:
> @@ -1051,7 +1051,7 @@ $(MANUAL_BUILDDIR)/system/index.html: $(call 
> manual-deps,system)
>   $(call build-manual,system,html)
>  
>  $(call define-manpage-rule,interop,\
> -   qemu-ga.8 qemu-img.1 qemu-nbd.8 qemu-trace-stap.1,\
> +   qemu-ga.8 qemu-img.1 qemu-nbd.8 qemu-trace-stap.1 
> virtfs-proxy-helper.1,\
> $(SRC_PATH/qemu-img-cmds.hx))
>  
>  $(call define-manpage-rule,system,qemu-block-drivers.7)
> @@ -1078,7 +1078,6 @@ docs/interop/qemu-ga-qapi.texi: 
> qga/qapi-generated/qga-qapi-doc.texi
>  
>  qemu.1: qemu-doc.texi qemu-options.texi qemu-monitor.texi 
> qemu-monitor-info.texi
>  qemu.1: qemu-option-trace.texi
> -fsdev/virtfs-proxy-helper.1: fsdev/virtfs-proxy-helper.texi
>  docs/qemu-cpu-models.7: docs/qemu-cpu-models.texi
>  
>  html: qemu-doc.html docs/interop/qemu-qmp-ref.html 
> docs/interop/qemu-ga-ref.html sphinxdocs
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 54c4429069d..83fb32b8601 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1573,6 +1573,7 @@ S: Odd Fixes
>  F: hw/9pfs/
>  X: hw/9pfs/xen-9p*
>  F: fsdev/
> +F: docs/interop/virtfs-proxy-helper.rst
>  F: tests/qtest/virtio-9p-test.c
>  T: git https://github.com/gkurz/qemu.git 9p-next
>  
> diff --git a/docs/interop/conf.py b/docs/interop/conf.py
> index baea7fb50ee..b0f322207ca 100644
> --- a/docs/interop/conf.py
> +++ b/docs/interop/conf.py
> @@ -24,5 +24,8 @@ man_pages = [
>  ('qemu-nbd', 'qemu-nbd', u'QEMU Disk Network Block Device Server',
>   ['Anthony Liguori '], 8),
>  ('qemu-trace-stap', 'qemu-trace-stap', u'QEMU SystemTap trace tool',
> - [], 1)
> + [], 1),
> +('virtfs-proxy-helper', 'virtfs-proxy-helper',
> + u'QEMU 9p virtfs proxy filesystem helper',
> + ['M. Mohan Kumar'], 1)
>  ]
> diff --git a/docs/interop/index.rst b/docs/interop/index.rst
> index d756a826b26..3b763b1eebe 100644
> --- a/docs/interop/index.rst
> +++ b/docs/interop/index.rst
> @@ -23,3 +23,4 @@ Contents:
> qemu-trace-stap
> vhost-user
> vhost-user-gpu
> +   virtfs-proxy-helper
> diff --git a/docs/interop/virtfs-proxy-helper.rst 
> b/docs/interop/virtfs-proxy-helper.rst
> new file mode 100644
> index 000..6cdeedf8e93
> --- /dev/null
> +++ b/docs/interop/virtfs-proxy-helper.rst
> @@ -0,0 +1,72 @@
> +QEMU 9p virtfs proxy filesystem helper
> +======================================
> +
> +Synopsis
> +--------
> +
> +**virtfs-proxy-helper** [*OPTIONS*]
> +
> +Description
> +-----------
> +
> +Pass-through security model in QEMU 9p server needs root privilege

Re: [PATCH v6 02/11] error: auto propagated local_err

2020-01-15 Thread Greg Kurz
On Fri, 10 Jan 2020 22:41:49 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> Introduce the ERRP_AUTO_PROPAGATE macro, to be used at the start of
> functions with an errp OUT parameter.
> 
> It has three goals:
> 
> 1. Fix issue with error_fatal & error_prepend/error_append_hint: user
> can't see this additional information, because exit() happens in
> error_setg earlier than information is added. [Reported by Greg Kurz]
> 
> 2. Fix issue with error_abort & error_propagate: when we wrap
> error_abort by local_err+error_propagate, resulting coredump will
> refer to error_propagate and not to the place where error happened.
> (the macro itself doesn't fix the issue, but it allows to [3.] drop all
> local_err+error_propagate pattern, which will definitely fix the issue)
> [Reported by Kevin Wolf]
> 
> 3. Drop local_err+error_propagate pattern, which is used to workaround
> void functions with errp parameter, when caller wants to know resulting
> status. (Note: actually these functions could be merely updated to
> return int error code).
> 
> To achieve these goals, we need to add an invocation of the macro at the
> start of functions which need error_prepend/error_append_hint (1.), and
> at the start of functions which use the local_err+error_propagate
> pattern to check errors, dropping the local errors from them and just
> using *errp instead (2., 3.).
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 

LGTM

Reviewed-by: Greg Kurz 

> CC: Cornelia Huck 
> CC: Eric Blake 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Greg Kurz 
> CC: Stefan Hajnoczi 
> CC: Stefano Stabellini 
> CC: Anthony Perard 
> CC: Paul Durrant 
> CC: "Philippe Mathieu-Daudé" 
> CC: Laszlo Ersek 
> CC: Gerd Hoffmann 
> CC: Stefan Berger 
> CC: Markus Armbruster 
> CC: Michael Roth 
> CC: qemu-block@nongnu.org
> CC: xen-de...@lists.xenproject.org
> 
>  include/qapi/error.h | 84 +++-
>  1 file changed, 83 insertions(+), 1 deletion(-)
> 
> diff --git a/include/qapi/error.h b/include/qapi/error.h
> index fa8d51fd6d..532b9afb9e 100644
> --- a/include/qapi/error.h
> +++ b/include/qapi/error.h
> @@ -78,7 +78,7 @@
>   * Call a function treating errors as fatal:
>   * foo(arg, &error_fatal);
>   *
> - * Receive an error and pass it on to the caller:
> + * Receive an error and pass it on to the caller (DEPRECATED*):
>   * Error *err = NULL;
>   * foo(arg, &err);
>   * if (err) {
> @@ -98,6 +98,50 @@
>   * foo(arg, errp);
>   * for readability.
>   *
> + * DEPRECATED* This pattern is deprecated now, use ERRP_AUTO_PROPAGATE macro
> + * instead (defined below).
> + * It's deprecated because of two things:
> + *
> + * 1. Issue with error_abort & error_propagate: when we wrap error_abort by
> + * local_err+error_propagate, resulting coredump will refer to 
> error_propagate
> + * and not to the place where error happened.
> + *
> + * 2. A lot of extra code of the same pattern
> + *
> + * How to update old code to use ERRP_AUTO_PROPAGATE?
> + *
> + * All you need is to add ERRP_AUTO_PROPAGATE() invocation at function start,
> + * then you may safely dereference errp to check errors and do not need any
> + * additional local Error variables or calls to error_propagate().
> + *
> + * Example:
> + *
> + * old code
> + *
> + * void fn(..., Error **errp) {
> + * Error *err = NULL;
> + * foo(arg, &err);
> + * if (err) {
> + * handle the error...
> + * error_propagate(errp, err);
> + * return;
> + * }
> + * ...
> + * }
> + *
> + * updated code
> + *
> + * void fn(..., Error **errp) {
> + * ERRP_AUTO_PROPAGATE();
> + * foo(arg, errp);
> + * if (*errp) {
> + * handle the error...
> + * return;
> + * }
> + * ...
> + * }
> + *
> + *
>   * Receive and accumulate multiple errors (first one wins):
>   * Error *err = NULL, *local_err = NULL;
>   * foo(arg, &err);
> @@ -348,6 +392,44 @@ void error_set_internal(Error **errp,
>  ErrorClass err_class, const char *fmt, ...)
>  GCC_FMT_ATTR(6, 7);
>  
> +typedef struct ErrorPropagator {
> +Error *local_err;
> +Error **errp;
> +} ErrorPropagator;
> +
> +static inline void error_propagator_cleanup(ErrorPropagator *prop)
> +{
> +error_propagate(prop->errp, prop->local_err);
> +}
> +
> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagator, error_propagator_cleanup);
> +
> +/*
> + * ERRP_AUTO_PROPAGATE
> + *
> + * This macro is created to be the fi

Re: [PATCH v6 01/11] qapi/error: add (Error **errp) cleaning APIs

2020-01-14 Thread Greg Kurz
On Fri, 10 Jan 2020 22:41:48 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 

Reviewed-by: Greg Kurz 

> CC: Cornelia Huck 
> CC: Eric Blake 
> CC: Kevin Wolf 
> CC: Max Reitz 
> CC: Greg Kurz 
> CC: Stefan Hajnoczi 
> CC: Stefano Stabellini 
> CC: Anthony Perard 
> CC: Paul Durrant 
> CC: "Philippe Mathieu-Daudé" 
> CC: Laszlo Ersek 
> CC: Gerd Hoffmann 
> CC: Stefan Berger 
> CC: Markus Armbruster 
> CC: Michael Roth 
> CC: qemu-block@nongnu.org
> CC: xen-de...@lists.xenproject.org
> 
>  include/qapi/error.h | 26 ++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/include/qapi/error.h b/include/qapi/error.h
> index ad5b6e896d..fa8d51fd6d 100644
> --- a/include/qapi/error.h
> +++ b/include/qapi/error.h
> @@ -309,6 +309,32 @@ void warn_reportf_err(Error *err, const char *fmt, ...)
>  void error_reportf_err(Error *err, const char *fmt, ...)
>  GCC_FMT_ATTR(2, 3);
>  
> +/*
> + * Functions to clean Error **errp: call corresponding Error *err cleaning
> + * function and set pointer to NULL
> + */
> +static inline void error_free_errp(Error **errp)
> +{
> +assert(errp && *errp);
> +error_free(*errp);
> +*errp = NULL;
> +}
> +
> +static inline void error_report_errp(Error **errp)
> +{
> +assert(errp && *errp);
> +error_report_err(*errp);
> +*errp = NULL;
> +}
> +
> +static inline void warn_report_errp(Error **errp)
> +{
> +assert(errp && *errp);
> +warn_report_err(*errp);
> +*errp = NULL;
> +}
> +
> +
>  /*
>   * Just like error_setg(), except you get to specify the error class.
>   * Note: use of error classes other than ERROR_CLASS_GENERIC_ERROR is




Re: [PATCH v4 04/31] error: auto propagated local_err

2019-10-08 Thread Greg Kurz
On Tue, 08 Oct 2019 18:03:13 +0200
Markus Armbruster  wrote:

> Vladimir Sementsov-Ogievskiy  writes:
> 
> > Introduce the ERRP_AUTO_PROPAGATE macro, to be used at the start of
> > functions with an errp OUT parameter.
> >
> > It has three goals:
> >
> > 1. Fix issue with error_fatal & error_prepend/error_append_hint: user
> > can't see this additional information, because exit() happens in
> > error_setg earlier than information is added. [Reported by Greg Kurz]
> >
> > 2. Fix issue with error_abort & error_propagate: when we wrap
> > error_abort by local_err+error_propagate, resulting coredump will
> > refer to error_propagate and not to the place where error happened.
> > (the macro itself doesn't fix the issue, but it allows to [3.] drop all
> > local_err+error_propagate pattern, which will definitely fix the issue)
> > [Reported by Kevin Wolf]
> >
> > 3. Drop local_err+error_propagate pattern, which is used to workaround
> > void functions with errp parameter, when caller wants to know resulting
> > status. (Note: actually these functions could be merely updated to
> > return int error code).
> 
> Starting with stating your goals is an excellent idea.  But I'd love to
> next read a high-level description of how your patch achieves or enables
> achieving these goals.
> 
> > Signed-off-by: Vladimir Sementsov-Ogievskiy 
> > ---
> [...]
> > diff --git a/include/qapi/error.h b/include/qapi/error.h
> > index 9376f59c35..02f967ac1d 100644
> > --- a/include/qapi/error.h
> > +++ b/include/qapi/error.h
> > @@ -322,6 +322,43 @@ void error_set_internal(Error **errp,
> >  ErrorClass err_class, const char *fmt, ...)
> >  GCC_FMT_ATTR(6, 7);
> >  
> > +typedef struct ErrorPropagator {
> > +Error *local_err;
> > +Error **errp;
> > +} ErrorPropagator;
> > +
> > +static inline void error_propagator_cleanup(ErrorPropagator *prop)
> > +{
> > +error_propagate(prop->errp, prop->local_err);
> > +}
> > +
> > +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagator, 
> > error_propagator_cleanup);
> > +
> > +/*
> > + * ERRP_AUTO_PROPAGATE
> > + *
> > + * This macro is created to be the first line of a function with Error 
> > **errp
> > + * OUT parameter. It's needed only in cases where we want to use 
> > error_prepend,
> > + * error_append_hint or dereference *errp. It's still safe (but useless) in
> > + * other cases.
> > + *
> > + * If errp is NULL or points to error_fatal, it is rewritten to point to a
> > + * local Error object, which will be automatically propagated to the 
> > original
> > + * errp on function exit (see error_propagator_cleanup).
> > + *
> > + * After invocation of this macro it is always safe to dereference errp
> > + * (as it's not NULL anymore) and to append hints (by error_append_hint)
> > + * (as, if it was error_fatal, we swapped it with a local_error to be
> > + * propagated on cleanup).
> 
> Well, appending hints was always safe, it just didn't work with
> &error_fatal.  Don't worry about that now, I'll probably want to polish
> this contract comment a bit anyway, but later.
> 

FWIW I've already posted this:

Author: Greg Kurz 
Date:   Mon Oct 7 15:45:46 2019 +0200

error: Update error_append_hint()'s documentation

error_setg() and error_propagate(), as well as their variants, cause
QEMU to terminate when called with &error_fatal or &error_abort. This
prevents hints from being added, since error_append_hint() isn't even
called in this case.

It means that error_append_hint() should only be used with a local
error object, which is then propagated to the caller.

Document this in  .

Signed-off-by: Greg Kurz 

Message-id: <156871563702.196432.5964411202152101367.st...@bahia.lan>
https://patchwork.ozlabs.org/patch/1163278/
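
To illustrate the pattern that commit message documents, here is a minimal
sketch. Everything in it is an assumption made for the example: the Error
type and the error_* helpers are simplified stand-ins written here, not
QEMU's real implementation, and open_image() is a hypothetical caller. It
only shows why the hint survives when it is attached to a local error
before propagation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for QEMU's Error (assumption, not the real struct). */
typedef struct Error {
    char msg[128];
    char hint[128];
} Error;

/* Sentinel: passing &error_fatal means "report the error and exit". */
static Error *error_fatal;

static void error_handle_fatal(Error **errp, Error *err)
{
    if (errp == &error_fatal) {
        fprintf(stderr, "%s\n%s", err->msg, err->hint);
        exit(1);
    }
}

static void error_setg(Error **errp, const char *msg)
{
    Error *err;

    if (!errp) {
        return;
    }
    err = calloc(1, sizeof(*err));
    assert(err);
    snprintf(err->msg, sizeof(err->msg), "%s", msg);
    error_handle_fatal(errp, err);  /* exits before any hint can be added */
    *errp = err;
}

static void error_append_hint(Error **errp, const char *hint)
{
    if (errp && *errp) {
        char *h = (*errp)->hint;

        strncat(h, hint, sizeof((*errp)->hint) - strlen(h) - 1);
    }
}

static void error_propagate(Error **dst_errp, Error *local_err)
{
    if (!local_err) {
        return;
    }
    error_handle_fatal(dst_errp, local_err);  /* hint is already attached */
    if (dst_errp && !*dst_errp) {
        *dst_errp = local_err;
    } else {
        free(local_err);
    }
}

/* Hypothetical caller: set a local error, add the hint, then propagate. */
int open_image(const char *path, Error **errp)
{
    Error *local_err = NULL;

    (void)path;
    error_setg(&local_err, "Could not open image");
    error_append_hint(&local_err, "Check that the file exists\n");
    error_propagate(errp, local_err);
    return -1;
}
```

With this shape, a caller passing &error_fatal would only reach the exit
path inside error_propagate(), at which point the hint is already part of
the error.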

> > + *
> > + * Note: we don't wrap the error_abort case, as we want resulting coredump
> > + * to point to the place where the error happened, not to error_propagate.
> > + */
> > +#define ERRP_AUTO_PROPAGATE() \
> > +g_auto(ErrorPropagator) __auto_errp_prop = {.errp = errp}; \
> 
> Took me a second to realize: the macro works, because the initializer
> implicitly initializes .local_error = NULL.
> 
> __auto_errp_prop is an identifier reserved for any use.  I think we
> could use _auto_errp_prop, which is only reserved for use as identifiers
> with file scope in both the ordinary and tag name spaces.  See ISO/IEC
> 9899:1999 7.1.3 Reserved identifiers.
> 
> > +errp = ((errp == NULL || *errp == error_fatal) ? \

Re: [PATCH v4 00/31] error: auto propagated local_err

2019-10-08 Thread Greg Kurz
On Tue, 8 Oct 2019 08:41:08 +
Vladimir Sementsov-Ogievskiy  wrote:

> 08.10.2019 10:30, Markus Armbruster wrote:
> > Vladimir Sementsov-Ogievskiy  writes:
> > 
> >> Hi all!
> >>
> >> Here is a proposal of auto propagation for local_err, to not call
> >> error_propagate on every exit point, when we deal with local_err.
> >>
> >> There are also two issues with errp:
> >>
> >> 1. error_fatal & error_append_hint/error_prepend: user can't see this
> >> additional info, because exit() happens in error_setg earlier than info
> >> is added. [Reported by Greg Kurz]
> > 
> > How is this series related to Greg's "[PATCH 00/17] Fix usage of
> > error_append_hint()"?  Do we need both?
> 
> This series is a substitute for Greg's. Still, there are problems with
> automation, which Greg pointed out in 21/31, and I don't know what to do next.
> 
> Maybe just continue to review patches and fix them by hand. Maybe try to
> improve automation...
> 

The feeling I have after working on my series is that the lines that deal
with errors are mixed up with the functional code in a variety of ways.
That makes it very difficult, if not impossible, to come up with code
patterns suitable for a 100% automated solution IMHO.

My question is actually more about the semantics of error_fatal. What
does passing &error_fatal give us over passing &local_err and calling
error_report_err()+exit(), apart from breaking error_append_hint() and
error_prepend()?

> > 
> >> 2. error_abort & error_propagate: when we wrap
> >> error_abort by local_err+error_propagate, resulting coredump will
> >> refer to error_propagate and not to the place where error happened.
> >> (the macro itself don't fix the issue, but it allows to [3.] drop all
> >> local_err+error_propagate pattern, which will definitely fix the issue)
> >> [Reported by Kevin Wolf]
> >>
> >> Still, applying new macro to all errp-functions is a huge task, which is
> >> impossible to solve in one series.
> >>
> >> So, here is a minimum: solve only [1.], by adding new macro to all
> >> errp-functions which wants to call error_append_hint.
> >>
> >> v4;
> >> 02: - check errp to be not NULL
> >>  - drop Eric's r-b
> >> 03: add Eric's r-b
> >> 04: - rename macro to ERRP_AUTO_PROPAGATE [Kevin]
> >>  - improve comment and commit msg, mention
> >>error_prepend
> >> 05: - handle error_prepend too
> >>  - use new macro name
> >>  - drop empty line at the end
> >>
> >> commit message for auto-generated commits updated,
> >> commits regenerated.
> >>
> >> I'll use cc-cmd to cc appropriate recipients per patch, still
> >> cover-letter together with 04-06 patches should be interesting for
> >> all:
> > [...]
> > 
> > Big series; let me guess its structure.
> > 
> >> Vladimir Sementsov-Ogievskiy (31):
> >>errp: rename errp to errp_in where it is IN-argument
> >>hw/core/loader-fit: fix freeing errp in fit_load_fdt
> >>net/net: fix local variable shadowing in net_client_init
> > 
> > Preparations.
> > 
> >>error: auto propagated local_err
> > 
> > The new idea.
> > 
> >>scripts: add script to fix error_append_hint/error_prepend usage
> >>python: add commit-per-subsystem.py
> > 
> > Scripts to help you apply it.
> > 
> >>s390: Fix error_append_hint/error_prepend usage
> >>ARM TCG CPUs: Fix error_append_hint/error_prepend usage
> >>PowerPC TCG CPUs: Fix error_append_hint/error_prepend usage
> >>arm: Fix error_append_hint/error_prepend usage
> >>SmartFusion2: Fix error_append_hint/error_prepend usage
> >>ASPEED BMCs: Fix error_append_hint/error_prepend usage
> >>Boston: Fix error_append_hint/error_prepend usage
> >>PowerNV (Non-Virtualized): Fix error_append_hint/error_prepend usage
> >>PCI: Fix error_append_hint/error_prepend usage
> >>SCSI: Fix error_append_hint/error_prepend usage
> >>USB: Fix error_append_hint/error_prepend usage
> >>VFIO: Fix error_append_hint/error_prepend usage
> >>vhost: Fix error_append_hint/error_prepend usage
> >>virtio: Fix error_append_hint/error_prepend usage
> >>virtio-9p: Fix error_append_hint/error_prepend usage
> >>XIVE: Fix error_append_hint/error_prepend usage
> >>block: Fix error_append_hint/error_prepend usage
> >>chardev: Fix error_append_hint/error_prepend usage
> >>cmdline: Fix error_append_hint/error_prepend usage
> >>QOM: Fix error_append_hint/error_prepend usage
> >>Migration: Fix error_append_hint/error_prepend usage
> >>Sockets: Fix error_append_hint/error_prepend usage
> >>nbd: Fix error_append_hint/error_prepend usage
> >>PVRDMA: Fix error_append_hint/error_prepend usage
> >>ivshmem: Fix error_append_hint/error_prepend usage
> > 
> > Applying it.
> > 
> > Correct?
> > 
> 
> Yes
> 
> 




Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Greg Kurz
On Thu, 19 Sep 2019 09:28:11 +
Vladimir Sementsov-Ogievskiy  wrote:

> 19.09.2019 11:59, Greg Kurz wrote:
> > On Wed, 18 Sep 2019 16:02:44 +0300
> > Vladimir Sementsov-Ogievskiy  wrote:
> > 
> >> Hi all!
> >>
> >> Here is a proposal (three of them, actually) of auto propagation for
> >> local_err, to not call error_propagate on every exit point, when we
> >> deal with local_err.
> >>
> >> It also may help make Greg's series[1] about error_append_hint smaller.
> >>
> > 
> > This will depend on whether we reach a consensus soon enough (soft
> > freeze for 4.2 is 2019-10-29). Otherwise, I think my series should
> > be merged as is, and auto-propagation to be delayed to 4.3.
> > 
> >> See definitions and examples below.
> >>
> >> I'm cc-ing to this RFC everyone from series[1] CC list, as if we like
> >> it, the idea will touch same code (and may be more).
> >>
> > 
> > When we have a good auto-propagation mechanism available, I guess
> > this can be beneficial for most of the code, not only the places
> > where we add hints (or prepend something, see below).
> > 
> >> [1]: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg03449.html
> >>
> >> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> >> ---
> >>   include/qapi/error.h | 102 +++
> >>   block.c  |  63 --
> >>   block/backup.c   |   8 +++-
> >>   block/gluster.c  |   7 +++
> >>   4 files changed, 144 insertions(+), 36 deletions(-)
> >>
> >> diff --git a/include/qapi/error.h b/include/qapi/error.h
> >> index 3f95141a01..083e061014 100644
> >> --- a/include/qapi/error.h
> >> +++ b/include/qapi/error.h
> >> @@ -322,6 +322,108 @@ void error_set_internal(Error **errp,
> >>   ErrorClass err_class, const char *fmt, ...)
> >>   GCC_FMT_ATTR(6, 7);
> >>   
> >> +typedef struct ErrorPropagator {
> >> +Error **errp;
> >> +Error *local_err;
> >> +} ErrorPropagator;
> >> +
> >> +static inline void error_propagator_cleanup(ErrorPropagator *prop)
> >> +{
> >> +if (prop->local_err) {
> >> +error_propagate(prop->errp, prop->local_err);
> > 
> > We also have error_propagate_prepend(), which is basically a variant of
> > error_prepend()+error_propagate() that can cope with &error_fatal. This
> > was introduced by Markus in commit 4b5766488fd3, for reasons similar to
> > those that motivated my series. It has ~30 users in the tree.
> > 
> > There's no such thing as a generic cleanup function with a printf-like
> > interface, so all of these should be converted back to error_prepend()
> > if we go for auto-propagation.
> > 
> > Aside from propagation, one can sometimes choose to call things like
> > error_free() or error_free_or_abort()... Auto-propagation should
> > detect that and not call error_propagate() in this case.
> 
> Hmm, for example, qmp_eject, which error_free or error_propagate..
> We can leave such cases as is, not many of them. Or introduce
> safe_errp_free(Error **errp), which will zero pointer after freeing.
> 

Maybe even change error_free() to take an Error **? It looks
safe to zero out a dangling pointer. Of course, the API change
would need to be propagated to all error_* functions that
explicitly call error_free().
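
A rough sketch of what I mean, under loud assumptions: the Error type and
helpers below are simplified stand-ins written for this example, not QEMU
code. error_free_errp(), ERRP_AUTO_PROPAGATE_SKETCH(), probe() and setup()
are hypothetical names, and the macro uses the GCC/Clang cleanup attribute
directly instead of g_auto(). It also always redirects errp to the local
error, which the real proposal does only in some cases.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for QEMU's Error (assumption). */
typedef struct Error {
    const char *msg;
} Error;

static void error_setg(Error **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);
    *errp = malloc(sizeof(Error));
    assert(*errp);
    (*errp)->msg = msg;
}

static void error_propagate(Error **dst_errp, Error *local_err)
{
    if (!local_err) {
        return;
    }
    if (dst_errp && !*dst_errp) {
        *dst_errp = local_err;
    } else {
        free(local_err);
    }
}

/* Hypothetical Error ** flavour of error_free(): free, then clear. */
static void error_free_errp(Error **errp)
{
    assert(errp && *errp);
    free(*errp);
    *errp = NULL;
}

typedef struct ErrorPropagator {
    Error *local_err;
    Error **errp;
} ErrorPropagator;

/* Runs automatically when the propagator variable goes out of scope. */
static void error_propagator_cleanup(ErrorPropagator *prop)
{
    error_propagate(prop->errp, prop->local_err);
}

/* Sketch only: redirect errp to a local error that is propagated on exit. */
#define ERRP_AUTO_PROPAGATE_SKETCH() \
    __attribute__((cleanup(error_propagator_cleanup))) \
    ErrorPropagator auto_errp_prop_ = { .errp = errp }; \
    errp = &auto_errp_prop_.local_err

static void probe(int fail, Error **errp)
{
    if (fail) {
        error_setg(errp, "probe failed");
    }
}

/* A probe failure is swallowed when 'optional' is set. */
void setup(int fail, int optional, Error **errp)
{
    ERRP_AUTO_PROPAGATE_SKETCH();

    probe(fail, errp);
    if (*errp && optional) {
        /* Freeing clears *errp, so the cleanup has nothing to propagate. */
        error_free_errp(errp);
    }
}
```

Since the pointer is zeroed on free, the cleanup function doesn't need any
extra logic to detect that the error was already handled.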

> > 
> >> +}
> >> +}
> >> +
> >> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagator, 
> >> error_propagator_cleanup);
> >> +
> >> +/*
> >> + * ErrorPropagationPair
> >> + *
> >> + * [Error *local_err, Error **errp]
> >> + *
> >> + * First element is local_err, second is original errp, which is 
> >> propagation
> >> + * target. Yes, errp has a bit another type, so it should be converted.
> >> + *
> >> + * ErrorPropagationPair may be used as errp, which points to local_err,
> >> + * as it's type is compatible.
> >> + */
> >> +typedef Error *ErrorPropagationPair[2];
> >> +
> >> +static inline void error_propagation_pair_cleanup(ErrorPropagationPair 
> >> *arr)
> >> +{
> >> +Error *local_err = (*arr)[0];
> >> +Error **errp = (Error **)(*arr)[1];
> >> +
> >> +if (local_err) {
> >> +error_propagate(errp, local_err);
> >> +}
> >> +}
> >> +
> >> +G_DEFINE_AUTO_CLEANUP_CLE

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Greg Kurz
On Wed, 18 Sep 2019 16:02:44 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> Hi all!
> 
> Here is a proposal (three of them, actually) of auto propagation for
> local_err, to not call error_propagate on every exit point, when we
> deal with local_err.
> 
> It also may help make Greg's series[1] about error_append_hint smaller.
> 

This will depend on whether we reach a consensus soon enough (soft
freeze for 4.2 is 2019-10-29). Otherwise, I think my series should
be merged as is, and auto-propagation delayed to 4.3.

> See definitions and examples below.
> 
> I'm cc-ing to this RFC everyone from series[1] CC list, as if we like
> it, the idea will touch same code (and may be more).
> 

When we have a good auto-propagation mechanism available, I guess
this can be beneficial for most of the code, not only the places
where we add hints (or prepend something, see below).

> [1]: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg03449.html
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/qapi/error.h | 102 +++
>  block.c  |  63 --
>  block/backup.c   |   8 +++-
>  block/gluster.c  |   7 +++
>  4 files changed, 144 insertions(+), 36 deletions(-)
> 
> diff --git a/include/qapi/error.h b/include/qapi/error.h
> index 3f95141a01..083e061014 100644
> --- a/include/qapi/error.h
> +++ b/include/qapi/error.h
> @@ -322,6 +322,108 @@ void error_set_internal(Error **errp,
>  ErrorClass err_class, const char *fmt, ...)
>  GCC_FMT_ATTR(6, 7);
>  
> +typedef struct ErrorPropagator {
> +Error **errp;
> +Error *local_err;
> +} ErrorPropagator;
> +
> +static inline void error_propagator_cleanup(ErrorPropagator *prop)
> +{
> +if (prop->local_err) {
> +error_propagate(prop->errp, prop->local_err);

We also have error_propagate_prepend(), which is basically a variant of
error_prepend()+error_propagate() that can cope with &error_fatal. This
was introduced by Markus in commit 4b5766488fd3, for reasons similar to
those that motivated my series. It has ~30 users in the tree.

There's no such thing as a generic cleanup function with a printf-like
interface, so all of these should be converted back to error_prepend()
if we go for auto-propagation.

Aside from propagation, one can sometimes choose to call things like
error_free() or error_free_or_abort()... Auto-propagation should
detect that and not call error_propagate() in this case.
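
To make the conversion concrete, here is a small sketch of the two styles
side by side. It rests on assumptions: the Error type and error_* helpers
are simplified stand-ins written for this example (no &error_fatal or
&error_abort handling at all), the macro is a hypothetical
ERRP_AUTO_PROPAGATE_SKETCH() built on the GCC/Clang cleanup attribute, and
child_realize(), realize_old() and realize_new() are made-up functions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for QEMU's Error API (assumption). */
typedef struct Error { char *msg; } Error;

static char *concat(const char *a, const char *b)
{
    char *s = malloc(strlen(a) + strlen(b) + 1);

    assert(s);
    strcpy(s, a);
    return strcat(s, b);
}

static void error_free(Error *err)
{
    if (err) {
        free(err->msg);
        free(err);
    }
}

static void error_setg(Error **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);
    *errp = malloc(sizeof(Error));
    assert(*errp);
    (*errp)->msg = concat("", msg);
}

static void error_propagate(Error **dst_errp, Error *local_err)
{
    if (dst_errp && !*dst_errp) {
        *dst_errp = local_err;
    } else {
        error_free(local_err);
    }
}

static void error_vprepend(Error **errp, const char *fmt, va_list ap)
{
    char prefix[128];
    char *joined;

    if (!errp || !*errp) {
        return;
    }
    vsnprintf(prefix, sizeof(prefix), fmt, ap);
    joined = concat(prefix, (*errp)->msg);
    free((*errp)->msg);
    (*errp)->msg = joined;
}

static void error_prepend(Error **errp, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    error_vprepend(errp, fmt, ap);
    va_end(ap);
}

static void error_propagate_prepend(Error **dst_errp, Error *err,
                                    const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    error_vprepend(&err, fmt, ap);
    va_end(ap);
    error_propagate(dst_errp, err);
}

typedef struct ErrorPropagator { Error *local_err; Error **errp; } ErrorPropagator;

static void error_propagator_cleanup(ErrorPropagator *prop)
{
    error_propagate(prop->errp, prop->local_err);
}

/* Sketch only: always redirect errp to a local error propagated on exit. */
#define ERRP_AUTO_PROPAGATE_SKETCH() \
    __attribute__((cleanup(error_propagator_cleanup))) \
    ErrorPropagator auto_errp_prop_ = { .errp = errp }; \
    errp = &auto_errp_prop_.local_err

static void child_realize(Error **errp)
{
    error_setg(errp, "no such device");
}

/* Current style: the prefix goes through error_propagate_prepend(). */
void realize_old(int id, Error **errp)
{
    Error *local_err = NULL;

    child_realize(&local_err);
    if (local_err) {
        error_propagate_prepend(errp, local_err, "child %d: ", id);
    }
}

/* Auto-propagation style: plain error_prepend(), cleanup does the rest. */
void realize_new(int id, Error **errp)
{
    ERRP_AUTO_PROPAGATE_SKETCH();

    child_realize(errp);
    if (*errp) {
        error_prepend(errp, "child %d: ", id);
    }
}
```

Both variants should hand the same prefixed message to the caller; the
difference is only in who calls error_propagate().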

> +}
> +}
> +
> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagator, error_propagator_cleanup);
> +
> +/*
> + * ErrorPropagationPair
> + *
> + * [Error *local_err, Error **errp]
> + *
> + * First element is local_err, second is original errp, which is propagation
> + * target. Yes, errp has a bit another type, so it should be converted.
> + *
> + * ErrorPropagationPair may be used as errp, which points to local_err,
> + * as its type is compatible.
> + */
> +typedef Error *ErrorPropagationPair[2];
> +
> +static inline void error_propagation_pair_cleanup(ErrorPropagationPair *arr)
> +{
> +Error *local_err = (*arr)[0];
> +Error **errp = (Error **)(*arr)[1];
> +
> +if (local_err) {
> +error_propagate(errp, local_err);
> +}
> +}
> +
> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagationPair,
> + error_propagation_pair_cleanup);
> +
> +/*
> + * DEF_AUTO_ERRP
> + *
> + * Define auto_errp variable, which may be used instead of errp, and
> + * *auto_errp may be safely checked to be zero or not, and may be safely
> + * used for error_append_hint(). auto_errp is automatically propagated
> + * to errp at function exit.
> + */
> +#define DEF_AUTO_ERRP(auto_errp, errp) \
> +g_auto(ErrorPropagationPair) (auto_errp) = {NULL, (Error *)(errp)}
> +
> +
> +/*
> + * Another variant:
> + *   Pros:
> + * - normal structure instead of cheating with array
> + * - we can directly use errp, if it's not NULL and don't point to
> + *   error_abort or error_fatal
> + *   Cons:
> + * - we need to define two variables instead of one
> + */
> +typedef struct ErrorPropagationStruct {
> +Error *local_err;
> +Error **errp;
> +} ErrorPropagationStruct;
> +
> +static inline void error_propagation_struct_cleanup(ErrorPropagationStruct 
> *prop)
> +{
> +if (prop->local_err) {
> +error_propagate(prop->errp, prop->local_err);
> +}
> +}
> +
> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagationStruct,
> + error_propagation_struct_cleanup);
> +
> +#define DEF_AUTO_ERRP_V2(auto_errp, errp) \
> +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
> +Error **auto_errp = \
> +((errp) == NULL || *(errp) == error_abort || *(errp) == error_fatal) 
> ? \
> +&__auto_errp_prop.local_err : \
> +(errp);
> +
> +/*
> + * Third variant:
> + *   Pros:
> + * - simpler movement for functions which don't have local_err yet
> + *   the 

Re: [Qemu-block] [Qemu-devel] [PATCH 02/17] block: Pass local error object pointer to error_append_hint()

2019-09-18 Thread Greg Kurz
On Tue, 17 Sep 2019 17:40:11 +
Vladimir Sementsov-Ogievskiy  wrote:

> 17.09.2019 18:37, Greg Kurz wrote:
> > On Tue, 17 Sep 2019 13:25:03 +
> > Vladimir Sementsov-Ogievskiy  wrote:
> > 
> >> 17.09.2019 13:20, Greg Kurz wrote:
> >>> Ensure that hints are added even if errp is &error_fatal or &error_abort.
> >>>
> >>> Signed-off-by: Greg Kurz 
> >>> ---
> >>>block/backup.c   |7 +--
> >>>block/dirty-bitmap.c |7 +--
> >>>block/file-posix.c   |   20 +---
> >>>block/gluster.c  |   23 +++
> >>>block/qcow.c |   10 ++
> >>>block/qcow2.c|7 +--
> >>>block/vhdx-log.c |7 +--
> >>>block/vpc.c  |7 +--
> >>>8 files changed, 59 insertions(+), 29 deletions(-)
> >>>
> >>> diff --git a/block/backup.c b/block/backup.c
> >>> index 763f0d7ff6db..d8c422a0e3bc 100644
> >>> --- a/block/backup.c
> >>> +++ b/block/backup.c
> >>> @@ -602,11 +602,14 @@ static int64_t 
> >>> backup_calculate_cluster_size(BlockDriverState *target,
> >>>BACKUP_CLUSTER_SIZE_DEFAULT);
> >>>return BACKUP_CLUSTER_SIZE_DEFAULT;
> >>>} else if (ret < 0 && !target->backing) {
> >>> -error_setg_errno(errp, -ret,
> >>> +Error *local_err = NULL;
> >>> +
> >>> +error_setg_errno(&local_err, -ret,
> >>>"Couldn't determine the cluster size of the target image, "
> >>>"which has no backing file");
> >>> -error_append_hint(errp,
> >>> +error_append_hint(&local_err,
> >>>"Aborting, since this may create an unusable destination 
> >>> image\n");
> >>> +error_propagate(errp, local_err);
> >>>return ret;
> >>>} else if (ret < 0 && target->backing) {
> >>>/* Not fatal; just trudge on ahead. */
> >>
> >>
> >> Pain.. Do we need these hints, if they are so painful?
> >>
> > 
> > I agree that the one above doesn't qualify as a useful hint.
> > It just tells that QEMU is giving up and gives no indication
> > to the user on how to avoid the issue. It should probably be
> > dropped.
> > 
> >> At least for cases like this, we can create helper function
> >>
> >> error_setg_errno_hint(..., error, hint)
> > 
> > Not very useful if hint has to be forged separately with
> > g_sprintf(), and we can't have such a thing as:
> > 
> > error_setg_errno_hint(errp, err_fmt, ..., hint_fmt, ...)
> > 
> >>
> >> But what could be done when we call function, which may or may not set 
> >> errp?
> >>
> >> ret = f(errp);
> >> if (ret) {
> >>  error_append_hint(errp, hint);
> >> }
> >>
> > 
> > Same problem. If errp is &error_fatal and f() does error_setg(errp), it
> > ends up calling exit().
> > 
> >> Hmmm..
> >>
> >> Can it look like
> >>
> >> ret = f(..., hint_helper(errp, hint))
> >>
> >> ?
> >>
> > 
> > Nope, hint_helper() will get called before f() and things are worse.
> > If errp is NULL then error_append_hint() does nothing and if it is
> > &error_fatal then it aborts.
> > 
> >> I can't imagine how to do it, as someone should remove hint from 
> >> error_abort object on
> >> success path..
> >>
> >> But seems, the following is possible, which seems better for me than 
> >> local-error approach:
> >>
> >> error_push_hint(errp, hint);
> >> ret = f(.., errp);
> >> error_pop_hint(errp);
> >>
> > 
> > Matter of taste... also, it looks awkward to come up with a hint
> > before knowing what happened. I mean the appropriate hint could
> > depend on the value returned by f() and/or errno for example.
> > 
> >> ===
> >>
> >> Continue thinking on this:
> >>
> >> It may look like just
> >> ret = f(..., set_hint(errp, hint));
> >>
> >> or (just to split long line):
> >> set_hint(errp, hint);
> >> ret = f(..., errp);
> >>
> >> if in each function with errp does error_push_hint(errp) on start and 
> >> error_po

Re: [Qemu-block] [PATCH 02/17] block: Pass local error object pointer to error_append_hint()

2019-09-17 Thread Greg Kurz
On Tue, 17 Sep 2019 16:46:31 +0200
Kevin Wolf  wrote:

> Am 17.09.2019 um 16:39 hat Eric Blake geschrieben:
> > On 9/17/19 5:20 AM, Greg Kurz wrote:
> > > Ensure that hints are added even if errp is &error_fatal or &error_abort.
> > > 
> > > Signed-off-by: Greg Kurz 
> > > ---
> > >  block/backup.c   |7 +--
> > >  block/dirty-bitmap.c |7 +--
> > >  block/file-posix.c   |   20 +---
> > >  block/gluster.c  |   23 +++
> > >  block/qcow.c |   10 ++
> > >  block/qcow2.c|7 +--
> > >  block/vhdx-log.c |7 +--
> > >  block/vpc.c  |7 +--
> > >  8 files changed, 59 insertions(+), 29 deletions(-)
> > > 
> > > diff --git a/block/backup.c b/block/backup.c
> > > index 763f0d7ff6db..d8c422a0e3bc 100644
> > > --- a/block/backup.c
> > > +++ b/block/backup.c
> > > @@ -602,11 +602,14 @@ static int64_t 
> > > backup_calculate_cluster_size(BlockDriverState *target,
> > >  BACKUP_CLUSTER_SIZE_DEFAULT);
> > >  return BACKUP_CLUSTER_SIZE_DEFAULT;
> > >  } else if (ret < 0 && !target->backing) {
> > > -error_setg_errno(errp, -ret,
> > > +Error *local_err = NULL;
> > 
> > Can we go with the shorter name 'err' instead of 'local_err'?  I know,
> > we aren't consistent (both styles appear throughout the tree), but the
> > shorter style is more appealing to me.
> 
> I like local_err better because it's easier to distinguish from errp.
> The compiler might catch it if you use the wrong one because one is
> Error* and the other is Error**, but as a reviewer, I can still get
> confused.
> 

I'll favor the official maintainer, hence keeping local_err :)

> Kevin



