Re: Proposal for a regular upstream performance testing

2020-11-30 Thread Lukáš Doktor

On 30. 11. 20 at 14:23, Stefan Hajnoczi wrote:

On Thu, Nov 26, 2020 at 09:43:38AM +, Daniel P. Berrangé wrote:

On Thu, Nov 26, 2020 at 09:10:14AM +0100, Lukáš Doktor wrote:

Ideally the community should have a way to also issue their custom builds
in order to verify their patches, so they can debug and address issues
better than just committing to qemu-master.


Allowing community builds certainly adds an extra dimension of complexity
to the problem, since you need some kind of permissions control: you can't
allow any arbitrary user on the web to trigger jobs with arbitrary code,
as that is a significant security risk to your infra.


syzkaller and other upstream CI/fuzzing systems do this, so it may be
hard but not impossible.



Sure, not impossible, but it is not something I can offer at this point. I can't 
promise anything, but maybe this can change in the future; or, in solution 2, 
someone else might resolve the permission issues and I would only assist with the 
setup (if needed).


I think I'd just suggest providing a mechanism for the user to easily spin
up performance test jobs on their own hardware. This could be as simple
as providing a docker container recipe that users can deploy on some
arbitrary machine of their choosing that contains the test rig. All they
should need to do is provide a git ref, and then launching the container and
running jobs should be a single command. They can simply run the tests
twice, with and without the patch series in question.
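
(As an illustration only: the image name and the rig's options below are made
up, they just sketch the kind of single-command interface being suggested here.)

  # build or pull a container image that bundles the test rig, then point it
  # at a git ref; run once with and once without the series under test
  docker run --rm --device /dev/kvm qemu-perf-rig --ref my-branch-with-patches
  docker run --rm --device /dev/kvm qemu-perf-rig --ref master
  # compare the two result directories afterwards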


As soon as developers need to recreate an environment it becomes
time-consuming and there is a risk that the issue won't be reproduced.
That doesn't mean the system is useless - big regressions will still be
tackled - but I think it's too much friction and we should aim to run
community builds.



I do understand, but unfortunately at this point I cannot provide that.


The problem with those is that we cannot simply use travis/gitlab/...
machines for running those tests, because we are measuring actual
in-guest performance.


As mentioned above - distinguish between the CI framework, and the
actual test runner.


Does the CI framework or the test runner handle detecting regressions
and providing historical data? I ask because I'm not sure if GitLab CI
provides any of this functionality or whether we'd need to write a
custom CI tool to track and report regressions.



Currently I am using Jenkins, which allows publishing results (number of failures 
and total checks) and storing artifacts. I am storing the pbench JSON results 
with metadata (a few MBs) and an HTML report (also a few MBs). Each HTML report 
contains a timeline of usually 14 previous builds, using them as a reference.

Provided GitLab can do something similar, we should be able to see the number of 
tests run/failed somewhere and then browse the builds' HTML reports. Last but 
not least, we can fetch the pbench JSON results and issue another comparison, 
cherry-picking individual results (internally I have a pipeline to do that for 
me; I could add a helper to do that via cmdline/container for others as well, 
roughly as sketched below).
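
(A rough sketch of what such a cmdline helper could look like; the script name,
its options and the CI URLs are hypothetical, only the pbench JSON artifacts
are assumed to exist.)

  # fetch the pbench result JSON from two builds and compare them
  wget -O ref.json https://ci.example.org/job/qemu-perf/140/artifact/result.json
  wget -O new.json https://ci.example.org/job/qemu-perf/141/artifact/result.json
  ./compare-pbench-results.py --reference ref.json --target new.json --html report.html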

Regards,
Lukáš


Stefan






Re: [RFC PATCH v2 4/5] virtio-net: Added eBPF RSS to virtio-net.

2020-11-30 Thread Yuri Benditovich
On Tue, Nov 24, 2020 at 10:49 AM Jason Wang  wrote:

>
> > On 2020/11/19 7:13 PM, Andrew Melnychenko wrote:
> > From: Andrew 
> >
> > When RSS is enabled the device tries to load the eBPF program
> > to select RX virtqueue in the TUN. If eBPF can be loaded
> > the RSS will function also with vhost (works with kernel 5.8 and later).
> > Software RSS is used as a fallback with vhost=off when eBPF can't be
> loaded
> > or when hash population requested by the guest.
> >
> > Signed-off-by: Yuri Benditovich 
> > Signed-off-by: Andrew Melnychenko 
> > ---
> >   hw/net/vhost_net.c |   2 +
> >   hw/net/virtio-net.c| 120 +++--
> >   include/hw/virtio/virtio-net.h |   4 ++
> >   net/vhost-vdpa.c   |   2 +
> >   4 files changed, 124 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index 24d555e764..16124f99c3 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -71,6 +71,8 @@ static const int user_feature_bits[] = {
> >   VIRTIO_NET_F_MTU,
> >   VIRTIO_F_IOMMU_PLATFORM,
> >   VIRTIO_F_RING_PACKED,
> > +VIRTIO_NET_F_RSS,
> > +VIRTIO_NET_F_HASH_REPORT,
> >
> >   /* This bit implies RARP isn't sent by QEMU out of band */
> >   VIRTIO_NET_F_GUEST_ANNOUNCE,
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index 277289d56e..afcc3032ec 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -698,6 +698,19 @@ static void virtio_net_set_queues(VirtIONet *n)
> >
> >   static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue);
> >
> > +static uint64_t fix_ebpf_vhost_features(uint64_t features)
> > +{
> > +/* If vhost=on & CONFIG_EBPF doesn't set - disable RSS feature */
> > +uint64_t ret = features;
> > +#ifndef CONFIG_EBPF
> > +virtio_clear_feature(&ret, VIRTIO_NET_F_RSS);
> > +#endif
> > +/* for now, there is no solution for populating the hash from eBPF
> */
> > +virtio_clear_feature(&ret, VIRTIO_NET_F_HASH_REPORT);
>
>
> I think there's still some misunderstanding here.
>
> When "rss" is enabled via command line, qemu can't not turn it off
> silently, otherwise it may break migration. Instead, qemu should disable
> vhost-net if eBPF can't be loaded.
>
> When "hash_report" is enabled via command line, qemu should disable
> vhost-net unconditionally.
>
>
I agree in general with this requirement and I'm preparing an
implementation of such a fallback.

The problem is that qemu already uses a mechanism of silently turning off host
features if they are not supported by the current vhost in the kernel:
https://github.com/qemu/qemu/blob/b0f8c22d6d4d07f3bd2307bcc62e1660ef965472/hw/virtio/vhost.c#L1526

Can you please comment on it and let me know how it should be modified in the
future?
I plan to use it in the next work (implementing hash report in the kernel).
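
(For illustration only, the fallback Jason describes might look roughly like the
sketch below. This is not code from the series: the vhost_disabled field is made
up, and ebpf_rss_load() is assumed to be the loader helper introduced here.)

static void virtio_net_decide_vhost_fallback(VirtIONet *n)
{
    if (virtio_has_feature(n->host_features, VIRTIO_NET_F_HASH_REPORT)) {
        /* hash_report requested on the command line: no vhost support for
         * it yet, so unconditionally fall back to the userspace datapath */
        n->vhost_disabled = true;               /* hypothetical field */
        return;
    }
    if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS) &&
        !ebpf_rss_load(&n->ebpf_rss)) {         /* assumed loader helper */
        /* RSS stays advertised (so migration is not silently broken), but
         * the device falls back to software RSS without vhost-net */
        n->vhost_disabled = true;               /* hypothetical field */
    }
}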


>
> > +
> > +return ret;
> > +}
> > +
> >   static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t
> features,
> >   Error **errp)
> >   {
> > @@ -732,9 +745,9 @@ static uint64_t virtio_net_get_features(VirtIODevice
> *vdev, uint64_t features,
> >   return features;
> >   }
> >
> > -virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
> > -virtio_clear_feature(&features, VIRTIO_NET_F_HASH_REPORT);
> > -features = vhost_net_get_features(get_vhost_net(nc->peer),
> features);
> > +features = fix_ebpf_vhost_features(
> > +vhost_net_get_features(get_vhost_net(nc->peer), features));
> > +
> >   vdev->backend_features = features;
> >
> >   if (n->mtu_bypass_backend &&
> > @@ -1169,12 +1182,75 @@ static int virtio_net_handle_announce(VirtIONet
> *n, uint8_t cmd,
> >   }
> >   }
> >
> > +static void virtio_net_unload_epbf_rss(VirtIONet *n);
> > +
> >   static void virtio_net_disable_rss(VirtIONet *n)
> >   {
> >   if (n->rss_data.enabled) {
> >   trace_virtio_net_rss_disable();
> >   }
> >   n->rss_data.enabled = false;
> > +
> > +if (!n->rss_data.enabled_software_rss &&
> ebpf_rss_is_loaded(&n->ebpf_rss)) {
> > +virtio_net_unload_epbf_rss(n);
> > +}
> > +}
> > +
> > +static bool virtio_net_attach_steering_ebpf(NICState *nic, int prog_fd)
> > +{
> > +NetClientState *nc = qemu_get_peer(qemu_get_queue(nic), 0);
> > +if (nc == NULL || nc->info->set_steering_ebpf == NULL) {
> > +return false;
> > +}
> > +
> > +return nc->info->set_steering_ebpf(nc, prog_fd);
> > +}
> > +
> > +static void rss_data_to_rss_config(struct VirtioNetRssData *data,
> > +   struct EBPFRSSConfig *config)
> > +{
> > +config->redirect = data->redirect;
> > +config->populate_hash = data->populate_hash;
> > +config->hash_types = data->hash_types;
> > +config->indirections_len = data->indirections_len;
> > +config->default_queue = data->default_queue;
> > +}
> > +
> > +static bool virtio_net_load_epbf

Re: [PATCH] python 3.5 compatibility

2020-11-30 Thread Marc-André Lureau
Hi

On Mon, Nov 30, 2020 at 10:29 PM Enrico Weigelt, metux IT consult <
l...@metux.net> wrote:

> On 27.11.20 20:15, Peter Maydell wrote:
>
> Hi,
>
> > Could you say which "stable distros" you have in mind, and whether
> > they are covered by our "supported build platforms" policy
> > https://www.qemu.org/docs/master/system/build-platforms.html  ?
>
> I'm running on Devuan Ascii.
>
> And packaging python-3.6 just for a few pieces of one application (qemu)
> is far too much work.
>

It might be necessary to keep that distro alive, as python 3.5 is EOL:

https://www.python.org/downloads/release/python-3510/

(fwiw, upcoming meson version will also depend on >= 3.6)


> --mtx
>
> --
> ---
> Notice: unencrypted e-mails can easily be intercepted and manipulated!
> For confidential communication, please send your GPG/PGP key.
> ---
> Enrico Weigelt, metux IT consult
> Free software and Linux embedded engineering
> i...@metux.net -- +49-151-27565287
>
>

-- 
Marc-André Lureau


Re: [PATCH v4 0/6] UFFD write-tracking migration/snapshots

2020-11-30 Thread Peter Krempa
On Thu, Nov 26, 2020 at 18:17:28 +0300, Andrey Gruzdev via wrote:
> This patch series is a kind of 'rethinking' of Denis Plotnikov's ideas he's
> implemented in his series '[PATCH v0 0/4] migration: add background snapshot'.

Hi,

I gave this a try when attempting to implement the libvirt code for this
feature. I ran into a problem of migration failing right away. The
VM's CPUs were running at that point.

QEMU logged the following to stdout/err:

2020-12-01T06:50:42.334062Z qemu-system-x86_64: uffd_register_memory() failed: 
start=7f2007fff000 len=33554432000 mode=2 errno=22
2020-12-01T06:50:42.334072Z qemu-system-x86_64: ram_write_tracking_start() 
failed: restoring initial memory state
2020-12-01T06:50:42.334074Z qemu-system-x86_64: uffd_protect_memory() failed: 
start=7f2007fff000 len=33554432000 mode=0 errno=2
2020-12-01T06:50:42.334076Z qemu-system-x86_64: uffd_unregister_memory() 
failed: start=7f2007fff000 len=33554432000 errno=22


The migration was started by the following QMP conversation:


QEMU_MONITOR_IO_WRITE: mon=0x7fff9c20c610 
buf={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"xbzrle","state":false},{"capability":"auto-converge","state":false},{"capability":"rdma-pin-all","state":false},{"capability":"postcopy-ram","state":false},{"capability":"compress","state":false},{"capability":"pause-before-switchover","state":false},{"capability":"late-block-activate","state":false},{"capability":"multifd","state":false},{"capability":"background-snapshot","state":true}]},"id":"libvirt-14"}

QEMU_MONITOR_RECV_REPLY: mon=0x7fff9c20c610 reply={"return": {}, "id": 
"libvirt-14"}

QEMU_MONITOR_IO_WRITE: mon=0x7fff9c20c610 
buf={"execute":"migrate-set-parameters","arguments":{"max-bandwidth":9223372036853727232},"id":"libvirt-15"}

QEMU_MONITOR_RECV_REPLY: mon=0x7fff9c20c610 reply={"return": {}, "id": 
"libvirt-15"}

QEMU_MONITOR_IO_WRITE: mon=0x7fff9c20c610 
buf={"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-16"}

QEMU_MONITOR_IO_SEND_FD: mon=0x7fff9c20c610 fd=44 ret=72 errno=0
QEMU_MONITOR_RECV_REPLY: mon=0x7fff9c20c610 reply={"return": {}, "id": 
"libvirt-16"}

QEMU_MONITOR_IO_WRITE: mon=0x7fff9c20c610 
buf={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-17"}

QEMU_MONITOR_RECV_EVENT: mon=0x7fff9c20c610 event={"timestamp": {"seconds": 
1606805733, "microseconds": 962424}, "event": "MIGRATION", "data": {"status": 
"setup"}}
QEMU_MONITOR_RECV_REPLY: mon=0x7fff9c20c610 reply={"return": {}, "id": 
"libvirt-17"}
QEMU_MONITOR_RECV_EVENT: mon=0x7fff9c20c610 event={"timestamp": {"seconds": 
1606805733, "microseconds": 966306}, "event": "MIGRATION_PASS", "data": 
{"pass": 1}}
QEMU_MONITOR_RECV_EVENT: mon=0x7fff9c20c610 event={"timestamp": {"seconds": 
1606805733, "microseconds": 966355}, "event": "MIGRATION", "data": {"status": 
"active"}}
QEMU_MONITOR_RECV_EVENT: mon=0x7fff9c20c610 event={"timestamp": {"seconds": 
1606805733, "microseconds": 966488}, "event": "STOP"}
QEMU_MONITOR_RECV_EVENT: mon=0x7fff9c20c610 event={"timestamp": {"seconds": 
1606805733, "microseconds": 970326}, "event": "MIGRATION", "data": {"status": 
"failed"}}

QEMU_MONITOR_IO_WRITE: mon=0x7fff9c20c610 
buf={"execute":"query-migrate","id":"libvirt-18"}

QEMU_MONITOR_RECV_REPLY: mon=0x7fff9c20c610 reply={"return": {"status": 
"failed"}, "id": "libvirt-18"}
qemuMigrationJobCheckStatus:1685 : operation failed: snapshot job: unexpectedly 
failed

 $ uname -r
5.8.18-300.fc33.x86_64


The VM was created by libvirt with the following patchset applied:

https://gitlab.com/pipo.sk/libvirt/-/commits/background-snapshot

git fetch https://gitlab.com/pipo.sk/libvirt.git background-snapshot

Start the snapshot via:

virsh snapshot-create-as --memspec /tmp/snap.mem --diskspec sdb,snapshot=no 
--diskspec sda,snapshot=no --no-metadata upstream

Note you can omit --diskspec if you have a diskless VM.

The patches are VERY work in progress as I need to figure out the proper
sequencing to ensure a consistent snapshot.

Note that in cases when qemu can't guarantee that the
background_snapshot feature will work it should not advertise it. We
need a way to check whether it's possible to use it, so we can replace
the existing --live flag with it rather than adding a new one and
shifting the problem of checking whether the feature works to the user.
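
(For reference, a management application can already probe whether the new
capability is present via query-migrate-capabilities; illustrative qmp-shell
excerpt, assuming QEMU only advertises the capability when it can actually be
used.)

(QEMU) query-migrate-capabilities
{"return": [{"state": false, "capability": "xbzrle"},
            ...,
            {"state": false, "capability": "background-snapshot"}]}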




Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-11-30 Thread Shenming Lu
On 2020/12/1 1:03, Alex Williamson wrote:
> On Thu, 26 Nov 2020 14:56:17 +0800
> Shenming Lu  wrote:
> 
>> Hi,
>>
>> After reading everyone's opinions, we have a rough idea for this issue.
>>
>> One key point is whether it is necessary to setup the config space before
>> the device can accept further migration data. I think it is decided by
>> the vendor driver, so we can simply ask the vendor driver about it in
>> .save_setup, which could avoid a lot of unnecessary copies and settings.
>> Once we have known the need, we can iterate the config space (before)
>> along with the device migration data in .save_live_iterate and
>> .save_live_complete_precopy, and if not needed, we can only migrate the
>> config space in .save_state.
>>
>> Another key point is that the interrupt enabling should be after the
>> restoring of the interrupt controller (might not only interrupts).
>> My solution is to add a subflag at the beginning of the config data
>> (right after VFIO_MIG_FLAG_DEV_CONFIG_STATE) to indicate the triggered
>> actions on the dst (such as whether to enable interrupts).
>>
>> Below is its workflow.
>>
>> On the save path:
>>  In vfio_save_setup():
>>  Ask the vendor driver if it needs the config space setup before it
>>  can accept further migration data.
> 
> How does "ask the vendor driver" actually work?

Maybe via an ioctl?
Oh, it seems that we have to ask the dst vendor driver (in vfio_load_setup)
and send the config data (before) along with the device migration data
every time?...

> 
>>  |
>>  In vfio_save_iterate() (pre-copy):
>>  If *needed*, save the config space which would be setup on the dst
>>  before the migration data, but send with a subflag to instruct not
>>  to (such as) enable interrupts.
> 
> If not for triggering things like MSI/X configuration, isn't config
> space almost entirely virtual?  What visibility does the vendor driver
> have to the VM machine dependencies regarding device interrupt versus
> interrupt controller migration?

My thought is that the vendor driver only decides the order of the config
space setup and the receiving of the migration data, but leaves the VM
machine dependencies to QEMU.

> 
>>  |
>>  In vfio_save_complete_precopy() (stop-and-copy, iterable process):
>>  The same as that in vfio_save_iterate().
>>  |
>>  In .save_state (stop-and-copy, non-iterable process):
>>  If *needed*, only send a subflag to instruct to enable interrupts.
>>  If *not needed*, save the config space and setup everything on the dst.
> 
> Again, how does the vendor driver have visibility to know when the VM
> machine can enable interrupts?

It seems troubling if the vendor driver needs the interrupts to be enabled
first...

> 
>>
>> Besides the above idea, we might be able to choose to let the vendor driver do
>> more: qemu just sends and writes the config data (before) along with the device
>> migration data every time, and it's up to the vendor driver to filter out/buffer
>> the received data or reorder the settings...
> 
> There is no vendor driver in QEMU though, so are you suggesting that
> QEMU follows a standard protocol and the vendor driver chooses when to
> enable specific features?  For instance, QEMU would call SET_IRQS and
> the driver would return success, but defer that setup if necessary?
> That seems quite troubling as we then have ioctls that behave
> differently depending on the device state and we have no error path to
> userspace should that setup fail later.  The vendor driver does have
> its own data stream for migration, so the vendor driver could tell the
> destination version of itself what type of interrupt to use, which
> might be sufficient if we were to ignore the latency if QEMU were to
> defer interrupt setup until stop-and-copy.

Did you mean that we could only enable MSI-X during the iterable phase, but
leave the setup of these unmasked vectors to the non-iterable phase?
It looks good to me.

> 
> Is the question of when to setup device interrupts versus the interrupt
> controller state largely a machine issue within QEMU?  If so, shouldn't
> it be at QEMU's determination when to act on the config space
> information on the target?

I think it would be simpler to ensure the proper calling order in QEMU...

> IOW, if a vendor driver has a dependency on
> interrupt configuration, they need to include it in their own pre-copy
> data stream and decouple that dependency from userspace interrupt
> configuration via the SET_IRQS ioctl.  Is that possible?  Thanks,
> 

I don't understand what decoupling that dependency from userspace
interrupt configuration means...

Thanks,
Shenming

> Alex
> 
> .
> 



Re: [PATCH v4] qga: Correct loop count in qmp_guest_get_vcpus()

2020-11-30 Thread Marc-André Lureau
Hi

On Tue, Dec 1, 2020 at 10:14 AM Lin Ma  wrote:

> The guest-get-vcpus returns incorrect vcpu info in case we hotunplug
> vcpus(not
> the last one).
> e.g.:
> A VM has 4 VCPUs: cpu0 + 3 hotunpluggable online vcpus(cpu1, cpu2 and
> cpu3).
> Hotunplug cpu2,  Now only cpu0, cpu1 and cpu3 are present & online.
>
> ./qmp-shell /tmp/qmp-monitor.sock
> (QEMU) query-hotpluggable-cpus
> {"return": [
> {"props": {"core-id": 0, "thread-id": 0, "socket-id": 3}, "vcpus-count": 1,
>  "qom-path": "/machine/peripheral/cpu3", "type": "host-x86_64-cpu"},
> {"props": {"core-id": 0, "thread-id": 0, "socket-id": 2}, "vcpus-count": 1,
>  "qom-path": "/machine/peripheral/cpu2", "type": "host-x86_64-cpu"},
> {"props": {"core-id": 0, "thread-id": 0, "socket-id": 1}, "vcpus-count": 1,
>  "qom-path": "/machine/peripheral/cpu1", "type": "host-x86_64-cpu"},
> {"props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "vcpus-count": 1,
>  "qom-path": "/machine/unattached/device[0]", "type": "host-x86_64-cpu"}
> ]}
>
> (QEMU) device_del id=cpu2
> {"return": {}}
>
> (QEMU) query-hotpluggable-cpus
> {"return": [
> {"props": {"core-id": 0, "thread-id": 0, "socket-id": 3}, "vcpus-count": 1,
>  "qom-path": "/machine/peripheral/cpu3", "type": "host-x86_64-cpu"},
> {"props": {"core-id": 0, "thread-id": 0, "socket-id": 2}, "vcpus-count": 1,
>  "type": "host-x86_64-cpu"},
> {"props": {"core-id": 0, "thread-id": 0, "socket-id": 1}, "vcpus-count": 1,
>  "qom-path": "/machine/peripheral/cpu1", "type": "host-x86_64-cpu"},
> {"props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "vcpus-count": 1,
>  "qom-path": "/machine/unattached/device[0]", "type": "host-x86_64-cpu"}
> ]}
>
> Before:
> ./qmp-shell -N /tmp/qmp-ga.sock
> Welcome to the QMP low-level shell!
> Connected
> (QEMU) guest-get-vcpus
> {"return": [
> {"online": true, "can-offline": false, "logical-id": 0},
> {"online": true, "can-offline": true, "logical-id": 1}]}
>
> After:
> ./qmp-shell -N /tmp/qmp-ga.sock
> Welcome to the QMP low-level shell!
> Connected
> (QEMU) guest-get-vcpus
> {"return": [
> {"online": true, "can-offline": false, "logical-id": 0},
> {"online": true, "can-offline": true, "logical-id": 1},
> {"online": true, "can-offline": true, "logical-id": 3}]}
>
> Signed-off-by: Lin Ma 
> ---
>  qga/commands-posix.c | 44 +++-
>  1 file changed, 15 insertions(+), 29 deletions(-)
>
> diff --git a/qga/commands-posix.c b/qga/commands-posix.c
> index c089e38120..48dcdae239 100644
> --- a/qga/commands-posix.c
> +++ b/qga/commands-posix.c
> @@ -2376,24 +2376,6 @@ error:
>  return NULL;
>  }
>
> -#define SYSCONF_EXACT(name, errp) sysconf_exact((name), #name, (errp))
> -
> -static long sysconf_exact(int name, const char *name_str, Error **errp)
> -{
> -long ret;
> -
> -errno = 0;
> -ret = sysconf(name);
> -if (ret == -1) {
> -if (errno == 0) {
> -error_setg(errp, "sysconf(%s): value indefinite", name_str);
> -} else {
> -error_setg_errno(errp, errno, "sysconf(%s)", name_str);
> -}
> -}
> -return ret;
> -}
> -
>  /* Transfer online/offline status between @vcpu and the guest system.
>   *
>   * On input either @errp or *@errp must be NULL.
> @@ -2464,24 +2446,28 @@ static void transfer_vcpu(GuestLogicalProcessor
> *vcpu, bool sys2vcpu,
>
>  GuestLogicalProcessorList *qmp_guest_get_vcpus(Error **errp)
>  {
> -int64_t current;
>  GuestLogicalProcessorList *head, **link;
> -long sc_max;
> +const char *cpu_dir = "/sys/devices/system/cpu";
> +const gchar *line;
> +GDir *cpu_gdir = NULL;
>  Error *local_err = NULL;
>
> -current = 0;
>  head = NULL;
>  link = &head;
> -sc_max = SYSCONF_EXACT(_SC_NPROCESSORS_CONF, &local_err);
> +cpu_gdir = g_dir_open(cpu_dir, 0, NULL);
> +
> +if (cpu_gdir == NULL) {
> +error_setg_errno(errp, errno, "failed to list entries: %s",
> cpu_dir);
> +return NULL;
> +}
>
> -while (local_err == NULL && current < sc_max) {
> +while (local_err == NULL && (line = g_dir_read_name(cpu_gdir)) !=
> NULL) {
>  GuestLogicalProcessor *vcpu;
>  GuestLogicalProcessorList *entry;
> -int64_t id = current++;
> -char *path = g_strdup_printf("/sys/devices/system/cpu/cpu%"
> PRId64 "/",
> - id);
> -
> -if (g_file_test(path, G_FILE_TEST_EXISTS)) {
> +int64_t id;
> +if (sscanf(line, "cpu%ld", &id)) {
> +g_autofree char *path =
> g_strdup_printf("/sys/devices/system/cpu/"
> +"cpu%" PRId64 "/",
> id);
>  vcpu = g_malloc0(sizeof *vcpu);
>  vcpu->logical_id = id;
>  vcpu->has_can_offline = true; /* lolspeak ftw */
> @@ -2491,8 +2477,8 @@ GuestLogicalProcessorList *qmp_guest_get_vcpus(Error
> **errp)
>  *link = entry;
>  link = &entry->next;
>  }
> -g_free(path);
>  }
> +

Re: [PATCH] python 3.5 compatibility

2020-11-30 Thread Markus Armbruster
"Enrico Weigelt, metux IT consult"  writes:

> On 27.11.20 20:15, Peter Maydell wrote:
>
> Hi,
>
>> Could you say which "stable distros" you have in mind, and whether
>> they are covered by our "supported build platforms" policy
>> https://www.qemu.org/docs/master/system/build-platforms.html  ?
>
> I'm running on Devuan Ascii.

Which has oldstable status.  Good for running the old and stable
software packaged by it (such as QEMU 2.8), and old (and hopefully
stable) software of similar vintage.

Devuan ASCII appears to be a derivative of Debian 9 Stretch, which we no
longer support, because it's EOL.  docs/system/build-platforms.rst:

For distributions with long-lifetime releases, the project will aim to
support the most recent major version at all times. Support for the
previous major version will be dropped 2 years after the new major
version is released, or when it reaches "end of life". For the purposes
of identifying supported software versions, the project will look at
RHEL, Debian, Ubuntu LTS, and SLES distros. Other long-lifetime distros
will be assumed to ship similar software versions.

This policy balances cost and benefit of keeping bleeding-edge QEMU work
on a wide variety of hosts.  Years of balancing.

> And packaging python-3.6 just for a few pieces of one application (qemu)
> is far too much work.

Have you considered upgrading to stable?




[PATCH] qemu-nbd: Fix a memleak in nbd_client_thread()

2020-11-30 Thread Alex Chen
When qio_channel_socket_connect_sync() fails we should go to the
'out_socket' label to free 'sioc' instead of the 'out' label.
In addition, the 'out' label is now useless, so delete it.

Reported-by: Euler Robot 
Signed-off-by: Alex Chen 
---
 qemu-nbd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 47587a709e..643b0777c0 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -275,7 +275,7 @@ static void *nbd_client_thread(void *arg)
 saddr,
 &local_error) < 0) {
 error_report_err(local_error);
-goto out;
+goto out_socket;
 }
 
 ret = nbd_receive_negotiate(NULL, QIO_CHANNEL(sioc),
@@ -325,7 +325,6 @@ out_fd:
 close(fd);
 out_socket:
 object_unref(OBJECT(sioc));
-out:
 g_free(info.name);
 kill(getpid(), SIGTERM);
 return (void *) EXIT_FAILURE;
-- 
2.19.1




[PATCH v4] qga: Correct loop count in qmp_guest_get_vcpus()

2020-11-30 Thread Lin Ma
The guest-get-vcpus returns incorrect vcpu info in case we hotunplug vcpus(not
the last one).
e.g.:
A VM has 4 VCPUs: cpu0 + 3 hotunpluggable online vcpus(cpu1, cpu2 and cpu3).
Hotunplug cpu2,  Now only cpu0, cpu1 and cpu3 are present & online.

./qmp-shell /tmp/qmp-monitor.sock
(QEMU) query-hotpluggable-cpus
{"return": [
{"props": {"core-id": 0, "thread-id": 0, "socket-id": 3}, "vcpus-count": 1,
 "qom-path": "/machine/peripheral/cpu3", "type": "host-x86_64-cpu"},
{"props": {"core-id": 0, "thread-id": 0, "socket-id": 2}, "vcpus-count": 1,
 "qom-path": "/machine/peripheral/cpu2", "type": "host-x86_64-cpu"},
{"props": {"core-id": 0, "thread-id": 0, "socket-id": 1}, "vcpus-count": 1,
 "qom-path": "/machine/peripheral/cpu1", "type": "host-x86_64-cpu"},
{"props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "vcpus-count": 1,
 "qom-path": "/machine/unattached/device[0]", "type": "host-x86_64-cpu"}
]}

(QEMU) device_del id=cpu2
{"return": {}}

(QEMU) query-hotpluggable-cpus
{"return": [
{"props": {"core-id": 0, "thread-id": 0, "socket-id": 3}, "vcpus-count": 1,
 "qom-path": "/machine/peripheral/cpu3", "type": "host-x86_64-cpu"},
{"props": {"core-id": 0, "thread-id": 0, "socket-id": 2}, "vcpus-count": 1,
 "type": "host-x86_64-cpu"},
{"props": {"core-id": 0, "thread-id": 0, "socket-id": 1}, "vcpus-count": 1,
 "qom-path": "/machine/peripheral/cpu1", "type": "host-x86_64-cpu"},
{"props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "vcpus-count": 1,
 "qom-path": "/machine/unattached/device[0]", "type": "host-x86_64-cpu"}
]}

Before:
./qmp-shell -N /tmp/qmp-ga.sock
Welcome to the QMP low-level shell!
Connected
(QEMU) guest-get-vcpus
{"return": [
{"online": true, "can-offline": false, "logical-id": 0},
{"online": true, "can-offline": true, "logical-id": 1}]}

After:
./qmp-shell -N /tmp/qmp-ga.sock
Welcome to the QMP low-level shell!
Connected
(QEMU) guest-get-vcpus
{"return": [
{"online": true, "can-offline": false, "logical-id": 0},
{"online": true, "can-offline": true, "logical-id": 1},
{"online": true, "can-offline": true, "logical-id": 3}]}

Signed-off-by: Lin Ma 
---
 qga/commands-posix.c | 44 +++-
 1 file changed, 15 insertions(+), 29 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index c089e38120..48dcdae239 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -2376,24 +2376,6 @@ error:
 return NULL;
 }
 
-#define SYSCONF_EXACT(name, errp) sysconf_exact((name), #name, (errp))
-
-static long sysconf_exact(int name, const char *name_str, Error **errp)
-{
-long ret;
-
-errno = 0;
-ret = sysconf(name);
-if (ret == -1) {
-if (errno == 0) {
-error_setg(errp, "sysconf(%s): value indefinite", name_str);
-} else {
-error_setg_errno(errp, errno, "sysconf(%s)", name_str);
-}
-}
-return ret;
-}
-
 /* Transfer online/offline status between @vcpu and the guest system.
  *
  * On input either @errp or *@errp must be NULL.
@@ -2464,24 +2446,28 @@ static void transfer_vcpu(GuestLogicalProcessor *vcpu, 
bool sys2vcpu,
 
 GuestLogicalProcessorList *qmp_guest_get_vcpus(Error **errp)
 {
-int64_t current;
 GuestLogicalProcessorList *head, **link;
-long sc_max;
+const char *cpu_dir = "/sys/devices/system/cpu";
+const gchar *line;
+GDir *cpu_gdir = NULL;
 Error *local_err = NULL;
 
-current = 0;
 head = NULL;
 link = &head;
-sc_max = SYSCONF_EXACT(_SC_NPROCESSORS_CONF, &local_err);
+cpu_gdir = g_dir_open(cpu_dir, 0, NULL);
+
+if (cpu_gdir == NULL) {
+error_setg_errno(errp, errno, "failed to list entries: %s", cpu_dir);
+return NULL;
+}
 
-while (local_err == NULL && current < sc_max) {
+while (local_err == NULL && (line = g_dir_read_name(cpu_gdir)) != NULL) {
 GuestLogicalProcessor *vcpu;
 GuestLogicalProcessorList *entry;
-int64_t id = current++;
-char *path = g_strdup_printf("/sys/devices/system/cpu/cpu%" PRId64 "/",
- id);
-
-if (g_file_test(path, G_FILE_TEST_EXISTS)) {
+int64_t id;
+if (sscanf(line, "cpu%ld", &id)) {
+g_autofree char *path = g_strdup_printf("/sys/devices/system/cpu/"
+"cpu%" PRId64 "/", id);
 vcpu = g_malloc0(sizeof *vcpu);
 vcpu->logical_id = id;
 vcpu->has_can_offline = true; /* lolspeak ftw */
@@ -2491,8 +2477,8 @@ GuestLogicalProcessorList *qmp_guest_get_vcpus(Error 
**errp)
 *link = entry;
 link = &entry->next;
 }
-g_free(path);
 }
+g_dir_close(cpu_gdir);
 
 if (local_err == NULL) {
 /* there's no guest with zero VCPUs */
-- 
2.26.0




Re: [PATCH 1/6] migration: Add multi-thread compress method

2020-11-30 Thread Zeyu Jin
On 2020/11/30 16:35, Markus Armbruster wrote:
> Zeyu Jin  writes:
> 
>> On 2020/11/27 17:48, Markus Armbruster wrote:
>>> Kevin, Max, suggest to skip right to Qcow2CompressionType.
>>>
>>> Zeyu Jin  writes:
>>>
 A multi-thread compress method parameter is added to hold the method we
 are going to use. By default the 'zlib' method is used to maintain the
 compatibility as before.

 Signed-off-by: Zeyu Jin 
 Signed-off-by: Ying Fang 
>>> [...]
 diff --git a/qapi/migration.json b/qapi/migration.json
 index 3c75820527..2ed6a55b92 100644
 --- a/qapi/migration.json
 +++ b/qapi/migration.json
 @@ -525,6 +525,19 @@
'data': [ 'none', 'zlib',
  { 'name': 'zstd', 'if': 'defined(CONFIG_ZSTD)' } ] }
  
 +##
 +# @CompressMethod:
 +#
 +# An enumeration of multi-thread compression methods.
 +#
 +# @zlib: use zlib compression method.
 +#
 +# Since: 6.0
 +#
 +##
 +{ 'enum': 'CompressMethod',
 +  'data': [ 'zlib' ] }
 +
  ##
  # @BitmapMigrationBitmapAlias:
  #
 @@ -599,6 +612,9 @@
  #  compression, so set the decompress-threads to the 
 number about 1/4
  #  of compress-threads is adequate.
  #
 +# @compress-method: Set compression method to use in multi-thread 
 compression.
 +#   Defaults to zlib. (Since 6.0)
>>>
>>> We already have @multifd-compression.  Why do we need to control the two
>>> separately?  Can you give a use case for different settings?
>>>
>>
>> Generally, mulit-thread compression deals with the situation
>> where network bandwith is limited but cpu resource is adequate. Multifd
>> instead aims to situation where single fd cannot take full advantage of
>> network bandwith. So compression based on multifd cannot fully cover the
>> cases for multi-thread compression.
>>
>> For example, for migration with a bandwith limitation of 10M
>> bytes/second, single fd is enough for data delivery. This is the case
>> for multi-thread compression.
> 
> Let me rephrase my question.
> 
> According to query-migrate-parameters, we default to
> 
> "compress-level": 1
> "compress-threads": 8
> "compress-wait-thread": true
> "decompress-threads": 2
> "multifd-channels": 2
> "multifd-compression": "none"
> "multifd-zlib-level": 1
> "multifd-zstd-level": 1
> 
> Your patch adds
> 
> "compress-method": "zlib"
> 
> I have several basic questions I can't answer from the documentation:
> 
> 1. We appear to have two distinct sets of compression parameters:
> 
>* Traditional: compress-level, compress-threads,
>  compress-wait-thread, decompress-threads.
> 
>  These parameters all apply to the same compression.  Correct?
> 
>  What data is being compressed by it?
> 
>* Multi-fd: multifd-channels, multifd-compression,
>  multifd-zlib-level, multifd-std-level
> 
>  These parameters all apply to the same compression.  Correct?
> 
>  What data is being compressed by it?
> 
>* Why do we want *two*?  I understand why multi-fd is optional, but
>  why do we need the capability to compress differently there?  Use
>  case?
> 
>All of these questions predate your patch.  David, Juan?
>

I see. The problem is that the parameter sets seem to be redundant and
maybe there is an overlap between these two compression capabilities.

As you said, the questions predate my patch, so maybe we can have a
discussion here. What do you think, David, Juan?

> 2. Does compress-method belong to "traditional"?
>

Yes.
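
(For illustration, the "traditional" set would then be driven roughly like this:
compress-level/compress-threads/decompress-threads and the "compress" capability
already exist, while compress-method is the parameter proposed by this series,
with "zstd" added in patch 4.)

{"execute": "migrate-set-capabilities",
 "arguments": {"capabilities": [{"capability": "compress", "state": true}]}}
{"execute": "migrate-set-parameters",
 "arguments": {"compress-level": 1, "compress-threads": 8,
               "decompress-threads": 2, "compress-method": "zstd"}}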

>>> If we do want two parameters: the names @compress-method and
>>> @multifd-compression are inconsistent.  According to your comment,
>>> @compress-method applies only to multi-thread compression.  That leads
>>> me to suggest renaming it to @multi-thread-compression.
>>>
>>
>> For the names, my original idea is to make 'CompressMethod' consistent
>> with other multi-thread compression parameters, like 'compress-threads'
>> and 'compress-level'. There is truly some inconsistency here, between
>> multifd-compression params and old multi-thread compression params.
> 
> I see.
> 
>> For now, I agree with you that 'multi-thread-compression' is better. It
>> is more specific and clear. I will rename the params in next version.
>> Thanks for the suggestion.
> 
> Please wait until we've sorted out the documentation mess.
> 
>>> After PATCH 4, CompressMethod is almost the same as MultiFDCompression:
>>>
>>>{ 'enum': 'CompressMethod',
>>>  'data': [ 'zlib' ] }
>>>  'data': [ 'zlib', { 'name': 'zstd', 'if': 'defined(CONFIG_ZSTD)' } ] }
>>>
>>>{ 'enum': 'MultiFDCompression',
>>>  'data': [ 'none', 'zlib',
>>>{ 'name': 'zstd', 'if': 'defined(CONFIG_ZSTD)' } ] }
>>>
>>> The difference is member 'none'.  Why does compression 'none' make sense
>>> for multi-fd, but not for multi-thread?
>>>
>>
>> When you set 'none'in multi-

Re: [PATCH v2 4/6] migration: Add zstd support in multi-thread compression

2020-11-30 Thread Zeyu Jin
On 2020/12/1 0:43, Eric Blake wrote:
> On 11/27/20 3:32 AM, Zeyu Jin wrote:
>> This patch enables zstd option in multi-thread compression.
>>
>> Signed-off-by: Zeyu Jin 
>> Signed-off-by: Ying Fang 
>> ---
> 
>> +++ b/qapi/migration.json
>> @@ -536,7 +536,7 @@
>>  #
>>  ##
>>  { 'enum': 'CompressMethod',
>> -  'data': [ 'zlib' ] }
>> +  'data': [ 'zlib', { 'name': 'zstd', 'if': 'defined(CONFIG_ZSTD)' } ] }
> 
> Missing documentation of the new value, including a '(since 6.0)' marker.
> 

Yes, will fix it.



Re: [PATCH v2 0/6] migration: Multi-thread compression method support

2020-11-30 Thread Zeyu Jin
On 2020/12/1 0:42, Eric Blake wrote:
> On 11/27/20 3:36 AM, Zeyu Jin wrote:
> 
> Meta-comment: you appear to be having problems threading your series;
> I've now seen three separate cover letters (RFC v1, v2 with no subject,
> v2 with subject) and two series where each patch was a separate thread.
> It is difficult to follow which messages are related when reading in a
> mail client that sorts by most-recently-active thread first.  You may
> want to investigate why your threading is not working, although I'd wait
> to send v3 until you have actual changes to incorporate.
> 

Thank you for noticing that. It's my mistake when sending patches.
Everything will be fine in v3.

>> Currently we have both multi-thread compression and multifd to optimize
>> live migration in Qemu. Mulit-thread compression deals with the situation
>> where network bandwith is limited but cpu resource adequate. Multifd instead
> 
> Not that typos in the cover letter matter, but this should be 'bandwidth'
>

Yes, I will fix that.

>> aims to take full advantage of network bandwith. Moreover it supports both
>> zlib and zstd compression on each channel.
>>
>> In this patch series, we did some code refactoring on multi-thread 
>> compression
>> live migration and bring zstd compression method support for it.
>>
>> Below is the test result of multi-thread compression live migration
>> with different compress methods. Test result shows that zstd outperforms
>> zlib by about 70%.
>>
> 




Re: [PATCH] hw/net/dp8393x: fix integer underflow in dp8393x_do_transmit_packets()

2020-11-30 Thread Jason Wang



On 2020/11/30 8:11 PM, Mauro Matteo Cascella wrote:

On Mon, Nov 30, 2020 at 11:44 AM Philippe Mathieu-Daudé  wrote:

+Laurent/Finn

On 11/24/20 10:24 AM, Mauro Matteo Cascella wrote:

An integer underflow could occur during packet transmission due to 'tx_len' not
being updated if SONIC_TFC register is set to zero. Check for negative 'tx_len'
when removing existing FCS.

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1899722
Signed-off-by: Mauro Matteo Cascella 
Reported-by: Gaoning Pan 
---
  hw/net/dp8393x.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index 674b04b354..205c0decc5 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -495,6 +495,10 @@ static void dp8393x_do_transmit_packets(dp8393xState *s)
  } else {
  /* Remove existing FCS */
  tx_len -= 4;
+if (tx_len < 0) {
+SONIC_ERROR("tx_len is %d\n", tx_len);
+break;
+}
  }

  if (s->regs[SONIC_RCR] & (SONIC_RCR_LB1 | SONIC_RCR_LB0)) {


Doesn't it make more sense to check 'tx_len >= 4'
and skip tx/rx when 'tx_len == 0'?


Yes, it makes sense. I thought that skipping tx/rx in case of a null
'tx_len' could lead to potential inconsistencies when writing the
status or reading the footer of the packet, but I'm not really sure. I
can send a new version of the patch if needed; otherwise feel free to
apply your changes. Thank you.



I think we can go with this patch first and tweak on top, considering it's
near the release. So:


Acked-by: Jason Wang 

Peter, do you want to merge this patch?

Thanks





-- >8 --
@@ -488,25 +488,29 @@ static void
dp8393x_do_transmit_packets(dp8393xState *s)
  }
  }

-/* Handle Ethernet checksum */
-if (!(s->regs[SONIC_TCR] & SONIC_TCR_CRCI)) {
-/* Don't append FCS there, to look like slirp packets
- * which don't have one */
-} else {
-/* Remove existing FCS */
-tx_len -= 4;
+if (tx_len >= 4) {
+/* Handle Ethernet checksum */
+if (!(s->regs[SONIC_TCR] & SONIC_TCR_CRCI)) {
+/* Don't append FCS there, to look like slirp packets
+ * which don't have one */
+} else {
+/* Remove existing FCS */
+tx_len -= 4;
+}
  }

-if (s->regs[SONIC_RCR] & (SONIC_RCR_LB1 | SONIC_RCR_LB0)) {
-/* Loopback */
-s->regs[SONIC_TCR] |= SONIC_TCR_CRSL;
-if (nc->info->can_receive(nc)) {
-s->loopback_packet = 1;
-nc->info->receive(nc, s->tx_buffer, tx_len);
+if (tx_len > 0) {
+if (s->regs[SONIC_RCR] & (SONIC_RCR_LB1 | SONIC_RCR_LB0)) {
+/* Loopback */
+s->regs[SONIC_TCR] |= SONIC_TCR_CRSL;
+if (nc->info->can_receive(nc)) {
+s->loopback_packet = 1;
+nc->info->receive(nc, s->tx_buffer, tx_len);
+}
+} else {
+/* Transmit packet */
+qemu_send_packet(nc, s->tx_buffer, tx_len);
  }
-} else {
-/* Transmit packet */
-qemu_send_packet(nc, s->tx_buffer, tx_len);
  }
  s->regs[SONIC_TCR] |= SONIC_TCR_PTX;

---


Regards,





Re: [PATCH v2] net/e1000e_core: adjust count if RDH exceeds RDT in e1000e_ring_advance()

2020-11-30 Thread Jason Wang



On 2020/11/30 10:12 PM, Mauro Matteo Cascella wrote:

On Mon, Nov 30, 2020 at 3:58 AM Jason Wang  wrote:


On 2020/11/27 10:49 PM, Mauro Matteo Cascella wrote:

On Fri, Nov 27, 2020 at 6:21 AM Jason Wang  wrote:

On 2020/11/24 5:30 AM, Mauro Matteo Cascella wrote:

On Thu, Nov 19, 2020 at 6:57 AM Jason Wang  wrote:

On 2020/11/18 4:53 PM, Mauro Matteo Cascella wrote:

On Wed, Nov 18, 2020 at 4:56 AM Jason Wang  wrote:

On 2020/11/13 6:31 PM, Mauro Matteo Cascella wrote:

The e1000e_write_packet_to_guest() function iterates over a set of
receive descriptors by advancing rx descriptor head register (RDH) from
its initial value to rx descriptor tail register (RDT). The check in
e1000e_ring_empty() is responsible for detecting whether RDH has reached
RDT, terminating the loop if that's the case. Additional checks have
been added in the past to deal with bogus values submitted by the guest
to prevent possible infinite loop. This is done by "wrapping around" RDH
at some point and detecting whether it assumes the original value during
the loop.

However, when e1000e is configured to use the packet split feature, RDH is
incremented by two instead of one, as the packet split descriptors are
32 bytes while regular descriptors are 16 bytes. A malicious or buggy
guest may set RDT to an odd value and transmit only null RX descriptors.
This corner case would prevent RDH from ever matching RDT, leading to an
infinite loop. This patch adds a check in e1000e_ring_advance() to make sure
RDH does not exceed RDT in a single incremental step, adjusting the count
value accordingly.

Can this patch solve this issue in another way?

https://patchew.org/QEMU/2020130636.2208620-1-ppan...@redhat.com/

Thanks


Yes, it does work nicely. Still, I think this patch is useful to avoid
possible inconsistent state in e1000e_ring_advance() when count > 1.

So if RDT is odd, it looks to me like the following code in
e1000e_write_packet_to_guest() needs to be fixed as well.


base = e1000e_ring_head_descr(core, rxi);

pci_dma_read(d, base, &desc, core->rx_desc_len);

Otherwise e1000e may try to read out of descriptor ring.

Sorry, I'm not sure I understand what you mean. Isn't the base address
computed from RDH? How can e1000e read out of the descriptor ring if
RDT is odd?


Thanks


Sorry, I meant RDH actually. When a packet split descriptor is used, it
doesn't check whether DH exceeds DLEN?


When the packet split feature is used (i.e., count > 1) this patch
basically sets RDH=RDT in case the increment would exceed RDT.
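
(Roughly, as a sketch of that clamping idea only; this is not the actual patch
and the function signature is simplified.)

/* advance the head by 'count' descriptors, but never step past the tail */
static inline void ring_advance_clamped(uint32_t *head, uint32_t tail,
                                        uint32_t num_descs, uint32_t count)
{
    if (*head < tail && *head + count > tail) {
        count = tail - *head;   /* e.g. RDH would otherwise jump over an odd RDT */
    }
    *head = (*head + count) % num_descs;
}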


Can software set RDH to an odd value? If not, I think we are probably fine.

Thanks


Honestly I don't know the answer to your question; my guess is that it
may be possible for a malicious/bogus guest to set RDH the same way as
RDT.

Thank you,
--
Mauro Matteo Cascella
Red Hat Product Security
PGP-Key ID: BB3410B0




Then I think we should fix that.

Thanks




RE: [EXTERNAL] Re: [PATCH] WHPX: support for the kernel-irqchip on/off

2020-11-30 Thread Sunil Muthuswamy

> It's close enough to the release that you can resend (or if no rebasing
> is needed, just tell me).
> 
> Paolo

Thank you. No rebasing needed; the same patch applies on the current master.



Re: [RFC PATCH-for-5.2] gitlab-ci: Do not automatically run Avocado integration tests anymore

2020-11-30 Thread Cleber Rosa
On Fri, Nov 27, 2020 at 06:41:10PM +0100, Philippe Mathieu-Daudé wrote:
> We lately realized that the Avocado framework was not designed
> to be regularly run on CI environments. Therefore, as of 5.2

Hi Phil,

First of all, let me say that I understand your overall goal, and
although I don't agree with the strategy, I believe we're in agreement
wrt the destination.

The main issue that you seem to address here is the fact that some CI
tests may fail more often than others, which will lead to jobs that
will fail more than others, which will ultimately taint the overall CI
status.  Does that sound like an appropriate overall grasp of your
motivation?

Assuming I got it right, let me say that having an "always green CI"
is a noble goal, but it can also be extremely frustrating if one
doesn't apply some safeguarding measures.  More on that later.

As you certainly know, I'm also interested in understanding the "not
designed to be regularly run on CI environments" part.  The best
correlation I could make was to link that to these two points you
raised elsewhere:

 1) Failing randomly
 2) Images hardcoded in tests are being removed from public servers

With regards to point #1, this is probably unavoidable as a whole.
I've had some experience running dedicated test jobs for close to a
decade, and maybe the only way to get close to avoid random failures
on integration tests, is to run close to nothing in those jobs.
Running "/bin/true" has a very low chance of failing randomly.

In my own experience, the only way to address point #1, is to
babysit jobs.  That means:

 a) assume they will produce some messy stuff at no particular time
 b) act as quickly and effectively as possible
 c) be compassionate, that is, waive the unavoidable mess incidents

Building on the previous analogy, if you decide to not have a baby,
but a plant, you'll probably need to do a lot less of those.
If you get a pet, then some more.  Now a human baby will probably
(I guess) require a whole lot more of those.  And as those age
and reach maturity, they'll (hopefully) require less babysitting,
but they can still mess up at any given time.

Analogies and jokes aside, the urgent *action item* here has been
discussed both publicly, and internally at Red Hat. It consists of
having an "always on" maintainer for those jobs.  In the specific case
of the "Acceptance" jobs, Willian Rampazzo has volunteered to,
initially, be this person.  He'll manage all related information
on the jobs' issues.

We're still discussing the tools to use to give the visibility that
the QEMU projects needs.  I personally would be happy enough to start
with a publicly accessible spreadsheet that builds upon the
information produced by GitLab.  A proper application is also being
considered.  A sample of the requirements include:

   I) waive failures (say a job and tests failed because of a
  network outage)
  II) build trends (show how stable all executions of test "foo"
  were during the last: week, month, eternity, etc).
 III) keep a list of the known issues and relate them to waivers
  and currently skipped tests

Getting back to point #2, I have two main points about it.  First is
that I've had a lot of experience with tests having copies of images,
both on local filesystems and on nearby NFS servers.  Local
filesystems would fail at provisioning/sync time. And NFS based ones
would fail every now and then for various network or NFS server
issues.  It goes back to my point about not being able to escape the
babysitting ever.

Second is that this is somehow related to features and improvements
that could/should be added to whatever supporting code (aka
framework) we use.  Right now, we have some specific features
scheduled to be introduced in Avocado 84.0 (due in ~2 weeks).  They
can be seen with the "customer:QEMU" label:

  
https://github.com/avocado-framework/avocado/issues?q=is%3Aissue+is%3Aopen+label%3Acustomer%3AQEMU

A number of other features have already landed on previous versions,
but I was unable to send patches in time for 5.2, so my expectation is
to bundle more of them and bump Avocado to 84.0 at once (instead of
82.0 or 83.0).

> we deprecate the gitlab-ci jobs using Avocado. To not disrupt
> current users, it is possible to keep the current behavior by
> setting the QEMU_CI_INTEGRATION_JOBS_PRE_5_2_RELEASE variable
> (see [*]).
> From now on, using these jobs (or adding new tests to them)
> is strongly discouraged.
>

These jobs run `make check-acceptance`, which will pick up new tests.
So how do you suggest *not adding new tests* to those jobs?  Are
you suggesting that no new acceptance test be added?

> Tests based on Avocado will be ported to new job schemes during
> the next releases, with better documentation and templates.
>

This is a very good approach to move forward.  For what it's worth,
Avocado has invested in an API specifically for that, the Job API.
The goal is to have smarter jobs for different purposes that behave
ap

[PATCH 1/1] target-riscv: support QMP dump-guest-memory

2020-11-30 Thread Yifei Jiang
Add the support needed for creating prstatus elf notes. For now the elf notes
only contain user_regs. This allows us to use QMP dump-guest-memory.

Signed-off-by: Yifei Jiang 
Signed-off-by: Mingwang Li 
---
 target/riscv/arch_dump.c | 189 +++
 target/riscv/cpu.c   |   2 +
 target/riscv/cpu.h   |   4 +
 target/riscv/cpu_bits.h  |   1 +
 target/riscv/meson.build |   1 +
 5 files changed, 197 insertions(+)
 create mode 100644 target/riscv/arch_dump.c

diff --git a/target/riscv/arch_dump.c b/target/riscv/arch_dump.c
new file mode 100644
index 00..b89ddf18c7
--- /dev/null
+++ b/target/riscv/arch_dump.c
@@ -0,0 +1,189 @@
+/* Support for writing ELF notes for RISC-V architectures
+ *
+ * Copyright (C) 2020 Huawei Technologies Co., Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "elf.h"
+#include "sysemu/dump.h"
+
+/* struct user_regs_struct from arch/riscv/include/uapi/asm/ptrace.h */
+struct riscv_user_regs {
+target_ulong pc;
+target_ulong regs[31];
+} QEMU_PACKED;
+
+/* struct elf_prstatus from include/uapi/linux/elfcore.h */
+struct riscv64_elf_prstatus {
+char pad1[32]; /* 32 == offsetof(struct elf_prstatus, pr_pid) */
+uint32_t pr_pid;
+char pad2[76]; /* 76 == offsetof(struct elf_prstatus, pr_reg) -
+offsetof(struct elf_prstatus, pr_ppid) */
+struct riscv_user_regs pr_reg;
+char pad3[8];
+} QEMU_PACKED;
+
+struct riscv64_note {
+Elf64_Nhdr hdr;
+char name[8]; /* align_up(sizeof("CORE"), 4) */
+struct riscv64_elf_prstatus prstatus;
+} QEMU_PACKED;
+
+#define RISCV64_NOTE_HEADER_SIZE offsetof(struct riscv64_note, prstatus)
+#define RISCV64_PRSTATUS_NOTE_SIZE \
+(RISCV64_NOTE_HEADER_SIZE + sizeof(struct riscv64_elf_prstatus))
+
+static void riscv64_note_init(struct riscv64_note *note, DumpState *s,
+  const char *name, Elf64_Word namesz,
+  Elf64_Word type, Elf64_Word descsz)
+{
+memset(note, 0, sizeof(*note));
+
+note->hdr.n_namesz = cpu_to_dump32(s, namesz);
+note->hdr.n_descsz = cpu_to_dump32(s, descsz);
+note->hdr.n_type = cpu_to_dump32(s, type);
+
+memcpy(note->name, name, namesz);
+}
+
+int riscv_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
+   int cpuid, void *opaque)
+{
+struct riscv64_note note;
+RISCVCPU *cpu = RISCV_CPU(cs);
+CPURISCVState *env = &cpu->env;
+DumpState *s = opaque;
+int ret, i = 0;
+const char name[] = "CORE";
+
+riscv64_note_init(¬e, s, name, sizeof(name),
+  NT_PRSTATUS, sizeof(note.prstatus));
+
+note.prstatus.pr_pid = cpu_to_dump32(s, cpuid);
+
+note.prstatus.pr_reg.pc = cpu_to_dump64(s, env->pc);
+
+for (i = 0; i < 31; i++) {
+note.prstatus.pr_reg.regs[i] = cpu_to_dump64(s, env->gpr[i + 1]);
+}
+
+ret = f(¬e, RISCV64_PRSTATUS_NOTE_SIZE, s);
+if (ret < 0) {
+return -1;
+}
+
+return ret;
+}
+
+struct riscv32_elf_prstatus {
+char pad1[24]; /* 24 == offsetof(struct elf_prstatus, pr_pid) */
+uint32_t pr_pid;
+char pad2[44]; /* 44 == offsetof(struct elf_prstatus, pr_reg) -
+offsetof(struct elf_prstatus, pr_ppid) */
+struct riscv_user_regs pr_reg;
+char pad3[4];
+} QEMU_PACKED;
+
+struct riscv32_note {
+Elf32_Nhdr hdr;
+char name[8]; /* align_up(sizeof("CORE"), 4) */
+struct riscv32_elf_prstatus prstatus;
+} QEMU_PACKED;
+
+#define RISCV32_NOTE_HEADER_SIZE offsetof(struct riscv32_note, prstatus)
+#define RISCV32_PRSTATUS_NOTE_SIZE \
+(RISCV32_NOTE_HEADER_SIZE + sizeof(struct riscv32_elf_prstatus))
+
+static void riscv32_note_init(struct riscv32_note *note, DumpState *s,
+  const char *name, Elf32_Word namesz,
+  Elf32_Word type, Elf32_Word descsz)
+{
+memset(note, 0, sizeof(*note));
+
+note->hdr.n_namesz = cpu_to_dump32(s, namesz);
+note->hdr.n_descsz = cpu_to_dump32(s, descsz);
+note->hdr.n_type = cpu_to_dump32(s, type);
+
+memcpy(note->name, name, namesz);
+}
+
+int riscv_cpu_write_elf32_note(WriteCoreDumpFunction f, CPUState *cs,
+   int cpuid, void *opaque)
+{
+struct riscv32_note note;
+RISCVCPU *cpu = RISCV_CPU(cs);
+CPURISCVState *env = &c

[PATCH 0/1] target-riscv: support QMP dump-guest-memory

2020-11-30 Thread Yifei Jiang
Hi,

This patch supports QMP dump-guest-memory in RISC-V. We tested this feature by
using the following command: dump-guest-memory guest.memory.

Then we used the gdb tool to debug guest.memory: gdb vmlinux guest.memory.
The test result is as follows:
1. info registers
ra 0xffe0008cb83c   0xffe0008cb83c 
<_raw_spin_lock_irqsave+28>
sp 0xffe0012c3f70   0xffe0012c3f70
gp 0xffe0010d6048   0xffe0010d6048 
<__compound_literal.109>
tp 0xffe00127f200   0xffe00127f200
t0 0x1f8504
t1 0x0  0
t2 0x3fd9bf5c3c 274236136508
fp 0xffe0012c3f80   0xffe0012c3f80
s1 0xffe0010d7228   -137421295064
a0 0x1  1
a1 0xffe00127f200   -137419558400
a2 0xffe00110a0b8   -137421086536
a3 0x3af32000   989011968
a4 0x35b2   13746
a5 0xffe03af6b880   -136449705856
a6 0x1c5d09af00 12182000
a7 0x54494d45   1414090053
s2 0x1  1
s3 0xffe0010d73f0   -137421294608
s4 0x0  0
s5 0x0  0
s6 0x0  0
s7 0xc  12
s8 0x2000   8192
s9 0x80015708   2147571464
s100x0  0
s110x0  0
t3 0x2257d7136011377
t4 0x0  0
t5 0x3ab003061538352
t6 0x3fffefb3a0 274876838816
pc 0xffe000201478   0xffe000201478 

2. x/1024ux 0x8000
0x8000: 0x00050433  0x000584b3  0x00060933  0x62c000ef
0x8010: 0x00050833  0x00040533  0x000485b3  0x00090633
0x8020: 0x046358fd  0x1d630118  0x08171305  0x0813
0x8030: 0x48854868  0x0118282f  0x12081463  0x0297
0x8040: 0x48428293  0x0317  0xfbc30313  0x0062b023
...

Yifei Jiang (1):
  target-riscv: support QMP dump-guest-memory

 target/riscv/arch_dump.c | 189 +++
 target/riscv/cpu.c   |   2 +
 target/riscv/cpu.h   |   4 +
 target/riscv/cpu_bits.h  |   1 +
 target/riscv/meson.build |   1 +
 5 files changed, 197 insertions(+)
 create mode 100644 target/riscv/arch_dump.c

-- 
2.19.1




Re: [PATCH 2/8] hvf: Move common code out

2020-11-30 Thread Frank Yang
On Mon, Nov 30, 2020 at 2:10 PM Peter Maydell 
wrote:

> On Mon, 30 Nov 2020 at 20:56, Frank Yang  wrote:
> > We'd actually like to contribute upstream too :) We do want to maintain
> > our own downstream though; Android Emulator codebase needs to work
> > solidly on macos and windows which has made keeping up with upstream
> difficult
>
> One of the main reasons why OSX and Windows support upstream is
> not so great is because very few people are helping to develop,
> test and support it upstream. The way to fix that IMHO is for more
> people who do care about those platforms to actively engage
> with us upstream to help in making those platforms move closer to
> being first class citizens. If you stay on a downstream fork
> forever then I don't think you'll ever see things improve.
>
> thanks
> -- PMM
>

That's a really good point. I'll definitely be more active about sending
comments upstream in the future :)

Frank


Re: [PATCH] target/riscv: Fix definition of MSTATUS_TW and MSTATUS_TSR

2020-11-30 Thread Alistair Francis
On Mon, Nov 30, 2020 at 9:07 AM Alex Richardson
 wrote:
>
> The TW and TSR fields should be bits 21 and 22 and not 30/29.
> This was found while comparing QEMU behaviour against the sail formal
> model (https://github.com/rems-project/sail-riscv/).
>
> Signed-off-by: Alex Richardson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu_bits.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> index 24b24c69c5..92147332c6 100644
> --- a/target/riscv/cpu_bits.h
> +++ b/target/riscv/cpu_bits.h
> @@ -379,8 +379,8 @@
>  #define MSTATUS_MXR 0x00080000
>  #define MSTATUS_VM  0x1F000000 /* until: priv-1.9.1 */
>  #define MSTATUS_TVM 0x00100000 /* since: priv-1.10 */
> -#define MSTATUS_TW  0x20000000 /* since: priv-1.10 */
> -#define MSTATUS_TSR 0x40000000 /* since: priv-1.10 */
> +#define MSTATUS_TW  0x00200000 /* since: priv-1.10 */
> +#define MSTATUS_TSR 0x00400000 /* since: priv-1.10 */
>  #define MSTATUS_GVA 0x4000000000ULL
>  #define MSTATUS_MPV 0x8000000000ULL
>
> --
> 2.29.2
>
>
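
As a quick standalone cross-check (not part of the patch), the corrected
masks match the bit positions named in the commit message; in C11:

/* Sanity check only: TW and TSR are mstatus bits 21 and 22. */
_Static_assert((1u << 21) == 0x00200000, "MSTATUS_TW  is mstatus bit 21");
_Static_assert((1u << 22) == 0x00400000, "MSTATUS_TSR is mstatus bit 22");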



Re: [PATCH] target/riscv: Fix the bug of HLVX/HLV/HSV

2020-11-30 Thread Alistair Francis
On Sun, Nov 29, 2020 at 5:37 PM Yifei Jiang  wrote:
>
> We found that the hypervisor virtual-machine load and store instructions,
> including HLVX/HLV/HSV, couldn't access guest userspace memory.
>
> In the riscv-privileged spec, HLVX/HLV/HSV is defined as follow:
> "As usual when V=1, two-stage address translation is applied, and
> the HS-level sstatus.SUM is ignored."
>
> But get_physical_address() doesn't ignore sstatus.SUM, when HLVX/HLV/HSV
> accesses guest userspace memory. So this patch fixes it.
>
> Signed-off-by: Yifei Jiang 
> Signed-off-by: Yipeng Yin 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu_helper.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index a2787b1d48..7274f971a4 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -367,7 +367,8 @@ static int get_physical_address(CPURISCVState *env, 
> hwaddr *physical,
>  vm = get_field(env->hgatp, HGATP_MODE);
>  widened = 2;
>  }
> -sum = get_field(env->mstatus, MSTATUS_SUM);
> +/* status.SUM will be ignored if execute on background */
> +sum = get_field(env->mstatus, MSTATUS_SUM) || use_background;
>  switch (vm) {
>  case VM_1_10_SV32:
>levels = 2; ptidxbits = 10; ptesize = 4; break;
> --
> 2.19.1
>
>



Re: [PATCH-for-5.2? 1/4] hw/arm/fsl-imx: Add SD bus QOM alias on the SoC

2020-11-30 Thread Alistair Francis
On Tue, Nov 24, 2020 at 1:54 AM Philippe Mathieu-Daudé  wrote:
>
> To be able to select a particular SD bus from the command
> line, add a QOM alias on the SoC (using a unique name).
>
> Buglink: https://bugs.launchpad.net/qemu/+bug/1895895
> Reported-by: David Aghaian 
> Suggested-by: Peter Maydell 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/arm/fsl-imx25.c | 6 ++
>  hw/arm/fsl-imx6.c  | 6 ++
>  2 files changed, 12 insertions(+)
>
> diff --git a/hw/arm/fsl-imx25.c b/hw/arm/fsl-imx25.c
> index 08a98f828fc..6e66ae742af 100644
> --- a/hw/arm/fsl-imx25.c
> +++ b/hw/arm/fsl-imx25.c
> @@ -239,6 +239,7 @@ static void fsl_imx25_realize(DeviceState *dev, Error 
> **errp)
>  { FSL_IMX25_ESDHC1_ADDR, FSL_IMX25_ESDHC1_IRQ },
>  { FSL_IMX25_ESDHC2_ADDR, FSL_IMX25_ESDHC2_IRQ },
>  };
> +g_autofree char *bus_name = NULL;
>
>  object_property_set_uint(OBJECT(&s->esdhc[i]), "sd-spec-version", 2,
>   &error_abort);
> @@ -253,6 +254,11 @@ static void fsl_imx25_realize(DeviceState *dev, Error 
> **errp)
>  sysbus_connect_irq(SYS_BUS_DEVICE(&s->esdhc[i]), 0,
> qdev_get_gpio_in(DEVICE(&s->avic),
>  esdhc_table[i].irq));
> +
> +/* Alias controller SD bus to the SoC itself */
> +bus_name = g_strdup_printf("sd-bus%d", i);
> +object_property_add_alias(OBJECT(s), bus_name,
> +  OBJECT(&s->esdhc[i]), "sd-bus");
>  }
>
>  /* USB */
> diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c
> index 00dafe3f62d..144bcdcaf6c 100644
> --- a/hw/arm/fsl-imx6.c
> +++ b/hw/arm/fsl-imx6.c
> @@ -314,6 +314,7 @@ static void fsl_imx6_realize(DeviceState *dev, Error 
> **errp)
>  { FSL_IMX6_uSDHC3_ADDR, FSL_IMX6_uSDHC3_IRQ },
>  { FSL_IMX6_uSDHC4_ADDR, FSL_IMX6_uSDHC4_IRQ },
>  };
> +g_autofree char *bus_name = NULL;
>
>  /* UHS-I SDIO3.0 SDR104 1.8V ADMA */
>  object_property_set_uint(OBJECT(&s->esdhc[i]), "sd-spec-version", 3,
> @@ -329,6 +330,11 @@ static void fsl_imx6_realize(DeviceState *dev, Error 
> **errp)
>  sysbus_connect_irq(SYS_BUS_DEVICE(&s->esdhc[i]), 0,
> qdev_get_gpio_in(DEVICE(&s->a9mpcore),
>  esdhc_table[i].irq));
> +
> +/* Alias controller SD bus to the SoC itself */
> +bus_name = g_strdup_printf("sd-bus%d", i);
> +object_property_add_alias(OBJECT(s), bus_name,
> +  OBJECT(&s->esdhc[i]), "sd-bus");
>  }
>
>  /* USB */
> --
> 2.26.2
>
>



Re: [PATCH-for-5.2? 4/4] hw/arm/xilinx_zynq: Add SD bus QOM alias on the machine

2020-11-30 Thread Alistair Francis
On Tue, Nov 24, 2020 at 1:50 AM Philippe Mathieu-Daudé  wrote:
>
> To be able to select a particular SD bus from the command
> line, add a QOM alias on the machine (using a unique name).
>
> Suggested-by: Peter Maydell 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/arm/xilinx_zynq.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
> index b72772bc824..6a4a1de88cf 100644
> --- a/hw/arm/xilinx_zynq.c
> +++ b/hw/arm/xilinx_zynq.c
> @@ -286,6 +286,7 @@ static void zynq_init(MachineState *machine)
>  DriveInfo *di;
>  BlockBackend *blk;
>  DeviceState *carddev;
> +g_autofree char *bus_name = NULL;
>
>  /* Compatible with:
>   * - SD Host Controller Specification Version 2.0 Part A2
> @@ -299,6 +300,11 @@ static void zynq_init(MachineState *machine)
>  sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, hci_addr);
>  sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[hci_irq - 
> IRQ_OFFSET]);
>
> +/* Alias controller SD bus to the machine itself */
> +bus_name = g_strdup_printf("sd-bus%d", n);
> +object_property_add_alias(OBJECT(machine), bus_name,
> +  OBJECT(dev), "sd-bus");
> +
>  di = drive_get_next(IF_SD);
>  blk = di ? blk_by_legacy_dinfo(di) : NULL;
>  carddev = qdev_new(TYPE_SD_CARD);
> --
> 2.26.2
>
>



Re: [PATCH-for-5.2? 3/4] hw/arm/xlnx-versal: Add SD bus QOM alias on the SoC

2020-11-30 Thread Alistair Francis
On Tue, Nov 24, 2020 at 1:51 AM Philippe Mathieu-Daudé  wrote:
>
> To be able to select a particular SD bus from the command
> line, add a QOM alias on the SoC (using a unique name).
>
> Suggested-by: Peter Maydell 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/arm/xlnx-versal.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
> index 12ba6c4ebae..da3ee24a5b9 100644
> --- a/hw/arm/xlnx-versal.c
> +++ b/hw/arm/xlnx-versal.c
> @@ -210,6 +210,7 @@ static void versal_create_sds(Versal *s, qemu_irq *pic)
>  int i;
>
>  for (i = 0; i < ARRAY_SIZE(s->pmc.iou.sd); i++) {
> +g_autofree char *bus_name = NULL;
>  DeviceState *dev;
>  MemoryRegion *mr;
>
> @@ -224,6 +225,10 @@ static void versal_create_sds(Versal *s, qemu_irq *pic)
>  object_property_set_uint(OBJECT(dev), "uhs", UHS_I, &error_fatal);
>  sysbus_realize(SYS_BUS_DEVICE(dev), &error_fatal);
>
> +/* Alias controller SD bus to the SoC itself */
> +bus_name = g_strdup_printf("sd-bus%d", i);
> +object_property_add_alias(OBJECT(s), bus_name, OBJECT(dev), 
> "sd-bus");
> +
>  mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
>  memory_region_add_subregion(&s->mr_ps,
>  MM_PMC_SD0 + i * MM_PMC_SD0_SIZE, mr);
> --
> 2.26.2
>
>



Re: [PATCH 2/2] hw/ssi: imx_spi: Disable chip selects in imx_spi_reset()

2020-11-30 Thread Alistair Francis
On Sun, Nov 29, 2020 at 8:05 PM Bin Meng  wrote:
>
> From: Xuzhou Cheng 
>
> When the ECSPI_CONREG register is written to disable the SPI controller,
> imx_spi_reset() is called to reset the controller, during which the CS
> lines should be disabled; otherwise the state machine of any device
> (e.g. a SPI flash) connected to the SPI master is stuck in its last
> state and responds incorrectly to any follow-up commands.
>
> Fixes c906a3a01582: ("i.MX: Add the Freescale SPI Controller")
> Signed-off-by: Xuzhou Cheng 
> Signed-off-by: Bin Meng 

Acked-by: Alistair Francis 

Alistair

>
> ---
>
>  hw/ssi/imx_spi.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/hw/ssi/imx_spi.c b/hw/ssi/imx_spi.c
> index e605049..85c172e 100644
> --- a/hw/ssi/imx_spi.c
> +++ b/hw/ssi/imx_spi.c
> @@ -231,6 +231,7 @@ static void imx_spi_flush_txfifo(IMXSPIState *s)
>  static void imx_spi_reset(DeviceState *dev)
>  {
>  IMXSPIState *s = IMX_SPI(dev);
> +int i;
>
>  DPRINTF("\n");
>
> @@ -243,6 +244,10 @@ static void imx_spi_reset(DeviceState *dev)
>
>  imx_spi_update_irq(s);
>
> +for (i = 0; i < ECSPI_NUM_CS; i++) {
> +qemu_set_irq(s->cs_lines[i], 1);
> +}
> +
>  s->burst_length = 0;
>  }
>
> --
> 2.7.4
>
>



Re: [PATCH 1/2] hw/ssi: imx_spi: Use a macro for number of chip selects supported

2020-11-30 Thread Alistair Francis
On Sun, Nov 29, 2020 at 8:06 PM Bin Meng  wrote:
>
> From: Bin Meng 
>
> Avoid using a magic number (4) everywhere for the number of chip
> selects supported.
>
> Signed-off-by: Bin Meng 

Reviewed-by: Alistair Francis 

Alistair

> ---
>
>  hw/ssi/imx_spi.c | 4 ++--
>  include/hw/ssi/imx_spi.h | 5 -
>  2 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/hw/ssi/imx_spi.c b/hw/ssi/imx_spi.c
> index d8885ae..e605049 100644
> --- a/hw/ssi/imx_spi.c
> +++ b/hw/ssi/imx_spi.c
> @@ -361,7 +361,7 @@ static void imx_spi_write(void *opaque, hwaddr offset, 
> uint64_t value,
>
>  /* We are in master mode */
>
> -for (i = 0; i < 4; i++) {
> +for (i = 0; i < ECSPI_NUM_CS; i++) {
>  qemu_set_irq(s->cs_lines[i],
>   i == imx_spi_selected_channel(s) ? 0 : 1);
>  }
> @@ -424,7 +424,7 @@ static void imx_spi_realize(DeviceState *dev, Error 
> **errp)
>  sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iomem);
>  sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->irq);
>
> -for (i = 0; i < 4; ++i) {
> +for (i = 0; i < ECSPI_NUM_CS; ++i) {
>  sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->cs_lines[i]);
>  }
>
> diff --git a/include/hw/ssi/imx_spi.h b/include/hw/ssi/imx_spi.h
> index b82b17f..eeaf49b 100644
> --- a/include/hw/ssi/imx_spi.h
> +++ b/include/hw/ssi/imx_spi.h
> @@ -77,6 +77,9 @@
>
>  #define EXTRACT(value, name) extract32(value, name##_SHIFT, name##_LENGTH)
>
> +/* number of chip selects supported */
> +#define ECSPI_NUM_CS 4
> +
>  #define TYPE_IMX_SPI "imx.spi"
>  OBJECT_DECLARE_SIMPLE_TYPE(IMXSPIState, IMX_SPI)
>
> @@ -89,7 +92,7 @@ struct IMXSPIState {
>
>  qemu_irq irq;
>
> -qemu_irq cs_lines[4];
> +qemu_irq cs_lines[ECSPI_NUM_CS];
>
>  SSIBus *bus;
>
> --
> 2.7.4
>
>



Re: [PATCH-for-5.2? 2/4] hw/arm/exynos4210: Add SD bus QOM alias on the SoC

2020-11-30 Thread Alistair Francis
On Tue, Nov 24, 2020 at 1:55 AM Philippe Mathieu-Daudé  wrote:
>
> To be able to select a particular SD bus from the command
> line, add a QOM alias on the SoC (using a unique name).
>
> Suggested-by: Peter Maydell 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/arm/exynos4210.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
> index ced2769b102..a60f08d372a 100644
> --- a/hw/arm/exynos4210.c
> +++ b/hw/arm/exynos4210.c
> @@ -408,6 +408,7 @@ static void exynos4210_realize(DeviceState *socdev, Error 
> **errp)
>
>  /*** SD/MMC host controllers ***/
>  for (n = 0; n < EXYNOS4210_SDHCI_NUMBER; n++) {
> +g_autofree char *bus_name = NULL;
>  DeviceState *carddev;
>  BlockBackend *blk;
>  DriveInfo *di;
> @@ -432,6 +433,10 @@ static void exynos4210_realize(DeviceState *socdev, 
> Error **errp)
>  sysbus_mmio_map(busdev, 0, EXYNOS4210_SDHCI_ADDR(n));
>  sysbus_connect_irq(busdev, 0, s->irq_table[exynos4210_get_irq(29, 
> n)]);
>
> +/* Alias controller SD bus to the SoC itself */
> +bus_name = g_strdup_printf("sd-bus%d", n);
> +object_property_add_alias(OBJECT(s), bus_name, OBJECT(dev), 
> "sd-bus");
> +
>  di = drive_get(IF_SD, 0, n);
>  blk = di ? blk_by_legacy_dinfo(di) : NULL;
>  carddev = qdev_new(TYPE_SD_CARD);
> --
> 2.26.2
>
>



[Bug 1906156] Re: Host OS Reboot Required, for Guest kext to Load (Fully)

2020-11-30 Thread Russell Morris
Sure, will do (upstream version). Is there a preferred way to do it?
Meaning ... build locally, or install from some repository?

Thanks!

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1906156

Title:
  Host OS Reboot Required, for Guest kext to Load (Fully)

Status in QEMU:
  Incomplete

Bug description:
  Hi,

  Finding this one a bit odd, but I am loading a driver (kext) in a
  macOS guest ... and it works, on the first VM (domain) startup after a
  full / clean host OS boot (or reboot). However, if I even reboot the
  guest OS, then the driver load fails => can be "corrected" by a full
  host OS reboot (which seems very extreme).

  Is this a known issue, and/or is there a workaround?

  FYI, running,
  QEMU emulator version 5.0.0 (Debian 1:5.0-5ubuntu9.1)
  Copyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers

  This is for a macOS guest, on a Linux host.

  Thanks!




Re: [PATCH 2/8] hvf: Move common code out

2020-11-30 Thread Roman Bolshakov
On Mon, Nov 30, 2020 at 10:40:49PM +0100, Alexander Graf wrote:
> Hi Peter,
> 
> On 30.11.20 22:08, Peter Collingbourne wrote:
> > On Mon, Nov 30, 2020 at 12:56 PM Frank Yang  wrote:
> > > 
> > > 
> > > On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf  wrote:
> > > > Hi Frank,
> > > > 
> > > > Thanks for the update :). Your previous email nudged me into the right 
> > > > direction. I previously had implemented WFI through the internal timer 
> > > > framework which performed way worse.
> > > Cool, glad it's helping. Also, Peter found out that the main thing 
> > > keeping us from just using cntpct_el0 on the host directly and compare 
> > > with cval is that if we sleep, cval is going to be much < cntpct_el0 by 
> > > the sleep time. If we can get either the architecture or macos to read 
> > > out the sleep time then we might be able to not have to use a poll 
> > > interval either!
> > > > Along the way, I stumbled over a few issues though. For starters, the 
> > > > signal mask for SIG_IPI was not set correctly, so while pselect() would 
> > > > exit, the signal would never get delivered to the thread! For a fix, 
> > > > check out
> > > > 
> > > >
> > > > https://patchew.org/QEMU/20201130030723.78326-1-ag...@csgraf.de/20201130030723.78326-4-ag...@csgraf.de/
> > > > 
> > > Thanks, we'll take a look :)
> > > 
> > > > Please also have a look at my latest stab at WFI emulation. It doesn't 
> > > > handle WFE (that's only relevant in overcommitted scenarios). But it 
> > > > does handle WFI and even does something similar to hlt polling, albeit 
> > > > not with an adaptive threshold.
> > Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
> > I'll reply to your patch here. You have:
> > 
> > +/* Set cpu->hvf->sleeping so that we get a
> > SIG_IPI signal. */
> > +cpu->hvf->sleeping = true;
> > +smp_mb();
> > +
> > +/* Bail out if we received an IRQ meanwhile */
> > +if (cpu->thread_kicked || (cpu->interrupt_request &
> > +(CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
> > +cpu->hvf->sleeping = false;
> > +break;
> > +}
> > +
> > +/* nanosleep returns on signal, so we wake up on kick. 
> > */
> > +nanosleep(ts, NULL);
> > 
> > and then send the signal conditional on whether sleeping is true, but
> > I think this is racy. If the signal is sent after sleeping is set to
> > true but before entering nanosleep then I think it will be ignored and
> > we will miss the wakeup. That's why in my implementation I block IPI
> > on the CPU thread at startup and then use pselect to atomically
> > unblock and begin sleeping. The signal is sent unconditionally so
> > there's no need to worry about races between actually sleeping and the
> > "we think we're sleeping" state. It may lead to an extra wakeup but
> > that's better than missing it entirely.
> 
> 
> Thanks a bunch for the comment! So the trick I was using here is to modify
> the timespec from the kick function before sending the IPI signal. That way,
> we know that either we are inside the sleep (where the signal wakes it up)
> or we are outside the sleep (where timespec={} will make it return
> immediately).
> 
> The only race I can think of is if nanosleep does calculations based on the
> timespec and we happen to send the signal right there and then.
> 
> The problem with blocking IPIs is basically what Frank was describing
> earlier: How do you unset the IPI signal pending status? If the signal is
> never delivered, how can pselect differentiate "signal from last time is
> still pending" from "new signal because I got an IPI"?
> 
> 

Hi Alex,

There was a patch for x86 HVF that implements CPU kick and it wasn't
merged (mostly because of my laziness). It has some changes like you
introduced in the series and VMX-specific handling of the preemption
timer to guarantee interrupt delivery without kick loss:

https://patchwork.kernel.org/project/qemu-devel/patch/20200729124832.79375-1-r.bolsha...@yadro.com/

I wonder if it'd be possible to have common handling of kicks for both
x86 and arm (given that arch-specific bits are wrapped)?

Thanks,
Roman



Re: [PATCH 2/8] hvf: Move common code out

2020-11-30 Thread Alexander Graf



On 01.12.20 01:00, Peter Collingbourne wrote:

On Mon, Nov 30, 2020 at 3:18 PM Alexander Graf  wrote:


On 01.12.20 00:01, Peter Collingbourne wrote:

On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf  wrote:

Hi Peter,

On 30.11.20 22:08, Peter Collingbourne wrote:

On Mon, Nov 30, 2020 at 12:56 PM Frank Yang  wrote:

On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf  wrote:

Hi Frank,

Thanks for the update :). Your previous email nudged me into the right 
direction. I previously had implemented WFI through the internal timer 
framework which performed way worse.

Cool, glad it's helping. Also, Peter found out that the main thing keeping us from 
just using cntpct_el0 on the host directly and compare with cval is that if we 
sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get 
either the architecture or macos to read out the sleep time then we might be able 
to not have to use a poll interval either!

Along the way, I stumbled over a few issues though. For starters, the signal 
mask for SIG_IPI was not set correctly, so while pselect() would exit, the 
signal would never get delivered to the thread! For a fix, check out

 
https://patchew.org/QEMU/20201130030723.78326-1-ag...@csgraf.de/20201130030723.78326-4-ag...@csgraf.de/


Thanks, we'll take a look :)


Please also have a look at my latest stab at WFI emulation. It doesn't handle 
WFE (that's only relevant in overcommitted scenarios). But it does handle WFI 
and even does something similar to hlt polling, albeit not with an adaptive 
threshold.

Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
I'll reply to your patch here. You have:

+/* Set cpu->hvf->sleeping so that we get a
SIG_IPI signal. */
+cpu->hvf->sleeping = true;
+smp_mb();
+
+/* Bail out if we received an IRQ meanwhile */
+if (cpu->thread_kicked || (cpu->interrupt_request &
+(CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
+cpu->hvf->sleeping = false;
+break;
+}
+
+/* nanosleep returns on signal, so we wake up on kick. */
+nanosleep(ts, NULL);

and then send the signal conditional on whether sleeping is true, but
I think this is racy. If the signal is sent after sleeping is set to
true but before entering nanosleep then I think it will be ignored and
we will miss the wakeup. That's why in my implementation I block IPI
on the CPU thread at startup and then use pselect to atomically
unblock and begin sleeping. The signal is sent unconditionally so
there's no need to worry about races between actually sleeping and the
"we think we're sleeping" state. It may lead to an extra wakeup but
that's better than missing it entirely.

Thanks a bunch for the comment! So the trick I was using here is to
modify the timespec from the kick function before sending the IPI
signal. That way, we know that either we are inside the sleep (where the
signal wakes it up) or we are outside the sleep (where timespec={} will
make it return immediately).

The only race I can think of is if nanosleep does calculations based on
the timespec and we happen to send the signal right there and then.

Yes that's the race I was thinking of. Admittedly it's a small window
but it's theoretically possible and part of the reason why pselect was
created.


The problem with blocking IPIs is basically what Frank was describing
earlier: How do you unset the IPI signal pending status? If the signal
is never delivered, how can pselect differentiate "signal from last time
is still pending" from "new signal because I got an IPI"?

In this case we would take the additional wakeup which should be
harmless since we will take the WFx exit again and put us in the
correct state. But that's a lot better than busy looping.


I'm not sure I follow. I'm thinking of the following scenario:

- trap into WFI handler
- go to sleep with blocked SIG_IPI
- SIG_IPI arrives, pselect() exits
- signal is still pending because it's blocked
- enter guest
- trap into WFI handler
- run pselect(), but it immediately exits because SIG_IPI is still pending

This was the loop I was seeing when running with SIG_IPI blocked. That's
part of the reason why I switched to a different model.

What I observe is that when returning from a pending signal pselect
consumes the signal (which is also consistent with my understanding of
what pselect does). That means that it doesn't matter if we take a
second WFx exit because once we reach the pselect in the second WFx
exit the signal will have been consumed by the pselect in the first
exit and we will just wait for the next one.

I don't know why things may have been going wrong in your
implementation but it may be related to the issue with
mach_absolute_time() which I posted about separately and was also
causing busy loops for us in so

Re: [PATCH 2/8] hvf: Move common code out

2020-11-30 Thread Peter Collingbourne
On Mon, Nov 30, 2020 at 3:18 PM Alexander Graf  wrote:
>
>
> On 01.12.20 00:01, Peter Collingbourne wrote:
> > On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf  wrote:
> >> Hi Peter,
> >>
> >> On 30.11.20 22:08, Peter Collingbourne wrote:
> >>> On Mon, Nov 30, 2020 at 12:56 PM Frank Yang  wrote:
> 
>  On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf  wrote:
> > Hi Frank,
> >
> > Thanks for the update :). Your previous email nudged me into the right 
> > direction. I previously had implemented WFI through the internal timer 
> > framework which performed way worse.
>  Cool, glad it's helping. Also, Peter found out that the main thing 
>  keeping us from just using cntpct_el0 on the host directly and compare 
>  with cval is that if we sleep, cval is going to be much < cntpct_el0 by 
>  the sleep time. If we can get either the architecture or macos to read 
>  out the sleep time then we might be able to not have to use a poll 
>  interval either!
> > Along the way, I stumbled over a few issues though. For starters, the 
> > signal mask for SIG_IPI was not set correctly, so while pselect() would 
> > exit, the signal would never get delivered to the thread! For a fix, 
> > check out
> >
> > 
> > https://patchew.org/QEMU/20201130030723.78326-1-ag...@csgraf.de/20201130030723.78326-4-ag...@csgraf.de/
> >
>  Thanks, we'll take a look :)
> 
> > Please also have a look at my latest stab at WFI emulation. It doesn't 
> > handle WFE (that's only relevant in overcommitted scenarios). But it 
> > does handle WFI and even does something similar to hlt polling, albeit 
> > not with an adaptive threshold.
> >>> Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
> >>> I'll reply to your patch here. You have:
> >>>
> >>> +/* Set cpu->hvf->sleeping so that we get a
> >>> SIG_IPI signal. */
> >>> +cpu->hvf->sleeping = true;
> >>> +smp_mb();
> >>> +
> >>> +/* Bail out if we received an IRQ meanwhile */
> >>> +if (cpu->thread_kicked || (cpu->interrupt_request &
> >>> +(CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
> >>> +cpu->hvf->sleeping = false;
> >>> +break;
> >>> +}
> >>> +
> >>> +/* nanosleep returns on signal, so we wake up on 
> >>> kick. */
> >>> +nanosleep(ts, NULL);
> >>>
> >>> and then send the signal conditional on whether sleeping is true, but
> >>> I think this is racy. If the signal is sent after sleeping is set to
> >>> true but before entering nanosleep then I think it will be ignored and
> >>> we will miss the wakeup. That's why in my implementation I block IPI
> >>> on the CPU thread at startup and then use pselect to atomically
> >>> unblock and begin sleeping. The signal is sent unconditionally so
> >>> there's no need to worry about races between actually sleeping and the
> >>> "we think we're sleeping" state. It may lead to an extra wakeup but
> >>> that's better than missing it entirely.
> >>
> >> Thanks a bunch for the comment! So the trick I was using here is to
> >> modify the timespec from the kick function before sending the IPI
> >> signal. That way, we know that either we are inside the sleep (where the
> >> signal wakes it up) or we are outside the sleep (where timespec={} will
> >> make it return immediately).
> >>
> >> The only race I can think of is if nanosleep does calculations based on
> >> the timespec and we happen to send the signal right there and then.
> > Yes that's the race I was thinking of. Admittedly it's a small window
> > but it's theoretically possible and part of the reason why pselect was
> > created.
> >
> >> The problem with blocking IPIs is basically what Frank was describing
> >> earlier: How do you unset the IPI signal pending status? If the signal
> >> is never delivered, how can pselect differentiate "signal from last time
> >> is still pending" from "new signal because I got an IPI"?
> > In this case we would take the additional wakeup which should be
> > harmless since we will take the WFx exit again and put us in the
> > correct state. But that's a lot better than busy looping.
>
>
> I'm not sure I follow. I'm thinking of the following scenario:
>
>- trap into WFI handler
>- go to sleep with blocked SIG_IPI
>- SIG_IPI arrives, pselect() exits
>- signal is still pending because it's blocked
>- enter guest
>- trap into WFI handler
>- run pselect(), but it immediate exits because SIG_IPI is still pending
>
> This was the loop I was seeing when running with SIG_IPI blocked. That's
> part of the reason why I switched to a different model.

What I observe is that when returning from a pending signal pselect
consumes the signal (which is also consistent with my understanding of
what pselect does

Re: [PATCH 2/8] hvf: Move common code out

2020-11-30 Thread Alexander Graf



On 01.12.20 00:01, Peter Collingbourne wrote:

On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf  wrote:

Hi Peter,

On 30.11.20 22:08, Peter Collingbourne wrote:

On Mon, Nov 30, 2020 at 12:56 PM Frank Yang  wrote:


On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf  wrote:

Hi Frank,

Thanks for the update :). Your previous email nudged me into the right 
direction. I previously had implemented WFI through the internal timer 
framework which performed way worse.

Cool, glad it's helping. Also, Peter found out that the main thing keeping us from 
just using cntpct_el0 on the host directly and compare with cval is that if we 
sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get 
either the architecture or macos to read out the sleep time then we might be able 
to not have to use a poll interval either!

Along the way, I stumbled over a few issues though. For starters, the signal 
mask for SIG_IPI was not set correctly, so while pselect() would exit, the 
signal would never get delivered to the thread! For a fix, check out


https://patchew.org/QEMU/20201130030723.78326-1-ag...@csgraf.de/20201130030723.78326-4-ag...@csgraf.de/


Thanks, we'll take a look :)


Please also have a look at my latest stab at WFI emulation. It doesn't handle 
WFE (that's only relevant in overcommitted scenarios). But it does handle WFI 
and even does something similar to hlt polling, albeit not with an adaptive 
threshold.

Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
I'll reply to your patch here. You have:

+/* Set cpu->hvf->sleeping so that we get a
SIG_IPI signal. */
+cpu->hvf->sleeping = true;
+smp_mb();
+
+/* Bail out if we received an IRQ meanwhile */
+if (cpu->thread_kicked || (cpu->interrupt_request &
+(CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
+cpu->hvf->sleeping = false;
+break;
+}
+
+/* nanosleep returns on signal, so we wake up on kick. */
+nanosleep(ts, NULL);

and then send the signal conditional on whether sleeping is true, but
I think this is racy. If the signal is sent after sleeping is set to
true but before entering nanosleep then I think it will be ignored and
we will miss the wakeup. That's why in my implementation I block IPI
on the CPU thread at startup and then use pselect to atomically
unblock and begin sleeping. The signal is sent unconditionally so
there's no need to worry about races between actually sleeping and the
"we think we're sleeping" state. It may lead to an extra wakeup but
that's better than missing it entirely.


Thanks a bunch for the comment! So the trick I was using here is to
modify the timespec from the kick function before sending the IPI
signal. That way, we know that either we are inside the sleep (where the
signal wakes it up) or we are outside the sleep (where timespec={} will
make it return immediately).

The only race I can think of is if nanosleep does calculations based on
the timespec and we happen to send the signal right there and then.

Yes that's the race I was thinking of. Admittedly it's a small window
but it's theoretically possible and part of the reason why pselect was
created.


The problem with blocking IPIs is basically what Frank was describing
earlier: How do you unset the IPI signal pending status? If the signal
is never delivered, how can pselect differentiate "signal from last time
is still pending" from "new signal because I got an IPI"?

In this case we would take the additional wakeup which should be
harmless since we will take the WFx exit again and put us in the
correct state. But that's a lot better than busy looping.



I'm not sure I follow. I'm thinking of the following scenario:

  - trap into WFI handler
  - go to sleep with blocked SIG_IPI
  - SIG_IPI arrives, pselect() exits
  - signal is still pending because it's blocked
  - enter guest
  - trap into WFI handler
  - run pselect(), but it immediately exits because SIG_IPI is still pending

This was the loop I was seeing when running with SIG_IPI blocked. That's 
part of the reason why I switched to a different model.




I reckon that you could improve things a little by unblocking the
signal and then reblocking it before unlocking iothread (e.g. with a
pselect with zero time interval), which would flush any pending
signals. Since any such signal would correspond to a signal from last
time (because we still have the iothread lock) we know that any future
signals should correspond to new IPIs.



Yeah, I think you actually *have* to do exactly that, because otherwise 
pselect() will always return after 0ns because the signal is still pending.


And yes, I agree that that starts to sound a bit less racy now. But it 
means we can probably also just do


  - WFI handler
  - block SIG_IPI
  - set hvf->sleeping = true
  - c

Re: [PATCH 2/8] hvf: Move common code out

2020-11-30 Thread Peter Collingbourne
On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf  wrote:
>
> Hi Peter,
>
> On 30.11.20 22:08, Peter Collingbourne wrote:
> > On Mon, Nov 30, 2020 at 12:56 PM Frank Yang  wrote:
> >>
> >>
> >> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf  wrote:
> >>> Hi Frank,
> >>>
> >>> Thanks for the update :). Your previous email nudged me into the right 
> >>> direction. I previously had implemented WFI through the internal timer 
> >>> framework which performed way worse.
> >> Cool, glad it's helping. Also, Peter found out that the main thing keeping 
> >> us from just using cntpct_el0 on the host directly and compare with cval 
> >> is that if we sleep, cval is going to be much < cntpct_el0 by the sleep 
> >> time. If we can get either the architecture or macos to read out the sleep 
> >> time then we might be able to not have to use a poll interval either!
> >>> Along the way, I stumbled over a few issues though. For starters, the 
> >>> signal mask for SIG_IPI was not set correctly, so while pselect() would 
> >>> exit, the signal would never get delivered to the thread! For a fix, 
> >>> check out
> >>>
> >>>
> >>> https://patchew.org/QEMU/20201130030723.78326-1-ag...@csgraf.de/20201130030723.78326-4-ag...@csgraf.de/
> >>>
> >> Thanks, we'll take a look :)
> >>
> >>> Please also have a look at my latest stab at WFI emulation. It doesn't 
> >>> handle WFE (that's only relevant in overcommitted scenarios). But it does 
> >>> handle WFI and even does something similar to hlt polling, albeit not 
> >>> with an adaptive threshold.
> > Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
> > I'll reply to your patch here. You have:
> >
> > +/* Set cpu->hvf->sleeping so that we get a
> > SIG_IPI signal. */
> > +cpu->hvf->sleeping = true;
> > +smp_mb();
> > +
> > +/* Bail out if we received an IRQ meanwhile */
> > +if (cpu->thread_kicked || (cpu->interrupt_request &
> > +(CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
> > +cpu->hvf->sleeping = false;
> > +break;
> > +}
> > +
> > +/* nanosleep returns on signal, so we wake up on kick. 
> > */
> > +nanosleep(ts, NULL);
> >
> > and then send the signal conditional on whether sleeping is true, but
> > I think this is racy. If the signal is sent after sleeping is set to
> > true but before entering nanosleep then I think it will be ignored and
> > we will miss the wakeup. That's why in my implementation I block IPI
> > on the CPU thread at startup and then use pselect to atomically
> > unblock and begin sleeping. The signal is sent unconditionally so
> > there's no need to worry about races between actually sleeping and the
> > "we think we're sleeping" state. It may lead to an extra wakeup but
> > that's better than missing it entirely.
>
>
> Thanks a bunch for the comment! So the trick I was using here is to
> modify the timespec from the kick function before sending the IPI
> signal. That way, we know that either we are inside the sleep (where the
> signal wakes it up) or we are outside the sleep (where timespec={} will
> make it return immediately).
>
> The only race I can think of is if nanosleep does calculations based on
> the timespec and we happen to send the signal right there and then.

Yes that's the race I was thinking of. Admittedly it's a small window
but it's theoretically possible and part of the reason why pselect was
created.

> The problem with blocking IPIs is basically what Frank was describing
> earlier: How do you unset the IPI signal pending status? If the signal
> is never delivered, how can pselect differentiate "signal from last time
> is still pending" from "new signal because I got an IPI"?

In this case we would take the additional wakeup which should be
harmless since we will take the WFx exit again and put us in the
correct state. But that's a lot better than busy looping.

I reckon that you could improve things a little by unblocking the
signal and then reblocking it before unlocking iothread (e.g. with a
pselect with zero time interval), which would flush any pending
signals. Since any such signal would correspond to a signal from last
time (because we still have the iothread lock) we know that any future
signals should correspond to new IPIs.

Peter
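
A minimal self-contained sketch of the block-then-pselect() pattern described
above, for readers following the thread. It is illustrative only (not code
from either patch series) and assumes the usual QEMU convention of mapping
SIG_IPI to SIGUSR1; the function names are made up:

#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

#ifndef SIG_IPI
#define SIG_IPI SIGUSR1            /* assumption: SIG_IPI maps to SIGUSR1 */
#endif

static sigset_t ipi_unblocked;     /* thread mask with SIG_IPI allowed */

static void sig_ipi_handler(int sig)
{
    (void)sig;                     /* delivery alone interrupts pselect() */
}

/* Run once per vCPU thread: keep SIG_IPI blocked except inside pselect(). */
static void vcpu_ipi_init(void)
{
    struct sigaction sa;
    sigset_t block;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sig_ipi_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIG_IPI, &sa, NULL);

    sigemptyset(&block);
    sigaddset(&block, SIG_IPI);
    pthread_sigmask(SIG_BLOCK, &block, &ipi_unblocked);
    sigdelset(&ipi_unblocked, SIG_IPI);
}

/*
 * Wait for a kick or a timeout.  Because SIG_IPI stays blocked outside this
 * call, a kick sent after the caller's last interrupt check is either already
 * pending (pselect() returns immediately and consumes it) or is delivered
 * during the wait; it cannot be lost in between.
 */
static void vcpu_wait_for_ipi(const struct timespec *timeout)
{
    pselect(0, NULL, NULL, NULL, timeout, &ipi_unblocked);
}

The kicking side can then simply pthread_kill() the vCPU thread with SIG_IPI
unconditionally, at the cost of an occasional spurious wakeup.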



Re: [PATCH 2/8] hvf: Move common code out

2020-11-30 Thread Peter Collingbourne
On Mon, Nov 30, 2020 at 12:56 PM Frank Yang  wrote:
>
>
>
> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf  wrote:
>>
>> Hi Frank,
>>
>> Thanks for the update :). Your previous email nudged me into the right 
>> direction. I previously had implemented WFI through the internal timer 
>> framework which performed way worse.
>
> Cool, glad it's helping. Also, Peter found out that the main thing keeping us 
> from just using cntpct_el0 on the host directly and compare with cval is that 
> if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we 
> can get either the architecture or macos to read out the sleep time then we 
> might be able to not have to use a poll interval either!

We tracked down the discrepancies between CNTPCT_EL0 on the guest vs
on the host to the fact that CNTPCT_EL0 on the guest does not
increment while the system is asleep and as such corresponds to
mach_absolute_time() on the host (if you read the XNU sources you will
see that mach_absolute_time() is implemented as CNTPCT_EL0 plus a
constant representing the time spent asleep) while CNTPCT_EL0 on the
host does increment while asleep. This patch switches the
implementation over to using mach_absolute_time() instead of reading
CNTPCT_EL0 directly:

https://android-review.googlesource.com/c/platform/external/qemu/+/1514870

Peter

>>
>> Along the way, I stumbled over a few issues though. For starters, the signal 
>> mask for SIG_IPI was not set correctly, so while pselect() would exit, the 
>> signal would never get delivered to the thread! For a fix, check out
>>
>>   
>> https://patchew.org/QEMU/20201130030723.78326-1-ag...@csgraf.de/20201130030723.78326-4-ag...@csgraf.de/
>>
>
> Thanks, we'll take a look :)
>
>>
>> Please also have a look at my latest stab at WFI emulation. It doesn't 
>> handle WFE (that's only relevant in overcommitted scenarios). But it does 
>> handle WFI and even does something similar to hlt polling, albeit not with 
>> an adaptive threshold.
>>
>> Also, is there a particular reason you're working on this super interesting 
>> and useful code in a random downstream fork of QEMU? Wouldn't it be more 
>> helpful to contribute to the upstream code base instead?
>
> We'd actually like to contribute upstream too :) We do want to maintain our 
> own downstream though; Android Emulator codebase needs to work solidly on 
> macos and windows which has made keeping up with upstream difficult, and 
> staying on a previous version (2.12) with known quirks easier. (there's also
> some android related customization relating to Qt Ui + different set of 
> virtual devices and snapshot support (incl. snapshots of graphics devices 
> with OpenGLES state tracking), which we hope to separate into other 
> libraries/processes, but it's not insignificant)
>>
>>
>> Alex
>>
>>
>> On 30.11.20 21:15, Frank Yang wrote:
>>
>> Update: We're not quite sure how to compare the CNTV_CVAL and CNTVCT. But 
>> the high CPU usage seems to be mitigated by having a poll interval (like KVM 
>> does) in handling WFI:
>>
>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512501
>>
>> This is loosely inspired by 
>> https://elixir.bootlin.com/linux/v5.10-rc6/source/virt/kvm/kvm_main.c#L2766 
>> which does seem to specify a poll interval.
>>
>> It would be cool if we could have a lightweight way to enter sleep and 
>> restart the vcpus precisely when CVAL passes, though.
>>
>> Frank
>>
>>
>> On Fri, Nov 27, 2020 at 3:30 PM Frank Yang  wrote:
>>>
>>> Hi all,
>>>
>>> +Peter Collingbourne
>>>
>>> I'm a developer on the Android Emulator, which is in a fork of QEMU.
>>>
>>> Peter and I have been working on an HVF Apple Silicon backend with an eye 
>>> toward Android guests.
>>>
>>> We have gotten things to basically switch to Android userspace already 
>>> (logcat/shell and graphics available at least)
>>>
>>> Our strategy so far has been to import logic from the KVM implementation 
>>> and hook into QEMU's software devices that previously assumed to only work 
>>> with TCG, or have KVM-specific paths.
>>>
>>> Thanks to Alexander for the tip on the 36-bit address space limitation btw; 
>>> our way of addressing this is to still allow highmem but not put pci high 
>>> mmio so high.
>>>
>>> Also, note we have a sleep/signal based mechanism to deal with WFx, which 
>>> might be worth looking into in Alexander's implementation as well:
>>>
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512551
>>>
>>> Patches so far, FYI:
>>>
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1513429/1
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512554/3
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512553/3
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512552/3
>>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512551/3
>>>
>>> https://android.googlesource.com/platform/external/qemu/+/c17eb6a3ffd5004

Re: [PATCH 2/8] hvf: Move common code out

2020-11-30 Thread Peter Maydell
On Mon, 30 Nov 2020 at 20:56, Frank Yang  wrote:
> We'd actually like to contribute upstream too :) We do want to maintain
> our own downstream though; Android Emulator codebase needs to work
> solidly on macos and windows which has made keeping up with upstream difficult

One of the main reasons why OSX and Windows support upstream is
not so great is because very few people are helping to develop,
test and support it upstream. The way to fix that IMHO is for more
people who do care about those platforms to actively engage
with us upstream to help in making those platforms move closer to
being first class citizens. If you stay on a downstream fork
forever then I don't think you'll ever see things improve.

thanks
-- PMM



Re: [PATCH] qmp-shell: Sort by key when pretty-printing

2020-11-30 Thread David Edmondson
On Monday, 2020-11-30 at 15:56:51 -05, John Snow wrote:

> On 10/13/20 10:14 AM, David Edmondson wrote:
>> If the user selects pretty-printing (-p) the contents of any
>> dictionaries in the output are sorted by key.
>> 
>> Signed-off-by: David Edmondson 
>> ---
>>   scripts/qmp/qmp-shell | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
>> index c5eef06f3f..b4d06096ab 100755
>> --- a/scripts/qmp/qmp-shell
>> +++ b/scripts/qmp/qmp-shell
>> @@ -260,7 +260,7 @@ class QMPShell(qmp.QEMUMonitorProtocol):
>>   indent = None
>>   if self._pretty:
>>   indent = 4
>> -jsobj = json.dumps(qmp, indent=indent)
>> +jsobj = json.dumps(qmp, indent=indent, sort_keys=self._pretty)
>>   print(str(jsobj))
>>   
>>   def _execute_cmd(self, cmdline):
>> 
>
> Hi, out of curiosity, what does this help you accomplish?

When dumping a dictionary with many values, visually finding a specific
one that is of interest is much quicker if they are sorted. Nothing more
than that.

> I've recently been overhauling a LOT of the Python utilities we have, so 
> I'm interested in hearing about how people use these tools and what 
> they'd like them to do.
>
> --js

dme.
-- 
I can't explain, you would not understand. This is not how I am.
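
As a small illustration with a made-up response, plain -p prints keys in
insertion order:

{
    "return": {
        "vcpus": 4,
        "arch": "x86_64",
        "running": true
    }
}

whereas with sort_keys=True the same response lists "arch", "running",
"vcpus" inside "return", which is what makes scanning a large dictionary
quicker.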



Re: [RFC PATCH-for-5.2] gitlab-ci: Do not automatically run Avocado integration tests anymore

2020-11-30 Thread Willian Rampazzo
On Fri, Nov 27, 2020 at 2:41 PM Philippe Mathieu-Daudé
 wrote:
>
> We lately realized that the Avocado framework was not designed
> to be regularly run on CI environments. Therefore, as of 5.2
> we deprecate the gitlab-ci jobs using Avocado. To not disrupt
> current users, it is possible to keep the current behavior by
> setting the QEMU_CI_INTEGRATION_JOBS_PRE_5_2_RELEASE variable
> (see [*]).
> From now on, using these jobs (or adding new tests to them)
> is strongly discouraged.

When you say, "Avocado framework was not designed to be regularly run
on CI environments", I feel your pain. Avocado is a really nice test
framework, and I agree with you that running it locally is a little
easier than running inside a CI environment. Debugging a job failure
in the CI is not user-friendly; finding the command to reproduce a job
failure locally is not user-friendly. I understand why you would like
to remove the CI's acceptance tests, but I think your proposal is
missing some arguments and some planning.

If I read correctly, we share the same view that the CI and the
software tests are two different things. Here you are proposing that
we temporarily remove the CI's acceptance tests because it is not
user-friendly to the devs. This does not mean the tests will be lost.
It will be possible to run them locally or in the CI using the
QEMU_CI_INTEGRATION_JOBS_PRE_5_2_RELEASE variable.

>
> Tests based on Avocado will be ported to new job schemes during
> the next releases, with better documentation and templates.

I understand you intend to make a more reliable and stable CI. Some
wording on why the new job scheme will be better than what we have now
and some planning on enabling the acceptance tests again in the CI may
help evaluate if it is feasible or just the same as what we have
today.

It would be nice to hear from other subsystem maintainers their pain
points about using the CI and how we can improve it. I hear you that
Avocado needs to improve its interface to be more user friendly. As an
Avocado developer, I would also like to hear from others where we can
improve Avocado to make it less painful for the QEMU developers.




Re: [PATCH 2/8] hvf: Move common code out

2020-11-30 Thread Alexander Graf

Hi Peter,

On 30.11.20 22:08, Peter Collingbourne wrote:

On Mon, Nov 30, 2020 at 12:56 PM Frank Yang  wrote:



On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf  wrote:

Hi Frank,

Thanks for the update :). Your previous email nudged me into the right 
direction. I previously had implemented WFI through the internal timer 
framework which performed way worse.

Cool, glad it's helping. Also, Peter found out that the main thing keeping us from 
just using cntpct_el0 on the host directly and compare with cval is that if we 
sleep, cval is going to be much < cntpct_el0 by the sleep time. If we can get 
either the architecture or macos to read out the sleep time then we might be able 
to not have to use a poll interval either!

Along the way, I stumbled over a few issues though. For starters, the signal 
mask for SIG_IPI was not set correctly, so while pselect() would exit, the 
signal would never get delivered to the thread! For a fix, check out

   
https://patchew.org/QEMU/20201130030723.78326-1-ag...@csgraf.de/20201130030723.78326-4-ag...@csgraf.de/


Thanks, we'll take a look :)


Please also have a look at my latest stab at WFI emulation. It doesn't handle 
WFE (that's only relevant in overcommitted scenarios). But it does handle WFI 
and even does something similar to hlt polling, albeit not with an adaptive 
threshold.

Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
I'll reply to your patch here. You have:

+/* Set cpu->hvf->sleeping so that we get a
SIG_IPI signal. */
+cpu->hvf->sleeping = true;
+smp_mb();
+
+/* Bail out if we received an IRQ meanwhile */
+if (cpu->thread_kicked || (cpu->interrupt_request &
+(CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
+cpu->hvf->sleeping = false;
+break;
+}
+
+/* nanosleep returns on signal, so we wake up on kick. */
+nanosleep(ts, NULL);

and then send the signal conditional on whether sleeping is true, but
I think this is racy. If the signal is sent after sleeping is set to
true but before entering nanosleep then I think it will be ignored and
we will miss the wakeup. That's why in my implementation I block IPI
on the CPU thread at startup and then use pselect to atomically
unblock and begin sleeping. The signal is sent unconditionally so
there's no need to worry about races between actually sleeping and the
"we think we're sleeping" state. It may lead to an extra wakeup but
that's better than missing it entirely.



Thanks a bunch for the comment! So the trick I was using here is to 
modify the timespec from the kick function before sending the IPI 
signal. That way, we know that either we are inside the sleep (where the 
signal wakes it up) or we are outside the sleep (where timespec={} will 
make it return immediately).


The only race I can think of is if nanosleep does calculations based on 
the timespec and we happen to send the signal right there and then.


The problem with blocking IPIs is basically what Frank was describing 
earlier: How do you unset the IPI signal pending status? If the signal 
is never delivered, how can pselect differentiate "signal from last time 
is still pending" from "new signal because I got an IPI"?



Alex




Re: [PATCH-for-6.0 v4 15/17] gitlab-ci: Add test for Xen (on CentOS 7)

2020-11-30 Thread Stefano Stabellini
On Fri, 27 Nov 2020, Anthony PERARD wrote:
> On Thu, Nov 26, 2020 at 12:45:59PM -0500, Eduardo Habkost wrote:
> > On Thu, Nov 26, 2020 at 05:38:24PM +, Anthony PERARD wrote:
> > > Is `make check` going to do something useful with the Xen support? Or is
> > > it going to need more work in order to test the Xen support of QEMU?
> > > (Like starting an actual Xen guest.)
> > 
> > I don't think it will test Xen support, but we still want to at
> > least check if --enable-xen doesn't break anything else.
> 
> That sound good.
> 
> > Is there any public CI system anywhere where Xen support is
> > tested today?
> 
> Yes, we have osstest which regularly tests Xen with QEMU from upstream.
> Results are sent to xen-devel. But that might not be very useful for
> qemu-devel.
> 
> We also have a GitLab CI which does some Xen tests, but I don't think
> QEMU is tested there.
> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=automation/gitlab-ci/test.yaml;hb=HEAD
> https://gitlab.com/xen-project/xen/

QEMU (the version of QEMU picked by the Xen tools) is built but not used
in the Xen Project CI-loop yet.

I am extending the CI-loop with more tests [1], and I would like to have at
least one QEMU test at some point soon. Probably something based on Xen 9pfs.

[1] https://marc.info/?l=xen-devel&m=160627845825763 



Re: [PATCH 2/8] hvf: Move common code out

2020-11-30 Thread Peter Collingbourne
On Mon, Nov 30, 2020 at 12:56 PM Frank Yang  wrote:
>
>
>
> On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf  wrote:
>>
>> Hi Frank,
>>
>> Thanks for the update :). Your previous email nudged me into the right 
>> direction. I previously had implemented WFI through the internal timer 
>> framework which performed way worse.
>
> Cool, glad it's helping. Also, Peter found out that the main thing keeping us 
> from just using cntpct_el0 on the host directly and compare with cval is that 
> if we sleep, cval is going to be much < cntpct_el0 by the sleep time. If we 
> can get either the architecture or macos to read out the sleep time then we 
> might be able to not have to use a poll interval either!
>>
>> Along the way, I stumbled over a few issues though. For starters, the signal 
>> mask for SIG_IPI was not set correctly, so while pselect() would exit, the 
>> signal would never get delivered to the thread! For a fix, check out
>>
>>   
>> https://patchew.org/QEMU/20201130030723.78326-1-ag...@csgraf.de/20201130030723.78326-4-ag...@csgraf.de/
>>
>
> Thanks, we'll take a look :)
>
>>
>> Please also have a look at my latest stab at WFI emulation. It doesn't 
>> handle WFE (that's only relevant in overcommitted scenarios). But it does 
>> handle WFI and even does something similar to hlt polling, albeit not with 
>> an adaptive threshold.

Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
I'll reply to your patch here. You have:

+/* Set cpu->hvf->sleeping so that we get a
SIG_IPI signal. */
+cpu->hvf->sleeping = true;
+smp_mb();
+
+/* Bail out if we received an IRQ meanwhile */
+if (cpu->thread_kicked || (cpu->interrupt_request &
+(CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
+cpu->hvf->sleeping = false;
+break;
+}
+
+/* nanosleep returns on signal, so we wake up on kick. */
+nanosleep(ts, NULL);

and then send the signal conditional on whether sleeping is true, but
I think this is racy. If the signal is sent after sleeping is set to
true but before entering nanosleep then I think it will be ignored and
we will miss the wakeup. That's why in my implementation I block IPI
on the CPU thread at startup and then use pselect to atomically
unblock and begin sleeping. The signal is sent unconditionally so
there's no need to worry about races between actually sleeping and the
"we think we're sleeping" state. It may lead to an extra wakeup but
that's better than missing it entirely.

Peter

>>
>> Also, is there a particular reason you're working on this super interesting 
>> and useful code in a random downstream fork of QEMU? Wouldn't it be more 
>> helpful to contribute to the upstream code base instead?
>
> We'd actually like to contribute upstream too :) We do want to maintain our 
> own downstream though; Android Emulator codebase needs to work solidly on 
> macos and windows which has made keeping up with upstream difficult, and 
> staying on a previous version (2.12) with known quirks easier. (there's also
> some android related customization relating to Qt Ui + different set of 
> virtual devices and snapshot support (incl. snapshots of graphics devices 
> with OpenGLES state tracking), which we hope to separate into other 
> libraries/processes, but it's not insignificant)
>>
>>
>> Alex
>>
>>
>> On 30.11.20 21:15, Frank Yang wrote:
>>
>> Update: We're not quite sure how to compare the CNTV_CVAL and CNTVCT. But 
>> the high CPU usage seems to be mitigated by having a poll interval (like KVM 
>> does) in handling WFI:
>>
>> https://android-review.googlesource.com/c/platform/external/qemu/+/1512501
>>
>> This is loosely inspired by 
>> https://elixir.bootlin.com/linux/v5.10-rc6/source/virt/kvm/kvm_main.c#L2766 
>> which does seem to specify a poll interval.
>>
>> It would be cool if we could have a lightweight way to enter sleep and 
>> restart the vcpus precisely when CVAL passes, though.
>>
>> Frank
>>
>>
>> On Fri, Nov 27, 2020 at 3:30 PM Frank Yang  wrote:
>>>
>>> Hi all,
>>>
>>> +Peter Collingbourne
>>>
>>> I'm a developer on the Android Emulator, which is in a fork of QEMU.
>>>
>>> Peter and I have been working on an HVF Apple Silicon backend with an eye 
>>> toward Android guests.
>>>
>>> We have gotten things to basically switch to Android userspace already 
>>> (logcat/shell and graphics available at least)
>>>
>>> Our strategy so far has been to import logic from the KVM implementation 
>>> and hook into QEMU's software devices that previously assumed to only work 
>>> with TCG, or have KVM-specific paths.
>>>
>>> Thanks to Alexander for the tip on the 36-bit address space limitation btw; 
>>> our way of addressing this is to still allow highmem but not put pci high 
>>> mmio so high.
>>>
>>> Also, note we have a sleep/signal based mechanism to deal with WFx, 

Re: [PATCH v2 07/12] tpm: put some tpm devices into the correct category

2020-11-30 Thread Stefan Berger

On 11/30/20 3:36 AM, Gan Qixin wrote:

Some TPM devices have no category; put them into the correct category.

Signed-off-by: Gan Qixin 
---
Cc: Stefan Berger 
---
  hw/tpm/tpm_tis_isa.c| 1 +
  hw/tpm/tpm_tis_sysbus.c | 1 +
  2 files changed, 2 insertions(+)

diff --git a/hw/tpm/tpm_tis_isa.c b/hw/tpm/tpm_tis_isa.c
index 6fd876eebf..10d8a14f19 100644
--- a/hw/tpm/tpm_tis_isa.c
+++ b/hw/tpm/tpm_tis_isa.c
@@ -150,6 +150,7 @@ static void tpm_tis_isa_class_init(ObjectClass *klass, void 
*data)
  dc->reset = tpm_tis_isa_reset;
  tc->request_completed = tpm_tis_isa_request_completed;
  tc->get_version = tpm_tis_isa_get_tpm_version;
+set_bit(DEVICE_CATEGORY_MISC, dc->categories);
  }

  static const TypeInfo tpm_tis_isa_info = {
diff --git a/hw/tpm/tpm_tis_sysbus.c b/hw/tpm/tpm_tis_sysbus.c
index 2c32aa7099..45e63efd63 100644
--- a/hw/tpm/tpm_tis_sysbus.c
+++ b/hw/tpm/tpm_tis_sysbus.c
@@ -139,6 +139,7 @@ static void tpm_tis_sysbus_class_init(ObjectClass *klass, 
void *data)
  dc->reset = tpm_tis_sysbus_reset;
  tc->request_completed = tpm_tis_sysbus_request_completed;
  tc->get_version = tpm_tis_sysbus_get_tpm_version;
+set_bit(DEVICE_CATEGORY_MISC, dc->categories);
  }

  static const TypeInfo tpm_tis_sysbus_info = {


Reviewed-by: Stefan Berger 





Re: [PATCH] qmp-shell: Sort by key when pretty-printing

2020-11-30 Thread John Snow

On 10/13/20 10:14 AM, David Edmondson wrote:

If the user selects pretty-printing (-p) the contents of any
dictionaries in the output are sorted by key.

Signed-off-by: David Edmondson 
---
  scripts/qmp/qmp-shell | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
index c5eef06f3f..b4d06096ab 100755
--- a/scripts/qmp/qmp-shell
+++ b/scripts/qmp/qmp-shell
@@ -260,7 +260,7 @@ class QMPShell(qmp.QEMUMonitorProtocol):
  indent = None
  if self._pretty:
  indent = 4
-jsobj = json.dumps(qmp, indent=indent)
+jsobj = json.dumps(qmp, indent=indent, sort_keys=self._pretty)
  print(str(jsobj))
  
  def _execute_cmd(self, cmdline):




Hi, out of curiosity, what does this help you accomplish?

I've recently been overhauling a LOT of the Python utilities we have, so 
I'm interested in hearing about how people use these tools and what 
they'd like them to do.


--js




Re: [PATCH v2] hw/nvme: Move NVMe emulation out of hw/block/ directory

2020-11-30 Thread Klaus Jensen
On Nov 30 15:52, Philippe Mathieu-Daudé wrote:
> As IDE used to be, NVMe emulation is becoming an active
> subsystem. Move it into its own namespace.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> v2: Rebased after nvme-ns got merged in commit 8680d6e3646
> ---
>  meson.build   |   1 +
>  hw/{block/nvme.h => nvme/nvme-internal.h} |   4 +-
>  hw/{block => nvme}/nvme-ns.h  |   0
>  hw/{block/nvme.c => nvme/core.c}  |   2 +-
>  hw/{block => nvme}/nvme-ns.c  |   0
>  MAINTAINERS   |   2 +-
>  hw/Kconfig|   1 +
>  hw/block/Kconfig  |   5 -
>  hw/block/meson.build  |   1 -
>  hw/block/trace-events | 132 -
>  hw/meson.build|   1 +
>  hw/nvme/Kconfig   |   4 +
>  hw/nvme/meson.build   |   1 +
>  hw/nvme/trace-events  | 133 ++
>  14 files changed, 145 insertions(+), 142 deletions(-)
>  rename hw/{block/nvme.h => nvme/nvme-internal.h} (98%)
>  rename hw/{block => nvme}/nvme-ns.h (100%)
>  rename hw/{block/nvme.c => nvme/core.c} (99%)
>  rename hw/{block => nvme}/nvme-ns.c (100%)

Would we want to consider renaming nvme-ns.c to namespace.c? And maybe
also follow up with consolidating nvme-ns.h into nvme-internal.h?


signature.asc
Description: PGP signature


Re: [PATCH 2/8] hvf: Move common code out

2020-11-30 Thread Frank Yang
Update: We're not quite sure how to compare the CNTV_CVAL and CNTVCT. But
the high CPU usage seems to be mitigated by having a poll interval (like
KVM does) in handling WFI:

https://android-review.googlesource.com/c/platform/external/qemu/+/1512501

This is loosely inspired by
https://elixir.bootlin.com/linux/v5.10-rc6/source/virt/kvm/kvm_main.c#L2766
which does seem to specify a poll interval.

It would be cool if we could have a lightweight way to enter sleep and
restart the vcpus precisely when CVAL passes, though.
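
A rough sketch of the kind of computation this would involve (illustrative
only, not the Android patch; it assumes the backend can read CNTV_CVAL_EL0,
CNTVCT_EL0 and CNTFRQ_EL0, and uses QEMU's muldiv64()/NANOSECONDS_PER_SECOND
helpers from "qemu/osdep.h" and "qemu/timer.h"):

static uint64_t wfi_sleep_ns(uint64_t cval, uint64_t cntvct, uint64_t cntfrq)
{
    if (cval <= cntvct) {
        return 0;    /* timer condition already met, don't go to sleep */
    }
    /* remaining counter ticks converted to nanoseconds */
    return muldiv64(cval - cntvct, NANOSECONDS_PER_SECOND, cntfrq);
}

The vcpu could then sleep for at most that long (or until a signal arrives)
instead of polling.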

Frank


On Fri, Nov 27, 2020 at 3:30 PM Frank Yang  wrote:

> Hi all,
>
> +Peter Collingbourne 
>
> I'm a developer on the Android Emulator, which is in a fork of QEMU.
>
> Peter and I have been working on an HVF Apple Silicon backend with an eye
> toward Android guests.
>
> We have gotten things to basically switch to Android userspace already
> (logcat/shell and graphics available at least)
>
> Our strategy so far has been to import logic from the KVM implementation
> and hook into QEMU's software devices that previously assumed to only work
> with TCG, or have KVM-specific paths.
>
> Thanks to Alexander for the tip on the 36-bit address space limitation
> btw; our way of addressing this is to still allow highmem but not put pci
> high mmio so high.
>
> Also, note we have a sleep/signal based mechanism to deal with WFx, which
> might be worth looking into in Alexander's implementation as well:
>
> https://android-review.googlesource.com/c/platform/external/qemu/+/1512551
>
> Patches so far, FYI:
>
>
> https://android-review.googlesource.com/c/platform/external/qemu/+/1513429/1
>
> https://android-review.googlesource.com/c/platform/external/qemu/+/1512554/3
>
> https://android-review.googlesource.com/c/platform/external/qemu/+/1512553/3
>
> https://android-review.googlesource.com/c/platform/external/qemu/+/1512552/3
>
> https://android-review.googlesource.com/c/platform/external/qemu/+/1512551/3
>
>
> https://android.googlesource.com/platform/external/qemu/+/c17eb6a3ffd50047e9646aff6640b710cb8ff48a
>
> https://android.googlesource.com/platform/external/qemu/+/74bed16de1afb41b7a7ab8da1d1861226c9db63b
>
> https://android.googlesource.com/platform/external/qemu/+/eccd9e47ab2ccb9003455e3bb721f57f9ebc3c01
>
> https://android.googlesource.com/platform/external/qemu/+/54fe3d67ed4698e85826537a4f49b2b9074b2228
>
> https://android.googlesource.com/platform/external/qemu/+/82ef91a6fede1d1000f36be037ad4d58fbe0d102
>
> https://android.googlesource.com/platform/external/qemu/+/c28147aa7c74d98b858e99623d2fe46e74a379f6
>
> Peter's also noticed that there are extra steps needed for M1's to allow
> TCG to work, as it involves JIT:
>
>
> https://android.googlesource.com/platform/external/qemu/+/740e3fe47f88926c6bda9abb22ee6eae1bc254a9
>
> We'd appreciate any feedback/comments :)
>
> Best,
>
> Frank
>
> On Fri, Nov 27, 2020 at 1:57 PM Alexander Graf  wrote:
>
>>
>> On 27.11.20 21:00, Roman Bolshakov wrote:
>> > On Thu, Nov 26, 2020 at 10:50:11PM +0100, Alexander Graf wrote:
>> >> Until now, Hypervisor.framework has only been available on x86_64
>> systems.
>> >> With Apple Silicon shipping now, it extends its reach to aarch64. To
>> >> prepare for support for multiple architectures, let's move common code
>> out
>> >> into its own accel directory.
>> >>
>> >> Signed-off-by: Alexander Graf 
>> >> ---
>> >>   MAINTAINERS |   9 +-
>> >>   accel/hvf/hvf-all.c |  56 +
>> >>   accel/hvf/hvf-cpus.c| 468
>> 
>> >>   accel/hvf/meson.build   |   7 +
>> >>   accel/meson.build   |   1 +
>> >>   include/sysemu/hvf_int.h|  69 ++
>> >>   target/i386/hvf/hvf-cpus.c  | 131 --
>> >>   target/i386/hvf/hvf-cpus.h  |  25 --
>> >>   target/i386/hvf/hvf-i386.h  |  48 +---
>> >>   target/i386/hvf/hvf.c   | 360 +--
>> >>   target/i386/hvf/meson.build |   1 -
>> >>   target/i386/hvf/x86hvf.c|  11 +-
>> >>   target/i386/hvf/x86hvf.h|   2 -
>> >>   13 files changed, 619 insertions(+), 569 deletions(-)
>> >>   create mode 100644 accel/hvf/hvf-all.c
>> >>   create mode 100644 accel/hvf/hvf-cpus.c
>> >>   create mode 100644 accel/hvf/meson.build
>> >>   create mode 100644 include/sysemu/hvf_int.h
>> >>   delete mode 100644 target/i386/hvf/hvf-cpus.c
>> >>   delete mode 100644 target/i386/hvf/hvf-cpus.h
>> >>
>> >> diff --git a/MAINTAINERS b/MAINTAINERS
>> >> index 68bc160f41..ca4b6d9279 100644
>> >> --- a/MAINTAINERS
>> >> +++ b/MAINTAINERS
>> >> @@ -444,9 +444,16 @@ M: Cameron Esfahani 
>> >>   M: Roman Bolshakov 
>> >>   W: https://wiki.qemu.org/Features/HVF
>> >>   S: Maintained
>> >> -F: accel/stubs/hvf-stub.c
>> > There was a patch for that in the RFC series from Claudio.
>>
>>
>> Yeah, I'm not worried about this hunk :).
>>
>>
>> >
>> >>   F: target/i386/hvf/
>> >> +
>> >> +HVF
>> >> +M: Cameron Esfahani 
>> >> +M: Roman Bolshakov 
>> >> +W: https://wiki.qemu.org/Features/HVF
>> >> +S: Maintained
>> 

[Bug 1906295] Re: Implementation of exclusive monitor in ARM

2020-11-30 Thread Peter Maydell
QEMU's load/store exclusive implementation is known to not be
architecturally compliant. Most notably, it works on virtual addresses
and the architecture specifies that exclusives are supposed to work on
physical addresses. We provide a "good enough for what real code (not
weird test cases) needs" implementation.

However, in this specific case, we're within the architectural
requirements. In the ARMv8A Arm ARM (DDI0487F.c) the pseudocode
AArch32.ExclusiveMonitorsPass() function has this comment:

// It is IMPLEMENTATION DEFINED whether the detection of memory aborts happens
// before or after the check on the local Exclusives monitor. As a result a
// failure of the local monitor can occur on some implementations even if the
// memory access would give a memory abort.

In other words, it's up to the implementation whether it detects and
reports the invalid address first. QEMU's implementation chooses not to.
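
For completeness, a hypothetical user-mode reproducer of the case the report
describes (this is not from QEMU's test suite; on QEMU linux-user it is
expected to print a status of 1 and continue, while some hardware would
deliver SIGSEGV for the bad address):

/* build with an AArch32 toolchain, e.g. arm-linux-gnueabihf-gcc */
#include <stdio.h>

int main(void)
{
    unsigned int status;
    unsigned int *bad = (unsigned int *)0;  /* illegal address, never LDREX'd */
    unsigned int val = 0;

    __asm__ volatile("strex %0, %2, [%1]"
                     : "=&r"(status)
                     : "r"(bad), "r"(val)
                     : "memory");
    printf("strex status = %u\n", status);
    return 0;
}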


** Changed in: qemu
   Status: New => Invalid

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1906295

Title:
  Implementation of exclusive monitor in ARM

Status in QEMU:
  Invalid

Bug description:
  Hi

  I refer to the implementation of the exclusive monitor in ARM32. For an
  instruction like STREX Rx,Ry,[Rz], we need to check whether the
  address [Rz] is in exclusive state. If not, we set the value Rx as 1
  without doing the store operation. However, I noticed that QEMU will
  not check whether the address that Rz points to is a legal address or
  not. If the value of Rz is 0x0 but it is not in exclusive state, QEMU
  will set Rx as 1 and continue to execute the following instructions.

  However, physical devices will check the value of Rz. If Rz is an illegal
  address (e.g., 0x0), a SIGSEGV signal will be raised even if the address is
  not in exclusive state. I searched a lot of documentation about ARM and it
  seems that the ARM specification manual does not specify the implementation
  of the exclusive monitor in detail. I am not sure which one is the right
  behavior.
  Should QEMU add this check? This might not be a mistake; however, wouldn't
  it be better if QEMU behaved the same as a physical device? Feel free to ask
  if you need a testcase. Many thanks

  Regards
  Muhui

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1906295/+subscriptions



Re: [RFC] ich9:cpuhp: add support for cpu hot-unplug with SMI broadcast enabled

2020-11-30 Thread Ankur Arora

On 2020-11-30 8:58 a.m., Laszlo Ersek wrote:

On 11/28/20 00:48, Ankur Arora wrote:


It is possible that there are CPUs with bits for both is_inserting and
is_removing. In that case QemuCpuhpCollectApicIds() would put them in the
PluggedApicIds array and the unplug eventually happens in the next
firmware invocation.

If a CPU has both is_inserting and fw_remove set, the firmware processes the
hotplug in that invocation and the unplug happens whenever the OSPM triggers
the firmware next.


If these corner cases will actually work (I'm somewhat doubtful), that
will be really great.


Heh, yeah. That's a big if. I'll see if I can hit it in my testing.

Thanks
Ankur



Re: [PATCH 00/18] qapi/qom: QAPIfy object-add

2020-11-30 Thread Paolo Bonzini

On 30/11/20 19:10, Kevin Wolf wrote:

Am 30.11.2020 um 17:57 hat Paolo Bonzini geschrieben:

The main problem is that it wouldn't extend well, if at all, to
machines and devices.  So those would still not be integrated into the
QAPI schema.


What do you think is the biggest difference there? Don't devices work
the same as user creatable objects in that you first set a bunch of
properties (which would now be done through QAPI instead) and then call
the completion/realize method that actually makes use of them?


For devices it's just the practical issue that there are too many to 
have something like this series.  For machine types the main issue is 
that the version-specific machine types would have to be defined in one 
more place.



The single struct doesn't bother me _too much_ actually.  What bothers me is
that it won't be a single source of all QOM objects, only those that happen
to be created by object-add.


But isn't it only natural that a list of these objects will exist in
some way, implicitly or explicitly? object-add must somehow decide which
object types it allows the user to create.


There's already one, it's objects that implement user creatable.  I 
don't think having it as a QAPI enum (or discriminator) is worthwhile, 
and it's one more thing that can go out of sync between the QAPI schema 
and the C code.



I'm also pretty sure that QOM as it exists now is not the right solution
for any of them because it has some major shortcomings. It's too easy to
get things wrong (like the writable properties after creation), its
introspection is rather weak and separated from the QAPI introspection,
it doesn't encourage documentation, and it involves quite a bit of
boilerplate and duplicated code between class implementations.

A modified QOM might be the right solution, though. I would like to
bring QAPI and QOM together because most of these weaknesses are
strengths of QAPI.


I agree wholeheartedly.  But your series goes to the opposite excess. 
Eduardo is doing work in QOM to mitigate the issues you point out, and 
you have to meet in the middle with him.  Starting with the user-visible 
parts focuses on the irrelevant parts.



Mostly because it's more or less the same issue that you have with
BlockdevOptions, with the extra complication that this series only deals
with the easy one of the four above categories.


I'm not sure which exact problem with BlockdevOptions you mean. The
reason why the block layer doesn't use BlockdevOptions everywhere is
-drive support.


You don't have a single BlockdevOptions* field in the structs of block/. 
 My interpretation of this is that BlockdevOptions* is a good schema 
for configuring things but not for run-time state.



Maybe if we don't want to commit to keeping the ObjectOptions schema,
the part that should wait is object-add and I should do the command line
options first? Then the schema remains an implementation detail for now
that is invisible in introspection.


I don't see much benefit in converting _any_ of the three actually.  The
only good reason I see for QAPIfying this is the documentation, and the
promise of deduplicating QOM boilerplate.  The latter is only a promise, but
documentation alone is a damn good reason and it's enough to get this work
into a mergeable shape as soon as possible!


I think having some input validation done by QAPI instead of in each QOM
object individually is nice, too. You get it after CLI, QMP and HMP all
go through QAPI.


But the right way to do that is to use Eduardo's field properties: they 
make QOM do the right thing, adding another layer of parsing is just 
adding more complexity.  Are there any validation bugs that you're 
fixing?  Is that something that cannot be fixed elsewhere, or are you 
papering over bad QOM coding?  (Again, I'm not debating that QOM 
properties are hard to write correctly).



But maybe, we could start in the opposite direction: start with the use QAPI
to eliminate QOM boilerplate.  Basing your work on Eduardo's field
properties series, you could add a new 'object' "kind" to QAPI that would
create an array of field properties (e.g. a macro expanding to a compound
literal?)


There is a very simple reason why I don't want to start with the QAPI
generator: John has multiple series pending that touch basically every
part of the QAPI generator. This means not only that I need to be
careful about merge conflict (or base my work on top of five other
series, which feels adventurous), but also that I would be competing
with John for Markus' reviewer capacity, further slowing things down.


That's something that you have to discuss with John and Markus.  It may 
be that John has to shelve part of his work or rebase on top of yours 
instead.


By adding an extra parsing layer you're adding complexity that may not 
be needed in the end, and we're very bad at removing it later.  So I'm 
worried that this series will become technical debt in just a few months.



Well, two reasons: Also bec

Re: [PATCH v4 6/6] introduce simple linear scan rate limiting mechanism

2020-11-30 Thread Andrey Gruzdev

On 30.11.2020 19:40, Peter Xu wrote:

On Mon, Nov 30, 2020 at 11:11:00AM +0300, Andrey Gruzdev wrote:

On 28.11.2020 01:28, Peter Xu wrote:

On Thu, Nov 26, 2020 at 06:17:34PM +0300, Andrey Gruzdev wrote:

Since reading UFFD events and saving paged data are performed
from the same thread, write fault latencies are sensitive to
migration stream stalls. Limiting total page saving rate is a
method to reduce the amount of noticeable fault resolution latencies.

Migration bandwidth limiting is achieved via noticing cases of
out-of-threshold write fault latencies and temporarily disabling
(strictly speaking, severely throttling) saving non-faulting pages.


So have you done any measurements out of it, as we've talked in previous
version?  Thanks,



Sorry, not done yet.


So do you still plan to? :)

And if not, could you describe the rationale behind this patch?  For example,
what's the problem behind (e.g., guest hangs for xxx seconds, maybe?) and
what's the outcome (guest doesn't hang any more)?

Thanks,



Yes, I think providing a latency histogram with a description of the test 
case is the right thing.
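
For reference, a purely illustrative sketch of the throttling idea described
in the commit message (all names and thresholds below are invented, this is
not code from the series):

#include <stdbool.h>
#include <stdint.h>

#define FAULT_LATENCY_THRESHOLD_NS  (5 * 1000 * 1000)    /* 5 ms, arbitrary */
#define THROTTLE_PERIOD_NS          (20 * 1000 * 1000)   /* 20 ms, arbitrary */

static int64_t throttle_until_ns;

/* called once a write fault has been resolved */
static void note_fault_latency(int64_t fault_start_ns, int64_t now_ns)
{
    if (now_ns - fault_start_ns > FAULT_LATENCY_THRESHOLD_NS) {
        throttle_until_ns = now_ns + THROTTLE_PERIOD_NS;
    }
}

/* consulted before saving a non-faulting (background) page */
static bool may_save_background_page(int64_t now_ns)
{
    return now_ns >= throttle_until_ns;
}

A latency histogram with and without this throttle would show whether the
out-of-threshold tail actually shrinks.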


--
Andrey Gruzdev, Principal Engineer
Virtuozzo GmbH  +7-903-247-6397
virtuzzo.com



Re: [PATCH v4 3/6] support UFFD write fault processing in ram_save_iterate()

2020-11-30 Thread Andrey Gruzdev

On 30.11.2020 19:32, Peter Xu wrote:

On Mon, Nov 30, 2020 at 12:14:03AM +0300, Andrey Gruzdev wrote:

+#ifdef CONFIG_LINUX
+/**
+ * ram_find_block_by_host_address: find RAM block containing host page
+ *
+ * Returns pointer to RAMBlock if found, NULL otherwise
+ *
+ * @rs: current RAM state
+ * @page_address: host page address
+ */
+static RAMBlock *ram_find_block_by_host_address(RAMState *rs, hwaddr 
page_address)


Reuse qemu_ram_block_from_host() somehow?



Seems not very suitable here, since we use rs->last_seen_block to restart
the search.


The search logic is actually still the same, it's just about which block to
start searching (rs->last_seen_block, ram_list.mru_block, or the 1st one).  So
an internal helper to pass in that information would be nice.  Though that'll
require the rcu lock to be taken beforehand to keep the ramblock ptr valid.

No worry - we can keep this and work on top too, I think.
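
Something along these lines, perhaps - an illustrative sketch only, not a
reviewed patch; the helper name is invented, the field/macro names follow
current QEMU conventions, and the caller is assumed to hold the RCU read
lock so the starting pointer stays valid:

static RAMBlock *ram_block_from_host_locked(void *host, RAMBlock *start)
{
    RAMBlock *b;

    /* from 'start' (or the list head) to the end of the list ... */
    for (b = start ? start : QLIST_FIRST_RCU(&ram_list.blocks);
         b; b = QLIST_NEXT_RCU(b, next)) {
        if ((uint8_t *)host >= b->host &&
            (uint8_t *)host < b->host + b->max_length) {
            return b;
        }
    }
    /* ... then wrap around and scan the blocks before 'start' once */
    for (b = QLIST_FIRST_RCU(&ram_list.blocks); b && b != start;
         b = QLIST_NEXT_RCU(b, next)) {
        if ((uint8_t *)host >= b->host &&
            (uint8_t *)host < b->host + b->max_length) {
            return b;
        }
    }
    return NULL;
}

qemu_ram_block_from_host(), the fault path and the last_seen_block case could
then all call this with their preferred starting block.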

[...]
  +static RAMBlock *poll_fault_page(RAMState *rs, ram_addr_t *offset)

+{
+struct uffd_msg uffd_msg;
+hwaddr page_address;
+RAMBlock *bs;
+int res;
+
+if (!migrate_background_snapshot()) {
+return NULL;
+}
+
+res = uffd_read_events(rs->uffdio_fd, &uffd_msg, 1);
+if (res <= 0) {
+return NULL;
+}
+
+page_address = uffd_msg.arg.pagefault.address;
+bs = ram_find_block_by_host_address(rs, page_address);
+if (!bs) {
+/* In case we couldn't find respective block, just unprotect faulting 
page. */
+uffd_protect_memory(rs->uffdio_fd, page_address, TARGET_PAGE_SIZE, 
false);
+error_report("ram_find_block_by_host_address() failed: address=0x%0lx",
+page_address);


Looks ok to error_report() instead of assert(), but I'd suggest dropping the
call to uffd_protect_memory() at least.  The only reason not to use assert()
is because we try our best to avoid crashing the vm; however, I really doubt
whether uffd_protect_memory() is the right thing to do even if it happens - we
may end up trying to unprotect some strange pages that we don't even know
where they are...



IMHO it's better to unprotect these strange pages than to leave them protected
by UFFD, to avoid leaving the VM completely inoperable.
At least we know the page generated a wr-fault, maybe due to incorrect
write-tracking initialization, or the RAMBlock has somehow gone. Nevertheless,
if we leave the page as is, the VM would certainly lock up.


Yes, makes some sense too.  However, it's definitely a severe programming
error; even if the VM is unblocked, it can be in a very weird state...

Maybe we just assert? Then we can see how unlucky we'll be. :)



Yeah, seems assert is a right thing here :)



Hmm, I wonder about assert(). In QEMU it would do something in release
builds?


I guess yes, at least for now.  Because osdep.h has this:

/*
  * We have a lot of unaudited code that may fail in strange ways, or
  * even be a security risk during migration, if you disable assertions
  * at compile-time.  You may comment out these safety checks if you
  * absolutely want to disable assertion overhead, but it is not
  * supported upstream so the risk is all yours.  Meanwhile, please
  * submit patches to remove any side-effects inside an assertion, or
  * fixing error handling that should use Error instead of assert.
  */
#ifdef NDEBUG
#error building with NDEBUG is not supported
#endif
#ifdef G_DISABLE_ASSERT
#error building with G_DISABLE_ASSERT is not supported
#endif

[...]



Oops, didn't know that, thanks for the info!


+/**
+ * ram_save_host_page_post: ram_save_host_page() post-notifier
+ *
+ * @rs: current RAM state
+ * @pss: page-search-status structure
+ * @opaque: opaque context value
+ * @res_override: pointer to the return value of ram_save_host_page(),
+ *   overwritten in case of an error
+ */
+static void ram_save_host_page_post(RAMState *rs, PageSearchStatus *pss,
+void *opaque, int *res_override)
+{
+/* Check if page is from UFFD-managed region. */
+if (pss->block->flags & RAM_UF_WRITEPROTECT) {
+#ifdef CONFIG_LINUX
+ram_addr_t page_from = (ram_addr_t) opaque;
+hwaddr page_address = (hwaddr) pss->block->host +
+  ((hwaddr) page_from << TARGET_PAGE_BITS);


I feel like most new uses of hwaddr is not correct...  As I also commented in
the other patch.  We should save a lot of castings if switched.



Need to check. hwaddr is typedef'ed as uint64_t while ram_addr_t as
uintptr_t. In any case the UFFD interface relies on the u64 type.


For example, page_from should be a host address, we can use unsigned long, or
uint64_t, but ram_addr_t is not proper, which is only used in ramblock address
space.

I think it's fine that e.g. we use common types like uint64_t directly.



Now I also think we'd better use common types - mostly uint64_t 
directly, so as not to confuse anybody with those specific type descriptions.





+hwaddr run_length = (hwaddr) (pss->page - page_from + 1) << 
TARGET_PAGE_BITS;
+int res;
+
+/* F

Re: [PATCH 00/18] qapi/qom: QAPIfy object-add

2020-11-30 Thread Peter Krempa
On Mon, Nov 30, 2020 at 13:25:20 +0100, Kevin Wolf wrote:
> This series adds a QAPI type for the properties of all user creatable
> QOM types and finally makes QMP object-add use the new ObjectOptions
> union so that QAPI introspection can be used for user creatable objects.

FYI, here's a libvirt patchset which (apart from refactors) adds
compatibility with the deprecated/removed 'props' sub-object and also
adds testing to see whether all objects used by libvirt conform to the
new schema:

https://www.redhat.com/archives/libvir-list/2020-November/msg01625.html

Spoiler alert: surprisingly they do match so no updates are necessary!




Re: [PATCH] qemu-nbd: Fix a memleak in qemu_nbd_client_list()

2020-11-30 Thread Eric Blake
On 11/30/20 6:36 AM, Alex Chen wrote:
> When qio_channel_socket_connect_sync() fails,
> we should goto the 'out' label to free 'sioc' instead of returning.
> 
> Reported-by: Euler Robot 
> Signed-off-by: Alex Chen 
> ---
>  qemu-nbd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

A local leak, but the only caller of qemu_nbd_client_list() is main()
which returns the value and thereby exits the program, so it is
inconsequential in the bigger picture.  I'll defer this to 6.0, and pick
it up through my NBD tree once that opens.

> 
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index a7075c5419..47587a709e 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -181,7 +181,7 @@ static int qemu_nbd_client_list(SocketAddress *saddr, 
> QCryptoTLSCreds *tls,
>  sioc = qio_channel_socket_new();
>  if (qio_channel_socket_connect_sync(sioc, saddr, &err) < 0) {
>  error_report_err(err);
> -return EXIT_FAILURE;
> +goto out;
>  }
>  rc = nbd_receive_export_list(QIO_CHANNEL(sioc), tls, hostname, &list,
>   &err);
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[Bug 1906295] [NEW] Implementation of exclusive monitor in ARM

2020-11-30 Thread JIANG Muhui
Public bug reported:

Hi

I refer to the implementation of the exclusive monitor in ARM32. For an
instruction like STREX Rx,Ry,[Rz], we need to check whether the address
[Rz] is in exclusive state. If not, we set the value Rx as 1 without
doing the store operation. However, I noticed that QEMU will not check
whether the address that Rz points to is a legal address or not. If the
value of Rz is 0x0 but it is not in exclusive state, QEMU will set Rx as
1 and continue to execute the following instructions.

However, physical devices will check the value of Rz. If Rz is an illegal
address (e.g., 0x0), a SIGSEGV signal will be raised even if the address is
not in exclusive state. I searched a lot of documentation about ARM and it
seems that the ARM specification manual does not specify the implementation
of the exclusive monitor in detail. I am not sure which one is the right
behavior.
Should QEMU add this check? This might not be a mistake; however, wouldn't it
be better if QEMU behaved the same as a physical device? Feel free to ask if
you need a testcase. Many thanks

Regards
Muhui

** Affects: qemu
 Importance: Undecided
 Status: New

** Summary changed:

- Improper implementation of exclusive monitor in ARM
+ Implementation of exclusive monitor in ARM

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1906295

Title:
  Implementation of exclusive monitor in ARM

Status in QEMU:
  New

Bug description:
  Hi

  I refer to the implementation of the exclusive monitor in ARM32. For an
  instruction like STREX Rx,Ry,[Rz], we need to check whether the
  address [Rz] is in exclusive state. If not, we set the value Rx as 1
  without doing the store operation. However, I noticed that QEMU will
  not check whether the address that Rz points to is a legal address or
  not. If the value of Rz is 0x0 but it is not in exclusive state, QEMU
  will set Rx as 1 and continue to execute the following instructions.

  However, physical devices will check the value of Rz. If Rz is an illegal
  address (e.g., 0x0), a SIGSEGV signal will be raised even if the address is
  not in exclusive state. I searched a lot of documentation about ARM and it
  seems that the ARM specification manual does not specify the implementation
  of the exclusive monitor in detail. I am not sure which one is the right
  behavior.
  Should QEMU add this check? This might not be a mistake; however, wouldn't
  it be better if QEMU behaved the same as a physical device? Feel free to ask
  if you need a testcase. Many thanks

  Regards
  Muhui

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1906295/+subscriptions



Re: [PATCH v4 2/6] introduce UFFD-WP low-level interface helpers

2020-11-30 Thread Andrey Gruzdev

On 30.11.2020 18:34, Peter Xu wrote:

On Sun, Nov 29, 2020 at 11:12:10PM +0300, Andrey Gruzdev wrote:

+void ram_write_tracking_stop(void)
+{
+#ifdef CONFIG_LINUX
+RAMState *rs = ram_state;
+RAMBlock *bs;
+assert(rs->uffdio_fd >= 0);


Maybe too harsh - we can return if it's invalid.

Meanwhile, better rcu_read_lock(), as well?



Yep, RCU lock, I'll add. Why too harsh? Just a debug assertion.


I was afraid some special path could trigger ram_write_tracking_stop() being
called before ram_write_tracking_start(), then vm could crash.  If we can
guarantee that not happening, then it's also ok with assert().

[...]



It can't really happen, but I agree that assert() is a bit excessive here.
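
For illustration, the revised stop path could end up looking roughly like
this (a sketch only, not the actual patch; RAM_UF_WRITEPROTECT is the flag
introduced by this series and the unprotect/unregister details are elided):

void ram_write_tracking_stop(void)
{
#ifdef CONFIG_LINUX
    RAMState *rs = ram_state;
    RAMBlock *block;

    if (!rs || rs->uffdio_fd < 0) {
        return;                     /* tracking was never started */
    }

    WITH_RCU_READ_LOCK_GUARD() {
        RAMBLOCK_FOREACH(block) {
            if (block->flags & RAM_UF_WRITEPROTECT) {
                /* remove write protection and un-register the range here */
            }
        }
    }
#endif
}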


+/**
+ * uffd_poll_events: poll UFFD file descriptor for read
+ *
+ * Returns true if events are available for read, false otherwise
+ *
+ * @uffd: UFFD file descriptor
+ * @tmo: timeout in milliseconds, 0 for non-blocking operation,
+ *   negative value for infinite wait
+ */
+bool uffd_poll_events(int uffd, int tmo)


Shall we spell "tmo" out?

In the comment? I think it's ok.


I'd suggest to spell it out everywhere, especially in the codes.  But feel free
to take your own preference.  Thanks,



Thanks,

--
Andrey Gruzdev, Principal Engineer
Virtuozzo GmbH  +7-903-247-6397
virtuzzo.com



Re: QMP and the 'id' parameter

2020-11-30 Thread John Snow

On 11/23/20 1:57 AM, Markus Armbruster wrote:

(Was it not possible to have the client send an ACK response that
doesn't indicate success, but just signals receipt? Semantically, it
would be the difference between do-x and start-x as a command.)

Feels quite possible to me.

If I read git-log correctly, the commands' design ignored the race with
shutdown / suspend, and 'success-response': false was a quick fix.

I think changing the commands from 'do-x' to 'start-x' would have been
better.  A bit more work, though, and back then there weren't any
'start-x' examples to follow, I guess.

I wish success-response didn't exist, but is getting rid of it worth
changing the interface?  I honestly don't know.



OK, I'll jot it down in the ideas book and we can come back to it later, 
after we've fried the other 2,385 fish in queue. I think simplifying the 
interface is going to be helpful long-term, but I don't have a good grip 
on the actual work estimate.


In terms of a simple library design, this edge case makes an async 
library quite a bit worse to support, because now it needs to understand 
that sometimes the target will never reply, and it's command dependent. 
Now the core "QMP" library, ostensibly the command-agnostic layer, needs 
to query to discover the semantics of each command it sends.



(1) Identify 'success-response: false' commands:

- guest-shutdown
- guest-suspend-disk
- guest-suspend-ram
- guest-suspend-hybrid


(2) Replace them with jobs, where a success response acknowledges 
receipt and start of the job. Errors encountered setting up the job can 
be returned synchronously; errors encountered past the point of no 
return can be queried via the job system.


(I am assuming that it is untenable to have a system where we 
acknowledge receipt, go past the point of no return and then encounter 
an error and have no way to report it, since we've already responded to 
the shutdown/suspend request. Therefore, the job system seems appropriate.)


((The onus is now on the job layer to understand that on job success, 
the server goes away and cannot report success, but I think this is 
easier to implement in a client at the logical layer than at the 
protocol layer.))



(3) Deprecate the old interfaces.


(4) Delete the old interfaces. Remove 'success-response: false' from the 
meta-schema. Remove 'success-response' from GuestAgentCommandInfo.



...but, not terribly important. Might be nice to help plumb some 
non-block jobs for sake of example for other areas where they might be 
useful. Other stuff to think about in the meantime, but thank you for 
the heads up. Maybe a good GSoC project, actually? It seems 
straightforward, just with a lot of plumbing.



I suppose for now, what can happen is the client (using the AQMP 
library) can simply await the results of execute() with a timeout. If 
the server closes the connection, we know it worked -- or, if there's a 
timeout and we used --no-shutdown, we can interrogate the VM status.





Re: [PATCH for-6.0 v2 1/3] spapr: Improve naming of some vCPU id related items

2020-11-30 Thread Greg Kurz
On Mon, 30 Nov 2020 18:32:27 +0100
Cédric Le Goater  wrote:

> On 11/30/20 5:52 PM, Greg Kurz wrote:
> > The machine tells the IC backend the number of vCPU ids it will be
> > exposed to, in order to:
> > - fill the "ibm,interrupt-server-ranges" property in the DT (XICS)
> > - size the VP block used by the in-kernel chip (XICS-on-XIVE, XIVE)
> > 
> > The current "nr_servers" and "spapr_max_server_number" naming can
> > mislead people into thinking it is about a quantity of CPUs. Make
> > it clear this is all about vCPU ids.
> 
> OK. This looks fine. 
> 
> But, XIVE does not care about vCPU ids. Only the count of vCPUs

XIVE does not care about vCPU ids indeed but KVM does. The KVM XIVE code
has been indexing VPs with vCPU ids from the start as visible in linux
commit 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE
interrupt controller") :

+int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
+struct kvm_vcpu *vcpu, u32 cpu)
+{
[...]
+   xc->vp_id = xive->vp_base + cpu;

This allows to have a quick conversion between vCPU id and VP id
without relying on an intermediate mapping structure. 

> is relevant. So, it would be nice to add a comment in the code 
> that we got it wrong at some point or that XICS imposed the use
> of max vCPU ids.
> 

There's more to it. The vCPU id spacing is a hack that was
introduced to work around the limitation of pre-POWER9 CPU
types that cannot have HW threads from the same core running
in different MMU contexts. This was probably not such a good
idea but we have to live with it now since vCPU ids are guest
visible. I don't think it brings much to complain once more
over vCPU spacing since this has been addressed with VSMT.

The machine now always has consecutive vCPU ids by default,
ie. number of vCPUs == number of vCPU ids. Only custom machine
setups that tweak VSMT to some other value can be affected
by the spacing.
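
For reference, a simplified sketch of where the spacing comes from (this
mirrors the logic of spapr_vcpu_id() and spapr_max_server_number() in
hw/ppc/spapr.c, it is not a verbatim copy):

static int example_vcpu_id(int cpu_index, int smp_threads, int vsmt)
{
    /* threads of one core are contiguous, cores are spaced 'vsmt' apart */
    return (cpu_index / smp_threads) * vsmt + cpu_index % smp_threads;
}

static int example_max_vcpu_ids(int max_cpus, int smp_threads, int vsmt)
{
    /* e.g. max_cpus = 8, smp_threads = 1, vsmt = 8 -> 64 ids for 8 vCPUs */
    return (max_cpus * vsmt + smp_threads - 1) / smp_threads;  /* round up */
}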

> C. 
> 
> 
> > Signed-off-by: Greg Kurz 
> > ---
> >  include/hw/ppc/spapr.h  |  2 +-
> >  include/hw/ppc/spapr_irq.h  |  8 
> >  include/hw/ppc/spapr_xive.h |  2 +-
> >  include/hw/ppc/xics_spapr.h |  2 +-
> >  hw/intc/spapr_xive.c|  8 
> >  hw/intc/spapr_xive_kvm.c|  4 ++--
> >  hw/intc/xics_kvm.c  |  4 ++--
> >  hw/intc/xics_spapr.c|  8 
> >  hw/ppc/spapr.c  |  8 
> >  hw/ppc/spapr_irq.c  | 18 +-
> >  10 files changed, 32 insertions(+), 32 deletions(-)
> > 
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index b7ced9faebf5..dc99d45e2852 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -849,7 +849,7 @@ int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
> >  int spapr_reallocate_hpt(SpaprMachineState *spapr, int shift, Error 
> > **errp);
> >  void spapr_clear_pending_events(SpaprMachineState *spapr);
> >  void spapr_clear_pending_hotplug_events(SpaprMachineState *spapr);
> > -int spapr_max_server_number(SpaprMachineState *spapr);
> > +int spapr_max_vcpu_ids(SpaprMachineState *spapr);
> >  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
> >uint64_t pte0, uint64_t pte1);
> >  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
> > diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
> > index c22a72c9e270..2e53fc9e6cbb 100644
> > --- a/include/hw/ppc/spapr_irq.h
> > +++ b/include/hw/ppc/spapr_irq.h
> > @@ -43,7 +43,7 @@ DECLARE_CLASS_CHECKERS(SpaprInterruptControllerClass, 
> > SPAPR_INTC,
> >  struct SpaprInterruptControllerClass {
> >  InterfaceClass parent;
> >  
> > -int (*activate)(SpaprInterruptController *intc, uint32_t nr_servers,
> > +int (*activate)(SpaprInterruptController *intc, uint32_t max_vcpu_ids,
> >  Error **errp);
> >  void (*deactivate)(SpaprInterruptController *intc);
> >  
> > @@ -62,7 +62,7 @@ struct SpaprInterruptControllerClass {
> >  /* These methods should only be called on the active intc */
> >  void (*set_irq)(SpaprInterruptController *intc, int irq, int val);
> >  void (*print_info)(SpaprInterruptController *intc, Monitor *mon);
> > -void (*dt)(SpaprInterruptController *intc, uint32_t nr_servers,
> > +void (*dt)(SpaprInterruptController *intc, uint32_t max_vcpu_ids,
> > void *fdt, uint32_t phandle);
> >  int (*post_load)(SpaprInterruptController *intc, int version_id);
> >  };
> > @@ -74,7 +74,7 @@ int spapr_irq_cpu_intc_create(struct SpaprMachineState 
> > *spapr,
> >  void spapr_irq_cpu_intc_reset(struct SpaprMachineState *spapr, PowerPCCPU 
> > *cpu);
> >  void spapr_irq_cpu_intc_destroy(struct SpaprMachineState *spapr, 
> > PowerPCCPU *cpu);
> >  void spapr_irq_print_info(struct SpaprMachineState *spapr, Monitor *mon);
> > -void spapr_irq_dt(struct SpaprMachineState *spapr, uint32_t nr_servers,
> > +void spapr_irq_dt(struct SpaprMachineState *spapr, uint32_t max_vcpu_ids,
> >void *fdt, u

Re: [PATCH 4/4] elf_ops.h: Be more verbose with ROM blob names

2020-11-30 Thread Richard Henderson
On 11/29/20 2:39 PM, Peter Maydell wrote:
> Instead of making the ROM blob name something like:
>   phdr #0: /home/petmay01/linaro/qemu-misc-tests/ldmia-fault.axf
> make it a little more self-explanatory for people who don't know
> ELF format details:
>   /home/petmay01/linaro/qemu-misc-tests/ldmia-fault.axf ELF program header 
> segment 0
> 
> Signed-off-by: Peter Maydell 
> ---

Reviewed-by: Richard Henderson 

r~




Re: [PATCH] python 3.5 compatibility

2020-11-30 Thread Enrico Weigelt, metux IT consult
On 30.11.20 10:44, Kevin Wolf wrote:

Hi,

> While type hints are valuable documentation, they are more than just
> that. They help to find and prevent avoidable bugs in the code. We are
> actively in the process of adding them to everything in the QAPI
> generator to improve maintainability rather than removing them.

IOW: I'll have to do lots of backporting work, just to keep it running :(


--mtx

-- 
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
i...@metux.net -- +49-151-27565287



Re: [PATCH] python 3.5 compatibility

2020-11-30 Thread Enrico Weigelt, metux IT consult
On 27.11.20 20:15, Peter Maydell wrote:

Hi,

> Could you say which "stable distros" you have in mind, and whether
> they are covered by our "supported build platforms" policy
> https://www.qemu.org/docs/master/system/build-platforms.html  ?

I'm running on Devuan Ascii.

And packaging python-3.6 just for a few pieces of one application (qemu)
is far too much work.

--mtx

-- 
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
i...@metux.net -- +49-151-27565287



Re: [PATCH 3/4] elf_ops.h: Don't truncate name of the ROM blobs we create

2020-11-30 Thread Richard Henderson
On 11/29/20 2:39 PM, Peter Maydell wrote:
> Currently the load_elf code assembles the ROM blob name into a
> local 128 byte fixed-size array. Use g_strdup_printf() instead so
> that we don't truncate the pathname if it happens to be long.
> (This matters mostly for monitor 'info roms' output and for the
> error messages if ROM blobs overlap.)
> 
> Signed-off-by: Peter Maydell 
> ---
>  include/hw/elf_ops.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson 

r~




Re: [PATCH 2/4] hw/core/loader.c: Improve reporting of ROM overlap errors

2020-11-30 Thread Richard Henderson
On 11/29/20 2:39 PM, Peter Maydell wrote:
> In rom_check_and_register_reset() we report to the user if there is
> a "ROM region overlap". This has a couple of problems:
>  * the reported information is not very easy to interpret
>  * the function just prints the overlap to stderr (and relies on
>its single callsite in vl.c to do an error_report() and exit)
>  * only the first overlap encountered is diagnosed
> 
> Make this function use error_report() and error_printf() and
> report a more user-friendly report with all the overlaps
> diagnosed.
> 
> Sample old output:
> 
> rom: requested regions overlap (rom dtb. free=0x8000, 
> addr=0x)
> qemu-system-aarch64: rom check and register reset failed
> 
> Sample new output:
> 
> qemu-system-aarch64: Some ROM regions are overlapping
> These ROM regions might have been loaded by direct user request or by default.
> They could be BIOS/firmware images, a guest kernel, initrd or some other file 
> loaded into guest memory.
> Check whether you intended to load all this guest code, and whether it has 
> been built to load to the correct addresses.
> 
> The following two regions overlap (in the cpu-memory-0 address space):
>   phdr #0: /home/petmay01/linaro/qemu-misc-tests/ldmia-fault.axf (addresses 
> 0x - 0x8000)
>   dtb (addresses 0x - 0x0010)
> 
> The following two regions overlap (in the cpu-memory-0 address space):
>   phdr #1: /home/petmay01/linaro/qemu-misc-tests/bad-psci-call.axf (addresses 
> 0x4000 - 0x4010)
>   phdr #0: /home/petmay01/linaro/qemu-misc-tests/bp-test.elf (addresses 
> 0x4000 - 0x4020)
> 
> Signed-off-by: Peter Maydell 
> ---

Reviewed-by: Richard Henderson 

r~




Re: [PATCH 1/4] hw/core/loader.c: Track last-seen ROM in rom_check_and_register_reset()

2020-11-30 Thread Richard Henderson
On 11/29/20 2:39 PM, Peter Maydell wrote:
> In rom_check_and_register_reset() we detect overlaps by looking at
> whether the ROM blob we're currently examining is in the same address
> space and starts before the previous ROM blob ends.  (This works
> because the ROM list is kept sorted in order by AddressSpace and then
> by address.)
> 
> Instead of keeping the AddressSpace and last address of the previous ROM
> blob in local variables, just keep a pointer to it.
> 
> This will allow us to print more useful information when we do detect
> an overlap.
> 
> Signed-off-by: Peter Maydell 
> ---
>  hw/core/loader.c | 23 +++
>  1 file changed, 15 insertions(+), 8 deletions(-)

Reviewed-by: Richard Henderson 

r~




Re: [PATCH] target/mips: Allow executing MSA instructions on Loongson-3A4000

2020-11-30 Thread Richard Henderson
On 11/30/20 4:22 AM, Philippe Mathieu-Daudé wrote:
> The Loongson-3A4000 is a GS464V-based processor with MIPS MSA ASE:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg763059.html
> 
> Commit af868995e1b correctly set the 'MSA present' bit of Config3
> register, but forgot to allow the MSA instructions decoding in
> insn_flags, so executing them triggers a 'Reserved Instruction'.
> 
> Fix by adding the ASE_MSA mask to insn_flags.
> 
> Fixes: af868995e1b ("target/mips: Add Loongson-3 CPU definition")
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Buggy since 5.1, so probably not a big deal.
> ---
>  target/mips/translate_init.c.inc | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson 

r~



Re: [PATCH] decodetree: Allow use of hex/bin format for argument field values

2020-11-30 Thread Richard Henderson
On 11/30/20 6:26 AM, Philippe Mathieu-Daudé wrote:
>  # 'Foo=number' sets an argument field to a constant value
> -if re.fullmatch(re_C_ident + '=[+-]?[0-9]+', t):
> +if re.fullmatch(re_C_ident + '=[+-]?(0[bx])?[0-9]+', t):
>  (fname, value) = t.split('=')
> -value = int(value)
> +if re.fullmatch('[+-]?0b[0-9]+', value):
> +base = 2
> +elif re.fullmatch('[+-]?0x[0-9]+', value):
> +base = 16
> +else:
> +base = 10
> +value = int(value, base)
>  flds = add_field(lineno, flds, fname, ConstField(value))
>  continue

Well, the regexps are off.  No letters for the hex, and 9 accepted for binary.
I think with the right regexps, just trusting int(value, 0) is good enough.

So maybe something like

  re_C_ident + "=[+-]?([0-9]+|0x[0-9a-fA-F]+|0b[01]+)"


r~



Re: [PATCH 00/18] qapi/qom: QAPIfy object-add

2020-11-30 Thread Kevin Wolf
Am 30.11.2020 um 17:57 hat Paolo Bonzini geschrieben:
> On 30/11/20 16:46, Kevin Wolf wrote:
> > Am 30.11.2020 um 15:58 hat Paolo Bonzini geschrieben:
> > > With this series it's basically pointless to have QOM properties at
> > > all.
> > 
> > Not entirely, because there are still some writable properties that can
> > be changed later on.
> 
> Are there really any (that are not bugs like opened/loaded)? That's also why
> Eduardo and I discussed a class-wide allow_set function for his field
> properties series.

Yes. I don't really know most objects apart from what I had to learn for
this series, but I know at least two that actually allow this
intentionally: iothread and throttle-group.

> > So with this in mind, I think I'm in favour of completely leaving the
> > initialisation of properties on object creation to QAPI, and only
> > providing individual setters if we actually intend to allow property
> > changes after creation.
> 
> The main problem is that it wouldn't extend well, if at all, to
> machines and devices.  So those would still not be integrated into the
> QAPI schema.

What do you think is the biggest difference there? Don't devices work
the same as user creatable objects in that you first set a bunch of
properties (which would now be done through QAPI instead) and then call
the completion/realize method that actually makes use of them?

I must admit that I don't know how machine types work.

> > > So the question is, are we okay with shoveling half of QEMU's backend data
> > > model into a single struct?  If so, there are important consequences.
> > 
> > Yeah, the single struct bothers me a bit, both in the QAPI schema and in
> > the C source.
> 
> The single struct doesn't bother me _too much_ actually.  What bothers me is
> that it won't be a single source of all QOM objects, only those that happen
> to be created by object-add.

But isn't it only natural that a list of these objects will exist in
some way, implicitly or explicitly? object-add must somehow decide which
object types it allows the user to create.

Once we describe the object types in the schema (rather than only their
properties), which is required if we want to generate the QOM
boilerplate, this can possibly become implicit because then QAPI can
know which objects implement USER_CREATABLE.

> So I start to wonder if QOM as it exists now is the right solution for
> all kind of objects:
> 
> - backends and other object-add types (not per-target, relatively few
> classes and even fewer hierarchies)
> 
> - machine (per-target, many classes, no hierarchy)
> 
> - device (can be treated as not per-target, many many classes, very few
> hierarchies)
> 
> - accelerator (per-target, few classes, no hierarchy)
> 
> - chardev (ok those are the same as the first category)
> 
> If QOM is the right solution, this patch goes in the wrong direction.
> 
> If QOM is the wrong solution, this patch is okay but then we have another
> problem to solve. :)

I think the requirements for all of them are probably similar enough
that they can potentially be covered by a single thing.

I'm also pretty sure that QOM as it exists now is not the right solution
for any of them because it has some major shortcomings. It's too easy to
get things wrong (like the writable properties after creation), its
introspection is rather weak and separated from the QAPI introspection,
it doesn't encourage documentation, and it involves quite a bit of
boilerplate and duplicated code between class implementations.

A modified QOM might be the right solution, though. I would like to
bring QAPI and QOM together because most of these weaknesses are
strengths of QAPI.

I guess the real question is what aspects of QOM need to be changed to
make it the right solution.

> > > The problem with this series is that you are fine with deduplicating 
> > > things
> > > as a third step, but you cannot be sure that such deduplication is 
> > > possible
> > > at all.  So while I don't have any problems in principle with the
> > > ObjectOptions concept, I don't think it should be committed without a 
> > > clear
> > > idea of how to do the third step.
> > 
> > Do you have any specific concerns why the deduplication might not be
> > possible, or just because it's uncharted territory?
> 
> Mostly because it's more or less the same issue that you have with
> BlockdevOptions, with the extra complication that this series only deals
> with the easy one of the four above categories.

I'm not sure which exact problem with BlockdevOptions you mean. The
reason why the block layer doesn't use BlockdevOptions everywhere is
-drive support.

> > Maybe if we don't want to commit to keeping the ObjectOptions schema,
> > the part that should wait is object-add and I should do the command line
> > options first? Then the schema remains an implementation detail for now
> > that is invisible in introspection.
> 
> I don't see much benefit in converting _any_ of the three actually.  The
> only good reason I see f

Re: [PATCH for-6.0 v2 1/3] spapr: Improve naming of some vCPU id related items

2020-11-30 Thread Cédric Le Goater
On 11/30/20 6:32 PM, Cédric Le Goater wrote:
> On 11/30/20 5:52 PM, Greg Kurz wrote:
>> The machine tells the IC backend the number of vCPU ids it will be
>> exposed to, in order to:
>> - fill the "ibm,interrupt-server-ranges" property in the DT (XICS)
>> - size the VP block used by the in-kernel chip (XICS-on-XIVE, XIVE)
>>
>> The current "nr_servers" and "spapr_max_server_number" naming can
>> mislead people into thinking it is about a quantity of CPUs. Make
>> it clear this is all about vCPU ids.
> 
> OK. This looks fine. 
> 
> But, XIVE does not care about vCPU ids. Only the count of vCPUs
> is relevant. So, it would be nice to add a comment in the code 
> that we got it wrong at some point or that XICS imposed the use
> of max vCPU ids.

Which you do in the next patch,

Reviewed-by: Cédric Le Goater 

Thanks,

C. 
> 
>> Signed-off-by: Greg Kurz 
>> ---
>>  include/hw/ppc/spapr.h  |  2 +-
>>  include/hw/ppc/spapr_irq.h  |  8 
>>  include/hw/ppc/spapr_xive.h |  2 +-
>>  include/hw/ppc/xics_spapr.h |  2 +-
>>  hw/intc/spapr_xive.c|  8 
>>  hw/intc/spapr_xive_kvm.c|  4 ++--
>>  hw/intc/xics_kvm.c  |  4 ++--
>>  hw/intc/xics_spapr.c|  8 
>>  hw/ppc/spapr.c  |  8 
>>  hw/ppc/spapr_irq.c  | 18 +-
>>  10 files changed, 32 insertions(+), 32 deletions(-)
>>
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index b7ced9faebf5..dc99d45e2852 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -849,7 +849,7 @@ int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
>>  int spapr_reallocate_hpt(SpaprMachineState *spapr, int shift, Error **errp);
>>  void spapr_clear_pending_events(SpaprMachineState *spapr);
>>  void spapr_clear_pending_hotplug_events(SpaprMachineState *spapr);
>> -int spapr_max_server_number(SpaprMachineState *spapr);
>> +int spapr_max_vcpu_ids(SpaprMachineState *spapr);
>>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>>uint64_t pte0, uint64_t pte1);
>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
>> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
>> index c22a72c9e270..2e53fc9e6cbb 100644
>> --- a/include/hw/ppc/spapr_irq.h
>> +++ b/include/hw/ppc/spapr_irq.h
>> @@ -43,7 +43,7 @@ DECLARE_CLASS_CHECKERS(SpaprInterruptControllerClass, 
>> SPAPR_INTC,
>>  struct SpaprInterruptControllerClass {
>>  InterfaceClass parent;
>>  
>> -int (*activate)(SpaprInterruptController *intc, uint32_t nr_servers,
>> +int (*activate)(SpaprInterruptController *intc, uint32_t max_vcpu_ids,
>>  Error **errp);
>>  void (*deactivate)(SpaprInterruptController *intc);
>>  
>> @@ -62,7 +62,7 @@ struct SpaprInterruptControllerClass {
>>  /* These methods should only be called on the active intc */
>>  void (*set_irq)(SpaprInterruptController *intc, int irq, int val);
>>  void (*print_info)(SpaprInterruptController *intc, Monitor *mon);
>> -void (*dt)(SpaprInterruptController *intc, uint32_t nr_servers,
>> +void (*dt)(SpaprInterruptController *intc, uint32_t max_vcpu_ids,
>> void *fdt, uint32_t phandle);
>>  int (*post_load)(SpaprInterruptController *intc, int version_id);
>>  };
>> @@ -74,7 +74,7 @@ int spapr_irq_cpu_intc_create(struct SpaprMachineState 
>> *spapr,
>>  void spapr_irq_cpu_intc_reset(struct SpaprMachineState *spapr, PowerPCCPU 
>> *cpu);
>>  void spapr_irq_cpu_intc_destroy(struct SpaprMachineState *spapr, PowerPCCPU 
>> *cpu);
>>  void spapr_irq_print_info(struct SpaprMachineState *spapr, Monitor *mon);
>> -void spapr_irq_dt(struct SpaprMachineState *spapr, uint32_t nr_servers,
>> +void spapr_irq_dt(struct SpaprMachineState *spapr, uint32_t max_vcpu_ids,
>>void *fdt, uint32_t phandle);
>>  
>>  uint32_t spapr_irq_nr_msis(struct SpaprMachineState *spapr);
>> @@ -105,7 +105,7 @@ typedef int 
>> (*SpaprInterruptControllerInitKvm)(SpaprInterruptController *,
>>  
>>  int spapr_irq_init_kvm(SpaprInterruptControllerInitKvm fn,
>> SpaprInterruptController *intc,
>> -   uint32_t nr_servers,
>> +   uint32_t max_vcpu_ids,
>> Error **errp);
>>  
>>  /*
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index 26c8d90d7196..643129b13536 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -79,7 +79,7 @@ int spapr_xive_end_to_target(uint8_t end_blk, uint32_t 
>> end_idx,
>>  /*
>>   * KVM XIVE device helpers
>>   */
>> -int kvmppc_xive_connect(SpaprInterruptController *intc, uint32_t nr_servers,
>> +int kvmppc_xive_connect(SpaprInterruptController *intc, uint32_t 
>> max_vcpu_ids,
>>  Error **errp);
>>  void kvmppc_xive_disconnect(SpaprInterruptController *intc);
>>  void kvmppc_xive_reset(SpaprXive *xive, Error **errp);
>> diff --git a/include/hw/ppc/xics_spapr.

Re: [PATCH for-6.0 v2 2/3] spapr/xive: Fix size of END table and number of claimed IPIs

2020-11-30 Thread Cédric Le Goater
On 11/30/20 5:52 PM, Greg Kurz wrote:
> The sPAPR XIVE device has an internal ENDT table the size of
> which is configurable by the machine. This table is supposed
> to contain END structures for all possible vCPUs that may
> enter the guest. The machine must also claim IPIs for all
> possible vCPUs since this is expected by the guest.
> 
> spapr_irq_init() takes care of that under the assumption that
> spapr_max_vcpu_ids() returns the number of possible vCPUs.
> This happens to be the case when the VSMT mode is set to match
> the number of threads per core in the guest (default behavior).
> With non-default VSMT settings, this limit is > to the number
> of vCPUs. In the worst case, we can end up allocating an 8 times
> bigger ENDT and claiming 8 times more IPIs than needed. But more
> importantly, this creates a confusion between number of vCPUs and
> vCPU ids, which can lead to subtle bugs like [1].
> 
> Use smp.max_cpus instead of spapr_max_vcpu_ids() in
> spapr_irq_init() for the latest machine type. Older machine
> types continue to use spapr_max_vcpu_ids() since the size of
> the ENDT is migration visible.
> 
> [1] https://bugs.launchpad.net/qemu/+bug/1900241
> 
> Signed-off-by: Greg Kurz 


Reviewed-by: Cédric Le Goater 

Thanks,

C.

> ---
>  include/hw/ppc/spapr.h |  1 +
>  hw/ppc/spapr.c |  3 +++
>  hw/ppc/spapr_irq.c | 16 +---
>  3 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index dc99d45e2852..95bf210d0bf6 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -139,6 +139,7 @@ struct SpaprMachineClass {
>  hwaddr rma_limit;  /* clamp the RMA to this size */
>  bool pre_5_1_assoc_refpoints;
>  bool pre_5_2_numa_associativity;
> +bool pre_6_0_xive_max_cpus;
>  
>  bool (*phb_placement)(SpaprMachineState *spapr, uint32_t index,
>uint64_t *buid, hwaddr *pio, 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index ab59bfe941d0..227a926dfd48 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -4530,8 +4530,11 @@ DEFINE_SPAPR_MACHINE(6_0, "6.0", true);
>   */
>  static void spapr_machine_5_2_class_options(MachineClass *mc)
>  {
> +SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
> +
>  spapr_machine_6_0_class_options(mc);
>  compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len);
> +smc->pre_6_0_xive_max_cpus = true;
>  }
>  
>  DEFINE_SPAPR_MACHINE(5_2, "5.2", false);
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 552e30e93036..4d9ecd5d0af8 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -324,17 +324,27 @@ void spapr_irq_init(SpaprMachineState *spapr, Error 
> **errp)
>  }
>  
>  if (spapr->irq->xive) {
> -uint32_t max_vcpu_ids = spapr_max_vcpu_ids(spapr);
> +uint32_t max_cpus = MACHINE(spapr)->smp.max_cpus;
>  DeviceState *dev;
>  int i;
>  
> +/*
> + * Older machine types used to size the ENDT and IPI range
> + * according to the upper limit of vCPU ids, which can be
> + * higher than smp.max_cpus with custom VSMT settings. Keep
> + * the previous behavior for compatibility with such setups.
> + */
> +if (smc->pre_6_0_xive_max_cpus) {
> +max_cpus = spapr_max_vcpu_ids(spapr);
> +}
> +
>  dev = qdev_new(TYPE_SPAPR_XIVE);
>  qdev_prop_set_uint32(dev, "nr-irqs", smc->nr_xirqs + 
> SPAPR_XIRQ_BASE);
>  /*
>   * 8 XIVE END structures per CPU. One for each available
>   * priority
>   */
> -qdev_prop_set_uint32(dev, "nr-ends", max_vcpu_ids << 3);
> +qdev_prop_set_uint32(dev, "nr-ends", max_cpus << 3);
>  object_property_set_link(OBJECT(dev), "xive-fabric", OBJECT(spapr),
>   &error_abort);
>  sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
> @@ -342,7 +352,7 @@ void spapr_irq_init(SpaprMachineState *spapr, Error 
> **errp)
>  spapr->xive = SPAPR_XIVE(dev);
>  
>  /* Enable the CPU IPIs */
> -for (i = 0; i < max_vcpu_ids; ++i) {
> +for (i = 0; i < max_cpus; ++i) {
>  SpaprInterruptControllerClass *sicc
>  = SPAPR_INTC_GET_CLASS(spapr->xive);
>  
> 




[PATCH 2/3] tests/acceptance: verify s390x device detection

2020-11-30 Thread Cornelia Huck
The kernel/initrd combination does not provide the virtio-net
driver; therefore, simply check whether the presented device type
is indeed virtio-net for the two virtio-net-{ccw,pci} devices.

Signed-off-by: Cornelia Huck 
---
 tests/acceptance/machine_s390_ccw_virtio.py | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/tests/acceptance/machine_s390_ccw_virtio.py 
b/tests/acceptance/machine_s390_ccw_virtio.py
index 683b6e0dac2e..e203ee304264 100644
--- a/tests/acceptance/machine_s390_ccw_virtio.py
+++ b/tests/acceptance/machine_s390_ccw_virtio.py
@@ -80,3 +80,14 @@ class S390CCWVirtioMachine(Test):
 exec_command_and_wait_for_pattern(self,
   'cat 
/sys/bus/ccw/devices/0.3.1234/virtio?/features',
   virtio_rng_features)
+# verify that we indeed have virtio-net devices (without having the
+# virtio-net driver handy)
+exec_command_and_wait_for_pattern(self,
+  'cat 
/sys/bus/ccw/devices/0.1./cutype',
+  '3832/01')
+exec_command_and_wait_for_pattern(self,
+  'cat 
/sys/bus/pci/devices/0005\:00\:00.0/subsystem_vendor',
+  '0x1af4')
+exec_command_and_wait_for_pattern(self,
+  'cat 
/sys/bus/pci/devices/0005\:00\:00.0/subsystem_device',
+  '0x0001')
-- 
2.26.2




[PATCH 1/3] tests/acceptance: test virtio-ccw revision handling

2020-11-30 Thread Cornelia Huck
The max_revision prop of virtio-ccw devices can be used to force
an older revision for compatibility handling. The easiest way to
check this is to force a device to revision 0, which turns off
virtio-1.

Signed-off-by: Cornelia Huck 
---
 tests/acceptance/machine_s390_ccw_virtio.py | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/tests/acceptance/machine_s390_ccw_virtio.py 
b/tests/acceptance/machine_s390_ccw_virtio.py
index db6352c44434..683b6e0dac2e 100644
--- a/tests/acceptance/machine_s390_ccw_virtio.py
+++ b/tests/acceptance/machine_s390_ccw_virtio.py
@@ -51,6 +51,10 @@ class S390CCWVirtioMachine(Test):
  '-initrd', initrd_path,
  '-append', kernel_command_line,
  '-device', 'virtio-net-ccw,devno=fe.1.',
+ '-device',
+ 'virtio-rng-ccw,devno=fe.2.,max_revision=0',
+ '-device',
+ 'virtio-rng-ccw,devno=fe.3.1234,max_revision=2',
  '-device', 'zpci,uid=5,target=zzz',
  '-device', 'virtio-net-pci,id=zzz')
 self.vm.launch()
@@ -60,9 +64,19 @@ class S390CCWVirtioMachine(Test):
 # first debug shell is too early, we need to wait for device detection
 exec_command_and_wait_for_pattern(self, 'exit', shell_ready)
 
-ccw_bus_id="0.1."
+ccw_bus_ids="0.1.  0.2.  0.3.1234"
 pci_bus_id="0005:00:00.0"
 exec_command_and_wait_for_pattern(self, 'ls /sys/bus/ccw/devices/',
-  ccw_bus_id)
+  ccw_bus_ids)
 exec_command_and_wait_for_pattern(self, 'ls /sys/bus/pci/devices/',
   pci_bus_id)
+# check that the device at 0.2. is in legacy mode, while the
+# device at 0.3.1234 has the virtio-1 feature bit set
+
virtio_rng_features="11001000"
+
virtio_rng_features_legacy="1100"
+exec_command_and_wait_for_pattern(self,
+  'cat 
/sys/bus/ccw/devices/0.2./virtio?/features',
+  virtio_rng_features_legacy)
+exec_command_and_wait_for_pattern(self,
+  'cat 
/sys/bus/ccw/devices/0.3.1234/virtio?/features',
+  virtio_rng_features)
-- 
2.26.2




[PATCH 3/3] tests/acceptance: test s390x zpci fid propagation

2020-11-30 Thread Cornelia Huck
Verify that a fid specified on the command line shows up correctly
as the function_id in the guest.

Signed-off-by: Cornelia Huck 
---
 tests/acceptance/machine_s390_ccw_virtio.py | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/tests/acceptance/machine_s390_ccw_virtio.py 
b/tests/acceptance/machine_s390_ccw_virtio.py
index e203ee304264..53b8484f8f9c 100644
--- a/tests/acceptance/machine_s390_ccw_virtio.py
+++ b/tests/acceptance/machine_s390_ccw_virtio.py
@@ -56,7 +56,9 @@ class S390CCWVirtioMachine(Test):
  '-device',
  'virtio-rng-ccw,devno=fe.3.1234,max_revision=2',
  '-device', 'zpci,uid=5,target=zzz',
- '-device', 'virtio-net-pci,id=zzz')
+ '-device', 'virtio-net-pci,id=zzz',
+ '-device', 'zpci,uid=0xa,fid=12,target=serial',
+ '-device', 'virtio-serial-pci,id=serial')
 self.vm.launch()
 
 shell_ready = "sh: can't access tty; job control turned off"
@@ -65,11 +67,11 @@ class S390CCWVirtioMachine(Test):
 exec_command_and_wait_for_pattern(self, 'exit', shell_ready)
 
 ccw_bus_ids="0.1.  0.2.  0.3.1234"
-pci_bus_id="0005:00:00.0"
+pci_bus_ids="0005:00:00.0  000a:00:00.0"
 exec_command_and_wait_for_pattern(self, 'ls /sys/bus/ccw/devices/',
   ccw_bus_ids)
 exec_command_and_wait_for_pattern(self, 'ls /sys/bus/pci/devices/',
-  pci_bus_id)
+  pci_bus_ids)
 # check that the device at 0.2. is in legacy mode, while the
 # device at 0.3.1234 has the virtio-1 feature bit set
 
virtio_rng_features="11001000"
@@ -91,3 +93,7 @@ class S390CCWVirtioMachine(Test):
 exec_command_and_wait_for_pattern(self,
   'cat 
/sys/bus/pci/devices/0005\:00\:00.0/subsystem_device',
   '0x0001')
+# check fid propagation
+exec_command_and_wait_for_pattern(self,
+  'cat 
/sys/bus/pci/devices/000a\:00\:00.0/function_id',
+  '0x000c')
-- 
2.26.2




[PATCH 0/3] tests/acceptance: enhance s390x devices test

2020-11-30 Thread Cornelia Huck
This series builds upon the new s390x acceptance test currently
queued on my s390-next branch.

Sadly, the kernel/initrd I'm using does not have the virtio-net driver,
so I cannot test things like mac address specification etc. Instead,
I added some quick checks regarding legacy virtio and propagation of
device type and fid properties.

Next up: maybe some device plug/unplug tests; but I still need to find
some inspiration there.

[And yes, I know that checkpatch moans about long lines -- hard to avoid
if you use a function with a very long name.]

Cornelia Huck (3):
  tests/acceptance: test virtio-ccw revision handling
  tests/acceptance: verify s390x device detection
  tests/acceptance: test s390x zpci fid propagation

 tests/acceptance/machine_s390_ccw_virtio.py | 41 ++---
 1 file changed, 36 insertions(+), 5 deletions(-)


base-commit: 875a99a0354211276b6daf635427b3c52a025790
-- 
2.26.2




Re: ImageInfo oddities regarding compression

2020-11-30 Thread Daniel P . Berrangé
On Fri, Nov 27, 2020 at 05:52:09PM +0100, Markus Armbruster wrote:
> Kevin Wolf  writes:
> 
> > Am 27.11.2020 um 13:21 hat Markus Armbruster geschrieben:
> >> >> I fell down this (thankfully shallow) rabbit hole because we also have
> >> >> 
> >> >> { 'enum': 'MultiFDCompression',
> >> >>   'data': [ 'none', 'zlib',
> >> >> { 'name': 'zstd', 'if': 'defined(CONFIG_ZSTD)' } ] }
> >> >> 
> >> >> I wonder whether we could merge them into a common type.
> >> 
> >> Looks like we could: current code would never report the additional
> >> value 'none'.  Introspection would show it, though.  Seems unlikely to
> >> cause trouble.  Observation, not demand.
> >
> > Forgot to comment on this one...
> >
> > Technically we could probably, but would it make sense? Support for
> > compression formats has to be implemented separately for both cases, so
> > that they currently happen to support the same list is more of a
> > coincidence.
> >
> > If we ever add a third compression format to qcow2, would we add the
> > same format to migration, too, or would we split the schema into two
> > types again?
> 
> I figure if a compression method is worth adding to one, it's probably
> worth adding to the other.

I'm not entirely sure about that, as compression algorithms have different
tradeoffs.

For migration compression we're streaming large volumes of data in a live
session, both compression and decompression speed is very important.

For qcow2 we're handling relative small discrete blocks (a qcow2
cluster at a time) and decompression speed is much more important than
compression speed, since we generally compress once out of band, and
decompress many times while live.

So we could see new compression methods desired for migration that are
not relevant for qcow2, or vice versa.

> Having two separate enums isn't too bad.  A third has been proposed[*],
> but I hope we can reuse migration's existing enum there.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH for-6.0 v2 1/3] spapr: Improve naming of some vCPU id related items

2020-11-30 Thread Cédric Le Goater
On 11/30/20 5:52 PM, Greg Kurz wrote:
> The machine tells the IC backend the number of vCPU ids it will be
> exposed to, in order to:
> - fill the "ibm,interrupt-server-ranges" property in the DT (XICS)
> - size the VP block used by the in-kernel chip (XICS-on-XIVE, XIVE)
> 
> The current "nr_servers" and "spapr_max_server_number" naming can
> mislead people into thinking it is about a quantity of CPUs. Make
> it clear this is all about vCPU ids.

OK. This looks fine. 

But, XIVE does not care about vCPU ids. Only the count of vCPUs
is relevant. So, it would be nice to add a comment in the code 
that we got it wrong at some point or that XICS imposed the use
of max vCPU ids.

C. 


> Signed-off-by: Greg Kurz 
> ---
>  include/hw/ppc/spapr.h  |  2 +-
>  include/hw/ppc/spapr_irq.h  |  8 
>  include/hw/ppc/spapr_xive.h |  2 +-
>  include/hw/ppc/xics_spapr.h |  2 +-
>  hw/intc/spapr_xive.c|  8 
>  hw/intc/spapr_xive_kvm.c|  4 ++--
>  hw/intc/xics_kvm.c  |  4 ++--
>  hw/intc/xics_spapr.c|  8 
>  hw/ppc/spapr.c  |  8 
>  hw/ppc/spapr_irq.c  | 18 +-
>  10 files changed, 32 insertions(+), 32 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index b7ced9faebf5..dc99d45e2852 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -849,7 +849,7 @@ int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
>  int spapr_reallocate_hpt(SpaprMachineState *spapr, int shift, Error **errp);
>  void spapr_clear_pending_events(SpaprMachineState *spapr);
>  void spapr_clear_pending_hotplug_events(SpaprMachineState *spapr);
> -int spapr_max_server_number(SpaprMachineState *spapr);
> +int spapr_max_vcpu_ids(SpaprMachineState *spapr);
>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>uint64_t pte0, uint64_t pte1);
>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
> index c22a72c9e270..2e53fc9e6cbb 100644
> --- a/include/hw/ppc/spapr_irq.h
> +++ b/include/hw/ppc/spapr_irq.h
> @@ -43,7 +43,7 @@ DECLARE_CLASS_CHECKERS(SpaprInterruptControllerClass, 
> SPAPR_INTC,
>  struct SpaprInterruptControllerClass {
>  InterfaceClass parent;
>  
> -int (*activate)(SpaprInterruptController *intc, uint32_t nr_servers,
> +int (*activate)(SpaprInterruptController *intc, uint32_t max_vcpu_ids,
>  Error **errp);
>  void (*deactivate)(SpaprInterruptController *intc);
>  
> @@ -62,7 +62,7 @@ struct SpaprInterruptControllerClass {
>  /* These methods should only be called on the active intc */
>  void (*set_irq)(SpaprInterruptController *intc, int irq, int val);
>  void (*print_info)(SpaprInterruptController *intc, Monitor *mon);
> -void (*dt)(SpaprInterruptController *intc, uint32_t nr_servers,
> +void (*dt)(SpaprInterruptController *intc, uint32_t max_vcpu_ids,
> void *fdt, uint32_t phandle);
>  int (*post_load)(SpaprInterruptController *intc, int version_id);
>  };
> @@ -74,7 +74,7 @@ int spapr_irq_cpu_intc_create(struct SpaprMachineState 
> *spapr,
>  void spapr_irq_cpu_intc_reset(struct SpaprMachineState *spapr, PowerPCCPU 
> *cpu);
>  void spapr_irq_cpu_intc_destroy(struct SpaprMachineState *spapr, PowerPCCPU 
> *cpu);
>  void spapr_irq_print_info(struct SpaprMachineState *spapr, Monitor *mon);
> -void spapr_irq_dt(struct SpaprMachineState *spapr, uint32_t nr_servers,
> +void spapr_irq_dt(struct SpaprMachineState *spapr, uint32_t max_vcpu_ids,
>void *fdt, uint32_t phandle);
>  
>  uint32_t spapr_irq_nr_msis(struct SpaprMachineState *spapr);
> @@ -105,7 +105,7 @@ typedef int 
> (*SpaprInterruptControllerInitKvm)(SpaprInterruptController *,
>  
>  int spapr_irq_init_kvm(SpaprInterruptControllerInitKvm fn,
> SpaprInterruptController *intc,
> -   uint32_t nr_servers,
> +   uint32_t max_vcpu_ids,
> Error **errp);
>  
>  /*
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 26c8d90d7196..643129b13536 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -79,7 +79,7 @@ int spapr_xive_end_to_target(uint8_t end_blk, uint32_t 
> end_idx,
>  /*
>   * KVM XIVE device helpers
>   */
> -int kvmppc_xive_connect(SpaprInterruptController *intc, uint32_t nr_servers,
> +int kvmppc_xive_connect(SpaprInterruptController *intc, uint32_t 
> max_vcpu_ids,
>  Error **errp);
>  void kvmppc_xive_disconnect(SpaprInterruptController *intc);
>  void kvmppc_xive_reset(SpaprXive *xive, Error **errp);
> diff --git a/include/hw/ppc/xics_spapr.h b/include/hw/ppc/xics_spapr.h
> index de752c0d2c7e..5c0e9430a964 100644
> --- a/include/hw/ppc/xics_spapr.h
> +++ b/include/hw/ppc/xics_spapr.h
> @@ -35,7 +35,7 @@
>  DECLARE_INSTANCE_CHECKER(ICSState, ICS_SPAPR,
>  

Re: [PATCH for-6.0 v2 0/3] spapr: Address the confusion between IPI numbers and vCPU ids

2020-11-30 Thread Cédric Le Goater
On 11/30/20 5:52 PM, Greg Kurz wrote:
> A regression was recently fixed in the sPAPR XIVE code for QEMU 5.2
> RC3 [1]. It boiled down to a confusion between IPI numbers and vCPU
> ids, which happen to be numerically equal in general, but are really
> different entities that can diverge in some setups. This was causing
> QEMU to misconfigure XIVE and to crash the guest.
> 
> The confusion comes from XICS actually. Interrupt presenters in XICS
> are identified by a "server number" which is a 1:1 mapping to vCPU
> ids. The range of these "server numbers" is exposed to the guest in
> the "ibm,interrupt-server-ranges" property. A xics_max_server_number()
> helper was introduced at some point to compute the upper limit of the
> range. When XIVE was added, commit 1a518e7693c9 renamed the helper to
> spapr_max_server_number(). It ended up being used to size a bunch of
> things in XIVE that are per-vCPU, such as internal END tables or
> IPI ranges presented to the guest. The problem is that the maximum
> "server number" can be much higher (up to 8 times) than the actual
> number of vCPUs when the VSMT mode doesn't match the number of threads
> per core in the guest:
> 
> DIV_ROUND_UP(ms->smp.max_cpus * spapr->vsmt, ms->smp.threads);
> 
> Since QEMU 4.2, the default behavior is to set spapr->vsmt to
> ms->smp.threads. Setups with custom VSMT settings will configure XIVE
> to use more HW resources than needed. This is a bit unfortunate but
> not extremely harmful, 

Indeed. The default usage case (without vsmt) has no impact since 
it does not fragment the XIVE VP space more than needed.

> unless maybe if a lot of guests are running on the host. 

We can run 4K (-2) KVM guests today on a P9 system. To reach the 
internal limits, each should have 32 vCPUs. It's possible with a 
lot of RAM but it's not a common scenario. 

C.


> The sizing of the IPI range is more problematic though
> as it eventually led to [1].
> 
> This series first does some renaming to make it clear when we're
> dealing with vCPU ids. It then fixes the machine code to pass
> smp.max_cpus to XIVE where appropriate. Since these changes are
> guest/migration visible, a machine property is added to keep the
> existing behavior for older machine types. The series is thus based
> on Connie's recent patch that introduces compat machines for
> QEMU 6.0.
> 
> Based-on: 20201109173928.1001764-1-coh...@redhat.com
> 
> Note that we still use spapr_max_vcpu_ids() when activating the
> in-kernel irqchip because this is what both XICS-on-XIVE and XIVE
> KVM devices expect.
> 
> [1] https://bugs.launchpad.net/qemu/+bug/1900241
> 
> v2: - comments on v1 highlighted that problems mostly come from
>   spapr_max_server_number() which got misused over the years.
>   Updated the cover letter accordingly.
> - completely new approach. Instead of messing with device properties,
>   pass the appropriate values to the IC backend handlers.
> - rename a few things using the "max_vcpu_ids" wording instead of
>   "nr_servers" and "max_server_number"
> 
> Greg Kurz (3):
>   spapr: Improve naming of some vCPU id related items
>   spapr/xive: Fix size of END table and number of claimed IPIs
>   spapr/xive: Fix the "ibm,xive-lisn-ranges" property
> 
>  include/hw/ppc/spapr.h  |  3 ++-
>  include/hw/ppc/spapr_irq.h  | 12 ++--
>  include/hw/ppc/spapr_xive.h |  2 +-
>  include/hw/ppc/xics_spapr.h |  2 +-
>  hw/intc/spapr_xive.c|  9 +
>  hw/intc/spapr_xive_kvm.c|  4 ++--
>  hw/intc/xics_kvm.c  |  4 ++--
>  hw/intc/xics_spapr.c| 11 ++-
>  hw/ppc/spapr.c  | 12 
>  hw/ppc/spapr_irq.c  | 34 --
>  10 files changed, 57 insertions(+), 36 deletions(-)
> 




Re: ImageInfo oddities regarding compression

2020-11-30 Thread Eric Blake
On 11/27/20 4:32 AM, Kevin Wolf wrote:

>>>##
>>># @Qcow2CompressionType:
>>>#
>>># Compression type used in qcow2 image file
>>>#
>>># @zlib: zlib compression, see 
>>># @zstd: zstd compression, see 
>>>#
>>># Since: 5.1
>>>##
>>>{ 'enum': 'Qcow2CompressionType',
>>>  'data': [ 'zlib', { 'name': 'zstd', 'if': 'defined(CONFIG_ZSTD)' } ] }
>>>
>>> Apparently, you can't have a qcow2 image without compression.  Correct?
>>>
>>> Can you imagine a use case for "without compression"?
>>
>> Yes & no. An image always has a compression type, either implicitly
>> zlib as the historical default, or explicitly as a type recorded in
>> the header.  This doesn't mean compression is used.
>>
>> There may be zero or more clusters actually using compression.
>> Typically compression will never be used, but there's still an
>> associated compression type regardless.
> 
> Right, so the correct answer to "is this image compressed?" is "unknown"
> for qcow2. Providing an answer would require checking all clusters in
> the image for the compression flag, which is not something we want to do
> for query-block. And even if we did that, it would still be unclear what
> to do with a partially compressed image.

If we truly need it, we could define three new autoclear bits in the
qcow2 spec:

bit 2: set if bits 3-4 are known to describe entire image
bit 3: set any time a compressed cluster is written to the image
bit 4: set any time an uncompressed cluster is written to the image

Any edit to the image by an older qemu will clear all three bits,
whereas new qemu would set bit 2 on image creation, and set the
appropriate bit 3 or 4 on any cluster write, so that we then have the
following knowledge:

234
===
000 - image was created or modified by older qemu; we have no idea if
clusters are written, or if compression was used, without expensive scan
001 - image contains at least one normal cluster, but no idea if it also
contains compressed clusters without expensive scan
010 - image contains at least one compressed cluster, but no idea if it
is fully compressed without expensive scan
011 - image contains mix of normal and compressed clusters
100 - image is newly created with no written clusters
101 - image contains only normal clusters; compression type could be
changed without risk
110 - image contains only compressed clusters
111 - image contains mix of normal and compressed clusters

But I'm not sure we need it.
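
A minimal sketch of how such bits could be decoded; note that these
autoclear bits are only the proposal above, not part of the current qcow2
spec, and the bit numbers are just the ones used in the table:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* hypothetical autoclear bits, numbered as in the table above */
#define AUTOCLEAR_COMPRESSION_KNOWN  (1ULL << 2) /* bits 3-4 cover whole image */
#define AUTOCLEAR_HAS_COMPRESSED     (1ULL << 3)
#define AUTOCLEAR_HAS_UNCOMPRESSED   (1ULL << 4)

static const char *compression_state(uint64_t autoclear_features)
{
    if (!(autoclear_features & AUTOCLEAR_COMPRESSION_KNOWN)) {
        return "unknown without expensive scan (image touched by older QEMU)";
    }
    bool comp = autoclear_features & AUTOCLEAR_HAS_COMPRESSED;
    bool uncomp = autoclear_features & AUTOCLEAR_HAS_UNCOMPRESSED;

    if (comp && uncomp) {
        return "mix of normal and compressed clusters";
    } else if (comp) {
        return "only compressed clusters";
    } else if (uncomp) {
        return "only normal clusters (compression type could be changed safely)";
    }
    return "newly created, no written clusters";
}

int main(void)
{
    printf("%s\n", compression_state(AUTOCLEAR_COMPRESSION_KNOWN |
                                     AUTOCLEAR_HAS_UNCOMPRESSED));
    return 0;
}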

> 
> The @compression-type only tells you what format a compressed cluster
> uses if any compressed cluster exists in the image (or if a new
> compressed cluster is written to it). It doesn't mean that such clusters
> currently exist.
> 
> Kevin
> 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[Bug 498523] Re: Add on-line write compression support to qcow2

2020-11-30 Thread Max Reitz
The compression filter can be used e.g. with -drive
driver=compress,file.driver=qcow2,file.file.filename=foo.qcow2.
However, it shouldn’t be used lightly, as it will only do the right
thing in very specific circumstances, namely every cluster that’s
written to must not be allocated already.  So writing to the same
cluster twice will not work.  (Which is why I was hesitant to merge this
filter, but in the end I was contend with the fact that it’s at least
difficult enough to use that unsuspecting users hopefully won’t
accidentally enable it.)

(It should be noted that this is not a limitation of the compression
filter, though, but of qcow2’s implementation (VMDK isn’t any better).
So technically qemu has the feature now, but qcow2 is still missing it.)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/498523

Title:
  Add on-line write compression support to qcow2

Status in QEMU:
  Fix Released

Bug description:
  This is a wishlist item.  Launchpad really needs a way for the
  submitter to indicate this.

  It would be really cool if qemu were to support disk compression on-
  line for writes.

  I know this wouldn't be really easy.  Although most OS's use blocks,
  you can really only count on being able to compress 512-byte sectors,
  which doesn't give much room for a good compression ratio.  Moreover,
  the index indicating where in the image file each sector is located
  would be complex to manage, since the compressed blocks would be
  variable sized, and you'd be wanting to do some kind of best-fit
  allocation of space in the image file.  (If you were to make the image
  file compressed block size granularity, say, 64 bytes, you could
  probably do this best fit O(1).)  If you were to buffer enough writes,
  you could group arbitrary sequences of written sectors into blocks to
  compress (which with writeback could be sent to a helper thread on
  another CPU, so the throughput would be good).

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/498523/+subscriptions



Re: [PATCH v1 8/9] virtio-mem: Require only coordinated discards

2020-11-30 Thread Dr. David Alan Gilbert
* David Hildenbrand (da...@redhat.com) wrote:
> We implement the RamDiscardMgr interface and only require coordinated
> discarding of RAM to work.
> 
> Cc: Paolo Bonzini 
> Cc: "Michael S. Tsirkin" 
> Cc: Alex Williamson 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Pankaj Gupta 
> Cc: Peter Xu 
> Cc: Auger Eric 
> Cc: Wei Yang 
> Cc: teawater 
> Cc: Marek Kedzierski 
> Signed-off-by: David Hildenbrand 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  hw/virtio/virtio-mem.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 93257b6c26..932d386c37 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -687,7 +687,7 @@ static void virtio_mem_device_realize(DeviceState *dev, 
> Error **errp)
>  return;
>  }
>  
> -if (ram_block_discard_require(true)) {
> +if (ram_block_coordinated_discard_require(true)) {
>  error_setg(errp, "Discarding RAM is disabled");
>  return;
>  }
> @@ -695,7 +695,7 @@ static void virtio_mem_device_realize(DeviceState *dev, 
> Error **errp)
>  ret = ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb));
>  if (ret) {
>  error_setg_errno(errp, -ret, "Unexpected error discarding RAM");
> -ram_block_discard_require(false);
> +ram_block_coordinated_discard_require(false);
>  return;
>  }
>  
> @@ -738,7 +738,7 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
>  virtio_del_queue(vdev, 0);
>  virtio_cleanup(vdev);
>  g_free(vmem->bitmap);
> -ram_block_discard_require(false);
> +ram_block_coordinated_discard_require(false);
>  }
>  
>  static int virtio_mem_discard_range_cb(const VirtIOMEM *vmem, void *arg,
> -- 
> 2.26.2
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH for-6.0 1/6] qapi: Add query-accel command

2020-11-30 Thread Philippe Mathieu-Daudé
On 11/16/20 2:10 PM, Roman Bolshakov wrote:
> There's a problem for management applications to determine if certain
> accelerators are available. A generic QMP command should help with that.
> 
> Signed-off-by: Roman Bolshakov 
> ---
>  monitor/qmp-cmds.c | 15 +++
>  qapi/machine.json  | 19 +++
>  2 files changed, 34 insertions(+)
...
> +##
> +# @query-accel:
> +#
> +# Returns information about an accelerator
> +#
> +# Returns: @KvmInfo
> +#
> +# Since: 6.0.0
> +#
> +# Example:
> +#
> +# -> { "execute": "query-accel", "arguments": { "name": "kvm" } }
> +# <- { "return": { "enabled": true, "present": true } }

FWIW you can use 'qom-list-types' for that:

{ "execute": "qom-list-types", "arguments": { "implements": "accel" } }
{
"return": [
{
"name": "qtest-accel",
"parent": "accel"
},
{
"name": "tcg-accel",
"parent": "accel"
},
{
"name": "xen-accel",
"parent": "accel"
},
{
"name": "kvm-accel",
"parent": "accel"
},
{
"name": "accel",
"parent": "object"
}
]
}

Which is what I use for integration tests:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg675079.html
https://www.mail-archive.com/qemu-devel@nongnu.org/msg675085.html




[PATCH] target/riscv: Fix definition of MSTATUS_TW and MSTATUS_TSR

2020-11-30 Thread Alex Richardson
The TW and TSR fields should be bits 21 and 22 and not 30/29.
This was found while comparing QEMU behaviour against the sail formal
model (https://github.com/rems-project/sail-riscv/).

Signed-off-by: Alex Richardson 
---
 target/riscv/cpu_bits.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index 24b24c69c5..92147332c6 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -379,8 +379,8 @@
 #define MSTATUS_MXR 0x0008
 #define MSTATUS_VM  0x1F00 /* until: priv-1.9.1 */
 #define MSTATUS_TVM 0x0010 /* since: priv-1.10 */
-#define MSTATUS_TW  0x2000 /* since: priv-1.10 */
-#define MSTATUS_TSR 0x4000 /* since: priv-1.10 */
+#define MSTATUS_TW  0x0020 /* since: priv-1.10 */
+#define MSTATUS_TSR 0x0040 /* since: priv-1.10 */
 #define MSTATUS_GVA 0x40ULL
 #define MSTATUS_MPV 0x80ULL
 
-- 
2.29.2
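
As a standalone illustration of the bit positions named in the commit
message above (the full 32-bit constants here are derived from those bit
numbers, not quoted from the patch):

#include <assert.h>
#include <stdio.h>

int main(void)
{
    /* bits 21 and 22, as used by the fix */
    assert((1u << 21) == 0x00200000u); /* MSTATUS_TW  */
    assert((1u << 22) == 0x00400000u); /* MSTATUS_TSR */

    /* the old values 0x20000000 and 0x40000000 sit at bits 29 and 30 */
    assert((1u << 29) == 0x20000000u);
    assert((1u << 30) == 0x40000000u);

    printf("TW/TSR live in bits 21/22 of mstatus\n");
    return 0;
}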




Re: [RFC PATCH 18/25] hw/cxl/device: Add a memory device (8.2.8.5)

2020-11-30 Thread Ben Widawsky
On 20-11-26 07:36:23, Markus Armbruster wrote:
> Ben Widawsky  writes:
> 
> > On 20-11-13 08:47:59, Markus Armbruster wrote:
> >> Eric Blake  writes:
> >> 
> >> > On 11/10/20 11:47 PM, Ben Widawsky wrote:
> >> >> A CXL memory device (AKA Type 3) is a CXL component that contains some
> >> >> combination of volatile and persistent memory. It also implements the
> >> >> previously defined mailbox interface as well as the memory device
> >> >> firmware interface.
> >> >> 
> >> >> The following example will create a 256M device in a 512M window:
> >> >> 
> >> >> -object 
> >> >> "memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M"
> >> >> -device "cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M"
> >> >> 
> >> >> Signed-off-by: Ben Widawsky 
> >> >> ---
> >> >
> >> >> +++ b/qapi/machine.json
> >> >> @@ -1394,6 +1394,7 @@
> >> >>  { 'union': 'MemoryDeviceInfo',
> >> >>'data': { 'dimm': 'PCDIMMDeviceInfo',
> >> >>  'nvdimm': 'PCDIMMDeviceInfo',
> >> >> +'cxl': 'PCDIMMDeviceInfo',
> >> >>  'virtio-pmem': 'VirtioPMEMDeviceInfo',
> >> >>  'virtio-mem': 'VirtioMEMDeviceInfo'
> >> >>}
> >> >
> >> > Missing documentation of the new data type, and the fact that it will be
> >> > introduced in 6.0.
> >> 
> >> Old wish list item: improve the QAPI schema frontend to flag this.
> >> 
> >
> > "Introduced in 6.0" - quite the optimist. Kidding aside, this is the area 
> > where
> > I could use some feedback. CXL Type 3 memory devices can contain both 
> > volatile
> > and persistent memory at the same time. As such, I think I'll need a new 
> > type to
> > represent that, but I'd love to know how best to accomplish that.
> 
> We can help.  Tell us what information you want to provide in variant
> 'cxl'.  If it's a superset of an existing variant, give us just the
> delta.
> 

I'm not exactly sure what the best way to do this is in QEMU, so I'm not really
sure what to specify as the delta. A CXL memory device can have both volatile
and persistent memory. Currently a CXL memory device implements the
TYPE_MEMORY_DEVICE interface. So I believe the shortest path is a
MemoryDeviceInfo that can have two memory devices associated with it, but I
don't know if that's the best path.


> >> > Also, Markus has been trying to get rid of so-called
> >> > "simple unions" in favor of "flat unions" - every time we modify an
> >> > existing simple union, it is worth asking if it is time to first flatten
> >> > this.
> >> 
> >> 0. Simple unions can be transformed into flat unions.  The
> >> transformation can either preserve the nested wire format, or flatten
> >> it.  See docs/devel/qapi-code-gen.txt "A simple union can always be
> >> re-written as a flat union ..."
> >> 
> >> 1. No new simple unions.
> >> 
> >> 2. Existing simple unions that can be flattened without breaking
> >> backward compatibility have long been flattened.
> >> 
> >> 3. The remaining simple unions are part of QMP, where we need to
> >> preserve the wire format.  We could turn them into flat union preserving
> >> the wire format.  Only worthwhile if we kill simple unions and simplify
> >> scripts/qapi/.  Opportunity to make the flat union syntax less
> >> cumbersome.  Not done due to lack of time.
> >> 
> >> 4. Kevin and I have been experimenting with ways to provide both flat
> >> and nested wire format.  This would pave the way for orderly deprecation
> >> of the nested wire format.  May not be practical for QMP output.
> >> 
> >
> > So is there anything for me to do here?
> 
> No.  Extending an existing simple union is okay.
> 
> We should not add news ones.  We should think twice before we add new
> uses of existing ones.
> 
> 



Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-11-30 Thread Alex Williamson
On Thu, 26 Nov 2020 14:56:17 +0800
Shenming Lu  wrote:

> Hi,
> 
> After reading everyone's opinions, we have a rough idea for this issue.
> 
> One key point is whether it is necessary to setup the config space before
> the device can accept further migration data. I think it is decided by
> the vendor driver, so we can simply ask the vendor driver about it in
> .save_setup, which could avoid a lot of unnecessary copies and settings.
> Once we have known the need, we can iterate the config space (before)
> along with the device migration data in .save_live_iterate and
> .save_live_complete_precopy, and if not needed, we can only migrate the
> config space in .save_state.
> 
> Another key point is that the interrupt enabling should be after the
> restoring of the interrupt controller (might not only interrupts).
> My solution is to add a subflag at the beginning of the config data
> (right after VFIO_MIG_FLAG_DEV_CONFIG_STATE) to indicate the triggered
> actions on the dst (such as whether to enable interrupts).
> 
> Below is it's workflow.
> 
> On the save path:
>   In vfio_save_setup():
>   Ask the vendor driver if it needs the config space setup before it
>   can accept further migration data.

How does "ask the vendor driver" actually work?

>   |
>   In vfio_save_iterate() (pre-copy):
>   If *needed*, save the config space which would be setup on the dst
>   before the migration data, but send with a subflag to instruct not
>   to (such as) enable interrupts.

If not for triggering things like MSI/X configuration, isn't config
space almost entirely virtual?  What visibility does the vendor driver
have to the VM machine dependencies regarding device interrupt versus
interrupt controller migration?

>   |
>   In vfio_save_complete_precopy() (stop-and-copy, iterable process):
>   The same as that in vfio_save_iterate().
>   |
>   In .save_state (stop-and-copy, non-iterable process):
>   If *needed*, only send a subflag to instruct to enable interrupts.
>   If *not needed*, save the config space and setup everything on the dst.

Again, how does the vendor driver have visibility to know when the VM
machine can enable interrupts?

> 
> Besides the above idea, we might be able to choose to let the vendor driver do
> more: qemu just sends and writes the config data (before) along with the 
> device
> migration data every time, and it's up to the vendor driver to filter 
> out/buffer
> the received data or reorder the settings...

There is no vendor driver in QEMU though, so are you suggesting that
QEMU follows a standard protocol and the vendor driver chooses when to
enable specific features?  For instance, QEMU would call SET_IRQS and
the driver would return success, but defer that setup if necessary?
That seems quite troubling as we then have ioctls that behave
differently depending on the device state and we have no error path to
userspace should that setup fail later.  The vendor driver does have
its own data stream for migration, so the vendor driver could tell the
destination version of itself what type of interrupt to use, which
might be sufficient if we were to ignore the latency if QEMU were to
defer interrupt setup until stop-and-copy.

Is the question of when to setup device interrupts versus the interrupt
controller state largely a machine issue within QEMU?  If so, shouldn't
it be at QEMU's determination when to act on the config space
information on the target?  IOW, if a vendor driver has a dependency on
interrupt configuration, they need to include it in their own pre-copy
data stream and decouple that dependency from userspace interrupt
configuration via the SET_IRQS ioctl.  Is that possible?  Thanks,

Alex




Re: [RFC] ich9:cpuhp: add support for cpu hot-unplug with SMI broadcast enabled

2020-11-30 Thread Laszlo Ersek
On 11/28/20 01:43, Ankur Arora wrote:
> On 2020-11-27 7:19 a.m., Laszlo Ersek wrote:
>> On 11/27/20 05:10, Ankur Arora wrote:
>>
>>> Yeah I was wondering what would happen for simultaneous hot add and
>>> remove.
>>> I guess we would always do remove first and then the add, unless we hit
>>> the break due to max_cpus_per_pass and switch to hot-add mode.
>>
>> Considering the firmware only, I disagree with remove-then-add.
>>
>> EFI_SMM_CPU_SERVICE_PROTOCOL.AddProcessor() and
>> EFI_SMM_CPU_SERVICE_PROTOCOL.RemoveProcessor() (implemented in
>> SmmAddProcessor() and SmmRemoveProcessor() in
>> "UefiCpuPkg/PiSmmCpuDxeSmm/CpuService.c", respectively) only mark the
>> processors for addition/removal. The actual processing is done only
>> later, in BSPHandler() --> SmmCpuUpdate(), when "all SMI handlers are
>> finished" (see the comment in SmmRemoveProcessor()).
>>
>> Consequently, I would not suggest replacing a valid APIC ID in a
>> particular mCpuHotPlugData.ApicId[Index] slot with INVALID_APIC_ID
>> (corresponding to the unplug operation), and then possibly replacing
>> INVALID_APIC_ID in the *same slot* with the APIC ID of the newly plugged
>> CPU, in the exact same SMI invocation (= in the same execution of
>> CpuHotplugMmi()). That might cause some component in edk2 to see the
>> APIC ID in mCpuHotPlugData.ApicId[Index] change from one valid APIC
>> ID to another valid APIC ID, and I don't even want to think about what
>> kind of mess that could cause.
> 
> Shudders.
> 
>>
>> So no, please handle plugs first, for which unused slots in
>> mCpuHotPlugData.ApicId will be populated, and *then* handle removals (in
>> the same invocation of CpuHotplugMmi()).
> 
> Yeah, that ordering makes complete sense.
> 
>>
>> By the way, for unplug, you will not have to re-set
>> mCpuHotPlugData.ApicId[Index] to INVALID_APIC_ID, as
>> SmmRemoveProcessor() does that internally. You just have to locate the
>> Index for the APIC ID being removed, for calling
>> EFI_SMM_CPU_SERVICE_PROTOCOL.RemoveProcessor().
> 
> Right. The hotplug is more involved (given the need to pen the new CPU)
> but for the unplug, AFAICS all the actual handling for removal is in
> .RemoveProcessor() and at SMI exit in SmmCpuUpdate().

Yes, I got the same impression (without having tried to implement it, of
course).

Laszlo




Re: [RFC] ich9:cpuhp: add support for cpu hot-unplug with SMI broadcast enabled

2020-11-30 Thread Laszlo Ersek
On 11/28/20 00:48, Ankur Arora wrote:

> It is possible that there are CPUs with bits for both is_inserting and
> is_removing. In that case QemuCpuhpCollectApicIds() would put them in the
> PluggedApicIds array and the unplug eventually happens in the next
> firmware invocation.
> 
> If a CPU has both is_inserting and fw_remove set, the firmware processes
> the
> hotplug in that invocation and the unplug happens whenever the OSPM
> triggers
> the firmware next.

If these corner cases will actually work (I'm somewhat doubtful), that
will be really great.

Thanks!
Laszlo
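
Restated as a sketch, the corner-case handling described in the quoted
paragraphs above; the field and function names here are illustrative
only, not the actual QEMU/OVMF definitions:

#include <stdbool.h>
#include <stdio.h>

typedef struct {
    unsigned apic_id;
    bool is_inserting;  /* hot-add pending in the CPU hotplug register block */
    bool is_removing;   /* hot-remove requested, OSPM not yet done with the CPU */
    bool fw_remove;     /* OSPM has handed the CPU to firmware for removal */
} CpuEvent;

/*
 * Returns true if the CPU goes into the "plugged" list in this SMI.
 * A CPU with is_inserting plus a pending removal is plugged now; the
 * removal is only processed in a later firmware invocation, once the
 * OSPM triggers the firmware again with fw_remove set.
 */
static bool plug_this_pass(const CpuEvent *e, bool *unplug_this_pass)
{
    if (e->is_inserting) {
        *unplug_this_pass = false;
        return true;
    }
    *unplug_this_pass = e->fw_remove;
    return false;
}

int main(void)
{
    CpuEvent cpu = { .apic_id = 4, .is_inserting = true, .is_removing = true };
    bool unplug;
    bool plug = plug_this_pass(&cpu, &unplug);
    printf("apic_id %u: plug=%d unplug=%d (unplug deferred to a later SMI)\n",
           cpu.apic_id, plug, unplug);
    return 0;
}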




Re: [PATCH 00/18] qapi/qom: QAPIfy object-add

2020-11-30 Thread Paolo Bonzini

On 30/11/20 16:46, Kevin Wolf wrote:

Am 30.11.2020 um 15:58 hat Paolo Bonzini geschrieben:

With this series it's basically pointless to have QOM properties at
all.


Not entirely, because there are still some writable properties that can
be changed later on.


Are there really any (that are not bugs like opened/loaded)? That's also 
why Eduardo and I discussed a class-wide allow_set function for his 
field properties series.



So with this in mind, I think I'm in favour of completely leaving the
initialisation of properties on object creation to QAPI, and only
providing individual setters if we actually intend to allow property
changes after creation.


The main problem is that it wouldn't extend well, if at all, to machines 
and devices.  So those would still not be integrated into the QAPI schema.



So the question is, are we okay with shoveling half of QEMU's backend data
model into a single struct?  If so, there are important consequences.


Yeah, the single struct bothers me a bit, both in the QAPI schema and in
the C source.


The single struct doesn't bother me _too much_ actually.  What bothers 
me is that it won't be a single source of all QOM objects, only those 
that happen to be created by object-add.  So I start to wonder if QOM as 
it exists now is the right solution for all kinds of objects:


- backends and other object-add types (not per-target, relatively few 
classes and even fewer hierarchies)


- machine (per-target, many classes, no hierarchy)

- device (can be treated as not per-target, many many classes, very few 
hierarchies)


- accelerator (per-target, few classes, no hierarchy)

- chardev (ok those are the same as the first category)

If QOM is the right solution, this patch goes in the wrong direction.

If QOM is the wrong solution, this patch is okay but then we have 
another problem to solve. :)



The problem with this series is that you are fine with deduplicating things
as a third step, but you cannot be sure that such deduplication is possible
at all.  So while I don't have any problems in principle with the
ObjectOptions concept, I don't think it should be committed without a clear
idea of how to do the third step.


Do you have any specific concerns why the deduplication might not
possible, or just because it's uncharted territory?


Mostly because it's more or less the same issue that you have with 
BlockdevOptions, with the extra complication that this series only deals 
with the easy one of the four above categories.



Maybe if we don't want to commit to keeping the ObjectOptions schema,
the part that should wait is object-add and I should do the command line
options first? Then the schema remains an implementation detail for now
that is invisible in introspection.


I don't see much benefit in converting _any_ of the three actually.  The 
only good reason I see for QAPIfying this is the documentation, and the 
promise of deduplicating QOM boilerplate.  The latter is only a promise, 
but documentation alone is a damn good reason and it's enough to get 
this work into a mergeable shape as soon as possible!


But maybe, we could start in the opposite direction: start with using
QAPI to eliminate QOM boilerplate.  Basing your work on Eduardo's field
properties series, you could add a new 'object' "kind" to QAPI that 
would create an array of field properties (e.g. a macro expanding to a 
compound literal?)

.  Something like


+{ 'object': 'InputBarrier',
+  'data': { 'name': 'str',
+'x-origin': 'int16',
+'y-origin': 'int16',
+'width': 'int16',
+'height': 'int16' },
+  'properties': { 'server': 'str',
+  'port': 'str' } }

would create a macro QOM_InputBarrier_FIELDS defining properties for the 
following fields of the InputBarrier struct:


gchar *name;
int16_t x_origin, y_origin;
int16_t width, height;

while server and port would only appear in the documentation (or 
alternatively you could leave them out completely, as you wish).

The advantages would be noticeable:

1) the information would be moved in the QAPI schema JSON from the 
beginning, decoupling the conflict-heavy part from the complex question 
of how to expose the QOM schema in the introspection data


2) there would not be any more duplication than before (there would be 
duplication between structs and QAPI schema, but not between structs and 
C code that defines properties).


3) it would be opt-in, so it doesn't put on you the burden of keeping 
the series in sync with new objects that are added (I have one for the 
qtest server for example).  At the same time it would be quite appealing 
for owners of QOM code to convert their objects to field properties and 
get documentation for free.


4) we could special-case 'object' definitions and generate them in the 
system manual as well, since they are also useful to document -object.


Yes it's a huge change but you have the boring part already done.  What 
do you think?
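
For illustration only, one possible shape of the generated output for the
'object' example above; the actual macros in Eduardo's field properties
series use different names, so everything below is a sketch:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* struct derived from the 'data' part of the 'object' definition above */
typedef struct InputBarrier {
    /* Object parent_obj; omitted in this standalone sketch */
    char *name;
    int16_t x_origin, y_origin;
    int16_t width, height;
} InputBarrier;

/* toy property descriptor standing in for the real field property type */
typedef struct {
    const char *name;
    enum { F_STR, F_INT16 } type;
    size_t offset;
} ExampleFieldProperty;

/* what a generated QOM_InputBarrier_FIELDS could expand to */
#define QOM_InputBarrier_FIELDS                                        \
    { "name",     F_STR,   offsetof(InputBarrier, name)     },         \
    { "x-origin", F_INT16, offsetof(InputBarrier, x_origin) },         \
    { "y-origin", F_INT16, offsetof(InputBarrier, y_origin) },         \
    { "width",    F_INT16, offsetof(InputBarrier, width)    },         \
    { "height",   F_INT16, offsetof(InputBarrier, height)   }

static const ExampleFieldProperty input_barrier_fields[] = {
    QOM_InputBarrier_FIELDS
};

int main(void)
{
    printf("%zu field properties\n",
           sizeof(input_barrier_fields) / sizeof(input_barrier_fields[0]));
    return 0;
}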

Re: [PATCH v2 0/4] [RfC] fix tracing for modules

2020-11-30 Thread Stefan Hajnoczi
On Tue, Nov 24, 2020 at 05:02:51PM +0100, Gerd Hoffmann wrote:
> First version that actually works.  Only qxl covered for this RfC, other
> modules will follow once the basics are hashed out.
> 
> More context:
>   https://bugzilla.redhat.com/show_bug.cgi?id=1898700
>   https://bugzilla.redhat.com/show_bug.cgi?id=1869642
> 
> take care,
>   Gerd
> 
> Gerd Hoffmann (4):
>   meson: add trace_events_config[]
>   meson: move up hw subdir (specifically before trace subdir)
>   meson: add module_trace & module_trace_src
>   meson: move qxl trace events to separate file

Awesome, thank you for working on this!

I noticed an issue with simpletrace: the trace file does not contain
qxl_* TRACE_RECORD_TYPE_MAPPING records when ./configure
--enable-modules is used. This happens because st_write_event_mapping()
is called before the qxl module calls trace_event_register_group().

(The mapping records describe the integer ID to string name mapping used
in a simpletrace file.)

You can check this using "grep -a qxl_ trace-$LAST_QEMU_PID" after
running qemu --device qxl built with ./configure --enable-modules
--enable-trace-backend=simple.

Remove --enable-modules and the file will contain the qxl_ trace events.

This means the trace file is broken because the simpletrace records
cannot be mapped to a trace event name.

One way to solve this is by modifying trace_event_register_group() to
call into trace/simple.c (maybe with a TraceEventIter initialized to
iterate over the newly registered trace events group?).

Alternatively, simpletrace.c might be able to emit
TRACE_RECORD_TYPE_MAPPING on demand instead of ahead of time.

Stefan




[PATCH for-6.0 v2 1/3] spapr: Improve naming of some vCPU id related items

2020-11-30 Thread Greg Kurz
The machine tells the IC backend the number of vCPU ids it will be
exposed to, in order to:
- fill the "ibm,interrupt-server-ranges" property in the DT (XICS)
- size the VP block used by the in-kernel chip (XICS-on-XIVE, XIVE)

The current "nr_servers" and "spapr_max_server_number" naming can
mislead people info thinking it is about a quantity of CPUs. Make
it clear this is all about vCPU ids.

Signed-off-by: Greg Kurz 
---
 include/hw/ppc/spapr.h  |  2 +-
 include/hw/ppc/spapr_irq.h  |  8 
 include/hw/ppc/spapr_xive.h |  2 +-
 include/hw/ppc/xics_spapr.h |  2 +-
 hw/intc/spapr_xive.c|  8 
 hw/intc/spapr_xive_kvm.c|  4 ++--
 hw/intc/xics_kvm.c  |  4 ++--
 hw/intc/xics_spapr.c|  8 
 hw/ppc/spapr.c  |  8 
 hw/ppc/spapr_irq.c  | 18 +-
 10 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index b7ced9faebf5..dc99d45e2852 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -849,7 +849,7 @@ int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
 int spapr_reallocate_hpt(SpaprMachineState *spapr, int shift, Error **errp);
 void spapr_clear_pending_events(SpaprMachineState *spapr);
 void spapr_clear_pending_hotplug_events(SpaprMachineState *spapr);
-int spapr_max_server_number(SpaprMachineState *spapr);
+int spapr_max_vcpu_ids(SpaprMachineState *spapr);
 void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
   uint64_t pte0, uint64_t pte1);
 void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index c22a72c9e270..2e53fc9e6cbb 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -43,7 +43,7 @@ DECLARE_CLASS_CHECKERS(SpaprInterruptControllerClass, 
SPAPR_INTC,
 struct SpaprInterruptControllerClass {
 InterfaceClass parent;
 
-int (*activate)(SpaprInterruptController *intc, uint32_t nr_servers,
+int (*activate)(SpaprInterruptController *intc, uint32_t max_vcpu_ids,
 Error **errp);
 void (*deactivate)(SpaprInterruptController *intc);
 
@@ -62,7 +62,7 @@ struct SpaprInterruptControllerClass {
 /* These methods should only be called on the active intc */
 void (*set_irq)(SpaprInterruptController *intc, int irq, int val);
 void (*print_info)(SpaprInterruptController *intc, Monitor *mon);
-void (*dt)(SpaprInterruptController *intc, uint32_t nr_servers,
+void (*dt)(SpaprInterruptController *intc, uint32_t max_vcpu_ids,
void *fdt, uint32_t phandle);
 int (*post_load)(SpaprInterruptController *intc, int version_id);
 };
@@ -74,7 +74,7 @@ int spapr_irq_cpu_intc_create(struct SpaprMachineState *spapr,
 void spapr_irq_cpu_intc_reset(struct SpaprMachineState *spapr, PowerPCCPU 
*cpu);
 void spapr_irq_cpu_intc_destroy(struct SpaprMachineState *spapr, PowerPCCPU 
*cpu);
 void spapr_irq_print_info(struct SpaprMachineState *spapr, Monitor *mon);
-void spapr_irq_dt(struct SpaprMachineState *spapr, uint32_t nr_servers,
+void spapr_irq_dt(struct SpaprMachineState *spapr, uint32_t max_vcpu_ids,
   void *fdt, uint32_t phandle);
 
 uint32_t spapr_irq_nr_msis(struct SpaprMachineState *spapr);
@@ -105,7 +105,7 @@ typedef int 
(*SpaprInterruptControllerInitKvm)(SpaprInterruptController *,
 
 int spapr_irq_init_kvm(SpaprInterruptControllerInitKvm fn,
SpaprInterruptController *intc,
-   uint32_t nr_servers,
+   uint32_t max_vcpu_ids,
Error **errp);
 
 /*
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 26c8d90d7196..643129b13536 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -79,7 +79,7 @@ int spapr_xive_end_to_target(uint8_t end_blk, uint32_t 
end_idx,
 /*
  * KVM XIVE device helpers
  */
-int kvmppc_xive_connect(SpaprInterruptController *intc, uint32_t nr_servers,
+int kvmppc_xive_connect(SpaprInterruptController *intc, uint32_t max_vcpu_ids,
 Error **errp);
 void kvmppc_xive_disconnect(SpaprInterruptController *intc);
 void kvmppc_xive_reset(SpaprXive *xive, Error **errp);
diff --git a/include/hw/ppc/xics_spapr.h b/include/hw/ppc/xics_spapr.h
index de752c0d2c7e..5c0e9430a964 100644
--- a/include/hw/ppc/xics_spapr.h
+++ b/include/hw/ppc/xics_spapr.h
@@ -35,7 +35,7 @@
 DECLARE_INSTANCE_CHECKER(ICSState, ICS_SPAPR,
  TYPE_ICS_SPAPR)
 
-int xics_kvm_connect(SpaprInterruptController *intc, uint32_t nr_servers,
+int xics_kvm_connect(SpaprInterruptController *intc, uint32_t max_vcpu_ids,
  Error **errp);
 void xics_kvm_disconnect(SpaprInterruptController *intc);
 bool xics_kvm_has_broken_disconnect(void);
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 12dd6d3ce357..d0a0ca822367 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -669,7

[PATCH for-6.0 v2 2/3] spapr/xive: Fix size of END table and number of claimed IPIs

2020-11-30 Thread Greg Kurz
The sPAPR XIVE device has an internal ENDT table the size of
which is configurable by the machine. This table is supposed
to contain END structures for all possible vCPUs that may
enter the guest. The machine must also claim IPIs for all
possible vCPUs since this is expected by the guest.

spapr_irq_init() takes care of that under the assumption that
spapr_max_vcpu_ids() returns the number of possible vCPUs.
This happens to be the case when the VSMT mode is set to match
the number of threads per core in the guest (default behavior).
With non-default VSMT settings, this limit is higher than the number
of vCPUs. In the worst case, we can end up allocating an 8 times
bigger ENDT and claiming 8 times more IPIs than needed. But more
importantly, this creates a confusion between number of vCPUs and
vCPU ids, which can lead to subtle bugs like [1].

Use smp.max_cpus instead of spapr_max_vcpu_ids() in
spapr_irq_init() for the latest machine type. Older machine
types continue to use spapr_max_vcpu_ids() since the size of
the ENDT is migration visible.

[1] https://bugs.launchpad.net/qemu/+bug/1900241

Signed-off-by: Greg Kurz 
---
 include/hw/ppc/spapr.h |  1 +
 hw/ppc/spapr.c |  3 +++
 hw/ppc/spapr_irq.c | 16 +---
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index dc99d45e2852..95bf210d0bf6 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -139,6 +139,7 @@ struct SpaprMachineClass {
 hwaddr rma_limit;  /* clamp the RMA to this size */
 bool pre_5_1_assoc_refpoints;
 bool pre_5_2_numa_associativity;
+bool pre_6_0_xive_max_cpus;
 
 bool (*phb_placement)(SpaprMachineState *spapr, uint32_t index,
   uint64_t *buid, hwaddr *pio, 
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index ab59bfe941d0..227a926dfd48 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4530,8 +4530,11 @@ DEFINE_SPAPR_MACHINE(6_0, "6.0", true);
  */
 static void spapr_machine_5_2_class_options(MachineClass *mc)
 {
+SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
+
 spapr_machine_6_0_class_options(mc);
 compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len);
+smc->pre_6_0_xive_max_cpus = true;
 }
 
 DEFINE_SPAPR_MACHINE(5_2, "5.2", false);
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 552e30e93036..4d9ecd5d0af8 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -324,17 +324,27 @@ void spapr_irq_init(SpaprMachineState *spapr, Error 
**errp)
 }
 
 if (spapr->irq->xive) {
-uint32_t max_vcpu_ids = spapr_max_vcpu_ids(spapr);
+uint32_t max_cpus = MACHINE(spapr)->smp.max_cpus;
 DeviceState *dev;
 int i;
 
+/*
+ * Older machine types used to size the ENDT and IPI range
+ * according to the upper limit of vCPU ids, which can be
+ * higher than smp.max_cpus with custom VSMT settings. Keep
+ * the previous behavior for compatibility with such setups.
+ */
+if (smc->pre_6_0_xive_max_cpus) {
+max_cpus = spapr_max_vcpu_ids(spapr);
+}
+
 dev = qdev_new(TYPE_SPAPR_XIVE);
 qdev_prop_set_uint32(dev, "nr-irqs", smc->nr_xirqs + SPAPR_XIRQ_BASE);
 /*
  * 8 XIVE END structures per CPU. One for each available
  * priority
  */
-qdev_prop_set_uint32(dev, "nr-ends", max_vcpu_ids << 3);
+qdev_prop_set_uint32(dev, "nr-ends", max_cpus << 3);
 object_property_set_link(OBJECT(dev), "xive-fabric", OBJECT(spapr),
  &error_abort);
 sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
@@ -342,7 +352,7 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
 spapr->xive = SPAPR_XIVE(dev);
 
 /* Enable the CPU IPIs */
-for (i = 0; i < max_vcpu_ids; ++i) {
+for (i = 0; i < max_cpus; ++i) {
 SpaprInterruptControllerClass *sicc
 = SPAPR_INTC_GET_CLASS(spapr->xive);
 
-- 
2.26.2




[PATCH for-6.0 v2 0/3] spapr: Address the confusion between IPI numbers and vCPU ids

2020-11-30 Thread Greg Kurz
A regression was recently fixed in the sPAPR XIVE code for QEMU 5.2
RC3 [1]. It boiled down to a confusion between IPI numbers and vCPU
ids, which happen to be numerically equal in general, but are really
different entities that can diverge in some setups. This was causing
QEMU to misconfigure XIVE and to crash the guest.

The confusion comes from XICS actually. Interrupt presenters in XICS
are identified by a "server number" which is a 1:1 mapping to vCPU
ids. The range of these "server numbers" is exposed to the guest in
the "ibm,interrupt-server-ranges" property. A xics_max_server_number()
helper was introduced at some point to compute the upper limit of the
range. When XIVE was added, commit 1a518e7693c9 renamed the helper to
spapr_max_server_number(). It ended up being used to size a bunch of
things in XIVE that are per-vCPU, such as internal END tables or
IPI ranges presented to the guest. The problem is that the maximum
"server number" can be much higher (up to 8 times) than the actual
number of vCPUs when the VSMT mode doesn't match the number of threads
per core in the guest:

DIV_ROUND_UP(ms->smp.max_cpus * spapr->vsmt, ms->smp.threads);

Since QEMU 4.2, the default behavior is to set spapr->vsmt to
ms->smp.threads. Setups with custom VSMT settings will configure XIVE
to use more HW resources than needed. This is a bit unfortunate but
not extremely harmful, unless maybe if a lot of guests are running
on the host. The sizing of the IPI range is more problematic though
as it eventually led to [1].
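
As a standalone illustration of that formula, with example numbers only:

#include <stdio.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

int main(void)
{
    unsigned max_cpus = 4;   /* ms->smp.max_cpus, example value */
    unsigned threads  = 1;   /* ms->smp.threads, example value  */
    unsigned vsmt     = 8;   /* custom VSMT, not the default (threads) */

    unsigned max_vcpu_ids = DIV_ROUND_UP(max_cpus * vsmt, threads);

    /* prints 32: XIVE gets sized for 8 times more ENDs/IPIs than the
     * 4 vCPUs actually need */
    printf("max_cpus=%u -> max vCPU ids=%u\n", max_cpus, max_vcpu_ids);
    return 0;
}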

This series first does some renaming to make it clear when we're
dealing with vCPU ids. It then fixes the machine code to pass
smp.max_cpus to XIVE where appropriate. Since these changes are
guest/migration visible, a machine property is added to keep the
existing behavior for older machine types. The series is thus based
on Connie's recent patch that introduces compat machines for
QEMU 6.0.

Based-on: 20201109173928.1001764-1-coh...@redhat.com

Note that we still use spapr_max_vcpu_ids() when activating the
in-kernel irqchip because this is what both XICS-on-XIVE and XIVE
KVM devices expect.

[1] https://bugs.launchpad.net/qemu/+bug/1900241

v2: - comments on v1 highlighted that problems mostly come from
  spapr_max_server_number() which got misused over the years.
  Updated the cover letter accordingly.
- completely new approach. Instead of messing with device properties,
  pass the appropriate values to the IC backend handlers.
- rename a few things using the "max_vcpu_ids" wording instead of
  "nr_servers" and "max_server_number"

Greg Kurz (3):
  spapr: Improve naming of some vCPU id related items
  spapr/xive: Fix size of END table and number of claimed IPIs
  spapr/xive: Fix the "ibm,xive-lisn-ranges" property

 include/hw/ppc/spapr.h  |  3 ++-
 include/hw/ppc/spapr_irq.h  | 12 ++--
 include/hw/ppc/spapr_xive.h |  2 +-
 include/hw/ppc/xics_spapr.h |  2 +-
 hw/intc/spapr_xive.c|  9 +
 hw/intc/spapr_xive_kvm.c|  4 ++--
 hw/intc/xics_kvm.c  |  4 ++--
 hw/intc/xics_spapr.c| 11 ++-
 hw/ppc/spapr.c  | 12 
 hw/ppc/spapr_irq.c  | 34 --
 10 files changed, 57 insertions(+), 36 deletions(-)

-- 
2.26.2





Re: [PATCH 00/18] qapi/qom: QAPIfy object-add

2020-11-30 Thread Daniel P . Berrangé
On Mon, Nov 30, 2020 at 05:13:57PM +0100, Kevin Wolf wrote:
> Am 30.11.2020 um 16:30 hat Daniel P. Berrangé geschrieben:
> > On Mon, Nov 30, 2020 at 03:58:23PM +0100, Paolo Bonzini wrote:
> > > On 30/11/20 13:25, Kevin Wolf wrote:
> > > > This series adds a QAPI type for the properties of all user creatable
> > > > QOM types and finally makes QMP object-add use the new ObjectOptions
> > > > union so that QAPI introspection can be used for user creatable objects.
> > > > 
> > > > After this series, there is at least one obvious next step that needs to be
> > > > done: Change HMP and all of the command line parser to use
> > > > ObjectOptions, too, so that the QAPI schema is consistently enforced in
> > > > all external interfaces. I am planning to send another series to address
> > > > this.
> > > > 
> > > > In a third step, we can try to start deduplicating and integrating 
> > > > things
> > > > better between QAPI and the QOM implementation, e.g. by generating parts
> > > > of the QOM boilerplate from the QAPI schema.
> > > 
> > > With this series it's basically pointless to have QOM properties at all.
> > > Instead, you are basically having half of QEMU's backend data model into a
> > > single struct.
> > > 
> > > So the question is, are we okay with shoveling half of QEMU's backend data
> > > model into a single struct?  If so, there are important consequences.
> > 
> > In theory they should have the same set of options, but nothing in
> > this series will enforce that. So we're introducing the danger that
> > QMP object-add will miss some property, and thus be less functional
> > than the CLI -object.  If we convert CLI -object  to use the QAPI
> > parser too, we eliminate that danger, but we still have the struct
> > duplication.
> 
> I think converting the CLI is doable in the short term. I already have
> the patch for qemu-storage-daemon, but decided to keep it for a separate
> series.
> 
> The most difficult part is probably -readconfig, but with Paolo's RFC
> patches to move it away from QemuOpts, even that shouldn't be very hard.
> 
> > > 1) QOM basically does not need properties anymore except for devices and
> > > machines (accelerators could be converted to QAPI as well). All
> > > user-creatable objects can be changed to something like chardev's "get a
> > > struct and use it to fill in the fields", and only leave properties to 
> > > devices
> > > and machines.
> > > 
> > > 2) User-creatable objects can have a much more flexible schema.  This 
> > > means
> > > there's no reason to have block device creation as its own command and
> > > struct for example.
> > > 
> > > The problem with this series is that you are fine with deduplicating 
> > > things
> > > as a third step, but you cannot be sure that such deduplication is 
> > > possible
> > > at all.  So while I don't have any problems in principle with the
> > > ObjectOptions concept, I don't think it should be committed without a 
> > > clear
> > > idea of how to do the third step.
> > 
> > I feel like we should at least aim to kill the struct duplication, even if
> > we ignore the bigger QOM stuff like setters/getters/constructors/etc. The
> > generated structs are not far off being usable.
> > 
> > eg for the secret object we have the QAPI schema
> > 
> > { 'struct': 'SecretCommonProperties',
> >   'data': { '*loaded': { 'type': 'bool', 'features': ['deprecated'] },
> > '*format': 'QCryptoSecretFormat',
> > '*keyid': 'str',
> > '*iv': 'str' } }
> > 
> > { 'struct': 'SecretProperties',
> >   'base': 'SecretCommonProperties',
> >   'data': { '*data': 'str',
> > '*file': 'str' } }
> > 
> > IIUC this will result in a QAPI generated flattened struct:
> > 
> >   struct SecretProperties {
> > bool loaded;
> > QCryptoSecretFormat format;
> > char *keyid;
> > char *iv;
> > char *data;
> > char *file;
> >   };
> > 
> > vs the QOM manually written structs
> > 
> >   struct QCryptoSecretCommon {
> > Object parent_obj;
> > uint8_t *rawdata;
> > size_t rawlen;
> > QCryptoSecretFormat format;
> > char *keyid;
> > char *iv;
> >   };
> > 
> >   struct QCryptoSecret {
> > QCryptoSecretCommon parent_obj;
> > char *data;
> > char *file;
> >   };
> > 
> > The key differences
> > 
> >  - The parent struct is embedded, rather than flattened
> >  - The "loaded" property doesn't need to exist
> >  - Some extra fields are live state (rawdata, rawlen)
> > 
> > Lets pretend we just kill "loaded" entirely, so ignore that.
> > 
> > We could simply make QOM "Object" a well known built-in type, so
> > we can reference it as a "parent". Then any struct with "Object"
> > as a parent could use struct embedding rather flattening and thus
> > just work.
> > 
> > Can we invent a "state" field for fields that are internal
> > only, separate from the public "data" fields.
> > 
> > eg the secret QAPI def would only need a couple of changes:
> > 
> > { 'struct': 'QCrypto

[PATCH for-6.0 v2 3/3] spapr/xive: Fix the "ibm, xive-lisn-ranges" property

2020-11-30 Thread Greg Kurz
The dt() callback of the sPAPR IC class has a "nr_servers"
argument which is used by both XIVE and XICS to setup the
"interrupt-controller" node in the DT.

The machine currently passes spapr_max_server_number() to
spapr_irq_dt(). This is perfectly fine to populate the range
of vCPU ids in the "ibm,interrupt-server-ranges" property
for XICS. However, this doesn't make sense for XIVE's
"ibm,xive-lisn-ranges" property which really expects the
maximum number of vCPUs instead.

Add a new "max_cpus" argument to spapr_irq_dt() and the
dt() handler to convey the maximum number of vCPUs. Have
the latest machine type pass smp.max_cpus and sPAPR XIVE
use that for "ibm,xive-lisn-ranges". Older machine types
go on with the previous behavior since this is guest visible.

Signed-off-by: Greg Kurz 
---
 include/hw/ppc/spapr_irq.h | 4 ++--
 hw/intc/spapr_xive.c   | 3 ++-
 hw/intc/xics_spapr.c   | 3 ++-
 hw/ppc/spapr.c | 3 ++-
 hw/ppc/spapr_irq.c | 8 ++--
 5 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index 2e53fc9e6cbb..1edf4851afa4 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -63,7 +63,7 @@ struct SpaprInterruptControllerClass {
 void (*set_irq)(SpaprInterruptController *intc, int irq, int val);
 void (*print_info)(SpaprInterruptController *intc, Monitor *mon);
 void (*dt)(SpaprInterruptController *intc, uint32_t max_vcpu_ids,
-   void *fdt, uint32_t phandle);
+   uint32_t max_cpus, void *fdt, uint32_t phandle);
 int (*post_load)(SpaprInterruptController *intc, int version_id);
 };
 
@@ -75,7 +75,7 @@ void spapr_irq_cpu_intc_reset(struct SpaprMachineState 
*spapr, PowerPCCPU *cpu);
 void spapr_irq_cpu_intc_destroy(struct SpaprMachineState *spapr, PowerPCCPU 
*cpu);
 void spapr_irq_print_info(struct SpaprMachineState *spapr, Monitor *mon);
 void spapr_irq_dt(struct SpaprMachineState *spapr, uint32_t max_vcpu_ids,
-  void *fdt, uint32_t phandle);
+  uint32_t max_cpus, void *fdt, uint32_t phandle);
 
 uint32_t spapr_irq_nr_msis(struct SpaprMachineState *spapr);
 int spapr_irq_msi_alloc(struct SpaprMachineState *spapr, uint32_t num, bool 
align,
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index d0a0ca822367..f9a563cd0a9b 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -670,6 +670,7 @@ static void spapr_xive_print_info(SpaprInterruptController 
*intc, Monitor *mon)
 }
 
 static void spapr_xive_dt(SpaprInterruptController *intc, uint32_t 
max_vcpu_ids,
+  uint32_t max_cpus,
   void *fdt, uint32_t phandle)
 {
 SpaprXive *xive = SPAPR_XIVE(intc);
@@ -678,7 +679,7 @@ static void spapr_xive_dt(SpaprInterruptController *intc, 
uint32_t max_vcpu_ids,
 /* Interrupt number ranges for the IPIs */
 uint32_t lisn_ranges[] = {
 cpu_to_be32(SPAPR_IRQ_IPI),
-cpu_to_be32(SPAPR_IRQ_IPI + max_vcpu_ids),
+cpu_to_be32(SPAPR_IRQ_IPI + max_cpus),
 };
 /*
  * EQ size - the sizes of pages supported by the system 4K, 64K,
diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
index 8f753a858cc2..d9f887dfd303 100644
--- a/hw/intc/xics_spapr.c
+++ b/hw/intc/xics_spapr.c
@@ -309,7 +309,8 @@ static void ics_spapr_realize(DeviceState *dev, Error 
**errp)
 }
 
 static void xics_spapr_dt(SpaprInterruptController *intc, uint32_t 
max_vcpu_ids,
-  void *fdt, uint32_t phandle)
+  uint32_t max_cpus, void *fdt,
+  uint32_t phandle)
 {
 uint32_t interrupt_server_ranges_prop[] = {
 0, cpu_to_be32(max_vcpu_ids),
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 227a926dfd48..be3b4b9119b7 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1164,7 +1164,8 @@ void *spapr_build_fdt(SpaprMachineState *spapr, bool 
reset, size_t space)
 _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
 
 /* /interrupt controller */
-spapr_irq_dt(spapr, spapr_max_vcpu_ids(spapr), fdt, PHANDLE_INTC);
+spapr_irq_dt(spapr, spapr_max_vcpu_ids(spapr), machine->smp.max_cpus,
+ fdt, PHANDLE_INTC);
 
 ret = spapr_dt_memory(spapr, fdt);
 if (ret < 0) {
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 4d9ecd5d0af8..e1fd777aff62 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -272,12 +272,16 @@ void spapr_irq_print_info(SpaprMachineState *spapr, 
Monitor *mon)
 }
 
 void spapr_irq_dt(SpaprMachineState *spapr, uint32_t max_vcpu_ids,
-  void *fdt, uint32_t phandle)
+  uint32_t max_cpus, void *fdt, uint32_t phandle)
 {
 SpaprInterruptControllerClass *sicc
 = SPAPR_INTC_GET_CLASS(spapr->active_intc);
 
-sicc->dt(spapr->active_intc, max_vcpu_ids, fdt, phandle);
+/* For older machine types in case they have an unusual VSMT setting */
+if (SPAPR_MACHINE_GET_CLASS(spapr

Re: [RFC PATCH-for-5.2] gitlab-ci: Do not automatically run Avocado integration tests anymore

2020-11-30 Thread Ademar Reis
On Mon, Nov 30, 2020 at 10:31:09AM +, Daniel P. Berrangé wrote:
> On Fri, Nov 27, 2020 at 07:29:31PM +0100, Thomas Huth wrote:
> > On 27/11/2020 18.57, Philippe Mathieu-Daudé wrote:
> > > On 11/27/20 6:47 PM, Thomas Huth wrote:
> > >> On 27/11/2020 18.41, Philippe Mathieu-Daudé wrote:
> > >>> We lately realized that the Avocado framework was not designed
> > >>> to be regularly run on CI environments. Therefore, as of 5.2
> > >>> we deprecate the gitlab-ci jobs using Avocado. To not disrupt
> > >>> current users, it is possible to keep the current behavior by
> > >>> setting the QEMU_CI_INTEGRATION_JOBS_PRE_5_2_RELEASE variable
> > >>> (see [*]).
> > >>> From now on, using these jobs (or adding new tests to them)
> > >>> is strongly discouraged.
> > >>>
> > >>> Tests based on Avocado will be ported to new job schemes during
> > >>> the next releases, with better documentation and templates.
> > >>
> > >> Why should we disable the jobs by default as long as there is no 
> > >> replacement
> > >> available yet?
> > > 
> > > Why keep it enabled if it is failing randomly
> > 
> > We can still disable single jobs if they are not stable, but that's no
> > reason to disable all of them by default, is it?
> 
> Agreed, the jobs which are known to be broken or unreliable should
> be unconditionally disabled in QEMU as a whole. This isn't specific
> to gitlab config - the qemu build makefiles/mesonfiles should disable
> the problem tests entirely, as we don't want developers wasting time
> running them locally either if they're known to be broken/unreliable.
> 

The problem is identifying when a test is broken/unreliable. No
complex test is 100% reliable: change the environment
(configuration, build options, platform, etc.) and any test complex
enough will start to fail. I've worked in projects orders of
magnitude simpler than QEMU and that was a golden rule. Testing QEMU
is *HARD*.

Which is why I advocate keeping test automation separate from CI:

 * Have a stable CI with tests cherry-picked by whoever is
   maintaining a particular CI runner (we shouldn't have orphan
   runners).

 * Have as many tests as possible in the git repo: maintained,
   reviewed and run (outside of a CI) by people who care about them.

I absolutely agree with you that maintainers and developers should
care and our goal should be a gating CI. The question is how to
create a strategy and a plan to get there. Forcing people to care
rarely works.

Thanks.
   - Ademar

-- 
Ademar Reis Jr
Red Hat





Re: [PATCH v2 4/6] migration: Add zstd support in multi-thread compression

2020-11-30 Thread Eric Blake
On 11/27/20 3:32 AM, Zeyu Jin wrote:
> This patch enables zstd option in multi-thread compression.
> 
> Signed-off-by: Zeyu Jin 
> Signed-off-by: Ying Fang 
> ---

> +++ b/qapi/migration.json
> @@ -536,7 +536,7 @@
>  #
>  ##
>  { 'enum': 'CompressMethod',
> -  'data': [ 'zlib' ] }
> +  'data': [ 'zlib', { 'name': 'zstd', 'if': 'defined(CONFIG_ZSTD)' } ] }

Missing documentation of the new value, including a '(since 6.0)' marker.
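
For instance, a documented entry along these lines would do (the wording here
is only a sketch, not a proposed final text):

    # @zlib: use zlib compression
    #
    # @zstd: use zstd compression (since 6.0)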

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 0/6] migration: Multi-thread compression method support

2020-11-30 Thread Eric Blake
On 11/27/20 3:36 AM, Zeyu Jin wrote:

Meta-comment: you appear to be having problems threading your series;
I've now seen three separate cover letters (RFC v1, v2 with no subject,
v2 with subject) and two series where each patch was a separate thread.
It is difficult to follow which messages are related when reading in a
mail client that sorts by most-recently-active thread first.  You may
want to investigate why your threading is not working, although I'd wait
to send v3 until you have actual changes to incorporate.

> Currently we have both multi-thread compression and multifd to optimize
> live migration in Qemu. Mulit-thread compression deals with the situation
> where network bandwith is limited but cpu resource adequate. Multifd instead

Not that typos in the cover letter matter, but this should be 'bandwidth'

> aims to take full advantage of network bandwith. Moreover it supports both
> zlib and zstd compression on each channel.
> 
> In this patch series, we did some code refactoring on multi-thread compression
> live migration and bring zstd compression method support for it.
> 
> Below is the test result of multi-thread compression live migration
> with different compress methods. Test result shows that zstd outperforms
> zlib by about 70%.
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v4 6/6] introduce simple linear scan rate limiting mechanism

2020-11-30 Thread Peter Xu
On Mon, Nov 30, 2020 at 11:11:00AM +0300, Andrey Gruzdev wrote:
> On 28.11.2020 01:28, Peter Xu wrote:
> > On Thu, Nov 26, 2020 at 06:17:34PM +0300, Andrey Gruzdev wrote:
> > > Since reading UFFD events and saving paged data are performed
> > > from the same thread, write fault latencies are sensitive to
> > > migration stream stalls. Limiting total page saving rate is a
> > > method to reduce the amount of noticeable fault resolution latencies.
> > > 
> > > Migration bandwidth limiting is achieved via noticing cases of
> > > out-of-threshold write fault latencies and temporarily disabling
> > > (strictly speaking, severely throttling) saving non-faulting pages.
> > 
> > So have you done any measurements out of it, as we've talked in previous
> > version?  Thanks,
> > 
> 
> Sorry, not done yet.

So do you still plan to? :)

And if not, could you describe the rationale behind this patch?  For example,
what's the problem behind (e.g., guest hangs for xxx seconds, maybe?) and
what's the outcome (guest doesn't hang any more)?

Thanks,

-- 
Peter Xu




Re: [PATCH v4 3/6] support UFFD write fault processing in ram_save_iterate()

2020-11-30 Thread Peter Xu
On Mon, Nov 30, 2020 at 12:14:03AM +0300, Andrey Gruzdev wrote:
> > > +#ifdef CONFIG_LINUX
> > > +/**
> > > + * ram_find_block_by_host_address: find RAM block containing host page
> > > + *
> > > + * Returns pointer to RAMBlock if found, NULL otherwise
> > > + *
> > > + * @rs: current RAM state
> > > + * @page_address: host page address
> > > + */
> > > +static RAMBlock *ram_find_block_by_host_address(RAMState *rs, hwaddr 
> > > page_address)
> > 
> > Reuse qemu_ram_block_from_host() somehow?
> > 
> 
> Seems not very suitable here, since we use rs->last_seen_block to restart
> search..

The search logic is actually still the same, it's just about which block to
start searching (rs->last_seen_block, ram_list.mru_block, or the 1st one).  So
an internal helper to pass in that information would be nice.  Though that'll
require rcu lock taken before hand to keep the ramblock ptr valid.

No worry - we can keep this and work on top too, I think.
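
(To illustrate what such a helper could look like - a sketch only, with field
and macro names written from memory that would need checking against the tree,
and assuming the caller holds the RCU read lock:)

    /*
     * Sketch: find the RAMBlock containing host address @addr, starting the
     * walk at @start (rs->last_seen_block, ram_list.mru_block, or NULL for
     * the list head).  A real version would also wrap around past @start.
     */
    static RAMBlock *ram_block_from_host_addr(RAMBlock *start, uint64_t addr)
    {
        RAMBlock *block = start ? start : QLIST_FIRST_RCU(&ram_list.blocks);

        for (; block; block = QLIST_NEXT_RCU(block, next)) {
            if (addr - (uint64_t)(uintptr_t)block->host < block->used_length) {
                return block;
            }
        }
        return NULL;
    }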

[...]

> > > +static RAMBlock *poll_fault_page(RAMState *rs, ram_addr_t *offset)
> > > +{
> > > +struct uffd_msg uffd_msg;
> > > +hwaddr page_address;
> > > +RAMBlock *bs;
> > > +int res;
> > > +
> > > +if (!migrate_background_snapshot()) {
> > > +return NULL;
> > > +}
> > > +
> > > +res = uffd_read_events(rs->uffdio_fd, &uffd_msg, 1);
> > > +if (res <= 0) {
> > > +return NULL;
> > > +}
> > > +
> > > +page_address = uffd_msg.arg.pagefault.address;
> > > +bs = ram_find_block_by_host_address(rs, page_address);
> > > +if (!bs) {
> > > +/* In case we couldn't find respective block, just unprotect 
> > > faulting page. */
> > > +uffd_protect_memory(rs->uffdio_fd, page_address, 
> > > TARGET_PAGE_SIZE, false);
> > > +error_report("ram_find_block_by_host_address() failed: 
> > > address=0x%0lx",
> > > +page_address);
> > 
> > Looks ok to error_report() instead of assert(), but I'll suggest drop the 
> > call
> > to uffd_protect_memory() at least.  The only reason to not use assert() is
> > because we try our best to avoid crashing the vm, however I really doubt
> > whether uffd_protect_memory() is the right thing to do even if it happens - 
> > we
> > may at last try to unprotect some strange pages that we don't even know 
> > where
> > it is...
> > 
> 
> IMHO better to unprotect these strange pages than to leave them protected by
> UFFD, to avoid getting the VM completely inoperable.
> At least we know the page generated wr-fault, maybe due to incorrect
> write-tracking initialization, or RAMBlock somehow has gone. Nevertheless if
> leave the page as is, VM would certainly lock.

Yes, makes some sense too.  However it's definitely a severe programming error,
even if the VM is unblocked, it can be in a very weird state...

Maybe we just assert? Then we can see how unlucky we'll be. :)
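
(Concretely, the assert variant would be nothing more than something like the
below, instead of the error_report() + unprotect branch - sketch only:)

    bs = ram_find_block_by_host_address(rs, page_address);
    /* A wr-fault outside any tracked RAMBlock would be a programming error. */
    assert(bs);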

> 
> Hmm, I wonder about assert(). In QEMU it would do something in release
> builds?

I guess yes, at least for now.  Because osdep.h has this:

/*
 * We have a lot of unaudited code that may fail in strange ways, or
 * even be a security risk during migration, if you disable assertions
 * at compile-time.  You may comment out these safety checks if you
 * absolutely want to disable assertion overhead, but it is not
 * supported upstream so the risk is all yours.  Meanwhile, please
 * submit patches to remove any side-effects inside an assertion, or
 * fixing error handling that should use Error instead of assert.
 */
#ifdef NDEBUG
#error building with NDEBUG is not supported
#endif
#ifdef G_DISABLE_ASSERT
#error building with G_DISABLE_ASSERT is not supported
#endif

[...]

> > > +/**
> > > + * ram_save_host_page_post: ram_save_host_page() post-notifier
> > > + *
> > > + * @rs: current RAM state
> > > + * @pss: page-search-status structure
> > > + * @opaque: opaque context value
> > > + * @res_override: pointer to the return value of ram_save_host_page(),
> > > + *   overwritten in case of an error
> > > + */
> > > +static void ram_save_host_page_post(RAMState *rs, PageSearchStatus *pss,
> > > +void *opaque, int *res_override)
> > > +{
> > > +/* Check if page is from UFFD-managed region. */
> > > +if (pss->block->flags & RAM_UF_WRITEPROTECT) {
> > > +#ifdef CONFIG_LINUX
> > > +ram_addr_t page_from = (ram_addr_t) opaque;
> > > +hwaddr page_address = (hwaddr) pss->block->host +
> > > +  ((hwaddr) page_from << TARGET_PAGE_BITS);
> > 
> > I feel like most new uses of hwaddr is not correct...  As I also commented 
> > in
> > the other patch.  We should save a lot of castings if switched.
> > 
> 
> Need to check. hwaddr is typedef'ed as uint64_t while ram_addr_t as
> uintptr_t. I any case UFFD interface relies on u64 type.

For example, page_from should be a host address; we can use unsigned long or
uint64_t, but ram_addr_t is not appropriate, since it is only used in the
ramblock address space.
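
(A tiny sketch of what I mean - purely illustrative, with made-up names:)

    /*
     * The raw fault address from UFFD stays a plain 64-bit host value;
     * ram_addr_t only appears for the offset inside the RAMBlock.
     */
    static ram_addr_t fault_addr_to_block_offset(const RAMBlock *block,
                                                 uint64_t fault_addr)
    {
        return (ram_addr_t)(fault_addr - (uintptr_t)block->host);
    }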

I think it's fine that e.g. we use common types like u

Re: [PATCH 00/18] qapi/qom: QAPIfy object-add

2020-11-30 Thread Paolo Bonzini

On 30/11/20 16:30, Daniel P. Berrangé wrote:

{ 'struct': 'QCryptoSecretCommon',
   'base': 'Object',
   'state': { 'rawdata': '*uint8_t',
  'rawlen': 'size_t' },
   'data': { '*format': 'QCryptoSecretFormat',
 '*keyid': 'str',
 '*iv': 'str' } }

{ 'struct': 'QCryptoSecret',
   'base': 'QCryptoSecretCommon',
   'data': { '*data': 'str',
 '*file': 'str' } }


No, please don't do this.  I want to keep writing C code, not a weird 
JSON syntax for C.


I'd much rather have a FooOptions field in my struct so that I can just 
do visit_FooOptions in the UserCreatable.complete() method, that's feasible.
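
For concreteness, roughly this shape (a sketch only: the "Foo" names are 
placeholders, visit_type_FooOptions_members() stands for the QAPI-generated 
helper, and today's UserCreatable.complete() does not actually take a Visitor, 
which is exactly the open design question):

    struct FooBackend {
        Object parent_obj;
        FooOptions opts;    /* QAPI-generated options struct, embedded as-is */
        /* ... live state that is not user-visible configuration ... */
    };

    static void foo_backend_complete(UserCreatable *uc, Visitor *v,
                                     Error **errp)
    {
        FooBackend *foo = FOO_BACKEND(uc);

        /* Fill the embedded QAPI struct straight from the input visitor. */
        visit_type_FooOptions_members(v, &foo->opts, errp);
    }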


Paolo




Re: [PATCH 00/18] qapi/qom: QAPIfy object-add

2020-11-30 Thread Kevin Wolf
Am 30.11.2020 um 16:30 hat Daniel P. Berrangé geschrieben:
> On Mon, Nov 30, 2020 at 03:58:23PM +0100, Paolo Bonzini wrote:
> > On 30/11/20 13:25, Kevin Wolf wrote:
> > > This series adds a QAPI type for the properties of all user creatable
> > > QOM types and finally makes QMP object-add use the new ObjectOptions
> > > union so that QAPI introspection can be used for user creatable objects.
> > > 
> > > After this series, there is at least one obvious next step that needs to be
> > > done: Change HMP and all of the command line parser to use
> > > ObjectOptions, too, so that the QAPI schema is consistently enforced in
> > > all external interfaces. I am planning to send another series to address
> > > this.
> > > 
> > > In a third step, we can try to start deduplicating and integrating things
> > > better between QAPI and the QOM implementation, e.g. by generating parts
> > > of the QOM boilerplate from the QAPI schema.
> > 
> > With this series it's basically pointless to have QOM properties at all.
> > Instead, you are basically having half of QEMU's backend data model into a
> > single struct.
> > 
> > So the question is, are we okay with shoveling half of QEMU's backend data
> > model into a single struct?  If so, there are important consequences.
> 
> In theory they should have the same set of options, but nothing in
> this series will enforce that. So we're introducing the danger that
> QMP object-add will miss some property, and thus be less functional
> than the CLI -object.  If we convert CLI -object  to use the QAPI
> parser too, we eliminate that danger, but we still have the struct
> duplication.

I think converting the CLI is doable in the short term. I already have
the patch for qemu-storage-daemon, but decided to keep it for a separate
series.

The most difficult part is probably -readconfig, but with Paolo's RFC
patches to move it away from QemuOpts, even that shouldn't be very hard.

> > 1) QOM basically does not need properties anymore except for devices and
> > machines (accelerators could be converted to QAPI as well). All
> > user-creatable objects can be changed to something like chardev's "get a
> > struct and use it to fill in the fields", and only leave properties to devices
> > and machines.
> > 
> > 2) User-creatable objects can have a much more flexible schema.  This means
> > there's no reason to have block device creation as its own command and
> > struct for example.
> > 
> > The problem with this series is that you are fine with deduplicating things
> > as a third step, but you cannot be sure that such deduplication is possible
> > at all.  So while I don't have any problems in principle with the
> > ObjectOptions concept, I don't think it should be committed without a clear
> > idea of how to do the third step.
> 
> I feel like we should at least aim to kill the struct duplication, even if
> we ignore the bigger QOM stuff like setters/getters/constructors/etc. The
> generated structs are not far off being usable.
> 
> eg for the secret object we have the QAPI schema
> 
> { 'struct': 'SecretCommonProperties',
>   'data': { '*loaded': { 'type': 'bool', 'features': ['deprecated'] },
> '*format': 'QCryptoSecretFormat',
> '*keyid': 'str',
> '*iv': 'str' } }
> 
> { 'struct': 'SecretProperties',
>   'base': 'SecretCommonProperties',
>   'data': { '*data': 'str',
> '*file': 'str' } }
> 
> IIUC this will result in a QAPI generated flattened struct:
> 
>   struct SecretProperties {
> bool loaded;
> QCryptoSecretFormat format;
> char *keyid;
> char *iv;
> char *data;
> char *file;
>   };
> 
> vs the QOM manually written structs
> 
>   struct QCryptoSecretCommon {
> Object parent_obj;
> uint8_t *rawdata;
> size_t rawlen;
> QCryptoSecretFormat format;
> char *keyid;
> char *iv;
>   };
> 
>   struct QCryptoSecret {
> QCryptoSecretCommon parent_obj;
> char *data;
> char *file;
>   };
> 
> The key differences
> 
>  - The parent struct is embedded, rather than flattened
>  - The "loaded" property doesn't need to exist
>  - Some extra fields are live state (rawdata, rawlen)
> 
> Lets pretend we just kill "loaded" entirely, so ignore that.
> 
> We could simply make QOM "Object" a well known built-in type, so
> we can reference it as a "parent". Then any struct with "Object"
> as a parent could use struct embedding rather flattening and thus
> just work.
> 
> Can we invent a "state" field for fields that are internal
> only, separate from the public "data" fields.
> 
> eg the secret QAPI def would only need a couple of changes:
> 
> { 'struct': 'QCryptoSecretCommon',
>   'base': 'Object',
>   'state': { 'rawdata': '*uint8_t',
>  'rawlen': 'size_t' },
>   'data': { '*format': 'QCryptoSecretFormat',
> '*keyid': 'str',
> '*iv': 'str' } }
> 
> { 'struct': 'QCryptoSecret',
>   'base': 'QCryptoSecretCommon',
>   'data': { '*data': 'str',
> '*file'

[PATCH v7] introduce vfio-user protocol specification

2020-11-30 Thread Thanos Makatos
This patch introduces the vfio-user protocol specification (formerly
known as VFIO-over-socket), which is designed to allow devices to be
emulated outside QEMU, in a separate process. vfio-user reuses the
existing VFIO defines, structs and concepts.

It has been earlier discussed as an RFC in:
"RFC: use VFIO over a UNIX domain socket to implement device offloading"

Signed-off-by: John G Johnson 
Signed-off-by: Thanos Makatos 

---

Changed since v1:
  * fix coding style issues
  * update MAINTAINERS for VFIO-over-socket
  * add vfio-over-socket to ToC

Changed since v2:
  * fix whitespace

Changed since v3:
  * rename protocol to vfio-user
  * add table of contents
  * fix Unicode problems
  * fix typos and various reStructuredText issues
  * various stylistic improvements
  * add backend program conventions
  * rewrite part of intro, drop QEMU-specific stuff
  * drop QEMU-specific paragraph about implementation
  * explain that passing of FDs isn't necessary
  * minor improvements in the VFIO section
  * various text substitutions for the sake of consistency
  * drop paragraph about client and server, already explained in intro
  * drop device ID
  * drop type from version
  * elaborate on request concurrency
  * convert some inessential paragraphs into notes
  * explain why some existing VFIO defines cannot be reused
  * explain how to make changes to the protocol
  * improve text of DMA map
  * reword comment about existing VFIO commands
  * add reference to Version section
  * reset device on disconnection
  * reword live migration section
  * replace sys/vfio.h with linux/vfio.h
  * drop reference to iovec
  * use argz the same way it is used in VFIO
  * add type field in header for clarity

Changed since v4:
  * introduce support for live migration as defined in include/uapi/linux/vfio.h
  * introduce 'max_fds' and 'migration' capabilities:
  * remove 'index' from VFIO_USER_DEVICE_GET_IRQ_INFO
  * fix minor typos and reworded some text for clarity

Changed since v5:
  * fix minor typos
  * separate VFIO_USER_DMA_MAP and VFIO_USER_DMA_UNMAP
  * clarify meaning of VFIO bitmap size field
  * move version major/minor outside JSON
  * client proposes version first
  * make Errno optional in message header
  * clarification about message ID uniqueness
  * clarify that server->client request can appear in between
client->server request/reply

Changed since v6:
  * put JSON strings in double quotes
  * clarify reply behavior on error
  * introduce max message size capability
  * clarify semantics when failing to map multiple DMA regions in a
single command

You can focus on v6 to v7 changes by cloning my fork
(https://github.com/tmakatos/qemu) and doing:

git diff refs/tags/vfio-user/v6 refs/heads/vfio-user/v7
---
 MAINTAINERS  |6 +
 docs/devel/index.rst |1 +
 docs/devel/vfio-user.rst | 1662 ++
 3 files changed, 1669 insertions(+)
 create mode 100644 docs/devel/vfio-user.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index 68bc160f41..6a4c662976 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1766,6 +1766,12 @@ F: hw/vfio/ap.c
 F: docs/system/s390x/vfio-ap.rst
 L: qemu-s3...@nongnu.org
 
+vfio-user
+M: John G Johnson 
+M: Thanos Makatos 
+S: Supported
+F: docs/devel/vfio-user.rst
+
 vhost
 M: Michael S. Tsirkin 
 S: Supported
diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index f10ed77e4c..2e2cba28c6 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -35,3 +35,4 @@ Contents:
clocks
qom
block-coroutine-wrapper
+   vfio-user
diff --git a/docs/devel/vfio-user.rst b/docs/devel/vfio-user.rst
new file mode 100644
index 00..d15a228a1a
--- /dev/null
+++ b/docs/devel/vfio-user.rst
@@ -0,0 +1,1662 @@
+.. include:: 
+
+
+vfio-user Protocol Specification
+
+
+
+Version_ 0.1
+
+
+.. contents:: Table of Contents
+
+Introduction
+
+vfio-user is a protocol that allows a device to be emulated in a separate
+process outside of a Virtual Machine Monitor (VMM). vfio-user devices consist
+of a generic VFIO device type, living inside the VMM, which we call the client,
+and the core device implementation, living outside the VMM, which we call the
+server.
+
+The `Linux VFIO ioctl interface 
`_
+has been chosen as the base for this protocol for the following reasons:
+
+1) It is a mature and stable API, backed by an extensively used framework.
+2) The existing VFIO client implementation in QEMU (qemu/hw/vfio/) can be
+   largely reused.
+
+.. Note::
+   In a proof of concept implementation it has been demonstrated that using 
VFIO
+   over a UNIX domain socket is a viable option. vfio-user is designed with
+   QEMU in mind, however it could be used by other client applications. The
+   vfio-user protocol does not require that QEMU's VFIO client  im

Re: [PATCH for-6.0 6/6] qapi: Deprecate 'query-kvm'

2020-11-30 Thread Markus Armbruster
Peter Krempa  writes:

> On Mon, Nov 30, 2020 at 10:21:08 +0100, Markus Armbruster wrote:
>> Peter Krempa  writes:
>> 
>> > On Fri, Nov 27, 2020 at 16:44:05 +0100, Markus Armbruster wrote:
>> >> Peter Krempa  writes:
>
> [...]
>
> I know it's hard to enforce, but it's probably the cheapest in terms of
> the drawbacks any other solution would have.
>> 
>> We can and should try.  
>> 
>> Can we flag problematic interface changes automatically?  Semantic
>> changes, no.  What about changes visible in query-qmp-schema?
>
> I don't know the internals of qemu well enough, but from the perspective
> of a user of 'query-qmp-schema' it might be possible, but not without
> additional tooling.
>
> The output of query-qmp-schema is basically unreviewable when we are
> updating it. A small change in the schema may trigger a re-numbering of
> the internal type names so the result is a giant messy piece of JSON
> where it's impossible to differentiate changes from churn.

A structural comparison could fare better.

qapi-gen option -u might help:

  -u, --unmask-non-abi-names
expose non-ABI names in introspection

> I've played with generating/expanding all the possibilites and thus
> stripping the internal numbering for a tool which would simplify writing
> the query strings (a libvirtism for querying whether the QMP schema has
> something [1]). This tool could be used in this case to generate a fully
> expanded and sorted list which should in most cases be append only when
> new stuff is added. This could be then used to see whether something
> changed when we'd store the output and compare it against the new one.

Such an expansion is one way to compare structurally.  It reports
differences for "command C, argument A.B...".  Mapping to the QAPI
schema is straightforward.

> Unfortunately that would just make query-qmp-schema dumps more
> reviewable in libvirt though. A change in an interface would be noticed
> only after it hits qemu upstream.

Yes, implementing the comparison in the QEMU repository would be more
useful.

> [1] 
> https://gitlab.com/libvirt/libvirt/-/blob/08ae9e5f40f8bae0c3cf48f84181884ddd310fa0/src/qemu/qemu_qapi.c#L392
> 
> https://gitlab.com/libvirt/libvirt/-/blob/08ae9e5f40f8bae0c3cf48f84181884ddd310fa0/src/qemu/qemu_capabilities.c#L1512




Re: [PATCH 01/18] qapi/qom: Add ObjectOptions for iothread

2020-11-30 Thread Kevin Wolf
Am 30.11.2020 um 16:00 hat Paolo Bonzini geschrieben:
> On 30/11/20 13:25, Kevin Wolf wrote:
> > +##
> > +# @IothreadProperties:
> > +#
> > +# Properties for iothread objects.
> > +#
> > +# @poll-max-ns: the maximum number of nanoseconds to busy wait for events.
> > +#   0 means polling is disabled (default: 32768 on POSIX hosts,
> > +#   0 otherwise)
> > +#
> > +# @poll-grow: the multiplier used to increase the polling time when the
> > +# algorithm detects it is missing events due to not polling 
> > long
> > +# enough. 0 selects a default behaviour (default: 0)
> > +#
> > +# @poll-shrink: the divisor used to decrease the polling time when the
> > +#   algorithm detects it is spending too long polling without
> > +#   encountering events. 0 selects a default behaviour 
> > (default: 0)
> > +#
> > +# Since: 6.0
> > +##
> > +{ 'struct': 'IothreadProperties',
> > +  'data': { '*poll-max-ns': 'int',
> > +'*poll-grow': 'int',
> > +'*poll-shrink': 'int' } }
> > +
> 
> Documentation is the main advantage of the ObjectOptions concept. However,
> please use the version where each object and property was introduced for the
> "since" value.  Otherwise the documentation will appear to show that none of
> these objects was available before 6.0.
> 
> Yes, there is no documentation at all right now for QOM objects. However,
> wrong documentation sometimes is worse than non-existing documentation.

I think we generally use the version when the schema was introduced (so
blockdev-add has 2.9 for most things even if they existed before in
-drive and drive_add), but I agree that your suggestion is more useful.
And object-add isn't a new command, we're just giving it a schema now.

So let's first discuss the general direction, but if we agree on that,
using the version numbers where objects and properties were first
introduced makes sense.

Maybe if maintainers can include the right version numbers in the review
of the patch for "their" object, that would help me update the
patches.

Kevin




Re: [PATCH] virtio: reset device on bad guest index in virtio_load()

2020-11-30 Thread Michael S. Tsirkin


No, but how about sending the patch to me and the mailing list?
I didn't get it through either channel.

On Mon, Nov 30, 2020 at 03:51:13PM +, John Levon wrote:
> On Fri, Nov 20, 2020 at 06:51:07PM +, John Levon wrote:
> 
> > If we find a queue with an inconsistent guest index value, explicitly mark 
> > the
> > device as needing a reset - and broken - via virtio_error().
> > 
> > There's at least one driver implementation - the virtio-win NetKVM driver - 
> > that
> > is able to handle a VIRTIO_CONFIG_S_NEEDS_RESET notification and 
> > successfully
> > restore the device to a working state. Other implementations do not 
> > correctly
> > handle this, but as the VQ is not in a functional state anyway, this is 
> > still
> > worth doing.
> 
> Ping, anyone have issues with doing this?
> 
> cheers
> john
> 
> > Signed-off-by: John Levon 
> > ---
> >  hw/virtio/virtio.c | 15 +--
> >  1 file changed, 9 insertions(+), 6 deletions(-)
> > 
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index ceb58fda6c..eff35fab7c 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -3161,12 +3161,15 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, 
> > int version_id)
> >  nheads = vring_avail_idx(&vdev->vq[i]) - 
> > vdev->vq[i].last_avail_idx;
> >  /* Check it isn't doing strange things with descriptor 
> > numbers. */
> >  if (nheads > vdev->vq[i].vring.num) {
> > -qemu_log_mask(LOG_GUEST_ERROR,
> > -  "VQ %d size 0x%x Guest index 0x%x "
> > -  "inconsistent with Host index 0x%x: delta 
> > 0x%x",
> > -  i, vdev->vq[i].vring.num,
> > -  vring_avail_idx(&vdev->vq[i]),
> > -  vdev->vq[i].last_avail_idx, nheads);
> > +virtio_error(vdev, "VQ %d size 0x%x Guest index 0x%x "
> > + "inconsistent with Host index 0x%x: delta 
> > 0x%x",
> > + i, vdev->vq[i].vring.num,
> > + vring_avail_idx(&vdev->vq[i]),
> > + vdev->vq[i].last_avail_idx, nheads);
> > +vdev->vq[i].used_idx = 0;
> > +vdev->vq[i].shadow_avail_idx = 0;
> > +vdev->vq[i].inuse = 0;
> > +continue;
> >  }
> >  vdev->vq[i].used_idx = vring_used_idx(&vdev->vq[i]);
> >  vdev->vq[i].shadow_avail_idx = vring_avail_idx(&vdev->vq[i]);



