[PATCH v2 1/1] python: Update for pylint 2.10

2021-09-14 Thread John Snow
A few new annoyances. Of note is the new warning for an unspecified
encoding when opening a text file, which actually does indicate a
potentially real problem; see
https://www.python.org/dev/peps/pep-0597/#motivation

It's not clear to me what the "right" encoding is; it depends on
whatever encoding QEMU is using when it prints to terminal. I'm going to
assume UTF-8 works.

Signed-off-by: John Snow 
---
 python/qemu/machine/machine.py | 3 ++-
 python/setup.cfg   | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index a7081b1845..a27a80497d 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -291,7 +291,8 @@ def get_pid(self) -> Optional[int]:
 
 def _load_io_log(self) -> None:
 if self._qemu_log_path is not None:
-with open(self._qemu_log_path, "r") as iolog:
+with open(self._qemu_log_path, "r",
+  encoding='utf-8') as iolog:
 self._iolog = iolog.read()
 
 @property
diff --git a/python/setup.cfg b/python/setup.cfg
index 83909c1c97..0f0cab098f 100644
--- a/python/setup.cfg
+++ b/python/setup.cfg
@@ -104,6 +104,7 @@ good-names=i,
 [pylint.similarities]
 # Ignore imports when computing similarities.
 ignore-imports=yes
+ignore-signatures=yes
 
 # Minimum lines number of a similarity.
 # TODO: Remove after we opt in to Pylint 2.8.3. See commit msg.
-- 
2.31.1




[PATCH v2 0/1] Update check-python-tox test for pylint 2.10

2021-09-14 Thread John Snow
V2: It's not safe to use sys.stderr.encoding to determine a "console
encoding", because that uses the "current" stderr and not a
hypothetically generic one -- and doing this causes the acceptance tests
to fail.

Use UTF-8 instead.

Question: What encoding do terminal programs use? Is there an inherent
encoding to fprintf et al, or does it just push whatever bytes you put
into it straight into the stdout/stderr pipe?
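
The short answer is that fprintf() and friends apply no encoding at all: they
format their arguments and push the resulting bytes into the stdout/stderr
stream as-is, so the "encoding" is whatever the reader of the pipe or terminal
chooses to decode them as. A minimal C sketch, illustration only and not QEMU
code:

    /* fprintf() performs no character-set conversion; exactly these bytes
     * reach the stream, and decoding is the consumer's problem. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char *msg = "caf\xc3\xa9\n";   /* "café" as raw UTF-8 bytes */

        fprintf(stdout, "%s", msg);          /* strlen(msg) bytes, verbatim */
        fprintf(stderr, "wrote %zu raw bytes\n", strlen(msg));
        return 0;
    }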

John Snow (1):
  python: Update for pylint 2.10

 python/qemu/machine/machine.py | 3 ++-
 python/setup.cfg   | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

-- 
2.31.1





Re: ensuring a machine's buses have unique names

2021-09-14 Thread Markus Armbruster
Peter Maydell  writes:

> On Thu, 26 Aug 2021 at 15:08, Markus Armbruster  wrote:
>> Peter Maydell  writes:
>> > What's the right way to ensure that when a machine has multiple
>> > buses of the same type (eg multiple i2c controllers, multiple
>> > sd card controllers) they all get assigned unique names so that
>> > the user can use '-device ...,bus=some-name' to put a device on a
>> > specific bus?
>
>> Another one used to be isapc.  It's not anymore.  I believe it's due to
>>
>> commit 61de36761b565a4138d8ad7ec75489ab28fe84b6
>> Author: Alexander Graf 
>> Date:   Thu Feb 6 16:08:15 2014 +0100
>>
>> qdev: Keep global allocation counter per bus
>
>> Note that the automatic bus numbers depend on the order in which board
>> code creates devices.  Too implicit and fragile for my taste.  But it's
>> been working well enough.
>
> I had a bit of a look into this. I think the problem here is that
> we created a family of easy-to-misuse APIs and then misused them...

Well, "mission accomplished!"

> The qbus_create() and qbus_create_inplace() functions both take
> a 'const char *name' argument. If they're passed in NULL then
> they create an automatically-uniquified name (as per the commit
> above).

Fine print: if the device providing the bus has an ID (the thing set
with id=ID), then the name is ID.N, where N counts from 0 in this
device, else it's TYPE.N, where N counts from 0 globally per bus type,
and TYPE is the bus's type name converted to lower case.
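
Spelled out as code, the rule is roughly the following; this is an
illustration only, not QEMU's actual qbus code, and the helper name,
parameters and counter handling are assumptions:

    #include <glib.h>

    /* dev_id is the user-supplied id=ID of the device providing the bus;
     * bus_type_counter stands in for the global per-bus-type counter. */
    static char *make_auto_bus_name(const char *dev_id, const char *type_name,
                                    int dev_local_count, int *bus_type_counter)
    {
        if (dev_id) {
            /* ID.N, where N counts from 0 within this device */
            return g_strdup_printf("%s.%d", dev_id, dev_local_count);
        }
        /* TYPE.N, where N counts from 0 globally per bus type, lowercased */
        char *lower = g_ascii_strdown(type_name, -1);
        char *name = g_strdup_printf("%s.%d", lower, (*bus_type_counter)++);
        g_free(lower);
        return name;
    }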

Either scheme produces unique names, but together they need not: 

$ qemu-system-x86_64 --nodefaults -S -display none -monitor stdio -device 
intel-hda -device intel-hda,id=hda
QEMU 6.1.50 monitor - type 'help' for more information
(qemu) info qtree
bus: main-system-bus
  type System
  [...]
  dev: i440FX-pcihost, id ""
[...]
bus: pci.0
  type PCI
  dev: intel-hda, id "hda"
[...]
bus: hda.0
  type HDA
  dev: intel-hda, id ""
[...]
bus: hda.0
  type HDA
  [...]

Both buses are named "hda.0".

Awesome: we made avoiding device IDs that produce bus ID clashes the
user's job.  To know what to avoid, you need to know your machine type,
and the buses provided by the devices you add without ID (which you
shouldn't).  "Fun" when your machine type evolves.

Poorly designed from the start, and then commit 61de36761b5 blew its
chance to fix it.

> If they're passed in a non-NULL string then they use
> it as-is, whether it's unique in the system or not. We then
> typically wrap qbus_create() in a bus-specific creation function
> (examples are scsi_bus_new(), ide_bus_new(), i2c_init_bus()).
> Mostly those creation functions also take a 'name' argument and
> pass it through. ide_bus_new() is an interesting exception which
> does not take a name argument.
>
> The easy-to-misuse part is that now we have a set of functions
> that look like you should pass them in a name (and where there's
> plenty of code in the codebase that passes in a name) but where
> that's the wrong thing unless you're a board model and are
> picking a guaranteed unique name, or you're an odd special case
> like virtio-scsi. (virtio-scsi is the one caller of scsi_bus_new()
> that passes in something other than NULL.) In particular for
> code which is implementing a device that is a whatever-controller,
> creating a whatever-bus and specifying a name is almost always
> going to be wrong, because as soon as some machine creates two
> of these whatever-controllers it has non-unique bus names.

Yes.

> It looks like IDE buses are OK because ide_bus_new() takes no
> name argument, and SCSI buses are OK because the callers
> correctly pass in NULL, but almost all the "minor" buses
> (SD, I2C, ipack, aux...) have a lot of incorrect naming of
> buses in controller models.
>
> I'm not sure how best to sort this tangle out. We could:
>  * make controller devices pass in NULL as bus name; this
>means that some bus names will change, which is an annoying
>breakage but for these minor bus types we can probably
>get away with it. This brings these buses into line with
>how we've been handling uniqueness for ide and scsi.

To gauge the breakage, we need a list of the affected bus names.

>  * drop the 'name' argument for buses like ide that don't
>actually have any callsites that need to pass a name
>  * split into foo_bus_new() and foo_bus_new_named() so that
>the "easy default" doesn't pass a name, and there's at least
>a place to put a doc comment explaining that the name passed
>into the _named() version should be unique ??

Yes, please.

A bus name setter would be even more discouraging, but is no good,
because it can't undo the side effect on the bus type's counter.

Omitting foo_bus_new_named() when there is no user feels okay to me.

>  * something else ?
>
> thanks
> -- PMM
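
For concreteness, a hedged sketch of the foo_bus_new()/foo_bus_new_named()
split proposed above, using a hypothetical FooBus type; this is not an actual
QEMU API, just the shape the doc comment would hang off:

    typedef struct FooBus FooBus;
    typedef struct DeviceState DeviceState;

    /* Internal helper: a NULL name means "generate a unique name for me". */
    FooBus *foo_bus_new_internal(DeviceState *parent, const char *name);

    /* The easy default: no name parameter, so callers cannot get it wrong. */
    static inline FooBus *foo_bus_new(DeviceState *parent)
    {
        return foo_bus_new_internal(parent, NULL);
    }

    /*
     * foo_bus_new_named: create a foo bus with a caller-chosen name.
     * @name must be unique across the whole machine; intended for board code
     * (or special cases like virtio-scsi) that can guarantee uniqueness.
     */
    static inline FooBus *foo_bus_new_named(DeviceState *parent,
                                            const char *name)
    {
        return foo_bus_new_internal(parent, name);
    }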




Re: [PATCH v3 00/22] target/ppc: DFP instructions using decodetree

2021-09-14 Thread David Gibson
On Fri, Sep 10, 2021 at 08:26:02AM -0300, Luis Pires wrote:
> This series moves all existing DFP instructions to decodetree and
> implements the 2 new instructions (dcffixqq and dctfixqq) from
> Power ISA 3.1.
> 
> In order to implement dcffixqq, divu128/divs128 were modified to
> support 128-bit quotients (previously, they were limited to 64-bit
> quotients), along with adjustments being made to their existing callers.
> libdecnumber was also expanded to allow creating decimal numbers from
> 128-bit integers.
> 
> Similarly, for dctfixqq, mulu128 (host-utils) and decNumberIntegralToInt128
> (libdecnumber) were introduced to support 128-bit integers.
> 
> The remaining patches of this series move all of the already existing
> DFP instructions to decodetree, and end up removing dfp-ops.c.inc, which
> is no longer needed.
> 
> NOTE 1: The previous, non-decodetree code was updating ctx->nip for all the
> DFP instructions. I've removed that, but it would be great if someone could
> confirm that updating nip really wasn't necessary.
> 
> NOTE 2: Some arithmetic function support for 128-bit integers was added,
> for now, still using 64-bit pairs. In the near future, I think we should
> modify all of them to use Int128 (and introduce UInt128). But I'll send
> out an RFC to discuss how to do that in another patch series.
> 
> NOTE 3: The helper names are in uppercase, to match the instruction
> names and to simplify the macros that define trans* functions.
> Previously, this wasn't the case, as we were using lowercase instruction
> names in the pre-decodetree code. Another standalone patch will be sent
> later on, changing to uppercase the other new (decodetree) helpers whose
> names are directly related to instruction names, eventually making PPC
> helper names consistent.
> 
> Based-on: 20210823150235.35759-1-luis.pi...@eldorado.org.br
> (target/ppc: fix setting of CR flags in bcdcfsq)
> This series assumes bcdcfsq's fix is already in.

I've applied 1..4 to ppc-for-6.2, since those have acks.  Waiting on
reviews (probably from Richard) before applying the rest.
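
As background for the 64-bit-pair arithmetic mentioned in NOTE 2 of the cover
letter, a hedged sketch of a widening 64x64 -> 128 multiply that returns its
result as a (high, low) pair; this is an illustration, not the series' actual
host-utils code, and it assumes a compiler with unsigned __int128 support:

    #include <stdint.h>

    /* Multiply two 64-bit values and split the 128-bit product into the
     * "64-bit pairs" representation referred to above. */
    static void mul64x64_to_128(uint64_t a, uint64_t b,
                                uint64_t *hi, uint64_t *lo)
    {
        unsigned __int128 r = (unsigned __int128)a * b;
        *lo = (uint64_t)r;
        *hi = (uint64_t)(r >> 64);
    }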

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v2 03/22] target/riscv: Implement hgeie and hgeip CSRs

2021-09-14 Thread Alistair Francis
On Thu, Sep 2, 2021 at 9:47 PM Anup Patel  wrote:
>
> The hgeie and hgeip CSRs are required for emulating an external
> interrupt controller capable of injecting virtual external
> interrupt to Guest/VM running at VS-level.
>
> Signed-off-by: Anup Patel 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.c| 61 ---
>  target/riscv/cpu.h|  5 
>  target/riscv/cpu_bits.h   |  1 +
>  target/riscv/cpu_helper.c | 36 +--
>  target/riscv/csr.c| 43 ++-
>  target/riscv/machine.c|  6 ++--
>  6 files changed, 117 insertions(+), 35 deletions(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 9d97fbe3d9..0ade6ad144 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -572,23 +572,49 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
> **errp)
>  static void riscv_cpu_set_irq(void *opaque, int irq, int level)
>  {
>  RISCVCPU *cpu = RISCV_CPU(opaque);
> +CPURISCVState *env = &cpu->env;
>
> -switch (irq) {
> -case IRQ_U_SOFT:
> -case IRQ_S_SOFT:
> -case IRQ_VS_SOFT:
> -case IRQ_M_SOFT:
> -case IRQ_U_TIMER:
> -case IRQ_S_TIMER:
> -case IRQ_VS_TIMER:
> -case IRQ_M_TIMER:
> -case IRQ_U_EXT:
> -case IRQ_S_EXT:
> -case IRQ_VS_EXT:
> -case IRQ_M_EXT:
> -riscv_cpu_update_mip(cpu, 1 << irq, BOOL_TO_MASK(level));
> -break;
> -default:
> +if (irq < IRQ_LOCAL_MAX) {
> +switch (irq) {
> +case IRQ_U_SOFT:
> +case IRQ_S_SOFT:
> +case IRQ_VS_SOFT:
> +case IRQ_M_SOFT:
> +case IRQ_U_TIMER:
> +case IRQ_S_TIMER:
> +case IRQ_VS_TIMER:
> +case IRQ_M_TIMER:
> +case IRQ_U_EXT:
> +case IRQ_S_EXT:
> +case IRQ_VS_EXT:
> +case IRQ_M_EXT:
> +riscv_cpu_update_mip(cpu, 1 << irq, BOOL_TO_MASK(level));
> +break;
> +default:
> +g_assert_not_reached();
> +}
> +} else if (irq < (IRQ_LOCAL_MAX + IRQ_LOCAL_GUEST_MAX)) {
> +/* Require H-extension for handling guest local interrupts */
> +if (!riscv_has_ext(env, RVH)) {
> +g_assert_not_reached();
> +}
> +
> +/* Compute bit position in HGEIP CSR */
> +irq = irq - IRQ_LOCAL_MAX + 1;
> +if (env->geilen < irq) {
> +g_assert_not_reached();
> +}
> +
> +/* Update HGEIP CSR */
> +env->hgeip &= ~((target_ulong)1 << irq);
> +if (level) {
> +env->hgeip |= (target_ulong)1 << irq;
> +}
> +
> +/* Update mip.SGEIP bit */
> +riscv_cpu_update_mip(cpu, MIP_SGEIP,
> + BOOL_TO_MASK(!!(env->hgeie & env->hgeip)));
> +} else {
>  g_assert_not_reached();
>  }
>  }
> @@ -601,7 +627,8 @@ static void riscv_cpu_init(Object *obj)
>  cpu_set_cpustate_pointers(cpu);
>
>  #ifndef CONFIG_USER_ONLY
> -qdev_init_gpio_in(DEVICE(cpu), riscv_cpu_set_irq, IRQ_LOCAL_MAX);
> +qdev_init_gpio_in(DEVICE(cpu), riscv_cpu_set_irq,
> +  IRQ_LOCAL_MAX + IRQ_LOCAL_GUEST_MAX);
>  #endif /* CONFIG_USER_ONLY */
>  }
>
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index bf1c899c00..59b36f758f 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -148,6 +148,7 @@ struct CPURISCVState {
>  target_ulong priv;
>  /* This contains QEMU specific information about the virt state. */
>  target_ulong virt;
> +target_ulong geilen;
>  target_ulong resetvec;
>
>  target_ulong mhartid;
> @@ -185,6 +186,8 @@ struct CPURISCVState {
>  target_ulong htval;
>  target_ulong htinst;
>  target_ulong hgatp;
> +target_ulong hgeie;
> +target_ulong hgeip;
>  uint64_t htimedelta;
>
>  /* Virtual CSRs */
> @@ -336,6 +339,8 @@ int riscv_cpu_gdb_read_register(CPUState *cpu, GByteArray 
> *buf, int reg);
>  int riscv_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
>  bool riscv_cpu_exec_interrupt(CPUState *cs, int interrupt_request);
>  bool riscv_cpu_fp_enabled(CPURISCVState *env);
> +target_ulong riscv_cpu_get_geilen(CPURISCVState *env);
> +void riscv_cpu_set_geilen(CPURISCVState *env, target_ulong geilen);
>  bool riscv_cpu_virt_enabled(CPURISCVState *env);
>  void riscv_cpu_set_virt_enabled(CPURISCVState *env, bool enable);
>  bool riscv_cpu_force_hs_excep_enabled(CPURISCVState *env);
> diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> index 17ede1d4a9..a1958dbd6a 100644
> --- a/target/riscv/cpu_bits.h
> +++ b/target/riscv/cpu_bits.h
> @@ -506,6 +506,7 @@ typedef enum RISCVException {
>  #define IRQ_M_EXT  11
>  #define IRQ_S_GEXT 12
>  #define IRQ_LOCAL_MAX  13
> +#define IRQ_LOCAL_GUEST_MAX  (TARGET_LONG_BITS - 1)
>
>  /* mip masks */
>  #define MIP_USIP   (1 << 

Re: [PATCH v2 20/53] target/ppc: convert to use format_state instead of dump_state

2021-09-14 Thread David Gibson
On Tue, Sep 14, 2021 at 03:20:09PM +0100, Daniel P. Berrangé wrote:
> Signed-off-by: Daniel P. Berrangé 

Acked-by: David Gibson 

> ---
>  target/ppc/cpu.h  |   2 +-
>  target/ppc/cpu_init.c | 212 +-
>  2 files changed, 126 insertions(+), 88 deletions(-)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 500205229c..c84ae29b98 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1256,7 +1256,7 @@ DECLARE_OBJ_CHECKERS(PPCVirtualHypervisor, 
> PPCVirtualHypervisorClass,
>  
>  void ppc_cpu_do_interrupt(CPUState *cpu);
>  bool ppc_cpu_exec_interrupt(CPUState *cpu, int int_req);
> -void ppc_cpu_dump_state(CPUState *cpu, FILE *f, int flags);
> +void ppc_cpu_format_state(CPUState *cpu, GString *buf, int flags);
>  hwaddr ppc_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
>  int ppc_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
>  int ppc_cpu_gdb_read_register_apple(CPUState *cpu, GByteArray *buf, int reg);
> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> index ad7abc6041..3456be465c 100644
> --- a/target/ppc/cpu_init.c
> +++ b/target/ppc/cpu_init.c
> @@ -9043,7 +9043,7 @@ static void ppc_cpu_class_init(ObjectClass *oc, void 
> *data)
>  
>  cc->class_by_name = ppc_cpu_class_by_name;
>  cc->has_work = ppc_cpu_has_work;
> -cc->dump_state = ppc_cpu_dump_state;
> +cc->format_state = ppc_cpu_format_state;
>  cc->set_pc = ppc_cpu_set_pc;
>  cc->gdb_read_register = ppc_cpu_gdb_read_register;
>  cc->gdb_write_register = ppc_cpu_gdb_write_register;
> @@ -9104,7 +9104,7 @@ static void ppc_cpu_register_types(void)
>  #endif
>  }
>  
> -void ppc_cpu_dump_state(CPUState *cs, FILE *f, int flags)
> +void ppc_cpu_format_state(CPUState *cs, GString *buf, int flags)
>  {
>  #define RGPL  4
>  #define RFPL  4
> @@ -9113,39 +9113,41 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, int 
> flags)
>  CPUPPCState *env = &cpu->env;
>  int i;
>  
> -qemu_fprintf(f, "NIP " TARGET_FMT_lx "   LR " TARGET_FMT_lx " CTR "
> - TARGET_FMT_lx " XER " TARGET_FMT_lx " CPU#%d\n",
> - env->nip, env->lr, env->ctr, cpu_read_xer(env),
> - cs->cpu_index);
> -qemu_fprintf(f, "MSR " TARGET_FMT_lx " HID0 " TARGET_FMT_lx "  HF "
> - "%08x iidx %d didx %d\n",
> - env->msr, env->spr[SPR_HID0], env->hflags,
> - cpu_mmu_index(env, true), cpu_mmu_index(env, false));
> +g_string_append_printf(buf,
> +   "NIP " TARGET_FMT_lx "   LR " TARGET_FMT_lx " CTR 
> "
> +   TARGET_FMT_lx " XER " TARGET_FMT_lx " CPU#%d\n",
> +   env->nip, env->lr, env->ctr, cpu_read_xer(env),
> +   cs->cpu_index);
> +g_string_append_printf(buf,
> +   "MSR " TARGET_FMT_lx " HID0 " TARGET_FMT_lx "  HF 
> "
> +   "%08x iidx %d didx %d\n",
> +   env->msr, env->spr[SPR_HID0], env->hflags,
> +   cpu_mmu_index(env, true), cpu_mmu_index(env, 
> false));
>  #if !defined(NO_TIMER_DUMP)
> -qemu_fprintf(f, "TB %08" PRIu32 " %08" PRIu64
> +g_string_append_printf(buf, "TB %08" PRIu32 " %08" PRIu64
>  #if !defined(CONFIG_USER_ONLY)
> - " DECR " TARGET_FMT_lu
> +   " DECR " TARGET_FMT_lu
>  #endif
> - "\n",
> - cpu_ppc_load_tbu(env), cpu_ppc_load_tbl(env)
> +   "\n",
> +   cpu_ppc_load_tbu(env), cpu_ppc_load_tbl(env)
>  #if !defined(CONFIG_USER_ONLY)
> - , cpu_ppc_load_decr(env)
> +   , cpu_ppc_load_decr(env)
>  #endif
>  );
>  #endif
>  for (i = 0; i < 32; i++) {
>  if ((i & (RGPL - 1)) == 0) {
> -qemu_fprintf(f, "GPR%02d", i);
> +g_string_append_printf(buf, "GPR%02d", i);
>  }
> -qemu_fprintf(f, " %016" PRIx64, ppc_dump_gpr(env, i));
> +g_string_append_printf(buf, " %016" PRIx64, ppc_dump_gpr(env, i));
>  if ((i & (RGPL - 1)) == (RGPL - 1)) {
> -qemu_fprintf(f, "\n");
> +g_string_append_printf(buf, "\n");
>  }
>  }
> -qemu_fprintf(f, "CR ");
> +g_string_append_printf(buf, "CR ");
>  for (i = 0; i < 8; i++)
> -qemu_fprintf(f, "%01x", env->crf[i]);
> -qemu_fprintf(f, "  [");
> +g_string_append_printf(buf, "%01x", env->crf[i]);
> +g_string_append_printf(buf, "  [");
>  for (i = 0; i < 8; i++) {
>  char a = '-';
>  if (env->crf[i] & 0x08) {
> @@ -9155,75 +9157,97 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, int 
> flags)
>  } else if (env->crf[i] & 0x02) {
>  a = 'E';
>  }
> -qemu_fprintf(f, " %c%c", a, env->crf[i] & 0x01 ? 'O' : ' ');
> +g_string_append_printf(buf, " %c%c", a, env->crf[i] & 0x01 ? 

Re: [PATCH v2 50/53] monitor: merge duplicate "info tlb" handlers

2021-09-14 Thread David Gibson
On Tue, Sep 14, 2021 at 03:20:39PM +0100, Daniel P. Berrangé wrote:
> Now that all target architectures are converted to use the "format_tlb"
> callback, we can merge all the duplicate "info tlb" handlers into one
> and remove the architecture condition on the command.
> 
> Signed-off-by: Daniel P. Berrangé 

ppc parts
Acked-by: David Gibson 

> ---
>  hmp-commands-info.hx |  3 ---
>  include/monitor/hmp-target.h |  1 -
>  monitor/misc.c   | 15 +++
>  target/i386/monitor.c| 15 ---
>  target/m68k/monitor.c| 15 ---
>  target/nios2/monitor.c   | 15 ---
>  target/ppc/monitor.c | 15 ---
>  target/sh4/monitor.c | 15 ---
>  target/sparc/monitor.c   | 16 
>  target/xtensa/monitor.c  | 15 ---
>  10 files changed, 15 insertions(+), 110 deletions(-)
> 
> diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
> index f8312342cd..7bd1e04d46 100644
> --- a/hmp-commands-info.hx
> +++ b/hmp-commands-info.hx
> @@ -206,8 +206,6 @@ SRST
>  Show PCI information.
>  ERST
>  
> -#if defined(TARGET_I386) || defined(TARGET_SH4) || defined(TARGET_SPARC) || \
> -defined(TARGET_PPC) || defined(TARGET_XTENSA) || defined(TARGET_M68K)
>  {
>  .name   = "tlb",
>  .args_type  = "",
> @@ -215,7 +213,6 @@ ERST
>  .help   = "show virtual to physical memory mappings",
>  .cmd= hmp_info_tlb,
>  },
> -#endif
>  
>  SRST
>``info tlb``
> diff --git a/include/monitor/hmp-target.h b/include/monitor/hmp-target.h
> index df79ad3355..04e02e8895 100644
> --- a/include/monitor/hmp-target.h
> +++ b/include/monitor/hmp-target.h
> @@ -45,7 +45,6 @@ CPUArchState *mon_get_cpu_env(Monitor *mon);
>  CPUState *mon_get_cpu(Monitor *mon);
>  
>  void hmp_info_mem(Monitor *mon, const QDict *qdict);
> -void hmp_info_tlb(Monitor *mon, const QDict *qdict);
>  void hmp_mce(Monitor *mon, const QDict *qdict);
>  void hmp_info_local_apic(Monitor *mon, const QDict *qdict);
>  
> diff --git a/monitor/misc.c b/monitor/misc.c
> index 6b07ef..c7d138914d 100644
> --- a/monitor/misc.c
> +++ b/monitor/misc.c
> @@ -936,6 +936,21 @@ static void hmp_info_mtree(Monitor *mon, const QDict 
> *qdict)
>  mtree_info(flatview, dispatch_tree, owner, disabled);
>  }
>  
> +static void hmp_info_tlb(Monitor *mon, const QDict *qdict)
> +{
> +g_autoptr(GString) buf = g_string_new("");
> +CPUState *cpu = mon_get_cpu(mon);
> +
> +if (!cpu) {
> +monitor_printf(mon, "No CPU available\n");
> +return;
> +}
> +
> +cpu_format_tlb(cpu, buf);
> +
> +monitor_printf(mon, "%s", buf->str);
> +}
> +
>  static void hmp_info_profile(Monitor *mon, const QDict *qdict)
>  {
>  Error *err = NULL;
> diff --git a/target/i386/monitor.c b/target/i386/monitor.c
> index 698fbbc80b..a7eb4205c7 100644
> --- a/target/i386/monitor.c
> +++ b/target/i386/monitor.c
> @@ -248,21 +248,6 @@ void x86_cpu_format_tlb(CPUState *cpu, GString *buf)
>  }
>  }
>  
> -void hmp_info_tlb(Monitor *mon, const QDict *qdict)
> -{
> -g_autoptr(GString) buf = g_string_new("");
> -CPUState *cpu = mon_get_cpu(mon);
> -
> -if (!cpu) {
> -monitor_printf(mon, "No CPU available\n");
> -return;
> -}
> -
> -cpu_format_tlb(cpu, buf);
> -
> -monitor_printf(mon, "%s", buf->str);
> -}
> -
>  static void mem_print(Monitor *mon, CPUArchState *env,
>hwaddr *pstart, int *plast_prot,
>hwaddr end, int prot)
> diff --git a/target/m68k/monitor.c b/target/m68k/monitor.c
> index 003a665246..0dc729692b 100644
> --- a/target/m68k/monitor.c
> +++ b/target/m68k/monitor.c
> @@ -12,21 +12,6 @@
>  #include "qapi/error.h"
>  #include "qapi/qapi-commands-machine-target.h"
>  
> -void hmp_info_tlb(Monitor *mon, const QDict *qdict)
> -{
> -g_autoptr(GString) buf = g_string_new("");
> -CPUState *cpu = mon_get_cpu(mon);
> -
> -if (!cpu) {
> -monitor_printf(mon, "No CPU available\n");
> -return;
> -}
> -
> -cpu_format_tlb(cpu, buf);
> -
> -monitor_printf(mon, "%s", buf->str);
> -}
> -
>  static const MonitorDef monitor_defs[] = {
>  { "d0", offsetof(CPUM68KState, dregs[0]) },
>  { "d1", offsetof(CPUM68KState, dregs[1]) },
> diff --git a/target/nios2/monitor.c b/target/nios2/monitor.c
> index 99d35e8ef1..1180a32f80 100644
> --- a/target/nios2/monitor.c
> +++ b/target/nios2/monitor.c
> @@ -26,18 +26,3 @@
>  #include "monitor/monitor.h"
>  #include "monitor/hmp-target.h"
>  #include "monitor/hmp.h"
> -
> -void hmp_info_tlb(Monitor *mon, const QDict *qdict)
> -{
> -g_autoptr(GString) buf = g_string_new("");
> -CPUState *cpu = mon_get_cpu(mon);
> -
> -if (!cpu) {
> -monitor_printf(mon, "No CPU available\n");
> -return;
> -}
> -
> -cpu_format_tlb(cpu, buf);
> -
> -monitor_printf(mon, "%s", buf->str);
> -}
> diff --git 

Re: [PATCH v2 46/53] target/ppc: convert to use format_tlb callback

2021-09-14 Thread David Gibson
On Tue, Sep 14, 2021 at 03:20:35PM +0100, Daniel P. Berrangé wrote:
> Change the "info tlb" implementation to use the format_tlb callback.
> 
> Signed-off-by: Daniel P. Berrangé 

Acked-by: David Gibson 

> ---
>  target/ppc/cpu.h|   3 +-
>  target/ppc/cpu_init.c   |   3 +
>  target/ppc/mmu-hash64.c |   8 +-
>  target/ppc/mmu-hash64.h |   2 +-
>  target/ppc/mmu_common.c | 167 ++--
>  target/ppc/monitor.c|  10 ++-
>  6 files changed, 107 insertions(+), 86 deletions(-)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index c84ae29b98..37b44bfbc3 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1257,6 +1257,7 @@ DECLARE_OBJ_CHECKERS(PPCVirtualHypervisor, 
> PPCVirtualHypervisorClass,
>  void ppc_cpu_do_interrupt(CPUState *cpu);
>  bool ppc_cpu_exec_interrupt(CPUState *cpu, int int_req);
>  void ppc_cpu_format_state(CPUState *cpu, GString *buf, int flags);
> +void ppc_cpu_format_tlb(CPUState *cpu, GString *buf);
>  hwaddr ppc_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
>  int ppc_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
>  int ppc_cpu_gdb_read_register_apple(CPUState *cpu, GByteArray *buf, int reg);
> @@ -2667,8 +2668,6 @@ static inline bool 
> ppc_interrupts_little_endian(PowerPCCPU *cpu)
>  return false;
>  }
>  
> -void dump_mmu(CPUPPCState *env);
> -
>  void ppc_maybe_bswap_register(CPUPPCState *env, uint8_t *mem_buf, int len);
>  void ppc_store_vscr(CPUPPCState *env, uint32_t vscr);
>  uint32_t ppc_get_vscr(CPUPPCState *env);
> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> index 3456be465c..98d6f40a49 100644
> --- a/target/ppc/cpu_init.c
> +++ b/target/ppc/cpu_init.c
> @@ -9044,6 +9044,9 @@ static void ppc_cpu_class_init(ObjectClass *oc, void 
> *data)
>  cc->class_by_name = ppc_cpu_class_by_name;
>  cc->has_work = ppc_cpu_has_work;
>  cc->format_state = ppc_cpu_format_state;
> +#ifndef CONFIG_USER_ONLY
> +cc->format_tlb = ppc_cpu_format_tlb;
> +#endif
>  cc->set_pc = ppc_cpu_set_pc;
>  cc->gdb_read_register = ppc_cpu_gdb_read_register;
>  cc->gdb_write_register = ppc_cpu_gdb_write_register;
> diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
> index 19832c4b46..73927a0819 100644
> --- a/target/ppc/mmu-hash64.c
> +++ b/target/ppc/mmu-hash64.c
> @@ -80,7 +80,7 @@ static ppc_slb_t *slb_lookup(PowerPCCPU *cpu, target_ulong 
> eaddr)
>  return NULL;
>  }
>  
> -void dump_slb(PowerPCCPU *cpu)
> +void dump_slb(PowerPCCPU *cpu, GString *buf)
>  {
>  CPUPPCState *env = &cpu->env;
>  int i;
> @@ -88,15 +88,15 @@ void dump_slb(PowerPCCPU *cpu)
>  
>  cpu_synchronize_state(CPU(cpu));
>  
> -qemu_printf("SLB\tESID\t\t\tVSID\n");
> +g_string_append_printf(buf, "SLB\tESID\t\t\tVSID\n");
>  for (i = 0; i < cpu->hash64_opts->slb_size; i++) {
>  slbe = env->slb[i].esid;
>  slbv = env->slb[i].vsid;
>  if (slbe == 0 && slbv == 0) {
>  continue;
>  }
> -qemu_printf("%d\t0x%016" PRIx64 "\t0x%016" PRIx64 "\n",
> -i, slbe, slbv);
> +g_string_append_printf(buf, "%d\t0x%016" PRIx64 "\t0x%016" PRIx64 
> "\n",
> +   i, slbe, slbv);
>  }
>  }
>  
> diff --git a/target/ppc/mmu-hash64.h b/target/ppc/mmu-hash64.h
> index c5b2f97ff7..99e03a5849 100644
> --- a/target/ppc/mmu-hash64.h
> +++ b/target/ppc/mmu-hash64.h
> @@ -4,7 +4,7 @@
>  #ifndef CONFIG_USER_ONLY
>  
>  #ifdef TARGET_PPC64
> -void dump_slb(PowerPCCPU *cpu);
> +void dump_slb(PowerPCCPU *cpu, GString *buf);
>  int ppc_store_slb(PowerPCCPU *cpu, target_ulong slot,
>target_ulong esid, target_ulong vsid);
>  bool ppc_hash64_xlate(PowerPCCPU *cpu, vaddr eaddr, MMUAccessType 
> access_type,
> diff --git a/target/ppc/mmu_common.c b/target/ppc/mmu_common.c
> index 754509e556..d7b716f30a 100644
> --- a/target/ppc/mmu_common.c
> +++ b/target/ppc/mmu_common.c
> @@ -937,19 +937,19 @@ static const char *book3e_tsize_to_str[32] = {
>  "1T", "2T"
>  };
>  
> -static void mmubooke_dump_mmu(CPUPPCState *env)
> +static void mmubooke_dump_mmu(CPUPPCState *env, GString *buf)
>  {
>  ppcemb_tlb_t *entry;
>  int i;
>  
>  if (kvm_enabled() && !env->kvm_sw_tlb) {
> -qemu_printf("Cannot access KVM TLB\n");
> +g_string_append_printf(buf, "Cannot access KVM TLB\n");
>  return;
>  }
>  
> -qemu_printf("\nTLB:\n");
> -qemu_printf("Effective  Physical   Size PID   Prot "
> -"Attr\n");
> +g_string_append_printf(buf, "\nTLB:\n");
> +g_string_append_printf(buf, "Effective  Physical   "
> +   "Size PID   Prot Attr\n");
>  
>  entry = &env->tlb.tlbe[0];
>  for (i = 0; i < env->nb_tlb; i++, entry++) {
> @@ -973,22 +973,24 @@ static void mmubooke_dump_mmu(CPUPPCState *env)
>  } else {
>  snprintf(size_buf, sizeof(size_buf), "%3" PRId64 "k", 

Re: [PATCH v2 16/22] hw/riscv: virt: Use AIA INTC compatible string when available

2021-09-14 Thread Alistair Francis
On Thu, Sep 2, 2021 at 9:58 PM Anup Patel  wrote:
>
> We should use the AIA INTC compatible string in the CPU INTC
> DT nodes when the CPUs support the AIA feature. This will allow the
> Linux INTC driver to use the AIA local interrupt CSRs.
>
> Signed-off-by: Anup Patel 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/riscv/virt.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> index ec0cb69b8c..f43304beca 100644
> --- a/hw/riscv/virt.c
> +++ b/hw/riscv/virt.c
> @@ -211,8 +211,17 @@ static void create_fdt_socket_cpus(RISCVVirtState *s, 
> int socket,
>  qemu_fdt_add_subnode(mc->fdt, intc_name);
>  qemu_fdt_setprop_cell(mc->fdt, intc_name, "phandle",
>  intc_phandles[cpu]);
> -qemu_fdt_setprop_string(mc->fdt, intc_name, "compatible",
> -"riscv,cpu-intc");
> +if (riscv_feature(>soc[socket].harts[cpu].env,
> +  RISCV_FEATURE_AIA)) {
> +static const char * const compat[2] = {
> +"riscv,cpu-intc-aia", "riscv,cpu-intc"
> +};
> +qemu_fdt_setprop_string_array(mc->fdt, name, "compatible",
> +  (char **)&compat, ARRAY_SIZE(compat));
> +} else {
> +qemu_fdt_setprop_string(mc->fdt, intc_name, "compatible",
> +"riscv,cpu-intc");
> +}
>  qemu_fdt_setprop(mc->fdt, intc_name, "interrupt-controller", NULL, 
> 0);
>  qemu_fdt_setprop_cell(mc->fdt, intc_name, "#interrupt-cells", 1);
>
> --
> 2.25.1
>
>



Re: [PATCH v2 17/22] target/riscv: Allow users to force enable AIA CSRs in HART

2021-09-14 Thread Alistair Francis
On Thu, Sep 2, 2021 at 10:03 PM Anup Patel  wrote:
>
> We add an "x-aia" command-line option for the RISC-V HART which
> allows users to force-enable CPU AIA CSRs without changing the
> interrupt controller available in the RISC-V machine.
>
> Signed-off-by: Anup Patel 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.c | 5 +
>  target/riscv/cpu.h | 1 +
>  2 files changed, 6 insertions(+)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index e0f4ae4224..9723d54eaf 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -452,6 +452,10 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
> **errp)
>  }
>  }
>
> +if (cpu->cfg.aia) {
> +riscv_set_feature(env, RISCV_FEATURE_AIA);
> +}
> +
>  set_resetvec(env, cpu->cfg.resetvec);
>
>  /* If only XLEN is set for misa, then set misa from properties */
> @@ -672,6 +676,7 @@ static Property riscv_cpu_properties[] = {
>  DEFINE_PROP_BOOL("mmu", RISCVCPU, cfg.mmu, true),
>  DEFINE_PROP_BOOL("pmp", RISCVCPU, cfg.pmp, true),
>  DEFINE_PROP_BOOL("x-epmp", RISCVCPU, cfg.epmp, false),
> +DEFINE_PROP_BOOL("x-aia", RISCVCPU, cfg.aia, false),
>
>  DEFINE_PROP_UINT64("resetvec", RISCVCPU, cfg.resetvec, DEFAULT_RSTVEC),
>  DEFINE_PROP_END_OF_LIST(),
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 16a4596433..cab9e90153 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -337,6 +337,7 @@ struct RISCVCPU {
>  bool mmu;
>  bool pmp;
>  bool epmp;
> +bool aia;
>  uint64_t resetvec;
>  } cfg;
>  };
> --
> 2.25.1
>
>



Re: [PATCH v5 17/31] target/arm: Enforce alignment for LDM/STM

2021-09-14 Thread Nick Desaulniers
On Tue, Sep 7, 2021 at 6:44 AM Richard Henderson
 wrote:
>
> On 8/31/21 2:51 AM, Nathan Chancellor wrote:
> > I just bisected a boot hang with an LLVM-built multi_v7_defconfig +
> > CONFIG_THUMB2_KERNEL=y kernel down to this commit. I do not see the same
> > hang when the kernel is compiled with GCC 11.2.0 and binutils 2.37 nor
> > do I see a hang with multi_v7_defconfig by itself. Is there something
> > that LLVM is doing wrong when compiling/assembling/linking the kernel or
> > is there something wrong/too aggressive with this commit? I can
> > reproduce this with current QEMU HEAD (ad22d05833).
> >
> > My QEMU invocation is:
> >
> > $ qemu-system-arm \
> >  -append "console=ttyAMA0 earlycon" \
> >  -display none \
> >  -initrd rootfs.cpio \
> >  -kernel zImage \
> >  -M virt \
> >  -m 512m \
> >  -nodefaults \
> >  -no-reboot \
> >  -serial mon:stdio
> >
> > and the rootfs.cpio and zImage files can be found here:
> >
> > https://github.com/nathanchance/bug-files/tree/15c1fd6e44622a3c27823d2c5c3083dfc7246146/qemu-2e1f39e29bf9a6b28eaee9fc0949aab50dbad94a
>
> Hmm.  I see
>
> IN:
> 0xc13038e2:  e890 008c  ldm.wr0, {r2, r3, r7}
>
> R00=c13077ca R01=c11a8058 R02=c11a8058 R03=c031737f
> R04=48379000 R05=0024 R06=c031748d R07=c03174bb
> R08=412fc0f1 R09=c0ce9308 R10=50c5387d R11=
> R12=0009 R13=c1501f88 R14=c0301739 R15=c13038e2
> PSR=21f3 --C- T svc32
> Taking exception 4 [Data Abort]
> ...from EL1 to EL1
> ...with ESR 0x25/0x963f
> ...with DFSR 0x1 DFAR 0xc13077ca
>
> So, yes, it's a ldm from an address % 4 = 2, so it is correct that we should 
> trap.  You
> should see the same trap on real hw.

Makes sense. I guess if we can find which label that's in, we can look
closer into the code generated by the compiler.
scripts/extract-vmlinux doesn't seem to be able to extract a vmlinux
from either zImage artifact though; not sure yet we'll be able to
disassemble those.

Oh, I guess GDB can show us. Looks like 0xc13038e2 is...hard to tell,
there's no debug info so we just have jumps to addresses in hex, not
symbols.  I don't know my way around GDB well enough to get a sense
for where we are in the source code.
https://gist.github.com/nickdesaulniers/764ac9afab04775846ffa6c90c5a266a

If I rebuild QEMU from source, I don't get any disassembly that looks
similar, probably as a result of different compiler versions, and
maybe adding debug info.

--
Thanks,
~Nick Desaulniers
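
For reference, the trap Richard describes comes down to a word-alignment check
on the LDM/STM base address; a trivial sketch, not QEMU's actual translate.c
code:

    #include <stdbool.h>
    #include <stdint.h>

    /* LDM/STM access words, so the base address must be 4-byte aligned;
     * 0xc13077ca (address % 4 == 2) therefore takes an alignment fault. */
    static bool ldm_base_is_aligned(uint32_t addr)
    {
        return (addr & 3u) == 0;
    }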



Re: [PATCH v2 02/22] target/riscv: Implement SGEIP bit in hip and hie CSRs

2021-09-14 Thread Alistair Francis
On Thu, Sep 2, 2021 at 9:40 PM Anup Patel  wrote:
>
> A hypervisor can optionally take guest external interrupts using the
> SGEIP bit of the hip and hie CSRs.
>
> Signed-off-by: Anup Patel 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.c  |  3 ++-
>  target/riscv/cpu_bits.h |  3 +++
>  target/riscv/csr.c  | 18 +++---
>  3 files changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index dc1353b858..9d97fbe3d9 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -368,6 +368,7 @@ static void riscv_cpu_reset(DeviceState *dev)
>  env->priv = PRV_M;
>  env->mstatus &= ~(MSTATUS_MIE | MSTATUS_MPRV);
>  env->mcause = 0;
> +env->miclaim = MIP_SGEIP;
>  env->pc = env->resetvec;
>  env->two_stage_lookup = false;
>  #endif
> @@ -600,7 +601,7 @@ static void riscv_cpu_init(Object *obj)
>  cpu_set_cpustate_pointers(cpu);
>
>  #ifndef CONFIG_USER_ONLY
> -qdev_init_gpio_in(DEVICE(cpu), riscv_cpu_set_irq, 12);
> +qdev_init_gpio_in(DEVICE(cpu), riscv_cpu_set_irq, IRQ_LOCAL_MAX);
>  #endif /* CONFIG_USER_ONLY */
>  }
>
> diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> index 7330ff5a19..17ede1d4a9 100644
> --- a/target/riscv/cpu_bits.h
> +++ b/target/riscv/cpu_bits.h
> @@ -504,6 +504,8 @@ typedef enum RISCVException {
>  #define IRQ_S_EXT  9
>  #define IRQ_VS_EXT 10
>  #define IRQ_M_EXT  11
> +#define IRQ_S_GEXT 12
> +#define IRQ_LOCAL_MAX  13
>
>  /* mip masks */
>  #define MIP_USIP   (1 << IRQ_U_SOFT)
> @@ -518,6 +520,7 @@ typedef enum RISCVException {
>  #define MIP_SEIP   (1 << IRQ_S_EXT)
>  #define MIP_VSEIP  (1 << IRQ_VS_EXT)
>  #define MIP_MEIP   (1 << IRQ_M_EXT)
> +#define MIP_SGEIP  (1 << IRQ_S_GEXT)
>
>  /* sip masks */
>  #define SIP_SSIP   MIP_SSIP
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index 1f13d1042d..bc25c79e39 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -408,12 +408,13 @@ static RISCVException read_timeh(CPURISCVState *env, 
> int csrno,
>  #define M_MODE_INTERRUPTS  (MIP_MSIP | MIP_MTIP | MIP_MEIP)
>  #define S_MODE_INTERRUPTS  (MIP_SSIP | MIP_STIP | MIP_SEIP)
>  #define VS_MODE_INTERRUPTS (MIP_VSSIP | MIP_VSTIP | MIP_VSEIP)
> +#define HS_MODE_INTERRUPTS (MIP_SGEIP | VS_MODE_INTERRUPTS)
>
>  static const target_ulong delegable_ints = S_MODE_INTERRUPTS |
> VS_MODE_INTERRUPTS;
>  static const target_ulong vs_delegable_ints = VS_MODE_INTERRUPTS;
>  static const target_ulong all_ints = M_MODE_INTERRUPTS | S_MODE_INTERRUPTS |
> - VS_MODE_INTERRUPTS;
> + HS_MODE_INTERRUPTS;
>  #define DELEGABLE_EXCPS ((1ULL << (RISCV_EXCP_INST_ADDR_MIS)) | \
>   (1ULL << (RISCV_EXCP_INST_ACCESS_FAULT)) | \
>   (1ULL << (RISCV_EXCP_ILLEGAL_INST)) | \
> @@ -644,7 +645,7 @@ static RISCVException write_mideleg(CPURISCVState *env, 
> int csrno,
>  {
>  env->mideleg = (env->mideleg & ~delegable_ints) | (val & delegable_ints);
>  if (riscv_has_ext(env, RVH)) {
> -env->mideleg |= VS_MODE_INTERRUPTS;
> +env->mideleg |= HS_MODE_INTERRUPTS;
>  }
>  return RISCV_EXCP_NONE;
>  }
> @@ -660,6 +661,9 @@ static RISCVException write_mie(CPURISCVState *env, int 
> csrno,
>  target_ulong val)
>  {
>  env->mie = (env->mie & ~all_ints) | (val & all_ints);
> +if (!riscv_has_ext(env, RVH)) {
> +env->mie &= ~MIP_SGEIP;
> +}
>  return RISCV_EXCP_NONE;
>  }
>
> @@ -960,7 +964,7 @@ static RISCVException rmw_sip(CPURISCVState *env, int 
> csrno,
>  }
>
>  if (ret_value) {
> -*ret_value &= env->mideleg;
> +*ret_value &= env->mideleg & S_MODE_INTERRUPTS;
>  }
>  return ret;
>  }
> @@ -1078,7 +1082,7 @@ static RISCVException rmw_hvip(CPURISCVState *env, int 
> csrno,
>write_mask & hvip_writable_mask);
>
>  if (ret_value) {
> -*ret_value &= hvip_writable_mask;
> +*ret_value &= VS_MODE_INTERRUPTS;
>  }
>  return ret;
>  }
> @@ -1091,7 +1095,7 @@ static RISCVException rmw_hip(CPURISCVState *env, int 
> csrno,
>write_mask & hip_writable_mask);
>
>  if (ret_value) {
> -*ret_value &= hip_writable_mask;
> +*ret_value &= HS_MODE_INTERRUPTS;
>  }
>  return ret;
>  }
> @@ -1099,14 +1103,14 @@ static RISCVException rmw_hip(CPURISCVState *env, int 
> csrno,
>  static RISCVException read_hie(CPURISCVState *env, int csrno,
> target_ulong *val)
>  {
> -*val = env->mie & VS_MODE_INTERRUPTS;
> +*val = env->mie & 

Re: [PATCH 1/2] iotests: Fix unspecified-encoding pylint warnings

2021-09-14 Thread John Snow
On Tue, Aug 24, 2021 at 11:47 AM Philippe Mathieu-Daudé 
wrote:

> On 8/24/21 5:35 PM, Hanna Reitz wrote:
> > As of recently, pylint complains when `open()` calls are missing an
> > `encoding=` specified.  Everything we have should be UTF-8 (and in fact,
> > everything should be UTF-8, period (exceptions apply)), so use that.
> >
> > Signed-off-by: Hanna Reitz 
> > ---
> >  tests/qemu-iotests/297| 2 +-
> >  tests/qemu-iotests/iotests.py | 8 +---
> >  2 files changed, 6 insertions(+), 4 deletions(-)
>
> Reviewed-by: Philippe Mathieu-Daudé 
>
>
I don't see this upstream just yet, so ...

Reviewed-by: John Snow 

I'll get around to revisiting my "run the iotest linters on Python CI"
thing soon which will flush out anything else that might still be missing.

--js


Re: [PATCH v2 04/22] target/riscv: Improve fidelity of guest external interrupts

2021-09-14 Thread Alistair Francis
On Tue, Sep 14, 2021 at 2:33 AM Anup Patel  wrote:
>
> On Thu, Sep 9, 2021 at 12:14 PM Alistair Francis  wrote:
> >
> > On Thu, Sep 2, 2021 at 9:26 PM Anup Patel  wrote:
> > >
> > > Guest external interrupts from an external interrupt controller are
> > > not delivered on time to a guest running under a hypervisor. This
> > > results in the guest having a sluggish response to serial console input
> > > and other I/O events. To improve timely delivery of guest external
> > > interrupts, we check for and inject pending interrupts upon every sret
> > > instruction.
> > >
> > > Signed-off-by: Anup Patel 
> > > ---
> > >  target/riscv/op_helper.c | 9 +
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/target/riscv/op_helper.c b/target/riscv/op_helper.c
> > > index ee7c24efe7..4c995c239e 100644
> > > --- a/target/riscv/op_helper.c
> > > +++ b/target/riscv/op_helper.c
> > > @@ -129,6 +129,15 @@ target_ulong helper_sret(CPURISCVState *env, 
> > > target_ulong cpu_pc_deb)
> > >
> > >  riscv_cpu_set_mode(env, prev_priv);
> > >
> > > +/*
> > > + * QEMU does not promptly deliver guest external interrupts
> > > + * to a guest running on a hypervisor which in turn is running
> > > + * on QEMU. We make a dummy call to riscv_cpu_update_mip() upon
> > > + * every sret instruction so that QEMU picks up guest external
> > > + * interrupts sooner.
> > > + */
> > > + riscv_cpu_update_mip(env_archcpu(env), 0, 0);
> >
> > This doesn't seem right. I don't understand why we need this?
> >
> > riscv_cpu_update_mip() is called when an interrupt is delivered to the
> > CPU, if we are missing interrupts then that is a bug somewhere else.
>
> I have finally figured out the cause of guest external interrupts being
> missed by Guest/VM.
>
> The riscv_cpu_set_irq() function, which handles the guest external interrupt
> lines of each CPU, is called asynchronously. This function in turn calls
> riscv_cpu_update_mip(), but the CPU might be in host mode (V=0)
> or in Guest/VM mode (V=1). If the CPU is in host mode (V=0) when

The IRQ being raised should just directly call riscv_cpu_update_mip()
so the IRQ should happen straight away.

Even from MTTCG I see this:

"""
Currently thanks to KVM work any access to IO memory is automatically
protected by the global iothread mutex, also known as the BQL (Big
Qemu Lock). Any IO region that doesn't use global mutex is expected to
do its own locking.

However IO memory isn't the only way emulated hardware state can be
modified. Some architectures have model specific registers that
trigger hardware emulation features. Generally any translation helper
that needs to update more than a single vCPUs of state should take the
BQL.
"""

So we should be fine here as well.

Can you supply a test case to reproduce the bug?

> the riscv_cpu_set_irq() is called, then the CPU interrupt requested by
> riscv_cpu_update_mip() has no effect because the CPU can't take
> the interrupt until it enters Guest/VM mode.
>
> This patch does the right thing by doing a dummy call to
> riscv_cpu_update_mip() upon SRET instruction so that if the CPU
> had missed a guest interrupt previously then the CPU can take it now.

This still doesn't look like the right fix.

Alistair

>
> Regards,
> Anup



Re: [PATCH v2 0/2] hw/arm/raspi: Remove deprecated raspi2/raspi3 aliases

2021-09-14 Thread John Snow
On Fri, Aug 27, 2021 at 2:30 PM Philippe Mathieu-Daudé 
wrote:

> On 8/27/21 8:01 PM, Willian Rampazzo wrote:
> > Hi, Phil,
> >
> > On Thu, Aug 26, 2021 at 1:49 PM Philippe Mathieu-Daudé 
> wrote:
> >>
> >> Hi Peter,
> >>
> >> On 7/9/21 6:00 PM, Peter Maydell wrote:
> >>> On Fri, 9 Jul 2021 at 16:33, Peter Maydell 
> wrote:
> 
>  On Thu, 8 Jul 2021 at 15:55, Philippe Mathieu-Daudé 
> wrote:
> >
> > Since v1:
> > - renamed tests (Peter)
> >
> > Philippe Mathieu-Daudé (2):
> >   tests: Remove uses of deprecated raspi2/raspi3 machine names
> >   hw/arm/raspi: Remove deprecated raspi2/raspi3 aliases
> 
> 
> 
>  Applied to target-arm.next, thanks.
> >>>
> >>> I found that this seems to break 'make check':
> >>>
> >>>  (09/52)
> tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_raspi2_initrd:
> >>> ERROR: Could not perform graceful shutdown (40.38 s)
> >>
> >> I can not reproduce. Maybe something got changed in Python/Avocado
> >> since... I'm going to simply respin (updating 6.1 -> 6.2).
> >
> > I also could not reproduce. I checked and nothing changed on the
> > Avocado side. The version is still the same on the QEMU side.
>
> Thanks for double-checking!
>
>
Sorry for the long silence.

Did you get this sorted out? I don't recall changing the QEMUMachine code
upstream lately (Though I have been tinkering with it a lot in my own
branches) -- was the root cause of the failure discovered?

--js


Re: [PATCH] target/riscv: Backup/restore mstatus.SD bit when virtual register swapped

2021-09-14 Thread Alistair Francis
On Tue, Sep 14, 2021 at 11:37 AM  wrote:
>
> From: Frank Chang 
>
> When virtual registers are swapped, mstatus.SD bit should also be
> backed up/restored. Otherwise, mstatus.SD bit will be incorrectly kept
> across the world switches.
>
> Signed-off-by: Frank Chang 
> Reviewed-by: Vincent Chen 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>  target/riscv/cpu_helper.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 968cb8046f4..488867b59eb 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -111,9 +111,10 @@ bool riscv_cpu_fp_enabled(CPURISCVState *env)
>
>  void riscv_cpu_swap_hypervisor_regs(CPURISCVState *env)
>  {
> +target_ulong sd = riscv_cpu_is_32bit(env) ? MSTATUS32_SD : MSTATUS64_SD;
>  uint64_t mstatus_mask = MSTATUS_MXR | MSTATUS_SUM | MSTATUS_FS |
>  MSTATUS_SPP | MSTATUS_SPIE | MSTATUS_SIE |
> -MSTATUS64_UXL;
> +MSTATUS64_UXL | sd;
>  bool current_virt = riscv_cpu_virt_enabled(env);
>
>  g_assert(riscv_has_ext(env, RVH));
> --
> 2.25.1
>
>



Re: [PATCH] docs/system/riscv: sifive_u: Update U-Boot instructions

2021-09-14 Thread Alistair Francis
On Sun, Sep 12, 2021 at 1:34 AM Bin Meng  wrote:
>
> In U-Boot v2021.07 release, there were 2 major changes for the
> SiFive Unleashed board support:
>
> - Board config name was changed from sifive_fu540_defconfig to
>   sifive_unleashed_defconfig
> - The generic binman tool was used to generate the FIT image
>   (combination of U-Boot proper, DTB and OpenSBI firmware)
>
> which make the existing U-Boot instructions out of date.
>
> Update the doc with latest instructions.
>
> Signed-off-by: Bin Meng 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>
>  docs/system/riscv/sifive_u.rst | 49 ++
>  1 file changed, 26 insertions(+), 23 deletions(-)
>
> diff --git a/docs/system/riscv/sifive_u.rst b/docs/system/riscv/sifive_u.rst
> index 01108b5ecc..8ac93d0153 100644
> --- a/docs/system/riscv/sifive_u.rst
> +++ b/docs/system/riscv/sifive_u.rst
> @@ -209,15 +209,16 @@ command line options with ``qemu-system-riscv32``.
>  Running U-Boot
>  --
>
> -U-Boot mainline v2021.01 release is tested at the time of writing. To build a
> +U-Boot mainline v2021.07 release is tested at the time of writing. To build a
>  U-Boot mainline bootloader that can be booted by the ``sifive_u`` machine, 
> use
> -the sifive_fu540_defconfig with similar commands as described above for 
> Linux:
> +the sifive_unleashed_defconfig with similar commands as described above for
> +Linux:
>
>  .. code-block:: bash
>
>$ export CROSS_COMPILE=riscv64-linux-
>$ export OPENSBI=/path/to/opensbi-riscv64-generic-fw_dynamic.bin
> -  $ make sifive_fu540_defconfig
> +  $ make sifive_unleashed_defconfig
>
>  You will get spl/u-boot-spl.bin and u-boot.itb file in the build tree.
>
> @@ -312,31 +313,29 @@ board on QEMU ``sifive_u`` machine out of the box. This 
> allows users to
>  develop and test the recommended RISC-V boot flow with a real world use
>  case: ZSBL (in QEMU) loads U-Boot SPL from SD card or SPI flash to L2LIM,
>  then U-Boot SPL loads the combined payload image of OpenSBI fw_dynamic
> -firmware and U-Boot proper. However sometimes we want to have a quick test
> -of booting U-Boot on QEMU without the needs of preparing the SPI flash or
> -SD card images, an alternate way can be used, which is to create a U-Boot
> -S-mode image by modifying the configuration of U-Boot:
> +firmware and U-Boot proper.
> +
> +However sometimes we want to have a quick test of booting U-Boot on QEMU
> +without the needs of preparing the SPI flash or SD card images, an alternate
> +way can be used, which is to create a U-Boot S-mode image by modifying the
> +configuration of U-Boot:
>
>  .. code-block:: bash
>
> +  $ export CROSS_COMPILE=riscv64-linux-
> +  $ make sifive_unleashed_defconfig
>$ make menuconfig
>
> -then manually select the following configuration in U-Boot:
> -
> -  Device Tree Control > Provider of DTB for DT Control > Prior Stage 
> bootloader DTB
> +then manually select the following configuration:
>
> -This lets U-Boot to use the QEMU generated device tree blob. During the 
> build,
> -a build error will be seen below:
> +  * Device Tree Control ---> Provider of DTB for DT Control ---> Prior Stage 
> bootloader DTB
>
> -.. code-block:: none
> +and unselect the following configuration:
>
> -  MKIMAGE u-boot.img
> -  ./tools/mkimage: Can't open arch/riscv/dts/hifive-unleashed-a00.dtb: No 
> such file or directory
> -  ./tools/mkimage: failed to build FIT
> -  make: *** [Makefile:1440: u-boot.img] Error 1
> +  * Library routines ---> Allow access to binman information in the device 
> tree
>
> -The above errors can be safely ignored as we don't run U-Boot SPL under QEMU
> -in this alternate configuration.
> +This changes U-Boot to use the QEMU generated device tree blob, and bypass
> +running the U-Boot SPL stage.
>
>  Boot the 64-bit U-Boot S-mode image directly:
>
> @@ -351,14 +350,18 @@ It's possible to create a 32-bit U-Boot S-mode image as 
> well.
>  .. code-block:: bash
>
>$ export CROSS_COMPILE=riscv64-linux-
> -  $ make sifive_fu540_defconfig
> +  $ make sifive_unleashed_defconfig
>$ make menuconfig
>
>  then manually update the following configuration in U-Boot:
>
> -  Device Tree Control > Provider of DTB for DT Control > Prior Stage 
> bootloader DTB
> -  RISC-V architecture > Base ISA > RV32I
> -  Boot images > Text Base > 0x8040
> +  * Device Tree Control ---> Provider of DTB for DT Control ---> Prior Stage 
> bootloader DTB
> +  * RISC-V architecture ---> Base ISA ---> RV32I
> +  * Boot options ---> Boot images ---> Text Base ---> 0x8040
> +
> +and unselect the following configuration:
> +
> +  * Library routines ---> Allow access to binman information in the device 
> tree
>
>  Use the same command line options to boot the 32-bit U-Boot S-mode image:
>
> --
> 2.25.1
>
>



Re: [PATCH RFC v2 04/16] vfio-user: connect vfio proxy to remote server

2021-09-14 Thread John Johnson


> On Sep 14, 2021, at 6:06 AM, Stefan Hajnoczi  wrote:
> 
> On Mon, Sep 13, 2021 at 05:23:33PM +, John Johnson wrote:
 On Sep 9, 2021, at 10:25 PM, John Johnson  
 wrote:
> On Sep 8, 2021, at 11:29 PM, Stefan Hajnoczi  wrote:
> On Thu, Sep 09, 2021 at 05:11:49AM +, John Johnson wrote:
>>  I did look at coroutines, but they seemed to work when the sender
>> is triggering the coroutine on send, not when request packets are 
>> arriving
>> asynchronously to the sends.
> 
> This can be done with a receiver coroutine. Its job is to be the only
> thing that reads vfio-user messages from the socket. A receiver
> coroutine reads messages from the socket and wakes up the waiting
> coroutine that yielded from vfio_user_send_recv() or
> vfio_user_pci_process_req().
> 
> (Although vfio_user_pci_process_req() could be called directly from the
> receiver coroutine, it seems safer to have a separate coroutine that
> processes requests so that the receiver isn't blocked in case
> vfio_user_pci_process_req() yields while processing a request.)
> 
> Going back to what you mentioned above, the receiver coroutine does
> something like this:
> 
> if it's a reply
>     reply = find_reply(...)
>     qemu_coroutine_enter(reply->co) // instead of signalling reply->cv
> else
>     QSIMPLEQ_INSERT_TAIL(&pending_reqs, request, next);
>     if (pending_reqs_was_empty) {
>         qemu_coroutine_enter(process_request_co);
>     }
> 
> The pending_reqs queue holds incoming requests that the
> process_request_co coroutine processes.
> 
 
 
How do coroutines work across threads?  There can be multiple vCPU
 threads waiting for replies, and I think the receiver coroutine will be
 running in the main loop thread.  Where would a vCPU block waiting for
 a reply?  I think coroutine_yield() returns to its coroutine_enter() caller
>>> 
>>> 
>>> 
>>> A vCPU thread holding the BQL can iterate the event loop if it has
>>> reached a synchronous point that needs to wait for a reply before
>>> returning. I think we have this situation when a MemoryRegion is
>>> accessed on the proxy device.
>>> 
>>> For example, block/block-backend.c:blk_prw() kicks off a coroutine and
>>> then runs the event loop until the coroutine finishes:
>>> 
>>>  Coroutine *co = qemu_coroutine_create(co_entry, &rwco);
>>>  bdrv_coroutine_enter(blk_bs(blk), co);
>>>  BDRV_POLL_WHILE(blk_bs(blk), rwco.ret == NOT_DONE);
>>> 
>>> BDRV_POLL_WHILE() boils down to a loop like this:
>>> 
>>>  while ((cond)) {
>>>      aio_poll(ctx, true);
>>>  }
>>> 
>> 
>>  I think that would make vCPUs sending requests and the
>> receiver coroutine all poll on the same socket.  If the “wrong”
>> routine reads the message, I’d need a second level of synchronization
>> to pass the message to the “right” one.  e.g., if the vCPU coroutine
>> reads a request, it needs to pass it to the receiver; if the receiver
>> coroutine reads a reply, it needs to pass it to a vCPU.
>> 
>>  Avoiding this complexity is one of the reasons I went with
>> a separate thread that only reads the socket over the mp-qemu model,
>> which does have the sender poll, but doesn’t need to handle incoming
>> requests.
> 
> Only one coroutine reads from the socket, the "receiver" coroutine. In a
> previous reply I sketched what the receiver does:
> 
>  if it's a reply
>      reply = find_reply(...)
>      qemu_coroutine_enter(reply->co) // instead of signalling reply->cv
>  else
>      QSIMPLEQ_INSERT_TAIL(&pending_reqs, request, next);
>      if (pending_reqs_was_empty) {
>          qemu_coroutine_enter(process_request_co);
>      }
> 

Sorry, I was assuming that when you said the coroutine would block with
aio_poll(), you implied it would also read messages from the socket.
 

> The qemu_coroutine_enter(reply->co) call re-enters the coroutine that
> was created by the vCPU thread. Is this the "second level of
> synchronization" that you described? It's very similar to signalling
> reply->cv in the existing patch.
> 

Yes, the only difference is it would be woken on each message,
even though it doesn’t read them.  Which is what I think you’re addressing
below.
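
For readers of the archive, here is a plain-C sketch of the reply matching
that find_reply() in the pseudocode above stands for. The message-id field
and the structures are illustrative assumptions, not the vfio-user wire
format or QEMU's coroutine API:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    typedef struct PendingReply {
        uint16_t msg_id;             /* id we put in the request header */
        void (*wake)(void *opaque);  /* stands in for qemu_coroutine_enter() */
        void *opaque;
        struct PendingReply *next;
    } PendingReply;

    static PendingReply *pending_replies;

    /* Called by the single receiver when a message arrives with the REPLY
     * flag set: unlink the matching waiter and wake it. */
    static bool dispatch_reply(uint16_t msg_id)
    {
        for (PendingReply **p = &pending_replies; *p; p = &(*p)->next) {
            if ((*p)->msg_id == msg_id) {
                PendingReply *r = *p;
                *p = r->next;
                r->wake(r->opaque);
                return true;
            }
        }
        return false;                /* unknown id: protocol error */
    }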


> Now I'm actually thinking about whether this can be improved by keeping
> the condvar so that the vCPU thread doesn't need to call aio_poll()
> (which is awkward because it doesn't drop the BQL and therefore blocks
> other vCPUs from making progress). That approach wouldn't require a
> dedicated thread for vfio-user.
> 

Wouldn’t you need to acquire the BQL twice for every vCPU reply: once to
run the receiver coroutine, and once when the vCPU thread wakes up and wants
to return to the VFIO code?  The migration thread would also add a BQL
dependency, where it didn’t have one before.

Is your objection with using an iothread, or using a separate thread?
I can change to using qemu_thread_create() and 

Re: [PATCH] target/riscv: Backup/restore mstatus.SD bit when virtual register swapped

2021-09-14 Thread Alistair Francis
On Tue, Sep 14, 2021 at 11:37 AM  wrote:
>
> From: Frank Chang 
>
> When virtual registers are swapped, mstatus.SD bit should also be
> backed up/restored. Otherwise, mstatus.SD bit will be incorrectly kept
> across the world switches.
>
> Signed-off-by: Frank Chang 
> Reviewed-by: Vincent Chen 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu_helper.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 968cb8046f4..488867b59eb 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -111,9 +111,10 @@ bool riscv_cpu_fp_enabled(CPURISCVState *env)
>
>  void riscv_cpu_swap_hypervisor_regs(CPURISCVState *env)
>  {
> +target_ulong sd = riscv_cpu_is_32bit(env) ? MSTATUS32_SD : MSTATUS64_SD;
>  uint64_t mstatus_mask = MSTATUS_MXR | MSTATUS_SUM | MSTATUS_FS |
>  MSTATUS_SPP | MSTATUS_SPIE | MSTATUS_SIE |
> -MSTATUS64_UXL;
> +MSTATUS64_UXL | sd;
>  bool current_virt = riscv_cpu_virt_enabled(env);
>
>  g_assert(riscv_has_ext(env, RVH));
> --
> 2.25.1
>
>



Re: [PATCH] docs/system/riscv: sifive_u: Update U-Boot instructions

2021-09-14 Thread Alistair Francis
On Sun, Sep 12, 2021 at 1:34 AM Bin Meng  wrote:
>
> In U-Boot v2021.07 release, there were 2 major changes for the
> SiFive Unleashed board support:
>
> - Board config name was changed from sifive_fu540_defconfig to
>   sifive_unleashed_defconfig
> - The generic binman tool was used to generate the FIT image
>   (combination of U-Boot proper, DTB and OpenSBI firmware)
>
> which make the existing U-Boot instructions out of date.
>
> Update the doc with latest instructions.
>
> Signed-off-by: Bin Meng 

Reviewed-by: Alistair Francis 

Alistair

> ---
>
>  docs/system/riscv/sifive_u.rst | 49 ++
>  1 file changed, 26 insertions(+), 23 deletions(-)
>
> diff --git a/docs/system/riscv/sifive_u.rst b/docs/system/riscv/sifive_u.rst
> index 01108b5ecc..8ac93d0153 100644
> --- a/docs/system/riscv/sifive_u.rst
> +++ b/docs/system/riscv/sifive_u.rst
> @@ -209,15 +209,16 @@ command line options with ``qemu-system-riscv32``.
>  Running U-Boot
>  --
>
> -U-Boot mainline v2021.01 release is tested at the time of writing. To build a
> +U-Boot mainline v2021.07 release is tested at the time of writing. To build a
>  U-Boot mainline bootloader that can be booted by the ``sifive_u`` machine, 
> use
> -the sifive_fu540_defconfig with similar commands as described above for 
> Linux:
> +the sifive_unleashed_defconfig with similar commands as described above for
> +Linux:
>
>  .. code-block:: bash
>
>$ export CROSS_COMPILE=riscv64-linux-
>$ export OPENSBI=/path/to/opensbi-riscv64-generic-fw_dynamic.bin
> -  $ make sifive_fu540_defconfig
> +  $ make sifive_unleashed_defconfig
>
>  You will get spl/u-boot-spl.bin and u-boot.itb file in the build tree.
>
> @@ -312,31 +313,29 @@ board on QEMU ``sifive_u`` machine out of the box. This 
> allows users to
>  develop and test the recommended RISC-V boot flow with a real world use
>  case: ZSBL (in QEMU) loads U-Boot SPL from SD card or SPI flash to L2LIM,
>  then U-Boot SPL loads the combined payload image of OpenSBI fw_dynamic
> -firmware and U-Boot proper. However sometimes we want to have a quick test
> -of booting U-Boot on QEMU without the needs of preparing the SPI flash or
> -SD card images, an alternate way can be used, which is to create a U-Boot
> -S-mode image by modifying the configuration of U-Boot:
> +firmware and U-Boot proper.
> +
> +However sometimes we want to have a quick test of booting U-Boot on QEMU
> +without the needs of preparing the SPI flash or SD card images, an alternate
> +way can be used, which is to create a U-Boot S-mode image by modifying the
> +configuration of U-Boot:
>
>  .. code-block:: bash
>
> +  $ export CROSS_COMPILE=riscv64-linux-
> +  $ make sifive_unleashed_defconfig
>$ make menuconfig
>
> -then manually select the following configuration in U-Boot:
> -
> -  Device Tree Control > Provider of DTB for DT Control > Prior Stage 
> bootloader DTB
> +then manually select the following configuration:
>
> -This lets U-Boot to use the QEMU generated device tree blob. During the 
> build,
> -a build error will be seen below:
> +  * Device Tree Control ---> Provider of DTB for DT Control ---> Prior Stage 
> bootloader DTB
>
> -.. code-block:: none
> +and unselect the following configuration:
>
> -  MKIMAGE u-boot.img
> -  ./tools/mkimage: Can't open arch/riscv/dts/hifive-unleashed-a00.dtb: No 
> such file or directory
> -  ./tools/mkimage: failed to build FIT
> -  make: *** [Makefile:1440: u-boot.img] Error 1
> +  * Library routines ---> Allow access to binman information in the device 
> tree
>
> -The above errors can be safely ignored as we don't run U-Boot SPL under QEMU
> -in this alternate configuration.
> +This changes U-Boot to use the QEMU generated device tree blob, and bypass
> +running the U-Boot SPL stage.
>
>  Boot the 64-bit U-Boot S-mode image directly:
>
> @@ -351,14 +350,18 @@ It's possible to create a 32-bit U-Boot S-mode image as 
> well.
>  .. code-block:: bash
>
>$ export CROSS_COMPILE=riscv64-linux-
> -  $ make sifive_fu540_defconfig
> +  $ make sifive_unleashed_defconfig
>$ make menuconfig
>
>  then manually update the following configuration in U-Boot:
>
> -  Device Tree Control > Provider of DTB for DT Control > Prior Stage 
> bootloader DTB
> -  RISC-V architecture > Base ISA > RV32I
> -  Boot images > Text Base > 0x8040
> +  * Device Tree Control ---> Provider of DTB for DT Control ---> Prior Stage 
> bootloader DTB
> +  * RISC-V architecture ---> Base ISA ---> RV32I
> +  * Boot options ---> Boot images ---> Text Base ---> 0x8040
> +
> +and unselect the following configuration:
> +
> +  * Library routines ---> Allow access to binman information in the device 
> tree
>
>  Use the same command line options to boot the 32-bit U-Boot S-mode image:
>
> --
> 2.25.1
>
>



RE: [RFC v2 0/2] ui: Add a Wayland backend for Qemu UI (v2)

2021-09-14 Thread Kasireddy, Vivek
Hi Daniel,

> On Mon, Sep 13, 2021 at 03:20:34PM -0700, Vivek Kasireddy wrote:
> > Why does Qemu need a new Wayland UI backend?
> > The main reason why there needs to be a plain and simple Wayland backend
> > for Qemu UI is to eliminate the Blit (aka GPU copy) that happens if using
> > a toolkit like GTK or SDL (because they use EGL). The Blit can be eliminated
> > by sharing the dmabuf fd -- associated with the Guest scanout buffer --
> > directly with the Host compositor via the linux-dmabuf (unstable) protocol.
> > Once properly integrated, it would be potentially possible to have the
> > scanout buffer created by the Guest compositor be placed directly on a
> > hardware plane on the Host thereby improving performance. Only Guest
> > compositors that use multiple back buffers (at-least 1 front and 1 back)
> > and virtio-gpu would benefit from this work.
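
A minimal sketch of what that fd sharing looks like on the client side,
assuming the wayland-scanner generated linux-dmabuf-unstable-v1 bindings and
an already bound zwp_linux_dmabuf_v1 global (illustrative only, not code from
this series):

/*
 * Import a dmabuf fd as a wl_buffer so the compositor can use the guest
 * scanout buffer directly, with no intermediate GPU copy on the host.
 */
#include <stdint.h>
#include <wayland-client.h>
#include "linux-dmabuf-unstable-v1-client-protocol.h"

static struct wl_buffer *import_dmabuf(struct zwp_linux_dmabuf_v1 *dmabuf_proto,
                                       int fd, uint32_t width, uint32_t height,
                                       uint32_t stride, uint32_t fourcc,
                                       uint64_t modifier)
{
    struct zwp_linux_buffer_params_v1 *params;

    params = zwp_linux_dmabuf_v1_create_params(dmabuf_proto);
    zwp_linux_buffer_params_v1_add(params, fd, 0 /* plane */, 0 /* offset */,
                                   stride, (uint32_t)(modifier >> 32),
                                   (uint32_t)(modifier & 0xffffffff));
    /* create_immed skips waiting for the 'created' event; no error handling */
    return zwp_linux_buffer_params_v1_create_immed(params, width, height,
                                                   fourcc, 0 /* flags */);
}
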
> 
> IME, QEMU already suffers from having too many barely maintained UI
> implementations and it is confusing to users. Using a toolkit like GTK
> is generally a good thing, even if they don't enable the maximum
> theoretical performance, because they reduce the long term maint burden.
[Kasireddy, Vivek] The Wayland UI is ~600 lines of code and more than half of
that is just boilerplate; it is also fairly standalone. We don't have any major
complaints against GTK UI (which is ~3000 lines of code, just the Qemu piece
excluding libgtk) except that there is no way to dissociate EGL from it. And, to
keep the UI codebase up-to-date and leverage the latest features from GTK (for
example GTK4 and beyond) would require a non-trivial amount of work.
Therefore, I think it'd not be too onerous to maintain something lightweight 
like
the Wayland UI module.

> 
> I'm far from convinced that we should take on the maint of yet another
> UI in QEMU, even if it does have some performance benefit, especially
[Kasireddy, Vivek] There are novel use-cases coming up particularly with 
the arrival of technologies like SRIOV where the Guest is expected to do all the
rendering and share the fully composited buffer with the Host whose job is to
just present it on the screen. And, in this scenario, if we were to use GTK UI,
the (fullscreen sized) Blits incurred would quickly add up if there are 4-6 
Guests
running and presenting content at the same time locally on multiple monitors.
Wayland UI helps in this situation as it does not do any additional rendering on
the Host (no Blit), as it merely forwards the Guest composited buffer to the
Host compositor.

> if implemented using a very low level API like Wayland, that won't let
> us easily add rich UI features.
[Kasireddy, Vivek] Yes, it is a drawback of Wayland UI that it'd not be 
possible to
do window decorations/rich UI features but there are users/customers that do 
not care
for them. I think it should be possible to have both Wayland and GTK UI co-exist
where users can choose GTK UI for fancy features and Wayland UI for performance.

Thanks,
Vivek

> 
> Regards,
> Daniel
> --
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



RE: [RFC v2 2/2] ui: Add a plain Wayland backend for Qemu UI

2021-09-14 Thread Kasireddy, Vivek
Hi Daniel,
 
> On Mon, Sep 13, 2021 at 03:20:36PM -0700, Vivek Kasireddy wrote:
> > Cc: Gerd Hoffmann 
> > Signed-off-by: Vivek Kasireddy 
> > ---
> >  configure |   8 +-
> >  meson.build   |  33 +++
> >  meson_options.txt |   2 +
> >  qapi/ui.json  |   3 +
> >  ui/meson.build|  52 
> >  ui/wayland.c  | 628 ++
> >  6 files changed, 725 insertions(+), 1 deletion(-)  create mode 100644
> > ui/wayland.c
> 
> 
> > diff --git a/ui/meson.build b/ui/meson.build index
> > a73beb0e54..86fc324c82 100644
> > --- a/ui/meson.build
> > +++ b/ui/meson.build
> > @@ -64,6 +64,58 @@ if config_host.has_key('CONFIG_OPENGL') and gbm.found()
> >ui_modules += {'egl-headless' : egl_headless_ss}  endif
> >
> > +wayland_scanner = find_program('wayland-scanner') proto_sources = [
> > +  ['xdg-shell', 'stable', ],
> > +  ['fullscreen-shell', 'unstable', 'v1', ],
> > +  ['linux-dmabuf', 'unstable', 'v1', ], ] wayland_headers = []
> > +wayland_proto_sources = []
> > +
> > +if wayland.found()
> > +  foreach p: proto_sources
> > +proto_name = p.get(0)
> > +proto_stability = p.get(1)
> > +
> > +if proto_stability == 'stable'
> > +  output_base = proto_name
> > +  input = files(join_paths(wlproto_dir,
> '@0@/@1@/@2@.xml'.format(proto_stability, proto_name, output_base)))
> > +else
> > +  proto_version = p.get(2)
> > +  output_base = '@0@-@1@-@2@'.format(proto_name, proto_stability,
> proto_version)
> > +  input = files(join_paths(wlproto_dir,
> '@0@/@1@/@2@.xml'.format(proto_stability, proto_name, output_base)))
> > +endif
> > +
> > +wayland_headers += custom_target('@0@ client 
> > header'.format(output_base),
> > +  input: input,
> > +  output: '@0@-client-protocol.h'.format(output_base),
> > +  command: [
> > +wayland_scanner,
> > +'client-header',
> > +'@INPUT@', '@OUTPUT@',
> > +  ], build_by_default: true
> > +)
> > +
> > +wayland_proto_sources += custom_target('@0@ 
> > source'.format(output_base),
> > +  input: input,
> > +  output: '@0@-protocol.c'.format(output_base),
> > +  command: [
> > +wayland_scanner,
> > +'private-code',
> > +'@INPUT@', '@OUTPUT@',
> > +  ], build_by_default: true
> > +)
> > +  endforeach
> > +endif
> > +
> > +if wayland.found()
> > +  wayland_ss = ss.source_set()
> > +  wayland_ss.add(when: wayland, if_true: files('wayland.c',
> > +'xdg-shell-protocol.c',
> > +'fullscreen-shell-unstable-v1-protocol.c','linux-dmabuf-unstable-v1-p
> > +rotocol.c'))
> > +  #wayland_ss.add(when: wayland, if_true: files('wayland.c'),
> > +[wayland_proto_sources])
> > +  ui_modules += {'wayland' : wayland_ss} endif
> 
> Configure fails on this
> 
>   Program wayland-scanner found: YES (/usr/bin/wayland-scanner)
> 
>   ../ui/meson.build:114:13: ERROR: File xdg-shell-protocol.c does not exist.
> 
> 
> the code a few lines above generates xdg-shell-protocol.c, but that isn't run 
> until you type
> "make", so when meson is resolving the source files they don't exist.
> 
> The alternative line you have commented out looks more like what we would 
> need, but it
> doesn't work either as its syntax is invalid.
[Kasireddy, Vivek] Right, the commented line is the one we'd need but despite 
exhaustively
trying various different combinations, I couldn't get Meson to include the 
auto-generated
protocol sources. If it is not too much trouble, could you please point me to 
an example
where this is done elsewhere in Qemu source that I can look at?

> 
> How did you actually compile this series ?
[Kasireddy, Vivek] Oh, as a workaround, I just manually added the protocol 
sources. I am
sorry I did not realize that this code would be compiled/tested; I mainly 
posted these
RFC/WIP patches to provide additional context to the discussion associated with 
the DRM/
Virtio-gpu kernel patches.

Thanks,
Vivek

> 
> 
> Regards,
> Daniel
> --
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [PATCH v4 05/16] tcg/s390x: Implement tcg_out_ld/st for vector types

2021-09-14 Thread Richard Henderson

On 9/14/21 3:03 PM, Richard Henderson wrote:

On 9/14/21 9:46 AM, David Hildenbrand wrote:

+    if (likely(data < 16)) {


This actually maps to "if (likely(data <= TCG_REG_R15))", correct?


Sure.


I'm going to add is_general_reg and is_vector_reg predicates.


r~




RE: [PULL 0/6] Vga 20210910 patches

2021-09-14 Thread Kasireddy, Vivek
Hi Gerd, Peter,

> 
> On Fri, Sep 10, 2021 at 05:52:55PM +0100, Peter Maydell wrote:
> > On Fri, 10 Sept 2021 at 14:19, Gerd Hoffmann  wrote:
> > >
> > > The following changes since commit 
> > > bd662023e683850c085e98c8ff8297142c2dd9f2:
> > >
> > >   Merge remote-tracking branch
> > > 'remotes/mcayland/tags/qemu-openbios-20210908' into staging
> > > (2021-09-08 11:06:17 +0100)
> > >
> > > are available in the Git repository at:
> > >
> > >   git://git.kraxel.org/qemu tags/vga-20210910-pull-request
> > >
> > > for you to fetch changes up to 6335c0b56819a5d1219ea84a11a732d0861542db:
> > >
> > >   qxl: fix pre-save logic (2021-09-10 12:23:12 +0200)
> > >
> > > 
> > > virtio-gpu + ui: fence syncronization.
> > > qxl: unbreak live migration.
> > >
> > > 
> >
> > Hi; this fails to build on the ppc64 system:
> >
> > ../../ui/egl-helpers.c:79:6: error: no previous prototype for
> > 'egl_dmabuf_create_sync' [-Werror=missing-prototypes]
> >79 | void egl_dmabuf_create_sync(QemuDmaBuf *dmabuf)
> >   |  ^~
> > ../../ui/egl-helpers.c:95:6: error: no previous prototype for
> > 'egl_dmabuf_create_fence' [-Werror=missing-prototypes]
> >95 | void egl_dmabuf_create_fence(QemuDmaBuf *dmabuf)
> >   |  ^~~
> >
> >
> > The prototype is hidden behind CONFIG_GBM, but the definition is not.
> >
> > Then the callsites fail:
> >
> > ../../ui/gtk-gl-area.c: In function 'gd_gl_area_draw':
> > ../../ui/gtk-gl-area.c:77:9: error: implicit declaration of function
> > 'egl_dmabuf_create_sync' [-Werror=implicit-function-declaration]
> >77 | egl_dmabuf_create_sync(dmabuf);
> >   | ^~
> > ../../ui/gtk-gl-area.c:77:9: error: nested extern declaration of
> > 'egl_dmabuf_create_sync' [-Werror=nested-externs]
> > ../../ui/gtk-gl-area.c:81:9: error: implicit declaration of function
> > 'egl_dmabuf_create_fence' [-Werror=implicit-function-declaration]
> >81 | egl_dmabuf_create_fence(dmabuf);
> >   | ^~~
> > ../../ui/gtk-gl-area.c:81:9: error: nested extern declaration of
> > 'egl_dmabuf_create_fence' [-Werror=nested-externs]
> >
> >
> > ../../ui/gtk-egl.c: In function 'gd_egl_draw':
> > ../../ui/gtk-egl.c:100:9: error: implicit declaration of function
> > 'egl_dmabuf_create_fence' [-Werror=implicit-function-declaration]
> >   100 | egl_dmabuf_create_fence(dmabuf);
> >   | ^~~
> > ../../ui/gtk-egl.c:100:9: error: nested extern declaration of
> > 'egl_dmabuf_create_fence' [-Werror=nested-externs]
> > ../../ui/gtk-egl.c: In function 'gd_egl_scanout_flush':
> > ../../ui/gtk-egl.c:301:9: error: implicit declaration of function
> > 'egl_dmabuf_create_sync' [-Werror=implicit-function-declaration]
> >   301 | egl_dmabuf_create_sync(vc->gfx.guest_fb.dmabuf);
> >   | ^~
> > ../../ui/gtk-egl.c:301:9: error: nested extern declaration of
> > 'egl_dmabuf_create_sync' [-Werror=nested-externs]
> >
> >
> > You can probably repro this on any system which has the opengl
> > libraries installed but not libgbm.
> 
> Vivek?  Can you have a look please?
[Kasireddy, Vivek] I sent a v6 that fixes these compilation errors:
https://lists.nongnu.org/archive/html/qemu-devel/2021-09/msg03859.html

Compile tested the patches with and without GBM.

Thanks,
Vivek




Re: [PATCH v4 05/16] tcg/s390x: Implement tcg_out_ld/st for vector types

2021-09-14 Thread Richard Henderson

On 9/14/21 9:46 AM, David Hildenbrand wrote:

+    if (likely(data < 16)) {


This actually maps to "if (likely(data <= TCG_REG_R15))", correct?


Sure.


r~



Re: [PATCH v4 05/16] tcg/s390x: Implement tcg_out_ld/st for vector types

2021-09-14 Thread Richard Henderson

On 9/14/21 9:46 AM, David Hildenbrand wrote:

+static void tcg_out_insn_VRX(TCGContext *s, S390Opcode op, TCGReg v1,
+ TCGReg b2, TCGReg x2, intptr_t d2, int m3)


Is intptr_t really the right type here? Just curious ... I'd have used a uint16_t and 
asserted "!(d1 & 0xf000)".


It does come from upstream, as part of a host address. If you use uint16_t, the assert 
misses the upper bits being zero because they've been truncated.
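
A hypothetical illustration of that point: once the displacement has been
truncated to uint16_t the out-of-range upper bits are already gone, so only a
check done on the wide type can still reject the value.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    intptr_t d2 = 0x10345;             /* too large for a 12-bit field    */
    uint16_t narrow = (uint16_t)d2;    /* 0x0345: upper bits already lost */

    printf("check on uint16_t: %s\n",
           !(narrow & 0xf000) ? "passes (bug hidden)" : "fails");
    printf("check on intptr_t: %s\n",
           (d2 >= 0 && d2 <= 0xfff) ? "passes" : "fails (as it should)");
    return 0;
}
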



r~



[PATCH v6 2/5] ui/egl: Add egl helpers to help with synchronization

2021-09-14 Thread Vivek Kasireddy
These egl helpers would be used for creating and waiting on
a sync object.

Cc: Gerd Hoffmann 
Reviewed-by: Gerd Hoffmann 
Signed-off-by: Vivek Kasireddy 
---
 include/ui/console.h |  2 ++
 include/ui/egl-helpers.h |  2 ++
 ui/egl-helpers.c | 26 ++
 3 files changed, 30 insertions(+)

diff --git a/include/ui/console.h b/include/ui/console.h
index 3be21497a2..45ec129174 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -168,6 +168,8 @@ typedef struct QemuDmaBuf {
 uint64_t  modifier;
 uint32_t  texture;
 bool  y0_top;
+void  *sync;
+int   fence_fd;
 } QemuDmaBuf;
 
 typedef struct DisplayState DisplayState;
diff --git a/include/ui/egl-helpers.h b/include/ui/egl-helpers.h
index f1bf8f97fc..2c3ba92b53 100644
--- a/include/ui/egl-helpers.h
+++ b/include/ui/egl-helpers.h
@@ -45,6 +45,8 @@ int egl_get_fd_for_texture(uint32_t tex_id, EGLint *stride, 
EGLint *fourcc,
 
 void egl_dmabuf_import_texture(QemuDmaBuf *dmabuf);
 void egl_dmabuf_release_texture(QemuDmaBuf *dmabuf);
+void egl_dmabuf_create_sync(QemuDmaBuf *dmabuf);
+void egl_dmabuf_create_fence(QemuDmaBuf *dmabuf);
 
 #endif
 
diff --git a/ui/egl-helpers.c b/ui/egl-helpers.c
index 6d0cb2b5cb..385a3fa752 100644
--- a/ui/egl-helpers.c
+++ b/ui/egl-helpers.c
@@ -287,6 +287,32 @@ void egl_dmabuf_release_texture(QemuDmaBuf *dmabuf)
 dmabuf->texture = 0;
 }
 
+void egl_dmabuf_create_sync(QemuDmaBuf *dmabuf)
+{
+EGLSyncKHR sync;
+
+if (epoxy_has_egl_extension(qemu_egl_display,
+"EGL_KHR_fence_sync") &&
+epoxy_has_egl_extension(qemu_egl_display,
+"EGL_ANDROID_native_fence_sync")) {
+sync = eglCreateSyncKHR(qemu_egl_display,
+EGL_SYNC_NATIVE_FENCE_ANDROID, NULL);
+if (sync != EGL_NO_SYNC_KHR) {
+dmabuf->sync = sync;
+}
+}
+}
+
+void egl_dmabuf_create_fence(QemuDmaBuf *dmabuf)
+{
+if (dmabuf->sync) {
+dmabuf->fence_fd = eglDupNativeFenceFDANDROID(qemu_egl_display,
+  dmabuf->sync);
+eglDestroySyncKHR(qemu_egl_display, dmabuf->sync);
+dmabuf->sync = NULL;
+}
+}
+
 #endif /* CONFIG_GBM */
 
 /* -- */
-- 
2.30.2




[PATCH v6 5/5] virtio-gpu: Add gl_flushed callback

2021-09-14 Thread Vivek Kasireddy
Adding this callback provides a way to resume the processing of
cmds in fenceq and cmdq that were not processed because the UI
was waiting on a fence and blocked cmd processing.

Cc: Gerd Hoffmann 
Reviewed-by: Gerd Hoffmann 
Signed-off-by: Vivek Kasireddy 
---
 hw/display/virtio-gpu.c | 32 ++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 72da5bf500..182e0868b0 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -985,8 +985,10 @@ void virtio_gpu_simple_process_cmd(VirtIOGPU *g,
 break;
 }
 if (!cmd->finished) {
-virtio_gpu_ctrl_response_nodata(g, cmd, cmd->error ? cmd->error :
-VIRTIO_GPU_RESP_OK_NODATA);
+if (!g->parent_obj.renderer_blocked) {
+virtio_gpu_ctrl_response_nodata(g, cmd, cmd->error ? cmd->error :
+VIRTIO_GPU_RESP_OK_NODATA);
+}
 }
 }
 
@@ -1042,6 +1044,30 @@ void virtio_gpu_process_cmdq(VirtIOGPU *g)
 g->processing_cmdq = false;
 }
 
+static void virtio_gpu_process_fenceq(VirtIOGPU *g)
+{
+struct virtio_gpu_ctrl_command *cmd, *tmp;
+
+QTAILQ_FOREACH_SAFE(cmd, &g->fenceq, next, tmp) {
+trace_virtio_gpu_fence_resp(cmd->cmd_hdr.fence_id);
+virtio_gpu_ctrl_response_nodata(g, cmd, VIRTIO_GPU_RESP_OK_NODATA);
+QTAILQ_REMOVE(&g->fenceq, cmd, next);
+g_free(cmd);
+g->inflight--;
+if (virtio_gpu_stats_enabled(g->parent_obj.conf)) {
+fprintf(stderr, "inflight: %3d (-)\r", g->inflight);
+}
+}
+}
+
+static void virtio_gpu_handle_gl_flushed(VirtIOGPUBase *b)
+{
+VirtIOGPU *g = container_of(b, VirtIOGPU, parent_obj);
+
+virtio_gpu_process_fenceq(g);
+virtio_gpu_process_cmdq(g);
+}
+
 static void virtio_gpu_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOGPU *g = VIRTIO_GPU(vdev);
@@ -1400,10 +1426,12 @@ static void virtio_gpu_class_init(ObjectClass *klass, 
void *data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
 VirtIOGPUClass *vgc = VIRTIO_GPU_CLASS(klass);
+VirtIOGPUBaseClass *vgbc = &vgc->parent;
 
 vgc->handle_ctrl = virtio_gpu_handle_ctrl;
 vgc->process_cmd = virtio_gpu_simple_process_cmd;
 vgc->update_cursor_data = virtio_gpu_update_cursor_data;
+vgbc->gl_flushed = virtio_gpu_handle_gl_flushed;
 
 vdc->realize = virtio_gpu_device_realize;
 vdc->reset = virtio_gpu_reset;
-- 
2.30.2




[PATCH v6 4/5] ui/gtk-egl: Wait for the draw signal for dmabuf blobs

2021-09-14 Thread Vivek Kasireddy
Instead of immediately drawing and submitting, queue and wait
for the draw signal if the dmabuf submitted is a blob.

Cc: Gerd Hoffmann 
Reviewed-by: Gerd Hoffmann 
Signed-off-by: Vivek Kasireddy 
---
 include/ui/gtk.h |  2 ++
 ui/gtk-egl.c | 15 +++
 ui/gtk.c |  2 +-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/ui/gtk.h b/include/ui/gtk.h
index 43854f3509..7d22affd38 100644
--- a/include/ui/gtk.h
+++ b/include/ui/gtk.h
@@ -182,6 +182,8 @@ void gd_egl_cursor_dmabuf(DisplayChangeListener *dcl,
   uint32_t hot_x, uint32_t hot_y);
 void gd_egl_cursor_position(DisplayChangeListener *dcl,
 uint32_t pos_x, uint32_t pos_y);
+void gd_egl_flush(DisplayChangeListener *dcl,
+  uint32_t x, uint32_t y, uint32_t w, uint32_t h);
 void gd_egl_scanout_flush(DisplayChangeListener *dcl,
   uint32_t x, uint32_t y, uint32_t w, uint32_t h);
 void gtk_egl_init(DisplayGLMode mode);
diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index 3a90aeb2b9..72ce5e1f8f 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -309,6 +309,21 @@ void gd_egl_scanout_flush(DisplayChangeListener *dcl,
 eglSwapBuffers(qemu_egl_display, vc->gfx.esurface);
 }
 
+void gd_egl_flush(DisplayChangeListener *dcl,
+  uint32_t x, uint32_t y, uint32_t w, uint32_t h)
+{
+VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
+GtkWidget *area = vc->gfx.drawing_area;
+
+if (vc->gfx.guest_fb.dmabuf) {
+graphic_hw_gl_block(vc->gfx.dcl.con, true);
+gtk_widget_queue_draw_area(area, x, y, w, h);
+return;
+}
+
+gd_egl_scanout_flush(&vc->gfx.dcl, x, y, w, h);
+}
+
 void gtk_egl_init(DisplayGLMode mode)
 {
 GdkDisplay *gdk_display = gdk_display_get_default();
diff --git a/ui/gtk.c b/ui/gtk.c
index 5105c0a33f..b0564d80c1 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -637,7 +637,7 @@ static const DisplayChangeListenerOps dcl_egl_ops = {
 .dpy_gl_scanout_dmabuf   = gd_egl_scanout_dmabuf,
 .dpy_gl_cursor_dmabuf= gd_egl_cursor_dmabuf,
 .dpy_gl_cursor_position  = gd_egl_cursor_position,
-.dpy_gl_update   = gd_egl_scanout_flush,
+.dpy_gl_update   = gd_egl_flush,
 .dpy_gl_release_dmabuf   = gd_gl_release_dmabuf,
 .dpy_has_dmabuf  = gd_has_dmabuf,
 };
-- 
2.30.2




[PATCH v6 1/5] ui/gtk: Create a common release_dmabuf helper

2021-09-14 Thread Vivek Kasireddy
Since the texture release mechanism is same for both gtk-egl
and gtk-glarea, move the helper from gtk-egl to common gtk
code so that it can be shared by both gtk backends.

Cc: Gerd Hoffmann 
Reviewed-by: Gerd Hoffmann 
Signed-off-by: Vivek Kasireddy 
---
 include/ui/gtk.h |  2 --
 ui/gtk-egl.c |  8 
 ui/gtk.c | 11 ++-
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/include/ui/gtk.h b/include/ui/gtk.h
index 7835ef1a71..8e98a79ac8 100644
--- a/include/ui/gtk.h
+++ b/include/ui/gtk.h
@@ -181,8 +181,6 @@ void gd_egl_cursor_dmabuf(DisplayChangeListener *dcl,
   uint32_t hot_x, uint32_t hot_y);
 void gd_egl_cursor_position(DisplayChangeListener *dcl,
 uint32_t pos_x, uint32_t pos_y);
-void gd_egl_release_dmabuf(DisplayChangeListener *dcl,
-   QemuDmaBuf *dmabuf);
 void gd_egl_scanout_flush(DisplayChangeListener *dcl,
   uint32_t x, uint32_t y, uint32_t w, uint32_t h);
 void gtk_egl_init(DisplayGLMode mode);
diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index 2a2e6d3a17..b671181272 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -249,14 +249,6 @@ void gd_egl_cursor_position(DisplayChangeListener *dcl,
 vc->gfx.cursor_y = pos_y * vc->gfx.scale_y;
 }
 
-void gd_egl_release_dmabuf(DisplayChangeListener *dcl,
-   QemuDmaBuf *dmabuf)
-{
-#ifdef CONFIG_GBM
-egl_dmabuf_release_texture(dmabuf);
-#endif
-}
-
 void gd_egl_scanout_flush(DisplayChangeListener *dcl,
   uint32_t x, uint32_t y, uint32_t w, uint32_t h)
 {
diff --git a/ui/gtk.c b/ui/gtk.c
index cfb0728d1f..784a2f6c74 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -575,6 +575,14 @@ static bool gd_has_dmabuf(DisplayChangeListener *dcl)
 return vc->gfx.has_dmabuf;
 }
 
+static void gd_gl_release_dmabuf(DisplayChangeListener *dcl,
+ QemuDmaBuf *dmabuf)
+{
+#ifdef CONFIG_GBM
+egl_dmabuf_release_texture(dmabuf);
+#endif
+}
+
 /** DisplayState Callbacks (opengl version) **/
 
 static const DisplayChangeListenerOps dcl_gl_area_ops = {
@@ -593,6 +601,7 @@ static const DisplayChangeListenerOps dcl_gl_area_ops = {
 .dpy_gl_scanout_disable  = gd_gl_area_scanout_disable,
 .dpy_gl_update   = gd_gl_area_scanout_flush,
 .dpy_gl_scanout_dmabuf   = gd_gl_area_scanout_dmabuf,
+.dpy_gl_release_dmabuf   = gd_gl_release_dmabuf,
 .dpy_has_dmabuf  = gd_has_dmabuf,
 };
 
@@ -615,8 +624,8 @@ static const DisplayChangeListenerOps dcl_egl_ops = {
 .dpy_gl_scanout_dmabuf   = gd_egl_scanout_dmabuf,
 .dpy_gl_cursor_dmabuf= gd_egl_cursor_dmabuf,
 .dpy_gl_cursor_position  = gd_egl_cursor_position,
-.dpy_gl_release_dmabuf   = gd_egl_release_dmabuf,
 .dpy_gl_update   = gd_egl_scanout_flush,
+.dpy_gl_release_dmabuf   = gd_gl_release_dmabuf,
 .dpy_has_dmabuf  = gd_has_dmabuf,
 };
 
-- 
2.30.2




[PATCH v6 0/5] virtio-gpu: Add a default synchronization mechanism for blobs (v6)

2021-09-14 Thread Vivek Kasireddy
When the Guest and Host are using Blob resources, there is a chance
that they may use the underlying storage associated with a Blob at
the same time leading to glitches such as flickering or tearing.
To prevent these from happening, the Host needs to ensure that it
waits until its Blit is completed by the Host GPU before letting
the Guest reuse the Blob.

This should be the default behavior regardless of the type of Guest
that is using Blob resources but would be particularly useful for 
Guests that are using frontbuffer rendering such as some X compositors,
Windows compositors, etc.

The way it works is the Guest submits the resource_flush command and
waits -- for example over a dma fence -- until virtio-gpu sends an ack.
The UI, in turn, queues a new repaint request and waits until the sync
object associated with the Blit is signaled. Once this is done, the UI
will trigger virtio-gpu to send an ack for the resource_flush cmd.
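
A condensed sketch of that flow on the UI side, close to what patches 3 and 4
below implement (error handling and the non-blob path are omitted; this is an
illustration, not the exact patch code):

/* after rendering the guest blob in the draw handler */
static void ui_draw_blob_sketch(VirtualConsole *vc)
{
    QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;

    egl_dmabuf_create_sync(dmabuf);      /* EGL sync object for the Blit  */
    glFlush();
    egl_dmabuf_create_fence(dmabuf);     /* dup a native fence fd from it */

    if (dmabuf->fence_fd > 0) {
        /*
         * Let the main loop call back once the host GPU signals the fence;
         * until then virtio-gpu stays blocked and the guest's resource_flush
         * is not acked.
         */
        qemu_set_fd_handler(dmabuf->fence_fd, gd_hw_gl_flushed, NULL, vc);
        return;
    }

    graphic_hw_gl_block(vc->gfx.dcl.con, false);
    graphic_hw_gl_flushed(vc->gfx.dcl.con);  /* no fence: ack right away */
}
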

v2:
- Added more description in the cover letter
- Removed the wait from resource_flush and included it in
  a gl_flushed() callback

v3:
- Instead of explicitly waiting on the sync object and stalling the
  thread, add the relevant fence fd to Qemu's main loop and wait
  for it to be signalled. (suggested by Gerd Hoffmann)

v4:
- Replace the field 'blob' with 'allow_fences' in QemuDmabuf struct.
  (Gerd)

v5: rebase

v6: Fixed the compilation error on platforms that do not have GBM.

Cc: Gerd Hoffmann 
Cc: Dongwon Kim 

Vivek Kasireddy (5):
  ui/gtk: Create a common release_dmabuf helper
  ui/egl: Add egl helpers to help with synchronization
  ui: Create sync objects and fences only for blobs
  ui/gtk-egl: Wait for the draw signal for dmabuf blobs
  virtio-gpu: Add gl_flushed callback

 hw/display/virtio-gpu-udmabuf.c |  1 +
 hw/display/virtio-gpu.c | 32 --
 include/ui/console.h|  3 +++
 include/ui/egl-helpers.h|  3 +++
 include/ui/gtk.h|  5 ++--
 ui/egl-helpers.c| 26 ++
 ui/gtk-egl.c| 48 +++--
 ui/gtk-gl-area.c| 26 ++
 ui/gtk.c| 26 --
 9 files changed, 156 insertions(+), 14 deletions(-)

-- 
2.30.2




[PATCH v6 3/5] ui: Create sync objects and fences only for blobs

2021-09-14 Thread Vivek Kasireddy
Create sync objects and fences only for dmabufs that are blobs. Once a
fence is created (after glFlush) and is signalled,
graphic_hw_gl_flushed() will be called and virtio-gpu cmd processing
will be resumed.

Cc: Gerd Hoffmann 
Signed-off-by: Vivek Kasireddy 
---
 hw/display/virtio-gpu-udmabuf.c |  1 +
 include/ui/console.h|  1 +
 include/ui/egl-helpers.h|  1 +
 include/ui/gtk.h|  1 +
 ui/gtk-egl.c| 25 +
 ui/gtk-gl-area.c| 26 ++
 ui/gtk.c| 13 +
 7 files changed, 68 insertions(+)

diff --git a/hw/display/virtio-gpu-udmabuf.c b/hw/display/virtio-gpu-udmabuf.c
index 3c01a415e7..c6f7f58784 100644
--- a/hw/display/virtio-gpu-udmabuf.c
+++ b/hw/display/virtio-gpu-udmabuf.c
@@ -185,6 +185,7 @@ static VGPUDMABuf
 dmabuf->buf.stride = fb->stride;
 dmabuf->buf.fourcc = qemu_pixman_to_drm_format(fb->format);
 dmabuf->buf.fd = res->dmabuf_fd;
+dmabuf->buf.allow_fences = true;
 
 dmabuf->scanout_id = scanout_id;
 QTAILQ_INSERT_HEAD(&g->dmabuf.bufs, dmabuf, next);
diff --git a/include/ui/console.h b/include/ui/console.h
index 45ec129174..244664d727 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -170,6 +170,7 @@ typedef struct QemuDmaBuf {
 bool  y0_top;
 void  *sync;
 int   fence_fd;
+bool  allow_fences;
 } QemuDmaBuf;
 
 typedef struct DisplayState DisplayState;
diff --git a/include/ui/egl-helpers.h b/include/ui/egl-helpers.h
index 2c3ba92b53..2fb6e0dd6b 100644
--- a/include/ui/egl-helpers.h
+++ b/include/ui/egl-helpers.h
@@ -19,6 +19,7 @@ typedef struct egl_fb {
 GLuint texture;
 GLuint framebuffer;
 bool delete_texture;
+QemuDmaBuf *dmabuf;
 } egl_fb;
 
 void egl_fb_destroy(egl_fb *fb);
diff --git a/include/ui/gtk.h b/include/ui/gtk.h
index 8e98a79ac8..43854f3509 100644
--- a/include/ui/gtk.h
+++ b/include/ui/gtk.h
@@ -155,6 +155,7 @@ extern bool gtk_use_gl_area;
 /* ui/gtk.c */
 void gd_update_windowsize(VirtualConsole *vc);
 int gd_monitor_update_interval(GtkWidget *widget);
+void gd_hw_gl_flushed(void *vc);
 
 /* ui/gtk-egl.c */
 void gd_egl_init(VirtualConsole *vc);
diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index b671181272..3a90aeb2b9 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -12,6 +12,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 
 #include "trace.h"
 
@@ -94,6 +95,18 @@ void gd_egl_draw(VirtualConsole *vc)
 }
 
 glFlush();
+#ifdef CONFIG_GBM
+if (vc->gfx.guest_fb.dmabuf) {
+QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
+
+egl_dmabuf_create_fence(dmabuf);
+if (dmabuf->fence_fd > 0) {
+qemu_set_fd_handler(dmabuf->fence_fd, gd_hw_gl_flushed, NULL, vc);
+return;
+}
+graphic_hw_gl_block(vc->gfx.dcl.con, false);
+}
+#endif
 graphic_hw_gl_flushed(vc->gfx.dcl.con);
 }
 
@@ -209,6 +222,8 @@ void gd_egl_scanout_dmabuf(DisplayChangeListener *dcl,
QemuDmaBuf *dmabuf)
 {
 #ifdef CONFIG_GBM
+VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
+
 egl_dmabuf_import_texture(dmabuf);
 if (!dmabuf->texture) {
 return;
@@ -217,6 +232,10 @@ void gd_egl_scanout_dmabuf(DisplayChangeListener *dcl,
 gd_egl_scanout_texture(dcl, dmabuf->texture,
false, dmabuf->width, dmabuf->height,
0, 0, dmabuf->width, dmabuf->height);
+
+if (dmabuf->allow_fences) {
+vc->gfx.guest_fb.dmabuf = dmabuf;
+}
 #endif
 }
 
@@ -281,6 +300,12 @@ void gd_egl_scanout_flush(DisplayChangeListener *dcl,
 egl_fb_blit(&vc->gfx.win_fb, &vc->gfx.guest_fb, !vc->gfx.y0_top);
 }
 
+#ifdef CONFIG_GBM
+if (vc->gfx.guest_fb.dmabuf) {
+egl_dmabuf_create_sync(vc->gfx.guest_fb.dmabuf);
+}
+#endif
+
 eglSwapBuffers(qemu_egl_display, vc->gfx.esurface);
 }
 
diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
index dd5783fec7..b23523748e 100644
--- a/ui/gtk-gl-area.c
+++ b/ui/gtk-gl-area.c
@@ -8,6 +8,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 
 #include "trace.h"
 
@@ -71,7 +72,25 @@ void gd_gl_area_draw(VirtualConsole *vc)
 surface_gl_render_texture(vc->gfx.gls, vc->gfx.ds);
 }
 
+#ifdef CONFIG_GBM
+if (vc->gfx.guest_fb.dmabuf) {
+egl_dmabuf_create_sync(vc->gfx.guest_fb.dmabuf);
+}
+#endif
+
 glFlush();
+#ifdef CONFIG_GBM
+if (vc->gfx.guest_fb.dmabuf) {
+QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
+
+egl_dmabuf_create_fence(dmabuf);
+if (dmabuf->fence_fd > 0) {
+qemu_set_fd_handler(dmabuf->fence_fd, gd_hw_gl_flushed, NULL, vc);
+return;
+}
+graphic_hw_gl_block(vc->gfx.dcl.con, false);
+}
+#endif
 graphic_hw_gl_flushed(vc->gfx.dcl.con);
 }
 
@@ -213,6 +232,9 @@ void gd_gl_area_scanout_flush(DisplayChangeListener *dcl,
 {
 VirtualConsole *vc = 

[PATCH RFC 10/13] hw/nvme: add experimental object x-nvme-subsystem

2021-09-14 Thread Klaus Jensen
From: Klaus Jensen 

Add a basic user creatable object that models an NVMe NVM subsystem.

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   |  26 +++---
 hw/nvme/ns.c |   5 +-
 hw/nvme/nvme.h   |  30 
 hw/nvme/subsys.c | 121 +++
 qapi/qom.json|  17 +++
 5 files changed, 162 insertions(+), 37 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index ec63338b5bfc..563a8f8ad1df 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6526,7 +6526,7 @@ static int nvme_init_subsys(NvmeState *n, Error **errp)
 return 0;
 }
 
-cntlid = nvme_subsys_register_ctrl(n, errp);
+cntlid = nvme_subsys_register_ctrl(n->subsys, n, errp);
 if (cntlid < 0) {
 return -1;
 }
@@ -6557,6 +6557,12 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 return;
 }
 
+if (!n->subsys) {
+error_setg(errp, "device '%s' requires the 'subsys' parameter",
+   TYPE_NVME_DEVICE);
+return;
+}
+
 nvme_init_state(n);
 if (nvme_init_pci(n, pci_dev, errp)) {
 return;
@@ -6574,10 +6580,14 @@ static void nvme_legacy_realize(PCIDevice *pci_dev, 
Error **errp)
 NvmeState *n = NVME_STATE(pci_dev);
 NvmeCtrlLegacyDevice *ctrl = NVME_DEVICE_LEGACY(n);
 
-if (ctrl->namespace.blkconf.blk && n->subsys) {
-error_setg(errp, "subsystem support is unavailable with legacy "
-   "namespace ('drive' property)");
-return;
+if (ctrl->subsys_dev) {
+if (ctrl->namespace.blkconf.blk) {
+error_setg(errp, "subsystem support is unavailable with legacy "
+   "namespace ('drive' property)");
+return;
+}
+
+n->subsys = &ctrl->subsys_dev->subsys;
 }
 
 if (nvme_check_constraints(n, errp)) {
@@ -6647,8 +6657,6 @@ static void nvme_exit(PCIDevice *pci_dev)
 static Property nvme_state_props[] = {
 DEFINE_PROP_LINK("pmrdev", NvmeState, pmr.dev, TYPE_MEMORY_BACKEND,
  HostMemoryBackend *),
-DEFINE_PROP_LINK("subsys", NvmeState, subsys, TYPE_NVME_SUBSYS,
- NvmeSubsystem *),
 DEFINE_PROP_STRING("serial", NvmeState, params.serial),
 DEFINE_PROP_UINT8("aerl", NvmeState, params.aerl, 3),
 DEFINE_PROP_UINT8("mdts", NvmeState, params.mdts, 7),
@@ -6657,6 +6665,8 @@ static Property nvme_state_props[] = {
 };
 
 static Property nvme_props[] = {
+DEFINE_PROP_LINK("subsys", NvmeState, subsys, TYPE_NVME_SUBSYSTEM,
+ NvmeSubsystem *),
 DEFINE_PROP_UINT32("cmb-size-mb", NvmeState, params.cmb_size_mb, 0),
 DEFINE_PROP_UINT32("max-aen-retention", NvmeState, params.aer_max_queued, 
64),
 DEFINE_PROP_UINT32("max-ioqpairs", NvmeState, params.max_ioqpairs, 64),
@@ -6674,6 +6684,8 @@ static Property nvme_props[] = {
 
 static Property nvme_legacy_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeCtrlLegacyDevice, namespace.blkconf),
+DEFINE_PROP_LINK("subsys", NvmeCtrlLegacyDevice, subsys_dev,
+ TYPE_NVME_SUBSYS_DEVICE, NvmeSubsystemDevice *),
 DEFINE_PROP_UINT32("cmb_size_mb", NvmeState, params.cmb_size_mb, 0),
 DEFINE_PROP_UINT32("num_queues", NvmeState, params.num_queues, 0),
 DEFINE_PROP_UINT32("aer_max_queued", NvmeState, params.aer_max_queued, 64),
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index bdd41a3d1fc3..3d643554644c 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -493,7 +493,8 @@ static void nvme_nsdev_realize(DeviceState *dev, Error 
**errp)
 NvmeNamespaceDevice *nsdev = NVME_NAMESPACE_DEVICE(dev);
 NvmeNamespace *ns = >ns;
 BusState *s = qdev_get_parent_bus(dev);
-NvmeState *n = NVME_STATE(s->parent);
+NvmeCtrlLegacyDevice *ctrl = NVME_DEVICE_LEGACY(s->parent);
+NvmeState *n = NVME_STATE(ctrl);
 NvmeSubsystem *subsys = n->subsys;
 uint32_t nsid = nsdev->params.nsid;
 int i;
@@ -515,7 +516,7 @@ static void nvme_nsdev_realize(DeviceState *dev, Error 
**errp)
  * If this namespace belongs to a subsystem (through a link on the
  * controller device), reparent the device.
  */
-if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
+if (!qdev_set_parent_bus(dev, &ctrl->subsys_dev->bus.parent_bus, 
errp)) {
 return;
 }
 }
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 629a8ccab9f8..1ae185139132 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -48,24 +48,36 @@ typedef struct NvmeBus {
 BusState parent_bus;
 } NvmeBus;
 
-#define TYPE_NVME_SUBSYS "nvme-subsys"
-#define NVME_SUBSYS(obj) \
-OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
+#define TYPE_NVME_SUBSYSTEM "x-nvme-subsystem"
+OBJECT_DECLARE_SIMPLE_TYPE(NvmeSubsystem, NVME_SUBSYSTEM)
 
 typedef struct NvmeSubsystem {
-DeviceState parent_obj;
-NvmeBus bus;
-uint8_t subnqn[256];
+Object parent_obj;
+
+QemuUUID uuid;
+uint8_t  subnqn[256];
 
 NvmeState 

[PATCH RFC 11/13] hw/nvme: add experimental abstract object x-nvme-ns

2021-09-14 Thread Klaus Jensen
From: Klaus Jensen 

Add the abstract NvmeNamespace object to base proper namespace types on.

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ns.c | 286 +++
 hw/nvme/nvme.h   |  24 
 hw/nvme/subsys.c |  31 +
 qapi/qom.json|  18 +++
 4 files changed, 359 insertions(+)

diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 3d643554644c..05828fbb48a5 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -13,9 +13,13 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/cutils.h"
+#include "qemu/ctype.h"
 #include "qemu/units.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
+#include "qapi/qapi-builtin-visit.h"
+#include "qom/object_interfaces.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/block-backend.h"
 
@@ -638,8 +642,290 @@ static const TypeInfo nvme_nsdev_info = {
 .instance_init = nvme_nsdev_instance_init,
 };
 
+bool nvme_ns_prop_writable(Object *obj, const char *name, Error **errp)
+{
+NvmeNamespace *ns = NVME_NAMESPACE(obj);
+
+if (ns->realized) {
+error_setg(errp, "attempt to set immutable property '%s' on "
+   "active namespace", name);
+return false;
+}
+
+return true;
+}
+
+static void set_attached(Object *obj, Visitor *v, const char *name,
+ void *opaque, Error **errp)
+{
+NvmeNamespace *ns = NVME_NAMESPACE(obj);
+
+if (!nvme_ns_prop_writable(obj, name, errp)) {
+return;
+}
+
+visit_type_strList(v, name, >_ctrls, errp);
+}
+
+static void get_attached(Object *obj, Visitor *v, const char *name,
+ void *opaque, Error **errp)
+{
+NvmeNamespace *ns = NVME_NAMESPACE(obj);
+strList *paths = NULL;
+strList **tail = &paths;
+int cntlid;
+
+for (cntlid = 0; cntlid < ARRAY_SIZE(ns->subsys->ctrls); cntlid++) {
+NvmeState *ctrl = nvme_subsys_ctrl(ns->subsys, cntlid);
+if (!ctrl || !nvme_ns(ctrl, ns->nsid)) {
+continue;
+}
+
+QAPI_LIST_APPEND(tail, object_get_canonical_path(OBJECT(ctrl)));
+}
+
+visit_type_strList(v, name, &paths, errp);
+qapi_free_strList(paths);
+}
+
+static void get_nsid(Object *obj, Visitor *v, const char *name, void *opaque,
+ Error **errp)
+{
+NvmeNamespace *ns = NVME_NAMESPACE(obj);
+uint32_t value = ns->nsid;
+
+visit_type_uint32(v, name, &value, errp);
+}
+
+static void set_nsid(Object *obj, Visitor *v, const char *name, void *opaque,
+ Error **errp)
+{
+NvmeNamespace *ns = NVME_NAMESPACE(obj);
+uint32_t value;
+
+if (!nvme_ns_prop_writable(obj, name, errp)) {
+return;
+}
+
+if (!visit_type_uint32(v, name, &value, errp)) {
+return;
+}
+
+if (value > NVME_MAX_NAMESPACES) {
+error_setg(errp, "invalid namespace identifier");
+return;
+}
+
+ns->nsid = value;
+}
+
+static char *get_uuid(Object *obj, Error **errp)
+{
+NvmeNamespace *ns = NVME_NAMESPACE(obj);
+
+char *str = g_malloc(UUID_FMT_LEN + 1);
+
+qemu_uuid_unparse(&ns->uuid, str);
+
+return str;
+}
+
+static void set_uuid(Object *obj, const char *v, Error **errp)
+{
+NvmeNamespace *ns = NVME_NAMESPACE(obj);
+
+if (!nvme_ns_prop_writable(obj, "uuid", errp)) {
+return;
+}
+
+if (!strcmp(v, "auto")) {
+qemu_uuid_generate(&ns->uuid);
+} else if (qemu_uuid_parse(v, &ns->uuid) < 0) {
+error_setg(errp, "invalid uuid");
+}
+}
+
+static char *get_eui64(Object *obj, Error **errp)
+{
+NvmeNamespace *ns = NVME_NAMESPACE(obj);
+
+const int len = 2 * 8 + 7 + 1; /* "aa:bb:cc:dd:ee:ff:gg:hh\0" */
+char *str = g_malloc(len);
+
+snprintf(str, len, "%02x:%02x:%02x:%02x:%02x:%02x:%02x:%02x",
+ ns->eui64.a[0], ns->eui64.a[1], ns->eui64.a[2], ns->eui64.a[3],
+ ns->eui64.a[4], ns->eui64.a[5], ns->eui64.a[6], ns->eui64.a[7]);
+
+return str;
+}
+
+static void set_eui64(Object *obj, const char *v, Error **errp)
+{
+NvmeNamespace *ns = NVME_NAMESPACE(obj);
+
+int i, pos;
+
+if (!nvme_ns_prop_writable(obj, "eui64", errp)) {
+return;
+}
+
+if (!strcmp(v, "auto")) {
+ns->eui64.a[0] = 0x52;
+ns->eui64.a[1] = 0x54;
+ns->eui64.a[2] = 0x00;
+
+for (i = 0; i < 5; ++i) {
+ns->eui64.a[3 + i] = g_random_int();
+}
+
+return;
+}
+
+for (i = 0, pos = 0; i < 8; i++, pos += 3) {
+long octet;
+
+if (!(qemu_isxdigit(v[pos]) && qemu_isxdigit(v[pos + 1]))) {
+goto invalid;
+}
+
+if (i == 7) {
+if (v[pos + 2] != '\0') {
+goto invalid;
+}
+} else {
+if (!(v[pos + 2] == ':' || v[pos + 2] == '-')) {
+goto invalid;
+}
+}
+
+if (qemu_strtol(v + pos, NULL, 16, &octet) < 0 || octet > 0xff) {
+goto invalid;
+}
+
+ns->eui64.a[i] = octet;
+}
+
+return;
+
+invalid:
+

[PATCH RFC 09/13] hw/nvme: add experimental device x-nvme-ctrl

2021-09-14 Thread Klaus Jensen
From: Klaus Jensen 

Add a new experimental 'x-nvme-ctrl' device which allows us to get rid
of a bunch of legacy options and slightly change others to better use
the qdev property system.

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 111 +
 hw/nvme/nvme.h |  11 -
 2 files changed, 93 insertions(+), 29 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 6a4f07b8d114..ec63338b5bfc 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6551,8 +6551,28 @@ void nvme_attach_ns(NvmeState *n, NvmeNamespace *ns)
 
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
-NvmeCtrl *ctrl = NVME_DEVICE(pci_dev);
-NvmeState *n = NVME_STATE(ctrl);
+NvmeState *n = NVME_STATE(pci_dev);
+
+if (nvme_check_constraints(n, errp)) {
+return;
+}
+
+nvme_init_state(n);
+if (nvme_init_pci(n, pci_dev, errp)) {
+return;
+}
+
+if (nvme_init_subsys(n, errp)) {
+return;
+}
+
+nvme_init_ctrl(n, pci_dev);
+}
+
+static void nvme_legacy_realize(PCIDevice *pci_dev, Error **errp)
+{
+NvmeState *n = NVME_STATE(pci_dev);
+NvmeCtrlLegacyDevice *ctrl = NVME_DEVICE_LEGACY(n);
 
 if (ctrl->namespace.blkconf.blk && n->subsys) {
 error_setg(errp, "subsystem support is unavailable with legacy "
@@ -6575,6 +6595,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 if (nvme_init_subsys(n, errp)) {
 return;
 }
+
 nvme_init_ctrl(n, pci_dev);
 
 /* setup a namespace if the controller drive property was given */
@@ -6629,24 +6650,40 @@ static Property nvme_state_props[] = {
 DEFINE_PROP_LINK("subsys", NvmeState, subsys, TYPE_NVME_SUBSYS,
  NvmeSubsystem *),
 DEFINE_PROP_STRING("serial", NvmeState, params.serial),
-DEFINE_PROP_UINT32("cmb_size_mb", NvmeState, params.cmb_size_mb, 0),
-DEFINE_PROP_UINT32("num_queues", NvmeState, params.num_queues, 0),
-DEFINE_PROP_UINT32("max_ioqpairs", NvmeState, params.max_ioqpairs, 64),
-DEFINE_PROP_UINT16("msix_qsize", NvmeState, params.msix_qsize, 65),
 DEFINE_PROP_UINT8("aerl", NvmeState, params.aerl, 3),
-DEFINE_PROP_UINT32("aer_max_queued", NvmeState, params.aer_max_queued, 64),
 DEFINE_PROP_UINT8("mdts", NvmeState, params.mdts, 7),
-DEFINE_PROP_UINT8("vsl", NvmeState, params.vsl, 7),
-DEFINE_PROP_BOOL("use-intel-id", NvmeState, params.use_intel_id, false),
 DEFINE_PROP_BOOL("legacy-cmb", NvmeState, params.legacy_cmb, false),
-DEFINE_PROP_UINT8("zoned.zasl", NvmeState, params.zasl, 0),
-DEFINE_PROP_BOOL("zoned.auto_transition", NvmeState,
- params.auto_transition_zones, true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
 static Property nvme_props[] = {
-DEFINE_BLOCK_PROPERTIES(NvmeCtrl, namespace.blkconf),
+DEFINE_PROP_UINT32("cmb-size-mb", NvmeState, params.cmb_size_mb, 0),
+DEFINE_PROP_UINT32("max-aen-retention", NvmeState, params.aer_max_queued, 
64),
+DEFINE_PROP_UINT32("max-ioqpairs", NvmeState, params.max_ioqpairs, 64),
+DEFINE_PROP_UINT16("msix-vectors", NvmeState, params.msix_qsize, 2048),
+
+/* nvm command set specific properties */
+DEFINE_PROP_UINT8("nvm-vsl", NvmeState, params.vsl, 7),
+
+/* zoned command set specific properties */
+DEFINE_PROP_UINT8("zoned-zasl", NvmeState, params.zasl, 0),
+DEFINE_PROP_BOOL("zoned-auto-transition-zones", NvmeState,
+ params.auto_transition_zones, true),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static Property nvme_legacy_props[] = {
+DEFINE_BLOCK_PROPERTIES(NvmeCtrlLegacyDevice, namespace.blkconf),
+DEFINE_PROP_UINT32("cmb_size_mb", NvmeState, params.cmb_size_mb, 0),
+DEFINE_PROP_UINT32("num_queues", NvmeState, params.num_queues, 0),
+DEFINE_PROP_UINT32("aer_max_queued", NvmeState, params.aer_max_queued, 64),
+DEFINE_PROP_UINT32("max_ioqpairs", NvmeState, params.max_ioqpairs, 64),
+DEFINE_PROP_UINT16("msix_qsize", NvmeState, params.msix_qsize, 65),
+DEFINE_PROP_BOOL("use-intel-id", NvmeState, params.use_intel_id, false),
+DEFINE_PROP_UINT8("vsl", NvmeState, params.vsl, 7),
+DEFINE_PROP_UINT8("zoned.zasl", NvmeState, params.zasl, 0),
+DEFINE_PROP_BOOL("zoned.auto_transition", NvmeState,
+ params.auto_transition_zones, true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -6702,7 +6739,6 @@ static void nvme_state_class_init(ObjectClass *oc, void 
*data)
 DeviceClass *dc = DEVICE_CLASS(oc);
 PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
 
-pc->realize = nvme_realize;
 pc->exit = nvme_exit;
 pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
 pc->revision = 2;
@@ -6736,25 +6772,45 @@ static const TypeInfo nvme_state_info = {
 static void nvme_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
+PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
+
+pc->realize = nvme_realize;
+
 device_class_set_props(dc, nvme_props);
 }
 
-static 

[PATCH RFC 13/13] hw/nvme: add attached-namespaces prop

2021-09-14 Thread Klaus Jensen
From: Klaus Jensen 

Add a runtime property to get a list of attached namespaces per
controller.

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 04e564ad6be6..ed867384e40a 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6746,6 +6746,27 @@ static void nvme_set_smart_warning(Object *obj, Visitor 
*v, const char *name,
 }
 }
 
+static void get_attached_namespaces(Object *obj, Visitor *v, const char *name,
+void *opaque, Error **errp)
+{
+NvmeState *n = NVME_STATE(obj);
+strList *paths = NULL;
+strList **tail = &paths;
+int nsid;
+
+for (nsid = 1; nsid <= NVME_MAX_NAMESPACES; nsid++) {
+NvmeNamespace *ns = nvme_ns(n, nsid);
+if (!ns) {
+continue;
+}
+
+QAPI_LIST_APPEND(tail, object_get_canonical_path(OBJECT(ns)));
+}
+
+visit_type_strList(v, name, &paths, errp);
+qapi_free_strList(paths);
+}
+
 static const VMStateDescription nvme_vmstate = {
 .name = "nvme",
 .unmigratable = 1,
@@ -6771,6 +6792,9 @@ static void nvme_state_instance_init(Object *obj)
 object_property_add(obj, "smart_critical_warning", "uint8",
 nvme_get_smart_warning,
 nvme_set_smart_warning, NULL, NULL);
+
+object_property_add(obj, "attached-namespaces", "str",
+get_attached_namespaces, NULL, NULL, NULL);
 }
 
 static const TypeInfo nvme_state_info = {
-- 
2.33.0




[PATCH RFC 08/13] hw/nvme: hoist qdev state from controller

2021-09-14 Thread Klaus Jensen
From: Klaus Jensen 

Add an abstract object NvmeState.

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 382 +--
 hw/nvme/dif.c|   4 +-
 hw/nvme/dif.h|   2 +-
 hw/nvme/ns.c |   4 +-
 hw/nvme/nvme.h   |  52 ---
 hw/nvme/subsys.c |   4 +-
 6 files changed, 239 insertions(+), 209 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 966fba605d79..6a4f07b8d114 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -323,7 +323,7 @@ static int nvme_ns_zoned_aor_check(NvmeNamespaceZoned 
*zoned, uint32_t act,
 return NVME_SUCCESS;
 }
 
-static bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
+static bool nvme_addr_is_cmb(NvmeState *n, hwaddr addr)
 {
 hwaddr hi, lo;
 
@@ -337,13 +337,13 @@ static bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
 return addr >= lo && addr < hi;
 }
 
-static inline void *nvme_addr_to_cmb(NvmeCtrl *n, hwaddr addr)
+static inline void *nvme_addr_to_cmb(NvmeState *n, hwaddr addr)
 {
 hwaddr base = n->params.legacy_cmb ? n->cmb.mem.addr : n->cmb.cba;
 return &n->cmb.buf[addr - base];
 }
 
-static bool nvme_addr_is_pmr(NvmeCtrl *n, hwaddr addr)
+static bool nvme_addr_is_pmr(NvmeState *n, hwaddr addr)
 {
 hwaddr hi;
 
@@ -356,12 +356,12 @@ static bool nvme_addr_is_pmr(NvmeCtrl *n, hwaddr addr)
 return addr >= n->pmr.cba && addr < hi;
 }
 
-static inline void *nvme_addr_to_pmr(NvmeCtrl *n, hwaddr addr)
+static inline void *nvme_addr_to_pmr(NvmeState *n, hwaddr addr)
 {
 return memory_region_get_ram_ptr(&n->pmr.dev->mr) + (addr - n->pmr.cba);
 }
 
-static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
+static int nvme_addr_read(NvmeState *n, hwaddr addr, void *buf, int size)
 {
 hwaddr hi = addr + size - 1;
 if (hi < addr) {
@@ -381,7 +381,7 @@ static int nvme_addr_read(NvmeCtrl *n, hwaddr addr, void 
*buf, int size)
 return pci_dma_read(&n->parent_obj, addr, buf, size);
 }
 
-static int nvme_addr_write(NvmeCtrl *n, hwaddr addr, void *buf, int size)
+static int nvme_addr_write(NvmeState *n, hwaddr addr, void *buf, int size)
 {
 hwaddr hi = addr + size - 1;
 if (hi < addr) {
@@ -401,18 +401,18 @@ static int nvme_addr_write(NvmeCtrl *n, hwaddr addr, void 
*buf, int size)
 return pci_dma_write(>parent_obj, addr, buf, size);
 }
 
-static bool nvme_nsid_valid(NvmeCtrl *n, uint32_t nsid)
+static bool nvme_nsid_valid(NvmeState *n, uint32_t nsid)
 {
 return nsid &&
 (nsid == NVME_NSID_BROADCAST || nsid <= NVME_MAX_NAMESPACES);
 }
 
-static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
+static int nvme_check_sqid(NvmeState *n, uint16_t sqid)
 {
 return sqid < n->params.max_ioqpairs + 1 && n->sq[sqid] != NULL ? 0 : -1;
 }
 
-static int nvme_check_cqid(NvmeCtrl *n, uint16_t cqid)
+static int nvme_check_cqid(NvmeState *n, uint16_t cqid)
 {
 return cqid < n->params.max_ioqpairs + 1 && n->cq[cqid] != NULL ? 0 : -1;
 }
@@ -441,7 +441,7 @@ static uint8_t nvme_sq_empty(NvmeSQueue *sq)
 return sq->head == sq->tail;
 }
 
-static void nvme_irq_check(NvmeCtrl *n)
+static void nvme_irq_check(NvmeState *n)
 {
 uint32_t intms = ldl_le_p(&n->bar.intms);
 
@@ -455,7 +455,7 @@ static void nvme_irq_check(NvmeCtrl *n)
 }
 }
 
-static void nvme_irq_assert(NvmeCtrl *n, NvmeCQueue *cq)
+static void nvme_irq_assert(NvmeState *n, NvmeCQueue *cq)
 {
 if (cq->irq_enabled) {
 if (msix_enabled(&(n->parent_obj))) {
@@ -472,7 +472,7 @@ static void nvme_irq_assert(NvmeCtrl *n, NvmeCQueue *cq)
 }
 }
 
-static void nvme_irq_deassert(NvmeCtrl *n, NvmeCQueue *cq)
+static void nvme_irq_deassert(NvmeState *n, NvmeCQueue *cq)
 {
 if (cq->irq_enabled) {
 if (msix_enabled(&(n->parent_obj))) {
@@ -496,7 +496,7 @@ static void nvme_req_clear(NvmeRequest *req)
 req->status = NVME_SUCCESS;
 }
 
-static inline void nvme_sg_init(NvmeCtrl *n, NvmeSg *sg, bool dma)
+static inline void nvme_sg_init(NvmeState *n, NvmeSg *sg, bool dma)
 {
 if (dma) {
 pci_dma_sglist_init(>qsg, >parent_obj, 0);
@@ -574,7 +574,7 @@ static void nvme_sg_split(NvmeSg *sg, NvmeNamespaceNvm 
*nvm, NvmeSg *data,
 }
 }
 
-static uint16_t nvme_map_addr_cmb(NvmeCtrl *n, QEMUIOVector *iov, hwaddr addr,
+static uint16_t nvme_map_addr_cmb(NvmeState *n, QEMUIOVector *iov, hwaddr addr,
   size_t len)
 {
 if (!len) {
@@ -592,7 +592,7 @@ static uint16_t nvme_map_addr_cmb(NvmeCtrl *n, QEMUIOVector 
*iov, hwaddr addr,
 return NVME_SUCCESS;
 }
 
-static uint16_t nvme_map_addr_pmr(NvmeCtrl *n, QEMUIOVector *iov, hwaddr addr,
+static uint16_t nvme_map_addr_pmr(NvmeState *n, QEMUIOVector *iov, hwaddr addr,
   size_t len)
 {
 if (!len) {
@@ -608,7 +608,7 @@ static uint16_t nvme_map_addr_pmr(NvmeCtrl *n, QEMUIOVector 
*iov, hwaddr addr,
 return NVME_SUCCESS;
 }
 
-static uint16_t nvme_map_addr(NvmeCtrl *n, NvmeSg *sg, hwaddr addr, size_t len)
+static uint16_t nvme_map_addr(NvmeState *n, NvmeSg *sg, 

[PATCH RFC 05/13] hw/nvme: move BlockBackend to NvmeNamespaceNvm

2021-09-14 Thread Klaus Jensen
From: Klaus Jensen 

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 66 +-
 hw/nvme/dif.c  | 14 +--
 hw/nvme/nvme.h |  6 +
 3 files changed, 46 insertions(+), 40 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 7f41181aafa1..f05d85075f08 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1465,7 +1465,7 @@ static int nvme_block_status_all(NvmeNamespace *ns, 
uint64_t slba,
  uint32_t nlb, int flags)
 {
 NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(ns);
-BlockDriverState *bs = blk_bs(ns->blkconf.blk);
+BlockDriverState *bs = blk_bs(nvme_blk(ns));
 
 int64_t pnum = 0, bytes = nvme_l2b(nvm, nlb);
 int64_t offset = nvme_l2b(nvm, slba);
@@ -1865,7 +1865,7 @@ void nvme_rw_complete_cb(void *opaque, int ret)
 {
 NvmeRequest *req = opaque;
 NvmeNamespace *ns = req->ns;
-BlockBackend *blk = ns->blkconf.blk;
+BlockBackend *blk = nvme_blk(ns);
 BlockAcctCookie *acct = &req->acct;
 BlockAcctStats *stats = blk_get_stats(blk);
 
@@ -1891,7 +1891,7 @@ static void nvme_rw_cb(void *opaque, int ret)
 NvmeNamespace *ns = req->ns;
 NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(ns);
 
-BlockBackend *blk = ns->blkconf.blk;
+BlockBackend *blk = nvme_blk(ns);
 
 trace_pci_nvme_rw_cb(nvme_cid(req), blk_name(blk));
 
@@ -1942,7 +1942,7 @@ static void nvme_verify_cb(void *opaque, int ret)
 NvmeRequest *req = ctx->req;
 NvmeNamespace *ns = req->ns;
 NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(ns);
-BlockBackend *blk = ns->blkconf.blk;
+BlockBackend *blk = nvme_blk(ns);
 BlockAcctCookie *acct = &req->acct;
 BlockAcctStats *stats = blk_get_stats(blk);
 NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
@@ -2000,7 +2000,7 @@ static void nvme_verify_mdata_in_cb(void *opaque, int ret)
 uint32_t nlb = le16_to_cpu(rw->nlb) + 1;
 size_t mlen = nvme_m2b(nvm, nlb);
 uint64_t offset = nvme_moff(nvm, slba);
-BlockBackend *blk = ns->blkconf.blk;
+BlockBackend *blk = nvme_blk(ns);
 
 trace_pci_nvme_verify_mdata_in_cb(nvme_cid(req), blk_name(blk));
 
@@ -2046,7 +2046,7 @@ static void nvme_compare_mdata_cb(void *opaque, int ret)
 uint32_t reftag = le32_to_cpu(rw->reftag);
 struct nvme_compare_ctx *ctx = req->opaque;
 g_autofree uint8_t *buf = NULL;
-BlockBackend *blk = ns->blkconf.blk;
+BlockBackend *blk = nvme_blk(ns);
 BlockAcctCookie *acct = &req->acct;
 BlockAcctStats *stats = blk_get_stats(blk);
 uint16_t status = NVME_SUCCESS;
@@ -2126,7 +2126,7 @@ static void nvme_compare_data_cb(void *opaque, int ret)
 NvmeCtrl *n = nvme_ctrl(req);
 NvmeNamespace *ns = req->ns;
 NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(ns);
-BlockBackend *blk = ns->blkconf.blk;
+BlockBackend *blk = nvme_blk(ns);
 BlockAcctCookie *acct = &req->acct;
 BlockAcctStats *stats = blk_get_stats(blk);
 
@@ -2272,7 +2272,7 @@ static void nvme_dsm_md_cb(void *opaque, int ret)
 nvme_dsm_cb(iocb, 0);
 }
 
-iocb->aiocb = blk_aio_pwrite_zeroes(ns->blkconf.blk, nvme_moff(nvm, slba),
+iocb->aiocb = blk_aio_pwrite_zeroes(nvme_blk(ns), nvme_moff(nvm, slba),
 nvme_m2b(nvm, nlb), BDRV_REQ_MAY_UNMAP,
 nvme_dsm_cb, iocb);
 return;
@@ -2320,7 +2320,7 @@ next:
 goto next;
 }
 
-iocb->aiocb = blk_aio_pdiscard(ns->blkconf.blk, nvme_l2b(nvm, slba),
+iocb->aiocb = blk_aio_pdiscard(nvme_blk(ns), nvme_l2b(nvm, slba),
nvme_l2b(nvm, nlb),
nvme_dsm_md_cb, iocb);
 return;
@@ -2341,7 +2341,7 @@ static uint16_t nvme_dsm(NvmeCtrl *n, NvmeRequest *req)
 trace_pci_nvme_dsm(nr, attr);
 
 if (attr & NVME_DSMGMT_AD) {
-NvmeDSMAIOCB *iocb = blk_aio_get(&nvme_dsm_aiocb_info, ns->blkconf.blk,
+NvmeDSMAIOCB *iocb = blk_aio_get(&nvme_dsm_aiocb_info, nvme_blk(ns),
  nvme_misc_cb, req);
 
 iocb->req = req;
@@ -2371,7 +2371,7 @@ static uint16_t nvme_verify(NvmeCtrl *n, NvmeRequest *req)
 NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
 NvmeNamespace *ns = req->ns;
 NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(ns);
-BlockBackend *blk = ns->blkconf.blk;
+BlockBackend *blk = nvme_blk(ns);
 uint64_t slba = le64_to_cpu(rw->slba);
 uint32_t nlb = le16_to_cpu(rw->nlb) + 1;
 size_t len = nvme_l2b(nvm, nlb);
@@ -2421,7 +2421,7 @@ static uint16_t nvme_verify(NvmeCtrl *n, NvmeRequest *req)
 block_acct_start(blk_get_stats(blk), &req->acct, ctx->data.iov.size,
  BLOCK_ACCT_READ);
 
-req->aiocb = blk_aio_preadv(ns->blkconf.blk, offset, &ctx->data.iov, 0,
+req->aiocb = blk_aio_preadv(nvme_blk(ns), offset, &ctx->data.iov, 0,
 nvme_verify_mdata_in_cb, ctx);
 return NVME_NO_COMPLETE;
 }
@@ -2472,7 +2472,7 @@ static void nvme_copy_bh(void *opaque)
 NvmeCopyAIOCB 

[PATCH RFC 12/13] hw/nvme: add experimental objects x-nvme-ns-{nvm, zoned}

2021-09-14 Thread Klaus Jensen
From: Klaus Jensen 

Add implementations of namespaces that supports the NVM and Zoned
Command Sets.

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c  |  11 +-
 hw/nvme/dif.h   |   2 +
 hw/nvme/meson.build |   2 +-
 hw/nvme/ns-nvm.c| 360 +++
 hw/nvme/ns-zoned.c  | 449 
 hw/nvme/ns.c| 281 +++
 hw/nvme/nvm.h   |  65 +++
 hw/nvme/nvme.h  |  96 +-
 hw/nvme/zoned.h |  48 +
 qapi/qom.json   |  48 +
 10 files changed, 1010 insertions(+), 352 deletions(-)
 create mode 100644 hw/nvme/ns-nvm.c
 create mode 100644 hw/nvme/ns-zoned.c
 create mode 100644 hw/nvme/nvm.h

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 563a8f8ad1df..04e564ad6be6 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -164,6 +164,7 @@
 
 #include "nvme.h"
 #include "dif.h"
+#include "nvm.h"
 #include "zoned.h"
 
 #include "trace.h"
@@ -5342,7 +5343,7 @@ static void nvme_format_set(NvmeNamespace *ns, NvmeCmd 
*cmd)
 nvm->id_ns.dps = (pil << 3) | pi;
 nvm->id_ns.flbas = lbaf | (mset << 4);
 
-nvme_ns_nvm_init_format(nvm);
+nvme_ns_nvm_configure_format(nvm);
 }
 
 static void nvme_format_ns_cb(void *opaque, int ret)
@@ -6611,10 +6612,14 @@ static void nvme_legacy_realize(PCIDevice *pci_dev, 
Error **errp)
 /* setup a namespace if the controller drive property was given */
 if (ctrl->namespace.blkconf.blk) {
 NvmeNamespaceDevice *nsdev = >namespace;
-NvmeNamespace *ns = >ns;
+NvmeNamespace *ns = NVME_NAMESPACE(nsdev->ns);
+NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(ns);
 ns->nsid = 1;
 
-nvme_ns_init(ns);
+ns->csi = NVME_CSI_NVM;
+
+nvme_ns_nvm_configure_identify(ns);
+nvme_ns_nvm_configure_format(nvm);
 
 nvme_attach_ns(n, ns);
 }
diff --git a/hw/nvme/dif.h b/hw/nvme/dif.h
index 53a22bc7c78e..81efb95cd391 100644
--- a/hw/nvme/dif.h
+++ b/hw/nvme/dif.h
@@ -1,6 +1,8 @@
 #ifndef HW_NVME_DIF_H
 #define HW_NVME_DIF_H
 
+#include "nvm.h"
+
 /* from Linux kernel (crypto/crct10dif_common.c) */
 static const uint16_t t10_dif_crc_table[256] = {
 0x, 0x8BB7, 0x9CD9, 0x176E, 0xB205, 0x39B2, 0x2EDC, 0xA56B,
diff --git a/hw/nvme/meson.build b/hw/nvme/meson.build
index 3cf40046eea9..2bb8354bcb57 100644
--- a/hw/nvme/meson.build
+++ b/hw/nvme/meson.build
@@ -1 +1 @@
-softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('ctrl.c', 'dif.c', 
'ns.c', 'subsys.c'))
+softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('ctrl.c', 'dif.c', 
'ns.c', 'ns-nvm.c', 'ns-zoned.c', 'subsys.c'))
diff --git a/hw/nvme/ns-nvm.c b/hw/nvme/ns-nvm.c
new file mode 100644
index ..afb0482ab9e8
--- /dev/null
+++ b/hw/nvme/ns-nvm.c
@@ -0,0 +1,360 @@
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "qapi/visitor.h"
+#include "qom/object_interfaces.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/block-backend.h"
+
+#include "nvme.h"
+#include "nvm.h"
+
+#include "trace.h"
+
+static char *get_blockdev(Object *obj, Error **errp)
+{
+NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(obj);
+const char *value;
+
+value = blk_name(nvm->blk);
+if (strcmp(value, "") == 0) {
+BlockDriverState *bs = blk_bs(nvm->blk);
+if (bs) {
+value = bdrv_get_node_name(bs);
+}
+}
+
+return g_strdup(value);
+}
+
+static void set_blockdev(Object *obj, const char *str, Error **errp)
+{
+NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(obj);
+
+g_free(nvm->blk_nodename);
+nvm->blk_nodename = g_strdup(str);
+}
+
+static void get_lba_size(Object *obj, Visitor *v, const char *name,
+ void *opaque, Error **errp)
+{
+NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(obj);
+uint64_t lba_size = nvm->lbasz;
+
+visit_type_size(v, name, _size, errp);
+}
+
+static void set_lba_size(Object *obj, Visitor *v, const char *name,
+ void *opaque, Error **errp)
+{
+NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(obj);
+uint64_t lba_size;
+
+if (!nvme_ns_prop_writable(obj, name, errp)) {
+return;
+}
+
+if (!visit_type_size(v, name, _size, errp)) {
+return;
+}
+
+nvm->lbasz = lba_size;
+nvm->lbaf.ds = 31 - clz32(nvm->lbasz);
+}
+
+static void get_metadata_size(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(obj);
+uint16_t value = nvm->lbaf.ms;
+
+visit_type_uint16(v, name, , errp);
+}
+
+static void set_metadata_size(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(obj);
+uint16_t value;
+
+if (!nvme_ns_prop_writable(obj, name, errp)) {
+return;
+}
+
+if (!visit_type_uint16(v, name, , errp)) {
+  

[PATCH RFC 07/13] hw/nvme: hoist qdev state from namespace

2021-09-14 Thread Klaus Jensen
From: Klaus Jensen 

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c |  32 +++---
 hw/nvme/ns.c   | 265 ++---
 hw/nvme/nvme.h |  45 ++---
 3 files changed, 187 insertions(+), 155 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f05d85075f08..966fba605d79 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4616,10 +4616,10 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
NvmeRequest *req,
 continue;
 }
 }
-if (ns->params.nsid <= min_nsid) {
+if (ns->nsid <= min_nsid) {
 continue;
 }
-list_ptr[j++] = cpu_to_le32(ns->params.nsid);
+list_ptr[j++] = cpu_to_le32(ns->nsid);
 if (j == data_len / sizeof(uint32_t)) {
 break;
 }
@@ -4664,10 +4664,10 @@ static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, 
NvmeRequest *req,
 continue;
 }
 }
-if (ns->params.nsid <= min_nsid || c->csi != ns->csi) {
+if (ns->nsid <= min_nsid || c->csi != ns->csi) {
 continue;
 }
-list_ptr[j++] = cpu_to_le32(ns->params.nsid);
+list_ptr[j++] = cpu_to_le32(ns->nsid);
 if (j == data_len / sizeof(uint32_t)) {
 break;
 }
@@ -4714,14 +4714,14 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl 
*n, NvmeRequest *req)
  */
 uuid.hdr.nidt = NVME_NIDT_UUID;
 uuid.hdr.nidl = NVME_NIDL_UUID;
-memcpy(uuid.v, ns->params.uuid.data, NVME_NIDL_UUID);
+memcpy(uuid.v, ns->uuid.data, NVME_NIDL_UUID);
 memcpy(pos, , sizeof(uuid));
 pos += sizeof(uuid);
 
-if (ns->params.eui64) {
+if (ns->eui64.v) {
 eui64.hdr.nidt = NVME_NIDT_EUI64;
 eui64.hdr.nidl = NVME_NIDL_EUI64;
-eui64.v = cpu_to_be64(ns->params.eui64);
+eui64.v = cpu_to_be64(ns->eui64.v);
 memcpy(pos, , sizeof(eui64));
 pos += sizeof(eui64);
 }
@@ -5260,7 +5260,7 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
 return NVME_NS_ALREADY_ATTACHED | NVME_DNR;
 }
 
-if (ns->attached && !ns->params.shared) {
+if (ns->attached && !(ns->flags & NVME_NS_SHARED)) {
 return NVME_NS_PRIVATE | NVME_DNR;
 }
 
@@ -5338,12 +5338,12 @@ static void nvme_format_set(NvmeNamespace *ns, NvmeCmd 
*cmd)
 uint8_t mset = (dw10 >> 4) & 0x1;
 uint8_t pil = (dw10 >> 8) & 0x1;
 
-trace_pci_nvme_format_set(ns->params.nsid, lbaf, mset, pi, pil);
+trace_pci_nvme_format_set(ns->nsid, lbaf, mset, pi, pil);
 
 nvm->id_ns.dps = (pil << 3) | pi;
 nvm->id_ns.flbas = lbaf | (mset << 4);
 
-nvme_ns_init_format(ns);
+nvme_ns_nvm_init_format(nvm);
 }
 
 static void nvme_format_ns_cb(void *opaque, int ret)
@@ -6544,7 +6544,7 @@ static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
 void nvme_attach_ns(NvmeCtrl *n, NvmeNamespace *ns)
 {
 NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(ns);
-uint32_t nsid = ns->params.nsid;
+uint32_t nsid = ns->nsid;
 assert(nsid && nsid <= NVME_MAX_NAMESPACES);
 
 n->namespaces[nsid] = ns;
@@ -6557,7 +6557,6 @@ void nvme_attach_ns(NvmeCtrl *n, NvmeNamespace *ns)
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
 NvmeCtrl *n = NVME(pci_dev);
-NvmeNamespace *ns;
 Error *local_err = NULL;
 
 nvme_check_constraints(n, _err);
@@ -6582,12 +6581,11 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 
 /* setup a namespace if the controller drive property was given */
 if (n->namespace.blkconf.blk) {
-ns = >namespace;
-ns->params.nsid = 1;
+NvmeNamespaceDevice *nsdev = >namespace;
+NvmeNamespace *ns = >ns;
+ns->nsid = 1;
 
-if (nvme_ns_setup(ns, errp)) {
-return;
-}
+nvme_ns_init(ns);
 
 nvme_attach_ns(n, ns);
 }
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 0e231102c475..b411b184c253 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -26,9 +26,8 @@
 
 #define MIN_DISCARD_GRANULARITY (4 * KiB)
 
-void nvme_ns_init_format(NvmeNamespace *ns)
+void nvme_ns_nvm_init_format(NvmeNamespaceNvm *nvm)
 {
-NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(ns);
 NvmeIdNs *id_ns = >id_ns;
 BlockDriverInfo bdi;
 int npdg, nlbas, ret;
@@ -48,7 +47,7 @@ void nvme_ns_init_format(NvmeNamespace *ns)
 
 npdg = nvm->discard_granularity / nvm->lbasz;
 
-ret = bdrv_get_info(blk_bs(ns->blkconf.blk), );
+ret = bdrv_get_info(blk_bs(nvm->blk), );
 if (ret >= 0 && bdi.cluster_size > nvm->discard_granularity) {
 npdg = bdi.cluster_size / nvm->lbasz;
 }
@@ -56,53 +55,39 @@ void nvme_ns_init_format(NvmeNamespace *ns)
 id_ns->npda = id_ns->npdg = npdg - 1;
 }
 
-static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
+void nvme_ns_init(NvmeNamespace *ns)
 {
 NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(ns);
-static uint64_t ns_count;
 

[PATCH RFC 02/13] hw/nvme: move zns helpers and types into zoned.h

2021-09-14 Thread Klaus Jensen
From: Klaus Jensen 

Move the ZNS-related helpers and types into zoned.h. Use a common prefix
(nvme_zoned or nvme_ns_zoned) for the ZNS-related functions.

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c  | 92 --
 hw/nvme/ns.c| 39 ++--
 hw/nvme/nvme.h  | 72 
 hw/nvme/zoned.h | 97 +
 4 files changed, 156 insertions(+), 144 deletions(-)
 create mode 100644 hw/nvme/zoned.h

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 65970b81d5fb..778a2689481d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -164,6 +164,8 @@
 
 #include "nvme.h"
 #include "dif.h"
+#include "zoned.h"
+
 #include "trace.h"
 
 #define NVME_MAX_IOQPAIRS 0x
@@ -262,7 +264,7 @@ static void nvme_assign_zone_state(NvmeNamespace *ns, 
NvmeZone *zone,
NvmeZoneState state)
 {
 if (QTAILQ_IN_USE(zone, entry)) {
-switch (nvme_get_zone_state(zone)) {
+switch (nvme_zoned_zs(zone)) {
 case NVME_ZONE_STATE_EXPLICITLY_OPEN:
 QTAILQ_REMOVE(>exp_open_zones, zone, entry);
 break;
@@ -279,7 +281,7 @@ static void nvme_assign_zone_state(NvmeNamespace *ns, 
NvmeZone *zone,
 }
 }
 
-nvme_set_zone_state(zone, state);
+nvme_zoned_set_zs(zone, state);
 
 switch (state) {
 case NVME_ZONE_STATE_EXPLICITLY_OPEN:
@@ -304,7 +306,8 @@ static void nvme_assign_zone_state(NvmeNamespace *ns, 
NvmeZone *zone,
  * Check if we can open a zone without exceeding open/active limits.
  * AOR stands for "Active and Open Resources" (see TP 4053 section 2.5).
  */
-static int nvme_aor_check(NvmeNamespace *ns, uint32_t act, uint32_t opn)
+static int nvme_ns_zoned_aor_check(NvmeNamespace *ns, uint32_t act,
+   uint32_t opn)
 {
 if (ns->params.max_active_zones != 0 &&
 ns->nr_active_zones + act > ns->params.max_active_zones) {
@@ -1552,28 +1555,11 @@ static void nvme_aio_err(NvmeRequest *req, int ret)
 req->status = status;
 }
 
-static inline uint32_t nvme_zone_idx(NvmeNamespace *ns, uint64_t slba)
-{
-return ns->zone_size_log2 > 0 ? slba >> ns->zone_size_log2 :
-slba / ns->zone_size;
-}
-
-static inline NvmeZone *nvme_get_zone_by_slba(NvmeNamespace *ns, uint64_t slba)
-{
-uint32_t zone_idx = nvme_zone_idx(ns, slba);
-
-if (zone_idx >= ns->num_zones) {
-return NULL;
-}
-
-return >zone_array[zone_idx];
-}
-
 static uint16_t nvme_check_zone_state_for_write(NvmeZone *zone)
 {
 uint64_t zslba = zone->d.zslba;
 
-switch (nvme_get_zone_state(zone)) {
+switch (nvme_zoned_zs(zone)) {
 case NVME_ZONE_STATE_EMPTY:
 case NVME_ZONE_STATE_IMPLICITLY_OPEN:
 case NVME_ZONE_STATE_EXPLICITLY_OPEN:
@@ -1598,7 +1584,7 @@ static uint16_t nvme_check_zone_state_for_write(NvmeZone 
*zone)
 static uint16_t nvme_check_zone_write(NvmeNamespace *ns, NvmeZone *zone,
   uint64_t slba, uint32_t nlb)
 {
-uint64_t zcap = nvme_zone_wr_boundary(zone);
+uint64_t zcap = nvme_zoned_zone_wr_boundary(zone);
 uint16_t status;
 
 status = nvme_check_zone_state_for_write(zone);
@@ -1621,7 +1607,7 @@ static uint16_t nvme_check_zone_write(NvmeNamespace *ns, 
NvmeZone *zone,
 
 static uint16_t nvme_check_zone_state_for_read(NvmeZone *zone)
 {
-switch (nvme_get_zone_state(zone)) {
+switch (nvme_zoned_zs(zone)) {
 case NVME_ZONE_STATE_EMPTY:
 case NVME_ZONE_STATE_IMPLICITLY_OPEN:
 case NVME_ZONE_STATE_EXPLICITLY_OPEN:
@@ -1646,10 +1632,10 @@ static uint16_t nvme_check_zone_read(NvmeNamespace *ns, 
uint64_t slba,
 uint64_t bndry, end;
 uint16_t status;
 
-zone = nvme_get_zone_by_slba(ns, slba);
+zone = nvme_ns_zoned_get_by_slba(ns, slba);
 assert(zone);
 
-bndry = nvme_zone_rd_boundary(ns, zone);
+bndry = nvme_zoned_zone_rd_boundary(ns, zone);
 end = slba + nlb;
 
 status = nvme_check_zone_state_for_read(zone);
@@ -1669,7 +1655,7 @@ static uint16_t nvme_check_zone_read(NvmeNamespace *ns, 
uint64_t slba,
 if (status) {
 break;
 }
-} while (end > nvme_zone_rd_boundary(ns, zone));
+} while (end > nvme_zoned_zone_rd_boundary(ns, zone));
 }
 }
 
@@ -1678,16 +1664,16 @@ static uint16_t nvme_check_zone_read(NvmeNamespace *ns, 
uint64_t slba,
 
 static uint16_t nvme_zrm_finish(NvmeNamespace *ns, NvmeZone *zone)
 {
-switch (nvme_get_zone_state(zone)) {
+switch (nvme_zoned_zs(zone)) {
 case NVME_ZONE_STATE_FULL:
 return NVME_SUCCESS;
 
 case NVME_ZONE_STATE_IMPLICITLY_OPEN:
 case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-nvme_aor_dec_open(ns);
+nvme_ns_zoned_aor_dec_open(ns);
 /* fallthrough */
 case NVME_ZONE_STATE_CLOSED:
-nvme_aor_dec_active(ns);
+nvme_ns_zoned_aor_dec_active(ns);
 

[PATCH RFC 03/13] hw/nvme: move zoned namespace members to separate struct

2021-09-14 Thread Klaus Jensen
From: Klaus Jensen 

In preparation for the NVM and zoned namespace separation, move the
zoned-related members from NvmeNamespace into NvmeNamespaceZoned.

There are no functional changes here, basically just a
s/NvmeNamespace/NvmeNamespaceZoned and s/ns/zoned where applicable.

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 276 +++
 hw/nvme/ns.c | 134 +++--
 hw/nvme/nvme.h   |  66 +++
 hw/nvme/zoned.h  |  68 +--
 include/block/nvme.h |   4 +
 5 files changed, 304 insertions(+), 244 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 778a2689481d..4c30823d389f 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -260,22 +260,22 @@ static uint16_t nvme_sqid(NvmeRequest *req)
 return le16_to_cpu(req->sq->sqid);
 }
 
-static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone,
+static void nvme_assign_zone_state(NvmeNamespaceZoned *zoned, NvmeZone *zone,
NvmeZoneState state)
 {
 if (QTAILQ_IN_USE(zone, entry)) {
 switch (nvme_zoned_zs(zone)) {
 case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-QTAILQ_REMOVE(>exp_open_zones, zone, entry);
+QTAILQ_REMOVE(>exp_open_zones, zone, entry);
 break;
 case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-QTAILQ_REMOVE(>imp_open_zones, zone, entry);
+QTAILQ_REMOVE(>imp_open_zones, zone, entry);
 break;
 case NVME_ZONE_STATE_CLOSED:
-QTAILQ_REMOVE(>closed_zones, zone, entry);
+QTAILQ_REMOVE(>closed_zones, zone, entry);
 break;
 case NVME_ZONE_STATE_FULL:
-QTAILQ_REMOVE(>full_zones, zone, entry);
+QTAILQ_REMOVE(>full_zones, zone, entry);
 default:
 ;
 }
@@ -285,16 +285,16 @@ static void nvme_assign_zone_state(NvmeNamespace *ns, 
NvmeZone *zone,
 
 switch (state) {
 case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-QTAILQ_INSERT_TAIL(>exp_open_zones, zone, entry);
+QTAILQ_INSERT_TAIL(>exp_open_zones, zone, entry);
 break;
 case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-QTAILQ_INSERT_TAIL(>imp_open_zones, zone, entry);
+QTAILQ_INSERT_TAIL(>imp_open_zones, zone, entry);
 break;
 case NVME_ZONE_STATE_CLOSED:
-QTAILQ_INSERT_TAIL(>closed_zones, zone, entry);
+QTAILQ_INSERT_TAIL(>closed_zones, zone, entry);
 break;
 case NVME_ZONE_STATE_FULL:
-QTAILQ_INSERT_TAIL(>full_zones, zone, entry);
+QTAILQ_INSERT_TAIL(>full_zones, zone, entry);
 case NVME_ZONE_STATE_READ_ONLY:
 break;
 default:
@@ -306,17 +306,17 @@ static void nvme_assign_zone_state(NvmeNamespace *ns, 
NvmeZone *zone,
  * Check if we can open a zone without exceeding open/active limits.
  * AOR stands for "Active and Open Resources" (see TP 4053 section 2.5).
  */
-static int nvme_ns_zoned_aor_check(NvmeNamespace *ns, uint32_t act,
+static int nvme_ns_zoned_aor_check(NvmeNamespaceZoned *zoned, uint32_t act,
uint32_t opn)
 {
-if (ns->params.max_active_zones != 0 &&
-ns->nr_active_zones + act > ns->params.max_active_zones) {
-trace_pci_nvme_err_insuff_active_res(ns->params.max_active_zones);
+if (zoned->max_active_zones != 0 &&
+zoned->nr_active_zones + act > zoned->max_active_zones) {
+trace_pci_nvme_err_insuff_active_res(zoned->max_active_zones);
 return NVME_ZONE_TOO_MANY_ACTIVE | NVME_DNR;
 }
-if (ns->params.max_open_zones != 0 &&
-ns->nr_open_zones + opn > ns->params.max_open_zones) {
-trace_pci_nvme_err_insuff_open_res(ns->params.max_open_zones);
+if (zoned->max_open_zones != 0 &&
+zoned->nr_open_zones + opn > zoned->max_open_zones) {
+trace_pci_nvme_err_insuff_open_res(zoned->max_open_zones);
 return NVME_ZONE_TOO_MANY_OPEN | NVME_DNR;
 }
 
@@ -1581,8 +1581,8 @@ static uint16_t nvme_check_zone_state_for_write(NvmeZone 
*zone)
 return NVME_INTERNAL_DEV_ERROR;
 }
 
-static uint16_t nvme_check_zone_write(NvmeNamespace *ns, NvmeZone *zone,
-  uint64_t slba, uint32_t nlb)
+static uint16_t nvme_check_zone_write(NvmeZone *zone, uint64_t slba,
+  uint32_t nlb)
 {
 uint64_t zcap = nvme_zoned_zone_wr_boundary(zone);
 uint16_t status;
@@ -1625,24 +1625,24 @@ static uint16_t nvme_check_zone_state_for_read(NvmeZone 
*zone)
 return NVME_INTERNAL_DEV_ERROR;
 }
 
-static uint16_t nvme_check_zone_read(NvmeNamespace *ns, uint64_t slba,
+static uint16_t nvme_check_zone_read(NvmeNamespaceZoned *zoned, uint64_t slba,
  uint32_t nlb)
 {
 NvmeZone *zone;
 uint64_t bndry, end;
 uint16_t status;
 
-zone = nvme_ns_zoned_get_by_slba(ns, slba);
+zone = nvme_ns_zoned_get_by_slba(zoned, slba);
 assert(zone);
 
-bndry = 

[PATCH RFC 06/13] nvme: add structured type for nguid

2021-09-14 Thread Klaus Jensen
From: Klaus Jensen 

Add a structured type for NGUID.

Signed-off-by: Klaus Jensen 
---
 include/block/nvme.h | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 2bcabe561589..f41464ee19bd 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1269,6 +1269,11 @@ typedef struct QEMU_PACKED NvmeLBAFE {
 
 #define NVME_NSID_BROADCAST 0x
 
+typedef struct QEMU_PACKED NvmeNGUID {
+uint8_t vspexid[8];
+uint64_teui;
+} NvmeNGUID;
+
 typedef struct QEMU_PACKED NvmeIdNs {
 uint64_tnsze;
 uint64_tncap;
@@ -1300,7 +1305,7 @@ typedef struct QEMU_PACKED NvmeIdNs {
 uint32_tmcl;
 uint8_t msrc;
 uint8_t rsvd81[23];
-uint8_t nguid[16];
+NvmeNGUID   nguid;
 uint64_teui64;
 NvmeLBAFlbaf[16];
 uint8_t rsvd192[192];
-- 
2.33.0




[PATCH RFC 01/13] hw/nvme: move dif/pi prototypes into dif.h

2021-09-14 Thread Klaus Jensen
From: Klaus Jensen 

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c |  1 +
 hw/nvme/dif.c  |  1 +
 hw/nvme/dif.h  | 53 ++
 hw/nvme/nvme.h | 50 ---
 4 files changed, 55 insertions(+), 50 deletions(-)
 create mode 100644 hw/nvme/dif.h

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index dc0e7b00308e..65970b81d5fb 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -163,6 +163,7 @@
 #include "migration/vmstate.h"
 
 #include "nvme.h"
+#include "dif.h"
 #include "trace.h"
 
 #define NVME_MAX_IOQPAIRS 0x
diff --git a/hw/nvme/dif.c b/hw/nvme/dif.c
index 5dbd18b2a4a5..cd0cea2b5ebd 100644
--- a/hw/nvme/dif.c
+++ b/hw/nvme/dif.c
@@ -13,6 +13,7 @@
 #include "sysemu/block-backend.h"
 
 #include "nvme.h"
+#include "dif.h"
 #include "trace.h"
 
 uint16_t nvme_check_prinfo(NvmeNamespace *ns, uint8_t prinfo, uint64_t slba,
diff --git a/hw/nvme/dif.h b/hw/nvme/dif.h
new file mode 100644
index ..e36fea30e71e
--- /dev/null
+++ b/hw/nvme/dif.h
@@ -0,0 +1,53 @@
+#ifndef HW_NVME_DIF_H
+#define HW_NVME_DIF_H
+
+/* from Linux kernel (crypto/crct10dif_common.c) */
+static const uint16_t t10_dif_crc_table[256] = {
+0x, 0x8BB7, 0x9CD9, 0x176E, 0xB205, 0x39B2, 0x2EDC, 0xA56B,
+0xEFBD, 0x640A, 0x7364, 0xF8D3, 0x5DB8, 0xD60F, 0xC161, 0x4AD6,
+0x54CD, 0xDF7A, 0xC814, 0x43A3, 0xE6C8, 0x6D7F, 0x7A11, 0xF1A6,
+0xBB70, 0x30C7, 0x27A9, 0xAC1E, 0x0975, 0x82C2, 0x95AC, 0x1E1B,
+0xA99A, 0x222D, 0x3543, 0xBEF4, 0x1B9F, 0x9028, 0x8746, 0x0CF1,
+0x4627, 0xCD90, 0xDAFE, 0x5149, 0xF422, 0x7F95, 0x68FB, 0xE34C,
+0xFD57, 0x76E0, 0x618E, 0xEA39, 0x4F52, 0xC4E5, 0xD38B, 0x583C,
+0x12EA, 0x995D, 0x8E33, 0x0584, 0xA0EF, 0x2B58, 0x3C36, 0xB781,
+0xD883, 0x5334, 0x445A, 0xCFED, 0x6A86, 0xE131, 0xF65F, 0x7DE8,
+0x373E, 0xBC89, 0xABE7, 0x2050, 0x853B, 0x0E8C, 0x19E2, 0x9255,
+0x8C4E, 0x07F9, 0x1097, 0x9B20, 0x3E4B, 0xB5FC, 0xA292, 0x2925,
+0x63F3, 0xE844, 0xFF2A, 0x749D, 0xD1F6, 0x5A41, 0x4D2F, 0xC698,
+0x7119, 0xFAAE, 0xEDC0, 0x6677, 0xC31C, 0x48AB, 0x5FC5, 0xD472,
+0x9EA4, 0x1513, 0x027D, 0x89CA, 0x2CA1, 0xA716, 0xB078, 0x3BCF,
+0x25D4, 0xAE63, 0xB90D, 0x32BA, 0x97D1, 0x1C66, 0x0B08, 0x80BF,
+0xCA69, 0x41DE, 0x56B0, 0xDD07, 0x786C, 0xF3DB, 0xE4B5, 0x6F02,
+0x3AB1, 0xB106, 0xA668, 0x2DDF, 0x88B4, 0x0303, 0x146D, 0x9FDA,
+0xD50C, 0x5EBB, 0x49D5, 0xC262, 0x6709, 0xECBE, 0xFBD0, 0x7067,
+0x6E7C, 0xE5CB, 0xF2A5, 0x7912, 0xDC79, 0x57CE, 0x40A0, 0xCB17,
+0x81C1, 0x0A76, 0x1D18, 0x96AF, 0x33C4, 0xB873, 0xAF1D, 0x24AA,
+0x932B, 0x189C, 0x0FF2, 0x8445, 0x212E, 0xAA99, 0xBDF7, 0x3640,
+0x7C96, 0xF721, 0xE04F, 0x6BF8, 0xCE93, 0x4524, 0x524A, 0xD9FD,
+0xC7E6, 0x4C51, 0x5B3F, 0xD088, 0x75E3, 0xFE54, 0xE93A, 0x628D,
+0x285B, 0xA3EC, 0xB482, 0x3F35, 0x9A5E, 0x11E9, 0x0687, 0x8D30,
+0xE232, 0x6985, 0x7EEB, 0xF55C, 0x5037, 0xDB80, 0xCCEE, 0x4759,
+0x0D8F, 0x8638, 0x9156, 0x1AE1, 0xBF8A, 0x343D, 0x2353, 0xA8E4,
+0xB6FF, 0x3D48, 0x2A26, 0xA191, 0x04FA, 0x8F4D, 0x9823, 0x1394,
+0x5942, 0xD2F5, 0xC59B, 0x4E2C, 0xEB47, 0x60F0, 0x779E, 0xFC29,
+0x4BA8, 0xC01F, 0xD771, 0x5CC6, 0xF9AD, 0x721A, 0x6574, 0xEEC3,
+0xA415, 0x2FA2, 0x38CC, 0xB37B, 0x1610, 0x9DA7, 0x8AC9, 0x017E,
+0x1F65, 0x94D2, 0x83BC, 0x080B, 0xAD60, 0x26D7, 0x31B9, 0xBA0E,
+0xF0D8, 0x7B6F, 0x6C01, 0xE7B6, 0x42DD, 0xC96A, 0xDE04, 0x55B3
+};
+
+uint16_t nvme_check_prinfo(NvmeNamespace *ns, uint8_t prinfo, uint64_t slba,
+   uint32_t reftag);
+uint16_t nvme_dif_mangle_mdata(NvmeNamespace *ns, uint8_t *mbuf, size_t mlen,
+   uint64_t slba);
+void nvme_dif_pract_generate_dif(NvmeNamespace *ns, uint8_t *buf, size_t len,
+ uint8_t *mbuf, size_t mlen, uint16_t apptag,
+ uint32_t *reftag);
+uint16_t nvme_dif_check(NvmeNamespace *ns, uint8_t *buf, size_t len,
+uint8_t *mbuf, size_t mlen, uint8_t prinfo,
+uint64_t slba, uint16_t apptag,
+uint16_t appmask, uint32_t *reftag);
+uint16_t nvme_dif_rw(NvmeCtrl *n, NvmeRequest *req);
+
+#endif /* HW_NVME_DIF_H */
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 83ffabade4cf..45bf96d65321 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -503,54 +503,4 @@ void nvme_rw_complete_cb(void *opaque, int ret);
 uint16_t nvme_map_dptr(NvmeCtrl *n, NvmeSg *sg, size_t len,
NvmeCmd *cmd);
 
-/* from Linux kernel (crypto/crct10dif_common.c) */
-static const uint16_t t10_dif_crc_table[256] = {
-0x, 0x8BB7, 0x9CD9, 0x176E, 0xB205, 0x39B2, 0x2EDC, 0xA56B,
-0xEFBD, 0x640A, 0x7364, 0xF8D3, 0x5DB8, 0xD60F, 0xC161, 0x4AD6,
-0x54CD, 0xDF7A, 0xC814, 0x43A3, 0xE6C8, 0x6D7F, 0x7A11, 0xF1A6,
-0xBB70, 0x30C7, 0x27A9, 0xAC1E, 0x0975, 0x82C2, 0x95AC, 0x1E1B,
-0xA99A, 0x222D, 0x3543, 0xBEF4, 0x1B9F, 0x9028, 0x8746, 0x0CF1,
-0x4627, 

[PATCH RFC 04/13] hw/nvme: move nvm namespace members to separate struct

2021-09-14 Thread Klaus Jensen
From: Klaus Jensen 

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 282 +++--
 hw/nvme/dif.c  | 101 +-
 hw/nvme/dif.h  |  12 +--
 hw/nvme/ns.c   |  72 +++--
 hw/nvme/nvme.h |  45 +---
 5 files changed, 290 insertions(+), 222 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 4c30823d389f..7f41181aafa1 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -528,11 +528,11 @@ static inline void nvme_sg_unmap(NvmeSg *sg)
  * holds both data and metadata. This function splits the data and metadata
  * into two separate QSG/IOVs.
  */
-static void nvme_sg_split(NvmeSg *sg, NvmeNamespace *ns, NvmeSg *data,
+static void nvme_sg_split(NvmeSg *sg, NvmeNamespaceNvm *nvm, NvmeSg *data,
   NvmeSg *mdata)
 {
 NvmeSg *dst = data;
-uint32_t trans_len, count = ns->lbasz;
+uint32_t trans_len, count = nvm->lbasz;
 uint64_t offset = 0;
 bool dma = sg->flags & NVME_SG_DMA;
 size_t sge_len;
@@ -564,7 +564,7 @@ static void nvme_sg_split(NvmeSg *sg, NvmeNamespace *ns, 
NvmeSg *data,
 
 if (count == 0) {
 dst = (dst == data) ? mdata : data;
-count = (dst == data) ? ns->lbasz : ns->lbaf.ms;
+count = (dst == data) ? nvm->lbasz : nvm->lbaf.ms;
 }
 
 if (sge_len == offset) {
@@ -1029,17 +1029,17 @@ static uint16_t nvme_map_mptr(NvmeCtrl *n, NvmeSg *sg, 
size_t len,
 
 static uint16_t nvme_map_data(NvmeCtrl *n, uint32_t nlb, NvmeRequest *req)
 {
-NvmeNamespace *ns = req->ns;
+NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(req->ns);
 NvmeRwCmd *rw = (NvmeRwCmd *)>cmd;
-bool pi = !!NVME_ID_NS_DPS_TYPE(ns->id_ns.dps);
+bool pi = !!NVME_ID_NS_DPS_TYPE(nvm->id_ns.dps);
 bool pract = !!(le16_to_cpu(rw->control) & NVME_RW_PRINFO_PRACT);
-size_t len = nvme_l2b(ns, nlb);
+size_t len = nvme_l2b(nvm, nlb);
 uint16_t status;
 
-if (nvme_ns_ext(ns) && !(pi && pract && ns->lbaf.ms == 8)) {
+if (nvme_ns_ext(nvm) && !(pi && pract && nvm->lbaf.ms == 8)) {
 NvmeSg sg;
 
-len += nvme_m2b(ns, nlb);
+len += nvme_m2b(nvm, nlb);
 
 status = nvme_map_dptr(n, , len, >cmd);
 if (status) {
@@ -1047,7 +1047,7 @@ static uint16_t nvme_map_data(NvmeCtrl *n, uint32_t nlb, 
NvmeRequest *req)
 }
 
 nvme_sg_init(n, >sg, sg.flags & NVME_SG_DMA);
-nvme_sg_split(, ns, >sg, NULL);
+nvme_sg_split(, nvm, >sg, NULL);
 nvme_sg_unmap();
 
 return NVME_SUCCESS;
@@ -1058,14 +1058,14 @@ static uint16_t nvme_map_data(NvmeCtrl *n, uint32_t 
nlb, NvmeRequest *req)
 
 static uint16_t nvme_map_mdata(NvmeCtrl *n, uint32_t nlb, NvmeRequest *req)
 {
-NvmeNamespace *ns = req->ns;
-size_t len = nvme_m2b(ns, nlb);
+NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(req->ns);
+size_t len = nvme_m2b(nvm, nlb);
 uint16_t status;
 
-if (nvme_ns_ext(ns)) {
+if (nvme_ns_ext(nvm)) {
 NvmeSg sg;
 
-len += nvme_l2b(ns, nlb);
+len += nvme_l2b(nvm, nlb);
 
 status = nvme_map_dptr(n, , len, >cmd);
 if (status) {
@@ -1073,7 +1073,7 @@ static uint16_t nvme_map_mdata(NvmeCtrl *n, uint32_t nlb, 
NvmeRequest *req)
 }
 
 nvme_sg_init(n, >sg, sg.flags & NVME_SG_DMA);
-nvme_sg_split(, ns, NULL, >sg);
+nvme_sg_split(, nvm, NULL, >sg);
 nvme_sg_unmap();
 
 return NVME_SUCCESS;
@@ -1209,14 +1209,14 @@ static inline uint16_t nvme_h2c(NvmeCtrl *n, uint8_t 
*ptr, uint32_t len,
 uint16_t nvme_bounce_data(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
   NvmeTxDirection dir, NvmeRequest *req)
 {
-NvmeNamespace *ns = req->ns;
+NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(req->ns);
 NvmeRwCmd *rw = (NvmeRwCmd *)>cmd;
-bool pi = !!NVME_ID_NS_DPS_TYPE(ns->id_ns.dps);
+bool pi = !!NVME_ID_NS_DPS_TYPE(nvm->id_ns.dps);
 bool pract = !!(le16_to_cpu(rw->control) & NVME_RW_PRINFO_PRACT);
 
-if (nvme_ns_ext(ns) && !(pi && pract && ns->lbaf.ms == 8)) {
-return nvme_tx_interleaved(n, >sg, ptr, len, ns->lbasz,
-   ns->lbaf.ms, 0, dir);
+if (nvme_ns_ext(nvm) && !(pi && pract && nvm->lbaf.ms == 8)) {
+return nvme_tx_interleaved(n, >sg, ptr, len, nvm->lbasz,
+   nvm->lbaf.ms, 0, dir);
 }
 
 return nvme_tx(n, >sg, ptr, len, dir);
@@ -1225,12 +1225,12 @@ uint16_t nvme_bounce_data(NvmeCtrl *n, uint8_t *ptr, 
uint32_t len,
 uint16_t nvme_bounce_mdata(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
NvmeTxDirection dir, NvmeRequest *req)
 {
-NvmeNamespace *ns = req->ns;
+NvmeNamespaceNvm *nvm = NVME_NAMESPACE_NVM(req->ns);
 uint16_t status;
 
-if (nvme_ns_ext(ns)) {
-return nvme_tx_interleaved(n, >sg, ptr, len, ns->lbaf.ms,
-   ns->lbasz, ns->lbasz, dir);
+if (nvme_ns_ext(nvm)) {
+return 

[PATCH RFC 00/13] hw/nvme: experimental user-creatable objects

2021-09-14 Thread Klaus Jensen
From: Klaus Jensen 

Hi,

This is an attempt at addressing a bunch of issues that have presented
themselves since we added subsystem support. It's been brewing for a
while now.

Fundamentally, I've come to the conclusion that modeling namespaces and
subsystems as "devices" is wrong. They should have been user-creatable
objects. We've run into multiple issues wrt. hotplugging due to how
namespaces hook up to the controller with a bus. The bus-based design
made a lot of sense when we didn't have subsystem support and it follows
the design of hw/scsi. But, the problem here is that the bus-based
design dictates a one-parent relationship, and with shared namespaces,
that is just not true. If the namespaces are considered to have a single
parent, that parent is the subsystem, not any specific controller.

This series adds a set of experimental user-creatable objects:

  -object x-nvme-subsystem
  -object x-nvme-ns-nvm
  -object x-nvme-ns-zoned

It also adds a new controller device (-device x-nvme-ctrl) that supports
these new objects (and gets rid of a bunch of deprecated and confusing
parameters). This new approach has a bunch of benefits (other than just
fixing the hotplugging issues properly) - we also get support for some
nice introspection through some new dynamic properties:

  (qemu) qom-get /machine/peripheral/nvme-ctrl-1 attached-namespaces
  [
  "/objects/nvm-1",
  "/objects/zns-1"
  ]

  (qemu) qom-list /objects/zns-1
  type (string)
  subsys (link)
  nsid (uint32)
  uuid (string)
  attached-ctrls (str)
  eui64 (string)
  blockdev (string)
  pi-first (bool)
  pi-type (NvmeProtInfoType)
  extended-lba (bool)
  metadata-size (uint16)
  lba-size (size)
  zone-descriptor-extension-size (size)
  zone-cross-read (bool)
  zone-max-open (uint32)
  zone-capacity (size)
  zone-size (size)
  zone-max-active (uint32)

  (qemu) qom-get /objects/zns-1 pi-type
  "none"

  (qemu) qom-get /objects/zns-1 eui64
  "52:54:00:17:67:a0:40:15"

  (qemu) qom-get /objects/zns-1 zone-capacity
  12582912

Currently, there are no shortcuts, so you have to define the full
topology to get it up and running. Notice that the topology is explicit
(the 'subsys' and 'attached-ctrls' links). There is no 'nvme-bus'
anymore.

  -object x-nvme-subsystem,id=subsys0,subnqn=foo
  -device x-nvme-ctrl,id=nvme-ctrl-0,serial=foo,subsys=subsys0
  -device x-nvme-ctrl,id=nvme-ctrl-1,serial=bar,subsys=subsys0
  -drive  id=nvm-1,file=nvm-1.img,format=raw,if=none,discard=unmap
  -object 
x-nvme-ns-nvm,id=nvm-1,blockdev=nvm-1,nsid=1,subsys=subsys0,attached-ctrls=nvme-ctrl-1
  -drive  id=nvm-2,file=nvm-2.img,format=raw,if=none,discard=unmap
  -object 
x-nvme-ns-nvm,id=nvm-2,blockdev=nvm-2,nsid=2,subsys=subsys0,attached-ctrls=nvme-ctrl-0

It'd be nice to add some defaults for when you don't need/want a
full-blown multi controller/namespace setup.

The first patches in this series reorganize a bunch of structs to make
it easier to separate them in later patches. The series then proceeds to
hoist the device state into separate structures such that we can reuse
the core logic in both the new objects and the existing devices. Thus,
full backwards compatibility is kept and the existing devices all work
as they did prior to this series being applied. I have chosen to
separate the nvm and zoned namespace types into individual objects. The
core namespace functionality is contained in an abstract (non
user-creatable) x-nvme-ns object; the x-nvme-ns-nvm object extends this
and in turn serves as the parent of the x-nvme-ns-zoned object.
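
(For readers less familiar with QOM: the snippet below is *not* taken from
this series - it is a generic, minimal sketch of the pattern the new x-nvme-*
objects build on, i.e. a TYPE_OBJECT subclass exposing the TYPE_USER_CREATABLE
interface so it can be instantiated with -object. All names here -
"x-example-ns", ExampleNs, the "nsid" property - are invented for
illustration only.)

    #include "qemu/osdep.h"
    #include "qemu/module.h"
    #include "qapi/error.h"
    #include "qapi/visitor.h"
    #include "qom/object.h"
    #include "qom/object_interfaces.h"

    #define TYPE_EXAMPLE_NS "x-example-ns"
    OBJECT_DECLARE_SIMPLE_TYPE(ExampleNs, EXAMPLE_NS)

    struct ExampleNs {
        Object parent_obj;
        uint32_t nsid;
    };

    /* "nsid" is exposed as a plain uint32 property */
    static void example_ns_get_nsid(Object *obj, Visitor *v, const char *name,
                                    void *opaque, Error **errp)
    {
        ExampleNs *ns = EXAMPLE_NS(obj);
        visit_type_uint32(v, name, &ns->nsid, errp);
    }

    static void example_ns_set_nsid(Object *obj, Visitor *v, const char *name,
                                    void *opaque, Error **errp)
    {
        ExampleNs *ns = EXAMPLE_NS(obj);
        visit_type_uint32(v, name, &ns->nsid, errp);
    }

    static void example_ns_class_init(ObjectClass *oc, void *data)
    {
        object_class_property_add(oc, "nsid", "uint32",
                                  example_ns_get_nsid, example_ns_set_nsid,
                                  NULL, NULL);
    }

    static const TypeInfo example_ns_info = {
        .name = TYPE_EXAMPLE_NS,
        .parent = TYPE_OBJECT,
        .instance_size = sizeof(ExampleNs),
        .class_init = example_ns_class_init,
        .interfaces = (InterfaceInfo[]) {
            { TYPE_USER_CREATABLE },
            { }
        },
    };

    static void example_ns_register_types(void)
    {
        type_register_static(&example_ns_info);
    }

    type_init(example_ns_register_types)

Such an object would then be created with "-object x-example-ns,id=foo,nsid=1",
which is exactly the mechanism the x-nvme-ns-nvm/x-nvme-ns-zoned objects in
this series rely on.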

There is definitely an alternative to this approach - one that I've
previously discussed with Hannes (and other QEMU devs, thanks!), and
that would be to add the subsystem as a system bus device.

Cheers, Klaus

Klaus Jensen (13):
  hw/nvme: move dif/pi prototypes into dif.h
  hw/nvme: move zns helpers and types into zoned.h
  hw/nvme: move zoned namespace members to separate struct
  hw/nvme: move nvm namespace members to separate struct
  hw/nvme: move BlockBackend to NvmeNamespaceNvm
  nvme: add structured type for nguid
  hw/nvme: hoist qdev state from namespace
  hw/nvme: hoist qdev state from controller
  hw/nvme: add experimental device x-nvme-ctrl
  hw/nvme: add experimental object x-nvme-subsystem
  hw/nvme: add experimental abstract object x-nvme-ns
  hw/nvme: add experimental objects x-nvme-ns-{nvm,zoned}
  hw/nvme: add attached-namespaces prop

 hw/nvme/ctrl.c   | 1187 --
 hw/nvme/dif.c|  120 +++--
 hw/nvme/dif.h|   55 ++
 hw/nvme/meson.build  |2 +-
 hw/nvme/ns-nvm.c |  360 +
 hw/nvme/ns-zoned.c   |  449 
 hw/nvme/ns.c |  818 -
 hw/nvme/nvm.h|   65 +++
 hw/nvme/nvme.h   |  325 +---
 hw/nvme/subsys.c |  154 +-
 hw/nvme/zoned.h  |  147 ++
 include/block/nvme.h |   11 +-
 qapi/qom.json|   83 +++
 13 files changed, 2612 insertions(+), 1164 

Re: [PATCH 04/20] nubus: use bitmap to manage available slots

2021-09-14 Thread Mark Cave-Ayland

On 12/09/2021 18:48, Philippe Mathieu-Daudé wrote:


On 9/12/21 9:48 AM, Mark Cave-Ayland wrote:

Convert nubus_device_realize() to use a bitmap to manage available slots to 
allow
for future Nubus devices to be plugged into arbitrary slots from the command 
line.

Update mac_nubus_bridge_init() to only allow slots 0x9 to 0xe on Macintosh
machines as documented in "Designing Cards and Drivers for the Macintosh 
Family".

Signed-off-by: Mark Cave-Ayland 
---
  hw/nubus/mac-nubus-bridge.c |  3 +++
  hw/nubus/nubus-bus.c|  2 +-
  hw/nubus/nubus-device.c | 33 +++--
  include/hw/nubus/nubus.h|  4 ++--
  4 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/hw/nubus/mac-nubus-bridge.c b/hw/nubus/mac-nubus-bridge.c
index 7c329300b8..6e78f4c0b3 100644
--- a/hw/nubus/mac-nubus-bridge.c
+++ b/hw/nubus/mac-nubus-bridge.c
@@ -18,6 +18,9 @@ static void mac_nubus_bridge_init(Object *obj)
  
  s->bus = NUBUS_BUS(qbus_create(TYPE_NUBUS_BUS, DEVICE(s), NULL));
  
+/* Macintosh only has slots 0x9 to 0xe available */

+s->bus->slot_available_mask = 0x7e00;


So MAKE_64BIT_MASK(9, 6),


  sysbus_init_mmio(sbd, >bus->super_slot_io);
  sysbus_init_mmio(sbd, >bus->slot_io);
  }
diff --git a/hw/nubus/nubus-bus.c b/hw/nubus/nubus-bus.c
index 5c13452308..f6d3655f51 100644
--- a/hw/nubus/nubus-bus.c
+++ b/hw/nubus/nubus-bus.c
@@ -84,7 +84,7 @@ static void nubus_init(Object *obj)
nubus, "nubus-slots",
NUBUS_SLOT_NB * NUBUS_SLOT_SIZE);
  
-nubus->current_slot = NUBUS_FIRST_SLOT;

+nubus->slot_available_mask = 0x;


and MAKE_64BIT_MASK(0, 16)?


  }


I'll convert these over to use MAKE_64BIT_MASK too :)
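
(Just as a sanity check - not part of the patch - the two suggested masks do
come out to the constants used today, assuming MAKE_64BIT_MASK() is still
defined the way I remember it from include/qemu/bitops.h:)

    #include <assert.h>

    /* reproduced here so the check builds standalone */
    #define MAKE_64BIT_MASK(shift, length) \
        (((~0ULL) >> (64 - (length))) << (shift))

    int main(void)
    {
        assert(MAKE_64BIT_MASK(9, 6) == 0x7e00);   /* slots 0x9..0xe */
        assert(MAKE_64BIT_MASK(0, 16) == 0xffff);  /* all 16 slots   */
        return 0;
    }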


ATB,

Mark.



Re: [PATCH 16/20] nubus-bridge: embed the NubusBus object directly within nubus-bridge

2021-09-14 Thread Mark Cave-Ayland

On 12/09/2021 18:43, Philippe Mathieu-Daudé wrote:


On 9/12/21 9:49 AM, Mark Cave-Ayland wrote:

Since nubus-bridge is a container for NubusBus, it should be embedded
directly within the bridge device using qbus_create_inplace().
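
(Not from the patch itself - roughly, the embedded form described above looks
like the sketch below. The struct layout is a simplified assumption; only the
qbus_create_inplace() call pattern is the point here.)

    /* hw/nubus/nubus.h (sketch): the bridge now embeds the bus */
    struct NubusBridge {
        SysBusDevice parent_obj;
        NubusBus bus;          /* embedded, no longer created separately */
    };

    /* hw/nubus/nubus-bridge.c (sketch) */
    static void nubus_bridge_init(Object *obj)
    {
        NubusBridge *s = NUBUS_BRIDGE(obj);

        qbus_create_inplace(&s->bus, sizeof(s->bus), TYPE_NUBUS_BUS,
                            DEVICE(obj), NULL);
    }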

Signed-off-by: Mark Cave-Ayland 
---
  hw/m68k/q800.c  | 2 +-
  hw/nubus/mac-nubus-bridge.c | 7 ---
  hw/nubus/nubus-bridge.c | 3 ++-
  include/hw/nubus/nubus.h| 2 +-
  4 files changed, 8 insertions(+), 6 deletions(-)



diff --git a/hw/nubus/mac-nubus-bridge.c b/hw/nubus/mac-nubus-bridge.c
index c16cfc4ab3..c23d5d508d 100644
--- a/hw/nubus/mac-nubus-bridge.c
+++ b/hw/nubus/mac-nubus-bridge.c
@@ -18,18 +18,19 @@ static void mac_nubus_bridge_init(Object *obj)
  MacNubusBridge *s = MAC_NUBUS_BRIDGE(obj);
  NubusBridge *nb = NUBUS_BRIDGE(obj);
  SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
+NubusBus *bus = >bus;
  
  /* Macintosh only has slots 0x9 to 0xe available */

-nb->bus->slot_available_mask = 0x7e00;
+bus->slot_available_mask = 0x7e00;


Re-reading I'd use MAKE_64BIT_MASK(9, 6)
or eventually MAKE_64BIT_MASK(9, 0xe - 0x9 + 1).


Thanks, I'll go for MAKE_64BIT_MASK(9, 6) here in v2.


ATB,

Mark.



Re: [PATCH 11/20] nubus-device: add romfile property for loading declaration ROMs

2021-09-14 Thread Mark Cave-Ayland

On 12/09/2021 18:39, Philippe Mathieu-Daudé wrote:


On 9/12/21 9:49 AM, Mark Cave-Ayland wrote:

The declaration ROM is located at the top-most address of the standard slot
space.

Signed-off-by: Mark Cave-Ayland 
---
  hw/nubus/nubus-device.c  | 43 +++-
  include/hw/nubus/nubus.h |  5 +
  2 files changed, 47 insertions(+), 1 deletion(-)



+/* Declaration ROM */



+} else if (size > NUBUS_DECL_ROM_MAX_SIZE) {


I'd check for >= and define as (64 * KiB).


That's a good idea - I'll update this for the v2.
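
(For what it's worth - not part of the patch - if the current
NUBUS_DECL_ROM_MAX_SIZE is 0xffff (its value is truncated in the quote above),
then "size > 0xffff" and "size >= 64 * KiB" reject exactly the same sizes, so
the change is purely about readability. A trivial standalone check:)

    #include <assert.h>
    #include <stddef.h>

    #define KiB 1024   /* QEMU gets this from qemu/units.h */

    int main(void)
    {
        for (size_t size = 0; size < 2 * 64 * KiB; size++) {
            assert((size > 0xffff) == (size >= 64 * KiB));
        }
        return 0;
    }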


+error_setg(errp, "romfile \"%s\" too large (maximum size 64K)",
+   nd->romfile);
+g_free(path);
+return;
+}



diff --git a/include/hw/nubus/nubus.h b/include/hw/nubus/nubus.h
index 87a97516c7..42f4c9dbb8 100644
--- a/include/hw/nubus/nubus.h
+++ b/include/hw/nubus/nubus.h
@@ -39,12 +39,17 @@ struct NubusBus {
  uint32_t slot_available_mask;
  };
  
+#define NUBUS_DECL_ROM_MAX_SIZE0x



ATB,

Mark.



Re: question on vhost, limiting kernel threads and NPROC

2021-09-14 Thread Christian Brauner
On Mon, Sep 13, 2021 at 05:32:32PM -0400, Michael S. Tsirkin wrote:
> On Mon, Sep 13, 2021 at 12:04:04PM -0500, Mike Christie wrote:
> > I just realized I forgot to cc the virt list so adding now.
> > 
> > Christian see the very bottom for a different fork patch.
> > 
> > On 7/12/21 7:05 AM, Stefan Hajnoczi wrote:
> > > On Fri, Jul 09, 2021 at 11:25:37AM -0500, Mike Christie wrote:
> > >> Hi,
> > >>
> > >> The goal of this email is to try to figure out how we want to track/limit 
> > >> the
> > >> number of kernel threads created by vhost devices.
> > >>
> > >> Background:
> > >> ---
> > >> For vhost-scsi, we've hit a issue where the single vhost worker thread 
> > >> can't
> > >> handle all IO the being sent from multiple queues. IOPs is stuck at 
> > >> around
> > >> 500K. To fix this, we did this patchset:
> > >>
> > >> https://lore.kernel.org/linux-scsi/20210525180600.6349-1-michael.chris...@oracle.com/
> > >>
> > >> which allows userspace to create N threads and map them to a dev's 
> > >> virtqueues.
> > >> With this we can get around 1.4M IOPs.
> > >>
> > >> Problem:
> > >> 
> > >> While those patches were being reviewed, a concern about tracking all 
> > >> these
> > >> new possible threads was raised here:
> > >>
> > >> https://lore.kernel.org/linux-scsi/YL45CfpHyzSEcAJv@stefanha-x1.localdomain/
> > >>
> > >> To save you some time, the question is what does other kernel code using 
> > >> the
> > >> kthread API do to track the number of kernel threads created on behalf of
> > >> a userspace thread. The answer is they don't do anything so we will have 
> > >> to
> > >> add that code.
> > >>
> > >> I started to do that here:
> > >>
> > >> https://lkml.org/lkml/2021/6/23/1233
> > >>
> > >> where those patches would charge/check the vhost device owner's 
> > >> RLIMIT_NPROC
> > >> value. But, the question of if we really want to do this has come up 
> > >> which is
> > >> why I'm bugging lists like libvirt now.
> > >>
> > >> Question/Solution:
> > >> --
> > >> I'm bugging everyone so we can figure out:
> > >>
> > >> If we need to specifically track the number of kernel threads being made
> > >> for the vhost kernel use case by the RLIMIT_NPROC limit?
> > >>
> > >> Or, is it ok to limit the number of devices with the RLIMIT_NOFILE limit.
> > >> Then each device has a limit on the number of threads it can create.
> > > 
> > > Do we want to add an interface where an unprivileged userspace process
> > > can create large numbers of kthreads? The number is indirectly bounded
> > > by RLIMIT_NOFILE * num_virtqueues, but there is no practical way to
> > > use that rlimit since num_virtqueues varies across vhost devices and
> > > RLIMIT_NOFILE might need to have a specific value to control file
> > > descriptors.
> > > 
> > > io_uring worker threads are limited by RLIMIT_NPROC. I think it makes
> > > sense in vhost too where the device instance is owned by a specific
> > > userspace process and can be accounted against that process' rlimit.
> > > 
> > > I don't have a specific use case other than that I think vhost should be
> > > safe and well-behaved.
> > > 
> > 
> > Sorry for the late reply. I finally got to go on PTO and used like 2
> > years worth in one super long vacation :)
> > 
> > I still don't have an RLIMIT_NPROC use case and it wasn't clear to
> > me if that has to be handled before merging. However, I might have got
> > lucky and found a bug where the fix will handle your request too.
> > 
> > It looks like cgroup v2 is supposed to work, but for vhost threads
> > it doesn't because the kernel functions we use just support v1. If
> > we change the vhost layer to create threads like how io_uring does
> > then we get the RLIMIT_NPROC checks and also cgroup v2 support.
> > 
> > Christian, If you didn't like this patch
> > 
> > https://lkml.org/lkml/2021/6/23/1233
> > 
> > then I'm not sure how much you will like what is needed to support the
> > above. Here is a patch which includes what we would need from the fork
> > related code. On one hand, it's nicer because it fits into the PF FLAG
> > code like you requested. But, I have to add a no_files arg. See below:
> > 
> > 
> > --
> > 
> > 
> > >From 351d476e8db0a78b9bdf22d77dd1abe66c0eac40 Mon Sep 17 00:00:00 2001
> > From: Mike Christie 
> > Date: Mon, 13 Sep 2021 11:20:20 -0500
> > Subject: [PATCH] fork: allow cloning of userspace procs from kernel
> > 
> > Userspace apps/processes like Qemu call into the vhost layer to create
> > worker threads which execute IO on behalf of VMs. If users set RLIMIT
> > or cgroup limits or set up v2 cgroups or namespaces, the worker thread
> > is not accounted for or even set up correctly. The reason is that vhost
> > uses the kthread api which inherits those attributes/values from the
> > kthreadd thread. This patch allows kernel modules to work like the
> > io_uring code which can call kernel_clone from the userspace thread's
> > context 

Re: [PATCH v2 44/53] target/m68k: convert to use format_tlb callback

2021-09-14 Thread Laurent Vivier
Le 14/09/2021 à 16:20, Daniel P. Berrangé a écrit :
> Change the "info tlb" implementation to use the format_tlb callback.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  target/m68k/cpu.c |   3 +
>  target/m68k/cpu.h |   3 +-
>  target/m68k/helper.c  | 132 ++
>  target/m68k/monitor.c |  11 +++-
>  4 files changed, 82 insertions(+), 67 deletions(-)
> 
> diff --git a/target/m68k/cpu.c b/target/m68k/cpu.c
> index 4ccf572a68..8f143eb540 100644
> --- a/target/m68k/cpu.c
> +++ b/target/m68k/cpu.c
> @@ -537,6 +537,9 @@ static void m68k_cpu_class_init(ObjectClass *c, void 
> *data)
>  cc->class_by_name = m68k_cpu_class_by_name;
>  cc->has_work = m68k_cpu_has_work;
>  cc->format_state = m68k_cpu_format_state;
> +#ifndef CONFIG_USER_ONLY
> +cc->format_tlb = m68k_cpu_format_tlb;
> +#endif
>  cc->set_pc = m68k_cpu_set_pc;
>  cc->gdb_read_register = m68k_cpu_gdb_read_register;
>  cc->gdb_write_register = m68k_cpu_gdb_write_register;
> diff --git a/target/m68k/cpu.h b/target/m68k/cpu.h
> index b0641f6d0d..f2d777a1ba 100644
> --- a/target/m68k/cpu.h
> +++ b/target/m68k/cpu.h
> @@ -169,6 +169,7 @@ struct M68kCPU {
>  void m68k_cpu_do_interrupt(CPUState *cpu);
>  bool m68k_cpu_exec_interrupt(CPUState *cpu, int int_req);
>  void m68k_cpu_format_state(CPUState *cpu, GString *buf, int flags);
> +void m68k_cpu_format_tlb(CPUState *cpu, GString *buf);
>  hwaddr m68k_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
>  int m68k_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
>  int m68k_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
> @@ -612,6 +613,4 @@ static inline void cpu_get_tb_cpu_state(CPUM68KState 
> *env, target_ulong *pc,
>  }
>  }
>  
> -void dump_mmu(CPUM68KState *env);
> -
>  #endif
> diff --git a/target/m68k/helper.c b/target/m68k/helper.c
> index 137a3e1a3d..050a27d21c 100644
> --- a/target/m68k/helper.c
> +++ b/target/m68k/helper.c
> @@ -25,6 +25,7 @@
>  #include "exec/helper-proto.h"
>  #include "fpu/softfloat.h"
>  #include "qemu/qemu-print.h"
> +#include "qapi/error.h"
>  
>  #define SIGNBIT (1u << 31)
>  
> @@ -483,27 +484,28 @@ void m68k_switch_sp(CPUM68KState *env)
>  /* MMU: 68040 only */
>  
>  static void print_address_zone(uint32_t logical, uint32_t physical,
> -   uint32_t size, int attr)
> +   uint32_t size, int attr, GString *buf)
>  {
> -qemu_printf("%08x - %08x -> %08x - %08x %c ",
> -logical, logical + size - 1,
> -physical, physical + size - 1,
> -attr & 4 ? 'W' : '-');
> +g_string_append_printf(buf, "%08x - %08x -> %08x - %08x %c ",
> +   logical, logical + size - 1,
> +   physical, physical + size - 1,
> +   attr & 4 ? 'W' : '-');
>  size >>= 10;
>  if (size < 1024) {
> -qemu_printf("(%d KiB)\n", size);
> +g_string_append_printf(buf, "(%d KiB)\n", size);
>  } else {
>  size >>= 10;
>  if (size < 1024) {
> -qemu_printf("(%d MiB)\n", size);
> +g_string_append_printf(buf, "(%d MiB)\n", size);
>  } else {
>  size >>= 10;
> -qemu_printf("(%d GiB)\n", size);
> +g_string_append_printf(buf, "(%d GiB)\n", size);
>  }
>  }
>  }
>  
> -static void dump_address_map(CPUM68KState *env, uint32_t root_pointer)
> +static void dump_address_map(CPUM68KState *env, uint32_t root_pointer,
> + GString *buf)
>  {
>  int i, j, k;
>  int tic_size, tic_shift;
> @@ -573,7 +575,8 @@ static void dump_address_map(CPUM68KState *env, uint32_t 
> root_pointer)
>  size = last_logical + (1 << tic_shift) -
> first_logical;
>  print_address_zone(first_logical,
> -   first_physical, size, last_attr);
> +   first_physical, size, last_attr,
> +   buf);
>  }
>  first_logical = logical;
>  first_physical = physical;
> @@ -583,125 +586,130 @@ static void dump_address_map(CPUM68KState *env, 
> uint32_t root_pointer)
>  }
>  if (first_logical != logical || (attr & 4) != (last_attr & 4)) {
>  size = logical + (1 << tic_shift) - first_logical;
> -print_address_zone(first_logical, first_physical, size, last_attr);
> +print_address_zone(first_logical, first_physical, size, last_attr, 
> buf);
>  }
>  }
>  
>  #define DUMP_CACHEFLAGS(a) \
>  switch (a & M68K_DESC_CACHEMODE) { \
>  case M68K_DESC_CM_WRTHRU: /* cachable, write-through */ \
> -qemu_printf("T"); \
> +g_string_append_printf(buf, "T"); \
>  break; \
>  case M68K_DESC_CM_COPYBK: /* cachable, copyback */ \
> 

Re: [PATCH] gitlab-ci: Make more custom runner jobs manual, and don't allow failure

2021-09-14 Thread Willian Rampazzo
On Tue, Sep 14, 2021 at 4:18 PM Peter Maydell  wrote:
>
> On Mon, 13 Sept 2021 at 11:19, Peter Maydell  wrote:
> >
> > Currently we define a lot of jobs for our custom runners:
> > for both aarch64 and s390x we have
> >  - all-linux-static
> >  - all
> >  - alldbg
> >  - clang (manual)
> >  - tci
> >  - notcg (manual)
> >
> > This is overkill.  The main reason to run on these hosts is to get
> > coverage for the host architecture; we can leave the handling of
> > differences like debug vs non-debug to the x86 CI jobs.
> >
> > The jobs are also generally running OK; they occasionally fail due to
> > timeouts, which is likely because we're overloading the machine by
> > asking it to run 4 CI jobs at once plus the ad-hoc CI.
> >
> > Remove the 'allow_failure' tag from all these jobs, and switch the
> > s390x-alldbg, aarch64-all, s390x-tci and aarch64-tci jobs to manual.
> > This will let us make the switch for s390x and aarch64 hosts from
> > the ad-hoc CI to gitlab.
> >
> > Signed-off-by: Peter Maydell 
>
> It looks like this change has resulted in pipelines ending
> up in a "blocked" state:
>
> https://gitlab.com/qemu-project/qemu/-/pipelines
>
> I'm not sure why this is -- is it perhaps because there were
> other jobs that depended on the now-manual-only jobs ?
> Can somebody suggest a fix ?

There are a couple of issues listed on the GitLab main repository
reporting the same behavior. When you remove allow_failure: true, the
job falls back to the default, allow_failure: false. As other stages may
depend on that job, which is now not allowed to fail, the pipeline is
marked as blocked.

Some people reported that setting the jobs back to allow_failure: true
"solved" the problem.

References:
https://gitlab.com/gitlab-org/gitlab/-/issues/39534
https://gitlab.com/gitlab-org/gitlab/-/issues/31415
https://gitlab.com/gitlab-org/gitlab-foss/-/issues/66602

>
> thanks
> -- PMM
>




Re: [PATCH v2 15/53] target/m68k: convert to use format_state instead of dump_state

2021-09-14 Thread Laurent Vivier
Le 14/09/2021 à 16:20, Daniel P. Berrangé a écrit :
> Signed-off-by: Daniel P. Berrangé 
> ---
>  target/m68k/cpu.c   |  2 +-
>  target/m68k/cpu.h   |  2 +-
>  target/m68k/translate.c | 92 ++---
>  3 files changed, 51 insertions(+), 45 deletions(-)
> 
> diff --git a/target/m68k/cpu.c b/target/m68k/cpu.c
> index 72de6e9726..4ccf572a68 100644
> --- a/target/m68k/cpu.c
> +++ b/target/m68k/cpu.c
> @@ -536,7 +536,7 @@ static void m68k_cpu_class_init(ObjectClass *c, void 
> *data)
>  
>  cc->class_by_name = m68k_cpu_class_by_name;
>  cc->has_work = m68k_cpu_has_work;
> -cc->dump_state = m68k_cpu_dump_state;
> +cc->format_state = m68k_cpu_format_state;
>  cc->set_pc = m68k_cpu_set_pc;
>  cc->gdb_read_register = m68k_cpu_gdb_read_register;
>  cc->gdb_write_register = m68k_cpu_gdb_write_register;
> diff --git a/target/m68k/cpu.h b/target/m68k/cpu.h
> index 997d588911..b0641f6d0d 100644
> --- a/target/m68k/cpu.h
> +++ b/target/m68k/cpu.h
> @@ -168,7 +168,7 @@ struct M68kCPU {
>  
>  void m68k_cpu_do_interrupt(CPUState *cpu);
>  bool m68k_cpu_exec_interrupt(CPUState *cpu, int int_req);
> -void m68k_cpu_dump_state(CPUState *cpu, FILE *f, int flags);
> +void m68k_cpu_format_state(CPUState *cpu, GString *buf, int flags);
>  hwaddr m68k_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
>  int m68k_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
>  int m68k_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
> diff --git a/target/m68k/translate.c b/target/m68k/translate.c
> index c34d9aed61..951bbed6bf 100644
> --- a/target/m68k/translate.c
> +++ b/target/m68k/translate.c
> @@ -6316,75 +6316,81 @@ static double floatx80_to_double(CPUM68KState *env, 
> uint16_t high, uint64_t low)
>  return u.d;
>  }
>  
> -void m68k_cpu_dump_state(CPUState *cs, FILE *f, int flags)
> +void m68k_cpu_format_state(CPUState *cs, GString *buf, int flags)
>  {
>  M68kCPU *cpu = M68K_CPU(cs);
>  CPUM68KState *env = >env;
>  int i;
>  uint16_t sr;
>  for (i = 0; i < 8; i++) {
> -qemu_fprintf(f, "D%d = %08x   A%d = %08x   "
> - "F%d = %04x %016"PRIx64"  (%12g)\n",
> - i, env->dregs[i], i, env->aregs[i],
> - i, env->fregs[i].l.upper, env->fregs[i].l.lower,
> - floatx80_to_double(env, env->fregs[i].l.upper,
> -env->fregs[i].l.lower));
> -}
> -qemu_fprintf(f, "PC = %08x   ", env->pc);
> +g_string_append_printf(buf, "D%d = %08x   A%d = %08x   "
> +   "F%d = %04x %016"PRIx64"  (%12g)\n",
> +   i, env->dregs[i], i, env->aregs[i],
> +   i, env->fregs[i].l.upper, 
> env->fregs[i].l.lower,
> +   floatx80_to_double(env, env->fregs[i].l.upper,
> +  env->fregs[i].l.lower));
> +}
> +g_string_append_printf(buf, "PC = %08x   ", env->pc);
>  sr = env->sr | cpu_m68k_get_ccr(env);
> -qemu_fprintf(f, "SR = %04x T:%x I:%x %c%c %c%c%c%c%c\n",
> - sr, (sr & SR_T) >> SR_T_SHIFT, (sr & SR_I) >> SR_I_SHIFT,
> - (sr & SR_S) ? 'S' : 'U', (sr & SR_M) ? '%' : 'I',
> - (sr & CCF_X) ? 'X' : '-', (sr & CCF_N) ? 'N' : '-',
> - (sr & CCF_Z) ? 'Z' : '-', (sr & CCF_V) ? 'V' : '-',
> - (sr & CCF_C) ? 'C' : '-');
> -qemu_fprintf(f, "FPSR = %08x %c%c%c%c ", env->fpsr,
> - (env->fpsr & FPSR_CC_A) ? 'A' : '-',
> - (env->fpsr & FPSR_CC_I) ? 'I' : '-',
> - (env->fpsr & FPSR_CC_Z) ? 'Z' : '-',
> - (env->fpsr & FPSR_CC_N) ? 'N' : '-');
> -qemu_fprintf(f, "\n"
> - "FPCR = %04x ", env->fpcr);
> +g_string_append_printf(buf, "SR = %04x T:%x I:%x %c%c %c%c%c%c%c\n",
> +   sr, (sr & SR_T) >> SR_T_SHIFT,
> +   (sr & SR_I) >> SR_I_SHIFT,
> +   (sr & SR_S) ? 'S' : 'U', (sr & SR_M) ? '%' : 'I',
> +   (sr & CCF_X) ? 'X' : '-', (sr & CCF_N) ? 'N' : 
> '-',
> +   (sr & CCF_Z) ? 'Z' : '-', (sr & CCF_V) ? 'V' : 
> '-',
> +   (sr & CCF_C) ? 'C' : '-');
> +g_string_append_printf(buf, "FPSR = %08x %c%c%c%c ", env->fpsr,
> +   (env->fpsr & FPSR_CC_A) ? 'A' : '-',
> +   (env->fpsr & FPSR_CC_I) ? 'I' : '-',
> +   (env->fpsr & FPSR_CC_Z) ? 'Z' : '-',
> +   (env->fpsr & FPSR_CC_N) ? 'N' : '-');
> +g_string_append_printf(buf, "\n"
> +   "FPCR = %04x ", env->fpcr);
>  switch (env->fpcr & FPCR_PREC_MASK) {
>  case FPCR_PREC_X:
> -qemu_fprintf(f, "X ");
> + 

Re: [PATCH v6 3/6] spapr: introduce spapr_numa_associativity_reset()

2021-09-14 Thread Daniel Henrique Barboza




On 9/14/21 08:55, Greg Kurz wrote:

On Fri, 10 Sep 2021 16:55:36 -0300
Daniel Henrique Barboza  wrote:


Introducing a new NUMA affinity, FORM2, requires a new mechanism to
switch between affinity modes after CAS. Also, we want FORM2 data
structures and functions to be completely separated from the existing
FORM1 code, allowing us to avoid adding new code that inherits the
existing complexity of FORM1.

At the same time, it's also desirable to minimize the amount of changes
made in write_dt() functions that are used to write ibm,associativity of
the resources, RTAS artifacts and h_home_node_associativity. These
functions can work in the same way in both NUMA affinity modes, as long
as we use a similar data structure and parametrize it properly depending
on the affinity mode selected.

This patch introduces spapr_numa_associativity_reset() to start this
process. This function will be used to switch to the chosen NUMA
affinity after CAS and after migrating the guest. To do that, the
existing 'numa_assoc_array' is renamed to 'FORM1_assoc_array' and will
hold FORM1 data that is populated at associativity_init().
'numa_assoc_array' is now a pointer that can be switched between the
existing affinity arrays. We don't have FORM2 data structures yet, so
'numa_assoc_array' will always point to 'FORM1_assoc_array'.

We also take the precaution of pointing 'numa_assoc_array' to
'FORM1_assoc_array' in associativity_init() time, before CAS, to not
change FORM1 availability for existing guests.

A small change in spapr_numa_write_associativity_dt() is made to reflect
the fact that 'numa_assoc_array' is now a pointer and we must be
explicit with the size being written in the DT.
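
(Illustrative only, not code from the patch: the pointer switch described
above essentially boils down to something like the sketch below - with FORM2
not existing yet, both init and reset end up selecting the FORM1 data.)

    /* sketch: field names as described in the commit message above */
    void spapr_numa_associativity_reset(SpaprMachineState *spapr)
    {
        /* only FORM1 data exists for now, so that is the only choice */
        spapr->numa_assoc_array = spapr->FORM1_assoc_array;
    }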

Signed-off-by: Daniel Henrique Barboza 
---
  hw/ppc/spapr.c  | 14 +
  hw/ppc/spapr_hcall.c|  7 +++
  hw/ppc/spapr_numa.c | 42 +
  include/hw/ppc/spapr.h  |  3 ++-
  include/hw/ppc/spapr_numa.h |  1 +
  5 files changed, 57 insertions(+), 10 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d39fd4e644..5afbb76cab 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1786,6 +1786,20 @@ static int spapr_post_load(void *opaque, int version_id)
  return err;
  }
  
+/*

+ * NUMA affinity selection is made in CAS time. There is no reliable
+ * way of telling whether the guest already went through CAS before
+ * migration due to how spapr_ov5_cas_needed works: a FORM1 guest can
+ * be migrated with ov5_cas empty regardless of going through CAS
+ * first.
+ *
+ * One solution is to call numa_associativity_reset(). The downside
+ * is that a guest migrated before CAS will reset it again when going
+ * through it, but since it's a lightweight operation it's worth being
+ * a little redundant to be safe.


Also this isn't a hot path.


+ */
+ spapr_numa_associativity_reset(spapr);
+
  return err;
  }
  
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c

index 0e9a5b2e40..82ab92ddba 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -17,6 +17,7 @@
  #include "kvm_ppc.h"
  #include "hw/ppc/fdt.h"
  #include "hw/ppc/spapr_ovec.h"
+#include "hw/ppc/spapr_numa.h"
  #include "mmu-book3s-v3.h"
  #include "hw/mem/memory-device.h"
  
@@ -1197,6 +1198,12 @@ target_ulong do_client_architecture_support(PowerPCCPU *cpu,

  spapr->cas_pre_isa3_guest = !spapr_ovec_test(ov1_guest, OV1_PPC_3_00);
  spapr_ovec_cleanup(ov1_guest);
  
+/*

+ * Reset numa_assoc_array now that we know which NUMA affinity
+ * the guest will use.
+ */
+spapr_numa_associativity_reset(spapr);
+
  /*
   * Ensure the guest asks for an interrupt mode we support;
   * otherwise terminate the boot.
diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index fb6059550f..327952ba9e 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -97,7 +97,7 @@ static void spapr_numa_define_FORM1_domains(SpaprMachineState 
*spapr)
   */
  for (i = 1; i < nb_numa_nodes; i++) {
  for (j = 1; j < MAX_DISTANCE_REF_POINTS; j++) {
-spapr->numa_assoc_array[i][j] = cpu_to_be32(i);
+spapr->FORM1_assoc_array[i][j] = cpu_to_be32(i);
  }
  }
  
@@ -149,8 +149,8 @@ static void spapr_numa_define_FORM1_domains(SpaprMachineState *spapr)

   * and going up to 0x1.
   */
  for (i = n_level; i > 0; i--) {
-assoc_src = spapr->numa_assoc_array[src][i];
-spapr->numa_assoc_array[dst][i] = assoc_src;
+assoc_src = spapr->FORM1_assoc_array[src][i];
+spapr->FORM1_assoc_array[dst][i] = assoc_src;
  }
  }
  }
@@ -167,6 +167,11 @@ static void 
spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
  int nb_numa_nodes = machine->numa_state->num_nodes;
  int i, j, max_nodes_with_gpus;
  
+/* init FORM1_assoc_array */

+for 

Re: [PATCH v2 20/53] target/ppc: convert to use format_state instead of dump_state

2021-09-14 Thread Greg Kurz
On Tue, 14 Sep 2021 15:20:09 +0100
Daniel P. Berrangé  wrote:

> Signed-off-by: Daniel P. Berrangé 
> ---

Acked-by: Greg Kurz 

>  target/ppc/cpu.h  |   2 +-
>  target/ppc/cpu_init.c | 212 +-
>  2 files changed, 126 insertions(+), 88 deletions(-)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 500205229c..c84ae29b98 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1256,7 +1256,7 @@ DECLARE_OBJ_CHECKERS(PPCVirtualHypervisor, 
> PPCVirtualHypervisorClass,
>  
>  void ppc_cpu_do_interrupt(CPUState *cpu);
>  bool ppc_cpu_exec_interrupt(CPUState *cpu, int int_req);
> -void ppc_cpu_dump_state(CPUState *cpu, FILE *f, int flags);
> +void ppc_cpu_format_state(CPUState *cpu, GString *buf, int flags);
>  hwaddr ppc_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
>  int ppc_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
>  int ppc_cpu_gdb_read_register_apple(CPUState *cpu, GByteArray *buf, int reg);
> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> index ad7abc6041..3456be465c 100644
> --- a/target/ppc/cpu_init.c
> +++ b/target/ppc/cpu_init.c
> @@ -9043,7 +9043,7 @@ static void ppc_cpu_class_init(ObjectClass *oc, void 
> *data)
>  
>  cc->class_by_name = ppc_cpu_class_by_name;
>  cc->has_work = ppc_cpu_has_work;
> -cc->dump_state = ppc_cpu_dump_state;
> +cc->format_state = ppc_cpu_format_state;
>  cc->set_pc = ppc_cpu_set_pc;
>  cc->gdb_read_register = ppc_cpu_gdb_read_register;
>  cc->gdb_write_register = ppc_cpu_gdb_write_register;
> @@ -9104,7 +9104,7 @@ static void ppc_cpu_register_types(void)
>  #endif
>  }
>  
> -void ppc_cpu_dump_state(CPUState *cs, FILE *f, int flags)
> +void ppc_cpu_format_state(CPUState *cs, GString *buf, int flags)
>  {
>  #define RGPL  4
>  #define RFPL  4
> @@ -9113,39 +9113,41 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, int 
> flags)
>  CPUPPCState *env = &cpu->env;
>  int i;
>  
> -qemu_fprintf(f, "NIP " TARGET_FMT_lx "   LR " TARGET_FMT_lx " CTR "
> - TARGET_FMT_lx " XER " TARGET_FMT_lx " CPU#%d\n",
> - env->nip, env->lr, env->ctr, cpu_read_xer(env),
> - cs->cpu_index);
> -qemu_fprintf(f, "MSR " TARGET_FMT_lx " HID0 " TARGET_FMT_lx "  HF "
> - "%08x iidx %d didx %d\n",
> - env->msr, env->spr[SPR_HID0], env->hflags,
> - cpu_mmu_index(env, true), cpu_mmu_index(env, false));
> +g_string_append_printf(buf,
> +   "NIP " TARGET_FMT_lx "   LR " TARGET_FMT_lx " CTR 
> "
> +   TARGET_FMT_lx " XER " TARGET_FMT_lx " CPU#%d\n",
> +   env->nip, env->lr, env->ctr, cpu_read_xer(env),
> +   cs->cpu_index);
> +g_string_append_printf(buf,
> +   "MSR " TARGET_FMT_lx " HID0 " TARGET_FMT_lx "  HF 
> "
> +   "%08x iidx %d didx %d\n",
> +   env->msr, env->spr[SPR_HID0], env->hflags,
> +   cpu_mmu_index(env, true), cpu_mmu_index(env, 
> false));
>  #if !defined(NO_TIMER_DUMP)
> -qemu_fprintf(f, "TB %08" PRIu32 " %08" PRIu64
> +g_string_append_printf(buf, "TB %08" PRIu32 " %08" PRIu64
>  #if !defined(CONFIG_USER_ONLY)
> - " DECR " TARGET_FMT_lu
> +   " DECR " TARGET_FMT_lu
>  #endif
> - "\n",
> - cpu_ppc_load_tbu(env), cpu_ppc_load_tbl(env)
> +   "\n",
> +   cpu_ppc_load_tbu(env), cpu_ppc_load_tbl(env)
>  #if !defined(CONFIG_USER_ONLY)
> - , cpu_ppc_load_decr(env)
> +   , cpu_ppc_load_decr(env)
>  #endif
>  );
>  #endif
>  for (i = 0; i < 32; i++) {
>  if ((i & (RGPL - 1)) == 0) {
> -qemu_fprintf(f, "GPR%02d", i);
> +g_string_append_printf(buf, "GPR%02d", i);
>  }
> -qemu_fprintf(f, " %016" PRIx64, ppc_dump_gpr(env, i));
> +g_string_append_printf(buf, " %016" PRIx64, ppc_dump_gpr(env, i));
>  if ((i & (RGPL - 1)) == (RGPL - 1)) {
> -qemu_fprintf(f, "\n");
> +g_string_append_printf(buf, "\n");
>  }
>  }
> -qemu_fprintf(f, "CR ");
> +g_string_append_printf(buf, "CR ");
>  for (i = 0; i < 8; i++)
> -qemu_fprintf(f, "%01x", env->crf[i]);
> -qemu_fprintf(f, "  [");
> +g_string_append_printf(buf, "%01x", env->crf[i]);
> +g_string_append_printf(buf, "  [");
>  for (i = 0; i < 8; i++) {
>  char a = '-';
>  if (env->crf[i] & 0x08) {
> @@ -9155,75 +9157,97 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, int 
> flags)
>  } else if (env->crf[i] & 0x02) {
>  a = 'E';
>  }
> -qemu_fprintf(f, " %c%c", a, env->crf[i] & 0x01 ? 'O' : ' ');
> +g_string_append_printf(buf, " %c%c", a, env->crf[i] & 0x01 ? 'O' : ' 
> 

[PULL v4 00/43] tcg patch queue

2021-09-14 Thread Richard Henderson
Version 4: Drop the cpu_loop noreturn patch.


r~


The following changes since commit 4c9af1ea1457782cf0adb293179335ef6de942aa:

  gitlab-ci: Make more custom runner jobs manual, and don't allow failure 
(2021-09-14 17:03:03 +0100)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20210914-4

for you to fetch changes up to e028eada62dbfcba134ac5afdefc3aa343ae202f:

  tcg/arm: More use of the TCGReg enum (2021-09-14 12:00:21 -0700)


Fix translation race condition for user-only.
Fix tcg/i386 encoding for VPSLLVQ, VPSRLVQ.
Fix tcg/arm tcg_out_vec_op signature.
Fix tcg/ppc (32bit) build with clang.
Remove duplicate TCG_KICK_PERIOD definition.
Remove unused tcg_global_reg_new.
Restrict cpu_exec_interrupt and its callees to sysemu.
Cleanups for tcg/arm.


Bin Meng (1):
  tcg: Remove tcg_global_reg_new defines

Ilya Leoshkevich (3):
  accel/tcg: Add DisasContextBase argument to translator_ld*
  accel/tcg: Clear PAGE_WRITE before translation
  accel/tcg/user-exec: Fix read-modify-write of code on s390 hosts

Jose R. Ziviani (1):
  tcg/arm: Fix tcg_out_vec_op function signature

Luc Michel (1):
  accel/tcg: remove redundant TCG_KICK_PERIOD define

Philippe Mathieu-Daudé (24):
  target/avr: Remove pointless use of CONFIG_USER_ONLY definition
  target/i386: Restrict sysemu-only fpu_helper helpers
  target/i386: Simplify TARGET_X86_64 #ifdef'ry
  target/xtensa: Restrict do_transaction_failed() to sysemu
  accel/tcg: Rename user-mode do_interrupt hack as fake_user_interrupt
  target/alpha: Restrict cpu_exec_interrupt() handler to sysemu
  target/arm: Restrict cpu_exec_interrupt() handler to sysemu
  target/cris: Restrict cpu_exec_interrupt() handler to sysemu
  target/hppa: Restrict cpu_exec_interrupt() handler to sysemu
  target/i386: Restrict cpu_exec_interrupt() handler to sysemu
  target/i386: Move x86_cpu_exec_interrupt() under sysemu/ folder
  target/m68k: Restrict cpu_exec_interrupt() handler to sysemu
  target/microblaze: Restrict cpu_exec_interrupt() handler to sysemu
  target/mips: Restrict cpu_exec_interrupt() handler to sysemu
  target/nios2: Restrict cpu_exec_interrupt() handler to sysemu
  target/openrisc: Restrict cpu_exec_interrupt() handler to sysemu
  target/ppc: Restrict cpu_exec_interrupt() handler to sysemu
  target/riscv: Restrict cpu_exec_interrupt() handler to sysemu
  target/sh4: Restrict cpu_exec_interrupt() handler to sysemu
  target/sparc: Restrict cpu_exec_interrupt() handler to sysemu
  target/rx: Restrict cpu_exec_interrupt() handler to sysemu
  target/xtensa: Restrict cpu_exec_interrupt() handler to sysemu
  accel/tcg: Restrict TCGCPUOps::cpu_exec_interrupt() to sysemu
  user: Remove cpu_get_pic_interrupt() stubs

Richard Henderson (13):
  tcg/i386: Split P_VEXW from P_REXW
  tcg/ppc: Replace TCG_TARGET_CALL_DARWIN with _CALL_DARWIN
  tcg/ppc: Ensure _CALL_SYSV is set for 32-bit ELF
  tcg/arm: Remove fallback definition of __ARM_ARCH
  tcg/arm: Standardize on tcg_out__{reg,imm}
  tcg/arm: Simplify use_armv5t_instructions
  tcg/arm: Support armv4t in tcg_out_goto and tcg_out_call
  tcg/arm: Split out tcg_out_ldstm
  tcg/arm: Simplify usage of encode_imm
  tcg/arm: Drop inline markers
  tcg/arm: Give enum arm_cond_code_e a typedef and use it
  tcg/arm: More use of the ARMInsn enum
  tcg/arm: More use of the TCGReg enum

 include/exec/translate-all.h  |   1 +
 include/exec/translator.h |  44 +--
 include/hw/core/tcg-cpu-ops.h |  26 +-
 include/tcg/tcg-op.h  |   2 -
 target/alpha/cpu.h|   2 +-
 target/arm/arm_ldst.h |  12 +-
 target/arm/cpu.h  |   3 +-
 target/cris/cpu.h |   2 +-
 target/hppa/cpu.h |   4 +-
 target/i386/cpu.h |   3 +
 target/i386/tcg/helper-tcg.h  |   2 +
 target/m68k/cpu.h |   2 +
 target/microblaze/cpu.h   |   2 +
 target/mips/tcg/tcg-internal.h|   5 +-
 target/openrisc/cpu.h |   5 +-
 target/ppc/cpu.h  |   4 +-
 target/riscv/cpu.h|   2 +-
 target/rx/cpu.h   |   2 +
 target/sh4/cpu.h  |   4 +-
 target/xtensa/cpu.h   |   2 +
 tcg/arm/tcg-target.h  |  27 +-
 accel/tcg/cpu-exec.c  |  14 +-
 accel/tcg/tcg-accel-ops-rr.c  |   2 -
 accel/tcg/translate-all.c |  59 ++--
 accel/tcg/translator.c|  39 +++
 accel/tcg/user-exec.c |  48 ++-
 bsd

[PATCH v4 00/43] tcg patch queue

2021-09-14 Thread Richard Henderson
Version 4: Drop the cpu_loop noreturn patch.


r~


The following changes since commit 4c9af1ea1457782cf0adb293179335ef6de942aa:

  gitlab-ci: Make more custom runner jobs manual, and don't allow failure 
(2021-09-14 17:03:03 +0100)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20210914-4

for you to fetch changes up to e028eada62dbfcba134ac5afdefc3aa343ae202f:

  tcg/arm: More use of the TCGReg enum (2021-09-14 12:00:21 -0700)


Fix translation race condition for user-only.
Fix tcg/i386 encoding for VPSLLVQ, VPSRLVQ.
Fix tcg/arm tcg_out_vec_op signature.
Fix tcg/ppc (32bit) build with clang.
Remove duplicate TCG_KICK_PERIOD definition.
Remove unused tcg_global_reg_new.
Restrict cpu_exec_interrupt and its callees to sysemu.
Cleanups for tcg/arm.


Bin Meng (1):
  tcg: Remove tcg_global_reg_new defines

Ilya Leoshkevich (3):
  accel/tcg: Add DisasContextBase argument to translator_ld*
  accel/tcg: Clear PAGE_WRITE before translation
  accel/tcg/user-exec: Fix read-modify-write of code on s390 hosts

Jose R. Ziviani (1):
  tcg/arm: Fix tcg_out_vec_op function signature

Luc Michel (1):
  accel/tcg: remove redundant TCG_KICK_PERIOD define

Philippe Mathieu-Daudé (24):
  target/avr: Remove pointless use of CONFIG_USER_ONLY definition
  target/i386: Restrict sysemu-only fpu_helper helpers
  target/i386: Simplify TARGET_X86_64 #ifdef'ry
  target/xtensa: Restrict do_transaction_failed() to sysemu
  accel/tcg: Rename user-mode do_interrupt hack as fake_user_interrupt
  target/alpha: Restrict cpu_exec_interrupt() handler to sysemu
  target/arm: Restrict cpu_exec_interrupt() handler to sysemu
  target/cris: Restrict cpu_exec_interrupt() handler to sysemu
  target/hppa: Restrict cpu_exec_interrupt() handler to sysemu
  target/i386: Restrict cpu_exec_interrupt() handler to sysemu
  target/i386: Move x86_cpu_exec_interrupt() under sysemu/ folder
  target/m68k: Restrict cpu_exec_interrupt() handler to sysemu
  target/microblaze: Restrict cpu_exec_interrupt() handler to sysemu
  target/mips: Restrict cpu_exec_interrupt() handler to sysemu
  target/nios2: Restrict cpu_exec_interrupt() handler to sysemu
  target/openrisc: Restrict cpu_exec_interrupt() handler to sysemu
  target/ppc: Restrict cpu_exec_interrupt() handler to sysemu
  target/riscv: Restrict cpu_exec_interrupt() handler to sysemu
  target/sh4: Restrict cpu_exec_interrupt() handler to sysemu
  target/sparc: Restrict cpu_exec_interrupt() handler to sysemu
  target/rx: Restrict cpu_exec_interrupt() handler to sysemu
  target/xtensa: Restrict cpu_exec_interrupt() handler to sysemu
  accel/tcg: Restrict TCGCPUOps::cpu_exec_interrupt() to sysemu
  user: Remove cpu_get_pic_interrupt() stubs

Richard Henderson (13):
  tcg/i386: Split P_VEXW from P_REXW
  tcg/ppc: Replace TCG_TARGET_CALL_DARWIN with _CALL_DARWIN
  tcg/ppc: Ensure _CALL_SYSV is set for 32-bit ELF
  tcg/arm: Remove fallback definition of __ARM_ARCH
  tcg/arm: Standardize on tcg_out__{reg,imm}
  tcg/arm: Simplify use_armv5t_instructions
  tcg/arm: Support armv4t in tcg_out_goto and tcg_out_call
  tcg/arm: Split out tcg_out_ldstm
  tcg/arm: Simplify usage of encode_imm
  tcg/arm: Drop inline markers
  tcg/arm: Give enum arm_cond_code_e a typedef and use it
  tcg/arm: More use of the ARMInsn enum
  tcg/arm: More use of the TCGReg enum

 include/exec/translate-all.h  |   1 +
 include/exec/translator.h |  44 +--
 include/hw/core/tcg-cpu-ops.h |  26 +-
 include/tcg/tcg-op.h  |   2 -
 target/alpha/cpu.h|   2 +-
 target/arm/arm_ldst.h |  12 +-
 target/arm/cpu.h  |   3 +-
 target/cris/cpu.h |   2 +-
 target/hppa/cpu.h |   4 +-
 target/i386/cpu.h |   3 +
 target/i386/tcg/helper-tcg.h  |   2 +
 target/m68k/cpu.h |   2 +
 target/microblaze/cpu.h   |   2 +
 target/mips/tcg/tcg-internal.h|   5 +-
 target/openrisc/cpu.h |   5 +-
 target/ppc/cpu.h  |   4 +-
 target/riscv/cpu.h|   2 +-
 target/rx/cpu.h   |   2 +
 target/sh4/cpu.h  |   4 +-
 target/xtensa/cpu.h   |   2 +
 tcg/arm/tcg-target.h  |  27 +-
 accel/tcg/cpu-exec.c  |  14 +-
 accel/tcg/tcg-accel-ops-rr.c  |   2 -
 accel/tcg/translate-all.c |  59 ++--
 accel/tcg/translator.c|  39 +++
 accel/tcg/user-exec.c |  48 ++-
 bsd

Re: [PATCH v2 06/53] hw/core: introduce 'format_state' callback to replace 'dump_state'

2021-09-14 Thread Greg Kurz
On Tue, 14 Sep 2021 15:19:55 +0100
Daniel P. Berrangé  wrote:

> The 'dump_state' callback assumes it will be outputting to a FILE
> object. This is fine for HMP, but not so useful for QMP. Introduce
> a new 'format_state' callback that returns a formatted GString
> instead.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---

Reviewed-by: Greg Kurz 

>  hw/core/cpu-common.c  | 15 +++
>  include/hw/core/cpu.h | 13 -
>  2 files changed, 27 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
> index e2f5a64604..c2cd33a817 100644
> --- a/hw/core/cpu-common.c
> +++ b/hw/core/cpu-common.c
> @@ -106,6 +106,21 @@ void cpu_dump_state(CPUState *cpu, FILE *f, int flags)
>  if (cc->dump_state) {
>  cpu_synchronize_state(cpu);
>  cc->dump_state(cpu, f, flags);
> +} else if (cc->format_state) {
> +g_autoptr(GString) buf = g_string_new("");
> +cpu_synchronize_state(cpu);
> +cc->format_state(cpu, buf, flags);
> +qemu_fprintf(f, "%s", buf->str);
> +}
> +}
> +
> +void cpu_format_state(CPUState *cpu, GString *buf, int flags)
> +{
> +CPUClass *cc = CPU_GET_CLASS(cpu);
> +
> +if (cc->format_state) {
> +cpu_synchronize_state(cpu);
> +cc->format_state(cpu, buf, flags);
>  }
>  }
>  
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index bc864564ce..1599ef9df3 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -91,7 +91,8 @@ struct SysemuCPUOps;
>   * @reset_dump_flags: #CPUDumpFlags to use for reset logging.
>   * @has_work: Callback for checking if there is work to do.
>   * @memory_rw_debug: Callback for GDB memory access.
> - * @dump_state: Callback for dumping state.
> + * @dump_state: Callback for dumping state. Deprecated, use @format_state.
> + * @format_state: Callback for formatting state.
>   * @get_arch_id: Callback for getting architecture-dependent CPU ID.
>   * @set_pc: Callback for setting the Program Counter register. This
>   *   should have the semantics used by the target architecture when
> @@ -136,6 +137,7 @@ struct CPUClass {
>  int (*memory_rw_debug)(CPUState *cpu, vaddr addr,
> uint8_t *buf, int len, bool is_write);
>  void (*dump_state)(CPUState *cpu, FILE *, int flags);
> +void (*format_state)(CPUState *cpu, GString *buf, int flags);
>  int64_t (*get_arch_id)(CPUState *cpu);
>  void (*set_pc)(CPUState *cpu, vaddr value);
>  int (*gdb_read_register)(CPUState *cpu, GByteArray *buf, int reg);
> @@ -537,6 +539,15 @@ enum CPUDumpFlags {
>   */
>  void cpu_dump_state(CPUState *cpu, FILE *f, int flags);
>  
> +/**
> + * cpu_format_state:
> + * @cpu: The CPU whose state is to be formatted.
> + * @buf: buffer to format state into
> + *
> + * Formats the CPU state.
> + */
> +void cpu_format_state(CPUState *cpu, GString *buf, int flags);
> +
>  #ifndef CONFIG_USER_ONLY
>  /**
>   * cpu_get_phys_page_attrs_debug:




Re: [PATCH] gitlab-ci: Make more custom runner jobs manual, and don't allow failure

2021-09-14 Thread Peter Maydell
On Mon, 13 Sept 2021 at 11:19, Peter Maydell  wrote:
>
> Currently we define a lot of jobs for our custom runners:
> for both aarch64 and s390x we have
>  - all-linux-static
>  - all
>  - alldbg
>  - clang (manual)
>  - tci
>  - notcg (manual)
>
> This is overkill.  The main reason to run on these hosts is to get
> coverage for the host architecture; we can leave the handling of
> differences like debug vs non-debug to the x86 CI jobs.
>
> The jobs are also generally running OK; they occasionally fail due to
> timeouts, which is likely because we're overloading the machine by
> asking it to run 4 CI jobs at once plus the ad-hoc CI.
>
> Remove the 'allow_failure' tag from all these jobs, and switch the
> s390x-alldbg, aarch64-all, s390x-tci and aarch64-tci jobs to manual.
> This will let us make the switch for s390x and aarch64 hosts from
> the ad-hoc CI to gitlab.
>
> Signed-off-by: Peter Maydell 

It looks like this change has resulted in pipelines ending
up in a "blocked" state:

https://gitlab.com/qemu-project/qemu/-/pipelines

I'm not sure why this is -- is it perhaps because there were
other jobs that depended on the now-manual-only jobs ?
Can somebody suggest a fix ?
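
My best guess, not verified: a manual job that is not also marked
allow_failure: true makes GitLab hold the whole pipeline in the
"blocked" state until someone triggers it. If that's the cause, the
now-manual jobs may need something like this (job name illustrative,
just a sketch):

 s390x-alldbg:
   when: manual
+  allow_failure: true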

thanks
-- PMM



Re: [PULL 0/5] Misc patches

2021-09-14 Thread Peter Maydell
On Tue, 14 Sept 2021 at 14:01,  wrote:
>
> From: Marc-André Lureau 
>
> The following changes since commit c6f5e042d89e79206cd1ce5525d3df219f13c3cc:
>
>   Merge remote-tracking branch 
> 'remotes/pmaydell/tags/pull-target-arm-20210913-3' into staging (2021-09-13 
> 21:06:15 +0100)
>
> are available in the Git repository at:
>
>   g...@gitlab.com:marcandre.lureau/qemu.git tags/misc-pull-request
>
> for you to fetch changes up to 78e3e1d046e64b86e8c9bf3011d5a2a795b5e373:
>
>   chardev: add some comments about the class methods (2021-09-14 16:57:11 
> +0400)
>
> 
> chardev & doc misc
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.2
for any user-visible changes.

-- PMM



Re: [PATCH v2] nbd/server: Suppress Broken pipe errors on abrupt disconnection

2021-09-14 Thread Eric Blake
On Tue, Sep 14, 2021 at 03:52:00PM +0100, Richard W.M. Jones wrote:
> On Tue, Sep 14, 2021 at 05:40:59PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > 13.09.2021 18:19, Richard W.M. Jones wrote:
> > >$ rm -f /tmp/sock /tmp/pid
> > >$ qemu-img create -f qcow2 /tmp/disk.qcow2 1M
> > >$ qemu-nbd -t --format=qcow2 --socket=/tmp/sock --pid-file=/tmp/pid 
> > >/tmp/disk.qcow2 &
> > >$ nbdsh -u 'nbd+unix:///?socket=/tmp/sock' -c 'h.get_size()'
> > >qemu-nbd: Disconnect client, due to: Failed to send reply: Unable to write 
> > >to socket: Broken pipe
> > >$ killall qemu-nbd
> > >
> > >nbdsh is abruptly dropping the NBD connection here which is a valid
> > >way to close the connection.  It seems unnecessary to print an error
> > >in this case so this commit suppresses it.
> > >
> > >Note that if you call the nbdsh h.shutdown() method then the message
> > >was not printed:
> > >
> > >$ nbdsh -u 'nbd+unix:///?socket=/tmp/sock' -c 'h.get_size()' -c 
> > >'h.shutdown()'
> >
> > My personal opinion is that this warning doesn't hurt in general. I
> > think in production, tools should gracefully shut down any connection,
> > and an abrupt shutdown is a sign of something wrong - i.e., worth
> > warning about.
> >
> > Shouldn't nbdsh do graceful shutdown by default?
> 
> On the client side the only difference is that nbd_shutdown sends
> NBD_CMD_DISC to the server (versus simply closing the socket).  On the
> server side when the server receives NBD_CMD_DISC it must complete any
> in-flight requests, but there's no requirement for the server to
> commit anything to disk.  IOW you can still lose data even though you
> took the time to disconnect.

If you use NBD_CMD_FLUSH as the last command before NBD_CMD_DISC, then
you shouldn't have data loss (but it requires the server to support
flush).  And in general, while a server that does not flush data on
CMD_DISC is compliant, it is poor quality of implementation if it
strands data that easily, for a client that tried hard to exit
gracefully.
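
For concreteness, with the nbdsh example from earlier in the thread that
would be roughly (assuming libnbd's h.flush() binding; a sketch, not a
tested command):

$ nbdsh -u 'nbd+unix:///?socket=/tmp/sock' -c 'h.flush()' -c 'h.shutdown()'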

> 
> So I don't think there's any reason for libnbd to always gracefully
> shut down (especially in this case where there are no in-flight
> requests), and anyway it would break ABI to make that change and slow
> down the client in cases when there's nothing to clean up.

I agree that we don't want libnbd to always gracefully shut down by
default; end users can already choose a graceful shutdown when they
want.

At the same time, I would not be opposed to improving the libnbd and
nbdkit testsuite usage of libnbd to request graceful shutdown in
places where it is currently getting an abrupt disconnect merely
because we were lazy when writing the test.

> 
> > >Signed-off-by: Richard W.M. Jones 
> > >---
> > >  nbd/server.c | 7 ++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > >
> > >diff --git a/nbd/server.c b/nbd/server.c
> > >index 3927f7789d..e0a43802b2 100644
> > >--- a/nbd/server.c
> > >+++ b/nbd/server.c
> > >@@ -2669,7 +2669,12 @@ static coroutine_fn void nbd_trip(void *opaque)
> > >  ret = nbd_handle_request(client, &request, req->data, &local_err);
> > >  }
> > >  if (ret < 0) {
> > >-error_prepend(&local_err, "Failed to send reply: ");
> > >+if (errno != EPIPE) {
> >
> > Both nbd_handle_request() and nbd_send_generic_reply() declare that
> > they return -errno on failure in communication with the client. I think
> > you should use ret here: if (ret != -EPIPE). It's safer: who knows
> > whether errno is really set on all error paths of the called functions?
> > If not, we may see here the errno of some other, earlier operation.
> 
> Should we set errno = 0 earlier in nbd_trip()?  I don't really know
> how coroutines in qemu interact with thread-local variables though.

No, we don't need to set errno to 0 prior to a call except at points
where we expect errno to be reliable after the call; but nbd_trip()
does not have any guarantees of reliable errno in the first place
(instead, it captured errno into the return value prior to any point
where errno loses its reliability).


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




[PATCH 2/2] gitlab: Add cross-riscv64-system, cross-riscv64-user

2021-09-14 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 .gitlab-ci.d/crossbuilds.yml | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml
index f10168db2e..0fe4a55ac5 100644
--- a/.gitlab-ci.d/crossbuilds.yml
+++ b/.gitlab-ci.d/crossbuilds.yml
@@ -124,6 +124,20 @@ cross-ppc64el-user:
   variables:
 IMAGE: debian-ppc64el-cross
 
+cross-riscv64-system:
+  extends: .cross_system_build_job
+  needs:
+job: riscv64-debian-cross-container
+  variables:
+IMAGE: debian-riscv64-cross
+
+cross-riscv64-user:
+  extends: .cross_user_build_job
+  needs:
+job: riscv64-debian-cross-container
+  variables:
+IMAGE: debian-riscv64-cross
+
 cross-s390x-system:
   extends: .cross_system_build_job
   needs:
-- 
2.25.1




Re: [PATCH v2] nbd/server: Suppress Broken pipe errors on abrupt disconnection

2021-09-14 Thread Eric Blake
On Tue, Sep 14, 2021 at 05:40:59PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 13.09.2021 18:19, Richard W.M. Jones wrote:
> > $ rm -f /tmp/sock /tmp/pid
> > $ qemu-img create -f qcow2 /tmp/disk.qcow2 1M
> > $ qemu-nbd -t --format=qcow2 --socket=/tmp/sock --pid-file=/tmp/pid 
> > /tmp/disk.qcow2 &
> > $ nbdsh -u 'nbd+unix:///?socket=/tmp/sock' -c 'h.get_size()'
> > qemu-nbd: Disconnect client, due to: Failed to send reply: Unable to write 
> > to socket: Broken pipe
> > $ killall qemu-nbd
> > 
> > nbdsh is abruptly dropping the NBD connection here which is a valid
> > way to close the connection.  It seems unnecessary to print an error
> > in this case so this commit suppresses it.
> > 
> > Note that if you call the nbdsh h.shutdown() method then the message
> > was not printed:
> > 
> > $ nbdsh -u 'nbd+unix:///?socket=/tmp/sock' -c 'h.get_size()' -c 
> > 'h.shutdown()'
> 
> My personal opinion is that this warning doesn't hurt in general. I think
> in production, tools should gracefully shut down any connection, and an
> abrupt shutdown is a sign of something wrong - i.e., worth warning about.
> 
> Shouldn't nbdsh do graceful shutdown by default?

nbdsh exposes the ability to do graceful shutdown, but does not force
it (it is up to the client software using nbdsh whether it calls the
right APIs for a graceful shutdown).

We might consider a new API (which we'd then expose via a new
command-line option to nbdsh) that requests that libnbd try harder to
send NBD_OPT_ABORT or NBD_CMD_DISC prior to closing, but it would
still be something that end users would have to opt into using, and
not something we can turn on by default.

> > +++ b/nbd/server.c
> > @@ -2669,7 +2669,12 @@ static coroutine_fn void nbd_trip(void *opaque)
> >   ret = nbd_handle_request(client, &request, req->data, &local_err);
> >   }
> >   if (ret < 0) {
> > -error_prepend(&local_err, "Failed to send reply: ");
> > +if (errno != EPIPE) {
> 
> Both nbd_handle_request() and nbd_send_generic_reply() declare that they
> return -errno on failure in communication with the client. I think you
> should use ret here: if (ret != -EPIPE). It's safer: who knows whether
> errno is really set on all error paths of the called functions? If not,
> we may see here the errno of some other, earlier operation.

Correct - 'errno' is indeterminate at this point; the correct check is
if (-ret != EPIPE).  I can make that tweak while taking this patch, if
we decide it is worthwhile.
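
For illustration, the tweak would amount to roughly this on top of the
hunk quoted above (a sketch, not the committed change):

-        if (errno != EPIPE) {
+        if (-ret != EPIPE) {
             error_prepend(&local_err, "Failed to send reply: ");
         }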


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




[PATCH 0/2] gitlab-ci: Add riscv64 cross builds

2021-09-14 Thread Richard Henderson
The only tcg host that does not have build coverage is riscv64.
Filling this hole will help with tcg reorgs I have in the works.

Thanks to Alex for help debugging the docker image creation.


r~


Alex Bennée (1):
  tests/docker: promote debian-riscv64-cross to a full image

Richard Henderson (1):
  gitlab: Add cross-riscv64-system, cross-riscv64-user

 .gitlab-ci.d/container-cross.yml  |  1 -
 .gitlab-ci.d/crossbuilds.yml  | 14 ++
 tests/docker/Makefile.include |  2 -
 .../dockerfiles/debian-riscv64-cross.docker   | 46 +--
 4 files changed, 55 insertions(+), 8 deletions(-)

-- 
2.25.1




[PATCH 1/2] tests/docker: promote debian-riscv64-cross to a full image

2021-09-14 Thread Richard Henderson
From: Alex Bennée 

To be able to cross build QEMU itself we need to include a few more
libraries. These are only available in Debian's unstable ports repo
for now, so we need to base the riscv64 image on sid with the
minimal libs needed to build QEMU (glib/pixman).

The result works but is not as clean as using build-dep to bring in
more dependencies. However, sid is by definition a shifting pile of
sand, and by keeping the list of libs minimal we reduce the chance of
ending up with an image we can't build. It's good enough for basic
cross-build testing of TCG.

Cc: "Daniel P. Berrangé" 
Signed-off-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 .gitlab-ci.d/container-cross.yml  |  1 -
 tests/docker/Makefile.include |  2 -
 .../dockerfiles/debian-riscv64-cross.docker   | 46 +--
 3 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/.gitlab-ci.d/container-cross.yml b/.gitlab-ci.d/container-cross.yml
index 0fcebe363a..05996200e1 100644
--- a/.gitlab-ci.d/container-cross.yml
+++ b/.gitlab-ci.d/container-cross.yml
@@ -134,7 +134,6 @@ ppc64el-debian-cross-container:
 riscv64-debian-cross-container:
   extends: .container_job_template
   stage: containers-layer2
-  needs: ['amd64-debian10-container']
   variables:
 NAME: debian-riscv64-cross
 
diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index ff5d732889..3b03763186 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -141,7 +141,6 @@ docker-image-debian-mips64-cross: docker-image-debian10
 docker-image-debian-mips64el-cross: docker-image-debian10
 docker-image-debian-mipsel-cross: docker-image-debian10
 docker-image-debian-ppc64el-cross: docker-image-debian10
-docker-image-debian-riscv64-cross: docker-image-debian10
 docker-image-debian-s390x-cross: docker-image-debian10
 docker-image-debian-sh4-cross: docker-image-debian10
 docker-image-debian-sparc64-cross: docker-image-debian10
@@ -180,7 +179,6 @@ DOCKER_PARTIAL_IMAGES += debian-arm64-test-cross
 DOCKER_PARTIAL_IMAGES += debian-powerpc-test-cross
 DOCKER_PARTIAL_IMAGES += debian-hppa-cross
 DOCKER_PARTIAL_IMAGES += debian-m68k-cross debian-mips64-cross
-DOCKER_PARTIAL_IMAGES += debian-riscv64-cross
 DOCKER_PARTIAL_IMAGES += debian-sh4-cross debian-sparc64-cross
 DOCKER_PARTIAL_IMAGES += debian-tricore-cross
 DOCKER_PARTIAL_IMAGES += debian-xtensa-cross
diff --git a/tests/docker/dockerfiles/debian-riscv64-cross.docker 
b/tests/docker/dockerfiles/debian-riscv64-cross.docker
index 2bbff19772..594d97982c 100644
--- a/tests/docker/dockerfiles/debian-riscv64-cross.docker
+++ b/tests/docker/dockerfiles/debian-riscv64-cross.docker
@@ -1,12 +1,48 @@
 #
-# Docker cross-compiler target
+# Docker cross-compiler target for riscv64
 #
-# This docker target builds on the debian Buster base image.
+# Currently the only distro that gets close to cross compiling riscv64
+# images is Debian Sid (with unofficial ports). As this is a moving
+# target we keep the library list minimal and are aiming to migrate
+# from this hack as soon as we are able.
 #
-FROM qemu/debian10
+FROM docker.io/library/debian:sid-slim
+
+# Add ports
+RUN apt update && \
+DEBIAN_FRONTEND=noninteractive apt install -yy eatmydata && \
+DEBIAN_FRONTEND=noninteractive eatmydata apt update -yy && \
+DEBIAN_FRONTEND=noninteractive eatmydata apt upgrade -yy
+
+# Install common build utilities
+RUN DEBIAN_FRONTEND=noninteractive eatmydata apt install -yy \
+bc \
+build-essential \
+ca-certificates \
+debian-ports-archive-keyring \
+dpkg-dev \
+gettext \
+git \
+ninja-build \
+pkg-config \
+python3
+
+# Add ports and riscv64 architecture
+RUN echo "deb http://ftp.ports.debian.org/debian-ports/ sid main" >> 
/etc/apt/sources.list
+RUN dpkg --add-architecture riscv64
+
+# Duplicate deb line as deb-src
+RUN cat /etc/apt/sources.list | sed "s/^deb\ /deb-src /" >> 
/etc/apt/sources.list
 
 RUN apt update && \
 DEBIAN_FRONTEND=noninteractive eatmydata \
 apt install -y --no-install-recommends \
-gcc-riscv64-linux-gnu \
-libc6-dev-riscv64-cross
+ gcc-riscv64-linux-gnu \
+ libc6-dev-riscv64-cross \
+ libffi-dev:riscv64 \
+ libglib2.0-dev:riscv64 \
+ libpixman-1-dev:riscv64
+
+# Specify the cross prefix for this image (see tests/docker/common.rc)
+ENV QEMU_CONFIGURE_OPTS --cross-prefix=riscv64-linux-gnu-
+ENV DEF_TARGET_LIST riscv64-softmmu,riscv64-linux-user
-- 
2.25.1




[RFC 1/1] docs/devel/ci-plan: define a high-level plan for the QEMU GitLab CI

2021-09-14 Thread Willian Rampazzo
This adds a high-level plan for the QEMU GitLab CI based on use cases.
The idea is to have a base for evolving the QEMU CI. It sets high-level
characteristics for the QEMU CI use cases, which helps guide its
development.

Signed-off-by: Willian Rampazzo 
---
 docs/devel/ci-plan.rst | 77 ++
 docs/devel/ci.rst  |  1 +
 2 files changed, 78 insertions(+)
 create mode 100644 docs/devel/ci-plan.rst

diff --git a/docs/devel/ci-plan.rst b/docs/devel/ci-plan.rst
new file mode 100644
index 00..5e95b6bcea
--- /dev/null
+++ b/docs/devel/ci-plan.rst
@@ -0,0 +1,77 @@
+The GitLab CI structure
+===
+
+This section describes the current state of the QEMU GitLab CI and the
+high-level plan for its future.
+
+Current state
+-
+
+The mainstream QEMU project considers the GitLab CI its primary CI system.
+Currently, it runs 120+ jobs, where ~36 are container build jobs, 69 are QEMU
+build jobs, ~22 are test jobs, 1 is a web page deploy job, and 1 is an
+external job covering Travis job execution.
+
+In the current state, every push a user makes to their fork runs most of the
+same jobs that run on the main repository. The exceptions are the acceptance
+test jobs, which run automatically on the main repository only. Running most
+of the jobs on every push, whether to a fork or to the main repository, is
+not viable: the number of jobs tends to increase, making it impractical to
+run all of them on every single push.
+
+Future of QEMU GitLab CI
+
+
+Following is a proposal to establish a high-level plan and set the
+characteristics for the QEMU GitLab CI. The idea is to organize the CI by use
+cases and to avoid wasting resources and CI minutes, anticipating the time
+when GitLab starts to enforce CI-minute limits.
+
+Use cases
+^
+
+Below is a list of the most common use cases for the QEMU GitLab CI.
+
+Gating
+""
+
+The gating set of jobs runs on the maintainer's pull requests when the project
+leader pushes them to the staging branch of the project. The gating CI pipeline
+has the following characteristics:
+
+ * Jobs tagged as gating run as part of the gating CI pipeline;
+ * The gating CI pipeline consists of stable jobs;
+ * The execution duration of the gating CI pipeline should, as much as
+   possible, have an upper bound of 2 hours.
+
+Developers
+""
+
+A developer working on a new feature or fixing an issue may want to run/propose
+a specific set of tests. Those tests may, eventually, benefit other developers.
+A developer CI pipeline has the following characteristics:
+
+ * It is easy to run current tests available in the project;
+ * It is easy to add new tests or remove unneeded tests;
+ * It is flexible enough to allow changes in the current jobs.
+
+Maintainers
+"""
+
+When accepting developers' patches, a maintainer may want to run a specific
+test set. A maintainer CI pipeline has the following characteristics:
+
+ * It consists of tests that are valuable for the subsystem;
+ * It is easy to run a set of specific tests available in the project;
+ * It is easy to add new tests or remove unneeded tests.
+
+Scheduled / periodic pipelines
+""
+
+The scheduled CI pipeline runs periodically on the master/main branch of the
+project. It covers as many jobs as needed or allowed by the execution duration
+of GitLab CI. The main idea of this pipeline is to run jobs that are not part
+of any other use case due to some limitation, such as execution duration or
+flakiness. This pipeline may be helpful, for example, to collect test/job
+statistics or to define test/job stability. The scheduled CI pipeline should
+not act as a gating CI pipeline.
diff --git a/docs/devel/ci.rst b/docs/devel/ci.rst
index 8d95247188..c9a43f997d 100644
--- a/docs/devel/ci.rst
+++ b/docs/devel/ci.rst
@@ -9,5 +9,6 @@ found at::
https://wiki.qemu.org/Testing/CI
 
 .. include:: ci-definitions.rst
+.. include:: ci-plan.rst
 .. include:: ci-jobs.rst
 .. include:: ci-runners.rst
-- 
2.31.1




[RFC 0/1] docs/devel/ci-plan: define a high-level plan for the QEMU GitLab CI

2021-09-14 Thread Willian Rampazzo
This adds a high-level plan for the QEMU GitLab CI based on use cases.
The idea is to have a base for evolving the QEMU CI. It sets high-level
characteristics for the QEMU CI use cases, which helps guide its
development.

There is an opportunity to discuss the high-level QEMU CI plan and some of
the implementation details during the KVM Forum.

Willian Rampazzo (1):
  docs/devel/ci-plan: define a high-level plan for the QEMU GitLab CI

 docs/devel/ci-plan.rst | 77 ++
 docs/devel/ci.rst  |  1 +
 2 files changed, 78 insertions(+)
 create mode 100644 docs/devel/ci-plan.rst

-- 
2.31.1





Re: [PATCH v4 08/16] tcg/s390x: Implement minimal vector operations

2021-09-14 Thread David Hildenbrand

On 26.06.21 07:02, Richard Henderson wrote:

Implementing add, sub, and, or, xor as the minimal set.
This allows us to actually enable vectors in query_s390_facilities.

Signed-off-by: Richard Henderson 
---
  tcg/s390x/tcg-target.c.inc | 154 -
  1 file changed, 150 insertions(+), 4 deletions(-)



[...]

Reviewed-by: David Hildenbrand 


--
Thanks,

David / dhildenb




Re: [PATCH v4 12/16] tcg/s390x: Implement TCG_TARGET_HAS_minmax_vec

2021-09-14 Thread David Hildenbrand

On 26.06.21 07:03, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  tcg/s390x/tcg-target.h |  2 +-
  tcg/s390x/tcg-target.c.inc | 25 +
  2 files changed, 26 insertions(+), 1 deletion(-)


Reviewed-by: David Hildenbrand 


--
Thanks,

David / dhildenb




Re: [PATCH v4 11/16] tcg/s390x: Implement vector shift operations

2021-09-14 Thread David Hildenbrand

On 26.06.21 07:03, Richard Henderson wrote:

Signed-off-by: Richard Henderson
---
  tcg/s390x/tcg-target-con-set.h |  1 +
  tcg/s390x/tcg-target.h | 12 ++---
  tcg/s390x/tcg-target.c.inc | 93 +-
  3 files changed, 99 insertions(+), 7 deletions(-)


Reviewed-by: David Hildenbrand 

--
Thanks,

David / dhildenb




Re: [PATCH v4 10/16] tcg/s390x: Implement TCG_TARGET_HAS_mul_vec

2021-09-14 Thread David Hildenbrand

On 26.06.21 07:03, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  tcg/s390x/tcg-target.h | 2 +-
  tcg/s390x/tcg-target.c.inc | 7 +++
  2 files changed, 8 insertions(+), 1 deletion(-)




Reviewed-by: David Hildenbrand 

--
Thanks,

David / dhildenb




Re: [PATCH v4 09/16] tcg/s390x: Implement andc, orc, abs, neg, not vector operations

2021-09-14 Thread David Hildenbrand

On 26.06.21 07:03, Richard Henderson wrote:

These logical and arithmetic operations are optional but trivial.

Signed-off-by: Richard Henderson 
---
  tcg/s390x/tcg-target-con-set.h |  1 +
  tcg/s390x/tcg-target.h | 10 +-
  tcg/s390x/tcg-target.c.inc | 34 +-
  3 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
index ce9432cfe3..cb953896d5 100644
--- a/tcg/s390x/tcg-target-con-set.h
+++ b/tcg/s390x/tcg-target-con-set.h
@@ -17,6 +17,7 @@ C_O0_I2(v, r)
  C_O1_I1(r, L)
  C_O1_I1(r, r)
  C_O1_I1(v, r)
+C_O1_I1(v, v)
  C_O1_I1(v, vr)
  C_O1_I2(r, 0, ri)
  C_O1_I2(r, 0, rI)
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index db54266da0..a3d4b5111f 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -143,11 +143,11 @@ extern uint64_t s390_facilities[3];
  #define TCG_TARGET_HAS_v128   HAVE_FACILITY(VECTOR)
  #define TCG_TARGET_HAS_v256   0
  
-#define TCG_TARGET_HAS_andc_vec   0

-#define TCG_TARGET_HAS_orc_vec0
-#define TCG_TARGET_HAS_not_vec0
-#define TCG_TARGET_HAS_neg_vec0
-#define TCG_TARGET_HAS_abs_vec0
+#define TCG_TARGET_HAS_andc_vec   1
+#define TCG_TARGET_HAS_orc_vec1
+#define TCG_TARGET_HAS_not_vec1
+#define TCG_TARGET_HAS_neg_vec1
+#define TCG_TARGET_HAS_abs_vec1
  #define TCG_TARGET_HAS_roti_vec   0
  #define TCG_TARGET_HAS_rots_vec   0
  #define TCG_TARGET_HAS_rotv_vec   0
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index c0622daaa0..040690abe2 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -270,13 +270,18 @@ typedef enum S390Opcode {
  VRIb_VGM= 0xe746,
  VRIc_VREP   = 0xe74d,
  
+VRRa_VLC= 0xe7de,

+VRRa_VLP= 0xe7df,
  VRRa_VLR= 0xe756,
  VRRc_VA = 0xe7f3,
  VRRc_VCEQ   = 0xe7f8,   /* we leave the m5 cs field 0 */
  VRRc_VCH= 0xe7fb,   /* " */
  VRRc_VCHL   = 0xe7f9,   /* " */
  VRRc_VN = 0xe768,
+VRRc_VNC= 0xe769,
+VRRc_VNO= 0xe76b,
  VRRc_VO = 0xe76a,
+VRRc_VOC= 0xe76f,


VOC requires the vector-enhancements facility 1.


  VRRc_VS = 0xe7f7,
  VRRc_VX = 0xe76d,
  VRRf_VLVGP  = 0xe762,
@@ -2637,6 +2642,16 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
  tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
  break;
  
+case INDEX_op_abs_vec:

+tcg_out_insn(s, VRRa, VLP, a0, a1, vece);
+break;
+case INDEX_op_neg_vec:
+tcg_out_insn(s, VRRa, VLC, a0, a1, vece);
+break;
+case INDEX_op_not_vec:
+tcg_out_insn(s, VRRc, VNO, a0, a1, a1, 0);
+break;
+
  case INDEX_op_add_vec:
  tcg_out_insn(s, VRRc, VA, a0, a1, a2, vece);
  break;
@@ -2646,9 +2661,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
  case INDEX_op_and_vec:
  tcg_out_insn(s, VRRc, VN, a0, a1, a2, 0);
  break;
+case INDEX_op_andc_vec:
+tcg_out_insn(s, VRRc, VNC, a0, a1, a2, 0);
+break;
  case INDEX_op_or_vec:
  tcg_out_insn(s, VRRc, VO, a0, a1, a2, 0);
  break;
+case INDEX_op_orc_vec:
+tcg_out_insn(s, VRRc, VOC, a0, a1, a2, 0);
+break;
  case INDEX_op_xor_vec:
  tcg_out_insn(s, VRRc, VX, a0, a1, a2, 0);
  break;
@@ -2679,10 +2700,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
  int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
  {
  switch (opc) {
+case INDEX_op_abs_vec:
  case INDEX_op_add_vec:
-case INDEX_op_sub_vec:


Seems like an unrelated change that should have been performed in the 
introducing patch.



Apart from that:

Reviewed-by: David Hildenbrand 

--
Thanks,

David / dhildenb




Re: [PULL v3 00/44] tcg patch queue

2021-09-14 Thread Philippe Mathieu-Daudé
On 9/14/21 7:13 PM, Peter Maydell wrote:
> On Tue, 14 Sept 2021 at 16:53, Richard Henderson
>  wrote:
>>
>> Version 3: Rebase and fix a minor patch conflict.
>>
>>
>> r~
>>
>>
>> The following changes since commit c6f5e042d89e79206cd1ce5525d3df219f13c3cc:
>>
>>   Merge remote-tracking branch 
>> 'remotes/pmaydell/tags/pull-target-arm-20210913-3' into staging (2021-09-13 
>> 21:06:15 +0100)
>>
>> are available in the Git repository at:
>>
>>   https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20210914
>>
>> for you to fetch changes up to a5b759b6dca7daf87fa5007a7f5784bf22f3830f:
>>
>>   tcg/arm: More use of the TCGReg enum (2021-09-14 07:59:43 -0700)
>>
>> 
>> Fix translation race condition for user-only.
>> Fix tcg/i386 encoding for VPSLLVQ, VPSRLVQ.
>> Fix tcg/arm tcg_out_vec_op signature.
>> Fix tcg/ppc (32bit) build with clang.
> Remove duplicate TCG_KICK_PERIOD definition.
>> Remove unused tcg_global_reg_new.
>> Restrict cpu_exec_interrupt and its callees to sysemu.
>> Cleanups for tcg/arm.
> 
> This throws up new warnings on FreeBSD:
> 
> ../src/bsd-user/main.c:148:1: warning: function declared 'noreturn'
> should not return [-Winvalid-noreturn]
> 
> Unlike linux-user, where cpu_loop() is the direct implementation
> of the target-specific main loop, on bsd-user cpu_loop() seems
> to just call target_cpu_loop(). Since target_cpu_loop() isn't
> marked noreturn, the compiler complains about cpu_loop() being
> marked noreturn.

Sorry, my bad. I ran this on Gitlab CI but now realize the
FreeBSD job is marked as "allow to fail" so I missed it :(

> Easy fix would be to just drop the bsd-user part of
> "user: Mark cpu_loop() with noreturn attribute" I guess.
> Otherwise you could try marking all the target_cpu_loop()
> functions noreturn as well.

Richard, can you drop the offending patch from your pull
request?



Re: [PATCH v4 05/12] job: @force parameter for job_cancel_sync()

2021-09-14 Thread Hanna Reitz

On 08.09.21 18:33, Vladimir Sementsov-Ogievskiy wrote:

07.09.2021 15:42, Hanna Reitz wrote:

Callers should be able to specify whether they want job_cancel_sync() to
force-cancel the job or not.

In fact, almost all invocations do not care about consistency of the
result and just want the job to terminate as soon as possible, so they
should pass force=true.  The replication block driver is the exception,
specifically the active commit job it runs.

As for job_cancel_sync_all(), all callers want it to force-cancel all
jobs, because that is the point of it: To cancel all remaining jobs as
quickly as possible (generally on process termination).  So make it
invoke job_cancel_sync() with force=true.

This changes some iotest outputs, because quitting qemu while a mirror
job is active will now lead to it being cancelled instead of completed,
which is what we want.  (Cancelling a READY mirror job with force=false
may take an indefinite amount of time, which we do not want when
quitting.  If users want consistent results, they must have all jobs be
done before they quit qemu.)

Buglink:https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Hanna Reitz


Reviewed-by: Vladimir Sementsov-Ogievskiy 


Thanks!

Do you plan on taking this series or should I?

Hanna




Re: [PATCH v2 29/53] qapi: introduce x-query-registers QMP command

2021-09-14 Thread Philippe Mathieu-Daudé
On 9/14/21 6:04 PM, Eric Blake wrote:
> On Tue, Sep 14, 2021 at 03:20:18PM +0100, Daniel P. Berrangé wrote:
>> This is a counterpart to the HMP "info registers" command. It is being
>> added with an "x-" prefix because this QMP command is intended as an
>> ad hoc debugging tool and will thus not be modelled in QAPI as fully
>> structured data, nor will it have long term guaranteed stability.
>> The existing HMP command is rewritten to call the QMP command.
>>
>> Signed-off-by: Daniel P. Berrangé 
>> ---

>> +##
>> +# @HumanReadableText:
>> +#
>> +# @human-readable-text: Formatted output intended for humans.
>> +#
>> +# Since: 6.2.0
> 
> Should be '6.2', not '6.2.0', to match...
> 
>> +#
>> +##
>> +{ 'struct': 'HumanReadableText',
>> +  'data': { 'human-readable-text': 'str' } }
>> diff --git a/qapi/machine.json b/qapi/machine.json
>> index 157712f006..8737efa865 100644
>> --- a/qapi/machine.json
>> +++ b/qapi/machine.json
>> @@ -1312,3 +1312,18 @@
>>   '*cores': 'int',
>>   '*threads': 'int',
>>   '*maxcpus': 'int' } }
>> +
>> +##
>> +# @x-query-registers:
>> +#
>> +# @cpu: the CPU number to query. If omitted, queries all CPUs
>> +#
>> +# Query information on the CPU registers
>> +#
>> +# Returns: CPU state in an architecture-specific format
>> +#
>> +# Since: 6.2
> 
> ...the prevailing style.
> 
> If it were likely that someone might backport just some (but not all)
> added x- commands, it may be wise to separate the creation of
> HumanReadableText into its own patch to backport that but not
> x-query-registers.  But I rather suspect anyone backporting this will
> take the series wholesale, so the coupling in this patch is not worth
> worrying about.

IIUC the problem is this breaks bisection, as you get a QAPI error:

 qapi/qapi-commands-machine.c:123:13: error:
‘qmp_marshal_output_HumanReadableText’
defined but not used [-Werror=unused-function]
  123 | static void
qmp_marshal_output_HumanReadableText(HumanReadableText *ret_in,
  | ^~
cc1: all warnings being treated as errors

See the comment added in commit 1f7b9f3181e
("qapi/commands: add #if conditions to commands").

> 
>> +##
>> +{ 'command': 'x-query-registers',
>> +  'data': {'*cpu': 'int' },
>> +  'returns': 'HumanReadableText' }
>> -- 
>> 2.31.1
>>
> 




Re: [PATCH v2 29/53] qapi: introduce x-query-registers QMP command

2021-09-14 Thread Philippe Mathieu-Daudé
On 9/14/21 7:15 PM, Philippe Mathieu-Daudé wrote:
> On 9/14/21 6:04 PM, Eric Blake wrote:
>> On Tue, Sep 14, 2021 at 03:20:18PM +0100, Daniel P. Berrangé wrote:
>>> This is a counterpart to the HMP "info registers" command. It is being
>>> added with an "x-" prefix because this QMP command is intended as an
>>> ad hoc debugging tool and will thus not be modelled in QAPI as fully
>>> structured data, nor will it have long term guaranteed stability.
>>> The existing HMP command is rewritten to call the QMP command.
>>>
>>> Signed-off-by: Daniel P. Berrangé 
>>> ---
> 
>>> +##
>>> +# @HumanReadableText:
>>> +#
>>> +# @human-readable-text: Formatted output intended for humans.
>>> +#
>>> +# Since: 6.2.0
>>
>> Should be '6.2', not '6.2.0', to match...
>>
>>> +#
>>> +##
>>> +{ 'struct': 'HumanReadableText',
>>> +  'data': { 'human-readable-text': 'str' } }
>>> diff --git a/qapi/machine.json b/qapi/machine.json
>>> index 157712f006..8737efa865 100644
>>> --- a/qapi/machine.json
>>> +++ b/qapi/machine.json
>>> @@ -1312,3 +1312,18 @@
>>>   '*cores': 'int',
>>>   '*threads': 'int',
>>>   '*maxcpus': 'int' } }
>>> +
>>> +##
>>> +# @x-query-registers:
>>> +#
>>> +# @cpu: the CPU number to query. If omitted, queries all CPUs
>>> +#
>>> +# Query information on the CPU registers
>>> +#
>>> +# Returns: CPU state in an architecture-specific format
>>> +#
>>> +# Since: 6.2
>>
>> ...the prevailing style.
>>
>> If it were likely that someone might backport just some (but not all)
>> added x- commands, it may be wise to separate the creation of
>> HumanReadableText into its own patch to backport that but not
>> x-query-registers.  But I rather suspect anyone backporting this will
>> take the series wholesale, so the coupling in this patch is not worth
>> worrying about.
> 
> IIUC the problem is this breaks bisection, as you get a QAPI error:
> 
>  qapi/qapi-commands-machine.c:123:13: error:
> ‘qmp_marshal_output_HumanReadableText’
> defined but not used [-Werror=unused-function]
>   123 | static void
> qmp_marshal_output_HumanReadableText(HumanReadableText *ret_in,
>   | ^~
> cc1: all warnings being treated as errors
> 
> See the comment added in commit 1f7b9f3181e
> ("qapi/commands: add #if conditions to commands").

Oh we already talked about this together in this thread:
https://lore.kernel.org/qemu-devel/20210609202952.r4nb2smrptyck...@redhat.com/

> 
>>
>>> +##
>>> +{ 'command': 'x-query-registers',
>>> +  'data': {'*cpu': 'int' },
>>> +  'returns': 'HumanReadableText' }
>>> -- 
>>> 2.31.1
>>>
>>




Re: [PATCH v2 42/53] hw/core: introduce a 'format_tlb' callback

2021-09-14 Thread Daniel P . Berrangé
On Tue, Sep 14, 2021 at 07:02:19PM +0200, Philippe Mathieu-Daudé wrote:
> On 9/14/21 6:34 PM, Daniel P. Berrangé wrote:
> > On Tue, Sep 14, 2021 at 05:56:09PM +0200, Philippe Mathieu-Daudé wrote:
> >> On 9/14/21 4:20 PM, Daniel P. Berrangé wrote:
> >>> This will allow us to reduce duplication between the different targets
> >>> implementing the 'info tlb' command.
> >>>
> >>> Signed-off-by: Daniel P. Berrangé 
> >>> ---
> >>>  hw/core/cpu-common.c  |  9 +
> >>>  include/hw/core/cpu.h | 11 +++
> >>>  2 files changed, 20 insertions(+)
> >>
> >>> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> >>> index 4c47e1df18..64fc57c8d9 100644
> >>> --- a/include/hw/core/cpu.h
> >>> +++ b/include/hw/core/cpu.h
> >>>   * @has_work: Callback for checking if there is work to do.
> >>>   * @memory_rw_debug: Callback for GDB memory access.
> >>>   * @format_state: Callback for formatting state.
> >>> + * @format_tlb: Callback for formatting memory mappings
> 
> "... for formatting translations of virtual to physical memory mappings"
> 
> >>>   * @get_arch_id: Callback for getting architecture-dependent CPU ID.
> >>>   * @set_pc: Callback for setting the Program Counter register. This
> >>>   *   should have the semantics used by the target architecture when
> >>> @@ -136,6 +137,7 @@ struct CPUClass {
> >>>  int (*memory_rw_debug)(CPUState *cpu, vaddr addr,
> >>> uint8_t *buf, int len, bool is_write);
> >>>  void (*format_state)(CPUState *cpu, GString *buf, int flags);
> >>> +void (*format_tlb)(CPUState *cpu, GString *buf);
> >>
> >> Doesn't this belong to SysemuCPUOps?
> > 
> > I can't really answer, since my knowledge of this area of QEMU code is
> > fairly minimal. I put it here because it is basically serving the same
> > purpose as the "format_state" callback immediately above it, which was
> > a rename of the existing "dump_state" callback. I assumed whatever was
> > there already was a good practice to follow[1]...
> 
> Since it involves physical memory, I'm pretty sure this is sysemu
> specific. Beside in the following patches you guard the handlers
> with '#ifndef CONFIG_USER_ONLY'.
> 
> Good news, there are very few changes needed in your patches, for
> example the next patch:

.snip..

yes I see what you mean now, and agree this looks like a better
approach

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v4 07/10] qcow2-refcount: check_refcounts_l2(): check reserved bits

2021-09-14 Thread Hanna Reitz

On 14.09.21 14:24, Vladimir Sementsov-Ogievskiy wrote:

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
Tested-by: Kirill Tkhai 
Reviewed-by: Hanna Reitz 
---
  block/qcow2.h  |  1 +
  block/qcow2-refcount.c | 12 +++-
  2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index c0e1e83796..b8b1093b61 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -587,6 +587,7 @@ typedef enum QCow2MetadataOverlap {
  
  #define L1E_OFFSET_MASK 0x00fffe00ULL

  #define L2E_OFFSET_MASK 0x00fffe00ULL
+#define L2E_STD_RESERVED_MASK 0x3f0001feULL
  
  #define REFT_OFFSET_MASK 0xfe00ULL
  
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c

index 9a5ae3cac4..5d57e677bc 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1682,8 +1682,18 @@ static int check_refcounts_l2(BlockDriverState *bs, 
BdrvCheckResult *res,
  int csize;
  l2_entry = get_l2_entry(s, l2_table, i);
  l2_bitmap = get_l2_bitmap(s, l2_table, i);
+QCow2ClusterType type = qcow2_get_cluster_type(bs, l2_entry);


Hm, with l2_bitmap being declared next to l2_entry, this is now the 
patch that adds a declaration after a statement here.


(The possible resolutions seem to be the same, either move the 
declaration up to the function’s root block, or move l2_entry and 
l2_bitmap’s declarations here...)
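
The latter would look roughly like this, assuming the two variables are
plain uint64_t locals as in the surrounding loop (just a sketch):

-        l2_entry = get_l2_entry(s, l2_table, i);
-        l2_bitmap = get_l2_bitmap(s, l2_table, i);
+        uint64_t l2_entry = get_l2_entry(s, l2_table, i);
+        uint64_t l2_bitmap = get_l2_bitmap(s, l2_table, i);
         QCow2ClusterType type = qcow2_get_cluster_type(bs, l2_entry);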


(I don’t think we need a v5 for this, it should be fine if you tell me 
which way you prefer.)


Hanna




Re: [PULL v3 00/44] tcg patch queue

2021-09-14 Thread Peter Maydell
On Tue, 14 Sept 2021 at 16:53, Richard Henderson
 wrote:
>
> Version 3: Rebase and fix a minor patch conflict.
>
>
> r~
>
>
> The following changes since commit c6f5e042d89e79206cd1ce5525d3df219f13c3cc:
>
>   Merge remote-tracking branch 
> 'remotes/pmaydell/tags/pull-target-arm-20210913-3' into staging (2021-09-13 
> 21:06:15 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20210914
>
> for you to fetch changes up to a5b759b6dca7daf87fa5007a7f5784bf22f3830f:
>
>   tcg/arm: More use of the TCGReg enum (2021-09-14 07:59:43 -0700)
>
> 
> Fix translation race condition for user-only.
> Fix tcg/i386 encoding for VPSLLVQ, VPSRLVQ.
> Fix tcg/arm tcg_out_vec_op signature.
> Fix tcg/ppc (32bit) build with clang.
> Remove duplicate TCG_KICK_PERIOD definition.
> Remove unused tcg_global_reg_new.
> Restrict cpu_exec_interrupt and its callees to sysemu.
> Cleanups for tcg/arm.

This throws up new warnings on FreeBSD:

../src/bsd-user/main.c:148:1: warning: function declared 'noreturn'
should not return [-Winvalid-noreturn]

Unlike linux-user, where cpu_loop() is the direct implementation
of the target-specific main loop, on bsd-user cpu_loop() seems
to just call target_cpu_loop(). Since target_cpu_loop() isn't
marked noreturn, the compiler complains about cpu_loop() being
marked noreturn.

Easy fix would be to just drop the bsd-user part of
"user: Mark cpu_loop() with noreturn attribute" I guess.
Otherwise you could try marking all the target_cpu_loop()
functions noreturn as well.
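
The second option would look roughly like this in each bsd-user
target_arch_cpu.h (illustrative sketch; exact prototypes and the
QEMU_NORETURN placement not checked):

-static inline void target_cpu_loop(CPUX86State *env)
+static inline void QEMU_NORETURN target_cpu_loop(CPUX86State *env)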

-- PMM



Re: [PATCH v4 06/10] qcow2-refcount: check_refcounts_l2(): check l2_bitmap

2021-09-14 Thread Hanna Reitz

On 14.09.21 14:24, Vladimir Sementsov-Ogievskiy wrote:

Check subcluster bitmap of the l2 entry for different types of
clusters:

  - for compressed it must be zero
  - for allocated check consistency of two parts of the bitmap
  - for unallocated all subclusters should be unallocated
(or zero-plain)

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Tested-by: Kirill Tkhai 
---
  block/qcow2-refcount.c | 28 ++--
  1 file changed, 26 insertions(+), 2 deletions(-)


Reviewed-by: Hanna Reitz 




Re: [PATCH v3 00/17] iotests: support zstd

2021-09-14 Thread Hanna Reitz

On 14.09.21 12:25, Vladimir Sementsov-Ogievskiy wrote:

This series makes tests pass with

IMGOPTS='compression_type=zstd'

Also, python iotests start to support IMGOPTS (they didn't before).

v3:
02-04,06,08,14,17: add Hanna's r-b
07  iotests.py: filter out successful output of qemu-img create
   fix subject
   handle 149, 237 and 296 iotests
  (note, 149 is handled intuitively, as it fails :(


It was also reviewed intuitively. :)

Thanks, applied to my block branch:

https://github.com/XanClic/qemu/commits/block

Hanna




Re: [PATCH v3 07/17] iotests.py: filter out successful output of qemu-img create

2021-09-14 Thread Hanna Reitz

On 14.09.21 12:25, Vladimir Sementsov-Ogievskiy wrote:

The only "feature" of this "Formatting ..." line is that we have to
update it every time we add a new option. Let's drop it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/149.out| 21 -
  tests/qemu-iotests/237.out|  3 ---
  tests/qemu-iotests/255.out|  4 
  tests/qemu-iotests/274.out| 29 -
  tests/qemu-iotests/280.out|  1 -
  tests/qemu-iotests/296.out| 10 +++---
  tests/qemu-iotests/iotests.py | 10 --
  7 files changed, 11 insertions(+), 67 deletions(-)


Reviewed-by: Hanna Reitz 




Re: [PATCH v2 42/53] hw/core: introduce a 'format_tlb' callback

2021-09-14 Thread Philippe Mathieu-Daudé
On 9/14/21 6:34 PM, Daniel P. Berrangé wrote:
> On Tue, Sep 14, 2021 at 05:56:09PM +0200, Philippe Mathieu-Daudé wrote:
>> On 9/14/21 4:20 PM, Daniel P. Berrangé wrote:
>>> This will allow us to reduce duplication between the different targets
>>> implementing the 'info tlb' command.
>>>
>>> Signed-off-by: Daniel P. Berrangé 
>>> ---
>>>  hw/core/cpu-common.c  |  9 +
>>>  include/hw/core/cpu.h | 11 +++
>>>  2 files changed, 20 insertions(+)
>>
>>> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
>>> index 4c47e1df18..64fc57c8d9 100644
>>> --- a/include/hw/core/cpu.h
>>> +++ b/include/hw/core/cpu.h
>>>   * @has_work: Callback for checking if there is work to do.
>>>   * @memory_rw_debug: Callback for GDB memory access.
>>>   * @format_state: Callback for formatting state.
>>> + * @format_tlb: Callback for formatting memory mappings

"... for formatting translations of virtual to physical memory mappings"

>>>   * @get_arch_id: Callback for getting architecture-dependent CPU ID.
>>>   * @set_pc: Callback for setting the Program Counter register. This
>>>   *   should have the semantics used by the target architecture when
>>> @@ -136,6 +137,7 @@ struct CPUClass {
>>>  int (*memory_rw_debug)(CPUState *cpu, vaddr addr,
>>> uint8_t *buf, int len, bool is_write);
>>>  void (*format_state)(CPUState *cpu, GString *buf, int flags);
>>> +void (*format_tlb)(CPUState *cpu, GString *buf);
>>
>> Doesn't this belong to SysemuCPUOps?
> 
> I can't really answer, since my knowledge of this area of QEMU code is
> fairly minimal. I put it here because it is basically serving the same
> purpose as the "format_state" callback immediately above it, which was
> a rename of the existing "dump_state" callback. I assumed whatever was
> there already was a good practice to follow[1]...

Since it involves physical memory, I'm pretty sure this is sysemu
specific. Beside in the following patches you guard the handlers
with '#ifndef CONFIG_USER_ONLY'.

Good news: there are very few changes needed in your patches, for
example the next patch:

-- >8 --
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index ab86224ee23..9d2bd2e2ef4 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6732,6 +6732,7 @@ static Property x86_cpu_properties[] = {
 #include "hw/core/sysemu-cpu-ops.h"

 static const struct SysemuCPUOps i386_sysemu_ops = {
+.format_tlb = x86_cpu_format_tlb,
 .get_memory_mapping = x86_cpu_get_memory_mapping,
 .get_paging_enabled = x86_cpu_get_paging_enabled,
 .get_phys_page_attrs_debug = x86_cpu_get_phys_page_attrs_debug,
@@ -6765,9 +6766,6 @@ static void x86_cpu_common_class_init(ObjectClass
*oc, void *data)
 cc->parse_features = x86_cpu_parse_featurestr;
 cc->has_work = x86_cpu_has_work;
 cc->format_state = x86_cpu_format_state;
-#ifndef CONFIG_USER_ONLY
-cc->format_tlb = x86_cpu_format_tlb;
-#endif
 cc->set_pc = x86_cpu_set_pc;
 cc->gdb_read_register = x86_cpu_gdb_read_register;
 cc->gdb_write_register = x86_cpu_gdb_write_register;
---

> 
> Regards,
> Daniel
> 
> [1] yes assuming these things is often foolish in QEMU :-)
> 




Re: [PATCH v4 06/16] tcg/s390x: Implement tcg_out_mov for vector types

2021-09-14 Thread David Hildenbrand

On 14.09.21 18:53, David Hildenbrand wrote:

On 26.06.21 07:02, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
   tcg/s390x/tcg-target.c.inc | 72 +++---
   1 file changed, 68 insertions(+), 4 deletions(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index b6ea129e14..c4e12a57f3 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -265,6 +265,11 @@ typedef enum S390Opcode {
   RX_STC  = 0x42,
   RX_STH  = 0x40,
   
+VRRa_VLR= 0xe756,

+
+VRSb_VLVG   = 0xe722,
+VRSc_VLGV   = 0xe721,
+
   VRX_VL  = 0xe706,
   VRX_VLLEZ   = 0xe704,
   VRX_VST = 0xe70e,
@@ -548,6 +553,39 @@ static int RXB(TCGReg v1, TCGReg v2, TCGReg v3, TCGReg v4)
| ((v4 & 0x10) << (4 + 0));
   }
   
+static void tcg_out_insn_VRRa(TCGContext *s, S390Opcode op,

+  TCGReg v1, TCGReg v2, int m3)
+{
+tcg_debug_assert(v1 >= TCG_REG_V0 && v1 <= TCG_REG_V31);
+tcg_debug_assert(v2 >= TCG_REG_V0 && v2 <= TCG_REG_V31);
+tcg_out16(s, (op & 0xff00) | ((v1 & 15) << 4) | (v2 & 15));
+tcg_out32(s, (op & 0x00ff) | RXB(v1, v2, 0, 0) | (m3 << 12));
+}
+
+static void tcg_out_insn_VRSb(TCGContext *s, S390Opcode op, TCGReg v1,
+  intptr_t d2, TCGReg b2, TCGReg r3, int m4)
+{
+tcg_debug_assert(v1 >= TCG_REG_V0 && v1 <= TCG_REG_V31);
+tcg_debug_assert(d2 >= 0 && d2 <= 0xfff);
+tcg_debug_assert(b2 <= TCG_REG_R15);
+tcg_debug_assert(r3 <= TCG_REG_R15);
+tcg_out16(s, (op & 0xff00) | ((v1 & 15) << 4) | r3);
+tcg_out16(s, b2 << 12 | d2);
+tcg_out16(s, (op & 0x00ff) | RXB(v1, 0, 0, 0) | (m4 << 12));
+}
+
+static void tcg_out_insn_VRSc(TCGContext *s, S390Opcode op, TCGReg r1,
+  intptr_t d2, TCGReg b2, TCGReg v3, int m4)
+{
+tcg_debug_assert(r1 <= TCG_REG_R15);
+tcg_debug_assert(d2 >= 0 && d2 <= 0xfff);
+tcg_debug_assert(b2 <= TCG_REG_R15);
+tcg_debug_assert(v3 >= TCG_REG_V0 && v3 <= TCG_REG_V31);
+tcg_out16(s, (op & 0xff00) | (r1 << 4) | (v3 & 15));
+tcg_out16(s, b2 << 12 | d2);
+tcg_out16(s, (op & 0x00ff) | RXB(0, 0, v3, 0) | (m4 << 12));
+}
+
   static void tcg_out_insn_VRX(TCGContext *s, S390Opcode op, TCGReg v1,
TCGReg b2, TCGReg x2, intptr_t d2, int m3)
   {
@@ -581,12 +619,38 @@ static void tcg_out_sh32(TCGContext* s, S390Opcode op, 
TCGReg dest,
   
   static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg dst, TCGReg src)

   {
-if (src != dst) {
-if (type == TCG_TYPE_I32) {
+if (src == dst) {
+return true;
+}
+switch (type) {
+case TCG_TYPE_I32:
+if (likely(dst < 16 && src < 16)) {
   tcg_out_insn(s, RR, LR, dst, src);
-} else {
-tcg_out_insn(s, RRE, LGR, dst, src);
+break;
   }
+/* fallthru */
+


Does that fall-through work as expected? I would have thought we would
want to pass "2" as m4 for VLGV and VLVG below?



Forget my question, we're not doing memory access :)

Reviewed-by: David Hildenbrand 

--
Thanks,

David / dhildenb




Re: [PATCH v4 06/16] tcg/s390x: Implement tcg_out_mov for vector types

2021-09-14 Thread David Hildenbrand

On 26.06.21 07:02, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  tcg/s390x/tcg-target.c.inc | 72 +++---
  1 file changed, 68 insertions(+), 4 deletions(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index b6ea129e14..c4e12a57f3 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -265,6 +265,11 @@ typedef enum S390Opcode {
  RX_STC  = 0x42,
  RX_STH  = 0x40,
  
+VRRa_VLR= 0xe756,

+
+VRSb_VLVG   = 0xe722,
+VRSc_VLGV   = 0xe721,
+
  VRX_VL  = 0xe706,
  VRX_VLLEZ   = 0xe704,
  VRX_VST = 0xe70e,
@@ -548,6 +553,39 @@ static int RXB(TCGReg v1, TCGReg v2, TCGReg v3, TCGReg v4)
   | ((v4 & 0x10) << (4 + 0));
  }
  
+static void tcg_out_insn_VRRa(TCGContext *s, S390Opcode op,

+  TCGReg v1, TCGReg v2, int m3)
+{
+tcg_debug_assert(v1 >= TCG_REG_V0 && v1 <= TCG_REG_V31);
+tcg_debug_assert(v2 >= TCG_REG_V0 && v2 <= TCG_REG_V31);
+tcg_out16(s, (op & 0xff00) | ((v1 & 15) << 4) | (v2 & 15));
+tcg_out32(s, (op & 0x00ff) | RXB(v1, v2, 0, 0) | (m3 << 12));
+}
+
+static void tcg_out_insn_VRSb(TCGContext *s, S390Opcode op, TCGReg v1,
+  intptr_t d2, TCGReg b2, TCGReg r3, int m4)
+{
+tcg_debug_assert(v1 >= TCG_REG_V0 && v1 <= TCG_REG_V31);
+tcg_debug_assert(d2 >= 0 && d2 <= 0xfff);
+tcg_debug_assert(b2 <= TCG_REG_R15);
+tcg_debug_assert(r3 <= TCG_REG_R15);
+tcg_out16(s, (op & 0xff00) | ((v1 & 15) << 4) | r3);
+tcg_out16(s, b2 << 12 | d2);
+tcg_out16(s, (op & 0x00ff) | RXB(v1, 0, 0, 0) | (m4 << 12));
+}
+
+static void tcg_out_insn_VRSc(TCGContext *s, S390Opcode op, TCGReg r1,
+  intptr_t d2, TCGReg b2, TCGReg v3, int m4)
+{
+tcg_debug_assert(r1 <= TCG_REG_R15);
+tcg_debug_assert(d2 >= 0 && d2 <= 0xfff);
+tcg_debug_assert(b2 <= TCG_REG_R15);
+tcg_debug_assert(v3 >= TCG_REG_V0 && v3 <= TCG_REG_V31);
+tcg_out16(s, (op & 0xff00) | (r1 << 4) | (v3 & 15));
+tcg_out16(s, b2 << 12 | d2);
+tcg_out16(s, (op & 0x00ff) | RXB(0, 0, v3, 0) | (m4 << 12));
+}
+
  static void tcg_out_insn_VRX(TCGContext *s, S390Opcode op, TCGReg v1,
   TCGReg b2, TCGReg x2, intptr_t d2, int m3)
  {
@@ -581,12 +619,38 @@ static void tcg_out_sh32(TCGContext* s, S390Opcode op, 
TCGReg dest,
  
  static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg dst, TCGReg src)

  {
-if (src != dst) {
-if (type == TCG_TYPE_I32) {
+if (src == dst) {
+return true;
+}
+switch (type) {
+case TCG_TYPE_I32:
+if (likely(dst < 16 && src < 16)) {
  tcg_out_insn(s, RR, LR, dst, src);
-} else {
-tcg_out_insn(s, RRE, LGR, dst, src);
+break;
  }
+/* fallthru */
+


Does that fall-through work as expected? I would have thought we would 
want to pass "2" as m4 for VLGV and VLVG below?


Apart from that LGTM.


--
Thanks,

David / dhildenb




Re: [PATCH v4 05/16] tcg/s390x: Implement tcg_out_ld/st for vector types

2021-09-14 Thread David Hildenbrand

On 26.06.21 07:02, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  tcg/s390x/tcg-target.c.inc | 122 +
  1 file changed, 110 insertions(+), 12 deletions(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 18233c628d..b6ea129e14 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -265,6 +265,12 @@ typedef enum S390Opcode {
  RX_STC  = 0x42,
  RX_STH  = 0x40,
  
+VRX_VL  = 0xe706,

+VRX_VLLEZ   = 0xe704,
+VRX_VST = 0xe70e,
+VRX_VSTEF   = 0xe70b,
+VRX_VSTEG   = 0xe70a,
+
  NOP = 0x0707,
  } S390Opcode;
  
@@ -529,6 +535,31 @@ static void tcg_out_insn_RSY(TCGContext *s, S390Opcode op, TCGReg r1,

  #define tcg_out_insn_RX   tcg_out_insn_RS
  #define tcg_out_insn_RXY  tcg_out_insn_RSY
  
+static int RXB(TCGReg v1, TCGReg v2, TCGReg v3, TCGReg v4)

+{
+/*
+ * Shift bit 4 of each regno to its corresponding bit of RXB.
+ * RXB itself begins at bit 8 of the instruction so 8 - 4 = 4
+ * is the left-shift of the 4th operand.
+ */
+return ((v1 & 0x10) << (4 + 3))
+ | ((v2 & 0x10) << (4 + 2))
+ | ((v3 & 0x10) << (4 + 1))
+ | ((v4 & 0x10) << (4 + 0));
+}
+
+static void tcg_out_insn_VRX(TCGContext *s, S390Opcode op, TCGReg v1,
+ TCGReg b2, TCGReg x2, intptr_t d2, int m3)


Is intptr_t really the right type here? Just curious ... I'd have used 
a uint16_t and asserted "!(d2 & 0xf000)".



+{
+tcg_debug_assert(v1 >= TCG_REG_V0 && v1 <= TCG_REG_V31);
+tcg_debug_assert(d2 >= 0 && d2 <= 0xfff);
+tcg_debug_assert(x2 <= TCG_REG_R15);
+tcg_debug_assert(b2 <= TCG_REG_R15);
+tcg_out16(s, (op & 0xff00) | ((v1 & 15) << 4) | x2);


Nit: ((v1 & 0xf) << 4)

makes it immediately clearer to me which bits are set by which piece of 
this puzzle :)



+tcg_out16(s, (b2 << 12) | d2);
+tcg_out16(s, (op & 0x00ff) | RXB(v1, 0, 0, 0) | (m3 << 12));
+}
+
  /* Emit an opcode with "type-checking" of the format.  */
  #define tcg_out_insn(S, FMT, OP, ...) \
  glue(tcg_out_insn_,FMT)(S, glue(glue(FMT,_),OP), ## __VA_ARGS__)
@@ -705,25 +736,92 @@ static void tcg_out_mem(TCGContext *s, S390Opcode opc_rx, 
S390Opcode opc_rxy,
  }
  }
  
+static void tcg_out_vrx_mem(TCGContext *s, S390Opcode opc_vrx,

+TCGReg data, TCGReg base, TCGReg index,
+tcg_target_long ofs, int m3)
+{
+if (ofs < 0 || ofs >= 0x1000) {
+if (ofs >= -0x8 && ofs < 0x8) {
+tcg_out_insn(s, RXY, LAY, TCG_TMP0, base, index, ofs);
+base = TCG_TMP0;
+index = TCG_REG_NONE;
+ofs = 0;
+} else {
+tcg_out_movi(s, TCG_TYPE_PTR, TCG_TMP0, ofs);
+if (index != TCG_REG_NONE) {
+tcg_out_insn(s, RRE, AGR, TCG_TMP0, index);
+}
+index = TCG_TMP0;
+ofs = 0;
+}
+}
+tcg_out_insn_VRX(s, opc_vrx, data, base, index, ofs, m3);
+}
  
  /* load data without address translation or endianness conversion */

-static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg data,
-  TCGReg base, intptr_t ofs)
+static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg data,
+   TCGReg base, intptr_t ofs)
  {
-if (type == TCG_TYPE_I32) {
-tcg_out_mem(s, RX_L, RXY_LY, data, base, TCG_REG_NONE, ofs);
-} else {
-tcg_out_mem(s, 0, RXY_LG, data, base, TCG_REG_NONE, ofs);
+switch (type) {
+case TCG_TYPE_I32:
+if (likely(data < 16)) {


This actually maps to "if (likely(data <= TCG_REG_R15))", correct?

--
Thanks,

David / dhildenb




Re: [PATCH v2 30/53] qapi: introduce x-query-roms QMP command

2021-09-14 Thread Daniel P . Berrangé
On Tue, Sep 14, 2021 at 06:04:48PM +0200, Philippe Mathieu-Daudé wrote:
> On 9/14/21 4:20 PM, Daniel P. Berrangé wrote:
> > This is a counterpart to the HMP "info roms" command. It is being
> > added with an "x-" prefix because this QMP command is intended as an
> > ad hoc debugging tool and will thus not be modelled in QAPI as fully
> > structured data, nor will it have long term guaranteed stability.
> > The existing HMP command is rewritten to call the QMP command.
> > 
> > Signed-off-by: Daniel P. Berrangé 
> > ---
> >  hw/core/loader.c  | 55 ---
> >  qapi/machine.json | 12 +++
> >  2 files changed, 50 insertions(+), 17 deletions(-)
> 
> > -void hmp_info_roms(Monitor *mon, const QDict *qdict)
> > +HumanReadableText *qmp_x_query_roms(Error **errp)
> >  {
> >  Rom *rom;
> > +g_autoptr(GString) buf = g_string_new("");
> > +HumanReadableText *ret;
> >  
> >  QTAILQ_FOREACH(rom, &roms, next) {
> >  if (rom->mr) {
> > -monitor_printf(mon, "%s"
> > -   " size=0x%06zx name=\"%s\"\n",
> > -   memory_region_name(rom->mr),
> > -   rom->romsize,
> > -   rom->name);
> > +g_string_append_printf(buf, "%s"
> > +   " size=0x%06zx name=\"%s\"\n",
> > +   memory_region_name(rom->mr),
> > +   rom->romsize,
> > +   rom->name);
> >  } else if (!rom->fw_file) {
> > -monitor_printf(mon, "addr=" TARGET_FMT_plx
> > -   " size=0x%06zx mem=%s name=\"%s\"\n",
> > -   rom->addr, rom->romsize,
> > -   rom->isrom ? "rom" : "ram",
> > -   rom->name);
> > +g_string_append_printf(buf, "addr=" TARGET_FMT_plx
> > +   " size=0x%06zx mem=%s name=\"%s\"\n",
> > +   rom->addr, rom->romsize,
> > +   rom->isrom ? "rom" : "ram",
> > +   rom->name);
> >  } else {
> > -monitor_printf(mon, "fw=%s/%s"
> > -   " size=0x%06zx name=\"%s\"\n",
> > -   rom->fw_dir,
> > -   rom->fw_file,
> > -   rom->romsize,
> > -   rom->name);
> > +g_string_append_printf(buf, "fw=%s/%s"
> > +   " size=0x%06zx name=\"%s\"\n",
> > +   rom->fw_dir,
> > +   rom->fw_file,
> > +   rom->romsize,
> > +   rom->name);
> >  }
> >  }
> > +
> > +ret = g_new0(HumanReadableText, 1);
> > +ret->human_readable_text = g_steal_pointer(&buf->str);
> > +return ret;
> > +}
> 
> Is it possible to have a helper in 'qapi/qmp/smth.h' such as:
> 
> HumanReadableText *qmp_human_readable_text_new(GString **pbuf)
> {
> HumanReadableText *ret = g_new0(HumanReadableText, 1);
> 
> ret->human_readable_text = g_steal_pointer(pbuf);

NB, we're not stealing the GString, we're stealing the
char * inside it.

> 
> return ret;
> }

but yes, we could do a helper like this.

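For what it's worth, a minimal sketch of such a helper, reflecting the
note above that only the char * inside the GString is stolen (the name,
the single-pointer parameter and where it would live are all assumptions,
nothing is settled here):

    /*
     * Sketch only: assumes the QAPI-generated HumanReadableText type and
     * glib. It steals the char * held by @buf; a g_autoptr(GString)
     * caller still frees the (now empty) GString container itself.
     */
    static HumanReadableText *qmp_human_readable_text_new(GString *buf)
    {
        HumanReadableText *ret = g_new0(HumanReadableText, 1);

        ret->human_readable_text = g_steal_pointer(&buf->str);
        return ret;
    }

A caller like qmp_x_query_roms() above would then simply end with
"return qmp_human_readable_text_new(buf);" instead of open-coding it.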

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2 42/53] hw/core: introduce a 'format_tlb' callback

2021-09-14 Thread Daniel P . Berrangé
On Tue, Sep 14, 2021 at 05:56:09PM +0200, Philippe Mathieu-Daudé wrote:
> On 9/14/21 4:20 PM, Daniel P. Berrangé wrote:
> > This will allow us to reduce duplication between the different targets
> > implementing the 'info tlb' command.
> > 
> > Signed-off-by: Daniel P. Berrangé 
> > ---
> >  hw/core/cpu-common.c  |  9 +
> >  include/hw/core/cpu.h | 11 +++
> >  2 files changed, 20 insertions(+)
> 
> > diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> > index 4c47e1df18..64fc57c8d9 100644
> > --- a/include/hw/core/cpu.h
> > +++ b/include/hw/core/cpu.h
> >   * @has_work: Callback for checking if there is work to do.
> >   * @memory_rw_debug: Callback for GDB memory access.
> >   * @format_state: Callback for formatting state.
> > + * @format_tlb: Callback for formatting memory mappings
> >   * @get_arch_id: Callback for getting architecture-dependent CPU ID.
> >   * @set_pc: Callback for setting the Program Counter register. This
> >   *   should have the semantics used by the target architecture when
> > @@ -136,6 +137,7 @@ struct CPUClass {
> >  int (*memory_rw_debug)(CPUState *cpu, vaddr addr,
> > uint8_t *buf, int len, bool is_write);
> >  void (*format_state)(CPUState *cpu, GString *buf, int flags);
> > +void (*format_tlb)(CPUState *cpu, GString *buf);
> 
> Doesn't this belong to SysemuCPUOps?

I can't really answer, since my knowledge of this area of QEMU code is
fairly minimal. I put it here because it is basically serving the same
purpose as the "format_state" callback immediately above it, which was
a rename of the existing "dump_state" callback. I assumed whatever was
there already was a good practice to follow[1]...

Regards,
Daniel

[1] yes assuming these things is often foolish in QEMU :-)
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2] nbd/server: Suppress Broken pipe errors on abrupt disconnection

2021-09-14 Thread Richard W.M. Jones
On Tue, Sep 14, 2021 at 06:21:58PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 14.09.2021 17:52, Richard W.M. Jones wrote:
> > On the
> >server side when the server receives NBD_CMD_DISC it must complete any
> >in-flight requests, but there's no requirement for the server to
> >commit anything to disk.  IOW you can still lose data even though you
> >took the time to disconnect.
> >
> >So I don't think there's any reason for libnbd to always gracefully
> 
> Hmm. Me go to NBD spec :)
> 
> I think, there is a reason:
> 
> "The client and the server MUST NOT initiate any form of disconnect other 
> than in one of the above circumstances."
> 
> And the only possibility for client to initiate a hard disconnect listed 
> above is "if it detects a violation by the other party of a mandatory 
> condition within this document".
> 
> So at least, nbdsh violates the NBD protocol. Maybe the spec should be updated to 
> satisfy your needs.

I would say the spec is at best contradictory, but if you read other
parts of the spec, then it's pretty clear that we're allowed to drop
the connection whenever we like.  This section says as much:

https://github.com/NetworkBlockDevice/nbd/blob/5805b25ad3da96e7c0b3160cda51ea19eb518d5b/doc/proto.md#terminating-the-transmission-phase

  There are two methods of terminating the transmission phase:
  ...
  "The client or the server drops the TCP session (in which case it
  SHOULD shut down the TLS session first). This is referred to as
  'initiating a hard disconnect'."

Anyway we're dropping the TCP connection because sometimes we are just
interrogating an NBD server eg to find out what it supports, and doing
a graceful shutdown is a waste of time and internet.

> >shut down (especially in this case where there are no in-flight
> >requests), and anyway it would break ABI to make that change and slow
> >down the client in cases when there's nothing to clean up.
> 
> Which ABI will it break?

Our contract with callers using nbd_close(3), if nbd_shutdown(3) is
not called beforehand.
https://libguestfs.org/nbd_shutdown.3.html
https://libguestfs.org/nbd_create.3.html (really nbd_close ...)
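
To make the graceful vs. hard distinction concrete, a minimal libnbd
client looks roughly like this (illustrative only; the URI is a
placeholder and error handling is reduced to the bare minimum):

    #include <stdio.h>
    #include <stdlib.h>
    #include <inttypes.h>
    #include <libnbd.h>

    int main(void)
    {
        struct nbd_handle *nbd = nbd_create();

        if (nbd == NULL || nbd_connect_uri(nbd, "nbd://localhost") == -1) {
            fprintf(stderr, "%s\n", nbd_get_error());
            exit(EXIT_FAILURE);
        }

        /* Interrogate the server. */
        printf("export size: %" PRIi64 "\n", nbd_get_size(nbd));

        /* Graceful: send NBD_CMD_DISC so the server can finish up cleanly. */
        nbd_shutdown(nbd, 0);

        /* Hard: calling nbd_close() without nbd_shutdown() first just
         * drops the socket, which is the case discussed above. */
        nbd_close(nbd);
        return 0;
    }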

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v




Re: [RFC v2 0/2] ui: Add a Wayland backend for Qemu UI (v2)

2021-09-14 Thread Daniel P . Berrangé
On Mon, Sep 13, 2021 at 03:20:34PM -0700, Vivek Kasireddy wrote:
> Why does Qemu need a new Wayland UI backend?
> The main reason why there needs to be a plain and simple Wayland backend
> for Qemu UI is to eliminate the Blit (aka GPU copy) that happens if using
> a toolkit like GTK or SDL (because they use EGL). The Blit can be eliminated
> by sharing the dmabuf fd -- associated with the Guest scanout buffer --
> directly with the Host compositor via the linux-dmabuf (unstable) protocol.
> Once properly integrated, it would be potentially possible to have the
> scanout buffer created by the Guest compositor be placed directly on a
> hardware plane on the Host thereby improving performance. Only Guest 
> compositors that use multiple back buffers (at least 1 front and 1 back)
> and virtio-gpu would benefit from this work.

IME, QEMU already suffers from having too many barely maintained UI
implementations, and it is confusing to users. Using a toolkit like GTK
is generally a good thing, even if it doesn't enable the maximum
theoretical performance, because it reduces the long-term maintenance burden.

I'm far from convinced that we should take on the maintenance of yet
another UI in QEMU, even if it does have some performance benefit,
especially if implemented using a very low-level API like Wayland,
which won't let us easily add rich UI features.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v4 04/16] tcg/s390x: Add host vector framework

2021-09-14 Thread David Hildenbrand

On 26.06.21 07:02, Richard Henderson wrote:

Add registers and function stubs.  The functionality
is disabled via squashing s390_facilities[2] to 0.

We must still include results for the mandatory opcodes in
tcg_target_op_def, as all opcodes are checked during tcg init.

Signed-off-by: Richard Henderson 


Reviewed-by: David Hildenbrand 


--
Thanks,

David / dhildenb




Re: [RFC v2 2/2] ui: Add a plain Wayland backend for Qemu UI

2021-09-14 Thread Daniel P . Berrangé
On Mon, Sep 13, 2021 at 03:20:36PM -0700, Vivek Kasireddy wrote:
> Cc: Gerd Hoffmann 
> Signed-off-by: Vivek Kasireddy 
> ---
>  configure |   8 +-
>  meson.build   |  33 +++
>  meson_options.txt |   2 +
>  qapi/ui.json  |   3 +
>  ui/meson.build|  52 
>  ui/wayland.c  | 628 ++
>  6 files changed, 725 insertions(+), 1 deletion(-)
>  create mode 100644 ui/wayland.c


> diff --git a/ui/meson.build b/ui/meson.build
> index a73beb0e54..86fc324c82 100644
> --- a/ui/meson.build
> +++ b/ui/meson.build
> @@ -64,6 +64,58 @@ if config_host.has_key('CONFIG_OPENGL') and gbm.found()
>ui_modules += {'egl-headless' : egl_headless_ss}
>  endif
>  
> +wayland_scanner = find_program('wayland-scanner')
> +proto_sources = [
> +  ['xdg-shell', 'stable', ],
> +  ['fullscreen-shell', 'unstable', 'v1', ],
> +  ['linux-dmabuf', 'unstable', 'v1', ],
> +]
> +wayland_headers = []
> +wayland_proto_sources = []
> +
> +if wayland.found()
> +  foreach p: proto_sources
> +proto_name = p.get(0)
> +proto_stability = p.get(1)
> +
> +if proto_stability == 'stable'
> +  output_base = proto_name
> +  input = files(join_paths(wlproto_dir, 
> '@0@/@1@/@2@.xml'.format(proto_stability, proto_name, output_base)))
> +else
> +  proto_version = p.get(2)
> +  output_base = '@0@-@1@-@2@'.format(proto_name, proto_stability, 
> proto_version)
> +  input = files(join_paths(wlproto_dir, 
> '@0@/@1@/@2@.xml'.format(proto_stability, proto_name, output_base)))
> +endif
> +
> +wayland_headers += custom_target('@0@ client header'.format(output_base),
> +  input: input,
> +  output: '@0@-client-protocol.h'.format(output_base),
> +  command: [
> +wayland_scanner,
> +'client-header',
> +'@INPUT@', '@OUTPUT@',
> +  ], build_by_default: true
> +)
> +
> +wayland_proto_sources += custom_target('@0@ source'.format(output_base),
> +  input: input,
> +  output: '@0@-protocol.c'.format(output_base),
> +  command: [
> +wayland_scanner,
> +'private-code',
> +'@INPUT@', '@OUTPUT@',
> +  ], build_by_default: true
> +)
> +  endforeach
> +endif
> +
> +if wayland.found()
> +  wayland_ss = ss.source_set()
> +  wayland_ss.add(when: wayland, if_true: files('wayland.c', 
> 'xdg-shell-protocol.c', 
> 'fullscreen-shell-unstable-v1-protocol.c','linux-dmabuf-unstable-v1-protocol.c'))
> +  #wayland_ss.add(when: wayland, if_true: files('wayland.c'), 
> [wayland_proto_sources])
> +  ui_modules += {'wayland' : wayland_ss}
> +endif

Configure fails on this

  Program wayland-scanner found: YES (/usr/bin/wayland-scanner)

  ../ui/meson.build:114:13: ERROR: File xdg-shell-protocol.c does not exist.


The code a few lines above generates xdg-shell-protocol.c, but that
isn't run until you type "make", so when meson resolves the source
files they don't exist yet.

The alternative line you have commented out looks more like what we
would need, but it doesn't work either as its syntax is invalid.

How did you actually compile this series ?


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2 30/53] qapi: introduce x-query-roms QMP command

2021-09-14 Thread Philippe Mathieu-Daudé
On 9/14/21 4:20 PM, Daniel P. Berrangé wrote:
> This is a counterpart to the HMP "info roms" command. It is being
> added with an "x-" prefix because this QMP command is intended as an
> ad hoc debugging tool and will thus not be modelled in QAPI as fully
> structured data, nor will it have long term guaranteed stability.
> The existing HMP command is rewritten to call the QMP command.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  hw/core/loader.c  | 55 ---
>  qapi/machine.json | 12 +++
>  2 files changed, 50 insertions(+), 17 deletions(-)

> -void hmp_info_roms(Monitor *mon, const QDict *qdict)
> +HumanReadableText *qmp_x_query_roms(Error **errp)
>  {
>  Rom *rom;
> +g_autoptr(GString) buf = g_string_new("");
> +HumanReadableText *ret;
>  
>  QTAILQ_FOREACH(rom, &roms, next) {
>  if (rom->mr) {
> -monitor_printf(mon, "%s"
> -   " size=0x%06zx name=\"%s\"\n",
> -   memory_region_name(rom->mr),
> -   rom->romsize,
> -   rom->name);
> +g_string_append_printf(buf, "%s"
> +   " size=0x%06zx name=\"%s\"\n",
> +   memory_region_name(rom->mr),
> +   rom->romsize,
> +   rom->name);
>  } else if (!rom->fw_file) {
> -monitor_printf(mon, "addr=" TARGET_FMT_plx
> -   " size=0x%06zx mem=%s name=\"%s\"\n",
> -   rom->addr, rom->romsize,
> -   rom->isrom ? "rom" : "ram",
> -   rom->name);
> +g_string_append_printf(buf, "addr=" TARGET_FMT_plx
> +   " size=0x%06zx mem=%s name=\"%s\"\n",
> +   rom->addr, rom->romsize,
> +   rom->isrom ? "rom" : "ram",
> +   rom->name);
>  } else {
> -monitor_printf(mon, "fw=%s/%s"
> -   " size=0x%06zx name=\"%s\"\n",
> -   rom->fw_dir,
> -   rom->fw_file,
> -   rom->romsize,
> -   rom->name);
> +g_string_append_printf(buf, "fw=%s/%s"
> +   " size=0x%06zx name=\"%s\"\n",
> +   rom->fw_dir,
> +   rom->fw_file,
> +   rom->romsize,
> +   rom->name);
>  }
>  }
> +
> +ret = g_new0(HumanReadableText, 1);
> +ret->human_readable_text = g_steal_pointer(&buf->str);
> +return ret;
> +}

Is it possible to have a helper in 'qapi/qmp/smth.h' such as:

HumanReadableText *qmp_human_readable_text_new(GString **pbuf)
{
HumanReadableText *ret = g_new0(HumanReadableText, 1);

ret->human_readable_text = g_steal_pointer(pbuf);

return ret;
}

?




Re: [PATCH v2 42/53] hw/core: introduce a 'format_tlb' callback

2021-09-14 Thread Philippe Mathieu-Daudé
On 9/14/21 4:20 PM, Daniel P. Berrangé wrote:
> This will allow us to reduce duplication between the different targets
> implementing the 'info tlb' command.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  hw/core/cpu-common.c  |  9 +
>  include/hw/core/cpu.h | 11 +++
>  2 files changed, 20 insertions(+)

> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index 4c47e1df18..64fc57c8d9 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
>   * @has_work: Callback for checking if there is work to do.
>   * @memory_rw_debug: Callback for GDB memory access.
>   * @format_state: Callback for formatting state.
> + * @format_tlb: Callback for formatting memory mappings
>   * @get_arch_id: Callback for getting architecture-dependent CPU ID.
>   * @set_pc: Callback for setting the Program Counter register. This
>   *   should have the semantics used by the target architecture when
> @@ -136,6 +137,7 @@ struct CPUClass {
>  int (*memory_rw_debug)(CPUState *cpu, vaddr addr,
> uint8_t *buf, int len, bool is_write);
>  void (*format_state)(CPUState *cpu, GString *buf, int flags);
> +void (*format_tlb)(CPUState *cpu, GString *buf);

Doesn't this belong to SysemuCPUOps?




Re: [PATCH v2 29/53] qapi: introduce x-query-registers QMP command

2021-09-14 Thread Eric Blake
On Tue, Sep 14, 2021 at 03:20:18PM +0100, Daniel P. Berrangé wrote:
> This is a counterpart to the HMP "info registers" command. It is being
> added with an "x-" prefix because this QMP command is intended as an
> ad hoc debugging tool and will thus not be modelled in QAPI as fully
> structured data, nor will it have long term guaranteed stability.
> The existing HMP command is rewritten to call the QMP command.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---

> +++ b/qapi/common.json
> @@ -197,3 +197,14 @@
>  { 'enum': 'GrabToggleKeys',
>'data': [ 'ctrl-ctrl', 'alt-alt', 'shift-shift','meta-meta', 'scrolllock',
>  'ctrl-scrolllock' ] }
> +
> +##
> +# @HumanReadableText:
> +#
> +# @human-readable-text: Formatted output intended for humans.
> +#
> +# Since: 6.2.0

Should be '6.2', not '6.2.0', to match...

> +#
> +##
> +{ 'struct': 'HumanReadableText',
> +  'data': { 'human-readable-text': 'str' } }
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 157712f006..8737efa865 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -1312,3 +1312,18 @@
>   '*cores': 'int',
>   '*threads': 'int',
>   '*maxcpus': 'int' } }
> +
> +##
> +# @x-query-registers:
> +#
> +# @cpu: the CPU number to query. If omitted, queries all CPUs
> +#
> +# Query information on the CPU registers
> +#
> +# Returns: CPU state in an architecture-specific format
> +#
> +# Since: 6.2

...the prevailing style.

If it were likely that someone might backport just some (but not all)
added x- commands, it may be wise to separate the creation of
HumanReadableText into its own patch to backport that but not
x-query-registers.  But I rather suspect anyone backporting this will
take the series wholesale, so the coupling in this patch is not worth
worrying about.

> +##
> +{ 'command': 'x-query-registers',
> +  'data': {'*cpu': 'int' },
> +  'returns': 'HumanReadableText' }
> -- 
> 2.31.1
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH] gitlab-ci: Make more custom runner jobs manual, and don't allow failure

2021-09-14 Thread Peter Maydell
On Mon, 13 Sept 2021 at 11:19, Peter Maydell  wrote:
>
> Currently we define a lot of jobs for our custom runners:
> for both aarch64 and s390x we have
>  - all-linux-static
>  - all
>  - alldbg
>  - clang (manual)
>  - tci
>  - notcg (manual)
>
> This is overkill.  The main reason to run on these hosts is to get
> coverage for the host architecture; we can leave the handling of
> differences like debug vs non-debug to the x86 CI jobs.
>
> The jobs are also generally running OK; they occasionally fail due to
> timeouts, which is likely because we're overloading the machine by
> asking it to run 4 CI jobs at once plus the ad-hoc CI.
>
> Remove the 'allow_failure' tag from all these jobs, and switch the
> s390x-alldbg, aarch64-all, s390x-tci and aarch64-tci jobs to manual.
> This will let us make the switch for s390x and aarch64 hosts from
> the ad-hoc CI to gitlab.
>
> Signed-off-by: Peter Maydell 

Pushed to master so I can turn off my ad-hoc CI for this.

thanks
-- PMM



[PATCH 2/3] gdbstub: implement NOIRQ support for single step on KVM

2021-09-14 Thread Maxim Levitsky
Signed-off-by: Maxim Levitsky 
---
 accel/kvm/kvm-all.c  | 25 ++
 gdbstub.c| 60 
 include/sysemu/kvm.h | 13 ++
 3 files changed, 88 insertions(+), 10 deletions(-)
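
The include/sysemu/kvm.h hunk is cut off at the end of this message in
the digest; purely as a sketch (not the actual hunk), following the
accessor-macro pattern already used in that header, it plausibly amounts
to something like:

    /* Sketch only: expose the capability/flags probed by kvm_init()
     * below to generic code such as gdbstub.c. */
    extern bool kvm_has_guest_debug;
    extern int kvm_sstep_flags;

    #define kvm_get_supported_sstep_flags() (kvm_sstep_flags)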

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 6b187e9c96..e141260796 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -169,6 +169,8 @@ bool kvm_vm_attributes_allowed;
 bool kvm_direct_msi_allowed;
 bool kvm_ioeventfd_any_length_allowed;
 bool kvm_msi_use_devid;
+bool kvm_has_guest_debug;
+int kvm_sstep_flags;
 static bool kvm_immediate_exit;
 static hwaddr kvm_max_slot_size = ~0;
 
@@ -2559,6 +2561,25 @@ static int kvm_init(MachineState *ms)
 kvm_sregs2 =
 (kvm_check_extension(s, KVM_CAP_SREGS2) > 0);
 
+kvm_has_guest_debug =
+(kvm_check_extension(s, KVM_CAP_SET_GUEST_DEBUG) > 0);
+
+kvm_sstep_flags = 0;
+
+if (kvm_has_guest_debug) {
+/* Assume that single stepping is supported */
+kvm_sstep_flags = SSTEP_ENABLE;
+
+int guest_debug_flags =
+kvm_check_extension(s, KVM_CAP_SET_GUEST_DEBUG2);
+
+if (guest_debug_flags > 0) {
+if (guest_debug_flags & KVM_GUESTDBG_BLOCKIRQ) {
+kvm_sstep_flags |= SSTEP_NOIRQ;
+}
+}
+}
+
 kvm_state = s;
 
 ret = kvm_arch_init(ms, s);
@@ -3188,6 +3209,10 @@ int kvm_update_guest_debug(CPUState *cpu, unsigned long 
reinject_trap)
 
 if (cpu->singlestep_enabled) {
 data.dbg.control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
+
+if (cpu->singlestep_enabled & SSTEP_NOIRQ) {
+data.dbg.control |= KVM_GUESTDBG_BLOCKIRQ;
+}
 }
 kvm_arch_update_guest_debug(cpu, &data.dbg);
 
diff --git a/gdbstub.c b/gdbstub.c
index 5d8e6ae3cd..48bb803bae 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -368,12 +368,11 @@ typedef struct GDBState {
 gdb_syscall_complete_cb current_syscall_cb;
 GString *str_buf;
 GByteArray *mem_buf;
+int sstep_flags;
+int supported_sstep_flags;
 } GDBState;
 
-/* By default use no IRQs and no timers while single stepping so as to
- * make single stepping like an ICE HW step.
- */
-static int sstep_flags = SSTEP_ENABLE|SSTEP_NOIRQ|SSTEP_NOTIMER;
+static GDBState gdbserver_state;
 
 /* Retrieves flags for single step mode. */
 static int get_sstep_flags(void)
@@ -385,11 +384,10 @@ static int get_sstep_flags(void)
 if (replay_mode != REPLAY_MODE_NONE) {
 return SSTEP_ENABLE;
 } else {
-return sstep_flags;
+return gdbserver_state.sstep_flags;
 }
 }
 
-static GDBState gdbserver_state;
 
 static void init_gdbserver_state(void)
 {
@@ -399,6 +397,23 @@ static void init_gdbserver_state(void)
 gdbserver_state.str_buf = g_string_new(NULL);
 gdbserver_state.mem_buf = g_byte_array_sized_new(MAX_PACKET_LENGTH);
 gdbserver_state.last_packet = g_byte_array_sized_new(MAX_PACKET_LENGTH + 
4);
+
+
+if (kvm_enabled()) {
+gdbserver_state.supported_sstep_flags = 
kvm_get_supported_sstep_flags();
+} else {
+gdbserver_state.supported_sstep_flags =
+SSTEP_ENABLE | SSTEP_NOIRQ | SSTEP_NOTIMER;
+}
+
+/*
+ * By default use no IRQs and no timers while single stepping so as to
+ * make single stepping like an ICE HW step.
+ */
+
+gdbserver_state.sstep_flags = SSTEP_ENABLE | SSTEP_NOIRQ | SSTEP_NOTIMER;
+gdbserver_state.sstep_flags &= gdbserver_state.supported_sstep_flags;
+
 }
 
 #ifndef CONFIG_USER_ONLY
@@ -2017,24 +2032,44 @@ static void handle_v_commands(GArray *params, void 
*user_ctx)
 
 static void handle_query_qemu_sstepbits(GArray *params, void *user_ctx)
 {
-g_string_printf(gdbserver_state.str_buf, "ENABLE=%x,NOIRQ=%x,NOTIMER=%x",
-SSTEP_ENABLE, SSTEP_NOIRQ, SSTEP_NOTIMER);
+g_string_printf(gdbserver_state.str_buf, "ENABLE=%x", SSTEP_ENABLE);
+
+if (gdbserver_state.supported_sstep_flags & SSTEP_NOIRQ) {
+g_string_append_printf(gdbserver_state.str_buf, ",NOIRQ=%x",
+   SSTEP_NOIRQ);
+}
+
+if (gdbserver_state.supported_sstep_flags & SSTEP_NOTIMER) {
+g_string_append_printf(gdbserver_state.str_buf, ",NOTIMER=%x",
+   SSTEP_NOTIMER);
+}
+
 put_strbuf();
 }
 
 static void handle_set_qemu_sstep(GArray *params, void *user_ctx)
 {
+int new_sstep_flags;
+
 if (!params->len) {
 return;
 }
 
-sstep_flags = get_param(params, 0)->val_ul;
+new_sstep_flags = get_param(params, 0)->val_ul;
+
+if (new_sstep_flags  & ~gdbserver_state.supported_sstep_flags) {
+put_packet("E22");
+return;
+}
+
+gdbserver_state.sstep_flags = new_sstep_flags;
 put_packet("OK");
 }
 
 static void handle_query_qemu_sstep(GArray *params, void *user_ctx)
 {
-g_string_printf(gdbserver_state.str_buf, "0x%x", sstep_flags);
+g_string_printf(gdbserver_state.str_buf, "0x%x",
+   

[PATCH 1/3] KVM: use KVM_{GET|SET}_SREGS2 when supported.

2021-09-14 Thread Maxim Levitsky
This allows making the PDPTRs part of the migration stream, so that
they are not reloaded after migration, which would be against the
X86 spec.

Signed-off-by: Maxim Levitsky 
---
 accel/kvm/kvm-all.c   |   5 ++
 include/sysemu/kvm.h  |   4 ++
 target/i386/cpu.h |   3 ++
 target/i386/kvm/kvm.c | 107 +-
 target/i386/machine.c |  30 
 5 files changed, 147 insertions(+), 2 deletions(-)
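
The target/i386/machine.c hunk never appears below (this message is
truncated in the digest), but given the new env.pdptrs_valid and
env.pdptrs[4] fields added to cpu.h, a hedged sketch of the kind of
migration subsection it adds might look like this (all names here are
assumptions, not the actual patch):

    static bool pdptrs_needed(void *opaque)
    {
        X86CPU *cpu = opaque;
        return cpu->env.pdptrs_valid;
    }

    static const VMStateDescription vmstate_pdptrs = {
        .name = "cpu/pdptrs",
        .version_id = 1,
        .minimum_version_id = 1,
        .needed = pdptrs_needed,
        .fields = (VMStateField[]) {
            VMSTATE_UINT64_ARRAY(env.pdptrs, X86CPU, 4),
            VMSTATE_END_OF_LIST()
        }
    };

Hooked into the existing list of x86 CPU subsections, something of this
shape is what makes the PDPTRs "part of the migration stream" as
described above.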

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 0125c17edb..6b187e9c96 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -163,6 +163,7 @@ bool kvm_msi_via_irqfd_allowed;
 bool kvm_gsi_routing_allowed;
 bool kvm_gsi_direct_mapping;
 bool kvm_allowed;
+bool kvm_sregs2;
 bool kvm_readonly_mem_allowed;
 bool kvm_vm_attributes_allowed;
 bool kvm_direct_msi_allowed;
@@ -2554,6 +2555,10 @@ static int kvm_init(MachineState *ms)
 kvm_ioeventfd_any_length_allowed =
 (kvm_check_extension(s, KVM_CAP_IOEVENTFD_ANY_LENGTH) > 0);
 
+
+kvm_sregs2 =
+(kvm_check_extension(s, KVM_CAP_SREGS2) > 0);
+
 kvm_state = s;
 
 ret = kvm_arch_init(ms, s);
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index a1ab1ee12d..b3d4538c55 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -32,6 +32,7 @@
 #ifdef CONFIG_KVM_IS_POSSIBLE
 
 extern bool kvm_allowed;
+extern bool kvm_sregs2;
 extern bool kvm_kernel_irqchip;
 extern bool kvm_split_irqchip;
 extern bool kvm_async_interrupts_allowed;
@@ -139,6 +140,9 @@ extern bool kvm_msi_use_devid;
  */
 #define kvm_gsi_direct_mapping() (kvm_gsi_direct_mapping)
 
+
+#define kvm_supports_sregs2() (kvm_sregs2)
+
 /**
  * kvm_readonly_mem_enabled:
  *
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 71ae3141c3..9adae12426 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1436,6 +1436,9 @@ typedef struct CPUX86State {
 SegmentCache idt; /* only base and limit are used */
 
 target_ulong cr[5]; /* NOTE: cr1 is unused */
+
+bool pdptrs_valid;
+uint64_t pdptrs[4];
 int32_t a20_mask;
 
 BNDReg bnd_regs[4];
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 500d2e0e68..841b3b98f7 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2587,6 +2587,61 @@ static int kvm_put_sregs(X86CPU *cpu)
 return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_SREGS, &sregs);
 }
 
+static int kvm_put_sregs2(X86CPU *cpu)
+{
+CPUX86State *env = &cpu->env;
+struct kvm_sregs2 sregs;
+int i;
+
+sregs.flags = 0;
+
+if ((env->eflags & VM_MASK)) {
+set_v8086_seg(&sregs.cs, &env->segs[R_CS]);
+set_v8086_seg(&sregs.ds, &env->segs[R_DS]);
+set_v8086_seg(&sregs.es, &env->segs[R_ES]);
+set_v8086_seg(&sregs.fs, &env->segs[R_FS]);
+set_v8086_seg(&sregs.gs, &env->segs[R_GS]);
+set_v8086_seg(&sregs.ss, &env->segs[R_SS]);
+} else {
+set_seg(&sregs.cs, &env->segs[R_CS]);
+set_seg(&sregs.ds, &env->segs[R_DS]);
+set_seg(&sregs.es, &env->segs[R_ES]);
+set_seg(&sregs.fs, &env->segs[R_FS]);
+set_seg(&sregs.gs, &env->segs[R_GS]);
+set_seg(&sregs.ss, &env->segs[R_SS]);
+}
+
+set_seg(&sregs.tr, &env->tr);
+set_seg(&sregs.ldt, &env->ldt);
+
+sregs.idt.limit = env->idt.limit;
+sregs.idt.base = env->idt.base;
+memset(sregs.idt.padding, 0, sizeof sregs.idt.padding);
+sregs.gdt.limit = env->gdt.limit;
+sregs.gdt.base = env->gdt.base;
+memset(sregs.gdt.padding, 0, sizeof sregs.gdt.padding);
+
+sregs.cr0 = env->cr[0];
+sregs.cr2 = env->cr[2];
+sregs.cr3 = env->cr[3];
+sregs.cr4 = env->cr[4];
+
+sregs.cr8 = cpu_get_apic_tpr(cpu->apic_state);
+sregs.apic_base = cpu_get_apic_base(cpu->apic_state);
+
+sregs.efer = env->efer;
+
+if (env->pdptrs_valid) {
+for (i = 0; i < 4; i++) {
+sregs.pdptrs[i] = env->pdptrs[i];
+}
+sregs.flags |= KVM_SREGS2_FLAGS_PDPTRS_VALID;
+}
+
+return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_SREGS2, &sregs);
+}
+
+
 static void kvm_msr_buf_reset(X86CPU *cpu)
 {
 memset(cpu->kvm_msr_buf, 0, MSR_BUF_SIZE);
@@ -3252,6 +3307,53 @@ static int kvm_get_sregs(X86CPU *cpu)
 return 0;
 }
 
+static int kvm_get_sregs2(X86CPU *cpu)
+{
+CPUX86State *env = &cpu->env;
+struct kvm_sregs2 sregs;
+int i, ret;
+
+ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS2, &sregs);
+if (ret < 0) {
+return ret;
+}
+
+get_seg(&env->segs[R_CS], &sregs.cs);
+get_seg(&env->segs[R_DS], &sregs.ds);
+get_seg(&env->segs[R_ES], &sregs.es);
+get_seg(&env->segs[R_FS], &sregs.fs);
+get_seg(&env->segs[R_GS], &sregs.gs);
+get_seg(&env->segs[R_SS], &sregs.ss);
+
+get_seg(&env->tr, &sregs.tr);
+get_seg(&env->ldt, &sregs.ldt);
+
+env->idt.limit = sregs.idt.limit;
+env->idt.base = sregs.idt.base;
+env->gdt.limit = sregs.gdt.limit;
+env->gdt.base = sregs.gdt.base;
+
+env->cr[0] = sregs.cr0;
+env->cr[2] = sregs.cr2;
+env->cr[3] = sregs.cr3;
+env->cr[4] = sregs.cr4;
+
+env->efer = sregs.efer;
+
+env->pdptrs_valid = sregs.flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
+
+if (env->pdptrs_valid) {
+for (i = 0; i < 4; i++) {
+env->pdptrs[i] = sregs.pdptrs[i];
+}
+}
+
+/* changes to apic base and cr8/tpr are read back 

Re: [RFC PATCH 3/3] tests/tcg: commit Makefile atrocities in the name of portability

2021-09-14 Thread Warner Losh
On Tue, Aug 3, 2021 at 5:02 AM Alex Bennée  wrote:

> Not all of the multiarch tests are pure POSIX so elide over those
> tests on a non-Linux system. This allows for at least some of the
> tests to be nominally usable by *BSD user builds.
>
> Signed-off-by: Alex Bennée 
> Cc: Warner Losh 
> ---
>  tests/tcg/multiarch/Makefile.target | 6 +-
>  tests/tcg/x86_64/Makefile.target| 4 
>  2 files changed, 9 insertions(+), 1 deletion(-)
>

Acked-by: Warner Losh 

To do this with gcc10, however, I had to add -Wno-error=overflow,
otherwise I got a lot of warnings about constants being truncated to
0.

It also fails the sha1 test, but when I run it by hand it works. It turns
out that I have a sha1 in my path, and at least the bsd-user edition
of qemu-i386 tries to run that and fails.

Also, the hello world program needed tweaking

So with this applied and the following patch

diff --git a/tests/tcg/Makefile.target b/tests/tcg/Makefile.target
index 63cf1b2573..39420631a8 100644
--- a/tests/tcg/Makefile.target
+++ b/tests/tcg/Makefile.target
@@ -155,7 +155,7 @@ RUN_TESTS+=$(EXTRA_RUNS)

 ifdef CONFIG_USER_ONLY
 run-%: %
-   $(call run-test, $<, $(QEMU) $(QEMU_OPTS) $<, "$< on
$(TARGET_NAME)")
+   $(call run-test, $<, $(QEMU) $(QEMU_OPTS) ./$<, "$< on
$(TARGET_NAME)")

 run-plugin-%:
$(call run-test, $@, $(QEMU) $(QEMU_OPTS) \
@@ -168,7 +168,7 @@ run-%: %
$(call run-test, $<, \
  $(QEMU) -monitor none -display none \
  -chardev file$(COMMA)path=$<.out$(COMMA)id=output \
- $(QEMU_OPTS) $<, \
+ $(QEMU_OPTS) ./$<, \
  "$< on $(TARGET_NAME)")

 run-plugin-%:
diff --git a/tests/tcg/i386/Makefile.target b/tests/tcg/i386/Makefile.target
index a053ca3f15..ae258c47f0 100644
--- a/tests/tcg/i386/Makefile.target
+++ b/tests/tcg/i386/Makefile.target
@@ -21,6 +21,7 @@ run-plugin-test-i386-pcmpistri-%: QEMU_OPTS += -cpu max
 run-test-i386-bmi2: QEMU_OPTS += -cpu max
 run-plugin-test-i386-bmi2-%: QEMU_OPTS += -cpu max

+CFLAGS +=  -Wno-error=overflow
 #
 # hello-i386 is a barebones app
 #
diff --git a/tests/tcg/i386/hello-i386.c b/tests/tcg/i386/hello-i386.c
index 59196dd0b7..4a5a25211c 100644
--- a/tests/tcg/i386/hello-i386.c
+++ b/tests/tcg/i386/hello-i386.c
@@ -1,4 +1,10 @@
+#ifdef __FreeBSD__
+#include <sys/syscall.h>
+#define __NR_exit SYS_exit
+#define __NR_write SYS_write
+#else
 #include <asm/unistd.h>
+#endif

 static inline void exit(int status)
 {

I get down to a failure in the mmap test and that's all I have time to
plumb the depths
of this morning... Investigating the mmap test failure will have to wait
for another day.

Warner


> diff --git a/tests/tcg/multiarch/Makefile.target
> b/tests/tcg/multiarch/Makefile.target
> index 85a6fb7a2e..38ee0f1dec 100644
> --- a/tests/tcg/multiarch/Makefile.target
> +++ b/tests/tcg/multiarch/Makefile.target
> @@ -10,7 +10,11 @@ MULTIARCH_SRC=$(SRC_PATH)/tests/tcg/multiarch
>  # Set search path for all sources
>  VPATH  += $(MULTIARCH_SRC)
>  MULTIARCH_SRCS   =$(notdir $(wildcard $(MULTIARCH_SRC)/*.c))
> -MULTIARCH_TESTS  =$(filter-out float_helpers, $(MULTIARCH_SRCS:.c=))
> +MULTIARCH_SKIP=float_helpers
> +ifeq ($(CONFIG_LINUX),)
> +MULTIARCH_SKIP+=linux-test
> +endif
> +MULTIARCH_TESTS  =$(filter-out $(MULTIARCH_SKIP),$(MULTIARCH_SRCS:.c=))
>
>  #
>  # The following are any additional rules needed to build things
> diff --git a/tests/tcg/x86_64/Makefile.target
> b/tests/tcg/x86_64/Makefile.target
> index 2151ea6302..d7a7385583 100644
> --- a/tests/tcg/x86_64/Makefile.target
> +++ b/tests/tcg/x86_64/Makefile.target
> @@ -8,8 +8,12 @@
>
>  include $(SRC_PATH)/tests/tcg/i386/Makefile.target
>
> +ifneq ($(CONFIG_LINUX),)
>  X86_64_TESTS += vsyscall
>  TESTS=$(MULTIARCH_TESTS) $(X86_64_TESTS) test-x86_64
> +else
> +TESTS=$(MULTIARCH_TESTS)
> +endif
>  QEMU_OPTS += -cpu max
>
>  test-x86_64: LDFLAGS+=-lm -lc
> --
> 2.30.2
>
>


[PULL v3 01/44] accel/tcg: Add DisasContextBase argument to translator_ld*

2021-09-14 Thread Richard Henderson
From: Ilya Leoshkevich 

Signed-off-by: Ilya Leoshkevich 
[rth: Split out of a larger patch.]
Signed-off-by: Richard Henderson 
---
 include/exec/translator.h |  9 +
 target/arm/arm_ldst.h | 12 ++--
 target/alpha/translate.c  |  2 +-
 target/arm/translate-a64.c|  2 +-
 target/arm/translate.c|  9 +
 target/hexagon/translate.c|  3 ++-
 target/hppa/translate.c   |  2 +-
 target/i386/tcg/translate.c   | 10 +-
 target/m68k/translate.c   |  2 +-
 target/mips/tcg/translate.c   |  8 
 target/openrisc/translate.c   |  2 +-
 target/ppc/translate.c|  5 +++--
 target/riscv/translate.c  |  5 +++--
 target/s390x/tcg/translate.c  | 16 +---
 target/sh4/translate.c|  4 ++--
 target/sparc/translate.c  |  2 +-
 target/xtensa/translate.c |  5 +++--
 target/mips/tcg/micromips_translate.c.inc |  2 +-
 target/mips/tcg/mips16e_translate.c.inc   |  4 ++--
 target/mips/tcg/nanomips_translate.c.inc  |  4 ++--
 20 files changed, 58 insertions(+), 50 deletions(-)

diff --git a/include/exec/translator.h b/include/exec/translator.h
index d318803267..6c054e8d05 100644
--- a/include/exec/translator.h
+++ b/include/exec/translator.h
@@ -157,7 +157,8 @@ bool translator_use_goto_tb(DisasContextBase *db, 
target_ulong dest);
 
 #define GEN_TRANSLATOR_LD(fullname, type, load_fn, swap_fn) \
 static inline type  \
-fullname ## _swap(CPUArchState *env, abi_ptr pc, bool do_swap)  \
+fullname ## _swap(CPUArchState *env, DisasContextBase *dcbase,  \
+  abi_ptr pc, bool do_swap) \
 {   \
 type ret = load_fn(env, pc);\
 if (do_swap) {  \
@@ -166,10 +167,10 @@ bool translator_use_goto_tb(DisasContextBase *db, 
target_ulong dest);
 plugin_insn_append(&ret, sizeof(ret));  \
 return ret; \
 }   \
-\
-static inline type fullname(CPUArchState *env, abi_ptr pc)  \
+static inline type fullname(CPUArchState *env,  \
+DisasContextBase *dcbase, abi_ptr pc)   \
 {   \
-return fullname ## _swap(env, pc, false);   \
+return fullname ## _swap(env, dcbase, pc, false);   \
 }
 
 GEN_TRANSLATOR_LD(translator_ldub, uint8_t, cpu_ldub_code, /* no swap */)
diff --git a/target/arm/arm_ldst.h b/target/arm/arm_ldst.h
index 057160e8da..cee0548a1c 100644
--- a/target/arm/arm_ldst.h
+++ b/target/arm/arm_ldst.h
@@ -24,15 +24,15 @@
 #include "qemu/bswap.h"
 
 /* Load an instruction and return it in the standard little-endian order */
-static inline uint32_t arm_ldl_code(CPUARMState *env, target_ulong addr,
-bool sctlr_b)
+static inline uint32_t arm_ldl_code(CPUARMState *env, DisasContextBase *s,
+target_ulong addr, bool sctlr_b)
 {
-return translator_ldl_swap(env, addr, bswap_code(sctlr_b));
+return translator_ldl_swap(env, s, addr, bswap_code(sctlr_b));
 }
 
 /* Ditto, for a halfword (Thumb) instruction */
-static inline uint16_t arm_lduw_code(CPUARMState *env, target_ulong addr,
- bool sctlr_b)
+static inline uint16_t arm_lduw_code(CPUARMState *env, DisasContextBase* s,
+ target_ulong addr, bool sctlr_b)
 {
 #ifndef CONFIG_USER_ONLY
 /* In big-endian (BE32) mode, adjacent Thumb instructions have been swapped
@@ -41,7 +41,7 @@ static inline uint16_t arm_lduw_code(CPUARMState *env, 
target_ulong addr,
 addr ^= 2;
 }
 #endif
-return translator_lduw_swap(env, addr, bswap_code(sctlr_b));
+return translator_lduw_swap(env, s, addr, bswap_code(sctlr_b));
 }
 
 #endif
diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index de6c0a8439..b034206688 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -2971,7 +2971,7 @@ static void alpha_tr_translate_insn(DisasContextBase 
*dcbase, CPUState *cpu)
 {
 DisasContext *ctx = container_of(dcbase, DisasContext, base);
 CPUAlphaState *env = cpu->env_ptr;
-uint32_t insn = translator_ldl(env, ctx->base.pc_next);
+uint32_t insn = translator_ldl(env, &ctx->base, ctx->base.pc_next);
 
 ctx->base.pc_next += 4;
 

  1   2   3   4   >