Re: Qemu on Windows 10 - no acceleration found

2020-03-20 Thread Holger Schranz

... Too early in the morning ...

It is on GitHub:
https://github.com/intel/haxm/releases/tag/v7.5.6

others:
https://www.softpedia.com/get/Programming/Other-Programming-Files/Intel-Hardware-Accelerated-Execution-Manager.shtml
https://www.filecroco.com/download-intel-hardware-accelerated-execution-manager/

Best regards

Holger

On 2020-03-21 at 06:17, Holger Schranz wrote:

Hi Jerry,

have you read the instructions from Intel:

https://github.com/intel/haxm/wiki/Installation-Instructions-on-Windows

Maybe this helps you.

Best regards

Holger

On 2020-03-20 at 21:22, Jerry Geis wrote:
So I tried --enable-whpx and I get Invalid option. I'm on Windows 10
and QEMU 4.2.0.


I'm confused. Then I don't know where to download HAXM. The
place I found is on GitHub and it wants the user to compile it. I was
looking for just an EXE.


Thanks

Jerry


Re: Qemu on Windows 10 - no acceleration found

2020-03-20 Thread Holger Schranz

Hi Jerry,

have you read the instructions from Intel:

https://github.com/intel/haxm/wiki/Installation-Instructions-on-Windows

Maybe this helps you.

Best regards

Holger

On 2020-03-20 at 21:22, Jerry Geis wrote:
So I tried --enable-whpx and I get Invalid option. I'm on Windows 10
and QEMU 4.2.0.


I'm confused. Then I don't know where to download HAXM. The place
I found is on GitHub and it wants the user to compile it. I was looking for
just an EXE.


Thanks

Jerry


[PATCH] target/mips: Fix loongson multimedia condition instructions

2020-03-20 Thread Jiaxun Yang
Loongson multimedia condition instructions were previously implemented as
a write of 0 to rd due to lack of documentation. I have since confirmed the
encoding with Loongson and implemented the instructions correctly.

Signed-off-by: Jiaxun Yang 
---
 target/mips/translate.c | 40 ++--
 1 file changed, 34 insertions(+), 6 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index d745bd2803..43be8d27b5 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -5529,6 +5529,8 @@ static void gen_loongson_multimedia(DisasContext *ctx, int rd, int rs, int rt)
 {
 uint32_t opc, shift_max;
 TCGv_i64 t0, t1;
+TCGCond cond;
+TCGLabel *lab;
 
 opc = MASK_LMI(ctx->opcode);
 switch (opc) {
@@ -5816,7 +5818,7 @@ static void gen_loongson_multimedia(DisasContext *ctx, int rd, int rs, int rt)
 case OPC_DADD_CP2:
 {
 TCGv_i64 t2 = tcg_temp_new_i64();
-TCGLabel *lab = gen_new_label();
+lab = gen_new_label();
 
 tcg_gen_mov_i64(t2, t0);
 tcg_gen_add_i64(t0, t1, t2);
@@ -5837,7 +5839,7 @@ static void gen_loongson_multimedia(DisasContext *ctx, int rd, int rs, int rt)
 case OPC_DSUB_CP2:
 {
 TCGv_i64 t2 = tcg_temp_new_i64();
-TCGLabel *lab = gen_new_label();
+lab = gen_new_label();
 
 tcg_gen_mov_i64(t2, t0);
 tcg_gen_sub_i64(t0, t1, t2);
@@ -5862,14 +5864,39 @@ static void gen_loongson_multimedia(DisasContext *ctx, int rd, int rs, int rt)
 
 case OPC_SEQU_CP2:
 case OPC_SEQ_CP2:
+cond = TCG_COND_EQ;
+goto do_cc_cond;
+break;
+
 case OPC_SLTU_CP2:
+cond = TCG_COND_LTU;
+goto do_cc_cond;
+break;
+
 case OPC_SLT_CP2:
+cond = TCG_COND_LT;
+goto do_cc_cond;
+break;
+
 case OPC_SLEU_CP2:
+cond = TCG_COND_LEU;
+goto do_cc_cond;
+break;
+
 case OPC_SLE_CP2:
-/*
- * ??? Document is unclear: Set FCC[CC].  Does that mean the
- * FD field is the CC field?
- */
+cond = TCG_COND_LE;
+do_cc_cond:
+{
+int cc = (ctx->opcode >> 8) & 0x7;
+lab = gen_new_label();
+tcg_gen_ori_i32(fpu_fcr31, fpu_fcr31, 1 << get_fp_bit(cc));
+tcg_gen_brcond_i64(cond, t0, t1, lab);
+tcg_gen_xori_i32(fpu_fcr31, fpu_fcr31, 1 << get_fp_bit(cc));
+gen_set_label(lab);
+}
+goto no_rd;
+break;
+
 default:
 MIPS_INVAL("loongson_cp2");
 generate_exception_end(ctx, EXCP_RI);
@@ -5878,6 +5905,7 @@ static void gen_loongson_multimedia(DisasContext *ctx, int rd, int rs, int rt)
 
 gen_store_fpr64(ctx, t0, rd);
 
+no_rd:
 tcg_temp_free_i64(t0);
 tcg_temp_free_i64(t1);
 }
-- 
2.26.0.rc2





[PATCH 1/1] spapr/rtas: Add MinMem to ibm,get-system-parameter RTAS call

2020-03-20 Thread Leonardo Bras
Add support for MinMem SPLPAR Characteristic on emulated
RTAS call ibm,get-system-parameter.

MinMem represents Minimum Memory, which is described in LOPAPR as:
The minimum amount of main store that is needed to power on the
partition. Minimum memory is expressed in MB of storage.

This provides a way for the OS to distinguish hotplugged LMBs from
LMBs that were present when the VM started, allowing it to better
support memory hot-removal.

Signed-off-by: Leonardo Bras 
---
 hw/ppc/spapr_rtas.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 9fb8c8632a..0f3fbca7af 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -276,10 +276,12 @@ static void rtas_ibm_get_system_parameter(PowerPCCPU *cpu,
 
 switch (parameter) {
 case RTAS_SYSPARM_SPLPAR_CHARACTERISTICS: {
-char *param_val = g_strdup_printf("MaxEntCap=%d,"
+char *param_val = g_strdup_printf("MinMem=%" PRIu64 ","
+  "MaxEntCap=%d,"
   "DesMem=%" PRIu64 ","
   "DesProcs=%d,"
   "MaxPlatProcs=%d",
+  ms->ram_size / MiB,
   ms->smp.max_cpus,
   ms->ram_size / MiB,
   ms->smp.cpus,
-- 
2.24.1




Re: [PATCH] gdbstub: add support to Xfer:auxv:read: packet

2020-03-20 Thread Lirong Yuan
On Fri, Mar 20, 2020 at 2:17 AM Alex Bennée  wrote:

>
> Lirong Yuan  writes:
>
> > On Fri, Mar 6, 2020 at 5:01 PM Lirong Yuan  wrote:
> >
> >> This allows gdb to access the target’s auxiliary vector,
> >> which can be helpful for telling system libraries important details
> >> about the hardware, operating system, and process.
> >>
> >> Signed-off-by: Lirong Yuan 
> >> ---
> >>  gdbstub.c | 55 +++
> >>  1 file changed, 55 insertions(+)
> >>
> >> diff --git a/gdbstub.c b/gdbstub.c
> >> index 22a2d630cd..a946af7007 100644
> >> --- a/gdbstub.c
> >> +++ b/gdbstub.c
> >> @@ -2105,6 +2105,12 @@ static void handle_query_supported(GdbCmdContext *gdb_ctx, void *user_ctx)
> >>  pstrcat(gdb_ctx->str_buf, sizeof(gdb_ctx->str_buf),
> >>  ";qXfer:features:read+");
> >>  }
> >> +#ifdef CONFIG_USER_ONLY
> >> +if (gdb_ctx->s->c_cpu->opaque) {
> >> +pstrcat(gdb_ctx->str_buf, sizeof(gdb_ctx->str_buf),
> >> +";qXfer:auxv:read+");
> >> +}
> >> +#endif
> >>
> >>  if (gdb_ctx->num_params &&
> >>  strstr(gdb_ctx->params[0].data, "multiprocess+")) {
> >> @@ -2166,6 +2172,47 @@ static void handle_query_xfer_features(GdbCmdContext *gdb_ctx, void *user_ctx)
> >>  put_packet_binary(gdb_ctx->s, gdb_ctx->str_buf, len + 1, true);
> >>  }
> >>
> >> +#ifdef CONFIG_USER_ONLY
> >> +static void handle_query_xfer_auxv(GdbCmdContext *gdb_ctx, void *user_ctx)
> >> +{
> >> +TaskState *ts;
> >> +unsigned long offset, len, saved_auxv, auxv_len;
> >> +const char *mem;
> >> +
> >> +if (gdb_ctx->num_params < 2) {
> >> +put_packet(gdb_ctx->s, "E22");
> >> +return;
> >> +}
> >> +
> >> +offset = gdb_ctx->params[0].val_ul;
> >> +len = gdb_ctx->params[1].val_ul;
> >> +
> >> +ts = gdb_ctx->s->c_cpu->opaque;
> >> +saved_auxv = ts->info->saved_auxv;
> >> +auxv_len = ts->info->auxv_len;
> >> +mem = (const char *)(saved_auxv + offset);
> >> +
> >> +if (offset >= auxv_len) {
> >> +put_packet(gdb_ctx->s, "E22");
> >> +return;
> >> +}
> >> +
> >> +if (len > (MAX_PACKET_LENGTH - 5) / 2) {
> >> +len = (MAX_PACKET_LENGTH - 5) / 2;
> >> +}
> >> +
> >> +if (len < auxv_len - offset) {
> >> +gdb_ctx->str_buf[0] = 'm';
> >> +len = memtox(gdb_ctx->str_buf + 1, mem, len);
> >> +} else {
> >> +gdb_ctx->str_buf[0] = 'l';
> >> +len = memtox(gdb_ctx->str_buf + 1, mem, auxv_len - offset);
> >> +}
> >> +
> >> +put_packet_binary(gdb_ctx->s, gdb_ctx->str_buf, len + 1, true);
> >> +}
> >> +#endif
> >> +
> >>  static void handle_query_attached(GdbCmdContext *gdb_ctx, void *user_ctx)
> >>  {
> >>  put_packet(gdb_ctx->s, GDB_ATTACHED);
> >> @@ -2271,6 +2318,14 @@ static GdbCmdParseEntry gdb_gen_query_table[] = {
> >>  .cmd_startswith = 1,
> >>  .schema = "s:l,l0"
> >>  },
> >> +#ifdef CONFIG_USER_ONLY
> >> +{
> >> +.handler = handle_query_xfer_auxv,
> >> +.cmd = "Xfer:auxv:read:",
> >> +.cmd_startswith = 1,
> >> +.schema = "l,l0"
> >> +},
> >> +#endif
> >>  {
> >>  .handler = handle_query_attached,
> >>  .cmd = "Attached:",
> >> --
> >> 2.25.1.481.gfbce0eb801-goog
> >>
> >>
> > Friendly ping~
>
> Sorry I missed this on my radar. There was a minor re-factor of gdbstub
> that was just merged which will mean this patch needs a re-base to use
> g_string_* functions to expand strings.
>
> Also we have some simple gdbstub tests now - could we come up with a
> multiarch gdbstub test to verify this is working properly?
>
> >
> > Link to the patchwork page:
> > http://patchwork.ozlabs.org/patch/1250727/
>
>
> --
> Alex Bennée
>

Hi Alex,

For sure, I will re-base this patch to use g_string_* functions.

Currently we are using QEMU aarch64. I am not sure how to do this yet, but
I could try to add something to
https://github.com/qemu/qemu/tree/master/tests/tcg/aarch64/gdbstub

Does this sound good?

Thanks!
Lirong


[RFC PATCH] target/ppc: Add capability for enabling secure guests

2020-03-20 Thread Fabiano Rosas
Making use of ppc's Protected Execution Facility (PEF) feature, a
guest can become a secure guest (aka. secure VM - SVM) and have its
memory protected from access by the host. This feature is mediated by
a piece of firmware called the Ultravisor (UV).

The transition from a regular to a secure VM is initiated by the guest
kernel during prom_init via the use of an ultracall (enter secure mode
- UV_ESM) and with cooperation from the hypervisor via an hcall
(H_SVM_INIT_START).

Currently QEMU has no knowledge of this process and no way to
determine if a host supports the feature. A guest with PEF support
enabled would always try to enter secure mode regardless of user
intent or hardware support.

To address the above, a new KVM capability (KVM_CAP_PPC_SECURE_GUEST
[1]) is being introduced in the kernel without which KVM will block
the secure transition.

This patch adds support for checking/enabling this KVM capability via
a new spapr capability (SPAPR_CAP_SECURE_GUEST) and the equivalent
command line switch (-machine pseries,cap-svm). The capability
defaults to off.

1- https://lore.kernel.org/kvm/20200319043301.GA13052@blackberry

Signed-off-by: Fabiano Rosas 
---

I have implemented this to be able to test Paul's patch. I'm sending
it as RFC in case it helps anyone else and if we decide to go in this
direction I can develop it further.

PS: TCG currently gets into a loop of 0x700 exceptions due to the lack
of 'sc 2' emulation - and all the rest of PEF, of course =).

---
 hw/ppc/spapr.c |  1 +
 hw/ppc/spapr_caps.c| 30 ++
 include/hw/ppc/spapr.h |  3 ++-
 target/ppc/kvm.c   | 12 
 target/ppc/kvm_ppc.h   | 12 
 5 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 9a2bd501aa..a881ac4e29 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4542,6 +4542,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
 smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
 smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
 smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
+smc->default_caps.caps[SPAPR_CAP_SECURE_GUEST] = SPAPR_CAP_OFF;
 spapr_caps_add_properties(smc, &error_abort);
 smc->irq = &spapr_irq_dual;
 smc->dr_phb_enabled = true;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 679ae7959f..375b7e0b30 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -524,6 +524,27 @@ static void cap_fwnmi_apply(SpaprMachineState *spapr, uint8_t val,
 }
 }

+static void cap_secure_guest_apply(SpaprMachineState *spapr,
+   uint8_t val, Error **errp)
+{
+if (!val) {
+/* capability disabled by default */
+return;
+}
+
+if (!kvm_enabled()) {
+error_setg(errp, "No PEF support in tcg, try cap-svm=off");
+return;
+}
+
+if (!kvmppc_has_cap_secure_guest()) {
+error_setg(errp, "KVM implementation does not support secure guests, "
+   "try cap-svm=off");
+} else if (kvmppc_enable_cap_secure_guest() < 0) {
+error_setg(errp, "Error enabling cap-svm, try cap-svm=off");
+}
+}
+
 SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
 [SPAPR_CAP_HTM] = {
 .name = "htm",
@@ -632,6 +653,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
 .type = "bool",
 .apply = cap_fwnmi_apply,
 },
+[SPAPR_CAP_SECURE_GUEST] = {
+.name = "svm",
+.description = "Allow the guest to become a Secure Guest",
+.index = SPAPR_CAP_SECURE_GUEST,
+.get = spapr_cap_get_bool,
+.set = spapr_cap_set_bool,
+.type = "bool",
+.apply = cap_secure_guest_apply,
+},
 };

 static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 42d64a0368..7f5289782d 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -81,8 +81,9 @@ typedef enum {
 #define SPAPR_CAP_CCF_ASSIST0x09
 /* Implements PAPR FWNMI option */
 #define SPAPR_CAP_FWNMI 0x0A
+#define SPAPR_CAP_SECURE_GUEST  0x0B
 /* Num Caps */
-#define SPAPR_CAP_NUM   (SPAPR_CAP_FWNMI + 1)
+#define SPAPR_CAP_NUM   (SPAPR_CAP_SECURE_GUEST + 1)

 /*
  * Capability Values
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 597f72be1b..9254749cd7 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -88,6 +88,7 @@ static int cap_ppc_safe_indirect_branch;
 static int cap_ppc_count_cache_flush_assist;
 static int cap_ppc_nested_kvm_hv;
 static int cap_large_decr;
+static int cap_ppc_secure_guest;

 static uint32_t debug_inst_opcode;

@@ -135,6 +136,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 cap_resize_hpt = kvm_vm_check_extension(s, KVM_CAP_SPAPR_RESIZE_HPT);
 kvmppc_get_cpu_characteristics(s);
 cap_ppc_nested_kvm_hv = kvm_vm_ch

Re: [PATCH v2 1/1] device_tree: Add info message when dumping dtb to file

2020-03-20 Thread Alistair Francis
On Wed, Mar 18, 2020 at 9:03 PM Leonardo Bras  wrote:
>
> When dumping dtb to a file, qemu exits silently before starting the VM.
>
> Add an info message so the user can easily track why the process exits.
> Add error message if dtb dump failed.
>
> Signed-off-by: Leonardo Bras 

Thanks for the patch

Reviewed-by: Alistair Francis 

I have sent a PR with this patch.

Alistair

> ---
>  device_tree.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/device_tree.c b/device_tree.c
> index f8b46b3c73..bba6cc2164 100644
> --- a/device_tree.c
> +++ b/device_tree.c
> @@ -530,7 +530,12 @@ void qemu_fdt_dumpdtb(void *fdt, int size)
>
>  if (dumpdtb) {
>  /* Dump the dtb to a file and quit */
> -exit(g_file_set_contents(dumpdtb, fdt, size, NULL) ? 0 : 1);
> +if (g_file_set_contents(dumpdtb, fdt, size, NULL)) {
> +info_report("dtb dumped to %s. Exiting.", dumpdtb);
> +exit(0);
> +}
> +error_report("%s: Failed dumping dtb to %s", __func__, dumpdtb);
> +exit(1);
>  }
>  }
>
> --
> 2.24.1
>
>



[PULL 0/1] DTC queue for 5.0

2020-03-20 Thread Alistair Francis
The following changes since commit 3d0ac346032a1fa9afafcaedc979a99f670e077e:

  Merge remote-tracking branch 'remotes/ehabkost/tags/python-next-pull-request' into staging (2020-03-20 13:54:23 +)

are available in the Git repository at:

  g...@github.com:alistair23/qemu.git tags/pull-dtc-next-20200320-1

for you to fetch changes up to 9f252c7c88eacbf21dadcfe117b0d08f2e88ceeb:

  device_tree: Add info message when dumping dtb to file (2020-03-20 14:55:44 -0700)


DTC patches for 5.0


Leonardo Bras (1):
  device_tree: Add info message when dumping dtb to file

 device_tree.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)



[PULL 1/1] device_tree: Add info message when dumping dtb to file

2020-03-20 Thread Alistair Francis
From: Leonardo Bras 

When dumping dtb to a file, qemu exits silently before starting the VM.

Add an info message so the user can easily track why the process exits.
Add error message if dtb dump failed.

Signed-off-by: Leonardo Bras 
Message-Id: <20200319040326.391090-1-leona...@linux.ibm.com>
Reviewed-by: David Gibson 
Reviewed-by: Alistair Francis 
Signed-off-by: Alistair Francis 
---
 device_tree.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/device_tree.c b/device_tree.c
index f8b46b3c73..bba6cc2164 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -530,7 +530,12 @@ void qemu_fdt_dumpdtb(void *fdt, int size)
 
 if (dumpdtb) {
 /* Dump the dtb to a file and quit */
-exit(g_file_set_contents(dumpdtb, fdt, size, NULL) ? 0 : 1);
+if (g_file_set_contents(dumpdtb, fdt, size, NULL)) {
+info_report("dtb dumped to %s. Exiting.", dumpdtb);
+exit(0);
+}
+error_report("%s: Failed dumping dtb to %s", __func__, dumpdtb);
+exit(1);
 }
 }
 
-- 
2.25.1




[PATCH v4] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-20 Thread Andrzej Jakowski
This patch introduces support for PMR, which has been defined as part of the
NVMe 1.4 spec. The user can now specify a pmrdev option that should point to a
HostMemoryBackend. The pmrdev memory region will subsequently be exposed as
PCI BAR 2 in the emulated NVMe device. The guest OS can perform MMIO reads and
writes to the PMR region, which will stay persistent across system reboot.

Signed-off-by: Andrzej Jakowski 
Reviewed-by: Klaus Jensen 
---
v3:
 - replaced qemu_msync() use with qemu_ram_writeback() to allow pmem_persist()
   or qemu_msync() be called depending on configuration [4] (Stefan)
 - rephrased comments to improve clarity and fixed code style issues [4]
   (Stefan, Klaus)

v2:
 - reworked PMR to use HostMemoryBackend instead of directly mapping PMR
   backend file into qemu [1] (Stefan)

v1:
 - provided support for Bit 1 from PMRWBM register instead of Bit 0 to ensure
   improved performance in virtualized environment [2] (Stefan)

 - added check if pmr size is power of two in size [3] (David)

 - addressed cross compilation build problems reported by CI environment

[1]: https://lore.kernel.org/qemu-devel/20200306223853.37958-1-andrzej.jakow...@linux.intel.com/
[2]: https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4-2019.06.10-Ratified.pdf
[3]: https://lore.kernel.org/qemu-devel/20200218224811.30050-1-andrzej.jakow...@linux.intel.com/
[4]: https://lore.kernel.org/qemu-devel/20200318200303.11322-1-andrzej.jakow...@linux.intel.com/
---
Persistent Memory Region (PMR) is a new optional feature provided in the NVMe 1.4
specification. This patch implements initial support for it in the NVMe driver.
---
 hw/block/Makefile.objs |   2 +-
 hw/block/nvme.c| 109 ++
 hw/block/nvme.h|   2 +
 hw/block/trace-events  |   4 +
 include/block/nvme.h   | 172 +
 5 files changed, 288 insertions(+), 1 deletion(-)

diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
index 4b4a2b338d..47960b5f0d 100644
--- a/hw/block/Makefile.objs
+++ b/hw/block/Makefile.objs
@@ -7,12 +7,12 @@ common-obj-$(CONFIG_PFLASH_CFI02) += pflash_cfi02.o
 common-obj-$(CONFIG_XEN) += xen-block.o
 common-obj-$(CONFIG_ECC) += ecc.o
 common-obj-$(CONFIG_ONENAND) += onenand.o
-common-obj-$(CONFIG_NVME_PCI) += nvme.o
 common-obj-$(CONFIG_SWIM) += swim.o
 
 common-obj-$(CONFIG_SH4) += tc58128.o
 
 obj-$(CONFIG_VIRTIO_BLK) += virtio-blk.o
 obj-$(CONFIG_VHOST_USER_BLK) += vhost-user-blk.o
+obj-$(CONFIG_NVME_PCI) += nvme.o
 
 obj-y += dataplane/
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d28335cbf3..9b453423cf 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -19,10 +19,19 @@
  *  -drive file=,if=none,id=
  *  -device nvme,drive=,serial=,id=, \
  *  cmb_size_mb=, \
+ *  [pmrdev=,] \
  *  num_queues=
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
  * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
+ *
+ * cmb_size_mb= and pmrdev= options are mutually exclusive due to limitation
+ * in available BAR's. cmb_size_mb= will take precedence over pmrdev= when
+ * both provided.
+ * Enabling pmr emulation can be achieved by pointing to memory-backend-file.
+ * For example:
+ * -object memory-backend-file,id=,share=on,mem-path=, \
+ *  size=  -device nvme,...,pmrdev=
  */
 
 #include "qemu/osdep.h"
@@ -35,7 +44,9 @@
 #include "sysemu/sysemu.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
+#include "sysemu/hostmem.h"
 #include "sysemu/block-backend.h"
+#include "exec/ram_addr.h"
 
 #include "qemu/log.h"
 #include "qemu/module.h"
@@ -1141,6 +1152,26 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data,
 NVME_GUEST_ERR(nvme_ub_mmiowr_cmbsz_readonly,
"invalid write to read only CMBSZ, ignored");
 return;
+case 0xE00: /* PMRCAP */
+NVME_GUEST_ERR(nvme_ub_mmiowr_pmrcap_readonly,
+   "invalid write to PMRCAP register, ignored");
+return;
+case 0xE04: /* TODO PMRCTL */
+break;
+case 0xE08: /* PMRSTS */
+NVME_GUEST_ERR(nvme_ub_mmiowr_pmrsts_readonly,
+   "invalid write to PMRSTS register, ignored");
+return;
+case 0xE0C: /* PMREBS */
+NVME_GUEST_ERR(nvme_ub_mmiowr_pmrebs_readonly,
+   "invalid write to PMREBS register, ignored");
+return;
+case 0xE10: /* PMRSWTP */
+NVME_GUEST_ERR(nvme_ub_mmiowr_pmrswtp_readonly,
+   "invalid write to PMRSWTP register, ignored");
+return;
+case 0xE14: /* TODO PMRMSC */
+ break;
 default:
 NVME_GUEST_ERR(nvme_ub_mmiowr_invalid,
"invalid MMIO write,"
@@ -1169,6 +1200,16 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr addr, unsigned size)
 }
 
 if (addr < sizeof(n->bar)) {
+/*
+ * When PMRWBM bit 1 is set then read from
+ * from PMRSTS should ensure pri

Re: [PATCH v11 03/16] s390x: protvirt: Support unpack facility

2020-03-20 Thread Bruce Rogers
On Thu, 2020-03-19 at 09:19 -0400, Janosch Frank wrote:
> The unpack facility provides the means to setup a protected guest. A
> protected guest cannot be introspected by the hypervisor or any
> user/administrator of the machine it is running on.
> 
> Protected guests are encrypted at rest and need a special boot
> mechanism via diag308 subcode 8 and 10.
> 
> Code 8 sets the PV specific IPLB which is retained separately from
> those set via code 5.
> 
> Code 10 is used to unpack the VM into protected memory, verify its
> integrity and start it.
> 
> Signed-off-by: Janosch Frank 
> Co-developed-by: Christian Borntraeger 
> [Changes
> to machine]
> Reviewed-by: David Hildenbrand 
> Reviewed-by: Claudio Imbrenda 
> Reviewed-by: Cornelia Huck 
> ---
>  MAINTAINERS |   2 +
>  hw/s390x/Makefile.objs  |   1 +
>  hw/s390x/ipl.c  |  59 +-
>  hw/s390x/ipl.h  |  91 -
>  hw/s390x/pv.c   |  98 +++
>  hw/s390x/s390-virtio-ccw.c  | 119
> +++-
>  include/hw/s390x/pv.h   |  55 +
>  include/hw/s390x/s390-virtio-ccw.h  |   1 +
>  target/s390x/cpu.c  |   1 +
>  target/s390x/cpu_features_def.inc.h |   1 +
>  target/s390x/diag.c |  39 -
>  target/s390x/kvm-stub.c |   5 ++
>  target/s390x/kvm.c  |   5 ++
>  target/s390x/kvm_s390x.h|   1 +
>  14 files changed, 468 insertions(+), 10 deletions(-)
>  create mode 100644 hw/s390x/pv.c
>  create mode 100644 include/hw/s390x/pv.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index dfbd5b0c5de9074c..f4e09213f945a716 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -391,6 +391,8 @@ F: target/s390x/machine.c
>  F: target/s390x/sigp.c
>  F: target/s390x/cpu_features*.[ch]
>  F: target/s390x/cpu_models.[ch]
> +F: hw/s390x/pv.c
> +F: include/hw/s390x/pv.h
>  F: hw/intc/s390_flic.c
>  F: hw/intc/s390_flic_kvm.c
>  F: include/hw/s390x/s390_flic.h
> diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
> index e02ed80b6829a511..a46a1c7894e0f612 100644
> --- a/hw/s390x/Makefile.objs
> +++ b/hw/s390x/Makefile.objs
> @@ -31,6 +31,7 @@ obj-y += tod-qemu.o
>  obj-$(CONFIG_KVM) += tod-kvm.o
>  obj-$(CONFIG_KVM) += s390-skeys-kvm.o
>  obj-$(CONFIG_KVM) += s390-stattrib-kvm.o
> +obj-$(CONFIG_KVM) += pv.o
>  obj-y += s390-ccw.o
>  obj-y += ap-device.o
>  obj-y += ap-bridge.o
> diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
> index b81942e1e6f9002e..6e21cd453b51b4ff 100644
> --- a/hw/s390x/ipl.c
> +++ b/hw/s390x/ipl.c
> @@ -1,10 +1,11 @@
>  /*
>   * bootloader support
>   *
> - * Copyright IBM, Corp. 2012
> + * Copyright IBM, Corp. 2012, 2020
>   *
>   * Authors:
>   *  Christian Borntraeger 
> + *  Janosch Frank 
>   *
>   * This work is licensed under the terms of the GNU GPL, version 2
> or (at your
>   * option) any later version.  See the COPYING file in the top-level 
> directory.
> @@ -27,6 +28,7 @@
>  #include "hw/s390x/vfio-ccw.h"
>  #include "hw/s390x/css.h"
>  #include "hw/s390x/ebcdic.h"
> +#include "hw/s390x/pv.h"
>  #include "ipl.h"
>  #include "qemu/error-report.h"
>  #include "qemu/config-file.h"
> @@ -566,12 +568,31 @@ void s390_ipl_update_diag308(IplParameterBlock *iplb)
>  {
>  S390IPLState *ipl = get_ipl_device();
>  
> -ipl->iplb = *iplb;
> -ipl->iplb_valid = true;
> +/*
> + * The IPLB set and retrieved by subcodes 8/9 is completely
> + * separate from the one managed via subcodes 5/6.
> + */
> +if (iplb->pbt == S390_IPL_TYPE_PV) {
> +ipl->iplb_pv = *iplb;
> +ipl->iplb_valid_pv = true;
> +} else {
> +ipl->iplb = *iplb;
> +ipl->iplb_valid = true;
> +}
>  ipl->netboot = is_virtio_net_device(iplb);
>  update_machine_ipl_properties(iplb);
>  }
>  
> +IplParameterBlock *s390_ipl_get_iplb_pv(void)
> +{
> +S390IPLState *ipl = get_ipl_device();
> +
> +if (!ipl->iplb_valid_pv) {
> +return NULL;
> +}
> +return &ipl->iplb_pv;
> +}
> +
>  IplParameterBlock *s390_ipl_get_iplb(void)
>  {
>  S390IPLState *ipl = get_ipl_device();
> @@ -660,6 +681,38 @@ static void s390_ipl_prepare_qipl(S390CPU *cpu)
>  cpu_physical_memory_unmap(addr, len, 1, len);
>  }
>  
> +int s390_ipl_prepare_pv_header(void)
> +{
> +IplParameterBlock *ipib = s390_ipl_get_iplb_pv();
> +IPLBlockPV *ipib_pv = &ipib->pv;
> +void *hdr = g_malloc(ipib_pv->pv_header_len);
> +int rc;
> +
> +cpu_physical_memory_read(ipib_pv->pv_header_addr, hdr,
> + ipib_pv->pv_header_len);
> +rc = s390_pv_set_sec_parms((uint64_t)hdr,
> +   ipib_pv->pv_header_len);

This causes a compiler issue when building for 32 bit x86 as follows:

/home/abuild/rpmbuild/BUILD/qemu-4.2.0/hw/s390x/ipl.c: In function
's390_ipl_prepare_pv_header':
/home/abuild/rpmbuild/BUILD/qemu-4.2.0/hw/s390x/

Re: [Qemu-devel] [PULL 2/3] hmp: Update info vnc

2020-03-20 Thread Dr. David Alan Gilbert
* Peter Maydell (peter.mayd...@linaro.org) wrote:
> On Mon, 17 Jul 2017 at 10:40, Gerd Hoffmann  wrote:
> >
> > From: "Dr. David Alan Gilbert" 
> >
> > The QMP query-vnc interfaces have gained a lot more information that
> > the HMP interfaces hasn't got yet. Update it.
> >
> > Note the output format has changed, but this is HMP so that's OK.
> 
> Hi; another "ancient change Coverity has only just noticed has
> a problem" email :-)   This is CID 1421932. It looks like any
> "info vnc" will leak memory if there are any VNC servers to
> display info about...
> 
> >  void hmp_info_vnc(Monitor *mon, const QDict *qdict)
> >  {
> > -VncInfo *info;
> > +VncInfo2List *info2l;
> >  Error *err = NULL;
> > -VncClientInfoList *client;
> >
> > -info = qmp_query_vnc(&err);
> > +info2l = qmp_query_vnc_servers(&err);
> 
> Here we get a list of VNC servers, which is allocated memory...
> 
> >  if (err) {
> >  error_report_err(err);
> >  return;
> >  }
> > -
> > -if (!info->enabled) {
> > -monitor_printf(mon, "Server: disabled\n");
> > -goto out;
> > -}
> > -
> > -monitor_printf(mon, "Server:\n");
> > -if (info->has_host && info->has_service) {
> > -monitor_printf(mon, " address: %s:%s\n", info->host, 
> > info->service);
> > -}
> > -if (info->has_auth) {
> > -monitor_printf(mon, "auth: %s\n", info->auth);
> > +if (!info2l) {
> > +monitor_printf(mon, "None\n");
> > +return;
> >  }
> >
> > -if (!info->has_clients || info->clients == NULL) {
> > -monitor_printf(mon, "Client: none\n");
> > -} else {
> > -for (client = info->clients; client; client = client->next) {
> > -monitor_printf(mon, "Client:\n");
> > -monitor_printf(mon, " address: %s:%s\n",
> > -   client->value->host,
> > -   client->value->service);
> > -monitor_printf(mon, "  x509_dname: %s\n",
> > -   client->value->x509_dname ?
> > -   client->value->x509_dname : "none");
> > -monitor_printf(mon, "username: %s\n",
> > -   client->value->has_sasl_username ?
> > -   client->value->sasl_username : "none");
> > +while (info2l) {
> > +VncInfo2 *info = info2l->value;
> > +monitor_printf(mon, "%s:\n", info->id);
> > +hmp_info_vnc_servers(mon, info->server);
> > +hmp_info_vnc_clients(mon, info->clients);
> > +if (!info->server) {
> > +/* The server entry displays its auth, we only
> > + * need to display in the case of 'reverse' connections
> > + * where there's no server.
> > + */
> > +hmp_info_vnc_authcrypt(mon, "  ", info->auth,
> > +   info->has_vencrypt ? &info->vencrypt : 
> > NULL);
> > +}
> > +if (info->has_display) {
> > +monitor_printf(mon, "  Display: %s\n", info->display);
> >  }
> > +info2l = info2l->next;
> 
> ...but the loop iteration here updates 'info2l' as it goes along...
> 
> >  }
> >
> > -out:
> > -qapi_free_VncInfo(info);
> > +qapi_free_VncInfo2List(info2l);
> 
> ...so here we end up passing NULL to qapi_free_VncInfo2List(),
> which will do nothing, leaking the whole list.
> 
> Would somebody like to send a patch?

Oops, yes I can look at that; I guess something along the lines of an
info2l_orig, and freeing that at the end.

Dave

> thanks
> -- PMM
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: Qemu on Windows 10 - no acceleration found

2020-03-20 Thread Jerry Geis
So I tried --enable-whpx and I get Invalid option. I'm on Windows 10 and
QEMU 4.2.0.

I'm confused. Then I don't know where to download HAXM. The place I
found is on GitHub and it wants the user to compile it. I was looking for just an
EXE.

Thanks

Jerry


Re: [PATCH v14 Kernel 7/7] vfio: Selective dirty page tracking if IOMMU backed device pins pages

2020-03-20 Thread Alex Williamson
On Thu, 19 Mar 2020 02:24:33 -0400
Yan Zhao  wrote:
> On Thu, Mar 19, 2020 at 03:41:14AM +0800, Kirti Wankhede wrote:
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 912629320719..deec09f4b0f6 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -72,6 +72,7 @@ struct vfio_iommu {
> > boolv2;
> > boolnesting;
> > booldirty_page_tracking;
> > +   boolpinned_page_dirty_scope;
> >  };
> >  
> >  struct vfio_domain {
> > @@ -99,6 +100,7 @@ struct vfio_group {
> > struct iommu_group  *iommu_group;
> > struct list_headnext;
> > boolmdev_group; /* An mdev group */
> > +   boolpinned_page_dirty_scope;
> >  };
> >  
> >  struct vfio_iova {
> > @@ -132,6 +134,10 @@ struct vfio_regions {
> >  static int put_pfn(unsigned long pfn, int prot);
> >  static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
> >  
> > +static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu,
> > +  struct iommu_group *iommu_group);
> > +
> > +static void update_pinned_page_dirty_scope(struct vfio_iommu *iommu);
> >  /*
> >   * This code handles mapping and unmapping of user data buffers
> >   * into DMA'ble space using the IOMMU
> > @@ -556,11 +562,13 @@ static int vfio_unpin_page_external(struct vfio_dma *dma, dma_addr_t iova,
> >  }
> >  
> >  static int vfio_iommu_type1_pin_pages(void *iommu_data,
> > + struct iommu_group *iommu_group,
> >   unsigned long *user_pfn,
> >   int npage, int prot,
> >   unsigned long *phys_pfn)
> >  {
> > struct vfio_iommu *iommu = iommu_data;
> > +   struct vfio_group *group;
> > int i, j, ret;
> > unsigned long remote_vaddr;
> > struct vfio_dma *dma;
> > @@ -630,8 +638,14 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> >(vpfn->iova - dma->iova) >> pgshift, 1);
> > }
> > }  
> 
> Could you provide an interface more lightweight than vfio_pin_pages for
> pass-through devices? e.g. vfio_mark_iova_dirty()
> 
> Or at least allowing phys_pfn to be empty for pass-through devices.
> 
> This is really inefficient:
> bitmap_set(dma->bitmap, (vpfn->iova - dma->iova) / pgsize, 1));
> i.e.
> in order to mark an iova dirty, it has to go through iova -> pfn -> iova
> while acquiring pfn is not necessary for pass-through devices.

I think this would be possible, but I don't think it should be gating
to this series.  We don't have such consumers yet.  Thanks,

Alex
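The index arithmetic behind a helper like the suggested vfio_mark_iova_dirty() is only a subtraction and a shift; the following self-contained toy sketch illustrates that no iova -> pfn -> iova round trip is needed (the struct and function names here are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: mark a page dirty given only its iova. The bitmap is a
 * single u64 (<= 64 pages) purely for illustration. */
struct toy_dma {
    uint64_t iova;    /* start of the mapping */
    uint64_t bitmap;  /* one bit per page */
};

static void toy_mark_iova_dirty(struct toy_dma *dma, uint64_t iova,
                                unsigned int pgshift)
{
    /* (iova - dma->iova) >> pgshift is the page index within the
     * mapping; no pfn lookup is involved. */
    dma->bitmap |= 1ULL << ((iova - dma->iova) >> pgshift);
}
```

With 4K pages (pgshift 12) and a mapping starting at 0x100000, marking iova 0x103000 sets bit 3.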




Re: discard and v2 qcow2 images

2020-03-20 Thread Alberto Garcia
On Fri 20 Mar 2020 08:35:44 PM CET, Eric Blake  wrote:
>> This flag is however only supported when qcow_version >= 3. In older
>> images the cluster is simply deallocated, exposing any possible
>> previous data from the backing file.
>
> Discard is advisory, and has no requirements that discarded data read
> back as zero.  However, if write zeroes uses discard under the hood,
> then THAT usage must guarantee reading back as zero.

write_zeroes doesn't seem to use discard in any case, so no problem
there.

>> @@ -3763,6 +3763,10 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
>>   int ret;
>>   BDRVQcow2State *s = bs->opaque;
>>   
>> +if (s->qcow_version < 3) {
>> +return -ENOTSUP;
>> +}
>> +
>
> This changes it so you no longer see stale data, but doesn't change
> the fact that you don't read zeroes (just that your stale data is now
> from the current layer instead of the backing layer, since we did
> nothing at all).

discard can already fail if the request is not aligned, in this case you
get -ENOTSUP and stay with the same data as before.

What's different in this case is that you can actually get stale data,
that doesn't seem like a desirable outcome.

Berto



Re: discard and v2 qcow2 images

2020-03-20 Thread Eric Blake

On 3/20/20 1:58 PM, Alberto Garcia wrote:

Hi,

when full_discard is false in discard_in_l2_slice() then the selected
cluster should be deallocated and it should read back as zeroes. This
is done by clearing the cluster offset field and setting OFLAG_ZERO in
the L2 entry.

This flag is however only supported when qcow_version >= 3. In older
images the cluster is simply deallocated, exposing any possible
previous data from the backing file.


Discard is advisory, and has no requirements that discarded data read 
back as zero.  However, if write zeroes uses discard under the hood, 
then THAT usage must guarantee reading back as zero.




This can be trivially reproduced like this:

qemu-img create -f qcow2 backing.img 64k
qemu-io -c 'write -P 0xff 0 64k' backing.img
qemu-img create -f qcow2 -o compat=0.10 -b backing.img top.img
qemu-io -c 'write -P 0x01 0 64k' top.img

After this, top.img is filled with 0x01. Now we issue a discard
command:

qemu-io -c 'discard 0 64k' top.img

top.img should now read as zeroes, but instead you get the data from
the backing file (0xff). If top.img was created with compat=1.1
instead (the default) then it would read as zeroes after the discard.


I'd argue that this is undesirable behavior, but not a bug.



This seems like a bug to me, and I would simply forbid using discard
in this case (see below). The other user of full_discard = false is
qcow2_snapshot_create() but I think that one is safe and should be
allowed?

--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3763,6 +3763,10 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
  int ret;
  BDRVQcow2State *s = bs->opaque;
  
+if (s->qcow_version < 3) {
+return -ENOTSUP;
+}
+


This changes it so you no longer see stale data, but doesn't change the 
fact that you don't read zeroes (just that your stale data is now from 
the current layer instead of the backing layer, since we did nothing at 
all).


I'm not opposed to the patch, per se, but am not convinced that this is 
a problem to worry about.



  if (!QEMU_IS_ALIGNED(offset | bytes, s->cluster_size)) {
  assert(bytes < s->cluster_size);
  /* Ignore partial clusters, except for the special case of the

Berto



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v14 Kernel 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap

2020-03-20 Thread Alex Williamson
On Sat, 21 Mar 2020 00:44:32 +0530
Kirti Wankhede  wrote:

> On 3/20/2020 9:17 PM, Alex Williamson wrote:
> > On Fri, 20 Mar 2020 09:40:39 -0600
> > Alex Williamson  wrote:
> >   
> >> On Fri, 20 Mar 2020 04:35:29 -0400
> >> Yan Zhao  wrote:
> >>  
> >>> On Thu, Mar 19, 2020 at 03:41:12AM +0800, Kirti Wankhede wrote:  
>  DMA mapped pages, including those pinned by mdev vendor drivers, might
>  get unpinned and unmapped while migration is active and device is still
>  running. For example, in pre-copy phase while guest driver could access
>  those pages, host device or vendor driver can dirty these mapped pages.
>  Such pages should be marked dirty so as to maintain memory consistency
>  for a user making use of dirty page tracking.
> 
>  To get bitmap during unmap, user should set flag
>  VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP, bitmap memory should be allocated and
>  zeroed by user space application. Bitmap size and page size should be set
>  by user application.
> 
>  Signed-off-by: Kirti Wankhede 
>  Reviewed-by: Neo Jia 
>  ---
>    drivers/vfio/vfio_iommu_type1.c | 55 ++---
>    include/uapi/linux/vfio.h   | 11 +
>    2 files changed, 62 insertions(+), 4 deletions(-)
> 
>  diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>  index d6417fb02174..aa1ac30f7854 100644
>  --- a/drivers/vfio/vfio_iommu_type1.c
>  +++ b/drivers/vfio/vfio_iommu_type1.c
>  @@ -939,7 +939,8 @@ static int verify_bitmap_size(uint64_t npages, uint64_t bitmap_size)
>    }
>    
>    static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>  - struct vfio_iommu_type1_dma_unmap *unmap)
>  + struct vfio_iommu_type1_dma_unmap *unmap,
>  + struct vfio_bitmap *bitmap)
>    {
>   uint64_t mask;
>   struct vfio_dma *dma, *dma_last = NULL;
>  @@ -990,6 +991,10 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>    * will be returned if these conditions are not met.  The v2 interface
>    * will only return success and a size of zero if there were no
>    * mappings within the range.
>  + *
>  + * When VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP flag is set, unmap request
>  + * must be for single mapping. Multiple mappings with this flag set is
>  + * not supported.
>    */
>   if (iommu->v2) {
>   dma = vfio_find_dma(iommu, unmap->iova, 1);
>  @@ -997,6 +1002,13 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>   ret = -EINVAL;
>   goto unlock;
>   }
>  +
>  +if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
>  +(dma->iova != unmap->iova || dma->size != unmap->size)) {  
> >>> dma is probably NULL here!  
> >>
> >> Yep, I didn't look closely enough there.  This is situated right
> >> between the check to make sure we're not bisecting a mapping at the
> >> start of the unmap and the check to make sure we're not bisecting a
> >> mapping at the end of the unmap.  There's no guarantee that we have a
> >> valid pointer here.  The test should be in the while() loop below this
> >> code.  
> > 
> > Actually the test could remain here, we can exit here if we can't find
> > a dma at the start of the unmap range with the GET_DIRTY_BITMAP flag,
> > but we absolutely cannot deref dma without testing it.
> >   
> 
> In the above newly added check, if dma is NULL then it's an error 
> condition, because Unmap requests must fully cover previous mappings, right?

Yes, but we'll do a null pointer deref before we return error.
 
> >>> And this restriction on UNMAP would make some UNMAP operations of vIOMMU
> >>> fail.
> >>>
> >>> e.g. below condition indeed happens in reality.
> >>> an UNMAP ioctl comes for IOVA range from 0xff80, of size 0x20
> >>> However, IOVAs in this range are mapped page by page, i.e., dma->size is
> >>> 0x1000.
> >>>
> >>> Previous, this UNMAP ioctl could unmap successfully as a whole.  
> >>
> >> What triggers this in the guest?  Note that it's only when using the
> >> GET_DIRTY_BITMAP flag that this is restricted.  Does the event you're
> >> referring to potentially occur under normal circumstances in that mode?
> >> Thanks,
> >>  
> 
> Such unmap would callback vfio_iommu_map_notify() in QEMU. In
> vfio_iommu_map_notify(), unmap is called on same range <iova,
> iotlb->addr_mask + 1> which was used for map. Secondly unmap with bitmap
> will be called only when device state has _SAVING flag set.

It might be helpful for Yan, and everyone else, to see the latest 

Re: [PATCH v3] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-20 Thread Stefan Hajnoczi
On Fri, Mar 20, 2020 at 5:48 PM Andrzej Jakowski
 wrote:
>
> On 3/20/20 8:45 AM, Stefan Hajnoczi wrote:
> > Please use qemu_ram_writeback() so that pmem_persist() and qemu_msync()
> > are used as appropriate.
>
> Thx!
> qemu_ram_writeback() doesn't return any status. How can I know that the actual
> msync succeeds?

If the warn_report() message that is already printed by
qemu_ram_writeback() is insufficient in terms of error reporting, I
suggest propagating the return value from qemu_ram_writeback() and
qemu_ram_block_writeback().

> Also qemu_ram_writeback() requires me to add #include "exec/ram_addr.h".
> After including it when I compile code I'm getting following error:
>
> In file included from hw/block/nvme.c:49:
> /root/sources/pmr/qemu/include/exec/ram_addr.h:23:10: fatal error: cpu.h: No such file or directory
>23 | #include "cpu.h"
>   |  ^~~
> compilation terminated.
> make: *** [/root/sources/pmr/qemu/rules.mak:69: hw/block/nvme.o] Error 1
>
> Why is this happening and what should be changed?

Generally object files are built as part of common-obj-y in
Makefile.objs.  These object files are built only once across all QEMU
targets (e.g. qemu-system-x86_64, qemu-system-arm, ...).

Some code embeds target-specific information and is therefore not
suitable for common-obj-y.  These object files are built as part of
obj-y in Makefile.objs.

You can fix this compilation issue by changing hw/block/Makefile.objs
to like this:

diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
index 4b4a2b338d..12d5d5dac6 100644
--- a/hw/block/Makefile.objs
+++ b/hw/block/Makefile.objs
@@ -7,11 +7,11 @@ common-obj-$(CONFIG_PFLASH_CFI02) += pflash_cfi02.o
 common-obj-$(CONFIG_XEN) += xen-block.o
 common-obj-$(CONFIG_ECC) += ecc.o
 common-obj-$(CONFIG_ONENAND) += onenand.o
-common-obj-$(CONFIG_NVME_PCI) += nvme.o
 common-obj-$(CONFIG_SWIM) += swim.o

 common-obj-$(CONFIG_SH4) += tc58128.o

+obj-$(CONFIG_NVME_PCI) += nvme.o
 obj-$(CONFIG_VIRTIO_BLK) += virtio-blk.o
 obj-$(CONFIG_VHOST_USER_BLK) += vhost-user-blk.o

Stefan



Re: Qemu on Windows 10 - no acceleration found

2020-03-20 Thread Stefan Weil
Am 20.03.20 um 18:20 schrieb Jerry Geis:

> Hi All,
>
> I have tried QEMU on Windows 10 host with and without HyperV active in
> the features list.
> Neither seemed to affect the "really slow" speed. Either option
> results in -enable-kvm giving "no acceleration found".
>
> How do I enable acceleration on QEMU for windows.
>
> Jerry


Please read https://qemu.weilnetz.de/FAQ.

Stefan W.






Re: [PATCH v14 Kernel 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap

2020-03-20 Thread Kirti Wankhede




On 3/20/2020 9:17 PM, Alex Williamson wrote:

On Fri, 20 Mar 2020 09:40:39 -0600
Alex Williamson  wrote:


On Fri, 20 Mar 2020 04:35:29 -0400
Yan Zhao  wrote:


On Thu, Mar 19, 2020 at 03:41:12AM +0800, Kirti Wankhede wrote:

DMA mapped pages, including those pinned by mdev vendor drivers, might
get unpinned and unmapped while migration is active and device is still
running. For example, in pre-copy phase while guest driver could access
those pages, host device or vendor driver can dirty these mapped pages.
Such pages should be marked dirty so as to maintain memory consistency
for a user making use of dirty page tracking.

To get bitmap during unmap, user should set flag
VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP, bitmap memory should be allocated and
zeroed by user space application. Bitmap size and page size should be set
by user application.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  drivers/vfio/vfio_iommu_type1.c | 55 ++---
  include/uapi/linux/vfio.h   | 11 +
  2 files changed, 62 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index d6417fb02174..aa1ac30f7854 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -939,7 +939,8 @@ static int verify_bitmap_size(uint64_t npages, uint64_t bitmap_size)
  }
  
  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
-struct vfio_iommu_type1_dma_unmap *unmap)
+struct vfio_iommu_type1_dma_unmap *unmap,
+struct vfio_bitmap *bitmap)
  {
uint64_t mask;
struct vfio_dma *dma, *dma_last = NULL;
@@ -990,6 +991,10 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 * will be returned if these conditions are not met.  The v2 interface
 * will only return success and a size of zero if there were no
 * mappings within the range.
+*
+* When VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP flag is set, unmap request
+* must be for single mapping. Multiple mappings with this flag set is
+* not supported.
 */
if (iommu->v2) {
dma = vfio_find_dma(iommu, unmap->iova, 1);
@@ -997,6 +1002,13 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
ret = -EINVAL;
goto unlock;
}
+
+   if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
+   (dma->iova != unmap->iova || dma->size != unmap->size)) {

dma is probably NULL here!


Yep, I didn't look closely enough there.  This is situated right
between the check to make sure we're not bisecting a mapping at the
start of the unmap and the check to make sure we're not bisecting a
mapping at the end of the unmap.  There's no guarantee that we have a
valid pointer here.  The test should be in the while() loop below this
code.


Actually the test could remain here, we can exit here if we can't find
a dma at the start of the unmap range with the GET_DIRTY_BITMAP flag,
but we absolutely cannot deref dma without testing it.



In the above newly added check, if dma is NULL then it's an error 
condition, because Unmap requests must fully cover previous mappings, right?



And this restriction on UNMAP would make some UNMAP operations of vIOMMU
fail.

e.g. below condition indeed happens in reality.
an UNMAP ioctl comes for IOVA range from 0xff80, of size 0x20
However, IOVAs in this range are mapped page by page, i.e., dma->size is 0x1000.

Previous, this UNMAP ioctl could unmap successfully as a whole.


What triggers this in the guest?  Note that it's only when using the
GET_DIRTY_BITMAP flag that this is restricted.  Does the event you're
referring to potentially occur under normal circumstances in that mode?
Thanks,



Such unmap would callback vfio_iommu_map_notify() in QEMU. In
vfio_iommu_map_notify(), unmap is called on same range <iova,
iotlb->addr_mask + 1> which was used for map. Secondly unmap with bitmap
will be called only when device state has _SAVING flag set.


Thanks,
Kirti
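The commit message above leaves bitmap sizing to user space ("Bitmap size and page size should be set by user application"). A sketch of how such an application might size the buffer — one bit per page, rounded up to u64 granularity, mirroring the rounding that the series' kernel-side verify_bitmap_size() checks; the helper name itself is illustrative:

```c
#include <stdint.h>

/* Bytes needed for a dirty bitmap covering `size` bytes of IOVA space
 * at page size `pgsize`: one bit per page, rounded up to whole u64
 * words (the granularity copy_to_user() hands back). */
static uint64_t toy_dirty_bitmap_bytes(uint64_t size, uint64_t pgsize)
{
    uint64_t npages = size / pgsize;

    return ((npages + 63) / 64) * 8;
}
```

For a 2 MB mapping with 4K pages that is 512 pages, i.e. a 64-byte bitmap.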



[PATCH-for-5.0 3/4] tests/docker: Use Python3 PyYAML in the Fedora image

2020-03-20 Thread Philippe Mathieu-Daudé
The Python2 PyYAML is now pointless, switch to the Python3 version.

Fixes: bcbf27947 (docker: move tests from python2 to python3)
Signed-off-by: Philippe Mathieu-Daudé 
---
 tests/docker/dockerfiles/fedora.docker | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/docker/dockerfiles/fedora.docker b/tests/docker/dockerfiles/fedora.docker
index 019eb12dcb..174979c7af 100644
--- a/tests/docker/dockerfiles/fedora.docker
+++ b/tests/docker/dockerfiles/fedora.docker
@@ -79,8 +79,8 @@ ENV PACKAGES \
 perl-Test-Harness \
 pixman-devel \
 python3 \
+python3-PyYAML \
 python3-sphinx \
-PyYAML \
 rdma-core-devel \
 SDL2-devel \
 snappy-devel \
-- 
2.21.1




[PATCH 4/4] tests/docker: Add libepoxy and libudev packages to the Fedora image

2020-03-20 Thread Philippe Mathieu-Daudé
Install optional dependencies of QEMU to get better coverage.

Suggested-by: Peter Maydell 
Signed-off-by: Philippe Mathieu-Daudé 
---
 tests/docker/dockerfiles/fedora.docker | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/docker/dockerfiles/fedora.docker b/tests/docker/dockerfiles/fedora.docker
index 174979c7af..4bd2c953af 100644
--- a/tests/docker/dockerfiles/fedora.docker
+++ b/tests/docker/dockerfiles/fedora.docker
@@ -29,6 +29,7 @@ ENV PACKAGES \
 libblockdev-mpath-devel \
 libcap-ng-devel \
 libcurl-devel \
+libepoxy-devel \
 libfdt-devel \
 libiscsi-devel \
 libjpeg-devel \
@@ -38,6 +39,7 @@ ENV PACKAGES \
 libseccomp-devel \
 libssh-devel \
 libubsan \
+libudev-devel \
 libusbx-devel \
 libxml2-devel \
 libzstd-devel \
-- 
2.21.1




[PATCH-for-5.0 0/4] tests/docker: Fixes for 5.0

2020-03-20 Thread Philippe Mathieu-Daudé
Easy fixes for our Docker images.

Philippe Mathieu-Daudé (4):
  tests/docker: Keep package list sorted
  tests/docker: Install gcrypt devel package in Debian image
  tests/docker: Use Python3 PyYAML in the Fedora image
  tests/docker: Add libepoxy and libudev packages to the Fedora image

 tests/docker/dockerfiles/centos7.docker  |  6 --
 tests/docker/dockerfiles/debian-amd64.docker |  1 +
 tests/docker/dockerfiles/fedora.docker   | 10 +++---
 3 files changed, 12 insertions(+), 5 deletions(-)

-- 
2.21.1




[PATCH-for-5.0 2/4] tests/docker: Install gcrypt devel package in Debian image

2020-03-20 Thread Philippe Mathieu-Daudé
Apparently Debian Stretch was listing gcrypt as a QEMU dependency,
but this is not the case anymore in Buster, so we need to install
it manually (it is not listed by 'apt-get -s build-dep qemu' in
the common debian10.docker anymore).

 $ ../configure $QEMU_CONFIGURE_OPTS

  ERROR: User requested feature gcrypt
 configure was not able to find it.
 Install gcrypt devel >= 1.5.0

Fixes: 698a71edbed & 6f8bbb374be
Signed-off-by: Philippe Mathieu-Daudé 
---
 tests/docker/dockerfiles/debian-amd64.docker | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/docker/dockerfiles/debian-amd64.docker b/tests/docker/dockerfiles/debian-amd64.docker
index d4849f509f..957f0bc2e7 100644
--- a/tests/docker/dockerfiles/debian-amd64.docker
+++ b/tests/docker/dockerfiles/debian-amd64.docker
@@ -16,6 +16,7 @@ RUN apt update && \
 apt install -y --no-install-recommends \
 libbz2-dev \
 liblzo2-dev \
+libgcrypt20-dev \
 librdmacm-dev \
 libsasl2-dev \
 libsnappy-dev \
-- 
2.21.1




[PATCH 1/4] tests/docker: Keep package list sorted

2020-03-20 Thread Philippe Mathieu-Daudé
Keep package list sorted, this eases rebase/cherry-pick.

Fixes: 3a6784813
Signed-off-by: Philippe Mathieu-Daudé 
---
 tests/docker/dockerfiles/centos7.docker | 6 --
 tests/docker/dockerfiles/fedora.docker  | 6 --
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/tests/docker/dockerfiles/centos7.docker b/tests/docker/dockerfiles/centos7.docker
index cdd72de7eb..9a2a2e515d 100644
--- a/tests/docker/dockerfiles/centos7.docker
+++ b/tests/docker/dockerfiles/centos7.docker
@@ -2,6 +2,8 @@ FROM centos:7
 RUN yum install -y epel-release centos-release-xen-48
 
 RUN yum -y update
+
+# Please keep this list sorted alphabetically
 ENV PACKAGES \
 bison \
 bzip2 \
@@ -19,6 +21,7 @@ ENV PACKAGES \
 libepoxy-devel \
 libfdt-devel \
 librdmacm-devel \
+libzstd-devel \
 lzo-devel \
 make \
 mesa-libEGL-devel \
@@ -33,7 +36,6 @@ ENV PACKAGES \
 tar \
 vte-devel \
 xen-devel \
-zlib-devel \
-libzstd-devel
+zlib-devel
 RUN yum install -y $PACKAGES
 RUN rpm -q $PACKAGES | sort > /packages.txt
diff --git a/tests/docker/dockerfiles/fedora.docker b/tests/docker/dockerfiles/fedora.docker
index a658c0..019eb12dcb 100644
--- a/tests/docker/dockerfiles/fedora.docker
+++ b/tests/docker/dockerfiles/fedora.docker
@@ -1,4 +1,6 @@
 FROM fedora:30
+
+# Please keep this list sorted alphabetically
 ENV PACKAGES \
 bc \
 bison \
@@ -38,6 +40,7 @@ ENV PACKAGES \
 libubsan \
 libusbx-devel \
 libxml2-devel \
+libzstd-devel \
 llvm \
 lzo-devel \
 make \
@@ -92,8 +95,7 @@ ENV PACKAGES \
 vte291-devel \
 which \
 xen-devel \
-zlib-devel \
-libzstd-devel
+zlib-devel
 ENV QEMU_CONFIGURE_OPTS --python=/usr/bin/python3
 
 RUN dnf install -y $PACKAGES
-- 
2.21.1




Re: [PATCH v15 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-20 Thread Alex Williamson
On Sat, 21 Mar 2020 00:12:04 +0530
Kirti Wankhede  wrote:

> On 3/20/2020 11:31 PM, Alex Williamson wrote:
> > On Fri, 20 Mar 2020 23:19:14 +0530
> > Kirti Wankhede  wrote:
> >   
> >> On 3/20/2020 4:27 AM, Alex Williamson wrote:  
> >>> On Fri, 20 Mar 2020 01:46:41 +0530
> >>> Kirti Wankhede  wrote:
> >>>  
> 
> 
> 
>  +static int vfio_iova_dirty_bitmap(struct vfio_iommu *iommu, dma_addr_t iova,
>  +  size_t size, uint64_t pgsize,
>  +  u64 __user *bitmap)
>  +{
>  +struct vfio_dma *dma;
>  +unsigned long pgshift = __ffs(pgsize);
>  +unsigned int npages, bitmap_size;
>  +
>  +dma = vfio_find_dma(iommu, iova, 1);
>  +
>  +if (!dma)
>  +return -EINVAL;
>  +
>  +if (dma->iova != iova || dma->size != size)
>  +return -EINVAL;
>  +
>  +npages = dma->size >> pgshift;
>  +bitmap_size = DIRTY_BITMAP_BYTES(npages);
>  +
>  +/* mark all pages dirty if all pages are pinned and mapped. */
>  +if (dma->iommu_mapped)
>  +bitmap_set(dma->bitmap, 0, npages);
>  +
>  +if (copy_to_user((void __user *)bitmap, dma->bitmap, bitmap_size))
>  +return -EFAULT;  
> >>>
> >>> We still need to reset the bitmap here, clearing and re-adding the
> >>> pages that are still pinned.
> >>>
> >>> https://lore.kernel.org/kvm/20200319070635.2ff5d...@x1.home/
> >>>  
> >>
> >> I thought you agreed on my reply to it
> >> https://lore.kernel.org/kvm/31621b70-02a9-2ea5-045f-f72b671fe...@nvidia.com/
> >>  
> >>   > Why re-populate when there will be no change since
> >>   > vfio_iova_dirty_bitmap() is called holding iommu->lock? If there is any
> >>   > pin request while vfio_iova_dirty_bitmap() is still working, it will
> >>   > wait till iommu->lock is released. Bitmap will be populated when page
> >>   > is pinned.  
> > 
> > As coded, dirty bits are only ever set in the bitmap, never cleared.
> > If a page is unpinned between iterations of the user recording the
> > dirty bitmap, it should be marked dirty in the iteration immediately
> > after the unpinning and not marked dirty in the following iteration.
> > That doesn't happen here.  We're reporting cumulative dirty pages since
> > logging was enabled, we need to be reporting dirty pages since the user
> > last retrieved the dirty bitmap.  The bitmap should be cleared and
> > currently pinned pages re-added after copying to the user.  Thanks,
> >   
> 
> Does that mean, we have to track every iteration? do we really need that 
> tracking?
> 
> Generally the flow is:
> - vendor driver pin x pages
> - Enter pre-copy-phase where vCPUs are running - user starts dirty pages 
> tracking, then user asks dirty bitmap, x pages reported dirty by 
> VFIO_IOMMU_DIRTY_PAGES ioctl with _GET flag
> - In pre-copy phase, vendor driver pins y more pages, now bitmap 
> consists of x+y bits set
> - In pre-copy phase, vendor driver unpins z pages, but bitmap is not 
> updated, so again bitmap consists of x+y bits set.
> - Enter in stop-and-copy phase, vCPUs are stopped, mdev devices are stopped
> - user asks dirty bitmap - Since here vCPU and mdev devices are stopped, 
> pages should not get dirty by guest driver or the physical device. 
> Hence, x+y dirty pages would be reported.
> 
> I don't think we need to track every iteration of bitmap reporting.

Yes, once a bitmap is read, it's reset.  In your example, after
unpinning z pages the user should still see a bitmap with x+y pages,
but once they've read that bitmap, the next bitmap should be x+y-z.
Userspace can make decisions about when to switch from pre-copy to
stop-and-copy based on convergence, ie. the slope of the line recording
dirty pages per iteration.  The implementation here never allows an
inflection point; dirty pages reported through vfio would always either
be flat or climbing.  There might also be a case where an iommu-backed
device starts pinning pages during the course of a migration; how
would the bitmap ever revert from fully populated to only tracking the
pinned pages?  Thanks,

Alex
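The read-and-reset semantics described above — each read reports pages dirtied since the previous read, with still-pinned pages re-marked after the copy-out — can be modeled in a few lines. A toy sketch (byte-per-page for clarity; all names are illustrative, not the kernel's):

```c
#include <string.h>

#define TOY_NPAGES 8

static unsigned char toy_dirty[TOY_NPAGES];   /* one byte per page */
static unsigned char toy_pinned[TOY_NPAGES];

static void toy_pin(int pg)
{
    toy_pinned[pg] = 1;
    toy_dirty[pg] = 1;            /* pinning dirties the page */
}

static void toy_unpin(int pg)
{
    toy_pinned[pg] = 0;           /* stays dirty until the next read */
}

/* Copy the bitmap out, clear it, then re-add currently pinned pages,
 * so the following read reflects only still-pinned (or newly dirtied)
 * pages. */
static void toy_read_bitmap(unsigned char *out)
{
    int i;

    memcpy(out, toy_dirty, TOY_NPAGES);
    memset(toy_dirty, 0, TOY_NPAGES);
    for (i = 0; i < TOY_NPAGES; i++)
        if (toy_pinned[i])
            toy_dirty[i] = 1;
}

static int toy_count(const unsigned char *bm)
{
    int i, n = 0;

    for (i = 0; i < TOY_NPAGES; i++)
        n += bm[i];
    return n;
}
```

Walking Kirti's example through this model: after pinning x pages and reading, pinning y more and unpinning z, the next read reports x+y, and the read after that x+y-z — the inflection point Alex refers to.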




discard and v2 qcow2 images

2020-03-20 Thread Alberto Garcia
Hi,

when full_discard is false in discard_in_l2_slice() then the selected
cluster should be deallocated and it should read back as zeroes. This
is done by clearing the cluster offset field and setting OFLAG_ZERO in
the L2 entry.

This flag is however only supported when qcow_version >= 3. In older
images the cluster is simply deallocated, exposing any possible
previous data from the backing file.

This can be trivially reproduced like this:

   qemu-img create -f qcow2 backing.img 64k
   qemu-io -c 'write -P 0xff 0 64k' backing.img
   qemu-img create -f qcow2 -o compat=0.10 -b backing.img top.img
   qemu-io -c 'write -P 0x01 0 64k' top.img

After this, top.img is filled with 0x01. Now we issue a discard
command:

   qemu-io -c 'discard 0 64k' top.img

top.img should now read as zeroes, but instead you get the data from
the backing file (0xff). If top.img was created with compat=1.1
instead (the default) then it would read as zeroes after the discard.

This seems like a bug to me, and I would simply forbid using discard
in this case (see below). The other user of full_discard = false is
qcow2_snapshot_create() but I think that one is safe and should be
allowed?

--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3763,6 +3763,10 @@ static coroutine_fn int 
qcow2_co_pdiscard(BlockDriverState *bs,
 int ret;
 BDRVQcow2State *s = bs->opaque;
 
+if (s->qcow_version < 3) {
+return -ENOTSUP;
+}
+
 if (!QEMU_IS_ALIGNED(offset | bytes, s->cluster_size)) {
 assert(bytes < s->cluster_size);
 /* Ignore partial clusters, except for the special case of the

Berto



Re: [PATCH v6 19/61] target/riscv: vector integer divide instructions

2020-03-20 Thread Alistair Francis
On Tue, Mar 17, 2020 at 8:45 AM LIU Zhiwei  wrote:
>
> Signed-off-by: LIU Zhiwei 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/helper.h   | 33 +++
>  target/riscv/insn32.decode  |  8 +++
>  target/riscv/insn_trans/trans_rvv.inc.c | 10 
>  target/riscv/vector_helper.c| 74 +
>  4 files changed, 125 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index f42a12eef3..357f149198 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -558,3 +558,36 @@ DEF_HELPER_6(vmulhsu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vmulhsu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vmulhsu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vmulhsu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +
> +DEF_HELPER_6(vdivu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vdivu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vdivu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vdivu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vdiv_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vdiv_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vdiv_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vdiv_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vremu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vremu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vremu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vremu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vrem_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vrem_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vrem_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vrem_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vdivu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vdivu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vdivu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vdivu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vdiv_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vdiv_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vdiv_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vdiv_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vremu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vremu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vremu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vremu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vrem_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vrem_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vrem_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vrem_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index abfed469bc..7fb8f8fad8 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -371,6 +371,14 @@ vmulhu_vv   100100 . . . 010 . 1010111 @r_vm
>  vmulhu_vx   100100 . . . 110 . 1010111 @r_vm
>  vmulhsu_vv  100110 . . . 010 . 1010111 @r_vm
>  vmulhsu_vx  100110 . . . 110 . 1010111 @r_vm
> +vdivu_vv10 . . . 010 . 1010111 @r_vm
> +vdivu_vx10 . . . 110 . 1010111 @r_vm
> +vdiv_vv 11 . . . 010 . 1010111 @r_vm
> +vdiv_vx 11 . . . 110 . 1010111 @r_vm
> +vremu_vv100010 . . . 010 . 1010111 @r_vm
> +vremu_vx100010 . . . 110 . 1010111 @r_vm
> +vrem_vv 100011 . . . 010 . 1010111 @r_vm
> +vrem_vx 100011 . . . 110 . 1010111 @r_vm
>
>  vsetvli 0 ... . 111 . 1010111  @r2_zimm
>  vsetvl  100 . . 111 . 1010111  @r
> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
> index c276beabd6..ed53eaaef5 100644
> --- a/target/riscv/insn_trans/trans_rvv.inc.c
> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
> @@ -1462,3 +1462,13 @@ GEN_OPIVX_GVEC_TRANS(vmul_vx,  muls)
>  GEN_OPIVX_TRANS(vmulh_vx, opivx_check)
>  GEN_OPIVX_TRANS(vmulhu_vx, opivx_check)
>  GEN_OPIVX_TRANS(vmulhsu_vx, opivx_check)
> +
> +/* Vector Integer Divide Instructions */
> +GEN_OPIVV_TRANS(vdivu_vv, opivv_check)
> +GEN_OPIVV_TRANS(vdiv_vv, opivv_check)
> +GEN_OPIVV_TRANS(vremu_vv, opivv_check)
> +GEN_OPIVV_TRANS(vrem_vv, opivv_check)
> +GEN_OPIVX_TRANS(vdivu_vx, opivx_check)
> +GEN_OPIVX_TRANS(vdiv_vx, opivx_check)
> +GEN_OPIVX_TRANS(vremu_vx, opivx_check)
> +GEN_OPIVX_TRANS(vrem_vx, opivx_check)
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 56ba9a7422..4fc7a08954 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
>

Re: [PATCH v6 17/61] target/riscv: vector integer min/max instructions

2020-03-20 Thread Alistair Francis
On Tue, Mar 17, 2020 at 8:41 AM LIU Zhiwei  wrote:
>
> Signed-off-by: LIU Zhiwei 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/helper.h   | 33 
>  target/riscv/insn32.decode  |  8 +++
>  target/riscv/insn_trans/trans_rvv.inc.c | 10 
>  target/riscv/vector_helper.c| 71 +
>  4 files changed, 122 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 4e6c47c2d2..c7d4ff185a 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -492,3 +492,36 @@ DEF_HELPER_6(vmsgt_vx_b, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vmsgt_vx_h, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vmsgt_vx_w, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vmsgt_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +
> +DEF_HELPER_6(vminu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vminu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vminu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vminu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmin_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmin_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmin_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmin_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmaxu_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmaxu_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmaxu_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmaxu_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmax_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmax_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmax_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vmax_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vminu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vminu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vminu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vminu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmin_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmin_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmin_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmin_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmaxu_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmaxu_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmaxu_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmaxu_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmax_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmax_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmax_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vmax_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index df6181980d..aafbdc6be7 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -355,6 +355,14 @@ vmsgtu_vx   00 . . . 100 . 1010111 
> @r_vm
>  vmsgtu_vi   00 . . . 011 . 1010111 @r_vm
>  vmsgt_vx01 . . . 100 . 1010111 @r_vm
>  vmsgt_vi01 . . . 011 . 1010111 @r_vm
> +vminu_vv000100 . . . 000 . 1010111 @r_vm
> +vminu_vx000100 . . . 100 . 1010111 @r_vm
> +vmin_vv 000101 . . . 000 . 1010111 @r_vm
> +vmin_vx 000101 . . . 100 . 1010111 @r_vm
> +vmaxu_vv000110 . . . 000 . 1010111 @r_vm
> +vmaxu_vx000110 . . . 100 . 1010111 @r_vm
> +vmax_vv 000111 . . . 000 . 1010111 @r_vm
> +vmax_vx 000111 . . . 100 . 1010111 @r_vm
>
>  vsetvli 0 ... . 111 . 1010111  @r2_zimm
>  vsetvl  100 . . 111 . 1010111  @r
> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c 
> b/target/riscv/insn_trans/trans_rvv.inc.c
> index 53c00d914f..53c49ee15c 100644
> --- a/target/riscv/insn_trans/trans_rvv.inc.c
> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
> @@ -1442,3 +1442,13 @@ GEN_OPIVI_TRANS(vmsleu_vi, 1, vmsleu_vx, 
> opivx_cmp_check)
>  GEN_OPIVI_TRANS(vmsle_vi, 0, vmsle_vx, opivx_cmp_check)
>  GEN_OPIVI_TRANS(vmsgtu_vi, 1, vmsgtu_vx, opivx_cmp_check)
>  GEN_OPIVI_TRANS(vmsgt_vi, 0, vmsgt_vx, opivx_cmp_check)
> +
> +/* Vector Integer Min/Max Instructions */
> +GEN_OPIVV_GVEC_TRANS(vminu_vv, umin)
> +GEN_OPIVV_GVEC_TRANS(vmin_vv,  smin)
> +GEN_OPIVV_GVEC_TRANS(vmaxu_vv, umax)
> +GEN_OPIVV_GVEC_TRANS(vmax_vv,  smax)
> +GEN_OPIVX_TRANS(vminu_vx, opivx_check)
> +GEN_OPIVX_TRANS(vmin_vx,  opivx_check)
> +GEN_OPIVX_TRANS(vmaxu_vx, opivx_check)
> +GEN_OPIVX_TRANS(vmax_vx,  opivx_check)
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index d1fc543e98..32c2760a8a 100644
> --- a/tar

Re: [PATCH v6 15/61] target/riscv: vector narrowing integer right shift instructions

2020-03-20 Thread Alistair Francis
On Tue, Mar 17, 2020 at 8:37 AM LIU Zhiwei  wrote:
>
> Signed-off-by: LIU Zhiwei 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/helper.h   | 13 
>  target/riscv/insn32.decode  |  6 ++
>  target/riscv/insn_trans/trans_rvv.inc.c | 85 +
>  target/riscv/vector_helper.c| 14 
>  4 files changed, 118 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 47284c7476..0f36a8ce43 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -422,3 +422,16 @@ DEF_HELPER_6(vsra_vx_b, void, ptr, ptr, tl, ptr, env, 
> i32)
>  DEF_HELPER_6(vsra_vx_h, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vsra_vx_w, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vsra_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +
> +DEF_HELPER_6(vnsrl_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vnsrl_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vnsrl_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vnsra_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vnsra_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vnsra_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vnsrl_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vnsrl_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vnsrl_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vnsra_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vnsra_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vnsra_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index f6d0f5aec5..89fd2aa4e2 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -329,6 +329,12 @@ vsrl_vi 101000 . . . 011 . 1010111 
> @r_vm
>  vsra_vv 101001 . . . 000 . 1010111 @r_vm
>  vsra_vx 101001 . . . 100 . 1010111 @r_vm
>  vsra_vi 101001 . . . 011 . 1010111 @r_vm
> +vnsrl_vv101100 . . . 000 . 1010111 @r_vm
> +vnsrl_vx101100 . . . 100 . 1010111 @r_vm
> +vnsrl_vi101100 . . . 011 . 1010111 @r_vm
> +vnsra_vv101101 . . . 000 . 1010111 @r_vm
> +vnsra_vx101101 . . . 100 . 1010111 @r_vm
> +vnsra_vi101101 . . . 011 . 1010111 @r_vm
>
>  vsetvli 0 ... . 111 . 1010111  @r2_zimm
>  vsetvl  100 . . 111 . 1010111  @r
> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c 
> b/target/riscv/insn_trans/trans_rvv.inc.c
> index 6ed2466e75..a537b507a0 100644
> --- a/target/riscv/insn_trans/trans_rvv.inc.c
> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
> @@ -1312,3 +1312,88 @@ GEN_OPIVX_GVEC_SHIFT_TRANS(vsra_vx,  sars)
>  GEN_OPIVI_GVEC_TRANS(vsll_vi, 1, vsll_vx,  shli)
>  GEN_OPIVI_GVEC_TRANS(vsrl_vi, 1, vsrl_vx,  shri)
>  GEN_OPIVI_GVEC_TRANS(vsra_vi, 1, vsra_vx,  sari)
> +
> +/* Vector Narrowing Integer Right Shift Instructions */
> +static bool opivv_narrow_check(DisasContext *s, arg_rmrr *a)
> +{
> +return (vext_check_isa_ill(s) &&
> +vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> +vext_check_reg(s, a->rd, false) &&
> +vext_check_reg(s, a->rs2, true) &&
> +vext_check_reg(s, a->rs1, false) &&
> +vext_check_overlap_group(a->rd, 1 << s->lmul, a->rs2,
> +2 << s->lmul) &&
> +(s->lmul < 0x3) && (s->sew < 0x3));
> +}
> +
> +/* OPIVV with NARROW */
> +#define GEN_OPIVV_NARROW_TRANS(NAME)   \
> +static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \
> +{  \
> +if (opivv_narrow_check(s, a)) {\
> +uint32_t data = 0; \
> +static gen_helper_gvec_4_ptr * const fns[3] = {\
> +gen_helper_##NAME##_b, \
> +gen_helper_##NAME##_h, \
> +gen_helper_##NAME##_w, \
> +}; \
> +data = FIELD_DP32(data, VDATA, MLEN, s->mlen); \
> +data = FIELD_DP32(data, VDATA, VM, a->vm); \
> +data = FIELD_DP32(data, VDATA, LMUL, s->lmul); \
> +tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \
> +vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),  \
> +cpu_env, 0, s->vlen / 8, data, fns[s->sew]);   \
> +return true;   \
> +}  \
> +return false;

Re: [PATCH v5 07/18] s390x: protvirt: Inhibit balloon when switching to protected mode

2020-03-20 Thread Halil Pasic
On Thu, 19 Mar 2020 18:31:11 +0100
David Hildenbrand  wrote:

> [...]
> 
> >>
> >> I asked this question already to Michael (cc) via a different
> >> channel, but hare is it again:
> >>
> >> Why does the balloon driver not support VIRTIO_F_IOMMU_PLATFORM? It
> >> is absolutely not clear to me. The introducing commit mentioned
> >> that it "bypasses DMA". I fail to see that.
> >>
> >> At least the communication via the SG mechanism should work
> >> perfectly fine with an IOMMU enabled. So I assume it boils down to
> >> the pages that we inflate/deflate not being referenced via IOVA?
> > 
> > AFAIU the IOVA/GPA stuff is not the problem here. You have said it
> > yourself, the SG mechanism would work for balloon out of the box, as
> > it does for the other virtio devices. 
> > 
> > But VIRTIO_F_ACCESS_PLATFORM (aka VIRTIO_F_IOMMU_PLATFORM) not being
> > presented means, according to Michael, that the device has full access
> > to the entire guest RAM. If VIRTIO_F_ACCESS_PLATFORM is negotiated
> > this may or may not be the case.
> 
> So you say
> 
> "The virtio specification tells that the device is to present
> VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
> device "can only access certain memory addresses with said access
> specified and/or granted by the platform"."
> 
> So, AFAIU, *any* virtio device (hypervisor side) has to present this
> flag when PV is enabled. 

Yes, and making the balloon say bye-bye when running in PV mode is only a
secondary objective. I've compiled some references:

"To summarize, the necessary conditions for a hack along these lines
(using DMA API without VIRTIO_F_ACCESS_PLATFORM) are that we detect that:

  - secure guest mode is enabled - so we know that since we don't share
most memory regular virtio code won't
work, even though the buggy hypervisor didn't set VIRTIO_F_ACCESS_PLATFORM" 
(Michael Tsirkin, https://lkml.org/lkml/2020/2/20/1021)
I.e.: PV but !VIRTIO_F_ACCESS_PLATFORM implies a buggy hypervisor


"If VIRTIO_F_ACCESS_PLATFORM is set then things just work.  If
VIRTIO_F_ACCESS_PLATFORM is clear device is supposed to have access to
all of memory.  You can argue in various ways but it's easier to just
declare a behaviour that violates this a bug."
(Michael Tsirkin, https://lkml.org/lkml/2020/2/21/1626)
This one is about all guest memory, and not just the buffers transferred
via the virtqueue, which surprised me a bit at the beginning. But balloon
actually needs this.

"A device SHOULD offer VIRTIO_F_ACCESS_PLATFORM if its access to memory
is through bus addresses distinct from and translated by the platform to
physical addresses used by the driver, and/or if it can only access
certain memory addresses with said access specified and/or granted by
the platform. A device MAY fail to operate further if
VIRTIO_F_ACCESS_PLATFORM is not accepted. "
(https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-4120002)


> In that regard, your patch makes perfect sense
> (although I am not sure it's a good idea to overwrite these feature
> bits
> - maybe they should be activated on the cmdline permanently instead
> when PV is to be used? (or enable )).

I didn't understand the last part. I believe preserving the user-specified
value when not running in PV mode is better than the hard overwrite I did
here. I intended this as a discussion starter.

I think the other option (with respect to letting QEMU manage this for the
user, i.e. what I try to do here) is to fence the conversion if virtio
devices that do not offer VIRTIO_F_ACCESS_PLATFORM are attached, and to
disallow hotplug of such devices at some point during the conversion.

I believe that alternative is even uglier.

IMHO we don't want the end user to fiddle with iommu_platform, because
all the 'benefit' he gets from that is the possibility of making a mistake.
For example, I got an internal bug report saying virtio is broken with
PV, which boiled down to an overlooked auto-generated NIC, which of
course had iommu_platform (VIRTIO_F_ACCESS_PLATFORM) not set.

> 
> > 
> > The actual problem is that the pages denoted by the buffer
> > transmitted via the virtqueue are normally not shared pages. I.e.
> > the hypervisor can not reuse them (what is the point of balloon
> > inflate). To make this work, the guest would need to share the pages
> > before saying 'host these are in my balloon, so you can use them'.
> > This is a piece of logic we
> 
> What exactly would have to be done in the hypervisor to support it?

AFAIK nothing. The guest needs to share the pages, and everything works.
Janosch, can you help me with this one? 

> 
> Assume we have to trigger sharing/unsharing - this sounds like a very
> architecture specific thing?

It is, but any guest having sovereignty over its memory may need
something similar.

> Or is this e.g., doing a map/unmap
> operation like mapping/unmapping the SG?

No, this is something different. We need stronger guarantees than the
streaming portion of the DMA API provides. And what we actually wa

Re: [PATCH v6 13/61] target/riscv: vector bitwise logical instructions

2020-03-20 Thread Alistair Francis
On Tue, Mar 17, 2020 at 8:33 AM LIU Zhiwei  wrote:
>
> Signed-off-by: LIU Zhiwei 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/helper.h   | 25 
>  target/riscv/insn32.decode  |  9 +
>  target/riscv/insn_trans/trans_rvv.inc.c | 11 ++
>  target/riscv/vector_helper.c| 51 +
>  4 files changed, 96 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 72c733bf49..4373e9e8c2 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -372,3 +372,28 @@ DEF_HELPER_6(vmsbc_vxm_b, void, ptr, ptr, tl, ptr, env, 
> i32)
>  DEF_HELPER_6(vmsbc_vxm_h, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vmsbc_vxm_w, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vmsbc_vxm_d, void, ptr, ptr, tl, ptr, env, i32)
> +
> +DEF_HELPER_6(vand_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vand_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vand_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vand_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vor_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vor_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vor_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vor_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vxor_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vxor_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vxor_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vxor_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vand_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vand_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vand_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vand_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vor_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vor_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vor_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vor_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vxor_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vxor_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vxor_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vxor_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 022c8ea18b..3ad6724632 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -311,6 +311,15 @@ vsbc_vvm010010 1 . . 000 . 1010111 
> @r_vm_1
>  vsbc_vxm010010 1 . . 100 . 1010111 @r_vm_1
>  vmsbc_vvm   010011 1 . . 000 . 1010111 @r_vm_1
>  vmsbc_vxm   010011 1 . . 100 . 1010111 @r_vm_1
> +vand_vv 001001 . . . 000 . 1010111 @r_vm
> +vand_vx 001001 . . . 100 . 1010111 @r_vm
> +vand_vi 001001 . . . 011 . 1010111 @r_vm
> +vor_vv  001010 . . . 000 . 1010111 @r_vm
> +vor_vx  001010 . . . 100 . 1010111 @r_vm
> +vor_vi  001010 . . . 011 . 1010111 @r_vm
> +vxor_vv 001011 . . . 000 . 1010111 @r_vm
> +vxor_vx 001011 . . . 100 . 1010111 @r_vm
> +vxor_vi 001011 . . . 011 . 1010111 @r_vm
>
>  vsetvli 0 ... . 111 . 1010111  @r2_zimm
>  vsetvl  100 . . 111 . 1010111  @r
> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c 
> b/target/riscv/insn_trans/trans_rvv.inc.c
> index 4562d5f14f..b4ba6d83f3 100644
> --- a/target/riscv/insn_trans/trans_rvv.inc.c
> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
> @@ -1247,3 +1247,14 @@ GEN_OPIVX_TRANS(vmsbc_vxm, opivx_vmadc_check)
>
>  GEN_OPIVI_TRANS(vadc_vim, 0, vadc_vxm, opivx_vadc_check)
>  GEN_OPIVI_TRANS(vmadc_vim, 0, vmadc_vxm, opivx_vmadc_check)
> +
> +/* Vector Bitwise Logical Instructions */
> +GEN_OPIVV_GVEC_TRANS(vand_vv, and)
> +GEN_OPIVV_GVEC_TRANS(vor_vv,  or)
> +GEN_OPIVV_GVEC_TRANS(vxor_vv, xor)
> +GEN_OPIVX_GVEC_TRANS(vand_vx, ands)
> +GEN_OPIVX_GVEC_TRANS(vor_vx,  ors)
> +GEN_OPIVX_GVEC_TRANS(vxor_vx, xors)
> +GEN_OPIVI_GVEC_TRANS(vand_vi, 0, vand_vx, andi)
> +GEN_OPIVI_GVEC_TRANS(vor_vi, 0, vor_vx,  ori)
> +GEN_OPIVI_GVEC_TRANS(vxor_vi, 0, vxor_vx, xori)
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 9913dcbea2..470bf079b2 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -1235,3 +1235,54 @@ GEN_VEXT_VMADC_VXM(vmsbc_vxm_b, uint8_t,  H1, DO_MSBC)
>  GEN_VEXT_VMADC_VXM(vmsbc_vxm_h, uint16_t, H2, DO_MSBC)
>  GEN_VEXT_VMADC_VXM(vmsbc_vxm_w, uint32_t, H4, DO_MSBC)
>  GEN_VEXT_VMADC_VXM(vmsbc_vxm_d, uint64_t, H8, DO_MSBC)
> +
> +/* Vector Bitwise Logical Instructions */
> +RVVCALL(OPIVV2, vand_vv_b, OP_SSS_B, H1, H1, H1, DO_AND)
> +RVVCALL(OPIVV2, vand_vv_h, OP_SSS

Re: [PATCH v15 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-20 Thread Kirti Wankhede

On 3/20/2020 11:31 PM, Alex Williamson wrote:

On Fri, 20 Mar 2020 23:19:14 +0530
Kirti Wankhede  wrote:


On 3/20/2020 4:27 AM, Alex Williamson wrote:

On Fri, 20 Mar 2020 01:46:41 +0530
Kirti Wankhede  wrote:
   





+static int vfio_iova_dirty_bitmap(struct vfio_iommu *iommu, dma_addr_t iova,
+ size_t size, uint64_t pgsize,
+ u64 __user *bitmap)
+{
+   struct vfio_dma *dma;
+   unsigned long pgshift = __ffs(pgsize);
+   unsigned int npages, bitmap_size;
+
+   dma = vfio_find_dma(iommu, iova, 1);
+
+   if (!dma)
+   return -EINVAL;
+
+   if (dma->iova != iova || dma->size != size)
+   return -EINVAL;
+
+   npages = dma->size >> pgshift;
+   bitmap_size = DIRTY_BITMAP_BYTES(npages);
+
+   /* mark all pages dirty if all pages are pinned and mapped. */
+   if (dma->iommu_mapped)
+   bitmap_set(dma->bitmap, 0, npages);
+
+   if (copy_to_user((void __user *)bitmap, dma->bitmap, bitmap_size))
+   return -EFAULT;


We still need to reset the bitmap here, clearing and re-adding the
pages that are still pinned.

https://lore.kernel.org/kvm/20200319070635.2ff5d...@x1.home/
   


I thought you agreed on my reply to it
https://lore.kernel.org/kvm/31621b70-02a9-2ea5-045f-f72b671fe...@nvidia.com/

  > Why re-populate when there will be no change since
  > vfio_iova_dirty_bitmap() is called holding iommu->lock? If there is any
  > pin request while vfio_iova_dirty_bitmap() is still working, it will
  > wait till iommu->lock is released. Bitmap will be populated when page is
  > pinned.


As coded, dirty bits are only ever set in the bitmap, never cleared.
If a page is unpinned between iterations of the user recording the
dirty bitmap, it should be marked dirty in the iteration immediately
after the unpinning and not marked dirty in the following iteration.
That doesn't happen here.  We're reporting cumulative dirty pages since
logging was enabled, we need to be reporting dirty pages since the user
last retrieved the dirty bitmap.  The bitmap should be cleared and
currently pinned pages re-added after copying to the user.  Thanks,



Does that mean we have to track every iteration? Do we really need that
tracking?


Generally the flow is:
- vendor driver pins x pages
- Enter the pre-copy phase where vCPUs are running - the user starts dirty
page tracking, then asks for the dirty bitmap; x pages are reported dirty
by the VFIO_IOMMU_DIRTY_PAGES ioctl with the _GET flag
- In the pre-copy phase, the vendor driver pins y more pages; now the
bitmap consists of x+y bits set
- In the pre-copy phase, the vendor driver unpins z pages, but the bitmap
is not updated, so the bitmap again consists of x+y bits set.

- Enter the stop-and-copy phase; vCPUs are stopped, mdev devices are stopped
- The user asks for the dirty bitmap - since vCPUs and mdev devices are
stopped here, pages should not be dirtied by the guest driver or the
physical device. Hence, x+y dirty pages would be reported.


I don't think we need to track every iteration of bitmap reporting.

Thanks,
Kirti
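
For reference, the per-iteration reset Alex describes above (clear the
bitmap after copying it to the user, then re-mark only the still-pinned
pages so the next _GET reports the delta rather than the cumulative set)
can be sketched roughly as follows. This is hypothetical, simplified code
with made-up names, not the actual vfio_iommu_type1 implementation:

```c
#include <stddef.h>
#include <string.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Mark a single page as dirty in the bitmap. */
static void bitmap_set_bit(unsigned long *map, unsigned long bit)
{
    map[bit / BITS_PER_LONG] |= 1UL << (bit % BITS_PER_LONG);
}

/*
 * After copying the bitmap to userspace: drop all stale dirty bits,
 * then re-mark only the pages that are still pinned, so the next
 * _GET reports what changed since this call rather than everything
 * dirtied since logging was enabled.
 */
static void reset_dirty_bitmap(unsigned long *bitmap, size_t nlongs,
                               const unsigned long *pinned_pfns,
                               size_t npinned)
{
    memset(bitmap, 0, nlongs * sizeof(*bitmap));
    for (size_t i = 0; i < npinned; i++) {
        bitmap_set_bit(bitmap, pinned_pfns[i]);
    }
}
```

With a reset like this, a page unpinned between two _GET calls shows up
dirty exactly once, in the iteration right after the unpinning.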






Re: [PATCH v6 10/61] target/riscv: vector single-width integer add and subtract

2020-03-20 Thread Alistair Francis
On Tue, Mar 17, 2020 at 8:27 AM LIU Zhiwei  wrote:
>
> Signed-off-by: LIU Zhiwei 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/helper.h   |  21 ++
>  target/riscv/insn32.decode  |  10 +
>  target/riscv/insn_trans/trans_rvv.inc.c | 251 
>  target/riscv/vector_helper.c| 149 ++
>  4 files changed, 431 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 70a4b05f75..e73701d4bb 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -269,3 +269,24 @@ DEF_HELPER_6(vamominw_v_w,  void, ptr, ptr, tl, ptr, 
> env, i32)
>  DEF_HELPER_6(vamomaxw_v_w,  void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vamominuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vamomaxuw_v_w, void, ptr, ptr, tl, ptr, env, i32)
> +
> +DEF_HELPER_6(vadd_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vadd_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vadd_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vadd_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsub_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsub_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsub_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vsub_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vadd_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vadd_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vadd_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vadd_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vsub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vrsub_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vrsub_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vrsub_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vrsub_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 1330703720..d1034a0e61 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -44,6 +44,7 @@
>  &uimm rd
>  &shift shamt rs1 rd
>  &atomicaq rl rs2 rs1 rd
> +&rmrr  vm rd rs1 rs2
>  &rwdvm vm wd rd rs1 rs2
>  &r2nfvmvm rd rs1 nf
>  &rnfvm vm rd rs1 rs2 nf
> @@ -68,6 +69,7 @@
>  @r2  ...   . . ... . ... %rs1 %rd
>  @r2_nfvm ... ... vm:1 . . ... . ... &r2nfvm %nf %rs1 %rd
>  @r_nfvm  ... ... vm:1 . . ... . ... &rnfvm %nf %rs2 %rs1 %rd
> +@r_vm.. vm:1 . . ... . ... &rmrr %rs2 %rs1 %rd
>  @r_wdvm  . wd:1 vm:1 . . ... . ... &rwdvm %rs2 %rs1 %rd
>  @r2_zimm . zimm:11  . ... . ... %rs1 %rd
>
> @@ -275,5 +277,13 @@ vamominuw_v 11000 . . . . 110 . 010 
> @r_wdvm
>  vamomaxuw_v 11100 . . . . 110 . 010 @r_wdvm
>
>  # *** new major opcode OP-V ***
> +vadd_vv 00 . . . 000 . 1010111 @r_vm
> +vadd_vx 00 . . . 100 . 1010111 @r_vm
> +vadd_vi 00 . . . 011 . 1010111 @r_vm
> +vsub_vv 10 . . . 000 . 1010111 @r_vm
> +vsub_vx 10 . . . 100 . 1010111 @r_vm
> +vrsub_vx11 . . . 100 . 1010111 @r_vm
> +vrsub_vi11 . . . 011 . 1010111 @r_vm
> +
>  vsetvli 0 ... . 111 . 1010111  @r2_zimm
>  vsetvl  100 . . 111 . 1010111  @r
> diff --git a/target/riscv/insn_trans/trans_rvv.inc.c 
> b/target/riscv/insn_trans/trans_rvv.inc.c
> index a8722ed9d2..c68f6ffe3b 100644
> --- a/target/riscv/insn_trans/trans_rvv.inc.c
> +++ b/target/riscv/insn_trans/trans_rvv.inc.c
> @@ -740,3 +740,254 @@ GEN_VEXT_TRANS(vamomaxd_v, 15, rwdvm, amo_op, amo_check)
>  GEN_VEXT_TRANS(vamominud_v, 16, rwdvm, amo_op, amo_check)
>  GEN_VEXT_TRANS(vamomaxud_v, 17, rwdvm, amo_op, amo_check)
>  #endif
> +
> +/*
> + *** Vector Integer Arithmetic Instructions
> + */
> +#define MAXSZ(s) (s->vlen >> (3 - s->lmul))
> +
> +static bool opivv_check(DisasContext *s, arg_rmrr *a)
> +{
> +return (vext_check_isa_ill(s) &&
> +vext_check_overlap_mask(s, a->rd, a->vm, false) &&
> +vext_check_reg(s, a->rd, false) &&
> +vext_check_reg(s, a->rs2, false) &&
> +vext_check_reg(s, a->rs1, false));
> +}
> +
> +typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
> +uint32_t, uint32_t, uint32_t);
> +
> +static inline bool
> +do_opivv_gvec(DisasContext *s, arg_rmrr *a, GVecGen3Fn *gvec_fn,
> +  gen_helper_gvec_4_ptr *fn)
> +{
> +if (!opivv_check(s, a)) {
> +return false;
> +}
> +
> +if (a->vm && s->vl_eq_vlmax) {
> +gvec_

[PATCH] block: Avoid memleak on qcow2 image info failure

2020-03-20 Thread Eric Blake
If we fail to get bitmap info, we must not leak the encryption info.

Fixes: b8968c875f403
Fixes: Coverity CID 1421894
Signed-off-by: Eric Blake 
---
 block/qcow2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index d44b45633dbb..e08917ed8462 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -4811,6 +4811,7 @@ static ImageInfoSpecific 
*qcow2_get_specific_info(BlockDriverState *bs,
 if (local_err) {
 error_propagate(errp, local_err);
 qapi_free_ImageInfoSpecific(spec_info);
+qapi_free_QCryptoBlockInfo(encrypt_info);
 return NULL;
 }
 *spec_info->u.qcow2.data = (ImageInfoSpecificQCow2){
-- 
2.25.1




Re: [Qemu-devel] [PULL 3/4] qcow2: Add list of bitmaps to ImageInfoSpecificQCow2

2020-03-20 Thread Eric Blake

On 3/20/20 12:57 PM, Peter Maydell wrote:

On Mon, 11 Feb 2019 at 20:57, Eric Blake  wrote:


From: Andrey Shinkevich 

In the 'Format specific information' section of the 'qemu-img info'
command output, the supplemental information about existing QCOW2
bitmaps will be shown, such as a bitmap name, flags and granularity:


Hi; Coverity has just noticed an issue (CID 1421894) with this change:




+Qcow2BitmapInfoList *bitmaps;
+bitmaps = qcow2_get_bitmap_info_list(bs, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+qapi_free_ImageInfoSpecific(spec_info);
+return NULL;


If we take this error-exit codepath, then we never free the
memory allocated by the earlier call to qcrypto_block_get_info().


Fix sent.

Hmm - it would be nice if the QAPI generator could declare all QAPI 
types as g_autoptr compatible, so we could simplify our cleanup paths to 
not have to worry about calling qapi_free_FOO() on all paths.  But while 
the memory leak fix is a one-liner safe for 5.0, switching to g_autoptr 
is a bigger task that would be 5.1 material.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PULL v2 0/1] Slirp patches

2020-03-20 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20200320155106.549514-1-marcandre.lur...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PULL v2 0/1] Slirp patches
Message-id: 20200320155106.549514-1-marcandre.lur...@redhat.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
ca00178 slirp: update submodule to v4.2.0+

=== OUTPUT BEGIN ===
ERROR: Missing Signed-off-by: line(s)

total: 1 errors, 0 warnings, 2 lines checked

Commit ca0017831606 (slirp: update submodule to v4.2.0+) has style problems, 
please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200320155106.549514-1-marcandre.lur...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH] monitor/hmp-cmds: fix bad indentation in 'info migrate_parameters' cmd output

2020-03-20 Thread Dr. David Alan Gilbert
* Daniel P. Berrangé (berra...@redhat.com) wrote:
> On Fri, Mar 20, 2020 at 05:31:17PM +, Dr. David Alan Gilbert wrote:
> > (Rearranging the text a bit)
> > 
> > * Markus Armbruster (arm...@redhat.com) wrote:
> > 
> > > David (cc'ed) should be able to tell us which fix is right.
> > > 
> > > @tls_creds and @tls_hostname look like they could have the same issue.
> > 
> > A certain Markus removed the NULL checks in 8cc99dc because 4af245d
> > guaranteed they would be non-NULL for tls-creds/hostname - so we
> > should be OK for those.
> > 
> > But tls-authz came along a lot later in d2f1d29 and doesn't
> > seem to have the initialisation, which is now in
> > migration_instance_init.
> > 
> > So I *think* the fix for this is to do the modern equivalent of 4af245d
> > :
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index c1d88ace7f..0bc1b93277 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -3686,6 +3686,7 @@ static void migration_instance_init(Object *obj)
> >  
> >  params->tls_hostname = g_strdup("");
> >  params->tls_creds = g_strdup("");
> > +params->tls_authz = g_strdup("");
> >  
> >  /* Set has_* up only for parameter checks */
> >  params->has_compress_level = true;
> > 
> > Copying in Dan to check that wouldn't break tls.
> 
> It *will* break TLS, because it will cause the TLS code to lookup
> an object with the ID of "".  NULL must be preserved when calling
> the TLS APIs.

OK, good I asked...

> The assignment of "" to tls_hostname would also have broken TLS,
> so the migration_tls_channel_connect method had to turn it back
> into a real NULL.
> 
> The use of "" for tls_creds will similarly cause it to try and
> lookup an object with ID of "", and fail. That one's harmless
> though, because it would also fail if it were NULL.

OK.

It looks like the output of query-migrate-parameters though already
turns it into "", so I don't think you can tell it's NULL from that:

{"QMP": {"version": {"qemu": {"micro": 0, "minor": 2, "major": 4}, "package": 
"qemu-4.2.0-4.fc31"}, "capabilities": ["oob"]}}
{ "execute": "qmp_capabilities" }
{"return": {}}
{ "execute": "query-migrate-parameters" }
{"return": {"xbzrle-cache-size": 67108864, "cpu-throttle-initial": 20, 
"announce-max": 550, "decompress-threads": 2, "compress-threads": 8, 
"compress-level": 1, "multifd-channels": 2, "announce-initial": 50, 
"block-incremental": false, "compress-wait-thread": true, "downtime-limit": 
300, "tls-authz": "", "announce-rounds": 5, "announce-step": 100, "tls-creds": 
"", "max-cpu-throttle": 99, "max-postcopy-bandwidth": 0, "tls-hostname": "", 
"max-bandwidth": 33554432, "x-checkpoint-delay": 2, 
"cpu-throttle-increment": 10}}

I'm not sure what turns a NULL into a "" but I guess it's somewhere in
the JSON output iterator.

So we can fix this problem either in qmp_query_migrate_parameters
and just strdup a "", or substitute it in hmp_info_migrate_parameters.
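
A minimal sketch of the first option, using a hypothetical helper name
rather than QEMU's actual code: substitute "" for a NULL tls_authz when
building the reply, matching how tls_creds and tls_hostname already show
up as "" in the query-migrate-parameters output above.

```c
#include <string.h>

/*
 * Hypothetical helper (illustrative name, not QEMU's API): map a NULL
 * tls_authz to "" at query time so neither QMP nor HMP ever has to
 * print a null pointer.
 */
static const char *tls_authz_for_display(const char *tls_authz)
{
    return tls_authz ? tls_authz : "";
}
```

The second option is the same substitution, just done at display time in
hmp_info_migrate_parameters instead.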

Dave

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




[PATCH v2 2/4] qemu: capabilities: add QEMU_CAPS_VIRTFS_MULTIDEVS

2020-03-20 Thread Christian Schoenebeck
The QEMU 9pfs 'multidevs' option exists since QEMU 4.2, so just
set this capability based on that QEMU version.

Signed-off-by: Christian Schoenebeck 
---
 src/qemu/qemu_capabilities.c  | 5 +
 src/qemu/qemu_capabilities.h  | 1 +
 tests/qemucapabilitiesdata/caps_4.2.0.x86_64.xml  | 1 +
 tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml | 1 +
 tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml  | 1 +
 5 files changed, 9 insertions(+)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index a95a60c36a..68b6e166e9 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -567,6 +567,7 @@ VIR_ENUM_IMPL(virQEMUCaps,
   "query-named-block-nodes.flat",
   "blockdev-snapshot.allow-write-only-overlay",
   "blockdev-reopen",
+  "virtfs-multidevs",
 );
 
 
@@ -4837,6 +4838,10 @@ virQEMUCapsInitQMPVersionCaps(virQEMUCapsPtr qemuCaps)
 ARCH_IS_PPC64(qemuCaps->arch)) {
 virQEMUCapsSet(qemuCaps, QEMU_CAPS_MACHINE_PSERIES_MAX_CPU_COMPAT);
 }
+
+/* -virtfs multidevs option is supported since QEMU 4.2 */
+if (qemuCaps->version >= 4002000)
+virQEMUCapsSet(qemuCaps, QEMU_CAPS_VIRTFS_MULTIDEVS);
 }
 
 
diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
index f0961e273c..a6025312be 100644
--- a/src/qemu/qemu_capabilities.h
+++ b/src/qemu/qemu_capabilities.h
@@ -548,6 +548,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for 
syntax-check */
 QEMU_CAPS_QMP_QUERY_NAMED_BLOCK_NODES_FLAT, /* query-named-block-nodes 
supports the 'flat' option */
 QEMU_CAPS_BLOCKDEV_SNAPSHOT_ALLOW_WRITE_ONLY, /* blockdev-snapshot has the 
'allow-write-only-overlay' feature */
 QEMU_CAPS_BLOCKDEV_REOPEN, /* 'blockdev-reopen' qmp command is supported */
+QEMU_CAPS_VIRTFS_MULTIDEVS, /* -virtfs multidevs supported by virtio-9p */
 
 QEMU_CAPS_LAST /* this must always be the last item */
 } virQEMUCapsFlags;
diff --git a/tests/qemucapabilitiesdata/caps_4.2.0.x86_64.xml 
b/tests/qemucapabilitiesdata/caps_4.2.0.x86_64.xml
index 83e804ea36..d8b0de46cd 100644
--- a/tests/qemucapabilitiesdata/caps_4.2.0.x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_4.2.0.x86_64.xml
@@ -223,6 +223,7 @@
   
   
   
+  <flag name='virtfs-multidevs'/>
   4002000
   0
   43100242
diff --git a/tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml 
b/tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml
index e52c60607d..3a695fbe79 100644
--- a/tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml
+++ b/tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml
@@ -181,6 +181,7 @@
   
   
   
+  <flag name='virtfs-multidevs'/>
   4002050
   0
   61700241
diff --git a/tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml 
b/tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml
index d773f7e356..95fa0813dd 100644
--- a/tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml
@@ -226,6 +226,7 @@
   
   
   
+  <flag name='virtfs-multidevs'/>
   4002050
   0
   43100241
-- 
2.20.1
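As an aside, the 4002000 literal compared against qemuCaps->version in
patch 2 is the usual encoding of a QEMU version triple, major *
1,000,000 + minor * 1,000 + micro; a minimal sketch of that check:

```c
/* Encode a QEMU version triple the way the capabilities code
 * compares it: major * 1,000,000 + minor * 1,000 + micro. */
static unsigned long qemu_version_code(unsigned major, unsigned minor,
                                       unsigned micro)
{
    return major * 1000000UL + minor * 1000UL + micro;
}
```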




Re: [PATCH v15 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-20 Thread Alex Williamson
On Fri, 20 Mar 2020 23:19:14 +0530
Kirti Wankhede  wrote:

> On 3/20/2020 4:27 AM, Alex Williamson wrote:
> > On Fri, 20 Mar 2020 01:46:41 +0530
> > Kirti Wankhede  wrote:
> >   
> >> VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
> >> - Start dirty pages tracking while migration is active
> >> - Stop dirty pages tracking.
> >> - Get dirty pages bitmap. Its user space application's responsibility to
> >>copy content of dirty pages from source to destination during migration.
> >>
> >> To prevent DoS attack, memory for bitmap is allocated per vfio_dma
> >> structure. Bitmap size is calculated considering smallest supported page
> >> size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled
> >>
> >> Bitmap is populated for already pinned pages when bitmap is allocated for
> >> a vfio_dma with the smallest supported page size. Update bitmap from
> >> pinning functions when tracking is enabled. When user application queries
> >> bitmap, check if requested page size is same as page size used to
> >> populated bitmap. If it is equal, copy bitmap, but if not equal, return
> >> error.
> >>
> >> Signed-off-by: Kirti Wankhede 
> >> Reviewed-by: Neo Jia 
> >> ---
> >>   drivers/vfio/vfio_iommu_type1.c | 242 
> >> +++-
> >>   1 file changed, 236 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/vfio/vfio_iommu_type1.c 
> >> b/drivers/vfio/vfio_iommu_type1.c
> >> index 70aeab921d0f..239f61764d03 100644
> >> --- a/drivers/vfio/vfio_iommu_type1.c
> >> +++ b/drivers/vfio/vfio_iommu_type1.c
> >> @@ -71,6 +71,7 @@ struct vfio_iommu {
> >>unsigned intdma_avail;
> >>boolv2;
> >>boolnesting;
> >> +  booldirty_page_tracking;
> >>   };
> >>   
> >>   struct vfio_domain {
> >> @@ -91,6 +92,7 @@ struct vfio_dma {
> >>boollock_cap;   /* capable(CAP_IPC_LOCK) */
> >>struct task_struct  *task;
> >>struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
> >> +  unsigned long   *bitmap;
> >>   };
> >>   
> >>   struct vfio_group {
> >> @@ -125,7 +127,21 @@ struct vfio_regions {
> >>   #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
> >>(!list_empty(&iommu->domain_list))
> >>   
> >> +#define DIRTY_BITMAP_BYTES(n) (ALIGN(n, BITS_PER_TYPE(u64)) / 
> >> BITS_PER_BYTE)
> >> +
> >> +/*
> >> + * Input argument of number of bits to bitmap_set() is unsigned integer, 
> >> which
> >> + * further casts to signed integer for unaligned multi-bit operation,
> >> + * __bitmap_set().
> >> + * Then maximum bitmap size supported is 2^31 bits divided by 2^3 
> >> bits/byte,
> >> + * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page
> >> + * system.
> >> + */
> >> +#define DIRTY_BITMAP_PAGES_MAX((1UL << 31) - 1)
> >> +#define DIRTY_BITMAP_SIZE_MAX  
> >> DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
> >> +
> >>   static int put_pfn(unsigned long pfn, int prot);
> >> +static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
> >>   
> >>   /*
> >>* This code handles mapping and unmapping of user data buffers
> >> @@ -175,6 +191,67 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, 
> >> struct vfio_dma *old)
> >>rb_erase(&old->node, &iommu->dma_list);
> >>   }
> >>   
> >> +
> >> +static int vfio_dma_bitmap_alloc(struct vfio_dma *dma, uint64_t pgsize)
> >> +{
> >> +  uint64_t npages = dma->size / pgsize;
> >> +  
> > 
> > Shouldn't we test this against one of the MAX macros defined above?  It
> > would be bad if we could enabled dirty tracking but not allow the user
> > to retrieve it.
> >   
> 
> Yes, adding check as below:
> 
>  if (npages > DIRTY_BITMAP_PAGES_MAX)
>  return -EINVAL;
> 
> 
> >> +  dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
> >> +  if (!dma->bitmap)
> >> +  return -ENOMEM;
> >> +
> >> +  return 0;
> >> +}
> >> +
> >> +static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, uint64_t 
> >> pgsize)
> >> +{
> >> +  struct rb_node *n = rb_first(&iommu->dma_list);
> >> +
> >> +  for (; n; n = rb_next(n)) {
> >> +  struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> >> +  struct rb_node *p;
> >> +  int ret;
> >> +
> >> +  ret = vfio_dma_bitmap_alloc(dma, pgsize);
> >> +  if (ret) {
> >> +  struct rb_node *p = rb_prev(n);
> >> +
> >> +  for (; p; p = rb_prev(p)) {
> >> +  struct vfio_dma *dma = rb_entry(n,
> >> +  struct vfio_dma, node);
> >> +
> >> +  kfree(dma->bitmap);
> >> +  dma->bitmap = NULL;
> >> +  }
> >> +  return ret;
> >> +  }
> >> +
> >> +  if (RB_EMPTY_ROOT(&dma->pfn_list))
> >> +  continue;
> >> +
> >> + 

[PATCH v2 4/4] qemu: add support for 'multidevs' option

2020-03-20 Thread Christian Schoenebeck
This option prevents misbehaviours on guest if a qemu 9pfs export
contains multiple devices, due to the potential file ID collisions
this otherwise may cause.

Signed-off-by: Christian Schoenebeck 
---
 src/qemu/qemu_command.c |  7 +++
 src/qemu/qemu_domain.c  | 12 
 2 files changed, 19 insertions(+)

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 9790c92cf8..7020e5448c 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -2632,6 +2632,13 @@ qemuBuildFSStr(virDomainFSDefPtr fs)
 } else if (fs->accessmode == VIR_DOMAIN_FS_ACCESSMODE_SQUASH) {
 virBufferAddLit(&opt, ",security_model=none");
 }
+if (fs->multidevs == VIR_DOMAIN_FS_MULTIDEVS_REMAP) {
+virBufferAddLit(&opt, ",multidevs=remap");
+} else if (fs->multidevs == VIR_DOMAIN_FS_MULTIDEVS_FORBID) {
+virBufferAddLit(&opt, ",multidevs=forbid");
+} else if (fs->multidevs == VIR_DOMAIN_FS_MULTIDEVS_WARN) {
+virBufferAddLit(&opt, ",multidevs=warn");
+}
 } else if (fs->fsdriver == VIR_DOMAIN_FS_DRIVER_TYPE_HANDLE) {
 /* removed since qemu 4.0.0 see v3.1.0-29-g93aee84f57 */
 virBufferAddLit(&opt, "handle");
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index edc8ba2ddb..c54c64fadb 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -8529,6 +8529,13 @@ qemuDomainDeviceDefValidateFS(virDomainFSDefPtr fs,
_("only supports mount filesystem type"));
 return -1;
 }
+if (fs->multidevs != VIR_DOMAIN_FS_MODEL_DEFAULT &&
+!virQEMUCapsGet(qemuCaps, QEMU_CAPS_VIRTFS_MULTIDEVS))
+{
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+   _("multidevs is not supported with this QEMU binary"));
+return -1;
+}
 
 switch ((virDomainFSDriverType) fs->fsdriver) {
 case VIR_DOMAIN_FS_DRIVER_TYPE_DEFAULT:
@@ -8581,6 +8588,11 @@ qemuDomainDeviceDefValidateFS(virDomainFSDefPtr fs,
_("virtiofs is not supported with this QEMU 
binary"));
 return -1;
 }
+if (fs->multidevs != VIR_DOMAIN_FS_MULTIDEVS_DEFAULT) {
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+   _("virtiofs does not support multidevs"));
+return -1;
+}
 if (qemuDomainDefValidateVirtioFSSharedMemory(def) < 0)
 return -1;
 break;
-- 
2.20.1




[PATCH v2 1/4] docs: virtfs: add section separators

2020-03-20 Thread Christian Schoenebeck
Signed-off-by: Christian Schoenebeck 
---
 docs/formatdomain.html.in | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 594146009d..cc2c671c14 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -4084,13 +4084,20 @@
 
 
 
+  
   Since 5.2.0, the filesystem element
   has an optional attribute model with supported values
   "virtio-transitional", "virtio-non-transitional", or "virtio".
   See Virtio transitional devices
   for more details.
+  
+
   
 
+  
+  The filesystem element may contain the following 
subelements:
+  
+
   driver
   
 The optional driver element allows specifying further details
-- 
2.20.1




[PATCH v2 0/4] add support for QEMU 9pfs 'multidevs' option

2020-03-20 Thread Christian Schoenebeck
QEMU 4.2 added a new option 'multidevs' for 9pfs. The following patch adds
support for this new option to libvirt.

In short, what is this about: to distinguish files uniquely from each other
in general, numeric file IDs are typically used for comparison, which in
practice is the combination of a file's device ID and the file's inode
number. Unfortunately 9p protocol's QID field used for this purpose,
currently is too small to fit both the device ID and inode number in, which
hence is a problem if one 9pfs export contains multiple devices and may
thus lead to misbehaviours on guest (e.g. with SAMBA file servers) in that
case due to potential file ID collisions.

To mitigate this problem with 9pfs a 'multidevs' option was introduced in
QEMU 4.2 for defining how to deal with this, e.g. multidevs=remap will cause
QEMU's 9pfs implementation to remap all inodes from host side to different
inode numbers on guest side in a way that prevents file ID collisions.
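A toy model of the collision described above (not QEMU code, and the
real multidevs=remap scheme differs; this only illustrates why
dropping the device ID from the file identity is a problem):

```c
#include <stdint.h>

/* Toy file identity as stat(2) reports it: (device ID, inode number). */
struct file_id { uint64_t dev; uint64_t ino; };

/* Naive QID path: keeps only the inode number, so files on different
 * devices with equal inode numbers collide. */
static uint64_t qid_path_naive(struct file_id f)
{
    return f.ino;
}

/* Remap-style QID path (illustrative only, not QEMU's actual scheme):
 * give each device its own inode namespace. */
static uint64_t qid_path_remap(struct file_id f)
{
    return (f.dev << 48) | (f.ino & 0xffffffffffffULL);
}
```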

NOTE: In the libvirt docs changes of this libvirt patch I simply assumed
"since 6.2.0". So the final libvirt version number would need to be adjusted
in that text if necessary.

See QEMU discussion with following Message-ID for details:
8a2ffe17fda3a86b9a5a437e1245276881f1e235.1567680121.git.qemu_...@crudebyte.com

v1->v2:

  * Unrelated docs/formatdomain.html.in changes to separate patch.
[patch 1]

  * Added new capability QEMU_CAPS_VIRTFS_MULTIDEVS.
[patch 2]

  * XML changes as isolated patch.
[patch 3]

  * Code style fix.
[patch 3]

  * QEMU 'multidevs' command handling as isolated patch.
[patch 4]

  * Error out if not QEMU_CAPS_VIRTFS_MULTIDEVS capability.
[patch 4]

  * Error out on virtiofs (since it does not have the 'multidevs' option).
[patch 4]

TODO:

  * Capabilities test cases would fail if <flag name='virtfs-multidevs'/>
was added to the other architectures' test case xml files, why?
[patch 2]

  * The requested test cases to add: Sorry, the libvirt test case
environment is still a mystery to me; I would not even know where to
start here.

Message-ID of v1: e1jefpl-00028n...@lizzy.crudebyte.com

Christian Schoenebeck (4):
  docs: virtfs: add section separators
  qemu: capabilities: add QEMU_CAPS_VIRTFS_MULTIDEVS
  conf: add 'multidevs' option
  qemu: add support for 'multidevs' option

 docs/formatdomain.html.in | 47 ++-
 docs/schemas/domaincommon.rng | 10 
 src/conf/domain_conf.c| 29 
 src/conf/domain_conf.h| 13 +
 src/qemu/qemu_capabilities.c  |  5 ++
 src/qemu/qemu_capabilities.h  |  1 +
 src/qemu/qemu_command.c   |  7 +++
 src/qemu/qemu_domain.c| 12 +
 .../caps_4.2.0.x86_64.xml |  1 +
 .../caps_5.0.0.aarch64.xml|  1 +
 .../caps_5.0.0.x86_64.xml |  1 +
 11 files changed, 126 insertions(+), 1 deletion(-)

-- 
2.20.1




[PATCH v2 3/4] conf: add 'multidevs' option

2020-03-20 Thread Christian Schoenebeck
Introduce new 'multidevs' option for filesystem.

  <filesystem multidevs='remap'>
    ...
  </filesystem>
This option prevents misbehaviours on guest if a qemu 9pfs export
contains multiple devices, due to the potential file ID collisions
this otherwise may cause.

Signed-off-by: Christian Schoenebeck 
---
 docs/formatdomain.html.in | 40 ++-
 docs/schemas/domaincommon.rng | 10 +
 src/conf/domain_conf.c| 29 +
 src/conf/domain_conf.h| 13 
 4 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index cc2c671c14..13c506988b 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -3967,7 +3967,7 @@
 
 
   
-  
+  
 
 
 
@@ -4092,6 +4092,44 @@
   for more details.
   
 
+  
+  The filesystem element has an optional attribute multidevs
+  which specifies how to deal with a filesystem export containing more than
+  one device, in order to avoid file ID collisions on guest when using 9pfs
+  (since 6.2.0, requires QEMU 4.2).
+  This attribute is not available for virtiofs. The possible values are:
+  
+
+
+default
+
+Use QEMU's default setting (which currently is warn).
+
+remap
+
+This setting allows guest to access multiple devices per export without
+encountering misbehaviours. Inode numbers from host are automatically
+remapped on guest to actively prevent file ID collisions if guest
+accesses one export containing multiple devices.
+
+forbid
+
+Only allow to access one device per export by guest. Attempts to access
+additional devices on the same export will cause the individual
+filesystem access by guest to fail with an error and being logged 
(once)
+as error on host side.
+
+warn
+
+This setting resembles the behaviour of 9pfs prior to QEMU 4.2, that is
+no action is performed to prevent any potential file ID collisions if 
an
+export contains multiple devices, with the only exception: a warning is
+logged (once) on host side now. This setting may lead to misbehaviours
+on guest side if more than one device is exported per export, due to 
the
+potential file ID collisions this may cause on guest side in that case.
+
+
+
   
 
   
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 6805420451..9b37740e30 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -2676,6 +2676,16 @@
 
   
 
+
+  
+
+  default
+  remap
+  forbid
+  warn
+
+  
+
 
   
 
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 71535f53f5..6a9a7dd0bb 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -501,6 +501,14 @@ VIR_ENUM_IMPL(virDomainFSModel,
   "virtio-non-transitional",
 );
 
+VIR_ENUM_IMPL(virDomainFSMultidevs,
+  VIR_DOMAIN_FS_MULTIDEVS_LAST,
+  "default",
+  "remap",
+  "forbid",
+  "warn",
+);
+
 VIR_ENUM_IMPL(virDomainFSCacheMode,
   VIR_DOMAIN_FS_CACHE_MODE_LAST,
   "default",
@@ -11376,6 +11384,7 @@ virDomainFSDefParseXML(virDomainXMLOptionPtr xmlopt,
 g_autofree char *usage = NULL;
 g_autofree char *units = NULL;
 g_autofree char *model = NULL;
+g_autofree char *multidevs = NULL;
 
 ctxt->node = node;
 
@@ -11414,6 +11423,17 @@ virDomainFSDefParseXML(virDomainXMLOptionPtr xmlopt,
 }
 }
 
+multidevs = virXMLPropString(node, "multidevs");
+if (multidevs) {
+if ((def->multidevs = virDomainFSMultidevsTypeFromString(multidevs)) < 
0) {
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+   _("unknown multidevs '%s'"), multidevs);
+goto error;
+}
+} else {
+def->multidevs = VIR_DOMAIN_FS_MULTIDEVS_DEFAULT;
+}
+
 if (virDomainParseScaledValue("./space_hard_limit[1]",
   NULL, ctxt, &def->space_hard_limit,
   1, ULLONG_MAX, false) < 0)
@@ -25397,6 +25417,7 @@ virDomainFSDefFormat(virBufferPtr buf,
 const char *accessmode = 
virDomainFSAccessModeTypeToString(def->accessmode);
 const char *fsdriver = virDomainFSDriverTypeToString(def->fsdriver);
 const char *wrpolicy = virDomainFSWrpolicyTypeToStrin

Re: [PATCH v3] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-20 Thread Andrzej Jakowski
On 3/20/20 8:45 AM, Stefan Hajnoczi wrote:
> Please use qemu_ram_writeback() so that pmem_persist() and qemu_msync()
> are used as appropriate.

Thx!
qemu_ram_writeback() doesn't return any status. How can I know that the
actual msync succeeds?

Also qemu_ram_writeback() requires me to add #include "exec/ram_addr.h". 
After including it, when I compile the code I get the following error:

In file included from hw/block/nvme.c:49:
/root/sources/pmr/qemu/include/exec/ram_addr.h:23:10: fatal error: cpu.h: No 
such file or directory
   23 | #include "cpu.h"
  |  ^~~
compilation terminated.
make: *** [/root/sources/pmr/qemu/rules.mak:69: hw/block/nvme.o] Error 1

Why is this happening, and what should be changed?



Re: [Qemu-devel] [PULL 3/4] qcow2: Add list of bitmaps to ImageInfoSpecificQCow2

2020-03-20 Thread Peter Maydell
On Mon, 11 Feb 2019 at 20:57, Eric Blake  wrote:
>
> From: Andrey Shinkevich 
>
> In the 'Format specific information' section of the 'qemu-img info'
> command output, the supplemental information about existing QCOW2
> bitmaps will be shown, such as a bitmap name, flags and granularity:

Hi; Coverity has just noticed an issue (CID 1421894) with this change:

> diff --git a/block/qcow2.c b/block/qcow2.c
> index bcb80d0270c..65a54c9ac65 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -4387,7 +4387,7 @@ static ImageInfoSpecific 
> *qcow2_get_specific_info(BlockDriverState *bs,
>  spec_info = g_new(ImageInfoSpecific, 1);
>  *spec_info = (ImageInfoSpecific){
>  .type  = IMAGE_INFO_SPECIFIC_KIND_QCOW2,
> -.u.qcow2.data = g_new(ImageInfoSpecificQCow2, 1),
> +.u.qcow2.data = g_new0(ImageInfoSpecificQCow2, 1),
>  };
>  if (s->qcow_version == 2) {
>  *spec_info->u.qcow2.data = (ImageInfoSpecificQCow2){
> @@ -4395,6 +4395,13 @@ static ImageInfoSpecific 
> *qcow2_get_specific_info(BlockDriverState *bs,
>  .refcount_bits  = s->refcount_bits,
>  };
>  } else if (s->qcow_version == 3) {
> +Qcow2BitmapInfoList *bitmaps;
> +bitmaps = qcow2_get_bitmap_info_list(bs, &local_err);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +qapi_free_ImageInfoSpecific(spec_info);
> +return NULL;

If we take this error-exit codepath, then we never free the
memory allocated by the earlier call to qcrypto_block_get_info().

> +}
>  *spec_info->u.qcow2.data = (ImageInfoSpecificQCow2){
>  .compat = g_strdup("1.1"),
>  .lazy_refcounts = s->compatible_features &
> @@ -4404,6 +4411,8 @@ static ImageInfoSpecific 
> *qcow2_get_specific_info(BlockDriverState *bs,
>QCOW2_INCOMPAT_CORRUPT,
>  .has_corrupt= true,
>  .refcount_bits  = s->refcount_bits,
> +.has_bitmaps= !!bitmaps,
> +.bitmaps= bitmaps,
>  };
>  } else {
>  /* if this assertion fails, this probably means a new version was
> --
> 2.20.1

thanks
-- PMM



Re: [PATCH v15 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-20 Thread Kirti Wankhede




On 3/20/2020 4:27 AM, Alex Williamson wrote:

On Fri, 20 Mar 2020 01:46:41 +0530
Kirti Wankhede  wrote:


VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
- Start dirty pages tracking while migration is active
- Stop dirty pages tracking.
- Get dirty pages bitmap. Its user space application's responsibility to
   copy content of dirty pages from source to destination during migration.

To prevent DoS attack, memory for bitmap is allocated per vfio_dma
structure. Bitmap size is calculated considering smallest supported page
size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled

Bitmap is populated for already pinned pages when bitmap is allocated for
a vfio_dma with the smallest supported page size. Update bitmap from
pinning functions when tracking is enabled. When user application queries
bitmap, check if requested page size is same as page size used to
populated bitmap. If it is equal, copy bitmap, but if not equal, return
error.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  drivers/vfio/vfio_iommu_type1.c | 242 +++-
  1 file changed, 236 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 70aeab921d0f..239f61764d03 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -71,6 +71,7 @@ struct vfio_iommu {
unsigned intdma_avail;
boolv2;
boolnesting;
+   booldirty_page_tracking;
  };
  
  struct vfio_domain {

@@ -91,6 +92,7 @@ struct vfio_dma {
boollock_cap;   /* capable(CAP_IPC_LOCK) */
struct task_struct  *task;
struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
+   unsigned long   *bitmap;
  };
  
  struct vfio_group {

@@ -125,7 +127,21 @@ struct vfio_regions {
  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)   \
(!list_empty(&iommu->domain_list))
  
+#define DIRTY_BITMAP_BYTES(n)	(ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)

+
+/*
+ * Input argument of number of bits to bitmap_set() is unsigned integer, which
+ * further casts to signed integer for unaligned multi-bit operation,
+ * __bitmap_set().
+ * Then maximum bitmap size supported is 2^31 bits divided by 2^3 bits/byte,
+ * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page
+ * system.
+ */
+#define DIRTY_BITMAP_PAGES_MAX ((1UL << 31) - 1)
+#define DIRTY_BITMAP_SIZE_MAX   DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
+
  static int put_pfn(unsigned long pfn, int prot);
+static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
  
  /*

   * This code handles mapping and unmapping of user data buffers
@@ -175,6 +191,67 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, 
struct vfio_dma *old)
rb_erase(&old->node, &iommu->dma_list);
  }
  
+

+static int vfio_dma_bitmap_alloc(struct vfio_dma *dma, uint64_t pgsize)
+{
+   uint64_t npages = dma->size / pgsize;
+


Shouldn't we test this against one of the MAX macros defined above?  It
would be bad if we could enabled dirty tracking but not allow the user
to retrieve it.



Yes, adding a check as below:

if (npages > DIRTY_BITMAP_PAGES_MAX)
        return -EINVAL;



+   dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
+   if (!dma->bitmap)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, uint64_t pgsize)
+{
+   struct rb_node *n = rb_first(&iommu->dma_list);
+
+   for (; n; n = rb_next(n)) {
+   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+   struct rb_node *p;
+   int ret;
+
+   ret = vfio_dma_bitmap_alloc(dma, pgsize);
+   if (ret) {
+   struct rb_node *p = rb_prev(n);
+
+   for (; p; p = rb_prev(p)) {
+   struct vfio_dma *dma = rb_entry(n,
+   struct vfio_dma, node);
+
+   kfree(dma->bitmap);
+   dma->bitmap = NULL;
+   }
+   return ret;
+   }
+
+   if (RB_EMPTY_ROOT(&dma->pfn_list))
+   continue;
+
+   for (p = rb_first(&dma->pfn_list); p; p = rb_next(p)) {
+   struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn,
+node);
+
+   bitmap_set(dma->bitmap,
+  (vpfn->iova - dma->iova) / pgsize, 1);
+   }
+   }
+   return 0;
+}
+
+static void vfio_dma_bitmap_free_all(struct vfio_iommu *iommu)
+{
+   struct rb_node *n = rb_first(&iommu->dma_list);
+
+   for (; n;

Re: [PATCH] monitor/hmp-cmds: fix bad indentation in 'info migrate_parameters' cmd output

2020-03-20 Thread Daniel P . Berrangé
On Fri, Mar 20, 2020 at 05:31:17PM +, Dr. David Alan Gilbert wrote:
> (Rearranging the text a bit)
> 
> * Markus Armbruster (arm...@redhat.com) wrote:
> 
> > David (cc'ed) should be able to tell us which fix is right.
> > 
> > @tls_creds and @tls_hostname look like they could have the same issue.
> 
> A certain Markus removed the NULL checks in 8cc99dc because 4af245d
> guaranteed they would be non-NULL for tls-creds/hostname - so we
> should be OK for those.
> 
> But tls-authz came along a lot later in d2f1d29 and doesn't
> seem to have the initialisation, which is now in
> migration_instance_init.
> 
> So I *think* the fix for this is to do the modern equivalent of 4af245d
> :
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index c1d88ace7f..0bc1b93277 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3686,6 +3686,7 @@ static void migration_instance_init(Object *obj)
>  
>  params->tls_hostname = g_strdup("");
>  params->tls_creds = g_strdup("");
> +params->tls_authz = g_strdup("");
>  
>  /* Set has_* up only for parameter checks */
>  params->has_compress_level = true;
> 
> Copying in Dan to check that wouldn't break tls.

It *will* break TLS, because it will cause the TLS code to lookup
an object with the ID of "".  NULL must be preserved when calling
the TLS APIs.

The assignment of "" to tls_hostname would also have broken TLS,
so the migration_tls_channel_connect method had to turn it back
into a real NULL.

The use of "" for tls_creds will similarly cause it to try and
lookup an object with ID of "", and fail. That one's harmless
though, because it would also fail if it were NULL.
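So any layer that hands these parameters to the TLS code has to map ""
back to a real NULL first. A minimal sketch of that normalization (the
helper name is made up; QEMU does the equivalent for tls_hostname in
migration_tls_channel_connect):

```c
#include <stddef.h>

/* Map the monitor-friendly "" placeholder back to the NULL that the
 * TLS object-lookup code requires. */
static const char *tls_param_or_null(const char *s)
{
    return (s != NULL && s[0] != '\0') ? s : NULL;
}
```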


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2 6/6] scripts/coverity-scan: Add Docker support

2020-03-20 Thread Paolo Bonzini
On 19/03/20 20:33, Peter Maydell wrote:
> +# TODO: how do you get 'docker build' to print the output of the
> +# commands it is running to its stdout? This would be useful for debug.
> +DOCKER_BUILDKIT=1 docker build -t coverity-scanner \
> +   --secret id=coverity.token,src="$SECRET" \
> +   -f scripts/coverity-scan/coverity-scan.docker \
> +   scripts/coverity-scan

I'm not sure but tests/docker/docker.py should do it.  I'll test this
next week.

Paolo




Re: [PULL 06/35] spapr: Fail CAS if option vector table cannot be parsed

2020-03-20 Thread Peter Maydell
On Mon, 3 Feb 2020 at 06:11, David Gibson  wrote:
>
> From: Greg Kurz 
>
> Most of the option vector helpers have assertions to check their
> arguments aren't null. The guest can provide an arbitrary address
> for the CAS structure that would result in such null arguments.
> Fail CAS with H_PARAMETER and print a warning instead of aborting
> QEMU.
>
> Signed-off-by: Greg Kurz 
> Reviewed-by: Philippe Mathieu-Daudé 
> Message-Id: <157925255250.397143.10855183619366882459.st...@bahia.lan>
> Signed-off-by: David Gibson 
> ---
>  hw/ppc/spapr_hcall.c | 8 
>  1 file changed, 8 insertions(+)

Hi; Coverity points out that this change introduces a
memory leak (CID 1421924):

>
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index f1799b1b70..ffb14641f9 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1703,7 +1703,15 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>  ov_table = addr;
>
>  ov1_guest = spapr_ovec_parse_vector(ov_table, 1);

spapr_ovec_parse_vector() allocates memory...

> +if (!ov1_guest) {
> +warn_report("guest didn't provide option vector 1");
> +return H_PARAMETER;
> +}
>  ov5_guest = spapr_ovec_parse_vector(ov_table, 5);
> +if (!ov5_guest) {
> +warn_report("guest didn't provide option vector 5");
> +return H_PARAMETER;

...but if we take this early exit code path it is never freed
(via spapr_ovec_cleanup()).

> +}
>  if (spapr_ovec_test(ov5_guest, OV5_MMU_BOTH)) {
>  error_report("guest requested hash and radix MMU, which is 
> invalid.");
>  exit(EXIT_FAILURE);

All the other error paths in the function either precede
allocation of the vectors or just call exit() rather than
returning, so this is the only leak.
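The general shape of the fix is the usual two-allocation rollback:
free the first vector before the second early return. Modelled here
with plain malloc/free rather than the spapr ovec API:

```c
#include <stdlib.h>

/* Toy model of the leak: two successive allocations where failure of
 * the second must release the first before the early return. */
static int alloc_pair_fixed(void **first, void **second, int second_fails)
{
    *first = malloc(16);
    if (!*first)
        return -1;
    *second = second_fails ? NULL : malloc(16);
    if (!*second) {
        free(*first);        /* the step the spapr hunk is missing */
        *first = NULL;
        return -1;
    }
    return 0;
}
```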

thanks
-- PMM



Re: [PATCH] monitor/hmp-cmds: fix bad indentation in 'info migrate_parameters' cmd output

2020-03-20 Thread Dr. David Alan Gilbert
(Rearranging the text a bit)

* Markus Armbruster (arm...@redhat.com) wrote:

> David (cc'ed) should be able to tell us which fix is right.
> 
> @tls_creds and @tls_hostname look like they could have the same issue.

A certain Markus removed the NULL checks in 8cc99dc because 4af245d
guaranteed they would be non-NULL for tls-creds/hostname - so we
should be OK for those.

But tls-authz came along a lot later in d2f1d29 and doesn't
seem to have the initialisation, which is now in
migration_instance_init.

So I *think* the fix for this is to do the modern equivalent of 4af245d
:

diff --git a/migration/migration.c b/migration/migration.c
index c1d88ace7f..0bc1b93277 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3686,6 +3686,7 @@ static void migration_instance_init(Object *obj)
 
 params->tls_hostname = g_strdup("");
 params->tls_creds = g_strdup("");
+params->tls_authz = g_strdup("");
 
 /* Set has_* up only for parameter checks */
 params->has_compress_level = true;

Copying in Dan to check that wouldn't break tls.

Dave

> Mao Zhongyi  writes:
> 
> > run:
> > (qemu) info migrate_parameters
> > announce-initial: 50 ms
> > ...
> > announce-max: 550 ms
> > multifd-compression: none
> > xbzrle-cache-size: 4194304
> > max-postcopy-bandwidth: 0
> >  tls-authz: '(null)'
> >
> > The last line seems a bit out of place, fix it.
> 
> Yes, indentation is off, and your patch fixes that.  But there's also
> the '(null)', which emanates a certain bug smell.  Let's have a look at
> the code:
> 
> > Signed-off-by: Mao Zhongyi 
> > ---
> >  monitor/hmp-cmds.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> > index 58724031ea..f8be6bbb16 100644
> > --- a/monitor/hmp-cmds.c
> > +++ b/monitor/hmp-cmds.c
> > @@ -459,7 +459,7 @@ void hmp_info_migrate_parameters(Monitor *mon, const 
> > QDict *qdict)
>void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>{
>MigrationParameters *params;
> 
>params = qmp_query_migrate_parameters(NULL);
> 
>if (params) {
>[...]
> >  monitor_printf(mon, "%s: %" PRIu64 "\n",
> >  
> > MigrationParameter_str(MIGRATION_PARAMETER_MAX_POSTCOPY_BANDWIDTH),
> >  params->max_postcopy_bandwidth);
> > -monitor_printf(mon, " %s: '%s'\n",
> > +monitor_printf(mon, "%s: '%s'\n",
> >  MigrationParameter_str(MIGRATION_PARAMETER_TLS_AUTHZ),
> >  params->has_tls_authz ? params->tls_authz : "");
> >  }
> 
> Here, params->tls_authz is null even though params->has_tls_authz is
> true.
> 
> GNU Libc is nice enough not to crash when you attempt to print a null
> pointer, but other libcs are not.
> 
> Where does the null pointer come from?
> 
>MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>{
>MigrationParameters *params;
>MigrationState *s = migrate_get_current();
> 
>/* TODO use QAPI_CLONE() instead of duplicating it inline */
>params = g_malloc0(sizeof(*params));
>[...]
> --->   params->has_tls_authz = true;
> --->   params->tls_authz = g_strdup(s->parameters.tls_authz);
>[...]
> 
>return params;
>}
> 
> Note we ignore s->parameters.has_tls_authz.
> 
> If @tls_authz is should be present in params exactly when it is present
> in s->params, we should do this:
> 
>params->has_tls_authz = s->parameters.has_tls_authz;
>params->tls_authz = g_strdup(s->parameters.tls_authz);
> 
> If @tls_authz is should be present exactly when it's not null, we should
> do this:
> 
>params->has_tls_authz = !!s->parameters.tls_authz;
>params->tls_authz = g_strdup(s->parameters.tls_authz);
> 
> If @tls_authz should always be present, we need to substitute the null
> pointer by a suitable string, like this:
> 
>params->has_tls_authz = true;
>params->tls_authz = s->parameters.tls_authz
>? g_strdup(s->parameters.tls_authz) : "";
> 
> The /* TODO use QAPI_CLONE() instead of duplicating it inline */
> suggests yet another possible fix.
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH] hw/rdma/vmw/pvrdma_dev_ring: Replace strncpy with pstrcpy

2020-03-20 Thread Julia Suvorova
On Fri, Mar 20, 2020 at 4:20 PM Stefan Hajnoczi  wrote:
>
> On Wed, Mar 18, 2020 at 02:48:49PM +0100, Julia Suvorova wrote:
> > ring->name is defined as 'char name[MAX_RING_NAME_SZ]'. Replace untruncated
> > strncpy with QEMU function.
> > This case prevented QEMU from compiling with --enable-sanitizers.
> >
> > Signed-off-by: Julia Suvorova 
> > ---
> >  hw/rdma/vmw/pvrdma_dev_ring.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Here is my equivalent patch
> <20200316160702.478964-3-stefa...@redhat.com> but feel free to merge
> this.

Oops, sorry, I guess I didn't search carefully enough for the solution.

> Reviewed-by: Stefan Hajnoczi 




Re: [PATCH] Update copyright date for user-facing copyright strings

2020-03-20 Thread Philippe Mathieu-Daudé

On 3/16/20 12:20 PM, Peter Maydell wrote:

Update the copyright date to 2020 for the copyright strings which are
user-facing and represent overall copyright info for all of QEMU.

Reported-by: John Arbuckle 
Signed-off-by: Peter Maydell 
---
  include/qemu-common.h | 2 +-
  docs/conf.py  | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index 082da59e852..d0142f29ac1 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -13,7 +13,7 @@
  #define TFR(expr) do { if ((expr) != -1) break; } while (errno == EINTR)
  
  /* Copyright string for -version arguments, About dialogs, etc */

-#define QEMU_COPYRIGHT "Copyright (c) 2003-2019 " \
+#define QEMU_COPYRIGHT "Copyright (c) 2003-2020 " \
  "Fabrice Bellard and the QEMU Project developers"
  
  /* Bug reporting information for --help arguments, About dialogs, etc */

diff --git a/docs/conf.py b/docs/conf.py
index 960043cb860..af55f506d5d 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -80,7 +80,7 @@ master_doc = 'index'
  
  # General information about the project.

  project = u'QEMU'
-copyright = u'2019, The QEMU Project Developers'
+copyright = u'2020, The QEMU Project Developers'


Ah, more complete than 
https://www.mail-archive.com/qemu-devel@nongnu.org/msg688687.html, thanks.


Reviewed-by: Philippe Mathieu-Daudé 


  author = u'The QEMU Project Developers'
  
  # The version info for the project you're documenting, acts as replacement for







Re: [PATCH 1/1] s390/ipl: fix off-by-one in update_machine_ipl_properties()

2020-03-20 Thread Cornelia Huck
On Fri, 20 Mar 2020 15:31:01 +0100
Halil Pasic  wrote:

> In update_machine_ipl_properties() the array ascii_loadparm needs to
> hold the 8 char lodparm and a string terminating zero char.

s/lodparm/loadparm/

> Let's increase the size of ascii_loadparm accordingly.
> 
> Signed-off-by: Halil Pasic 
> Fixes: 0a01e082a4 ("s390/ipl: sync back loadparm")

Fixes: Coverity CID 1421966

> Reported-by: Peter Maydell 
> ---
>  hw/s390x/ipl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
> index b81942e1e6..8c3e019571 100644
> --- a/hw/s390x/ipl.c
> +++ b/hw/s390x/ipl.c
> @@ -546,7 +546,7 @@ static void 
> update_machine_ipl_properties(IplParameterBlock *iplb)
>  /* Sync loadparm */
>  if (iplb->flags & DIAG308_FLAGS_LP_VALID) {
>  uint8_t *ebcdic_loadparm = iplb->loadparm;
> -char ascii_loadparm[8];
> +char ascii_loadparm[9];
>  int i;
>  
>  for (i = 0; i < 8 && ebcdic_loadparm[i]; i++) {
> 
> base-commit: 226cd20706e20264c176f8edbaf17d7c9b7ade4a

Thanks, queued to s390-fixes.




Re: [RFC PATCH for 5.0] configure: disable MTTCG for MIPS guests

2020-03-20 Thread Alex Bennée


Aleksandar Markovic  writes:

> On Fri, 20 Mar 2020 at 12:45, Alex Bennée  wrote:
>>
>> While debugging check-acceptance failures I found an instability in
>> the mips64el test case. Briefly the test case:
>>
>>   retry.py -n 100 -c -- ./mips64el-softmmu/qemu-system-mips64el \
>> -display none -vga none -serial mon:stdio \
>> -machine malta -kernel ./vmlinux-4.7.0-rc1.I6400 \
>> -cpu I6400 -smp 8 -vga std \
>> -append "printk.time=0 clocksource=GIC console=tty0 console=ttyS0 
>> panic=-1" \
>> --no-reboot
>>
>
> Thanks for the findings!
>
> Could you perhaps attach or link to "retry.py"?

Sure - it's just a noddy python script which I use for repeated testing:

  https://github.com/stsquad/retry

> Did you run this particular test for the first time now, or did it use to
> pass before?

I only noticed it once it was added to check-acceptance, and it has been
flaky ever since it was added, I think.

>
> Thanks,
> Aleksandar
>
>> Reports about a 9% failure rate:
>>
>>   Results summary:
>>   0: 91 times (91.00%), avg time 5.547 (0.45 varience/0.67 deviation)
>>   -6: 9 times (9.00%), avg time 3.394 (0.02 varience/0.13 deviation)
>>   Ran command 100 times, 91 passes
>>
>> When re-run with "--accel tcg,thread=single" the instability goes
>> away.
>>
>>   Results summary:
>>   0: 100 times (100.00%), avg time 17.318 (249.76 varience/15.80 deviation)
>>   Ran command 100 times, 100 passes
>>
>> Which seems to indicate there is some aspect of the MIPS MTTCG fixes
>> that has been missed. Ideally we would fix that but I'm afraid I don't
>> have time to investigate and am not super familiar with the
>> architecture anyway.
>>
>> I've disabled all the mips guests as I assume it's a fundamental
>> synchronisation primitive that is broken but I haven't tested them all
>> (there are a lot!).
>>
>> Signed-off-by: Alex Bennée 
>> Cc: Aleksandar Markovic 
>> Cc: Aurelien Jarno 
>> Cc: Aleksandar Rikalo 
>> Cc: Philippe Mathieu-Daudé 
>> ---
>>  configure | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/configure b/configure
>> index 206d22c5153..002792d21dc 100755
>> --- a/configure
>> +++ b/configure
>> @@ -7832,19 +7832,19 @@ case "$target_name" in
>>  echo "TARGET_ABI32=y" >> $config_target_mak
>>;;
>>mips|mipsel)
>> -mttcg="yes"
>> +mttcg="no"
>>  TARGET_ARCH=mips
>>  echo "TARGET_ABI_MIPSO32=y" >> $config_target_mak
>>;;
>>mipsn32|mipsn32el)
>> -mttcg="yes"
>> +mttcg="no"
>>  TARGET_ARCH=mips64
>>  TARGET_BASE_ARCH=mips
>>  echo "TARGET_ABI_MIPSN32=y" >> $config_target_mak
>>  echo "TARGET_ABI32=y" >> $config_target_mak
>>;;
>>mips64|mips64el)
>> -mttcg="yes"
>> +mttcg="no"
>>  TARGET_ARCH=mips64
>>  TARGET_BASE_ARCH=mips
>>  echo "TARGET_ABI_MIPSN64=y" >> $config_target_mak
>> --
>> 2.20.1
>>


-- 
Alex Bennée



Qemu on Windows 10 - no acceleration found

2020-03-20 Thread Jerry Geis
Hi All,

I have tried QEMU on a Windows 10 host, with and without Hyper-V active in
the Windows features list.
Neither seemed to affect the "really slow" speed. Either option results in
-enable-kvm giving "no acceleration found".

How do I enable acceleration in QEMU for Windows?

Jerry


Re: [PATCH v2 4/6] linux-user/flatload.c: Use "" for include of QEMU header target_flat.h

2020-03-20 Thread Richard Henderson
On 3/19/20 12:33 PM, Peter Maydell wrote:
> The target_flat.h file is a QEMU header, so we should include it using
> quotes, not angle brackets.
> 
> Coverity otherwise is unable to find the header:
> 
> "../linux-user/flatload.c", line 40: error #1712: cannot open source file
>   "target_flat.h"
>   #include 
>   ^
> 
> because the relevant directory is only on the -iquote path, not the -I path.
> 
> Signed-off-by: Peter Maydell 
> ---
> I don't know why Coverity in particular has trouble here but
> real compilers don't. Still, the "" is the right thing.
> ---
>  linux-user/flatload.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Richard Henderson 

r~




Re: [PATCH v2 2/6] thread.h: Fix Coverity version of qemu_cond_timedwait()

2020-03-20 Thread Richard Henderson
On 3/19/20 12:33 PM, Peter Maydell wrote:
> For Coverity's benefit, we provide simpler versions of functions like
> qemu_mutex_lock(), qemu_cond_wait() and qemu_cond_timedwait().  When
> we added qemu_cond_timedwait() in commit 3dcc9c6ec4ea, a cut and
> paste error meant that the Coverity version of qemu_cond_timedwait()
> was using the wrong _impl function, which makes the Coverity parser
> complain:
> 
> "/qemu/include/qemu/thread.h", line 159: warning #140: too many arguments in
>   function call
>   return qemu_cond_timedwait(cond, mutex, ms);
>  ^
> 
> "/qemu/include/qemu/thread.h", line 159: warning #120: return value type does
>   not match the function type
>   return qemu_cond_timedwait(cond, mutex, ms);
>  ^
> 
> "/qemu/include/qemu/thread.h", line 156: warning #1563: function
>   "qemu_cond_timedwait" not emitted, consider modeling it or review
>   parse diagnostics to improve fidelity
>   static inline bool (qemu_cond_timedwait)(QemuCond *cond, QemuMutex *mutex,
>   ^
> 
> These aren't fatal, but reduce the scope of the analysis. Fix the error.
> 
> Signed-off-by: Peter Maydell 
> ---
>  include/qemu/thread.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Richard Henderson 

r~




Re: [PATCH v2 3/6] thread.h: Remove trailing semicolons from Coverity qemu_mutex_lock() etc

2020-03-20 Thread Richard Henderson
On 3/19/20 12:33 PM, Peter Maydell wrote:
> All the Coverity-specific definitions of qemu_mutex_lock() and friends
> have a trailing semicolon. This works fine almost everywhere because
> of QEMU's mandatory-braces coding style and because most callsites are
> simple, but target/s390x/sigp.c has a use of qemu_mutex_trylock() as
> an if() statement, which makes the ';' a syntax error:
> "../target/s390x/sigp.c", line 461: warning #18: expected a ")"
>   if (qemu_mutex_trylock(&qemu_sigp_mutex)) {
>   ^
> 
> Remove the bogus semicolons from the macro definitions.
> 
> Signed-off-by: Peter Maydell 
> ---
>  include/qemu/thread.h | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)

Reviewed-by: Richard Henderson 

r~




Re: [PATCH v2 1/6] osdep.h: Drop no-longer-needed Coverity workarounds

2020-03-20 Thread Richard Henderson
On 3/19/20 12:33 PM, Peter Maydell wrote:
> In commit a1a98357e3fd in 2018 we added some workarounds for Coverity
> not being able to handle the _Float* types introduced by recent
> glibc.  Newer versions of the Coverity scan tools have support for
> these types, and will fail with errors about duplicate typedefs if we
> have our workaround.  Remove our copy of the typedefs.
> 
> Signed-off-by: Peter Maydell 

Reviewed-by: Richard Henderson 

r~



Re: [RFC PATCH for 5.0] configure: disable MTTCG for MIPS guests

2020-03-20 Thread Aleksandar Markovic
On Fri, 20 Mar 2020 at 12:45, Alex Bennée  wrote:
>
> While debugging check-acceptance failures I found an instability in
> the mips64el test case. Briefly the test case:
>
>   retry.py -n 100 -c -- ./mips64el-softmmu/qemu-system-mips64el \
> -display none -vga none -serial mon:stdio \
> -machine malta -kernel ./vmlinux-4.7.0-rc1.I6400 \
> -cpu I6400 -smp 8 -vga std \
> -append "printk.time=0 clocksource=GIC console=tty0 console=ttyS0 
> panic=-1" \
> --no-reboot
>

Thanks for the findings!

Could you perhaps attach or link to "retry.py"?

Did you run this particular test for the first time now, or did it use to
pass before?

Thanks,
Aleksandar

> Reports about a 9% failure rate:
>
>   Results summary:
>   0: 91 times (91.00%), avg time 5.547 (0.45 varience/0.67 deviation)
>   -6: 9 times (9.00%), avg time 3.394 (0.02 varience/0.13 deviation)
>   Ran command 100 times, 91 passes
>
> When re-run with "--accel tcg,thread=single" the instability goes
> away.
>
>   Results summary:
>   0: 100 times (100.00%), avg time 17.318 (249.76 varience/15.80 deviation)
>   Ran command 100 times, 100 passes
>
> Which seems to indicate there is some aspect of the MIPS MTTCG fixes
> that has been missed. Ideally we would fix that but I'm afraid I don't
> have time to investigate and am not super familiar with the
> architecture anyway.
>
> I've disabled all the mips guests as I assume it's a fundamental
> synchronisation primitive that is broken but I haven't tested them all
> (there are a lot!).
>
> Signed-off-by: Alex Bennée 
> Cc: Aleksandar Markovic 
> Cc: Aurelien Jarno 
> Cc: Aleksandar Rikalo 
> Cc: Philippe Mathieu-Daudé 
> ---
>  configure | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/configure b/configure
> index 206d22c5153..002792d21dc 100755
> --- a/configure
> +++ b/configure
> @@ -7832,19 +7832,19 @@ case "$target_name" in
>  echo "TARGET_ABI32=y" >> $config_target_mak
>;;
>mips|mipsel)
> -mttcg="yes"
> +mttcg="no"
>  TARGET_ARCH=mips
>  echo "TARGET_ABI_MIPSO32=y" >> $config_target_mak
>;;
>mipsn32|mipsn32el)
> -mttcg="yes"
> +mttcg="no"
>  TARGET_ARCH=mips64
>  TARGET_BASE_ARCH=mips
>  echo "TARGET_ABI_MIPSN32=y" >> $config_target_mak
>  echo "TARGET_ABI32=y" >> $config_target_mak
>;;
>mips64|mips64el)
> -mttcg="yes"
> +mttcg="no"
>  TARGET_ARCH=mips64
>  TARGET_BASE_ARCH=mips
>  echo "TARGET_ABI_MIPSN64=y" >> $config_target_mak
> --
> 2.20.1
>



Re: [RFC PATCH for 5.0] configure: disable MTTCG for MIPS guests

2020-03-20 Thread Aleksandar Markovic
On Fri, 20 Mar 2020 at 18:08, Aleksandar Markovic <
aleksandar.qemu.de...@gmail.com> wrote:
>
> On Fri, 20 Mar 2020 at 12:45, Alex Bennée  wrote:
> >
> > While debugging check-acceptance failures I found an instability in
> > the mips64el test case. Briefly the test case:
> >
> >   retry.py -n 100 -c -- ./mips64el-softmmu/qemu-system-mips64el \
> > -display none -vga none -serial mon:stdio \
> > -machine malta -kernel ./vmlinux-4.7.0-rc1.I6400 \
> > -cpu I6400 -smp 8 -vga std \
> > -append "printk.time=0 clocksource=GIC console=tty0 console=ttyS0
panic=-1" \
> > --no-reboot
> >
>
> Thanks for the findings!
>
> Could you perhaps attach or link to "retry.py"?
>

Is this the script you used:

https://github.com/stsquad/retry/blob/master/retry.py

> Did you run this particular test for the first time now, or did it use to
> pass before?
>
> Thanks,
> Aleksandar
>
> > Reports about a 9% failure rate:
> >
> >   Results summary:
> >   0: 91 times (91.00%), avg time 5.547 (0.45 varience/0.67 deviation)
> >   -6: 9 times (9.00%), avg time 3.394 (0.02 varience/0.13 deviation)
> >   Ran command 100 times, 91 passes
> >
> > When re-run with "--accel tcg,thread=single" the instability goes
> > away.
> >
> >   Results summary:
> >   0: 100 times (100.00%), avg time 17.318 (249.76 varience/15.80
deviation)
> >   Ran command 100 times, 100 passes
> >
> > Which seems to indicate there is some aspect of the MIPS MTTCG fixes
> > that has been missed. Ideally we would fix that but I'm afraid I don't
> > have time to investigate and am not super familiar with the
> > architecture anyway.
> >
> > I've disabled all the mips guests as I assume it's a fundamental
> > synchronisation primitive that is broken but I haven't tested them all
> > (there are a lot!).
> >
> > Signed-off-by: Alex Bennée 
> > Cc: Aleksandar Markovic 
> > Cc: Aurelien Jarno 
> > Cc: Aleksandar Rikalo 
> > Cc: Philippe Mathieu-Daudé 
> > ---
> >  configure | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/configure b/configure
> > index 206d22c5153..002792d21dc 100755
> > --- a/configure
> > +++ b/configure
> > @@ -7832,19 +7832,19 @@ case "$target_name" in
> >  echo "TARGET_ABI32=y" >> $config_target_mak
> >;;
> >mips|mipsel)
> > -mttcg="yes"
> > +mttcg="no"
> >  TARGET_ARCH=mips
> >  echo "TARGET_ABI_MIPSO32=y" >> $config_target_mak
> >;;
> >mipsn32|mipsn32el)
> > -mttcg="yes"
> > +mttcg="no"
> >  TARGET_ARCH=mips64
> >  TARGET_BASE_ARCH=mips
> >  echo "TARGET_ABI_MIPSN32=y" >> $config_target_mak
> >  echo "TARGET_ABI32=y" >> $config_target_mak
> >;;
> >mips64|mips64el)
> > -mttcg="yes"
> > +mttcg="no"
> >  TARGET_ARCH=mips64
> >  TARGET_BASE_ARCH=mips
> >  echo "TARGET_ABI_MIPSN64=y" >> $config_target_mak
> > --
> > 2.20.1
> >


[RFC v6 24/24] hw/arm/smmuv3: Allow MAP notifiers

2020-03-20 Thread Eric Auger
We now have all the building blocks to support nested paging. Nested
paging uses MAP notifiers to map the MSIs, so let's allow MAP
notifiers to be registered.

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 6db3d2f218..dc716d7d59 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1537,14 +1537,6 @@ static int smmuv3_notify_flag_changed(IOMMUMemoryRegion 
*iommu,
 SMMUv3State *s3 = sdev->smmu;
 SMMUState *s = &(s3->smmu_state);
 
-if (new & IOMMU_NOTIFIER_MAP) {
-error_setg(errp,
-   "device %02x.%02x.%x requires iommu MAP notifier which is "
-   "not currently supported", pci_bus_num(sdev->bus),
-   PCI_SLOT(sdev->devfn), PCI_FUNC(sdev->devfn));
-return -EINVAL;
-}
-
 if (old == IOMMU_NOTIFIER_NONE) {
 trace_smmuv3_notify_flag_add(iommu->parent_obj.name);
 QLIST_INSERT_HEAD(&s->devices_with_notifiers, sdev, next);
-- 
2.20.1




[RFC v6 23/24] hw/arm/smmuv3: Implement fault injection

2020-03-20 Thread Eric Auger
We convert iommu_fault structs received from the kernel
into the data struct used by the emulation code and record
the events into the virtual event queue.

Signed-off-by: Eric Auger 

---

v3 -> v4:
- fix compil issue on mingw

Exhaustive mapping remains to be done
---
 hw/arm/smmuv3.c | 71 +
 1 file changed, 71 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 7a805030e2..6db3d2f218 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1569,6 +1569,76 @@ static int smmuv3_get_attr(IOMMUMemoryRegion *iommu,
 return -EINVAL;
 }
 
+struct iommu_fault;
+
+static inline int
+smmuv3_inject_faults(IOMMUMemoryRegion *iommu_mr, int count,
+ struct iommu_fault *buf)
+{
+#ifdef __linux__
+SMMUDevice *sdev = container_of(iommu_mr, SMMUDevice, iommu);
+SMMUv3State *s3 = sdev->smmu;
+uint32_t sid = smmu_get_sid(sdev);
+int i;
+
+for (i = 0; i < count; i++) {
+SMMUEventInfo info = {};
+struct iommu_fault_unrecoverable *record;
+
+if (buf[i].type != IOMMU_FAULT_DMA_UNRECOV) {
+continue;
+}
+
+info.sid = sid;
+record = &buf[i].event;
+
+switch (record->reason) {
+case IOMMU_FAULT_REASON_PASID_INVALID:
+info.type = SMMU_EVT_C_BAD_SUBSTREAMID;
+/* TODO further fill info.u.c_bad_substream */
+break;
+case IOMMU_FAULT_REASON_PASID_FETCH:
+info.type = SMMU_EVT_F_CD_FETCH;
+break;
+case IOMMU_FAULT_REASON_BAD_PASID_ENTRY:
+info.type = SMMU_EVT_C_BAD_CD;
+/* TODO further fill info.u.c_bad_cd */
+break;
+case IOMMU_FAULT_REASON_WALK_EABT:
+info.type = SMMU_EVT_F_WALK_EABT;
+info.u.f_walk_eabt.addr = record->addr;
+info.u.f_walk_eabt.addr2 = record->fetch_addr;
+break;
+case IOMMU_FAULT_REASON_PTE_FETCH:
+info.type = SMMU_EVT_F_TRANSLATION;
+info.u.f_translation.addr = record->addr;
+break;
+case IOMMU_FAULT_REASON_OOR_ADDRESS:
+info.type = SMMU_EVT_F_ADDR_SIZE;
+info.u.f_addr_size.addr = record->addr;
+break;
+case IOMMU_FAULT_REASON_ACCESS:
+info.type = SMMU_EVT_F_ACCESS;
+info.u.f_access.addr = record->addr;
+break;
+case IOMMU_FAULT_REASON_PERMISSION:
+info.type = SMMU_EVT_F_PERMISSION;
+info.u.f_permission.addr = record->addr;
+break;
+default:
+warn_report("%s Unexpected fault reason received from host: %d",
+__func__, record->reason);
+continue;
+}
+
+smmuv3_record_event(s3, &info);
+}
+return 0;
+#else
+return -1;
+#endif
+}
+
 static void smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
   void *data)
 {
@@ -1577,6 +1647,7 @@ static void 
smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
 imrc->translate = smmuv3_translate;
 imrc->notify_flag_changed = smmuv3_notify_flag_changed;
 imrc->get_attr = smmuv3_get_attr;
+imrc->inject_faults = smmuv3_inject_faults;
 }
 
 static const TypeInfo smmuv3_type_info = {
-- 
2.20.1




[RFC v6 22/24] hw/arm/smmuv3: Pass stage 1 configurations to the host

2020-03-20 Thread Eric Auger
In case PASID PciOps are set for the device, we call
the set_pasid_table() callback on each STE update.

This allows the guest stage 1 configuration to be passed
to the host and applied at the physical level.

Signed-off-by: Eric Auger 

---

v4 -> v5:
- Use PciOps instead of config notifiers

v3 -> v4:
- fix compile issue with mingw

v2 -> v3:
- adapt to pasid_cfg field changes. Use local variable
- add trace event
- set version fields
- use CONFIG_PASID

v1 -> v2:
- do not notify anymore on CD change. Anyway the smmuv3 linux
  driver is not sending any CD invalidation commands. If we were
  to propagate CD invalidation commands, we would use the
  CACHE_INVALIDATE VFIO ioctl.
- notify a precise config flags to prepare for addition of new
  flags
---
 hw/arm/smmuv3.c | 77 +++--
 hw/arm/trace-events |  1 +
 2 files changed, 61 insertions(+), 17 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index edd76bce4c..7a805030e2 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -16,6 +16,10 @@
  * with this program; if not, see .
  */
 
+#ifdef __linux__
+#include "linux/iommu.h"
+#endif
+
 #include "qemu/osdep.h"
 #include "hw/irq.h"
 #include "hw/sysbus.h"
@@ -861,6 +865,60 @@ static void smmuv3_inv_notifiers_iova(SMMUState *s, int 
asid,
 }
 }
 
+static void smmuv3_notify_config_change(SMMUState *bs, uint32_t sid)
+{
+#ifdef __linux__
+IOMMUMemoryRegion *mr = smmu_iommu_mr(bs, sid);
+SMMUEventInfo event = {.type = SMMU_EVT_NONE, .sid = sid,
+   .inval_ste_allowed = true};
+IOMMUConfig iommu_config;
+SMMUTransCfg *cfg;
+SMMUDevice *sdev;
+
+if (!mr) {
+return;
+}
+
+sdev = container_of(mr, SMMUDevice, iommu);
+
+/* flush QEMU config cache */
+smmuv3_flush_config(sdev);
+
+if (!pci_device_is_pasid_ops_set(sdev->bus, sdev->devfn)) {
+return;
+}
+
+cfg = smmuv3_get_config(sdev, &event);
+
+if (!cfg) {
+return;
+}
+
+iommu_config.pasid_cfg.version = PASID_TABLE_CFG_VERSION_1;
+iommu_config.pasid_cfg.format = IOMMU_PASID_FORMAT_SMMUV3;
+iommu_config.pasid_cfg.base_ptr = cfg->s1ctxptr;
+iommu_config.pasid_cfg.pasid_bits = 0;
+iommu_config.pasid_cfg.smmuv3.version = PASID_TABLE_SMMUV3_CFG_VERSION_1;
+
+if (cfg->disabled || cfg->bypassed) {
+iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_BYPASS;
+} else if (cfg->aborted) {
+iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_ABORT;
+} else {
+iommu_config.pasid_cfg.config = IOMMU_PASID_CONFIG_TRANSLATE;
+}
+
+trace_smmuv3_notify_config_change(mr->parent_obj.name,
+  iommu_config.pasid_cfg.config,
+  iommu_config.pasid_cfg.base_ptr);
+
+if (pci_device_set_pasid_table(sdev->bus, sdev->devfn, &iommu_config)) {
+error_report("Failed to pass PASID table to host for iommu mr %s (%m)",
+ mr->parent_obj.name);
+}
+#endif
+}
+
 static int smmuv3_cmdq_consume(SMMUv3State *s)
 {
 SMMUState *bs = ARM_SMMU(s);
@@ -911,22 +969,14 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 case SMMU_CMD_CFGI_STE:
 {
 uint32_t sid = CMD_SID(&cmd);
-IOMMUMemoryRegion *mr = smmu_iommu_mr(bs, sid);
-SMMUDevice *sdev;
 
 if (CMD_SSEC(&cmd)) {
 cmd_error = SMMU_CERROR_ILL;
 break;
 }
 
-if (!mr) {
-break;
-}
-
 trace_smmuv3_cmdq_cfgi_ste(sid);
-sdev = container_of(mr, SMMUDevice, iommu);
-smmuv3_flush_config(sdev);
-
+smmuv3_notify_config_change(bs, sid);
 break;
 }
 case SMMU_CMD_CFGI_STE_RANGE: /* same as SMMU_CMD_CFGI_ALL */
@@ -943,14 +993,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 trace_smmuv3_cmdq_cfgi_ste_range(start, end);
 
 for (i = start; i <= end; i++) {
-IOMMUMemoryRegion *mr = smmu_iommu_mr(bs, i);
-SMMUDevice *sdev;
-
-if (!mr) {
-continue;
-}
-sdev = container_of(mr, SMMUDevice, iommu);
-smmuv3_flush_config(sdev);
+smmuv3_notify_config_change(bs, i);
 }
 break;
 }
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 3809005cba..741e645ae2 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -52,4 +52,5 @@ smmuv3_config_cache_inv(uint32_t sid) "Config cache INV for 
sid %d"
 smmuv3_notify_flag_add(const char *iommu) "ADD SMMUNotifier node for iommu 
mr=%s"
 smmuv3_notify_flag_del(const char *iommu) "DEL SMMUNotifier node for iommu 
mr=%s"
 smmuv3_inv_notifiers_iova(const char *name, uint16_t asid, uint64_t iova) 
"iommu mr=%s asid=%d iova=0x%"PRIx64
+smmuv3_notify_config_change(const char 

[RFC v6 21/24] hw/arm/smmuv3: Fill the IOTLBEntry leaf field on NH_VA invalidation

2020-03-20 Thread Eric Auger
Let's propagate the leaf attribute throughout the invalidation path.
This hint is used to reduce the scope of the invalidations to the
last level of translation. Not enforcing it induces large performance
penalties in nested mode.

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c | 16 +---
 hw/arm/trace-events |  2 +-
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 66603c1fde..edd76bce4c 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -811,8 +811,7 @@ epilogue:
  */
 static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
IOMMUNotifier *n,
-   int asid,
-   dma_addr_t iova)
+   int asid, dma_addr_t iova, bool leaf)
 {
 SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
 SMMUEventInfo event = {.inval_ste_allowed = true};
@@ -839,12 +838,14 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
 entry.addr_mask = (1 << tt->granule_sz) - 1;
 entry.perm = IOMMU_NONE;
 entry.arch_id = asid;
+entry.leaf = leaf;
 
 memory_region_notify_one(n, &entry);
 }
 
 /* invalidate an asid/iova tuple in all mr's */
-static void smmuv3_inv_notifiers_iova(SMMUState *s, int asid, dma_addr_t iova)
+static void smmuv3_inv_notifiers_iova(SMMUState *s, int asid,
+  dma_addr_t iova, bool leaf)
 {
 SMMUDevice *sdev;
 
@@ -855,7 +856,7 @@ static void smmuv3_inv_notifiers_iova(SMMUState *s, int 
asid, dma_addr_t iova)
 trace_smmuv3_inv_notifiers_iova(mr->parent_obj.name, asid, iova);
 
 IOMMU_NOTIFIER_FOREACH(n, mr) {
-smmuv3_notify_iova(mr, n, asid, iova);
+smmuv3_notify_iova(mr, n, asid, iova, leaf);
 }
 }
 }
@@ -993,9 +994,10 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 {
 dma_addr_t addr = CMD_ADDR(&cmd);
 uint16_t vmid = CMD_VMID(&cmd);
+bool leaf = CMD_LEAF(&cmd);
 
-trace_smmuv3_cmdq_tlbi_nh_vaa(vmid, addr);
-smmuv3_inv_notifiers_iova(bs, -1, addr);
+trace_smmuv3_cmdq_tlbi_nh_vaa(vmid, addr, leaf);
+smmuv3_inv_notifiers_iova(bs, -1, addr, leaf);
 smmu_iotlb_inv_all(bs);
 break;
 }
@@ -1007,7 +1009,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 bool leaf = CMD_LEAF(&cmd);
 
 trace_smmuv3_cmdq_tlbi_nh_va(vmid, asid, addr, leaf);
-smmuv3_inv_notifiers_iova(bs, asid, addr);
+smmuv3_inv_notifiers_iova(bs, asid, addr, leaf);
 smmu_iotlb_inv_iova(bs, asid, addr);
 break;
 }
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 0acedcedc6..3809005cba 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -43,7 +43,7 @@ smmuv3_cmdq_cfgi_cd(uint32_t sid) "streamid = %d"
 smmuv3_config_cache_hit(uint32_t sid, uint32_t hits, uint32_t misses, uint32_t 
perc) "Config cache HIT for sid %d (hits=%d, misses=%d, hit rate=%d)"
 smmuv3_config_cache_miss(uint32_t sid, uint32_t hits, uint32_t misses, 
uint32_t perc) "Config cache MISS for sid %d (hits=%d, misses=%d, hit rate=%d)"
 smmuv3_cmdq_tlbi_nh_va(int vmid, int asid, uint64_t addr, bool leaf) "vmid =%d 
asid =%d addr=0x%"PRIx64" leaf=%d"
-smmuv3_cmdq_tlbi_nh_vaa(int vmid, uint64_t addr) "vmid =%d addr=0x%"PRIx64
+smmuv3_cmdq_tlbi_nh_vaa(int vmid, uint64_t addr, bool leaf) "vmid =%d 
addr=0x%"PRIx64" leaf=%d"
 smmuv3_cmdq_tlbi_nh(void) ""
 smmuv3_cmdq_tlbi_nh_asid(uint16_t asid) "asid=%d"
 smmu_iotlb_cache_hit(uint16_t asid, uint64_t addr, uint32_t hit, uint32_t 
miss, uint32_t p) "IOTLB cache HIT asid=%d addr=0x%"PRIx64" hit=%d miss=%d hit 
rate=%d"
-- 
2.20.1




[RFC v6 19/24] hw/arm/smmuv3: Store the PASID table GPA in the translation config

2020-03-20 Thread Eric Auger
For VFIO integration we will need to pass the Context Descriptor (CD)
table GPA to the host. The CD table is also referred to as the PASID
table. Its GPA corresponds to the s1ctxptr field of the Stream Table
Entry. So let's decode and store it in the configuration structure.

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c  | 1 +
 include/hw/arm/smmu-common.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 9bea5f65ae..1424e08c31 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -352,6 +352,7 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
   "SMMUv3 S1 stalling fault model not allowed yet\n");
 goto bad_ste;
 }
+cfg->s1ctxptr = STE_CTXPTR(ste);
 return 0;
 
 bad_ste:
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index 1f37844e5c..353668f4ea 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -68,6 +68,7 @@ typedef struct SMMUTransCfg {
 uint8_t tbi;   /* Top Byte Ignore */
 uint16_t asid;
 SMMUTransTableInfo tt[2];
+dma_addr_t s1ctxptr;
 uint32_t iotlb_hits;   /* counts IOTLB hits for this asid */
 uint32_t iotlb_misses; /* counts IOTLB misses for this asid */
 } SMMUTransCfg;
-- 
2.20.1




[RFC v6 18/24] hw/arm/smmuv3: Advertise MSI_TRANSLATE attribute

2020-03-20 Thread Eric Auger
The SMMUv3 has the peculiarity of translating MSI
transactions. Let's advertise the corresponding
attribute.

Signed-off-by: Eric Auger 

---
---
 hw/arm/smmuv3.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index e33eabd028..9bea5f65ae 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1515,6 +1515,9 @@ static int smmuv3_get_attr(IOMMUMemoryRegion *iommu,
 if (attr == IOMMU_ATTR_VFIO_NESTED) {
 *(bool *) data = true;
 return 0;
+} else if (attr == IOMMU_ATTR_MSI_TRANSLATE) {
+*(bool *) data = true;
+return 0;
 }
 return -EINVAL;
 }
-- 
2.20.1




Re: [PATCH v2 1/2] lockable: fix __COUNTER__ macro to be referenced properly

2020-03-20 Thread Richard Henderson
On 3/19/20 4:34 PM, dnbrd...@gmail.com wrote:
> From: Daniel Brodsky 
> 
> - __COUNTER__ doesn't work with ## concat
> - replaced ## with glue() macro so __COUNTER__ is evaluated
> 
> Fixes: 3284c3ddc4
> 
> Signed-off-by: Daniel Brodsky 
> ---
>  include/qemu/lockable.h | 2 +-
>  include/qemu/rcu.h  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson 

r~



[RFC v6 16/24] vfio/pci: Set up the DMA FAULT region

2020-03-20 Thread Eric Auger
Set up the fault region, which is composed of the actual fault
queue (mmappable) and a header used to handle it. The fault
queue is mmapped.

Signed-off-by: Eric Auger 

---

v4 -> v5:
- use a single DMA FAULT region. No version selection anymore
---
 hw/vfio/pci.c | 64 +++
 hw/vfio/pci.h |  1 +
 2 files changed, 65 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 7579f476b0..029652a507 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2530,11 +2530,67 @@ int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
 return 0;
 }
 
+static void vfio_init_fault_regions(VFIOPCIDevice *vdev, Error **errp)
+{
+struct vfio_region_info *fault_region_info = NULL;
+struct vfio_region_info_cap_fault *cap_fault;
+VFIODevice *vbasedev = &vdev->vbasedev;
+struct vfio_info_cap_header *hdr;
+char *fault_region_name;
+int ret;
+
+ret = vfio_get_dev_region_info(&vdev->vbasedev,
+   VFIO_REGION_TYPE_NESTED,
+   VFIO_REGION_SUBTYPE_NESTED_DMA_FAULT,
+   &fault_region_info);
+if (ret) {
+goto out;
+}
+
+hdr = vfio_get_region_info_cap(fault_region_info,
+   VFIO_REGION_INFO_CAP_DMA_FAULT);
+if (!hdr) {
+error_setg(errp, "failed to retrieve DMA FAULT capability");
+goto out;
+}
+cap_fault = container_of(hdr, struct vfio_region_info_cap_fault,
+ header);
+if (cap_fault->version != 1) {
+error_setg(errp, "Unsupported DMA FAULT API version %d",
+   cap_fault->version);
+goto out;
+}
+
+fault_region_name = g_strdup_printf("%s DMA FAULT %d",
+vbasedev->name,
+fault_region_info->index);
+
+ret = vfio_region_setup(OBJECT(vdev), vbasedev,
+&vdev->dma_fault_region,
+fault_region_info->index,
+fault_region_name);
+g_free(fault_region_name);
+if (ret) {
+error_setg_errno(errp, -ret,
+ "failed to set up the DMA FAULT region %d",
+ fault_region_info->index);
+goto out;
+}
+
+ret = vfio_region_mmap(&vdev->dma_fault_region);
+if (ret) {
+error_setg_errno(errp, -ret, "Failed to mmap the DMA FAULT queue");
+}
+out:
+g_free(fault_region_info);
+}
+
 static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
 {
 VFIODevice *vbasedev = &vdev->vbasedev;
 struct vfio_region_info *reg_info;
 struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
+Error *err = NULL;
 int i, ret = -1;
 
 /* Sanity check device */
@@ -2598,6 +2654,12 @@ static void vfio_populate_device(VFIOPCIDevice *vdev, 
Error **errp)
 }
 }
 
+vfio_init_fault_regions(vdev, &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
 irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
 
 ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
@@ -3200,6 +3262,7 @@ static void vfio_instance_finalize(Object *obj)
 
 vfio_display_finalize(vdev);
 vfio_bars_finalize(vdev);
+vfio_region_finalize(&vdev->dma_fault_region);
 g_free(vdev->emulated_config_bits);
 g_free(vdev->rom);
 if (vdev->migration_blocker) {
@@ -3224,6 +3287,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 vfio_unregister_req_notifier(vdev);
 vfio_unregister_err_notifier(vdev);
 vfio_unregister_ext_irq_notifiers(vdev);
+vfio_region_exit(&vdev->dma_fault_region);
 pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
 if (vdev->irqchip_change_notifier.notify) {
 kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 56f0fabb33..c5a59a8e3d 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -141,6 +141,7 @@ typedef struct VFIOPCIDevice {
 EventNotifier err_notifier;
 EventNotifier req_notifier;
 VFIOPCIExtIRQ *ext_irqs;
+VFIORegion dma_fault_region;
 int (*resetfn)(struct VFIOPCIDevice *);
 uint32_t vendor_id;
 uint32_t device_id;
-- 
2.20.1




[RFC v6 20/24] hw/arm/smmuv3: Fill the IOTLBEntry arch_id on NH_VA invalidation

2020-03-20 Thread Eric Auger
When the guest invalidates one S1 entry, it passes the ASID.
When propagating this invalidation down to the host, the ASID
information must be passed as well. So let's fill the arch_id field
introduced for that purpose.

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 1424e08c31..66603c1fde 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -838,6 +838,7 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
 entry.iova = iova;
 entry.addr_mask = (1 << tt->granule_sz) - 1;
 entry.perm = IOMMU_NONE;
+entry.arch_id = asid;
 
 memory_region_notify_one(n, &entry);
 }
-- 
2.20.1




[RFC v6 11/24] vfio: Introduce helpers to DMA map/unmap a RAM section

2020-03-20 Thread Eric Auger
Let's introduce two helpers that DMA map/unmap a RAM
section. Those helpers will be called from another call site for
nested stage setup. This also makes the structure of
vfio_listener_region_add/del() clearer.

Signed-off-by: Eric Auger 

---

v5 -> v6:
- add Error **
---
 hw/vfio/common.c | 177 ++-
 hw/vfio/trace-events |   4 +-
 2 files changed, 108 insertions(+), 73 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f20b37fbee..e067009da8 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -512,13 +512,115 @@ hostwin_from_range(VFIOContainer *container, hwaddr 
iova, hwaddr end)
 return NULL;
 }
 
+static int vfio_dma_map_ram_section(VFIOContainer *container,
+MemoryRegionSection *section, Error **err)
+{
+VFIOHostDMAWindow *hostwin;
+Int128 llend, llsize;
+hwaddr iova, end;
+void *vaddr;
+int ret;
+
+assert(memory_region_is_ram(section->mr));
+
+iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+llend = int128_make64(section->offset_within_address_space);
+llend = int128_add(llend, section->size);
+llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+end = int128_get64(int128_sub(llend, int128_one()));
+
+vaddr = memory_region_get_ram_ptr(section->mr) +
+section->offset_within_region +
+(iova - section->offset_within_address_space);
+
+hostwin = hostwin_from_range(container, iova, end);
+if (!hostwin) {
+error_setg(err, "Container %p can't map guest IOVA region"
+   " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
+return -EFAULT;
+}
+
+trace_vfio_dma_map_ram(iova, end, vaddr);
+
+llsize = int128_sub(llend, int128_make64(iova));
+
+if (memory_region_is_ram_device(section->mr)) {
+hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
+
+if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
+trace_vfio_listener_region_add_no_dma_map(
+memory_region_name(section->mr),
+section->offset_within_address_space,
+int128_getlo(section->size),
+pgmask + 1);
+return 0;
+}
+}
+
+ret = vfio_dma_map(container, iova, int128_get64(llsize),
+   vaddr, section->readonly);
+if (ret) {
+error_setg(err, "vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+   "0x%"HWADDR_PRIx", %p) = %d (%m)",
+   container, iova, int128_get64(llsize), vaddr, ret);
+if (memory_region_is_ram_device(section->mr)) {
+/* Allow unexpected mappings not to be fatal for RAM devices */
+error_report_err(*err);
+return 0;
+}
+return ret;
+}
+return 0;
+}
+
+static void vfio_dma_unmap_ram_section(VFIOContainer *container,
+   MemoryRegionSection *section)
+{
+Int128 llend, llsize;
+hwaddr iova, end;
+bool try_unmap = true;
+int ret;
+
+iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+llend = int128_make64(section->offset_within_address_space);
+llend = int128_add(llend, section->size);
+llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+
+if (int128_ge(int128_make64(iova), llend)) {
+return;
+}
+end = int128_get64(int128_sub(llend, int128_one()));
+
+llsize = int128_sub(llend, int128_make64(iova));
+
+trace_vfio_dma_unmap_ram(iova, end);
+
+if (memory_region_is_ram_device(section->mr)) {
+hwaddr pgmask;
+VFIOHostDMAWindow *hostwin = hostwin_from_range(container, iova, end);
+
+assert(hostwin); /* or region_add() would have failed */
+
+pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
+try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
+}
+
+if (try_unmap) {
+ret = vfio_dma_unmap(container, iova, int128_get64(llsize));
+if (ret) {
+error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx") = %d (%m)",
+ container, iova, int128_get64(llsize), ret);
+}
+}
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
 VFIOContainer *container = container_of(listener, VFIOContainer, listener);
 hwaddr iova, end;
-Int128 llend, llsize;
-void *vaddr;
+Int128 llend;
 int ret;
 VFIOHostDMAWindow *hostwin;
 Error *err = NULL;
@@ -655,39 +757,7 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 }
 
 /* Here we assume that memory_region_is_ram(section->mr)==true */
-
-vaddr = memory_region_get_ram_ptr(section->mr) +
-section->offset_within_region +
-(iova - section->offset_within_address_space);
-
-trac

[RFC v6 13/24] vfio: Pass stage 1 MSI bindings to the host

2020-03-20 Thread Eric Auger
We register the stage1 MSI bindings when enabling the vectors
and we unregister them on container disconnection.

Signed-off-by: Eric Auger 

---

v4 -> v5:
- use VFIO_IOMMU_SET_MSI_BINDING

v2 -> v3:
- only register the notifier if the IOMMU translates MSIs
- record the msi bindings in a container list and unregister on
  container release
---
 hw/vfio/common.c  | 52 +++
 hw/vfio/pci.c | 51 +-
 hw/vfio/trace-events  |  2 ++
 include/hw/vfio/vfio-common.h |  9 ++
 4 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index c0ae59bfe6..4d51b1f63b 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -484,6 +484,56 @@ static void vfio_iommu_unmap_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)
 }
 }
 
+int vfio_iommu_set_msi_binding(VFIOContainer *container,
+   IOMMUTLBEntry *iotlb)
+{
+struct vfio_iommu_type1_set_msi_binding ustruct;
+VFIOMSIBinding *binding;
+int ret;
+
+QLIST_FOREACH(binding, &container->msibinding_list, next) {
+if (binding->iova == iotlb->iova) {
+return 0;
+}
+}
+
+ustruct.argsz = sizeof(struct vfio_iommu_type1_set_msi_binding);
+ustruct.iova = iotlb->iova;
+ustruct.flags = VFIO_IOMMU_BIND_MSI;
+ustruct.gpa = iotlb->translated_addr;
+ustruct.size = iotlb->addr_mask + 1;
+ret = ioctl(container->fd, VFIO_IOMMU_SET_MSI_BINDING , &ustruct);
+if (ret) {
+error_report("%s: failed to register the stage1 MSI binding (%m)",
+ __func__);
+return ret;
+}
+binding =  g_new0(VFIOMSIBinding, 1);
+binding->iova = ustruct.iova;
+binding->gpa = ustruct.gpa;
+binding->size = ustruct.size;
+
+QLIST_INSERT_HEAD(&container->msibinding_list, binding, next);
+return 0;
+}
+
+static void vfio_container_unbind_msis(VFIOContainer *container)
+{
+VFIOMSIBinding *binding, *tmp;
+
+QLIST_FOREACH_SAFE(binding, &container->msibinding_list, next, tmp) {
+struct vfio_iommu_type1_set_msi_binding ustruct;
+
+/* the MSI doorbell is not used anymore, unregister it */
+ustruct.argsz = sizeof(struct vfio_iommu_type1_set_msi_binding);
+ustruct.flags = VFIO_IOMMU_UNBIND_MSI;
+ustruct.iova = binding->iova;
+ioctl(container->fd, VFIO_IOMMU_SET_MSI_BINDING , &ustruct);
+QLIST_REMOVE(binding, next);
+g_free(binding);
+}
+}
+
 static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
 VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
@@ -1598,6 +1648,8 @@ static void vfio_disconnect_container(VFIOGroup *group)
 g_free(giommu);
 }
 
+vfio_container_unbind_msis(container);
+
 trace_vfio_disconnect_container(container->fd);
 close(container->fd);
 g_free(container);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index fc314cc6a9..6f2d5696c3 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -377,6 +377,49 @@ static void vfio_msi_interrupt(void *opaque)
 notify(&vdev->pdev, nr);
 }
 
+static int vfio_register_msi_binding(VFIOPCIDevice *vdev, int vector_n)
+{
+VFIOContainer *container = vdev->vbasedev.group->container;
+PCIDevice *dev = &vdev->pdev;
+AddressSpace *as = pci_device_iommu_address_space(dev);
+MSIMessage msg = pci_get_msi_message(dev, vector_n);
+IOMMUMemoryRegionClass *imrc;
+IOMMUMemoryRegion *iommu_mr;
+bool msi_translate = false, nested = false;
+IOMMUTLBEntry entry;
+
+if (as == &address_space_memory) {
+return 0;
+}
+
+iommu_mr = IOMMU_MEMORY_REGION(as->root);
+memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_MSI_TRANSLATE,
+ (void *)&msi_translate);
+memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
+ (void *)&nested);
+imrc = memory_region_get_iommu_class_nocheck(iommu_mr);
+
+if (!nested || !msi_translate) {
+return 0;
+}
+
+/* MSI doorbell address is translated by an IOMMU */
+
+rcu_read_lock();
+entry = imrc->translate(iommu_mr, msg.address, IOMMU_WO, 0);
+rcu_read_unlock();
+
+if (entry.perm == IOMMU_NONE) {
+return -ENOENT;
+}
+
+trace_vfio_register_msi_binding(vdev->vbasedev.name, vector_n,
+msg.address, entry.translated_addr);
+
+vfio_iommu_set_msi_binding(container, &entry);
+return 0;
+}
+
 static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
 {
 struct vfio_irq_set *irq_set;
@@ -394,7 +437,7 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool 
msix)
 fds = (int32_t *)&irq_set->data;
 
 for (i = 0; i < vdev->nr_vectors; i++) {
-int fd = -1;
+int ret, fd = -1;
 
 /*
  * MSI vs MSI-X - The guest has direct access to MSI mask and pending

[RFC v6 14/24] vfio: Helper to get IRQ info including capabilities

2020-03-20 Thread Eric Auger
As done for VFIO regions, add helpers to retrieve IRQ info
including their optional capabilities.

Signed-off-by: Eric Auger 
---
 hw/vfio/common.c  | 97 +++
 hw/vfio/trace-events  |  1 +
 include/hw/vfio/vfio-common.h |  7 +++
 3 files changed, 105 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4d51b1f63b..327fedf7e4 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1014,6 +1014,25 @@ vfio_get_region_info_cap(struct vfio_region_info *info, 
uint16_t id)
 return NULL;
 }
 
+struct vfio_info_cap_header *
+vfio_get_irq_info_cap(struct vfio_irq_info *info, uint16_t id)
+{
+struct vfio_info_cap_header *hdr;
+void *ptr = info;
+
+if (!(info->flags & VFIO_IRQ_INFO_FLAG_CAPS)) {
+return NULL;
+}
+
+for (hdr = ptr + info->cap_offset; hdr != ptr; hdr = ptr + hdr->next) {
+if (hdr->id == id) {
+return hdr;
+}
+}
+
+return NULL;
+}
+
 static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
   struct vfio_region_info *info)
 {
@@ -1842,6 +1861,33 @@ retry:
 return 0;
 }
 
+int vfio_get_irq_info(VFIODevice *vbasedev, int index,
+  struct vfio_irq_info **info)
+{
+size_t argsz = sizeof(struct vfio_irq_info);
+
+*info = g_malloc0(argsz);
+
+(*info)->index = index;
+retry:
+(*info)->argsz = argsz;
+
+if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, *info)) {
+g_free(*info);
+*info = NULL;
+return -errno;
+}
+
+if ((*info)->argsz > argsz) {
+argsz = (*info)->argsz;
+*info = g_realloc(*info, argsz);
+
+goto retry;
+}
+
+return 0;
+}
+
 int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type,
  uint32_t subtype, struct vfio_region_info **info)
 {
@@ -1877,6 +1923,42 @@ int vfio_get_dev_region_info(VFIODevice *vbasedev, 
uint32_t type,
 return -ENODEV;
 }
 
+int vfio_get_dev_irq_info(VFIODevice *vbasedev, uint32_t type,
+  uint32_t subtype, struct vfio_irq_info **info)
+{
+int i;
+
+for (i = 0; i < vbasedev->num_irqs; i++) {
+struct vfio_info_cap_header *hdr;
+struct vfio_irq_info_cap_type *cap_type;
+
+if (vfio_get_irq_info(vbasedev, i, info)) {
+continue;
+}
+
+hdr = vfio_get_irq_info_cap(*info, VFIO_IRQ_INFO_CAP_TYPE);
+if (!hdr) {
+g_free(*info);
+continue;
+}
+
+cap_type = container_of(hdr, struct vfio_irq_info_cap_type, header);
+
+trace_vfio_get_dev_irq(vbasedev->name, i,
+   cap_type->type, cap_type->subtype);
+
+if (cap_type->type == type && cap_type->subtype == subtype) {
+return 0;
+}
+
+g_free(*info);
+}
+
+*info = NULL;
+return -ENODEV;
+}
+
+
 bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
 {
 struct vfio_region_info *info = NULL;
@@ -1892,6 +1974,21 @@ bool vfio_has_region_cap(VFIODevice *vbasedev, int 
region, uint16_t cap_type)
 return ret;
 }
 
+bool vfio_has_irq_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
+{
+struct vfio_region_info *info = NULL;
+bool ret = false;
+
+if (!vfio_get_region_info(vbasedev, region, &info)) {
+if (vfio_get_region_info_cap(info, cap_type)) {
+ret = true;
+}
+g_free(info);
+}
+
+return ret;
+}
+
 /*
  * Interfaces for IBM EEH (Enhanced Error Handling)
  */
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 5de97a8882..c04a8c12d8 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -114,6 +114,7 @@ vfio_region_mmaps_set_enabled(const char *name, bool 
enabled) "Region %s mmaps e
 vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) 
"Device %s region %d: %d sparse mmap entries"
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) 
"sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t 
subtype) "%s index %d, %08x/%0x8"
+vfio_get_dev_irq(const char *name, int index, uint32_t type, uint32_t subtype) 
"%s index %d, %08x/%0x8"
 vfio_dma_unmap_overflow_workaround(void) ""
 vfio_iommu_addr_inv_iotlb(int asid, uint64_t addr, uint64_t size, uint64_t 
nb_granules, bool leaf) "nested IOTLB invalidate asid=%d, addr=0x%"PRIx64" 
granule_size=0x%"PRIx64" nb_granules=0x%"PRIx64" leaf=%d"
 vfio_iommu_asid_inv_iotlb(int asid) "nested IOTLB invalidate asid=%d"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 8ca34146d7..2ef39cbbc3 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -200,6 +200,13 @@ int vfio_get_dev_region_info(VFIODevice *vbasedev, 
uint32_t type,
 bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type);
 struct vfio_info
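The grow-and-retry pattern used by vfio_get_irq_info() above is the standard VFIO `argsz` convention: call once with a minimal buffer, and if the kernel reports a larger `argsz`, reallocate and retry. The sketch below is a simplified standalone illustration; `IrqInfo` and `query()` are stand-ins (query() models the `VFIO_DEVICE_GET_IRQ_INFO` ioctl), not the real QEMU or kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for struct vfio_irq_info: only argsz/index matter. */
typedef struct {
    uint32_t argsz;
    uint32_t index;
} IrqInfo;

/* query() models the ioctl: it reports the size the "kernel" actually
 * needs (REQUIRED_SZ) back in argsz.  Purely illustrative. */
#define REQUIRED_SZ 64u

static int query(IrqInfo *info)
{
    info->argsz = REQUIRED_SZ;
    return 0;
}

/* Grow-and-retry: start with sizeof(IrqInfo), enlarge until the reported
 * argsz fits, mirroring the retry: loop in vfio_get_irq_info(). */
static IrqInfo *get_irq_info(uint32_t index)
{
    uint32_t argsz = sizeof(IrqInfo);
    IrqInfo *info = calloc(1, argsz);

    info->index = index;
retry:
    info->argsz = argsz;
    if (query(info)) {
        free(info);
        return NULL;
    }
    if (info->argsz > argsz) {
        argsz = info->argsz;
        info = realloc(info, argsz);
        goto retry;
    }
    return info;
}
```

The caller owns the returned buffer and frees it, as vfio_get_irq_info()'s callers do with g_free().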

[RFC v6 17/24] vfio/pci: Implement the DMA fault handler

2020-03-20 Thread Eric Auger
Whenever the eventfd is triggered, we retrieve the DMA fault(s)
from the mmapped fault region and inject them into the IOMMU
memory region.

Signed-off-by: Eric Auger 
---
 hw/vfio/pci.c | 50 ++
 hw/vfio/pci.h |  1 +
 2 files changed, 51 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 029652a507..86ee4b6b47 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2845,10 +2845,60 @@ static PCIPASIDOps vfio_pci_pasid_ops = {
 static void vfio_dma_fault_notifier_handler(void *opaque)
 {
 VFIOPCIExtIRQ *ext_irq = opaque;
+VFIOPCIDevice *vdev = ext_irq->vdev;
+PCIDevice *pdev = &vdev->pdev;
+AddressSpace *as = pci_device_iommu_address_space(pdev);
+IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(as->root);
+struct vfio_region_dma_fault header;
+struct iommu_fault *queue;
+char *queue_buffer = NULL;
+ssize_t bytes;
 
 if (!event_notifier_test_and_clear(&ext_irq->notifier)) {
 return;
 }
+
+bytes = pread(vdev->vbasedev.fd, &header, sizeof(header),
+  vdev->dma_fault_region.fd_offset);
+if (bytes != sizeof(header)) {
+error_report("%s unable to read the fault region header (0x%lx)",
+ __func__, bytes);
+return;
+}
+
+/* Normally the fault queue is mmapped */
+queue = (struct iommu_fault *)vdev->dma_fault_region.mmaps[0].mmap;
+if (!queue) {
+size_t queue_size = header.nb_entries * header.entry_size;
+
+error_report("%s: fault queue not mmapped: slower fault handling",
+ vdev->vbasedev.name);
+
+queue_buffer = g_malloc(queue_size);
+bytes =  pread(vdev->vbasedev.fd, queue_buffer, queue_size,
+   vdev->dma_fault_region.fd_offset + header.offset);
+if (bytes != queue_size) {
+error_report("%s unable to read the fault queue (0x%lx)",
+ __func__, bytes);
+return;
+}
+
+queue = (struct iommu_fault *)queue_buffer;
+}
+
+while (vdev->fault_tail_index != header.head) {
+memory_region_inject_faults(iommu_mr, 1,
+&queue[vdev->fault_tail_index]);
+vdev->fault_tail_index =
+(vdev->fault_tail_index + 1) % header.nb_entries;
+}
+bytes = pwrite(vdev->vbasedev.fd, &vdev->fault_tail_index, 4,
+   vdev->dma_fault_region.fd_offset);
+if (bytes != 4) {
+error_report("%s unable to write the fault region tail index (0x%lx)",
+ __func__, bytes);
+}
+g_free(queue_buffer);
 }
 
 static int vfio_register_ext_irq_handler(VFIOPCIDevice *vdev,
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index c5a59a8e3d..2d0b65d8ff 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -142,6 +142,7 @@ typedef struct VFIOPCIDevice {
 EventNotifier req_notifier;
 VFIOPCIExtIRQ *ext_irqs;
 VFIORegion dma_fault_region;
+uint32_t fault_tail_index;
 int (*resetfn)(struct VFIOPCIDevice *);
 uint32_t vendor_id;
 uint32_t device_id;
-- 
2.20.1
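The tail-chasing loop in vfio_dma_fault_notifier_handler() above can be sketched in isolation: entries are consumed from the software tail index up to the head reported in the region header, wrapping modulo the number of queue entries. Everything below is a simplified illustration, not the QEMU API; `inject_stub()` stands in for memory_region_inject_faults().

```c
#include <assert.h>
#include <stdint.h>

static int injected; /* counts simulated fault injections */

static void inject_stub(uint32_t idx)
{
    (void)idx;
    injected++;
}

/* Drain ring entries in [tail, head), wrapping over nb_entries slots;
 * inject() is called once per consumed index.  Returns the new tail,
 * which the real handler writes back to the fault region header. */
static uint32_t drain_fault_queue(uint32_t tail, uint32_t head,
                                  uint32_t nb_entries,
                                  void (*inject)(uint32_t))
{
    while (tail != head) {
        inject(tail);
        tail = (tail + 1) % nb_entries;
    }
    return tail;
}
```

With an 8-entry queue, tail 6 and head 2, the loop consumes indices 6, 7, 0, 1 and leaves the tail equal to the head.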




[RFC v6 10/24] vfio: Introduce hostwin_from_range helper

2020-03-20 Thread Eric Auger
Let's introduce a hostwin_from_range() helper that returns the
hostwin encapsulating an IOVA range or NULL if none is found.

This improves the readability of callers and removes the usage
of the hostwin_found flag.

Signed-off-by: Eric Auger 
---
 hw/vfio/common.c | 36 +---
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ac417b5dbd..f20b37fbee 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -499,6 +499,19 @@ out:
 rcu_read_unlock();
 }
 
+static VFIOHostDMAWindow *
+hostwin_from_range(VFIOContainer *container, hwaddr iova, hwaddr end)
+{
+VFIOHostDMAWindow *hostwin;
+
+QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
+return hostwin;
+}
+}
+return NULL;
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
@@ -508,7 +521,6 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 void *vaddr;
 int ret;
 VFIOHostDMAWindow *hostwin;
-bool hostwin_found;
 Error *err = NULL;
 
 if (vfio_listener_skipped_section(section)) {
@@ -593,15 +605,8 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 #endif
 }
 
-hostwin_found = false;
-QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
-hostwin_found = true;
-break;
-}
-}
-
-if (!hostwin_found) {
+hostwin = hostwin_from_range(container, iova, end);
+if (!hostwin) {
 error_setg(&err, "Container %p can't map guest IOVA region"
" 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
 goto fail;
@@ -774,16 +779,9 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 
 if (memory_region_is_ram_device(section->mr)) {
 hwaddr pgmask;
-VFIOHostDMAWindow *hostwin;
-bool hostwin_found = false;
+VFIOHostDMAWindow *hostwin = hostwin_from_range(container, iova, end);
 
-QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
-hostwin_found = true;
-break;
-}
-}
-assert(hostwin_found); /* or region_add() would have failed */
+assert(hostwin); /* or region_add() would have failed */
 
 pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
 try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
-- 
2.20.1
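The containment test factored out above can be sketched standalone: a host window matches only when it fully encloses the requested [iova, end] range, with both bounds inclusive. The array-based `HostWin` type below is a simplified stand-in for the QLIST-linked QEMU structure, not the real API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t min_iova;
    uint64_t max_iova;
} HostWin;

/* Return the first window enclosing [iova, end] (inclusive bounds),
 * or NULL if no window contains the whole range — mirroring the
 * check in hostwin_from_range(). */
static const HostWin *hostwin_lookup(const HostWin *wins, size_t n,
                                     uint64_t iova, uint64_t end)
{
    for (size_t i = 0; i < n; i++) {
        if (wins[i].min_iova <= iova && end <= wins[i].max_iova) {
            return &wins[i];
        }
    }
    return NULL;
}
```

A range that merely overlaps a window (e.g. starts below min_iova) does not match, which is why region_add() fails rather than splitting the mapping.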




[RFC v6 12/24] vfio: Set up nested stage mappings

2020-03-20 Thread Eric Auger
In nested mode, legacy vfio_iommu_map_notify cannot be used as
there is no "caching" mode and we do not trap on map.

On Intel, vfio_iommu_map_notify was used to DMA map the RAM
through the host single stage.

With nested mode, we need to set up the stage 2 and the stage 1
separately. This patch introduces a prereg_listener to set up
the stage 2 mapping.

The stage 1 mapping, owned by the guest, is passed to the host
when the guest invalidates the stage 1 configuration, through
a dedicated PCIPASIDOps callback. Guest IOTLB invalidations
are cascaded down to the host through another IOMMU MR UNMAP
notifier.

Signed-off-by: Eric Auger 

---

v6 -> v7:
- remove PASID based invalidation

v5 -> v6:
- add error_report_err()
- remove the abort in case of nested stage case

v4 -> v5:
- use VFIO_IOMMU_SET_PASID_TABLE
- use PCIPASIDOps for config notification

v3 -> v4:
- use iommu_inv_pasid_info for ASID invalidation

v2 -> v3:
- use VFIO_IOMMU_ATTACH_PASID_TABLE
- new user API
- handle leaf

v1 -> v2:
- adapt to uapi changes
- pass the asid
- pass IOMMU_NOTIFIER_S1_CFG when initializing the config notifier
---
 hw/vfio/common.c | 112 ---
 hw/vfio/pci.c|  21 
 hw/vfio/trace-events |   2 +
 3 files changed, 129 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index e067009da8..c0ae59bfe6 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -446,6 +446,44 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void 
**vaddr,
 return true;
 }
 
+/* Propagate a guest IOTLB invalidation to the host (nested mode) */
+static void vfio_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+{
+VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
+hwaddr start = iotlb->iova + giommu->iommu_offset;
+
+VFIOContainer *container = giommu->container;
+struct vfio_iommu_type1_cache_invalidate ustruct;
+size_t size = iotlb->addr_mask + 1;
+int ret;
+
+assert(iotlb->perm == IOMMU_NONE);
+
+ustruct.argsz = sizeof(ustruct);
+ustruct.flags = 0;
+ustruct.info.version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1;
+
+ustruct.info.cache = IOMMU_CACHE_INV_TYPE_IOTLB;
+ustruct.info.granularity = IOMMU_INV_GRANU_ADDR;
+ustruct.info.addr_info.flags = IOMMU_INV_ADDR_FLAGS_ARCHID;
+if (iotlb->leaf) {
+ustruct.info.addr_info.flags |= IOMMU_INV_ADDR_FLAGS_LEAF;
+}
+ustruct.info.addr_info.archid = iotlb->arch_id;
+ustruct.info.addr_info.addr = start;
+ustruct.info.addr_info.granule_size = size;
+ustruct.info.addr_info.nb_granules = 1;
+trace_vfio_iommu_addr_inv_iotlb(iotlb->arch_id, start, size, 1,
+iotlb->leaf);
+
+ret = ioctl(container->fd, VFIO_IOMMU_CACHE_INVALIDATE, &ustruct);
+if (ret) {
+error_report("%p: failed to invalidate CACHE for 0x%"PRIx64
+ " mask=0x%"PRIx64" (%d)",
+ container, start, iotlb->addr_mask, ret);
+}
+}
+
 static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
 VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
@@ -615,6 +653,35 @@ static void vfio_dma_unmap_ram_section(VFIOContainer 
*container,
 }
 }
 
+static void vfio_prereg_listener_region_add(MemoryListener *listener,
+MemoryRegionSection *section)
+{
+VFIOContainer *container =
+container_of(listener, VFIOContainer, prereg_listener);
+Error *err = NULL;
+
+if (!memory_region_is_ram(section->mr)) {
+return;
+}
+
+vfio_dma_map_ram_section(container, section, &err);
+if (err) {
+error_report_err(err);
+}
+}
+static void vfio_prereg_listener_region_del(MemoryListener *listener,
+ MemoryRegionSection *section)
+{
+VFIOContainer *container =
+container_of(listener, VFIOContainer, prereg_listener);
+
+if (!memory_region_is_ram(section->mr)) {
+return;
+}
+
+vfio_dma_unmap_ram_section(container, section);
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
@@ -717,9 +784,10 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 memory_region_ref(section->mr);
 
 if (memory_region_is_iommu(section->mr)) {
+IOMMUNotify notify;
 VFIOGuestIOMMU *giommu;
 IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
-int iommu_idx;
+int iommu_idx, flags;
 
 trace_vfio_listener_region_add_iommu(iova, end);
 /*
@@ -738,8 +806,18 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 llend = int128_sub(llend, int128_one());
 iommu_idx = memory_region_iommu_attrs_to_index(iommu_mr,
MEMTXATTRS_UNSPECIFIED);
-iommu_notifier_init(&giommu->n, vfio_iommu

[RFC v6 08/24] pci: introduce PCIPASIDOps to PCIDevice

2020-03-20 Thread Eric Auger
From: Liu Yi L 

This patch introduces PCIPASIDOps for IOMMU related operations.

https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00078.html
https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00940.html

So far, setting up virt-SVA for an assigned SVA-capable device requires
configuring host translation structures for a specific PASID (e.g. binding
the guest page table to the host and enabling nested translation in the
host). Besides, the vIOMMU emulator needs to forward the guest's cache
invalidations to the host since host nested translation is enabled; e.g. on
VT-d, the guest owns the 1st-level translation table, thus cache
invalidations for the 1st level should be propagated to the host.

This patch adds two functions, alloc_pasid and free_pasid, to support
guest PASID allocation and freeing. The implementations of the callbacks
would live in device passthrough modules such as VFIO.

Cc: Kevin Tian 
Cc: Jacob Pan 
Cc: Peter Xu 
Cc: Eric Auger 
Cc: Yi Sun 
Cc: David Gibson 
Signed-off-by: Liu Yi L 
Signed-off-by: Yi Sun 
---
 hw/pci/pci.c | 34 ++
 include/hw/pci/pci.h | 11 +++
 2 files changed, 45 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e1ed6677e1..67e03b8db1 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2695,6 +2695,40 @@ void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void 
*opaque)
 bus->iommu_opaque = opaque;
 }
 
+void pci_setup_pasid_ops(PCIDevice *dev, PCIPASIDOps *ops)
+{
+assert(ops && !dev->pasid_ops);
+dev->pasid_ops = ops;
+}
+
+bool pci_device_is_pasid_ops_set(PCIBus *bus, int32_t devfn)
+{
+PCIDevice *dev;
+
+if (!bus) {
+return false;
+}
+
+dev = bus->devices[devfn];
+return !!(dev && dev->pasid_ops);
+}
+
+int pci_device_set_pasid_table(PCIBus *bus, int32_t devfn,
+   IOMMUConfig *config)
+{
+PCIDevice *dev;
+
+if (!bus) {
+return -EINVAL;
+}
+
+dev = bus->devices[devfn];
+if (dev && dev->pasid_ops && dev->pasid_ops->set_pasid_table) {
+return dev->pasid_ops->set_pasid_table(bus, devfn, config);
+}
+return -ENOENT;
+}
+
 static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque)
 {
 Range *range = opaque;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index cfedf5a995..2146cb7519 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -8,6 +8,7 @@
 #include "hw/isa/isa.h"
 
 #include "hw/pci/pcie.h"
+#include "hw/iommu/iommu.h"
 
 extern bool pci_available;
 
@@ -264,6 +265,11 @@ struct PCIReqIDCache {
 };
 typedef struct PCIReqIDCache PCIReqIDCache;
 
+struct PCIPASIDOps {
+int (*set_pasid_table)(PCIBus *bus, int32_t devfn, IOMMUConfig *config);
+};
+typedef struct PCIPASIDOps PCIPASIDOps;
+
 struct PCIDevice {
 DeviceState qdev;
 bool partially_hotplugged;
@@ -357,6 +363,7 @@ struct PCIDevice {
 
 /* ID of standby device in net_failover pair */
 char *failover_pair_id;
+PCIPASIDOps *pasid_ops;
 };
 
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
@@ -490,6 +497,10 @@ typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, 
int);
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
 void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
 
+void pci_setup_pasid_ops(PCIDevice *dev, PCIPASIDOps *ops);
+bool pci_device_is_pasid_ops_set(PCIBus *bus, int32_t devfn);
+int pci_device_set_pasid_table(PCIBus *bus, int32_t devfn, IOMMUConfig 
*config);
+
 static inline void
 pci_set_byte(uint8_t *config, uint8_t val)
 {
-- 
2.20.1




[RFC v6 15/24] vfio/pci: Register handler for iommu fault

2020-03-20 Thread Eric Auger
We use the new extended IRQ VFIO_IRQ_TYPE_NESTED type and
VFIO_IRQ_SUBTYPE_DMA_FAULT subtype to set/unset
a notifier for physical DMA faults. The associated eventfd is
triggered, in nested mode, whenever a fault is detected at the
physical IOMMU level.

The actual handler will be implemented in subsequent patches.

Signed-off-by: Eric Auger 

---

v4 -> v5:
- index_to_str now returns the index name, ie. DMA_FAULT
- use the extended IRQ

v3 -> v4:
- check VFIO_PCI_DMA_FAULT_IRQ_INDEX is supported at kernel level
  before attempting to set signaling for it.
---
 hw/vfio/pci.c | 81 ++-
 hw/vfio/pci.h |  7 +
 2 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6f2d5696c3..7579f476b0 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2780,6 +2780,76 @@ static PCIPASIDOps vfio_pci_pasid_ops = {
 .set_pasid_table = vfio_iommu_set_pasid_table,
 };
 
+static void vfio_dma_fault_notifier_handler(void *opaque)
+{
+VFIOPCIExtIRQ *ext_irq = opaque;
+
+if (!event_notifier_test_and_clear(&ext_irq->notifier)) {
+return;
+}
+}
+
+static int vfio_register_ext_irq_handler(VFIOPCIDevice *vdev,
+ uint32_t type, uint32_t subtype,
+ IOHandler *handler)
+{
+int32_t fd, ext_irq_index, index;
+struct vfio_irq_info *irq_info;
+Error *err = NULL;
+EventNotifier *n;
+int ret;
+
+ret = vfio_get_dev_irq_info(&vdev->vbasedev, type, subtype, &irq_info);
+if (ret) {
+return ret;
+}
+index = irq_info->index;
+ext_irq_index = irq_info->index - VFIO_PCI_NUM_IRQS;
+g_free(irq_info);
+
+vdev->ext_irqs[ext_irq_index].vdev = vdev;
+vdev->ext_irqs[ext_irq_index].index = index;
+n = &vdev->ext_irqs[ext_irq_index].notifier;
+
+ret = event_notifier_init(n, 0);
+if (ret) {
+error_report("vfio: Unable to init event notifier for ext irq %d(%d)",
+ ext_irq_index, ret);
+return ret;
+}
+
+fd = event_notifier_get_fd(n);
+qemu_set_fd_handler(fd, vfio_dma_fault_notifier_handler, NULL,
+&vdev->ext_irqs[ext_irq_index]);
+
+ret = vfio_set_irq_signaling(&vdev->vbasedev, index, 0,
+ VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err);
+if (ret) {
+error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+qemu_set_fd_handler(fd, NULL, NULL, vdev);
+event_notifier_cleanup(n);
+}
+return ret;
+}
+
+static void vfio_unregister_ext_irq_notifiers(VFIOPCIDevice *vdev)
+{
+VFIODevice *vbasedev = &vdev->vbasedev;
+Error *err = NULL;
+int i;
+
+for (i = 0; i < vbasedev->num_irqs - VFIO_PCI_NUM_IRQS; i++) {
+if (vfio_set_irq_signaling(vbasedev, i + VFIO_PCI_NUM_IRQS , 0,
+   VFIO_IRQ_SET_ACTION_TRIGGER, -1, &err)) {
+error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+}
+qemu_set_fd_handler(event_notifier_get_fd(&vdev->ext_irqs[i].notifier),
+NULL, NULL, vdev);
+event_notifier_cleanup(&vdev->ext_irqs[i].notifier);
+}
+g_free(vdev->ext_irqs);
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
 VFIOPCIDevice *vdev = PCI_VFIO(pdev);
@@ -2790,7 +2860,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 ssize_t len;
 struct stat st;
 int groupid;
-int i, ret;
+int i, ret, nb_ext_irqs;
 bool is_mdev;
 
 if (!vdev->vbasedev.sysfsdev) {
@@ -2890,6 +2960,11 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 goto error;
 }
 
+nb_ext_irqs = vdev->vbasedev.num_irqs - VFIO_PCI_NUM_IRQS;
+if (nb_ext_irqs > 0) {
+vdev->ext_irqs = g_new0(VFIOPCIExtIRQ, nb_ext_irqs);
+}
+
 vfio_populate_device(vdev, &err);
 if (err) {
 error_propagate(errp, err);
@@ -3094,6 +3169,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 
 vfio_register_err_notifier(vdev);
 vfio_register_req_notifier(vdev);
+vfio_register_ext_irq_handler(vdev, VFIO_IRQ_TYPE_NESTED,
+  VFIO_IRQ_SUBTYPE_DMA_FAULT,
+  vfio_dma_fault_notifier_handler);
 vfio_setup_resetfn_quirk(vdev);
 
 pci_setup_pasid_ops(pdev, &vfio_pci_pasid_ops);
@@ -3145,6 +3223,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 
 vfio_unregister_req_notifier(vdev);
 vfio_unregister_err_notifier(vdev);
+vfio_unregister_ext_irq_notifiers(vdev);
 pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
 if (vdev->irqchip_change_notifier.notify) {
 kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 0da7a20a7e..56f0fabb33 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -113,6 +113,12 @@ typedef struct VFIOMSIXInfo {
 unsigned long *pe

[RFC v6 07/24] iommu: Introduce generic header

2020-03-20 Thread Eric Auger
This header is meant to expose data types used by
several IOMMU devices, such as structs for SVA and
nested stage configuration.

Signed-off-by: Eric Auger 
---
 include/hw/iommu/iommu.h | 28 
 1 file changed, 28 insertions(+)
 create mode 100644 include/hw/iommu/iommu.h

diff --git a/include/hw/iommu/iommu.h b/include/hw/iommu/iommu.h
new file mode 100644
index 00..12092bda7b
--- /dev/null
+++ b/include/hw/iommu/iommu.h
@@ -0,0 +1,28 @@
+/*
+ * common header for iommu devices
+ *
+ * Copyright Red Hat, Inc. 2019
+ *
+ * Authors:
+ *  Eric Auger 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_HW_IOMMU_IOMMU_H
+#define QEMU_HW_IOMMU_IOMMU_H
+#ifdef __linux__
+#include 
+#endif
+
+typedef struct IOMMUConfig {
+union {
+#ifdef __linux__
+struct iommu_pasid_table_config pasid_cfg;
+#endif
+  };
+} IOMMUConfig;
+
+
+#endif /* QEMU_HW_IOMMU_IOMMU_H */
-- 
2.20.1




[RFC v6 09/24] vfio: Force nested if iommu requires it

2020-03-20 Thread Eric Auger
In case we detect the address space is translated by
a virtual IOMMU which requires HW nested paging to
integrate with VFIO, let's set up the container with
the VFIO_TYPE1_NESTING_IOMMU iommu_type.

Signed-off-by: Eric Auger 

---

v4 -> v5:
- fail immediately if nested is wanted but not supported

v2 -> v3:
- add "nested only is selected if requested by @force_nested"
  comment in this patch
---
 hw/vfio/common.c | 36 
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 0b3593b3c0..ac417b5dbd 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1155,27 +1155,38 @@ static void vfio_put_address_space(VFIOAddressSpace 
*space)
  * vfio_get_iommu_type - selects the richest iommu_type (v2 first)
  */
 static int vfio_get_iommu_type(VFIOContainer *container,
+   bool want_nested,
Error **errp)
 {
-int iommu_types[] = { VFIO_TYPE1v2_IOMMU, VFIO_TYPE1_IOMMU,
+int iommu_types[] = { VFIO_TYPE1_NESTING_IOMMU,
+  VFIO_TYPE1v2_IOMMU, VFIO_TYPE1_IOMMU,
   VFIO_SPAPR_TCE_v2_IOMMU, VFIO_SPAPR_TCE_IOMMU };
-int i;
+int i, ret = -EINVAL;
 
 for (i = 0; i < ARRAY_SIZE(iommu_types); i++) {
 if (ioctl(container->fd, VFIO_CHECK_EXTENSION, iommu_types[i])) {
-return iommu_types[i];
+if (iommu_types[i] == VFIO_TYPE1_NESTING_IOMMU && !want_nested) {
+continue;
+}
+ret = iommu_types[i];
+break;
 }
 }
-error_setg(errp, "No available IOMMU models");
-return -EINVAL;
+if (ret < 0) {
+error_setg(errp, "No available IOMMU models");
+} else if (want_nested && ret != VFIO_TYPE1_NESTING_IOMMU) {
+error_setg(errp, "Nested mode requested but not supported");
+ret = -EINVAL;
+}
+return ret;
 }
 
 static int vfio_init_container(VFIOContainer *container, int group_fd,
-   Error **errp)
+   bool want_nested, Error **errp)
 {
 int iommu_type, ret;
 
-iommu_type = vfio_get_iommu_type(container, errp);
+iommu_type = vfio_get_iommu_type(container, want_nested, errp);
 if (iommu_type < 0) {
 return iommu_type;
 }
@@ -1211,6 +1222,14 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
 VFIOContainer *container;
 int ret, fd;
 VFIOAddressSpace *space;
+IOMMUMemoryRegion *iommu_mr;
+bool nested = false;
+
+if (as != &address_space_memory && memory_region_is_iommu(as->root)) {
+iommu_mr = IOMMU_MEMORY_REGION(as->root);
+memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_VFIO_NESTED,
+ (void *)&nested);
+}
 
 space = vfio_get_address_space(as);
 
@@ -1272,12 +1291,13 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as,
 QLIST_INIT(&container->giommu_list);
 QLIST_INIT(&container->hostwin_list);
 
-ret = vfio_init_container(container, group->fd, errp);
+ret = vfio_init_container(container, group->fd, nested, errp);
 if (ret) {
 goto free_container_exit;
 }
 
 switch (container->iommu_type) {
+case VFIO_TYPE1_NESTING_IOMMU:
 case VFIO_TYPE1v2_IOMMU:
 case VFIO_TYPE1_IOMMU:
 {
-- 
2.20.1




[RFC v6 03/24] memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute

2020-03-20 Thread Eric Auger
We introduce a new IOMMU Memory Region attribute,
IOMMU_ATTR_VFIO_NESTED that tells whether the virtual IOMMU
requires HW nested paging for VFIO integration.

The current Intel virtual IOMMU device supports "Caching
Mode" and does not require 2 stages at the physical level to be
integrated with VFIO. However SMMUv3 does not implement such a
"caching mode" and requires the use of HW nested paging.

As such SMMUv3 is the first IOMMU device to advertise this
attribute.

Signed-off-by: Eric Auger 
---
 hw/arm/smmuv3.c   | 12 
 include/exec/memory.h |  3 ++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 57a79df55b..e33eabd028 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1508,6 +1508,17 @@ static int smmuv3_notify_flag_changed(IOMMUMemoryRegion 
*iommu,
 return 0;
 }
 
+static int smmuv3_get_attr(IOMMUMemoryRegion *iommu,
+   enum IOMMUMemoryRegionAttr attr,
+   void *data)
+{
+if (attr == IOMMU_ATTR_VFIO_NESTED) {
+*(bool *) data = true;
+return 0;
+}
+return -EINVAL;
+}
+
 static void smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
   void *data)
 {
@@ -1515,6 +1526,7 @@ static void 
smmuv3_iommu_memory_region_class_init(ObjectClass *klass,
 
 imrc->translate = smmuv3_translate;
 imrc->notify_flag_changed = smmuv3_notify_flag_changed;
+imrc->get_attr = smmuv3_get_attr;
 }
 
 static const TypeInfo smmuv3_type_info = {
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 1614d9a02c..b9d2f0a437 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -213,7 +213,8 @@ typedef struct MemoryRegionClass {
 
 
 enum IOMMUMemoryRegionAttr {
-IOMMU_ATTR_SPAPR_TCE_FD
+IOMMU_ATTR_SPAPR_TCE_FD,
+IOMMU_ATTR_VFIO_NESTED,
 };
 
 /**
-- 
2.20.1




[RFC v6 05/24] memory: Introduce IOMMU Memory Region inject_faults API

2020-03-20 Thread Eric Auger
This new API allows injecting @count iommu_faults into
the IOMMU memory region.

Signed-off-by: Eric Auger 
---
 include/exec/memory.h | 25 +
 memory.c  | 10 ++
 2 files changed, 35 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index f2c773163f..141a5dc197 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -57,6 +57,8 @@ struct MemoryRegionMmio {
 CPUWriteMemoryFunc *write[3];
 };
 
+struct iommu_fault;
+
 typedef struct IOMMUTLBEntry IOMMUTLBEntry;
 
 /* See address_space_translate: bit 0 is read, bit 1 is write.  */
@@ -357,6 +359,19 @@ typedef struct IOMMUMemoryRegionClass {
  * @iommu: the IOMMUMemoryRegion
  */
 int (*num_indexes)(IOMMUMemoryRegion *iommu);
+
+/*
+ * Inject @count faults into the IOMMU memory region
+ *
+ * Optional method: if this method is not provided, then
+ * memory_region_injection_faults() will return -ENOENT
+ *
+ * @iommu: the IOMMU memory region to inject the faults in
+ * @count: number of faults to inject
+ * @buf: fault buffer
+ */
+int (*inject_faults)(IOMMUMemoryRegion *iommu, int count,
+ struct iommu_fault *buf);
 } IOMMUMemoryRegionClass;
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
@@ -1365,6 +1380,16 @@ int memory_region_iommu_attrs_to_index(IOMMUMemoryRegion 
*iommu_mr,
  */
 int memory_region_iommu_num_indexes(IOMMUMemoryRegion *iommu_mr);
 
+/**
+ * memory_region_inject_faults : inject @count faults stored in @buf
+ *
+ * @iommu_mr: the IOMMU memory region
+ * @count: number of faults to be injected
+ * @buf: buffer containing the faults
+ */
+int memory_region_inject_faults(IOMMUMemoryRegion *iommu_mr, int count,
+struct iommu_fault *buf);
+
 /**
  * memory_region_name: get a memory region's name
  *
diff --git a/memory.c b/memory.c
index 09be40edd2..9cdd77e0de 100644
--- a/memory.c
+++ b/memory.c
@@ -2001,6 +2001,16 @@ int memory_region_iommu_num_indexes(IOMMUMemoryRegion 
*iommu_mr)
 return imrc->num_indexes(iommu_mr);
 }
 
+int memory_region_inject_faults(IOMMUMemoryRegion *iommu_mr, int count,
+struct iommu_fault *buf)
+{
+IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
+if (!imrc->inject_faults) {
+return -ENOENT;
+}
+return imrc->inject_faults(iommu_mr, count, buf);
+}
+
 void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
 {
 uint8_t mask = 1 << client;
-- 
2.20.1




[RFC v6 04/24] memory: Add IOMMU_ATTR_MSI_TRANSLATE IOMMU memory region attribute

2020-03-20 Thread Eric Auger
We introduce a new IOMMU Memory Region attribute, IOMMU_ATTR_MSI_TRANSLATE,
which tells whether the virtual IOMMU translates MSIs. ARM SMMU
will expose this attribute since, as opposed to Intel DMAR, MSIs
are translated like any other DMA request.

Signed-off-by: Eric Auger 
---
 include/exec/memory.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index b9d2f0a437..f2c773163f 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -215,6 +215,7 @@ typedef struct MemoryRegionClass {
 enum IOMMUMemoryRegionAttr {
 IOMMU_ATTR_SPAPR_TCE_FD,
 IOMMU_ATTR_VFIO_NESTED,
+IOMMU_ATTR_MSI_TRANSLATE,
 };
 
 /**
-- 
2.20.1




[RFC v6 06/24] memory: Add arch_id and leaf fields in IOTLBEntry

2020-03-20 Thread Eric Auger
TLB entries are usually tagged with some ids such as the asid
or pasid. When propagating an invalidation command from the
guest to the host, we need to pass this id.

Also we add a leaf field which indicates, in case of invalidation
notification, whether only cache entries for the last level of
translation are required to be invalidated.

Signed-off-by: Eric Auger 
---
 include/exec/memory.h | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 141a5dc197..d61311aeba 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -71,12 +71,30 @@ typedef enum {
 
 #define IOMMU_ACCESS_FLAG(r, w) (((r) ? IOMMU_RO : 0) | ((w) ? IOMMU_WO : 0))
 
+/**
+ * IOMMUTLBEntry - IOMMU TLB entry
+ *
+ * Structure used when performing a translation or when notifying MAP or
+ * UNMAP (invalidation) events
+ *
+ * @target_as: target address space
+ * @iova: IO virtual address (input)
+ * @translated_addr: translated address (output)
+ * @addr_mask: address mask (0xfff means 4K binding), must be multiple of 2
+ * @perm: permission flag of the mapping (NONE encodes no mapping or
+ * invalidation notification)
+ * @arch_id: architecture specific ID tagging the TLB
+ * @leaf: when @perm is NONE, indicates whether only caches for the last
+ * level of translation need to be invalidated.
+ */
 struct IOMMUTLBEntry {
 AddressSpace*target_as;
 hwaddr   iova;
 hwaddr   translated_addr;
-hwaddr   addr_mask;  /* 0xfff = 4k translation */
+hwaddr   addr_mask;
 IOMMUAccessFlags perm;
+uint32_t arch_id;
+bool leaf;
 };
 
 /*
-- 
2.20.1




[RFC v6 01/24] update-linux-headers: Import iommu.h

2020-03-20 Thread Eric Auger
Update the script to import the new iommu.h uapi header.

Signed-off-by: Eric Auger 
---
 scripts/update-linux-headers.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 29c27f4681..5b64ee3912 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -141,7 +141,7 @@ done
 
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
-for header in kvm.h vfio.h vfio_ccw.h vhost.h \
+for header in kvm.h vfio.h vfio_ccw.h vhost.h iommu.h \
   psci.h psp-sev.h userfaultfd.h mman.h; do
 cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
-- 
2.20.1




[RFC v6 02/24] header update against 5.6.0-rc3 and IOMMU/VFIO nested stage APIs

2020-03-20 Thread Eric Auger
This is an update against
https://github.com/eauger/linux/tree/will-arm-smmu-updates-2stage-v10

Signed-off-by: Eric Auger 
---
 linux-headers/COPYING   |   2 +
 linux-headers/asm-x86/kvm.h |   1 +
 linux-headers/linux/iommu.h | 375 
 linux-headers/linux/vfio.h  | 109 ++-
 4 files changed, 486 insertions(+), 1 deletion(-)
 create mode 100644 linux-headers/linux/iommu.h

diff --git a/linux-headers/COPYING b/linux-headers/COPYING
index da4cb28feb..a635a38ef9 100644
--- a/linux-headers/COPYING
+++ b/linux-headers/COPYING
@@ -16,3 +16,5 @@ In addition, other licenses may also apply. Please see:
Documentation/process/license-rules.rst
 
 for more details.
+
+All contributions to the Linux Kernel are subject to this COPYING file.
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 503d3f42da..3f3f780c8c 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -390,6 +390,7 @@ struct kvm_sync_regs {
 #define KVM_STATE_NESTED_GUEST_MODE0x0001
 #define KVM_STATE_NESTED_RUN_PENDING   0x0002
 #define KVM_STATE_NESTED_EVMCS 0x0004
+#define KVM_STATE_NESTED_MTF_PENDING   0x0008
 
 #define KVM_STATE_NESTED_SMM_GUEST_MODE0x0001
 #define KVM_STATE_NESTED_SMM_VMXON 0x0002
diff --git a/linux-headers/linux/iommu.h b/linux-headers/linux/iommu.h
new file mode 100644
index 00..1b3f6420bb
--- /dev/null
+++ b/linux-headers/linux/iommu.h
@@ -0,0 +1,375 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * IOMMU user API definitions
+ */
+
+#ifndef _IOMMU_H
+#define _IOMMU_H
+
+#include 
+
+#define IOMMU_FAULT_PERM_READ  (1 << 0) /* read */
+#define IOMMU_FAULT_PERM_WRITE (1 << 1) /* write */
+#define IOMMU_FAULT_PERM_EXEC  (1 << 2) /* exec */
+#define IOMMU_FAULT_PERM_PRIV  (1 << 3) /* privileged */
+
+/* Generic fault types, can be expanded IRQ remapping fault */
+enum iommu_fault_type {
+   IOMMU_FAULT_DMA_UNRECOV = 1,/* unrecoverable fault */
+   IOMMU_FAULT_PAGE_REQ,   /* page request fault */
+};
+
+enum iommu_fault_reason {
+   IOMMU_FAULT_REASON_UNKNOWN = 0,
+
+   /* Could not access the PASID table (fetch caused external abort) */
+   IOMMU_FAULT_REASON_PASID_FETCH,
+
+   /* PASID entry is invalid or has configuration errors */
+   IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
+
+   /*
+* PASID is out of range (e.g. exceeds the maximum PASID
+* supported by the IOMMU) or disabled.
+*/
+   IOMMU_FAULT_REASON_PASID_INVALID,
+
+   /*
+* An external abort occurred fetching (or updating) a translation
+* table descriptor
+*/
+   IOMMU_FAULT_REASON_WALK_EABT,
+
+   /*
+* Could not access the page table entry (Bad address),
+* actual translation fault
+*/
+   IOMMU_FAULT_REASON_PTE_FETCH,
+
+   /* Protection flag check failed */
+   IOMMU_FAULT_REASON_PERMISSION,
+
+   /* access flag check failed */
+   IOMMU_FAULT_REASON_ACCESS,
+
+   /* Output address of a translation stage caused Address Size fault */
+   IOMMU_FAULT_REASON_OOR_ADDRESS,
+};
+
+/**
+ * struct iommu_fault_unrecoverable - Unrecoverable fault data
+ * @reason: reason of the fault, from &enum iommu_fault_reason
+ * @flags: parameters of this fault (IOMMU_FAULT_UNRECOV_* values)
+ * @pasid: Process Address Space ID
+ * @perm: requested permission access using by the incoming transaction
+ *(IOMMU_FAULT_PERM_* values)
+ * @addr: offending page address
+ * @fetch_addr: address that caused a fetch abort, if any
+ */
+struct iommu_fault_unrecoverable {
+   __u32   reason;
+#define IOMMU_FAULT_UNRECOV_PASID_VALID(1 << 0)
+#define IOMMU_FAULT_UNRECOV_ADDR_VALID (1 << 1)
+#define IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID   (1 << 2)
+   __u32   flags;
+   __u32   pasid;
+   __u32   perm;
+   __u64   addr;
+   __u64   fetch_addr;
+};
+
+/**
+ * struct iommu_fault_page_request - Page Request data
+ * @flags: encodes whether the corresponding fields are valid and whether this
+ * is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* values)
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
+ * @addr: page address
+ * @private_data: device-specific private information
+ */
+struct iommu_fault_page_request {
+#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID   (1 << 0)
+#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1)
+#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2)
+   __u32   flags;
+   __u32   pasid;
+   __u32   grpid;
+   __u32   perm;
+   __u64   addr;
+   __u64   private_data[2];
+};
+
+/**
+ * struct iommu_fault - Generic fault data
+ * @type: fault type from &enum iommu_fault_type
+ * @padding: reserved for future use (should be zero)
+ * @event: fault event, when @typ

[RFC v6 00/24] vSMMUv3/pSMMUv3 2 stage VFIO integration

2020-03-20 Thread Eric Auger
Up to now vSMMUv3 has not been integrated with VFIO. VFIO
integration requires to program the physical IOMMU consistently
with the guest mappings. However, as opposed to VTD, SMMUv3 has
no "Caching Mode" which allows easy trapping of guest mappings.
This means the vSMMUV3 cannot use the same VFIO integration as VTD.

However SMMUv3 has 2 translation stages. This was devised with the
virtualization use case in mind, where stage 1 is "owned" by the
guest whereas the host uses stage 2 for VM isolation.

This series sets up this nested translation stage. It only works
if there is one physical SMMUv3 used along with QEMU vSMMUv3 (in
other words, it does not work if there is a physical SMMUv2).

- We force the host to use stage 2 instead of stage 1, when we
  detect a vSMMUV3 is behind a VFIO device. For a VFIO device
  without any virtual IOMMU, we still use stage 1 as many existing
  SMMUs expect this behavior.
- We use PCIPASIDOps to propagate guest stage1 config changes on
  STE (Stream Table Entry) changes.
- We implement a specific UNMAP notifier that conveys guest
  IOTLB invalidations to the host
- We register MSI IOVA/GPA bindings to the host so that the latter
  can build a nested stage translation
- As the legacy MAP notifier is not called anymore, we must make
  sure stage 2 mappings are set. This is achieved through another
  prereg memory listener.
- Physical SMMU stage 1 related faults are reported to the guest
  via an eventfd mechanism and exposed through a dedicated VFIO-PCI
  region. Then they are reinjected into the guest.
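
For readers wanting to try the series, a hypothetical invocation could look
like the sketch below. The `-M virt,iommu=smmuv3` machine option and
`-device vfio-pci` are existing QEMU features; the VF BDF, image path and
memory size are placeholders, and the nested behaviour additionally depends
on the kernel branches listed below:

```shell
# Sketch only: assign a VF to a guest behind the emulated SMMUv3.
# 0000:01:00.0 and guest.img are placeholders for your own VF and image.
qemu-system-aarch64 -M virt,gic-version=3,iommu=smmuv3 \
    -cpu host -enable-kvm -m 4096 \
    -device vfio-pci,host=0000:01:00.0 \
    -drive if=virtio,file=guest.img,format=raw \
    -nographic
```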

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/v4.2.0-2stage-rfcv6

Kernel Dependencies:
[1] [PATCH v10 00/11] SMMUv3 Nested Stage Setup (VFIO part)
[2] [PATCH v10 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
branch at: https://github.com/eauger/linux/tree/will-arm-smmu-updates-2stage-v10

History:

v5 -> v6:
- just rebase work

v4 -> v5:
- Use PCIPASIDOps for config update notifications
- removal of notification for MSI binding which is not needed
  anymore
- Use a single fault region
- use the specific interrupt index

v3 -> v4:
- adapt to changes in uapi (asid cache invalidation)
- check VFIO_PCI_DMA_FAULT_IRQ_INDEX is supported at kernel level
  before attempting to set signaling for it.
- sync on 5.2-rc1 kernel headers + Drew's patch that imports sve_context.h
- fix MSI binding for MSI (not MSIX)
- fix mingw compilation

v2 -> v3:
- rework fault handling
- MSI binding registration done in vfio-pci. MSI binding tear down called
  on container cleanup path
- leaf parameter propagated

v1 -> v2:
- Fixed dual assignment (asid now correctly propagated on TLB invalidations)
- Integrated fault reporting


Eric Auger (23):
  update-linux-headers: Import iommu.h
  header update against 5.6.0-rc3 and IOMMU/VFIO nested stage APIs
  memory: Add IOMMU_ATTR_VFIO_NESTED IOMMU memory region attribute
  memory: Add IOMMU_ATTR_MSI_TRANSLATE IOMMU memory region attribute
  memory: Introduce IOMMU Memory Region inject_faults API
  memory: Add arch_id and leaf fields in IOTLBEntry
  iommu: Introduce generic header
  vfio: Force nested if iommu requires it
  vfio: Introduce hostwin_from_range helper
  vfio: Introduce helpers to DMA map/unmap a RAM section
  vfio: Set up nested stage mappings
  vfio: Pass stage 1 MSI bindings to the host
  vfio: Helper to get IRQ info including capabilities
  vfio/pci: Register handler for iommu fault
  vfio/pci: Set up the DMA FAULT region
  vfio/pci: Implement the DMA fault handler
  hw/arm/smmuv3: Advertise MSI_TRANSLATE attribute
  hw/arm/smmuv3: Store the PASID table GPA in the translation config
  hw/arm/smmuv3: Fill the IOTLBEntry arch_id on NH_VA invalidation
  hw/arm/smmuv3: Fill the IOTLBEntry leaf field on NH_VA invalidation
  hw/arm/smmuv3: Pass stage 1 configurations to the host
  hw/arm/smmuv3: Implement fault injection
  hw/arm/smmuv3: Allow MAP notifiers

Liu Yi L (1):
  pci: introduce PCIPASIDOps to PCIDevice

 hw/arm/smmuv3.c | 189 ++--
 hw/arm/trace-events |   3 +-
 hw/pci/pci.c|  34 +++
 hw/vfio/common.c| 506 +---
 hw/vfio/pci.c   | 267 -
 hw/vfio/pci.h   |   9 +
 hw/vfio/trace-events|   9 +-
 include/exec/memory.h   |  49 +++-
 include/hw/arm/smmu-common.h|   1 +
 include/hw/iommu/iommu.h|  28 ++
 include/hw/pci/pci.h|  11 +
 include/hw/vfio/vfio-common.h   |  16 +
 linux-headers/COPYING   |   2 +
 linux-headers/asm-x86/kvm.h |   1 +
 linux-headers/linux/iommu.h | 375 +++
 linux-headers/linux/vfio.h  | 109 ++-
 memory.c|  10 +
 scripts/update-linux-headers.sh |   2 +-
 18 files changed, 1478 insertions(+), 143 deletions(-)
 create mode 100644 include/hw/iommu/iommu.h
 create mode 100644 linux-headers/linux/iommu.h

-- 
2.20.1




[Bug 1867519] Re: qemu 4.2 segfaults on VF detach

2020-03-20 Thread Launchpad Bug Tracker
This bug was fixed in the package qemu - 1:4.2-3ubuntu3

---
qemu (1:4.2-3ubuntu3) focal; urgency=medium

  * d/p/stable/lp-1867519-*: Stabilize qemu 4.2 with upstream
patches @qemu-stable (LP: #1867519)

 -- Christian Ehrhardt   Wed, 18 Mar
2020 13:57:57 +0100

** Changed in: qemu (Ubuntu)
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1867519

Title:
  qemu 4.2 segfaults on VF detach

Status in QEMU:
  Fix Committed
Status in qemu package in Ubuntu:
  Fix Released

Bug description:
  After updating Ubuntu 20.04 to the Beta version, we get the following
  error and the virtual machine gets stuck when detaching PCI devices using
  the virsh command:

  Error:
  error: Failed to detach device from /tmp/vf_interface_attached.xml
  error: internal error: End of file from qemu monitor

  steps to reproduce:
   1. create a VM over Ubuntu 20.04 (5.4.0-14-generic)
   2. attach PCI device to this VM (Mellanox VF for example)
   3. try detaching the PCI device using the virsh command:
 a. create a pci interface xml file:
  






  
 b.  #virsh detach-device  


  - Ubuntu release:
Description:Ubuntu Focal Fossa (development branch)
Release:20.04

  - Package ver:
libvirt0:
Installed: 6.0.0-0ubuntu3
Candidate: 6.0.0-0ubuntu5
Version table:
   6.0.0-0ubuntu5 500
  500 http://il.archive.ubuntu.com/ubuntu focal/main amd64 Packages
   *** 6.0.0-0ubuntu3 100
  100 /var/lib/dpkg/status

  - What you expected to happen: 
PCI device detached without any errors.

  - What happened instead:
getting the errors above and the VM is stuck

  additional info:
  after downgrading the libvirt0 package and all the dependent packages to 5.4,
the previous version, it seems that the issue disappeared

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1867519/+subscriptions



Re: [PULL v2 05/13] target/rx: CPU definitions

2020-03-20 Thread Peter Maydell
On Fri, 20 Mar 2020 at 16:32, Philippe Mathieu-Daudé  wrote:
> -fwrapv is here indeed.
>
> I use
> --extra-cflags=-fsanitize=address,alignment,array-bounds,bool,builtin,enum,float-cast-overflow,float-divide-by-zero,function,integer-divide-by-zero,nonnull-attribute,null,pointer-overflow,return,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,unreachable,vla-bound,vptr

There was a bug in older clang versions where the shift-base
sanitizer didn't honour -fwrapv:
https://bugs.llvm.org/show_bug.cgi?id=25552

https://wiki.qemu.org/Testing#clang_UBSan
says you can work around the clang bug with -fno-sanitize=shift-base.

The bug was fixed upstream back in 2016, though, so the
fix ought to be in clang 4, I think. Are you using an
old clang version, or has it regressed in newer clang?

thanks
-- PMM



Re: [PULL v2 05/13] target/rx: CPU definitions

2020-03-20 Thread Philippe Mathieu-Daudé

On 3/20/20 5:21 PM, Peter Maydell wrote:

On Fri, 20 Mar 2020 at 16:19, Richard Henderson
 wrote:


On 3/20/20 9:04 AM, Philippe Mathieu-Daudé wrote:

Not related to this patch, but this line generates a warning with Clang:

   CC  rx-softmmu/target/rx/cpu.o
target/rx/cpu.c:158:33: warning: The result of the left shift is undefined
because the left operand is negative
 address = physical = addr & TARGET_PAGE_MASK;
 ^~~~
include/exec/cpu-all.h:234:45: note: expanded from macro 'TARGET_PAGE_MASK'
#define TARGET_PAGE_MASK   ((target_long)-1 << TARGET_PAGE_BITS)
 ~~~ ^
1 warning generated.


>From configure:


# We use -fwrapv to tell the compiler that we require a C dialect where
# left shift of signed integers is well defined and has the expected
# 2s-complement style results. (Both clang and gcc agree that it
# provides these semantics.)


Clang is *supposed* to be not generating those warnings.


I do have clang in my build tests, so at least some versions of
clang do indeed correctly handle -fwrapv. What version are
you using, Philippe ?


-fwrapv is here indeed.

I use 
--extra-cflags=-fsanitize=address,alignment,array-bounds,bool,builtin,enum,float-cast-overflow,float-divide-by-zero,function,integer-divide-by-zero,nonnull-attribute,null,pointer-overflow,return,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,unreachable,vla-bound,vptr


Apparently -fwrapv is ignored. Probably due to one of the
shift-base/shift-exponent sanitizer plugins.




thanks
-- PMM






Re: [PATCH v3 3/3] Acceptance test: provides to use RDMA transport for migration test

2020-03-20 Thread Willian Rampazzo
Hi Oksana,

On Fri, Mar 20, 2020 at 12:16 PM Oksana Vohchana  wrote:
>
> Adds test for RDMA migration check
>
> Signed-off-by: Oksana Vohchana 
> ---
>  tests/acceptance/migration.py | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/tests/acceptance/migration.py b/tests/acceptance/migration.py
> index a783f3915b..c8673114a9 100644
> --- a/tests/acceptance/migration.py
> +++ b/tests/acceptance/migration.py
> @@ -105,3 +105,15 @@ class Migration(Test):
>  """
>  free_port = self._get_free_port()
>  dest_uri = 'exec:nc -l localhost %u' % free_port
> +
> +@skipUnless(_if_rdma_enable(None), "Unit rdma.service could not be 
> found")
> +@skipUnless(_get_interface_rdma(None), 'RDMA service or interface not 
> configured')

If you change these two methods to be static, you will not need to use
the `None` parameter, as I mentioned in patch 2 of this series.

> +def test_migration_with_rdma_localhost(self):
> +iface = self._get_interface_rdma()
> +ip = self._get_ip_rdma(iface)
> +if ip:
> +free_port = self._get_free_port(address=ip)
> +else:
> +self.cancel("Ip address isn't configured")
> +dest_uri = 'rdma:%s:%u' % (ip, free_port)
> +self.do_migrate(dest_uri)
> --
> 2.21.1
>
>




Re: [PATCH v9 02/10] scripts: Coccinelle script to use ERRP_AUTO_PROPAGATE()

2020-03-20 Thread Markus Armbruster
Vladimir Sementsov-Ogievskiy  writes:

> 20.03.2020 16:58, Markus Armbruster wrote:
>> Vladimir Sementsov-Ogievskiy  writes:
[...]
>>> I will not be surprised, if we missed some more interesting cases :)
>>> But we should proceed. What is our plan? Will you queue v10 for 5.1?
>>
>> v10's PATCH 1+2 look ready.  The error.h comment update could perhaps
>> use some polish; I've focused my attention elsewhere.
>>
>> PATCH 8-9 are generated.  They should never be rebased, always be
>> regenerated.  We compare regenerated patches to posted ones to make sure
>> they are still sane, and the R-bys are still valid.  I can take care of
>> the comparing.
>>
>> I'd like to have a pull request ready when the tree reopens for general
>> development.  Let's use the time until then to get more generated
>> patches out for review.
>>
>> If I queue up patches in my tree, we shift the responsibility for
>> regenerating patches from you to me, and create a coordination issue:
>> you'll want to base patch submissions on the branch I use to queue this
>> work, and that's going to be awkward when I rebase / regenerate that
>> branch.  I think it's simpler to queue up in your tree until we're ready
>> for a pull request.
>>
>> When you post more patches, use
>>
>>  Based-on: <20200317151625.20797-1-vsement...@virtuozzo.com>
>>
>> so that Patchew applies them on top of this series.  Hmm, probably won't
>> do, as PATCH 9 already conflicts.
>>
>> You could instead repost PATCH 1+2 with each batch.  I hope that's not
>> too confusing.
>>
>> I trust you'll keep providing a tag reviewers can pull.
>>
>> I suggest to ask maintainers to leave merging these patches to me, in
>> cover letters.
>>
>> Makes sense?
>>
>
> Hmm.
>
> I remember what Kevin said about freeze period: maintainers will queue
> a lot of patches in their "next" branches, and send pull requests at start
> of next developing period. This highly possible will drop r-bs I can get now.
> And reviewers will have to review twice.
>
> And for the same reason, it's bad idea to queue in your branch a lot of 
> patches
> from different subsystems during freeze.
>
> So, just postpone this all up to next development phase?

Okay.  I hope we can process generated patches at a brisk pace then.




Re: [PATCH v3 2/3] Acceptance test: provides new functions

2020-03-20 Thread Willian Rampazzo
Hi Oksana,

On Fri, Mar 20, 2020 at 12:15 PM Oksana Vohchana  wrote:
>
> Provides new functions related to the RDMA migration test.
> Adds functions to check whether the RDMA service is enabled and to get
> the IP address of the interface where it is configured.
>
> Signed-off-by: Oksana Vohchana 
> ---
>  tests/acceptance/migration.py | 30 ++
>  1 file changed, 30 insertions(+)
>
> diff --git a/tests/acceptance/migration.py b/tests/acceptance/migration.py
> index e4c39b85a1..a783f3915b 100644
> --- a/tests/acceptance/migration.py
> +++ b/tests/acceptance/migration.py
> @@ -11,12 +11,17 @@
>
>
>  import tempfile
> +import json
>  from avocado_qemu import Test
>  from avocado import skipUnless
>
>  from avocado.utils import network
>  from avocado.utils import wait
>  from avocado.utils.path import find_command
> +from avocado.utils.network.interfaces import NetworkInterface
> +from avocado.utils.network.hosts import LocalHost
> +from avocado.utils import service
> +from avocado.utils import process
>
>
>  class Migration(Test):
> @@ -58,6 +63,31 @@ class Migration(Test):
>  self.cancel('Failed to find a free port')
>  return port
>
> +def _if_rdma_enable(self):
> +rdma_stat = service.ServiceManager()
> +rdma = rdma_stat.status('rdma')
> +return rdma

You can just `return rdma_stat.status('rdma')` here! Also, as you are
not using any of the class attributes or methods, if you make this
method static, you don't need to call it with `None` as you did on
patch 3 of this series.
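Willian's two suggestions (returning the status directly, and making the method static) could be sketched like this. This is an illustrative sketch only: `FakeServiceManager` is a hypothetical stand-in for avocado's `service.ServiceManager`, assumed to expose a `status()` method as used in the patch.

```python
class FakeServiceManager:
    """Hypothetical stand-in for avocado.utils.service.ServiceManager;
    assumes .status(name) reports whether the service is running."""
    def __init__(self, running):
        self._running = running

    def status(self, name):
        return self._running


class Migration:
    @staticmethod
    def _if_rdma_enable(manager):
        # No instance state is needed, so the method can be static
        # and the status can be returned directly.
        return manager.status('rdma')


# No Migration instance (and no `None` argument) is needed any more:
print(Migration._if_rdma_enable(FakeServiceManager(True)))  # prints: True
```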

> +
> +def _get_interface_rdma(self):
> +cmd = 'rdma link show -j'
> +out = json.loads(process.getoutput(cmd))
> +try:
> +for i in out:
> +if i['state'] == 'ACTIVE':
> +return i['netdev']
> +except KeyError:
> +return None

Same comment about making this method static.

Actually, if you are not using any of the attributes or methods from
the Migration class in these two methods, I think you can define them
as functions outside of the class. Does that make sense?

> +
> +def _get_ip_rdma(self, interface):
> +local = LocalHost()
> +network_in = NetworkInterface(interface, local)
> +try:
> +ip = network_in._get_interface_details()
> +if ip:
> +return ip[0]['addr_info'][0]['local']
> +except:
> +self.cancel("Incorrect interface configuration or device name")
> +

If you change the logic a bit and raise an exception here, instead of
doing a `self.cancel`, you can also make this method static, or move
it outside of the class. The cancel can be handled in the test, with
the exception raised here.
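One possible shape for that change, as a hypothetical sketch: the real helper queries avocado's `NetworkInterface`, which is replaced here by a plain data structure, and `RDMAConfigError` is an invented name for the exception.

```python
class RDMAConfigError(Exception):
    """Raised when the RDMA interface is missing or misconfigured."""


def get_ip_rdma(details):
    """Module-level helper: return the first local address from the
    interface details (the structure NetworkInterface would return),
    raising an exception instead of cancelling the test directly."""
    try:
        return details[0]['addr_info'][0]['local']
    except (IndexError, KeyError, TypeError):
        raise RDMAConfigError('Incorrect interface configuration '
                              'or device name')


# In the test method, the cancel decision then stays with the Test class:
#     try:
#         ip = get_ip_rdma(iface_details)
#     except RDMAConfigError as exc:
#         self.cancel(str(exc))
```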

>
>  def test_migration_with_tcp_localhost(self):
>  dest_uri = 'tcp:localhost:%u' % self._get_free_port()
> --
> 2.21.1
>
>

Let me know if the comments do not make sense.

Willian




Re: [PULL v2 05/13] target/rx: CPU definitions

2020-03-20 Thread Peter Maydell
On Fri, 20 Mar 2020 at 16:19, Richard Henderson
 wrote:
>
> On 3/20/20 9:04 AM, Philippe Mathieu-Daudé wrote:
> > Not related to this patch, but this line generates a warning with Clang:
> >
> >   CC  rx-softmmu/target/rx/cpu.o
> > target/rx/cpu.c:158:33: warning: The result of the left shift is undefined
> > because the left operand is negative
> > address = physical = addr & TARGET_PAGE_MASK;
> > ^~~~
> > include/exec/cpu-all.h:234:45: note: expanded from macro 'TARGET_PAGE_MASK'
> > #define TARGET_PAGE_MASK   ((target_long)-1 << TARGET_PAGE_BITS)
> > ~~~ ^
> > 1 warning generated.
>
> From configure:
>
> > # We use -fwrapv to tell the compiler that we require a C dialect where
> > # left shift of signed integers is well defined and has the expected
> > # 2s-complement style results. (Both clang and gcc agree that it
> > # provides these semantics.)
>
> Clang is *supposed* not to generate those warnings.

I do have clang in my build tests, so at least some versions of
clang do indeed correctly handle -fwrapv. What version are
you using, Philippe?

thanks
-- PMM



Re: [PULL 0/4] Python queue for 5.0 soft freeze

2020-03-20 Thread Philippe Mathieu-Daudé

On 3/20/20 5:14 PM, Peter Maydell wrote:
> On Fri, 20 Mar 2020 at 16:11, Philippe Mathieu-Daudé  wrote:
>>
>> On 3/20/20 4:59 PM, Peter Maydell wrote:
>>> On Wed, 18 Mar 2020 at 01:12, Eduardo Habkost  wrote:
>>>>
>>>> The following changes since commit d649689a8ecb2e276cc20d3af6d416e3c299cb17:
>>>>
>>>>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into
>>>> staging (2020-03-17 18:33:05 +)
>>>>
>>>> are available in the Git repository at:
>>>>
>>>>   git://github.com/ehabkost/qemu.git tags/python-next-pull-request
>>>>
>>>> for you to fetch changes up to f4abfc6cb037da951e7977a67171f361fc6d21d7:
>>>>
>>>>   MAINTAINERS: add simplebench (2020-03-17 21:09:26 -0400)
>>>>
>>>> Python queue for 5.0 soft freeze
>>>>
>>>> * Add scripts/simplebench (Vladimir Sementsov-Ogievskiy)
>>>
>>> Applied, thanks.
>>
>> I guess there was a misunderstanding with Eduardo; he was going to
>> resend this pull request due to:
>>
>>   ERROR: please use python3 interpreter
>
> Ah, sorry. I'd read the replies to this thread as meaning that
> those things were OK to fix as follow-up patches rather than
> requiring a respin of the pull.

As you noticed, scripts/simplebench/bench_block_job.py is not run in our
tests, so there is no need to hold the other pull requests; we'll fix
this later.

Thanks,

Phil.




Re: [PATCH v10 2/9] scripts: Coccinelle script to use ERRP_AUTO_PROPAGATE()

2020-03-20 Thread Markus Armbruster
Vladimir Sementsov-Ogievskiy  writes:

> The script adds ERRP_AUTO_PROPAGATE macro invocations where appropriate
> and makes the corresponding changes in the code (see
> include/qapi/error.h for details)
>
> Usage example:
> spatch --sp-file scripts/coccinelle/auto-propagated-errp.cocci \
>  --macro-file scripts/cocci-macro-file.h --in-place --no-show-diff \
>  --max-width 80 FILES...
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>
> Cc: Eric Blake 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> Cc: Greg Kurz 
> Cc: Christian Schoenebeck 
> Cc: Stefan Hajnoczi 
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Paul Durrant 
> Cc: "Philippe Mathieu-Daudé" 
> Cc: Laszlo Ersek 
> Cc: Gerd Hoffmann 
> Cc: Stefan Berger 
> Cc: Markus Armbruster 
> Cc: Michael Roth 
> Cc: qemu-devel@nongnu.org
> Cc: qemu-bl...@nongnu.org
> Cc: xen-de...@lists.xenproject.org
>
>  scripts/coccinelle/auto-propagated-errp.cocci | 336 ++
>  include/qapi/error.h  |   3 +
>  MAINTAINERS   |   1 +
>  3 files changed, 340 insertions(+)
>  create mode 100644 scripts/coccinelle/auto-propagated-errp.cocci
>
> diff --git a/scripts/coccinelle/auto-propagated-errp.cocci 
> b/scripts/coccinelle/auto-propagated-errp.cocci
> new file mode 100644
> index 00..5188b07006
> --- /dev/null
> +++ b/scripts/coccinelle/auto-propagated-errp.cocci
> @@ -0,0 +1,336 @@
> +// Use ERRP_AUTO_PROPAGATE (see include/qapi/error.h)
> +//
> +// Copyright (c) 2020 Virtuozzo International GmbH.
> +//
> +// This program is free software; you can redistribute it and/or
> +// modify it under the terms of the GNU General Public License as
> +// published by the Free Software Foundation; either version 2 of the
> +// License, or (at your option) any later version.
> +//
> +// This program is distributed in the hope that it will be useful,
> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +// GNU General Public License for more details.
> +//
> +// You should have received a copy of the GNU General Public License
> +// along with this program.  If not, see
> +// .
> +//
> +// Usage example:
> +// spatch --sp-file scripts/coccinelle/auto-propagated-errp.cocci \
> +//  --macro-file scripts/cocci-macro-file.h --in-place \
> +//  --no-show-diff --max-width 80 FILES...
> +//
> +// Note: --max-width 80 is needed because coccinelle's default is less
> +// than 80, and without this parameter coccinelle may reindent some
> +// lines that fit into 80 characters but not into coccinelle's default,
> +// which in turn produces extra patch hunks for no reason.
> +
> +// Switch unusual Error ** parameter names to errp
> +// (this is necessary to use ERRP_AUTO_PROPAGATE).
> +//
> +// Disable optional_qualifier to skip functions with
> +// "Error *const *errp" parameter.
> +//
> +// Skip functions with "assert(_errp && *_errp)" statement, because
> +// that signals unusual semantics, and the parameter name may well
> +// serve a purpose. (like nbd_iter_channel_error()).
> +//
> +// Skip util/error.c to not touch, for example, error_propagate() and
> +// error_propagate_prepend().
> +@ depends on !(file in "util/error.c") disable optional_qualifier@
> +identifier fn;
> +identifier _errp != errp;
> +@@
> +
> + fn(...,
> +-   Error **_errp
> ++   Error **errp
> +,...)
> + {
> +(
> + ... when != assert(_errp && *_errp)
> +&
> + <...
> +-_errp
> ++errp
> + ...>
> +)
> + }
> +
> +// Add invocation of ERRP_AUTO_PROPAGATE to errp-functions where
> +// necessary
> +//
> +// Note that without "when any" the final "..." does not match
> +// something matched by the previous pattern, i.e. the rule will not
> +// match a double error_prepend in a control flow like the one in
> +// vfio_set_irq_signaling().
> +//
> +// Note, "exists" says that we want to apply the rule even if it does
> +// not match on all possible control flows (otherwise, it will not match
> +// the standard pattern where the error_propagate() call is in an if
> +// branch).
> +@ disable optional_qualifier exists@
> +identifier fn, local_err;
> +symbol errp;
> +@@
> +
> + fn(..., Error **errp, ...)
> + {
> ++   ERRP_AUTO_PROPAGATE();
> +...  when != ERRP_AUTO_PROPAGATE();
> +(
> +(
> +error_append_hint(errp, ...);
> +|
> +error_prepend(errp, ...);
> +|
> +error_vprepend(errp, ...);
> +)
> +... when any
> +|
> +Error *local_err = NULL;
> +...
> +(
> +error_propagate_prepend(errp, local_err, ...);
> +|
> +error_propagate(errp, local_err);
> +)
> +...
> +)
> + }
> +
> +// Warn when several Error * definitions are in the control flow.
> +// This rule is not chained to rule1 and is less restrictive, to cover
> +// more functions to warn about (even those we are not going to convert).
> +//
> +// Note that even with one (or zero) Error * definition in each
> +// control flow, we may have several (in total) Error * definitions in
> +// the f

Re: [PULL v2 05/13] target/rx: CPU definitions

2020-03-20 Thread Richard Henderson
On 3/20/20 9:04 AM, Philippe Mathieu-Daudé wrote:
> Not related to this patch, but this line generates a warning with Clang:
> 
>   CC  rx-softmmu/target/rx/cpu.o
> target/rx/cpu.c:158:33: warning: The result of the left shift is undefined
> because the left operand is negative
>     address = physical = addr & TARGET_PAGE_MASK;
>     ^~~~
> include/exec/cpu-all.h:234:45: note: expanded from macro 'TARGET_PAGE_MASK'
> #define TARGET_PAGE_MASK   ((target_long)-1 << TARGET_PAGE_BITS)
>     ~~~ ^
> 1 warning generated.

From configure:

> # We use -fwrapv to tell the compiler that we require a C dialect where
> # left shift of signed integers is well defined and has the expected
> # 2s-complement style results. (Both clang and gcc agree that it
> # provides these semantics.)

Clang is *supposed* not to generate those warnings.


r~



Re: [PATCH] ppc/ppc405_boards: Remove unnecessary NULL check

2020-03-20 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> This code is inside the "if (dinfo)" condition, so testing
> again here whether it is NULL is unnecessary.
>
> Fixes: dd59bcae7 (Don't size flash memory to match backing image)
> Reported-by: Coverity (CID 1421917)
> Suggested-by: Peter Maydell 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/ppc/ppc405_boards.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c
> index e6bffb9e1a..6198ec1035 100644
> --- a/hw/ppc/ppc405_boards.c
> +++ b/hw/ppc/ppc405_boards.c
> @@ -191,7 +191,7 @@ static void ref405ep_init(MachineState *machine)
>  bios_size = 8 * MiB;
>  pflash_cfi02_register((uint32_t)(-bios_size),
>"ef405ep.bios", bios_size,
> -  dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
> +  blk_by_legacy_dinfo(dinfo),
>64 * KiB, 1,
>2, 0x0001, 0x22DA, 0x, 0x, 0x555, 
> 0x2AA,
>1);
> @@ -459,7 +459,7 @@ static void taihu_405ep_init(MachineState *machine)
>  bios_size = 2 * MiB;
>  pflash_cfi02_register(0xFFE0,
>"taihu_405ep.bios", bios_size,
> -  dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
> +  blk_by_legacy_dinfo(dinfo),
>64 * KiB, 1,
>4, 0x0001, 0x22DA, 0x, 0x, 0x555, 
> 0x2AA,
>1);
> @@ -494,7 +494,7 @@ static void taihu_405ep_init(MachineState *machine)
>  if (dinfo) {
>  bios_size = 32 * MiB;
>  pflash_cfi02_register(0xfc00, "taihu_405ep.flash", bios_size,
> -  dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
> +  blk_by_legacy_dinfo(dinfo),
>64 * KiB, 1,
>4, 0x0001, 0x22DA, 0x, 0x, 0x555, 
> 0x2AA,
>1);

Reviewed-by: Markus Armbruster 




Re: [PULL 0/4] Python queue for 5.0 soft freeze

2020-03-20 Thread Peter Maydell
On Fri, 20 Mar 2020 at 16:11, Philippe Mathieu-Daudé  wrote:
>
> On 3/20/20 4:59 PM, Peter Maydell wrote:
> > On Wed, 18 Mar 2020 at 01:12, Eduardo Habkost  wrote:
> >>
> >> The following changes since commit 
> >> d649689a8ecb2e276cc20d3af6d416e3c299cb17:
> >>
> >>Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
> >> staging (2020-03-17 18:33:05 +)
> >>
> >> are available in the Git repository at:
> >>
> >>git://github.com/ehabkost/qemu.git tags/python-next-pull-request
> >>
> >> for you to fetch changes up to f4abfc6cb037da951e7977a67171f361fc6d21d7:
> >>
> >>MAINTAINERS: add simplebench (2020-03-17 21:09:26 -0400)
> >>
> >> 
> >> Python queue for 5.0 soft freeze
> >>
> >> * Add scripts/simplebench (Vladimir Sementsov-Ogievskiy)
> >>
> >
> >
> > Applied, thanks.
>
> I guess there was a misunderstanding with Eduardo; he was going to
> resend this pull request due to:
>
> ERROR: please use python3 interpreter

Ah, sorry. I'd read the replies to this thread as meaning that
those things were OK to fix as follow-up patches rather than
requiring a respin of the pull.

thanks
-- PMM


