date:20200416

[PULL 3/3] target/ppc: Fix mtmsr(d) L=1 variant that loses interrupts

2020-04-16 Thread David Gibson

From: Nicholas Piggin 

If mtmsr L=1 sets MSR[EE] while there is a maskable exception pending,
it does not cause an interrupt. This causes the test case to hang:

https://lists.gnu.org/archive/html/qemu-ppc/2019-10/msg00826.html

More recently, Linux reduced the occurrence of operations (e.g., rfi)
which stop translation and allow pending interrupts to be processed.
This started causing hangs in Linux boot in long-running kernel tests;
running with '-d int' shows the decrementer stops firing despite DEC
wrapping and MSR[EE]=1.

https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208301.html

The cause is the broken mtmsr L=1 behaviour, which is contrary to the
architecture. The Programming Note for Move To Machine State Register in
Power ISA v3.0B, p.977, states:

If MSR[EE]=0 and an External, Decrementer, or Performance Monitor
exception is pending, executing an mtmsrd instruction that sets
MSR[EE] to 1 will cause the interrupt to occur before the next
instruction is executed, if no higher priority exception exists

Fix this by handling L=1 exactly the same way as L=0, modulo the MSR
bits altered.

The confusion arises from L=0 being "context synchronizing" whereas L=1
is "execution synchronizing", which is a weaker semantic. However this
is not a relaxation of the requirement that these exceptions cause
interrupts when MSR[EE]=1 (e.g., when mtmsr executes to completion as
TCG is doing here), rather it specifies how a pipelined processor can
have multiple instructions in flight where one may influence how another
behaves.

Cc: qemu-sta...@nongnu.org
Reported-by: Anton Blanchard 
Reported-by: Nathan Chancellor 
Tested-by: Nathan Chancellor 
Signed-off-by: Nicholas Piggin 
Message-Id: <2020041431.465560-1-npig...@gmail.com>
Reviewed-by: Cédric Le Goater 
Tested-by: Cédric Le Goater 
Signed-off-by: David Gibson 
---
 target/ppc/translate.c | 46 +-
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index b207fb5386..9959259dba 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -4361,30 +4361,34 @@ static void gen_mtmsrd(DisasContext *ctx)
 CHK_SV;
 
 #if !defined(CONFIG_USER_ONLY)
+if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+gen_io_start();
+}
 if (ctx->opcode & 0x00010000) {
-/* Special form that does not need any synchronisation */
+/* L=1 form only updates EE and RI */
 TCGv t0 = tcg_temp_new();
+TCGv t1 = tcg_temp_new();
 tcg_gen_andi_tl(t0, cpu_gpr[rS(ctx->opcode)],
 (1 << MSR_RI) | (1 << MSR_EE));
-tcg_gen_andi_tl(cpu_msr, cpu_msr,
+tcg_gen_andi_tl(t1, cpu_msr,
 ~(target_ulong)((1 << MSR_RI) | (1 << MSR_EE)));
-tcg_gen_or_tl(cpu_msr, cpu_msr, t0);
+tcg_gen_or_tl(t1, t1, t0);
+
+gen_helper_store_msr(cpu_env, t1);
 tcg_temp_free(t0);
+tcg_temp_free(t1);
+
 } else {
 /*
  * XXX: we need to update nip before the store if we enter
  *  power saving mode, we will exit the loop directly from
  *  ppc_store_msr
  */
-if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
-gen_io_start();
-}
 gen_update_nip(ctx, ctx->base.pc_next);
 gen_helper_store_msr(cpu_env, cpu_gpr[rS(ctx->opcode)]);
-/* Must stop the translation as machine state (may have) changed */
-/* Note that mtmsr is not always defined as context-synchronizing */
-gen_stop_exception(ctx);
 }
+/* Must stop the translation as machine state (may have) changed */
+gen_stop_exception(ctx);
 #endif /* !defined(CONFIG_USER_ONLY) */
 }
 #endif /* defined(TARGET_PPC64) */
@@ -4394,15 +4398,23 @@ static void gen_mtmsr(DisasContext *ctx)
 CHK_SV;
 
 #if !defined(CONFIG_USER_ONLY)
-   if (ctx->opcode & 0x00010000) {
-/* Special form that does not need any synchronisation */
+if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+gen_io_start();
+}
+if (ctx->opcode & 0x00010000) {
+/* L=1 form only updates EE and RI */
 TCGv t0 = tcg_temp_new();
+TCGv t1 = tcg_temp_new();
 tcg_gen_andi_tl(t0, cpu_gpr[rS(ctx->opcode)],
 (1 << MSR_RI) | (1 << MSR_EE));
-tcg_gen_andi_tl(cpu_msr, cpu_msr,
+tcg_gen_andi_tl(t1, cpu_msr,
 ~(target_ulong)((1 << MSR_RI) | (1 << MSR_EE)));
-tcg_gen_or_tl(cpu_msr, cpu_msr, t0);
+tcg_gen_or_tl(t1, t1, t0);
+
+gen_helper_store_msr(cpu_env, t1);
 tcg_temp_free(t0);
+tcg_temp_free(t1);
+
 } else {
 TCGv msr = tcg_temp_new();
 
@@ -4411,9 +4423,6 @@ static void gen_mtmsr(DisasContext *ctx)
  *  power saving mode, we will exit the loop directly from
  *  ppc_store_msr
  */
-if (tb_cflags(ctx->base.tb) &
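
As a rough illustration of the L=1 semantics described in the commit message above, the following standalone C sketch computes the value an L=1 mtmsr(d) would store: only EE and RI are taken from the source register, every other bit keeps its old value. The bit positions (MSR_EE = 15, MSR_RI = 1, counting from the least significant bit) are assumed here, and the names `mtmsr_l1_value` and `enables_ee` are hypothetical helpers for the example, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions assumed for this sketch (bit 0 is the least significant). */
#define MSR_EE 15 /* external interrupt enable */
#define MSR_RI 1  /* recoverable interrupt */

/*
 * Hypothetical helper: the value an L=1 mtmsr(d) would store.
 * Only EE and RI come from rs; every other bit is kept from old_msr.
 */
uint64_t mtmsr_l1_value(uint64_t old_msr, uint64_t rs)
{
    const uint64_t mask = (1ULL << MSR_RI) | (1ULL << MSR_EE);
    uint64_t new_msr = (old_msr & ~mask) | (rs & mask);

    /* Postcondition: nothing outside EE and RI may have changed. */
    assert(((new_msr ^ old_msr) & ~mask) == 0);
    return new_msr;
}

/*
 * Returns 1 when the update switches MSR[EE] from 0 to 1. This is the
 * case in which a pending maskable exception has to be delivered.
 */
int enables_ee(uint64_t old_msr, uint64_t new_msr)
{
    const uint64_t ee = 1ULL << MSR_EE;

    return !(old_msr & ee) && (new_msr & ee);
}
```

Whenever `enables_ee()` reports a 0 to 1 transition, the emulator has to leave the current translation block so that pending interrupts are rechecked. That is the step the old L=1 path skipped.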

[PULL 1/3] linux-user/ppc: Fix padding in mcontext_t for ppc64

2020-04-16 Thread David Gibson

From: Richard Henderson 

The padding that was added in 95cda4c44ee was added to a union,
and so it had no effect.  This fixes misalignment errors detected
by clang sanitizers for ppc64 and ppc64le.

In addition, only ppc64 allocates space for VSX registers, so do
not save them for ppc32.  The kernel only has references to
CONFIG_SPE in signal_32.c, so do not attempt to save them for ppc64.

Fixes: 95cda4c44ee
Signed-off-by: Richard Henderson 
Message-Id: <20200407032105.26711-1-richard.hender...@linaro.org>
Acked-by: Laurent Vivier 
Signed-off-by: David Gibson 
---
 linux-user/ppc/signal.c | 69 +
 1 file changed, 29 insertions(+), 40 deletions(-)

diff --git a/linux-user/ppc/signal.c b/linux-user/ppc/signal.c
index ecd99736b7..20a02c197c 100644
--- a/linux-user/ppc/signal.c
+++ b/linux-user/ppc/signal.c
@@ -35,12 +35,26 @@ struct target_mcontext {
 target_ulong mc_gregs[48];
 /* Includes fpscr.  */
 uint64_t mc_fregs[33];
+
 #if defined(TARGET_PPC64)
 /* Pointer to the vector regs */
 target_ulong v_regs;
+/*
+ * On ppc64, this mcontext structure is naturally *unaligned*,
+ * or rather it is aligned on a 8 bytes boundary but not on
+ * a 16 byte boundary.  This pad fixes it up.  This is why we
+ * cannot use ppc_avr_t, which would force alignment.  This is
+ * also why the vector regs are referenced in the ABI by the
+ * v_regs pointer above so any amount of padding can be added here.
+ */
+target_ulong pad;
+/* VSCR and VRSAVE are saved separately.  Also reserve space for VSX. */
+struct {
+uint64_t altivec[34 + 16][2];
+} mc_vregs;
 #else
 target_ulong mc_pad[2];
-#endif
+
 /* We need to handle Altivec and SPE at the same time, which no
kernel needs to do.  Fortunately, the kernel defines this bit to
be Altivec-register-large all the time, rather than trying to
@@ -48,32 +62,14 @@ struct target_mcontext {
 union {
 /* SPE vector registers.  One extra for SPEFSCR.  */
 uint32_t spe[33];
-/* Altivec vector registers.  The packing of VSCR and VRSAVE
-   varies depending on whether we're PPC64 or not: PPC64 splits
-   them apart; PPC32 stuffs them together.
-   We also need to account for the VSX registers on PPC64
-*/
-#if defined(TARGET_PPC64)
-#define QEMU_NVRREG (34 + 16)
-/* On ppc64, this mcontext structure is naturally *unaligned*,
- * or rather it is aligned on a 8 bytes boundary but not on
- * a 16 bytes one. This pad fixes it up. This is also why the
- * vector regs are referenced by the v_regs pointer above so
- * any amount of padding can be added here
- */
-target_ulong pad;
-#else
-/* On ppc32, we are already aligned to 16 bytes */
-#define QEMU_NVRREG 33
-#endif
-/* We cannot use ppc_avr_t here as we do *not* want the implied
- * 16-bytes alignment that would result from it. This would have
- * the effect of making the whole struct target_mcontext aligned
- * which breaks the layout of struct target_ucontext on ppc64.
+/*
+ * Altivec vector registers.  One extra for VRSAVE.
+ * On ppc32, we are already aligned to 16 bytes.  We could
+ * use ppc_avr_t, but choose to share the same type as ppc64.
  */
-uint64_t altivec[QEMU_NVRREG][2];
-#undef QEMU_NVRREG
+uint64_t altivec[33][2];
 } mc_vregs;
+#endif
 };
 
 /* See arch/powerpc/include/asm/sigcontext.h.  */
@@ -278,6 +274,7 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame)
 __put_user((uint32_t)env->spr[SPR_VRSAVE], vrsave);
 }
 
+#if defined(TARGET_PPC64)
 /* Save VSX second halves */
 if (env->insns_flags2 & PPC2_VSX) {
 uint64_t *vsregs = (uint64_t *)&frame->mc_vregs.altivec[34];
@@ -286,6 +283,7 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame)
 __put_user(*vsrl, &vsregs[i]);
 }
 }
+#endif
 
 /* Save floating point registers.  */
 if (env->insns_flags & PPC_FLOAT) {
@@ -296,22 +294,18 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame)
 __put_user((uint64_t) env->fpscr, &frame->mc_fregs[32]);
 }
 
+#if !defined(TARGET_PPC64)
 /* Save SPE registers.  The kernel only saves the high half.  */
 if (env->insns_flags & PPC_SPE) {
-#if defined(TARGET_PPC64)
-for (i = 0; i < ARRAY_SIZE(env->gpr); i++) {
-__put_user(env->gpr[i] >> 32, &frame->mc_vregs.spe[i]);
-}
-#else
 for (i = 0; i < ARRAY_SIZE(env->gprh); i++) {
 __put_user(env->gprh[i], &frame->mc_vregs.spe[i]);
 }
-#endif
 /* Set MSR_SPE in the saved MSR value to indicate that
frame->mc_vregs contains valid data.  */
 msr |= MSR_SPE;
 __put_user(env->spe_fscr, &frame->mc_vregs.spe[32]);
 }
+#endif
 
 /* Store MSR.  */
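
The point that padding placed inside a union has no effect can be shown with a small, self-contained C sketch. The two structures below are simplified stand-ins (a single `head` member replaces everything that precedes the vector area, and the array sizes are arbitrary); they are not the real `target_mcontext` layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pad placed inside the union: it overlaps the vector area. */
struct pad_in_union {
    uint64_t head;              /* stands in for the preceding members */
    union {
        uint64_t pad;           /* shares storage with vregs */
        uint64_t vregs[4][2];
    } u;
};

/* Pad placed as an ordinary member ahead of the vector area. */
struct pad_before_regs {
    uint64_t head;
    uint64_t pad;               /* occupies its own 8 bytes */
    struct {
        uint64_t vregs[4][2];
    } u;
};

size_t vregs_offset_pad_in_union(void)
{
    return offsetof(struct pad_in_union, u.vregs);
}

size_t vregs_offset_pad_before(void)
{
    return offsetof(struct pad_before_regs, u.vregs);
}

/* Only the second layout starts the vector area on a 16-byte offset. */
void check_layouts(void)
{
    assert(vregs_offset_pad_in_union() % 16 != 0);
    assert(vregs_offset_pad_before() % 16 == 0);
}
```

In the first layout the pad and the registers start at the same offset, so the registers stay where they were. Only a pad that is a real member of the enclosing structure shifts what follows it.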

[PULL 2/3] target/ppc: Fix wrong interpretation of the disposition flag.

2020-04-16 Thread David Gibson

From: Ganesh Goudar 

A bitwise AND with kvm_run->flags is not a correct way to evaluate whether
we recovered from an MCE, as the disposition in kvm_run->flags is a
two-bit integer value and not a bitmap. Check for equality instead of
using a bitwise AND.

Without the fix, QEMU treats any unrecoverable MCE as recoverable and
ends up in an MCE loop inside the guest. Below are the MCE logs before
and after the fix.

Before fix:

[   66.775757] MCE: CPU0: Initiator CPU
[   66.775891] MCE: CPU0: Unknown
[   66.776587] MCE: CPU0: machine check (Harmless) Host UE Indeterminate [Recovered]
[   66.776857] MCE: CPU0: NIP: [c008000e00b8] mcetest_tlbie+0xb0/0x128 [mcetest_tlbie]

After fix:

[ 20.650577] CPU: 0 PID: 1415 Comm: insmod Tainted: G M O 5.6.0-fwnmi-arv+ #11
[ 20.650618] NIP: c008023a00e8 LR: c008023a00d8 CTR: c0021fe0
[ 20.650660] REGS: c001fffd3d70 TRAP: 0200 Tainted: G M O (5.6.0-fwnmi-arv+)
[ 20.650708] MSR: 82a0b033  CR: 42000222 XER: 2004
[ 20.650758] CFAR: c000b940 DAR: c008025e00e0 DSISR: 0200 IRQMASK: 0
[ 20.650758] GPR00: c008023a00d8 c001fddd79a0 c008023a8500 0039
[ 20.650758] GPR04: 0001   0007
[ 20.650758] GPR08: 0007 c008025e00e0  00f7
[ 20.650758] GPR12:  c190 c101f398 c008025c052f
[ 20.650758] GPR16: 03a8 c008025c c001fddd7d70 c15b7940
[ 20.650758] GPR20: fff1 c0f72c28 c008025a0988
[ 20.650758] GPR24: 0100 c008023a05d0 c01f1d70
[ 20.650758] GPR28: c001fde2 c001fd02b2e0 c008023a c008025e
[ 20.651178] NIP [c008023a00e8] mcetest_tlbie+0xe8/0xf0 [mcetest_tlbie]
[ 20.651220] LR [c008023a00d8] mcetest_tlbie+0xd8/0xf0 [mcetest_tlbie]
[ 20.651262] Call Trace:
[ 20.651280] [c001fddd79a0] [c008023a00d8] mcetest_tlbie+0xd8/0xf0 [mcetest_tlbie] (unreliable)
[ 20.651340] [c001fddd7a10] [c001091c] do_one_initcall+0x6c/0x2c0
[ 20.651390] [c001fddd7af0] [c01f7998] do_init_module+0x90/0x298
[ 20.651433] [c001fddd7b80] [c01f61a8] load_module+0x1f58/0x27a0
[ 20.651476] [c001fddd7d40] [c01f6c70] __do_sys_finit_module+0xe0/0x100
[ 20.651526] [c001fddd7e20] [c000b9d0] system_call+0x5c/0x68
[ 20.651567] Instruction dump:
[ 20.651594] e8410018 3c62 e8638020 48cd e8410018 3c62 e8638028 48bd
[ 20.651646] e8410018 7be904e4 3940 612900e0 <7d434a64> 4b74 3c4c0001 38428410
[ 20.651699] ---[ end trace 4c40897f016b4340 ]---
[ 20.653310]
Bus error
[ 20.655575] MCE: CPU0: machine check (Harmless) Host UE Indeterminate [Not recovered]
[ 20.655575] MCE: CPU0: NIP: [c008023a00e8] mcetest_tlbie+0xe8/0xf0 [mcetest_tlbie]
[ 20.655576] MCE: CPU0: Initiator CPU
[ 20.655576] MCE: CPU0: Unknown

Signed-off-by: Ganesh Goudar 
Message-Id: <20200408170944.16003-1-ganes...@linux.ibm.com>
Signed-off-by: David Gibson 
---
 target/ppc/kvm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 03d0667e8f..2692f76130 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2816,11 +2816,11 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
 #if defined(TARGET_PPC64)
 int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
 {
-bool recovered = run->flags & KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;
+uint16_t flags = run->flags & KVM_RUN_PPC_NMI_DISP_MASK;
 
 cpu_synchronize_state(CPU(cpu));
 
-spapr_mce_req_event(cpu, recovered);
+spapr_mce_req_event(cpu, flags == KVM_RUN_PPC_NMI_DISP_FULLY_RECOV);
 
 return 0;
 }
-- 
2.25.2
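
To see why the bitwise test misfires, here is a minimal sketch that compares the two checks over all four values of a two-bit field. The `NMI_DISP_*` macros are local stand-ins, and their numeric values (mask 3, fully recovered 1, limited 2, not recovered 3) are an assumption about the KVM definitions that should be checked against the kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding of the two-bit disposition field (stand-in names). */
#define NMI_DISP_MASK          (3 << 0)
#define NMI_DISP_FULLY_RECOV   (1 << 0)
#define NMI_DISP_LIMITED_RECOV (2 << 0)
#define NMI_DISP_NOT_RECOV     (3 << 0)

/* Old check: tests a single bit, so 0b11 (not recovered) also passes. */
bool recovered_bitwise(uint16_t flags)
{
    return flags & NMI_DISP_FULLY_RECOV;
}

/* Fixed check: isolate the field first, then compare for equality. */
bool recovered_equality(uint16_t flags)
{
    return (flags & NMI_DISP_MASK) == NMI_DISP_FULLY_RECOV;
}

/* Count the field values for which the two checks give different answers. */
int count_disagreements(void)
{
    int n = 0;

    for (uint16_t v = 0; v <= NMI_DISP_MASK; v++) {
        if (recovered_bitwise(v) != recovered_equality(v)) {
            n++;
        }
    }
    return n;
}
```

Under the assumed encoding the two checks differ only for the "not recovered" value, which is exactly the case the commit message describes as being misreported.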

[PULL 0/3] ppc-for-5.0 queue 20200417

2020-04-16 Thread David Gibson

The following changes since commit 20038cd7a8412feeb49c01f6ede89e36c8995472:

  Update version for v5.0.0-rc3 release (2020-04-15 20:51:54 +0100)

are available in the Git repository at:

  git://github.com/dgibson/qemu.git tags/ppc-for-5.0-20200417

for you to fetch changes up to 5ed195065cc6895f61b9d59bfa0a0536ed5ed51e:

  target/ppc: Fix mtmsr(d) L=1 variant that loses interrupts (2020-04-17 10:39:03 +1000)


ppc patch queue for 2020-04-17

Here are a few late bugfixes for qemu-5.0 in the ppc target code.
Unless some really nasty last minute bug shows up, I expect this to be
the last ppc pull request for qemu-5.0.


Ganesh Goudar (1):
  target/ppc: Fix wrong interpretation of the disposition flag.

Nicholas Piggin (1):
  target/ppc: Fix mtmsr(d) L=1 variant that loses interrupts

Richard Henderson (1):
  linux-user/ppc: Fix padding in mcontext_t for ppc64

 linux-user/ppc/signal.c | 69 +
 target/ppc/kvm.c|  4 +--
 target/ppc/translate.c  | 46 +++--
 3 files changed, 58 insertions(+), 61 deletions(-)

[PATCH qemu] spapr: Add PVR setting capability

2020-04-16 Thread Alexey Kardashevskiy

At the moment the VCPU init sequence includes setting the PVR, which in the
case of KVM-HV only checks that it matches the hardware PVR mask, as the PVR
cannot be virtualized by the hardware. In order to cope with various CPU
revisions, only the top 16 bits of the PVR are checked, which works for
minor revision updates.

However, in every CPU generation starting with POWER7 (at least) there have
been CPUs supporting (almost) the same POWER ISA level but having different
top 16 bits of the PVR: POWER7+, POWER8E, POWER8NVL; this time we got
POWER9+ with a new PVR family. We would normally add the PVR mask for the
new one too; the problem is that although the physical machines exist,
P9+ is not going to be released as a product, and this situation is likely
to repeat in the future.

Instead of adding every new CPU family in QEMU, this adds a new sPAPR
machine capability to force PVR setting/checking. It is "on" by default
to preserve the existing behavior. When "off", it is the user's
responsibility to specify the correct CPU.

Signed-off-by: Alexey Kardashevskiy 
---
 include/hw/ppc/spapr.h |  5 -
 hw/ppc/spapr.c |  1 +
 hw/ppc/spapr_caps.c| 18 ++
 target/ppc/kvm.c   | 16 ++--
 4 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index e579eaf28c05..5ccac4d56871 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -81,8 +81,10 @@ typedef enum {
 #define SPAPR_CAP_CCF_ASSIST0x09
 /* Implements PAPR FWNMI option */
 #define SPAPR_CAP_FWNMI 0x0A
+/* Implements PAPR PVR option */
+#define SPAPR_CAP_PVR   0x0B
 /* Num Caps */
-#define SPAPR_CAP_NUM   (SPAPR_CAP_FWNMI + 1)
+#define SPAPR_CAP_NUM   (SPAPR_CAP_PVR + 1)
 
 /*
  * Capability Values
@@ -912,6 +914,7 @@ extern const VMStateDescription vmstate_spapr_cap_nested_kvm_hv;
 extern const VMStateDescription vmstate_spapr_cap_large_decr;
 extern const VMStateDescription vmstate_spapr_cap_ccf_assist;
 extern const VMStateDescription vmstate_spapr_cap_fwnmi;
+extern const VMStateDescription vmstate_spapr_cap_pvr;
 
 static inline uint8_t spapr_get_cap(SpaprMachineState *spapr, int cap)
 {
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 841b5ec59b12..ecc74c182b9f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4535,6 +4535,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
 smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
 smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_ON;
 smc->default_caps.caps[SPAPR_CAP_FWNMI] = SPAPR_CAP_ON;
+smc->default_caps.caps[SPAPR_CAP_PVR] = SPAPR_CAP_ON;
 spapr_caps_add_properties(smc, &error_abort);
 smc->irq = &spapr_irq_dual;
 smc->dr_phb_enabled = true;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index eb54f9422722..398b72b77f9f 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -525,6 +525,14 @@ static void cap_fwnmi_apply(SpaprMachineState *spapr, uint8_t val,
 }
 }
 
+static void cap_pvr_apply(SpaprMachineState *spapr, uint8_t val, Error **errp)
+{
+if (val) {
+return;
+}
+warn_report("If you're using kvm-hv.ko, only \"-cpu host\" is supported");
+}
+
 SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
 [SPAPR_CAP_HTM] = {
 .name = "htm",
@@ -633,6 +641,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
 .type = "bool",
 .apply = cap_fwnmi_apply,
 },
+[SPAPR_CAP_PVR] = {
+.name = "pvr",
+.description = "Enforce PVR in KVM",
+.index = SPAPR_CAP_PVR,
+.get = spapr_cap_get_bool,
+.set = spapr_cap_set_bool,
+.type = "bool",
+.apply = cap_pvr_apply,
+},
 };
 
 static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
@@ -773,6 +790,7 @@ SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
 SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
 SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
 SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI);
+SPAPR_CAP_MIG_STATE(pvr, SPAPR_CAP_PVR);
 
 void spapr_caps_init(SpaprMachineState *spapr)
 {
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 03d0667e8f94..a4adc29b6522 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -466,15 +466,27 @@ int kvm_arch_init_vcpu(CPUState *cs)
 PowerPCCPU *cpu = POWERPC_CPU(cs);
 CPUPPCState *cenv = &cpu->env;
 int ret;
+SpaprMachineState *spapr;
 
 /* Synchronize sregs with kvm */
 ret = kvm_arch_sync_sregs(cpu);
 if (ret) {
 if (ret == -EINVAL) {
 error_report("Register sync failed... If you're using kvm-hv.ko,"
- " only \"-cpu host\" is possible");
+ " only \"-cpu host\" is supported");
+}
+/*
+ * The user chose not to set PVR which makes sense if we are running
+ * on a CPU with known ISA level but unknown

Re: [PATCH 4/4] vhost-user-blk: fix crash in realize process

2020-04-16 Thread Raphael Norwitz

Mostly looks good - just a few superficial notes.

On Wed, Apr 15, 2020 at 11:28:26AM +0800, Li Feng wrote:
> 1. set s->connected to true after vhost_dev_init;
> 2. call vhost_dev_get_config when s->connected is true, otherwise the
> hdev->host_ops will be nullptr.

You mean hdev->vhost_ops, right?

> 
> Signed-off-by: Li Feng 
> ---
>  hw/block/vhost-user-blk.c | 47 
> +--
>  1 file changed, 25 insertions(+), 22 deletions(-)
> +/*
> + * set true util vhost_dev_init return ok, because CLOSE event may happen
> + * in vhost_dev_init routine.
> + */

I'm a little confused by this comment. Do you mean to say “wait until
vhost_dev_init succeeds to set connected to true, because a close event may
happen while vhost_dev_init is executing”?

Re: [PATCH 2/4] vhost-user-blk: fix invalid memory access

2020-04-16 Thread Raphael Norwitz

On Wed, Apr 15, 2020 at 11:28:24AM +0800, Li Feng wrote:
> 
> when s->inflight is freed, vhost_dev_free_inflight may try to access
> s->inflight->addr, it will retrigger the following issue.
> 
> ==7309==ERROR: AddressSanitizer: heap-use-after-free on address 
> 0x604001020d18 at pc 0x55ce948a bp 0x7fffb170 sp 0x7fffb160
> READ of size 8 at 0x604001020d18 thread T0
> #0 0x55ce9489 in vhost_dev_free_inflight 
> /root/smartx/qemu-el7/qemu-test/hw/virtio/vhost.c:1473
> #1 0x55cd86eb in virtio_reset 
> /root/smartx/qemu-el7/qemu-test/hw/virtio/virtio.c:1214
> #2 0x560d3eff in virtio_pci_reset hw/virtio/virtio-pci.c:1859
> #3 0x55f2ac53 in device_set_realized hw/core/qdev.c:893
> #4 0x561d572c in property_set_bool qom/object.c:1925
> #5 0x561de8de in object_property_set_qobject qom/qom-qobject.c:27
> #6 0x561d99f4 in object_property_set_bool qom/object.c:1188
> #7 0x55e50ae7 in qdev_device_add 
> /root/smartx/qemu-el7/qemu-test/qdev-monitor.c:626
> #8 0x55e51213 in qmp_device_add 
> /root/smartx/qemu-el7/qemu-test/qdev-monitor.c:806
> #9 0x55e8ff40 in hmp_device_add 
> /root/smartx/qemu-el7/qemu-test/hmp.c:1951
> #10 0x55be889a in handle_hmp_command 
> /root/smartx/qemu-el7/qemu-test/monitor.c:3404
> #11 0x55beac8b in monitor_command_cb 
> /root/smartx/qemu-el7/qemu-test/monitor.c:4296
> #12 0x56433eb7 in readline_handle_byte util/readline.c:393
> #13 0x55be89ec in monitor_read 
> /root/smartx/qemu-el7/qemu-test/monitor.c:4279
> #14 0x563285cc in tcp_chr_read chardev/char-socket.c:470
> #15 0x7670b968 in g_main_context_dispatch 
> (/lib64/libglib-2.0.so.0+0x4a968)
> #16 0x5640727c in glib_pollfds_poll util/main-loop.c:215
> #17 0x5640727c in os_host_main_loop_wait util/main-loop.c:238
> #18 0x5640727c in main_loop_wait util/main-loop.c:497
> #19 0x55b2d0bf in main_loop /root/smartx/qemu-el7/qemu-test/vl.c:2013
> #20 0x55b2d0bf in main /root/smartx/qemu-el7/qemu-test/vl.c:4776
> #21 0x7fffdd2eb444 in __libc_start_main (/lib64/libc.so.6+0x22444)
> #22 0x55b3767a  
> (/root/smartx/qemu-el7/qemu-test/x86_64-softmmu/qemu-system-x86_64+0x5e367a)
> 
> 0x604001020d18 is located 8 bytes inside of 40-byte region 
> [0x604001020d10,0x604001020d38)
> freed by thread T0 here:
> #0 0x76f00508 in __interceptor_free (/lib64/libasan.so.4+0xde508)
> #1 0x7671107d in g_free (/lib64/libglib-2.0.so.0+0x5007d)
> 
> previously allocated by thread T0 here:
> #0 0x76f00a88 in __interceptor_calloc (/lib64/libasan.so.4+0xdea88)
> #1 0x76710fc5 in g_malloc0 (/lib64/libglib-2.0.so.0+0x4ffc5)
> 
> SUMMARY: AddressSanitizer: heap-use-after-free 
> /root/smartx/qemu-el7/qemu-test/hw/virtio/vhost.c:1473 in 
> vhost_dev_free_inflight
> Shadow bytes around the buggy address:
>   0x0c08801fc150: fa fa 00 00 00 00 04 fa fa fa fd fd fd fd fd fa
>   0x0c08801fc160: fa fa fd fd fd fd fd fd fa fa 00 00 00 00 04 fa
>   0x0c08801fc170: fa fa 00 00 00 00 00 01 fa fa 00 00 00 00 04 fa
>   0x0c08801fc180: fa fa 00 00 00 00 00 01 fa fa 00 00 00 00 00 01
>   0x0c08801fc190: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 04 fa
> =>0x0c08801fc1a0: fa fa fd[fd]fd fd fd fa fa fa fd fd fd fd fd fa
>   0x0c08801fc1b0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
>   0x0c08801fc1c0: fa fa 00 00 00 00 00 fa fa fa fd fd fd fd fd fd
>   0x0c08801fc1d0: fa fa 00 00 00 00 00 01 fa fa fd fd fd fd fd fa
>   0x0c08801fc1e0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fd
>   0x0c08801fc1f0: fa fa 00 00 00 00 00 01 fa fa fd fd fd fd fd fa
> Shadow byte legend (one shadow byte represents 8 application bytes):
>   Addressable:   00
>   Partially addressable: 01 02 03 04 05 06 07
>   Heap left redzone:   fa
>   Freed heap region:   fd
>   Stack left redzone:  f1
>   Stack mid redzone:   f2
>   Stack right redzone: f3
>   Stack after return:  f5
>   Stack use after scope:   f8
>   Global redzone:  f9
>   Global init order:   f6
>   Poisoned by user:f7
>   Container overflow:  fc
>   Array cookie:ac
>   Intra object redzone:bb
>   ASan internal:   fe
>   Left alloca redzone: ca
>   Right alloca redzone:cb
> ==7309==ABORTING
> 
> Signed-off-by: Li Feng 

Reviewed-by: Raphael Norwitz 

> ---
>  hw/block/vhost-user-blk.c | 4 
>  hw/virtio/vhost.c | 2 +-
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 776b9af3eb..19e79b96e4 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -463,7 +463,9 @@ reconnect:
>  
>  virtio_err:
>  g_free(s->vhost_vqs);
> +s->vhost_vqs = NULL;
>  g_free(s->inflight);
> +s->inflight = NULL;
>  for (i = 0; i < s->num_queues; i++) {
>  virtio_delete_queue(s->virtqs[i]);
>  }
> @@ -484,7 +486,9 @@ static
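
The pattern in the quoted hunk (free, then clear the pointer) is the usual guard against a later code path touching freed memory. Below is a minimal sketch using plain `free()` and a hypothetical `blk_state` structure rather than the real QEMU types or `g_free()`.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the device state; not the QEMU structure. */
struct blk_state {
    int *inflight;
};

/* Error path: release the buffer and clear the pointer in one place. */
void blk_release(struct blk_state *s)
{
    free(s->inflight);
    s->inflight = NULL;   /* later readers now see "nothing allocated" */
}

/*
 * Later cleanup path: it may only dereference inflight when this returns
 * non-zero. Without the NULL store above, a stale pointer would still pass.
 */
int blk_has_inflight(const struct blk_state *s)
{
    return s->inflight != NULL;
}
```

Calling `blk_release()` twice is harmless in this sketch because `free(NULL)` is a no-op, which is another benefit of clearing the pointer.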

Re: [PATCH 1/4] vhost-user-blk: delay vhost_user_blk_disconnect

2020-04-16 Thread Raphael Norwitz

On Wed, Apr 15, 2020 at 11:28:23AM +0800, Li Feng wrote:
> 
>  switch (event) {
>  case CHR_EVENT_OPENED:
> @@ -363,7 +376,16 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
>  }
>  break;
>  case CHR_EVENT_CLOSED:
> -vhost_user_blk_disconnect(dev);
> +/*
> + * a close event may happen during a read/write, but vhost
> + * code assumes the vhost_dev remains setup, so delay the
> + * stop & clear to idle.
> + */
> +ctx = qemu_get_current_aio_context();
> +
> +qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL, NULL,
> + NULL, NULL, NULL, false);
> +aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, opaque);

This seems a bit racy. What’s to stop the async operation from executing before
the next read?

If the issue is just that the vhost_dev state is being destroyed too early, why
don’t we rather move the vhost_dev_cleanup() call from
vhost_user_blk_disconnect() to vhost_user_blk_connect()? We may need to add
some state to tell if this is the first connect or a reconnect so we don’t call
vhost_dev_cleanup() on initial connect, but as long as we call
vhost_dev_cleanup() before vhost_dev_init() every time the device reconnects it
shouldn’t matter that we keep that state around.

Thoughts?

>  break;
>  case CHR_EVENT_BREAK:
>  case CHR_EVENT_MUX_IN:
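
The alternative suggested above (keep the vhost state on disconnect, and clean it up at the start of the next connect) could look roughly like the following state machine. Everything here is hypothetical: `dev_state`, its counters, and the two functions only model the ordering of cleanup and init, and do not call any QEMU API.

```c
#include <stdbool.h>

/* Hypothetical model of the device; counters exist only for illustration. */
struct dev_state {
    bool initialized;   /* state from a previous connection still exists */
    bool connected;
    int cleanups;       /* how often the stale state was torn down */
    int inits;          /* how often the state was (re)initialized */
};

/* Disconnect: keep the state, in-flight I/O may still reference it. */
void dev_disconnect(struct dev_state *d)
{
    d->connected = false;
}

/* Connect: on a reconnect, clean up the stale state before initializing. */
void dev_connect(struct dev_state *d)
{
    if (d->initialized) {
        d->cleanups++;          /* stands in for the cleanup call */
    }
    d->inits++;                 /* stands in for the init call */
    d->initialized = true;
    d->connected = true;
}
```

The `initialized` flag is the extra piece of state the reply mentions: it distinguishes the first connect, where there is nothing to clean up, from every later reconnect.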

Re: [PATCH v3 0/7] s390x/vfio-ccw: Channel Path Handling [QEMU]

2020-04-16 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20200417023440.70514-1-far...@linux.ibm.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

PASS 1 fdc-test /x86_64/fdc/cmos
PASS 2 fdc-test /x86_64/fdc/no_media_on_start
PASS 3 fdc-test /x86_64/fdc/read_without_media
==6194==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-qobject-output-visitor /visitor/output/int
PASS 2 test-qobject-output-visitor /visitor/output/bool
PASS 4 fdc-test /x86_64/fdc/media_change
---
PASS 32 test-opts-visitor /visitor/opts/range/beyond
PASS 33 test-opts-visitor /visitor/opts/dict/unvisited
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-coroutine -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-coroutine" 
==6236==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
==6236==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 
0x7fff2a4ff000; bottom 0x7fa61792; size: 0x005912bdf000 (382566526976)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-coroutine /basic/no-dangling-access
---
PASS 12 test-aio /aio/event/flush
PASS 13 test-aio /aio/event/wait/no-flush-cb
PASS 11 fdc-test /x86_64/fdc/read_no_dma_18
==6251==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 14 test-aio /aio/timer/schedule
PASS 15 test-aio /aio/coroutine/queue-chaining
PASS 16 test-aio /aio-gsource/flush
---
PASS 28 test-aio /aio-gsource/timer/schedule
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-aio-multithread -m=quick -k --tap < /dev/null | 
./scripts/tap-driver.pl --test-name="test-aio-multithread" 
PASS 1 test-aio-multithread /aio/multi/lifecycle
==6256==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 12 fdc-test /x86_64/fdc/read_no_dma_19
PASS 2 test-aio-multithread /aio/multi/schedule
PASS 13 fdc-test /x86_64/fdc/fuzz-registers
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img 
tests/qtest/ide-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="ide-test" 
==6278==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 ide-test /x86_64/ide/identify
PASS 3 test-aio-multithread /aio/multi/mutex/contended
==6284==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 2 ide-test /x86_64/ide/flush
==6295==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 3 ide-test /x86_64/ide/bmdma/simple_rw
==6301==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 4 ide-test /x86_64/ide/bmdma/trim
PASS 4 test-aio-multithread /aio/multi/mutex/handoff
==6307==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 5 test-aio-multithread /aio/multi/mutex/mcs
PASS 6 test-aio-multithread /aio/multi/mutex/pthread
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-throttle -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-throttle" 
==6324==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-throttle /throttle/leak_bucket
PASS 2 test-throttle /throttle/compute_wait
PASS 3 test-throttle /throttle/init
---
PASS 14 test-throttle /throttle/config/max
PASS 15 test-throttle /throttle/config/iops_size
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-thread-pool -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-thread-pool" 
==6328==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-thread-pool /thread-pool/submit
PASS 2 test-thread-pool /thread-pool/submit-aio
PASS 3 test-thread-pool /thread-pool/submit-co
PASS 4 test-thread-pool /thread-pool/submit-many
PASS 5 test-thread-pool /thread-pool/cancel
==6395==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 6 test-thread-pool /thread-pool/cancel-async

Re: [edk2-discuss] Load Option passing. Either bugs or my confusion.

2020-04-16 Thread Hou Qiming

I'm glad we can reach a consensus that ramfb needs sanity checks. And well,
I'm probably at fault with the hijacking.

Your QEMU/TCG in QEMU/TCG example also made me realize a deeper problem,
though: your setting still can't escape the host display / physical GPU
issue. The middle display layers may be bochs or whatever, but as long as the
framebuffer content and resolution values are propagated, and the end
result is displayed at all on the host, the host GPU attack surface remains
exposed to the L2 guest, and checks are needed. Everything shown on the
screen involves the display driver - GPU stack, GTK or SDL or tty, you
can't avoid that. ramfb-kvmgt just happened to be the shortest pipeline
where every stage neglected the checks, which exposed this problem. Blaming
this on ramfb is unfair since in your scenario the checks are better done
in the display subsystems.

TL;DR You made me realize right now, it's a very real risk that an AARCH64
Windows guest could exploit an x64 host's display driver by specifying a
crafted framebuffer with overflowing resolution. I don't want to break it,
but I'd prefer a broken state over an insecure state.

I'm not quite sure what this thread is. But given the scope this
discussion is taking, I think it may be more of a bug than a regression.
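
As a sketch of the kind of sanity check being discussed, the function below validates a guest-supplied framebuffer geometry before it would be handed to a display backend. The 2048 limit is the figure floated in this thread rather than a QEMU constant, and `fb_config_valid`, its parameters, and the bytes-per-pixel bound are assumptions made for the example; an actual fix in QEMU may look different.

```c
#include <stdbool.h>
#include <stdint.h>

#define FB_MAX_DIM 2048u  /* limit suggested in the thread, not a QEMU value */

/*
 * Hypothetical validator: accept the geometry only if both dimensions are
 * within [1, FB_MAX_DIM], the stride covers a full row, and the whole
 * framebuffer fits inside the guest memory region that backs it.
 */
bool fb_config_valid(uint32_t width, uint32_t height, uint32_t stride,
                     uint32_t bytes_per_pixel, uint64_t region_size,
                     uint64_t *out_size)
{
    if (width < 1 || width > FB_MAX_DIM ||
        height < 1 || height > FB_MAX_DIM) {
        return false;
    }
    if (bytes_per_pixel < 1 || bytes_per_pixel > 8) {
        return false;
    }
    /* width <= 2048 and bytes_per_pixel <= 8: no overflow in 64 bits. */
    uint64_t min_stride = (uint64_t)width * bytes_per_pixel;
    if (stride < min_stride) {
        return false;
    }
    /* stride < 2^32 and height <= 2048: the 64-bit product is exact. */
    uint64_t size = (uint64_t)stride * height;
    if (size > region_size) {
        return false;
    }
    *out_size = size;
    return true;
}
```

Doing the multiplications in 64 bits after the range checks is what rules out the integer overflow mentioned above; the size is only reported to the caller once every check has passed.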


On Thu, Apr 16, 2020 at 10:12 PM Laszlo Ersek  wrote:

> On 04/16/20 06:38, Hou Qiming wrote:
> > Very good point, I did neglect ramfb resolution changes... But there is
> > one important thing: it *can* cause a QEMU crash, a potentially
> > exploitable one, not always a guest crash. That's what motivated my
> > heavy-handed approach since allowing resolution change would have
> > necessitated a good deal of security checks. It has crashed my host
> > *kernel* quite a few times.
> >
> > The point is, while the QemuRamfbDxe driver may behave properly, nothing
> > prevents the guest from writing garbage or *malicious* values to the
> > ramfb config space. Then the values are sent to the display component
> > without any sanity check. For some GUI frontends, this means allocating
> > an OpenGL texture with guest-supplied dimensions and uploading guest
> > memory content to it, which means that guest memory content goes
> > straight into a *kernel driver*, *completely unchecked*. Some integer
> > overflow and a lenient GPU driver later, and the guest escapes straight
> > to kernel.
> >
> > The proper way to enable ramfb resolution change again is adding sanity
> > checks for ramfb resolution / pointer / etc. on the QEMU side. We have to
> > make sure it doesn't exceed what the host GPU driver supports. Maybe
> > clamp both width and height to between 1 and 2048? We also need to
> > validate that OpenGL texture dimension update succeeds. Note that OpenGL
> > is not obliged to validate anything and everything has to be checked on
> > the QEMU side.
>
> I agree that QEMU should sanity check the resolution requested by the
> guest. I also agree that "arbitrary" limits are acceptable, for
> preventing integer overflows and -- hopefully -- memory allocation
> failures too.
>
> But I don't see the host kernel / OpenGL / physical GPU angle, at least
> not directly. That angle seems to be specific to your particular use
> case (particular choice of display backend).
>
> For example, if you nest QEMU/TCG in QEMU/TCG, with no KVM and no device
> assignment in the picture anywhere, and OVMF drives ramfb in L2, and the
> display *backend* (such as GTK or SDL GUI window) for the QEMU process
> running in L1 sits on top of a virtual device (such as bochs-display)
> provided by QEMU running in L0, then the ramfb stuff (including the
> resolution changes and the range checks) should work just the same,
> between L2 and L1.
>
> I kinda feel like ramfb has been hijacked for providing a boot time
> display crutch for kvmgt. (I might not be using the correct terminology
> here; sorry about that). That's *not* what ramfb was originally intended
> for, as far as I recall. Compare:
>
> - 59926de9987c ("Merge remote-tracking branch
> 'remotes/kraxel/tags/vga-20180618-pull-request' into staging", 2018-06-19)
>
> - dddb37495b84 ("Merge remote-tracking branch
> 'remotes/awilliam/tags/vfio-updates-20181015.0' into staging", 2018-10-15)
>
> IIRC, Gerd originally invented ramfb for giving AARCH64 Windows the
> linear framebuffer that the latter so badly wants, in particular so that
> the framebuffer exist in guest RAM (not in guest MMIO), in order to
> avoid the annoying S1/S2 caching behavior of AARCH64/KVM when the guest
> maps an area as MMIO that is mapped as RAM on the host [1]. See:
>
> - https://bugzilla.tianocore.org/show_bug.cgi?id=785#c4
> - https://bugzilla.tianocore.org/show_bug.cgi?id=785#c7
> - https://bugzilla.tianocore.org/show_bug.cgi?id=785#c8
>
> and the further references given in those bugzilla comments.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1679680#c0
>
> Component reuse is obviously *hugely* important, and it would be silly
> for me to

[PATCH v3 0/7] s390x/vfio-ccw: Channel Path Handling [QEMU]

2020-04-16 Thread Eric Farman

Here is a new pass at the channel-path handling code for vfio-ccw,
to take advantage of the corresponding kernel patches posted here:

https://lore.kernel.org/kvm/20200417023001.65006-1-far...@linux.ibm.com/

Per the discussion in v2, I dropped the EIO-to-CC3 patch from the
head of the series.

I also added a patch to refactor css_queue_crw(), so we can get a
CRW queued with a fully-qualified CRW we get out of this region
instead of extracting/recreating it.

Besides that, changes should be in the git notes for each patch.

v2: 
https://lore.kernel.org/qemu-devel/20200206214509.16434-1-far...@linux.ibm.com/
v1: 
https://lore.kernel.org/qemu-devel/20191115033437.37926-1-far...@linux.ibm.com/

Eric Farman (3):
  vfio-ccw: Refactor cleanup of regions
  vfio-ccw: Refactor ccw irq handler
  s390x/css: Refactor the css_queue_crw() routine

Farhan Ali (4):
  linux-headers: update
  vfio-ccw: Add support for the schib region
  vfio-ccw: Add support for the crw region
  vfio-ccw: Add support for the CRW irq

 hw/s390x/css.c |  57 ++---
 hw/s390x/s390-ccw.c|  28 +
 hw/vfio/ccw.c  | 203 +
 include/hw/s390x/css.h |   4 +-
 include/hw/s390x/s390-ccw.h|   1 +
 linux-headers/linux/vfio.h |  40 +++
 linux-headers/linux/vfio_ccw.h |  18 +++
 target/s390x/ioinst.c  |   3 +-
 8 files changed, 313 insertions(+), 41 deletions(-)

-- 
2.17.1

[PATCH v3 1/7] linux-headers: update

2020-04-16 Thread Eric Farman

From: Farhan Ali 

Signed-off-by: Farhan Ali 
Signed-off-by: Eric Farman 
---

Notes:
v2->v3: [EF]
 - Re-ran 16 April 2020 (based on kernel tag v5.6, and limited to
   bits interesting to this series)

v1->v2: [EF]
 - Re-ran 3 February 2020 (based on kernel tag v5.5)

v0->v1: [EF]
 - Run scripts/update-linux-headers.sh properly, but do not
   add resulting changes to linux-headers/asm-mips/

 linux-headers/linux/vfio.h | 40 ++
 linux-headers/linux/vfio_ccw.h | 18 +++
 2 files changed, 58 insertions(+)

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index fb10370d29..9c8d889551 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -378,6 +378,8 @@ struct vfio_region_gfx_edid {
 
 /* sub-types for VFIO_REGION_TYPE_CCW */
 #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD  (1)
+#define VFIO_REGION_SUBTYPE_CCW_SCHIB  (2)
+#define VFIO_REGION_SUBTYPE_CCW_CRW    (3)
 
 /*
  * The MSIX mappable capability informs that MSIX data of a BAR can be mmapped
@@ -577,6 +579,7 @@ enum {
 
 enum {
VFIO_CCW_IO_IRQ_INDEX,
+   VFIO_CCW_CRW_IRQ_INDEX,
VFIO_CCW_NUM_IRQS
 };
 
@@ -707,6 +710,43 @@ struct vfio_device_ioeventfd {
 
 #define VFIO_DEVICE_IOEVENTFD  _IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_DEVICE_FEATURE - _IORW(VFIO_TYPE, VFIO_BASE + 17,
+ *struct vfio_device_feature)
+ *
+ * Get, set, or probe feature data of the device.  The feature is selected
+ * using the FEATURE_MASK portion of the flags field.  Support for a feature
+ * can be probed by setting both the FEATURE_MASK and PROBE bits.  A probe
+ * may optionally include the GET and/or SET bits to determine read vs write
+ * access of the feature respectively.  Probing a feature will return success
+ * if the feature is supported and all of the optionally indicated GET/SET
+ * methods are supported.  The format of the data portion of the structure is
+ * specific to the given feature.  The data portion is not required for
+ * probing.  GET and SET are mutually exclusive, except for use with PROBE.
+ *
+ * Return 0 on success, -errno on failure.
+ */
+struct vfio_device_feature {
+   __u32   argsz;
+   __u32   flags;
+#define VFIO_DEVICE_FEATURE_MASK   (0xffff) /* 16-bit feature index */
+#define VFIO_DEVICE_FEATURE_GET    (1 << 16) /* Get feature into data[] */
+#define VFIO_DEVICE_FEATURE_SET    (1 << 17) /* Set feature from data[] */
+#define VFIO_DEVICE_FEATURE_PROBE  (1 << 18) /* Probe feature support */
+   __u8    data[];
+};
+
+#define VFIO_DEVICE_FEATURE    _IO(VFIO_TYPE, VFIO_BASE + 17)
+
+/*
+ * Provide support for setting a PCI VF Token, which is used as a shared
+ * secret between PF and VF drivers.  This feature may only be set on a
+ * PCI SR-IOV PF when SR-IOV is enabled on the PF and there are no existing
+ * open VFs.  Data provided when setting this feature is a 16-byte array
+ * (__u8 b[16]), representing a UUID.
+ */
+#define VFIO_DEVICE_FEATURE_PCI_VF_TOKEN   (0)
+
 /*  API for Type1 VFIO IOMMU  */
 
 /**
diff --git a/linux-headers/linux/vfio_ccw.h b/linux-headers/linux/vfio_ccw.h
index fcc3e69ef5..237fd5a618 100644
--- a/linux-headers/linux/vfio_ccw.h
+++ b/linux-headers/linux/vfio_ccw.h
@@ -34,4 +34,22 @@ struct ccw_cmd_region {
__u32 ret_code;
 } __attribute__((packed));
 
+/*
+ * Used for processing commands that read the subchannel-information block
+ * Reading this region triggers a stsch() to hardware
+ * Note: this is controlled by a capability
+ */
+struct ccw_schib_region {
+#define SCHIB_AREA_SIZE 52
+   __u8 schib_area[SCHIB_AREA_SIZE];
+} __attribute__((packed));
+
+/*
+ * Used for returning Channel Report Word(s) to userspace.
+ * Note: this is controlled by a capability
+ */
+struct ccw_crw_region {
+   __u32 crw;
+} __attribute__((packed));
+
 #endif
-- 
2.17.1
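
As a reading aid for the VFIO_DEVICE_FEATURE comment in the header update above, the following sketch composes and checks the flags word (feature index in the low 16 bits, GET/SET/PROBE above it). The EX_-prefixed macros are local copies of the layout shown in the patch, and the helper names are hypothetical. Treating "neither GET nor SET without PROBE" as invalid is an assumption; the header comment only says GET and SET are mutually exclusive except with PROBE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the flag layout from the header update (illustrative). */
#define EX_FEATURE_MASK   0xffffu        /* 16-bit feature index */
#define EX_FEATURE_GET    (1u << 16)
#define EX_FEATURE_SET    (1u << 17)
#define EX_FEATURE_PROBE  (1u << 18)

/* Build a flags word for a given feature index and operation bits. */
static uint32_t ex_feature_flags(uint16_t index, bool probe, bool get, bool set)
{
    /* GET and SET together only make sense when probing. */
    assert(!(get && set) || probe);

    uint32_t flags = index & EX_FEATURE_MASK;
    if (probe) {
        flags |= EX_FEATURE_PROBE;
    }
    if (get) {
        flags |= EX_FEATURE_GET;
    }
    if (set) {
        flags |= EX_FEATURE_SET;
    }
    return flags;
}

/* Check a flags word against the rules stated in the header comment. */
static bool ex_feature_flags_valid(uint32_t flags)
{
    const uint32_t known = EX_FEATURE_MASK | EX_FEATURE_GET |
                           EX_FEATURE_SET | EX_FEATURE_PROBE;
    bool get = (flags & EX_FEATURE_GET) != 0;
    bool set = (flags & EX_FEATURE_SET) != 0;

    if (flags & ~known) {
        return false;               /* unknown bits set */
    }
    if (flags & EX_FEATURE_PROBE) {
        return true;                /* PROBE may carry GET, SET, both, or none */
    }
    return get != set;              /* otherwise exactly one of GET/SET */
}
```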

[PATCH v3 3/7] vfio-ccw: Add support for the schib region

2020-04-16 Thread Eric Farman

From: Farhan Ali 

The schib region can be used to obtain the latest SCHIB from the host
passthrough subchannel. Since the guest SCHIB is virtualized,
we currently only update the path related information so that the
guest is aware of any path related changes when it issues the
'stsch' instruction.

Signed-off-by: Farhan Ali 
Signed-off-by: Eric Farman 
---

Notes:
v1->v2:
 - Remove silly variable initialization, and add a block comment,
   to css_do_stsch() [CH]
 - Add a TODO statement to s390_ccw_store(), for myself to sort
   out while we go over kernel code more closely [CH/EF]
 - In vfio_ccw_handle_store(),
- Set schib pointer once region is determined to be non-NULL [CH]
- Return cc=0 if pread() fails, and log an error [CH]

v0->v1: [EF]
 - Change various incarnations of "update chp status" to
   "handle_store", to reflect the STSCH instruction that will
   drive this code
 - Remove temporary variable for casting/testing purposes in
   s390_ccw_store(), and add a block comment of WHY it's there.
 - Add a few comments to vfio_ccw_handle_store()

 hw/s390x/css.c  | 13 ++--
 hw/s390x/s390-ccw.c | 28 +
 hw/vfio/ccw.c   | 63 +
 include/hw/s390x/css.h  |  3 +-
 include/hw/s390x/s390-ccw.h |  1 +
 target/s390x/ioinst.c   |  3 +-
 6 files changed, 106 insertions(+), 5 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index 5d8e08667e..a44faa3549 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -1335,11 +1335,20 @@ static void copy_schib_to_guest(SCHIB *dest, const SCHIB *src)
 }
 }
 
-int css_do_stsch(SubchDev *sch, SCHIB *schib)
+IOInstEnding css_do_stsch(SubchDev *sch, SCHIB *schib)
 {
+int ret;
+
+/*
+ * For some subchannels, we may want to update parts of
+ * the schib (e.g., update path masks from the host device
+ * for passthrough subchannels).
+ */
+ret = s390_ccw_store(sch);
+
 /* Use current status. */
 copy_schib_to_guest(schib, &sch->curr_status);
-return 0;
+return ret;
 }
 
 static void copy_pmcw_from_guest(PMCW *dest, const PMCW *src)
diff --git a/hw/s390x/s390-ccw.c b/hw/s390x/s390-ccw.c
index 0c5a5b60bd..0c619706a1 100644
--- a/hw/s390x/s390-ccw.c
+++ b/hw/s390x/s390-ccw.c
@@ -51,6 +51,34 @@ int s390_ccw_clear(SubchDev *sch)
 return cdc->handle_clear(sch);
 }
 
+IOInstEnding s390_ccw_store(SubchDev *sch)
+{
+S390CCWDeviceClass *cdc = NULL;
+int ret = IOINST_CC_EXPECTED;
+
+/*
+ * This only applies to passthrough devices, so we can't unconditionally
+ * set this variable like we would for halt/clear.
+ *
+ * TODO from Conny on v1:
+ *   "We have a generic ccw_cb in the subchannel structure for ccw
+ *interpretation; would it make sense to add a generic callback
+ *for stsch there as well?
+ *
+ *   "(This works fine, though. Might want to add the check for
+ *halt/clear as well, but that might be a bit overkill.)"
+ */
+if (object_dynamic_cast(OBJECT(sch->driver_data), TYPE_S390_CCW)) {
+cdc = S390_CCW_DEVICE_GET_CLASS(sch->driver_data);
+}
+
+if (cdc && cdc->handle_store) {
+ret = cdc->handle_store(sch);
+}
+
+return ret;
+}
+
 static void s390_ccw_get_dev_info(S390CCWDevice *cdev,
   char *sysfsdev,
   Error **errp)
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index ae9e396367..8aa224bf43 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -41,6 +41,9 @@ struct VFIOCCWDevice {
 uint64_t async_cmd_region_size;
 uint64_t async_cmd_region_offset;
 struct ccw_cmd_region *async_cmd_region;
+uint64_t schib_region_size;
+uint64_t schib_region_offset;
+struct ccw_schib_region *schib_region;
 EventNotifier io_notifier;
 bool force_orb_pfch;
 bool warned_orb_pfch;
@@ -123,6 +126,51 @@ again:
 }
 }
 
+static IOInstEnding vfio_ccw_handle_store(SubchDev *sch)
+{
+S390CCWDevice *cdev = sch->driver_data;
+VFIOCCWDevice *vcdev = DO_UPCAST(VFIOCCWDevice, cdev, cdev);
+SCHIB *schib = &sch->curr_status;
+struct ccw_schib_region *region = vcdev->schib_region;
+SCHIB *s;
+int ret;
+
+/* schib region not available so nothing else to do */
+if (!region) {
+return IOINST_CC_EXPECTED;
+}
+
+memset(region, 0, sizeof(*region));
+ret = pread(vcdev->vdev.fd, region, vcdev->schib_region_size,
+vcdev->schib_region_offset);
+
+if (ret == -1) {
+/*
+ * Device is probably damaged, but store subchannel does not
+ * have a nonzero cc defined for this scenario.  Log an error,
+ * and presume things are otherwise fine.
+ */
+error_report("vfio-ccw: store region read failed with errno=%d", errno);
+return IOINST_CC_EXPECTED;
+}
+
+/*
+ * Selectively copy path-related bits

[PATCH v3 6/7] s390x/css: Refactor the css_queue_crw() routine

2020-04-16 Thread Eric Farman

We have a use case (vfio-ccw) where a CRW is already built and
ready to use.  Rather than teasing out the components just to
reassemble it later, let's rework this code so we can queue a
fully-qualified CRW directly.

Signed-off-by: Eric Farman 
---
 hw/s390x/css.c | 44 --
 include/hw/s390x/css.h |  1 +
 2 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index a44faa3549..a72c09adbe 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -2170,30 +2170,23 @@ void css_subch_assign(uint8_t cssid, uint8_t ssid, uint16_t schid,
 }
 }
 
-void css_queue_crw(uint8_t rsc, uint8_t erc, int solicited,
-   int chain, uint16_t rsid)
+void css_queue_crw_cont(CRW crw)
 {
 CrwContainer *crw_cont;
 
-trace_css_crw(rsc, erc, rsid, chain ? "(chained)" : "");
+trace_css_crw((crw.flags & CRW_FLAGS_MASK_RSC) >> 8,
+  crw.flags & CRW_FLAGS_MASK_ERC,
+  crw.rsid,
+  (crw.flags & CRW_FLAGS_MASK_C) ? "(chained)" : "");
+
 /* TODO: Maybe use a static crw pool? */
 crw_cont = g_try_new0(CrwContainer, 1);
 if (!crw_cont) {
 channel_subsys.crws_lost = true;
 return;
 }
-crw_cont->crw.flags = (rsc << 8) | erc;
-if (solicited) {
-crw_cont->crw.flags |= CRW_FLAGS_MASK_S;
-}
-if (chain) {
-crw_cont->crw.flags |= CRW_FLAGS_MASK_C;
-}
-crw_cont->crw.rsid = rsid;
-if (channel_subsys.crws_lost) {
-crw_cont->crw.flags |= CRW_FLAGS_MASK_R;
-channel_subsys.crws_lost = false;
-}
+
+crw_cont->crw = crw;
 
 QTAILQ_INSERT_TAIL(&channel_subsys.pending_crws, crw_cont, sibling);
 
@@ -2204,6 +2197,27 @@ void css_queue_crw(uint8_t rsc, uint8_t erc, int solicited,
 }
 }
 
+void css_queue_crw(uint8_t rsc, uint8_t erc, int solicited,
+   int chain, uint16_t rsid)
+{
+CRW crw;
+
+crw.flags = (rsc << 8) | erc;
+if (solicited) {
+crw.flags |= CRW_FLAGS_MASK_S;
+}
+if (chain) {
+crw.flags |= CRW_FLAGS_MASK_C;
+}
+crw.rsid = rsid;
+if (channel_subsys.crws_lost) {
+crw.flags |= CRW_FLAGS_MASK_R;
+channel_subsys.crws_lost = false;
+}
+
+css_queue_crw_cont(crw);
+}
+
 void css_generate_sch_crws(uint8_t cssid, uint8_t ssid, uint16_t schid,
int hotplugged, int add)
 {
diff --git a/include/hw/s390x/css.h b/include/hw/s390x/css.h
index 7e3a5e7433..1aa7b80f5b 100644
--- a/include/hw/s390x/css.h
+++ b/include/hw/s390x/css.h
@@ -205,6 +205,7 @@ void copy_scsw_to_guest(SCSW *dest, const SCSW *src);
 void css_inject_io_interrupt(SubchDev *sch);
 void css_reset(void);
 void css_reset_sch(SubchDev *sch);
+void css_queue_crw_cont(CRW crw);
 void css_queue_crw(uint8_t rsc, uint8_t erc, int solicited,
int chain, uint16_t rsid);
 void css_generate_sch_crws(uint8_t cssid, uint8_t ssid, uint16_t schid,
-- 
2.17.1
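
The refactoring above splits "assemble a CRW" from "queue a CRW". A standalone sketch of the assemble/decode half follows, mirroring the `(rsc << 8) | erc` packing and the mask-and-shift decoding used in the trace call. The mask values are assumptions written from memory and should be checked against QEMU's s390x headers; the Ex* names are hypothetical and the "R" (overflow) bit is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag layout; verify against the real CRW_FLAGS_MASK_* macros. */
#define EX_CRW_MASK_S    0x4000u   /* solicited */
#define EX_CRW_MASK_C    0x1000u   /* chained */
#define EX_CRW_MASK_RSC  0x0f00u   /* reporting-source code */
#define EX_CRW_MASK_ERC  0x003fu   /* error-recovery code */

typedef struct {
    uint16_t flags;
    uint16_t rsid;
} ExCrw;

/* Pack the individual components into a fully formed CRW. */
static ExCrw ex_crw_build(uint8_t rsc, uint8_t erc, int solicited,
                          int chain, uint16_t rsid)
{
    ExCrw crw = { 0, 0 };

    assert(rsc <= 0x0f && erc <= 0x3f);

    crw.flags = (uint16_t)((rsc << 8) | erc);
    if (solicited) {
        crw.flags |= EX_CRW_MASK_S;
    }
    if (chain) {
        crw.flags |= EX_CRW_MASK_C;
    }
    crw.rsid = rsid;
    return crw;
}

/* Decode helpers: the inverse of the packing above. */
static uint8_t ex_crw_rsc(ExCrw crw)
{
    return (uint8_t)((crw.flags & EX_CRW_MASK_RSC) >> 8);
}

static uint8_t ex_crw_erc(ExCrw crw)
{
    return (uint8_t)(crw.flags & EX_CRW_MASK_ERC);
}

static int ex_crw_chained(ExCrw crw)
{
    return (crw.flags & EX_CRW_MASK_C) != 0;
}
```

A caller that already holds a complete CRW (the vfio-ccw case in the commit message) skips the build step entirely and only needs the decode helpers for tracing.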

[PATCH v3 4/7] vfio-ccw: Add support for the crw region

2020-04-16 Thread Eric Farman

From: Farhan Ali 

The crw region can be used to obtain information about
Channel Report Words (CRWs) from the vfio-ccw driver.

Currently only channel-path-related CRWs are passed to
QEMU from the vfio-ccw driver.

Signed-off-by: Farhan Ali 
Signed-off-by: Eric Farman 
---

Notes:
v0->v1: [EF]
 - Fixed copy/paste error in error message (s/schib/CRW)

 hw/vfio/ccw.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 8aa224bf43..db565b6f38 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -44,6 +44,9 @@ struct VFIOCCWDevice {
 uint64_t schib_region_size;
 uint64_t schib_region_offset;
 struct ccw_schib_region *schib_region;
+uint64_t crw_region_size;
+uint64_t crw_region_offset;
+struct ccw_crw_region *crw_region;
 EventNotifier io_notifier;
 bool force_orb_pfch;
 bool warned_orb_pfch;
@@ -449,10 +452,24 @@ static void vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
 vcdev->schib_region = g_malloc(info->size);
 }
 
+ret = vfio_get_dev_region_info(vdev, VFIO_REGION_TYPE_CCW,
+   VFIO_REGION_SUBTYPE_CCW_CRW, &info);
+
+if (!ret) {
+vcdev->crw_region_size = info->size;
+if (sizeof(*vcdev->crw_region) != vcdev->crw_region_size) {
+error_setg(errp, "vfio: Unexpected size of the CRW region");
+goto out_err;
+}
+vcdev->crw_region_offset = info->offset;
+vcdev->crw_region = g_malloc(info->size);
+}
+
 g_free(info);
 return;
 
 out_err:
+g_free(vcdev->crw_region);
 g_free(vcdev->schib_region);
 g_free(vcdev->async_cmd_region);
 g_free(vcdev->io_region);
@@ -462,6 +479,7 @@ out_err:
 
 static void vfio_ccw_put_region(VFIOCCWDevice *vcdev)
 {
+g_free(vcdev->crw_region);
 g_free(vcdev->schib_region);
 g_free(vcdev->async_cmd_region);
 g_free(vcdev->io_region);
-- 
2.17.1

[PATCH v3 5/7] vfio-ccw: Refactor ccw irq handler

2020-04-16 Thread Eric Farman

Make it easier to add new ones in the future.

Signed-off-by: Eric Farman 
Reviewed-by: Cornelia Huck 
---

Notes:
v2->v3:
 - Added Conny's r-b

v1->v2:
 - Make irq parameter unsigned [CH]
 - Remove extraneous %m from error_report calls [CH]

 hw/vfio/ccw.c | 58 +--
 1 file changed, 42 insertions(+), 16 deletions(-)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index db565b6f38..ee3415a64a 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -334,22 +334,36 @@ read_err:
 css_inject_io_interrupt(sch);
 }
 
-static void vfio_ccw_register_io_notifier(VFIOCCWDevice *vcdev, Error **errp)
+static void vfio_ccw_register_irq_notifier(VFIOCCWDevice *vcdev,
+   unsigned int irq,
+   Error **errp)
 {
 VFIODevice *vdev = &vcdev->vdev;
 struct vfio_irq_info *irq_info;
 size_t argsz;
 int fd;
+EventNotifier *notifier;
+IOHandler *fd_read;
+
+switch (irq) {
+case VFIO_CCW_IO_IRQ_INDEX:
+notifier = &vcdev->io_notifier;
+fd_read = vfio_ccw_io_notifier_handler;
+break;
+default:
+error_setg(errp, "vfio: Unsupported device irq(%d)", irq);
+return;
+}
 
-if (vdev->num_irqs < VFIO_CCW_IO_IRQ_INDEX + 1) {
-error_setg(errp, "vfio: unexpected number of io irqs %u",
+if (vdev->num_irqs < irq + 1) {
+error_setg(errp, "vfio: unexpected number of irqs %u",
vdev->num_irqs);
 return;
 }
 
 argsz = sizeof(*irq_info);
 irq_info = g_malloc0(argsz);
-irq_info->index = VFIO_CCW_IO_IRQ_INDEX;
+irq_info->index = irq;
 irq_info->argsz = argsz;
 if (ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO,
   irq_info) < 0 || irq_info->count < 1) {
@@ -357,37 +371,49 @@ static void vfio_ccw_register_io_notifier(VFIOCCWDevice *vcdev, Error **errp)
 goto out_free_info;
 }
 
-if (event_notifier_init(&vcdev->io_notifier, 0)) {
+if (event_notifier_init(notifier, 0)) {
 error_setg_errno(errp, errno,
- "vfio: Unable to init event notifier for IO");
+ "vfio: Unable to init event notifier for irq (%d)",
+ irq);
 goto out_free_info;
 }
 
-fd = event_notifier_get_fd(&vcdev->io_notifier);
-qemu_set_fd_handler(fd, vfio_ccw_io_notifier_handler, NULL, vcdev);
+fd = event_notifier_get_fd(notifier);
+qemu_set_fd_handler(fd, fd_read, NULL, vcdev);
 
-if (vfio_set_irq_signaling(vdev, VFIO_CCW_IO_IRQ_INDEX, 0,
+if (vfio_set_irq_signaling(vdev, irq, 0,
VFIO_IRQ_SET_ACTION_TRIGGER, fd, errp)) {
 qemu_set_fd_handler(fd, NULL, NULL, vcdev);
-event_notifier_cleanup(&vcdev->io_notifier);
+event_notifier_cleanup(notifier);
 }
 
 out_free_info:
 g_free(irq_info);
 }
 
-static void vfio_ccw_unregister_io_notifier(VFIOCCWDevice *vcdev)
+static void vfio_ccw_unregister_irq_notifier(VFIOCCWDevice *vcdev,
+ unsigned int irq)
 {
 Error *err = NULL;
+EventNotifier *notifier;
+
+switch (irq) {
+case VFIO_CCW_IO_IRQ_INDEX:
+notifier = &vcdev->io_notifier;
+break;
+default:
+error_report("vfio: Unsupported device irq(%d)", irq);
+return;
+}
 
-if (vfio_set_irq_signaling(&vcdev->vdev, VFIO_CCW_IO_IRQ_INDEX, 0,
+if (vfio_set_irq_signaling(&vcdev->vdev, irq, 0,
 VFIO_IRQ_SET_ACTION_TRIGGER, -1, &err)) {
 error_reportf_err(err, VFIO_MSG_PREFIX, vcdev->vdev.name);
 }
 
-qemu_set_fd_handler(event_notifier_get_fd(&vcdev->io_notifier),
+qemu_set_fd_handler(event_notifier_get_fd(notifier),
 NULL, NULL, vcdev);
-event_notifier_cleanup(&vcdev->io_notifier);
+event_notifier_cleanup(notifier);
 }
 
 static void vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
@@ -590,7 +616,7 @@ static void vfio_ccw_realize(DeviceState *dev, Error **errp)
 goto out_region_err;
 }
 
-vfio_ccw_register_io_notifier(vcdev, &err);
+vfio_ccw_register_irq_notifier(vcdev, VFIO_CCW_IO_IRQ_INDEX, &err);
 if (err) {
 goto out_notifier_err;
 }
@@ -619,7 +645,7 @@ static void vfio_ccw_unrealize(DeviceState *dev, Error **errp)
 S390CCWDeviceClass *cdc = S390_CCW_DEVICE_GET_CLASS(cdev);
 VFIOGroup *group = vcdev->vdev.group;
 
-vfio_ccw_unregister_io_notifier(vcdev);
+vfio_ccw_unregister_irq_notifier(vcdev, VFIO_CCW_IO_IRQ_INDEX);
 vfio_ccw_put_region(vcdev);
 vfio_ccw_put_device(vcdev);
 vfio_put_group(group);
-- 
2.17.1

[PATCH v3 7/7] vfio-ccw: Add support for the CRW irq

2020-04-16 Thread Eric Farman

From: Farhan Ali 

The CRW irq will be used by vfio-ccw to notify the userspace
about any CRWs the userspace needs to handle. Let's add support
for it.

Signed-off-by: Farhan Ali 
Signed-off-by: Eric Farman 
---

Notes:
v2->v3:
 - Remove "size==0" check in CRW notifier [CH]
 - Remove intermediate rsc/erc variables, use css_queue_crw_cont() [CH]
 - s/crw0/crw/ [CH]

v1->v2:
 - Add a loop to continually read region while data is
   present, queueing CRWs as found [CH]

v0->v1: [EF]
 - Check vcdev->crw_region before registering the irq,
   in case host kernel does not have matching support
 - Split the refactoring changes to an earlier (new) patch
   (and don't remove the "num_irqs" check in the register
   routine, but adjust it to check the input variable)
 - Don't revert the cool vfio_set_irq_signaling() stuff
 - Unregister CRW IRQ before IO IRQ in unrealize
 - s/crw1/crw0/

 hw/vfio/ccw.c | 50 ++
 1 file changed, 50 insertions(+)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index ee3415a64a..cb4a331ced 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -48,6 +48,7 @@ struct VFIOCCWDevice {
 uint64_t crw_region_offset;
 struct ccw_crw_region *crw_region;
 EventNotifier io_notifier;
+EventNotifier crw_notifier;
 bool force_orb_pfch;
 bool warned_orb_pfch;
 };
@@ -264,6 +265,39 @@ static void vfio_ccw_reset(DeviceState *dev)
 ioctl(vcdev->vdev.fd, VFIO_DEVICE_RESET);
 }
 
+static void vfio_ccw_crw_notifier_handler(void *opaque)
+{
+VFIOCCWDevice *vcdev = opaque;
+struct ccw_crw_region *region = vcdev->crw_region;
+CRW crw;
+int size;
+
+if (!event_notifier_test_and_clear(&vcdev->crw_notifier)) {
+return;
+}
+
+do {
+memset(region, 0, sizeof(*region));
+size = pread(vcdev->vdev.fd, region, vcdev->crw_region_size,
+ vcdev->crw_region_offset);
+
+if (size == -1) {
+error_report("vfio-ccw: Read crw region failed with errno=%d",
+ errno);
+break;
+}
+
+if (region->crw == 0) {
+/* No more CRWs to queue */
+break;
+}
+
+memcpy(&crw, &region->crw, sizeof(CRW));
+
+css_queue_crw_cont(crw);
+} while (1);
+}
+
 static void vfio_ccw_io_notifier_handler(void *opaque)
 {
 VFIOCCWDevice *vcdev = opaque;
@@ -350,6 +384,10 @@ static void vfio_ccw_register_irq_notifier(VFIOCCWDevice *vcdev,
 notifier = &vcdev->io_notifier;
 fd_read = vfio_ccw_io_notifier_handler;
 break;
+case VFIO_CCW_CRW_IRQ_INDEX:
+notifier = &vcdev->crw_notifier;
+fd_read = vfio_ccw_crw_notifier_handler;
+break;
 default:
 error_setg(errp, "vfio: Unsupported device irq(%d)", irq);
 return;
@@ -401,6 +439,9 @@ static void vfio_ccw_unregister_irq_notifier(VFIOCCWDevice *vcdev,
 case VFIO_CCW_IO_IRQ_INDEX:
 notifier = &vcdev->io_notifier;
 break;
+case VFIO_CCW_CRW_IRQ_INDEX:
+notifier = &vcdev->crw_notifier;
+break;
 default:
 error_report("vfio: Unsupported device irq(%d)", irq);
 return;
@@ -621,6 +662,14 @@ static void vfio_ccw_realize(DeviceState *dev, Error **errp)
 goto out_notifier_err;
 }
 
+if (vcdev->crw_region) {
+vfio_ccw_register_irq_notifier(vcdev, VFIO_CCW_CRW_IRQ_INDEX, &err);
+if (err) {
+vfio_ccw_unregister_irq_notifier(vcdev, VFIO_CCW_IO_IRQ_INDEX);
+goto out_notifier_err;
+}
+}
+
 return;
 
 out_notifier_err:
@@ -645,6 +694,7 @@ static void vfio_ccw_unrealize(DeviceState *dev, Error **errp)
 S390CCWDeviceClass *cdc = S390_CCW_DEVICE_GET_CLASS(cdev);
 VFIOGroup *group = vcdev->vdev.group;
 
+vfio_ccw_unregister_irq_notifier(vcdev, VFIO_CCW_CRW_IRQ_INDEX);
 vfio_ccw_unregister_irq_notifier(vcdev, VFIO_CCW_IO_IRQ_INDEX);
 vfio_ccw_put_region(vcdev);
 vfio_ccw_put_device(vcdev);
-- 
2.17.1
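
The CRW notifier added in this patch keeps re-reading the region until it comes back as zero, queueing each CRW it finds. A minimal standalone sketch of that drain loop is below. `ExCrwSource` and `ex_read_crw()` are hypothetical stand-ins for the kernel region and the pread() call, and the bounded output array replaces the real queueing function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel side: a list of pending CRWs. */
typedef struct {
    const uint32_t *pending;
    size_t count;
    size_t next;
} ExCrwSource;

/*
 * Stand-in for pread() on the CRW region: each read returns the next
 * pending CRW, or 0 once nothing is left. Returns 0 on success.
 */
static int ex_read_crw(ExCrwSource *src, uint32_t *out)
{
    *out = (src->next < src->count) ? src->pending[src->next++] : 0;
    return 0;
}

/*
 * Drain the source: read until the region reports 0 (no more CRWs),
 * a read fails, or the caller's buffer is full. Returns the number of
 * CRWs stored in queued[].
 */
static size_t ex_drain_crws(ExCrwSource *src, uint32_t *queued, size_t cap)
{
    size_t n = 0;

    assert(src != NULL && queued != NULL);

    while (n < cap) {
        uint32_t crw = 0;

        if (ex_read_crw(src, &crw) != 0) {
            break;                  /* read error: stop, keep what we have */
        }
        if (crw == 0) {
            break;                  /* no more CRWs to queue */
        }
        queued[n++] = crw;
    }
    return n;
}
```

Checking the capacity before each read, rather than after, avoids consuming a CRW from the source that there is no room to store.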

[PATCH v3 2/7] vfio-ccw: Refactor cleanup of regions

2020-04-16 Thread Eric Farman

While we're at it, add a g_free() for the async_cmd_region that
is the last thing currently created.  g_free() knows how to handle
NULL pointers, so this makes it easier to remember what cleanups
need to be performed when new regions are added.

Signed-off-by: Eric Farman 
Reviewed-by: Cornelia Huck 
---

Notes:
v1->v2:
 - Add Conny's r-b

 hw/vfio/ccw.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 50cc2ec75c..ae9e396367 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -370,8 +370,7 @@ static void vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
 vcdev->io_region_size = info->size;
 if (sizeof(*vcdev->io_region) != vcdev->io_region_size) {
 error_setg(errp, "vfio: Unexpected size of the I/O region");
-g_free(info);
-return;
+goto out_err;
 }
 
 vcdev->io_region_offset = info->offset;
@@ -384,15 +383,20 @@ static void vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
 vcdev->async_cmd_region_size = info->size;
 if (sizeof(*vcdev->async_cmd_region) != vcdev->async_cmd_region_size) {
 error_setg(errp, "vfio: Unexpected size of the async cmd region");
-g_free(vcdev->io_region);
-g_free(info);
-return;
+goto out_err;
 }
 vcdev->async_cmd_region_offset = info->offset;
 vcdev->async_cmd_region = g_malloc0(info->size);
 }
 
 g_free(info);
+return;
+
+out_err:
+g_free(vcdev->async_cmd_region);
+g_free(vcdev->io_region);
+g_free(info);
+return;
 }
 
 static void vfio_ccw_put_region(VFIOCCWDevice *vcdev)
-- 
2.17.1
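
The cleanup refactoring above relies on one idiom: every failure path jumps to a single label that frees all regions, which is safe because freeing a NULL pointer is a no-op. A self-contained sketch of that idiom follows, using plain calloc()/free() as a stand-in for the GLib allocators; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    void *io_region;
    void *async_cmd_region;
} ExRegions;

/*
 * Allocate both regions, validating each reported size against the size
 * the caller expects. On any failure everything allocated so far is
 * released through the single out_err path.
 */
static bool ex_get_regions(ExRegions *r,
                           size_t io_size, size_t io_expected,
                           size_t async_size, size_t async_expected)
{
    assert(r != NULL);

    r->io_region = NULL;
    r->async_cmd_region = NULL;

    if (io_size != io_expected) {
        goto out_err;
    }
    r->io_region = calloc(1, io_size);
    if (!r->io_region) {
        goto out_err;
    }

    if (async_size != async_expected) {
        goto out_err;
    }
    r->async_cmd_region = calloc(1, async_size);
    if (!r->async_cmd_region) {
        goto out_err;
    }
    return true;

out_err:
    /* free(NULL) is a no-op, so no per-pointer checks are needed. */
    free(r->async_cmd_region);
    free(r->io_region);
    r->async_cmd_region = NULL;
    r->io_region = NULL;
    return false;
}
```

Adding a third region then only needs one more free() under the label, which is the maintenance benefit the commit message describes.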

Re: [PATCH] target/ppc: Fix mtmsr(d) L=1 variant that loses interrupts

2020-04-16 Thread David Gibson

On Tue, Apr 14, 2020 at 09:11:31PM +1000, Nicholas Piggin wrote:
> If mtmsr L=1 sets MSR[EE] while there is a maskable exception pending,
> it does not cause an interrupt. This causes the test case to hang:
> 
> https://lists.gnu.org/archive/html/qemu-ppc/2019-10/msg00826.html
> 
> More recently, Linux reduced the occurrence of operations (e.g., rfi)
> which stop translation and allow pending interrupts to be processed.
> This started causing hangs in Linux boot in long-running kernel tests,
> running with '-d int' shows the decrementer stops firing despite DEC
> wrapping and MSR[EE]=1.
> 
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208301.html
> 
> The cause is the broken mtmsr L=1 behaviour, which is contrary to the
> architecture. From Power ISA v3.0B, p.977, Move To Machine State Register,
> Programming Note states:
> 
> If MSR[EE]=0 and an External, Decrementer, or Performance Monitor
> exception is pending, executing an mtmsrd instruction that sets
> MSR[EE] to 1 will cause the interrupt to occur before the next
> instruction is executed, if no higher priority exception exists
> 
> Fix this by handling L=1 exactly the same way as L=0, modulo the MSR
> bits altered.
> 
> The confusion arises from L=0 being "context synchronizing" whereas L=1
> is "execution synchronizing", which is a weaker semantic. However this
> is not a relaxation of the requirement that these exceptions cause
> interrupts when MSR[EE]=1 (e.g., when mtmsr executes to completion as
> TCG is doing here), rather it specifies how a pipelined processor can
> have multiple instructions in flight where one may influence how another
> behaves.
> 
> Cc: qemu-sta...@nongnu.org
> Reported-by: Anton Blanchard 
> Reported-by: Nathan Chancellor 
> Tested-by: Nathan Chancellor 
> Signed-off-by: Nicholas Piggin 
> ---
> Thanks very much to Nathan for reporting and testing it, I added his
> Tested-by tag despite a more polished patch, as the basics are
> still the same (and still fixes his test case here).
> 
> This bug possibly goes back to early v2.04 / mtmsrd L=1 support around
> 2007, and the code has been changed several times since then so may
> require some backporting.
> 
> 32-bit / mtmsr untested at the moment, I don't have an environment
> handy.
> 
>  target/ppc/translate.c | 46 +-
>  1 file changed, 27 insertions(+), 19 deletions(-)

Applied to ppc-for-5.0.
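
For readers following the fix, the L=1 form boils down to a masked merge: only EE and RI are taken from the source register, everything else in the MSR is preserved, and the result must then go through the same "MSR changed" path as L=0 so a pending interrupt is noticed. The sketch below models only the merge and the resulting condition in plain C; it is not the TCG code from the patch. The bit positions (EE = 15, RI = 1) and all ex_* names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MSR bit positions (counted from the least significant bit). */
#define EX_MSR_EE 15
#define EX_MSR_RI 1

static_assert(EX_MSR_EE < 64 && EX_MSR_RI < 64,
              "bit index must fit in a 64-bit MSR");

/* mtmsr(d) with L=1: replace only EE and RI, keep all other MSR bits. */
static uint64_t ex_mtmsr_l1(uint64_t old_msr, uint64_t rs)
{
    const uint64_t mask = (1ull << EX_MSR_EE) | (1ull << EX_MSR_RI);

    return (old_msr & ~mask) | (rs & mask);
}

/*
 * Per the Programming Note quoted in the commit message: if a maskable
 * exception is pending and the new MSR has EE set, the interrupt must be
 * taken before the next instruction executes.
 */
static int ex_interrupt_due(uint64_t new_msr, int maskable_pending)
{
    return maskable_pending && ((new_msr >> EX_MSR_EE) & 1);
}
```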

> 
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index b207fb5386..9959259dba 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -4361,30 +4361,34 @@ static void gen_mtmsrd(DisasContext *ctx)
>  CHK_SV;
>  
>  #if !defined(CONFIG_USER_ONLY)
> +if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
> +gen_io_start();
> +}
>  if (ctx->opcode & 0x00010000) {
> -/* Special form that does not need any synchronisation */
> +/* L=1 form only updates EE and RI */
>  TCGv t0 = tcg_temp_new();
> +TCGv t1 = tcg_temp_new();
>  tcg_gen_andi_tl(t0, cpu_gpr[rS(ctx->opcode)],
>  (1 << MSR_RI) | (1 << MSR_EE));
> -tcg_gen_andi_tl(cpu_msr, cpu_msr,
> +tcg_gen_andi_tl(t1, cpu_msr,
>  ~(target_ulong)((1 << MSR_RI) | (1 << MSR_EE)));
> -tcg_gen_or_tl(cpu_msr, cpu_msr, t0);
> +tcg_gen_or_tl(t1, t1, t0);
> +
> +gen_helper_store_msr(cpu_env, t1);
>  tcg_temp_free(t0);
> +tcg_temp_free(t1);
> +
>  } else {
>  /*
>   * XXX: we need to update nip before the store if we enter
>   *  power saving mode, we will exit the loop directly from
>   *  ppc_store_msr
>   */
> -if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
> -gen_io_start();
> -}
>  gen_update_nip(ctx, ctx->base.pc_next);
>  gen_helper_store_msr(cpu_env, cpu_gpr[rS(ctx->opcode)]);
> -/* Must stop the translation as machine state (may have) changed */
> -/* Note that mtmsr is not always defined as context-synchronizing */
> -gen_stop_exception(ctx);
>  }
> +/* Must stop the translation as machine state (may have) changed */
> +gen_stop_exception(ctx);
>  #endif /* !defined(CONFIG_USER_ONLY) */
>  }
>  #endif /* defined(TARGET_PPC64) */
> @@ -4394,15 +4398,23 @@ static void gen_mtmsr(DisasContext *ctx)
>  CHK_SV;
>  
>  #if !defined(CONFIG_USER_ONLY)
> -   if (ctx->opcode & 0x00010000) {
> -/* Special form that does not need any synchronisation */
> +if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
> +gen_io_start();
> +}
> +if (ctx->opcode & 0x00010000) {
> +/* L=1 form only updates EE and RI */
>  TCGv t0 = tcg_temp_new();
> +TCGv t1 = tcg_temp_new();
>  tcg_gen_andi_tl(t0, cpu_gpr[rS(ctx->opcode)],
>  (1 << MSR_RI) | (1 << MSR_EE));
> -

Re: [PATCH] qcow2: Expose bitmaps' size during measure

2020-04-16 Thread Eric Blake


On 4/16/20 5:49 PM, no-re...@patchew.org wrote:

Patchew URL: https://patchew.org/QEMU/20200416212349.731404-1-ebl...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing
commands and their output below. If you have Docker installed, you can probably
reproduce it locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

Not run: 259
Failures: 190
Failed 1 of 117 iotests


Hmm - the email truncated the useful part of the failure.  Anyways, 
reading from...



The full log is available at
http://patchew.org/logs/20200416212349.731404-1-ebl...@redhat.com/testing.docker-quick@centos7/?type=message.


I see:

--- /tmp/qemu-test/src/tests/qemu-iotests/190.out	2020-04-16 
21:15:51.0 +
+++ /tmp/qemu-test/build/tests/qemu-iotests/190.out.bad	2020-04-16 
22:45:47.504493172 +

@@ -4,6 +4,7 @@
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=219902322
 required size: 219902322
 fully allocated size: 219902322
+bitmaps size: 4846791580151137091
 required size: 335806464

which looks suspiciously like an uninitialized variable leaking through 
when there are no bitmaps to be measured.  I'll fix it in v2.
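
A minimal sketch of the failure mode guessed at above, not QEMU code:
MeasureInfoSketch is a hypothetical stand-in for the QAPI-generated
BlockMeasureInfo, and format_measure() is an illustrative helper. The idea
is to zero-initialize the struct and print the optional field only when its
has_ flag is set, so a code path that never fills in the bitmaps value
cannot leak garbage into the output.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the QAPI-generated BlockMeasureInfo. */
typedef struct {
    int64_t required;
    int64_t fully_allocated;
    bool has_bitmaps;   /* optional-field flag, as QAPI generates for '*bitmaps' */
    int64_t bitmaps;
} MeasureInfoSketch;

/*
 * Format the measurement into buf. The "bitmaps size" line is emitted
 * only when has_bitmaps is set, so an unset field is never printed.
 * Returns the number of characters that were (or would have been) written.
 */
int format_measure(const MeasureInfoSketch *info, char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     "required size: %lld\nfully allocated size: %lld\n",
                     (long long)info->required,
                     (long long)info->fully_allocated);

    if (info->has_bitmaps && n >= 0 && (size_t)n < len) {
        n += snprintf(buf + n, len - (size_t)n, "bitmaps size: %lld\n",
                      (long long)info->bitmaps);
    }
    return n;
}
```

A caller that declares the struct with a designated initializer (for example
`MeasureInfoSketch info = { .required = 219902322 };`) gets has_bitmaps == false
for free, which is the property the expected 190.out output relies on.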


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

[PATCH v4 3/3] hw/vfio: let read-only flag take effect for mmap'd regions

2020-04-16 Thread Yan Zhao

Alongside setting the host page table to read-only, the memory regions
must also be marked read-only, so that guest writes to the read-only,
mmap'd regions cause vmexits and the region write handlers are called.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Yan Zhao 
Signed-off-by: Xin Zeng 
---
 hw/vfio/common.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index b6956a8098..0049e97c34 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -979,6 +979,10 @@ int vfio_region_mmap(VFIORegion *region)
   name, region->mmaps[i].size,
   region->mmaps[i].mmap);
 g_free(name);
+
+if (!(region->flags & VFIO_REGION_INFO_FLAG_WRITE)) {
+memory_region_set_readonly(&region->mmaps[i].mem, true);
+}
 memory_region_add_subregion(region->mem, region->mmaps[i].offset,
 &region->mmaps[i].mem);
 
-- 
2.17.1

[PATCH v4 2/3] hw/vfio: drop guest writes to ro regions

2020-04-16 Thread Yan Zhao

for vfio regions that are without write permission,
drop guest writes to those regions.

Cc: Philippe Mathieu-Daudé 
Signed-off-by: Yan Zhao 
Signed-off-by: Xin Zeng 
---
 hw/vfio/common.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 0b3593b3c0..b6956a8098 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -38,6 +38,7 @@
 #include "sysemu/reset.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "qemu/log.h"
 
 VFIOGroupList vfio_group_list =
 QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -190,6 +191,15 @@ void vfio_region_write(void *opaque, hwaddr addr,
 uint64_t qword;
 } buf;
 
+trace_vfio_region_write(vbasedev->name, region->nr, addr, data, size);
+if (!(region->flags & VFIO_REGION_INFO_FLAG_WRITE)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "Invalid write to read only vfio region 0x%"
+  HWADDR_PRIx" size %u\n", addr, size);
+
+return;
+}
+
 switch (size) {
 case 1:
 buf.byte = data;
@@ -215,8 +225,6 @@ void vfio_region_write(void *opaque, hwaddr addr,
  addr, data, size);
 }
 
-trace_vfio_region_write(vbasedev->name, region->nr, addr, data, size);
-
 /*
  * A read or write to a BAR always signals an INTx EOI.  This will
  * do nothing if not pending (including not in INTx mode).  We assume
-- 
2.17.1

[PATCH v4 0/3] drop writes to read-only ram device & vfio regions

2020-04-16 Thread Yan Zhao

Patch 1 modifies the handler of ram device memory regions to drop guest
writes to read-only ram device memory regions.

Patch 2 modifies the handler of non-mmap'd read-only vfio regions to drop
guest writes to those regions.

Patch 3 sets the read-only flag on mmap'd read-only vfio regions, so that
guest writes to those regions are trapped.
Without patch 1, host qemu would then crash on a guest write to those
read-only regions.
With patch 1, host qemu drops the writes.
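
The drop-on-write behaviour described above can be sketched in isolation as
follows. This is only an illustration under stated assumptions: RegionSketch
and region_write() are hypothetical names, not QEMU's MemoryRegion or
VFIORegion API, and the message goes to stderr rather than through
qemu_log_mask(LOG_GUEST_ERROR, ...).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a device region; not a QEMU type. */
typedef struct {
    bool readonly;
    uint8_t data[16];
} RegionSketch;

/*
 * Write handler sketch: a write to a read-only region is logged and
 * dropped before any data is touched. Returns true if the write was
 * applied, false if it was dropped or out of bounds.
 */
bool region_write(RegionSketch *r, uint64_t addr, uint64_t value,
                  unsigned size)
{
    if (r->readonly) {
        fprintf(stderr, "Invalid write to read only region 0x%llx size %u\n",
                (unsigned long long)addr, size);
        return false;
    }
    if (size > sizeof(value) || addr > sizeof(r->data) ||
        size > sizeof(r->data) - addr) {
        return false;
    }
    /* Copies the first `size` bytes of value in host byte order. */
    memcpy(&r->data[addr], &value, size);
    return true;
}
```

The point of checking the flag first is that the handler returns before the
size switch, so a rejected write has no side effects on the backing storage.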

Changelog:
v4:
-instead of modifying tracing log, added qemu_log_mask(LOG_GUEST_ERROR...)
to log guest writes to read-only regions (Philippe)

v3:
-refreshed and Cc Stefan for reviewing of tracing part

v2:
-split one big patch into smaller ones (Philippe)
-modify existing trace to record guest writes to read-only memory (Alex)
-modify vfio_region_write() to drop guest writes to non-mmap'd read-only
 region (Alex)


Yan Zhao (3):
  memory: drop guest writes to read-only ram device regions
  hw/vfio: drop guest writes to ro regions
  hw/vfio: let read-only flag take effect for mmap'd regions

 hw/vfio/common.c | 16 ++--
 memory.c |  7 +++
 2 files changed, 21 insertions(+), 2 deletions(-)

-- 
2.17.1

[PATCH v4 1/3] memory: drop guest writes to read-only ram device regions

2020-04-16 Thread Yan Zhao

For ram device regions, drop guest writes if the region is read-only.

Cc: Philippe Mathieu-Daudé 
Signed-off-by: Yan Zhao 
Signed-off-by: Xin Zeng 
---
 memory.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/memory.c b/memory.c
index 601b749906..9576dd6807 100644
--- a/memory.c
+++ b/memory.c
@@ -34,6 +34,7 @@
 #include "sysemu/accel.h"
 #include "hw/boards.h"
 #include "migration/vmstate.h"
+#include "qemu/log.h"
 
 //#define DEBUG_UNASSIGNED
 
@@ -1313,6 +1314,12 @@ static void memory_region_ram_device_write(void *opaque, 
hwaddr addr,
 MemoryRegion *mr = opaque;
 
 trace_memory_region_ram_device_write(get_cpu_index(), mr, addr, data, 
size);
+if (mr->readonly) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "Invalid write to read only ram device region 0x%"
+   HWADDR_PRIx" size %u\n", addr, size);
+return;
+}
 
 switch (size) {
 case 1:
-- 
2.17.1

Re: [PATCH] qcow2: Expose bitmaps' size during measure

2020-04-16 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20200416212349.731404-1-ebl...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

Not run: 259
Failures: 190
Failed 1 of 117 iotests
make: *** [check-tests/check-block.sh] Error 1
make: *** Waiting for unfinished jobs
  TEST    check-qtest-aarch64: tests/qtest/qos-test
Traceback (most recent call last):
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=fab7467314384d429b3d07d0ac891780', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-o6sxcgfh/src/docker-src.2020-04-16-18.34.06.7222:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=fab7467314384d429b3d07d0ac891780
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-o6sxcgfh/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    15m8.073s
user    0m8.781s


The full log is available at
http://patchew.org/logs/20200416212349.731404-1-ebl...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH 0/4] RFC/WIP: Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread

2020-04-16 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20200416203624.32366-1-mlevi...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  TEST    check-qtest-x86_64: tests/qtest/device-plug-test
  TEST    check-qtest-x86_64: tests/qtest/drive_del-test
**
ERROR:/tmp/qemu-test/src/tests/qtest/drive_del-test.c:25:drive_add: assertion 
failed (resp == "OK\r\n"): ("Duplicate ID 'drive0' for drive\r\n" == "OK\r\n")
ERROR - Bail out! 
ERROR:/tmp/qemu-test/src/tests/qtest/drive_del-test.c:25:drive_add: assertion 
failed (resp == "OK\r\n"): ("Duplicate ID 'drive0' for drive\r\n" == "OK\r\n")
make: *** [check-qtest-x86_64] Error 1
make: *** Waiting for unfinished jobs
  TEST    iotest-qcow2: 040
qemu-system-aarch64: -accel kvm: invalid accelerator kvm
---
  TEST    check-qtest-aarch64: tests/qtest/cdrom-test
  TEST    check-qtest-aarch64: tests/qtest/device-introspect-test
**
ERROR:/tmp/qemu-test/src/qom/object.c:1124:object_unref: assertion failed: 
(obj->ref > 0)
Broken pipe
/tmp/qemu-test/src/tests/qtest/libqtest.c:175: kill_qemu() detected QEMU death 
from signal 6 (Aborted) (core dumped)
ERROR - too few tests run (expected 6, got 5)
make: *** [check-qtest-aarch64] Error 1
  TEST    iotest-qcow2: 154
  TEST    iotest-qcow2: 156
  TEST    iotest-qcow2: 158
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=f01c53d3dbfc4338a717907bbf724fe3', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-1p2vlvi4/src/docker-src.2020-04-16-17.33.43.25485:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=f01c53d3dbfc4338a717907bbf724fe3
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-1p2vlvi4/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    14m7.554s
user    0m9.291s


The full log is available at
http://patchew.org/logs/20200416203624.32366-1-mlevi...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH] qcow2: Expose bitmaps' size during measure

2020-04-16 Thread Eric Blake


On 4/16/20 4:23 PM, Eric Blake wrote:

It's useful to know how much space can be occupied by qcow2 persistent
bitmaps, even though such metadata is unrelated to the guest-visible
data.  Report this value as an additional field.

Reported-by: Nir Soffer 
Signed-off-by: Eric Blake 
---



Per https://bugzilla.redhat.com/show_bug.cgi?id=1779904#c0, I didn't 
quite round up in enough places:



@@ -4739,6 +4742,26 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
  goto err;
  }

+FOR_EACH_DIRTY_BITMAP(in_bs, bm) {
+if (bdrv_dirty_bitmap_get_persistence(bm)) {
+uint64_t bmsize = bdrv_dirty_bitmap_size(bm);
+uint32_t granularity = bdrv_dirty_bitmap_granularity(bm);
+const char *name = bdrv_dirty_bitmap_name(bm);
+uint64_t bmclusters = DIV_ROUND_UP(bmsize / granularity
+   / CHAR_BIT, cluster_size);


All of these divisions need to round up.  For example, in an image with 
512-byte clusters and granularity, and a bitmap covering 512*512*8+512 
bytes (2097664), we need 2 clusters, not 1, for the bitmap itself. 
Fortunately, it is an edge case, and we usually have enough slop in the 
final round up to cluster size that most users won't trip on this.
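
To make the edge case concrete, here is a small standalone sketch.
div_round_up() is meant to behave like QEMU's DIV_ROUND_UP for positive
integers; the two bitmap_clusters_*() helpers are made-up names for
illustration and are not part of the patch.

```c
#include <stdint.h>

/* ceil(n / d) for positive integers, in the spirit of DIV_ROUND_UP. */
uint64_t div_round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/* As posted: the two inner divisions truncate, only the last rounds up. */
uint64_t bitmap_clusters_truncating(uint64_t bmsize, uint64_t granularity,
                                    uint64_t cluster_size)
{
    return div_round_up(bmsize / granularity / 8, cluster_size);
}

/* Every division rounds up: bits, then bytes, then clusters. */
uint64_t bitmap_clusters_rounded(uint64_t bmsize, uint64_t granularity,
                                 uint64_t cluster_size)
{
    uint64_t bits = div_round_up(bmsize, granularity);
    uint64_t bytes = div_round_up(bits, 8);

    return div_round_up(bytes, cluster_size);
}
```

For bmsize = 2097664 with 512-byte granularity and clusters, the bitmap
needs 4097 bits, which is 513 bytes once rounded up; the truncating version
sees 512 bytes and returns 1 cluster, while the rounded version returns 2.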



+
+/* Assume the entire bitmap is allocated */
+bitmaps_size += bmclusters * cluster_size;
+/* Also reserve space for the bitmap table entries */
+bitmaps_size += ROUND_UP(bmclusters * sizeof(uint64_t),
+ cluster_size);
+/* Guess at contribution to bitmap directory size */
+bitmap_overhead += ROUND_UP(strlen(name) + 24,


And I don't like this magic number, but sizeof(Qcow2BitmapDirEntry) from 
qcow2-bitmap.c is a private struct not accessible here.



+sizeof(uint64_t));
+}
+}
+bitmaps_size += ROUND_UP(bitmap_overhead, cluster_size);
+

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 0/4] RFC/WIP: Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread

2020-04-16 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20200416203624.32366-1-mlevi...@redhat.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

PASS 3 fdc-test /x86_64/fdc/read_without_media
PASS 1 check-qnull /public/qnull_ref
PASS 2 check-qnull /public/qnull_visit
==8099==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/check-qobject -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="check-qobject" 
PASS 4 fdc-test /x86_64/fdc/media_change
PASS 5 fdc-test /x86_64/fdc/sense_interrupt
---
PASS 33 test-opts-visitor /visitor/opts/dict/unvisited
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-coroutine -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-coroutine" 
PASS 11 fdc-test /x86_64/fdc/read_no_dma_18
==8157==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
==8157==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 
0x7ffd2b90d000; bottom 0x7f7ffd19e000; size: 0x007d2e76f000 (537650458624)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-coroutine /basic/no-dangling-access
---
PASS 12 test-aio /aio/event/flush
PASS 13 test-aio /aio/event/wait/no-flush-cb
PASS 12 fdc-test /x86_64/fdc/read_no_dma_19
==8172==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 14 test-aio /aio/timer/schedule
PASS 13 fdc-test /x86_64/fdc/fuzz-registers
PASS 15 test-aio /aio/coroutine/queue-chaining
---
PASS 26 test-aio /aio-gsource/event/flush
PASS 27 test-aio /aio-gsource/event/wait/no-flush-cb
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img 
tests/qtest/ide-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="ide-test" 
==8180==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 ide-test /x86_64/ide/identify
==8186==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 28 test-aio /aio-gsource/timer/schedule
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-aio-multithread -m=quick -k --tap < /dev/null | 
./scripts/tap-driver.pl --test-name="test-aio-multithread" 
PASS 2 ide-test /x86_64/ide/flush
==8193==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-aio-multithread /aio/multi/lifecycle
==8195==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 3 ide-test /x86_64/ide/bmdma/simple_rw
PASS 2 test-aio-multithread /aio/multi/schedule
==8212==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 4 ide-test /x86_64/ide/bmdma/trim
PASS 3 test-aio-multithread /aio/multi/mutex/contended
==8223==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 4 test-aio-multithread /aio/multi/mutex/handoff
PASS 5 test-aio-multithread /aio/multi/mutex/mcs
PASS 6 test-aio-multithread /aio/multi/mutex/pthread
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-throttle -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-throttle" 
==8245==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-throttle /throttle/leak_bucket
PASS 2 test-throttle /throttle/compute_wait
PASS 3 test-throttle /throttle/init
---
PASS 14 test-throttle /throttle/config/max
PASS 15 test-throttle /throttle/config/iops_size
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-thread-pool -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-thread-pool" 
==8251==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-thread-pool /thread-pool/submit
PASS 2 test-thread-pool /thread-pool/submit-aio
PASS 3 test-thread-pool /thread-pool/submit-co
PASS 4 test-thread-pool /thread-pool/submit-many
==8247==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 5

Re: [PATCH 0/4] RFC/WIP: Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread

2020-04-16 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20200416203624.32366-1-mlevi...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH 0/4] RFC/WIP: Fix scsi devices plug/unplug races w.r.t 
virtio-scsi iothread
Message-id: 20200416203624.32366-1-mlevi...@redhat.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20200416212349.731404-1-ebl...@redhat.com -> 
patchew/20200416212349.731404-1-ebl...@redhat.com
Switched to a new branch 'test'
5e47155 virtio-scsi: don't touch scsi devices that are not yet realized
7a4e20b device-core: use atomic_set on .realized property
c8c6d72 device-core: use RCU for list of childs of a bus
7b3ca63 scsi/scsi_bus: switch search direction in scsi_device_find

=== OUTPUT BEGIN ===
1/4 Checking commit 7b3ca636be2f (scsi/scsi_bus: switch search direction in 
scsi_device_find)
2/4 Checking commit c8c6d7230602 (device-core: use RCU for list of childs of a 
bus)
ERROR: space required before the open brace '{'
#34: FILE: hw/core/bus.c:52:
+WITH_RCU_READ_LOCK_GUARD(){

ERROR: space required before the open brace '{'
#88: FILE: hw/core/bus.c:200:
+WITH_RCU_READ_LOCK_GUARD(){

total: 2 errors, 0 warnings, 255 lines checked

Patch 2/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

3/4 Checking commit 7a4e20bf9d25 (device-core: use atomic_set on .realized 
property)
4/4 Checking commit 5e47155794bf (virtio-scsi: don't touch scsi devices that 
are not yet realized)
WARNING: Block comments use a leading /* on a separate line
#33: FILE: hw/scsi/virtio-scsi.c:49:
+/* This function might run on the IO thread and we might race against

total: 0 errors, 1 warnings, 30 lines checked

Patch 4/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200416203624.32366-1-mlevi...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[PATCH] qcow2: Expose bitmaps' size during measure

2020-04-16 Thread Eric Blake

It's useful to know how much space can be occupied by qcow2 persistent
bitmaps, even though such metadata is unrelated to the guest-visible
data.  Report this value as an additional field.

Reported-by: Nir Soffer 
Signed-off-by: Eric Blake 
---

This is independent from my 'qemu-img convert --bitmaps' series, but
highly related.  As an example, if I create a 100M image, then 2
persistent bitmaps, all with default cluster/granularity sizing, I now
see:

$ ./qemu-img measure -f qcow2 -O qcow2 build/img.top
required size: 52756480
fully allocated size: 105185280
bitmaps size: 327680

which argues that I should allocate 52756480 + 327680 bytes prior to
attempting qemu-img convert --bitmaps to a pre-sized destination.

If we like the idea, I probably need to submit a 2/1 patch adding
iotest coverage of the new measurement.

 qapi/block-core.json | 15 ++-
 block/qcow2.c| 25 +
 qemu-img.c   |  3 +++
 3 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 943df1926a91..b47c6d69ba27 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -633,18 +633,23 @@
 # efficiently so file size may be smaller than virtual disk size.
 #
 # The values are upper bounds that are guaranteed to fit the new image file.
-# Subsequent modification, such as internal snapshot or bitmap creation, may
-# require additional space and is not covered here.
+# Subsequent modification, such as internal snapshot or further bitmap
+# creation, may require additional space and is not covered here.
 #
-# @required: Size required for a new image file, in bytes.
+# @required: Size required for a new image file, in bytes, when copying just
+#guest-visible contents.
 #
 # @fully-allocated: Image file size, in bytes, once data has been written
-#   to all sectors.
+#   to all sectors, when copying just guest-visible contents.
+#
+# @bitmaps: Additional size required for bitmap metadata not directly used
+#   for guest contents, when that metadata can be copied in addition
+#   to guest contents. (since 5.1)
 #
 # Since: 2.10
 ##
 { 'struct': 'BlockMeasureInfo',
-  'data': {'required': 'int', 'fully-allocated': 'int'} }
+  'data': {'required': 'int', 'fully-allocated': 'int', '*bitmaps': 'int'} }

 ##
 # @query-block:
diff --git a/block/qcow2.c b/block/qcow2.c
index b524b0c53f84..eba6c2511e60 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -4657,6 +4657,7 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 PreallocMode prealloc;
 bool has_backing_file;
 bool has_luks;
+uint64_t bitmaps_size = 0; /* size occupied by bitmaps in in_bs */

 /* Parse image creation options */
 cluster_size = qcow2_opt_get_cluster_size_del(opts, &local_err);
@@ -4732,6 +4733,8 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,

 /* Account for input image */
 if (in_bs) {
+BdrvDirtyBitmap *bm;
+size_t bitmap_overhead = 0;
 int64_t ssize = bdrv_getlength(in_bs);
 if (ssize < 0) {
 error_setg_errno(&local_err, -ssize,
@@ -4739,6 +4742,26 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 goto err;
 }

+FOR_EACH_DIRTY_BITMAP(in_bs, bm) {
+if (bdrv_dirty_bitmap_get_persistence(bm)) {
+uint64_t bmsize = bdrv_dirty_bitmap_size(bm);
+uint32_t granularity = bdrv_dirty_bitmap_granularity(bm);
+const char *name = bdrv_dirty_bitmap_name(bm);
+uint64_t bmclusters = DIV_ROUND_UP(bmsize / granularity
+   / CHAR_BIT, cluster_size);
+
+/* Assume the entire bitmap is allocated */
+bitmaps_size += bmclusters * cluster_size;
+/* Also reserve space for the bitmap table entries */
+bitmaps_size += ROUND_UP(bmclusters * sizeof(uint64_t),
+ cluster_size);
+/* Guess at contribution to bitmap directory size */
+bitmap_overhead += ROUND_UP(strlen(name) + 24,
+sizeof(uint64_t));
+}
+}
+bitmaps_size += ROUND_UP(bitmap_overhead, cluster_size);
+
 virtual_size = ROUND_UP(ssize, cluster_size);

 if (has_backing_file) {
@@ -4795,6 +4818,8 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
  * still counted.
  */
 info->required = info->fully_allocated - virtual_size + required;
+info->has_bitmaps = !!bitmaps_size;
+info->bitmaps = bitmaps_size;
 return info;

 err:
diff --git a/qemu-img.c b/qemu-img.c
index 6541357179c2..d900bde89911 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -5084,6 +5084,9 @@ static int img_measure(int argc, char **argv)

Re: [PATCH v2 8/8] hw/arm/fsl-imx7: Connect watchdog interrupts

2020-04-16 Thread Guenter Roeck

Hi Peter,

On 4/16/20 8:29 AM, Peter Maydell wrote:
> On Sun, 22 Mar 2020 at 21:19, Guenter Roeck  wrote:
>>
>> i.MX7 supports watchdog pretimeout interrupts. With this commit,
>> the watchdog in mcimx7d-sabre is fully operational, including
>> pretimeout support.
>>
>> Signed-off-by: Guenter Roeck 
> 
>> diff --git a/include/hw/arm/fsl-imx7.h b/include/hw/arm/fsl-imx7.h
>> index 47826da2b7..da977f9ffb 100644
>> --- a/include/hw/arm/fsl-imx7.h
>> +++ b/include/hw/arm/fsl-imx7.h
>> @@ -228,6 +228,11 @@ enum FslIMX7IRQs {
>>  FSL_IMX7_USB2_IRQ = 42,
>>  FSL_IMX7_USB3_IRQ = 40,
>>
>> +FSL_IMX7_WDOG1_IRQ= 78,
>> +FSL_IMX7_WDOG2_IRQ= 79,
>> +FSL_IMX7_WDOG3_IRQ= 10,
>> +FSL_IMX7_WDOG4_IRQ= 109,
> 
> irq 10 for wdog3 seems to match the kernel's dts, but it's
> a bit weird that it's way out of the range of the others.
> Did you sanity check it against the imx7 data sheet and/or
> real h/w behaviour that it's not a typo for
> one-hundred-and-something? (108 would be the obvious guess...)
> 

I actually did check, for that very same reason. To be sure I looked
again. 10 is correct per datasheet. 108 is TZASC1 (TZASC (PL380)
interrupt).

> Otherwise
> Reviewed-by: Peter Maydell 
> 

Thanks,
Guenter

Re: [PATCH v20 QEMU 0/5] virtio-balloon: add support for free page reporting

2020-04-16 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20200416195641.13144.16955.stgit@localhost.localdomain/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v20 QEMU 0/5] virtio-balloon: add support for free page 
reporting
Message-id: 20200416195641.13144.16955.stgit@localhost.localdomain
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
33de5c8 virtio-balloon: Provide an interface for free page reporting
20cd9d2 linux-headers: update to contain virito-balloon free page reporting
fe9a29b virtio-balloon: Implement support for page poison tracking feature
5824022 virtio-balloon: Replace free page hinting references to 'report' with 
'hint'
9fcf955 linux-headers: Update to allow renaming of free_page_report_cmd_id

=== OUTPUT BEGIN ===
1/5 Checking commit 9fcf955ce5d7 (linux-headers: Update to allow renaming of 
free_page_report_cmd_id)
2/5 Checking commit 58240226c116 (virtio-balloon: Replace free page hinting 
references to 'report' with 'hint')
3/5 Checking commit fe9a29ba1521 (virtio-balloon: Implement support for page 
poison tracking feature)
4/5 Checking commit 20cd9d2d0845 (linux-headers: update to contain 
virito-balloon free page reporting)
5/5 Checking commit 33de5c8bb2f6 (virtio-balloon: Provide an interface for free 
page reporting)
ERROR: code indent should never use tabs
#68: FILE: hw/virtio/virtio-balloon.c:364:
+^I^I(ram_offset + size) > qemu_ram_get_used_length(rb)) {$

total: 1 errors, 0 warnings, 94 lines checked

Patch 5/5 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200416195641.13144.16955.stgit@localhost.localdomain/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: m68k: gdbstub crashing setting float register on cfv4e cpu

2020-04-16 Thread Pierre Muller




Le 16/04/2020 à 22:09, Laurent Vivier a écrit :
> Le 16/04/2020 à 22:03, Pierre Muller a écrit :
>> Le 16/04/2020 à 13:18, Laurent Vivier a écrit :
>>> Le 14/04/2020 à 18:56, Alex Bennée a écrit :

 Philippe Mathieu-Daudé  writes:

> gdbstub/m68k seems broken with floats, previous to refactor commit
> a010bdbe719 ("extend GByteArray to read register helpers").
>
> HEAD at 6fb1603aa2:
>
> $ qemu-system-m68k -s -S -cpu cfv4e
>
> ---[GUEST]---
>
> (gdb) set architecture m68k:cfv4e
> The target architecture is assumed to be m68k:cfv4e
> (gdb) target remote 172.17.0.1:1234
> Remote debugging using 172.17.0.1:1234
> (gdb) info float
> fp0-nan(0xfff7f) (raw 0xff7f)
> fp1-nan(0xfff7f) (raw 0xff7f)
> fp2-nan(0xfff7f) (raw 0xff7f)
> fp3-nan(0xfff7f) (raw 0xff7f)
> fp4-nan(0xfff7f) (raw 0xff7f)
> fp5-nan(0xfff7f) (raw 0xff7f)
> fp6-nan(0xfff7f) (raw 0xff7f)
> fp7-nan(0xfff7f) (raw 0xff7f)
> fpcontrol  0x0 0
> fpstatus   0x0 0
> fpiaddr0x0 0x0
> (gdb) set $fp0=1
> Remote communication error.  Target disconnected.: Connection reset by
> peer.

 With my sha1 debugging test case I get different results depending on
 the cpu type:

   /home/alex/lsrc/qemu.git/tests/guest-debug/run-test.py --gdb 
 /home/alex/src/tools/binutils-gdb.git/builds/all/install/bin/gdb --qemu 
 /home/alex/lsrc/qemu.git/builds/user.static/m68k-linux-user/qemu-m68k 
 --qargs "" --bin tests/tcg/m68k-linux-user/sha1 --test
>> /home/alex/lsrc/qemu.git/tests/tcg/multiarch/gdbstub/sha1.py
   GNU gdb (GDB) 10.0.50.20200414-git
   Copyright (C) 2020 Free Software Foundation, Inc.
   License GPLv3+: GNU GPL version 3 or later 
 
   This is free software: you are free to change and redistribute it.
   There is NO WARRANTY, to the extent permitted by law.
   Type "show copying" and "show warranty" for details.
   This GDB was configured as "x86_64-pc-linux-gnu".
   Type "show configuration" for configuration details.
   For bug reporting instructions, please see:
   .
   Find the GDB manual and other documentation resources online at:
   .

   For help, type "help".
   Type "apropos word" to search for commands related to "word"...
   Executed .gdbinit
   Reading symbols from tests/tcg/m68k-linux-user/sha1...
   Remote debugging using localhost:1234
   warning: Register "fp0" has an unsupported size (96 bits)
   warning: Register "fp1" has an unsupported size (96 bits)
   warning: Register "fp2" has an unsupported size (96 bits)
   warning: Register "fp3" has an unsupported size (96 bits)
   warning: Register "fp4" has an unsupported size (96 bits)
   warning: Register "fp5" has an unsupported size (96 bits)
   warning: Register "fp6" has an unsupported size (96 bits)
   warning: Register "fp7" has an unsupported size (96 bits)
   Remote 'g' packet reply is too long (expected 148 bytes, got 180 bytes):
>> 408009f083407fff7fff7fff7fff7fff7fff7fff7fff
>>>
>>> This is a bug in GDB that doesn't support 96bit float registers of 680x0
>>> but only 64bit registers of coldfire.
>>>
>>> There was a rework of GDB in the past that has broken that and no one
>>> noticed. I bisected and found the commit but it was really too complex
>>> and difficult to fix.
>>>
>>> To be able to debug remotely m68k I use gdb from etch-m68k in a chroot
>>> (or from real hardware).
>>
>>   I do have a fix for gdb-8.3 release: it works for me.
>> See patch below,
>>
>>   You could test it out on other versions,
>> changes to m68k-tdep.c are not that big in recent GDB releases.
>>   I use it with a locally modified qemu to try to support FPU
>> exceptions for m68k FPU.
>>   But I never found the time nor the energy to try to submit those
>> to qemu-devel, especially after viewing what happened to a similar
>> attempt for powerpc hardware fpu support.
>> See "[RFC PATCH v2] target/ppc: Enable hardfloat for PPC" thread, up to
>> https://lists.nongnu.org/archive/html/qemu-ppc/2020-03/msg6.html
> 
> But why didn't you submit your patch to gdb?

  You are right, I should do so,

[PATCH 3/4] device-core: use atomic_set on .realized property

2020-04-16 Thread Maxim Levitsky

Some code might race with placement of new devices on a bus.
We currently first place an (unrealized) device on the bus
and then realize it.

As a workaround, users that scan the child device list can
check the realized property to see if it is safe to access such a device.
Use an atomic write here too to aid with this.

A separate discussion is what to do with devices that are unrealized:
it looks like for this case we only call the hotplug handler's unplug
callback, and it's up to the handler to unrealize the device.
An atomic operation doesn't cause harm for this code path though.

Signed-off-by: Maxim Levitsky 
---
 hw/core/qdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index f0c87e582e..bbb1ae3eb3 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -983,7 +983,7 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
     }
 
     assert(local_err == NULL);
-    dev->realized = value;
+    atomic_set(&dev->realized, value);
     return;
 
 child_realize_fail:
-- 
2.17.2

[PATCH 4/4] virtio-scsi: don't touch scsi devices that are not yet realized

2020-04-16 Thread Maxim Levitsky

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1812399

Suggested-by: Paolo Bonzini 
Signed-off-by: Maxim Levitsky 
---
 hw/scsi/virtio-scsi.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index b0f4a35f81..e360b4e03e 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -35,13 +35,29 @@ static inline int virtio_scsi_get_lun(uint8_t *lun)
 
 static inline SCSIDevice *virtio_scsi_device_find(VirtIOSCSI *s, uint8_t *lun)
 {
+    SCSIDevice *device = NULL;
+
     if (lun[0] != 1) {
         return NULL;
     }
     if (lun[2] != 0 && !(lun[2] >= 0x40 && lun[2] < 0x80)) {
         return NULL;
     }
-    return scsi_device_find(&s->bus, 0, lun[1], virtio_scsi_get_lun(lun));
+
+    device = scsi_device_find(&s->bus, 0, lun[1], virtio_scsi_get_lun(lun));
+
+    /* This function might run on the IO thread and we might race against
+     * main thread hot-plugging the device.
+     *
+     * We assume that as soon as .realized is set to true we can let
+     * the user access the device.
+     */
+
+    if (!device || !atomic_read(&device->qdev.realized)) {
+        return NULL;
+    }
+
+    return device;
 }
 
 void virtio_scsi_init_req(VirtIOSCSI *s, VirtQueue *vq, VirtIOSCSIReq *req)
-- 
2.17.2
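
For readers following along, here is a rough, single-threaded sketch of
the "publish .realized last, check it first" pattern that the two
patches above rely on. It uses C11 atomics instead of QEMU's
atomic_set()/atomic_read() helpers, and Device, slots, bus_place(),
device_mark_realized() and bus_lookup() are all hypothetical stand-ins,
not QEMU code. It also leaves out any protection of the child list
itself, which the series handles separately.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device that sits on a bus. */
typedef struct Device {
    int id;
    atomic_bool realized;   /* written last by the plugger, read first */
} Device;

#define MAX_SLOTS 4
static Device *slots[MAX_SLOTS];   /* stand-in for the bus child list */

/* Writer side: the device becomes visible on the bus before realize. */
static void bus_place(Device *dev, int slot)
{
    atomic_init(&dev->realized, false);
    slots[slot] = dev;
}

/* Writer side: flip the flag only once realization has finished. */
static void device_mark_realized(Device *dev)
{
    atomic_store(&dev->realized, true);
}

/* Reader side: a device that is not yet realized is treated as absent. */
static Device *bus_lookup(int id)
{
    for (size_t i = 0; i < MAX_SLOTS; i++) {
        Device *dev = slots[i];
        if (dev && dev->id == id) {
            return atomic_load(&dev->realized) ? dev : NULL;
        }
    }
    return NULL;
}
```

The reader never dereferences anything beyond the id and the flag until
the flag reads true, which is the property the virtio-scsi lookup wants.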

[PATCH 1/4] scsi/scsi_bus: switch search direction in scsi_device_find

2020-04-16 Thread Maxim Levitsky

This change will allow us to convert the bus children list to RCU
without changing the logic of this function.

Signed-off-by: Maxim Levitsky 
---
 hw/scsi/scsi-bus.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 1c980cab38..7bbc37acec 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1584,7 +1584,7 @@ SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int id, int lun)
     BusChild *kid;
     SCSIDevice *target_dev = NULL;
 
-    QTAILQ_FOREACH_REVERSE(kid, &bus->qbus.children, sibling) {
+    QTAILQ_FOREACH(kid, &bus->qbus.children, sibling) {
         DeviceState *qdev = kid->child;
         SCSIDevice *dev = SCSI_DEVICE(qdev);
 
@@ -1592,7 +1592,15 @@ SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int id, int lun)
             if (dev->lun == lun) {
                 return dev;
             }
-            target_dev = dev;
+
+            /*
+             * If we don't find exact match (channel/bus/lun),
+             * we will return the first device which matches channel/bus
+             */
+
+            if (!target_dev) {
+                target_dev = dev;
+            }
         }
     }
     return target_dev;
-- 
2.17.2
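
A small sketch of why the direction switch above should not change the
result. It models the child list as a plain array (index 0 is the list
head) and assumes that at most one device exists per (channel, id, lun)
triple. Dev, find_reverse() and find_forward() are hypothetical names
for illustration, not the QEMU functions.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for SCSIDevice. */
typedef struct Dev {
    int channel, id, lun;
} Dev;

/* Old logic: walk tail to head and keep the last partial match seen. */
static const Dev *find_reverse(const Dev *devs, size_t n,
                               int channel, int id, int lun)
{
    const Dev *target = NULL;

    for (size_t i = n; i-- > 0; ) {
        const Dev *d = &devs[i];
        if (d->channel == channel && d->id == id) {
            if (d->lun == lun) {
                return d;
            }
            target = d;
        }
    }
    return target;
}

/* New logic: walk head to tail and keep only the first partial match. */
static const Dev *find_forward(const Dev *devs, size_t n,
                               int channel, int id, int lun)
{
    const Dev *target = NULL;

    for (size_t i = 0; i < n; i++) {
        const Dev *d = &devs[i];
        if (d->channel == channel && d->id == id) {
            if (d->lun == lun) {
                return d;
            }
            if (!target) {
                target = d;
            }
        }
    }
    return target;
}
```

In both variants the fallback ends up being the partial match closest to
the list head, so the two loops agree whenever the exact match is unique.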

[PATCH 0/4] RFC/WIP: Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread

2020-04-16 Thread Maxim Levitsky

Hi!

This is a patch series that is the result of my discussion with Paolo on
how to correctly fix the root cause of BZ #1812399.

The root cause of this bug is the fact that the IO thread runs mostly
unlocked with respect to the main thread, on which device hotplug is done.

qdev_device_add first creates the device object, then places it on the bus,
and only then realizes it.

However, some drivers (currently only virtio-scsi) enumerate their child bus
devices on each request received from the guest, and that can happen on
the IO thread.

Thus there is a window in which a new device is on the bus but not yet
realized, and it can be accessed by the virtio-scsi driver in that state.

Fix that by doing two things:

1. Add partial RCU protection to the list of a bus's child devices.
This allows the scsi IO thread to safely enumerate the child devices
while it races with the hotplug placing the device on the bus.

2. Make the virtio-scsi driver check the .realized property of the scsi
device and avoid touching the device if it isn't set.

I don't think this is a very pretty way to solve this; we discussed it
with Paolo and it looks like the lesser evil. I am open to your
thoughts about this.

Note that this patch series doesn't pass some unit tests, in particular
the qtest 'drive_del-test'. I did some light debugging of this test and
I see that the reason is that child device deletion can now be delayed
due to RCU. This is also something I would like to discuss in this RFC.

Note also that I might have some code style errors and bugs in this, since
I haven't tested the code in depth yet, because I am not yet sure that this
is the right way to fix that bug.

Also note that in the particular bug report the issue wasn't a race but
rather a combination of things: the .realize code managed to trigger IO
on the virtqueue in the middle of realization, which caused the virtio-scsi
driver to access the half-realized device. However, since this can happen
just as well with a real IO thread, this patch series was done, and it
fixes that case as well.

Best regards,
Maxim Levitsky

Maxim Levitsky (4):
  scsi/scsi_bus: switch search direction in scsi_device_find
  device-core: use RCU for list of childs of a bus
  device-core: use atomic_set on .realized property
  virtio-scsi: don't touch scsi devices that are not yet realized

 hw/core/bus.c  | 43 --
 hw/core/qdev.c | 48 ++
 hw/scsi/scsi-bus.c | 27 ---
 hw/scsi/virtio-scsi.c  | 24 +++--
 include/hw/qdev-core.h |  3 +++
 include/hw/virtio/virtio-bus.h |  7 +++--
 6 files changed, 114 insertions(+), 38 deletions(-)

-- 
2.17.2
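
To make the "deletion can be delayed due to RCU" remark above more
concrete, here is a toy, single-threaded model of deferred reclamation.
It is not real RCU and does not use QEMU's call_rcu(); Node, pending,
list_remove() and grace_period_elapsed() are hypothetical names. The
point is only that removal unlinks the element immediately while the
free happens later, which is the kind of delay a test that expects
synchronous deletion could trip over.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list node; not QEMU's BusChild. */
typedef struct Node {
    int value;
    struct Node *next;          /* left intact on removal for old readers */
    struct Node *next_pending;  /* links nodes that wait to be freed */
} Node;

static Node *head;
static Node *pending;
static int freed_count;

static void list_insert(int value)
{
    Node *n = calloc(1, sizeof(*n));
    if (!n) {
        abort();
    }
    n->value = value;
    n->next = head;
    head = n;
}

static bool list_contains(int value)
{
    for (Node *n = head; n; n = n->next) {
        if (n->value == value) {
            return true;
        }
    }
    return false;
}

/* Unlink the node now, but only queue it for freeing. */
static bool list_remove(int value)
{
    for (Node **pp = &head; *pp; pp = &(*pp)->next) {
        Node *n = *pp;
        if (n->value == value) {
            *pp = n->next;
            n->next_pending = pending;
            pending = n;
            return true;
        }
    }
    return false;
}

/* Stand-in for "all earlier readers are done": free what was queued. */
static void grace_period_elapsed(void)
{
    while (pending) {
        Node *n = pending;
        pending = n->next_pending;
        free(n);
        freed_count++;
    }
}
```

Between list_remove() and grace_period_elapsed() the node is gone from
the list but its memory, and anything it references, is still alive.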

[PATCH 2/4] device-core: use RCU for list of childs of a bus

2020-04-16 Thread Maxim Levitsky

This fixes the race between device emulation code that tries to find
a child device to dispatch the request to (e.g. a SCSI disk),
and hotplug of a new device to that bus.

Note that this doesn't convert all the readers of the list,
but only those that might go over that list without the BQL held.

This is a very small first step towards making this code thread safe.

Suggested-by: Paolo Bonzini 
Signed-off-by: Maxim Levitsky 
---
 hw/core/bus.c  | 43 ---
 hw/core/qdev.c | 46 +++---
 hw/scsi/scsi-bus.c | 17 ++---
 hw/scsi/virtio-scsi.c  |  6 -
 include/hw/qdev-core.h |  3 +++
 include/hw/virtio/virtio-bus.h |  7 --
 6 files changed, 87 insertions(+), 35 deletions(-)

diff --git a/hw/core/bus.c b/hw/core/bus.c
index 3dc0a825f0..cb7756ded1 100644
--- a/hw/core/bus.c
+++ b/hw/core/bus.c
@@ -49,12 +49,14 @@ int qbus_walk_children(BusState *bus,
         }
     }
 
-    QTAILQ_FOREACH(kid, &bus->children, sibling) {
-        err = qdev_walk_children(kid->child,
-                                 pre_devfn, pre_busfn,
-                                 post_devfn, post_busfn, opaque);
-        if (err < 0) {
-            return err;
+    WITH_RCU_READ_LOCK_GUARD(){
+        QTAILQ_FOREACH_RCU(kid, &bus->children, sibling) {
+            err = qdev_walk_children(kid->child,
+                                     pre_devfn, pre_busfn,
+                                     post_devfn, post_busfn, opaque);
+            if (err < 0) {
+                return err;
+            }
         }
     }
 
@@ -90,9 +92,13 @@ static void bus_reset_child_foreach(Object *obj, ResettableChildCallback cb,
     BusState *bus = BUS(obj);
     BusChild *kid;
 
-    QTAILQ_FOREACH(kid, &bus->children, sibling) {
+    rcu_read_lock();
+
+    QTAILQ_FOREACH_RCU(kid, &bus->children, sibling) {
         cb(OBJECT(kid->child), opaque, type);
     }
+
+    rcu_read_unlock();
 }
 
 static void qbus_realize(BusState *bus, DeviceState *parent, const char *name)
@@ -138,10 +144,15 @@ static void bus_unparent(Object *obj)
     /* Only the main system bus has no parent, and that bus is never freed */
     assert(bus->parent);
 
-    while ((kid = QTAILQ_FIRST(&bus->children)) != NULL) {
+    rcu_read_lock();
+
+    while ((kid = QTAILQ_FIRST_RCU(&bus->children)) != NULL) {
         DeviceState *dev = kid->child;
         object_unparent(OBJECT(dev));
     }
+
+    rcu_read_unlock();
+
     QLIST_REMOVE(bus, sibling);
     bus->parent->num_child_bus--;
     bus->parent = NULL;
@@ -185,14 +196,18 @@ static void bus_set_realized(Object *obj, bool value, Error **errp)
 
         /* TODO: recursive realization */
     } else if (!value && bus->realized) {
-        QTAILQ_FOREACH(kid, &bus->children, sibling) {
-            DeviceState *dev = kid->child;
-            object_property_set_bool(OBJECT(dev), false, "realized",
-                                     &local_err);
-            if (local_err != NULL) {
-                break;
+
+        WITH_RCU_READ_LOCK_GUARD(){
+            QTAILQ_FOREACH_RCU(kid, &bus->children, sibling) {
+                DeviceState *dev = kid->child;
+                object_property_set_bool(OBJECT(dev), false, "realized",
+                                         &local_err);
+                if (local_err != NULL) {
+                    break;
+                }
             }
         }
+
         if (bc->unrealize && local_err == NULL) {
             bc->unrealize(bus, &local_err);
         }
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 85f062def7..f0c87e582e 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -50,26 +50,37 @@ const VMStateDescription *qdev_get_vmsd(DeviceState *dev)
     return dc->vmsd;
 }
 
+static void bus_free_bus_child(BusChild *kid)
+{
+    object_unref(OBJECT(kid->child));
+    g_free(kid);
+}
+
 static void bus_remove_child(BusState *bus, DeviceState *child)
 {
     BusChild *kid;
 
-    QTAILQ_FOREACH(kid, &bus->children, sibling) {
+    rcu_read_lock();
+
+    QTAILQ_FOREACH_RCU(kid, &bus->children, sibling) {
         if (kid->child == child) {
             char name[32];
 
             snprintf(name, sizeof(name), "child[%d]", kid->index);
-            QTAILQ_REMOVE(&bus->children, kid, sibling);
+            QTAILQ_REMOVE_RCU(&bus->children, kid, sibling);
 
             bus->num_children--;
 
             /* This gives back ownership of kid->child back to us.  */
             object_property_del(OBJECT(bus), name, NULL);
-            object_unref(OBJECT(kid->child));
-            g_free(kid);
-            return;
+
+            /* free the bus kid, when it is safe to do so*/
+            call_rcu(kid, bus_free_bus_child, rcu);
+            break;
         }
     }
+
+    rcu_read_unlock();
 }
 
 static void bus_add_child(BusState *bus, DeviceState *child)
@@ -82,7 +93,9 @@ static void bus_add_child(BusState *bus, DeviceState *child)
     kid->child = child;
     object_ref(OBJECT(kid->child));
 
-    QTAILQ_INSERT_HEAD(&bus->children, kid,

Re: [PATCH 0/2] virtiofsd: drop Linux capabilities(7)

2020-04-16 Thread Vivek Goyal

On Thu, Apr 16, 2020 at 05:49:05PM +0100, Stefan Hajnoczi wrote:
> virtiofsd doesn't need all of the Linux capabilities(7) available to root.
> Keep a whitelisted set of capabilities that we require.  This improves
> security in case virtiofsd is compromised by making it hard for an attacker
> to gain further access to the system.

Hi Stefan,

Good to see this patch. We needed to limit capabilities to reduce the
attack surface.

What tests have you run to make sure this current set of whitelisted
capabilities is good enough?

Vivek

> 
> Stefan Hajnoczi (2):
>   virtiofsd: only retain file system capabilities
>   virtiofsd: drop all capabilities in the wait parent process
> 
>  tools/virtiofsd/passthrough_ll.c | 51 
>  1 file changed, 51 insertions(+)
> 
> -- 
> 2.25.1
>

Re: m68k: gdbstub crashing setting float register on cfv4e cpu

2020-04-16 Thread Laurent Vivier

Le 16/04/2020 à 22:03, Pierre Muller a écrit :
> Le 16/04/2020 à 13:18, Laurent Vivier a écrit :
>> Le 14/04/2020 à 18:56, Alex Bennée a écrit :
>>>
>>> Philippe Mathieu-Daudé  writes:
>>>
>>>> gdbstub/m68k seems broken with floats, previous to refactor commit
>>>> a010bdbe719 ("extend GByteArray to read register helpers").
>>>>
>>>> HEAD at 6fb1603aa2:
>>>>
>>>> $ qemu-system-m68k -s -S -cpu cfv4e
>>>>
>>>> ---[GUEST]---
>>>>
>>>> (gdb) set architecture m68k:cfv4e
>>>> The target architecture is assumed to be m68k:cfv4e
>>>> (gdb) target remote 172.17.0.1:1234
>>>> Remote debugging using 172.17.0.1:1234
>>>> (gdb) info float
>>>> fp0-nan(0xfff7f) (raw 0xff7f)
>>>> fp1-nan(0xfff7f) (raw 0xff7f)
>>>> fp2-nan(0xfff7f) (raw 0xff7f)
>>>> fp3-nan(0xfff7f) (raw 0xff7f)
>>>> fp4-nan(0xfff7f) (raw 0xff7f)
>>>> fp5-nan(0xfff7f) (raw 0xff7f)
>>>> fp6-nan(0xfff7f) (raw 0xff7f)
>>>> fp7-nan(0xfff7f) (raw 0xff7f)
>>>> fpcontrol  0x0 0
>>>> fpstatus   0x0 0
>>>> fpiaddr0x0 0x0
>>>> (gdb) set $fp0=1
>>>> Remote communication error.  Target disconnected.: Connection reset by
>>>> peer.
>>>
>>> With my sha1 debugging test case I get different results depending on
>>> the cpu type:
>>>
>>>   /home/alex/lsrc/qemu.git/tests/guest-debug/run-test.py --gdb 
>>> /home/alex/src/tools/binutils-gdb.git/builds/all/install/bin/gdb --qemu 
>>> /home/alex/lsrc/qemu.git/builds/user.static/m68k-linux-user/qemu-m68k 
>>> --qargs "" --bin tests/tcg/m68k-linux-user/sha1 --test
> /home/alex/lsrc/qemu.git/tests/tcg/multiarch/gdbstub/sha1.py
>>>   GNU gdb (GDB) 10.0.50.20200414-git
>>>   Copyright (C) 2020 Free Software Foundation, Inc.
>>>   License GPLv3+: GNU GPL version 3 or later 
>>> 
>>>   This is free software: you are free to change and redistribute it.
>>>   There is NO WARRANTY, to the extent permitted by law.
>>>   Type "show copying" and "show warranty" for details.
>>>   This GDB was configured as "x86_64-pc-linux-gnu".
>>>   Type "show configuration" for configuration details.
>>>   For bug reporting instructions, please see:
>>>   .
>>>   Find the GDB manual and other documentation resources online at:
>>>   .
>>>
>>>   For help, type "help".
>>>   Type "apropos word" to search for commands related to "word"...
>>>   Executed .gdbinit
>>>   Reading symbols from tests/tcg/m68k-linux-user/sha1...
>>>   Remote debugging using localhost:1234
>>>   warning: Register "fp0" has an unsupported size (96 bits)
>>>   warning: Register "fp1" has an unsupported size (96 bits)
>>>   warning: Register "fp2" has an unsupported size (96 bits)
>>>   warning: Register "fp3" has an unsupported size (96 bits)
>>>   warning: Register "fp4" has an unsupported size (96 bits)
>>>   warning: Register "fp5" has an unsupported size (96 bits)
>>>   warning: Register "fp6" has an unsupported size (96 bits)
>>>   warning: Register "fp7" has an unsupported size (96 bits)
>>>   Remote 'g' packet reply is too long (expected 148 bytes, got 180 bytes):
> 408009f083407fff7fff7fff7fff7fff7fff7fff7fff
>>
>> This is a bug in GDB that doesn't support 96bit float registers of 680x0
>> but only 64bit registers of coldfire.
>>
>> There was a rework of GDB in the past that has broken that and no one
>> noticed. I bisected and found the commit but it was really too complex
>> and difficult to fix.
>>
>> To be able to debug remotely m68k I use gdb from etch-m68k in a chroot
>> (or from real hardware).
> 
>   I do have a fix for gdb-8.3 release: it works for me.
> See patch below,
> 
>   You could test it out on other versions,
> changes to m68k-tdep.c are not that big in recent GDB releases.
>   I use it with a locally modified qemu to try to support FPU
> exceptions for m68k FPU.
>   But I never found the time nor the energy to try to submit those
> to qemu-devel, especially after viewing what happened to a similar
> attempt for powerpc hardware fpu support.
> See "[RFC PATCH v2] target/ppc: Enable hardfloat for PPC" thread, up to
> https://lists.nongnu.org/archive/html/qemu-ppc/2020-03/msg6.html

But why didn't you submit your patch to gdb?

Thanks,
Laurent
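
As a sanity check on the "expected 148 bytes, got 180 bytes" message
quoted above, the two sizes can be reproduced with simple arithmetic.
The sketch below assumes a register layout of 18 four-byte core
registers (d0-d7, a0-a7, sr, pc), eight FP data registers and three
four-byte FP control registers (the fpcontrol, fpstatus and fpiaddr
shown by "info float"). That layout and the helper name
g_packet_bytes() are assumptions for illustration, not taken from the
GDB or QEMU sources.

```c
/* Register counts below are an assumed layout, see the note above. */
enum {
    CORE_REGS      = 18,  /* d0-d7, a0-a7, sr, pc (assumed) */
    CORE_REG_BYTES = 4,
    FP_DATA_REGS   = 8,   /* fp0-fp7 */
    FP_CTRL_REGS   = 3,   /* fpcontrol, fpstatus, fpiaddr */
    FP_CTRL_BYTES  = 4
};

/* Size of a full register dump for a given FP data register width. */
static int g_packet_bytes(int fp_reg_bits)
{
    return CORE_REGS * CORE_REG_BYTES
         + FP_DATA_REGS * (fp_reg_bits / 8)
         + FP_CTRL_REGS * FP_CTRL_BYTES;
}
```

Under that assumption, 64-bit FP registers give 72 + 64 + 12 = 148 bytes
and 96-bit FP registers give 72 + 96 + 12 = 180 bytes, so the 32-byte
difference is exactly eight registers times four extra bytes each.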

Re: m68k: gdbstub crashing setting float register on cfv4e cpu

2020-04-16 Thread Pierre Muller

Le 16/04/2020 à 13:18, Laurent Vivier a écrit :
> Le 14/04/2020 à 18:56, Alex Bennée a écrit :
>>
>> Philippe Mathieu-Daudé  writes:
>>
>>> gdbstub/m68k seems broken with floats, previous to refactor commit
>>> a010bdbe719 ("extend GByteArray to read register helpers").
>>>
>>> HEAD at 6fb1603aa2:
>>>
>>> $ qemu-system-m68k -s -S -cpu cfv4e
>>>
>>> ---[GUEST]---
>>>
>>> (gdb) set architecture m68k:cfv4e
>>> The target architecture is assumed to be m68k:cfv4e
>>> (gdb) target remote 172.17.0.1:1234
>>> Remote debugging using 172.17.0.1:1234
>>> (gdb) info float
>>> fp0-nan(0xfff7f) (raw 0xff7f)
>>> fp1-nan(0xfff7f) (raw 0xff7f)
>>> fp2-nan(0xfff7f) (raw 0xff7f)
>>> fp3-nan(0xfff7f) (raw 0xff7f)
>>> fp4-nan(0xfff7f) (raw 0xff7f)
>>> fp5-nan(0xfff7f) (raw 0xff7f)
>>> fp6-nan(0xfff7f) (raw 0xff7f)
>>> fp7-nan(0xfff7f) (raw 0xff7f)
>>> fpcontrol  0x0 0
>>> fpstatus   0x0 0
>>> fpiaddr0x0 0x0
>>> (gdb) set $fp0=1
>>> Remote communication error.  Target disconnected.: Connection reset by
>>> peer.
>>
>> With my sha1 debugging test case I get different results depending on
>> the cpu type:
>>
>>   /home/alex/lsrc/qemu.git/tests/guest-debug/run-test.py --gdb 
>> /home/alex/src/tools/binutils-gdb.git/builds/all/install/bin/gdb --qemu 
>> /home/alex/lsrc/qemu.git/builds/user.static/m68k-linux-user/qemu-m68k 
>> --qargs "" --bin tests/tcg/m68k-linux-user/sha1 --test
/home/alex/lsrc/qemu.git/tests/tcg/multiarch/gdbstub/sha1.py
>>   GNU gdb (GDB) 10.0.50.20200414-git
>>   Copyright (C) 2020 Free Software Foundation, Inc.
>>   License GPLv3+: GNU GPL version 3 or later 
>> 
>>   This is free software: you are free to change and redistribute it.
>>   There is NO WARRANTY, to the extent permitted by law.
>>   Type "show copying" and "show warranty" for details.
>>   This GDB was configured as "x86_64-pc-linux-gnu".
>>   Type "show configuration" for configuration details.
>>   For bug reporting instructions, please see:
>>   .
>>   Find the GDB manual and other documentation resources online at:
>>   .
>>
>>   For help, type "help".
>>   Type "apropos word" to search for commands related to "word"...
>>   Executed .gdbinit
>>   Reading symbols from tests/tcg/m68k-linux-user/sha1...
>>   Remote debugging using localhost:1234
>>   warning: Register "fp0" has an unsupported size (96 bits)
>>   warning: Register "fp1" has an unsupported size (96 bits)
>>   warning: Register "fp2" has an unsupported size (96 bits)
>>   warning: Register "fp3" has an unsupported size (96 bits)
>>   warning: Register "fp4" has an unsupported size (96 bits)
>>   warning: Register "fp5" has an unsupported size (96 bits)
>>   warning: Register "fp6" has an unsupported size (96 bits)
>>   warning: Register "fp7" has an unsupported size (96 bits)
>>   Remote 'g' packet reply is too long (expected 148 bytes, got 180 bytes):
408009f083407fff7fff7fff7fff7fff7fff7fff7fff
>
> This is a bug in GDB that doesn't support 96bit float registers of 680x0
> but only 64bit registers of coldfire.
>
> There was a rework of GDB in the past that has broken that and no one
> noticed. I bisected and found the commit but it was really too complex
> and difficult to fix.
>
> To be able to debug remotely m68k I use gdb from etch-m68k in a chroot
> (or from real hardware).

  I do have a fix for gdb-8.3 release: it works for me.
See patch below,

  You could test it out on other versions,
changes to m68k-tdep.c are not that big in recent GDB releases.
  I use it with a locally modified qemu to try to support FPU
exceptions for m68k FPU.
  But I never found the time nor the energy to try to submit those
to qemu-devel, especially after viewing what happened to a similar
attempt for powerpc hardware fpu support.
See "[RFC PATCH v2] target/ppc: Enable hardfloat for PPC" thread, up to
https://lists.nongnu.org/archive/html/qemu-ppc/2020-03/msg6.html


Pierre Muller


muller@gcc123:~/gnu/gdb$ cat gdb-8.3-m68k-fpu-fix.patch
diff -rc gdb-8.3/gdb/m68k-tdep.c gdb-8.3-for-m68k/gdb/m68k-tdep.c
*** gdb-8.3/gdb/m68k-tdep.c 2019-02-27 04:51:50.0 +
--- gdb-8.3-for-m68k/gdb/m68k-tdep.c2019-09-30 14:28:02.632962365 +
***
*** 1124,1131 
--- 1124,1136 

feature = tdesc_find_feature

Re: [PATCH v4 17/30] qcow2: Add subcluster support to calculate_l2_meta()

2020-04-16 Thread Alberto Garcia

On Wed 15 Apr 2020 10:39:26 AM CEST, Vladimir Sementsov-Ogievskiy wrote:
>> + * Returns 1 on success, -errno on failure (in order to match the
>> + * return value of handle_copied() and handle_alloc()).
>
> Hmm, honestly, I don't like this idea. handle_copied and handle_alloc
> have special return code semantics. Here there is no reason for special
> semantics, just classic error/success.

Right, the only reason is to avoid adding something like this after all
callers:

    if (ret == 0) {
        ret = 1;
    }

But you have a point, maybe I'll change it after all.

>> +case QCOW2_SUBCLUSTER_NORMAL:
>> +case QCOW2_SUBCLUSTER_COMPRESSED:
>> +case QCOW2_SUBCLUSTER_ZERO_ALLOC:
>> +case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
>> +cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
>
> Hmm. Interesting, actually, we don't need to COW
> QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC subclusters in cow-area.. But this
> needs more modifications to cow-handling.

True, if there are more unallocated subclusters in the cow area we could
make the copy operation smaller. I'm not sure if it's worth adding extra
code for this, but maybe I can leave a comment.

>> +break;
>> +case QCOW2_SUBCLUSTER_ZERO_PLAIN:
>> +case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
>> +cow_end_to = ROUND_UP(cow_end_from, s->subcluster_size);
>
>
> This is because in the new cluster we can make the previous subclusters
> unallocated, and not copy from backing.
> Hmm, actually, we should not just make them unallocated, but copy part
> of the bitmap from the original l2-entry.. I need to keep it in mind for
> the next patches.

The bitmap is always copied from the original L2 entry, you can see it
in the patch "qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()"

Berto
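
For anyone skimming the thread, a worked example of the two rounding
cases discussed above. The sizes are assumptions picked for
illustration (a 64 KiB cluster split into 32 subclusters of 2 KiB), and
round_up() is a plain helper written for this sketch, not QEMU's
ROUND_UP macro.

```c
#include <stdint.h>

/* Round n up to the next multiple of d (d must be non-zero). */
static uint64_t round_up(uint64_t n, uint64_t d)
{
    return ((n + d - 1) / d) * d;
}

/* Assumed sizes: 64 KiB clusters, 32 subclusters per cluster. */
enum {
    CLUSTER_SIZE    = 65536,
    SUBCLUSTER_SIZE = CLUSTER_SIZE / 32   /* 2048 */
};

/*
 * End of the tail COW region for a write that ends at cow_end_from.
 * whole_cluster selects the first group of cases in the quoted hunk
 * (NORMAL, COMPRESSED, ZERO_ALLOC, UNALLOCATED_ALLOC), otherwise the
 * second group (ZERO_PLAIN, UNALLOCATED_PLAIN).
 */
static uint64_t cow_end_to(uint64_t cow_end_from, int whole_cluster)
{
    return round_up(cow_end_from,
                    whole_cluster ? CLUSTER_SIZE : SUBCLUSTER_SIZE);
}
```

With a write ending at byte 70000, the first group copies up to 131072
(about 60 KiB of tail COW) while the second group only goes up to 71680
(under 2 KiB), which is the saving the subcluster granularity buys.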

[PATCH v20 QEMU 5/5] virtio-balloon: Provide an interface for free page reporting

2020-04-16 Thread Alexander Duyck

From: Alexander Duyck 

Add support for free page reporting. The idea is to function very similarly
to how the balloon works, in that we basically end up madvising the page as
not being used. However, we don't really need to bother with any deflate-type
logic, since the page will be faulted back into the guest when it is
read or written to.

This provides a new way of letting the guest proactively report free
pages to the hypervisor, so the hypervisor can reuse them. This is in
contrast to inflate/deflate, which is triggered explicitly by the hypervisor.
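
As an illustration of the sanity check this patch applies to each
reported range before discarding it, here is a standalone sketch. The
helper name range_is_discardable() and its parameters are hypothetical;
the patch itself uses QEMU_IS_ALIGNED(), qemu_ram_pagesize() and
qemu_ram_get_used_length() on a RAMBlock. The sketch assumes page_size
is a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Accept a range only if both its offset and its size are multiples of
 * the page size and it ends within the used part of the block.
 */
static bool range_is_discardable(uint64_t offset, uint64_t size,
                                 uint64_t page_size, uint64_t used_length)
{
    /*
     * OR-ing offset and size lets a single mask test cover both: any
     * low bit set in either value survives the OR.
     */
    if (((offset | size) & (page_size - 1)) != 0) {
        return false;
    }
    /* Reject ranges that overrun the end of the block. */
    if (offset + size > used_length) {
        return false;
    }
    return true;
}
```

Ranges that fail either test are simply skipped, which matches the
"for now we will simply ignore" wording of the comment in the patch.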

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   62 +++-
 include/hw/virtio/virtio-balloon.h |2 +
 2 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 19d587fd05cb..e5c9317921a1 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -321,6 +321,59 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
     balloon_stats_change_timer(s, 0);
 }
 
+static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+    VirtQueueElement *elem;
+
+    while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
+        unsigned int i;
+
+        if (qemu_balloon_is_inhibited() || dev->poison_val) {
+            continue;
+        }
+
+        for (i = 0; i < elem->in_num; i++) {
+            void *addr = elem->in_sg[i].iov_base;
+            size_t size = elem->in_sg[i].iov_len;
+            ram_addr_t ram_offset;
+            RAMBlock *rb;
+
+            /*
+             * There is no need to check the memory section to see if
+             * it is ram/readonly/romd like there is for handle_output
+             * below. If the region is not meant to be written to then
+             * address_space_map will have allocated a bounce buffer
+             * and it will be freed in address_space_unmap and trigger
+             * and unassigned_mem_write before failing to copy over the
+             * buffer. If more than one bad descriptor is provided it
+             * will return NULL after the first bounce buffer and fail
+             * to map any resources.
+             */
+            rb = qemu_ram_block_from_host(addr, false, &ram_offset);
+            if (!rb) {
+                trace_virtio_balloon_bad_addr(elem->in_addr[i]);
+                continue;
+            }
+
+            /*
+             * For now we will simply ignore unaligned memory regions, or
+             * regions that overrun the end of the RAMBlock.
+             */
+            if (!QEMU_IS_ALIGNED(ram_offset | size, qemu_ram_pagesize(rb)) ||
+                (ram_offset + size) > qemu_ram_get_used_length(rb)) {
+                continue;
+            }
+
+            ram_block_discard_range(rb, ram_offset, size);
+        }
+
+        virtqueue_push(vq, elem, 0);
+        virtio_notify(vdev, vq);
+        g_free(elem);
+    }
+}
+
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
@@ -728,7 +781,8 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
     f |= dev->host_features;
     virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
-    if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+    if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
+        virtio_has_feature(f, VIRTIO_BALLOON_F_REPORTING)) {
         virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
     }
 
@@ -818,6 +872,10 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
     s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
     s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
 
+    if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
+        s->rvq = virtio_add_queue(vdev, 32, virtio_balloon_handle_report);
+    }
+
     if (virtio_has_feature(s->host_features,
                            VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
         s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE,
@@ -949,6 +1007,8 @@ static Property virtio_balloon_properties[] = {
      */
     DEFINE_PROP_BOOL("qemu-4-0-config-size", VirtIOBalloon,
                      qemu_4_0_config_size, false),
+    DEFINE_PROP_BIT("free-page-reporting", VirtIOBalloon, host_features,
+                    VIRTIO_BALLOON_F_REPORTING, true),
     DEFINE_PROP_LINK("iothread", VirtIOBalloon, iothread, TYPE_IOTHREAD,
                      IOThread *),
     DEFINE_PROP_END_OF_LIST(),
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 3ca2a78e1aca..ac4013d51010 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -42,7 +42,7 @@ enum virtio_balloon_free_page_hint_status {
 
 typedef struct VirtIOBalloon {

[Bug 1873344] [NEW] KVM Windows 98 sound card passthrough is not working for DOS programs..

2020-04-16 Thread ruthan

Public bug reported:

Hello,
I'm trying to pass through PCI sound cards to a QEMU Windows 98 machine. I
tried a Yamaha 724/744 and an Aureal Vortex 1. Under Windows 98 itself they
work fine, but in a Windows 98 DOS box they are half-working at best: either
FM (music) only, or nothing is detected by the games' sound setup programs.
  All of these cards use SB emulation devices.

  When I boot to pure DOS, without Windows 98, even music does not work with
passthrough. The only sound I was able to hear came from the Yamaha Setup
utility test (Native 16-bit sound); any other test, game setup, etc. is not
able to detect the sound cards at all.
  I'm pretty sure that the drivers are set up correctly, because I'm using the
same setup on another physical machine, where it works. My suspicion is
missing or broken emulation of the motherboard DMA channels in QEMU, because
that is needed to make DOS sound work.

  I'm using passthrough because SB16 emulation in QEMU is incomplete,
and in a Windows 98 DOS box the problem is the same as with physical cards.
The same goes for AC97 emulation and its Win95 drivers, which have an SB
emulation device fallback.

QEMU 2.11 and 4.2, Linux Mint 19.3. Motherboard: Gigabyte Z170.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1873344

Title:
  KVM Windows 98 sound card passthrough is not working for DOS
  programs..

Status in QEMU:
  New

Bug description:
  Hello,
  I'm trying to pass through PCI sound cards to a QEMU Windows 98 machine. I
  tried a Yamaha 724/744 and an Aureal Vortex 1. Under Windows 98 itself they
  work fine, but in a Windows 98 DOS box they are half-working at best: either
  FM (music) only, or nothing is detected by the games' sound setup programs.
  All of these cards use SB emulation devices.

  When I boot to pure DOS, without Windows 98, even music does not work with
  passthrough. The only sound I was able to hear came from the Yamaha Setup
  utility test (Native 16-bit sound); any other test, game setup, etc. is not
  able to detect the sound cards at all.
  I'm pretty sure that the drivers are set up correctly, because I'm using the
  same setup on another physical machine, where it works. My suspicion is
  missing or broken emulation of the motherboard DMA channels in QEMU, because
  that is needed to make DOS sound work.

  I'm using passthrough because SB16 emulation in QEMU is incomplete,
  and in a Windows 98 DOS box the problem is the same as with physical cards.
  The same goes for AC97 emulation and its Win95 drivers, which have an SB
  emulation device fallback.

  QEMU 2.11 and 4.2, Linux Mint 19.3. Motherboard: Gigabyte Z170.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1873344/+subscriptions

[PATCH v20 QEMU 4/5] linux-headers: update to contain virtio-balloon free page reporting

2020-04-16 Thread Alexander Duyck

From: Alexander Duyck 

Sync the latest upstream changes for free page reporting. To be
replaced by a full linux header sync.

Signed-off-by: Alexander Duyck 
---
 include/standard-headers/linux/virtio_balloon.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index af0a6b59dab2..af3b7a1fa263 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -36,6 +36,7 @@
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
 #define VIRTIO_BALLOON_F_PAGE_POISON    4 /* Guest is using page poisoning */
+#define VIRTIO_BALLOON_F_REPORTING      5 /* Page reporting virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12

[PATCH v20 QEMU 2/5] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-04-16 Thread Alexander Duyck

From: Alexander Duyck 

In an upcoming patch a feature named Free Page Reporting is about to be
added. In order to avoid any confusion we should drop the use of the word
'report' when referring to Free Page Hinting. So what this patch does is go
through and replace all instances of 'report' with 'hint' when we are
referring to free page hinting.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   74 ++--
 include/hw/virtio/virtio-balloon.h |   20 +-
 2 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a4729f7fc930..a1d6fb52c876 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -466,21 +466,21 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 ret = false;
 goto out;
 }
-if (id == dev->free_page_report_cmd_id) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+if (id == dev->free_page_hint_cmd_id) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_START;
 } else {
 /*
  * Stop the optimization only when it has started. This
  * avoids a stale stop sign for the previous command.
  */
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 }
 }
 }
 
 if (elem->in_num) {
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
 qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
   elem->in_sg[0].iov_len);
 }
@@ -506,11 +506,11 @@ static void virtio_ballloon_get_free_page_hints(void 
*opaque)
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify(vdev, vq);
   /*
-   * Start to poll the vq once the reporting started. Otherwise, continue
+   * Start to poll the vq once the hinting started. Otherwise, continue
* only when there are entries on the vq, which need to be given back.
*/
 } while (continue_to_get_hints ||
- dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
+ dev->free_page_hint_status == FREE_PAGE_HINT_S_START);
 virtio_queue_set_notification(vq, 1);
 }
 
@@ -531,14 +531,14 @@ static void virtio_balloon_free_page_start(VirtIOBalloon *s)
 return;
 }
 
-if (s->free_page_report_cmd_id == UINT_MAX) {
-s->free_page_report_cmd_id =
-   VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
+if (s->free_page_hint_cmd_id == UINT_MAX) {
+s->free_page_hint_cmd_id =
+   VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN;
 } else {
-s->free_page_report_cmd_id++;
+s->free_page_hint_cmd_id++;
 }
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
+s->free_page_hint_status = FREE_PAGE_HINT_S_REQUESTED;
 virtio_notify_config(vdev);
 }
 
@@ -546,18 +546,18 @@ static void virtio_balloon_free_page_stop(VirtIOBalloon *s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-if (s->free_page_report_status != FREE_PAGE_REPORT_S_STOP) {
+if (s->free_page_hint_status != FREE_PAGE_HINT_S_STOP) {
 /*
  * The lock also guarantees us that the
  * virtio_ballloon_get_free_page_hints exits after the
- * free_page_report_status is set to S_STOP.
+ * free_page_hint_status is set to S_STOP.
  */
 qemu_mutex_lock(&s->free_page_lock);
 /*
  * The guest hasn't done the reporting, so host sends a notification
  * to the guest to actively stop the reporting.
  */
-s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+s->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 qemu_mutex_unlock(&s->free_page_lock);
 virtio_notify_config(vdev);
 }
@@ -567,15 +567,15 @@ static void virtio_balloon_free_page_done(VirtIOBalloon *s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_DONE;
+s->free_page_hint_status = FREE_PAGE_HINT_S_DONE;
 virtio_notify_config(vdev);
 }
 
 static int
-virtio_balloon_free_page_report_notify(NotifierWithReturn *n, void *data)
+virtio_balloon_free_page_hint_notify(NotifierWithReturn *n, void *data)
 {
 VirtIOBalloon *dev = container_of(n, VirtIOBalloon,
-  free_page_report_notify);
+  free_page_hint_notify);
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 PrecopyNotifyData *pnd = data;
 
@@ -624,7 +624,7 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
 if
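
The command-id handling in the virtio_balloon_free_page_start() hunk above can be sketched in isolation. This is only an illustration, not code from the patch: next_hint_cmd_id() is a hypothetical helper, and the value used for HINT_CMD_ID_MIN is an assumption about what VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN expands to.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumed value: ids below this are taken to be reserved (e.g. STOP/DONE). */
#define HINT_CMD_ID_MIN 0x80000000u

/*
 * Hypothetical helper mirroring the if/else in the hunk: advance the
 * command id, wrapping back to the first usable id instead of overflowing.
 */
static uint32_t next_hint_cmd_id(uint32_t cur)
{
    if (cur == UINT_MAX) {
        return HINT_CMD_ID_MIN;
    }
    return cur + 1;
}
```

The point of the wrap is that the id never re-enters the low range, so a freshly issued id cannot be confused with one of the reserved values.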

[PATCH v20 QEMU 0/5] virtio-balloon: add support for free page reporting

2020-04-16 Thread Alexander Duyck

This series provides an asynchronous means of reporting free guest pages
to QEMU through virtio-balloon so that the memory associated with those
pages can be dropped and reused by other processes and/or guests on the
host. Using this it is possible to avoid unnecessary I/O to disk and
greatly improve performance in the case of memory overcommit on the host.

I originally submitted this patch series back on February 11th 2020[1],
but at that time I was focused primarily on the kernel portion of this
patch set. However as of April 7th those patches are now included in
Linus's kernel tree[2] and so I am submitting the QEMU pieces for
inclusion.

[1]: https://lore.kernel.org/lkml/20200211224416.29318.44077.stgit@localhost.localdomain/
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0c504f154718904ae49349147e3b7e6ae91ffdc

Changes from v17:
Fixed typo in patch 1 title
Addressed white-space issues reported via checkpatch
Added braces {} for two if statements to match expected coding style

Changes from v18:
Updated patches 2 and 3 based on input from dhildenb
Added comment to patch 2 describing what keeps us from reporting a bad page
Added patch to address issue with ROM devices being directly writable

Changes from v19:
Added std-headers change to match changes pushed for linux kernel headers
Added patch to remove "report" from page hinting code paths
Updated comment to better explain why we disable hints w/ page poisoning
Removed code that was modifying config size for poison vs hinting
Dropped x-page-poison property
Added code to bounds check the reported region vs the RAM block
Dropped patch for ROM devices as that was already pulled in by Paolo

---

Alexander Duyck (5):
  linux-headers: Update to allow renaming of free_page_report_cmd_id
  virtio-balloon: Replace free page hinting references to 'report' with 'hint'
  virtio-balloon: Implement support for page poison tracking feature
  linux-headers: update to contain virito-balloon free page reporting
  virtio-balloon: Provide an interface for free page reporting


 hw/virtio/virtio-balloon.c  |  161 ++-
 include/hw/virtio/virtio-balloon.h  |   23 ++-
 include/standard-headers/linux/virtio_balloon.h |   12 +-
 3 files changed, 146 insertions(+), 50 deletions(-)

--

[PATCH v20 QEMU 3/5] virtio-balloon: Implement support for page poison tracking feature

2020-04-16 Thread Alexander Duyck

From: Alexander Duyck 

We need to advertise support for page poison tracking if we want to learn
whether the guest will be poisoning pages. So if free page hinting is
active we should add page poisoning support and let the guest disable it
if it isn't using it.

Page poisoning results in a page being dirtied on free. We therefore
cannot avoid copying the page at least one more time, since the poison
value has to be written to the destination. As such we can just ignore
free page hinting if page poisoning is enabled, as doing so actually
reduces the work we have to do.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   27 +++
 include/hw/virtio/virtio-balloon.h |1 +
 2 files changed, 28 insertions(+)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a1d6fb52c876..19d587fd05cb 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -531,6 +531,23 @@ static void virtio_balloon_free_page_start(VirtIOBalloon *s)
 return;
 }
 
+/*
+ * If page poisoning is enabled then we probably shouldn't bother with
+ * the hinting since the poisoning will dirty the page and invalidate
+ * the work we are doing anyway.
+ *
+ * If at some point in the future the implementation is changed in the
+ * guest we would still have issues as the poison would need to be
+ * applied to the last state of the page on the remote end.
+ *
+ * To do that we could improve upon this current implementation by
+ * migrating a poison/zero page if the page is flagged as dirty instead
+ * of simply skipping it.
+ */
+if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+return;
+}
+
 if (s->free_page_hint_cmd_id == UINT_MAX) {
 s->free_page_hint_cmd_id =
VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN;
@@ -634,6 +651,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 
 config.num_pages = cpu_to_le32(dev->num_pages);
 config.actual = cpu_to_le32(dev->actual);
+config.poison_val = cpu_to_le32(dev->poison_val);
 
 if (dev->free_page_hint_status == FREE_PAGE_HINT_S_REQUESTED) {
 config.free_page_hint_cmd_id =
@@ -697,6 +715,10 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 qapi_event_send_balloon_change(vm_ram_size -
 ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
 }
+dev->poison_val = 0;
+if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+dev->poison_val = le32_to_cpu(config.poison_val);
+}
 trace_virtio_balloon_set_config(dev->actual, oldactual);
 }
 
@@ -706,6 +728,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
 VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
 f |= dev->host_features;
 virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
+if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
+}
 
 return f;
 }
@@ -854,6 +879,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
 g_free(s->stats_vq_elem);
 s->stats_vq_elem = NULL;
 }
+
+s->poison_val = 0;
 }
 
 static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 108cff97e71a..3ca2a78e1aca 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -70,6 +70,7 @@ typedef struct VirtIOBalloon {
 uint32_t host_features;
 
 bool qemu_4_0_config_size;
+uint32_t poison_val;
 } VirtIOBalloon;
 
 #endif
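
The negotiation described in this patch (offer PAGE_POISON whenever FREE_PAGE_HINT is offered, then skip hinting once the guest has accepted PAGE_POISON) can be sketched as a standalone fragment. This is a simplified illustration under stated assumptions, not the QEMU code: has_feature(), offered_features() and should_start_hinting() are hypothetical names, and the bit numbers are the ones I understand the Linux virtio_balloon header to define.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit numbers, as in the Linux virtio_balloon UAPI header. */
#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3
#define VIRTIO_BALLOON_F_PAGE_POISON    4

/* Hypothetical stand-in for virtio_has_feature(). */
static bool has_feature(uint64_t features, unsigned bit)
{
    return (features & (1ULL << bit)) != 0;
}

/* Device side: offer PAGE_POISON whenever FREE_PAGE_HINT is offered. */
static uint64_t offered_features(uint64_t host_features)
{
    uint64_t f = host_features;

    if (has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
        f |= 1ULL << VIRTIO_BALLOON_F_PAGE_POISON;
    }
    return f;
}

/* Hinting is skipped once the guest has accepted PAGE_POISON. */
static bool should_start_hinting(uint64_t guest_features)
{
    return has_feature(guest_features, VIRTIO_BALLOON_F_FREE_PAGE_HINT) &&
           !has_feature(guest_features, VIRTIO_BALLOON_F_PAGE_POISON);
}
```

A guest that does not poison pages simply leaves the PAGE_POISON bit unacknowledged, in which case hinting proceeds as before.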

[PATCH v20 QEMU 1/5] linux-headers: Update to allow renaming of free_page_report_cmd_id

2020-04-16 Thread Alexander Duyck

From: Alexander Duyck 

Sync to the latest upstream changes for free page hinting. To be
replaced by a full linux header sync.

Signed-off-by: Alexander Duyck 
---
 include/standard-headers/linux/virtio_balloon.h |   11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index 9375ca2a70de..af0a6b59dab2 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -47,8 +47,15 @@ struct virtio_balloon_config {
uint32_t num_pages;
/* Number of pages we've actually got in balloon. */
uint32_t actual;
-   /* Free page report command id, readonly by guest */
-   uint32_t free_page_report_cmd_id;
+   /*
+* Free page hint command id, readonly by guest.
+* Was previously name free_page_report_cmd_id so we
+* need to carry that name for legacy support.
+*/
+   union {
+   uint32_t free_page_hint_cmd_id;
+   uint32_t free_page_report_cmd_id;   /* deprecated */
+   };
/* Stores PAGE_POISON if page poisoning is in use */
uint32_t poison_val;
 };
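
Why the anonymous union keeps legacy users working can be shown with a standalone copy of the layout. This is a sketch for illustration only: struct balloon_config_sketch and read_via_legacy_name() are hypothetical names, and the fragment assumes a C11 compiler (anonymous unions).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standalone copy of the layout from the hunk, for illustration only. */
struct balloon_config_sketch {
    uint32_t num_pages;
    uint32_t actual;
    union {                                 /* anonymous union (C11) */
        uint32_t free_page_hint_cmd_id;
        uint32_t free_page_report_cmd_id;   /* deprecated alias */
    };
    uint32_t poison_val;
};

/* Writes through the new name and reads back through the old one. */
static uint32_t read_via_legacy_name(uint32_t id)
{
    struct balloon_config_sketch c = {0};

    c.free_page_hint_cmd_id = id;
    return c.free_page_report_cmd_id;
}
```

Both names refer to the same four bytes, so the structure size and the offset of poison_val are unchanged and code using either name still compiles.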

[Bug 1873341] [NEW] Qemu Win98 VM with KVM videocard passthrough DOS mode video is not working for most of games..

2020-04-16 Thread ruthan

Public bug reported:

Hello,
I'm using a Windows 98 machine with KVM video card passthrough, which works
fine, but when I try the Windows 98 DOS box mode, something is wrong with
every video card I have tried (PCI-E/PCI: Nvidia, 3dfx, Matrox).

Often the framerate is very slow, like a slideshow: Doom 2, Blood, and even
Fdisk at startup, where I can see it slowly rendering individual lines. Or it
does not work at all (freeze or black screen only): Warcraft 2 demo
(VESA 640x480).

QEMU 2.11 and 4.2, Linux Mint 19.3, Gigabyte Z170 motherboard.

** Affects: qemu
 Importance: Undecided
 Status: New

https://bugs.launchpad.net/bugs/1873341


[Bug 1873340] [NEW] KVM Old ATI(pre) AMD card passthrough is not working

2020-04-16 Thread ruthan

Public bug reported:

Hello,
I tried to pass through old ATI (pre-AMD) PCI / PCI-E cards on a machine where
everything else works (Nvidia / Matrox / 3dfx cards).

Here are the results:
ATI Mach 64 PCI - videocard - machine start segfault
ATI Rage XL PCI - videocard - machine start segfault
ATI Radeon 7000 PCI - Segmentation fault
ATI X600 Gigabyte GV-RX60P128D - Segmentation fault
ATI X700 PCI-E Legend - videocard - completely broken picture from boot
ATI X800 XL PCI-E Gigabyte - videocard - completely broken picture from boot
  All cards have the latest BIOSes.

ATI X600 (HP professional model with DMS-59 connector): I'm unable to make
passthrough work, but I'm not able to set anything less than 16-bit color
in a Windows 98/WinXP machine. I get VM crashes or boot freezes when I try
to boot with more colors.

 QEMU 2.11 and 4.2 behave the same; Linux Mint 19.3, Gigabyte Z170 motherboard.

** Affects: qemu
 Importance: Undecided
 Status: New


https://bugs.launchpad.net/bugs/1873340


Re: [PATCH v2 4/6] dwc-hsotg USB host controller emulation

2020-04-16 Thread Paul Zimmerman


On 4/16/20 9:30 AM, Philippe Mathieu-Daudé wrote:

On 4/16/20 5:47 PM, Peter Maydell wrote:

On Thu, 16 Apr 2020 at 16:45, Peter Maydell  wrote:


On Sun, 29 Mar 2020 at 00:18, Paul Zimmerman  wrote:



+    s->as = &address_space_memory;


Ideally this should be a device property. (hw/dma/pl080.c
has an example of how to declare a TYPE_MEMORY_REGION
property and then create an AddressSpace from it in
the realize method. hw/arm/versatilepb.c and hw/arm/mps2-tz.c
show the other end, using object_property_set_link() to pass
the appropriate MemoryRegion to the device before realizing it.)


On closer inspection you're already doing that with the dma_as/
dma_mr. What's this AddressSpace for if it's different?


s->as is not used, probably a leftover (s->dma_as is used).



thanks
-- PMM


Thanks for the reviews guys, I will take all your suggestions into
account before I post the next series.

Thanks,
Paul

[Bug 1873338] [NEW] Dos on the fly CD image replacement is not Working with DOS

2020-04-16 Thread ruthan

Public bug reported:

I'm not able to exchange the CD image on the fly (needed for some games). I
tried commands in the monitor console (Ctrl+Alt+2) such as eject ide1-cd0 and
change ide-cd0 D:/Games/!Emulators/Dos-QEMU/ISOs/TestChangeISO.iso , but the
system is never able to find the new CD data; the drive simply appears empty.
When I reboot the virtual machine, the newly changed image works.

  QEMU 4.2.

** Affects: qemu
 Importance: Undecided
 Status: New

https://bugs.launchpad.net/bugs/1873338


[Bug 1873337] [NEW] Arrow keys press is double in some programs in Dos

2020-04-16 Thread ruthan

Public bug reported:

Hello,
I'm trying to use QEMU for DOS machines.

There is a problem in some programs where each arrow key press is registered
twice, for example in advanced file managers (Dos Navigator, File Wizard) and
in Scandisk.

There is a GIF:
https://www.vogons.org/download/file.php?id=77141=view

This blocks the use of such programs unless you use the Num Lock keypad
instead, but I have been used to the arrow keys for 25+ years, and it is a
bug. I guess it would affect some games too.

** Affects: qemu
 Importance: Undecided
 Status: New

https://bugs.launchpad.net/bugs/1873337


[Bug 1873339] [NEW] Qemu DOS Quake - 640x480 and above resolutions - Unable to load VESA palette in dos prompt and game crashing are not working

2020-04-16 Thread ruthan

Public bug reported:

I have a problem getting the Quake demo to work at 640x480 and above; 320x200
works fine.
I tried three virtual video card settings. With -vga cirrus, 640x480 is not
available; probably the emulated GPU does not have enough VRAM, or some VESA 2
utility is needed. With -vga std and -vga vmware, 640x480 is available in the
game menu, but when I try to set it I get "Unable to load VESA palette" at the
DOS prompt and the game crashes.
With the VMware SVGA II adapter, Q2DOS works fine at 640x480 and 1024x768, so
the problem affects only some games.

  QEMU 4.2; it is the same on Linux and Windows.

** Affects: qemu
 Importance: Undecided
 Status: New

https://bugs.launchpad.net/bugs/1873339


[Bug 1873335] [NEW] Dos Keypad is not working for numbers - numlock is not working

2020-04-16 Thread ruthan

Public bug reported:

Hello,
I tried to use QEMU 4.2 for DOS, but there is a problem: in DOS it is not
possible to turn on Num Lock to input numbers, which some games need. The
keypad only works as arrow keys.
  I tested both Windows and Linux builds.

With the same settings, when I use Windows 98 or a later OS, Num Lock works
fine.

** Affects: qemu
 Importance: Undecided
 Status: New


https://bugs.launchpad.net/bugs/1873335


[PATCH v1 PATCH v1 0/1] tests: machine-none-test: Enable MicroBlaze testing

2020-04-16 Thread Edgar E. Iglesias

From: "Edgar E. Iglesias" 

This is to re-enable machine-none MicroBlaze testing.

Cheers,
Edgar

Edgar E. Iglesias (1):
  tests: machine-none-test: Enable MicroBlaze testing

 tests/qtest/machine-none-test.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

-- 
2.20.1

[PATCH v1 PATCH v1 1/1] tests: machine-none-test: Enable MicroBlaze testing

2020-04-16 Thread Edgar E. Iglesias

From: "Edgar E. Iglesias" 

Enable MicroBlaze testing.

Signed-off-by: Edgar E. Iglesias 
---
 tests/qtest/machine-none-test.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/tests/qtest/machine-none-test.c b/tests/qtest/machine-none-test.c
index 8bb54a6360..209d86eb57 100644
--- a/tests/qtest/machine-none-test.c
+++ b/tests/qtest/machine-none-test.c
@@ -33,8 +33,8 @@ static struct arch2cpu cpus_map[] = {
 { "cris", "crisv32" },
 { "lm32", "lm32-full" },
 { "m68k", "m5206" },
-/* FIXME: { "microblaze", "any" }, doesn't work with -M none -cpu any */
-/* FIXME: { "microblazeel", "any" }, doesn't work with -M none -cpu any */
+{ "microblaze", "any" },
+{ "microblazeel", "any" },
 { "mips", "4Kc" },
 { "mipsel", "I7200" },
 { "mips64", "20Kc" },
@@ -79,10 +79,8 @@ static void test_machine_cpu_cli(void)
 QTestState *qts;
 
 if (!cpu_model) {
-if (!(!strcmp(arch, "microblaze") || !strcmp(arch, "microblazeel"))) {
-fprintf(stderr, "WARNING: cpu name for target '%s' isn't defined,"
-" add it to cpus_map\n", arch);
-}
+fprintf(stderr, "WARNING: cpu name for target '%s' isn't defined,"
+" add it to cpus_map\n", arch);
 return; /* TODO: die here to force all targets have a test */
 }
 qts = qtest_initf("-machine none -cpu '%s'", cpu_model);
-- 
2.20.1
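
The table lookup that the test relies on can be sketched as follows. This is an illustration rather than the test's own code: cpus_map_sketch holds only a few entries copied from the hunk, and lookup_cpu_model() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct arch2cpu {
    const char *arch;
    const char *cpu_model;
};

/* A few entries taken from the hunk; the real table is longer. */
static const struct arch2cpu cpus_map_sketch[] = {
    { "m68k", "m5206" },
    { "microblaze", "any" },
    { "microblazeel", "any" },
    { "mips", "4Kc" },
};

/* Hypothetical lookup: returns NULL when the target has no entry. */
static const char *lookup_cpu_model(const char *arch)
{
    size_t n = sizeof(cpus_map_sketch) / sizeof(cpus_map_sketch[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(arch, cpus_map_sketch[i].arch) == 0) {
            return cpus_map_sketch[i].cpu_model;
        }
    }
    return NULL;
}
```

With the two MicroBlaze entries present, the lookup no longer returns NULL for those targets, which is why the special case in the warning path can go away.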

Re: [PATCH 0/3] qemu-img: Add convert --bitmaps

2020-04-16 Thread Eric Blake


(adding Markus for a CLI question, look for [*])

On 4/16/20 1:20 PM, Nir Soffer wrote:

On Thu, Apr 16, 2020 at 5:51 PM Eric Blake  wrote:


Without this series, the process for copying one qcow2 image to
another including all of its bitmaps involves running qemu and doing
the copying by hand with a series of QMP commands.  This makes the
process a bit more convenient.


This seems good for copying an image chain from one storage to another,
but I think we need a similar --bitmaps option to qemu-img measure to make
this really useful.

Here is example use case showing how qemu-img measure is related:

Source chain:
/dev/vg1/base
/dev/vg1/top

Destination chain:
/dev/vg2/base
/dev/vg2/top

We create empty lvs with the same name on destination storage (/dev/vg2).

We measure the base lv using qemu-img measure for creating the target lv:

 qemu-img measure -f qcow2 -O qcow2 /dev/vg1/base
 lvcreate -L required_size /dev/vg2/base
 qemu-img create -f qcow2 /dev/vg2/base 10g

For the top lv we use the current size of the source lv - I think we
should measure it instead but
I'm not sure if qemu-img measure supports measuring a single image in a chain
(maybe -o backing_file?).


qemu-img measure --image-opts should be able to measure a single image by 
specifying image opts that purposefully treat the image as standalone 
rather than with its normal backing file included.  Let's see if I can 
whip up an example:


$ qemu-img create -f qcow2 img.base 100M
Formatting 'img.base', fmt=qcow2 size=104857600 cluster_size=65536 
lazy_refcounts=off refcount_bits=16

$ qemu-io -f qcow2 -c 'w 0 25m' img.base
wrote 26214400/26214400 bytes at offset 0
25 MiB, 1 ops; 00.24 sec (103.405 MiB/sec and 4.1362 ops/sec)
$ qemu-img create -f qcow2 -F qcow2 -b img.base img.top
Formatting 'img.top', fmt=qcow2 size=104857600 backing_file=img.base 
backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16

$ qemu-io -f qcow2 -c 'w 25m 25m' img.top
wrote 26214400/26214400 bytes at offset 26214400
25 MiB, 1 ops; 00.24 sec (103.116 MiB/sec and 4.1247 ops/sec)
$ qemu-img measure -f qcow2 -O qcow2 img.base
required size: 26542080
fully allocated size: 105185280
$ qemu-img measure -f qcow2 -O qcow2 img.top
required size: 52756480
fully allocated size: 105185280

Okay, I can reproduce what you are seeing - measuring the top image 
defaults to measuring the full allocation of the entire chain, rather 
than the allocation of just the top image.  And now with --image-opts to 
the rescue:


$ qemu-img measure --image-opts -O qcow2 driver=qcow2,backing=,file.driver=file,file.filename=img.top
qemu-img: warning: Use of "backing": "" is deprecated; use "backing": null instead

required size: 26542080
fully allocated size: 105185280

There you go - by forcing qemu to treat the overlay image as though it 
had no backing, you can then measure that image in isolation.


(*) Hmm - that warning about backing="" being deprecated is annoying, 
but I don't know any other way to use dotted command line syntax and 
still express that we want a QMP null.  I tried to see if I could inject 
an alternative backing driver, such as null-co, but was met with errors:


$ ./qemu-img measure --image-opts -O qcow2 driver=qcow2,backing.driver=null-co,file.driver=file,file.filename=img.top
qemu-img: Could not open 'driver=qcow2,backing.driver=null-co,file.driver=file,file.filename=img.top': Could not open backing file: The only allowed filename for this driver is 'null-co://'
$ ./qemu-img measure --image-opts -O qcow2 driver=qcow2,backing.driver=null-co,backing.file=null-co://,file.driver=file,file.filename=img.top
qemu-img: Could not open 'driver=qcow2,backing.driver=null-co,backing.file=null-co://,file.driver=file,file.filename=img.top': Could not open backing file: The only allowed filename for this driver is 'null-co://'
$ ./qemu-img measure --image-opts -O qcow2 driver=qcow2,backing.driver=null-co,backing.file.filename=null-co://,file.driver=file,file.filename=img.top
qemu-img: Could not open 'driver=qcow2,backing.driver=null-co,backing.file.filename=null-co://,file.driver=file,file.filename=img.top': Could not open backing file: Block protocol 'null-co' doesn't support the option 'file.filename'


We don't want to support "" in the QMP syntax forever, but if the CLI 
syntax has to handle the empty string specially in order to get null 
passed to the QMP code, then so be it.


I also tried, but failed, to use JSON syntax.  I don't know why we 
haven't wired up --image-opts to use JSON syntax yet.


$ qemu-img measure --image-opts -O qcow2 '{"driver":"qcow2", "backing":null,
  "file":{"driver":"file", "filename":"img.top"}}'
qemu-img: Could not open '{"driver":"qcow2", "backing":null,
  "file":{"driver":"file", "filename":"img.top"}}': Cannot find 
device={"driver":"qcow2" nor node_name={"driver":"qcow2"


I guess there's always the pseudo-json protocol:

$ qemu-img measure -O qcow2 'json:{"driver":"qcow2", "backing":null,
  "file":{"driver":"file", "filename":"img.top"}}'

Re: [PATCH RFC] configure: prefer sphinx-build to sphinx-build-3

2020-04-16 Thread Peter Maydell

On Thu, 16 Apr 2020 at 19:22, John Snow  wrote:
> My goal is to make virtual environments work out of the box.
>
> I.e., if you run ./configure from inside a VENV, it should "just work."

Yeah, this seems reasonable to me. If I understand your
patch correctly it ought to work without breaking
the setup Markus describes, because in that case
'sphinx-build' exists but will fail the test_sphinx_build
step (because it's a Python 2 sphinx-build) and we'll
then move on and use sphinx-build-3.

Patch looks good to me, but you'll need to rebase and update it
to take account of commits 516e8b7d4a and 988ae6c3a7
now in master.

thanks
-- PMM

Re: [PULL 0/1] Linux user for 5.0 patches

2020-04-16 Thread Laurent Vivier

Le 16/04/2020 à 21:08, Peter Maydell a écrit :
> On Thu, 16 Apr 2020 at 18:16, Laurent Vivier  wrote:
>>
>> Le 16/04/2020 à 18:03, Peter Maydell a écrit :
>>> On Thu, 16 Apr 2020 at 16:29, Laurent Vivier  wrote:

>>>> The following changes since commit
>>>> 20038cd7a8412feeb49c01f6ede89e36c8995472:
>>>>
>>>>   Update version for v5.0.0-rc3 release (2020-04-15 20:51:54 +0100)
>>>>
>>>> are available in the Git repository at:
>>>>
>>>>   git://github.com/vivier/qemu.git tags/linux-user-for-5.0-pull-request
>>>>
>>>> for you to fetch changes up to 386d38656889a40d29b514ee6f34997ca18f741e:
>>>>
>>>>   linux-user/syscall.c: add target-to-host mapping for epoll_create1()
>>>> (2020-04-16 09:24:22 +0200)
>>>>
>>>>
>>>> Fix epoll_create1() for qemu-alpha
>>>>
>>>>
>>>
>>> How critical is this bug fix? After rc3, I really don't want
>>> to have to create an rc4 unless it's unavoidable...
>>
>> See the launchpad bug (https://bugs.gentoo.org/717548): on alpha, it
>> prevents the use of python3 in gentoo chroot, and thus we can't use
>> emerge to install packages. It also impacts cmake on debian (see
>> https://bugs.launchpad.net/bugs/1860553).
>>
>> But it's not a regression, so up to you to reject it. It appears now
>> because most of the distro have switched from python2 to python3.
>>
>> It's a low risk change, only in linux-user and for archs that have a
>> different EPOLL_CLOEXEC value.
> 
> Thanks for the explanation. I think that I'll put it to one
> side and if we need an rc4 for some other reason it can go in,
> but it's not sufficiently major to merit an rc4 by itself.
> 

Thank you, I agree.

Laurent
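
The kind of target-to-host flag translation discussed in this thread can be sketched as below. This is not the code from the pull request, only an illustration: target_to_host_epoll_flags() is a hypothetical helper, and the two constants are assumptions about the O_CLOEXEC value on Alpha versus most other Linux architectures (EPOLL_CLOEXEC being defined as O_CLOEXEC).

```c
#include <assert.h>

/* Assumed values (octal), see the note above. */
#define TARGET_ALPHA_EPOLL_CLOEXEC 010000000
#define HOST_GENERIC_EPOLL_CLOEXEC 02000000

/*
 * Hypothetical helper: map a guest epoll_create1() flag word to host bits.
 * Returns -1 if the guest passed a flag this sketch does not know about.
 */
static int target_to_host_epoll_flags(int target_flags)
{
    int host_flags = 0;

    if (target_flags & TARGET_ALPHA_EPOLL_CLOEXEC) {
        host_flags |= HOST_GENERIC_EPOLL_CLOEXEC;
        target_flags &= ~TARGET_ALPHA_EPOLL_CLOEXEC;
    }
    return target_flags ? -1 : host_flags;
}
```

Passing the guest value straight to the host would hand the host kernel a bit it does not recognise for this call, which is the failure mode the thread describes for targets whose EPOLL_CLOEXEC differs from the host's.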

Re: [PULL 0/1] Linux user for 5.0 patches

2020-04-16 Thread Peter Maydell

On Thu, 16 Apr 2020 at 18:16, Laurent Vivier  wrote:
>
> Le 16/04/2020 à 18:03, Peter Maydell a écrit :
> > On Thu, 16 Apr 2020 at 16:29, Laurent Vivier  wrote:
> >>
> >> The following changes since commit 
> >> 20038cd7a8412feeb49c01f6ede89e36c8995472:
> >>
> >>   Update version for v5.0.0-rc3 release (2020-04-15 20:51:54 +0100)
> >>
> >> are available in the Git repository at:
> >>
> >>   git://github.com/vivier/qemu.git tags/linux-user-for-5.0-pull-request
> >>
> >> for you to fetch changes up to 386d38656889a40d29b514ee6f34997ca18f741e:
> >>
> >>   linux-user/syscall.c: add target-to-host mapping for epoll_create1() 
> >> (2020-04-16 09:24:22 +0200)
> >>
> >> 
> >> Fix epoll_create1() for qemu-alpha
> >>
> >> 
> >
> > How critical is this bug fix? After rc3, I really don't want
> > to have to create an rc4 unless it's unavoidable...
>
> See the Gentoo bug (https://bugs.gentoo.org/717548): on alpha, it
> prevents the use of python3 in a Gentoo chroot, and thus we can't use
> emerge to install packages. It also impacts cmake on Debian (see
> https://bugs.launchpad.net/bugs/1860553).
>
> But it's not a regression, so it's up to you to reject it. It appears now
> because most distros have switched from python2 to python3.
>
> It's a low risk change, only in linux-user and for archs that have a
> different EPOLL_CLOEXEC value.

Thanks for the explanation. I think that I'll put it to one
side and if we need an rc4 for some other reason it can go in,
but it's not sufficiently major to merit an rc4 by itself.

-- PMM

Re: [PATCH] 9pfs: local: ignore O_NOATIME if we don't have permissions

2020-04-16 Thread Omar Sandoval

On Thu, Apr 16, 2020 at 04:58:31PM +0200, Christian Schoenebeck wrote:
> On Donnerstag, 16. April 2020 02:44:33 CEST Omar Sandoval wrote:
> > From: Omar Sandoval 
> > 
> > QEMU's local 9pfs server passes through O_NOATIME from the client. If
> > the QEMU process doesn't have permissions to use O_NOATIME (namely, it
> > does not own the file nor have the CAP_FOWNER capability), the open will
> > fail. This causes issues when from the client's point of view, it
> > believes it has permissions to use O_NOATIME (e.g., a process running as
> > root in the virtual machine). Additionally, overlayfs on Linux opens
> > files on the lower layer using O_NOATIME, so in this case a 9pfs mount
> > can't be used as a lower layer for overlayfs (cf.
> > https://github.com/osandov/drgn/blob/dabfe1971951701da13863dbe6d8a1d172ad965
> > 0/vmtest/onoatimehack.c and https://github.com/NixOS/nixpkgs/issues/54509).
> > 
> > Luckily, O_NOATIME is effectively a hint, and is often ignored by, e.g.,
> > network filesystems. open(2) notes that O_NOATIME "may not be effective
> > on all filesystems. One example is NFS, where the server maintains the
> > access time." This means that we can honor it when possible but fall
> > back to ignoring it.
> 
> I am not sure whether NFS would simply silently ignore O_NOATIME i.e. without 
> returning EPERM. I don't read it that way.

As far as I can tell, the NFS protocol has nothing equivalent to
O_NOATIME and thus can't honor it. Feel free to test it:

  # mount -t nfs -o vers=4,rw 10.0.2.2:/ /mnt
  # echo foo > /mnt/foo
  # touch -d "1 hour ago" /mnt/foo
  # stat /mnt/foo | grep 'Access: [0-9]'
  Access: 2020-04-16 10:43:36.838952593 -0700
  # # Drop caches so we have to go to the NFS server.
  # echo 3 > /proc/sys/vm/drop_caches
  # strace -e openat dd if=/mnt/foo of=/dev/null iflag=noatime |& grep /mnt/foo
  openat(AT_FDCWD, "/mnt/foo", O_RDONLY|O_NOATIME) = 3
  # stat /mnt/foo | grep 'Access: [0-9]'
  Access: 2020-04-16 11:43:36.906462928 -0700

> Fact is on Linux the expected 
> behaviour is returning EPERM if O_NOATIME cannot be satisfied, consistent 
> since its introduction 22 years ago:
> http://lkml.iu.edu/hypermail/linux/kernel/9811.2/0118.html

The exact phrasing in the man-page is: "EPERM  The O_NOATIME flag was
specified, but the effective user ID of the caller did not match the
owner of the file and the caller was not privileged." IMO, it's about
whether the (guest) process has permission from the (guest) kernel's
point of view, not whether the filesystem could satisfy it.

> > Signed-off-by: Omar Sandoval 
> > ---
> >  hw/9pfs/9p-util.h | 5 +
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/hw/9pfs/9p-util.h b/hw/9pfs/9p-util.h
> > index 79ed6b233e..50842d540f 100644
> > --- a/hw/9pfs/9p-util.h
> > +++ b/hw/9pfs/9p-util.h
> > @@ -37,9 +37,14 @@ static inline int openat_file(int dirfd, const char
> > *name, int flags, {
> >  int fd, serrno, ret;
> > 
> > +again:
> >  fd = openat(dirfd, name, flags | O_NOFOLLOW | O_NOCTTY | O_NONBLOCK,
> >  mode);
> >  if (fd == -1) {
> > +if (errno == EPERM && (flags & O_NOATIME)) {
> > +flags &= ~O_NOATIME;
> > +goto again;
> > +}
> >  return -1;
> >  }
> 
> It would certainly fix the problem in your use case. But it would also unmask 
> O_NOATIME for all other ones (i.e. regular users on guest).

The guest kernel will still check whether processes on the guest have
permission to use O_NOATIME. This only changes the behavior when the
guest kernel believes that the process has permission even though the
host QEMU process doesn't.
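
The fallback in the patch can be sketched outside of QEMU as follows. This is only a minimal illustration, assuming a Linux host; open_noatime_fallback() is a hypothetical name, not the QEMU helper, and it leaves out the extra flags (O_NOFOLLOW and friends) that openat_file() adds.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc only exposes O_NOATIME with this */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

#ifndef O_NOATIME
#define O_NOATIME 0             /* no such flag here: nothing to mask out */
#endif

/*
 * Hypothetical helper (not the QEMU function): open a file relative to
 * dirfd, honoring O_NOATIME when the host allows it and retrying once
 * without it when the host kernel answers EPERM.
 */
static int open_noatime_fallback(int dirfd, const char *name, int flags,
                                 mode_t mode)
{
    int fd = openat(dirfd, name, flags, mode);

    if (fd == -1 && errno == EPERM && (flags & O_NOATIME)) {
        /* Treat O_NOATIME as a hint: drop it and try again. */
        fd = openat(dirfd, name, flags & ~O_NOATIME, mode);
    }
    return fd;
}
```

Called as open_noatime_fallback(AT_FDCWD, "/dev/null", O_RDONLY | O_NOATIME, 0) by an unprivileged user who does not own the file, the first openat() is expected to fail with EPERM and the retry to succeed; any other error is passed through unchanged.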

> I mean I understand your point, but I also have to take other use cases into 
> account which might expect to receive EPERM if O_NOATIME cannot be granted.

If you'd still like to preserve this behavior, would it be acceptable to
make this a QEMU option? Maybe something like "-virtfs
honor_noatime=no": the default would be "yes", which is the current
behavior, and "no" would always mask out NOATIME.

> May I ask how come the file/dir in question does not share the same uid in
> your specific use case? Are the file(s) created outside of QEMU, i.e.
> directly by some app on the host?

My use case is running tests on different versions of the Linux kernel
while reusing the host's userspace environment. I export the host's root
filesystem read-only to the guest via 9pfs, and the guest sets up
overlayfs on top of it (to allow certain modifications) and chroots into
that. Without a workaround like the LD_PRELOAD one I mentioned in the
commit message, any (read) accesses to files owned by root on the host
(like /bin/sh) will fail.

Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature

2020-04-16 Thread David Hildenbrand

> The other thing to keep in mind is that the poison value only really
> comes into play with hinting/reporting. In the case of the standard
> balloon the pages are considered allocated from the guest's

Currently just as free page hinting IMHO. They are temporarily
considered allocated.

> perspective until the balloon is deflated. Then any poison/init will
> occur over again anyway so I don't think the standard balloon should
> really care.

I think we should make this consistent. And as we discuss below, this
allows for a nice optimization in the guest even for ordinary
inflation/deflation (no need to zero out/poison again when giving the
pages back to the buddy).

> 
> For hinting it somewhat depends. Currently the implementation is
> inflating a balloon so having poisoning or init_on_free means it is
> written to immediately after it is freed so it defeats the purpose of
> the hinting. However that is a Linux implementation issue, not

Yeah, and as we discuss below, we can optimize that later in Linux. It's
sub-optimal, I agree.

> necessarily an issue with the QEMU implementation. As such, maybe I
> should fix that in the Linux driver since that has been ignored in
> QEMU up until now anyway. The more interesting bit is what should the
> behavior be from the hypervisor when a page is marked as being hinted.
> I think right now the behavior is to just not migrate the page. I
> wonder though if we shouldn't instead just consider the page a zero
> page, and then maybe modify the zero page behavior for the case where
> we know page poisoning is enabled.

I consider that maybe future work. Let's keep it simple for now (iow,
try to get page poisoning handling right first). Then optimize the guest
handling on balloon deflation / end of free page hinting.

[...]

>> I can totally understand if Alex would want to stop working on
>> VIRTIO_BALLOON_F_PAGE_POISON at this point and only fix the guest to not
>> enable free page reporting in case we don't have
>> VIRTIO_BALLOON_F_PAGE_POISON (unless that's already done), lol. :)
> 
> I already have a patch for that.
> 
> The bigger issue is how to deal with the PAGE_POISON being enabled
> with FREE_PAGE_HINTING. The legacy code at this point is just broken
> and is plowing through with FREE_PAGE_HINTING while it is enabled.
> That is safe for now because it is using a balloon, the side effect is
> that it is going to defer migration. If it switches to a page
> reporting type setup at some point in the future we would need to make
> sure something is written to the other end to identify the poison/zero
> pages.


I think we don't have to worry about that for now. Might be sub-optimal,
but then, I don't think actual page poisoning is all that frequently
used in production setups.


-- 
Thanks,

David / dhildenb

Re: [PATCH 0/3] qemu-img: Add convert --bitmaps

2020-04-16 Thread Nir Soffer

On Thu, Apr 16, 2020 at 5:51 PM Eric Blake  wrote:
>
> Without this series, the process for copying one qcow2 image to
> another including all of its bitmaps involves running qemu and doing
> the copying by hand with a series of QMP commands.  This makes the
> process a bit more convenient.

This seems good for copying an image chain from one storage to another,
but I think we need a similar --bitmaps option to qemu-img measure to make
this really useful.

Here is an example use case showing how qemu-img measure is related:

Source chain:
/dev/vg1/base
/dev/vg1/top

Destination chain:
/dev/vg2/base
/dev/vg2/top

We create empty lvs with the same name on destination storage (/dev/vg2).

We measure the base lv using qemu-img measure for creating the target lv:

qemu-img measure -f qcow2 -O qcow2 /dev/vg1/base
lvcreate -L required_size /dev/vg2/base
qemu-img create -f qcow2 /dev/vg2/base 10g

For the top lv we use the current size of the source lv. I think we
should measure it instead, but I'm not sure whether qemu-img measure
supports measuring a single image in a chain (maybe -o backing_file?).

lvcreate -L current_size /dev/vg2/top
qemu-img create -f qcow2 -b /dev/vg2/base -F qcow2 /dev/vg2/top 10g

And then convert the lvs one by one:

qemu-img convert -f qcow2 -O qcow2 -n --bitmaps /dev/vg1/base /dev/vg2/base
qemu-img convert -f qcow2 -O qcow2 -n --bitmaps -B /dev/vg2/base \
    /dev/vg1/top /dev/vg2/top

The first copy may fail with ENOSPC since qemu-img measure of the base
does not account for the bitmaps in the required size.

So I think we need to add a similar --bitmaps option to qemu-img
measure, hopefully reusing the same code to find and estimate the size
of the bitmaps.

Maybe we can estimate the size using qemu-img info --bitmaps, but I
think the right way to do this is in qemu-img measure.

We also have another use case, where we collapse an image chain into a
single image:

Source chain:
/dev/vg1/base
/dev/vg1/top

Destination:
/dev/vg2/collapsed

In this case we measure the size of the entire chain (/dev/vg1/base <-
/dev/vg1/top), create /dev/vg2/collapsed with the correct size, and then
convert the chain using:

   qemu-img convert /dev/vg1/top /dev/vg2/collapsed

Currently we use this for exporting images, for example when creating
templates, or as a simple backup. In this case we don't need to copy the
bitmaps to the target image: this is a new image not used by any VM.
Copying the bitmaps may also be non-trivial since we may have bitmaps
with the same names in several layers (e.g. as a result of a live
snapshot).

So I think using --bitmaps should be disabled when doing this kind of
convert. We can handle this on our side easily, but I think qemu-img
should fail or log a warning, or require merging of bitmaps with the
same names during the copy. I did not check if you already handle this.

Finally, we also have a use case where we copy the chain as is to new or
the same storage, but create a new VM. In this case I don't think the
backup history makes sense for the new VM, so we don't need to copy the
bitmaps.

I will review the rest of the patches next week and can maybe give
this some testing.

Nir

> I still think that someday we will need a 'qemu-img bitmap' with
> various subcommands for manipulating bitmaps within an offline image,
> but in the meantime, this seems like a useful addition on its own.
>
> Series can also be downloaded at:
> https://repo.or.cz/qemu/ericb.git/shortlog/refs/tags/qemu-img-bitmaps-v1
>
> Eric Blake (3):
>   blockdev: Split off basic bitmap operations for qemu-img
>   qemu-img: Add convert --bitmaps option
>   iotests: Add test 291 to for qemu-img convert --bitmaps
>
>  docs/tools/qemu-img.rst|   6 +-
>  Makefile.objs  |   2 +-
>  include/sysemu/blockdev.h  |  10 ++
>  blockbitmaps.c | 217 +
>  blockdev.c | 184 ---
>  qemu-img.c |  81 +-
>  MAINTAINERS|   1 +
>  qemu-img-cmds.hx   |   4 +-
>  tests/qemu-iotests/291 | 143 
>  tests/qemu-iotests/291.out |  56 ++
>  tests/qemu-iotests/group   |   1 +
>  11 files changed, 514 insertions(+), 191 deletions(-)
>  create mode 100644 blockbitmaps.c
>  create mode 100755 tests/qemu-iotests/291
>  create mode 100644 tests/qemu-iotests/291.out
>
> --
> 2.26.0
>

Re: [PATCH RFC] configure: prefer sphinx-build to sphinx-build-3

2020-04-16 Thread John Snow




On 4/16/20 8:31 AM, Alex Bennée wrote:
> 
> John Snow  writes:
> 
>> On 4/15/20 1:55 PM, Peter Maydell wrote:
>>> On Wed, 15 Apr 2020 at 18:33, John Snow  wrote:

>>>> sphinx-build is the name of the script entry point from the sphinx
>>>> package itself. sphinx-build-3 is a packaging convention by Linux
>>>> distributions. Prefer, where possible, the canonical package name.
>>>
>>> This was Markus's code originally; cc'ing him.
>>>
>>> (Incidentally I think when we say "Linux distributions" we
>>> really mean "Red Hat"; Debian/Ubuntu don't use the "sphinx-build-3" name.)
>>>
>>
>> I'll take your word for it :)
>>
>>> thanks
>>> -- PMM
>>> (rest of email untrimmed for context)
>>>
>>
>> My only goal here is that if you are using a virtual environment with
>> sphinx installed that it prefers that, so non-standard names need to
>> come last.
>>
>> There's probably 10,000,000 ways to do that, hence the RFC.
> 
> What's wrong with just passing --sphinx-build=sphinx-build in your
> configure string? It will override whatever we auto-detect AFAICT.
> 

My goal is to make virtual environments work out of the box.

I.e., if you run ./configure from inside a VENV, it should "just work."

--js

Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature

2020-04-16 Thread Alexander Duyck

On Thu, Apr 16, 2020 at 7:55 AM David Hildenbrand  wrote:
>
> >> We should document our result of page poisoning, free page hinting, and
> >> free page reporting there as well. I hope you'll have time for the latter.
> >>
> >> -
> >> Semantics of VIRTIO_BALLOON_F_PAGE_POISON
> >> -
> >>
> >> "The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
> >> guest is using page poisoning. Guest writes to the poison_val config
> >> field to tell host about the page poisoning value that is in use."
> >> -> Very little information, no signs about what has to be done.
> >
> > I think it's an informational field. Knowing that free pages
> > are full of a specific pattern can be handy for the hypervisor
> > for a variety of reasons. E.g. compression/deduplication?
>
> I was referring to the documentation of the feature and what we
> (hypervisor) are expected to do (in regards to inflation/deflation).
>
> Yes, it might be valuable to know that the guest is using poisoning. I
> assume compression/deduplication (IOW KSM) will figure out themselves
> that such pages are equal.

The other thing to keep in mind is that the poison value only really
comes into play with hinting/reporting. In the case of the standard
balloon the pages are considered allocated from the guest's
perspective until the balloon is deflated. Then any poison/init will
occur over again anyway so I don't think the standard balloon should
really care.

For hinting it somewhat depends. Currently the implementation is
inflating a balloon so having poisoning or init_on_free means it is
written to immediately after it is freed so it defeats the purpose of
the hinting. However that is a Linux implementation issue, not
necessarily an issue with the QEMU implementation. As such, maybe I
should fix that in the Linux driver since that has been ignored in
QEMU up until now anyway. The more interesting bit is what should the
behavior be from the hypervisor when a page is marked as being hinted.
I think right now the behavior is to just not migrate the page. I
wonder though if we shouldn't instead just consider the page a zero
page, and then maybe modify the zero page behavior for the case where
we know page poisoning is enabled.

For reporting it is a matter of tracking the contents. We don't want
to modify the contents in any way as we are attempting to essentially
do in-place tracking of the page. So if it is poisoned or initialized
it needs to stay in that state so we cannot invalidate the page if
doing so will cause it to lose state information.

> >> "Let the hypervisor know that we are expecting a specific value to be
> >> written back in balloon pages."
> >
> >
> >
> >> -> Okay, that talks about "balloon pages", which would include right now
> >> -- pages "inflated" and then "deflated" using free page hinting
> >> -- pages "inflated" and then "deflated" using ordinary inflate/deflate
> >>    queue
> >
> > ATM, in this case driver calls "free" and that fills page with the
> > poison value.
>
> Yes, that's what I mentioned somewhere, it's currently done by Linux and ...
>
> >
> > It might be a valid optimization to allow driver to skip
> > poisoning of freed pages in this case.
>
> ... we should prepare for that :)

Agreed.

> >
> >> And I would add
> >>
> >> "However, if the inflated page was not filled with "poison_val" when
> >> inflating, it's not predictable if the original page or a page filled
> >> with "poison_val" is returned."
> >>
> >> Which would cover the "we did not discard the page in the hypervisor, so
> >> the original page is still there".
> >>
> >>
> >> We should also document what is expected to happen if "poison_val" is
> >> suddenly changed by the guest at one point in time again. (e.g., not
> >> supported, unexpected things can happen, etc.)
> >
> > Right. I think we should require that this can only be changed
> > before features have been negotiated.
> > That is the only point where hypervisor can still fail
> > gracefully (i.e. fail FEATURES_OK).
>
> Agreed.

I believe that is the current behavior. Essentially, if poisoning is
enabled then the feature flag is left set. I think the one change I
will make in the driver is that if poisoning is enabled in the kernel,
but PAGE_POISON is not available as a feature, I am going to disable
both the reporting and hinting features in virtballoon_validate.
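
The check described above could look roughly like this. It is only a sketch of the rule, not the Linux driver code: balloon_validate_features() and its boolean argument are hypothetical, and the feature bit numbers are the ones I understand the virtio-balloon UAPI header to define.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers as I understand the virtio-balloon UAPI header. */
#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3
#define VIRTIO_BALLOON_F_PAGE_POISON    4
#define VIRTIO_BALLOON_F_REPORTING      5

#define FEATURE_BIT(n) (UINT64_C(1) << (n))

/*
 * Hypothetical stand-in for the validate step: given the feature bits
 * offered by the device and whether the guest kernel poisons (or inits)
 * freed pages, return the feature bits the driver should keep.
 */
static uint64_t balloon_validate_features(uint64_t offered, bool guest_poisons)
{
    uint64_t features = offered;

    if (!guest_poisons) {
        /* Nothing to report to the host: do not claim PAGE_POISON. */
        features &= ~FEATURE_BIT(VIRTIO_BALLOON_F_PAGE_POISON);
    } else if (!(offered & FEATURE_BIT(VIRTIO_BALLOON_F_PAGE_POISON))) {
        /*
         * The guest poisons pages but cannot tell the host about it,
         * so give up on both hinting and reporting.
         */
        features &= ~(FEATURE_BIT(VIRTIO_BALLOON_F_FREE_PAGE_HINT) |
                      FEATURE_BIT(VIRTIO_BALLOON_F_REPORTING));
    }
    return features;
}
```

With poisoning enabled and PAGE_POISON offered, all three bits survive; with poisoning enabled and PAGE_POISON missing, hinting and reporting are both dropped.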

> I can totally understand if Alex would want to stop working on
> VIRTIO_BALLOON_F_PAGE_POISON at this point and only fix the guest to not
> enable free page reporting in case we don't have
> VIRTIO_BALLOON_F_PAGE_POISON (unless that's already done), lol. :)

I already have a patch for that.

The bigger issue is how to deal with the PAGE_POISON being enabled
with FREE_PAGE_HINTING. The legacy code at this point is just broken
and is plowing through with FREE_PAGE_HINTING while it is

[PATCH] linux-user/strace.list: fix epoll_create{,1} -strace output

2020-04-16 Thread Sergei Trofimovich

Fix syscall name and parameters printer.

Before the change:

```
$ alpha-linux-user/qemu-alpha -strace -L /usr/alpha-unknown-linux-gnu/ /tmp/a
...
1274697 %s(%d)(2097152,274903156744,274903156760,274905840712,274877908880,274903235616) = 3
1274697 exit_group(0)
```

After the change:

```
$ alpha-linux-user/qemu-alpha -strace -L /usr/alpha-unknown-linux-gnu/ /tmp/a
...
1273719 epoll_create1(2097152) = 3
1273719 exit_group(0)
```

Signed-off-by: Sergei Trofimovich 
CC: Riku Voipio 
CC: Laurent Vivier 
---
 linux-user/strace.list | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/strace.list b/linux-user/strace.list
index d49a1e92a8..9281c0a758 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -125,10 +125,10 @@
 { TARGET_NR_dup3, "dup3" , "%s(%d,%d,%d)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_epoll_create
-{ TARGET_NR_epoll_create, "%s(%d)", NULL, NULL, NULL },
+{ TARGET_NR_epoll_create, "epoll_create", "%s(%d)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_epoll_create1
-{ TARGET_NR_epoll_create1, "%s(%d)", NULL, NULL, NULL },
+{ TARGET_NR_epoll_create1, "epoll_create1", "%s(%d)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_epoll_ctl
 { TARGET_NR_epoll_ctl, "epoll_ctl" , NULL, NULL, NULL },
-- 
2.26.1
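
The "Before" output follows directly from positional initialization: with the name missing, the format string slides into the name slot and the format slot ends up NULL. A reduced sketch of that effect, using a hypothetical three-field row and printer rather than the real QEMU structures:

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for one row of the strace table (hypothetical). */
struct syscall_entry {
    int nr;
    const char *name;
    const char *format;   /* NULL means "print all six raw arguments" */
};

/* Broken row: the name is missing, so "%s(%d)" lands in the name slot. */
static const struct syscall_entry broken_entry = { 1, "%s(%d)", NULL };

/* Fixed row: name and format each sit in their own slot. */
static const struct syscall_entry fixed_entry = { 1, "epoll_create1", "%s(%d)" };

/* Render one call into buf, falling back to a generic six-argument form. */
static void print_entry(char *buf, size_t len,
                        const struct syscall_entry *e, const long arg[6])
{
    if (e->format) {
        snprintf(buf, len, e->format, e->name, (int)arg[0]);
    } else {
        snprintf(buf, len, "%s(%ld,%ld,%ld,%ld,%ld,%ld)", e->name,
                 arg[0], arg[1], arg[2], arg[3], arg[4], arg[5]);
    }
}
```

The broken row prints the literal text %s(%d) as the syscall name followed by six raw arguments, which has the same shape as the log line before the change.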

Re: [PATCH 2/2] virtiofsd: drop all capabilities in the wait parent process

2020-04-16 Thread Philippe Mathieu-Daudé


On 4/16/20 6:49 PM, Stefan Hajnoczi wrote:

> All this process does is wait for its child.  No capabilities are
> needed.
>
> Signed-off-by: Stefan Hajnoczi 
> ---
>   tools/virtiofsd/passthrough_ll.c | 13 +
>   1 file changed, 13 insertions(+)
>
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index af97ba1c41..0c3f33b074 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -2530,6 +2530,17 @@ static void print_capabilities(void)
>   printf("}\n");
>   }
>   
> +/*
> + * Drop all Linux capabilities because the wait parent process only needs to
> + * sit in waitpid(2) and terminate.
> + */
> +static void setup_wait_parent_capabilities(void)
> +{
> +capng_setpid(syscall(SYS_gettid));

Maybe worth a /* Drop all capabilities */ comment here.

Reviewed-by: Philippe Mathieu-Daudé 

> +capng_clear(CAPNG_SELECT_BOTH);
> +capng_apply(CAPNG_SELECT_BOTH);
> +}
> +
>   /*
>    * Move to a new mount, net, and pid namespaces to isolate this process.
>    */
> @@ -2561,6 +2572,8 @@ static void setup_namespaces(struct lo_data *lo, struct fuse_session *se)
>   pid_t waited;
>   int wstatus;
>   
> +setup_wait_parent_capabilities();
> +
>   /* The parent waits for the child */
>   do {
>   waited = waitpid(child, &wstatus, 0);

[PATCH RFC v5] target/arm: Implement SVE2 HISTCNT, HISTSEG

2020-04-16 Thread Stephen Long

Signed-off-by: Stephen Long 
---
Made the fixes Richard noted.

 target/arm/helper-sve.h|   7 +++
 target/arm/sve.decode  |   6 +++
 target/arm/sve_helper.c| 104 +
 target/arm/translate-sve.c |  29 +++
 4 files changed, 146 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 4733614614..958ad623f6 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -2526,6 +2526,13 @@ DEF_HELPER_FLAGS_5(sve2_nmatch_ppzz_b, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve2_nmatch_ppzz_h, TCG_CALL_NO_RWG,
i32, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve2_histcnt_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve2_histcnt_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve2_histseg, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_h, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 26690d4208..9dd20eb6ec 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -147,6 +147,7 @@
 _esz rn=%reg_movprfx
 @rdn_pg_rm_ra    esz:2 . ra:5  ... pg:3 rm:5 rd:5 \
 _esz rn=%reg_movprfx
+@rd_pg_rn_rm    esz:2 . rm:5 ... pg:3 rn:5 rd:5   _esz
 
 # One register operand, with governing predicate, vector element size
 @rd_pg_rn    esz:2 ... ... ... pg:3 rn:5 rd:5   _esz
@@ -1325,6 +1326,11 @@ UQRSHRNT01000101 .. 1 . 00  . .  
@rd_rn_tszimm_shr
 MATCH   01000101 .. 1 . 100 ... . 0  @pd_pg_rn_rm
 NMATCH  01000101 .. 1 . 100 ... . 1  @pd_pg_rn_rm
 
+### SVE2 Histogram Computation
+
+HISTCNT 01000101 .. 1 . 110 ... . .  @rd_pg_rn_rm
+HISTSEG 01000101 .. 1 . 101 000 . .  @rd_rn_rm
+
 ## SVE2 floating-point pairwise operations
 
 FADDP   01100100 .. 010 00 0 100 ... . . @rdn_pg_rm
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 7c65009bb8..65857e27b4 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -7016,3 +7016,107 @@ DO_PPZZ_MATCH(sve2_nmatch_ppzz_b, MO_8, true)
 DO_PPZZ_MATCH(sve2_nmatch_ppzz_h, MO_16, true)
 
 #undef DO_PPZZ_MATCH
+
+void HELPER(sve2_histcnt_s)(void *vd, void *vn, void *vm, void *vg,
+uint32_t desc)
+{
+intptr_t i, j;
+intptr_t opr_sz = simd_oprsz(desc);
+uint32_t *d = vd, *n = vn, *m = vm;
+uint8_t *pg = vg;
+
+for (i = 0; i < opr_sz; i += 4) {
+uint64_t count = 0;
+uint8_t pred = pg[H1(i >> 3)] >> (i & 7);
+if (pred & 1) {
+uint32_t nn = n[H4(i >> 2)];
+for (j = 0; j <= i; j += 4) {
+uint8_t pred = pg[H1(j >> 3)] >> (j & 7);
+if (pred & 1 && nn == m[H4(j >> 2)]) {
+++count;
+}
+}
+}
+d[H4(i >> 2)] = count;
+}
+}
+
+void HELPER(sve2_histcnt_d)(void *vd, void *vn, void *vm, void *vg,
+uint32_t desc)
+{
+intptr_t i, j;
+intptr_t opr_sz = simd_oprsz(desc) / 8;
+uint64_t *d = vd, *n = vn, *m = vm;
+uint8_t *pg = vg;
+
+for (i = 0; i < opr_sz; ++i) {
+uint64_t count = 0;
+if (pg[H1(i)] & 1) {
+uint64_t nn = n[i];
+for (j = 0; j <= i; ++j) {
+if (pg[H1(j)] & 1 && nn == m[j]) {
+++count;
+}
+}
+}
+d[i] = count;
+}
+}
+
+/*
+ * Returns the number of bytes in m0 and m1 that match n.
+ * See comment for do_match2().
+ * */
+static inline uint64_t do_histseg_cnt(uint8_t n, uint64_t m0, uint64_t m1)
+{
+int esz = MO_8;
+int bits = 8 << esz;
+uint64_t ones = dup_const(esz, 1);
+uint64_t signs = ones << (bits - 1);
+uint64_t cmp0, cmp1;
+
+cmp1 = dup_const(esz, n);
+cmp0 = cmp1 ^ m0;
+cmp1 = cmp1 ^ m1;
+cmp0 = (cmp0 - ones) & ~cmp0 & signs;
+cmp1 = (cmp1 - ones) & ~cmp1 & signs;
+
+/*
+ * Combine the two compares in a way that the bits do
+ * not overlap, and so preserves the count of set bits.
+ * If the host has a efficient instruction for ctpop,
+ * then ctpop(x) + ctpop(y) has the same number of
+ * operations as ctpop(x | (y >> 1)).  If the host does
+ * not have an efficient ctpop, then we only want to
+ * use it once.
+ */
+return ctpop64(cmp0 | (cmp1 >> 1));
+}
+
+void HELPER(sve2_histseg)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+intptr_t i, j;
+intptr_t opr_sz = simd_oprsz(desc);
+
+for (i = 0; i < opr_sz; i += 16) {
+uint64_t n0 = *(uint64_t *)(vn + i);
+uint64_t n1 = *(uint64_t *)(vn + i + 8);
+
+uint64_t m0 = *(uint64_t *)(vm +
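
A note on the byte-matching step in do_histseg_cnt() above: the borrow-based test (x - ones) & ~x & signs reliably answers "is any byte zero", but as far as I can tell it can also flag a 0x01 byte sitting directly above a zero byte, which matters once the flags are counted. The sketch below shows one carry-free alternative for counting. The helper names are made up for illustration and this is not the code from the patch.

```c
#include <stdint.h>

/* Replicate one byte into all eight lanes of a 64-bit word. */
static uint64_t dup8(uint8_t b)
{
    return UINT64_C(0x0101010101010101) * b;
}

/* Set bit 7 of every byte lane of x that is zero; no cross-lane carry. */
static uint64_t zero_byte_flags(uint64_t x)
{
    const uint64_t low7 = UINT64_C(0x7f7f7f7f7f7f7f7f);
    uint64_t t = (x & low7) + low7;   /* bit 7 set if the low 7 bits != 0 */

    return ~(t | x | low7);           /* bit 7 set only where the lane is 0 */
}

/* Portable population count (clears the lowest set bit per iteration). */
static int popcount64(uint64_t v)
{
    int count = 0;

    while (v) {
        v &= v - 1;
        count++;
    }
    return count;
}

/* Count how many of the 16 bytes held in m0 and m1 are equal to n. */
static int count_matching_bytes(uint8_t n, uint64_t m0, uint64_t m1)
{
    uint64_t f0 = zero_byte_flags(dup8(n) ^ m0);
    uint64_t f1 = zero_byte_flags(dup8(n) ^ m1);

    /* f0 uses bit 7 of each lane and f1 >> 1 uses bit 6: no overlap. */
    return popcount64(f0 | (f1 >> 1));
}
```

For n = 0 and m0 = 0x0101010101010100 the borrow-based test sets the flag in all eight lanes, while the variant above flags only the lowest one.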

Re: [PULL 0/1] Linux user for 5.0 patches

2020-04-16 Thread Laurent Vivier

Le 16/04/2020 à 18:03, Peter Maydell a écrit :
> On Thu, 16 Apr 2020 at 16:29, Laurent Vivier  wrote:
>>
>> The following changes since commit 20038cd7a8412feeb49c01f6ede89e36c8995472:
>>
>>   Update version for v5.0.0-rc3 release (2020-04-15 20:51:54 +0100)
>>
>> are available in the Git repository at:
>>
>>   git://github.com/vivier/qemu.git tags/linux-user-for-5.0-pull-request
>>
>> for you to fetch changes up to 386d38656889a40d29b514ee6f34997ca18f741e:
>>
>>   linux-user/syscall.c: add target-to-host mapping for epoll_create1() 
>> (2020-04-16 09:24:22 +0200)
>>
>> 
>> Fix epoll_create1() for qemu-alpha
>>
>> 
> 
> How critical is this bug fix? After rc3, I really don't want
> to have to create an rc4 unless it's unavoidable...

See the Gentoo bug (https://bugs.gentoo.org/717548): on alpha, it
prevents the use of python3 in a Gentoo chroot, and thus we can't use
emerge to install packages. It also impacts cmake on Debian (see
https://bugs.launchpad.net/bugs/1860553).

But it's not a regression, so it's up to you to reject it. It appears now
because most distros have switched from python2 to python3.

It's a low risk change, only in linux-user and for archs that have a
different EPOLL_CLOEXEC value.

Thanks,
Laurent
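
For readers who want to see what "a different EPOLL_CLOEXEC value" means in practice, here is a minimal sketch of a target-to-host flag translation. The helper name is hypothetical and this is not the code from the pull request; the alpha value 0x200000 is the 2097152 that shows up in the -strace output for epoll_create1() elsewhere on this list.

```c
#include <sys/epoll.h>

/* EPOLL_CLOEXEC as an alpha guest passes it (2097152 in the strace log). */
#define TARGET_EPOLL_CLOEXEC 0x200000

/*
 * Hypothetical helper: translate guest epoll_create1() flags to host
 * flags. Returns -1 if the guest set a bit this sketch does not know.
 */
static int target_to_host_epoll_flags(int target_flags)
{
    int host_flags = 0;

    if (target_flags & TARGET_EPOLL_CLOEXEC) {
        host_flags |= EPOLL_CLOEXEC;
        target_flags &= ~TARGET_EPOLL_CLOEXEC;
    }
    return target_flags ? -1 : host_flags;
}
```

Without a translation like this, the guest's flag bits reach the host's epoll_create1() unchanged and are not a valid flag value there, which would explain the failures seen with python3.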

Re: [PATCH v2] nrf51: Fix last GPIO CNF address

2020-04-16 Thread Peter Maydell

On Wed, 15 Apr 2020 at 05:37, Cameron Esfahani  wrote:
>
> NRF51_GPIO_REG_CNF_END doesn't actually refer to the start of the last
> valid CNF register: it's referring to the last byte of the last valid
> CNF register.
>
> This hasn't been a problem up to now, as current implementation in
> memory.c turns an unaligned 4-byte read from 0x77f to a single byte read
> and the qtest only looks at the least-significant byte of the register.
>
> But when running with patches which fix unaligned accesses in memory.c,
> the qtest breaks.
>
> Considering NRF51 doesn't support unaligned accesses, the simplest fix
> is to actually set NRF51_GPIO_REG_CNF_END to the start of the last valid
> CNF register: 0x77c.
>
> Now, qtests work with or without the unaligned access patches.
>
> Reviewed-by: Cédric Le Goater 
> Tested-by: Cédric Le Goater 
> Reviewed-by: Joel Stanley 
> Signed-off-by: Cameron Esfahani 



Applied to target-arm.next for 5.1, thanks.

-- PMM
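
The arithmetic behind 0x77c versus 0x77f, as a small sketch. The macro names are made up here, and the 0x700 base and the 32-pin count are my assumptions about the nRF51 register layout rather than something stated in the patch.

```c
#include <stdint.h>

#define GPIO_PINS           32      /* assumed nRF51 GPIO port width */
#define GPIO_REG_CNF_START  0x700   /* assumed offset of the first CNF reg */
#define GPIO_REG_SIZE       4       /* each CNF register is 32 bits wide */

/* Offset of the first byte of the CNF register for a given pin. */
static uint32_t gpio_cnf_offset(unsigned pin)
{
    return GPIO_REG_CNF_START + pin * GPIO_REG_SIZE;
}

/* Pin index for an aligned CNF offset, or -1 if unaligned/out of range. */
static int gpio_cnf_index(uint32_t offset)
{
    if (offset < GPIO_REG_CNF_START || (offset % GPIO_REG_SIZE) != 0) {
        return -1;
    }
    offset = (offset - GPIO_REG_CNF_START) / GPIO_REG_SIZE;
    return offset < GPIO_PINS ? (int)offset : -1;
}
```

Under those assumptions the last register starts at 0x700 + 31 * 4 = 0x77c, and 0x77f is merely its last byte, so it is not a valid register address for an aligned 4-byte access.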

[PATCH 2/2] virtiofsd: drop all capabilities in the wait parent process

2020-04-16 Thread Stefan Hajnoczi

All this process does is wait for its child.  No capabilities are
needed.

Signed-off-by: Stefan Hajnoczi 
---
 tools/virtiofsd/passthrough_ll.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index af97ba1c41..0c3f33b074 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2530,6 +2530,17 @@ static void print_capabilities(void)
 printf("}\n");
 }
 
+/*
+ * Drop all Linux capabilities because the wait parent process only needs to
+ * sit in waitpid(2) and terminate.
+ */
+static void setup_wait_parent_capabilities(void)
+{
+capng_setpid(syscall(SYS_gettid));
+capng_clear(CAPNG_SELECT_BOTH);
+capng_apply(CAPNG_SELECT_BOTH);
+}
+
 /*
  * Move to a new mount, net, and pid namespaces to isolate this process.
  */
@@ -2561,6 +2572,8 @@ static void setup_namespaces(struct lo_data *lo, struct fuse_session *se)
 pid_t waited;
 int wstatus;
 
+setup_wait_parent_capabilities();
+
 /* The parent waits for the child */
 do {
 waited = waitpid(child, &wstatus, 0);
-- 
2.25.1

[PATCH 1/2] virtiofsd: only retain file system capabilities

2020-04-16 Thread Stefan Hajnoczi

virtiofsd runs as root but only needs a subset of root's Linux
capabilities(7).  As a file server its purpose is to create and access
files on behalf of a client.  It needs to be able to access files with
arbitrary uid/gid owners.  It also needs to be able to create device nodes.

Introduce a Linux capabilities(7) whitelist and drop all capabilities
that we don't need, making the virtiofsd process less powerful than a
regular uid root process.

  # cat /proc/PID/status
  ...
  Before   After
  CapInh:  
  CapPrm: 003f 88df
  CapEff: 003f 88df
  CapBnd: 003f 
  CapAmb:  

Note that file capabilities cannot be used to achieve the same effect on
the virtiofsd executable because mount is used during sandbox setup.
Therefore we drop capabilities programmatically at the right point
during startup.

This patch only affects the sandboxed child process.  The parent process
that sits in waitpid(2) still has full root capabilities and will be
addressed in the next patch.

Signed-off-by: Stefan Hajnoczi 
---
 tools/virtiofsd/passthrough_ll.c | 38 
 1 file changed, 38 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 4c35c95b25..af97ba1c41 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2695,6 +2695,43 @@ static void setup_mounts(const char *source)
 close(oldroot);
 }
 
+/*
+ * Only keep whitelisted capabilities that are needed for file system operation
+ */
+static void setup_capabilities(void)
+{
+pthread_mutex_lock();
+capng_restore_state();
+
+/*
+ * Whitelist file system-related capabilities that are needed for a file
+ * server to act like root.  Drop everything else like networking and
+ * sysadmin capabilities.
+ *
+ * Exclusions:
+ * 1. CAP_LINUX_IMMUTABLE is not included because it's only used via ioctl
+ *and we don't support that.
+ * 2. CAP_MAC_OVERRIDE is not included because it only seems to be
+ *used by the Smack LSM.  Omit it until there is demand for it.
+ */
+capng_setpid(syscall(SYS_gettid));
+capng_clear(CAPNG_SELECT_BOTH);
+capng_updatev(CAPNG_ADD, CAPNG_PERMITTED | CAPNG_EFFECTIVE,
+CAP_CHOWN,
+CAP_DAC_OVERRIDE,
+CAP_DAC_READ_SEARCH,
+CAP_FOWNER,
+CAP_FSETID,
+CAP_SETGID,
+CAP_SETUID,
+CAP_MKNOD,
+CAP_SETFCAP);
+capng_apply(CAPNG_SELECT_BOTH);
+
+cap.saved = capng_save_state();
+pthread_mutex_unlock();
+}
+
 /*
  * Lock down this process to prevent access to other processes or files outside
  * source directory.  This reduces the impact of arbitrary code execution bugs.
@@ -2705,6 +2742,7 @@ static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
 setup_namespaces(lo, se);
 setup_mounts(lo->source);
 setup_seccomp(enable_syslog);
+setup_capabilities();
 }
 
 /* Raise the maximum number of open file descriptors */
-- 
2.25.1

[PATCH 0/2] virtiofsd: drop Linux capabilities(7)

2020-04-16 Thread Stefan Hajnoczi

virtiofsd doesn't need all of the Linux capabilities(7) available to root.  Keep a
whitelisted set of capabilities that we require.  This improves security in
case virtiofsd is compromised by making it hard for an attacker to gain further
access to the system.

Stefan Hajnoczi (2):
  virtiofsd: only retain file system capabilities
  virtiofsd: drop all capabilities in the wait parent process

 tools/virtiofsd/passthrough_ll.c | 51 
 1 file changed, 51 insertions(+)

-- 
2.25.1

Re: [PATCH v2] aspeed: Add boot stub for smp booting

2020-04-16 Thread Peter Maydell

On Thu, 9 Apr 2020 at 07:31, Joel Stanley  wrote:
>
> This is a boot stub that is similar to the code u-boot runs, allowing
> the kernel to boot the secondary CPU.

> +static void aspeed_write_smpboot(ARMCPU *cpu,
> + const struct arm_boot_info *info)
> +{
> +static const uint32_t poll_mailbox_ready[] = {
> +/*
> + * r2 = per-cpu go sign value
> + * r1 = AST_SMP_MBOX_FIELD_ENTRY
> + * r0 = AST_SMP_MBOX_FIELD_GOSIGN
> + */
> +0xee100fb0,  /* mrc p15, 0, r0, c0, c0, 5 */
> +0xe21000ff,  /* ands r0, r0, #255 */
> +0xe59f201c,  /* ldr r2, [pc, #28] */
> +0xe1822000,  /* orr r2, r2, r0*/
> +
> +0xe59f1018,  /* ldr r1, [pc, #24] */
> +0xe59f0018,  /* ldr r0, [pc, #24] */
> +
> +0xe320f002,  /* wfe   */
> +0xe5904000,  /* ldr r4, [r0]  */
> +0xe1520004,  /* cmp r2, r4*/
> +0x1afb,  /* bne  */

Note that unlike "wfi", QEMU's "wfe" implementation is merely
a 'yield', so a secondary-CPU boot loop that has wfe in it
will basically be a busy-loop of those vcpu threads.
(This is why the smpboot code in hw/arm/boot.c uses wfi.)

I don't suppose the secondary boot protocol on these boards
is such that a wfi loop will work ? (Depends on what the
primary code in the kernel does to prod the secondary after
writing the magic value.)
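For readers following along, the polling protocol in the quoted stub, as far as its comments and instructions show, can be modelled in plain C. This is only a sketch: the struct layout, the demo_ names and the go-sign constant used in the usage note are hypothetical, and the wfe/wfi wait hint is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two mailbox words the stub reads. */
struct demo_mailbox {
    uint32_t gosign;  /* written by the primary CPU to release a secondary */
    uint32_t entry;   /* address the released secondary should jump to */
};

/* Mirrors "ands r0, r0, #255; orr r2, r2, r0": base value ORed with CPU id. */
static uint32_t demo_expected_gosign(uint32_t base, uint32_t mpidr)
{
    return base | (mpidr & 0xff);
}

/*
 * One iteration of the loop body: returns 1 and stores the entry address
 * when the go sign matches, otherwise returns 0 (the stub would loop).
 */
static int demo_poll(const volatile struct demo_mailbox *mb,
                     uint32_t expected, uint32_t *entry)
{
    assert(mb != NULL && entry != NULL);
    if (mb->gosign != expected) {
        return 0;
    }
    *entry = mb->entry;
    return 1;
}
```

A caller would compute demo_expected_gosign(0xab00, mpidr) once and then call demo_poll() until it returns 1. Whether the wait between polls can be a wfi depends on whether the primary raises an interrupt after writing the go sign, which is the open question above.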

> +0xe591f000,  /* ldr pc, [r1]  */
> +AST_SMP_MBOX_GOSIGN,
> +AST_SMP_MBOX_FIELD_ENTRY,
> +AST_SMP_MBOX_FIELD_GOSIGN,
> +};

thanks
-- PMM

Re: [PATCH v1 0/2] dma/xlnx-zdma: Fix descriptor loading wrt host endianness

2020-04-16 Thread Peter Maydell

On Sat, 4 Apr 2020 at 13:26, Edgar E. Iglesias  wrote:
>
> From: "Edgar E. Iglesias" 
>
> Hi,
>
> This fixes the endianness-related bugs with descriptor loading
> that Peter pointed out.
>
> Cheers,
> Edgar
>
> Edgar E. Iglesias (2):
>   dma/xlnx-zdma: Fix descriptor loading (MEM) wrt endianness
>   dma/xlnx-zdma: Fix descriptor loading (REG) wrt endianness



Applied to target-arm.next for 5.1, thanks.

-- PMM

Re: [PATCH v2 4/6] dwc-hsotg USB host controller emulation

2020-04-16 Thread Philippe Mathieu-Daudé


On 4/16/20 5:47 PM, Peter Maydell wrote:
> On Thu, 16 Apr 2020 at 16:45, Peter Maydell  wrote:
>>
>> On Sun, 29 Mar 2020 at 00:18, Paul Zimmerman  wrote:
>
>>> +s->as = &address_space_memory;
>>
>> Ideally this should be a device property. (hw/dma/pl080.c
>> has an example of how to declare a TYPE_MEMORY_REGION
>> property and then create an AddressSpace from it in
>> the realize method. hw/arm/versatilepb.c and hw/arm/mps2-tz.c
>> show the other end, using object_property_set_link() to pass
>> the appropriate MemoryRegion to the device before realizing it.)
>
> On closer inspection you're already doing that with the dma_as/
> dma_mr. What's this AddressSpace for if it's different?

s->as is not used, probably a leftover (s->dma_as is used).

> thanks
> -- PMM

Re: [PATCH RFC v4] target/arm: Implement SVE2 HISTCNT, HISTSEG

2020-04-16 Thread Richard Henderson

On 4/16/20 7:42 AM, Stephen Long wrote:
> +static inline uint8_t do_histseg_cnt(uint8_t n, uint64_t m0, uint64_t m1)
> +{
> +int esz = 0;

Clearer to use MO_8.

> +int bits = 8 << esz;
> +uint64_t ones = dup_const(esz, 1);
> +uint64_t signs = ones << (bits - 1);
> +uint64_t cmp0, cmp1;
> +
> +cmp1 = dup_const(1, n);

Error in the esz argument here.

> +cmp0 = cmp1 ^ m0;
> +cmp1 = cmp1 ^ m1;
> +cmp0 = (cmp0 - ones) & ~cmp0;
> +cmp1 = (cmp1 - ones) & ~cmp1;
> +return ctpop64((cmp0 | cmp1) & signs);
> +}

Ah, well, I may have been too brief in my suggestion before.  I encourage you
to have a look at the bithacks patch and understand the algorithm here -- it's
quite clever.

We cannot simply OR the two halves together, since 8 | 8 == 8 loses one from
the count of bits.  So:

  cmp0 = (cmp0 - ones) & ~cmp0 & signs;
  cmp1 = (cmp1 - ones) & ~cmp1 & signs;

  /*
   * Combine the two compares in a way that the bits do
   * not overlap, and so preserves the count of set bits.
   * If the host has an efficient instruction for ctpop,
   * then ctpop(x) + ctpop(y) has the same number of
   * operations as ctpop(x | (y >> 1)).  If the host does
   * not have an efficient ctpop, then we only want to
   * use it once.
   */
  return ctpop64(cmp0 | (cmp1 >> 1));
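A standalone sketch of the counting idea may help when checking it by hand. It is not the helper from the patch: the demo_ names are made up, popcount is written out portably, and the zero-lane test is a carry-free variant rather than the subtract form quoted above. I swapped it in because, as far as I can tell, the subtract form can let a borrow from a matching byte also flag the byte above it when that byte XORs to 0x01.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate one byte into all eight byte lanes of a 64-bit word. */
static uint64_t demo_dup8(uint8_t b)
{
    return 0x0101010101010101ULL * b;
}

/* Portable population count: clears the lowest set bit each round. */
static unsigned demo_popcount64(uint64_t x)
{
    unsigned n = 0;

    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Return 0x80 in every byte lane of x that is zero, and 0 elsewhere. */
static uint64_t demo_zero_lanes(uint64_t x)
{
    const uint64_t low7 = demo_dup8(0x7f);

    /* Adding 0x7f to the low 7 bits carries into bit 7 iff they are nonzero. */
    return ~(((x & low7) + low7) | x | low7);
}

/* Count how many of the 16 bytes in m0:m1 are equal to n. */
static unsigned demo_histseg_cnt(uint8_t n, uint64_t m0, uint64_t m1)
{
    uint64_t cmp0 = demo_zero_lanes(demo_dup8(n) ^ m0);
    uint64_t cmp1 = demo_zero_lanes(demo_dup8(n) ^ m1);

    /* cmp0 uses bit 7 of each lane, cmp1 >> 1 uses bit 6: no overlap. */
    assert((cmp0 & (cmp1 >> 1)) == 0);
    return demo_popcount64(cmp0 | (cmp1 >> 1));
}
```

With both halves filled with 0x08, demo_histseg_cnt(8, ...) counts all 16 lanes, whereas OR-ing the two masks without the shift would report only 8.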

> +for (j = 0; j < 64; j += 8) {
> +uint8_t count0 = do_histseg_cnt(n0 >> j, m0, m1);
> +out0 |= count0 << j;
> +
> +uint8_t count1 = do_histseg_cnt(n1 >> j, m0, m1);
> +out1 |= count1 << j;
> +}

Wrong type for count0/count1 for shifting by e.g. 56.

You might as well just use uint64_t as the return value from do_histseg_cnt()
so that we don't get unnecessary zero-extensions from the compiler.
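To illustrate the type problem: a uint8_t operand is promoted to int before the shift, so shifting it by 56 exceeds the width of int. A minimal sketch of the packing step with the value widened first (demo_pack_counts is a made-up name, not code from the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack eight per-lane counts into one 64-bit word, lane j at bits [8j+7:8j]. */
static uint64_t demo_pack_counts(const uint8_t counts[8])
{
    uint64_t out = 0;

    assert(counts != NULL);
    for (int j = 0; j < 8; j++) {
        /* Widen to 64 bits before shifting so that j == 7 is well defined. */
        out |= (uint64_t)counts[j] << (j * 8);
    }
    return out;
}
```

Returning uint64_t from the counting helper has the same effect and avoids the explicit cast.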

r~

Re: [PULL 0/1] Linux user for 5.0 patches

2020-04-16 Thread Peter Maydell

On Thu, 16 Apr 2020 at 16:29, Laurent Vivier  wrote:
>
> The following changes since commit 20038cd7a8412feeb49c01f6ede89e36c8995472:
>
>   Update version for v5.0.0-rc3 release (2020-04-15 20:51:54 +0100)
>
> are available in the Git repository at:
>
>   git://github.com/vivier/qemu.git tags/linux-user-for-5.0-pull-request
>
> for you to fetch changes up to 386d38656889a40d29b514ee6f34997ca18f741e:
>
>   linux-user/syscall.c: add target-to-host mapping for epoll_create1() (2020-04-16 09:24:22 +0200)
>
> 
> Fix epoll_create1() for qemu-alpha
>
> 

How critical is this bug fix? After rc3, I really don't want
to have to create an rc4 unless it's unavoidable...

thanks
-- PMM

Re: [PATCH v2 4/6] dwc-hsotg USB host controller emulation

2020-04-16 Thread Peter Maydell

On Thu, 16 Apr 2020 at 16:45, Peter Maydell  wrote:
>
> On Sun, 29 Mar 2020 at 00:18, Paul Zimmerman  wrote:

> > +s->as = &address_space_memory;
>
> Ideally this should be a device property. (hw/dma/pl080.c
> has an example of how to declare a TYPE_MEMORY_REGION
> property and then create an AddressSpace from it in
> the realize method. hw/arm/versatilepb.c and hw/arm/mps2-tz.c
> show the other end, using object_property_set_link() to pass
> the appropriate MemoryRegion to the device before realizing it.)

On closer inspection you're already doing that with the dma_as/
dma_mr. What's this AddressSpace for if it's different?

thanks
-- PMM

Re: [PATCH v2 4/6] dwc-hsotg USB host controller emulation

2020-04-16 Thread Peter Maydell

On Sun, 29 Mar 2020 at 00:18, Paul Zimmerman  wrote:
>
> Add the dwc-hsotg (dwc2) USB host controller emulation code.
> Based on hw/usb/hcd-ehci.c and hw/usb/hcd-ohci.c.
>
> Note that to use this with the dwc-otg driver in the Raspbian
> kernel, you must pass the option "dwc_otg.fiq_fsm_enable=0" on
> the kernel command line.
>
> Emulation of slave mode and of descriptor-DMA mode has not been
> implemented yet. These modes are seldom used.
>
> I have used some on-line sources of information while developing
> this emulation, including:
>
> http://www.capital-micro.com/PDF/CME-M7_Family_User_Guide_EN.pdf
> has a pretty complete description of the controller starting on
> page 370.
>
> https://sourceforge.net/p/wive-ng/wive-ng-mt/ci/master/tree/docs/DataSheets/RT3050_5x_V2.0_081408_0902.pdf
> has a description of the controller registers starting on page
> 130.

Ooh, these reference URLs are very helpful. Could you put
them in a comment at the top of the C file as well as in the
commit message, please?

> Signed-off-by: Paul Zimmerman 
> ---
>  hw/usb/hcd-dwc2.c   | 1301 +++
>  hw/usb/trace-events |   47 ++
>  2 files changed, 1348 insertions(+)
>  create mode 100644 hw/usb/hcd-dwc2.c
>
> diff --git a/hw/usb/hcd-dwc2.c b/hw/usb/hcd-dwc2.c
> new file mode 100644
> index 00..fd85543f4d
> --- /dev/null
> +++ b/hw/usb/hcd-dwc2.c
> @@ -0,0 +1,1301 @@
> +/*
> + * dwc-hsotg (dwc2) USB host controller emulation
> + *
> + * Based on hw/usb/hcd-ehci.c and hw/usb/hcd-ohci.c
> + *
> + * Copyright (c) 2020 Paul Zimmerman 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "hw/usb/dwc2-regs.h"
> +#include "hw/usb/hcd-dwc2.h"
> +#include "trace.h"
> +#include "qemu/error-report.h"
> +#include "qemu/main-loop.h"
> +
> +#define USB_HZ_FS   1200
> +#define USB_HZ_HS   9600
> +
> +/* nifty macros from Arnon's EHCI version  */
> +#define get_field(data, field) \
> +(((data) & field##_MASK) >> field##_SHIFT)
> +
> +#define set_field(data, newval, field) do { \
> +uint32_t val = *data; \
> +val &= ~field##_MASK; \
> +val |= ((newval) << field##_SHIFT) & field##_MASK; \
> +*data = val; \
> +} while (0)
> +
> +#define get_bit(data, bitmask) \
> +(!!((data) & bitmask))

Could you use the standard field definition, extract, and deposit
macros from include/hw/registerfields.h, please?
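For reference, the operations behind those macros boil down to an extract and a deposit on a (start, length) bit range. The sketch below uses local stand-ins (the demo_ functions and the DEMO_EPNUM field are hypothetical) so that it compiles outside the QEMU tree; in the device itself one would declare fields with FIELD() and access them with FIELD_EX32()/FIELD_DP32() instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field used only for illustration: 4 bits starting at bit 11. */
#define DEMO_EPNUM_SHIFT  11
#define DEMO_EPNUM_LENGTH 4

/* Return the 'length'-bit field of 'value' that starts at bit 'start'. */
static uint32_t demo_extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Return 'value' with that field replaced by the low bits of 'fieldval'. */
static uint32_t demo_deposit32(uint32_t value, int start, int length,
                               uint32_t fieldval)
{
    uint32_t mask;

    assert(start >= 0 && length > 0 && length <= 32 - start);
    mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}
```

For example, demo_deposit32(0, DEMO_EPNUM_SHIFT, DEMO_EPNUM_LENGTH, 5) yields 0x2800, and extracting the same field from that value gives 5 back.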

> +static void dwc2_sysbus_realize(DeviceState *dev, Error **errp)
> +{
> +SysBusDevice *d = SYS_BUS_DEVICE(dev);
> +DWC2State *s = DWC2_USB(dev);
> +
> +s->glbregbase = 0;
> +s->fszregbase = 0x0100;
> +s->hreg0base = 0x0400;
> +s->hreg1base = 0x0500;
> +s->pcgregbase = 0x0e00;
> +s->hreg2base = 0x1000;
> +s->portnr = NB_PORTS;
> +s->as = &address_space_memory;

Ideally this should be a device property. (hw/dma/pl080.c
has an example of how to declare a TYPE_MEMORY_REGION
property and then create an AddressSpace from it in
the realize method. hw/arm/versatilepb.c and hw/arm/mps2-tz.c
show the other end, using object_property_set_link() to pass
the appropriate MemoryRegion to the device before realizing it.)

> +
> +dwc2_realize(s, dev, errp);

Why have you divided the realize function up into
dwc2_sysbus_realize() and dwc2_realize() and
dwc2_init()? The usual expectation would be that
there is (if you need it) an instance_init called
dwc2_init() and a realize called dwc2_realize(),
so using these names for functions that are just
called from the realize method is a bit confusing.
object_property_set_link(OBJECT(dev), OBJECT(sysmem), "downstream",
 &error_fatal);

> +dwc2_init(s, dev);
> +sysbus_init_irq(d, &s->irq);
> +sysbus_init_mmio(d, &s->mem);
> +}
> +
> +static void dwc2_class_init(ObjectClass *klass, void *data)
> +{
> +DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +dc->realize = dwc2_sysbus_realize;
> +dc->reset = dwc2_sysbus_reset;
> +set_bit(DEVICE_CATEGORY_USB, dc->categories);

Could you provide a VMStateDescription for dc->vmsd, please?

> +}
> +
> +static const TypeInfo dwc2_usb_type_info = {
> +.name  = TYPE_DWC2_USB,
> +.parent= TYPE_SYS_BUS_DEVICE,
> +.instance_size = sizeof(DWC2State),
> +.class_init= dwc2_class_init,
> +};
> +
> +static void dwc2_usb_register_types(void)
> +{
> +type_register_static(&dwc2_usb_type_info);
> +}

thanks
-- PMM

[PATCH 1/2] qom: Factor out user_creatable_add_dict()

2020-04-16 Thread Kevin Wolf

The QMP handler qmp_object_add() and the implementation of --object in
qemu-storage-daemon can share most of the code. Currently,
qemu-storage-daemon calls qmp_object_add(), but this is not correct
because different visitors need to be used.

As a first step towards a fix, make qmp_object_add() a wrapper around a
new function user_creatable_add_dict() that can get an additional
parameter. The handling of "props" is only required for compatibility
and not required for the qemu-storage-daemon command line, so it stays
in qmp_object_add().

Signed-off-by: Kevin Wolf 
---
 include/qom/object_interfaces.h | 12 
 qom/object_interfaces.c | 27 +++
 qom/qom-qmp-cmds.c  | 24 +---
 3 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/include/qom/object_interfaces.h b/include/qom/object_interfaces.h
index 6f92f3cebb..a0037968a4 100644
--- a/include/qom/object_interfaces.h
+++ b/include/qom/object_interfaces.h
@@ -87,6 +87,18 @@ Object *user_creatable_add_type(const char *type, const char *id,
 const QDict *qdict,
 Visitor *v, Error **errp);
 
+/**
+ * user_creatable_add_dict:
+ * @qdict: the object definition
+ * @errp: if an error occurs, a pointer to an area to store the error
+ *
+ * Create an instance of the user creatable object that is defined by
+ * @qdict.  The object type is taken from the QDict key 'qom-type', its
+ * ID from the key 'id'. The remaining entries in @qdict are used to
+ * initialize the object properties.
+ */
+void user_creatable_add_dict(QDict *qdict, Error **errp);
+
 /**
  * user_creatable_add_opts:
  * @opts: the object definition
diff --git a/qom/object_interfaces.c b/qom/object_interfaces.c
index 72cb9e32a9..739e3e5172 100644
--- a/qom/object_interfaces.c
+++ b/qom/object_interfaces.c
@@ -6,6 +6,7 @@
 #include "qapi/qmp/qerror.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qstring.h"
+#include "qapi/qobject-input-visitor.h"
 #include "qom/object_interfaces.h"
 #include "qemu/help_option.h"
 #include "qemu/module.h"
@@ -105,6 +106,32 @@ out:
 return obj;
 }
 
+void user_creatable_add_dict(QDict *qdict, Error **errp)
+{
+Visitor *v;
+Object *obj;
+g_autofree char *type = NULL;
+g_autofree char *id = NULL;
+
+type = g_strdup(qdict_get_try_str(qdict, "qom-type"));
+if (!type) {
+error_setg(errp, QERR_MISSING_PARAMETER, "qom-type");
+return;
+}
+qdict_del(qdict, "qom-type");
+
+id = g_strdup(qdict_get_try_str(qdict, "id"));
+if (!id) {
+error_setg(errp, QERR_MISSING_PARAMETER, "id");
+return;
+}
+qdict_del(qdict, "id");
+
+v = qobject_input_visitor_new(QOBJECT(qdict));
+obj = user_creatable_add_type(type, id, qdict, v, errp);
+visit_free(v);
+object_unref(obj);
+}
 
 Object *user_creatable_add_opts(QemuOpts *opts, Error **errp)
 {
diff --git a/qom/qom-qmp-cmds.c b/qom/qom-qmp-cmds.c
index e47ebe8ed1..35db44b50e 100644
--- a/qom/qom-qmp-cmds.c
+++ b/qom/qom-qmp-cmds.c
@@ -21,7 +21,6 @@
 #include "qapi/qapi-commands-qom.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qerror.h"
-#include "qapi/qobject-input-visitor.h"
 #include "qemu/cutils.h"
 #include "qom/object_interfaces.h"
 #include "qom/qom-qobject.h"
@@ -245,24 +244,6 @@ void qmp_object_add(QDict *qdict, QObject **ret_data, Error **errp)
 {
 QObject *props;
 QDict *pdict;
-Visitor *v;
-Object *obj;
-g_autofree char *type = NULL;
-g_autofree char *id = NULL;
-
-type = g_strdup(qdict_get_try_str(qdict, "qom-type"));
-if (!type) {
-error_setg(errp, QERR_MISSING_PARAMETER, "qom-type");
-return;
-}
-qdict_del(qdict, "qom-type");
-
-id = g_strdup(qdict_get_try_str(qdict, "id"));
-if (!id) {
-error_setg(errp, QERR_MISSING_PARAMETER, "id");
-return;
-}
-qdict_del(qdict, "id");
 
 props = qdict_get(qdict, "props");
 if (props) {
@@ -282,10 +263,7 @@ void qmp_object_add(QDict *qdict, QObject **ret_data, Error **errp)
 qobject_unref(pdict);
 }
 
-v = qobject_input_visitor_new(QOBJECT(qdict));
-obj = user_creatable_add_type(type, id, qdict, v, errp);
-visit_free(v);
-object_unref(obj);
+user_creatable_add_dict(qdict, errp);
 }
 
 void qmp_object_del(const char *id, Error **errp)
-- 
2.20.1

[PATCH 2/2] qemu-storage-daemon: Fix non-string --object properties

2020-04-16 Thread Kevin Wolf

After processing the option string with the keyval parser, we get a
QDict that contains only strings. This QDict must be fed to a keyval
visitor which converts the strings into the right data types.

qmp_object_add(), however, uses the normal QObject input visitor, which
expects a QDict where all properties already have the QType that matches
the data type required by the QOM object type.

Change the --object implementation in qemu-storage-daemon so that it
doesn't call qmp_object_add(), but calls user_creatable_add_dict()
directly instead and pass it a new keyval boolean that decides which
visitor must be used.
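To make the difference between the two visitors concrete: with keyval input every value arrives as a string, so something has to turn "on" into a boolean or "4096" into an integer before the property setter sees it. The sketch below shows only that conversion step. It is not QEMU's visitor code; the demo_ helpers are hypothetical and the accepted spellings ("on"/"off", plain decimal digits) are an assumption made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Convert a keyval-style string to a bool; returns false if not recognised. */
static bool demo_parse_bool(const char *s, bool *out)
{
    assert(s != NULL && out != NULL);
    if (strcmp(s, "on") == 0) {
        *out = true;
        return true;
    }
    if (strcmp(s, "off") == 0) {
        *out = false;
        return true;
    }
    return false;
}

/* Convert a decimal string to uint64_t; rejects signs, blanks and suffixes. */
static bool demo_parse_u64(const char *s, uint64_t *out)
{
    char *end;
    unsigned long long v;

    assert(s != NULL && out != NULL);
    if (!isdigit((unsigned char)*s)) {
        return false;
    }
    errno = 0;
    v = strtoull(s, &end, 10);
    if (errno != 0 || *end != '\0') {
        return false;
    }
    *out = v;
    return true;
}
```

A visitor that expects already-typed values would skip this step, which is why feeding it a dictionary of strings fails for any non-string property.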

Reported-by: Coiby Xu 
Signed-off-by: Kevin Wolf 
---
 include/qom/object_interfaces.h | 6 +-
 qemu-storage-daemon.c   | 4 +---
 qom/object_interfaces.c | 8 ++--
 qom/qom-qmp-cmds.c  | 2 +-
 4 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/include/qom/object_interfaces.h b/include/qom/object_interfaces.h
index a0037968a4..65172120fa 100644
--- a/include/qom/object_interfaces.h
+++ b/include/qom/object_interfaces.h
@@ -90,6 +90,10 @@ Object *user_creatable_add_type(const char *type, const char *id,
 /**
  * user_creatable_add_dict:
  * @qdict: the object definition
+ * @keyval: if true, use a keyval visitor for processing @qdict (i.e.
+ *  assume that all @qdict values are strings); otherwise, use
+ *  the normal QObject visitor (i.e. assume all @qdict values
+ *  have the QType expected by the QOM object type)
  * @errp: if an error occurs, a pointer to an area to store the error
  *
  * Create an instance of the user creatable object that is defined by
@@ -97,7 +101,7 @@ Object *user_creatable_add_type(const char *type, const char *id,
  * ID from the key 'id'. The remaining entries in @qdict are used to
  * initialize the object properties.
  */
-void user_creatable_add_dict(QDict *qdict, Error **errp);
+void user_creatable_add_dict(QDict *qdict, bool keyval, Error **errp);
 
 /**
  * user_creatable_add_opts:
diff --git a/qemu-storage-daemon.c b/qemu-storage-daemon.c
index dd128978cc..9e7adfe3a6 100644
--- a/qemu-storage-daemon.c
+++ b/qemu-storage-daemon.c
@@ -278,7 +278,6 @@ static void process_options(int argc, char *argv[])
 QemuOpts *opts;
 const char *type;
 QDict *args;
-QObject *ret_data = NULL;
 
 /* FIXME The keyval parser rejects 'help' arguments, so we must
  * unconditionall try QemuOpts first. */
@@ -291,9 +290,8 @@ static void process_options(int argc, char *argv[])
 qemu_opts_del(opts);
 
 args = keyval_parse(optarg, "qom-type", &error_fatal);
-qmp_object_add(args, &ret_data, &error_fatal);
+user_creatable_add_dict(args, true, &error_fatal);
 qobject_unref(args);
-qobject_unref(ret_data);
 break;
 }
 default:
diff --git a/qom/object_interfaces.c b/qom/object_interfaces.c
index 739e3e5172..bc36f96e47 100644
--- a/qom/object_interfaces.c
+++ b/qom/object_interfaces.c
@@ -106,7 +106,7 @@ out:
 return obj;
 }
 
-void user_creatable_add_dict(QDict *qdict, Error **errp)
+void user_creatable_add_dict(QDict *qdict, bool keyval, Error **errp)
 {
 Visitor *v;
 Object *obj;
@@ -127,7 +127,11 @@ void user_creatable_add_dict(QDict *qdict, Error **errp)
 }
 qdict_del(qdict, "id");
 
-v = qobject_input_visitor_new(QOBJECT(qdict));
+if (keyval) {
+v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+} else {
+v = qobject_input_visitor_new(QOBJECT(qdict));
+}
 obj = user_creatable_add_type(type, id, qdict, v, errp);
 visit_free(v);
 object_unref(obj);
diff --git a/qom/qom-qmp-cmds.c b/qom/qom-qmp-cmds.c
index 35db44b50e..c5249e44d0 100644
--- a/qom/qom-qmp-cmds.c
+++ b/qom/qom-qmp-cmds.c
@@ -263,7 +263,7 @@ void qmp_object_add(QDict *qdict, QObject **ret_data, Error **errp)
 qobject_unref(pdict);
 }
 
-user_creatable_add_dict(qdict, errp);
+user_creatable_add_dict(qdict, false, errp);
 }
 
 void qmp_object_del(const char *id, Error **errp)
-- 
2.20.1

[PATCH 0/2] qemu-storage-daemon: Fix non-string --object properties

2020-04-16 Thread Kevin Wolf

Kevin Wolf (2):
  qom: Factor out user_creatable_add_dict()
  qemu-storage-daemon: Fix non-string --object properties

 include/qom/object_interfaces.h | 16 
 qemu-storage-daemon.c   |  4 +---
 qom/object_interfaces.c | 31 +++
 qom/qom-qmp-cmds.c  | 24 +---
 4 files changed, 49 insertions(+), 26 deletions(-)

-- 
2.20.1

Re: [PATCH v2 8/8] hw/arm/fsl-imx7: Connect watchdog interrupts

2020-04-16 Thread Peter Maydell

On Sun, 22 Mar 2020 at 21:19, Guenter Roeck  wrote:
>
> i.MX7 supports watchdog pretimeout interrupts. With this commit,
> the watchdog in mcimx7d-sabre is fully operational, including
> pretimeout support.
>
> Signed-off-by: Guenter Roeck 

> diff --git a/include/hw/arm/fsl-imx7.h b/include/hw/arm/fsl-imx7.h
> index 47826da2b7..da977f9ffb 100644
> --- a/include/hw/arm/fsl-imx7.h
> +++ b/include/hw/arm/fsl-imx7.h
> @@ -228,6 +228,11 @@ enum FslIMX7IRQs {
>  FSL_IMX7_USB2_IRQ = 42,
>  FSL_IMX7_USB3_IRQ = 40,
>
> +FSL_IMX7_WDOG1_IRQ= 78,
> +FSL_IMX7_WDOG2_IRQ= 79,
> +FSL_IMX7_WDOG3_IRQ= 10,
> +FSL_IMX7_WDOG4_IRQ= 109,

irq 10 for wdog3 seems to match the kernel's dts, but it's
a bit weird that it's way out of the range of the others.
Did you sanity check it against the imx7 data sheet and/or
real h/w behaviour that it's not a typo for
one-hundred-and-something? (108 would be the obvious guess...)

Otherwise
Reviewed-by: Peter Maydell 

thanks
-- PMM

[PULL 1/1] linux-user/syscall.c: add target-to-host mapping for epoll_create1()

2020-04-16 Thread Laurent Vivier

From: Sergei Trofimovich 

Noticed by Barnabás Virágh as a python-3.7 failure on qemu-alpha.

The bug shows up on alpha as it's one of the targets where
EPOLL_CLOEXEC differs from other targets:
sysdeps/unix/sysv/linux/alpha/bits/epoll.h: EPOLL_CLOEXEC  = 0100
sysdeps/unix/sysv/linux/bits/epoll.h:EPOLL_CLOEXEC = 0200
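The kind of table-driven translation this relies on can be sketched in a few lines. The code below is a simplified stand-in, not the linux-user implementation: the struct, the demo_ names and the two flag values are illustrative assumptions, chosen only so that the target and host encodings differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative encodings of the same flag on target and host. */
#define DEMO_TARGET_CLOEXEC 0x200000u
#define DEMO_HOST_CLOEXEC   0x080000u

/* One row: if (flags & target_mask) == target_bits, set host_bits. */
struct demo_flag_map {
    uint32_t target_mask;
    uint32_t target_bits;
    uint32_t host_bits;
};

static const struct demo_flag_map demo_tbl[] = {
    { DEMO_TARGET_CLOEXEC, DEMO_TARGET_CLOEXEC, DEMO_HOST_CLOEXEC },
};

/* Translate a target flag word into the host encoding using the table. */
static uint32_t demo_target_to_host(uint32_t target_flags,
                                    const struct demo_flag_map *tbl, size_t n)
{
    uint32_t host_flags = 0;

    assert(tbl != NULL);
    for (size_t i = 0; i < n; i++) {
        if ((target_flags & tbl[i].target_mask) == tbl[i].target_bits) {
            host_flags |= tbl[i].host_bits;
        }
    }
    return host_flags;
}
```

Passing the target value straight through, as the old code did, only works when both encodings happen to be identical; here the translated value differs from the input.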

Bug: https://bugs.gentoo.org/717548
Reported-by: Barnabás Virágh
Signed-off-by: Sergei Trofimovich 
CC: Riku Voipio 
CC: Laurent Vivier 
Reviewed-by: Laurent Vivier 
Message-Id: <20200415220508.5044-1-sly...@gentoo.org>
Signed-off-by: Laurent Vivier 
---
 linux-user/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 674f70e70a56..05f03919ff07 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -12012,7 +12012,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #endif
 #if defined(TARGET_NR_epoll_create1) && defined(CONFIG_EPOLL_CREATE1)
 case TARGET_NR_epoll_create1:
-return get_errno(epoll_create1(arg1));
+return get_errno(epoll_create1(target_to_host_bitmask(arg1, fcntl_flags_tbl)));
 #endif
 #if defined(TARGET_NR_epoll_ctl)
 case TARGET_NR_epoll_ctl:
-- 
2.25.2

[PULL 0/1] Linux user for 5.0 patches

2020-04-16 Thread Laurent Vivier

The following changes since commit 20038cd7a8412feeb49c01f6ede89e36c8995472:

  Update version for v5.0.0-rc3 release (2020-04-15 20:51:54 +0100)

are available in the Git repository at:

  git://github.com/vivier/qemu.git tags/linux-user-for-5.0-pull-request

for you to fetch changes up to 386d38656889a40d29b514ee6f34997ca18f741e:

  linux-user/syscall.c: add target-to-host mapping for epoll_create1() (2020-04-16 09:24:22 +0200)


Fix epoll_create1() for qemu-alpha



Sergei Trofimovich (1):
  linux-user/syscall.c: add target-to-host mapping for epoll_create1()

 linux-user/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.25.2

Re: [PATCH v2 7/8] hw/arm/fsl-imx7: Instantiate various unimplemented devices

2020-04-16 Thread Peter Maydell

On Sun, 22 Mar 2020 at 21:19, Guenter Roeck  wrote:
>
> Instantiating PWM, CAN, CAAM, and OCOTP devices is necessary to avoid
> crashes when booting mainline Linux.
>
> Signed-off-by: Guenter Roeck 

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [PATCH v2 2/8] hw/watchdog: Implement full i.MX watchdog support

2020-04-16 Thread Peter Maydell

On Sun, 22 Mar 2020 at 21:19, Guenter Roeck  wrote:
>
> Implement full support for the watchdog in i.MX systems.
> Pretimeout support is optional because the watchdog hardware on i.MX31
> does not support pretimeouts.
>
> Signed-off-by: Guenter Roeck 
> ---
> v2: Fixup of CONFIG_WDT_IMX -> CONFIG_WDT_IMX2 moved to patch 1/8

Sorry for not getting to this earlier, I've been focusing on
work for the 5.0 release. Some comments below, but overall
this looks pretty good.

>
>  hw/watchdog/wdt_imx2.c | 196 +++--
>  include/hw/watchdog/wdt_imx2.h |  49 -
>  2 files changed, 231 insertions(+), 14 deletions(-)
>
> diff --git a/hw/watchdog/wdt_imx2.c b/hw/watchdog/wdt_imx2.c
> index ad1ef02e9e..f5339f3590 100644
> --- a/hw/watchdog/wdt_imx2.c
> +++ b/hw/watchdog/wdt_imx2.c
> @@ -13,24 +13,157 @@
>  #include "qemu/bitops.h"
>  #include "qemu/module.h"
>  #include "sysemu/watchdog.h"
> +#include "migration/vmstate.h"
> +#include "hw/qdev-properties.h"
>
>  #include "hw/watchdog/wdt_imx2.h"
>
> -#define IMX2_WDT_WCR_WDA    BIT(5)  /* -> External Reset WDOG_B */
> -#define IMX2_WDT_WCR_SRS    BIT(4)  /* -> Software Reset Signal */
> +static void imx2_wdt_interrupt(void *opaque)
> +{
> +IMX2WdtState *s = IMX2_WDT(opaque);
> +
> +s->wicr |= IMX2_WDT_WICR_WTIS;
> +qemu_set_irq(s->irq, 1);
> +}
>
> -static uint64_t imx2_wdt_read(void *opaque, hwaddr addr,
> -  unsigned int size)
> +static void imx2_wdt_expired(void *opaque)
>  {
> +IMX2WdtState *s = IMX2_WDT(opaque);
> +
> +s->wrsr = IMX2_WDT_WRSR_TOUT;
> +
> +/* Perform watchdog action if watchdog is enabled */
> +if (s->wcr & IMX2_WDT_WCR_WDE) {
> +watchdog_perform_action();
> +}
> +}
> +
> +static void imx2_wdt_reset(DeviceState *dev)
> +{
> +IMX2WdtState *s = IMX2_WDT(dev);
> +
> +s->wcr = IMX2_WDT_WCR_WDA | IMX2_WDT_WCR_SRS;
> +s->wsr = 0;
> +s->wrsr &= ~(IMX2_WDT_WRSR_TOUT | IMX2_WDT_WRSR_SFTW);
> +s->wicr = 4;
> +s->wmcr = IMX2_WDT_WMCR_PDE;

Your reset function probably also needs to ptimer_stop()
the timers or otherwise put them into whatever is the
correct state for the device-as-reset.

> +}

> +
>  static void imx2_wdt_write(void *opaque, hwaddr addr,
> uint64_t value, unsigned int size)
>  {
> -if (addr == IMX2_WDT_WCR &&
> -(~value & (IMX2_WDT_WCR_WDA | IMX2_WDT_WCR_SRS))) {
> -watchdog_perform_action();
> +IMX2WdtState *s = IMX2_WDT(opaque);
> +
> +switch (addr) {
> +case IMX2_WDT_WCR:
> +s->wcr = value;
> +if (!(value & IMX2_WDT_WCR_SRS)) {
> +s->wrsr = IMX2_WDT_WRSR_SFTW;
> +}
> +if (!(value & (IMX2_WDT_WCR_WDA | IMX2_WDT_WCR_SRS)) ||
> +(!(value & IMX2_WDT_WCR_WT) && (value & IMX2_WDT_WCR_WDE))) {
> +watchdog_perform_action();
> +}
> +s->wcr |= IMX2_WDT_WCR_SRS;
> +imx_wdt2_update_timer(s, true);
> +break;
> +case IMX2_WDT_WSR:
> +if (s->wsr == IMX2_WDT_SEQ1 && value == IMX2_WDT_SEQ2) {
> +imx_wdt2_update_timer(s, false);
> +}
> +s->wsr = value;
> +break;
> +case IMX2_WDT_WRSR:
> +break;
> +case IMX2_WDT_WICR:
> +if (!s->pretimeout_support) {
> +return;
> +}
> +/* The pretimeout value is write-once */

My imx6 manual says that the WICR WIE bit is also write-once,
so I think that changes to it should also be guarded under
!pretimeout_locked, like the WICT bits.

(In fact quite a lot of registers seem to have write-once bits.)
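A rough model of what I mean, with both write-once fields behind the same lock and the status bit handled as write-1-to-clear. Everything here is a sketch: the demo_ names are invented and the bit positions are assumptions for the example, so check them against the reference manual before borrowing anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_WICR_WIE   0x8000u  /* interrupt enable, write-once */
#define DEMO_WICR_WTIS  0x4000u  /* interrupt status, write-1-to-clear */
#define DEMO_WICR_WICT  0x00ffu  /* pretimeout count, write-once */

struct demo_wdt {
    uint16_t wicr;
    bool locked;   /* set by the first write; later writes keep WIE/WICT */
};

static void demo_wicr_write(struct demo_wdt *s, uint16_t value)
{
    const uint16_t once = DEMO_WICR_WIE | DEMO_WICR_WICT;

    assert(s != NULL);
    if (!s->locked) {
        /* First write: take the write-once fields from the guest value. */
        s->wicr = (uint16_t)((s->wicr & ~once) | (value & once));
        s->locked = true;
    }
    if (value & DEMO_WICR_WTIS) {
        /* Writing 1 to the status bit acknowledges the interrupt. */
        s->wicr = (uint16_t)(s->wicr & ~DEMO_WICR_WTIS);
    }
}
```

After the first call the enable bit and the count stay fixed, while a later write with the status bit set still clears a pending interrupt.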

> +if (s->pretimeout_locked) {
> +value &= ~IMX2_WDT_WICR_WICT;
> +s->wicr &= (IMX2_WDT_WICR_WTIS | IMX2_WDT_WICR_WICT);
> +} else {
> +s->wicr &= IMX2_WDT_WICR_WTIS;
> +}
> +s->wicr |= value & (IMX2_WDT_WICR_WIE | IMX2_WDT_WICR_WICT);
> +if (value & IMX2_WDT_WICR_WTIS) {
> +s->wicr &= ~IMX2_WDT_WICR_WTIS;
> +qemu_set_irq(s->irq, 0);
> +}
> +imx_wdt2_update_itimer(s, true);
> +s->pretimeout_locked = true;
> +break;
> +case IMX2_WDT_WMCR:
> +s->wmcr = value & IMX2_WDT_WMCR_PDE;
> +break;
>  }
>  }

>  static void imx2_wdt_realize(DeviceState *dev, Error **errp)
>  {
>  IMX2WdtState *s = IMX2_WDT(dev);
> +SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
>
>  memory_region_init_io(&s->mmio, OBJECT(dev),
>&imx2_wdt_ops, s,
> -  TYPE_IMX2_WDT".mmio",
> -  IMX2_WDT_REG_NUM * sizeof(uint16_t));
> -sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->mmio);
> +  TYPE_IMX2_WDT,
> +  IMX2_WDT_MMIO_SIZE);
> +sysbus_init_mmio(sbd, &s->mmio);
> +sysbus_init_irq(sbd, &s->irq);
> +
> +s->timer = ptimer_init(imx2_wdt_expired, s, PTIMER_POLICY_DEFAULT);

PTIMER_POLICY_DEFAULT is almost

Re: [PATCH] 9pfs: local: ignore O_NOATIME if we don't have permissions

2020-04-16 Thread Christian Schoenebeck

On Thursday, 16 April 2020 02:44:33 CEST Omar Sandoval wrote:
> From: Omar Sandoval 
> 
> QEMU's local 9pfs server passes through O_NOATIME from the client. If
> the QEMU process doesn't have permissions to use O_NOATIME (namely, it
> does not own the file nor have the CAP_FOWNER capability), the open will
> fail. This causes issues when from the client's point of view, it
> believes it has permissions to use O_NOATIME (e.g., a process running as
> root in the virtual machine). Additionally, overlayfs on Linux opens
> files on the lower layer using O_NOATIME, so in this case a 9pfs mount
> can't be used as a lower layer for overlayfs (cf.
> https://github.com/osandov/drgn/blob/dabfe1971951701da13863dbe6d8a1d172ad9650/vmtest/onoatimehack.c
> and https://github.com/NixOS/nixpkgs/issues/54509).
> 
> Luckily, O_NOATIME is effectively a hint, and is often ignored by, e.g.,
> network filesystems. open(2) notes that O_NOATIME "may not be effective
> on all filesystems. One example is NFS, where the server maintains the
> access time." This means that we can honor it when possible but fall
> back to ignoring it.

I am not sure whether NFS would simply silently ignore O_NOATIME i.e. without 
returning EPERM. I don't read it that way. Fact is on Linux the expected 
behaviour is returning EPERM if O_NOATIME cannot be satisfied, consistent 
since its introduction 22 years ago:
http://lkml.iu.edu/hypermail/linux/kernel/9811.2/0118.html

> Signed-off-by: Omar Sandoval 
> ---
>  hw/9pfs/9p-util.h | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/hw/9pfs/9p-util.h b/hw/9pfs/9p-util.h
> index 79ed6b233e..50842d540f 100644
> --- a/hw/9pfs/9p-util.h
> +++ b/hw/9pfs/9p-util.h
> @@ -37,9 +37,14 @@ static inline int openat_file(int dirfd, const char
> *name, int flags, {
>  int fd, serrno, ret;
> 
> +again:
>  fd = openat(dirfd, name, flags | O_NOFOLLOW | O_NOCTTY | O_NONBLOCK,
>  mode);
>  if (fd == -1) {
> +if (errno == EPERM && (flags & O_NOATIME)) {
> +flags &= ~O_NOATIME;
> +goto again;
> +}
>  return -1;
>  }

It would certainly fix the problem in your use case. But it would also unmask 
O_NOATIME for all other ones (i.e. regular users on guest).

I mean I understand your point, but I also have to take other use cases into 
account which might expect to receive EPERM if O_NOATIME cannot be granted.
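For the sake of discussion, here is the proposed behaviour as a standalone function, using plain open(2) instead of the openat() wrapper in the patch. It is a sketch under assumptions: demo_open_noatime_fallback is a made-up name, and the fallback definition of O_NOATIME is the generic Linux value, included only so the snippet builds when _GNU_SOURCE takes no effect.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for O_NOATIME with glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#ifndef O_NOATIME
#define O_NOATIME 01000000  /* assumption: generic Linux value */
#endif

/*
 * Try to open with the caller's flags. If the kernel refuses O_NOATIME
 * with EPERM, retry once without it. Any other error is returned as is.
 */
static int demo_open_noatime_fallback(const char *path, int flags)
{
    int fd;

    assert(path != NULL);
    fd = open(path, flags);
    if (fd == -1 && errno == EPERM && (flags & O_NOATIME)) {
        fd = open(path, flags & ~O_NOATIME);
    }
    return fd;
}
```

Note that the caller cannot tell from the returned descriptor whether O_NOATIME was honoured, which is exactly the behaviour change I am concerned about.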

May I ask why the file/dir in question does not share the same uid in
your specific use case? Are the files created outside of QEMU, i.e. directly
by some app on the host?

Best regards,
Christian Schoenebeck

Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature

2020-04-16 Thread David Hildenbrand

>> We should document our result of page poisoning, free page hinting, and
>> free page reporting there as well. I hope you'll have time for the latter.
>>
>> -
>> Semantics of VIRTIO_BALLOON_F_PAGE_POISON
>> -
>>
>> "The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
>> guest is using page poisoning. Guest writes to the poison_val config
>> field to tell host about the page poisoning value that is in use."
>> -> Very little information, no signs about what has to be done.
> 
> I think it's an informational field. Knowing that free pages
> are full of a specific pattern can be handy for the hypervisor
> for a variety of reasons. E.g. compression/deduplication?

I was referring to the documentation of the feature and what we
(hypervisor) are expected to do (in regards to inflation/deflation).

Yes, it might be valuable to know that the guest is using poisoning. I
assume compression/deduplication (IOW KSM) will figure out themselves
that such pages are equal.

>> "Let the hypervisor know that we are expecting a specific value to be
>> written back in balloon pages."
> 
> 
> 
>> -> Okay, that talks about "balloon pages", which would include right now
>> -- pages "inflated" and then "deflated" using free page hinting
>> -- pages "inflated" and then "deflated" using ordinary inflate/deflate
>>queue
> 
> ATM, in this case driver calls "free" and that fills page with the
> poison value.

Yes, that's what I mentioned somewhere, it's currently done by Linux and ...

> 
> It might be a valid optimization to allow driver to skip
> poisoning of freed pages in this case.

... we should prepare for that :)

> 
>> And I would add
>>
>> "However, if the inflated page was not filled with "poison_val" when
>> inflating, it's not predictable if the original page or a page filled
>> with "poison_val" is returned."
>>
>> Which would cover the "we did not discard the page in the hypervisor, so
>> the original page is still there".
>>
>>
>> We should also document what is expected to happen if "poison_val" is
>> suddenly changed by the guest at one point in time again. (e.g., not
>> supported, unexpected things can happen, etc.)
> 
> Right. I think we should require that this can only be changed
> before features have been negotiated.
> That is the only point where hypervisor can still fail
> gracefully (i.e. fail FEATURES_OK).

Agreed.
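
To make the proposed rule concrete, here is a rough sketch, assuming a late
write to poison_val is simply rejected and the old value kept. ToyBalloon and
toy_balloon_set_poison_val() are made-up names for illustration, not the
virtio-balloon implementation, and the spec wording is still to be decided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy device state; field names are illustrative only. */
typedef struct {
    bool features_ok;       /* set once feature negotiation has completed */
    uint32_t poison_val;    /* poison pattern reported by the guest */
} ToyBalloon;

/* Accept a config write to poison_val only before FEATURES_OK. */
static bool toy_balloon_set_poison_val(ToyBalloon *b, uint32_t val)
{
    assert(b != NULL);
    if (b->features_ok) {
        return false;       /* too late: keep the negotiated value */
    }
    b->poison_val = val;
    return true;
}
```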

I can totally understand if Alex would want to stop working on
VIRTIO_BALLOON_F_PAGE_POISON at this point and only fix the guest to not
enable free page reporting in case we don't have
VIRTIO_BALLOON_F_PAGE_POISON (unless that's already done), lol. :)

-- 
Thanks,

David / dhildenb

[Qemu devel PATCH v6 1/3] hw/net: Add Smartfusion2 emac block

2020-04-16 Thread sundeep . lkml

From: Subbaraya Sundeep 

Model the Ethernet MAC of the SmartFusion2 SoC.
The Micrel KSZ8051 PHY is present on Emcraft's
SOM kit, hence the same PHY is emulated.

Signed-off-by: Subbaraya Sundeep 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
---
 MAINTAINERS|   2 +
 hw/net/Makefile.objs   |   1 +
 hw/net/msf2-emac.c | 589 +
 include/hw/net/msf2-emac.h |  53 
 4 files changed, 645 insertions(+)
 create mode 100644 hw/net/msf2-emac.c
 create mode 100644 include/hw/net/msf2-emac.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 8cbc1fa..cea5733 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -919,6 +919,8 @@ F: include/hw/arm/msf2-soc.h
 F: include/hw/misc/msf2-sysreg.h
 F: include/hw/timer/mss-timer.h
 F: include/hw/ssi/mss-spi.h
+F: hw/net/msf2-emac.c
+F: include/hw/net/msf2-emac.h
 
 Emcraft M2S-FG484
 M: Subbaraya Sundeep 
diff --git a/hw/net/Makefile.objs b/hw/net/Makefile.objs
index af4d194..f2b7398 100644
--- a/hw/net/Makefile.objs
+++ b/hw/net/Makefile.objs
@@ -55,3 +55,4 @@ common-obj-$(CONFIG_ROCKER) += rocker/rocker.o rocker/rocker_fp.o \
 obj-$(call lnot,$(CONFIG_ROCKER)) += rocker/qmp-norocker.o
 
 common-obj-$(CONFIG_CAN_BUS) += can/
+common-obj-$(CONFIG_MSF2) += msf2-emac.o
diff --git a/hw/net/msf2-emac.c b/hw/net/msf2-emac.c
new file mode 100644
index 000..32ba9e8
--- /dev/null
+++ b/hw/net/msf2-emac.c
@@ -0,0 +1,589 @@
+/*
+ * QEMU model of the Smartfusion2 Ethernet MAC.
+ *
+ * Copyright (c) 2020 Subbaraya Sundeep .
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ *
+ * Refer to section Ethernet MAC in the document:
+ * UG0331: SmartFusion2 Microcontroller Subsystem User Guide
+ * Datasheet URL:
+ * https://www.microsemi.com/document-portal/cat_view/56661-internal-documents/
+ * 56758-soc?lang=en=20=220
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "exec/address-spaces.h"
+#include "hw/registerfields.h"
+#include "hw/net/msf2-emac.h"
+#include "hw/net/mii.h"
+#include "hw/irq.h"
+#include "hw/qdev-properties.h"
+#include "migration/vmstate.h"
+
+REG32(CFG1, 0x0)
+FIELD(CFG1, RESET, 31, 1)
+FIELD(CFG1, RX_EN, 2, 1)
+FIELD(CFG1, TX_EN, 0, 1)
+FIELD(CFG1, LB_EN, 8, 1)
+REG32(CFG2, 0x4)
+REG32(IFG, 0x8)
+REG32(HALF_DUPLEX, 0xc)
+REG32(MAX_FRAME_LENGTH, 0x10)
+REG32(MII_CMD, 0x24)
+FIELD(MII_CMD, READ, 0, 1)
+REG32(MII_ADDR, 0x28)
+FIELD(MII_ADDR, REGADDR, 0, 5)
+FIELD(MII_ADDR, PHYADDR, 8, 5)
+REG32(MII_CTL, 0x2c)
+REG32(MII_STS, 0x30)
+REG32(STA1, 0x40)
+REG32(STA2, 0x44)
+REG32(FIFO_CFG0, 0x48)
+REG32(FIFO_CFG4, 0x58)
+FIELD(FIFO_CFG4, BCAST, 9, 1)
+FIELD(FIFO_CFG4, MCAST, 8, 1)
+REG32(FIFO_CFG5, 0x5C)
+FIELD(FIFO_CFG5, BCAST, 9, 1)
+FIELD(FIFO_CFG5, MCAST, 8, 1)
+REG32(DMA_TX_CTL, 0x180)
+FIELD(DMA_TX_CTL, EN, 0, 1)
+REG32(DMA_TX_DESC, 0x184)
+REG32(DMA_TX_STATUS, 0x188)
+FIELD(DMA_TX_STATUS, PKTCNT, 16, 8)
+FIELD(DMA_TX_STATUS, UNDERRUN, 1, 1)
+FIELD(DMA_TX_STATUS, PKT_SENT, 0, 1)
+REG32(DMA_RX_CTL, 0x18c)
+FIELD(DMA_RX_CTL, EN, 0, 1)
+REG32(DMA_RX_DESC, 0x190)
+REG32(DMA_RX_STATUS, 0x194)
+FIELD(DMA_RX_STATUS, PKTCNT, 16, 8)
+FIELD(DMA_RX_STATUS, OVERFLOW, 2, 1)
+FIELD(DMA_RX_STATUS, PKT_RCVD, 0, 1)
+REG32(DMA_IRQ_MASK, 0x198)
+REG32(DMA_IRQ, 0x19c)
+
+#define EMPTY_MASK  (1 << 31)
+#define PKT_SIZE0x7FF
+#define PHYADDR 0x1
+#define MAX_PKT_SIZE2048
+
+typedef struct {
+uint32_t pktaddr;
+uint32_t pktsize;
+uint32_t next;
+} EmacDesc;
+
+static uint32_t emac_get_isr(MSF2EmacState *s)
+{
+uint32_t ier = s->regs[R_DMA_IRQ_MASK];
+uint32_t tx = s->regs[R_DMA_TX_STATUS] & 0xF;
+uint32_t rx = s->regs[R_DMA_RX_STATUS] & 0xF;
+uint32_t isr = (rx << 4) | tx;
+
+s->regs[R_DMA_IRQ] = ier & isr;
+return s->regs[R_DMA_IRQ];
+}
+
+static void
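
As a reading aid for emac_get_isr() above, the same computation can be
restated as a standalone function. This is only a sketch: the names
toy_emac_pending_irqs() and toy_emac_example() are invented here, and the
bit positions used in the example are taken from the FIELD() definitions in
the patch (PKT_SENT is TX bit 0, OVERFLOW is RX bit 2, PKTCNT is bits 23:16).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Low nibble of the TX status goes to bits 3:0, low nibble of the RX
 * status to bits 7:4, and the result is gated by the interrupt mask.
 */
static uint32_t toy_emac_pending_irqs(uint32_t irq_mask,
                                      uint32_t tx_status,
                                      uint32_t rx_status)
{
    uint32_t tx = tx_status & 0xF;
    uint32_t rx = rx_status & 0xF;
    uint32_t isr = (rx << 4) | tx;

    return irq_mask & isr;
}

/* Worked example: PKT_SENT (TX bit 0) and OVERFLOW (RX bit 2) are set. */
void toy_emac_example(void)
{
    uint32_t tx_status = (3u << 16) | 0x1;  /* PKTCNT = 3, PKT_SENT = 1 */
    uint32_t rx_status = 0x4;               /* OVERFLOW = 1 */

    /* PKTCNT is masked off; TX lands in bit 0, RX overflow in bit 6. */
    assert(toy_emac_pending_irqs(0xFF, tx_status, rx_status) == 0x41);
    /* With only bit 0 unmasked, the RX overflow is not reported. */
    assert(toy_emac_pending_irqs(0x01, tx_status, rx_status) == 0x01);
}
```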

[Qemu devel PATCH v6 0/3] Add SmartFusion2 EMAC block

2020-04-16 Thread sundeep . lkml

From: Subbaraya Sundeep 

This patch set emulates the Ethernet MAC block
present in the Microsemi SmartFusion2 SoC.

v6:
 Fixed destination address matching logic
 Added missing break in emac_write
v5:
 As per Philippe comments:
Returned size in receive function
Added link property to pass DMA memory
Used FIELD() APIs
Added mac_addr in emac state
Used FIELD_EX32 and FIELD_DP32 APIs
Simplified if else logics in emac_write/read
v4:
  Added loop back as per Jason's comment 
v3:
  Added SmartFusion2 ethernet test to tests/acceptance
v2:
  No changes. Fixed Signed-off mail id in patch 2/2

Testing:

1. Download u-boot.bin, uImage and msf2-devkit.dtb from
   https://github.com/Subbaraya-Sundeep/qemu-test-binaries.git
2. Copy uImage and msf2-devkit.dtb to a suitable QEMU tftp directory
3. Launch QEMU with
   ./arm-softmmu/qemu-system-arm -M emcraft-sf2 -serial mon:stdio -kernel \
   u-boot.bin -display none -nic user,tftp=

Example:
./arm-softmmu/qemu-system-arm -M emcraft-sf2 -serial mon:stdio -kernel u-boot -display none -nic user,tftp=/home/hyd1358/qemu_tftp

U-Boot 2010.03-0-ga7695d6 (Apr 04 2020 - 15:07:27)

CPU  : SmartFusion2 SoC (Cortex-M3 Hard IP)
Freqs: CORTEX-M3=142MHz,PCLK0=71MHz,PCLK1=71MHz
Board: M2S-FG484-SOM Rev 1A, www.emcraft.com
DRAM:  64 MB
*** Warning - bad CRC, using default environment

In:serial
Out:   serial
Err:   serial
Net:   M2S_MAC

Hit any key to stop autoboot:  3  0 

M2S-FG484-SOM> run netboot
Using M2S_MAC device
TFTP from server 10.0.2.2; our IP address is 10.0.2.15
Filename 'uImage'.
Load address: 0xa0007fc0
Loading: *#
 #
 #
 ###
done
Bytes transferred = 3681568 (382d20 hex)
Using M2S_MAC device
TFTP from server 10.0.2.2; our IP address is 10.0.2.15
Filename 'msf2-devkit.dtb'.
Load address: 0xa200
Loading: *#
done
Bytes transferred = 1712 (6b0 hex)
## Booting kernel from Legacy Image at a0007fc0 ...
   Image Name:   
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:3681504 Bytes =  3.5 MB
   Load Address: a0008000
   Entry Point:  a0008001
   Verifying Checksum ... OK
   Loading Kernel Image ... OK
OK

Starting kernel ...

[0.00] Booting Linux on physical CPU 0x0
[0.00] Linux version 4.5.0-gb0e5502-dirty (hyd1358@hyd1358) (gcc version 4.4.1 (Sourcery G++ Lite 2010q1-189) ) #85 PREEMPT Sat Apr 4 23:26:40 IST 2020
[0.00] CPU: ARMv7-M [410fc231] revision 1 (ARMv7M), cr=
[0.00] CPU: unknown data cache, unknown instruction cache
[0.00] Machine model: Microsemi SmartFusion 2 development board
[0.00] bootconsole [earlycon0] enabled
[0.00] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 16256
[0.00] Kernel command line: console=ttyS0,115200n8 panic=10 mem=64M@0xa000 earlyprintk
[0.00] PID hash table entries: 256 (order: -2, 1024 bytes)
[0.00] Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
[0.00] Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
[0.00] Memory: 61212K/65536K available (1612K kernel code, 75K rwdata, 680K rodata, 1224K init, 120K bss, 4324K reserved, 0K cma-reserved)
[0.00] Virtual kernel memory layout:
[0.00] vector  : 0x - 0x1000   (   4 kB)
[0.00] fixmap  : 0xffc0 - 0xfff0   (3072 kB)
[0.00] vmalloc : 0x - 0x   (4095 MB)
[0.00] lowmem  : 0xa000 - 0xa400   (  64 MB)
[0.00] modules : 0xa000 - 0xa080   (   8 MB)
[0.00]   .text : 0xa0008000 - 0xa02453e8   (2293 kB)
[0.00]   .init : 0xa0246000 - 0xa0378000   (1224 kB)
[0.00]   .data : 0xa0378000 - 0xa038ace0   (  76 kB)
[0.00].bss : 0xa038ace0 - 0xa03a8ea0   ( 121 kB)
[0.00] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[0.00] Preemptible hierarchical RCU implementation.
.
.
.
[0.445184] Found M2S MAC at 0x40041000, irq 18
[0.448810] libphy: msf2 MII bus: probed
[0.527047] ipip: IPv4 over IPv4 tunneling driver
[0.532367] NET: Registered protocol family 10
[0.542307] sit: IPv6 over IPv4 tunneling driver
[0.544655] NET: Registered protocol family 17
[0.565395] Freeing unused kernel memory: 1224K (a0246000 - a0378000)
init started: BusyBox v1.31.1 (2020-01-25 20:01:06 IST)
starting pid 26, tty '': '/etc/rc'
starting pid 31, tty '/dev/ttyS0': '/bin/hush -i'


BusyBox v1.31.1 (2020-01-25 20:01:06 IST) hush - the humble shell
Enter 'help' for a list of built-in commands.

/ # ifconfig eth0 10.0.2.15
[   11.116091] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
/ # [   11.653634] eth0: link up (100/full)
[   11.655246] IPv6:

[Qemu devel PATCH v6 3/3] tests/boot_linux_console: Add ethernet test to SmartFusion2

2020-04-16 Thread sundeep . lkml

From: Subbaraya Sundeep 

In addition to the simple serial test, this patch uses ping
to test the Ethernet block modelled in the SmartFusion2 SoC.

Signed-off-by: Subbaraya Sundeep 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
---
 tests/acceptance/boot_linux_console.py | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
index f825cd9..c6b06a1 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -336,13 +336,13 @@ class BootLinuxConsole(Test):
 """
 uboot_url = ('https://raw.githubusercontent.com/'
  'Subbaraya-Sundeep/qemu-test-binaries/'
- 'fa030bd77a014a0b8e360d3b7011df89283a2f0b/u-boot')
-uboot_hash = 'abba5d9c24cdd2d49cdc2a8aa92976cf20737eff'
+ 'fe371d32e50ca682391e1e70ab98c2942aeffb01/u-boot')
+uboot_hash = 'cbb8cbab970f594bf6523b9855be209c08374ae2'
 uboot_path = self.fetch_asset(uboot_url, asset_hash=uboot_hash)
 spi_url = ('https://raw.githubusercontent.com/'
'Subbaraya-Sundeep/qemu-test-binaries/'
-   'fa030bd77a014a0b8e360d3b7011df89283a2f0b/spi.bin')
-spi_hash = '85f698329d38de63aea6e884a86fbde70890a78a'
+   'fe371d32e50ca682391e1e70ab98c2942aeffb01/spi.bin')
+spi_hash = '65523a1835949b6f4553be96dec1b6a38fb05501'
 spi_path = self.fetch_asset(spi_url, asset_hash=spi_hash)
 
 self.vm.set_console()
@@ -352,7 +352,12 @@ class BootLinuxConsole(Test):
  '-drive', 'file=' + spi_path + ',if=mtd,format=raw',
  '-no-reboot')
 self.vm.launch()
-self.wait_for_console_pattern('init started: BusyBox')
+self.wait_for_console_pattern('Enter \'help\' for a list')
+
+exec_command_and_wait_for_pattern(self, 'ifconfig eth0 10.0.2.15',
+ 'eth0: link becomes ready')
+exec_command_and_wait_for_pattern(self, 'ping -c 3 10.0.2.2',
+'3 packets transmitted, 3 packets received, 0% packet loss')
 
 def do_test_arm_raspi2(self, uart_id):
 """
-- 
2.7.4

[Qemu devel PATCH v6 2/3] msf2: Add EMAC block to SmartFusion2 SoC

2020-04-16 Thread sundeep . lkml

From: Subbaraya Sundeep 

With the SmartFusion2 Ethernet MAC model in
place, this patch adds it to the SoC.

Signed-off-by: Subbaraya Sundeep 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
---
 hw/arm/msf2-soc.c | 26 --
 include/hw/arm/msf2-soc.h |  2 ++
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/hw/arm/msf2-soc.c b/hw/arm/msf2-soc.c
index 588d643..a455b88 100644
--- a/hw/arm/msf2-soc.c
+++ b/hw/arm/msf2-soc.c
@@ -1,7 +1,7 @@
 /*
  * SmartFusion2 SoC emulation.
  *
- * Copyright (c) 2017 Subbaraya Sundeep 
+ * Copyright (c) 2017-2020 Subbaraya Sundeep 
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
@@ -35,11 +35,14 @@
 
 #define MSF2_TIMER_BASE   0x40004000
 #define MSF2_SYSREG_BASE  0x40038000
+#define MSF2_EMAC_BASE0x40041000
 
 #define ENVM_BASE_ADDRESS 0x6000
 
 #define SRAM_BASE_ADDRESS 0x2000
 
+#define MSF2_EMAC_IRQ 12
+
 #define MSF2_ENVM_MAX_SIZE(512 * KiB)
 
 /*
@@ -81,6 +84,13 @@ static void m2sxxx_soc_initfn(Object *obj)
 sysbus_init_child_obj(obj, "spi[*]", &s->spi[i], sizeof(s->spi[i]),
   TYPE_MSS_SPI);
 }
+
+sysbus_init_child_obj(obj, "emac", &s->emac, sizeof(s->emac),
+  TYPE_MSS_EMAC);
+if (nd_table[0].used) {
+qemu_check_nic_model(&nd_table[0], TYPE_MSS_EMAC);
+qdev_set_nic_properties(DEVICE(&s->emac), &nd_table[0]);
+}
 }
 
 static void m2sxxx_soc_realize(DeviceState *dev_soc, Error **errp)
@@ -192,6 +202,19 @@ static void m2sxxx_soc_realize(DeviceState *dev_soc, Error **errp)
 g_free(bus_name);
 }
 
+dev = DEVICE(&s->emac);
+object_property_set_link(OBJECT(&s->emac), OBJECT(get_system_memory()),
+ "ahb-bus", &error_abort);
+object_property_set_bool(OBJECT(&s->emac), true, "realized", &err);
+if (err != NULL) {
+error_propagate(errp, err);
+return;
+}
+busdev = SYS_BUS_DEVICE(dev);
+sysbus_mmio_map(busdev, 0, MSF2_EMAC_BASE);
+sysbus_connect_irq(busdev, 0,
+   qdev_get_gpio_in(armv7m, MSF2_EMAC_IRQ));
+
 /* Below devices are not modelled yet. */
 create_unimplemented_device("i2c_0", 0x40002000, 0x1000);
 create_unimplemented_device("dma", 0x40003000, 0x1000);
@@ -202,7 +225,6 @@ static void m2sxxx_soc_realize(DeviceState *dev_soc, Error **errp)
 create_unimplemented_device("can", 0x40015000, 0x1000);
 create_unimplemented_device("rtc", 0x40017000, 0x1000);
 create_unimplemented_device("apb_config", 0x4002, 0x1);
-create_unimplemented_device("emac", 0x40041000, 0x1000);
 create_unimplemented_device("usb", 0x40043000, 0x1000);
 }
 
diff --git a/include/hw/arm/msf2-soc.h b/include/hw/arm/msf2-soc.h
index 3cfe5c7..c9cb214 100644
--- a/include/hw/arm/msf2-soc.h
+++ b/include/hw/arm/msf2-soc.h
@@ -29,6 +29,7 @@
 #include "hw/timer/mss-timer.h"
 #include "hw/misc/msf2-sysreg.h"
 #include "hw/ssi/mss-spi.h"
+#include "hw/net/msf2-emac.h"
 
 #define TYPE_MSF2_SOC "msf2-soc"
 #define MSF2_SOC(obj) OBJECT_CHECK(MSF2State, (obj), TYPE_MSF2_SOC)
@@ -62,6 +63,7 @@ typedef struct MSF2State {
 MSF2SysregState sysreg;
 MSSTimerState timer;
 MSSSpiState spi[MSF2_NUM_SPIS];
+MSF2EmacState emac;
 } MSF2State;
 
 #endif
-- 
2.7.4

[PATCH 2/3] qemu-img: Add convert --bitmaps option

2020-04-16 Thread Eric Blake

Make it easier to copy all the persistent bitmaps of a source image
along with the contents, by adding a boolean flag for use with
qemu-img convert.

See also https://bugzilla.redhat.com/show_bug.cgi?id=1779893

Signed-off-by: Eric Blake 
---
 docs/tools/qemu-img.rst |  6 ++-
 qemu-img.c  | 81 +++--
 qemu-img-cmds.hx|  4 +-
 3 files changed, 85 insertions(+), 6 deletions(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 0080f83a76c9..8c4d85e0b835 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -186,6 +186,10 @@ Parameters to convert subcommand:

 .. program:: qemu-img-convert

+.. option:: --bitmaps
+
+  Additionally copy all bitmaps
+
 .. option:: -n

   Skip the creation of the target volume
@@ -373,7 +377,7 @@ Command description:
   4
 Error on reading data

-.. option:: convert [--object OBJECTDEF] [--image-opts] [--target-image-opts] [--target-is-zero] [-U] [-C] [-c] [-p] [-q] [-n] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-O OUTPUT_FMT] [-B BACKING_FILE] [-o OPTIONS] [-l SNAPSHOT_PARAM] [-S SPARSE_SIZE] [-m NUM_COROUTINES] [-W] FILENAME [FILENAME2 [...]] OUTPUT_FILENAME
+.. option:: convert [--object OBJECTDEF] [--image-opts] [--target-image-opts] [--target-is-zero] [--bitmaps] [-U] [-C] [-c] [-p] [-q] [-n] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-O OUTPUT_FMT] [-B BACKING_FILE] [-o OPTIONS] [-l SNAPSHOT_PARAM] [-S SPARSE_SIZE] [-m NUM_COROUTINES] [-W] FILENAME [FILENAME2 [...]] OUTPUT_FILENAME

   Convert the disk image *FILENAME* or a snapshot *SNAPSHOT_PARAM*
   to disk image *OUTPUT_FILENAME* using format *OUTPUT_FMT*. It can
diff --git a/qemu-img.c b/qemu-img.c
index 821cbf610e5f..6541357179c2 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -28,6 +28,7 @@
 #include "qemu-common.h"
 #include "qemu-version.h"
 #include "qapi/error.h"
+#include "qapi/qapi-commands-block-core.h"
 #include "qapi/qapi-visit-block-core.h"
 #include "qapi/qobject-output-visitor.h"
 #include "qapi/qmp/qjson.h"
@@ -71,6 +72,7 @@ enum {
 OPTION_SHRINK = 266,
 OPTION_SALVAGE = 267,
 OPTION_TARGET_IS_ZERO = 268,
+OPTION_BITMAPS = 269,
 };

 typedef enum OutputFormat {
@@ -176,6 +178,7 @@ static void QEMU_NORETURN help(void)
"   hiding corruption that has already occurred.\n"
"\n"
"Parameters to convert subcommand:\n"
+   "  '--bitmaps' copies all persistent bitmaps to destination\n"
"  '-m' specifies how many coroutines work in parallel during the 
convert\n"
"   process (defaults to 8)\n"
"  '-W' allow to write to the target out of order rather than 
sequential\n"
@@ -2054,6 +2057,47 @@ static int convert_do_copy(ImgConvertState *s)
 return s->ret;
 }

+static int convert_copy_bitmaps(BlockDriverState *src, BlockDriverState *dst)
+{
+BdrvDirtyBitmap *bm;
+Error *err = NULL;
+BlockDirtyBitmapMergeSource *merge;
+BlockDirtyBitmapMergeSourceList *list;
+
+FOR_EACH_DIRTY_BITMAP(src, bm) {
+const char *name;
+
+if (!bdrv_dirty_bitmap_get_persistence(bm)) {
+continue;
+}
+name = bdrv_dirty_bitmap_name(bm);
+qmp_block_dirty_bitmap_add(dst->node_name, name,
+   true, bdrv_dirty_bitmap_granularity(bm),
+   true, true,
+   true, !bdrv_dirty_bitmap_enabled(bm),
+   &err);
+if (err) {
+error_reportf_err(err, "Failed to create bitmap %s: ", name);
+return -1;
+}
+
+merge = g_new0(BlockDirtyBitmapMergeSource, 1);
+merge->type = QTYPE_QDICT;
+merge->u.external.node = g_strdup(src->node_name);
+merge->u.external.name = g_strdup(name);
+list = g_new0(BlockDirtyBitmapMergeSourceList, 1);
+list->value = merge;
+qmp_block_dirty_bitmap_merge(dst->node_name, name, list, &err);
+qapi_free_BlockDirtyBitmapMergeSourceList(list);
+if (err) {
+error_reportf_err(err, "Failed to populate bitmap %s: ", name);
+return -1;
+}
+}
+
+return 0;
+}
+
 #define MAX_BUF_SECTORS 32768

 static int img_convert(int argc, char **argv)
@@ -2075,6 +2119,8 @@ static int img_convert(int argc, char **argv)
 int64_t ret = -EINVAL;
 bool force_share = false;
 bool explict_min_sparse = false;
+bool bitmaps = false;
+size_t nbitmaps = 0;

 ImgConvertState s = (ImgConvertState) {
 /* Need at least 4k of zeros for sparse detection */
@@ -2094,6 +2140,7 @@ static int img_convert(int argc, char **argv)
 {"target-image-opts", no_argument, 0, OPTION_TARGET_IMAGE_OPTS},
 {"salvage", no_argument, 0, OPTION_SALVAGE},
 {"target-is-zero", no_argument, 0, OPTION_TARGET_IS_ZERO},
+{"bitmaps", no_argument, 0, OPTION_BITMAPS},
 {0, 0, 0, 0}
 };
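
The add-then-merge flow of convert_copy_bitmaps() above can be illustrated
with a toy model. This is a sketch only: ToyBitmap and toy_copy_bitmaps() are
invented for illustration and do not use QEMU's block layer; a bitmap is
reduced to a 64-bit word and "merge" to a bitwise OR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a dirty bitmap; none of this is QEMU's API. */
typedef struct {
    const char *name;
    bool persistent;
    bool enabled;
    uint32_t granularity;
    uint64_t bits;              /* one bit per (toy) cluster */
} ToyBitmap;

/*
 * Copy every persistent bitmap from src[] into dst[]: create an empty
 * bitmap with the same name, granularity and enabled state, then merge
 * the source contents into it. Returns the number of bitmaps copied,
 * or -1 if dst[] is too small.
 */
static int toy_copy_bitmaps(const ToyBitmap *src, size_t nsrc,
                            ToyBitmap *dst, size_t ndst)
{
    size_t copied = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < nsrc; i++) {
        if (!src[i].persistent) {
            continue;           /* transient bitmaps are skipped */
        }
        if (copied == ndst) {
            return -1;
        }
        /* Step 1: "add" an empty bitmap with matching properties. */
        dst[copied] = (ToyBitmap){
            .name = src[i].name,
            .persistent = true,
            .enabled = src[i].enabled,
            .granularity = src[i].granularity,
            .bits = 0,
        };
        /* Step 2: "merge" the source contents into the new bitmap. */
        dst[copied].bits |= src[i].bits;
        copied++;
    }
    return (int)copied;
}
```

As in the patch, a failure part-way through leaves a partially populated
destination; the toy version signals that with -1 rather than rolling back.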

[PATCH 3/3] iotests: Add test 291 for qemu-img convert --bitmaps

2020-04-16 Thread Eric Blake

Add a new test covering the feature added in the previous patch.

Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/291 | 143 +
 tests/qemu-iotests/291.out |  56 +++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 200 insertions(+)
 create mode 100755 tests/qemu-iotests/291
 create mode 100644 tests/qemu-iotests/291.out

diff --git a/tests/qemu-iotests/291 b/tests/qemu-iotests/291
new file mode 100755
index ..dfdcc8e352c8
--- /dev/null
+++ b/tests/qemu-iotests/291
@@ -0,0 +1,143 @@
+#!/usr/bin/env bash
+#
+# Test qemu-img convert --bitmaps
+#
+# Copyright (C) 2018-2020 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1 # failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+nbd_server_stop
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.nbd
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+_require_command QEMU_NBD
+
+do_run_qemu()
+{
+echo Testing: "$@"
+$QEMU -nographic -qmp stdio -serial none "$@"
+echo
+}
+
+run_qemu()
+{
+do_run_qemu "$@" 2>&1 | _filter_testdir | _filter_qmp \
+  | _filter_qemu | _filter_imgfmt \
+  | _filter_actual_image_size | _filter_qemu_io
+}
+
+# Create initial image and populate two bitmaps: one active, one inactive
+_make_test_img 10M
+run_qemu <

[PATCH 1/3] blockdev: Split off basic bitmap operations for qemu-img

2020-04-16 Thread Eric Blake

The next patch wants to teach qemu how to copy a bitmap from one qcow2
file to another.  But blockdev.o is too heavyweight to link into
qemu-img, so it's time to split off the bare bones of what we will
need into a new file blockbitmaps.o.  Transactions are not needed in
qemu-img (if things fail while creating the new image, the fix is to
delete the botched copy, rather than worrying about atomic rollback).

For now, I stuck to just the minimum code motion (add and merge); we
could instead decide to move everything bitmap-related that does not
also pull in transactions (delete, enable, disable).

Signed-off-by: Eric Blake 
---
 Makefile.objs |   2 +-
 include/sysemu/blockdev.h |  10 ++
 blockbitmaps.c| 217 ++
 blockdev.c| 184 
 MAINTAINERS   |   1 +
 5 files changed, 229 insertions(+), 185 deletions(-)
 create mode 100644 blockbitmaps.c

diff --git a/Makefile.objs b/Makefile.objs
index a7c967633acf..44e30fa9a6e3 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -14,7 +14,7 @@ chardev-obj-y = chardev/
 authz-obj-y = authz/

 block-obj-y = nbd/
-block-obj-y += block.o blockjob.o job.o
+block-obj-y += block.o blockbitmaps.o blockjob.o job.o
 block-obj-y += block/ scsi/
 block-obj-y += qemu-io-cmds.o
 block-obj-$(CONFIG_REPLICATION) += replication.o
diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index a86d99b3d875..95cfeb29bc0a 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -57,4 +57,14 @@ QemuOpts *drive_add(BlockInterfaceType type, int index, const char *file,
 DriveInfo *drive_new(QemuOpts *arg, BlockInterfaceType block_default_type,
  Error **errp);

+BdrvDirtyBitmap *block_dirty_bitmap_lookup(const char *node,
+   const char *name,
+   BlockDriverState **pbs,
+   Error **errp);
+BdrvDirtyBitmap *do_block_dirty_bitmap_merge(const char *node,
+ const char *target,
+ BlockDirtyBitmapMergeSourceList 
*bitmaps,
+ HBitmap **backup, Error **errp);
+
+
 #endif
diff --git a/blockbitmaps.c b/blockbitmaps.c
new file mode 100644
index ..0d334d82006d
--- /dev/null
+++ b/blockbitmaps.c
@@ -0,0 +1,217 @@
+/*
+ * QEMU host block device bitmaps
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ *
+ * This file incorporates work covered by the following copyright and
+ * permission notice:
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+
+#include "sysemu/blockdev.h"
+#include "block/block.h"
+#include "qapi/qapi-commands-block.h"
+#include "qapi/error.h"
+
+/**
+ * block_dirty_bitmap_lookup:
+ * Return a dirty bitmap (if present), after validating
+ * the node reference and bitmap names.
+ *
+ * @node: The name of the BDS node to search for bitmaps
+ * @name: The name of the bitmap to search for
+ * @pbs: Output pointer for BDS lookup, if desired. Can be NULL.
+ * @errp: Output pointer for error information. Can be NULL.
+ *
+ * @return: A bitmap object on success, or NULL on failure.
+ */
+BdrvDirtyBitmap *block_dirty_bitmap_lookup(const char *node,
+   const char *name,
+   BlockDriverState **pbs,
+   Error **errp)
+{
+BlockDriverState *bs;
+BdrvDirtyBitmap *bitmap;
+
+if (!node) {
+error_setg(errp, "Node cannot be NULL");
+return NULL;
+}
+if (!name) {
+error_setg(errp,

[PATCH 0/3] qemu-img: Add convert --bitmaps

2020-04-16 Thread Eric Blake

Without this series, the process for copying one qcow2 image to
another including all of its bitmaps involves running qemu and doing
the copying by hand with a series of QMP commands.  This makes the
process a bit more convenient.

I still think that someday we will need a 'qemu-img bitmap' with
various subcommands for manipulating bitmaps within an offline image,
but in the meantime, this seems like a useful addition on its own.

Series can also be downloaded at:
https://repo.or.cz/qemu/ericb.git/shortlog/refs/tags/qemu-img-bitmaps-v1

Eric Blake (3):
  blockdev: Split off basic bitmap operations for qemu-img
  qemu-img: Add convert --bitmaps option
  iotests: Add test 291 for qemu-img convert --bitmaps

 docs/tools/qemu-img.rst|   6 +-
 Makefile.objs  |   2 +-
 include/sysemu/blockdev.h  |  10 ++
 blockbitmaps.c | 217 +
 blockdev.c | 184 ---
 qemu-img.c |  81 +-
 MAINTAINERS|   1 +
 qemu-img-cmds.hx   |   4 +-
 tests/qemu-iotests/291 | 143 
 tests/qemu-iotests/291.out |  56 ++
 tests/qemu-iotests/group   |   1 +
 11 files changed, 514 insertions(+), 191 deletions(-)
 create mode 100644 blockbitmaps.c
 create mode 100755 tests/qemu-iotests/291
 create mode 100644 tests/qemu-iotests/291.out

-- 
2.26.0
