date:20180213

[Qemu-devel] [QEMU-PPC] [PATCH V2 3/3] ppc/spapr-caps: For pseries-2.12 change spapr-cap defaults

2018-02-13 Thread Suraj Jitindar Singh

For the pseries-2.12 machine type, make the spapr-caps SPAPR_CAP_CFPC
and SPAPR_CAP_SBBC default to workaround. Thus, if the host is capable,
the guest can take advantage of these workarounds by default.
If the host lacks these capabilities, QEMU will fail to start, and
the caps must be explicitly disabled on the command line with:
-machine pseries,cap-cfpc=broken,cap-sbbc=broken

Signed-off-by: Suraj Jitindar Singh 
---
 hw/ppc/spapr.c  | 11 ++-
 hw/ppc/spapr_caps.c | 10 ++
 include/hw/compat.h |  2 ++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 969db6cde2..e2ebb76242 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3941,13 +3941,20 @@ static const TypeInfo spapr_machine_info = {
 /*
  * pseries-2.12
  */
+#define SPAPR_COMPAT_2_12  \
+HW_COMPAT_2_12
+
 static void spapr_machine_2_12_instance_options(MachineState *machine)
 {
 }
 
 static void spapr_machine_2_12_class_options(MachineClass *mc)
 {
-/* Defaults for the latest behaviour inherited from the base class */
+sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
+
+smc->default_caps.caps[SPAPR_CAP_CFPC] = SPAPR_CAP_WORKAROUND;
+smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_WORKAROUND;
+SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_12);
 }
 
 DEFINE_SPAPR_MACHINE(2_12, "2.12", true);
@@ -3969,6 +3976,8 @@ static void spapr_machine_2_11_class_options(MachineClass *mc)
 
 spapr_machine_2_12_class_options(mc);
 smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_ON;
+smc->default_caps.caps[SPAPR_CAP_CFPC] = SPAPR_CAP_BROKEN;
+smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
 SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_11);
 }
 
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 05997b0842..c25c2bca52 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -281,11 +281,21 @@ static sPAPRCapabilities default_caps_with_cpu(sPAPRMachineState *spapr,
 
 caps = smc->default_caps;
 
+if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_3_00,
+  0, spapr->max_compat_pvr)) {
+caps.caps[SPAPR_CAP_CFPC] = SPAPR_CAP_BROKEN;
+}
+
 if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_2_07,
   0, spapr->max_compat_pvr)) {
 caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF;
 }
 
+if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_2_06_PLUS,
+  0, spapr->max_compat_pvr)) {
+caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
+}
+
 if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_2_06,
   0, spapr->max_compat_pvr)) {
 caps.caps[SPAPR_CAP_VSX] = SPAPR_CAP_OFF;
diff --git a/include/hw/compat.h b/include/hw/compat.h
index 7f31850dfa..13238239da 100644
--- a/include/hw/compat.h
+++ b/include/hw/compat.h
@@ -1,6 +1,8 @@
 #ifndef HW_COMPAT_H
 #define HW_COMPAT_H
 
+#define HW_COMPAT_2_12
+
 #define HW_COMPAT_2_11 \
 {\
 .driver   = "hpet",\
-- 
2.13.6
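
As a rough illustration of how the defaults in this patch layer, the sketch below first picks the machine-type default and then lets the CPU compat check override it. It is a simplified standalone model, not the QEMU code: the `Demo*` and `demo_*` names are hypothetical stand-ins, and the two boolean parameters stand in for the results of the ppc_check_compat() calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the QEMU definitions (all names hypothetical). */
enum { DEMO_CAP_BROKEN = 0, DEMO_CAP_WORKAROUND = 1, DEMO_CAP_FIXED = 2 };
enum { DEMO_CAP_CFPC, DEMO_CAP_SBBC, DEMO_CAP_NUM };

typedef struct {
    uint8_t caps[DEMO_CAP_NUM];
} DemoCaps;

/* Step 1: machine-type defaults. pseries-2.12 asks for the workaround,
 * older machine types keep the previous "broken" default. */
static DemoCaps demo_machine_defaults(bool is_2_12_or_later)
{
    DemoCaps c;
    uint8_t level = is_2_12_or_later ? DEMO_CAP_WORKAROUND : DEMO_CAP_BROKEN;

    c.caps[DEMO_CAP_CFPC] = level;
    c.caps[DEMO_CAP_SBBC] = level;
    return c;
}

/* Step 2: CPU-dependent override. A CPU that fails the compat check
 * falls back to "broken" regardless of the machine-type default. */
static DemoCaps demo_defaults_with_cpu(DemoCaps c, bool compat_3_00,
                                       bool compat_2_06_plus)
{
    if (!compat_3_00) {
        c.caps[DEMO_CAP_CFPC] = DEMO_CAP_BROKEN;
    }
    if (!compat_2_06_plus) {
        c.caps[DEMO_CAP_SBBC] = DEMO_CAP_BROKEN;
    }
    return c;
}

/* Bounds-checked accessor for a single capability level. */
static uint8_t demo_cap_get(const DemoCaps *c, int index)
{
    assert(index >= 0 && index < DEMO_CAP_NUM);
    return c->caps[index];
}
```

With this model, a 2.12 machine on a CPU that passes only the 2.06+ check ends up with cfpc=broken and sbbc=workaround, while a 2.11 machine stays at broken for both.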

[Qemu-devel] [QEMU-PPC] [PATCH V2 1/3] ppc/spapr-caps: Change migration macro to take full spapr-cap name

2018-02-13 Thread Suraj Jitindar Singh

Change the macro that generates the vmstate migration field and the needed
function for the spapr-caps to take the full spapr-cap name. This has
the benefit that this instance will be picked up when grepping
for the spapr-caps, and it makes it more obvious what the macro is doing.

Signed-off-by: Suraj Jitindar Singh 
---
 hw/ppc/spapr_caps.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 62efdaee38..e69d308560 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -350,34 +350,34 @@ int spapr_caps_post_migration(sPAPRMachineState *spapr)
 }
 
 /* Used to generate the migration field and needed function for a spapr cap */
-#define SPAPR_CAP_MIG_STATE(cap, ccap)  \
-static bool spapr_cap_##cap##_needed(void *opaque)  \
+#define SPAPR_CAP_MIG_STATE(sname, cap) \
+static bool spapr_cap_##sname##_needed(void *opaque)\
 {   \
 sPAPRMachineState *spapr = opaque;  \
 \
-return spapr->cmd_line_caps[SPAPR_CAP_##ccap] &&\
-   (spapr->eff.caps[SPAPR_CAP_##ccap] !=\
-spapr->def.caps[SPAPR_CAP_##ccap]); \
+return spapr->cmd_line_caps[cap] && \
+   (spapr->eff.caps[cap] != \
+spapr->def.caps[cap]);  \
 }   \
 \
-const VMStateDescription vmstate_spapr_cap_##cap = {\
-.name = "spapr/cap/" #cap,  \
+const VMStateDescription vmstate_spapr_cap_##sname = {  \
+.name = "spapr/cap/" #sname,\
 .version_id = 1,\
 .minimum_version_id = 1,\
-.needed = spapr_cap_##cap##_needed, \
+.needed = spapr_cap_##sname##_needed,   \
 .fields = (VMStateField[]) {\
-VMSTATE_UINT8(mig.caps[SPAPR_CAP_##ccap],   \
+VMSTATE_UINT8(mig.caps[cap],\
   sPAPRMachineState),   \
 VMSTATE_END_OF_LIST()   \
 },  \
 }
 
-SPAPR_CAP_MIG_STATE(htm, HTM);
-SPAPR_CAP_MIG_STATE(vsx, VSX);
-SPAPR_CAP_MIG_STATE(dfp, DFP);
-SPAPR_CAP_MIG_STATE(cfpc, CFPC);
-SPAPR_CAP_MIG_STATE(sbbc, SBBC);
-SPAPR_CAP_MIG_STATE(ibs, IBS);
+SPAPR_CAP_MIG_STATE(htm, SPAPR_CAP_HTM);
+SPAPR_CAP_MIG_STATE(vsx, SPAPR_CAP_VSX);
+SPAPR_CAP_MIG_STATE(dfp, SPAPR_CAP_DFP);
+SPAPR_CAP_MIG_STATE(cfpc, SPAPR_CAP_CFPC);
+SPAPR_CAP_MIG_STATE(sbbc, SPAPR_CAP_SBBC);
+SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
 
 void spapr_caps_reset(sPAPRMachineState *spapr)
 {
-- 
2.13.6
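
The macro pattern in this patch can be tried out in isolation. The sketch below is a reduced standalone model, not the QEMU macro: `DemoMachine`, `DemoStateDesc`, `DEMO_CAP_MIG_STATE` and `demo_find_state` are hypothetical names. It shows the short name being pasted into identifiers with `##` and stringized with `#`, while the capability index is passed as a full, greppable constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { DEMO_CAP_HTM, DEMO_CAP_VSX, DEMO_CAP_NUM };

/* Reduced model of the machine state: command-line, effective, default caps. */
typedef struct {
    uint8_t cmd_line_caps[DEMO_CAP_NUM];
    uint8_t eff[DEMO_CAP_NUM];
    uint8_t def[DEMO_CAP_NUM];
} DemoMachine;

/* Reduced model of a migration state description. */
typedef struct {
    const char *name;
    bool (*needed)(void *opaque);
} DemoStateDesc;

/* sname is pasted into identifiers and stringized into the section name;
 * cap is the full index constant, so a grep for it finds this use. */
#define DEMO_CAP_MIG_STATE(sname, cap)                          \
static bool demo_cap_##sname##_needed(void *opaque)             \
{                                                               \
    DemoMachine *m = opaque;                                    \
                                                                \
    return m->cmd_line_caps[cap] &&                             \
           (m->eff[cap] != m->def[cap]);                        \
}                                                               \
                                                                \
static const DemoStateDesc demo_state_cap_##sname = {           \
    .name = "spapr/cap/" #sname,                                \
    .needed = demo_cap_##sname##_needed,                        \
}

DEMO_CAP_MIG_STATE(htm, DEMO_CAP_HTM);
DEMO_CAP_MIG_STATE(vsx, DEMO_CAP_VSX);

/* Look up a generated description by its section name, NULL if unknown. */
static const DemoStateDesc *demo_find_state(const char *name)
{
    static const DemoStateDesc *const table[] = {
        &demo_state_cap_htm,
        &demo_state_cap_vsx,
    };

    assert(name != NULL);
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strcmp(table[i]->name, name) == 0) {
            return table[i];
        }
    }
    return NULL;
}
```

A section is only "needed" when the cap was set on the command line and its effective value differs from the default, which is the same condition the real macro generates.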

[Qemu-devel] [QEMU-PPC] [PATCH V2 2/3] ppc/spapr-caps: Convert spapr-cap-ibs to be a boolean

2018-02-13 Thread Suraj Jitindar Singh

The spapr-cap cap-ibs can only have values broken or fixed as there is
no workaround. Currently setting the value workaround will hit an assert
if the guest makes the hcall h_get_cpu_characteristics.

Thus this capability is better suited to being represented as a boolean.
Setting this to OFF corresponds to the old BROKEN, that is no indirect
branch serialisation. Setting this to ON corresponds to the old FIXED,
that is indirect branches are serialised.

Reported-by: Satheesh Rajendran 
Signed-off-by: Suraj Jitindar Singh 
---
 hw/ppc/spapr.c  |  2 +-
 hw/ppc/spapr_caps.c | 12 ++--
 target/ppc/kvm.c|  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 32a876be56..969db6cde2 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3886,7 +3886,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
 smc->default_caps.caps[SPAPR_CAP_DFP] = SPAPR_CAP_ON;
 smc->default_caps.caps[SPAPR_CAP_CFPC] = SPAPR_CAP_BROKEN;
 smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
-smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
+smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_OFF;
 spapr_caps_add_properties(smc, &error_abort);
 }
 
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index e69d308560..05997b0842 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -207,9 +207,9 @@ static void cap_safe_indirect_branch_apply(sPAPRMachineState *spapr,
 {
 if (tcg_enabled() && val) {
 /* TODO - for now only allow broken for TCG */
-error_setg(errp, "Requested safe indirect branch capability level not supported by tcg, try a different value for cap-ibs");
+error_setg(errp, "Indirect Branch Serialisation support not available, try cap-ibs=off");
 } else if (kvm_enabled() && (val > kvmppc_get_cap_safe_indirect_branch())) {
-error_setg(errp, "Requested safe indirect branch capability level not supported by kvm, try a different value for cap-ibs");
+error_setg(errp, "Indirect Branch Serialisation support not available, try cap-ibs=off");
 }
 }
 
@@ -263,11 +263,11 @@ sPAPRCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
 },
 [SPAPR_CAP_IBS] = {
 .name = "ibs",
-.description = "Indirect Branch Serialisation" VALUE_DESC_TRISTATE,
+.description = "Indirect Branch Serialisation",
 .index = SPAPR_CAP_IBS,
-.get = spapr_cap_get_tristate,
-.set = spapr_cap_set_tristate,
-.type = "string",
+.get = spapr_cap_get_bool,
+.set = spapr_cap_set_bool,
+.type = "bool",
 .apply = cap_safe_indirect_branch_apply,
 },
 };
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 9842b3bb12..3e3e5f9c1f 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2495,7 +2495,7 @@ static void kvmppc_get_cpu_characteristics(KVMState *s)
 }
 /* Parse and set cap_ppc_safe_indirect_branch */
 if (c.character & H_CPU_CHAR_BCCTRL_SERIALISED) {
-cap_ppc_safe_indirect_branch = 2;
+cap_ppc_safe_indirect_branch = 1;
 }
 }
 
-- 
2.13.6
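
To make the tristate-to-boolean mapping described above concrete, here is a small standalone sketch. It is not the QEMU code: the `DEMO_*` constants and `demo_*` helpers are hypothetical, and `kvm_level` stands in for the value that kvmppc_get_cap_safe_indirect_branch() would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old tristate levels and new boolean levels (hypothetical names). */
enum { DEMO_OLD_BROKEN = 0, DEMO_OLD_WORKAROUND = 1, DEMO_OLD_FIXED = 2 };
enum { DEMO_CAP_OFF = 0, DEMO_CAP_ON = 1 };

/* OFF corresponds to the old BROKEN, ON to the old FIXED. There is no
 * workaround for this cap, so that level has no boolean equivalent (-1). */
static int demo_ibs_tristate_to_bool(int old_level)
{
    switch (old_level) {
    case DEMO_OLD_BROKEN:
        return DEMO_CAP_OFF;
    case DEMO_OLD_FIXED:
        return DEMO_CAP_ON;
    default:
        return -1;
    }
}

/* Mirrors the two checks in the apply function: under TCG only "off" is
 * accepted; under KVM the request must not exceed what the host reports. */
static bool demo_ibs_request_ok(bool use_tcg, uint8_t val, uint8_t kvm_level)
{
    assert(val == DEMO_CAP_OFF || val == DEMO_CAP_ON);
    if (use_tcg) {
        return val == DEMO_CAP_OFF;
    }
    return val <= kvm_level;
}
```

This also shows why the KVM side now reports 1 instead of 2 for a serialised bcctr: the comparison is against a boolean level.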

[Qemu-devel] [PATCH v4] hw/char: remove legacy interface escc_init()

2018-02-13 Thread Laurent Vivier

Move necessary stuff into escc.h and update type names.
Remove slavio_serial_ms_kbd_init().
Fix code style problems reported by checkpatch.pl.
Update mac_newworld, mac_oldworld and sun4m to use the
QDEV interface directly.

Signed-off-by: Laurent Vivier 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Mark Cave-Ayland 
---

Notes:
v4: rebase and add Mark's R-b
v3: in sun4m, move comments about Slavio TTY
above both qdev_create().
v2: in sun4m, move comments about Slavio TTY close to
their qdev_prop_set_chr()

 hw/char/escc.c | 209 ++---
 hw/ppc/mac_newworld.c  |  19 -
 hw/ppc/mac_oldworld.c  |  19 -
 hw/sparc/sun4m.c   |  34 +++-
 include/hw/char/escc.h |  54 +++--
 5 files changed, 170 insertions(+), 165 deletions(-)

diff --git a/hw/char/escc.c b/hw/char/escc.c
index 449bf2fc63..628f5f81f7 100644
--- a/hw/char/escc.c
+++ b/hw/char/escc.c
@@ -26,10 +26,7 @@
 #include "hw/hw.h"
 #include "hw/sysbus.h"
 #include "hw/char/escc.h"
-#include "chardev/char-fe.h"
-#include "chardev/char-serial.h"
 #include "ui/console.h"
-#include "ui/input.h"
 #include "trace.h"
 
 /*
@@ -64,53 +61,7 @@
  *  2010-May-23  Artyom Tarasenko:  Reworked IUS logic
  */
 
-typedef enum {
-chn_a, chn_b,
-} ChnID;
-
-#define CHN_C(s) ((s)->chn == chn_b? 'b' : 'a')
-
-typedef enum {
-ser, kbd, mouse,
-} ChnType;
-
-#define SERIO_QUEUE_SIZE 256
-
-typedef struct {
-uint8_t data[SERIO_QUEUE_SIZE];
-int rptr, wptr, count;
-} SERIOQueue;
-
-#define SERIAL_REGS 16
-typedef struct ChannelState {
-qemu_irq irq;
-uint32_t rxint, txint, rxint_under_svc, txint_under_svc;
-struct ChannelState *otherchn;
-uint32_t reg;
-uint8_t wregs[SERIAL_REGS], rregs[SERIAL_REGS];
-SERIOQueue queue;
-CharBackend chr;
-int e0_mode, led_mode, caps_lock_mode, num_lock_mode;
-int disabled;
-int clock;
-uint32_t vmstate_dummy;
-ChnID chn; // this channel, A (base+4) or B (base+0)
-ChnType type;
-uint8_t rx, tx;
-QemuInputHandlerState *hs;
-} ChannelState;
-
-#define ESCC(obj) OBJECT_CHECK(ESCCState, (obj), TYPE_ESCC)
-
-typedef struct ESCCState {
-SysBusDevice parent_obj;
-
-struct ChannelState chn[2];
-uint32_t it_shift;
-MemoryRegion mmio;
-uint32_t disabled;
-uint32_t frequency;
-} ESCCState;
+#define CHN_C(s) ((s)->chn == escc_chn_b ? 'b' : 'a')
 
 #define SERIAL_CTRL 0
 #define SERIAL_DATA 1
@@ -214,44 +165,47 @@ typedef struct ESCCState {
 #define R_MISC1I 14
 #define R_EXTINT 15
 
-static void handle_kbd_command(ChannelState *s, int val);
+static void handle_kbd_command(ESCCChannelState *s, int val);
 static int serial_can_receive(void *opaque);
-static void serial_receive_byte(ChannelState *s, int ch);
+static void serial_receive_byte(ESCCChannelState *s, int ch);
 
 static void clear_queue(void *opaque)
 {
-ChannelState *s = opaque;
-SERIOQueue *q = &s->queue;
+ESCCChannelState *s = opaque;
+ESCCSERIOQueue *q = &s->queue;
 q->rptr = q->wptr = q->count = 0;
 }
 
 static void put_queue(void *opaque, int b)
 {
-ChannelState *s = opaque;
-SERIOQueue *q = &s->queue;
+ESCCChannelState *s = opaque;
+ESCCSERIOQueue *q = &s->queue;
 
 trace_escc_put_queue(CHN_C(s), b);
-if (q->count >= SERIO_QUEUE_SIZE)
+if (q->count >= ESCC_SERIO_QUEUE_SIZE) {
 return;
+}
 q->data[q->wptr] = b;
-if (++q->wptr == SERIO_QUEUE_SIZE)
+if (++q->wptr == ESCC_SERIO_QUEUE_SIZE) {
 q->wptr = 0;
+}
 q->count++;
 serial_receive_byte(s, 0);
 }
 
 static uint32_t get_queue(void *opaque)
 {
-ChannelState *s = opaque;
-SERIOQueue *q = &s->queue;
+ESCCChannelState *s = opaque;
+ESCCSERIOQueue *q = &s->queue;
 int val;
 
 if (q->count == 0) {
 return 0;
 } else {
 val = q->data[q->rptr];
-if (++q->rptr == SERIO_QUEUE_SIZE)
+if (++q->rptr == ESCC_SERIO_QUEUE_SIZE) {
 q->rptr = 0;
+}
 q->count--;
 }
 trace_escc_get_queue(CHN_C(s), val);
@@ -260,7 +214,7 @@ static uint32_t get_queue(void *opaque)
 return val;
 }
 
-static int escc_update_irq_chn(ChannelState *s)
+static int escc_update_irq_chn(ESCCChannelState *s)
 {
 if ((((s->wregs[W_INTR] & INTR_TXINT) && (s->txint == 1)) ||
  // tx ints enabled, pending
@@ -274,7 +228,7 @@ static int escc_update_irq_chn(ChannelState *s)
 return 0;
 }
 
-static void escc_update_irq(ChannelState *s)
+static void escc_update_irq(ESCCChannelState *s)
 {
 int irq;
 
@@ -285,12 +239,12 @@ static void escc_update_irq(ChannelState *s)
 qemu_set_irq(s->irq, irq);
 }
 
-static void escc_reset_chn(ChannelState *s)
+static void escc_reset_chn(ESCCChannelState *s)
 {
 int i;
 
 s->reg = 0;
-for (i = 0; i < SERIAL_REGS; i++) {
+for (i = 0; i < ESCC_SERIAL_REGS; i++) {
 s->rregs[i] = 0;
 s->wregs[i] = 0;
 }
@@ -322,13 +276,13 @@ stat

Re: [Qemu-devel] [RFC PATCH v6 00/20] replay additions

2018-02-13 Thread Ciro Santilli

The patch 23bdb6f7ce73c33f96449e43b4cae01e55f79ae1 appears to be
segfaulting `qemu-img` at `replay_mutex_lock`.

The problem does not happen on the patch base
bc2943d6caf787e1c9a5f3109cdb98f37630b89e

The command is:

buildroot/output.x86_64~/images
../host/bin/qemu-img convert -f raw -O qcow2 rootfs.ext2 rootfs.ext2.qcow2
Aborted (core dumped)

and the backtrace:

>>> bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x75ce6f5d in __GI_abort () at abort.c:90
#2  0x5565ae79 in replay_mutex_unlock () at stubs/replay.c:79
#3  0x556393a3 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:256
#4  main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:522
#5  0x55576890 in convert_do_copy (s=0x7fffca10) at qemu-img.c:1900
#6  img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2332
#7  0x55571dda in main (argc=7, argv=<optimized out>) at qemu-img.c:4763
>>>

77 void replay_mutex_lock(void)
78 {
79 abort();
80 }

The configure command is:

GCC="/usr/bin/gcc" CXX="/usr/bin/g++" CPP="/usr/bin/cpp"
OBJCOPY="/usr/bin/objcopy" RANLIB="/usr/bin/ranlib"
CPPFLAGS="-I/home/ciro/bak/git/linux-kernel-module-cheat/buildroot/output.x86_64~/host/include"
CFLAGS="-O2 
-I/home/ciro/bak/git/linux-kernel-module-cheat/buildroot/output.x86_64~/host/include"
CXXFLAGS="-O2 
-I/home/ciro/bak/git/linux-kernel-module-cheat/buildroot/output.x86_64~/host/include"
LDFLAGS="-L/home/ciro/bak/git/linux-kernel-module-cheat/buildroot/output.x86_64~/host/lib
-Wl,-rpath,/home/ciro/bak/git/linux-kernel-module-cheat/buildroot/output.x86_64~/host/lib"
INTLTOOL_PERL=/usr/bin/perl CPP="/usr/bin/gcc -E" ./configure
--target-list="x86_64-softmmu"
--prefix="/home/ciro/bak/git/linux-kernel-module-cheat/buildroot/output.x86_64~/host"
--interp-prefix=/home/ciro/bak/git/linux-kernel-module-cheat/buildroot/output.x86_64~/host/x86_64-buildroot-linux-uclibc/sysroot
--cc="/usr/bin/gcc" --host-cc="/usr/bin/gcc"
--python=/home/ciro/bak/git/linux-kernel-module-cheat/buildroot/output.x86_64~/host/bin/python2
--extra-cflags="-O2
-I/home/ciro/bak/git/linux-kernel-module-cheat/buildroot/output.x86_64~/host/include"
--extra-ldflags="-L/home/ciro/bak/git/linux-kernel-module-cheat/buildroot/output.x86_64~/host/lib
-Wl,-rpath,/home/ciro/bak/git/linux-kernel-module-cheat/buildroot/output.x86_64~/host/lib"
--enable-debug --extra-cflags='-DDEBUG_PL061=1'
--enable-trace-backends=simple --enable-sdl --with-sdlabi=2.0

and everything is fully automated at:
https://github.com/cirosantilli/linux-kernel-module-cheat/tree/5ae702c71c2b2ad326b7791ff128cac0d8b298a2
by running:

./build -q


On Wed, Feb 7, 2018 at 12:38 PM, Pavel Dovgalyuk  wrote:
>> From: Ciro Santilli [mailto:ciro.santi...@gmail.com]
>> Can you provide a test branch somewhere so I can easily test it out?
>
> Here it is: https://github.com/ispras/qemu/tree/rr-180207
>
> Pavel Dovgalyuk
>

Re: [Qemu-devel] Assigning network devices to nested VMs results in driver errors in nested VMs

2018-02-13 Thread Peter Xu

On Tue, Feb 13, 2018 at 11:44:09PM -0500, Jintack Lim wrote:
> Hi,
> 
> I'm trying to assign network devices to nested VMs on x86 using KVM,
> but I got network device driver errors in the nested VMs. (I've tried
> this about a year ago when vIOMMU patches were not upstreamed, and I
> got similar errors at that time.)
> 
> This could be network driver issues, but I'd like to get some help if
> somebody encountered similar issues.
> 
> I'm using v4.15.0 kernel and v2.11.0 QEMU, and I followed this [1]
> guide. I had no problem with assigning devices to the first level VMs
> (L1 VMs). And I also checked that the devices were assigned to nested
> VMs with the lspci command in the nested VMs. But network device
> drivers failed to initialize the device. I tried two network cards -
> Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection and
> Mellanox Technologies MT27500 Family.
> 
> Intel driver error in the nested VM looks like this.
> [1.939552] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver -
> version 5.1.0-k
> [1.949796] ixgbe: Copyright (c) 1999-2016 Intel Corporation.
> [2.210024] ixgbe :00:04.0: HW Init failed: -12
> [2.218144] ixgbe: probe of :00:04.0 failed with error -12
> 
> and I saw lots of these messages in the host (L0) kernel log when
> booting the nested VM.
> 
> [ 1557.404173] DMAR: DRHD: handling fault status reg 102
> [ 1557.409813] DMAR: [DMA Read] Request device [06:00.0] fault addr
> 9 [fault reason 06] PTE Read access is not set
> [ 1561.383957] DMAR: DRHD: handling fault status reg 202
> [ 1561.389598] DMAR: [DMA Read] Request device [06:00.0] fault addr
> 9 [fault reason 06] PTE Read access is not set
> 
> This is Mellanox driver error in another nested VM.
> [2.481694] mlx4_core: Initializing :00:04.0
> [3.519422] mlx4_core :00:04.0: Installed FW has unsupported
> command interface revision 0
> [3.537769] mlx4_core :00:04.0: (Installed FW version is 0.0.000)
> [3.551733] mlx4_core :00:04.0: This driver version supports
> only revisions 2 to 3
> [3.568758] mlx4_core :00:04.0: QUERY_FW command failed, aborting
> [3.582789] mlx4_core :00:04.0: Failed to init fw, aborting.
> 
> The host showed similar messages as above.
> 
> I wonder what could be the cause of these errors. Please let me know
> if further information is needed.
> 
> [1] https://wiki.qemu.org/Features/VT-d

Hi, Jintack,

Thanks for reporting the problem.

I haven't been playing with nested assignment much recently (and even
before), but I think I encountered a similar problem in the past.

I will let you know if I make any progress, but it probably won't
happen in the next few days since there is a week-long holiday starting
tomorrow (Chinese Spring Festival).

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v3] hw/char: remove legacy interface escc_init()

2018-02-13 Thread David Gibson

On Tue, Feb 13, 2018 at 10:57:46PM +, Mark Cave-Ayland wrote:
> On 13/02/18 13:01, Laurent Vivier wrote:
> 
> > Hi,
> > 
> > can a maintainer of one of the involved parts take this in his
> > maintenance branch to have this merged?
> > 
> > Thanks,
> > Laurent
> > 
> > On 29/01/2018 15:21, Laurent Vivier wrote:
> > > Paolo,
> > > 
> > > I forgot to cc: you for the "MAINTAINERS/Character devices/Odd Fixes".
> > > Could you take this through your branch?
> > > 
> > > Thanks,
> > > Laurent
> > > 
> > > On 26/01/2018 16:41, Mark Cave-Ayland wrote:
> > > > On 26/01/18 14:47, Laurent Vivier wrote:
> > > > 
> > > > > Move necessary stuff in escc.h and update type names.
> > > > > Remove slavio_serial_ms_kbd_init().
> > > > > Fix code style problems reported by checkpatch.pl
> > > > > Update mac_newworld, mac_oldworld and sun4m to use directly the
> > > > > QDEV interface.
> > > > > 
> > > > > Signed-off-by: Laurent Vivier 
> > > > > Reviewed-by: Philippe Mathieu-Daudé 
> > > > > ---
> > > > > 
> > > > > Notes:
> > > > >   v3: in sun4m, move comments about Slavio TTY
> > > > >   above both qdev_create().
> > > > >   v2: in sun4m, move comments about Slavio TTY close to
> > > > >   their qdev_prop_set_chr()
> > > > > 
> > > > >    hw/char/escc.c | 208
> > > > > ++---
> > > > >    hw/ppc/mac_newworld.c  |  19 -
> > > > >    hw/ppc/mac_oldworld.c  |  19 -
> > > > >    hw/sparc/sun4m.c   |  34 +++-
> > > > >    include/hw/char/escc.h |  54 +++--
> > > > >    5 files changed, 170 insertions(+), 164 deletions(-)
> > > > > 
> > > > > diff --git a/hw/char/escc.c b/hw/char/escc.c
> > > > > index 3ab831a6a7..bb735cc0c8 100644
> > > > > --- a/hw/char/escc.c
> > > > > +++ b/hw/char/escc.c
> > > > > @@ -26,10 +26,7 @@
> > > > >    #include "hw/hw.h"
> > > > >    #include "hw/sysbus.h"
> > > > >    #include "hw/char/escc.h"
> > > > > -#include "chardev/char-fe.h"
> > > > > -#include "chardev/char-serial.h"
> > > > >    #include "ui/console.h"
> > > > > -#include "ui/input.h"
> > > > >    #include "trace.h"
> > > > >      /*
> > > > > @@ -64,53 +61,7 @@
> > > > >     *  2010-May-23  Artyom Tarasenko:  Reworked IUS logic
> > > > >     */
> > > > >    -typedef enum {
> > > > > -    chn_a, chn_b,
> > > > > -} ChnID;
> > > > > -
> > > > > -#define CHN_C(s) ((s)->chn == chn_b? 'b' : 'a')
> > > > > -
> > > > > -typedef enum {
> > > > > -    ser, kbd, mouse,
> > > > > -} ChnType;
> > > > > -
> > > > > -#define SERIO_QUEUE_SIZE 256
> > > > > -
> > > > > -typedef struct {
> > > > > -    uint8_t data[SERIO_QUEUE_SIZE];
> > > > > -    int rptr, wptr, count;
> > > > > -} SERIOQueue;
> > > > > -
> > > > > -#define SERIAL_REGS 16
> > > > > -typedef struct ChannelState {
> > > > > -    qemu_irq irq;
> > > > > -    uint32_t rxint, txint, rxint_under_svc, txint_under_svc;
> > > > > -    struct ChannelState *otherchn;
> > > > > -    uint32_t reg;
> > > > > -    uint8_t wregs[SERIAL_REGS], rregs[SERIAL_REGS];
> > > > > -    SERIOQueue queue;
> > > > > -    CharBackend chr;
> > > > > -    int e0_mode, led_mode, caps_lock_mode, num_lock_mode;
> > > > > -    int disabled;
> > > > > -    int clock;
> > > > > -    uint32_t vmstate_dummy;
> > > > > -    ChnID chn; // this channel, A (base+4) or B (base+0)
> > > > > -    ChnType type;
> > > > > -    uint8_t rx, tx;
> > > > > -    QemuInputHandlerState *hs;
> > > > > -} ChannelState;
> > > > > -
> > > > > -#define ESCC(obj) OBJECT_CHECK(ESCCState, (obj), TYPE_ESCC)
> > > > > -
> > > > > -typedef struct ESCCState {
> > > > > -    SysBusDevice parent_obj;
> > > > > -
> > > > > -    struct ChannelState chn[2];
> > > > > -    uint32_t it_shift;
> > > > > -    MemoryRegion mmio;
> > > > > -    uint32_t disabled;
> > > > > -    uint32_t frequency;
> > > > > -} ESCCState;
> > > > > +#define CHN_C(s) ((s)->chn == escc_chn_b ? 'b' : 'a')
> > > > >      #define SERIAL_CTRL 0
> > > > >    #define SERIAL_DATA 1
> > > > > @@ -214,44 +165,47 @@ typedef struct ESCCState {
> > > > >    #define R_MISC1I 14
> > > > >    #define R_EXTINT 15
> > > > >    -static void handle_kbd_command(ChannelState *s, int val);
> > > > > +static void handle_kbd_command(ESCCChannelState *s, int val);
> > > > >    static int serial_can_receive(void *opaque);
> > > > > -static void serial_receive_byte(ChannelState *s, int ch);
> > > > > +static void serial_receive_byte(ESCCChannelState *s, int ch);
> > > > >      static void clear_queue(void *opaque)
> > > > >    {
> > > > > -    ChannelState *s = opaque;
> > > > > -    SERIOQueue *q = &s->queue;
> > > > > +    ESCCChannelState *s = opaque;
> > > > > +    ESCCSERIOQueue *q = &s->queue;
> > > > >    q->rptr = q->wptr = q->count = 0;
> > > > >    }
> > > > >      static void put_queue(void *opaque, int b)
> > > > >    {
> > > > > -    ChannelState *s = opaque;
> > > > > -    SERIOQueue *q = &s->queue;
> > > > > +    ESCCChannelState *s = opaque;
> > > > > +    ESCCSERIOQueue *q = &s->queue

Re: [Qemu-devel] [PATCH v6 27/28] migration/qmp: add command migrate-pause

2018-02-13 Thread Peter Xu

On Tue, Feb 13, 2018 at 08:11:00PM +, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > It pauses an ongoing migration.  Currently it only supports postcopy.
> > Note that this command will work on either side of the migration.
> > Basically when we trigger this on one side, it'll interrupt the other
> > side as well since the other side will get notified on the disconnect
> > event.
> > 
> > However, it's still possible that the other side is not notified, for
> > example, when the network is totally broken, or due to some firewall
> > configuration changes.  In that case, we will also need to run the same
> > command on the other side so both sides will go into the paused state.
> > 
> > Signed-off-by: Peter Xu 
> > ---
> >  migration/migration.c | 27 +++
> >  qapi/migration.json   | 16 
> >  2 files changed, 43 insertions(+)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index bb57ed9ade..139abec0c3 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1448,6 +1448,33 @@ void qmp_migrate_recover(const char *uri, Error 
> > **errp)
> >  qemu_start_incoming_migration(uri, errp);
> >  }
> >  
> > +void qmp_migrate_pause(Error **errp)
> > +{
> > +MigrationState *ms = migrate_get_current();
> > +MigrationIncomingState *mis = migration_incoming_get_current();
> > +int ret;
> > +
> > +if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> > +/* Source side, during postcopy */
> > +ret = qemu_file_shutdown(ms->to_dst_file);
> 
> This doesn't feel thread safe; although I'm not sure how to make it so.
> If the migration finishes just after we check the state but before the
> shutdown we end up using a bogus QEMUFile*
> Making all the places that close a QEMUFile* set the pointer to NULL before
> they do the close doesn't help because you still race with that.
> 
> (The race is small, but still)

IMHO we can fix it by adding a migration lock for management code. If
you see my previous migrate cleanup series, it's in my todo. ;)

The basic idea is that we take the lock for critical paths (but not
during most of the migration process).  E.g., we may need the lock
for:

- very beginning of migration, during setup
- reaching the end of migration
- every single migration QMP command (since HMP calls them so HMP will
  also acquire the lock)
- anywhere else I didn't mention that may be necessary, e.g., when we
  change the migrate state and do something else at the same time;
  basically that should be an "atomic operation", and we need the lock
  to make sure of that.
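
A minimal sketch of that idea, with hypothetical names (this is not existing QEMU code, and a real version would have to fit in with the migration thread and the other locks already in play): the state check and the file shutdown happen under one lock, and the completion path takes the same lock before it clears the file pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef enum {
    DEMO_STATUS_NONE,
    DEMO_STATUS_POSTCOPY_ACTIVE,
    DEMO_STATUS_COMPLETED,
} DemoStatus;

/* Stand-in for a QEMUFile; only counts shutdown requests. */
typedef struct {
    int shutdown_calls;
} DemoFile;

typedef struct {
    pthread_mutex_t lock;   /* hypothetical migration management lock */
    DemoStatus state;
    DemoFile *to_dst_file;
} DemoMigration;

/* QMP command side: check the state and use the file under the lock,
 * so the file cannot go away between the check and the shutdown. */
static int demo_migrate_pause(DemoMigration *ms)
{
    int ret = -1;

    assert(ms != NULL);
    pthread_mutex_lock(&ms->lock);
    if (ms->state == DEMO_STATUS_POSTCOPY_ACTIVE && ms->to_dst_file) {
        ms->to_dst_file->shutdown_calls++;
        ret = 0;
    }
    pthread_mutex_unlock(&ms->lock);
    return ret;
}

/* End-of-migration side: change the state and drop the file pointer
 * as one atomic step with respect to the commands above. */
static void demo_migration_complete(DemoMigration *ms)
{
    assert(ms != NULL);
    pthread_mutex_lock(&ms->lock);
    ms->state = DEMO_STATUS_COMPLETED;
    ms->to_dst_file = NULL;
    pthread_mutex_unlock(&ms->lock);
}
```

The lock is held only around these short critical sections, not during the bulk of the migration, which matches the list above.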

For the recovery series, I would prefer that we ignore this issue for
now, since AFAICT this problem has been there for quite a long time in
the whole migration code rather than in this series only, and we need
to solve it once and for all.

Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v6 26/28] hmp/migration: add migrate_recover command

2018-02-13 Thread Peter Xu

On Tue, Feb 13, 2018 at 07:45:09PM +, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > Sister command to migrate-recover in QMP.
> > 
> > Signed-off-by: Peter Xu 
> 
> Yes, useful for testing, although we don't have any OOB equivalent yet,
> something I need to look at.

That'll be nice.

> 
> Reviewed-by: Dr. David Alan Gilbert 

Thanks,

-- 
Peter Xu

[Qemu-devel] Assigning network devices to nested VMs results in driver errors in nested VMs

2018-02-13 Thread Jintack Lim

Hi,

I'm trying to assign network devices to nested VMs on x86 using KVM,
but I got network device driver errors in the nested VMs. (I've tried
this about a year ago when vIOMMU patches were not upstreamed, and I
got similar errors at that time.)

This could be network driver issues, but I'd like to get some help if
somebody encountered similar issues.

I'm using v4.15.0 kernel and v2.11.0 QEMU, and I followed this [1]
guide. I had no problem with assigning devices to the first level VMs
(L1 VMs). And I also checked that the devices were assigned to nested
VMs with the lspci command in the nested VMs. But network device
drivers failed to initialize the device. I tried two network cards -
Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection and
Mellanox Technologies MT27500 Family.

Intel driver error in the nested VM looks like this.
[1.939552] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver -
version 5.1.0-k
[1.949796] ixgbe: Copyright (c) 1999-2016 Intel Corporation.
[2.210024] ixgbe :00:04.0: HW Init failed: -12
[2.218144] ixgbe: probe of :00:04.0 failed with error -12

and I saw lots of these messages in the host (L0) kernel log when
booting the nested VM.

[ 1557.404173] DMAR: DRHD: handling fault status reg 102
[ 1557.409813] DMAR: [DMA Read] Request device [06:00.0] fault addr
9 [fault reason 06] PTE Read access is not set
[ 1561.383957] DMAR: DRHD: handling fault status reg 202
[ 1561.389598] DMAR: [DMA Read] Request device [06:00.0] fault addr
9 [fault reason 06] PTE Read access is not set

This is Mellanox driver error in another nested VM.
[2.481694] mlx4_core: Initializing :00:04.0
[3.519422] mlx4_core :00:04.0: Installed FW has unsupported
command interface revision 0
[3.537769] mlx4_core :00:04.0: (Installed FW version is 0.0.000)
[3.551733] mlx4_core :00:04.0: This driver version supports
only revisions 2 to 3
[3.568758] mlx4_core :00:04.0: QUERY_FW command failed, aborting
[3.582789] mlx4_core :00:04.0: Failed to init fw, aborting.

The host showed similar messages as above.

I wonder what could be the cause of these errors. Please let me know
if further information is needed.

[1] https://wiki.qemu.org/Features/VT-d

Thanks,
Jintack

Re: [Qemu-devel] [qemu-web PATCH] Add a blog post documenting Spectre/Meltdown options for QEMU 2.11.1

2018-02-13 Thread Bruce Rogers

On 2/13/2018 at 5:11 PM, Michael Roth  wrote:
> This blog entry is intended as a follow-up to the original entry in
> January regarding Spectre/Meltdown and the proposed changes to address
> them in the upcoming 2.11.1 release.
> 
> This entry is meant to accompany the 2.11.1 release (planned for
> 2018-02-14) and document how to make use of the new options for
> various architectures.
> 
> Cc: Eduardo Habkost 
> Cc: Paolo Bonzini 
> Cc: Peter Maydell 
> Cc: Suraj Jitindar Singh 
> Cc: David Gibson 
> Cc: Christian Borntraeger 
> Cc: Cornelia Huck 
> Cc: Thomas Huth 
> Signed-off-by: Michael Roth 
> ---
> 
> The pseries/s390 bits have gotten some initial review (thanks Suraj/Christian),
> but it can definitely use some additional review on the x86 side of things.
> 
> Also, Peter, if you think anything extra should be mentioned on the ARM side, just
> let me know what to add.
> 
>  .../2018-02-14-qemu-2-11-1-and-spectre-update.md   | 180 +
>  1 file changed, 180 insertions(+)
>  create mode 100644 _posts/2018-02-14-qemu-2-11-1-and-spectre-update.md
> 
> diff --git a/_posts/2018-02-14-qemu-2-11-1-and-spectre-update.md b/_posts/2018-02-14-qemu-2-11-1-and-spectre-update.md
> new file mode 100644
> index 000..7cdea59
> --- /dev/null
> +++ b/_posts/2018-02-14-qemu-2-11-1-and-spectre-update.md
> @@ -0,0 +1,180 @@
> +---
> +layout: post
> +title:  "QEMU 2.11.1 and making use of Spectre/Meltdown mitigation for KVM guests"
> +date: 2018-02-14 10:35:44 -0600
> +author: Michael Roth
> +categories: [meltdown, spectre, security, x86, ppc, s390, releases, 'qemu 2.11']
> +---
> +
> +In a [previous post](https://www.qemu.org/2018/01/04/spectre/) it was
> +detailed how QEMU/KVM might be affected by Spectre/Meltdown attacks, and what
> +the plan was to mitigate them in QEMU 2.11.1 (and eventually QEMU 2.12).
> +
> +QEMU 2.11.1 is now available, and contains the aforementioned mitigations for
> +x86 guests, along with additional mitigation functionality for pseries and
> +s390 guests (ARM guests do not currently require additional QEMU patches).
> +However, enabling this functionality requires additional configuration beyond
> +just updating QEMU, which we hope to address with this post.
> +
> +Please note that, as mentioned in the previous blog post, QEMU/KVM generally
> +has the same requirements as other unprivileged processes running on the
> +host WRT Spectre/Meltdown mitigation. What is being addressed here is
> +enabling a guest operating system to enable the same (or similar) mitigations
> +to protect itself from unprivileged guest processes. Thus, the
> +patches/requirements listed here are specific to that goal and should not be
> +regarded as the full set of requirements to enable mitigations on the host
> +side (though in some cases there is some overlap between the two WRT required
> +patches/etc).
> +
> +Also please note that this is a best-effort from the QEMU/KVM community, and
> +these mitigations rely on a mix of additional kernel/firmware/microcode
> +updates that are in some cases not available publicly, or may not yet be
> +implemented in some distros, so users are highly encouraged to consult with
> +their respective vendors/distros to confirm whether all the required
> +components are in place. We do our best to highlight the requirements here,
> +but this may not be an exhaustive list.
> +
> +
> +## enabling mitigations for x86 KVM guests
> +
> +For x86 guests there are 2 additional CPU flags associated with
> +Spectre/Meltdown mitigation: **spec-ctrl**, and **ibpb**. These flags
> +expose additional functionality made available through new microcode
> +updates for certain Intel/AMD processors that can be used to mitigate
> +various attack vectors related to Spectre. (Meltdown mitigation via KPTI
> +does not require additional CPU functionality or microcode, and does not
> +require an updated QEMU, only the related guest/host kernel patches).
> +
> +These CPU flags:
> +
> +* spec‑ctrl: exposes Indirect Branch Restricted Speculation (IBRS)
> +* ibpb: exposes Indirect Branch Prediction Barriers
> +
> +are both features requiring guest/host kernel updates, as well as
> +microcode updates for Intel and recent AMD processors. The status of
> +these kernel patches upstream is still in flux, but most supported
> +distros have some form of the patches that is sufficient to make use
> +of the features. The current status/availability of microcode updates
> +depends on your CPU architecture/model. Please check with your
> +vendor/distro to confirm these prerequisites are available/installed.
> +
> +Generally, for Intel CPUs with updated microcode, **spec‑ctrl** will
> +enable both IBRS and IBPB functionality. For AMD EPYC processors,
> +**ibpb** can be used to enable IBPB specifically, and is thought to
> +be sufficient by itself that particular architecture.

be sufficient by itself for that particular architecture

> +
> +These flags can be set

Re: [Qemu-devel] [PATCH v6 25/28] qmp/migration: new command migrate-recover

2018-02-13 Thread Peter Xu

On Tue, Feb 13, 2018 at 06:56:51PM +, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > The first allow-oob=true command.  It's used on destination side when
> > the postcopy migration is paused and ready for a recovery.  After
> > execution, a new migration channel will be established for postcopy to
> > continue.
> > 
> > Signed-off-by: Peter Xu 
> > ---
> >  migration/migration.c | 26 ++
> >  migration/migration.h |  1 +
> >  migration/savevm.c|  3 +++
> >  qapi/migration.json   | 20 
> >  4 files changed, 50 insertions(+)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index cf3a3f416c..bb57ed9ade 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1422,6 +1422,32 @@ void qmp_migrate_incoming(const char *uri, Error 
> > **errp)
> >  once = false;
> >  }
> >  
> > +void qmp_migrate_recover(const char *uri, Error **errp)
> > +{
> > +MigrationIncomingState *mis = migration_incoming_get_current();
> > +
> > +if (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > +error_setg(errp, "Migrate recover can only be run "
> > +   "when postcopy is paused.");
> > +return;
> > +}
> 
> OK, if it did come back as Paused I don't think it can leave it again
> except this way, so I'm not too worried it being thread safe.
> 
> > +if (mis->postcopy_recover_triggered) {
> > +error_setg(errp, "Migrate recovery is triggered already");
> > +return;
> > +}
> > +
> > +/* This will make sure we'll only allow one recover for one pause */
> > +mis->postcopy_recover_triggered = true;
> 
> However, does that need to be done with a:
>    if (atomic_cmpxchg(&mis->postcopy_recover_triggered, false, true) ==
>        true) {
>       error_setg(errp, "Migrate recovery is triggered already");
>       return;
>    }
> 
> for the slim chance that someone did this command on the main and the
> oob monitor?

Yes, slim chance, but I agree. :)

I wasn't that strict about this, but I should be.  Since we are at it, maybe
I'll also...

> 
> Dave
> 
> > +/*
> > + * Note that this call will never start a real migration; it will
> > + * only re-setup the migration stream and poke existing migration
> > + * to continue using that newly established channel.
> > + */
> > +qemu_start_incoming_migration(uri, errp);
> > +}
> > +
> >  bool migration_is_blocked(Error **errp)
> >  {
> >  if (qemu_savevm_state_blocked(errp)) {
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 88f5614b90..581bf4668b 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -65,6 +65,7 @@ struct MigrationIncomingState {
> >  QemuSemaphore colo_incoming_sem;
> >  
> >  /* notify PAUSED postcopy incoming migrations to try to continue */
> > +bool postcopy_recover_triggered;
> >  QemuSemaphore postcopy_pause_sem_dst;
> >  QemuSemaphore postcopy_pause_sem_fault;
> >  };
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index d40092a2b6..5f41b062ba 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -2182,6 +2182,9 @@ static bool 
> > postcopy_pause_incoming(MigrationIncomingState *mis)
> >  /* Notify the fault thread for the invalidated file handle */
> >  postcopy_fault_thread_notify(mis);
> >  
> > +/* Clear the triggered bit to allow one recovery */
> > +mis->postcopy_recover_triggered = false;
> > +

... move this clear operation above migrate_set_state(), since there is
also a slim chance that we end up handling migrate-recover before
postcopy_recover_triggered has been reset to false.
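
As a minimal sketch of the two points discussed above (claiming the
trigger atomically, and clearing it before the paused state becomes
visible), the following uses C11 <stdatomic.h> instead of QEMU's
atomic_cmpxchg() helper. The struct and the sketch_* function names are
hypothetical stand-ins for illustration, not the code in this series:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of MigrationIncomingState. */
typedef struct {
    atomic_bool recover_triggered;
    atomic_bool paused;
} IncomingSketch;

/* Pause path: clear the trigger first, then publish the paused state. */
static void sketch_enter_paused(IncomingSketch *mis)
{
    atomic_store(&mis->recover_triggered, false);
    atomic_store(&mis->paused, true);
}

/* Returns true only for the single caller that wins the race. */
static bool sketch_try_recover(IncomingSketch *mis)
{
    bool expected = false;

    if (!atomic_load(&mis->paused)) {
        return false; /* recovery is only valid while paused */
    }
    /* Succeeds only if the flag was still false, and sets it to true. */
    return atomic_compare_exchange_strong(&mis->recover_triggered,
                                          &expected, true);
}
```

With this ordering, a recover request that observes the paused state can
never find a stale "triggered" flag left over from an earlier pause.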

Thanks,

-- 
Peter Xu

[Qemu-devel] [Bug 916720] Re: select fails on windows because a non-socket fd is in the rfds set

2018-02-13 Thread Launchpad Bug Tracker

[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/916720

Title:
  select fails on windows because a non-socket fd is in the rfds set

Status in QEMU:
  Expired

Bug description:
  The select call in file main_loop.c at line 460 fails on windows
  because a non-socket fd is in the rfds set. As a result, gdb remote
  connections will never be accepted by qemu. The select function
  returns with -1. WSAGetLastError returns code 10038 (WSAENOTSOCK).

  I start qemu as follows:
  qemu-system-arm -cpu cortex-m3 -M lm3s6965evb -nographic -monitor null -serial null -semihosting -kernel test1.elf -S -gdb tcp:127.0.0.1:2200

  qemu is configured with:
  CFLAGS="-O4 -march=i686"
  configure --target-list="i386-softmmu arm-softmmu sparc-softmmu ppc-softmmu" --prefix=/home/qemu/install --cc=mingw32-gcc --host-cc=mingw32-gcc --audio-drv-list="dsound sdl" --audio-card-list="ac97 es1370 sb16 cs4231a adlib gus"

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/916720/+subscriptions

[Qemu-devel] [Bug 855630] Re: Cant Run Wine (posix not nptl) past 0.14.1

2018-02-13 Thread Launchpad Bug Tracker

[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/855630

Title:
  Cant Run Wine (posix not nptl) past 0.14.1

Status in QEMU:
  Expired

Bug description:
  when trying to build qemu I can build with ./configure --static
  --target-list=i386-linux-user just fine with 0.14.1

  But when I try to go on 0.15.0 or higher (tested on 0.15.0). About to
  test on 0.15.5 from git

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/855630/+subscriptions

Re: [Qemu-devel] [PATCH v6 21/28] migration: setup ramstate for resume

2018-02-13 Thread Peter Xu

On Tue, Feb 13, 2018 at 06:17:51PM +, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > After we updated the dirty bitmaps of ramblocks, we also need to update
> > the critical fields in RAMState to make sure it is ready for a resume.
> > 
> > Signed-off-by: Peter Xu 
> > ---
> >  migration/ram.c| 40 +++-
> >  migration/trace-events |  1 +
> >  2 files changed, 40 insertions(+), 1 deletion(-)
> > 
> > diff --git a/migration/ram.c b/migration/ram.c
> > index a2a4b05d5c..d275875f54 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -2250,6 +2250,36 @@ static int ram_init_all(RAMState **rsp)
> >  return 0;
> >  }
> >  
> > +static void ram_state_resume_prepare(RAMState *rs, QEMUFile *out)
> > +{
> > +RAMBlock *block;
> > +long pages = 0;
> > +
> > +/*
> > + * Postcopy is not using xbzrle/compression, so no need for that.
> > + * Also, since source are already halted, we don't need to care
> > + * about dirty page logging as well.
> > + */
> > +
> > +RAMBLOCK_FOREACH(block) {
> > +pages += bitmap_count_one(block->bmap,
> > +  block->used_length >> TARGET_PAGE_BITS);
> > +}
> > +
> > +/* This may not be aligned with current bitmaps. Recalculate. */
> > +rs->migration_dirty_pages = pages;
> 
> migration_dirty_pages is uint64_t - so we should probably do the cast
> above and keep 'pages' as uint64_t.

Sure.
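
A minimal sketch of the suggestion above, assuming a plain array of
unsigned long as the bitmap: the per-block count and the running total
are both uint64_t, so no narrowing happens before the value is stored.
The sketch_* helpers are hypothetical and are not QEMU's
bitmap_count_one():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count set bits in the first nbits of a bitmap. */
static uint64_t sketch_count_dirty(const unsigned long *bmap, size_t nbits)
{
    const size_t bits_per_word = sizeof(unsigned long) * 8;
    uint64_t pages = 0;

    for (size_t i = 0; i < nbits; i++) {
        if (bmap[i / bits_per_word] & (1UL << (i % bits_per_word))) {
            pages++;
        }
    }
    return pages;
}

/* Sum over several blocks, keeping the accumulator 64-bit throughout. */
static uint64_t sketch_total_dirty(const unsigned long *const *bmaps,
                                   const size_t *nbits, size_t nblocks)
{
    uint64_t pages = 0;

    for (size_t b = 0; b < nblocks; b++) {
        pages += sketch_count_dirty(bmaps[b], nbits[b]);
    }
    return pages;
}
```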

> 
> > +rs->last_seen_block = NULL;
> > +rs->last_sent_block = NULL;
> > +rs->last_page = 0;
> > +rs->last_version = ram_list.version;
> 
> Do you need to explicitly set
>rs->ram_bulk_stage = false;
> 
> if the failure happened just after the start of postcopy and no
> requested pages had been sent, I think it might still  be set?

Could you elaborate on what would go wrong if it's still set?

Thanks,

-- 
Peter Xu

[Qemu-devel] [QEMU-PPC PATCH 3/3] ppc/spapr-caps: For pseries-2.12 change spapr-cap defaults

2018-02-13 Thread Suraj Jitindar Singh

For the pseries-2.12 machine type, make the spapr-caps SPAPR_CAP_CFPC
and SPAPR_CAP_SBBC default to workaround. This means the guest will
be able to take advantage of these workarounds by default, so long
as the host is capable.

Signed-off-by: Suraj Jitindar Singh 
---
 hw/ppc/spapr.c  | 11 ++-
 hw/ppc/spapr_caps.c | 10 ++
 include/hw/compat.h |  2 ++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 32a876be56..cd4a024660 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3941,13 +3941,20 @@ static const TypeInfo spapr_machine_info = {
 /*
  * pseries-2.12
  */
+#define SPAPR_COMPAT_2_12  \
+HW_COMPAT_2_12
+
 static void spapr_machine_2_12_instance_options(MachineState *machine)
 {
 }
 
 static void spapr_machine_2_12_class_options(MachineClass *mc)
 {
-/* Defaults for the latest behaviour inherited from the base class */
+sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
+
+smc->default_caps.caps[SPAPR_CAP_CFPC] = SPAPR_CAP_WORKAROUND;
+smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_WORKAROUND;
+SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_12);
 }
 
 DEFINE_SPAPR_MACHINE(2_12, "2.12", true);
@@ -3969,6 +3976,8 @@ static void spapr_machine_2_11_class_options(MachineClass *mc)
 
 spapr_machine_2_12_class_options(mc);
 smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_ON;
+smc->default_caps.caps[SPAPR_CAP_CFPC] = SPAPR_CAP_BROKEN;
+smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
 SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_11);
 }
 
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 1cd73b617f..3dda1db812 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -283,11 +283,21 @@ static sPAPRCapabilities default_caps_with_cpu(sPAPRMachineState *spapr,
 
 caps = smc->default_caps;
 
+if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_3_00,
+  0, spapr->max_compat_pvr)) {
+caps.caps[SPAPR_CAP_CFPC] = SPAPR_CAP_BROKEN;
+}
+
 if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_2_07,
   0, spapr->max_compat_pvr)) {
 caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF;
 }
 
+if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_2_06_PLUS,
+  0, spapr->max_compat_pvr)) {
+caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
+}
+
 if (!ppc_check_compat(cpu, CPU_POWERPC_LOGICAL_2_06,
   0, spapr->max_compat_pvr)) {
 caps.caps[SPAPR_CAP_VSX] = SPAPR_CAP_OFF;
diff --git a/include/hw/compat.h b/include/hw/compat.h
index 7f31850dfa..13238239da 100644
--- a/include/hw/compat.h
+++ b/include/hw/compat.h
@@ -1,6 +1,8 @@
 #ifndef HW_COMPAT_H
 #define HW_COMPAT_H
 
+#define HW_COMPAT_2_12
+
 #define HW_COMPAT_2_11 \
 {\
 .driver   = "hpet",\
-- 
2.13.6
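
The defaulting scheme in the patch above can be sketched in isolation
roughly as follows. This is a simplified model under stated assumptions:
the Sketch* types, the sketch_* functions and the two boolean
"CPU supports" parameters are hypothetical and stand in for
sPAPRMachineClass, the class_options hooks and the ppc_check_compat()
calls respectively:

```c
#include <assert.h>

/* Assumed capability levels and indices, for this sketch only. */
enum { SK_BROKEN = 0, SK_WORKAROUND = 1, SK_FIXED = 2 };
enum { SK_CFPC, SK_SBBC, SK_NUM };

typedef struct {
    unsigned char caps[SK_NUM];
} SketchCaps;

/* Newest machine type: both mitigations default to "workaround". */
static void sketch_2_12_options(SketchCaps *c)
{
    c->caps[SK_CFPC] = SK_WORKAROUND;
    c->caps[SK_SBBC] = SK_WORKAROUND;
}

/* Older machine type: inherit the newer defaults, then restore the
 * previous behaviour so existing guests see no change. */
static void sketch_2_11_options(SketchCaps *c)
{
    sketch_2_12_options(c);
    c->caps[SK_CFPC] = SK_BROKEN;
    c->caps[SK_SBBC] = SK_BROKEN;
}

/* Lower the class defaults for CPUs too old to offer a mitigation. */
static SketchCaps sketch_defaults_with_cpu(SketchCaps defaults,
                                           int cpu_has_cfpc,
                                           int cpu_has_sbbc)
{
    if (!cpu_has_cfpc) {
        defaults.caps[SK_CFPC] = SK_BROKEN;
    }
    if (!cpu_has_sbbc) {
        defaults.caps[SK_SBBC] = SK_BROKEN;
    }
    return defaults;
}
```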

[Qemu-devel] [QEMU-PPC PATCH 2/3] ppc/spapr-caps: Disallow setting workaround for spapr-cap-ibs

2018-02-13 Thread Suraj Jitindar Singh

The spapr-cap cap-ibs can only have values broken or fixed as there is
no workaround. Currently setting the value workaround will hit an assert
if the guest makes the hcall h_get_cpu_characteristics.

Report an error when attempting to apply the setting with a more helpful
error message.

Reported-by: Satheesh Rajendran 
Signed-off-by: Suraj Jitindar Singh 
---
 hw/ppc/spapr_caps.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index e69d308560..1cd73b617f 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -205,7 +205,9 @@ static void cap_safe_bounds_check_apply(sPAPRMachineState *spapr, uint8_t val,
 static void cap_safe_indirect_branch_apply(sPAPRMachineState *spapr,
 uint8_t val, Error **errp)
 {
-if (tcg_enabled() && val) {
+if (val == SPAPR_CAP_WORKAROUND) { /* Can only be Broken or Fixed */
+error_setg(errp, "Requested safe indirect branch capability level \"workaround\" not valid, try cap-ibs=fixed");
+} else if (tcg_enabled() && val) {
 /* TODO - for now only allow broken for TCG */
 error_setg(errp, "Requested safe indirect branch capability level not supported by tcg, try a different value for cap-ibs");
 } else if (kvm_enabled() && (val > kvmppc_get_cap_safe_indirect_branch())) {
-- 
2.13.6
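
The check order introduced by this patch can be illustrated with a small
standalone validator. This is only a sketch: sketch_ibs_check(), the
IBS_* constants and the using_tcg/host_level parameters are hypothetical,
and the model treats "not TCG" as KVM, which the real code does not
assume:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed capability levels, for this sketch only. */
enum { IBS_BROKEN = 0, IBS_WORKAROUND = 1, IBS_FIXED = 2 };

/* Returns NULL if the value is acceptable, else a static error message. */
static const char *sketch_ibs_check(int val, int using_tcg, int host_level)
{
    /* Reject the meaningless level first, whatever the accelerator. */
    if (val == IBS_WORKAROUND) {
        return "\"workaround\" is not valid for cap-ibs, try cap-ibs=fixed";
    }
    if (using_tcg && val != IBS_BROKEN) {
        return "not supported by tcg, try a different value for cap-ibs";
    }
    if (!using_tcg && val > host_level) {
        return "not supported by the host, try a different value for cap-ibs";
    }
    return NULL;
}
```

Rejecting the invalid level up front means the user gets a specific
message at startup instead of an assert later in the hcall path.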

[Qemu-devel] [QEMU-PPC PATCH 1/3] ppc/spapr-caps: Change migration macro to take full spapr-cap name

2018-02-13 Thread Suraj Jitindar Singh

Change the macro that generates the vmstate migration field and the needed
function for the spapr-caps to take the full spapr-cap name. This has
the benefit that this instance will be picked up when grepping
for the spapr-caps, and it makes it more obvious what this macro is doing.

Signed-off-by: Suraj Jitindar Singh 
---
 hw/ppc/spapr_caps.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 62efdaee38..e69d308560 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -350,34 +350,34 @@ int spapr_caps_post_migration(sPAPRMachineState *spapr)
 }
 
 /* Used to generate the migration field and needed function for a spapr cap */
-#define SPAPR_CAP_MIG_STATE(cap, ccap)  \
-static bool spapr_cap_##cap##_needed(void *opaque)  \
+#define SPAPR_CAP_MIG_STATE(sname, cap) \
+static bool spapr_cap_##sname##_needed(void *opaque)\
 {   \
 sPAPRMachineState *spapr = opaque;  \
 \
-return spapr->cmd_line_caps[SPAPR_CAP_##ccap] &&\
-   (spapr->eff.caps[SPAPR_CAP_##ccap] !=\
-spapr->def.caps[SPAPR_CAP_##ccap]); \
+return spapr->cmd_line_caps[cap] && \
+   (spapr->eff.caps[cap] != \
+spapr->def.caps[cap]);  \
 }   \
 \
-const VMStateDescription vmstate_spapr_cap_##cap = {\
-.name = "spapr/cap/" #cap,  \
+const VMStateDescription vmstate_spapr_cap_##sname = {  \
+.name = "spapr/cap/" #sname,\
 .version_id = 1,\
 .minimum_version_id = 1,\
-.needed = spapr_cap_##cap##_needed, \
+.needed = spapr_cap_##sname##_needed,   \
 .fields = (VMStateField[]) {\
-VMSTATE_UINT8(mig.caps[SPAPR_CAP_##ccap],   \
+VMSTATE_UINT8(mig.caps[cap],\
   sPAPRMachineState),   \
 VMSTATE_END_OF_LIST()   \
 },  \
 }
 
-SPAPR_CAP_MIG_STATE(htm, HTM);
-SPAPR_CAP_MIG_STATE(vsx, VSX);
-SPAPR_CAP_MIG_STATE(dfp, DFP);
-SPAPR_CAP_MIG_STATE(cfpc, CFPC);
-SPAPR_CAP_MIG_STATE(sbbc, SBBC);
-SPAPR_CAP_MIG_STATE(ibs, IBS);
+SPAPR_CAP_MIG_STATE(htm, SPAPR_CAP_HTM);
+SPAPR_CAP_MIG_STATE(vsx, SPAPR_CAP_VSX);
+SPAPR_CAP_MIG_STATE(dfp, SPAPR_CAP_DFP);
+SPAPR_CAP_MIG_STATE(cfpc, SPAPR_CAP_CFPC);
+SPAPR_CAP_MIG_STATE(sbbc, SPAPR_CAP_SBBC);
+SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
 
 void spapr_caps_reset(sPAPRMachineState *spapr)
 {
-- 
2.13.6
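
A reduced illustration of the macro shape after this change, assuming a
toy state struct: the short name is still token-pasted into identifiers
and stringified, while the capability index is passed as a complete
constant so that searching for it finds the use site. All SKETCH_*/sketch_*
names are hypothetical, and the "needed" test is simplified (the real one
also checks cmd_line_caps):

```c
#include <assert.h>
#include <string.h>

enum { SKETCH_CAP_HTM, SKETCH_CAP_VSX, SKETCH_CAP_NUM };

typedef struct {
    unsigned char eff[SKETCH_CAP_NUM]; /* effective values */
    unsigned char def[SKETCH_CAP_NUM]; /* default values */
} SketchState;

/* sname is pasted into identifiers; cap is the full enum constant. */
#define SKETCH_CAP_MIG_STATE(sname, cap)                         \
static int sketch_cap_##sname##_needed(const SketchState *s)     \
{                                                                \
    return s->eff[cap] != s->def[cap];                           \
}                                                                \
static const char sketch_cap_##sname##_name[] = "spapr/cap/" #sname

SKETCH_CAP_MIG_STATE(htm, SKETCH_CAP_HTM);
SKETCH_CAP_MIG_STATE(vsx, SKETCH_CAP_VSX);
```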

Re: [Qemu-devel] [PATCH v6 2/4] cryptodev: add vhost support

2018-02-13 Thread Zhoujian (jay)

> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Wednesday, February 14, 2018 12:44 AM
> To: Zhoujian (jay) 
> Cc: qemu-devel@nongnu.org; pbonz...@redhat.com; Huangweidong (C)
> ; stefa...@redhat.com; pa...@linux.vnet.ibm.com;
> longpeng ; xin.z...@intel.com; roy.fan.zh...@intel.com;
> Gonglei (Arei) ; wangxin (U)
> 
> Subject: Re: [PATCH v6 2/4] cryptodev: add vhost support
> 
> On Sun, Jan 21, 2018 at 08:54:48PM +0800, Jay Zhou wrote:
> > diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs index
> > 765d363..c65dca2 100644
> > --- a/hw/virtio/Makefile.objs
> > +++ b/hw/virtio/Makefile.objs
> > @@ -7,7 +7,7 @@ common-obj-y += virtio-mmio.o  obj-y += virtio.o
> > virtio-balloon.o
> >  obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
> >  obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o -obj-y += virtio-crypto.o
> > +obj-$(CONFIG_LINUX) += virtio-crypto.o
> >  obj-$(CONFIG_VIRTIO_PCI) += virtio-crypto-pci.o  endif
> >
> 
> This disables virtio crypto completely on non-Linux, which is not nice. We
> should not break working configs.

So if I understand correctly, the virtio crypto device should be compiled in
unconditionally, like this:

obj-y += virtio-crypto.o

> 
> In particular this causes test failures on non-Linux hosts. Peter Maydell was
> kind enough to debug this and reported this backtrace:
> 
> gdb --args ./aarch64-softmmu/qemu-system-aarch64 -device virtio-crypto-pci
> -machine virt [...]
> #0  0x7f7ff450e6fa in _lwp_kill () from /usr/lib/libc.so.12
> #1  0x7f7ff450e385 in abort () from /usr/lib/libc.so.12
> #2  0x7f7ff5c65da2 in g_assertion_message () from /usr/pkg/lib/libglib-2.0.so.0
> #3  0x7f7ff5c65e11 in g_assertion_message_expr () from /usr/pkg/lib/libglib-2.0.so.0
> #4  0x0074dc16 in object_initialize_with_type
> (data=data@entry=0x7f7ff33a2170, size=, type=0x0)
> at /root/qemu/qom/object.c:372
> #5  0x0074de33 in object_initialize (data=data@entry=0x7f7ff33a2170,
> size=, typename=)
> at /root/qemu/qom/object.c:392
> #6  0x004d2293 in virtio_instance_init_common
> (proxy_obj=0x7f7ff339a000, data=0x7f7ff33a2170, vdev_size=,
> vdev_name=) at /root/qemu/hw/virtio/virtio.c:2232
> #7  0x0074db0d in object_initialize_with_type
> (data=data@entry=0x7f7ff339a000, size=33664, type=type@entry=0x7f7ff7b79a80)
> at /root/qemu/qom/object.c:384
> #8  0x0074dc66 in object_new_with_type (type=0x7f7ff7b79a80) at
> /root/qemu/qom/object.c:492
> #9  0x0074deb9 in object_new (typename=typename@entry=0x7f7ff7b454e0
> "virtio-crypto-pci") at /root/qemu/qom/object.c:502
> #10 0x005924d6 in qdev_device_add (opts=0x7f7ff7b4c070,
> errp=errp@entry=0x7f7fda10) at /root/qemu/qdev-monitor.c:615
> #11 0x00594d31 in device_init_func (opaque=,
> opts=, errp=) at /root/qemu/vl.c:2373
> #12 0x00826e56 in qemu_opts_foreach (list=,
> func=func@entry=0x594d0c , opaque=opaque@entry=0x0,
> errp=errp@entry=0x0) at /root/qemu/util/qemu-option.c:1073
> #13 0x008b723d in main (argc=, argv=,
> envp=) at /root/qemu/vl.c:4642
> 
> 
> He explained:
> 
> 
>  ... this is almost certainly the classic "device A depends on device
> B, device B is conditionally compiled but device A isn't"
>  the type that is missing is virtio-crypto-device
>  virtio-crypto.o is built only if CONFIG_LINUX, but virtio-crypto-pci is in
> virtio-crypto-pci.c which is built if CONFIG_VIRTIO_PCI

Okay, I see. Thanks for Peter's help.
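
The failure mode described above can be modelled with a toy type
registry: the proxy type is registered, but the type it instantiates
internally is not, so the lookup comes back empty (type=0x0 in the
backtrace). This is only an illustrative sketch; the sketch_* functions
are hypothetical and much simpler than QEMU's QOM:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_TYPES 8

static const char *sketch_types[SKETCH_MAX_TYPES];
static size_t sketch_ntypes;

/* Stands in for a type being compiled in and registered at startup. */
static void sketch_register(const char *name)
{
    if (sketch_ntypes < SKETCH_MAX_TYPES) {
        sketch_types[sketch_ntypes++] = name;
    }
}

static int sketch_type_exists(const char *name)
{
    for (size_t i = 0; i < sketch_ntypes; i++) {
        if (strcmp(sketch_types[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Device A (proxy) is only usable if device B (child) is built too. */
static int sketch_can_instantiate(const char *proxy, const char *child)
{
    return sketch_type_exists(proxy) && sketch_type_exists(child);
}
```

Either both objects are built under the same condition, or neither is;
anything in between leaves a registered type that cannot be created.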

Regards,
Jay

> 
> 
> --
> MST

Re: [Qemu-devel] [PATCH v6 0/4] cryptodev: add vhost support

2018-02-13 Thread Zhoujian (jay)

> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Wednesday, February 14, 2018 12:47 AM
> To: Zhoujian (jay) 
> Cc: qemu-devel@nongnu.org; pbonz...@redhat.com; Huangweidong (C)
> ; stefa...@redhat.com; pa...@linux.vnet.ibm.com;
> longpeng ; xin.z...@intel.com; roy.fan.zh...@intel.com;
> Gonglei (Arei) ; wangxin (U)
> 
> Subject: Re: [PATCH v6 0/4] cryptodev: add vhost support
> 
> On Sun, Jan 21, 2018 at 08:54:46PM +0800, Jay Zhou wrote:
> > From: Gonglei 
> >
> > I posted the RFC version a few months ago for the DPDK vhost-crypto
> > implementation, and now it's time to send the formal version, because we
> > need a user space scheme for better performance.
> >
> > The vhost user crypto server side patches had been sent to DPDK
> > community, pls see
> 
> I dropped the patchset from the latest pull request.
> Please address the issues found, test the stop path some more and resubmit.

Hi Michael,
Thanks for your help, I'll respin this patchset when the issues are solved.

Regards,
Jay

> 
> Thanks!
> 
> > [RFC PATCH 0/6] lib/librte_vhost: introduce new vhost_user crypto
> > backend support
> > http://dpdk.org/ml/archives/dev/2017-November/081048.html
> >
> > You also can get virtio-crypto polling mode driver from:
> >
> > [PATCH] virtio: add new driver for crypto devices
> > http://dpdk.org/ml/archives/dev/2017-November/081985.html
> >
> > v5 -> v6:
> >   Fix compile error about backends/cryptodev-vhost-user.o and rebase on
> >   the master
> > v4 -> v5:
> >   squash [PATCH v4 5/5] into previous patches [Michael]
> > v3 -> v4:
> >   "[PATCH v4 5/5] cryptodev-vhost-user: depend on CONFIG_VHOST_CRYPTO
> >   and CONFIG_VHOST_USER" newly added to fix compilation dependency
> > [Michael]
> > v2 -> v3:
> >   New added vhost user messages should be sent only when feature
> >   has been successfully negotiated [Michael]
> > v1 -> v2:
> >   Fix compile error on mingw32
> >
> > Gonglei (4):
> >   cryptodev: add vhost-user as a new cryptodev backend
> >   cryptodev: add vhost support
> >   cryptodev-vhost-user: add crypto session handler
> >   cryptodev-vhost-user: set the key length
> >
> >  backends/Makefile.objs|   6 +
> >  backends/cryptodev-builtin.c  |   1 +
> >  backends/cryptodev-vhost-user.c   | 379 ++
> >  backends/cryptodev-vhost.c| 347 +++
> >  configure |  15 ++
> >  docs/interop/vhost-user.txt   |  26 +++
> >  hw/virtio/Makefile.objs   |   2 +-
> >  hw/virtio/vhost-user.c| 104 ++
> >  hw/virtio/virtio-crypto.c |  70 +++
> >  include/hw/virtio/vhost-backend.h |   8 +
> >  include/hw/virtio/virtio-crypto.h |   1 +
> >  include/sysemu/cryptodev-vhost-user.h |  47 +
> >  include/sysemu/cryptodev-vhost.h  | 154 ++
> >  include/sysemu/cryptodev.h|   8 +
> >  qemu-options.hx   |  21 ++
> >  vl.c  |   6 +
> >  16 files changed, 1194 insertions(+), 1 deletion(-)
> >  create mode 100644 backends/cryptodev-vhost-user.c
> >  create mode 100644 backends/cryptodev-vhost.c
> >  create mode 100644 include/sysemu/cryptodev-vhost-user.h
> >  create mode 100644 include/sysemu/cryptodev-vhost.h
> >
> > --
> > 1.8.3.1
> >

Re: [Qemu-devel] [PATCH v6 1/4] cryptodev: add vhost-user as a new cryptodev backend

2018-02-13 Thread Zhoujian (jay)

> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Wednesday, February 14, 2018 12:46 AM
> To: Zhoujian (jay) 
> Cc: qemu-devel@nongnu.org; pbonz...@redhat.com; Huangweidong (C)
> ; stefa...@redhat.com; pa...@linux.vnet.ibm.com;
> longpeng ; xin.z...@intel.com; roy.fan.zh...@intel.com;
> Gonglei (Arei) ; wangxin (U)
> 
> Subject: Re: [PATCH v6 1/4] cryptodev: add vhost-user as a new cryptodev
> backend
> 
> On Sun, Jan 21, 2018 at 08:54:47PM +0800, Jay Zhou wrote:
> > diff --git a/backends/cryptodev-vhost-user.c b/backends/cryptodev-vhost-user.c
> > new file mode 100644
> > index 000..4e63ece
> > --- /dev/null
> > +++ b/backends/cryptodev-vhost-user.c
> > @@ -0,0 +1,333 @@
> > +/*
> > + * QEMU Cryptodev backend for QEMU cipher APIs
> > + *
> > + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
> > + *
> > + * Authors:
> > + *Gonglei 
> > + *
> > + * This library is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2 of the License, or (at your option) any later version.
> > + *
> > + * This library is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with this library; if not, see
> .
> > + *
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "hw/boards.h"
> > +#include "qapi/error.h"
> > +#include "qapi/qmp/qerror.h"
> > +#include "qemu/error-report.h"
> > +#include "standard-headers/linux/virtio_crypto.h"
> > +#include "sysemu/cryptodev-vhost.h"
> > +#include "chardev/char-fe.h"
> > +
> > +
> > +/**
> > + * @TYPE_CRYPTODEV_BACKEND_VHOST_USER:
> > + * name of backend that uses vhost user server
> > + */
> > +#define TYPE_CRYPTODEV_BACKEND_VHOST_USER "cryptodev-vhost-user"
> > +
> > +#define CRYPTODEV_BACKEND_VHOST_USER(obj) \
> > +OBJECT_CHECK(CryptoDevBackendVhostUser, \
> > + (obj), TYPE_CRYPTODEV_BACKEND_VHOST_USER)
> > +
> > +
> > +typedef struct CryptoDevBackendVhostUser {
> > +CryptoDevBackend parent_obj;
> > +
> > +CharBackend chr;
> > +char *chr_name;
> > +bool opened;
> > +CryptoDevBackendVhost *vhost_crypto[MAX_CRYPTO_QUEUE_NUM];
> > +} CryptoDevBackendVhostUser;
> > +
> > +static int
> > +cryptodev_vhost_user_running(
> > + CryptoDevBackendVhost *crypto) {
> > +return crypto ? 1 : 0;
> > +}
> > +
> > +static void cryptodev_vhost_user_stop(int queues,
> > +  CryptoDevBackendVhostUser *s) {
> > +size_t i;
> > +
> > +for (i = 0; i < queues; i++) {
> > +if (!cryptodev_vhost_user_running(s->vhost_crypto[i])) {
> > +continue;
> > +}
> > +
> > +if (s->vhost_crypto) {
> > +cryptodev_vhost_cleanup(s->vhost_crypto[i]);
> > +s->vhost_crypto[i] = NULL;
> > +}
> > +}
> > +}
> 
> This test is problematic: clang build triggers an error:
> > /home/petmay01/linaro/qemu-for-merges/backends/cryptodev-vhost-user.c:86:16:
> > error: address of array 's->vhost_crypto' will always evaluate to
> > 'true' [-Werror,-Wpointer-bool-conversion]
> > if (s->vhost_crypto) {
> > ~~  ~~~^~~~

This line should be

if (s->vhost_crypto[i]) {
> 
> I really don't see how this could do the right thing, which makes me suspect
> that either you did not test stop, or you always have all queues enabled.
> 
> Pls test a config with some queues disabled.
> 
> In particular this machinery needs some unit tests to catch errors like this.

Okay, will do more tests, sorry about that.
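
For reference, a minimal standalone sketch of the corrected stop loop,
exercised with some queues disabled as requested above. The Sketch*
types and sketch_* functions are hypothetical stand-ins for
CryptoDevBackendVhostUser and cryptodev_vhost_cleanup(); the point is
only that the test must look at the element, not the array:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_QUEUES 4

typedef struct {
    int cleaned;
} SketchQueue;

typedef struct {
    SketchQueue *queue[SKETCH_MAX_QUEUES];
} SketchBackend;

static void sketch_cleanup(SketchQueue *q)
{
    q->cleaned = 1;
}

/* Stops every enabled queue and returns how many were stopped. */
static size_t sketch_stop(SketchBackend *s, size_t queues)
{
    size_t stopped = 0;

    for (size_t i = 0; i < queues && i < SKETCH_MAX_QUEUES; i++) {
        /* Test the element: the array itself is never NULL. */
        if (!s->queue[i]) {
            continue;
        }
        sketch_cleanup(s->queue[i]);
        s->queue[i] = NULL;
        stopped++;
    }
    return stopped;
}
```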

Regards,
Jay

> 
> 
> --
> MST

Re: [Qemu-devel] [PULL 00/26] virtio, vhost, pci, pc: features, fixes and cleanups

2018-02-13 Thread Zhoujian (jay)

> -Original Message-
> From: Qemu-devel [mailto:qemu-devel-bounces+jianjay.zhou=huawei@nongnu.org]
> On Behalf Of Michael S. Tsirkin
> Sent: Wednesday, February 14, 2018 12:52 AM
> To: Peter Maydell 
> Cc: QEMU Developers 
> Subject: Re: [Qemu-devel] [PULL 00/26] virtio, vhost, pci, pc: features,
> fixes and cleanups
> 
> On Tue, Feb 13, 2018 at 04:33:14PM +, Peter Maydell wrote:
> > On 12 February 2018 at 09:35, Peter Maydell 
> wrote:
> > > This asserts in 'make check' for NetBSD, FreeBSD, OpenBSD, OSX:
> > >
> > > TEST: tests/device-introspect-test... (pid=19530)
> > >   /aarch64/device/introspect/list: OK
> > >   /aarch64/device/introspect/list-fields:  OK
> > >   /aarch64/device/introspect/none: OK
> > >   /aarch64/device/introspect/abstract: OK
> > >   /aarch64/device/introspect/concrete: **
> > > ERROR:/root/qemu/qom/object.c:372:object_initialize_with_type:
> > > assertion failed: (type != NULL)
> > > Broken pipe
> > > FAIL
> > > GTester: last random seed: R02Sd4e2c04f6ac00d843a31ccac4de0d914
> > > (pid=12686)
> > >   /aarch64/device/introspect/abstract-interfaces:  OK
> > > FAIL: tests/device-introspect-test
> > >
> > > (other archs fail too, aarch64 is just the first one we hit). This
> > > error is often "device A instantiates device B, but device B is
> > > conditionally compiled in and A is always compiled; so test fails
> > > trying to create device A on hosts where device B isn't built" I think.
> >
> > This part is indeed that problem -- the 'virtio-crypto-device' object
> > is built only if CONFIG_LINUX, but 'virtio-crypto-pci' is built if
> > CONFIG_VIRTIO_PCI, so on systems where the latter is true but not the
> > former you get a virtio-crypto-pci device that crashes when instantiated.
> >
> > The fix should be
> > --- a/hw/virtio/Makefile.objs
> > +++ b/hw/virtio/Makefile.objs
> > @@ -8,7 +8,7 @@ obj-y += virtio.o virtio-balloon.o
> >  obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
> >  obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
> >  obj-$(CONFIG_LINUX) += virtio-crypto.o
> > -obj-$(CONFIG_VIRTIO_PCI) += virtio-crypto-pci.o
> > +obj-$(call land,$(CONFIG_LINUX),$(CONFIG_VIRTIO_PCI)) += virtio-crypto-pci.o
> >  endif
> >
> >  common-obj-$(call lnot,$(CONFIG_LINUX)) += vhost-stub.o
> >
> >
> > thanks
> > -- PMM
> 
> Thanks for your help with this!
> 
> I think the root cause is that one of the patches disables virtio-crypto
> build on non linux which used to be enabled.
> 
> This has been silently done by a patch which was supposed to merely add a
> vhost crypto device, I'm sorry I missed that in the review.

Hi Michael, Peter

Sorry for the late response and the trouble I have caused; I just returned
from a vacation.
Thanks for your help and for pointing out the root cause. I should have
put the config change of virtio-crypto into a separate patch to make
review much easier.

> 
> I've dropped the crypto vhost patches from the pull request for now.

Will fix the issue in the next version, sorry again for the inconvenience.

Regards,
Jay

> 
> Pushed with the same name - should be fine now.
> 
> --
> MST

Re: [Qemu-devel] [PATCH qemu v7 2/4] vfio/pci: Relax DMA map errors for MMIO regions

2018-02-13 Thread David Gibson

On Tue, Feb 13, 2018 at 07:20:56PM +1100, Alexey Kardashevskiy wrote:
> On 13/02/18 16:41, David Gibson wrote:
> > On Tue, Feb 13, 2018 at 04:36:30PM +1100, David Gibson wrote:
> >> On Tue, Feb 13, 2018 at 12:15:52PM +1100, Alexey Kardashevskiy wrote:
> >>> On 13/02/18 03:06, Alex Williamson wrote:
>  On Mon, 12 Feb 2018 18:05:54 +1100
>  Alexey Kardashevskiy  wrote:
> 
> > On 12/02/18 16:19, David Gibson wrote:
> >> On Fri, Feb 09, 2018 at 06:55:01PM +1100, Alexey Kardashevskiy wrote:  
> >>> At the moment if vfio_memory_listener is registered in the system memory
> >>> address space, it maps/unmaps every RAM memory region for DMA.
> >>> It expects system page size aligned memory sections so vfio_dma_map
> >>> would not fail and so far this has been the case. A mapping failure
> >>> would be fatal. A side effect of such behavior is that some MMIO pages
> >>> would silently not be mapped.
> >>>
> >>> However we are going to change MSIX BAR handling so we will end up having
> >>> non-aligned sections in vfio_memory_listener (more details are in
> >>> the next patch) and vfio_dma_map will exit QEMU.
> >>>
> >>> In order to avoid fatal failures on what previously was not a failure and
> >>> was just silently ignored, this checks the section alignment to
> >>> the smallest supported IOMMU page size and prints an error if not aligned;
> >>> it also prints an error if vfio_dma_map failed despite the page size check.
> >>> Both errors are not fatal; only MMIO RAM regions are checked
> >>> (aka "RAM device" regions).
> >>>
> >>> If the amount of errors printed is overwhelming, the MSIX relocation
> >>> could be used to avoid excessive error output.
> >>>
> >>> This is unlikely to cause any behavioral change.
> >>>
> >>> Signed-off-by: Alexey Kardashevskiy   
> >>
> >> There are some relatively superficial problems noted below.
> >>
> >> But more fundamentally, this feels like it's extending an existing
> >> hack past the point of usefulness.
> >>
> >> The explicit check for is_ram_device() here has always bothered me -
> >> it's not like a real bus bridge magically knows whether a target
> >> address maps to RAM or not.
> >>
> >> What I think is really going on is that even for systems without an
> >> IOMMU, it's not really true to say that the PCI address space maps
> >> directly onto address_space_memory.  Instead, there's a large, but
> >> much less than 2^64 sized, "upstream window" at address 0 on the PCI
> >> bus, which is identity mapped to the system bus.  Details will vary
> >> with the system, but in practice we expect nothing but RAM to be in
> >> that window.  Addresses not within that window won't be mapped to the
> >> system bus but will just be broadcast on the PCI bus and might be
> >> picked up as a p2p transaction.  
> >
> > Currently this p2p works only via the IOMMU, direct p2p is not possible 
> > as
> > the guest needs to know physical MMIO addresses to make p2p work and it
> > does not.
> 
>  /me points to the Direct Translated P2P section of the ACS spec, though
>  it's as prone to spoofing by the device as ATS.  In any case, p2p
>  reflected from the IOMMU is still p2p and offloads the CPU even if
>  bandwidth suffers vs bare metal depending on if the data doubles back
>  over any links.  Thanks,
> >>>
> >>> Sure, I was just saying that p2p via IOMMU won't be as simple as broadcast
> >>> on the PCI bus, IOMMU needs to be programmed in advance to make this work,
> >>> and currently that broadcast won't work for the passed through devices.
> >>
> >> Well, sure, p2p in a guest with passthrough devices clearly needs to
> >> be translated through the IOMMU (and p2p from a passthrough to an
> >> emulated device is essentially impossible).
> >>
> >> But.. what does that have to do with this code.  This is the memory
> >> area watcher, looking for memory regions being mapped directly into
> >> the PCI space.  NOT IOMMU regions, since those are handled separately
> >> by wiring up the IOMMU notifier.  This will only trigger if RAM-like,
> >> non-RAM regions are put into PCI space *not* behind an IOMMU.
> > 
> > Duh, sorry, realised I was mixing up host and guest IOMMU.  I guess
> > the point here is that this will map RAM-like devices into the host
> > IOMMU when there is no guest IOMMU, allowing p2p transactions between
> > passthrough devices (though not from passthrough to emulated devices).
> 
> Correct.
> 
> > 
> > The conditions still seem kind of awkward to me, but I guess it makes
> > sense.
> 
> Is it the time to split this listener to RAM-listener and PCI bus listener?

I'm not really sure what you mean by that.

> On x86 it listens on the "memory" AS, on spapr - on the
> "pci@8002000" AS, this will just create

Re: [Qemu-devel] [PATCH v2] iotests: Test creating overlay when guest running

2018-02-13 Thread Fam Zheng

Ping?

On Thu, Jan 4, 2018 at 6:18 AM, Eric Blake  wrote:
> On 12/24/2017 08:51 PM, Fam Zheng wrote:
>> Signed-off-by: Fam Zheng 
>>
>> ---
>>
>> v2: Actually test the thing. [Kevin]
>> ---
>>  tests/qemu-iotests/153 | 8 +---
>>  tests/qemu-iotests/153.out | 7 ---
>>  2 files changed, 9 insertions(+), 6 deletions(-)
>
> Reviewed-by: Eric Blake 
>
>>
>> diff --git a/tests/qemu-iotests/153 b/tests/qemu-iotests/153
>> index fa25eb24bd..adfd02695b 100755
>> --- a/tests/qemu-iotests/153
>> +++ b/tests/qemu-iotests/153
>> @@ -32,6 +32,7 @@ _cleanup()
>>  {
>>  _cleanup_test_img
>>  rm -f "${TEST_IMG}.base"
>> +rm -f "${TEST_IMG}.overlay"
>
> Trivial conflict with Jeff's work to do per-test temporary directories
> in iotests.
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>

Re: [Qemu-devel] [PATCH 1/5] Add a git-publish configuration file

2018-02-13 Thread Fam Zheng

On Tue, 02/13 18:09, Daniel P. Berrangé wrote:
> On Tue, Feb 13, 2018 at 05:34:25PM +, Stefan Hajnoczi wrote:
> > From: Fam Zheng 
> > 
> > git-publish [1] is a convenient tool to send patches and has been
> > popular among QEMU developers.  Recently it has been made available in
> > Fedora official repo thanks to Stefan's work.
> > 
> > One nice feature of the tool is a per-project configuration with
> > profiles, especially in which the cccmd option is a handy method to
> > create the Cc list.
> > 
> > [1]: https://github.com/stefanha/git-publish
> > 
> > Signed-off-by: Fam Zheng 
> > Reviewed-by: Marc-André Lureau 
> > Message-id: 20180205054725.25634-2-f...@redhat.com
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  .gitpublish | 58 ++
> >  1 file changed, 58 insertions(+)
> >  create mode 100644 .gitpublish
> > 
> > diff --git a/.gitpublish b/.gitpublish
> > new file mode 100644
> > index 00..ed48f6e52c
> > --- /dev/null
> > +++ b/.gitpublish
> > @@ -0,0 +1,58 @@
> > +#
> > +# Common git-publish profiles that can be used to send patches to QEMU 
> > upstream.
> > +#
> > +# See https://github.com/stefanha/git-publish for more information
> > +#
> > +[gitpublishprofile "default"]
> > +base = master
> > +prefix = PATCH
> > +to = qemu-devel@nongnu.org
> > +cccmd = scripts/get_maintainer.pl --noroles --norolestats --nogit 
> > --nogit-fallback 2>/dev/null
> > +
> > +[gitpublishprofile "rfc"]
> > +base = master
> > +prefix = RFC PATCH
> > +to = qemu-devel@nongnu.org
> > +cccmd = scripts/get_maintainer.pl --noroles --norolestats --nogit 
> > --nogit-fallback 2>/dev/null
> > +
> > +[gitpublishprofile "stable"]
> > +base = master
> > +prefix = PATCH
> > +to = qemu-devel@nongnu.org
> > +cc = qemu-sta...@nongnu.org
> > +cccmd = scripts/get_maintainer.pl --noroles --norolestats --nogit 
> > --nogit-fallback 2>/dev/null
> > +
> > +[gitpublishprofile "trivial"]
> > +base = master
> > +prefix = PATCH
> > +to = qemu-devel@nongnu.org
> > +cc = qemu-triv...@nongnu.org
> > +cccmd = scripts/get_maintainer.pl --noroles --norolestats --nogit 
> > --nogit-fallback 2>/dev/null
> > +
> > +[gitpublishprofile "block"]
> > +base = master
> > +prefix = PATCH
> > +to = qemu-devel@nongnu.org
> > +cc = qemu-bl...@nongnu.org
> > +cccmd = scripts/get_maintainer.pl --noroles --norolestats --nogit 
> > --nogit-fallback 2>/dev/null
> 
> Why is a custom entry needed for block here (and other things
> below).   Won't running get_maintainer.pl already correctly
> report when a patch needs cc'ing to qemu-bl...@nongnu.org
> based on MAINTAINER rules ?

Yeah, dropping them should be fine. What do you think, Stefan?

Fam

Re: [Qemu-devel] [PATCH 0/5] Block patches

2018-02-13 Thread Fam Zheng

On Tue, 02/13 17:34, Stefan Hajnoczi wrote:
> The following changes since commit fb68096da3d35e64c88cd610c1fa42766c58e92a:
> 
>   Revert "tests: use memfd in vhost-user-test" (2018-02-13 09:51:52 +)
> 
> are available in the Git repository at:
> 
>   git://github.com/stefanha/qemu.git tags/block-pull-request
> 
> for you to fetch changes up to 64b01feca991e5b19a5d750ef77cdca92b68bdbb:
> 
>   misc: fix spelling (2018-02-13 15:38:17 +)

Did you mean "PULL" in the subject?

Fam

[Qemu-devel] [qemu-web PATCH] Add a blog post documenting Spectre/Meltdown options for QEMU 2.11.1

2018-02-13 Thread Michael Roth

This blog entry is intended as a follow-up to the original entry in
January regarding Spectre/Meltdown and the proposed changes to address
them in the upcoming 2.11.1 release.

This entry is meant to accompany the 2.11.1 release (planned for
2018-02-14) and document how to make use of the new options for
various architectures.

Cc: Eduardo Habkost 
Cc: Paolo Bonzini 
Cc: Peter Maydell 
Cc: Suraj Jitindar Singh 
Cc: David Gibson 
Cc: Christian Borntraeger 
Cc: Cornelia Huck 
Cc: Thomas Huth 
Signed-off-by: Michael Roth 
---

The pseries/s390 bits have gotten some initial review (thanks Suraj/Christian),
but it can definitely use some additional review on the x86 side of things.

Also, Peter, if you think anything extra should be mentioned on the ARM side,
just let me know what to add.

 .../2018-02-14-qemu-2-11-1-and-spectre-update.md   | 180 +
 1 file changed, 180 insertions(+)
 create mode 100644 _posts/2018-02-14-qemu-2-11-1-and-spectre-update.md

diff --git a/_posts/2018-02-14-qemu-2-11-1-and-spectre-update.md 
b/_posts/2018-02-14-qemu-2-11-1-and-spectre-update.md
new file mode 100644
index 000..7cdea59
--- /dev/null
+++ b/_posts/2018-02-14-qemu-2-11-1-and-spectre-update.md
@@ -0,0 +1,180 @@
+---
+layout: post
+title:  "QEMU 2.11.1 and making use of Spectre/Meltdown mitigation for KVM 
guests"
+date: 2018-02-14 10:35:44 -0600
+author: Michael Roth
+categories: [meltdown, spectre, security, x86, ppc, s390, releases, 'qemu 
2.11']
+---
+
+In a [previous post](https://www.qemu.org/2018/01/04/spectre/) it was
+detailed how QEMU/KVM might be affected by Spectre/Meltdown attacks, and what
+the plan was to mitigate them in QEMU 2.11.1 (and eventually QEMU 2.12).
+
+QEMU 2.11.1 is now available, and contains the aforementioned mitigations for
+x86 guests, along with additional mitigation functionality for pseries and
+s390 guests (ARM guests do not currently require additional QEMU patches).
+However, enabling this functionality requires additional configuration beyond
+just updating QEMU, which we hope to address with this post.
+
+Please note that, as mentioned in the previous blog post, QEMU/KVM generally
+has the same requirements as other unprivileged processes running on the
+host WRT Spectre/Meltdown mitigation. What is being addressed here is
+enabling a guest operating system to enable the same (or similar) mitigations
+to protect itself from unprivileged guest processes. Thus, the
+patches/requirements listed here are specific to that goal and should not be
+regarded as the full set of requirements to enable mitigations on the host
+side (though in some cases there is some overlap between the two WRT required
+patches/etc).
+
+Also please note that this is a best-effort from the QEMU/KVM community, and
+these mitigations rely on a mix of additional kernel/firmware/microcode
+updates that are in some cases not available publicly, or may not yet be
+implemented in some distros, so users are highly encouraged to consult with
+their respective vendors/distros to confirm whether all the required
+components are in place. We do our best to highlight the requirements here,
+but this may not be an exhaustive list.
+
+
+## enabling mitigations for x86 KVM guests
+
+For x86 guests there are 2 additional CPU flags associated with
+Spectre/Meltdown mitigation: **spec-ctrl**, and **ibpb**. These flags
+expose additional functionality made available through new microcode
+updates for certain Intel/AMD processors that can be used to mitigate
+various attack vectors related to Spectre. (Meltdown mitigation via KPTI
+does not require additional CPU functionality or microcode, and does not
+require an updated QEMU, only the related guest/host kernel patches).
+
+These CPU flags:
+
+* spec-ctrl: exposes Indirect Branch Restricted Speculation (IBRS)
+* ibpb: exposes Indirect Branch Prediction Barriers
+
+are both features requiring guest/host kernel updates, as well as
+microcode updates for Intel and recent AMD processors. The status of
+these kernel patches upstream is still in flux, but most supported
+distros have some form of the patches that is sufficient to make use
+of the features. The current status/availability of microcode updates
+depends on your CPU architecture/model. Please check with your
+vendor/distro to confirm these prerequisites are available/installed.
+
+Generally, for Intel CPUs with updated microcode, **spec-ctrl** will
+enable both IBRS and IBPB functionality. For AMD EPYC processors,
+**ibpb** can be used to enable IBPB specifically, and is thought to
+be sufficient by itself for that particular architecture.
+
+These flags can be set in a similar manner as other CPU flags, i.e.:
+
+qemu-system-x86_64 -cpu qemu64,+spec-ctrl,... ...
+qemu-system-x86_64 -cpu IvyBridge,+spec-ctrl,... ...
+qemu-system-x86_64 -cpu EPYC,+ibpb
+etc...
+
+Additionally, for management stacks that lack support for setting
+specific CPU flags, a set of new CPU types have been added wh

Re: [Qemu-devel] [PATCH v5 09/23] RISC-V TCG Code Generation

2018-02-13 Thread Emilio G. Cota

On Tue, Feb 13, 2018 at 14:10:20 -0800, Richard Henderson wrote:
> On 02/13/2018 01:55 PM, Emilio G. Cota wrote:
> > Are we planning to use BS_STOP in the future? I see it has no setters,
> > although we check for it in gen_intermediate_code:
> 
> No, but the whole port should be converted to exec/translator.h, which defines
> DisasJumpType.  Not something I'm going to require on initial submission until
> we've gotten most of the other targets cleaned up.

I see. I've just done the conversion for v5:
  https://github.com/cota/qemu/commits/riscv-v5-trloop

Can you please take a look?

Thanks,

Emilio

Re: [Qemu-devel] [PATCH 1/1] hw/ppc/spapr_hcall: set htab_shift after kvmppc_resize_hpt_commit

2018-02-13 Thread David Gibson

On Tue, Feb 13, 2018 at 03:37:16PM -0200, Daniel Henrique Barboza wrote:
> Newer kernels have a htab resize capability when adding or remove
> memory. At these situations, the guest kernel might reallocate its
> htab to a more suitable size based on the resulting memory.
> 
> However, we're not setting the new value back into the machine state
> when a KVM guest resizes its htab. At first this doesn't seem harmful,
> but when migrating or saving the guest state (via virsh managedsave,
> for instance) this mismatch between the htab size of QEMU and the
> kernel makes the guest hangs when trying to load its state.
> 
> Inside h_resize_hpt_commit, the hypercall that commits the hash page
> resize changes, let's set spapr->htab_shift to the new value if we're
> sure that kvmppc_resize_hpt_commit were successful.
> 
> While we're here, add a "not RADIX" sanity check as it is already done
> in the related hypercall h_resize_hpt_prepare.
> 
> Fixes: https://github.com/open-power-host-os/qemu/issues/28
> Reported-by: Satheesh Rajendran 
> Signed-off-by: Daniel Henrique Barboza 

Ouch.  Good catch.  I'm kind of astonished this didn't break even
worse than it did.  Applied.

> ---
>  hw/ppc/spapr_hcall.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 76422cfac1..1986560480 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -731,11 +731,21 @@ static target_ulong h_resize_hpt_commit(PowerPCCPU *cpu,
>  return H_AUTHORITY;
>  }
>  
> +if (!spapr->htab_shift) {
> +/* Radix guest, no HPT */
> +return H_NOT_AVAILABLE;
> +}
> +
>  trace_spapr_h_resize_hpt_commit(flags, shift);
>  
>  rc = kvmppc_resize_hpt_commit(cpu, flags, shift);
>  if (rc != -ENOSYS) {
> -return resize_hpt_convert_rc(rc);
> +rc = resize_hpt_convert_rc(rc);
> +if (rc == H_SUCCESS) {
> +/* Need to set the new htab_shift in the machine state */
> +spapr->htab_shift = shift;
> +}
> +return rc;
>  }
>  
>  if (flags != 0) {

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
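
The control flow of the fix quoted above can be illustrated with a small standalone sketch. It is not the QEMU code: the struct, the return codes, and the callback type are hypothetical stand-ins, and the two stub functions only simulate a KVM commit that succeeds or fails. The point it shows is that the cached shift is updated only after the commit reports success, and that a zero shift (radix guest, no HPT) is rejected up front.

```c
#include <stdint.h>

/* Hypothetical, simplified return codes; the real PAPR values differ. */
enum {
    H_SUCCESS_SKETCH = 0,
    H_NOT_AVAILABLE_SKETCH = -1,
    H_HARDWARE_SKETCH = -2
};

/* Stand-in for the machine state; htab_shift == 0 means "radix, no HPT". */
struct machine_sketch {
    int htab_shift;
};

/* Stand-in for the call that asks the kernel to commit the resize. */
typedef int (*commit_fn)(int shift);

/* Stub: the kernel accepted the new size. */
int commit_ok(int shift)
{
    (void)shift;
    return H_SUCCESS_SKETCH;
}

/* Stub: the kernel rejected the new size. */
int commit_fail(int shift)
{
    (void)shift;
    return H_HARDWARE_SKETCH;
}

/*
 * Commit a hash page table resize and keep the cached shift in sync.
 * The cached value changes only when the commit succeeded, so a later
 * save or migration sees the size the kernel is actually using.
 */
int resize_hpt_commit_sketch(struct machine_sketch *m, int shift,
                             commit_fn commit)
{
    int rc;

    if (!m->htab_shift) {
        /* Radix guest: there is no hash page table to resize. */
        return H_NOT_AVAILABLE_SKETCH;
    }

    rc = commit(shift);
    if (rc == H_SUCCESS_SKETCH) {
        m->htab_shift = shift;
    }
    return rc;
}
```

Leaving the cached value untouched on failure is the conservative choice here: the old size is still the one in effect, so it is the one that should be saved.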


[Qemu-devel] [PATCH 1/2] qcow2: Prefer 'entries' over 'size' for non-byte values in spec

2018-02-13 Thread Eric Blake

We want to limit the use of the term 'size' for only values that
count by bytes.  Renaming fields in the spec does not invalidate
any existing implementation, but may make future implementations
easier to write.

A reasonable followup would be to rename internal qemu code that
operates on qcow2 images to also use the distinction between
size and entries in variable names.

Signed-off-by: Eric Blake 
---
 docs/interop/qcow2.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index d7fdb1fee31..597d3f261d5 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -47,7 +47,7 @@ The first cluster of a qcow2 image contains the file header:
 1 for AES encryption
 2 for LUKS encryption

- 36 - 39:   l1_size
+ 36 - 39:   l1_entries
 Number of entries in the active L1 table

  40 - 47:   l1_table_offset
@@ -538,7 +538,7 @@ Structure of a bitmap directory entry:
 (described below) for the bitmap starts. Must be aligned to
 a cluster boundary.

- 8 - 11:bitmap_table_size
+ 8 - 11:bitmap_table_entries
 Number of entries in the bitmap table of the bitmap.

 12 - 15:flags
-- 
2.14.3

[Qemu-devel] [RFC PATCH 0/2] s/size/entries/ when dealing with non-byte units

2018-02-13 Thread Eric Blake

I mentioned this while reviewing Berto's series on L2 slice handling;
this is a first cut at patches that I think are worth doing throughout
the qcow2 code base if we like the idea.

Eric Blake (2):
  qcow2: Prefer 'entries' over 'size' for non-byte values in spec
  qcow2: Prefer 'entries' over 'size' during cache creation

 docs/interop/qcow2.txt |  4 ++--
 block/qcow2.h  |  4 ++--
 block/qcow2.c  | 21 +++--
 3 files changed, 15 insertions(+), 14 deletions(-)

-- 
2.14.3

[Qemu-devel] [PATCH 2/2] qcow2: Prefer 'entries' over 'size' during cache creation

2018-02-13 Thread Eric Blake

Using 'size' for anything other than bytes is difficult to
reason about; let's rename entries related to the number of
entries in a cache accordingly.

Signed-off-by: Eric Blake 
---
 block/qcow2.h |  4 ++--
 block/qcow2.c | 21 +++--
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 883802241fb..0daf8e6d6f8 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -68,10 +68,10 @@
 #define MAX_CLUSTER_BITS 21

 /* Must be at least 2 to cover COW */
-#define MIN_L2_CACHE_SIZE 2 /* cache entries */
+#define MIN_L2_CACHE_ENTRIES 2

 /* Must be at least 4 to cover all cases of refcount table growth */
-#define MIN_REFCOUNT_CACHE_SIZE 4 /* clusters */
+#define MIN_REFCOUNT_CACHE_ENTRIES 4

 /* Whichever is more */
 #define DEFAULT_L2_CACHE_CLUSTERS 8 /* clusters */
diff --git a/block/qcow2.c b/block/qcow2.c
index 288b5299d80..f25c33df1d1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -843,6 +843,7 @@ static int qcow2_update_options_prepare(BlockDriverState 
*bs,
 const char *opt_overlap_check, *opt_overlap_check_template;
 int overlap_check_template = 0;
 uint64_t l2_cache_size, l2_cache_entry_size, refcount_cache_size;
+uint64_t l2_cache_entries, refcount_cache_entries;
 int i;
 const char *encryptfmt;
 QDict *encryptopts = NULL;
@@ -869,21 +870,21 @@ static int qcow2_update_options_prepare(BlockDriverState 
*bs,
 goto fail;
 }

-l2_cache_size /= l2_cache_entry_size;
-if (l2_cache_size < MIN_L2_CACHE_SIZE) {
-l2_cache_size = MIN_L2_CACHE_SIZE;
+l2_cache_entries = l2_cache_size / l2_cache_entry_size;
+if (l2_cache_entries < MIN_L2_CACHE_ENTRIES) {
+l2_cache_entries = MIN_L2_CACHE_ENTRIES;
 }
-if (l2_cache_size > INT_MAX) {
+if (l2_cache_entries > INT_MAX) {
 error_setg(errp, "L2 cache size too big");
 ret = -EINVAL;
 goto fail;
 }

-refcount_cache_size /= s->cluster_size;
-if (refcount_cache_size < MIN_REFCOUNT_CACHE_SIZE) {
-refcount_cache_size = MIN_REFCOUNT_CACHE_SIZE;
+refcount_cache_entries = refcount_cache_size / s->cluster_size;
+if (refcount_cache_entries < MIN_REFCOUNT_CACHE_ENTRIES) {
+refcount_cache_entries = MIN_REFCOUNT_CACHE_ENTRIES;
 }
-if (refcount_cache_size > INT_MAX) {
+if (refcount_cache_entries > INT_MAX) {
 error_setg(errp, "Refcount cache size too big");
 ret = -EINVAL;
 goto fail;
@@ -908,9 +909,9 @@ static int qcow2_update_options_prepare(BlockDriverState 
*bs,
 }

 r->l2_slice_size = l2_cache_entry_size / sizeof(uint64_t);
-r->l2_table_cache = qcow2_cache_create(bs, l2_cache_size,
+r->l2_table_cache = qcow2_cache_create(bs, l2_cache_entries,
l2_cache_entry_size);
-r->refcount_block_cache = qcow2_cache_create(bs, refcount_cache_size,
+r->refcount_block_cache = qcow2_cache_create(bs, refcount_cache_entries,
  s->cluster_size);
 if (r->l2_table_cache == NULL || r->refcount_block_cache == NULL) {
 error_setg(errp, "Could not allocate metadata caches");
-- 
2.14.3
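
As an illustration of the size/entries distinction this patch draws, here is a minimal standalone sketch of the same arithmetic: a budget in bytes is divided by the per-entry size, clamped to a minimum, and rejected if the result does not fit in an int. The function name and its error convention are assumptions made for the example; they are not part of block/qcow2.c.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Convert a cache budget in bytes into a number of cache entries.
 *
 * size_bytes:  total budget, in bytes
 * entry_size:  size of one cache entry, in bytes
 * min_entries: lower bound on the resulting entry count
 *
 * Returns 0 and stores the count in *entries on success, or -1 if
 * entry_size is zero or the count would exceed INT_MAX.
 */
int cache_bytes_to_entries(uint64_t size_bytes, uint64_t entry_size,
                           uint64_t min_entries, int *entries)
{
    uint64_t n;

    if (entry_size == 0) {
        return -1;
    }

    n = size_bytes / entry_size;   /* bytes -> entries */
    if (n < min_entries) {
        n = min_entries;           /* never go below the minimum */
    }
    if (n > INT_MAX) {
        return -1;                 /* too big for an int-sized count */
    }

    *entries = (int)n;
    return 0;
}
```

Keeping the byte budget and the entry count in separate variables, as the patch does, means neither value is silently reinterpreted halfway through the function.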

Re: [Qemu-devel] [PATCH v3] hw/char: remove legacy interface escc_init()

2018-02-13 Thread Mark Cave-Ayland


On 13/02/18 13:01, Laurent Vivier wrote:


Hi,

can a maintainer of one of the involved parts take this in his
maintenance branch to have this merged?

Thanks,
Laurent

On 29/01/2018 15:21, Laurent Vivier wrote:

Paolo,

I forgot to cc: you for the "MAINTAINERS/Character devices/Odd Fixes".
Could you take this through your branch?

Thanks,
Laurent

On 26/01/2018 16:41, Mark Cave-Ayland wrote:

On 26/01/18 14:47, Laurent Vivier wrote:


Move necessary stuff in escc.h and update type names.
Remove slavio_serial_ms_kbd_init().
Fix code style problems reported by checkpatch.pl
Update mac_newworld, mac_oldworld and sun4m to use directly the
QDEV interface.

Signed-off-by: Laurent Vivier 
Reviewed-by: Philippe Mathieu-Daudé 
---

Notes:
  v3: in sun4m, move comments about Slavio TTY
  above both qdev_create().
  v2: in sun4m, move comments about Slavio TTY close to
  their qdev_prop_set_chr()

   hw/char/escc.c | 208
++---
   hw/ppc/mac_newworld.c  |  19 -
   hw/ppc/mac_oldworld.c  |  19 -
   hw/sparc/sun4m.c   |  34 +++-
   include/hw/char/escc.h |  54 +++--
   5 files changed, 170 insertions(+), 164 deletions(-)

diff --git a/hw/char/escc.c b/hw/char/escc.c
index 3ab831a6a7..bb735cc0c8 100644
--- a/hw/char/escc.c
+++ b/hw/char/escc.c
@@ -26,10 +26,7 @@
   #include "hw/hw.h"
   #include "hw/sysbus.h"
   #include "hw/char/escc.h"
-#include "chardev/char-fe.h"
-#include "chardev/char-serial.h"
   #include "ui/console.h"
-#include "ui/input.h"
   #include "trace.h"
     /*
@@ -64,53 +61,7 @@
    *  2010-May-23  Artyom Tarasenko:  Reworked IUS logic
    */
   -typedef enum {
-    chn_a, chn_b,
-} ChnID;
-
-#define CHN_C(s) ((s)->chn == chn_b? 'b' : 'a')
-
-typedef enum {
-    ser, kbd, mouse,
-} ChnType;
-
-#define SERIO_QUEUE_SIZE 256
-
-typedef struct {
-    uint8_t data[SERIO_QUEUE_SIZE];
-    int rptr, wptr, count;
-} SERIOQueue;
-
-#define SERIAL_REGS 16
-typedef struct ChannelState {
-    qemu_irq irq;
-    uint32_t rxint, txint, rxint_under_svc, txint_under_svc;
-    struct ChannelState *otherchn;
-    uint32_t reg;
-    uint8_t wregs[SERIAL_REGS], rregs[SERIAL_REGS];
-    SERIOQueue queue;
-    CharBackend chr;
-    int e0_mode, led_mode, caps_lock_mode, num_lock_mode;
-    int disabled;
-    int clock;
-    uint32_t vmstate_dummy;
-    ChnID chn; // this channel, A (base+4) or B (base+0)
-    ChnType type;
-    uint8_t rx, tx;
-    QemuInputHandlerState *hs;
-} ChannelState;
-
-#define ESCC(obj) OBJECT_CHECK(ESCCState, (obj), TYPE_ESCC)
-
-typedef struct ESCCState {
-    SysBusDevice parent_obj;
-
-    struct ChannelState chn[2];
-    uint32_t it_shift;
-    MemoryRegion mmio;
-    uint32_t disabled;
-    uint32_t frequency;
-} ESCCState;
+#define CHN_C(s) ((s)->chn == escc_chn_b ? 'b' : 'a')
     #define SERIAL_CTRL 0
   #define SERIAL_DATA 1
@@ -214,44 +165,47 @@ typedef struct ESCCState {
   #define R_MISC1I 14
   #define R_EXTINT 15
   -static void handle_kbd_command(ChannelState *s, int val);
+static void handle_kbd_command(ESCCChannelState *s, int val);
   static int serial_can_receive(void *opaque);
-static void serial_receive_byte(ChannelState *s, int ch);
+static void serial_receive_byte(ESCCChannelState *s, int ch);
     static void clear_queue(void *opaque)
   {
-    ChannelState *s = opaque;
-    SERIOQueue *q = &s->queue;
+    ESCCChannelState *s = opaque;
+    ESCCSERIOQueue *q = &s->queue;
   q->rptr = q->wptr = q->count = 0;
   }
     static void put_queue(void *opaque, int b)
   {
-    ChannelState *s = opaque;
-    SERIOQueue *q = &s->queue;
+    ESCCChannelState *s = opaque;
+    ESCCSERIOQueue *q = &s->queue;
     trace_escc_put_queue(CHN_C(s), b);
-    if (q->count >= SERIO_QUEUE_SIZE)
+    if (q->count >= ESCC_SERIO_QUEUE_SIZE) {
   return;
+    }
   q->data[q->wptr] = b;
-    if (++q->wptr == SERIO_QUEUE_SIZE)
+    if (++q->wptr == ESCC_SERIO_QUEUE_SIZE) {
   q->wptr = 0;
+    }
   q->count++;
   serial_receive_byte(s, 0);
   }
     static uint32_t get_queue(void *opaque)
   {
-    ChannelState *s = opaque;
-    SERIOQueue *q = &s->queue;
+    ESCCChannelState *s = opaque;
+    ESCCSERIOQueue *q = &s->queue;
   int val;
     if (q->count == 0) {
   return 0;
   } else {
   val = q->data[q->rptr];
-    if (++q->rptr == SERIO_QUEUE_SIZE)
+    if (++q->rptr == ESCC_SERIO_QUEUE_SIZE) {
   q->rptr = 0;
+    }
   q->count--;
   }
   trace_escc_get_queue(CHN_C(s), val);
@@ -260,7 +214,7 @@ static uint32_t get_queue(void *opaque)
   return val;
   }
   -static int escc_update_irq_chn(ChannelState *s)
+static int escc_update_irq_chn(ESCCChannelState *s)
   {
   if ((((s->wregs[W_INTR] & INTR_TXINT) && (s->txint == 1)) ||
    // tx ints enabled, pending
@@ -274,7 +228,7 @@ static int escc_update_irq_chn(ChannelState *s)
   return 0;
   }
   -static void escc_update_irq(Cha

Re: [Qemu-devel] [PATCH v6 1/3] pci: Add support for Designware IP block

2018-02-13 Thread Andrey Smirnov

On Tue, Feb 13, 2018 at 2:15 PM, Michael S. Tsirkin  wrote:
> On Tue, Feb 13, 2018 at 12:24:40PM -0800, Andrey Smirnov wrote:
>> On Tue, Feb 13, 2018 at 10:13 AM, Michael S. Tsirkin  wrote:
>> > On Tue, Feb 13, 2018 at 09:07:10AM -0800, Andrey Smirnov wrote:
>> >> +static void designware_pcie_root_class_init(ObjectClass *klass, void 
>> >> *data)
>> >> +{
>> >> +PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>> >> +DeviceClass *dc = DEVICE_CLASS(klass);
>> >> +
>> >> +set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>> >> +
>> >> +k->vendor_id = PCI_VENDOR_ID_SYNOPSYS;
>> >> +k->device_id = 0xABCD;
>> >> +k->revision = 0;
>> >> +k->class_id = PCI_CLASS_BRIDGE_PCI;
>> >> +k->is_express = true;
>> >> +k->is_bridge = true;
>> >> +k->exit = pci_bridge_exitfn;
>> >> +k->realize = designware_pcie_root_realize;
>> >> +k->config_read = designware_pcie_root_config_read;
>> >> +k->config_write = designware_pcie_root_config_write;
>> >> +
>> >> +dc->reset = pci_bridge_reset;
>> >> +/*
>> >> + * PCI-facing part of the host bridge, not usable without the
>> >> + * host-facing part, which can't be device_add'ed, yet.
>> >> + */
>> >> +dc->user_creatable = false;
>> >> +dc->vmsd = &vmstate_designware_pcie_root;
>> >> +}
>> >> +
>> >> +static uint64_t designware_pcie_host_mmio_read(void *opaque, hwaddr addr,
>> >> +   unsigned int size)
>> >> +{
>> >> +PCIHostState *pci = PCI_HOST_BRIDGE(opaque);
>> >> +PCIDevice *device = pci_find_device(pci->bus, 0, 0);
>> >> +
>> >> +return pci_host_config_read_common(device,
>> >> +   addr,
>> >> +   pci_config_size(device),
>> >> +   size);
>> >> +}
>> >> +
>> >> +static void designware_pcie_host_mmio_write(void *opaque, hwaddr addr,
>> >> +uint64_t val, unsigned int 
>> >> size)
>> >> +{
>> >> +PCIHostState *pci = PCI_HOST_BRIDGE(opaque);
>> >> +PCIDevice *device = pci_find_device(pci->bus, 0, 0);
>> >> +
>> >> +return pci_host_config_write_common(device,
>> >> +addr,
>> >> +pci_config_size(device),
>> >> +val, size);
>> >> +}
>> >> +
>> >> +static const MemoryRegionOps designware_pci_mmio_ops = {
>> >> +.read   = designware_pcie_host_mmio_read,
>> >> +.write  = designware_pcie_host_mmio_write,
>> >> +.endianness = DEVICE_NATIVE_ENDIAN,
>> >> +.impl = {
>> >> +/*
>> >> + * Our device would not work correctly if the guest was doing
>> >> + * unaligned access. This might not be a limitation on the real
>> >> + * device but in practice there is no reason for a guest to 
>> >> access
>> >> + * this device unaligned.
>> >> + */
>> >> +.min_access_size = 4,
>> >> +.max_access_size = 4,
>> >> +.unaligned = false,
>> >> +},
>> >> +};
>> >
>> > Could you pls add some comments explaining why is DEVICE_NATIVE_ENDIAN
>> > appropriate here?  Most of these cases are plain "we never bothered
>> > about cross-endian setups". Some are "there's a mix of different
>> > endian-ness values, need to handle in a special way".
>> >
>> > I suspect you really need DEVICE_LITTLE_ENDIAN.
>> >
>>
>> That MemoryRegion corresponds to a register file permanently mapped
>> into CPU's address space, so my assumption is that SoC designers will
>> wire it according to CPUs endianness be it big or little. I am not
>> aware of any big-endian CPU based SoC on the market using Designware's
>> IP block, so I don't think there are any precedent confirming or
>> denying correctness of my assumption. IMHO, this is also the reason
>> why all of Linux driver code for that IP assumes little endianness.
>
> IMHO if Linux driver code does cpu_to_le then it seems best to be
> consistent with that.
>

Well, all of the DW code does so implicitly by using readl()/writel()
helpers which will perform cpu_to_le/le_to_cpu under the hood. But it
seems to me that it could be either because the access does have to be
LE always or simply because readl()/writel() are the go-to memory helpers
on ARM/LE platforms.

FWIW: A somewhat similar precedent of the MIPS/Boston machine can serve as
a counter-example to my assumption, since the Xilinx PCIE IP there seems to
be wired to be LE despite being attached to a BE CPU.

Thanks,
Andrey Smirnov
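
To make the endianness question in this thread concrete, the following sketch shows what a fixed little-endian register layout means in practice: the bytes in the register file have the same order no matter which CPU reads them, so the accessors assemble the value byte by byte instead of relying on the host's native order. The helper names are made up for the example and are neither QEMU nor Linux APIs.

```c
#include <stdint.h>

/* Read a 32-bit value stored least-significant byte first. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Store a 32-bit value least-significant byte first. */
void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}
```

A device modelled as native-endian would instead take whatever byte order the guest CPU uses, which is the behaviour being questioned above.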

Re: [Qemu-devel] [Qemu-stable] [PULL 10/25] virtio_error: don't invoke status callbacks

2018-02-13 Thread Michael S. Tsirkin

On Tue, Feb 13, 2018 at 09:53:58PM +0100, Peter Lieven wrote:
> 
> Am 21.12.2017 um 15:29 schrieb Michael S. Tsirkin:
> > Backends don't need to know what frontend requested a reset,
> > and notifying them from virtio_error is messy because
> > virtio_error itself might be invoked from backend.
> >
> > Let's just set the status directly.
> >
> > Cc: qemu-sta...@nongnu.org
> > Reported-by: Ilya Maximets 
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >  hw/virtio/virtio.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index ad564b0..d6002ee 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -2469,7 +2469,7 @@ void GCC_FMT_ATTR(2, 3) virtio_error(VirtIODevice 
> > *vdev, const char *fmt, ...)
> >  va_end(ap);
> >  
> >  if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> > -virtio_set_status(vdev, vdev->status | 
> > VIRTIO_CONFIG_S_NEEDS_RESET);
> > +vdev->status = vdev->status | VIRTIO_CONFIG_S_NEEDS_RESET;
> >  virtio_notify_config(vdev);
> >  }
> >  
> 
> 
> Is it possible that this patch introduces a stall in I/O and a deadlock on a
> drain all?
> 
> I have seen Qemu VMs being I/O stalled and deadlocking on a vm stop command in
> blk_drain_all. This happened after a longer storage outage.
> 
> I am asking just theoretically because I have seen this behaviour first when
> we backported this patch in our stable 2.9 branch.
> 
> Thank you,
> Peter

Well - this patch was introduced to fix a crash, but
a well behaved VM should not trigger VIRTIO_CONFIG_S_NEEDS_RESET -
did you see any error messages in the log when this triggered?

-- 
MST
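
The difference the quoted patch makes can be shown with a small standalone sketch. The struct and function names are hypothetical; only the bit value 0x40 is taken from the virtio specification's DEVICE_NEEDS_RESET status bit. One path goes through a setter that also notifies a backend callback (counted here), the other sets the bit directly and leaves the backend alone, which is what the patch switches to.

```c
#include <stdint.h>

/* DEVICE_NEEDS_RESET status bit as defined by the virtio specification. */
#define NEEDS_RESET_SKETCH 0x40

/* Stand-in for a virtio device; callback_calls counts backend notifications. */
struct dev_sketch {
    uint8_t status;
    int callback_calls;
};

/* Setter that also invokes the (simulated) backend status callback. */
void set_status_with_callback(struct dev_sketch *d, uint8_t val)
{
    d->callback_calls++;
    d->status = val;
}

/* Error path after the patch: flag the device without calling the backend. */
void mark_needs_reset(struct dev_sketch *d)
{
    d->status |= NEEDS_RESET_SKETCH;
}
```

Because the error path may itself be entered from a backend, skipping the callback avoids re-entering that backend while it is still reporting the error.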

Re: [Qemu-devel] [PATCH v6 1/3] pci: Add support for Designware IP block

2018-02-13 Thread Michael S. Tsirkin

On Tue, Feb 13, 2018 at 12:24:40PM -0800, Andrey Smirnov wrote:
> On Tue, Feb 13, 2018 at 10:13 AM, Michael S. Tsirkin  wrote:
> > On Tue, Feb 13, 2018 at 09:07:10AM -0800, Andrey Smirnov wrote:
> >> +static void designware_pcie_root_class_init(ObjectClass *klass, void 
> >> *data)
> >> +{
> >> +PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> >> +DeviceClass *dc = DEVICE_CLASS(klass);
> >> +
> >> +set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
> >> +
> >> +k->vendor_id = PCI_VENDOR_ID_SYNOPSYS;
> >> +k->device_id = 0xABCD;
> >> +k->revision = 0;
> >> +k->class_id = PCI_CLASS_BRIDGE_PCI;
> >> +k->is_express = true;
> >> +k->is_bridge = true;
> >> +k->exit = pci_bridge_exitfn;
> >> +k->realize = designware_pcie_root_realize;
> >> +k->config_read = designware_pcie_root_config_read;
> >> +k->config_write = designware_pcie_root_config_write;
> >> +
> >> +dc->reset = pci_bridge_reset;
> >> +/*
> >> + * PCI-facing part of the host bridge, not usable without the
> >> + * host-facing part, which can't be device_add'ed, yet.
> >> + */
> >> +dc->user_creatable = false;
> >> +dc->vmsd = &vmstate_designware_pcie_root;
> >> +}
> >> +
> >> +static uint64_t designware_pcie_host_mmio_read(void *opaque, hwaddr addr,
> >> +   unsigned int size)
> >> +{
> >> +PCIHostState *pci = PCI_HOST_BRIDGE(opaque);
> >> +PCIDevice *device = pci_find_device(pci->bus, 0, 0);
> >> +
> >> +return pci_host_config_read_common(device,
> >> +   addr,
> >> +   pci_config_size(device),
> >> +   size);
> >> +}
> >> +
> >> +static void designware_pcie_host_mmio_write(void *opaque, hwaddr addr,
> >> +uint64_t val, unsigned int size)
> >> +{
> >> +PCIHostState *pci = PCI_HOST_BRIDGE(opaque);
> >> +PCIDevice *device = pci_find_device(pci->bus, 0, 0);
> >> +
> >> +return pci_host_config_write_common(device,
> >> +addr,
> >> +pci_config_size(device),
> >> +val, size);
> >> +}
> >> +
> >> +static const MemoryRegionOps designware_pci_mmio_ops = {
> >> +.read   = designware_pcie_host_mmio_read,
> >> +.write  = designware_pcie_host_mmio_write,
> >> +.endianness = DEVICE_NATIVE_ENDIAN,
> >> +.impl = {
> >> +/*
> >> + * Our device would not work correctly if the guest was doing
> >> + * unaligned access. This might not be a limitation on the real
> >> + * device but in practice there is no reason for a guest to access
> >> + * this device unaligned.
> >> + */
> >> +.min_access_size = 4,
> >> +.max_access_size = 4,
> >> +.unaligned = false,
> >> +},
> >> +};
> >
> > Could you pls add some comments explaining why DEVICE_NATIVE_ENDIAN is
> > appropriate here?  Most of these cases are plain "we never bothered
> > about cross-endian setups". Some are "there's a mix of different
> > endian-ness values, need to handle in a special way".
> >
> > I suspect you really need DEVICE_LITTLE_ENDIAN.
> >
> 
> That MemoryRegion corresponds to a register file permanently mapped
> into the CPU's address space, so my assumption is that SoC designers will
> wire it according to the CPU's endianness, be it big or little. I am not
> aware of any big-endian CPU based SoC on the market using Designware's
> IP block, so I don't think there is any precedent confirming or
> denying the correctness of my assumption. IMHO, this is also the reason
> why all of the Linux driver code for that IP assumes little endianness.

IMHO if Linux driver code does cpu_to_le then it seems best to be
consistent with that.

> I can't say that I tested this code against a big-endian guest/CPU,
> but that is primarily due to the fact that there's no real use case
> and any test setup I can put together would be a contrived example
> pointlessly proving my point.
> 
> Anyway, I am more than happy to switch it to use DEVICE_LITTLE_ENDIAN,
> I just don't know if doing so is any more justified than keeping it
> DEVICE_NATIVE_ENDIAN.
> 
> Thanks,
> Andrey Smirnov

I agree it's probably not critical for a target-specific device.
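
To make the endianness point concrete, here is a minimal sketch of what
"little endian regardless of host" means for a 32-bit register image. The
helper names are hypothetical and are not QEMU or Linux APIs; the byte layout
is the only thing being illustrated.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not QEMU API): interpret a 4-byte register
 * image as little endian, independent of the host byte order.
 */
static uint32_t demo_reg_le32_decode(const uint8_t b[4])
{
    return (uint32_t)b[0] |
           ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[3] << 24);
}

static void demo_reg_le32_encode(uint8_t b[4], uint32_t v)
{
    b[0] = (uint8_t)(v & 0xff);          /* least significant byte first */
    b[1] = (uint8_t)((v >> 8) & 0xff);
    b[2] = (uint8_t)((v >> 16) & 0xff);
    b[3] = (uint8_t)((v >> 24) & 0xff);
}
```

With a fixed little-endian layout, a guest driver that converts with
cpu_to_le/le_to_cpu style accessors sees the same value on any CPU, which is
the consistency argument made above.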

-- 
MST

Re: [Qemu-devel] [PATCH v5 09/23] RISC-V TCG Code Generation

2018-02-13 Thread Richard Henderson

On 02/13/2018 01:55 PM, Emilio G. Cota wrote:
> On Thu, Feb 08, 2018 at 14:28:34 +1300, Michael Clark wrote:
>> TCG code generation for the RV32IMAFDC and RV64IMAFDC. The QEMU
>> RISC-V code generator has complete coverage for the Base ISA v2.2,
>> Privileged ISA v1.9.1 and Privileged ISA v1.10:
>>
>> - RISC-V Instruction Set Manual Volume I: User-Level ISA Version 2.2
>> - RISC-V Instruction Set Manual Volume II: Privileged ISA Version 1.9.1
>> - RISC-V Instruction Set Manual Volume II: Privileged ISA Version 1.10
>>
>> Reviewed-by: Richard Henderson 
>> Signed-off-by: Michael Clark 
>> ---
> (snip)
>> +++ b/target/riscv/translate.c
> (snip)
>> +enum {
>> +BS_NONE = 0, /* When seen outside of translation while loop, indicates
>> + need to exit tb due to end of page. */
>> +BS_STOP = 1, /* Need to exit tb for syscall, sret, etc. */
> 
> Are we planning to use BS_STOP in the future? I see it has no setters,
> although we check for it in gen_intermediate_code:

No, but the whole port should be converted to exec/translator.h, which defines
DisasJumpType.  Not something I'm going to require on initial submission until
we've gotten most of the other targets cleaned up.


r~

Re: [Qemu-devel] [PATCH v2 1/1] virtio-balloon: include statistics of disk/file caches

2018-02-13 Thread Michael S. Tsirkin

On Tue, Feb 13, 2018 at 12:29:39PM -0800, Jonathan Helman wrote:
> 
> 
> On 02/05/2018 04:08 AM, Tomáš Golembiovský wrote:
> > ping
> > 
> > On Tue,  5 Dec 2017 13:14:46 +0100
> > Tomáš Golembiovský  wrote:
> > 
> 
> It would be good to include the corresponding upstream kernel change in the
> commit message. This would be similar to a previous change:
> https://lists.gnu.org/archive/html/qemu-devel/2016-02/msg0.html

good idea, but this has been merged by now.

> > > Signed-off-by: Tomáš Golembiovský 
> > > ---
> > >   hw/virtio/virtio-balloon.c  | 1 +
> > >   include/standard-headers/linux/virtio_balloon.h | 3 ++-
> > >   2 files changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > > index 37cde38982..8141326a51 100644
> > > --- a/hw/virtio/virtio-balloon.c
> > > +++ b/hw/virtio/virtio-balloon.c
> > > @@ -50,6 +50,7 @@ static const char *balloon_stat_names[] = {
> > >  [VIRTIO_BALLOON_S_MEMFREE] = "stat-free-memory",
> > >  [VIRTIO_BALLOON_S_MEMTOT] = "stat-total-memory",
> > >  [VIRTIO_BALLOON_S_AVAIL] = "stat-available-memory",
> > > +   [VIRTIO_BALLOON_S_CACHES] = "stat-disk-caches",
> > >  [VIRTIO_BALLOON_S_NR] = NULL
> > >   };
> > > diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
> > > index 9d06ccd066..7b0a41b8fc 100644
> > > --- a/include/standard-headers/linux/virtio_balloon.h
> > > +++ b/include/standard-headers/linux/virtio_balloon.h
> > > @@ -52,7 +52,8 @@ struct virtio_balloon_config {
> > >   #define VIRTIO_BALLOON_S_MEMFREE  4   /* Total amount of free memory */
> > >   #define VIRTIO_BALLOON_S_MEMTOT   5   /* Total amount of memory */
> > >   #define VIRTIO_BALLOON_S_AVAIL    6   /* Available memory as in /proc */
> > > -#define VIRTIO_BALLOON_S_NR   7
> > > +#define VIRTIO_BALLOON_S_CACHES   7   /* Disk caches */
> 
> I've been wondering, VIRTIO_BALLOON_S_AVAIL is not in the virtio spec (see
> Section 5.5.6.3). It seems like this header file needs to be in sync with
> the virtio spec in order to make this change.
> 
> I have a similar change to add a new statistic and was wondering this.

Absolutely. Tomáš?

> > > +#define VIRTIO_BALLOON_S_NR   8
> > >   /*
> > >* Memory statistics structure.
> > > -- 
> > > 2.15.1
> > > 
> > 
> > 
> 
> You need to add your new stat to the list of stats in
> docs/virtio-balloon-stats.txt.
> 
> Jon

Can't hurt, I agree.
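
As a rough illustration of why the tag list, the name table and the
documentation have to move together, here is a reduced sketch. The DEMO_*
enum is hypothetical and renumbered from 0; it only mirrors the four names
visible in the quoted hunk and is not the real header.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced tag list; the real tags live in virtio_balloon.h. */
enum demo_stat {
    DEMO_S_MEMFREE,
    DEMO_S_MEMTOT,
    DEMO_S_AVAIL,
    DEMO_S_CACHES,   /* the newly added statistic */
    DEMO_S_NR        /* must stay last: number of tags */
};

/* Name table indexed by tag, with a NULL sentinel at DEMO_S_NR. */
static const char *demo_stat_names[DEMO_S_NR + 1] = {
    [DEMO_S_MEMFREE] = "stat-free-memory",
    [DEMO_S_MEMTOT]  = "stat-total-memory",
    [DEMO_S_AVAIL]   = "stat-available-memory",
    [DEMO_S_CACHES]  = "stat-disk-caches",
    [DEMO_S_NR]      = NULL
};

/* Returns 1 when every tag below DEMO_S_NR has a name, 0 otherwise. */
static int demo_stat_names_complete(void)
{
    for (int i = 0; i < DEMO_S_NR; i++) {
        if (demo_stat_names[i] == NULL) {
            return 0;
        }
    }
    return demo_stat_names[DEMO_S_NR] == NULL;
}
```

If a tag is added and the count bumped without a matching table entry, the
slot is left NULL and a check like this catches it.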

-- 
MST

Re: [Qemu-devel] [PATCH v5 09/23] RISC-V TCG Code Generation

2018-02-13 Thread Emilio G. Cota

On Thu, Feb 08, 2018 at 14:28:34 +1300, Michael Clark wrote:
> TCG code generation for the RV32IMAFDC and RV64IMAFDC. The QEMU
> RISC-V code generator has complete coverage for the Base ISA v2.2,
> Privileged ISA v1.9.1 and Privileged ISA v1.10:
> 
> - RISC-V Instruction Set Manual Volume I: User-Level ISA Version 2.2
> - RISC-V Instruction Set Manual Volume II: Privileged ISA Version 1.9.1
> - RISC-V Instruction Set Manual Volume II: Privileged ISA Version 1.10
> 
> Reviewed-by: Richard Henderson 
> Signed-off-by: Michael Clark 
> ---
(snip)
> +++ b/target/riscv/translate.c
(snip)
> +/* Address comparion failure.  However, we still need to
> +   provide the memory barrier implied by AQ/RL.  */

s/comparion/comparison/

E.

Re: [Qemu-devel] [PATCH v5 09/23] RISC-V TCG Code Generation

2018-02-13 Thread Emilio G. Cota

On Thu, Feb 08, 2018 at 14:28:34 +1300, Michael Clark wrote:
> TCG code generation for the RV32IMAFDC and RV64IMAFDC. The QEMU
> RISC-V code generator has complete coverage for the Base ISA v2.2,
> Privileged ISA v1.9.1 and Privileged ISA v1.10:
> 
> - RISC-V Instruction Set Manual Volume I: User-Level ISA Version 2.2
> - RISC-V Instruction Set Manual Volume II: Privileged ISA Version 1.9.1
> - RISC-V Instruction Set Manual Volume II: Privileged ISA Version 1.10
> 
> Reviewed-by: Richard Henderson 
> Signed-off-by: Michael Clark 
> ---
(snip)
> +++ b/target/riscv/translate.c
(snip)
> +enum {
> +BS_NONE = 0, /* When seen outside of translation while loop, indicates
> + need to exit tb due to end of page. */
> +BS_STOP = 1, /* Need to exit tb for syscall, sret, etc. */

Are we planning to use BS_STOP in the future? I see it has no setters,
although we check for it in gen_intermediate_code:

(snip)
> +switch (ctx.bstate) {
> +case BS_STOP:
> +gen_goto_tb(&ctx, 0, ctx.pc);
> +break;
> +case BS_NONE: /* handle end of page - DO NOT CHAIN. See gen_goto_tb. */

Should we get rid of it?

Emilio

Re: [Qemu-devel] [PATCH v2 4/4] acpi: build TPM Physical Presence interface

2018-02-13 Thread Stefan Berger


On 02/13/2018 04:04 PM, Laszlo Ersek wrote:

On 02/13/18 21:29, Stefan Berger wrote:

On 02/13/2018 02:59 PM, Laszlo Ersek wrote:

On 02/13/18 20:37, Kevin O'Connor wrote:

On Tue, Feb 13, 2018 at 05:16:49PM +0100, Laszlo Ersek wrote:

On 02/12/18 21:49, Stefan Berger wrote:

On 02/12/2018 03:46 PM, Kevin O'Connor wrote:

I'm not sure I fully understand the goals of the PPI interface.
Here's what I understand so far:

The TPM specs define some actions that are considered privileged.  An
example of this would be disabling the TPM itself.  In order to
prevent an attacker from performing these actions without
authorization, the TPM specs define a mechanism to assert "physical
presence" before the privileged action can be done.  They do this by
having the firmware present a menu during early boot that permits
these privileged operations, and then the firmware locks the TPM chip
so the actions can no longer be done by any software that runs after
the firmware.  Thus "physical presence" is asserted by demonstrating
one has console access to the machine during early boot.

The PPI spec implements a work around for this - presumably some found
the enforcement mechanism too onerous.  It allows the OS to provide a
request code to the firmware, and on the next boot the firmware will
take the requested action before it locks the chip.  Thus allowing the
OS to indirectly perform the privileged action even after the chip has
been locked.  Thus, the PPI system seems to be an "elaborate hack" to
allow users to circumvent the physical presence mechanism (if they
choose to).

Correct.

Here's what I understand the proposed implementation involves:

1 - in addition to emulating the TPM device itself, QEMU will also
   introduce a virtual memory device with 0x400 bytes.

Correct.

2 - on first boot the firmware (seabios and uefi) will populate the
   memory region created in step 1.  In particular it will fill an
   array with the list of request codes it supports.  (Each request
   is an 8bit value, the array has 256 entries.)

Correct. Each firmware would fill out the 256 byte array depending on
what it supports. The 8 bit values are basically flags and so on.

3 - QEMU will produce AML code implementing the standard PPI ACPI
   interface.  This AML code will take the request, find the table
   produced in step 1, compare it to the list of accepted requests
   produced in step 2, and then place the 8bit request in another
   qemu virtual memory device (at 0x or 0xFED45000).

Correct.

Now EDK2 wants to store the code in a UEFI variable in NVRAM. We
therefore would need to trigger an SMI. In SeaBIOS we wouldn't have to
do this.


4 - the OS will signal a reboot, qemu will do its normal reboot logic,
   and the firmware will be run again.

5 - the firmware will extract the code written in stage 3, and if the
   tpm device has been configured to accept PPI codes from the OS, it
   will invoke the requested action.

SeaBIOS would look into memory to find the code. EDK2 will read the code
from a UEFI variable.


Did I understand the above correctly?

I think so. With the fine differences between SeaBIOS and EDK2
pointed out.

Here's what I suggest:

Please everyone continue working on this, according to Kevin's &
Stefan's description, but focus on QEMU and SeaBIOS *only*. Ignore edk2
for now.

If this were targetted at SeaBIOS, I'd look for a simpler
QEMU/firmware interface.  Something like:

A - QEMU produces AML code implementing the standard PPI ACPI
  interface that generates a request code and stores it in the
  device memory of an existing device (eg, writable fw_cfg or an
  extension field in the existing emulated TPM device).

ACPI code writing into fw_cfg sounds difficult.
I initially had PPI SeaBIOS code write into the TPM TIS device's memory
into some custom addresses. I'd consider this a hack. Now we have that
virtual memory device with those 0x400 bytes...

In these 0x400 bytes we have 256 bytes that are used for configuration
flags describing the supported opcodes as you previously described. This
array allows us to decouple the firmware implementation from the ACPI
code and we need not hard code what is supported in the firmware inside
the ACPI code (which would be difficult to do anyway since in QEMU we
would not know what firmware will be started and what PPI opcodes are
supported) and the ppi sysfs entries in Linux for example show exactly
those PPI opcodes that are supported. The firmware needs to set those
flags and the firmware knows what it supports.

I hope we can settle that this device is the right path.
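
A sketch of how such a 0x400-byte region could be laid out, purely for
illustration. Only the total size and the 256-entry flag array come from the
discussion; the struct name, the position of the request byte and the flag
value are assumptions, not the layout used by QEMU, SeaBIOS or EDK2.

```c
#include <stdint.h>
#include <stddef.h>

#define DEMO_PPI_SIZE     0x400  /* size of the virtual memory device */
#define DEMO_PPI_NUM_OPS  256    /* one flag byte per 8-bit request code */

/* Hypothetical flag meaning "firmware supports this request code". */
#define DEMO_PPI_FLAG_SUPPORTED 0x01

/* Hypothetical layout; field order past the flag array is an assumption. */
struct demo_ppi_region {
    uint8_t func[DEMO_PPI_NUM_OPS];  /* filled in by the firmware at boot */
    uint8_t request;                 /* request code stored by the ACPI side */
    uint8_t reserved[DEMO_PPI_SIZE - DEMO_PPI_NUM_OPS - 1];
};

/*
 * Models the ACPI-side check: accept a request only if the firmware
 * flagged the code as supported. Returns 0 on success, -1 otherwise.
 */
static int demo_ppi_submit(struct demo_ppi_region *r, uint8_t op)
{
    if (!(r->func[op] & DEMO_PPI_FLAG_SUPPORTED)) {
        return -1;
    }
    r->request = op;
    return 0;
}
```

The point of the flag array is visible here: the submit path never needs to
know which firmware is running, it only consults what the firmware wrote.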


B - after a reboot the firmware extracts the PPI request code
  (produced in step A) and performs the requested action (if the TPM
  is configured to accept OS generated codes).

That is, skip steps 1 and 2 from the original proposal.

I think A/B can work fine, as long as
- the firmware can somehow dynamically recognize the device / "reg

Re: [Qemu-devel] [PATCH v2 4/4] acpi: build TPM Physical Presence interface

2018-02-13 Thread Laszlo Ersek

On 02/13/18 21:29, Stefan Berger wrote:
> On 02/13/2018 02:59 PM, Laszlo Ersek wrote:
>> On 02/13/18 20:37, Kevin O'Connor wrote:
>>> On Tue, Feb 13, 2018 at 05:16:49PM +0100, Laszlo Ersek wrote:
>>>> On 02/12/18 21:49, Stefan Berger wrote:
>>>>> On 02/12/2018 03:46 PM, Kevin O'Connor wrote:
>>>>>> I'm not sure I fully understand the goals of the PPI interface.
>>>>>> Here's what I understand so far:
>>>>>>
>>>>>> The TPM specs define some actions that are considered privileged.  An
>>>>>> example of this would be disabling the TPM itself.  In order to
>>>>>> prevent an attacker from performing these actions without
>>>>>> authorization, the TPM specs define a mechanism to assert "physical
>>>>>> presence" before the privileged action can be done.  They do this by
>>>>>> having the firmware present a menu during early boot that permits
>>>>>> these privileged operations, and then the firmware locks the TPM chip
>>>>>> so the actions can no longer be done by any software that runs after
>>>>>> the firmware.  Thus "physical presence" is asserted by demonstrating
>>>>>> one has console access to the machine during early boot.
>>>>>>
>>>>>> The PPI spec implements a work around for this - presumably some found
>>>>>> the enforcement mechanism too onerous.  It allows the OS to provide a
>>>>>> request code to the firmware, and on the next boot the firmware will
>>>>>> take the requested action before it locks the chip.  Thus allowing the
>>>>>> OS to indirectly perform the privileged action even after the chip has
>>>>>> been locked.  Thus, the PPI system seems to be an "elaborate hack" to
>>>>>> allow users to circumvent the physical presence mechanism (if they
>>>>>> choose to).
>>>>> Correct.
>>>>>> Here's what I understand the proposed implementation involves:
>>>>>>
>>>>>> 1 - in addition to emulating the TPM device itself, QEMU will also
>>>>>>     introduce a virtual memory device with 0x400 bytes.
>>>>> Correct.
>>>>>> 2 - on first boot the firmware (seabios and uefi) will populate the
>>>>>>     memory region created in step 1.  In particular it will fill an
>>>>>>     array with the list of request codes it supports.  (Each request
>>>>>>     is an 8bit value, the array has 256 entries.)
>>>>> Correct. Each firmware would fill out the 256 byte array depending on
>>>>> what it supports. The 8 bit values are basically flags and so on.
>>>>>> 3 - QEMU will produce AML code implementing the standard PPI ACPI
>>>>>>     interface.  This AML code will take the request, find the table
>>>>>>     produced in step 1, compare it to the list of accepted requests
>>>>>>     produced in step 2, and then place the 8bit request in another
>>>>>>     qemu virtual memory device (at 0x or 0xFED45000).
>>>>> Correct.
>>>>>
>>>>> Now EDK2 wants to store the code in a UEFI variable in NVRAM. We
>>>>> therefore would need to trigger an SMI. In SeaBIOS we wouldn't have to
>>>>> do this.
>>>>>
>>>>>> 4 - the OS will signal a reboot, qemu will do its normal reboot logic,
>>>>>>     and the firmware will be run again.
>>>>>>
>>>>>> 5 - the firmware will extract the code written in stage 3, and if the
>>>>>>     tpm device has been configured to accept PPI codes from the OS, it
>>>>>>     will invoke the requested action.
>>>>> SeaBIOS would look into memory to find the code. EDK2 will read the code
>>>>> from a UEFI variable.
>>>>>
>>>>>> Did I understand the above correctly?
>>>>> I think so. With the fine differences between SeaBIOS and EDK2
>>>>> pointed out.
>>>> Here's what I suggest:
>>>>
>>>> Please everyone continue working on this, according to Kevin's &
>>>> Stefan's description, but focus on QEMU and SeaBIOS *only*. Ignore edk2
>>>> for now.
>>> If this were targetted at SeaBIOS, I'd look for a simpler
>>> QEMU/firmware interface.  Something like:
>>>
>>> A - QEMU produces AML code implementing the standard PPI ACPI
>>>  interface that generates a request code and stores it in the
>>>  device memory of an existing device (eg, writable fw_cfg or an
>>>  extension field in the existing emulated TPM device).
> 
> ACPI code writing into fw_cfg sounds difficult.
> I initially had PPI SeaBIOS code write into the TPM TIS device's memory
> into some custom addresses. I'd consider this a hack. Now we have that
> virtual memory device with those 0x400 bytes...
> 
> In these 0x400 bytes we have 256 bytes that are used for configuration
> flags describing the supported opcodes as you previously described. This
> array allows us to decouple the firmware implementation from the ACPI
> code and we need not hard code what is supported in the firmware inside
> the ACPI code (which would be difficult to do anyway since in QEMU we
> would not know what firmware will be started and what PPI opcodes are
> supported) and the ppi sysfs entries in Linux for example show exactly
> those PPI opcodes that are supported. The firmware needs to set those
> flags an

[Qemu-devel] sparc crash on delayed control-transfer couples

2018-02-13 Thread Steven Seeger

Consider the following code: 

0x100 cmp %g5, 3
0x104 be 0x200
0x108 b 0x300

I believe this is what is described on page 55 of the sparc v8 manual as 
unpredictable behavior, where a Bicc precedes an unconditional branch.

QEMU actually crashes unless run in GDB. Single stepping will actually have a 
successful compare of %g5==3 executing from 0x300.

Without GDB, qemu crashes with unaligned access at address 2 (JUMP_PC) on the 
fetch.

I understand that this may be "bad code" and may be "unpredictable" but I 
don't think QEMU crashing is an acceptable case. :)

I am not a SPARC expert at all (only started looking at sparc assembly 
yesterday, in fact) so I am not trying to say what the correct behavior is. It 
appears that the be should be a be,a, but is not. This may be a compiler bug 
in what was used to compile this code.

It seems that our board (real hardware) will run at 0x200 and ignore the 
branch. I am attempting to modify translate.c to do just that to see how 
things go.

If anyone else has any ideas, please chime in.

Thanks to you all for what you do.

Steven

[Qemu-devel] sparc branch to pc+4 issue

2018-02-13 Thread Steven Seeger

Consider pc==0x100:

0x100   b 0x104

The unconditional not-annulled branch will go to 0x104, which is the next 
instruction anyway. do_branch() will leave dc->pc and dc->npc both set to 
0x104. This causes gdb to have a problem when single stepping. It will be 
stuck. QEMU will execute past this somehow, but I'm not sure with what side 
effect. It seems to me the following patch will fix this:

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index 71e0853e43..95ca90b51a 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -1464,6 +1464,7 @@ static void do_branch(DisasContext *dc, int32_t offset, uint32_t insn, int cc)
 dc->npc = dc->pc + 4;
 } else {
 dc->pc = dc->npc;
+if(target==dc->pc) target += 4;
 dc->npc = target;
 tcg_gen_mov_tl(cpu_pc, cpu_npc);
 }

I apologize if I am missing something with this assessment.
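
For reference, a tiny standalone model of the PC/nPC pair may help when
reasoning about this case. It assumes the usual reading of SPARC V8 delayed
control transfer: a taken, non-annulled branch sets PC to the old nPC and
nPC to the target. The names are hypothetical and this is not the code in
translate.c.

```c
#include <stdint.h>

/* Hypothetical model of the SPARC program counter pair. */
typedef struct {
    uint32_t pc;   /* address of the instruction being executed */
    uint32_t npc;  /* address of the instruction executed after it */
} DemoSparcPC;

/* Retire a non-control-transfer instruction. */
static void demo_step_plain(DemoSparcPC *s)
{
    s->pc = s->npc;
    s->npc = s->npc + 4;
}

/* Retire a taken, non-annulled branch: delay slot next, then target. */
static void demo_step_branch_taken(DemoSparcPC *s, uint32_t target)
{
    s->pc = s->npc;
    s->npc = target;
}
```

Under that assumption, starting from pc=0x100, npc=0x104 and retiring
"b 0x104" gives pc=0x104, npc=0x104: the instruction at 0x104 is visited once
as the delay slot and once more as the branch target before nPC moves on to
0x108. If the model is right, pc == npc is a legitimate intermediate state, so
bumping the target by 4 would skip the second visit rather than repair the
state.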

Steven

[Qemu-devel] [PATCH v2] hw/char/stm32f2xx_usart: fix TXE/TC bit handling

2018-02-13 Thread Richard Braun

I/O currently being synchronous, there is no reason to ever clear the
SR_TXE bit. However the SR_TC bit may be cleared by software writing
to the SR register, so set it on each write.

In addition, fix the reset value of the USART status register.

Signed-off-by: Richard Braun 
---
 hw/char/stm32f2xx_usart.c | 12 
 include/hw/char/stm32f2xx_usart.h |  7 ++-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/hw/char/stm32f2xx_usart.c b/hw/char/stm32f2xx_usart.c
index 07b462d4b6..032b5fda13 100644
--- a/hw/char/stm32f2xx_usart.c
+++ b/hw/char/stm32f2xx_usart.c
@@ -96,12 +96,10 @@ static uint64_t stm32f2xx_usart_read(void *opaque, hwaddr addr,
 switch (addr) {
 case USART_SR:
 retvalue = s->usart_sr;
-s->usart_sr &= ~USART_SR_TC;
 qemu_chr_fe_accept_input(&s->chr);
 return retvalue;
 case USART_DR:
 DB_PRINT("Value: 0x%" PRIx32 ", %c\n", s->usart_dr, (char) s->usart_dr);
-s->usart_sr |= USART_SR_TXE;
 s->usart_sr &= ~USART_SR_RXNE;
 qemu_chr_fe_accept_input(&s->chr);
 qemu_set_irq(s->irq, 0);
@@ -137,7 +135,9 @@ static void stm32f2xx_usart_write(void *opaque, hwaddr addr,
 switch (addr) {
 case USART_SR:
 if (value <= 0x3FF) {
-s->usart_sr = value;
+/* I/O being synchronous, TXE is always set. In addition, it may
+   only be set by hardware, so keep it set here. */
+s->usart_sr = value | USART_SR_TXE;
 } else {
 s->usart_sr &= value;
 }
@@ -151,8 +151,12 @@ static void stm32f2xx_usart_write(void *opaque, hwaddr addr,
 /* XXX this blocks entire thread. Rewrite to use
  * qemu_chr_fe_write and background I/O callbacks */
 qemu_chr_fe_write_all(&s->chr, &ch, 1);
+/* XXX I/O are currently synchronous, making it impossible for
+   software to observe transient states where TXE or TC aren't
+   set. Unlike TXE however, which is read-only, software may
+   clear TC by writing 0 to the SR register, so set it again
+   on each write. */
 s->usart_sr |= USART_SR_TC;
-s->usart_sr &= ~USART_SR_TXE;
 }
 return;
 case USART_BRR:
diff --git a/include/hw/char/stm32f2xx_usart.h b/include/hw/char/stm32f2xx_usart.h
index 9d03a7527c..7ea7448813 100644
--- a/include/hw/char/stm32f2xx_usart.h
+++ b/include/hw/char/stm32f2xx_usart.h
@@ -37,7 +37,12 @@
 #define USART_CR3  0x14
 #define USART_GTPR 0x18
 
-#define USART_SR_RESET 0x00C0
+/*
+ * XXX The reset value mentioned in "24.6.1 Status register" seems bogus.
+ * Looking at "Table 98 USART register map and reset values", it seems it
+ * should be 0xc0, and that's how real hardware behaves.
+ */
+#define USART_SR_RESET (USART_SR_TXE | USART_SR_TC)
 
 #define USART_SR_TXE  (1 << 7)
 #define USART_SR_TC   (1 << 6)
-- 
2.11.0
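
The resulting status-register behaviour can be summarised with a small
standalone model. The demo_* functions are hypothetical and only restate the
logic of the hunks above outside of QEMU; the bit definitions match the
header in the patch.

```c
#include <stdint.h>

#define USART_SR_TXE   (1u << 7)
#define USART_SR_TC    (1u << 6)
#define USART_SR_RESET (USART_SR_TXE | USART_SR_TC)

/*
 * Hypothetical model of a guest write to SR after the patch:
 * TXE is kept set because I/O is synchronous and software cannot clear it.
 */
static uint32_t demo_sr_write(uint32_t sr, uint32_t value)
{
    if (value <= 0x3FF) {
        return value | USART_SR_TXE;
    }
    return sr & value;
}

/* Hypothetical model of a guest write to DR: transmission completes at once. */
static uint32_t demo_dr_write(uint32_t sr)
{
    return sr | USART_SR_TC;
}
```

So a guest that clears TC by writing 0 to SR still sees TXE set, and the next
data write sets TC again.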

Re: [Qemu-devel] [Qemu-stable] [PULL 10/25] virtio_error: don't invoke status callbacks

2018-02-13 Thread Peter Lieven


On 21.12.2017 at 15:29, Michael S. Tsirkin wrote:
> Backends don't need to know what frontend requested a reset,
> and notifying them from virtio_error is messy because
> virtio_error itself might be invoked from backend.
>
> Let's just set the status directly.
>
> Cc: qemu-sta...@nongnu.org
> Reported-by: Ilya Maximets 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  hw/virtio/virtio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index ad564b0..d6002ee 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2469,7 +2469,7 @@ void GCC_FMT_ATTR(2, 3) virtio_error(VirtIODevice *vdev, const char *fmt, ...)
>  va_end(ap);
>  
>  if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> -virtio_set_status(vdev, vdev->status | VIRTIO_CONFIG_S_NEEDS_RESET);
> +vdev->status = vdev->status | VIRTIO_CONFIG_S_NEEDS_RESET;
>  virtio_notify_config(vdev);
>  }
>  


Is it possible that this patch introduces a stall in I/O and a deadlock on a
drain all?

I have seen Qemu VMs being I/O stalled and deadlocking on a vm stop command in
blk_drain_all. This happened after a longer storage outage.

I am asking just theoretically because I have seen this behaviour first when we
backported this patch in our stable 2.9 branch.


Thank you,

Peter

Re: [Qemu-devel] [PATCH v2] vhost-user: fix memory leak

2018-02-13 Thread Philippe Mathieu-Daudé

On 02/13/2018 02:08 AM, linzhecheng wrote:
> g_free() was moved from vhost_net_cleanup in commit e6bcb1b, so we should
> free net after vhost_net_cleanup
> 
> Signed-off-by: linzhecheng 

Reviewed-by: Philippe Mathieu-Daudé 

> 
> diff --git a/net/vhost-user.c b/net/vhost-user.c
> index cb45512506..d024573e45 100644
> --- a/net/vhost-user.c
> +++ b/net/vhost-user.c
> @@ -109,6 +109,7 @@ static int vhost_user_start(int queues, NetClientState *ncs[], CharBackend *be)
>  err:
>  if (net) {
>  vhost_net_cleanup(net);
> +g_free(net);
>  }
>  vhost_user_stop(i, ncs);
>  return -1;
>
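
The ownership rule behind this fix can be shown with a self-contained sketch.
All demo_* names are hypothetical stand-ins, not the vhost API: cleanup
releases what the object owns but not the object itself, so the error path
has to free it as well.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the net object and its internal resources. */
struct demo_net {
    int *queues;
};

static int demo_live_allocs;  /* counts allocations not yet freed */

static void *demo_alloc(size_t n)
{
    void *p = calloc(1, n);
    if (p) {
        demo_live_allocs++;
    }
    return p;
}

static void demo_free(void *p)
{
    if (p) {
        demo_live_allocs--;
        free(p);
    }
}

static struct demo_net *demo_net_init(void)
{
    struct demo_net *net = demo_alloc(sizeof(*net));
    if (net) {
        net->queues = demo_alloc(4 * sizeof(int));
    }
    return net;
}

/* Releases internal resources only; the caller still owns 'net'. */
static void demo_net_cleanup(struct demo_net *net)
{
    demo_free(net->queues);
    net->queues = NULL;
}

/* Error path of a start routine: clean up, then free the object itself. */
static int demo_start_failing(void)
{
    struct demo_net *net = demo_net_init();
    if (net) {
        demo_net_cleanup(net);
        demo_free(net);  /* without this line the object would leak */
    }
    return -1;
}
```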

Re: [Qemu-devel] [RfC PATCH v3 5/5] vfio/display: adding region support

2018-02-13 Thread Alex Williamson

On Tue, 13 Feb 2018 17:18:46 +0100
Gerd Hoffmann  wrote:

> Wire up region-based display.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/vfio/pci.h |   1 +
>  include/hw/vfio/vfio-common.h |   8 
>  hw/vfio/display.c | 102 +-
>  3 files changed, 109 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 1d005d922d..9fe0f3f198 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -148,6 +148,7 @@ typedef struct VFIOPCIDevice {
>  bool no_kvm_msi;
>  bool no_kvm_msix;
>  bool no_geforce_quirks;
> +VFIODisplay *dpy;
>  } VFIOPCIDevice;
>  
>  uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index f3a2ac9fee..fc8ae14fb7 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -142,6 +142,14 @@ typedef struct VFIOGroup {
>  QLIST_ENTRY(VFIOGroup) container_next;
>  } VFIOGroup;
>  
> +typedef struct VFIODisplay {
> +QemuConsole *con;
> +struct {
> +VFIORegion buffer;
> +DisplaySurface *surface;
> +} region;
> +} VFIODisplay;
> +
>  void vfio_put_base_device(VFIODevice *vbasedev);
>  void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
>  void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index);
> diff --git a/hw/vfio/display.c b/hw/vfio/display.c
> index 4249be398d..819a0cb08c 100644
> --- a/hw/vfio/display.c
> +++ b/hw/vfio/display.c
> @@ -19,6 +19,105 @@
>  #include "qapi/error.h"
>  #include "pci.h"
>  
> +/* -- */
> +
> +static void vfio_display_region_update(void *opaque)
> +{
> +VFIOPCIDevice *vdev = opaque;
> +VFIODisplay *dpy = vdev->dpy;
> +struct vfio_device_gfx_plane_info plane = {
> +.argsz = sizeof(plane),
> +.flags = VFIO_GFX_PLANE_TYPE_REGION
> +};
> +pixman_format_code_t format = PIXMAN_x8r8g8b8;

nit, unused initialization

> +int ret;
> +
> +ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_QUERY_GFX_PLANE, &plane);
> +if (ret < 0) {
> +fprintf(stderr, "ioctl VFIO_DEVICE_QUERY_GFX_PLANE: %s\n",
> +strerror(errno));

nit, should this be an error_report()?

> +return;
> +}
> +if (!plane.drm_format || !plane.size) {
> +return;
> +}
> +format = qemu_drm_format_to_pixman(plane.drm_format);
> +if (!format) {
> +return;
> +}
> +
> +if (dpy->region.buffer.size &&
> +dpy->region.buffer.nr != plane.region_index) {
> +/* region changed */
> +vfio_region_exit(&dpy->region.buffer);

Don't we need to also call vfio_region_finalize()?  _exit only deletes
mmap MemoryRegions from the base MemoryRegion, _finalize does the
actual munmap, unparent calls, and frees dynamic memory.  As is, this
leaks memory afaict.

> +memset(&dpy->region.buffer, 0, sizeof(dpy->region.buffer));

_finalize could cleanup the VFIORegion rather than requiring this
memset, we simply haven't needed it yet since regions are generally
tied to the life cycle of the device.

> +dpy->region.surface = NULL;
> +}
> +
> +if (dpy->region.surface &&
> +(surface_width(dpy->region.surface) != plane.width ||
> + surface_height(dpy->region.surface) != plane.height ||
> + surface_format(dpy->region.surface) != format)) {
> +/* size changed */
> +dpy->region.surface = NULL;
> +}
> +
> +if (!dpy->region.buffer.size) {
> +/* mmap region */
> +ret = vfio_region_setup(OBJECT(vdev), &vdev->vbasedev,
> +&dpy->region.buffer,
> +plane.region_index,
> +"display");
> +if (ret != 0) {
> +fprintf(stderr, "%s: vfio_region_setup(%d): %s\n",
> +__func__, plane.region_index, strerror(-ret));
> +goto err1;
> +}
> +ret = vfio_region_mmap(&dpy->region.buffer);
> +if (ret != 0) {
> +fprintf(stderr, "%s: vfio_region_mmap(%d): %s\n", __func__,
> +plane.region_index, strerror(-ret));
> +goto err2;
> +}
> +assert(dpy->region.buffer.mmaps[0].mmap != NULL);
> +}
> +
> +if (dpy->region.surface == NULL) {
> +/* create surface */
> +dpy->region.surface = qemu_create_displaysurface_from
> +(plane.width, plane.height, format,
> + plane.stride, dpy->region.buffer.mmaps[0].mmap);
> +dpy_gfx_replace_surface(dpy->con, dpy->region.surface);
> +}
> +
> +/* full screen update */
> +dpy_gfx_update(dpy->con, 0, 0,
> +   surface_width(dpy->region.surface),
> +   surface_height(dpy->region.surface));
> +return;
> +
> +err2:
> +vfio_region_exit(&dpy->region.buffer);

Thi

Re: [Qemu-devel] [PATCH v2 1/1] virtio-balloon: include statistics of disk/file caches

2018-02-13 Thread Jonathan Helman




On 02/05/2018 04:08 AM, Tomáš Golembiovský wrote:

ping

On Tue,  5 Dec 2017 13:14:46 +0100
Tomáš Golembiovský  wrote:



It would be good to include the corresponding upstream kernel change in 
the commit message. This would be similar to a previous change: 
https://lists.gnu.org/archive/html/qemu-devel/2016-02/msg0.html



Signed-off-by: Tomáš Golembiovský 
---
  hw/virtio/virtio-balloon.c  | 1 +
  include/standard-headers/linux/virtio_balloon.h | 3 ++-
  2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 37cde38982..8141326a51 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -50,6 +50,7 @@ static const char *balloon_stat_names[] = {
 [VIRTIO_BALLOON_S_MEMFREE] = "stat-free-memory",
 [VIRTIO_BALLOON_S_MEMTOT] = "stat-total-memory",
 [VIRTIO_BALLOON_S_AVAIL] = "stat-available-memory",
+   [VIRTIO_BALLOON_S_CACHES] = "stat-disk-caches",
 [VIRTIO_BALLOON_S_NR] = NULL
  };
  
diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h

index 9d06ccd066..7b0a41b8fc 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -52,7 +52,8 @@ struct virtio_balloon_config {
  #define VIRTIO_BALLOON_S_MEMFREE  4   /* Total amount of free memory */
  #define VIRTIO_BALLOON_S_MEMTOT   5   /* Total amount of memory */
  #define VIRTIO_BALLOON_S_AVAIL6   /* Available memory as in /proc */
-#define VIRTIO_BALLOON_S_NR   7
+#define VIRTIO_BALLOON_S_CACHES   7   /* Disk caches */


I've been wondering about this: VIRTIO_BALLOON_S_AVAIL is not in the 
virtio spec (see Section 5.5.6.3). It seems like this header file needs 
to be in sync with the virtio spec before this change can be made.


I have a similar change that adds a new statistic and ran into the same 
question.


+#define VIRTIO_BALLOON_S_NR   8
  
  /*

   * Memory statistics structure.
--
2.15.1






You need to add your new stat to the list of stats in 
docs/virtio-balloon-stats.txt.


Jon

Re: [Qemu-devel] [PATCH v2 4/4] acpi: build TPM Physical Presence interface

2018-02-13 Thread Stefan Berger


On 02/13/2018 02:59 PM, Laszlo Ersek wrote:

On 02/13/18 20:37, Kevin O'Connor wrote:

On Tue, Feb 13, 2018 at 05:16:49PM +0100, Laszlo Ersek wrote:

On 02/12/18 21:49, Stefan Berger wrote:

On 02/12/2018 03:46 PM, Kevin O'Connor wrote:

I'm not sure I fully understand the goals of the PPI interface.
Here's what I understand so far:

The TPM specs define some actions that are considered privileged.  An
example of this would be disabling the TPM itself.  In order to
prevent an attacker from performing these actions without
authorization, the TPM specs define a mechanism to assert "physical
presence" before the privileged action can be done.  They do this by
having the firmware present a menu during early boot that permits
these privileged operations, and then the firmware locks the TPM chip
so the actions can no longer be done by any software that runs after
the firmware.  Thus "physical presence" is asserted by demonstrating
one has console access to the machine during early boot.

The PPI spec implements a workaround for this - presumably some found
the enforcement mechanism too onerous.  It allows the OS to provide a
request code to the firmware, and on the next boot the firmware will
take the requested action before it locks the chip.  Thus allowing the
OS to indirectly perform the privileged action even after the chip has
been locked.  Thus, the PPI system seems to be an "elaborate hack" to
allow users to circumvent the physical presence mechanism (if they
choose to).

Correct.

Here's what I understand the proposed implementation involves:

1 - in addition to emulating the TPM device itself, QEMU will also
  introduce a virtual memory device with 0x400 bytes.

Correct.

2 - on first boot the firmware (seabios and uefi) will populate the
  memory region created in step 1.  In particular it will fill an
  array with the list of request codes it supports.  (Each request
  is an 8bit value, the array has 256 entries.)

Correct. Each firmware would fill out the 256 byte array depending on
what it supports. The 8 bit values are basically flags and so on.

3 - QEMU will produce AML code implementing the standard PPI ACPI
  interface.  This AML code will take the request, find the table
  produced in step 1, compare it to the list of accepted requests
  produced in step 2, and then place the 8bit request in another
  qemu virtual memory device (at 0x or 0xFED45000).

Correct.

Now EDK2 wants to store the code in a UEFI variable in NVRAM. We
therefore would need to trigger an SMI. In SeaBIOS we wouldn't have to
do this.


4 - the OS will signal a reboot, qemu will do its normal reboot logic,
  and the firmware will be run again.

5 - the firmware will extract the code written in stage 3, and if the
  tpm device has been configured to accept PPI codes from the OS, it
  will invoke the requested action.

SeaBIOS would look into memory to find the code. EDK2 will read the code
from a UEFI variable.


Did I understand the above correctly?

I think so. With the fine differences between SeaBIOS and EDK2 pointed out.

Here's what I suggest:

Please everyone continue working on this, according to Kevin's &
Stefan's description, but focus on QEMU and SeaBIOS *only*. Ignore edk2
for now.

If this were targeted at SeaBIOS, I'd look for a simpler
QEMU/firmware interface.  Something like:

A - QEMU produces AML code implementing the standard PPI ACPI
 interface that generates a request code and stores it in the
 device memory of an existing device (eg, writable fw_cfg or an
 extension field in the existing emulated TPM device).


ACPI code writing into fw_cfg sounds difficult.
I initially had the PPI SeaBIOS code write to some custom addresses in 
the TPM TIS device's memory. I'd consider this a hack. Now we have that 
virtual memory device with those 0x400 bytes...


Of these 0x400 bytes, 256 bytes hold configuration flags describing the 
supported opcodes, as you previously described. This array decouples the 
firmware implementation from the ACPI code, so we need not hard code 
what the firmware supports inside the ACPI code. That would be difficult 
to do anyway, since in QEMU we would not know what firmware will be 
started and what PPI opcodes it supports. It also means that the ppi 
sysfs entries in Linux, for example, show exactly those PPI opcodes that 
are supported. The firmware needs to set those flags, and the firmware 
knows what it supports.


I hope we can settle that this device is the right path.
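
To make the flow discussed above concrete, here is a minimal sketch of how the 0x400-byte region could be used. Only the region size and the 256-entry array of 8-bit flags come from this thread. The struct layout, the PPI_FLAG_SUPPORTED bit, the position of the request byte, the function names, and the use of code 0 as "no request pending" are all assumptions made for illustration, not the layout proposed in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PPI_REGION_SIZE 0x400   /* size quoted in the thread */
#define PPI_NUM_OPS     256     /* one flag byte per 8-bit request code */

/* Hypothetical flag bit meaning "the firmware implements this opcode". */
#define PPI_FLAG_SUPPORTED 0x01

/* Hypothetical layout: only the size and the 256-byte array are from the thread. */
struct ppi_region {
    uint8_t func[PPI_NUM_OPS];   /* filled in by the firmware on boot */
    uint8_t request;             /* stored by the ACPI code for the OS */
    uint8_t reserved[PPI_REGION_SIZE - PPI_NUM_OPS - 1];
};

_Static_assert(sizeof(struct ppi_region) == PPI_REGION_SIZE,
               "region must be exactly 0x400 bytes");

/* Step 2: the firmware advertises the opcodes it supports. */
static void fw_advertise(struct ppi_region *r, const uint8_t *ops, size_t n)
{
    memset(r->func, 0, sizeof(r->func));
    for (size_t i = 0; i < n; i++) {
        r->func[ops[i]] |= PPI_FLAG_SUPPORTED;
    }
}

/* Step 3: the ACPI side accepts a request only if the firmware supports it. */
static bool ppi_submit(struct ppi_region *r, uint8_t op)
{
    if (!(r->func[op] & PPI_FLAG_SUPPORTED)) {
        return false;
    }
    r->request = op;
    return true;
}

/* Step 5: after reboot the firmware consumes the pending request (0 = none). */
static uint8_t fw_take_request(struct ppi_region *r)
{
    uint8_t op = r->request;
    r->request = 0;
    return op;
}
```

The point of the sketch is the decoupling: ppi_submit() never needs to know which firmware is running, it only consults the flags the firmware wrote.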



B - after a reboot the firmware extracts the PPI request code
 (produced in step A) and performs the requested action (if the TPM
 is configured to accept OS generated codes).

That is, skip steps 1 and 2 from the original proposal.

I think A/B can work fine, as long as
- the firmware can somehow dynamically recognize the device / "register
   block" that the request codes have to be pulled from, and


I experimented with

[Qemu-devel] [PATCH v8 18/21] vmdk: Switch to .bdrv_co_block_status()

2018-02-13 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vmdk driver accordingly.  Drop the
now-unused vmdk_find_index_in_cluster().

Also, fix a pre-existing bug: if find_extent() fails (unlikely,
since the block layer did a bounds check), then we must return a
failure, rather than 0.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 

---
v6-v7: no change
v5: drop dead code [Vladimir], return error on find_extent() failure
v4: rebase to interface tweak
v3: no change
v2: rebase to mapping flag
---
 block/vmdk.c | 38 ++
 1 file changed, 14 insertions(+), 24 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index ef15ddbfd3d..75f84213e6f 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1304,33 +1304,27 @@ static inline uint64_t vmdk_find_offset_in_cluster(VmdkExtent *extent,
 return extent_relative_offset % cluster_size;
 }

-static inline uint64_t vmdk_find_index_in_cluster(VmdkExtent *extent,
-  int64_t sector_num)
-{
-uint64_t offset;
-offset = vmdk_find_offset_in_cluster(extent, sector_num * BDRV_SECTOR_SIZE);
-return offset / BDRV_SECTOR_SIZE;
-}
-
-static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn vmdk_co_block_status(BlockDriverState *bs,
+ bool want_zero,
+ int64_t offset, int64_t bytes,
+ int64_t *pnum, int64_t *map,
+ BlockDriverState **file)
 {
 BDRVVmdkState *s = bs->opaque;
 int64_t index_in_cluster, n, ret;
-uint64_t offset;
+uint64_t cluster_offset;
 VmdkExtent *extent;

-extent = find_extent(s, sector_num, NULL);
+extent = find_extent(s, offset >> BDRV_SECTOR_BITS, NULL);
 if (!extent) {
-return 0;
+return -EIO;
 }
 qemu_co_mutex_lock(&s->lock);
-ret = get_cluster_offset(bs, extent, NULL,
- sector_num * 512, false, &offset,
+ret = get_cluster_offset(bs, extent, NULL, offset, false, &cluster_offset,
  0, 0);
 qemu_co_mutex_unlock(&s->lock);

-index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
+index_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
 switch (ret) {
 case VMDK_ERROR:
 ret = -EIO;
@@ -1345,18 +1339,14 @@ static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
 ret = BDRV_BLOCK_DATA;
 if (!extent->compressed) {
 ret |= BDRV_BLOCK_OFFSET_VALID;
-ret |= (offset + (index_in_cluster << BDRV_SECTOR_BITS))
-& BDRV_BLOCK_OFFSET_MASK;
+*map = cluster_offset + index_in_cluster;
 }
 *file = extent->file->bs;
 break;
 }

-n = extent->cluster_sectors - index_in_cluster;
-if (n > nb_sectors) {
-n = nb_sectors;
-}
-*pnum = n;
+n = extent->cluster_sectors * BDRV_SECTOR_SIZE - index_in_cluster;
+*pnum = MIN(n, bytes);
 return ret;
 }

@@ -2410,7 +2400,7 @@ static BlockDriver bdrv_vmdk = {
 .bdrv_close   = vmdk_close,
 .bdrv_create  = vmdk_create,
 .bdrv_co_flush_to_disk= vmdk_co_flush,
-.bdrv_co_get_block_status = vmdk_co_get_block_status,
+.bdrv_co_block_status = vmdk_co_block_status,
 .bdrv_get_allocated_file_size = vmdk_get_allocated_file_size,
 .bdrv_has_zero_init   = vmdk_has_zero_init,
 .bdrv_get_specific_info   = vmdk_get_specific_info,
-- 
2.14.3
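
As a rough illustration of the byte-based arithmetic in the vmdk patch above, the sketch below computes the two outputs a driver hands back: the number of bytes left in the current cluster (clamped to the request) and the host offset of the data. The struct and function names are made up for this example, and the extent start is passed in as a plain byte offset rather than derived from a VmdkExtent.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result pair standing in for the *pnum and *map outputs. */
struct vmdk_status {
    int64_t pnum;   /* bytes covered by this answer */
    int64_t map;    /* host offset of the first byte */
};

/* Offset of a guest byte within its cluster, relative to the extent start. */
static int64_t offset_in_cluster(int64_t offset, int64_t extent_start,
                                 int64_t cluster_bytes)
{
    return (offset - extent_start) % cluster_bytes;
}

static struct vmdk_status describe(int64_t offset, int64_t bytes,
                                   int64_t extent_start,
                                   int64_t cluster_bytes,
                                   int64_t cluster_offset)
{
    struct vmdk_status st;
    int64_t in_cluster = offset_in_cluster(offset, extent_start,
                                           cluster_bytes);
    /* Never report past the end of the current cluster or the request. */
    int64_t n = cluster_bytes - in_cluster;

    assert(bytes > 0 && cluster_bytes > 0 && offset >= extent_start);
    st.pnum = n < bytes ? n : bytes;
    st.map = cluster_offset + in_cluster;
    return st;
}
```

With everything in bytes there is no shift by BDRV_SECTOR_BITS left in the calculation, which is what lets the patch drop vmdk_find_index_in_cluster().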

[Qemu-devel] [PATCH v8 20/21] vvfat: Switch to .bdrv_co_block_status()

2018-02-13 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vvfat driver accordingly.  Note that we
can rely on the block driver having already clamped limits to our
block size, and simplify accordingly.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 

---
v5-v7: no change
v4: rebase to interface tweak
v3: no change
v2: rebase to earlier changes, simplify
---
 block/vvfat.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index 7e06ebacf61..4a17a49e128 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -3088,15 +3088,13 @@ vvfat_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
 return ret;
 }

-static int64_t coroutine_fn vvfat_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *n, BlockDriverState **file)
+static int coroutine_fn vvfat_co_block_status(BlockDriverState *bs,
+  bool want_zero, int64_t offset,
+  int64_t bytes, int64_t *n,
+  int64_t *map,
+  BlockDriverState **file)
 {
-*n = bs->total_sectors - sector_num;
-if (*n > nb_sectors) {
-*n = nb_sectors;
-} else if (*n < 0) {
-return 0;
-}
+*n = bytes;
 return BDRV_BLOCK_DATA;
 }

@@ -3257,7 +3255,7 @@ static BlockDriver bdrv_vvfat = {

 .bdrv_co_preadv = vvfat_co_preadv,
 .bdrv_co_pwritev= vvfat_co_pwritev,
-.bdrv_co_get_block_status = vvfat_co_get_block_status,
+.bdrv_co_block_status   = vvfat_co_block_status,
 };

 static void bdrv_vvfat_init(void)
-- 
2.14.3

[Qemu-devel] [PATCH v8 21/21] block: Drop unused .bdrv_co_get_block_status()

2018-02-13 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Now that all drivers have been updated to provide the
byte-based .bdrv_co_block_status(), we can delete the sector-based
interface.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 

---
v7: no change
v6: rebase to changes in patch 1, drop R-b
v5: rebase to master
v4: rebase to interface tweak
v3: no change
v2: rebase to earlier changes
---
 include/block/block_int.h |  3 ---
 block/io.c| 50 ++-
 2 files changed, 10 insertions(+), 43 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index bf2598856cf..5ae7738cf8d 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -215,9 +215,6 @@ struct BlockDriver {
  * as well as non-NULL pnum, map, and file; in turn, the driver
  * must return an error or set pnum to an aligned non-zero value.
  */
-int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum,
-BlockDriverState **file);
 int coroutine_fn (*bdrv_co_block_status)(BlockDriverState *bs,
 bool want_zero, int64_t offset, int64_t bytes, int64_t *pnum,
 int64_t *map, BlockDriverState **file);
diff --git a/block/io.c b/block/io.c
index 5bae79f282e..4c3dba09730 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1963,7 +1963,7 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,

 /* Must be non-NULL or bdrv_getlength() would have failed */
 assert(bs->drv);
-if (!bs->drv->bdrv_co_get_block_status && !bs->drv->bdrv_co_block_status) {
+if (!bs->drv->bdrv_co_block_status) {
 *pnum = bytes;
 ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
 if (offset + bytes == total_size) {
@@ -1981,53 +1981,23 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,

 /* Round out to request_alignment boundaries */
 align = bs->bl.request_alignment;
-if (bs->drv->bdrv_co_get_block_status && align < BDRV_SECTOR_SIZE) {
-align = BDRV_SECTOR_SIZE;
-}
 aligned_offset = QEMU_ALIGN_DOWN(offset, align);
 aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;

-if (bs->drv->bdrv_co_get_block_status) {
-int count; /* sectors */
-int64_t longret;
-
-assert(QEMU_IS_ALIGNED(aligned_offset | aligned_bytes,
-   BDRV_SECTOR_SIZE));
-/*
- * The contract allows us to return pnum smaller than bytes, even
- * if the next query would see the same status; we truncate the
- * request to avoid overflowing the driver's 32-bit interface.
- */
-longret = bs->drv->bdrv_co_get_block_status(
-bs, aligned_offset >> BDRV_SECTOR_BITS,
-MIN(INT_MAX, aligned_bytes) >> BDRV_SECTOR_BITS, &count,
-&local_file);
-if (longret < 0) {
-assert(INT_MIN <= longret);
-ret = longret;
-goto out;
-}
-if (longret & BDRV_BLOCK_OFFSET_VALID) {
-local_map = longret & BDRV_BLOCK_OFFSET_MASK;
-}
-ret = longret & ~BDRV_BLOCK_OFFSET_MASK;
-*pnum = count * BDRV_SECTOR_SIZE;
-} else {
-ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
-aligned_bytes, pnum, &local_map,
-&local_file);
-if (ret < 0) {
-*pnum = 0;
-goto out;
-}
-assert(*pnum); /* The block driver must make progress */
+ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
+aligned_bytes, pnum, &local_map,
+&local_file);
+if (ret < 0) {
+*pnum = 0;
+goto out;
 }

 /*
- * The driver's result must be a multiple of request_alignment.
+ * The driver's result must be a non-zero multiple of request_alignment.
  * Clamp pnum and adjust map to original request.
  */
-assert(QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset);
+assert(*pnum && QEMU_IS_ALIGNED(*pnum, align) &&
+   align > offset - aligned_offset);
 *pnum -= offset - aligned_offset;
 if (*pnum > bytes) {
 *pnum = bytes;
-- 
2.14.3
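
The rounding and clamping that remains in block/io.c after this patch can be sketched in isolation. This is a simplified model, not the QEMU code: round_out() and clamp_pnum() are names invented here, and plain integer arithmetic stands in for the QEMU_ALIGN_DOWN and ROUND_UP macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair holding the request after rounding out to 'align'. */
struct aligned_req {
    int64_t offset;
    int64_t bytes;
};

/* Widen [offset, offset + bytes) so both ends sit on 'align' boundaries. */
static struct aligned_req round_out(int64_t offset, int64_t bytes,
                                    int64_t align)
{
    struct aligned_req r;
    int64_t end = (offset + bytes + align - 1) / align * align;

    r.offset = offset - offset % align;
    r.bytes = end - r.offset;
    return r;
}

/* Trim the driver's answer back down to what the caller asked about. */
static int64_t clamp_pnum(int64_t pnum, int64_t offset, int64_t bytes,
                          int64_t align)
{
    int64_t skew = offset % align;   /* bytes added in front by round_out */

    /* Same contract as the assert in the patch: non-zero and aligned. */
    assert(pnum > 0 && pnum % align == 0 && align > skew);
    pnum -= skew;
    return pnum > bytes ? bytes : pnum;
}
```

For a 100-byte query at offset 1000 with 512-byte alignment, the driver is asked about 1024 bytes starting at 512. If it answers with a single aligned 512-byte unit, only 24 of those bytes fall inside the caller's range.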

[Qemu-devel] [PATCH v8 00/21] add byte-based block_status driver callbacks

2018-02-13 Thread Eric Blake

There are patches floating around to add NBD_CMD_BLOCK_STATUS,
but NBD wants to report status on byte granularity (even if the
reporting will probably be naturally aligned to sectors or even
much higher levels).  I've therefore started the task of
converting our block status code to report at a byte granularity
rather than sectors.

These patches have been around for a while, but it's time to
finish it now that 2.12 is open for patches.

Based-on: <20180213170529.10858-1-kw...@redhat.com>
(Kevin's [PULL 00/55] Block layer patches)

The overall conversion currently looks like:
part 1: bdrv_is_allocated (merged, commit 51b0a488, 2.10)
part 2: dirty-bitmap (merged, commit ca759622, 2.11)
part 3: bdrv_get_block_status (merged, commit f0a9c18f, 2.11)
part 4: .bdrv_co_block_status (this series, v7 was here [1])

[1] https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg00954.html

Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-callback-v8

Since v7:
- rebase to master (more iscsi context changes, nvme driver is new)
- add a few more R-bys

Eric Blake (21):
  block: Add .bdrv_co_block_status() callback
  nvme: Drop pointless .bdrv_co_get_block_status()
  block: Switch passthrough drivers to .bdrv_co_block_status()
  file-posix: Switch to .bdrv_co_block_status()
  gluster: Switch to .bdrv_co_block_status()
  iscsi: Switch cluster_sectors to byte-based
  iscsi: Switch iscsi_allocmap_update() to byte-based
  iscsi: Switch to .bdrv_co_block_status()
  null: Switch to .bdrv_co_block_status()
  parallels: Switch to .bdrv_co_block_status()
  qcow: Switch to .bdrv_co_block_status()
  qcow2: Switch to .bdrv_co_block_status()
  qed: Switch to .bdrv_co_block_status()
  raw: Switch to .bdrv_co_block_status()
  sheepdog: Switch to .bdrv_co_block_status()
  vdi: Avoid bitrot of debugging code
  vdi: Switch to .bdrv_co_block_status()
  vmdk: Switch to .bdrv_co_block_status()
  vpc: Switch to .bdrv_co_block_status()
  vvfat: Switch to .bdrv_co_block_status()
  block: Drop unused .bdrv_co_get_block_status()

 include/block/block.h |  14 ++---
 include/block/block_int.h |  51 +--
 block/io.c|  86 +++--
 block/blkdebug.c  |  20 +++---
 block/commit.c|   2 +-
 block/file-posix.c|  62 +-
 block/gluster.c   |  70 ++---
 block/iscsi.c | 157 --
 block/mirror.c|   2 +-
 block/null.c  |  23 +++
 block/nvme.c  |  14 -
 block/parallels.c |  22 ---
 block/qcow.c  |  27 
 block/qcow2.c |  24 +++
 block/qed.c   |  84 +
 block/raw-format.c|  16 ++---
 block/sheepdog.c  |  26 
 block/throttle.c  |   2 +-
 block/vdi.c   |  45 +++--
 block/vmdk.c  |  38 +--
 block/vpc.c   |  45 ++---
 block/vvfat.c |  16 +++--
 22 files changed, 404 insertions(+), 442 deletions(-)

-- 
2.14.3
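
For readers new to the series, the difference between the two callback styles can be sketched as follows. The old sector-based callback returned one 64-bit value that packed the status flags into the low bits and a sector-aligned host offset into the rest; the new one returns the flags as an int and reports the mapping through a separate output. The constants below are local stand-ins rather than the QEMU definitions, and the struct and function names are invented for this example. The unpacking mirrors the compatibility code that patch 21 removes from block/io.c.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the QEMU constants, for illustration only. */
#define SECTOR_BITS       9
#define SECTOR_SIZE       (INT64_C(1) << SECTOR_BITS)
#define FLAG_DATA         0x01
#define FLAG_OFFSET_VALID 0x04
#define OFFSET_MASK       (~(SECTOR_SIZE - 1))

/* Hypothetical byte-based result: flags, mapping and length kept apart. */
struct status {
    int flags;
    int64_t map;    /* host offset in bytes, meaningful if OFFSET_VALID */
    int64_t pnum;   /* length in bytes */
};

/* Legacy style: flags and a sector-aligned offset share one return value. */
static int64_t pack_legacy(int flags, int64_t host_offset)
{
    /* The low bits belong to the flags, so the offset must be aligned. */
    assert((host_offset & ~OFFSET_MASK) == 0);
    return flags | host_offset;
}

/* Split a legacy return value and sector count into byte-based fields. */
static struct status unpack_legacy(int64_t longret, int count_sectors)
{
    struct status st = { 0, 0, 0 };

    if (longret & FLAG_OFFSET_VALID) {
        st.map = longret & OFFSET_MASK;
    }
    st.flags = (int)(longret & ~OFFSET_MASK);
    st.pnum = (int64_t)count_sectors * SECTOR_SIZE;
    return st;
}
```

The assert in pack_legacy() shows why the old encoding cannot express sub-sector mappings, which is the limitation NBD_CMD_BLOCK_STATUS runs into.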

[Qemu-devel] [PATCH v8 15/21] sheepdog: Switch to .bdrv_co_block_status()

2018-02-13 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the sheepdog driver accordingly.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 
Reviewed-by: Jeff Cody 

---
v7: rebase to minor spacing changes in master
v5-v6: no change
v4: update to interface tweak
v3: no change
v2: rebase to mapping flag
---
 block/sheepdog.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index ac02b10fe03..3c3becf94df 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -3004,19 +3004,19 @@ static coroutine_fn int sd_co_pdiscard(BlockDriverState *bs, int64_t offset,
 return acb.ret;
 }

-static coroutine_fn int64_t
-sd_co_get_block_status(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
-   int *pnum, BlockDriverState **file)
+static coroutine_fn int
+sd_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
+   int64_t bytes, int64_t *pnum, int64_t *map,
+   BlockDriverState **file)
 {
 BDRVSheepdogState *s = bs->opaque;
 SheepdogInode *inode = &s->inode;
 uint32_t object_size = (UINT32_C(1) << inode->block_size_shift);
-uint64_t offset = sector_num * BDRV_SECTOR_SIZE;
 unsigned long start = offset / object_size,
-  end = DIV_ROUND_UP((sector_num + nb_sectors) *
- BDRV_SECTOR_SIZE, object_size);
+  end = DIV_ROUND_UP(offset + bytes, object_size);
 unsigned long idx;
-int64_t ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
+*map = offset;
+int ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;

 for (idx = start; idx < end; idx++) {
 if (inode->data_vdi_id[idx] == 0) {
@@ -3033,9 +3033,9 @@ sd_co_get_block_status(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
 }
 }

-*pnum = (idx - start) * object_size / BDRV_SECTOR_SIZE;
-if (*pnum > nb_sectors) {
-*pnum = nb_sectors;
+*pnum = (idx - start) * object_size;
+if (*pnum > bytes) {
+*pnum = bytes;
 }
 if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID) {
 *file = bs;
@@ -3113,7 +3113,7 @@ static BlockDriver bdrv_sheepdog = {
 .bdrv_co_writev   = sd_co_writev,
 .bdrv_co_flush_to_disk= sd_co_flush_to_disk,
 .bdrv_co_pdiscard = sd_co_pdiscard,
-.bdrv_co_get_block_status = sd_co_get_block_status,
+.bdrv_co_block_status = sd_co_block_status,

 .bdrv_snapshot_create = sd_snapshot_create,
 .bdrv_snapshot_goto   = sd_snapshot_goto,
@@ -3149,7 +3149,7 @@ static BlockDriver bdrv_sheepdog_tcp = {
 .bdrv_co_writev   = sd_co_writev,
 .bdrv_co_flush_to_disk= sd_co_flush_to_disk,
 .bdrv_co_pdiscard = sd_co_pdiscard,
-.bdrv_co_get_block_status = sd_co_get_block_status,
+.bdrv_co_block_status = sd_co_block_status,

 .bdrv_snapshot_create = sd_snapshot_create,
 .bdrv_snapshot_goto   = sd_snapshot_goto,
@@ -3185,7 +3185,7 @@ static BlockDriver bdrv_sheepdog_unix = {
 .bdrv_co_writev   = sd_co_writev,
 .bdrv_co_flush_to_disk= sd_co_flush_to_disk,
 .bdrv_co_pdiscard = sd_co_pdiscard,
-.bdrv_co_get_block_status = sd_co_get_block_status,
+.bdrv_co_block_status = sd_co_block_status,

 .bdrv_snapshot_create = sd_snapshot_create,
 .bdrv_snapshot_goto   = sd_snapshot_goto,
-- 
2.14.3
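
The object-index arithmetic in the sheepdog patch above is easier to check on its own. The sketch below assumes nothing beyond what the hunk shows: a byte range is mapped to the half-open range of objects it touches, and the resulting run length is clamped to the request. The names obj_range, object_range() and clamp_run() are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical half-open range [start, end) of object indexes. */
struct obj_range {
    uint64_t start;
    uint64_t end;
};

/* Which objects does the byte range [offset, offset + bytes) touch? */
static struct obj_range object_range(int64_t offset, int64_t bytes,
                                     uint32_t object_size)
{
    struct obj_range r;

    assert(offset >= 0 && bytes > 0 && object_size > 0);
    r.start = (uint64_t)offset / object_size;
    /* Same effect as DIV_ROUND_UP(offset + bytes, object_size). */
    r.end = ((uint64_t)offset + (uint64_t)bytes + object_size - 1)
            / object_size;
    return r;
}

/* Length in bytes of a run that stopped at object 'idx', clamped. */
static int64_t clamp_run(uint64_t start, uint64_t idx, uint32_t object_size,
                         int64_t bytes)
{
    int64_t pnum = (int64_t)((idx - start) * object_size);

    return pnum > bytes ? bytes : pnum;
}
```

A two-byte request straddling the first object boundary touches objects 0 and 1, yet the reported length is still clamped to two bytes.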

[Qemu-devel] [PATCH v8 19/21] vpc: Switch to .bdrv_co_block_status()

2018-02-13 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vpc driver accordingly.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 

---
v7: tweak commit message and type of 'n' [Fam]
v6: no change
v5: fix incorrect rounding in 'map' and bad loop condition [Vladimir]
v4: rebase to interface tweak
v3: rebase to master
v2: drop get_sector_offset() [Kevin], rebase to mapping flag
---
 block/vpc.c | 45 +++--
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index cfa5144e867..fba4492fd7b 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -706,53 +706,54 @@ fail:
 return ret;
 }

-static int64_t coroutine_fn vpc_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn vpc_co_block_status(BlockDriverState *bs,
+bool want_zero,
+int64_t offset, int64_t bytes,
+int64_t *pnum, int64_t *map,
+BlockDriverState **file)
 {
 BDRVVPCState *s = bs->opaque;
 VHDFooter *footer = (VHDFooter*) s->footer_buf;
-int64_t start, offset;
+int64_t image_offset;
 bool allocated;
-int64_t ret;
-int n;
+int ret;
+int64_t n;

 if (be32_to_cpu(footer->type) == VHD_FIXED) {
-*pnum = nb_sectors;
+*pnum = bytes;
+*map = offset;
 *file = bs->file->bs;
-return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
-   (sector_num << BDRV_SECTOR_BITS);
+return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
 }

 qemu_co_mutex_lock(&s->lock);

-offset = get_image_offset(bs, sector_num << BDRV_SECTOR_BITS, false, NULL);
-start = offset;
-allocated = (offset != -1);
+image_offset = get_image_offset(bs, offset, false, NULL);
+allocated = (image_offset != -1);
 *pnum = 0;
 ret = 0;

 do {
 /* All sectors in a block are contiguous (without using the bitmap) */
-n = ROUND_UP(sector_num + 1, s->block_size / BDRV_SECTOR_SIZE)
-  - sector_num;
-n = MIN(n, nb_sectors);
+n = ROUND_UP(offset + 1, s->block_size) - offset;
+n = MIN(n, bytes);

 *pnum += n;
-sector_num += n;
-nb_sectors -= n;
+offset += n;
+bytes -= n;
 /* *pnum can't be greater than one block for allocated
  * sectors since there is always a bitmap in between. */
 if (allocated) {
 *file = bs->file->bs;
-ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
+*map = image_offset;
+ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 break;
 }
-if (nb_sectors == 0) {
+if (bytes == 0) {
 break;
 }
-offset = get_image_offset(bs, sector_num << BDRV_SECTOR_BITS, false,
-  NULL);
-} while (offset == -1);
+image_offset = get_image_offset(bs, offset, false, NULL);
+} while (image_offset == -1);

 qemu_co_mutex_unlock(&s->lock);
 return ret;
@@ -1098,7 +1099,7 @@ static BlockDriver bdrv_vpc = {

 .bdrv_co_preadv = vpc_co_preadv,
 .bdrv_co_pwritev= vpc_co_pwritev,
-.bdrv_co_get_block_status   = vpc_co_get_block_status,
+.bdrv_co_block_status   = vpc_co_block_status,

 .bdrv_get_info  = vpc_get_info,

-- 
2.14.3
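
The loop in the vpc patch above relies on the idiom ROUND_UP(offset + 1, block_size) - offset to find how many bytes remain in the current block. A small standalone sketch shows why the "+ 1" matters. The helper name is invented, and integer division stands in for the ROUND_UP macro.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes from 'offset' to the end of its block, clamped to 'bytes'.
 * Rounding up offset + 1 rather than offset means an offset that already
 * sits on a block boundary yields a full block, not zero.
 */
static int64_t bytes_to_block_end(int64_t offset, int64_t bytes,
                                  int64_t block_size)
{
    int64_t next = (offset + block_size) / block_size * block_size;
    int64_t n = next - offset;

    assert(offset >= 0 && bytes > 0 && block_size > 0);
    return n < bytes ? n : bytes;
}
```

Here (offset + block_size) / block_size * block_size equals ROUND_UP(offset + 1, block_size) written without the macro. Since n is always at least 1, every pass through the loop advances offset and shrinks bytes.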

[Qemu-devel] [PATCH v8 13/21] qed: Switch to .bdrv_co_block_status()

2018-02-13 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the qed driver accordingly, taking the opportunity
to inline qed_is_allocated_cb() into its lone caller (the callback
used to be important, until we switched qed to coroutines).  There is
no intent to optimize based on the want_zero flag for this format.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 

---
v6-v7: no change
v5: initialize len before qed_find_cluster() [Vladimir]
v4: rebase to interface change, inline pointless callback
v3: no change
v2: rebase to mapping flag, fix mask in qed_is_allocated_cb
---
 block/qed.c | 84 +
 1 file changed, 28 insertions(+), 56 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index c6ff3ab015d..a5952209261 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -688,74 +688,46 @@ finish:
 return ret;
 }

-typedef struct {
-BlockDriverState *bs;
-Coroutine *co;
-uint64_t pos;
-int64_t status;
-int *pnum;
-BlockDriverState **file;
-} QEDIsAllocatedCB;
-
-/* Called with table_lock held.  */
-static void qed_is_allocated_cb(void *opaque, int ret, uint64_t offset, size_t len)
-{
-QEDIsAllocatedCB *cb = opaque;
-BDRVQEDState *s = cb->bs->opaque;
-*cb->pnum = len / BDRV_SECTOR_SIZE;
-switch (ret) {
-case QED_CLUSTER_FOUND:
-offset |= qed_offset_into_cluster(s, cb->pos);
-cb->status = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
-*cb->file = cb->bs->file->bs;
-break;
-case QED_CLUSTER_ZERO:
-cb->status = BDRV_BLOCK_ZERO;
-break;
-case QED_CLUSTER_L2:
-case QED_CLUSTER_L1:
-cb->status = 0;
-break;
-default:
-assert(ret < 0);
-cb->status = ret;
-break;
-}
-
-if (cb->co) {
-aio_co_wake(cb->co);
-}
-}
-
-static int64_t coroutine_fn bdrv_qed_co_get_block_status(BlockDriverState *bs,
- int64_t sector_num,
- int nb_sectors, int *pnum,
+static int coroutine_fn bdrv_qed_co_block_status(BlockDriverState *bs,
+ bool want_zero,
+ int64_t pos, int64_t bytes,
+ int64_t *pnum, int64_t *map,
  BlockDriverState **file)
 {
 BDRVQEDState *s = bs->opaque;
-size_t len = (size_t)nb_sectors * BDRV_SECTOR_SIZE;
-QEDIsAllocatedCB cb = {
-.bs = bs,
-.pos = (uint64_t)sector_num * BDRV_SECTOR_SIZE,
-.status = BDRV_BLOCK_OFFSET_MASK,
-.pnum = pnum,
-.file = file,
-};
+size_t len = MIN(bytes, SIZE_MAX);
+int status;
 QEDRequest request = { .l2_table = NULL };
 uint64_t offset;
 int ret;

 qemu_co_mutex_lock(&s->table_lock);
-ret = qed_find_cluster(s, &request, cb.pos, &len, &offset);
-qed_is_allocated_cb(&cb, ret, offset, len);
+ret = qed_find_cluster(s, &request, pos, &len, &offset);

-/* The callback was invoked immediately */
-assert(cb.status != BDRV_BLOCK_OFFSET_MASK);
+*pnum = len;
+switch (ret) {
+case QED_CLUSTER_FOUND:
+*map = offset | qed_offset_into_cluster(s, pos);
+status = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+*file = bs->file->bs;
+break;
+case QED_CLUSTER_ZERO:
+status = BDRV_BLOCK_ZERO;
+break;
+case QED_CLUSTER_L2:
+case QED_CLUSTER_L1:
+status = 0;
+break;
+default:
+assert(ret < 0);
+status = ret;
+break;
+}

 qed_unref_l2_cache_entry(request.l2_table);
 qemu_co_mutex_unlock(&s->table_lock);

-return cb.status;
+return status;
 }

 static BDRVQEDState *acb_to_s(QEDAIOCB *acb)
@@ -1594,7 +1566,7 @@ static BlockDriver bdrv_qed = {
 .bdrv_child_perm  = bdrv_format_default_perms,
 .bdrv_create  = bdrv_qed_create,
 .bdrv_has_zero_init   = bdrv_has_zero_init_1,
-.bdrv_co_get_block_status = bdrv_qed_co_get_block_status,
+.bdrv_co_block_status = bdrv_qed_co_block_status,
 .bdrv_co_readv= bdrv_qed_co_readv,
 .bdrv_co_writev   = bdrv_qed_co_writev,
 .bdrv_co_pwrite_zeroes= bdrv_qed_co_pwrite_zeroes,
-- 
2.14.3

[Qemu-devel] [PATCH v8 07/21] iscsi: Switch iscsi_allocmap_update() to byte-based

2018-02-13 Thread Eric Blake

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert all uses of
the allocmap (no semantic change).  Callers that already had bytes
available are simpler, and callers that now scale to bytes will be
easier to switch to byte-based in the future.
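
As an illustration of the two roundings the allocmap code performs, here is a minimal byte-based sketch. The cl_range struct and the function names are invented for this example. Clamping a negative shrunk count to zero is an assumption made here, since the part of the function that consumes the count is not shown in the hunk.

```c
#include <assert.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical cluster range: index of the first cluster and a count. */
struct cl_range {
    int64_t first;
    int64_t count;
};

/* Every cluster touched by [offset, offset + bytes), even partially. */
static struct cl_range expanded(int64_t offset, int64_t bytes, int64_t cs)
{
    struct cl_range r;

    assert(cs > 0);
    r.first = offset / cs;
    r.count = DIV_ROUND_UP(offset + bytes, cs) - r.first;
    return r;
}

/* Only the clusters that lie completely inside the byte range. */
static struct cl_range shrunk(int64_t offset, int64_t bytes, int64_t cs)
{
    struct cl_range r;

    assert(cs > 0);
    r.first = DIV_ROUND_UP(offset, cs);
    r.count = (offset + bytes) / cs - r.first;
    if (r.count < 0) {
        r.count = 0;   /* range sits inside one cluster: nothing is whole */
    }
    return r;
}
```

Marking data as allocated uses the expanded range, because a partially written cluster is still allocated. Marking it unallocated can only be trusted for the shrunk range.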

Signed-off-by: Eric Blake 
Acked-by: Paolo Bonzini 
Reviewed-by: Fam Zheng 

---
v7: rebase to master, simple enough to keep ack
v3-v6: no change
v2: rebase to count/bytes rename
---
 block/iscsi.c | 90 +--
 1 file changed, 44 insertions(+), 46 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 3414c21c7f5..d2b0466775c 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -458,24 +458,22 @@ static int iscsi_allocmap_init(IscsiLun *iscsilun, int open_flags)
 }

 static void
-iscsi_allocmap_update(IscsiLun *iscsilun, int64_t sector_num,
-  int nb_sectors, bool allocated, bool valid)
+iscsi_allocmap_update(IscsiLun *iscsilun, int64_t offset,
+  int64_t bytes, bool allocated, bool valid)
 {
 int64_t cl_num_expanded, nb_cls_expanded, cl_num_shrunk, nb_cls_shrunk;
-int cluster_sectors = iscsilun->cluster_size >> BDRV_SECTOR_BITS;

 if (iscsilun->allocmap == NULL) {
 return;
 }
 /* expand to entirely contain all affected clusters */
-assert(cluster_sectors);
-cl_num_expanded = sector_num / cluster_sectors;
-nb_cls_expanded = DIV_ROUND_UP(sector_num + nb_sectors,
-   cluster_sectors) - cl_num_expanded;
+assert(iscsilun->cluster_size);
+cl_num_expanded = offset / iscsilun->cluster_size;
+nb_cls_expanded = DIV_ROUND_UP(offset + bytes,
+   iscsilun->cluster_size) - cl_num_expanded;
 /* shrink to touch only completely contained clusters */
-cl_num_shrunk = DIV_ROUND_UP(sector_num, cluster_sectors);
-nb_cls_shrunk = (sector_num + nb_sectors) / cluster_sectors
-  - cl_num_shrunk;
+cl_num_shrunk = DIV_ROUND_UP(offset, iscsilun->cluster_size);
+nb_cls_shrunk = (offset + bytes) / iscsilun->cluster_size - cl_num_shrunk;
 if (allocated) {
 bitmap_set(iscsilun->allocmap, cl_num_expanded, nb_cls_expanded);
 } else {
@@ -498,26 +496,26 @@ iscsi_allocmap_update(IscsiLun *iscsilun, int64_t sector_num,
 }

 static void
-iscsi_allocmap_set_allocated(IscsiLun *iscsilun, int64_t sector_num,
- int nb_sectors)
+iscsi_allocmap_set_allocated(IscsiLun *iscsilun, int64_t offset,
+ int64_t bytes)
 {
-iscsi_allocmap_update(iscsilun, sector_num, nb_sectors, true, true);
+iscsi_allocmap_update(iscsilun, offset, bytes, true, true);
 }

 static void
-iscsi_allocmap_set_unallocated(IscsiLun *iscsilun, int64_t sector_num,
-   int nb_sectors)
+iscsi_allocmap_set_unallocated(IscsiLun *iscsilun, int64_t offset,
+   int64_t bytes)
 {
 /* Note: if cache.direct=on the fifth argument to iscsi_allocmap_update
  * is ignored, so this will in effect be an iscsi_allocmap_set_invalid.
  */
-iscsi_allocmap_update(iscsilun, sector_num, nb_sectors, false, true);
+iscsi_allocmap_update(iscsilun, offset, bytes, false, true);
 }

-static void iscsi_allocmap_set_invalid(IscsiLun *iscsilun, int64_t sector_num,
-   int nb_sectors)
+static void iscsi_allocmap_set_invalid(IscsiLun *iscsilun, int64_t offset,
+   int64_t bytes)
 {
-iscsi_allocmap_update(iscsilun, sector_num, nb_sectors, false, false);
+iscsi_allocmap_update(iscsilun, offset, bytes, false, false);
 }

 static void iscsi_allocmap_invalidate(IscsiLun *iscsilun)
@@ -531,34 +529,30 @@ static void iscsi_allocmap_invalidate(IscsiLun *iscsilun)
 }

 static inline bool
-iscsi_allocmap_is_allocated(IscsiLun *iscsilun, int64_t sector_num,
-int nb_sectors)
+iscsi_allocmap_is_allocated(IscsiLun *iscsilun, int64_t offset,
+int64_t bytes)
 {
 unsigned long size;
 if (iscsilun->allocmap == NULL) {
 return true;
 }
 assert(iscsilun->cluster_size);
-size = DIV_ROUND_UP(sector_num + nb_sectors,
-iscsilun->cluster_size >> BDRV_SECTOR_BITS);
+size = DIV_ROUND_UP(offset + bytes, iscsilun->cluster_size);
 return !(find_next_bit(iscsilun->allocmap, size,
-   sector_num * BDRV_SECTOR_SIZE /
-   iscsilun->cluster_size) == size);
+   offset / iscsilun->cluster_size) == size);
 }

 static inline bool iscsi_allocmap_is_valid(IscsiLun *iscsilun,
-   int64_t sector_num, int nb_sectors)
+   int64_t offset, int64_t bytes)
 {
 unsigned long size;
 if (iscsilun->allocmap_valid

[Qemu-devel] [PATCH v8 12/21] qcow2: Switch to .bdrv_co_block_status()

2018-02-13 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the qcow2 driver accordingly.

For now, we are ignoring the 'want_zero' hint.  However, it should
be relatively straightforward to honor the hint as a way to return
larger *pnum values when consecutive clusters share the same
data/zero status and differ only in having non-consecutive
mappings.
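
The byte-based mapping computed in this patch can be sketched as
follows.  This is only an illustration: demo_host_offset() is a
hypothetical helper, not a function in block/qcow2.c, and it assumes
a power-of-two cluster size and a cluster-aligned cluster_offset, so
that the bitwise OR behaves like an addition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not part of block/qcow2.c. */
static uint64_t demo_host_offset(uint64_t cluster_offset,
                                 int64_t guest_offset,
                                 uint64_t cluster_size)
{
    /* The mask below only works for a power-of-two cluster size. */
    assert(cluster_size != 0 && (cluster_size & (cluster_size - 1)) == 0);
    /* The cluster start must be aligned so that OR acts like addition. */
    assert((cluster_offset & (cluster_size - 1)) == 0);

    /* Byte position of the guest offset inside its cluster. */
    uint64_t index_in_cluster = (uint64_t)guest_offset & (cluster_size - 1);

    return cluster_offset | index_in_cluster;
}
```

With 64 KiB clusters, a guest offset of 0x12345 inside a cluster that
starts at host offset 0x50000 would map to 0x52345.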

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 

---
v5-v7: no change
v4: update to interface tweak
v3: no change
v2: rebase to mapping flag
---
 block/qcow2.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 57a517e2bdd..288b5299d80 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1670,32 +1670,34 @@ static void qcow2_join_options(QDict *options, QDict *old_options)
 }
 }

-static int64_t coroutine_fn qcow2_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
+  bool want_zero,
+  int64_t offset, int64_t count,
+  int64_t *pnum, int64_t *map,
+  BlockDriverState **file)
 {
 BDRVQcow2State *s = bs->opaque;
 uint64_t cluster_offset;
 int index_in_cluster, ret;
 unsigned int bytes;
-int64_t status = 0;
+int status = 0;

-bytes = MIN(INT_MAX, nb_sectors * BDRV_SECTOR_SIZE);
+bytes = MIN(INT_MAX, count);
 qemu_co_mutex_lock(&s->lock);
-ret = qcow2_get_cluster_offset(bs, sector_num << BDRV_SECTOR_BITS, &bytes,
-   &cluster_offset);
+ret = qcow2_get_cluster_offset(bs, offset, &bytes, &cluster_offset);
 qemu_co_mutex_unlock(&s->lock);
 if (ret < 0) {
 return ret;
 }

-*pnum = bytes >> BDRV_SECTOR_BITS;
+*pnum = bytes;

 if (cluster_offset != 0 && ret != QCOW2_CLUSTER_COMPRESSED &&
 !s->crypto) {
-index_in_cluster = sector_num & (s->cluster_sectors - 1);
-cluster_offset |= (index_in_cluster << BDRV_SECTOR_BITS);
+index_in_cluster = offset & (s->cluster_size - 1);
+*map = cluster_offset | index_in_cluster;
 *file = bs->file->bs;
-status |= BDRV_BLOCK_OFFSET_VALID | cluster_offset;
+status |= BDRV_BLOCK_OFFSET_VALID;
 }
 if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) {
 status |= BDRV_BLOCK_ZERO;
@@ -4352,7 +4354,7 @@ BlockDriver bdrv_qcow2 = {
 .bdrv_child_perm  = bdrv_format_default_perms,
 .bdrv_create= qcow2_create,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
-.bdrv_co_get_block_status = qcow2_co_get_block_status,
+.bdrv_co_block_status = qcow2_co_block_status,

 .bdrv_co_preadv = qcow2_co_preadv,
 .bdrv_co_pwritev= qcow2_co_pwritev,
-- 
2.14.3

[Qemu-devel] [PATCH v8 17/21] vdi: Switch to .bdrv_co_block_status()

2018-02-13 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vdi driver accordingly.  Note that the
TODO is already covered (the block layer guarantees bounds of its
requests), and that we can remove the now-unused s->block_sectors.
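
As a rough sketch of the byte-based arithmetic, the helper below
clamps the reported length to the end of the block containing the
offset.  demo_bytes_in_block() is a hypothetical name used only for
this example; it is not a function in block/vdi.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not part of block/vdi.c. */
static int64_t demo_bytes_in_block(int64_t offset, int64_t bytes,
                                   uint32_t block_size)
{
    assert(block_size != 0 && offset >= 0 && bytes >= 0);

    /* Byte position of the offset inside its block. */
    int64_t index_in_block = offset % block_size;
    /* Bytes left before the next block boundary. */
    int64_t remaining = (int64_t)block_size - index_in_block;

    /* Never report more than the caller asked about. */
    return remaining < bytes ? remaining : bytes;
}
```

A request that starts in the middle of a block is therefore answered
only up to the block boundary, and the caller is expected to ask
again for the rest.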

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 

---
v6-v7: no change
v5: fix pnum when offset rounded down to block_size [Vladimir]
v4: rebase to interface tweak
v3: no change
v2: rebase to mapping flag
---
 block/vdi.c | 33 +
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 32b1763cde0..0780c82d829 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -172,8 +172,6 @@ typedef struct {
 uint32_t *bmap;
 /* Size of block (bytes). */
 uint32_t block_size;
-/* Size of block (sectors). */
-uint32_t block_sectors;
 /* First sector of block map. */
 uint32_t bmap_sector;
 /* VDI header (converted to host endianness). */
@@ -463,7 +461,6 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
 bs->total_sectors = header.disk_size / SECTOR_SIZE;

 s->block_size = header.block_size;
-s->block_sectors = header.block_size / SECTOR_SIZE;
 s->bmap_sector = header.offset_bmap / SECTOR_SIZE;
 s->header = header;

@@ -509,33 +506,29 @@ static int vdi_reopen_prepare(BDRVReopenState *state,
 return 0;
 }

-static int64_t coroutine_fn vdi_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn vdi_co_block_status(BlockDriverState *bs,
+bool want_zero,
+int64_t offset, int64_t bytes,
+int64_t *pnum, int64_t *map,
+BlockDriverState **file)
 {
-/* TODO: Check for too large sector_num (in bdrv_is_allocated or here). */
 BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
-size_t bmap_index = sector_num / s->block_sectors;
-size_t sector_in_block = sector_num % s->block_sectors;
-int n_sectors = s->block_sectors - sector_in_block;
+size_t bmap_index = offset / s->block_size;
+size_t index_in_block = offset % s->block_size;
 uint32_t bmap_entry = le32_to_cpu(s->bmap[bmap_index]);
-uint64_t offset;
 int result;

-logout("%p, %" PRId64 ", %d, %p\n", bs, sector_num, nb_sectors, pnum);
-if (n_sectors > nb_sectors) {
-n_sectors = nb_sectors;
-}
-*pnum = n_sectors;
+logout("%p, %" PRId64 ", %" PRId64 ", %p\n", bs, offset, bytes, pnum);
+*pnum = MIN(s->block_size - index_in_block, bytes);
 result = VDI_IS_ALLOCATED(bmap_entry);
 if (!result) {
 return 0;
 }

-offset = s->header.offset_data +
-  (uint64_t)bmap_entry * s->block_size +
-  sector_in_block * SECTOR_SIZE;
+*map = s->header.offset_data + (uint64_t)bmap_entry * s->block_size +
+index_in_block;
 *file = bs->file->bs;
-return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 }

 static int coroutine_fn
@@ -903,7 +896,7 @@ static BlockDriver bdrv_vdi = {
 .bdrv_child_perm  = bdrv_format_default_perms,
 .bdrv_create = vdi_create,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
-.bdrv_co_get_block_status = vdi_co_get_block_status,
+.bdrv_co_block_status = vdi_co_block_status,
 .bdrv_make_empty = vdi_make_empty,

 .bdrv_co_preadv = vdi_co_preadv,
-- 
2.14.3

[Qemu-devel] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()

2018-02-13 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the null driver accordingly.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 

---
v6-v7: no change
v5: minor fix to type of 'ret'
v4: rebase to interface tweak
v3: no change
v2: rebase to mapping parameter
---
 block/null.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/block/null.c b/block/null.c
index 214d394fff4..806a8631e4d 100644
--- a/block/null.c
+++ b/block/null.c
@@ -223,22 +223,23 @@ static int null_reopen_prepare(BDRVReopenState *reopen_state,
 return 0;
 }

-static int64_t coroutine_fn null_co_get_block_status(BlockDriverState *bs,
- int64_t sector_num,
- int nb_sectors, int *pnum,
- BlockDriverState **file)
+static int coroutine_fn null_co_block_status(BlockDriverState *bs,
+ bool want_zero, int64_t offset,
+ int64_t bytes, int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
 {
 BDRVNullState *s = bs->opaque;
-off_t start = sector_num * BDRV_SECTOR_SIZE;
+int ret = BDRV_BLOCK_OFFSET_VALID;

-*pnum = nb_sectors;
+*pnum = bytes;
+*map = offset;
 *file = bs;

 if (s->read_zeroes) {
-return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO;
-} else {
-return BDRV_BLOCK_OFFSET_VALID | start;
+ret |= BDRV_BLOCK_ZERO;
 }
+return ret;
 }

 static void null_refresh_filename(BlockDriverState *bs, QDict *opts)
@@ -270,7 +271,7 @@ static BlockDriver bdrv_null_co = {
 .bdrv_co_flush_to_disk  = null_co_flush,
 .bdrv_reopen_prepare= null_reopen_prepare,

-.bdrv_co_get_block_status   = null_co_get_block_status,
+.bdrv_co_block_status   = null_co_block_status,

 .bdrv_refresh_filename  = null_refresh_filename,
 };
@@ -290,7 +291,7 @@ static BlockDriver bdrv_null_aio = {
 .bdrv_aio_flush = null_aio_flush,
 .bdrv_reopen_prepare= null_reopen_prepare,

-.bdrv_co_get_block_status   = null_co_get_block_status,
+.bdrv_co_block_status   = null_co_block_status,

 .bdrv_refresh_filename  = null_refresh_filename,
 };
-- 
2.14.3

[Qemu-devel] [PATCH v8 03/21] block: Switch passthrough drivers to .bdrv_co_block_status()

2018-02-13 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the generic helpers, and all passthrough clients
(blkdebug, commit, mirror, throttle) accordingly.
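
To illustrate what changes for a passthrough helper, the sketch
below contrasts the two styles.  All demo_ names and the DEMO_ flag
values are placeholders chosen for this example, not QEMU's
definitions; the real helpers are in the block/io.c hunk of this
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for this sketch only. */
#define DEMO_BLOCK_OFFSET_VALID 0x04
#define DEMO_BLOCK_RAW          0x08
#define DEMO_SECTOR_BITS        9

/* Sector-based style: the byte offset is folded into the return value. */
static int64_t demo_status_from_file_old(int64_t sector_num, int nb_sectors,
                                         int *pnum)
{
    assert(sector_num >= 0);
    *pnum = nb_sectors;
    return DEMO_BLOCK_RAW | DEMO_BLOCK_OFFSET_VALID |
           (sector_num << DEMO_SECTOR_BITS);
}

/* Byte-based style: flags are returned, the offset goes through 'map'. */
static int demo_status_from_file_new(int64_t offset, int64_t bytes,
                                     int64_t *pnum, int64_t *map)
{
    *pnum = bytes;
    *map = offset;
    return DEMO_BLOCK_RAW | DEMO_BLOCK_OFFSET_VALID;
}
```

Keeping the offset out of the return value means the status fits in
a plain int and the offset no longer needs to be sector-aligned.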

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 

---
v6-v7: no change
v5: rebase to master
v4: rebase to interface tweak
v3: rebase to addition of throttle driver
v2: rebase to master, retitle while merging blkdebug, commit, and mirror
---
 include/block/block_int.h | 28 
 block/io.c| 36 
 block/blkdebug.c  | 20 +++-
 block/commit.c|  2 +-
 block/mirror.c|  2 +-
 block/throttle.c  |  2 +-
 6 files changed, 50 insertions(+), 40 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index c93722b43a4..bf2598856cf 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1041,23 +1041,27 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
uint64_t *nperm, uint64_t *nshared);

 /*
- * Default implementation for drivers to pass bdrv_co_get_block_status() to
+ * Default implementation for drivers to pass bdrv_co_block_status() to
  * their file.
  */
-int64_t coroutine_fn bdrv_co_get_block_status_from_file(BlockDriverState *bs,
-int64_t sector_num,
-int nb_sectors,
-int *pnum,
-BlockDriverState **file);
+int coroutine_fn bdrv_co_block_status_from_file(BlockDriverState *bs,
+bool want_zero,
+int64_t offset,
+int64_t bytes,
+int64_t *pnum,
+int64_t *map,
+BlockDriverState **file);
 /*
- * Default implementation for drivers to pass bdrv_co_get_block_status() to
+ * Default implementation for drivers to pass bdrv_co_block_status() to
  * their backing file.
  */
-int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
-   int64_t sector_num,
-   int nb_sectors,
-   int *pnum,
-   BlockDriverState **file);
+int coroutine_fn bdrv_co_block_status_from_backing(BlockDriverState *bs,
+   bool want_zero,
+   int64_t offset,
+   int64_t bytes,
+   int64_t *pnum,
+   int64_t *map,
+   BlockDriverState **file);
 const char *bdrv_get_parent_name(const BlockDriverState *bs);
 void blk_dev_change_media_cb(BlockBackend *blk, bool load, Error **errp);
 bool blk_dev_has_removable_media(BlockBackend *blk);
diff --git a/block/io.c b/block/io.c
index b00c7e2e2c0..5bae79f282e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1868,30 +1868,34 @@ typedef struct BdrvCoBlockStatusData {
 bool done;
 } BdrvCoBlockStatusData;

-int64_t coroutine_fn bdrv_co_get_block_status_from_file(BlockDriverState *bs,
-int64_t sector_num,
-int nb_sectors,
-int *pnum,
-BlockDriverState **file)
+int coroutine_fn bdrv_co_block_status_from_file(BlockDriverState *bs,
+bool want_zero,
+int64_t offset,
+int64_t bytes,
+int64_t *pnum,
+int64_t *map,
+BlockDriverState **file)
 {
 assert(bs->file && bs->file->bs);
-*pnum = nb_sectors;
+*pnum = bytes;
+*map = offset;
 *file = bs->file->bs;
-return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
-   (sector_num << BDRV_SECTOR_BITS);
+return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
 }

-int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
-   int64_t sector_num,
-

[Qemu-devel] [PATCH v8 16/21] vdi: Avoid bitrot of debugging code

2018-02-13 Thread Eric Blake

Rework the debug define so that we always get -Wformat checking,
even when debugging is disabled.
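
The general pattern can be sketched outside of vdi.c as below.  The
DEMO_DEBUG switch, demo_log() and demo_block_index() are hypothetical
names for this example; like the patch, the macro relies on the GNU
'##__VA_ARGS__' extension.  Because the fprintf() call is always
compiled and merely guarded by a constant condition, the compiler
still type-checks the format string when debugging is off, and the
optimizer can drop the dead branch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical switch; a build would define it with -DDEMO_DEBUG. */
#ifdef DEMO_DEBUG
#define DEMO_DEBUG_ENABLED 1
#else
#define DEMO_DEBUG_ENABLED 0
#endif

/* Always expands to a real fprintf() call, so -Wformat sees it. */
#define demo_log(fmt, ...) \
    do { \
        if (DEMO_DEBUG_ENABLED) { \
            fprintf(stderr, "demo\t%-24s" fmt, __func__, ##__VA_ARGS__); \
        } \
    } while (0)

/* Hypothetical user of the macro. */
static int demo_block_index(long long offset, unsigned block_size)
{
    assert(block_size != 0);
    demo_log("offset=%lld block_size=%u\n", offset, block_size);
    return (int)(offset / block_size);
}
```

If the format string and arguments drift apart later, for example
after a parameter changes from int to int64_t, the build warns even
in non-debug configurations.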

Signed-off-by: Eric Blake 
Reviewed-by: Stefan Weil 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 

---
v2-v7: no change
---
 block/vdi.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index fc1c614cb12..32b1763cde0 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -87,12 +87,18 @@
 #define DEFAULT_CLUSTER_SIZE (1 * MiB)

 #if defined(CONFIG_VDI_DEBUG)
-#define logout(fmt, ...) \
-fprintf(stderr, "vdi\t%-24s" fmt, __func__, ##__VA_ARGS__)
+#define VDI_DEBUG 1
 #else
-#define logout(fmt, ...) ((void)0)
+#define VDI_DEBUG 0
 #endif

+#define logout(fmt, ...) \
+do {\
+if (VDI_DEBUG) {\
+fprintf(stderr, "vdi\t%-24s" fmt, __func__, ##__VA_ARGS__); \
+}   \
+} while (0)
+
 /* Image signature. */
 #define VDI_SIGNATURE 0xbeda107f

-- 
2.14.3

[Qemu-devel] [PATCH v8 05/21] gluster: Switch to .bdrv_co_block_status()

2018-02-13 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the gluster driver accordingly.

In want_zero mode, we continue to report fine-grained hole
information (the caller wants as much mapping detail as possible);
but when not in that mode, the caller prefers larger *pnum and
merely cares about what offsets are allocated at this layer, rather
than where the holes live.  Since holes still read as zeroes at
this layer (rather than deferring to a backing layer), we can take
the shortcut of skipping find_allocation(), and merely state that
all bytes are allocated.
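
The decision described above can be sketched as pure arithmetic.
This is an illustration under stated assumptions, not the driver
code: demo_extent_status() and the DEMO_ flag values are
hypothetical, and 'data' and 'hole' stand for the offsets that a
successful find_allocation() call would have reported, one of which
equals 'offset'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag values for this sketch only. */
#define DEMO_DATA 0x01
#define DEMO_ZERO 0x02

/* Hypothetical helper; error handling of the probe is left out. */
static int demo_extent_status(bool want_zero, int64_t offset, int64_t bytes,
                              int64_t data, int64_t hole, int64_t *pnum)
{
    if (!want_zero) {
        /* Caller only cares about allocation: skip the probe entirely. */
        *pnum = bytes;
        return DEMO_DATA;
    }
    if (data == offset) {
        /* In a data extent: it ends where the next hole begins. */
        *pnum = bytes < hole - offset ? bytes : hole - offset;
        return DEMO_DATA;
    }
    /* In a hole: it ends where the next data extent begins. */
    assert(hole == offset);
    *pnum = bytes < data - offset ? bytes : data - offset;
    return DEMO_ZERO;
}
```

With want_zero false the whole request is reported as data in a
single call, which is the larger *pnum the caller prefers in that
mode.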

We can also drop redundant bounds checks that are already
guaranteed by the block layer.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 

---
v6-v7: no change
v5: drop redundant code
v4: tweak commit message [Fam], rebase to interface tweak
v3: no change
v2: tweak comments [Prasanna], add mapping, drop R-b
---
 block/gluster.c | 70 -
 1 file changed, 34 insertions(+), 36 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 3f17b7819d2..1a07d221d17 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1362,68 +1362,66 @@ exit:
 }

 /*
- * Returns the allocation status of the specified sectors.
+ * Returns the allocation status of the specified offset.
  *
- * If 'sector_num' is beyond the end of the disk image the return value is 0
- * and 'pnum' is set to 0.
+ * The block layer guarantees 'offset' and 'bytes' are within bounds.
  *
- * 'pnum' is set to the number of sectors (including and immediately following
- * the specified sector) that are known to be in the same
+ * 'pnum' is set to the number of bytes (including and immediately following
+ * the specified offset) that are known to be in the same
  * allocated/unallocated state.
  *
- * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
- * beyond the end of the disk image it will be clamped.
+ * 'bytes' is the max value 'pnum' should be set to.
  *
- * (Based on raw_co_get_block_status() from file-posix.c.)
+ * (Based on raw_co_block_status() from file-posix.c.)
  */
-static int64_t coroutine_fn qemu_gluster_co_get_block_status(
-BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
-BlockDriverState **file)
+static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs,
+ bool want_zero,
+ int64_t offset,
+ int64_t bytes,
+ int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
 {
 BDRVGlusterState *s = bs->opaque;
-off_t start, data = 0, hole = 0;
-int64_t total_size;
+off_t data = 0, hole = 0;
 int ret = -EINVAL;

 if (!s->fd) {
 return ret;
 }

-start = sector_num * BDRV_SECTOR_SIZE;
-total_size = bdrv_getlength(bs);
-if (total_size < 0) {
-return total_size;
-} else if (start >= total_size) {
-*pnum = 0;
-return 0;
-} else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
-nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
+if (!want_zero) {
+*pnum = bytes;
+*map = offset;
+*file = bs;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 }

-ret = find_allocation(bs, start, &data, &hole);
+ret = find_allocation(bs, offset, &data, &hole);
 if (ret == -ENXIO) {
 /* Trailing hole */
-*pnum = nb_sectors;
+*pnum = bytes;
 ret = BDRV_BLOCK_ZERO;
 } else if (ret < 0) {
 /* No info available, so pretend there are no holes */
-*pnum = nb_sectors;
+*pnum = bytes;
 ret = BDRV_BLOCK_DATA;
-} else if (data == start) {
-/* On a data extent, compute sectors to the end of the extent,
+} else if (data == offset) {
+/* On a data extent, compute bytes to the end of the extent,
  * possibly including a partial sector at EOF. */
-*pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
+*pnum = MIN(bytes, hole - offset);
 ret = BDRV_BLOCK_DATA;
 } else {
-/* On a hole, compute sectors to the beginning of the next extent.  */
-assert(hole == start);
-*pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
+/* On a hole, compute bytes to the beginning of the next extent.  */
+assert(hole == offset);
+*pnum = MIN(bytes, data - offset);
 ret = BDRV_BLOCK_ZERO;
 }

+*map = offset;
 *file = bs;

-return ret | BDRV_BLOCK_OFFSET_VALID | start;
+return ret | BDRV_BLOCK_OFFSET_VALID;
 }


@@ -1451,7 +1449,7 @@ static Blo

[Qemu-devel] [PATCH v8 08/21] iscsi: Switch to .bdrv_co_block_status()

2018-02-13 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the iscsi driver accordingly.  In this case,
it is handy to teach iscsi_co_block_status() to handle NULL map
and file parameters, even though the block layer passes non-NULL
values, because we also call the function directly.  For now, there
are no optimizations done based on the want_zero flag.

We can also make the simplification of asserting that the block
layer passed in aligned values.
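
A minimal sketch of the byte/LBA conversions, assuming the caller has
already been handed values aligned to the LUN block size.  The demo_
helpers are hypothetical and not part of block/iscsi.c.  The patch
checks 'offset | bytes' in one step, which matches checking both
values when the block size is a power of two; the sketch checks them
separately so it does not depend on that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte offset to logical block address. */
static uint64_t demo_offset_to_lba(int64_t offset, int64_t bytes,
                                   uint32_t block_size)
{
    assert(block_size != 0 && offset >= 0 && bytes >= 0);
    /* Alignment is asserted, not handled, as in the patch. */
    assert(offset % block_size == 0);
    assert(bytes % block_size == 0);
    return (uint64_t)offset / block_size;
}

/* Hypothetical helper: block count reported by the LUN back to bytes. */
static int64_t demo_blocks_to_bytes(uint32_t num_blocks, uint32_t block_size)
{
    return (int64_t)num_blocks * block_size;
}
```

Working in bytes removes the intermediate hop through 512-byte qemu
sectors that the old sector_qemu2lun() and sector_lun2qemu() calls
needed.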

Signed-off-by: Eric Blake 
Reviewed-by: Fam Zheng 

---
v8: rebase to master
v7: rebase to master
v6: no change
v5: assert rather than check for alignment
v4: rebase to interface tweaks
v3: no change
v2: rebase to mapping parameter
---
 block/iscsi.c | 67 ---
 1 file changed, 32 insertions(+), 35 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index d2b0466775c..4842519fdad 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -653,36 +653,36 @@ out_unlock:



-static int64_t coroutine_fn iscsi_co_get_block_status(BlockDriverState *bs,
-  int64_t sector_num,
-  int nb_sectors, int *pnum,
-  BlockDriverState **file)
+static int coroutine_fn iscsi_co_block_status(BlockDriverState *bs,
+  bool want_zero, int64_t offset,
+  int64_t bytes, int64_t *pnum,
+  int64_t *map,
+  BlockDriverState **file)
 {
 IscsiLun *iscsilun = bs->opaque;
 struct scsi_get_lba_status *lbas = NULL;
 struct scsi_lba_status_descriptor *lbasd = NULL;
 struct IscsiTask iTask;
 uint64_t lba;
-int64_t ret;
+int ret;

 iscsi_co_init_iscsitask(iscsilun, &iTask);

-if (!is_sector_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
-ret = -EINVAL;
-goto out;
-}
+assert(QEMU_IS_ALIGNED(offset | bytes, iscsilun->block_size));

 /* default to all sectors allocated */
-ret = BDRV_BLOCK_DATA;
-ret |= (sector_num << BDRV_SECTOR_BITS) | BDRV_BLOCK_OFFSET_VALID;
-*pnum = nb_sectors;
+ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+if (map) {
+*map = offset;
+}
+*pnum = bytes;

 /* LUN does not support logical block provisioning */
 if (!iscsilun->lbpme) {
 goto out;
 }

-lba = sector_qemu2lun(sector_num, iscsilun);
+lba = offset / iscsilun->block_size;

 qemu_mutex_lock(&iscsilun->mutex);
 retry:
@@ -727,12 +727,12 @@ retry:

 lbasd = &lbas->descriptors[0];

-if (sector_qemu2lun(sector_num, iscsilun) != lbasd->lba) {
+if (lba != lbasd->lba) {
 ret = -EIO;
 goto out_unlock;
 }

-*pnum = sector_lun2qemu(lbasd->num_blocks, iscsilun);
+*pnum = lbasd->num_blocks * iscsilun->block_size;

 if (lbasd->provisioning == SCSI_PROVISIONING_TYPE_DEALLOCATED ||
 lbasd->provisioning == SCSI_PROVISIONING_TYPE_ANCHORED) {
@@ -743,15 +743,13 @@ retry:
 }

 if (ret & BDRV_BLOCK_ZERO) {
-iscsi_allocmap_set_unallocated(iscsilun, sector_num * BDRV_SECTOR_SIZE,
-   *pnum * BDRV_SECTOR_SIZE);
+iscsi_allocmap_set_unallocated(iscsilun, offset, *pnum);
 } else {
-iscsi_allocmap_set_allocated(iscsilun, sector_num * BDRV_SECTOR_SIZE,
- *pnum * BDRV_SECTOR_SIZE);
+iscsi_allocmap_set_allocated(iscsilun, offset, *pnum);
 }

-if (*pnum > nb_sectors) {
-*pnum = nb_sectors;
+if (*pnum > bytes) {
+*pnum = bytes;
 }
 out_unlock:
 qemu_mutex_unlock(&iscsilun->mutex);
@@ -760,7 +758,7 @@ out:
 if (iTask.task != NULL) {
 scsi_free_scsi_task(iTask.task);
 }
-if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID) {
+if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID && file) {
 *file = bs;
 }
 return ret;
@@ -800,25 +798,24 @@ static int coroutine_fn iscsi_co_readv(BlockDriverState *bs,
  nb_sectors * BDRV_SECTOR_SIZE) &&
 !iscsi_allocmap_is_allocated(iscsilun, sector_num * BDRV_SECTOR_SIZE,
  nb_sectors * BDRV_SECTOR_SIZE)) {
-int pnum;
-BlockDriverState *file;
+int64_t pnum;
 /* check the block status from the beginning of the cluster
  * containing the start sector */
-int cluster_sectors = iscsilun->cluster_size >> BDRV_SECTOR_BITS;
-int head;
-int64_t ret;
+int64_t head;
+int ret;

-assert(cluster_sectors);
-head = sector_num % cluster_sectors;
-ret = iscsi_co_get_block_status(bs, sector_num - head,
-BDRV_REQUEST_MAX_SECTORS, &pnum,
-

[Qemu-devel] [PATCH v8 04/21] file-posix: Switch to .bdrv_co_block_status()

2018-02-13 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the file protocol driver accordingly.

In want_zero mode, we continue to report fine-grained hole
information (the caller wants as much mapping detail as possible);
but when not in that mode, the caller prefers larger *pnum and
merely cares about what offsets are allocated at this layer, rather
than where the holes live.  Since holes still read as zeroes at
this layer (rather than deferring to a backing layer), we can take
the shortcut of skipping lseek(), and merely state that all bytes
are allocated.

We can also drop redundant bounds checks that are already
guaranteed by the block layer.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 

---
v6-v7: no change
v5: drop redundant code
v4: tweak commit message [Fam], rebase to interface tweak
v3: no change
v2: tweak comment, add mapping support
---
 block/file-posix.c | 62 +-
 1 file changed, 29 insertions(+), 33 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index ca49c1a98ae..f1591c38490 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2131,25 +2131,24 @@ static int find_allocation(BlockDriverState *bs, off_t start,
 }

 /*
- * Returns the allocation status of the specified sectors.
+ * Returns the allocation status of the specified offset.
  *
- * If 'sector_num' is beyond the end of the disk image the return value is 0
- * and 'pnum' is set to 0.
+ * The block layer guarantees 'offset' and 'bytes' are within bounds.
  *
- * 'pnum' is set to the number of sectors (including and immediately following
- * the specified sector) that are known to be in the same
+ * 'pnum' is set to the number of bytes (including and immediately following
+ * the specified offset) that are known to be in the same
  * allocated/unallocated state.
  *
- * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
- * beyond the end of the disk image it will be clamped.
+ * 'bytes' is the max value 'pnum' should be set to.
  */
-static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num,
-int nb_sectors, int *pnum,
-BlockDriverState **file)
+static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
+bool want_zero,
+int64_t offset,
+int64_t bytes, int64_t *pnum,
+int64_t *map,
+BlockDriverState **file)
 {
-off_t start, data = 0, hole = 0;
-int64_t total_size;
+off_t data = 0, hole = 0;
 int ret;

 ret = fd_open(bs);
@@ -2157,39 +2156,36 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
 return ret;
 }

-start = sector_num * BDRV_SECTOR_SIZE;
-total_size = bdrv_getlength(bs);
-if (total_size < 0) {
-return total_size;
-} else if (start >= total_size) {
-*pnum = 0;
-return 0;
-} else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
-nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
+if (!want_zero) {
+*pnum = bytes;
+*map = offset;
+*file = bs;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 }

-ret = find_allocation(bs, start, &data, &hole);
+ret = find_allocation(bs, offset, &data, &hole);
 if (ret == -ENXIO) {
 /* Trailing hole */
-*pnum = nb_sectors;
+*pnum = bytes;
 ret = BDRV_BLOCK_ZERO;
 } else if (ret < 0) {
 /* No info available, so pretend there are no holes */
-*pnum = nb_sectors;
+*pnum = bytes;
 ret = BDRV_BLOCK_DATA;
-} else if (data == start) {
-/* On a data extent, compute sectors to the end of the extent,
+} else if (data == offset) {
+/* On a data extent, compute bytes to the end of the extent,
  * possibly including a partial sector at EOF. */
-*pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
+*pnum = MIN(bytes, hole - offset);
 ret = BDRV_BLOCK_DATA;
 } else {
-/* On a hole, compute sectors to the beginning of the next extent.  */
-assert(hole == start);
-*pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
+/* On a hole, compute bytes to the beginning of the next extent.  */
+assert(hole == offset);
+*pnum = MIN(bytes, data - offset);
 ret = BDRV_BLOCK_ZERO;
 }
+*map = offset;
 *file = bs;
-return ret | BDRV_BLOCK_OFFSET_VALID | start;
+return ret | BDRV_BLOCK_OFFSET_VALID;
 }

 static coroutine_fn Block

[Qemu-devel] [PATCH v8 11/21] qcow: Switch to .bdrv_co_block_status()

2018-02-13 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the qcow driver accordingly.  There is no
intent to optimize based on the want_zero flag for this format.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 

---
v5-v7: no change
v4: rebase to interface tweak
v3: rebase to master
v2: rebase to mapping flag
---
 block/qcow.c | 27 ---
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/block/qcow.c b/block/qcow.c
index 8631155ac81..dead5029c67 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -524,23 +524,28 @@ static int get_cluster_offset(BlockDriverState *bs,
 return 1;
 }

-static int64_t coroutine_fn qcow_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn qcow_co_block_status(BlockDriverState *bs,
+ bool want_zero,
+ int64_t offset, int64_t bytes,
+ int64_t *pnum, int64_t *map,
+ BlockDriverState **file)
 {
 BDRVQcowState *s = bs->opaque;
-int index_in_cluster, n, ret;
+int index_in_cluster, ret;
+int64_t n;
 uint64_t cluster_offset;

 qemu_co_mutex_lock(&s->lock);
-ret = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0, &cluster_offset);
+ret = get_cluster_offset(bs, offset, 0, 0, 0, 0, &cluster_offset);
 qemu_co_mutex_unlock(&s->lock);
 if (ret < 0) {
 return ret;
 }
-index_in_cluster = sector_num & (s->cluster_sectors - 1);
-n = s->cluster_sectors - index_in_cluster;
-if (n > nb_sectors)
-n = nb_sectors;
+index_in_cluster = offset & (s->cluster_size - 1);
+n = s->cluster_size - index_in_cluster;
+if (n > bytes) {
+n = bytes;
+}
 *pnum = n;
 if (!cluster_offset) {
 return 0;
@@ -548,9 +553,9 @@ static int64_t coroutine_fn qcow_co_get_block_status(BlockDriverState *bs,
 if ((cluster_offset & QCOW_OFLAG_COMPRESSED) || s->crypto) {
 return BDRV_BLOCK_DATA;
 }
-cluster_offset |= (index_in_cluster << BDRV_SECTOR_BITS);
+*map = cluster_offset | index_in_cluster;
 *file = bs->file->bs;
-return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | cluster_offset;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 }

 static int decompress_buffer(uint8_t *out_buf, int out_buf_size,
@@ -1128,7 +1133,7 @@ static BlockDriver bdrv_qcow = {

 .bdrv_co_readv  = qcow_co_readv,
 .bdrv_co_writev = qcow_co_writev,
-.bdrv_co_get_block_status   = qcow_co_get_block_status,
+.bdrv_co_block_status   = qcow_co_block_status,

 .bdrv_make_empty= qcow_make_empty,
 .bdrv_co_pwritev_compressed = qcow_co_pwritev_compressed,
-- 
2.14.3

[Qemu-devel] [PATCH v8 01/21] block: Add .bdrv_co_block_status() callback

2018-02-13 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based. Now that the block layer exposes byte-based allocation,
it's time to tackle the drivers.  Add a new callback that can operate
at byte granularity.  Subsequent patches will then update
individual drivers, then finally remove .bdrv_co_get_block_status().

The new code also passes through the 'want_zero' hint, which will
allow subsequent patches to further optimize callers that only care
about how much of the image is allocated (want_zero is false),
rather than full details about runs of zeroes and which offsets the
allocation actually maps to (want_zero is true).  As part of this
effort, fix another part of the documentation: the claim in commit
4c41cb4 that BDRV_BLOCK_ALLOCATED is short for 'DATA || ZERO' is a
lie at the block layer (see commit e88ae2264), even though it is
how the bit is computed from the driver layer.  After all, there
are intentionally cases where we return ZERO but not ALLOCATED at
the block layer, when we know that a read sees zero because the
backing file is too short.  Note that the driver interface is thus
slightly different from the public interface with regard to which
bits will be set, and what guarantees are provided on input.

We also add an assertion that any driver using the new callback will
make progress (the only time pnum will be 0 is if the block layer
already handled an out-of-bounds request, or if there is an error);
the old driver interface did not provide this guarantee, which
could lead to infinite loops in drastic corner-case failures.
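
Why the progress guarantee matters can be seen from a typical caller
loop, sketched below.  Everything here is hypothetical (the
demo_status_fn type, demo_driver_status() and demo_walk() are not
block layer APIs): the loop advances by *pnum on each iteration, so a
driver that returned success with pnum == 0 would make it spin
forever.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback type standing in for a driver's status hook. */
typedef int (*demo_status_fn)(int64_t offset, int64_t bytes, int64_t *pnum);

/* Hypothetical driver: answers in chunks of at most 64 KiB. */
static int demo_driver_status(int64_t offset, int64_t bytes, int64_t *pnum)
{
    const int64_t chunk = 65536;
    int64_t left_in_chunk = chunk - offset % chunk;

    *pnum = bytes < left_in_chunk ? bytes : left_in_chunk;
    return 1;
}

/* Walks [0, length) and returns the number of calls, or a negative error. */
static int demo_walk(demo_status_fn fn, int64_t length)
{
    int calls = 0;
    int64_t offset = 0;

    while (offset < length) {
        int64_t pnum = 0;
        int ret = fn(offset, length - offset, &pnum);

        if (ret < 0) {
            return ret;
        }
        /* Without progress, this loop would never terminate. */
        assert(pnum > 0 && pnum <= length - offset);
        offset += pnum;
        calls++;
    }
    return calls;
}
```

Asserting progress once at the point where the driver callback
returns spares every caller from guarding against a stuck loop on
its own.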

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 

---
v7: add R-b
v6: drop now-useless rounding of mid-sector end-of-file hole [Kevin],
better documentation of 'want_zero' [Kevin]
v5: rebase to master, typo fix, document more block layer guarantees
v4: rebase to master
v3: no change
v2: improve alignment handling, ensure all iotests still pass
---
 include/block/block.h | 14 +++---
 include/block/block_int.h | 20 +++-
 block/io.c| 28 +++-
 3 files changed, 41 insertions(+), 21 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 19b3ab9cb5e..947e8876cdd 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -115,19 +115,19 @@ typedef struct HDGeometry {
  * BDRV_BLOCK_ZERO: offset reads as zero
  * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing raw data
  * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
- *   layer (short for DATA || ZERO), set by block layer
- * BDRV_BLOCK_EOF: the returned pnum covers through end of file for this layer
+ *   layer rather than any backing, set by block layer
+ * BDRV_BLOCK_EOF: the returned pnum covers through end of file for this
+ * layer, set by block layer
  *
  * Internal flag:
  * BDRV_BLOCK_RAW: for use by passthrough drivers, such as raw, to request
  * that the block layer recompute the answer from the returned
  * BDS; must be accompanied by just BDRV_BLOCK_OFFSET_VALID.
  *
- * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 (BDRV_BLOCK_OFFSET_MASK) of
- * the return value (old interface) or the entire map parameter (new
- * interface) represent the offset in the returned BDS that is allocated for
- * the corresponding raw data.  However, whether that offset actually
- * contains data also depends on BDRV_BLOCK_DATA, as follows:
+ * If BDRV_BLOCK_OFFSET_VALID is set, the map parameter represents the
+ * host offset within the returned BDS that is allocated for the
+ * corresponding raw guest data.  However, whether that offset
+ * actually contains data also depends on BDRV_BLOCK_DATA, as follows:
  *
  * DATA ZERO OFFSET_VALID
  *  ttt   sectors read as zero, returned file is zero at offset
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 5ea63f8fa8a..c93722b43a4 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -202,15 +202,25 @@ struct BlockDriver {
 /*
  * Building block for bdrv_block_status[_above] and
  * bdrv_is_allocated[_above].  The driver should answer only
- * according to the current layer, and should not set
- * BDRV_BLOCK_ALLOCATED, but may set BDRV_BLOCK_RAW.  See block.h
- * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.  The block
- * layer guarantees input aligned to request_alignment, as well as
- * non-NULL pnum and file.
+ * according to the current layer, and should only need to set
+ * BDRV_BLOCK_DATA, BDRV_BLOCK_ZERO, BDRV_BLOCK_OFFSET_VALID,
+ * and/or BDRV_BLOCK_RAW; if the current layer defers to a backing
+ * layer, the result should be 0 (and not BDRV_BLOCK_ZERO).  See
+ * block.h for the overall meaning of the bits.  As a hint, the
+ * flag want_zero is true if the caller care

[Qemu-devel] [PATCH v8 06/21] iscsi: Switch cluster_sectors to byte-based

2018-02-13 Thread Eric Blake

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert all uses of
the cluster size in sectors, along with adding assertions that we
are not dividing by zero.

Improve some comment grammar while in the area.
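
As a rough illustration of the conversion described above, the following standalone sketch does the allocation-map arithmetic in bytes rather than sectors, with an assertion guarding each division. The function names and the SECTOR_* constants are stand-ins invented for this sketch (QEMU's own names are BDRV_SECTOR_BITS and BDRV_SECTOR_SIZE); it is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants for this sketch only. */
#define SECTOR_BITS 9
#define SECTOR_SIZE (1 << SECTOR_BITS)
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of allocation-map entries needed for a LUN, computed in bytes. */
static int64_t allocmap_entries(int64_t num_blocks, int block_size,
                                int cluster_size)
{
    assert(cluster_size > 0);   /* guard the division below */
    return DIV_ROUND_UP(num_blocks * block_size, cluster_size);
}

/* Index of the cluster that contains a given 512-byte sector. */
static int64_t cluster_of_sector(int64_t sector_num, int cluster_size)
{
    assert(cluster_size > 0);   /* guard the division below */
    return sector_num * SECTOR_SIZE / cluster_size;
}
```

Keeping the cluster size in bytes means only the callers that still work in sectors need a shift, and a zero cluster size is caught by the assertion instead of a division fault.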

Signed-off-by: Eric Blake 
Acked-by: Paolo Bonzini 
Reviewed-by: Fam Zheng 

---
v8: rebase to master
v2-v7: no change
---
 block/iscsi.c | 56 +++-
 1 file changed, 35 insertions(+), 21 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 421983dd6ff..3414c21c7f5 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -86,7 +86,7 @@ typedef struct IscsiLun {
 unsigned long *allocmap;
 unsigned long *allocmap_valid;
 long allocmap_size;
-int cluster_sectors;
+int cluster_size;
 bool use_16_for_rw;
 bool write_protected;
 bool lbpme;
@@ -430,9 +430,10 @@ static int iscsi_allocmap_init(IscsiLun *iscsilun, int open_flags)
 {
 iscsi_allocmap_free(iscsilun);

+assert(iscsilun->cluster_size);
 iscsilun->allocmap_size =
-DIV_ROUND_UP(sector_lun2qemu(iscsilun->num_blocks, iscsilun),
- iscsilun->cluster_sectors);
+DIV_ROUND_UP(iscsilun->num_blocks * iscsilun->block_size,
+ iscsilun->cluster_size);

 iscsilun->allocmap = bitmap_try_new(iscsilun->allocmap_size);
 if (!iscsilun->allocmap) {
@@ -440,7 +441,7 @@ static int iscsi_allocmap_init(IscsiLun *iscsilun, int open_flags)
 }

 if (open_flags & BDRV_O_NOCACHE) {
-/* in case that cache.direct = on all allocmap entries are
+/* when cache.direct = on all allocmap entries are
  * treated as invalid to force a relookup of the block
  * status on every read request */
 return 0;
@@ -461,17 +462,19 @@ iscsi_allocmap_update(IscsiLun *iscsilun, int64_t sector_num,
   int nb_sectors, bool allocated, bool valid)
 {
 int64_t cl_num_expanded, nb_cls_expanded, cl_num_shrunk, nb_cls_shrunk;
+int cluster_sectors = iscsilun->cluster_size >> BDRV_SECTOR_BITS;

 if (iscsilun->allocmap == NULL) {
 return;
 }
 /* expand to entirely contain all affected clusters */
-cl_num_expanded = sector_num / iscsilun->cluster_sectors;
+assert(cluster_sectors);
+cl_num_expanded = sector_num / cluster_sectors;
 nb_cls_expanded = DIV_ROUND_UP(sector_num + nb_sectors,
-   iscsilun->cluster_sectors) - cl_num_expanded;
+   cluster_sectors) - cl_num_expanded;
 /* shrink to touch only completely contained clusters */
-cl_num_shrunk = DIV_ROUND_UP(sector_num, iscsilun->cluster_sectors);
-nb_cls_shrunk = (sector_num + nb_sectors) / iscsilun->cluster_sectors
+cl_num_shrunk = DIV_ROUND_UP(sector_num, cluster_sectors);
+nb_cls_shrunk = (sector_num + nb_sectors) / cluster_sectors
   - cl_num_shrunk;
 if (allocated) {
 bitmap_set(iscsilun->allocmap, cl_num_expanded, nb_cls_expanded);
@@ -535,9 +538,12 @@ iscsi_allocmap_is_allocated(IscsiLun *iscsilun, int64_t sector_num,
 if (iscsilun->allocmap == NULL) {
 return true;
 }
-size = DIV_ROUND_UP(sector_num + nb_sectors, iscsilun->cluster_sectors);
+assert(iscsilun->cluster_size);
+size = DIV_ROUND_UP(sector_num + nb_sectors,
+iscsilun->cluster_size >> BDRV_SECTOR_BITS);
 return !(find_next_bit(iscsilun->allocmap, size,
-   sector_num / iscsilun->cluster_sectors) == size);
+   sector_num * BDRV_SECTOR_SIZE /
+   iscsilun->cluster_size) == size);
 }

 static inline bool iscsi_allocmap_is_valid(IscsiLun *iscsilun,
@@ -547,9 +553,12 @@ static inline bool iscsi_allocmap_is_valid(IscsiLun *iscsilun,
 if (iscsilun->allocmap_valid == NULL) {
 return false;
 }
-size = DIV_ROUND_UP(sector_num + nb_sectors, iscsilun->cluster_sectors);
+assert(iscsilun->cluster_size);
+size = DIV_ROUND_UP(sector_num + nb_sectors,
+iscsilun->cluster_size >> BDRV_SECTOR_BITS);
 return (find_next_zero_bit(iscsilun->allocmap_valid, size,
-   sector_num / iscsilun->cluster_sectors) == size);
+   sector_num * BDRV_SECTOR_SIZE /
+   iscsilun->cluster_size) == size);
 }

 static int coroutine_fn
@@ -793,16 +802,21 @@ static int coroutine_fn iscsi_co_readv(BlockDriverState *bs,
 BlockDriverState *file;
 /* check the block status from the beginning of the cluster
  * containing the start sector */
-int64_t ret = iscsi_co_get_block_status(bs,
-  sector_num - sector_num % iscsilun->cluster_sectors,
-  BDRV_REQUEST_MAX_SECTORS, &pnum, &file);
+int cluster_sectors = iscsilun->cluster_size >> BDRV_SECTO

[Qemu-devel] [PATCH v8 10/21] parallels: Switch to .bdrv_co_block_status()

2018-02-13 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the parallels driver accordingly.  Note that
the internal function block_status() is still sector-based, because
it is still in use by other sector-based functions; but that's okay
because request_alignment is 512 as a result of those functions.
For now, no optimizations are added based on the mapping hint.
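
The wrapper pattern described above (a byte-based entry point around a helper that is still sector-based) can be sketched in isolation as follows. Everything here is a toy: toy_block_status() is a hypothetical helper that maps each sector 100 sectors further into the file and reports a hole from sector 1000 onward, and the flag values are invented for the sketch rather than taken from QEMU's headers.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BITS 9
#define SECTOR_SIZE (1LL << SECTOR_BITS)

/* Toy flag values for this sketch only. */
#define BLK_DATA         0x01
#define BLK_OFFSET_VALID 0x04

/*
 * Hypothetical sector-based helper: returns the host sector backing
 * sector_num, or -1 for a hole, and stores the run length in *pnum.
 */
static int64_t toy_block_status(int64_t sector_num, int nb_sectors, int *pnum)
{
    *pnum = nb_sectors;
    return sector_num < 1000 ? sector_num + 100 : -1;
}

/* Byte-based wrapper: convert on the way in and again on the way out. */
static int toy_co_block_status(int64_t offset, int64_t bytes,
                               int64_t *pnum, int64_t *map)
{
    int count;
    int64_t host_sector;

    /* The sector helper can only cope with sector-aligned requests. */
    assert(((offset | bytes) & (SECTOR_SIZE - 1)) == 0);
    host_sector = toy_block_status(offset >> SECTOR_BITS,
                                   (int)(bytes >> SECTOR_BITS), &count);

    *pnum = (int64_t)count * SECTOR_SIZE;   /* set even when returning 0 */
    if (host_sector < 0) {
        return 0;                           /* hole: no flags, *map unset */
    }
    *map = host_sector * SECTOR_SIZE;       /* offset goes out via *map */
    return BLK_DATA | BLK_OFFSET_VALID;
}
```

Note that the host offset is no longer folded into the return value; it travels through the separate map parameter, which is why the return type can shrink to int.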

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 

---
v7: fix bug in *map [Vladimir]
v6: no change
v5: fix pnum when return is 0
v4: rebase to interface tweak, R-b dropped
v3: no change
v2: rebase to mapping parameter; it is ignored, so R-b kept
---
 block/parallels.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index e1e3d80c887..3e952a9c147 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -261,23 +261,31 @@ static coroutine_fn int parallels_co_flush_to_os(BlockDriverState *bs)
 }


-static int64_t coroutine_fn parallels_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn parallels_co_block_status(BlockDriverState *bs,
+  bool want_zero,
+  int64_t offset,
+  int64_t bytes,
+  int64_t *pnum,
+  int64_t *map,
+  BlockDriverState **file)
 {
 BDRVParallelsState *s = bs->opaque;
-int64_t offset;
+int count;

+assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
 qemu_co_mutex_lock(&s->lock);
-offset = block_status(s, sector_num, nb_sectors, pnum);
+offset = block_status(s, offset >> BDRV_SECTOR_BITS,
+  bytes >> BDRV_SECTOR_BITS, &count);
 qemu_co_mutex_unlock(&s->lock);

+*pnum = count * BDRV_SECTOR_SIZE;
 if (offset < 0) {
 return 0;
 }

+*map = offset * BDRV_SECTOR_SIZE;
 *file = bs->file->bs;
-return (offset << BDRV_SECTOR_BITS) |
-BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 }

 static coroutine_fn int parallels_co_writev(BlockDriverState *bs,
@@ -782,7 +790,7 @@ static BlockDriver bdrv_parallels = {
 .bdrv_open = parallels_open,
 .bdrv_close= parallels_close,
 .bdrv_child_perm  = bdrv_format_default_perms,
-.bdrv_co_get_block_status = parallels_co_get_block_status,
+.bdrv_co_block_status = parallels_co_block_status,
 .bdrv_has_zero_init   = bdrv_has_zero_init_1,
 .bdrv_co_flush_to_os  = parallels_co_flush_to_os,
 .bdrv_co_readv  = parallels_co_readv,
-- 
2.14.3

[Qemu-devel] [PATCH v8 02/21] nvme: Drop pointless .bdrv_co_get_block_status()

2018-02-13 Thread Eric Blake

Commit bdd6a90 has a bug: drivers should never directly set
BDRV_BLOCK_ALLOCATED, but only io.c should do that (as needed).
Instead, a driver should report BDRV_BLOCK_DATA if it knows that
data comes from this BDS.

But let's look at the bigger picture: semantically, the nvme
driver is similar to the nbd, null, and raw drivers (no backing
file, all data comes from this BDS).  But while two of those
other drivers have to supply the callback (null because it can
special-case BDRV_BLOCK_ZERO, raw because it can special-case
a different offset), in this case the block layer defaults are
good enough without the callback at all (similar to nbd).

So, fix the bug by deletion ;)
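
The division of labour described above (a driver reports DATA, and only the generic layer adds ALLOCATED) can be modelled with a small sketch. The flag bits and function names are invented here, and the rule that ALLOCATED is derived from DATA or ZERO is an assumption of the sketch, not a statement about what io.c does in every case.

```c
#include <assert.h>

/* Toy flag bits for this sketch; not copied from QEMU's headers. */
enum {
    BLK_DATA         = 1 << 0,
    BLK_ZERO         = 1 << 1,
    BLK_OFFSET_VALID = 1 << 2,
    BLK_ALLOCATED    = 1 << 3,
};

/* What a driver with no backing file might report: DATA, never ALLOCATED. */
static int toy_driver_status(void)
{
    return BLK_DATA | BLK_OFFSET_VALID;
}

/* Generic layer: derive ALLOCATED from the driver's answer. */
static int toy_block_layer_status(int driver_flags)
{
    /* A driver setting ALLOCATED itself is the bug being fixed. */
    assert(!(driver_flags & BLK_ALLOCATED));
    if (driver_flags & (BLK_DATA | BLK_ZERO)) {
        driver_flags |= BLK_ALLOCATED;
    }
    return driver_flags;
}
```

A driver that defers to a backing layer would return 0 in this model, and the generic layer would then leave ALLOCATED clear.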

Signed-off-by: Eric Blake 

---
v8: new patch
---
 block/nvme.c | 14 --
 1 file changed, 14 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 10bffbbf2f4..4e561b08df3 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -1068,18 +1068,6 @@ static int nvme_reopen_prepare(BDRVReopenState *reopen_state,
 return 0;
 }

-static int64_t coroutine_fn nvme_co_get_block_status(BlockDriverState *bs,
- int64_t sector_num,
- int nb_sectors, int *pnum,
- BlockDriverState **file)
-{
-*pnum = nb_sectors;
-*file = bs;
-
-return BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID |
-   (sector_num << BDRV_SECTOR_BITS);
-}
-
 static void nvme_refresh_filename(BlockDriverState *bs, QDict *opts)
 {
 QINCREF(opts);
@@ -1179,8 +1167,6 @@ static BlockDriver bdrv_nvme = {
 .bdrv_co_flush_to_disk= nvme_co_flush,
 .bdrv_reopen_prepare  = nvme_reopen_prepare,

-.bdrv_co_get_block_status = nvme_co_get_block_status,
-
 .bdrv_refresh_filename= nvme_refresh_filename,
 .bdrv_refresh_limits  = nvme_refresh_limits,

-- 
2.14.3

[Qemu-devel] [PATCH v8 14/21] raw: Switch to .bdrv_co_block_status()

2018-02-13 Thread Eric Blake

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the raw driver accordingly.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 

---
v5-v7: no change
v4: rebase to interface tweak
v3: no change
v2: rebase to mapping
---
 block/raw-format.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/raw-format.c b/block/raw-format.c
index ab552c09541..830243a8e48 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -250,17 +250,17 @@ fail:
 return ret;
 }

-static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num,
-int nb_sectors, int *pnum,
+static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
+bool want_zero, int64_t offset,
+int64_t bytes, int64_t *pnum,
+int64_t *map,
 BlockDriverState **file)
 {
 BDRVRawState *s = bs->opaque;
-*pnum = nb_sectors;
+*pnum = bytes;
 *file = bs->file->bs;
-sector_num += s->offset / BDRV_SECTOR_SIZE;
-return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
-   (sector_num << BDRV_SECTOR_BITS);
+*map = offset + s->offset;
+return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
 }

 static int coroutine_fn raw_co_pwrite_zeroes(BlockDriverState *bs,
@@ -496,7 +496,7 @@ BlockDriver bdrv_raw = {
 .bdrv_co_pwritev  = &raw_co_pwritev,
 .bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
 .bdrv_co_pdiscard = &raw_co_pdiscard,
-.bdrv_co_get_block_status = &raw_co_get_block_status,
+.bdrv_co_block_status = &raw_co_block_status,
 .bdrv_truncate= &raw_truncate,
 .bdrv_getlength   = &raw_getlength,
 .has_variable_length  = true,
-- 
2.14.3

Re: [Qemu-devel] [PATCH v6 1/3] pci: Add support for Designware IP block

2018-02-13 Thread Andrey Smirnov

On Tue, Feb 13, 2018 at 10:13 AM, Michael S. Tsirkin  wrote:
> On Tue, Feb 13, 2018 at 09:07:10AM -0800, Andrey Smirnov wrote:
>> +static void designware_pcie_root_class_init(ObjectClass *klass, void *data)
>> +{
>> +PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>> +DeviceClass *dc = DEVICE_CLASS(klass);
>> +
>> +set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>> +
>> +k->vendor_id = PCI_VENDOR_ID_SYNOPSYS;
>> +k->device_id = 0xABCD;
>> +k->revision = 0;
>> +k->class_id = PCI_CLASS_BRIDGE_PCI;
>> +k->is_express = true;
>> +k->is_bridge = true;
>> +k->exit = pci_bridge_exitfn;
>> +k->realize = designware_pcie_root_realize;
>> +k->config_read = designware_pcie_root_config_read;
>> +k->config_write = designware_pcie_root_config_write;
>> +
>> +dc->reset = pci_bridge_reset;
>> +/*
>> + * PCI-facing part of the host bridge, not usable without the
>> + * host-facing part, which can't be device_add'ed, yet.
>> + */
>> +dc->user_creatable = false;
>> +dc->vmsd = &vmstate_designware_pcie_root;
>> +}
>> +
>> +static uint64_t designware_pcie_host_mmio_read(void *opaque, hwaddr addr,
>> +   unsigned int size)
>> +{
>> +PCIHostState *pci = PCI_HOST_BRIDGE(opaque);
>> +PCIDevice *device = pci_find_device(pci->bus, 0, 0);
>> +
>> +return pci_host_config_read_common(device,
>> +   addr,
>> +   pci_config_size(device),
>> +   size);
>> +}
>> +
>> +static void designware_pcie_host_mmio_write(void *opaque, hwaddr addr,
>> +uint64_t val, unsigned int size)
>> +{
>> +PCIHostState *pci = PCI_HOST_BRIDGE(opaque);
>> +PCIDevice *device = pci_find_device(pci->bus, 0, 0);
>> +
>> +return pci_host_config_write_common(device,
>> +addr,
>> +pci_config_size(device),
>> +val, size);
>> +}
>> +
>> +static const MemoryRegionOps designware_pci_mmio_ops = {
>> +.read   = designware_pcie_host_mmio_read,
>> +.write  = designware_pcie_host_mmio_write,
>> +.endianness = DEVICE_NATIVE_ENDIAN,
>> +.impl = {
>> +/*
>> + * Our device would not work correctly if the guest was doing
>> + * unaligned access. This might not be a limitation on the real
>> + * device but in practice there is no reason for a guest to access
>> + * this device unaligned.
>> + */
>> +.min_access_size = 4,
>> +.max_access_size = 4,
>> +.unaligned = false,
>> +},
>> +};
>
> Could you pls add some comments explaining why is DEVICE_NATIVE_ENDIAN
> appropriate here?  Most of these cases are plain "we never bothered
> about cross-endian setups". Some are "there's a mix of different
> endian-ness values, need to handle in a special way".
>
> I suspect you really need DEVICE_LITTLE_ENDIAN.
>

That MemoryRegion corresponds to a register file permanently mapped
into the CPU's address space, so my assumption is that SoC designers will
wire it according to the CPU's endianness, be it big or little. I am not
aware of any big-endian CPU based SoC on the market using Designware's
IP block, so I don't think there is any precedent confirming or
denying the correctness of my assumption. IMHO, this is also the reason
why all of the Linux driver code for that IP assumes little endianness.
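
To make the distinction concrete, here is a small sketch of the two ways the same four register bytes can be interpreted. It is only an analogy: "native" below means the byte order of the machine running the sketch, standing in for the guest CPU, and none of the function names come from QEMU.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Interpret 4 register bytes as little-endian, whatever the host order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Interpret the same bytes in the host's native order. */
static uint32_t read_native32(const uint8_t *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint8_t probe[4] = { 1, 0, 0, 0 };

    return read_native32(probe) == 1;
}

/* The two readings agree only when the native order is little-endian. */
static void check_views_agree_on_le_host(void)
{
    const uint8_t reg[4] = { 0x78, 0x56, 0x34, 0x12 };

    if (host_is_little_endian()) {
        assert(read_native32(reg) == read_le32(reg));
    }
}
```

On a little-endian CPU the two choices are indistinguishable, which is why the question only matters for a big-endian setup that nobody appears to have.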

I can't say that I have tested this code against a big-endian guest/CPU,
but that is primarily due to the fact that there's no real use case
and any test setup I could put together would be a contrived example
pointlessly proving my point.

Anyway, I am more than happy to switch it to use DEVICE_LITTLE_ENDIAN,
I just don't know if doing so is any more justified than keeping it
DEVICE_NATIVE_ENDIAN.

Thanks,
Andrey Smirnov

Re: [Qemu-devel] [PATCH v6 28/28] migration/hmp: add migrate_pause command

2018-02-13 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> Wrapper for QMP command "migrate-pause".
> 
> Signed-off-by: Peter Xu 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  hmp-commands.hx | 14 ++
>  hmp.c   |  9 +
>  hmp.h   |  1 +
>  3 files changed, 24 insertions(+)
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 7563f3eaa0..32549702ee 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -969,6 +969,20 @@ STEXI
>  @item migrate_recover @var{uri}
>  @findex migrate_recover
>  Continue a paused incoming postcopy migration using the @var{uri}.
> +ETEXI
> +
> +{
> +.name   = "migrate_pause",
> +.args_type  = "",
> +.params = "",
> +.help   = "Pause an ongoing migration (postcopy-only)",
> +.cmd= hmp_migrate_pause,
> +},
> +
> +STEXI
> +@item migrate_pause
> +@findex migrate_pause
> +Pause an ongoing migration.  Currently it only supports postcopy.
>  ETEXI
>  
>  {
> diff --git a/hmp.c b/hmp.c
> index 4062d3fdba..ae6266cb21 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -1529,6 +1529,15 @@ void hmp_migrate_recover(Monitor *mon, const QDict *qdict)
>  hmp_handle_error(mon, &err);
>  }
>  
> +void hmp_migrate_pause(Monitor *mon, const QDict *qdict)
> +{
> +Error *err = NULL;
> +
> +qmp_migrate_pause(&err);
> +
> +hmp_handle_error(mon, &err);
> +}
> +
>  /* Kept for backwards compatibility */
>  void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict)
>  {
> diff --git a/hmp.h b/hmp.h
> index 0d53fe78d9..0aa8dca738 100644
> --- a/hmp.h
> +++ b/hmp.h
> @@ -71,6 +71,7 @@ void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_continue(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_incoming(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_recover(Monitor *mon, const QDict *qdict);
> +void hmp_migrate_pause(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v6 27/28] migration/qmp: add command migrate-pause

2018-02-13 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> It pauses an ongoing migration.  Currently it only supports postcopy.
> Note that this command will work on either side of the migration.
> Basically when we trigger this on one side, it'll interrupt the other
> side as well since the other side will get notified on the disconnect
> event.
> 
> However, it's still possible that the other side is not notified, for
> example, when the network is totally broken, or due to some firewall
> configuration changes.  In that case, we will also need to run the same
> command on the other side so both sides will go into the paused state.
> 
> Signed-off-by: Peter Xu 
> ---
>  migration/migration.c | 27 +++
>  qapi/migration.json   | 16 
>  2 files changed, 43 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index bb57ed9ade..139abec0c3 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1448,6 +1448,33 @@ void qmp_migrate_recover(const char *uri, Error **errp)
>  qemu_start_incoming_migration(uri, errp);
>  }
>  
> +void qmp_migrate_pause(Error **errp)
> +{
> +MigrationState *ms = migrate_get_current();
> +MigrationIncomingState *mis = migration_incoming_get_current();
> +int ret;
> +
> +if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +/* Source side, during postcopy */
> +ret = qemu_file_shutdown(ms->to_dst_file);

This doesn't feel thread safe; although I'm not sure how to make it so.
If the migration finishes just after we check the state but before the
shutdown, we end up using a bogus QEMUFile*.
Making all the places that close a QEMUFile* set the pointer to NULL before
they do the close doesn't help because you still race with that.

(The race is small, but still)
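
One conventional way to close this kind of check-then-use window is to make the check and the use a single critical section, and to have the completion path detach the pointer under the same lock. The sketch below shows that idea with invented types (ToyMigration, ToyFile) and a plain pthread mutex; it is an assumption about how such a fix could look, not what the series does.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for MigrationState and QEMUFile. */
typedef struct ToyFile {
    int shut_down;
} ToyFile;

typedef struct ToyMigration {
    pthread_mutex_t lock;   /* protects 'active' and 'to_dst' together */
    int active;             /* nonzero while postcopy is running */
    ToyFile *to_dst;        /* NULL once the migration has released it */
} ToyMigration;

/* Pause path: check the state and use the file under one lock hold. */
static int toy_pause(ToyMigration *m)
{
    int ret = -1;

    assert(m != NULL);
    pthread_mutex_lock(&m->lock);
    if (m->active && m->to_dst) {
        m->to_dst->shut_down = 1;   /* stands in for qemu_file_shutdown() */
        ret = 0;
    }
    pthread_mutex_unlock(&m->lock);
    return ret;
}

/* Completion path: detach the file under the same lock before closing. */
static ToyFile *toy_finish(ToyMigration *m)
{
    ToyFile *f;

    assert(m != NULL);
    pthread_mutex_lock(&m->lock);
    m->active = 0;
    f = m->to_dst;
    m->to_dst = NULL;
    pthread_mutex_unlock(&m->lock);
    return f;   /* the caller may close it; no pauser can still reach it */
}
```

The cost is that every path closing the file has to take the lock, which is exactly the kind of audit the comment above hints at.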

Dave

> +if (ret) {
> +error_setg(errp, "Failed to pause source migration");
> +}
> +return;
> +}
> +
> +if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> +ret = qemu_file_shutdown(mis->from_src_file);
> +if (ret) {
> +error_setg(errp, "Failed to pause destination migration");
> +}
> +return;
> +}
> +
> +error_setg(errp, "migrate-pause is currently only supported "
> +   "during postcopy-active state");
> +}
> +
>  bool migration_is_blocked(Error **errp)
>  {
>  if (qemu_savevm_state_blocked(errp)) {
> diff --git a/qapi/migration.json b/qapi/migration.json
> index dfbcb02d4c..3d9cfeb8f1 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1192,3 +1192,19 @@
>  ##
>  { 'command': 'migrate-recover', 'data': { 'uri': 'str' },
>'allow-oob': true }
> +
> +##
> +# @migrate-pause:
> +#
> +# Pause a migration.  Currently it only supports postcopy.
> +#
> +# Returns: nothing.
> +#
> +# Example:
> +#
> +# -> { "execute": "migrate-pause" }
> +# <- { "return": {} }
> +#
> +# Since: 2.12
> +##
> +{ 'command': 'migrate-pause', 'allow-oob': true }
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v2 4/4] acpi: build TPM Physical Presence interface

2018-02-13 Thread Laszlo Ersek

On 02/13/18 20:37, Kevin O'Connor wrote:
> On Tue, Feb 13, 2018 at 05:16:49PM +0100, Laszlo Ersek wrote:
>> On 02/12/18 21:49, Stefan Berger wrote:
>>> On 02/12/2018 03:46 PM, Kevin O'Connor wrote:
>>>> I'm not sure I fully understand the goals of the PPI interface.
>>>> Here's what I understand so far:
>>>>
>>>> The TPM specs define some actions that are considered privileged.  An
>>>> example of this would be disabling the TPM itself.  In order to
>>>> prevent an attacker from performing these actions without
>>>> authorization, the TPM specs define a mechanism to assert "physical
>>>> presence" before the privileged action can be done.  They do this by
>>>> having the firmware present a menu during early boot that permits
>>>> these privileged operations, and then the firmware locks the TPM chip
>>>> so the actions can no longer be done by any software that runs after
>>>> the firmware.  Thus "physical presence" is asserted by demonstrating
>>>> one has console access to the machine during early boot.
>>>>
>>>> The PPI spec implements a work around for this - presumably some found
>>>> the enforcement mechanism too onerous.  It allows the OS to provide a
>>>> request code to the firmware, and on the next boot the firmware will
>>>> take the requested action before it locks the chip.  Thus allowing the
>>>> OS to indirectly perform the privileged action even after the chip has
>>>> been locked.  Thus, the PPI system seems to be an "elaborate hack" to
>>>> allow users to circumvent the physical presence mechanism (if they
>>>> choose to).
>>>
>>> Correct.

>>>> Here's what I understand the proposed implementation involves:
>>>>
>>>> 1 - in addition to emulating the TPM device itself, QEMU will also
>>>>  introduce a virtual memory device with 0x400 bytes.
>>> Correct.

>>>> 2 - on first boot the firmware (seabios and uefi) will populate the
>>>>  memory region created in step 1.  In particular it will fill an
>>>>  array with the list of request codes it supports.  (Each request
>>>>  is an 8bit value, the array has 256 entries.)
>>> Correct. Each firmware would fill out the 256 byte array depending on
>>> what it supports. The 8 bit values are basically flags and so on.
>>>> 3 - QEMU will produce AML code implementing the standard PPI ACPI
>>>>  interface.  This AML code will take the request, find the table
>>>>  produced in step 1, compare it to the list of accepted requests
>>>>  produced in step 2, and then place the 8bit request in another
>>>>  qemu virtual memory device (at 0x or 0xFED45000).
>>>
>>> Correct.
>>>
>>> Now EDK2 wants to store the code in a UEFI variable in NVRAM. We
>>> therefore would need to trigger an SMI. In SeaBIOS we wouldn't have to
>>> do this.
>>>
>>>> 4 - the OS will signal a reboot, qemu will do its normal reboot logic,
>>>>  and the firmware will be run again.
>>>>
>>>> 5 - the firmware will extract the code written in stage 3, and if the
>>>>  tpm device has been configured to accept PPI codes from the OS, it
>>>>  will invoke the requested action.
>>>
>>> SeaBIOS would look into memory to find the code. EDK2 will read the code
>>> from a UEFI variable.
>>>
>>>> Did I understand the above correctly?
>>> I think so. With the fine differences between SeaBIOS and EDK2 pointed out.
>>
>> Here's what I suggest:
>>
>> Please everyone continue working on this, according to Kevin's &
>> Stefan's description, but focus on QEMU and SeaBIOS *only*. Ignore edk2
>> for now.
> 
> If this were targeted at SeaBIOS, I'd look for a simpler
> QEMU/firmware interface.  Something like:
> 
> A - QEMU produces AML code implementing the standard PPI ACPI
> interface that generates a request code and stores it in the
> device memory of an existing device (eg, writable fw_cfg or an
> extension field in the existing emulated TPM device).
> 
> B - after a reboot the firmware extracts the PPI request code
> (produced in step A) and performs the requested action (if the TPM
> is configured to accept OS generated codes).
> 
> That is, skip steps 1 and 2 from the original proposal.

I think A/B can work fine, as long as
- the firmware can somehow dynamically recognize the device / "register
  block" that the request codes have to be pulled from, and
- QEMU is free to move the device or register block around, from release
  to release, without disturbing migration.
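
The A/B flow discussed above can be modelled in a few lines. The register layout, the names, and the use of 0 as "no request" are all invented for this sketch; a real design would still need the discovery and migration properties listed in the two points above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block; layout and names exist only in this sketch. */
typedef struct ToyPpiRegs {
    uint8_t request;          /* code written on behalf of the OS (step A) */
    uint8_t os_requests_ok;   /* nonzero if OS-generated codes are accepted */
    uint8_t last_done;        /* last code the firmware carried out */
} ToyPpiRegs;

#define TOY_PPI_NO_REQUEST 0

/* Step A: what the AML method would do when the OS submits a request. */
static void toy_os_submit(ToyPpiRegs *regs, uint8_t code)
{
    assert(regs != NULL);
    regs->request = code;
}

/* Step B: what the firmware would do on the next boot.
 * Returns 1 if a request was carried out, 0 otherwise. */
static int toy_firmware_boot(ToyPpiRegs *regs)
{
    uint8_t code;

    assert(regs != NULL);
    code = regs->request;
    regs->request = TOY_PPI_NO_REQUEST;   /* consume the request once */
    if (code == TOY_PPI_NO_REQUEST || !regs->os_requests_ok) {
        return 0;
    }
    regs->last_done = code;   /* stands in for the real TPM operation */
    return 1;
}
```

Clearing the request before acting on it keeps a failed or refused operation from being retried on every subsequent boot.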

Thanks!
Laszlo

Re: [Qemu-devel] [PULL v2 00/48] Miscellaneous patches for 2017-02-13

2018-02-13 Thread Peter Maydell

On 13 February 2018 at 15:51, Paolo Bonzini  wrote:
> The following changes since commit 7d848450b6e2a3e14a776b4c93704710e7f3d233:
>
>   Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.12-20180212' into staging (2018-02-12 14:52:48 +)
>
> are available in the git repository at:
>
>
>   git://github.com/bonzini/qemu.git tags/for-upstream
>
> for you to fetch changes up to 7524a39d8c7c9ff54504cfeb784909e4f49d6f30:
>
>   travis: use libgcc-4.8-dev (libgcc-6-dev is not available on Ubuntu 14.04) (2018-02-13 16:15:09 +0100)
>
> 
> * CAN bus (will be under network maintainer)
> * scsi-block opblockers (myself)
> * Dirty log bitmap cleanup (myself)
> * SDHCI improvements and tests (Philippe)
> * HAX support for larger guest sizes (Yu Ning)
>
> 

Applied, thanks.

-- PMM

Re: [Qemu-devel] [Qemu-block] [PATCH 1/2] Add save-snapshot, load-snapshot and delete-snapshot to QAPI

2018-02-13 Thread Denis V. Lunev

On 02/13/2018 07:46 PM, Eric Blake wrote:
> On 02/13/2018 08:48 AM, Daniel P. Berrangé wrote:
>>>> No, that's policy decision that doesn't matter from QMP pov. If the mgmt
>>>> app wants the snapshot to be wrt to the initial time, it can simply
>>>> invoke the "stop" QMP command before doing the live migration and
>>>> "cont" afterwards.
>>>
>>> That would be non-live. I think Roman means a live snapshot that saves
>>> the state at the beginning of the operation. Basically the difference
>>> between blockdev-backup (state at the beginning) and blockdev-mirror
>>> (state at the end), except for a whole VM.
>>
>> That doesn't seem practical unless you can instantaneously write out
>> the entire guest RAM to disk without blocking, or can somehow snapshot
>> the RAM so you can write out a consistent view of the original RAM,
>> while the guest continues to dirty RAM pages.
>
> One idea for that is via fork()'s copy-on-write semantics; the parent
> continues processing the guest, while the child writes out RAM pages.
> Pages touched by the guest in the parent are now cleanly copied by the
> OS so that the child can take all the time it wants, but still writes
> out the state of the guest at the time of the fork().  It may be
> possible to use userfaultfd to achieve the same effects without a fork().
>
This would be problematic, as we will certainly run into memory problems.
Guest stalls are IMHO much less problematic (as they are expected
by the end user) than a memory shortage.
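
For reference, the fork() copy-on-write behaviour quoted above, and the memory cost that comes with it, can be seen in a small POSIX sketch. The buffer named guest_ram is just a static array standing in for guest memory, and the function name is invented; this only illustrates the semantics, not a snapshot implementation.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork, let the parent dirty a page afterwards, and report which value
 * the child observed. Returns that value, or -1 on error.
 */
static int toy_fork_snapshot(void)
{
    static volatile unsigned char guest_ram[4096];
    int go[2];
    int status;
    char token = 'x';
    ssize_t wrote;
    pid_t pid;

    guest_ram[0] = 1;                   /* state at snapshot time */
    if (pipe(go) < 0) {
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        close(go[0]);
        close(go[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: wait until the parent has dirtied the page, then read. */
        close(go[1]);
        if (read(go[0], &token, 1) != 1) {
            _exit(255);
        }
        _exit(guest_ram[0]);
    }

    /* Parent: keep "running the guest" and dirty the page. */
    close(go[0]);
    guest_ram[0] = 2;
    wrote = write(go[1], &token, 1);
    close(go[1]);                       /* on a failed write the child sees EOF */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    if (wrote != 1) {
        return -1;
    }
    assert(guest_ram[0] == 2);          /* the parent keeps its own write */
    return WEXITSTATUS(status);
}
```

The child still sees the value from the time of the fork because the kernel duplicated the page when the parent wrote to it. That duplication is the memory concern: in the worst case every page the guest dirties during the snapshot exists twice until the child exits.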

Anyway, this is open to discussion.

Den

Re: [Qemu-devel] [PATCH v6 26/28] hmp/migration: add migrate_recover command

2018-02-13 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> Sister command to migrate-recover in QMP.
> 
> Signed-off-by: Peter Xu 

Yes, useful for testing, although we don't have any OOB equivalent yet,
something I need to look at.

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  hmp-commands.hx | 13 +
>  hmp.c   | 10 ++
>  hmp.h   |  1 +
>  3 files changed, 24 insertions(+)
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 28ed5a7a13..7563f3eaa0 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -955,7 +955,20 @@ STEXI
>  @findex migrate_incoming
>  Continue an incoming migration using the @var{uri} (that has the same syntax
>  as the -incoming option).
> +ETEXI
>  
> +{
> +.name   = "migrate_recover",
> +.args_type  = "uri:s",
> +.params = "uri",
> +.help   = "Continue a paused incoming postcopy migration",
> +.cmd= hmp_migrate_recover,
> +},
> +
> +STEXI
> +@item migrate_recover @var{uri}
> +@findex migrate_recover
> +Continue a paused incoming postcopy migration using the @var{uri}.
>  ETEXI
>  
>  {
> diff --git a/hmp.c b/hmp.c
> index 6f8eec8365..4062d3fdba 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -1519,6 +1519,16 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict)
>  hmp_handle_error(mon, &err);
>  }
>  
> +void hmp_migrate_recover(Monitor *mon, const QDict *qdict)
> +{
> +Error *err = NULL;
> +const char *uri = qdict_get_str(qdict, "uri");
> +
> +qmp_migrate_recover(uri, &err);
> +
> +hmp_handle_error(mon, &err);
> +}
> +
>  /* Kept for backwards compatibility */
>  void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict)
>  {
> diff --git a/hmp.h b/hmp.h
> index 536cb91caa..0d53fe78d9 100644
> --- a/hmp.h
> +++ b/hmp.h
> @@ -70,6 +70,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_continue(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_incoming(Monitor *mon, const QDict *qdict);
> +void hmp_migrate_recover(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
>  void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] device_tree: Increase FDT_MAX_SIZE to 128 KiB

2018-02-13 Thread Peter Maydell

On 13 February 2018 at 16:41, Geert Uytterhoeven
 wrote:
> It is not uncommon for a contemporary FDT to be larger than 64 KiB,
> leading to failures loading the device tree from sysfs:
>
> qemu-system-aarch64: qemu_fdt_setprop: Couldn't set ...: FDT_ERR_NOSPACE
>
> For reference, the largest arm64 DTB created from the Linux sources is
> 70 KiB large (93 KiB when built with symbols/fixup support).

I think we should probably give ourselves a bit more headroom,
then -- at least 256K.

The ppc boards actually define their own version of this constant:

#define FDT_MAX_SIZE0x0010

so I think we might as well just go with that in device_tree.c for
consistency.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v2 5/5] usb-mtp: Advertise SendObjectInfo for write support

2018-02-13 Thread Bandan Das

Gerd Hoffmann  writes:

>> +/*
>> + * ObjectInfo dataset received from initiator
>> + * Fields we don't care about are ignored
>> + */
>> +typedef struct {
>> +char __pad1[4];
>
> So, is this really padding or a field we don't care about?
>
> If the latter I'd suggest to give them proper names nevertheless,
> maybe append /* unused */.
>

Ok, will do.

>> +static void utf16_to_str(uint8_t len, uint16_t *arr, char *name)
>> +{
>> +int count;
>> +
>> +for (count = 0; count < len; count++) {
>> +/* Check for valid ascii */
>> +assert(!(arr[count] & 0xFF80));
>> +name[count] = arr[count];
>> +}
>> +}
>
> This should do the reverse of usb_mtp_add_str, i.e. first copy uint16_t
> array to wchar_t array, then use wcstombs to translate it into a
> (multi-)byte string of the current locale.

Ah, this is what I was missing. Thank you for the tip, will fix in the next
version.
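
A minimal sketch of the approach suggested above (widen the uint16_t units into a wchar_t array, then let wcstombs() produce a string in the current locale) might look like the following. The signature differs from the one in the patch: this version allocates and returns the result, which is a choice made for the sketch. It treats each 16-bit unit as one character, so surrogate pairs are not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Convert 'len' 16-bit units to a newly allocated multibyte string in the
 * current locale. Returns NULL on allocation failure or if a character
 * cannot be represented. The caller frees the result.
 */
static char *utf16_to_str(uint8_t len, const uint16_t *arr)
{
    wchar_t *wstr;
    char *dst;
    size_t need;
    int i;

    assert(arr != NULL);
    /* calloc zero-fills, so the extra element is the terminator. */
    wstr = calloc((size_t)len + 1, sizeof(*wstr));
    if (!wstr) {
        return NULL;
    }
    for (i = 0; i < len; i++) {
        wstr[i] = arr[i];
    }

    /* First pass: ask how many bytes the conversion needs. */
    need = wcstombs(NULL, wstr, 0);
    if (need == (size_t)-1) {
        free(wstr);
        return NULL;
    }
    dst = malloc(need + 1);
    if (dst) {
        wcstombs(dst, wstr, need + 1);  /* room for the terminating NUL */
    }
    free(wstr);
    return dst;
}
```

Compared with asserting on non-ASCII input, this returns an error the caller can handle, and names outside ASCII work whenever the process locale can represent them.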

Bandan

> cheers,
>   Gerd

Re: [Qemu-devel] [PATCH v2 4/4] acpi: build TPM Physical Presence interface

2018-02-13 Thread Kevin O'Connor

On Tue, Feb 13, 2018 at 05:16:49PM +0100, Laszlo Ersek wrote:
> On 02/12/18 21:49, Stefan Berger wrote:
> > On 02/12/2018 03:46 PM, Kevin O'Connor wrote:
> >> I'm not sure I fully understand the goals of the PPI interface.
> >> Here's what I understand so far:
> >>
> >> The TPM specs define some actions that are considered privileged.  An
> >> example of this would be disabling the TPM itself.  In order to
> >> prevent an attacker from performing these actions without
> >> authorization, the TPM specs define a mechanism to assert "physical
> >> presence" before the privileged action can be done.  They do this by
> >> having the firmware present a menu during early boot that permits
> >> these privileged operations, and then the firmware locks the TPM chip
> >> so the actions can no longer be done by any software that runs after
> >> the firmware.  Thus "physical presence" is asserted by demonstrating
> >> one has console access to the machine during early boot.
> >>
> >> The PPI spec implements a work around for this - presumably some found
> >> the enforcement mechanism too onerous.  It allows the OS to provide a
> >> request code to the firmware, and on the next boot the firmware will
> >> take the requested action before it locks the chip.  Thus allowing the
> >> OS to indirectly perform the privileged action even after the chip has
> >> been locked.  Thus, the PPI system seems to be an "elaborate hack" to
> >> allow users to circumvent the physical presence mechanism (if they
> >> choose to).
> > 
> > Correct.
> >>
> >> Here's what I understand the proposed implementation involves:
> >>
> >> 1 - in addition to emulating the TPM device itself, QEMU will also
> >>  introduce a virtual memory device with 0x400 bytes.
> > Correct.
> >>
> >> 2 - on first boot the firmware (seabios and uefi) will populate the
> >>  memory region created in step 1.  In particular it will fill an
> >>  array with the list of request codes it supports.  (Each request
> >>  is an 8bit value, the array has 256 entries.)
> > Correct. Each firmware would fill out the 256 byte array depending on
> > what it supports. The 8 bit values are basically flags and so on.
> >> 3 - QEMU will produce AML code implementing the standard PPI ACPI
> >>  interface.  This AML code will take the request, find the table
> >>  produced in step 1, compare it to the list of accepted requests
> >>  produced in step 2, and then place the 8bit request in another
> >>  qemu virtual memory device (at 0x or 0xFED45000).
> > 
> > Correct.
> > 
> > Now EDK2 wants to store the code in a UEFI variable in NVRAM. We
> > therefore would need to trigger an SMI. In SeaBIOS we wouldn't have to
> > do this.
> > 
> >> 4 - the OS will signal a reboot, qemu will do its normal reboot logic,
> >>  and the firmware will be run again.
> >>
> >> 5 - the firmware will extract the code written in stage 3, and if the
> >>  tpm device has been configured to accept PPI codes from the OS, it
> >>  will invoke the requested action.
> > 
> > SeaBIOS would look into memory to find the code. EDK2 will read the code
> > from a UEFI variable.
> > 
> >> Did I understand the above correctly?
> > I think so. With the fine differences between SeaBIOS and EDK2 pointed out.
> 
> Here's what I suggest:
> 
> Please everyone continue working on this, according to Kevin's &
> Stefan's description, but focus on QEMU and SeaBIOS *only*. Ignore edk2
> for now.

If this were targeted at SeaBIOS, I'd look for a simpler
QEMU/firmware interface.  Something like:

A - QEMU produces AML code implementing the standard PPI ACPI
interface that generates a request code and stores it in the
device memory of an existing device (eg, writable fw_cfg or an
extension field in the existing emulated TPM device).

B - after a reboot the firmware extracts the PPI request code
(produced in step A) and performs the requested action (if the TPM
is configured to accept OS generated codes).

That is, skip steps 1 and 2 from the original proposal.

-Kevin

Re: [Qemu-devel] [PATCH v2 3/5] usb-mtp: Support delete of mtp objects

2018-02-13 Thread Bandan Das

Gerd Hoffmann  writes:

>> +#ifndef CONFIG_INOTIFY1
>> +/* Assumes that children, if any, have been already freed */
>> +static void usb_mtp_object_free_one(MTPState *s, MTPObject *o)
>> +{
>> +assert(o->nchildren == 0);
>> +QTAILQ_REMOVE(&s->objects, o, next);
>> +g_free(o->name);
>> +g_free(o->path);
>> +g_free(o);
>> +}
>> +#endif
>
> I'd suggest to move the #ifdef into the function, so it can be called
> unconditionally.  Also #else /* not needed with inotify because ... */
> would be nice.
>
>> +#ifndef CONFIG_INOTIFY1
>> +usb_mtp_object_free_one(s, o);
>> +#endif
>
> These ifdefs can be dropped then.
>
>> +/* Mark store as RW */
>> +s->flags |= (1 << MTP_FLAG_WRITABLE);
>
> Do we want a property to enable write support?

Good idea, I will add one and set it to enabled by default.

Bandan

> cheers,
>   Gerd

Re: [Qemu-devel] QEMU leaves pidfile behind on exit

2018-02-13 Thread Laszlo Ersek

On 02/13/18 17:28, Daniel P. Berrangé wrote:
> On Fri, Feb 09, 2018 at 07:12:59PM +0000, Shaun Reitan wrote:
>> QEMU leaves the pidfile behind on a clean exit when using the option
>> -pidfile /var/run/qemu.pid.
>>
>> Should QEMU leave it behind or should it clean up after itself?
>>
>> I'm willing to take a crack at a patch to fix the issue, but before I do, I
>> want to make sure that leaving the pidfile behind was not intentional?
> 
> If QEMU deletes the pidfile on exit then, with the current pidfile
> acquisition logic, there's a race condition possible:
> 
> To acquire we do
> 
>  1. fd = open()
>  2. lockf(fd)
> 
> If the first QEMU that currently owns the pidfile unlinks it, while
> a second qemu is in between steps 1 & 2, the second QEMU will
> acquire the pidfile successfully (which is fine) but the pidfile
> is now unlinked. This is not fine, because a 3rd qemu can now come
> and try to acquire the pidfile (by creating a new one) and succeed,
> despite the second qemu still owning the (now unlinked) pidfile.
> 
> It is possible to deal with this race by making qemu_create_pidfile
> more intelligent [1]. It would have to do
> 
>   1. fd = open(filename)
>   2. fstat(fd)
>   3. lockf(fd)
>   4. stat(filename)
> 
> It must then compare the results of 2 + 4 to ensure the pidfile it
> acquired is the same as the one on disk. With this change, it would
> be safe for QEMU to delete the pidfile on exit.

Why don't we just open the pidfile with (O_CREAT | O_EXCL)? O_EXCL is
supposed to be atomic.

... The open(2) manual on Linux says,

  On NFS, O_EXCL is supported only when using NFSv3 or
  later on kernel 2.6 or later.  In NFS environments where
  O_EXCL support is not provided, programs that rely on it
  for performing locking tasks will contain a race
  condition.  [...]

Sigh.

> [1] See the equiv libvirt logic for pidfile acquisition in
>  
> https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/virpidfile.c;h=58ab29f77f2cfb8583447112dae77a07446bc627;hb=HEAD#l384
> 

To my knowledge, "same file" should be checked with:

  a.st_dev == b.st_dev && a.st_ino == b.st_ino

Example:
- "filename" is "/var/run/qemu.pid"
- "/var/run" is originally a symbolic link to "/mnt/fs1/"
- between steps #1 and #4, "/var/run" is re-created as a symbolic link
  to "/mnt/fs2/" -- a different filesystem from fs1
- "/mnt/fs2/qemu.pid" happens to have the same inode number as
  "/mnt/fs1/qemu.pid"

Thanks,
Laszlo
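
The four-step acquisition Daniel outlines, combined with the st_dev/st_ino comparison, could look roughly like the sketch below. The names pidfile_acquire() and same_file(), the retry limit of 10 and the 0644 mode are assumptions made for illustration; this is neither qemu_create_pidfile() nor the libvirt code referenced in [1].

```c
#define _XOPEN_SOURCE 700   /* for lockf() under strict ISO C modes */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Two stat results name the same file only if device AND inode match. */
static bool same_file(const struct stat *a, const struct stat *b)
{
    return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/*
 * Hypothetical sketch: returns a locked fd on success, -1 on failure
 * (including "someone else holds the lock").
 */
static int pidfile_acquire(const char *filename)
{
    int attempt;

    assert(filename != NULL);
    for (attempt = 0; attempt < 10; attempt++) {
        struct stat locked, on_disk;
        int fd = open(filename, O_CREAT | O_WRONLY, 0644);   /* step 1 */

        if (fd < 0) {
            return -1;
        }
        if (fstat(fd, &locked) < 0 ||                         /* step 2 */
            lockf(fd, F_TLOCK, 0) < 0) {                      /* step 3 */
            close(fd);
            return -1;
        }
        if (stat(filename, &on_disk) == 0 &&                  /* step 4 */
            same_file(&locked, &on_disk)) {
            return fd;   /* we hold the lock on the file that is on disk */
        }
        close(fd);       /* closing drops the lock; the name moved on, retry */
    }
    return -1;
}
```

Retrying rather than failing when the comparison does not match covers the case where the previous owner unlinked the file between steps 1 and 3.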

Re: [Qemu-devel] [PATCH v6 3/9] block: Add VFIO based NVMe driver

2018-02-13 Thread Eric Blake


On 01/16/2018 12:08 AM, Fam Zheng wrote:

This is a new protocol driver that exclusively opens a host NVMe
controller through VFIO. It achieves better latency than linux-aio by
completely bypassing host kernel vfs/block layer.

 $rw-$bs-$iodepth  linux-aio  nvme://
 
 randread-4k-1     10.5k      21.6k
 randread-512k-1   745        1591
 randwrite-4k-1    30.7k      37.0k
 randwrite-512k-1  1945       1980

 (unit: IOPS)

The driver also integrates with the polling mechanism of iothread.

This patch is co-authored by Paolo and me.

Signed-off-by: Paolo Bonzini 
Signed-off-by: Fam Zheng 
Message-Id: <20180110091846.10699-4-f...@redhat.com>
---


Sorry for not noticing sooner, but


+static int64_t coroutine_fn nvme_co_get_block_status(BlockDriverState *bs,
+ int64_t sector_num,
+ int nb_sectors, int *pnum,
+ BlockDriverState **file)
+{
+*pnum = nb_sectors;
+*file = bs;
+
+return BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID |
+   (sector_num << BDRV_SECTOR_BITS);


This is wrong.  Drivers should only ever return BDRV_BLOCK_DATA (which 
io.c then _adds_ BDRV_BLOCK_ALLOCATED to, as needed).  I'll fix it up as 
part of my byte-based block status series (v8 coming up soon).


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v2 0/2] aspeed: simplify using the 'unimplemented device'

2018-02-13 Thread Peter Maydell

On 9 February 2018 at 08:57, Philippe Mathieu-Daudé  wrote:
> Since v1:
> - corrected buggy UART base address (noticed by Peter)
> - tested using openbmc-build provided by Andrew
> - added Cédric R-b
>
> tested with:
>
>   $ qemu-system-arm -M romulus-bmc -m 512 \
> -drive file=image-bmc,if=mtd,format=raw -nographic
>
> using this image:
>
>   
> https://openpower.xyz/job/openbmc-build/1240/distro=ubuntu,target=romulus/artifact/deploy/images/romulus/image-bmc
>
> (which is a good candidate for an Avocado / pyexpect test).
>
> Philippe Mathieu-Daudé (2):
>   hw/arm/aspeed: directly map the serial device to the system address
> space
>   hw/arm/aspeed: simplify using the 'unimplemented device' for
> aspeed_soc.io
>
>  include/hw/arm/aspeed_soc.h |  1 -
>  hw/arm/aspeed_soc.c | 35 +--
>  2 files changed, 5 insertions(+), 31 deletions(-)

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH V2] target-arm:Add a dynamic XML-description of the cp-registers to GDB

2018-02-13 Thread Peter Maydell

On 13 February 2018 at 17:16, Abdallah Bouassida
 wrote:
> This patch offers to GDB the ability to read/write all the coprocessor
> registers for ARM and ARM64 by generating dynamically an XML-description for
> these registers.
>
> Signed-off-by: Abdallah Bouassida 
> ---
> Hi Peter,
>
> http://patchwork.ozlabs.org/patch/867467/
> My last patch was also mangled by Thunderbird even after changing the
> settings to send a plain text..!
> Anyway, here is the patch, sent with the git send-email command.

Thanks for the resend. I've had a look through and have some
review comments below.

> Best regards,
> Abdallah
>
>  gdbstub.c  | 18 +++
>  include/qom/cpu.h  |  3 ++
>  target/arm/cpu.c   |  3 ++
>  target/arm/cpu.h   | 18 +++
>  target/arm/gdbstub.c   | 87 
> ++
>  target/arm/gdbstub64.c | 25 +++
>  target/arm/helper.c|  3 +-
>  7 files changed, 155 insertions(+), 2 deletions(-)
>
> diff --git a/gdbstub.c b/gdbstub.c
> index f1d5148..f54053f 100644
> --- a/gdbstub.c
> +++ b/gdbstub.c
> @@ -670,10 +670,20 @@ static const char *get_feature_xml(const char *p, const 
> char **newp,
>  pstrcat(target_xml, sizeof(target_xml), r->xml);
>  pstrcat(target_xml, sizeof(target_xml), "\"/>");
>  }
> +if (cc->has_dynamic_xml) {
> +cc->gen_dynamic_xml(cpu);
> +pstrcat(target_xml, sizeof(target_xml), "<xi:include href=\"");
> +pstrcat(target_xml, sizeof(target_xml), "dynamic_desc.xml");
> +pstrcat(target_xml, sizeof(target_xml), "\"/>");
> +}
>  pstrcat(target_xml, sizeof(target_xml), "</target>");
>  }
>  return target_xml;
>  }
> +if (strncmp(p, "dynamic_desc.xml", len) == 0) {
> +CPUState *cpu = first_cpu;
> +return cc->get_dynamic_xml(cpu);
> +}

Looking more closely at the gdbstub code I realized it already has
a mechanism for the target cpu code to register extra registers: the
gdb_register_coprocessor() function. If we use that for the system
registers we can avoid most of the changes to gdbstub.c. All you need
is a single change to get_feature_xml() so that it calls a CPU
object method passing it the name of the xml file being looked for,
something like:

/* The target CPU object has an opportunity to generate XML dynamically */
if (cc->gdb_get_xml) {
char *xmlname = g_strndup(p, len);
const char *xml = cc->gdb_get_xml(xmlname);
g_free(xmlname);
if (xml) {
return xml;
}
}

Then the arm code should call
   gdb_register_coprocessor(cs, sysreg_gdb_get_reg, sysreg_gdb_set_reg,
 n, "system-registers.xml", 0);
   in arm_cpu_register_gdb_regs_for_features().
You might as well generate the xml here too, I guess.

The gdb_get_xml hook implementation can then just check for
"is the filename system-registers.xml, if so return our cached xml".


>  for (i = 0; ; i++) {
>  name = xml_builtin[i][0];
>  if (!name || (strncmp(name, p, len) == 0 && strlen(name) == len))
> @@ -697,6 +707,10 @@ static int gdb_read_register(CPUState *cpu, uint8_t 
> *mem_buf, int reg)
>  return r->get_reg(env, mem_buf, reg - r->base_reg);
>  }
>  }
> +
> +if (cc->has_dynamic_xml) {
> +return cc->gdb_read_register(cpu, mem_buf, reg);
> +}
>  return 0;
>  }
>
> @@ -715,6 +729,10 @@ static int gdb_write_register(CPUState *cpu, uint8_t 
> *mem_buf, int reg)
>  return r->set_reg(env, mem_buf, reg - r->base_reg);
>  }
>  }
> +
> +if (cc->has_dynamic_xml) {
> +return cc->gdb_write_register(cpu, mem_buf, reg);
> +}
>  return 0;
>  }

These changes won't be needed because the code for handling registers
registered by gdb_register_coprocessor() can deal with them.
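
As a rough illustration of the gdb_get_xml hook suggested above, the lookup can be a plain filename comparison against a cached string. SketchCPU and sketch_gdb_get_xml() are hypothetical stand-ins, and the extra cpu argument is an assumption of this sketch; the real hook would take whatever arguments CPUClass ends up defining.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the CPU object; only what the sketch needs. */
typedef struct SketchCPU {
    const char *sysreg_xml;   /* XML generated once at registration time */
} SketchCPU;

/*
 * Return the cached XML if gdb asked for our dynamically generated
 * file, or NULL so the caller can fall back to its built-in table.
 */
static const char *sketch_gdb_get_xml(const SketchCPU *cpu,
                                      const char *xmlname)
{
    assert(cpu != NULL && xmlname != NULL);
    if (strcmp(xmlname, "system-registers.xml") == 0) {
        return cpu->sysreg_xml;
    }
    return NULL;
}
```

Returning NULL for unknown names keeps the hook optional: get_feature_xml() only short-circuits when the target actually owns the requested file.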

>
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index aff88fa..907d4dc 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -197,6 +197,9 @@ typedef struct CPUClass {
>  const struct VMStateDescription *vmsd;
>  const char *gdb_core_xml_file;
>  gchar * (*gdb_arch_name)(CPUState *cpu);
> +bool has_dynamic_xml;
> +void (*gen_dynamic_xml)(CPUState *cpu);
> +char *(*get_dynamic_xml)(CPUState *cpu);

New members in this struct should have entries added to the
documentation comment for the struct.

>  void (*cpu_exec_enter)(CPUState *cpu);
>  void (*cpu_exec_exit)(CPUState *cpu);
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 9da6ea5..9e060a6 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -1752,6 +1752,9 @@ static void arm_cpu_class_init(ObjectClass *oc, void 
> *data)
>  cc->gdb_num_core_regs = 26;
>  cc->gdb_core_xml_file = "arm-core.xml";
>  cc->gdb_arch_name = arm_gdb_arch_name;
> +cc->has_dynamic_xml = true;
> +cc->gen_dynamic_xml = a

Re: [Qemu-devel] [Qemu-block] [PATCH v4] ssh: switch from libssh2 to libssh

2018-02-13 Thread Eric Blake


On 02/13/2018 12:49 PM, Max Reitz wrote:

On 2018-01-18 17:44, Pino Toscano wrote:

Rewrite the implementation of the ssh block driver to use libssh instead
of libssh2.  The libssh library has various advantages over libssh2:
- easier API for authentication (for example for using ssh-agent)
- easier API for known_hosts handling
- supports newer types of keys in known_hosts




@@ -628,6 +570,8 @@ static int connect_to_ssh(BDRVSSHState *s, QDict *options,
  Error *local_err = NULL;
  const char *user, *path, *host_key_check;
  long port = 0;
+unsigned long portU = 0;


I was about to say: How about making port an unsigned long and swapping
the qemu_strtol() for a qemu_strtoul()?

But I think you'd rather want an unsigned int instead (and that won't
work with qemu_strtoul()).


Dan has a pending patch that adds qemu_strtoi() and qemu_strtoui(), when 
we want to deal with parsing to ints.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v6 25/28] qmp/migration: new command migrate-recover

2018-02-13 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> The first allow-oob=true command.  It's used on destination side when
> the postcopy migration is paused and ready for a recovery.  After
> execution, a new migration channel will be established for postcopy to
> continue.
> 
> Signed-off-by: Peter Xu 
> ---
>  migration/migration.c | 26 ++
>  migration/migration.h |  1 +
>  migration/savevm.c|  3 +++
>  qapi/migration.json   | 20 
>  4 files changed, 50 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index cf3a3f416c..bb57ed9ade 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1422,6 +1422,32 @@ void qmp_migrate_incoming(const char *uri, Error 
> **errp)
>  once = false;
>  }
>  
> +void qmp_migrate_recover(const char *uri, Error **errp)
> +{
> +MigrationIncomingState *mis = migration_incoming_get_current();
> +
> +if (mis->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
> +error_setg(errp, "Migrate recover can only be run "
> +   "when postcopy is paused.");
> +return;
> +}

OK, if it did come back as Paused I don't think it can leave it again
except this way, so I'm not too worried about it being thread safe.

> +if (mis->postcopy_recover_triggered) {
> +error_setg(errp, "Migrate recovery is triggered already");
> +return;
> +}
> +
> +/* This will make sure we'll only allow one recover for one pause */
> +mis->postcopy_recover_triggered = true;

However, does that need to be done with a :
   if (atomic_cmpxchg(&mis->postcopy_recover_triggered, false, true) ==
       true) {
      error_setg(errp, "Migrate recovery is triggered already");
      return;
   }

for the slim chance that someone did this command on the main and the
oob monitor?
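
To make the race concrete, here is a small standalone sketch of the same "first caller wins" idea. It uses C11 <stdatomic.h> rather than QEMU's atomic_cmpxchg() macro, and the names recover_triggered, claim_recovery() and rearm_recovery() are hypothetical stand-ins for the field and the two code paths in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for mis->postcopy_recover_triggered. */
static atomic_bool recover_triggered;

/*
 * Returns true only for the single caller that flips false -> true.
 * A plain "if (flag) ... flag = true" lets two monitors both pass the
 * check before either one stores.
 */
static bool claim_recovery(void)
{
    bool expected = false;

    return atomic_compare_exchange_strong(&recover_triggered,
                                          &expected, true);
}

/* What the pause path would do to allow one more recovery attempt. */
static void rearm_recovery(void)
{
    atomic_store(&recover_triggered, false);
}
```

A caller for which claim_recovery() returns false would report "Migrate recovery is triggered already" and return without touching the migration stream.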

Dave

> +/*
> + * Note that this call will never start a real migration; it will
> + * only re-setup the migration stream and poke existing migration
> + * to continue using that newly established channel.
> + */
> +qemu_start_incoming_migration(uri, errp);
> +}
> +
>  bool migration_is_blocked(Error **errp)
>  {
>  if (qemu_savevm_state_blocked(errp)) {
> diff --git a/migration/migration.h b/migration/migration.h
> index 88f5614b90..581bf4668b 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -65,6 +65,7 @@ struct MigrationIncomingState {
>  QemuSemaphore colo_incoming_sem;
>  
>  /* notify PAUSED postcopy incoming migrations to try to continue */
> +bool postcopy_recover_triggered;
>  QemuSemaphore postcopy_pause_sem_dst;
>  QemuSemaphore postcopy_pause_sem_fault;
>  };
> diff --git a/migration/savevm.c b/migration/savevm.c
> index d40092a2b6..5f41b062ba 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2182,6 +2182,9 @@ static bool 
> postcopy_pause_incoming(MigrationIncomingState *mis)
>  /* Notify the fault thread for the invalidated file handle */
>  postcopy_fault_thread_notify(mis);
>  
> +/* Clear the triggered bit to allow one recovery */
> +mis->postcopy_recover_triggered = false;
> +
>  error_report("Detected IO failure for postcopy. "
>   "Migration paused.");
>  
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 055130314d..dfbcb02d4c 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1172,3 +1172,23 @@
>  # Since: 2.9
>  ##
>  { 'command': 'xen-colo-do-checkpoint' }
> +
> +##
> +# @migrate-recover:
> +#
> +# Provide a recovery migration stream URI.
> +#
> +# @uri: the URI to be used for the recovery of migration stream.
> +#
> +# Returns: nothing.
> +#
> +# Example:
> +#
> +# -> { "execute": "migrate-recover",
> +#  "arguments": { "uri": "tcp:192.168.1.200:12345" } }
> +# <- { "return": {} }
> +#
> +# Since: 2.12
> +##
> +{ 'command': 'migrate-recover', 'data': { 'uri': 'str' },
> +  'allow-oob': true }
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v4] ssh: switch from libssh2 to libssh

2018-02-13 Thread Max Reitz

On 2018-01-18 17:44, Pino Toscano wrote:
> Rewrite the implementation of the ssh block driver to use libssh instead
> of libssh2.  The libssh library has various advantages over libssh2:
> - easier API for authentication (for example for using ssh-agent)
> - easier API for known_hosts handling
> - supports newer types of keys in known_hosts
> 
> Kerberos authentication can be enabled once the libssh bug for it [1] is
> fixed.
> 
> The development version of libssh (i.e. the future 0.8.x) supports
> fsync, so reuse the build time check for this.
> 
> [1] https://red.libssh.org/issues/242
> 
> Signed-off-by: Pino Toscano 
> ---
> 
> Changes from v3:
> - fix socket cleanup in connect_to_ssh()
> - add comments about the socket cleanup
> - improve the error reporting (closer to what was with libssh2)
> - improve EOF detection on sftp_read()
> 
> Changes from v2:
> - used again an own fd
> - fixed co_yield() implementation
> 
> Changes from v1:
> - fixed jumbo packets writing
> - fixed missing 'err' assignment
> - fixed commit message

One thing: The performance seems to have dropped hugely, from what I can
tell.

Before this patch, running the iotests 1-10 over ssh (raw/ssh) took
12.6 s.  With this patch, they take 59.3 s.  Perhaps the starkest
contrast can be seen in test 1, which took 4 s before and 27 s after --
this test simply reads and writes 128 MB of continuous data.

I like having elliptic curves, but I think this patch needs optimization
work before we can replace libssh2.

>  block/Makefile.objs |   6 +-
>  block/ssh.c | 522 
> 
>  configure   |  65 ---
>  3 files changed, 278 insertions(+), 315 deletions(-)

[...]

> diff --git a/block/ssh.c b/block/ssh.c
> index b049a16eb9..2975fc27d8 100644
> --- a/block/ssh.c
> +++ b/block/ssh.c

[...]

> @@ -87,27 +81,25 @@ static void ssh_state_init(BDRVSSHState *s)
>  {
>  memset(s, 0, sizeof *s);
>  s->sock = -1;
> -s->offset = -1;
>  qemu_co_mutex_init(&s->lock);
>  }
>  
>  static void ssh_state_free(BDRVSSHState *s)
>  {
> +if (s->attrs) {
> +sftp_attributes_free(s->attrs);
> +}
>  if (s->sftp_handle) {
> -libssh2_sftp_close(s->sftp_handle);
> +sftp_close(s->sftp_handle);
>  }
>  if (s->sftp) {
> -libssh2_sftp_shutdown(s->sftp);
> +sftp_free(s->sftp);
>  }
>  if (s->session) {
> -libssh2_session_disconnect(s->session,
> -   "from qemu ssh client: "
> -   "user closed the connection");
> -libssh2_session_free(s->session);
> -}
> -if (s->sock >= 0) {
> -close(s->sock);
> +ssh_disconnect(s->session);
> +ssh_free(s->session);
>  }
> +/* s->sock is owned by the ssh_session, which free's it. */

s/free's/frees/

>  }
>  
>  static void GCC_FMT_ATTR(3, 4)
> @@ -121,13 +113,13 @@ session_error_setg(Error **errp, BDRVSSHState *s, const 
> char *fs, ...)
>  va_end(args);
>  
>  if (s->session) {
> -char *ssh_err;
> +const char *ssh_err;
>  int ssh_err_code;
>  
> -/* This is not an errno.  See . */
> -ssh_err_code = libssh2_session_last_error(s->session,
> -  &ssh_err, NULL, 0);
> -error_setg(errp, "%s: %s (libssh2 error code: %d)",
> +/* This is not an errno.  See . */
> +ssh_err = ssh_get_error(s->session);
> +ssh_err_code = ssh_get_error_code(s->session);
> +error_setg(errp, "%s: %s (libssh error code: %d)",
> msg, ssh_err, ssh_err_code);

Maybe we should not append the error info if there is no error.

(Example:

$ ./qemu-img info ssh://localhost/tmp/foo
qemu-img: Could not open 'ssh://localhost/tmp/foo': no host key was
found in known_hosts:  (libssh error code: 0)

)

>  } else {
>  error_setg(errp, "%s", msg);

[...]

> @@ -291,68 +283,41 @@ static void ssh_parse_filename(const char *filename, 
> QDict *options,
>  static int check_host_key_knownhosts(BDRVSSHState *s,
>   const char *host, int port, Error 
> **errp)
>  {
> -const char *home;
> -char *knh_file = NULL;
> -LIBSSH2_KNOWNHOSTS *knh = NULL;
> -struct libssh2_knownhost *found;
> -int ret, r;
> -const char *hostkey;
> -size_t len;
> -int type;
> +int ret;
> +int state;
>  
> -hostkey = libssh2_session_hostkey(s->session, &len, &type);
> -if (!hostkey) {
> -ret = -EINVAL;
> -session_error_setg(errp, s, "failed to read remote host key");
> -goto out;
> -}
> +state = ssh_is_server_known(s->session);
>  
> -knh = libssh2_knownhost_init(s->session);
> -if (!knh) {
> -ret = -EINVAL;
> -session_error_setg(errp, s,
> -   "failed to initialize known hosts support");
> -goto out;
> -}
> -
> -home =

Re: [Qemu-devel] block_status automatically added flags

2018-02-13 Thread Eric Blake


On 02/13/2018 11:36 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi Eric!

I'm now testing my NBD block status implementation (the block_status part, 
not the dirty bitmaps part), and ran into the following effect.


I created empty qcow2 image and wrote to the first sector, so

qemu-io -c map x

reports:

64 KiB (0x10000) bytes allocated at offset 0 bytes (0x0)
9.938 MiB (0x9f0000) bytes not allocated at offset 64 KiB (0x10000)

But I can't get the same results when connecting to an NBD server exporting 
the same qcow2 file; I get


10 MiB (0xa00000) bytes allocated at offset 0 bytes (0x0)


Is this with or without your NBD_CMD_BLOCK_STATUS patches applied?  And 
are you exposing the data over NBD as raw ('qemu-nbd -f qcow2'/'qemu-io 
-f raw') or as qcow2 ('qemu-nbd -f raw'/'qemu-io -f qcow2')?


/me does a quick reproduction

Yes, I definitely see that behavior without any NBD_CMD_BLOCK_STATUS 
patches and when the image is exposed over NBD as raw, but not when 
exposed as qcow2, when testing the 2.11 release:


$ qemu-img create -f qcow2 file3 10M
Formatting 'file3', fmt=qcow2 size=10485760 cluster_size=65536 
lazy_refcounts=off refcount_bits=16

$ qemu-io -c 'w 0 64k' -c map -f qcow2 file3
wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0.0579 sec (1.079 MiB/sec and 17.2601 ops/sec)
64 KiB (0x10000) bytes allocated at offset 0 bytes (0x0)
9.938 MiB (0x9f0000) bytes not allocated at offset 64 KiB (0x10000)
$ qemu-nbd -f qcow2 -x foo file3
$ qemu-io -f raw -c map nbd://localhost:10809/foo
10 MiB (0xa00000) bytes allocated at offset 0 bytes (0x0)
$ qemu-nbd -f raw -x foo file3
$ qemu-io -f qcow2 -c map nbd://localhost:10809/foo
64 KiB (0x10000) bytes allocated at offset 0 bytes (0x0)
9.938 MiB (0x9f0000) bytes not allocated at offset 64 KiB (0x10000)

Right now, without NBD block status, the NBD driver reports the entire 
file as allocated, as it can't do any better (NBD has no backing file, 
so all data is treated as belonging to this layer).  Presumably, once 
NBD_CMD_BLOCK_STATUS is implemented, we can then use that for more 
accurate information.





Finally, I understand the reason:

for local file, qemu-io calls bdrv_is_allocated, which calls 
bdrv_common_block_status_above with want_zero=false. So, it doesn't set 
BDRV_BLOCK_ZERO, and doesn't set BDRV_BLOCK_ALLOCATED.


'qemu-io map' is a bit unusual; it is the only UI that easily exposes 
bdrv_is_allocated() to the outside world ('qemu-img map' does not). 
(The fact that both operations are named 'map' but do something 
different is annoying; for back-compat reasons, we can't change 
qemu-img, and I don't know if changing qemu-io is worth it.)



And, even if we 
change want_zero to true,


Well, you'd do that by invoking bdrv_block_status() (via 'qemu-img map', 
for example).


here, it will set BDRV_BLOCK_ZERO, but will 
not set BDRV_BLOCK_ALLOCATED, which contradicts its definition:


  BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
    layer (short for DATA || ZERO), set by block layer


This text is wrong; it gets fixed in my still-pending concluding series 
for byte-based block status:


https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg00955.html

Conceptually, BDRV_BLOCK_ALLOCATED means "is THIS layer of the backing 
chain responsible for the contents at this guest offset"; and there are 
cases where we know that we read zeroes but where the current layer is 
not responsible for the contents (such as a qcow2 that has a backing 
file with shorter length, where we return BDRV_BLOCK_ZERO but not 
BDRV_BLOCK_ALLOCATED).  But since NBD has no backing chain, the entire 
image is considered allocated.  Meanwhile, asking whether something is 
allocated ('qemu-io -c map') is not the usual question you want to ask 
when determining what portions of a file are zero.





for nbd, we go through the similar way on server (but with want_zero = 
true), and we finally have BDRV_BLOCK_ZERO without BDRV_BLOCK_ALLOCATED, 
which maps to NBD_STATE_HOLE+NBD_STATE_ZERO. But then, in the client we 
have BDRV_BLOCK_ZERO not automatically added by block layer but directly 
from nbd driver, therefore BDRV_BLOCK_ALLOCATED is set and I get 
different result.


Drivers should never set BDRV_BLOCK_ALLOCATED; only the code in io.c 
should set it; and output based on BDRV_BLOCK_ALLOCATED is only useful 
in backing chain scenarios (which NBD does not have).
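
A simplified model of the division of labour described here may help. The SK_* flag values and both function names are invented for this sketch and do not match the real definitions; the logic only mirrors what this thread says the generic code does (add ALLOCATED when the driver reported DATA or ZERO, and otherwise possibly report ZERO without ALLOCATED, for example past the end of a shorter backing file).

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values chosen for this sketch only; not QEMU's real values. */
enum {
    SK_BLOCK_DATA      = 1 << 0,
    SK_BLOCK_ZERO      = 1 << 1,
    SK_BLOCK_ALLOCATED = 1 << 2,
};

/* What a driver reports: DATA and/or ZERO, never ALLOCATED. */
static int sketch_driver_status(bool has_data)
{
    return has_data ? SK_BLOCK_DATA : 0;
}

/* What the generic layer derives from the driver's answer. */
static int sketch_generic_status(int driver_flags, bool past_backing_eof)
{
    int ret = driver_flags;

    assert(!(driver_flags & SK_BLOCK_ALLOCATED));
    if (ret & (SK_BLOCK_DATA | SK_BLOCK_ZERO)) {
        /* This layer determines the contents. */
        ret |= SK_BLOCK_ALLOCATED;
    } else if (past_backing_eof) {
        /* Reads as zero, but this layer is not responsible for it. */
        ret |= SK_BLOCK_ZERO;
    }
    return ret;
}
```

In this model a driver that itself reports ZERO, as the NBD client does in Vladimir's experiment, ends up with ZERO | ALLOCATED, which is the difference he observed against the local qcow2 file.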





All of this looks weird to me.

The BDRV_BLOCK_ALLOCATED definition should be fixed to say that this flag 
reflects only flags reported by the driver, not automatically added ones.


If we need to report yet more flags, where the driver can report 
additional information, then that's different.  But changing the 
BDRV_BLOCK_ALLOCATED semantics would probably have knock-on effects that 
I'm not prepared to audit for (that is, we'd rather fix the 
documentation to match reality, which my pending patch does, and NOT 
change the code to  match the current incorrect documentation).




And then the situation with

Re: [Qemu-devel] [PATCH] log-for-trace.h: Split out parts of log.h used by trace.h

2018-02-13 Thread Peter Maydell

On 13 February 2018 at 15:19, Eric Blake  wrote:
> On 02/13/2018 08:00 AM, Peter Maydell wrote:
>> +++ b/scripts/tracetool/backend/log.py
>> @@ -20,7 +20,7 @@ PUBLIC = True
>>   def generate_h_begin(events, group):
>> -out('#include "qemu/log.h"',
>> +out('#include "qemu/log-for-trace.h"',
>>   '')
>> @@ -35,14 +35,13 @@ def generate_h(event, group):
>>   else:
>>   cond = "trace_event_get_state(%s)" % ("TRACE_" +
>> event.name.upper())
>>   -out('if (%(cond)s) {',
>> +out('if (%(cond)s && qemu_loglevel_mask(LOG_TRACE)) {',
>>   'struct timeval _now;',
>>   'gettimeofday(&_now, NULL);',
>> -'qemu_log_mask(LOG_TRACE,',
>
>
> Oh, nice side effect - the old code was unconditionally calling
> gettimeofday() even when qemu_loglevel_mask(LOG_TRACE) fails; the new code
> limits the call when the logging is actually going to happen.

True, but I think in practice if the trace_event_get_state()
check succeeds then LOG_TRACE will always be on.

(Slightly oddly, qemu_str_to_log_mask() only sets LOG_TRACE if
a trace:foo event was enabled, but qemu_set_log() forces LOG_TRACE
to on even if no trace events were enabled.)
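
For readers who have not looked at the generated code, the shape of the new guard is roughly as follows. This is a hand-written sketch, not tracetool output: SKETCH_LOG_TRACE, sketch_loglevel, sketch_loglevel_mask() and the timestamps_taken counter are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

#define SKETCH_LOG_TRACE (1 << 15)   /* hypothetical mask bit */

static int sketch_loglevel;          /* stand-in for the global log mask */
static int timestamps_taken;         /* counts gettimeofday() calls */

static bool sketch_loglevel_mask(int mask)
{
    return (sketch_loglevel & mask) != 0;
}

/* Both checks happen before any work is done for the log line. */
static void trace_example_event(bool event_enabled, int value)
{
    if (event_enabled && sketch_loglevel_mask(SKETCH_LOG_TRACE)) {
        struct timeval now;

        gettimeofday(&now, NULL);
        timestamps_taken++;
        fprintf(stderr, "%d@%zu.%06zu:example_event value=%d\n",
                (int)getpid(), (size_t)now.tv_sec, (size_t)now.tv_usec,
                value);
    }
}
```

With the mask check folded into the condition, the timestamp is only fetched when the line will actually be printed.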

>> -'  "%%d@%%zd.%%06zd:%(name)s " %(fmt)s
>> "\\n",',
>> -'  getpid(),',
>> -'  (size_t)_now.tv_sec,
>> (size_t)_now.tv_usec',
>> -'  %(argnames)s);',
>> +'qemu_log("%%d@%%zd.%%06zd:%(name)s " %(fmt)s "\\n",',
>> +' getpid(),',
>> +' (size_t)_now.tv_sec, (size_t)_now.tv_usec',
>> +' %(argnames)s);',
>>   '}',
>>   cond=cond,
>>   name=event.name,
>>
>
> If you don't think the extra preprocessor magic to prevent accidental
> inclusion of the internal header is necessary, then
> Reviewed-by: Eric Blake 

It doesn't seem very likely in practice that anybody would
include the obscure internal header. We can add the magic
later if the mistake seems to happen in practice...

thanks
-- PMM

Re: [Qemu-devel] [PATCH v6 21/28] migration: setup ramstate for resume

2018-02-13 Thread Dr. David Alan Gilbert

* Peter Xu (pet...@redhat.com) wrote:
> After we updated the dirty bitmaps of ramblocks, we also need to update
> the critical fields in RAMState to make sure it is ready for a resume.
> 
> Signed-off-by: Peter Xu 
> ---
>  migration/ram.c| 40 +++-
>  migration/trace-events |  1 +
>  2 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index a2a4b05d5c..d275875f54 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2250,6 +2250,36 @@ static int ram_init_all(RAMState **rsp)
>  return 0;
>  }
>  
> +static void ram_state_resume_prepare(RAMState *rs, QEMUFile *out)
> +{
> +RAMBlock *block;
> +long pages = 0;
> +
> +/*
> + * Postcopy is not using xbzrle/compression, so no need for that.
> + * Also, since source are already halted, we don't need to care
> + * about dirty page logging as well.
> + */
> +
> +RAMBLOCK_FOREACH(block) {
> +pages += bitmap_count_one(block->bmap,
> +  block->used_length >> TARGET_PAGE_BITS);
> +}
> +
> +/* This may not be aligned with current bitmaps. Recalculate. */
> +rs->migration_dirty_pages = pages;

migration_dirty_pages is uint64_t - so we should probably do the cast
above and keep 'pages' as uint64_t.

> +rs->last_seen_block = NULL;
> +rs->last_sent_block = NULL;
> +rs->last_page = 0;
> +rs->last_version = ram_list.version;

Do you need to explicitly set
   rs->ram_bulk_stage = false;

if the failure happened just after the start of postcopy and no
requested pages had been sent, I think it might still  be set?


> +/* Update RAMState cache of output QEMUFile */
> +rs->f = out;
> +
> +trace_ram_state_resume_prepare(pages);
> +}
> +
>  /*
>   * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
>   * long-running RCU critical section.  When rcu-reclaims in the code
> @@ -3178,8 +3208,16 @@ out:
>  static int ram_resume_prepare(MigrationState *s, void *opaque)
>  {
>  RAMState *rs = *(RAMState **)opaque;
> +int ret;
>  
> -return ram_dirty_bitmap_sync_all(s, rs);
> +ret = ram_dirty_bitmap_sync_all(s, rs);
> +if (ret) {
> +return ret;
> +}
> +
> +ram_state_resume_prepare(rs, s->to_dst_file);
> +
> +return 0;
>  }
>  
>  static SaveVMHandlers savevm_ram_handlers = {
> diff --git a/migration/trace-events b/migration/trace-events
> index 45b1d89217..f5913ff51c 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -88,6 +88,7 @@ ram_dirty_bitmap_reload_complete(char *str) "%s"
>  ram_dirty_bitmap_sync_start(void) ""
>  ram_dirty_bitmap_sync_wait(void) ""
>  ram_dirty_bitmap_sync_complete(void) ""
> +ram_state_resume_prepare(long v) "%ld"
>  
>  # migration/migration.c
>  await_return_path_close_on_source_close(void) ""

Dave

> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PULL 00/26] virtio, vhost, pci, pc: features, fixes and cleanups

2018-02-13 Thread Peter Maydell

On 13 February 2018 at 16:52, Michael S. Tsirkin  wrote:
> I've dropped the crypto vhost patches from the pull request for now.
>
> Pushed with the same name - should be fine now.

Applied the fixed version, thanks.

-- PMM

Re: [Qemu-devel] [PATCH] log-for-trace.h: Split out parts of log.h used by trace.h

2018-02-13 Thread Richard Henderson

On 02/13/2018 06:00 AM, Peter Maydell wrote:
> A persistent build problem we see is where a source file
> accidentally omits the #include of log.h. This slips through
> local developer testing because if you configure with the
> default (log) trace backend trace.h will pull in log.h for you.
> Compilation fails only if some other backend is selected.
> 
> To make this error cause a compile failure regardless of
> the configured trace backend, split out the parts of log.h
> that trace.h requires into a new log-for-trace.h header.
> Since almost all manual uses of the log.h functions will
> use constants or functions which aren't in log-for-trace.h,
> this will let us catch missing #include "qemu/log.h" more
> consistently.
> 
> Signed-off-by: Peter Maydell 
> ---
>  include/qemu/log-for-trace.h | 35 +++
>  include/qemu/log.h   | 18 --
>  scripts/tracetool/backend/log.py | 13 ++---
>  3 files changed, 45 insertions(+), 21 deletions(-)
>  create mode 100644 include/qemu/log-for-trace.h

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [PATCH v6 1/3] pci: Add support for Designware IP block

2018-02-13 Thread Michael S. Tsirkin

On Tue, Feb 13, 2018 at 09:07:10AM -0800, Andrey Smirnov wrote:
> +static void designware_pcie_root_class_init(ObjectClass *klass, void *data)
> +{
> +PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> +DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
> +
> +k->vendor_id = PCI_VENDOR_ID_SYNOPSYS;
> +k->device_id = 0xABCD;
> +k->revision = 0;
> +k->class_id = PCI_CLASS_BRIDGE_PCI;
> +k->is_express = true;
> +k->is_bridge = true;
> +k->exit = pci_bridge_exitfn;
> +k->realize = designware_pcie_root_realize;
> +k->config_read = designware_pcie_root_config_read;
> +k->config_write = designware_pcie_root_config_write;
> +
> +dc->reset = pci_bridge_reset;
> +/*
> + * PCI-facing part of the host bridge, not usable without the
> + * host-facing part, which can't be device_add'ed, yet.
> + */
> +dc->user_creatable = false;
> +dc->vmsd = &vmstate_designware_pcie_root;
> +}
> +
> +static uint64_t designware_pcie_host_mmio_read(void *opaque, hwaddr addr,
> +   unsigned int size)
> +{
> +PCIHostState *pci = PCI_HOST_BRIDGE(opaque);
> +PCIDevice *device = pci_find_device(pci->bus, 0, 0);
> +
> +return pci_host_config_read_common(device,
> +   addr,
> +   pci_config_size(device),
> +   size);
> +}
> +
> +static void designware_pcie_host_mmio_write(void *opaque, hwaddr addr,
> +uint64_t val, unsigned int size)
> +{
> +PCIHostState *pci = PCI_HOST_BRIDGE(opaque);
> +PCIDevice *device = pci_find_device(pci->bus, 0, 0);
> +
> +return pci_host_config_write_common(device,
> +addr,
> +pci_config_size(device),
> +val, size);
> +}
> +
> +static const MemoryRegionOps designware_pci_mmio_ops = {
> +.read   = designware_pcie_host_mmio_read,
> +.write  = designware_pcie_host_mmio_write,
> +.endianness = DEVICE_NATIVE_ENDIAN,
> +.impl = {
> +/*
> + * Our device would not work correctly if the guest was doing
> + * unaligned access. This might not be a limitation on the real
> + * device but in practice there is no reason for a guest to access
> + * this device unaligned.
> + */
> +.min_access_size = 4,
> +.max_access_size = 4,
> +.unaligned = false,
> +},
> +};

Could you please add a comment explaining why DEVICE_NATIVE_ENDIAN is
appropriate here?  Most uses of it are plain "we never bothered
about cross-endian setups".  Some are "there's a mix of different
endianness values that needs to be handled in a special way".

I suspect you really need DEVICE_LITTLE_ENDIAN.
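
To make the distinction concrete, here is a small standalone sketch in plain
C. It does not use the QEMU memory API, and all helper names are
hypothetical. PCI configuration space is defined as little-endian, so a
register image has one correct decoding whatever the host byte order is. A
"native" load of the same bytes gives a host-dependent result, which is the
risk in a cross-endian setup.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Decode four bytes as a little-endian 32-bit value. The result is the
 * same on little-endian and big-endian hosts.
 */
static uint32_t load_le32(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/*
 * "Native" load: copy the bytes straight into a uint32_t. The numeric
 * result depends on the byte order of the machine running the code.
 */
static uint32_t load_native32(const uint8_t *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/* Report whether the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian host the two loads agree, which is why a wrong choice here
tends to go unnoticed until someone tests on a big-endian host or target.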

-- 
MST

Re: [Qemu-devel] [PATCH 1/5] Add a git-publish configuration file

2018-02-13 Thread Daniel P . Berrangé

On Tue, Feb 13, 2018 at 05:34:25PM +, Stefan Hajnoczi wrote:
> From: Fam Zheng 
> 
> git-publish [1] is a convenient tool to send patches and has been
> popular among QEMU developers.  Recently it has been made available in
> Fedora official repo thanks to Stefan's work.
> 
> One nice feature of the tool is a per-project configuration with
> profiles, especially in which the cccmd option is a handy method to
> create the Cc list.
> 
> [1]: https://github.com/stefanha/git-publish
> 
> Signed-off-by: Fam Zheng 
> Reviewed-by: Marc-André Lureau 
> Message-id: 20180205054725.25634-2-f...@redhat.com
> Signed-off-by: Stefan Hajnoczi 
> ---
>  .gitpublish | 58 ++
>  1 file changed, 58 insertions(+)
>  create mode 100644 .gitpublish
> 
> diff --git a/.gitpublish b/.gitpublish
> new file mode 100644
> index 00..ed48f6e52c
> --- /dev/null
> +++ b/.gitpublish
> @@ -0,0 +1,58 @@
> +#
> +# Common git-publish profiles that can be used to send patches to QEMU upstream.
> +#
> +# See https://github.com/stefanha/git-publish for more information
> +#
> +[gitpublishprofile "default"]
> +base = master
> +prefix = PATCH
> +to = qemu-devel@nongnu.org
> +cccmd = scripts/get_maintainer.pl --noroles --norolestats --nogit --nogit-fallback 2>/dev/null
> +
> +[gitpublishprofile "rfc"]
> +base = master
> +prefix = RFC PATCH
> +to = qemu-devel@nongnu.org
> +cccmd = scripts/get_maintainer.pl --noroles --norolestats --nogit --nogit-fallback 2>/dev/null
> +
> +[gitpublishprofile "stable"]
> +base = master
> +prefix = PATCH
> +to = qemu-devel@nongnu.org
> +cc = qemu-sta...@nongnu.org
> +cccmd = scripts/get_maintainer.pl --noroles --norolestats --nogit --nogit-fallback 2>/dev/null
> +
> +[gitpublishprofile "trivial"]
> +base = master
> +prefix = PATCH
> +to = qemu-devel@nongnu.org
> +cc = qemu-triv...@nongnu.org
> +cccmd = scripts/get_maintainer.pl --noroles --norolestats --nogit --nogit-fallback 2>/dev/null
> +
> +[gitpublishprofile "block"]
> +base = master
> +prefix = PATCH
> +to = qemu-devel@nongnu.org
> +cc = qemu-bl...@nongnu.org
> +cccmd = scripts/get_maintainer.pl --noroles --norolestats --nogit --nogit-fallback 2>/dev/null

Why is a custom entry needed for block here (and for the other things
below)?  Won't running get_maintainer.pl already correctly
report when a patch needs cc'ing to qemu-bl...@nongnu.org
based on the MAINTAINERS rules?


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH v3] scripts: Add decodetree.py

2018-02-13 Thread Richard Henderson

On 02/13/2018 02:05 AM, Peter Maydell wrote:
> Trivial-but-important nit: all these new files are missing
> copyright-and-license comment headers.

Oops, fixed.  I also added a one-line comment describing what each of the tests
is attempting.


r~

Re: [Qemu-devel] [PATCH v1] s390x/tcg: fix disabling/enabling DAT

2018-02-13 Thread Cornelia Huck

On Tue, 13 Feb 2018 17:12:40 +0100
David Hildenbrand  wrote:

> Currently, all memory accesses go via the MMU of the address space
> (primary, secondary, ...). This is bad, because we don't flush the TLB
> when disabling/enabling DAT. So we could add a tlb flush. However it
> is easier to simply select the MMU we already have in place for real
> memory access.
> 
> All we have to do is point at the right MMU and allow to execute these
> pages.
> 
> Signed-off-by: David Hildenbrand 
> ---
> 
> This is necessary to make the upcoming kvm-unit-tests with vmalloc support
> pass under TCG.
> 
>  target/s390x/cpu.h|  7 ++-
>  target/s390x/mmu_helper.c |  2 +-
>  target/s390x/translate.c  | 10 +++---
>  3 files changed, 14 insertions(+), 5 deletions(-)

Thanks, applied (with tabs removed).

Re: [Qemu-devel] [PATCH v1] s390x/tcg: fix disabling/enabling DAT

2018-02-13 Thread Cornelia Huck

On Tue, 13 Feb 2018 08:31:21 -0800 (PST)
no-re...@patchew.org wrote:

> Checking PATCH 1/1: s390x/tcg: fix disabling/enabling DAT...
> WARNING: line over 80 characters
> #32: FILE: target/s390x/cpu.h:320:
> +#define FLAG_MASK_PSW(FLAG_MASK_PER | FLAG_MASK_DAT | FLAG_MASK_PSTATE \

I'll ignore this...

> 
> ERROR: code indent should never use tabs
> #32: FILE: target/s390x/cpu.h:320:
> +#define FLAG_MASK_PSW^I^I(FLAG_MASK_PER | FLAG_MASK_DAT | FLAG_MASK_PSTATE \$

...and just get rid of the pre-existing tabs here while merging, no
need to resend.

> 
> total: 1 errors, 1 warnings, 51 lines checked
> 
> Your patch has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> 
> === OUTPUT END ===
> 
> Test command exited with code: 1
> 
> 
> ---
> Email generated automatically by Patchew [http://patchew.org/].
> Please send your feedback to patchew-de...@freelists.org
