[Qemu-devel] [PATCH 2/6] xics: add find_server

2014-05-06 Thread Alexey Kardashevskiy
PAPR allows having multiple interrupr servers.

This adds a server lookup function and makes use of it.

Since at the moment QEMU only supports a single server,
no change in behaviour is expected.

Signed-off-by: Alexey Kardashevskiy 
---
 hw/intc/xics.c | 28 +++-
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 1f89a00..9314654 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -635,14 +635,30 @@ static const TypeInfo ics_info = {
 /*
  * Exported functions
  */
+static int xics_find_server(XICSState *icp, int irq)
+{
+int server;
+
+for (server = 0; server < icp->nr_servers; ++server) {
+ICSState *ics = &icp->ics[server];
+if (ics_valid_irq(ics, irq)) {
+return server;
+}
+}
+
+return -1;
+}
 
 qemu_irq xics_get_qirq(XICSState *icp, int irq)
 {
-if (!ics_valid_irq(icp->ics, irq)) {
-return NULL;
+int server = xics_find_server(icp, irq);
+
+if (server >= 0) {
+ICSState *ics = &icp->ics[server];
+return ics->qirqs[irq - ics->offset];
 }
 
-return icp->ics->qirqs[irq - icp->ics->offset];
+return NULL;
 }
 
 static void ics_set_irq_type(ICSState *ics, int srcno, bool lsi)
@@ -653,10 +669,12 @@ static void ics_set_irq_type(ICSState *ics, int srcno, 
bool lsi)
 
 void xics_set_irq_type(XICSState *icp, int irq, bool lsi)
 {
-ICSState *ics = icp->ics;
+int server = xics_find_server(icp, irq);
+ICSState *ics;
 
-assert(ics_valid_irq(ics, irq));
+assert(server >= 0);
 
+ics = &icp->ics[server];
 ics_set_irq_type(ics, irq - ics->offset, lsi);
 }
 
-- 
1.8.4.rc4




[Qemu-devel] [PATCH] spapr: pci: clean msi info when releasing it

2014-05-06 Thread Liu Ping Fan
In current code, we use phb->msi_table[ndev].nvec to indicate whether
this msi entries are used by a device or not. So when unplug a pci
device, we should reset nvec to zero.

Signed-off-by: Liu Ping Fan 
---
 hw/ppc/spapr_pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index cbef095..7b1dfe1 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -316,6 +316,7 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, 
sPAPREnvironment *spapr,
 rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
 return;
 }
+phb->msi_table[ndev].nvec = 0;
 trace_spapr_pci_msi("Released MSIs", ndev, config_addr);
 rtas_st(rets, 0, RTAS_OUT_SUCCESS);
 rtas_st(rets, 1, 0);
-- 
1.8.1.4




[Qemu-devel] Configure virtio-scsi options via libvirt

2014-05-06 Thread Mike Perez
Hi everyone,

I would like be able to configure virtio-scsi options num_queues, max_sectors,
and cmd_per_lun via libvirt. Are there any plans to have this support?

-- 
Mike Perez



Re: [Qemu-devel] virtio-serial-pci very expensive during live migration

2014-05-06 Thread Paolo Bonzini

Il 06/05/2014 22:01, Chris Friesen ha scritto:


It seems like the main problem is that we loop over all the queues,
calling virtio_pci_set_host_notifier_internal() on each of them.  That
in turn calls memory_region_add_eventfd(), which calls
memory_region_transaction_commit(), which scans over all the address
spaces, which seems to take the vast majority of the time.


Yes, you can wrap the entire loop with memory_region_transaction_begin 
and memory_region_transaction_commit.  Can you try that?


Paolo



Re: [Qemu-devel] [PATCH v3] block/raw-posix: Try both FIEMAP and SEEK_HOLE

2014-05-06 Thread Paolo Bonzini

Il 07/05/2014 00:18, Max Reitz ha scritto:

The current version of raw-posix always uses ioctl(FS_IOC_FIEMAP) if
FIEMAP is available; lseek with SEEK_HOLE/SEEK_DATA are not even
compiled in in this case. However, there may be implementations which
support the latter but not the former (e.g., NFSv4.2) as well as vice
versa.

To cover both cases, always try SEEK_HOLE/SEEK_DATA (as this will
probably be covered by POSIX soon) and if that does not work, fall back
to FIEMAP; and if that does not work either, treat everything as
allocated.

Signed-off-by: Max Reitz 
---
v3:
 - use a tristate for keeping the information whether or not to use
   FIEMAP and/or SEEK_HOLE/SEEK_DATA [Eric]
 - fall through to another implementation (i.e. FIEMAP) if the first
   (i.e. SEEK_HOLE/SEEK_DATA) reports everything as allocated; this will
   not result in that implementation being skipped the next time,
   however [Eric]


You need this if you use SEEK_HOLE/SEEK_DATA first, because it has a 
default implementation that returns a single non-hole for the entire file.


But if you start with FIEMAP, you don't because it will return ENOTSUP 
instead of a single allocated extent.


Paolo


---
 block/raw-posix.c | 182 +++---
 1 file changed, 131 insertions(+), 51 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 3ce026d..fd6bac6 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -123,6 +123,18 @@

 #define MAX_BLOCKSIZE  4096

+/* In case there are multiple implementations for the same feature provided by
+ * the environment, this enumeration may be used to represent the status of
+ * these alternatives. */
+typedef enum ImplementationAlternativeStatus {
+/* The status is (yet) unknown. */
+IMPLSTAT_UNKNOWN = 0,
+/* This implementation is known to return correct results. */
+IMPLSTAT_WORKING,
+/* This implementation is known not to return correct results. */
+IMPLSTAT_SKIP,
+} ImplementationAlternativeStatus;
+
 typedef struct BDRVRawState {
 int fd;
 int type;
@@ -146,6 +158,12 @@ typedef struct BDRVRawState {
 bool has_discard:1;
 bool has_write_zeroes:1;
 bool discard_zeroes:1;
+#if defined SEEK_HOLE && defined SEEK_DATA
+ImplementationAlternativeStatus seek_hole_status;
+#endif
+#ifdef CONFIG_FIEMAP
+ImplementationAlternativeStatus fiemap_status;
+#endif
 } BDRVRawState;

 typedef struct BDRVRawReopenState {
@@ -1272,98 +1290,160 @@ static int raw_create(const char *filename, 
QEMUOptionParameter *options,
 return result;
 }

-/*
- * Returns true iff the specified sector is present in the disk image. Drivers
- * not implementing the functionality are assumed to not support backing files,
- * hence all their sectors are reported as allocated.
- *
- * If 'sector_num' is beyond the end of the disk image the return value is 0
- * and 'pnum' is set to 0.
- *
- * 'pnum' is set to the number of sectors (including and immediately following
- * the specified sector) that are known to be in the same
- * allocated/unallocated state.
- *
- * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
- * beyond the end of the disk image it will be clamped.
- */
-static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num,
-int nb_sectors, int *pnum)
+static int64_t try_fiemap(BlockDriverState *bs, off_t start, off_t end,
+  off_t *data, off_t *hole, int nb_sectors, int *pnum)
 {
-off_t start, data, hole;
-int64_t ret;
-
-ret = fd_open(bs);
-if (ret < 0) {
-return ret;
-}
-
-start = sector_num * BDRV_SECTOR_SIZE;
-ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
-
 #ifdef CONFIG_FIEMAP
-
 BDRVRawState *s = bs->opaque;
+int64_t ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
 struct {
 struct fiemap fm;
 struct fiemap_extent fe;
 } f;

+if (s->fiemap_status == IMPLSTAT_SKIP) {
+return -ENOTSUP;
+}
+
 f.fm.fm_start = start;
 f.fm.fm_length = (int64_t)nb_sectors * BDRV_SECTOR_SIZE;
 f.fm.fm_flags = 0;
 f.fm.fm_extent_count = 1;
 f.fm.fm_reserved = 0;
 if (ioctl(s->fd, FS_IOC_FIEMAP, &f) == -1) {
-/* Assume everything is allocated.  */
-*pnum = nb_sectors;
-return ret;
+s->fiemap_status = IMPLSTAT_SKIP;
+return -errno;
+}
+
+if (s->fiemap_status == IMPLSTAT_UNKNOWN) {
+if (f.fm.fm_extent_count == 1 &&
+f.fe.fe_logical == 0 && f.fe.fe_length >= end)
+{
+/* FIEMAP returned a single extent spanning the entire file; maybe
+ * this was just a default response and therefore we cannot be sure
+ * whether it actually works; fall back to an alternative
+ * implementation. */
+return -ENOTSUP;
+} else {

[Qemu-devel] [PATCH 4/6] spapr: move interrupt allocator to xics

2014-05-06 Thread Alexey Kardashevskiy
The current allocator returns IRQ numbers from a pool and does not
support IRQs reuse in any form as it did not keep track of what it
previously returned, it only keeps the last returned IRQ. Some use
cases such as PCI hot(un)plug may require IRQ release and reallocation.

This moves an allocator from SPAPR to XICS.

This switches IRQ users to use new API.

This uses LSI/MSI flags to know if interrupt is allocated.

This removes xics_set_irq_type() as it is not used anymore.

The interrupt release function will be posted as a separate patch.

Signed-off-by: Alexey Kardashevskiy 
---
 hw/intc/xics.c | 85 +++---
 hw/ppc/spapr.c | 67 ---
 hw/ppc/spapr_events.c  |  2 +-
 hw/ppc/spapr_pci.c |  6 ++--
 hw/ppc/spapr_vio.c |  2 +-
 include/hw/ppc/spapr.h | 10 --
 include/hw/ppc/xics.h  |  3 +-
 trace-events   |  4 +++
 8 files changed, 91 insertions(+), 88 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 7a64b2e..faf304c 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -669,15 +669,90 @@ static void ics_set_irq_type(ICSState *ics, int srcno, 
bool lsi)
 lsi ? XICS_FLAGS_LSI : XICS_FLAGS_MSI;
 }
 
-void xics_set_irq_type(XICSState *icp, int irq, bool lsi)
+#define ICS_IRQ_FREE(ics, srcno)   \
+(!((ics)->irqs[(srcno)].flags & (XICS_FLAGS_LSI | XICS_FLAGS_MSI)))
+
+static int ics_find_free_block(ICSState *ics, int num, int alignnum)
+{
+int first, i;
+
+for (first = 0; first < ics->nr_irqs; first += alignnum) {
+if (num > (ics->nr_irqs - first)) {
+return -1;
+}
+for (i = first; i < first + num; ++i) {
+if (!ICS_IRQ_FREE(ics, i)) {
+break;
+}
+}
+if (i == (first + num)) {
+return first;
+}
+}
+
+return -1;
+}
+
+int xics_alloc(XICSState *icp, int server, int irq_hint, bool lsi)
 {
-int server = xics_find_server(icp, irq);
-ICSState *ics;
+ICSState *ics = &icp->ics[server];
+int irq;
 
-assert(server >= 0);
+if (irq_hint) {
+assert(server == xics_find_server(icp, irq_hint));
+if (!ICS_IRQ_FREE(ics, irq_hint - ics->offset)) {
+trace_xics_alloc_failed_hint(server, irq_hint);
+return -1;
+}
+irq = irq_hint;
+} else {
+irq = ics_find_free_block(ics, 1, 1);
+if (irq < 0) {
+trace_xics_alloc_failed_no_left(server);
+return -1;
+}
+irq += ics->offset;
+}
 
-ics = &icp->ics[server];
 ics_set_irq_type(ics, irq - ics->offset, lsi);
+trace_xics_alloc(server, irq);
+
+return irq;
+}
+
+/*
+ * Allocate block of consequtive IRQs, returns a number of the first.
+ * If align==true, aligns the first IRQ number to num.
+ */
+int xics_alloc_block(XICSState *icp, int server, int num, bool lsi, bool align)
+{
+int i, first = -1;
+ICSState *ics = &icp->ics[server];
+
+assert(server == 0);
+/*
+ * MSIMesage::data is used for storing VIRQ so
+ * it has to be aligned to num to support multiple
+ * MSI vectors. MSI-X is not affected by this.
+ * The hint is used for the first IRQ, the rest should
+ * be allocated continuously.
+ */
+if (align) {
+assert((num == 1) || (num == 2) || (num == 4) ||
+   (num == 8) || (num == 16) || (num == 32));
+first = ics_find_free_block(ics, num, num);
+} else {
+first = ics_find_free_block(ics, num, 1);
+}
+
+if (first > 0) {
+for (i = first; i < first + num; ++i) {
+ics_set_irq_type(ics, i, lsi);
+}
+}
+trace_xics_alloc_block(server, first, num, lsi, align);
+
+return first + ics->offset;
 }
 
 /*
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a11e121..db21515 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -88,73 +88,6 @@
 
 sPAPREnvironment *spapr;
 
-int spapr_allocate_irq(int hint, bool lsi)
-{
-int irq;
-
-if (hint) {
-irq = hint;
-if (hint >= spapr->next_irq) {
-spapr->next_irq = hint + 1;
-}
-/* FIXME: we should probably check for collisions somehow */
-} else {
-irq = spapr->next_irq++;
-}
-
-/* Configure irq type */
-if (!xics_get_qirq(spapr->icp, irq)) {
-return 0;
-}
-
-xics_set_irq_type(spapr->icp, irq, lsi);
-
-return irq;
-}
-
-/*
- * Allocate block of consequtive IRQs, returns a number of the first.
- * If msi==true, aligns the first IRQ number to num.
- */
-int spapr_allocate_irq_block(int num, bool lsi, bool msi)
-{
-int first = -1;
-int i, hint = 0;
-
-/*
- * MSIMesage::data is used for storing VIRQ so
- * it has to be aligned to num to support multiple
- * MSI vectors. MSI-X is not affected by this.
- * The hint is used for the first IRQ, the rest should
- * be allocated continuously.
- */
-if

Re: [Qemu-devel] [PATCH] target-i386: fix handling of ZF in btx instructions

2014-05-06 Thread Paolo Bonzini

Il 06/05/2014 15:33, Dmitry Poletaev ha scritto:

Eflags for bt/bts/btr/btc instructions compute as for shift(SAR) instructions. 
According to Intel A2 manual, for btx instructions zf is unaffected under any 
condition, but for SAR group, when evaluate eflags, we compute zf.


This patch makes zf=0, it doesn't leave it unaffected.

Paolo


So I suggest add another group, specifically for computing eflags after btx 
instructions.

Signed-off-by: Dmitry Poletaev 

diff --git a/target-i386/cc_helper.c b/target-i386/cc_helper.c
index 05dd12b..272e2f1 100644
--- a/target-i386/cc_helper.c
+++ b/target-i386/cc_helper.c
@@ -168,6 +168,12 @@ target_ulong helper_cc_compute_all(target_ulong dst, 
target_ulong src1,
 case CC_OP_SHLL:
 return compute_all_shll(dst, src1);

+case CC_OP_BTXB:
+return compute_all_btxb(dst, src1);
+case CC_OP_BTXW:
+return compute_all_btxw(dst, src1);
+case CC_OP_BTXL:
+return compute_all_btxl(dst, src1);
 case CC_OP_SARB:
 return compute_all_sarb(dst, src1);
 case CC_OP_SARW:
@@ -208,6 +214,8 @@ target_ulong helper_cc_compute_all(target_ulong dst, 
target_ulong src1,
 return compute_all_decq(dst, src1);
 case CC_OP_SHLQ:
 return compute_all_shlq(dst, src1);
+case CC_OP_BTXQ:
+return compute_all_btxq(dst, src1);
 case CC_OP_SARQ:
 return compute_all_sarq(dst, src1);
 case CC_OP_BMILGQ:
@@ -234,6 +242,10 @@ target_ulong helper_cc_compute_c(target_ulong dst, 
target_ulong src1,
 return 0;

 case CC_OP_EFLAGS:
+case CC_OP_BTXB:
+case CC_OP_BTXW:
+case CC_OP_BTXL:
+case CC_OP_BTXQ:
 case CC_OP_SARB:
 case CC_OP_SARW:
 case CC_OP_SARL:
diff --git a/target-i386/cc_helper_template.h b/target-i386/cc_helper_template.h
index 607311f..04375f1 100644
--- a/target-i386/cc_helper_template.h
+++ b/target-i386/cc_helper_template.h
@@ -187,6 +187,19 @@ static int glue(compute_c_shl, SUFFIX)(DATA_TYPE dst, 
DATA_TYPE src1)
 return (src1 >> (DATA_BITS - 1)) & CC_C;
 }

+static int glue(compute_all_btx, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
+{
+int cf, pf, af, sf, of;
+
+cf = src1 & CC_C;
+pf = 0; /* undefined */
+af = 0; /* undefined */
+/* zf unaffected */
+sf = 0;  /* undefined */
+of = 0; /* undefined */
+return cf | pf | af | sf | of;
+}
+
 static int glue(compute_all_sar, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
 {
 int cf, pf, af, zf, sf, of;
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 2a22a7d..506037d 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -660,6 +660,10 @@ typedef enum {
 CC_OP_SHLL,
 CC_OP_SHLQ,

+CC_OP_BTXB, /* modify only C, CC_SRC.msb = C */
+CC_OP_BTXW,
+CC_OP_BTXL,
+CC_OP_BTXQ,
 CC_OP_SARB, /* modify all flags, CC_DST = res, CC_SRC.lsb = C */
 CC_OP_SARW,
 CC_OP_SARL,
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 02625e3..e77ba0b 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -200,6 +200,7 @@ static const uint8_t cc_op_live[CC_OP_NB] = {
 [CC_OP_INCB ... CC_OP_INCQ] = USES_CC_DST | USES_CC_SRC,
 [CC_OP_DECB ... CC_OP_DECQ] = USES_CC_DST | USES_CC_SRC,
 [CC_OP_SHLB ... CC_OP_SHLQ] = USES_CC_DST | USES_CC_SRC,
+[CC_OP_BTXB ... CC_OP_BTXQ] = USES_CC_DST | USES_CC_SRC,
 [CC_OP_SARB ... CC_OP_SARQ] = USES_CC_DST | USES_CC_SRC,
 [CC_OP_BMILGB ... CC_OP_BMILGQ] = USES_CC_DST | USES_CC_SRC,
 [CC_OP_ADCX] = USES_CC_DST | USES_CC_SRC,
@@ -6734,7 +6735,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 tcg_gen_xor_tl(cpu_T[0], cpu_T[0], cpu_tmp0);
 break;
 }
-set_cc_op(s, CC_OP_SARB + ot);
+set_cc_op(s, CC_OP_BTXB + ot);
 if (op != 0) {
 if (mod != 3) {
 gen_op_st_v(s, ot, cpu_T[0], cpu_A0);









Re: [Qemu-devel] [PATCH v1 22/22] target-arm: A64: Register VBAR_EL3

2014-05-06 Thread Peter Crosthwaite
On Tue, May 6, 2014 at 4:08 PM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Signed-off-by: Edgar E. Iglesias 
> ---
>  target-arm/helper.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/target-arm/helper.c b/target-arm/helper.c
> index 6e3f5fa..b6dac25 100644
> --- a/target-arm/helper.c
> +++ b/target-arm/helper.c
> @@ -2106,6 +2106,11 @@ static const ARMCPRegInfo v8_el3_cp_reginfo[] = {
>.type = ARM_CP_NO_MIGRATE,
>.opc0 = 3, .opc1 = 6, .crn = 4, .crm = 0, .opc2 = 0,
>.access = PL3_RW, .fieldoffset = offsetof(CPUARMState, banked_spsr[7]) 
> },
> +{ .name = "VBAR_EL3", .state = ARM_CP_STATE_AA64,
> +  .opc0 = 3, .opc1 = 6, .crn = 12, .crm = 0, .opc2 = 0,
> +  .access = PL3_RW, .writefn = vbar_write,
> +  .fieldoffset = offsetof(CPUARMState, cp15.vbar_el[2]),

Same comment as P21.

Regards,
Peter

> +  .resetvalue = 0 },
>  REGINFO_SENTINEL
>  };
>
> --
> 1.8.3.2
>
>



Re: [Qemu-devel] [PATCH v1 21/22] target-arm: A64: Register VBAR_EL2

2014-05-06 Thread Peter Crosthwaite
On Tue, May 6, 2014 at 4:08 PM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Signed-off-by: Edgar E. Iglesias 
> ---
>  target-arm/helper.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/target-arm/helper.c b/target-arm/helper.c
> index 2406058..6e3f5fa 100644
> --- a/target-arm/helper.c
> +++ b/target-arm/helper.c
> @@ -2088,6 +2088,11 @@ static const ARMCPRegInfo v8_el2_cp_reginfo[] = {
>.type = ARM_CP_NO_MIGRATE,
>.opc0 = 3, .opc1 = 4, .crn = 4, .crm = 0, .opc2 = 0,
>.access = PL2_RW, .fieldoffset = offsetof(CPUARMState, banked_spsr[6]) 
> },
> +{ .name = "VBAR_EL2", .state = ARM_CP_STATE_AA64,
> +  .opc0 = 3, .opc1 = 4, .crn = 12, .crm = 0, .opc2 = 0,
> +  .access = PL2_RW, .writefn = vbar_write,
> +  .fieldoffset = offsetof(CPUARMState, cp15.vbar_el[1]),

This [1] is smoewhat misleading, and should either use the VBAR_EL_IDX
macro or if changing over to always-four array, just [2].

Regards,
Peter

> +  .resetvalue = 0 },
>  REGINFO_SENTINEL
>  };
>
> --
> 1.8.3.2
>
>



Re: [Qemu-devel] [PATCH v1 20/22] target-arm: Make vbar_write writeback to any CPREG

2014-05-06 Thread Peter Crosthwaite
On Tue, May 6, 2014 at 4:08 PM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Signed-off-by: Edgar E. Iglesias 
> ---
>  target-arm/helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target-arm/helper.c b/target-arm/helper.c
> index 65daeaf..2406058 100644
> --- a/target-arm/helper.c
> +++ b/target-arm/helper.c
> @@ -657,7 +657,7 @@ static void vbar_write(CPUARMState *env, const 
> ARMCPRegInfo *ri,
>   * contexts. (ARMv8 would permit us to do no masking at all, but ARMv7
>   * requires the bottom five bits to be RAZ/WI because they're UNK/SBZP.)
>   */
> -env->cp15.vbar_el[VBAR_EL_IDX(1)] = value & ~0x1FULL;
> +CPREG_FIELD64(env, ri) = value & ~0x1FULL;

Use raw_write() to implement CP register writing (check
vmsa_ttbr_write for example).

Regards,
Peter

>  }
>
>  static uint64_t ccsidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
> --
> 1.8.3.2
>
>



Re: [Qemu-devel] [PATCH v1 18/22] target-arm: A64: Generalize update_spsel for the various ELs

2014-05-06 Thread Peter Crosthwaite
On Tue, May 6, 2014 at 4:08 PM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Signed-off-by: Edgar E. Iglesias 

Reviewed-by: Peter Crosthwaite 

> ---
>  target-arm/internals.h | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/target-arm/internals.h b/target-arm/internals.h
> index 7c39946..5d802db 100644
> --- a/target-arm/internals.h
> +++ b/target-arm/internals.h
> @@ -107,6 +107,7 @@ int arm_rmode_to_sf(int rmode);
>
>  static inline void update_spsel(CPUARMState *env, uint32_t imm)
>  {
> +unsigned int cur_el = arm_current_pl(env);
>  /* Update PSTATE SPSel bit; this requires us to update the
>   * working stack pointer in xregs[31].
>   */
> @@ -115,17 +116,15 @@ static inline void update_spsel(CPUARMState *env, 
> uint32_t imm)
>  }
>  env->pstate = deposit32(env->pstate, 0, 1, imm);
>
> -/* EL0 has no access rights to update SPSel, and this code
> - * assumes we are updating SP for EL1 while running as EL1.
> - */
> -assert(arm_current_pl(env) == 1);
> +/* EL0 has no access rights to update SPSel.  */
> +assert(cur_el >= 1 && cur_el <= 3);
>  if (env->pstate & PSTATE_SP) {
>  /* Switch from using SP_EL0 to using SP_ELx */
>  env->sp_el[0] = env->xregs[31];
> -env->xregs[31] = env->sp_el[1];
> +env->xregs[31] = env->sp_el[cur_el];
>  } else {
>  /* Switch from SP_EL0 to SP_ELx */
> -env->sp_el[1] = env->xregs[31];
> +env->sp_el[cur_el] = env->xregs[31];
>  env->xregs[31] = env->sp_el[0];
>  }
>  }
> --
> 1.8.3.2
>
>



Re: [Qemu-devel] [PATCH v1 17/22] target-arm: A64: Generalize ERET to various ELs

2014-05-06 Thread Peter Crosthwaite
On Tue, May 6, 2014 at 4:08 PM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Adds support for ERET to Aarch64 EL2 and 3.
>
> Signed-off-by: Edgar E. Iglesias 
> ---
>  target-arm/op_helper.c | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
> index f1ae05e..8494f7f 100644
> --- a/target-arm/op_helper.c
> +++ b/target-arm/op_helper.c
> @@ -386,13 +386,14 @@ void HELPER(msr_i_pstate)(CPUARMState *env, uint32_t 
> op, uint32_t imm)
>
>  void HELPER(exception_return)(CPUARMState *env)
>  {
> -unsigned int spsr_idx = arm64_banked_spsr_index(1);
> -uint32_t spsr = env->banked_spsr[spsr_idx];
> -int new_el, i;
>  int cur_el = arm_current_pl(env);
> +unsigned int spsr_idx = arm64_banked_spsr_index(cur_el);
> +uint32_t spsr;
> +int new_el, i;
>
> +spsr = env->banked_spsr[spsr_idx];

Why change to split declaration and assignment (amongst the other
all-in-one's above)?

Otherwise:

Reviewed-by: Peter Crosthwaite 

>  if (env->pstate & PSTATE_SP) {
> -env->sp_el[1] = env->xregs[31];
> +env->sp_el[cur_el] = env->xregs[31];
>  } else {
>  env->sp_el[0] = env->xregs[31];
>  }
> @@ -428,7 +429,7 @@ void HELPER(exception_return)(CPUARMState *env)
>  env->aarch64 = 1;
>  pstate_write(env, spsr);
>  env->xregs[31] = env->sp_el[new_el];
> -env->pc = env->elr_el[ELR_EL_IDX(1)];
> +env->pc = env->elr_el[ELR_EL_IDX(cur_el)];
>  }
>
>  return;
> @@ -442,7 +443,7 @@ illegal_return:
>   * no change to exception level, execution state or stack pointer
>   */
>  env->pstate |= PSTATE_IL;
> -env->pc = env->elr_el[ELR_EL_IDX(1)];
> +env->pc = env->elr_el[ELR_EL_IDX(cur_el)];
>  spsr &= PSTATE_NZCV | PSTATE_DAIF;
>  spsr |= pstate_read(env) & ~(PSTATE_NZCV | PSTATE_DAIF);
>  pstate_write(env, spsr);
> --
> 1.8.3.2
>
>



Re: [Qemu-devel] [PATCH v3 14/14] qmp: Don't use error_is_set() to suppress additional errors

2014-05-06 Thread Markus Armbruster
Eric Blake  writes:

> On 05/02/2014 05:26 AM, Markus Armbruster wrote:
>> Using error_is_set(errp) that way can sweep programming errors under
>> the carpet when we get called incorrectly with an error set.
>> 
>> encrypted_bdrv_it() does it, because there's no way to make
>> bdrv_iterate() break its loop.  Actually safe, because qmp_cont()
>> clears the error before the loop.  Clean it up anyway: replace
>> bdrv_iterate() by bdrv_next(), break the loop on error.
>> 
>> Replace both occurences, for consistency.
>
> s/occurences/occurrences/
>
>> 
>> Signed-off-by: Markus Armbruster 
>> Reviewed-by: Eric Blake 
>
> Fixing up the commit message typo is trivial and doesn't alter the
> positive review.

Luiz, could you fix it up on commit, when you add Eric's R-by?



Re: [Qemu-devel] [PATCH v1 14/22] target-arm: Register EL3 versions of ELR and SPSR

2014-05-06 Thread Peter Crosthwaite
On Tue, May 6, 2014 at 4:08 PM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Signed-off-by: Edgar E. Iglesias 

Same as last patch,

Otherwise:

Reviewed-by: Peter Crosthwaite 

> ---
>  target-arm/helper.c | 16 
>  1 file changed, 16 insertions(+)
>
> diff --git a/target-arm/helper.c b/target-arm/helper.c
> index 8efc340..65daeaf 100644
> --- a/target-arm/helper.c
> +++ b/target-arm/helper.c
> @@ -2091,6 +2091,19 @@ static const ARMCPRegInfo v8_el2_cp_reginfo[] = {
>  REGINFO_SENTINEL
>  };
>
> +static const ARMCPRegInfo v8_el3_cp_reginfo[] = {
> +{ .name = "ELR_EL3", .state = ARM_CP_STATE_AA64,
> +  .type = ARM_CP_NO_MIGRATE,
> +  .opc0 = 3, .opc1 = 6, .crn = 4, .crm = 0, .opc2 = 1,
> +  .access = PL3_RW,
> +  .fieldoffset = offsetof(CPUARMState, elr_el[ELR_EL_IDX(3)]) },
> +{ .name = "SPSR_EL3", .state = ARM_CP_STATE_AA64,
> +  .type = ARM_CP_NO_MIGRATE,
> +  .opc0 = 3, .opc1 = 6, .crn = 4, .crm = 0, .opc2 = 0,
> +  .access = PL3_RW, .fieldoffset = offsetof(CPUARMState, banked_spsr[7]) 
> },
> +REGINFO_SENTINEL
> +};
> +
>  static void sctlr_write(CPUARMState *env, const ARMCPRegInfo *ri,
>  uint64_t value)
>  {
> @@ -2338,6 +2351,9 @@ void register_cp_regs_for_features(ARMCPU *cpu)
>  if (arm_feature(env, ARM_FEATURE_EL2)) {
>  define_arm_cp_regs(cpu, v8_el2_cp_reginfo);
>  }
> +if (arm_feature(env, ARM_FEATURE_EL3)) {
> +define_arm_cp_regs(cpu, v8_el3_cp_reginfo);
> +}
>  }
>  if (arm_feature(env, ARM_FEATURE_MPU)) {
>  /* These are the MPU registers prior to PMSAv6. Any new
> --
> 1.8.3.2
>
>



Re: [Qemu-devel] [PATCH v1 15/22] target-arm: A64: Forbid ERET to increase the EL

2014-05-06 Thread Peter Crosthwaite
On Tue, May 6, 2014 at 4:08 PM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Signed-off-by: Edgar E. Iglesias 

Reviewed-by: Peter Crosthwaite 

> ---
>  target-arm/op_helper.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
> index dd9e4fc..770c776 100644
> --- a/target-arm/op_helper.c
> +++ b/target-arm/op_helper.c
> @@ -389,6 +389,7 @@ void HELPER(exception_return)(CPUARMState *env)
>  unsigned int spsr_idx = arm64_banked_spsr_index(1);
>  uint32_t spsr = env->banked_spsr[spsr_idx];
>  int new_el, i;
> +int cur_el = arm_current_pl(env);
>
>  if (env->pstate & PSTATE_SP) {
>  env->sp_el[1] = env->xregs[31];
> @@ -410,6 +411,10 @@ void HELPER(exception_return)(CPUARMState *env)
>  env->regs[15] = env->elr_el[ELR_EL_IDX(1)] & ~0x1;
>  } else {
>  new_el = extract32(spsr, 2, 2);
> +if (new_el > cur_el) {
> +/* Disallow returns to higher ELs than the current one.  */
> +goto illegal_return;
> +}
>  if (new_el > 1) {
>  /* Return to unimplemented EL */
>  goto illegal_return;
> --
> 1.8.3.2
>
>



Re: [Qemu-devel] [PATCH v1 16/22] target-arm: A64: Forbid ERET to unimplemented ELs

2014-05-06 Thread Peter Crosthwaite
On Tue, May 6, 2014 at 4:08 PM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Check for EL2 support before returning to it.
>
> Signed-off-by: Edgar E. Iglesias 

Reviewed-by: Peter Crosthwaite 

> ---
>  target-arm/op_helper.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
> index 770c776..f1ae05e 100644
> --- a/target-arm/op_helper.c
> +++ b/target-arm/op_helper.c
> @@ -411,12 +411,10 @@ void HELPER(exception_return)(CPUARMState *env)
>  env->regs[15] = env->elr_el[ELR_EL_IDX(1)] & ~0x1;
>  } else {
>  new_el = extract32(spsr, 2, 2);
> -if (new_el > cur_el) {
> +if (new_el > cur_el
> +|| (new_el == 2 && !arm_feature(env, ARM_FEATURE_EL2))) {
>  /* Disallow returns to higher ELs than the current one.  */
> -goto illegal_return;
> -}
> -if (new_el > 1) {
> -/* Return to unimplemented EL */
> +/* Disallow returns to unimplemented ELs.  */
>  goto illegal_return;
>  }
>  if (extract32(spsr, 1, 1)) {
> --
> 1.8.3.2
>
>



[Qemu-devel] [PATCH 0/6] move interrupts from spapr to xics

2014-05-06 Thread Alexey Kardashevskiy
The existing interrupt allocation scheme in SPAPR assumes that
interrupts are allocated at the start time, continously and the config
will not change. However, there are cases when this is not going to work
such as:

1. migration - we will have to have an ability to choose interrupt
numbers for devices in the command line and this will create gaps in
interrupt space.

2. PCI hotplug - interrupts from unplugged device need to be returned
back to interrupt pool, otherwise we will quickly run out of interrupts if
we do PCI hotplug often.

First this was posted as "[PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration"
but since then we decided to migrate IOMMU in a different way and now it just
about interrupts.

This is also going to be used in spapr_pci's ibm,change-msi callback
to release interrupts and reallocate them again. At the moment spapr_pci
keeps a map of what it allocated for what device and it would be nice
to get rid of that mapping too and use this patchset instead.

Please comment. Thanks!


Alexey Kardashevskiy (6):
  xics: add flags for interrupts
  xics: add find_server
  xics: disable flags reset on xics reset
  spapr: move interrupt allocator to xics
  spapr: remove @next_irq
  xics: implement xics_ics_free()

 hw/intc/xics.c | 150 +
 hw/intc/xics_kvm.c |   9 +--
 hw/ppc/spapr.c |  70 +--
 hw/ppc/spapr_events.c  |   2 +-
 hw/ppc/spapr_pci.c |   6 +-
 hw/ppc/spapr_vio.c |   2 +-
 include/hw/ppc/spapr.h |  11 
 include/hw/ppc/xics.h  |   9 ++-
 trace-events   |   6 ++
 9 files changed, 162 insertions(+), 103 deletions(-)

-- 
1.8.4.rc4




[Qemu-devel] [PATCH 1/6] xics: add flags for interrupts

2014-05-06 Thread Alexey Kardashevskiy
The existing interrupt allocation scheme in SPAPR assumes that
interrupts are allocated at the start time, continously and the config
will not change. However, there are cases when this is not going to work
such as:

1. migration - we will have to have an ability to choose interrupt
numbers for devices in the command line and this will create gaps in
interrupt space.

2. PCI hotplug - interrupts from unplugged device need to be returned
back to interrupt pool, otherwise we will quickly run out of interrupts.

This replaces a separate lslsi[] array with a byte in the ICSIRQState
struct and defines "LSI" and "MSI" flags. Neither of these flags set
signals that the descriptor is not allocated and not in use.

Signed-off-by: Alexey Kardashevskiy 
---
 hw/intc/xics.c| 21 ++---
 hw/intc/xics_kvm.c|  5 ++---
 include/hw/ppc/xics.h |  5 -
 3 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 64aabe7..1f89a00 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -438,7 +438,7 @@ static void ics_set_irq(void *opaque, int srcno, int val)
 {
 ICSState *ics = (ICSState *)opaque;
 
-if (ics->islsi[srcno]) {
+if (ics->irqs[srcno].flags & XICS_FLAGS_LSI) {
 set_irq_lsi(ics, srcno, val);
 } else {
 set_irq_msi(ics, srcno, val);
@@ -475,7 +475,7 @@ static void ics_write_xive(ICSState *ics, int nr, int 
server,
 
 trace_xics_ics_write_xive(nr, srcno, server, priority);
 
-if (ics->islsi[srcno]) {
+if (ics->irqs[srcno].flags & XICS_FLAGS_LSI) {
 write_xive_lsi(ics, srcno);
 } else {
 write_xive_msi(ics, srcno);
@@ -497,7 +497,7 @@ static void ics_resend(ICSState *ics)
 
 for (i = 0; i < ics->nr_irqs; i++) {
 /* FIXME: filter by server#? */
-if (ics->islsi[i]) {
+if (ics->irqs[i].flags & XICS_FLAGS_LSI) {
 resend_lsi(ics, i);
 } else {
 resend_msi(ics, i);
@@ -512,7 +512,7 @@ static void ics_eoi(ICSState *ics, int nr)
 
 trace_xics_ics_eoi(nr);
 
-if (ics->islsi[srcno]) {
+if (ics->irqs[srcno].flags & XICS_FLAGS_LSI) {
 irq->status &= ~XICS_STATUS_SENT;
 }
 }
@@ -609,7 +609,6 @@ static void ics_realize(DeviceState *dev, Error **errp)
 return;
 }
 ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
-ics->islsi = g_malloc0(ics->nr_irqs * sizeof(bool));
 ics->qirqs = qemu_allocate_irqs(ics_set_irq, ics, ics->nr_irqs);
 }
 
@@ -646,11 +645,19 @@ qemu_irq xics_get_qirq(XICSState *icp, int irq)
 return icp->ics->qirqs[irq - icp->ics->offset];
 }
 
+static void ics_set_irq_type(ICSState *ics, int srcno, bool lsi)
+{
+ics->irqs[srcno].flags |=
+lsi ? XICS_FLAGS_LSI : XICS_FLAGS_MSI;
+}
+
 void xics_set_irq_type(XICSState *icp, int irq, bool lsi)
 {
-assert(ics_valid_irq(icp->ics, irq));
+ICSState *ics = icp->ics;
 
-icp->ics->islsi[irq - icp->ics->offset] = lsi;
+assert(ics_valid_irq(ics, irq));
+
+ics_set_irq_type(ics, irq - ics->offset, lsi);
 }
 
 /*
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index 09476ae..456fc2c 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -224,7 +224,7 @@ static int ics_set_kvm_state(ICSState *ics, int version_id)
 state |= KVM_XICS_MASKED;
 }
 
-if (ics->islsi[i]) {
+if (ics->irqs[i].flags & XICS_FLAGS_LSI) {
 state |= KVM_XICS_LEVEL_SENSITIVE;
 if (irq->status & XICS_STATUS_ASSERTED) {
 state |= KVM_XICS_PENDING;
@@ -253,7 +253,7 @@ static void ics_kvm_set_irq(void *opaque, int srcno, int 
val)
 int rc;
 
 args.irq = srcno + ics->offset;
-if (!ics->islsi[srcno]) {
+if (ics->irqs[srcno].flags & XICS_FLAGS_MSI) {
 if (!val) {
 return;
 }
@@ -290,7 +290,6 @@ static void ics_kvm_realize(DeviceState *dev, Error **errp)
 return;
 }
 ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
-ics->islsi = g_malloc0(ics->nr_irqs * sizeof(bool));
 ics->qirqs = qemu_allocate_irqs(ics_kvm_set_irq, ics, ics->nr_irqs);
 }
 
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 0d7673d..aad48cf 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -136,7 +136,6 @@ struct ICSState {
 uint32_t nr_irqs;
 uint32_t offset;
 qemu_irq *qirqs;
-bool *islsi;
 ICSIRQState *irqs;
 XICSState *icp;
 };
@@ -150,6 +149,10 @@ struct ICSIRQState {
 #define XICS_STATUS_REJECTED   0x4
 #define XICS_STATUS_MASKED_PENDING 0x8
 uint8_t status;
+/* @flags == 0 measn the interrupt is not allocated */
+#define XICS_FLAGS_LSI 0x1
+#define XICS_FLAGS_MSI 0x2
+uint8_t flags;
 };
 
 qemu_irq xics_get_qirq(XICSState *icp, int irq);
-- 
1.8.4.rc4




[Qemu-devel] [PATCH 6/6] xics: implement xics_ics_free()

2014-05-06 Thread Alexey Kardashevskiy
Signed-off-by: Alexey Kardashevskiy 
---
 hw/intc/xics.c| 24 
 include/hw/ppc/xics.h |  1 +
 trace-events  |  2 ++
 3 files changed, 27 insertions(+)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index faf304c..2316519 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -755,6 +755,30 @@ int xics_alloc_block(XICSState *icp, int server, int num, 
bool lsi, bool align)
 return first + ics->offset;
 }
 
+static void ics_free(ICSState *ics, int srcno, int num)
+{
+int i;
+
+for (i = srcno; i < srcno + num; ++i) {
+if (ICS_IRQ_FREE(ics, i)) {
+trace_xics_ics_free_warn(ics - ics->icp->ics, i + ics->offset);
+}
+memset(&ics->irqs[i], 0, sizeof(ICSIRQState));
+}
+}
+
+void xics_free(XICSState *icp, int server, int irq, int num)
+{
+ICSState *ics = &icp->ics[server];
+
+assert(server == 0);
+
+trace_xics_ics_free(ics - icp->ics, irq, num);
+if (server >= 0) {
+ics_free(ics, irq - ics->offset, num);
+}
+}
+
 /*
  * Guest interfaces
  */
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 0f01a21..e5d8f47 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -158,6 +158,7 @@ struct ICSIRQState {
 qemu_irq xics_get_qirq(XICSState *icp, int irq);
 int xics_alloc(XICSState *icp, int server, int irq_hint, bool lsi);
 int xics_alloc_block(XICSState *icp, int server, int num, bool lsi, bool 
align);
+void xics_free(XICSState *icp, int server, int irq, int num);
 
 void xics_cpu_setup(XICSState *icp, PowerPCCPU *cpu);
 
diff --git a/trace-events b/trace-events
index 8cf8fb2..5ca126c 100644
--- a/trace-events
+++ b/trace-events
@@ -1179,6 +1179,8 @@ xics_alloc(int server, int irq) "server#%d, irq %d"
 xics_alloc_failed_hint(int server, int irq) "server#%d, irq %d is already in 
use"
 xics_alloc_failed_no_left(int server) "server#%d, no irq left"
 xics_alloc_block(int server, int first, int num, bool lsi, int align) 
"server#%d, first irq %d, %d irqs, lsi=%d, alignnum %d"
+xics_ics_free(int server, int irq, int num) "server#%d, first irq %d, %d irqs"
+xics_ics_free_warn(int server, int irq) "server#%d, irq %d is already free"
 
 # hw/ppc/spapr_iommu.c
 spapr_iommu_put(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t ret) 
"liobn=%"PRIx64" ioba=0x%"PRIx64" tce=0x%"PRIx64" ret=%"PRId64
-- 
1.8.4.rc4




[Qemu-devel] [PATCH 3/6] xics: disable flags reset on xics reset

2014-05-06 Thread Alexey Kardashevskiy
Since islsi[] array has been merged into the ICSState struct,
we must not reset flags as they tell if the interrupt is in use.

Signed-off-by: Alexey Kardashevskiy 
---
 hw/intc/xics.c | 4 +++-
 hw/intc/xics_kvm.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 9314654..7a64b2e 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -522,10 +522,12 @@ static void ics_reset(DeviceState *dev)
 ICSState *ics = ICS(dev);
 int i;
 
-memset(ics->irqs, 0, sizeof(ICSIRQState) * ics->nr_irqs);
 for (i = 0; i < ics->nr_irqs; i++) {
+ics->irqs[i].server = 0;
 ics->irqs[i].priority = 0xff;
 ics->irqs[i].saved_priority = 0xff;
+ics->irqs[i].status = 0;
+/* Do not reset @flags as IRQ might be allocated */
 }
 }
 
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index 456fc2c..a322593 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -272,10 +272,12 @@ static void ics_kvm_reset(DeviceState *dev)
 ICSState *ics = ICS(dev);
 int i;
 
-memset(ics->irqs, 0, sizeof(ICSIRQState) * ics->nr_irqs);
 for (i = 0; i < ics->nr_irqs; i++) {
+ics->irqs[i].server = 0;
 ics->irqs[i].priority = 0xff;
 ics->irqs[i].saved_priority = 0xff;
+ics->irqs[i].status = 0;
+/* Do not reset @flags as IRQ might be allocated */
 }
 
 ics_set_kvm_state(ics, 1);
-- 
1.8.4.rc4




[Qemu-devel] [PATCH 5/6] spapr: remove @next_irq

2014-05-06 Thread Alexey Kardashevskiy
This removes @next_irq from sPAPREnvironment which was used in old
IRQ allocator as XICS is now responsible for IRQs and keeps track of
allocated IRQs.

Signed-off-by: Alexey Kardashevskiy 
---
 hw/ppc/spapr.c | 3 +--
 include/hw/ppc/spapr.h | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index db21515..a680e90 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -754,7 +754,7 @@ static const VMStateDescription vmstate_spapr = {
 .minimum_version_id = 1,
 .minimum_version_id_old = 1,
 .fields  = (VMStateField []) {
-VMSTATE_UINT32(next_irq, sPAPREnvironment),
+VMSTATE_UNUSED(4), /* used to be @next_irq */
 
 /* RTC offset */
 VMSTATE_UINT64(rtc_offset, sPAPREnvironment),
@@ -1158,7 +1158,6 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
 /* Set up Interrupt Controller before we create the VCPUs */
 spapr->icp = xics_system_init(smp_cpus * kvmppc_smt_threads() / 
smp_threads,
   XICS_IRQS);
-spapr->next_irq = XICS_IRQ_BASE;
 
 /* init CPUs */
 if (cpu_model == NULL) {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index feb241a..f8d7326 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -27,7 +27,6 @@ typedef struct sPAPREnvironment {
 long rtas_size;
 void *fdt_skel;
 target_ulong entry_point;
-uint32_t next_irq;
 uint64_t rtc_offset;
 bool has_graphics;
 
-- 
1.8.4.rc4




Re: [Qemu-devel] [PATCH v1 13/22] target-arm: Register EL2 versions of ELR and SPSR

2014-05-06 Thread Peter Crosthwaite
On Tue, May 6, 2014 at 4:08 PM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Signed-off-by: Edgar E. Iglesias 
> ---
>  target-arm/helper.c | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/target-arm/helper.c b/target-arm/helper.c
> index ba1830d..8efc340 100644
> --- a/target-arm/helper.c
> +++ b/target-arm/helper.c
> @@ -2078,6 +2078,19 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
>  REGINFO_SENTINEL
>  };
>
> +static const ARMCPRegInfo v8_el2_cp_reginfo[] = {
> +{ .name = "ELR_EL2", .state = ARM_CP_STATE_AA64,
> +  .type = ARM_CP_NO_MIGRATE,
> +  .opc0 = 3, .opc1 = 4, .crn = 4, .crm = 0, .opc2 = 1,
> +  .access = PL2_RW,
> +  .fieldoffset = offsetof(CPUARMState, elr_el[ELR_EL_IDX(2)]) },
> +{ .name = "SPSR_EL2", .state = ARM_CP_STATE_AA64,
> +  .type = ARM_CP_NO_MIGRATE,
> +  .opc0 = 3, .opc1 = 4, .crn = 4, .crm = 0, .opc2 = 0,
> +  .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, banked_spsr[6]) 
> },
> +REGINFO_SENTINEL
> +};
> +
>  static void sctlr_write(CPUARMState *env, const ARMCPRegInfo *ri,
>  uint64_t value)
>  {
> @@ -2321,6 +2334,10 @@ void register_cp_regs_for_features(ARMCPU *cpu)
>  define_arm_cp_regs(cpu, v8_idregs);
>  define_arm_cp_regs(cpu, v8_cp_reginfo);
>  define_aarch64_debug_regs(cpu);
> +
> +if (arm_feature(env, ARM_FEATURE_EL2)) {
> +define_arm_cp_regs(cpu, v8_el2_cp_reginfo);
> +}

I think this should be outside the if ARM_FEATURE_V8 for consistency.
None of the other per-feature CP register defs are nested within the
ifferry for their ARM version. Detecting the invalid combination of
ARM_FEATURE_EL2 and pre V8 should probably be an assertion done in
arm_cpu_realizefn().

Otherwise:

Reviewed-by: Peter Crosthwaite 

>  }
>  if (arm_feature(env, ARM_FEATURE_MPU)) {
>  /* These are the MPU registers prior to PMSAv6. Any new
> --
> 1.8.3.2
>
>



Re: [Qemu-devel] virtio-serial-pci very expensive during live migration

2014-05-06 Thread Markus Armbruster
Copying Amit.

Chris Friesen  writes:

> Hi,
>
> I recently made the unfortunate discovery that virtio-serial-pci is
> quite expensive to stop/start during live migration.
>
> By default we support 32 ports, each of which uses 2 queues.  In my
> case it takes 2-3ms per queue to disconnect on the source host, and
> another 2-3ms per queue to connect on the target host, for a total
> cost of >300ms.
>
> In our case this roughly tripled the outage times of a libvirt-based
> live migration, from 150ms to almost 500ms.
>
> It seems like the main problem is that we loop over all the queues,
> calling virtio_pci_set_host_notifier_internal() on each of them.  That
> in turn calls memory_region_add_eventfd(), which calls
> memory_region_transaction_commit(), which scans over all the address
> spaces, which seems to take the vast majority of the time.
>
> Yes, setting the max_ports value to something smaller does help, but
> each port still adds 10-12ms to the overall live migration time, which
> is crazy.
>
> Is there anything that could be done to make this code more efficient?
> Could we tweak the API so that we add all the eventfds and then do a
> single commit at the end?  Do we really need to scan the entire
> address space?  I don't know the code well enough to answer that sort
> of question, but I'm hoping that one of you does.
>
> Thanks,
> Chris



Re: [Qemu-devel] [PATCH v2] block/raw-posix: Try both FIEMAP and SEEK_HOLE

2014-05-06 Thread Markus Armbruster
Eric Blake  writes:

> On 05/06/2014 03:18 PM, Eric Blake wrote:
>
>> ...if you are on a file system where SEEK_HOLE triggers the kernel
>> fallback of "entire file is allocated", but where FIEMAP is wired up for
>> that file system, would it make sense to have try_seek_hole return -1 in
>> situations where lseek(s->fd, 0, SEEK_HOLE) returns the end of the file?
>>  Even more, should skip_seek_hole be a tri-state?
>
> On the other hand, such systems are getting vanishingly rare as people
> upgrade to newer kernels.  Do we care about catering to them, or is it
> fair game to just tell people to upgrade if they want performance?

My tolerance for code complications to speed up things under old kernels
isn't zero, but close.  Leave these games to downstreams sporting such
kernels.



Re: [Qemu-devel] [PATCH v1 12/22] target-arm: Add a feature flag for EL3

2014-05-06 Thread Peter Crosthwaite
On Tue, May 6, 2014 at 4:08 PM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Signed-off-by: Edgar E. Iglesias 

Reviewed-by: Peter Crosthwaite 

> ---
>  target-arm/cpu.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/target-arm/cpu.h b/target-arm/cpu.h
> index d2e52d4..34e8f7c 100644
> --- a/target-arm/cpu.h
> +++ b/target-arm/cpu.h
> @@ -637,6 +637,7 @@ enum arm_features {
>  ARM_FEATURE_CRC, /* ARMv8 CRC instructions */
>  ARM_FEATURE_CBAR_RO, /* has cp15 CBAR and it is read-only */
>  ARM_FEATURE_EL2, /* has EL2 Virtualization support */
> +ARM_FEATURE_EL3, /* has EL3 Secure monitor support */
>  };
>
>  static inline int arm_feature(CPUARMState *env, int feature)
> --
> 1.8.3.2
>
>



Re: [Qemu-devel] [PATCH v1 11/22] target-arm: Add a feature flag for EL2

2014-05-06 Thread Peter Crosthwaite
On Tue, May 6, 2014 at 4:08 PM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Signed-off-by: Edgar E. Iglesias 

Reviewed-by: Peter Crosthwaite 

> ---
>  target-arm/cpu.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/target-arm/cpu.h b/target-arm/cpu.h
> index 6e6625b..d2e52d4 100644
> --- a/target-arm/cpu.h
> +++ b/target-arm/cpu.h
> @@ -636,6 +636,7 @@ enum arm_features {
>  ARM_FEATURE_CBAR, /* has cp15 CBAR */
>  ARM_FEATURE_CRC, /* ARMv8 CRC instructions */
>  ARM_FEATURE_CBAR_RO, /* has cp15 CBAR and it is read-only */
> +ARM_FEATURE_EL2, /* has EL2 Virtualization support */
>  };
>
>  static inline int arm_feature(CPUARMState *env, int feature)
> --
> 1.8.3.2
>
>



Re: [Qemu-devel] [PATCH v1 10/22] target-arm: A64: Introduce arm64_banked_spsr_index()

2014-05-06 Thread Peter Crosthwaite
On Tue, May 6, 2014 at 4:08 PM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Add arm64_banked_spsr_index(), used to map an Exception Level
> to an index in the baked_spsr array.
>
> Signed-off-by: Edgar E. Iglesias 
> ---
>  target-arm/helper-a64.c |  5 +++--
>  target-arm/internals.h  | 14 ++
>  target-arm/op_helper.c  |  3 ++-
>  3 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
> index 10bd1fc..415efbe 100644
> --- a/target-arm/helper-a64.c
> +++ b/target-arm/helper-a64.c
> @@ -444,6 +444,7 @@ void aarch64_cpu_do_interrupt(CPUState *cs)
>  ARMCPU *cpu = ARM_CPU(cs);
>  CPUARMState *env = &cpu->env;
>  target_ulong addr = env->cp15.vbar_el[VBAR_EL_IDX(1)];
> +unsigned int spsr_idx = arm64_banked_spsr_index(1);
>  int i;
>
>  if (arm_current_pl(env) == 0) {
> @@ -488,12 +489,12 @@ void aarch64_cpu_do_interrupt(CPUState *cs)
>  }
>
>  if (is_a64(env)) {
> -env->banked_spsr[0] = pstate_read(env);
> +env->banked_spsr[spsr_idx] = pstate_read(env);
>  env->sp_el[arm_current_pl(env)] = env->xregs[31];
>  env->xregs[31] = env->sp_el[1];
>  env->elr_el[ELR_EL_IDX(1)] = env->pc;
>  } else {
> -env->banked_spsr[0] = cpsr_read(env);
> +env->banked_spsr[spsr_idx] = cpsr_read(env);
>  if (!env->thumb) {
>  env->cp15.esr_el[ESR_EL_IDX(1)] |= 1 << 25;
>  }
> diff --git a/target-arm/internals.h b/target-arm/internals.h
> index d63a975..7c39946 100644
> --- a/target-arm/internals.h
> +++ b/target-arm/internals.h
> @@ -75,6 +75,20 @@ static inline void arm_log_exception(int idx)
>   */
>  #define GTIMER_SCALE 16
>
> +/*
> + * For aarch64, map a given EL to an index in the banked_spsr array.
> + */
> +static inline unsigned int arm64_banked_spsr_index(unsigned int el)
> +{
> +static const unsigned int map[3] = {
> +[0] = 0, /* EL1.  */
> +[1] = 6, /* EL2.  */
> +[2] = 7, /* EL3.  */

You could just change the [] indicies to [1], [2], [3]

> +};
> +assert(el >= 1 && el <= 3);
> +return map[el - 1];

And drop this subtraction.

Regards,
Peter

> +}
> +
>  int bank_number(int mode);
>  void switch_mode(CPUARMState *, int);
>  void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu);
> diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
> index 21545d0..dd9e4fc 100644
> --- a/target-arm/op_helper.c
> +++ b/target-arm/op_helper.c
> @@ -386,7 +386,8 @@ void HELPER(msr_i_pstate)(CPUARMState *env, uint32_t op, 
> uint32_t imm)
>
>  void HELPER(exception_return)(CPUARMState *env)
>  {
> -uint32_t spsr = env->banked_spsr[0];
> +unsigned int spsr_idx = arm64_banked_spsr_index(1);
> +uint32_t spsr = env->banked_spsr[spsr_idx];
>  int new_el, i;
>
>  if (env->pstate & PSTATE_SP) {
> --
> 1.8.3.2
>
>



Re: [Qemu-devel] [RFC PATCH v2 00/16] visitor+BER migration format

2014-05-06 Thread Markus Armbruster
"Michael S. Tsirkin"  writes:

> On Tue, May 06, 2014 at 07:58:07PM +0100, Dr. David Alan Gilbert wrote:
>> * Markus Armbruster (arm...@redhat.com) wrote:
>> > "Michael S. Tsirkin"  writes:
>> 
>> 
>> 
>> > > OK but for a new machine type, let's default to BER, right?
>> > > I see no reason to keep supporting when non-BER when -M specifies 2.1
>> > > compatibility, do you?
>> > 
>> > I fail to see the relation between machine type and migration's wire
>> > encoding.
>> 
>> New machine types are a useful but not definitive line in the sand.  If
>> you enable something/change the default on a new machine type you know
>> it won't break any existing users since there aren't any.
>> 
>> Dve

The purpose of machine types is to keep the guest ABI stable.  I don't
like tacking random crap unrelated to guest ABI to machine types.
They're hard enough to grasp for users as they are.

> Exactly. And on the other hand, someone enabling old machine type
> and doing live migration is likely to want to be compatible with old
> qemu wrt migration.

Machine types let you migrate to a newer QEMU (forward migration)
without messing up the guest ABI.  Migrating to an older QEMU (backward
migration) basically doesn't work, and as long as that's the case,
picking the older wire format by default is worthless.



Re: [Qemu-devel] [PATCH v1 05/22] target-arm: Add arm_el_to_mmu_idx()

2014-05-06 Thread Peter Crosthwaite
On Tue, May 6, 2014 at 4:08 PM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Maps a given EL to the corresponding MMU index.
>
> Signed-off-by: Edgar E. Iglesias 
> ---
>  target-arm/cpu.h   | 21 -
>  target-arm/translate-a64.c |  8 ++--
>  2 files changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/target-arm/cpu.h b/target-arm/cpu.h
> index ff86250..938f389 100644
> --- a/target-arm/cpu.h
> +++ b/target-arm/cpu.h
> @@ -1086,9 +1086,28 @@ static inline CPUARMState *cpu_init(const char 
> *cpu_model)
>  #define MMU_MODE0_SUFFIX _kernel
>  #define MMU_MODE1_SUFFIX _user
>  #define MMU_USER_IDX 1

Blank line here.

Otherwise:

Reviewed-by: Peter Crosthwaite 

> +static inline int arm_el_to_mmu_idx(int current_el)
> +{
> +#ifdef CONFIG_USER_ONLY
> +return MMU_USER_IDX;
> +#else
> +switch (current_el) {
> +case 0:
> +return MMU_USER_IDX;
> +case 1:
> +return 0;
> +default:
> +/* Unsupported EL.  */
> +assert(0);
> +return 0;
> +}
> +#endif
> +}
> +
>  static inline int cpu_mmu_index (CPUARMState *env)
>  {
> -return arm_current_pl(env) ? 0 : 1;
> +int cur_el = arm_current_pl(env);
> +return arm_el_to_mmu_idx(cur_el);
>  }
>
>  #include "exec/cpu-all.h"
> diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
> index 4f8246f..8523e76 100644
> --- a/target-arm/translate-a64.c
> +++ b/target-arm/translate-a64.c
> @@ -164,13 +164,9 @@ void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
>  }
>  }
>
> -static int get_mem_index(DisasContext *s)
> +static inline int get_mem_index(DisasContext *s)
>  {
> -#ifdef CONFIG_USER_ONLY
> -return 1;
> -#else
> -return s->user;
> -#endif
> +return arm_el_to_mmu_idx(s->current_pl);
>  }
>
>  void gen_a64_set_pc_im(uint64_t val)
> --
> 1.8.3.2
>
>



Re: [Qemu-devel] [PATCH v1 01/22] target-arm: A64: Add friendly logging of PSTATE A and I flags

2014-05-06 Thread Peter Crosthwaite
On Tue, May 6, 2014 at 4:08 PM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Signed-off-by: Edgar E. Iglesias 
> ---
>  target-arm/translate-a64.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
> index b62db4d..4f8246f 100644
> --- a/target-arm/translate-a64.c
> +++ b/target-arm/translate-a64.c
> @@ -137,8 +137,10 @@ void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
>  cpu_fprintf(f, " ");
>  }
>  }
> -cpu_fprintf(f, "PSTATE=%08x (flags %c%c%c%c)\n",
> +cpu_fprintf(f, "PSTATE=%08x (flags %c%c%c%c%c%c)\n",

Should delimit (just a space I think) between DAIF and NZCV
components. ARM ARM repeatedly refers to these two groups of four as
single item suggesting they are two logical groupings of 4 bits each.

>  psr,
> +psr & PSTATE_A ? 'A' : '-',
> +psr & PSTATE_I ? 'I' : '-',

And should the full DAIF be added for completeness?

Regards,
Peter

>  psr & PSTATE_N ? 'N' : '-',
>  psr & PSTATE_Z ? 'Z' : '-',
>  psr & PSTATE_C ? 'C' : '-',
> --
> 1.8.3.2
>
>



Re: [Qemu-devel] [PATCH v1 09/22] target-arm: Add SPSR entries for EL2/HYP and EL3/MON

2014-05-06 Thread Edgar E. Iglesias
On Tue, May 06, 2014 at 04:08:13PM +1000, Edgar E. Iglesias wrote:
> From: "Edgar E. Iglesias" 

Noticed I missed updating cpu_mode_names[]
Queued an update to translate.c for v2.

Cheers,
Edgar

> 
> Signed-off-by: Edgar E. Iglesias 
> ---
>  target-arm/cpu.h | 4 +++-
>  target-arm/helper.c  | 4 
>  target-arm/machine.c | 8 
>  3 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/target-arm/cpu.h b/target-arm/cpu.h
> index fd8ce70..6e6625b 100644
> --- a/target-arm/cpu.h
> +++ b/target-arm/cpu.h
> @@ -143,7 +143,7 @@ typedef struct CPUARMState {
>  uint32_t spsr;
>  
>  /* Banked registers.  */
> -uint64_t banked_spsr[6];
> +uint64_t banked_spsr[8];
>  uint32_t banked_r13[6];
>  uint32_t banked_r14[6];
>  
> @@ -566,7 +566,9 @@ enum arm_cpu_mode {
>ARM_CPU_MODE_FIQ = 0x11,
>ARM_CPU_MODE_IRQ = 0x12,
>ARM_CPU_MODE_SVC = 0x13,
> +  ARM_CPU_MODE_MON = 0x16,
>ARM_CPU_MODE_ABT = 0x17,
> +  ARM_CPU_MODE_HYP = 0x1a,
>ARM_CPU_MODE_UND = 0x1b,
>ARM_CPU_MODE_SYS = 0x1f
>  };
> diff --git a/target-arm/helper.c b/target-arm/helper.c
> index baeaa28..ba1830d 100644
> --- a/target-arm/helper.c
> +++ b/target-arm/helper.c
> @@ -3078,6 +3078,10 @@ int bank_number(int mode)
>  return 4;
>  case ARM_CPU_MODE_FIQ:
>  return 5;
> +case ARM_CPU_MODE_HYP:
> +return 6;
> +case ARM_CPU_MODE_MON:
> +return 7;
>  }
>  hw_error("bank number requested for bad CPSR mode value 0x%x\n", mode);
>  }
> diff --git a/target-arm/machine.c b/target-arm/machine.c
> index 92ac621..e95be47 100644
> --- a/target-arm/machine.c
> +++ b/target-arm/machine.c
> @@ -222,9 +222,9 @@ static int cpu_post_load(void *opaque, int version_id)
>  
>  const VMStateDescription vmstate_arm_cpu = {
>  .name = "cpu",
> -.version_id = 19,
> -.minimum_version_id = 19,
> -.minimum_version_id_old = 19,
> +.version_id = 20,
> +.minimum_version_id = 20,
> +.minimum_version_id_old = 20,
>  .pre_save = cpu_pre_save,
>  .post_load = cpu_post_load,
>  .fields = (VMStateField[]) {
> @@ -238,7 +238,7 @@ const VMStateDescription vmstate_arm_cpu = {
>  .offset = 0,
>  },
>  VMSTATE_UINT32(env.spsr, ARMCPU),
> -VMSTATE_UINT64_ARRAY(env.banked_spsr, ARMCPU, 6),
> +VMSTATE_UINT64_ARRAY(env.banked_spsr, ARMCPU, 8),
>  VMSTATE_UINT32_ARRAY(env.banked_r13, ARMCPU, 6),
>  VMSTATE_UINT32_ARRAY(env.banked_r14, ARMCPU, 6),
>  VMSTATE_UINT32_ARRAY(env.usr_regs, ARMCPU, 5),
> -- 
> 1.8.3.2
> 



Re: [Qemu-devel] [PATCH v1 02/22] target-arm: Make elr_el1 an array

2014-05-06 Thread Peter Crosthwaite
On Tue, May 6, 2014 at 4:08 PM, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> No functional change.
> Prepares for future additions of the EL2 and 3 versions of this reg.
>
> Signed-off-by: Edgar E. Iglesias 
> ---
>  target-arm/cpu.h| 3 ++-
>  target-arm/helper-a64.c | 4 ++--
>  target-arm/helper.c | 3 ++-
>  target-arm/kvm64.c  | 4 ++--
>  target-arm/machine.c| 2 +-
>  target-arm/op_helper.c  | 6 +++---
>  6 files changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/target-arm/cpu.h b/target-arm/cpu.h
> index c83f249..eb7a0f5 100644
> --- a/target-arm/cpu.h
> +++ b/target-arm/cpu.h
> @@ -162,7 +162,8 @@ typedef struct CPUARMState {
>  uint32_t condexec_bits; /* IT bits.  cpsr[15:10,26:25].  */
>  uint64_t daif; /* exception masks, in the bits they are in in PSTATE */
>
> -uint64_t elr_el1; /* AArch64 ELR_EL1 */
> +#define ELR_EL_IDX(x) (x - 1)
> +uint64_t elr_el[1]; /* AArch64 exception link regs  */

Is it perhaps just easier to waste the space and always pad these
EL-banked CP arrays out to length 4 you can just use literal numbers
in the code? Probably make life easier when introspecting the CPU
state in GDB too.

Regards,
Peter

>  uint64_t sp_el[2]; /* AArch64 banked stack pointers */
>
>  /* System control coprocessor (cp15) */
> diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
> index bf921cc..5adf2b5 100644
> --- a/target-arm/helper-a64.c
> +++ b/target-arm/helper-a64.c
> @@ -491,13 +491,13 @@ void aarch64_cpu_do_interrupt(CPUState *cs)
>  env->banked_spsr[0] = pstate_read(env);
>  env->sp_el[arm_current_pl(env)] = env->xregs[31];
>  env->xregs[31] = env->sp_el[1];
> -env->elr_el1 = env->pc;
> +env->elr_el[ELR_EL_IDX(1)] = env->pc;
>  } else {
>  env->banked_spsr[0] = cpsr_read(env);
>  if (!env->thumb) {
>  env->cp15.esr_el1 |= 1 << 25;
>  }
> -env->elr_el1 = env->regs[15];
> +env->elr_el[ELR_EL_IDX(1)] = env->regs[15];
>
>  for (i = 0; i < 15; i++) {
>  env->xregs[i] = env->regs[i];
> diff --git a/target-arm/helper.c b/target-arm/helper.c
> index 3be917c..3457d3e 100644
> --- a/target-arm/helper.c
> +++ b/target-arm/helper.c
> @@ -2055,7 +2055,8 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
>  { .name = "ELR_EL1", .state = ARM_CP_STATE_AA64,
>.type = ARM_CP_NO_MIGRATE,
>.opc0 = 3, .opc1 = 0, .crn = 4, .crm = 0, .opc2 = 1,
> -  .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, elr_el1) },
> +  .access = PL1_RW,
> +  .fieldoffset = offsetof(CPUARMState, elr_el[ELR_EL_IDX(1)]) },
>  { .name = "SPSR_EL1", .state = ARM_CP_STATE_AA64,
>.type = ARM_CP_NO_MIGRATE,
>.opc0 = 3, .opc1 = 0, .crn = 4, .crm = 0, .opc2 = 0,
> diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
> index e115879..da376cf 100644
> --- a/target-arm/kvm64.c
> +++ b/target-arm/kvm64.c
> @@ -161,7 +161,7 @@ int kvm_arch_put_registers(CPUState *cs, int level)
>  }
>
>  reg.id = AARCH64_CORE_REG(elr_el1);
> -reg.addr = (uintptr_t) &env->elr_el1;
> +reg.addr = (uintptr_t) &env->elr_el[ELR_EL_IDX(1)];
>  ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®);
>  if (ret) {
>  return ret;
> @@ -241,7 +241,7 @@ int kvm_arch_get_registers(CPUState *cs)
>  }
>
>  reg.id = AARCH64_CORE_REG(elr_el1);
> -reg.addr = (uintptr_t) &env->elr_el1;
> +reg.addr = (uintptr_t) &env->elr_el[ELR_EL_IDX(1)];
>  ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, ®);
>  if (ret) {
>  return ret;
> diff --git a/target-arm/machine.c b/target-arm/machine.c
> index b967223..8b299a0 100644
> --- a/target-arm/machine.c
> +++ b/target-arm/machine.c
> @@ -243,7 +243,7 @@ const VMStateDescription vmstate_arm_cpu = {
>  VMSTATE_UINT32_ARRAY(env.banked_r14, ARMCPU, 6),
>  VMSTATE_UINT32_ARRAY(env.usr_regs, ARMCPU, 5),
>  VMSTATE_UINT32_ARRAY(env.fiq_regs, ARMCPU, 5),
> -VMSTATE_UINT64(env.elr_el1, ARMCPU),
> +VMSTATE_UINT64(env.elr_el[ELR_EL_IDX(1)], ARMCPU),
>  VMSTATE_UINT64_ARRAY(env.sp_el, ARMCPU, 2),
>  /* The length-check must come before the arrays to avoid
>   * incoming data possibly overflowing the array.
> diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
> index fb90676..21545d0 100644
> --- a/target-arm/op_helper.c
> +++ b/target-arm/op_helper.c
> @@ -406,7 +406,7 @@ void HELPER(exception_return)(CPUARMState *env)
>  env->regs[i] = env->xregs[i];
>  }
>
> -env->regs[15] = env->elr_el1 & ~0x1;
> +env->regs[15] = env->elr_el[ELR_EL_IDX(1)] & ~0x1;
>  } else {
>  new_el = extract32(spsr, 2, 2);
>  if (new_el > 1) {
> @@ -424,7 +424,7 @@ void HELPER(exception_return)(CPUARMState *env)
>  env->aarch64 = 1;
>  pstate_write(env, spsr);
>  env->xregs[31] = env->sp_el[new_el];
> -env->pc = env->elr_el1;
> + 

Re: [Qemu-devel] [PATCH v1 00/22] target-arm: Preparations for A64 EL2 and 3

2014-05-06 Thread Edgar E. Iglesias
On Tue, May 06, 2014 at 08:58:43AM +0100, Peter Maydell wrote:
> On 6 May 2014 07:08, Edgar E. Iglesias  wrote:
> > From: "Edgar E. Iglesias" 
> >
> > Hi,
> >
> > I've been doing some work on modeling parts of EL2 and 3 + some of
> > the system-wide virtualization features for ARMv8. A lot is missing
> > but I've got a series with enough to for example run KVM A64 guests
> > on top of EL3 firmware inside emulated QEMU A64 VMs.
> > I'm working on cleaning things up and plan to send patches and publish
> > things as I go.
> 
> So before I start reviewing this, how does it relate to the
> Samsung series for AArch32 trustzone (EL3) support that was
> posted last year? In Linaro we've been planning to rework that
> and integrate it upstream...
>

Hi Peter,

AFAICT the series have some minor overlap but mostly they complement each other.
The aarch64 EL3 support I've got so far is very limited. Has mode switching,
separate page tables, SMC etc and that kind of things but no S/NS state yet.
The A64 security state parts can be implemented on top of the Samsung series.

Cheers,
Edgar



Re: [Qemu-devel] [PATCH] qapi: fix null pointer dereference on invalid parameter

2014-05-06 Thread Eric Blake
On 05/06/2014 06:24 PM, Peter Lieven wrote:
> qemu segfaults if it receives an invalid parameter via a
> qmp command instead of throwing an error.
> 
> For example:
> { "execute": "blockdev-add",
> "arguments": { "options" : { "driver": "invalid-driver" } } }
> 
> CC: qemu-sta...@nongnu.org
> Signed-off-by: Peter Lieven 
> ---
>  qapi/qapi-dealloc-visitor.c |4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Does this overlap with any of Markus' work? It emphasizes how bad it is
that our visitor interface callbacks are undocumented on what they can
be passed and are expected to return.

https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg00225.html

> 
> diff --git a/qapi/qapi-dealloc-visitor.c b/qapi/qapi-dealloc-visitor.c
> index d0ea118..dc53545 100644
> --- a/qapi/qapi-dealloc-visitor.c
> +++ b/qapi/qapi-dealloc-visitor.c
> @@ -131,7 +131,9 @@ static void qapi_dealloc_end_list(Visitor *v, Error 
> **errp)
>  static void qapi_dealloc_type_str(Visitor *v, char **obj, const char *name,
>Error **errp)
>  {
> -g_free(*obj);
> +if (obj) {
> +g_free(*obj);
> +}
>  }

As for solving a crash,
Reviewed-by: Eric Blake 

But if Markus' cleanups solve the problem by guaranteeing that the
cleanup visitor is never passed a NULL obj, then this would be a dead
check.  I'm not familiar enough with the rest of the visitor code to
know whether the caller is at fault, or whether other callers or visitor
callbacks have the same bug.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCHv3] block: optimize zero writes with bdrv_write_zeroes

2014-05-06 Thread Eric Blake
On 05/06/2014 06:23 PM, Peter Lieven wrote:
> this patch tries to optimize zero write requests
> by automatically using bdrv_write_zeroes if it is
> supported by the format.
> 
> This significantly speeds up file system initialization and
> should speed zero write test used to test backend storage
> performance.
> 

> 
> Signed-off-by: Peter Lieven 
> ---
> v2->v3: - moved parameter parsing to blockdev_init
> - added per device detect_zeroes status to 
>   hmp (info block -v) and qmp (query-block) [Eric]
> - added support to enable detect-zeroes also
>   for hot added devices [Eric]
> - added missing entry to qemu_common_drive_opts
> - fixed description of qemu_iovec_is_zero [Fam]
> 

>  
> +static BdrvDetectZeroes parse_detect_zeroes(const char *buf, Error **errp)
> +{
> +if (!buf || !strcmp(buf, "off")) {
> +return BDRV_DETECT_ZEROES_OFF;
> +} else if (!strcmp(buf, "on")) {
> +return BDRV_DETECT_ZEROES_ON;
> +} else if (!strcmp(buf, "unmap")) {
> +return BDRV_DETECT_ZEROES_UNMAP;
> +} else {
> +error_setg(errp, "invalid value for detect-zeroes: %s",
> +   buf);
> +}
> +return BDRV_DETECT_ZEROES_OFF;
> +}

Isn't there QAPI generated code that you can use instead of open-coding
this conversion between string and enum values?

> +++ b/qapi-schema.json
> @@ -937,6 +937,8 @@
>  # @encryption_key_missing: true if the backing device is encrypted but an
>  #  valid encryption key is missing
>  #
> +# @detect_zeroes: detect and optimize zero writes (Since 2.1)
> +#
>  # @bps: total throughput limit in bytes per second is specified
>  #
>  # @bps_rd: read throughput limit in bytes per second is specified
> @@ -972,6 +974,7 @@
>'data': { 'file': 'str', '*node-name': 'str', 'ro': 'bool', 'drv': 'str',
>  '*backing_file': 'str', 'backing_file_depth': 'int',
>  'encrypted': 'bool', 'encryption_key_missing': 'bool',
> +'detect_zeroes': 'str',

For new additions, we try to prefer dash over underscore.  Eww - we've
already been inconsistent in this struct, with backing_file vs. node-name.

Maybe s/detect_zeroes/detect-zeroes/.  I obviously can't argue too
strongly in light of the rest of the struct in isolation.  However, you
DID use detect-zeroes on the input side later in the patch, so being
consistent between the two sites would be nice (given that node-name was
named consistently).

On the other hand, I _can_ argue strongly that open-coding this as 'str'
is wrong.  You already added an enum type, so use it:

'detect_zeroes': 'BlockdevDetectZeroesOptions',

(hmm, in this context, it's not really an option, so maybe there is some
other name we can bikeshed about; but beyond 'BlockdevDetectZeroes', I
don't have any other good ideas)

zero is one of those odd words with two different pluralized spellings
both in common use.  Code base currently has a 1:2 ratio between 'zeros'
vs. 'zeroes', so I guess you're okay; on the other hand, 'man qemu-img'
documents that 'convert -S' detects "zeros".

>  'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
>  'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
>  'image': 'ImageInfo',
> @@ -4250,6 +4253,20 @@
>'data': [ 'ignore', 'unmap' ] }
>  
>  ##
> +# @BlockdevDetectZeroesOptions
> +#
> +# Selects the operation mode for zero write detection.

s/Selects/Describes/ since you are also going to use this enum on the
output field.

> +#
> +# @off:  Disabled
> +# @on:   Enabled

Maybe more details?  Seeing this doesn't tell me the tradeoffs involved
in tweaking the parameter (one is faster, while the other uses less
storage resources - so maybe mention those benefits).  I see the
documentation later on for the command line, so maybe repeating some of
that here would help someone going by just the QMP interface.

> +# @unmap:Enabled and even try to unmap blocks if possible
> +#
> +# Since: 2.1
> +##
> +{ 'enum': 'BlockdevDetectZeroesOptions',
> +  'data': [ 'off', 'on', 'unmap' ] }
> +
> +##
>  # @BlockdevAioOptions
>  #
>  # Selects the AIO backend to handle I/O requests

> @@ -4327,7 +4346,8 @@
>  '*aio': 'BlockdevAioOptions',
>  '*rerror': 'BlockdevOnError',
>  '*werror': 'BlockdevOnError',
> -'*read-only': 'bool' } }
> +'*read-only': 'bool',
> +'*detect-zeroes': 'BlockdevDetectZeroesOptions' } }

This part is fine.

>  @var{copy-on-read} is "on" or "off" and enables whether to copy read backing
>  file sectors into the image file.
> +@item detect-zeroes=@var{detect-zeroes}
> +@var{detect-zeroes} is "off", "on" or "unmap" and enables the automatic
> +conversion of plain zero writes by the OS to driver specific optimized
> +zero write commands. If "unmap" is chosen and @var{discard} is "on"
> +a zero write may even be converted to an UNMAP operation.

This looks good.

-- 
Eric Blake  

Re: [Qemu-devel] [PATCH v7 for 2.0 4/4] qcow2: Add full image preallocation option

2014-05-06 Thread Hu Tao
Hi,

On Thu, Mar 20, 2014 at 02:38:32PM +0100, Kevin Wolf wrote:
> Am 17.03.2014 um 07:53 hat Hu Tao geschrieben:
> > This adds a preallocation=full mode to qcow2 image creation, which
> > creates a non-sparse image file.
> > 
> > Signed-off-by: Hu Tao 
> 
> I see that you changed the implementation from bdrv_preallocate() to
> bdrv_create() options in v4 after Stefan had asked whether
> bdrv_preallocate() was really necessary or just complicating things.
> 
> In hindsight, do you really think the new code is simpler? I certainly
> think this one is rather adventurous and the formulas to precalculate
> the image file size are easy to get wrong.

When I tried to implement the bdrv_preallocate() way, I found that it
was inevitable to precalculate the size. For example, in
qcow2::bdrv_preallcate() we have to call low level
raw::bdrv_preallocate() first, but we have to pass in the actual image
file size which we have to precalculate before qcow2::bdrv_preallocate()
can create any metadata.

Another problem I don't understand about bdrv_preallocate() is that how
it ensures existing data not being corrupted for every call to
bdrv_preallocate().

Regards,
Hu Tao



[Qemu-devel] [PATCH v3] qapi: Document optional arguments' backwards compatibility

2014-05-06 Thread Fam Zheng
From: Eric Blake 

Signed-off-by: Eric Blake 
Signed-off-by: Fam Zheng 

---
v3: More text from Eric.

Signed-off-by: Fam Zheng 
---
 docs/qapi-code-gen.txt | 32 
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/docs/qapi-code-gen.txt b/docs/qapi-code-gen.txt
index d78921f..a6cba0a 100644
--- a/docs/qapi-code-gen.txt
+++ b/docs/qapi-code-gen.txt
@@ -49,10 +49,34 @@ example of a complex type is:
  { 'type': 'MyType',
'data': { 'member1': 'str', 'member2': 'int', '*member3': 'str' } }
 
-The use of '*' as a prefix to the name means the member is optional.  Optional
-members should always be added to the end of the dictionary to preserve
-backwards compatibility.
-
+The use of '*' as a prefix to the name means the member is optional.
+
+The default initialization value of an optional argument should not be changed
+between versions of QEMU unless the new default maintains backward
+compatibility to the user-visible behavior of the old default.
+
+With proper documentation, this policy still allows some flexibility; for
+example, documenting that a default of 0 picks an optimal buffer size allows
+one release to declare the optimal size at 512 while another release declares
+the optimal size at 4096 - the user-visible behavior is not the bytes used by
+the buffer, but the fact that the buffer was optimal size.
+
+On input structures (only mentioned in the 'data' side of a command), changing
+from mandatory to optional is safe (older clients will supply the option, and
+newer clients can benefit from the default); changing from optional to
+mandatory is backwards incompatible (older clients may be omitting the option,
+and must continue to work).
+
+On output structures (only mentioned in the 'returns' side of a command),
+changing from mandatory to optional is in general unsafe (older clients may be
+expecting the field, and could crash if it is missing), although it can be done
+if the only way that the optional argument will be omitted is when it is
+triggered by the presence of a new input flag to the command that older clients
+don't know to send.  Changing from optional to mandatory is safe.
+
+A structure that is used in both input and output of various commands
+must consider the backwards compatibility constraints of both directions
+of use.
 
 A complex type definition can specify another complex type as its base.
 In this case, the fields of the base type are included as top-level fields
-- 
1.9.2




Re: [Qemu-devel] [PATCH v2] qapi: Document optional arguments' backwards compatibility

2014-05-06 Thread Fam Zheng
On Tue, 05/06 19:30, Eric Blake wrote:
> On 05/05/2014 08:05 PM, Fam Zheng wrote:
> > Signed-off-by: Fam Zheng 
> > 
> > ---
> > v2: Employ the text suggested by Eric. (Thanks!)
> 
> Since much of it is my wording, it's probably better to credit me as an
> author, by adding:
> 
> Signed-off-by: Eric Blake 

Gladly!

> 
> > 
> > Signed-off-by: Fam Zheng 
> > ---
> >  docs/qapi-code-gen.txt | 26 ++
> >  1 file changed, 22 insertions(+), 4 deletions(-)
> > 
> > +The use of '*' as a prefix to the name means the member is optional.
> > +
> > +The default initialization value of an optional argument should not be 
> > changed
> > +between versions of QEMU unless the new default maintains backward
> > +compatibility to the user-visible behavior of the old default.
> 
> Maybe worth adding:
> 
> With proper documentation, this policy still allows some flexibility;
> for example, documenting that a default of 0 picks an optimal buffer
> size allows one release to declare the optimal size at 512 while another
> release declares the optimal size at 4096 - the user-visible behavior is
> not the bytes used by the buffer, but the fact that the buffer was
> optimal size.

OK, will add this and your s-o-b in V3.

Thanks,
Fam

> 
> > +
> > +On input structures (only mentioned in the 'data' side of a command), 
> > changing
> > +from mandatory to optional is safe (older clients will supply the option, 
> > and
> > +newer clients can benefit from the default); changing from optional to
> > +mandatory is backwards incompatible (older clients may be omitting the 
> > option,
> > +and must continue to work).
> > +
> > +On output structures (only mentioned in the 'returns' side of a command),
> > +changing from mandatory to optional is in general unsafe (older clients 
> > may be
> > +expecting the field, and could crash if it is missing), although it can be 
> > done
> > +if the only way that the optional argument will be omitted is when it is
> > +triggered by the presence of a new input flag to the command that older 
> > clients
> > +don't know to send.  Changing from optional to mandatory is safe.
> > +
> > +A structure that is used in both input and output of various commands
> > +must consider the backwards compatibility constraints of both directions
> > +of use.
> 
> 
> -- 
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 





Re: [Qemu-devel] [PATCH v2] vmdk: Optimize cluster allocation

2014-05-06 Thread Fam Zheng
On Tue, 05/06 10:32, Fam Zheng wrote:
> On mounted NFS filesystem, ftruncate is much much slower than doing a
> zero write. Changing this significantly speeds up cluster allocation.
> 
> Comparing by converting a cirros image (296M) to VMDK on an NFS mount
> point, over 1Gbe LAN:
> 
> $ time qemu-img convert cirros-0.3.1.img /mnt/a.raw -O vmdk
> 
> Before:
> real0m26.464s
> user0m0.133s
> sys 0m0.527s
> 
> After:
> real0m2.120s
> user0m0.080s
> sys 0m0.197s
> 
> Signed-off-by: Fam Zheng 
> 
> ---
> V2: Fix cluster_offset check. (Kevin)
> 
> Signed-off-by: Fam Zheng 
> ---
>  block/vmdk.c | 19 ++-
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 06a1f9f..98d2d56 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1037,6 +1037,7 @@ static int get_cluster_offset(BlockDriverState *bs,
>  int min_index, i, j;
>  uint32_t min_count, *l2_table;
>  bool zeroed = false;
> +int64_t ret;
>  
>  if (m_data) {
>  m_data->valid = 0;
> @@ -1110,12 +,20 @@ static int get_cluster_offset(BlockDriverState *bs,
>  }
>  
>  /* Avoid the L2 tables update for the images that have snapshots. */
> -*cluster_offset = bdrv_getlength(extent->file);
> +ret = bdrv_getlength(extent->file);
> +if (ret < 0 ||
> +ret & ((extent->cluster_sectors << BDRV_SECTOR_BITS) - 1)) {
> +return VMDK_ERROR;
> +}
> +*cluster_offset = ret;
>  if (!extent->compressed) {
> -bdrv_truncate(
> -extent->file,
> -*cluster_offset + (extent->cluster_sectors << 9)
> -);
> +ret = bdrv_write_zeroes(extent->file,
> +*cluster_offset >> BDRV_SECTOR_BITS,
> +extent->cluster_sectors,
> +0);

Hi Stefan,

By considering a bdrv_write_zeroes as a pre-write, it in general doubles the
write for the whole image, so it's not a good solution.

A better way would be removing the bdrv_truncate and require the caller to do
full cluster write (with a bounce buffer if necessary).

So let's drop this patch.

Thanks,
Fam



Re: [Qemu-devel] [PATCH v2] qapi: Document optional arguments' backwards compatibility

2014-05-06 Thread Eric Blake
On 05/05/2014 08:05 PM, Fam Zheng wrote:
> Signed-off-by: Fam Zheng 
> 
> ---
> v2: Employ the text suggested by Eric. (Thanks!)

Since much of it is my wording, it's probably better to credit me as an
author, by adding:

Signed-off-by: Eric Blake 

> 
> Signed-off-by: Fam Zheng 
> ---
>  docs/qapi-code-gen.txt | 26 ++
>  1 file changed, 22 insertions(+), 4 deletions(-)
> 
> +The use of '*' as a prefix to the name means the member is optional.
> +
> +The default initialization value of an optional argument should not be 
> changed
> +between versions of QEMU unless the new default maintains backward
> +compatibility to the user-visible behavior of the old default.

Maybe worth adding:

With proper documentation, this policy still allows some flexibility;
for example, documenting that a default of 0 picks an optimal buffer
size allows one release to declare the optimal size at 512 while another
release declares the optimal size at 4096 - the user-visible behavior is
not the bytes used by the buffer, but the fact that the buffer was
optimal size.

> +
> +On input structures (only mentioned in the 'data' side of a command), 
> changing
> +from mandatory to optional is safe (older clients will supply the option, and
> +newer clients can benefit from the default); changing from optional to
> +mandatory is backwards incompatible (older clients may be omitting the 
> option,
> +and must continue to work).
> +
> +On output structures (only mentioned in the 'returns' side of a command),
> +changing from mandatory to optional is in general unsafe (older clients may 
> be
> +expecting the field, and could crash if it is missing), although it can be 
> done
> +if the only way that the optional argument will be omitted is when it is
> +triggered by the presence of a new input flag to the command that older 
> clients
> +don't know to send.  Changing from optional to mandatory is safe.
> +
> +A structure that is used in both input and output of various commands
> +must consider the backwards compatibility constraints of both directions
> +of use.


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH] qapi: fix null pointer dereference on invalid parameter

2014-05-06 Thread Peter Lieven
qemu segfaults if it receives an invalid parameter via a
qmp command instead of throwing an error.

For example:
{ "execute": "blockdev-add",
"arguments": { "options" : { "driver": "invalid-driver" } } }

CC: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven 
---
 qapi/qapi-dealloc-visitor.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/qapi/qapi-dealloc-visitor.c b/qapi/qapi-dealloc-visitor.c
index d0ea118..dc53545 100644
--- a/qapi/qapi-dealloc-visitor.c
+++ b/qapi/qapi-dealloc-visitor.c
@@ -131,7 +131,9 @@ static void qapi_dealloc_end_list(Visitor *v, Error **errp)
 static void qapi_dealloc_type_str(Visitor *v, char **obj, const char *name,
   Error **errp)
 {
-g_free(*obj);
+if (obj) {
+g_free(*obj);
+}
 }
 
 static void qapi_dealloc_type_int(Visitor *v, int64_t *obj, const char *name,
-- 
1.7.9.5




[Qemu-devel] [PATCHv3] block: optimize zero writes with bdrv_write_zeroes

2014-05-06 Thread Peter Lieven
this patch tries to optimize zero write requests
by automatically using bdrv_write_zeroes if it is
supported by the format.

This significantly speeds up file system initialization and
should speed zero write test used to test backend storage
performance.

I ran the following 2 tests on my internal SSD with a
50G QCOW2 container and on an attached iSCSI storage.

a) mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX

QCOW2 [off] [on] [unmap]
-
runtime:   14secs1.1secs  1.1secs
filesize:  937M  18M  18M

iSCSI [off] [on] [unmap]

runtime:   9.3s  0.9s 0.9s

b) dd if=/dev/zero of=/dev/vdX bs=1M oflag=direct

QCOW2 [off] [on] [unmap]
-
runtime:   246secs   18secs   18secs
filesize:  51G   192K 192K
throughput:203M/s2.3G/s   2.3G/s

iSCSI*[off] [on] [unmap]

runtime:   8mins 45secs   33secs
throughput:106M/s1.2G/s   1.6G/s
allocated: 100%  100% 0%

* The storage was connected via an 1Gbit interface.
  It seems to internally handle writing zeroes
  via WRITESAME16 very fast.

Signed-off-by: Peter Lieven 
---
v2->v3: - moved parameter parsing to blockdev_init
- added per device detect_zeroes status to 
  hmp (info block -v) and qmp (query-block) [Eric]
- added support to enable detect-zeroes also
  for hot added devices [Eric]
- added missing entry to qemu_common_drive_opts
- fixed description of qemu_iovec_is_zero [Fam]

v1->v2: - added tests to commit message (Markus)
RFCv2->v1: - fixed paramter parsing strncmp -> strcmp (Eric)
   - fixed typo (choosen->chosen) (Eric)
   - added second example to commit msg

RFCv1->RFCv2: - add detect-zeroes=off|on|unmap knob to drive cmdline parameter
  - call zero detection only for format (bs->file != NULL)

 block.c   |   11 ++
 block/qapi.c  |   11 ++
 blockdev.c|   28 +
 hmp.c |6 ++
 include/block/block_int.h |   12 +++
 include/qemu-common.h |1 +
 qapi-schema.json  |   50 +++--
 qemu-options.hx   |6 ++
 qmp-commands.hx   |3 +++
 util/iov.c|   21 +++
 10 files changed, 134 insertions(+), 15 deletions(-)

diff --git a/block.c b/block.c
index b749d31..f27b35d 100644
--- a/block.c
+++ b/block.c
@@ -3244,6 +3244,17 @@ static int coroutine_fn 
bdrv_aligned_pwritev(BlockDriverState *bs,
 
 ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);
 
+if (!ret && bs->detect_zeroes != BDRV_DETECT_ZEROES_OFF &&
+!(flags & BDRV_REQ_ZERO_WRITE) && bs->file &&
+drv->bdrv_co_write_zeroes && qemu_iovec_is_zero(qiov)) {
+flags |= BDRV_REQ_ZERO_WRITE;
+/* if the device was not opened with discard=on the below flag
+ * is immediately cleared again in bdrv_co_do_write_zeroes */
+if (bs->detect_zeroes == BDRV_DETECT_ZEROES_UNMAP) {
+flags |= BDRV_REQ_MAY_UNMAP;
+}
+}
+
 if (ret < 0) {
 /* Do nothing, write notifier decided to fail this request */
 } else if (flags & BDRV_REQ_ZERO_WRITE) {
diff --git a/block/qapi.c b/block/qapi.c
index af11445..fbf66c2 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -51,6 +51,17 @@ BlockDeviceInfo *bdrv_block_device_info(BlockDriverState *bs)
 
 info->backing_file_depth = bdrv_get_backing_file_depth(bs);
 
+switch (bs->detect_zeroes) {
+case BDRV_DETECT_ZEROES_ON:
+info->detect_zeroes = g_strdup("on");
+break;
+case BDRV_DETECT_ZEROES_UNMAP:
+info->detect_zeroes = g_strdup("unmap");
+break;
+default:
+info->detect_zeroes = g_strdup("off");
+}
+
 if (bs->io_limits_enabled) {
 ThrottleConfig cfg;
 throttle_get_config(&bs->throttle_state, &cfg);
diff --git a/blockdev.c b/blockdev.c
index 7810e9f..96c11fd 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -288,6 +288,21 @@ static int parse_block_error_action(const char *buf, bool 
is_read, Error **errp)
 }
 }
 
+static BdrvDetectZeroes parse_detect_zeroes(const char *buf, Error **errp)
+{
+if (!buf || !strcmp(buf, "off")) {
+return BDRV_DETECT_ZEROES_OFF;
+} else if (!strcmp(buf, "on")) {
+return BDRV_DETECT_ZEROES_ON;
+} else if (!strcmp(buf, "unmap")) {
+return BDRV_DETECT_ZEROES_UNMAP;
+} else {
+error_setg(errp, "invalid value for detect-zeroes: %s",
+   buf);
+}
+return BDRV_DETECT_ZEROES_OFF;
+}
+
 static bool check_throttle_config(ThrottleConfig *cfg, Error **errp)
 {
 if (throttle_conflicting(cfg)) {
@@ -324,6 +339,7 @@ static DriveInfo *blockdev_init(const char *file, QDict 
*bs_opts,
 QemuOpts *opts;
 const char *id;
 bool has_driver_specific_opts;
+

Re: [Qemu-devel] [PATCHv3] block: optimize zero writes with bdrv_write_zeroes

2014-05-06 Thread Peter Lieven
please ignore this one, I accidently used an old commit message

Am 07.05.2014 02:01, schrieb Peter Lieven:
> this patch tries to optimize zero write requests
> by automatically using bdrv_write_zeroes if it is
> supported by the format.
>
> this should significantly speed up file system initialization and
> should speed zero write test used to test backend storage performance.
>
> the difference can simply be tested by e.g.
>
>dd if=/dev/zero of=/dev/vdX bs=1M
>
> or
>
>mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX
>
> Signed-off-by: Peter Lieven 
> ---
> v2->v3: - moved parameter parsing to blockdev_init
> - added per device detect_zeroes status to 
>   hmp (info block -v) and qmp (query-block) [Eric]
> - added support to enable detect-zeroes also
>   for hot added devices [Eric]
> - added missing entry to qemu_common_drive_opts
> - fixed description of qemu_iovec_is_zero [Fam]
>
> v1->v2: - added tests to commit message (Markus)
> RFCv2->v1: - fixed paramter parsing strncmp -> strcmp (Eric)
>- fixed typo (choosen->chosen) (Eric)
>- added second example to commit msg
>
> RFCv1->RFCv2: - add detect-zeroes=off|on|unmap knob to drive cmdline parameter
>   - call zero detection only for format (bs->file != NULL)
>
>  block.c   |   11 ++
>  block/qapi.c  |   11 ++
>  blockdev.c|   28 +
>  hmp.c |6 ++
>  include/block/block_int.h |   12 +++
>  include/qemu-common.h |1 +
>  qapi-schema.json  |   50 
> +++--
>  qemu-options.hx   |6 ++
>  qmp-commands.hx   |3 +++
>  util/iov.c|   21 +++
>  10 files changed, 134 insertions(+), 15 deletions(-)
>
> diff --git a/block.c b/block.c
> index b749d31..f27b35d 100644
> --- a/block.c
> +++ b/block.c
> @@ -3244,6 +3244,17 @@ static int coroutine_fn 
> bdrv_aligned_pwritev(BlockDriverState *bs,
>  
>  ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);
>  
> +if (!ret && bs->detect_zeroes != BDRV_DETECT_ZEROES_OFF &&
> +!(flags & BDRV_REQ_ZERO_WRITE) && bs->file &&
> +drv->bdrv_co_write_zeroes && qemu_iovec_is_zero(qiov)) {
> +flags |= BDRV_REQ_ZERO_WRITE;
> +/* if the device was not opened with discard=on the below flag
> + * is immediately cleared again in bdrv_co_do_write_zeroes */
> +if (bs->detect_zeroes == BDRV_DETECT_ZEROES_UNMAP) {
> +flags |= BDRV_REQ_MAY_UNMAP;
> +}
> +}
> +
>  if (ret < 0) {
>  /* Do nothing, write notifier decided to fail this request */
>  } else if (flags & BDRV_REQ_ZERO_WRITE) {
> diff --git a/block/qapi.c b/block/qapi.c
> index af11445..fbf66c2 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -51,6 +51,17 @@ BlockDeviceInfo *bdrv_block_device_info(BlockDriverState 
> *bs)
>  
>  info->backing_file_depth = bdrv_get_backing_file_depth(bs);
>  
> +switch (bs->detect_zeroes) {
> +case BDRV_DETECT_ZEROES_ON:
> +info->detect_zeroes = g_strdup("on");
> +break;
> +case BDRV_DETECT_ZEROES_UNMAP:
> +info->detect_zeroes = g_strdup("unmap");
> +break;
> +default:
> +info->detect_zeroes = g_strdup("off");
> +}
> +
>  if (bs->io_limits_enabled) {
>  ThrottleConfig cfg;
>  throttle_get_config(&bs->throttle_state, &cfg);
> diff --git a/blockdev.c b/blockdev.c
> index 7810e9f..96c11fd 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -288,6 +288,21 @@ static int parse_block_error_action(const char *buf, 
> bool is_read, Error **errp)
>  }
>  }
>  
> +static BdrvDetectZeroes parse_detect_zeroes(const char *buf, Error **errp)
> +{
> +if (!buf || !strcmp(buf, "off")) {
> +return BDRV_DETECT_ZEROES_OFF;
> +} else if (!strcmp(buf, "on")) {
> +return BDRV_DETECT_ZEROES_ON;
> +} else if (!strcmp(buf, "unmap")) {
> +return BDRV_DETECT_ZEROES_UNMAP;
> +} else {
> +error_setg(errp, "invalid value for detect-zeroes: %s",
> +   buf);
> +}
> +return BDRV_DETECT_ZEROES_OFF;
> +}
> +
>  static bool check_throttle_config(ThrottleConfig *cfg, Error **errp)
>  {
>  if (throttle_conflicting(cfg)) {
> @@ -324,6 +339,7 @@ static DriveInfo *blockdev_init(const char *file, QDict 
> *bs_opts,
>  QemuOpts *opts;
>  const char *id;
>  bool has_driver_specific_opts;
> +BdrvDetectZeroes detect_zeroes;
>  BlockDriver *drv = NULL;
>  
>  /* Check common options by copying from bs_opts to opts, all other 
> options
> @@ -452,6 +468,13 @@ static DriveInfo *blockdev_init(const char *file, QDict 
> *bs_opts,
>  }
>  }
>  
> +detect_zeroes =
> +parse_detect_zeroes(qemu_opt_get(opts, "detect-zeroes"), &error);
> +if (error) {

[Qemu-devel] [PATCHv3] block: optimize zero writes with bdrv_write_zeroes

2014-05-06 Thread Peter Lieven
this patch tries to optimize zero write requests
by automatically using bdrv_write_zeroes if it is
supported by the format.

this should significantly speed up file system initialization and
should speed zero write test used to test backend storage performance.

the difference can simply be tested by e.g.

   dd if=/dev/zero of=/dev/vdX bs=1M

or

   mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX

Signed-off-by: Peter Lieven 
---
v2->v3: - moved parameter parsing to blockdev_init
- added per device detect_zeroes status to 
  hmp (info block -v) and qmp (query-block) [Eric]
- added support to enable detect-zeroes also
  for hot added devices [Eric]
- added missing entry to qemu_common_drive_opts
- fixed description of qemu_iovec_is_zero [Fam]

v1->v2: - added tests to commit message (Markus)
RFCv2->v1: - fixed paramter parsing strncmp -> strcmp (Eric)
   - fixed typo (choosen->chosen) (Eric)
   - added second example to commit msg

RFCv1->RFCv2: - add detect-zeroes=off|on|unmap knob to drive cmdline parameter
  - call zero detection only for format (bs->file != NULL)

 block.c   |   11 ++
 block/qapi.c  |   11 ++
 blockdev.c|   28 +
 hmp.c |6 ++
 include/block/block_int.h |   12 +++
 include/qemu-common.h |1 +
 qapi-schema.json  |   50 +++--
 qemu-options.hx   |6 ++
 qmp-commands.hx   |3 +++
 util/iov.c|   21 +++
 10 files changed, 134 insertions(+), 15 deletions(-)

diff --git a/block.c b/block.c
index b749d31..f27b35d 100644
--- a/block.c
+++ b/block.c
@@ -3244,6 +3244,17 @@ static int coroutine_fn 
bdrv_aligned_pwritev(BlockDriverState *bs,
 
 ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);
 
+if (!ret && bs->detect_zeroes != BDRV_DETECT_ZEROES_OFF &&
+!(flags & BDRV_REQ_ZERO_WRITE) && bs->file &&
+drv->bdrv_co_write_zeroes && qemu_iovec_is_zero(qiov)) {
+flags |= BDRV_REQ_ZERO_WRITE;
+/* if the device was not opened with discard=on the below flag
+ * is immediately cleared again in bdrv_co_do_write_zeroes */
+if (bs->detect_zeroes == BDRV_DETECT_ZEROES_UNMAP) {
+flags |= BDRV_REQ_MAY_UNMAP;
+}
+}
+
 if (ret < 0) {
 /* Do nothing, write notifier decided to fail this request */
 } else if (flags & BDRV_REQ_ZERO_WRITE) {
diff --git a/block/qapi.c b/block/qapi.c
index af11445..fbf66c2 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -51,6 +51,17 @@ BlockDeviceInfo *bdrv_block_device_info(BlockDriverState *bs)
 
 info->backing_file_depth = bdrv_get_backing_file_depth(bs);
 
+switch (bs->detect_zeroes) {
+case BDRV_DETECT_ZEROES_ON:
+info->detect_zeroes = g_strdup("on");
+break;
+case BDRV_DETECT_ZEROES_UNMAP:
+info->detect_zeroes = g_strdup("unmap");
+break;
+default:
+info->detect_zeroes = g_strdup("off");
+}
+
 if (bs->io_limits_enabled) {
 ThrottleConfig cfg;
 throttle_get_config(&bs->throttle_state, &cfg);
diff --git a/blockdev.c b/blockdev.c
index 7810e9f..96c11fd 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -288,6 +288,21 @@ static int parse_block_error_action(const char *buf, bool 
is_read, Error **errp)
 }
 }
 
+static BdrvDetectZeroes parse_detect_zeroes(const char *buf, Error **errp)
+{
+if (!buf || !strcmp(buf, "off")) {
+return BDRV_DETECT_ZEROES_OFF;
+} else if (!strcmp(buf, "on")) {
+return BDRV_DETECT_ZEROES_ON;
+} else if (!strcmp(buf, "unmap")) {
+return BDRV_DETECT_ZEROES_UNMAP;
+} else {
+error_setg(errp, "invalid value for detect-zeroes: %s",
+   buf);
+}
+return BDRV_DETECT_ZEROES_OFF;
+}
+
 static bool check_throttle_config(ThrottleConfig *cfg, Error **errp)
 {
 if (throttle_conflicting(cfg)) {
@@ -324,6 +339,7 @@ static DriveInfo *blockdev_init(const char *file, QDict 
*bs_opts,
 QemuOpts *opts;
 const char *id;
 bool has_driver_specific_opts;
+BdrvDetectZeroes detect_zeroes;
 BlockDriver *drv = NULL;
 
 /* Check common options by copying from bs_opts to opts, all other options
@@ -452,6 +468,13 @@ static DriveInfo *blockdev_init(const char *file, QDict 
*bs_opts,
 }
 }
 
+detect_zeroes =
+parse_detect_zeroes(qemu_opt_get(opts, "detect-zeroes"), &error);
+if (error) {
+error_propagate(errp, error);
+goto early_err;
+}
+
 /* init */
 dinfo = g_malloc0(sizeof(*dinfo));
 dinfo->id = g_strdup(qemu_opts_id(opts));
@@ -462,6 +485,7 @@ static DriveInfo *blockdev_init(const char *file, QDict 
*bs_opts,
 }
 dinfo->bdrv->open_flags = snapshot ? BDRV_O_SNAPSHOT : 0;
 dinfo->bdrv->read_only = ro;
+din

Re: [Qemu-devel] Help needed testing on ppc

2014-05-06 Thread BALATON Zoltan

On Tue, 6 May 2014, Tom Musta wrote:

On 5/6/2014 5:03 AM, BALATON Zoltan wrote:

Hello,

As I got no reply on the qemu-ppc list so far I try here maybe there 
are some people who read this list but don't follow the ppc one.


I don't have the necessary hardware to do the testing needed for the 
patch below. Some context for the discussion can be found in this 
message: 
http://lists.nongnu.org/archive/html/qemu-ppc/2014-04/msg00277.html


It seems we have some code that contains instructions with a reserved 
bit set in an stwx instruction that works on real hardware but causes 
an invalid instruction exception on QEMU.


I'd appreciate some insight and help.

Regards,
BALATON Zoltan


This is a bit tricky.  You appear to have code that has a reserved bit 
set.


Early forms of the PowerPC ISA (circa 1998) said this:  "All reserved 
fields in instructions should be zero.  If they are not, the instruction 
form is invalid. ...  Any attempt to execute an invalid form of an 
instruction will cause the system illegal instruction error handler to 
be invoked or yield boundedly undefined results."  QEMU, as a general 
rule, meets this requirement by causing illegal instruction exceptions.


More modern versions of the ISA (circa 2006) say this: "Reserved fields 
in instructions are ignored by the processor.  This is a requirement in 
the Server environment and is being phased into the Embedded 
environment. ... To maximize compatibility with future architecture 
extensions, software must ensure that reserved fields in instructions 
contain zero and that defined fields of instructions do not contain 
reserved values."  Technically, QEMU does not comply with the 
requirement in the first sentence;  and MorpOS does not comply with the 
third.


The newer form of the ISA is compatible with the older one since 
ignoring reserved fields is a valid implementation of "boundedly 
undefined."


Thanks for the exhaustive answer with definitive references. This is 
really very helpful.



A few questions and comments:

(1) Why is MorphOS using this invalid instruction form?  Would it be 
easier to fix the OS rather than QEMU?


I don't know why is it used. I can ask the MorphOS developers but they did 
not seem to be too supportive so far and at least one of them expressed 
that they have no interest supporting other than their officially 
supported list of hardware at this time. So I assume it is easier to fix 
QEMU than MorphOS and if it works on a real Mac then it should also work 
on QEMU's emulation of that Mac hardware.


Is there some undocumented processor behavior that the code is dependent 
upon (e.g. is it actually expected CR0 to be set?).


This is what the testing was supposed to find out but MorphOS seems to run 
better with the quoted patch so I don't think it depends on any other 
undocumented behaviour other than ignoring reserved bits but I have no 
definitive answer.


(2) Your patch makes some store instructions compliant with the most 
recent ISAs but there are many other instructions that are not addressed 
by the patch.  I think fixing only some will be a future source of 
confusion.


(3) The change risks breaking behavior on older designs which may very 
well have taken the illegal instruction interrupt.  Would it make more 
sense to leave the masks as-is and instead make a single, isolated 
change in the decoder (gen_intermediate_code_internal).  This behavior 
could be made conditional (configuration item?  processor family 
specific flag?).  Unfortunately, the masks also catch some invalid forms 
that do not involve reserved fields (e.g. lq/stq to odd numbered 
registers).


I don't know this code very well so not sure I can follow your suggestion. 
Are you proposing that the invalid masks could be ignored globally in 
gen_intermediate_code_internal (around target-ppc/traslate.c:11444) based 
on some condition for all opcodes?


Since your quotes above show that QEMU does not implement the current 
specification and code relying on older behaviour would not run on newer 
processors so it's likely they will get fixed so I think the risk of 
breaking older designs is less than breaking software that rely on current 
specification so IMO it should be changed in QEMU if possible and only 
care about older designs when one is actually encountered.


(4) In general, modeling undefined behavior is a slippery slope.  I 
would much prefer to see the code fixed or justified before changing 
QEMU.


I can try to ask on the MorphOS list but their previous answer to another 
question was that it works on the hardware they officially support.


Regards,
BALATON Zoltan



[Qemu-devel] [PATCH v3] block/raw-posix: Try both FIEMAP and SEEK_HOLE

2014-05-06 Thread Max Reitz
The current version of raw-posix always uses ioctl(FS_IOC_FIEMAP) if
FIEMAP is available; lseek with SEEK_HOLE/SEEK_DATA are not even
compiled in in this case. However, there may be implementations which
support the latter but not the former (e.g., NFSv4.2) as well as vice
versa.

To cover both cases, always try SEEK_HOLE/SEEK_DATA (as this will
probably be covered by POSIX soon) and if that does not work, fall back
to FIEMAP; and if that does not work either, treat everything as
allocated.

Signed-off-by: Max Reitz 
---
v3:
 - use a tristate for keeping the information whether or not to use
   FIEMAP and/or SEEK_HOLE/SEEK_DATA [Eric]
 - fall through to another implementation (i.e. FIEMAP) if the first
   (i.e. SEEK_HOLE/SEEK_DATA) reports everything as allocated; this will
   not result in that implementation being skipped the next time,
   however [Eric]
---
 block/raw-posix.c | 182 +++---
 1 file changed, 131 insertions(+), 51 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 3ce026d..fd6bac6 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -123,6 +123,18 @@
 
 #define MAX_BLOCKSIZE  4096
 
+/* In case there are multiple implementations for the same feature provided by
+ * the environment, this enumeration may be used to represent the status of
+ * these alternatives. */
+typedef enum ImplementationAlternativeStatus {
+/* The status is (yet) unknown. */
+IMPLSTAT_UNKNOWN = 0,
+/* This implementation is known to return correct results. */
+IMPLSTAT_WORKING,
+/* This implementation is known not to return correct results. */
+IMPLSTAT_SKIP,
+} ImplementationAlternativeStatus;
+
 typedef struct BDRVRawState {
 int fd;
 int type;
@@ -146,6 +158,12 @@ typedef struct BDRVRawState {
 bool has_discard:1;
 bool has_write_zeroes:1;
 bool discard_zeroes:1;
+#if defined SEEK_HOLE && defined SEEK_DATA
+ImplementationAlternativeStatus seek_hole_status;
+#endif
+#ifdef CONFIG_FIEMAP
+ImplementationAlternativeStatus fiemap_status;
+#endif
 } BDRVRawState;
 
 typedef struct BDRVRawReopenState {
@@ -1272,98 +1290,160 @@ static int raw_create(const char *filename, 
QEMUOptionParameter *options,
 return result;
 }
 
-/*
- * Returns true iff the specified sector is present in the disk image. Drivers
- * not implementing the functionality are assumed to not support backing files,
- * hence all their sectors are reported as allocated.
- *
- * If 'sector_num' is beyond the end of the disk image the return value is 0
- * and 'pnum' is set to 0.
- *
- * 'pnum' is set to the number of sectors (including and immediately following
- * the specified sector) that are known to be in the same
- * allocated/unallocated state.
- *
- * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
- * beyond the end of the disk image it will be clamped.
- */
-static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num,
-int nb_sectors, int *pnum)
+static int64_t try_fiemap(BlockDriverState *bs, off_t start, off_t end,
+  off_t *data, off_t *hole, int nb_sectors, int *pnum)
 {
-off_t start, data, hole;
-int64_t ret;
-
-ret = fd_open(bs);
-if (ret < 0) {
-return ret;
-}
-
-start = sector_num * BDRV_SECTOR_SIZE;
-ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
-
 #ifdef CONFIG_FIEMAP
-
 BDRVRawState *s = bs->opaque;
+int64_t ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
 struct {
 struct fiemap fm;
 struct fiemap_extent fe;
 } f;
 
+if (s->fiemap_status == IMPLSTAT_SKIP) {
+return -ENOTSUP;
+}
+
 f.fm.fm_start = start;
 f.fm.fm_length = (int64_t)nb_sectors * BDRV_SECTOR_SIZE;
 f.fm.fm_flags = 0;
 f.fm.fm_extent_count = 1;
 f.fm.fm_reserved = 0;
 if (ioctl(s->fd, FS_IOC_FIEMAP, &f) == -1) {
-/* Assume everything is allocated.  */
-*pnum = nb_sectors;
-return ret;
+s->fiemap_status = IMPLSTAT_SKIP;
+return -errno;
+}
+
+if (s->fiemap_status == IMPLSTAT_UNKNOWN) {
+if (f.fm.fm_extent_count == 1 &&
+f.fe.fe_logical == 0 && f.fe.fe_length >= end)
+{
+/* FIEMAP returned a single extent spanning the entire file; maybe
+ * this was just a default response and therefore we cannot be sure
+ * whether it actually works; fall back to an alternative
+ * implementation. */
+return -ENOTSUP;
+} else {
+s->fiemap_status = IMPLSTAT_WORKING;
+}
 }
 
 if (f.fm.fm_mapped_extents == 0) {
 /* No extents found, data is beyond f.fm.fm_start + f.fm.fm_length.
  * f.fm.fm_start + f.fm.fm_length must be clamped to the file size!
  */
-off_t length = lseek(s

Re: [Qemu-devel] [libvirt] [RFC 0/5] Allow object-add on X86CPU subclasses, for CPU model probing

2014-05-06 Thread Eric Blake
On 05/06/2014 02:19 PM, Eduardo Habkost wrote:

>> IMHO, libvirt side should take advantage of information QEMU already
>> provides.
>>
> 
> Current API requires re-running QEMU to query the information. This
> series allows it to be run with the "-machine none" QEMU instance that
> is already run by libvirt.

Therein is the reason libvirt isn't using what qemu already has -
spawning multiple qemu instances (one per reported machine type)
multiplied by the number of qemu binaries does not scale well, when
compared to starting a single qemu -machine none, and doing all queries
on that one machine.  So the point of this patch is getting us closer to
the point where libvirt can learn accurate cpu model information for
multiple machines all from a single qemu invocation.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v2] block/raw-posix: Try both FIEMAP and SEEK_HOLE

2014-05-06 Thread Max Reitz

On 06.05.2014 23:47, Eric Blake wrote:

On 05/06/2014 03:18 PM, Eric Blake wrote:


...if you are on a file system where SEEK_HOLE triggers the kernel
fallback of "entire file is allocated", but where FIEMAP is wired up for
that file system, would it make sense to have try_seek_hole return -1 in
situations where lseek(s->fd, 0, SEEK_HOLE) returns the end of the file?
  Even more, should skip_seek_hole be a tri-state?

On the other hand, such systems are getting vanishingly rare as people
upgrade to newer kernels.  Do we care about catering to them, or is it
fair game to just tell people to upgrade if they want performance?


How about I'll send v3 and you decide? ;-)



Re: [Qemu-devel] [PATCH v2] block/raw-posix: Try both FIEMAP and SEEK_HOLE

2014-05-06 Thread Eric Blake
On 05/06/2014 03:18 PM, Eric Blake wrote:

> ...if you are on a file system where SEEK_HOLE triggers the kernel
> fallback of "entire file is allocated", but where FIEMAP is wired up for
> that file system, would it make sense to have try_seek_hole return -1 in
> situations where lseek(s->fd, 0, SEEK_HOLE) returns the end of the file?
>  Even more, should skip_seek_hole be a tri-state?

On the other hand, such systems are getting vanishingly rare as people
upgrade to newer kernels.  Do we care about catering to them, or is it
fair game to just tell people to upgrade if they want performance?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v2] block/raw-posix: Try both FIEMAP and SEEK_HOLE

2014-05-06 Thread Max Reitz

On 06.05.2014 23:18, Eric Blake wrote:

On 05/06/2014 01:00 PM, Max Reitz wrote:

The current version of raw-posix always uses ioctl(FS_IOC_FIEMAP) if
FIEMAP is available; lseek with SEEK_HOLE/SEEK_DATA are not even
compiled in in this case. However, there may be implementations which
support the latter but not the former (e.g., NFSv4.2) as well as vice
versa.

To cover both cases, always try SEEK_HOLE/SEEK_DATA (as this will
probably be covered by POSIX soon) and if that does not work, fall back
to FIEMAP; and if that does not work either, treat everything as
allocated.

Signed-off-by: Max Reitz 
---
v2:
  - reworked using static functions [Stefan]
  - changed order of FIEMAP and SEEK_HOLE [Eric]
---
  block/raw-posix.c | 135 ++
  1 file changed, 85 insertions(+), 50 deletions(-)

+++ b/block/raw-posix.c
@@ -146,6 +146,12 @@ typedef struct BDRVRawState {
  bool has_discard:1;
  bool has_write_zeroes:1;
  bool discard_zeroes:1;
+#if defined SEEK_HOLE && defined SEEK_DATA
+bool skip_seek_hole;

Remember, SEEK_HOLE support requires both support of the kernel and the
filesystem - we may still have cases where there are filesystems that
lack SEEK_HOLE but still have FIEMAP, even when compiled against kernel
headers where SEEK_HOLE is defined - on those filesystems, the kernel
guarantees SEEK_HOLE will report the entire file as allocated, even
though FIEMAP might be able to find holes.  On the other hand, the whole
point of this is to optimize cases; where the fallback to treating the
whole file as allocated may be slower but still gives correct behavior
to the guest.

I like how you have a per-struct state to track whether you have
encountered a previous failure, to skip the known-failing probe the next
time around.  But I wonder if you need a tweak...


+static int64_t try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
+ off_t *hole, int *pnum)
+{
+#if defined SEEK_HOLE && defined SEEK_DATA
  BDRVRawState *s = bs->opaque;
  
-hole = lseek(s->fd, start, SEEK_HOLE);

-if (hole == -1) {
+if (s->skip_seek_hole) {
+return -ENOTSUP;
+}
+
+*hole = lseek(s->fd, start, SEEK_HOLE);
+if (*hole == -1) {
  /* -ENXIO indicates that sector_num was past the end of the file.
   * There is a virtual hole there.  */
  assert(errno != -ENXIO);
  
-/* Most likely EINVAL.  Assume everything is allocated.  */

-*pnum = nb_sectors;
-return ret;
+s->skip_seek_hole = true;
+return -errno;
  }

...if you are on a file system where SEEK_HOLE triggers the kernel
fallback of "entire file is allocated", but where FIEMAP is wired up for
that file system, would it make sense to have try_seek_hole return -1 in
situations where lseek(s->fd, 0, SEEK_HOLE) returns the end of the file?
  Even more, should skip_seek_hole be a tri-state?

state: "unknown" - if lseek(SEEK_HOLE) returns -1, state changes to
"skip"; if it returns something other than EOF, state changes to "use";
if it returns EOF, state remains "unknown" (as punching a hole or
resizing the image may work to create a hole in what is currently a
fully-allocated image)

state: "skip" - we've had a previous lseek failure, no need to try
seeking for holes on this file

state: "use" - we've had success probing a hole, so we want to always
trust SEEK_HOLE for this file


Hm, you're probably right. I just hope it won't get too complicated…

  
-if (hole > start) {

-data = start;
+if (*hole > start) {
+*data = start;
  } else {
  /* On a hole.  We need another syscall to find its end.  */
-data = lseek(s->fd, start, SEEK_DATA);
-if (data == -1) {
-data = lseek(s->fd, 0, SEEK_END);
+*data = lseek(s->fd, start, SEEK_DATA);
+if (*data == -1) {
+*data = lseek(s->fd, 0, SEEK_END);
  }

Pre-existing, and unaffected by this patch, but lseek() changes the file
offset.  Does that affect any other code that might do a read()/write(),
or are we safe in always using pread()/pwrite() so that we don't care
about file offset?


I see multiple lseek(fd, 0, SEEK_END) in raw-posix.c for determining the 
file size, so I guess it's fine.



+ *
+ * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
+ * beyond the end of the disk image it will be clamped.
+ */
+static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
+int64_t sector_num,
+int nb_sectors, int *pnum)

Indentation is off.


Well, yes, this is pre-existing, but I'll fix it.


+{
+off_t start, data = 0, hole = 0;
+int64_t ret;
+
+ret = fd_open(bs);
+if (ret < 0) {
+return ret;
+}
+
+start = sector_num * BDRV_SECTOR_SIZE;
+
+ret = try_seek_hole(bs, start, &data, &hole, pnum);

Again, 

[Qemu-devel] [PATCH v2 2/4] check-qdict: Add test for qdict_join()

2014-05-06 Thread Max Reitz
Add some test cases for qdict_join().

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Benoit Canet 
---
 tests/check-qdict.c | 87 +
 1 file changed, 87 insertions(+)

diff --git a/tests/check-qdict.c b/tests/check-qdict.c
index 2ad0f78..a9296f0 100644
--- a/tests/check-qdict.c
+++ b/tests/check-qdict.c
@@ -444,6 +444,92 @@ static void qdict_array_split_test(void)
 QDECREF(test_dict);
 }
 
+static void qdict_join_test(void)
+{
+QDict *dict1, *dict2;
+bool overwrite = false;
+int i;
+
+dict1 = qdict_new();
+dict2 = qdict_new();
+
+
+/* Test everything once without overwrite and once with */
+do
+{
+/* Test empty dicts */
+qdict_join(dict1, dict2, overwrite);
+
+g_assert(qdict_size(dict1) == 0);
+g_assert(qdict_size(dict2) == 0);
+
+
+/* First iteration: Test movement */
+/* Second iteration: Test empty source and non-empty destination */
+qdict_put(dict2, "foo", qint_from_int(42));
+
+for (i = 0; i < 2; i++) {
+qdict_join(dict1, dict2, overwrite);
+
+g_assert(qdict_size(dict1) == 1);
+g_assert(qdict_size(dict2) == 0);
+
+g_assert(qdict_get_int(dict1, "foo") == 42);
+}
+
+
+/* Test non-empty source and destination without conflict */
+qdict_put(dict2, "bar", qint_from_int(23));
+
+qdict_join(dict1, dict2, overwrite);
+
+g_assert(qdict_size(dict1) == 2);
+g_assert(qdict_size(dict2) == 0);
+
+g_assert(qdict_get_int(dict1, "foo") == 42);
+g_assert(qdict_get_int(dict1, "bar") == 23);
+
+
+/* Test conflict */
+qdict_put(dict2, "foo", qint_from_int(84));
+
+qdict_join(dict1, dict2, overwrite);
+
+g_assert(qdict_size(dict1) == 2);
+g_assert(qdict_size(dict2) == !overwrite);
+
+g_assert(qdict_get_int(dict1, "foo") == overwrite ? 84 : 42);
+g_assert(qdict_get_int(dict1, "bar") == 23);
+
+if (!overwrite) {
+g_assert(qdict_get_int(dict2, "foo") == 84);
+}
+
+
+/* Check the references */
+g_assert(qdict_get(dict1, "foo")->refcnt == 1);
+g_assert(qdict_get(dict1, "bar")->refcnt == 1);
+
+if (!overwrite) {
+g_assert(qdict_get(dict2, "foo")->refcnt == 1);
+}
+
+
+/* Clean up */
+qdict_del(dict1, "foo");
+qdict_del(dict1, "bar");
+
+if (!overwrite) {
+qdict_del(dict2, "foo");
+}
+}
+while (overwrite ^= true);
+
+
+QDECREF(dict1);
+QDECREF(dict2);
+}
+
 /*
  * Errors test-cases
  */
@@ -584,6 +670,7 @@ int main(int argc, char **argv)
 g_test_add_func("/public/iterapi", qdict_iterapi_test);
 g_test_add_func("/public/flatten", qdict_flatten_test);
 g_test_add_func("/public/array_split", qdict_array_split_test);
+g_test_add_func("/public/join", qdict_join_test);
 
 g_test_add_func("/errors/put_exists", qdict_put_exists_test);
 g_test_add_func("/errors/get_not_exists", qdict_get_not_exists_test);
-- 
1.9.2




Re: [Qemu-devel] [PATCH v2] block/raw-posix: Try both FIEMAP and SEEK_HOLE

2014-05-06 Thread Eric Blake
On 05/06/2014 01:00 PM, Max Reitz wrote:
> The current version of raw-posix always uses ioctl(FS_IOC_FIEMAP) if
> FIEMAP is available; lseek with SEEK_HOLE/SEEK_DATA are not even
> compiled in in this case. However, there may be implementations which
> support the latter but not the former (e.g., NFSv4.2) as well as vice
> versa.
> 
> To cover both cases, always try SEEK_HOLE/SEEK_DATA (as this will
> probably be covered by POSIX soon) and if that does not work, fall back
> to FIEMAP; and if that does not work either, treat everything as
> allocated.
> 
> Signed-off-by: Max Reitz 
> ---
> v2:
>  - reworked using static functions [Stefan]
>  - changed order of FIEMAP and SEEK_HOLE [Eric]
> ---
>  block/raw-posix.c | 135 
> ++
>  1 file changed, 85 insertions(+), 50 deletions(-)
> 

> +++ b/block/raw-posix.c
> @@ -146,6 +146,12 @@ typedef struct BDRVRawState {
>  bool has_discard:1;
>  bool has_write_zeroes:1;
>  bool discard_zeroes:1;
> +#if defined SEEK_HOLE && defined SEEK_DATA
> +bool skip_seek_hole;

Remember, SEEK_HOLE support requires both support of the kernel and the
filesystem - we may still have cases where there are filesystems that
lack SEEK_HOLE but still have FIEMAP, even when compiled against kernel
headers where SEEK_HOLE is defined - on those filesystems, the kernel
guarantees SEEK_HOLE will report the entire file as allocated, even
though FIEMAP might be able to find holes.  On the other hand, the whole
point of this is to optimize cases; where the fallback to treating the
whole file as allocated may be slower but still gives correct behavior
to the guest.

I like how you have a per-struct state to track whether you have
encountered a previous failure, to skip the known-failing probe the next
time around.  But I wonder if you need a tweak...

> +static int64_t try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
> + off_t *hole, int *pnum)
> +{
> +#if defined SEEK_HOLE && defined SEEK_DATA
>  BDRVRawState *s = bs->opaque;
>  
> -hole = lseek(s->fd, start, SEEK_HOLE);
> -if (hole == -1) {
> +if (s->skip_seek_hole) {
> +return -ENOTSUP;
> +}
> +
> +*hole = lseek(s->fd, start, SEEK_HOLE);
> +if (*hole == -1) {
>  /* -ENXIO indicates that sector_num was past the end of the file.
>   * There is a virtual hole there.  */
>  assert(errno != -ENXIO);
>  
> -/* Most likely EINVAL.  Assume everything is allocated.  */
> -*pnum = nb_sectors;
> -return ret;
> +s->skip_seek_hole = true;
> +return -errno;
>  }

...if you are on a file system where SEEK_HOLE triggers the kernel
fallback of "entire file is allocated", but where FIEMAP is wired up for
that file system, would it make sense to have try_seek_hole return -1 in
situations where lseek(s->fd, 0, SEEK_HOLE) returns the end of the file?
 Even more, should skip_seek_hole be a tri-state?

state: "unknown" - if lseek(SEEK_HOLE) returns -1, state changes to
"skip"; if it returns something other than EOF, state changes to "use";
if it returns EOF, state remains "unknown" (as punching a hole or
resizing the image may work to create a hole in what is currently a
fully-allocated image)

state: "skip" - we've had a previous lseek failure, no need to try
seeking for holes on this file

state: "use" - we've had success probing a hole, so we want to always
trust SEEK_HOLE for this file

>  
> -if (hole > start) {
> -data = start;
> +if (*hole > start) {
> +*data = start;
>  } else {
>  /* On a hole.  We need another syscall to find its end.  */
> -data = lseek(s->fd, start, SEEK_DATA);
> -if (data == -1) {
> -data = lseek(s->fd, 0, SEEK_END);
> +*data = lseek(s->fd, start, SEEK_DATA);
> +if (*data == -1) {
> +*data = lseek(s->fd, 0, SEEK_END);
>  }

Pre-existing, and unaffected by this patch, but lseek() changes the file
offset.  Does that affect any other code that might do a read()/write(),
or are we safe in always using pread()/pwrite() so that we don't care
about file offset?

> + *
> + * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
> + * beyond the end of the disk image it will be clamped.
> + */
> +static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
> +int64_t sector_num,
> +int nb_sectors, int *pnum)

Indentation is off.

> +{
> +off_t start, data = 0, hole = 0;
> +int64_t ret;
> +
> +ret = fd_open(bs);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +start = sector_num * BDRV_SECTOR_SIZE;
> +
> +ret = try_seek_hole(bs, start, &data, &hole, pnum);

Again, a tri-state return (-1 for skipped, 0 for unknown, 1 for hole or
EOF found) might make more sense.  That way, you hit the fallba

Re: [Qemu-devel] [PATCH] block: Fix open flags with BDRV_O_SNAPSHOT

2014-05-06 Thread Max Reitz

On 06.05.2014 13:10, Jan Kiszka wrote:

On 2014-05-06 12:19, Kevin Wolf wrote:

The immediately visible effect of this patch is that it fixes committing
a temporary snapshot to its backing file. Previously, it would fail with
a "permission denied" error because bdrv_inherited_flags() forced the
backing file to be read-only, ignoring the r/w reopen of bdrv_commit().

The bigger problem this releaved is that the original open flags must
actually only be applied to the temporary snapshot, and the original
image file must be treated as a backing file of the temporary snapshot
and get the right flags for that.

Reported-by: Jan Kiszka 
Signed-off-by: Kevin Wolf 
---
  block.c| 34 +++---
  include/block/block.h  |  2 +-
  tests/qemu-iotests/051 |  4 
  tests/qemu-iotests/051.out | 10 ++
  4 files changed, 34 insertions(+), 16 deletions(-)


[snip]


diff --git a/tests/qemu-iotests/051.out b/tests/qemu-iotests/051.out
index 01b0384..31e329e 100644
--- a/tests/qemu-iotests/051.out
+++ b/tests/qemu-iotests/051.out
@@ -358,4 +358,14 @@ wrote 4096/4096 bytes at offset 0
  
  read 4096/4096 bytes at offset 0

  4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Testing: -drive file=TEST_DIR/t.qcow2,snapshot=on
+QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) 
qqeqemqemuqemu-qemu-iqemu-ioqemu-io
 qemu-io iqemu-io 
idqemu-io ideqemu-io 
ide0qemu-io 
ide0-qemu-io 
ide0-hqemu-io 
ide0-hdqemu-io 
ide0-hd0qemu-io ide0-hd0 
qemu-io ide0-hd0 
"qemu-io ide0-hd0 
"wqemu-io ide0-hd0 
"wrqemu-io ide0-hd0 
"wri!
  qemu-io ide0-hd0 "writqemu-io ide0-hd0 
"writeqemu-io ide0-hd0 "write 
qemu-io ide0-hd0 "write 
-qemu-io ide0-hd0 "write 
-Pqemu-io ide0-hd0 "write -P 
qemu-io ide0-hd0 "write -P 
0qemu-io ide0-hd0 "write -P 
0xqemu-io ide0-hd0 "write -P 
0x3qemu-io ide0-hd0 "w!
  rite -P 0x33
[Dqemu-io ide0-hd0 "write -P 0x33 
qemu-io 
ide0-hd0 "write -P 0x33 
0qemu-io 
ide0-hd0 "write -P 0x33 0 
qemu-io 
ide0-hd0 "write -P 0x33 0 
4qemu-io
 ide0-hd0 "write -P 0x33 0 
4kqemu-io
 ide0-hd0 "write -P 0x33 0 4k"
+wrote 4096/4096 bytes at offset 0
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+(qemu) 
ccocomcommcommicommitcommit
 commit icommit 
idcommit 
idecommit 
ide0commit 
ide0-commit 
ide0-hcommit 
ide0-hdcommit ide0-hd0
+(qemu) qququiquit
+
+read 4096/4096 bytes at offset 0
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  *** done


Works fine here!

(For unknown reason, applying t

Re: [Qemu-devel] [PATCH] block: Fix open flags with BDRV_O_SNAPSHOT

2014-05-06 Thread Max Reitz

On 06.05.2014 12:19, Kevin Wolf wrote:

The immediately visible effect of this patch is that it fixes committing
a temporary snapshot to its backing file. Previously, it would fail with
a "permission denied" error because bdrv_inherited_flags() forced the
backing file to be read-only, ignoring the r/w reopen of bdrv_commit().

The bigger problem this releaved is that the original open flags must
actually only be applied to the temporary snapshot, and the original
image file must be treated as a backing file of the temporary snapshot
and get the right flags for that.

Reported-by: Jan Kiszka 
Signed-off-by: Kevin Wolf 
---
  block.c| 34 +++---
  include/block/block.h  |  2 +-
  tests/qemu-iotests/051 |  4 
  tests/qemu-iotests/051.out | 10 ++
  4 files changed, 34 insertions(+), 16 deletions(-)


Reviewed-by: Max Reitz 



Re: [Qemu-devel] [RFC 0/5] Allow object-add on X86CPU subclasses, for CPU model probing

2014-05-06 Thread Andreas Färber
Am 06.05.2014 22:19, schrieb Eduardo Habkost:
> On Tue, May 06, 2014 at 10:01:11PM +0200, Igor Mammedov wrote:
>> On Tue, 6 May 2014 11:42:56 -0300
>> Eduardo Habkost  wrote:
>>> On Tue, May 06, 2014 at 09:22:38AM +0200, Igor Mammedov wrote:
 On Fri, 2 May 2014 11:43:05 -0300
 Eduardo Habkost  wrote:
> On Fri, May 02, 2014 at 03:45:03PM +0200, Igor Mammedov wrote:
>> On Wed, 30 Apr 2014 17:29:28 -0300
>> Eduardo Habkost  wrote:
>>> This series allows management code to use object-add on X86CPU 
>>> subclasses, so it
>> Is there any reason why "device-add" couldn't be used?
> It needs to work with "-machine none", device_add requires a bus to
> exist, and there is no icc-bus on machine_none.
 The thing is that CPUID is a function of machine so using
 "-machine none" will provide only approximately accurate data.
 I'm not sure that retrieved possibly not accurate data are useful
 for libvirt.
>>> "-cpu host" doesn't depend on machine, and is the most important thing
>>> right now (because libvirt's checks for host QEMU/kernel/CPU
>>> capabilities is completely broken).
>> true, but device-add/-cpu host could be used right now to get the
>> same CPUID data wile using any machine type or default one without
>> any of this patches.
> 
> device_add can't be used with "-machine none".

I see no reason why we couldn't *make* CPUs work on -machine none. The
ICC bus and bridge were a hack to make APIC(?) hot-add work in face of
SysBus; if that prohibits other valid uses now, then evaluating Igor's
memory work for CPU might be an option.

I'm not aware of any real X86CPU dependency on ICCBus, so we might as
well make -device place it on SysBus if no ICCBus is available...
probably much more invasive and potentially dangerous though.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



[Qemu-devel] [PATCH v2 4/4] iotests: Add test for the JSON protocol

2014-05-06 Thread Max Reitz
Add a test for the JSON protocol driver.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/089 | 132 +
 tests/qemu-iotests/089.out |  49 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 182 insertions(+)
 create mode 100755 tests/qemu-iotests/089
 create mode 100644 tests/qemu-iotests/089.out

diff --git a/tests/qemu-iotests/089 b/tests/qemu-iotests/089
new file mode 100755
index 000..a1f1c64
--- /dev/null
+++ b/tests/qemu-iotests/089
@@ -0,0 +1,132 @@
+#!/bin/bash
+#
+# Test case for support of JSON filenames
+#
+# Copyright (C) 2014 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=mre...@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+tmp=/tmp/$$
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+# Using an image filename containing quotation marks will render the JSON data
+# below invalid. In that case, we have little choice but simply not to run this
+# test.
+case $TEST_IMG in
+*'"'*)
+_notrun "image filename may not contain quotation marks"
+;;
+esac
+
+IMG_SIZE=64M
+
+# Taken from test 072
+echo
+echo "=== Testing nested image formats ==="
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE
+
+$QEMU_IO -c 'write -P 42 0 512' -c 'write -P 23 512 512' \
+ -c 'write -P 66 1024 512' "$TEST_IMG.base" | _filter_qemu_io
+
+$QEMU_IMG convert -f raw -O $IMGFMT "$TEST_IMG.base" "$TEST_IMG"
+
+$QEMU_IO -c 'read -P 42 0 512' -c 'read -P 23 512 512' \
+ -c 'read -P 66 1024 512' "json:{
+\"driver\": \"$IMGFMT\",
+\"file\": {
+\"driver\": \"$IMGFMT\",
+\"file\": {
+\"filename\": \"$TEST_IMG\"
+}
+}
+}" | _filter_qemu_io
+
+# This should fail (see test 072)
+$QEMU_IO -c 'read -P 42 0 512' "$TEST_IMG" | _filter_qemu_io
+
+
+# Taken from test 071
+echo
+echo "=== Testing blkdebug ==="
+echo
+
+_make_test_img $IMG_SIZE
+
+$QEMU_IO -c 'write -P 42 0x38000 512' "$TEST_IMG" | _filter_qemu_io
+
+# The "image.filename" part tests whether "a": { "b": "c" } and "a.b": "c" do
+# the same (which they should).
+$QEMU_IO -c 'read -P 42 0x38000 512' "json:{
+\"driver\": \"$IMGFMT\",
+\"file\": {
+\"driver\": \"blkdebug\",
+\"inject-error\": [{
+\"event\": \"l2_load\"
+}],
+\"image.filename\": \"$TEST_IMG\"
+}
+}" | _filter_qemu_io
+
+
+echo
+echo "=== Testing qemu-img info output ==="
+echo
+
+$QEMU_IMG info "json:{\"driver\":\"qcow2\",\"file.filename\":\"$TEST_IMG\"}" \
+| _filter_testdir | _filter_imgfmt
+
+
+echo
+echo "=== Testing option merging ==="
+echo
+
+# This should work: Both options given directly as well as those given through
+# the filename should be used
+$QEMU_IO -c "open -o driver=qcow2 json:{\"file.filename\":\"$TEST_IMG\"}" \
+ -c "info" 2>&1 | _filter_testdir | _filter_imgfmt
+
+# This should not work: The "driver" option is set both directly and through 
the
+# filename
+$QEMU_IO -c "open -o driver=qcow2 
json:{\"driver\":\"qcow2\",\"file.filename\":\"$TEST_IMG\"}" \
+ -c "info" 2>&1 | _filter_testdir | _filter_imgfmt
+
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/089.out b/tests/qemu-iotests/089.out
new file mode 100644
index 000..54fadaf
--- /dev/null
+++ b/tests/qemu-iotests/089.out
@@ -0,0 +1,49 @@
+QA output created by 089
+
+=== Testing nested image formats ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864 
+wrote 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 1024
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 1024
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Pattern verification failed at o

Re: [Qemu-devel] [PATCH v2 0/4] block: Allow JSON filenames

2014-05-06 Thread Eric Blake
On 05/06/2014 02:29 PM, Max Reitz wrote:
> This series acts as some kind of alternative or v5 to the "block/json:
> Add JSON protocol driver" series. It makes bdrv_open() parse filenames
> prefixed by "json:" as JSON objects (discarding the prefix beforehand)
> and then use the resulting QDict as the options for the block device to
> be opened with a NULL filename.
> 
> The purpose of this is that it may sometimes be desirable to specify
> options for a block device where only a filename can be given, e.g., for
> backing files. Using this should obviously be the exception, but it is
> nice to have if actually needed.
> 
> 
> After having written this series, I do indeed agree that it is much
> nicer than the old version of having a dedicated "virtual" block driver
> for this purpose.
> 
> Patches 1 and 2 are taken directly from the block/json v3, patch 4 has
> been enriched by an additional scenario (options given directly and
> through the filename) and its comments have been adapted to the new
> implementation.

Series:
Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH v2 3/4] block: Allow JSON filenames

2014-05-06 Thread Max Reitz
If the filename given to bdrv_open() is prefixed with "json:", parse the
rest as a JSON object and merge the result into the options QDict. If
there are conflicts, report one of them to the user and abort.

Signed-off-by: Max Reitz 
---
 block.c | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/block.c b/block.c
index b749d31..010da3e 100644
--- a/block.c
+++ b/block.c
@@ -1275,6 +1275,33 @@ out:
 g_free(tmp_filename);
 }
 
+static QDict *parse_json_filename(const char *filename, Error **errp)
+{
+QObject *options_obj;
+QDict *options;
+int ret;
+
+ret = strstart(filename, "json:", &filename);
+assert(ret);
+
+options_obj = qobject_from_json(filename);
+if (!options_obj) {
+error_setg(errp, "Could not parse the JSON options");
+return NULL;
+}
+
+if (qobject_type(options_obj) != QTYPE_QDICT) {
+qobject_decref(options_obj);
+error_setg(errp, "Invalid JSON object given");
+return NULL;
+}
+
+options = qobject_to_qdict(options_obj);
+qdict_flatten(options);
+
+return options;
+}
+
 /*
  * Opens a disk image (raw, qcow2, vmdk, ...)
  *
@@ -1337,6 +1364,26 @@ int bdrv_open(BlockDriverState **pbs, const char 
*filename,
 options = qdict_new();
 }
 
+if (filename && g_str_has_prefix(filename, "json:")) {
+QDict *json_options = parse_json_filename(filename, &local_err);
+if (local_err) {
+ret = -EINVAL;
+goto fail;
+}
+
+qdict_join(options, json_options, false);
+if (qdict_size(json_options) > 0) {
+error_setg(errp, "Option \"%s\" specified both directly and in "
+   "the JSON filename", qdict_first(json_options)->key);
+QDECREF(json_options);
+ret = -EINVAL;
+goto fail;
+}
+
+QDECREF(json_options);
+filename = NULL;
+}
+
 bs->options = options;
 options = qdict_clone_shallow(options);
 
-- 
1.9.2




Re: [Qemu-devel] [PATCH 3/4] block: Allow JSON filenames

2014-05-06 Thread Max Reitz

On 06.05.2014 22:28, Eric Blake wrote:

On 05/06/2014 02:00 PM, Max Reitz wrote:


-drive
file='json:{"driver":"qcow2","file.filename":"foo","backing.file.driver":"raw"}',backing.file.driver=qcow2


looks like it specifies conflicting backing.file.driver options.
Passing true means that qdict_join silently overwrites the value in
options to instead be the value in the json string; passing false means
you could flag the user error.

Yes, you're right; I'll change it.

And of course, bonus points if you enhance the testsuite to provoke the
error and prove we detect it :)


Already done. ;-)

Max



[Qemu-devel] [PATCH v2 0/4] block: Allow JSON filenames

2014-05-06 Thread Max Reitz
This series acts as some kind of alternative or v5 to the "block/json:
Add JSON protocol driver" series. It makes bdrv_open() parse filenames
prefixed by "json:" as JSON objects (discarding the prefix beforehand)
and then use the resulting QDict as the options for the block device to
be opened with a NULL filename.

The purpose of this is that it may sometimes be desirable to specify
options for a block device where only a filename can be given, e.g., for
backing files. Using this should obviously be the exception, but it is
nice to have if actually needed.


After having written this series, I do indeed agree that it is much
nicer than the old version of having a dedicated "virtual" block driver
for this purpose.

Patches 1 and 2 are taken directly from the block/json v3, patch 4 has
been enriched by an additional scenario (options given directly and
through the filename) and its comments have been adapted to the new
implementation.


v2:
 - Detect when options are given both directly and through the filename
   and report this as an error to the user [Eric]
   A corresponding test case has been added to patch 4
 - Update comments in the test added by patch 4 to match the new
   implementation [Eric]
 - The -g flag is no longer required in patch 4 (as the filename is no
   longer relevant for format detection, since it will be dropped after
   it has been parsed), so drop it from patch 4



git-backport-diff against v1:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/4:[] [--] 'qdict: Add qdict_join()'
002/4:[] [--] 'check-qdict: Add test for qdict_join()'
003/4:[0012] [FC] 'block: Allow JSON filenames'
004/4:[0035] [FC] 'iotests: Add test for the JSON protocol'



git-backport-diff against [PATCH v3 00/12] block/json:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/4:[] [--] 'qdict: Add qdict_join()'
002/4:[] [--] 'check-qdict: Add test for qdict_join()'
003/4:[down] 'block: Allow JSON filenames'
004/4:[0035] [FC] 'iotests: Add test for the JSON protocol'



Max Reitz (4):
  qdict: Add qdict_join()
  check-qdict: Add test for qdict_join()
  block: Allow JSON filenames
  iotests: Add test for the JSON protocol

 block.c|  47 
 include/qapi/qmp/qdict.h   |   3 ++
 qobject/qdict.c|  32 +++
 tests/check-qdict.c|  87 ++
 tests/qemu-iotests/089 | 132 +
 tests/qemu-iotests/089.out |  49 +
 tests/qemu-iotests/group   |   1 +
 7 files changed, 351 insertions(+)
 create mode 100755 tests/qemu-iotests/089
 create mode 100644 tests/qemu-iotests/089.out

-- 
1.9.2




[Qemu-devel] [PATCH v2 1/4] qdict: Add qdict_join()

2014-05-06 Thread Max Reitz
This function joins two QDicts by absorbing one into the other.

Signed-off-by: Max Reitz 
Reviewed-by: Benoit Canet 
Reviewed-by: Eric Blake 
---
 include/qapi/qmp/qdict.h |  3 +++
 qobject/qdict.c  | 32 
 2 files changed, 35 insertions(+)

diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h
index 1ddf97b..d68f4eb 100644
--- a/include/qapi/qmp/qdict.h
+++ b/include/qapi/qmp/qdict.h
@@ -16,6 +16,7 @@
 #include "qapi/qmp/qobject.h"
 #include "qapi/qmp/qlist.h"
 #include "qemu/queue.h"
+#include 
 #include 
 
 #define QDICT_BUCKET_MAX 512
@@ -70,4 +71,6 @@ void qdict_flatten(QDict *qdict);
 void qdict_extract_subqdict(QDict *src, QDict **dst, const char *start);
 void qdict_array_split(QDict *src, QList **dst);
 
+void qdict_join(QDict *dest, QDict *src, bool overwrite);
+
 #endif /* QDICT_H */
diff --git a/qobject/qdict.c b/qobject/qdict.c
index 42ec4c0..ea239f0 100644
--- a/qobject/qdict.c
+++ b/qobject/qdict.c
@@ -665,3 +665,35 @@ void qdict_array_split(QDict *src, QList **dst)
 qlist_append_obj(*dst, subqobj ?: QOBJECT(subqdict));
 }
 }
+
+/**
+ * qdict_join(): Absorb the src QDict into the dest QDict, that is, move all
+ * elements from src to dest.
+ *
+ * If an element from src has a key already present in dest, it will not be
+ * moved unless overwrite is true.
+ *
+ * If overwrite is true, the conflicting values in dest will be discarded and
+ * replaced by the corresponding values from src.
+ *
+ * Therefore, with overwrite being true, the src QDict will always be empty 
when
+ * this function returns. If overwrite is false, the src QDict will be empty
+ * iff there were no conflicts.
+ */
+void qdict_join(QDict *dest, QDict *src, bool overwrite)
+{
+const QDictEntry *entry, *next;
+
+entry = qdict_first(src);
+while (entry) {
+next = qdict_next(src, entry);
+
+if (overwrite || !qdict_haskey(dest, entry->key)) {
+qobject_incref(entry->value);
+qdict_put_obj(dest, entry->key, entry->value);
+qdict_del(src, entry->key);
+}
+
+entry = next;
+}
+}
-- 
1.9.2




Re: [Qemu-devel] [PATCH 3/4] block: Allow JSON filenames

2014-05-06 Thread Eric Blake
On 05/06/2014 02:00 PM, Max Reitz wrote:

>>
>> -drive
>> file='json:{"driver":"qcow2","file.filename":"foo","backing.file.driver":"raw"}',backing.file.driver=qcow2
>>
>>
>> looks like it specifies conflicting backing.file.driver options.
>> Passing true means that qdict_join silently overwrites the value in
>> options to instead be the value in the json string; passing false means
>> you could flag the user error.
> 
> Yes, you're right; I'll change it.

And of course, bonus points if you enhance the testsuite to provoke the
error and prove we detect it :)

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [RFC PATCH v2 00/16] visitor+BER migration format

2014-05-06 Thread Michael S. Tsirkin
On Tue, May 06, 2014 at 07:58:07PM +0100, Dr. David Alan Gilbert wrote:
> * Markus Armbruster (arm...@redhat.com) wrote:
> > "Michael S. Tsirkin"  writes:
> 
> 
> 
> > > OK but for a new machine type, let's default to BER, right?
> > > I see no reason to keep supporting when non-BER when -M specifies 2.1
> > > compatibility, do you?
> > 
> > I fail to see the relation between machine type and migration's wire
> > encoding.
> 
> New machine types are a useful but not definitive line in the sand.  If
> you enable something/change the default on a new machine type you know
> it won't break any existing users since there aren't any.
> 
> Dve

Exactly. And on the other hand, someone enabling old machine type
and doing live migration is likely to want to be compatible with old
qemu wrt migration.

> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH 3/4] block: Allow JSON filenames

2014-05-06 Thread Max Reitz

On 06.05.2014 21:57, Eric Blake wrote:

On 05/06/2014 01:30 PM, Max Reitz wrote:

If the filename given to bdrv_open() is prefixed with "json:", parse the
rest as a JSON object and use the result as the options QDict.

Signed-off-by: Max Reitz 
---
  block.c | 41 +
  1 file changed, 41 insertions(+)

  /*
   * Opens a disk image (raw, qcow2, vmdk, ...)
   *
@@ -1337,6 +1364,20 @@ int bdrv_open(BlockDriverState **pbs, const char 
*filename,
  options = qdict_new();
  }
  
+if (filename && g_str_has_prefix(filename, "json:")) {

+QDict *json_options = parse_json_filename(filename, &local_err);
+if (local_err) {
+ret = -EINVAL;
+goto fail;
+}
+
+qdict_join(options, json_options, true);
+assert(qdict_size(json_options) == 0);

Would it be better to pass false to qdict_join(), and then raise an
error if the user specified conflicting options?  For example (untested,
just typing off the top of my head here),

-drive
file='json:{"driver":"qcow2","file.filename":"foo","backing.file.driver":"raw"}',backing.file.driver=qcow2

looks like it specifies conflicting backing.file.driver options.
Passing true means that qdict_join silently overwrites the value in
options to instead be the value in the json string; passing false means
you could flag the user error.


Yes, you're right; I'll change it.

Max


+QDECREF(json_options);
+
+filename = NULL;
+}
+
  bs->options = options;
  options = qdict_clone_shallow(options);






Re: [Qemu-devel] [RFC 0/5] Allow object-add on X86CPU subclasses, for CPU model probing

2014-05-06 Thread Eduardo Habkost
On Tue, May 06, 2014 at 10:01:11PM +0200, Igor Mammedov wrote:
> On Tue, 6 May 2014 11:42:56 -0300
> Eduardo Habkost  wrote:
> 
> > On Tue, May 06, 2014 at 09:22:38AM +0200, Igor Mammedov wrote:
> > > On Fri, 2 May 2014 11:43:05 -0300
> > > Eduardo Habkost  wrote:
> > > 
> > > > On Fri, May 02, 2014 at 03:45:03PM +0200, Igor Mammedov wrote:
> > > > > On Wed, 30 Apr 2014 17:29:28 -0300
> > > > > Eduardo Habkost  wrote:
> > > > > 
> > > > > > This series allows management code to use object-add on X86CPU 
> > > > > > subclasses, so it
> > > > > Is there any reason why "device-add" couldn't be used?
> > > > 
> > > > It needs to work with "-machine none", device_add requires a bus to
> > > > exist, and there is no icc-bus on machine_none.
> > > The thing is that CPUID is a function of machine so using
> > > "-machine none" will provide only approximately accurate data.
> > > I'm not sure that retrieved possibly not accurate data are useful
> > > for libvirt.
> > 
> > "-cpu host" doesn't depend on machine, and is the most important thing
> > right now (because libvirt's checks for host QEMU/kernel/CPU
> > capabilities is completely broken).
> true, but device-add/-cpu host could be used right now to get the
> same CPUID data wile using any machine type or default one without
> any of this patches.

device_add can't be used with "-machine none".

> 
> > 
> > About machine-type data: currently libvirt has no concept of
> > per-machine-type CPU model data, anyway. We (and libvirt) will need to
> > address this eventually, but considering our track record, I bet the
> > QEMU community will take a few years to finally provide that info to
> > libvirt.
> I don't think QEMU is issue here, libvirt can use device-add to
> probe accurate CPUID on all CPU models on a given machine type now.
> libvirt should be fixed to care about machine type and use it to get
> correct CPUID data that QEMU provides.

True, it should. But we still need a solution for the "-cpu host" problem.

> 
> > 
> > In the meantime, we have a partial solution that fits the current
> > libvirt data model and is better than the current state (where libvirt
> > has to duplicate the CPU model data).
> Replacing one set of inaccurate CPUIDs with another is hardly solution.

We could continue arguing about this, but let's ignore the issue about
probing for CPU model information by now, and let's focus on host
capability probing ("-cpu host"), then.

How do you propose fixing that in a reasonable time (in QEMU 2.1)
without requiring libvirt to re-run QEMU?


> 
> > 
> > Maybe they will use this only for "host", maybe they will use this for
> > all the other CPU models, we are just providing the mechanism, it's
> > their decision to use it or not.
> As I've said above libvirt could use device-add xxx-host or -cpu host
> to get it and second point I object to is providing yet another way
> to produce inaccurate CPUID info (libvirt has one already) and to do
> so hack [patches 1-3] CPU infrastructure to run out of context it's
> supposed to run in. While current API already allows to get correct
> CPUID data.
> 
> IMHO, libvirt side should take advantage of information QEMU already
> provides.
> 

Current API requires re-running QEMU to query the information. This
series allows it to be run with the "-machine none" QEMU instance that
is already run by libvirt.


> > 
> > 
> > > 
> > > > 
> > > > The first thing I considered was making icc-bus user-creatable. Then I
> > > > noticed it wouldn't work because object-add always add objects to
> > > > /objects, not inside the qdev hierarchy (that's where device_add looks
> > > > for the bus).
> > > > 
> > > > So, allowing device_add could be possible, but would require changing
> > > > more basic infrastructure: either allowing bus-less devices on
> > > > device_add, or allowing device_add to add devices outside the qdev
> > > > hierarchy, or allowing object-add to create objects outside /objects.
> > > > 
> > > > Simply making CPU objects work with object-add was much simpler and less
> > > > intrusive. And it had the interesting side-effect of _not_ doing things
> > > > that are not required for CPU model probing (like creating an actual
> > > > VCPU thread).
> > > > 
> > > > > 
> > > > > 
> > > > > > can use it to probe for CPU model information without re-running 
> > > > > > QEMU. The main
> > > > > > use case for this is to allow management code to create CPU objects 
> > > > > > and query
> > > > > > the "feature-words" and "filtered-features" properties on the new 
> > > > > > objects, to
> > > > > > find out which features each CPU model needs, and to do the same 
> > > > > > using the
> > > > > > "host" CPU model to check which features can be enabled in a given 
> > > > > > host.
> > > > > > 
> > > > > > There's experimental libvirt code to use the new command at:
> > > > > > 
> > > > > > https://github.com/ehabkost/libvirt/tree/work/cpu-feature-word-query
> > > > > > The experimental code just create

Re: [Qemu-devel] [PATCH 4/4] iotests: Add test for the JSON protocol

2014-05-06 Thread Max Reitz

On 06.05.2014 22:04, Eric Blake wrote:

On 05/06/2014 01:30 PM, Max Reitz wrote:

Add a test for the JSON protocol driver.

Signed-off-by: Max Reitz 
Reviewed-by: Benoit Canet 
---



+echo
+echo "=== Testing qemu-img info output ==="
+echo
+
+# This should output information about the image itself, not about the JSON
+# block device.

Is this comment a bit stale, now that there is no separate JSON block
device?


Probably, yes, I just took it from the old series and was glad it just 
worked. :-)


Max



Re: [Qemu-devel] [PATCH 4/4] iotests: Add test for the JSON protocol

2014-05-06 Thread Eric Blake
On 05/06/2014 01:30 PM, Max Reitz wrote:
> Add a test for the JSON protocol driver.
> 
> Signed-off-by: Max Reitz 
> Reviewed-by: Benoit Canet 
> ---


> +echo
> +echo "=== Testing qemu-img info output ==="
> +echo
> +
> +# This should output information about the image itself, not about the JSON
> +# block device.

Is this comment a bit stale, now that there is no separate JSON block
device?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] virtio-serial-pci very expensive during live migration

2014-05-06 Thread Chris Friesen

Hi,

I recently made the unfortunate discovery that virtio-serial-pci is 
quite expensive to stop/start during live migration.


By default we support 32 ports, each of which uses 2 queues.  In my case 
it takes 2-3ms per queue to disconnect on the source host, and another 
2-3ms per queue to connect on the target host, for a total cost of >300ms.


In our case this roughly tripled the outage times of a libvirt-based 
live migration, from 150ms to almost 500ms.


It seems like the main problem is that we loop over all the queues, 
calling virtio_pci_set_host_notifier_internal() on each of them.  That 
in turn calls memory_region_add_eventfd(), which calls 
memory_region_transaction_commit(), which scans over all the address 
spaces, which seems to take the vast majority of the time.


Yes, setting the max_ports value to something smaller does help, but 
each port still adds 10-12ms to the overall live migration time, which 
is crazy.


Is there anything that could be done to make this code more efficient? 
Could we tweak the API so that we add all the eventfds and then do a 
single commit at the end?  Do we really need to scan the entire address 
space?  I don't know the code well enough to answer that sort of 
question, but I'm hoping that one of you does.


Thanks,
Chris



Re: [Qemu-devel] [RFC 0/5] Allow object-add on X86CPU subclasses, for CPU model probing

2014-05-06 Thread Igor Mammedov
On Tue, 6 May 2014 11:42:56 -0300
Eduardo Habkost  wrote:

> On Tue, May 06, 2014 at 09:22:38AM +0200, Igor Mammedov wrote:
> > On Fri, 2 May 2014 11:43:05 -0300
> > Eduardo Habkost  wrote:
> > 
> > > On Fri, May 02, 2014 at 03:45:03PM +0200, Igor Mammedov wrote:
> > > > On Wed, 30 Apr 2014 17:29:28 -0300
> > > > Eduardo Habkost  wrote:
> > > > 
> > > > > This series allows management code to use object-add on X86CPU 
> > > > > subclasses, so it
> > > > Is there any reason why "device-add" couldn't be used?
> > > 
> > > It needs to work with "-machine none", device_add requires a bus to
> > > exist, and there is no icc-bus on machine_none.
> > The thing is that CPUID is a function of machine so using
> > "-machine none" will provide only approximately accurate data.
> > I'm not sure that retrieved possibly not accurate data are useful
> > for libvirt.
> 
> "-cpu host" doesn't depend on machine, and is the most important thing
> right now (because libvirt's checks for host QEMU/kernel/CPU
> capabilities is completely broken).
true, but device-add/-cpu host could be used right now to get the
same CPUID data wile using any machine type or default one without
any of this patches.

> 
> About machine-type data: currently libvirt has no concept of
> per-machine-type CPU model data, anyway. We (and libvirt) will need to
> address this eventually, but considering our track record, I bet the
> QEMU community will take a few years to finally provide that info to
> libvirt.
I don't think QEMU is issue here, libvirt can use device-add to
probe accurate CPUID on all CPU models on a given machine type now.
libvirt should be fixed to care about machine type and use it to get
correct CPUID data that QEMU provides.

> 
> In the meantime, we have a partial solution that fits the current
> libvirt data model and is better than the current state (where libvirt
> has to duplicate the CPU model data).
Replacing one set of inaccurate CPUIDs with another is hardly solution.

> 
> Maybe they will use this only for "host", maybe they will use this for
> all the other CPU models, we are just providing the mechanism, it's
> their decision to use it or not.
As I've said above libvirt could use device-add xxx-host or -cpu host
to get it and second point I object to is providing yet another way
to produce inaccurate CPUID info (libvirt has one already) and to do
so hack [patches 1-3] CPU infrastructure to run out of context it's
supposed to run in. While current API already allows to get correct
CPUID data.

IMHO, libvirt side should take advantage of information QEMU already
provides.

> 
> 
> > 
> > > 
> > > The first thing I considered was making icc-bus user-creatable. Then I
> > > noticed it wouldn't work because object-add always add objects to
> > > /objects, not inside the qdev hierarchy (that's where device_add looks
> > > for the bus).
> > > 
> > > So, allowing device_add could be possible, but would require changing
> > > more basic infrastructure: either allowing bus-less devices on
> > > device_add, or allowing device_add to add devices outside the qdev
> > > hierarchy, or allowing object-add to create objects outside /objects.
> > > 
> > > Simply making CPU objects work with object-add was much simpler and less
> > > intrusive. And it had the interesting side-effect of _not_ doing things
> > > that are not required for CPU model probing (like creating an actual
> > > VCPU thread).
> > > 
> > > > 
> > > > 
> > > > > can use it to probe for CPU model information without re-running 
> > > > > QEMU. The main
> > > > > use case for this is to allow management code to create CPU objects 
> > > > > and query
> > > > > the "feature-words" and "filtered-features" properties on the new 
> > > > > objects, to
> > > > > find out which features each CPU model needs, and to do the same 
> > > > > using the
> > > > > "host" CPU model to check which features can be enabled in a given 
> > > > > host.
> > > > > 
> > > > > There's experimental libvirt code to use the new command at:
> > > > > 
> > > > > https://github.com/ehabkost/libvirt/tree/work/cpu-feature-word-query
> > > > > The experimental code just create the CPU objects to query for feature
> > > > > information, but doesn't do anything with that data.
> > > > > 
> > > > > Eduardo Habkost (5):
> > > > >   cpu: Initialize cpu->stopped=true earlier
> > > > >   cpu: Don't try to pause CPUs if they are already stopped
> > > > >   pc: Don't crash on apic_accept_pic_intr() if CPU has no apic_state
> > > > >   target-i386: Make CPU objects user-creatable
> > > > >   target-i386: Report QOM class name for CPU definitions
> > > > > 
> > > > >  cpus.c| 13 ++---
> > > > >  exec.c|  1 +
> > > > >  hw/i386/pc.c  |  2 +-
> > > > >  qapi-schema.json  |  6 +-
> > > > >  target-i386/cpu.c |  7 +++
> > > > >  5 files changed, 24 insertions(+), 5 deletions(-)
> > > > > 
> > > > 
> > > 
> > 
> 
> -- 
> Eduardo
> 


-- 
Regards,
  Igor



Re: [Qemu-devel] [PATCH 3/4] block: Allow JSON filenames

2014-05-06 Thread Eric Blake
On 05/06/2014 01:30 PM, Max Reitz wrote:
> If the filename given to bdrv_open() is prefixed with "json:", parse the
> rest as a JSON object and use the result as the options QDict.
> 
> Signed-off-by: Max Reitz 
> ---
>  block.c | 41 +
>  1 file changed, 41 insertions(+)
> 

>  /*
>   * Opens a disk image (raw, qcow2, vmdk, ...)
>   *
> @@ -1337,6 +1364,20 @@ int bdrv_open(BlockDriverState **pbs, const char 
> *filename,
>  options = qdict_new();
>  }
>  
> +if (filename && g_str_has_prefix(filename, "json:")) {
> +QDict *json_options = parse_json_filename(filename, &local_err);
> +if (local_err) {
> +ret = -EINVAL;
> +goto fail;
> +}
> +
> +qdict_join(options, json_options, true);
> +assert(qdict_size(json_options) == 0);

Would it be better to pass false to qdict_join(), and then raise an
error if the user specified conflicting options?  For example (untested,
just typing off the top of my head here),

-drive
file='json:{"driver":"qcow2","file.filename":"foo","backing.file.driver":"raw"}',backing.file.driver=qcow2

looks like it specifies conflicting backing.file.driver options.
Passing true means that qdict_join silently overwrites the value in
options to instead be the value in the json string; passing false means
you could flag the user error.

> +QDECREF(json_options);
> +
> +filename = NULL;
> +}
> +
>  bs->options = options;
>  options = qdict_clone_shallow(options);
>  
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward

2014-05-06 Thread Marcin Gibuła

Yes, and it isn't. Any ideas why it's not? This patch really just uses
the guest visible kvmclock time rather than the host view of it on
migration.

There is definitely something very broken on the host's side since it
does return a smaller time than the guest exposed interface indicates.


Don't know if helps but here are example values from time_at_migration 
and s->clock from your patch.


Tested on 5 restores of saved VM that (used to) hang:

   s->clock  time_at_migration
157082235125698  157113284546655
157082235125698  157113298196976
157082235125698  157113284615117
157082235125698  157113284486601
157082235125698  157113284479740

Now, when I compare system time on guest with and without patch:

On unpatched qemu vm restores with date: Apr 18 06:56:36
On patched qemu it says: Apr 18 06:57:06

--
mg



Re: [Qemu-devel] [PATCH] block: Fix bdrv_is_allocated() for short backing files

2014-05-06 Thread Max Reitz

On 06.05.2014 15:30, Kevin Wolf wrote:

bdrv_is_allocated() shouldn't return true for sectors that are
unallocated, but after the end of a short backing file, even though
such sectors are (correctly) marked as containing zeros.

Signed-off-by: Kevin Wolf 
---
  block.c   |  8 +---
  include/block/block.h | 11 +++
  2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index c90c71a..d3a9906 100644
--- a/block.c
+++ b/block.c
@@ -3883,6 +3883,10 @@ static int64_t coroutine_fn 
bdrv_co_get_block_status(BlockDriverState *bs,
   *pnum, pnum);
  }
  
+if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {

+ret |= BDRV_BLOCK_ALLOCATED;
+}
+


Shouldn't BDRV_BLOCK_ALLOCATED be set in the 
!bs->drv->bdrv_co_get_block_status case as well?



  if (!(ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO)) {
  if (bdrv_unallocated_blocks_are_zero(bs)) {
  ret |= BDRV_BLOCK_ZERO;
@@ -3959,9 +3963,7 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, 
int64_t sector_num,
  if (ret < 0) {
  return ret;
  }
-return
-(ret & BDRV_BLOCK_DATA) ||
-((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs));
+return (ret & BDRV_BLOCK_ALLOCATED);
  }
  
  /*

diff --git a/include/block/block.h b/include/block/block.h
index 2fda81c..ad4c7e8 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -116,6 +116,8 @@ typedef enum {
  /* BDRV_BLOCK_DATA: data is read from bs->file or another file
   * BDRV_BLOCK_ZERO: sectors read as zero
   * BDRV_BLOCK_OFFSET_VALID: sector stored in bs->file as raw data
+ * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
+ *   layer (as opposed to the backing file)


I guess this is above BDRV_BLOCK_RAW (albeit having a greater value) 
because it is not only used internally? (to pick up on the topic of OCD :-P)


Max


   * BDRV_BLOCK_RAW: used internally to indicate that the request
   * was answered by the raw driver and that one
   * should look in bs->file directly.
@@ -137,10 +139,11 @@ typedef enum {
   *  ftf   not allocated or unknown offset, read as zero
   *  fff   not allocated or unknown offset, read from 
backing_hd
   */
-#define BDRV_BLOCK_DATA 1
-#define BDRV_BLOCK_ZERO 2
-#define BDRV_BLOCK_OFFSET_VALID 4
-#define BDRV_BLOCK_RAW  8
+#define BDRV_BLOCK_DATA 0x01
+#define BDRV_BLOCK_ZERO 0x02
+#define BDRV_BLOCK_OFFSET_VALID 0x04
+#define BDRV_BLOCK_RAW  0x08
+#define BDRV_BLOCK_ALLOCATED0x10
  #define BDRV_BLOCK_OFFSET_MASK  BDRV_SECTOR_MASK
  
  typedef enum {





Re: [Qemu-devel] qemu leaving unix sockets behind after VM is shut down

2014-05-06 Thread Chris Friesen

On 05/06/2014 07:39 AM, Stefan Hajnoczi wrote:

On Tue, Apr 01, 2014 at 02:34:58PM -0600, Chris Friesen wrote:

When running qemu with something like this

-device virtio-serial \
-chardev socket,path=/tmp/foo,server,nowait,id=foo \
-device virtserialport,chardev=foo,name=host.port.0

the VM starts up as expected and creates a socket at /tmp/foo as expected.

However, when I shut down the VM the socket at /tmp/foo is left
behind in the filesystem.  Basically qemu has "leaked" a file.

With something like OpenStack where we could be creating/destroying
many VMs this could end up creating a significant number of files in
the specified directory.

Has any thought been given to either automatically cleaning up the
unix socket in the filesystem when qemu exits, or else supporting
the abstract namespace for unix sockets to allow for automatic
cleanup?


Libvirt has a special case for the monitor socket in its
qemuProcessStop() function.

Are you using the OpenStack libvirt driver?

Perhaps QEMU should support cleanup but first I think we should check
the situation with libvirt.


Yes, I am in fact using OpenStack/libvirt, and did eventually track down 
libvirt as the code that was cleaning up the monitor socket.


Even so, I think this sort of change would be valid in qemu itself. qemu 
created the files, so really it should be up to qemu to delete them when 
it's done with them.


They're not usable for anything with qemu not running, so there's no 
good reason to leave them laying around.


Chris






[Qemu-devel] [PATCH 0/4] block: Allow JSON filenames

2014-05-06 Thread Max Reitz
This series acts as some kind of alternative or v4 to the "block/json:
Add JSON protocol driver" series. It makes bdrv_open() parse filenames
prefixed by "json:" as JSON objects (discarding the prefix beforehand)
and then use the resulting QDict as the options for the block device to
be opened with a NULL filename.

The purpose of this is that it may sometimes be desirable to specify
options for a block device where only a filename can be given, e.g., for
backing files. Using this should obviously be the exception, but it is
nice to have if actually needed.


After having written this series, I do indeed agree that it is much
nicer than the old version of having a dedicated "virtual" block driver
for this purpose.

Patches 1, 2 and 4 are taken directly from the block/json v3 (see
backport-diff output).


git-backport-diff against [PATCH v3 00/12] block/json:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/4:[] [--] 'qdict: Add qdict_join()'
002/4:[] [--] 'check-qdict: Add test for qdict_join()'
003/4:[down] 'block: Allow JSON filenames'
004/4:[] [-C] 'iotests: Add test for the JSON protocol'


Max Reitz (4):
  qdict: Add qdict_join()
  check-qdict: Add test for qdict_join()
  block: Allow JSON filenames
  iotests: Add test for the JSON protocol

 block.c|  41 +++
 include/qapi/qmp/qdict.h   |   3 ++
 qobject/qdict.c|  32 
 tests/check-qdict.c|  87 
 tests/qemu-iotests/089 | 123 +
 tests/qemu-iotests/089.out |  39 ++
 tests/qemu-iotests/group   |   1 +
 7 files changed, 326 insertions(+)
 create mode 100755 tests/qemu-iotests/089
 create mode 100644 tests/qemu-iotests/089.out

-- 
1.9.2




[Qemu-devel] [PATCH 2/4] check-qdict: Add test for qdict_join()

2014-05-06 Thread Max Reitz
Add some test cases for qdict_join().

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Benoit Canet 
---
 tests/check-qdict.c | 87 +
 1 file changed, 87 insertions(+)

diff --git a/tests/check-qdict.c b/tests/check-qdict.c
index 2ad0f78..a9296f0 100644
--- a/tests/check-qdict.c
+++ b/tests/check-qdict.c
@@ -444,6 +444,92 @@ static void qdict_array_split_test(void)
 QDECREF(test_dict);
 }
 
+static void qdict_join_test(void)
+{
+QDict *dict1, *dict2;
+bool overwrite = false;
+int i;
+
+dict1 = qdict_new();
+dict2 = qdict_new();
+
+
+/* Test everything once without overwrite and once with */
+do
+{
+/* Test empty dicts */
+qdict_join(dict1, dict2, overwrite);
+
+g_assert(qdict_size(dict1) == 0);
+g_assert(qdict_size(dict2) == 0);
+
+
+/* First iteration: Test movement */
+/* Second iteration: Test empty source and non-empty destination */
+qdict_put(dict2, "foo", qint_from_int(42));
+
+for (i = 0; i < 2; i++) {
+qdict_join(dict1, dict2, overwrite);
+
+g_assert(qdict_size(dict1) == 1);
+g_assert(qdict_size(dict2) == 0);
+
+g_assert(qdict_get_int(dict1, "foo") == 42);
+}
+
+
+/* Test non-empty source and destination without conflict */
+qdict_put(dict2, "bar", qint_from_int(23));
+
+qdict_join(dict1, dict2, overwrite);
+
+g_assert(qdict_size(dict1) == 2);
+g_assert(qdict_size(dict2) == 0);
+
+g_assert(qdict_get_int(dict1, "foo") == 42);
+g_assert(qdict_get_int(dict1, "bar") == 23);
+
+
+/* Test conflict */
+qdict_put(dict2, "foo", qint_from_int(84));
+
+qdict_join(dict1, dict2, overwrite);
+
+g_assert(qdict_size(dict1) == 2);
+g_assert(qdict_size(dict2) == !overwrite);
+
+g_assert(qdict_get_int(dict1, "foo") == overwrite ? 84 : 42);
+g_assert(qdict_get_int(dict1, "bar") == 23);
+
+if (!overwrite) {
+g_assert(qdict_get_int(dict2, "foo") == 84);
+}
+
+
+/* Check the references */
+g_assert(qdict_get(dict1, "foo")->refcnt == 1);
+g_assert(qdict_get(dict1, "bar")->refcnt == 1);
+
+if (!overwrite) {
+g_assert(qdict_get(dict2, "foo")->refcnt == 1);
+}
+
+
+/* Clean up */
+qdict_del(dict1, "foo");
+qdict_del(dict1, "bar");
+
+if (!overwrite) {
+qdict_del(dict2, "foo");
+}
+}
+while (overwrite ^= true);
+
+
+QDECREF(dict1);
+QDECREF(dict2);
+}
+
 /*
  * Errors test-cases
  */
@@ -584,6 +670,7 @@ int main(int argc, char **argv)
 g_test_add_func("/public/iterapi", qdict_iterapi_test);
 g_test_add_func("/public/flatten", qdict_flatten_test);
 g_test_add_func("/public/array_split", qdict_array_split_test);
+g_test_add_func("/public/join", qdict_join_test);
 
 g_test_add_func("/errors/put_exists", qdict_put_exists_test);
 g_test_add_func("/errors/get_not_exists", qdict_get_not_exists_test);
-- 
1.9.2




[Qemu-devel] [PATCH 4/4] iotests: Add test for the JSON protocol

2014-05-06 Thread Max Reitz
Add a test for the JSON protocol driver.

Signed-off-by: Max Reitz 
Reviewed-by: Benoit Canet 
---
 tests/qemu-iotests/089 | 123 +
 tests/qemu-iotests/089.out |  39 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 163 insertions(+)
 create mode 100755 tests/qemu-iotests/089
 create mode 100644 tests/qemu-iotests/089.out

diff --git a/tests/qemu-iotests/089 b/tests/qemu-iotests/089
new file mode 100755
index 000..75cb0c2
--- /dev/null
+++ b/tests/qemu-iotests/089
@@ -0,0 +1,123 @@
+#!/bin/bash
+#
+# Test case for the JSON block protocol
+#
+# Copyright (C) 2014 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=mre...@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+tmp=/tmp/$$
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+# Using an image filename containing quotation marks will render the JSON data
+# below invalid. In that case, we have little choice but simply not to run this
+# test.
+case $TEST_IMG in
+*'"'*)
+_notrun "image filename may not contain quotation marks"
+;;
+esac
+
+IMG_SIZE=64M
+
+# Taken from test 072
+echo
+echo "=== Testing nested image formats ==="
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img $IMG_SIZE
+
+$QEMU_IO -c 'write -P 42 0 512' -c 'write -P 23 512 512' \
+ -c 'write -P 66 1024 512' "$TEST_IMG.base" | _filter_qemu_io
+
+$QEMU_IMG convert -f raw -O $IMGFMT "$TEST_IMG.base" "$TEST_IMG"
+
+$QEMU_IO -c 'read -P 42 0 512' -c 'read -P 23 512 512' \
+ -c 'read -P 66 1024 512' "json:{
+\"driver\": \"$IMGFMT\",
+\"file\": {
+\"driver\": \"$IMGFMT\",
+\"file\": {
+\"filename\": \"$TEST_IMG\"
+}
+}
+}" | _filter_qemu_io
+
+# This should fail (see test 072)
+$QEMU_IO -c 'read -P 42 0 512' "$TEST_IMG" | _filter_qemu_io
+
+
+# Taken from test 071
+echo
+echo "=== Testing blkdebug ==="
+echo
+
+_make_test_img $IMG_SIZE
+
+$QEMU_IO -c 'write -P 42 0x38000 512' "$TEST_IMG" | _filter_qemu_io
+
+# The "image.filename" part tests whether "a": { "b": "c" } and "a.b": "c" do
+# the same (which they should).
+# This has to use -g to force qemu-io to use BDRV_O_PROTOCOL, since it will try
+# to determine the format of the file otherwise; due to the complexity of the
+# filename, only raw (or json itself) will work and this will then result in an
+# error because of the blkdebug part. Thus, use -g.
+$QEMU_IO -c 'read -P 42 0x38000 512' -g "json:{
+\"driver\": \"$IMGFMT\",
+\"file\": {
+\"driver\": \"blkdebug\",
+\"inject-error\": [{
+\"event\": \"l2_load\"
+}],
+\"image.filename\": \"$TEST_IMG\"
+}
+}" | _filter_qemu_io
+
+
+echo
+echo "=== Testing qemu-img info output ==="
+echo
+
+# This should output information about the image itself, not about the JSON
+# block device.
+$QEMU_IMG info "json:{\"driver\":\"qcow2\",\"file.filename\":\"$TEST_IMG\"}" \
+| _filter_testdir | _filter_imgfmt
+
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/089.out b/tests/qemu-iotests/089.out
new file mode 100644
index 000..4804ff0
--- /dev/null
+++ b/tests/qemu-iotests/089.out
@@ -0,0 +1,39 @@
+QA output created by 089
+
+=== Testing nested image formats ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864 
+wrote 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 1024
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 1024
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Pattern verification failed at offset 0, 512 bytes
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Testing blkdebug ===
+
+Format

[Qemu-devel] [PATCH 3/4] block: Allow JSON filenames

2014-05-06 Thread Max Reitz
If the filename given to bdrv_open() is prefixed with "json:", parse the
rest as a JSON object and use the result as the options QDict.

Signed-off-by: Max Reitz 
---
 block.c | 41 +
 1 file changed, 41 insertions(+)

diff --git a/block.c b/block.c
index b749d31..2555e34 100644
--- a/block.c
+++ b/block.c
@@ -1275,6 +1275,33 @@ out:
 g_free(tmp_filename);
 }
 
+static QDict *parse_json_filename(const char *filename, Error **errp)
+{
+QObject *options_obj;
+QDict *options;
+int ret;
+
+ret = strstart(filename, "json:", &filename);
+assert(ret);
+
+options_obj = qobject_from_json(filename);
+if (!options_obj) {
+error_setg(errp, "Could not parse the JSON options");
+return NULL;
+}
+
+if (qobject_type(options_obj) != QTYPE_QDICT) {
+qobject_decref(options_obj);
+error_setg(errp, "Invalid JSON object given");
+return NULL;
+}
+
+options = qobject_to_qdict(options_obj);
+qdict_flatten(options);
+
+return options;
+}
+
 /*
  * Opens a disk image (raw, qcow2, vmdk, ...)
  *
@@ -1337,6 +1364,20 @@ int bdrv_open(BlockDriverState **pbs, const char 
*filename,
 options = qdict_new();
 }
 
+if (filename && g_str_has_prefix(filename, "json:")) {
+QDict *json_options = parse_json_filename(filename, &local_err);
+if (local_err) {
+ret = -EINVAL;
+goto fail;
+}
+
+qdict_join(options, json_options, true);
+assert(qdict_size(json_options) == 0);
+QDECREF(json_options);
+
+filename = NULL;
+}
+
 bs->options = options;
 options = qdict_clone_shallow(options);
 
-- 
1.9.2




[Qemu-devel] [PATCH 1/4] qdict: Add qdict_join()

2014-05-06 Thread Max Reitz
This function joins two QDicts by absorbing one into the other.

Signed-off-by: Max Reitz 
Reviewed-by: Benoit Canet 
Reviewed-by: Eric Blake 
---
 include/qapi/qmp/qdict.h |  3 +++
 qobject/qdict.c  | 32 
 2 files changed, 35 insertions(+)

diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h
index 1ddf97b..d68f4eb 100644
--- a/include/qapi/qmp/qdict.h
+++ b/include/qapi/qmp/qdict.h
@@ -16,6 +16,7 @@
 #include "qapi/qmp/qobject.h"
 #include "qapi/qmp/qlist.h"
 #include "qemu/queue.h"
+#include 
 #include 
 
 #define QDICT_BUCKET_MAX 512
@@ -70,4 +71,6 @@ void qdict_flatten(QDict *qdict);
 void qdict_extract_subqdict(QDict *src, QDict **dst, const char *start);
 void qdict_array_split(QDict *src, QList **dst);
 
+void qdict_join(QDict *dest, QDict *src, bool overwrite);
+
 #endif /* QDICT_H */
diff --git a/qobject/qdict.c b/qobject/qdict.c
index 42ec4c0..ea239f0 100644
--- a/qobject/qdict.c
+++ b/qobject/qdict.c
@@ -665,3 +665,35 @@ void qdict_array_split(QDict *src, QList **dst)
 qlist_append_obj(*dst, subqobj ?: QOBJECT(subqdict));
 }
 }
+
+/**
+ * qdict_join(): Absorb the src QDict into the dest QDict, that is, move all
+ * elements from src to dest.
+ *
+ * If an element from src has a key already present in dest, it will not be
+ * moved unless overwrite is true.
+ *
+ * If overwrite is true, the conflicting values in dest will be discarded and
+ * replaced by the corresponding values from src.
+ *
+ * Therefore, with overwrite being true, the src QDict will always be empty 
when
+ * this function returns. If overwrite is false, the src QDict will be empty
+ * iff there were no conflicts.
+ */
+void qdict_join(QDict *dest, QDict *src, bool overwrite)
+{
+const QDictEntry *entry, *next;
+
+entry = qdict_first(src);
+while (entry) {
+next = qdict_next(src, entry);
+
+if (overwrite || !qdict_haskey(dest, entry->key)) {
+qobject_incref(entry->value);
+qdict_put_obj(dest, entry->key, entry->value);
+qdict_del(src, entry->key);
+}
+
+entry = next;
+}
+}
-- 
1.9.2




Re: [Qemu-devel] [PATCH] Update QEMU checkpatch.pl to current linux version

2014-05-06 Thread Peter Maydell
On 6 May 2014 19:59, Mike Day  wrote:
> On Tue, May 6, 2014 at 2:28 PM, Peter Maydell  
> wrote:
>> A couple of ideas about how we could approach this:
>>  (1) make a commit which is simply copying the kernel's 0.32
>>   into our repo; then follow that with a series of commits which
>>   re-apply our local changes.
>>  (2) apply all the individual commits from the kernel between 0.31
>>  and 0.32 to our repo
>>  (3) give up and stick with 0.31...
>
> idea (1) makes perfect sense to me, and the local changes will be easy
> to review.
>
>> It might also be helpful if you could describe the benefits
>> we get from this update (any bugfixes for false positives we
>> tend to run into? useful new checks?)
>
> I think its a valid question whether to forward-port the Qemu checks
> to 0.32. I'm not sure, myself, but this gives folks an idea of what
> the cost/benefit is.

Well, some of our local checks we need, because our coding
style is not the same as the kernel's. In particular the brace
checks must be different.

> Yesterday I was bitten by the StudlyCaps syndrome in a patch I
> submitted which was checkpatch-clean. Afterward I noticed that version
> 0.31 has the check for StudlyCaps commented out. This is corrected in
> version 0.32 with a more ambitious check for "CamelCase."
>
> The CamelCase check in v0.32 tries to determine if the patch
> introduced StudlyCaps or if they already exist in the unpatched source
> file. (I was hoping for a three-line check I could paste into v0.31.)

If you want a camelcase check you'll need to fix it to enforce
QEMU's style requirements: struct, enum and function typenames
should be camelcase; scalar typenames and variable names not.
A quick scan of the code in this patch suggests it just rejects
all new camelcase names (I might well be misreading it, though.)

> I noticed some other new features - it uses an optional configuration
> file, checks for validly formed email addresses. The other changes I
> believe, are either experimental or providing tools for filtering or
> typing messages.

Nothing hugely important then, but if we're going to do
periodic updates it's probably going to be easier to do
them relatively frequently rather than every five years :-)

thanks
-- PMM



[Qemu-devel] [PATCH v2] block/raw-posix: Try both FIEMAP and SEEK_HOLE

2014-05-06 Thread Max Reitz
The current version of raw-posix always uses ioctl(FS_IOC_FIEMAP) if
FIEMAP is available; lseek with SEEK_HOLE/SEEK_DATA are not even
compiled in in this case. However, there may be implementations which
support the latter but not the former (e.g., NFSv4.2) as well as vice
versa.

To cover both cases, always try SEEK_HOLE/SEEK_DATA (as this will
probably be covered by POSIX soon) and if that does not work, fall back
to FIEMAP; and if that does not work either, treat everything as
allocated.

Signed-off-by: Max Reitz 
---
v2:
 - reworked using static functions [Stefan]
 - changed order of FIEMAP and SEEK_HOLE [Eric]
---
 block/raw-posix.c | 135 ++
 1 file changed, 85 insertions(+), 50 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 3ce026d..1b3900d 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -146,6 +146,12 @@ typedef struct BDRVRawState {
 bool has_discard:1;
 bool has_write_zeroes:1;
 bool discard_zeroes:1;
+#if defined SEEK_HOLE && defined SEEK_DATA
+bool skip_seek_hole;
+#endif
+#ifdef CONFIG_FIEMAP
+bool skip_fiemap;
+#endif
 } BDRVRawState;
 
 typedef struct BDRVRawReopenState {
@@ -1272,53 +1278,29 @@ static int raw_create(const char *filename, 
QEMUOptionParameter *options,
 return result;
 }
 
-/*
- * Returns true iff the specified sector is present in the disk image. Drivers
- * not implementing the functionality are assumed to not support backing files,
- * hence all their sectors are reported as allocated.
- *
- * If 'sector_num' is beyond the end of the disk image the return value is 0
- * and 'pnum' is set to 0.
- *
- * 'pnum' is set to the number of sectors (including and immediately following
- * the specified sector) that are known to be in the same
- * allocated/unallocated state.
- *
- * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
- * beyond the end of the disk image it will be clamped.
- */
-static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num,
-int nb_sectors, int *pnum)
+static int64_t try_fiemap(BlockDriverState *bs, off_t start, off_t *data,
+  off_t *hole, int nb_sectors, int *pnum)
 {
-off_t start, data, hole;
-int64_t ret;
-
-ret = fd_open(bs);
-if (ret < 0) {
-return ret;
-}
-
-start = sector_num * BDRV_SECTOR_SIZE;
-ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
-
 #ifdef CONFIG_FIEMAP
-
 BDRVRawState *s = bs->opaque;
+int64_t ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
 struct {
 struct fiemap fm;
 struct fiemap_extent fe;
 } f;
 
+if (s->skip_fiemap) {
+return -ENOTSUP;
+}
+
 f.fm.fm_start = start;
 f.fm.fm_length = (int64_t)nb_sectors * BDRV_SECTOR_SIZE;
 f.fm.fm_flags = 0;
 f.fm.fm_extent_count = 1;
 f.fm.fm_reserved = 0;
 if (ioctl(s->fd, FS_IOC_FIEMAP, &f) == -1) {
-/* Assume everything is allocated.  */
-*pnum = nb_sectors;
-return ret;
+s->skip_fiemap = true;
+return -errno;
 }
 
 if (f.fm.fm_mapped_extents == 0) {
@@ -1326,44 +1308,97 @@ static int64_t coroutine_fn 
raw_co_get_block_status(BlockDriverState *bs,
  * f.fm.fm_start + f.fm.fm_length must be clamped to the file size!
  */
 off_t length = lseek(s->fd, 0, SEEK_END);
-hole = f.fm.fm_start;
-data = MIN(f.fm.fm_start + f.fm.fm_length, length);
+*hole = f.fm.fm_start;
+*data = MIN(f.fm.fm_start + f.fm.fm_length, length);
 } else {
-data = f.fe.fe_logical;
-hole = f.fe.fe_logical + f.fe.fe_length;
+*data = f.fe.fe_logical;
+*hole = f.fe.fe_logical + f.fe.fe_length;
 if (f.fe.fe_flags & FIEMAP_EXTENT_UNWRITTEN) {
 ret |= BDRV_BLOCK_ZERO;
 }
 }
 
-#elif defined SEEK_HOLE && defined SEEK_DATA
+return ret;
+#else
+return -ENOTSUP;
+#endif
+}
 
+static int64_t try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
+ off_t *hole, int *pnum)
+{
+#if defined SEEK_HOLE && defined SEEK_DATA
 BDRVRawState *s = bs->opaque;
 
-hole = lseek(s->fd, start, SEEK_HOLE);
-if (hole == -1) {
+if (s->skip_seek_hole) {
+return -ENOTSUP;
+}
+
+*hole = lseek(s->fd, start, SEEK_HOLE);
+if (*hole == -1) {
 /* -ENXIO indicates that sector_num was past the end of the file.
  * There is a virtual hole there.  */
 assert(errno != -ENXIO);
 
-/* Most likely EINVAL.  Assume everything is allocated.  */
-*pnum = nb_sectors;
-return ret;
+s->skip_seek_hole = true;
+return -errno;
 }
 
-if (hole > start) {
-data = start;
+if (*hole > start) {
+*data = start;
 } else {
 /* On a hole. 

Re: [Qemu-devel] [PATCH] Update QEMU checkpatch.pl to current linux version

2014-05-06 Thread Mike Day
On Tue, May 6, 2014 at 2:28 PM, Peter Maydell  wrote:
> I think this is going to be difficult to review, to say the least.
>
> Where does your patch come from? Is the kernel's checkpatch.pl
> just a single commit between 0.31 and 0.32 (surely not) or
> a series of fixes?

Yes, this is today's version of the kernel's checkpatch.pl.  Version
0.32 is the product of many patches onto 0.31

> A couple of ideas about how we could approach this:
>  (1) make a commit which is simply copying the kernel's 0.32
>   into our repo; then follow that with a series of commits which
>   re-apply our local changes.
>  (2) apply all the individual commits from the kernel between 0.31
>  and 0.32 to our repo
>  (3) give up and stick with 0.31...

idea (1) makes perfect sense to me, and the local changes will be easy
to review.

> It might also be helpful if you could describe the benefits
> we get from this update (any bugfixes for false positives we
> tend to run into? useful new checks?)

I think its a valid question whether to forward-port the Qemu checks
to 0.32. I'm not sure, myself, but this gives folks an idea of what
the cost/benefit is.

Yesterday I was bitten by the StudlyCaps syndrome in a patch I
submitted which was checkpatch-clean. Afterward I noticed that version
0.31 has the check for StudlyCaps commented out. This is corrected in
version 0.32 with a more ambitious check for "CamelCase."

The CamelCase check in v0.32 tries to determine if the patch
introduced StudlyCaps or if they already exist in the unpatched source
file. (I was hoping for a three-line check I could paste into v0.31.)

I noticed some other new features - it uses an optional configuration
file, checks for validly formed email addresses. The other changes I
believe, are either experimental or providing tools for filtering or
typing messages.

Mike



Re: [Qemu-devel] [RFC PATCH v2 00/16] visitor+BER migration format

2014-05-06 Thread Dr. David Alan Gilbert
* Markus Armbruster (arm...@redhat.com) wrote:
> "Michael S. Tsirkin"  writes:



> > OK but for a new machine type, let's default to BER, right?
> > I see no reason to keep supporting when non-BER when -M specifies 2.1
> > compatibility, do you?
> 
> I fail to see the relation between machine type and migration's wire
> encoding.

New machine types are a useful but not definitive line in the sand.  If
you enable something/change the default on a new machine type you know
it won't break any existing users since there aren't any.

Dve
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [RFC PATCH V4 1/6] linux-headers: Update KVM headers from linux-3.16-rc1

2014-05-06 Thread Peter Maydell
On 5 May 2014 09:57, Pranavkumar Sawargaonkar  wrote:
> Syncup KVM related linux headers from linux-3.16-rc1
>
> Signed-off-by: Pranavkumar Sawargaonkar 
> Signed-off-by: Anup Patel 
> ---
>  linux-headers/asm-arm/kvm.h   |   10 +++--
>  linux-headers/asm-arm64/kvm.h |   10 +++--
>  linux-headers/linux/kvm.h |   27 -
>  linux-headers/linux/psci.h|   90 
> +

psci.h is a new file, so the update-linux-headers.sh needs to be updated
so that it gets sync'd. (You are using the script to generate this
patch, right? :-))

thanks
-- PMM



Re: [Qemu-devel] [PATCH v3 3/4] target-ppc: ppc can be either endian

2014-05-06 Thread Peter Maydell
On 5 May 2014 09:07, Greg Kurz  wrote:
> POWER7, POWER7+ and POWER8 families use the ILE bit of the LPCR
> special purpose register to decide the endianness to use when
> entering interrupt handlers. When running a Linux guest, this
> provides a hint on the endianness used by the kernel. From a
> QEMU point of view, the information is needed for legacy virtio
> support and crash dump support as well.

Do you care about the case of:
 * kernel bigendian
 * userspace littleendian (or vice-versa)
 * guest kernel passes virtio device through to guest userspace
 * guest userspace is doing the manipulation of the device

?

(Will Deacon just suggested this as a possibility on the
kvm-arm mailing list...)

Also, are we documenting what the process should be for a
virtio implementation to decide the endianness for a particular
architecture? I assume we'd like kvmtool and QEMU to do
the same thing rather than subtly different things...

thanks
-- PMM



Re: [Qemu-devel] [PATCH] block/raw-posix: Fall back to SEEK_HOLE/SEEK_DATA

2014-05-06 Thread Eric Blake
On 05/06/2014 11:46 AM, Max Reitz wrote:

> 
> Okay, then I'll put the functionality in own functions and reverse the
> order in v2 while keeping the fallback idea, as I think there may exist
> systems where the reverse of what this patch tries to fix is true:
> SEEK_HOLE/SEEK_DATA is not supported, but FIEMAP is.

Correct - Linux implemented FIEMAP long before SEEK_HOLE, so there is a
range of kernels where FIEMAP is all you have.  It is only non-Linux
where you will have SEEK_HOLE in isolation.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH] Update QEMU checkpatch.pl to current linux version

2014-05-06 Thread Peter Maydell
On 6 May 2014 19:16, Mike Day  wrote:
> This updates scripts/checkpatch.pl to version 0.32. Also,
> forward-ported the QEMU checks for no tabs and correct capitalization
> of "QEMU." Finally, make --no-tree the default option since this will
> be used with Qemu.
>
> Signed-off-by: Mike Day 
> ---
>
> Notes: This is a huge patch and I needed to include white space
> changes to get it applying cleanly.
> I've tested this so far with some random patches off the mailing list
> and some others that I've got around. I also tested for some intentional
> errors.
>
>  scripts/checkpatch.pl | 2544 
> -
>  1 file changed, 2120 insertions(+), 424 deletions(-)

I think this is going to be difficult to review, to say the least.

Where does your patch come from? Is the kernel's checkpatch.pl
just a single commit between 0.31 and 0.32 (surely not) or
a series of fixes?

A couple of ideas about how we could approach this:
 (1) make a commit which is simply copying the kernel's 0.32
  into our repo; then follow that with a series of commits which
  re-apply our local changes.
 (2) apply all the individual commits from the kernel between 0.31
 and 0.32 to our repo
 (3) give up and stick with 0.31...

(note that 1 and 2 here both end up with the same result as
applying this patch, but with a different commit history to get
there.)

We only have 16 commits which touch checkpatch.pl, so
my favourite of these is (1). This also makes it easier when
we want to upgrade to 0.33 some time in the future, I think.

It might also be helpful if you could describe the benefits
we get from this update (any bugfixes for false positives we
tend to run into? useful new checks?)

thanks
-- PMM



Re: [Qemu-devel] [PATCH v6 2/2] pic: use emulated lapic version 0x14 on pc machines >= 2.1

2014-05-06 Thread Gabriel L. Somlo
Michael,

Once you decide it's time to pick this up and apply it, the subject
line should be:

s/^pic:/apic:/

Please let me know if you'd rather have me send out a v7 for this
as opposed to you being able to fix it when you apply.

Thanks much,
--Gabriel


On Tue, May 06, 2014 at 11:17:25AM -0400, Gabriel L. Somlo wrote:
> Add "version" property to local apic, and have it default to
> 0x14 for pc machines starting at 2.1. For compatibility with
> previous releases, pc machines up to 2.0 will have their local
> apic version set to 0x11.
> 
> Signed-off-by: Gabriel L. Somlo 
> Acked-by: Alexander Graf 
> Acked-by: Michael S. Tsirkin 
> Reviewed-by: Andreas F??rber 



Re: [Qemu-devel] [PATCH v3 00/12] block/json: Add JSON protocol driver

2014-05-06 Thread Max Reitz

On 06.05.2014 10:10, Stefan Hajnoczi wrote:

On Mon, May 05, 2014 at 06:19:07PM +0200, Max Reitz wrote:

On 10.04.2014 20:43, Max Reitz wrote:

This series adds a passthrough JSON protocol block driver. Its filenames
are JSON objects prefixed by "json:". The objects are used as options
for opening another block device which will be the child of the JSON
device. Regarding this child device, the JSON driver behaves nearly the
same as raw_bsd in that it is just a passthrough driver. The only
difference is probably that the JSON driver identifies itself as a block
filter, in contrast to raw_bsd.

The purpose of this driver is that it may sometimes be desirable to
specify options for a block device where only a filename can be given,
e.g., for backing files. Using this should obviously be the exception,
but it is nice to have if actually needed.

Ping – I do understand that Kevin has reservations against this
series, but as long as he doesn't explicitly ask me to reimplement
this in bdrv_open() without an own block driver (which I'd more or
less gladly do), I do not see issues why this series should not be
merged.

I haven't reviewed it further because it seems like a kludge (that we
have to keep supporting once it's merged).  Was hoping you and Kevin
will come up with a long-term fix instead.


Okay, if you think the same, I guess I'll have to rewrite this series. I 
agree that including this functionality in bdrv_open() is the nicer 
alternative from the user's point of view, whereas I think using a 
separate block driver results in nicer code.


I'll rewrite this series and then we'll see how bad it actually looks. ;-)

Max



Re: [Qemu-devel] [PATCH] block/raw-posix: Fall back to SEEK_HOLE/SEEK_DATA

2014-05-06 Thread Max Reitz

On 06.05.2014 14:27, Eric Blake wrote:

On 05/06/2014 05:49 AM, Stefan Hajnoczi wrote:

On Mon, May 05, 2014 at 10:01:39PM +0200, Max Reitz wrote:

The current version of raw-posix always uses ioctl(FS_IOC_FIEMAP) if
FIEMAP is available; lseek with SEEK_HOLE/SEEK_DATA are not even
compiled in in this case. However, there may be implementations which
support the latter but not the former (e.g., NFSv4.2). In this case,
raw-posix should fall back to lseek with SEEK_HOLE/SEEK_DATA if FIEMAP
does not work.


A bigger cleanup is extracting the FIEMAP and SEEK_HOLE/SEEK_DATA
implementations into their own static functions.  Then
raw_co_get_block_status() becomes simpler and doesn't need ifdefs:

ret = try_fiemap(...);
if (ret < 0) {
 ret = try_seekhole(...);
}
if (ret < 0) {
 ...report every block allocated by default
}

In other words, let normal C control flow describe the relationships
between these code paths.  Use ifdef only to nop out try_fiemap() and
try_seekhole().

What do you think?

I like the idea - separating control flow from #ifdefs (by having stubs
on the other end of the ifdef) definitely makes algorithms easier to
understand.

More things to consider: GNU Coreutils has support for both fiemap and
seek_hole, but favors seek_hole first, for a couple reasons.  First,
FIEMAP has not always been reliable: on some older kernel/filesystem
pairs, fiemap could return stale results, which led cp(1) to cause data
loss unless it did an fsync() first to get the fiemap to be stable - but
the cost of the fsync() made the operation slower than if fiemap were
never used.  Second, POSIX will be standardizing seek_hole in its next
revision [1] (still several years out, but the fact that it is an
announced intention means people are starting to implement it now).
fiemap, on the other hand, remains a Linux-only extension.  Yes, fiemap
provides more details than seek_hole (and is the ONLY way to know the
difference between a hole that has reserved space on the disk vs a hole
that will require allocation if is written to), but if all you need to
know is whether a hole exists (rather than what type of hole), then
seek_hole is MUCH simpler.

[1] http://austingroupbugs.net/view.php?id=415


Okay, then I'll put the functionality in own functions and reverse the 
order in v2 while keeping the fallback idea, as I think there may exist 
systems where the reverse of what this patch tries to fix is true: 
SEEK_HOLE/SEEK_DATA is not supported, but FIEMAP is.


Max



Re: [Qemu-devel] [PATCH] block/raw-posix: Fall back to SEEK_HOLE/SEEK_DATA

2014-05-06 Thread Max Reitz

On 06.05.2014 13:49, Stefan Hajnoczi wrote:

On Mon, May 05, 2014 at 10:01:39PM +0200, Max Reitz wrote:

The current version of raw-posix always uses ioctl(FS_IOC_FIEMAP) if
FIEMAP is available; lseek with SEEK_HOLE/SEEK_DATA are not even
compiled in in this case. However, there may be implementations which
support the latter but not the former (e.g., NFSv4.2). In this case,
raw-posix should fall back to lseek with SEEK_HOLE/SEEK_DATA if FIEMAP
does not work.

Signed-off-by: Max Reitz 
---
As Linux does not yet implement NFSv4.2 (I don't even know whether the
specification is complete), I have no way of testing whether this
actually works for the proposed case. But as it doesn't break any of the
existing test cases, it should be fine.
---
  block/raw-posix.c | 27 +--
  1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 3ce026d..e523633 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -146,6 +146,9 @@ typedef struct BDRVRawState {
  bool has_discard:1;
  bool has_write_zeroes:1;
  bool discard_zeroes:1;
+#if defined CONFIG_FIEMAP && defined SEEK_HOLE && defined SEEK_DATA
+bool use_seek_hole_data;
+#endif
  } BDRVRawState;
  
  typedef struct BDRVRawReopenState {

@@ -1291,6 +1294,7 @@ static int64_t coroutine_fn 
raw_co_get_block_status(BlockDriverState *bs,
  int64_t sector_num,
  int nb_sectors, int *pnum)
  {
+BDRVRawState *s = bs->opaque;
  off_t start, data, hole;
  int64_t ret;
  
@@ -1303,22 +1307,31 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,

  ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
  
  #ifdef CONFIG_FIEMAP

-
-BDRVRawState *s = bs->opaque;
  struct {
  struct fiemap fm;
  struct fiemap_extent fe;
  } f;
  
+#if defined SEEK_HOLE && defined SEEK_DATA

+if (s->use_seek_hole_data) {
+goto try_seek_hole_data;
+}
+#endif

This is becoming hard to read due to the ifdefs and their relationships.

A minor simplification is to change the bool field:
#if defined CONFIG_FIEMAP
bool use_fiemap;
#endif

...

#if defined CONFIG_FIEMAP
if (s->use_fiemap) {
 ...try fiemap...
}
#endif /* CONFIG_FIEMAP */

use_fiemap is not dependent on SEEK_HOLE/SEEK_DATA.  That reduces the
amount of ifdefs needed.

A bigger cleanup is extracting the FIEMAP and SEEK_HOLE/SEEK_DATA
implementations into their own static functions.  Then
raw_co_get_block_status() becomes simpler and doesn't need ifdefs:

ret = try_fiemap(...);
if (ret < 0) {
 ret = try_seekhole(...);
}
if (ret < 0) {
 ...report every block allocated by default
}

In other words, let normal C control flow describe the relationships
between these code paths.  Use ifdef only to nop out try_fiemap() and
try_seekhole().

What do you think?


Great idea! I thought about using normal if's or something, but didn't 
think of this.


Max



Re: [Qemu-devel] [libvirt] qemu leaving unix sockets behind after VM is shut down

2014-05-06 Thread Eric Blake
On 05/06/2014 08:52 AM, Chris Friesen wrote:
> Yes, I am in fact using OpenStack/libvirt, and did eventually track down
> libvirt as the code that was cleaning up the monitor socket.
> 
> Even so, I think this sort of change would be valid in qemu itself. qemu
> created the files, so really it should be up to qemu to delete them when
> it's done with them.
> 
> They're not usable for anything with qemu not running, so there's no
> good reason to leave them laying around.

In the case of sVirt with SELinux labeling, it's often the case that
libvirt has to pre-create the file (as qemu itself is prevented from
creating the file itself) - that is also an argument for libvirt doing
the cleanup.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v12 0/4] qapi: Allow modularization of QAPI schema files

2014-05-06 Thread Eric Blake
On 05/06/2014 10:55 AM, Luiz Capitulino wrote:
> On Tue, 06 May 2014 08:55:52 -0600
> Eric Blake  wrote:
> 
>>> Eventually, we might want to have if/defs and whatnot. But having a master
>>> file seems a reasonable first step to me. I actually thought this was the
>>> intention. Unless I got it wrong, of course.
>>
>> Ifdefs may be a bit much.  If we add them, then we can worry about
>> explicit include guards, the same as the C preprocessor.  But for now,
>> I'd be perfectly fine with a followup patch that includes a file's
>> contents exactly once, no matter how many times it is included (that is,
>> act as if include guards were implicitly present, since we lack
>> conditionals, so include files are currently idempotent).
> 
> OK. Does it make sense to merge the current series without that
> modification?

Yes.  Idempotent inclusion as a followup patch is just fine; and the
current series is still useful for some clients without waiting for
idempotent inclusion to make it useful for even more clients.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v12 0/4] qapi: Allow modularization of QAPI schema files

2014-05-06 Thread Luiz Capitulino
On Tue, 06 May 2014 08:55:52 -0600
Eric Blake  wrote:

> > Eventually, we might want to have if/defs and whatnot. But having a master
> > file seems a reasonable first step to me. I actually thought this was the
> > intention. Unless I got it wrong, of course.
> 
> Ifdefs may be a bit much.  If we add them, then we can worry about
> explicit include guards, the same as the C preprocessor.  But for now,
> I'd be perfectly fine with a followup patch that includes a file's
> contents exactly once, no matter how many times it is included (that is,
> act as if include guards were implicitly present, since we lack
> conditionals, so include files are currently idempotent).

OK. Does it make sense to merge the current series without that
modification?



Re: [Qemu-devel] [PATCH v3] glib: fix g_poll early timeout on windows

2014-05-06 Thread Stefan Weil
Am 06.05.2014 17:49, schrieb Alex Bligh:
> 
> On 6 May 2014, at 13:38, Paolo Bonzini wrote:
> 
>> Il 06/05/2014 14:23, Stefan Hajnoczi ha scritto:
> Signed-off-by: Stanislav Vorobiov 
> ---
>  include/glib-compat.h |   19 +
>  include/qemu-common.h |   12 --
>  util/oslib-win32.c|  112 
> +
>  3 files changed, 131 insertions(+), 12 deletions(-)
>>> What is the status of this patch?
>>>
>>> I haven't followed the discussions around the issue but can review/merge
>>> if there is agreement now.
>>
>> The GNOME folks haven't followed up, so I think we should pick it.
> 
> I haven't checked it on win32 but was happy with it otherwise.


There was another patch on the list which moved g_poll code from
qemu-common.h to glib-compat.h, so when both patches get merged I expect
a (trivial) merge conflict.

The patch looks good. I already have applied it to my local queue, but I
won't be able to test it before end of next week. If someone wants to
pick it up earlier, I would not mind.

Cheers,
Stefan




Re: [Qemu-devel] [PATCH] vl.c: remove init_clocks call from main

2014-05-06 Thread Alex Bligh

On 6 May 2014, at 17:01, Peter Maydell wrote:

>> 
>> So if I read this right, init_clocks() is now only done in vl.c.
> 
> No, you're misreading it -- the commit message says "remove
> init_clocks call from main" and you can see from the diffstat
> it's removed from vl.c. It's now done only in qemu_clock_init().

Indeed. I get to wear the dunce's cap for a bit. Apologies.
Twice, actually, for not removing it from main() in the first
place.

Reviewed-By: Alex Bligh 

-- 
Alex Bligh







Re: [Qemu-devel] [PATCH] vl.c: remove init_clocks call from main

2014-05-06 Thread Peter Maydell
On 6 May 2014 16:57, Alex Bligh  wrote:
>
> On 6 May 2014, at 13:59, Kirill Batuzov wrote:
>
>> Clocks are initialized in qemu_init_main_loop. They are not needed before it.
>> Initializing them twice is not only unnecessary but is harmful: it results in
>> memory leak and potentially can lead to a situation where different parts of
>> QEMU use different sets of timers.
>>
>> To avoid it remove init_clocks call from main and add an assertion to
>> qemu_clock_init that corresponding clock has not been initialized yet.
>>
>> Signed-off-by: Kirill Batuzov 
>> ---
>> qemu-timer.c |3 +++
>> vl.c |1 -
>> 2 files changed, 3 insertions(+), 1 deletion(-)

> So if I read this right, init_clocks() is now only done in vl.c.

No, you're misreading it -- the commit message says "remove
init_clocks call from main" and you can see from the diffstat
it's removed from vl.c. It's now done only in qemu_clock_init().

thanks
-- PMM



Re: [Qemu-devel] [PATCH] vl.c: remove init_clocks call from main

2014-05-06 Thread Alex Bligh

On 6 May 2014, at 13:59, Kirill Batuzov wrote:

> Clocks are initialized in qemu_init_main_loop. They are not needed before it.
> Initializing them twice is not only unnecessary but is harmful: it results in
> memory leak and potentially can lead to a situation where different parts of
> QEMU use different sets of timers.
> 
> To avoid it remove init_clocks call from main and add an assertion to
> qemu_clock_init that corresponding clock has not been initialized yet.
> 
> Signed-off-by: Kirill Batuzov 
> ---
> qemu-timer.c |3 +++
> vl.c |1 -
> 2 files changed, 3 insertions(+), 1 deletion(-)
> 
> The init_clocks call was added to qemu_init_main_loop in commit
> 172061a0a0d98c974ea8d5ed715195237bc44225. init_clocks was made idempotent
> in prior commit 744ca8e3754e6808c6b5331d287adc533fca0ad3. Back then it was
> not possible to remove init_clocks call from main because rtc_clock variable
> was of type QEMUClock * and was used in option processing. So clocks needed to
> be initialized before command line options could be processed.
> 
> Commit 7bf8fbde449600926370f744b2b3d9cc819ca74f removed idempotence property
> from init_clocks. Commit 884f17c235095af99b92dd8cd10bb824a5b0f777 removed
> the need to call init_clocks from main, but did not remove the call.

So if I read this right, init_clocks() is now only done in vl.c.

I think this is problematic, as qemu-img (for instance) still uses the
mainloop (it calls qemu_init_main_loop() from main), but does not use
vl.c as far as I know.

Perhaps restoring the idempotence would be a better plan.

-- 
Alex Bligh







Re: [Qemu-devel] [PATCH v3] glib: fix g_poll early timeout on windows

2014-05-06 Thread Alex Bligh

On 6 May 2014, at 13:38, Paolo Bonzini wrote:

> Il 06/05/2014 14:23, Stefan Hajnoczi ha scritto:
>>> > Signed-off-by: Stanislav Vorobiov 
>>> > ---
>>> >  include/glib-compat.h |   19 +
>>> >  include/qemu-common.h |   12 --
>>> >  util/oslib-win32.c|  112 
>>> > +
>>> >  3 files changed, 131 insertions(+), 12 deletions(-)
>> What is the status of this patch?
>> 
>> I haven't followed the discussions around the issue but can review/merge
>> if there is agreement now.
> 
> The GNOME folks haven't followed up, so I think we should pick it.

I haven't checked it on win32 but was happy with it otherwise.

-- 
Alex Bligh







Re: [Qemu-devel] [PATCH] configure: Put tempfiles in subdir so we can clean up libtool files

2014-05-06 Thread Eric Blake
On 05/06/2014 08:53 AM, Peter Maydell wrote:

>> # Create a (secure) tmp directory for tmp files.
>>
>> {
>>   tmp=`(umask 077 && mktemp -d "./confXX") 2>/dev/null` &&
>>   test -d "$tmp"
>> }  ||
>> {
>>   tmp=./conf$$-$RANDOM
>>   (umask 077 && mkdir "$tmp")
>> } || as_fn_error $? "cannot create a temporary directory in ." "$LINENO" 5
>> ac_tmp=$tmp
> 
> Yuck.
> 
>> The use of $$ and $RANDOM is safe (even on shells that lack $RANDOM)
>> because of the fact that mkdir is atomic and the umask is correctly set
>> prior to the mkdir.
> 
> I dislike the use of $RANDOM, because it means we behave
> inconsistently. If it's OK for $RANDOM to expand to "" then we
> should just not use it at all, because that's OK and the same
> everywhere.

It's okay for $RANDOM to expand to "" in the fallback code, for the
platforms that lack mktemp(1); most developers are on a platform that
have mktemp.  The use of $RANDOM makes it harder for an attacker to
pre-create a competing file by the same name, but does not add any
security; so omitting $RANDOM for the fallback path doesn't hurt if you
are that bothered by seeing it present in a dash script.

> 
> Similarly, if it's OK not to use mktemp on some systems,
> we should use the same non-mktemp code everywhere.

The fallback is not ideal, but tolerable.  It's still better to try and
use mktemp where it exists.

> 
> We could sidestep this rubbish by not trying to put our temp
> files in /tmp/, and instead just put them in the build directory
> (ie ./conf-temps/ or something similar, which we blow away
> and recreate every time).

Yes, using a different location for temporary files and avoiding /tmp
might also work.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH] qmp: use valid JSON in transaction example

2014-05-06 Thread Eric Blake
Our example should use the correct quotes to match what someone
could actually pass over the wire.

* qmp-commands.hx: Use correct JSON quotes.

Signed-off-by: Eric Blake 
---
 qmp-commands.hx | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/qmp-commands.hx b/qmp-commands.hx
index ed3ab92..e20fcf0 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1165,19 +1165,19 @@ Example:

 -> { "execute": "transaction",
  "arguments": { "actions": [
- { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "ide-hd0",
+ { "type": "blockdev-snapshot-sync", "data" : { "device": "ide-hd0",
  "snapshot-file": 
"/some/place/my-image",
  "format": "qcow2" } },
- { 'type': 'blockdev-snapshot-sync', 'data' : { "node-name": "myfile",
+ { "type": "blockdev-snapshot-sync", "data" : { "node-name": "myfile",
  "snapshot-file": 
"/some/place/my-image2",
  "snapshot-node-name": "node3432",
  "mode": "existing",
  "format": "qcow2" } },
- { 'type': 'blockdev-snapshot-sync', 'data' : { "device": "ide-hd1",
+ { "type": "blockdev-snapshot-sync", "data" : { "device": "ide-hd1",
  "snapshot-file": 
"/some/place/my-image2",
  "mode": "existing",
  "format": "qcow2" } },
- { 'type': 'blockdev-snapshot-internal-sync', 'data' : {
+ { "type": "blockdev-snapshot-internal-sync", "data" : {
  "device": "ide-hd2",
  "name": "snapshot0" } } ] } }
 <- { "return": {} }
-- 
1.9.0




  1   2   3   >