date:20160330

Re: [Qemu-devel] [PATCH 0/6] virtio: refactor host notifiers

2016-03-30 Thread tu bo


Hi Christian:

I got the same crash with qemu master + assertion patch + "[PATCH 0/6] 
virtio: refactor host notifiers" +  Paolo's fix,


(gdb) bt
#0  blk_aio_read_entry (opaque=0x0) at block/block-backend.c:916
#1  0x02aa2e8e88fe in coroutine_trampoline (i0=, 
i1=-1677703696) at util/coroutine-ucontext.c:78

#2  0x03ffa85d150a in __makecontext_ret () from /lib64/libc.so.6


On 03/30/2016 12:27 AM, Christian Borntraeger wrote:

On 03/29/2016 03:50 PM, Paolo Bonzini wrote:



On 29/03/2016 13:45, Cornelia Huck wrote:

Hi Tu Bo,

please always include the assertion patch at
https://lists.gnu.org/archive/html/qemu-block/2016-03/msg00546.html in
your tests.  Can you include the backtrace from all threads with that patch?


thanks for your reminder about the assertion patch. Here is the
backtrace with qemu master + assertion patch + "[PATCH 0/6] virtio:
refactor host notifiers",


FWIW, I've been running this in a reboot loop for the last 2 1/2 hours.
Could you perhaps share your command line?


 From code inspection, the following is also necessary or at least a
good idea:

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 6fb29e3..7fa8477 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -258,7 +258,7 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
  aio_context_acquire(s->ctx);

  /* Stop notifications for new requests from guest */
-virtio_queue_aio_set_host_notifier_handler(s->vq, s->ctx, false, false);
+virtio_queue_aio_set_host_notifier_handler(s->vq, s->ctx, true, false);

  /* Drain and switch bs back to the QEMU main loop */
  blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());



Right. Tu Bo, you seem to have the best testcase for this.
Does your setup runs fine with this on top?

CHristian

Re: [Qemu-devel] [PATCH v2] spapr: compute interrupt vector address from LPCR

2016-03-30 Thread David Gibson

On Wed, Mar 30, 2016 at 05:38:34PM +0200, Cédric Le Goater wrote:
> This address is changed by the linux kernel using the H_SET_MODE hcall
> and needs to be restored when migrating a spapr VM running in
> TCG. This can be done using the AIL bits from the LPCR register.
> 
> The patch introduces a helper routine cpu_ppc_get_excp_prefix() which
> returns the effective address offset of the interrupt handler
> depending on the LPCR_AIL bits. The same helper can be used in the
> H_SET_MODE hcall, which lets us remove the H_SET_MODE_ADDR_TRANS_*
> defines.
> 
> Signed-off-by: Cédric Le Goater 

I've applied this (with Greg's minor amendments) to ppc-for-2.6),
since it certainly improves behaviour, although I have a couple of
queries:

> ---
> 
>  Changes since v1:
> 
>  - moved helper routine under target-ppc/
>  - moved the restore of excp_prefix under cpu_post_load()
> 
>  hw/ppc/spapr_hcall.c|   13 ++---
>  include/hw/ppc/spapr.h  |5 -
>  target-ppc/cpu.h|9 +
>  target-ppc/machine.c|   20 +++-
>  target-ppc/translate_init.c |   14 ++
>  5 files changed, 44 insertions(+), 17 deletions(-)
> 
> Index: qemu-dgibson-for-2.6.git/hw/ppc/spapr_hcall.c
> ===
> --- qemu-dgibson-for-2.6.git.orig/hw/ppc/spapr_hcall.c
> +++ qemu-dgibson-for-2.6.git/hw/ppc/spapr_hcall.c
> @@ -835,17 +835,8 @@ static target_ulong h_set_mode_resource_
>  return H_P4;
>  }
>  
> -switch (mflags) {
> -case H_SET_MODE_ADDR_TRANS_NONE:
> -prefix = 0;
> -break;
> -case H_SET_MODE_ADDR_TRANS_0001_8000:
> -prefix = 0x18000;
> -break;
> -case H_SET_MODE_ADDR_TRANS_C000___4000:
> -prefix = 0xC0004000ULL;
> -break;
> -default:
> +prefix = cpu_ppc_get_excp_prefix(mflags);
> +if (prefix == (target_ulong) -1ULL) {
>  return H_UNSUPPORTED_FLAG;
>  }
>  
> Index: qemu-dgibson-for-2.6.git/target-ppc/machine.c
> ===
> --- qemu-dgibson-for-2.6.git.orig/target-ppc/machine.c
> +++ qemu-dgibson-for-2.6.git/target-ppc/machine.c
> @@ -156,12 +156,26 @@ static void cpu_pre_save(void *opaque)
>  }
>  }
>  
> +
> +static int cpu_post_load_excp_prefix(CPUPPCState *env)
> +{
> +int ail = (env->spr[SPR_LPCR] & LPCR_AIL) >> LPCR_AIL_SHIFT;
> +target_ulong prefix = cpu_ppc_get_excp_prefix(ail);
> +
> +if (prefix == (target_ulong) -1ULL) {
> +return -EINVAL;
> +}
> +env->excp_prefix = prefix;
> +return 0;
> +}
> +
>  static int cpu_post_load(void *opaque, int version_id)
>  {
>  PowerPCCPU *cpu = opaque;
>  CPUPPCState *env = >env;
>  int i;
>  target_ulong msr;
> +int ret = 0;
>  
>  /*
>   * We always ignore the source PVR. The user or management
> @@ -201,7 +215,11 @@ static int cpu_post_load(void *opaque, i
>  
>  hreg_compute_mem_idx(env);
>  
> -return 0;
> +if (env->spr[SPR_LPCR] & LPCR_AIL) {
> +ret = cpu_post_load_excp_prefix(env);
> +}

Why not call this unconditionally?  If AIL == 0 it will still do the
right thing.

Aren't there also circumstances where the exception prefix can depend
on the MSR?  Do those need to be handled somewhere?

> +
> +return ret;
>  }
>  
>  static bool fpu_needed(void *opaque)
> Index: qemu-dgibson-for-2.6.git/target-ppc/translate_init.c
> ===
> --- qemu-dgibson-for-2.6.git.orig/target-ppc/translate_init.c
> +++ qemu-dgibson-for-2.6.git/target-ppc/translate_init.c
> @@ -8522,6 +8522,20 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu)
>  }
>  }
>  
> +target_ulong cpu_ppc_get_excp_prefix(target_ulong ail)
> +{
> +switch (ail) {
> +case AIL_NONE:
> +return 0;
> +case AIL_0001_8000:
> +return 0x18000;
> +case AIL_C000___4000:
> +return 0xC0004000ULL;
> +default:
> +return (target_ulong) -1ULL;
> +}
> +}
> +
>  #endif /* !defined(CONFIG_USER_ONLY) */
>  
>  #endif /* defined (TARGET_PPC64) */
> Index: qemu-dgibson-for-2.6.git/target-ppc/cpu.h
> ===
> --- qemu-dgibson-for-2.6.git.orig/target-ppc/cpu.h
> +++ qemu-dgibson-for-2.6.git/target-ppc/cpu.h
> @@ -1269,6 +1269,7 @@ void store_booke_tsr (CPUPPCState *env,
>  void ppc_tlb_invalidate_all (CPUPPCState *env);
>  void ppc_tlb_invalidate_one (CPUPPCState *env, target_ulong addr);
>  void cpu_ppc_set_papr(PowerPCCPU *cpu);
> +target_ulong cpu_ppc_get_excp_prefix(target_ulong ail);
>  #endif
>  #endif
>  
> @@ -2277,6 +2278,14 @@ enum {
>  HMER_XSCOM_STATUS_LSH   = (63 - 23),
>  };
>  
> +/* Alternate Interrupt Location (AIL) */
> +enum {
> +AIL_NONE= 0,
> +AIL_RESERVED= 1,
> +AIL_0001_8000   = 2,
> +

Re: [Qemu-devel] [PATCH V2 2/3] colo-compare: track connection and enqueue packet

2016-03-30 Thread Zhang Chen




On 03/31/2016 12:23 PM, Li Zhijian wrote:



On 03/31/2016 12:06 PM, Zhang Chen wrote:


+static void packet_destroy(void *opaque, void *user_data)
+{
+Packet *pkt = opaque;
+
+g_free(pkt->data);
+g_slice_free(Packet, pkt);
+}
+
+static inline void colo_flush_connection(void *opaque, void 
*user_data)

+{

Is this used?


will be used, like lizhijian said.


I mean you should remove this now, and re-introduce when you integrate 
to COLO frame




OK ~~


Thanks
Li Zhijian
.



--
Thanks
zhangchen

Re: [Qemu-devel] [PATCH V2 2/3] colo-compare: track connection and enqueue packet

2016-03-30 Thread Li Zhijian




On 03/31/2016 12:06 PM, Zhang Chen wrote:


+static void packet_destroy(void *opaque, void *user_data)
+{
+Packet *pkt = opaque;
+
+g_free(pkt->data);
+g_slice_free(Packet, pkt);
+}
+
+static inline void colo_flush_connection(void *opaque, void *user_data)
+{

Is this used?


will be used, like lizhijian said.


I mean you should remove this now, and re-introduce when you integrate to COLO 
frame

Thanks
Li Zhijian

Re: [Qemu-devel] [PATCH V2 2/3] colo-compare: track connection and enqueue packet

2016-03-30 Thread Zhang Chen




On 03/30/2016 06:36 PM, Dr. David Alan Gilbert wrote:

* Zhang Chen (zhangchen.f...@cn.fujitsu.com) wrote:

In this patch we use kernel jhash table to track
connection, and then enqueue net packet like this:

+ CompareState ++
|   |
+---+   +---+ +---+
|conn list  +--->conn   +->conn   |
+---+   +---+ +---+
|   | |   | |  |
+---+ +---v+  +---v++---v+ +---v+
   |primary |  |secondary|primary | |secondary
   |packet  |  |packet  +|packet  | |packet  +
   ++  ++++ ++
   |   | |  |
   +---v+  +---v++---v+ +---v+
   |primary |  |secondary|primary | |secondary
   |packet  |  |packet  +|packet  | |packet  +
   ++  ++++ ++
   |   | |  |
   +---v+  +---v++---v+ +---v+
   |primary |  |secondary|primary | |secondary
   |packet  |  |packet  +|packet  | |packet  +
   ++  ++++ ++

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
  include/qemu/jhash.h |  59 ++
  net/colo-compare.c   | 324 ++-
  2 files changed, 380 insertions(+), 3 deletions(-)
  create mode 100644 include/qemu/jhash.h

diff --git a/include/qemu/jhash.h b/include/qemu/jhash.h
new file mode 100644
index 000..8a8ff0f
--- /dev/null
+++ b/include/qemu/jhash.h
@@ -0,0 +1,59 @@
+/* jhash.h: Jenkins hash support.
+  *
+  * Copyright (C) 2006. Bob Jenkins (bob_jenk...@burtleburtle.net)
+  *
+  * http://burtleburtle.net/bob/hash/
+  *
+  * These are the credits from Bob's sources:
+  *
+  * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
+  *
+  * These are functions for producing 32-bit hashes for hash table lookup.
+  * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
+  * are externally useful functions.  Routines to test the hash are included
+  * if SELF_TEST is defined.  You can use this free for any purpose.It's in
+  * the public domain.  It has no warranty.
+  *
+  * Copyright (C) 2009-2010 Jozsef Kadlecsik (kad...@blackhole.kfki.hu)
+  *
+  * I've modified Bob's hash to be useful in the Linux kernel, and
+  * any bugs present are my fault.
+  * Jozsef
+  */
+
+#ifndef QEMU_JHASH_H__
+#define QEMU_JHASH_H__
+
+#include "qemu/bitops.h"
+
+/*
+ * hashtable related is copied from linux kernel jhash
+ */
+
+/* __jhash_mix -- mix 3 32-bit values reversibly. */
+#define __jhash_mix(a, b, c)\
+{   \
+a -= c;  a ^= rol32(c, 4);  c += b; \
+b -= a;  b ^= rol32(a, 6);  a += c; \
+c -= b;  c ^= rol32(b, 8);  b += a; \
+a -= c;  a ^= rol32(c, 16); c += b; \
+b -= a;  b ^= rol32(a, 19); a += c; \
+c -= b;  c ^= rol32(b, 4);  b += a; \
+}
+
+/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
+#define __jhash_final(a, b, c)  \
+{   \
+c ^= b; c -= rol32(b, 14);  \
+a ^= c; a -= rol32(c, 11);  \
+b ^= a; b -= rol32(a, 25);  \
+c ^= b; c -= rol32(b, 16);  \
+a ^= c; a -= rol32(c, 4);   \
+b ^= a; b -= rol32(a, 14);  \
+c ^= b; c -= rol32(b, 24);  \
+}
+
+/* An arbitrary initial parameter */
+#define JHASH_INITVAL   0xdeadbeef
+
+#endif /* QEMU_JHASH_H__ */
diff --git a/net/colo-compare.c b/net/colo-compare.c
index 62c66df..0bb5a51 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -20,15 +20,22 @@
  #include "net/queue.h"
  #include "sysemu/char.h"
  #include "qemu/sockets.h"
+#include 
+#include "slirp/slirp.h"
+#include "qemu/jhash.h"
+#include 
  
  #define TYPE_COLO_COMPARE "colo-compare"

  #define COLO_COMPARE(obj) \
  OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)
  
  #define COMPARE_READ_LEN_MAX NET_BUFSIZE

+#define PAGE_SIZE 4096
+#define ETH_HLEN 14

PAGE_SIZE is not just 4k; use one of the system headers.


OK, I will fix it with include/exec/cpu-all.h TARGET_PAGE_SIZE


Also, don't define ETH_HLEN - include net/eth.h


OK


  static QTAILQ_HEAD(, CompareState) net_compares =
 QTAILQ_HEAD_INITIALIZER(net_compares);
+static ssize_t hashtable_max_size;
  
  typedef struct ReadState {

  int state; /* 0 = getting length, 1 = getting data */
@@ -37,6 +44,28 @@ typedef struct ReadState {
  uint8_t buf[COMPARE_READ_LEN_MAX];
  } ReadState;
  
+/*

+  + CompareState ++
+  |   |
+

Re: [Qemu-devel] [PATCH] acpi: Fix TPM ACPI description to make TPM usable on Windows

2016-03-30 Thread Stefan Berger


On 03/30/2016 09:33 AM, Igor Mammedov wrote:

On Mon, 21 Mar 2016 10:21:11 -0400
Stefan Berger  wrote:


This patch addresses BZ 1281413.

Fix the APCI description to make it work on Windows again. The ACPI
description was broken in commit 9e47226.

above commit just added missing ASL description for TMP device
and you also posted exactly similar patch back then.


Sorry, I referenced the wrong commit. Here's the bad one:

commit 72d97b3a543a9c2c820bd463ba24751ae4247ac3



Current commit message is pretty useless, Pls describe in commit
message what/how is broken and how/why patch fixes it.


I am not sure what exactly broke it. All I know is that the scope was 
changed and the device name were change (ISA.TPM versus TPM). This was 
the working ACPI description from QEMU v2.3.1:


Scope(\_SB) {
/* TPM with emulated TPM TIS interface */
Device (TPM) {
Name (_HID, EisaID ("PNP0C31"))
Name (_CRS, ResourceTemplate ()
{
Memory32Fixed (ReadWrite, TPM_TIS_ADDR_BASE, 
TPM_TIS_ADDR_SIZE)

// older Linux tpm_tis drivers do not work with IRQ
//IRQNoFlags () {TPM_TIS_IRQ}
})





Signed-off-by: Stefan Berger 
---
  hw/i386/acpi-build.c | 26 --
  1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 325d8ce..c6e90b6 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2334,22 +2334,20 @@ build_dsdt(GArray *table_data, GArray *linker,
  Aml *scope = aml_scope("PCI0");
  /* Scan all PCI buses. Generate tables to support
hotplug. */ build_append_pci_bus_devices(scope, bus,
pm->pcihp_bridge_en); -
-if (misc->tpm_version != TPM_VERSION_UNSPEC) {
-dev = aml_device("ISA.TPM");
-aml_append(dev, aml_name_decl("_HID",
aml_eisaid("PNP0C31")));
-aml_append(dev, aml_name_decl("_STA",
aml_int(0xF)));
-crs = aml_resource_template();
-aml_append(crs,
aml_memory32_fixed(TPM_TIS_ADDR_BASE,
-   TPM_TIS_ADDR_SIZE, AML_READ_WRITE));
-aml_append(crs, aml_irq_no_flags(TPM_TIS_IRQ));
-aml_append(dev, aml_name_decl("_CRS", crs));
-aml_append(scope, dev);
-}
-
-aml_append(sb_scope, scope);

it's ISA device so it should stay in ISA scope,
I'll post and alternative fix for TPM_TIS_ADDR_BASE as follow up
to this series.


Alternative means it doesn't need this patch? Or it needs it and is a 
follow up on top of it?



  }
  }
+
+if (misc->tpm_version != TPM_VERSION_UNSPEC) {
+dev = aml_device("TPM");
+aml_append(dev, aml_name_decl("_HID",
aml_eisaid("PNP0C31")));
+aml_append(dev, aml_name_decl("_STA", aml_int(0xF)));
+crs = aml_resource_template();
+aml_append(crs, aml_memory32_fixed(TPM_TIS_ADDR_BASE,
+   TPM_TIS_ADDR_SIZE, AML_READ_WRITE));
+//aml_append(crs, aml_irq_no_flags(TPM_TIS_IRQ));

silent change,
why IRQ descriptor is commented out, it seems the device
uses/initializes it?
I'd split IRQ change in a separate patch that explains why it
shouldn't be in TPM._CRS.


We can leave it there if it works. The bug report is related to Windows, 
which I don't have running in a VM. So the safest was to roll back to 
2.3.1 equivalent, which had the IRQ disabled.


Stefan





+aml_append(dev, aml_name_decl("_CRS", crs));
+aml_append(sb_scope, dev);
+}
  aml_append(dsdt, sb_scope);
  }

Re: [Qemu-devel] [PATCH qemu v14 18/18] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)

2016-03-30 Thread David Gibson

On Thu, Mar 24, 2016 at 01:32:48PM +1100, Alexey Kardashevskiy wrote:
> On 03/23/2016 05:11 PM, David Gibson wrote:
> >On Wed, Mar 23, 2016 at 02:28:01PM +1100, Alexey Kardashevskiy wrote:
> >>On 03/23/2016 01:13 PM, David Gibson wrote:
> >>>On Mon, Mar 21, 2016 at 06:47:06PM +1100, Alexey Kardashevskiy wrote:
> This adds support for Dynamic DMA Windows (DDW) option defined by
> the SPAPR specification which allows to have additional DMA window(s)
> 
> This implements DDW for emulated and VFIO devices.
> This reserves RTAS token numbers for DDW calls.
> 
> This changes the TCE table migration descriptor to support dynamic
> tables as from now on, PHB will create as many stub TCE table objects
> as PHB can possibly support but not all of them might be initialized at
> the time of migration because DDW might or might not be requested by
> the guest.
> 
> The "ddw" property is enabled by default on a PHB but for compatibility
> the pseries-2.5 machine and older disable it.
> 
> This implements DDW for VFIO. The host kernel support is required.
> This adds a "levels" property to PHB to control the number of levels
> in the actual TCE table allocated by the host kernel, 0 is the default
> value to tell QEMU to calculate the correct value. Current hardware
> supports up to 5 levels.
> 
> The existing linux guests try creating one additional huge DMA window
> with 64K or 16MB pages and map the entire guest RAM to. If succeeded,
> the guest switches to dma_direct_ops and never calls TCE hypercalls
> (H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM
> and not waste time on map/unmap later. This adds a "dma64_win_addr"
> property which is a bus address for the 64bit window and by default
> set to 0x800... as this is what the modern POWER8 hardware
> uses and this allows having emulated and VFIO devices on the same bus.
> 
> This adds 4 RTAS handlers:
> * ibm,query-pe-dma-window
> * ibm,create-pe-dma-window
> * ibm,remove-pe-dma-window
> * ibm,reset-pe-dma-window
> These are registered from type_init() callback.
> 
> These RTAS handlers are implemented in a separate file to avoid polluting
> spapr_iommu.c with PCI.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>   hw/ppc/Makefile.objs|   1 +
>   hw/ppc/spapr.c  |   7 +-
>   hw/ppc/spapr_pci.c  |  73 ---
>   hw/ppc/spapr_rtas_ddw.c | 300 
>  
>   hw/vfio/common.c|   5 -
>   include/hw/pci-host/spapr.h |  13 ++
>   include/hw/ppc/spapr.h  |  16 ++-
>   trace-events|   4 +
>   8 files changed, 395 insertions(+), 24 deletions(-)
>   create mode 100644 hw/ppc/spapr_rtas_ddw.c
> 
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index c1ffc77..986b36f 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -7,6 +7,7 @@ obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o 
> spapr_drc.o spapr_rng.o
>   ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
>   obj-y += spapr_pci_vfio.o
>   endif
> +obj-$(CONFIG_PSERIES) += spapr_rtas_ddw.o
>   # PowerPC 4xx boards
>   obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o
>   obj-y += ppc4xx_pci.o
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index d0bb423..ef4c637 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2362,7 +2362,12 @@ DEFINE_SPAPR_MACHINE(2_6, "2.6", true);
>    * pseries-2.5
>    */
>   #define SPAPR_COMPAT_2_5 \
> -HW_COMPAT_2_5
> +HW_COMPAT_2_5 \
> +{\
> +.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,\
> +.property = "ddw",\
> +.value= stringify(off),\
> +},
> 
>   static void spapr_machine_2_5_instance_options(MachineState *machine)
>   {
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index af99a36..3bb294a 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -803,12 +803,12 @@ static char *spapr_phb_get_loc_code(sPAPRPHBState 
> *sphb, PCIDevice *pdev)
>   return buf;
>   }
> 
> -static void spapr_phb_dma_window_enable(sPAPRPHBState *sphb,
> -   uint32_t liobn,
> -   uint32_t page_shift,
> -   uint64_t window_addr,
> -   uint64_t window_size,
> -   Error **errp)
> +void spapr_phb_dma_window_enable(sPAPRPHBState *sphb,
> + uint32_t liobn,
> + uint32_t

Re: [Qemu-devel] [PATCH V2 0/3] Introduce COLO-compare

2016-03-30 Thread Li Zhijian




On 03/30/2016 08:05 PM, Dr. David Alan Gilbert wrote:

* Zhang Chen (zhangchen.f...@cn.fujitsu.com) wrote:

COLO-compare is a part of COLO project. It is used
to compare the network package to help COLO decide
whether to do checkpoint.


Hi Zhang Chen,
   I've put comments on the individual patches, but some more general things:

   1) Please add a coment giving the example of the command line for the primary
 and secondary use of this module - it helps make it easier to understand 
the patches.

   2) There's no tracing in here - please add some; I found when I tried to get
 COLO working I needed to use lots of tracing and debugging to understand 
the
 packet flow.

   3) Add comments; e.g. for each function say which thread is using it and 
where
  the packets are coming from; e.g.
 called from the main thread on the primary for packets arriving over 
the socket
 from the secondary.

  There's just so many packets going in so many directions it would make it
  easier to follow.

   4) A more fundamental problem is what happens if the secondary never sends 
anything
  on the socket, the result is you end up running until the end of the long 
COLO
  checkpoint without triggering a discompare - in my world I added a 
timeout (400ms)
  for an unmatched packet from the primary, where if no matching packet was 
received
  a checkpoint would be triggered.

   5) I see the packet comparison is still the simple memcmpy that you had in 
December;
  are you planning on doing anything more complicated; you must be seing 
most packets
  miscompare?

You can see my current world at; 
https://github.com/orbitfp7/qemu/commits/orbit-wp4-colo-mar16
which has my basic TCP comparison (it's only tracking incoming connections) and 
I know it's
not complete either.  It mostly works OK, although I've got an occasional seg
(which makes me wonder if I need to add the conn_list_lock I see you added).  
I'm also
not doing any TCP reassembly which is probably needed.


Thank you very much for your comments.
I just see you tree, you put in a lot of work(tcp comparison enhance, 
sequence/acknowledge
number re-write, timeout...)

Actually, this compare module is just in a RFC stage(only including compare 
frame), there are
many works to be done:

1) Integrate to COLO frame(and Let COLO primary and secondary at running state)

2) ip segment defrag

3) comparison base on the sequence number(tcp and udp) if packet has
   Because tcp re-transmission is quit common. IRC, your code will compare the 
whole tcp
packet(sequence number will be compare)

4) packet belongs to the same connection is sort by sequence number

5) Out-Of-Oder packet handle

6) cleanup the un-active conn_list which maybe closed. the simple way is to 
introduce a
   timer to record whether a connection have packet come within a timeout, 
connection gone
beyond this timeout should be cleanup.

7) Dave point out above (4)

8) something I miss...

For Various reasons, not all the works can be done immediately, So we hope to 
discuss and
decide which function have the high priority.
Any comments and suggestions are welcome.

IMO, a compare frame and a COLO frame hack patch could be simple enough.

Thanks
Li


Dave


v2:
  - add jhash.h

v1:
  - initial patch


Zhang Chen (3):
   colo-compare: introduce colo compare initlization
   colo-compare: track connection and enqueue packet
   colo-compare: introduce packet comparison thread

  include/qemu/jhash.h |  59 
  net/Makefile.objs|   1 +
  net/colo-compare.c   | 782 +++
  vl.c |   3 +-
  4 files changed, 844 insertions(+), 1 deletion(-)
  create mode 100644 include/qemu/jhash.h
  create mode 100644 net/colo-compare.c

--
1.9.1




--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


.



--
Best regards.
Li Zhijian (8555)

Re: [Qemu-devel] [PATCH 0/6] virtio: refactor host notifiers

2016-03-30 Thread tu bo


Hi Christian:

On 03/29/2016 07:54 PM, Christian Borntraeger wrote:

On 03/29/2016 11:14 AM, tu bo wrote:

Hi Paolo:

On 03/29/2016 02:11 AM, Paolo Bonzini wrote:

On 28/03/2016 05:55, TU BO wrote:

Hi Cornelia:

I got two crash with qemu master + "[PATCH 0/6] virtio: refactor host
notifiers",


Hi Tu Bo,

please always include the assertion patch at
https://lists.gnu.org/archive/html/qemu-block/2016-03/msg00546.html in
your tests.  Can you include the backtrace from all threads with that patch?


thanks for your reminder about the assertion patch. Here is the backtrace with qemu 
master + assertion patch + "[PATCH 0/6] virtio: refactor host notifiers",

I got two crashes,

1. For 1st crash,
(gdb) thread apply all bt

Thread 8 (Thread 0x3ff8daf1910 (LWP 52859)):
#0  0x03ff9718ec62 in do_futex_timed_wait () from /lib64/libpthread.so.0
#1  0x03ff9718ed76 in sem_timedwait () from /lib64/libpthread.so.0
#2  0x02aa2d755868 in qemu_sem_timedwait (sem=0x3ff88000fa8, ms=) at util/qemu-thread-posix.c:245
#3  0x02aa2d6803e4 in worker_thread (opaque=0x3ff88000f40) at 
thread-pool.c:92
#4  0x03ff971884c6 in start_thread () from /lib64/libpthread.so.0
#5  0x03ff96802ec2 in thread_start () from /lib64/libc.so.6

Thread 7 (Thread 0x3ff8e679910 (LWP 52856)):
#0  0x03ff9718ec62 in do_futex_timed_wait () from /lib64/libpthread.so.0
#1  0x03ff9718ed76 in sem_timedwait () from /lib64/libpthread.so.0
#2  0x02aa2d755868 in qemu_sem_timedwait (sem=0x2aa2e1fbfa8, ms=) at util/qemu-thread-posix.c:245
#3  0x02aa2d6803e4 in worker_thread (opaque=0x2aa2e1fbf40) at 
thread-pool.c:92
#4  0x03ff971884c6 in start_thread () from /lib64/libpthread.so.0
#5  0x03ff96802ec2 in thread_start () from /lib64/libc.so.6

Thread 6 (Thread 0x3ff9497f910 (LWP 52850)):
#0  0x03ff9718c50e in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x03ff96d19792 in g_cond_wait () from /lib64/libglib-2.0.so.0
#2  0x02aa2d7165d2 in wait_for_trace_records_available () at 
trace/simple.c:147
---Type  to continue, or q  to quit---
#3  writeout_thread (opaque=) at trace/simple.c:165
#4  0x03ff96cfa44c in g_thread_proxy () from /lib64/libglib-2.0.so.0
#5  0x03ff971884c6 in start_thread () from /lib64/libpthread.so.0
#6  0x03ff96802ec2 in thread_start () from /lib64/libc.so.6

Thread 5 (Thread 0x3ff8efff910 (LWP 52855)):
#0  0x03ff967f819a in ioctl () from /lib64/libc.so.6
#1  0x02aa2d546f3e in kvm_vcpu_ioctl (cpu=cpu@entry=0x2aa2e239030, 
type=type@entry=44672)
 at /usr/src/debug/qemu-2.5.50/kvm-all.c:1984
#2  0x02aa2d54701e in kvm_cpu_exec (cpu=0x2aa2e239030) at 
/usr/src/debug/qemu-2.5.50/kvm-all.c:1834
#3  0x02aa2d533cd6 in qemu_kvm_cpu_thread_fn (arg=) at 
/usr/src/debug/qemu-2.5.50/cpus.c:1056
#4  0x03ff971884c6 in start_thread () from /lib64/libpthread.so.0
#5  0x03ff96802ec2 in thread_start () from /lib64/libc.so.6

Thread 4 (Thread 0x3ff951ff910 (LWP 52849)):
#0  0x03ff967fcf56 in syscall () from /lib64/libc.so.6
#1  0x02aa2d755a36 in futex_wait (val=, ev=) 
at util/qemu-thread-posix.c:292
#2  qemu_event_wait (ev=0x2aa2ddb5914 ) at 
util/qemu-thread-posix.c:399
#3  0x02aa2d765002 in call_rcu_thread (opaque=) at 
util/rcu.c:250
#4  0x03ff971884c6 in start_thread () from /lib64/libpthread.so.0
#5  0x03ff96802ec2 in thread_start () from /lib64/libc.so.6
---Type  to continue, or q  to quit---

Thread 3 (Thread 0x3ff978e0bf0 (LWP 52845)):
#0  0x03ff967f66e6 in ppoll () from /lib64/libc.so.6
#1  0x02aa2d68928e in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
__fds=) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=, nfds=, timeout=-1) at 
qemu-timer.c:313
#3  0x02aa2d688b02 in os_host_main_loop_wait (timeout=-1) at main-loop.c:251
#4  main_loop_wait (nonblocking=) at main-loop.c:505
#5  0x02aa2d4faade in main_loop () at vl.c:1933
#6  main (argc=, argv=, envp=) at 
vl.c:4646

Thread 2 (Thread 0x3ff8910 (LWP 52851)):
#0  0x03ff967f66e6 in ppoll () from /lib64/libc.so.6
#1  0x02aa2d68928e in ppoll (__ss=0x0, __timeout=0x0, __nfds=, 
__fds=) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=, nfds=, timeout=-1) at 
qemu-timer.c:313
#3  0x02aa2d68a788 in aio_poll (ctx=0x2aa2de77e00, blocking=) at aio-posix.c:453
#4  0x02aa2d5b909c in iothread_run (opaque=0x2aa2de77220) at iothread.c:46
#5  0x03ff971884c6 in start_thread () from /lib64/libpthread.so.0
#6  0x03ff96802ec2 in thread_start () from /lib64/libc.so.6

Thread 1 (Thread 0x3ff8f7ff910 (LWP 52854)):
#0  0x03ff9673b650 in raise () from /lib64/libc.so.6
---Type  to continue, or q  to quit---
#1  0x03ff9673ced8 in abort () from /lib64/libc.so.6
#2  0x03ff96733666 in __assert_fail_base () from /lib64/libc.so.6
#3  0x03ff967336f4 in __assert_fail () from /lib64/libc.so.6
#4  0x02aa2d562608 in virtio_blk_handle_output (vdev=, 
vq=)
 at /usr/src/debug/qemu-2.5.50/hw/block/virtio-blk.c:595


Hmmm, are you sure that

Re: [Qemu-devel] [PATCH 1/1] virtio: fix ioeventfd assignment race

2016-03-30 Thread tu bo


Hi Cornelia:

I saw qemu crash for this patch on qemu master yesterday.

However, scsi disks of my lpar is not available because of s38 firmware 
update.  I'll double check the test when it's ready. thx


On 03/29/2016 10:17 PM, Cornelia Huck wrote:

The ->set_host_notifier() callback is invoked whenever we want to
switch from or to the generic ioeventfd handler. Currently, all
transports deregister the ioeventfd backing and then re-register
it. This opens a race window where we are without ioeventfd
backing for a time period: In the virtio-blk dataplane case, we
observed notifications coming in from both the vcpu thread and
the iothread.

Let's change pci, mmio and ccw to keep the ioeventfd during
->set_host_notifier() and only switch the ioeventfd handler.

Signed-off-by: Cornelia Huck 
---
  hw/s390x/virtio-ccw.c  | 22 +-
  hw/virtio/virtio-mmio.c| 27 +--
  hw/virtio/virtio-pci.c | 28 ++--
  include/hw/virtio/virtio-bus.h |  4 
  4 files changed, 56 insertions(+), 25 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index cb887ba..7b1088e 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -1196,14 +1196,26 @@ static bool 
virtio_ccw_query_guest_notifiers(DeviceState *d)
  static int virtio_ccw_set_host_notifier(DeviceState *d, int n, bool assign)
  {
  VirtioCcwDevice *dev = VIRTIO_CCW_DEVICE(d);
+VirtIODevice *vdev = virtio_bus_get_device(>bus);
+VirtQueue *vq = virtio_get_queue(vdev, n);

-/* Stop using the generic ioeventfd, we are doing eventfd handling
- * ourselves below */
-dev->ioeventfd_disabled = assign;
  if (assign) {
-virtio_ccw_stop_ioeventfd(dev);
+/* Stop using the generic ioeventfd, we are doing eventfd handling
+ * ourselves below */
+dev->ioeventfd_disabled = true;
+};
+/*
+ * Just switch the handler, don't deassign the ioeventfd.
+ * Otherwise, there's a window where we don't have an
+ * ioeventfd and we may end up with a notification where
+ * we don't expect one.
+ */
+virtio_queue_set_host_notifier_fd_handler(vq, assign, !assign);
+if (!assign) {
+/* Use generic ioeventfd handler again. */
+dev->ioeventfd_disabled = false;
  }
-return virtio_ccw_set_guest2host_notifier(dev, n, assign, false);
+return 0;
  }

  static int virtio_ccw_get_mappings(VirtioCcwDevice *dev)
diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index d4cd91f..aafebdf 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -502,19 +502,26 @@ static int virtio_mmio_set_host_notifier(DeviceState 
*opaque, int n,
   bool assign)
  {
  VirtIOMMIOProxy *proxy = VIRTIO_MMIO(opaque);
+VirtIODevice *vdev = virtio_bus_get_device(>bus);
+VirtQueue *vq = virtio_get_queue(vdev, n);

-/* Stop using ioeventfd for virtqueue kick if the device starts using host
- * notifiers.  This makes it easy to avoid stepping on each others' toes.
- */
-proxy->ioeventfd_disabled = assign;
  if (assign) {
-virtio_mmio_stop_ioeventfd(proxy);
+/* Stop using the generic ioeventfd, we are doing eventfd handling
+ * ourselves below */
+proxy->ioeventfd_disabled = true;
+};
+/*
+ * Just switch the handler, don't deassign the ioeventfd.
+ * Otherwise, there's a window where we don't have an
+ * ioeventfd and we may end up with a notification where
+ * we don't expect one.
+ */
+virtio_queue_set_host_notifier_fd_handler(vq, assign, !assign);
+if (!assign) {
+/* Use generic ioeventfd handler again. */
+proxy->ioeventfd_disabled = false;
  }
-/* We don't need to start here: it's not needed because backend
- * currently only stops on status change away from ok,
- * reset, vmstop and such. If we do add code to start here,
- * need to check vmstate, device state etc. */
-return virtio_mmio_set_host_notifier_internal(proxy, n, assign, false);
+return 0;
  }

  /* virtio-mmio device */
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 0dadb66..a91c1e8 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1115,18 +1115,26 @@ static int virtio_pci_set_host_notifier(DeviceState *d, 
int n, bool assign)
  {
  VirtIOPCIProxy *proxy = to_virtio_pci_proxy(d);

-/* Stop using ioeventfd for virtqueue kick if the device starts using host
- * notifiers.  This makes it easy to avoid stepping on each others' toes.
- */
-proxy->ioeventfd_disabled = assign;
+VirtIODevice *vdev = virtio_bus_get_device(>bus);
+VirtQueue *vq = virtio_get_queue(vdev, n);
+
  if (assign) {
-virtio_pci_stop_ioeventfd(proxy);
+/* Stop using the generic ioeventfd, we are doing eventfd handling
+ * ourselves below */
+

Re: [Qemu-devel] [PATCH 0/6] virtio: refactor host notifiers

2016-03-30 Thread tu bo


Hi Christian:

On 03/30/2016 12:27 AM, Christian Borntraeger wrote:

On 03/29/2016 03:50 PM, Paolo Bonzini wrote:



On 29/03/2016 13:45, Cornelia Huck wrote:

Hi Tu Bo,

please always include the assertion patch at
https://lists.gnu.org/archive/html/qemu-block/2016-03/msg00546.html in
your tests.  Can you include the backtrace from all threads with that patch?


thanks for your reminder about the assertion patch. Here is the
backtrace with qemu master + assertion patch + "[PATCH 0/6] virtio:
refactor host notifiers",


FWIW, I've been running this in a reboot loop for the last 2 1/2 hours.
Could you perhaps share your command line?


 From code inspection, the following is also necessary or at least a
good idea:

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 6fb29e3..7fa8477 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -258,7 +258,7 @@ void virtio_blk_data_plane_stop(VirtIOBlockDataPlane *s)
  aio_context_acquire(s->ctx);

  /* Stop notifications for new requests from guest */
-virtio_queue_aio_set_host_notifier_handler(s->vq, s->ctx, false, false);
+virtio_queue_aio_set_host_notifier_handler(s->vq, s->ctx, true, false);

  /* Drain and switch bs back to the QEMU main loop */
  blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());



Right. Tu Bo, you seem to have the best testcase for this.
Does your setup runs fine with this on top?


My test needs at least four scsi disks, which was broken now because of 
the s38 firmware update. I'll test it when s38 scsi is ready. thx




CHristian

Re: [Qemu-devel] [PATCH V2 3/3] colo-compare: introduce packet comparison thread

2016-03-30 Thread Li Zhijian

On 03/30/2016 07:41 PM, Dr. David Alan Gilbert wrote:
[...]

>@@ -433,7 +532,9 @@ static void compare_pri_chr_in(void *opaque, const uint8_t 
*buf, int size)
>  if (ret == 1) {
>  if (packet_enqueue(s, PRIMARY_IN)) {
>  error_report("primary: unsupported packet in");
>-compare_chr_send(s->chr_out, buf, size);
>+compare_chr_send(s->chr_out, s->pri_rs.buf, s->pri_rs.packet_len);

Doesn't that change belong in an earlier patch?

>+} else {
>+qemu_event_set(>event);

Also these - why are these in this patch?

This event is to wakeup comparison thread to do compare.
Do you think we should put event related code to patch 2 ?

Thanks
Li

>  }

Re: [Qemu-devel] [PATCH V2 2/3] colo-compare: track connection and enqueue packet

2016-03-30 Thread Li Zhijian




On 03/30/2016 06:36 PM, Dr. David Alan Gilbert wrote:

* Zhang Chen (zhangchen.f...@cn.fujitsu.com) wrote:

In this patch we use kernel jhash table to track
connection, and then enqueue net packet like this:

+ CompareState ++
|   |
+---+   +---+ +---+
|conn list  +--->conn   +->conn   |
+---+   +---+ +---+
|   | |   | |  |
+---+ +---v+  +---v++---v+ +---v+
   |primary |  |secondary|primary | |secondary
   |packet  |  |packet  +|packet  | |packet  +
   ++  ++++ ++
   |   | |  |
   +---v+  +---v++---v+ +---v+
   |primary |  |secondary|primary | |secondary
   |packet  |  |packet  +|packet  | |packet  +
   ++  ++++ ++
   |   | |  |
   +---v+  +---v++---v+ +---v+
   |primary |  |secondary|primary | |secondary
   |packet  |  |packet  +|packet  | |packet  +
   ++  ++++ ++

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
  include/qemu/jhash.h |  59 ++
  net/colo-compare.c   | 324 ++-
  2 files changed, 380 insertions(+), 3 deletions(-)
  create mode 100644 include/qemu/jhash.h

diff --git a/include/qemu/jhash.h b/include/qemu/jhash.h
new file mode 100644
index 000..8a8ff0f
--- /dev/null
+++ b/include/qemu/jhash.h
@@ -0,0 +1,59 @@
+/* jhash.h: Jenkins hash support.
+  *
+  * Copyright (C) 2006. Bob Jenkins (bob_jenk...@burtleburtle.net)
+  *
+  * http://burtleburtle.net/bob/hash/
+  *
+  * These are the credits from Bob's sources:
+  *
+  * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
+  *
+  * These are functions for producing 32-bit hashes for hash table lookup.
+  * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
+  * are externally useful functions.  Routines to test the hash are included
+  * if SELF_TEST is defined.  You can use this free for any purpose.It's in
+  * the public domain.  It has no warranty.
+  *
+  * Copyright (C) 2009-2010 Jozsef Kadlecsik (kad...@blackhole.kfki.hu)
+  *
+  * I've modified Bob's hash to be useful in the Linux kernel, and
+  * any bugs present are my fault.
+  * Jozsef
+  */
+
+#ifndef QEMU_JHASH_H__
+#define QEMU_JHASH_H__
+
+#include "qemu/bitops.h"
+
+/*
+ * hashtable related is copied from linux kernel jhash
+ */
+
+/* __jhash_mix -- mix 3 32-bit values reversibly. */
+#define __jhash_mix(a, b, c)\
+{   \
+a -= c;  a ^= rol32(c, 4);  c += b; \
+b -= a;  b ^= rol32(a, 6);  a += c; \
+c -= b;  c ^= rol32(b, 8);  b += a; \
+a -= c;  a ^= rol32(c, 16); c += b; \
+b -= a;  b ^= rol32(a, 19); a += c; \
+c -= b;  c ^= rol32(b, 4);  b += a; \
+}
+
+/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
+#define __jhash_final(a, b, c)  \
+{   \
+c ^= b; c -= rol32(b, 14);  \
+a ^= c; a -= rol32(c, 11);  \
+b ^= a; b -= rol32(a, 25);  \
+c ^= b; c -= rol32(b, 16);  \
+a ^= c; a -= rol32(c, 4);   \
+b ^= a; b -= rol32(a, 14);  \
+c ^= b; c -= rol32(b, 24);  \
+}
+
+/* An arbitrary initial parameter */
+#define JHASH_INITVAL   0xdeadbeef
+
+#endif /* QEMU_JHASH_H__ */
diff --git a/net/colo-compare.c b/net/colo-compare.c
index 62c66df..0bb5a51 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -20,15 +20,22 @@
  #include "net/queue.h"
  #include "sysemu/char.h"
  #include "qemu/sockets.h"
+#include 
+#include "slirp/slirp.h"
+#include "qemu/jhash.h"
+#include 

  #define TYPE_COLO_COMPARE "colo-compare"
  #define COLO_COMPARE(obj) \
  OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)

  #define COMPARE_READ_LEN_MAX NET_BUFSIZE
+#define PAGE_SIZE 4096
+#define ETH_HLEN 14


PAGE_SIZE is not just 4k; use one of the system headers.
Also, don't define ETH_HLEN - include net/eth.h


  static QTAILQ_HEAD(, CompareState) net_compares =
 QTAILQ_HEAD_INITIALIZER(net_compares);
+static ssize_t hashtable_max_size;

  typedef struct ReadState {
  int state; /* 0 = getting length, 1 = getting data */
@@ -37,6 +44,28 @@ typedef struct ReadState {
  uint8_t buf[COMPARE_READ_LEN_MAX];
  } ReadState;

+/*
+  + CompareState ++
+  |   |
+  +---+   +---+ +---+
+  |conn list  +--->conn

[Qemu-devel] [Bug 1562653] Re: Ubuntu 15.10: QEMU VM hang if memory >= 1T

2016-03-30 Thread changlimin

I mean vm hang when memory >= 1100G (1024G when enable ide cdrom)
instead of 1.5G.

If disable hpet, the vm will hang at
acpi_ex_system_memory_space_handler  when memory >= 1100G

If disable kvm, vm is good when memory <= 1020G, but also hang when
memory >= 1024G.

There is no critical information in vm.log:

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1562653

Title:
  Ubuntu 15.10: QEMU VM hang if memory >= 1T

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Incomplete

Bug description:
  1. Ubuntu 15.10 x86_64 installed on HP SuperDome X with 8CPUs and 4T
  memory.

  2. Create a VM, install Ubuntu 15.10, if memory >= 1T , VM hang when start. 
If memory < 1T, it is good.
  
u1510-1
39eefe1e-4829-4843-b892-026d143f3ec7
1073741824
1073741824
16

  hvm
  
  


  
  
  


destroy
restart
restart

  /usr/bin/kvm
  




  
  




  
  
  

  
  

  
  





  
  
  
  

  
  


  
  

  

  

  3. The panic stack is
... cannot show
async_page_fault+0x28
ioread32_rep+0x38
ata_sff_data_xfer32+0x8a
ata_pio_sector+0x93
ata_pio_sectors+0x34
ata_sff_hsm_move+0x226
RIP: kthread_data+0x10
CR2: _FFD8

  4. Change the host os to Redhat 7.2 , the vm is good even memory
  >=3.8T.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1562653/+subscriptions

Re: [Qemu-devel] [PATCH 0/9] some QOM'ify work under hw/arm

2016-03-30 Thread xiaoqiang zhao



在 2016年03月16日 20:00, Peter Maydell 写道:

On 16 March 2016 at 11:31, Peter Maydell  wrote:

A general note -- since these aren't bug fixes and we're in soft
freeze now, I don't think we should put them in until 2.6 is out.
I'll review the patches in a second, though.

Well, the patches all look good -- ping me after 2.6 is out to
apply them to target-arm.next then.

Where I've noted possible further cleanups, that's intended as
an "if you like writing cleanup patches" thing -- you don't need
to try to do them as part of this patch series.

thanks
-- PMM

ping !

Re: [Qemu-devel] [PATCH V2 1/3] colo-compare: introduce colo compare initlization

2016-03-30 Thread Zhang Chen




On 03/30/2016 05:25 PM, Dr. David Alan Gilbert wrote:

* Zhang Chen (zhangchen.f...@cn.fujitsu.com) wrote:

packet come from primary char indev will be send to
outdev - packet come from secondary char dev will be drop

Please put in the description an example of how you invoke
the filter on the primary and secondary.


OK.
colo-compare get packet from chardev(primary_in,secondary_in),
and output to other chardev(outdev), so, we can use it with the
help of filter-mirror and filter-redirector. like that:

primary:
-netdev 
tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown

-device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66
-chardev socket,id=mirror0,host=3.3.3.3,port=9003,server,nowait
-chardev socket,id=compare1,host=3.3.3.3,port=9004,server,nowait
-chardev socket,id=compare0,host=3.3.3.3,port=9001,server,nowait
-chardev socket,id=compare0-0,host=3.3.3.3,port=9001
-chardev socket,id=compare_out,host=3.3.3.3,port=9005,server,nowait
-chardev socket,id=compare_out0,host=3.3.3.3,port=9005
-object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0
-object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out
-object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0
-object 
colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0 



secondary:
-netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,down 
script=/etc/qemu-ifdown

-device e1000,netdev=hn0,mac=52:a4:00:12:78:66
-chardev socket,id=red0,host=3.3.3.3,port=9003
-chardev socket,id=red1,host=3.3.3.3,port=9004
-object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0
-object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1






Signed-off-by: Li Zhijian 
Signed-off-by: Zhang Chen 
Signed-off-by: Wen Congyang 
---
  net/Makefile.objs  |   1 +
  net/colo-compare.c | 344 +
  vl.c   |   3 +-
  3 files changed, 347 insertions(+), 1 deletion(-)
  create mode 100644 net/colo-compare.c

diff --git a/net/Makefile.objs b/net/Makefile.objs
index b7c22fd..ba92f73 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -16,3 +16,4 @@ common-obj-$(CONFIG_NETMAP) += netmap.o
  common-obj-y += filter.o
  common-obj-y += filter-buffer.o
  common-obj-y += filter-mirror.o
+common-obj-y += colo-compare.o
diff --git a/net/colo-compare.c b/net/colo-compare.c
new file mode 100644
index 000..62c66df
--- /dev/null
+++ b/net/colo-compare.c
@@ -0,0 +1,344 @@
+/*
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Author: Zhang Chen 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
+
+#include "net/net.h"
+#include "net/vhost_net.h"
+#include "qom/object_interfaces.h"
+#include "qemu/iov.h"
+#include "qom/object.h"
+#include "qemu/typedefs.h"
+#include "net/queue.h"
+#include "sysemu/char.h"
+#include "qemu/sockets.h"
+
+#define TYPE_COLO_COMPARE "colo-compare"
+#define COLO_COMPARE(obj) \
+OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)
+
+#define COMPARE_READ_LEN_MAX NET_BUFSIZE
+
+static QTAILQ_HEAD(, CompareState) net_compares =
+   QTAILQ_HEAD_INITIALIZER(net_compares);
+
+typedef struct ReadState {
+int state; /* 0 = getting length, 1 = getting data */
+unsigned int index;
+unsigned int packet_len;

Please make packet_len and index  uint32_t, since they're sent over the wire
as 32bit.


+uint8_t buf[COMPARE_READ_LEN_MAX];
+} ReadState;
+
+typedef struct CompareState {
+Object parent;
+
+char *pri_indev;
+char *sec_indev;
+char *outdev;
+CharDriverState *chr_pri_in;
+CharDriverState *chr_sec_in;
+CharDriverState *chr_out;
+QTAILQ_ENTRY(CompareState) next;
+ReadState pri_rs;
+ReadState sec_rs;
+} CompareState;
+
+static int compare_chr_send(CharDriverState *out, const uint8_t *buf, int size)
+{
+int ret = 0;
+uint32_t len = htonl(size);
+

Similarly, make 'size' uint32_t - everything that gets converted into a uint32_t
it's probably best to make a uint32_t.


+if (!size) {
+return 0;
+}
+
+ret = qemu_chr_fe_write_all(out, (uint8_t *), sizeof(len));
+if (ret != sizeof(len)) {
+goto err;
+}
+
+ret = qemu_chr_fe_write_all(out, (uint8_t *)buf, size);
+if (ret != size) {
+goto err;
+}
+

You can make this slightly simpler and save the return 0;


+return 0;
+
+err:
+return ret < 0 ? ret : -EIO;

err:
return ret <= 0 ? ret : -EIO;


+}
+
+static int compare_chr_can_read(void *opaque)
+{
+return COMPARE_READ_LEN_MAX;
+}
+
+/* Returns
+ * 0: readstate is not ready
+ * 1: readstate is ready
+ * otherwise error occurs
+ */
+static int compare_chr_fill_rstate(ReadState *rs,

Re: [Qemu-devel] [PATCH] target-ppc: Multiple/String Word alignment exception

2016-03-30 Thread Laurent Vivier

On 31/03/2016 01:29, David Gibson wrote:
> On Wed, 30 Mar 2016 19:13:00 +0200
> Laurent Vivier  wrote:
> 
>> If the processor is in little-endian mode, an alignment interrupt must
>> occur for the following instructions: lmw, stmw, lswi, lswx, stswi or stswx.
>>
>> This is what happens with KVM, so change TCG to do the same.
>>
>> As the instruction can be emulated by the kernel, enable the change
>> only in softmmu mode.
>>
>> Signed-off-by: Laurent Vivier 
> 
> I guess this makes sense given the existing hardware behaviour, even
> though it seems a bit perverse to me to make the emulator strictly less
> functional.
> 
> Alex, what do you think?
> 
> Note that in time I expect we'll want some new flag to control this
> behaviour.  Given the push towards LE, I think it's pretty likely that
> future CPUs (maybe even POWER9) will allow these operations on LE
> without exceptions.
> 
> I guess one question here is what does the architecture say about
> this?  Does it say these operations will generate alignment exceptions
> on LE, or just that they may (implementation dependent)?

"Power ISA Version 2.07

Book III-S

6.5.8 Alignment Interrupt

An Alignment interrupt occurs when no higher priority
exception exists and a data access cannot be per-
formed for any of the following reasons.
...
* The instruction is lmw, stmw, lswi, lswx, stswi, or
stswx, and the operand is in storage that is Write
Through Required or Caching Inhibited, or the
thread is in Little-Endian mode.
..."

And this is very similar in

"The PowerPC Architecture,
A specification for a new family of risc processors

Book III PowerPC Operating Environment Architecture

5.5.6 Alignment Interrupt

An Alignment interrupt occurs when no higher priority
exception exists and the implementation cannot perform a storage access
for one of the reasons listed below.

...
* The instruction is lmw, stmw, lswi, lswx, stswi, or
stswx, and the processor is in Little-Endian mode.
..."

And

"Power ISA version 3.0"

Chapter 3. Fixed-Point Facility
...
lswi
...
This instruction is not supported in Little-Endian mode.
If it is executed in Little-Endian mode, the system alignment
error handler is invoked.
...
Book III

6.5.8 Alignment Interrupt
...
An Alignment interrupt occurs when no higher priority
exception exists and an attempt is made to execute an
instruction in a manner that is required, by the instruction
description, to cause an Alignment interrupt. These
cases are as follows.
* A Load/Store Multiple instruction that is executed
in Little-Endian mode
...
An Alignment interrupt may occur when no higher priority
exception exists and a data access cannot be performed
for any of the following reasons
..."

Laurent

Re: [Qemu-devel] [PATCH] target-ppc: Multiple/String Word alignment exception

2016-03-30 Thread David Gibson

On Wed, 30 Mar 2016 19:13:00 +0200
Laurent Vivier  wrote:

> If the processor is in little-endian mode, an alignment interrupt must
> occur for the following instructions: lmw, stmw, lswi, lswx, stswi or stswx.
> 
> This is what happens with KVM, so change TCG to do the same.
> 
> As the instruction can be emulated by the kernel, enable the change
> only in softmmu mode.
> 
> Signed-off-by: Laurent Vivier 

I guess this makes sense given the existing hardware behaviour, even
though it seems a bit perverse to me to make the emulator strictly less
functional.

Alex, what do you think?

Note that in time I expect we'll want some new flag to control this
behaviour.  Given the push towards LE, I think it's pretty likely that
future CPUs (maybe even POWER9) will allow these operations on LE
without exceptions.

I guess one question here is what does the architecture say about
this?  Does it say these operations will generate alignment exceptions
on LE, or just that they may (implementation dependent)?

> ---
>  target-ppc/translate.c | 42 ++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 6f0e7b4..e33dcf7 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -3181,6 +3181,13 @@ static void gen_lmw(DisasContext *ctx)
>  {
>  TCGv t0;
>  TCGv_i32 t1;
> +#if !defined(CONFIG_USER_ONLY)
> +if (ctx->le_mode) {
> +gen_exception_err(ctx, POWERPC_EXCP_ALIGN, POWERPC_EXCP_ALIGN_LE);
> +return;
> +}
> +#endif
> +
>  gen_set_access_type(ctx, ACCESS_INT);
>  /* NIP cannot be restored if the memory exception comes from an helper */
>  gen_update_nip(ctx, ctx->nip - 4);
> @@ -3197,6 +3204,13 @@ static void gen_stmw(DisasContext *ctx)
>  {
>  TCGv t0;
>  TCGv_i32 t1;
> +#if !defined(CONFIG_USER_ONLY)
> +if (ctx->le_mode) {
> +gen_exception_err(ctx, POWERPC_EXCP_ALIGN, POWERPC_EXCP_ALIGN_LE);
> +return;
> +}
> +#endif
> +
>  gen_set_access_type(ctx, ACCESS_INT);
>  /* NIP cannot be restored if the memory exception comes from an helper */
>  gen_update_nip(ctx, ctx->nip - 4);
> @@ -3224,6 +3238,13 @@ static void gen_lswi(DisasContext *ctx)
>  int start = rD(ctx->opcode);
>  int ra = rA(ctx->opcode);
>  int nr;
> +#if !defined(CONFIG_USER_ONLY)
> +if (ctx->le_mode) {
> +gen_exception_err(ctx, POWERPC_EXCP_ALIGN, POWERPC_EXCP_ALIGN_LE);
> +return;
> +}
> +#endif
> +
>  
>  if (nb == 0)
>  nb = 32;
> @@ -3252,6 +3273,13 @@ static void gen_lswx(DisasContext *ctx)
>  {
>  TCGv t0;
>  TCGv_i32 t1, t2, t3;
> +#if !defined(CONFIG_USER_ONLY)
> +if (ctx->le_mode) {
> +gen_exception_err(ctx, POWERPC_EXCP_ALIGN, POWERPC_EXCP_ALIGN_LE);
> +return;
> +}
> +#endif
> +
>  gen_set_access_type(ctx, ACCESS_INT);
>  /* NIP cannot be restored if the memory exception comes from an helper */
>  gen_update_nip(ctx, ctx->nip - 4);
> @@ -3273,6 +3301,13 @@ static void gen_stswi(DisasContext *ctx)
>  TCGv t0;
>  TCGv_i32 t1, t2;
>  int nb = NB(ctx->opcode);
> +#if !defined(CONFIG_USER_ONLY)
> +if (ctx->le_mode) {
> +gen_exception_err(ctx, POWERPC_EXCP_ALIGN, POWERPC_EXCP_ALIGN_LE);
> +return;
> +}
> +#endif
> +
>  gen_set_access_type(ctx, ACCESS_INT);
>  /* NIP cannot be restored if the memory exception comes from an helper */
>  gen_update_nip(ctx, ctx->nip - 4);
> @@ -3293,6 +3328,13 @@ static void gen_stswx(DisasContext *ctx)
>  {
>  TCGv t0;
>  TCGv_i32 t1, t2;
> +#if !defined(CONFIG_USER_ONLY)
> +if (ctx->le_mode) {
> +gen_exception_err(ctx, POWERPC_EXCP_ALIGN, POWERPC_EXCP_ALIGN_LE);
> +return;
> +}
> +#endif
> +
>  gen_set_access_type(ctx, ACCESS_INT);
>  /* NIP cannot be restored if the memory exception comes from an helper */
>  gen_update_nip(ctx, ctx->nip - 4);
> -- 
> 2.5.5
> 


-- 
David Gibson 
Senior Software Engineer, Virtualization, Red Hat


pgpySdH8sPsNG.pgp
Description: OpenPGP digital signature

[Qemu-devel] [PULL] Fix ipv6 options according to documentation

2016-03-30 Thread Samuel Thibault

The options names were fixed in the qapi layer, but not in the command-line
options.

Signed-off-by: Samuel Thibault 
Reviewed-by: Eric Blake 
---
 net/net.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/net.c b/net/net.c
index 3b5a142..594c3b8 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1054,32 +1054,32 @@ int net_client_init(QemuOpts *opts, int is_netdev, 
Error **errp)
 
 {
 /* Parse convenience option format ip6-net=fec0::0[/64] */
-const char *ip6_net = qemu_opt_get(opts, "ip6-net");
+const char *ip6_net = qemu_opt_get(opts, "ipv6-net");
 
 if (ip6_net) {
 char buf[strlen(ip6_net) + 1];
 
 if (get_str_sep(buf, sizeof(buf), _net, '/') < 0) {
 /* Default 64bit prefix length.  */
-qemu_opt_set(opts, "ip6-prefix", ip6_net, _abort);
-qemu_opt_set_number(opts, "ip6-prefixlen", 64, _abort);
+qemu_opt_set(opts, "ipv6-prefix", ip6_net, _abort);
+qemu_opt_set_number(opts, "ipv6-prefixlen", 64, _abort);
 } else {
 /* User-specified prefix length.  */
 unsigned long len;
 int err;
 
-qemu_opt_set(opts, "ip6-prefix", buf, _abort);
+qemu_opt_set(opts, "ipv6-prefix", buf, _abort);
 err = qemu_strtoul(ip6_net, NULL, 10, );
 
 if (err) {
 error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
-  "ip6-prefix", "a number");
+  "ipv6-prefix", "a number");
 } else {
-qemu_opt_set_number(opts, "ip6-prefixlen", len,
+qemu_opt_set_number(opts, "ipv6-prefixlen", len,
 _abort);
 }
 }
-qemu_opt_unset(opts, "ip6-net");
+qemu_opt_unset(opts, "ipv6-net");
 }
 }
 
-- 
2.8.0.rc3

Re: [Qemu-devel] [PATCH] Fix ipv6 options according to documentation

2016-03-30 Thread Eric Blake

On 03/30/2016 04:55 PM, Samuel Thibault wrote:
> The options names were fixed in the qapi layer, but not in the command-line
> options.
> 
> Signed-off-by: Samuel Thibault 
> ---
>  net/net.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCHv2 3/5] slirp: Add dns6 resolution

2016-03-30 Thread Samuel Thibault

This makes get_dns_addr address family-agnostic, thus allowing to add the
IPv6 case.

Signed-off-by: Samuel Thibault 

---
Changes since v1:
- code layout
- cope with inet_ntop returning NULL
- move static variables to function using them.
---
 slirp/ip6.h  |  9 +++
 slirp/libslirp.h |  1 +
 slirp/slirp.c| 79 
 slirp/socket.c   |  4 +--
 4 files changed, 69 insertions(+), 24 deletions(-)

diff --git a/slirp/ip6.h b/slirp/ip6.h
index 8ddfa24..da23de6 100644
--- a/slirp/ip6.h
+++ b/slirp/ip6.h
@@ -26,6 +26,12 @@
 0x00, 0x00, 0x00, 0x00,\
 0x00, 0x00, 0x00, 0x02 } }
 
+#define ZERO_ADDR  { .s6_addr = \
+{ 0x00, 0x00, 0x00, 0x00,\
+0x00, 0x00, 0x00, 0x00,\
+0x00, 0x00, 0x00, 0x00,\
+0x00, 0x00, 0x00, 0x00 } }
+
 static inline bool in6_equal(const struct in6_addr *a, const struct in6_addr 
*b)
 {
 return memcmp(a, b, sizeof(*a)) == 0;
@@ -84,6 +90,9 @@ static inline bool in6_equal_mach(const struct in6_addr *a,
 #define in6_solicitednode_multicast(a)\
 (in6_equal_net(a, &(struct in6_addr)SOLICITED_NODE_PREFIX, 104))
 
+#define in6_zero(a)\
+(in6_equal(a, &(struct in6_addr)ZERO_ADDR))
+
 /* Compute emulated host MAC address from its ipv6 address */
 static inline void in6_compute_ethaddr(struct in6_addr ip,
uint8_t eth[ETH_ALEN])
diff --git a/slirp/libslirp.h b/slirp/libslirp.h
index 127aa41..b0cfbc5 100644
--- a/slirp/libslirp.h
+++ b/slirp/libslirp.h
@@ -7,6 +7,7 @@ struct Slirp;
 typedef struct Slirp Slirp;
 
 int get_dns_addr(struct in_addr *pdns_addr);
+int get_dns6_addr(struct in6_addr *pdns6_addr);
 
 Slirp *slirp_init(int restricted, bool in_enabled, struct in_addr vnetwork,
   struct in_addr vnetmask, struct in_addr vhost,
diff --git a/slirp/slirp.c b/slirp/slirp.c
index 4478a43..c6bcc6e 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -46,7 +46,9 @@ static QTAILQ_HEAD(slirp_instances, Slirp) slirp_instances =
 QTAILQ_HEAD_INITIALIZER(slirp_instances);
 
 static struct in_addr dns_addr;
+static struct in6_addr dns6_addr;
 static u_int dns_addr_time;
+static u_int dns6_addr_time;
 
 #define TIMEOUT_FAST 2  /* milliseconds */
 #define TIMEOUT_SLOW 499  /* milliseconds */
@@ -100,6 +102,11 @@ int get_dns_addr(struct in_addr *pdns_addr)
 return 0;
 }
 
+int get_dns6_addr(struct in6_addr *pdns_addr6)
+{
+return -1;
+}
+
 static void winsock_cleanup(void)
 {
 WSACleanup();
@@ -107,36 +114,37 @@ static void winsock_cleanup(void)
 
 #else
 
-static struct stat dns_addr_stat;
-
-static int get_dns_addr_cached(struct in_addr *pdns_addr)
+static int get_dns_addr_cached(void *pdns_addr, void *cached_addr,
+   socklen_t addrlen,
+   struct stat *cached_stat, u_int *cached_time)
 {
 struct stat old_stat;
-if (curtime - dns_addr_time < TIMEOUT_DEFAULT) {
-*pdns_addr = dns_addr;
+if (curtime - *cached_time < TIMEOUT_DEFAULT) {
+memcpy(pdns_addr, cached_addr, addrlen);
 return 0;
 }
-old_stat = dns_addr_stat;
-if (stat("/etc/resolv.conf", _addr_stat) != 0) {
+old_stat = *cached_stat;
+if (stat("/etc/resolv.conf", cached_stat) != 0) {
 return -1;
 }
-if (dns_addr_stat.st_dev == old_stat.st_dev
-&& dns_addr_stat.st_ino == old_stat.st_ino
-&& dns_addr_stat.st_size == old_stat.st_size
-&& dns_addr_stat.st_mtime == old_stat.st_mtime) {
-*pdns_addr = dns_addr;
+if (cached_stat->st_dev == old_stat.st_dev
+&& cached_stat->st_ino == old_stat.st_ino
+&& cached_stat->st_size == old_stat.st_size
+&& cached_stat->st_mtime == old_stat.st_mtime) {
+memcpy(pdns_addr, cached_addr, addrlen);
 return 0;
 }
 return 1;
 }
 
-static int get_dns_addr_resolv_conf(struct in_addr *pdns_addr)
+static int get_dns_addr_resolv_conf(int af, void *pdns_addr, void *cached_addr,
+socklen_t addrlen, u_int *cached_time)
 {
 char buff[512];
 char buff2[257];
 FILE *f;
 int found = 0;
-struct in_addr tmp_addr;
+void *tmp_addr = alloca(addrlen);
 
 f = fopen("/etc/resolv.conf", "r");
 if (!f)
@@ -147,13 +155,14 @@ static int get_dns_addr_resolv_conf(struct in_addr 
*pdns_addr)
 #endif
 while (fgets(buff, 512, f) != NULL) {
 if (sscanf(buff, "nameserver%*[ \t]%256s", buff2) == 1) {
-if (!inet_aton(buff2, _addr))
+if (!inet_pton(af, buff2, tmp_addr)) {
 continue;
+}
 /* If it's the first one, set it to dns_addr */
 if (!found) {
-*pdns_addr = tmp_addr;
-dns_addr = tmp_addr;
-dns_addr_time = curtime;
+

[Qemu-devel] [PATCHv2 4/5] slirp: Support link-local DNS addresses

2016-03-30 Thread Samuel Thibault

They look like fe80::%eth0

Signed-off-by: Samuel Thibault 
Reviewed-by: Thomas Huth 
---
 slirp/libslirp.h |  2 +-
 slirp/slirp.c| 26 ++
 slirp/socket.c   |  2 +-
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/slirp/libslirp.h b/slirp/libslirp.h
index b0cfbc5..81bd139 100644
--- a/slirp/libslirp.h
+++ b/slirp/libslirp.h
@@ -7,7 +7,7 @@ struct Slirp;
 typedef struct Slirp Slirp;
 
 int get_dns_addr(struct in_addr *pdns_addr);
-int get_dns6_addr(struct in6_addr *pdns6_addr);
+int get_dns6_addr(struct in6_addr *pdns6_addr, unsigned *scope_id);
 
 Slirp *slirp_init(int restricted, bool in_enabled, struct in_addr vnetwork,
   struct in_addr vnetmask, struct in_addr vhost,
diff --git a/slirp/slirp.c b/slirp/slirp.c
index c6bcc6e..c00fa32 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -30,6 +30,10 @@
 #include "hw/hw.h"
 #include "qemu/cutils.h"
 
+#ifndef _WIN32
+#include 
+#endif
+
 /* host loopback address */
 struct in_addr loopback_addr;
 /* host loopback network mask */
@@ -138,13 +142,15 @@ static int get_dns_addr_cached(void *pdns_addr, void 
*cached_addr,
 }
 
 static int get_dns_addr_resolv_conf(int af, void *pdns_addr, void *cached_addr,
-socklen_t addrlen, u_int *cached_time)
+socklen_t addrlen, unsigned *scope_id,
+u_int *cached_time)
 {
 char buff[512];
 char buff2[257];
 FILE *f;
 int found = 0;
 void *tmp_addr = alloca(addrlen);
+unsigned if_index;
 
 f = fopen("/etc/resolv.conf", "r");
 if (!f)
@@ -155,6 +161,14 @@ static int get_dns_addr_resolv_conf(int af, void 
*pdns_addr, void *cached_addr,
 #endif
 while (fgets(buff, 512, f) != NULL) {
 if (sscanf(buff, "nameserver%*[ \t]%256s", buff2) == 1) {
+char *c = strchr(buff2, '%');
+if (c) {
+if_index = if_nametoindex(c + 1);
+*c = '\0';
+} else {
+if_index = 0;
+}
+
 if (!inet_pton(af, buff2, tmp_addr)) {
 continue;
 }
@@ -162,6 +176,9 @@ static int get_dns_addr_resolv_conf(int af, void 
*pdns_addr, void *cached_addr,
 if (!found) {
 memcpy(pdns_addr, tmp_addr, addrlen);
 memcpy(cached_addr, tmp_addr, addrlen);
+if (scope_id) {
+*scope_id = if_index;
+}
 *cached_time = curtime;
 }
 #ifdef DEBUG
@@ -205,10 +222,10 @@ int get_dns_addr(struct in_addr *pdns_addr)
 }
 }
 return get_dns_addr_resolv_conf(AF_INET, pdns_addr, _addr,
-sizeof(dns_addr), _addr_time);
+sizeof(dns_addr), NULL, _addr_time);
 }
 
-int get_dns6_addr(struct in6_addr *pdns6_addr)
+int get_dns6_addr(struct in6_addr *pdns6_addr, unsigned *scope_id)
 {
 static struct stat dns6_addr_stat;
 
@@ -221,7 +238,8 @@ int get_dns6_addr(struct in6_addr *pdns6_addr)
 }
 }
 return get_dns_addr_resolv_conf(AF_INET6, pdns6_addr, _addr,
-sizeof(dns6_addr), _addr_time);
+sizeof(dns6_addr),
+scope_id, _addr_time);
 }
 
 #endif
diff --git a/slirp/socket.c b/slirp/socket.c
index 653257d..896c27e 100644
--- a/slirp/socket.c
+++ b/slirp/socket.c
@@ -796,7 +796,7 @@ void sotranslate_out(struct socket *so, struct 
sockaddr_storage *addr)
 if (in6_equal_net(>so_faddr6, >vprefix_addr6,
 slirp->vprefix_len)) {
 if (in6_equal(>so_faddr6, >vnameserver_addr6)) {
-if (get_dns6_addr(>sin6_addr) < 0) {
+if (get_dns6_addr(>sin6_addr, >sin6_scope_id) < 0) 
{
 sin6->sin6_addr = in6addr_loopback;
 }
 } else {
-- 
2.8.0.rc3

[Qemu-devel] [PATCHv2 5/5] slirp: Add RDNSS advertisement

2016-03-30 Thread Samuel Thibault

This adds the RDNSS option to IPv6 router advertisements, so that the guest
can autoconfigure the DNS server address.

Signed-off-by: Samuel Thibault 
Reviewed-by: Thomas Huth 
---
 slirp/ip6_icmp.c | 19 ---
 slirp/ip6_icmp.h | 12 ++--
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/slirp/ip6_icmp.c b/slirp/ip6_icmp.c
index 09571bc..74585d1 100644
--- a/slirp/ip6_icmp.c
+++ b/slirp/ip6_icmp.c
@@ -149,7 +149,8 @@ void ndp_send_ra(Slirp *slirp)
 rip->ip_nh = IPPROTO_ICMPV6;
 rip->ip_pl = htons(ICMP6_NDP_RA_MINLEN
 + NDPOPT_LINKLAYER_LEN
-+ NDPOPT_PREFIXINFO_LEN);
++ NDPOPT_PREFIXINFO_LEN
++ NDPOPT_RDNSS_LEN);
 t->m_len = sizeof(struct ip6) + ntohs(rip->ip_pl);
 
 /* Build ICMPv6 packet */
@@ -167,16 +168,16 @@ void ndp_send_ra(Slirp *slirp)
 ricmp->icmp6_nra.lifetime = htons(NDP_AdvDefaultLifetime);
 ricmp->icmp6_nra.reach_time = htonl(NDP_AdvReachableTime);
 ricmp->icmp6_nra.retrans_time = htonl(NDP_AdvRetransTime);
+t->m_data += ICMP6_NDP_RA_MINLEN;
 
 /* Source link-layer address (NDP option) */
-t->m_data += ICMP6_NDP_RA_MINLEN;
 struct ndpopt *opt = mtod(t, struct ndpopt *);
 opt->ndpopt_type = NDPOPT_LINKLAYER_SOURCE;
 opt->ndpopt_len = NDPOPT_LINKLAYER_LEN / 8;
 in6_compute_ethaddr(rip->ip_src, opt->ndpopt_linklayer);
+t->m_data += NDPOPT_LINKLAYER_LEN;
 
 /* Prefix information (NDP option) */
-t->m_data += NDPOPT_LINKLAYER_LEN;
 struct ndpopt *opt2 = mtod(t, struct ndpopt *);
 opt2->ndpopt_type = NDPOPT_PREFIX_INFO;
 opt2->ndpopt_len = NDPOPT_PREFIXINFO_LEN / 8;
@@ -188,8 +189,20 @@ void ndp_send_ra(Slirp *slirp)
 opt2->ndpopt_prefixinfo.pref_lt = htonl(NDP_AdvPrefLifetime);
 opt2->ndpopt_prefixinfo.reserved2 = 0;
 opt2->ndpopt_prefixinfo.prefix = slirp->vprefix_addr6;
+t->m_data += NDPOPT_PREFIXINFO_LEN;
+
+/* Prefix information (NDP option) */
+struct ndpopt *opt3 = mtod(t, struct ndpopt *);
+opt3->ndpopt_type = NDPOPT_RDNSS;
+opt3->ndpopt_len = NDPOPT_RDNSS_LEN / 8;
+opt3->ndpopt_rdnss.reserved = 0;
+opt3->ndpopt_rdnss.lifetime = htonl(2 * NDP_MaxRtrAdvInterval);
+opt3->ndpopt_rdnss.addr = slirp->vnameserver_addr6;
+t->m_data += NDPOPT_RDNSS_LEN;
 
 /* ICMPv6 Checksum */
+t->m_data -= NDPOPT_RDNSS_LEN;
+t->m_data -= NDPOPT_PREFIXINFO_LEN;
 t->m_data -= NDPOPT_LINKLAYER_LEN;
 t->m_data -= ICMP6_NDP_RA_MINLEN;
 t->m_data -= sizeof(struct ip6);
diff --git a/slirp/ip6_icmp.h b/slirp/ip6_icmp.h
index 9460bf8..2282d29 100644
--- a/slirp/ip6_icmp.h
+++ b/slirp/ip6_icmp.h
@@ -122,6 +122,7 @@ struct ndpopt {
 uint8_t ndpopt_len; /* /!\ In units of 8 octets */
 union {
 unsigned char   linklayer_addr[6];  /* Source/Target Link-layer */
+#define ndpopt_linklayer ndpopt_body.linklayer_addr
 struct prefixinfo { /* Prefix Information */
 uint8_t prefix_length;
 #ifdef HOST_WORDS_BIGENDIAN
@@ -134,19 +135,26 @@ struct ndpopt {
 uint32_treserved2;
 struct in6_addr prefix;
 } QEMU_PACKED prefixinfo;
-} ndpopt_body;
-#define ndpopt_linklayer ndpopt_body.linklayer_addr
 #define ndpopt_prefixinfo ndpopt_body.prefixinfo
+struct rdnss {
+uint16_t reserved;
+uint32_t lifetime;
+struct in6_addr addr;
+} QEMU_PACKED rdnss;
+#define ndpopt_rdnss ndpopt_body.rdnss
+} ndpopt_body;
 } QEMU_PACKED;
 
 /* NDP options type */
 #define NDPOPT_LINKLAYER_SOURCE 1   /* Source Link-Layer Address */
 #define NDPOPT_LINKLAYER_TARGET 2   /* Target Link-Layer Address */
 #define NDPOPT_PREFIX_INFO  3   /* Prefix Information */
+#define NDPOPT_RDNSS25  /* Recursive DNS Server Address */
 
 /* NDP options size, in octets. */
 #define NDPOPT_LINKLAYER_LEN8
 #define NDPOPT_PREFIXINFO_LEN   32
+#define NDPOPT_RDNSS_LEN24
 
 /*
  * Definition of type and code field values.
-- 
2.8.0.rc3

[Qemu-devel] [PATCHv2 1/5] slirp: Allow disabling IPv4 or IPv6

2016-03-30 Thread Samuel Thibault

Add ipv4 and ipv6 boolean options, so the user can setup IPv4-only and
IPv6-only network environments.

Signed-off-by: Samuel Thibault 

---

Changes since previous versions:

- fix coding style
- explicit that qemu options can take ipv[46]=on|off
---
 net/slirp.c   | 36 ++--
 qapi-schema.json  |  8 
 qemu-options.hx   |  8 ++--
 slirp/ip6_icmp.c  |  8 
 slirp/ip6_input.c |  5 +
 slirp/ip_input.c  |  4 
 slirp/libslirp.h  |  3 ++-
 slirp/slirp.c | 10 +-
 slirp/slirp.h |  2 ++
 9 files changed, 74 insertions(+), 10 deletions(-)

diff --git a/net/slirp.c b/net/slirp.c
index 791a8f7..31630f0 100644
--- a/net/slirp.c
+++ b/net/slirp.c
@@ -136,8 +136,8 @@ static NetClientInfo net_slirp_info = {
 
 static int net_slirp_init(NetClientState *peer, const char *model,
   const char *name, int restricted,
-  const char *vnetwork, const char *vhost,
-  const char *vprefix6, int vprefix6_len,
+  bool ipv4, const char *vnetwork, const char *vhost,
+  bool ipv6, const char *vprefix6, int vprefix6_len,
   const char *vhost6,
   const char *vhostname, const char *tftp_export,
   const char *bootfile, const char *vdhcp_start,
@@ -165,6 +165,19 @@ static int net_slirp_init(NetClientState *peer, const char 
*model,
 char *end;
 struct slirp_config_str *config;
 
+if (!ipv4 && (vnetwork || vhost || vnameserver)) {
+return -1;
+}
+
+if (!ipv6 && (vprefix6 || vhost6 || vnameserver6)) {
+return -1;
+}
+
+if (!ipv4 && !ipv6) {
+/* It doesn't make sense to disable both */
+return -1;
+}
+
 if (!tftp_export) {
 tftp_export = legacy_tftp_prefix;
 }
@@ -309,8 +322,8 @@ static int net_slirp_init(NetClientState *peer, const char 
*model,
 
 s = DO_UPCAST(SlirpState, nc, nc);
 
-s->slirp = slirp_init(restricted, net, mask, host,
-  ip6_prefix, vprefix6_len, ip6_host,
+s->slirp = slirp_init(restricted, ipv4, net, mask, host,
+  ipv6, ip6_prefix, vprefix6_len, ip6_host,
   vhostname, tftp_export, bootfile, dhcp,
   dns, ip6_dns, dnssearch, s);
 QTAILQ_INSERT_TAIL(_stacks, s, entry);
@@ -813,10 +826,20 @@ int net_init_slirp(const NetClientOptions *opts, const 
char *name,
 int ret;
 const NetdevUserOptions *user;
 const char **dnssearch;
+bool ipv4 = true, ipv6 = true;
 
 assert(opts->type == NET_CLIENT_OPTIONS_KIND_USER);
 user = opts->u.user.data;
 
+if ((user->has_ipv6 && user->ipv6 && !user->has_ipv4) ||
+(user->has_ipv4 && !user->ipv4)) {
+ipv4 = 0;
+}
+if ((user->has_ipv4 && user->ipv4 && !user->has_ipv6) ||
+(user->has_ipv6 && !user->ipv6)) {
+ipv6 = 0;
+}
+
 vnet = user->has_net ? g_strdup(user->net) :
user->has_ip  ? g_strdup_printf("%s/24", user->ip) :
NULL;
@@ -828,8 +851,9 @@ int net_init_slirp(const NetClientOptions *opts, const char 
*name,
 net_init_slirp_configs(user->hostfwd, SLIRP_CFG_HOSTFWD);
 net_init_slirp_configs(user->guestfwd, 0);
 
-ret = net_slirp_init(peer, "user", name, user->q_restrict, vnet,
- user->host, user->ipv6_prefix, user->ipv6_prefixlen,
+ret = net_slirp_init(peer, "user", name, user->q_restrict,
+ ipv4, vnet, user->host,
+ ipv6, user->ipv6_prefix, user->ipv6_prefixlen,
  user->ipv6_host, user->hostname, user->tftp,
  user->bootfile, user->dhcpstart,
  user->dns, user->ipv6_dns, user->smb,
diff --git a/qapi-schema.json b/qapi-schema.json
index e58f6a9..54634c4 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2425,6 +2425,12 @@
 #
 # @restrict: #optional isolate the guest from the host
 #
+# @ipv4: #optional whether to support IPv4, default true for enabled
+#(since 2.6)
+#
+# @ipv6: #optional whether to support IPv6, default true for enabled
+#(since 2.6)
+#
 # @ip: #optional legacy parameter, use net= instead
 #
 # @net: #optional IP network address that the guest will see, in the
@@ -2473,6 +2479,8 @@
   'data': {
 '*hostname':  'str',
 '*restrict':  'bool',
+'*ipv4':  'bool',
+'*ipv6':  'bool',
 '*ip':'str',
 '*net':   'str',
 '*host':  'str',
diff --git a/qemu-options.hx b/qemu-options.hx
index a770086..789d9f6 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1551,8 +1551,9 @@ DEF("smb", HAS_ARG, QEMU_OPTION_smb, "", QEMU_ARCH_ALL)
 
 DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 #ifdef CONFIG_SLIRP
-"-netdev user,id=str[,net=addr[/mask]][,host=addr][,ipv6-net=addr[/int]]\n"

[Qemu-devel] [PATCHv2 2/5] slirp: Split get_dns_addr

2016-03-30 Thread Samuel Thibault

Separate get_dns_addr into get_dns_addr_cached and get_dns_addr_resolv_conf
to make conversion to IPv6 easier.

Signed-off-by: Samuel Thibault 
Reviewed-by: Thomas Huth 

---
Change since v1: remove superfluous parenthesis.
---
 slirp/slirp.c | 53 ++---
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/slirp/slirp.c b/slirp/slirp.c
index cefb9dc..4478a43 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -109,7 +109,28 @@ static void winsock_cleanup(void)
 
 static struct stat dns_addr_stat;
 
-int get_dns_addr(struct in_addr *pdns_addr)
+static int get_dns_addr_cached(struct in_addr *pdns_addr)
+{
+struct stat old_stat;
+if (curtime - dns_addr_time < TIMEOUT_DEFAULT) {
+*pdns_addr = dns_addr;
+return 0;
+}
+old_stat = dns_addr_stat;
+if (stat("/etc/resolv.conf", _addr_stat) != 0) {
+return -1;
+}
+if (dns_addr_stat.st_dev == old_stat.st_dev
+&& dns_addr_stat.st_ino == old_stat.st_ino
+&& dns_addr_stat.st_size == old_stat.st_size
+&& dns_addr_stat.st_mtime == old_stat.st_mtime) {
+*pdns_addr = dns_addr;
+return 0;
+}
+return 1;
+}
+
+static int get_dns_addr_resolv_conf(struct in_addr *pdns_addr)
 {
 char buff[512];
 char buff2[257];
@@ -117,24 +138,6 @@ int get_dns_addr(struct in_addr *pdns_addr)
 int found = 0;
 struct in_addr tmp_addr;
 
-if (dns_addr.s_addr != 0) {
-struct stat old_stat;
-if ((curtime - dns_addr_time) < TIMEOUT_DEFAULT) {
-*pdns_addr = dns_addr;
-return 0;
-}
-old_stat = dns_addr_stat;
-if (stat("/etc/resolv.conf", _addr_stat) != 0)
-return -1;
-if ((dns_addr_stat.st_dev == old_stat.st_dev)
-&& (dns_addr_stat.st_ino == old_stat.st_ino)
-&& (dns_addr_stat.st_size == old_stat.st_size)
-&& (dns_addr_stat.st_mtime == old_stat.st_mtime)) {
-*pdns_addr = dns_addr;
-return 0;
-}
-}
-
 f = fopen("/etc/resolv.conf", "r");
 if (!f)
 return -1;
@@ -174,6 +177,18 @@ int get_dns_addr(struct in_addr *pdns_addr)
 return 0;
 }
 
+int get_dns_addr(struct in_addr *pdns_addr)
+{
+if (dns_addr.s_addr != 0) {
+int ret;
+ret = get_dns_addr_cached(pdns_addr);
+if (ret <= 0) {
+return ret;
+}
+}
+return get_dns_addr_resolv_conf(pdns_addr);
+}
+
 #endif
 
 static void slirp_init_once(void)
-- 
2.8.0.rc3

[Qemu-devel] [PATCHv2 0/5] ipv4-only and ipv6-only support

2016-03-30 Thread Samuel Thibault

Hello,

This series gathers the patches to enable ipv4-only and ipv6-only support: it
adds the discussed ipv4 and ipv6 options to select which is enabled, and adds
support for ipv6 dns translation.

Changes since previous submission:

- code layout
- cope with inet_ntop returning NULL
- move static variables to function using them.
- remove superfluous parenthesis.
- fix coding style
- explicit that qemu options can take ipv[46]=on|off

Samuel Thibault (5):
  slirp: Allow disabling IPv4 or IPv6
  slirp: Split get_dns_addr
  slirp: Add dns6 resolution
  slirp: Support link-local DNS addresses
  slirp: Add RDNSS advertisement

 net/slirp.c   |  36 ---
 qapi-schema.json  |   8 
 qemu-options.hx   |   8 +++-
 slirp/ip6.h   |   9 
 slirp/ip6_icmp.c  |  27 +--
 slirp/ip6_icmp.h  |  12 -
 slirp/ip6_input.c |   5 +++
 slirp/ip_input.c  |   4 ++
 slirp/libslirp.h  |   4 +-
 slirp/slirp.c | 132 ++
 slirp/slirp.h |   2 +
 slirp/socket.c|   4 +-
 12 files changed, 207 insertions(+), 44 deletions(-)

-- 
2.8.0.rc3

[Qemu-devel] [PATCH] Fix ipv6 options according to documentation

2016-03-30 Thread Samuel Thibault

The options names were fixed in the qapi layer, but not in the command-line
options.

Signed-off-by: Samuel Thibault 
---
 net/net.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/net.c b/net/net.c
index 3b5a142..594c3b8 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1054,32 +1054,32 @@ int net_client_init(QemuOpts *opts, int is_netdev, 
Error **errp)
 
 {
 /* Parse convenience option format ip6-net=fec0::0[/64] */
-const char *ip6_net = qemu_opt_get(opts, "ip6-net");
+const char *ip6_net = qemu_opt_get(opts, "ipv6-net");
 
 if (ip6_net) {
 char buf[strlen(ip6_net) + 1];
 
 if (get_str_sep(buf, sizeof(buf), _net, '/') < 0) {
 /* Default 64bit prefix length.  */
-qemu_opt_set(opts, "ip6-prefix", ip6_net, _abort);
-qemu_opt_set_number(opts, "ip6-prefixlen", 64, _abort);
+qemu_opt_set(opts, "ipv6-prefix", ip6_net, _abort);
+qemu_opt_set_number(opts, "ipv6-prefixlen", 64, _abort);
 } else {
 /* User-specified prefix length.  */
 unsigned long len;
 int err;
 
-qemu_opt_set(opts, "ip6-prefix", buf, _abort);
+qemu_opt_set(opts, "ipv6-prefix", buf, _abort);
 err = qemu_strtoul(ip6_net, NULL, 10, );
 
 if (err) {
 error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
-  "ip6-prefix", "a number");
+  "ipv6-prefix", "a number");
 } else {
-qemu_opt_set_number(opts, "ip6-prefixlen", len,
+qemu_opt_set_number(opts, "ipv6-prefixlen", len,
 _abort);
 }
 }
-qemu_opt_unset(opts, "ip6-net");
+qemu_opt_unset(opts, "ipv6-net");
 }
 }
 
-- 
2.8.0.rc3

Re: [Qemu-devel] [PATCH v2 3/3] doc: Propose Structured Read extension

2016-03-30 Thread Alex Bligh

Eric,

>>> However, for ease of implementation, a
>>> +server MAY close the connection rather than entering transmission
>>> +phase if, at the end of option haggling, the client has negotiated
>>> +another command that requires a structured reply but did not also
>>> +negotiate Structured Reads.
>> 
>> That's pretty yucky given a reconnect will achieve the same result
>> and you'll end up in an infinite retry loop.
>> 
>> Wouldn't a better route be simply to say that implementing certain
>> commands (server or client sides) requires support of structured
>> replies?
> 
> I think we're in agreement that the only command that (for back-compat
> reasons) can ever send data on a simple reply is CMD_READ.  Therefore,
> if you negotiate any other command that can send data, that command will
> use a structured reply; the mere fact that you negotiated the command
> means that both client and server know about structured replies.
> 
> I guess what I was trying to get at is that if you are using any
> structured replies, then it is a disservice to wire-sniffers if you did
> not also enable structured replies for CMD_READ.

Agree

> Technically, it would
> be feasible to use simple replies for reads while using structured
> replies for the other negotiated commands,

Agree

> but practically, a client
> that does that seems undesirable, which is why I said that a server
> could disconnect rather than talking to such a client.

I would prefer to make that a protocol breach, i.e. the client
MUST NOT do that.

> So taking your sentence at face value, yes there is another solution
> possible: require that any NBD_OPT_* haggling used to negotiate the use
> of any other command with a structured reply MUST fail if the client has
> not first negotiated NBD_OPT_STRUCTURED_READ, but leaves the connection
> open so the client can continue option haggling.  That way, the only way
> to end option haggling with the new command enabled is to also enable
> structured reads.  The burden is then on the client to haggle in the
> correct order, and on the server to track haggling state when deciding
> how to answer the option requests for the new commands.

I think I'm saying something simpler (unless I'm missing something)
which is:

The client MUST NOT propose NBD_OPT_FOO unless it has previously
proposed and the server accepted NBD_OPT_BAR. In this case FOO
is e.g. the thing to find holes, BAR is NBD_OPT_STRUCTURED_READ.

So it's not a question of leaving the connection open or closing it,
it's simply that the client can't propose X unless it's already
negotiated Y. If it does, all bets are off, so of course it can
close the connection. But this makes it explicit that the client
if proposing X should first have successfully negotiated Y.

> I'm a bit reluctant to put ordering requirements on option haggling
> (that option A must be turned on before option B), but then again, the
> SELECT extension requires NBD_OPT_SELECT to be haggled prior to
> NBD_OPT_GO, so there's precedent.

Resisting such a change would resist the possibility in the future
that option X requires option Y. Even if you can get around that
this time, I'll bet my bottom dollar it will come up again.

> I also am thinking of proposing an
> extension for option haggling to communicate minimum/preferred
> alignments and maximum don't-fragment sizing, and that option would have
> to be enabled before OPT_LIST/OPT_SELECT/OPT_EXPORT_NAME (because it
> would change the NBD_REP_SERVER layout in response to those option
> requests), which would be another case where option A affects the
> behavior of option B.

... and as if by magic!

Yeah I was going to suggest a similar option, including a blocksize
alignment, a maximum size for a read/write, and a maximum DF size.

I'm busy writing an 'example' nbd server in golang and this is the
first thing I ran into. The purpose of this BTW is to serve as
an example implementation for structured replies.

>> These multiple error chunks are neat. However, I suspect lazy implementors
>> may just send an error without an offset.
> 
> Any ideas are appreciated on how to word it to suggest that servers
> SHOULD try to give full details; but I think we want the fallback to the
> no-offset case because some situations will not have an offset.

Indeed. Perhaps we can promise them Oreos.

> I was thinking that it's easier on the client if the final chunk always
> serves as a reliable indicator of success (OFFSET_DATA, OFFSET_HOLE,
> NONE) or error (ERROR, ERROR_OFFSET).  But if you think that always
> allowing a concluding NONE, even on an errored reply due to an earlier
> chunk reporting the error, is reasonable, I can relax things along those
> lines.  Or maybe we can state that the concluding chunk on an errored
> reply may be ERROR_OFFSET with 'error' and 'offset' fields set to 0,
> rather than forcing the server to replay one of the earlier offsets.

I think I'd prefer just that the last chunk has the relevant bit

Re: [Qemu-devel] [Nbd] [PATCH v2 3/3] doc: Propose Structured Read extension

2016-03-30 Thread Wouter Verhelst

On Wed, Mar 30, 2016 at 02:54:41PM -0600, Eric Blake wrote:
> On 03/30/2016 01:51 PM, Wouter Verhelst wrote:
> 
> > (side note: this seems to be mostly an NBD discussion at this point.
> > Does qemu-devel need to be in the loop before we've finished that? I
> > don't care either way, but then I don't want to bore or annoy people...)
> 
> Well, it stemmed out of qemu's desires to implement more efficient ways
> to both push and pull sparse files across NBD; and qemu will ultimately
> want to implement what we discuss.  But I'm okay doing most of the churn
> off of the qemu list, and only adding qemu back in the loop when it is
> actually time to implement the final design, unless someone else speaks
> up asking to remain in on the conversation.

Sure. I was just asking the question...

[...]
> Well, I'm worried about the opposite - if the client does not negotiate
> structured replies, but does negotiate NBD_CMD_GET_LBA_STATUS (which has
> a variable-length payload to reply), then the server has three choices:
> 0) refuse the client (we already said this is a bit undesirable, as it
> may lead to the client retrying in an infloop - having a way to return
> an error message is better than dropping the connection); 1) the server
> must find a way to shoehorn the same data that would be sent in a
> structured reply to fit in the old-style unstructured reply (one nice
> thing about the structured reply is that we get a length for free; that
> length would have to be shoehorned into the old-style reply, different
> from CMD_READ where a length was implied from the request), 2) the
> server must fail the message
> 
> Actually, having typed that, maybe choice 2 is not all that bad.

It isn't.

> It's fairly easy for a server to ALWAYS fail NBD_CMD_GET_LBA_STATUS
> with EINVAL if structured replies were not negotiated, and to document
> that a client that negotiates GET_LBA_STATUS MUST also negotiate
> structured replies for the command to be useful.  For that matter, we
> just documented that servers SHOULD use EINVAL for an unrecognized (or
> out-of-context) command.  The client can enable the two options in
> either order.  And we'd have the following table:
> 
> enabled   enabled
> structuredGET_LBAresult of:
> replies read GET_LBA   write
> 
> nonoold replyEINVALold reply
> yes   nonew replyEINVAL[*]
> noyes   old replyEINVALold reply
> yes   yes   new replynew reply [*]

Right.

It would also be reasonable to say that if you don't negotiate an option
but do end up using it, you end up squarely in undefined behaviour-land.
The server could send EINVAL, it could honour your request in ways that
you may not expect (including structured replies when you didn't ask for
them), or it could cause nasal demons.

> [*] Here, we're still debating whether it makes sense to allow/require a
> server to send new replies everywhere (and clients must handle both
> styles if they negotiate structured replies), or to require a server to
> send old replies (so that read is the only command where clients have to
> handle two styles, but where the results of negotiating pinpoint which
> style).

I'm still in favour of having it be "old reply" for the whole of that
"write" column. I'm just not convinced there's a downside to that, while
there is an upside.

> > Since only the read reply has a payload at this point in time, you don't
> > really need to special-case it, anyway.
> 
> Okay, so in v1 I tried to negotiate STRUCTURED_REPLY; in v2 I was
> specific to STRUCTURED_READ; it sounds like we are leaning back towards
> STRUCTURED_REPLY and just a caveat that any new command that sends
> payload SHOULD/MUST fail if STRUCTURED_REPLY was not also negotiated.

You could formulate it that way. Alternatively, you could formulate it
so that any command that sends payload may fail if STRUCTURED_REPLY was
not also negotiated, with caveat that there is this backwards
compatibility thing for READ.

(i.e., make READ be the exception rather than the rule)

> I guess that also makes it easier to argue for a server that uses a
> structured reply for ALL messages (REPLY_TYPE_NONE for success changes
> 16 bytes into 20, and REPLY_TYPE_ERROR for errors changes 16 bytes
> into 24).

Perhaps.

[...]
> > The client will already need to keep state to reassemble a fragmented
> > and out-of-order read reply, anyway. If that's already the case, adding
> > in the requirement to *also* keep track of error state (which could in
> > the simplest case be a single bit for a client which doesn't care about
> > offsets or error count) isn't that much more of an issue.
> 
> For a client that is copying NBD_CMD_READ into a local file, dumping
> directly via pwrite() as each chunk comes in doesn't require the client
> track any

Re: [Qemu-devel] tcg: reworking tb_invalidated_flag

2016-03-30 Thread Sergey Fedorov

On 30/03/16 22:08, Richard Henderson wrote:
> On 03/30/2016 10:08 AM, Sergey Fedorov wrote:
>> As of catching tb_flush() in cpu_exec() there have been three approaches
>> proposed.
>>
>> The first approach is to get rid of 'tb_invalidated_flag' and use
>> 'tb_flush_count'. Capture 'tb_flush_count' inside 'tb_lock' critical
>> section of cpu_exec() and compare it on each execution loop iteration
>> before trying to do tb_add_jump(). This would be simple and clear but it
>> would cost an extra load from a shared variable 'tb_flush_count' each
>> time we go over the execution loop.
>>
>> The second approach is to make 'tb_invalidated_flag' per-CPU. This
>> would be conceptually similar to what we have, but would give us thread
>> safety. With this approach, we need to be careful to correctly clear and
>> set the flag.
>>
>> The third approach is to mark each individual TB as valid/invalid. This
>> is what Emilio has in his MTTCG series [2]. Following this approach, we
>> could have very clean code with no extra overhead on the hot path.
>> However, it would require to mark all TBs as invalid on tb_flush().
>> Given that tb_flush() is rare, it shouldn't be a significant overhead.
>> Also, there could be several options how to mark TB valid/invalid:
>> a dedicated flag could be introduced or some invalid value of
>> pc/cs_base/flags could be used.
> I'm not really fond of this third option.  Yes, tb_flush is rare on some
> targets, but for those with software managed TLBs they're much more common.
> See e.g. mips and sparc.
>
> Even when tb_flush is rare, there can be 500k-1M TBs live when the flush does
> happen.  I simply cannot imagine that individually touching 1M variables
> performs as well as setting one global variable, or taking a global lock.

Ah, you are right, I missed that fact, thanks.

Kind regards,
Sergey

[Qemu-devel] [PATCH] target-i386: assert that KVM_GET/SET_MSRS can set all requested MSRs

2016-03-30 Thread Paolo Bonzini

This would have caught the bug in the previous patch.

Signed-off-by: Paolo Bonzini 
---
 target-i386/kvm.c | 34 ++
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 19e2d94..799fdfa 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -141,6 +141,7 @@ static int kvm_get_tsc(CPUState *cs)
 return ret;
 }
 
+assert(ret == 1);
 env->tsc = msr_data.entries[0].data;
 return 0;
 }
@@ -1446,6 +1447,7 @@ static int kvm_put_tscdeadline_msr(X86CPU *cpu)
 struct kvm_msr_entry entries[1];
 } msr_data;
 struct kvm_msr_entry *msrs = msr_data.entries;
+int ret;
 
 if (!has_msr_tsc_deadline) {
 return 0;
@@ -1457,7 +1459,13 @@ static int kvm_put_tscdeadline_msr(X86CPU *cpu)
 .nmsrs = 1,
 };
 
-return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, _data);
+ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, _data);
+if (ret < 0) {
+return ret;
+}
+
+assert(ret == 1);
+return 0;
 }
 
 /*
@@ -1472,6 +1480,11 @@ static int kvm_put_msr_feature_control(X86CPU *cpu)
 struct kvm_msrs info;
 struct kvm_msr_entry entry;
 } msr_data;
+int ret;
+
+if (!has_msr_feature_control) {
+return 0;
+}
 
 kvm_msr_entry_set(_data.entry, MSR_IA32_FEATURE_CONTROL,
   cpu->env.msr_ia32_feature_control);
@@ -1480,7 +1493,13 @@ static int kvm_put_msr_feature_control(X86CPU *cpu)
 .nmsrs = 1,
 };
 
-return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, _data);
+ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, _data);
+if (ret < 0) {
+return ret;
+}
+
+assert(ret == 1);
+return 0;
 }
 
 static int kvm_put_msrs(X86CPU *cpu, int level)
@@ -1492,6 +1511,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 } msr_data;
 struct kvm_msr_entry *msrs = msr_data.entries;
 int n = 0, i;
+int ret;
 
 kvm_msr_entry_set([n++], MSR_IA32_SYSENTER_CS, env->sysenter_cs);
 kvm_msr_entry_set([n++], MSR_IA32_SYSENTER_ESP, env->sysenter_esp);
@@ -1685,8 +1705,13 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 .nmsrs = n,
 };
 
-return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, _data);
+ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, _data);
+if (ret < 0) {
+return ret;
+}
 
+assert(ret == n);
+return 0;
 }
 
 
@@ -2055,6 +2080,7 @@ static int kvm_get_msrs(X86CPU *cpu)
 return ret;
 }
 
+assert(ret == n);
 for (i = 0; i < ret; i++) {
 uint32_t index = msrs[i].index;
 switch (index) {
@@ -2511,7 +2537,7 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 
 assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
 
-if (level >= KVM_PUT_RESET_STATE && has_msr_feature_control) {
+if (level >= KVM_PUT_RESET_STATE) {
 ret = kvm_put_msr_feature_control(x86_cpu);
 if (ret < 0) {
 return ret;
-- 
2.5.5

[Qemu-devel] [PATCH] target-i386: do not read/write MSR_TSC_AUX from KVM if CPUID bit is not set

2016-03-30 Thread Paolo Bonzini

KVM does not let you read or write this MSR if the corresponding CPUID
bit is not set.  This in turn causes MSRs that come after MSR_TSC_AUX
to be ignored by KVM_SET_MRSS.

One visible symptom is that s3.flat from kvm-unit-tests fails with
CPUs that do not have RDTSCP, because the SMBASE is not reset to
0x3 after reset.

Fixes: c9b8f6b6210847b4381c5b2ee172b1c7eb9985d6
Cc: qemu-sta...@nongnu.org
Signed-off-by: Paolo Bonzini 
---
 target-i386/kvm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 87ab969..19e2d94 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -917,6 +917,9 @@ int kvm_arch_init_vcpu(CPUState *cs)
 if (env->features[FEAT_1_EDX] & CPUID_MTRR) {
 has_msr_mtrr = true;
 }
+if (!(env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_RDTSCP)) {
+has_msr_tsc_aux = false;
+}
 
 return 0;
 }
-- 
2.5.5

Re: [Qemu-devel] [Nbd] [PATCH v2 3/3] doc: Propose Structured Read extension

2016-03-30 Thread Eric Blake

On 03/30/2016 01:51 PM, Wouter Verhelst wrote:

> (side note: this seems to be mostly an NBD discussion at this point.
> Does qemu-devel need to be in the loop before we've finished that? I
> don't care either way, but then I don't want to bore or annoy people...)

Well, it stemmed out of qemu's desires to implement more efficient ways
to both push and pull sparse files across NBD; and qemu will ultimately
want to implement what we discuss.  But I'm okay doing most of the churn
off of the qemu list, and only adding qemu back in the loop when it is
actually time to implement the final design, unless someone else speaks
up asking to remain in on the conversation.

>> I'm a bit reluctant to put ordering requirements on option haggling
>> (that option A must be turned on before option B),
> 
> Yes, me too.
> 
>> but then again, the
>> SELECT extension requires NBD_OPT_SELECT to be haggled prior to
>> NBD_OPT_GO, so there's precedent.
> 
> Yeah, but then, that's only because GO is meant to transition from
> negotiation to transmission, so it needs to be after *any* other
> negotiation; anything else would defeat its purpose.
> 
> Requiring structured read after negotiating other structured commands
> could easily be done by saying that it's an error to leave the
> negotiation phase with "some other" structured command negotiated, but
> not structured read.
> 
> On the other hand, I feel compelled to point out that we only find
> ourselves in this hole because we've coupled the structured reply with
> the read command. That may have looked like a good idea at the time, but
> obviously it isn't. If we just have it be "negotiate the structured
> reply format" rather than "the structured read reply", and state that
> once negotiated, the structured reply format is required for any command
> with a payload, we're done.

Well, I'm worried about the opposite - if the client does not negotiate
structured replies, but does negotiate NBD_CMD_GET_LBA_STATUS (which has
a variable-length payload to reply), then the server has three choices:
0) refuse the client (we already said this is a bit undesirable, as it
may lead to the client retrying in an infloop - having a way to return
an error message is better than dropping the connection); 1) the server
must find a way to shoehorn the same data that would be sent in a
structured reply to fit in the old-style unstructured reply (one nice
thing about the structured reply is that we get a length for free; that
length would have to be shoehorned into the old-style reply, different
from CMD_READ where a length was implied from the request), 2) the
server must fail the message

Actually, having typed that, maybe choice 2 is not all that bad.  It's
fairly easy for a server to ALWAYS fail NBD_CMD_GET_LBA_STATUS with
EINVAL if structured replies were not negotiated, and to document that a
client that negotiates GET_LBA_STATUS MUST also negotiate structured
replies for the command to be useful.  For that matter, we just
documented that servers SHOULD use EINVAL for an unrecognized (or
out-of-context) command.  The client can enable the two options in
either order.  And we'd have the following table:

enabled   enabled
structuredGET_LBAresult of:
replies read GET_LBA   write

nonoold replyEINVALold reply
yes   nonew replyEINVAL[*]
noyes   old replyEINVALold reply
yes   yes   new replynew reply [*]

[*] Here, we're still debating whether it makes sense to allow/require a
server to send new replies everywhere (and clients must handle both
styles if they negotiate structured replies), or to require a server to
send old replies (so that read is the only command where clients have to
handle two styles, but where the results of negotiating pinpoint which
style).

> 
> Since only the read reply has a payload at this point in time, you don't
> really need to special-case it, anyway.

Okay, so in v1 I tried to negotiate STRUCTURED_REPLY; in v2 I was
specific to STRUCTURED_READ; it sounds like we are leaning back towards
STRUCTURED_REPLY and just a caveat that any new command that sends
payload SHOULD/MUST fail if STRUCTURED_REPLY was not also negotiated.  I
guess that also makes it easier to argue for a server that uses a
structured reply for ALL messages (REPLY_TYPE_NONE for success changes
16 bytes into 20, and REPLY_TYPE_ERROR for errors changes 16 bytes into 24).

> 
>> I also am thinking of proposing an extension for option haggling to
>> communicate minimum/preferred alignments and maximum don't-fragment
>> sizing, and that option would have to be enabled before
>> OPT_LIST/OPT_SELECT/OPT_EXPORT_NAME (because it would change the
>> NBD_REP_SERVER layout in response to those option requests), which
>> would be another case where option A affects the

[Qemu-devel] [PULL for-2.6 2/2] block/nfs: add missing #include "qemu/cutils.h"

2016-03-30 Thread Jeff Cody

From: Stefan Hajnoczi 

parse_uint_full() used to be included from qemu-common.h but was moved
to qemu/cutils.h in commit f348b6d1a53e5271cf1c9f9acc4646b4b98c1771
("util: move declarations out of qemu-common.h").

Cc: Veronia Bahaa 
Signed-off-by: Stefan Hajnoczi 
Message-id: 1459341994-20567-3-git-send-email-stefa...@redhat.com
Signed-off-by: Jeff Cody 
---
 block/nfs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/nfs.c b/block/nfs.c
index 0c43dc7..9f51cc3 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -33,6 +33,7 @@
 #include "trace.h"
 #include "qemu/iov.h"
 #include "qemu/uri.h"
+#include "qemu/cutils.h"
 #include "sysemu/sysemu.h"
 #include 
 
-- 
1.9.3

[Qemu-devel] [PULL for-2.6 1/2] block/nfs: add missing #include "qapi/error.h"

2016-03-30 Thread Jeff Cody

From: Stefan Hajnoczi 

error_setg() used to be included indirectly through qemu/osdep.h.  Since
commit da34e65cb4025728566d6504a99916f6e7e1dd6a ("include/qemu/osdep.h:
Don't include qapi/error.h") it requires an explicit include.

Cc: Markus Armbruster 
Signed-off-by: Stefan Hajnoczi 
Message-id: 1459341994-20567-2-git-send-email-stefa...@redhat.com
Signed-off-by: Jeff Cody 
---
 block/nfs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/nfs.c b/block/nfs.c
index 7220e89..0c43dc7 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -28,6 +28,7 @@
 #include "qemu-common.h"
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
+#include "qapi/error.h"
 #include "block/block_int.h"
 #include "trace.h"
 #include "qemu/iov.h"
-- 
1.9.3

[Qemu-devel] [PULL for-2.6 0/2] Block patches

2016-03-30 Thread Jeff Cody

The following changes since commit 9370a3bbc478f623dd21d783560629ea2064625b:

  Update version for v2.6.0-rc0 release (2016-03-30 19:25:40 +0100)

are available in the git repository at:

  g...@github.com:codyprime/qemu-kvm-jtc.git tags/block-pull-request

for you to fetch changes up to 0d94b74655adadbeee135aee9bc105a41a524436:

  block/nfs: add missing #include "qemu/cutils.h" (2016-03-30 16:50:39 -0400)


Block patch for 2.6, to fix build failure for libnfs


Stefan Hajnoczi (2):
  block/nfs: add missing #include "qapi/error.h"
  block/nfs: add missing #include "qemu/cutils.h"

 block/nfs.c | 2 ++
 1 file changed, 2 insertions(+)

-- 
1.9.3

Re: [Qemu-devel] [ANNOUNCE] QEMU 2.6.0-rc0 is now available

2016-03-30 Thread Stefan Weil

Am 30.03.2016 um 22:34 schrieb Michael Roth:
> Hello,
>
> On behalf of the QEMU Team, I'd like to announce the availability of the
> first release candidate for the QEMU 2.6 release.  This release is meant
> for testing purposes and should not be used in a production environment.

These patches fix a build regression:

http://patchwork.ozlabs.org/patch/603337/
http://patchwork.ozlabs.org/patch/603336/

Thanks,
Stefan




signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 0/2] block/nfs: fix build breakage due to missing includes

2016-03-30 Thread Stefan Weil

Am 30.03.2016 um 14:46 schrieb Stefan Hajnoczi:
> Header files included by almost every source files have been slimmed down
> recently to speed up the build and eliminate unnecessary dependencies.  The
> block/nfs.c file must have been forgotten when build testing because it
> currently fails to build.
>
> This series adds the necessary #include statements to make block/nfs.c build
> successfully again.
>
> Stefan Hajnoczi (2):
>   block/nfs: add missing #include "qapi/error.h"
>   block/nfs: add missing #include "qemu/cutils.h"
>
>  block/nfs.c | 2 ++
>  1 file changed, 2 insertions(+)
>

Tested-by: Stefan Weil 

I noticed these patches after I had just prepared my own patch...

Maybe Peter M. can apply both patches without waiting for a pull request
to fix the current broken git master.

Regards
Stefan



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [Nbd] [PATCH v2 3/3] doc: Propose Structured Read extension

2016-03-30 Thread Wouter Verhelst

So,

On Tue, Mar 29, 2016 at 05:01:00PM -0600, Eric Blake wrote:
[...]
> +- `NBD_OPT_STRUCTURED_READ` (8)
> +
> +Defined by the experimental `Structured Read` extension; see below.

(detail: that makes the "structured read" extension be typographically
different from everything else. Either make it all caps, or not
monocase.)

[...]
>  A write request. Length and offset define the location and amount of
> @@ -536,13 +556,16 @@ The following error values are defined:
>  * `ENOMEM` (12), Cannot allocate memory.
>  * `EINVAL` (22), Invalid argument.
>  * `ENOSPC` (28), No space left on device.
> +* `EOVERFLOW` (75), Value too large.
>
>  The server SHOULD return `ENOSPC` if it receives a write request
>  including one or more sectors beyond the size of the device.  It SHOULD
>  return `EINVAL` if it receives a read or trim request including one or
>  more sectors beyond the size of the device.  It also SHOULD map the
> -`EDQUOT` and `EFBIG` errors to `ENOSPC`.  Finally, it SHOULD return
> -`EPERM` if it receives a write or trim request on a read-only export.
> +`EDQUOT` and `EFBIG` errors to `ENOSPC`.  It SHOULD return `EOVERFLOW`
> +on a request to send structured read data without fragmentation but
> +where the length is too large.  Finally, it SHOULD return `EPERM` if
> +it receives a write or trim request on a read-only export.

I'd like some more explicit language here that makes it clear EOVERFLOW
can *only* be used on structured replies. We reduced the set of valid
error numbers a while back for good reason; it would be bad if we
accidentally increase it for existing replies now.

>  The server SHOULD return `EINVAL` if it receives an unknown command.
> 
> @@ -579,7 +602,7 @@ To remedy this, a `SELECT` extension is envisioned. This 
> extension adds
>  two option requests and one error reply type, and extends one existing
>  option reply type.
> 
> -* `NBD_OPT_SELECT`
> +* `NBD_OPT_SELECT` (6)

NAK. The spec is currently consistent in associating numbers to
constants in only *one* place. This is no accident, and I'd like to keep
it that way.

(at least I think it is; if not, that's a bug)

>  The client wishes to select the export with the given name for use
>  in the transmission phase, but does not yet want to move to the
> @@ -613,7 +636,7 @@ option reply type.
>handle `NBD_REP_ERR_UNSUP`. In this case, they should fall back to
>using `NBD_OPT_EXPORT_NAME`.
> 
> -* `NBD_OPT_GO`
> +* `NBD_OPT_GO` (7)

same.

>  The client wishes to terminate the negotiation phase and progress to
>  the transmission phase. Possible replies from the server include:
> @@ -635,6 +658,235 @@ option reply type.
>message if they do not also send it as a reply to the
>`NBD_OPT_SELECT` message.
> 
> +### `Structured Read` extension
> +
> +Some of the major downsides of the default reply to `NBD_CMD_READ`
> +(without structured replies) are as follows.  First, it is not
> +possible to support partial reads (the command must succeed or fail as
> +a whole, either len bytes of data must be sent or the connection must
> +be closed).  There is no way to efficiently skip over portions of a
> +sparse file that are known to contain all zeroes.  Finally, it is not
> +possible to reliably decode the server traffic without also having
> +context of what pending read requests were sent by the client.
> +
> +To remedy this, a `Structured Read` extension is envisioned. This
> +extension adds a new option request, a new reply type during the
> +transmission phase, and a new command flag, and alters the set of
> +valid replies to an existing command.
> +
> +* `NBD_OPT_STRUCTURED_READ` (8)

same

> +The client wishes to use structured reads during the transmission
> +phase.  The option request has no additional data.
> +
> +The server replies with one of the following:
> +
> +- `NBD_REP_ACK`: Structured reads have been negotiated; the server
> +  MUST use structured replies to `NBD_CMD_READ`
> +- `NBD_REP_UNSUP`: Structured reads are not available; the transmission
  ^ ERR_

(however, see below ;-)

> +  phase MUST remain the same as if the client had not attempted
> +  `NBD_OPT_STRUCTURED_READ`

This makes it seem as though NBD_REP_ERR_UNSUP has a different meaning
here than it usually does. I think the spec should just say that the
server should reply with NBD_REP_ACK, and then mention that for
backwards compatibility the client should be prepared to handle
NBD_REP_UNSUP too (as is done elsewhere).

That is, if the server implements structured replies, it should allow it
(it makes no sense for the server to disallow structured reads if it
knows about them)

[...]
> +A server MUST NOT send a data payload in a normal reply if
> +Structured Reads are negotiated.  It is envisioned that all future
> +extension commands that require a data payload in the response
> +will require independent option negotiation, and therefore, the

[Qemu-devel] [ANNOUNCE] QEMU 2.6.0-rc0 is now available

2016-03-30 Thread Michael Roth

Hello,

On behalf of the QEMU Team, I'd like to announce the availability of the
first release candidate for the QEMU 2.6 release.  This release is meant
for testing purposes and should not be used in a production environment.

http://wiki.qemu.org/download/qemu-2.6.0-rc0.tar.bz2

You can help improve the quality of the QEMU 2.6 release by testing this
release and reporting bugs on Launchpad:

https://bugs.launchpad.net/qemu/

The release plan for the 2.6 release is available at:

http://wiki.qemu.org/Planning/2.6

Please add entries to the ChangeLog for the 2.6 release below:

http://wiki.qemu.org/ChangeLog/2.6

Re: [Qemu-devel] [Nbd] [PATCHv2] Strawman proposal for NBD structured replies

2016-03-30 Thread Wouter Verhelst

On Wed, Mar 30, 2016 at 12:43:57PM -0600, Eric Blake wrote:
> On 03/30/2016 01:33 AM, Wouter Verhelst wrote:
> > Morning,
> > 
> > On Wed, Mar 30, 2016 at 07:59:15AM +0100, Alex Bligh wrote:
> >> On 30 Mar 2016, at 00:17, Eric Blake  wrote:
> 
>  -The server replies with:
>  +Replies take one of two forms. They may either be structured replies,
> >>>
> >>> Hmm, you put your strawman directly in the 'transmission phase' section,
> >>> while mine is deferred to the 'Experimental Extensions' section, at
> >>> least until we have a reference client and server implementation.
> >>
> >> Yeah, my wording was straw man but I think it should go into the main
> >> body of the work. Obviously in that case it wouldn't be *merged* until
> >> we had a working implementation.
> >>
> >> The SELECT stuff is a bit different as I am not sure it was intended
> >> to be standardized short term (i.e. it was a longer term experiment
> >> IIRC).
> > 
> > Actually, I plan to implement that when I get around to doing STARTTLS
> > (which I've started work on, but is far from ready).
> 
> On that tangent, I found SELECT slightly ambiguous (particularly since
> I'm also considering a proposal to modify NBD_REP_SERVER to expose
> alignment details, so it would have to play nicely with SELECT):
> 
> Based on normative text alone, we would have:
> 
> S: 64 bits, 0x3e889045565a9
> S: 32 bits, NBD_OPT_SELECT (6)
> S: 32 bits, NBD_REP_SERVER (2)
> S: 32 bits, length
> S: 32 bits, name length
> S: 'name length' bytes, UTF-8 encoded name (is it NUL-terminated?)
> S: 'length - name length' bytes, which is UTF-8 encoded message to
> display to human

Formally, it defines that as "the rest of the data contains some
undefined implementation-specific details about the export (...) [i]f
the client did not explicitly request otherwise, these details are
defined to be UTF-8 encoded data suitable for direct display to a human
being."

> This implies that 'name length' <= 'length - 4', although we don't
> actually state that the server MUST NOT send a name length longer than
> that.  It might not hurt to also require that length include space for a
> NUL terminator (in both the name, and in the optional human-readable
> information field), but only if that matches existing practice (if it
> does not, then we should document that the client and server are NOT
> dealing with NUL-terminated UTF-8 strings, but relying on length instead).
> 
> The SELECT text then refines this:
> 
> S: 64 bits, 0x3e889045565a9
> S: 32 bits, NBD_OPT_SELECT (6)
> S: 32 bits, NBD_REP_SERVER (2)
> S: 32 bits, length
> S: 32 bits, name length
> S: 'name length' bytes, UTF-8 encoded name (is it NUL-terminated?)
> S: 64 bits, size
> S: 16 bits, export flags
> 
> but doesn't mention what happens if 'length' > 'name length + 14'.

The idea here was that the client in case of the SELECT extension *does*
"explicitly request otherwise", meaning, the details about the export
here are no longer undefined or implementation-specific, but instead
clearly defined to be the "size" plus the "export flags".

> Presumably, if the server wants to include the same UTF-8 human
> information as before, it would go AFTER the 16 bits for the export
> field (in other words, the SELECT extension is carving out fields in the
> MIDDLE of the payload).

Sortof. I can see how you come to that conclusion, but it wasn't meant
as such. Then again, I'll readily admit that I didn't spend quite as
much thinking/discussing about the SELECT extension as compared to the
discussion we're having now :-)

> So, once I know for sure about the handling of NUL bytes, I'll probably
> try my hand at a patch to clarify the wording in both the normative and
> in the SELECT extension.

nbd-server.c does this:

for(i=0; ilen; i++) {
SERVER* serve = &(g_array_index(servers, SERVER, i));
len = htonl(strlen(serve->servename));
memcpy(buf, , sizeof(len));
strcpy(ptr, serve->servename);
send_reply(opt, net, NBD_REP_SERVER, 
strlen(serve->servename)+sizeof(len), buf);
}

where ptr = buf + sizeof(len).

(on a sidenote, that should really be a strncpy. Let me fix that...
there, done)

so it doesn't send the NUL byte. The nbd-client side ends up adding a
NUL byte after whatever the server sent; I think that makes the most
sense, since a client should do so anyway for data it receives from a
(potentially evil) peer.

[...]
> > I now realize, after having slept over it, that you guys probably meant
> > for an error-with-offset to only mark bytes that were part of *that*
> > particular reply chunk to be marked as faulty, which makes more sense
> > than "the whole request range from that point on", as I was interpreting
> > it...
> 
> Indeed, anything I can do to make the wording more clear is welcome.

You should probably be explicit about which parts of the response are
marked as invalid when an error is sent.

Re: [Qemu-devel] [Nbd] [PATCH v2 3/3] doc: Propose Structured Read extension

2016-03-30 Thread Wouter Verhelst

Hi all,

(side note: this seems to be mostly an NBD discussion at this point.
Does qemu-devel need to be in the loop before we've finished that? I
don't care either way, but then I don't want to bore or annoy people...)

On Wed, Mar 30, 2016 at 11:45:04AM -0600, Eric Blake wrote:
> On 03/30/2016 12:50 AM, Alex Bligh wrote:
[...]
> So taking your sentence at face value, yes there is another solution
> possible: require that any NBD_OPT_* haggling used to negotiate the use
> of any other command with a structured reply MUST fail if the client has
> not first negotiated NBD_OPT_STRUCTURED_READ, but leaves the connection
> open so the client can continue option haggling.  That way, the only way
> to end option haggling with the new command enabled is to also enable
> structured reads.  The burden is then on the client to haggle in the
> correct order, and on the server to track haggling state when deciding
> how to answer the option requests for the new commands.
> 
> I'm a bit reluctant to put ordering requirements on option haggling
> (that option A must be turned on before option B),

Yes, me too.

> but then again, the
> SELECT extension requires NBD_OPT_SELECT to be haggled prior to
> NBD_OPT_GO, so there's precedent.

Yeah, but then, that's only because GO is meant to transition from
negotiation to transmission, so it needs to be after *any* other
negotiation; anything else would defeat its purpose.

Requiring structured read after negotiating other structured commands
could easily be done by saying that it's an error to leave the
negotiation phase with "some other" structured command negotiated, but
not structured read.

On the other hand, I feel compelled to point out that we only find
ourselves in this hole because we've coupled the structured reply with
the read command. That may have looked like a good idea at the time, but
obviously it isn't. If we just have it be "negotiate the structured
reply format" rather than "the structured read reply", and state that
once negotiated, the structured reply format is required for any command
with a payload, we're done.

Since only the read reply has a payload at this point in time, you don't
really need to special-case it, anyway.

> I also am thinking of proposing an extension for option haggling to
> communicate minimum/preferred alignments and maximum don't-fragment
> sizing, and that option would have to be enabled before
> OPT_LIST/OPT_SELECT/OPT_EXPORT_NAME (because it would change the
> NBD_REP_SERVER layout in response to those option requests), which
> would be another case where option A affects the behavior of option B.

I reused the NBD_REP_SERVER command in reply to the NBD_OPT_SELECT
command since its purpose seems fairly similar ("send metadata about an
export to the client"), but that may have been a mistake. It certainly
wasn't meant that if you say NBD_OPT_SELECT first and then go
NBD_OPT_LIST, that the NBD_REP_SERVER reply to NBD_OPT_LIST should be
the one as specified in the SELECT extension.

[...]
> > These multiple error chunks are neat. However, I suspect lazy implementors
> > may just send an error without an offset.
> 
> Any ideas are appreciated on how to word it to suggest that servers
> SHOULD try to give full details; but I think we want the fallback to the
> no-offset case because some situations will not have an offset.

I'm wary of making the spec too complicated. Adding such language might
make it unclear. Since as you say it can't be a hard-and-fast rule, I'd
prefer that we just trust implementor's judgement on this.

Not sending an offset (even if it would be possible) can certainly be
the better choice in some cases -- a protocol description can never know
all the trade-offs and choices a particular implementor may want or need
to make.

> >> +The server MAY send additional chunks or offset error replies, if
> >> +`NBD_REPLY_FLAG_DONE` was not set, but MUST ensure the final reply
> >> +also reports an error (that is, the final reply MUST NOT use
> >> +`NBD_REPLY_TYPE_NONE`), and MAY reuse an offset reported earlier
> >> +in constructing the final reply.
> > 
> > I'm not sure I get that bit. Why don't you make an errorred reply simply
> > one that contains one or more error chunks. An errorred reply need not 
> > contain
> > all the data requested (though each chunk must be complete). A reply that
> > isn't errorred needs not contain all the data requested. Why do you
> > need anything stronger than that? So if you have a parallelised server which
> > is simply sending several reads in parallel (think Ceph) it sends the
> > result from each thread, possibly followed by an error packet, and some
> > other thread notices when all of these have completed and sends a
> > NBD_REPLY_TYPE_NONE packet (always, error or not) to close of use of the
> > handle. This seems perfectly natural and no harder for the client to deal
> > with, but you are prohibiting it.
> 
> I was thinking that it's easier on the client

Re: [Qemu-devel] [PATCH 2/2] target-mips: Implement IEEE 754-2008 functionality for R6 and MSA instructions

2016-03-30 Thread Aleksandar Markovic

I really appreciate your guidance and help. I will respond shortly with a 
proposal that will address all issues that you brought up. Thanks again for 
your support and time.

Aleksandar

From: qemu-devel-bounces+aleksandar.markovic=imgtec@nongnu.org 
[qemu-devel-bounces+aleksandar.markovic=imgtec@nongnu.org] on behalf of 
Richard Henderson [r...@twiddle.net]
Sent: Monday, March 28, 2016 2:49 PM
To: Aleksandar Markovic; qemu-devel@nongnu.org
Cc: peter.mayd...@linaro.org; ehabk...@redhat.com; 
kbast...@mail.uni-paderborn.de; mark.cave-ayl...@ilande.co.uk; ag...@suse.de; 
Petar Jovanovic; blauwir...@gmail.com; jcmvb...@gmail.com; Miodrag Dinic; 
qemu-...@nongnu.org; qemu-...@nongnu.org; edgar.igles...@gmail.com; 
pbonz...@redhat.com; g...@mprc.pku.edu.cn; Leon Alrae; afaer...@suse.de; 
aurel...@aurel32.net; pro...@gmail.com
Subject: Re: [Qemu-devel] [PATCH 2/2] target-mips: Implement IEEE 754-2008 
functionality for R6 and MSA instructions

On 03/25/2016 05:50 AM, Aleksandar Markovic wrote:
> @@ -2621,9 +2621,23 @@ uint64_t helper_float_cvtl_d(CPUMIPSState *env, 
> uint64_t fdt0)
>   uint64_t dt2;
>
>   dt2 = float64_to_int64(fdt0, >active_fpu.fp_status);
> -if (get_float_exception_flags(>active_fpu.fp_status)
> -& (float_flag_invalid | float_flag_overflow)) {
> -dt2 = FP_TO_INT64_OVERFLOW;
> +if (env->active_fpu.fcr31 & (1 << FCR31_NAN2008)) {
> +if (get_float_exception_flags(>active_fpu.fp_status)
> +& (float_flag_invalid | float_flag_overflow)) {
> +if (float64_is_any_nan(fdt0)) {
> +dt2 = 0;
> +} else {
> +if (float64_is_neg(fdt0))
> +dt2 = INT64_MIN;
> +else
> +dt2 = INT64_MAX;
> +}
> +}
> +} else {
> +if (get_float_exception_flags(>active_fpu.fp_status)
> +& (float_flag_invalid | float_flag_overflow)) {
> +dt2 = FP_TO_INT64_OVERFLOW;
> +}

Better to swap the tests here, so that you test the exception flags first (and
once).  That is the exceptional condition, the one that will be true least
often.  After that, FCR31_NAN2008 will be tested only when needed.

But also, this pattern is replicated so many times you'd do well to pull this
sequence out to helper functions (one for s, one for d).

> +uint64_t helper_float_abs_d(CPUMIPSState *env, uint64_t fdt0)
> +{
> +uint64_t fdt1;
> +
> +if (env->active_fpu.fcr31 & (1 << FCR31_ABS2008)) {
> +fdt1 = float64_abs(fdt0);
> +} else {
> +if (float64_is_neg(fdt0)) {
> +fdt1 = float64_sub(0, fdt0, >active_fpu.fp_status);
> +} else {
> +fdt1 = float64_add(0, fdt0, >active_fpu.fp_status);
> +}
> +update_fcr31(env, GETPC());

Here you're better off using two separate helper functions, and chose the
correct one during translation.  Indeed, since the 2008 version is a simple
bit-flip, you needn't actually have a helper; just expand the sequence inline.


r~

[Qemu-devel] NBD_REP_SERVER layout [was: [PATCHv2] Strawman proposal for NBD structured replies]

2016-03-30 Thread Eric Blake

On 03/30/2016 12:43 PM, Eric Blake wrote:

> On that tangent, I found SELECT slightly ambiguous (particularly since
> I'm also considering a proposal to modify NBD_REP_SERVER to expose
> alignment details, so it would have to play nicely with SELECT):
> 
> Based on normative text alone, we would have:
> 
> S: 64 bits, 0x3e889045565a9
> S: 32 bits, NBD_OPT_SELECT (6)
> S: 32 bits, NBD_REP_SERVER (2)
> S: 32 bits, length
> S: 32 bits, name length
> S: 'name length' bytes, UTF-8 encoded name (is it NUL-terminated?)
> S: 'length - name length' bytes, which is UTF-8 encoded message to
> display to human
> 
> This implies that 'name length' <= 'length - 4', although we don't
> actually state that the server MUST NOT send a name length longer than
> that.  It might not hurt to also require that length include space for a
> NUL terminator (in both the name, and in the optional human-readable
> information field), but only if that matches existing practice (if it
> does not, then we should document that the client and server are NOT
> dealing with NUL-terminated UTF-8 strings, but relying on length instead).
> 
> The SELECT text then refines this:
> 
> S: 64 bits, 0x3e889045565a9
> S: 32 bits, NBD_OPT_SELECT (6)
> S: 32 bits, NBD_REP_SERVER (2)
> S: 32 bits, length
> S: 32 bits, name length
> S: 'name length' bytes, UTF-8 encoded name (is it NUL-terminated?)
> S: 64 bits, size
> S: 16 bits, export flags
> 
> but doesn't mention what happens if 'length' > 'name length + 14'.
> Presumably, if the server wants to include the same UTF-8 human
> information as before, it would go AFTER the 16 bits for the export
> field (in other words, the SELECT extension is carving out fields in the
> MIDDLE of the payload).

Actually, now that I've thought about it, I wonder if it is better to
arrange data so that there is _always_ a UTF-8 human-readable string
immediately after the name (with 1-byte content of "" as the minimum),
so that clients can just blindly print that string, even if binary data
appears later in the structure due to other things (such as
NBD_OPT_SELECT) specifying the layout of that binary data.  But if we go
with that approach, then we should probably document that NBD_REP_SERVER
sent in reply to NBD_OPT_SELECT must have length >= 'name-length + 15'.
 Anywhere that we put binary data where a client used to be expecting
human-readable data risks an unprepared client printing garbage.

> 
> So, once I know for sure about the handling of NUL bytes, I'll probably
> try my hand at a patch to clarify the wording in both the normative and
> in the SELECT extension.

At any rate, I hope that I've brought attention to the issue, and that
part of moving SELECT out of experimental and into normative will
involve documenting decisions you actually make while implementing things.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] tcg: reworking tb_invalidated_flag

2016-03-30 Thread Richard Henderson

On 03/30/2016 10:08 AM, Sergey Fedorov wrote:
> As of catching tb_flush() in cpu_exec() there have been three approaches
> proposed.
> 
> The first approach is to get rid of 'tb_invalidated_flag' and use
> 'tb_flush_count'. Capture 'tb_flush_count' inside 'tb_lock' critical
> section of cpu_exec() and compare it on each execution loop iteration
> before trying to do tb_add_jump(). This would be simple and clear but it
> would cost an extra load from a shared variable 'tb_flush_count' each
> time we go over the execution loop.
> 
> The second approach is to make 'tb_invalidated_flag' per-CPU. This
> would be conceptually similar to what we have, but would give us thread
> safety. With this approach, we need to be careful to correctly clear and
> set the flag.
> 
> The third approach is to mark each individual TB as valid/invalid. This
> is what Emilio has in his MTTCG series [2]. Following this approach, we
> could have very clean code with no extra overhead on the hot path.
> However, it would require to mark all TBs as invalid on tb_flush().
> Given that tb_flush() is rare, it shouldn't be a significant overhead.
> Also, there could be several options how to mark TB valid/invalid:
> a dedicated flag could be introduced or some invalid value of
> pc/cs_base/flags could be used.

I'm not really fond of this third option.  Yes, tb_flush is rare on some
targets, but for those with software managed TLBs they're much more common.
See e.g. mips and sparc.

Even when tb_flush is rare, there can be 500k-1M TBs live when the flush does
happen.  I simply cannot imagine that individually touching 1M variables
performs as well as setting one global variable, or taking a global lock.


r~

Re: [Qemu-devel] [PATCHv2] Strawman proposal for NBD structured replies

2016-03-30 Thread Eric Blake

On 03/30/2016 01:33 AM, Wouter Verhelst wrote:
> Morning,
> 
> On Wed, Mar 30, 2016 at 07:59:15AM +0100, Alex Bligh wrote:
>> On 30 Mar 2016, at 00:17, Eric Blake  wrote:

 -The server replies with:
 +Replies take one of two forms. They may either be structured replies,
>>>
>>> Hmm, you put your strawman directly in the 'transmission phase' section,
>>> while mine is deferred to the 'Experimental Extensions' section, at
>>> least until we have a reference client and server implementation.
>>
>> Yeah, my wording was straw man but I think it should go into the main
>> body of the work. Obviously in that case it wouldn't be *merged* until
>> we had a working implementation.
>>
>> The SELECT stuff is a bit different as I am not sure it was intended
>> to be standardized short term (i.e. it was a longer term experiment
>> IIRC).
> 
> Actually, I plan to implement that when I get around to doing STARTTLS
> (which I've started work on, but is far from ready).

On that tangent, I found SELECT slightly ambiguous (particularly since
I'm also considering a proposal to modify NBD_REP_SERVER to expose
alignment details, so it would have to play nicely with SELECT):

Based on normative text alone, we would have:

S: 64 bits, 0x3e889045565a9
S: 32 bits, NBD_OPT_SELECT (6)
S: 32 bits, NBD_REP_SERVER (2)
S: 32 bits, length
S: 32 bits, name length
S: 'name length' bytes, UTF-8 encoded name (is it NUL-terminated?)
S: 'length - name length' bytes, which is UTF-8 encoded message to
display to human

This implies that 'name length' <= 'length - 4', although we don't
actually state that the server MUST NOT send a name length longer than
that.  It might not hurt to also require that length include space for a
NUL terminator (in both the name, and in the optional human-readable
information field), but only if that matches existing practice (if it
does not, then we should document that the client and server are NOT
dealing with NUL-terminated UTF-8 strings, but relying on length instead).

The SELECT text then refines this:

S: 64 bits, 0x3e889045565a9
S: 32 bits, NBD_OPT_SELECT (6)
S: 32 bits, NBD_REP_SERVER (2)
S: 32 bits, length
S: 32 bits, name length
S: 'name length' bytes, UTF-8 encoded name (is it NUL-terminated?)
S: 64 bits, size
S: 16 bits, export flags

but doesn't mention what happens if 'length' > 'name length + 14'.
Presumably, if the server wants to include the same UTF-8 human
information as before, it would go AFTER the 16 bits for the export
field (in other words, the SELECT extension is carving out fields in the
MIDDLE of the payload).

So, once I know for sure about the handling of NUL bytes, I'll probably
try my hand at a patch to clarify the wording in both the normative and
in the SELECT extension.

>> I guess Wouter should be the arbiter of whether he'd prefer to merge
>> it only when we have an implementation. My concern is that it may
>> hang around in 'experimental', when it needs properly merging, which
>> will require re-wordsmithing.
> 
> I'll merge whatever the outcome of this discussion is in the
> experimental section; that way, it won't get lost. I can reformulate
> your text to fit there myself if needs be, although obviously having
> something that doesn't require such work is preferable. However, I'm
> leaning more towards Eric's proposal at this time, because it feels more
> mature.

Okay, I'll do my best to word things in the experimental section so that
we can minimize edits when moving text. Maybe I'll even go so far as to
post a 2-patch series; one with the text in the experimental section for
committing now; the other for moving the text (but NOT to be committed)
to the normative section, for demonstrating the level of effort required.

> 
> I now realize, after having slept over it, that you guys probably meant
> for an error-with-offset to only mark bytes that were part of *that*
> particular reply chunk to be marked as faulty, which makes more sense
> than "the whole request range from that point on", as I was interpreting
> it...

Indeed, anything I can do to make the wording more clear is welcome.
Obviously, a server can mark an entire data chunk invalid by setting the
offset to the same value as the offset used in that chunk; the real
power is that an offset that is > chunk-offset but < 'chunk-offset +
chunk-length' then lets the client know that the first half of chunk was
valid, the second half of chunk is bogus, and no other chunk is
impacted.  And when a lazy server sends error-without-offset, the client
is forced to treat ALL chunks as invalid; which is why I state that the
server SHOULD NOT mix error-with-offset and error-without-offset in the
same reply (SHOULD NOT, but not MUST NOT, because I can't predict every
possible error scenario, and there may be some fatal chain of events
where the server is forced to mix after all).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc

Re: [Qemu-devel] [PATCH 3/3] ide: really restart pending and in-flight atapi dma

2016-03-30 Thread John Snow



On 03/28/2016 07:48 AM, Denis V. Lunev wrote:
> From: Pavel Butsykin 
> 
> Restart of ATAPI DMA used to be unreachable, because the request to do
> so wasn't indicated in bus->error_status due to the lack of spare bits, and
> ide_restart_bh() would return early doing nothing.
> 
> This patch makes use of the observation that not all bit combinations were
> possible in ->error_status. In particular, IDE_RETRY_READ only made sense
> together with IDE_RETRY_DMA or IDE_RETRY_PIO. This allows to re-use
> IDE_RETRY_READ alone as an indicator of ATAPI DMA restart request.
> 
> To makes things more uniform, ATAPI DMA gets its own value for ->dma_cmd.
> As a means against confusion, macros are added to test the state of
> ->error_status.
> 
> The patch fixes the restart of both in-flight and pending ATAPI DMA,
> following the scheme similar to that of IDE DMA.
> 
> Signed-off-by: Pavel Butsykin 
> Reviewed-by: Roman Kagan 
> Signed-off-by: Denis V. Lunev 
> CC: John Snow 
> ---
>  hw/ide/atapi.c| 15 ---
>  hw/ide/core.c | 27 ---
>  hw/ide/internal.h | 21 +
>  3 files changed, 41 insertions(+), 22 deletions(-)
> 
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index acc52cd..fb9ae43 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -342,6 +342,7 @@ static void ide_atapi_cmd_reply(IDEState *s, int size, 
> int max_size)
>  block_acct_start(blk_get_stats(s->blk), >acct, size,
>   BLOCK_ACCT_READ);
>  s->status = READY_STAT | SEEK_STAT | DRQ_STAT;
> +s->dma_cmd = IDE_DMA_ATAPI;
>  ide_start_dma(s, ide_atapi_cmd_read_dma_cb);
>  } else {
>  s->status = READY_STAT | SEEK_STAT;
> @@ -375,15 +376,18 @@ static void ide_atapi_cmd_check_status(IDEState *s)
>  }
>  /* ATAPI DMA support */
>  
> -/* XXX: handle read errors */
>  static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
>  {
>  IDEState *s = opaque;
>  int data_offset, n;
>  
>  if (ret < 0) {
> -ide_atapi_io_error(s, ret);
> -goto eot;
> +if (ide_handle_rw_error(s, -ret, ide_dma_cmd_to_retry(s->dma_cmd))) {
> +if (s->bus->error_status) {
> +return;
> +}
> +goto eot;
> +}
>  }
>  
>  if (s->io_buffer_size > 0) {
> @@ -464,6 +468,7 @@ static void ide_atapi_cmd_read_dma(IDEState *s, int lba, 
> int nb_sectors,
>  
>  /* XXX: check if BUSY_STAT should be set */
>  s->status = READY_STAT | SEEK_STAT | DRQ_STAT | BUSY_STAT;
> +s->dma_cmd = IDE_DMA_ATAPI;
>  ide_start_dma(s, ide_atapi_cmd_read_dma_cb);
>  }
>  
> @@ -481,10 +486,6 @@ static void ide_atapi_cmd_read(IDEState *s, int lba, int 
> nb_sectors,
>  }
>  }
>  
> -
> -/* Called by *_restart_bh when the transfer function points
> - * to ide_atapi_cmd
> - */
>  void ide_atapi_dma_restart(IDEState *s)
>  {
>  /*
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 8f86036..0425d86 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -56,7 +56,6 @@ static const int smart_attributes[][12] = {
>  { 190,  0x03, 0x00, 0x45, 0x45, 0x1f, 0x00, 0x1f, 0x1f, 0x00, 0x00, 
> 0x32},
>  };
>  
> -static int ide_handle_rw_error(IDEState *s, int error, int op);
>  static void ide_dummy_transfer_stop(IDEState *s);
>  
>  static void padstr(char *str, const char *src, int len)
> @@ -772,7 +771,7 @@ void ide_dma_error(IDEState *s)
>  ide_set_irq(s->bus);
>  }
>  
> -static int ide_handle_rw_error(IDEState *s, int error, int op)
> +int ide_handle_rw_error(IDEState *s, int error, int op)
>  {
>  bool is_read = (op & IDE_RETRY_READ) != 0;
>  BlockErrorAction action = blk_get_error_action(s->blk, is_read, error);
> @@ -782,8 +781,10 @@ static int ide_handle_rw_error(IDEState *s, int error, 
> int op)
>  s->bus->error_status = op;
>  } else if (action == BLOCK_ERROR_ACTION_REPORT) {
>  block_acct_failed(blk_get_stats(s->blk), >acct);
> -if (op & IDE_RETRY_DMA) {
> +if (IS_IDE_RETRY_DMA(op)) {
>  ide_dma_error(s);
> +} else if (IS_IDE_RETRY_ATAPI(op)) {
> +ide_atapi_io_error(s, -error);
>  } else {
>  ide_rw_error(s);
>  }
> @@ -871,6 +872,8 @@ static void ide_dma_cb(void *opaque, int ret)
>  ide_issue_trim, ide_dma_cb, s,
>  DMA_DIRECTION_TO_DEVICE);
>  break;
> +default:
> +abort();
>  }
>  return;
>  
> @@ -2517,15 +2520,13 @@ static void ide_restart_bh(void *opaque)
>  if (s->bus->dma->ops->restart) {
>  s->bus->dma->ops->restart(s->bus->dma);
>  }
> -}
> -
> -if (error_status & IDE_RETRY_DMA) {
> +} else if (IS_IDE_RETRY_DMA(error_status)) {
>  if (error_status & IDE_RETRY_TRIM) {
>

Re: [Qemu-devel] [PULL v2 00/17] target-arm queue

2016-03-30 Thread Peter Maydell

On 30 March 2016 at 17:31, Peter Maydell <peter.mayd...@linaro.org> wrote:
> Last lot of target-arm patches just sneaking under the wire:
>  * m25p80 (should be safe enough since not really used by anything)
>  * virt power key bugfix
>  * query-gic-capabilities QMP command
>
> changes v1->v2: new monitor.c is config-softmmu only.
>
> thanks
> -- PMM
>
>
> The following changes since commit 489ef4c810033e63af570c8a430af8b9858bfa5f:
>
>   Merge remote-tracking branch 'remotes/lalrae/tags/mips-20160329-2' into 
> staging (2016-03-30 16:06:45 +0100)
>
> are available in the git repository at:
>
>
>   git://git.linaro.org/people/pmaydell/qemu-arm.git 
> tags/pull-target-arm-20160330-1
>
> for you to fetch changes up to db31e49a565fc4c165fd98201721b313c3412c1f:
>
>   arm: implement query-gic-capabilities (2016-03-30 17:27:24 +0100)
>
> 
> target-arm queue:
>  * virt: fix the virtual power button by adding a modelled
>"key press for 100ms" device
>  * various improvements to m25p80 flash devices
>  * implement new QMP query-gic-capability command to let the
>management layer know what versions of GIC we support

Applied, thanks.

-- PMM

Re: [Qemu-devel] tcg: reworking tb_invalidated_flag

2016-03-30 Thread Paolo Bonzini

On 30/03/2016 19:08, Sergey Fedorov wrote:
> cpu_exec() case is a bit more subtle. Regarding tb_phys_invalidate(), it
> shouldn't be harmful if an invalidated TB get patched because it is not
> going to be executed anymore. It may only be a concern of efficiency.
> Though it doesn't seem to happen frequently.
> 
> As of catching tb_flush() in cpu_exec() there have been three approaches
> proposed.
> 
> The first approach is to get rid of 'tb_invalidated_flag' and use
> 'tb_flush_count'. Capture 'tb_flush_count' inside 'tb_lock' critical
> section of cpu_exec() and compare it on each execution loop iteration
> before trying to do tb_add_jump(). This would be simple and clear but it
> would cost an extra load from a shared variable 'tb_flush_count' each
> time we go over the execution loop.
> 
> The second approach is to make 'tb_invalidated_flag' per-CPU. This
> would be conceptually similar to what we have, but would give us thread
> safety. With this approach, we need to be careful to correctly clear and
> set the flag.

You can just ensure that setting and clearing it is done under tb_lock.

> The third approach is to mark each individual TB as valid/invalid. This
> is what Emilio has in his MTTCG series [2]. Following this approach, we
> could have very clean code with no extra overhead on the hot path.
> However, it would require to mark all TBs as invalid on tb_flush().
> Given that tb_flush() is rare, it shouldn't be a significant overhead.

I agree; the problem with this solution is that it adds complexity to
tb_flush.  Because TranslationBlocks live in tcg_ctx.tb_ctx.tbs you need
special code to exit all CPUs at tb_flush time, otherwise you risk that
a tb_alloc reuses a TranslationBlock while it is in use by a VCPU.  This
would be complicated, hard to test, and only needed for a very rare
occurrence(*).  Because tb_flush() is rare I believe we should keep it
as simple as possible.

(*) Both Emilio and Fred have a similar "exit all CPUs"
primitive.  Fred used it for tb_flush() and IIRC also
for some TLB flush primitives.  However, Alvise has been
reworking the TLB flush code to use a "CPU halted" state
that's less prone to deadlocks.

> Also, there could be several options how to mark TB valid/invalid:
> a dedicated flag could be introduced or some invalid value of
> pc/cs_base/flags could be used.

This is probably necessary nevertheless in order to make
tb_find_physical run outside tb_lock.

Paolo

Re: [Qemu-devel] [PATCH 1/3] ide: don't lose pending dma state

2016-03-30 Thread John Snow



On 03/28/2016 07:48 AM, Denis V. Lunev wrote:
> From: Pavel Butsykin 
> 
> If the migration occurs after the IDE DMA has been set up but before it
> has been initiated, the state gets lost upon save/restore. Specifically,
> ->dma_cb callback gets cleared, so, when the guest eventually starts bus
> mastering, the DMA never completes, causing the guest to time out the
> operation.
> 
> OTOH all the infrastructure is already in place to restart the DMA if
> the migration happens while the DMA is in progress.
> 
> So reuse that infrastructure, by setting bus->error_status based on
> ->dma_cmd in pre_save if ->dma_cb callback is already set but DMAING is
> clear. This will indicate the need for restart and make sure ->dma_cb is
> restored in ide_restart_bh(); however since DMAING is clear the state
> upon restore will be exactly "ready for DMA" as before the save.
> 
> Signed-off-by: Pavel Butsykin 
> Reviewed-by: Roman Kagan 
> Signed-off-by: Denis V. Lunev 
> CC: John Snow 
> ---
>  hw/ide/core.c |  9 +
>  hw/ide/internal.h | 15 +++
>  hw/ide/pci.c  |  4 
>  3 files changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 241e840..8f86036 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -803,14 +803,7 @@ static void ide_dma_cb(void *opaque, int ret)
>  return;
>  }
>  if (ret < 0) {
> -int op = IDE_RETRY_DMA;
> -
> -if (s->dma_cmd == IDE_DMA_READ)
> -op |= IDE_RETRY_READ;
> -else if (s->dma_cmd == IDE_DMA_TRIM)
> -op |= IDE_RETRY_TRIM;
> -
> -if (ide_handle_rw_error(s, -ret, op)) {
> +if (ide_handle_rw_error(s, -ret, ide_dma_cmd_to_retry(s->dma_cmd))) {
>  return;
>  }
>  }
> diff --git a/hw/ide/internal.h b/hw/ide/internal.h
> index 86bde26..68c7d0d 100644
> --- a/hw/ide/internal.h
> +++ b/hw/ide/internal.h
> @@ -513,6 +513,21 @@ struct IDEDevice {
>  #define IDE_RETRY_TRIM 0x80
>  #define IDE_RETRY_HBA  0x100
>  
> +static inline uint8_t ide_dma_cmd_to_retry(uint8_t dma_cmd)
> +{
> +switch (dma_cmd) {
> +case IDE_DMA_READ:
> +return IDE_RETRY_DMA | IDE_RETRY_READ;
> +case IDE_DMA_WRITE:
> +return IDE_RETRY_DMA;
> +case IDE_DMA_TRIM:
> +return IDE_RETRY_DMA | IDE_RETRY_TRIM;
> +default:
> +break;
> +}
> +return 0;
> +}
> +
>  static inline IDEState *idebus_active_if(IDEBus *bus)
>  {
>  return bus->ifs + bus->unit;
> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> index 92ffee7..8d56a00 100644
> --- a/hw/ide/pci.c
> +++ b/hw/ide/pci.c
> @@ -308,6 +308,10 @@ static void ide_bmdma_pre_save(void *opaque)
>  BMDMAState *bm = opaque;
>  uint8_t abused_bits = BM_MIGRATION_COMPAT_STATUS_BITS;
>  
> +if (!(bm->status & BM_STATUS_DMAING) && bm->dma_cb) {
> +bm->bus->error_status =
> +ide_dma_cmd_to_retry(bmdma_active_if(bm)->dma_cmd);
> +}
>  bm->migration_retry_unit = bm->bus->retry_unit;
>  bm->migration_retry_sector_num = bm->bus->retry_sector_num;
>  bm->migration_retry_nsector = bm->bus->retry_nsector;
> 

_this_ is the one I meant to R-B, thanks Denis. (Testing the others now,
but having some laptop issues. Please stand by.)

Reviewed-by: John Snow

Re: [Qemu-devel] [PATCH v2 3/3] doc: Propose Structured Read extension

2016-03-30 Thread Eric Blake

On 03/30/2016 12:50 AM, Alex Bligh wrote:
> Eric,
> 
> There's a lot in common between our two proposals now (unsurprisingly).
> You've highlighted the differences in the other mail and I'll
> comment on them there. You may want to steal some of my wording as
> I think there are bits I've got that you haven't (as well as vice versa).
> But I'm inclined to use yours as a base unless you particularly
> like mine.
> 
> Comments inline below.
> 
> Alex
> 
> On 30 Mar 2016, at 00:01, Eric Blake  wrote:
> ...
> 
>> +While the server is permitted to send at most one normal reply (or
>> +else close the connection), a command that uses structured replies
>> +may document that the server is permitted to send mutiple replies,

And just now noticing my typo on 'multiple' :)

>> +all sharing the same handle,
> 
> The thought is fine, but the language is confusing. I think this is
> a single reply, made up of multiple parts (I called them chunks). You've
> called them multiple replies, which I think makes things less clear.
> Also below you've started using my 'chunk' language anyway!

All right, for v3, I will try to go with the wording that every request
has a single reply; but the reply will either be a 'simple reply'
(single message), or a 'structured reply' (which may occupy multiple
messages, where each message is called a 'chunk').

> 
>> by using the `NBD_REPLY_FLAG_DONE`
>> +(bit 0) to delineate the final reply. The server MAY interleave
>> +intermediate replies to one structured command with replies
>> +relating to a different handle.
> 
> Neat.
> 
> The argument against this route is that now there are essentially
> two ways to end a chain of chunks (with and without a NONE chunk)
> which is necessary for the reasons you set out. On balance I like it though.

Yeah; I didn't see any way around that.  Always requiring a NONE chunk
is more network overhead if the server can guarantee that the last data
chunk is error-free; but at the same time, we shouldn't force servers to
guarantee the last data chunk will be error-free.

I don't think it is too much of a burden for a client to receive chunks
in a loop until the FLAG_DONE bit is set, without regards to what chunk
type came last.  And for CMD_READ, we still have a nice delineation: if
the last chunk is NONE, OFFSET_DATA, or OFFSET_HOLE, the command
succeeded; if the last chunk is ERROR or ERROR_OFFSET, the command failed.

> 
>>  However, for ease of implementation, a
>> +server MAY close the connection rather than entering transmission
>> +phase if, at the end of option haggling, the client has negotiated
>> +another command that requires a structured reply but did not also
>> +negotiate Structured Reads.
> 
> That's pretty yucky given a reconnect will achieve the same result
> and you'll end up in an infinite retry loop.
> 
> Wouldn't a better route be simply to say that implementing certain
> commands (server or client sides) requires support of structured
> replies?

I think we're in agreement that the only command that (for back-compat
reasons) can ever send data on a simple reply is CMD_READ.  Therefore,
if you negotiate any other command that can send data, that command will
use a structured reply; the mere fact that you negotiated the command
means that both client and server know about structured replies.

I guess what I was trying to get at is that if you are using any
structured replies, then it is a disservice to wire-sniffers if you did
not also enable structured replies for CMD_READ.  Technically, it would
be feasible to use simple replies for reads while using structured
replies for the other negotiated commands, but practically, a client
that does that seems undesirable, which is why I said that a server
could disconnect rather than talking to such a client.

So taking your sentence at face value, yes there is another solution
possible: require that any NBD_OPT_* haggling used to negotiate the use
of any other command with a structured reply MUST fail if the client has
not first negotiated NBD_OPT_STRUCTURED_READ, but leaves the connection
open so the client can continue option haggling.  That way, the only way
to end option haggling with the new command enabled is to also enable
structured reads.  The burden is then on the client to haggle in the
correct order, and on the server to track haggling state when deciding
how to answer the option requests for the new commands.

I'm a bit reluctant to put ordering requirements on option haggling
(that option A must be turned on before option B), but then again, the
SELECT extension requires NBD_OPT_SELECT to be haggled prior to
NBD_OPT_GO, so there's precedent.  I also am thinking of proposing an
extension for option haggling to communicate minimum/preferred
alignments and maximum don't-fragment sizing, and that option would have
to be enabled before OPT_LIST/OPT_SELECT/OPT_EXPORT_NAME (because it
would change the NBD_REP_SERVER layout in

Re: [Qemu-devel] [PATCH 1/3] ide: don't loose pending dma state

2016-03-30 Thread John Snow



On 03/30/2016 01:38 PM, Denis V. Lunev wrote:
> On 03/30/2016 08:35 PM, John Snow wrote:
>>
>> On 03/23/2016 06:26 AM, Denis V. Lunev wrote:
>>> From: Pavel Butsykin 
>>>
>>> If the migration occurs after the IDE DMA has been set up but before it
>>> has been initiated, the state gets lost upon save/restore. Specifically,
>>> ->dma_cb callback gets cleared, so, when the guest eventually starts bus
>>> mastering, the DMA never completes, causing the guest to time out the
>>> operation.
>>>
>>> OTOH all the infrastructure is already in place to restart the DMA if
>>> the migration happens while the DMA is in progress.
>>>
>>> So reuse that infrastructure, by calling the DMA callback with an
>>> artificial error code in pre_save if the callback is already set but
>>> DMAING is clear. The callback then sets bus->error_status to indicate
>>> the need for restart; however since DMAING is clear the state upon
>>> restore will be exactly "ready forDMA" as before the save.
>>>
>>> This patch only fixes the IDE DMA case; the ATAPI DMA is left with a
>>> stub
>>> for now since its restart hadn't been implemented yet. It will be
>>> addressed in the followup patch.
>>>
>>> Signed-off-by: Pavel Butsykin 
>>> Reviewed-by: Roman Kagan 
>>> Signed-off-by: Denis V. Lunev 
>>> CC: John Snow 
>>> ---
>>>   hw/ide/atapi.c| 9 -
>>>   hw/ide/core.c | 9 -
>>>   hw/ide/internal.h | 7 +++
>>>   hw/ide/pci.c  | 3 +++
>>>   4 files changed, 26 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>>> index 1fe58ab..268220d 100644
>>> --- a/hw/ide/atapi.c
>>> +++ b/hw/ide/atapi.c
>>> @@ -381,7 +381,14 @@ static void ide_atapi_cmd_read_dma_cb(void
>>> *opaque, int ret)
>>>   IDEState *s = opaque;
>>>   int data_offset, n;
>>>   -if (ret < 0) {
>>> +if (ret < 0) { /* XXX: handle -ESTATESAVE for migration */
>>> +if (ret == -ESTATESAVE) {
>>> +/*
>>> + * This case is not really an error
>>> + * but a request to save the state.
>>> + */
>>> +return;
>>> +}
>>>   ide_atapi_io_error(s, ret);
>>>   goto eot;
>>>   }
>>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>>> index 241e840..872e11f 100644
>>> --- a/hw/ide/core.c
>>> +++ b/hw/ide/core.c
>>> @@ -810,7 +810,14 @@ static void ide_dma_cb(void *opaque, int ret)
>>>   else if (s->dma_cmd == IDE_DMA_TRIM)
>>>   op |= IDE_RETRY_TRIM;
>>>   -if (ide_handle_rw_error(s, -ret, op)) {
>>> +if (ret == -ESTATESAVE) {
>>> +/*
>>> + * This case is not really an error
>>> + * but a request to save the state.
>>> + */
>>> +s->bus->error_status = op;
>>> +return;
>>> +} else if (ide_handle_rw_error(s, -ret, op)) {
>>>   return;
>>>   }
>>>   }
>>> diff --git a/hw/ide/internal.h b/hw/ide/internal.h
>>> index 86bde26..dcd8627 100644
>>> --- a/hw/ide/internal.h
>>> +++ b/hw/ide/internal.h
>>> @@ -513,6 +513,13 @@ struct IDEDevice {
>>>   #define IDE_RETRY_TRIM 0x80
>>>   #define IDE_RETRY_HBA  0x100
>>>   +/*
>>> + * a code to trigger entering error path and save/restore the "ready
>>> to DMA"
>>> + * state just like DMA-ing state. (Ab)use EINPROGRESS as it's not
>>> supposed to
>>> + * come from the block layer
>>> + */
>>> +#define ESTATESAVE EINPROGRESS
>>> +
>>>   static inline IDEState *idebus_active_if(IDEBus *bus)
>>>   {
>>>   return bus->ifs + bus->unit;
>>> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
>>> index 92ffee7..e1f89fa 100644
>>> --- a/hw/ide/pci.c
>>> +++ b/hw/ide/pci.c
>>> @@ -308,6 +308,9 @@ static void ide_bmdma_pre_save(void *opaque)
>>>   BMDMAState *bm = opaque;
>>>   uint8_t abused_bits = BM_MIGRATION_COMPAT_STATUS_BITS;
>>>   +if (!(bm->status & BM_STATUS_DMAING) && bm->dma_cb) {
>>> +bm->dma_cb(bmdma_active_if(bm), -ESTATESAVE);
>>> +}
>>>   bm->migration_retry_unit = bm->bus->retry_unit;
>>>   bm->migration_retry_sector_num = bm->bus->retry_sector_num;
>>>   bm->migration_retry_nsector = bm->bus->retry_nsector;
>>>
>> Reviewed-by: John Snow 
> there is never patch sent [PATCH 1/3] ide: don't lose pending dma state
> 28.03
> as v2.
> 
> Den

My mistake. I tested the new one but replied to the wrong mail.

Re: [Qemu-devel] [PATCH 1/3] ide: don't loose pending dma state

2016-03-30 Thread Denis V. Lunev


On 03/30/2016 08:35 PM, John Snow wrote:


On 03/23/2016 06:26 AM, Denis V. Lunev wrote:

From: Pavel Butsykin 

If the migration occurs after the IDE DMA has been set up but before it
has been initiated, the state gets lost upon save/restore. Specifically,
->dma_cb callback gets cleared, so, when the guest eventually starts bus
mastering, the DMA never completes, causing the guest to time out the
operation.

OTOH all the infrastructure is already in place to restart the DMA if
the migration happens while the DMA is in progress.

So reuse that infrastructure, by calling the DMA callback with an
artificial error code in pre_save if the callback is already set but
DMAING is clear. The callback then sets bus->error_status to indicate
the need for restart; however since DMAING is clear the state upon
restore will be exactly "ready forDMA" as before the save.

This patch only fixes the IDE DMA case; the ATAPI DMA is left with a stub
for now since its restart hadn't been implemented yet. It will be
addressed in the followup patch.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: John Snow 
---
  hw/ide/atapi.c| 9 -
  hw/ide/core.c | 9 -
  hw/ide/internal.h | 7 +++
  hw/ide/pci.c  | 3 +++
  4 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 1fe58ab..268220d 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -381,7 +381,14 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int 
ret)
  IDEState *s = opaque;
  int data_offset, n;
  
-if (ret < 0) {

+if (ret < 0) { /* XXX: handle -ESTATESAVE for migration */
+if (ret == -ESTATESAVE) {
+/*
+ * This case is not really an error
+ * but a request to save the state.
+ */
+return;
+}
  ide_atapi_io_error(s, ret);
  goto eot;
  }
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 241e840..872e11f 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -810,7 +810,14 @@ static void ide_dma_cb(void *opaque, int ret)
  else if (s->dma_cmd == IDE_DMA_TRIM)
  op |= IDE_RETRY_TRIM;
  
-if (ide_handle_rw_error(s, -ret, op)) {

+if (ret == -ESTATESAVE) {
+/*
+ * This case is not really an error
+ * but a request to save the state.
+ */
+s->bus->error_status = op;
+return;
+} else if (ide_handle_rw_error(s, -ret, op)) {
  return;
  }
  }
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 86bde26..dcd8627 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -513,6 +513,13 @@ struct IDEDevice {
  #define IDE_RETRY_TRIM 0x80
  #define IDE_RETRY_HBA  0x100
  
+/*

+ * a code to trigger entering error path and save/restore the "ready to DMA"
+ * state just like DMA-ing state. (Ab)use EINPROGRESS as it's not supposed to
+ * come from the block layer
+ */
+#define ESTATESAVE EINPROGRESS
+
  static inline IDEState *idebus_active_if(IDEBus *bus)
  {
  return bus->ifs + bus->unit;
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 92ffee7..e1f89fa 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -308,6 +308,9 @@ static void ide_bmdma_pre_save(void *opaque)
  BMDMAState *bm = opaque;
  uint8_t abused_bits = BM_MIGRATION_COMPAT_STATUS_BITS;
  
+if (!(bm->status & BM_STATUS_DMAING) && bm->dma_cb) {

+bm->dma_cb(bmdma_active_if(bm), -ESTATESAVE);
+}
  bm->migration_retry_unit = bm->bus->retry_unit;
  bm->migration_retry_sector_num = bm->bus->retry_sector_num;
  bm->migration_retry_nsector = bm->bus->retry_nsector;


Reviewed-by: John Snow 
there is never patch sent [PATCH 1/3] ide: don't lose pending dma state 
28.03

as v2.

Den

Re: [Qemu-devel] [PATCH 1/3] ide: don't loose pending dma state

2016-03-30 Thread John Snow



On 03/23/2016 06:26 AM, Denis V. Lunev wrote:
> From: Pavel Butsykin 
> 
> If the migration occurs after the IDE DMA has been set up but before it
> has been initiated, the state gets lost upon save/restore. Specifically,
> ->dma_cb callback gets cleared, so, when the guest eventually starts bus
> mastering, the DMA never completes, causing the guest to time out the
> operation.
> 
> OTOH all the infrastructure is already in place to restart the DMA if
> the migration happens while the DMA is in progress.
> 
> So reuse that infrastructure, by calling the DMA callback with an
> artificial error code in pre_save if the callback is already set but
> DMAING is clear. The callback then sets bus->error_status to indicate
> the need for restart; however since DMAING is clear the state upon
> restore will be exactly "ready forDMA" as before the save.
> 
> This patch only fixes the IDE DMA case; the ATAPI DMA is left with a stub
> for now since its restart hadn't been implemented yet. It will be
> addressed in the followup patch.
> 
> Signed-off-by: Pavel Butsykin 
> Reviewed-by: Roman Kagan 
> Signed-off-by: Denis V. Lunev 
> CC: John Snow 
> ---
>  hw/ide/atapi.c| 9 -
>  hw/ide/core.c | 9 -
>  hw/ide/internal.h | 7 +++
>  hw/ide/pci.c  | 3 +++
>  4 files changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index 1fe58ab..268220d 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -381,7 +381,14 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int 
> ret)
>  IDEState *s = opaque;
>  int data_offset, n;
>  
> -if (ret < 0) {
> +if (ret < 0) { /* XXX: handle -ESTATESAVE for migration */
> +if (ret == -ESTATESAVE) {
> +/*
> + * This case is not really an error
> + * but a request to save the state.
> + */
> +return;
> +}
>  ide_atapi_io_error(s, ret);
>  goto eot;
>  }
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 241e840..872e11f 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -810,7 +810,14 @@ static void ide_dma_cb(void *opaque, int ret)
>  else if (s->dma_cmd == IDE_DMA_TRIM)
>  op |= IDE_RETRY_TRIM;
>  
> -if (ide_handle_rw_error(s, -ret, op)) {
> +if (ret == -ESTATESAVE) {
> +/*
> + * This case is not really an error
> + * but a request to save the state.
> + */
> +s->bus->error_status = op;
> +return;
> +} else if (ide_handle_rw_error(s, -ret, op)) {
>  return;
>  }
>  }
> diff --git a/hw/ide/internal.h b/hw/ide/internal.h
> index 86bde26..dcd8627 100644
> --- a/hw/ide/internal.h
> +++ b/hw/ide/internal.h
> @@ -513,6 +513,13 @@ struct IDEDevice {
>  #define IDE_RETRY_TRIM 0x80
>  #define IDE_RETRY_HBA  0x100
>  
> +/*
> + * a code to trigger entering error path and save/restore the "ready to DMA"
> + * state just like DMA-ing state. (Ab)use EINPROGRESS as it's not supposed to
> + * come from the block layer
> + */
> +#define ESTATESAVE EINPROGRESS
> +
>  static inline IDEState *idebus_active_if(IDEBus *bus)
>  {
>  return bus->ifs + bus->unit;
> diff --git a/hw/ide/pci.c b/hw/ide/pci.c
> index 92ffee7..e1f89fa 100644
> --- a/hw/ide/pci.c
> +++ b/hw/ide/pci.c
> @@ -308,6 +308,9 @@ static void ide_bmdma_pre_save(void *opaque)
>  BMDMAState *bm = opaque;
>  uint8_t abused_bits = BM_MIGRATION_COMPAT_STATUS_BITS;
>  
> +if (!(bm->status & BM_STATUS_DMAING) && bm->dma_cb) {
> +bm->dma_cb(bmdma_active_if(bm), -ESTATESAVE);
> +}
>  bm->migration_retry_unit = bm->bus->retry_unit;
>  bm->migration_retry_sector_num = bm->bus->retry_sector_num;
>  bm->migration_retry_nsector = bm->bus->retry_nsector;
> 

Reviewed-by: John Snow

Re: [Qemu-devel] [Qemu-ppc] [PATCH v2] spapr: compute interrupt vector address from LPCR

2016-03-30 Thread Greg Kurz

On Wed, 30 Mar 2016 19:01:31 +0200
Greg Kurz  wrote:

> On Wed, 30 Mar 2016 17:38:34 +0200
> Cédric Le Goater  wrote:
> 
> > This address is changed by the linux kernel using the H_SET_MODE hcall
> > and needs to be restored when migrating a spapr VM running in
> > TCG. This can be done using the AIL bits from the LPCR register.
> > 
> > The patch introduces a helper routine cpu_ppc_get_excp_prefix() which
> > returns the effective address offset of the interrupt handler
> > depending on the LPCR_AIL bits. The same helper can be used in the
> > H_SET_MODE hcall, which lets us remove the H_SET_MODE_ADDR_TRANS_*
> > defines.
> >   
> 
> 
> ... because these H_SET_MODE_ADDR_TRANS_* defines, which refer to the values
> 0, 1, 2 and 3 for the mflag argument to the H_SET_MODE hypercall as described
> in section 14.5.4.3.8 of PAPR, are actually values for bits 39:40 (AIL) of the
> LPCR register, as described in section 2.2 of PowerISA.

read "section section 2.2 of chapter 2 of book-III S in PowerISA" :)

> 
> 
> > Signed-off-by: Cédric Le Goater 
> > ---
> >   
> 
> Reviewed-by: Greg Kurz 
> 
> >  Changes since v1:
> > 
> >  - moved helper routine under target-ppc/
> >  - moved the restore of excp_prefix under cpu_post_load()
> > 
> >  hw/ppc/spapr_hcall.c|   13 ++---
> >  include/hw/ppc/spapr.h  |5 -
> >  target-ppc/cpu.h|9 +
> >  target-ppc/machine.c|   20 +++-
> >  target-ppc/translate_init.c |   14 ++
> >  5 files changed, 44 insertions(+), 17 deletions(-)
> > 
> > Index: qemu-dgibson-for-2.6.git/hw/ppc/spapr_hcall.c
> > ===
> > --- qemu-dgibson-for-2.6.git.orig/hw/ppc/spapr_hcall.c
> > +++ qemu-dgibson-for-2.6.git/hw/ppc/spapr_hcall.c
> > @@ -835,17 +835,8 @@ static target_ulong h_set_mode_resource_
> >  return H_P4;
> >  }
> > 
> > -switch (mflags) {
> > -case H_SET_MODE_ADDR_TRANS_NONE:
> > -prefix = 0;
> > -break;
> > -case H_SET_MODE_ADDR_TRANS_0001_8000:
> > -prefix = 0x18000;
> > -break;
> > -case H_SET_MODE_ADDR_TRANS_C000___4000:
> > -prefix = 0xC0004000ULL;
> > -break;
> > -default:
> > +prefix = cpu_ppc_get_excp_prefix(mflags);
> > +if (prefix == (target_ulong) -1ULL) {
> >  return H_UNSUPPORTED_FLAG;
> >  }
> > 
> > Index: qemu-dgibson-for-2.6.git/target-ppc/machine.c
> > ===
> > --- qemu-dgibson-for-2.6.git.orig/target-ppc/machine.c
> > +++ qemu-dgibson-for-2.6.git/target-ppc/machine.c
> > @@ -156,12 +156,26 @@ static void cpu_pre_save(void *opaque)
> >  }
> >  }
> > 
> > +
> > +static int cpu_post_load_excp_prefix(CPUPPCState *env)
> > +{
> > +int ail = (env->spr[SPR_LPCR] & LPCR_AIL) >> LPCR_AIL_SHIFT;
> > +target_ulong prefix = cpu_ppc_get_excp_prefix(ail);
> > +
> > +if (prefix == (target_ulong) -1ULL) {
> > +return -EINVAL;
> > +}
> > +env->excp_prefix = prefix;
> > +return 0;
> > +}
> > +
> >  static int cpu_post_load(void *opaque, int version_id)
> >  {
> >  PowerPCCPU *cpu = opaque;
> >  CPUPPCState *env = >env;
> >  int i;
> >  target_ulong msr;
> > +int ret = 0;
> > 
> >  /*
> >   * We always ignore the source PVR. The user or management
> > @@ -201,7 +215,11 @@ static int cpu_post_load(void *opaque, i
> > 
> >  hreg_compute_mem_idx(env);
> > 
> > -return 0;
> > +if (env->spr[SPR_LPCR] & LPCR_AIL) {
> > +ret = cpu_post_load_excp_prefix(env);
> > +}
> > +
> > +return ret;
> >  }
> > 
> >  static bool fpu_needed(void *opaque)
> > Index: qemu-dgibson-for-2.6.git/target-ppc/translate_init.c
> > ===
> > --- qemu-dgibson-for-2.6.git.orig/target-ppc/translate_init.c
> > +++ qemu-dgibson-for-2.6.git/target-ppc/translate_init.c
> > @@ -8522,6 +8522,20 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu)
> >  }
> >  }
> > 
> > +target_ulong cpu_ppc_get_excp_prefix(target_ulong ail)
> > +{
> > +switch (ail) {
> > +case AIL_NONE:
> > +return 0;
> > +case AIL_0001_8000:
> > +return 0x18000;
> > +case AIL_C000___4000:
> > +return 0xC0004000ULL;
> > +default:
> > +return (target_ulong) -1ULL;
> > +}
> > +}
> > +
> >  #endif /* !defined(CONFIG_USER_ONLY) */
> > 
> >  #endif /* defined (TARGET_PPC64) */
> > Index: qemu-dgibson-for-2.6.git/target-ppc/cpu.h
> > ===
> > --- qemu-dgibson-for-2.6.git.orig/target-ppc/cpu.h
> > +++ qemu-dgibson-for-2.6.git/target-ppc/cpu.h
> > @@ -1269,6 +1269,7 @@ void store_booke_tsr (CPUPPCState *env,
> >  void ppc_tlb_invalidate_all (CPUPPCState *env);
> >  void

[Qemu-devel] [PATCH] target-ppc: Multiple/String Word alignment exception

2016-03-30 Thread Laurent Vivier

If the processor is in little-endian mode, an alignment interrupt must
occur for the following instructions: lmw, stmw, lswi, lswx, stswi or stswx.

This is what happens with KVM, so change TCG to do the same.

As the instruction can be emulated by the kernel, enable the change
only in softmmu mode.

Signed-off-by: Laurent Vivier 
---
 target-ppc/translate.c | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 6f0e7b4..e33dcf7 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -3181,6 +3181,13 @@ static void gen_lmw(DisasContext *ctx)
 {
 TCGv t0;
 TCGv_i32 t1;
+#if !defined(CONFIG_USER_ONLY)
+if (ctx->le_mode) {
+gen_exception_err(ctx, POWERPC_EXCP_ALIGN, POWERPC_EXCP_ALIGN_LE);
+return;
+}
+#endif
+
 gen_set_access_type(ctx, ACCESS_INT);
 /* NIP cannot be restored if the memory exception comes from an helper */
 gen_update_nip(ctx, ctx->nip - 4);
@@ -3197,6 +3204,13 @@ static void gen_stmw(DisasContext *ctx)
 {
 TCGv t0;
 TCGv_i32 t1;
+#if !defined(CONFIG_USER_ONLY)
+if (ctx->le_mode) {
+gen_exception_err(ctx, POWERPC_EXCP_ALIGN, POWERPC_EXCP_ALIGN_LE);
+return;
+}
+#endif
+
 gen_set_access_type(ctx, ACCESS_INT);
 /* NIP cannot be restored if the memory exception comes from an helper */
 gen_update_nip(ctx, ctx->nip - 4);
@@ -3224,6 +3238,13 @@ static void gen_lswi(DisasContext *ctx)
 int start = rD(ctx->opcode);
 int ra = rA(ctx->opcode);
 int nr;
+#if !defined(CONFIG_USER_ONLY)
+if (ctx->le_mode) {
+gen_exception_err(ctx, POWERPC_EXCP_ALIGN, POWERPC_EXCP_ALIGN_LE);
+return;
+}
+#endif
+
 
 if (nb == 0)
 nb = 32;
@@ -3252,6 +3273,13 @@ static void gen_lswx(DisasContext *ctx)
 {
 TCGv t0;
 TCGv_i32 t1, t2, t3;
+#if !defined(CONFIG_USER_ONLY)
+if (ctx->le_mode) {
+gen_exception_err(ctx, POWERPC_EXCP_ALIGN, POWERPC_EXCP_ALIGN_LE);
+return;
+}
+#endif
+
 gen_set_access_type(ctx, ACCESS_INT);
 /* NIP cannot be restored if the memory exception comes from an helper */
 gen_update_nip(ctx, ctx->nip - 4);
@@ -3273,6 +3301,13 @@ static void gen_stswi(DisasContext *ctx)
 TCGv t0;
 TCGv_i32 t1, t2;
 int nb = NB(ctx->opcode);
+#if !defined(CONFIG_USER_ONLY)
+if (ctx->le_mode) {
+gen_exception_err(ctx, POWERPC_EXCP_ALIGN, POWERPC_EXCP_ALIGN_LE);
+return;
+}
+#endif
+
 gen_set_access_type(ctx, ACCESS_INT);
 /* NIP cannot be restored if the memory exception comes from an helper */
 gen_update_nip(ctx, ctx->nip - 4);
@@ -3293,6 +3328,13 @@ static void gen_stswx(DisasContext *ctx)
 {
 TCGv t0;
 TCGv_i32 t1, t2;
+#if !defined(CONFIG_USER_ONLY)
+if (ctx->le_mode) {
+gen_exception_err(ctx, POWERPC_EXCP_ALIGN, POWERPC_EXCP_ALIGN_LE);
+return;
+}
+#endif
+
 gen_set_access_type(ctx, ACCESS_INT);
 /* NIP cannot be restored if the memory exception comes from an helper */
 gen_update_nip(ctx, ctx->nip - 4);
-- 
2.5.5

Re: [Qemu-devel] [PATCH v2] spapr: compute interrupt vector address from LPCR

2016-03-30 Thread Greg Kurz

On Wed, 30 Mar 2016 17:38:34 +0200
Cédric Le Goater  wrote:

> This address is changed by the linux kernel using the H_SET_MODE hcall
> and needs to be restored when migrating a spapr VM running in
> TCG. This can be done using the AIL bits from the LPCR register.
> 
> The patch introduces a helper routine cpu_ppc_get_excp_prefix() which
> returns the effective address offset of the interrupt handler
> depending on the LPCR_AIL bits. The same helper can be used in the
> H_SET_MODE hcall, which lets us remove the H_SET_MODE_ADDR_TRANS_*
> defines.
> 
> Signed-off-by: Cédric Le Goater 
> ---
> 

Sorry I hit the send button too quickly... Just a nit but my Reviewed-by stands.

>  Changes since v1:
> 
>  - moved helper routine under target-ppc/
>  - moved the restore of excp_prefix under cpu_post_load()
> 
>  hw/ppc/spapr_hcall.c|   13 ++---
>  include/hw/ppc/spapr.h  |5 -
>  target-ppc/cpu.h|9 +
>  target-ppc/machine.c|   20 +++-
>  target-ppc/translate_init.c |   14 ++
>  5 files changed, 44 insertions(+), 17 deletions(-)
> 
> Index: qemu-dgibson-for-2.6.git/hw/ppc/spapr_hcall.c
> ===
> --- qemu-dgibson-for-2.6.git.orig/hw/ppc/spapr_hcall.c
> +++ qemu-dgibson-for-2.6.git/hw/ppc/spapr_hcall.c
> @@ -835,17 +835,8 @@ static target_ulong h_set_mode_resource_
>  return H_P4;
>  }
> 
> -switch (mflags) {
> -case H_SET_MODE_ADDR_TRANS_NONE:
> -prefix = 0;
> -break;
> -case H_SET_MODE_ADDR_TRANS_0001_8000:
> -prefix = 0x18000;
> -break;
> -case H_SET_MODE_ADDR_TRANS_C000___4000:
> -prefix = 0xC0004000ULL;
> -break;
> -default:
> +prefix = cpu_ppc_get_excp_prefix(mflags);
> +if (prefix == (target_ulong) -1ULL) {

+if (prefix == (target_ulong) (-1ULL)) {

to make ./scripts/checkpatch.pl happy :)

>  return H_UNSUPPORTED_FLAG;
>  }
> 
> Index: qemu-dgibson-for-2.6.git/target-ppc/machine.c
> ===
> --- qemu-dgibson-for-2.6.git.orig/target-ppc/machine.c
> +++ qemu-dgibson-for-2.6.git/target-ppc/machine.c
> @@ -156,12 +156,26 @@ static void cpu_pre_save(void *opaque)
>  }
>  }
> 
> +
> +static int cpu_post_load_excp_prefix(CPUPPCState *env)
> +{
> +int ail = (env->spr[SPR_LPCR] & LPCR_AIL) >> LPCR_AIL_SHIFT;
> +target_ulong prefix = cpu_ppc_get_excp_prefix(ail);
> +
> +if (prefix == (target_ulong) -1ULL) {
> +return -EINVAL;
> +}
> +env->excp_prefix = prefix;
> +return 0;
> +}
> +
>  static int cpu_post_load(void *opaque, int version_id)
>  {
>  PowerPCCPU *cpu = opaque;
>  CPUPPCState *env = >env;
>  int i;
>  target_ulong msr;
> +int ret = 0;
> 
>  /*
>   * We always ignore the source PVR. The user or management
> @@ -201,7 +215,11 @@ static int cpu_post_load(void *opaque, i
> 
>  hreg_compute_mem_idx(env);
> 
> -return 0;
> +if (env->spr[SPR_LPCR] & LPCR_AIL) {
> +ret = cpu_post_load_excp_prefix(env);
> +}
> +
> +return ret;
>  }
> 
>  static bool fpu_needed(void *opaque)
> Index: qemu-dgibson-for-2.6.git/target-ppc/translate_init.c
> ===
> --- qemu-dgibson-for-2.6.git.orig/target-ppc/translate_init.c
> +++ qemu-dgibson-for-2.6.git/target-ppc/translate_init.c
> @@ -8522,6 +8522,20 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu)
>  }
>  }
> 
> +target_ulong cpu_ppc_get_excp_prefix(target_ulong ail)
> +{
> +switch (ail) {
> +case AIL_NONE:
> +return 0;
> +case AIL_0001_8000:
> +return 0x18000;
> +case AIL_C000___4000:
> +return 0xC0004000ULL;
> +default:
> +return (target_ulong) -1ULL;
> +}
> +}
> +
>  #endif /* !defined(CONFIG_USER_ONLY) */
> 
>  #endif /* defined (TARGET_PPC64) */
> Index: qemu-dgibson-for-2.6.git/target-ppc/cpu.h
> ===
> --- qemu-dgibson-for-2.6.git.orig/target-ppc/cpu.h
> +++ qemu-dgibson-for-2.6.git/target-ppc/cpu.h
> @@ -1269,6 +1269,7 @@ void store_booke_tsr (CPUPPCState *env,
>  void ppc_tlb_invalidate_all (CPUPPCState *env);
>  void ppc_tlb_invalidate_one (CPUPPCState *env, target_ulong addr);
>  void cpu_ppc_set_papr(PowerPCCPU *cpu);
> +target_ulong cpu_ppc_get_excp_prefix(target_ulong ail);
>  #endif
>  #endif
> 
> @@ -2277,6 +2278,14 @@ enum {
>  HMER_XSCOM_STATUS_LSH   = (63 - 23),
>  };
> 
> +/* Alternate Interrupt Location (AIL) */
> +enum {
> +AIL_NONE= 0,
> +AIL_RESERVED= 1,
> +AIL_0001_8000   = 2,
> +AIL_C000___4000 = 3,
> +};
> +
>  
> /*/
> 
>  static inline target_ulong cpu_read_xer(CPUPPCState

Re: [Qemu-devel] [PATCH v2] spapr: compute interrupt vector address from LPCR

2016-03-30 Thread Greg Kurz

On Wed, 30 Mar 2016 17:38:34 +0200
Cédric Le Goater  wrote:

> This address is changed by the linux kernel using the H_SET_MODE hcall
> and needs to be restored when migrating a spapr VM running in
> TCG. This can be done using the AIL bits from the LPCR register.
> 
> The patch introduces a helper routine cpu_ppc_get_excp_prefix() which
> returns the effective address offset of the interrupt handler
> depending on the LPCR_AIL bits. The same helper can be used in the
> H_SET_MODE hcall, which lets us remove the H_SET_MODE_ADDR_TRANS_*
> defines.
> 


... because these H_SET_MODE_ADDR_TRANS_* defines, which refer to the values
0, 1, 2 and 3 for the mflag argument to the H_SET_MODE hypercall as described
in section 14.5.4.3.8 of PAPR, are actually values for bits 39:40 (AIL) of the
LPCR register, as described in section 2.2 of PowerISA.


> Signed-off-by: Cédric Le Goater 
> ---
> 

Reviewed-by: Greg Kurz 

>  Changes since v1:
> 
>  - moved helper routine under target-ppc/
>  - moved the restore of excp_prefix under cpu_post_load()
> 
>  hw/ppc/spapr_hcall.c|   13 ++---
>  include/hw/ppc/spapr.h  |5 -
>  target-ppc/cpu.h|9 +
>  target-ppc/machine.c|   20 +++-
>  target-ppc/translate_init.c |   14 ++
>  5 files changed, 44 insertions(+), 17 deletions(-)
> 
> Index: qemu-dgibson-for-2.6.git/hw/ppc/spapr_hcall.c
> ===
> --- qemu-dgibson-for-2.6.git.orig/hw/ppc/spapr_hcall.c
> +++ qemu-dgibson-for-2.6.git/hw/ppc/spapr_hcall.c
> @@ -835,17 +835,8 @@ static target_ulong h_set_mode_resource_
>  return H_P4;
>  }
> 
> -switch (mflags) {
> -case H_SET_MODE_ADDR_TRANS_NONE:
> -prefix = 0;
> -break;
> -case H_SET_MODE_ADDR_TRANS_0001_8000:
> -prefix = 0x18000;
> -break;
> -case H_SET_MODE_ADDR_TRANS_C000___4000:
> -prefix = 0xC0004000ULL;
> -break;
> -default:
> +prefix = cpu_ppc_get_excp_prefix(mflags);
> +if (prefix == (target_ulong) -1ULL) {
>  return H_UNSUPPORTED_FLAG;
>  }
> 
> Index: qemu-dgibson-for-2.6.git/target-ppc/machine.c
> ===
> --- qemu-dgibson-for-2.6.git.orig/target-ppc/machine.c
> +++ qemu-dgibson-for-2.6.git/target-ppc/machine.c
> @@ -156,12 +156,26 @@ static void cpu_pre_save(void *opaque)
>  }
>  }
> 
> +
> +static int cpu_post_load_excp_prefix(CPUPPCState *env)
> +{
> +int ail = (env->spr[SPR_LPCR] & LPCR_AIL) >> LPCR_AIL_SHIFT;
> +target_ulong prefix = cpu_ppc_get_excp_prefix(ail);
> +
> +if (prefix == (target_ulong) -1ULL) {
> +return -EINVAL;
> +}
> +env->excp_prefix = prefix;
> +return 0;
> +}
> +
>  static int cpu_post_load(void *opaque, int version_id)
>  {
>  PowerPCCPU *cpu = opaque;
>  CPUPPCState *env = >env;
>  int i;
>  target_ulong msr;
> +int ret = 0;
> 
>  /*
>   * We always ignore the source PVR. The user or management
> @@ -201,7 +215,11 @@ static int cpu_post_load(void *opaque, i
> 
>  hreg_compute_mem_idx(env);
> 
> -return 0;
> +if (env->spr[SPR_LPCR] & LPCR_AIL) {
> +ret = cpu_post_load_excp_prefix(env);
> +}
> +
> +return ret;
>  }
> 
>  static bool fpu_needed(void *opaque)
> Index: qemu-dgibson-for-2.6.git/target-ppc/translate_init.c
> ===
> --- qemu-dgibson-for-2.6.git.orig/target-ppc/translate_init.c
> +++ qemu-dgibson-for-2.6.git/target-ppc/translate_init.c
> @@ -8522,6 +8522,20 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu)
>  }
>  }
> 
> +target_ulong cpu_ppc_get_excp_prefix(target_ulong ail)
> +{
> +switch (ail) {
> +case AIL_NONE:
> +return 0;
> +case AIL_0001_8000:
> +return 0x18000;
> +case AIL_C000___4000:
> +return 0xC0004000ULL;
> +default:
> +return (target_ulong) -1ULL;
> +}
> +}
> +
>  #endif /* !defined(CONFIG_USER_ONLY) */
> 
>  #endif /* defined (TARGET_PPC64) */
> Index: qemu-dgibson-for-2.6.git/target-ppc/cpu.h
> ===
> --- qemu-dgibson-for-2.6.git.orig/target-ppc/cpu.h
> +++ qemu-dgibson-for-2.6.git/target-ppc/cpu.h
> @@ -1269,6 +1269,7 @@ void store_booke_tsr (CPUPPCState *env,
>  void ppc_tlb_invalidate_all (CPUPPCState *env);
>  void ppc_tlb_invalidate_one (CPUPPCState *env, target_ulong addr);
>  void cpu_ppc_set_papr(PowerPCCPU *cpu);
> +target_ulong cpu_ppc_get_excp_prefix(target_ulong ail);
>  #endif
>  #endif
> 
> @@ -2277,6 +2278,14 @@ enum {
>  HMER_XSCOM_STATUS_LSH   = (63 - 23),
>  };
> 
> +/* Alternate Interrupt Location (AIL) */
> +enum {
> +AIL_NONE= 0,
> +AIL_RESERVED= 1,
> +AIL_0001_8000   = 2,
> +

[Qemu-devel] tcg: reworking tb_invalidated_flag

2016-03-30 Thread Sergey Fedorov

Hi,

This is a follow-up of the discussion [1]. The main focus is to move
towards thread-safe TB invalidation and translation buffer flush.
In addition, we can get more clean, readable and reliable code.

First, I'd like to summarize how 'tb_invalidated_flag' is used.
Basically, 'tb_invalidated_flag' is used to catch two events:
 * some TB has been invalidated by tb_phys_invalidate();
 * the whole translation buffer has been flushed by tb_flush().

This is important because we need to be sure:
 * the last executed TB can be safely patched by tb_add_jump() to
   directly call the next one in cpu_exec();
 * the original TB should be provided in 'tb->orig_tb' for further
   possible invalidation along with the temporarily generated TB when in
   cpu_exec_nocache().

cpu_exec_nocache() is a simple case because it is not such a hot piece
of code as cpu_exec() and it is only used in system mode which is not
currently used from multiple threads (though it could be in MTTCG).
Supposing it is safe to invalidate an already invalidated TB, it just
needs to check if tb_flush() has been called during tb_gen_code(). It
could be done by resetting 'tb_invalidated_flag' before calling
tb_gen_code() and checking it afterwards. 'tb_lock' should be held. To
make sure this doesn't affect other code relying on a value of the flag,
we could just preserve it inside 'tb_lock' critical section.

cpu_exec() case is a bit more subtle. Regarding tb_phys_invalidate(), it
shouldn't be harmful if an invalidated TB get patched because it is not
going to be executed anymore. It may only be a concern of efficiency.
Though it doesn't seem to happen frequently.

As of catching tb_flush() in cpu_exec() there have been three approaches
proposed.

The first approach is to get rid of 'tb_invalidated_flag' and use
'tb_flush_count'. Capture 'tb_flush_count' inside 'tb_lock' critical
section of cpu_exec() and compare it on each execution loop iteration
before trying to do tb_add_jump(). This would be simple and clear but it
would cost an extra load from a shared variable 'tb_flush_count' each
time we go over the execution loop.

The second approach is to make 'tb_invalidated_flag' per-CPU. This
would be conceptually similar to what we have, but would give us thread
safety. With this approach, we need to be careful to correctly clear and
set the flag.

The third approach is to mark each individual TB as valid/invalid. This
is what Emilio has in his MTTCG series [2]. Following this approach, we
could have very clean code with no extra overhead on the hot path.
However, it would require to mark all TBs as invalid on tb_flush().
Given that tb_flush() is rare, it shouldn't be a significant overhead.
Also, there could be several options how to mark TB valid/invalid:
a dedicated flag could be introduced or some invalid value of
pc/cs_base/flags could be used.

So the question is, what would be the most appropriate solution?

[1] http://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg06180.html
[2] http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02582.html

Re: [Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.

2016-03-30 Thread Michael S. Tsirkin

On Wed, Mar 30, 2016 at 06:14:05PM +0300, Ilya Maximets wrote:
> Currently QEMU always crashes in following scenario (assume that
> vhost-user application is Open vSwitch with 'dpdkvhostuser' port):

In fact, wouldn't the right thing to do be stopping the VM?

> 1. # Check that link in guest is in a normal state.
>[guest]# ip link show eth0
> 2: eth0:  mtu 1500 qdisc <...>
> link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> 
> 2. # Kill vhost-user application (using SIGSEGV just to be sure).
>[host]# kill -11 `pgrep ovs-vswitchd`
> 
> 3. # Check that guest still thinks that all is good.
>[guest]# ip link show eth0
> 2: eth0:  mtu 1500 qdisc <...>
> link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> 
> 4. # Try to unbind virtio-pci driver and observe QEMU crash.
>[guest]# echo -n ':00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
> qemu: Failed to read msg header. Read 0 instead of 12. Original request 
> 11.
> qemu: Failed to read msg header. Read 0 instead of 12. Original request 
> 11.
> qemu: Failed to read msg header. Read 0 instead of 12. Original request 
> 11.
> 
> Child terminated with signal = 0xb (SIGSEGV)
> GDBserver exiting
> 
> After the applying of this patch-set:
> 
> 4. # Try to unbind virtio-pci driver. Unbind works fine with only few errors.
>[guest]# echo -n ':00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
> qemu: Failed to read msg header. Read 0 instead of 12. Original request 
> 11.
> qemu: Failed to read msg header. Read 0 instead of 12. Original request 
> 11.
> 
> 5. # Bind virtio-pci driver back.
>[guest]# echo -n ':00:01.0' > /sys/bus/pci/drivers/virtio-pci/bind
> 
> 6. # Check link in guest. No crashes here, link in DOWN state.
>[guest]# ip link show eth0
> 7: eth0:  mtu 1500 qdisc <...>
> link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
> 
> 7. QEMU may be gracefully restarted to restore communication after restarting
>of vhost-user application.
> 
> Ilya Maximets (4):
>   vhost-user: fix crash on socket disconnect.
>   vhost: prevent double stop of vhost_net device.
>   vhost: check for vhost_net device validity.
>   net: notify about link status only if it changed.
> 
>  hw/net/vhost_net.c | 48 
> ++
>  hw/net/virtio-net.c| 33 ++---
>  include/hw/virtio/virtio-net.h |  1 +
>  include/net/vhost-user.h   |  1 +
>  include/net/vhost_net.h|  1 +
>  net/filter.c   |  1 +
>  net/net.c  |  7 +++---
>  net/vhost-user.c   | 43 +
>  8 files changed, 111 insertions(+), 24 deletions(-)
> 
> -- 
> 2.5.0

Re: [Qemu-devel] [PATCH 1/2] softfloat: Enable run-time-configurable meaning of signaling NaN bit

2016-03-30 Thread Aleksandar Markovic

Thank you very much. I will let you know if any change regarding tricore 
appears in later versions of this patch.

From: qemu-devel-bounces+aleksandar.markovic=imgtec@nongnu.org 
[qemu-devel-bounces+aleksandar.markovic=imgtec@nongnu.org] on behalf of 
Bastian Koppelmann [kbast...@mail.uni-paderborn.de]
Sent: Tuesday, March 29, 2016 5:50 AM
To: Aleksandar Markovic; qemu-devel@nongnu.org
Cc: peter.mayd...@linaro.org; ehabk...@redhat.com; pro...@gmail.com; 
mark.cave-ayl...@ilande.co.uk; ag...@suse.de; Petar Jovanovic; 
blauwir...@gmail.com; jcmvb...@gmail.com; Miodrag Dinic; qemu-...@nongnu.org; 
qemu-...@nongnu.org; edgar.igles...@gmail.com; pbonz...@redhat.com; 
g...@mprc.pku.edu.cn; Leon Alrae; afaer...@suse.de; aurel...@aurel32.net; 
r...@twiddle.net
Subject: Re: [Qemu-devel] [PATCH 1/2] softfloat: Enable run-time-configurable 
meaning of signaling NaN bit

On 03/25/2016 01:50 PM, Aleksandar Markovic wrote:
> From: Aleksandar Markovic 
>
> This patch enables SoftFloat library to be configured at run-time in
> relation to the meaning of signaling NaN bit.
>
> Background:
>
> In floating-point calculations, there is a need for denoting undefined or
> unrepresentable values. This is achieved by defining certain floating-point
> numerical values to be NaNs (which stands for "not a number"). For additional
> reasons, virtually all modern floating-point unit implementations use two
> kinds of NaNs: quiet and signaling. The binary representations of these two
> kinds of NaNs, as a rule, differ only in one bit (it is traditionally,
> the first bit of mantissa).
>
> Up to 2008, standards for floating-point did not specify all details about
> binary representation of NaNs. More specifically, the meaning of the bit
> that is used for distinguishing between signaling and quiet NaNs was not
> strictly prescribed. (IEEE 754-2008 was the first floating-point standard
> that defined that meaning clearly, see [1], p. 35) As a result, different
> platforms took different approaches, and this presented certain challenge
> in emulators like QEMU.
>
> Mips architecture represent the most complex case among QEMU-supported
> architectures regarding signaling NaN bit. Up to Release 6 of Mips
> architecture, "1" in signaling NaN bit denoted signaling NaN, which is
> opposite to IEEE 754-2008 standard. From Release 6 on, Mips architecture
> adopted IEEE standard prescription, and "0" denotes signaling NaN. On top of
> that, Mips architecture for SIMD (also known as MSA, or vector instructions)
> also specifies signaling bit in accordance to IEEE standard. MSA unit can be
> implemented with both pre-Release 6 and Release 6 main processor units.
>
> QEMU uses SoftFloat library to implement various floating-point-related
> instructions on all platforms. The current implementation allows for defining
> meaning of signaling NaN bit during build time, and is implemented via
> preprocessor macro called SNAN_BIT_IS_ONE.
>
> The change in this patch enables SoftFloat library to be configured in
> run-time. This configuration is meant to occur during CPU initialization,
> when it is definitely known what desired behavior for particular CPU
> (or any additional FPUs) is.
>
> The change is implemented so that it is consistent with existing
> implementation of similar cases. This means that structure float_status is
> used for passing the information about desired signaling NaN bit during each
> invocation of SoftFloat functions. The additional field in float_status is
> called snan_bit_is_one, which supersedes macro SNAN_BIT_IS_ONE.
>
> Further break down of changes:
>
>   (for the sake of brevity, a placeholder XXX is used below and it might
>   mean float16, float32, float64, floatx80, or float128)
>
>   1) Added field snan_bit_is_one to the structure float_status,
>  and the correspondent setter function set_snan_bit_is_one().
>
>   2) SoftFloat library constants XXX_default_nan converted to functions
>  XXX_default_nan(float_status*). This is necessary since they are
>  dependant on signaling bit meaning.
>
>   3) Added a float_status* argument to SoftFloat library functions
>  XXX_is_quiet_nan(XXX a_), XXX_is_signaling_nan(XXX a_),
>  XXX_maybe_silence_nan(XXX a_).
>
>   4) Updated code in all architectures to reflect changes in SoftFloat
>  library. This change is twofolds: it includes modification of SoftFloat
>  library functions invocations, and addition of invocations of function
>  set_snan_bit_is_one() during CPU initialization, with arguments that
>  are appropriate for each architecture.
>
> IMPORTANT:
>
> This change is not meant to create any change in emulator behavior or
> functionality on any platform. It just provides the means for SoftFloat
> library to be used in a more flexible way - in other words, it will just
> prepare SoftFloat library for usage related to Mips platform and its
> specifics regarding

Re: [Qemu-devel] [PATCH] fix missing event_notifier_init_fd() function on Mac OS X

2016-03-30 Thread Paolo Bonzini



On 30/03/2016 18:35, Programmingkid wrote:
> Remove macro that prevents event_notifier_init_fd() function from being 
> compiled on Mac OS X.
> 
> This patch fixes this error:
> 
> Undefined symbols for architecture x86_64:
>   "_event_notifier_init_fd", referenced from:
>   _process_msg in ivshmem.o
> ld: symbol(s) not found for architecture x86_64
> collect2: error: ld returned 1 exit status
> make[1]: *** [qemu-system-ppc] Error 1
> make: *** [subdir-ppc-softmmu] Error 2
> 
> 
> Signed-off-by: John Arbuckle 

This is intentional, this feature of ivshmem.o could never work on OS X.
 I am not sure that failing the build is intentional.  Markus, any clue?

Paolo

[Qemu-devel] [PATCH] ui/cocoa.m: Add support for cdr files

2016-03-30 Thread Programmingkid

Allow the user to select .cdr files in the file open dialog. 

Signed-off-by: John Arbuckle 
---
 ui/cocoa.m | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/ui/cocoa.m b/ui/cocoa.m
index 6914714..60a7c07 100644
--- a/ui/cocoa.m
+++ b/ui/cocoa.m
@@ -874,7 +874,8 @@ QemuCocoaView *cocoaView;
 
 // set the supported image file types that can be opened
 supportedImageFileTypes = [NSArray arrayWithObjects: @"img", @"iso", 
@"dmg",
- @"qcow", @"qcow2", @"cloop", @"vmdk", nil];
+ @"qcow", @"qcow2", @"cloop", @"vmdk", @"cdr",
+  nil];
 }
 return self;
 }
-- 
2.7.2

[Qemu-devel] [PATCH] linux-user,target-ppc: fix use of MSR_LE

2016-03-30 Thread Laurent Vivier

setup_frame()/setup_rt_frame()/restore_user_regs() are using
MSR_LE as the similar kernel functions do: as a bitmask.

But in QEMU, MSR_LE is a bit position, so change this
accordingly.

The previous code was doing nothing as MSR_LE is 0,
and "env->msr &= ~MSR_LE" doesn't change the value of msr.

And yes, a user process can change its endianness,
see linux kernel commit:

fab5db9 [PATCH] powerpc: Implement support for setting little-endian mode 
via prctl

and prctl(2): PR_SET_ENDIAN, PR_GET_ENDIAN

Signed-off-by: Laurent Vivier 
---
 linux-user/signal.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index a233bab..f1b597b 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -4588,7 +4588,7 @@ static void restore_user_regs(CPUPPCState *env,
 
 /* If doing signal return, restore the previous little-endian mode.  */
 if (sig)
-env->msr = (env->msr & ~MSR_LE) | (msr & MSR_LE);
+env->msr = (env->msr & ~(1ull << MSR_LE)) | (msr & (1ull << MSR_LE));
 
 /* Restore Altivec registers if necessary.  */
 if (env->insns_flags & PPC_ALTIVEC) {
@@ -4703,7 +4703,7 @@ static void setup_frame(int sig, struct target_sigaction 
*ka,
 #endif
 
 /* Signal handlers are entered in big-endian mode.  */
-env->msr &= ~MSR_LE;
+env->msr &= ~(1ull << MSR_LE);
 
 unlock_user_struct(frame, frame_addr, 1);
 return;
@@ -4798,7 +4798,7 @@ static void setup_rt_frame(int sig, struct 
target_sigaction *ka,
 #endif
 
 /* Signal handlers are entered in big-endian mode.  */
-env->msr &= ~MSR_LE;
+env->msr &= ~(1ull << MSR_LE);
 
 unlock_user_struct(rt_sf, rt_sf_addr, 1);
 return;
-- 
2.5.0

[Qemu-devel] [PATCH] fix missing event_notifier_init_fd() function on Mac OS X

2016-03-30 Thread Programmingkid

Remove macro that prevents event_notifier_init_fd() function from being 
compiled on Mac OS X.

This patch fixes this error:

Undefined symbols for architecture x86_64:
  "_event_notifier_init_fd", referenced from:
  _process_msg in ivshmem.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status
make[1]: *** [qemu-system-ppc] Error 1
make: *** [subdir-ppc-softmmu] Error 2


Signed-off-by: John Arbuckle 
---
 util/event_notifier-posix.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/util/event_notifier-posix.c b/util/event_notifier-posix.c
index e150301..11cef87 100644
--- a/util/event_notifier-posix.c
+++ b/util/event_notifier-posix.c
@@ -21,7 +21,6 @@
 #include 
 #endif
 
-#ifdef CONFIG_EVENTFD
 /*
  * Initialize @e with existing file descriptor @fd.
  * @fd must be a genuine eventfd object, emulation with pipe won't do.
@@ -31,7 +30,6 @@ void event_notifier_init_fd(EventNotifier *e, int fd)
 e->rfd = fd;
 e->wfd = fd;
 }
-#endif
 
 int event_notifier_init(EventNotifier *e, int active)
 {
-- 
2.7.2

[Qemu-devel] [PULL v2 00/17] target-arm queue

2016-03-30 Thread Peter Maydell

Last lot of target-arm patches just sneaking under the wire:
 * m25p80 (should be safe enough since not really used by anything)
 * virt power key bugfix
 * query-gic-capabilities QMP command

changes v1->v2: new monitor.c is config-softmmu only.

thanks
-- PMM


The following changes since commit 489ef4c810033e63af570c8a430af8b9858bfa5f:

  Merge remote-tracking branch 'remotes/lalrae/tags/mips-20160329-2' into 
staging (2016-03-30 16:06:45 +0100)

are available in the git repository at:


  git://git.linaro.org/people/pmaydell/qemu-arm.git 
tags/pull-target-arm-20160330-1

for you to fetch changes up to db31e49a565fc4c165fd98201721b313c3412c1f:

  arm: implement query-gic-capabilities (2016-03-30 17:27:24 +0100)


target-arm queue:
 * virt: fix the virtual power button by adding a modelled
   "key press for 100ms" device
 * various improvements to m25p80 flash devices
 * implement new QMP query-gic-capability command to let the
   management layer know what versions of GIC we support


Marcin Krzeminski (11):
  block: m25p80: Removed unused variable
  block: m25p80: RESET_ENABLE and RESET_MEMORY commands
  block: m25p80: Widen flags variable
  block: m25p80: Extend address mode
  block: m25p80: 4byte address mode
  block: m25p80: Add configuration registers
  block: m25p80: Dummy cycles for N25Q256/512
  block: m25p80: Fast read and 4bytes commands
  block: m25p80: Implemented FSR register
  block: m25p80: n25q256a/n25q512a models
  block: m25p80: at25128a/at25256a models

Peter Xu (4):
  arm: qmp: add query-gic-capabilities interface
  arm: enhance kvm_arm_create_scratch_host_vcpu
  kvm: add kvm_device_supported() helper function
  arm: implement query-gic-capabilities

Shannon Zhao (2):
  hw/gpio: Add the emulation of gpio_key
  ARM: Virt: Use gpio_key for power button

 default-configs/arm-softmmu.mak |   1 +
 hw/arm/virt.c   |   7 +-
 hw/block/m25p80.c   | 330 +---
 hw/gpio/Makefile.objs   |   1 +
 hw/gpio/gpio_key.c  | 104 +
 include/sysemu/kvm.h|   9 ++
 kvm-all.c   |  15 ++
 monitor.c   |   8 +
 qapi-schema.json|  36 +
 qmp-commands.hx |  27 
 target-arm/Makefile.objs|   2 +-
 target-arm/kvm.c|  14 +-
 target-arm/kvm_arm.h|   7 +-
 target-arm/monitor.c|  84 ++
 14 files changed, 620 insertions(+), 25 deletions(-)
 create mode 100644 hw/gpio/gpio_key.c
 create mode 100644 target-arm/monitor.c

Re: [Qemu-devel] [Qemu-Devel] [PATCH] Changed malloc to g_malloc, free to g_free in linux-user/qemu.h

2016-03-30 Thread haris iqbal

On Wed, Mar 30, 2016 at 7:39 PM, Stefan Hajnoczi  wrote:
> On Thu, Mar 24, 2016 at 12:02:03AM +0530, Md Haris Iqbal wrote:
>> Signed-off-by: Md Haris Iqbal 
>> ---
>>  linux-user/qemu.h | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/linux-user/qemu.h b/linux-user/qemu.h
>> index 26b0ba2..3c3fd15 100644
>> --- a/linux-user/qemu.h
>> +++ b/linux-user/qemu.h
>> @@ -381,7 +381,7 @@ static inline void *lock_user(int type, abi_ulong 
>> guest_addr, long len, int copy
>>  #ifdef DEBUG_REMAP
>>  {
>>  void *addr;
>> -addr = malloc(len);
>> +addr = g_malloc(len);
>>  if (copy)
>>  memcpy(addr, g2h(guest_addr), len);
>>  else
>> @@ -407,7 +407,7 @@ static inline void unlock_user(void *host_ptr, abi_ulong 
>> guest_addr,
>>  return;
>>  if (len > 0)
>>  memcpy(g2h(guest_addr), host_ptr, len);
>> -free(host_ptr);
>> +g_free(host_ptr);
>>  #endif
>>  }
>
> If I understand correctly either lock_user() or lock_user_string() may
> be followed by unlock_user().  If you change unlock_user() to g_free()
> then you also need to change lock_user_string() to g_malloc() to avoid a
> malloc()/g_free() mismatch.

lock_user_string() does not use malloc itself, but calls lock_user()
from itself.



-- 

With regards,

Md Haris Iqbal,
Placement Coordinator, MTech IT
NITK Surathkal,
Contact: +91 8861996962

Re: [Qemu-devel] [PULL 00/17] target-arm queue

2016-03-30 Thread Peter Maydell

On 30 March 2016 at 15:57, Peter Maydell <peter.mayd...@linaro.org> wrote:
>
> Last lot of target-arm patches just sneaking under the wire:
>  * m25p80 (should be safe enough since not really used by anything)
>  * virt power key bugfix
>  * query-gic-capabilities QMP command
>
> thanks
> -- PMM
>
> The following changes since commit b9c27e7ae6fb1387eafe858d8378ff14cd1c5b89:
>
>   Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging 
> (2016-03-30 13:43:05 +0100)
>
> are available in the git repository at:
>
>
>   git://git.linaro.org/people/pmaydell/qemu-arm.git 
> tags/pull-target-arm-20160330
>
> for you to fetch changes up to eefb90314f80c421899f8c42e59df7b80a27c93b:
>
>   arm: implement query-gic-capabilities (2016-03-30 15:48:27 +0100)
>
> 
> target-arm queue:
>  * virt: fix the virtual power button by adding a modelled
>"key press for 100ms" device
>  * various improvements to m25p80 flash devices
>  * implement new QMP query-gic-capability command to let the
>management layer know what versions of GIC we support

Whoops, build failure for the linux-user targets. Fixup patch:

diff --git a/target-arm/Makefile.objs b/target-arm/Makefile.objs
index 334074c..82cbe6b 100644
--- a/target-arm/Makefile.objs
+++ b/target-arm/Makefile.objs
@@ -1,5 +1,5 @@
 obj-y += arm-semi.o
-obj-$(CONFIG_SOFTMMU) += machine.o psci.o arch_dump.o
+obj-$(CONFIG_SOFTMMU) += machine.o psci.o arch_dump.o monitor.o
 obj-$(CONFIG_KVM) += kvm.o
 obj-$(call land,$(CONFIG_KVM),$(call lnot,$(TARGET_AARCH64))) += kvm32.o
 obj-$(call land,$(CONFIG_KVM),$(TARGET_AARCH64)) += kvm64.o
@@ -8,4 +8,4 @@ obj-y += translate.o op_helper.o helper.o cpu.o
 obj-y += neon_helper.o iwmmxt_helper.o
 obj-y += gdbstub.o
 obj-$(TARGET_AARCH64) += cpu64.o translate-a64.o helper-a64.o gdbstub64.o
-obj-y += crypto_helper.o monitor.o
+obj-y += crypto_helper.o

thanks
-- PMM

Re: [Qemu-devel] [PULL v2 00/21] target-mips queue for 2.6

2016-03-30 Thread Peter Maydell

On 30 March 2016 at 09:49, Leon Alrae  wrote:
> Here's version 2 with qapi/error.h includes.
>
> Thanks,
> Leon
>
> Cc: Peter Maydell 
> Cc: Aurelien Jarno 
>
> The following changes since commit 553934db664ecee676650fac0330dceff3531736:
>
>   Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into 
> staging (2016-03-29 19:54:49 +0100)
>
> are available in the git repository at:
>
>   git://github.com/lalrae/qemu.git tags/mips-20160329-2
>
> for you to fetch changes up to f6d4dd810983fdf3d1c9fb81838167efef63d1c8:
>
>   target-mips: add MAAR, MAARI register (2016-03-30 09:14:00 +0100)
>
> 
> MIPS patches 2016-03-29
>
> Changes:
> * add initial MIPS CPS support
> * implement ITU block
> * implement MAAR
>

Applied, thanks.

-- PMM

[Qemu-devel] [Bug 1563931] [NEW] qemu-img should allow resizing image with snapshots

2016-03-30 Thread Zeeshan Ali

Public bug reported:

Currently it's not possible to resize a disk image with qemu-img if
image in question has snapshots associated. I'm not entirely sure this
is technically possible but if it is, it would be really nice to support
that.

$ qemu-img --version
qemu-img version 2.4.1 (qemu-2.4.1-8.fc23), Copyright (c) 2004-2008 Fabrice 
Bellard

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1563931

Title:
  qemu-img should allow resizing image with snapshots

Status in QEMU:
  New

Bug description:
  Currently it's not possible to resize a disk image with qemu-img if
  image in question has snapshots associated. I'm not entirely sure this
  is technically possible but if it is, it would be really nice to
  support that.

  $ qemu-img --version
  qemu-img version 2.4.1 (qemu-2.4.1-8.fc23), Copyright (c) 2004-2008 Fabrice 
Bellard

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1563931/+subscriptions

Re: [Qemu-devel] Help with qapi.py problem

2016-03-30 Thread Programmingkid


On Mar 30, 2016, at 11:51 AM, Eric Blake wrote:

> On 03/30/2016 09:10 AM, Programmingkid wrote:
>> I just git pull'ed today and now have this problem:
>> 
>> .../scripts/qapi.py", line 1726, in maybe_open
>>return open(name, opt)
>> IOError: [Errno 1] Operation not permitted: './qapi-types.h'
> 
> What are the file permissions on ./qapi-types.h?  That's a generated
> file, but if you did something to mess with its permissions so that
> cannot be reopened for overwriting, that would explain the error message.


It looks like I locked the file and forgot about it.

Thank you for your help.

Re: [Qemu-devel] [PATCH v7 3/3] block: enable testing of LUKS driver with block I/O tests

2016-03-30 Thread Max Reitz

On 30.03.2016 15:53, Daniel P. Berrange wrote:
> This adds support for testing the LUKS driver with the block
> I/O test framework.
> 
>cd tests/qemu-io-tests
>./check -luks
> 
> A handful of test cases are modified to work with luks
> 
>  - 004 - whitelist luks format
>  - 012 - use TEST_IMG_FILE instead of TEST_IMG for file ops
>  - 048 - use TEST_IMG_FILE instead of TEST_IMG for file ops.

Well, that doesn't always work now that you kept TEST_IMG_FILE unset for
file-protocol tests.

(Both fail because some command in them cannot find the file ''.)

>  don't assume extended image contents is all zeros,
>  explicitly initialize with zeros
>  Make file size smaller to avoid having to decrypt
>  1 GB of data.
>  - 052 - don't assume initial image contents is all zeros,
>  explicitly initialize with zeros
>  - 100 - don't assume initial image contents is all zeros,
>  explicitly initialize with zeros
> 
> With this patch applied, the results are as follows:
> 
>   Passed: 001 002 003 004 005 008 009 010 011 012 021 032 043
>   047 048 049 052 087 100 134 143
>   Failed: 033 120 140 145
>  Skipped: 007 013 014 015 017 018 019 020 022 023 024 025 026
>   027 028 029 030 031 034 035 036 037 038 039 040 041
>   042 043 044 045 046 047 049 050 051 053 054 055 056
>   057 058 059 060 061 062 063 064 065 066 067 068 069
>   070 071 072 073 074 075 076 077 078 079 080 081 082
>   083 084 085 086 087 088 089 090 091 092 093 094 095
>   096 097 098 099 101 102 103 104 105 107 108 109 110
>   111 112 113 114 115 116 117 118 119 121 122 123 124
>   128 129 130 131 132 133 134 135 136 137 138 139 141
>   142 144 146 148
> 
> The reasons for the failed tests are:
> 
>  - 033 - needs adapting to use image opts syntax with blkdebug
>  and test image in order to correctly set align property
>  - 120 - needs adapting to use correct -drive syntax for luks
>  - 140 - needs adapting to use correct -drive syntax for luks
>  - 145 - needs adapting to use correct -drive syntax for luks
> 
> The vast majority of skipped tests are exercising code that is
> qcow2 specific, though a couple could probably be usefully
> enabled for luks too.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  tests/qemu-iotests/004   |  2 +-
>  tests/qemu-iotests/012   |  2 +-
>  tests/qemu-iotests/048   | 22 +++---
>  tests/qemu-iotests/048.out   |  6 --
>  tests/qemu-iotests/052   |  4 
>  tests/qemu-iotests/052.out   |  4 
>  tests/qemu-iotests/100   |  7 +++
>  tests/qemu-iotests/100.out   | 14 ++
>  tests/qemu-iotests/common|  7 +++
>  tests/qemu-iotests/common.rc |  3 +++
>  10 files changed, 60 insertions(+), 11 deletions(-)
> 

[...]

> diff --git a/tests/qemu-iotests/048 b/tests/qemu-iotests/048
> index e1eeac2..0e58364 100755
> --- a/tests/qemu-iotests/048
> +++ b/tests/qemu-iotests/048

[...]

> @@ -46,25 +46,33 @@ _compare()
>  . ./common.filter
>  . ./common.pattern
>  
> -_supported_fmt raw qcow qcow2 qed
> +_supported_fmt raw qcow qcow2 qed luks
>  _supported_proto file
>  _supported_os Linux
>  
>  # Setup test basic parameters
>  TEST_IMG2=$TEST_IMG.2
> +TEST_IMG_FILE2=$TEST_IMG_FILE.2
>  CLUSTER_SIZE=4096
> -size=1024M
> +size=128M
>  
>  _make_test_img $size
>  io_pattern write 524288 $CLUSTER_SIZE $CLUSTER_SIZE 4 45
>  
>  # Compare identical images
> -cp "$TEST_IMG" "${TEST_IMG2}"
> +cp "$TEST_IMG_FILE" "${TEST_IMG_FILE2}"
>  _compare
>  _compare -q
>  
>  # Compare images with different size
> -$QEMU_IMG resize -f $IMGFMT "$TEST_IMG" +512M
> +if [ "$IMGOPTSSYNTAX" = "true" ]; then
> +$QEMU_IMG resize $QEMU_IMG_EXTRA_ARGS "$TEST_IMG" +32M
> +else
> +$QEMU_IMG resize -f $IMGFMT "$TEST_IMG" +32M

If you don't decide to pass QEMU_IMG_EXTRA_ARGS in the
_qemu_img_wrapper, maybe you could set it to "-f $IMGFMT" if
$IMGOPTSSYNTAX is false.

(Keeping this as it is won't stop me from giving an R-b, though.)

Max

> +fi
> +# Ensure extended space is zero-initialized
> +$QEMU_IO "$TEST_IMG" -c "write -P 0 $size 32M" | _filter_qemu_io
> +
>  _compare
>  _compare -s
>  




signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2] Use frame->retcode instead of frame address for alpha target restorer

2016-03-30 Thread Laurent Vivier



Le 30/03/2016 17:42, cheng...@emindsoft.com.cn a écrit :
> From: Chen Gang 
> 
> The restorer needs the return code address which is frame->retcode, not
> frame itself.
> 
> Signed-off-by: Chen Gang 

Reviewed-by: Laurent Vivier

Re: [Qemu-devel] [PATCH v2] Use frame->retcode instead of frame address for alpha target restorer

2016-03-30 Thread Richard Henderson

On 03/30/2016 08:42 AM, cheng...@emindsoft.com.cn wrote:
> From: Chen Gang 
> 
> The restorer needs the return code address which is frame->retcode, not
> frame itself.
> 
> Signed-off-by: Chen Gang 
> ---
>  linux-user/signal.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [PATCH 1/5] slirp: Allow disabling IPv4 or IPv6

2016-03-30 Thread Eric Blake

On 03/30/2016 09:36 AM, Samuel Thibault wrote:
> Thomas Huth, on Wed 30 Mar 2016 17:29:12 +0200, wrote:
>> On 30.03.2016 17:13, Samuel Thibault wrote:
>>> Thomas Huth, on Wed 30 Mar 2016 17:06:51 +0200, wrote:
 The "restrict" option is listed with "=on|off" here, that's why I
 thought it should be there for "ipv4" and "ipv6", too. Which boolean
 options are missing the "=on|off" ?
>>>
>>> All the ipv4 and ipv6 options in the same file.
>>
>> Ugh, ok, now I see it ... most of these are specified in qemu-options.hx
>> without the "=on|off", only for the "-netdev l2tpv3" it is specified as
>> "ipv6=on/off" (with slash instead of the pipe character!) ... what a
>> mess... not sure which is the best way to go here, so maybe keep it
>> without the "=on|off" for now so that it is consistent with most of the
>> other options?
> 
> Eric, Markus, what do you prefer?

No strong preference. Libvirt doesn't parse the command line --help
output, precisely because (as you've noticed) it is a big inconsistent
mess already.  Pick one, and that's fine; and it's a bonus if you want
to do a followup cleanup for consistency, but I won't lose any sleep if
you don't do a followup.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v7 2/3] block: add support for encryption secrets in block I/O tests

2016-03-30 Thread Daniel P. Berrange

On Wed, Mar 30, 2016 at 05:51:16PM +0200, Max Reitz wrote:
> On 30.03.2016 15:53, Daniel P. Berrange wrote:
> > The LUKS block driver tests will require the ability to specify
> > encryption secrets with block devices. This requires using the
> > --object argument to qemu-img/qemu-io to create a 'secret'
> > object.
> > 
> > When the IMGKEYSECRET env variable is set, it provides the
> > password to be associated with a secret called 'keysec0'
> > 
> > Signed-off-by: Daniel P. Berrange 
> > ---
> >  tests/qemu-iotests/common|  1 +
> >  tests/qemu-iotests/common.config |  6 ++
> >  tests/qemu-iotests/common.filter |  3 ++-
> >  tests/qemu-iotests/common.rc | 16 +---
> >  4 files changed, 22 insertions(+), 4 deletions(-)
> 
> Is there a reason why you didn't implement the same logic in
> _qemu_img_wrapper as in _qemu_io_wrapper?

Mostly because the --image-opts syntax isn't used with
the 'create' command for qemu-img. I guess I could have
trie to detect which command was being run and set the
args accordingly.

> This works, but it appears a bit overcomplicated to me.
> 
> Reviewed-by: Max Reitz 

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [PATCH v7 1/3] block: add support for --image-opts in block I/O tests

2016-03-30 Thread Daniel P. Berrange

On Wed, Mar 30, 2016 at 05:44:41PM +0200, Max Reitz wrote:
> On 30.03.2016 15:53, Daniel P. Berrange wrote:
> > Currently all block tests use the traditional syntax for images
> > just specifying a filename. To support the LUKS driver without
> > resorting to JSON, the tests need to be able to use the new
> > --image-opts argument to qemu-img and qemu-io.
> > 
> > This introduces a new env variable IMGOPTSSYNTAX. If this is
> > set to 'true', then qemu-img/qemu-io should use --image-opts.
> > 
> > Signed-off-by: Daniel P. Berrange 
> > ---
> >  tests/qemu-iotests/039.out   | 20 +++---
> >  tests/qemu-iotests/common|  7 -
> >  tests/qemu-iotests/common.config | 15 +--
> >  tests/qemu-iotests/common.rc | 57 
> > +---
> >  4 files changed, 71 insertions(+), 28 deletions(-)
> > 
> > diff --git a/tests/qemu-iotests/039.out b/tests/qemu-iotests/039.out
> > index 32c8846..c6e0ac2 100644
> > --- a/tests/qemu-iotests/039.out
> > +++ b/tests/qemu-iotests/039.out
> > @@ -12,9 +12,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
> >  wrote 512/512 bytes at offset 0
> >  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >  ./common.config: Killed  ( if [ "${VALGRIND_QEMU}" == "y" 
> > ]; then
> > -exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> > "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> > +exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> > "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
> >  else
> > -exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> > +exec "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
> >  fi )
> >  incompatible_features 0x1
> >  ERROR cluster 5 refcount=0 reference=1
> > @@ -51,9 +51,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
> >  wrote 512/512 bytes at offset 0
> >  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >  ./common.config: Killed  ( if [ "${VALGRIND_QEMU}" == "y" 
> > ]; then
> > -exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> > "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> > +exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> > "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
> >  else
> > -exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> > +exec "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
> >  fi )
> >  incompatible_features 0x1
> >  ERROR cluster 5 refcount=0 reference=1
> > @@ -69,9 +69,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
> >  wrote 512/512 bytes at offset 0
> >  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >  ./common.config: Killed  ( if [ "${VALGRIND_QEMU}" == "y" 
> > ]; then
> > -exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> > "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> > +exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> > "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
> >  else
> > -exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> > +exec "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
> >  fi )
> >  incompatible_features 0x0
> >  No errors were found on the image.
> > @@ -92,9 +92,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
> >  wrote 512/512 bytes at offset 0
> >  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >  ./common.config: Killed  ( if [ "${VALGRIND_QEMU}" == "y" 
> > ]; then
> > -exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> > "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> > +exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> > "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
> >  else
> > -exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> > +exec "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
> >  fi )
> >  incompatible_features 0x1
> >  ERROR cluster 5 refcount=0 reference=1
> > @@ -106,9 +106,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT 
> > size=134217728
> >  wrote 512/512 bytes at offset 0
> >  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >  ./common.config: Killed  ( if [ "${VALGRIND_QEMU}" == "y" 
> > ]; then
> > -exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> > "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> > +exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> > "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
> >  else
> > -exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> > +exec "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
> >  fi )
> >  incompatible_features 0x0
> >  No errors were found on the image.
> 
> I think 061 and 137 need the same treatment.
> 
> Looks good apart from that.

FWIW, I didn't attempt to update those because they're qcow2 only. I
only tried to update tests which were (potentially) runnable with
the luks block driver.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o-

[Qemu-devel] [PATCH v2] Use frame->retcode instead of frame address for alpha target restorer

2016-03-30 Thread chengang

From: Chen Gang 

The restorer needs the return code address which is frame->retcode, not
frame itself.

Signed-off-by: Chen Gang 
---
 linux-user/signal.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index e487f9e..34367ce 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -5396,7 +5396,7 @@ static void setup_frame(int sig, struct target_sigaction 
*ka,
>retcode[1]);
 __put_user(INSN_CALLSYS, >retcode[2]);
 /* imb() */
-r26 = frame_addr;
+r26 = frame_addr + offsetof(struct target_sigframe, retcode);
 }
 
 unlock_user_struct(frame, frame_addr, 1);
@@ -5455,7 +5455,7 @@ static void setup_rt_frame(int sig, struct 
target_sigaction *ka,
>retcode[1]);
 __put_user(INSN_CALLSYS, >retcode[2]);
 /* imb(); */
-r26 = frame_addr;
+r26 = frame_addr + offsetof(struct target_rt_sigframe, retcode);
 }
 
 if (err) {
-- 
1.9.3

Re: [Qemu-devel] Help with qapi.py problem

2016-03-30 Thread Eric Blake

On 03/30/2016 09:10 AM, Programmingkid wrote:
> I just git pull'ed today and now have this problem:
> 
> .../scripts/qapi.py", line 1726, in maybe_open
> return open(name, opt)
> IOError: [Errno 1] Operation not permitted: './qapi-types.h'

What are the file permissions on ./qapi-types.h?  That's a generated
file, but if you did something to mess with its permissions so that
cannot be reopened for overwriting, that would explain the error message.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v7 2/3] block: add support for encryption secrets in block I/O tests

2016-03-30 Thread Max Reitz

On 30.03.2016 15:53, Daniel P. Berrange wrote:
> The LUKS block driver tests will require the ability to specify
> encryption secrets with block devices. This requires using the
> --object argument to qemu-img/qemu-io to create a 'secret'
> object.
> 
> When the IMGKEYSECRET env variable is set, it provides the
> password to be associated with a secret called 'keysec0'
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  tests/qemu-iotests/common|  1 +
>  tests/qemu-iotests/common.config |  6 ++
>  tests/qemu-iotests/common.filter |  3 ++-
>  tests/qemu-iotests/common.rc | 16 +---
>  4 files changed, 22 insertions(+), 4 deletions(-)

Is there a reason why you didn't implement the same logic in
_qemu_img_wrapper as in _qemu_io_wrapper?

This works, but it appears a bit overcomplicated to me.

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v7 1/3] block: add support for --image-opts in block I/O tests

2016-03-30 Thread Max Reitz

On 30.03.2016 15:53, Daniel P. Berrange wrote:
> Currently all block tests use the traditional syntax for images
> just specifying a filename. To support the LUKS driver without
> resorting to JSON, the tests need to be able to use the new
> --image-opts argument to qemu-img and qemu-io.
> 
> This introduces a new env variable IMGOPTSSYNTAX. If this is
> set to 'true', then qemu-img/qemu-io should use --image-opts.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  tests/qemu-iotests/039.out   | 20 +++---
>  tests/qemu-iotests/common|  7 -
>  tests/qemu-iotests/common.config | 15 +--
>  tests/qemu-iotests/common.rc | 57 
> +---
>  4 files changed, 71 insertions(+), 28 deletions(-)
> 
> diff --git a/tests/qemu-iotests/039.out b/tests/qemu-iotests/039.out
> index 32c8846..c6e0ac2 100644
> --- a/tests/qemu-iotests/039.out
> +++ b/tests/qemu-iotests/039.out
> @@ -12,9 +12,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
>  wrote 512/512 bytes at offset 0
>  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  ./common.config: Killed  ( if [ "${VALGRIND_QEMU}" == "y" ]; 
> then
> -exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> +exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
>  else
> -exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> +exec "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
>  fi )
>  incompatible_features 0x1
>  ERROR cluster 5 refcount=0 reference=1
> @@ -51,9 +51,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
>  wrote 512/512 bytes at offset 0
>  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  ./common.config: Killed  ( if [ "${VALGRIND_QEMU}" == "y" ]; 
> then
> -exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> +exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
>  else
> -exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> +exec "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
>  fi )
>  incompatible_features 0x1
>  ERROR cluster 5 refcount=0 reference=1
> @@ -69,9 +69,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
>  wrote 512/512 bytes at offset 0
>  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  ./common.config: Killed  ( if [ "${VALGRIND_QEMU}" == "y" ]; 
> then
> -exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> +exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
>  else
> -exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> +exec "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
>  fi )
>  incompatible_features 0x0
>  No errors were found on the image.
> @@ -92,9 +92,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
>  wrote 512/512 bytes at offset 0
>  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  ./common.config: Killed  ( if [ "${VALGRIND_QEMU}" == "y" ]; 
> then
> -exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> +exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
>  else
> -exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> +exec "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
>  fi )
>  incompatible_features 0x1
>  ERROR cluster 5 refcount=0 reference=1
> @@ -106,9 +106,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
>  wrote 512/512 bytes at offset 0
>  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  ./common.config: Killed  ( if [ "${VALGRIND_QEMU}" == "y" ]; 
> then
> -exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> +exec valgrind --log-file="${VALGRIND_LOGFILE}" --error-exitcode=99 
> "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
>  else
> -exec "$QEMU_IO_PROG" $QEMU_IO_OPTIONS "$@";
> +exec "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@";
>  fi )
>  incompatible_features 0x0
>  No errors were found on the image.

I think 061 and 137 need the same treatment.

Looks good apart from that.

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 6/9] virtio-blk: use aio handler for data plane

2016-03-30 Thread Paolo Bonzini



On 30/03/2016 17:25, Cornelia Huck wrote:
>> > 
>> > -/* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
>> > - * dataplane here instead of waiting for .set_status().
>> > - */
>> > -if (s->dataplane && !s->dataplane_started) {
>> > -virtio_blk_data_plane_start(s->dataplane);
>> > -return;
>> > -}
>> > -
>> >  assert(atomic_fetch_inc(>reentrancy_test) == 0);
> Hm, based on wrong branch?
> 

Indeed I had the testing patch in there for, well, testing.  I'll repost
with improved commit messages and the assertion removed.

Paolo

[Qemu-devel] [PATCH v2] spapr: compute interrupt vector address from LPCR

2016-03-30 Thread Cédric Le Goater

This address is changed by the linux kernel using the H_SET_MODE hcall
and needs to be restored when migrating a spapr VM running in
TCG. This can be done using the AIL bits from the LPCR register.

The patch introduces a helper routine cpu_ppc_get_excp_prefix() which
returns the effective address offset of the interrupt handler
depending on the LPCR_AIL bits. The same helper can be used in the
H_SET_MODE hcall, which lets us remove the H_SET_MODE_ADDR_TRANS_*
defines.

Signed-off-by: Cédric Le Goater 
---

 Changes since v1:

 - moved helper routine under target-ppc/
 - moved the restore of excp_prefix under cpu_post_load()

 hw/ppc/spapr_hcall.c|   13 ++---
 include/hw/ppc/spapr.h  |5 -
 target-ppc/cpu.h|9 +
 target-ppc/machine.c|   20 +++-
 target-ppc/translate_init.c |   14 ++
 5 files changed, 44 insertions(+), 17 deletions(-)

Index: qemu-dgibson-for-2.6.git/hw/ppc/spapr_hcall.c
===
--- qemu-dgibson-for-2.6.git.orig/hw/ppc/spapr_hcall.c
+++ qemu-dgibson-for-2.6.git/hw/ppc/spapr_hcall.c
@@ -835,17 +835,8 @@ static target_ulong h_set_mode_resource_
 return H_P4;
 }
 
-switch (mflags) {
-case H_SET_MODE_ADDR_TRANS_NONE:
-prefix = 0;
-break;
-case H_SET_MODE_ADDR_TRANS_0001_8000:
-prefix = 0x18000;
-break;
-case H_SET_MODE_ADDR_TRANS_C000___4000:
-prefix = 0xC0004000ULL;
-break;
-default:
+prefix = cpu_ppc_get_excp_prefix(mflags);
+if (prefix == (target_ulong) -1ULL) {
 return H_UNSUPPORTED_FLAG;
 }
 
Index: qemu-dgibson-for-2.6.git/target-ppc/machine.c
===
--- qemu-dgibson-for-2.6.git.orig/target-ppc/machine.c
+++ qemu-dgibson-for-2.6.git/target-ppc/machine.c
@@ -156,12 +156,26 @@ static void cpu_pre_save(void *opaque)
 }
 }
 
+
+static int cpu_post_load_excp_prefix(CPUPPCState *env)
+{
+int ail = (env->spr[SPR_LPCR] & LPCR_AIL) >> LPCR_AIL_SHIFT;
+target_ulong prefix = cpu_ppc_get_excp_prefix(ail);
+
+if (prefix == (target_ulong) -1ULL) {
+return -EINVAL;
+}
+env->excp_prefix = prefix;
+return 0;
+}
+
 static int cpu_post_load(void *opaque, int version_id)
 {
 PowerPCCPU *cpu = opaque;
 CPUPPCState *env = >env;
 int i;
 target_ulong msr;
+int ret = 0;
 
 /*
  * We always ignore the source PVR. The user or management
@@ -201,7 +215,11 @@ static int cpu_post_load(void *opaque, i
 
 hreg_compute_mem_idx(env);
 
-return 0;
+if (env->spr[SPR_LPCR] & LPCR_AIL) {
+ret = cpu_post_load_excp_prefix(env);
+}
+
+return ret;
 }
 
 static bool fpu_needed(void *opaque)
Index: qemu-dgibson-for-2.6.git/target-ppc/translate_init.c
===
--- qemu-dgibson-for-2.6.git.orig/target-ppc/translate_init.c
+++ qemu-dgibson-for-2.6.git/target-ppc/translate_init.c
@@ -8522,6 +8522,20 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu)
 }
 }
 
+target_ulong cpu_ppc_get_excp_prefix(target_ulong ail)
+{
+switch (ail) {
+case AIL_NONE:
+return 0;
+case AIL_0001_8000:
+return 0x18000;
+case AIL_C000___4000:
+return 0xC0004000ULL;
+default:
+return (target_ulong) -1ULL;
+}
+}
+
 #endif /* !defined(CONFIG_USER_ONLY) */
 
 #endif /* defined (TARGET_PPC64) */
Index: qemu-dgibson-for-2.6.git/target-ppc/cpu.h
===
--- qemu-dgibson-for-2.6.git.orig/target-ppc/cpu.h
+++ qemu-dgibson-for-2.6.git/target-ppc/cpu.h
@@ -1269,6 +1269,7 @@ void store_booke_tsr (CPUPPCState *env,
 void ppc_tlb_invalidate_all (CPUPPCState *env);
 void ppc_tlb_invalidate_one (CPUPPCState *env, target_ulong addr);
 void cpu_ppc_set_papr(PowerPCCPU *cpu);
+target_ulong cpu_ppc_get_excp_prefix(target_ulong ail);
 #endif
 #endif
 
@@ -2277,6 +2278,14 @@ enum {
 HMER_XSCOM_STATUS_LSH   = (63 - 23),
 };
 
+/* Alternate Interrupt Location (AIL) */
+enum {
+AIL_NONE= 0,
+AIL_RESERVED= 1,
+AIL_0001_8000   = 2,
+AIL_C000___4000 = 3,
+};
+
 /*/
 
 static inline target_ulong cpu_read_xer(CPUPPCState *env)
Index: qemu-dgibson-for-2.6.git/include/hw/ppc/spapr.h
===
--- qemu-dgibson-for-2.6.git.orig/include/hw/ppc/spapr.h
+++ qemu-dgibson-for-2.6.git/include/hw/ppc/spapr.h
@@ -204,11 +204,6 @@ struct sPAPRMachineState {
 #define H_SET_MODE_ENDIAN_BIG0
 #define H_SET_MODE_ENDIAN_LITTLE 1
 
-/* Flags for H_SET_MODE_RESOURCE_ADDR_TRANS_MODE */
-#define H_SET_MODE_ADDR_TRANS_NONE  0
-#define H_SET_MODE_ADDR_TRANS_0001_8000 2

Re: [Qemu-devel] [PATCH 9/9] virtio: remove starting/stopping checks

2016-03-30 Thread Cornelia Huck

On Wed, 30 Mar 2016 14:48:08 +0200
Paolo Bonzini  wrote:

> Reentrancy cannot happen while the BQL is being held.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/block/dataplane/virtio-blk.c | 12 ++--
>  hw/scsi/virtio-scsi-dataplane.c |  9 +
>  include/hw/virtio/virtio-scsi.h |  2 --
>  3 files changed, 3 insertions(+), 20 deletions(-)
> 

Less flags, yeah!

Reviewed-by: Cornelia Huck

Re: [Qemu-devel] [PATCH 1/5] slirp: Allow disabling IPv4 or IPv6

2016-03-30 Thread Samuel Thibault

Thomas Huth, on Wed 30 Mar 2016 17:29:12 +0200, wrote:
> On 30.03.2016 17:13, Samuel Thibault wrote:
> > Thomas Huth, on Wed 30 Mar 2016 17:06:51 +0200, wrote:
> >> The "restrict" option is listed with "=on|off" here, that's why I
> >> thought it should be there for "ipv4" and "ipv6", too. Which boolean
> >> options are missing the "=on|off" ?
> > 
> > All the ipv4 and ipv6 options in the same file.
> 
> Ugh, ok, now I see it ... most of these are specified in qemu-options.hx
> without the "=on|off", only for the "-netdev l2tpv3" it is specified as
> "ipv6=on/off" (with slash instead of the pipe character!) ... what a
> mess... not sure which is the best way to go here, so maybe keep it
> without the "=on|off" for now so that it is consistent with most of the
> other options?

Eric, Markus, what do you prefer?

Samuel

Re: [Qemu-devel] [PATCH 8/9] virtio: merge virtio_queue_aio_set_host_notifier_handler with virtio_queue_set_aio

2016-03-30 Thread Cornelia Huck

On Wed, 30 Mar 2016 14:48:07 +0200
Paolo Bonzini  wrote:

> Eliminating the reentrancy is actually a nice thing that we can do
> with the API that Michael proposed, so let's make it first class.
> This also hides the complex assign/set_handler conventions from
> callers of virtio_queue_aio_set_host_notifier_handler, and fixes
> a possible race that could happen when passing assign=false to
> virtio_queue_aio_set_host_notifier_handler.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/block/dataplane/virtio-blk.c |  7 +++
>  hw/scsi/virtio-scsi-dataplane.c | 12 
>  hw/virtio/virtio.c  | 19 ---
>  include/hw/virtio/virtio.h  |  6 ++
>  4 files changed, 13 insertions(+), 31 deletions(-)

Reviewed-by: Cornelia Huck

Re: [Qemu-devel] [PATCH 7/9] virtio-scsi: use aio handler for data plane

2016-03-30 Thread Cornelia Huck

On Wed, 30 Mar 2016 14:48:06 +0200
Paolo Bonzini  wrote:

> In addition to handling IO in vcpu thread and in io thread, dataplane
> introduces yet another mode: handling it by aio.
> 
> This reuses the same handler as previous modes, which triggers races as
> these were not designed to be reentrant.
> 
> Use a separate handler just for aio, and disable regular handlers when
> dataplane is active.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/scsi/virtio-scsi-dataplane.c | 43 ---
>  hw/scsi/virtio-scsi.c   | 65 
> -
>  include/hw/virtio/virtio-scsi.h |  6 ++--
>  3 files changed, 86 insertions(+), 28 deletions(-)
> 

> +static int virtio_scsi_vring_init(VirtIOSCSI *s, VirtQueue *vq, int n,
> +   void (*fn)(VirtIODevice *vdev, VirtQueue *vq))

This function has been sadly misnamed since dataplane does not set up
its own vrings anymore. Maybe it should be called
virtio_scsi_vq_notifier_init() or so.

>  {
>  BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s)));
>  VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);

Reviewed-by: Cornelia Huck

Re: [Qemu-devel] [PATCH 1/5] slirp: Allow disabling IPv4 or IPv6

2016-03-30 Thread Thomas Huth

On 30.03.2016 17:13, Samuel Thibault wrote:
> Thomas Huth, on Wed 30 Mar 2016 17:06:51 +0200, wrote:
>> The "restrict" option is listed with "=on|off" here, that's why I
>> thought it should be there for "ipv4" and "ipv6", too. Which boolean
>> options are missing the "=on|off" ?
> 
> All the ipv4 and ipv6 options in the same file.

Ugh, ok, now I see it ... most of these are specified in qemu-options.hx
without the "=on|off", only for the "-netdev l2tpv3" it is specified as
"ipv6=on/off" (with slash instead of the pipe character!) ... what a
mess... not sure which is the best way to go here, so maybe keep it
without the "=on|off" for now so that it is consistent with most of the
other options?

 Thomas

Re: [Qemu-devel] [PATCH 6/9] virtio-blk: use aio handler for data plane

2016-03-30 Thread Cornelia Huck

On Wed, 30 Mar 2016 14:48:05 +0200
Paolo Bonzini  wrote:

> From: "Michael S. Tsirkin" 
> 
> In addition to handling IO in vcpu thread and in io thread, dataplane
> introduces yet another mode: handling it by aio.
> 
> This reuses the same handler as previous modes, which triggers races as
> these were not designed to be reentrant.
> 
> Use a separate handler just for aio, and disable regular handlers when
> dataplane is active.
> 
> Signed-off-by: Michael S. Tsirkin 
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/block/dataplane/virtio-blk.c | 13 +
>  hw/block/virtio-blk.c   | 27 +--
>  include/hw/virtio/virtio-blk.h  |  2 ++
>  3 files changed, 32 insertions(+), 10 deletions(-)
> 

> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 77221c1..47ad9ed 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -577,20 +577,11 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, 
> MultiReqBuffer *mrb)
>  }
>  }
> 
> -static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> +void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
>  {
> -VirtIOBlock *s = VIRTIO_BLK(vdev);
>  VirtIOBlockReq *req;
>  MultiReqBuffer mrb = {};
> 
> -/* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
> - * dataplane here instead of waiting for .set_status().
> - */
> -if (s->dataplane && !s->dataplane_started) {
> -virtio_blk_data_plane_start(s->dataplane);
> -return;
> -}
> -
>  assert(atomic_fetch_inc(>reentrancy_test) == 0);

Hm, based on wrong branch?

>  blk_io_plug(s->blk);
>

[Qemu-devel] [PATCH 2/4] vhost: prevent double stop of vhost_net device.

2016-03-30 Thread Ilya Maximets

There are few control flows in vhost's implementation where
vhost_net_stop() may be re entered several times in a row. This
leads to triggering of assertion while duble freeing of same resources.

For example, if vhost-user socket disconnection occured:
[ cut ---]
[guest]# echo -n ':00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
qemu: /qemu/memory.c:1852: memory_region_del_eventfd:
  Assertion `i != mr->ioeventfd_nb' failed.

Child terminated with signal = 0x6 (SIGABRT)
GDBserver exiting
[ cut ---]
[host]# gdb
Breakpoint 1 in virtio_pci_set_host_notifier_internal()
#0  virtio_pci_set_host_notifier_internal
#1  vhost_dev_disable_notifiers
#2  vhost_net_stop_one
#3  vhost_net_stop (dev=dev@entry=0x13a45f8, ncs=0x1ce89f0, <...>)
#4  virtio_net_vhost_status
#5  virtio_net_set_status
#6  qmp_set_link (<...>, up=up@entry=false, <...>)
#7  net_vhost_user_event (<...>, event=5)
#8  qemu_chr_be_event
#9  tcp_chr_disconnect
#10 tcp_chr_sync_read
#11 qemu_chr_fe_read_all
#12 vhost_user_read
#13 vhost_user_get_vring_base
#14 vhost_virtqueue_stop
#15 vhost_dev_stop
#16 vhost_net_stop_one
#17 vhost_net_stop (dev=dev@entry=0x13a45f8, ncs=0x1ce89f0, <...>)
#18 virtio_net_vhost_status
#19 virtio_net_set_status
#20 virtio_set_status
<...>
[ cut ---]

In example above assertion will fail when control will be brought back
to function at #17 and it will try to free 'eventfd' that was already
freed at call #3.

Fix that by disallowing execution of vhost_net_stop() if we're
already inside of it.

Signed-off-by: Ilya Maximets 
---
 hw/net/vhost_net.c | 8 
 hw/net/virtio-net.c| 1 +
 include/hw/virtio/virtio-net.h | 1 +
 3 files changed, 10 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 55d5456..0996e5d 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -335,8 +335,14 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
 BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(dev)));
 VirtioBusState *vbus = VIRTIO_BUS(qbus);
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
+VirtIONet *n = VIRTIO_NET(dev);
 int i, r;
 
+if (n->vhost_stopping_in_progress) {
+return;
+}
+n->vhost_stopping_in_progress = 1;
+
 for (i = 0; i < total_queues; i++) {
 vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
 put_vhost_net(ncs[i].peer);
@@ -348,6 +354,8 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
 fflush(stderr);
 }
 assert(r >= 0);
+
+n->vhost_stopping_in_progress = 0;
 }
 
 void vhost_net_cleanup(struct vhost_net *net)
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index e5456ef..9b1539c 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -149,6 +149,7 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t 
status)
 }
 
 n->vhost_started = 1;
+n->vhost_stopping_in_progress = 0;
 r = vhost_net_start(vdev, n->nic->ncs, queues);
 if (r < 0) {
 error_report("unable to start vhost net: %d: "
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 0cabdb6..fc07385 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -74,6 +74,7 @@ typedef struct VirtIONet {
 uint8_t nouni;
 uint8_t nobcast;
 uint8_t vhost_started;
+uint8_t vhost_stopping_in_progress;
 struct {
 uint32_t in_use;
 uint32_t first_multi;
-- 
2.5.0

[Qemu-devel] [PATCH 1/4] vhost-user: fix crash on socket disconnect.

2016-03-30 Thread Ilya Maximets

After disconnect of vhost-user socket (Ex.: host application crash)
QEMU will crash with segmentation fault while virtio driver unbinding
inside the guest.

This happens because vhost_net_cleanup() called while stopping
virtqueue #0 because of CHR_EVENT_CLOSED event arriving.
After that stopping of virtqueue #1 crashes because of cleaned and
freed device memory:

[ cut ---]
[guest]# echo -n ':00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.

Child terminated with signal = 0xb (SIGSEGV)
GDBserver exiting
[ cut ---]
[host]# gdb
Breakpoint 1 vhost_user_cleanup(dev) at hw/virtio/vhost-user.c
(gdb) bt
#0  vhost_user_cleanup # executes 'dev->opaque = 0;'
#1  vhost_dev_cleanup
#2  vhost_net_cleanup  # After return from #1: g_free(net)
#3  vhost_user_stop
#4  net_vhost_user_event (<...>, event=5) /* CHR_EVENT_CLOSED */
#5  qemu_chr_be_event (<...>, event=event@entry=5)
#6  tcp_chr_disconnect
#7  tcp_chr_sync_read
#8  qemu_chr_fe_read_all
#9  vhost_user_read
#10 vhost_user_get_vring_base
#11 vhost_virtqueue_stop (vq=0xffe190, idx=0)
#12 vhost_dev_stop
#13 vhost_net_stop_one
#14 vhost_net_stop
#15 virtio_net_vhost_status
#16 virtio_net_set_status
#17 virtio_set_status
<...>
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
qemu_chr_fe_write_all (s=s@entry=0x0, <...>) at qemu-char.c:302
302 if (s->replay && replay_mode == REPLAY_MODE_PLAY) {
(gdb) bt
#0  qemu_chr_fe_write_all (s=s@entry=0x0, <...>)
#1  vhost_user_write
#2  vhost_user_get_vring_base
#3  vhost_virtqueue_stop (vq=0xffe190, idx=1) # device already freed here
#4  vhost_dev_stop
#5  vhost_net_stop_one
#6  vhost_net_stop
#7  virtio_net_vhost_status
#8  virtio_net_set_status
#9  virtio_set_status
<...>
[ cut ---]

Fix that by introducing of reference counter for vhost_net device
and freeing memory only after dropping of last reference.

Signed-off-by: Ilya Maximets 
---
 hw/net/vhost_net.c   | 22 +++---
 hw/net/virtio-net.c  | 32 
 include/net/vhost-user.h |  1 +
 include/net/vhost_net.h  |  1 +
 net/filter.c |  1 +
 net/vhost-user.c | 43 ++-
 6 files changed, 80 insertions(+), 20 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 6e1032f..55d5456 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -296,6 +296,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
 dev->use_guest_notifier_mask = false;
 }
+put_vhost_net(ncs[i].peer);
  }
 
 r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
@@ -306,7 +307,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 
 for (i = 0; i < total_queues; i++) {
 r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev);
-
+put_vhost_net(ncs[i].peer);
 if (r < 0) {
 goto err_start;
 }
@@ -317,6 +318,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 err_start:
 while (--i >= 0) {
 vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
+put_vhost_net(ncs[i].peer);
 }
 e = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
 if (e < 0) {
@@ -337,6 +339,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
 
 for (i = 0; i < total_queues; i++) {
 vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
+put_vhost_net(ncs[i].peer);
 }
 
 r = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
@@ -398,16 +401,25 @@ VHostNetState *get_vhost_net(NetClientState *nc)
 return vhost_net;
 }
 
+void put_vhost_net(NetClientState *nc)
+{
+if (nc && nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
+vhost_user_put_vhost_net(nc);
+}
+}
+
 int vhost_set_vring_enable(NetClientState *nc, int enable)
 {
 VHostNetState *net = get_vhost_net(nc);
 const VhostOps *vhost_ops = net->dev.vhost_ops;
+int res = 0;
 
 if (vhost_ops->vhost_set_vring_enable) {
-return vhost_ops->vhost_set_vring_enable(>dev, enable);
+res = vhost_ops->vhost_set_vring_enable(>dev, enable);
 }
 
-return 0;
+put_vhost_net(nc);
+return res;
 }
 
 #else
@@ -466,6 +478,10 @@ VHostNetState *get_vhost_net(NetClientState *nc)
 return 0;
 }
 
+void put_vhost_net(NetClientState *nc)
+{
+}
+
 int vhost_set_vring_enable(NetClientState *nc, int enable)
 {
 return 0;
diff --git

[Qemu-devel] [PATCH 0/4] Fix QEMU crash on vhost-user socket disconnect.

2016-03-30 Thread Ilya Maximets

Currently QEMU always crashes in following scenario (assume that
vhost-user application is Open vSwitch with 'dpdkvhostuser' port):

1. # Check that link in guest is in a normal state.
   [guest]# ip link show eth0
2: eth0:  mtu 1500 qdisc <...>
link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff

2. # Kill vhost-user application (using SIGSEGV just to be sure).
   [host]# kill -11 `pgrep ovs-vswitchd`

3. # Check that guest still thinks that all is good.
   [guest]# ip link show eth0
2: eth0:  mtu 1500 qdisc <...>
link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff

4. # Try to unbind virtio-pci driver and observe QEMU crash.
   [guest]# echo -n ':00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.

Child terminated with signal = 0xb (SIGSEGV)
GDBserver exiting

After the applying of this patch-set:

4. # Try to unbind virtio-pci driver. Unbind works fine with only few errors.
   [guest]# echo -n ':00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.

5. # Bind virtio-pci driver back.
   [guest]# echo -n ':00:01.0' > /sys/bus/pci/drivers/virtio-pci/bind

6. # Check link in guest. No crashes here, link in DOWN state.
   [guest]# ip link show eth0
7: eth0:  mtu 1500 qdisc <...>
link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff

7. QEMU may be gracefully restarted to restore communication after restarting
   of vhost-user application.

Ilya Maximets (4):
  vhost-user: fix crash on socket disconnect.
  vhost: prevent double stop of vhost_net device.
  vhost: check for vhost_net device validity.
  net: notify about link status only if it changed.

 hw/net/vhost_net.c | 48 ++
 hw/net/virtio-net.c| 33 ++---
 include/hw/virtio/virtio-net.h |  1 +
 include/net/vhost-user.h   |  1 +
 include/net/vhost_net.h|  1 +
 net/filter.c   |  1 +
 net/net.c  |  7 +++---
 net/vhost-user.c   | 43 +
 8 files changed, 111 insertions(+), 24 deletions(-)

-- 
2.5.0

[Qemu-devel] [PATCH 4/4] net: notify about link status only if it changed.

2016-03-30 Thread Ilya Maximets

No need to notify nc->peer if nothing changed.

Signed-off-by: Ilya Maximets 
---
 net/net.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/net.c b/net/net.c
index 3b5a142..6f6a8ce 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1385,9 +1385,10 @@ void qmp_set_link(const char *name, bool up, Error 
**errp)
 for (i = 0; i < queues; i++) {
 ncs[i]->peer->link_down = !up;
 }
-}
-if (nc->peer->info->link_status_changed) {
-nc->peer->info->link_status_changed(nc->peer);
+
+if (nc->peer->info->link_status_changed) {
+nc->peer->info->link_status_changed(nc->peer);
+}
 }
 }
 }
-- 
2.5.0

[Qemu-devel] [PATCH 3/4] vhost: check for vhost_net device validity.

2016-03-30 Thread Ilya Maximets

After successfull destroying of vhost-user device (may be after
virtio driver unbinding after disconnection of vhost-user socket)
QEMU will fail to bind virtio driver again with segmentation fault:

[ cut ---]
# After vhost-user socket diconnection.
[guest]# ip link show eth0
2: eth0:  mtu 1500 qdisc <...>
link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff

[guest]# echo -n ':00:01.0' > /sys/bus/pci/drivers/virtio-pci/unbind
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.
qemu: Failed to read msg header. Read 0 instead of 12. Original request 11.

[guest]# echo -n ':00:01.0' > /sys/bus/pci/drivers/virtio-pci/bind

Child terminated with signal = 0xb (SIGSEGV)
GDBserver exiting
[ cut ---]
[host]# gdb
Program received signal SIGSEGV, Segmentation fault.
vhost_set_vring_enable (<...>) at /hw/net/vhost_net.c:425
425 if (vhost_ops->vhost_set_vring_enable) {
(gdb) bt
#0  vhost_set_vring_enable (nc=0xfff8a0, enable=enable@entry=1)
net = 0x0 # NULL pointer to vhost_net device !
vhost_ops = <(Cannot access memory at address 0x110)>
res = 0
#1  peer_attach
#2  virtio_net_set_queues
#3  virtio_net_set_multiqueue
#4  virtio_net_set_features
#5  virtio_set_features_nocheck
#6  memory_region_write_accessor
#7  access_with_adjusted_size
#8  memory_region_dispatch_write
#9  address_space_write_continue
#10 address_space_write
<...>
[ cut ---]

This happens because of invalid vhost_net device pointer.
Fix that by checking this pointer in all functions before using.

Result:
[ cut ---]
[guest]# echo -n ':00:01.0' > /sys/bus/pci/drivers/virtio-pci/bind

# Check link in guest. No crashes here, link in DOWN state.
[guest]# ip link show eth0
7: eth0:  mtu 1500 qdisc <...>
link/ether 00:16:35:af:aa:4b brd ff:ff:ff:ff:ff:ff
[ cut ---]

Signed-off-by: Ilya Maximets 
---
 hw/net/vhost_net.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 0996e5d..4c3363f 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -202,6 +202,10 @@ static int vhost_net_start_one(struct vhost_net *net,
 struct vhost_vring_file file = { };
 int r;
 
+if (!net) {
+return -1;
+}
+
 net->dev.nvqs = 2;
 net->dev.vqs = net->vqs;
 
@@ -256,6 +260,10 @@ static void vhost_net_stop_one(struct vhost_net *net,
 {
 struct vhost_vring_file file = { .fd = -1 };
 
+if (!net) {
+return;
+}
+
 if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_TAP) {
 for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
 const VhostOps *vhost_ops = net->dev.vhost_ops;
@@ -287,6 +295,9 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 struct vhost_net *net;
 
 net = get_vhost_net(ncs[i].peer);
+if (!net) {
+return -1;
+}
 vhost_net_set_vq_index(net, i * 2);
 
 /* Suppress the masking guest notifiers on vhost user
@@ -419,9 +430,14 @@ void put_vhost_net(NetClientState *nc)
 int vhost_set_vring_enable(NetClientState *nc, int enable)
 {
 VHostNetState *net = get_vhost_net(nc);
-const VhostOps *vhost_ops = net->dev.vhost_ops;
+const VhostOps *vhost_ops;
 int res = 0;
 
+if (!net) {
+return 0;
+}
+
+vhost_ops = net->dev.vhost_ops;
 if (vhost_ops->vhost_set_vring_enable) {
 res = vhost_ops->vhost_set_vring_enable(>dev, enable);
 }
-- 
2.5.0

Re: [Qemu-devel] [PATCH 1/5] slirp: Allow disabling IPv4 or IPv6

2016-03-30 Thread Samuel Thibault

Thomas Huth, on Wed 30 Mar 2016 17:06:51 +0200, wrote:
> The "restrict" option is listed with "=on|off" here, that's why I
> thought it should be there for "ipv4" and "ipv6", too. Which boolean
> options are missing the "=on|off" ?

All the ipv4 and ipv6 options in the same file.

Samuel

[Qemu-devel] Help with qapi.py problem

2016-03-30 Thread Programmingkid

I just git pull'ed today and now have this problem:

.../scripts/qapi.py", line 1726, in maybe_open
return open(name, opt)
IOError: [Errno 1] Operation not permitted: './qapi-types.h'

This is the version of python I am using:

>>> import sys
>>> print (sys.version)
2.6.1 (r261:67515, Jun 24 2010, 21:47:49) 
[GCC 4.2.1 (Apple Inc. build 5646)]

Re: [Qemu-devel] [PATCH 5/9] virtio: add aio handler

2016-03-30 Thread Paolo Bonzini



On 30/03/2016 16:55, Cornelia Huck wrote:
>> >  if (assign && set_handler) {
>> >  aio_set_event_notifier(ctx, >host_notifier, true,
>> > -   virtio_queue_host_notifier_read);
>> > +   virtio_queue_host_notifier_aio_read);
>> >  } else {
>> >  aio_set_event_notifier(ctx, >host_notifier, true, NULL);
>> >  }
>> >  if (!assign) {
>> >  /* Test and clear notifier before after disabling event,
>> >   * in case poll callback didn't have time to run. */
>> > -virtio_queue_host_notifier_read(>host_notifier);
>> > +virtio_queue_host_notifier_aio_read(>host_notifier);
> Is this function ever called with assign==false anymore after patch 1?

Patch 8 provides the answer (which is no :)).

Paolo

Re: [Qemu-devel] [PATCH 1/5] slirp: Allow disabling IPv4 or IPv6

2016-03-30 Thread Thomas Huth

On 30.03.2016 17:04, Samuel Thibault wrote:
> Hello,
> 
> Thomas Huth, on Wed 30 Mar 2016 10:38:46 +0200, wrote:
>>> -"-netdev 
>>> user,id=str[,net=addr[/mask]][,host=addr][,ipv6-net=addr[/int]]\n"
>>> -" 
>>> [,ipv6-host=addr][,restrict=on|off][,hostname=host][,dhcpstart=addr]\n"
>>> +"-netdev user,id=str[,ipv4][,net=addr[/mask]][,host=addr]\n"
>>> +" [,ipv6][,ipv6-net=addr[/int]][,ipv6-host=addr]\n"
>>> +" [,restrict=on|off][,hostname=host][,dhcpstart=addr]\n"
>>>  " [,dns=addr][,ipv6-dns=addr][,dnssearch=domain][,tftp=dir]\n"
>>>  " [,bootfile=f][,hostfwd=rule][,guestfwd=rule]"
>>
>> Shouldn't that rather be "[,ipv4=on|off]" and "[,ipv6=on|off]" for a
>> boolean value?
> 
> Well, just like the rest of these ipv4/ipv6 options, they do accept
> =on|off, but it's not documented in the qemu-options.hx file.  Should we
> fix them all?

The "restrict" option is listed with "=on|off" here, that's why I
thought it should be there for "ipv4" and "ipv6", too. Which boolean
options are missing the "=on|off" ?

 Thomas

Re: [Qemu-devel] [PATCH v12 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-03-30 Thread Max Reitz

On 30.03.2016 13:39, Alberto Garcia wrote:
> On Tue 29 Mar 2016 05:51:22 PM CEST, Max Reitz wrote:
>>> It sounds like the argument here, and in Max's thread on
>>> query-block-node-tree, is that we DO have cases where order matters, and
>>> so we need a way for the hot-add operation to explicitly specify where
>>> in the list a child is inserted (whether it is being inserted as the new
>>> primary image, or explicitly as the last resort, or somewhere in the
>>> middle).  An optional parameter, that defaults to appending, may be ok,
>>> but we definitely need to consider how the order of children is affected
>>> by hot-add.
>>
>> However, the order should be queriable after the fact, and there are
>> three ways I see to accomplish this:
>>
>> (1) Make this information queriable as driver-specific BDS information.
>> I personally don't like it very much, but it would be fine.
>> (2) Implement query-block-node-tree, make the order of child nodes
>> significant and thus represent the FIFO order there. I don't like
>> this because it would mean returning two orders through that child
>> node list: One is the numeric order (children.0, children.1, ...)
>> and another is the FIFO order, which are not necessarily equal.
>> (3) Fix FIFO order to the child name (its role). I'm very much in favor
>> of this.
>>
>> While I don't have good arguments against (1), I think I have good
>> arguments for (3) instead: It just doesn't make sense to have a numeric
>> order of children if this order doesn't mean anything; especially if you
>> suddenly do need the list of child nodes to be ordered. To me, it
>> doesn't make any sense to introduce a new hidden order which takes
>> precedence over this obvious user-visible order.
> 
> I'm not sure if I understand correctly what you mean in (3). The
> user-visible FIFO order is the one specified when the Quorum is created:
> 
> children.0.file.filename=hd0.qcow2,
> children.1.file.filename=hd1.qcow2,
> ...
> 
> Would you then call those BDS children.0, children.1, etc

They are already called that way; it's not their node name but their
"child role" name.

>   and make those
> names be the ones that actually define how they are ordered internally?

Yes, that's what I meant.

> I also have another (not directly related) question: why not simply use
> the node name when removing children? I understood that the idea was
> that it's possible to have the same node attached twice to the same
> Quorum, but can you actually do that? And what's the use case?

What I like about using the child role name is that it automatically
prevents you from specifying a node that is not a child of the given parent.

Which makes me notice that it might be a good idea to require the user
to specify the child's role when adding a new child. In this version of
this series (where only quorum is supported), the children are just
inserted in numerical order (first free slot is taken first), but maybe
the user wants to insert them in a different order.

For quorum, this is basically irrelevant if the order doesn't mean
anything (which I don't like), but it may be relevant for other block
drivers.

And the "filling up quorum's children" topic then makes me notice that
(x-)blockdev-change should probably be transaction-able (so you can
restructure the whole BDS graph in a single transaction), but that's
something we can care about later on.

Max



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PULL 03/17] block: m25p80: Removed unused variable

2016-03-30 Thread Peter Maydell

From: Marcin Krzeminski 

Signed-off-by: Marcin Krzeminski 
Reviewed-by: Peter Crosthwaite 
Message-id: 1458719789-29868-2-git-send-email-marcin.krzemin...@nokia.com
Signed-off-by: Peter Maydell 
---
 hw/block/m25p80.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index de24f42..124 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -246,8 +246,6 @@ typedef enum {
 typedef struct Flash {
 SSISlave parent_obj;
 
-uint32_t r;
-
 BlockBackend *blk;
 
 uint8_t *storage;
-- 
1.9.1

1 2 3 >

1 - 100 of 234 matches

Mail list logo