date:20141130

Re: [Qemu-devel] master: intermittent acpi-test failures

2014-11-30 Thread Michael S. Tsirkin

On Sat, Nov 29, 2014 at 05:39:01PM +0000, Peter Maydell wrote:
> On 29 November 2014 at 17:36, Michael S. Tsirkin wrote:
> > On Fri, Nov 28, 2014 at 01:34:33PM +0000, Peter Maydell wrote:
> >> These failures are back after a long period of not
> >> being a problem :-(
> 
> > My guess is VM fails to boot from disk for some reason.
> > Could you trigger a screenshot after this happens?
> 
> Sure, if you can provide instructions (this is all from
> "make check" so there's no display by default and
> extracting a standalone qemu command line from "make
> check" is pretty tedious IME).
> 
> -- PMM

It's probably easiest to simply drop -nographic
from test code to run with a display.

To trigger a screenshot, just give
screendump /path/to/file
on hmp.

-- 
MST

Re: [Qemu-devel] master: intermittent acpi-test failures

2014-11-30 Thread Michael S. Tsirkin

On Sun, Nov 30, 2014 at 05:12:55PM +0200, Michael S. Tsirkin wrote:
> On Sat, Nov 29, 2014 at 05:39:01PM +0000, Peter Maydell wrote:
> > On 29 November 2014 at 17:36, Michael S. Tsirkin wrote:
> > > On Fri, Nov 28, 2014 at 01:34:33PM +0000, Peter Maydell wrote:
> > >> These failures are back after a long period of not
> > >> being a problem :-(
> > 
> > > My guess is VM fails to boot from disk for some reason.
> > > Could you trigger a screenshot after this happens?
> > 
> > Sure, if you can provide instructions (this is all from
> > "make check" so there's no display by default and
> > extracting a standalone qemu command line from "make
> > check" is pretty tedious IME).
> > 
> > -- PMM
> 
> It's probably easiest to simply drop -nographic
> from test code to run with a display.
> 
> To trigger a screenshot, just give
> screendump /path/to/file
> on hmp.

Another idea is to configure debugging in seabios.

> -- 
> MST

Re: [Qemu-devel] [PATCH RFC for-2.2] virtio-blk: force 1st s/g to match header

2014-11-30 Thread Michael S. Tsirkin

On Fri, Nov 28, 2014 at 04:14:35PM +0000, Peter Maydell wrote:
> On 28 November 2014 at 11:43, Stefan Hajnoczi wrote:
> > Right, the test case explicitly tests different descriptor layouts,
> > even though virtio-blk-pci does not set the ANY_LAYOUT feature bit.
> >
> > Either the test case needs to check ANY_LAYOUT before using the
> > 2-descriptor layout or it needs to expect QEMU to refuse (in this case
> > exit(1), which is not very graceful).
> >
> > The quick fix is to skip the 2-descriptor layout tests and re-enable
> > them once virtio-blk actually supports ANY_LAYOUT.  Any objections?
> 
> So what do we want to do with this for 2.2? We have I think
> two choices:
>  (1) say that this isn't causing problems in practice, and defer all
>  this to 2.3
>  (2) add something like this patch plus fix the 'make check' tests
>  (but turning "maybe something misbehaves" into "qemu definitely
>  blows up and exits" doesn't seem like a great improvement to me)
> 
> I started looking at virtio-blk initially because I wasn't sure
> if we should fix the virtio-net issue in the core virtio code.
> But since we've decided not to do that, whether virtio-blk's
> problems are release-blockers or not is something that we can
> decide on their own merits.
> 
> My current thought is that we don't need to address this for 2.2;
> is there something I'm missing that means we shouldn't defer to 2.3?
> 
> thanks
> -- PMM

The result of this is a host mapping leak.
What effect does this have? Can this DoS the host?
If not, I agree.
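
The check Stefan describes (test ANY_LAYOUT before using the 2-descriptor layout) can be sketched as follows. This is a minimal illustration, not the qtest code: the enum and helper names are hypothetical, and only the feature bit number (VIRTIO_F_ANY_LAYOUT is bit 27) comes from the virtio specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit number of VIRTIO_F_ANY_LAYOUT in the virtio feature bitmap. */
#define VIRTIO_F_ANY_LAYOUT 27

/* Hypothetical names for the layouts a test might exercise. */
enum req_layout {
    LAYOUT_HEADER_SEPARATE,  /* request header alone in the first descriptor */
    LAYOUT_TWO_DESCRIPTORS   /* header shares a descriptor with other data */
};

/* True if the given feature bit is set in the negotiated feature word. */
static bool has_feature(uint64_t features, unsigned bit)
{
    return (features >> bit) & 1;
}

/*
 * The header-separate layout is always acceptable; any other framing is
 * only legal once the device has offered ANY_LAYOUT.
 */
static bool layout_allowed(uint64_t features, enum req_layout layout)
{
    return layout == LAYOUT_HEADER_SEPARATE ||
           has_feature(features, VIRTIO_F_ANY_LAYOUT);
}
```

A test would call layout_allowed() with the feature word read from the device and skip the 2-descriptor cases when it returns false.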

[Qemu-devel] [kernel PATCH v2 0/2] devicetree: document ARM bindings for QEMU's Firmware Config interface

2014-11-30 Thread Laszlo Ersek

V2 seeks to address comments raised in the v1 review. Changes are broken
out per patch, as git notes.

Thanks
Laszlo

Laszlo Ersek (2):
  devicetree: document the "qemu" and "virtio" vendor prefixes
  devicetree: document ARM bindings for QEMU's Firmware Config interface

 Documentation/devicetree/bindings/arm/fw-cfg.txt   | 57 ++
 .../devicetree/bindings/vendor-prefixes.txt|  2 +
 2 files changed, 59 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/fw-cfg.txt

-- 
1.8.3.1

[Qemu-devel] [kernel PATCH v2 1/2] devicetree: document the "qemu" and "virtio" vendor prefixes

2014-11-30 Thread Laszlo Ersek

The QEMU open source machine emulator and virtualizer presents firmware
and operating systems running in virtual machines ("guests") with purely
virtual hardware (i.e., hardware that has never existed in physical form).
Since QEMU exposes some of these devices in a DTB, it makes sense to
define "qemu" and "virtio" as vendor prefixes.

The qemu definition is from [1], revision 4451 (22:24, 25 November 2014).

The virtio definition is composed from [2] and [3].

[1] http://wiki.qemu.org/Main_Page
[2] http://docs.oasis-open.org/virtio/virtio/v1.0/csprd01/virtio-v1.0-csprd01.html
[3] http://en.wikipedia.org/wiki/OASIS_%28organization%29

Suggested-by: Mark Rutland 
Suggested-by: Arnd Bergmann 
Signed-off-by: Laszlo Ersek 
---

Notes:
v2:
- new in v2 [Mark Rutland, Arnd Bergmann]

 Documentation/devicetree/bindings/vendor-prefixes.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index a344ec2..df095c1 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -119,6 +119,7 @@ pixcir  PIXCIR MICROELECTRONICS Co., Ltd
 powervr PowerVR (deprecated, use img)
 qca    Qualcomm Atheros, Inc.
 qcom   Qualcomm Technologies, Inc
+qemu   QEMU, a generic and open source machine emulator and virtualizer
 qnap   QNAP Systems, Inc.
 radxa  Radxa
 raidsonic  RaidSonic Technology GmbH
@@ -159,6 +160,7 @@ usi Universal Scientific Industrial Co., Ltd.
 v3 V3 Semiconductor
 variscite  Variscite Ltd.
 via    VIA Technologies, Inc.
+virtio Virtual I/O Device Specification, developed by the OASIS consortium
 voipac Voipac Technologies s.r.o.
 winbond Winbond Electronics corp.
 wlf    Wolfson Microelectronics
-- 
1.8.3.1

[Qemu-devel] [kernel PATCH v2 2/2] devicetree: document ARM bindings for QEMU's Firmware Config interface

2014-11-30 Thread Laszlo Ersek

Peter Maydell suggested that we describe new devices / DTB nodes in the
kernel Documentation tree that we expose to arm "virt" guests in QEMU.

Although the kernel is not required to access the fw_cfg interface,
"Documentation/devicetree/bindings/arm" is probably the best central spot
to keep the fw_cfg description in.

Suggested-by: Peter Maydell 
Signed-off-by: Laszlo Ersek 
---

Notes:
v2:
- more info on what the fw_cfg device is used for, versioning, blobs etc
  [Mark Rutland]
- drop generic statements about DTB [Mark Rutland]
- drop uint64_t language [Mark Rutland]
- cover both registers with one contiguous region, of size 0x1000 [Mark
  Rutland, Arnd Bergmann]
- specify "qemu,fw-cfg-mmio" for the "compatible" property [Mark
  Rutland, Arnd Bergmann]
- reorder DTS snippet so that "compatible" comes first [Mark Rutland]

 Documentation/devicetree/bindings/arm/fw-cfg.txt | 57 
 1 file changed, 57 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/fw-cfg.txt

diff --git a/Documentation/devicetree/bindings/arm/fw-cfg.txt b/Documentation/devicetree/bindings/arm/fw-cfg.txt
new file mode 100644
index 0000000..15e2ae3
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/fw-cfg.txt
@@ -0,0 +1,57 @@
+* QEMU Firmware Configuration bindings for ARM
+
+QEMU's arm-softmmu and aarch64-softmmu emulation / virtualization targets
+provide the following Firmware Configuration interface on the "virt" machine
+type:
+
+- A write-only, 16-bit wide selector (or control) register,
+- a read-write, 8-bit wide data register.
+
+The guest writes a selector value (a key) to the selector register, and then
+can read the corresponding data (produced by QEMU) via the data register. If
+the selected entry is writable, the guest can rewrite it through the data
+register.
+
+The interface allows guest firmware to download various parameters and blobs
+that affect how the firmware works and what tables it installs for the guest
+OS. For example, boot order of devices, ACPI tables, SMBIOS tables, kernel and
+initrd images for direct kernel booting, virtual machine UUID, SMP information,
+virtual NUMA topology, and so on.
+
+The authoritative registry of the valid selector values and their meanings is
+the QEMU source code; the structure of the data blobs corresponding to the
+individual key values is also defined in the QEMU source code.
+
+The outermost protocol (involving the write / read sequences of the control and
+data registers) is unversioned and considered stable. Versioning of individual
+blobs is theoretically possible, but it is not specified on this level (and is
+not done in practice as yet).
+
+QEMU exposes the control and data register to x86 guests at fixed IO ports. ARM
+guests can access them as memory mapped registers, and their location is
+communicated to the guest's UEFI firmware in the DTB that QEMU places at the
+bottom of the guest's DRAM.
+
+The guest kernel is not expected to use these registers (although it is
+certainly allowed to); the device tree bindings are documented here because
+this is where device tree bindings reside in general.
+
+Required properties:
+
+- compatible: "qemu,fw-cfg-mmio".
+
+- reg: the MMIO region used by the device.
+  * The first two bytes in the region cover the control register.
+  * The third byte covers the data register.
+
+Example:
+
+/ {
+   #size-cells = <0x2>;
+   #address-cells = <0x2>;
+
+   fw-cfg@9020000 {
+           compatible = "qemu,fw-cfg-mmio";
+           reg = <0x0 0x9020000 0x0 0x1000>;
+   };
+};
-- 
1.8.3.1
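
The select-then-read protocol described in the binding text can be modelled in a few lines. This is a sketch under stated assumptions, not QEMU code: the struct and function names are hypothetical, the device is simulated in memory instead of being accessed through MMIO, and returning 0 for reads past the end of an entry (or with no valid entry selected) is an assumption of the model.

```c
#include <stddef.h>
#include <stdint.h>

/* One fw_cfg entry: a selector key and the blob it names (model only). */
struct fw_cfg_entry {
    uint16_t key;
    const uint8_t *data;
    size_t len;
};

/* Software model of the device: entry table plus current read position. */
struct fw_cfg_model {
    const struct fw_cfg_entry *entries;
    size_t n_entries;
    const struct fw_cfg_entry *cur;  /* selected entry, NULL if none */
    size_t offset;                   /* next byte to return from cur */
};

/* Models a write to the 16-bit selector register: pick an entry, rewind. */
static void fw_cfg_select(struct fw_cfg_model *m, uint16_t key)
{
    m->cur = NULL;
    m->offset = 0;
    for (size_t i = 0; i < m->n_entries; i++) {
        if (m->entries[i].key == key) {
            m->cur = &m->entries[i];
            break;
        }
    }
}

/* Models a read of the 8-bit data register: next byte, or 0 past the end. */
static uint8_t fw_cfg_read_byte(struct fw_cfg_model *m)
{
    if (m->cur == NULL || m->offset >= m->cur->len) {
        return 0;
    }
    return m->cur->data[m->offset++];
}

/* Guest-side helper: select a key, then pull len bytes one at a time. */
static void fw_cfg_read(struct fw_cfg_model *m, uint16_t key,
                        uint8_t *buf, size_t len)
{
    fw_cfg_select(m, key);
    for (size_t i = 0; i < len; i++) {
        buf[i] = fw_cfg_read_byte(m);
    }
}
```

In a real guest the two model functions would become a 16-bit store to the first two bytes of the "reg" region and an 8-bit load from the third byte; the byte order of the selector write is not covered by this sketch.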

[Qemu-devel] [PATCH v2] arm: add fw_cfg to "virt" board

2014-11-30 Thread Laszlo Ersek

fw_cfg already supports exposure over MMIO (used in ppc/mac_newworld.c,
ppc/mac_oldworld.c, sparc/sun4m.c); we can easily add it to the "virt"
board.

The mmio register block of fw_cfg is advertized in the device tree. As
base address we pick 0x09020000, which conforms to the comment preceding
"a15memmap": it falls in the miscellaneous device I/O range 128MB..256MB,
and it is aligned at 64KB. The DTB properties follow the documentation in
the Linux source file "Documentation/devicetree/bindings/arm/fw-cfg.txt".

fw_cfg automatically exports a number of files to the guest; for example,
"bootorder" (see fw_cfg_machine_reset()).

Signed-off-by: Laszlo Ersek 
---

Notes:
v2:
- use a single mmio region of size 0x1000
- set "compatible" property to "qemu,fw-cfg-mmio"

 hw/arm/virt.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 314e55b..af794ea 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -68,6 +68,7 @@ enum {
 VIRT_UART,
 VIRT_MMIO,
 VIRT_RTC,
+VIRT_FW_CFG,
 };
 
 typedef struct MemMapEntry {
@@ -107,6 +108,7 @@ static const MemMapEntry a15memmap[] = {
 [VIRT_GIC_CPU] =    { 0x08010000, 0x00010000 },
 [VIRT_UART] =       { 0x09000000, 0x00001000 },
 [VIRT_RTC] =        { 0x09010000, 0x00001000 },
+[VIRT_FW_CFG] =     { 0x09020000, 0x00001000 },
 [VIRT_MMIO] =       { 0x0a000000, 0x00000200 },
 /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
 /* 0x10000000 .. 0x40000000 reserved for PCI */
@@ -519,6 +521,23 @@ static void create_flash(const VirtBoardInfo *vbi)
 g_free(nodename);
 }
 
+static void create_fw_cfg(const VirtBoardInfo *vbi)
+{
+hwaddr base = vbi->memmap[VIRT_FW_CFG].base;
+hwaddr size = vbi->memmap[VIRT_FW_CFG].size;
+char *nodename;
+
+fw_cfg_init(0, 0, base, base + 2);
+
+nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
+qemu_fdt_add_subnode(vbi->fdt, nodename);
+qemu_fdt_setprop_string(vbi->fdt, nodename,
+"compatible", "qemu,fw-cfg-mmio");
+qemu_fdt_setprop_sized_cells(vbi->fdt, nodename, "reg",
+ 2, base, 2, size);
+g_free(nodename);
+}
+
 static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
 {
 const VirtBoardInfo *board = (const VirtBoardInfo *)binfo;
@@ -604,6 +623,8 @@ static void machvirt_init(MachineState *machine)
  */
 create_virtio_devices(vbi, pic);
 
+create_fw_cfg(vbi);
+
 vbi->bootinfo.ram_size = machine->ram_size;
 vbi->bootinfo.kernel_filename = machine->kernel_filename;
 vbi->bootinfo.kernel_cmdline = machine->kernel_cmdline;
-- 
1.8.3.1
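
The address arithmetic in the commit message, and the way a 64-bit base and size turn into the four cells of the "reg" property, can be checked with a small sketch. The helper names are hypothetical, and the base value 0x09020000 is taken as an assumption here: it is the value consistent with the 128MB..256MB range and 64KB alignment stated above. The cells are shown as logical values; in the flattened tree each cell is stored big-endian.

```c
#include <stdbool.h>
#include <stdint.h>

#define MISC_IO_START 0x08000000ULL  /* 128MB */
#define MISC_IO_END   0x10000000ULL  /* 256MB */
#define ALIGN_64K     0x10000ULL

/* True if base lies in [128MB, 256MB) and is 64KB-aligned. */
static bool base_is_valid(uint64_t base)
{
    return base >= MISC_IO_START && base < MISC_IO_END &&
           (base % ALIGN_64K) == 0;
}

/*
 * With #address-cells = 2 and #size-cells = 2, "reg" holds four 32-bit
 * cells: <base-high base-low size-high size-low>.
 */
static void encode_reg(uint64_t base, uint64_t size, uint32_t cells[4])
{
    cells[0] = (uint32_t)(base >> 32);
    cells[1] = (uint32_t)base;
    cells[2] = (uint32_t)(size >> 32);
    cells[3] = (uint32_t)size;
}
```

For base 0x09020000 and size 0x1000 this yields the cells 0x0, 0x9020000, 0x0, 0x1000, which is the shape the binding document's example uses.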

[Qemu-devel] [PATCH v5 6/6] hw/arm/virt: add dynamic sysbus device support

2014-11-30 Thread Eric Auger

Allow sysbus devices to be instantiated from the command line with the
-device option. Machvirt creates a platform bus at init, and the
dynamic sysbus devices are attached to this platform bus device.

The platform bus device registers a machine init done notifier
whose role is to bind the dynamic sysbus devices, because
dynamic sysbus devices are created after machine init.

machvirt also registers a notifier that will build the device
tree nodes for the platform bus and its children dynamic sysbus
devices.

Signed-off-by: Alexander Graf 
Signed-off-by: Eric Auger 

---
v4 -> v5:
- platform_bus_params becomes static const
- reword comment in create_platform_bus
- reword the commit message

v3 -> v4:
- use platform bus object, instantiated in create_platform_bus
- device tree generation for platform bus and children dynamic
  sysbus devices is no longer handled at reset but in a
  machine_init_done_notifier (due to the change in implementation
  of ARM load dtb using rom_add_blob_fixed).
- device tree enhancement now takes into account the case of
  user provided dtb. Before the user dtb was overwritten which
  was wrong. However in case the dtb is provided by the user,
  dynamic sysbus nodes are not added there.
- renaming of MACHVIRT_PLATFORM defines
- MACHVIRT_PLATFORM_PAGE_SHIFT and SIZE_PAGES not needed anymore,
  hence removed.
- DynSysbusParams struct renamed into ARMPlatformBusSystemParams
  and above params removed.
- separation of dt creation and QEMU binding is not mandated anymore
  since the device tree is not created from scratch anymore. Instead
  the modify_dtb function is used.
- create_platform_bus registers another machine init done notifier
  to start VFIO IRQ handling. This latter executes after the
  dynamic sysbus device binding.

v2 -> v3:
- renaming of arm_platform_bus_create_devtree and arm_load_dtb
- add copyright in hw/arm/dyn_sysbus_devtree.c

v1 -> v2:
- remove useless vfio-platform.h include file
- s/MACHVIRT_PLATFORM_HOLE/MACHVIRT_PLATFORM_SIZE
- use dyn_sysbus_binding and dyn_sysbus_devtree
- dynamic sysbus platform bus size shrunk to 4MB and
  moved between RTC and MMIO

v1:

Inspired from what Alex Graf did in ppc e500
https://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00012.html

Conflicts:
hw/arm/sysbus-fdt.c
---
 hw/arm/virt.c | 57 +
 1 file changed, 57 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 314e55b..37326a9 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -42,6 +42,8 @@
 #include "exec/address-spaces.h"
 #include "qemu/bitops.h"
 #include "qemu/error-report.h"
+#include "hw/arm/sysbus-fdt.h"
+#include "hw/platform-bus.h"
 
 #define NUM_VIRTIO_TRANSPORTS 32
 
@@ -59,6 +61,11 @@
 #define GIC_FDT_IRQ_PPI_CPU_START 8
 #define GIC_FDT_IRQ_PPI_CPU_WIDTH 8
 
+#define PLATFORM_BUS_BASE 0x9400000
+#define PLATFORM_BUS_SIZE (4ULL * 1024 * 1024)
+#define PLATFORM_BUS_FIRST_IRQ 48
+#define PLATFORM_BUS_NUM_IRQS 20
+
 enum {
 VIRT_FLASH,
 VIRT_MEM,
@@ -68,6 +75,7 @@ enum {
 VIRT_UART,
 VIRT_MMIO,
 VIRT_RTC,
+VIRT_PLATFORM_BUS,
 };
 
 typedef struct MemMapEntry {
@@ -107,6 +115,7 @@ static const MemMapEntry a15memmap[] = {
 [VIRT_GIC_CPU] =    { 0x08010000, 0x00010000 },
 [VIRT_UART] =       { 0x09000000, 0x00001000 },
 [VIRT_RTC] =        { 0x09010000, 0x00001000 },
+[VIRT_PLATFORM_BUS] = {PLATFORM_BUS_BASE , PLATFORM_BUS_SIZE},
 [VIRT_MMIO] =       { 0x0a000000, 0x00000200 },
 /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
 /* 0x10000000 .. 0x40000000 reserved for PCI */
@@ -117,6 +126,14 @@ static const int a15irqmap[] = {
 [VIRT_UART] = 1,
 [VIRT_RTC] = 2,
 [VIRT_MMIO] = 16, /* ...to 16 + NUM_VIRTIO_TRANSPORTS - 1 */
+[VIRT_PLATFORM_BUS] = PLATFORM_BUS_FIRST_IRQ,
+};
+
+static const ARMPlatformBusSystemParams platform_bus_params = {
+.platform_bus_base = PLATFORM_BUS_BASE,
+.platform_bus_size = PLATFORM_BUS_SIZE,
+.platform_bus_first_irq = PLATFORM_BUS_FIRST_IRQ,
+.platform_bus_num_irqs = PLATFORM_BUS_NUM_IRQS,
 };
 
 static VirtBoardInfo machines[] = {
@@ -519,6 +536,43 @@ static void create_flash(const VirtBoardInfo *vbi)
 g_free(nodename);
 }
 
+static void create_platform_bus(VirtBoardInfo *vbi, qemu_irq *pic,
+const ARMPlatformBusSystemParams *system_params)
+{
+DeviceState *dev;
+SysBusDevice *s;
+int i;
+ARMPlatformBusFdtParams *fdt_params = g_new(ARMPlatformBusFdtParams, 1);
+MemoryRegion *sysmem = get_system_memory();
+
+fdt_params->system_params = system_params;
+fdt_params->binfo = &vbi->bootinfo;
+fdt_params->intc = "/intc";
+/*
+ * register a machine init done notifier that creates the device tree
+ * nodes of the platform bus and its children dynamic sysbus devices
+ */
+arm_register_platform_bus_fdt_creator(fdt_params);
+
+dev = qdev_create(NULL, TYP

[Qemu-devel] [PATCH v5 5/6] hw/arm/sysbus-fdt: helpers for platform bus nodes addition

2014-11-30 Thread Eric Auger

This new C module will be used by ARM machine files to generate the
platform bus node and its dynamic sysbus device tree nodes.

Dynamic sysbus device node addition is done in a machine init
done notifier. arm_register_platform_bus_fdt_creator registers
this notifier and is supposed to be called by ARM machine files
that support a platform bus and its dynamic sysbus devices.
Dynamic sysbus nodes are added only if the user did not provide
a dtb.

Signed-off-by: Alexander Graf 
Signed-off-by: Eric Auger 

---

v4 -> v5:
- change indentation in add_fdt_node_functions. Also becomes a
  static const.
- ARMPlatformBusFdtParams.system_params becomes a pointer to
  a const ARMPlatformBusSystemParams
- removes platform-bus.h second inclusion

v3 -> v4:
- dyn_sysbus_devtree.c renamed into sysbus-fdt.c
- use new PlatformBusDevice object
- the dtb upgrade is done through modify_dtb. Before the fdt
  was recreated from scratch. When the user provided a dtb this
  latter was overwritten which was not correct.
- an array contains the association between device type names
  and their node creation function
- I must acknowledge I did not find any cleaner way to implement
  a FDT_BUILDER interface, as suggested by Paolo. The class method
  would need to be initialized somewhere and since it cannot
  happen in the device itself - according to Alex & Peter comments -,
  I don't see when I shall associate the device type and its
  interface implementation.

v2 -> v3:
- add arm_ prefix
- arm_sysbus_device_create_devtree becomes static

v1 -> v2:
- Code moved into an arch-specific file to accommodate
  architecture-dependent specifics.
- remove platform_bus_base from PlatformDevtreeData

v1: code originally written by Alex Graf in e500.c and reused for
ARM [Eric Auger]
---
 hw/arm/Makefile.objs|   1 +
 hw/arm/sysbus-fdt.c | 180 
 include/hw/arm/sysbus-fdt.h |  50 
 3 files changed, 231 insertions(+)
 create mode 100644 hw/arm/sysbus-fdt.c
 create mode 100644 include/hw/arm/sysbus-fdt.h

diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index 6088e53..0cc63e1 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -3,6 +3,7 @@ obj-$(CONFIG_DIGIC) += digic_boards.o
 obj-y += integratorcp.o kzm.o mainstone.o musicpal.o nseries.o
 obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o
 obj-y += tosa.o versatilepb.o vexpress.o virt.o xilinx_zynq.o z2.o
+obj-y += sysbus-fdt.o
 
 obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o
 obj-$(CONFIG_DIGIC) += digic.o
diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
new file mode 100644
index 0000000..7537267
--- /dev/null
+++ b/hw/arm/sysbus-fdt.c
@@ -0,0 +1,180 @@
+/*
+ * ARM Platform Bus device tree generation helpers
+ *
+ * Copyright (c) 2014 Linaro Limited
+ *
+ * Authors:
+ *  Alex Graf 
+ *  Eric Auger 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#include "hw/arm/sysbus-fdt.h"
+#include "qemu/error-report.h"
+#include "sysemu/device_tree.h"
+#include "hw/platform-bus.h"
+#include "sysemu/sysemu.h"
+
+/*
+ * internal struct that contains the information to create dynamic
+ * sysbus device node
+ */
+typedef struct PlatformBusFdtData {
+void *fdt; /* device tree handle */
+int irq_start; /* index of the first IRQ usable by platform bus devices */
+const char *pbus_node_name; /* name of the platform bus node */
+PlatformBusDevice *pbus;
+} PlatformBusFdtData;
+
+/*
+ * struct used when calling the machine init done notifier
+ * that constructs the fdt nodes of platform bus devices
+ */
+typedef struct PlatformBusFdtNotifierParams {
+ARMPlatformBusFdtParams *fdt_params;
+Notifier notifier;
+} PlatformBusFdtNotifierParams;
+
+/* struct that associates a device type name and a node creation function */
+typedef struct NodeCreationPair {
+const char *typename;
+int (*add_fdt_node_fn)(SysBusDevice *sbdev, void *opaque);
+} NodeCreationPair;
+
+/* list of supported dynamic sysbus devices */
+static const NodeCreationPair add_fdt_node_functions[] = {
+{"", NULL}, /*last element*/
+};
+
+/**
+ * add_fdt_node - add the device tree node of a dynamic sysbus device
+ *
+ * @sbdev: handle to the sysbus device
+ * @opaque: handle to the PlatformBusFdtData
+ *
+ * Checks the sysbus type belongs to the list of device types that
+ * are dynamically instantiable and in the po
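
The patch text is cut off above. As a rough illustration of how a sentinel-terminated table like add_fdt_node_functions can be searched by type name, here is a minimal sketch. It is not the code from the patch: struct fake_sbdev stands in for SysBusDevice, "demo-device" and add_demo_node are invented for the example, and the field is named type_name here only to keep the sketch neutral.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for SysBusDevice; the real type lives in QEMU. */
struct fake_sbdev {
    int id;
};

typedef int (*add_node_fn)(struct fake_sbdev *sbdev, void *opaque);

/* Associates a device type name with a node creation function. */
struct node_creation_pair {
    const char *type_name;
    add_node_fn add_fdt_node_fn;
};

/* Hypothetical node creator for an imaginary "demo-device" type. */
static int add_demo_node(struct fake_sbdev *sbdev, void *opaque)
{
    (void)opaque;
    return sbdev->id;
}

static const struct node_creation_pair demo_table[] = {
    { "demo-device", add_demo_node },
    { "", NULL }, /* last element */
};

/* Walk the table until the empty-name sentinel; NULL if no match. */
static add_node_fn find_node_fn(const struct node_creation_pair *table,
                                const char *type_name)
{
    for (size_t i = 0; table[i].type_name[0] != '\0'; i++) {
        if (strcmp(table[i].type_name, type_name) == 0) {
            return table[i].add_fdt_node_fn;
        }
    }
    return NULL;
}
```

With the table as posted (only the sentinel entry), such a lookup would find nothing for any device type, so supported devices are presumably added to the array by later patches.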

[Qemu-devel] [PATCH v5 0/6] machvirt dynamic sysbus device instantiation

2014-11-30 Thread Eric Auger

This patch series enables machvirt to dynamically instantiate sysbus
devices from command line (using -device option).

All those sysbus devices are plugged onto a platform bus. This latter
device is instantiated in machvirt and takes care of the binding of
children sysbus devices on a machine init done notifier. The device
tree node generation for children dynamic sysbus device also happens
on a subsequent notifier that must be executed after the above one.
machvirt registers that notifier before the platform bus creation to
make sure notifiers are executed in the right order: dt generation after
actual QOM binding.
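
The ordering constraint above can be illustrated with a small standalone model. It assumes, as the paragraph implies, that a notifier list runs its entries in reverse order of registration (new entries are inserted at the head), so the notifier registered first runs last. All names below are hypothetical and this is not QEMU's notifier code.

```c
#include <stddef.h>

struct notifier {
    void (*notify)(struct notifier *n, void *data);
    struct notifier *next;
};

struct notifier_list {
    struct notifier *head;
};

/* Insert at the head, so the most recently added notifier runs first. */
static void model_list_add(struct notifier_list *l, struct notifier *n)
{
    n->next = l->head;
    l->head = n;
}

/* Call every notifier, starting from the head of the list. */
static void model_list_notify(struct notifier_list *l, void *data)
{
    for (struct notifier *n = l->head; n != NULL; n = n->next) {
        n->notify(n, data);
    }
}

/* Records the order in which the two steps ran. */
struct order_record {
    char steps[8];
    size_t n;
};

/* Stands for the platform bus binding the dynamic sysbus devices. */
static void bind_devices(struct notifier *n, void *data)
{
    struct order_record *rec = data;
    (void)n;
    rec->steps[rec->n++] = 'B';
}

/* Stands for the device tree node generation. */
static void build_fdt(struct notifier *n, void *data)
{
    struct order_record *rec = data;
    (void)n;
    rec->steps[rec->n++] = 'F';
}
```

Registering build_fdt first and bind_devices second produces the order B, F when the list is notified, which is the "dt generation after actual QOM binding" sequence the cover letter asks for.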

Very few sysbus devices are supposed to be instantiated that
way. VFIO devices are among them.

Node creation really is architecture specific. On ARM the dynamic
sysbus device node creation is implemented in a new C module,
hw/arm/sysbus-fdt.c and not in the machine file.

Machvirt transformations and sysbus-fdt are largely inspired by Alex's work.

The patch series can be found at:
http://git.linaro.org/people/eric.auger/qemu.git (branch vfio_integ_v8)

Best Regards

Eric

v4 -> v5:
- in virt.c: platform_bus_params becomes static const
- sysbus-fdt: change indentation in add_fdt_node_functions array init
- s/load_dtb/arm_load_dtb in one boot.c comment

v3 -> v4:
- dyn_sysbus_binding removed since binding stuff now are implemented by
  the platform bus device
- due to a change in ARM load_dtb implementation using rom_add_blob_fixed,
  the dt no more is generated in a reset notifier but is generated on a
  machine init done notifier
- the augmented device tree is not generated from scratch anymore but is
  added using a modify_dtb function. This required some small change in
  boot.c
- the case where the user provides a dtb file now is handled
- some cleanup in virt additions
- implement a list of dynamically instantiable devices in sysbus-fdt

v2 -> v3:
- patch now applies on top of Alex full patchset
- dyn_sysbus_devtree: add arm_prefix to emphasize the fact those
  functions are arm specific; arm_sysbus_device_create_devtree
  becomes static
- load_dtb renamed into arm_load_dtb
- add copyright in hw/arm/dyn_sysbus_devtree.c


Eric Auger (6):
  hw/arm/boot: load_dtb becomes non static arm_load_dtb
  hw/arm/boot: dtb start and limit moved in arm_boot_info
  hw/arm/boot: do not free VirtBoardInfo fdt in arm_load_dtb
  hw/arm: add a new modify_dtb_opaque field in arm_boot_info
  hw/arm/sysbus-fdt: helpers for platform bus nodes addition
  hw/arm/virt: add dynamic sysbus device support

 hw/arm/Makefile.objs|   1 +
 hw/arm/boot.c   |  52 +++--
 hw/arm/sysbus-fdt.c | 180 
 hw/arm/virt.c   |  57 ++
 include/hw/arm/arm.h|   7 ++
 include/hw/arm/sysbus-fdt.h |  50 
 6 files changed, 325 insertions(+), 22 deletions(-)
 create mode 100644 hw/arm/sysbus-fdt.c
 create mode 100644 include/hw/arm/sysbus-fdt.h

-- 
1.8.3.2

[Qemu-devel] [PATCH v5 2/6] hw/arm/boot: dtb start and limit moved in arm_boot_info

2014-11-30 Thread Eric Auger

Two fields are added in arm_boot_info (dtb_start and dtb_limit). The
prototype of arm_load_kernel is changed to only use arm_boot_info.

The rationale for this change is that, when dealing with dynamic sysbus
devices, we need to update the device tree with dynamic device nodes
after the dtb is already loaded. Storing those parameters in
arm_boot_info avoids recomputing dtb_start and dtb_limit, as done in
arm_load_kernel.

Signed-off-by: Eric Auger 
---
 hw/arm/boot.c| 38 +-
 include/hw/arm/arm.h |  5 +++--
 2 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 9997bea..0398cd4 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -314,24 +314,21 @@ static void set_kernel_args_old(const struct arm_boot_info *info)
 
 /**
  * arm_load_dtb() - load a device tree binary image into memory
- * @addr:   the address to load the image at
  * @binfo:  struct describing the boot environment
- * @addr_limit: upper limit of the available memory area at @addr
  *
  * Load a device tree supplied by the machine or by the user  with the
- * '-dtb' command line option, and put it at offset @addr in target
- * memory.
+ * '-dtb' command line option, and put it at offset binfo->dtb_start in
+ * target memory.
  *
- * If @addr_limit contains a meaningful value (i.e., it is strictly greater
- * than @addr), the device tree is only loaded if its size does not exceed
- * the limit.
+ * If binfo->dtb_limit contains a meaningful value (i.e., it is strictly
+ * greater than binfo->dtb_start), the device tree is only loaded if its size
+ * does not exceed this upper limit.
  *
  * Returns: the size of the device tree image on success,
  *  0 if the image size exceeds the limit,
  *  -1 on errors.
  */
-int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
- hwaddr addr_limit)
+int arm_load_dtb(const struct arm_boot_info *binfo)
 {
 void *fdt = NULL;
 int size, rc;
@@ -360,7 +357,8 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
 }
 }
 
-if (addr_limit > addr && size > (addr_limit - addr)) {
+if (binfo->dtb_limit > binfo->dtb_start &&
+size > (binfo->dtb_limit - binfo->dtb_start)) {
 /* Installing the device tree blob at addr would exceed addr_limit.
  * Whether this constitutes failure is up to the caller to decide,
  * so just return 0 as size, i.e., no error.
@@ -427,7 +425,7 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
 /* Put the DTB into the memory map as a ROM image: this will ensure
  * the DTB is copied again upon reset, even if addr points into RAM.
  */
-rom_add_blob_fixed("dtb", fdt, size, addr);
+rom_add_blob_fixed("dtb", fdt, size, binfo->dtb_start);
 
 g_free(fdt);
 
@@ -504,7 +502,10 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
 /* If we have a device tree blob, but no kernel to supply it to,
  * copy it to the base of RAM for a bootloader to pick up.
  */
-if (arm_load_dtb(info->loader_start, info, 0) < 0) {
+info->dtb_start = info->loader_start;
+info->dtb_limit = 0;
+
+if (arm_load_dtb(info) < 0) {
 exit(1);
 }
 }
@@ -572,7 +573,9 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
 if (elf_low_addr < info->loader_start) {
 elf_low_addr = 0;
 }
-if (arm_load_dtb(info->loader_start, info, elf_low_addr) < 0) {
+info->dtb_start = info->loader_start;
+info->dtb_limit = elf_low_addr;
+if (arm_load_dtb(info) < 0) {
 exit(1);
 }
 }
@@ -635,12 +638,13 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
  * kernels will trash anything in the 4K page the initrd
  * ends in, so make sure the DTB isn't caught up in that.
  */
-hwaddr dtb_start = QEMU_ALIGN_UP(info->initrd_start + initrd_size,
- 4096);
-if (arm_load_dtb(dtb_start, info, 0) < 0) {
+info->dtb_start = QEMU_ALIGN_UP(info->initrd_start + initrd_size,
+4096);
+info->dtb_limit = 0;
+if (arm_load_dtb(info) < 0) {
 exit(1);
 }
-fixupcontext[FIXUP_ARGPTR] = dtb_start;
+fixupcontext[FIXUP_ARGPTR] = info->dtb_start;
 } else {
 fixupcontext[FIXUP_ARGPTR] = info->loader_start + KERNEL_ARGS_ADDR;
 if (info->ram_size >= (1ULL << 32)) {
diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
index 5fdae7b..5f1ecb7 100644
--- a/include/hw/arm/arm.h
+++ b/include/hw/arm/arm.h
@@ -65,11 +65,12 @@ struct arm_boot_info {
 int is_linux;
 hwadd
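
The patch text is cut off above. The limit test that arm_load_dtb performs after this change can be isolated as a small predicate. This is a sketch with hypothetical type and function names (hwaddr_t and struct dtb_window stand in for QEMU's hwaddr and the two new arm_boot_info fields); only the comparison itself is taken from the hunk above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t hwaddr_t; /* stand-in for QEMU's hwaddr */

/* The two fields the patch adds to arm_boot_info, isolated for the sketch. */
struct dtb_window {
    hwaddr_t dtb_start;  /* where the dtb is placed */
    hwaddr_t dtb_limit;  /* upper bound; not above dtb_start means no limit */
};

/*
 * True when a meaningful limit is set (limit > start) and a blob of the
 * given size would not fit between start and limit.
 */
static bool dtb_exceeds_limit(const struct dtb_window *w, uint64_t size)
{
    return w->dtb_limit > w->dtb_start &&
           size > (w->dtb_limit - w->dtb_start);
}
```

A caller that sets dtb_limit to 0, as two of the arm_load_kernel call sites do, disables the check regardless of size.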

[Qemu-devel] [PATCH v5 1/6] hw/arm/boot: load_dtb becomes non static arm_load_dtb

2014-11-30 Thread Eric Auger

load_dtb is renamed to arm_load_dtb and becomes non-static.
It will be used by machvirt for dynamic instantiation of
platform devices.

Signed-off-by: Eric Auger 

---

v4 -> v5:
s/load_dtb/arm_load_dtb in one comment

v2 -> v3:
load_dtb renamed into arm_load_dtb

Conflicts:
hw/arm/boot.c
---
 hw/arm/boot.c| 16 
 include/hw/arm/arm.h |  2 ++
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 0014c34..9997bea 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -313,7 +313,7 @@ static void set_kernel_args_old(const struct arm_boot_info *info)
 }
 
 /**
- * load_dtb() - load a device tree binary image into memory
+ * arm_load_dtb() - load a device tree binary image into memory
  * @addr:   the address to load the image at
  * @binfo:  struct describing the boot environment
  * @addr_limit: upper limit of the available memory area at @addr
@@ -330,8 +330,8 @@ static void set_kernel_args_old(const struct arm_boot_info *info)
  *  0 if the image size exceeds the limit,
  *  -1 on errors.
  */
-static int load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
-hwaddr addr_limit)
+int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
+ hwaddr addr_limit)
 {
 void *fdt = NULL;
 int size, rc;
@@ -504,7 +504,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
 /* If we have a device tree blob, but no kernel to supply it to,
  * copy it to the base of RAM for a bootloader to pick up.
  */
-if (load_dtb(info->loader_start, info, 0) < 0) {
+if (arm_load_dtb(info->loader_start, info, 0) < 0) {
 exit(1);
 }
 }
@@ -566,13 +566,13 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
  */
 if (elf_low_addr > info->loader_start
 || elf_high_addr < info->loader_start) {
-/* Pass elf_low_addr as address limit to load_dtb if it may be
- * pointing into RAM, otherwise pass '0' (no limit)
+/* Pass elf_low_addr as address limit to arm_load_dtb if it may
+ * be pointing into RAM, otherwise pass '0' (no limit)
  */
 if (elf_low_addr < info->loader_start) {
 elf_low_addr = 0;
 }
-if (load_dtb(info->loader_start, info, elf_low_addr) < 0) {
+if (arm_load_dtb(info->loader_start, info, elf_low_addr) < 0) {
 exit(1);
 }
 }
@@ -637,7 +637,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
  */
 hwaddr dtb_start = QEMU_ALIGN_UP(info->initrd_start + initrd_size,
  4096);
-if (load_dtb(dtb_start, info, 0) < 0) {
+if (arm_load_dtb(dtb_start, info, 0) < 0) {
 exit(1);
 }
 fixupcontext[FIXUP_ARGPTR] = dtb_start;
diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
index cefc9e6..5fdae7b 100644
--- a/include/hw/arm/arm.h
+++ b/include/hw/arm/arm.h
@@ -68,6 +68,8 @@ struct arm_boot_info {
 hwaddr entry;
 };
 void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info);
+int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
+ hwaddr addr_limit);
 
 /* Multiplication factor to convert from system clock ticks to qemu timer ticks.  */
-- 
1.8.3.2

[Qemu-devel] [PATCH v5 4/6] hw/arm: add a new modify_dtb_opaque field in arm_boot_info

2014-11-30 Thread Eric Auger

This field lets any modify_dtb() function receive the additional
arguments it needs to build the modified dtb. It is needed to
create the platform bus dynamic sysbus nodes.

Signed-off-by: Eric Auger 
---
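For readers following along, here is a minimal sketch of how such an opaque
pointer is typically consumed by a hook. All names below (boot_info_sketch,
platform_bus_params, fake_fdt, add_platform_bus_nodes, run_modify_dtb) are
hypothetical stand-ins and not part of this series; a real hook would edit an
actual flattened device tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for arm_boot_info; only the fields needed here. */
struct boot_info_sketch {
    /* Hook called to patch the device tree before it is loaded. */
    void (*modify_dtb)(const struct boot_info_sketch *info, void *fdt);
    /* Extra data for the hook; only the hook knows its real type. */
    void *modify_dtb_opaque;
};

/* Hypothetical parameters a platform bus hook might need. */
struct platform_bus_params {
    unsigned long base;
    unsigned int first_irq;
};

/* Stand-in for an fdt: it just records what the hook would write. */
struct fake_fdt {
    unsigned long bus_base;
    unsigned int bus_irq;
};

/* Example hook: recover the typed parameters from the opaque pointer. */
static void add_platform_bus_nodes(const struct boot_info_sketch *info,
                                   void *fdt)
{
    const struct platform_bus_params *p = info->modify_dtb_opaque;
    struct fake_fdt *tree = fdt;

    assert(p != NULL);
    tree->bus_base = p->base;
    tree->bus_irq = p->first_irq;
}

/* Run the hook if one is registered; returns 1 if it ran, 0 otherwise. */
static int run_modify_dtb(const struct boot_info_sketch *info, void *fdt)
{
    if (!info->modify_dtb) {
        return 0;
    }
    info->modify_dtb(info, fdt);
    return 1;
}
```

The point of the void pointer is that the generic boot code never needs to
know the parameter type; only the code that sets the field and the hook that
reads it have to agree on it.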
 include/hw/arm/arm.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
index 5f1ecb7..ff776fa 100644
--- a/include/hw/arm/arm.h
+++ b/include/hw/arm/arm.h
@@ -68,6 +68,10 @@ struct arm_boot_info {
 hwaddr dtb_start; /* start address of the dtb */
 hwaddr dtb_limit; /* upper RAM limit the dtb cannot overshoot */
 hwaddr entry;
+/* in case modify_dtb requires additional parameters to create
+ * the new nodes, use the following opaque
+ */
+void *modify_dtb_opaque;
 };
 void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info);
 int arm_load_dtb(const struct arm_boot_info *binfo);
-- 
1.8.3.2

[Qemu-devel] [PATCH v5 3/6] hw/arm/boot: do not free VirtBoardInfo fdt in arm_load_dtb

2014-11-30 Thread Eric Auger

Currently arm_load_dtb frees the fdt handle whether it was allocated
by load_device_tree or allocated externally.

When adding dynamic sysbus nodes after the first dtb load, we would like
to reuse the fdt used during the first load instead of re-creating the
whole device tree. This is not possible if the fdt has been destroyed,
so only free the fdt when it was loaded from binfo->dtb_filename.

Signed-off-by: Eric Auger 
---
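The ownership rule can be sketched in isolation as follows. This is only an
illustration under assumptions: dtb_source, load_blob, release_blob and
free_count are hypothetical names, malloc stands in for reading the file, and
the copy into guest memory is elided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical description of where the blob comes from. */
struct dtb_source {
    const char *dtb_filename; /* non-NULL: blob is loaded from this file */
    void *external_fdt;       /* otherwise: blob owned by the caller */
};

static int free_count; /* how many blobs this code released */

static void release_blob(void *blob)
{
    free(blob);
    free_count++;
}

/* Returns the size loaded, or -1 on error. */
static int load_blob(const struct dtb_source *src, size_t size)
{
    bool owned = src->dtb_filename != NULL;
    void *fdt;

    assert(size > 0);
    if (owned) {
        fdt = malloc(size); /* stands in for reading the file */
        if (!fdt) {
            return -1;
        }
        memset(fdt, 0, size);
    } else {
        fdt = src->external_fdt;
        if (!fdt) {
            return -1;
        }
    }

    /* ... the blob would be copied into guest memory here ... */

    /* Release only what was allocated here; the caller keeps its own fdt. */
    if (owned) {
        release_blob(fdt);
    }
    return (int)size;
}
```

With this rule an externally built fdt survives the load and can be extended
later, which is what the dynamic sysbus node case needs.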
 hw/arm/boot.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 0398cd4..0f9cd2c 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -427,12 +427,16 @@ int arm_load_dtb(const struct arm_boot_info *binfo)
  */
 rom_add_blob_fixed("dtb", fdt, size, binfo->dtb_start);
 
-g_free(fdt);
+if (binfo->dtb_filename) {
+g_free(fdt);
+}
 
 return size;
 
 fail:
-g_free(fdt);
+if (binfo->dtb_filename) {
+g_free(fdt);
+}
 return -1;
 }
 
-- 
1.8.3.2

[Qemu-devel] [PATCH v8 05/19] hw/vfio/pci: add type, name and group fields in VFIODevice

2014-11-30 Thread Eric Auger

Add 3 new fields to the VFIODevice struct. The type is set to
VFIO_DEVICE_TYPE_PCI; this enum value will later be used to
discriminate between VFIO PCI and platform devices. The name is
set to domain:bus:slot.function. It is currently used to test whether
the device is already attached to the group; later on, it will be
used to simplify all traces. The group is simply moved from
VFIOPCIDevice to VFIODevice.

Signed-off-by: Eric Auger 
---
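As a rough illustration of the name-based duplicate check, the sketch below
formats a host address with the same format string as the patch and compares
two names with strcmp. It uses snprintf into a caller buffer instead of
g_strdup_printf to stay free of GLib; host_addr_sketch, format_pci_name and
same_device are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the host address of a device. */
struct host_addr_sketch {
    unsigned int domain, bus, slot, function;
};

/* Format the address into buf; returns 0 on success, -1 on truncation. */
static int format_pci_name(char *buf, size_t len,
                           const struct host_addr_sketch *h)
{
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "%04x:%02x:%02x.%01x",
                 h->domain, h->bus, h->slot, h->function);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Two devices are considered the same if their names match exactly. */
static int same_device(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Comparing names instead of four separate fields keeps the check independent
of the device type, as long as every type produces a unique name.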
 hw/vfio/pci.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index cd9ce4e..157e1a5 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -48,6 +48,10 @@
 #define VFIO_ALLOW_KVM_MSI 1
 #define VFIO_ALLOW_KVM_MSIX 1
 
+enum {
+VFIO_DEVICE_TYPE_PCI = 0,
+};
+
 struct VFIOPCIDevice;
 
 typedef struct VFIOQuirk {
@@ -186,7 +190,10 @@ typedef struct VFIOMSIXInfo {
 } VFIOMSIXInfo;
 
 typedef struct VFIODevice {
+struct VFIOGroup *group;
+char *name;
 int fd;
+int type;
 } VFIODevice;
 
 typedef struct VFIOPCIDevice {
@@ -208,7 +215,6 @@ typedef struct VFIOPCIDevice {
 VFIOVGA vga; /* 0xa, 0x3b0, 0x3c0 */
 PCIHostDeviceAddress host;
 QLIST_ENTRY(VFIOPCIDevice) next;
-struct VFIOGroup *group;
 EventNotifier err_notifier;
 uint32_t features;
 #define VFIO_FEATURE_ENABLE_VGA_BIT 0
@@ -3924,7 +3930,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
 }
 
 vdev->vbasedev.fd = ret;
-vdev->group = group;
+vdev->vbasedev.group = group;
 QLIST_INSERT_HEAD(&group->device_list, vdev, next);
 
 /* Sanity check device */
@@ -4054,7 +4060,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name,
 error:
 if (ret) {
 QLIST_REMOVE(vdev, next);
-vdev->group = NULL;
+vdev->vbasedev.group = NULL;
 close(vdev->vbasedev.fd);
 }
 return ret;
@@ -4063,9 +4069,10 @@ error:
 static void vfio_put_device(VFIOPCIDevice *vdev)
 {
 QLIST_REMOVE(vdev, next);
-vdev->group = NULL;
+vdev->vbasedev.group = NULL;
 trace_vfio_put_device(vdev->vbasedev.fd);
 close(vdev->vbasedev.fd);
+g_free(vdev->vbasedev.name);
 if (vdev->msix) {
 g_free(vdev->msix);
 vdev->msix = NULL;
@@ -4197,6 +4204,11 @@ static int vfio_initfn(PCIDevice *pdev)
 return -errno;
 }
 
+vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
+vdev->vbasedev.name = g_strdup_printf("%04x:%02x:%02x.%01x",
+vdev->host.domain, vdev->host.bus, vdev->host.slot,
+vdev->host.function);
+
 strncat(path, "iommu_group", sizeof(path) - strlen(path) - 1);
 
 len = readlink(path, iommu_group_path, sizeof(path));
@@ -4227,10 +4239,7 @@ static int vfio_initfn(PCIDevice *pdev)
 vdev->host.function);
 
 QLIST_FOREACH(pvdev, &group->device_list, next) {
-if (pvdev->host.domain == vdev->host.domain &&
-pvdev->host.bus == vdev->host.bus &&
-pvdev->host.slot == vdev->host.slot &&
-pvdev->host.function == vdev->host.function) {
+if (strcmp(pvdev->vbasedev.name, vdev->vbasedev.name) == 0) {
 
 error_report("vfio: error: device %s is already attached", path);
 vfio_put_group(group);
@@ -4333,7 +4342,7 @@ out_put:
 static void vfio_exitfn(PCIDevice *pdev)
 {
 VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
-VFIOGroup *group = vdev->group;
+VFIOGroup *group = vdev->vbasedev.group;
 
 vfio_unregister_err_notifier(vdev);
 pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
-- 
1.8.3.2

[Qemu-devel] [PATCH v8 02/19] hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice

2014-11-30 Thread Eric Auger

This prepares for the introduction of VFIOPlatformDevice

Signed-off-by: Eric Auger 
---
 hw/vfio/pci.c | 210 +-
 1 file changed, 106 insertions(+), 104 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 7e69415..0d7d4a0 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -48,11 +48,11 @@
 #define VFIO_ALLOW_KVM_MSI 1
 #define VFIO_ALLOW_KVM_MSIX 1
 
-struct VFIODevice;
+struct VFIOPCIDevice;
 
 typedef struct VFIOQuirk {
 MemoryRegion mem;
-struct VFIODevice *vdev;
+struct VFIOPCIDevice *vdev;
 QLIST_ENTRY(VFIOQuirk) next;
 struct {
 uint32_t base_offset:TARGET_PAGE_BITS;
@@ -123,7 +123,7 @@ typedef struct VFIOMSIVector {
  */
 EventNotifier interrupt;
 EventNotifier kvm_interrupt;
-struct VFIODevice *vdev; /* back pointer to device */
+struct VFIOPCIDevice *vdev; /* back pointer to device */
 int virq;
 bool use;
 } VFIOMSIVector;
@@ -185,7 +185,7 @@ typedef struct VFIOMSIXInfo {
 void *mmap;
 } VFIOMSIXInfo;
 
-typedef struct VFIODevice {
+typedef struct VFIOPCIDevice {
 PCIDevice pdev;
 int fd;
 VFIOINTx intx;
@@ -203,7 +203,7 @@ typedef struct VFIODevice {
 VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
 VFIOVGA vga; /* 0xa, 0x3b0, 0x3c0 */
 PCIHostDeviceAddress host;
-QLIST_ENTRY(VFIODevice) next;
+QLIST_ENTRY(VFIOPCIDevice) next;
 struct VFIOGroup *group;
 EventNotifier err_notifier;
 uint32_t features;
@@ -218,13 +218,13 @@ typedef struct VFIODevice {
 bool has_pm_reset;
 bool needs_reset;
 bool rom_read_failed;
-} VFIODevice;
+} VFIOPCIDevice;
 
 typedef struct VFIOGroup {
 int fd;
 int groupid;
 VFIOContainer *container;
-QLIST_HEAD(, VFIODevice) device_list;
+QLIST_HEAD(, VFIOPCIDevice) device_list;
 QLIST_ENTRY(VFIOGroup) next;
 QLIST_ENTRY(VFIOGroup) container_next;
 } VFIOGroup;
@@ -268,16 +268,16 @@ static QLIST_HEAD(, VFIOGroup)
 static int vfio_kvm_device_fd = -1;
 #endif
 
-static void vfio_disable_interrupts(VFIODevice *vdev);
+static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
 static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
   uint32_t val, int len);
-static void vfio_mmap_set_enabled(VFIODevice *vdev, bool enabled);
+static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
 
 /*
  * Common VFIO interrupt disable
  */
-static void vfio_disable_irqindex(VFIODevice *vdev, int index)
+static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
 {
 struct vfio_irq_set irq_set = {
 .argsz = sizeof(irq_set),
@@ -293,7 +293,7 @@ static void vfio_disable_irqindex(VFIODevice *vdev, int index)
 /*
  * INTx
  */
-static void vfio_unmask_intx(VFIODevice *vdev)
+static void vfio_unmask_intx(VFIOPCIDevice *vdev)
 {
 struct vfio_irq_set irq_set = {
 .argsz = sizeof(irq_set),
@@ -307,7 +307,7 @@ static void vfio_unmask_intx(VFIODevice *vdev)
 }
 
 #ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
-static void vfio_mask_intx(VFIODevice *vdev)
+static void vfio_mask_intx(VFIOPCIDevice *vdev)
 {
 struct vfio_irq_set irq_set = {
 .argsz = sizeof(irq_set),
@@ -338,7 +338,7 @@ static void vfio_mask_intx(VFIODevice *vdev)
  */
 static void vfio_intx_mmap_enable(void *opaque)
 {
-VFIODevice *vdev = opaque;
+VFIOPCIDevice *vdev = opaque;
 
 if (vdev->intx.pending) {
 timer_mod(vdev->intx.mmap_timer,
@@ -351,7 +351,7 @@ static void vfio_intx_mmap_enable(void *opaque)
 
 static void vfio_intx_interrupt(void *opaque)
 {
-VFIODevice *vdev = opaque;
+VFIOPCIDevice *vdev = opaque;
 
 if (!event_notifier_test_and_clear(&vdev->intx.interrupt)) {
 return;
@@ -370,7 +370,7 @@ static void vfio_intx_interrupt(void *opaque)
 }
 }
 
-static void vfio_eoi(VFIODevice *vdev)
+static void vfio_eoi(VFIOPCIDevice *vdev)
 {
 if (!vdev->intx.pending) {
 return;
@@ -384,7 +384,7 @@ static void vfio_eoi(VFIODevice *vdev)
 vfio_unmask_intx(vdev);
 }
 
-static void vfio_enable_intx_kvm(VFIODevice *vdev)
+static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
 {
 #ifdef CONFIG_KVM
 struct kvm_irqfd irqfd = {
@@ -462,7 +462,7 @@ fail:
 #endif
 }
 
-static void vfio_disable_intx_kvm(VFIODevice *vdev)
+static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
 {
 #ifdef CONFIG_KVM
 struct kvm_irqfd irqfd = {
@@ -506,7 +506,7 @@ static void vfio_disable_intx_kvm(VFIODevice *vdev)
 
 static void vfio_update_irq(PCIDevice *pdev)
 {
-VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
 PCIINTxRoute route;
 
 if (vdev->interrupt != VFIO_INT_INTx) {
@@ -537,7 +537,7 @@ static void vfio_update_irq(PCIDevice *pdev)
 vfio_eoi(vdev);
 }
 
-static int vfio_enable_intx(VFIODevice *vdev)
+static int

[Qemu-devel] [PATCH v8 01/19] vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio

2014-11-30 Thread Eric Auger

From: Kim Phillips 

This is done in preparation for the addition of VFIO platform
device support.

Signed-off-by: Kim Phillips 
---
 LICENSE  | 2 +-
 MAINTAINERS  | 2 +-
 hw/Makefile.objs | 1 +
 hw/misc/Makefile.objs| 1 -
 hw/ppc/spapr_pci_vfio.c  | 2 +-
 hw/vfio/Makefile.objs| 3 +++
 hw/{misc/vfio.c => vfio/pci.c}   | 2 +-
 include/hw/{misc => vfio}/vfio.h | 0
 8 files changed, 8 insertions(+), 5 deletions(-)
 create mode 100644 hw/vfio/Makefile.objs
 rename hw/{misc/vfio.c => vfio/pci.c} (99%)
 rename include/hw/{misc => vfio}/vfio.h (100%)

diff --git a/LICENSE b/LICENSE
index da70e94..0e0b4b9 100644
--- a/LICENSE
+++ b/LICENSE
@@ -11,7 +11,7 @@ option) any later version.
 
 As of July 2013, contributions under version 2 of the GNU General Public
 License (and no later version) are only accepted for the following files
-or directories: bsd-user/, linux-user/, hw/misc/vfio.c, hw/xen/xen_pt*.
+or directories: bsd-user/, linux-user/, hw/vfio/, hw/xen/xen_pt*.
 
 3) The Tiny Code Generator (TCG) is released under the BSD license
(see license headers in files).
diff --git a/MAINTAINERS b/MAINTAINERS
index bcb69e8..255b512 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -657,7 +657,7 @@ F: hw/usb/dev-serial.c
 VFIO
 M: Alex Williamson 
 S: Supported
-F: hw/misc/vfio.c
+F: hw/vfio/*
 
 vhost
 M: Michael S. Tsirkin 
diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 52a1464..73afa41 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -26,6 +26,7 @@ devices-dirs-$(CONFIG_SOFTMMU) += ssi/
 devices-dirs-$(CONFIG_SOFTMMU) += timer/
 devices-dirs-$(CONFIG_TPM) += tpm/
 devices-dirs-$(CONFIG_SOFTMMU) += usb/
+devices-dirs-$(CONFIG_SOFTMMU) += vfio/
 devices-dirs-$(CONFIG_VIRTIO) += virtio/
 devices-dirs-$(CONFIG_SOFTMMU) += watchdog/
 devices-dirs-$(CONFIG_SOFTMMU) += xen/
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 979e532..e47fea8 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -21,7 +21,6 @@ common-obj-$(CONFIG_MACIO) += macio/
 
 ifeq ($(CONFIG_PCI), y)
 obj-$(CONFIG_KVM) += ivshmem.o
-obj-$(CONFIG_LINUX) += vfio.o
 endif
 
 obj-$(CONFIG_REALVIEW) += arm_sysctl.o
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index d3bddf2..144912b 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -20,7 +20,7 @@
 #include "hw/ppc/spapr.h"
 #include "hw/pci-host/spapr.h"
 #include "linux/vfio.h"
-#include "hw/misc/vfio.h"
+#include "hw/vfio/vfio.h"
 
 static Property spapr_phb_vfio_properties[] = {
 DEFINE_PROP_INT32("iommu", sPAPRPHBVFIOState, iommugroupid, -1),
diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
new file mode 100644
index 000..31c7dab
--- /dev/null
+++ b/hw/vfio/Makefile.objs
@@ -0,0 +1,3 @@
+ifeq ($(CONFIG_LINUX), y)
+obj-$(CONFIG_PCI) += pci.o
+endif
diff --git a/hw/misc/vfio.c b/hw/vfio/pci.c
similarity index 99%
rename from hw/misc/vfio.c
rename to hw/vfio/pci.c
index 6c36c8b..7e69415 100644
--- a/hw/misc/vfio.c
+++ b/hw/vfio/pci.c
@@ -39,8 +39,8 @@
 #include "qemu/range.h"
 #include "sysemu/kvm.h"
 #include "sysemu/sysemu.h"
-#include "hw/misc/vfio.h"
 #include "trace.h"
+#include "hw/vfio/vfio.h"
 
 /* Extra debugging, trap acceleration paths for more logging */
 #define VFIO_ALLOW_MMAP 1
diff --git a/include/hw/misc/vfio.h b/include/hw/vfio/vfio.h
similarity index 100%
rename from include/hw/misc/vfio.h
rename to include/hw/vfio/vfio.h
-- 
1.8.3.2

[Qemu-devel] [PATCH v8 04/19] hw/vfio/pci: introduce minimalist VFIODevice with fd

2014-11-30 Thread Eric Auger

Introduce a new base VFIODevice struct that will be used by both PCI
and platform VFIO devices. Move the VFIOPCIDevice fd field there. Other
fields from VFIOPCIDevice will be moved there too, but this separate
patch is introduced to ease the review.

Also, vfio_mask_single_irqindex, vfio_unmask_single_irqindex and
vfio_disable_irqindex now take a VFIODevice handle as argument.

Signed-off-by: Eric Auger 
---
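The embedding pattern can be sketched as below, assuming two device types
that share one base struct. The names (base_dev_sketch, pci_dev_sketch,
platform_dev_sketch, generic_irq_helper) are hypothetical, and the helper
returns the fd it would use rather than issuing an ioctl.

```c
#include <assert.h>
#include <stddef.h>

/* State common to every device type. */
struct base_dev_sketch {
    int fd;
};

/* Each device type embeds the base struct next to its own fields. */
struct pci_dev_sketch {
    int config_size;
    struct base_dev_sketch vbasedev;
};

struct platform_dev_sketch {
    struct base_dev_sketch vbasedev;
    int num_regions;
};

/*
 * Generic helper: it only sees the base struct, so it works for any
 * device type. It returns the fd on which a real helper would operate.
 */
static int generic_irq_helper(const struct base_dev_sketch *vbasedev,
                              int index)
{
    assert(vbasedev != NULL);
    (void)index;
    return vbasedev->fd;
}
```

Callers pass the address of the embedded member, for example
generic_irq_helper(&pci.vbasedev, 0), which mirrors the &vdev->vbasedev call
sites in the diff.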
 hw/vfio/pci.c | 117 +++---
 1 file changed, 63 insertions(+), 54 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 387da1a..cd9ce4e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -185,9 +185,13 @@ typedef struct VFIOMSIXInfo {
 void *mmap;
 } VFIOMSIXInfo;
 
+typedef struct VFIODevice {
+int fd;
+} VFIODevice;
+
 typedef struct VFIOPCIDevice {
 PCIDevice pdev;
-int fd;
+VFIODevice vbasedev;
 VFIOINTx intx;
 unsigned int config_size;
 uint8_t *emulated_config_bits; /* QEMU emulated bits, little-endian */
@@ -277,7 +281,7 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
 /*
  * Common VFIO interrupt disable
  */
-static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
+static void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
 {
 struct vfio_irq_set irq_set = {
 .argsz = sizeof(irq_set),
@@ -287,13 +291,13 @@ static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
 .count = 0,
 };
 
-ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
 }
 
 /*
  * INTx
  */
-static void vfio_unmask_single_irqindex(VFIOPCIDevice *vdev, int index)
+static void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index)
 {
 struct vfio_irq_set irq_set = {
 .argsz = sizeof(irq_set),
@@ -303,11 +307,11 @@ static void vfio_unmask_single_irqindex(VFIOPCIDevice *vdev, int index)
 .count = 1,
 };
 
-ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
 }
 
 #ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
-static void vfio_mask_single_irqindex(VFIOPCIDevice *vdev, int index)
+static void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index)
 {
 struct vfio_irq_set irq_set = {
 .argsz = sizeof(irq_set),
@@ -317,7 +321,7 @@ static void vfio_mask_single_irqindex(VFIOPCIDevice *vdev, int index)
 .count = 1,
 };
 
-ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
 }
 #endif
 
@@ -381,7 +385,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev)
 
 vdev->intx.pending = false;
 pci_irq_deassert(&vdev->pdev);
-vfio_unmask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
+vfio_unmask_single_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 }
 
 static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
@@ -404,7 +408,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
 
 /* Get to a known interrupt state */
 qemu_set_fd_handler(irqfd.fd, NULL, NULL, vdev);
-vfio_mask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
+vfio_mask_single_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 vdev->intx.pending = false;
 pci_irq_deassert(&vdev->pdev);
 
@@ -434,7 +438,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
 
 *pfd = irqfd.resamplefd;
 
-ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
 g_free(irq_set);
 if (ret) {
 error_report("vfio: Error: Failed to setup INTx unmask fd: %m");
@@ -442,7 +446,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
 }
 
 /* Let'em rip */
-vfio_unmask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
+vfio_unmask_single_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 
 vdev->intx.kvm_accel = true;
 
@@ -458,7 +462,7 @@ fail_irqfd:
 event_notifier_cleanup(&vdev->intx.unmask);
 fail:
 qemu_set_fd_handler(irqfd.fd, vfio_intx_interrupt, NULL, vdev);
-vfio_unmask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
+vfio_unmask_single_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 #endif
 }
 
@@ -479,7 +483,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
  * Get to a known state, hardware masked, QEMU ready to accept new
  * interrupts, QEMU IRQ de-asserted.
  */
-vfio_mask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
+vfio_mask_single_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 vdev->intx.pending = false;
 pci_irq_deassert(&vdev->pdev);
 
@@ -497,7 +501,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
 vdev->intx.kvm_accel = false;
 
 /* If we've missed an event, let it re-fire through QEMU */
-vfio_unmask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
+vfio_unmask_single_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 
 trace_vfio

[Qemu-devel] [PATCH v8 11/19] hw/vfio: create common module

2014-11-30 Thread Eric Auger

A new common module is created. It implements all functions
that are not specific to a device type (PCI, platform).

This patch only moves code (no functional changes).

Signed-off-by: Kim Phillips 
Signed-off-by: Eric Auger 

---
v7 -> v8:
- integrate "Add skip_dump flag to ignore memory region
  during dump"
- vfio_compute_needs_reset does not return bool anymore

v6 -> v7:
- integrate Revert "vfio: Make BARs native endian"
- remove VFIO_DEVICE_TYPE_PLATFORM in vfio-common.h,
  will come in next patch

v5 -> v6:
- follow all evolutions of original PCI code from v5 to V6
- move declaration of vfio_region_ops, vfio_memory_listener,
  vfio_group_list, vfio_address_spaces into vfio-common.h

v4 -> v5:
- integrate "sPAPR/IOMMU: Fix TCE entry permission"
- VFIOdevice .name dealloc removed from vfio_put_base_device
- add some includes according to vfio inclusion policy

v3 -> v4:
[Eric Auger]
move done after all PCI modifications to anticipate for
VFIO Platform needs. Purpose is to alleviate the whole
review process.

<= v3
First split done by Kim Phillips

Conflicts:
hw/vfio/pci.c

---
 hw/vfio/Makefile.objs |1 +
 hw/vfio/common.c  |  959 ++
 hw/vfio/pci.c | 1028 +
 include/hw/vfio/vfio-common.h |  151 ++
 trace-events  |1 +
 5 files changed, 1113 insertions(+), 1027 deletions(-)
 create mode 100644 hw/vfio/common.c
 create mode 100644 include/hw/vfio/vfio-common.h

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index 31c7dab..e31f30e 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,3 +1,4 @@
 ifeq ($(CONFIG_LINUX), y)
+obj-$(CONFIG_SOFTMMU) += common.o
 obj-$(CONFIG_PCI) += pci.o
 endif
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
new file mode 100644
index 000..554467f
--- /dev/null
+++ b/hw/vfio/common.c
@@ -0,0 +1,959 @@
+/*
+ * generic functions used by VFIO devices
+ *
+ * Copyright Red Hat, Inc. 2012
+ *
+ * Authors:
+ *  Alex Williamson 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Based on qemu-kvm device-assignment:
+ *  Adapted for KVM by Qumranet.
+ *  Copyright (c) 2007, Neocleus, Alex Novik (a...@neocleus.com)
+ *  Copyright (c) 2007, Neocleus, Guy Zana (g...@neocleus.com)
+ *  Copyright (C) 2008, Qumranet, Amit Shah (amit.s...@qumranet.com)
+ *  Copyright (C) 2008, Red Hat, Amit Shah (amit.s...@redhat.com)
+ *  Copyright (C) 2008, IBM, Muli Ben-Yehuda (m...@il.ibm.com)
+ */
+
+#include 
+#include 
+#include 
+
+#include "hw/vfio/vfio-common.h"
+#include "hw/vfio/vfio.h"
+#include "exec/address-spaces.h"
+#include "exec/memory.h"
+#include "hw/hw.h"
+#include "qemu/error-report.h"
+#include "sysemu/kvm.h"
+#include "trace.h"
+
+struct vfio_group_head vfio_group_list =
+QLIST_HEAD_INITIALIZER(vfio_group_list);
+struct vfio_as_head vfio_address_spaces =
+QLIST_HEAD_INITIALIZER(vfio_address_spaces);
+
+#ifdef CONFIG_KVM
+/*
+ * We have a single VFIO pseudo device per KVM VM.  Once created it lives
+ * for the life of the VM.  Closing the file descriptor only drops our
+ * reference to it and the device's reference to kvm.  Therefore once
+ * initialized, this file descriptor is only released on QEMU exit and
+ * we'll re-use it should another vfio device be attached before then.
+ */
+static int vfio_kvm_device_fd = -1;
+#endif
+
+/*
+ * Common VFIO interrupt disable
+ */
+void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
+{
+struct vfio_irq_set irq_set = {
+.argsz = sizeof(irq_set),
+.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
+.index = index,
+.start = 0,
+.count = 0,
+};
+
+ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+}
+
+void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index)
+{
+struct vfio_irq_set irq_set = {
+.argsz = sizeof(irq_set),
+.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
+.index = index,
+.start = 0,
+.count = 1,
+};
+
+ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+}
+
+void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index)
+{
+struct vfio_irq_set irq_set = {
+.argsz = sizeof(irq_set),
+.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
+.index = index,
+.start = 0,
+.count = 1,
+};
+
+ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+}
+
+/*
+ * IO Port/MMIO - Beware of the endians, VFIO is always little endian
+ */
+void vfio_region_write(void *opaque, hwaddr addr,
+   uint64_t data, unsigned size)
+{
+VFIORegion *region = opaque;
+VFIODevice *vbasedev = region->vbasedev;
+union {
+uint8_t byte;
+uint16_t word;
+uint32_t dword;
+uint64_t qword;
+} buf;

[Qemu-devel] [PATCH v8 03/19] hw/vfio/pci: generalize mask/unmask to any IRQ index

2014-11-30 Thread Eric Auger

To prepare for the introduction of platform devices, rename vfio_mask_intx
and vfio_unmask_intx to vfio_mask_single_irqindex and
vfio_unmask_single_irqindex respectively. Also add a new index parameter.

With that name and prototype the functions will be usable for indexes
other than VFIO_PCI_INTX_IRQ_INDEX.

Signed-off-by: Eric Auger 
---
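A small sketch of the generalization: the request is built from an action and
an index instead of hard-coding the INTx index. The struct and flag values
below are hypothetical mirrors chosen for illustration, not the definitions
from linux/vfio.h, and no ioctl is issued.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the fields the patch fills in. */
struct irq_set_sketch {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t start;
    uint32_t count;
};

/* Illustrative flag values only. */
enum {
    SKETCH_DATA_NONE     = 1u << 0,
    SKETCH_ACTION_MASK   = 1u << 3,
    SKETCH_ACTION_UNMASK = 1u << 4
};

/* Build a request that masks or unmasks one interrupt of a given index. */
static struct irq_set_sketch make_single_irq_request(uint32_t action,
                                                     uint32_t index)
{
    struct irq_set_sketch req = {
        .argsz = sizeof(req),
        .flags = SKETCH_DATA_NONE | action,
        .index = index,
        .start = 0,
        .count = 1,
    };

    assert(action == SKETCH_ACTION_MASK || action == SKETCH_ACTION_UNMASK);
    return req;
}
```

Existing callers simply keep passing the INTx index, so their behavior does
not change, while other index values become possible.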
 hw/vfio/pci.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 0d7d4a0..387da1a 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -293,12 +293,12 @@ static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index)
 /*
  * INTx
  */
-static void vfio_unmask_intx(VFIOPCIDevice *vdev)
+static void vfio_unmask_single_irqindex(VFIOPCIDevice *vdev, int index)
 {
 struct vfio_irq_set irq_set = {
 .argsz = sizeof(irq_set),
 .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
-.index = VFIO_PCI_INTX_IRQ_INDEX,
+.index = index,
 .start = 0,
 .count = 1,
 };
@@ -307,12 +307,12 @@ static void vfio_unmask_intx(VFIOPCIDevice *vdev)
 }
 
 #ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */
-static void vfio_mask_intx(VFIOPCIDevice *vdev)
+static void vfio_mask_single_irqindex(VFIOPCIDevice *vdev, int index)
 {
 struct vfio_irq_set irq_set = {
 .argsz = sizeof(irq_set),
 .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
-.index = VFIO_PCI_INTX_IRQ_INDEX,
+.index = index,
 .start = 0,
 .count = 1,
 };
@@ -381,7 +381,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev)
 
 vdev->intx.pending = false;
 pci_irq_deassert(&vdev->pdev);
-vfio_unmask_intx(vdev);
+vfio_unmask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
 }
 
 static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
@@ -404,7 +404,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
 
 /* Get to a known interrupt state */
 qemu_set_fd_handler(irqfd.fd, NULL, NULL, vdev);
-vfio_mask_intx(vdev);
+vfio_mask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
 vdev->intx.pending = false;
 pci_irq_deassert(&vdev->pdev);
 
@@ -442,7 +442,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
 }
 
 /* Let'em rip */
-vfio_unmask_intx(vdev);
+vfio_unmask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
 
 vdev->intx.kvm_accel = true;
 
@@ -458,7 +458,7 @@ fail_irqfd:
 event_notifier_cleanup(&vdev->intx.unmask);
 fail:
 qemu_set_fd_handler(irqfd.fd, vfio_intx_interrupt, NULL, vdev);
-vfio_unmask_intx(vdev);
+vfio_unmask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
 #endif
 }
 
@@ -479,7 +479,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
  * Get to a known state, hardware masked, QEMU ready to accept new
  * interrupts, QEMU IRQ de-asserted.
  */
-vfio_mask_intx(vdev);
+vfio_mask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
 vdev->intx.pending = false;
 pci_irq_deassert(&vdev->pdev);
 
@@ -497,7 +497,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
 vdev->intx.kvm_accel = false;
 
 /* If we've missed an event, let it re-fire through QEMU */
-vfio_unmask_intx(vdev);
+vfio_unmask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX);
 
 trace_vfio_disable_intx_kvm(vdev->host.domain, vdev->host.bus,
 vdev->host.slot, vdev->host.function);
-- 
1.8.3.2

[Qemu-devel] [PATCH v8 06/19] hw/vfio/pci: handle reset at VFIODevice

2014-11-30 Thread Eric Auger

Since we can potentially have both PCI and platform devices in
the same VFIO group, the group now owns a list of VFIODevices.
A unified reset handler, vfio_reset_handler, is registered; it loops
through this VFIODevice list. Two specialized operations are introduced
(vfio_compute_needs_reset and vfio_hot_reset_multi) to allow
type-specific behavior. Also, the reset_works and needs_reset
VFIOPCIDevice fields are moved into VFIODevice.

Signed-off-by: Eric Auger 

---

v8: compared to [PATCH v7 03/16] hw/vfio/pci: introduce VFIODevice,
vfio_compute_needs_reset does not return a bool anymore.
---
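The dispatch scheme can be sketched as below. This is an assumption-laden
illustration: all names are hypothetical, the list is a plain singly linked
list instead of QLIST, and the two-pass order in reset_all (compute first,
then reset) is an assumption of this sketch, not something the message above
states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { DEV_TYPE_PCI_SKETCH = 0 };

struct dev_sketch;

/* Type-specific operations, filled in by each device type. */
struct dev_ops_sketch {
    void (*compute_needs_reset)(struct dev_sketch *d);
    int (*hot_reset_multi)(struct dev_sketch *d);
};

/* Base device, linked into the list owned by the group. */
struct dev_sketch {
    struct dev_sketch *next;
    int type;
    bool needs_reset;
    const struct dev_ops_sketch *ops;
};

struct pci_dev_sketch {
    int resets_done;
    struct dev_sketch base;
};

/* Recover the enclosing struct from a pointer to its embedded member. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void pci_compute_needs_reset(struct dev_sketch *d)
{
    d->needs_reset = true;
}

static int pci_hot_reset_multi(struct dev_sketch *d)
{
    struct pci_dev_sketch *p =
        container_of_sketch(d, struct pci_dev_sketch, base);

    assert(d->type == DEV_TYPE_PCI_SKETCH);
    p->resets_done++;
    d->needs_reset = false;
    return 0;
}

static const struct dev_ops_sketch pci_ops_sketch = {
    pci_compute_needs_reset,
    pci_hot_reset_multi,
};

/* Unified handler: returns the number of devices that were reset. */
static int reset_all(struct dev_sketch *head)
{
    struct dev_sketch *d;
    int count = 0;

    for (d = head; d; d = d->next) {
        d->ops->compute_needs_reset(d);
    }
    for (d = head; d; d = d->next) {
        if (d->needs_reset && d->ops->hot_reset_multi(d) == 0) {
            count++;
        }
    }
    return count;
}
```

The handler never inspects the concrete type; each callback converts the base
pointer back to its own struct, as the container_of calls in the diff do.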
 hw/vfio/pci.c | 93 ---
 1 file changed, 63 insertions(+), 30 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 157e1a5..e68865b 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -189,13 +189,24 @@ typedef struct VFIOMSIXInfo {
 void *mmap;
 } VFIOMSIXInfo;
 
+typedef struct VFIODeviceOps VFIODeviceOps;
+
 typedef struct VFIODevice {
+QLIST_ENTRY(VFIODevice) next;
 struct VFIOGroup *group;
 char *name;
 int fd;
 int type;
+bool reset_works;
+bool needs_reset;
+VFIODeviceOps *ops;
 } VFIODevice;
 
+struct VFIODeviceOps {
+void (*vfio_compute_needs_reset)(VFIODevice *vdev);
+int (*vfio_hot_reset_multi)(VFIODevice *vdev);
+};
+
 typedef struct VFIOPCIDevice {
 PCIDevice pdev;
 VFIODevice vbasedev;
@@ -214,19 +225,16 @@ typedef struct VFIOPCIDevice {
 VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
 VFIOVGA vga; /* 0xa, 0x3b0, 0x3c0 */
 PCIHostDeviceAddress host;
-QLIST_ENTRY(VFIOPCIDevice) next;
 EventNotifier err_notifier;
 uint32_t features;
 #define VFIO_FEATURE_ENABLE_VGA_BIT 0
 #define VFIO_FEATURE_ENABLE_VGA (1 << VFIO_FEATURE_ENABLE_VGA_BIT)
 int32_t bootindex;
 uint8_t pm_cap;
-bool reset_works;
 bool has_vga;
 bool pci_aer;
 bool has_flr;
 bool has_pm_reset;
-bool needs_reset;
 bool rom_read_failed;
 } VFIOPCIDevice;
 
@@ -234,7 +242,7 @@ typedef struct VFIOGroup {
 int fd;
 int groupid;
 VFIOContainer *container;
-QLIST_HEAD(, VFIOPCIDevice) device_list;
+QLIST_HEAD(, VFIODevice) device_list;
 QLIST_ENTRY(VFIOGroup) next;
 QLIST_ENTRY(VFIOGroup) container_next;
 } VFIOGroup;
@@ -3381,7 +3389,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
  single ? "one" : "multi");
 
 vfio_pci_pre_reset(vdev);
-vdev->needs_reset = false;
+vdev->vbasedev.needs_reset = false;
 
 info = g_malloc0(sizeof(*info));
 info->argsz = sizeof(*info);
@@ -3418,6 +3426,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 for (i = 0; i < info->count; i++) {
 PCIHostDeviceAddress host;
 VFIOPCIDevice *tmp;
+VFIODevice *vbasedev_iter;
 
 host.domain = devices[i].segment;
 host.bus = devices[i].bus;
@@ -3449,7 +3458,11 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 }
 
 /* Prep dependent devices for reset and clear our marker. */
-QLIST_FOREACH(tmp, &group->device_list, next) {
+QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+continue;
+}
+tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
 if (vfio_pci_host_match(&host, &tmp->host)) {
 if (single) {
 error_report("vfio: found another in-use device "
@@ -3459,7 +3472,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 goto out_single;
 }
 vfio_pci_pre_reset(tmp);
-tmp->needs_reset = false;
+tmp->vbasedev.needs_reset = false;
 multi = true;
 break;
 }
@@ -3512,6 +3525,7 @@ out:
 for (i = 0; i < info->count; i++) {
 PCIHostDeviceAddress host;
 VFIOPCIDevice *tmp;
+VFIODevice *vbasedev_iter;
 
 host.domain = devices[i].segment;
 host.bus = devices[i].bus;
@@ -3532,7 +3546,11 @@ out:
 break;
 }
 
-QLIST_FOREACH(tmp, &group->device_list, next) {
+QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+continue;
+}
+tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
 if (vfio_pci_host_match(&host, &tmp->host)) {
 vfio_pci_post_reset(tmp);
 break;
@@ -3566,28 +3584,40 @@ static int vfio_pci_hot_reset_one(VFIOPCIDevice *vdev)
 return vfio_pci_hot_reset(vdev, true);
 }
 
-static int vfio_pci_hot_reset_multi(VFIOPCIDevice *vdev)
+static int vfio_pci_hot_reset_multi(VFIODevice *vbasedev)
 {
+VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDev

[Qemu-devel] [PATCH v8 07/19] hw/vfio/pci: Introduce VFIORegion

2014-11-30 Thread Eric Auger

This structure is going to be shared by VFIOPCIDevice and
VFIOPlatformDevice. VFIOBAR includes it.

vfio_eoi becomes an op of VFIODevice, specialized by the parent device.
This makes it possible to transform vfio_bar_write/read into generic
vfio_region_write/read that will be used by VFIOPlatformDevice too.

vfio_mmap_bar becomes vfio_map_region

Signed-off-by: Eric Auger 

---

v7->v8:
- integrate "Add skip_dump flag to ignore memory region during dump"

v4->v5:
- remove fd field from VFIORegion
- change error_report format string in vfio_region_write/read
- remove #ifdef DEBUG_VFIO in the same function
- correct missing initialization of bar region's vbasedev field
- change Object * parameter name of vfio_mmap_region and remove
  useless OBJECT()

Conflicts:
hw/vfio/pci.c

---
 hw/vfio/pci.c | 193 ++
 trace-events  |   4 +-
 2 files changed, 103 insertions(+), 94 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e68865b..10c1697 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -77,15 +77,19 @@ typedef struct VFIOQuirk {
 } data;
 } VFIOQuirk;
 
-typedef struct VFIOBAR {
-off_t fd_offset; /* offset of BAR within device fd */
-int fd; /* device fd, allows us to pass VFIOBAR as opaque data */
+typedef struct VFIORegion {
+struct VFIODevice *vbasedev;
+off_t fd_offset; /* offset of region within device fd */
 MemoryRegion mem; /* slow, read/write access */
 MemoryRegion mmap_mem; /* direct mapped access */
 void *mmap;
 size_t size;
 uint32_t flags; /* VFIO region flags (rd/wr/mmap) */
-uint8_t nr; /* cache the BAR number for debug */
+uint8_t nr; /* cache the region number for debug */
+} VFIORegion;
+
+typedef struct VFIOBAR {
+VFIORegion region;
 bool ioport;
 bool mem64;
 QLIST_HEAD(, VFIOQuirk) quirks;
@@ -205,6 +209,7 @@ typedef struct VFIODevice {
 struct VFIODeviceOps {
 void (*vfio_compute_needs_reset)(VFIODevice *vdev);
 int (*vfio_hot_reset_multi)(VFIODevice *vdev);
+void (*vfio_eoi)(VFIODevice *vdev);
 };
 
 typedef struct VFIOPCIDevice {
@@ -388,8 +393,10 @@ static void vfio_intx_interrupt(void *opaque)
 }
 }
 
-static void vfio_eoi(VFIOPCIDevice *vdev)
+static void vfio_eoi(VFIODevice *vbasedev)
 {
+VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+
 if (!vdev->intx.pending) {
 return;
 }
@@ -399,7 +406,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev)
 
 vdev->intx.pending = false;
 pci_irq_deassert(&vdev->pdev);
-vfio_unmask_single_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
+vfio_unmask_single_irqindex(vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 }
 
 static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
@@ -552,7 +559,7 @@ static void vfio_update_irq(PCIDevice *pdev)
 vfio_enable_intx_kvm(vdev);
 
 /* Re-enable the interrupt in cased we missed an EOI */
-vfio_eoi(vdev);
+vfio_eoi(&vdev->vbasedev);
 }
 
 static int vfio_enable_intx(VFIOPCIDevice *vdev)
@@ -1089,10 +1096,11 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
 /*
  * IO Port/MMIO - Beware of the endians, VFIO is always little endian
  */
-static void vfio_bar_write(void *opaque, hwaddr addr,
-   uint64_t data, unsigned size)
+static void vfio_region_write(void *opaque, hwaddr addr,
+  uint64_t data, unsigned size)
 {
-VFIOBAR *bar = opaque;
+VFIORegion *region = opaque;
+VFIODevice *vbasedev = region->vbasedev;
 union {
 uint8_t byte;
 uint16_t word;
@@ -1115,20 +1123,14 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
 break;
 }
 
-if (pwrite(bar->fd, &buf, size, bar->fd_offset + addr) != size) {
-error_report("%s(,0x%"HWADDR_PRIx", 0x%"PRIx64", %d) failed: %m",
- __func__, addr, data, size);
+if (pwrite(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
+error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
+ ",%d) failed: %m",
+ __func__, vbasedev->name, region->nr,
+ addr, data, size);
 }
 
-#ifdef DEBUG_VFIO
-{
-VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar->nr]);
-
-trace_vfio_bar_write(vdev->host.domain, vdev->host.bus,
- vdev->host.slot, vdev->host.function,
- region->nr, addr, data, size);
-}
-#endif
+trace_vfio_region_write(vbasedev->name, region->nr, addr, data, size);
 
 /*
  * A read or write to a BAR always signals an INTx EOI.  This will
@@ -1138,13 +1140,14 @@ static void vfio_bar_write(void *opaque, hwaddr addr,
  * which access will service the interrupt, so we're potentially
  * getting quite a few host interrupts per guest interrupt.
  */
-vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar->nr]));
+vbasede

[Qemu-devel] [PATCH v8 08/19] hw/vfio/pci: split vfio_get_device

2014-11-30 Thread Eric Auger

vfio_get_device now takes a VFIODevice as argument. The function is split
into two parts: vfio_get_device, which is generic, and vfio_populate_device,
which is bus specific.

Three new fields (num_irqs, num_regions and flags) are introduced in
VFIODevice to store dev_info.

vfio_put_base_device is created.
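
A rough sketch of the split, under stated assumptions: the ioctl is replaced by a hypothetical fake_dev_info argument, FAKE_FLAG_PCI and FAKE_MIN_REGIONS are illustrative values rather than the kernel's, and get_device/pci_populate are stand-ins for the real functions.

```c
#include <assert.h>

typedef struct VFIODevice VFIODevice;

typedef struct VFIODeviceOps {
    int (*vfio_populate_device)(VFIODevice *vbasedev);
} VFIODeviceOps;

struct VFIODevice {
    unsigned int num_irqs;
    unsigned int num_regions;
    unsigned int flags;
    const VFIODeviceOps *ops;
};

/* Hypothetical stand-in for what VFIO_DEVICE_GET_INFO would report. */
struct fake_dev_info {
    unsigned int flags;
    unsigned int num_regions;
    unsigned int num_irqs;
};

#define FAKE_FLAG_PCI    0x2u /* illustrative value only */
#define FAKE_MIN_REGIONS 1u   /* illustrative threshold only */

/* Bus-specific part: sanity checks on the cached info. */
static int pci_populate(VFIODevice *vbasedev)
{
    if (!(vbasedev->flags & FAKE_FLAG_PCI)) {
        return -1;
    }
    if (vbasedev->num_regions < FAKE_MIN_REGIONS) {
        return -1;
    }
    return 0;
}

static const VFIODeviceOps pci_ops = {
    .vfio_populate_device = pci_populate,
};

/* Generic part: cache the device info, then defer to the bus op. */
static int get_device(VFIODevice *vbasedev, const struct fake_dev_info *info)
{
    vbasedev->flags = info->flags;
    vbasedev->num_regions = info->num_regions;
    vbasedev->num_irqs = info->num_irqs;
    return vbasedev->ops->vfio_populate_device(vbasedev);
}
```

The generic function never inspects the flags itself, so a platform device only has to provide a different populate op.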

---

v5->v6:
- simplifies the split for vfio_get_device:
  vfio_check_device, vfio_populate_regions, vfio_populate_interrupts
  are now gathered into a unique specialization function dubbed
  vfio_populate_device

v4->v5:
- clean up error handling and get/put operations in
  vfio_check_device, vfio_populate_regions, vfio_populate_interrupts and
  vfio_get_device.
  - correct misuse of errno
  - vfio_populate_regions always returns 0
  - VFIODevice .name deallocation done in vfio_put_device instead of
vfio_put_base_device
  - vfio_put_base_device done at vfio_get_device level.

Signed-off-by: Eric Auger 
---
 hw/vfio/pci.c | 130 +++---
 trace-events  |  10 ++---
 2 files changed, 83 insertions(+), 57 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 10c1697..60ff22b 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -204,12 +204,16 @@ typedef struct VFIODevice {
 bool reset_works;
 bool needs_reset;
 VFIODeviceOps *ops;
+unsigned int num_irqs;
+unsigned int num_regions;
+unsigned int flags;
 } VFIODevice;
 
 struct VFIODeviceOps {
 void (*vfio_compute_needs_reset)(VFIODevice *vdev);
 int (*vfio_hot_reset_multi)(VFIODevice *vdev);
 void (*vfio_eoi)(VFIODevice *vdev);
+int (*vfio_populate_device)(VFIODevice *vdev);
 };
 
 typedef struct VFIOPCIDevice {
@@ -296,6 +300,8 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
 static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
   uint32_t val, int len);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
+static void vfio_put_base_device(VFIODevice *vbasedev);
+static int vfio_populate_device(VFIODevice *vbasedev);
 
 /*
  * Common VFIO interrupt disable
@@ -3610,6 +3616,7 @@ static VFIODeviceOps vfio_pci_ops = {
 .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
 .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
 .vfio_eoi = vfio_eoi,
+.vfio_populate_device = vfio_populate_device,
 };
 
 static void vfio_reset_handler(void *opaque)
@@ -3951,70 +3958,45 @@ static void vfio_put_group(VFIOGroup *group)
 }
 }
 
-static int vfio_get_device(VFIOGroup *group, const char *name,
-   VFIOPCIDevice *vdev)
+static int vfio_populate_device(VFIODevice *vbasedev)
 {
-struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
+VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
 struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
 struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
-int ret, i;
-
-ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
-if (ret < 0) {
-error_report("vfio: error getting device %s from group %d: %m",
- name, group->groupid);
-error_printf("Verify all devices in group %d are bound to vfio-pci "
- "or pci-stub and not already in use\n", group->groupid);
-return ret;
-}
-
-vdev->vbasedev.fd = ret;
-vdev->vbasedev.group = group;
-QLIST_INSERT_HEAD(&group->device_list, &vdev->vbasedev, next);
+int i, ret = -1;
 
 /* Sanity check device */
-ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_INFO, &dev_info);
-if (ret) {
-error_report("vfio: error getting device info: %m");
-goto error;
-}
-
-trace_vfio_get_device_irq(name, dev_info.flags,
-  dev_info.num_regions, dev_info.num_irqs);
-
-if (!(dev_info.flags & VFIO_DEVICE_FLAGS_PCI)) {
+if (!(vbasedev->flags & VFIO_DEVICE_FLAGS_PCI)) {
 error_report("vfio: Um, this isn't a PCI device");
 goto error;
 }
 
-vdev->vbasedev.reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
-
-if (dev_info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
+if (vbasedev->num_regions < VFIO_PCI_CONFIG_REGION_INDEX + 1) {
 error_report("vfio: unexpected number of io regions %u",
- dev_info.num_regions);
+ vbasedev->num_regions);
 goto error;
 }
 
-if (dev_info.num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
-error_report("vfio: unexpected number of irqs %u", dev_info.num_irqs);
+if (vbasedev->num_irqs < VFIO_PCI_MSIX_IRQ_INDEX + 1) {
+error_report("vfio: unexpected number of irqs %u", vbasedev->num_irqs);
 goto error;
 }
 
 for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
 reg_info.index = i;
 
-ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, ®_info);
+ret = ioctl(vbasedev->fd, VFIO_DEVICE_GE

[Qemu-devel] [PATCH v8 16/19] hw/vfio/platform: Add irqfd support

2014-11-30 Thread Eric Auger

This patch aims at optimizing IRQ handling using the irqfd framework.

Instead of handling the eventfds on user side, they are handled on
kernel side using
- the KVM irqfd framework,
- the VFIO driver virqfd framework.

The virtual IRQ completion is trapped at the interrupt controller.
This removes the need for fast/slow path swap.

Overall this brings significant performance improvements.

It depends on host kernel KVM irqfd support.
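
The choice between the two injection paths, made once at realize time, can be sketched as below. This is a simplified illustration: the three booleans stand in for kvm_irqfds_enabled(), kvm_resamplefds_enabled() and the irqfd_allowed property, and select_start_fn and the two start functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct VFIOINTp {
    bool kvm_accel;
} VFIOINTp;

typedef int (*start_irq_fn_t)(VFIOINTp *intp);

/* User-side path: QEMU handles the eventfd itself. */
static int start_eventfd_injection(VFIOINTp *intp)
{
    intp->kvm_accel = false;
    return 0;
}

/* Kernel-side path: KVM and the VFIO driver talk through eventfds. */
static int start_irqfd_injection(VFIOINTp *intp)
{
    intp->kvm_accel = true;
    return 0;
}

/* Use irqfd only when every prerequisite holds, else fall back. */
static start_irq_fn_t select_start_fn(bool irqfds, bool resamplefds,
                                      bool irqfd_allowed)
{
    if (irqfds && resamplefds && irqfd_allowed) {
        return start_irqfd_injection;
    }
    return start_eventfd_injection;
}
```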

Signed-off-by: Alvise Rigo 
Signed-off-by: Eric Auger 

---
v5 -> v6
- rely on kvm_irqfds_enabled() and kvm_resamplefds_enabled()
- guard KVM code with #ifdef CONFIG_KVM

v3 -> v4:
[Alvise Rigo]
Use of VFIO Platform driver v6 unmask/virqfd feature and removal
of resamplefd handler. Physical IRQ unmasking is now done in
VFIO driver.

v3:
[Eric Auger]
initial support with resamplefd handled on QEMU side since the
unmask was not supported on VFIO platform driver v5.

Conflicts:
hw/vfio/platform.c
---
 hw/vfio/platform.c  | 96 +
 include/hw/vfio/vfio-platform.h |  1 +
 trace-events|  2 +
 3 files changed, 99 insertions(+)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 41f8693..97d98bf 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -25,6 +25,7 @@
 #include "hw/sysbus.h"
 #include "trace.h"
 #include "hw/platform-bus.h"
+#include "sysemu/kvm.h"
 
 static void vfio_intp_interrupt(VFIOINTp *intp);
 typedef void (*eventfd_user_side_handler_t)(VFIOINTp *intp);
@@ -236,6 +237,83 @@ static int vfio_start_eventfd_injection(VFIOINTp *intp)
 }
 
 /*
+ * Functions used for irqfd
+ */
+
+#ifdef CONFIG_KVM
+
+/**
+ * vfio_set_resample_eventfd - sets the resamplefd for an IRQ
+ * @intp: the IRQ struct pointer
+ * programs the VFIO driver to unmask this IRQ when the
+ * intp->unmask eventfd is triggered
+ */
+static int vfio_set_resample_eventfd(VFIOINTp *intp)
+{
+VFIODevice *vbasedev = &intp->vdev->vbasedev;
+struct vfio_irq_set *irq_set;
+int argsz, ret;
+int32_t *pfd;
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+irq_set = g_malloc0(argsz);
+irq_set->argsz = argsz;
+irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_UNMASK;
+irq_set->index = intp->pin;
+irq_set->start = 0;
+irq_set->count = 1;
+pfd = (int32_t *)&irq_set->data;
+*pfd = event_notifier_get_fd(&intp->unmask);
+qemu_set_fd_handler(*pfd, NULL, NULL, intp);
+ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+if (ret < 0) {
+error_report("vfio: Failed to set resample eventfd: %m");
+qemu_set_fd_handler(*pfd, NULL, NULL, NULL);
+}
+g_free(irq_set);
+return ret;
+}
+
+/**
+ * vfio_start_irqfd_injection - starts irqfd injection for an IRQ
+ * programs VFIO driver with both the trigger and resamplefd
+ * programs KVM with the gsi, trigger & resample eventfds
+ */
+static int vfio_start_irqfd_injection(VFIOINTp *intp)
+{
+struct kvm_irqfd irqfd = {
+.fd = event_notifier_get_fd(&intp->interrupt),
+.resamplefd = event_notifier_get_fd(&intp->unmask),
+.gsi = intp->virtualID,
+.flags = KVM_IRQFD_FLAG_RESAMPLE,
+};
+
+if (kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd)) {
+error_report("vfio: Error: Failed to assign the irqfd: %m");
+goto fail_irqfd;
+}
+if (vfio_set_trigger_eventfd(intp, NULL) < 0) {
+goto fail_vfio;
+}
+if (vfio_set_resample_eventfd(intp) < 0) {
+goto fail_vfio;
+}
+
+intp->kvm_accel = true;
+trace_vfio_platform_start_irqfd_injection(intp->pin, intp->virtualID,
+ irqfd.fd, irqfd.resamplefd);
+return 0;
+
+fail_vfio:
+irqfd.flags = KVM_IRQFD_FLAG_DEASSIGN;
+kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd);
+fail_irqfd:
+return -1;
+}
+
+#endif
+
+/*
  * Functions used whatever the injection method
  */
 
@@ -314,6 +392,13 @@ static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev, unsigned int index)
 error_report("vfio: Error: trigger event_notifier_init failed ");
 return NULL;
 }
+/* Get an eventfd for resample/unmask */
+ret = event_notifier_init(&intp->unmask, 0);
+if (ret) {
+g_free(intp);
+error_report("vfio: Error: resample event_notifier_init failed eoi");
+return NULL;
+}
 
 /* store the new intp in qlist */
 QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
@@ -520,7 +605,17 @@ static void vfio_platform_realize(DeviceState *dev, Error **errp)
 
 vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
 vbasedev->ops = &vfio_platform_ops;
+
+#ifdef CONFIG_KVM
+if (kvm_irqfds_enabled() && kvm_resamplefds_enabled() &&
+vdev->irqfd_allowed) {
+vdev->start_irq_fn = vfio_start_irqfd_injection;
+} else {
+vdev->start_irq_fn = vfio_start_eventfd_injection;
+}
+#else
 vdev->start_irq_fn = vfio_start_eventfd_injection;
+#endif
 
 trace_vfio_platform_realize(vbasedev->name, vdev->c

[Qemu-devel] [PATCH v8 09/19] hw/vfio/pci: rename group_list into vfio_group_list

2014-11-30 Thread Eric Auger

Rename group_list to vfio_group_list to better fit the rest of the
namespace.

Signed-off-by: Eric Auger 
---
 hw/vfio/pci.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 60ff22b..d4a0e0f 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -282,7 +282,7 @@ static const VFIORomBlacklistEntry romblacklist[] = {
 #define MSIX_CAP_LENGTH 12
 
 static QLIST_HEAD(, VFIOGroup)
-group_list = QLIST_HEAD_INITIALIZER(group_list);
+vfio_group_list = QLIST_HEAD_INITIALIZER(vfio_group_list);
 
 #ifdef CONFIG_KVM
 /*
@@ -3454,7 +3454,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 continue;
 }
 
-QLIST_FOREACH(group, &group_list, next) {
+QLIST_FOREACH(group, &vfio_group_list, next) {
 if (group->groupid == devices[i].group_id) {
 break;
 }
@@ -3501,7 +3501,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 
 /* Determine how many group fds need to be passed */
 count = 0;
-QLIST_FOREACH(group, &group_list, next) {
+QLIST_FOREACH(group, &vfio_group_list, next) {
 for (i = 0; i < info->count; i++) {
 if (group->groupid == devices[i].group_id) {
 count++;
@@ -3515,7 +3515,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 fds = &reset->group_fds[0];
 
 /* Fill in group fds */
-QLIST_FOREACH(group, &group_list, next) {
+QLIST_FOREACH(group, &vfio_group_list, next) {
 for (i = 0; i < info->count; i++) {
 if (group->groupid == devices[i].group_id) {
 fds[reset->count++] = group->fd;
@@ -3550,7 +3550,7 @@ out:
 continue;
 }
 
-QLIST_FOREACH(group, &group_list, next) {
+QLIST_FOREACH(group, &vfio_group_list, next) {
 if (group->groupid == devices[i].group_id) {
 break;
 }
@@ -3624,13 +3624,13 @@ static void vfio_reset_handler(void *opaque)
 VFIOGroup *group;
 VFIODevice *vbasedev;
 
-QLIST_FOREACH(group, &group_list, next) {
+QLIST_FOREACH(group, &vfio_group_list, next) {
 QLIST_FOREACH(vbasedev, &group->device_list, next) {
 vbasedev->ops->vfio_compute_needs_reset(vbasedev);
 }
 }
 
-QLIST_FOREACH(group, &group_list, next) {
+QLIST_FOREACH(group, &vfio_group_list, next) {
 QLIST_FOREACH(vbasedev, &group->device_list, next) {
 if (vbasedev->needs_reset) {
 vbasedev->ops->vfio_hot_reset_multi(vbasedev);
@@ -3879,7 +3879,7 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
 char path[32];
 struct vfio_group_status status = { .argsz = sizeof(status) };
 
-QLIST_FOREACH(group, &group_list, next) {
+QLIST_FOREACH(group, &vfio_group_list, next) {
 if (group->groupid == groupid) {
 /* Found it.  Now is it already in the right context? */
 if (group->container->space->as == as) {
@@ -3921,11 +3921,11 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as)
 goto close_fd_exit;
 }
 
-if (QLIST_EMPTY(&group_list)) {
+if (QLIST_EMPTY(&vfio_group_list)) {
 qemu_register_reset(vfio_reset_handler, NULL);
 }
 
-QLIST_INSERT_HEAD(&group_list, group, next);
+QLIST_INSERT_HEAD(&vfio_group_list, group, next);
 
 vfio_kvm_device_add_group(group);
 
@@ -3953,7 +3953,7 @@ static void vfio_put_group(VFIOGroup *group)
 close(group->fd);
 g_free(group);
 
-if (QLIST_EMPTY(&group_list)) {
+if (QLIST_EMPTY(&vfio_group_list)) {
 qemu_unregister_reset(vfio_reset_handler, NULL);
 }
 }
-- 
1.8.3.2

[Qemu-devel] [PATCH v8 10/19] hw/vfio/pci: use name field in format strings

2014-11-30 Thread Eric Auger

Signed-off-by: Eric Auger 

Conflicts:
trace-events
---
 hw/vfio/pci.c | 213 --
 trace-events  | 109 --
 2 files changed, 116 insertions(+), 206 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d4a0e0f..6e15c8a 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -386,9 +386,7 @@ static void vfio_intx_interrupt(void *opaque)
 return;
 }
 
-trace_vfio_intx_interrupt(vdev->host.domain, vdev->host.bus,
-  vdev->host.slot, vdev->host.function,
-  'A' + vdev->intx.pin);
+trace_vfio_intx_interrupt(vdev->vbasedev.name, 'A' + vdev->intx.pin);
 
 vdev->intx.pending = true;
 pci_irq_assert(&vdev->pdev);
@@ -407,8 +405,7 @@ static void vfio_eoi(VFIODevice *vbasedev)
 return;
 }
 
-trace_vfio_eoi(vdev->host.domain, vdev->host.bus,
-   vdev->host.slot, vdev->host.function);
+trace_vfio_eoi(vbasedev->name);
 
 vdev->intx.pending = false;
 pci_irq_deassert(&vdev->pdev);
@@ -477,8 +474,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev)
 
 vdev->intx.kvm_accel = true;
 
-trace_vfio_enable_intx_kvm(vdev->host.domain, vdev->host.bus,
-   vdev->host.slot, vdev->host.function);
+trace_vfio_enable_intx_kvm(vdev->vbasedev.name);
 
 return;
 
@@ -530,8 +526,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev)
 /* If we've missed an event, let it re-fire through QEMU */
 vfio_unmask_single_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
 
-trace_vfio_disable_intx_kvm(vdev->host.domain, vdev->host.bus,
-vdev->host.slot, vdev->host.function);
+trace_vfio_disable_intx_kvm(vdev->vbasedev.name);
 #endif
 }
 
@@ -550,8 +545,7 @@ static void vfio_update_irq(PCIDevice *pdev)
 return; /* Nothing changed */
 }
 
-trace_vfio_update_irq(vdev->host.domain, vdev->host.bus,
-  vdev->host.slot, vdev->host.function,
+trace_vfio_update_irq(vdev->vbasedev.name,
   vdev->intx.route.irq, route.irq);
 
 vfio_disable_intx_kvm(vdev);
@@ -627,8 +621,7 @@ static int vfio_enable_intx(VFIOPCIDevice *vdev)
 
 vdev->interrupt = VFIO_INT_INTx;
 
-trace_vfio_enable_intx(vdev->host.domain, vdev->host.bus,
-   vdev->host.slot, vdev->host.function);
+trace_vfio_enable_intx(vdev->vbasedev.name);
 
 return 0;
 }
@@ -650,8 +643,7 @@ static void vfio_disable_intx(VFIOPCIDevice *vdev)
 
 vdev->interrupt = VFIO_INT_NONE;
 
-trace_vfio_disable_intx(vdev->host.domain, vdev->host.bus,
-vdev->host.slot, vdev->host.function);
+trace_vfio_disable_intx(vdev->vbasedev.name);
 }
 
 /*
@@ -678,9 +670,7 @@ static void vfio_msi_interrupt(void *opaque)
 abort();
 }
 
-trace_vfio_msi_interrupt(vdev->host.domain, vdev->host.bus,
- vdev->host.slot, vdev->host.function,
- nr, msg.address, msg.data);
+trace_vfio_msi_interrupt(vbasedev->name, nr, msg.address, msg.data);
 #endif
 
 if (vdev->interrupt == VFIO_INT_MSIX) {
@@ -787,9 +777,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
 VFIOMSIVector *vector;
 int ret;
 
-trace_vfio_msix_vector_do_use(vdev->host.domain, vdev->host.bus,
-  vdev->host.slot, vdev->host.function,
-  nr);
+trace_vfio_msix_vector_do_use(vdev->vbasedev.name, nr);
 
 vector = &vdev->msi_vectors[nr];
 
@@ -875,9 +863,7 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
 VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
 VFIOMSIVector *vector = &vdev->msi_vectors[nr];
 
-trace_vfio_msix_vector_release(vdev->host.domain, vdev->host.bus,
-   vdev->host.slot, vdev->host.function,
-   nr);
+trace_vfio_msix_vector_release(vdev->vbasedev.name, nr);
 
 /*
  * There are still old guests that mask and unmask vectors on every
@@ -940,8 +926,7 @@ static void vfio_enable_msix(VFIOPCIDevice *vdev)
 error_report("vfio: msix_set_vector_notifiers failed");
 }
 
-trace_vfio_enable_msix(vdev->host.domain, vdev->host.bus,
-   vdev->host.slot, vdev->host.function);
+trace_vfio_enable_msix(vdev->vbasedev.name);
 }
 
 static void vfio_enable_msi(VFIOPCIDevice *vdev)
@@ -1017,9 +1002,7 @@ retry:
 return;
 }
 
-trace_vfio_enable_msi(vdev->host.domain, vdev->host.bus,
-  vdev->host.slot, vdev->host.function,
-  vdev->nr_vectors);
+trace_vfio_enable_msi(vdev->vbasedev.name, vdev->nr_vectors);
 }
 
 static void vfio_disable_msi_common(VFIOPCIDevice *vdev)
@@ -1069,8 +1052,7 @@ static void vfio_d

[Qemu-devel] [PATCH v8 12/19] hw/vfio/platform: add vfio-platform support

2014-11-30 Thread Eric Auger

Minimal VFIO platform implementation supporting
- register space user mapping,
- IRQ assignment based on eventfds handled on qemu side.

irqfd kernel acceleration comes in a subsequent patch.
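
As background for the user-side path, here is a minimal sketch of the eventfd counter semantics it relies on, assuming a Linux host. signal_eventfd plays the role of the kernel driver signalling a physical IRQ and drain_eventfd the role of the QEMU handler; both helper names are hypothetical and are not part of this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal the eventfd once, as a driver would on a physical IRQ. */
static int signal_eventfd(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Read and reset the counter. Returns the number of signals received
 * since the last read, or 0 if none are pending (or on error).
 */
static uint64_t drain_eventfd(int fd)
{
    uint64_t count = 0;

    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return 0;
    }
    return count;
}
```

Several signals arriving before the handler runs collapse into one read, which is one reason the user-side handler has to track IRQ state itself.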

Signed-off-by: Kim Phillips 
Signed-off-by: Eric Auger 

---
v7 -> v8:
- change prototype of vfio_platform_compute_needs_reset and set
  vbasedev->needs_reset to false there
- vfio_[un]mask_irqindex renamed into vfio_[un]mask_single_irqindex
- vfio_register_irq_starter renamed into vfio_kick_irqs.
  We now use a reset notifier instead of a machine init done notifier.
  This gets rid of the VfioIrqStarterNotifierParams dangling
  pointer. Previously we used pbus first_irq. This is no longer possible
  since the reset notifier takes a void * and first_irq is a field of
  a const struct. So now we pass the DeviceState handle of the
  interrupt controller. I tried to keep the code generic, which is why
  I did not rely on an architecture-specific accessor to retrieve
  the gsi number (gic accessor as proposed by Alex). I would like to
  avoid creating an ARM VFIO device model. I hope this
  model can work on archs other than ARM (no multiple intc?).
  Wouldn't it be simpler to keep the previous first_irq parameter and
  relax the const constraint?

v6 -> v7:
- compat is not exposed anymore as a user option. Rationale is
  the vfio device became abstract and a specialization is needed
  anyway. The derived device must set the compat string.
- in v6 vfio_start_irq_injection was exposed in vfio-platform.h.
  A new function dubbed vfio_register_irq_starter replaces it. It
  registers a machine init done notifier that programs & starts
  all dynamic VFIO device IRQs. This function is supposed to be
  called by the machine file. A set of static helper routines are
  added too. It must be called before the creation of the platform
  bus device.

v5 -> v6:
- vfio_device property renamed into host property
- correct error handling of VFIO_DEVICE_GET_IRQ_INFO ioctl
  and remove PCI related comment
- remove declaration of vfio_setup_irqfd and irqfd_allowed
  property. Both belong to next patch (irqfd)
- remove declaration of vfio_intp_interrupt in vfio-platform.h
- functions that can be static are now static
- remove declarations of vfio_region_ops, vfio_memory_listener,
  group_list, vfio_address_spaces. All are moved to vfio-common.h
- remove vfio_put_device declaration and definition
- print_regions removed. code moved into vfio_populate_regions
- replace DPRINTF by trace events
- new helper routine to set the trigger eventfd
- dissociate intp init from the injection enablement:
  vfio_enable_intp renamed into vfio_init_intp and new function
  named vfio_start_eventfd_injection
- injection start moved to vfio_start_irq_injection (not anymore
  in vfio_populate_interrupt)
- new start_irq_fn field in VFIOPlatformDevice corresponding to
  the function that will be used for starting injection
- user handled eventfd:
  x add mutex to protect IRQ state & list manipulation,
  x correct misleading comment in vfio_intp_interrupt.
  x Fix bugs thanks to fake interrupt modality
- VFIOPlatformDeviceClass becomes abstract
- add error_setg in vfio_platform_realize

v4 -> v5:
- vfio-platform.h included first
- cleanup error handling in *populate*, vfio_get_device,
  vfio_enable_intp
- vfio_put_device not called anymore
- add some includes to follow vfio policy

v3 -> v4:
[Eric Auger]
- merge of "vfio: Add initial IRQ support in platform device"
  to get a full functional patch although perfs are limited.
- removal of unrealize function since I currently understand
  it is only used with device hot-plug feature.

v2 -> v3:
[Eric Auger]
- further factorization between PCI and platform (VFIORegion,
  VFIODevice). same level of functionality.

<= v2:
[Kim Phillips]
- Initial Creation of the device supporting register space mapping
---
 hw/vfio/Makefile.objs   |   1 +
 hw/vfio/platform.c  | 629 
 include/hw/vfio/vfio-common.h   |   1 +
 include/hw/vfio/vfio-platform.h |  85 ++
 trace-events|  12 +
 5 files changed, 728 insertions(+)
 create mode 100644 hw/vfio/platform.c
 create mode 100644 include/hw/vfio/vfio-platform.h

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index e31f30e..c5c76fe 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,4 +1,5 @@
 ifeq ($(CONFIG_LINUX), y)
 obj-$(CONFIG_SOFTMMU) += common.o
 obj-$(CONFIG_PCI) += pci.o
+obj-$(CONFIG_SOFTMMU) += platform.o
 endif
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
new file mode 100644
index 000..41f8693
--- /dev/null
+++ b/hw/vfio/platform.c
@@ -0,0 +1,629 @@
+/*
+ * vfio based device assignment support - platform devices
+ *
+ * Copyright Linaro Limited, 2014
+ *
+ * Authors:
+ *  Kim Phillips 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Based on vfio based PCI d

[Qemu-devel] [PATCH v8 14/19] hw/arm/virt: add support for VFIO devices

2014-11-30 Thread Eric Auger

VFIO devices are dynamic sysbus devices. They could already be
instantiated. However, for them to be functional, IRQ injection must
be programmed and started. This programming must happen after the
sysbus devices are attached to the platform bus and IRQs are bound.
Only at that time are the GSIs they are connected to identified, so
that irqfd can be programmed.

Binding happens in a machine init done notifier registered by the
platform bus init. The IRQ start is done in a reset notifier.

This patch adds the registration of the IRQ start notifier in machvirt.
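
The reset-notifier mechanism relied on here can be illustrated with a toy registry. This is only a sketch of the pattern (a callback plus an opaque pointer, invoked on every reset); register_reset, system_reset, FakeIntc and kick_irqs are hypothetical names and not QEMU's implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef void ResetHandler(void *opaque);

#define MAX_HANDLERS 8

static struct {
    ResetHandler *fn;
    void *opaque;
} handlers[MAX_HANDLERS];
static size_t nb_handlers;

/* Remember a callback and the opaque pointer to hand back to it. */
static int register_reset(ResetHandler *fn, void *opaque)
{
    if (nb_handlers == MAX_HANDLERS) {
        return -1;
    }
    handlers[nb_handlers].fn = fn;
    handlers[nb_handlers].opaque = opaque;
    nb_handlers++;
    return 0;
}

/* Run every registered handler, in registration order. */
static void system_reset(void)
{
    for (size_t i = 0; i < nb_handlers; i++) {
        handlers[i].fn(handlers[i].opaque);
    }
}

/* Hypothetical interrupt controller handle passed as the opaque. */
typedef struct FakeIntc {
    int kicked;
} FakeIntc;

static void kick_irqs(void *opaque)
{
    FakeIntc *intc = opaque;

    intc->kicked++;
}
```

Because the opaque is a plain void *, a pointer to a field of a const struct cannot be passed without a cast, which is why the series hands over the interrupt controller's DeviceState instead.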

Signed-off-by: Eric Auger 

---

v7 -> v8:
- vfio_kick_irqs replaces older vfio_register_irq_starter. The new
function registers a reset notifier while the older registered a
machine init done notifier.
- Given the fact platform_bus_first_irq has become part of a const
struct its handle cannot be passed as a void* to the reset notifier.
We now pass the interrupt DeviceState*.
- create_gic now returns the DeviceState handle of the gic so that it
can be passed to the reset notifier registration
---
 hw/arm/virt.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 37326a9..346b04a 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -44,6 +44,7 @@
 #include "qemu/error-report.h"
 #include "hw/arm/sysbus-fdt.h"
 #include "hw/platform-bus.h"
+#include "hw/vfio/vfio-platform.h"
 
 #define NUM_VIRTIO_TRANSPORTS 32
 
@@ -330,7 +331,7 @@ static void fdt_add_gic_node(const VirtBoardInfo *vbi)
 qemu_fdt_setprop_cell(vbi->fdt, "/intc", "phandle", gic_phandle);
 }
 
-static void create_gic(const VirtBoardInfo *vbi, qemu_irq *pic)
+static DeviceState *create_gic(const VirtBoardInfo *vbi, qemu_irq *pic)
 {
 /* We create a standalone GIC v2 */
 DeviceState *gicdev;
@@ -378,6 +379,7 @@ static void create_gic(const VirtBoardInfo *vbi, qemu_irq *pic)
 }
 
 fdt_add_gic_node(vbi);
+return gicdev;
 }
 
 static void create_uart(const VirtBoardInfo *vbi, qemu_irq *pic)
@@ -537,7 +539,8 @@ static void create_flash(const VirtBoardInfo *vbi)
 }
 
 static void create_platform_bus(VirtBoardInfo *vbi, qemu_irq *pic,
-const ARMPlatformBusSystemParams *system_params)
+const ARMPlatformBusSystemParams *system_params,
+DeviceState *gic)
 {
 DeviceState *dev;
 SysBusDevice *s;
@@ -571,6 +574,9 @@ static void create_platform_bus(VirtBoardInfo *vbi, qemu_irq *pic,
 memory_region_add_subregion(sysmem,
 system_params->platform_bus_base,
 sysbus_mmio_get_region(s, 0));
+
+/* setup VFIO signaling/IRQFD for all VFIO platform sysbus devices */
+qemu_register_reset(vfio_kick_irqs, gic);
 }
 
 static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
@@ -589,6 +595,7 @@ static void machvirt_init(MachineState *machine)
 MemoryRegion *ram = g_new(MemoryRegion, 1);
 const char *cpu_model = machine->cpu_model;
 VirtBoardInfo *vbi;
+DeviceState *gic;
 
 if (!cpu_model) {
 cpu_model = "cortex-a15";
@@ -646,7 +653,7 @@ static void machvirt_init(MachineState *machine)
 
 create_flash(vbi);
 
-create_gic(vbi, pic);
+gic = create_gic(vbi, pic);
 
 create_uart(vbi, pic);
 
@@ -658,7 +665,7 @@ static void machvirt_init(MachineState *machine)
  */
 create_virtio_devices(vbi, pic);
 
-create_platform_bus(vbi, pic, &platform_bus_params);
+create_platform_bus(vbi, pic, &platform_bus_params, gic);
 
 vbi->bootinfo.ram_size = machine->ram_size;
 vbi->bootinfo.kernel_filename = machine->kernel_filename;
-- 
1.8.3.2

[Qemu-devel] [PATCH v8 18/19] hw/vfio/common: vfio_kvm_device_fd moved in the common header

2014-11-30 Thread Eric Auger

The KVM VFIO device fd is now also used in platform code for forwarded
IRQ setup.

Signed-off-by: Eric Auger 
---
 hw/vfio/common.c  | 3 ++-
 include/hw/vfio/vfio-common.h | 5 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 554467f..ba00ec9 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -44,9 +44,10 @@ struct vfio_as_head vfio_address_spaces =
  * initialized, this file descriptor is only released on QEMU exit and
  * we'll re-use it should another vfio device be attached before then.
  */
-static int vfio_kvm_device_fd = -1;
+int vfio_kvm_device_fd = -1;
 #endif
 
+
 /*
  * Common VFIO interrupt disable
  */
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b5af090..58fd786 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -41,6 +41,11 @@
 #define VFIO_ALLOW_KVM_MSI 1
 #define VFIO_ALLOW_KVM_MSIX 1
 
+#ifdef CONFIG_KVM
+extern int vfio_kvm_device_fd;
+#endif
+
+
 enum {
 VFIO_DEVICE_TYPE_PCI = 0,
 VFIO_DEVICE_TYPE_PLATFORM = 1,
-- 
1.8.3.2

[Qemu-devel] [PATCH v8 17/19] linux-headers: Update KVM headers from linux-next tag ToBeFilled

2014-11-30 Thread Eric Auger

Sync up KVM-related Linux headers from the linux-next tree using
scripts/update-linux-headers.sh.

Integrate the updated KVM-VFIO API related to forwarded IRQs.

Signed-off-by: Eric Auger 
---
 linux-headers/linux/kvm.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 12045a1..9f798ab 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -946,6 +946,9 @@ struct kvm_device_attr {
 #define  KVM_DEV_VFIO_GROUP1
 #define   KVM_DEV_VFIO_GROUP_ADD   1
 #define   KVM_DEV_VFIO_GROUP_DEL   2
+#define  KVM_DEV_VFIO_DEVICE   2
+#define   KVM_DEV_VFIO_DEVICE_FORWARD_IRQ  1
+#define   KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ2
 
 enum kvm_device_type {
KVM_DEV_TYPE_FSL_MPIC_20= 1,
@@ -963,6 +966,13 @@ enum kvm_device_type {
KVM_DEV_TYPE_MAX,
 };
 
+struct kvm_arch_forwarded_irq {
+__u32 fd; /* file descriptor of the VFIO device */
+__u32 index; /* VFIO device IRQ index */
+__u32 subindex; /* VFIO device IRQ subindex */
+__u32 gsi; /* gsi, ie. virtual IRQ number */
+};
+
 /*
  * ioctls for VM fds
  */
-- 
1.8.3.2

[Qemu-devel] [PATCH v8 13/19] hw/vfio: calxeda xgmac device

2014-11-30 Thread Eric Auger

The platform device class has become abstract. This patch introduces
a calxeda xgmac device that can be instantiated on the command line
using the following option:

-device vfio-calxeda-xgmac,host="fff51000.ethernet"
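
The specialization pattern used by this device (set the compat string, then chain to the parent realize saved at class init) can be sketched without QOM. This is a simplified illustration: Device, DeviceClass, platform_realize and the other names below are hypothetical stand-ins, and the real code obtains the class through QOM macros rather than a klass pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Device Device;
typedef void (*RealizeFn)(Device *dev);

typedef struct DeviceClass {
    RealizeFn realize;        /* entry point called by the core */
    RealizeFn parent_realize; /* saved realize of the abstract parent */
} DeviceClass;

struct Device {
    DeviceClass *klass;
    const char *compat;
    int realized;
};

/* Abstract parent: it needs the derived device to have set compat. */
static void platform_realize(Device *dev)
{
    dev->realized = (dev->compat != NULL);
}

/* Derived device: fill in what the parent lacks, then chain up. */
static void xgmac_realize(Device *dev)
{
    dev->compat = "calxeda,hb-xgmac";
    dev->klass->parent_realize(dev);
}

/* Class init: save the parent's realize before overriding it. */
static void xgmac_class_init(DeviceClass *dc)
{
    dc->parent_realize = dc->realize;
    dc->realize = xgmac_realize;
}
```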

Signed-off-by: Eric Auger 

---
v7 -> v8:
- add a comment in the header about the MMIO regions and IRQ which
  are exposed by the device

v5 -> v6
- re-introduced following Alex Graf's advice
- fix a bug related to compat override

v4 -> v5:
removed since device tree was moved to hw/arm/dyn_sysbus_devtree.c

v4: creation for device tree specialization
---
 hw/vfio/Makefile.objs|  1 +
 hw/vfio/calxeda_xgmac.c  | 54 
 include/hw/vfio/vfio-calxeda-xgmac.h | 46 ++
 3 files changed, 101 insertions(+)
 create mode 100644 hw/vfio/calxeda_xgmac.c
 create mode 100644 include/hw/vfio/vfio-calxeda-xgmac.h

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index c5c76fe..913ab14 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -2,4 +2,5 @@ ifeq ($(CONFIG_LINUX), y)
 obj-$(CONFIG_SOFTMMU) += common.o
 obj-$(CONFIG_PCI) += pci.o
 obj-$(CONFIG_SOFTMMU) += platform.o
+obj-$(CONFIG_SOFTMMU) += calxeda_xgmac.o
 endif
diff --git a/hw/vfio/calxeda_xgmac.c b/hw/vfio/calxeda_xgmac.c
new file mode 100644
index 000..199e076
--- /dev/null
+++ b/hw/vfio/calxeda_xgmac.c
@@ -0,0 +1,54 @@
+/*
+ * calxeda xgmac example VFIO device
+ *
+ * Copyright Linaro Limited, 2014
+ *
+ * Authors:
+ *  Eric Auger 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/vfio/vfio-calxeda-xgmac.h"
+
+static void calxeda_xgmac_realize(DeviceState *dev, Error **errp)
+{
+VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
+VFIOCalxedaXgmacDeviceClass *k = VFIO_CALXEDA_XGMAC_DEVICE_GET_CLASS(dev);
+
+vdev->compat = g_strdup("calxeda,hb-xgmac");
+
+k->parent_realize(dev, errp);
+}
+
+static const VMStateDescription vfio_platform_vmstate = {
+.name = TYPE_VFIO_CALXEDA_XGMAC,
+.unmigratable = 1,
+};
+
+static void vfio_calxeda_xgmac_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+VFIOCalxedaXgmacDeviceClass *vcxc =
+VFIO_CALXEDA_XGMAC_DEVICE_CLASS(klass);
+vcxc->parent_realize = dc->realize;
+dc->realize = calxeda_xgmac_realize;
+dc->desc = "VFIO Calxeda XGMAC";
+}
+
+static const TypeInfo vfio_calxeda_xgmac_dev_info = {
+.name = TYPE_VFIO_CALXEDA_XGMAC,
+.parent = TYPE_VFIO_PLATFORM,
+.instance_size = sizeof(VFIOCalxedaXgmacDevice),
+.class_init = vfio_calxeda_xgmac_class_init,
+.class_size = sizeof(VFIOCalxedaXgmacDeviceClass),
+};
+
+static void register_calxeda_xgmac_dev_type(void)
+{
+type_register_static(&vfio_calxeda_xgmac_dev_info);
+}
+
+type_init(register_calxeda_xgmac_dev_type)
diff --git a/include/hw/vfio/vfio-calxeda-xgmac.h b/include/hw/vfio/vfio-calxeda-xgmac.h
new file mode 100644
index 000..f994775
--- /dev/null
+++ b/include/hw/vfio/vfio-calxeda-xgmac.h
@@ -0,0 +1,46 @@
+/*
+ * VFIO calxeda xgmac device
+ *
+ * Copyright Linaro Limited, 2014
+ *
+ * Authors:
+ *  Eric Auger 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef HW_VFIO_VFIO_CALXEDA_XGMAC_H
+#define HW_VFIO_VFIO_CALXEDA_XGMAC_H
+
+#include "hw/vfio/vfio-platform.h"
+
+#define TYPE_VFIO_CALXEDA_XGMAC "vfio-calxeda-xgmac"
+
+/**
+ * This device exposes:
+ * - a single MMIO region corresponding to its register space
+ * - 3 IRQS (main and 2 power related IRQs)
+ */
+typedef struct VFIOCalxedaXgmacDevice {
+VFIOPlatformDevice vdev;
+} VFIOCalxedaXgmacDevice;
+
+typedef struct VFIOCalxedaXgmacDeviceClass {
+/*< private >*/
+VFIOPlatformDeviceClass parent_class;
+/*< public >*/
+DeviceRealize parent_realize;
+} VFIOCalxedaXgmacDeviceClass;
+
+#define VFIO_CALXEDA_XGMAC_DEVICE(obj) \
+ OBJECT_CHECK(VFIOCalxedaXgmacDevice, (obj), TYPE_VFIO_CALXEDA_XGMAC)
+#define VFIO_CALXEDA_XGMAC_DEVICE_CLASS(klass) \
+ OBJECT_CLASS_CHECK(VFIOCalxedaXgmacDeviceClass, (klass), \
+TYPE_VFIO_CALXEDA_XGMAC)
+#define VFIO_CALXEDA_XGMAC_DEVICE_GET_CLASS(obj) \
+ OBJECT_GET_CLASS(VFIOCalxedaXgmacDeviceClass, (obj), \
+  TYPE_VFIO_CALXEDA_XGMAC)
+
+#endif
-- 
1.8.3.2

[Qemu-devel] [PATCH v8 15/19] hw/arm/sysbus-fdt: enable vfio-calxeda-xgmac dynamic instantiation

2014-11-30 Thread Eric Auger

vfio-calxeda-xgmac can now be instantiated using the -device option.
The node creation function generates a very basic dt node composed
of the compat, reg and interrupts properties.

Signed-off-by: Eric Auger 

---
v7 -> v8:
- move the add_fdt_node_functions array declaration between the device
  specific code and the generic code to avoid forward declarations of
  device specific functions
- rename add_basic_vfio_fdt_node into
  add_calxeda_midway_xgmac_fdt_node

v6 -> v7:
- compat string re-formatting removed since compat string is not exposed
  anymore as a user option
- VFIO IRQ kick-off removed from sysbus-fdt and moved to VFIO platform
  device
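
For reference, the layout of the interrupts property built by the node
creation function can be sketched in isolation. This is only an
illustration, not the code in the patch: encode_irq_cells and to_be32 are
hypothetical names, and the three-cell layout (0, IRQ number, 0x4) simply
mirrors the values used in the diff below. My understanding is that in the
usual GIC binding these mean "SPI" and "level, active high", but treat that
reading as an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for cpu_to_be32(): produce a value whose
 * in-memory bytes are big-endian, whatever the host endianness is. */
static uint32_t to_be32(uint32_t v)
{
    uint8_t b[4] = {
        (uint8_t)(v >> 24), (uint8_t)(v >> 16),
        (uint8_t)(v >> 8),  (uint8_t)v
    };
    uint32_t out;

    memcpy(&out, b, sizeof(out));
    return out;
}

/* Fill three big-endian cells per IRQ: <type, irq number, flags>. */
static void encode_irq_cells(uint32_t *cells, const uint32_t *irqs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        cells[3 * i]     = to_be32(0);        /* type, as in the patch */
        cells[3 * i + 1] = to_be32(irqs[i]);  /* IRQ number */
        cells[3 * i + 2] = to_be32(0x4);      /* flags, as in the patch */
    }
}
```

The resulting buffer is n * 3 * sizeof(uint32_t) bytes long, which is the
size the patch passes to qemu_fdt_setprop().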
---
 hw/arm/sysbus-fdt.c | 88 +
 1 file changed, 88 insertions(+)

diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
index 7537267..86bbd06 100644
--- a/hw/arm/sysbus-fdt.c
+++ b/hw/arm/sysbus-fdt.c
@@ -26,6 +26,8 @@
 #include "sysemu/device_tree.h"
 #include "hw/platform-bus.h"
 #include "sysemu/sysemu.h"
+#include "hw/vfio/vfio-platform.h"
+#include "hw/vfio/vfio-calxeda-xgmac.h"
 
 /*
  * internal struct that contains the information to create dynamic
@@ -53,11 +55,97 @@ typedef struct NodeCreationPair {
 int (*add_fdt_node_fn)(SysBusDevice *sbdev, void *opaque);
 } NodeCreationPair;
 
+/* Device Specific Code */
+
+/**
+ * add_calxeda_midway_xgmac_fdt_node
+ *
+ * Generates a very simple node with following properties:
+ * compatible string, regs, interrupts
+ */
+static int add_calxeda_midway_xgmac_fdt_node(SysBusDevice *sbdev, void *opaque)
+{
+PlatformBusFdtData *data = opaque;
+PlatformBusDevice *pbus = data->pbus;
+void *fdt = data->fdt;
+const char *parent_node = data->pbus_node_name;
+int compat_str_len;
+char *nodename;
+int i, ret;
+uint32_t *irq_attr;
+uint64_t *reg_attr;
+uint64_t mmio_base;
+uint64_t irq_number;
+VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
+VFIODevice *vbasedev = &vdev->vbasedev;
+Object *obj = OBJECT(sbdev);
+
+mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
+
+nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
+   vbasedev->name,
+   mmio_base);
+
+qemu_fdt_add_subnode(fdt, nodename);
+
+compat_str_len = strlen(vdev->compat) + 1;
+qemu_fdt_setprop(fdt, nodename, "compatible",
+  vdev->compat, compat_str_len);
+
+reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
+
+for (i = 0; i < vbasedev->num_regions; i++) {
+mmio_base = platform_bus_get_mmio_addr(pbus, sbdev, i);
+reg_attr[4*i] = 1;
+reg_attr[4*i+1] = mmio_base;
+reg_attr[4*i+2] = 1;
+reg_attr[4*i+3] = memory_region_size(&vdev->regions[i]->mem);
+}
+
+ret = qemu_fdt_setprop_sized_cells_from_array(fdt, nodename, "reg",
+ vbasedev->num_regions*2, reg_attr);
+if (ret < 0) {
+error_report("could not set reg property of node %s", nodename);
+goto fail;
+}
+
+irq_attr = g_new(uint32_t, vbasedev->num_irqs*3);
+
+for (i = 0; i < vbasedev->num_irqs; i++) {
+irq_number = platform_bus_get_irqn(pbus, sbdev , i)
+ + data->irq_start;
+irq_attr[3*i] = cpu_to_be32(0);
+irq_attr[3*i+1] = cpu_to_be32(irq_number);
+irq_attr[3*i+2] = cpu_to_be32(0x4);
+}
+
+   ret = qemu_fdt_setprop(fdt, nodename, "interrupts",
+ irq_attr, vbasedev->num_irqs*3*sizeof(uint32_t));
+if (ret < 0) {
+error_report("could not set interrupts property of node %s",
+ nodename);
+goto fail;
+}
+
+g_free(nodename);
+g_free(irq_attr);
+g_free(reg_attr);
+
+return 0;
+
+fail:
+
+   return -1;
+}
+
 /* list of supported dynamic sysbus devices */
 static const NodeCreationPair add_fdt_node_functions[] = {
+{TYPE_VFIO_CALXEDA_XGMAC, add_calxeda_midway_xgmac_fdt_node},
 {"", NULL}, /*last element*/
 };
 
+/* Generic Code */
+
 /**
  * add_fdt_node - add the device tree node of a dynamic sysbus device
  *
-- 
1.8.3.2

[Qemu-devel] [PATCH v8 19/19] hw/vfio/platform: add forwarded irq support

2014-11-30 Thread Eric Auger

Test whether the forwarded IRQ modality is available. If it is,
device IRQs are forwarded. Forwarding is controlled through the
KVM-VFIO device. With this modality, injection is still handled
through irqfds, but the end of interrupt is no longer trapped:
as soon as the guest completes its virtual IRQ, the corresponding
physical IRQ is completed and the same physical IRQ can hit again.

A new x-forward property makes it possible to force forwarding off
even when the kernel supports it.

Signed-off-by: Eric Auger 
---
 hw/vfio/platform.c  | 52 +
 include/hw/vfio/vfio-platform.h |  2 ++
 trace-events|  1 +
 3 files changed, 55 insertions(+)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 97d98bf..7881b9b 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -237,6 +237,52 @@ static int vfio_start_eventfd_injection(VFIOINTp *intp)
 }
 
 /*
+ * Functions used with forwarding capability
+ */
+
+#ifdef CONFIG_KVM
+
+static bool has_kvm_vfio_forward_capability(void)
+{
+struct kvm_device_attr attr = {
+ .group = KVM_DEV_VFIO_DEVICE,
+ .attr = KVM_DEV_VFIO_DEVICE_FORWARD_IRQ};
+
+if (ioctl(vfio_kvm_device_fd, KVM_HAS_DEVICE_ATTR, &attr) == 0) {
+return true;
+} else {
+return false;
+}
+}
+
+static int vfio_set_forwarding(VFIOINTp *intp)
+{
+int ret = 0;
+struct kvm_device_attr attr = {
+ .group = KVM_DEV_VFIO_DEVICE,
+ .attr = KVM_DEV_VFIO_DEVICE_FORWARD_IRQ};
+
+intp->fwd_irq = g_malloc0(sizeof(*intp->fwd_irq));
+intp->fwd_irq->fd = intp->vdev->vbasedev.fd;
+intp->fwd_irq->index = intp->pin;
+intp->fwd_irq->gsi = intp->virtualID;
+
+attr.addr = (uint64_t)(unsigned long)intp->fwd_irq;
+
+if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+error_report("Failed to forward IRQ %d through KVM VFIO device",
+ intp->pin);
+g_free(intp->fwd_irq);
+return -errno;
+}
+trace_vfio_start_fwd_injection(intp->pin);
+
+return ret;
+}
+
+#endif
+
+/*
  * Functions used for irqfd
  */
 
@@ -288,6 +334,11 @@ static int vfio_start_irqfd_injection(VFIOINTp *intp)
 .flags = KVM_IRQFD_FLAG_RESAMPLE,
 };
 
+if (has_kvm_vfio_forward_capability() &&
+ intp->vdev->forward_allowed) {
+vfio_set_forwarding(intp);
+}
+
 if (kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd)) {
 error_report("vfio: Error: Failed to assign the irqfd: %m");
 goto fail_irqfd;
@@ -694,6 +745,7 @@ static Property vfio_platform_dev_properties[] = {
 DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
mmap_timeout, 1100),
 DEFINE_PROP_BOOL("x-irqfd", VFIOPlatformDevice, irqfd_allowed, true),
+DEFINE_PROP_BOOL("x-forward", VFIOPlatformDevice, forward_allowed, true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/vfio/vfio-platform.h b/include/hw/vfio/vfio-platform.h
index de0b5d5..d512bb3 100644
--- a/include/hw/vfio/vfio-platform.h
+++ b/include/hw/vfio/vfio-platform.h
@@ -42,6 +42,7 @@ typedef struct VFIOINTp {
 bool kvm_accel; /* set when QEMU bypass through KVM enabled */
 uint8_t pin; /* index */
 uint8_t virtualID; /* virtual IRQ */
+struct kvm_arch_forwarded_irq *fwd_irq;
 } VFIOINTp;
 
 typedef int (*start_irq_fn_t)(VFIOINTp *intp);
@@ -59,6 +60,7 @@ typedef struct VFIOPlatformDevice {
 start_irq_fn_t start_irq_fn;
 QemuMutex  intp_mutex;
 bool irqfd_allowed; /* debug option to force irqfd on/off */
+bool forward_allowed; /* debug option to force forwarding on/off */
 } VFIOPlatformDevice;
 
 
diff --git a/trace-events b/trace-events
index 59a09f6..0aea358 100644
--- a/trace-events
+++ b/trace-events
@@ -1431,6 +1431,7 @@ vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions,
 vfio_put_base_device(int fd) "close vdev->fd=%d"
 
 # hw/vfio/platform.c
+vfio_start_fwd_injection(int pin) "forwarding set for IRQ pin %d"
 vfio_platform_eoi(int pin, int fd) "EOI IRQ pin %d (fd=%d)"
 vfio_platform_mmap_set_enabled(bool enabled) "fast path = %d"
 vfio_platform_intp_mmap_enable(int pin) "IRQ #%d still active, stay in slow path"
-- 
1.8.3.2

[Qemu-devel] [PATCH v8 00/19] KVM platform device passthrough

2014-11-30 Thread Eric Auger

This RFC series aims at enabling KVM platform device passthrough.
It implements a VFIO platform device, derived from VFIO PCI device.

The VFIO platform device uses the host VFIO platform driver which must
be bound to the assigned device prior to the QEMU system start.

- the guest can directly access the device register space
- assigned device IRQs are transparently routed to the guest by
  QEMU/KVM (3 methods currently are supported: user-level eventfd
  handling, irqfd, forwarded IRQs)
- iommu is transparently programmed to prevent the device from
  accessing physical pages outside of the guest address space
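
As a rough illustration of how the three injection methods listed above
relate to each other, the choice can be sketched as below. This is not code
from the series: InjectMethod and pick_inject_method are hypothetical names,
and the two *_allowed flags simply stand for the x-irqfd and x-forward
properties. The sketch assumes that forwarding is layered on top of irqfd,
as the commit message of patch 19 describes.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
typedef enum {
    INJECT_USER_EVENTFD,   /* QEMU handles the eventfd at user level */
    INJECT_IRQFD,          /* injection through irqfd, EOI still trapped */
    INJECT_IRQFD_FORWARD   /* injection through irqfd, EOI not trapped */
} InjectMethod;

static InjectMethod pick_inject_method(bool kernel_has_irqfd,
                                       bool irqfd_allowed,
                                       bool kernel_has_forward,
                                       bool forward_allowed)
{
    /* Without irqfd, fall back to user-level eventfd handling. */
    if (!(kernel_has_irqfd && irqfd_allowed)) {
        return INJECT_USER_EVENTFD;
    }
    /* Forwarding is only set up on top of irqfd injection. */
    if (kernel_has_forward && forward_allowed) {
        return INJECT_IRQFD_FORWARD;
    }
    return INJECT_IRQFD;
}
```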

This patch series is made of the following patch file groups:

1-11) PCI modifications to prepare for platform device introduction
12-15) VFIO & calxeda midway platform device without irqfd support
16) VFIO platform device with irqfd support
17-19) VFIO platform device with IRQ forwarding support

Each group is independent and should be separately upstreamable.

Dependency List:

QEMU dependencies:
[1] [PATCH v5] machvirt dynamic sysbus device instantiation
Eric Auger
[2] [PATCH v3 0/2] actual checks of KVM_CAP_IRQFD and KVM_CAP_IRQFD_RESAMPLE,
Eric Auger
http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00589.html
[3] [PATCH v2] vfio: migration to trace points
Eric Auger
https://patchwork.ozlabs.org/patch/394785/

Kernel Dependencies:
[1] [PATCH v10 00/20] VFIO support for platform and AMBA devices on ARM
Antonios Motakis
http://comments.gmane.org/gmane.linux.kernel.iommu/7096
[2] [PATCH v3 0/6] vfio: type1: support for ARM SMMUS with VFIO_IOMMU_TYPE1
Antonios Motakis
http://www.spinics.net/lists/kvm-arm/msg11738.html
[3] [PATCH v4] ARM: KVM: add irqfd support
Eric Auger
https://lkml.org/lkml/2014/9/1/141
[4] [RFC PATCH 0/9] ARM: Forwarding physical interrupts to a guest VM,
Marc Zyngier
http://lwn.net/Articles/603514/
[5] [PATCH v3 0/9] KVM-VFIO IRQ forward control
Eric Auger
https://lkml.org/lkml/2014/9/1/344

- kernel pieces can be found at:
  http://git.linaro.org/people/eric.auger/linux.git (branch 3.18-rc6-v10)
- QEMU pieces can be found at:
  http://git.linaro.org/people/eric.auger/qemu.git (branch vfio_integ_v8)

The patch series was tested on Calxeda Midway (ARMv7) where one xgmac
is assigned to the KVM host while the second one is assigned to the guest.
The reworked PCI device has not been tested.

Wiki for Calxeda Midway setup:
https://wiki.linaro.org/LEG/Engineering/Virtualization/Platform_Device_Passthrough_on_Midway

History:
v7->v8:
- rebase on v2.2.0-rc3 and integrate
  "Add skip_dump flag to ignore memory region during dump"
- KVM header evolution with subindex addition in kvm_arch_forwarded_irq
- split [PATCH v7 03/16] hw/vfio/pci: introduce VFIODevice into 4 patches
- vfio_compute_needs_reset does not return bool anymore
- add some comments about exposed MMIO region and IRQ in calxeda xgmac
  device
- vfio_[un]mask_irqindex renamed into vfio_[un]mask_single_irqindex
- rework IRQ startup: former machine init done notifier is replaced by a
  reset notifier. machine file passes the interrupt controller
  DeviceState handle (not the platform bus first irq parameter).
- sysbus-fdt:
  - move the add_fdt_node_functions array declaration between the device
    specific code and the generic code to avoid forward declarations of
    device specific functions
  - rename add_basic_vfio_fdt_node into add_calxeda_midway_xgmac_fdt_node
    emphasizing the fact it is xgmac specific

v6->v7:
- fake injection test modality removed
- VFIO_DEVICE_TYPE_PLATFORM only introduced with VFIO platform
- new helper functions to start VFIO IRQ on machine init done notifier
  (introduced in hw/vfio/platform: add vfio-platform support and notifier
  registration invoked in hw/arm/virt: add support for VFIO devices).
  vfio_start_irq_injection is replaced by vfio_register_irq_starter.

v5->v6:
- rebase on 2.1rc5 PCI code
- first integration of forwarded IRQ
- vfio_device property renamed into host property
- split IRQ setup in different functions that match the 3 supported
  injection techniques (user handled eventfd, irqfd, forwarded IRQ):
  removes dynamic switch between injection methods
- introduce fake interrupts as a test modality:
  x makes it possible to test user-side handling of multiple IRQs.
  x this is a test feature only: it allows an fd to be triggered as if
    the real physical IRQ had hit. No virtual IRQ is injected into the
    guest, but handling is simulated so that the state machine can be
    tested
- user handled eventfd:
  x add mutex to protect IRQ state & list manipulation,
  x correct misleading comment in vfio_intp_interrupt.
  x fix bugs using fake interrupt modality
- irqfd no longer advertised in this patchset (handled in [3])
- VFIOPlatformDeviceClass becomes abstract and Calxeda xgmac device
  and class is re-introduced (as per v4)
- all DPRINTF removed in platform and replaced by trace-points
- corrects compilation with configure --disable-kvm
- simpli

[Qemu-devel] dtb support on x86 machines

2014-11-30 Thread João Henrique Ferreira de Freitas


Hi,


I would like to share my work-in-progress on device-tree support for the
qemu x86 machine. The patch is not fully functional but works as a proof
of concept. It is based on qemu stable-2.1; once my open questions are
resolved I will redo it on the master branch.


Device tree on x86 machines is not widely used, but it works. The
syslinux bootloader supports it, and I am working on similar patches
for kexec too. So I decided to do the same with qemu. ;)


The patch uses the setup_data field of the Linux boot protocol
(https://www.kernel.org/doc/Documentation/x86/boot.txt), which is a
linked list of 'struct setup_data'. setup_data is usually used to extend
the boot parameters. I am using it to carry a loaded dtb.
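
To make the list layout concrete, here is a small stand-alone sketch (not
part of the patch). It keeps the nodes in a plain byte array that plays the
role of guest physical memory, so "next" is just an offset into that array.
put_node, find_node and struct setup_data_hdr are names I made up for the
illustration, and fields are stored in host byte order, which matches the
real protocol only on a little-endian host.

```c
#include <stdint.h>
#include <string.h>

#define SETUP_E820_EXT 1
#define SETUP_DTB      2

/* Node header; the payload of 'len' bytes follows it directly. */
struct setup_data_hdr {
    uint64_t next;  /* address of the next node, 0 ends the list */
    uint32_t type;
    uint32_t len;
};

/* Write one node at 'addr' in the simulated guest memory. */
static void put_node(uint8_t *mem, uint64_t addr, uint64_t next,
                     uint32_t type, const void *payload, uint32_t len)
{
    struct setup_data_hdr h = { next, type, len };

    memcpy(mem + addr, &h, sizeof(h));
    memcpy(mem + addr + sizeof(h), payload, len);
}

/* Walk the list from 'head'; return the address of the first node of
 * the requested type, or 0 if there is none. */
static uint64_t find_node(const uint8_t *mem, uint64_t head, uint32_t type)
{
    uint64_t addr = head;

    while (addr != 0) {
        struct setup_data_hdr h;

        memcpy(&h, mem + addr, sizeof(h));
        if (h.type == type) {
            return addr;
        }
        addr = h.next;
    }
    return 0;
}
```

With a single SETUP_DTB node, as in my patch, the list has one element and
its "next" field is 0.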


For now, you can see the patch at
https://github.com/joaohf/qemu/commit/941d68e6126b4e0908fdd8a90fa7d3f28098a49f.
I will send it to the qemu-devel list once I have resolved my biggest
question, which I explain later.


-- begin 

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index ef9fad8..94467ba 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -51,6 +51,7 @@
 #include "exec/address-spaces.h"
 #include "sysemu/arch_init.h"
 #include "qemu/bitmap.h"
+#include "sysemu/device_tree.h"
 #include "qemu/config-file.h"
 #include "hw/acpi/acpi.h"
 #include "hw/acpi/cpu_hotplug.h"
@@ -75,7 +76,7 @@
 /* Leave a chunk of memory at the top of RAM for the BIOS ACPI tables
  * (128K) and other BIOS datastructures (less than 4K reported to be used at
  * the moment, 32K should be enough for a while).  */
-unsigned acpi_data_size = 0x2 + 0x8000;
+unsigned acpi_data_size = 0x2 + 0x8;
 void pc_set_legacy_acpi_data_size(void)
 {
 acpi_data_size = 0x1;
@@ -741,17 +742,77 @@ static long get_file_size(FILE *f)
 return size;
 }

+static int load_dtb(FWCfgState *fw_cfg,
+const char *dtb_filename,
+void **dtb_addr,
+int *dtb_size)
+{
+void *fdt = NULL;
+
+fdt = load_device_tree(dtb_filename, dtb_size);
+if (!fdt) {
+fprintf(stderr, "Couldn't open dtb file %s\n", dtb_filename);
+return -1;
+}
+
+qemu_fdt_dumpdtb(fdt, *dtb_size);
+
+*dtb_addr = fdt;
+
+return 0;
+}
+
+struct setup_data {
+uint64_t next;
+uint32_t type;
+#define SETUP_NONE  0
+#define SETUP_E820_EXT  1
+#define SETUP_DTB   2
+#define SETUP_PCI   3
+#define SETUP_EFI   4
+uint32_t len;
+uint8_t data[0];
+} __attribute__((packed));
+
+static int setup_dtb_data(FWCfgState *fw_cfg,
+  void **setup_data_addr, int *setup_data_size,
+  void *dtb_addr, off_t dtb_size)
+{
+struct setup_data *sd;
+int sdsize;
+
+sd = g_malloc(sizeof(struct setup_data) + dtb_size);
+if (!sd) {
+return -1;
+}
+
+memset(sd, 0, sizeof(struct setup_data) + dtb_size);
+sd->next = 0;
+sd->type = SETUP_DTB;
+sd->len = dtb_size;
+memcpy(sd->data, dtb_addr, dtb_size);
+
+sdsize = sd->len + sizeof(struct setup_data);
+
+*setup_data_addr = (void *) sd;
+*setup_data_size = sdsize;
+
+return 0;
+}
+
 static void load_linux(FWCfgState *fw_cfg,
const char *kernel_filename,
const char *initrd_filename,
+   const char *dtb_filename,
const char *kernel_cmdline,
hwaddr max_ram_size)
 {
 uint16_t protocol;
-int setup_size, kernel_size, initrd_size = 0, cmdline_size;
+int setup_size, kernel_size, initrd_size = 0, cmdline_size, dtb_size = 0, setup_data_size = 0;
 uint32_t initrd_max;
 uint8_t header[8192], *setup, *kernel, *initrd_data;
-hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0;
+hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, setup_data_addr = 0;
+void *dtb_addr, *setup_data;
 FILE *f;
 char *vmode;

@@ -891,6 +952,53 @@ static void load_linux(FWCfgState *fw_cfg,
 stl_p(header+0x21c, initrd_size);
 }

+/* load dtb */
+if (dtb_filename) {
+int retval;
+retval = load_dtb(fw_cfg, dtb_filename, &dtb_addr, &dtb_size);
+if (retval < 0) {
+fprintf(stderr, "qemu: error loading dtb %s: %s\n",
+dtb_filename, strerror(errno));
+exit(1);
+}
+
+retval = setup_dtb_data(fw_cfg, &setup_data, &setup_data_size,
+dtb_addr, dtb_size);
+if (retval < 0) {
+fprintf(stderr, "qemu: error no memory to setup_data\n");
+exit(1);
+}
+
+//if (!initrd_addr) {
+//setup_data_addr = (initrd_max-initrd_size-setup_data_size) & ~4095;
+//} else {
+setup_data_addr = QEMU_ALIGN_UP(initrd_max-initrd_size-setup_data_size, 4096);
+//}
+
+stq_p(header+0x250, setup_data_addr);
+
+cpu_physical_memory_write(setup_data_addr, setup_data, 
setup_data_size

Re: [Qemu-devel] [PATCH] i6300esb: fix reading config registers and accept writes of all length

2014-11-30 Thread Richard W.M. Jones

On Wed, Oct 29, 2014 at 02:42:51PM +0100, Adam Hoka wrote:
> Don't require configuration register write to be off a certain length,
> as some PCI implementations always access them in 32bit only. This is
> because it's in fact the only kind of access supported by the standard,
> anything else is implementation dependent.
> 
> Add support for reading back the configuration register values.
> 
> Unify the MMIO register implementation into a common read and write
> function. This makes driver testing in QEMU less surprising.
> 
> Missing: interrupt register is still not implemented as interrupting
> itself is absent. It's unclear from the 6300ESB ICH specs where
> the IRQ line is connected in real hardware.
> 
> Signed-off-by: Adam Hoka 

I don't really have any opinion on this patch.  All I care about is that
it doesn't break the Linux device driver (the Intel-supplied 32 bit
Windows device driver is unfortunately a lost cause).  Did you test it
against Linux?  I wrote a small test harness that makes testing the
qemu watchdog simple:

http://git.annexia.org/?p=watchdog-test-framework.git;a=summary

Rich.

>  hw/watchdog/wdt_i6300esb.c | 134 
> ++---
>  1 file changed, 53 insertions(+), 81 deletions(-)
> 
> diff --git a/hw/watchdog/wdt_i6300esb.c b/hw/watchdog/wdt_i6300esb.c
> index 687c8b1..8512a91 100644
> --- a/hw/watchdog/wdt_i6300esb.c
> +++ b/hw/watchdog/wdt_i6300esb.c
> @@ -212,12 +212,12 @@ static void i6300esb_config_write(PCIDevice *dev, 
> uint32_t addr,
>  
>  i6300esb_debug("addr = %x, data = %x, len = %d\n", addr, data, len);
>  
> -if (addr == ESB_CONFIG_REG && len == 2) {
> +if (addr == ESB_CONFIG_REG) {
>  d->reboot_enabled = (data & ESB_WDT_REBOOT) == 0;
>  d->clock_scale =
>  (data & ESB_WDT_FREQ) != 0 ? CLOCK_SCALE_1MHZ : CLOCK_SCALE_1KHZ;
>  d->int_type = (data & ESB_WDT_INTTYPE);
> -} else if (addr == ESB_LOCK_REG && len == 1) {
> +} else if (addr == ESB_LOCK_REG) {
>  if (!d->locked) {
>  d->locked = (data & ESB_WDT_LOCK) != 0;
>  d->free_run = (data & ESB_WDT_FUNC) != 0;
> @@ -240,13 +240,13 @@ static uint32_t i6300esb_config_read(PCIDevice *dev, 
> uint32_t addr, int len)
>  
>  i6300esb_debug ("addr = %x, len = %d\n", addr, len);
>  
> -if (addr == ESB_CONFIG_REG && len == 2) {
> +if (addr == ESB_CONFIG_REG) {
>  data =
>  (d->reboot_enabled ? 0 : ESB_WDT_REBOOT) |
>  (d->clock_scale == CLOCK_SCALE_1MHZ ? ESB_WDT_FREQ : 0) |
>  d->int_type;
>  return data;
> -} else if (addr == ESB_LOCK_REG && len == 1) {
> +} else if (addr == ESB_LOCK_REG) {
>  data =
>  (d->free_run ? ESB_WDT_FUNC : 0) |
>  (d->locked ? ESB_WDT_LOCK : 0) |
> @@ -257,116 +257,88 @@ static uint32_t i6300esb_config_read(PCIDevice *dev, 
> uint32_t addr, int len)
>  }
>  }
>  
> -static uint32_t i6300esb_mem_readb(void *vp, hwaddr addr)
> +static uint32_t i6300esb_mem_read(void *vp, hwaddr addr)
>  {
> -i6300esb_debug ("addr = %x\n", (int) addr);
> -
> -return 0;
> -}
> -
> -static uint32_t i6300esb_mem_readw(void *vp, hwaddr addr)
> -{
> -uint32_t data = 0;
>  I6300State *d = vp;
>  
> -i6300esb_debug("addr = %x\n", (int) addr);
> +i6300esb_debug("addr = %p\n", (void *)addr);
>  
> -if (addr == 0xc) {
> +switch (addr) {
> +case 0x00:
> +return d->timer1_preload;
> +case 0x04:
> +return d->timer2_preload;
> +case 0x0c:
>  /* The previous reboot flag is really bit 9, but there is
>   * a bug in the Linux driver where it thinks it's bit 12.
>   * Set both.
>   */
> -data = d->previous_reboot_flag ? 0x1200 : 0;
> +return d->previous_reboot_flag ? 0x1200 : 0;
>  }
>  
> -return data;
> -}
> -
> -static uint32_t i6300esb_mem_readl(void *vp, hwaddr addr)
> -{
> -i6300esb_debug("addr = %x\n", (int) addr);
> -
>  return 0;
>  }
>  
> -static void i6300esb_mem_writeb(void *vp, hwaddr addr, uint32_t val)
> +static void i6300esb_mem_write(void *vp, hwaddr addr, uint32_t val)
>  {
>  I6300State *d = vp;
>  
> -i6300esb_debug("addr = %x, val = %x\n", (int) addr, val);
> +i6300esb_debug("addr = %p, val = 0x%x\n", (void *)addr, val);
>  
> -if (addr == 0xc && val == 0x80)
> +/* register lock */
> +if (addr == 0xc && val == 0x80) {
>  d->unlock_state = 1;
> -else if (addr == 0xc && val == 0x86 && d->unlock_state == 1)
> +return;
> +} else if (addr == 0xc && val == 0x86 && d->unlock_state == 1) {
>  d->unlock_state = 2;
> -}
> +return;
> +} else if (d->unlock_state == 0) {
> +return;
> +}
>  
> -static void i6300esb_mem_writew(void *vp, hwaddr addr, uint32_t val)
> -{
> -I6300State *d = vp;
> +switch (addr) {
> +case 0x00:
> +d->timer1_preload = val & 0xf;
> +break;
>  
> -i

Re: [Qemu-devel] How does qemu know the virtual memory of the guest os?

2014-11-30 Thread Richard W.M. Jones

On Fri, Nov 28, 2014 at 04:17:10PM -0800, Jidong Xiao wrote:
> Hi,
> 
> I notice that Qemu supports dump virtual memory of Guest OS. As this
> page suggests:
> 
> 
> http://doc.opensuse.org/products/draft/SLES/SLES-kvm_sd_draft/cha.qemu.monitor.html
> 
> To save the content of the virtual machine memory to a disk or console
> output, use the following commands:
> 
> memsave addr size filename
> 
> Saves virtual memory dump starting at addr of size size to file filename
> 
> pmemsave addr size filename
> 
> Saves physical memory dump starting at addr of size size to file filename
> =
> 
> I understand that hypervisors certainly know the physical memory of
> virtual machine, but how does it know the virtual memory of the Guest
> OS? I think the hypervisor has no semantic knowledge of the Guest OS,
> and such knowledge should be different for different OS (e.g., Windows
> vs Linux), so I am really surprised that Qemu can dump the virtual
> memory of the Guest OS. Can someone kindly give me some explanation?
> Thank you very much!!

It's different for each *architecture*, but not for each OS.

For example on x86 it starts by reading the CR* control registers, and
then the page tables (see target-i386/helper.c:
x86_cpu_get_phys_page_debug).
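
To give a feel for what that walk involves, here is a heavily simplified
sketch of the classic 32-bit, non-PAE, 4K-page case. It is an illustration
under those assumptions only, not the code in helper.c: walk_32bit,
read_phys32 and write_phys32 are made-up names, guest RAM is just a byte
array, and large pages, PAE and long mode are ignored.

```c
#include <stdint.h>

#define PG_PRESENT 0x1u

/* Read a little-endian 32-bit value from simulated guest RAM. */
static uint32_t read_phys32(const uint8_t *ram, uint32_t paddr)
{
    return (uint32_t)ram[paddr] |
           ((uint32_t)ram[paddr + 1] << 8) |
           ((uint32_t)ram[paddr + 2] << 16) |
           ((uint32_t)ram[paddr + 3] << 24);
}

/* Write a little-endian 32-bit value (only used to build test tables). */
static void write_phys32(uint8_t *ram, uint32_t paddr, uint32_t v)
{
    ram[paddr]     = (uint8_t)v;
    ram[paddr + 1] = (uint8_t)(v >> 8);
    ram[paddr + 2] = (uint8_t)(v >> 16);
    ram[paddr + 3] = (uint8_t)(v >> 24);
}

/* Translate vaddr using the page directory that cr3 points to.
 * Returns 0 and stores the result in *paddr, or -1 if not mapped. */
static int walk_32bit(const uint8_t *ram, uint32_t cr3, uint32_t vaddr,
                      uint32_t *paddr)
{
    uint32_t pde, pte;

    /* Top 10 bits of the address index the page directory. */
    pde = read_phys32(ram, (cr3 & ~0xfffu) + (vaddr >> 22) * 4);
    if (!(pde & PG_PRESENT)) {
        return -1;
    }
    /* Next 10 bits index the page table the PDE points to. */
    pte = read_phys32(ram, (pde & ~0xfffu) + ((vaddr >> 12) & 0x3ffu) * 4);
    if (!(pte & PG_PRESENT)) {
        return -1;
    }
    /* Low 12 bits are the offset within the 4K page. */
    *paddr = (pte & ~0xfffu) | (vaddr & 0xfffu);
    return 0;
}
```

Nothing here depends on which OS built the tables; the format is fixed by
the CPU, which is why the monitor can do this for any guest.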

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org

Re: [Qemu-devel] Better Cortex-M support?

2014-11-30 Thread Alistair Francis

On Fri, Nov 14, 2014 at 5:32 PM, Liviu Ionescu  wrote:
>
> On 14 Nov 2014, at 03:01, Alistair Francis  wrote:
>
>> I haven't looked into CMSIS or using SysTick, so I can't confirm that
>> they work. I don't have any experience with using either, so I can't
>> really be of much help with those.
>
> when you'll have some time, perhaps it would be useful to install GNU ARM 
> Eclipse and generate a project for your board, run it on the physical 
> hardware, then test it on QEMU.

Sorry about the long delay. I probably won't be able to do that for
some time; I have other aspects of the project that are higher
priority. If I get a chance I will, though.

>
>> I have implementations for the more important system peripherals in
>> the STM32F2xx/4xx SoC families, including GPIO.
>
> did you implement the clock related registers? PLL & others? these are used 
> during CMSIS SystemInit() and are mandatory, otherwise emulation will either 
> fail or not be realistic.

Not specifically. I did implement a timer peripheral, but I assume
that isn't the same. I didn't have any issues with timing or
unrealistic emulation, but I'm not looking for exactly time-accurate
emulation.

Thanks,

Alistair

>
>> You are welcome to use
>> those if you want
>
> thank you!
>
>> above my use case is more aimed at higher level machine/peripherals
>> support
>
> yes, that's great, but without a proper base, like system registers and 
> debug, usability may be not be as good as expected.
>
>
> regards,
>
> Liviu
>
>

Re: [Qemu-devel] [PATCH 3/7] test-coroutine: avoid overflow on 32-bit systems

2014-11-30 Thread Ming Lei

On Fri, Nov 28, 2014 at 10:12 PM, Paolo Bonzini  wrote:
> unsigned long is not large enough to represent 10 * duration there.
> Just use floating point.
>
> Signed-off-by: Paolo Bonzini 
> ---
>  tests/test-coroutine.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/test-coroutine.c b/tests/test-coroutine.c
> index e22fae1..27d1b6f 100644
> --- a/tests/test-coroutine.c
> +++ b/tests/test-coroutine.c
> @@ -337,7 +337,7 @@ static void perf_cost(void)
> "%luns per coroutine",
> maxcycles,
> duration, ops,
> -   (unsigned long)(10 * duration) / maxcycles);
> +   (unsigned long)(10.0 * duration / maxcycles));

One more single bracket.
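
For anyone following along, the ordering issue the patch addresses can be
sketched stand-alone. This is only an illustration: it assumes a nanosecond
scale factor of 1e9, which is not legible in the quoted diff, and uses a
fixed 32-bit type to stand in for unsigned long on a 32-bit host. ns_per_op
and fits_u32 are made-up names.

```c
#include <stdint.h>

/* Assumed scale factor (ns per second); not taken from the diff. */
#define NS_PER_SEC 1000000000.0

/* Does the value fit in a 32-bit unsigned integer? */
static int fits_u32(double v)
{
    return v >= 0.0 && v <= (double)UINT32_MAX;
}

/* Do the whole computation in floating point and convert last, so
 * only the (small) quotient has to fit in the integer type. */
static uint32_t ns_per_op(double duration, unsigned long maxcycles)
{
    return (uint32_t)(NS_PER_SEC * duration / (double)maxcycles);
}
```

With a duration of 5 seconds the scaled product alone no longer fits in 32
bits, while the per-coroutine quotient does.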

thanks,
Ming Lei

Re: [Qemu-devel] [PATCH 04/12] spapr_pci: add set-indicator RTAS interface

2014-11-30 Thread Bharata B Rao

On Wed, Nov 26, 2014 at 11:57 AM, Michael Roth
 wrote:
> https://github.com/mdroth/qemu/commits/spapr-pci-hotplug-ppc-next-cleanup4.2
>
> The sPAPRDREntry stuff is now modeled by the sPAPRDRConnector QOM object in
> hw/ppc/spapr_drc.c, which manages the device's life-cycle based on
> rtas-set-sensor-state calls from the guest. As part of qemu-side 
> hotplug/unplug
> you use the attach/detach methods of the DRC to associate DT bits and 
> callbacks
> for things like device cleanup or rtas calls to fetch a DT node from the 
> device
> associated with a particular DRC.
>
> I still need to fix endian issues, and am realizing the dr connectors and DT
> bits for PHBs are not actually a prereq for PCI hotplug, so I may be pulling
> that out to a separate series specific to enabling PHB hotplug (namely for
> VFIO hotplug). I realize your CPU/MEM sort of depend on the top-level PHB
> device tree code so I'm not sure how best to deal with that. Worse case we'd
> roll the initial code into your series and base a follow-up series on that of
> that instead.

Thanks Michael for pointing me to your git tree.

I started rebasing my patchset on top of yours and realized that the
generic DT setup code from the commits of your branch listed below is
needed for CPU and memory hotplug too. They all apply in the order
listed.

71b32999c4eb spapr_drc: initial implementation
255c50200848 spapr: populate DRC entries for root dt node (don't need
code that adds PHB DT entries)
408206fc627e3 spapr_rtas: add set-indicator RTAS interface
da7a232fa6a44 spapr_rtas: add get-sensor-state RTAS interface
1c575d5b29688 spapr_rtas: add ibm,configure-connector RTAS interface
0c5d72833666c spapr_events: re-use EPOW event infrastructure for hotplug events
82ee5a9c88155 spapr_events: event-scan RTAS interface

If you can make the above set an independent patchset, it will become
easy to maintain and post CPU and memory hotplug patchsets.

I am facing some endian issues in your patchset and I will send fixes
for those separately.

Regards,
Bharata.

Re: [Qemu-devel] [PATCH] vhost: Fix vhostfd leak in error branch

2014-11-30 Thread Jason Wang




On Fri, Nov 28, 2014 at 5:26 PM, arei.gong...@huawei.com wrote:

From: Gonglei 

Signed-off-by: Gonglei 
---
 hw/scsi/vhost-scsi.c | 1 +
 hw/virtio/vhost.c| 2 ++
 2 files changed, 3 insertions(+)

diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 308b393..dcb2bc5 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -233,6 +233,7 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp)
vhost_dummy_handle_output);
 if (err != NULL) {
 error_propagate(errp, err);
+close(vhostfd);
 return;
 }
 
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c

index 5d7c40a..5a12861 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -817,10 +817,12 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
 int i, r;
 
 if (vhost_set_backend_type(hdev, backend_type) < 0) {

+close((uintptr_t)opaque);
 return -1;
 }
 
 if (hdev->vhost_ops->vhost_backend_init(hdev, opaque) < 0) {

+close((uintptr_t)opaque);
 return -errno;
 }
 


Patch looks fine.

I wonder whether setting errno and doing a goto fail would be better here?
This would let vhost_backend_cleanup() do the cleanup, e.g. closing the
fd or purging the queue (for vhost-user).
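
For illustration, the goto-fail approach suggested above could look roughly like
the sketch below. It is only a sketch: struct demo_dev, demo_step(),
demo_cleanup() and demo_dev_init() are hypothetical stand-ins, not the actual
vhost code, and it assumes the device owns the descriptor from the moment init
is called.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical device; stands in for a structure that owns a descriptor. */
struct demo_dev {
    int fd; /* -1 when no descriptor is held */
};

/* Hypothetical init step that fails with EINVAL when asked to. */
static int demo_step(int should_fail)
{
    if (should_fail) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}

/* Single cleanup point: closes the descriptor if one is still held. */
static void demo_cleanup(struct demo_dev *dev)
{
    if (dev->fd >= 0) {
        close(dev->fd);
        dev->fd = -1;
    }
}

/*
 * Returns 0 on success or a negative errno value. Every error branch
 * records the error and jumps to one label, so the descriptor is
 * released in exactly one place instead of in each branch.
 */
static int demo_dev_init(struct demo_dev *dev, int fd, int fail_step)
{
    int r;

    assert(dev != NULL);
    dev->fd = fd;

    if (demo_step(fail_step == 1) < 0) {
        r = -errno;
        goto fail;
    }
    if (demo_step(fail_step == 2) < 0) {
        r = -errno;
        goto fail;
    }
    return 0;

fail:
    demo_cleanup(dev);
    return r;
}
```

The trade-off against closing in each branch is that new error paths added
later only need the goto, at the price of the cleanup routine having to cope
with a partially initialized device.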

Re: [Qemu-devel] [PATCH 0/7] coroutine: optimizations

2014-11-30 Thread Ming Lei

On Fri, Nov 28, 2014 at 10:12 PM, Paolo Bonzini  wrote:
> As discussed in the other thread, this brings speedups from
> dropping the coroutine mutex (which serializes multiple iothreads,
> too) and using ELF thread-local storage.
>
> The speedup in perf/cost is about 30% (190->145).  Windows port tested
> with tests/test-coroutine.exe under Wine.

The numbers look very nice; on my laptop, the 'perf cost' result drops
from 244ns to 174ns.

BTW, the cost of using a coroutine to run a function doesn't come only from
these helpers (*_yield, *_enter, *_create; perf-cost measures just
this part of the cost), but also from an implicit/invisible part. I have some
test cases which can show the problem. If someone is interested,
I can post them to the list.

Thanks,
Ming Lei

Re: [Qemu-devel] [PATCH] vhost: Fix vhostfd leak in error branch

2014-11-30 Thread Gonglei

On 2014/12/1 13:03, Jason Wang wrote:

> 
> 
> On Fri, Nov 28, 2014 at 5:26 PM, arei.gong...@huawei.com wrote:
>> From: Gonglei 
>>
>> Signed-off-by: Gonglei 
>> ---
>>  hw/scsi/vhost-scsi.c | 1 +
>>  hw/virtio/vhost.c| 2 ++
>>  2 files changed, 3 insertions(+)
>>
>> diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
>> index 308b393..dcb2bc5 100644
>> --- a/hw/scsi/vhost-scsi.c
>> +++ b/hw/scsi/vhost-scsi.c
>> @@ -233,6 +233,7 @@ static void vhost_scsi_realize(DeviceState *dev, 
>> Error **errp)
>> vhost_dummy_handle_output);
>>  if (err != NULL) {
>>  error_propagate(errp, err);
>> +close(vhostfd);
>>  return;
>>  }
>>  
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 5d7c40a..5a12861 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -817,10 +817,12 @@ int vhost_dev_init(struct vhost_dev *hdev, void 
>> *opaque,
>>  int i, r;
>>  
>>  if (vhost_set_backend_type(hdev, backend_type) < 0) {
>> +close((uintptr_t)opaque);
>>  return -1;
>>  }
>>  
>>  if (hdev->vhost_ops->vhost_backend_init(hdev, opaque) < 0) {
>> +close((uintptr_t)opaque);
>>  return -errno;
>>  }
>>  
> 
> Patch looks fine.
> 
> I wonder whether setting errno and goto fail would be better here?
> This will let vhost_backend_cleanup() to do the cleanup, e.g closeing
> fd or purging queue (for vhost uesr).
> 

Hi, Jason
Actually, vhost_backend_init() cannot fail at present for either vhost-user
or vhost-backend-type-kernel. Besides, vhost-user's
vhost_backend_cleanup() just sets dev->opaque to 0;
it doesn't purge queues.

Regards,
-Gonglei

Re: [Qemu-devel] [PATCH 0/7] coroutine: optimizations

2014-11-30 Thread Peter Lieven


On 01.12.2014 06:55, Ming Lei wrote:

On Fri, Nov 28, 2014 at 10:12 PM, Paolo Bonzini  wrote:

As discussed in the other thread, this brings speedups from
dropping the coroutine mutex (which serializes multiple iothreads,
too) and using ELF thread-local storage.

The speedup in perf/cost is about 30% (190->145).  Windows port tested
with tests/test-coroutine.exe under Wine.

The data is very nice, and in my laptop, 'perf cost' can be decreased
from 244ns to 174ns.

BTW, the cost by using coroutine to run function isn't only from these
helpers(*_yield, *_enter, *_create, and perf-cost just measures
this part of cost), but also some implicit/invisible part. I have some
test cases which can show the problem. If someone is interested,
I can post them in list.


Of course, please do; maybe the problem can be solved or mitigated.

Peter

[Qemu-devel] [Bug 1363641] Re: Build of v2.1.0 fails on armv7l due to undeclared __NR_select

2014-11-30 Thread Ben Gelb

Hi Eduardo - your above commit doesn't update the version in the error message 
(a few lines below, still says >= 2.1.0). 
Sorry if this isn't the right place to comment on your patch, but it would be 
nice to fix (just spent a while trying to figure out why having 2.1.0 installed 
wasn't satisfying the configure check).

Also, I think the way the if statement is constructed it will not
properly apply the 2.1.1 version check for i386 (only for x86_64).

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1363641

Title:
  Build of v2.1.0 fails on armv7l due to undeclared __NR_select

Status in QEMU:
  New

Bug description:
  After `make clean` and `git clean -x -f -d` `git checkout v2.1.0 &&
  configure --prefix=/home/user/prefix-qemu-2.1.0 && make` fails due to
  missing declarations

  CC    qemu-seccomp.o
  qemu-seccomp.c:28:1: error: '__NR_select' undeclared here (not in a function)
  qemu-seccomp.c:36:1: error: '__NR_mmap' undeclared here (not in a function)
  qemu-seccomp.c:57:1: error: '__NR_getrlimit' undeclared here (not in a function)
  qemu-seccomp.c:96:1: error: '__NR_time' undeclared here (not in a function)
  GEN   qmp-marshal.c
  qemu-seccomp.c:186:1: error: '__NR_alarm' undeclared here (not in a function)
  make: *** [qemu-seccomp.o] Error 1

  Same errors for master 8b3030114a449e66c68450acaac4b66f26d91416.
  `configure` should not succeed for a failing build if the error occurs
  due to missing dependencies, if it's a bug it needs to be fixed.
  `config.log` for v2.1.0 and 8b303011... attached. The content is
  mostly compiler output which I think is unusual for `config.log`, but
  see for yourself.

  I'm building on a debian 7.6 chroot on Synology DSM 5.0. `uname -a`
  says `Linux diskstatation 3.2.40 #4493 SMP Thu Aug 21 21:43:02 CST
  2014 armv7l GNU/Linux`.

  After installing some of the missing dependencies, i.e. `apt-get
  install liblzo2-dev libbsd-dev syslinux-common libhwloc-dev librdmacm-
  dev libsnappy-dev libibverbs-dev valgrind linux-
  headers-3.2.0-4-common` I'm getting

  CC    migration-rdma.o
  migration-rdma.c: In function 'ram_chunk_start':
  migration-rdma.c:523:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
  migration-rdma.c: In function '__qemu_rdma_add_block':
  migration-rdma.c:556:49: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
  migration-rdma.c:557:49: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
  migration-rdma.c: In function '__qemu_rdma_delete_block':
  migration-rdma.c:664:45: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
  migration-rdma.c:699:49: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
  migration-rdma.c: In function 'qemu_rdma_search_ram_block':
  migration-rdma.c:1113:49: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
  migration-rdma.c: In function 'qemu_rdma_register_and_get_keys':
  migration-rdma.c:1176:50: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
  migration-rdma.c:1177:29: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
  migration-rdma.c:1177:51: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
  migration-rdma.c:1178:29: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
  migration-rdma.c: In function 'qemu_rdma_post_send_control':
  migration-rdma.c:1562:36: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
  migration-rdma.c: In function 'qemu_rdma_post_recv_control':
  migration-rdma.c:1616:37: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
  migration-rdma.c: In function 'qemu_rdma_write_one':
  migration-rdma.c:1864:16: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
  migration-rdma.c:1868:53: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
  migration-rdma.c:1922:52: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
  migration-rdma.c:1923:50: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
  migration-rdma.c:1977:49: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
  migration-rdma.c:1998:49: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
  migration-rdma.c:2010:58: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
  migration-rdma.c: In function 'qemu_rdma_registration_handle':
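
Cast warnings of this kind usually appear when a 64-bit integer is converted
directly to or from a pointer on a host with 32-bit pointers, such as armv7l.
A minimal sketch of the usual workaround follows, assuming the addresses are
carried in uint64_t values; the helper names are hypothetical and are not taken
from migration-rdma.c.

```c
#include <assert.h>
#include <stdint.h>

/* uintptr_t is the integer type sized for pointers; this holds on mainstream hosts. */
static_assert(sizeof(uintptr_t) >= sizeof(void *),
              "uintptr_t must be able to hold a pointer");

/* Pointer -> 64-bit integer: convert to uintptr_t first, then widen. */
static inline uint64_t ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/*
 * 64-bit integer -> pointer: narrow to uintptr_t first, then convert.
 * On a 32-bit host the upper 32 bits are dropped, so this is only
 * meaningful for values that originally came from a local pointer.
 */
static inline void *u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}
```

Going through uintptr_t makes the size change explicit, which is what silences
-Wint-to-pointer-cast and -Wpointer-to-int-cast; it does not make a remote
64-bit address usable as a local pointer on a 32-bit host.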

Re: [Qemu-devel] [PATCH 0/7] coroutine: optimizations

2014-11-30 Thread Ming Lei

On Mon, 01 Dec 2014 08:05:17 +0100
Peter Lieven  wrote:

> On 01.12.2014 06:55, Ming Lei wrote:
> > On Fri, Nov 28, 2014 at 10:12 PM, Paolo Bonzini  wrote:
> >> As discussed in the other thread, this brings speedups from
> >> dropping the coroutine mutex (which serializes multiple iothreads,
> >> too) and using ELF thread-local storage.
> >>
> >> The speedup in perf/cost is about 30% (190->145).  Windows port tested
> >> with tests/test-coroutine.exe under Wine.
> > The data is very nice, and in my laptop, 'perf cost' can be decreased
> > from 244ns to 174ns.
> >
> > BTW, the cost by using coroutine to run function isn't only from these
> > helpers(*_yield, *_enter, *_create, and perf-cost just measures
> > this part of cost), but also some implicit/invisible part. I have some
> > test cases which can show the problem. If someone is interested,
> > I can post them in list.
> 
> Of course, maybe the problem can be solved or impaired.

OK, please try below patch:

From 917d5cc0a273f9825b10abd52152c54e08c81ef8 Mon Sep 17 00:00:00 2001
From: Ming Lei 
Date: Mon, 1 Dec 2014 11:11:23 +0800
Subject: [PATCH] test-coroutine: introduce perf-cost-with-load

The perf/cost test case only covers the explicit cost of
using a coroutine.

This patch provides an open/close file test case, and
from this case we can see that there is also some implicit
or invisible cost beyond the cost measured by /perf/cost.

In my environment, the test results after applying this
patch and running perf/cost and perf/cost-with-load are as follows:

{*LOG(start):{/perf/cost}:LOG*}
/perf/cost: {*LOG(message):{Run operation 40000000 iterations 7.539413
s, 5305K operations/s, 188ns per coroutine}:LOG*}
OK
{*LOG(stop):(0;0;7.539497):LOG*}

{*LOG(start):{/perf/cost-with-load}:LOG*}
/perf/cost-with-load: {*LOG(message):{Run operation 1000000 iterations
2.648014 s, 377K operations/s, 2648ns per operation without using
coroutine}:LOG*}
{*LOG(message):{Run operation 1000000 iterations 2.919133 s, 342K
operations/s, 2919ns per operation, 271ns(cost introduced by coroutine)
per operation with using coroutine}:LOG*}
OK
{*LOG(stop):(0;0;5.567333):LOG*}

From the above data, we can see that 188ns is the cost of running one
coroutine, but in /perf/cost-with-load the actual cost introduced
is 271ns, so the extra 83ns cost is invisible and implicit.

Similar results can be found in the following test cases too:
- read from /dev/nullb0 opened with O_DIRECT
(a sort of aio read simulation; needs a 3.13+ kernel for
/dev/nullbX support via 'modprobe null_blk'; this case
shows +150ns extra cost)
- statvfs() syscall: there is ~30ns extra cost for running
one statvfs() with a coroutine
---
 tests/test-coroutine.c |   67 
 1 file changed, 67 insertions(+)

diff --git a/tests/test-coroutine.c b/tests/test-coroutine.c
index 27d1b6f..7323a91 100644
--- a/tests/test-coroutine.c
+++ b/tests/test-coroutine.c
@@ -311,6 +311,72 @@ static void perf_baseline(void)
 maxcycles, duration);
 }
 
+static void perf_cost_load_worker(void *opaque)
+{
+    int fd;
+
+    fd = open("/proc/self/exe", O_RDONLY);
+    assert(fd >= 0);
+    close(fd);
+}
+
+static __attribute__((noinline)) void perf_cost_load_func(void *opaque)
+{
+    perf_cost_load_worker(opaque);
+    qemu_coroutine_yield();
+}
+
+static double perf_cost_load(unsigned long maxcycles, bool use_co)
+{
+    unsigned long i = 0;
+    double duration;
+
+    g_test_timer_start();
+    if (use_co) {
+        Coroutine *co;
+        while (i++ < maxcycles) {
+            co = qemu_coroutine_create(perf_cost_load_func);
+            qemu_coroutine_enter(co, &i);
+            qemu_coroutine_enter(co, NULL);
+        }
+    } else {
+        while (i++ < maxcycles) {
+            perf_cost_load_worker(&i);
+        }
+    }
+    duration = g_test_timer_elapsed();
+
+    return duration;
+}
+
+static void perf_cost_with_load(void)
+{
+    const unsigned long maxcycles = 1000000;
+    double duration;
+    unsigned long ops;
+    unsigned long cost_co, cost;
+
+    duration = perf_cost_load(maxcycles, false);
+    ops = (long)(maxcycles / (duration * 1000));
+    cost = (unsigned long)(1000000000.0 * duration / maxcycles);
+    g_test_message("Run operation %lu iterations %f s, %luK operations/s, "
+                   "%luns per operation without using coroutine",
+                   maxcycles,
+                   duration, ops,
+                   cost);
+
+    duration = perf_cost_load(maxcycles, true);
+    ops = (long)(maxcycles / (duration * 1000));
+    cost_co = (unsigned long)(1000000000.0 * duration / maxcycles);
+    g_test_message("Run operation %lu iterations %f s, %luK operations/s, "
+                   "%luns per operation, "
+                   "%luns(cost introduced by coroutine) per operation "
+                   "wit

Re: [Qemu-devel] [PATCH] vhost: Fix vhostfd leak in error branch

2014-11-30 Thread Jason Wang




On Mon, Dec 1, 2014 at 2:27 PM, Gonglei  wrote:

On 2014/12/1 13:03, Jason Wang wrote:

 
 
 On Fri, Nov 28, 2014 at 5:26 PM, arei.gong...@huawei.com wrote:

 From: Gonglei 

 Signed-off-by: Gonglei 
 ---
  hw/scsi/vhost-scsi.c | 1 +
  hw/virtio/vhost.c| 2 ++
  2 files changed, 3 insertions(+)

 diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
 index 308b393..dcb2bc5 100644
 --- a/hw/scsi/vhost-scsi.c
 +++ b/hw/scsi/vhost-scsi.c
 @@ -233,6 +233,7 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp)
 vhost_dummy_handle_output);
  if (err != NULL) {
  error_propagate(errp, err);
 +close(vhostfd);
  return;
  }
  
 diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c

 index 5d7c40a..5a12861 100644
 --- a/hw/virtio/vhost.c
 +++ b/hw/virtio/vhost.c
 @@ -817,10 +817,12 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
  int i, r;
  
  if (vhost_set_backend_type(hdev, backend_type) < 0) {

 +close((uintptr_t)opaque);
  return -1;
  }
  
  if (hdev->vhost_ops->vhost_backend_init(hdev, opaque) < 0) {

 +close((uintptr_t)opaque);
  return -errno;
  }
  
 
 Patch looks fine.
 
 I wonder whether setting errno and goto fail would be better here?
 This will let vhost_backend_cleanup() to do the cleanup, e.g closeing
 fd or purging queue (for vhost uesr).
 


Hi, Jason
Actually, vhost_backend_init() can not fail for both vhost-usr
and vhost-backend-type-kernel  at present. Besides, vhost-usr'
s vhost_backend_cleanup() just set dev->opaque to 0,
don't purge queues.



I see, thanks for explaining.

Reviewed-by: Jason Wang
