Re: [Qemu-devel] master: intermittent acpi-test failures
On Sun, Nov 30, 2014 at 05:12:55PM +0200, Michael S. Tsirkin wrote:
> On Sat, Nov 29, 2014 at 05:39:01PM +, Peter Maydell wrote:
> > On 29 November 2014 at 17:36, Michael S. Tsirkin m...@redhat.com wrote:
> > > On Fri, Nov 28, 2014 at 01:34:33PM +, Peter Maydell wrote:
> > > > These failures are back after a long period of not being a problem :-(
> > >
> > > My guess is VM fails to boot from disk for some reason.
> > > Could you trigger a screenshot after this happens?
> >
> > Sure, if you can provide instructions (this is all from make check
> > so there's no display by default and extracting a standalone qemu
> > command line from make check is pretty tedious IME).
> >
> > -- PMM
>
> It's probably easiest to simply drop -nographic from test code
> to run with a display.
> To trigger a screenshot, just give screendump /path/to/file on hmp.

Another idea is to configure debugging in seabios.

-- MST
Re: [Qemu-devel] [PATCH RFC for-2.2] virtio-blk: force 1st s/g to match header
On Fri, Nov 28, 2014 at 04:14:35PM +, Peter Maydell wrote:
> On 28 November 2014 at 11:43, Stefan Hajnoczi stefa...@gmail.com wrote:
> > Right, the test case explicitly tests different descriptor layouts,
> > even though virtio-blk-pci does not set the ANY_LAYOUT feature bit.
> >
> > Either the test case needs to check ANY_LAYOUT before using the
> > 2-descriptor layout or it needs to expect QEMU to refuse (in this case
> > exit(1), which is not very graceful).
> >
> > The quick fix is to skip the 2-descriptor layout tests and re-enable
> > them once virtio-blk actually supports ANY_LAYOUT. Any objections?
>
> So what do we want to do with this for 2.2? We have I think two choices:
> (1) say that this isn't causing problems in practice, and defer all
>     this to 2.3
> (2) add something like this patch plus fix the 'make check' tests
>     (but turning "maybe something misbehaves" into "qemu definitely
>     blows up and exits" doesn't seem like a great improvement to me)
>
> I started looking at virtio-blk initially because I wasn't sure if
> we should fix the virtio-net issue in the core virtio code. But since
> we've decided not to do that, whether virtio-blk's problems are
> release-blockers or not is something that we can decide on their
> own merits.
>
> My current thought is that we don't need to address this for 2.2;
> is there something I'm missing that means we shouldn't defer to 2.3?
>
> thanks
> -- PMM

The result of this is a host mapping leak. What effect does this have?
Can this DoS the host? If not, I agree.
[Qemu-devel] [kernel PATCH v2 0/2] devicetree: document ARM bindings for QEMU's Firmware Config interface
V2 seeks to address comments raised in the v1 review. Changes are broken
out per patch, as git notes.

Thanks
Laszlo

Laszlo Ersek (2):
  devicetree: document the qemu and virtio vendor prefixes
  devicetree: document ARM bindings for QEMU's Firmware Config interface

 Documentation/devicetree/bindings/arm/fw-cfg.txt | 57 ++
 .../devicetree/bindings/vendor-prefixes.txt      |  2 +
 2 files changed, 59 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/fw-cfg.txt

--
1.8.3.1
[Qemu-devel] [kernel PATCH v2 1/2] devicetree: document the qemu and virtio vendor prefixes
The QEMU open source machine emulator and virtualizer presents firmware
and operating systems running in virtual machines (guests) with purely
virtual hardware (ie. hardware that has never existed in physical form).
Since QEMU exposes some of these devices in a DTB, it makes sense to
define qemu and virtio as vendor prefixes.

The qemu definition is from [1], revision 4451 (22:24, 25 November 2014).
The virtio definition is composed from [2] and [3].

[1] http://wiki.qemu.org/Main_Page
[2] http://docs.oasis-open.org/virtio/virtio/v1.0/csprd01/virtio-v1.0-csprd01.html
[3] http://en.wikipedia.org/wiki/OASIS_%28organization%29

Suggested-by: Mark Rutland mark.rutl...@arm.com
Suggested-by: Arnd Bergmann a...@arndb.de
Signed-off-by: Laszlo Ersek ler...@redhat.com
---

Notes:
    v2:
    - new in v2 [Mark Rutland, Arnd Bergmann]

 Documentation/devicetree/bindings/vendor-prefixes.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index a344ec2..df095c1 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -119,6 +119,7 @@ pixcir	PIXCIR MICROELECTRONICS Co., Ltd
 powervr	PowerVR (deprecated, use img)
 qca	Qualcomm Atheros, Inc.
 qcom	Qualcomm Technologies, Inc
+qemu	QEMU, a generic and open source machine emulator and virtualizer
 qnap	QNAP Systems, Inc.
 radxa	Radxa
 raidsonic	RaidSonic Technology GmbH
@@ -159,6 +160,7 @@ usi	Universal Scientific Industrial Co., Ltd.
 v3	V3 Semiconductor
 variscite	Variscite Ltd.
 via	VIA Technologies, Inc.
+virtio	Virtual I/O Device Specification, developed by the OASIS consortium
 voipac	Voipac Technologies s.r.o.
 winbond	Winbond Electronics corp.
 wlf	Wolfson Microelectronics
--
1.8.3.1
[Qemu-devel] [kernel PATCH v2 2/2] devicetree: document ARM bindings for QEMU's Firmware Config interface
Peter Maydell suggested that we describe new devices / DTB nodes in the
kernel Documentation tree that we expose to arm virt guests in QEMU.

Although the kernel is not required to access the fw_cfg interface,
Documentation/devicetree/bindings/arm is probably the best central spot
to keep the fw_cfg description in.

Suggested-by: Peter Maydell peter.mayd...@linaro.org
Signed-off-by: Laszlo Ersek ler...@redhat.com
---

Notes:
    v2:
    - more info on what the fw_cfg device is used for, versioning, blobs
      etc [Mark Rutland]
    - drop generic statements about DTB [Mark Rutland]
    - drop uint64_t language [Mark Rutland]
    - cover both registers with one contiguous region, of size 0x1000
      [Mark Rutland, Arnd Bergmann]
    - specify qemu,fw-cfg-mmio for the compatible property
      [Mark Rutland, Arnd Bergmann]
    - reorder DTS snippet so that compatible come first [Mark Rutland]

 Documentation/devicetree/bindings/arm/fw-cfg.txt | 57 
 1 file changed, 57 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/fw-cfg.txt

diff --git a/Documentation/devicetree/bindings/arm/fw-cfg.txt b/Documentation/devicetree/bindings/arm/fw-cfg.txt
new file mode 100644
index 000..15e2ae3
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/fw-cfg.txt
@@ -0,0 +1,57 @@
+* QEMU Firmware Configuration bindings for ARM
+
+QEMU's arm-softmmu and aarch64-softmmu emulation / virtualization targets
+provide the following Firmware Configuration interface on the virt machine
+type:
+
+- A write-only, 16-bit wide selector (or control) register,
+- a read-write, 8-bit wide data register.
+
+The guest writes a selector value (a key) to the selector register, and then
+can read the corresponding data (produced by QEMU) via the data register. If
+the selected entry is writable, the guest can rewrite it through the data
+register.
+
+The interface allows guest firmware to download various parameters and blobs
+that affect how the firmware works and what tables it installs for the guest
+OS. For example, boot order of devices, ACPI tables, SMBIOS tables, kernel and
+initrd images for direct kernel booting, virtual machine UUID, SMP information,
+virtual NUMA topology, and so on.
+
+The authoritative registry of the valid selector values and their meanings is
+the QEMU source code; the structure of the data blobs corresponding to the
+individual key values is also defined in the QEMU source code.
+
+The outermost protocol (involving the write / read sequences of the control and
+data registers) is unversioned and considered stable. Versioning of individual
+blobs is theoretically possible, but it is not specified on this level (and is
+not done in practice as yet).
+
+QEMU exposes the control and data register to x86 guests at fixed IO ports. ARM
+guests can access them as memory mapped registers, and their location is
+communicated to the guest's UEFI firmware in the DTB that QEMU places at the
+bottom of the guest's DRAM.
+
+The guest kernel is not expected to use these registers (although it is
+certainly allowed to); the device tree bindings are documented here because
+this is where device tree bindings reside in general.
+
+Required properties:
+
+- compatible: "qemu,fw-cfg-mmio".
+
+- reg: the MMIO region used by the device.
+  * The first two bytes in the region cover the control register.
+  * The third byte covers the data register.
+
+Example:
+
+/ {
+	#size-cells = <0x2>;
+	#address-cells = <0x2>;
+
+	fw-cfg@902 {
+		compatible = "qemu,fw-cfg-mmio";
+		reg = <0x0 0x902 0x0 0x1000>;
+	};
+};
--
1.8.3.1
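The select-then-read sequence described in the binding text can be sketched in C against a mock device. This is only an illustration under stated assumptions: MockFwCfg, MockEntry and the helper names are hypothetical, the key values are made up, and returning zero for reads past the end of an entry is an assumption of the mock, not something the binding text specifies. Real selector values and blob layouts are defined by the QEMU source code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the device side: one blob per selector key. */
typedef struct {
    uint16_t key;
    const uint8_t *data;
    size_t len;
} MockEntry;

typedef struct {
    const MockEntry *entries;
    size_t num_entries;
    const MockEntry *cur; /* entry chosen by the last selector write */
    size_t offset;        /* read position within the selected entry */
} MockFwCfg;

/* Guest side: write a 16-bit key to the (write-only) selector register. */
static void mock_select(MockFwCfg *dev, uint16_t key)
{
    assert(dev != NULL);
    dev->cur = NULL;
    dev->offset = 0;
    for (size_t i = 0; i < dev->num_entries; i++) {
        if (dev->entries[i].key == key) {
            dev->cur = &dev->entries[i];
            break;
        }
    }
}

/* Guest side: read one byte from the 8-bit data register. */
static uint8_t mock_read_data(MockFwCfg *dev)
{
    if (dev->cur == NULL || dev->offset >= dev->cur->len) {
        return 0; /* assumption of this mock: nothing left yields zero */
    }
    return dev->cur->data[dev->offset++];
}

/* Select an entry, then pull len bytes through the data register. */
static void fw_cfg_read_blob(MockFwCfg *dev, uint16_t key,
                             uint8_t *buf, size_t len)
{
    mock_select(dev, key);
    for (size_t i = 0; i < len; i++) {
        buf[i] = mock_read_data(dev);
    }
}
```

On real hardware the two helpers would be a 16-bit store and an 8-bit load at the offsets given by the reg property; the mock keeps the sketch runnable without a guest.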
[Qemu-devel] [PATCH v2] arm: add fw_cfg to virt board
fw_cfg already supports exposure over MMIO (used in ppc/mac_newworld.c,
ppc/mac_oldworld.c, sparc/sun4m.c); we can easily add it to the virt
board.

The mmio register block of fw_cfg is advertized in the device tree. As
base address we pick 0x0902, which conforms to the comment preceding
a15memmap: it falls in the miscellaneous device I/O range 128MB..256MB,
and it is aligned at 64KB.

The DTB properties follow the documentation in the Linux source file
Documentation/devicetree/bindings/arm/fw-cfg.txt.

fw_cfg automatically exports a number of files to the guest; for
example, bootorder (see fw_cfg_machine_reset()).

Signed-off-by: Laszlo Ersek ler...@redhat.com
---

Notes:
    v2:
    - use a single mmio region of size 0x1000
    - set compatible property to qemu,fw-cfg-mmio

 hw/arm/virt.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 314e55b..af794ea 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -68,6 +68,7 @@ enum {
     VIRT_UART,
     VIRT_MMIO,
     VIRT_RTC,
+    VIRT_FW_CFG,
 };
 
 typedef struct MemMapEntry {
@@ -107,6 +108,7 @@ static const MemMapEntry a15memmap[] = {
     [VIRT_GIC_CPU] =    { 0x0801, 0x0001 },
     [VIRT_UART] =       { 0x0900, 0x1000 },
     [VIRT_RTC] =        { 0x0901, 0x1000 },
+    [VIRT_FW_CFG] =     { 0x0902, 0x1000 },
     [VIRT_MMIO] =       { 0x0a00, 0x0200 },
     /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
     /* 0x1000 .. 0x4000 reserved for PCI */
@@ -519,6 +521,23 @@ static void create_flash(const VirtBoardInfo *vbi)
     g_free(nodename);
 }
 
+static void create_fw_cfg(const VirtBoardInfo *vbi)
+{
+    hwaddr base = vbi->memmap[VIRT_FW_CFG].base;
+    hwaddr size = vbi->memmap[VIRT_FW_CFG].size;
+    char *nodename;
+
+    fw_cfg_init(0, 0, base, base + 2);
+
+    nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
+    qemu_fdt_add_subnode(vbi->fdt, nodename);
+    qemu_fdt_setprop_string(vbi->fdt, nodename,
+                            "compatible", "qemu,fw-cfg-mmio");
+    qemu_fdt_setprop_sized_cells(vbi->fdt, nodename, "reg",
+                                 2, base, 2, size);
+    g_free(nodename);
+}
+
 static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
 {
     const VirtBoardInfo *board = (const VirtBoardInfo *)binfo;
@@ -604,6 +623,8 @@ static void machvirt_init(MachineState *machine)
      */
     create_virtio_devices(vbi, pic);
 
+    create_fw_cfg(vbi);
+
     vbi->bootinfo.ram_size = machine->ram_size;
     vbi->bootinfo.kernel_filename = machine->kernel_filename;
     vbi->bootinfo.kernel_cmdline = machine->kernel_cmdline;
--
1.8.3.1
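The two placement constraints named in the commit message (inside the 128MB..256MB miscellaneous device I/O range, aligned at 64KB) and the "/fw-cfg@<base>" node naming can be checked with a small standalone sketch. The helper names are hypothetical and not part of the patch; treating the upper bound of the range as exclusive is an assumption, and any base value passed in is only a sample.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MISC_IO_START (128ULL * 1024 * 1024) /* 128MB */
#define MISC_IO_END   (256ULL * 1024 * 1024) /* 256MB, assumed exclusive */
#define ALIGN_64K     (64ULL * 1024)

/* True if base lies in 128MB..256MB and is aligned at 64KB. */
static bool base_fits_misc_io(uint64_t base)
{
    return base >= MISC_IO_START && base < MISC_IO_END &&
           (base % ALIGN_64K) == 0;
}

/*
 * Format "/fw-cfg@<base in lowercase hex>" into buf, the same pattern the
 * patch builds with g_strdup_printf(). Returns snprintf()'s result.
 */
static int format_fw_cfg_nodename(char *buf, size_t len, uint64_t base)
{
    assert(buf != NULL);
    return snprintf(buf, len, "/fw-cfg@%" PRIx64, base);
}
```

A base such as 0x09020000 (an illustrative value chosen here because it satisfies both constraints) passes the check, while the same value plus 0x800 fails the alignment test.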
[Qemu-devel] [PATCH v5 6/6] hw/arm/virt: add dynamic sysbus device support
Allows sysbus devices to be instantiated from command line by using -device option. Machvirt creates a platform bus at init. The dynamic sysbus devices are attached to this platform bus device. The platform bus device registers a machine init done notifier whose role will be to bind the dynamic sysbus devices. Indeed dynamic sysbus devices are created after machine init. machvirt also registers a notifier that will build the device tree nodes for the platform bus and its children dynamic sysbus devices. Signed-off-by: Alexander Graf ag...@suse.de Signed-off-by: Eric Auger eric.au...@linaro.org --- v4 - v5: - platform_bus_params becomes static const - reword comment in create_platform_bus - reword the commit message v3 - v4: - use platform bus object, instantiated in create_platform_bus - device tree generation for platform bus and children dynamic sysbus devices is no more handled at reset but in a machine_init_done_notifier (due to the change in implementaion of ARM load dtb using rom_add_blob_fixed). - device tree enhancement now takes into account the case of user provided dtb. Before the user dtb was overwritten which was wrong. However in case the dtb is provided by the user, dynamic sysbus nodes are not added there. - renaming of MACHVIRT_PLATFORM defines - MACHVIRT_PLATFORM_PAGE_SHIFT and SIZE_PAGES not needed anymore, hence removed. - DynSysbusParams struct renamed into ARMPlatformBusSystemParams and above params removed. - separation of dt creation and QEMU binding is not mandated anymore since the device tree is not created from scratch anymore. Instead the modify_dtb function is used. - create_platform_bus registers another machine init done notifier to start VFIO IRQ handling. This latter executes after the dynamic sysbus device binding. 
v2 - v3: - renaming of arm_platform_bus_create_devtree and arm_load_dtb - add copyright in hw/arm/dyn_sysbus_devtree.c v1 - v2: - remove useless vfio-platform.h include file - s/MACHVIRT_PLATFORM_HOLE/MACHVIRT_PLATFORM_SIZE - use dyn_sysbus_binding and dyn_sysbus_devtree - dynamic sysbus platform buse size shrinked to 4MB and moved between RTC and MMIO v1: Inspired from what Alex Graf did in ppc e500 https://lists.gnu.org/archive/html/qemu-ppc/2014-07/msg00012.html Conflicts: hw/arm/sysbus-fdt.c --- hw/arm/virt.c | 57 + 1 file changed, 57 insertions(+) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 314e55b..37326a9 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -42,6 +42,8 @@ #include exec/address-spaces.h #include qemu/bitops.h #include qemu/error-report.h +#include hw/arm/sysbus-fdt.h +#include hw/platform-bus.h #define NUM_VIRTIO_TRANSPORTS 32 @@ -59,6 +61,11 @@ #define GIC_FDT_IRQ_PPI_CPU_START 8 #define GIC_FDT_IRQ_PPI_CPU_WIDTH 8 +#define PLATFORM_BUS_BASE 0x940 +#define PLATFORM_BUS_SIZE (4ULL * 1024 * 1024) +#define PLATFORM_BUS_FIRST_IRQ48 +#define PLATFORM_BUS_NUM_IRQS 20 + enum { VIRT_FLASH, VIRT_MEM, @@ -68,6 +75,7 @@ enum { VIRT_UART, VIRT_MMIO, VIRT_RTC, +VIRT_PLATFORM_BUS, }; typedef struct MemMapEntry { @@ -107,6 +115,7 @@ static const MemMapEntry a15memmap[] = { [VIRT_GIC_CPU] ={ 0x0801, 0x0001 }, [VIRT_UART] = { 0x0900, 0x1000 }, [VIRT_RTC] ={ 0x0901, 0x1000 }, +[VIRT_PLATFORM_BUS] = {PLATFORM_BUS_BASE , PLATFORM_BUS_SIZE}, [VIRT_MMIO] = { 0x0a00, 0x0200 }, /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */ /* 0x1000 .. 
0x4000 reserved for PCI */ @@ -117,6 +126,14 @@ static const int a15irqmap[] = { [VIRT_UART] = 1, [VIRT_RTC] = 2, [VIRT_MMIO] = 16, /* ...to 16 + NUM_VIRTIO_TRANSPORTS - 1 */ +[VIRT_PLATFORM_BUS] = PLATFORM_BUS_FIRST_IRQ, +}; + +static const ARMPlatformBusSystemParams platform_bus_params = { +.platform_bus_base = PLATFORM_BUS_BASE, +.platform_bus_size = PLATFORM_BUS_SIZE, +.platform_bus_first_irq = PLATFORM_BUS_FIRST_IRQ, +.platform_bus_num_irqs = PLATFORM_BUS_NUM_IRQS, }; static VirtBoardInfo machines[] = { @@ -519,6 +536,43 @@ static void create_flash(const VirtBoardInfo *vbi) g_free(nodename); } +static void create_platform_bus(VirtBoardInfo *vbi, qemu_irq *pic, +const ARMPlatformBusSystemParams *system_params) +{ +DeviceState *dev; +SysBusDevice *s; +int i; +ARMPlatformBusFdtParams *fdt_params = g_new(ARMPlatformBusFdtParams, 1); +MemoryRegion *sysmem = get_system_memory(); + +fdt_params-system_params = system_params; +fdt_params-binfo = vbi-bootinfo; +fdt_params-intc = /intc; +/* + * register a machine init done notifier that creates the device tree + * nodes of the platform bus and its children dynamic sysbus devices + */ +arm_register_platform_bus_fdt_creator(fdt_params); + +dev =
[Qemu-devel] [PATCH v5 5/6] hw/arm/sysbus-fdt: helpers for platform bus nodes addition
This new C module will be used by ARM machine files to generate platform bus node and their dynamic sysbus device tree nodes. Dynamic sysbus device node addition is done in a machine init done notifier. arm_register_platform_bus_fdt_creator does the registration of this latter and is supposed to be called by ARM machine files that support platform bus and their dynamic sysbus. Addition of dynamic sysbus nodes is done only if the user did not provide any dtb. Signed-off-by: Alexander Graf ag...@suse.de Signed-off-by: Eric Auger eric.au...@linaro.org --- v4 - v5: - change indentation in add_fdt_node_functions. Also becomes a static const. - ARMPlatformBusFdtParams.system_params becomes a pointer to a const ARMPlatformBusSystemParams - removes platform-bus.h second inclusion v3 - v4: - dyn_sysbus_devtree.c renamed into sysbus-fdt.c - use new PlatformBusDevice object - the dtb upgrade is done through modify_dtb. Before the fdt was recreated from scratch. When the user provided a dtb this latter was overwritten which was not correct. - an array contains the association between device type names and their node creation function - I must aknowledge I did not find any cleaner way to implement a FDT_BUILDER interface, as suggested by Paolo. The class method would need to be initialized somewhere and since it cannot happen in the device itself - according to Alex Peter comments -, I don't see when I shall associate the device type and its interface implementation. v2 - v3: - add arm_ prefix - arm_sysbus_device_create_devtree becomes static v1 - v2: - Code moved in an arch specific file to accomodate architecture dependent specificities. 
- remove platform_bus_base from PlatformDevtreeData v1: code originally written by Alex Graf in e500.c and reused for ARM [Eric Auger] --- hw/arm/Makefile.objs| 1 + hw/arm/sysbus-fdt.c | 180 include/hw/arm/sysbus-fdt.h | 50 3 files changed, 231 insertions(+) create mode 100644 hw/arm/sysbus-fdt.c create mode 100644 include/hw/arm/sysbus-fdt.h diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs index 6088e53..0cc63e1 100644 --- a/hw/arm/Makefile.objs +++ b/hw/arm/Makefile.objs @@ -3,6 +3,7 @@ obj-$(CONFIG_DIGIC) += digic_boards.o obj-y += integratorcp.o kzm.o mainstone.o musicpal.o nseries.o obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o obj-y += tosa.o versatilepb.o vexpress.o virt.o xilinx_zynq.o z2.o +obj-y += sysbus-fdt.o obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o obj-$(CONFIG_DIGIC) += digic.o diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c new file mode 100644 index 000..7537267 --- /dev/null +++ b/hw/arm/sysbus-fdt.c @@ -0,0 +1,180 @@ +/* + * ARM Platform Bus device tree generation helpers + * + * Copyright (c) 2014 Linaro Limited + * + * Authors: + * Alex Graf ag...@suse.de + * Eric Auger eric.au...@linaro.org + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2 or later, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. 
+ * + */ + +#include hw/arm/sysbus-fdt.h +#include qemu/error-report.h +#include sysemu/device_tree.h +#include hw/platform-bus.h +#include sysemu/sysemu.h + +/* + * internal struct that contains the information to create dynamic + * sysbus device node + */ +typedef struct PlatformBusFdtData { +void *fdt; /* device tree handle */ +int irq_start; /* index of the first IRQ usable by platform bus devices */ +const char *pbus_node_name; /* name of the platform bus node */ +PlatformBusDevice *pbus; +} PlatformBusFdtData; + +/* + * struct used when calling the machine init done notifier + * that constructs the fdt nodes of platform bus devices + */ +typedef struct PlatformBusFdtNotifierParams { +ARMPlatformBusFdtParams *fdt_params; +Notifier notifier; +} PlatformBusFdtNotifierParams; + +/* struct that associates a device type name and a node creation function */ +typedef struct NodeCreationPair { +const char *typename; +int (*add_fdt_node_fn)(SysBusDevice *sbdev, void *opaque); +} NodeCreationPair; + +/* list of supported dynamic sysbus devices */ +static const NodeCreationPair add_fdt_node_functions[] = { +{, NULL}, /*last element*/ +}; + +/** + * add_fdt_node - add the device tree node of a dynamic sysbus device + * + * @sbdev: handle to the sysbus device + * @opaque: handle to the PlatformBusFdtData + * + * Checks the sysbus type belongs to the list of device types
[Qemu-devel] [PATCH v5 0/6] machvirt dynamic sysbus device instantiation
This patch series enables machvirt to dynamically instantiate sysbus
devices from the command line (using the -device option).

All those sysbus devices are plugged onto a platform bus. This latter
device is instantiated in machvirt and takes care of binding its
children sysbus devices in a machine init done notifier.

The device tree node generation for the children dynamic sysbus devices
also happens in a notifier, one that must be executed after the binding
notifier. machvirt registers that notifier before the platform bus
creation to make sure notifiers are executed in the right order: dt
generation after actual QOM binding.

Very few sysbus devices are supposed to be instantiated that way. VFIO
devices belong to them.

Node creation really is architecture specific. On ARM the dynamic sysbus
device node creation is implemented in a new C module,
hw/arm/sysbus-fdt.c, and not in the machine file.

Machvirt transformations and sysbus-fdt are largely inspired by Alex's
work.

The patch series can be found at:
http://git.linaro.org/people/eric.auger/qemu.git
(branch vfio_integ_v8)

Best Regards

Eric

v4 -> v5:
- in virt.c: platform_bus_params becomes static const
- sysbus-fdt: change indentation in add_fdt_node_functions array init
- s/load_dtb/arm_load_dtb in one boot.c comment

v3 -> v4:
- dyn_sysbus_binding removed since the binding is now implemented by
  the platform bus device
- due to a change in the ARM load_dtb implementation using
  rom_add_blob_fixed, the dt is no longer generated in a reset notifier
  but in a machine init done notifier
- the augmented device tree is no longer generated from scratch but is
  added using a modify_dtb function. This required some small change in
  boot.c
- the case where the user provides a dtb file is now handled
- some cleanup in virt additions
- implement a list of dynamically instantiable devices in sysbus-fdt

v2 -> v3:
- patch now applies on top of Alex full patchset
- dyn_sysbus_devtree: add arm_ prefix to emphasize the fact those
  functions are arm specific; arm_sysbus_device_create_devtree becomes
  static
- load_dtb renamed into arm_load_dtb
- add copyright in hw/arm/dyn_sysbus_devtree.c

Eric Auger (6):
  hw/arm/boot: load_dtb becomes non static arm_load_dtb
  hw/arm/boot: dtb start and limit moved in arm_boot_info
  hw/arm/boot: do not free VirtBoardInfo fdt in arm_load_dtb
  hw/arm: add a new modify_dtb_opaque field in arm_boot_info
  hw/arm/sysbus-fdt: helpers for platform bus nodes addition
  hw/arm/virt: add dynamic sysbus device support

 hw/arm/Makefile.objs        |   1 +
 hw/arm/boot.c               |  52 +++--
 hw/arm/sysbus-fdt.c         | 180 
 hw/arm/virt.c               |  57 ++
 include/hw/arm/arm.h        |   7 ++
 include/hw/arm/sysbus-fdt.h |  50 
 6 files changed, 325 insertions(+), 22 deletions(-)
 create mode 100644 hw/arm/sysbus-fdt.c
 create mode 100644 include/hw/arm/sysbus-fdt.h

--
1.8.3.2
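The ordering argument in the cover letter (register the device tree notifier first so that it runs after the binding notifier) only works if notifiers run in reverse registration order. The sketch below assumes exactly that, using a toy list that inserts at the head. DemoNotifier, DemoNotifierList and the callbacks are hypothetical names for illustration; this is not QEMU's notifier API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoNotifier {
    void (*notify)(struct DemoNotifier *n, void *data);
    struct DemoNotifier *next;
} DemoNotifier;

typedef struct {
    DemoNotifier *head;
} DemoNotifierList;

/* Each callback appends one letter so the execution order is observable. */
typedef struct {
    char buf[8];
    size_t len;
} Trace;

static void trace_push(Trace *t, char c)
{
    if (t->len + 1 < sizeof(t->buf)) {
        t->buf[t->len++] = c;
        t->buf[t->len] = '\0';
    }
}

/* Assumption: new notifiers go to the head, so the latest one runs first. */
static void demo_list_add(DemoNotifierList *list, DemoNotifier *n)
{
    assert(n->notify != NULL);
    n->next = list->head;
    list->head = n;
}

static void demo_list_notify(DemoNotifierList *list, void *data)
{
    for (DemoNotifier *n = list->head; n != NULL; n = n->next) {
        n->notify(n, data);
    }
}

static void bind_devices(DemoNotifier *n, void *data)
{
    (void)n;
    trace_push(data, 'B'); /* QOM binding of the dynamic sysbus devices */
}

static void build_fdt_nodes(DemoNotifier *n, void *data)
{
    (void)n;
    trace_push(data, 'F'); /* device tree node generation */
}

/* Register the FDT notifier first and the binding notifier second. */
static const char *demo_run(Trace *t)
{
    DemoNotifierList list = { NULL };
    DemoNotifier fdt = { build_fdt_nodes, NULL };
    DemoNotifier bind = { bind_devices, NULL };

    demo_list_add(&list, &fdt);  /* before the platform bus is created */
    demo_list_add(&list, &bind); /* by the platform bus itself */
    demo_list_notify(&list, t);
    return t->buf;
}
```

With head insertion, the trace records binding before FDT generation even though the FDT notifier was registered first, which is the order the series relies on.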
[Qemu-devel] [PATCH v5 2/6] hw/arm/boot: dtb start and limit moved in arm_boot_info
Two fields are added in arm_boot_info (dtb_start and dtb_limit). The prototype of arm_load_kernel is changed to only use arm_boot_info. The rationale behind introducing that change is when dealing with dynamic sysbus devices, we need to upgrade the device tree with dynamic device nodes after the dtb is already loaded. Storing those parameters in arm_boot_info allows to avoid computing again dtb_start and dtb_load, as done in arm_load_kernel. Signed-off-by: Eric Auger eric.au...@linaro.org --- hw/arm/boot.c| 38 +- include/hw/arm/arm.h | 5 +++-- 2 files changed, 24 insertions(+), 19 deletions(-) diff --git a/hw/arm/boot.c b/hw/arm/boot.c index 9997bea..0398cd4 100644 --- a/hw/arm/boot.c +++ b/hw/arm/boot.c @@ -314,24 +314,21 @@ static void set_kernel_args_old(const struct arm_boot_info *info) /** * arm_load_dtb() - load a device tree binary image into memory - * @addr: the address to load the image at * @binfo: struct describing the boot environment - * @addr_limit: upper limit of the available memory area at @addr * * Load a device tree supplied by the machine or by the user with the - * '-dtb' command line option, and put it at offset @addr in target - * memory. + * '-dtb' command line option, and put it at offset binfo-dtb_start in + * target memory. * - * If @addr_limit contains a meaningful value (i.e., it is strictly greater - * than @addr), the device tree is only loaded if its size does not exceed - * the limit. + * If binfo-dtb_limit contains a meaningful value (i.e., it is strictly + * greater binfo-dtb_start, the device tree is only loaded if its size does + * not exceed this upper limit. * * Returns: the size of the device tree image on success, * 0 if the image size exceeds the limit, * -1 on errors. 
*/ -int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo, - hwaddr addr_limit) +int arm_load_dtb(const struct arm_boot_info *binfo) { void *fdt = NULL; int size, rc; @@ -360,7 +357,8 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo, } } -if (addr_limit addr size (addr_limit - addr)) { +if (binfo-dtb_limit binfo-dtb_start +size (binfo-dtb_limit - binfo-dtb_start)) { /* Installing the device tree blob at addr would exceed addr_limit. * Whether this constitutes failure is up to the caller to decide, * so just return 0 as size, i.e., no error. @@ -427,7 +425,7 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo, /* Put the DTB into the memory map as a ROM image: this will ensure * the DTB is copied again upon reset, even if addr points into RAM. */ -rom_add_blob_fixed(dtb, fdt, size, addr); +rom_add_blob_fixed(dtb, fdt, size, binfo-dtb_start); g_free(fdt); @@ -504,7 +502,10 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info) /* If we have a device tree blob, but no kernel to supply it to, * copy it to the base of RAM for a bootloader to pick up. */ -if (arm_load_dtb(info-loader_start, info, 0) 0) { +info-dtb_start = info-loader_start; +info-dtb_limit = 0; + +if (arm_load_dtb(info) 0) { exit(1); } } @@ -572,7 +573,9 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info) if (elf_low_addr info-loader_start) { elf_low_addr = 0; } -if (arm_load_dtb(info-loader_start, info, elf_low_addr) 0) { +info-dtb_start = info-loader_start; +info-dtb_limit = elf_low_addr; +if (arm_load_dtb(info) 0) { exit(1); } } @@ -635,12 +638,13 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info) * kernels will trash anything in the 4K page the initrd * ends in, so make sure the DTB isn't caught up in that. 
*/ -hwaddr dtb_start = QEMU_ALIGN_UP(info-initrd_start + initrd_size, - 4096); -if (arm_load_dtb(dtb_start, info, 0) 0) { +info-dtb_start = QEMU_ALIGN_UP(info-initrd_start + initrd_size, +4096); +info-dtb_limit = 0; +if (arm_load_dtb(info) 0) { exit(1); } -fixupcontext[FIXUP_ARGPTR] = dtb_start; +fixupcontext[FIXUP_ARGPTR] = info-dtb_start; } else { fixupcontext[FIXUP_ARGPTR] = info-loader_start + KERNEL_ARGS_ADDR; if (info-ram_size = (1ULL 32)) { diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h index 5fdae7b..5f1ecb7 100644 --- a/include/hw/arm/arm.h +++ b/include/hw/arm/arm.h @@ -65,11 +65,12 @@ struct arm_boot_info { int is_linux; hwaddr initrd_start;
[Qemu-devel] [PATCH v5 1/6] hw/arm/boot: load_dtb becomes non static arm_load_dtb
load_dtb is renamed into arm_load_dtb and becomes non static. it will be
used by machvirt for dynamic instantiation of platform devices

Signed-off-by: Eric Auger eric.au...@linaro.org
---
v4 -> v5: s/load_dtb/arm_load_dtb in one comment
v2 -> v3: load_dtb renamed into arm_load_dtb

Conflicts:
	hw/arm/boot.c
---
 hw/arm/boot.c        | 16 
 include/hw/arm/arm.h |  2 ++
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 0014c34..9997bea 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -313,7 +313,7 @@ static void set_kernel_args_old(const struct arm_boot_info *info)
 }
 
 /**
- * load_dtb() - load a device tree binary image into memory
+ * arm_load_dtb() - load a device tree binary image into memory
  * @addr:       the address to load the image at
  * @binfo:      struct describing the boot environment
  * @addr_limit: upper limit of the available memory area at @addr
@@ -330,8 +330,8 @@ static void set_kernel_args_old(const struct arm_boot_info *info)
  *          0 if the image size exceeds the limit,
  *          -1 on errors.
  */
-static int load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
-                    hwaddr addr_limit)
+int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
+                 hwaddr addr_limit)
 {
     void *fdt = NULL;
     int size, rc;
@@ -504,7 +504,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
         /* If we have a device tree blob, but no kernel to supply it to,
          * copy it to the base of RAM for a bootloader to pick up.
          */
-        if (load_dtb(info->loader_start, info, 0) < 0) {
+        if (arm_load_dtb(info->loader_start, info, 0) < 0) {
             exit(1);
         }
     }
@@ -566,13 +566,13 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
          */
         if (elf_low_addr > info->loader_start
             || elf_high_addr < info->loader_start) {
-            /* Pass elf_low_addr as address limit to load_dtb if it may be
-             * pointing into RAM, otherwise pass '0' (no limit)
+            /* Pass elf_low_addr as address limit to arm_load_dtb if it may
+             * be pointing into RAM, otherwise pass '0' (no limit)
              */
             if (elf_low_addr < info->loader_start) {
                 elf_low_addr = 0;
             }
-            if (load_dtb(info->loader_start, info, elf_low_addr) < 0) {
+            if (arm_load_dtb(info->loader_start, info, elf_low_addr) < 0) {
                 exit(1);
             }
         }
@@ -637,7 +637,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
              */
             hwaddr dtb_start = QEMU_ALIGN_UP(info->initrd_start + initrd_size,
                                              4096);
-            if (load_dtb(dtb_start, info, 0) < 0) {
+            if (arm_load_dtb(dtb_start, info, 0) < 0) {
                 exit(1);
             }
             fixupcontext[FIXUP_ARGPTR] = dtb_start;
diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
index cefc9e6..5fdae7b 100644
--- a/include/hw/arm/arm.h
+++ b/include/hw/arm/arm.h
@@ -68,6 +68,8 @@ struct arm_boot_info {
     hwaddr entry;
 };
 void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info);
+int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
+                 hwaddr addr_limit);
 
 /* Multiplication factor to convert from system clock ticks to qemu timer
    ticks.  */
--
1.8.3.2
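The return convention documented for arm_load_dtb (the image size on success, 0 if the image exceeds the limit, -1 on errors) together with the "pass '0' (no limit)" comment suggests a size check along the following lines. This is a standalone sketch, not the function from boot.c: demo_dtb_size_check is a hypothetical name, and treating any limit that is not above the load address as "no limit" is an assumption drawn from the comments in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns size if an image of that many bytes fits between addr and
 * addr_limit, or 0 if it would cross the limit. A limit that is not
 * strictly greater than addr is treated as "no limit" (assumption).
 */
static int demo_dtb_size_check(uint64_t addr, uint64_t addr_limit, int size)
{
    assert(size >= 0);
    if (addr_limit > addr && (uint64_t)size > addr_limit - addr) {
        return 0; /* not an error: the caller decides what to do */
    }
    return size;
}
```

Callers that only test for a negative result, as the hunks above do, therefore let a too-large image pass silently.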
[Qemu-devel] [PATCH v5 4/6] hw/arm: add a new modify_dtb_opaque field in arm_boot_info
This field can be used by any modify_dtb() function to pass additional
arguments requested to build the modified dtb. This is needed for
creating the platform bus dynamic sysbus nodes.

Signed-off-by: Eric Auger eric.au...@linaro.org
---
 include/hw/arm/arm.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/hw/arm/arm.h b/include/hw/arm/arm.h
index 5f1ecb7..ff776fa 100644
--- a/include/hw/arm/arm.h
+++ b/include/hw/arm/arm.h
@@ -68,6 +68,10 @@ struct arm_boot_info {
     hwaddr dtb_start; /* start address of the dtb */
     hwaddr dtb_limit; /* upper RAM limit the dtb cannot overshoot */
     hwaddr entry;
+    /* in case modify_dtb requires additional parameters to create the
+     * the new nodes, use following opaque
+     */
+    void *modify_dtb_opaque;
 };
 void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info);
 int arm_load_dtb(const struct arm_boot_info *binfo);
--
1.8.3.2
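The opaque-pointer idea can be illustrated with a self-contained sketch: the boot info carries a callback plus a void pointer, and only the callback knows what the pointer means. All type and function names below (demo_boot_info, DemoBusParams, DemoFdt, demo_add_bus_nodes) are hypothetical stand-ins, and the callback signature is an assumption made for the example rather than a quote of the QEMU header.

```c
#include <assert.h>
#include <stddef.h>

struct demo_boot_info {
    /* callback that may edit the device tree before it is loaded */
    void (*modify_dtb)(const struct demo_boot_info *info, void *fdt);
    /* extra arguments for modify_dtb; meaning is private to the callback */
    void *modify_dtb_opaque;
};

/* Hypothetical parameters a platform bus callback might need. */
typedef struct {
    int first_irq;
    int num_irqs;
} DemoBusParams;

/* Stand-in "fdt" that just counts how many IRQs were described. */
typedef struct {
    int irqs_described;
} DemoFdt;

static void demo_add_bus_nodes(const struct demo_boot_info *info, void *fdt)
{
    const DemoBusParams *p = info->modify_dtb_opaque;
    DemoFdt *tree = fdt;

    tree->irqs_described += p->num_irqs;
}

/* The loader calls the hook, if any, without knowing what opaque holds. */
static void demo_load_dtb(const struct demo_boot_info *info, DemoFdt *fdt)
{
    assert(info != NULL);
    if (info->modify_dtb) {
        info->modify_dtb(info, fdt);
    }
}
```

Keeping the parameter block behind a void pointer leaves the generic boot code unaware of platform bus details, at the cost of an unchecked cast inside the callback.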
[Qemu-devel] [PATCH v5 3/6] hw/arm/boot: do not free VirtBoardInfo fdt in arm_load_dtb
Currently arm_load_dtb frees the fdt handle regardless of whether it was
allocated by load_device_tree or allocated externally. When adding
dynamic sysbus nodes after the first dtb load, we would like to reuse
the fdt used during the first load instead of re-creating the whole
device tree. If the fdt is destroyed, this is not possible.

Signed-off-by: Eric Auger eric.au...@linaro.org
---
 hw/arm/boot.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 0398cd4..0f9cd2c 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -427,12 +427,16 @@ int arm_load_dtb(const struct arm_boot_info *binfo)
      */
     rom_add_blob_fixed("dtb", fdt, size, binfo->dtb_start);
 
-    g_free(fdt);
+    if (binfo->dtb_filename) {
+        g_free(fdt);
+    }
 
     return size;
 
 fail:
-    g_free(fdt);
+    if (binfo->dtb_filename) {
+        g_free(fdt);
+    }
     return -1;
 }
 
--
1.8.3.2
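The ownership rule the patch introduces (free the tree only when the loader itself allocated it from a user-supplied file, leave a machine-provided tree alone) can be sketched in isolation. DemoBootInfo, demo_free and the free counter are hypothetical helpers that exist only to make the behavior observable; malloc() stands in for reading the file.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    const char *dtb_filename; /* non-NULL: tree is loaded from a file */
    void *machine_fdt;        /* tree owned by the machine, if any */
} DemoBootInfo;

static int demo_free_count; /* counts frees done by the loader */

static void demo_free(void *p)
{
    free(p);
    demo_free_count++;
}

/* Returns 0 on success, -1 on error. */
static int demo_load_dtb(const DemoBootInfo *info)
{
    void *fdt;

    assert(info != NULL);
    if (info->dtb_filename) {
        fdt = malloc(16); /* stands in for loading the file */
        if (fdt == NULL) {
            return -1;
        }
    } else {
        fdt = info->machine_fdt; /* borrowed, the machine keeps ownership */
    }

    /* ... the blob would be copied into guest memory here ... */

    if (info->dtb_filename) {
        demo_free(fdt); /* only free what this function allocated */
    }
    return 0;
}
```

After a call with a machine-provided tree, the caller can keep modifying and reloading the same tree, which is what adding dynamic sysbus nodes after the first load requires.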
[Qemu-devel] [PATCH v8 05/19] hw/vfio/pci: add type, name and group fields in VFIODevice
Add 3 new fields in the VFIODevice struct. Type is set to VFIO_DEVICE_TYPE_PCI. The type enum value will later be used to discriminate between VFIO PCI and platform devices. The name is set to domain:bus:slot:function. Currently used to test whether the device already is attached to the group. Later on, the name will be used to simplify all traces. The group is simply moved from VFIOPCIDevice to VFIODevice. Signed-off-by: Eric Auger eric.au...@linaro.org --- hw/vfio/pci.c | 27 ++- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index cd9ce4e..157e1a5 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -48,6 +48,10 @@ #define VFIO_ALLOW_KVM_MSI 1 #define VFIO_ALLOW_KVM_MSIX 1 +enum { +VFIO_DEVICE_TYPE_PCI = 0, +}; + struct VFIOPCIDevice; typedef struct VFIOQuirk { @@ -186,7 +190,10 @@ typedef struct VFIOMSIXInfo { } VFIOMSIXInfo; typedef struct VFIODevice { +struct VFIOGroup *group; +char *name; int fd; +int type; } VFIODevice; typedef struct VFIOPCIDevice { @@ -208,7 +215,6 @@ typedef struct VFIOPCIDevice { VFIOVGA vga; /* 0xa, 0x3b0, 0x3c0 */ PCIHostDeviceAddress host; QLIST_ENTRY(VFIOPCIDevice) next; -struct VFIOGroup *group; EventNotifier err_notifier; uint32_t features; #define VFIO_FEATURE_ENABLE_VGA_BIT 0 @@ -3924,7 +3930,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name, } vdev-vbasedev.fd = ret; -vdev-group = group; +vdev-vbasedev.group = group; QLIST_INSERT_HEAD(group-device_list, vdev, next); /* Sanity check device */ @@ -4054,7 +4060,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name, error: if (ret) { QLIST_REMOVE(vdev, next); -vdev-group = NULL; +vdev-vbasedev.group = NULL; close(vdev-vbasedev.fd); } return ret; @@ -4063,9 +4069,10 @@ error: static void vfio_put_device(VFIOPCIDevice *vdev) { QLIST_REMOVE(vdev, next); -vdev-group = NULL; +vdev-vbasedev.group = NULL; trace_vfio_put_device(vdev-vbasedev.fd); close(vdev-vbasedev.fd); +g_free(vdev-vbasedev.name); if (vdev-msix) { 
g_free(vdev-msix); vdev-msix = NULL; @@ -4197,6 +4204,11 @@ static int vfio_initfn(PCIDevice *pdev) return -errno; } +vdev-vbasedev.type = VFIO_DEVICE_TYPE_PCI; +g_strdup_printf(vdev-vbasedev.name, %04x:%02x:%02x.%01x, +vdev-host.domain, vdev-host.bus, vdev-host.slot, +vdev-host.function); + strncat(path, iommu_group, sizeof(path) - strlen(path) - 1); len = readlink(path, iommu_group_path, sizeof(path)); @@ -4227,10 +4239,7 @@ static int vfio_initfn(PCIDevice *pdev) vdev-host.function); QLIST_FOREACH(pvdev, group-device_list, next) { -if (pvdev-host.domain == vdev-host.domain -pvdev-host.bus == vdev-host.bus -pvdev-host.slot == vdev-host.slot -pvdev-host.function == vdev-host.function) { +if (strcmp(pvdev-vbasedev.name, vdev-vbasedev.name) == 0) { error_report(vfio: error: device %s is already attached, path); vfio_put_group(group); @@ -4333,7 +4342,7 @@ out_put: static void vfio_exitfn(PCIDevice *pdev) { VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev); -VFIOGroup *group = vdev-group; +VFIOGroup *group = vdev-vbasedev.group; vfio_unregister_err_notifier(vdev); pci_device_set_intx_routing_notifier(vdev-pdev, NULL); -- 1.8.3.2
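To illustrate the name-based duplicate check introduced in the patch above, here is a small sketch that builds the domain:bus:slot.function string and compares two names with strcmp. It uses plain snprintf into a caller-supplied buffer rather than GLib; `host_addr`, `format_pci_name` and `same_device` are hypothetical names. Note that GLib's g_strdup_printf returns the newly allocated string instead of taking a destination argument, so the real code needs to assign its result to the name field.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the host address fields used to build the name. */
struct host_addr {
    unsigned domain, bus, slot, function;
};

/* Formats "dddd:bb:ss.f" into buf.
 * Returns 0 on success, -1 if the buffer is too small. */
static int format_pci_name(char *buf, size_t len, const struct host_addr *a)
{
    int n = snprintf(buf, len, "%04x:%02x:%02x.%01x",
                     a->domain, a->bus, a->slot, a->function);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Two devices are considered the same when their names match exactly. */
static int same_device(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Comparing one string instead of four integer fields is what lets the same check work later for non-PCI devices, whose names would not be built from a PCI address.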
[Qemu-devel] [PATCH v8 02/19] hw/vfio/pci: Rename VFIODevice into VFIOPCIDevice
This prepares for the introduction of VFIOPlatformDevice Signed-off-by: Eric Auger eric.au...@linaro.org --- hw/vfio/pci.c | 210 +- 1 file changed, 106 insertions(+), 104 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 7e69415..0d7d4a0 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -48,11 +48,11 @@ #define VFIO_ALLOW_KVM_MSI 1 #define VFIO_ALLOW_KVM_MSIX 1 -struct VFIODevice; +struct VFIOPCIDevice; typedef struct VFIOQuirk { MemoryRegion mem; -struct VFIODevice *vdev; +struct VFIOPCIDevice *vdev; QLIST_ENTRY(VFIOQuirk) next; struct { uint32_t base_offset:TARGET_PAGE_BITS; @@ -123,7 +123,7 @@ typedef struct VFIOMSIVector { */ EventNotifier interrupt; EventNotifier kvm_interrupt; -struct VFIODevice *vdev; /* back pointer to device */ +struct VFIOPCIDevice *vdev; /* back pointer to device */ int virq; bool use; } VFIOMSIVector; @@ -185,7 +185,7 @@ typedef struct VFIOMSIXInfo { void *mmap; } VFIOMSIXInfo; -typedef struct VFIODevice { +typedef struct VFIOPCIDevice { PCIDevice pdev; int fd; VFIOINTx intx; @@ -203,7 +203,7 @@ typedef struct VFIODevice { VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */ VFIOVGA vga; /* 0xa, 0x3b0, 0x3c0 */ PCIHostDeviceAddress host; -QLIST_ENTRY(VFIODevice) next; +QLIST_ENTRY(VFIOPCIDevice) next; struct VFIOGroup *group; EventNotifier err_notifier; uint32_t features; @@ -218,13 +218,13 @@ typedef struct VFIODevice { bool has_pm_reset; bool needs_reset; bool rom_read_failed; -} VFIODevice; +} VFIOPCIDevice; typedef struct VFIOGroup { int fd; int groupid; VFIOContainer *container; -QLIST_HEAD(, VFIODevice) device_list; +QLIST_HEAD(, VFIOPCIDevice) device_list; QLIST_ENTRY(VFIOGroup) next; QLIST_ENTRY(VFIOGroup) container_next; } VFIOGroup; @@ -268,16 +268,16 @@ static QLIST_HEAD(, VFIOGroup) static int vfio_kvm_device_fd = -1; #endif -static void vfio_disable_interrupts(VFIODevice *vdev); +static void vfio_disable_interrupts(VFIOPCIDevice *vdev); static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len); 
static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr, uint32_t val, int len); -static void vfio_mmap_set_enabled(VFIODevice *vdev, bool enabled); +static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled); /* * Common VFIO interrupt disable */ -static void vfio_disable_irqindex(VFIODevice *vdev, int index) +static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index) { struct vfio_irq_set irq_set = { .argsz = sizeof(irq_set), @@ -293,7 +293,7 @@ static void vfio_disable_irqindex(VFIODevice *vdev, int index) /* * INTx */ -static void vfio_unmask_intx(VFIODevice *vdev) +static void vfio_unmask_intx(VFIOPCIDevice *vdev) { struct vfio_irq_set irq_set = { .argsz = sizeof(irq_set), @@ -307,7 +307,7 @@ static void vfio_unmask_intx(VFIODevice *vdev) } #ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */ -static void vfio_mask_intx(VFIODevice *vdev) +static void vfio_mask_intx(VFIOPCIDevice *vdev) { struct vfio_irq_set irq_set = { .argsz = sizeof(irq_set), @@ -338,7 +338,7 @@ static void vfio_mask_intx(VFIODevice *vdev) */ static void vfio_intx_mmap_enable(void *opaque) { -VFIODevice *vdev = opaque; +VFIOPCIDevice *vdev = opaque; if (vdev-intx.pending) { timer_mod(vdev-intx.mmap_timer, @@ -351,7 +351,7 @@ static void vfio_intx_mmap_enable(void *opaque) static void vfio_intx_interrupt(void *opaque) { -VFIODevice *vdev = opaque; +VFIOPCIDevice *vdev = opaque; if (!event_notifier_test_and_clear(vdev-intx.interrupt)) { return; @@ -370,7 +370,7 @@ static void vfio_intx_interrupt(void *opaque) } } -static void vfio_eoi(VFIODevice *vdev) +static void vfio_eoi(VFIOPCIDevice *vdev) { if (!vdev-intx.pending) { return; @@ -384,7 +384,7 @@ static void vfio_eoi(VFIODevice *vdev) vfio_unmask_intx(vdev); } -static void vfio_enable_intx_kvm(VFIODevice *vdev) +static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev) { #ifdef CONFIG_KVM struct kvm_irqfd irqfd = { @@ -462,7 +462,7 @@ fail: #endif } -static void vfio_disable_intx_kvm(VFIODevice *vdev) +static void 
vfio_disable_intx_kvm(VFIOPCIDevice *vdev) { #ifdef CONFIG_KVM struct kvm_irqfd irqfd = { @@ -506,7 +506,7 @@ static void vfio_disable_intx_kvm(VFIODevice *vdev) static void vfio_update_irq(PCIDevice *pdev) { -VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev); +VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev); PCIINTxRoute route; if (vdev-interrupt != VFIO_INT_INTx) { @@ -537,7 +537,7 @@ static void vfio_update_irq(PCIDevice *pdev) vfio_eoi(vdev); } -static int vfio_enable_intx(VFIODevice
[Qemu-devel] [PATCH v8 01/19] vfio: move hw/misc/vfio.c to hw/vfio/pci.c Move vfio.h into include/hw/vfio
From: Kim Phillips kim.phill...@linaro.org This is done in preparation for the addition of VFIO platform device support. Signed-off-by: Kim Phillips kim.phill...@linaro.org --- LICENSE | 2 +- MAINTAINERS | 2 +- hw/Makefile.objs | 1 + hw/misc/Makefile.objs| 1 - hw/ppc/spapr_pci_vfio.c | 2 +- hw/vfio/Makefile.objs| 3 +++ hw/{misc/vfio.c = vfio/pci.c} | 2 +- include/hw/{misc = vfio}/vfio.h | 0 8 files changed, 8 insertions(+), 5 deletions(-) create mode 100644 hw/vfio/Makefile.objs rename hw/{misc/vfio.c = vfio/pci.c} (99%) rename include/hw/{misc = vfio}/vfio.h (100%) diff --git a/LICENSE b/LICENSE index da70e94..0e0b4b9 100644 --- a/LICENSE +++ b/LICENSE @@ -11,7 +11,7 @@ option) any later version. As of July 2013, contributions under version 2 of the GNU General Public License (and no later version) are only accepted for the following files -or directories: bsd-user/, linux-user/, hw/misc/vfio.c, hw/xen/xen_pt*. +or directories: bsd-user/, linux-user/, hw/vfio/, hw/xen/xen_pt*. 3) The Tiny Code Generator (TCG) is released under the BSD license (see license headers in files). diff --git a/MAINTAINERS b/MAINTAINERS index bcb69e8..255b512 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -657,7 +657,7 @@ F: hw/usb/dev-serial.c VFIO M: Alex Williamson alex.william...@redhat.com S: Supported -F: hw/misc/vfio.c +F: hw/vfio/* vhost M: Michael S. 
Tsirkin m...@redhat.com diff --git a/hw/Makefile.objs b/hw/Makefile.objs index 52a1464..73afa41 100644 --- a/hw/Makefile.objs +++ b/hw/Makefile.objs @@ -26,6 +26,7 @@ devices-dirs-$(CONFIG_SOFTMMU) += ssi/ devices-dirs-$(CONFIG_SOFTMMU) += timer/ devices-dirs-$(CONFIG_TPM) += tpm/ devices-dirs-$(CONFIG_SOFTMMU) += usb/ +devices-dirs-$(CONFIG_SOFTMMU) += vfio/ devices-dirs-$(CONFIG_VIRTIO) += virtio/ devices-dirs-$(CONFIG_SOFTMMU) += watchdog/ devices-dirs-$(CONFIG_SOFTMMU) += xen/ diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs index 979e532..e47fea8 100644 --- a/hw/misc/Makefile.objs +++ b/hw/misc/Makefile.objs @@ -21,7 +21,6 @@ common-obj-$(CONFIG_MACIO) += macio/ ifeq ($(CONFIG_PCI), y) obj-$(CONFIG_KVM) += ivshmem.o -obj-$(CONFIG_LINUX) += vfio.o endif obj-$(CONFIG_REALVIEW) += arm_sysctl.o diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c index d3bddf2..144912b 100644 --- a/hw/ppc/spapr_pci_vfio.c +++ b/hw/ppc/spapr_pci_vfio.c @@ -20,7 +20,7 @@ #include hw/ppc/spapr.h #include hw/pci-host/spapr.h #include linux/vfio.h -#include hw/misc/vfio.h +#include hw/vfio/vfio.h static Property spapr_phb_vfio_properties[] = { DEFINE_PROP_INT32(iommu, sPAPRPHBVFIOState, iommugroupid, -1), diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs new file mode 100644 index 000..31c7dab --- /dev/null +++ b/hw/vfio/Makefile.objs @@ -0,0 +1,3 @@ +ifeq ($(CONFIG_LINUX), y) +obj-$(CONFIG_PCI) += pci.o +endif diff --git a/hw/misc/vfio.c b/hw/vfio/pci.c similarity index 99% rename from hw/misc/vfio.c rename to hw/vfio/pci.c index 6c36c8b..7e69415 100644 --- a/hw/misc/vfio.c +++ b/hw/vfio/pci.c @@ -39,8 +39,8 @@ #include qemu/range.h #include sysemu/kvm.h #include sysemu/sysemu.h -#include hw/misc/vfio.h #include trace.h +#include hw/vfio/vfio.h /* Extra debugging, trap acceleration paths for more logging */ #define VFIO_ALLOW_MMAP 1 diff --git a/include/hw/misc/vfio.h b/include/hw/vfio/vfio.h similarity index 100% rename from include/hw/misc/vfio.h rename 
to include/hw/vfio/vfio.h -- 1.8.3.2
[Qemu-devel] [PATCH v8 04/19] hw/vfio/pci: introduce minimalist VFIODevice with fd
Introduce a new base VFIODevice struct that will be used by both PCI and platform VFIO devices. Move the VFIOPCIDevice fd field there. Other fields from VFIOPCIDevice will be moved there later; this patch is kept minimal to ease the review. Also, vfio_mask_single_irqindex, vfio_unmask_single_irqindex and vfio_disable_irqindex now take a VFIODevice handle as argument. Signed-off-by: Eric Auger eric.au...@linaro.org --- hw/vfio/pci.c | 117 +++--- 1 file changed, 63 insertions(+), 54 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 387da1a..cd9ce4e 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -185,9 +185,13 @@ typedef struct VFIOMSIXInfo { void *mmap; } VFIOMSIXInfo; +typedef struct VFIODevice { +int fd; +} VFIODevice; + typedef struct VFIOPCIDevice { PCIDevice pdev; -int fd; +VFIODevice vbasedev; VFIOINTx intx; unsigned int config_size; uint8_t *emulated_config_bits; /* QEMU emulated bits, little-endian */ @@ -277,7 +281,7 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled); /* * Common VFIO interrupt disable */ -static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index) +static void vfio_disable_irqindex(VFIODevice *vbasedev, int index) { struct vfio_irq_set irq_set = { .argsz = sizeof(irq_set), @@ -287,13 +291,13 @@ static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index) .count = 0, }; -ioctl(vdev-fd, VFIO_DEVICE_SET_IRQS, irq_set); +ioctl(vbasedev-fd, VFIO_DEVICE_SET_IRQS, irq_set); } /* * INTx */ -static void vfio_unmask_single_irqindex(VFIOPCIDevice *vdev, int index) +static void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index) { struct vfio_irq_set irq_set = { .argsz = sizeof(irq_set), @@ -303,11 +307,11 @@ static void vfio_unmask_single_irqindex(VFIOPCIDevice *vdev, int index) .count = 1, }; -ioctl(vdev-fd, VFIO_DEVICE_SET_IRQS, irq_set); +ioctl(vbasedev-fd, VFIO_DEVICE_SET_IRQS, irq_set); } #ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */ -static void
vfio_mask_single_irqindex(VFIOPCIDevice *vdev, int index) +static void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index) { struct vfio_irq_set irq_set = { .argsz = sizeof(irq_set), @@ -317,7 +321,7 @@ static void vfio_mask_single_irqindex(VFIOPCIDevice *vdev, int index) .count = 1, }; -ioctl(vdev-fd, VFIO_DEVICE_SET_IRQS, irq_set); +ioctl(vbasedev-fd, VFIO_DEVICE_SET_IRQS, irq_set); } #endif @@ -381,7 +385,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev) vdev-intx.pending = false; pci_irq_deassert(vdev-pdev); -vfio_unmask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX); +vfio_unmask_single_irqindex(vdev-vbasedev, VFIO_PCI_INTX_IRQ_INDEX); } static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev) @@ -404,7 +408,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev) /* Get to a known interrupt state */ qemu_set_fd_handler(irqfd.fd, NULL, NULL, vdev); -vfio_mask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX); +vfio_mask_single_irqindex(vdev-vbasedev, VFIO_PCI_INTX_IRQ_INDEX); vdev-intx.pending = false; pci_irq_deassert(vdev-pdev); @@ -434,7 +438,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev) *pfd = irqfd.resamplefd; -ret = ioctl(vdev-fd, VFIO_DEVICE_SET_IRQS, irq_set); +ret = ioctl(vdev-vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set); g_free(irq_set); if (ret) { error_report(vfio: Error: Failed to setup INTx unmask fd: %m); @@ -442,7 +446,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev) } /* Let'em rip */ -vfio_unmask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX); +vfio_unmask_single_irqindex(vdev-vbasedev, VFIO_PCI_INTX_IRQ_INDEX); vdev-intx.kvm_accel = true; @@ -458,7 +462,7 @@ fail_irqfd: event_notifier_cleanup(vdev-intx.unmask); fail: qemu_set_fd_handler(irqfd.fd, vfio_intx_interrupt, NULL, vdev); -vfio_unmask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX); +vfio_unmask_single_irqindex(vdev-vbasedev, VFIO_PCI_INTX_IRQ_INDEX); #endif } @@ -479,7 +483,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev) * Get to a known state, 
hardware masked, QEMU ready to accept new * interrupts, QEMU IRQ de-asserted. */ -vfio_mask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX); +vfio_mask_single_irqindex(vdev-vbasedev, VFIO_PCI_INTX_IRQ_INDEX); vdev-intx.pending = false; pci_irq_deassert(vdev-pdev); @@ -497,7 +501,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev) vdev-intx.kvm_accel = false; /* If we've missed an event, let it re-fire through QEMU */ -vfio_unmask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX); +vfio_unmask_single_irqindex(vdev-vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
[Qemu-devel] [PATCH v8 11/19] hw/vfio: create common module
A new common module is created. It implements all functions that have no device specificity (PCI, Platform). This patch only consists in move (no functional changes) Signed-off-by: Kim Phillips kim.phill...@linaro.org Signed-off-by: Eric Auger eric.au...@linaro.org --- v7 - v8: - integrate Add skip_dump flag to ignore memory region during dump - vfio_compute_needs_reset does not return bool anymore v6 - v7: - integrate Revert vfio: Make BARs native endian - remove VFIO_DEVICE_TYPE_PLATFORM in vfio-common.h, will come in next patch v5 - v6: - follow all evolutions of original PCI code from v5 to V6 - move declaration of vfio_region_ops, vfio_memory_listener, vfio_group_list, vfio_address_spaces into vfio-common.h v4 - v5: - integrate sPAPR/IOMMU: Fix TCE entry permission - VFIOdevice .name dealloc removed from vfio_put_base_device - add some includes according to vfio inclusion policy v3 - v4: [Eric Auger] move done after all PCI modifications to anticipate for VFIO Platform needs. Purpose is to alleviate the whole review process. = v3 First split done by Kim Phillips Conflicts: hw/vfio/pci.c Conflicts: hw/vfio/pci.c --- hw/vfio/Makefile.objs |1 + hw/vfio/common.c | 959 ++ hw/vfio/pci.c | 1028 + include/hw/vfio/vfio-common.h | 151 ++ trace-events |1 + 5 files changed, 1113 insertions(+), 1027 deletions(-) create mode 100644 hw/vfio/common.c create mode 100644 include/hw/vfio/vfio-common.h diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs index 31c7dab..e31f30e 100644 --- a/hw/vfio/Makefile.objs +++ b/hw/vfio/Makefile.objs @@ -1,3 +1,4 @@ ifeq ($(CONFIG_LINUX), y) +obj-$(CONFIG_SOFTMMU) += common.o obj-$(CONFIG_PCI) += pci.o endif diff --git a/hw/vfio/common.c b/hw/vfio/common.c new file mode 100644 index 000..554467f --- /dev/null +++ b/hw/vfio/common.c @@ -0,0 +1,959 @@ +/* + * generic functions used by VFIO devices + * + * Copyright Red Hat, Inc. 
2012 + * + * Authors: + * Alex Williamson alex.william...@redhat.com + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + * Based on qemu-kvm device-assignment: + * Adapted for KVM by Qumranet. + * Copyright (c) 2007, Neocleus, Alex Novik (a...@neocleus.com) + * Copyright (c) 2007, Neocleus, Guy Zana (g...@neocleus.com) + * Copyright (C) 2008, Qumranet, Amit Shah (amit.s...@qumranet.com) + * Copyright (C) 2008, Red Hat, Amit Shah (amit.s...@redhat.com) + * Copyright (C) 2008, IBM, Muli Ben-Yehuda (m...@il.ibm.com) + */ + +#include sys/ioctl.h +#include sys/mman.h +#include linux/vfio.h + +#include hw/vfio/vfio-common.h +#include hw/vfio/vfio.h +#include exec/address-spaces.h +#include exec/memory.h +#include hw/hw.h +#include qemu/error-report.h +#include sysemu/kvm.h +#include trace.h + +struct vfio_group_head vfio_group_list = +QLIST_HEAD_INITIALIZER(vfio_address_spaces); +struct vfio_as_head vfio_address_spaces = +QLIST_HEAD_INITIALIZER(vfio_address_spaces); + +#ifdef CONFIG_KVM +/* + * We have a single VFIO pseudo device per KVM VM. Once created it lives + * for the life of the VM. Closing the file descriptor only drops our + * reference to it and the device's reference to kvm. Therefore once + * initialized, this file descriptor is only released on QEMU exit and + * we'll re-use it should another vfio device be attached before then. 
+ */ +static int vfio_kvm_device_fd = -1; +#endif + +/* + * Common VFIO interrupt disable + */ +void vfio_disable_irqindex(VFIODevice *vbasedev, int index) +{ +struct vfio_irq_set irq_set = { +.argsz = sizeof(irq_set), +.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER, +.index = index, +.start = 0, +.count = 0, +}; + +ioctl(vbasedev-fd, VFIO_DEVICE_SET_IRQS, irq_set); +} + +void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index) +{ +struct vfio_irq_set irq_set = { +.argsz = sizeof(irq_set), +.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK, +.index = index, +.start = 0, +.count = 1, +}; + +ioctl(vbasedev-fd, VFIO_DEVICE_SET_IRQS, irq_set); +} + +void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index) +{ +struct vfio_irq_set irq_set = { +.argsz = sizeof(irq_set), +.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK, +.index = index, +.start = 0, +.count = 1, +}; + +ioctl(vbasedev-fd, VFIO_DEVICE_SET_IRQS, irq_set); +} + +/* + * IO Port/MMIO - Beware of the endians, VFIO is always little endian + */ +void vfio_region_write(void *opaque, hwaddr addr, + uint64_t data, unsigned size) +{ +VFIORegion *region = opaque; +VFIODevice *vbasedev = region-vbasedev; +union { +uint8_t byte; +uint16_t
[Qemu-devel] [PATCH v8 03/19] hw/vfio/pci: generalize mask/unmask to any IRQ index
To prepare for the introduction of platform devices, rename vfio_mask_intx and vfio_unmask_intx to vfio_mask_single_irqindex and vfio_unmask_single_irqindex respectively. Also add a new index parameter. With that name and prototype the functions are usable for indexes other than VFIO_PCI_INTX_IRQ_INDEX. Signed-off-by: Eric Auger eric.au...@linaro.org --- hw/vfio/pci.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 0d7d4a0..387da1a 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -293,12 +293,12 @@ static void vfio_disable_irqindex(VFIOPCIDevice *vdev, int index) /* * INTx */ -static void vfio_unmask_intx(VFIOPCIDevice *vdev) +static void vfio_unmask_single_irqindex(VFIOPCIDevice *vdev, int index) { struct vfio_irq_set irq_set = { .argsz = sizeof(irq_set), .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK, -.index = VFIO_PCI_INTX_IRQ_INDEX, +.index = index, .start = 0, .count = 1, }; @@ -307,12 +307,12 @@ static void vfio_unmask_intx(VFIOPCIDevice *vdev) } #ifdef CONFIG_KVM /* Unused outside of CONFIG_KVM code */ -static void vfio_mask_intx(VFIOPCIDevice *vdev) +static void vfio_mask_single_irqindex(VFIOPCIDevice *vdev, int index) { struct vfio_irq_set irq_set = { .argsz = sizeof(irq_set), .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK, -.index = VFIO_PCI_INTX_IRQ_INDEX, +.index = index, .start = 0, .count = 1, }; @@ -381,7 +381,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev) vdev-intx.pending = false; pci_irq_deassert(vdev-pdev); -vfio_unmask_intx(vdev); +vfio_unmask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX); } static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev) @@ -404,7 +404,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev) /* Get to a known interrupt state */ qemu_set_fd_handler(irqfd.fd, NULL, NULL, vdev); -vfio_mask_intx(vdev); +vfio_mask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX); vdev-intx.pending = false; pci_irq_deassert(vdev-pdev); @@ -442,7 +442,7 @@ static
void vfio_enable_intx_kvm(VFIOPCIDevice *vdev) } /* Let'em rip */ -vfio_unmask_intx(vdev); +vfio_unmask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX); vdev-intx.kvm_accel = true; @@ -458,7 +458,7 @@ fail_irqfd: event_notifier_cleanup(vdev-intx.unmask); fail: qemu_set_fd_handler(irqfd.fd, vfio_intx_interrupt, NULL, vdev); -vfio_unmask_intx(vdev); +vfio_unmask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX); #endif } @@ -479,7 +479,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev) * Get to a known state, hardware masked, QEMU ready to accept new * interrupts, QEMU IRQ de-asserted. */ -vfio_mask_intx(vdev); +vfio_mask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX); vdev-intx.pending = false; pci_irq_deassert(vdev-pdev); @@ -497,7 +497,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev) vdev-intx.kvm_accel = false; /* If we've missed an event, let it re-fire through QEMU */ -vfio_unmask_intx(vdev); +vfio_unmask_single_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX); trace_vfio_disable_intx_kvm(vdev-host.domain, vdev-host.bus, vdev-host.slot, vdev-host.function); -- 1.8.3.2
[Qemu-devel] [PATCH v8 06/19] hw/vfio/pci: handle reset at VFIODevice
Since we can potentially have both PCI and platform devices in the same VFIO group, this latter now owns a list of VFIODevices. A unified reset handler, vfio_reset_handler, is registered, looping through this VFIODevice list. 2 specialized operations are introduced (vfio_compute_needs_reset and vfio_hot_reset_multi): they allow to implement type specific behavior. also reset_works and needs_reset VFIOPCIDevice fields are moved into VFIODevice. Signed-off-by: Eric Auger eric.au...@linaro.org --- v8: compared to [PATCH v7 03/16] hw/vfio/pci: introduce VFIODevice, vfio_compute_needs_reset does not return a bool anymore. --- hw/vfio/pci.c | 93 --- 1 file changed, 63 insertions(+), 30 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 157e1a5..e68865b 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -189,13 +189,24 @@ typedef struct VFIOMSIXInfo { void *mmap; } VFIOMSIXInfo; +typedef struct VFIODeviceOps VFIODeviceOps; + typedef struct VFIODevice { +QLIST_ENTRY(VFIODevice) next; struct VFIOGroup *group; char *name; int fd; int type; +bool reset_works; +bool needs_reset; +VFIODeviceOps *ops; } VFIODevice; +struct VFIODeviceOps { +void (*vfio_compute_needs_reset)(VFIODevice *vdev); +int (*vfio_hot_reset_multi)(VFIODevice *vdev); +}; + typedef struct VFIOPCIDevice { PCIDevice pdev; VFIODevice vbasedev; @@ -214,19 +225,16 @@ typedef struct VFIOPCIDevice { VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */ VFIOVGA vga; /* 0xa, 0x3b0, 0x3c0 */ PCIHostDeviceAddress host; -QLIST_ENTRY(VFIOPCIDevice) next; EventNotifier err_notifier; uint32_t features; #define VFIO_FEATURE_ENABLE_VGA_BIT 0 #define VFIO_FEATURE_ENABLE_VGA (1 VFIO_FEATURE_ENABLE_VGA_BIT) int32_t bootindex; uint8_t pm_cap; -bool reset_works; bool has_vga; bool pci_aer; bool has_flr; bool has_pm_reset; -bool needs_reset; bool rom_read_failed; } VFIOPCIDevice; @@ -234,7 +242,7 @@ typedef struct VFIOGroup { int fd; int groupid; VFIOContainer *container; -QLIST_HEAD(, VFIOPCIDevice) device_list; +QLIST_HEAD(, 
VFIODevice) device_list; QLIST_ENTRY(VFIOGroup) next; QLIST_ENTRY(VFIOGroup) container_next; } VFIOGroup; @@ -3381,7 +3389,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single) single ? one : multi); vfio_pci_pre_reset(vdev); -vdev-needs_reset = false; +vdev-vbasedev.needs_reset = false; info = g_malloc0(sizeof(*info)); info-argsz = sizeof(*info); @@ -3418,6 +3426,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single) for (i = 0; i info-count; i++) { PCIHostDeviceAddress host; VFIOPCIDevice *tmp; +VFIODevice *vbasedev_iter; host.domain = devices[i].segment; host.bus = devices[i].bus; @@ -3449,7 +3458,11 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single) } /* Prep dependent devices for reset and clear our marker. */ -QLIST_FOREACH(tmp, group-device_list, next) { +QLIST_FOREACH(vbasedev_iter, group-device_list, next) { +if (vbasedev_iter-type != VFIO_DEVICE_TYPE_PCI) { +continue; +} +tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev); if (vfio_pci_host_match(host, tmp-host)) { if (single) { error_report(vfio: found another in-use device @@ -3459,7 +3472,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single) goto out_single; } vfio_pci_pre_reset(tmp); -tmp-needs_reset = false; +tmp-vbasedev.needs_reset = false; multi = true; break; } @@ -3512,6 +3525,7 @@ out: for (i = 0; i info-count; i++) { PCIHostDeviceAddress host; VFIOPCIDevice *tmp; +VFIODevice *vbasedev_iter; host.domain = devices[i].segment; host.bus = devices[i].bus; @@ -3532,7 +3546,11 @@ out: break; } -QLIST_FOREACH(tmp, group-device_list, next) { +QLIST_FOREACH(vbasedev_iter, group-device_list, next) { +if (vbasedev_iter-type != VFIO_DEVICE_TYPE_PCI) { +continue; +} +tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev); if (vfio_pci_host_match(host, tmp-host)) { vfio_pci_post_reset(tmp); break; @@ -3566,28 +3584,40 @@ static int vfio_pci_hot_reset_one(VFIOPCIDevice *vdev) return vfio_pci_hot_reset(vdev, true); } -static int 
vfio_pci_hot_reset_multi(VFIOPCIDevice *vdev) +static int vfio_pci_hot_reset_multi(VFIODevice *vbasedev) { +VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice,
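The loops in this patch walk a list of base devices, skip entries of another type, and recover the enclosing PCI struct with container_of. A minimal sketch of that pattern follows, using a hand-rolled singly linked list instead of QEMU's QLIST macros. The macro `container_of_sketch`, the enum values and all struct and function names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Recover a pointer to the enclosing struct from a pointer to a member. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

enum dev_type { DEV_TYPE_PCI = 0, DEV_TYPE_PLATFORM = 1 };

/* Base device: list linkage, type tag and reset state live here. */
struct base_dev {
    struct base_dev *next;
    int type;
    int needs_reset;
};

struct pci_dev_sketch {
    int pre_reset_calls;       /* counts how often we prepared a reset */
    struct base_dev vbasedev;  /* deliberately not the first member */
};

struct platform_dev_sketch {
    struct base_dev vbasedev;
};

/* Prepares every PCI device in the list for reset and leaves other
 * device types untouched. Returns the number of PCI devices handled. */
static int prep_pci_devices(struct base_dev *head)
{
    int handled = 0;

    for (struct base_dev *it = head; it; it = it->next) {
        if (it->type != DEV_TYPE_PCI) {
            continue;  /* the cast below is only valid for PCI entries */
        }
        struct pci_dev_sketch *pdev =
            container_of_sketch(it, struct pci_dev_sketch, vbasedev);

        pdev->pre_reset_calls++;
        it->needs_reset = 0;
        handled++;
    }
    return handled;
}
```

The type check before the cast is the important part: applying container_of to a base that is embedded in a different struct would yield a pointer into unrelated memory.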
[Qemu-devel] [PATCH v8 07/19] hw/vfio/pci: Introduce VFIORegion
This structure is going to be shared by VFIOPCIDevice and VFIOPlatformDevice. VFIOBAR includes it. vfio_eoi becomes an ops of VFIODevice specialized by parent device. This makes possible to transform vfio_bar_write/read into generic vfio_region_write/read that will be used by VFIOPlatformDevice too. vfio_mmap_bar becomes vfio_map_region Signed-off-by: Eric Auger eric.au...@linaro.org --- v7-v8: - integrate Add skip_dump flag to ignore memory region during dump v4-v5: - remove fd field from VFIORegion - change error_report format string in vfio_region_write/read - remove #ifdef DEBUG_VFIO in the same function - correct missing initialization of bar region's vbasedev field - change Object * parameter name of vfio_mmap_region and remove useless OBJECT() Conflicts: hw/vfio/pci.c Conflicts: hw/vfio/pci.c --- hw/vfio/pci.c | 193 ++ trace-events | 4 +- 2 files changed, 103 insertions(+), 94 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index e68865b..10c1697 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -77,15 +77,19 @@ typedef struct VFIOQuirk { } data; } VFIOQuirk; -typedef struct VFIOBAR { -off_t fd_offset; /* offset of BAR within device fd */ -int fd; /* device fd, allows us to pass VFIOBAR as opaque data */ +typedef struct VFIORegion { +struct VFIODevice *vbasedev; +off_t fd_offset; /* offset of region within device fd */ MemoryRegion mem; /* slow, read/write access */ MemoryRegion mmap_mem; /* direct mapped access */ void *mmap; size_t size; uint32_t flags; /* VFIO region flags (rd/wr/mmap) */ -uint8_t nr; /* cache the BAR number for debug */ +uint8_t nr; /* cache the region number for debug */ +} VFIORegion; + +typedef struct VFIOBAR { +VFIORegion region; bool ioport; bool mem64; QLIST_HEAD(, VFIOQuirk) quirks; @@ -205,6 +209,7 @@ typedef struct VFIODevice { struct VFIODeviceOps { void (*vfio_compute_needs_reset)(VFIODevice *vdev); int (*vfio_hot_reset_multi)(VFIODevice *vdev); +void (*vfio_eoi)(VFIODevice *vdev); }; typedef struct VFIOPCIDevice { @@ 
-388,8 +393,10 @@ static void vfio_intx_interrupt(void *opaque) } } -static void vfio_eoi(VFIOPCIDevice *vdev) +static void vfio_eoi(VFIODevice *vbasedev) { +VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev); + if (!vdev-intx.pending) { return; } @@ -399,7 +406,7 @@ static void vfio_eoi(VFIOPCIDevice *vdev) vdev-intx.pending = false; pci_irq_deassert(vdev-pdev); -vfio_unmask_single_irqindex(vdev-vbasedev, VFIO_PCI_INTX_IRQ_INDEX); +vfio_unmask_single_irqindex(vbasedev, VFIO_PCI_INTX_IRQ_INDEX); } static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev) @@ -552,7 +559,7 @@ static void vfio_update_irq(PCIDevice *pdev) vfio_enable_intx_kvm(vdev); /* Re-enable the interrupt in cased we missed an EOI */ -vfio_eoi(vdev); +vfio_eoi(vdev-vbasedev); } static int vfio_enable_intx(VFIOPCIDevice *vdev) @@ -1089,10 +1096,11 @@ static void vfio_update_msi(VFIOPCIDevice *vdev) /* * IO Port/MMIO - Beware of the endians, VFIO is always little endian */ -static void vfio_bar_write(void *opaque, hwaddr addr, - uint64_t data, unsigned size) +static void vfio_region_write(void *opaque, hwaddr addr, + uint64_t data, unsigned size) { -VFIOBAR *bar = opaque; +VFIORegion *region = opaque; +VFIODevice *vbasedev = region-vbasedev; union { uint8_t byte; uint16_t word; @@ -1115,20 +1123,14 @@ static void vfio_bar_write(void *opaque, hwaddr addr, break; } -if (pwrite(bar-fd, buf, size, bar-fd_offset + addr) != size) { -error_report(%s(,0x%HWADDR_PRIx, 0x%PRIx64, %d) failed: %m, - __func__, addr, data, size); +if (pwrite(vbasedev-fd, buf, size, region-fd_offset + addr) != size) { +error_report(%s(%s:region%d+0x%HWADDR_PRIx, 0x%PRIx64 + ,%d) failed: %m, + __func__, vbasedev-name, region-nr, + addr, data, size); } -#ifdef DEBUG_VFIO -{ -VFIOPCIDevice *vdev = container_of(bar, VFIOPCIDevice, bars[bar-nr]); - -trace_vfio_bar_write(vdev-host.domain, vdev-host.bus, - vdev-host.slot, vdev-host.function, - region-nr, addr, data, size); -} -#endif +trace_vfio_region_write(vbasedev-name, 
region-nr, addr, data, size); /* * A read or write to a BAR always signals an INTx EOI. This will @@ -1138,13 +1140,14 @@ static void vfio_bar_write(void *opaque, hwaddr addr, * which access will service the interrupt, so we're potentially * getting quite a few host interrupts per guest interrupt. */ -vfio_eoi(container_of(bar, VFIOPCIDevice, bars[bar-nr])); +
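For readers following the conversion of vfio_eoi() to the generic VFIODevice argument, here is a minimal standalone sketch of the container_of() pattern the hunks above rely on. BaseDev, PciDev and pci_eoi are hypothetical stand-ins, not the QEMU types, and the macro is defined locally for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local definition for illustration; QEMU provides its own container_of. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the generic VFIODevice. */
typedef struct BaseDev {
    const char *name;
    int fd;
} BaseDev;

/* Hypothetical stand-in for VFIOPCIDevice: the base device is embedded. */
typedef struct PciDev {
    int some_pci_state;
    BaseDev vbasedev;      /* embedded member, not a pointer */
    bool intx_pending;
} PciDev;

/* Handler with the generic signature: takes the base, recovers the wrapper. */
static bool pci_eoi(BaseDev *vbasedev)
{
    PciDev *vdev;

    assert(vbasedev != NULL);
    vdev = container_of(vbasedev, PciDev, vbasedev);

    if (!vdev->intx_pending) {
        return false;      /* nothing to acknowledge */
    }
    vdev->intx_pending = false;
    return true;           /* one pending interrupt was acknowledged */
}
```

The point of the pattern is that callers only need a BaseDev pointer, so the same callback type can be shared by PCI and platform code.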
[Qemu-devel] [PATCH v8 08/19] hw/vfio/pci: split vfio_get_device
vfio_get_device now takes a VFIODevice as argument. The function is split into 2 parts: vfio_get_device which is generic and vfio_populate_device which is bus specific. 3 new fields are introduced in VFIODevice to store dev_info. vfio_put_base_device is created. --- v5-v6: - simplifies the split for vfio_get_device: vfio_check_device, vfio_populate_regions, vfio_populate_interrupts are now gathered into a unique specialization function dubbed vfio_populate_device v4-v5: - cleanup up of error handling and get/put operations in vfio_check_device, vfio_populate_regions, vfio_populate_interrupts and vfio_get_device. - correct misuse of errno - vfio_populate_regions always returns 0 - VFIODevice .name deallocation done in vfio_put_device instead of vfio_put_base_device - vfio_put_base_device done at vfio_get_device level. Signed-off-by: Eric Auger eric.au...@linaro.org --- hw/vfio/pci.c | 130 +++--- trace-events | 10 ++--- 2 files changed, 83 insertions(+), 57 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 10c1697..60ff22b 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -204,12 +204,16 @@ typedef struct VFIODevice { bool reset_works; bool needs_reset; VFIODeviceOps *ops; +unsigned int num_irqs; +unsigned int num_regions; +unsigned int flags; } VFIODevice; struct VFIODeviceOps { void (*vfio_compute_needs_reset)(VFIODevice *vdev); int (*vfio_hot_reset_multi)(VFIODevice *vdev); void (*vfio_eoi)(VFIODevice *vdev); +int (*vfio_populate_device)(VFIODevice *vdev); }; typedef struct VFIOPCIDevice { @@ -296,6 +300,8 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len); static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr, uint32_t val, int len); static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled); +static void vfio_put_base_device(VFIODevice *vbasedev); +static int vfio_populate_device(VFIODevice *vbasedev); /* * Common VFIO interrupt disable @@ -3610,6 +3616,7 @@ static VFIODeviceOps vfio_pci_ops = { 
.vfio_compute_needs_reset = vfio_pci_compute_needs_reset, .vfio_hot_reset_multi = vfio_pci_hot_reset_multi, .vfio_eoi = vfio_eoi, +.vfio_populate_device = vfio_populate_device, }; static void vfio_reset_handler(void *opaque) @@ -3951,70 +3958,45 @@ static void vfio_put_group(VFIOGroup *group) } } -static int vfio_get_device(VFIOGroup *group, const char *name, - VFIOPCIDevice *vdev) +static int vfio_populate_device(VFIODevice *vbasedev) { -struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) }; +VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev); struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) }; struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) }; -int ret, i; - -ret = ioctl(group-fd, VFIO_GROUP_GET_DEVICE_FD, name); -if (ret 0) { -error_report(vfio: error getting device %s from group %d: %m, - name, group-groupid); -error_printf(Verify all devices in group %d are bound to vfio-pci - or pci-stub and not already in use\n, group-groupid); -return ret; -} - -vdev-vbasedev.fd = ret; -vdev-vbasedev.group = group; -QLIST_INSERT_HEAD(group-device_list, vdev-vbasedev, next); +int i, ret = -1; /* Sanity check device */ -ret = ioctl(vdev-vbasedev.fd, VFIO_DEVICE_GET_INFO, dev_info); -if (ret) { -error_report(vfio: error getting device info: %m); -goto error; -} - -trace_vfio_get_device_irq(name, dev_info.flags, - dev_info.num_regions, dev_info.num_irqs); - -if (!(dev_info.flags VFIO_DEVICE_FLAGS_PCI)) { +if (!(vbasedev-flags VFIO_DEVICE_FLAGS_PCI)) { error_report(vfio: Um, this isn't a PCI device); goto error; } -vdev-vbasedev.reset_works = !!(dev_info.flags VFIO_DEVICE_FLAGS_RESET); - -if (dev_info.num_regions VFIO_PCI_CONFIG_REGION_INDEX + 1) { +if (vbasedev-num_regions VFIO_PCI_CONFIG_REGION_INDEX + 1) { error_report(vfio: unexpected number of io regions %u, - dev_info.num_regions); + vbasedev-num_regions); goto error; } -if (dev_info.num_irqs VFIO_PCI_MSIX_IRQ_INDEX + 1) { -error_report(vfio: unexpected number of irqs 
%u, dev_info.num_irqs); +if (vbasedev-num_irqs VFIO_PCI_MSIX_IRQ_INDEX + 1) { +error_report(vfio: unexpected number of irqs %u, vbasedev-num_irqs); goto error; } for (i = VFIO_PCI_BAR0_REGION_INDEX; i VFIO_PCI_ROM_REGION_INDEX; i++) { reg_info.index = i; -ret = ioctl(vdev-vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, reg_info); +ret = ioctl(vbasedev-fd, VFIO_DEVICE_GET_REGION_INFO,
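The split described in this patch (a generic get step that caches the device info, then a bus-specific populate hook reached through the ops table) can be sketched as follows. This is an illustration under stated assumptions: Dev, DevOps, DevInfo, the flag value and the minimum counts are all hypothetical and do not reproduce the VFIO definitions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Dev Dev;

/* Hypothetical ops table with one bus-specific hook. */
typedef struct DevOps {
    int (*populate)(Dev *dev);
} DevOps;

/* Hypothetical generic device: caches what the info query returned. */
struct Dev {
    const DevOps *ops;
    unsigned int num_irqs;
    unsigned int num_regions;
    unsigned int flags;
};

/* Hypothetical stand-in for the result of a device info query. */
typedef struct DevInfo {
    unsigned int flags;
    unsigned int num_regions;
    unsigned int num_irqs;
} DevInfo;

/* Illustrative values only. */
#define DEV_FLAG_PCI     (1u << 1)
#define MIN_PCI_REGIONS  8u
#define MIN_PCI_IRQS     3u

/* Bus-specific part: sanity-check the fields cached in the base device. */
static int pci_populate(Dev *dev)
{
    if (!(dev->flags & DEV_FLAG_PCI)) {
        return -1;         /* not a PCI device */
    }
    if (dev->num_regions < MIN_PCI_REGIONS) {
        return -1;         /* unexpected number of regions */
    }
    if (dev->num_irqs < MIN_PCI_IRQS) {
        return -1;         /* unexpected number of irqs */
    }
    return 0;
}

static const DevOps pci_ops = { .populate = pci_populate };

/* Generic part: store the info in the base device, then delegate. */
static int get_device(Dev *dev, const DevInfo *info)
{
    assert(dev != NULL && dev->ops != NULL && info != NULL);

    dev->flags = info->flags;
    dev->num_regions = info->num_regions;
    dev->num_irqs = info->num_irqs;

    return dev->ops->populate(dev);
}
```

A platform device would supply its own populate hook with different checks while reusing get_device unchanged.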
[Qemu-devel] [PATCH v8 16/19] hw/vfio/platform: Add irqfd support
This patch aims at optimizing IRQ handling using the irqfd framework.

Instead of handling the eventfds on the user side, they are handled on
the kernel side using:
- the KVM irqfd framework,
- the VFIO driver virqfd framework.

The virtual IRQ completion is trapped at the interrupt controller. This
removes the need for the fast/slow path swap.

Overall this brings significant performance improvements. It depends on
host kernel KVM irqfd support.

Signed-off-by: Alvise Rigo a.r...@virtualopensystems.com
Signed-off-by: Eric Auger eric.au...@linaro.org
---
v5 -> v6:
- rely on kvm_irqfds_enabled() and kvm_resamplefds_enabled()
- guard KVM code with #ifdef CONFIG_KVM

v3 -> v4: [Alvise Rigo]
Use of the VFIO Platform driver v6 unmask/virqfd feature and removal of
the resamplefd handler. Physical IRQ unmasking is now done in the VFIO
driver.

v3: [Eric Auger]
initial support with resamplefd handled on the QEMU side since unmask
was not supported by VFIO platform driver v5.

Conflicts:
hw/vfio/platform.c
---
 hw/vfio/platform.c              | 96 +
 include/hw/vfio/vfio-platform.h |  1 +
 trace-events                    |  2 ++
 3 files changed, 99 insertions(+)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 41f8693..97d98bf 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -25,6 +25,7 @@
 #include "hw/sysbus.h"
 #include "trace.h"
 #include "hw/platform-bus.h"
+#include "sysemu/kvm.h"

 static void vfio_intp_interrupt(VFIOINTp *intp);
 typedef void (*eventfd_user_side_handler_t)(VFIOINTp *intp);
@@ -236,6 +237,83 @@ static int vfio_start_eventfd_injection(VFIOINTp *intp)
 }

 /*
+ * Functions used for irqfd
+ */
+
+#ifdef CONFIG_KVM
+
+/**
+ * vfio_set_resample_eventfd - sets the resamplefd for an IRQ
+ * @intp: the IRQ struct pointer
+ * programs the VFIO driver to unmask this IRQ when the
+ * intp->unmask eventfd is triggered
+ */
+static int vfio_set_resample_eventfd(VFIOINTp *intp)
+{
+    VFIODevice *vbasedev = &intp->vdev->vbasedev;
+    struct vfio_irq_set *irq_set;
+    int argsz, ret;
+    int32_t *pfd;
+
+    argsz = sizeof(*irq_set) + sizeof(*pfd);
+    irq_set = g_malloc0(argsz);
+    irq_set->argsz = argsz;
+    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_UNMASK;
+    irq_set->index = intp->pin;
+    irq_set->start = 0;
+    irq_set->count = 1;
+    pfd = (int32_t *)&irq_set->data;
+    *pfd = event_notifier_get_fd(&intp->unmask);
+    qemu_set_fd_handler(*pfd, NULL, NULL, intp);
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    g_free(irq_set);
+    if (ret < 0) {
+        error_report("vfio: Failed to set resample eventfd: %m");
+        qemu_set_fd_handler(*pfd, NULL, NULL, NULL);
+    }
+    return ret;
+}
+
+/**
+ * vfio_start_irqfd_injection - starts irqfd injection for an IRQ
+ * programs VFIO driver with both the trigger and resamplefd
+ * programs KVM with the gsi, trigger & resample eventfds
+ */
+static int vfio_start_irqfd_injection(VFIOINTp *intp)
+{
+    struct kvm_irqfd irqfd = {
+        .fd = event_notifier_get_fd(&intp->interrupt),
+        .resamplefd = event_notifier_get_fd(&intp->unmask),
+        .gsi = intp->virtualID,
+        .flags = KVM_IRQFD_FLAG_RESAMPLE,
+    };
+
+    if (kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd)) {
+        error_report("vfio: Error: Failed to assign the irqfd: %m");
+        goto fail_irqfd;
+    }
+    if (vfio_set_trigger_eventfd(intp, NULL) < 0) {
+        goto fail_vfio;
+    }
+    if (vfio_set_resample_eventfd(intp) < 0) {
+        goto fail_vfio;
+    }
+
+    intp->kvm_accel = true;
+    trace_vfio_platform_start_irqfd_injection(intp->pin, intp->virtualID,
+                                              irqfd.fd, irqfd.resamplefd);
+    return 0;
+
+fail_vfio:
+    irqfd.flags = KVM_IRQFD_FLAG_DEASSIGN;
+    kvm_vm_ioctl(kvm_state, KVM_IRQFD, &irqfd);
+fail_irqfd:
+    return -1;
+}
+
+#endif
+
+/*
  * Functions used whatever the injection method
  */

@@ -314,6 +392,13 @@ static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev, unsigned int index)
         error_report("vfio: Error: trigger event_notifier_init failed ");
         return NULL;
     }
+    /* Get an eventfd for resample/unmask */
+    ret = event_notifier_init(&intp->unmask, 0);
+    if (ret) {
+        g_free(intp);
+        error_report("vfio: Error: resample event_notifier_init failed eoi");
+        return NULL;
+    }

     /* store the new intp in qlist */
     QLIST_INSERT_HEAD(&vdev->intp_list, intp, next);
@@ -520,7 +605,17 @@ static void vfio_platform_realize(DeviceState *dev, Error **errp)

     vbasedev->type = VFIO_DEVICE_TYPE_PLATFORM;
     vbasedev->ops = &vfio_platform_ops;
+
+#ifdef CONFIG_KVM
+    if (kvm_irqfds_enabled() && kvm_resamplefds_enabled() &&
+        vdev->irqfd_allowed) {
+        vdev->start_irq_fn = vfio_start_irqfd_injection;
+    } else {
+        vdev->start_irq_fn = vfio_start_eventfd_injection;
+    }
+#else
     vdev->start_irq_fn = vfio_start_eventfd_injection;
+#endif

     trace_vfio_platform_realize(vbasedev->name, vdev->compat);

@@
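The resample eventfd setup in this patch builds a request made of a fixed header followed by a payload carrying one file descriptor. A minimal sketch of that header-plus-payload allocation follows. IrqSet is a hypothetical mirror written for illustration: the field names follow the patch, but the layout and flag values are assumptions and this is not the kernel's struct vfio_irq_set.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header-plus-payload request, modelled on the patch. */
typedef struct IrqSet {
    uint32_t argsz;    /* total size: header + payload */
    uint32_t flags;
    uint32_t index;
    uint32_t start;
    uint32_t count;
    uint8_t data[];    /* payload follows the header */
} IrqSet;

/* Illustrative flag values, assumed for this sketch. */
#define IRQ_SET_DATA_EVENTFD  (1u << 2)
#define IRQ_SET_ACTION_UNMASK (1u << 4)

/* Build an "unmask on eventfd" request for one IRQ; caller frees it. */
static IrqSet *irq_set_new_unmask(uint32_t pin, int32_t eventfd)
{
    size_t argsz = sizeof(IrqSet) + sizeof(eventfd);
    IrqSet *s = calloc(1, argsz);

    if (s == NULL) {
        return NULL;
    }
    s->argsz = (uint32_t)argsz;
    s->flags = IRQ_SET_DATA_EVENTFD | IRQ_SET_ACTION_UNMASK;
    s->index = pin;
    s->start = 0;
    s->count = 1;
    /* memcpy avoids aliasing and alignment assumptions on the payload */
    memcpy(s->data, &eventfd, sizeof(eventfd));
    return s;
}

/* Read the descriptor back out of the payload. */
static int32_t irq_set_fd(const IrqSet *s)
{
    int32_t fd;

    assert(s != NULL && s->count == 1);
    memcpy(&fd, s->data, sizeof(fd));
    return fd;
}
```

In this sketch, a caller that still needs the descriptor after releasing the request would copy it out with irq_set_fd() before calling free().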
[Qemu-devel] [PATCH v8 09/19] hw/vfio/pci: rename group_list into vfio_group_list
better fit in the rest of the namespace Signed-off-by: Eric Auger eric.au...@linaro.org --- hw/vfio/pci.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 60ff22b..d4a0e0f 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -282,7 +282,7 @@ static const VFIORomBlacklistEntry romblacklist[] = { #define MSIX_CAP_LENGTH 12 static QLIST_HEAD(, VFIOGroup) -group_list = QLIST_HEAD_INITIALIZER(group_list); +vfio_group_list = QLIST_HEAD_INITIALIZER(vfio_group_list); #ifdef CONFIG_KVM /* @@ -3454,7 +3454,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single) continue; } -QLIST_FOREACH(group, group_list, next) { +QLIST_FOREACH(group, vfio_group_list, next) { if (group-groupid == devices[i].group_id) { break; } @@ -3501,7 +3501,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single) /* Determine how many group fds need to be passed */ count = 0; -QLIST_FOREACH(group, group_list, next) { +QLIST_FOREACH(group, vfio_group_list, next) { for (i = 0; i info-count; i++) { if (group-groupid == devices[i].group_id) { count++; @@ -3515,7 +3515,7 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single) fds = reset-group_fds[0]; /* Fill in group fds */ -QLIST_FOREACH(group, group_list, next) { +QLIST_FOREACH(group, vfio_group_list, next) { for (i = 0; i info-count; i++) { if (group-groupid == devices[i].group_id) { fds[reset-count++] = group-fd; @@ -3550,7 +3550,7 @@ out: continue; } -QLIST_FOREACH(group, group_list, next) { +QLIST_FOREACH(group, vfio_group_list, next) { if (group-groupid == devices[i].group_id) { break; } @@ -3624,13 +3624,13 @@ static void vfio_reset_handler(void *opaque) VFIOGroup *group; VFIODevice *vbasedev; -QLIST_FOREACH(group, group_list, next) { +QLIST_FOREACH(group, vfio_group_list, next) { QLIST_FOREACH(vbasedev, group-device_list, next) { vbasedev-ops-vfio_compute_needs_reset(vbasedev); } } -QLIST_FOREACH(group, group_list, next) { +QLIST_FOREACH(group, 
vfio_group_list, next) { QLIST_FOREACH(vbasedev, group-device_list, next) { if (vbasedev-needs_reset) { vbasedev-ops-vfio_hot_reset_multi(vbasedev); @@ -3879,7 +3879,7 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as) char path[32]; struct vfio_group_status status = { .argsz = sizeof(status) }; -QLIST_FOREACH(group, group_list, next) { +QLIST_FOREACH(group, vfio_group_list, next) { if (group-groupid == groupid) { /* Found it. Now is it already in the right context? */ if (group-container-space-as == as) { @@ -3921,11 +3921,11 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as) goto close_fd_exit; } -if (QLIST_EMPTY(group_list)) { +if (QLIST_EMPTY(vfio_group_list)) { qemu_register_reset(vfio_reset_handler, NULL); } -QLIST_INSERT_HEAD(group_list, group, next); +QLIST_INSERT_HEAD(vfio_group_list, group, next); vfio_kvm_device_add_group(group); @@ -3953,7 +3953,7 @@ static void vfio_put_group(VFIOGroup *group) close(group-fd); g_free(group); -if (QLIST_EMPTY(group_list)) { +if (QLIST_EMPTY(vfio_group_list)) { qemu_unregister_reset(vfio_reset_handler, NULL); } } -- 1.8.3.2
[Qemu-devel] [PATCH v8 10/19] hw/vfio/pci: use name field in format strings
Signed-off-by: Eric Auger eric.au...@linaro.org Conflicts: trace-events --- hw/vfio/pci.c | 213 -- trace-events | 109 -- 2 files changed, 116 insertions(+), 206 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index d4a0e0f..6e15c8a 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -386,9 +386,7 @@ static void vfio_intx_interrupt(void *opaque) return; } -trace_vfio_intx_interrupt(vdev-host.domain, vdev-host.bus, - vdev-host.slot, vdev-host.function, - 'A' + vdev-intx.pin); +trace_vfio_intx_interrupt(vdev-vbasedev.name, 'A' + vdev-intx.pin); vdev-intx.pending = true; pci_irq_assert(vdev-pdev); @@ -407,8 +405,7 @@ static void vfio_eoi(VFIODevice *vbasedev) return; } -trace_vfio_eoi(vdev-host.domain, vdev-host.bus, - vdev-host.slot, vdev-host.function); +trace_vfio_eoi(vbasedev-name); vdev-intx.pending = false; pci_irq_deassert(vdev-pdev); @@ -477,8 +474,7 @@ static void vfio_enable_intx_kvm(VFIOPCIDevice *vdev) vdev-intx.kvm_accel = true; -trace_vfio_enable_intx_kvm(vdev-host.domain, vdev-host.bus, - vdev-host.slot, vdev-host.function); +trace_vfio_enable_intx_kvm(vdev-vbasedev.name); return; @@ -530,8 +526,7 @@ static void vfio_disable_intx_kvm(VFIOPCIDevice *vdev) /* If we've missed an event, let it re-fire through QEMU */ vfio_unmask_single_irqindex(vdev-vbasedev, VFIO_PCI_INTX_IRQ_INDEX); -trace_vfio_disable_intx_kvm(vdev-host.domain, vdev-host.bus, -vdev-host.slot, vdev-host.function); +trace_vfio_disable_intx_kvm(vdev-vbasedev.name); #endif } @@ -550,8 +545,7 @@ static void vfio_update_irq(PCIDevice *pdev) return; /* Nothing changed */ } -trace_vfio_update_irq(vdev-host.domain, vdev-host.bus, - vdev-host.slot, vdev-host.function, +trace_vfio_update_irq(vdev-vbasedev.name, vdev-intx.route.irq, route.irq); vfio_disable_intx_kvm(vdev); @@ -627,8 +621,7 @@ static int vfio_enable_intx(VFIOPCIDevice *vdev) vdev-interrupt = VFIO_INT_INTx; -trace_vfio_enable_intx(vdev-host.domain, vdev-host.bus, - vdev-host.slot, vdev-host.function); 
+trace_vfio_enable_intx(vdev-vbasedev.name); return 0; } @@ -650,8 +643,7 @@ static void vfio_disable_intx(VFIOPCIDevice *vdev) vdev-interrupt = VFIO_INT_NONE; -trace_vfio_disable_intx(vdev-host.domain, vdev-host.bus, -vdev-host.slot, vdev-host.function); +trace_vfio_disable_intx(vdev-vbasedev.name); } /* @@ -678,9 +670,7 @@ static void vfio_msi_interrupt(void *opaque) abort(); } -trace_vfio_msi_interrupt(vdev-host.domain, vdev-host.bus, - vdev-host.slot, vdev-host.function, - nr, msg.address, msg.data); +trace_vfio_msi_interrupt(vbasedev-name, nr, msg.address, msg.data); #endif if (vdev-interrupt == VFIO_INT_MSIX) { @@ -787,9 +777,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr, VFIOMSIVector *vector; int ret; -trace_vfio_msix_vector_do_use(vdev-host.domain, vdev-host.bus, - vdev-host.slot, vdev-host.function, - nr); +trace_vfio_msix_vector_do_use(vdev-vbasedev.name, nr); vector = vdev-msi_vectors[nr]; @@ -875,9 +863,7 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr) VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev); VFIOMSIVector *vector = vdev-msi_vectors[nr]; -trace_vfio_msix_vector_release(vdev-host.domain, vdev-host.bus, - vdev-host.slot, vdev-host.function, - nr); +trace_vfio_msix_vector_release(vdev-vbasedev.name, nr); /* * There are still old guests that mask and unmask vectors on every @@ -940,8 +926,7 @@ static void vfio_enable_msix(VFIOPCIDevice *vdev) error_report(vfio: msix_set_vector_notifiers failed); } -trace_vfio_enable_msix(vdev-host.domain, vdev-host.bus, - vdev-host.slot, vdev-host.function); +trace_vfio_enable_msix(vdev-vbasedev.name); } static void vfio_enable_msi(VFIOPCIDevice *vdev) @@ -1017,9 +1002,7 @@ retry: return; } -trace_vfio_enable_msi(vdev-host.domain, vdev-host.bus, - vdev-host.slot, vdev-host.function, - vdev-nr_vectors); +trace_vfio_enable_msi(vdev-vbasedev.name, vdev-nr_vectors); } static void vfio_disable_msi_common(VFIOPCIDevice *vdev) @@ -1069,8 +1052,7 @@ static 
void vfio_disable_msix(VFIOPCIDevice *vdev)
[Qemu-devel] [PATCH v8 12/19] hw/vfio/platform: add vfio-platform support
Minimal VFIO platform implementation supporting:
- register space user mapping,
- IRQ assignment based on eventfds handled on the QEMU side.

irqfd kernel acceleration comes in a subsequent patch.

Signed-off-by: Kim Phillips kim.phill...@linaro.org
Signed-off-by: Eric Auger eric.au...@linaro.org
---
v7 -> v8:
- change the prototype of vfio_platform_compute_needs_reset, which now
  sets vbasedev->needs_reset to false
- vfio_[un]mask_irqindex renamed into vfio_[un]mask_single_irqindex
- vfio_register_irq_starter renamed into vfio_kick_irqs. We now use a
  reset notifier instead of a machine init done notifier. This gets rid
  of the VfioIrqStarterNotifierParams dangling pointer. Previously we
  used the pbus first_irq. This is no longer possible since the reset
  notifier takes a void * and first_irq is a field of a const struct, so
  we now pass the DeviceState handle of the interrupt controller. I
  tried to keep the code generic, which is why I did not rely on an
  architecture specific accessor to retrieve the gsi number (gic
  accessor as proposed by Alex). I would like to avoid creating an ARM
  VFIO device model. I hope this model can work on archs other than ARM
  (no multiple intc?); wouldn't it be simpler to keep the previous
  first_irq parameter and relax the const constraint?

v6 -> v7:
- compat is no longer exposed as a user option. The rationale is that
  the vfio device became abstract and a specialization is needed anyway.
  The derived device must set the compat string.
- in v6 vfio_start_irq_injection was exposed in vfio-platform.h. A new
  function dubbed vfio_register_irq_starter replaces it. It registers a
  machine init done notifier that programs and starts all dynamic VFIO
  device IRQs. This function is supposed to be called by the machine
  file. A set of static helper routines is added too. It must be called
  before the creation of the platform bus device.

v5 -> v6:
- vfio_device property renamed into host property
- correct error handling of the VFIO_DEVICE_GET_IRQ_INFO ioctl and
  remove PCI related comment
- remove declaration of vfio_setup_irqfd and irqfd_allowed property.
  Both belong to the next patch (irqfd)
- remove declaration of vfio_intp_interrupt in vfio-platform.h
- functions that can be static are made static
- remove declarations of vfio_region_ops, vfio_memory_listener,
  group_list, vfio_address_spaces. All are moved to vfio-common.h
- remove vfio_put_device declaration and definition
- print_regions removed. Code moved into vfio_populate_regions
- replace DPRINTF by trace events
- new helper routine to set the trigger eventfd
- dissociate intp init from the injection enablement: vfio_enable_intp
  renamed into vfio_init_intp and new function named
  vfio_start_eventfd_injection
- injection start moved to vfio_start_irq_injection (no longer in
  vfio_populate_interrupt)
- new start_irq_fn field in VFIOPlatformDevice corresponding to the
  function that will be used for starting injection
- user handled eventfd:
  x add mutex to protect IRQ state list manipulation,
  x correct misleading comment in vfio_intp_interrupt,
  x fix bugs thanks to fake interrupt modality
- VFIOPlatformDeviceClass becomes abstract
- add error_setg in vfio_platform_realize

v4 -> v5:
- vfio-platform.h included first
- cleanup error handling in *populate*, vfio_get_device,
  vfio_enable_intp
- vfio_put_device not called anymore
- add some includes to follow vfio policy

v3 -> v4: [Eric Auger]
- merge of "vfio: Add initial IRQ support in platform device" to get a
  fully functional patch although performance is limited.
- removal of the unrealize function since I currently understand it is
  only used with the device hot-plug feature.

v2 -> v3: [Eric Auger]
- further factorization between PCI and platform (VFIORegion,
  VFIODevice). Same level of functionality.
= v2: [Kim Philipps] - Initial Creation of the device supporting register space mapping --- hw/vfio/Makefile.objs | 1 + hw/vfio/platform.c | 629 include/hw/vfio/vfio-common.h | 1 + include/hw/vfio/vfio-platform.h | 85 ++ trace-events| 12 + 5 files changed, 728 insertions(+) create mode 100644 hw/vfio/platform.c create mode 100644 include/hw/vfio/vfio-platform.h diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs index e31f30e..c5c76fe 100644 --- a/hw/vfio/Makefile.objs +++ b/hw/vfio/Makefile.objs @@ -1,4 +1,5 @@ ifeq ($(CONFIG_LINUX), y) obj-$(CONFIG_SOFTMMU) += common.o obj-$(CONFIG_PCI) += pci.o +obj-$(CONFIG_SOFTMMU) += platform.o endif diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c new file mode 100644 index 000..41f8693 --- /dev/null +++ b/hw/vfio/platform.c @@ -0,0 +1,629 @@ +/* + * vfio based device assignment support - platform devices + * + * Copyright Linaro Limited, 2014 + * + * Authors: + * Kim Phillips kim.phill...@linaro.org + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in
[Qemu-devel] [PATCH v8 14/19] hw/arm/virt: add support for VFIO devices
VFIO devices are dynamic sysbus devices. They could already be
instantiated. However, for them to be functional, IRQ injection must be
programmed and started. This programming must happen after the sysbus
devices are attached to the platform bus and their IRQs are bound. Only
then are the GSIs they are connected to identified, so that irqfd can be
programmed. Binding happens in a machine init done notifier registered
by the platform bus init. The IRQ start is done in a reset notifier.

This patch adds the registration of the IRQ start notifier in machvirt.

Signed-off-by: Eric Auger eric.au...@linaro.org
---
v7 -> v8:
- vfio_kick_irqs replaces the older vfio_register_irq_starter. The new
  function registers a reset notifier while the older one registered a
  machine init done notifier.
- Since platform_bus_first_irq has become part of a const struct, its
  handle cannot be passed as a void * to the reset notifier. We now pass
  the interrupt controller DeviceState *.
- create_gic now returns the DeviceState handle of the gic so that it
  can be passed to the reset notifier registration
---
 hw/arm/virt.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 37326a9..346b04a 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -44,6 +44,7 @@
 #include "qemu/error-report.h"
 #include "hw/arm/sysbus-fdt.h"
 #include "hw/platform-bus.h"
+#include "hw/vfio/vfio-platform.h"

 #define NUM_VIRTIO_TRANSPORTS 32

@@ -330,7 +331,7 @@ static void fdt_add_gic_node(const VirtBoardInfo *vbi)
     qemu_fdt_setprop_cell(vbi->fdt, "/intc", "phandle", gic_phandle);
 }

-static void create_gic(const VirtBoardInfo *vbi, qemu_irq *pic)
+static DeviceState *create_gic(const VirtBoardInfo *vbi, qemu_irq *pic)
 {
     /* We create a standalone GIC v2 */
     DeviceState *gicdev;
@@ -378,6 +379,7 @@ static void create_gic(const VirtBoardInfo *vbi, qemu_irq *pic)
     }

     fdt_add_gic_node(vbi);
+    return gicdev;
 }

 static void create_uart(const VirtBoardInfo *vbi, qemu_irq *pic)
@@ -537,7 +539,8 @@ static void create_flash(const VirtBoardInfo *vbi)
 }

 static void create_platform_bus(VirtBoardInfo *vbi, qemu_irq *pic,
-                const ARMPlatformBusSystemParams *system_params)
+                const ARMPlatformBusSystemParams *system_params,
+                DeviceState *gic)
 {
     DeviceState *dev;
     SysBusDevice *s;
@@ -571,6 +574,9 @@ static void create_platform_bus(VirtBoardInfo *vbi, qemu_irq *pic,

     memory_region_add_subregion(sysmem, system_params->platform_bus_base,
                                 sysbus_mmio_get_region(s, 0));
+
+    /* setup VFIO signaling/IRQFD for all VFIO platform sysbus devices */
+    qemu_register_reset(vfio_kick_irqs, gic);
 }

 static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
@@ -589,6 +595,7 @@ static void machvirt_init(MachineState *machine)
     MemoryRegion *ram = g_new(MemoryRegion, 1);
     const char *cpu_model = machine->cpu_model;
     VirtBoardInfo *vbi;
+    DeviceState *gic;

     if (!cpu_model) {
         cpu_model = "cortex-a15";
@@ -646,7 +653,7 @@ static void machvirt_init(MachineState *machine)

     create_flash(vbi);

-    create_gic(vbi, pic);
+    gic = create_gic(vbi, pic);

     create_uart(vbi, pic);

@@ -658,7 +665,7 @@ static void machvirt_init(MachineState *machine)
      */
     create_virtio_devices(vbi, pic);

-    create_platform_bus(vbi, pic, &platform_bus_params);
+    create_platform_bus(vbi, pic, &platform_bus_params, gic);

     vbi->bootinfo.ram_size = machine->ram_size;
     vbi->bootinfo.kernel_filename = machine->kernel_filename;
-- 
1.8.3.2
[Qemu-devel] [PATCH v8 18/19] hw/vfio/common: vfio_kvm_device_fd moved in the common header
the device is now used in platform for forwarded IRQ setup

Signed-off-by: Eric Auger eric.au...@linaro.org
---
 hw/vfio/common.c              | 3 ++-
 include/hw/vfio/vfio-common.h | 5 +++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 554467f..ba00ec9 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -44,9 +44,10 @@ struct vfio_as_head vfio_address_spaces =
  * initialized, this file descriptor is only released on QEMU exit and
  * we'll re-use it should another vfio device be attached before then.
  */
-static int vfio_kvm_device_fd = -1;
+int vfio_kvm_device_fd = -1;
 #endif

+
 /*
  * Common VFIO interrupt disable
  */
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b5af090..58fd786 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -41,6 +41,11 @@
 #define VFIO_ALLOW_KVM_MSI 1
 #define VFIO_ALLOW_KVM_MSIX 1

+#ifdef CONFIG_KVM
+extern int vfio_kvm_device_fd;
+#endif
+
+
 enum {
     VFIO_DEVICE_TYPE_PCI = 0,
     VFIO_DEVICE_TYPE_PLATFORM = 1,
-- 
1.8.3.2
[Qemu-devel] [PATCH v8 17/19] linux-headers: Update KVM headers from linux-next tag ToBeFilled
Sync up KVM related Linux headers from the linux-next tree using
scripts/update-linux-headers.sh.

Integrate the updated KVM-VFIO API related to forwarded IRQs.

Signed-off-by: Eric Auger eric.au...@linaro.org
---
 linux-headers/linux/kvm.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 12045a1..9f798ab 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -946,6 +946,9 @@ struct kvm_device_attr {
 #define  KVM_DEV_VFIO_GROUP                    1
 #define   KVM_DEV_VFIO_GROUP_ADD               1
 #define   KVM_DEV_VFIO_GROUP_DEL               2
+#define  KVM_DEV_VFIO_DEVICE                   2
+#define   KVM_DEV_VFIO_DEVICE_FORWARD_IRQ      1
+#define   KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ    2

 enum kvm_device_type {
     KVM_DEV_TYPE_FSL_MPIC_20 = 1,
@@ -963,6 +966,13 @@ enum kvm_device_type {
     KVM_DEV_TYPE_MAX,
 };

+struct kvm_arch_forwarded_irq {
+    __u32 fd; /* file desciptor of the VFIO device */
+    __u32 index; /* VFIO device IRQ index */
+    __u32 subindex; /* VFIO device IRQ subindex */
+    __u32 gsi; /* gsi, ie. virtual IRQ number */
+};
+
 /*
  * ioctls for VM fds
  */
-- 
1.8.3.2
[Qemu-devel] [PATCH v8 13/19] hw/vfio: calxeda xgmac device
The platform device class has become abstract. This patch introduces a
calxeda xgmac device that can be instantiated on the command line using
an option such as:

-device vfio-calxeda-xgmac,host=fff51000.ethernet

Signed-off-by: Eric Auger eric.au...@linaro.org
---
v7 -> v8:
- add a comment in the header about the MMIO regions and IRQs which are
  exposed by the device

v5 -> v6:
- back again following Alex Graf's advice
- fix a bug related to compat override

v4 -> v5:
- removed since device tree was moved to hw/arm/dyn_sysbus_devtree.c

v4:
- creation for device tree specialization
---
 hw/vfio/Makefile.objs                |  1 +
 hw/vfio/calxeda_xgmac.c              | 54
 include/hw/vfio/vfio-calxeda-xgmac.h | 46 ++
 3 files changed, 101 insertions(+)
 create mode 100644 hw/vfio/calxeda_xgmac.c
 create mode 100644 include/hw/vfio/vfio-calxeda-xgmac.h

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index c5c76fe..913ab14 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -2,4 +2,5 @@ ifeq ($(CONFIG_LINUX), y)
 obj-$(CONFIG_SOFTMMU) += common.o
 obj-$(CONFIG_PCI) += pci.o
 obj-$(CONFIG_SOFTMMU) += platform.o
+obj-$(CONFIG_SOFTMMU) += calxeda_xgmac.o
 endif
diff --git a/hw/vfio/calxeda_xgmac.c b/hw/vfio/calxeda_xgmac.c
new file mode 100644
index 0000000..199e076
--- /dev/null
+++ b/hw/vfio/calxeda_xgmac.c
@@ -0,0 +1,54 @@
+/*
+ * calxeda xgmac example VFIO device
+ *
+ * Copyright Linaro Limited, 2014
+ *
+ * Authors:
+ *  Eric Auger eric.au...@linaro.org
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/vfio/vfio-calxeda-xgmac.h"
+
+static void calxeda_xgmac_realize(DeviceState *dev, Error **errp)
+{
+    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
+    VFIOCalxedaXgmacDeviceClass *k = VFIO_CALXEDA_XGMAC_DEVICE_GET_CLASS(dev);
+
+    vdev->compat = g_strdup("calxeda,hb-xgmac");
+
+    k->parent_realize(dev, errp);
+}
+
+static const VMStateDescription vfio_platform_vmstate = {
+    .name = TYPE_VFIO_CALXEDA_XGMAC,
+    .unmigratable = 1,
+};
+
+static void vfio_calxeda_xgmac_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    VFIOCalxedaXgmacDeviceClass *vcxc =
+        VFIO_CALXEDA_XGMAC_DEVICE_CLASS(klass);
+    vcxc->parent_realize = dc->realize;
+    dc->realize = calxeda_xgmac_realize;
+    dc->desc = "VFIO Calxeda XGMAC";
+}
+
+static const TypeInfo vfio_calxeda_xgmac_dev_info = {
+    .name = TYPE_VFIO_CALXEDA_XGMAC,
+    .parent = TYPE_VFIO_PLATFORM,
+    .instance_size = sizeof(VFIOCalxedaXgmacDevice),
+    .class_init = vfio_calxeda_xgmac_class_init,
+    .class_size = sizeof(VFIOCalxedaXgmacDeviceClass),
+};
+
+static void register_calxeda_xgmac_dev_type(void)
+{
+    type_register_static(&vfio_calxeda_xgmac_dev_info);
+}
+
+type_init(register_calxeda_xgmac_dev_type)
diff --git a/include/hw/vfio/vfio-calxeda-xgmac.h b/include/hw/vfio/vfio-calxeda-xgmac.h
new file mode 100644
index 0000000..f994775
--- /dev/null
+++ b/include/hw/vfio/vfio-calxeda-xgmac.h
@@ -0,0 +1,46 @@
+/*
+ * VFIO calxeda xgmac device
+ *
+ * Copyright Linaro Limited, 2014
+ *
+ * Authors:
+ *  Eric Auger eric.au...@linaro.org
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef HW_VFIO_VFIO_CALXEDA_XGMAC_H
+#define HW_VFIO_VFIO_CALXEDA_XGMAC_H
+
+#include "hw/vfio/vfio-platform.h"
+
+#define TYPE_VFIO_CALXEDA_XGMAC "vfio-calxeda-xgmac"
+
+/**
+ * This device exposes:
+ * - a single MMIO region corresponding to its register space
+ * - 3 IRQS (main and 2 power related IRQs)
+ */
+typedef struct VFIOCalxedaXgmacDevice {
+    VFIOPlatformDevice vdev;
+} VFIOCalxedaXgmacDevice;
+
+typedef struct VFIOCalxedaXgmacDeviceClass {
+    /* private */
+    VFIOPlatformDeviceClass parent_class;
+    /* public */
+    DeviceRealize parent_realize;
+} VFIOCalxedaXgmacDeviceClass;
+
+#define VFIO_CALXEDA_XGMAC_DEVICE(obj) \
+     OBJECT_CHECK(VFIOCalxedaXgmacDevice, (obj), TYPE_VFIO_CALXEDA_XGMAC)
+#define VFIO_CALXEDA_XGMAC_DEVICE_CLASS(klass) \
+     OBJECT_CLASS_CHECK(VFIOCalxedaXgmacDeviceClass, (klass), \
+                        TYPE_VFIO_CALXEDA_XGMAC)
+#define VFIO_CALXEDA_XGMAC_DEVICE_GET_CLASS(obj) \
+     OBJECT_GET_CLASS(VFIOCalxedaXgmacDeviceClass, (obj), \
+                      TYPE_VFIO_CALXEDA_XGMAC)
+
+#endif
-- 
1.8.3.2
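The class_init in this patch saves the parent's realize hook and installs its own, which sets the compat string and then chains to the saved hook. A minimal standalone sketch of that chaining, assuming simplified stand-in types (Device, DeviceClass, XgmacClass and the function names are hypothetical and are not the QOM API):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Device Device;
typedef int (*RealizeFn)(Device *dev);

/* Hypothetical base class holding the overridable hook. */
typedef struct DeviceClass {
    RealizeFn realize;
} DeviceClass;

/* Hypothetical derived class: parent first, then the saved parent hook. */
typedef struct XgmacClass {
    DeviceClass parent_class;
    RealizeFn parent_realize;
} XgmacClass;

struct Device {
    DeviceClass *klass;
    const char *compat;
    int realized;
};

/* Abstract parent: refuses to realize without a compat string. */
static int platform_realize(Device *dev)
{
    if (dev->compat == NULL) {
        return -1;
    }
    dev->realized = 1;
    return 0;
}

/* Derived realize: set the compat string, then chain to the parent. */
static int xgmac_realize(Device *dev)
{
    /* valid because parent_class is the first member of XgmacClass */
    XgmacClass *k = (XgmacClass *)dev->klass;

    dev->compat = "calxeda,hb-xgmac";
    return k->parent_realize(dev);
}

static void xgmac_class_init(XgmacClass *k)
{
    DeviceClass *dc = &k->parent_class;

    dc->realize = platform_realize;    /* what the parent would have set */
    k->parent_realize = dc->realize;   /* save the parent hook */
    dc->realize = xgmac_realize;       /* install the override */
}
```

Calling the class hook on an instance therefore runs the derived code first and the parent code second, which is why the derived device can supply the compat string the abstract parent requires.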
[Qemu-devel] [PATCH v8 15/19] hw/arm/sysbus-fdt: enable vfio-calxeda-xgmac dynamic instantiation
vfio-calxeda-xgmac can now be instantiated using the -device option.
The node creation function generates a very basic dt node composed of
the compat, reg and interrupts properties.

Signed-off-by: Eric Auger eric.au...@linaro.org
---
v7 -> v8:
- move the add_fdt_node_functions array declaration between the device
  specific code and the generic code to avoid forward declarations of
  device specific functions
- rename add_basic_vfio_fdt_node into add_calxeda_midway_xgmac_fdt_node

v6 -> v7:
- compat string re-formatting removed since the compat string is no
  longer exposed as a user option
- VFIO IRQ kick-off removed from sysbus-fdt and moved to the VFIO
  platform device
---
 hw/arm/sysbus-fdt.c | 88 +
 1 file changed, 88 insertions(+)

diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
index 7537267..86bbd06 100644
--- a/hw/arm/sysbus-fdt.c
+++ b/hw/arm/sysbus-fdt.c
@@ -26,6 +26,8 @@
 #include "sysemu/device_tree.h"
 #include "hw/platform-bus.h"
 #include "sysemu/sysemu.h"
+#include "hw/vfio/vfio-platform.h"
+#include "hw/vfio/vfio-calxeda-xgmac.h"

 /*
  * internal struct that contains the information to create dynamic
@@ -53,11 +55,97 @@ typedef struct NodeCreationPair {
     int (*add_fdt_node_fn)(SysBusDevice *sbdev, void *opaque);
 } NodeCreationPair;

+/* Device Specific Code */
+
+/**
+ * add_calxeda_midway_xgmac_fdt_node
+ *
+ * Generates a very simple node with following properties:
+ * compatible string, regs, interrupts
+ */
+static int add_calxeda_midway_xgmac_fdt_node(SysBusDevice *sbdev, void *opaque)
+{
+    PlatformBusFdtData *data = opaque;
+    PlatformBusDevice *pbus = data->pbus;
+    void *fdt = data->fdt;
+    const char *parent_node = data->pbus_node_name;
+    int compat_str_len;
+    char *nodename;
+    int i, ret;
+    uint32_t *irq_attr;
+    uint64_t *reg_attr;
+    uint64_t mmio_base;
+    uint64_t irq_number;
+    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    Object *obj = OBJECT(sbdev);
+
+    mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
+
+    nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
+                               vbasedev->name,
+                               mmio_base);
+
+    qemu_fdt_add_subnode(fdt, nodename);
+
+    compat_str_len = strlen(vdev->compat) + 1;
+    qemu_fdt_setprop(fdt, nodename, "compatible",
+                     vdev->compat, compat_str_len);
+
+    reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
+
+    for (i = 0; i < vbasedev->num_regions; i++) {
+        mmio_base = platform_bus_get_mmio_addr(pbus, sbdev, i);
+        reg_attr[4*i] = 1;
+        reg_attr[4*i+1] = mmio_base;
+        reg_attr[4*i+2] = 1;
+        reg_attr[4*i+3] = memory_region_size(&vdev->regions[i]->mem);
+    }
+
+    ret = qemu_fdt_setprop_sized_cells_from_array(fdt, nodename, "reg",
+                     vbasedev->num_regions*2, reg_attr);
+    if (ret < 0) {
+        error_report("could not set reg property of node %s", nodename);
+        goto fail;
+    }
+
+    irq_attr = g_new(uint32_t, vbasedev->num_irqs*3);
+
+    for (i = 0; i < vbasedev->num_irqs; i++) {
+        irq_number = platform_bus_get_irqn(pbus, sbdev , i)
+                         + data->irq_start;
+        irq_attr[3*i] = cpu_to_be32(0);
+        irq_attr[3*i+1] = cpu_to_be32(irq_number);
+        irq_attr[3*i+2] = cpu_to_be32(0x4);
+    }
+
+   ret = qemu_fdt_setprop(fdt, nodename, "interrupts",
+                     irq_attr, vbasedev->num_irqs*3*sizeof(uint32_t));
+    if (ret < 0) {
+        error_report("could not set interrupts property of node %s",
+                     nodename);
+        goto fail;
+    }
+
+    g_free(nodename);
+    g_free(irq_attr);
+    g_free(reg_attr);
+
+    return 0;
+
+fail:
+
+   return -1;
+}
+
 /* list of supported dynamic sysbus devices */
 static const NodeCreationPair add_fdt_node_functions[] = {
+    {TYPE_VFIO_CALXEDA_XGMAC, add_calxeda_midway_xgmac_fdt_node},
     {"", NULL}, /*last element*/
 };

+/* Generic Code */
+
 /**
  * add_fdt_node - add the device tree node of a dynamic sysbus device
  *
-- 
1.8.3.2
[Qemu-devel] [PATCH v8 19/19] hw/vfio/platform: add forwarded irq support
Tests whether the forwarded IRQ modality is available. In the positive device IRQs are forwarded. This control is achieved with KVM-VFIO device. with such a modality injection still is handled through irqfds. However end of interrupt is not trapped anymore. As soon as the guest completes its virtual IRQ, the corresponding physical IRQ is completed and the same physical IRQ can hit again. A new x-forward property enables to force forwarding off although enabled by the kernel. Signed-off-by: Eric Auger eric.au...@linaro.org --- hw/vfio/platform.c | 52 + include/hw/vfio/vfio-platform.h | 2 ++ trace-events| 1 + 3 files changed, 55 insertions(+) diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c index 97d98bf..7881b9b 100644 --- a/hw/vfio/platform.c +++ b/hw/vfio/platform.c @@ -237,6 +237,52 @@ static int vfio_start_eventfd_injection(VFIOINTp *intp) } /* + * Functions used with forwarding capability + */ + +#ifdef CONFIG_KVM + +static bool has_kvm_vfio_forward_capability(void) +{ +struct kvm_device_attr attr = { + .group = KVM_DEV_VFIO_DEVICE, + .attr = KVM_DEV_VFIO_DEVICE_FORWARD_IRQ}; + +if (ioctl(vfio_kvm_device_fd, KVM_HAS_DEVICE_ATTR, attr) == 0) { +return true; +} else { +return false; +} +} + +static int vfio_set_forwarding(VFIOINTp *intp) +{ +int ret; +struct kvm_device_attr attr = { + .group = KVM_DEV_VFIO_DEVICE, + .attr = KVM_DEV_VFIO_DEVICE_FORWARD_IRQ}; + +intp-fwd_irq = g_malloc0(sizeof(*intp-fwd_irq)); +intp-fwd_irq-fd = intp-vdev-vbasedev.fd; +intp-fwd_irq-index = intp-pin; +intp-fwd_irq-gsi = intp-virtualID; + +attr.addr = (uint64_t)(unsigned long)intp-fwd_irq; + +if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, attr)) { +error_report(Failed to forward IRQ %d through KVM VFIO device, + intp-pin); +g_free(intp-fwd_irq); +return -errno; +} +trace_vfio_start_fwd_injection(intp-pin); + +return ret; +} + +#endif + +/* * Functions used for irqfd */ @@ -288,6 +334,11 @@ static int vfio_start_irqfd_injection(VFIOINTp *intp) .flags = KVM_IRQFD_FLAG_RESAMPLE, 
}; +if (has_kvm_vfio_forward_capability() + intp-vdev-forward_allowed) { +vfio_set_forwarding(intp); +} + if (kvm_vm_ioctl(kvm_state, KVM_IRQFD, irqfd)) { error_report(vfio: Error: Failed to assign the irqfd: %m); goto fail_irqfd; @@ -694,6 +745,7 @@ static Property vfio_platform_dev_properties[] = { DEFINE_PROP_UINT32(mmap-timeout-ms, VFIOPlatformDevice, mmap_timeout, 1100), DEFINE_PROP_BOOL(x-irqfd, VFIOPlatformDevice, irqfd_allowed, true), +DEFINE_PROP_BOOL(x-forward, VFIOPlatformDevice, forward_allowed, true), DEFINE_PROP_END_OF_LIST(), }; diff --git a/include/hw/vfio/vfio-platform.h b/include/hw/vfio/vfio-platform.h index de0b5d5..d512bb3 100644 --- a/include/hw/vfio/vfio-platform.h +++ b/include/hw/vfio/vfio-platform.h @@ -42,6 +42,7 @@ typedef struct VFIOINTp { bool kvm_accel; /* set when QEMU bypass through KVM enabled */ uint8_t pin; /* index */ uint8_t virtualID; /* virtual IRQ */ +struct kvm_arch_forwarded_irq *fwd_irq; } VFIOINTp; typedef int (*start_irq_fn_t)(VFIOINTp *intp); @@ -59,6 +60,7 @@ typedef struct VFIOPlatformDevice { start_irq_fn_t start_irq_fn; QemuMutex intp_mutex; bool irqfd_allowed; /* debug option to force irqfd on/off */ +bool forward_allowed; /* debug option to force forwarding on/off */ } VFIOPlatformDevice; diff --git a/trace-events b/trace-events index 59a09f6..0aea358 100644 --- a/trace-events +++ b/trace-events @@ -1431,6 +1431,7 @@ vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, vfio_put_base_device(int fd) close vdev-fd=%d # hw/vfio/platform.c +vfio_start_fwd_injection(int pin) forwarding set for IRQ pin %d vfio_platform_eoi(int pin, int fd) EOI IRQ pin %d (fd=%d) vfio_platform_mmap_set_enabled(bool enabled) fast path = %d vfio_platform_intp_mmap_enable(int pin) IRQ #%d still active, stay in slow path -- 1.8.3.2
[Qemu-devel] [PATCH v8 00/19] KVM platform device passthrough
This RFC series aims at enabling KVM platform device passthrough. It implements a VFIO platform device, derived from VFIO PCI device. The VFIO platform device uses the host VFIO platform driver which must be bound to the assigned device prior to the QEMU system start. - the guest can directly access the device register space - assigned device IRQs are transparently routed to the guest by QEMU/KVM (3 methods currently are supported: user-level eventfd handling, irqfd, forwarded IRQs) - iommu is transparently programmed to prevent the device from accessing physical pages outside of the guest address space This patch series is made of the following patch file groups: 1-11) PCI modifications to prepare for platform device introduction 12-15) VFIO calxeda midway platform device without irqfd support 16) VFIO platform device with irqfd support 17-19) VFIO platform device with IRQ forwarding support Each group is independent and should be separately upstreamable. Dependency List: QEMU dependencies: [1] [PATCH v5] machvirt dynamic sysbus device instantiation Eric Auger [2] [PATCH v3 0/2] actual checks of KVM_CAP_IRQFD and KVM_CAP_IRQFD_RESAMPLE, Eric Auger http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00589.html [3] [PATCH v2] vfio: migration to trace points Eric Auger https://patchwork.ozlabs.org/patch/394785/ Kernel Dependencies: [1] [PATCH v10 00/20] VFIO support for platform and AMBA devices on ARM Antonios Motakis http://comments.gmane.org/gmane.linux.kernel.iommu/7096 [2] [PATCH v3 0/6] vfio: type1: support for ARM SMMUS with VFIO_IOMMU_TYPE1 Antonios Motakis http://www.spinics.net/lists/kvm-arm/msg11738.html [3] [PATCH v4] ARM: KVM: add irqfd support Eric Auger https://lkml.org/lkml/2014/9/1/141 [4] [RFC PATCH 0/9] ARM: Forwarding physical interrupts to a guest VM, Marc Zyngier http://lwn.net/Articles/603514/ [5] [PATCH v3 0/9] KVM-VFIO IRQ forward control Eric Auger https://lkml.org/lkml/2014/9/1/344 - kernel pieces can be found at: 
http://git.linaro.org/people/eric.auger/linux.git (branch 3.18-rc6-v10) - QEMU pieces can be found at: http://git.linaro.org/people/eric.auger/qemu.git (branch vfio_integ_v8) The patch series was tested on Calxeda Midway (ARMv7) where one xgmac is assigned to KVM host while the second one is assigned to the guest. Reworked PCI device is not tested. Wiki for Calxeda Midway setup: https://wiki.linaro.org/LEG/Engineering/Virtualization/Platform_Device_Passthrough_on_Midway History: v7-v8: - rebase on v2.2.0-rc3 and integrate Add skip_dump flag to ignore memory region during dump - KVM header evolution with subindex addition in kvm_arch_forwarded_irq - split [PATCH v7 03/16] hw/vfio/pci: introduce VFIODevice into 4 patches - vfio_compute_needs_reset does not return bool anymore - add some comments about exposed MMIO region and IRQ in calxeda xgmac device - vfio_[un]mask_irqindex renamed into vfio_[un]mask_single_irqindex - rework IRQ startup: former machine init done notifier is replaced by a reset notifier. machine file passes the interrupt controller DeviceState handle (not the platform bus first irq parameter). - sysbus-fdt: - move the add_fdt_node_functions array declaration between the device specific code and the generic code to avoid forward declarations of decice specific functions - rename add_basic_vfio_fdt_node into add_calxeda_midway_xgmac_fdt_node emphasizing the fact it is xgmac specific v6-v7: - fake injection test modality removed - VFIO_DEVICE_TYPE_PLATFORM only introduced with VFIO platform - new helper functions to start VFIO IRQ on machine init done notifier (introduced in hw/vfio/platform: add vfio-platform support and notifier registration invoked in hw/arm/virt: add support for VFIO devices). vfio_start_irq_injection is replaced by vfio_register_irq_starter. 
v5-v6: - rebase on 2.1rc5 PCI code - forwarded IRQ first integraton - vfio_device property renamed into host property - split IRQ setup in different functions that match the 3 supported injection techniques (user handled eventfd, irqfd, forwarded IRQ): removes dynamic switch between injection methods - introduce fake interrupts as a test modality: x makes possible to test multiple IRQ user-side handling. x this is a test feature only: enable to trigger a fd as if the real physical IRQ hit. No virtual IRQ is injected into the guest but handling is simulated so that the state machine can be tested - user handled eventfd: x add mutex to protect IRQ state list manipulation, x correct misleading comment in vfio_intp_interrupt. x Fix bugs using fake interrupt modality - irqfd no more advertised in this patchset (handled in [3]) - VFIOPlatformDeviceClass becomes abstract and Calxeda xgmac device and class is re-introduced (as per v4) - all DPRINTF removed in platform and replaced by trace-points - corrects compilation with configure --disable-kvm - simplifies
[Qemu-devel] dtb support on x86 machines
Hi, I would like to share my work in progress on device-tree support for QEMU x86 machines. The patch is not fully functional but works as a proof of concept. It is based on qemu stable-2.1; once my questions are resolved I will rebase it onto the master branch. Device tree on x86 machines is not widely used, but it works. The syslinux bootloader supports it, and I am preparing similar patches for kexec too. So I decided to do the same with qemu. ;) The patch uses the setup_data field of the Linux boot protocol (https://www.kernel.org/doc/Documentation/x86/boot.txt), which is a linked list of 'struct setup_data'. Usually setup_data is used to extend the boot parameters. I am using it to pass a loaded dtb. For now you can see the patch at https://github.com/joaohf/qemu/commit/941d68e6126b4e0908fdd8a90fa7d3f28098a49f. I will send it to the qemu-devel list once I resolve my biggest question, which I explain later. -- begin diff --git a/hw/i386/pc.c b/hw/i386/pc.c index ef9fad8..94467ba 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -51,6 +51,7 @@ #include exec/address-spaces.h #include sysemu/arch_init.h #include qemu/bitmap.h +#include sysemu/device_tree.h #include qemu/config-file.h #include hw/acpi/acpi.h #include hw/acpi/cpu_hotplug.h @@ -75,7 +76,7 @@ /* Leave a chunk of memory at the top of RAM for the BIOS ACPI tables * (128K) and other BIOS datastructures (less than 4K reported to be used at * the moment, 32K should be enough for a while). 
*/ -unsigned acpi_data_size = 0x2 + 0x8000; +unsigned acpi_data_size = 0x2 + 0x8; void pc_set_legacy_acpi_data_size(void) { acpi_data_size = 0x1; @@ -741,17 +742,77 @@ static long get_file_size(FILE *f) return size; } +static int load_dtb(FWCfgState *fw_cfg, +const char *dtb_filename, +void **dtb_addr, +int *dtb_size) +{ +void *fdt = NULL; + +fdt = load_device_tree(dtb_filename, dtb_size); +if (!fdt) { +fprintf(stderr, Couldn't open dtb file %s\n, dtb_filename); +return -1; +} + +qemu_fdt_dumpdtb(fdt, *dtb_size); + +*dtb_addr = fdt; + +return 0; +} + +struct setup_data { +uint64_t next; +uint32_t type; +#define SETUP_NONE 0 +#define SETUP_E820_EXT 1 +#define SETUP_DTB 2 +#define SETUP_PCI 3 +#define SETUP_EFI 4 +uint32_t len; +uint8_t data[0]; +} __attribute__((packed)); + +static int setup_dtb_data(FWCfgState *fw_cfg, + void **setup_data_addr, int *setup_data_size, + void *dtb_addr, off_t dtb_size) +{ +struct setup_data *sd; +int sdsize; + +sd = g_malloc(sizeof(struct setup_data) + dtb_size); +if (!sd) { +return -1; +} + +memset(sd, 0, sizeof(struct setup_data) + dtb_size); +sd-next = 0; +sd-type = SETUP_DTB; +sd-len = dtb_size; +memcpy(sd-data, dtb_addr, dtb_size); + +sdsize = sd-len + sizeof(struct setup_data); + +*setup_data_addr = (void *) sd; +*setup_data_size = sdsize; + +return 0; +} + static void load_linux(FWCfgState *fw_cfg, const char *kernel_filename, const char *initrd_filename, + const char *dtb_filename, const char *kernel_cmdline, hwaddr max_ram_size) { uint16_t protocol; -int setup_size, kernel_size, initrd_size = 0, cmdline_size; +int setup_size, kernel_size, initrd_size = 0, cmdline_size, dtb_size = 0, setup_data_size = 0;; uint32_t initrd_max; uint8_t header[8192], *setup, *kernel, *initrd_data; -hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0; +hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, setup_data_addr = 0; +void *dtb_addr, *setup_data; FILE *f; char *vmode; @@ -891,6 +952,53 @@ static void load_linux(FWCfgState 
*fw_cfg, stl_p(header+0x21c, initrd_size); } +/* load dtb */ +if (dtb_filename) { +int retval; +retval = load_dtb(fw_cfg, dtb_filename, dtb_addr, dtb_size); +if (retval 0) { +fprintf(stderr, qemu: error loading dtb %s: %s\n, +dtb_filename, strerror(errno)); +exit(1); +} + +retval = setup_dtb_data(fw_cfg, setup_data, setup_data_size, +dtb_addr, dtb_size); +if (retval 0) { +fprintf(stderr, qemu: error no memory to setup_data\n); +exit(1); +} + +//if (!initrd_addr) { +//setup_data_addr = (initrd_max-initrd_size-setup_data_size) ~4095; +//} else { +setup_data_addr = QEMU_ALIGN_UP(initrd_max-initrd_size-setup_data_size, 4096); +//} + +stq_p(header+0x250, setup_data_addr); + +cpu_physical_memory_write(setup_data_addr, setup_data, setup_data_size); + -- Above you
Re: [Qemu-devel] [PATCH] i6300esb: fix reading config registers and accept writes of all length
On Wed, Oct 29, 2014 at 02:42:51PM +0100, Adam Hoka wrote: Don't require configuration register write to be off a certain length, as some PCI implementations always access them in 32bit only. This is because it's in fact the only kind of access supported by the standard, anything else is implementation dependent. Add support for reading back the configuration register values. Unify the MMIO register implementation into a common read and write function. This makes driver testing in QEMU less surprising. Missing: interrupt register is still not implemented as interrupting itself is absent. It's unclear from the 6300ESB ICH specs where the IRQ line is connected in real hardware. Signed-off-by: Adam Hoka adam.h...@gmail.com I don't really have any opinion on this patch. All I care is that it doesn't break the Linux device driver (the Intel-supplied 32 bit Windows device driver is unfortunately a lost cause). Did you test it against Linux? I wrote a small test harness that makes testing the qemu watchdog simple: http://git.annexia.org/?p=watchdog-test-framework.git;a=summary Rich. hw/watchdog/wdt_i6300esb.c | 134 ++--- 1 file changed, 53 insertions(+), 81 deletions(-) diff --git a/hw/watchdog/wdt_i6300esb.c b/hw/watchdog/wdt_i6300esb.c index 687c8b1..8512a91 100644 --- a/hw/watchdog/wdt_i6300esb.c +++ b/hw/watchdog/wdt_i6300esb.c @@ -212,12 +212,12 @@ static void i6300esb_config_write(PCIDevice *dev, uint32_t addr, i6300esb_debug(addr = %x, data = %x, len = %d\n, addr, data, len); -if (addr == ESB_CONFIG_REG len == 2) { +if (addr == ESB_CONFIG_REG) { d-reboot_enabled = (data ESB_WDT_REBOOT) == 0; d-clock_scale = (data ESB_WDT_FREQ) != 0 ? 
CLOCK_SCALE_1MHZ : CLOCK_SCALE_1KHZ; d-int_type = (data ESB_WDT_INTTYPE); -} else if (addr == ESB_LOCK_REG len == 1) { +} else if (addr == ESB_LOCK_REG) { if (!d-locked) { d-locked = (data ESB_WDT_LOCK) != 0; d-free_run = (data ESB_WDT_FUNC) != 0; @@ -240,13 +240,13 @@ static uint32_t i6300esb_config_read(PCIDevice *dev, uint32_t addr, int len) i6300esb_debug (addr = %x, len = %d\n, addr, len); -if (addr == ESB_CONFIG_REG len == 2) { +if (addr == ESB_CONFIG_REG) { data = (d-reboot_enabled ? 0 : ESB_WDT_REBOOT) | (d-clock_scale == CLOCK_SCALE_1MHZ ? ESB_WDT_FREQ : 0) | d-int_type; return data; -} else if (addr == ESB_LOCK_REG len == 1) { +} else if (addr == ESB_LOCK_REG) { data = (d-free_run ? ESB_WDT_FUNC : 0) | (d-locked ? ESB_WDT_LOCK : 0) | @@ -257,116 +257,88 @@ static uint32_t i6300esb_config_read(PCIDevice *dev, uint32_t addr, int len) } } -static uint32_t i6300esb_mem_readb(void *vp, hwaddr addr) +static uint32_t i6300esb_mem_read(void *vp, hwaddr addr) { -i6300esb_debug (addr = %x\n, (int) addr); - -return 0; -} - -static uint32_t i6300esb_mem_readw(void *vp, hwaddr addr) -{ -uint32_t data = 0; I6300State *d = vp; -i6300esb_debug(addr = %x\n, (int) addr); +i6300esb_debug(addr = %p\n, (void *)addr); -if (addr == 0xc) { +switch (addr) { +case 0x00: +return d-timer1_preload; +case 0x04: +return d-timer2_preload; +case 0x0c: /* The previous reboot flag is really bit 9, but there is * a bug in the Linux driver where it thinks it's bit 12. * Set both. */ -data = d-previous_reboot_flag ? 0x1200 : 0; +return d-previous_reboot_flag ? 
0x1200 : 0; } -return data; -} - -static uint32_t i6300esb_mem_readl(void *vp, hwaddr addr) -{ -i6300esb_debug(addr = %x\n, (int) addr); - return 0; } -static void i6300esb_mem_writeb(void *vp, hwaddr addr, uint32_t val) +static void i6300esb_mem_write(void *vp, hwaddr addr, uint32_t val) { I6300State *d = vp; -i6300esb_debug(addr = %x, val = %x\n, (int) addr, val); +i6300esb_debug(addr = %p, val = 0x%x\n, (void *)addr, val); -if (addr == 0xc val == 0x80) +/* register lock */ +if (addr == 0xc val == 0x80) { d-unlock_state = 1; -else if (addr == 0xc val == 0x86 d-unlock_state == 1) +return; +} else if (addr == 0xc val == 0x86 d-unlock_state == 1) { d-unlock_state = 2; -} +return; +} else if (d-unlock_state == 0) { +return; +} -static void i6300esb_mem_writew(void *vp, hwaddr addr, uint32_t val) -{ -I6300State *d = vp; +switch (addr) { +case 0x00: +d-timer1_preload = val 0xf; +break; -i6300esb_debug(addr = %x, val = %x\n, (int) addr, val); +case 0x04: +d-timer2_preload = val 0xf; +break; -if (addr == 0xc val == 0x80) -
Re: [Qemu-devel] How does qemu know the virtual memory of the guest os?
On Fri, Nov 28, 2014 at 04:17:10PM -0800, Jidong Xiao wrote: Hi, I notice that Qemu supports dumping the virtual memory of the guest OS, as this page suggests: http://doc.opensuse.org/products/draft/SLES/SLES-kvm_sd_draft/cha.qemu.monitor.html To save the content of the virtual machine memory to a disk or console output, use the following commands: memsave addr size filename Saves virtual memory dump starting at addr of size size to file filename pmemsave addr size filename Saves physical memory dump starting at addr of size size to file filename I understand that a hypervisor certainly knows the physical memory of a virtual machine, but how does it know the virtual memory of the guest OS? I think the hypervisor has no semantic knowledge of the guest OS, and such knowledge should be different for different OSes (e.g., Windows vs Linux), so I am really surprised that Qemu can dump the virtual memory of the guest OS. Can someone kindly give me some explanation? Thank you very much!! It's different for each *architecture*, but not for each OS. For example on x86 it starts by reading the CR* control registers, and then the page tables (see target-i386/helper.c: x86_cpu_get_phys_page_debug). Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org
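To make the "CR* registers, then page tables" answer concrete, here is a minimal sketch of a page-table walk. It is not QEMU's x86_cpu_get_phys_page_debug: it assumes the simplest x86 mode (32-bit, non-PAE, 4 KiB pages, no large pages), and guest_ram, phys_read32, virt_to_phys and build_demo_tables are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy "guest RAM" of 16 pages; everything here is a hypothetical model. */
#define PHYS_SIZE   (16u * 4096u)
#define PTE_PRESENT 0x1u

static uint8_t guest_ram[PHYS_SIZE];

/* Read a 32-bit little-endian value from guest-physical memory. */
static uint32_t phys_read32(uint32_t paddr)
{
    assert(paddr + 4 <= PHYS_SIZE);
    return (uint32_t)guest_ram[paddr] |
           ((uint32_t)guest_ram[paddr + 1] << 8) |
           ((uint32_t)guest_ram[paddr + 2] << 16) |
           ((uint32_t)guest_ram[paddr + 3] << 24);
}

/* Write a 32-bit little-endian value to guest-physical memory. */
static void phys_write32(uint32_t paddr, uint32_t val)
{
    assert(paddr + 4 <= PHYS_SIZE);
    guest_ram[paddr]     = (uint8_t)(val & 0xff);
    guest_ram[paddr + 1] = (uint8_t)((val >> 8) & 0xff);
    guest_ram[paddr + 2] = (uint8_t)((val >> 16) & 0xff);
    guest_ram[paddr + 3] = (uint8_t)((val >> 24) & 0xff);
}

/*
 * Two-level walk: CR3 -> page directory entry -> page table entry.
 * Returns 0 and stores the physical address in *paddr, or -1 if the
 * virtual address is not mapped. No OS-specific knowledge is needed,
 * only the architecture's page-table format.
 */
static int virt_to_phys(uint32_t cr3, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t pde_addr = (cr3 & ~0xfffu) + ((vaddr >> 22) & 0x3ffu) * 4;
    uint32_t pde = phys_read32(pde_addr);
    if (!(pde & PTE_PRESENT)) {
        return -1;
    }

    uint32_t pte_addr = (pde & ~0xfffu) + ((vaddr >> 12) & 0x3ffu) * 4;
    uint32_t pte = phys_read32(pte_addr);
    if (!(pte & PTE_PRESENT)) {
        return -1;
    }

    *paddr = (pte & ~0xfffu) | (vaddr & 0xfffu);
    return 0;
}

/* Build a tiny address space mapping vaddr 0x00400000 to paddr 0x3000. */
static uint32_t build_demo_tables(void)
{
    memset(guest_ram, 0, sizeof(guest_ram));
    /* Page directory at 0x1000; entry 1 covers 0x00400000..0x007fffff. */
    phys_write32(0x1000 + 1 * 4, 0x2000 | PTE_PRESENT);
    /* Page table at 0x2000; entry 0 maps the first page of that range. */
    phys_write32(0x2000 + 0 * 4, 0x3000 | PTE_PRESENT);
    return 0x1000; /* value the guest would hold in CR3 */
}
```

In this model, memsave would correspond to calling virt_to_phys page by page and copying from guest_ram, while pmemsave would copy from guest_ram directly.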
Re: [Qemu-devel] Better Cortex-M support?
On Fri, Nov 14, 2014 at 5:32 PM, Liviu Ionescu i...@livius.net wrote: On 14 Nov 2014, at 03:01, Alistair Francis alistai...@gmail.com wrote: I haven't looked into CMSIS or using SysTick, so I can't confirm that they work. I don't have any experience with using either, so I can't really be of much help with those. when you'll have some time, perhaps it would be useful to install GNU ARM Eclipse and generate a project for your board, run it on the physical hardware, then test it on QEMU. Sorry about the long delay. I probably won't be able to do that for some time, as I have other aspects of the project that are higher priority. If I get a chance I will, though. I have implementations for the more important system peripherals in the STM32F2xx/4xx SoC families, including GPIO. did you implement the clock related registers? PLL others? these are used during CMSIS SystemInit() and are mandatory, otherwise emulation will either fail or not be realistic. Not specifically. I did implement a timer peripheral, but I assume that isn't the same. I didn't have any issues with timing and unrealistic emulation, but I'm not looking for exact time accurate emulations. Thanks, Alistair You are welcome to use those if you want thank you! above my use case is more aimed at higher level machine/peripherals support yes, that's great, but without a proper base, like system registers and debug, usability may not be as good as expected. regards, Liviu
Re: [Qemu-devel] [PATCH 3/7] test-coroutine: avoid overflow on 32-bit systems
On Fri, Nov 28, 2014 at 10:12 PM, Paolo Bonzini pbonz...@redhat.com wrote: unsigned long is not large enough to represent 10 * duration there. Just use floating point. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- tests/test-coroutine.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/test-coroutine.c b/tests/test-coroutine.c index e22fae1..27d1b6f 100644 --- a/tests/test-coroutine.c +++ b/tests/test-coroutine.c @@ -337,7 +337,7 @@ static void perf_cost(void) %luns per coroutine, maxcycles, duration, ops, - (unsigned long)(10 * duration) / maxcycles); + (unsigned long)(10.0 * duration / maxcycles)); One more single bracket. thanks, Ming Lei
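A rough sketch of why the floating-point form avoids the overflow. The archived text shows only a truncated scale factor, so NS_PER_SEC (seconds to nanoseconds) is an assumption here, and product_fits_u32 and ns_per_op are hypothetical helpers, not code from tests/test-coroutine.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed scale factor; the constant in the archived patch is truncated. */
#define NS_PER_SEC 1e9

/*
 * Would the scaled duration fit in a 32-bit unsigned long? On 32-bit
 * hosts, casting the product before dividing loses anything above
 * UINT32_MAX (about 4.29 seconds' worth of nanoseconds).
 */
static int product_fits_u32(double duration_s)
{
    assert(duration_s >= 0.0);
    return NS_PER_SEC * duration_s <= (double)UINT32_MAX;
}

/*
 * The fixed form: scale and divide in floating point, then cast only
 * the small per-operation result.
 */
static unsigned long ns_per_op(double duration_s, unsigned int maxcycles)
{
    assert(maxcycles > 0);
    return (unsigned long)(NS_PER_SEC * duration_s / maxcycles);
}
```

Under these assumptions a 5-second run over 40,000,000 cycles has an intermediate product that no longer fits in 32 bits, while the per-operation result is small enough for any unsigned long.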
Re: [Qemu-devel] [PATCH 04/12] spapr_pci: add set-indicator RTAS interface
On Wed, Nov 26, 2014 at 11:57 AM, Michael Roth mdr...@linux.vnet.ibm.com wrote: https://github.com/mdroth/qemu/commits/spapr-pci-hotplug-ppc-next-cleanup4.2 The sPAPRDREntry stuff is now modeled by the sPAPRDRConnector QOM object in hw/ppc/spapr_drc.c, which manages the device's life-cycle based on rtas-set-sensor-state calls from the guest. As part of qemu-side hotplug/unplug you use the attach/detach methods of the DRC to associate DT bits and callbacks for things like device cleanup or rtas calls to fetch a DT node from the device associated with a particular DRC. I still need to fix endian issues, and am realizing the dr connectors and DT bits for PHBs are not actually a prereq for PCI hotplug, so I may be pulling that out to a separate series specific to enabling PHB hotplug (namely for VFIO hotplug). I realize your CPU/MEM sort of depend on the top-level PHB device tree code so I'm not sure how best to deal with that. Worse case we'd roll the initial code into your series and base a follow-up series on that of that instead. Thanks Michael for pointing me to your git tree. I started rebasing my patchset on top of yours and realized that the generic DT setup code from the below commits of your branch are needed for CPU and memory hotplug too. They all apply in the order I have listed below. 71b32999c4eb spapr_drc: initial implementation 255c50200848 spapr: populate DRC entries for root dt node (don't need code that adds PHB DT entries) 408206fc627e3 spapr_rtas: add set-indicator RTAS interface da7a232fa6a44 spapr_rtas: add get-sensor-state RTAS interface 1c575d5b29688 spapr_rtas: add ibm,configure-connector RTAS interface 0c5d72833666c spapr_events: re-use EPOW event infrastructure for hotplug events 82ee5a9c88155 spapr_events: event-scan RTAS interface If you can make the above set an independent patchset, it will become easy to maintain and post CPU and memory hotplug patchsets. 
I am facing some endian issues in your patchset and I will send fixes for those separately. Regards, Bharata.
Re: [Qemu-devel] [PATCH] vhost: Fix vhostfd leak in error branch
On Fri, Nov 28, 2014 at 5:26 PM, arei.gong...@huawei.com wrote: From: Gonglei arei.gong...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com --- hw/scsi/vhost-scsi.c | 1 + hw/virtio/vhost.c| 2 ++ 2 files changed, 3 insertions(+) diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c index 308b393..dcb2bc5 100644 --- a/hw/scsi/vhost-scsi.c +++ b/hw/scsi/vhost-scsi.c @@ -233,6 +233,7 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp) vhost_dummy_handle_output); if (err != NULL) { error_propagate(errp, err); +close(vhostfd); return; } diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 5d7c40a..5a12861 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -817,10 +817,12 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque, int i, r; if (vhost_set_backend_type(hdev, backend_type) 0) { +close((uintptr_t)opaque); return -1; } if (hdev-vhost_ops-vhost_backend_init(hdev, opaque) 0) { +close((uintptr_t)opaque); return -errno; } Patch looks fine. I wonder whether setting errno and goto fail would be better here? This would let vhost_backend_cleanup() do the cleanup, e.g. closing the fd or purging the queue (for vhost-user).
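The "goto fail" alternative suggested above can be sketched as a single exit path that owns the descriptor. This is only an illustration of the pattern, not the vhost code: backend_step, dev_init and fd_is_open are hypothetical stand-ins, and whether vhost_backend_cleanup() actually closes the fd is not something this sketch asserts.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for a backend init step that may fail. */
static int backend_step(int fd, int should_fail)
{
    (void)fd;
    if (should_fail) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}

/*
 * Takes ownership of fd. On success returns 0 and leaves fd open;
 * on any failure closes fd once, in one place, and returns -errno.
 */
static int dev_init(int fd, int fail_first, int fail_second)
{
    int r;

    assert(fd >= 0);

    if (backend_step(fd, fail_first) < 0) {
        r = -errno;   /* save errno before cleanup can clobber it */
        goto fail;
    }
    if (backend_step(fd, fail_second) < 0) {
        r = -errno;
        goto fail;
    }
    return 0;

fail:
    close(fd);
    return r;
}

/* Helper for checking the outcome: is fd still a valid descriptor? */
static int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}
```

Compared with repeating close() in every error branch, the single label keeps later additions to the init sequence from reintroducing the leak.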
Re: [Qemu-devel] [PATCH 0/7] coroutine: optimizations
On Fri, Nov 28, 2014 at 10:12 PM, Paolo Bonzini pbonz...@redhat.com wrote: As discussed in the other thread, this brings speedups from dropping the coroutine mutex (which serializes multiple iothreads, too) and using ELF thread-local storage. The speedup in perf/cost is about 30% (190-145). Windows port tested with tests/test-coroutine.exe under Wine. The data is very nice, and in my laptop, 'perf cost' can be decreased from 244ns to 174ns. BTW, the cost by using coroutine to run function isn't only from these helpers(*_yield, *_enter, *_create, and perf-cost just measures this part of cost), but also some implicit/invisible part. I have some test cases which can show the problem. If someone is interested, I can post them in list. Thanks, Ming Lei
Re: [Qemu-devel] [PATCH] vhost: Fix vhostfd leak in error branch
On 2014/12/1 13:03, Jason Wang wrote: On Fri, Nov 28, 2014 at 5:26 PM, arei.gong...@huawei.com wrote: From: Gonglei arei.gong...@huawei.com Signed-off-by: Gonglei arei.gong...@huawei.com --- hw/scsi/vhost-scsi.c | 1 + hw/virtio/vhost.c| 2 ++ 2 files changed, 3 insertions(+) diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c index 308b393..dcb2bc5 100644 --- a/hw/scsi/vhost-scsi.c +++ b/hw/scsi/vhost-scsi.c @@ -233,6 +233,7 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp) vhost_dummy_handle_output); if (err != NULL) { error_propagate(errp, err); +close(vhostfd); return; } diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 5d7c40a..5a12861 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -817,10 +817,12 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque, int i, r; if (vhost_set_backend_type(hdev, backend_type) 0) { +close((uintptr_t)opaque); return -1; } if (hdev-vhost_ops-vhost_backend_init(hdev, opaque) 0) { +close((uintptr_t)opaque); return -errno; } Patch looks fine. I wonder whether setting errno and goto fail would be better here? This would let vhost_backend_cleanup() do the cleanup, e.g. closing the fd or purging the queue (for vhost-user). Hi, Jason. Actually, at present vhost_backend_init() cannot fail for either vhost-user or vhost-backend-type-kernel. Besides, vhost-user's vhost_backend_cleanup() just sets dev->opaque to 0; it doesn't purge queues. Regards, -Gonglei
Re: [Qemu-devel] [PATCH 0/7] coroutine: optimizations
On 01.12.2014 06:55, Ming Lei wrote: On Fri, Nov 28, 2014 at 10:12 PM, Paolo Bonzini pbonz...@redhat.com wrote: As discussed in the other thread, this brings speedups from dropping the coroutine mutex (which serializes multiple iothreads, too) and using ELF thread-local storage. The speedup in perf/cost is about 30% (190-145). Windows port tested with tests/test-coroutine.exe under Wine. The data is very nice, and in my laptop, 'perf cost' can be decreased from 244ns to 174ns. BTW, the cost by using coroutine to run function isn't only from these helpers(*_yield, *_enter, *_create, and perf-cost just measures this part of cost), but also some implicit/invisible part. I have some test cases which can show the problem. If someone is interested, I can post them in list. Of course, maybe the problem can be solved or impaired. Peter
[Qemu-devel] [Bug 1363641] Re: Build of v2.1.0 fails on armv7l due to undeclared __NR_select
Hi Eduardo - your above commit doesn't update the version in the error message (a few lines below, still says = 2.1.0). Sorry if this isn't the right place to comment on your patch, but it would be nice to fix (just spent a while trying to figure out why having 2.1.0 installed wasn't satisfying the configure check). Also, I think the way the if statement is constructed it will not properly apply the 2.1.1 version check for i386 (only for x86_64). -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1363641 Title: Build of v2.1.0 fails on armv7l due to undeclared __NR_select Status in QEMU: New Bug description: After `make clean` and `git clean -x -f -d` `git checkout v2.1.0 configure --prefix=/home/user/prefix-qemu-2.1.0 make` fails due to missing declarations CCqemu-seccomp.o qemu-seccomp.c:28:1: error: '__NR_select' undeclared here (not in a function) qemu-seccomp.c:36:1: error: '__NR_mmap' undeclared here (not in a function) qemu-seccomp.c:57:1: error: '__NR_getrlimit' undeclared here (not in a function) qemu-seccomp.c:96:1: error: '__NR_time' undeclared here (not in a function) GEN qmp-marshal.c qemu-seccomp.c:186:1: error: '__NR_alarm' undeclared here (not in a function) make: *** [qemu-seccomp.o] Error 1 Same errors for master 8b3030114a449e66c68450acaac4b66f26d91416. `configure`should not succeed for a failing build if the error occurs due to missing dependencies, if it's a bug it needs to be fixed. `config.log` for v2.1.0 and 8b303011... attached. The content is mostly compiler output which I think is unusual for `config.log`, but see for yourself. I'm building on a debian 7.6 chroot on Synology DSM 5.0. `uname -a` says `Linux diskstatation 3.2.40 #4493 SMP Thu Aug 21 21:43:02 CST 2014 armv7l GNU/Linux`. After installing some of the missing dependencies, i.e. 
`apt-get install liblzo2-dev libbsd-dev syslinux-common libhwloc-dev librdmacm-dev libsnappy-dev libibverbs-dev valgrind linux-headers-3.2.0-4-common`

I'm getting

  CC    migration-rdma.o
migration-rdma.c: In function 'ram_chunk_start':
migration-rdma.c:523:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
migration-rdma.c: In function '__qemu_rdma_add_block':
migration-rdma.c:556:49: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
migration-rdma.c:557:49: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
migration-rdma.c: In function '__qemu_rdma_delete_block':
migration-rdma.c:664:45: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
migration-rdma.c:699:49: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
migration-rdma.c: In function 'qemu_rdma_search_ram_block':
migration-rdma.c:1113:49: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
migration-rdma.c: In function 'qemu_rdma_register_and_get_keys':
migration-rdma.c:1176:50: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
migration-rdma.c:1177:29: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
migration-rdma.c:1177:51: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
migration-rdma.c:1178:29: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
migration-rdma.c: In function 'qemu_rdma_post_send_control':
migration-rdma.c:1562:36: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
migration-rdma.c: In function 'qemu_rdma_post_recv_control':
migration-rdma.c:1616:37: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
migration-rdma.c: In function 'qemu_rdma_write_one':
migration-rdma.c:1864:16: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
migration-rdma.c:1868:53: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
migration-rdma.c:1922:52: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
migration-rdma.c:1923:50: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
migration-rdma.c:1977:49: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
migration-rdma.c:1998:49: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
migration-rdma.c:2010:58: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
migration-rdma.c: In function 'qemu_rdma_registration_handle':
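GCC emits these two diagnostics when a 64-bit integer is cast directly to a pointer, or a pointer directly to a 64-bit integer, on a host whose pointers are 32 bits wide. A minimal sketch of the common remedy, which is to route the conversion through uintptr_t, might look like the following. The helper names are hypothetical and are not taken from migration-rdma.c; this is an illustration of the technique, not the fix that was applied to QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not QEMU code. */

/* Convert a 64-bit address value to a host pointer. Going through
 * uintptr_t makes the possible narrowing on 32-bit hosts explicit,
 * which avoids -Wint-to-pointer-cast. */
static inline void *u64_to_ptr(uint64_t addr)
{
    /* The value must fit in a host pointer, otherwise it is truncated. */
    assert(addr <= UINTPTR_MAX);
    return (void *)(uintptr_t)addr;
}

/* Convert a host pointer to a 64-bit value. Widening from uintptr_t
 * is lossless and avoids -Wpointer-to-int-cast. */
static inline uint64_t ptr_to_u64(const void *ptr)
{
    return (uint64_t)(uintptr_t)ptr;
}
```

The assertion is a design choice for the sketch: it turns a silent truncation on a 32-bit host into a visible failure.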
Re: [Qemu-devel] [PATCH 0/7] coroutine: optimizations
On Mon, 01 Dec 2014 08:05:17 +0100 Peter Lieven p...@kamp.de wrote:
On 01.12.2014 06:55, Ming Lei wrote:
On Fri, Nov 28, 2014 at 10:12 PM, Paolo Bonzini pbonz...@redhat.com wrote:

As discussed in the other thread, this brings speedups from dropping the coroutine mutex (which serializes multiple iothreads, too) and using ELF thread-local storage. The speedup in perf/cost is about 30% (190 -> 145). Windows port tested with tests/test-coroutine.exe under Wine.

The data is very nice, and on my laptop, 'perf cost' decreases from 244ns to 174ns.

BTW, the cost of using a coroutine to run a function doesn't come only from these helpers (*_yield, *_enter, *_create; perf-cost measures just this part of the cost), but also from an implicit/invisible part. I have some test cases which can show the problem. If someone is interested, I can post them to the list. Of course, maybe the problem can be solved or mitigated.

OK, please try the patch below:

From 917d5cc0a273f9825b10abd52152c54e08c81ef8 Mon Sep 17 00:00:00 2001
From: Ming Lei ming@canonical.com
Date: Mon, 1 Dec 2014 11:11:23 +0800
Subject: [PATCH] test-coroutine: introduce perf-cost-with-load

The perf/cost test case only covers the explicit cost of using coroutines. This patch provides an open/close file test case, which shows that there is also some implicit or invisible cost beyond the cost measured by /perf/cost.
In my environment, the test results after applying this patch and running perf/cost and perf/cost-with-load are as follows:

{*LOG(start):{/perf/cost}:LOG*}
/perf/cost: {*LOG(message):{Run operation 40000000 iterations 7.539413 s, 5305K operations/s, 188ns per coroutine}:LOG*}
OK
{*LOG(stop):(0;0;7.539497):LOG*}
{*LOG(start):{/perf/cost-with-load}:LOG*}
/perf/cost-with-load: {*LOG(message):{Run operation 1000000 iterations 2.648014 s, 377K operations/s, 2648ns per operation without using coroutine}:LOG*}
{*LOG(message):{Run operation 1000000 iterations 2.919133 s, 342K operations/s, 2919ns per operation, 271ns(cost introduced by coroutine) per operation with using coroutine}:LOG*}
OK
{*LOG(stop):(0;0;5.567333):LOG*}

From the above data, we can see that running one coroutine introduces 188ns, but in /perf/cost-with-load the actual cost introduced is 271ns; the extra 83ns is invisible and implicit.

Similar results can be found in the following test cases too:
- read from /dev/nullb0 opened with O_DIRECT (a sort of aio read simulation; needs a 3.13+ kernel for /dev/nullbX support via 'modprobe null_blk'); this case shows +150ns extra cost
- statvfs() syscall: there is ~30ns extra cost for running one statvfs() in a coroutine
---
 tests/test-coroutine.c | 67
 1 file changed, 67 insertions(+)

diff --git a/tests/test-coroutine.c b/tests/test-coroutine.c
index 27d1b6f..7323a91 100644
--- a/tests/test-coroutine.c
+++ b/tests/test-coroutine.c
@@ -311,6 +311,72 @@ static void perf_baseline(void)
         maxcycles, duration);
 }
 
+static void perf_cost_load_worker(void *opaque)
+{
+    int fd;
+
+    fd = open("/proc/self/exe", O_RDONLY);
+    assert(fd >= 0);
+    close(fd);
+}
+
+static __attribute__((noinline)) void perf_cost_load_func(void *opaque)
+{
+    perf_cost_load_worker(opaque);
+    qemu_coroutine_yield();
+}
+
+static double perf_cost_load(unsigned long maxcycles, bool use_co)
+{
+    unsigned long i = 0;
+    double duration;
+
+    g_test_timer_start();
+    if (use_co) {
+        Coroutine *co;
+        while (i++ < maxcycles) {
+            co = qemu_coroutine_create(perf_cost_load_func);
+            qemu_coroutine_enter(co, &i);
+            qemu_coroutine_enter(co, NULL);
+        }
+    } else {
+        while (i++ < maxcycles) {
+            perf_cost_load_worker(&i);
+        }
+    }
+    duration = g_test_timer_elapsed();
+
+    return duration;
+}
+
+static void perf_cost_with_load(void)
+{
+    const unsigned long maxcycles = 1000000;
+    double duration;
+    unsigned long ops;
+    unsigned long cost_co, cost;
+
+    duration = perf_cost_load(maxcycles, false);
+    ops = (long)(maxcycles / (duration * 1000));
+    cost = (unsigned long)(1000000000.0 * duration / maxcycles);
+    g_test_message("Run operation %lu iterations %f s, %luK operations/s, "
+                   "%luns per operation without using coroutine",
+                   maxcycles,
+                   duration, ops,
+                   cost);
+
+    duration = perf_cost_load(maxcycles, true);
+    ops = (long)(maxcycles / (duration * 1000));
+    cost_co = (unsigned long)(1000000000.0 * duration / maxcycles);
+    g_test_message("Run operation %lu iterations %f s, %luK operations/s, "
+                   "%luns per operation, "
+                   "%luns(cost introduced by coroutine) per operation "
+                   "with
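The arithmetic behind the figures quoted in this message can be checked with a short sketch. It assumes 1000000 iterations for /perf/cost-with-load and a nanosecond scale factor of 1000000000.0, which is what the logged durations and per-operation costs imply; the function names are hypothetical and are not part of tests/test-coroutine.c.

```c
#include <assert.h>

/* Hypothetical helpers that redo the arithmetic from the log above. */

/* Nanoseconds per operation for a run of 'maxcycles' operations that
 * took 'duration_s' seconds (truncated, as in the logged figures). */
static inline unsigned long ns_per_op(double duration_s,
                                      unsigned long maxcycles)
{
    return (unsigned long)(1000000000.0 * duration_s / maxcycles);
}

/* Thousands of operations per second for the same run. */
static inline unsigned long kops_per_sec(double duration_s,
                                         unsigned long maxcycles)
{
    return (unsigned long)(maxcycles / (duration_s * 1000));
}

/* Cost per operation that /perf/cost does not account for: the cost
 * added by running under a coroutine, minus the explicit cost that
 * /perf/cost reports ('explicit_ns'). */
static inline long implicit_cost_ns(double plain_s, double with_co_s,
                                    unsigned long maxcycles,
                                    unsigned long explicit_ns)
{
    long added = (long)ns_per_op(with_co_s, maxcycles)
               - (long)ns_per_op(plain_s, maxcycles);
    return added - (long)explicit_ns;
}
```

With the logged durations, 2.648014 s without a coroutine and 2.919133 s with one, the added cost is 2919 - 2648 = 271ns per operation, and subtracting the 188ns reported by /perf/cost leaves the 83ns described above as implicit.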
Re: [Qemu-devel] [PATCH] vhost: Fix vhostfd leak in error branch
On Mon, Dec 1, 2014 at 2:27 PM, Gonglei arei.gong...@huawei.com wrote:
On 2014/12/1 13:03, Jason Wang wrote:
On Fri, Nov 28, 2014 at 5:26 PM, arei.gong...@huawei.com wrote:

From: Gonglei arei.gong...@huawei.com

Signed-off-by: Gonglei arei.gong...@huawei.com
---
 hw/scsi/vhost-scsi.c | 1 +
 hw/virtio/vhost.c    | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 308b393..dcb2bc5 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -233,6 +233,7 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp)
                         vhost_dummy_handle_output);
     if (err != NULL) {
         error_propagate(errp, err);
+        close(vhostfd);
         return;
     }
 
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 5d7c40a..5a12861 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -817,10 +817,12 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
     int i, r;
 
     if (vhost_set_backend_type(hdev, backend_type) < 0) {
+        close((uintptr_t)opaque);
         return -1;
     }
 
     if (hdev->vhost_ops->vhost_backend_init(hdev, opaque) < 0) {
+        close((uintptr_t)opaque);
         return -errno;
     }

Patch looks fine. I wonder whether setting errno and goto fail would be better here? This would let vhost_backend_cleanup() do the cleanup, e.g. closing the fd or purging the queue (for vhost-user).

Hi Jason,

Actually, vhost_backend_init() cannot fail for either vhost-user or vhost-backend-type-kernel at present. Besides, vhost-user's vhost_backend_cleanup() just sets dev->opaque to 0; it doesn't purge queues.

I see, thanks for explaining.

Reviewed-by: Jason Wang jasow...@redhat.com
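The "set errno and goto fail" alternative raised in the review can be sketched in isolation as follows. This is a generic illustration of the single-exit cleanup pattern, assuming a POSIX host; struct demo_dev, demo_backend_init() and demo_dev_init() are hypothetical stand-ins and do not reflect the actual vhost API or the patch that was merged.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for a device; not the vhost structures. */
struct demo_dev {
    int fd;     /* backend file descriptor, owned once init starts */
};

/* Pretend backend setup step that fails when 'should_fail' is set. */
static inline int demo_backend_init(struct demo_dev *dev, int should_fail)
{
    (void)dev;
    if (should_fail) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}

/* Takes ownership of 'fd'. Every failure jumps to one label where the
 * descriptor is closed, so no error branch can leak it.
 * Returns 0 on success or a negative errno value. */
static inline int demo_dev_init(struct demo_dev *dev, int fd,
                                int should_fail)
{
    int r;

    dev->fd = fd;
    if (demo_backend_init(dev, should_fail) < 0) {
        r = -errno;     /* save errno before cleanup can change it */
        goto fail;
    }
    return 0;

fail:
    close(dev->fd);
    dev->fd = -1;
    return r;
}
```

Compared with adding a close() call to each error branch, the single label keeps the cleanup in one place, at the cost of the caller having to know that the function owns the descriptor on failure.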