Re: [Qemu-devel] [PATCH v2 1/8] kvm: Set cpu_single_env only once
On 02/11/2012 03:12 PM, Andreas Färber wrote:
> Yes and no. They can have any target-specific pointer they want, just
> as before. But no global first_cpu / cpu_single_env pointer - that's
> replaced by CPU pointers, through which members of derived classes can
> be accessed (which did not work for CPUState due to CPU_COMMON members
> being at target-specific offset in the middle).

Hmm, now I'm not even sure what I want that Andreas referred to. :)

I definitely would like CPUState pointers to be changed into link properties, but that's not related to what Jan is doing here with cpu_single_env.

Each LAPIC refers to a CPU, and that would become a link property indeed. But here we're using cpu_single_env to find out which LAPIC is being read. It's the other direction.

Relying on thread-local cpu_single_env means that you restrict LAPIC memory reads to run in VCPU thread context, and this makes sense anyway. The only case of MMIO running in iothread context is Xen, but Xen always keeps the LAPIC in the hypervisor.

Also, I think that having a view of CPUs in QOM is laudable, but I don't understand why that means you need to remove first_cpu / cpu_single_env.

Finally, CPU_COMMON members may be referenced from TCG-generated code, how do you plan to move them and still keep the TLBs at small offsets within CPUState?

Perhaps we need a drawing of the situation before and after the QOMization of CPUs.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
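The "other direction" lookup discussed in this message can be illustrated with a minimal sketch. It is not QEMU code: CPUSketch, vcpu_thread_enter() and lapic_id_for_access() are hypothetical names, and the thread-local current_cpu only stands in for a thread-local cpu_single_env. Each VCPU thread would set the pointer once; an MMIO handler then uses it to learn which LAPIC is being accessed, and can tell when it is called outside VCPU context.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-CPU state structure. */
typedef struct CPUSketch {
    int apic_id;
} CPUSketch;

/* Plays the role of a thread-local cpu_single_env: one pointer per thread. */
static _Thread_local CPUSketch *current_cpu;

/* Assumed to be called once at the start of each VCPU thread. */
static void vcpu_thread_enter(CPUSketch *cpu)
{
    current_cpu = cpu;
}

/*
 * Returns the APIC id of the CPU performing the access, or -1 when the
 * caller is not running in VCPU thread context (e.g. an I/O thread).
 */
static int lapic_id_for_access(void)
{
    return current_cpu ? current_cpu->apic_id : -1;
}
```

A thread that never calls vcpu_thread_enter() sees a NULL pointer, which is how the sketch models the restriction of LAPIC reads to VCPU thread context.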
[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane
https://bugzilla.kernel.org/show_bug.cgi?id=42755

--- Comment #21 from Gleb <g...@redhat.com> 2012-02-13 08:53:28 ---
Can you please compile trace-cmd from its git [1] (do "make all_cmd install_cmd"; the install part is important) and try getting a trace with it? If this does not work, I will guide you through taking a trace using debugfs directly.

[1] git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git
[PATCHv2-RFC 1/2] shpc: standard hot plug controller
This adds support for SHPC interface, as defined by PCI Standard
Hot-Plug Controller and Subsystem Specification, Rev 1.0
http://www.pcisig.com/specifications/conventional/pci_hot_plug/SHPC_10

Only SHPC integrated with a PCI-to-PCI bridge is supported,
SHPC integrated with a host bridge would need more work.

All main SHPC features are supported:
- MRL sensor
- Attention button
- Attention indicator
- Power indicator

Wake on hotplug and serr generation are stubbed out but unused
as we don't have interfaces to generate these events ATM.

One issue that isn't completely resolved is that qemu currently
expects an eject interface, which SHPC does not provide: it merely
removes the power to device and it's up to the user to remove the device
from slot. This patch works around that by ejecting the device
when power is removed and power LED goes off.

TODO:
- migration support
- fix dependency on pci_internals.h

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 Makefile.objs |    1 +
 hw/pci.h      |    6 +
 hw/shpc.c     |  646 +
 hw/shpc.h     |   40
 qemu-common.h |    1 +
 5 files changed, 694 insertions(+), 0 deletions(-)
 create mode 100644 hw/shpc.c
 create mode 100644 hw/shpc.h

diff --git a/Makefile.objs b/Makefile.objs
index 391e524..4546477 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -195,6 +195,7 @@ hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
 hw-obj-y += fw_cfg.o
 hw-obj-$(CONFIG_PCI) += pci.o pci_bridge.o
 hw-obj-$(CONFIG_PCI) += msix.o msi.o
+hw-obj-$(CONFIG_PCI) += shpc.o
 hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
 hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o
 hw-obj-y += watchdog.o
diff --git a/hw/pci.h b/hw/pci.h
index 33b0b18..756577e 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -125,6 +125,9 @@ enum {
     /* command register SERR bit enabled */
 #define QEMU_PCI_CAP_SERR_BITNR 4
     QEMU_PCI_CAP_SERR = (1 << QEMU_PCI_CAP_SERR_BITNR),
+    /* Standard hot plug controller. */
+#define QEMU_PCI_SHPC_BITNR 5
+    QEMU_PCI_CAP_SHPC = (1 << QEMU_PCI_SHPC_BITNR),
 };
 
 #define TYPE_PCI_DEVICE "pci-device"
@@ -229,6 +232,9 @@ struct PCIDevice {
     /* PCI Express */
     PCIExpressDevice exp;
 
+    /* SHPC */
+    SHPCDevice *shpc;
+
     /* Location of option rom */
     char *romfile;
     bool has_rom;
diff --git a/hw/shpc.c b/hw/shpc.c
new file mode 100644
index 000..4baec29
--- /dev/null
+++ b/hw/shpc.c
@@ -0,0 +1,646 @@
+#include <strings.h>
+#include <stdint.h>
+#include "range.h"
+#include "shpc.h"
+#include "pci.h"
+#include "pci_internals.h"
+
+/* TODO: model power only and disabled slot states. */
+/* TODO: handle SERR and wakeups */
+/* TODO: consider enabling 66MHz support */
+
+/* TODO: remove fully only on state DISABLED and LED off.
+ * track state to properly record this. */
+
+/* SHPC Working Register Set */
+#define SHPC_BASE_OFFSET 0x00 /* 4 bytes */
+#define SHPC_SLOTS_33 0x04 /* 4 bytes. Also encodes PCI-X slots. */
+#define SHPC_SLOTS_66 0x08 /* 4 bytes. */
+#define SHPC_NSLOTS 0x0C /* 1 byte */
+#define SHPC_FIRST_DEV 0x0D /* 1 byte */
+#define SHPC_PHYS_SLOT 0x0E /* 2 byte */
+#define SHPC_PHYS_NUM_MAX 0x7ff
+#define SHPC_PHYS_NUM_UP 0x1000
+#define SHPC_PHYS_MRL 0x4000
+#define SHPC_PHYS_BUTTON 0x8000
+#define SHPC_SEC_BUS 0x10 /* 2 bytes */
+#define SHPC_SEC_BUS_33 0x0
+#define SHPC_SEC_BUS_66 0x1 /* Unused */
+#define SHPC_SEC_BUS_MASK 0x7
+#define SHPC_MSI_CTL 0x12 /* 1 byte */
+#define SHPC_PROG_IFC 0x13 /* 1 byte */
+#define SHPC_PROG_IFC_1_0 0x1
+#define SHPC_CMD_CODE 0x14 /* 1 byte */
+#define SHPC_CMD_TRGT 0x15 /* 1 byte */
+#define SHPC_CMD_TRGT_MIN 0x1
+#define SHPC_CMD_TRGT_MAX 0x1f
+#define SHPC_CMD_STATUS 0x16 /* 2 bytes */
+#define SHPC_CMD_STATUS_BUSY 0x1
+#define SHPC_CMD_STATUS_MRL_OPEN 0x2
+#define SHPC_CMD_STATUS_INVALID_CMD 0x4
+#define SHPC_CMD_STATUS_INVALID_MODE 0x8
+#define SHPC_INT_LOCATOR 0x18 /* 4 bytes */
+#define SHPC_INT_COMMAND 0x1
+#define SHPC_SERR_LOCATOR 0x1C /* 4 bytes */
+#define SHPC_SERR_INT 0x20 /* 4 bytes */
+#define SHPC_INT_DIS 0x1
+#define SHPC_SERR_DIS 0x2
+#define SHPC_CMD_INT_DIS 0x4
+#define SHPC_ARB_SERR_DIS 0x8
+#define SHPC_CMD_DETECTED 0x1
+#define SHPC_ARB_DETECTED 0x2
+ /* 4 bytes * slot # (start from 0) */
+#define SHPC_SLOT_REG(s) (0x24 + (s) * 4)
+ /* 2 bytes */
+#define SHPC_SLOT_STATUS(s) (0x0 + SHPC_SLOT_REG(s))
+
+/* Same slot state masks are used for command and status registers */
+#define SHPC_SLOT_STATE_MASK 0x03
+#define SHPC_SLOT_STATE_SHIFT \
+    (ffs(SHPC_SLOT_STATE_MASK) - 1)
+
+#define SHPC_STATE_NO 0x0
+#define SHPC_STATE_PWRONLY 0x1
+#define SHPC_STATE_ENABLED 0x2
+#define SHPC_STATE_DISABLED 0x3
+
+#define SHPC_SLOT_PWR_LED_MASK 0xC
+#define SHPC_SLOT_PWR_LED_SHIFT \
+    (ffs(SHPC_SLOT_PWR_LED_MASK) - 1)
+#define SHPC_SLOT_ATTN_LED_MASK 0x30
+#define SHPC_SLOT_ATTN_LED_SHIFT \
+
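The slot register macros in this patch derive each field's shift from its mask with ffs(). The following is a small sketch of how such mask/shift pairs could be used to read and write fields of a slot status value. The mask values and the 0x24 + slot * 4 offset are copied from the patch; the helper functions (shpc_field_shift, shpc_get_field, shpc_set_field, shpc_slot_reg) are hypothetical and are not claimed to match the accessors in the full hw/shpc.c, which is not shown here.

```c
#include <stdint.h>
#include <strings.h>   /* ffs() */

/* Mask and state values copied from the patch. */
#define SHPC_SLOT_STATE_MASK    0x03
#define SHPC_SLOT_PWR_LED_MASK  0x0C
#define SHPC_SLOT_ATTN_LED_MASK 0x30
#define SHPC_STATE_ENABLED      0x2

/* Hypothetical helper: bit index of the lowest set bit in a field mask. */
static inline unsigned shpc_field_shift(unsigned mask)
{
    return (unsigned)ffs((int)mask) - 1;
}

/* Hypothetical helper: extract a field from a 16-bit slot register value. */
static inline uint16_t shpc_get_field(uint16_t reg, unsigned mask)
{
    return (uint16_t)((reg & mask) >> shpc_field_shift(mask));
}

/* Hypothetical helper: return reg with one field replaced by value. */
static inline uint16_t shpc_set_field(uint16_t reg, unsigned mask,
                                      unsigned value)
{
    return (uint16_t)((reg & ~mask) |
                      ((value << shpc_field_shift(mask)) & mask));
}

/* Byte offset of a slot's 4-byte register; slots are numbered from 0. */
static inline unsigned shpc_slot_reg(unsigned slot)
{
    return 0x24 + slot * 4;
}
```

Deriving the shift from the mask keeps the two from drifting apart when a field definition changes, at the cost of relying on the POSIX ffs() function.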
[PATCHv2-RFC 2/2] pci: add standard bridge device
This adds support for a standard pci to pci bridge,
enabling support for more than 32 PCI devices in the system.
Device hotplug is supported by means of SHPC controller.
For guests with an SHPC driver, this allows robust hotplug
and even hotplug of nested bridges, up to 31 devices
per bridge.

TODO:
- chassis capability support
- migration support
- remove dependency on pci_internals.h

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 Makefile.objs       |    2 +-
 hw/pci_bridge_dev.c |  136 +++
 2 files changed, 137 insertions(+), 1 deletions(-)
 create mode 100644 hw/pci_bridge_dev.c

diff --git a/Makefile.objs b/Makefile.objs
index 4546477..e89112c 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -193,7 +193,7 @@ hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
 hw-obj-y += usb-libhw.o
 hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
 hw-obj-y += fw_cfg.o
-hw-obj-$(CONFIG_PCI) += pci.o pci_bridge.o
+hw-obj-$(CONFIG_PCI) += pci.o pci_bridge.o pci_bridge_dev.o
 hw-obj-$(CONFIG_PCI) += msix.o msi.o
 hw-obj-$(CONFIG_PCI) += shpc.o
 hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
diff --git a/hw/pci_bridge_dev.c b/hw/pci_bridge_dev.c
new file mode 100644
index 000..f48cd2d
--- /dev/null
+++ b/hw/pci_bridge_dev.c
@@ -0,0 +1,136 @@
+/*
+ * Standard PCI Bridge Device
+ *
+ * Copyright (c) 2011 Red Hat Inc. Author: Michael S. Tsirkin <m...@redhat.com>
+ *
+ * http://www.pcisig.com/specifications/conventional/pci_to_pci_bridge_architecture/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "pci_bridge.h"
+#include "pci_ids.h"
+#include "shpc.h"
+#include "memory.h"
+#include "pci_internals.h"
+
+#define REDHAT_PCI_VENDOR_ID 0x1b36
+#define PCI_BRIDGE_DEV_VENDOR_ID REDHAT_PCI_VENDOR_ID
+#define PCI_BRIDGE_DEV_DEVICE_ID 0x1
+
+struct PCIBridgeDev {
+    PCIBridge bridge;
+    MemoryRegion bar;
+};
+typedef struct PCIBridgeDev PCIBridgeDev;
+
+/* Mapping mandated by PCI-to-PCI Bridge architecture specification,
+ * revision 1.2 */
+/* Table 9-1: Interrupt Binding for Devices Behind a Bridge */
+static int pci_bridge_dev_map_irq_fn(PCIDevice *dev, int irq_num)
+{
+    return (irq_num + PCI_SLOT(dev->devfn)) % PCI_NUM_PINS;
+}
+
+static int pci_bridge_dev_initfn(PCIDevice *dev)
+{
+    PCIBridge *br = DO_UPCAST(PCIBridge, dev, dev);
+    PCIBridgeDev *bridge_dev = DO_UPCAST(PCIBridgeDev, bridge, br);
+    int err;
+    br->map_irq = pci_bridge_dev_map_irq_fn;
+    /* If we don't specify the name, the bus will be addressed as id.0, where
+     * id is the parent id. But it seems more natural to address the bus using
+     * the parent device name. */
+    if (dev->qdev.id && *dev->qdev.id) {
+        br->bus_name = dev->qdev.id;
+    }
+    err = pci_bridge_initfn(dev);
+    if (err) {
+        goto bridge_error;
+    }
+    memory_region_init(&bridge_dev->bar, "shpc-bar", shpc_bar_size(dev));
+    err = shpc_init(dev, &br->sec_bus, &bridge_dev->bar, 0);
+    if (err) {
+        goto error;
+    }
+    /* TODO: spec recommends using 64 bit prefetcheable BAR.
+     * Check whether that works well. */
+    pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &bridge_dev->bar);
+    dev->config[PCI_INTERRUPT_PIN] = 0x1;
+    return 0;
+error:
+    memory_region_destroy(&bridge_dev->bar);
+bridge_error:
+    return err;
+}
+
+static int pci_bridge_dev_exitfn(PCIDevice *dev)
+{
+    PCIBridge *br = DO_UPCAST(PCIBridge, dev, dev);
+    PCIBridgeDev *bridge_dev = DO_UPCAST(PCIBridgeDev, bridge, br);
+    int ret;
+    shpc_cleanup(dev);
+    memory_region_destroy(&bridge_dev->bar);
+    ret = pci_bridge_exitfn(dev);
+    assert(!ret);
+    return 0;
+}
+
+static void pci_bridge_dev_write_config(PCIDevice *d,
+                                        uint32_t address, uint32_t val, int len)
+{
+    pci_bridge_write_config(d, address, val, len);
+    shpc_cap_write_config(d, address, val, len);
+}
+
+static void qdev_pci_bridge_dev_reset(DeviceState *qdev)
+{
+    PCIDevice *dev = DO_UPCAST(PCIDevice, qdev, qdev);
+    pci_bridge_reset(qdev);
+    shpc_reset(dev);
+}
+
+static void pci_bridge_dev_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+    k->init = pci_bridge_dev_initfn;
+    k->exit = pci_bridge_dev_exitfn;
+    k->config_write = pci_bridge_dev_write_config;
+    k->vendor_id =
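The interrupt mapping in pci_bridge_dev_map_irq_fn() can be tried outside QEMU with a standalone sketch. The arithmetic is the one from the patch; PCI_SLOT and PCI_NUM_PINS are redefined locally so the snippet compiles on its own, and bridge_map_irq() is a hypothetical name that takes a plain devfn instead of a PCIDevice pointer. Pins are numbered 0 to 3 for INTA to INTD.

```c
/* Local copies so the sketch is self-contained. */
#define PCI_NUM_PINS 4
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)

/*
 * Map an interrupt pin of a device behind the bridge to the pin seen on
 * the bridge's primary side: the pin number is rotated by the device's
 * slot number, modulo the four INTx pins.
 */
static int bridge_map_irq(int devfn, int irq_num)
{
    return (irq_num + PCI_SLOT(devfn)) % PCI_NUM_PINS;
}
```

For example, INTA (0) of a device in slot 1 maps to INTB (1) on the parent bus, while a device in slot 4 maps back to INTA, so interrupt load from densely populated bridges is spread over all four pins.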
[PATCHv2-RFC 0/2] RFC: standard pci bridge device
Here's a new version of the patch. It works for me. Deep nesting of bridges is supported. You need a small BIOS patch to support the OSHP method if you want hotplug to work. I will post this separately.

We'd need a full ACPI driver to make hotplug work for guests without an SHPC driver (e.g. Windows XP).

Management support will also be needed.

One small wrinkle is that the pci_addr property wants data in the format bus:device.function, which is broken as guests can change bus numbers. For testing I used the 'addr' property, which encodes slot*8+function#. We probably want to extend pci_addr in some way (e.g. :device.function ? Thoughts?).

The SHPC controller supports up to 31 devices (out of 32 slots), so slot 0 doesn't support hotplug. Non hot-pluggable devices behind the bridge don't work correctly (we'll try to unplug them), so don't do this. For now I just blocked adding devices in slot 0; in the future it might be possible to add a non-hotpluggable device there.

Example:
qemu-system-x86_64 -enable-kvm -m 1G -drive file=/home/mst/rhel6.qcow2 -netdev tap,id=foo,ifname=msttap0,script=/home/mst/ifup,downscript=no,vhost=on -device pci-bridge,id=bog -device virtio-net-pci,netdev=foo,bus=bog,addr=8

Hot-unplug currently causes qemu to crash; this happens without this patch too, so I'm not worried :)

New since v1: hotplug support

--
MST

Michael S. Tsirkin (2):
  shpc: standard hot plug controller
  pci: add standard bridge device

 Makefile.objs       |    3 +-
 hw/pci.h            |    6 +
 hw/pci_bridge_dev.c |  136 +++
 hw/shpc.c           |  646 +++
 hw/shpc.h           |   40
 qemu-common.h       |    1 +
 6 files changed, 831 insertions(+), 1 deletions(-)
 create mode 100644 hw/pci_bridge_dev.c
 create mode 100644 hw/shpc.c
 create mode 100644 hw/shpc.h

--
1.7.9.111.gf3fb0
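The slot*8+function# encoding of the 'addr' property mentioned in this cover letter can be sketched as follows. The helper names (pci_devfn, pci_slot, pci_func, slot_allowed_behind_shpc_bridge) are hypothetical and only illustrate the arithmetic; the last one merely restates the rule from the cover letter that slot 0 is blocked behind this bridge.

```c
/* Pack a slot (0-31) and function (0-7) into one value: slot * 8 + function. */
static unsigned pci_devfn(unsigned slot, unsigned func)
{
    return ((slot & 0x1f) << 3) | (func & 0x07);
}

/* Recover the slot number from an encoded address. */
static unsigned pci_slot(unsigned devfn)
{
    return (devfn >> 3) & 0x1f;
}

/* Recover the function number from an encoded address. */
static unsigned pci_func(unsigned devfn)
{
    return devfn & 0x07;
}

/* Hypothetical check mirroring the cover letter: slot 0 is not usable. */
static int slot_allowed_behind_shpc_bridge(unsigned devfn)
{
    return pci_slot(devfn) != 0;
}
```

With this encoding, the addr=8 used in the example command line corresponds to slot 1, function 0, which is the first slot where a device may be placed behind the bridge.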
Re: Pe: [PATCH v5 1/3] virtio-scsi: first version
On 02/12/2012 09:16 PM, James Bottomley wrote:
> Well, no-one's yet answered the question I had about why. virtio-scsi
> seems to be a basic duplication of virtio-blk except that it seems to
> fix some problems virtio-blk has. Namely queue parameter discovery,
> which virtio-blk doesn't seem to do.

The biggest differences between virtio-blk and virtio-scsi are:

1) how the feature set is defined. virtio-blk defines the feature set of the device through a shared spec between the guest and the host. The virtio-scsi spec does not define a feature set for the devices, only for the transport. Introducing new features in the guest does not need to be done specifically for virt, it can be done in generic code (sd.c). This results in a large feature set and at the same time a very stable spec.

Right now virtio-blk covers common usecases nicely. However, the Linux block layer _is_ growing support for new operations: discard is already there, write same is in the works, extended copy will also come in due time. Perhaps we'll add them to virtio-blk, perhaps not. If we do, we will have to modify the spec, the host implementation, and the guest drivers for each possible guest OS. virtio-scsi will support them transparently. Depending on your configuration, it might work without touching the host at all.

2) for disks with SCSI attachment, the native interface is exposed precisely as it is in the host.

I think we had some misunderstanding WRT queue parameter discovery. My concern with virtio-blk's SG_IO support is more general than that. It is that SG_IO accesses the host disk, not the guest disk. They will have the same data, but they are effectively different disks. For example they might have different queue parameters, hence the misunderstanding.

People are mostly using the SG_IO interface for sane purposes. For example you can ping the storage with INQUIRY commands to detect problems on the NAS or SAN. For these usecases the difference does not matter.

However, there _are_ worrisome usecases for SG_IO that people are looking at. For example installing vendor backup tools in their guests. These tools send vendor-specific commands to the disks. Nothing particularly insane about that, but we want them to do it using a saner interface than VIRTIO_BLK_T_SCSI_CMD.

On top of this, only virtio-scsi obviously will support devices such as tapes.

> There may also be a reason to cut the stack lower down. Error handling
> is most often cited for this, but no-one's satisfactorily explained why
> it's better to do error handling in the guest instead of the host.

It's not necessarily better. However, error handling in the host may simply not be there. This is for example the case of NFS-based storage with the "hard" option.

Paolo
Re: qemu-kvm-1.0 crashes with threaded vnc server?
On 11.02.2012 at 09:55, Corentin Chary wrote:
> On Thu, Feb 9, 2012 at 7:08 PM, Peter Lieven <p...@dlh.net> wrote:
>> Hi,
>>
>> is anyone aware if there are still problems when enabling the threaded
>> vnc server? I saw some VMs crashing when using a qemu-kvm build with
>> --enable-vnc-thread.
>>
>> qemu-kvm-1.0[22646]: segfault at 0 ip 7fec1ca7ea0b sp 7fec19d056d0 error 6 in libz.so.1.2.3.3[7fec1ca75000+16000]
>> qemu-kvm-1.0[26056]: segfault at 7f06d8d6e010 ip 7f06e0a30d71 sp 7f06df035748 error 6 in libc-2.11.1.so[7f06e09aa000+17a000]
>>
>> I had no time to debug further. It seems to happen shortly after
>> migrating, but that's uncertain. At least the segfault in libz seems to
>> hint at VNC, since I cannot imagine any other part of qemu-kvm using
>> libz except for the VNC server.
>>
>> Thanks,
>> Peter
>
> Hi Peter,
> I found two patches on my git tree that I sent long ago but that somehow
> got lost on the mailing list. I rebased the tree but did not have the
> time (yet) to test them.
> http://git.iksaif.net/?p=qemu.git;a=shortlog;h=refs/heads/wip
>
> Feel free to try them. If QEMU segfaults again, please send a full gdb
> backtrace / valgrind trace / way to reproduce :).
>
> Thanks,

Hi Corentin,

thanks for rebasing those patches. I remember seeing them the last time I noticed (about 1 year ago) that the threaded VNC is crashing. I'm on vacation this week, but I will test them next week and let you know if I can force a crash with them applied. If not, we should consider including them asap.

Peter

> --
> Corentin Chary
> http://xf.iksaif.net
[PATCH RFC] seabios: add OSHP method stub
To allow guests to load the native SHPC driver
for a bridge, we must declare an OSHP method
for the appropriate device which lets the OS
take control of the SHPC.
As we don't access SHPC at the moment, we
don't need to do anything - just report success.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>

---

diff --git a/src/ssdt-pcihp.dsl b/src/ssdt-pcihp.dsl
index 442e7a8..3f50169 100644
--- a/src/ssdt-pcihp.dsl
+++ b/src/ssdt-pcihp.dsl
@@ -24,6 +24,7 @@ DefinitionBlock ("ssdt-pcihp.aml", "SSDT", 0x01, "BXPC", "BXSSDTPCIHP", 0x1)
         ACPI_EXTRACT_METHOD_STRING aml_ej0_name      \
         Method (_EJ0, 1) { Return(PCEJ(0x##slot)) }  \
         Name (_SUN, 0x##slot)                        \
+        Method (OSHP, 1) { Return(0x0) }             \
     }
 
     hotplug_slot(03)
Re: [PATCHv2-RFC 0/2] RFC: standard pci bridge device
At 02/13/2012 05:15 PM, Michael S. Tsirkin wrote:
> Here's a new version of the patch. It works for me. Deep nesting of
> bridges is supported. You need a small BIOS patch to support the OSHP
> method if you want hotplug to work. I will post this separately.
>
> We'd need a full ACPI driver to make hotplug work for guests without an
> SHPC driver (e.g. Windows XP).
>
> Management support will also be needed.
>
> One small wrinkle is that the pci_addr property wants data in the
> format bus:device.function, which is broken as guests can change bus
> numbers. For testing I used the 'addr' property, which encodes
> slot*8+function#. We probably want to extend pci_addr in some way
> (e.g. :device.function ? Thoughts?).

What about using id+device(slot)+function to set the address?

> The SHPC controller supports up to 31 devices (out of 32 slots), so
> slot 0 doesn't support hotplug. Non hot-pluggable devices behind the
> bridge don't work correctly (we'll try to unplug them), so don't do
> this. For now I just blocked adding devices in slot 0; in the future it
> might be possible to add a non-hotpluggable device there.
>
> Example:
> qemu-system-x86_64 -enable-kvm -m 1G -drive file=/home/mst/rhel6.qcow2 -netdev tap,id=foo,ifname=msttap0,script=/home/mst/ifup,downscript=no,vhost=on -device pci-bridge,id=bog -device virtio-net-pci,netdev=foo,bus=bog,addr=8
>
> Hot-unplug currently causes qemu to crash; this happens without this
> patch too, so I'm not worried :)

How can this bug be triggered without this patch?

Thanks
Wen Congyang

> New since v1: hotplug support
AHCI Boot disk?
Hello,

I am attempting to test AHCI disks and find that I am unable to boot off the disk, even though the disk is seen if I do a network boot.

This works:
-drive file=/srv/kvm/debian.raw,if=virtio,cache=writeback,bus=0,index=0,media=disk,format=raw,serial=1,boot=on

This [1] does not:
-drive file=/srv/kvm/debian.raw,if=none,id=${AHCIID} \
-device ahci,id=${AHCIID} \
-device ide-hd,drive=${AHCIID},bus=${AHCIID}.0

The latter gives me: Boot failed. Could not read the boot disk

1. Should it work?
2. If so, what am I doing wrong? ;)

Any help much appreciated!

Conrad

[1] http://wiki.qemu.org/ChangeLog/0.14#IDE_.2F_AHCI

--
Conrad Wood (Deputy CTO, Head of Research Innovations)
ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin
Office: +49 30 51 64 09 21
DDI: +49 30 51 300 021
Email: conrad.w...@profitbricks.com
URL: http://www.profitbricks.com/
Registered office: Berlin
Register court: Amtsgericht Charlottenburg, HRB 125506 B.
Managing directors: Andreas Gauger, Achim Weiss
Re: AHCI Boot disk?
On Mon, Feb 13, 2012 at 09:55:46AM +0100, Conrad Wood wrote:
> Hello,
>
> I am attempting to test AHCI disks and find that I am unable to boot
> off the disk, even though the disk is seen if I do a network boot.
>
> This works:
> -drive file=/srv/kvm/debian.raw,if=virtio,cache=writeback,bus=0,index=0,media=disk,format=raw,serial=1,boot=on

You shouldn't use boot=on.

> This [1] does not:
> -drive file=/srv/kvm/debian.raw,if=none,id=${AHCIID} \
> -device ahci,id=${AHCIID} \
> -device ide-hd,drive=${AHCIID},bus=${AHCIID}.0
>
> The latter gives me: Boot failed. Could not read the boot disk
>
> 1. Should it work?

AFAIK yes, if your BIOS is up-to-date.

> 2. If so, what am I doing wrong? ;)

Try to compile the BIOS from git://git.seabios.org/seabios.git

> Any help much appreciated!
>
> Conrad
>
> [1] http://wiki.qemu.org/ChangeLog/0.14#IDE_.2F_AHCI

--
Gleb.
Re: AHCI Boot disk?
On Mon, 2012-02-13 at 11:50 +0200, Gleb Natapov wrote:
> On Mon, Feb 13, 2012 at 09:55:46AM +0100, Conrad Wood wrote:
>> Hello,
>>
>> I am attempting to test AHCI disks and find that I am unable to boot
>> off the disk, even though the disk is seen if I do a network boot.
>>
>> This works:
>> -drive file=/srv/kvm/debian.raw,if=virtio,cache=writeback,bus=0,index=0,media=disk,format=raw,serial=1,boot=on
>
> You shouldn't use boot=on.

Yes... but in our specific case we cannot use boot-order (yet)... Thanks for pointing it out though ;)

>> This [1] does not:
>> -drive file=/srv/kvm/debian.raw,if=none,id=${AHCIID} \
>> -device ahci,id=${AHCIID} \
>> -device ide-hd,drive=${AHCIID},bus=${AHCIID}.0
>>
>> The latter gives me: Boot failed. Could not read the boot disk
>>
>> 1. Should it work?
>
> AFAIK yes, if your BIOS is up-to-date.
>
>> 2. If so, what am I doing wrong? ;)
>
> Try to compile the BIOS from git://git.seabios.org/seabios.git

ok, will do.

Thanks
Conrad
Re: [PATCHv2-RFC 0/2] RFC: standard pci bridge device
On Mon, Feb 13, 2012 at 05:38:26PM +0800, Wen Congyang wrote:
> At 02/13/2012 05:15 PM, Michael S. Tsirkin wrote:
>> Here's a new version of the patch. It works for me. Deep nesting of
>> bridges is supported. You need a small BIOS patch to support the OSHP
>> method if you want hotplug to work. I will post this separately.
>>
>> We'd need a full ACPI driver to make hotplug work for guests without
>> an SHPC driver (e.g. Windows XP).
>>
>> Management support will also be needed.
>>
>> One small wrinkle is that the pci_addr property wants data in the
>> format bus:device.function, which is broken as guests can change bus
>> numbers. For testing I used the 'addr' property, which encodes
>> slot*8+function#. We probably want to extend pci_addr in some way
>> (e.g. :device.function ? Thoughts?).
>
> What about using id+device(slot)+function to set the address?

That's exactly what this patch does: addr encodes slot+function. I was asking about a friendlier format for this.

>> The SHPC controller supports up to 31 devices (out of 32 slots), so
>> slot 0 doesn't support hotplug. Non hot-pluggable devices behind the
>> bridge don't work correctly (we'll try to unplug them), so don't do
>> this. For now I just blocked adding devices in slot 0; in the future
>> it might be possible to add a non-hotpluggable device there.
>>
>> Example:
>> qemu-system-x86_64 -enable-kvm -m 1G -drive file=/home/mst/rhel6.qcow2 -netdev tap,id=foo,ifname=msttap0,script=/home/mst/ifup,downscript=no,vhost=on -device pci-bridge,id=bog -device virtio-net-pci,netdev=foo,bus=bog,addr=8
>>
>> Hot-unplug currently causes qemu to crash; this happens without this
>> patch too, so I'm not worried :)
>
> How can this bug be triggered without this patch?
>
> Thanks
> Wen Congyang

Start with:
qemu-system-x86_64 -enable-kvm -m 1G -drive file=/home/mst/rhel6.qcow2 -netdev tap,id=foo,ifname=msttap0,script=/home/mst/ifup,downscript=no,vhost=on

Next, in the monitor, do:
device_add virtio-net-pci,netdev=foo,id=bla

Wait a bit for the guest to notice the device, then:
device_del bla

Wait for the device to go away, and qemu will crash on the next malloc. To trigger a malloc, give another command, e.g.
info pci

>> New since v1: hotplug support
Re: [PATCHv2-RFC 1/2] shpc: standard hot plug controller
Oh nice work.

On Mon, Feb 13, 2012 at 11:15:55AM +0200, Michael S. Tsirkin wrote:
> This adds support for SHPC interface, as defined by PCI Standard
> Hot-Plug Controller and Subsystem Specification, Rev 1.0
> http://www.pcisig.com/specifications/conventional/pci_hot_plug/SHPC_10
>
> Only SHPC integrated with a PCI-to-PCI bridge is supported,
> SHPC integrated with a host bridge would need more work.
>
> All main SHPC features are supported:
> - MRL sensor

Does this just report latch status? (It seems so.)
Do you plan to provide interfaces to manipulate the latch?

> - Attention button
> - Attention indicator
> - Power indicator
>
> Wake on hotplug and serr generation are stubbed out but unused
> as we don't have interfaces to generate these events ATM.
>
> One issue that isn't completely resolved is that qemu currently
> expects an eject interface, which SHPC does not provide: it merely
> removes the power to device and it's up to the user to remove the device
> from slot. This patch works around that by ejecting the device
> when power is removed and power LED goes off.
>
> TODO:
> - migration support
> - fix dependency on pci_internals.h

If I didn't miss the code,
- QMP command for pushing attention button.
- QMP command to get LED status
- QMP events for LED on/off

thanks,

> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> ---
>  Makefile.objs |    1 +
>  hw/pci.h      |    6 +
>  hw/shpc.c     |  646 +
>  hw/shpc.h     |   40
>  qemu-common.h |    1 +
>  5 files changed, 694 insertions(+), 0 deletions(-)
>  create mode 100644 hw/shpc.c
>  create mode 100644 hw/shpc.h
>
> diff --git a/Makefile.objs b/Makefile.objs
> index 391e524..4546477 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -195,6 +195,7 @@ hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
>  hw-obj-y += fw_cfg.o
>  hw-obj-$(CONFIG_PCI) += pci.o pci_bridge.o
>  hw-obj-$(CONFIG_PCI) += msix.o msi.o
> +hw-obj-$(CONFIG_PCI) += shpc.o
>  hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
>  hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o
>  hw-obj-y += watchdog.o
> diff --git a/hw/pci.h b/hw/pci.h
> index 33b0b18..756577e 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -125,6 +125,9 @@ enum {
>      /* command register SERR bit enabled */
>  #define QEMU_PCI_CAP_SERR_BITNR 4
>      QEMU_PCI_CAP_SERR = (1 << QEMU_PCI_CAP_SERR_BITNR),
> +    /* Standard hot plug controller. */
> +#define QEMU_PCI_SHPC_BITNR 5
> +    QEMU_PCI_CAP_SHPC = (1 << QEMU_PCI_SHPC_BITNR),
>  };
>
>  #define TYPE_PCI_DEVICE "pci-device"
> @@ -229,6 +232,9 @@ struct PCIDevice {
>      /* PCI Express */
>      PCIExpressDevice exp;
>
> +    /* SHPC */
> +    SHPCDevice *shpc;
> +
>      /* Location of option rom */
>      char *romfile;
>      bool has_rom;
> diff --git a/hw/shpc.c b/hw/shpc.c
> new file mode 100644
> index 000..4baec29
> --- /dev/null
> +++ b/hw/shpc.c
> @@ -0,0 +1,646 @@
> +#include <strings.h>
> +#include <stdint.h>
> +#include "range.h"
> +#include "shpc.h"
> +#include "pci.h"
> +#include "pci_internals.h"
> +
> +/* TODO: model power only and disabled slot states. */
> +/* TODO: handle SERR and wakeups */
> +/* TODO: consider enabling 66MHz support */
> +
> +/* TODO: remove fully only on state DISABLED and LED off.
> + * track state to properly record this. */
> +
> +/* SHPC Working Register Set */
> +#define SHPC_BASE_OFFSET 0x00 /* 4 bytes */
> +#define SHPC_SLOTS_33 0x04 /* 4 bytes. Also encodes PCI-X slots. */
> +#define SHPC_SLOTS_66 0x08 /* 4 bytes. */
> +#define SHPC_NSLOTS 0x0C /* 1 byte */
> +#define SHPC_FIRST_DEV 0x0D /* 1 byte */
> +#define SHPC_PHYS_SLOT 0x0E /* 2 byte */
> +#define SHPC_PHYS_NUM_MAX 0x7ff
> +#define SHPC_PHYS_NUM_UP 0x1000
> +#define SHPC_PHYS_MRL 0x4000
> +#define SHPC_PHYS_BUTTON 0x8000
> +#define SHPC_SEC_BUS 0x10 /* 2 bytes */
> +#define SHPC_SEC_BUS_33 0x0
> +#define SHPC_SEC_BUS_66 0x1 /* Unused */
> +#define SHPC_SEC_BUS_MASK 0x7
> +#define SHPC_MSI_CTL 0x12 /* 1 byte */
> +#define SHPC_PROG_IFC 0x13 /* 1 byte */
> +#define SHPC_PROG_IFC_1_0 0x1
> +#define SHPC_CMD_CODE 0x14 /* 1 byte */
> +#define SHPC_CMD_TRGT 0x15 /* 1 byte */
> +#define SHPC_CMD_TRGT_MIN 0x1
> +#define SHPC_CMD_TRGT_MAX 0x1f
> +#define SHPC_CMD_STATUS 0x16 /* 2 bytes */
> +#define SHPC_CMD_STATUS_BUSY 0x1
> +#define SHPC_CMD_STATUS_MRL_OPEN 0x2
> +#define SHPC_CMD_STATUS_INVALID_CMD 0x4
> +#define SHPC_CMD_STATUS_INVALID_MODE 0x8
> +#define SHPC_INT_LOCATOR 0x18 /* 4 bytes */
> +#define SHPC_INT_COMMAND 0x1
> +#define SHPC_SERR_LOCATOR 0x1C /* 4 bytes */
> +#define SHPC_SERR_INT 0x20 /* 4 bytes */
> +#define SHPC_INT_DIS 0x1
> +#define SHPC_SERR_DIS 0x2
> +#define SHPC_CMD_INT_DIS 0x4
> +#define SHPC_ARB_SERR_DIS 0x8
> +#define SHPC_CMD_DETECTED 0x1
> +#define SHPC_ARB_DETECTED 0x2
> + /* 4 bytes * slot # (start from 0) */
> +#define SHPC_SLOT_REG(s) (0x24 + (s) * 4)
> + /* 2 bytes */
> +#define SHPC_SLOT_STATUS(s) (0x0 + SHPC_SLOT_REG(s))
> +
> +/* Same slot state masks are used
Re: [PATCH 3/3] KVM: perf: kvm events analysis tool
On 02/13/2012 01:32 PM, David Ahern wrote:
> [sorry for the top post - you would think Android would have a better
> mail client]
>
> If the first patch is needed then kvm-events will not work with older,
> unpatched kernels. That's a big limitation from a perf perspective.

The first patch is only needed to compile the code. Once kvm-events is compiled, you can analyse any kernel. :)
Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
On 2012-02-11 16:25, Blue Swirl wrote:
> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka jan.kis...@siemens.com wrote:
>> This enables acceleration for MMIO-based TPR registers accesses of
>> 32-bit Windows guest systems. It is mostly useful with KVM enabled,
>> either on older Intel CPUs (without flexpriority feature, can also be
>> manually disabled for testing) or any current AMD processor.
>>
>> The approach introduced here is derived from the original version of
>> qemu-kvm. It was refactored, documented, and extended by support for
>> user space APIC emulation, both with and without KVM acceleration. The
>> VMState format was kept compatible, so was the ABI to the option ROM
>> that implements the guest-side para-virtualized driver service. This
>> enables seamless migration from qemu-kvm to upstream or, one day,
>> between KVM and TCG mode.
>>
>> The basic concept goes like this:
>>  - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
>>    irqchip) a vmcall hypercall is registered
>>  - VAPIC option ROM is loaded into guest
>>  - option ROM activates TPR MMIO access reporting via port 0x7e
>>  - TPR accesses are trapped and patched in the guest to call into option
>>    ROM instead, VAPIC support is enabled
>>  - option ROM TPR helpers track state in memory and invoke hypercall to
>>    poll for pending IRQs if required
>>
>> Signed-off-by: Jan Kiszka jan.kis...@siemens.com
>
> I must say that I find the approach horrible, patching guests and ROMs
> and looking up Windows internals. Taking the same approach to extreme,
> we could for example patch Xen guest to become a KVM guest. Not that I
> object merging.

Yes, this is horrible. But there is no real better way in the absence of hardware assisted virtualization of the TPR. I think MS is recommending this patching approach as well.

>> diff --git a/hw/apic.c b/hw/apic.c
>> index 086c544..2ebf3ca 100644
>> --- a/hw/apic.c
>> +++ b/hw/apic.c
>> @@ -35,6 +35,10 @@
>>  #define MSI_ADDR_DEST_ID_SHIFT 12
>>  #define MSI_ADDR_DEST_ID_MASK 0x000
>>
>> +#define SYNC_FROM_VAPIC 0x1
>> +#define SYNC_TO_VAPIC 0x2
>> +#define SYNC_ISR_IRR_TO_VAPIC 0x4
>
> Enum, please.

OK.

>> +
>>  static APICCommonState *local_apics[MAX_APICS + 1];
>>
>>  static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode);
>> @@ -78,6 +82,70 @@ static inline int get_bit(uint32_t *tab, int index)
>>      return !!(tab[i] & mask);
>>  }
>>
>> +/* return -1 if no bit is set */
>> +static int get_highest_priority_int(uint32_t *tab)
>> +{
>> +    int i;
>> +    for (i = 7; i >= 0; i--) {
>> +        if (tab[i] != 0) {
>> +            return i * 32 + fls_bit(tab[i]);
>> +        }
>> +    }
>> +    return -1;
>> +}
>> +
>> +static void apic_sync_vapic(APICCommonState *s, int sync_type)
>> +{
>> +    VAPICState vapic_state;
>> +    size_t length;
>> +    off_t start;
>> +    int vector;
>> +
>> +    if (!s->vapic_paddr) {
>> +        return;
>> +    }
>> +    if (sync_type & SYNC_FROM_VAPIC) {
>> +        cpu_physical_memory_rw(s->vapic_paddr, (void *)&vapic_state,
>> +                               sizeof(vapic_state), 0);
>> +        s->tpr = vapic_state.tpr;
>> +    }
>> +    if (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) {
>> +        start = offsetof(VAPICState, isr);
>> +        length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
>> +
>> +        if (sync_type & SYNC_TO_VAPIC) {
>> +            assert(qemu_cpu_is_self(s->cpu_env));
>> +
>> +            vapic_state.tpr = s->tpr;
>> +            vapic_state.enabled = 1;
>> +            start = 0;
>> +            length = sizeof(VAPICState);
>> +        }
>> +
>> +        vector = get_highest_priority_int(s->isr);
>> +        if (vector < 0) {
>> +            vector = 0;
>> +        }
>> +        vapic_state.isr = vector & 0xf0;
>> +
>> +        vapic_state.zero = 0;
>> +
>> +        vector = get_highest_priority_int(s->irr);
>> +        if (vector < 0) {
>> +            vector = 0;
>> +        }
>> +        vapic_state.irr = vector & 0xff;
>> +
>> +        cpu_physical_memory_write_rom(s->vapic_paddr + start,
>> +                                      ((void *)&vapic_state) + start, length);
>
> This assumes that the vapic_state structure matches what the guest
> expects without conversion. Is this true for i386 on x86_64? I didn't
> check the structure in question.

Yes, the structure in question is a packed one, stable on both guest and host side (the guest side is 32-bit only anyway).

>> diff --git a/hw/apic_common.c b/hw/apic_common.c
>> index 588531b..1977da7 100644
>> --- a/hw/apic_common.c
>> +++ b/hw/apic_common.c
>> @@ -20,8 +20,10 @@
>>  #include "apic.h"
>>  #include "apic_internal.h"
>>  #include "trace.h"
>> +#include "kvm.h"
>>
>>  static int apic_irq_delivered;
>> +bool apic_report_tpr_access;
>
> This should go to APICCommonState.

Nope, it is global state. It is also checked in a place where the APIC is being set up, which has no local clue about it yet and needs to pick up the global view.

>> @@ -238,6 +275,7 @@ static int apic_init_common(SysBusDevice *dev)
>>  {
>>      APICCommonState *s = APIC_COMMON(dev);
>>      APICCommonClass *info;
>> +    static
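For readers following the ISR/IRR handling in the quoted hunk, here is a minimal Python sketch of the same lookup. It is not QEMU code: it assumes that fls_bit() returns the 0-based index of the highest set bit, and vapic_fields() is a hypothetical helper that only mirrors the clamping and masking done in apic_sync_vapic().

```python
def fls_bit(value):
    """0-based index of the highest set bit of a non-zero 32-bit value (assumed semantics)."""
    return value.bit_length() - 1


def get_highest_priority_int(tab):
    """Return the highest vector set in an 8 x 32-bit bitmap, or -1 if none is set."""
    for i in range(7, -1, -1):
        if tab[i] != 0:
            return i * 32 + fls_bit(tab[i])
    return -1


def vapic_fields(isr_tab, irr_tab):
    """Hypothetical helper: values that would be stored in the isr and irr fields."""
    isr = max(get_highest_priority_int(isr_tab), 0)  # clamp "no bit set" to 0
    irr = max(get_highest_priority_int(irr_tab), 0)
    return isr & 0xf0, irr & 0xff  # isr keeps only the priority class


if __name__ == "__main__":
    isr = [0] * 8
    irr = [0] * 8
    isr[6] = 1 << 17   # vector 209 (0xd1) in service
    irr[1] = 1 << 17   # vector 49 (0x31) pending
    print(vapic_fields(isr, irr))
```

The upper nibble kept for isr is the priority class of the vector, which is what gets compared against the TPR.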
[PATCH v2 0/6] Network performance regression
This patchset adds a new network perf testcase for Windows, refactors the old netperf test, and adds support for NUMA resource control. The raw results are processed into a standard format at the end of the test. regression.py can be used to compare two job results.

---

Amos Kong (6):
      virt: Add vhost_threads and vcpu_threads to VM object
      virt_test_utils: Add pin_vm_threads
      virt-test: add NTttcp subtests
      virt-test: Refactor netperf test and add analysis module
      netperf: pin guest vcpus/memory/vhost thread to numa node
      virt: Introduce regression testing infrastructure

 client/tools/analyzer.py   |  166
 client/tools/perf.conf     |   14
 client/tools/regression.py |   24 ++
 3 files changed, 204 insertions(+), 0 deletions(-)
 create mode 100644 client/tools/analyzer.py
 create mode 100644 client/tools/perf.conf
 create mode 100644 client/tools/regression.py

--
Amos Kong
[PATCH v2 1/6] virt: Add vhost_threads and vcpu_threads to VM object
Record the vhost_net thread IDs and vcpu thread IDs in the VM object after creating the VM.

Signed-off-by: Amos Kong ak...@redhat.com
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/virt/kvm_vm.py b/client/virt/kvm_vm.py
index c5dba08..b2d6088 100644
--- a/client/virt/kvm_vm.py
+++ b/client/virt/kvm_vm.py
@@ -54,6 +54,8 @@ class VM(virt_vm.BaseVM):
             self.device_id = []
             self.tapfds = []
             self.uuid = None
+            self.vcpu_threads = []
+            self.vhost_threads = []

         self.spice_port = 8000
@@ -1008,6 +1010,12 @@ class VM(virt_vm.BaseVM):
             logging.debug("VM appears to be alive with PID %s", self.get_pid())

+            o = self.monitor.info("cpus")
+            self.vcpu_threads = re.findall("thread_id=(\d+)", o)
+            o = commands.getoutput("ps aux")
+            self.vhost_threads = re.findall("\w+\s+(\d+)\s.*\[vhost-%s\]" %
+                                            self.get_pid(), o)
+
             # Establish a session with the serial console -- requires a version
             # of netcat that supports -U
             self.serial_console = aexpect.ShellSession(
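To illustrate what the two regular expressions in this patch extract, here is a small standalone sketch (Python 3, unlike the Python 2 patch). The sample monitor and ps output below is made up for illustration and is only assumed to resemble what "info cpus" and "ps aux" print; the PIDs and thread IDs are hypothetical.

```python
import re

# Made-up sample of "info cpus" monitor output (format is an assumption).
INFO_CPUS = (
    "* CPU #0: pc=0xffffffff8100b2e0 (halted) thread_id=4001\n"
    "  CPU #1: pc=0xffffffff8100b2e0 (halted) thread_id=4002\n"
)

# Made-up sample of "ps aux" output for a VM whose QEMU PID is 4000.
PS_AUX = (
    "qemu      4000  1.0  2.0 100000 50000 ?        Sl   10:00   0:05 qemu-kvm -name vm1\n"
    "root      4321  0.0  0.0      0     0 ?        S    10:00   0:00 [vhost-4000]\n"
    "root      4322  0.0  0.0      0     0 ?        S    10:00   0:00 [vhost-4000]\n"
    "root      5321  0.0  0.0      0     0 ?        S    10:00   0:00 [vhost-5000]\n"
)


def vcpu_threads(info_cpus_output):
    """Thread IDs of the vcpu threads, as strings."""
    return re.findall(r"thread_id=(\d+)", info_cpus_output)


def vhost_threads(ps_output, qemu_pid):
    """PIDs of the vhost kernel threads that belong to the given QEMU PID."""
    return re.findall(r"\w+\s+(\d+)\s.*\[vhost-%s\]" % qemu_pid, ps_output)


if __name__ == "__main__":
    print(vcpu_threads(INFO_CPUS))
    print(vhost_threads(PS_AUX, 4000))
```

Note that both lists hold strings, not integers, which is sufficient when the IDs are only passed on to a command line later.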
[PATCH v2 2/6] virt_test_utils: Add pin_vm_threads
This function is used to pin the vhost and vcpu threads of a VM to host cpus (in the same numa node).

Signed-off-by: Amos Kong ak...@redhat.com
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/virt/virt_test_utils.py b/client/virt/virt_test_utils.py
index 6b0d7eb..7864d2a 100644
--- a/client/virt/virt_test_utils.py
+++ b/client/virt/virt_test_utils.py
@@ -811,3 +811,14 @@ def run_virt_sub_test(test, params, env, sub_type=None):
     # Run the test function
     run_func = getattr(test_module, "run_%s" % sub_type)
     run_func(test, params, env)
+
+def pin_vm_threads(vm, node):
+    """
+    Pin VM threads to single cpu of a numa node
+    @param vm: VM object
+    @param node: NumaNode object
+    """
+    for i in vm.vhost_threads:
+        logging.info("pin vhost thread(%s) to cpu(%s)" % (i, node.pin_cpu(i)))
+    for i in vm.vcpu_threads:
+        logging.info("pin vcpu thread(%s) to cpu(%s)" % (i, node.pin_cpu(i)))
[PATCH v2 3/6] virt-test: add NTttcp subtests
This case tests tcp throughput between 2 Windows guests, or between 1 guest and 1 external Windows host. When testing between a guest and an external Windows host, 'receiver_address' should be set to the external Windows host's ip address.

NTttcp is not a freely redistributable binary, so you *must* download it from microsoft and be in agreement with its EULA. See the @see tags for a complete download link and also documentation on how to integrate it to your autotest setup.

@see: http://msdn.microsoft.com/en-us/windows/hardware/gg463264
@see: http://download.microsoft.com/download/f/1/e/f1e1ac7f-e632-48ea-83ac-56b016318735/NT%20Testing%20TCP%20Tool.msi
@see: https://github.com/autotest/autotest/wiki/KVMAutotest-Networking

! ntttcp.au3: This script will sign the End-user license agreement
! for you, please don't use this script if you don't agree to the EULA.

This test generates result files in a 'standard' format: items are separated by '|' and one line is used as the title. We can analyze them with a general module.

raw_output_1.RHS:
buf(k)| throughput(Mbit/s)
    64| 2407.548
   128| 2102.254
   256| 4930.362
   512| 4723.035
  1024| 4725.334

Changes from v1:
- pin vcpus/vhost_net threads to numa node
- add autoit script for ntttcp test
- user should put msi and autoit script to iso
- fix threads sync issue
- set test time to 30 seconds
- support to use fixed receiver buf or use same buf as sender
- 30 seconds is not enough, assign buf number to 200

Signed-off-by: Qingtang Zhou qz...@redhat.com
Signed-off-by: Amos Kong ak...@redhat.com
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/virt/scripts/ntttcp.au3 b/client/virt/scripts/ntttcp.au3
new file mode 100755
index 000..00489e8
--- /dev/null
+++ b/client/virt/scripts/ntttcp.au3
@@ -0,0 +1,41 @@
+#cs -
+AutoIt Version: 3.1.1.0
+Author: Qingtang Zhou qz...@redhat.com
+
+Script Function:
+Install NT Testing TCP tool
+
+Note: This script will sign End-user license agreement for user
+#ce -
+
+Func WaitWind($title)
+    WinWait($title, "")
+
+    If Not WinActive($title, "") Then
+        WinActivate($title, "")
+    EndIf
+EndFunc
+
+$FILE="msiexec /i D:\NTttcp\\NT Testing TCP Tool.msi"
+Run($FILE)
+
+WaitWind("NT Testing TCP Tool")
+WinWaitActive("NT Testing TCP Tool", "Welcome to the NT Testing TCP Tool Setup Wizard")
+Send("!n")
+
+WaitWind("NT Testing TCP Tool")
+WinWaitActive("NT Testing TCP Tool", "License Agreement")
+send("!a")
+send("{ENTER}")
+
+WaitWind("NT Testing TCP Tool")
+WinWaitActive("NT Testing TCP Tool", "Select Installation Folder")
+Send("{ENTER}")
+
+WaitWind("NT Testing TCP Tool")
+WinWaitActive("NT Testing TCP Tool", "Confirm Installation")
+send("{ENTER}")
+
+WinWaitActive("NT Testing TCP Tool", "Installation Complete")
+send("!c")
+
diff --git a/client/virt/subtests.cfg.sample b/client/virt/subtests.cfg.sample
index 89dda8c..cc0986a 100644
--- a/client/virt/subtests.cfg.sample
+++ b/client/virt/subtests.cfg.sample
@@ -1007,6 +1007,28 @@ variants:
                 netperf_cmd = %s/netperf-2.4.5/src/netperf -t %s -H %s -l 60 -- -r %s
                 protocols = TCP_RR TCP_CRR UDP_RR

+    - ntttcp:
+        type = ntttcp
+        image_snapshot = yes
+        check_ntttcp_cmd = cmd /c dir C:\NTttcp
+        ntttcp_sender_cmd = cmd /c C:\NTttcp\NTttcps.exe -m %s,0,%s -a 2 -l %s -n %s
+        ntttcp_receiver_cmd = cmd /c C:\NTttcp\NTttcpr.exe -m %s,0,%s -a 6 -rb %s -n %s
+        session_num = 1
+        buffers = 2k 4k 8k 16k 32k 64k 128k 256k 512k 1024k 2048k
+        timeout = 1200
+        kill_vm = yes
+        numa_node = -1
+        variants:
+            - guest_guest:
+                vms += vm2
+            - guest_host:
+                # external Windows system IP, NTttcp need to be installed firstly.
+                receiver_address = 192.168.1.1
+        32:
+            ntttcp_install_cmd = 'cmd /c D:\autoit3.exe D:\NTttcp\NTttcp.au3 mkdir C:\NTttcp copy C:\Program Files\Microsoft Corporation\NT Testing TCP Tool\* C:\NTttcp cd C:\NTttcp\ copy NTttcp_%s.exe NTttcps.exe copy NTttcp_%s.exe NTttcpr.exe'
+        64:
+            ntttcp_install_cmd = 'cmd /c D:\autoit3.exe D:\NTttcp\NTttcp.au3 mkdir C:\NTttcp copy C:\Program Files (x86)\Microsoft Corporation\NT Testing TCP Tool\* C:\NTttcp cd C:\NTttcp\ copy NTttcp_%s.exe NTttcps.exe copy NTttcp_%s.exe NTttcpr.exe'
+
     - ethtool: install setup image_copy unattended_install.cdrom
         only Linux
         type = ethtool
diff --git a/client/virt/tests/ntttcp.py b/client/virt/tests/ntttcp.py
new file mode 100644
index 000..66cdbfe
--- /dev/null
+++ b/client/virt/tests/ntttcp.py
@@ -0,0 +1,175 @@
+import logging, os, glob, re, commands
+from autotest_lib.client.common_lib import error
+from autotest_lib.client.common_lib import utils
+from autotest_lib.client.virt import virt_utils, aexpect, virt_test_utils
+
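As a rough illustration of the 'standard' result format described in this patch (one title line, fields separated by '|'), the following Python 3 sketch parses the raw_output_1.RHS sample from the commit message. The function name parse_results is hypothetical and is not part of the posted analysis module.

```python
SAMPLE = """\
buf(k)| throughput(Mbit/s)
    64| 2407.548
   128| 2102.254
   256| 4930.362
   512| 4723.035
  1024| 4725.334
"""


def parse_results(text):
    """Split a '|'-separated result file into a title row and numeric data rows."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [col.strip() for col in lines[0].split("|")]
    rows = [[float(col) for col in line.split("|")] for line in lines[1:]]
    return header, rows


if __name__ == "__main__":
    header, rows = parse_results(SAMPLE)
    best = max(rows, key=lambda row: row[1])
    print(header)
    print("best buffer size (k): %d" % best[0])
```

Keeping the title on its own line means a generic tool can process netperf and NTttcp results in the same way, without knowing the column names in advance.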
[PATCH v2 4/6] virt-test: Refactor netperf test and add analysis module
Always use a VM as the netperf server; another VM, the local host or an external host can be used as the netperf client. The environment is set up and the test is launched by executing remote ssh commands. You need to configure the IP of the local/external host in the config file; the VMs' IPs are obtained automatically.

A file in 'standard' format is generated at the end of the test, which can then be analyzed by the general module.

Changes from v1:
- record packet bytes
- enable arp_ignore
- get packet info from ifconfig
- shape functions
- don't change ssh config
- use server.hosts.ssh_host.SSHHost to setup ssh

Signed-off-by: Amos Kong ak...@redhat.com
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/virt/subtests.cfg.sample b/client/virt/subtests.cfg.sample
index cc0986a..a2939f8 100644
--- a/client/virt/subtests.cfg.sample
+++ b/client/virt/subtests.cfg.sample
@@ -992,20 +992,36 @@ variants:
     - netperf: install setup image_copy unattended_install.cdrom
         only Linux
+        only virtio_net
         type = netperf
-        nics += ' nic2 nic3 nic4'
+        kill_vm = yes
+        image_snapshot = yes
+        nics += ' nic2'
+        # nic1 is for control, nic2 is for data connection
+        # bridge_nic1 = virbr0
+        pci_model_nic1 = virtio_net
+        # bridge_nic2 = switch
+        pci_model_nic2 = e1000
         nic_mode = tap
         netperf_files = netperf-2.4.5.tar.bz2 wait_before_data.patch
-        packet_size = 1500
-        setup_cmd = cd %s tar xvfj netperf-2.4.5.tar.bz2 cd netperf-2.4.5 patch -p0 ../wait_before_data.patch ./configure make
-        netserver_cmd = %s/netperf-2.4.5/src/netserver
+        setup_cmd = cd /tmp rm -rf netperf-2.4.5 tar xvfj netperf-2.4.5.tar.bz2 cd netperf-2.4.5 patch -p0 ../wait_before_data.patch ./configure make
+        # configure netperf test parameters
+        # l = 60
+        # protocols = TCP_STREAM TCP_MAERTS TCP_RR
+        # sessions = 1 2 4
+        # sessions_rr = 50 100 250 500
+        # sizes = 64 256 512 1024
+        # sizes_rr = 64 256 512 1024
         variants:
-            - stream:
-                netperf_cmd = %s/netperf-2.4.5/src/netperf -t %s -H %s -l 60 -- -m %s
-                protocols = TCP_STREAM TCP_MAERTS TCP_SENDFILE UDP_STREAM
-            - rr:
-                netperf_cmd = %s/netperf-2.4.5/src/netperf -t %s -H %s -l 60 -- -r %s
-                protocols = TCP_RR TCP_CRR UDP_RR
+            - guest_guest:
+                vms += vm2
+                nics = 'nic1'
+            - host_guest:
+                # local host ip address
+                # client = localhost
+            - exhost_guest:
+                # external host ip address
+                # client =

     - ntttcp:
         type = ntttcp
diff --git a/client/virt/tests/netperf.py b/client/virt/tests/netperf.py
index fea1e9e..214f351 100644
--- a/client/virt/tests/netperf.py
+++ b/client/virt/tests/netperf.py
@@ -1,17 +1,18 @@
-import logging, os, signal
+import logging, os, commands, sys, threading, re, glob
 from autotest_lib.client.common_lib import error
 from autotest_lib.client.bin import utils
 from autotest_lib.client.virt import aexpect, virt_utils
+from autotest_lib.client.virt import virt_test_utils
+from autotest_lib.server.hosts.ssh_host import SSHHost

 def run_netperf(test, params, env):
     """
     Network stress test with netperf.

-    1) Boot up a VM with multiple nics.
-    2) Launch netserver on guest.
-    3) Execute multiple netperf clients on host in parallel
-       with different protocols.
-    4) Output the test result.
+    1) Boot up VM(s), setup SSH authorization between host
+       and guest(s)/external host
+    2) Prepare the test environment in server/client/host
+    3) Execute netperf tests, collect and analyze the results

     @param test: KVM test object.
     @param params: Dictionary with the test parameters.
@@ -21,86 +22,202 @@ def run_netperf(test, params, env):
     vm.verify_alive()
     login_timeout = int(params.get("login_timeout", 360))
     session = vm.wait_for_login(timeout=login_timeout)
+    server = vm.get_address()
+    server_ctl = vm.get_address(1)
     session.close()
-    session_serial = vm.wait_for_serial_login(timeout=login_timeout)
-
-    netperf_dir = os.path.join(os.environ['AUTODIR'], "tests/netperf2")
-    setup_cmd = params.get("setup_cmd")
-
-    firewall_flush = "iptables -F"
-    session_serial.cmd_output(firewall_flush)
-    try:
-        utils.run("iptables -F")
-    except Exception:
-        pass
-
-    for i in params.get("netperf_files").split():
-        vm.copy_files_to(os.path.join(netperf_dir, i), "/tmp")
-
-    try:
-        session_serial.cmd(firewall_flush)
-    except aexpect.ShellError:
-        logging.warning("Could not flush firewall rules on guest")
-
-    session_serial.cmd(setup_cmd % "/tmp", timeout=200)
-    session_serial.cmd(params.get("netserver_cmd") % "/tmp")
-
-    if "tcpdump" in env and env["tcpdump"].is_alive():
-        # Stop the background tcpdump process
-
[PATCH v2 5/6] netperf: pin guest vcpus/memory/vhost thread to numa node
Dynamically check the hardware and pin the guest cpu threads and guest memory to the last numa node.

Changes from v1:
- assign numanode to -1 for netperf test

Signed-off-by: Amos Kong ak...@redhat.com
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/virt/subtests.cfg.sample b/client/virt/subtests.cfg.sample
index a2939f8..c68a48c 100644
--- a/client/virt/subtests.cfg.sample
+++ b/client/virt/subtests.cfg.sample
@@ -1012,6 +1012,7 @@ variants:
         # sessions_rr = 50 100 250 500
         # sizes = 64 256 512 1024
         # sizes_rr = 64 256 512 1024
+        numa_node = -1
         variants:
             - guest_guest:
                 vms += vm2
diff --git a/client/virt/tests/netperf.py b/client/virt/tests/netperf.py
index 214f351..bc4e436 100644
--- a/client/virt/tests/netperf.py
+++ b/client/virt/tests/netperf.py
@@ -26,12 +26,22 @@ def run_netperf(test, params, env):
     server_ctl = vm.get_address(1)
     session.close()

+    logging.debug(commands.getoutput("numactl --hardware"))
+    logging.debug(commands.getoutput("numactl --show"))
+    # pin guest vcpus/memory/vhost threads to last numa node of host by default
+    if params.get('numa_node'):
+        numa_node = int(params.get('numa_node'))
+        node = virt_utils.NumaNode(numa_node)
+        virt_test_utils.pin_vm_threads(vm, node)
+
     if "vm2" in params["vms"]:
         vm2 = env.get_vm("vm2")
         vm2.verify_alive()
         session2 = vm2.wait_for_login(timeout=login_timeout)
         client = vm2.get_address()
         session2.close()
+        if params.get('numa_node'):
+            virt_test_utils.pin_vm_threads(vm2, node)

     if params.get("client"):
         client = params["client"]
@@ -196,7 +206,10 @@ def launch_client(sessions, server, server_ctl, host, client, l, nf_args):
         return [nrx, ntx, nrxb, ntxb, nre, nrx_intr, ntx_intr, io_exit, irq_inj]

     def netperf_thread(i):
-        cmd = "%s -H %s -l %s %s" % (client_path, server, l, nf_args)
+        output = ssh_cmd(client, "numactl --hardware")
+        n = int(re.findall("available: (\d+) nodes", output)[0]) - 1
+        cmd = "numactl --cpunodebind=%s --membind=%s %s -H %s -l %s %s" % \
+              (n, n, client_path, server, l, nf_args)
         output = ssh_cmd(client, cmd)
         f = file("/tmp/netperf.%s.%s.nf" % (pid, i), "w")
         f.write(output)
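The "last numa node" selection in netperf_thread() can be sketched on its own as below (Python 3). The sample text is a shortened, made-up example of what "numactl --hardware" is assumed to print, and the helper names last_numa_node and numactl_prefix are hypothetical.

```python
import re

# Shortened, made-up sample; only the "available:" line matters here.
NUMACTL_HARDWARE = (
    "available: 2 nodes (0-1)\n"
    "node 0 cpus: 0 1 2 3\n"
    "node 1 cpus: 4 5 6 7\n"
)


def last_numa_node(numactl_output):
    """Index of the last NUMA node, derived from the reported node count."""
    count = int(re.findall(r"available: (\d+) nodes", numactl_output)[0])
    return count - 1


def numactl_prefix(node):
    """Command prefix binding both CPUs and memory to the given node."""
    return "numactl --cpunodebind=%s --membind=%s" % (node, node)


if __name__ == "__main__":
    node = last_numa_node(NUMACTL_HARDWARE)
    print(numactl_prefix(node) + " netperf -H 192.168.1.1 -l 60")
```

On a single-node machine the count is 1, so the computed node is 0 and the prefix is harmless.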
[PATCH v2 6/6] virt: Introduce regression testing infrastructure
regression.py:
Usage: python regression.py $testname $dir1 $dir2 $configfile

The 'regression' module is used to compare the test results of two jobs. We can use it (regression.compare()) at the end of a control file; this script can also be used directly. Example:
| # python regression.py netperf /result1-dir /result2-dir perf.conf

analyzer.py:
Usage: python analyzer.py $results list1 $results list2 $log_file

It's used to compute average, standard deviation, augment rate, etc, and compare two test results (standard format). It can be used directly, example:
| # python analyzer.py result-v1-1.RHS result-v1-2.RHS \
|     result-v2-1.RHS result-v2-2.RHS result-v2-3.RHS log.txt
| Thu Jan 5 10:17:24 2012
|
| == Avg1 SD Augment Rate ==
| TCP_STREAM
| size|sessions|throughput|   cpu|normalize| ...
| 2048|       2|  14699.17| 31.73|   463.19| ...
| %SD |     0.0|       0.6|   0.0|      0.8| ...
| 2048|       4|  15935.68| 34.30|   464.66| ...
| %SD |     0.0|       0.3|   1.7|      1.5| ...
| ...
|
| == AvgS Augment Rate =
| TCP_STREAM
| size|sessions|throughput|   cpu|normalize| ...
| 2048|       2|   7835.61| 31.66|   247.36| ...
| 2048|       2|   8757.03| 31.94|   274.14| ...
| %   |    +0.0|     +11.8|  +0.9|    +10.8| ...
| 2048|       4|  12000.65| 32.38|   370.62| ...
| 2048|       4|  13641.20| 32.27|   423.29| ...
| %   |    +0.0|     +13.7|  -0.3|    +14.2| ...
|

perf.conf:
config test related parameters.

perf regression guide:
https://github.com/autotest/autotest/wiki/KVMAutotest-Networking

Changes from v1:
- refactor analysis code
- add standard deviation percent
- only provide mechanism to user, user can use tools directly or use the lib in scripts

Signed-off-by: Amos Kong ak...@redhat.com
---
 client/tools/analyzer.py   |  166
 client/tools/perf.conf     |   14
 client/tools/regression.py |   24 ++
 3 files changed, 204 insertions(+), 0 deletions(-)
 create mode 100644 client/tools/analyzer.py
 create mode 100644 client/tools/perf.conf
 create mode 100644 client/tools/regression.py

diff --git a/client/tools/analyzer.py b/client/tools/analyzer.py
new file mode 100644
index 000..28df97e
--- /dev/null
+++ b/client/tools/analyzer.py
@@ -0,0 +1,166 @@
+import sys, re, string, time, commands, os, random
+
+def tee(content, filename):
+    """ Write content to standard output and file """
+    fd = open(filename, "a")
+    fd.write(content + "\n")
+    fd.close()
+    print content
+
+class samples():
+    def __init__(self, files):
+        self.files_dict = []
+        for i in range(len(files)):
+            fd = open(files[i], "r")
+            self.files_dict.append(fd.readlines())
+            fd.close()
+
+    def getAvg(self):
+        return self._process(self.files_dict, self._get_list_avg)
+
+    def getAvgPercent(self, avgs_dict):
+        return self._process(avgs_dict, self._get_augment_rate)
+
+    def getSD(self):
+        return self._process(self.files_dict, self._get_list_sd)
+
+    def getSDPercent(self, sds_dict):
+        return self._process(sds_dict, self._get_percent)
+
+    def _get_percent(self, data):
+        """ num2 / num1 * 100 """
+        result = 0.0
+        if len(data) == 2 and float(data[0]) != 0:
+            result = "%.1f" % (float(data[1]) / float(data[0]) * 100)
+        return result
+
+    def _get_augment_rate(self, data):
+        """ (num2 - num1) / num1 * 100 """
+        result = "+0.0"
+        if len(data) == 2 and float(data[0]) != 0:
+            result = "%+.1f" % (((float(data[1]) - float(data[0]))
+                                 / float(data[0])) * 100)
+        return result
+
+    def _get_list_sd(self, data):
+        """
+        sumX = x1 + x2 + ... + xn
+        avgX = sumX / n
+        sumSquareX = x1^2 + ... + xn^2
+        SD = sqrt([sumSquareX - (n * (avgX ^ 2))] / (n - 1))
+        """
+        sum = sqsum = 0
+        n = len(data)
+        for i in data:
+            sum += float(i)
+            sqsum += float(i) ** 2
+        avg = sum / n
+        if avg == 0 or n == 1:
+            return 0.0
+        return "%.1f" % (((sqsum - (n * avg**2)) / (n - 1))**0.5)
+
+    def _get_list_avg(self, data):
+        """ Compute the average of list members """
+        sum = 0
+        for i in data:
+            sum += float(i)
+        if "." in data[0]:
+            return "%.2f" % (sum / len(data))
+        return "%d" % (sum / len(data))
+
+    def _process_lines(self, files_dict, row, func):
+        """ Process lines of different sample files with assigned method """
+        lines = []
+        ret_lines = []
+
+        for i in range(len(files_dict)):
+            lines.append(files_dict[i][row].split("|"))
+        for col in range(len(lines[0])):
+            data_list = []
+            for i in range(len(lines)):
+                data_list.append(lines[i][col].strip())
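The two formulas used by the analyzer, the sample standard deviation and the augment rate, can be checked in isolation with the Python 3 sketch below. It is a standalone restatement rather than the posted module (which is Python 2), and the function names are chosen here for illustration. The inputs in the usage part are the throughput averages from the example output in the commit message.

```python
import math


def list_avg(data):
    """Arithmetic mean of a list of numbers."""
    return sum(float(x) for x in data) / len(data)


def list_sd(data):
    """Sample standard deviation: sqrt((sum(x^2) - n * avg^2) / (n - 1))."""
    n = len(data)
    if n < 2:
        return 0.0
    avg = list_avg(data)
    sqsum = sum(float(x) ** 2 for x in data)
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(sqsum - n * avg ** 2, 0.0) / (n - 1))


def augment_rate(old, new):
    """Relative change (new - old) / old * 100, formatted with an explicit sign."""
    if float(old) == 0:
        return "+0.0"
    return "%+.1f" % ((float(new) - float(old)) / float(old) * 100)


if __name__ == "__main__":
    print(augment_rate(7835.61, 8757.03))    # throughput, 2 sessions
    print(augment_rate(12000.65, 13641.20))  # throughput, 4 sessions
    print("%.1f" % list_sd([1.0, 2.0, 3.0]))
```

Dividing by n - 1 rather than n gives the sample (not population) standard deviation, which is the usual choice when only a handful of runs are available.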
Re: Pe: [PATCH v5 1/3] virtio-scsi: first version
On Mon, Feb 13, 2012 at 8:05 AM, Christian Borntraeger borntrae...@de.ibm.com wrote:
> On 12/02/12 21:16, James Bottomley wrote:
>> Could someone please explain to me why you can't simply fix virtio-blk?
>
> I don't think that virtio-scsi will replace virtio-blk everywhere. For
> non-scsi block devices, image files or logical volumes virtio-blk seems
> to be the right approach, I think.
>
>> Or would virtio-blk maintainers give a reason why they're unwilling to
>> have it fixed?
>
> I don't consider virtio-blk broken. It just doesn't cover everything.

Although I'm not sure whether that helps here: for about a year now there has been software in the upstream kernel that allows any block device or even a file to be used as a SCSI device.

Bart.
Re: [PATCHv2-RFC 1/2] shpc: standard hot plug controller
On Mon, Feb 13, 2012 at 07:03:52PM +0900, Isaku Yamahata wrote:
> Oh nice work.
>
> On Mon, Feb 13, 2012 at 11:15:55AM +0200, Michael S. Tsirkin wrote:
>> This adds support for SHPC interface, as defined by PCI Standard
>> Hot-Plug Controller and Subsystem Specification, Rev 1.0
>> http://www.pcisig.com/specifications/conventional/pci_hot_plug/SHPC_10
>>
>> Only SHPC integrated with a PCI-to-PCI bridge is supported,
>> SHPC integrated with a host bridge would need more work.
>>
>> All main SHPC features are supported:
>> - MRL sensor
>
> Does this just report latch status? (It seems so.)

What happens is that adding a device closes the latch, removing a device opens the latch. This simplifies the number of supported configurations significantly.

> Do you plan to provide interfaces to manipulate the latch?

I didn't plan to do this, and this is non-trivial. Do you just want this for empty slots? And why?

>> - Attention button
>> - Attention indicator
>> - Power indicator
>>
>> Wake on hotplug and serr generation are stubbed out but unused
>> as we don't have interfaces to generate these events ATM.
>>
>> One issue that isn't completely resolved is that qemu currently
>> expects an eject interface, which SHPC does not provide: it merely
>> removes the power to device and it's up to the user to remove the device
>> from slot. This patch works around that by ejecting the device
>> when power is removed and power LED goes off.
>>
>> TODO:
>> - migration support
>> - fix dependency on pci_internals.h
>
> If I didn't miss the code,
> - QMP command for pushing attention button.
> - QMP command to get LED status

It's easy to add these, so I'd accept such a patch, but I wonder why.

> - QMP events for LED on/off

There's also blink :)

> thanks,

I'm concerned that a guest can flood the management with such events. It's better to send a single LED change event, then we can suppress further events until next get LED status command.

>> Signed-off-by: Michael S. Tsirkin m...@redhat.com
>> [...]
Re: virtio-blk performance regression and qemu-kvm
On Fri, Feb 10, 2012 at 2:36 PM, Dongsu Park dongsu.p...@profitbricks.com wrote:
> Now I'm running benchmarks with both qemu-kvm 0.14.1 and 1.0.
>
> - Sequential read (Running inside guest)
>  # fio -name iops -rw=read -size=1G -iodepth 1 \
>    -filename /dev/vdb -ioengine libaio -direct=1 -bs=4096
>
> - Sequential write (Running inside guest)
>  # fio -name iops -rw=write -size=1G -iodepth 1 \
>    -filename /dev/vdb -ioengine libaio -direct=1 -bs=4096
>
> For each one, I tested 3 times to get the average.
>
> Result:
>
> seqread with qemu-kvm 0.14.1   67,0 MByte/s
> seqread with qemu-kvm 1.0      30,9 MByte/s
>
> seqwrite with qemu-kvm 0.14.1  65,8 MByte/s
> seqwrite with qemu-kvm 1.0     30,5 MByte/s

Please retry with the following commit or simply qemu-kvm.git/master. Avi discovered a performance regression which was introduced when the block layer was converted to use coroutines:

$ git describe 39a7a362e16bb27e98738d63f24d1ab5811e26a8
v1.0-327-g39a7a36

(This commit is not in 1.0!)

Please post your qemu-kvm command-line.

67 MB/s sequential 4 KB read means 67 * 1024 / 4 = 17152 requests per second, so 58 microseconds per request.

Please post the fio output so we can double-check what is reported.

Stefan
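The back-of-the-envelope conversion in this reply can be written out as a small Python 3 sketch. It assumes, as the reply does, 1 MB = 1024 KB and a queue depth of 1, so the per-request latency is simply the inverse of the request rate. The function names are illustrative only.

```python
def requests_per_second(mbytes_per_s, block_kb):
    """Request rate for a given throughput and block size (1 MB = 1024 KB)."""
    return mbytes_per_s * 1024 / block_kb


def latency_us(req_per_s):
    """Average time per request in microseconds at queue depth 1."""
    return 1e6 / req_per_s


if __name__ == "__main__":
    # Throughput figures reported for sequential reads with 4 KB blocks.
    for label, mbps in (("qemu-kvm 0.14.1", 67.0), ("qemu-kvm 1.0", 30.9)):
        rate = requests_per_second(mbps, 4)
        print("%-16s %8.0f req/s  %6.1f us/request" % (label, rate, latency_us(rate)))
```

Expressing both results as per-request latency makes it easier to see how much time each request gained between the two versions.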
Q: Does linux kvm native tool support loading BIOS as the default loader now?
Hi all,

As far as I know, the native tool does not support loading a BIOS, so it does not support Windows. Is this supported now? If not, I may try to implement it.

Thanks,
Yang
Re: Q: Does linux kvm native tool support loading BIOS as the default loader now?
On Mon, Feb 13, 2012 at 08:14:22PM +0800, Yang Bai wrote:
> Hi all,
>
> As far as I know, the native tool does not support loading a BIOS, so it
> does not support Windows. Is this supported now? If not, I may try to
> implement it.

Not yet. There was a plan to implement seabios support, but nothing has been done so far. Feel free to implement such support.

Cyrill
Re: Q: Does linux kvm native tool support loading BIOS as the default loader now?
On Mon, Feb 13, 2012 at 08:14:22PM +0800, Yang Bai wrote:
> As far as I know, the native tool does not support loading a BIOS, so it
> does not support Windows. Is this supported now? If not, I may try to
> implement it.

On Mon, Feb 13, 2012 at 2:19 PM, Cyrill Gorcunov gorcu...@openvz.org wrote:
> Not yet. There was a plan to implement seabios support, but nothing has
> been done so far. Feel free to implement such support.

Yup, optional SeaBIOS support would be awesome!
Re: x86: kvmclock: abstract save/restore sched_clock_state
On (Fri) 10 Feb 2012 [21:58:47], Igor Mammedov wrote:
> BTW Amit, your config doesn't have CONFIG_KVM_GUEST set, which causes primary cpu clock to be uninitialized too in case of SMP kernel.

Interesting. I didn't notice that. However, if I enable that option, resume fails for me even the first time.

Amit
Re: Pe: [PATCH v5 1/3] virtio-scsi: first version
Hi Dor, James & Co,

On Mon, 2012-02-13 at 09:57 +0200, Dor Laor wrote:
> On 02/13/2012 09:05 AM, Christian Borntraeger wrote:
> > On 12/02/12 21:16, James Bottomley wrote:
> > > Well, no-one's yet answered the question I had about why.
> >
> > Just to give one example from a different angle: In the big datacenters tape libraries are still very important, and lots of them have a SCSI attachment. virtio-blk certainly is not the right way to handle those. Furthermore it seems even pretty hard to craft a virtio-tape since most of those libraries have vendor specific library controls (via sg). We would need to duplicate scsi generic (hint, hint :-)
> >
> > > virtio-scsi seems to be a basic duplication of virtio-blk except that it seems to fix some problems virtio-blk has. Namely queue parameter discovery, which virtio-blk doesn't seem to do. There may also be a reason to cut the stack lower down. Error handling is most often cited for this, but no-one's satisfactorily explained why it's better to do error handling in the guest instead of the host.
> > >
> > > Could someone please explain to me why you can't simply fix virtio-blk?
> >
> > I don't think that virtio-scsi will replace virtio-blk everywhere. For non-scsi block devices, image files or logical volumes virtio-blk seems to be the right approach, I think.
>
> +1
>
> virtio-scsi is superior w.r.t:
> - Device support: tapes, cdroms, other

AFAICT any type of non TYPE_DISK struct scsi_device passthrough is going to currently require virtio-scsi in order to work.

> - Does guest-host mapped multipath

The logic that comes with target_core_fabric_configfs.c and the native target control plane gives a host-side (tcm_vhost) fabric driver generic explicit/implicit ALUA multipath support by default. I think there are some interesting possibilities for paravirtualized ALUA multipath.. 8-)

> - Supports plenty of virtual disks mapped to the guest w/o need for a pci slot per each virtio-blk

Ouch, virtio-blk lacks multi-lun per pci slot support..?

> - offload fancy/new/sophisticated scsi commands from the guest to the storage array w/o need for qemu implementation. Example XCOPY.
> ...
> There are some more goodies like ability to support windows guest clustering w/o hacky versions of scsi pass through over virtio-blk.
>
> virtio-blk is also a candidate to change the request based towards bio based implementation, so sticking to it does not buy us too much.

MSFT cluster guests that require SPC-3 PR support can run today with tcm_loop LLD SCSI LUNs + SG_IO/BSG + right megasas QEMU HBA emulation, but I do agree this would be better served by virtio-scsi for guests that require SPC-3 PR support or passthrough.

--nab
Re: x86: kvmclock: abstract save/restore sched_clock_state
On (Fri) 10 Feb 2012 [13:43:05], Igor Mammedov wrote:
> Another thing is to try smp guest without kvmclock and see if it helps. It might be just something else.

Nope, it's related to kvmclock.

Amit
Re: Pe: [PATCH v5 1/3] virtio-scsi: first version
On 02/13/2012 02:40 PM, Nicholas A. Bellinger wrote:
> Hi Dor, James & Co,
>
> On Mon, 2012-02-13 at 09:57 +0200, Dor Laor wrote:
> > On 02/13/2012 09:05 AM, Christian Borntraeger wrote:
> > > On 12/02/12 21:16, James Bottomley wrote:
> > > > Well, no-one's yet answered the question I had about why.
> > >
> > > Just to give one example from a different angle: In the big datacenters tape libraries are still very important, and lots of them have a SCSI attachment. virtio-blk certainly is not the right way to handle those. Furthermore it seems even pretty hard to craft a virtio-tape since most of those libraries have vendor specific library controls (via sg). We would need to duplicate scsi generic (hint, hint :-)
> > >
> > > > virtio-scsi seems to be a basic duplication of virtio-blk except that it seems to fix some problems virtio-blk has. Namely queue parameter discovery, which virtio-blk doesn't seem to do. There may also be a reason to cut the stack lower down. Error handling is most often cited for this, but no-one's satisfactorily explained why it's better to do error handling in the guest instead of the host.
> > > >
> > > > Could someone please explain to me why you can't simply fix virtio-blk?
> > >
> > > I don't think that virtio-scsi will replace virtio-blk everywhere. For non-scsi block devices, image files or logical volumes virtio-blk seems to be the right approach, I think.
> >
> > +1
> >
> > virtio-scsi is superior w.r.t:
> > - Device support: tapes, cdroms, other
>
> AFAICT any type of non TYPE_DISK struct scsi_device passthrough is going to currently require virtio-scsi in order to work.
>
> > - Does guest-host mapped multipath
>
> The logic that comes with target_core_fabric_configfs.c and the native target control plane gives a host-side (tcm_vhost) fabric driver generic explicit/implicit ALUA multipath support by default. I think there are some interesting possibilities for paravirtualized ALUA multipath.. 8-)
>
> > - Supports plenty of virtual disks mapped to the guest w/o need for a pci slot per each virtio-blk
>
> Ouch, virtio-blk lacks multi-lun per pci slot support..?

Only if you use the pci multi-function option but that kills standard hot unplug

> > - offload fancy/new/sophisticated scsi commands from the guest to the storage array w/o need for qemu implementation. Example XCOPY.
> > ...
> > There are some more goodies like ability to support windows guest clustering w/o hacky versions of scsi pass through over virtio-blk.
> >
> > virtio-blk is also a candidate to change the request based towards bio based implementation, so sticking to it does not buy us too much.
>
> MSFT cluster guests that require SPC-3 PR support can run today with tcm_loop LLD SCSI LUNs + SG_IO/BSG + right megasas QEMU HBA emulation, but I do agree this would be better served by virtio-scsi for guests that require SPC-3 PR support or passthrough.
>
> --nab
Re: x86: kvmclock: abstract save/restore sched_clock_state
On (Fri) 10 Feb 2012 [10:33:37], Marcelo Tosatti wrote:
> On Fri, Feb 10, 2012 at 10:32:16AM -0200, Marcelo Tosatti wrote:
> > On Fri, Feb 10, 2012 at 03:32:11PM +0530, Amit Shah wrote:
> > > On (Thu) 09 Feb 2012 [16:13:29], Igor Mammedov wrote:
> > > > Stalls are probably caused by uninitialized percpu hv_clock, with following patch I don't see stalls. Although I might be just lucky.
> > > >
> > > > http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=e2971ac7e1d186af059e088d305496c5cb47d487
> > >
> > > Your commit does make things better, I don't see any stalls on the first resume. However, a subsequent s4 causes the stall to re-appear on resume, and this time there are no stall messages; the kernel just sits there spinning on something. I've not found the solution to this one yet (I had a commit similar to Marcelo's in the works, which got me to the previous works-but-stalls behaviour).
> >
> > I cannot reproduce it here. Suspend/resume are operating normally after several iterations. Igor do you see anything similar?
> >
> > Amit, can you please enable CONFIG_PRINTK_TIME=y and post a full dmesg (both during suspend and also the new kernel during resume).
>
> Also is it reproducible with UP guest?

Yes, it is.

Amit
Re: Pe: [PATCH v5 1/3] virtio-scsi: first version
On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
> Only if you use the pci multi-function option but that kills standard hot unplug

It doesn't kill it as such, rather you can't unplug luns individually.
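To make the multi-function point concrete, here is a minimal C sketch of how a PCI device/function number is composed: 5 bits of slot (device) number and 3 bits of function number, so one slot can carry up to eight functions. The example_* names are invented for this illustration. The last helper encodes an assumption taken from the reply above, namely that hot-unplug acts on a whole slot, which is why disks placed on different functions of one slot could not be unplugged individually.

```c
#include <assert.h>

#define EXAMPLE_PCI_SLOTS 32u /* 5-bit device (slot) number */
#define EXAMPLE_PCI_FUNCS 8u  /* 3-bit function number */

/* Pack a slot and function number into an 8-bit devfn value. */
unsigned example_devfn(unsigned slot, unsigned func)
{
    assert(slot < EXAMPLE_PCI_SLOTS && func < EXAMPLE_PCI_FUNCS);
    return (slot << 3) | func;
}

/* Extract the slot number from a devfn value. */
unsigned example_slot_of(unsigned devfn)
{
    return (devfn >> 3) & 0x1f;
}

/* Extract the function number from a devfn value. */
unsigned example_func_of(unsigned devfn)
{
    return devfn & 0x07;
}

/*
 * Assumption for this sketch: unplug is per slot, so two functions are
 * removed together exactly when they share a slot number.
 */
int example_same_hotplug_unit(unsigned devfn_a, unsigned devfn_b)
{
    return example_slot_of(devfn_a) == example_slot_of(devfn_b);
}
```

With this encoding, functions 0 through 7 of slot 3 all map to the same hotplug unit, while function 0 of slot 4 belongs to a different one.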
[PATCH v2 8/9] qemu-kvm: Use upstream kvm_irqchip_set_irq instead of kvm_set_irq
Functions are equivalent, let's switch.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/msi.c   |    2 +-
 hw/msix.c  |    2 +-
 kvm-stub.c |    2 +-
 kvm.h      |    3 +--
 qemu-kvm.c |   32 --------------------------------
 5 files changed, 4 insertions(+), 37 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index 3e623c2..7bb3e2f 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -350,7 +350,7 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
     }
 
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_set_irq(dev->msi_irq_entries[vector].gsi, 1, NULL);
+        kvm_irqchip_set_irq(kvm_state, dev->msi_irq_entries[vector].gsi, 1);
         return;
     }
 
diff --git a/hw/msix.c b/hw/msix.c
index 55ddbf4..7955221 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -494,7 +494,7 @@ void msix_notify(PCIDevice *dev, unsigned vector)
     }
 
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_set_irq(dev->msix_irq_entries[vector].gsi, 1, NULL);
+        kvm_irqchip_set_irq(kvm_state, dev->msix_irq_entries[vector].gsi, 1);
         return;
     }
 
diff --git a/kvm-stub.c b/kvm-stub.c
index 266dc4a..d22fcad 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -162,7 +162,7 @@ int kvm_irqchip_commit_routes(KVMState *s)
     return -ENOSYS;
 }
 
-int kvm_set_irq(int irq, int level, int *status)
+int kvm_irqchip_set_irq(KVMState *s, int irq, int level)
 {
     assert(0);
     return -ENOSYS;
diff --git a/kvm.h b/kvm.h
index b84aa40..3c3a510 100644
--- a/kvm.h
+++ b/kvm.h
@@ -228,13 +228,12 @@ int kvm_msi_message_del(KVMMsiMessage *msg);
 int kvm_msi_message_update(KVMMsiMessage *old, KVMMsiMessage *new);
 
 #ifndef NEED_CPU_H
+int kvm_irqchip_set_irq(KVMState *s, int irq, int level);
 int kvm_irqchip_commit_routes(KVMState *s);
 #endif
 
 int kvm_irqchip_in_kernel(void);
 
-int kvm_set_irq(int irq, int level, int *status);
-
 #ifdef NEED_CPU_H
 #include "qemu-kvm.h"
 #endif
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 10a313d..09a35f0 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -41,38 +41,6 @@ static inline void clear_gsi(KVMState *s, unsigned int gsi)
     }
 }
 
-#ifdef KVM_CAP_IRQCHIP
-
-int kvm_set_irq(int irq, int level, int *status)
-{
-    struct kvm_irq_level event;
-    int r;
-
-    if (!kvm_state->irqchip_in_kernel) {
-        return 0;
-    }
-    event.level = level;
-    event.irq = irq;
-    r = kvm_vm_ioctl(kvm_state, kvm_state->irqchip_inject_ioctl,
-                     &event);
-    if (r < 0) {
-        perror("kvm_set_irq");
-    }
-
-    if (status) {
-#ifdef KVM_CAP_IRQ_INJECT_STATUS
-        *status = (kvm_state->irqchip_inject_ioctl == KVM_IRQ_LINE) ?
-            1 : event.status;
-#else
-        *status = 1;
-#endif
-    }
-
-    return 1;
-}
-
-#endif
-
 #ifdef KVM_CAP_DEVICE_ASSIGNMENT
 int kvm_assign_pci_device(KVMState *s,
                           struct kvm_assigned_pci_dev *assigned_dev)
-- 
1.7.3.4
[PATCH v2 2/9] qemu-kvm: Use machine options to configure qemu-kvm defaults
Upstream is moving towards this mechanism, so start using it in qemu-kvm already to configure the specific defaults: kvm enabled on, just like in-kernel irqchips.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/pc_piix.c |    7 +++++++
 kvm-all.c    |    8 ++++++++
 vl.c         |    9 +++------
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index c9c580c..156fcc8 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -394,6 +394,7 @@ static QEMUMachine pc_machine_v1_0 = {
     .init = pc_init_pci,
     .max_cpus = 255,
     .is_default = 1,
+    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
 };
 
 static QEMUMachine pc_machine_v0_15 = {
@@ -409,6 +410,7 @@ static QEMUMachine pc_machine_v0_14 = {
     .desc = "Standard PC",
     .init = pc_init_pci,
     .max_cpus = 255,
+    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
     .compat_props = (GlobalProperty[]) {
         {
             .driver = "qxl",
@@ -444,6 +446,7 @@ static QEMUMachine pc_machine_v0_13 = {
     .desc = "Standard PC",
     .init = pc_init_pci_no_kvmclock,
     .max_cpus = 255,
+    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
     .compat_props = (GlobalProperty[]) {
         {
             .driver = "virtio-9p-pci",
@@ -491,6 +494,7 @@ static QEMUMachine pc_machine_v0_12 = {
     .desc = "Standard PC",
     .init = pc_init_pci_no_kvmclock,
     .max_cpus = 255,
+    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
     .compat_props = (GlobalProperty[]) {
         {
             .driver = "virtio-serial-pci",
@@ -542,6 +546,7 @@ static QEMUMachine pc_machine_v0_11 = {
     .desc = "Standard PC, qemu 0.11",
     .init = pc_init_pci_no_kvmclock,
     .max_cpus = 255,
+    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
     .compat_props = (GlobalProperty[]) {
         {
             .driver = "virtio-blk-pci",
@@ -601,6 +606,7 @@ static QEMUMachine pc_machine_v0_10 = {
     .desc = "Standard PC, qemu 0.10",
     .init = pc_init_pci_no_kvmclock,
     .max_cpus = 255,
+    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
     .compat_props = (GlobalProperty[]) {
         {
             .driver = "virtio-blk-pci",
@@ -672,6 +678,7 @@ static QEMUMachine isapc_machine = {
     .desc = "ISA-only PC",
     .init = pc_init_isa,
     .max_cpus = 1,
+    .default_machine_opts = "accel=kvm,kernel_irqchip=on",
 };
 
 #ifdef CONFIG_XEN
diff --git a/kvm-all.c b/kvm-all.c
index ae89389..515ba6e 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -887,6 +887,7 @@ int kvm_init(void)
     const KVMCapabilityInfo *missing_cap;
     int ret;
     int i;
+    QemuOptsList *list;
 
     s = g_malloc0(sizeof(KVMState));
 
@@ -973,6 +974,13 @@ int kvm_init(void)
     s->pit_state2 = kvm_check_extension(s, KVM_CAP_PIT_STATE2);
 #endif
 
+    list = qemu_find_opts("machine");
+    if (!QTAILQ_EMPTY(&list->head) &&
+        !qemu_opt_get_bool(QTAILQ_FIRST(&list->head),
+                           "kernel_irqchip", false)) {
+        kvm_irqchip = 0;
+    }
+
     ret = kvm_arch_init(s);
     if (ret < 0) {
         goto err;
diff --git a/vl.c b/vl.c
index c5994ee..c3b4037 100644
--- a/vl.c
+++ b/vl.c
@@ -2040,13 +2040,8 @@ static int configure_accelerator(void)
     }
 
     if (p == NULL) {
-#ifdef CONFIG_KVM_OPTIONS
-        /* Use the default accelerator, kvm */
-        p = "kvm";
-#else
         /* Use the default accelerator, tcg */
         p = "tcg";
-#endif
     }
 
     while (!accel_initalised && *p != '\0') {
@@ -2908,7 +2903,9 @@ int main(int argc, char **argv, char **envp)
                 break;
 #ifdef CONFIG_KVM_OPTIONS
             case QEMU_OPTION_no_kvm_irqchip: {
-                kvm_irqchip = 0;
+                olist = qemu_find_opts("machine");
+                qemu_opts_reset(olist);
+                qemu_opts_parse(olist, "accel=kvm,kernel_irqchip=off", 0);
                 break;
             }
             case QEMU_OPTION_no_kvm_pit: {
-- 
1.7.3.4
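The decision this patch adds to kvm_init() can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions: the model_* functions are invented for illustration, they work on a plain comma-separated string rather than on a QemuOptsList, and model_opt_get_bool() is not QEMU's qemu_opt_get_bool() (in particular, it silently falls back to the default on an unrecognised value).

```c
#include <assert.h>
#include <string.h>

/*
 * Look up a boolean "key=on|off" entry in a comma-separated option string.
 * Returns defval if the key is absent or its value is neither "on" nor
 * "off". Standalone model for illustration, not a QEMU API.
 */
int model_opt_get_bool(const char *opts, const char *key, int defval)
{
    size_t klen = strlen(key);
    const char *p = opts;

    assert(klen > 0);
    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        /* Match "key=" at the start of this comma-separated entry. */
        if (len > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            const char *val = p + klen + 1;
            size_t vlen = len - klen - 1;

            if (vlen == 2 && strncmp(val, "on", 2) == 0) {
                return 1;
            }
            if (vlen == 3 && strncmp(val, "off", 3) == 0) {
                return 0;
            }
            return defval;
        }
        p = end ? end + 1 : NULL;
    }
    return defval;
}

/*
 * Mirror of the check added to kvm_init(): the in-kernel irqchip stays
 * enabled unless machine options are present and kernel_irqchip is not
 * "on". A NULL or empty string stands in for an empty option list.
 */
int model_use_kernel_irqchip(const char *machine_opts)
{
    if (machine_opts == NULL || *machine_opts == '\0') {
        return 1; /* no machine options at all: keep the default of 1 */
    }
    return model_opt_get_bool(machine_opts, "kernel_irqchip", 0);
}
```

In this model the machine default "accel=kvm,kernel_irqchip=on" keeps the in-kernel irqchip, the string that -no-kvm-irqchip now parses ("accel=kvm,kernel_irqchip=off") disables it, and a non-empty option string that omits kernel_irqchip also disables it, because the lookup defaults to false.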
[PATCH v2 0/9] qemu-kvm: Switch to upstream irqchip services
Now that upstream has basic irqchip support, we can make use of it for qemu-kvm as well, removing another 700 lines of code here.

This series depends on "apic: Fix legacy vmstate loading for KVM" which is currently awaiting upstream merge via uq/master.

Jan Kiszka (9):
  qemu-kvm: Move kvm_create_pit out of arch init code
  qemu-kvm: Use machine options to configure qemu-kvm defaults
  qemu-kvm: Use upstream irq routing services
  qemu-kvm: Use upstream kvm_irqchip_create
  qemu-kvm: Use upstream kvm-ioapic
  qemu-kvm: Use upstream kvm-i8259
  qemu-kvm: Drop unused kvm_get/set_irqchip
  qemu-kvm: Use upstream kvm_irqchip_set_irq instead of kvm_set_irq
  qemu-kvm: Use upstream kvm-apic

 Makefile.objs          |    2 +-
 Makefile.target        |    8 +-
 hw/apic.c              |  151 +---
 hw/device-assignment.c |   10 +-
 hw/i8254-kvm.c         |    3 +
 hw/i8259.c             |  108 -
 hw/ioapic.c            |   75 +-
 hw/isa-bus.c           |    2 +-
 hw/msi.c               |    6 +-
 hw/msix.c              |   10 +-
 hw/pc.c                |    7 +-
 hw/pc_piix.c           |   23 ++
 kvm-all.c              |   17 +
 kvm-stub.c             |    6 +-
 kvm.h                  |    9 +-
 qemu-kvm-x86.c         |   85 +---
 qemu-kvm.c             |  204 +---
 qemu-kvm.h             |   72 +
 target-i386/kvm.c      |   17
 vl.c                   |   10 +--
 20 files changed, 58 insertions(+), 767 deletions(-)

-- 
1.7.3.4
[PATCH v2 5/9] qemu-kvm: Use upstream kvm-ioapic
Drop the qemu-kvm version in favor of the equivalent upstream implementation.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/ioapic.c  |   75 +-
 hw/pc_piix.c |    7 +
 2 files changed, 2 insertions(+), 80 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index 3f86eff..79549f8 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -26,8 +26,6 @@
 #include "ioapic.h"
 #include "ioapic_internal.h"
 
-#include "kvm.h"
-
 //#define DEBUG_IOAPIC
 
 #ifdef DEBUG_IOAPIC
@@ -222,75 +220,6 @@ ioapic_mem_write(void *opaque, target_phys_addr_t addr, uint64_t val,
     }
 }
 
-static void kvm_kernel_ioapic_save_to_user(IOAPICCommonState *s)
-{
-#if defined(KVM_CAP_IRQCHIP) && defined(TARGET_I386)
-    struct kvm_irqchip chip;
-    struct kvm_ioapic_state *kioapic;
-    int i;
-
-    chip.chip_id = KVM_IRQCHIP_IOAPIC;
-    kvm_get_irqchip(kvm_state, &chip);
-    kioapic = &chip.chip.ioapic;
-
-    s->id = kioapic->id;
-    s->ioregsel = kioapic->ioregsel;
-    s->irr = kioapic->irr;
-    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
-        s->ioredtbl[i] = kioapic->redirtbl[i].bits;
-    }
-#endif
-}
-
-static void kvm_kernel_ioapic_load_from_user(IOAPICCommonState *s)
-{
-#if defined(KVM_CAP_IRQCHIP) && defined(TARGET_I386)
-    struct kvm_irqchip chip;
-    struct kvm_ioapic_state *kioapic;
-    int i;
-
-    chip.chip_id = KVM_IRQCHIP_IOAPIC;
-    kioapic = &chip.chip.ioapic;
-
-    kioapic->id = s->id;
-    kioapic->ioregsel = s->ioregsel;
-    kioapic->base_address = s->busdev.mmio[0].addr;
-    kioapic->irr = s->irr;
-    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
-        kioapic->redirtbl[i].bits = s->ioredtbl[i];
-    }
-
-    kvm_set_irqchip(kvm_state, &chip);
-#endif
-}
-
-static void kvm_ioapic_pre_save(IOAPICCommonState *s)
-{
-
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_kernel_ioapic_save_to_user(s);
-    }
-}
-
-static void kvm_ioapic_post_load(IOAPICCommonState *s)
-{
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_kernel_ioapic_load_from_user(s);
-    }
-}
-
-static void ioapic_reset(DeviceState *d)
-{
-    IOAPICCommonState *s = DO_UPCAST(IOAPICCommonState, busdev.qdev, d);
-
-    ioapic_reset_common(d);
-#ifdef KVM_CAP_IRQCHIP
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        kvm_kernel_ioapic_load_from_user(s);
-    }
-#endif
-}
-
 static const MemoryRegionOps ioapic_io_ops = {
     .read = ioapic_mem_read,
     .write = ioapic_mem_write,
@@ -312,9 +241,7 @@ static void ioapic_class_init(ObjectClass *klass, void *data)
     DeviceClass *dc = DEVICE_CLASS(klass);
 
     k->init = ioapic_init;
-    k->pre_save = kvm_ioapic_pre_save;
-    k->post_load = kvm_ioapic_post_load;
-    dc->reset = ioapic_reset;
+    dc->reset = ioapic_reset_common;
 }
 
 static TypeInfo ioapic_info = {
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index ef0202a..58bec18 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -47,8 +47,6 @@
 #  include <xen/hvm/hvm_info_table.h>
 #endif
 
-qemu_irq *ioapic_irq_hack;
-
 #define MAX_IDE_BUS 2
 
 static const int ide_iobase[MAX_IDE_BUS] = { 0x1f0, 0x170 };
@@ -108,12 +106,9 @@ static void ioapic_init(GSIState *gsi_state)
     SysBusDevice *d;
     unsigned int i;
 
-#ifdef UNUSED_UPSTREAM_KVM
     if (kvm_enabled() && kvm_irqchip_in_kernel()) {
         dev = qdev_create(NULL, "kvm-ioapic");
-    } else
-#endif
-    {
+    } else {
         dev = qdev_create(NULL, "ioapic");
     }
     qdev_init_nofail(dev);
-- 
1.7.3.4
[PATCH v2 3/9] qemu-kvm: Use upstream irq routing services
Replace qemu-kvm's versions of kvm_add_irq_route, kvm_add_routing_entry, kvm_init_irq_routing, kvm_arch_init_irq_routing, and kvm_commit_irq_routes with the corresponding upstream services. Until the MSI API is refactored, we only need to export kvm_add_routing_entry for this.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/device-assignment.c |   10 ++--
 hw/msi.c               |    4 +-
 hw/msix.c              |    8 ++--
 hw/pc.c                |    2 +-
 hw/pc_piix.c           |    4 --
 kvm-all.c              |   10 +---
 kvm-stub.c             |    4 +-
 kvm.h                  |    6 +-
 qemu-kvm-x86.c         |   50
 qemu-kvm.c             |  117 +--
 qemu-kvm.h             |   19 +---
 target-i386/kvm.c      |    2 -
 12 files changed, 25 insertions(+), 211 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 584cbb9..d8019fe 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -943,8 +943,8 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
         }
         assigned_dev->entry->gsi = r;
 
-        kvm_add_routing_entry(assigned_dev->entry);
-        if (kvm_commit_irq_routes() < 0) {
+        kvm_add_routing_entry(kvm_state, assigned_dev->entry);
+        if (kvm_irqchip_commit_routes(kvm_state) < 0) {
             perror("assigned_dev_update_msi: kvm_commit_irq_routes");
             assigned_dev->cap.state &= ~ASSIGNED_DEVICE_MSI_ENABLED;
             return;
@@ -1028,7 +1028,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
         DEBUG("MSI-X vector %d, gsi %d, addr %08x_%08x, data %08x\n", i,
               r, entry->addr_hi, entry->addr_lo, entry->data);
 
-        kvm_add_routing_entry(&adev->entry[i]);
+        kvm_add_routing_entry(kvm_state, &adev->entry[i]);
 
         msix_entry.gsi = adev->entry[i].gsi;
         msix_entry.entry = i;
@@ -1039,7 +1039,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
         }
     }
 
-    if (r == 0 && kvm_commit_irq_routes() < 0) {
+    if (r == 0 && kvm_irqchip_commit_routes(kvm_state) < 0) {
         perror("assigned_dev_update_msix_mmio: kvm_commit_irq_routes");
         return -EINVAL;
     }
@@ -1504,7 +1504,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
             return;
         }
 
-        ret = kvm_commit_irq_routes();
+        ret = kvm_irqchip_commit_routes(kvm_state);
         if (ret) {
             fprintf(stderr,
                     "Error committing irq routes (%d)\n", ret);
diff --git a/hw/msi.c b/hw/msi.c
index 5c179c2..3e623c2 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -178,7 +178,7 @@ static void kvm_msi_update(PCIDevice *dev)
     }
     dev->msi_entries_nr = nr_vectors;
     if (changed) {
-        r = kvm_commit_irq_routes();
+        r = kvm_irqchip_commit_routes(kvm_state);
         if (r) {
             fprintf(stderr, "%s: kvm_commit_irq_routes failed: %s\n", __func__,
                     strerror(-r));
@@ -196,7 +196,7 @@ static void kvm_msi_free(PCIDevice *dev)
         kvm_msi_message_del(&dev->msi_irq_entries[vector]);
     }
     if (dev->msi_entries_nr > 0) {
-        kvm_commit_irq_routes();
+        kvm_irqchip_commit_routes(kvm_state);
     }
     dev->msi_entries_nr = 0;
 }
diff --git a/hw/msix.c b/hw/msix.c
index 6e40957..55ddbf4 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -49,7 +49,7 @@ static void kvm_msix_free(PCIDevice *dev)
         }
     }
     if (changed) {
-        kvm_commit_irq_routes();
+        kvm_irqchip_commit_routes(kvm_state);
     }
 }
 
@@ -89,7 +89,7 @@ static void kvm_msix_update(PCIDevice *dev, int vector,
     }
     if (r > 0) {
         *entry = new_entry;
-        r = kvm_commit_irq_routes();
+        r = kvm_irqchip_commit_routes(kvm_state);
         if (r) {
             fprintf(stderr, "%s: kvm_commit_irq_routes failed: %s\n", __func__,
                     strerror(-r));
@@ -110,7 +110,7 @@ static int kvm_msix_vector_add(PCIDevice *dev, unsigned vector)
         return r;
     }
 
-    r = kvm_commit_irq_routes();
+    r = kvm_irqchip_commit_routes(kvm_state);
     if (r < 0) {
         fprintf(stderr, "%s: kvm_commit_irq_routes failed: %s\n", __func__, strerror(-r));
         return r;
@@ -121,7 +121,7 @@ static int kvm_msix_vector_add(PCIDevice *dev, unsigned vector)
 static void kvm_msix_vector_del(PCIDevice *dev, unsigned vector)
 {
     kvm_msi_message_del(&dev->msix_irq_entries[vector]);
-    kvm_commit_irq_routes();
+    kvm_irqchip_commit_routes(kvm_state);
 }
 
 /* Add MSI-X capability to the config space for the device. */
diff --git a/hw/pc.c b/hw/pc.c
index 70abb6c..e38a63d 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1158,7 +1158,7 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi,
 
     register_ioport_write(0xf0, 1, 1, ioportF0_write, NULL);
 
-    if (!no_hpet) {
+    if (!no_hpet && (!kvm_irqchip_in_kernel() || kvm_has_pit_state2())) {
[PATCH v2 7/9] qemu-kvm: Drop unused kvm_get/set_irqchip
No users remaining.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 qemu-kvm.c |   28 ----------------------------
 qemu-kvm.h |   23 -----------------------
 2 files changed, 0 insertions(+), 51 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 8f1b760..10a313d 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -71,34 +71,6 @@ int kvm_set_irq(int irq, int level, int *status)
     return 1;
 }
 
-int kvm_get_irqchip(KVMState *s, struct kvm_irqchip *chip)
-{
-    int r;
-
-    if (!s->irqchip_in_kernel) {
-        return 0;
-    }
-    r = kvm_vm_ioctl(s, KVM_GET_IRQCHIP, chip);
-    if (r < 0) {
-        perror("kvm_get_irqchip\n");
-    }
-    return r;
-}
-
-int kvm_set_irqchip(KVMState *s, struct kvm_irqchip *chip)
-{
-    int r;
-
-    if (!s->irqchip_in_kernel) {
-        return 0;
-    }
-    r = kvm_vm_ioctl(s, KVM_SET_IRQCHIP, chip);
-    if (r < 0) {
-        perror("kvm_set_irqchip\n");
-    }
-    return r;
-}
-
 #endif
 
 #ifdef KVM_CAP_DEVICE_ASSIGNMENT
diff --git a/qemu-kvm.h b/qemu-kvm.h
index cd5e3cc..433e2fe 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -32,29 +32,6 @@
 
 #include "kvm.h"
 
-/*!
- * \brief Dump in kernel IRQCHIP contents
- *
- * Dump one of the in kernel irq chip devices, including PIC (master/slave)
- * and IOAPIC into a kvm_irqchip structure
- *
- * \param kvm Pointer to the current kvm_context
- * \param chip The irq chip device to be dumped
- */
-int kvm_get_irqchip(KVMState *s, struct kvm_irqchip *chip);
-
-/*!
- * \brief Set in kernel IRQCHIP contents
- *
- * Write one of the in kernel irq chip devices, including PIC (master/slave)
- * and IOAPIC
- *
- *
- * \param kvm Pointer to the current kvm_context
- * \param chip THe irq chip device to be written
- */
-int kvm_set_irqchip(KVMState *s, struct kvm_irqchip *chip);
-
 #if defined(__i386__) || defined(__x86_64__)
 /*!
  * \brief Get in kernel local APIC for vcpu
-- 
1.7.3.4
[PATCH v2 4/9] qemu-kvm: Use upstream kvm_irqchip_create
Drop kvm_create_irqchip in favor of the equivalent upstream version. This also allows dropping the kvm_irqchip global variable.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 kvm-all.c  |   15 ---------------
 qemu-kvm.c |   29 -----------------------------
 qemu-kvm.h |    3 ---
 vl.c       |    1 -
 4 files changed, 0 insertions(+), 48 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index afcad44..606bd02 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -843,7 +843,6 @@ static void kvm_init_irq_routing(KVMState *s)
 
 static int kvm_irqchip_create(KVMState *s)
 {
-#ifdef UNUSED_UPSTREAM_KVM
     QemuOptsList *list = qemu_find_opts("machine");
     int ret;
 
@@ -867,7 +866,6 @@ static int kvm_irqchip_create(KVMState *s)
     s->irqchip_in_kernel = 1;
 
     kvm_init_irq_routing(s);
-#endif
 
     return 0;
 }
@@ -881,7 +879,6 @@ int kvm_init(void)
     const KVMCapabilityInfo *missing_cap;
     int ret;
     int i;
-    QemuOptsList *list;
 
     s = g_malloc0(sizeof(KVMState));
 
@@ -968,13 +965,6 @@ int kvm_init(void)
     s->pit_state2 = kvm_check_extension(s, KVM_CAP_PIT_STATE2);
 #endif
 
-    list = qemu_find_opts("machine");
-    if (!QTAILQ_EMPTY(&list->head) &&
-        !qemu_opt_get_bool(QTAILQ_FIRST(&list->head),
-                           "kernel_irqchip", false)) {
-        kvm_irqchip = 0;
-    }
-
     ret = kvm_arch_init(s);
     if (ret < 0) {
         goto err;
@@ -990,11 +980,6 @@ int kvm_init(void)
 
     s->many_ioeventfds = kvm_check_many_ioeventfds();
 
-    ret = kvm_create_irqchip(s);
-    if (ret < 0) {
-        return ret;
-    }
-
     cpu_interrupt_handler = kvm_handle_interrupt;
 
     return 0;
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 37af80f..8f1b760 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -41,35 +41,6 @@ static inline void clear_gsi(KVMState *s, unsigned int gsi)
     }
 }
 
-int kvm_create_irqchip(KVMState *s)
-{
-#ifdef KVM_CAP_IRQCHIP
-    int r;
-
-    if (!kvm_irqchip || !kvm_check_extension(s, KVM_CAP_IRQCHIP)) {
-        return 0;
-    }
-
-    r = kvm_vm_ioctl(s, KVM_CREATE_IRQCHIP);
-    if (r < 0) {
-        fprintf(stderr, "Create kernel PIC irqchip failed\n");
-        return r;
-    }
-
-    s->irqchip_inject_ioctl = KVM_IRQ_LINE;
-#if defined(KVM_CAP_IRQ_INJECT_STATUS) && defined(KVM_IRQ_LINE_STATUS)
-    if (kvm_check_extension(s, KVM_CAP_IRQ_INJECT_STATUS)) {
-        s->irqchip_inject_ioctl = KVM_IRQ_LINE_STATUS;
-    }
-#endif
-    s->irqchip_in_kernel = 1;
-
-    kvm_init_irq_routing(s);
-#endif
-
-    return 0;
-}
-
 #ifdef KVM_CAP_IRQCHIP
 
 int kvm_set_irq(int irq, int level, int *status)
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 6235800..cd5e3cc 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -32,8 +32,6 @@
 
 #include "kvm.h"
 
-int kvm_create_irqchip(KVMState *s);
-
 /*!
  * \brief Dump in kernel IRQCHIP contents
  *
@@ -243,7 +241,6 @@ int kvm_arch_set_ioport_access(unsigned long start, unsigned long size,
 
 int kvm_create_pit(KVMState *s);
 
-extern int kvm_irqchip;
 extern int kvm_pit_reinject;
 extern unsigned int kvm_shadow_memory;
 
diff --git a/vl.c b/vl.c
index c3b4037..98d29ce 100644
--- a/vl.c
+++ b/vl.c
@@ -2173,7 +2173,6 @@ static void free_and_trace(gpointer mem)
 }
 
 #ifdef CONFIG_KVM_OPTIONS
-int kvm_irqchip = 1;
 int kvm_pit_reinject = 1;
 #endif
 
-- 
1.7.3.4
[PATCH v2 9/9] qemu-kvm: Use upstream kvm-apic
Drop the qemu-kvm version in favor of the equivalent upstream implementation.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/apic.c         |  151 +---
 hw/pc.c           |    5 +--
 qemu-kvm-x86.c    |   31 ---
 qemu-kvm.h        |   25 -
 target-i386/kvm.c |    8 ---
 5 files changed, 4 insertions(+), 216 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index b767b87..086c544 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -22,7 +22,6 @@
 #include "host-utils.h"
 #include "trace.h"
 #include "pc.h"
-#include "kvm.h"
 
 #define MAX_APIC_WORDS 8
 
@@ -133,35 +132,9 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
     }
 }
 
-static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
-
-static void kvm_irqchip_deliver_nmi(void *p)
-{
-    APICCommonState *s = p;
-    struct kvm_lapic_state klapic;
-    uint32_t lvt;
-
-    kvm_get_lapic(s->cpu_env, &klapic);
-    lvt = kapic_reg(&klapic, 0x32 + APIC_LVT_LINT1);
-
-    if (lvt & APIC_LVT_MASKED) {
-        return;
-    }
-
-    if (((lvt >> 8) & 7) != APIC_DM_NMI) {
-        return;
-    }
-
-    kvm_vcpu_ioctl(s->cpu_env, KVM_NMI);
-}
-
 static void apic_external_nmi(APICCommonState *s)
 {
-    if (kvm_irqchip_in_kernel()) {
-        run_on_cpu(s->cpu_env, kvm_irqchip_deliver_nmi, s);
-    } else {
-        apic_local_deliver(s, APIC_LVT_LINT1);
-    }
+    apic_local_deliver(s, APIC_LVT_LINT1);
 }
 
 #define foreach_apic(apic, deliver_bitmask, code) \
@@ -254,11 +227,8 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
 
 static void apic_set_base(APICCommonState *s, uint64_t val)
 {
-    if (kvm_enabled() && kvm_irqchip_in_kernel())
-        s->apicbase = val;
-    else
-        s->apicbase = (val & 0xfffff000) |
-            (s->apicbase & (MSR_IA32_APICBASE_BSP | MSR_IA32_APICBASE_ENABLE));
+    s->apicbase = (val & 0xfffff000) |
+        (s->apicbase & (MSR_IA32_APICBASE_BSP | MSR_IA32_APICBASE_ENABLE));
     /* if disabled, cannot be enabled again */
     if (!(val & MSR_IA32_APICBASE_ENABLE)) {
         s->apicbase &= ~MSR_IA32_APICBASE_ENABLE;
@@ -270,9 +240,6 @@ static void apic_set_base(APICCommonState *s, uint64_t val)
 static void apic_set_tpr(APICCommonState *s, uint8_t val)
 {
     s->tpr = (val & 0x0f) << 4;
-    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-        return;
-    }
     apic_update_irq(s);
 }
 
@@ -770,120 +737,8 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
     }
 }
 
-#ifdef KVM_CAP_IRQCHIP
-
-static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id)
-{
-    return *((uint32_t *) (kapic->regs + (reg_id << 4)));
-}
-
-static inline void kapic_set_reg(struct kvm_lapic_state *kapic,
-                                 int reg_id, uint32_t val)
-{
-    *((uint32_t *) (kapic->regs + (reg_id << 4))) = val;
-}
-
-static void kvm_kernel_lapic_save_to_user(APICCommonState *s)
-{
-    struct kvm_lapic_state apic;
-    struct kvm_lapic_state *kapic = &apic;
-    int i, v;
-
-    kvm_get_lapic(s->cpu_env, kapic);
-
-    s->id = kapic_reg(kapic, 0x2) >> 24;
-    s->tpr = kapic_reg(kapic, 0x8);
-    s->arb_id = kapic_reg(kapic, 0x9);
-    s->log_dest = kapic_reg(kapic, 0xd) >> 24;
-    s->dest_mode = kapic_reg(kapic, 0xe) >> 28;
-    s->spurious_vec = kapic_reg(kapic, 0xf);
-    for (i = 0; i < 8; i++) {
-        s->isr[i] = kapic_reg(kapic, 0x10 + i);
-        s->tmr[i] = kapic_reg(kapic, 0x18 + i);
-        s->irr[i] = kapic_reg(kapic, 0x20 + i);
-    }
-    s->esr = kapic_reg(kapic, 0x28);
-    s->icr[0] = kapic_reg(kapic, 0x30);
-    s->icr[1] = kapic_reg(kapic, 0x31);
-    for (i = 0; i < APIC_LVT_NB; i++)
-        s->lvt[i] = kapic_reg(kapic, 0x32 + i);
-    s->initial_count = kapic_reg(kapic, 0x38);
-    s->divide_conf = kapic_reg(kapic, 0x3e);
-
-    v = (s->divide_conf & 3) | ((s->divide_conf >> 1) & 4);
-    s->count_shift = (v + 1) & 7;
-
-    s->initial_count_load_time = qemu_get_clock_ns(vm_clock);
-    apic_next_timer(s, s->initial_count_load_time);
-}
-
-static void kvm_kernel_lapic_load_from_user(APICCommonState *s)
-{
-    struct kvm_lapic_state apic;
-    struct kvm_lapic_state *klapic = &apic;
-    int i;
-
-    memset(klapic, 0, sizeof apic);
-    kapic_set_reg(klapic, 0x2, s->id << 24);
-    kapic_set_reg(klapic, 0x8, s->tpr);
-    kapic_set_reg(klapic, 0xd, s->log_dest << 24);
-    kapic_set_reg(klapic, 0xe, s->dest_mode << 28 | 0x0fffffff);
-    kapic_set_reg(klapic, 0xf, s->spurious_vec);
-    for (i = 0; i < 8; i++) {
-        kapic_set_reg(klapic, 0x10 + i, s->isr[i]);
-        kapic_set_reg(klapic, 0x18 + i, s->tmr[i]);
-        kapic_set_reg(klapic, 0x20 + i, s->irr[i]);
-    }
-    kapic_set_reg(klapic, 0x28, s->esr);
-    kapic_set_reg(klapic, 0x30, s->icr[0]);
-    kapic_set_reg(klapic, 0x31, s->icr[1]);
-    for (i = 0; i < APIC_LVT_NB; i++)
-        kapic_set_reg(klapic, 0x32 + i, s->lvt[i]);
-    kapic_set_reg(klapic, 0x38, s->initial_count);
-    kapic_set_reg(klapic, 0x3e, s->divide_conf);
-
-
[PATCH v2 6/9] qemu-kvm: Use upstream kvm-i8259
Drop the qemu-kvm version in favor of the equivalent upstream implementation. This allows to move the i8259 back into the hwlib. Note that this also drops the testdev hack and restores proper isa_get_irq. If testdev scripts exist that inject IRQ15, they need fixing. Testing for these interrupts on the PIIX3 makes no practical sense anyway as those lines are unused. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- Makefile.objs |2 +- Makefile.target |8 ++-- hw/i8259.c | 108 --- hw/isa-bus.c|2 +- hw/pc_piix.c|5 +-- 5 files changed, 7 insertions(+), 118 deletions(-) diff --git a/Makefile.objs b/Makefile.objs index ee6b15d..2f70b84 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -229,7 +229,7 @@ hw-obj-$(CONFIG_APPLESMC) += applesmc.o hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o ccid-card-passthru.o hw-obj-$(CONFIG_SMARTCARD_NSS) += ccid-card-emulated.o hw-obj-$(CONFIG_USB_REDIR) += usb-redir.o -# hw-obj-$(CONFIG_I8259) += i8259_common.o i8259.o +hw-obj-$(CONFIG_I8259) += i8259_common.o i8259.o # PPC devices hw-obj-$(CONFIG_PREP_PCI) += prep_pci.o diff --git a/Makefile.target b/Makefile.target index f644762..b0ff38e 100644 --- a/Makefile.target +++ b/Makefile.target @@ -239,7 +239,7 @@ obj-$(CONFIG_IVSHMEM) += ivshmem.o obj-y += device-hotplug.o # Hardware support -obj-i386-y += mc146818rtc.o pc.o i8259_common.o i8259.o +obj-i386-y += mc146818rtc.o pc.o obj-i386-y += sga.o apic_common.o apic.o ioapic_common.o ioapic.o piix_pci.o obj-i386-y += vmport.o obj-i386-y += pci-hotplug.o smbios.o wdt_ib700.o @@ -257,7 +257,7 @@ obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o # shared objects obj-ppc-y = ppc.o ppc_booke.o # PREP target -obj-ppc-y += mc146818rtc.o i8259_common.o i8259.o +obj-ppc-y += mc146818rtc.o obj-ppc-y += ppc_prep.o # OldWorld PowerMac obj-ppc-y += ppc_oldworld.o @@ -312,7 +312,7 @@ obj-mips-y += pcspk.o i8254.o obj-mips-y += acpi.o acpi_piix4.o obj-mips-y += mips_addr.o mips_timer.o mips_int.o obj-mips-y += jazz_led.o -obj-mips-y += 
gt64xxx.o mc146818rtc.o i8259_common.o i8259.o +obj-mips-y += gt64xxx.o mc146818rtc.o obj-mips-$(CONFIG_FULONG) += bonito.o vt82c686.o mips_fulong2e.o obj-microblaze-y = petalogix_s3adsp1800_mmu.o @@ -391,7 +391,7 @@ obj-m68k-y += m68k-semi.o dummy_m68k.o obj-s390x-y = s390-virtio-bus.o s390-virtio.o -obj-alpha-y = mc146818rtc.o i8259_common.o i8259.o +obj-alpha-y = mc146818rtc.o obj-alpha-y += alpha_pci.o alpha_dp264.o alpha_typhoon.o obj-xtensa-y += xtensa_pic.o diff --git a/hw/i8259.c b/hw/i8259.c index cfffbee..7ae5380 100644 --- a/hw/i8259.c +++ b/hw/i8259.c @@ -28,11 +28,6 @@ #include qemu-timer.h #include i8259_internal.h -#include kvm.h -#include apic_internal.h - -static void kvm_i8259_set_irq(void *opaque, int irq, int level); - /* debug PIC */ //#define DEBUG_PIC @@ -226,17 +221,9 @@ int pic_read_irq(DeviceState *d) return intno; } -static int kvm_kernel_pic_load_from_user(PICCommonState *s); - static void pic_init_reset(PICCommonState *s) { pic_reset_common(s); - -if (kvm_enabled() kvm_irqchip_in_kernel()) { -kvm_kernel_pic_load_from_user(s); -return; -} - pic_update_irq(s); } @@ -393,22 +380,6 @@ static uint64_t elcr_ioport_read(void *opaque, target_phys_addr_t addr, return s-elcr; } -static void kvm_kernel_pic_save_to_user(PICCommonState *s); - -static void kvm_pic_pre_save(PICCommonState *s) -{ -if (kvm_enabled() kvm_irqchip_in_kernel()) { -kvm_kernel_pic_save_to_user(s); -} -} - -static void kvm_pic_post_load(PICCommonState *s) -{ -if (kvm_enabled() kvm_irqchip_in_kernel()) { -kvm_kernel_pic_load_from_user(s); -} -} - static const MemoryRegionOps pic_base_ioport_ops = { .read = pic_ioport_read, .write = pic_ioport_write, @@ -498,10 +469,6 @@ qemu_irq *i8259_init(ISABus *bus, qemu_irq parent_irq) slave_pic = DO_UPCAST(PICCommonState, dev, dev); -if (kvm_enabled() kvm_irqchip_in_kernel()) { -irq_set = qemu_allocate_irqs(kvm_i8259_set_irq, NULL, 24); -} - return irq_set; } @@ -511,8 +478,6 @@ static void i8259_class_init(ObjectClass *klass, void *data) 
DeviceClass *dc = DEVICE_CLASS(klass); k-init = pic_init; -k-pre_save = kvm_pic_pre_save; -k-post_load = kvm_pic_post_load; dc-reset = pic_reset; } @@ -528,77 +493,4 @@ static void pic_register(void) type_register_static(i8259_info); } -static void kvm_kernel_pic_save_to_user(PICCommonState *s) -{ -#ifdef KVM_CAP_IRQCHIP -struct kvm_irqchip chip; -struct kvm_pic_state *kpic; - -chip.chip_id = s-master ? - KVM_IRQCHIP_PIC_MASTER : - KVM_IRQCHIP_PIC_SLAVE; -kvm_get_irqchip(kvm_state, chip); -kpic = chip.chip.pic; - -s-last_irr = kpic-last_irr; -s-irr = kpic-irr; -s-imr = kpic-imr; -s-isr = kpic-isr; -
[PATCH v2 1/9] qemu-kvm: Move kvm_create_pit out of arch init code
This belongs where the PIT is created and allows us to drop another
kvm_irqchip reference.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/i8254-kvm.c    |    3 +++
 qemu-kvm-x86.c    |    4 ++--
 qemu-kvm.h        |    2 ++
 target-i386/kvm.c |    7 -------
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/hw/i8254-kvm.c b/hw/i8254-kvm.c
index 8b494d0..f0c7ac8 100644
--- a/hw/i8254-kvm.c
+++ b/hw/i8254-kvm.c
@@ -107,6 +107,9 @@ void kvm_pit_init(PITState *pit)
 {
     PITChannelState *s;
 
+    if (kvm_create_pit(kvm_state) < 0) {
+        hw_error("KVM PIT creation failed\n");
+    }
     s = &pit->channels[0];
     s->irq_timer = qemu_new_timer_ns(vm_clock, dummy_timer, s);
     vmstate_pit.pre_save = kvm_pit_pre_save;
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index a0bfc23..6fe48a4 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -23,11 +23,11 @@
 #include "kvm.h"
 #include "hw/apic.h"
 
-static int kvm_create_pit(KVMState *s)
+int kvm_create_pit(KVMState *s)
 {
     int r;
 
-    if (kvm_irqchip) {
+    if (kvm_irqchip_in_kernel()) {
         r = kvm_vm_ioctl(s, KVM_CREATE_PIT);
         if (r < 0) {
             fprintf(stderr, "Create kernel PIC irqchip failed\n");
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 975b6fa..653370e 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -256,6 +256,8 @@ int kvm_update_ioport_access(CPUState *env);
 int kvm_arch_set_ioport_access(unsigned long start, unsigned long size,
                                bool enable);
 
+int kvm_create_pit(KVMState *s);
+
 extern int kvm_irqchip;
 extern int kvm_pit_reinject;
 extern unsigned int kvm_shadow_memory;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 7079e87..ee2d3f8 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -661,8 +661,6 @@ static int kvm_get_supported_msrs(KVMState *s)
     return ret;
 }
 
-static int kvm_create_pit(KVMState *s);
-
 int kvm_arch_init(KVMState *s)
 {
     uint64_t identity_base = 0xfffbc000;
@@ -712,11 +710,6 @@ int kvm_arch_init(KVMState *s)
     }
     qemu_register_reset(kvm_unpoison_all, NULL);
 
-    ret = kvm_create_pit(s);
-    if (ret < 0) {
-        return ret;
-    }
-
     if (kvm_shadow_memory) {
         ret = kvm_vm_ioctl(s, KVM_SET_NR_MMU_PAGES, kvm_shadow_memory);
         if (ret < 0) {
--
1.7.3.4
x86: kvmclock: abstract save/restore sched_clock_state (v2)
Upon resume from hibernation, CPU 0's hvclock area contains the old
values for system_time and tsc_timestamp. It is necessary for the
hypervisor to update these values with up-to-date ones before the CPU
uses them.

Abstract TSC's save/restore sched_clock_state functions and use
restore_state to write to the KVM_SYSTEM_TIME MSR, forcing an update.

Also move restore_sched_clock_state before __restore_processor_state,
since the latter calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC).
Thanks to Igor Mammedov for tracking it down.

Fixes suspend-to-disk with kvmclock.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 15d9915..c91e8b9 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -61,7 +61,7 @@ extern void check_tsc_sync_source(int cpu);
 extern void check_tsc_sync_target(void);
 
 extern int notsc_setup(char *);
-extern void save_sched_clock_state(void);
-extern void restore_sched_clock_state(void);
+extern void tsc_save_sched_clock_state(void);
+extern void tsc_restore_sched_clock_state(void);
 
 #endif /* _ASM_X86_TSC_H */
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 5d0afac..baaca8d 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -162,6 +162,8 @@ struct x86_cpuinit_ops {
  * @is_untracked_pat_range	exclude from PAT logic
  * @nmi_init			enable NMI on cpus
  * @i8042_detect		pre-detect if i8042 controller exists
+ * @save_sched_clock_state:	save state for sched_clock() on suspend
+ * @restore_sched_clock_state:	restore state for sched_clock() on resume
  */
 struct x86_platform_ops {
 	unsigned long (*calibrate_tsc)(void);
@@ -173,6 +175,8 @@ struct x86_platform_ops {
 	void (*nmi_init)(void);
 	unsigned char (*get_nmi_reason)(void);
 	int (*i8042_detect)(void);
+	void (*save_sched_clock_state)(void);
+	void (*restore_sched_clock_state)(void);
 };
 
 struct pci_dev;
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index ca4e735..57e6b78 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -136,6 +136,15 @@ int kvm_register_clock(char *txt)
 	return ret;
 }
 
+void kvm_save_sched_clock_state(void)
+{
+}
+
+void kvm_restore_sched_clock_state(void)
+{
+	kvm_register_clock("primary cpu clock, resume");
+}
+
 #ifdef CONFIG_X86_LOCAL_APIC
 static void __cpuinit kvm_setup_secondary_clock(void)
 {
@@ -195,6 +204,8 @@ void __init kvmclock_init(void)
 	x86_cpuinit.early_percpu_clock_init =
 		kvm_setup_secondary_clock;
 #endif
+	x86_platform.save_sched_clock_state = kvm_save_sched_clock_state;
+	x86_platform.restore_sched_clock_state = kvm_restore_sched_clock_state;
 	machine_ops.shutdown = kvm_shutdown;
 #ifdef CONFIG_KEXEC
 	machine_ops.crash_shutdown = kvm_crash_shutdown;
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index a62c201..aed2aa1 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -629,7 +629,7 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 
 static unsigned long long cyc2ns_suspend;
 
-void save_sched_clock_state(void)
+void tsc_save_sched_clock_state(void)
 {
 	if (!sched_clock_stable)
 		return;
@@ -645,7 +645,7 @@ void save_sched_clock_state(void)
  * that sched_clock() continues from the point where it was left off during
  * suspend.
  */
-void restore_sched_clock_state(void)
+void tsc_restore_sched_clock_state(void)
 {
 	unsigned long long offset;
 	unsigned long flags;
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 6f2ec53..e9f265f 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -108,7 +108,9 @@ struct x86_platform_ops x86_platform = {
 	.is_untracked_pat_range = is_ISA_range,
 	.nmi_init = default_nmi_init,
 	.get_nmi_reason = default_get_nmi_reason,
-	.i8042_detect = default_i8042_detect
+	.i8042_detect = default_i8042_detect,
+	.save_sched_clock_state = tsc_save_sched_clock_state,
+	.restore_sched_clock_state = tsc_restore_sched_clock_state,
 };
 
 EXPORT_SYMBOL_GPL(x86_platform);
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index f10c0af..0e76a28 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -114,7 +114,7 @@ static void __save_processor_state(struct saved_context *ctxt)
 void save_processor_state(void)
 {
 	__save_processor_state(&saved_context);
-	save_sched_clock_state();
+	x86_platform.save_sched_clock_state();
 }
 #ifdef CONFIG_X86_32
 EXPORT_SYMBOL(save_processor_state);
@@ -230,8 +230,8 @@ static void __restore_processor_state(struct saved_context *ctxt)
 /* Needed by apm.c */
 void restore_processor_state(void)
 {
+
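
The indirection in the kvmclock patch above can be illustrated in plain user-space C. This is only a minimal sketch, not kernel code: the hook names mirror the patch, but platform_ops_sketch, hook_sketch, last_hook and the *_sketch helpers are hypothetical stand-ins, and the hook bodies merely record which implementation ran instead of touching the TSC or the KVM_SYSTEM_TIME MSR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker so callers can see which implementation ran. */
enum hook_sketch {
    HOOK_NONE, HOOK_TSC_SAVE, HOOK_TSC_RESTORE, HOOK_KVM_SAVE, HOOK_KVM_RESTORE
};

/* User-space stand-in for the two members the patch adds to x86_platform_ops. */
struct platform_ops_sketch {
    void (*save_sched_clock_state)(void);
    void (*restore_sched_clock_state)(void);
};

static enum hook_sketch last_hook = HOOK_NONE;

/* Stand-ins for the renamed TSC functions. */
static void tsc_save_sched_clock_state(void)    { last_hook = HOOK_TSC_SAVE; }
static void tsc_restore_sched_clock_state(void) { last_hook = HOOK_TSC_RESTORE; }

/* Stand-ins for the kvmclock hooks; in the patch, the restore hook
 * re-registers the clock so the hypervisor refreshes the hvclock area. */
static void kvm_save_sched_clock_state(void)    { last_hook = HOOK_KVM_SAVE; }
static void kvm_restore_sched_clock_state(void) { last_hook = HOOK_KVM_RESTORE; }

/* Defaults point at the TSC implementation, as in x86_init.c. */
static struct platform_ops_sketch platform_sketch = {
    .save_sched_clock_state    = tsc_save_sched_clock_state,
    .restore_sched_clock_state = tsc_restore_sched_clock_state,
};

/* Mirrors kvmclock_init(): the paravirtual clock overrides the defaults. */
static void kvmclock_init_sketch(void)
{
    platform_sketch.save_sched_clock_state    = kvm_save_sched_clock_state;
    platform_sketch.restore_sched_clock_state = kvm_restore_sched_clock_state;
}

/* Mirrors the call site in save_processor_state(). */
static enum hook_sketch suspend_sketch(void)
{
    assert(platform_sketch.save_sched_clock_state != NULL);
    platform_sketch.save_sched_clock_state();
    return last_hook;
}

/* Mirrors the call site in restore_processor_state(). */
static enum hook_sketch resume_sketch(void)
{
    assert(platform_sketch.restore_sched_clock_state != NULL);
    platform_sketch.restore_sched_clock_state();
    return last_hook;
}
```

Without kvmclock_init_sketch() the suspend and resume paths reach the TSC functions; after it, the same call sites reach the kvmclock ones, which is the point of routing the calls through the ops table.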
Re: [Android-virt] [PATCH RFC v2 3/3] ARM: KVM: Add support for MMU notifiers
On 12/02/12 01:12, Christoffer Dall wrote:
On Sat, Feb 11, 2012 at 10:33 AM, Antonios Motakis a.mota...@virtualopensystems.com wrote:
On 02/11/2012 06:35 PM, Christoffer Dall wrote:
On Sat, Feb 11, 2012 at 7:00 AM, Antonios Motakis a.mota...@virtualopensystems.com wrote:
On 02/10/2012 11:22 PM, Marc Zyngier wrote:

+ENTRY(__kvm_tlb_flush_vmid)
+	hvc	#0			@ Switch to Hyp mode
+	push	{r2, r3}
+	ldrd	r2, r3, [r0, #KVM_VTTBR]
+	mcrr	p15, 6, r2, r3, c2	@ Write VTTBR
+	isb
+	mcr	p15, 0, r0, c8, c7, 0	@ TLBIALL
+	dsb
+	isb
+	mov	r2, #0
+	mov	r3, #0
+	mcrr	p15, 6, r2, r3, c2	@ Back to VMID #0
+	isb
+
+	pop	{r2, r3}
+	hvc	#0			@ Back to SVC
+	mov	pc, lr
+ENDPROC(__kvm_tlb_flush_vmid)

With the last VMID implementation, you could get the equivalent effect of a per-VMID flush, by just getting a new VMID for the current VM. So you could do a (kvm->arch.vmid = 0) to force a new VMID when the guest reruns, and save the overhead of that flush (you will do a complete flush every 255 times instead of a small one every single time).

to do this you would need to send an IPI if the guest is currently executing on another CPU and make it exit the guest, so that the VMID assignment will run before the guest potentially accesses that TLB entry that points to the page that was just reclaimed - which I am not sure will be better than this solution.

Don't you have to do this anyway? You'd want the flush to be effective on all CPUs before proceeding.

hmm yeah, actually you do need this. Unless the -IS version of the flush instruction covers all relevant cores in this case.

Marc, I don't think that the processor clearing out the page table entry will necessarily belong to the same inner-shareable domain as the processor potentially executing the VM, so therefore the -IS flushing version would not be sufficient and we actually have to go and send an IPI.
If we forget about the 11MPCore (which doesn't broadcast the TLB invalidation in hardware), the TLBIALLIS operation makes sure all cores belonging to the same inner shareable domain will see the TLB invalidation at the same time. If they don't, this is a hardware bug.

Now, I do not have an example of a system where two CPUs are not part of the same IS domain. Even big.LITTLE has all of the potential 8 cores in an IS domain. If such a system exists one of these days, then it will be worth considering having a separate method to cope with the case. Until then, my opinion is to keep it as simple as possible.

M.
--
Jazz is not dead. It just smells funny...
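
The "get a new VMID instead of flushing" idea from this thread can be sketched as a small generation-based allocator. This is a hypothetical illustration, not code from the patch series: vm_sketch, update_vmid_sketch and the other names are assumptions, the VMID is taken to be 8 bits wide with VMID 0 reserved for the host (which matches the "every 255 times" remark), and the cross-CPU question discussed above (IPIs versus TLBIALLIS) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-VM bookkeeping. */
struct vm_sketch {
    uint64_t generation;  /* generation in which vmid was handed out */
    uint8_t  vmid;        /* 0 means "no valid VMID"; 0 is reserved for the host */
};

static uint64_t vmid_generation = 1;
static uint8_t  next_vmid = 1;   /* wraps to 0 after 255 has been handed out */
static unsigned full_flushes;    /* counts simulated "invalidate everything" operations */

/* Placeholder for a real TLB invalidation covering all VMIDs. */
static void flush_all_tlbs_sketch(void)
{
    full_flushes++;
}

/* Called before entering the guest: make sure the VM holds a current VMID. */
static void update_vmid_sketch(struct vm_sketch *vm)
{
    assert(vm != NULL);

    if (vm->vmid != 0 && vm->generation == vmid_generation)
        return;                      /* VMID still valid, nothing to do */

    if (next_vmid == 0) {            /* all 255 VMIDs used: start a new generation */
        vmid_generation++;
        next_vmid = 1;
        flush_all_tlbs_sketch();     /* stale entries of every old VMID go away here */
    }

    vm->vmid = next_vmid++;          /* uint8_t arithmetic wraps 255 -> 0 */
    vm->generation = vmid_generation;
}

/* The proposed alternative to a per-VMID flush: simply drop the VMID. */
static void invalidate_vmid_sketch(struct vm_sketch *vm)
{
    assert(vm != NULL);
    vm->vmid = 0;
}
```

Under these assumptions, 255 consecutive invalidations cost no TLB maintenance at all, and only the next allocation pays for one complete flush.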
Re: Pe: [PATCH v5 1/3] virtio-scsi: first version
On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin m...@redhat.com wrote:
On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
Only if you use the pci multi-function option but that kills standard hot unplug

It doesn't kill it as such, rather you can't unplug luns individually.

Isn't that just a consequence of the current implementation rather than a SCSI limitation?

A different way to do hotplug could be to flag all devices as removable in the standard inquiry page, then leave the LUN there persistently, so that what you remove/add is not the LUN device itself but just the media in the device. Instead of hot-plug removing the LUN, hot-plug becomes media eject or media insert. The device remains present at all times; you never remove it, and hot-plug instead controls whether the media is present or not.

This would require implementing at least START_STOP_UNIT and PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.

regards
ronnie sahlberg
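
The media-based hot-plug proposal can be sketched as a tiny per-LUN state machine. This is a hypothetical illustration only: lun_sketch and the *_sketch functions are made-up names, not code from virtio-scsi or QEMU. It assumes the usual SBC encoding, where START STOP UNIT is opcode 0x1b with START in bit 0 and LOEJ in bit 1 of CDB byte 4, and PREVENT ALLOW MEDIUM REMOVAL is opcode 0x1e with the prevent flag in bit 0 of byte 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define START_STOP_UNIT              0x1b
#define PREVENT_ALLOW_MEDIUM_REMOVAL 0x1e

/* Hypothetical per-LUN state: the LUN itself is never added or removed. */
struct lun_sketch {
    bool media_present;   /* toggled instead of hot-plugging the LUN */
    bool removal_locked;  /* set by the guest via PREVENT ALLOW MEDIUM REMOVAL */
};

/* Guest side: handle the two opcodes named above.
 * Returns 0 on success, -1 if the command is rejected or unsupported. */
static int lun_handle_cdb_sketch(struct lun_sketch *lun, const uint8_t *cdb)
{
    assert(lun != NULL && cdb != NULL);

    switch (cdb[0]) {
    case PREVENT_ALLOW_MEDIUM_REMOVAL:
        lun->removal_locked = (cdb[4] & 0x01) != 0;
        return 0;
    case START_STOP_UNIT:
        if ((cdb[4] & 0x02) == 0)
            return 0;                 /* LOEJ clear: plain start/stop, media untouched */
        if ((cdb[4] & 0x01) == 0) {   /* LOEJ set, START clear: eject request */
            if (lun->removal_locked)
                return -1;
            lun->media_present = false;
        }
        return 0;                     /* LOEJ set, START set: load, a no-op in this sketch */
    default:
        return -1;
    }
}

/* Host side: "hot-plug" becomes media insert. */
static void lun_host_insert_sketch(struct lun_sketch *lun)
{
    assert(lun != NULL);
    lun->media_present = true;
}

/* Host side: "hot-unplug" becomes media removal, refused while the guest holds the lock. */
static int lun_host_remove_sketch(struct lun_sketch *lun)
{
    assert(lun != NULL);
    if (lun->removal_locked)
        return -1;
    lun->media_present = false;
    return 0;
}
```

In this model the management layer would call the host-side helpers where it removes or adds a LUN today, and a guest that has locked the medium keeps it until it issues the allow command.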
Re: Pe: [PATCH v5 1/3] virtio-scsi: first version
On 02/13/2012 02:13 PM, ronnie sahlberg wrote:
On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin m...@redhat.com wrote:
On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
Only if you use the pci multi-function option but that kills standard hot unplug

It doesn't kill it as such, rather you can't unplug luns individually.

Isn't that just a consequence of the current implementation rather than a SCSI limitation?

We're talking about virtio-blk here. :)

Paolo
Re: Pe: [PATCH v5 1/3] virtio-scsi: first version
On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote:
On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin m...@redhat.com wrote:
On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
Only if you use the pci multi-function option but that kills standard hot unplug

It doesn't kill it as such, rather you can't unplug luns individually.

Isn't that just a consequence of the current implementation rather than a SCSI limitation?

Yes.

A different way to do hotplug could be to flag all devices as removable in the standard inquiry page, then leave the LUN there persistently, so that what you remove/add is not the LUN device itself but just the media in the device. Instead of hot-plug removing the LUN, hot-plug becomes media eject or media insert. The device remains present at all times; you never remove it, and hot-plug instead controls whether the media is present or not.

This would require implementing at least START_STOP_UNIT and PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.

regards
ronnie sahlberg

That would work.
[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane
https://bugzilla.kernel.org/show_bug.cgi?id=42755

--- Comment #22 from Rosen sandik...@yandex.ru 2012-02-13 13:19:29 ---
Created an attachment (id=72363)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=72363)
trace-cmd report
[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane
https://bugzilla.kernel.org/show_bug.cgi?id=42755

--- Comment #23 from Gleb g...@redhat.com 2012-02-13 13:30:01 ---
What did the guest do during this trace? Can you please provide the output of the 'info pci' monitor command?
Re: Q: Does linux kvm native tool support loading BIOS as the default loader now?
On 02/13/2012 12:38 PM, Pekka Enberg wrote:
On Mon, Feb 13, 2012 at 08:14:22PM +0800, Yang Bai wrote:
As far as I know, the native tool does not support loading a BIOS, so it does not support Windows. Is this supported now? If not, I may try to implement it.

You're welcome to do so ;-).

This would open the door for non-Linux OS support in the kvm tool.

--
Asias He
[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane
https://bugzilla.kernel.org/show_bug.cgi?id=42755

Avi Kivity a...@redhat.com changed:

           What    |Removed                     |Added
                 CC|                            |a...@redhat.com

--- Comment #24 from Avi Kivity a...@redhat.com 2012-02-13 13:43:35 ---
Please run 'perf top' in the host and report the output (while tracing is disabled).
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Fri, 2012-02-10 at 08:39 -0800, Stephen Hemminger wrote:
Some related discussion points:
* the bridge needs to support control from both userspace (MSTP, TRILL, ...) and kernel space (offload etc)

I think all are pretty much covered if you let some controller (I prefer user space) ADD/DEL/GET/Event on the fdb.

TRILL really is outside the scope of this; from an encap/decap perspective it probably needs to be YAND (Yet another netdev), and from the control side of things you just need to provide the above netlink ops (ADD, etc) on the fdb and let the controller worry about things. (Actually you _may_ need to have learning done outside of the kernel for TRILL.)

* the bridge forwarding database is simpler and different than the existing neighbor table, don't remember the details but last time I checked it using neighbor table in bridge would be putting square peg in round hole.

Agreed.

cheers,
jamal
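
The ADD/DEL/GET/Event set of operations on the fdb can be sketched with a toy in-memory table. This is a hypothetical illustration only: it does not use netlink or any bridge code, and fdb_sketch, fdb_add, fdb_del, fdb_get and the notifier are assumed names chosen for the example. The notifier stands in for the "Event" a controller would subscribe to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FDB_SIZE 16

enum fdb_event { FDB_EVENT_ADD, FDB_EVENT_DEL };

/* One forwarding entry: destination MAC address and the port it lives behind. */
struct fdb_entry {
    uint8_t mac[6];
    int     port;
    int     used;
};

typedef void (*fdb_notify_fn)(enum fdb_event ev, const struct fdb_entry *e);

struct fdb_sketch {
    struct fdb_entry entries[FDB_SIZE];
    fdb_notify_fn    notify;   /* "Event": called on every add or delete */
};

/* Example notifier that only counts events. */
static int fdb_events_seen;
static void fdb_count_events(enum fdb_event ev, const struct fdb_entry *e)
{
    (void)ev;
    (void)e;
    fdb_events_seen++;
}

/* GET: look up an entry by MAC address, NULL if absent. */
static struct fdb_entry *fdb_get(struct fdb_sketch *fdb, const uint8_t mac[6])
{
    assert(fdb != NULL);
    for (size_t i = 0; i < FDB_SIZE; i++) {
        if (fdb->entries[i].used && memcmp(fdb->entries[i].mac, mac, 6) == 0)
            return &fdb->entries[i];
    }
    return NULL;
}

/* ADD: create or update an entry. Returns 0 on success, -1 if the table is full. */
static int fdb_add(struct fdb_sketch *fdb, const uint8_t mac[6], int port)
{
    struct fdb_entry *e = fdb_get(fdb, mac);

    if (e == NULL) {
        for (size_t i = 0; i < FDB_SIZE && e == NULL; i++) {
            if (!fdb->entries[i].used)
                e = &fdb->entries[i];
        }
        if (e == NULL)
            return -1;
        memcpy(e->mac, mac, 6);
        e->used = 1;
    }
    e->port = port;
    if (fdb->notify)
        fdb->notify(FDB_EVENT_ADD, e);
    return 0;
}

/* DEL: remove an entry. Returns 0 on success, -1 if it was not present. */
static int fdb_del(struct fdb_sketch *fdb, const uint8_t mac[6])
{
    struct fdb_entry *e = fdb_get(fdb, mac);

    if (e == NULL)
        return -1;
    if (fdb->notify)
        fdb->notify(FDB_EVENT_DEL, e);
    e->used = 0;
    return 0;
}
```

With these four hooks the policy (learning, aging, TRILL or MSTP decisions) can live in whichever controller registers the notifier, whether that is user space or a hardware offload driver.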
[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane
https://bugzilla.kernel.org/show_bug.cgi?id=42755

--- Comment #25 from Rosen sandik...@yandex.ru 2012-02-13 14:14:17 ---
(In reply to comment #23)
What guest did during this trace? Can you provide info pci monitor output pls?

I can't see the full output from this command.
[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane
https://bugzilla.kernel.org/show_bug.cgi?id=42755

--- Comment #26 from Rosen sandik...@yandex.ru 2012-02-13 14:24:36 ---
Created an attachment (id=72365)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=72365)
info pci
Re: [PATCHv2-RFC 1/2] shpc: standard hot plug controller
On Mon, Feb 13, 2012 at 01:49:32PM +0200, Michael S. Tsirkin wrote:
On Mon, Feb 13, 2012 at 07:03:52PM +0900, Isaku Yamahata wrote:
Oh nice work.

On Mon, Feb 13, 2012 at 11:15:55AM +0200, Michael S. Tsirkin wrote:
This adds support for SHPC interface, as defined by PCI Standard Hot-Plug Controller and Subsystem Specification, Rev 1.0 http://www.pcisig.com/specifications/conventional/pci_hot_plug/SHPC_10

Only SHPC integrated with a PCI-to-PCI bridge is supported, SHPC integrated with a host bridge would need more work.

All main SHPC features are supported:
- MRL sensor

Does this just report latch status? (It seems so.)

What happens is that adding a device closes the latch, removing a device opens the latch. This simplifies the number of supported configurations significantly.

Do you plan to provide interfaces to manipulate the latch?

I didn't plan to do this, and this is non-trivial. Do you just want this for empty slots? And why?

No, I just wondered about your plan.

- Attention button
- Attention indicator
- Power indicator

Wake on hotplug and serr generation are stubbed out but unused as we don't have interfaces to generate these events ATM.

One issue that isn't completely resolved is that qemu currently expects an eject interface, which SHPC does not provide: it merely removes the power to device and it's up to the user to remove the device from slot. This patch works around that by ejecting the device when power is removed and power LED goes off.

TODO:
- migration support
- fix dependency on pci_internals.h

If I didn't miss the code,
- QMP command for pushing attention button.
- QMP command to get LED status

It's easy to add these, so I'd accept such a patch, but I wonder why.

My concern is how libvirt/virt-manager (or other UI) presents slot status to operators/users.

- QMP events for LED on/off

There's also blink :)

thanks, I'm concerned that a guest can flood the management with such events.
It's better to send a single LED change event, then we can suppress further events until next get LED status command. Makes sense. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Makefile.objs |1 + hw/pci.h |6 + hw/shpc.c | 646 + hw/shpc.h | 40 qemu-common.h |1 + 5 files changed, 694 insertions(+), 0 deletions(-) create mode 100644 hw/shpc.c create mode 100644 hw/shpc.h diff --git a/Makefile.objs b/Makefile.objs index 391e524..4546477 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -195,6 +195,7 @@ hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o hw-obj-y += fw_cfg.o hw-obj-$(CONFIG_PCI) += pci.o pci_bridge.o hw-obj-$(CONFIG_PCI) += msix.o msi.o +hw-obj-$(CONFIG_PCI) += shpc.o hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o hw-obj-y += watchdog.o diff --git a/hw/pci.h b/hw/pci.h index 33b0b18..756577e 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -125,6 +125,9 @@ enum { /* command register SERR bit enabled */ #define QEMU_PCI_CAP_SERR_BITNR 4 QEMU_PCI_CAP_SERR = (1 QEMU_PCI_CAP_SERR_BITNR), +/* Standard hot plug controller. */ +#define QEMU_PCI_SHPC_BITNR 5 +QEMU_PCI_CAP_SHPC = (1 QEMU_PCI_SHPC_BITNR), }; #define TYPE_PCI_DEVICE pci-device @@ -229,6 +232,9 @@ struct PCIDevice { /* PCI Express */ PCIExpressDevice exp; +/* SHPC */ +SHPCDevice *shpc; + /* Location of option rom */ char *romfile; bool has_rom; diff --git a/hw/shpc.c b/hw/shpc.c new file mode 100644 index 000..4baec29 --- /dev/null +++ b/hw/shpc.c @@ -0,0 +1,646 @@ +#include strings.h +#include stdint.h +#include range.h +#include shpc.h +#include pci.h +#include pci_internals.h + +/* TODO: model power only and disabled slot states. */ +/* TODO: handle SERR and wakeups */ +/* TODO: consider enabling 66MHz support */ + +/* TODO: remove fully only on state DISABLED and LED off. + * track state to properly record this. 
*/ + +/* SHPC Working Register Set */ +#define SHPC_BASE_OFFSET 0x00 /* 4 bytes */ +#define SHPC_SLOTS_33 0x04 /* 4 bytes. Also encodes PCI-X slots. */ +#define SHPC_SLOTS_66 0x08 /* 4 bytes. */ +#define SHPC_NSLOTS 0x0C /* 1 byte */ +#define SHPC_FIRST_DEV0x0D /* 1 byte */ +#define SHPC_PHYS_SLOT0x0E /* 2 byte */ +#define SHPC_PHYS_NUM_MAX 0x7ff +#define SHPC_PHYS_NUM_UP 0x1000 +#define SHPC_PHYS_MRL 0x4000 +#define SHPC_PHYS_BUTTON 0x8000 +#define SHPC_SEC_BUS 0x10 /* 2 bytes */ +#define SHPC_SEC_BUS_33 0x0 +#define SHPC_SEC_BUS_66 0x1 /* Unused */ +#define SHPC_SEC_BUS_MASK 0x7
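
The event throttling agreed on in this exchange (send a single LED change event, then suppress further events until the next "get LED status" command) can be sketched as follows. This is a hypothetical illustration, not part of the patch: led_notifier_sketch and its functions are assumed names, and events_sent merely stands in for emitting a management event such as a QMP notification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum led_state { LED_OFF, LED_ON, LED_BLINK };

/* Hypothetical per-LED notification state. */
struct led_notifier_sketch {
    enum led_state state;
    bool           event_pending;  /* an event was sent and not yet followed by a query */
    unsigned       events_sent;    /* stands in for events delivered to management */
};

/* Guest changed the LED: notify at most once until management queries the status. */
static void led_set_sketch(struct led_notifier_sketch *n, enum led_state s)
{
    assert(n != NULL);

    if (n->state == s)
        return;                    /* no change, nothing to report */
    n->state = s;
    if (!n->event_pending) {
        n->event_pending = true;
        n->events_sent++;          /* the single "LED changed" event */
    }
}

/* Management asked for the LED status: report it and re-arm the event. */
static enum led_state led_query_sketch(struct led_notifier_sketch *n)
{
    assert(n != NULL);
    n->event_pending = false;
    return n->state;
}
```

However often a guest toggles the LED, management sees one event per query it issues, which bounds the event rate by the management side's own polling.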
[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane
https://bugzilla.kernel.org/show_bug.cgi?id=42755

--- Comment #27 from Rosen sandik...@yandex.ru 2012-02-13 14:37:36 ---
A video capture of 'perf top' will be available here soon: http://vbox7.com/play:199e9ede30
Re: [PATCHv2-RFC 1/2] shpc: standard hot plug controller
On Mon, Feb 13, 2012 at 11:30:23PM +0900, Isaku Yamahata wrote:
On Mon, Feb 13, 2012 at 01:49:32PM +0200, Michael S. Tsirkin wrote:
On Mon, Feb 13, 2012 at 07:03:52PM +0900, Isaku Yamahata wrote:
Oh nice work.

On Mon, Feb 13, 2012 at 11:15:55AM +0200, Michael S. Tsirkin wrote:
This adds support for SHPC interface, as defined by PCI Standard Hot-Plug Controller and Subsystem Specification, Rev 1.0 http://www.pcisig.com/specifications/conventional/pci_hot_plug/SHPC_10

Only SHPC integrated with a PCI-to-PCI bridge is supported, SHPC integrated with a host bridge would need more work.

All main SHPC features are supported:
- MRL sensor

Does this just report latch status? (It seems so.)

What happens is that adding a device closes the latch, removing a device opens the latch. This simplifies the number of supported configurations significantly.

Do you plan to provide interfaces to manipulate the latch?

I didn't plan to do this, and this is non-trivial. Do you just want this for empty slots? And why?

No, I just wondered about your plan.

- Attention button
- Attention indicator
- Power indicator

Wake on hotplug and serr generation are stubbed out but unused as we don't have interfaces to generate these events ATM.

One issue that isn't completely resolved is that qemu currently expects an eject interface, which SHPC does not provide: it merely removes the power to device and it's up to the user to remove the device from slot. This patch works around that by ejecting the device when power is removed and power LED goes off.

TODO:
- migration support
- fix dependency on pci_internals.h

If I didn't miss the code,
- QMP command for pushing attention button.
- QMP command to get LED status

It's easy to add these, so I'd accept such a patch, but I wonder why.

My concern is how libvirt/virt-manager (or other UI) presents slot status to operators/users.

They currently present free/busy status just by looking at info pci. Maybe that is enough.
My concern is rather with the eject hack above: the add/delete API maps reasonably to _EJ0 interface, but isn't generic enough for SHPC. We'll need a better API for that. - QMP events for LED on/off There's also blink :) thanks, I'm concerned that a guest can flood the management with such events. It's better to send a single LED change event, then we can suppress further events until next get LED status command. Makes sense. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Makefile.objs |1 + hw/pci.h |6 + hw/shpc.c | 646 + hw/shpc.h | 40 qemu-common.h |1 + 5 files changed, 694 insertions(+), 0 deletions(-) create mode 100644 hw/shpc.c create mode 100644 hw/shpc.h diff --git a/Makefile.objs b/Makefile.objs index 391e524..4546477 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -195,6 +195,7 @@ hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o hw-obj-y += fw_cfg.o hw-obj-$(CONFIG_PCI) += pci.o pci_bridge.o hw-obj-$(CONFIG_PCI) += msix.o msi.o +hw-obj-$(CONFIG_PCI) += shpc.o hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o hw-obj-y += watchdog.o diff --git a/hw/pci.h b/hw/pci.h index 33b0b18..756577e 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -125,6 +125,9 @@ enum { /* command register SERR bit enabled */ #define QEMU_PCI_CAP_SERR_BITNR 4 QEMU_PCI_CAP_SERR = (1 QEMU_PCI_CAP_SERR_BITNR), +/* Standard hot plug controller. */ +#define QEMU_PCI_SHPC_BITNR 5 +QEMU_PCI_CAP_SHPC = (1 QEMU_PCI_SHPC_BITNR), }; #define TYPE_PCI_DEVICE pci-device @@ -229,6 +232,9 @@ struct PCIDevice { /* PCI Express */ PCIExpressDevice exp; +/* SHPC */ +SHPCDevice *shpc; + /* Location of option rom */ char *romfile; bool has_rom; diff --git a/hw/shpc.c b/hw/shpc.c new file mode 100644 index 000..4baec29 --- /dev/null +++ b/hw/shpc.c @@ -0,0 +1,646 @@ +#include strings.h +#include stdint.h +#include range.h +#include shpc.h +#include pci.h +#include pci_internals.h + +/* TODO: model power only and disabled slot states. 
*/ +/* TODO: handle SERR and wakeups */ +/* TODO: consider enabling 66MHz support */ + +/* TODO: remove fully only on state DISABLED and LED off. + * track state to properly record this. */ + +/* SHPC Working Register Set */ +#define SHPC_BASE_OFFSET 0x00 /* 4 bytes */ +#define SHPC_SLOTS_33 0x04 /* 4 bytes. Also encodes PCI-X slots. */ +#define SHPC_SLOTS_66
Re: [Android-virt] [PATCH RFC v2 3/3] ARM: KVM: Add support for MMU notifiers
On Mon, Feb 13, 2012 at 5:13 AM, Marc Zyngier marc.zyng...@arm.com wrote: On 12/02/12 01:12, Christoffer Dall wrote: On Sat, Feb 11, 2012 at 10:33 AM, Antonios Motakis a.mota...@virtualopensystems.com wrote: On 02/11/2012 06:35 PM, Christoffer Dall wrote: On Sat, Feb 11, 2012 at 7:00 AM, Antonios Motakis a.mota...@virtualopensystems.com wrote: On 02/10/2012 11:22 PM, Marc Zyngier wrote: +ENTRY(__kvm_tlb_flush_vmid) + hvc #0 @ Switch to Hyp mode + push {r2, r3} + ldrd r2, r3, [r0, #KVM_VTTBR] + mcrr p15, 6, r2, r3, c2 @ Write VTTBR + isb + mcr p15, 0, r0, c8, c7, 0 @ TBLIALL + dsb + isb + mov r2, #0 + mov r3, #0 + mcrr p15, 6, r2, r3, c2 @ Back to VMID #0 + isb + + pop {r2, r3} + hvc #0 @ Back to SVC + mov pc, lr +ENDPROC(__kvm_tlb_flush_vmid) With the last VMID implementation, you could get the equivalent effect of a per-VMID flush, by just getting a new VMID for the current VM. So you could do a (kvm-arch.vmid = 0) to force a new VMID when the guest reruns, and save the overhead of that flush (you will do a complete flush every 255 times instead of a small one every single time). to do this you would need to send an IPI if the guest is currently executing on another CPU and make it exit the guest, so that the VMID assignment will run before the guest potentially accesses that TLB entry that points to the page that was just reclaimed - which I am not sure will be better than this solution. Don't you have to do this anyway? You'd want the flush to be effective on all CPUs before proceeding. hmm yeah, actually you do need this. Unless the -IS version of the flush instruction covers all relevant cores in this case. Marc, I don't think that the processor clearing out the page table entry will necessarily belong to the same inner-shareable domain as the processor potentially executing the VM, so therefore the -IS flushing version would not be sufficient and we actually have to go and send an IPI. 
If we forget about the 11MPCore (which doesn't broadcast the TLB invalidation in hardware), the TLBIALLIS operation makes sure all cores belonging to the same inner shareable domain will see the TLB invalidation at the same time. If they don't, this is a hardware bug. Now, I do not have an example of a system where two CPUs are not part of the same IS domain. Even big.LITTLE has all of the potential 8 cores in an IS domain. If such a system exists one of these days, then it will be worth considering having a separate method to cope with the case. Until then, my opinion is to keep it as simple as possible. ok, sounds good to me. Although, perhaps keep this as a comment somewhere... -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
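For reference, the VMID-recycling idea raised earlier in this thread (hand the VM a fresh VMID instead of flushing per VMID, and pay for one full flush on rollover) could be sketched roughly as below. This is not code from the patch series: vmid_alloc, vmid_new and the full_flushes counter are hypothetical names, and the sketch assumes 8-bit VMIDs with VMID 0 kept for the host. It also leaves out the IPI needed to kick a guest that is running on another CPU, which is the concern discussed above.

```c
#include <stdint.h>

/* Hypothetical allocator state: next VMID to hand out and a rollover counter. */
struct vmid_alloc {
    uint8_t next;               /* 1..255; 0 means the space is exhausted */
    unsigned long full_flushes; /* how many times a full TLB flush was needed */
};

/*
 * Hand out VMIDs 1..255 in order. Once all 255 are used up, record that a
 * full flush is required (the real flush is outside this sketch) and start
 * again from 1.
 */
static uint8_t vmid_new(struct vmid_alloc *a)
{
    if (a->next == 0) {
        a->full_flushes++;
        a->next = 1;
    }
    return a->next++;           /* 255 + 1 wraps to 0 in a uint8_t */
}
```

Starting from { .next = 1, .full_flushes = 0 }, the first 255 calls need no full flush and the 256th call triggers one, which matches the "complete flush every 255 times" estimate in the thread.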
Re: Pe: [PATCH v5 1/3] virtio-scsi: first version
On 02/13/2012 02:18 PM, Michael S. Tsirkin wrote:
> On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote:
>> On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin m...@redhat.com wrote:
>>> On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
>>>> Only if you use the pci multi-function option but that kills standard hot unplug
>>>
>>> It doesn't kill it as such, rather you can't unplug luns individually.
>>
>> Isn't that just a consequence of the current implementation rather than a SCSI limitation?
>
> Yes.
>
>> A different way to do hotplug could be to flag all devices as removable in the standard inq page then leave the LUN there persistently and what you remove/add is not the LUN device itself but just the media in the device.
>>
>> Instead of hot-plug remove the LUN, hot-plug becomes media eject or media insert. The device remains present all time, you never remove it, but instead hot-plug controls if the media is present or not.
>>
>> This would require implementing at least START_STOP_UNIT and PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.
>>
>> regards
>> ronnie sahlberg
>
> That would work.

Or we simply use the Peripheral Qualifier to signal that the device is gone; eg we could simply set PQ = 1, return sense code 0x25/00 and be done with ...

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries Storage
h...@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
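To make the last suggestion concrete, here is a minimal sketch of what "PQ = 1 plus sense 0x25/00" looks like on the wire. It is only an illustration, not QEMU code: the helper names are made up, the 18-byte fixed sense format is assumed, and pairing ASC/ASCQ 0x25/0x00 (LOGICAL UNIT NOT SUPPORTED) with sense key ILLEGAL REQUEST is an assumption the mail does not spell out.

```c
#include <stdint.h>
#include <string.h>

#define PQ_NOT_CONNECTED          0x01 /* peripheral qualifier 001b */
#define SENSE_KEY_ILLEGAL_REQUEST 0x05 /* assumed key for 0x25/0x00 */
#define ASC_LUN_NOT_SUPPORTED     0x25

/* Byte 0 of standard INQUIRY data: qualifier in bits 7..5, device type in bits 4..0. */
static uint8_t inquiry_byte0(uint8_t qualifier, uint8_t device_type)
{
    return (uint8_t)(((qualifier & 0x07) << 5) | (device_type & 0x1f));
}

/* Fill an 18-byte fixed-format sense buffer with key/ASC/ASCQ. */
static void build_fixed_sense(uint8_t sense[18], uint8_t key,
                              uint8_t asc, uint8_t ascq)
{
    memset(sense, 0, 18);
    sense[0] = 0x70;        /* current error, fixed format */
    sense[2] = key & 0x0f;  /* sense key */
    sense[7] = 10;          /* additional sense length: bytes 8..17 */
    sense[12] = asc;        /* additional sense code */
    sense[13] = ascq;       /* additional sense code qualifier */
}
```

With a direct-access device type (0x00), inquiry_byte0(PQ_NOT_CONNECTED, 0x00) yields 0x20, so the LUN stays addressable while reporting that no device is attached behind it.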
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On 2/10/2012 7:18 AM, jamal wrote: Hi John, I went backwards to summarize at the top after going through your email. TL;DR version 0.1: you provide a good use case where it makes sense to do things in the kernel. IMO, you could make the same arguement if your embedded switch could do ACLs, IPv4 forwarding etc. And the kernel bloats. I am always bigoted to move all policy control to user space instead of bloating in the kernel. On Thu, 2012-02-09 at 20:14 -0800, John Fastabend wrote: Hi Jamal, The user space app in this case would listen for FDB updates to the SW bridge and then mirror them at the embedded NIC. In this case it seems easier to just add a notifier chain and let the kernel keep these in sync. Otherwise we need a daemon in user space to replicate these. A user space daemon if you need to ensure synchronization. Thats what i meant when i said there was a disadvantage over the simple case when the goal is always to synchronize. On the other hand if you could make the same RTM_NEWNEIGH, RTM_DELNEIGH, and RTM_GETNEIGH work for the bridge, embedded bridge, and macvlan you would have one common interface to drive these. But the bridge already has this protocol/msgtype so that would require either some demux or new protocol/msgtype pairs to be created. The bridge is very netlink friendly these days. Given the rest of the network stack (*NEIGH* you mention above) talks netlink to user space it should be workable. Let me think on it. I'm tempted by the simplicity of adding notifier hooks though. If something is missing bridge-side it may need to be added (as Per Stephen's comment) - i just took it one further indicating those notifiers need to also netlink-speak Sure. Actually because the bridge is adding/removing fdb entries dynamically maybe its best this gets done in kernel. Here's the example case, [..] With the flow by letters above hope this is not too difficult to follow. 
(A) veth0 a virtual device transmits packet destined for ethx.y (B) SW bridge receives frames and updates FDB flooding to C (C) eth0 the PF in this case sends the frame to the HW backed by the embedded bridge Following so far. Can you have more than one PF per embedded switch? Or is the intent here purely to do VMs/VF separation? The use case here is multiple VFs but the same solution should work with multiple PFs as well. FDB controls should be independent of how the ports are exposed VFs, PFs, VMDQ/queue pairs, macvlan, etc. (D) The HW embedded switch has a static entry for ethx.y and forwards the frame to the VF or if its a broadcast frame also floods it to the wire and ethx.y nod. (E) ethx.y receives the frame and generates a response to the dest mac of veth0 nod. Since you said in #D the entries in the switch are static, I am assuming at this point neither ethx.y nor veth0 exist in the embedded FDB. Now here is the potential issue, (G) The frame transmitted from ethx.y with the destination address of veth0 but the embedded switch is not a learning switch. If the FDB update is done in user space its possible (likely?) that the FDB entry for veth0 has not been added to the embedded switch yet. Ok, got it - so the catch here is the switch is not capable of learning. I think this depends on where learning is done. Your intent is to use the S/W bridge as something that does the learning for you i.e in the kernel. This makes the s/w bridge part of MUST-have-for-this-to-run. And that maybe the case for your use case. This is _my_ use case today. What if I dont wanna run the S/W bridge at all? Ive been making a point that with a simple knob(Stephen doesn like to add such a knob), the SW bridge could defer learning to user space. [This way you can add a lot of richness e.g on ACLs such as restricting what MAC addresses etc are allowed to talk to which ones etc.]. 
But if bypass the s/w bridge all together and learn in user space or have a static config in which i populate the embedded switch, i dont see the issue. With events and ADD/DEL/GET FDB controls we can solve both cases. This also solves Roopa's case with macvlan where he wants to add additional addresses to macvlan ports. Now we either have to flood the frame which is not horrible but not ideal or worse if the embedded switch does not support flooding send it to the wire and veth0 never receives it. If it is a switch it has to flood, no? Otherwise it sounds broken. Yes it should flood here, unless its acting as a 802.1Qbg VEB or VEPA. If the SW bridge pushes the FDB update down into the embedded switch the address is for sure in the embedded switches forwarding tables and the switching works as expected. Yes, there is a small gap between the s/w bridge learning and the synchronization happening to the embedded nic switch. That gap gets larger if you defer learning to
Re: x86: kvmclock: abstract save/restore sched_clock_state (v2)
On 02/13/2012 02:07 PM, Marcelo Tosatti wrote: Upon resume from hibernation, CPU 0's hvclock area contains the old values for system_time and tsc_timestamp. It is necessary for the hypervisor to update these values with uptodate ones before the CPU uses them. Abstract TSC's save/restore sched_clock_state functions and use restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update. Also move restore_sched_clock_state before __restore_processor_state, since the later calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC). Thanks to Igor Mammedov for tracking it down. Fixes suspend-to-disk with kvmclock. Signed-off-by: Marcelo Tosattimtosa...@redhat.com diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h index 15d9915..c91e8b9 100644 --- a/arch/x86/include/asm/tsc.h +++ b/arch/x86/include/asm/tsc.h @@ -61,7 +61,7 @@ extern void check_tsc_sync_source(int cpu); extern void check_tsc_sync_target(void); extern int notsc_setup(char *); -extern void save_sched_clock_state(void); -extern void restore_sched_clock_state(void); +extern void tsc_save_sched_clock_state(void); +extern void tsc_restore_sched_clock_state(void); #endif /* _ASM_X86_TSC_H */ diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h index 5d0afac..baaca8d 100644 --- a/arch/x86/include/asm/x86_init.h +++ b/arch/x86/include/asm/x86_init.h @@ -162,6 +162,8 @@ struct x86_cpuinit_ops { * @is_untracked_pat_rangeexclude from PAT logic * @nmi_init enable NMI on cpus * @i8042_detect pre-detect if i8042 controller exists + * @save_sched_clock_state:save state for sched_clock() on suspend + * @restore_sched_clock_state: restore state for sched_clock() on resume */ struct x86_platform_ops { unsigned long (*calibrate_tsc)(void); @@ -173,6 +175,8 @@ struct x86_platform_ops { void (*nmi_init)(void); unsigned char (*get_nmi_reason)(void); int (*i8042_detect)(void); + void (*save_sched_clock_state)(void); + void (*restore_sched_clock_state)(void); }; struct pci_dev; diff --git 
a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index ca4e735..57e6b78 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -136,6 +136,15 @@ int kvm_register_clock(char *txt) return ret; } +void kvm_save_sched_clock_state(void) +{ +} + +void kvm_restore_sched_clock_state(void) +{ + kvm_register_clock(primary cpu clock, resume); +} + #ifdef CONFIG_X86_LOCAL_APIC static void __cpuinit kvm_setup_secondary_clock(void) { @@ -195,6 +204,8 @@ void __init kvmclock_init(void) x86_cpuinit.early_percpu_clock_init = kvm_setup_secondary_clock; #endif + x86_platform.save_sched_clock_state = kvm_save_sched_clock_state; + x86_platform.restore_sched_clock_state = kvm_restore_sched_clock_state; machine_ops.shutdown = kvm_shutdown; #ifdef CONFIG_KEXEC machine_ops.crash_shutdown = kvm_crash_shutdown; diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index a62c201..aed2aa1 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -629,7 +629,7 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu) static unsigned long long cyc2ns_suspend; -void save_sched_clock_state(void) +void tsc_save_sched_clock_state(void) { if (!sched_clock_stable) return; @@ -645,7 +645,7 @@ void save_sched_clock_state(void) * that sched_clock() continues from the point where it was left off during * suspend. 
*/ -void restore_sched_clock_state(void) +void tsc_restore_sched_clock_state(void) { unsigned long long offset; unsigned long flags; diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c index 6f2ec53..e9f265f 100644 --- a/arch/x86/kernel/x86_init.c +++ b/arch/x86/kernel/x86_init.c @@ -108,7 +108,9 @@ struct x86_platform_ops x86_platform = { .is_untracked_pat_range = is_ISA_range, .nmi_init = default_nmi_init, .get_nmi_reason = default_get_nmi_reason, - .i8042_detect = default_i8042_detect + .i8042_detect = default_i8042_detect, + .save_sched_clock_state = tsc_save_sched_clock_state, + .restore_sched_clock_state = tsc_restore_sched_clock_state, }; EXPORT_SYMBOL_GPL(x86_platform); diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index f10c0af..0e76a28 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -114,7 +114,7 @@ static void __save_processor_state(struct saved_context *ctxt) void save_processor_state(void) { __save_processor_state(saved_context); - save_sched_clock_state(); + x86_platform.save_sched_clock_state(); } #ifdef CONFIG_X86_32 EXPORT_SYMBOL(save_processor_state); @@ -230,8 +230,8 @@ static void __restore_processor_state(struct saved_context *ctxt) /*
[PATCH RFC] pvclock: Make pv_clock more robust and fixup it if overflow happens
Instead of hunting mysterious stalls/hangs all over the kernel when overflow occurs at pvclock.c:pvclock_get_nsec_offset

	u64 delta = native_read_tsc() - shadow->tsc_timestamp;

and introducing hooks when places of unexpected access are found, pv_clock should be initialized for the calling cpu if an overflow condition is detected.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 arch/x86/kernel/pvclock.c | 18 +++---
 1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 42eb330..b486756 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -41,9 +41,14 @@ void pvclock_set_flags(u8 flags)
 	valid_flags = flags;
 }
 
-static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
+static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow,
+				   bool *overflow)
 {
-	u64 delta = native_read_tsc() - shadow->tsc_timestamp;
+	u64 delta;
+	u64 tsc = native_read_tsc();
+	u64 shadow_timestamp = shadow->tsc_timestamp;
+	*overflow = tsc < shadow_timestamp;
+	delta = tsc - shadow_timestamp;
 	return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
 				   shadow->tsc_shift);
 }
@@ -94,12 +99,19 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 	unsigned version;
 	cycle_t ret, offset;
 	u64 last;
+	bool overflow;
 
 	do {
 		version = pvclock_get_time_values(&shadow, src);
 		barrier();
-		offset = pvclock_get_nsec_offset(&shadow);
+		offset = pvclock_get_nsec_offset(&shadow, &overflow);
 		ret = shadow.system_timestamp + offset;
+		if (unlikely(overflow)) {
+			memset(src, 0, sizeof(*src));
+			barrier();
+			x86_cpuinit.early_percpu_clock_init();
+			continue;
+		}
 		barrier();
 	} while (version != src->version);
 
--
1.7.7.6
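The overflow condition this patch keys on can be shown in isolation. The sketch below is a stand-alone illustration rather than kernel code: tsc_delta is a hypothetical helper, and plain uint64_t stands in for the kernel's u64. When the current counter value is behind the recorded timestamp, the unsigned subtraction wraps to a huge value, which is what later shows up as a stall.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return now - then. *overflow is set when now is behind then, in which
 * case the unsigned subtraction wraps around modulo 2^64.
 */
static uint64_t tsc_delta(uint64_t now, uint64_t then, bool *overflow)
{
    *overflow = now < then;
    return now - then;
}
```

For example, tsc_delta(40, 100, &ov) sets ov and returns 2^64 - 60, so a caller that scales the delta to nanoseconds without checking the flag ends up with a timestamp far in the future.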
Re: [PATCH 3/3] KVM: perf: kvm events analysis tool
On 02/13/2012 03:06 AM, Xiao Guangrong wrote:
> On 02/13/2012 01:32 PM, David Ahern wrote:
>> [sorry for the top post - you would think Android would have a better mail client]
>>
>> If the first patch is needed then kvm-events will not work with older, unpatched kernels. That's a big limitation from a perf perspective.
>
> The first patch is only needed for code compilation, after kvm-events is compiled, you can analyse any kernels. :)

understood. Now that I recall perf's way of handling out of tree builds, a couple of comments:

1. you need to add the following to tools/perf/MANIFEST
   arch/x86/include/asm/svm.h
   arch/x86/include/asm/vmx.h
   arch/x86/include/asm/kvm_host.h

2. scripts/checkpatch.pl is an unhappy camper.

I'll take a look at the code and try out the command when I get some time.

David
Re: x86: kvmclock: abstract save/restore sched_clock_state (v2)
On Mon, Feb 13, 2012 at 04:20:24PM +0100, Igor Mammedov wrote: On 02/13/2012 02:07 PM, Marcelo Tosatti wrote: Upon resume from hibernation, CPU 0's hvclock area contains the old values for system_time and tsc_timestamp. It is necessary for the hypervisor to update these values with uptodate ones before the CPU uses them. Abstract TSC's save/restore sched_clock_state functions and use restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update. Also move restore_sched_clock_state before __restore_processor_state, since the later calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC). Thanks to Igor Mammedov for tracking it down. Fixes suspend-to-disk with kvmclock. Signed-off-by: Marcelo Tosattimtosa...@redhat.com diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h index 15d9915..c91e8b9 100644 --- a/arch/x86/include/asm/tsc.h +++ b/arch/x86/include/asm/tsc.h @@ -61,7 +61,7 @@ extern void check_tsc_sync_source(int cpu); extern void check_tsc_sync_target(void); extern int notsc_setup(char *); -extern void save_sched_clock_state(void); -extern void restore_sched_clock_state(void); +extern void tsc_save_sched_clock_state(void); +extern void tsc_restore_sched_clock_state(void); #endif /* _ASM_X86_TSC_H */ diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h index 5d0afac..baaca8d 100644 --- a/arch/x86/include/asm/x86_init.h +++ b/arch/x86/include/asm/x86_init.h @@ -162,6 +162,8 @@ struct x86_cpuinit_ops { * @is_untracked_pat_range exclude from PAT logic * @nmi_init enable NMI on cpus * @i8042_detect pre-detect if i8042 controller exists + * @save_sched_clock_state: save state for sched_clock() on suspend + * @restore_sched_clock_state: restore state for sched_clock() on resume */ struct x86_platform_ops { unsigned long (*calibrate_tsc)(void); @@ -173,6 +175,8 @@ struct x86_platform_ops { void (*nmi_init)(void); unsigned char (*get_nmi_reason)(void); int (*i8042_detect)(void); +void (*save_sched_clock_state)(void); +void 
(*restore_sched_clock_state)(void); }; struct pci_dev; diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index ca4e735..57e6b78 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -136,6 +136,15 @@ int kvm_register_clock(char *txt) return ret; } +void kvm_save_sched_clock_state(void) +{ +} + +void kvm_restore_sched_clock_state(void) +{ +kvm_register_clock(primary cpu clock, resume); +} + #ifdef CONFIG_X86_LOCAL_APIC static void __cpuinit kvm_setup_secondary_clock(void) { @@ -195,6 +204,8 @@ void __init kvmclock_init(void) x86_cpuinit.early_percpu_clock_init = kvm_setup_secondary_clock; #endif +x86_platform.save_sched_clock_state = kvm_save_sched_clock_state; +x86_platform.restore_sched_clock_state = kvm_restore_sched_clock_state; machine_ops.shutdown = kvm_shutdown; #ifdef CONFIG_KEXEC machine_ops.crash_shutdown = kvm_crash_shutdown; diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index a62c201..aed2aa1 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -629,7 +629,7 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu) static unsigned long long cyc2ns_suspend; -void save_sched_clock_state(void) +void tsc_save_sched_clock_state(void) { if (!sched_clock_stable) return; @@ -645,7 +645,7 @@ void save_sched_clock_state(void) * that sched_clock() continues from the point where it was left off during * suspend. 
*/ -void restore_sched_clock_state(void) +void tsc_restore_sched_clock_state(void) { unsigned long long offset; unsigned long flags; diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c index 6f2ec53..e9f265f 100644 --- a/arch/x86/kernel/x86_init.c +++ b/arch/x86/kernel/x86_init.c @@ -108,7 +108,9 @@ struct x86_platform_ops x86_platform = { .is_untracked_pat_range = is_ISA_range, .nmi_init = default_nmi_init, .get_nmi_reason = default_get_nmi_reason, -.i8042_detect = default_i8042_detect +.i8042_detect = default_i8042_detect, +.save_sched_clock_state = tsc_save_sched_clock_state, +.restore_sched_clock_state = tsc_restore_sched_clock_state, }; EXPORT_SYMBOL_GPL(x86_platform); diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index f10c0af..0e76a28 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -114,7 +114,7 @@ static void __save_processor_state(struct saved_context *ctxt) void save_processor_state(void) { __save_processor_state(saved_context); -save_sched_clock_state(); +x86_platform.save_sched_clock_state(); } #ifdef CONFIG_X86_32
Re: x86: kvmclock: abstract save/restore sched_clock_state (v2)
On (Mon) 13 Feb 2012 [11:07:27], Marcelo Tosatti wrote:
> Upon resume from hibernation, CPU 0's hvclock area contains the old values for system_time and tsc_timestamp. It is necessary for the hypervisor to update these values with uptodate ones before the CPU uses them.
>
> Abstract TSC's save/restore sched_clock_state functions and use restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update.
>
> Also move restore_sched_clock_state before __restore_processor_state, since the later calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC). Thanks to Igor Mammedov for tracking it down.
>
> Fixes suspend-to-disk with kvmclock.
>
> Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

This works fine, thanks.

Tested-by: Amit Shah amit.s...@redhat.com

Amit
[PATCH v3 00/15] SCSI s/g + SCSI migration + virtio-scsi
Here is v3 of the virtio-scsi driver. Changes are:

- the virtio id is now 8, to fix a conflict in the virtio spec;
- rebased for QOM;
- changed the resid type to size_t following Stefan's advice;
- fixed sense length (patch from Christian Hoff).

The spec has been committed by Rusty (version 0.9.4), and SCSI maintainers should be okay with including it in the 3.4 kernel.

Paolo Bonzini (13):
  dma-helpers: make QEMUSGList target independent
  dma-helpers: add dma_buf_read and dma_buf_write
  dma-helpers: add accounting wrappers
  ahci: use new DMA helpers
  scsi: pass residual amount to command_complete
  scsi: add scatter/gather functionality
  scsi-disk: enable scatter/gather functionality
  scsi: add SCSIDevice vmstate definitions
  scsi-generic: add migration support
  scsi-disk: add migration support
  virtio-scsi: add basic SCSI bus operation
  virtio-scsi: process control queue requests
  virtio-scsi: add migration support

Stefan Hajnoczi (2):
  virtio-scsi: Add virtio-scsi stub device
  virtio-scsi: Add basic request processing infrastructure

 Makefile.target                   |    1 +
 default-configs/pci.mak           |    1 +
 default-configs/s390x-softmmu.mak |    1 +
 dma-helpers.c                     |   36 +++
 dma.h                             |   20 +-
 hw/esp.c                          |    3 +-
 hw/ide/ahci.c                     |   82 +-
 hw/lsi53c895a.c                   |    2 +-
 hw/pci.h                          |    1 +
 hw/s390-virtio-bus.c              |   34 ++
 hw/s390-virtio-bus.h              |    4 +-
 hw/scsi-bus.c                     |  142 +-
 hw/scsi-disk.c                    |  120 +++-
 hw/scsi-generic.c                 |   25 ++
 hw/scsi.h                         |   22 ++-
 hw/spapr_vscsi.c                  |    2 +-
 hw/usb-msd.c                      |    2 +-
 hw/virtio-pci.c                   |   56
 hw/virtio-pci.h                   |    2 +
 hw/virtio-scsi.c                  |  607 +
 hw/virtio-scsi.h                  |   36 +++
 hw/virtio.h                       |    3 +
 22 files changed, 1098 insertions(+), 104 deletions(-)
 create mode 100644 hw/virtio-scsi.c
 create mode 100644 hw/virtio-scsi.h

--
1.7.7.6
[PATCH v3 01/15] dma-helpers: make QEMUSGList target independent
scsi-disk will manage scatter/gather list, but it does not create single entries so it remains target-independent. Make QEMUSGList available to it. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- dma.h | 14 +++--- 1 files changed, 7 insertions(+), 7 deletions(-) diff --git a/dma.h b/dma.h index a13209d..d50019b 100644 --- a/dma.h +++ b/dma.h @@ -17,6 +17,13 @@ typedef struct ScatterGatherEntry ScatterGatherEntry; +struct QEMUSGList { +ScatterGatherEntry *sg; +int nsg; +int nalloc; +size_t size; +}; + #if defined(TARGET_PHYS_ADDR_BITS) typedef target_phys_addr_t dma_addr_t; @@ -32,13 +39,6 @@ struct ScatterGatherEntry { dma_addr_t len; }; -struct QEMUSGList { -ScatterGatherEntry *sg; -int nsg; -int nalloc; -dma_addr_t size; -}; - void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint); void qemu_sglist_add(QEMUSGList *qsg, dma_addr_t base, dma_addr_t len); void qemu_sglist_destroy(QEMUSGList *qsg); -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 02/15] dma-helpers: add dma_buf_read and dma_buf_write
These helpers do a full transfer from an in-memory buffer to target memory, with support for scatter/gather lists. It will be used to store the reply of an emulated command into a QEMUSGList provided by the adapter.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 dma-helpers.c | 30 ++
 dma.h         |  3 +++
 2 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index f08cdb5..f53a51f 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -204,3 +204,33 @@ BlockDriverAIOCB *dma_bdrv_write(BlockDriverState *bs,
 {
     return dma_bdrv_io(bs, sg, sector, bdrv_aio_writev, cb, opaque, true);
 }
+
+
+static uint64_t dma_buf_rw(uint8_t *ptr, int32_t len, QEMUSGList *sg, bool to_dev)
+{
+    uint64_t resid;
+    int sg_cur_index;
+
+    resid = sg->size;
+    sg_cur_index = 0;
+    len = MIN(len, resid);
+    while (len > 0) {
+        ScatterGatherEntry entry = sg->sg[sg_cur_index++];
+        cpu_physical_memory_rw(entry.base, ptr, MIN(len, entry.len), !to_dev);
+        ptr += entry.len;
+        len -= entry.len;
+        resid -= entry.len;
+    }
+
+    return resid;
+}
+
+uint64_t dma_buf_read(uint8_t *ptr, int32_t len, QEMUSGList *sg)
+{
+    return dma_buf_rw(ptr, len, sg, 0);
+}
+
+uint64_t dma_buf_write(uint8_t *ptr, int32_t len, QEMUSGList *sg)
+{
+    return dma_buf_rw(ptr, len, sg, 1);
+}
diff --git a/dma.h b/dma.h
index d50019b..346ac4f 100644
--- a/dma.h
+++ b/dma.h
@@ -58,4 +58,7 @@ BlockDriverAIOCB *dma_bdrv_read(BlockDriverState *bs,
 BlockDriverAIOCB *dma_bdrv_write(BlockDriverState *bs,
                                  QEMUSGList *sg, uint64_t sector,
                                  BlockDriverCompletionFunc *cb, void *opaque);
+uint64_t dma_buf_read(uint8_t *ptr, int32_t len, QEMUSGList *sg);
+uint64_t dma_buf_write(uint8_t *ptr, int32_t len, QEMUSGList *sg);
+
 #endif
--
1.7.7.6
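The general shape of such a helper, copying a flat buffer piecewise across the entries of a scatter/gather list and reporting the residual, can be sketched outside QEMU as follows. This is an illustration under simplifying assumptions, not the patch's code: sg_entry, sg_list and sg_copy are hypothetical names, host pointers and memcpy stand in for guest physical addresses and cpu_physical_memory_rw, and the sketch advances by the number of bytes actually copied in each step.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for a scatter/gather entry and list. */
struct sg_entry {
    uint8_t *base;  /* start of this segment (a host pointer in this sketch) */
    size_t len;     /* segment length in bytes */
};

struct sg_list {
    struct sg_entry *sg;
    int nsg;        /* number of entries */
    size_t size;    /* sum of all entry lengths */
};

/*
 * Copy up to len bytes between buf and the list, one segment at a time.
 * to_list != 0 copies buf into the list, otherwise the list into buf.
 * Returns the residual: how many bytes of the list were not covered.
 */
static size_t sg_copy(uint8_t *buf, size_t len,
                      const struct sg_list *list, int to_list)
{
    size_t resid = list->size;
    int i;

    if (len > resid) {
        len = resid;
    }
    for (i = 0; i < list->nsg && len > 0; i++) {
        const struct sg_entry *e = &list->sg[i];
        size_t chunk = len < e->len ? len : e->len;

        if (to_list) {
            memcpy(e->base, buf, chunk);
        } else {
            memcpy(buf, e->base, chunk);
        }
        buf += chunk;
        len -= chunk;
        resid -= chunk;
    }
    return resid;
}
```

Returning the residual rather than the byte count mirrors how the helpers in this patch report their result, so a caller can store it straight into a request's resid field.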
[PATCH v3 03/15] dma-helpers: add accounting wrappers
The length of the transfer is already in the sglist, the wrapper simply fetches it.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 dma-helpers.c | 6 ++
 dma.h         | 3 +++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index f53a51f..a773489 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -234,3 +234,9 @@ uint64_t dma_buf_write(uint8_t *ptr, int32_t len, QEMUSGList *sg)
 {
     return dma_buf_rw(ptr, len, sg, 1);
 }
+
+void dma_acct_start(BlockDriverState *bs, BlockAcctCookie *cookie,
+                    QEMUSGList *sg, enum BlockAcctType type)
+{
+    bdrv_acct_start(bs, cookie, sg->size, type);
+}
diff --git a/dma.h b/dma.h
index 346ac4f..20e86d2 100644
--- a/dma.h
+++ b/dma.h
@@ -61,4 +61,7 @@ BlockDriverAIOCB *dma_bdrv_write(BlockDriverState *bs,
 uint64_t dma_buf_read(uint8_t *ptr, int32_t len, QEMUSGList *sg);
 uint64_t dma_buf_write(uint8_t *ptr, int32_t len, QEMUSGList *sg);
 
+void dma_acct_start(BlockDriverState *bs, BlockAcctCookie *cookie,
+                    QEMUSGList *sg, enum BlockAcctType type);
+
 #endif
--
1.7.7.6
[PATCH v3 04/15] ahci: use new DMA helpers
Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- hw/ide/ahci.c | 82 + 1 files changed, 13 insertions(+), 69 deletions(-) diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c index c87a6ca..25ed844 100644 --- a/hw/ide/ahci.c +++ b/hw/ide/ahci.c @@ -426,55 +426,6 @@ static void ahci_reg_init(AHCIState *s) } } -static uint32_t read_from_sglist(uint8_t *buffer, uint32_t len, - QEMUSGList *sglist) -{ -uint32_t i = 0; -uint32_t total = 0, once; -ScatterGatherEntry *cur_prd; -uint32_t sgcount; - -cur_prd = sglist-sg; -sgcount = sglist-nsg; -for (i = 0; len sgcount; i++) { -once = MIN(cur_prd-len, len); -cpu_physical_memory_read(cur_prd-base, buffer, once); -cur_prd++; -sgcount--; -len -= once; -buffer += once; -total += once; -} - -return total; -} - -static uint32_t write_to_sglist(uint8_t *buffer, uint32_t len, -QEMUSGList *sglist) -{ -uint32_t i = 0; -uint32_t total = 0, once; -ScatterGatherEntry *cur_prd; -uint32_t sgcount; - -DPRINTF(-1, total: 0x%x bytes\n, len); - -cur_prd = sglist-sg; -sgcount = sglist-nsg; -for (i = 0; len sgcount; i++) { -once = MIN(cur_prd-len, len); -DPRINTF(-1, write 0x%x bytes to 0x%lx\n, once, (long)cur_prd-base); -cpu_physical_memory_write(cur_prd-base, buffer, once); -cur_prd++; -sgcount--; -len -= once; -buffer += once; -total += once; -} - -return total; -} - static void check_cmd(AHCIState *s, int port) { AHCIPortRegs *pr = s-dev[port].port_regs; @@ -795,9 +746,8 @@ static void process_ncq_command(AHCIState *s, int port, uint8_t *cmd_fis, DPRINTF(port, tag %d aio read %PRId64\n, ncq_tfs-tag, ncq_tfs-lba); -bdrv_acct_start(ncq_tfs-drive-port.ifs[0].bs, ncq_tfs-acct, -(ncq_tfs-sector_count-1) * BDRV_SECTOR_SIZE, -BDRV_ACCT_READ); +dma_acct_start(ncq_tfs-drive-port.ifs[0].bs, ncq_tfs-acct, + ncq_tfs-sglist, BDRV_ACCT_READ); ncq_tfs-aiocb = dma_bdrv_read(ncq_tfs-drive-port.ifs[0].bs, ncq_tfs-sglist, ncq_tfs-lba, ncq_cb, ncq_tfs); @@ -809,9 +759,8 @@ static void process_ncq_command(AHCIState *s, int port, uint8_t *cmd_fis, DPRINTF(port, tag %d 
aio write %PRId64\n, ncq_tfs-tag, ncq_tfs-lba); -bdrv_acct_start(ncq_tfs-drive-port.ifs[0].bs, ncq_tfs-acct, -(ncq_tfs-sector_count-1) * BDRV_SECTOR_SIZE, -BDRV_ACCT_WRITE); +dma_acct_start(ncq_tfs-drive-port.ifs[0].bs, ncq_tfs-acct, + ncq_tfs-sglist, BDRV_ACCT_WRITE); ncq_tfs-aiocb = dma_bdrv_write(ncq_tfs-drive-port.ifs[0].bs, ncq_tfs-sglist, ncq_tfs-lba, ncq_cb, ncq_tfs); @@ -1016,12 +965,12 @@ static int ahci_start_transfer(IDEDMA *dma) is_write ? writ : read, size, is_atapi ? atapi : ata, has_sglist ? : o); -if (is_write has_sglist (s-data_ptr s-data_end)) { -read_from_sglist(s-data_ptr, size, s-sg); -} - -if (!is_write has_sglist (s-data_ptr s-data_end)) { -write_to_sglist(s-data_ptr, size, s-sg); +if (has_sglist size) { +if (is_write) { +dma_buf_write(s-data_ptr, size, s-sg); +} else { +dma_buf_read(s-data_ptr, size, s-sg); +} } /* update number of transferred bytes */ @@ -1060,14 +1009,9 @@ static int ahci_dma_prepare_buf(IDEDMA *dma, int is_write) { AHCIDevice *ad = DO_UPCAST(AHCIDevice, dma, dma); IDEState *s = ad-port.ifs[0]; -int i; ahci_populate_sglist(ad, s-sg); - -s-io_buffer_size = 0; -for (i = 0; i s-sg.nsg; i++) { -s-io_buffer_size += s-sg.sg[i].len; -} +s-io_buffer_size = s-sg.size; DPRINTF(ad-port_no, len=%#x\n, s-io_buffer_size); return s-io_buffer_size != 0; @@ -1085,9 +1029,9 @@ static int ahci_dma_rw_buf(IDEDMA *dma, int is_write) } if (is_write) { -write_to_sglist(p, l, s-sg); +dma_buf_read(p, l, s-sg); } else { -read_from_sglist(p, l, s-sg); +dma_buf_write(p, l, s-sg); } /* update number of transferred bytes */ -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 06/15] scsi: add scatter/gather functionality
Scatter/gather functionality uses the newly added DMA helpers. The
device can choose between doing DMA itself, or calling scsi_req_data
as usual, which will use the newly added DMA helpers to copy piecewise
to/from the destination area(s).

Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 hw/scsi-bus.c |   28 ++++++++++++++++++++++++++--
 hw/scsi.h     |    3 +++
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 6a069f4..69cb3fc 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -5,6 +5,7 @@
 #include "qdev.h"
 #include "blockdev.h"
 #include "trace.h"
+#include "dma.h"

 static char *scsibus_get_fw_dev_path(DeviceState *dev);
 static int scsi_req_parse(SCSICommand *cmd, SCSIDevice *dev, uint8_t *buf);
@@ -651,6 +652,11 @@ int32_t scsi_req_enqueue(SCSIRequest *req)

     assert(!req->enqueued);
     scsi_req_ref(req);
+    if (req->bus->info->get_sg_list) {
+        req->sg = req->bus->info->get_sg_list(req);
+    } else {
+        req->sg = NULL;
+    }
     req->enqueued = true;
     QTAILQ_INSERT_TAIL(&req->dev->requests, req, next);

@@ -1275,14 +1281,32 @@ void scsi_req_continue(SCSIRequest *req)
    Once it completes, calling scsi_req_continue will restart I/O.  */
 void scsi_req_data(SCSIRequest *req, int len)
 {
+    uint8_t *buf;
     if (req->io_canceled) {
         trace_scsi_req_data_canceled(req->dev->id, req->lun, req->tag, len);
         return;
     }
     trace_scsi_req_data(req->dev->id, req->lun, req->tag, len);
     assert(req->cmd.mode != SCSI_XFER_NONE);
-    req->resid -= len;
-    req->bus->info->transfer_data(req, len);
+    if (!req->sg) {
+        req->resid -= len;
+        req->bus->info->transfer_data(req, len);
+        return;
+    }
+
+    /* If the device calls scsi_req_data and the HBA specified a
+     * scatter/gather list, the transfer has to happen in a single
+     * step.  */
+    assert(!req->dma_started);
+    req->dma_started = true;
+
+    buf = scsi_req_get_buf(req);
+    if (req->cmd.mode == SCSI_XFER_FROM_DEV) {
+        req->resid = dma_buf_read(buf, len, req->sg);
+    } else {
+        req->resid = dma_buf_write(buf, len, req->sg);
+    }
+    scsi_req_continue(req);
 }

 void scsi_req_print(SCSIRequest *req)
diff --git a/hw/scsi.h b/hw/scsi.h
index e1c52d2..811f61c 100644
--- a/hw/scsi.h
+++ b/hw/scsi.h
@@ -49,6 +49,8 @@ struct SCSIRequest {
     size_t            resid;
     SCSICommand       cmd;
     BlockDriverAIOCB  *aiocb;
+    QEMUSGList        *sg;
+    bool              dma_started;
     uint8_t sense[SCSI_SENSE_BUF_SIZE];
     uint32_t sense_len;
     bool enqueued;
@@ -115,6 +117,7 @@ struct SCSIBusInfo {
     void (*transfer_data)(SCSIRequest *req, uint32_t arg);
     void (*complete)(SCSIRequest *req, uint32_t arg, size_t resid);
     void (*cancel)(SCSIRequest *req);
+    QEMUSGList *(*get_sg_list)(SCSIRequest *req);
 };

 struct SCSIBus {
--
1.7.7.6
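Editorial sketch (not part of the patch): the "copy piecewise to/from the destination area(s)" step can be pictured with a small standalone example. Everything here is an assumption made for illustration: `SgEntry`, `SgList` and `sg_copy_from_buf` are hypothetical names, guest memory is modelled as plain host pointers, and the return value is taken to be the residual (bytes of the list left unfilled), which is how the patch uses the result of dma_buf_read when it assigns it to req->resid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for a scatter/gather list; these are not QEMU types. */
typedef struct {
    uint8_t *base;   /* start of one destination segment */
    size_t len;      /* length of that segment in bytes */
} SgEntry;

typedef struct {
    SgEntry *sg;     /* array of segments */
    int nsg;         /* number of segments */
    size_t size;     /* sum of all segment lengths */
} SgList;

/* Copy up to len bytes from a linear buffer into the segments, in order.
 * Returns the residual: how many bytes of the list were left unfilled. */
size_t sg_copy_from_buf(const uint8_t *buf, size_t len, SgList *list)
{
    size_t resid = list->size;
    int i;

    for (i = 0; i < list->nsg && len > 0; i++) {
        /* Never copy more than the segment holds or the buffer still has. */
        size_t once = list->sg[i].len < len ? list->sg[i].len : len;

        memcpy(list->sg[i].base, buf, once);
        buf += once;
        len -= once;
        resid -= once;
    }
    return resid;
}
```

With two 4-byte segments and a 6-byte source buffer, the first segment is filled completely, the second receives 2 bytes, and the function returns a residual of 2. A caller would report that value as a short transfer.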
[PATCH v3 07/15] scsi-disk: enable scatter/gather functionality
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 hw/scsi-bus.c  |    1 +
 hw/scsi-disk.c |   63 ---
 2 files changed, 51 insertions(+), 13 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 69cb3fc..817aa49 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -87,6 +87,7 @@ static void scsi_dma_restart_bh(void *opaque)
                 scsi_req_continue(req);
                 break;
             case SCSI_XFER_NONE:
+                assert(!req->sg);
                 scsi_req_dequeue(req);
                 scsi_req_enqueue(req);
                 break;
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 399e51e..0e4d6ad 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -38,6 +38,7 @@ do { fprintf(stderr, "scsi-disk: " fmt , ## __VA_ARGS__); } while (0)
 #include "sysemu.h"
 #include "blockdev.h"
 #include "block_int.h"
+#include "dma.h"

 #ifdef __linux
 #include <scsi/sg.h>
@@ -123,6 +124,27 @@ static uint32_t scsi_init_iovec(SCSIDiskReq *r)
     return r->qiov.size / 512;
 }

+static void scsi_dma_complete(void *opaque, int ret)
+{
+    SCSIDiskReq *r = (SCSIDiskReq *)opaque;
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+    bdrv_acct_done(s->qdev.conf.bs, &r->acct);
+
+    if (ret) {
+        if (scsi_handle_rw_error(r, -ret)) {
+            goto done;
+        }
+    }
+
+    r->sector += r->sector_count;
+    r->sector_count = 0;
+    scsi_req_complete(&r->req, GOOD);
+
+done:
+    scsi_req_unref(&r->req);
+}
+
 static void scsi_read_complete(void * opaque, int ret)
 {
     SCSIDiskReq *r = (SCSIDiskReq *)opaque;
@@ -213,10 +235,17 @@ static void scsi_read_data(SCSIRequest *req)
         return;
     }

-    n = scsi_init_iovec(r);
-    bdrv_acct_start(s->qdev.conf.bs, &r->acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_READ);
-    r->req.aiocb = bdrv_aio_readv(s->qdev.conf.bs, r->sector, &r->qiov, n,
-                              scsi_read_complete, r);
+    if (r->req.sg) {
+        dma_acct_start(s->qdev.conf.bs, &r->acct, r->req.sg, BDRV_ACCT_READ);
+        r->req.resid -= r->req.sg->size;
+        r->req.aiocb = dma_bdrv_read(s->qdev.conf.bs, r->req.sg, r->sector,
+                                     scsi_dma_complete, r);
+    } else {
+        n = scsi_init_iovec(r);
+        bdrv_acct_start(s->qdev.conf.bs, &r->acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_READ);
+        r->req.aiocb = bdrv_aio_readv(s->qdev.conf.bs, r->sector, &r->qiov, n,
+                                      scsi_read_complete, r);
+    }
 }

 /*
@@ -315,18 +344,26 @@ static void scsi_write_data(SCSIRequest *req)
         return;
     }

-    n = r->qiov.size / 512;
-    if (n) {
-        if (s->tray_open) {
-            scsi_write_complete(r, -ENOMEDIUM);
-            return;
-        }
+    if (!r->req.sg && !r->qiov.size) {
+        /* Called for the first time.  Ask the driver to send us more data.  */
+        scsi_write_complete(r, 0);
+        return;
+    }
+    if (s->tray_open) {
+        scsi_write_complete(r, -ENOMEDIUM);
+        return;
+    }
+
+    if (r->req.sg) {
+        dma_acct_start(s->qdev.conf.bs, &r->acct, r->req.sg, BDRV_ACCT_WRITE);
+        r->req.resid -= r->req.sg->size;
+        r->req.aiocb = dma_bdrv_write(s->qdev.conf.bs, r->req.sg, r->sector,
+                                      scsi_dma_complete, r);
+    } else {
+        n = r->qiov.size / 512;
         bdrv_acct_start(s->qdev.conf.bs, &r->acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_WRITE);
         r->req.aiocb = bdrv_aio_writev(s->qdev.conf.bs, r->sector, &r->qiov, n,
                                        scsi_write_complete, r);
-    } else {
-        /* Called for the first time.  Ask the driver to send us more data.  */
-        scsi_write_complete(r, 0);
     }
 }

--
1.7.7.6
[PATCH v3 08/15] scsi: add SCSIDevice vmstate definitions
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 hw/scsi-bus.c |  107 +++--
 hw/scsi.h     |   16
 2 files changed, 120 insertions(+), 3 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 817aa49..15841d0 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -647,10 +647,8 @@ void scsi_req_build_sense(SCSIRequest *req, SCSISense sense)
     req->sense_len = 18;
 }

-int32_t scsi_req_enqueue(SCSIRequest *req)
+static void scsi_req_enqueue_internal(SCSIRequest *req)
 {
-    int32_t rc;
-
     assert(!req->enqueued);
     scsi_req_ref(req);
     if (req->bus->info->get_sg_list) {
@@ -660,7 +658,14 @@ int32_t scsi_req_enqueue(SCSIRequest *req)
     }
     req->enqueued = true;
     QTAILQ_INSERT_TAIL(&req->dev->requests, req, next);
+}

+int32_t scsi_req_enqueue(SCSIRequest *req)
+{
+    int32_t rc;
+
+    assert(!req->retry);
+    scsi_req_enqueue_internal(req);
     scsi_req_ref(req);
     rc = req->ops->send_command(req, req->cmd.buf);
     scsi_req_unref(req);
@@ -1442,6 +1447,102 @@ SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int id, int lun)
     return target_dev;
 }

+/* SCSI request list.  For simplicity, pv points to the whole device */
+
+static void put_scsi_requests(QEMUFile *f, void *pv, size_t size)
+{
+    SCSIDevice *s = pv;
+    SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, s->qdev.parent_bus);
+    SCSIRequest *req;
+
+    QTAILQ_FOREACH(req, &s->requests, next) {
+        assert(!req->io_canceled);
+        assert(req->status == -1);
+        assert(req->retry);
+        assert(req->enqueued);
+
+        qemu_put_sbyte(f, 1);
+        qemu_put_buffer(f, req->cmd.buf, sizeof(req->cmd.buf));
+        qemu_put_be32s(f, &req->tag);
+        qemu_put_be32s(f, &req->lun);
+        if (bus->info->save_request) {
+            bus->info->save_request(f, req);
+        }
+        if (req->ops->save_request) {
+            req->ops->save_request(f, req);
+        }
+    }
+    qemu_put_sbyte(f, 0);
+}
+
+static int get_scsi_requests(QEMUFile *f, void *pv, size_t size)
+{
+    SCSIDevice *s = pv;
+    SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, s->qdev.parent_bus);
+
+    while (qemu_get_sbyte(f)) {
+        uint8_t buf[SCSI_CMD_BUF_SIZE];
+        uint32_t tag;
+        uint32_t lun;
+        SCSIRequest *req;
+
+        qemu_get_buffer(f, buf, sizeof(buf));
+        qemu_get_be32s(f, &tag);
+        qemu_get_be32s(f, &lun);
+        req = scsi_req_new(s, tag, lun, buf, NULL);
+        if (bus->info->load_request) {
+            req->hba_private = bus->info->load_request(f, req);
+        }
+        if (req->ops->load_request) {
+            req->ops->load_request(f, req);
+        }
+
+        /* Just restart it later.  */
+        req->retry = true;
+        scsi_req_enqueue_internal(req);
+
+        /* At this point, the request will be kept alive by the reference
+         * added by scsi_req_enqueue_internal, so we can release our reference.
+         * The HBA of course will add its own reference in the load_request
+         * callback if it needs to hold on the SCSIRequest.
+         */
+        scsi_req_unref(req);
+    }
+
+    return 0;
+}
+
+const VMStateInfo vmstate_info_scsi_requests = {
+    .name = "scsi-requests",
+    .get  = get_scsi_requests,
+    .put  = put_scsi_requests,
+};
+
+const VMStateDescription vmstate_scsi_device = {
+    .name = "SCSIDevice",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT8(unit_attention.key, SCSIDevice),
+        VMSTATE_UINT8(unit_attention.asc, SCSIDevice),
+        VMSTATE_UINT8(unit_attention.ascq, SCSIDevice),
+        VMSTATE_BOOL(sense_is_ua, SCSIDevice),
+        VMSTATE_UINT8_ARRAY(sense, SCSIDevice, SCSI_SENSE_BUF_SIZE),
+        VMSTATE_UINT32(sense_len, SCSIDevice),
+        {
+            .name         = "requests",
+            .version_id   = 0,
+            .field_exists = NULL,
+            .size         = 0,   /* ouch */
+            .info         = &vmstate_info_scsi_requests,
+            .flags        = VMS_SINGLE,
+            .offset       = 0,
+        },
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static void scsi_device_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *k = DEVICE_CLASS(klass);
diff --git a/hw/scsi.h b/hw/scsi.h
index 811f61c..c6624ca 100644
--- a/hw/scsi.h
+++ b/hw/scsi.h
@@ -96,6 +96,16 @@ struct SCSIDevice
     uint64_t max_lba;
 };

+extern const VMStateDescription vmstate_scsi_device;
+
+#define VMSTATE_SCSI_DEVICE(_field, _state) {                        \
+    .name       = (stringify(_field)),                               \
+    .size       = sizeof(SCSIDevice),                                \
+    .vmsd       = &vmstate_scsi_device,                              \
+    .flags      = VMS_STRUCT,                                        \
+    .offset     = vmstate_offset_value(_state, _field, SCSIDevice),  \
+}
+
 /* cdrom.c */
 int cdrom_read_toc(int nb_sectors,
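Editorial sketch (not part of the patch): the request list in this patch is written as a sequence of records, each preceded by a 1 byte, and closed by a 0 byte, so the reader needs no up-front count. The standalone example below illustrates only that framing. `Stream`, `ReqRec`, `save_requests` and `load_requests` are hypothetical names, the stream is a plain byte array rather than a QEMUFile, and each record is reduced to a big-endian tag and LUN (the real code also stores the command buffer and calls the HBA and device callbacks).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory byte stream standing in for a migration file. */
typedef struct {
    uint8_t *data;
    size_t pos;
    size_t cap;
} Stream;

/* Hypothetical per-request record: only the tag and LUN are kept here. */
typedef struct {
    uint32_t tag;
    uint32_t lun;
} ReqRec;

static void put_byte(Stream *s, uint8_t v)
{
    assert(s->pos < s->cap);
    s->data[s->pos++] = v;
}

static uint8_t get_byte(Stream *s)
{
    assert(s->pos < s->cap);
    return s->data[s->pos++];
}

/* Store a 32-bit value most significant byte first. */
static void put_be32(Stream *s, uint32_t v)
{
    int shift;

    for (shift = 24; shift >= 0; shift -= 8) {
        put_byte(s, (uint8_t)(v >> shift));
    }
}

static uint32_t get_be32(Stream *s)
{
    uint32_t v = 0;
    int i;

    for (i = 0; i < 4; i++) {
        v = (v << 8) | get_byte(s);
    }
    return v;
}

/* Write each record behind a 1 marker, then a single 0 terminator. */
void save_requests(Stream *s, const ReqRec *reqs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        put_byte(s, 1);
        put_be32(s, reqs[i].tag);
        put_be32(s, reqs[i].lun);
    }
    put_byte(s, 0);
}

/* Read records until the 0 terminator; returns how many were read. */
size_t load_requests(Stream *s, ReqRec *out, size_t max)
{
    size_t n = 0;

    while (get_byte(s)) {
        assert(n < max);
        out[n].tag = get_be32(s);
        out[n].lun = get_be32(s);
        n++;
    }
    return n;
}
```

The marker byte costs one byte per request, but it lets the writer walk the queue once without counting entries first.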
[PATCH v3 05/15] scsi: pass residual amount to command_complete
With the upcoming sglist support, HBAs will not see any transfer_data
call and will not have a way to detect short transfers. So pass the
residual amount of data upon command completion.

Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
        v2->v3: fixed resid type (Stefan)

 hw/esp.c         |    3 ++-
 hw/lsi53c895a.c  |    2 +-
 hw/scsi-bus.c    |   12 ++++++++----
 hw/scsi.h        |    3 ++-
 hw/spapr_vscsi.c |    2 +-
 hw/usb-msd.c     |    2 +-
 6 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/hw/esp.c b/hw/esp.c
index 2f44386..991e091 100644
--- a/hw/esp.c
+++ b/hw/esp.c
@@ -390,7 +390,8 @@ static void esp_do_dma(ESPState *s)
     esp_dma_done(s);
 }

-static void esp_command_complete(SCSIRequest *req, uint32_t status)
+static void esp_command_complete(SCSIRequest *req, uint32_t status,
+                                 size_t resid)
 {
     ESPState *s = DO_UPCAST(ESPState, busdev.qdev, req->bus->qbus.parent);

diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index 9a7ffe3..e36fe35 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -699,7 +699,7 @@ static int lsi_queue_req(LSIState *s, SCSIRequest *req, uint32_t len)
 }

 /* Callback to indicate that the SCSI layer has completed a command.  */
-static void lsi_command_complete(SCSIRequest *req, uint32_t status)
+static void lsi_command_complete(SCSIRequest *req, uint32_t status, size_t resid)
 {
     LSIState *s = DO_UPCAST(LSIState, dev.qdev, req->bus->qbus.parent);
     int out;
diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 0ee50a8..6a069f4 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -533,6 +533,8 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun,
     }

     req->cmd = cmd;
+    req->resid = req->cmd.xfer;
+
     switch (buf[0]) {
     case INQUIRY:
         trace_scsi_inquiry(d->id, lun, tag, cmd.buf[1], cmd.buf[2]);
@@ -1275,10 +1277,12 @@ void scsi_req_data(SCSIRequest *req, int len)
 {
     if (req->io_canceled) {
         trace_scsi_req_data_canceled(req->dev->id, req->lun, req->tag, len);
-    } else {
-        trace_scsi_req_data(req->dev->id, req->lun, req->tag, len);
-        req->bus->info->transfer_data(req, len);
+        return;
     }
+    trace_scsi_req_data(req->dev->id, req->lun, req->tag, len);
+    assert(req->cmd.mode != SCSI_XFER_NONE);
+    req->resid -= len;
+    req->bus->info->transfer_data(req, len);
 }

 void scsi_req_print(SCSIRequest *req)
@@ -1337,7 +1341,7 @@ void scsi_req_complete(SCSIRequest *req, int status)

     scsi_req_ref(req);
     scsi_req_dequeue(req);
-    req->bus->info->complete(req, req->status);
+    req->bus->info->complete(req, req->status, req->resid);
     scsi_req_unref(req);
 }

diff --git a/hw/scsi.h b/hw/scsi.h
index dc72b6f..e1c52d2 100644
--- a/hw/scsi.h
+++ b/hw/scsi.h
@@ -46,6 +46,7 @@ struct SCSIRequest {
     uint32_t          tag;
     uint32_t          lun;
     uint32_t          status;
+    size_t            resid;
     SCSICommand       cmd;
     BlockDriverAIOCB  *aiocb;
     uint8_t sense[SCSI_SENSE_BUF_SIZE];
@@ -112,7 +113,7 @@ struct SCSIBusInfo {
     int tcq;
     int max_channel, max_target, max_lun;
     void (*transfer_data)(SCSIRequest *req, uint32_t arg);
-    void (*complete)(SCSIRequest *req, uint32_t arg);
+    void (*complete)(SCSIRequest *req, uint32_t arg, size_t resid);
     void (*cancel)(SCSIRequest *req);
 };

diff --git a/hw/spapr_vscsi.c b/hw/spapr_vscsi.c
index 9cfce19..d7123df 100644
--- a/hw/spapr_vscsi.c
+++ b/hw/spapr_vscsi.c
@@ -494,7 +494,7 @@ static void vscsi_transfer_data(SCSIRequest *sreq, uint32_t len)
 }

 /* Callback to indicate that the SCSI layer has completed a transfer.  */
-static void vscsi_command_complete(SCSIRequest *sreq, uint32_t status)
+static void vscsi_command_complete(SCSIRequest *sreq, uint32_t status, size_t resid)
 {
     VSCSIState *s = DO_UPCAST(VSCSIState, vdev.qdev, sreq->bus->qbus.parent);
     vscsi_req *req = sreq->hba_private;
diff --git a/hw/usb-msd.c b/hw/usb-msd.c
index 6153376..47b8b8e 100644
--- a/hw/usb-msd.c
+++ b/hw/usb-msd.c
@@ -223,7 +223,7 @@ static void usb_msd_transfer_data(SCSIRequest *req, uint32_t len)
     }
 }

-static void usb_msd_command_complete(SCSIRequest *req, uint32_t status)
+static void usb_msd_command_complete(SCSIRequest *req, uint32_t status, size_t resid)
 {
     MSDState *s = DO_UPCAST(MSDState, dev.qdev, req->bus->qbus.parent);
     USBPacket *p = s->packet;
--
1.7.7.6
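Editorial sketch (not part of the patch): the residual bookkeeping described above amounts to starting at the expected transfer length, subtracting every chunk that is handed over, and reporting what is left at completion. The example below is a minimal standalone model under that reading. `Req`, `req_init`, `req_data` and `req_complete` are hypothetical names, not the QEMU functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down request; only the fields needed here. */
typedef struct {
    size_t xfer;    /* bytes the command expects to transfer */
    size_t resid;   /* bytes not yet transferred */
} Req;

/* A new request has transferred nothing, so the residual is the full length. */
void req_init(Req *req, size_t xfer)
{
    req->xfer = xfer;
    req->resid = xfer;
}

/* Account for one chunk of len bytes handed to the HBA. */
void req_data(Req *req, size_t len)
{
    assert(len <= req->resid);
    req->resid -= len;
}

/* At completion the HBA is told how much was never transferred. */
size_t req_complete(const Req *req)
{
    return req->resid;
}
```

A 4096-byte command that moves 1024 and then 2048 bytes completes with a residual of 1024, which an HBA could report as a short transfer without having seen the individual chunks.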
[PATCH v3 09/15] scsi-generic: add migration support
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 hw/scsi-generic.c |   25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/hw/scsi-generic.c b/hw/scsi-generic.c
index 4859212..cd62922 100644
--- a/hw/scsi-generic.c
+++ b/hw/scsi-generic.c
@@ -59,6 +59,28 @@ typedef struct SCSIGenericReq {
     sg_io_hdr_t io_header;
 } SCSIGenericReq;

+static void scsi_generic_save_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
+
+    qemu_put_sbe32s(f, &r->buflen);
+    if (r->buflen && r->req.cmd.mode == SCSI_XFER_TO_DEV) {
+        assert(!r->req.sg);
+        qemu_put_buffer(f, r->buf, r->req.cmd.xfer);
+    }
+}
+
+static void scsi_generic_load_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
+
+    qemu_get_sbe32s(f, &r->buflen);
+    if (r->buflen && r->req.cmd.mode == SCSI_XFER_TO_DEV) {
+        assert(!r->req.sg);
+        qemu_get_buffer(f, r->buf, r->req.cmd.xfer);
+    }
+}
+
 static void scsi_free_request(SCSIRequest *req)
 {
     SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
@@ -446,6 +468,8 @@ const SCSIReqOps scsi_generic_req_ops = {
     .write_data   = scsi_write_data,
     .cancel_io    = scsi_cancel_io,
     .get_buf      = scsi_get_buf,
+    .load_request = scsi_generic_load_request,
+    .save_request = scsi_generic_save_request,
 };

 static SCSIRequest *scsi_new_request(SCSIDevice *d, uint32_t tag, uint32_t lun,
@@ -474,6 +498,7 @@ static void scsi_generic_class_initfn(ObjectClass *klass, void *data)
     dc->desc = "pass through generic scsi device (/dev/sg*)";
     dc->reset = scsi_generic_reset;
     dc->props = scsi_generic_properties;
+    dc->vmsd  = &vmstate_scsi_device;
 }

 static TypeInfo scsi_generic_info = {
--
1.7.7.6
[PATCH v3 10/15] scsi-disk: add migration support
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 hw/scsi-disk.c |   59 ---
 1 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 0e4d6ad..4d7b4eb 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -111,12 +111,12 @@ static void scsi_cancel_io(SCSIRequest *req)
     r->req.aiocb = NULL;
 }

-static uint32_t scsi_init_iovec(SCSIDiskReq *r)
+static uint32_t scsi_init_iovec(SCSIDiskReq *r, size_t size)
 {
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);

     if (!r->iov.iov_base) {
-        r->buflen = SCSI_DMA_BUF_SIZE;
+        r->buflen = size;
         r->iov.iov_base = qemu_blockalign(s->qdev.conf.bs, r->buflen);
     }
     r->iov.iov_len = MIN(r->sector_count * 512, r->buflen);
@@ -124,6 +124,35 @@ static uint32_t scsi_init_iovec(SCSIDiskReq *r)
     return r->qiov.size / 512;
 }

+static void scsi_disk_save_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+
+    qemu_put_be64s(f, &r->sector);
+    qemu_put_be32s(f, &r->sector_count);
+    qemu_put_be32s(f, &r->buflen);
+    if (r->buflen && r->req.cmd.mode == SCSI_XFER_TO_DEV) {
+        qemu_put_buffer(f, r->iov.iov_base, r->iov.iov_len);
+    }
+}
+
+static void scsi_disk_load_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+
+    qemu_get_be64s(f, &r->sector);
+    qemu_get_be32s(f, &r->sector_count);
+    qemu_get_be32s(f, &r->buflen);
+    if (r->buflen) {
+        scsi_init_iovec(r, r->buflen);
+        if (r->req.cmd.mode == SCSI_XFER_TO_DEV) {
+            qemu_get_buffer(f, r->iov.iov_base, r->iov.iov_len);
+        }
+    }
+
+    qemu_iovec_init_external(&r->qiov, &r->iov, 1);
+}
+
 static void scsi_dma_complete(void *opaque, int ret)
 {
     SCSIDiskReq *r = (SCSIDiskReq *)opaque;
@@ -241,7 +270,7 @@ static void scsi_read_data(SCSIRequest *req)
         r->req.aiocb = dma_bdrv_read(s->qdev.conf.bs, r->req.sg, r->sector,
                                      scsi_dma_complete, r);
     } else {
-        n = scsi_init_iovec(r);
+        n = scsi_init_iovec(r, SCSI_DMA_BUF_SIZE);
         bdrv_acct_start(s->qdev.conf.bs, &r->acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_READ);
         r->req.aiocb = bdrv_aio_readv(s->qdev.conf.bs, r->sector, &r->qiov, n,
                                       scsi_read_complete, r);
@@ -316,7 +345,7 @@ static void scsi_write_complete(void * opaque, int ret)
     if (r->sector_count == 0) {
         scsi_req_complete(&r->req, GOOD);
     } else {
-        scsi_init_iovec(r);
+        scsi_init_iovec(r, SCSI_DMA_BUF_SIZE);
         DPRINTF("Write complete tag=0x%x more=%d\n", r->req.tag, r->qiov.size);
         scsi_req_data(&r->req, r->qiov.size);
     }
@@ -1621,6 +1650,8 @@ static const SCSIReqOps scsi_disk_reqops = {
     .write_data   = scsi_write_data,
     .cancel_io    = scsi_cancel_io,
     .get_buf      = scsi_get_buf,
+    .load_request = scsi_disk_load_request,
+    .save_request = scsi_disk_save_request,
 };

 static SCSIRequest *scsi_new_request(SCSIDevice *d, uint32_t tag, uint32_t lun,
@@ -1755,6 +1786,22 @@ static Property scsi_hd_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };

+static const VMStateDescription vmstate_scsi_disk_state = {
+    .name = "scsi-disk",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_SCSI_DEVICE(qdev, SCSIDiskState),
+        VMSTATE_BOOL(media_changed, SCSIDiskState),
+        VMSTATE_BOOL(media_event, SCSIDiskState),
+        VMSTATE_BOOL(eject_request, SCSIDiskState),
+        VMSTATE_BOOL(tray_open, SCSIDiskState),
+        VMSTATE_BOOL(tray_locked, SCSIDiskState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static void scsi_hd_class_initfn(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -1768,6 +1815,7 @@ static void scsi_hd_class_initfn(ObjectClass *klass, void *data)
     dc->desc = "virtual SCSI disk";
     dc->reset = scsi_disk_reset;
     dc->props = scsi_hd_properties;
+    dc->vmsd  = &vmstate_scsi_disk_state;
 }

 static TypeInfo scsi_hd_info = {
@@ -1795,6 +1843,7 @@ static void scsi_cd_class_initfn(ObjectClass *klass, void *data)
     dc->desc = "virtual SCSI CD-ROM";
     dc->reset = scsi_disk_reset;
     dc->props = scsi_cd_properties;
+    dc->vmsd  = &vmstate_scsi_disk_state;
 }

 static TypeInfo scsi_cd_info = {
@@ -1822,6 +1871,7 @@ static void scsi_block_class_initfn(ObjectClass *klass, void *data)
     dc->desc = "SCSI block device passthrough";
     dc->reset = scsi_disk_reset;
     dc->props = scsi_block_properties;
+    dc->vmsd  = &vmstate_scsi_disk_state;
 }

 static TypeInfo scsi_block_info = {
@@ -1851,6 +1901,7 @@ static void scsi_disk_class_initfn(ObjectClass *klass, void *data)
     dc->desc = "virtual SCSI disk or CD-ROM (legacy)";
     dc->reset = scsi_disk_reset;
     dc->props = scsi_disk_properties;
+    dc->vmsd  = &vmstate_scsi_disk_state;
 }

 static TypeInfo
[PATCH v3 11/15] virtio-scsi: Add virtio-scsi stub device
From: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>

Add a useless virtio SCSI HBA device:

    qemu -device virtio-scsi-pci

Signed-off-by: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
        v2->v3: changed virtio id

 Makefile.target                   |    1 +
 default-configs/pci.mak           |    1 +
 default-configs/s390x-softmmu.mak |    1 +
 hw/pci.h                          |    1 +
 hw/s390-virtio-bus.c              |   34 ++
 hw/s390-virtio-bus.h              |    4 +-
 hw/virtio-pci.c                   |   56 +
 hw/virtio-pci.h                   |    2 +
 hw/virtio-scsi.c                  |  228 +
 hw/virtio-scsi.h                  |   36 ++
 hw/virtio.h                       |    3 +
 11 files changed, 366 insertions(+), 1 deletions(-)
 create mode 100644 hw/virtio-scsi.c
 create mode 100644 hw/virtio-scsi.h

diff --git a/Makefile.target b/Makefile.target
index 29fde6e..c8f61d6 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -200,6 +200,7 @@ obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o ioport.o
 # need to fix this properly
 obj-$(CONFIG_NO_PCI) += pci-stub.o
 obj-$(CONFIG_VIRTIO) += virtio.o virtio-blk.o virtio-balloon.o virtio-net.o virtio-serial-bus.o
+obj-$(CONFIG_VIRTIO_SCSI) += virtio-scsi.o
 obj-y += vhost_net.o
 obj-$(CONFIG_VHOST_NET) += vhost.o
 obj-$(CONFIG_REALLY_VIRTFS) += 9pfs/virtio-9p-device.o
diff --git a/default-configs/pci.mak b/default-configs/pci.mak
index 9d3e1db..21e4ccf 100644
--- a/default-configs/pci.mak
+++ b/default-configs/pci.mak
@@ -1,5 +1,6 @@
 CONFIG_PCI=y
 CONFIG_VIRTIO_PCI=y
+CONFIG_VIRTIO_SCSI=y
 CONFIG_VIRTIO=y
 CONFIG_USB_UHCI=y
 CONFIG_USB_OHCI=y
diff --git a/default-configs/s390x-softmmu.mak b/default-configs/s390x-softmmu.mak
index 3005729..e588803 100644
--- a/default-configs/s390x-softmmu.mak
+++ b/default-configs/s390x-softmmu.mak
@@ -1 +1,2 @@
 CONFIG_VIRTIO=y
+CONFIG_VIRTIO_SCSI=y
diff --git a/hw/pci.h b/hw/pci.h
index 33b0b18..ff4c12d 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -75,6 +75,7 @@
 #define PCI_DEVICE_ID_VIRTIO_BLOCK       0x1001
 #define PCI_DEVICE_ID_VIRTIO_BALLOON     0x1002
 #define PCI_DEVICE_ID_VIRTIO_CONSOLE     0x1003
+#define PCI_DEVICE_ID_VIRTIO_SCSI        0x1004

 #define FMT_PCIBUS                      PRIx64

diff --git a/hw/s390-virtio-bus.c b/hw/s390-virtio-bus.c
index 49140f8..3515abc 100644
--- a/hw/s390-virtio-bus.c
+++ b/hw/s390-virtio-bus.c
@@ -169,6 +169,39 @@ static int s390_virtio_serial_init(VirtIOS390Device *dev)
     return r;
 }

+static int s390_virtio_scsi_init(VirtIOS390Device *dev)
+{
+    VirtIODevice *vdev;
+
+    vdev = virtio_scsi_init((DeviceState *)dev, &dev->scsi);
+    if (!vdev) {
+        return -1;
+    }
+
+    return s390_virtio_device_init(dev, vdev);
+}
+
+static Property virtio_scsi_properties[] = {
+    DEFINE_VIRTIO_SCSI_PROPERTIES(VirtIOPCIProxy, host_features, scsi),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void s390_virtio_scsi_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    VirtIOS390DeviceClass *k = VIRTIO_S390_DEVICE_CLASS(klass);
+
+    k->init = s390_virtio_scsi_init;
+    dc->props = s390_virtio_scsi_properties;
+}
+
+static DeviceInfo virtio_scsi_info = {
+    .name          = "virtio-scsi-s390",
+    .parent        = TYPE_VIRTIO_S390_DEVICE,
+    .instance_size = sizeof(VirtIOS390Device),
+    .class_init    = s390_virtio_scsi_class_init,
+};
+
 static uint64_t s390_virtio_device_vq_token(VirtIOS390Device *dev, int vq)
 {
     ram_addr_t token_off;
@@ -439,6 +472,7 @@ static void s390_virtio_register(void)
     type_register_static(&s390_virtio_serial);
     type_register_static(&s390_virtio_blk);
     type_register_static(&s390_virtio_net);
+    type_register_static(&s390_virtio_scsi);
 }

 device_init(s390_virtio_register);
diff --git a/hw/s390-virtio-bus.h b/hw/s390-virtio-bus.h
index b5e59b7..ef534b6 100644
--- a/hw/s390-virtio-bus.h
+++ b/hw/s390-virtio-bus.h
@@ -19,6 +19,7 @@

 #include "virtio-net.h"
 #include "virtio-serial.h"
+#include "virtio-scsi.h"

 #define VIRTIO_DEV_OFFS_TYPE		0	/* 8 bits */
 #define VIRTIO_DEV_OFFS_NUM_VQ		1	/* 8 bits */
@@ -67,7 +68,8 @@ struct VirtIOS390Device {
     uint32_t host_features;
     virtio_serial_conf serial;
     virtio_net_conf net;
-};
+    VirtIOSCSIConf scsi;
+} VirtIOS390Device;

 typedef struct VirtIOS390Bus {
     BusState bus;
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 93fff54..08e63a6 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -21,6 +21,7 @@
 #include "virtio-blk.h"
 #include "virtio-net.h"
 #include "virtio-serial.h"
+#include "virtio-scsi.h"
 #include "pci.h"
 #include "qemu-error.h"
 #include "msix.h"
@@ -930,12 +931,67 @@ static TypeInfo virtio_balloon_info = {
     .class_init    = virtio_balloon_class_init,
 };

+static int virtio_scsi_init_pci(PCIDevice
[PATCH v3 13/15] virtio-scsi: add basic SCSI bus operation
Reviewed-by: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
        v2->v3: fixed sense length (Christian Hoff)

 hw/virtio-scsi.c |  110 +++--
 1 files changed, 97 insertions(+), 13 deletions(-)

diff --git a/hw/virtio-scsi.c b/hw/virtio-scsi.c
index b34c14f..21264a1 100644
--- a/hw/virtio-scsi.c
+++ b/hw/virtio-scsi.c
@@ -128,6 +128,7 @@ typedef struct {
     DeviceState *qdev;
     VirtIOSCSIConf *conf;

+    SCSIBus bus;
     VirtQueue *ctrl_vq;
     VirtQueue *event_vq;
     VirtQueue *cmd_vq;
@@ -156,6 +157,22 @@ typedef struct VirtIOSCSIReq {
     } resp;
 } VirtIOSCSIReq;

+static inline int virtio_scsi_get_lun(uint8_t *lun)
+{
+    return ((lun[2] << 8) | lun[3]) & 0x3FFF;
+}
+
+static inline SCSIDevice *virtio_scsi_device_find(VirtIOSCSI *s, uint8_t *lun)
+{
+    if (lun[0] != 1) {
+        return NULL;
+    }
+    if (lun[2] != 0 && !(lun[2] >= 0x40 && lun[2] < 0x80)) {
+        return NULL;
+    }
+    return scsi_device_find(&s->bus, 0, lun[1], virtio_scsi_get_lun(lun));
+}
+
 static void virtio_scsi_complete_req(VirtIOSCSIReq *req)
 {
     VirtIOSCSI *s = req->dev;
@@ -240,7 +257,42 @@ static void virtio_scsi_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
     }
 }

-static void virtio_scsi_fail_cmd_req(VirtIOSCSI *s, VirtIOSCSIReq *req)
+static void virtio_scsi_command_complete(SCSIRequest *r, uint32_t status,
+                                         size_t resid)
+{
+    VirtIOSCSIReq *req = r->hba_private;
+
+    req->resp.cmd->response = VIRTIO_SCSI_S_OK;
+    req->resp.cmd->status = status;
+    if (req->resp.cmd->status == GOOD) {
+        req->resp.cmd->resid = resid;
+    } else {
+        req->resp.cmd->resid = 0;
+        req->resp.cmd->sense_len =
+            scsi_req_get_sense(r, req->resp.cmd->sense, VIRTIO_SCSI_SENSE_SIZE);
+    }
+    virtio_scsi_complete_req(req);
+}
+
+static QEMUSGList *virtio_scsi_get_sg_list(SCSIRequest *r)
+{
+    VirtIOSCSIReq *req = r->hba_private;
+
+    return &req->qsgl;
+}
+
+static void virtio_scsi_request_cancelled(SCSIRequest *r)
+{
+    VirtIOSCSIReq *req = r->hba_private;
+
+    if (!req) {
+        return;
+    }
+    req->resp.cmd->response = VIRTIO_SCSI_S_ABORTED;
+    virtio_scsi_complete_req(req);
+}
+
+static void virtio_scsi_fail_cmd_req(VirtIOSCSIReq *req)
 {
     req->resp.cmd->response = VIRTIO_SCSI_S_FAILURE;
     virtio_scsi_complete_req(req);
@@ -250,8 +301,10 @@ static void virtio_scsi_handle_cmd(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOSCSI *s = (VirtIOSCSI *)vdev;
     VirtIOSCSIReq *req;
+    int n;

     while ((req = virtio_scsi_pop_req(s, vq))) {
+        SCSIDevice *d;
         int out_size, in_size;
         if (req->elem.out_num < 1 || req->elem.in_num < 1) {
             virtio_scsi_bad_req();
@@ -265,21 +318,36 @@ static void virtio_scsi_handle_cmd(VirtIODevice *vdev, VirtQueue *vq)
         }

         if (req->elem.out_num > 1 && req->elem.in_num > 1) {
-            virtio_scsi_fail_cmd_req(s, req);
+            virtio_scsi_fail_cmd_req(req);
             continue;
         }

-        req->resp.cmd->resid = 0;
-        req->resp.cmd->status_qualifier = 0;
-        req->resp.cmd->status = CHECK_CONDITION;
-        req->resp.cmd->sense_len = 4;
-        req->resp.cmd->sense[0] = 0xf0; /* Fixed format current sense */
-        req->resp.cmd->sense[1] = ILLEGAL_REQUEST;
-        req->resp.cmd->sense[2] = 0x20;
-        req->resp.cmd->sense[3] = 0x00;
-        req->resp.cmd->response = VIRTIO_SCSI_S_OK;
-
-        virtio_scsi_complete_req(req);
+        d = virtio_scsi_device_find(s, req->req.cmd->lun);
+        if (!d) {
+            req->resp.cmd->response = VIRTIO_SCSI_S_BAD_TARGET;
+            virtio_scsi_complete_req(req);
+            continue;
+        }
+        req->sreq = scsi_req_new(d, req->req.cmd->tag,
+                                 virtio_scsi_get_lun(req->req.cmd->lun),
+                                 req->req.cmd->cdb, req);
+
+        if (req->sreq->cmd.mode != SCSI_XFER_NONE) {
+            int req_mode =
+                (req->elem.in_num > 1 ? SCSI_XFER_FROM_DEV : SCSI_XFER_TO_DEV);
+
+            if (req->sreq->cmd.mode != req_mode ||
+                req->sreq->cmd.xfer > req->qsgl.size) {
+                req->resp.cmd->response = VIRTIO_SCSI_S_OVERRUN;
+                virtio_scsi_complete_req(req);
+                continue;
+            }
+        }
+
+        n = scsi_req_enqueue(req->sreq);
+        if (n) {
+            scsi_req_continue(req->sreq);
+        }
     }
 }

@@ -331,6 +399,17 @@ static void virtio_scsi_reset(VirtIODevice *vdev)
     s->cdb_size = VIRTIO_SCSI_CDB_SIZE;
 }

+static struct SCSIBusInfo virtio_scsi_scsi_info = {
+    .tcq = true,
+    .max_channel = VIRTIO_SCSI_MAX_CHANNEL,
+    .max_target = VIRTIO_SCSI_MAX_TARGET,
+    .max_lun = VIRTIO_SCSI_MAX_LUN,
+
+    .complete = virtio_scsi_command_complete,
+    .cancel = virtio_scsi_request_cancelled,
+    .get_sg_list = virtio_scsi_get_sg_list,
+};
+
 VirtIODevice
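Editorial sketch (not part of the patch): the LUN handling in virtio_scsi_get_lun and virtio_scsi_device_find can be tried in isolation. The example assumes that the operators lost in this archive copy were `<<`, `&`, `&&`, `>=` and `<`. Under that reading, byte 0 must be 1, byte 1 is the target, and bytes 2 and 3 carry a 14-bit LUN, with byte 2 either zero or in the range 0x40 to 0x7f. `lun_is_valid` and `lun_decode` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract the 14-bit LUN from bytes 2 and 3 of the 8-byte LUN field. */
int lun_decode(const uint8_t *lun)
{
    return ((lun[2] << 8) | lun[3]) & 0x3FFF;
}

/* Accept only the addressing forms the patch appears to accept. */
bool lun_is_valid(const uint8_t *lun)
{
    if (lun[0] != 1) {
        return false;
    }
    /* Byte 2 is either zero or carries the 0x40 flag with the high LUN bits. */
    if (lun[2] != 0 && !(lun[2] >= 0x40 && lun[2] < 0x80)) {
        return false;
    }
    return true;
}
```

For the field `{1, 3, 0x40, 0x05, ...}` the check passes and the decoded LUN is 5, because the mask drops the 0x40 flag bit. A field with byte 2 equal to 0x80 is rejected.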
[PATCH v3 14/15] virtio-scsi: process control queue requests
Reviewed-by: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 hw/virtio-scsi.c |  125 ++---
 1 files changed, 117 insertions(+), 8 deletions(-)

diff --git a/hw/virtio-scsi.c b/hw/virtio-scsi.c
index 21264a1..7ad60ec 100644
--- a/hw/virtio-scsi.c
+++ b/hw/virtio-scsi.c
@@ -134,6 +134,7 @@ typedef struct {
     VirtQueue *cmd_vq;
     uint32_t sense_size;
     uint32_t cdb_size;
+    bool resetting;
 } VirtIOSCSI;

 typedef struct VirtIOSCSIReq {
@@ -236,15 +237,95 @@ static VirtIOSCSIReq *virtio_scsi_pop_req(VirtIOSCSI *s, VirtQueue *vq)
     return req;
 }

-static void virtio_scsi_fail_ctrl_req(VirtIOSCSIReq *req)
+static void virtio_scsi_do_tmf(VirtIOSCSI *s, VirtIOSCSIReq *req)
 {
-    if (req->req.tmf->type == VIRTIO_SCSI_T_TMF) {
-        req->resp.tmf->response = VIRTIO_SCSI_S_FAILURE;
-    } else {
-        req->resp.an->response = VIRTIO_SCSI_S_FAILURE;
+    SCSIDevice *d = virtio_scsi_device_find(s, req->req.cmd->lun);
+    SCSIRequest *r, *next;
+    DeviceState *qdev;
+    int target;
+
+    switch (req->req.tmf->subtype) {
+    case VIRTIO_SCSI_T_TMF_ABORT_TASK:
+    case VIRTIO_SCSI_T_TMF_QUERY_TASK:
+        d = virtio_scsi_device_find(s, req->req.cmd->lun);
+        if (!d) {
+            goto fail;
+        }
+        if (d->lun != virtio_scsi_get_lun(req->req.cmd->lun)) {
+            req->resp.tmf->response = VIRTIO_SCSI_S_INCORRECT_LUN;
+            break;
+        }
+        QTAILQ_FOREACH_SAFE(r, &d->requests, next, next) {
+            if (r->tag == req->req.cmd->tag) {
+                break;
+            }
+        }
+        if (r && r->hba_private) {
+            if (req->req.tmf->subtype == VIRTIO_SCSI_T_TMF_ABORT_TASK) {
+                scsi_req_cancel(r);
+            }
+            req->resp.tmf->response = VIRTIO_SCSI_S_FUNCTION_SUCCEEDED;
+        } else {
+            req->resp.tmf->response = VIRTIO_SCSI_S_OK;
+        }
+        break;
+
+    case VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET:
+        d = virtio_scsi_device_find(s, req->req.cmd->lun);
+        if (!d) {
+            goto fail;
+        }
+        if (d->lun == virtio_scsi_get_lun(req->req.cmd->lun)) {
+            s->resetting++;
+            qdev_reset_all(&d->qdev);
+            s->resetting--;
+        }
+        break;
+
+    case VIRTIO_SCSI_T_TMF_ABORT_TASK_SET:
+    case VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET:
+    case VIRTIO_SCSI_T_TMF_QUERY_TASK_SET:
+        d = virtio_scsi_device_find(s, req->req.cmd->lun);
+        if (!d) {
+            goto fail;
+        }
+        if (d->lun != virtio_scsi_get_lun(req->req.cmd->lun)) {
+            req->resp.tmf->response = VIRTIO_SCSI_S_INCORRECT_LUN;
+            break;
+        }
+        req->resp.tmf->response = VIRTIO_SCSI_S_OK;
+        QTAILQ_FOREACH_SAFE(r, &d->requests, next, next) {
+            if (r->hba_private) {
+                if (req->req.tmf->subtype != VIRTIO_SCSI_T_TMF_QUERY_TASK) {
+                    scsi_req_cancel(r);
+                }
+                req->resp.tmf->response = VIRTIO_SCSI_S_FUNCTION_SUCCEEDED;
+            }
+        }
+        break;
+
+    case VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET:
+        target = req->req.cmd->lun[1];
+        s->resetting++;
+        QTAILQ_FOREACH(qdev, &s->bus.qbus.children, sibling) {
+            d = DO_UPCAST(SCSIDevice, qdev, qdev);
+            if (d->channel == 0 && d->id == target) {
+                qdev_reset_all(&d->qdev);
+            }
+        }
+        s->resetting--;
+        break;
+
+    case VIRTIO_SCSI_T_TMF_CLEAR_ACA:
+    default:
+        req->resp.tmf->response = VIRTIO_SCSI_S_FUNCTION_REJECTED;
+        break;
     }

-    virtio_scsi_complete_req(req);
+    return;
+
+fail:
+    req->resp.tmf->response = VIRTIO_SCSI_S_BAD_TARGET;
 }

 static void virtio_scsi_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
@@ -253,7 +334,31 @@ static void virtio_scsi_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
     VirtIOSCSIReq *req;

     while ((req = virtio_scsi_pop_req(s, vq))) {
-        virtio_scsi_fail_ctrl_req(req);
+        int out_size, in_size;
+        if (req->elem.out_num < 1 || req->elem.in_num < 1) {
+            virtio_scsi_bad_req();
+            continue;
+        }
+
+        out_size = req->elem.out_sg[0].iov_len;
+        in_size = req->elem.in_sg[0].iov_len;
+        if (req->req.tmf->type == VIRTIO_SCSI_T_TMF) {
+            if (out_size < sizeof(VirtIOSCSICtrlTMFReq) ||
+                in_size < sizeof(VirtIOSCSICtrlTMFResp)) {
+                virtio_scsi_bad_req();
+            }
+            virtio_scsi_do_tmf(s, req);
+
+        } else if (req->req.tmf->type == VIRTIO_SCSI_T_AN_QUERY ||
+                   req->req.tmf->type == VIRTIO_SCSI_T_AN_SUBSCRIBE) {
+            if (out_size < sizeof(VirtIOSCSICtrlANReq) ||
+                in_size < sizeof(VirtIOSCSICtrlANResp)) {
+                virtio_scsi_bad_req();
+            }
+            req->resp.an->event_actual = 0;
+            req->resp.an->response = VIRTIO_SCSI_S_OK;
+        }
+
[PATCH v3 15/15] virtio-scsi: add migration support
Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/virtio-scsi.c |   50 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 49 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-scsi.c b/hw/virtio-scsi.c
index 7ad60ec..f5cecfc 100644
--- a/hw/virtio-scsi.c
+++ b/hw/virtio-scsi.c
@@ -237,6 +237,34 @@ static VirtIOSCSIReq *virtio_scsi_pop_req(VirtIOSCSI *s, VirtQueue *vq)
     return req;
 }
 
+static void virtio_scsi_save_request(QEMUFile *f, SCSIRequest *sreq)
+{
+    VirtIOSCSIReq *req = sreq->hba_private;
+
+    qemu_put_buffer(f, (unsigned char *)&req->elem, sizeof(req->elem));
+}
+
+static void *virtio_scsi_load_request(QEMUFile *f, SCSIRequest *sreq)
+{
+    SCSIBus *bus = sreq->bus;
+    VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
+    VirtIOSCSIReq *req;
+
+    req = g_malloc(sizeof(*req));
+    qemu_get_buffer(f, (unsigned char *)&req->elem, sizeof(req->elem));
+    virtio_scsi_parse_req(s, s->cmd_vq, req);
+
+    scsi_req_ref(sreq);
+    req->sreq = sreq;
+    if (req->sreq->cmd.mode != SCSI_XFER_NONE) {
+        int req_mode =
+            (req->elem.in_num > 1 ? SCSI_XFER_FROM_DEV : SCSI_XFER_TO_DEV);
+
+        assert(req->sreq->cmd.mode == req_mode);
+    }
+    return req;
+}
+
 static void virtio_scsi_do_tmf(VirtIOSCSI *s, VirtIOSCSIReq *req)
 {
     SCSIDevice *d = virtio_scsi_device_find(s, req->req.cmd->lun);
@@ -508,6 +536,22 @@ static void virtio_scsi_reset(VirtIODevice *vdev)
     s->cdb_size = VIRTIO_SCSI_CDB_SIZE;
 }
 
+/* The device does not have anything to save beyond the virtio data.
+ * Request data is saved with callbacks from SCSI devices.
+ */
+static void virtio_scsi_save(QEMUFile *f, void *opaque)
+{
+    VirtIOSCSI *s = opaque;
+    virtio_save(&s->vdev, f);
+}
+
+static int virtio_scsi_load(QEMUFile *f, void *opaque, int version_id)
+{
+    VirtIOSCSI *s = opaque;
+    virtio_load(&s->vdev, f);
+    return 0;
+}
+
 static struct SCSIBusInfo virtio_scsi_scsi_info = {
     .tcq = true,
     .max_channel = VIRTIO_SCSI_MAX_CHANNEL,
@@ -517,11 +561,14 @@ static struct SCSIBusInfo virtio_scsi_scsi_info = {
     .complete = virtio_scsi_command_complete,
     .cancel = virtio_scsi_request_cancelled,
     .get_sg_list = virtio_scsi_get_sg_list,
+    .save_request = virtio_scsi_save_request,
+    .load_request = virtio_scsi_load_request,
 };
 
 VirtIODevice *virtio_scsi_init(DeviceState *dev, VirtIOSCSIConf *proxyconf)
 {
     VirtIOSCSI *s;
+    static int virtio_scsi_id;
 
     s = (VirtIOSCSI *)virtio_common_init("virtio-scsi", VIRTIO_ID_SCSI,
                                          sizeof(VirtIOSCSIConfig),
@@ -548,7 +595,8 @@ VirtIODevice *virtio_scsi_init(DeviceState *dev, VirtIOSCSIConf *proxyconf)
         scsi_bus_legacy_handle_cmdline(&s->bus);
     }
 
-    /* TODO savevm */
+    register_savevm(dev, "virtio-scsi", virtio_scsi_id++, 1,
+                    virtio_scsi_save, virtio_scsi_load, s);
 
     return &s->vdev;
 }
-- 
1.7.7.6
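The per-request callbacks in this patch write the request's element structure to the migration stream as raw bytes and read it back into a newly allocated request on the destination. A self-contained sketch of that round trip follows. It assumes a hypothetical in-memory Stream type in place of QEMUFile and a simplified Elem struct in place of the real virtqueue element; none of these names exist in QEMU.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for QEMUFile and the virtqueue element. */
typedef struct {
    unsigned char data[256];
    size_t wpos, rpos;
} Stream;

typedef struct {
    unsigned int index;
    unsigned int out_num;
    unsigned int in_num;
} Elem;

/* Append len raw bytes to the stream; returns 0 on success, -1 if full. */
static int stream_put(Stream *f, const void *buf, size_t len)
{
    if (len > sizeof(f->data) - f->wpos) {
        return -1;
    }
    memcpy(f->data + f->wpos, buf, len);
    f->wpos += len;
    return 0;
}

/* Read len raw bytes back; returns 0 on success, -1 on a short stream. */
static int stream_get(Stream *f, void *buf, size_t len)
{
    if (len > f->wpos - f->rpos) {
        return -1;
    }
    memcpy(buf, f->data + f->rpos, len);
    f->rpos += len;
    return 0;
}

/* Save side: dump the element as-is. */
static int save_elem(Stream *f, const Elem *e)
{
    return stream_put(f, e, sizeof(*e));
}

/* Load side: allocate a new element and fill it from the stream. */
static Elem *load_elem(Stream *f)
{
    Elem *e = malloc(sizeof(*e));
    if (e && stream_get(f, e, sizeof(*e)) != 0) {
        free(e);
        e = NULL;
    }
    return e;
}
```

A raw dump like this only round-trips when source and destination agree on the struct layout, which is the trade-off of saving the element without a field-by-field encoding.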
Re: [PATCH v3 1/3] KVM: PPC: epapr: Factor out the epapr init
On 02/12/2012 11:47 PM, Liu Yu-B13201 wrote:
>> -----Original Message-----
>> From: Wood Scott-B07421
>> Sent: Saturday, February 11, 2012 2:40 AM
>> To: Liu Yu-B13201
>> Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org;
>> linuxppc-...@ozlabs.org; Wood Scott-B07421
>> Subject: Re: [PATCH v3 1/3] KVM: PPC: epapr: Factor out the epapr init
>>
>> Why are you still doing the patching inside kvm.c?
>
> Do you mean we should move kvm_hypercall_start() into epapr bit?

Yes. This is an ePAPR mechanism; KVM just happens to be a user of it.

We should also update arch/powerpc/include/asm/epapr_hcalls.h to use this mechanism.

-Scott
Re: [PATCH RFC] pvclock: Make pv_clock more robust and fixup it if overflow happens
On Mon, Feb 13, 2012 at 04:45:59PM +0100, Igor Mammedov wrote:
> Instead of hunting mysterious stalls/hangs all over the kernel when
> overflow occurs at pvclock.c:pvclock_get_nsec_offset
>
>     u64 delta = native_read_tsc() - shadow->tsc_timestamp;
>
> and introducing hooks when places of unexpected access found, pv_clock
> should be initialized for the calling cpu if overflow condition is detected.
>
> Signed-off-by: Igor Mammedov imamm...@redhat.com

Igor, I disagree. This is fixing the symptom, not the root cause. Additionally, Xen also uses pvclock_clocksource_read.

How about adding a BUG_ON to detect the overflow? That way hunting for the problem is not necessary.

>  arch/x86/kernel/pvclock.c |   18 +++++++++++++++---
>  1 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 42eb330..b486756 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -41,9 +41,14 @@ void pvclock_set_flags(u8 flags)
>  	valid_flags = flags;
>  }
>  
> -static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
> +static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow,
> +				   bool *overflow)
>  {
> -	u64 delta = native_read_tsc() - shadow->tsc_timestamp;
> +	u64 delta;
> +	u64 tsc = native_read_tsc();
> +	u64 shadow_timestamp = shadow->tsc_timestamp;
> +	*overflow = tsc < shadow_timestamp;
> +	delta = tsc - shadow_timestamp;
>  	return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
>  				   shadow->tsc_shift);
>  }
> @@ -94,12 +99,19 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
>  	unsigned version;
>  	cycle_t ret, offset;
>  	u64 last;
> +	bool overflow;
>  
>  	do {
>  		version = pvclock_get_time_values(&shadow, src);
>  		barrier();
> -		offset = pvclock_get_nsec_offset(&shadow);
> +		offset = pvclock_get_nsec_offset(&shadow, &overflow);
>  		ret = shadow.system_timestamp + offset;
> +		if (unlikely(overflow)) {
> +			memset(src, 0, sizeof(*src));
> +			barrier();
> +			x86_cpuinit.early_percpu_clock_init();
> +			continue;
> +		}
>  		barrier();
>  	} while (version != src->version);
>  
> -- 
> 1.7.7.6
Re: [PATCH RFC] pvclock: Make pv_clock more robust and fixup it if overflow happens
On 02/13/2012 06:48 PM, Marcelo Tosatti wrote:
> On Mon, Feb 13, 2012 at 04:45:59PM +0100, Igor Mammedov wrote:
>> Instead of hunting mysterious stalls/hangs all over the kernel when
>> overflow occurs at pvclock.c:pvclock_get_nsec_offset
>>
>>     u64 delta = native_read_tsc() - shadow->tsc_timestamp;
>>
>> and introducing hooks when places of unexpected access found, pv_clock
>> should be initialized for the calling cpu if overflow condition is detected.
>>
>> Signed-off-by: Igor Mammedov imamm...@redhat.com
>
> Igor, I disagree. This is fixing the symptom, not the root cause.
> Additionally, Xen also uses pvclock_clocksource_read.
>
> How about adding a BUG_ON to detect the overflow? That way hunting for
> the problem is not necessary.

Ok, I'll repost bug_on version.
[PATCH] BUG in pv_clock when overflow condition is detected
BUG when overflow occurs at pvclock.c:pvclock_get_nsec_offset

    u64 delta = native_read_tsc() - shadow->tsc_timestamp;

This can happen on an attempt to read a clock that has not been initialized yet. It won't prevent stalls and hangs, but at least they won't happen silently.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
 arch/x86/kernel/pvclock.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 42eb330..35a6190 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -43,7 +43,10 @@ void pvclock_set_flags(u8 flags)
 
 static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
 {
-	u64 delta = native_read_tsc() - shadow->tsc_timestamp;
+	u64 delta;
+	u64 tsc = native_read_tsc();
+	BUG_ON(tsc < shadow->tsc_timestamp);
+	delta = tsc - shadow->tsc_timestamp;
 	return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
 				   shadow->tsc_shift);
 }
-- 
1.7.7.6
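The condition this patch traps is plain unsigned wraparound: when the TSC value read is smaller than the recorded timestamp, the 64-bit subtraction yields a huge positive delta instead of a negative one. A minimal user-space sketch of the check follows. The helper name tsc_delta_checked is hypothetical (it is not a kernel function), and it returns a flag where the kernel patch calls BUG_ON.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: compute tsc - timestamp and
 * report whether the result is meaningful.
 */
static bool tsc_delta_checked(uint64_t tsc, uint64_t timestamp,
                              uint64_t *delta)
{
    /* Unsigned subtraction is defined modulo 2^64, so this never traps. */
    *delta = tsc - timestamp;

    /* The delta is only valid when the TSC has not gone "backwards". */
    return tsc >= timestamp;
}
```

With tsc = 40 and timestamp = 100 the stored delta is 2^64 - 60, which after scaling to nanoseconds would look like an enormous jump forward in time.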
Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
On Mon, Feb 13, 2012 at 10:16, Jan Kiszka jan.kis...@siemens.com wrote: On 2012-02-11 16:25, Blue Swirl wrote: On Fri, Feb 10, 2012 at 18:31, Jan Kiszka jan.kis...@siemens.com wrote: This enables acceleration for MMIO-based TPR registers accesses of 32-bit Windows guest systems. It is mostly useful with KVM enabled, either on older Intel CPUs (without flexpriority feature, can also be manually disabled for testing) or any current AMD processor. The approach introduced here is derived from the original version of qemu-kvm. It was refactored, documented, and extended by support for user space APIC emulation, both with and without KVM acceleration. The VMState format was kept compatible, so was the ABI to the option ROM that implements the guest-side para-virtualized driver service. This enables seamless migration from qemu-kvm to upstream or, one day, between KVM and TCG mode. The basic concept goes like this: - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel irqchip) a vmcall hypercall is registered - VAPIC option ROM is loaded into guest - option ROM activates TPR MMIO access reporting via port 0x7e - TPR accesses are trapped and patched in the guest to call into option ROM instead, VAPIC support is enabled - option ROM TPR helpers track state in memory and invoke hypercall to poll for pending IRQs if required Signed-off-by: Jan Kiszka jan.kis...@siemens.com I must say that I find the approach horrible, patching guests and ROMs and looking up Windows internals. Taking the same approach to extreme, we could for example patch Xen guest to become a KVM guest. Not that I object merging. Yes, this is horrible. But there is no real better way in the absence of hardware assisted virtualization of the TPR. I think MS is recommending this patching approach as well. 
Maybe instead of routing via ROM and the hypercall, the TPR accesses could be handled directly with guest invisible breakpoints (like GDB breakpoints, but for QEMU internal use), much like other instrumentation could be handled. diff --git a/hw/apic.c b/hw/apic.c index 086c544..2ebf3ca 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -35,6 +35,10 @@ #define MSI_ADDR_DEST_ID_SHIFT 12 #define MSI_ADDR_DEST_ID_MASK 0x000 +#define SYNC_FROM_VAPIC 0x1 +#define SYNC_TO_VAPIC 0x2 +#define SYNC_ISR_IRR_TO_VAPIC 0x4 Enum, please. OK. + static APICCommonState *local_apics[MAX_APICS + 1]; static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode); @@ -78,6 +82,70 @@ static inline int get_bit(uint32_t *tab, int index) return !!(tab[i] mask); } +/* return -1 if no bit is set */ +static int get_highest_priority_int(uint32_t *tab) +{ + int i; + for (i = 7; i = 0; i--) { + if (tab[i] != 0) { + return i * 32 + fls_bit(tab[i]); + } + } + return -1; +} + +static void apic_sync_vapic(APICCommonState *s, int sync_type) +{ + VAPICState vapic_state; + size_t length; + off_t start; + int vector; + + if (!s-vapic_paddr) { + return; + } + if (sync_type SYNC_FROM_VAPIC) { + cpu_physical_memory_rw(s-vapic_paddr, (void *)vapic_state, + sizeof(vapic_state), 0); + s-tpr = vapic_state.tpr; + } + if (sync_type (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) { + start = offsetof(VAPICState, isr); + length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr); + + if (sync_type SYNC_TO_VAPIC) { + assert(qemu_cpu_is_self(s-cpu_env)); + + vapic_state.tpr = s-tpr; + vapic_state.enabled = 1; + start = 0; + length = sizeof(VAPICState); + } + + vector = get_highest_priority_int(s-isr); + if (vector 0) { + vector = 0; + } + vapic_state.isr = vector 0xf0; + + vapic_state.zero = 0; + + vector = get_highest_priority_int(s-irr); + if (vector 0) { + vector = 0; + } + vapic_state.irr = vector 0xff; + + cpu_physical_memory_write_rom(s-vapic_paddr + start, + ((void *)vapic_state) + start, length); This 
assumes that the vapic_state structure matches guest what guest expect without conversion. Is this true for i386 on x86_64? I didn't check the structure in question. Yes, the structure in question is a packed one, stable on both guest and host side (the guest side is 32-bit only anyway). diff --git a/hw/apic_common.c b/hw/apic_common.c index 588531b..1977da7 100644 --- a/hw/apic_common.c +++ b/hw/apic_common.c @@ -20,8 +20,10 @@ #include apic.h #include apic_internal.h #include trace.h +#include kvm.h static int apic_irq_delivered; +bool apic_report_tpr_access; This should go to APICCommonState. Nope, it
Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
On Mon, Feb 13, 2012 at 06:50:08PM +, Blue Swirl wrote: On Mon, Feb 13, 2012 at 10:16, Jan Kiszka jan.kis...@siemens.com wrote: On 2012-02-11 16:25, Blue Swirl wrote: On Fri, Feb 10, 2012 at 18:31, Jan Kiszka jan.kis...@siemens.com wrote: This enables acceleration for MMIO-based TPR registers accesses of 32-bit Windows guest systems. It is mostly useful with KVM enabled, either on older Intel CPUs (without flexpriority feature, can also be manually disabled for testing) or any current AMD processor. The approach introduced here is derived from the original version of qemu-kvm. It was refactored, documented, and extended by support for user space APIC emulation, both with and without KVM acceleration. The VMState format was kept compatible, so was the ABI to the option ROM that implements the guest-side para-virtualized driver service. This enables seamless migration from qemu-kvm to upstream or, one day, between KVM and TCG mode. The basic concept goes like this: - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel irqchip) a vmcall hypercall is registered - VAPIC option ROM is loaded into guest - option ROM activates TPR MMIO access reporting via port 0x7e - TPR accesses are trapped and patched in the guest to call into option ROM instead, VAPIC support is enabled - option ROM TPR helpers track state in memory and invoke hypercall to poll for pending IRQs if required Signed-off-by: Jan Kiszka jan.kis...@siemens.com I must say that I find the approach horrible, patching guests and ROMs and looking up Windows internals. Taking the same approach to extreme, we could for example patch Xen guest to become a KVM guest. Not that I object merging. Yes, this is horrible. But there is no real better way in the absence of hardware assisted virtualization of the TPR. I think MS is recommending this patching approach as well. 
Maybe instead of routing via ROM and the hypercall, the TPR accesses could be handled directly with guest invisible breakpoints (like GDB breakpoints, but for QEMU internal use), much like other instrumentation could be handled.

The hypercall is rarely called. The idea behind patching is to avoid an exit on each TPR update. A breakpoint would cause an exit, making the whole exercise pointless.

--
			Gleb.
Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
On 2012-02-13 19:50, Blue Swirl wrote: On Mon, Feb 13, 2012 at 10:16, Jan Kiszka jan.kis...@siemens.com wrote: On 2012-02-11 16:25, Blue Swirl wrote: On Fri, Feb 10, 2012 at 18:31, Jan Kiszka jan.kis...@siemens.com wrote: This enables acceleration for MMIO-based TPR registers accesses of 32-bit Windows guest systems. It is mostly useful with KVM enabled, either on older Intel CPUs (without flexpriority feature, can also be manually disabled for testing) or any current AMD processor. The approach introduced here is derived from the original version of qemu-kvm. It was refactored, documented, and extended by support for user space APIC emulation, both with and without KVM acceleration. The VMState format was kept compatible, so was the ABI to the option ROM that implements the guest-side para-virtualized driver service. This enables seamless migration from qemu-kvm to upstream or, one day, between KVM and TCG mode. The basic concept goes like this: - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel irqchip) a vmcall hypercall is registered - VAPIC option ROM is loaded into guest - option ROM activates TPR MMIO access reporting via port 0x7e - TPR accesses are trapped and patched in the guest to call into option ROM instead, VAPIC support is enabled - option ROM TPR helpers track state in memory and invoke hypercall to poll for pending IRQs if required Signed-off-by: Jan Kiszka jan.kis...@siemens.com I must say that I find the approach horrible, patching guests and ROMs and looking up Windows internals. Taking the same approach to extreme, we could for example patch Xen guest to become a KVM guest. Not that I object merging. Yes, this is horrible. But there is no real better way in the absence of hardware assisted virtualization of the TPR. I think MS is recommending this patching approach as well. 
Maybe instead of routing via ROM and the hypercall, the TPR accesses could be handled directly with guest invisible breakpoints (like GDB breakpoints, but for QEMU internal use), much like other instrumentation could be handled. Gleb answered it already. @@ -238,6 +275,7 @@ static int apic_init_common(SysBusDevice *dev) { APICCommonState *s = APIC_COMMON(dev); APICCommonClass *info; +static DeviceState *vapic; static int apic_no; if (apic_no = MAX_APICS) { @@ -248,10 +286,29 @@ static int apic_init_common(SysBusDevice *dev) info = APIC_COMMON_GET_CLASS(s); info-init(s); -sysbus_init_mmio(s-busdev, s-io_memory); +sysbus_init_mmio(dev, s-io_memory); + +if (!vapic s-vapic_control VAPIC_ENABLE_MASK) { +vapic = sysbus_create_simple(kvmvapic, -1, NULL); +} +s-vapic = vapic; +if (apic_report_tpr_access info-enable_tpr_reporting) { I think you should not rely on apic_report_tpr_access being in sane condition during class init. It is mandatory, e.g. for CPU hotplug, as reporting needs to be consistent accross all VCPUs. Therefore it is a static global, set to false initially. However, you are right, we lack proper clearing of the access report feature on reset, not only in this variable. I'd also set it to false initially. It's a global variable, thus initialized to false by definition. + +#define VAPIC_CPU_SHIFT 7 + +#define ROM_BLOCK_SIZE 512 +#define ROM_BLOCK_MASK (~(ROM_BLOCK_SIZE - 1)) + +typedef struct VAPICHandlers { +uint32_t set_tpr; +uint32_t set_tpr_eax; +uint32_t get_tpr[8]; +uint32_t get_tpr_stack; +} QEMU_PACKED VAPICHandlers; + +typedef struct GuestROMState { +char signature[8]; +uint32_t vaddr; This does not look 64 bit clean. It's packed. I meant virtual address could be 64 bits on a 64 bit host, not structure packing. This is for 32-bit guests only. 64-bit Windows doesn't access the TPR via MMIO, thus is not activating the VAPIC. 
+uint32_t state; +uint32_t rom_state_paddr; +uint32_t rom_state_vaddr; +uint32_t vapic_paddr; +uint32_t real_tpr_addr; +GuestROMState rom_state; +size_t rom_size; +} VAPICROMState; + +#define TPR_INSTR_IS_WRITE 0x1 +#define TPR_INSTR_ABS_MODRM 0x2 +#define TPR_INSTR_MATCH_MODRM_REG 0x4 + +typedef struct TPRInstruction { +uint8_t opcode; +uint8_t modrm_reg; +unsigned int flags; +size_t length; +off_t addr_offset; +} TPRInstruction; Also here the order is pessimized. Don't see the gain here, though. There are two bytes' hole between modrm_reg and flags, maybe also 4 bytes between length and addr_offset (if size_t is 32 bits but off_t 64 bits). I'd reverse the order so that members with largest alignment needs come first. Well, but this won't make the struct smaller. I prefer to keep the ordering in which we also initialize it. +static int find_real_tpr_addr(VAPICROMState *s, CPUState *env) +{ +target_phys_addr_t
Re: Pe: [PATCH v5 1/3] virtio-scsi: first version
On Tue, Feb 14, 2012 at 2:12 AM, Hannes Reinecke h...@suse.de wrote: On 02/13/2012 02:18 PM, Michael S. Tsirkin wrote: On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote: On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote: Only if you use the pci multi-function option but that kills standard hot unplug It doesn't kill it as such, rather you can't unplug luns individually. Isnt that just a consequence of the current implementation rather than a SCSI limitation? Yes. A different way to do hoplug could be to flag all devices as removable in the standard inq page then leave the LUN there persistently and what you remove/add is not the LUN device itself but just the media in the device. Instead of hot-plug remove the LUN, hot-plug becomes media eject or media insert. The device remains present all time, you never remove it, but instead hot-plug controls if the media is present or not. This would require implementing at least START_STOP_UNIT and PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC. regards ronnie sahlberg That would work. Or we simply use the Peripheral Qualifier that the device is gone; eg we could simply set PQ = 1, return sense code 0x25/00 and be done with ... That is still similar to rip a device out from the guest without notice and can cause the guest to be surprised. Removable media is standard feature in SCSI SBC (and other commandsets). The nice part of removable media is that it activates a contract between the device and the guest to prevent removal of the media when the guest depends on the media not being removed. I.e. If you have a SBC device with the removable-media bit set, this is used to tell the initiator this media can be removed, be prepared that this might happen. So when you mount such a SBC device in the guest, the guest will issue a PREVENT_ALLOW_MEDIUM_REMOVAL to tell the device this medium is in use and may not be removed. 
This automatically provides you with a mechanism by which any guest can signal to qemu when qemu may or may not remove the device/medium. In addition to implementing PREVENT_ALLOW_MEDIUM_REMOVAL emulation, qemu would also need to check the prevent/allow status before it allows the device to be removed.

If nothing else, this approach automatically provides a channel from the guest kernel to qemu that tells qemu when a device may be unplugged and when it is not safe to unplug it.

regards
ronnie sahlberg
Re: Pe: [PATCH v5 1/3] virtio-scsi: first version
On Tue, Feb 14, 2012 at 7:42 AM, ronnie sahlberg ronniesahlb...@gmail.com wrote: On Tue, Feb 14, 2012 at 2:12 AM, Hannes Reinecke h...@suse.de wrote: On 02/13/2012 02:18 PM, Michael S. Tsirkin wrote: On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote: On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote: Only if you use the pci multi-function option but that kills standard hot unplug It doesn't kill it as such, rather you can't unplug luns individually. Isnt that just a consequence of the current implementation rather than a SCSI limitation? Yes. A different way to do hoplug could be to flag all devices as removable in the standard inq page then leave the LUN there persistently and what you remove/add is not the LUN device itself but just the media in the device. Instead of hot-plug remove the LUN, hot-plug becomes media eject or media insert. The device remains present all time, you never remove it, but instead hot-plug controls if the media is present or not. This would require implementing at least START_STOP_UNIT and PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC. regards ronnie sahlberg That would work. Or we simply use the Peripheral Qualifier that the device is gone; eg we could simply set PQ = 1, return sense code 0x25/00 and be done with ... That is still similar to rip a device out from the guest without notice and can cause the guest to be surprised. Removable media is standard feature in SCSI SBC (and other commandsets). The nice part of removable media is that it activates a contract between the device and the guest to prevent removal of the media when the guest depends on the media not being removed. I.e. If you have a SBC device with the removable-media bit set, this is used to tell the initiator this media can be removed, be prepared that this might happen. 
So when you mount such a SBC device in the guest, the guest will issue a PREVENT_ALLOW_MEDIUM_REMOVAL to tell the device that this medium is in use and may not be removed.

What I mean is that if /dev/sdb is removable and you mount it with

    mount /dev/sdb1 /mnt

the guest kernel will automatically send a PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb to prevent removal. When you umount /dev/sdb1, the guest kernel will automatically send PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb again and allow removal of the media.

If you capture this command and track the prevent/allow removal status, you automatically get a channel through which qemu knows when it is safe to unplug the device and when it is not. This is a nice feature.
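Tracking the prevent/allow status as described could look roughly like the sketch below. It assumes the usual 6-byte PREVENT ALLOW MEDIUM REMOVAL CDB (opcode 1Eh, PREVENT field in the low bits of byte 4) and only looks at bit 0 of that field. MediaState, track_prevent_allow and safe_to_unplug are hypothetical names for illustration, not QEMU code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OP_PREVENT_ALLOW_MEDIUM_REMOVAL 0x1e /* 6-byte CDB opcode */

/* Hypothetical per-device state; not a QEMU structure. */
typedef struct {
    bool removal_prevented;
} MediaState;

/*
 * Inspect one CDB and update the prevent/allow state.
 * Returns true if the CDB was a PREVENT ALLOW MEDIUM REMOVAL command.
 */
static bool track_prevent_allow(MediaState *m, const uint8_t *cdb, size_t len)
{
    if (len < 6 || cdb[0] != OP_PREVENT_ALLOW_MEDIUM_REMOVAL) {
        return false;
    }
    /*
     * Assumption: only bit 0 of the PREVENT field is honoured here;
     * set means the guest has locked the medium.
     */
    m->removal_prevented = (cdb[4] & 0x01) != 0;
    return true;
}

/* Safe to eject only while the guest has not locked the medium. */
static bool safe_to_unplug(const MediaState *m)
{
    return !m->removal_prevented;
}
```

Any other command leaves the state untouched, so the last lock or unlock seen from the guest is what decides whether an unplug request can proceed.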
KVM call agenda for Tuesday 14
Hi

Please send in any agenda items you are interested in covering.

Cheers, Juan.
level in kvm_mmu_page_role
I have been going through the KVM code but didn't understand the significance of level in kvm_mmu_page_role. It would be nice if anyone could explain what it is used for.

Thanks,
Sanidhya
Re: Pe: [PATCH v5 1/3] virtio-scsi: first version
On Tue, Feb 14, 2012 at 07:53:26AM +1100, ronnie sahlberg wrote: On Tue, Feb 14, 2012 at 7:42 AM, ronnie sahlberg ronniesahlb...@gmail.com wrote: On Tue, Feb 14, 2012 at 2:12 AM, Hannes Reinecke h...@suse.de wrote: On 02/13/2012 02:18 PM, Michael S. Tsirkin wrote: On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote: On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote: Only if you use the pci multi-function option but that kills standard hot unplug It doesn't kill it as such, rather you can't unplug luns individually. Isnt that just a consequence of the current implementation rather than a SCSI limitation? Yes. A different way to do hoplug could be to flag all devices as removable in the standard inq page then leave the LUN there persistently and what you remove/add is not the LUN device itself but just the media in the device. Instead of hot-plug remove the LUN, hot-plug becomes media eject or media insert. The device remains present all time, you never remove it, but instead hot-plug controls if the media is present or not. This would require implementing at least START_STOP_UNIT and PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC. regards ronnie sahlberg That would work. Or we simply use the Peripheral Qualifier that the device is gone; eg we could simply set PQ = 1, return sense code 0x25/00 and be done with ... That is still similar to rip a device out from the guest without notice and can cause the guest to be surprised. Removable media is standard feature in SCSI SBC (and other commandsets). The nice part of removable media is that it activates a contract between the device and the guest to prevent removal of the media when the guest depends on the media not being removed. I.e. If you have a SBC device with the removable-media bit set, this is used to tell the initiator this media can be removed, be prepared that this might happen. 
So when you mount such a SBC device in the guest, the guest will issue a PREVENT_ALLOW_MEDIUM_REMOVAL to tell the device that this medium is in use and may not be removed.

What I mean is that if /dev/sdb is removable and you mount it with

    mount /dev/sdb1 /mnt

the guest kernel will automatically send a PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb to prevent removal. When you umount /dev/sdb1, the guest kernel will automatically send PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb again and allow removal of the media.

If you capture this command and track the prevent/allow removal status, you automatically get a channel through which qemu knows when it is safe to unplug the device and when it is not. This is a nice feature.

Presumably there's a way for the device to notify the OS that the user requested removal, as well?
Re: Pe: [PATCH v5 1/3] virtio-scsi: first version
On Tue, Feb 14, 2012 at 9:59 AM, Michael S. Tsirkin m...@redhat.com wrote: On Tue, Feb 14, 2012 at 07:53:26AM +1100, ronnie sahlberg wrote: On Tue, Feb 14, 2012 at 7:42 AM, ronnie sahlberg ronniesahlb...@gmail.com wrote: On Tue, Feb 14, 2012 at 2:12 AM, Hannes Reinecke h...@suse.de wrote: On 02/13/2012 02:18 PM, Michael S. Tsirkin wrote: On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote: On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote: Only if you use the pci multi-function option but that kills standard hot unplug It doesn't kill it as such, rather you can't unplug luns individually. Isnt that just a consequence of the current implementation rather than a SCSI limitation? Yes. A different way to do hoplug could be to flag all devices as removable in the standard inq page then leave the LUN there persistently and what you remove/add is not the LUN device itself but just the media in the device. Instead of hot-plug remove the LUN, hot-plug becomes media eject or media insert. The device remains present all time, you never remove it, but instead hot-plug controls if the media is present or not. This would require implementing at least START_STOP_UNIT and PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC. regards ronnie sahlberg That would work. Or we simply use the Peripheral Qualifier that the device is gone; eg we could simply set PQ = 1, return sense code 0x25/00 and be done with ... That is still similar to rip a device out from the guest without notice and can cause the guest to be surprised. Removable media is standard feature in SCSI SBC (and other commandsets). The nice part of removable media is that it activates a contract between the device and the guest to prevent removal of the media when the guest depends on the media not being removed. I.e. 
If you have a SBC device with the removable-media bit set, this is used to tell the initiator that this media can be removed and that it should be prepared for this to happen. So when you mount such a SBC device in the guest, the guest will issue a PREVENT_ALLOW_MEDIUM_REMOVAL to tell the device that this medium is in use and may not be removed.

What I mean is that if /dev/sdb is removable and you mount it with

    mount /dev/sdb1 /mnt

the guest kernel will automatically send a PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb to prevent removal. When you umount /dev/sdb1, the guest kernel will automatically send PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb again and allow removal of the media.

If you capture this command and track the prevent/allow removal status, you automatically get a channel through which qemu knows when it is safe to unplug the device and when it is not. This is a nice feature.

Presumably there's a way for device to notify the OS that user requested removal, as well?

I think that is done by responding with sense to one of the commands, like the TEST_UNIT_READY that the initiator/guest kernel sends every few seconds.

    5Ah 01h  DT WROM BK  OPERATOR MEDIUM REMOVAL REQUEST

This sense code should be the one to use. I don't know if the Linux SCSI initiator honors this or what it will do.

I guess something like this could work?

    IF device is marked as prevent-removal THEN
        send OPERATOR MEDIUM REMOVAL REQUEST to the initiator
        wait xyz seconds
        IF device is still marked as prevent-removal THEN
            ask operator "guest refused to release the LUN, do you want to forcefully remove it?"
        ELSE
            unmount the media
        FI
    ELSE
        unmount the media
    FI
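The removal flow proposed above can be sketched as a small decision function. This is a simulation under stated assumptions, not QEMU code: SimDevice, its guest_cooperates knob, and all function names are hypothetical, and the grace period is modelled as an immediate state change rather than a real timeout.

```c
#include <stdbool.h>

typedef enum {
    EJECT_NOW,          /* medium can be removed right away */
    EJECT_ASK_OPERATOR  /* guest kept the lock; the operator must decide */
} EjectDecision;

/* Hypothetical simulated device; not a QEMU structure. */
typedef struct {
    bool prevented;         /* current prevent/allow state */
    bool guest_cooperates;  /* simulation knob: guest unlocks when asked */
    bool request_sent;      /* removal request has been signalled */
} SimDevice;

/* Stand-in for reporting sense 5Ah/01h on a later command. */
static void send_removal_request(SimDevice *d)
{
    d->request_sent = true;
}

/* Stand-in for "wait xyz seconds": a cooperative guest drops the lock. */
static void wait_grace_period(SimDevice *d)
{
    if (d->request_sent && d->guest_cooperates) {
        d->prevented = false;
    }
}

static EjectDecision request_eject(SimDevice *d)
{
    if (!d->prevented) {
        return EJECT_NOW;
    }
    send_removal_request(d);
    wait_grace_period(d);

    /* Still locked after the grace period: escalate to the operator. */
    return d->prevented ? EJECT_ASK_OPERATOR : EJECT_NOW;
}
```

Keeping the forced-removal choice with the operator means the guest is never surprised unless a human has explicitly accepted that outcome.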
Re: Pe: [PATCH v5 1/3] virtio-scsi: first version
On Tue, Feb 14, 2012 at 10:30:59AM +1100, ronnie sahlberg wrote: On Tue, Feb 14, 2012 at 9:59 AM, Michael S. Tsirkin m...@redhat.com wrote:

[...]

This is a nice feature. Presumably there's a way for device to notify the OS that user requested removal, as well?

I think that is done by responding with sense to one of the commands, like the every few second TEST_UNIT_READY that the initiator/guest-kernel will send.

Does it do this even for mounted media? I didn't realize ...
Re: Pe: [PATCH v5 1/3] virtio-scsi: first version
On Mon, 13 Feb 2012 10:19:56 +0100, Paolo Bonzini pbonz...@redhat.com wrote:

block layer _is_ growing support for new operations: discard is already there, write same is in the works, extended copy will also come in due time. Perhaps we'll add them to virtio-blk, perhaps not.

FYI, I'd take patches for discard in virtio_blk today; it's a no-brainer in a virtual device. But I wouldn't want extended copy and write same. Thanks, Rusty.
Re: [PATCH RFC] seabios: add OSHP method stub
On Mon, Feb 13, 2012 at 11:33:08AM +0200, Michael S. Tsirkin wrote:

To allow guests to load the native SHPC driver for a bridge, we must declare an OSHP method for the appropriate device which lets the OS take control of the SHPC. As we don't access SHPC at the moment, we don't need to do anything - just report success.

The patch is fine with me, but since this is really qemu/kvm specific, please provide an ack from one of the qemu/kvm maintainers. -Kevin
Re: [PATCH RFC] seabios: add OSHP method stub
On Tue, Feb 14, 2012 at 02:43:45AM +0200, Michael S. Tsirkin wrote: On Mon, Feb 13, 2012 at 07:34:55PM -0500, Kevin O'Connor wrote: On Mon, Feb 13, 2012 at 11:33:08AM +0200, Michael S. Tsirkin wrote:

To allow guests to load the native SHPC driver for a bridge, we must declare an OSHP method for the appropriate device which lets the OS take control of the SHPC. As we don't access SHPC at the moment, we don't need to do anything - just report success.

The patch is fine with me, but since this is really qemu/kvm specific, please provide an ack from one of the qemu/kvm maintainers. -Kevin

I expect no problem with this, though I'm wondering what makes it qemu specific.

Only kvm/qemu use the ACPI tables in seabios. In a nutshell, I don't know what a SHPC is (nor OSHP), so I'm looking for an additional Ack. -Kevin