Re: [Qemu-devel] [PATCH v2 1/8] kvm: Set cpu_single_env only once

2012-02-13 Thread Paolo Bonzini

On 02/11/2012 03:12 PM, Andreas Färber wrote:

> Yes and no. They can have any target-specific pointer they want, just
> as before. But no global first_cpu / cpu_single_env pointer - that's
> replaced by CPU pointers, through which members of derived classes can
> be accessed (which did not work for CPUState due to CPU_COMMON members
> being at target-specific offset in the middle).


Hmm, now I'm not even sure what I want that Andreas referred to. :)

I definitely would like CPUState pointers to be changed into link 
properties, but that's not related to what Jan is doing here with 
cpu_single_env.  Each LAPIC refers to a CPU, and that would become a 
link property indeed.  But here we're using cpu_single_env to find out 
which LAPIC is being read.  It's the other direction.


Relying on thread-local cpu_single_env means that you restrict LAPIC
memory reads to run in VCPU thread context, and this makes sense anyway.
The only case of MMIO running in iothread context is Xen, but Xen
always keeps the LAPIC in the hypervisor.
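The lookup described above (the LAPIC MMIO handler finds "its" CPU through a thread-local pointer that each VCPU thread sets) can be sketched in plain C with pthreads. This is not QEMU code: struct vcpu, current_vcpu, lapic_mmio_read() and demo_two_vcpus() are hypothetical names. It only shows why the read has to run in VCPU thread context: from any other thread the pointer is NULL.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-VCPU state; in QEMU this role is played by the CPU state. */
struct vcpu {
    int index;
    unsigned lapic_id;
};

/* Set by each VCPU thread before running guest code; plays the role of
 * cpu_single_env in this sketch (GCC/Clang __thread storage class). */
static __thread struct vcpu *current_vcpu;

/* Hypothetical LAPIC MMIO read handler. Every CPU sees its LAPIC at the same
 * guest-physical address, so the address alone cannot identify the LAPIC;
 * the handler asks which VCPU thread is performing the access. */
static long lapic_mmio_read(void)
{
    if (current_vcpu == NULL) {
        return -1; /* not a VCPU thread (e.g. the iothread) */
    }
    return (long)current_vcpu->lapic_id;
}

struct vcpu_run {
    struct vcpu cpu;
    long observed; /* value the LAPIC read returned in this thread */
};

static void *vcpu_thread(void *opaque)
{
    struct vcpu_run *run = opaque;
    current_vcpu = &run->cpu;          /* set once per thread */
    run->observed = lapic_mmio_read(); /* sees only this thread's CPU */
    return NULL;
}

/* Runs two VCPU threads; returns 1 if each read its own LAPIC ID and a read
 * from the calling (non-VCPU) thread was rejected, 0 otherwise. */
int demo_two_vcpus(void)
{
    struct vcpu_run runs[2] = { { { 0, 10 }, 0 }, { { 1, 11 }, 0 } };
    pthread_t tids[2];

    for (int i = 0; i < 2; i++) {
        if (pthread_create(&tids[i], NULL, vcpu_thread, &runs[i]) != 0) {
            return 0;
        }
    }
    for (int i = 0; i < 2; i++) {
        pthread_join(tids[i], NULL);
    }
    return runs[0].observed == 10 && runs[1].observed == 11
        && lapic_mmio_read() == -1;
}
```

The final check in demo_two_vcpus() corresponds to the iothread case: without a VCPU context the handler has nothing to go on, which is acceptable only because (as noted above) Xen keeps the LAPIC in the hypervisor.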


Also, I think that having a view of CPUs in QOM is laudable, but I don't 
understand why that means you need to remove first_cpu / cpu_single_env.


Finally, CPU_COMMON members may be referenced from TCG-generated code, 
how do you plan to move them and still keep the TLBs at small offsets 
within CPUState?  Perhaps we need a drawing of the situation before and 
after the QOMization of CPUs.
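One way to picture the offset concern: TCG-generated code addresses CPU state as fixed displacements from the env pointer it is handed, and small displacements presumably fit the short immediate-offset forms of host load/store instructions. The sketch below compares two hypothetical layouts (all struct names and sizes are made up for illustration, and neither is claimed to be what the series does): putting a QOM-style parent object in front of the TLB pushes the TLB far from the base pointer, while embedding the target's env struct and still passing &cpu->env to generated code keeps the displacement small.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TLB entry and size, for illustration only. */
typedef struct {
    uintptr_t addr_read, addr_write, addend;
} TLBEntry;
#define TLB_SIZE 256

/* Stand-in for a generic parent object with some bookkeeping fields. */
typedef struct {
    char name[64];
    void *klass;
    void *properties[32];
} ObjectStub;

/* Layout A: parent object first, target registers and TLB after it;
 * generated code gets a pointer to the start of the struct. */
typedef struct {
    ObjectStub parent;
    uint32_t regs[8];
    TLBEntry tlb[TLB_SIZE];
} CPUStateA;

/* Layout B: the target env keeps the TLB near its start and is embedded
 * in the CPU object; generated code gets &cpu->env, not &cpu. */
typedef struct {
    uint32_t regs[8];
    TLBEntry tlb[TLB_SIZE];
} EnvB;

typedef struct {
    ObjectStub parent;
    EnvB env;
} CPUStateB;

/* Displacement of tlb[0] from the pointer generated code would use. */
size_t tlb_disp_layout_a(void) { return offsetof(CPUStateA, tlb); }
size_t tlb_disp_layout_b(void) { return offsetof(EnvB, tlb); }
```

In layout B the TLB stays 32 bytes from the base pointer no matter how large the parent object grows; in layout A every field added to the parent moves it further away.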


Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane

2012-02-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42755





--- Comment #21 from Gleb g...@redhat.com  2012-02-13 08:53:28 ---
Can you please compile trace-cmd from its git repository [1] (run "make all_cmd
install_cmd"; the install part is important) and try getting a trace with it? If
this does not work, I will guide you through taking a trace using debugfs directly.

[1] git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.


[PATCHv2-RFC 1/2] shpc: standard hot plug controller

2012-02-13 Thread Michael S. Tsirkin
This adds support for SHPC interface, as defined by PCI Standard
Hot-Plug Controller and Subsystem Specification, Rev 1.0
http://www.pcisig.com/specifications/conventional/pci_hot_plug/SHPC_10

Only SHPC integrated with a PCI-to-PCI bridge is supported;
SHPC integrated with a host bridge would need more work.

All main SHPC features are supported:
- MRL sensor
- Attention button
- Attention indicator
- Power indicator
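The slot state and indicator fields listed above live in each slot's register (SHPC_SLOT_REG(s) in the patch) and are extracted with a mask plus a shift derived from that mask via ffs(), as the patch's SHPC_SLOT_*_SHIFT macros do. A minimal sketch: the mask values are copied from the patch, while shpc_get_field() and shpc_set_field() are hypothetical helper names.

```c
#include <stdint.h>
#include <strings.h> /* ffs() */

/* Masks as defined in hw/shpc.c by this patch. */
#define SHPC_SLOT_STATE_MASK    0x03
#define SHPC_SLOT_PWR_LED_MASK  0x0C
#define SHPC_SLOT_ATTN_LED_MASK 0x30

/* Extract a field: mask it, then shift right by the position of the mask's
 * lowest set bit (ffs() is 1-based, hence the -1). */
unsigned shpc_get_field(uint16_t reg, unsigned mask)
{
    return (reg & mask) >> (ffs((int)mask) - 1);
}

/* Replace one field, leaving all other bits of the register untouched. */
uint16_t shpc_set_field(uint16_t reg, unsigned mask, unsigned value)
{
    unsigned shift = ffs((int)mask) - 1;
    return (uint16_t)((reg & ~mask) | ((value << shift) & mask));
}
```

For example, a register value of 0x3E decodes to slot state 2 with both the power and attention fields equal to 3; deriving the shift from the mask keeps each field described by a single constant.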

Wake on hotplug and SERR generation are stubbed out but unused,
as we don't have interfaces to generate these events ATM.

One issue that isn't completely resolved is that QEMU currently
expects an eject interface, which SHPC does not provide: it merely
removes power from the device, and it's up to the user to remove the
device from the slot. This patch works around that by ejecting the
device when power is removed and the power LED goes off.
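The workaround amounts to a predicate over the slot's state and power indicator. A minimal sketch, following the stricter rule in the patch's own TODO (eject only in state DISABLED with the LED off). shpc_should_eject() is hypothetical, SHPC_STATE_DISABLED is copied from the patch, and the indicator encoding (1 = on, 2 = blink, 3 = off) is an assumption that should be checked against the SHPC specification.

```c
#include <stdbool.h>

/* Slot state as defined in hw/shpc.c by this patch. */
#define SHPC_STATE_ENABLED  0x2
#define SHPC_STATE_DISABLED 0x3

/* Assumed power-indicator encoding; verify against the SHPC spec. */
#define SHPC_LED_ON    0x1
#define SHPC_LED_BLINK 0x2
#define SHPC_LED_OFF   0x3

/* SHPC has no eject command: the OS powers the slot down and turns the
 * power LED off, after which a human would pull the card. The emulation
 * treats that combination as the point at which to eject the device. */
bool shpc_should_eject(unsigned slot_state, unsigned pwr_led)
{
    return slot_state == SHPC_STATE_DISABLED && pwr_led == SHPC_LED_OFF;
}
```

Requiring the LED to be off, and not merely the slot to be disabled, avoids yanking the device while the guest is still blinking the indicator during its power-down sequence.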

TODO:
- migration support
- fix dependency on pci_internals.h

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 Makefile.objs |1 +
 hw/pci.h  |6 +
 hw/shpc.c |  646 +
 hw/shpc.h |   40 
 qemu-common.h |1 +
 5 files changed, 694 insertions(+), 0 deletions(-)
 create mode 100644 hw/shpc.c
 create mode 100644 hw/shpc.h

diff --git a/Makefile.objs b/Makefile.objs
index 391e524..4546477 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -195,6 +195,7 @@ hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
 hw-obj-y += fw_cfg.o
 hw-obj-$(CONFIG_PCI) += pci.o pci_bridge.o
 hw-obj-$(CONFIG_PCI) += msix.o msi.o
+hw-obj-$(CONFIG_PCI) += shpc.o
 hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
 hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o
 hw-obj-y += watchdog.o
diff --git a/hw/pci.h b/hw/pci.h
index 33b0b18..756577e 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -125,6 +125,9 @@ enum {
     /* command register SERR bit enabled */
 #define QEMU_PCI_CAP_SERR_BITNR 4
     QEMU_PCI_CAP_SERR = (1 << QEMU_PCI_CAP_SERR_BITNR),
+    /* Standard hot plug controller. */
+#define QEMU_PCI_SHPC_BITNR 5
+    QEMU_PCI_CAP_SHPC = (1 << QEMU_PCI_SHPC_BITNR),
 };
 
 #define TYPE_PCI_DEVICE "pci-device"
@@ -229,6 +232,9 @@ struct PCIDevice {
 /* PCI Express */
 PCIExpressDevice exp;
 
+/* SHPC */
+SHPCDevice *shpc;
+
 /* Location of option rom */
 char *romfile;
 bool has_rom;
diff --git a/hw/shpc.c b/hw/shpc.c
new file mode 100644
index 000..4baec29
--- /dev/null
+++ b/hw/shpc.c
@@ -0,0 +1,646 @@
+#include <strings.h>
+#include <stdint.h>
+#include "range.h"
+#include "shpc.h"
+#include "pci.h"
+#include "pci_internals.h"
+
+/* TODO: model power only and disabled slot states. */
+/* TODO: handle SERR and wakeups */
+/* TODO: consider enabling 66MHz support */
+
+/* TODO: remove fully only on state DISABLED and LED off.
+ * track state to properly record this. */
+
+/* SHPC Working Register Set */
+#define SHPC_BASE_OFFSET  0x00 /* 4 bytes */
+#define SHPC_SLOTS_33 0x04 /* 4 bytes. Also encodes PCI-X slots. */
+#define SHPC_SLOTS_66 0x08 /* 4 bytes. */
+#define SHPC_NSLOTS   0x0C /* 1 byte */
+#define SHPC_FIRST_DEV    0x0D /* 1 byte */
+#define SHPC_PHYS_SLOT    0x0E /* 2 bytes */
+#define SHPC_PHYS_NUM_MAX 0x7ff
+#define SHPC_PHYS_NUM_UP  0x1000
+#define SHPC_PHYS_MRL 0x4000
+#define SHPC_PHYS_BUTTON  0x8000
+#define SHPC_SEC_BUS  0x10 /* 2 bytes */
+#define SHPC_SEC_BUS_33   0x0
+#define SHPC_SEC_BUS_66   0x1 /* Unused */
+#define SHPC_SEC_BUS_MASK 0x7
+#define SHPC_MSI_CTL  0x12 /* 1 byte */
+#define SHPC_PROG_IFC 0x13 /* 1 byte */
+#define SHPC_PROG_IFC_1_0 0x1
+#define SHPC_CMD_CODE 0x14 /* 1 byte */
+#define SHPC_CMD_TRGT 0x15 /* 1 byte */
+#define SHPC_CMD_TRGT_MIN 0x1
+#define SHPC_CMD_TRGT_MAX 0x1f
+#define SHPC_CMD_STATUS   0x16 /* 2 bytes */
+#define SHPC_CMD_STATUS_BUSY  0x1
+#define SHPC_CMD_STATUS_MRL_OPEN  0x2
+#define SHPC_CMD_STATUS_INVALID_CMD   0x4
+#define SHPC_CMD_STATUS_INVALID_MODE  0x8
+#define SHPC_INT_LOCATOR  0x18 /* 4 bytes */
+#define SHPC_INT_COMMAND  0x1
+#define SHPC_SERR_LOCATOR 0x1C /* 4 bytes */
+#define SHPC_SERR_INT 0x20 /* 4 bytes */
+#define SHPC_INT_DIS  0x1
+#define SHPC_SERR_DIS 0x2
+#define SHPC_CMD_INT_DIS  0x4
+#define SHPC_ARB_SERR_DIS 0x8
+#define SHPC_CMD_DETECTED 0x1
+#define SHPC_ARB_DETECTED 0x2
+ /* 4 bytes * slot # (start from 0) */
+#define SHPC_SLOT_REG(s) (0x24 + (s) * 4)
+ /* 2 bytes */
+#define SHPC_SLOT_STATUS(s)   (0x0 + SHPC_SLOT_REG(s))
+
+/* Same slot state masks are used for command and status registers */
+#define SHPC_SLOT_STATE_MASK 0x03
+#define SHPC_SLOT_STATE_SHIFT \
+(ffs(SHPC_SLOT_STATE_MASK) - 1)
+
+#define SHPC_STATE_NO   0x0
+#define SHPC_STATE_PWRONLY  0x1
+#define SHPC_STATE_ENABLED  0x2
+#define SHPC_STATE_DISABLED 0x3
+
+#define SHPC_SLOT_PWR_LED_MASK   0xC
+#define SHPC_SLOT_PWR_LED_SHIFT \
+(ffs(SHPC_SLOT_PWR_LED_MASK) - 1)
+#define SHPC_SLOT_ATTN_LED_MASK  0x30
+#define SHPC_SLOT_ATTN_LED_SHIFT \
+

[PATCHv2-RFC 2/2] pci: add standard bridge device

2012-02-13 Thread Michael S. Tsirkin
This adds support for a standard PCI-to-PCI bridge,
enabling support for more than 32 PCI devices in the system.
Device hotplug is supported by means of an SHPC controller.
For guests with an SHPC driver, this allows robust hotplug
and even hotplug of nested bridges, up to 31 devices
per bridge.
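Devices behind the bridge share its INTx pins through the standard swizzle from Table 9-1 of the PCI-to-PCI Bridge Architecture specification, which the patch implements in pci_bridge_dev_map_irq_fn(). A standalone sketch of the same arithmetic; bridge_swizzle() is a hypothetical name, and PCI_SLOT/PCI_NUM_PINS are defined locally with their usual meanings.

```c
/* INTx pins per PCI device: INTA#..INTD#. */
#define PCI_NUM_PINS 4
/* Device (slot) number from a devfn byte: upper 5 bits. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)

/* Pin irq_num (0 = INTA#) of the device at devfn behind the bridge is
 * routed to pin (irq_num + slot) % 4 on the bridge's primary side. */
int bridge_swizzle(int devfn, int irq_num)
{
    return (irq_num + PCI_SLOT(devfn)) % PCI_NUM_PINS;
}
```

For example, INTA# (0) of a device in slot 1 (devfn 8) appears upstream as INTB# (1), so single-function devices in adjacent slots are spread across different pins instead of all sharing INTA#.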

TODO:
- chassis capability support
- migration support
- remove dependency on pci_internals.h

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 Makefile.objs   |2 +-
 hw/pci_bridge_dev.c |  136 +++
 2 files changed, 137 insertions(+), 1 deletions(-)
 create mode 100644 hw/pci_bridge_dev.c

diff --git a/Makefile.objs b/Makefile.objs
index 4546477..e89112c 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -193,7 +193,7 @@ hw-obj-$(CONFIG_VIRTIO) += virtio-console.o
 hw-obj-y += usb-libhw.o
 hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
 hw-obj-y += fw_cfg.o
-hw-obj-$(CONFIG_PCI) += pci.o pci_bridge.o
+hw-obj-$(CONFIG_PCI) += pci.o pci_bridge.o pci_bridge_dev.o
 hw-obj-$(CONFIG_PCI) += msix.o msi.o
 hw-obj-$(CONFIG_PCI) += shpc.o
 hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
diff --git a/hw/pci_bridge_dev.c b/hw/pci_bridge_dev.c
new file mode 100644
index 000..f48cd2d
--- /dev/null
+++ b/hw/pci_bridge_dev.c
@@ -0,0 +1,136 @@
+/*
+ * Standard PCI Bridge Device
+ *
+ * Copyright (c) 2011 Red Hat Inc. Author: Michael S. Tsirkin m...@redhat.com
+ *
+ * http://www.pcisig.com/specifications/conventional/pci_to_pci_bridge_architecture/
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see http://www.gnu.org/licenses/.
+ */
+
+#include "pci_bridge.h"
+#include "pci_ids.h"
+#include "shpc.h"
+#include "memory.h"
+#include "pci_internals.h"
+
+#define REDHAT_PCI_VENDOR_ID 0x1b36
+#define PCI_BRIDGE_DEV_VENDOR_ID REDHAT_PCI_VENDOR_ID
+#define PCI_BRIDGE_DEV_DEVICE_ID 0x1
+
+struct PCIBridgeDev {
+    PCIBridge bridge;
+    MemoryRegion bar;
+};
+typedef struct PCIBridgeDev PCIBridgeDev;
+
+/* Mapping mandated by PCI-to-PCI Bridge architecture specification,
+ * revision 1.2 */
+/* Table 9-1: Interrupt Binding for Devices Behind a Bridge */
+static int pci_bridge_dev_map_irq_fn(PCIDevice *dev, int irq_num)
+{
+    return (irq_num + PCI_SLOT(dev->devfn)) % PCI_NUM_PINS;
+}
+
+static int pci_bridge_dev_initfn(PCIDevice *dev)
+{
+    PCIBridge *br = DO_UPCAST(PCIBridge, dev, dev);
+    PCIBridgeDev *bridge_dev = DO_UPCAST(PCIBridgeDev, bridge, br);
+    int err;
+    br->map_irq = pci_bridge_dev_map_irq_fn;
+    /* If we don't specify the name, the bus will be addressed as id.0, where
+     * id is the parent id.  But it seems more natural to address the bus using
+     * the parent device name. */
+    if (dev->qdev.id && *dev->qdev.id) {
+        br->bus_name = dev->qdev.id;
+    }
+    err = pci_bridge_initfn(dev);
+    if (err) {
+        goto bridge_error;
+    }
+    memory_region_init(&bridge_dev->bar, "shpc-bar", shpc_bar_size(dev));
+    err = shpc_init(dev, &br->sec_bus, &bridge_dev->bar, 0);
+    if (err) {
+        goto error;
+    }
+    /* TODO: spec recommends using 64 bit prefetchable BAR.
+     * Check whether that works well. */
+    pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &bridge_dev->bar);
+    dev->config[PCI_INTERRUPT_PIN] = 0x1;
+    return 0;
+error:
+    memory_region_destroy(&bridge_dev->bar);
+bridge_error:
+    return err;
+}
+
+static int pci_bridge_dev_exitfn(PCIDevice *dev)
+{
+    PCIBridge *br = DO_UPCAST(PCIBridge, dev, dev);
+    PCIBridgeDev *bridge_dev = DO_UPCAST(PCIBridgeDev, bridge, br);
+    int ret;
+    shpc_cleanup(dev);
+    memory_region_destroy(&bridge_dev->bar);
+    ret = pci_bridge_exitfn(dev);
+    assert(!ret);
+    return 0;
+}
+
+static void pci_bridge_dev_write_config(PCIDevice *d,
+                                        uint32_t address, uint32_t val, int len)
+{
+    pci_bridge_write_config(d, address, val, len);
+    shpc_cap_write_config(d, address, val, len);
+}
+
+static void qdev_pci_bridge_dev_reset(DeviceState *qdev)
+{
+    PCIDevice *dev = DO_UPCAST(PCIDevice, qdev, qdev);
+    pci_bridge_reset(qdev);
+    shpc_reset(dev);
+}
+
+static void pci_bridge_dev_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+    k->init = pci_bridge_dev_initfn;
+    k->exit = pci_bridge_dev_exitfn;
+    k->config_write = pci_bridge_dev_write_config;
+    k->vendor_id = 

[PATCHv2-RFC 0/2] RFC: standard pci bridge device

2012-02-13 Thread Michael S. Tsirkin
Here's a new version of the patch. It works for me.
Deep nesting of bridges is supported.
You need a small BIOS patch to support the OSHP method
if you want hotplug to work. I will post this separately.
We'd need a full ACPI driver to make hotplug work for guests
without an SHPC driver (e.g. Windows XP).
Management support will also be needed.

One small wrinkle is that the pci_addr property
expects data in bus:device.function format, which is
broken because guests can change bus numbers.
For testing I used the 'addr' property, which
encodes slot*8+function#. We probably want to
extend pci_addr in some way (e.g. :device.function?
Thoughts?).
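The 'addr' encoding packs slot and function into one devfn byte as slot*8+function, so addr=8 means slot 1, function 0. A sketch of that encoding, plus a parser for a bus-number-free "slot.function" string of the kind being asked about. The string format and parse_slot_fn() are hypothetical illustrations, not an existing QEMU property syntax; hex is used because PCI addresses are conventionally written that way.

```c
#include <stdio.h>

/* devfn helpers: 5-bit slot (device) number, 3-bit function number. */
#define PCI_DEVFN(slot, fn) ((((slot) & 0x1f) << 3) | ((fn) & 0x07))
#define PCI_SLOT(devfn)     (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)     ((devfn) & 0x07)

/* Hypothetical parser for "<hex slot>.<hex function>", e.g. "1.0" or "1f.7".
 * Returns the devfn, or -1 if the string is malformed or out of range. */
int parse_slot_fn(const char *s)
{
    unsigned slot, fn;
    char extra;

    /* Exactly two conversions must succeed; a third (%c) means trailing junk. */
    if (sscanf(s, "%x.%x%c", &slot, &fn, &extra) != 2) {
        return -1;
    }
    if (slot > 0x1f || fn > 7) {
        return -1;
    }
    return PCI_DEVFN(slot, fn);
}
```

Because such a string names only the slot and function, it stays valid when the guest renumbers buses; the bus itself would still be selected separately, as with bus=bog on the command line.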

The SHPC controller supports up to 31 devices
(out of 32 slots), so slot 0 doesn't support hotplug.
Non-hotpluggable devices behind the bridge
don't work correctly (we'll try to unplug them),
so don't do this.
For now I just blocked adding devices in slot 0;
in the future it might be possible to add
a non-hotpluggable device there.
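The slot 0 restriction reduces to a check on the devfn of a device being plugged behind the bridge. A hypothetical helper (not taken from the patch) sketching that check:

```c
#include <stdbool.h>

/* Device (slot) number from a devfn byte. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)

/* The SHPC exposes hot-pluggable slots 1..31 only, so any function in
 * slot 0 (devfn 0..7) is refused, as is an out-of-range devfn. */
bool bridge_slot_allowed(int devfn)
{
    return devfn >= 0 && devfn <= 0xff && PCI_SLOT(devfn) != 0;
}
```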

Example:

qemu-system-x86_64 -enable-kvm -m 1G \
 -drive file=/home/mst/rhel6.qcow2 \
 -netdev tap,id=foo,ifname=msttap0,script=/home/mst/ifup,downscript=no,vhost=on \
 -device pci-bridge,id=bog \
 -device virtio-net-pci,netdev=foo,bus=bog,addr=8


Hot-unplug currently causes QEMU to crash; this
happens without this patch too, so I'm not worried :)

New since v1:
hotplug support

-- 
MST


Michael S. Tsirkin (2):
  shpc: standard hot plug controller
  pci: add standard bridge device

 Makefile.objs   |3 +-
 hw/pci.h|6 +
 hw/pci_bridge_dev.c |  136 +++
 hw/shpc.c   |  646 +++
 hw/shpc.h   |   40 
 qemu-common.h   |1 +
 6 files changed, 831 insertions(+), 1 deletions(-)
 create mode 100644 hw/pci_bridge_dev.c
 create mode 100644 hw/shpc.c
 create mode 100644 hw/shpc.h

-- 
1.7.9.111.gf3fb0


Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

2012-02-13 Thread Paolo Bonzini

On 02/12/2012 09:16 PM, James Bottomley wrote:

> Well, no-one's yet answered the question I had about why.  virtio-scsi
> seems to be a basic duplication of virtio-blk except that it seems to
> fix some problems virtio-blk has.  Namely queue parameter discovery,
> which virtio-blk doesn't seem to do.


The biggest differences between virtio-blk and virtio-scsi are:

1) How the feature set is defined.  virtio-blk defines the feature set
of the device through a shared spec between the guest and the host.  The
virtio-scsi spec does not define a feature set for the devices, only for
the transport.  Introducing new features in the guest does not need to
be done specifically for virt; it can be done in generic code (sd.c).
This results in a large feature set and at the same time a very stable spec.


Right now virtio-blk covers common usecases nicely.  However, the Linux 
block layer _is_ growing support for new operations: discard is already 
there, write same is in the works, extended copy will also come in due 
time.  Perhaps we'll add them to virtio-blk, perhaps not.  If we do,
we will have to modify the spec, the host implementation, and the guest
drivers for each possible guest OS.  virtio-scsi will support them 
transparently.  Depending on your configuration, it might work without 
touching the host at all.



2) For disks with SCSI attachment, the native interface is exposed
precisely as it is in the host.  I think we had some misunderstanding 
WRT queue parameter discovery.  My concern with virtio-blk's SG_IO 
support is more general than that.  It is that SG_IO accesses the host 
disk, not the guest disk.  They will have the same data, but they are 
effectively different disks.  For example they might have different 
queue parameters, hence the misunderstanding.


People are mostly using the SG_IO interface for sane purposes.  For 
example you can ping the storage with INQUIRY commands to detect 
problems on the NAS or SAN.  For these usecases the difference does not 
matter.  However, there _are_ worrisome usecases for SG_IO that people 
are looking at.  For example, some want to install vendor backup tools in
their guests.  These tools send vendor-specific commands to the disks.
Nothing particularly insane about that, but we want them to do it using 
a saner interface than VIRTIO_BLK_T_SCSI_CMD.
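For reference, the INQUIRY "ping" mentioned above is typically issued from Linux userspace through the SG_IO ioctl with an sg_io_hdr from <scsi/sg.h>. A minimal Linux-only sketch; build_inquiry() and send_inquiry() are hypothetical helper names, error handling is reduced to a single return code, and this is not how virtio-blk forwards VIRTIO_BLK_T_SCSI_CMD requests.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

#define INQUIRY_OPCODE  0x12
#define INQUIRY_CDB_LEN 6

/* Fill a 6-byte INQUIRY CDB and the SG_IO header for a standard INQUIRY
 * that returns up to len bytes into buf. */
void build_inquiry(sg_io_hdr_t *io, unsigned char cdb[INQUIRY_CDB_LEN],
                   unsigned char *buf, unsigned short len,
                   unsigned char *sense, unsigned char sense_len)
{
    memset(cdb, 0, INQUIRY_CDB_LEN);
    cdb[0] = INQUIRY_OPCODE;
    cdb[3] = (unsigned char)(len >> 8);   /* ALLOCATION LENGTH, big-endian */
    cdb[4] = (unsigned char)(len & 0xff);

    memset(io, 0, sizeof(*io));
    io->interface_id = 'S';               /* required by the sg driver */
    io->cmd_len = INQUIRY_CDB_LEN;
    io->cmdp = cdb;
    io->dxfer_direction = SG_DXFER_FROM_DEV;
    io->dxferp = buf;
    io->dxfer_len = len;
    io->sbp = sense;
    io->mx_sb_len = sense_len;
    io->timeout = 5000;                   /* milliseconds */
}

/* Send the INQUIRY on an open /dev/sgN or SCSI block device fd.
 * Returns 0 if the ioctl succeeded and SCSI, host and driver status are clean. */
int send_inquiry(int fd, unsigned char *buf, unsigned short len)
{
    unsigned char cdb[INQUIRY_CDB_LEN];
    unsigned char sense[32];
    sg_io_hdr_t io;

    build_inquiry(&io, cdb, buf, len, sense, sizeof(sense));
    if (ioctl(fd, SG_IO, &io) < 0) {
        return -1;
    }
    return (io.info & SG_INFO_OK_MASK) == SG_INFO_OK ? 0 : -1;
}
```

Run inside a virtio-blk guest, such a request reaches the host disk rather than the guest's view of it, which is the distinction drawn in point 2 above.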



On top of this, obviously only virtio-scsi will support devices such
as tapes.



> There may also be a reason to cut the stack lower down.  Error
> handling is most often cited for this, but no-one's satisfactorily
> explained why it's better to do error handling in the guest instead of
> the host.


It's not necessarily better.  However, error handling in the host may
simply not be there.  This is the case, for example, for NFS-based storage
mounted with the "hard" option.


Paolo


Re: qemu-kvm-1.0 crashes with threaded vnc server?

2012-02-13 Thread Peter Lieven

On 11.02.2012 at 09:55, Corentin Chary wrote:

 On Thu, Feb 9, 2012 at 7:08 PM, Peter Lieven p...@dlh.net wrote:
 Hi,
 
 is anyone aware if there are still problems when enabling the threaded vnc
 server?
 I saw some VMs crashing when using a qemu-kvm build with
 --enable-vnc-thread.
 
 qemu-kvm-1.0[22646]: segfault at 0 ip 7fec1ca7ea0b sp 7fec19d056d0
 error 6 in libz.so.1.2.3.3[7fec1ca75000+16000]
 qemu-kvm-1.0[26056]: segfault at 7f06d8d6e010 ip 7f06e0a30d71 sp
 7f06df035748 error 6 in libc-2.11.1.so[7f06e09aa000+17a000]
 
 I had no time to debug further. It seems to happen shortly after migrating,
 but that's uncertain. At least the segfault in libz seems to
 point to VNC, since I cannot imagine any other part of qemu-kvm using
 libz except for the VNC server.
 
 Thanks,
 Peter
 
 
 
 Hi Peter,
 I found two patches on my git tree that I sent long ago but that somehow
 got lost on the mailing list. I rebased the tree but have not had the
 time (yet) to test them.
 http://git.iksaif.net/?p=qemu.git;a=shortlog;h=refs/heads/wip
 Feel free to try them. If QEMU segfaults again, please send a full gdb
 backtrace / valgrind trace / way to reproduce :).
 Thanks,

Hi Corentin,

thanks for rebasing those patches. I remember seeing them the
last time I noticed (about a year ago) that the threaded VNC server was crashing.
I'm on vacation this week, but I will test them next week
and let you know if I can force a crash with them applied. If not, we should
consider including them ASAP.

Peter


 
 
 -- 
 Corentin Chary
 http://xf.iksaif.net



[PATCH RFC] seabios: add OSHP method stub

2012-02-13 Thread Michael S. Tsirkin
To allow guests to load the native SHPC driver
for a bridge, we must declare an OSHP method
for the appropriate device which lets the OS
take control of the SHPC.
As we don't access SHPC at the moment, we
don't need to do anything - just report success.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

---

diff --git a/src/ssdt-pcihp.dsl b/src/ssdt-pcihp.dsl
index 442e7a8..3f50169 100644
--- a/src/ssdt-pcihp.dsl
+++ b/src/ssdt-pcihp.dsl
@@ -24,6 +24,7 @@ DefinitionBlock ("ssdt-pcihp.aml", "SSDT", 0x01, "BXPC", "BXSSDTPCIHP", 0x1)
ACPI_EXTRACT_METHOD_STRING aml_ej0_name  \
Method (_EJ0, 1) { Return(PCEJ(0x##slot)) }  \
Name (_SUN, 0x##slot)\
+   Method (OSHP, 1) { Return(0x0) }  \
 }
 
 hotplug_slot(03)


Re: [PATCHv2-RFC 0/2] RFC: standard pci bridge device

2012-02-13 Thread Wen Congyang
At 02/13/2012 05:15 PM, Michael S. Tsirkin Wrote:
 Here's a new version of the patch. It works for me.
 Deep nesting of bridges is supported.
 You need a small BIOS patch to support the OSHP method
 if you want hotplug to work. I will post this separately.
 We'd need a full ACPI driver to make hotplug work for guests
 without an SHPC driver (e.g. windows XP).
 Management support will also be needed.
 
 One small wrinkle is that the pci_addr property
 wants data in a format bus:device.function which is
 broken as guests can change bus numbers.
 For testing I used the 'addr' property which
 encodes slot*8+function#. We probably want to
 extend pci_addr in some way (e.g. :device.function ?
 Thoughts?).

What about using id+device(slot)+function to set the address?

 
 The SHPC controller supports up to 31 devices
 (out of 32 slots) so slot 0 doesn't support hotplug.
 Non hot-pluggable devices behind the bridge
 don't work currectly (we'll try to unplug them)
 so don't do this.
 For now I just blocked adding devices in slot 0,
 in the future it might be possible to add
 a non-hotpluggable device there.
 
 Example:
 
 qemu-system-x86_64  -enable-kvm -m 1G
  -drive file=/home/mst/rhel6.qcow2
 -netdev
 tap,id=foo,ifname=msttap0,script=/home/mst/ifup,downscript=no,vhost=on
 -device pci-bridge,id=bog
 -device virtio-net-pci,netdev=foo,bus=bog,addr=8
 
 
 Hot-unplug currently causes qemu to crash, this
 happens without this patch too, so I'm not worried :)

How can this bug be triggered without this patch?

Thanks
Wen Congyang

 
 New since v1:
   hotplug support
 



AHCI Boot disk?

2012-02-13 Thread Conrad Wood
Hello,

I am attempting to test AHCI disks and find that I am unable to boot off
the disk, even though the disk is seen if I do a network boot.

This works:

-drive
file=/srv/kvm/debian.raw,if=virtio,cache=writeback,bus=0,index=0,media=disk,format=raw,serial=1,boot=on

This [1] does not:

-drive file=/srv/kvm/debian.raw,if=none,id=${AHCIID} \
-device ahci,id=${AHCIID} \
-device ide-hd,drive=${AHCIID},bus=${AHCIID}.0 

The latter gives me: "Boot failed. Could not read the boot disk"

1. Should it work?
2. If so, what am I doing wrong? ;)

Any help much appreciated!

Conrad

[1]
http://wiki.qemu.org/ChangeLog/0.14#IDE_.2F_AHCI


-- 
Conrad Wood
(Deputy CTO, Head of Research & Innovations)

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin
 
Office: +49 30 51 64 09 21
DDI:+49 30 51 300 021
Email:  conrad.w...@profitbricks.com
URL:http://www.profitbricks.com/
Registered office: Berlin
Register court: Amtsgericht Charlottenburg, HRB 125506 B.
Managing directors: Andreas Gauger, Achim Weiss



Re: AHCI Boot disk?

2012-02-13 Thread Gleb Natapov
On Mon, Feb 13, 2012 at 09:55:46AM +0100, Conrad Wood wrote:
 Hello,
 
 I am attempting to test AHCI disks and find that I am unable to boot of
 the disk, even though the disk is seen if I do a network boot.
 
 This works:
 
 -drive
 file=/srv/kvm/debian.raw,if=virtio,cache=writeback,bus=0,index=0,media=disk,format=raw,serial=1,boot=on
 
You shouldn't use boot=on.

 This [1] does not:
 
 -drive file=/srv/kvm/debian.raw,if=none,id=${AHCIID} \
 -device ahci,id=${AHCIID} \
 -device ide-hd,drive=${AHCIID},bus=${AHCIID}.0 
 
 The latter gives me : Boot failed. Could not read the boot disk
 
 1. Should it work?
AFAIK, yes, if your BIOS is up to date.

 2. If so, what am I doing wrong? ;)
 
Try compiling the BIOS from git://git.seabios.org/seabios.git

 Any help much appreciated!
 
 Conrad
 
 [1]
 http://wiki.qemu.org/ChangeLog/0.14#IDE_.2F_AHCI
 
 

--
Gleb.


Re: AHCI Boot disk?

2012-02-13 Thread Conrad Wood
On Mon, 2012-02-13 at 11:50 +0200, Gleb Natapov wrote:
 On Mon, Feb 13, 2012 at 09:55:46AM +0100, Conrad Wood wrote:
  Hello,
  
  I am attempting to test AHCI disks and find that I am unable to boot of
  the disk, even though the disk is seen if I do a network boot.
  
  This works:
  
  -drive
  file=/srv/kvm/debian.raw,if=virtio,cache=writeback,bus=0,index=0,media=disk,format=raw,serial=1,boot=on
  
 You shouldn't use boot=on.

Yes... but in our specific case we cannot use boot-order (yet)...
Thanks for pointing it out, though ;)


 
  This [1] does not:
  
  -drive file=/srv/kvm/debian.raw,if=none,id=${AHCIID} \
  -device ahci,id=${AHCIID} \
  -device ide-hd,drive=${AHCIID},bus=${AHCIID}.0 
  
  The latter gives me : Boot failed. Could not read the boot disk
  
  1. Should it work?
 AFAIK yes if you BIOS is up-to-date.
 
  2. If so, what am I doing wrong? ;)
  
 Try to compile BIOS from git://git.seabios.org/seabios.git

ok, will do. Thanks

Conrad



Re: [PATCHv2-RFC 0/2] RFC: standard pci bridge device

2012-02-13 Thread Michael S. Tsirkin
On Mon, Feb 13, 2012 at 05:38:26PM +0800, Wen Congyang wrote:
 At 02/13/2012 05:15 PM, Michael S. Tsirkin Wrote:
  Here's a new version of the patch. It works for me.
  Deep nesting of bridges is supported.
  You need a small BIOS patch to support the OSHP method
  if you want hotplug to work. I will post this separately.
  We'd need a full ACPI driver to make hotplug work for guests
  without an SHPC driver (e.g. windows XP).
  Management support will also be needed.
  
  One small wrinkle is that the pci_addr property
  wants data in a format bus:device.function which is
  broken as guests can change bus numbers.
  For testing I used the 'addr' property which
  encodes slot*8+function#. We probably want to
  extend pci_addr in some way (e.g. :device.function ?
  Thoughts?).
 
 What about using id+device(slot)+function to set the address?

That's exactly what this patch does: addr encodes
slot+function.
I was asking about a friendlier format for this.

  
  The SHPC controller supports up to 31 devices
  (out of 32 slots) so slot 0 doesn't support hotplug.
  Non hot-pluggable devices behind the bridge
  don't work currectly (we'll try to unplug them)
  so don't do this.
  For now I just blocked adding devices in slot 0,
  in the future it might be possible to add
  a non-hotpluggable device there.
  
  Example:
  
  qemu-system-x86_64  -enable-kvm -m 1G
   -drive file=/home/mst/rhel6.qcow2
  -netdev
  tap,id=foo,ifname=msttap0,script=/home/mst/ifup,downscript=no,vhost=on
  -device pci-bridge,id=bog
  -device virtio-net-pci,netdev=foo,bus=bog,addr=8
  
  
  Hot-unplug currently causes qemu to crash, this
  happens without this patch too, so I'm not worried :)
 
 How to trigger this bug without this patch?
 
 Thanks
 Wen Congyang

Start with:
 qemu-system-x86_64 -enable-kvm -m 1G \
  -drive file=/home/mst/rhel6.qcow2 \
  -netdev tap,id=foo,ifname=msttap0,script=/home/mst/ifup,downscript=no,vhost=on

Then, in the monitor:
 device_add virtio-net-pci,netdev=foo,id=bla
Wait a bit for the guest to notice the device, then:
 device_del bla
Wait for the device to go away. QEMU will then crash on the next malloc;
to trigger a malloc, issue another command, e.g.:
 info pci

  
  New since v1:
  hotplug support
  


Re: [PATCHv2-RFC 1/2] shpc: standard hot plug controller

2012-02-13 Thread Isaku Yamahata
Oh nice work.

On Mon, Feb 13, 2012 at 11:15:55AM +0200, Michael S. Tsirkin wrote:
 This adds support for SHPC interface, as defined by PCI Standard
 Hot-Plug Controller and Subsystem Specification, Rev 1.0
 http://www.pcisig.com/specifications/conventional/pci_hot_plug/SHPC_10
 
 Only SHPC intergrated with a PCI-to-PCI bridge is supported,
 SHPC integrated with a host bridge would need more work.
 
 All main SHPC features are supported:
 - MRL sensor

Does this just report latch status? (It seems so.)
Do you plan to provide interfaces to manipulate the latch?


 - Attention button
 - Attention indicator
 - Power indicator

 Wake on hotplug and serr generation are stubbed out but unused
 as we don't have interfaces to generate these events ATM.
 
 One issue that isn't completely resolved is that qemu currently
 expects an eject interface, which SHPC does not provide: it merely
 removes the power to device and it's up to the user to remove the device
 from slot. This patch works around that by ejecting the device
 when power is removed and power LED goes off.
 
 TODO:
 - migration support
 - fix dependency on pci_internals.h

Unless I missed it in the code, these are also still missing:
- QMP command for pushing the attention button
- QMP command to get LED status
- QMP events for LED on/off

thanks,

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  Makefile.objs |1 +
  hw/pci.h  |6 +
  hw/shpc.c |  646 
 +
  hw/shpc.h |   40 
  qemu-common.h |1 +
  5 files changed, 694 insertions(+), 0 deletions(-)
  create mode 100644 hw/shpc.c
  create mode 100644 hw/shpc.h
 
 diff --git a/Makefile.objs b/Makefile.objs
 index 391e524..4546477 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
 @@ -195,6 +195,7 @@ hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
  hw-obj-y += fw_cfg.o
  hw-obj-$(CONFIG_PCI) += pci.o pci_bridge.o
  hw-obj-$(CONFIG_PCI) += msix.o msi.o
 +hw-obj-$(CONFIG_PCI) += shpc.o
  hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
  hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o
  hw-obj-y += watchdog.o
 diff --git a/hw/pci.h b/hw/pci.h
 index 33b0b18..756577e 100644
 --- a/hw/pci.h
 +++ b/hw/pci.h
 @@ -125,6 +125,9 @@ enum {
  /* command register SERR bit enabled */
  #define QEMU_PCI_CAP_SERR_BITNR 4
  QEMU_PCI_CAP_SERR = (1  QEMU_PCI_CAP_SERR_BITNR),
 +/* Standard hot plug controller. */
 +#define QEMU_PCI_SHPC_BITNR 5
 +QEMU_PCI_CAP_SHPC = (1  QEMU_PCI_SHPC_BITNR),
  };
  
  #define TYPE_PCI_DEVICE pci-device
 @@ -229,6 +232,9 @@ struct PCIDevice {
  /* PCI Express */
  PCIExpressDevice exp;
  
 +/* SHPC */
 +SHPCDevice *shpc;
 +
  /* Location of option rom */
  char *romfile;
  bool has_rom;
 diff --git a/hw/shpc.c b/hw/shpc.c
 new file mode 100644
 index 000..4baec29
 --- /dev/null
 +++ b/hw/shpc.c
 @@ -0,0 +1,646 @@
 +#include strings.h
 +#include stdint.h
 +#include range.h
 +#include shpc.h
 +#include pci.h
 +#include pci_internals.h
 +
 +/* TODO: model power only and disabled slot states. */
 +/* TODO: handle SERR and wakeups */
 +/* TODO: consider enabling 66MHz support */
 +
 +/* TODO: remove fully only on state DISABLED and LED off.
 + * track state to properly record this. */
 +
 +/* SHPC Working Register Set */
 +#define SHPC_BASE_OFFSET  0x00 /* 4 bytes */
 +#define SHPC_SLOTS_33 0x04 /* 4 bytes. Also encodes PCI-X slots. */
 +#define SHPC_SLOTS_66 0x08 /* 4 bytes. */
 +#define SHPC_NSLOTS   0x0C /* 1 byte */
 +#define SHPC_FIRST_DEV    0x0D /* 1 byte */
 +#define SHPC_PHYS_SLOT    0x0E /* 2 bytes */
 +#define SHPC_PHYS_NUM_MAX 0x7ff
 +#define SHPC_PHYS_NUM_UP  0x1000
 +#define SHPC_PHYS_MRL 0x4000
 +#define SHPC_PHYS_BUTTON  0x8000
 +#define SHPC_SEC_BUS  0x10 /* 2 bytes */
 +#define SHPC_SEC_BUS_33   0x0
 +#define SHPC_SEC_BUS_66   0x1 /* Unused */
 +#define SHPC_SEC_BUS_MASK 0x7
 +#define SHPC_MSI_CTL  0x12 /* 1 byte */
 +#define SHPC_PROG_IFC 0x13 /* 1 byte */
 +#define SHPC_PROG_IFC_1_0 0x1
 +#define SHPC_CMD_CODE 0x14 /* 1 byte */
 +#define SHPC_CMD_TRGT 0x15 /* 1 byte */
 +#define SHPC_CMD_TRGT_MIN 0x1
 +#define SHPC_CMD_TRGT_MAX 0x1f
 +#define SHPC_CMD_STATUS   0x16 /* 2 bytes */
 +#define SHPC_CMD_STATUS_BUSY  0x1
 +#define SHPC_CMD_STATUS_MRL_OPEN  0x2
 +#define SHPC_CMD_STATUS_INVALID_CMD   0x4
 +#define SHPC_CMD_STATUS_INVALID_MODE  0x8
 +#define SHPC_INT_LOCATOR  0x18 /* 4 bytes */
 +#define SHPC_INT_COMMAND  0x1
 +#define SHPC_SERR_LOCATOR 0x1C /* 4 bytes */
 +#define SHPC_SERR_INT 0x20 /* 4 bytes */
 +#define SHPC_INT_DIS  0x1
 +#define SHPC_SERR_DIS 0x2
 +#define SHPC_CMD_INT_DIS  0x4
 +#define SHPC_ARB_SERR_DIS 0x8
 +#define SHPC_CMD_DETECTED 0x1
 +#define SHPC_ARB_DETECTED 0x2
 + /* 4 bytes * slot # (start from 0) */
 +#define SHPC_SLOT_REG(s) (0x24 + (s) * 4)
 + /* 2 bytes */
 +#define SHPC_SLOT_STATUS(s)   (0x0 + SHPC_SLOT_REG(s))
 +
 +/* Same slot state masks are used 

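A small Python 3 sketch (not part of the patch) may help make the register map above concrete: it computes the per-slot register offsets and splits the 16-bit value at SHPC_PHYS_SLOT using the masks from the #defines. The dictionary keys are hypothetical labels inferred from the macro names, not names taken from the SHPC specification.

```python
# Constants copied from the SHPC #defines in hw/shpc.c above.
SHPC_PHYS_SLOT = 0x0E      # 2-byte field in the working register set
SHPC_PHYS_NUM_MAX = 0x7FF  # mask for the physical slot number
SHPC_PHYS_NUM_UP = 0x1000
SHPC_PHYS_MRL = 0x4000
SHPC_PHYS_BUTTON = 0x8000


def shpc_slot_reg(slot):
    """Offset of the 4-byte per-slot register (slots numbered from 0)."""
    return 0x24 + slot * 4


def shpc_slot_status(slot):
    """SHPC_SLOT_STATUS(s) is defined as 0x0 + SHPC_SLOT_REG(s)."""
    return 0x0 + shpc_slot_reg(slot)


def decode_phys_slot(value):
    """Split a 16-bit value read at SHPC_PHYS_SLOT into its fields.

    The keys are hypothetical labels based on the macro names.
    """
    return {
        "phys_num": value & SHPC_PHYS_NUM_MAX,
        "num_up": bool(value & SHPC_PHYS_NUM_UP),
        "mrl": bool(value & SHPC_PHYS_MRL),
        "button": bool(value & SHPC_PHYS_BUTTON),
    }


if __name__ == "__main__":
    for s in range(3):
        print("slot %d register at 0x%02x" % (s, shpc_slot_reg(s)))
    print(decode_phys_slot(SHPC_PHYS_MRL | 5))
```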
Re: [PATCH 3/3] KVM: perf: kvm events analysis tool

2012-02-13 Thread Xiao Guangrong
On 02/13/2012 01:32 PM, David Ahern wrote:

 [sorry for the top post - you would think Android would have a better mail 
 client]
 
 If the first patch is needed then kvm-events will not work with older, 
 unpatched kernels. That's a big limitation from a perf perspective.
 


The first patch is only needed to compile the code; once kvm-events is
compiled, you can analyze any kernel. :)



Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-13 Thread Jan Kiszka
On 2012-02-11 16:25, Blue Swirl wrote:
 On Fri, Feb 10, 2012 at 18:31, Jan Kiszka jan.kis...@siemens.com wrote:
 This enables acceleration for MMIO-based TPR register accesses of
 32-bit Windows guest systems. It is mostly useful with KVM enabled,
 either on older Intel CPUs (without the FlexPriority feature, which can
 also be manually disabled for testing) or on any current AMD processor.

 The approach introduced here is derived from the original version of
 qemu-kvm. It was refactored, documented, and extended with support for
 user space APIC emulation, both with and without KVM acceleration. The
 VMState format was kept compatible, as was the ABI to the option ROM
 that implements the guest-side para-virtualized driver service. This
 enables seamless migration from qemu-kvm to upstream or, one day,
 between KVM and TCG mode.

 The basic concept goes like this:
  - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
   irqchip) a vmcall hypercall is registered
  - VAPIC option ROM is loaded into guest
  - option ROM activates TPR MMIO access reporting via port 0x7e
  - TPR accesses are trapped and patched in the guest to call into option
   ROM instead, VAPIC support is enabled
  - option ROM TPR helpers track state in memory and invoke hypercall to
   poll for pending IRQs if required

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 
 I must say that I find the approach horrible, patching guests and ROMs
 and looking up Windows internals. Taking the same approach to the extreme,
 we could, for example, patch a Xen guest to become a KVM guest. Not that I
 object to merging it.

Yes, this is horrible. But there is no better way in the absence of
hardware-assisted virtualization of the TPR. I think MS recommends
this patching approach as well.

 diff --git a/hw/apic.c b/hw/apic.c
 index 086c544..2ebf3ca 100644
 --- a/hw/apic.c
 +++ b/hw/apic.c
 @@ -35,6 +35,10 @@
  #define MSI_ADDR_DEST_ID_SHIFT 12
  #define MSI_ADDR_DEST_ID_MASK   0x000

 +#define SYNC_FROM_VAPIC 0x1
 +#define SYNC_TO_VAPIC   0x2
 +#define SYNC_ISR_IRR_TO_VAPIC   0x4
 
 Enum, please.

OK.

 
 +
  static APICCommonState *local_apics[MAX_APICS + 1];

  static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode);
 @@ -78,6 +82,70 @@ static inline int get_bit(uint32_t *tab, int index)
     return !!(tab[i] & mask);
  }

 +/* return -1 if no bit is set */
 +static int get_highest_priority_int(uint32_t *tab)
 +{
 +    int i;
 +    for (i = 7; i >= 0; i--) {
 +        if (tab[i] != 0) {
 +            return i * 32 + fls_bit(tab[i]);
 +        }
 +    }
 +    return -1;
 +}
 +
 +static void apic_sync_vapic(APICCommonState *s, int sync_type)
 +{
 +    VAPICState vapic_state;
 +    size_t length;
 +    off_t start;
 +    int vector;
 +
 +    if (!s->vapic_paddr) {
 +        return;
 +    }
 +    if (sync_type & SYNC_FROM_VAPIC) {
 +        cpu_physical_memory_rw(s->vapic_paddr, (void *)&vapic_state,
 +                               sizeof(vapic_state), 0);
 +        s->tpr = vapic_state.tpr;
 +    }
 +    if (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) {
 +        start = offsetof(VAPICState, isr);
 +        length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
 +
 +        if (sync_type & SYNC_TO_VAPIC) {
 +            assert(qemu_cpu_is_self(s->cpu_env));
 +
 +            vapic_state.tpr = s->tpr;
 +            vapic_state.enabled = 1;
 +            start = 0;
 +            length = sizeof(VAPICState);
 +        }
 +
 +        vector = get_highest_priority_int(s->isr);
 +        if (vector < 0) {
 +            vector = 0;
 +        }
 +        vapic_state.isr = vector & 0xf0;
 +
 +        vapic_state.zero = 0;
 +
 +        vector = get_highest_priority_int(s->irr);
 +        if (vector < 0) {
 +            vector = 0;
 +        }
 +        vapic_state.irr = vector & 0xff;
 +
 +        cpu_physical_memory_write_rom(s->vapic_paddr + start,
 +                                      ((void *)&vapic_state) + start, length);
 
 This assumes that the vapic_state structure matches what the guest
 expects without conversion. Is this true for i386 on x86_64? I didn't
 check the structure in question.

Yes, the structure in question is a packed one, stable on both guest and
host side (the guest side is 32-bit only anyway).
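The ISR/IRR part of apic_sync_vapic() can be modelled in a few lines of Python 3. This is a sketch, not QEMU code: it assumes fls_bit() returns the index of the most significant set bit, and that VAPICState is the packed five-byte layout tpr, isr, zero, irr, enabled suggested by the field names and the offsetof() arithmetic. It also expresses the SYNC_* values as an IntFlag, in the spirit of the "Enum, please" comment.

```python
import enum
import struct


class SyncType(enum.IntFlag):
    """The SYNC_* values from hw/apic.c, expressed as a flag enum."""
    SYNC_FROM_VAPIC = 0x1
    SYNC_TO_VAPIC = 0x2
    SYNC_ISR_IRR_TO_VAPIC = 0x4


def fls_bit(value):
    """Index of the most significant set bit (value must be non-zero)."""
    return value.bit_length() - 1


def get_highest_priority_int(tab):
    """Highest set vector in an 8 x 32-bit APIC bitmap, or -1 if none."""
    for i in range(7, -1, -1):
        if tab[i] != 0:
            return i * 32 + fls_bit(tab[i])
    return -1


def vapic_update(tpr, isr_tab, irr_tab, sync_type):
    """Return (offset, bytes) that the *_TO_VAPIC paths would write.

    Assumed packed layout, one byte each: tpr, isr, zero, irr, enabled.
    SYNC_FROM_VAPIC (reading the TPR back) is not modelled here.
    """
    isr = max(get_highest_priority_int(isr_tab), 0) & 0xF0
    irr = max(get_highest_priority_int(irr_tab), 0) & 0xFF
    full = struct.pack("<5B", tpr, isr, 0, irr, 1)
    if sync_type & SyncType.SYNC_TO_VAPIC:
        return 0, full          # whole structure, enabled = 1
    if sync_type & SyncType.SYNC_ISR_IRR_TO_VAPIC:
        return 1, full[1:4]     # offsetof(isr) up to offsetof(enabled)
    return None
```

Under that assumed layout every field is a single byte with no padding, so 32-bit guests and 64-bit hosts see the same bytes without conversion.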

 diff --git a/hw/apic_common.c b/hw/apic_common.c
 index 588531b..1977da7 100644
 --- a/hw/apic_common.c
 +++ b/hw/apic_common.c
 @@ -20,8 +20,10 @@
  #include "apic.h"
  #include "apic_internal.h"
  #include "trace.h"
 +#include "kvm.h"

  static int apic_irq_delivered;
 +bool apic_report_tpr_access;
 
 This should go to APICCommonState.

Nope, it is global state. It is also checked in a place where the APIC is
being set up and thus has no local clue about it yet; that code needs to
pick up the global view.

 @@ -238,6 +275,7 @@ static int apic_init_common(SysBusDevice *dev)
  {
 APICCommonState *s = APIC_COMMON(dev);
 APICCommonClass *info;
 +static 

[PATCH v2 0/6] Network performance regression

2012-02-13 Thread Amos Kong
This patchset adds a new network performance test case for Windows,
refactors the old netperf test, and adds support for NUMA resource control.
The raw results are processed into a standard format at the end of the test.
regression.py can be used to compare the results of two jobs.

---

Amos Kong (6):
  virt: Add vhost_threads and vcpu_threads to VM object
  virt_test_utils: Add pin_vm_threads
  virt-test: add NTttcp subtests
  virt-test: Refactor netperf test and add analysis module
  netperf: pin guest vcpus/memory/vhost thread to numa node
  virt: Introduce regression testing infrastructure


 client/tools/analyzer.py   |  166 
 client/tools/perf.conf |   14 
 client/tools/regression.py |   24 ++
 3 files changed, 204 insertions(+), 0 deletions(-)
 create mode 100644 client/tools/analyzer.py
 create mode 100644 client/tools/perf.conf
 create mode 100644 client/tools/regression.py

-- 
Amos Kong


[PATCH v2 1/6] virt: Add vhost_threads and vcpu_threads to VM object

2012-02-13 Thread Amos Kong
Record the vhost_net thread IDs and vCPU thread IDs in the VM object
after creating the VM.

Signed-off-by: Amos Kong ak...@redhat.com
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/virt/kvm_vm.py b/client/virt/kvm_vm.py
index c5dba08..b2d6088 100644
--- a/client/virt/kvm_vm.py
+++ b/client/virt/kvm_vm.py
@@ -54,6 +54,8 @@ class VM(virt_vm.BaseVM):
         self.device_id = []
         self.tapfds = []
         self.uuid = None
+        self.vcpu_threads = []
+        self.vhost_threads = []
 
 
         self.spice_port = 8000
@@ -1008,6 +1010,12 @@ class VM(virt_vm.BaseVM):
 
         logging.debug("VM appears to be alive with PID %s", self.get_pid())
 
+        o = self.monitor.info("cpus")
+        self.vcpu_threads = re.findall("thread_id=(\d+)", o)
+        o = commands.getoutput("ps aux")
+        self.vhost_threads = re.findall("\w+\s+(\d+)\s.*\[vhost-%s\]" %
+                                        self.get_pid(), o)
+
 # Establish a session with the serial console -- requires a version
 # of netcat that supports -U
 self.serial_console = aexpect.ShellSession(

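The two regular expressions added above can be tried outside autotest. The following Python 3 sketch applies them to made-up sample text; the exact `info cpus` line format and the `ps aux` columns shown are assumptions for illustration, and the helper names are hypothetical.

```python
import re

# Made-up samples for illustration; real output varies by QEMU and procps version.
SAMPLE_INFO_CPUS = (
    "* CPU #0: pc=0xffffffff8103ac3b (halted) thread_id=2391\n"
    "  CPU #1: pc=0xffffffff8103ac3b (halted) thread_id=2392\n"
)
SAMPLE_PS_AUX = (
    "qemu      2390 12.0  3.1 1234567 65432 ?  Sl 10:00 1:23 /usr/bin/qemu-kvm -name vm1\n"
    "root      2395  0.5  0.0      0     0 ?  S  10:00 0:02 [vhost-2390]\n"
    "root      2401  0.0  0.0      0     0 ?  S  10:01 0:00 [vhost-2400]\n"
)


def vcpu_thread_ids(info_cpus_output):
    """Thread IDs reported by the monitor's 'info cpus' command."""
    return re.findall(r"thread_id=(\d+)", info_cpus_output)


def vhost_thread_ids(ps_aux_output, qemu_pid):
    """PIDs (second ps column) of kernel threads named [vhost-<qemu_pid>]."""
    pattern = r"\w+\s+(\d+)\s.*\[vhost-%s\]" % qemu_pid
    return re.findall(pattern, ps_aux_output)


if __name__ == "__main__":
    print(vcpu_thread_ids(SAMPLE_INFO_CPUS))
    print(vhost_thread_ids(SAMPLE_PS_AUX, 2390))
```

Because `.` does not match a newline, each vhost match is confined to a single `ps` line, so only threads belonging to the given QEMU PID are returned.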


[PATCH v2 2/6] virt_test_utils: Add pin_vm_threads

2012-02-13 Thread Amos Kong
This function pins the vhost and vCPU threads of a VM to
host CPUs (in the same NUMA node).

Signed-off-by: Amos Kong ak...@redhat.com
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/virt/virt_test_utils.py b/client/virt/virt_test_utils.py
index 6b0d7eb..7864d2a 100644
--- a/client/virt/virt_test_utils.py
+++ b/client/virt/virt_test_utils.py
@@ -811,3 +811,14 @@ def run_virt_sub_test(test, params, env, sub_type=None):
     # Run the test function
     run_func = getattr(test_module, "run_%s" % sub_type)
     run_func(test, params, env)
+
+def pin_vm_threads(vm, node):
+    """
+    Pin VM threads to single cpu of a numa node
+    @param vm: VM object
+    @param node: NumaNode object
+    """
+    for i in vm.vhost_threads:
+        logging.info("pin vhost thread(%s) to cpu(%s)" % (i, node.pin_cpu(i)))
+    for i in vm.vcpu_threads:
+        logging.info("pin vcpu thread(%s) to cpu(%s)" % (i, node.pin_cpu(i)))
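For readers without autotest's NumaNode class, here is a hedged Python 3 sketch of the same idea using only the standard library: it reads a node's CPU list from /sys/devices/system/node/nodeN/cpulist and pins each thread ID with os.sched_setaffinity(). Unlike NumaNode.pin_cpu(), whose CPU selection is not shown here, this sketch simply assigns CPUs round-robin; all function names are hypothetical. Pinning another process's threads normally requires appropriate privileges.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-3,8-11" into CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def node_cpus(node):
    """CPUs belonging to a NUMA node, as exported by sysfs on Linux."""
    with open("/sys/devices/system/node/node%d/cpulist" % node) as f:
        return parse_cpulist(f.read())


def pin_threads(tids, cpus, apply=True):
    """Pin each thread ID to a single CPU, assigning CPUs round-robin.

    Returns the {tid: cpu} plan; with apply=False nothing is changed,
    which is useful for a dry run.
    """
    plan = {}
    for index, tid in enumerate(tids):
        cpu = cpus[index % len(cpus)]
        plan[int(tid)] = cpu
        if apply:
            # On Linux a thread ID can be passed where a PID is expected.
            os.sched_setaffinity(int(tid), {cpu})
    return plan
```

A call such as `pin_threads(vm.vcpu_threads + vm.vhost_threads, node_cpus(1))` would then mirror what pin_vm_threads() does for node 1.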



[PATCH v2 3/6] virt-test: add NTttcp subtests

2012-02-13 Thread Amos Kong
This case tests TCP throughput between two Windows guests,
or between one guest and an external Windows host.

When testing between a guest and an external Windows host,
'receiver_address' should be set to the external Windows host's IP address.

NTttcp is not a freely redistributable binary, so you *must* download it
from Microsoft and agree to its EULA. See the @see tags for a
complete download link and for documentation on how to integrate it into
your autotest setup.

@see: http://msdn.microsoft.com/en-us/windows/hardware/gg463264
@see: http://download.microsoft.com/download/f/1/e/f1e1ac7f-e632-48ea-83ac-56b016318735/NT%20Testing%20TCP%20Tool.msi
@see: https://github.com/autotest/autotest/wiki/KVMAutotest-Networking

! ntttcp.au3: This script accepts the end-user license agreement
!   on your behalf; please don't use it if you don't agree to the EULA.

This test generates result files in a 'standard' format: items are
separated by '|' and the first line is the title, so the files
can be analyzed by a generic module.
raw_output_1.RHS:
  buf(k)| throughput(Mbit/s)
   ...
  64| 2407.548
 128| 2102.254
 256| 4930.362
 512| 4723.035
1024| 4725.334
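A minimal Python 3 sketch of reading this format, assuming only what is described above (items separated by '|', first line is the title) and skipping lines without a separator, such as the '...' placeholder; parse_standard_format() is a hypothetical helper, not part of the patch.

```python
SAMPLE = """buf(k)| throughput(Mbit/s)
   ...
  64| 2407.548
 128| 2102.254
 256| 4930.362
 512| 4723.035
1024| 4725.334
"""


def parse_standard_format(text):
    """Return (header, rows) for '|'-separated result text.

    The first line containing '|' is the title row; the remaining such
    lines are converted to floats. Lines without '|' are ignored.
    """
    lines = [line for line in text.splitlines() if "|" in line]
    header = [field.strip() for field in lines[0].split("|")]
    rows = [[float(field) for field in line.split("|")] for line in lines[1:]]
    return header, rows


if __name__ == "__main__":
    header, rows = parse_standard_format(SAMPLE)
    best = max(rows, key=lambda row: row[1])
    print("best %s: %g (%s %g)" % (header[0], best[0], header[1], best[1]))
```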

Changes from v1:
- pin vcpus/vhost_net threads to numa node
- add AutoIt script for ntttcp test
- users should put the MSI and the AutoIt script on the ISO
- fix threads sync issue
- set test time to 30 seconds
- support using a fixed receiver buffer or the same buffer as the sender
- 30 seconds is not enough; set the number of buffers to 200

Signed-off-by: Qingtang Zhou qz...@redhat.com
Signed-off-by: Amos Kong ak...@redhat.com
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/virt/scripts/ntttcp.au3 b/client/virt/scripts/ntttcp.au3
new file mode 100755
index 000..00489e8
--- /dev/null
+++ b/client/virt/scripts/ntttcp.au3
@@ -0,0 +1,41 @@
+#cs -
+AutoIt Version: 3.1.1.0
+Author: Qingtang Zhou qz...@redhat.com
+
+Script Function:
+Install NT Testing TCP tool
+
+Note: This script accepts the end-user license agreement on behalf of the user
+#ce -
+
+Func WaitWind($title)
+    WinWait($title, "")
+
+    If Not WinActive($title, "") Then
+        WinActivate($title, "")
+    EndIf
+EndFunc
+
+$FILE=msiexec /i D:\NTttcp\\NT Testing TCP Tool.msi
+Run($FILE)
+
+WaitWind("NT Testing TCP Tool")
+WinWaitActive("NT Testing TCP Tool", "Welcome to the NT Testing TCP Tool Setup Wizard")
+Send("!n")
+
+WaitWind("NT Testing TCP Tool")
+WinWaitActive("NT Testing TCP Tool", "License Agreement")
+send("!a")
+send("{ENTER}")
+
+WaitWind("NT Testing TCP Tool")
+WinWaitActive("NT Testing TCP Tool", "Select Installation Folder")
+Send("{ENTER}")
+
+WaitWind("NT Testing TCP Tool")
+WinWaitActive("NT Testing TCP Tool", "Confirm Installation")
+send("{ENTER}")
+
+WinWaitActive("NT Testing TCP Tool", "Installation Complete")
+send("!c")
+
diff --git a/client/virt/subtests.cfg.sample b/client/virt/subtests.cfg.sample
index 89dda8c..cc0986a 100644
--- a/client/virt/subtests.cfg.sample
+++ b/client/virt/subtests.cfg.sample
@@ -1007,6 +1007,28 @@ variants:
 netperf_cmd = %s/netperf-2.4.5/src/netperf -t %s -H %s -l 60 -- -r %s
 protocols = TCP_RR TCP_CRR UDP_RR
 
+- ntttcp:
+type = ntttcp
+image_snapshot = yes
+check_ntttcp_cmd = cmd /c dir C:\NTttcp
+ntttcp_sender_cmd = cmd /c C:\NTttcp\NTttcps.exe -m %s,0,%s -a 2 -l %s -n %s
+ntttcp_receiver_cmd = cmd /c C:\NTttcp\NTttcpr.exe -m %s,0,%s -a 6 -rb %s -n %s
+session_num = 1
+buffers = 2k 4k 8k 16k 32k 64k 128k 256k 512k 1024k 2048k
+timeout = 1200
+kill_vm = yes
+numa_node = -1
+variants:
+- guest_guest:
+vms += " vm2"
+- guest_host:
+# external Windows system IP; NTttcp needs to be installed there first.
+receiver_address = 192.168.1.1
+32:
+ntttcp_install_cmd = 'cmd /c D:\autoit3.exe D:\NTttcp\NTttcp.au3 && mkdir C:\NTttcp && copy "C:\Program Files\Microsoft Corporation\NT Testing TCP Tool\*" C:\NTttcp && cd C:\NTttcp\ && copy NTttcp_%s.exe NTttcps.exe && copy NTttcp_%s.exe NTttcpr.exe'
+64:
+ntttcp_install_cmd = 'cmd /c D:\autoit3.exe D:\NTttcp\NTttcp.au3 && mkdir C:\NTttcp && copy "C:\Program Files (x86)\Microsoft Corporation\NT Testing TCP Tool\*" C:\NTttcp && cd C:\NTttcp\ && copy NTttcp_%s.exe NTttcps.exe && copy NTttcp_%s.exe NTttcpr.exe'
+
 - ethtool: install setup image_copy unattended_install.cdrom
 only Linux
 type = ethtool
diff --git a/client/virt/tests/ntttcp.py b/client/virt/tests/ntttcp.py
new file mode 100644
index 000..66cdbfe
--- /dev/null
+++ b/client/virt/tests/ntttcp.py
@@ -0,0 +1,175 @@
+import logging, os, glob, re, commands
+from autotest_lib.client.common_lib import error
+from autotest_lib.client.common_lib import utils
+from autotest_lib.client.virt import virt_utils, aexpect, virt_test_utils
+

[PATCH v2 4/6] virt-test: Refactor netperf test and add analysis module

2012-02-13 Thread Amos Kong
A VM is always used as the netperf server; another VM, the local host
or an external host can be used as the netperf client.
The environment is set up and the test is launched by executing remote
ssh commands. You need to configure the IP address of the
local/external host in the configuration file; the VMs' IP addresses
are obtained automatically.
A file in the 'standard' format is generated at the end of the test,
so the results can be analyzed by a generic module.

Changes from v1:
- record packet bytes
- enable arp_ignore
- get packet info from ifconfig
- shape functions
- don't change ssh config
- use server.hosts.ssh_host.SSHHost to set up ssh

Signed-off-by: Amos Kong ak...@redhat.com
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/virt/subtests.cfg.sample b/client/virt/subtests.cfg.sample
index cc0986a..a2939f8 100644
--- a/client/virt/subtests.cfg.sample
+++ b/client/virt/subtests.cfg.sample
@@ -992,20 +992,36 @@ variants:
 
 - netperf: install setup image_copy unattended_install.cdrom
 only Linux
+only virtio_net
 type = netperf
-nics += ' nic2 nic3 nic4'
+kill_vm = yes
+image_snapshot = yes
+nics += ' nic2'
+# nic1 is for control, nic2 is for data connection
+# bridge_nic1 = virbr0
+pci_model_nic1 = virtio_net
+# bridge_nic2 = switch
+pci_model_nic2 = e1000
 nic_mode = tap
 netperf_files = netperf-2.4.5.tar.bz2 wait_before_data.patch
-packet_size = 1500
-setup_cmd = cd %s && tar xvfj netperf-2.4.5.tar.bz2 && cd netperf-2.4.5 && patch -p0 < ../wait_before_data.patch && ./configure && make
-netserver_cmd =  %s/netperf-2.4.5/src/netserver
+setup_cmd = cd /tmp && rm -rf netperf-2.4.5 && tar xvfj netperf-2.4.5.tar.bz2 && cd netperf-2.4.5 && patch -p0 < ../wait_before_data.patch && ./configure && make
+# configure netperf test parameters
+# l = 60
+# protocols = TCP_STREAM TCP_MAERTS TCP_RR
+# sessions = 1 2 4
+# sessions_rr = 50 100 250 500
+# sizes = 64 256 512 1024
+# sizes_rr = 64 256 512 1024
 variants:
-- stream:
-netperf_cmd = %s/netperf-2.4.5/src/netperf -t %s -H %s -l 60 -- -m %s
-protocols = TCP_STREAM TCP_MAERTS TCP_SENDFILE UDP_STREAM
-- rr:
-netperf_cmd = %s/netperf-2.4.5/src/netperf -t %s -H %s -l 60 -- -r %s
-protocols = TCP_RR TCP_CRR UDP_RR
+- guest_guest:
+vms += " vm2"
+nics = 'nic1'
+- host_guest:
+# local host ip address
+# client = localhost
+- exhost_guest:
+# external host ip address
+# client =
 
 - ntttcp:
 type = ntttcp
diff --git a/client/virt/tests/netperf.py b/client/virt/tests/netperf.py
index fea1e9e..214f351 100644
--- a/client/virt/tests/netperf.py
+++ b/client/virt/tests/netperf.py
@@ -1,17 +1,18 @@
-import logging, os, signal
+import logging, os, commands, sys, threading, re, glob
 from autotest_lib.client.common_lib import error
 from autotest_lib.client.bin import utils
 from autotest_lib.client.virt import aexpect, virt_utils
+from autotest_lib.client.virt import virt_test_utils
+from autotest_lib.server.hosts.ssh_host import SSHHost
 
 def run_netperf(test, params, env):
     """
     Network stress test with netperf.
 
-    1) Boot up a VM with multiple nics.
-    2) Launch netserver on guest.
-    3) Execute multiple netperf clients on host in parallel
-       with different protocols.
-    4) Output the test result.
+    1) Boot up VM(s), setup SSH authorization between host
+       and guest(s)/external host
+    2) Prepare the test environment in server/client/host
+    3) Execute netperf tests, collect and analyze the results
 
     @param test: KVM test object.
     @param params: Dictionary with the test parameters.
@@ -21,86 +22,202 @@ def run_netperf(test, params, env):
     vm.verify_alive()
     login_timeout = int(params.get("login_timeout", 360))
     session = vm.wait_for_login(timeout=login_timeout)
+    server = vm.get_address()
+    server_ctl = vm.get_address(1)
     session.close()
-    session_serial = vm.wait_for_serial_login(timeout=login_timeout)
-
-    netperf_dir = os.path.join(os.environ['AUTODIR'], "tests/netperf2")
-    setup_cmd = params.get("setup_cmd")
-
-    firewall_flush = "iptables -F"
-    session_serial.cmd_output(firewall_flush)
-    try:
-        utils.run("iptables -F")
-    except Exception:
-        pass
-
-    for i in params.get("netperf_files").split():
-        vm.copy_files_to(os.path.join(netperf_dir, i), "/tmp")
-
-    try:
-        session_serial.cmd(firewall_flush)
-    except aexpect.ShellError:
-        logging.warning("Could not flush firewall rules on guest")
-
-    session_serial.cmd(setup_cmd % "/tmp", timeout=200)
-    session_serial.cmd(params.get("netserver_cmd") % "/tmp")
-
-    if "tcpdump" in env and env["tcpdump"].is_alive():
-        # Stop the background tcpdump process
-   

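The commented-out parameters in the netperf section of subtests.cfg.sample (l, protocols, sessions, sessions_rr, sizes, sizes_rr) describe a test matrix. One plausible way to expand it into netperf command lines is sketched below in Python 3; it is not the refactored netperf.py. It assumes, as in the removed stream/rr variants, that stream tests take `-- -m size` and request/response tests take `-- -r size`, and that the *_rr lists apply to protocols whose names end in RR. The client path matches the /tmp build in setup_cmd; build_matrix() is a hypothetical name.

```python
import itertools

# Assumed client path: setup_cmd builds netperf under /tmp.
NETPERF = "/tmp/netperf-2.4.5/src/netperf"


def build_matrix(server, protocols, sessions, sizes,
                 sessions_rr, sizes_rr, length=60):
    """Expand cfg parameters into (protocol, sessions, size, command) tuples.

    Request/response tests (names ending in "RR", e.g. TCP_RR, TCP_CRR)
    use sessions_rr/sizes_rr and "-r"; stream tests use sessions/sizes and "-m".
    """
    jobs = []
    for proto in protocols:
        is_rr = proto.endswith("RR")
        sess_list = sessions_rr if is_rr else sessions
        size_list = sizes_rr if is_rr else sizes
        size_opt = "-r" if is_rr else "-m"
        for n, size in itertools.product(sess_list, size_list):
            cmd = "%s -t %s -H %s -l %d -- %s %d" % (
                NETPERF, proto, server, length, size_opt, size)
            jobs.append((proto, n, size, cmd))
    return jobs


if __name__ == "__main__":
    for job in build_matrix("192.168.122.10", ["TCP_STREAM", "TCP_RR"],
                            [1, 2, 4], [64, 256], [50, 100], [64]):
        print(job)
```

For a job with n sessions, the same command would be started n times in parallel on the client.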
[PATCH v2 5/6] netperf: pin guest vcpus/memory/vhost thread to numa node

2012-02-13 Thread Amos Kong
Dynamically check the hardware and pin guest CPU threads and
guest memory to the last NUMA node

Changes from v1:
- set numa_node to -1 for the netperf test

Signed-off-by: Amos Kong ak...@redhat.com
---
 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/client/virt/subtests.cfg.sample b/client/virt/subtests.cfg.sample
index a2939f8..c68a48c 100644
--- a/client/virt/subtests.cfg.sample
+++ b/client/virt/subtests.cfg.sample
@@ -1012,6 +1012,7 @@ variants:
 # sessions_rr = 50 100 250 500
 # sizes = 64 256 512 1024
 # sizes_rr = 64 256 512 1024
+numa_node = -1
 variants:
 - guest_guest:
 vms +=  vm2
diff --git a/client/virt/tests/netperf.py b/client/virt/tests/netperf.py
index 214f351..bc4e436 100644
--- a/client/virt/tests/netperf.py
+++ b/client/virt/tests/netperf.py
@@ -26,12 +26,22 @@ def run_netperf(test, params, env):
     server_ctl = vm.get_address(1)
     session.close()
 
+    logging.debug(commands.getoutput("numactl --hardware"))
+    logging.debug(commands.getoutput("numactl --show"))
+    # pin guest vcpus/memory/vhost threads to last numa node of host by default
+    if params.get('numa_node'):
+        numa_node = int(params.get('numa_node'))
+        node = virt_utils.NumaNode(numa_node)
+        virt_test_utils.pin_vm_threads(vm, node)
+
     if "vm2" in params["vms"]:
         vm2 = env.get_vm("vm2")
         vm2.verify_alive()
         session2 = vm2.wait_for_login(timeout=login_timeout)
         client = vm2.get_address()
         session2.close()
+        if params.get('numa_node'):
+            virt_test_utils.pin_vm_threads(vm2, node)
 
     if params.get("client"):
         client = params["client"]
@@ -196,7 +206,10 @@ def launch_client(sessions, server, server_ctl, host, client, l, nf_args):
         return [nrx, ntx, nrxb, ntxb, nre, nrx_intr, ntx_intr, io_exit, irq_inj]
 
     def netperf_thread(i):
-        cmd = "%s -H %s -l %s %s" % (client_path, server, l, nf_args)
+        output = ssh_cmd(client, "numactl --hardware")
+        n = int(re.findall("available: (\d+) nodes", output)[0]) - 1
+        cmd = "numactl --cpunodebind=%s --membind=%s %s -H %s -l %s %s" % \
+              (n, n, client_path, server, l, nf_args)
         output = ssh_cmd(client, cmd)
         f = file("/tmp/netperf.%s.%s.nf" % (pid, i), "w")
         f.write(output)
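The node selection in netperf_thread() can be checked in isolation. This Python 3 sketch uses the same 'available: N nodes' pattern on a made-up `numactl --hardware` excerpt and builds the numactl prefix; the helper names and the sample text are illustrative assumptions.

```python
import re

# Made-up excerpt of `numactl --hardware` output from a two-node host.
SAMPLE = """available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 8191 MB
node 1 cpus: 4 5 6 7
node 1 size: 8192 MB
"""


def last_numa_node(numactl_hardware_output):
    """Index of the highest NUMA node, computed as in netperf_thread()."""
    match = re.search(r"available: (\d+) nodes", numactl_hardware_output)
    if match is None:
        raise ValueError("unexpected numactl --hardware output")
    return int(match.group(1)) - 1


def numactl_prefix(cmd, node):
    """Bind both CPU scheduling and memory allocation of cmd to one node."""
    return "numactl --cpunodebind=%d --membind=%d %s" % (node, node, cmd)


if __name__ == "__main__":
    node = last_numa_node(SAMPLE)
    print(numactl_prefix("netperf -H 192.168.122.10 -l 60", node))
```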



[PATCH v2 6/6] virt: Introduce regression testing infrastructure

2012-02-13 Thread Amos Kong
 regression.py:
Usage: python regression.py $testname $dir1 $dir2 $configfile
 The 'regression' module is used to compare the test results
 of two jobs. It can be called (regression.compare()) at the
 end of a control file, and the script can also be run directly.

Example:
| # python regression.py netperf /result1-dir /result2-dir perf.conf

 analyzer.py:
Usage: python analyzer.py $results list1 $results list2 $log_file
 It computes the average, standard deviation, augment rate, etc.,
 and compares two sets of test results (in the standard format).

 It can be used directly, example:
| # python analyzer.py result-v1-1.RHS result-v1-2.RHS \
|   result-v2-1.RHS result-v2-2.RHS result-v2-3.RHS log.txt
| Thu Jan  5 10:17:24 2012
|
| == Avg1 SD Augment Rate ==
| TCP_STREAM
| size|sessions|throughput|   cpu|normalize| ...
| 2048|   2|  14699.17| 31.73|   463.19| ...
| %SD | 0.0|   0.6|   0.0|  0.8| ...
| 2048|   4|  15935.68| 34.30|   464.66| ...
| %SD | 0.0|   0.3|   1.7|  1.5| ...
| ...
|
| == AvgS Augment Rate =
| TCP_STREAM
| size|sessions|throughput|   cpu|normalize| ...
| 2048|   2|   7835.61| 31.66|   247.36| ...
| 2048|   2|   8757.03| 31.94|   274.14| ...
| %   |+0.0| +11.8|  +0.9|+10.8| ...
| 2048|   4|  12000.65| 32.38|   370.62| ...
| 2048|   4|  13641.20| 32.27|   423.29| ...
| %   |+0.0| +13.7|  -0.3|+14.2| ...
|
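The percentages in this sample output can be reproduced with a few lines of Python 3. The sketch computes the average, the sample standard deviation, the SD as a percentage of the average, and the augment rate (new - old) / old * 100, formatted with '%+.1f' as in analyzer.py. It uses a two-pass standard deviation instead of the sumSquareX formula in _get_list_sd(), because the single-pass form can come out slightly negative for nearly identical samples due to floating-point cancellation; that substitution belongs to this sketch, not to the patch. The function names are hypothetical.

```python
import math


def average(values):
    return sum(values) / float(len(values))


def sample_sd(values):
    """Sample standard deviation (n - 1 in the denominator), two-pass form."""
    n = len(values)
    if n < 2:
        return 0.0
    avg = average(values)
    return math.sqrt(sum((x - avg) ** 2 for x in values) / (n - 1))


def sd_percent(values):
    """Standard deviation as a percentage of the average."""
    avg = average(values)
    return 0.0 if avg == 0 else sample_sd(values) / avg * 100


def augment_rate(old, new):
    """(new - old) / old * 100, i.e. the relative change in percent."""
    return (new - old) / old * 100


if __name__ == "__main__":
    # Throughput pairs taken from the AvgS table above.
    print("%+.1f" % augment_rate(7835.61, 8757.03))
    print("%+.1f" % augment_rate(12000.65, 13641.20))
```

Fed the throughput pairs from the AvgS table, augment_rate() reproduces the +11.8 and +13.7 rows.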

 perf.conf:
 configures test-related parameters.

perf regression guide:
https://github.com/autotest/autotest/wiki/KVMAutotest-Networking

Changes from v1:
- refactor analysis code
- add standard deviation percent
- only provide the mechanism; users can use the tools directly or use the library in scripts

Signed-off-by: Amos Kong ak...@redhat.com
---
 client/tools/analyzer.py   |  166 
 client/tools/perf.conf |   14 
 client/tools/regression.py |   24 ++
 3 files changed, 204 insertions(+), 0 deletions(-)
 create mode 100644 client/tools/analyzer.py
 create mode 100644 client/tools/perf.conf
 create mode 100644 client/tools/regression.py

diff --git a/client/tools/analyzer.py b/client/tools/analyzer.py
new file mode 100644
index 000..28df97e
--- /dev/null
+++ b/client/tools/analyzer.py
@@ -0,0 +1,166 @@
+import sys, re, string, time, commands, os, random
+
+def tee(content, filename):
+    """ Write content to standard output and file """
+    fd = open(filename, "a")
+    fd.write(content + "\n")
+    fd.close()
+    print content
+
+class samples():
+    def __init__(self, files):
+        self.files_dict = []
+        for i in range(len(files)):
+            fd = open(files[i], "r")
+            self.files_dict.append(fd.readlines())
+            fd.close()
+
+    def getAvg(self):
+        return self._process(self.files_dict, self._get_list_avg)
+
+    def getAvgPercent(self, avgs_dict):
+        return self._process(avgs_dict, self._get_augment_rate)
+
+    def getSD(self):
+        return self._process(self.files_dict, self._get_list_sd)
+
+    def getSDPercent(self, sds_dict):
+        return self._process(sds_dict, self._get_percent)
+
+    def _get_percent(self, data):
+        """ num2 / num1 * 100 """
+        result = 0.0
+        if len(data) == 2 and float(data[0]) != 0:
+            result = "%.1f" % (float(data[1]) / float(data[0]) * 100)
+        return result
+
+    def _get_augment_rate(self, data):
+        """ (num2 - num1) / num1 * 100 """
+        result = +0.0
+        if len(data) == 2 and float(data[0]) != 0:
+            result = "%+.1f" % (((float(data[1]) - float(data[0]))
+                                 / float(data[0])) * 100)
+        return result
+
+    def _get_list_sd(self, data):
+        """
+        sumX = x1 + x2 + ... + xn
+        avgX = sumX / n
+        sumSquareX = x1^2 + ... + xn^2
+        SD = sqrt([sumSquareX - (n * (avgX ^ 2))] / (n - 1))
+        """
+        sum = sqsum = 0
+        n = len(data)
+        for i in data:
+            sum += float(i)
+            sqsum += float(i) ** 2
+        avg = sum / n
+        if avg == 0 or n == 1:
+            return 0.0
+        return "%.1f" % (((sqsum - (n * avg**2)) / (n - 1))**0.5)
+
+    def _get_list_avg(self, data):
+        """ Compute the average of list members """
+        sum = 0
+        for i in data:
+            sum += float(i)
+        if "." in data[0]:
+            return "%.2f" % (sum / len(data))
+        return "%d" % (sum / len(data))
+
+    def _process_lines(self, files_dict, row, func):
+        """ Process lines of different sample files with assigned method """
+        lines = []
+        ret_lines = []
+
+        for i in range(len(files_dict)):
+            lines.append(files_dict[i][row].split("|"))
+        for col in range(len(lines[0])):
+            data_list = []
+            for i in range(len(lines)):
+                data_list.append(lines[i][col].strip())
+

Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

2012-02-13 Thread Bart Van Assche
On Mon, Feb 13, 2012 at 8:05 AM, Christian Borntraeger
borntrae...@de.ibm.com wrote:
 On 12/02/12 21:16, James Bottomley wrote:
  Could someone please explain to me why you can't simply fix virtio-blk?

 I don't think that virtio-scsi will replace virtio-blk everywhere. For non-scsi
 block devices, image files or logical volumes virtio-blk seems to be the right
 approach, I think.

  Or would virtio-blk maintainers give a reason why they're unwilling to
  have it fixed?

 I don't consider virtio-blk broken. It just doesn't cover everything.

Although I'm not sure whether that helps here: for about a year there
has been software in the upstream kernel that allows using any block
device, or even a file, as a SCSI device.

Bart.


Re: [PATCHv2-RFC 1/2] shpc: standard hot plug controller

2012-02-13 Thread Michael S. Tsirkin
On Mon, Feb 13, 2012 at 07:03:52PM +0900, Isaku Yamahata wrote:
 Oh nice work.
 
 On Mon, Feb 13, 2012 at 11:15:55AM +0200, Michael S. Tsirkin wrote:
  This adds support for SHPC interface, as defined by PCI Standard
  Hot-Plug Controller and Subsystem Specification, Rev 1.0
  http://www.pcisig.com/specifications/conventional/pci_hot_plug/SHPC_10
  
  Only SHPC integrated with a PCI-to-PCI bridge is supported,
  SHPC integrated with a host bridge would need more work.
  
  All main SHPC features are supported:
  - MRL sensor
 
 Does this just report latch status? (It seems so.)

What happens is that adding a device closes the latch, and removing a device
opens the latch. This significantly reduces the number of supported
configurations.


 Do you plan to provide interfaces to manipulate the latch?

I didn't plan to do this, and this is non-trivial.
Do you just want this for empty slots?  And why?

 
  - Attention button
  - Attention indicator
  - Power indicator
 
  Wake on hotplug and serr generation are stubbed out but unused
  as we don't have interfaces to generate these events ATM.
  
  One issue that isn't completely resolved is that qemu currently
  expects an eject interface, which SHPC does not provide: it merely
  removes power to the device and it's up to the user to remove the device
  from the slot. This patch works around that by ejecting the device
  when power is removed and the power LED goes off.
  
  TODO:
  - migration support
  - fix dependency on pci_internals.h
 
 If I didn't miss the code,
 - QMP command for pushing attention button.
 - QMP command to get LED status

It's easy to add these, so I'd accept such a patch,
but I wonder why.

 - QMP events for LED on/off

There's also blink :)

 
 thanks,

I'm concerned that a guest can flood management with such events.
It's better to send a single LED change event; then we
can suppress further events until the next get LED status command.
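The suppression scheme suggested above could look roughly like this Python 3 sketch: the first LED change produces one event, later changes are only recorded, and a status query from management re-arms the notification. The class, method and event names are hypothetical; this patch defines no such QMP event.

```python
class LedEventThrottle(object):
    """Forward at most one LED-change notification per status query.

    emit is a callable that delivers an event to management (in QEMU this
    would be a QMP event; here it can be any function).
    """

    def __init__(self, emit):
        self.emit = emit
        self.armed = True
        self.state = {}

    def led_changed(self, slot, led, value):
        # Always record the latest value so that a later query sees it.
        self.state[(slot, led)] = value
        if self.armed:
            self.emit({"event": "LED_CHANGED", "slot": slot})
            self.armed = False

    def query_led_status(self):
        # Management has caught up: re-arm notifications.
        self.armed = True
        return dict(self.state)


if __name__ == "__main__":
    events = []
    throttle = LedEventThrottle(events.append)
    for i in range(1000):  # a guest toggling an LED in a tight loop
        throttle.led_changed(1, "power", "blink" if i % 2 else "on")
    print(len(events), throttle.query_led_status())
```

However fast the guest toggles the LEDs, management sees at most one event between two queries, and the query always returns the latest on/off/blink state.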


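The slot-configuration offsets defined above (SHPC_NSLOTS, SHPC_FIRST_DEV, SHPC_PHYS_SLOT) can be illustrated with a small decoder. This is a sketch, not code from the patch: it assumes the register file is available as a little-endian byte array (`regs`, a hypothetical name), takes the offsets and physical-slot masks from the patch, and assumes 5-bit masks for the slot count and first device number, following the SHPC slot configuration layout as we read it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offsets and masks as defined in the patch above. */
#define SHPC_NSLOTS       0x0C /* 1 byte */
#define SHPC_FIRST_DEV    0x0D /* 1 byte */
#define SHPC_PHYS_SLOT    0x0E /* 2 bytes */
#define SHPC_PHYS_NUM_MAX 0x7ff
#define SHPC_PHYS_MRL     0x4000
#define SHPC_PHYS_BUTTON  0x8000

/* Assumed 5-bit field widths (not defined in the patch). */
#define SHPC_NSLOTS_MASK    0x1f
#define SHPC_FIRST_DEV_MASK 0x1f

typedef struct {
    unsigned nslots;    /* number of slots behind the controller */
    unsigned first_dev; /* device number of the first slot */
    unsigned phys_num;  /* physical number of the first slot */
    bool has_mrl;       /* MRL sensor present */
    bool has_button;    /* attention button present */
} ShpcSlotConfig;

/* PCI configuration registers are little-endian. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode the slot configuration fields from a register image. */
static ShpcSlotConfig shpc_decode_slot_config(const uint8_t *regs)
{
    uint16_t phys = get_le16(regs + SHPC_PHYS_SLOT);
    ShpcSlotConfig c;

    c.nslots = regs[SHPC_NSLOTS] & SHPC_NSLOTS_MASK;
    c.first_dev = regs[SHPC_FIRST_DEV] & SHPC_FIRST_DEV_MASK;
    c.phys_num = phys & SHPC_PHYS_NUM_MAX;
    c.has_mrl = (phys & SHPC_PHYS_MRL) != 0;
    c.has_button = (phys & SHPC_PHYS_BUTTON) != 0;
    return c;
}
```

For example, the bytes 0x05 0xC0 at SHPC_PHYS_SLOT decode to physical slot 5 with both an MRL sensor and an attention button present.
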
Re: virtio-blk performance regression and qemu-kvm

2012-02-13 Thread Stefan Hajnoczi
On Fri, Feb 10, 2012 at 2:36 PM, Dongsu Park
dongsu.p...@profitbricks.com wrote:
  Now I'm running benchmarks with both qemu-kvm 0.14.1 and 1.0.

  - Sequential read (Running inside guest)
   # fio -name iops -rw=read -size=1G -iodepth 1 \
    -filename /dev/vdb -ioengine libaio -direct=1 -bs=4096

  - Sequential write (Running inside guest)
   # fio -name iops -rw=write -size=1G -iodepth 1 \
    -filename /dev/vdb -ioengine libaio -direct=1 -bs=4096

  For each one, I tested 3 times to get the average.

  Result:

  seqread with qemu-kvm 0.14.1   67,0 MByte/s
  seqread with qemu-kvm 1.0      30,9 MByte/s

  seqwrite with qemu-kvm 0.14.1  65,8 MByte/s
  seqwrite with qemu-kvm 1.0     30,5 MByte/s

Please retry with the following commit or simply qemu-kvm.git/master.
Avi discovered a performance regression which was introduced when the
block layer was converted to use coroutines:

$ git describe 39a7a362e16bb27e98738d63f24d1ab5811e26a8
v1.0-327-g39a7a36

(This commit is not in 1.0!)
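
`git describe` prints `<tag>-<count>-g<abbreviated hash>`. A non-zero count means the commit sits that many commits on top of the tag, which is why 39a7a36 is not part of v1.0. A minimal C sketch of splitting such a string; the helper `parse_describe` is hypothetical and not part of git:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split `git describe` output of the form <tag>-<count>-g<hash>.
 * The string is parsed from the right so that tags which themselves
 * contain '-' (e.g. "v1.0-rc1") stay intact. Returns false if the
 * input does not have that shape (e.g. a bare tag such as "v1.0").
 */
static bool parse_describe(const char *desc, char *tag, size_t tag_size,
                           unsigned long *count, const char **hash)
{
    const char *last = strrchr(desc, '-');  /* dash before "g<hash>" */
    const char *prev;
    char *end;
    size_t len;

    if (!last || last == desc || last[1] != 'g' || last[2] == '\0')
        return false;

    /* Walk back to the dash that precedes the commit count. */
    for (prev = last - 1; prev > desc && *prev != '-'; prev--)
        ;
    if (prev == desc)
        return false;

    /* The count must parse as a number ending exactly at `last`. */
    *count = strtoul(prev + 1, &end, 10);
    if (end == prev + 1 || end != last)
        return false;

    len = (size_t)(prev - desc);
    if (len >= tag_size)
        return false;
    memcpy(tag, desc, len);
    tag[len] = '\0';
    *hash = last + 2;  /* skip the "-g" prefix */
    return true;
}
```

Applied to "v1.0-327-g39a7a36", this yields the tag "v1.0", a count of 327 and the hash "39a7a36".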

Please post your qemu-kvm command-line.

67 MB/s sequential 4 KB read means 67 * 1024 / 4 = 17152 requests per
second, so 58 microseconds per request.
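
Stefan's conversion can be written out as a small C sketch. It assumes, as he does, 1 MB = 1024 KB, and that with `-iodepth 1` only one request is in flight, so the time per request approximates per-request latency. Applying the same arithmetic to the 1.0 result (30.9 MB/s) gives roughly 7,910 requests per second, or about 126 microseconds per request.

```c
/*
 * Convert sequential throughput in MB/s (1 MB = 1024 KB) at a fixed
 * request size into requests per second and microseconds per request.
 * With iodepth 1 only one request is in flight, so the time per request
 * approximates per-request latency.
 */
static double requests_per_sec(double mb_per_sec, double req_kb)
{
    return mb_per_sec * 1024.0 / req_kb;
}

static double usec_per_request(double mb_per_sec, double req_kb)
{
    return 1e6 / requests_per_sec(mb_per_sec, req_kb);
}
```

For example, `usec_per_request(67.0, 4.0)` is about 58.3 and `usec_per_request(30.9, 4.0)` about 126.4.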

Please post the fio output so we can double-check what is reported.

Stefan


Q: Does linux kvm native tool support loading BIOS as the default loader now?

2012-02-13 Thread Yang Bai
Hi all,

As far as I know, the native tool does not support loading a BIOS, so it
does not support Windows. Is this supported now?
If not, I may try to implement it.

Thanks,
Yang


Re: Q: Does linux kvm native tool support loading BIOS as the default loader now?

2012-02-13 Thread Cyrill Gorcunov
On Mon, Feb 13, 2012 at 08:14:22PM +0800, Yang Bai wrote:
 Hi all,
 
 As far as I know, the native tool does not support loading a BIOS, so it
 does not support Windows. Is this supported now?
 If not, I may try to implement it.
 

Not yet. There was a plan to implement SeaBIOS support,
but nothing has been done so far. Feel free to implement such
support.

Cyrill


Re: Q: Does linux kvm native tool support loading BIOS as the default loader now?

2012-02-13 Thread Pekka Enberg
On Mon, Feb 13, 2012 at 08:14:22PM +0800, Yang Bai wrote:
 As far as I know, the native tool does not support loading a BIOS, so it
 does not support Windows. Is this supported now?
 If not, I may try to implement it.

On Mon, Feb 13, 2012 at 2:19 PM, Cyrill Gorcunov gorcu...@openvz.org wrote:
 Not yet. There was a plan to implement SeaBIOS support,
 but nothing has been done so far. Feel free to implement such
 support.

Yup, optional SeaBIOS support would be awesome!


Re: x86: kvmclock: abstract save/restore sched_clock_state

2012-02-13 Thread Amit Shah
On (Fri) 10 Feb 2012 [21:58:47], Igor Mammedov wrote:
 BTW Amit,
 your config doesn't have CONFIG_KVM_GUEST set, which causes the primary
 CPU clock to be uninitialized too in the case of an SMP kernel.

Interesting.  I didn't notice that.  However, if I enable that option,
resume fails for me even the first time.

Amit


Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

2012-02-13 Thread Nicholas A. Bellinger
Hi Dor, James & Co,

On Mon, 2012-02-13 at 09:57 +0200, Dor Laor wrote:
 On 02/13/2012 09:05 AM, Christian Borntraeger wrote:
  On 12/02/12 21:16, James Bottomley wrote:
  Well, no-one's yet answered the question I had about why.
 
  Just to give one example from a different angle:
  In the big datacenters tape libraries are still very important, and lots
  of them have a scsi attachment. virtio-blk certainly is not the right
  way to handle those. Furthermore it seems even pretty hard to craft
  a virtio-tape since most of those libraries have vendor specific library
  controls (via sg). We would need to duplicate scsi generic (hint, hint :-)
 
  virtio-scsi seems to be a basic duplication of virtio-blk except that it
  seems to fix some problems virtio-blk has.  Namely queue parameter
  discovery, which virtio-blk doesn't seem to do.  There may also be a
  reason to cut the stack lower down.  Error handling is most often cited
  for this, but no-one's satisfactorily explained why it's better to do
  error handling in the guest instead of the host.
 
  Could someone please explain to me why you can't simply fix virtio-blk?
 
  I don't think that virtio-scsi will replace virtio-blk everywhere. For
  non-scsi block devices, image files or logical volumes virtio-blk seems
  to be the right approach, I think.
 
 +1
 
 virtio-scsi is superior w.r.t:
- Device support: tapes, cdroms, other

AFAICT any type of non TYPE_DISK struct scsi_device passthrough is going
to currently require virtio-scsi in order to work.

- Does guest-host mapped multipath

The logic that comes with target_core_fabric_configfs.c and the native
target control plane gives a host-side (tcm_vhost) fabric driver generic
explicit/implicit ALUA multipath support by default.

I think there are some interesting possibilities for paravirtualized
ALUA multipath..  8-)

- Supports plenty of virtual disks mapped to the guest w/o need for a
  pci slot per each virtio-blk

Ouch, virtio-blk lacks multi-lun per pci slot support..?

- offload fancy/new/sophisticated scsi commands from the guest to the
  storage array w/o need for qemu implementation. Example XCOPY.
 

...

 There are some more goodies like ability to support windows guest 
 clustering w/o hacky versions of scsi pass through over virtio-blk.
 virtio-blk is also a candidate to change the request based towards bio 
 based implementation, so sticking to it does not buy us too much.
 

MSFT cluster guests that require SPC-3 PR support can run today with
tcm_loop LLD SCSI LUNs + SG_IO/BSG + right megasas QEMU HBA emulation,
but I do agree this would be better served by virtio-scsi for guests
that require SPC-3 PR support or passthrough.

--nab


Re: x86: kvmclock: abstract save/restore sched_clock_state

2012-02-13 Thread Amit Shah
On (Fri) 10 Feb 2012 [13:43:05], Igor Mammedov wrote:
 Another thing is to try smp guest without kvmclock and see if it helps.
 It might be just something else.

Nope, it's related to kvmclock.

Amit


Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

2012-02-13 Thread Dor Laor

On 02/13/2012 02:40 PM, Nicholas A. Bellinger wrote:

 Ouch, virtio-blk lacks multi-lun per pci slot support..?


Only if you use the pci multi-function option, but that kills standard
hot unplug.


Re: x86: kvmclock: abstract save/restore sched_clock_state

2012-02-13 Thread Amit Shah
On (Fri) 10 Feb 2012 [10:33:37], Marcelo Tosatti wrote:
 On Fri, Feb 10, 2012 at 10:32:16AM -0200, Marcelo Tosatti wrote:
  On Fri, Feb 10, 2012 at 03:32:11PM +0530, Amit Shah wrote:
   On (Thu) 09 Feb 2012 [16:13:29], Igor Mammedov wrote:
   
    Stalls are probably caused by uninitialized percpu hv_clock; with the
    following patch I don't see stalls, although I might just be lucky.
    http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=e2971ac7e1d186af059e088d305496c5cb47d487
   
   Your commit does make things better, I don't see any stalls on the
   first resume.
   
   However, a subsequent s4 causes the stall to re-appear on resume, and
   this time there are no stall messages; the kernel just sits there
   spinning on something.  I've not found the solution to this one yet (I
   had a commit similar to Marcelo's in the works, which got me to the
   previous works-but-stalls behaviour).
  
  I cannot reproduce it here. Suspend/resume are operating normally after
  several iterations. Igor, do you see anything similar?
  
  Amit, can you please enable CONFIG_PRINTK_TIME=y and post a full dmesg 
  (both during suspend and also the new kernel during resume).
 
 Also is it reproducible with UP guest?

Yes, it is.

Amit


Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

2012-02-13 Thread Michael S. Tsirkin
On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
 Only if you use the pci multi-function option but that kills
 standard hot unplug

It doesn't kill it as such, rather you can't unplug luns individually.
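
Michael's distinction follows from PCI addressing: a device/function number packs a 5-bit slot and a 3-bit function, and the standard hotplug controllers act on a whole slot, so LUNs exposed as functions of one multi-function slot can only be removed together. A small sketch; the macros mirror the layout of the Linux `PCI_DEVFN`/`PCI_SLOT`/`PCI_FUNC` definitions, while `removed_by_slot_unplug` is a hypothetical helper for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit layout as the Linux PCI_DEVFN/PCI_SLOT/PCI_FUNC macros. */
#define PCI_DEVFN(slot, func) ((uint8_t)((((slot) & 0x1f) << 3) | ((func) & 0x07)))
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)

/*
 * Hypothetical helper: hot unplug is requested for a slot, so every
 * function in that slot (up to 8) is removed together.
 */
static bool removed_by_slot_unplug(uint8_t devfn, unsigned slot)
{
    return (unsigned)PCI_SLOT(devfn) == slot;
}
```

Unplugging slot 4 therefore removes both `PCI_DEVFN(4, 0)` and `PCI_DEVFN(4, 7)`, while a disk in its own slot can be removed on its own.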


[PATCH v2 8/9] qemu-kvm: Use upstream kvm_irqchip_set_irq instead of kvm_set_irq

2012-02-13 Thread Jan Kiszka
The functions are equivalent, so let's switch.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/msi.c   |    2 +-
 hw/msix.c  |    2 +-
 kvm-stub.c |    2 +-
 kvm.h      |    3 +--
 qemu-kvm.c |   32 
 5 files changed, 4 insertions(+), 37 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index 3e623c2..7bb3e2f 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -350,7 +350,7 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
 }
 
 if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-kvm_set_irq(dev->msi_irq_entries[vector].gsi, 1, NULL);
+kvm_irqchip_set_irq(kvm_state, dev->msi_irq_entries[vector].gsi, 1);
 return;
 }
 
diff --git a/hw/msix.c b/hw/msix.c
index 55ddbf4..7955221 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -494,7 +494,7 @@ void msix_notify(PCIDevice *dev, unsigned vector)
 }
 
 if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-kvm_set_irq(dev->msix_irq_entries[vector].gsi, 1, NULL);
+kvm_irqchip_set_irq(kvm_state, dev->msix_irq_entries[vector].gsi, 1);
 return;
 }
 
diff --git a/kvm-stub.c b/kvm-stub.c
index 266dc4a..d22fcad 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -162,7 +162,7 @@ int kvm_irqchip_commit_routes(KVMState *s)
 return -ENOSYS;
 }
 
-int kvm_set_irq(int irq, int level, int *status)
+int kvm_irqchip_set_irq(KVMState *s, int irq, int level)
 {
 assert(0);
 return -ENOSYS;
diff --git a/kvm.h b/kvm.h
index b84aa40..3c3a510 100644
--- a/kvm.h
+++ b/kvm.h
@@ -228,13 +228,12 @@ int kvm_msi_message_del(KVMMsiMessage *msg);
 int kvm_msi_message_update(KVMMsiMessage *old, KVMMsiMessage *new);
 
 #ifndef NEED_CPU_H
+int kvm_irqchip_set_irq(KVMState *s, int irq, int level);
 int kvm_irqchip_commit_routes(KVMState *s);
 #endif
 
 int kvm_irqchip_in_kernel(void);
 
-int kvm_set_irq(int irq, int level, int *status);
-
 #ifdef NEED_CPU_H
 #include "qemu-kvm.h"
 #endif
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 10a313d..09a35f0 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -41,38 +41,6 @@ static inline void clear_gsi(KVMState *s, unsigned int gsi)
 }
 }
 
-#ifdef KVM_CAP_IRQCHIP
-
-int kvm_set_irq(int irq, int level, int *status)
-{
-struct kvm_irq_level event;
-int r;
-
-if (!kvm_state->irqchip_in_kernel) {
-return 0;
-}
-event.level = level;
-event.irq = irq;
-r = kvm_vm_ioctl(kvm_state, kvm_state->irqchip_inject_ioctl,
- &event);
-if (r < 0) {
-perror("kvm_set_irq");
-}
-
-if (status) {
-#ifdef KVM_CAP_IRQ_INJECT_STATUS
-*status = (kvm_state->irqchip_inject_ioctl == KVM_IRQ_LINE) ?
-1 : event.status;
-#else
-*status = 1;
-#endif
-}
-
-return 1;
-}
-
-#endif
-
 #ifdef KVM_CAP_DEVICE_ASSIGNMENT
 int kvm_assign_pci_device(KVMState *s,
   struct kvm_assigned_pci_dev *assigned_dev)
-- 
1.7.3.4
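
The `status` handling dropped along with `kvm_set_irq` followed one rule: with plain `KVM_IRQ_LINE` the kernel reports nothing, so the caller was told 1; with `KVM_IRQ_LINE_STATUS` the kernel's `event.status` was passed through. A self-contained sketch of that rule, using hypothetical stand-in names rather than the real `<linux/kvm.h>` definitions. The classification of status values (negative: ignored, zero: coalesced, positive: delivered) is our reading of the in-kernel convention and should be treated as an assumption.

```c
/* Hypothetical stand-ins for the two injection ioctls. */
typedef enum { INJECT_IRQ_LINE, INJECT_IRQ_LINE_STATUS } InjectIoctl;

/* Assumed meaning of the kernel-reported status. */
typedef enum { IRQ_IGNORED, IRQ_COALESCED, IRQ_DELIVERED } IrqOutcome;

/*
 * Mirror of the removed logic: KVM_IRQ_LINE gives no feedback, so report
 * 1; KVM_IRQ_LINE_STATUS fills in event.status, which is passed through.
 */
static int injected_irq_status(InjectIoctl kind, int kernel_status)
{
    return kind == INJECT_IRQ_LINE ? 1 : kernel_status;
}

/* Interpret a status value under the assumed convention. */
static IrqOutcome classify_irq_status(int status)
{
    if (status < 0)
        return IRQ_IGNORED;
    if (status == 0)
        return IRQ_COALESCED;
    return IRQ_DELIVERED;
}
```

The consequence is that, without the status variant, a coalesced interrupt cannot be distinguished from a delivered one.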



[PATCH v2 2/9] qemu-kvm: Use machine options to configure qemu-kvm defaults

2012-02-13 Thread Jan Kiszka
Upstream is moving towards this mechanism, so start using it in qemu-kvm
already to configure qemu-kvm's specific defaults: KVM enabled, and
in-kernel irqchips enabled as well.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/pc_piix.c |    7 +++
 kvm-all.c    |    8
 vl.c         |    9 +++--
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index c9c580c..156fcc8 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -394,6 +394,7 @@ static QEMUMachine pc_machine_v1_0 = {
 .init = pc_init_pci,
 .max_cpus = 255,
 .is_default = 1,
+.default_machine_opts = "accel=kvm,kernel_irqchip=on",
 };
 
 static QEMUMachine pc_machine_v0_15 = {
@@ -409,6 +410,7 @@ static QEMUMachine pc_machine_v0_14 = {
 .desc = "Standard PC",
 .init = pc_init_pci,
 .max_cpus = 255,
+.default_machine_opts = "accel=kvm,kernel_irqchip=on",
 .compat_props = (GlobalProperty[]) {
 {
 .driver   = "qxl",
@@ -444,6 +446,7 @@ static QEMUMachine pc_machine_v0_13 = {
 .desc = "Standard PC",
 .init = pc_init_pci_no_kvmclock,
 .max_cpus = 255,
+.default_machine_opts = "accel=kvm,kernel_irqchip=on",
 .compat_props = (GlobalProperty[]) {
 {
 .driver   = "virtio-9p-pci",
@@ -491,6 +494,7 @@ static QEMUMachine pc_machine_v0_12 = {
 .desc = "Standard PC",
 .init = pc_init_pci_no_kvmclock,
 .max_cpus = 255,
+.default_machine_opts = "accel=kvm,kernel_irqchip=on",
 .compat_props = (GlobalProperty[]) {
 {
 .driver   = "virtio-serial-pci",
@@ -542,6 +546,7 @@ static QEMUMachine pc_machine_v0_11 = {
 .desc = "Standard PC, qemu 0.11",
 .init = pc_init_pci_no_kvmclock,
 .max_cpus = 255,
+.default_machine_opts = "accel=kvm,kernel_irqchip=on",
 .compat_props = (GlobalProperty[]) {
 {
 .driver   = "virtio-blk-pci",
@@ -601,6 +606,7 @@ static QEMUMachine pc_machine_v0_10 = {
 .desc = "Standard PC, qemu 0.10",
 .init = pc_init_pci_no_kvmclock,
 .max_cpus = 255,
+.default_machine_opts = "accel=kvm,kernel_irqchip=on",
 .compat_props = (GlobalProperty[]) {
 {
 .driver   = "virtio-blk-pci",
@@ -672,6 +678,7 @@ static QEMUMachine isapc_machine = {
 .desc = "ISA-only PC",
 .init = pc_init_isa,
 .max_cpus = 1,
+.default_machine_opts = "accel=kvm,kernel_irqchip=on",
 };
 
 #ifdef CONFIG_XEN
diff --git a/kvm-all.c b/kvm-all.c
index ae89389..515ba6e 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -887,6 +887,7 @@ int kvm_init(void)
 const KVMCapabilityInfo *missing_cap;
 int ret;
 int i;
+QemuOptsList *list;
 
 s = g_malloc0(sizeof(KVMState));
 
@@ -973,6 +974,13 @@ int kvm_init(void)
 s->pit_state2 = kvm_check_extension(s, KVM_CAP_PIT_STATE2);
 #endif
 
+list = qemu_find_opts("machine");
+if (!QTAILQ_EMPTY(&list->head) &&
+!qemu_opt_get_bool(QTAILQ_FIRST(&list->head),
+   "kernel_irqchip", false)) {
+kvm_irqchip = 0;
+}
+
 ret = kvm_arch_init(s);
 if (ret < 0) {
 goto err;
diff --git a/vl.c b/vl.c
index c5994ee..c3b4037 100644
--- a/vl.c
+++ b/vl.c
@@ -2040,13 +2040,8 @@ static int configure_accelerator(void)
 }
 
 if (p == NULL) {
-#ifdef CONFIG_KVM_OPTIONS
-/* Use the default accelerator, kvm */
-p = "kvm";
-#else
 /* Use the default accelerator, tcg */
 p = "tcg";
-#endif
 }
 
 while (!accel_initalised && *p != '\0') {
@@ -2908,7 +2903,9 @@ int main(int argc, char **argv, char **envp)
 break;
 #ifdef CONFIG_KVM_OPTIONS
case QEMU_OPTION_no_kvm_irqchip: {
-   kvm_irqchip = 0;
+olist = qemu_find_opts("machine");
+qemu_opts_reset(olist);
+qemu_opts_parse(olist, "accel=kvm,kernel_irqchip=off", 0);
break;
}
case QEMU_OPTION_no_kvm_pit: {
-- 
1.7.3.4
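
With this patch each PC machine type carries `default_machine_opts = "accel=kvm,kernel_irqchip=on"`, `-no-kvm-irqchip` swaps in `"accel=kvm,kernel_irqchip=off"`, and `kvm_init()` reads `kernel_irqchip` with a default of false, so an options list that is present but omits the key disables the in-kernel irqchip. The lookup semantics can be sketched with a simplified parser; `opts_get_bool` is a hypothetical helper, not QEMU's `qemu_opt_get_bool`, and it only understands `on`/`off` and does no escaping:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Look up a boolean key in a comma-separated "key=value" list such as
 * "accel=kvm,kernel_irqchip=on". Returns `def` when the key is absent
 * or its value is neither "on" nor "off".
 */
static bool opts_get_bool(const char *opts, const char *key, bool def)
{
    size_t klen = strlen(key);
    const char *p = opts;

    while (p && *p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        /* Match "key=" exactly, so "key_foo=on" is not mistaken for it. */
        if (len > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            const char *val = p + klen + 1;
            size_t vlen = len - klen - 1;

            if (vlen == 2 && strncmp(val, "on", 2) == 0)
                return true;
            if (vlen == 3 && strncmp(val, "off", 3) == 0)
                return false;
            return def;
        }
        p = end ? end + 1 : NULL;  /* advance to the next key=value pair */
    }
    return def;
}
```

Calling it with a default of false, as `kvm_init()` does, turns the in-kernel irqchip off for a string such as `"accel=kvm"` that does not mention `kernel_irqchip`.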



[PATCH v2 0/9] qemu-kvm: Switch to upstream irqchip services

2012-02-13 Thread Jan Kiszka
Now that upstream has basic irqchip support, we can make use of it for
qemu-kvm as well, removing another 700 lines of code here.

This series depends on

apic: Fix legacy vmstate loading for KVM

which is currently awaiting upstream merge via uq/master.

Jan Kiszka (9):
  qemu-kvm: Move kvm_create_pit out of arch init code
  qemu-kvm: Use machine options to configure qemu-kvm defaults
  qemu-kvm: Use upstream irq routing services
  qemu-kvm: Use upstream kvm_irqchip_create
  qemu-kvm: Use upstream kvm-ioapic
  qemu-kvm: Use upstream kvm-i8259
  qemu-kvm: Drop unused kvm_get/set_irqchip
  qemu-kvm: Use upstream kvm_irqchip_set_irq instead of kvm_set_irq
  qemu-kvm: Use upstream kvm-apic

 Makefile.objs          |    2 +-
 Makefile.target        |    8 +-
 hw/apic.c              |  151 +---
 hw/device-assignment.c |   10 +-
 hw/i8254-kvm.c         |    3 +
 hw/i8259.c             |  108 -
 hw/ioapic.c            |   75 +-
 hw/isa-bus.c           |    2 +-
 hw/msi.c               |    6 +-
 hw/msix.c              |   10 +-
 hw/pc.c                |    7 +-
 hw/pc_piix.c           |   23 ++
 kvm-all.c              |   17 +
 kvm-stub.c             |    6 +-
 kvm.h                  |    9 +-
 qemu-kvm-x86.c         |   85 +---
 qemu-kvm.c             |  204 +---
 qemu-kvm.h             |   72 +
 target-i386/kvm.c      |   17
 vl.c                   |   10 +--
 20 files changed, 58 insertions(+), 767 deletions(-)

-- 
1.7.3.4



[PATCH v2 5/9] qemu-kvm: Use upstream kvm-ioapic

2012-02-13 Thread Jan Kiszka
Drop the qemu-kvm version in favor of the equivalent upstream
implementation.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/ioapic.c  |   75 +-
 hw/pc_piix.c |    7 +
 2 files changed, 2 insertions(+), 80 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index 3f86eff..79549f8 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -26,8 +26,6 @@
 #include "ioapic.h"
 #include "ioapic_internal.h"
 
-#include "kvm.h"
-
 //#define DEBUG_IOAPIC
 
 #ifdef DEBUG_IOAPIC
@@ -222,75 +220,6 @@ ioapic_mem_write(void *opaque, target_phys_addr_t addr, uint64_t val,
 }
 }
 
-static void kvm_kernel_ioapic_save_to_user(IOAPICCommonState *s)
-{
-#if defined(KVM_CAP_IRQCHIP) && defined(TARGET_I386)
-struct kvm_irqchip chip;
-struct kvm_ioapic_state *kioapic;
-int i;
-
-chip.chip_id = KVM_IRQCHIP_IOAPIC;
-kvm_get_irqchip(kvm_state, &chip);
-kioapic = &chip.chip.ioapic;
-
-s->id = kioapic->id;
-s->ioregsel = kioapic->ioregsel;
-s->irr = kioapic->irr;
-for (i = 0; i < IOAPIC_NUM_PINS; i++) {
-s->ioredtbl[i] = kioapic->redirtbl[i].bits;
-}
-#endif
-}
-
-static void kvm_kernel_ioapic_load_from_user(IOAPICCommonState *s)
-{
-#if defined(KVM_CAP_IRQCHIP) && defined(TARGET_I386)
-struct kvm_irqchip chip;
-struct kvm_ioapic_state *kioapic;
-int i;
-
-chip.chip_id = KVM_IRQCHIP_IOAPIC;
-kioapic = &chip.chip.ioapic;
-
-kioapic->id = s->id;
-kioapic->ioregsel = s->ioregsel;
-kioapic->base_address = s->busdev.mmio[0].addr;
-kioapic->irr = s->irr;
-for (i = 0; i < IOAPIC_NUM_PINS; i++) {
-kioapic->redirtbl[i].bits = s->ioredtbl[i];
-}
-
-kvm_set_irqchip(kvm_state, &chip);
-#endif
-}
-
-static void kvm_ioapic_pre_save(IOAPICCommonState *s)
-{
-
-if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-kvm_kernel_ioapic_save_to_user(s);
-}
-}
-
-static void kvm_ioapic_post_load(IOAPICCommonState *s)
-{
-if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-kvm_kernel_ioapic_load_from_user(s);
-}
-}
-
-static void ioapic_reset(DeviceState *d)
-{
-IOAPICCommonState *s = DO_UPCAST(IOAPICCommonState, busdev.qdev, d);
-
-ioapic_reset_common(d);
-#ifdef KVM_CAP_IRQCHIP
-if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-kvm_kernel_ioapic_load_from_user(s);
-}
-#endif
-}
-
 static const MemoryRegionOps ioapic_io_ops = {
 .read = ioapic_mem_read,
 .write = ioapic_mem_write,
@@ -312,9 +241,7 @@ static void ioapic_class_init(ObjectClass *klass, void *data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 
 k->init = ioapic_init;
-k->pre_save = kvm_ioapic_pre_save;
-k->post_load = kvm_ioapic_post_load;
-dc->reset = ioapic_reset;
+dc->reset = ioapic_reset_common;
 }
 
 static TypeInfo ioapic_info = {
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index ef0202a..58bec18 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -47,8 +47,6 @@
 #  include "xen/hvm/hvm_info_table.h"
 #endif
 
-qemu_irq *ioapic_irq_hack;
-
 #define MAX_IDE_BUS 2
 
 static const int ide_iobase[MAX_IDE_BUS] = { 0x1f0, 0x170 };
@@ -108,12 +106,9 @@ static void ioapic_init(GSIState *gsi_state)
 SysBusDevice *d;
 unsigned int i;
 
-#ifdef UNUSED_UPSTREAM_KVM
 if (kvm_enabled() && kvm_irqchip_in_kernel()) {
 dev = qdev_create(NULL, "kvm-ioapic");
-} else
-#endif
-{
+} else {
 dev = qdev_create(NULL, "ioapic");
 }
 qdev_init_nofail(dev);
-- 
1.7.3.4



[PATCH v2 3/9] qemu-kvm: Use upstream irq routing services

2012-02-13 Thread Jan Kiszka
Replace qemu-kvm's versions of kvm_add_irq_route, kvm_add_routing_entry,
kvm_init_irq_routing, kvm_arch_init_irq_routing, and
kvm_commit_irq_routes with the corresponding upstream services. Until
the MSI API is refactored, we only need to export kvm_add_routing_entry
for this.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/device-assignment.c |   10 ++--
 hw/msi.c               |    4 +-
 hw/msix.c              |    8 ++--
 hw/pc.c                |    2 +-
 hw/pc_piix.c           |    4 --
 kvm-all.c              |   10 +---
 kvm-stub.c             |    4 +-
 kvm.h                  |    6 +-
 qemu-kvm-x86.c         |   50
 qemu-kvm.c             |  117 +--
 qemu-kvm.h             |   19 +---
 target-i386/kvm.c      |    2 -
 12 files changed, 25 insertions(+), 211 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 584cbb9..d8019fe 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -943,8 +943,8 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
 }
 assigned_dev->entry->gsi = r;
 
-kvm_add_routing_entry(assigned_dev->entry);
-if (kvm_commit_irq_routes() < 0) {
+kvm_add_routing_entry(kvm_state, assigned_dev->entry);
+if (kvm_irqchip_commit_routes(kvm_state) < 0) {
 perror("assigned_dev_update_msi: kvm_commit_irq_routes");
 assigned_dev->cap.state &= ~ASSIGNED_DEVICE_MSI_ENABLED;
 return;
@@ -1028,7 +1028,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
 DEBUG("MSI-X vector %d, gsi %d, addr %08x_%08x, data %08x\n", i,
   r, entry->addr_hi, entry->addr_lo, entry->data);
 
-kvm_add_routing_entry(&adev->entry[i]);
+kvm_add_routing_entry(kvm_state, &adev->entry[i]);
 
 msix_entry.gsi = adev->entry[i].gsi;
 msix_entry.entry = i;
@@ -1039,7 +1039,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
 }
 }
 
-if (r == 0 && kvm_commit_irq_routes() < 0) {
+if (r == 0 && kvm_irqchip_commit_routes(kvm_state) < 0) {
perror("assigned_dev_update_msix_mmio: kvm_commit_irq_routes");
return -EINVAL;
 }
@@ -1504,7 +1504,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
 return;
 }
 
-ret = kvm_commit_irq_routes();
+ret = kvm_irqchip_commit_routes(kvm_state);
 if (ret) {
 fprintf(stderr,
 "Error committing irq routes (%d)\n", ret);
diff --git a/hw/msi.c b/hw/msi.c
index 5c179c2..3e623c2 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -178,7 +178,7 @@ static void kvm_msi_update(PCIDevice *dev)
 }
 dev->msi_entries_nr = nr_vectors;
 if (changed) {
-r = kvm_commit_irq_routes();
+r = kvm_irqchip_commit_routes(kvm_state);
 if (r) {
 fprintf(stderr, "%s: kvm_commit_irq_routes failed: %s\n", __func__,
 strerror(-r));
@@ -196,7 +196,7 @@ static void kvm_msi_free(PCIDevice *dev)
 kvm_msi_message_del(&dev->msi_irq_entries[vector]);
 }
 if (dev->msi_entries_nr > 0) {
-kvm_commit_irq_routes();
+kvm_irqchip_commit_routes(kvm_state);
 }
 dev->msi_entries_nr = 0;
 }
diff --git a/hw/msix.c b/hw/msix.c
index 6e40957..55ddbf4 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -49,7 +49,7 @@ static void kvm_msix_free(PCIDevice *dev)
 }
 }
 if (changed) {
-kvm_commit_irq_routes();
+kvm_irqchip_commit_routes(kvm_state);
 }
 }
 
@@ -89,7 +89,7 @@ static void kvm_msix_update(PCIDevice *dev, int vector,
 }
 if (r > 0) {
 *entry = new_entry;
-r = kvm_commit_irq_routes();
+r = kvm_irqchip_commit_routes(kvm_state);
 if (r) {
 fprintf(stderr, "%s: kvm_commit_irq_routes failed: %s\n", __func__,
strerror(-r));
@@ -110,7 +110,7 @@ static int kvm_msix_vector_add(PCIDevice *dev, unsigned vector)
 return r;
 }
 
-r = kvm_commit_irq_routes();
+r = kvm_irqchip_commit_routes(kvm_state);
 if (r < 0) {
 fprintf(stderr, "%s: kvm_commit_irq_routes failed: %s\n", __func__, strerror(-r));
 return r;
@@ -121,7 +121,7 @@ static int kvm_msix_vector_add(PCIDevice *dev, unsigned vector)
 static void kvm_msix_vector_del(PCIDevice *dev, unsigned vector)
 {
 kvm_msi_message_del(&dev->msix_irq_entries[vector]);
-kvm_commit_irq_routes();
+kvm_irqchip_commit_routes(kvm_state);
 }
 
 /* Add MSI-X capability to the config space for the device. */
diff --git a/hw/pc.c b/hw/pc.c
index 70abb6c..e38a63d 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1158,7 +1158,7 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi,
 
 register_ioport_write(0xf0, 1, 1, ioportF0_write, NULL);
 
-if (!no_hpet) {
+if (!no_hpet && (!kvm_irqchip_in_kernel() || kvm_has_pit_state2())) {
 
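The hunks above follow a stage-then-commit pattern: `kvm_add_routing_entry()` adds an entry to the routing table held in user space, and `kvm_irqchip_commit_routes()` hands the whole table to the kernel, so the MSI-X path can add one entry per vector and commit once at the end. A toy model of that pattern, with hypothetical types that are not QEMU's (in QEMU the commit is a single `KVM_SET_GSI_ROUTING` ioctl):

```c
#include <stddef.h>

#define MAX_ROUTES 64  /* arbitrary capacity for this sketch */

/* Hypothetical MSI route: GSI plus the MSI address/data pair. */
typedef struct {
    unsigned gsi;
    unsigned long long addr;
    unsigned data;
} RouteEntry;

typedef struct {
    RouteEntry entries[MAX_ROUTES];  /* table built up in user space */
    size_t nr_entries;
    size_t nr_committed;  /* entries the "kernel" last received */
    unsigned commits;     /* number of commit operations issued */
} RouteTable;

static void route_init(RouteTable *t)
{
    t->nr_entries = 0;
    t->nr_committed = 0;
    t->commits = 0;
}

/* Stage one entry; nothing is handed to the kernel yet. */
static int route_add(RouteTable *t, RouteEntry e)
{
    if (t->nr_entries == MAX_ROUTES)
        return -1;
    t->entries[t->nr_entries++] = e;
    return 0;
}

/* Hand over the whole table at once (one ioctl in the real code). */
static int route_commit(RouteTable *t)
{
    t->nr_committed = t->nr_entries;
    t->commits++;
    return 0;
}
```

Batching this way means the kernel's routing table is replaced once per configuration change rather than once per vector.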

[PATCH v2 7/9] qemu-kvm: Drop unused kvm_get/set_irqchip

2012-02-13 Thread Jan Kiszka
No users remaining.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 qemu-kvm.c |   28 
 qemu-kvm.h |   23 ---
 2 files changed, 0 insertions(+), 51 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 8f1b760..10a313d 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -71,34 +71,6 @@ int kvm_set_irq(int irq, int level, int *status)
 return 1;
 }
 
-int kvm_get_irqchip(KVMState *s, struct kvm_irqchip *chip)
-{
-int r;
-
-if (!s->irqchip_in_kernel) {
-return 0;
-}
-r = kvm_vm_ioctl(s, KVM_GET_IRQCHIP, chip);
-if (r < 0) {
-perror("kvm_get_irqchip\n");
-}
-return r;
-}
-
-int kvm_set_irqchip(KVMState *s, struct kvm_irqchip *chip)
-{
-int r;
-
-if (!s->irqchip_in_kernel) {
-return 0;
-}
-r = kvm_vm_ioctl(s, KVM_SET_IRQCHIP, chip);
-if (r < 0) {
-perror("kvm_set_irqchip\n");
-}
-return r;
-}
-
 #endif
 
 #ifdef KVM_CAP_DEVICE_ASSIGNMENT
diff --git a/qemu-kvm.h b/qemu-kvm.h
index cd5e3cc..433e2fe 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -32,29 +32,6 @@
 
 #include "kvm.h"
 
-/*!
- * \brief Dump in kernel IRQCHIP contents
- *
- * Dump one of the in kernel irq chip devices, including PIC (master/slave)
- * and IOAPIC into a kvm_irqchip structure
- *
- * \param kvm Pointer to the current kvm_context
- * \param chip The irq chip device to be dumped
- */
-int kvm_get_irqchip(KVMState *s, struct kvm_irqchip *chip);
-
-/*!
- * \brief Set in kernel IRQCHIP contents
- *
- * Write one of the in kernel irq chip devices, including PIC (master/slave)
- * and IOAPIC
- *
- *
- * \param kvm Pointer to the current kvm_context
- * \param chip THe irq chip device to be written
- */
-int kvm_set_irqchip(KVMState *s, struct kvm_irqchip *chip);
-
 #if defined(__i386__) || defined(__x86_64__)
 /*!
  * \brief Get in kernel local APIC for vcpu
-- 
1.7.3.4



[PATCH v2 4/9] qemu-kvm: Use upstream kvm_irqchip_create

2012-02-13 Thread Jan Kiszka
Drop kvm_create_irqchip in favor of the equivalent upstream version.
This also allows dropping the kvm_irqchip global variable.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 kvm-all.c  |   15 ---
 qemu-kvm.c |   29 -
 qemu-kvm.h |    3 ---
 vl.c       |    1 -
 4 files changed, 0 insertions(+), 48 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index afcad44..606bd02 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -843,7 +843,6 @@ static void kvm_init_irq_routing(KVMState *s)
 
 static int kvm_irqchip_create(KVMState *s)
 {
-#ifdef UNUSED_UPSTREAM_KVM
 QemuOptsList *list = qemu_find_opts("machine");
 int ret;
 
@@ -867,7 +866,6 @@ static int kvm_irqchip_create(KVMState *s)
 s->irqchip_in_kernel = 1;
 
 kvm_init_irq_routing(s);
-#endif
 
 return 0;
 }
@@ -881,7 +879,6 @@ int kvm_init(void)
 const KVMCapabilityInfo *missing_cap;
 int ret;
 int i;
-QemuOptsList *list;
 
 s = g_malloc0(sizeof(KVMState));
 
@@ -968,13 +965,6 @@ int kvm_init(void)
 s->pit_state2 = kvm_check_extension(s, KVM_CAP_PIT_STATE2);
 #endif
 
-list = qemu_find_opts("machine");
-if (!QTAILQ_EMPTY(&list->head) &&
-!qemu_opt_get_bool(QTAILQ_FIRST(&list->head),
-   "kernel_irqchip", false)) {
-kvm_irqchip = 0;
-}
-
 ret = kvm_arch_init(s);
 if (ret < 0) {
 goto err;
@@ -990,11 +980,6 @@ int kvm_init(void)
 
 s->many_ioeventfds = kvm_check_many_ioeventfds();
 
-ret = kvm_create_irqchip(s);
-if (ret < 0) {
-return ret;
-}
-
 cpu_interrupt_handler = kvm_handle_interrupt;
 
 return 0;
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 37af80f..8f1b760 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -41,35 +41,6 @@ static inline void clear_gsi(KVMState *s, unsigned int gsi)
 }
 }
 
-int kvm_create_irqchip(KVMState *s)
-{
-#ifdef KVM_CAP_IRQCHIP
-int r;
-
-if (!kvm_irqchip || !kvm_check_extension(s, KVM_CAP_IRQCHIP)) {
-return 0;
-}
-
-r = kvm_vm_ioctl(s, KVM_CREATE_IRQCHIP);
-if (r < 0) {
-fprintf(stderr, "Create kernel PIC irqchip failed\n");
-return r;
-}
-
-s->irqchip_inject_ioctl = KVM_IRQ_LINE;
-#if defined(KVM_CAP_IRQ_INJECT_STATUS) && defined(KVM_IRQ_LINE_STATUS)
-if (kvm_check_extension(s, KVM_CAP_IRQ_INJECT_STATUS)) {
-s->irqchip_inject_ioctl = KVM_IRQ_LINE_STATUS;
-}
-#endif
-s->irqchip_in_kernel = 1;
-
-kvm_init_irq_routing(s);
-#endif
-
-return 0;
-}
-
 #ifdef KVM_CAP_IRQCHIP
 
 int kvm_set_irq(int irq, int level, int *status)
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 6235800..cd5e3cc 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -32,8 +32,6 @@
 
 #include "kvm.h"
 
-int kvm_create_irqchip(KVMState *s);
-
 /*!
  * \brief Dump in kernel IRQCHIP contents
  *
@@ -243,7 +241,6 @@ int kvm_arch_set_ioport_access(unsigned long start, unsigned long size,
 
 int kvm_create_pit(KVMState *s);
 
-extern int kvm_irqchip;
 extern int kvm_pit_reinject;
 extern unsigned int kvm_shadow_memory;
 
diff --git a/vl.c b/vl.c
index c3b4037..98d29ce 100644
--- a/vl.c
+++ b/vl.c
@@ -2173,7 +2173,6 @@ static void free_and_trace(gpointer mem)
 }
 
 #ifdef CONFIG_KVM_OPTIONS
-int kvm_irqchip = 1;
 int kvm_pit_reinject = 1;
 #endif
 
-- 
1.7.3.4


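The removed kvm_create_irqchip() made two decisions: whether to create an in-kernel irqchip at all, and which ioctl to use for injecting interrupts afterwards. The following is a minimal sketch of just that decision logic as a pure function, so it can be checked without a KVM host. The struct, enum and function names are hypothetical; nothing here issues real ioctls, and neither the compile-time #ifdef guards nor the KVM_CREATE_IRQCHIP failure path are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs: what the removed code learned at runtime. */
struct irqchip_caps {
    bool irqchip_requested;   /* role of the old kvm_irqchip global */
    bool cap_irqchip;         /* kernel reports KVM_CAP_IRQCHIP */
    bool cap_inject_status;   /* kernel reports KVM_CAP_IRQ_INJECT_STATUS */
};

enum inject_ioctl {
    INJECT_NONE,              /* no in-kernel irqchip: userspace PIC/APIC */
    INJECT_IRQ_LINE,          /* inject via KVM_IRQ_LINE */
    INJECT_IRQ_LINE_STATUS    /* inject via KVM_IRQ_LINE_STATUS */
};

/* Same branch structure as the removed kvm_create_irqchip(). */
static enum inject_ioctl choose_inject_ioctl(const struct irqchip_caps *c)
{
    assert(c != NULL);
    if (!c->irqchip_requested || !c->cap_irqchip) {
        return INJECT_NONE;
    }
    /* Prefer the variant that also returns an injection status. */
    return c->cap_inject_status ? INJECT_IRQ_LINE_STATUS : INJECT_IRQ_LINE;
}
```

After this patch the equivalent choice is made by the upstream kvm_irqchip_create() in kvm-all.c, driven by the machine option rather than a global.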

[PATCH v2 9/9] qemu-kvm: Use upstream kvm-apic

2012-02-13 Thread Jan Kiszka
Drop the qemu-kvm version in favor of the equivalent upstream
implementation.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/apic.c         |  151 +---
 hw/pc.c           |    5 +--
 qemu-kvm-x86.c    |   31 ---
 qemu-kvm.h        |   25 -
 target-i386/kvm.c |    8 ---
 5 files changed, 4 insertions(+), 216 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index b767b87..086c544 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -22,7 +22,6 @@
 #include "host-utils.h"
 #include "trace.h"
 #include "pc.h"
-#include "kvm.h"
 
 #define MAX_APIC_WORDS 8
 
@@ -133,35 +132,9 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
 }
 }
 
-static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
-
-static void kvm_irqchip_deliver_nmi(void *p)
-{
-APICCommonState *s = p;
-struct kvm_lapic_state klapic;
-uint32_t lvt;
-
-kvm_get_lapic(s->cpu_env, &klapic);
-lvt = kapic_reg(&klapic, 0x32 + APIC_LVT_LINT1);
-
-if (lvt & APIC_LVT_MASKED) {
-return;
-}
-
-if (((lvt >> 8) & 7) != APIC_DM_NMI) {
-return;
-}
-
-kvm_vcpu_ioctl(s->cpu_env, KVM_NMI);
-}
-
 static void apic_external_nmi(APICCommonState *s)
 {
-if (kvm_irqchip_in_kernel()) {
-run_on_cpu(s->cpu_env, kvm_irqchip_deliver_nmi, s);
-} else {
-apic_local_deliver(s, APIC_LVT_LINT1);
-}
+apic_local_deliver(s, APIC_LVT_LINT1);
 }
 
 #define foreach_apic(apic, deliver_bitmask, code) \
@@ -254,11 +227,8 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
 
 static void apic_set_base(APICCommonState *s, uint64_t val)
 {
-if (kvm_enabled() && kvm_irqchip_in_kernel())
-s->apicbase = val;
-else
-s->apicbase = (val & 0xf000) |
-(s->apicbase & (MSR_IA32_APICBASE_BSP | MSR_IA32_APICBASE_ENABLE));
+s->apicbase = (val & 0xf000) |
+(s->apicbase & (MSR_IA32_APICBASE_BSP | MSR_IA32_APICBASE_ENABLE));
 /* if disabled, cannot be enabled again */
 if (!(val & MSR_IA32_APICBASE_ENABLE)) {
 s->apicbase &= ~MSR_IA32_APICBASE_ENABLE;
@@ -270,9 +240,6 @@ static void apic_set_base(APICCommonState *s, uint64_t val)
 static void apic_set_tpr(APICCommonState *s, uint8_t val)
 {
 s->tpr = (val & 0x0f) << 4;
-if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-return;
-}
 apic_update_irq(s);
 }
 
@@ -770,120 +737,8 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
 }
 }
 
-#ifdef KVM_CAP_IRQCHIP
-
-static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id)
-{
-return *((uint32_t *) (kapic->regs + (reg_id << 4)));
-}
-
-static inline void kapic_set_reg(struct kvm_lapic_state *kapic,
- int reg_id, uint32_t val)
-{
-*((uint32_t *) (kapic->regs + (reg_id << 4))) = val;
-}
-
-static void kvm_kernel_lapic_save_to_user(APICCommonState *s)
-{
-struct kvm_lapic_state apic;
-struct kvm_lapic_state *kapic = &apic;
-int i, v;
-
-kvm_get_lapic(s->cpu_env, kapic);
-
-s->id = kapic_reg(kapic, 0x2) >> 24;
-s->tpr = kapic_reg(kapic, 0x8);
-s->arb_id = kapic_reg(kapic, 0x9);
-s->log_dest = kapic_reg(kapic, 0xd) >> 24;
-s->dest_mode = kapic_reg(kapic, 0xe) >> 28;
-s->spurious_vec = kapic_reg(kapic, 0xf);
-for (i = 0; i < 8; i++) {
-s->isr[i] = kapic_reg(kapic, 0x10 + i);
-s->tmr[i] = kapic_reg(kapic, 0x18 + i);
-s->irr[i] = kapic_reg(kapic, 0x20 + i);
-}
-s->esr = kapic_reg(kapic, 0x28);
-s->icr[0] = kapic_reg(kapic, 0x30);
-s->icr[1] = kapic_reg(kapic, 0x31);
-for (i = 0; i < APIC_LVT_NB; i++)
-   s->lvt[i] = kapic_reg(kapic, 0x32 + i);
-s->initial_count = kapic_reg(kapic, 0x38);
-s->divide_conf = kapic_reg(kapic, 0x3e);
-
-v = (s->divide_conf & 3) | ((s->divide_conf >> 1) & 4);
-s->count_shift = (v + 1) & 7;
-
-s->initial_count_load_time = qemu_get_clock_ns(vm_clock);
-apic_next_timer(s, s->initial_count_load_time);
-}
-
-static void kvm_kernel_lapic_load_from_user(APICCommonState *s)
-{
-struct kvm_lapic_state apic;
-struct kvm_lapic_state *klapic = &apic;
-int i;
-
-memset(klapic, 0, sizeof apic);
-kapic_set_reg(klapic, 0x2, s->id << 24);
-kapic_set_reg(klapic, 0x8, s->tpr);
-kapic_set_reg(klapic, 0xd, s->log_dest << 24);
-kapic_set_reg(klapic, 0xe, s->dest_mode << 28 | 0x0fff);
-kapic_set_reg(klapic, 0xf, s->spurious_vec);
-for (i = 0; i < 8; i++) {
-kapic_set_reg(klapic, 0x10 + i, s->isr[i]);
-kapic_set_reg(klapic, 0x18 + i, s->tmr[i]);
-kapic_set_reg(klapic, 0x20 + i, s->irr[i]);
-}
-kapic_set_reg(klapic, 0x28, s->esr);
-kapic_set_reg(klapic, 0x30, s->icr[0]);
-kapic_set_reg(klapic, 0x31, s->icr[1]);
-for (i = 0; i < APIC_LVT_NB; i++)
-kapic_set_reg(klapic, 0x32 + i, s->lvt[i]);
-kapic_set_reg(klapic, 0x38, s->initial_count);
-kapic_set_reg(klapic, 0x3e, s->divide_conf);
-
-
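The removed kapic_reg()/kapic_set_reg() helpers address the in-kernel LAPIC snapshot by register id, where the byte offset is reg_id << 4 because every xAPIC register sits on a 16-byte boundary (id 0x2 is the APIC ID at 0x20, id 0x3e the divide configuration register at 0x3e0). A minimal sketch of the same addressing follows. struct lapic_page is a hypothetical local stand-in for struct kvm_lapic_state, assumed here to be a plain 0x400-byte register array, and memcpy replaces the original pointer cast to sidestep alignment and aliasing concerns.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct kvm_lapic_state: a copy of the xAPIC register page. */
#define LAPIC_PAGE_SIZE 0x400

struct lapic_page {
    char regs[LAPIC_PAGE_SIZE];
};

/* Register id = byte offset / 16, so the offset is reg_id << 4. */
static uint32_t lapic_reg(const struct lapic_page *p, int reg_id)
{
    uint32_t val;
    size_t off = (size_t)reg_id << 4;

    assert(off + sizeof(val) <= sizeof(p->regs));
    memcpy(&val, p->regs + off, sizeof(val));   /* no unaligned cast */
    return val;
}

static void lapic_set_reg(struct lapic_page *p, int reg_id, uint32_t val)
{
    size_t off = (size_t)reg_id << 4;

    assert(off + sizeof(val) <= sizeof(p->regs));
    memcpy(p->regs + off, &val, sizeof(val));
}
```

As in the removed code, only the low 32 bits of each 16-byte slot are used, and fields such as the APIC ID still need the >> 24 shift after reading.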

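The removed kvm_kernel_lapic_save_to_user() also recomputes the timer's count_shift from the Divide Configuration Register: bits 0-1 and bit 3 are packed into a 3-bit code v, and (v + 1) & 7 is log2 of the divisor, so the code 0b111 wraps around to "divide by 1". The sketch below isolates that arithmetic; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* log2 of the APIC timer divisor, computed as in the removed code. */
static int apic_count_shift(uint32_t divide_conf)
{
    /* Bits 0-1 stay in place, bit 3 moves down to bit 2. */
    uint32_t v = (divide_conf & 3) | ((divide_conf >> 1) & 4);
    return (int)((v + 1) & 7);
}

static uint32_t apic_timer_divisor(uint32_t divide_conf)
{
    int shift = apic_count_shift(divide_conf);

    assert(shift >= 0 && shift <= 7);
    return 1u << shift;
}
```

For example, a divide_conf of 0x0 selects divide-by-2 and 0xb selects divide-by-1.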
[PATCH v2 6/9] qemu-kvm: Use upstream kvm-i8259

2012-02-13 Thread Jan Kiszka
Drop the qemu-kvm version in favor of the equivalent upstream
implementation. This allows us to move the i8259 back into the hwlib.

Note that this also drops the testdev hack and restores proper
isa_get_irq. If testdev scripts exist that inject IRQs above 15, they need
fixing. Testing for these interrupts on the PIIX3 makes no practical
sense anyway, as those lines are unused.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 Makefile.objs   |    2 +-
 Makefile.target |    8 ++--
 hw/i8259.c      |  108 ---
 hw/isa-bus.c    |    2 +-
 hw/pc_piix.c    |    5 +--
 5 files changed, 7 insertions(+), 118 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index ee6b15d..2f70b84 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -229,7 +229,7 @@ hw-obj-$(CONFIG_APPLESMC) += applesmc.o
 hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o ccid-card-passthru.o
 hw-obj-$(CONFIG_SMARTCARD_NSS) += ccid-card-emulated.o
 hw-obj-$(CONFIG_USB_REDIR) += usb-redir.o
-# hw-obj-$(CONFIG_I8259) += i8259_common.o i8259.o
+hw-obj-$(CONFIG_I8259) += i8259_common.o i8259.o
 
 # PPC devices
 hw-obj-$(CONFIG_PREP_PCI) += prep_pci.o
diff --git a/Makefile.target b/Makefile.target
index f644762..b0ff38e 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -239,7 +239,7 @@ obj-$(CONFIG_IVSHMEM) += ivshmem.o
 obj-y += device-hotplug.o
 
 # Hardware support
-obj-i386-y += mc146818rtc.o pc.o i8259_common.o i8259.o
+obj-i386-y += mc146818rtc.o pc.o
 obj-i386-y += sga.o apic_common.o apic.o ioapic_common.o ioapic.o piix_pci.o
 obj-i386-y += vmport.o
 obj-i386-y += pci-hotplug.o smbios.o wdt_ib700.o
@@ -257,7 +257,7 @@ obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o
 # shared objects
 obj-ppc-y = ppc.o ppc_booke.o
 # PREP target
-obj-ppc-y += mc146818rtc.o i8259_common.o i8259.o
+obj-ppc-y += mc146818rtc.o
 obj-ppc-y += ppc_prep.o
 # OldWorld PowerMac
 obj-ppc-y += ppc_oldworld.o
@@ -312,7 +312,7 @@ obj-mips-y += pcspk.o i8254.o
 obj-mips-y += acpi.o acpi_piix4.o
 obj-mips-y += mips_addr.o mips_timer.o mips_int.o
 obj-mips-y += jazz_led.o
-obj-mips-y += gt64xxx.o mc146818rtc.o i8259_common.o i8259.o
+obj-mips-y += gt64xxx.o mc146818rtc.o
 obj-mips-$(CONFIG_FULONG) += bonito.o vt82c686.o mips_fulong2e.o
 
 obj-microblaze-y = petalogix_s3adsp1800_mmu.o
@@ -391,7 +391,7 @@ obj-m68k-y += m68k-semi.o dummy_m68k.o
 
 obj-s390x-y = s390-virtio-bus.o s390-virtio.o
 
-obj-alpha-y = mc146818rtc.o i8259_common.o i8259.o
+obj-alpha-y = mc146818rtc.o
 obj-alpha-y += alpha_pci.o alpha_dp264.o alpha_typhoon.o
 
 obj-xtensa-y += xtensa_pic.o
diff --git a/hw/i8259.c b/hw/i8259.c
index cfffbee..7ae5380 100644
--- a/hw/i8259.c
+++ b/hw/i8259.c
@@ -28,11 +28,6 @@
 #include "qemu-timer.h"
 #include "i8259_internal.h"
 
-#include "kvm.h"
-#include "apic_internal.h"
-
-static void kvm_i8259_set_irq(void *opaque, int irq, int level);
-
 /* debug PIC */
 //#define DEBUG_PIC
 
@@ -226,17 +221,9 @@ int pic_read_irq(DeviceState *d)
 return intno;
 }
 
-static int kvm_kernel_pic_load_from_user(PICCommonState *s);
-
 static void pic_init_reset(PICCommonState *s)
 {
 pic_reset_common(s);
-
-if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-kvm_kernel_pic_load_from_user(s);
-return;
-}
-
 pic_update_irq(s);
 }
 
@@ -393,22 +380,6 @@ static uint64_t elcr_ioport_read(void *opaque, target_phys_addr_t addr,
 return s->elcr;
 }
 
-static void kvm_kernel_pic_save_to_user(PICCommonState *s);
-
-static void kvm_pic_pre_save(PICCommonState *s)
-{
-if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-kvm_kernel_pic_save_to_user(s);
-}
-}
-
-static void kvm_pic_post_load(PICCommonState *s)
-{
-if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-kvm_kernel_pic_load_from_user(s);
-}
-}
-
 static const MemoryRegionOps pic_base_ioport_ops = {
 .read = pic_ioport_read,
 .write = pic_ioport_write,
@@ -498,10 +469,6 @@ qemu_irq *i8259_init(ISABus *bus, qemu_irq parent_irq)
 
 slave_pic = DO_UPCAST(PICCommonState, dev, dev);
 
-if (kvm_enabled() && kvm_irqchip_in_kernel()) {
-irq_set = qemu_allocate_irqs(kvm_i8259_set_irq, NULL, 24);
-}
-
 return irq_set;
 }
 
@@ -511,8 +478,6 @@ static void i8259_class_init(ObjectClass *klass, void *data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 
 k->init = pic_init;
-k->pre_save  = kvm_pic_pre_save;
-k->post_load = kvm_pic_post_load;
 dc->reset = pic_reset;
 }
 
@@ -528,77 +493,4 @@ static void pic_register(void)
 type_register_static(&i8259_info);
 }
 
-static void kvm_kernel_pic_save_to_user(PICCommonState *s)
-{
-#ifdef KVM_CAP_IRQCHIP
-struct kvm_irqchip chip;
-struct kvm_pic_state *kpic;
-
-chip.chip_id = s->master ?
-   KVM_IRQCHIP_PIC_MASTER :
-   KVM_IRQCHIP_PIC_SLAVE;
-kvm_get_irqchip(kvm_state, &chip);
-kpic = &chip.chip.pic;
-
-s->last_irr = kpic->last_irr;
-s->irr = kpic->irr;
-s->imr = kpic->imr;
-s->isr = kpic->isr;
-

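Background for the note about IRQs above 15: on a PC the slave 8259 is cascaded into input 2 of the master, so ISA IRQs 0-7 map to master inputs and IRQs 8-15 to slave inputs, leaving no 8259 input for anything higher. A minimal sketch of that mapping follows; the struct and function names are hypothetical, and it does not reproduce QEMU's isa_get_irq().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which 8259 input an ISA IRQ line ends up on. */
struct pic_pin {
    bool slave;   /* true: slave 8259, false: master 8259 */
    int pin;      /* input 0..7 on the selected chip */
};

/* Returns false for numbers that have no ISA line at all. */
static bool isa_irq_to_pic_pin(int isa_irq, struct pic_pin *out)
{
    assert(out != NULL);
    if (isa_irq < 0 || isa_irq > 15) {
        return false;
    }
    out->slave = isa_irq >= 8;
    out->pin = isa_irq & 7;
    return true;
}
```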
[PATCH v2 1/9] qemu-kvm: Move kvm_create_pit out of arch init code

2012-02-13 Thread Jan Kiszka
This belongs where the PIT is created and allows us to drop another
kvm_irqchip reference.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/i8254-kvm.c    |    3 +++
 qemu-kvm-x86.c    |    4 ++--
 qemu-kvm.h        |    2 ++
 target-i386/kvm.c |    7 ---
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/hw/i8254-kvm.c b/hw/i8254-kvm.c
index 8b494d0..f0c7ac8 100644
--- a/hw/i8254-kvm.c
+++ b/hw/i8254-kvm.c
@@ -107,6 +107,9 @@ void kvm_pit_init(PITState *pit)
 {
 PITChannelState *s;
 
+if (kvm_create_pit(kvm_state) < 0) {
+hw_error("KVM PIT creation failed\n");
+}
 s = &pit->channels[0];
 s->irq_timer = qemu_new_timer_ns(vm_clock, dummy_timer, s);
 vmstate_pit.pre_save = kvm_pit_pre_save;
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index a0bfc23..6fe48a4 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -23,11 +23,11 @@
 #include "kvm.h"
 #include "hw/apic.h"
 
-static int kvm_create_pit(KVMState *s)
+int kvm_create_pit(KVMState *s)
 {
 int r;
 
-if (kvm_irqchip) {
+if (kvm_irqchip_in_kernel()) {
 r = kvm_vm_ioctl(s, KVM_CREATE_PIT);
 if (r < 0) {
 fprintf(stderr, "Create kernel PIC irqchip failed\n");
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 975b6fa..653370e 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -256,6 +256,8 @@ int kvm_update_ioport_access(CPUState *env);
 int kvm_arch_set_ioport_access(unsigned long start, unsigned long size,
bool enable);
 
+int kvm_create_pit(KVMState *s);
+
 extern int kvm_irqchip;
 extern int kvm_pit_reinject;
 extern unsigned int kvm_shadow_memory;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 7079e87..ee2d3f8 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -661,8 +661,6 @@ static int kvm_get_supported_msrs(KVMState *s)
 return ret;
 }
 
-static int kvm_create_pit(KVMState *s);
-
 int kvm_arch_init(KVMState *s)
 {
 uint64_t identity_base = 0xfffbc000;
@@ -712,11 +710,6 @@ int kvm_arch_init(KVMState *s)
 }
 qemu_register_reset(kvm_unpoison_all, NULL);
 
-ret = kvm_create_pit(s);
-if (ret < 0) {
-return ret;
-}
-
 if (kvm_shadow_memory) {
 ret = kvm_vm_ioctl(s, KVM_SET_NR_MMU_PAGES, kvm_shadow_memory);
 if (ret < 0) {
-- 
1.7.3.4



x86: kvmclock: abstract save/restore sched_clock_state (v2)

2012-02-13 Thread Marcelo Tosatti

Upon resume from hibernation, CPU 0's hvclock area contains the old
values for system_time and tsc_timestamp. It is necessary for the
hypervisor to update these values with up-to-date ones before the CPU uses
them.

Abstract TSC's save/restore sched_clock_state functions and use
restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update.

Also move restore_sched_clock_state before __restore_processor_state,
since the latter calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC).
Thanks to Igor Mammedov for tracking it down.

Fixes suspend-to-disk with kvmclock.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 15d9915..c91e8b9 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -61,7 +61,7 @@ extern void check_tsc_sync_source(int cpu);
 extern void check_tsc_sync_target(void);
 
 extern int notsc_setup(char *);
-extern void save_sched_clock_state(void);
-extern void restore_sched_clock_state(void);
+extern void tsc_save_sched_clock_state(void);
+extern void tsc_restore_sched_clock_state(void);
 
 #endif /* _ASM_X86_TSC_H */
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 5d0afac..baaca8d 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -162,6 +162,8 @@ struct x86_cpuinit_ops {
  * @is_untracked_pat_range exclude from PAT logic
  * @nmi_init   enable NMI on cpus
  * @i8042_detect   pre-detect if i8042 controller exists
+ * @save_sched_clock_state:    save state for sched_clock() on suspend
+ * @restore_sched_clock_state: restore state for sched_clock() on resume
  */
 struct x86_platform_ops {
unsigned long (*calibrate_tsc)(void);
@@ -173,6 +175,8 @@ struct x86_platform_ops {
void (*nmi_init)(void);
unsigned char (*get_nmi_reason)(void);
int (*i8042_detect)(void);
+   void (*save_sched_clock_state)(void);
+   void (*restore_sched_clock_state)(void);
 };
 
 struct pci_dev;
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index ca4e735..57e6b78 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -136,6 +136,15 @@ int kvm_register_clock(char *txt)
return ret;
 }
 
+void kvm_save_sched_clock_state(void)
+{
+}
+
+void kvm_restore_sched_clock_state(void)
+{
+   kvm_register_clock("primary cpu clock, resume");
+}
+
 #ifdef CONFIG_X86_LOCAL_APIC
 static void __cpuinit kvm_setup_secondary_clock(void)
 {
@@ -195,6 +204,8 @@ void __init kvmclock_init(void)
x86_cpuinit.early_percpu_clock_init =
kvm_setup_secondary_clock;
 #endif
+   x86_platform.save_sched_clock_state = kvm_save_sched_clock_state;
+   x86_platform.restore_sched_clock_state = kvm_restore_sched_clock_state;
machine_ops.shutdown  = kvm_shutdown;
 #ifdef CONFIG_KEXEC
machine_ops.crash_shutdown  = kvm_crash_shutdown;
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index a62c201..aed2aa1 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -629,7 +629,7 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 
 static unsigned long long cyc2ns_suspend;
 
-void save_sched_clock_state(void)
+void tsc_save_sched_clock_state(void)
 {
if (!sched_clock_stable)
return;
@@ -645,7 +645,7 @@ void save_sched_clock_state(void)
  * that sched_clock() continues from the point where it was left off during
  * suspend.
  */
-void restore_sched_clock_state(void)
+void tsc_restore_sched_clock_state(void)
 {
unsigned long long offset;
unsigned long flags;
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 6f2ec53..e9f265f 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -108,7 +108,9 @@ struct x86_platform_ops x86_platform = {
.is_untracked_pat_range = is_ISA_range,
.nmi_init   = default_nmi_init,
.get_nmi_reason = default_get_nmi_reason,
-   .i8042_detect   = default_i8042_detect
+   .i8042_detect   = default_i8042_detect,
+   .save_sched_clock_state = tsc_save_sched_clock_state,
+   .restore_sched_clock_state  = tsc_restore_sched_clock_state,
 };
 
 EXPORT_SYMBOL_GPL(x86_platform);
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index f10c0af..0e76a28 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -114,7 +114,7 @@ static void __save_processor_state(struct saved_context *ctxt)
 void save_processor_state(void)
 {
    __save_processor_state(&saved_context);
-   save_sched_clock_state();
+   x86_platform.save_sched_clock_state();
 }
 #ifdef CONFIG_X86_32
 EXPORT_SYMBOL(save_processor_state);
@@ -230,8 +230,8 @@ static void __restore_processor_state(struct saved_context *ctxt)
 /* Needed by apm.c */
 void restore_processor_state(void)
 {
+   

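The kvmclock patch replaces direct calls to save_sched_clock_state()/restore_sched_clock_state() with hooks in x86_platform_ops that default to the TSC versions and that kvmclock_init() overrides. The following is a reduced userspace model of that indirection. The struct, function names and counters are hypothetical stand-ins, not the kernel's definitions; the counters exist only so the dispatch can be checked.

```c
#include <assert.h>
#include <stddef.h>

static int tsc_saves, tsc_restores, kvm_restores;

static void tsc_save(void)    { tsc_saves++; }
static void tsc_restore(void) { tsc_restores++; }
static void kvm_save(void)    { /* empty, like kvm_save_sched_clock_state() */ }
static void kvm_restore(void) { kvm_restores++; /* would re-register the clock */ }

struct platform_ops {
    void (*save_sched_clock_state)(void);
    void (*restore_sched_clock_state)(void);
};

/* TSC defaults, analogous to the x86_init.c initializer. */
static struct platform_ops platform = {
    .save_sched_clock_state    = tsc_save,
    .restore_sched_clock_state = tsc_restore,
};

/* Analogous to kvmclock_init() installing its own hooks. */
static void kvmclock_like_init(void)
{
    platform.save_sched_clock_state    = kvm_save;
    platform.restore_sched_clock_state = kvm_restore;
}

/* Suspend/resume paths call through the table, not the TSC code directly. */
static void save_state(void)
{
    assert(platform.save_sched_clock_state != NULL);
    platform.save_sched_clock_state();
}

static void restore_state(void)
{
    assert(platform.restore_sched_clock_state != NULL);
    platform.restore_sched_clock_state();
}
```

The design choice is the usual ops-table pattern: bare-metal behavior is unchanged, while a paravirtual clock can substitute its own resume step without touching arch/x86/power/cpu.c again.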
Re: [Android-virt] [PATCH RFC v2 3/3] ARM: KVM: Add support for MMU notifiers

2012-02-13 Thread Marc Zyngier
On 12/02/12 01:12, Christoffer Dall wrote:
 On Sat, Feb 11, 2012 at 10:33 AM, Antonios Motakis
 a.mota...@virtualopensystems.com wrote:
 On 02/11/2012 06:35 PM, Christoffer Dall wrote:

 On Sat, Feb 11, 2012 at 7:00 AM, Antonios Motakis
 a.mota...@virtualopensystems.com  wrote:

 On 02/10/2012 11:22 PM, Marc Zyngier wrote:

 +ENTRY(__kvm_tlb_flush_vmid)
 +   hvc #0  @ Switch to Hyp mode
 +   push    {r2, r3}
 +   ldrd    r2, r3, [r0, #KVM_VTTBR]
 +   mcrr    p15, 6, r2, r3, c2  @ Write VTTBR
 +   isb
 +   mcr     p15, 0, r0, c8, c7, 0   @ TLBIALL
 +   dsb
 +   isb
 +   mov     r2, #0
 +   mov     r3, #0
 +   mcrr    p15, 6, r2, r3, c2  @ Back to VMID #0
 +   isb
 +
 +   pop {r2, r3}
 +   hvc #0  @ Back to SVC
 +   mov pc, lr
 +ENDPROC(__kvm_tlb_flush_vmid)


 With the last VMID implementation, you could get the equivalent effect of
 a
 per-VMID flush, by just getting a new VMID for the current VM. So you
 could
 do a (kvm->arch.vmid = 0) to force a new VMID when the guest reruns, and
 save the overhead of that flush (you will do a complete flush every 255
 times instead of a small one every single time).

 to do this you would need to send an IPI if the guest is currently
 executing on another CPU and make it exit the guest, so that the VMID
 assignment will run before the guest potentially accesses that TLB
 entry that points to the page that was just reclaimed - which I am not
 sure will be better than this solution.

 Don't you have to do this anyway? You'd want the flush to be effective on
 all CPUs before proceeding.
 
 hmm yeah, actually you do need this. Unless the -IS version of the
 flush instruction covers all relevant cores in this case. Marc, I
 don't think that the processor clearing out the page table entry will
 necessarily belong to the same inner-shareable domain as the processor
 potentially executing the VM, so therefore the -IS flushing version
 would not be sufficient and we actually have to go and send an IPI.

If we forget about the 11MPCore (which doesn't broadcast the TLB
invalidation in hardware), the TLBIALLIS operation makes sure all cores
belonging to the same inner shareable domain will see the TLB
invalidation at the same time. If they don't, this is a hardware bug.

Now, I do not have an example of a system where two CPUs are not part of
the same IS domain. Even big.LITTLE has all of the potential 8 cores in
an IS domain. If such a system exists one of these days, then it will be
worth considering having a separate method to cope with the case. Until
then, my opinion is to keep it as simple as possible.

M.
-- 
Jazz is not dead. It just smells funny...


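The alternative raised in this thread, getting a fresh VMID instead of flushing by VMID, can be sketched as a generation-based allocator: 8-bit VMIDs with VMID 0 left to the host, and a full TLB flush only when the VMID space wraps (once every 255 allocations). Everything below is a hypothetical, single-threaded model. As discussed above, a real implementation must also make sure a guest running on another CPU cannot keep using stale entries (via an IPI or an inner-shareable broadcast), which is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

#define VMID_BITS 8
#define VMID_MAX  ((1u << VMID_BITS) - 1)   /* guest VMIDs 1..255 */

struct vmid_allocator {
    uint64_t generation;     /* bumped whenever the VMID space wraps */
    uint32_t next_vmid;      /* next VMID to hand out */
    unsigned full_flushes;   /* counts simulated full TLB invalidations */
};

struct vm {
    uint64_t vmid_gen;       /* generation of vmid; 0 means "none" */
    uint32_t vmid;
};

static void vmid_allocator_init(struct vmid_allocator *a)
{
    a->generation = 1;
    a->next_vmid = 1;
    a->full_flushes = 0;
}

/* A fresh VMID makes the VM's old TLB entries unreachable without a flush. */
static void vmid_assign_new(struct vmid_allocator *a, struct vm *vm)
{
    if (a->next_vmid > VMID_MAX) {
        a->generation++;      /* every VM must pick a new VMID */
        a->next_vmid = 1;
        a->full_flushes++;    /* old VMIDs get reused: flush everything */
    }
    vm->vmid = a->next_vmid++;
    vm->vmid_gen = a->generation;
    assert(vm->vmid >= 1 && vm->vmid <= VMID_MAX);
}

/* Called before entering the guest. */
static void vmid_check(struct vmid_allocator *a, struct vm *vm)
{
    if (vm->vmid_gen != a->generation) {
        vmid_assign_new(a, vm);
    }
}
```

Usage: clearing vm->vmid_gen plays the role of the "(kvm->arch.vmid = 0)" trick, and the next vmid_check() hands out a new VMID.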

Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

2012-02-13 Thread ronnie sahlberg
On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin m...@redhat.com wrote:
 On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
 Only if you use the pci multi-function option but that kills
 standard hot unplug

 It doesn't kill it as such, rather you can't unplug luns individually.

Isn't that just a consequence of the current implementation rather than
a SCSI limitation?

A different way to do hotplug could be to flag all devices as removable
in the standard INQUIRY data, then
leave the LUN there persistently: what you remove/add is not the
LUN device itself but just the media in the device.

Instead of hot-plug removing the LUN, hot-plug becomes media eject or
media insert.
The device remains present at all times and you never remove it; instead,
hot-plug controls whether the media is present or not.


This would require implementing at least START_STOP_UNIT and
PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.


regards
ronnie sahlberg

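Ronnie's "hot-plug as media change" idea can be sketched as a small state model: the LUN stays present and removable, the host only inserts or ejects the medium, and the guest can lock it with PREVENT ALLOW MEDIUM REMOVAL. The opcodes (0x1b, 0x1e) and CDB byte 4 bits (START = bit 0, LOEJ = bit 1; PREVENT = bits 1:0) follow SBC/SPC. The struct and function names, and the choice that a host eject respects the guest's lock, are assumptions of this sketch; sense data is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define START_STOP_UNIT              0x1b
#define PREVENT_ALLOW_MEDIUM_REMOVAL 0x1e

struct lun {
    bool medium_present;      /* controlled by the host ("hot-plug") */
    bool removal_prevented;   /* set by the guest */
};

/* Guest CDBs. true = GOOD status; a real target would return
 * CHECK CONDITION with sense data instead of false. */
static bool lun_handle_cdb(struct lun *l, const uint8_t *cdb)
{
    assert(l != NULL && cdb != NULL);
    switch (cdb[0]) {
    case PREVENT_ALLOW_MEDIUM_REMOVAL:
        /* PREVENT field; any nonzero value is treated as "prevent". */
        l->removal_prevented = (cdb[4] & 0x03) != 0;
        return true;
    case START_STOP_UNIT:
        if (!(cdb[4] & 0x02)) {           /* LOEJ clear: no load/eject */
            return true;
        }
        if (cdb[4] & 0x01) {              /* START set: load request */
            return l->medium_present;     /* only the host can insert */
        }
        if (l->removal_prevented) {       /* eject refused while locked */
            return false;
        }
        l->medium_present = false;
        return true;
    default:
        return false;                     /* not modeled */
    }
}

/* Host-side "unplug": eject the medium unless the guest locked it. */
static bool lun_host_eject(struct lun *l)
{
    assert(l != NULL);
    if (l->removal_prevented) {
        return false;
    }
    l->medium_present = false;
    return true;
}

/* Host-side "plug": insert a medium into the always-present LUN. */
static void lun_host_insert(struct lun *l)
{
    assert(l != NULL);
    l->medium_present = true;
}
```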

Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

2012-02-13 Thread Paolo Bonzini

On 02/13/2012 02:13 PM, ronnie sahlberg wrote:

On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin m...@redhat.com wrote:

On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:

Only if you use the pci multi-function option but that kills
standard hot unplug


It doesn't kill it as such, rather you can't unplug luns individually.


Isn't that just a consequence of the current implementation rather than
a SCSI limitation?


We're talking about virtio-blk here. :)

Paolo


Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

2012-02-13 Thread Michael S. Tsirkin
On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote:
 On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin m...@redhat.com wrote:
  On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
  Only if you use the pci multi-function option but that kills
  standard hot unplug
 
  It doesn't kill it as such, rather you can't unplug luns individually.
 
 Isn't that just a consequence of the current implementation rather than
 a SCSI limitation?

Yes.

 A different way to do hotplug could be to flag all devices as removable
 in the standard INQUIRY data, then
 leave the LUN there persistently: what you remove/add is not the
 LUN device itself but just the media in the device.
 
 Instead of hot-plug removing the LUN, hot-plug becomes media eject or
 media insert.
 The device remains present at all times and you never remove it; instead,
 hot-plug controls whether the media is present or not.
 
 
 This would require implementing at least START_STOP_UNIT and
 PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.
 
 
 regards
 ronnie sahlberg

That would work.


[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane

2012-02-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42755





--- Comment #22 from Rosen sandik...@yandex.ru  2012-02-13 13:19:29 ---
Created an attachment (id=72363)
 -- (https://bugzilla.kernel.org/attachment.cgi?id=72363)
trace-cmd report

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.


[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane

2012-02-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42755





--- Comment #23 from Gleb g...@redhat.com  2012-02-13 13:30:01 ---
What was the guest doing during this trace? Can you provide the 'info pci'
monitor output, please?



Re: Q: Does linux kvm native tool support loading BIOS as the default loader now?

2012-02-13 Thread Asias He
On 02/13/2012 12:38 PM, Pekka Enberg wrote:
 On Mon, Feb 13, 2012 at 08:14:22PM +0800, Yang Bai wrote:
 As far as I know, the native tool does not support loading a BIOS, so it does not
 support Windows. Is this supported now?
 If not, I may try to implement it.

You're welcome to do so ;-). This would open the door for non-Linux OS
support in kvm tool.

-- 
Asias He


[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane

2012-02-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42755


Avi Kivity a...@redhat.com changed:

           What|Removed                     |Added
----------------------------------------------------------------------------
             CC|                            |a...@redhat.com




--- Comment #24 from Avi Kivity a...@redhat.com  2012-02-13 13:43:35 ---
Please run 'perf top' in the host and report the output (while tracing is
disabled).



Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-02-13 Thread jamal
On Fri, 2012-02-10 at 08:39 -0800, Stephen Hemminger wrote:

 Some related discussion points:
  * the bridge needs to support control from both userspace (MSTP, TRILL, ...)
and kernel space (offload etc)

I think all are pretty much covered if you let some controller (I prefer
user space) ADD/DEL/GET/Event on the fdb.
TRILL really is outside the scope of this; from an encap/decap perspective it
probably needs to be YAND (Yet Another NetDev), and on the control side
you just need to provide the above netlink ops (ADD, etc.) on
the fdb and let the controller worry about things. (Actually, you _may_
need to have learning done outside of the kernel for TRILL.)

  * the bridge forwarding database is simpler and different than the existing
neighbor table, don't remember the details but last time I checked it
using neighbor table in bridge would be putting square peg in round hole.

Agreed.

cheers,
jamal



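Jamal's suggestion, a controller driving ADD/DEL/GET/Event on the FDB, is sketched below as a tiny fixed-size table with an optional event hook where, for example, a hardware offload could be notified. All names are hypothetical; this is not the kernel bridge code or its netlink API, and a real FDB would key on MAC plus VLAN and age entries out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FDB_SIZE 16

enum fdb_event { FDB_EV_ADD, FDB_EV_DEL };

struct fdb_entry {
    uint8_t mac[6];
    int port;
    bool used;
};

struct fdb {
    struct fdb_entry e[FDB_SIZE];
    /* Optional "Event" hook, e.g. to push entries into hardware. */
    void (*notify)(enum fdb_event ev, const struct fdb_entry *ent);
};

static struct fdb_entry *fdb_find(struct fdb *f, const uint8_t mac[6])
{
    for (int i = 0; i < FDB_SIZE; i++) {
        if (f->e[i].used && memcmp(f->e[i].mac, mac, 6) == 0) {
            return &f->e[i];
        }
    }
    return NULL;
}

/* ADD: insert or update an entry; false if the table is full. */
static bool fdb_add(struct fdb *f, const uint8_t mac[6], int port)
{
    assert(f != NULL && mac != NULL);
    struct fdb_entry *ent = fdb_find(f, mac);
    for (int i = 0; ent == NULL && i < FDB_SIZE; i++) {
        if (!f->e[i].used) {
            ent = &f->e[i];
        }
    }
    if (ent == NULL) {
        return false;
    }
    memcpy(ent->mac, mac, 6);
    ent->port = port;
    ent->used = true;
    if (f->notify) {
        f->notify(FDB_EV_ADD, ent);
    }
    return true;
}

/* DEL: remove an entry; false if it was not present. */
static bool fdb_del(struct fdb *f, const uint8_t mac[6])
{
    assert(f != NULL && mac != NULL);
    struct fdb_entry *ent = fdb_find(f, mac);
    if (ent == NULL) {
        return false;
    }
    if (f->notify) {
        f->notify(FDB_EV_DEL, ent);
    }
    ent->used = false;
    return true;
}

/* GET: port for a MAC, or -1 if unknown. */
static int fdb_get(struct fdb *f, const uint8_t mac[6])
{
    assert(f != NULL && mac != NULL);
    const struct fdb_entry *ent = fdb_find(f, mac);
    return ent ? ent->port : -1;
}
```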

[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane

2012-02-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42755





--- Comment #25 from Rosen sandik...@yandex.ru  2012-02-13 14:14:17 ---
(In reply to comment #23)
 What guest did during this trace? Can you provide info pci monitor output
 pls?

I can't see the full output from this command.



[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane

2012-02-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42755





--- Comment #26 from Rosen sandik...@yandex.ru  2012-02-13 14:24:36 ---
Created an attachment (id=72365)
 -- (https://bugzilla.kernel.org/attachment.cgi?id=72365)
info pci



Re: [PATCHv2-RFC 1/2] shpc: standard hot plug controller

2012-02-13 Thread Isaku Yamahata
On Mon, Feb 13, 2012 at 01:49:32PM +0200, Michael S. Tsirkin wrote:
 On Mon, Feb 13, 2012 at 07:03:52PM +0900, Isaku Yamahata wrote:
  Oh nice work.
  
  On Mon, Feb 13, 2012 at 11:15:55AM +0200, Michael S. Tsirkin wrote:
   This adds support for SHPC interface, as defined by PCI Standard
   Hot-Plug Controller and Subsystem Specification, Rev 1.0
   http://www.pcisig.com/specifications/conventional/pci_hot_plug/SHPC_10
   
   Only SHPC integrated with a PCI-to-PCI bridge is supported,
   SHPC integrated with a host bridge would need more work.
   
   All main SHPC features are supported:
   - MRL sensor
  
  Does this just report latch status? (It seems so.)
 
 What happens is that adding a device closes the latch and removing a device
 opens it. This significantly reduces the number of configurations we
 need to support.
 
 
  Do you plan to provide interfaces to manipulate the latch?
 
 I didn't plan to do this, and this is non-trivial.
 Do you just want this for empty slots?  And why?

No, I was just wondering what your plan was.


   - Attention button
   - Attention indicator
   - Power indicator
  
   Wake on hotplug and serr generation are stubbed out but unused
   as we don't have interfaces to generate these events ATM.
   
   One issue that isn't completely resolved is that qemu currently
   expects an eject interface, which SHPC does not provide: it merely
   removes power from the device, and it's up to the user to remove the device
   from the slot. This patch works around that by ejecting the device
   when power is removed and power LED goes off.
   
   TODO:
   - migration support
   - fix dependency on pci_internals.h
  
   Unless I missed it in the code, these are also missing:
  - QMP command for pushing attention button.
  - QMP command to get LED status
 
 It's easy to add these, so I'd accept such a patch,
 but I wonder why.

My concern is how libvirt/virt-manager (or another UI) presents
slot status to operators/users.


  - QMP events for LED on/off
 
 There's also blink :)
 
  
  thanks,
 
 I'm concerned that a guest can flood the management with such events.
 It's better to send a single LED change event, then we
 can suppress further events until next get LED status command.

Makes sense.
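The rate-limiting idea above (send one LED change event, then suppress further events until management next asks for the LED status) needs only one armed flag per slot. A minimal sketch, assuming hypothetical names (`LedEventLimiter`, `led_changed()`, `led_status_query()`); the patch itself adds no such QMP event or command.

```c
#include <stdbool.h>

typedef struct {
    bool armed;        /* true: the next LED change may be reported */
    int led_state;     /* last LED state written by the guest */
    unsigned emitted;  /* number of change events sent to management */
} LedEventLimiter;

static void limiter_init(LedEventLimiter *l, int initial_state)
{
    l->armed = true;
    l->led_state = initial_state;
    l->emitted = 0;
}

/* Guest wrote the LED register. Returns true if an event should be sent. */
static bool led_changed(LedEventLimiter *l, int new_state)
{
    if (new_state == l->led_state) {
        return false;            /* no actual change */
    }
    l->led_state = new_state;
    if (!l->armed) {
        return false;            /* suppressed until the next status query */
    }
    l->armed = false;
    l->emitted++;
    return true;
}

/* Management queried the LED status: report it and re-arm events. */
static int led_status_query(LedEventLimiter *l)
{
    l->armed = true;
    return l->led_state;
}
```

However fast the guest toggles or blinks the LED, management sees at most one event per query, and the query always returns the current state.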

 
   Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
   ---
 Makefile.objs |    1 +
 hw/pci.h      |    6 +
 hw/shpc.c     |  646 +
 hw/shpc.h     |   40
 qemu-common.h |    1 +
 5 files changed, 694 insertions(+), 0 deletions(-)
 create mode 100644 hw/shpc.c
 create mode 100644 hw/shpc.h
   
   diff --git a/Makefile.objs b/Makefile.objs
   index 391e524..4546477 100644
   --- a/Makefile.objs
   +++ b/Makefile.objs
   @@ -195,6 +195,7 @@ hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
hw-obj-y += fw_cfg.o
hw-obj-$(CONFIG_PCI) += pci.o pci_bridge.o
hw-obj-$(CONFIG_PCI) += msix.o msi.o
   +hw-obj-$(CONFIG_PCI) += shpc.o
hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o
hw-obj-y += watchdog.o
   diff --git a/hw/pci.h b/hw/pci.h
   index 33b0b18..756577e 100644
   --- a/hw/pci.h
   +++ b/hw/pci.h
   @@ -125,6 +125,9 @@ enum {
/* command register SERR bit enabled */
#define QEMU_PCI_CAP_SERR_BITNR 4
QEMU_PCI_CAP_SERR = (1 << QEMU_PCI_CAP_SERR_BITNR),
   +/* Standard hot plug controller. */
   +#define QEMU_PCI_SHPC_BITNR 5
   +QEMU_PCI_CAP_SHPC = (1 << QEMU_PCI_SHPC_BITNR),
};

#define TYPE_PCI_DEVICE "pci-device"
   @@ -229,6 +232,9 @@ struct PCIDevice {
/* PCI Express */
PCIExpressDevice exp;

   +/* SHPC */
   +SHPCDevice *shpc;
   +
/* Location of option rom */
char *romfile;
bool has_rom;
   diff --git a/hw/shpc.c b/hw/shpc.c
   new file mode 100644
   index 000..4baec29
   --- /dev/null
   +++ b/hw/shpc.c
   @@ -0,0 +1,646 @@
   +#include <strings.h>
   +#include <stdint.h>
   +#include "range.h"
   +#include "shpc.h"
   +#include "pci.h"
   +#include "pci_internals.h"
   +
   +/* TODO: model power only and disabled slot states. */
   +/* TODO: handle SERR and wakeups */
   +/* TODO: consider enabling 66MHz support */
   +
   +/* TODO: remove fully only on state DISABLED and LED off.
   + * track state to properly record this. */
   +
   +/* SHPC Working Register Set */
   +#define SHPC_BASE_OFFSET  0x00 /* 4 bytes */
   +#define SHPC_SLOTS_33     0x04 /* 4 bytes. Also encodes PCI-X slots. */
   +#define SHPC_SLOTS_66     0x08 /* 4 bytes. */
   +#define SHPC_NSLOTS       0x0C /* 1 byte */
   +#define SHPC_FIRST_DEV    0x0D /* 1 byte */
   +#define SHPC_PHYS_SLOT    0x0E /* 2 bytes */
   +#define SHPC_PHYS_NUM_MAX 0x7ff
   +#define SHPC_PHYS_NUM_UP  0x1000
   +#define SHPC_PHYS_MRL     0x4000
   +#define SHPC_PHYS_BUTTON  0x8000
   +#define SHPC_SEC_BUS      0x10 /* 2 bytes */
   +#define SHPC_SEC_BUS_33   0x0
   +#define SHPC_SEC_BUS_66   0x1 /* Unused */
   +#define SHPC_SEC_BUS_MASK 0x7
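To illustrate how the 2-byte SHPC_PHYS_SLOT field combines the masks defined above, here is a hypothetical encode/decode pair. The struct and function names are assumptions, and the field meanings in the comments (in particular that the 11-bit number is the physical number of the first slot) are inferred from the mask names rather than stated in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask values copied from the patch above. */
#define SHPC_PHYS_NUM_MAX 0x7ff
#define SHPC_PHYS_NUM_UP  0x1000
#define SHPC_PHYS_MRL     0x4000
#define SHPC_PHYS_BUTTON  0x8000

/* Hypothetical decoded view of the SHPC_PHYS_SLOT field. */
typedef struct {
    uint16_t first_phys_num; /* assumed: physical number of the first slot */
    bool num_up;             /* assumed: numbers count upward */
    bool has_mrl;            /* MRL sensor implemented */
    bool has_button;         /* attention button implemented */
} ShpcPhysSlot;

static uint16_t shpc_phys_slot_encode(const ShpcPhysSlot *p)
{
    uint16_t v = p->first_phys_num & SHPC_PHYS_NUM_MAX;
    if (p->num_up)
        v |= SHPC_PHYS_NUM_UP;
    if (p->has_mrl)
        v |= SHPC_PHYS_MRL;
    if (p->has_button)
        v |= SHPC_PHYS_BUTTON;
    return v;
}

static ShpcPhysSlot shpc_phys_slot_decode(uint16_t v)
{
    ShpcPhysSlot p = {
        .first_phys_num = v & SHPC_PHYS_NUM_MAX,
        .num_up = (v & SHPC_PHYS_NUM_UP) != 0,
        .has_mrl = (v & SHPC_PHYS_MRL) != 0,
        .has_button = (v & SHPC_PHYS_BUTTON) != 0,
    };
    return p;
}
```

Values wider than 11 bits are silently truncated by SHPC_PHYS_NUM_MAX, which is why the mask doubles as the maximum slot number.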
  

[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane

2012-02-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42755





--- Comment #27 from Rosen sandik...@yandex.ru  2012-02-13 14:37:36 ---
And soon there will be a video capture with 'perf top':

http://vbox7.com/play:199e9ede30



Re: [PATCHv2-RFC 1/2] shpc: standard hot plug controller

2012-02-13 Thread Michael S. Tsirkin
On Mon, Feb 13, 2012 at 11:30:23PM +0900, Isaku Yamahata wrote:
 On Mon, Feb 13, 2012 at 01:49:32PM +0200, Michael S. Tsirkin wrote:
  On Mon, Feb 13, 2012 at 07:03:52PM +0900, Isaku Yamahata wrote:
   Oh nice work.
   
   On Mon, Feb 13, 2012 at 11:15:55AM +0200, Michael S. Tsirkin wrote:
This adds support for SHPC interface, as defined by PCI Standard
Hot-Plug Controller and Subsystem Specification, Rev 1.0
http://www.pcisig.com/specifications/conventional/pci_hot_plug/SHPC_10

Only SHPC integrated with a PCI-to-PCI bridge is supported,
SHPC integrated with a host bridge would need more work.

All main SHPC features are supported:
- MRL sensor
   
   Does this just report latch status? (It seems so.)
  
  What happens is that adding a device closes the latch and removing a device
  opens the latch.  This significantly reduces the number of supported
  configurations.
  
  
   Do you plan to provide interfaces to manipulate the latch?
  
  I didn't plan to do this, and this is non-trivial.
  Do you just want this for empty slots?  And why?
 
 No, I was just wondering about your plan.
 
 
- Attention button
- Attention indicator
- Power indicator
   
Wake on hotplug and serr generation are stubbed out but unused
as we don't have interfaces to generate these events ATM.

One issue that isn't completely resolved is that qemu currently
expects an eject interface, which SHPC does not provide: it merely
removes the power to device and it's up to the user to remove the device
from slot. This patch works around that by ejecting the device
when power is removed and power LED goes off.

TODO:
- migration support
- fix dependency on pci_internals.h
   
   If I didn't miss the code,
   - QMP command for pushing attention button.
   - QMP command to get LED status
  
  It's easy to add these, so I'd accept such a patch,
  but I wonder why.
 
 My concern is how libvirt/virt-manager (or another UI) presents
 slot status to operators/users.

They currently present free/busy status just by looking at "info pci".
Maybe that is enough.

My concern is rather with the eject hack above: the add/delete
API maps reasonably to the _EJ0 interface, but isn't generic enough
for SHPC. We'll need a better API for that.

   - QMP events for LED on/off
  
  There's also blink :)
  
   
   thanks,
  
  I'm concerned that a guest can flood the management with such events.
  It's better to send a single LED change event, then we
  can suppress further events until next get LED status command.
 
 Makes sense.
 
  
Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 Makefile.objs |    1 +
 hw/pci.h      |    6 +
 hw/shpc.c     |  646 +
 hw/shpc.h     |   40
 qemu-common.h |    1 +
 5 files changed, 694 insertions(+), 0 deletions(-)
 create mode 100644 hw/shpc.c
 create mode 100644 hw/shpc.h

diff --git a/Makefile.objs b/Makefile.objs
index 391e524..4546477 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -195,6 +195,7 @@ hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o
 hw-obj-y += fw_cfg.o
 hw-obj-$(CONFIG_PCI) += pci.o pci_bridge.o
 hw-obj-$(CONFIG_PCI) += msix.o msi.o
+hw-obj-$(CONFIG_PCI) += shpc.o
 hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o
 hw-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o
 hw-obj-y += watchdog.o
diff --git a/hw/pci.h b/hw/pci.h
index 33b0b18..756577e 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -125,6 +125,9 @@ enum {
 /* command register SERR bit enabled */
 #define QEMU_PCI_CAP_SERR_BITNR 4
 QEMU_PCI_CAP_SERR = (1 << QEMU_PCI_CAP_SERR_BITNR),
+/* Standard hot plug controller. */
+#define QEMU_PCI_SHPC_BITNR 5
+QEMU_PCI_CAP_SHPC = (1 << QEMU_PCI_SHPC_BITNR),
 };
 
 #define TYPE_PCI_DEVICE "pci-device"
@@ -229,6 +232,9 @@ struct PCIDevice {
 /* PCI Express */
 PCIExpressDevice exp;
 
+/* SHPC */
+SHPCDevice *shpc;
+
 /* Location of option rom */
 char *romfile;
 bool has_rom;
diff --git a/hw/shpc.c b/hw/shpc.c
new file mode 100644
index 000..4baec29
--- /dev/null
+++ b/hw/shpc.c
@@ -0,0 +1,646 @@
+#include <strings.h>
+#include <stdint.h>
+#include "range.h"
+#include "shpc.h"
+#include "pci.h"
+#include "pci_internals.h"
+
+/* TODO: model power only and disabled slot states. */
+/* TODO: handle SERR and wakeups */
+/* TODO: consider enabling 66MHz support */
+
+/* TODO: remove fully only on state DISABLED and LED off.
+ * track state to properly record this. */
+
+/* SHPC Working Register Set */
+#define SHPC_BASE_OFFSET  0x00 /* 4 bytes */
+#define SHPC_SLOTS_33 0x04 /* 4 bytes. Also encodes PCI-X slots. */
+#define SHPC_SLOTS_66 

Re: [Android-virt] [PATCH RFC v2 3/3] ARM: KVM: Add support for MMU notifiers

2012-02-13 Thread Christoffer Dall
On Mon, Feb 13, 2012 at 5:13 AM, Marc Zyngier marc.zyng...@arm.com wrote:
 On 12/02/12 01:12, Christoffer Dall wrote:
 On Sat, Feb 11, 2012 at 10:33 AM, Antonios Motakis
 a.mota...@virtualopensystems.com wrote:
 On 02/11/2012 06:35 PM, Christoffer Dall wrote:

 On Sat, Feb 11, 2012 at 7:00 AM, Antonios Motakis
 a.mota...@virtualopensystems.com  wrote:

 On 02/10/2012 11:22 PM, Marc Zyngier wrote:

 +ENTRY(__kvm_tlb_flush_vmid)
 +       hvc     #0                      @ Switch to Hyp mode
 +       push    {r2, r3}

 +       ldrd    r2, r3, [r0, #KVM_VTTBR]
 +       mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
 +       isb
 +       mcr     p15, 0, r0, c8, c7, 0   @ TLBIALL
 +       dsb
 +       isb
 +       mov     r2, #0
 +       mov     r3, #0
 +       mcrr    p15, 6, r2, r3, c2      @ Back to VMID #0
 +       isb
 +
 +       pop     {r2, r3}
 +       hvc     #0                      @ Back to SVC
 +       mov     pc, lr
 +ENDPROC(__kvm_tlb_flush_vmid)


 With the last VMID implementation, you could get the equivalent effect of a
 per-VMID flush by just getting a new VMID for the current VM. So you could
 do a (kvm->arch.vmid = 0) to force a new VMID when the guest reruns, and
 save the overhead of that flush (you will do a complete flush every 255
 times instead of a small one every single time).
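The alternative described above relies on a generation-based VMID allocator: a VM whose VMID was dropped, or taken from an older generation, gets a new one before re-entering the guest, and only wrap-around of the 8-bit VMID space costs a full TLB flush. The sketch below is a hypothetical single-threaded model (`VmidAllocator`, `Vm`, `vmid_ensure()` and the flush counter are assumed names), not the ARM KVM code, and it ignores the cross-CPU IPI question raised in the reply.

```c
#include <stdint.h>

#define VMID_BITS  8
#define VMID_FIRST 1                       /* VMID 0 is left to the host */
#define VMID_LAST  ((1u << VMID_BITS) - 1) /* 255 */

typedef struct {
    uint64_t generation;   /* bumped on every VMID-space wrap-around */
    unsigned next_vmid;    /* next unused VMID in this generation */
    unsigned full_flushes; /* stands in for a flush of all guest TLB entries */
} VmidAllocator;

typedef struct {
    unsigned vmid;         /* 0 = no valid VMID, must allocate */
    uint64_t vmid_gen;     /* generation the VMID was taken from */
} Vm;

static void vmid_alloc_init(VmidAllocator *a)
{
    a->generation = 1;
    a->next_vmid = VMID_FIRST;
    a->full_flushes = 0;
}

/* Instead of a per-VMID flush: drop the VM's VMID, so stale TLB entries
 * tagged with it can no longer be hit by this VM. */
static void vmid_invalidate(Vm *vm)
{
    vm->vmid = 0;
}

/* Called before entering the guest. */
static void vmid_ensure(VmidAllocator *a, Vm *vm)
{
    if (vm->vmid != 0 && vm->vmid_gen == a->generation)
        return;                     /* still valid: nothing to do */
    if (a->next_vmid > VMID_LAST) { /* VMID space exhausted */
        a->generation++;            /* invalidates every VM's VMID */
        a->next_vmid = VMID_FIRST;
        a->full_flushes++;          /* one full flush per wrap-around */
    }
    vm->vmid = a->next_vmid++;
    vm->vmid_gen = a->generation;
}
```

With one VM invalidated on every reclaim, the model performs exactly one full flush per 255 VMID allocations, matching the estimate above.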

 to do this you would need to send an IPI if the guest is currently
 executing on another CPU and make it exit the guest, so that the VMID
 assignment will run before the guest potentially accesses that TLB
 entry that points to the page that was just reclaimed - which I am not
 sure will be better than this solution.

 Don't you have to do this anyway? You'd want the flush to be effective on
 all CPUs before proceeding.

 hmm yeah, actually you do need this. Unless the -IS version of the
 flush instruction covers all relevant cores in this case. Marc, I
 don't think that the processor clearing out the page table entry will
 necessarily belong to the same inner-shareable domain as the processor
 potentially executing the VM, so therefore the -IS flushing version
 would not be sufficient and we actually have to go and send an IPI.

 If we forget about the 11MPCore (which doesn't broadcast the TLB
 invalidation in hardware), the TLBIALLIS operation makes sure all cores
 belonging to the same inner shareable domain will see the TLB
 invalidation at the same time. If they don't, this is a hardware bug.

 Now, I do not have an example of a system where two CPUs are not part of
 the same IS domain. Even big.LITTLE has all of the potential 8 cores in
 an IS domain. If such a system exists one of these days, then it will be
 worth considering having a separate method to cope with the case. Until
 then, my opinion is to keep it as simple as possible.


ok, sounds good to me. Although, perhaps keep this as a comment somewhere...


Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

2012-02-13 Thread Hannes Reinecke
On 02/13/2012 02:18 PM, Michael S. Tsirkin wrote:
 On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote:
 On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin m...@redhat.com wrote:
 On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
 Only if you use the pci multi-function option but that kills
 standard hot unplug

 It doesn't kill it as such, rather you can't unplug luns individually.

 Isn't that just a consequence of the current implementation rather than
 a SCSI limitation?
 
 Yes.
 
 A different way to do hotplug could be to flag all devices as removable
 in the standard INQUIRY data, then
 leave the LUN there persistently and what you remove/add is not the
 LUN device itself but just the media in the device.

 Instead of hot-plug removing the LUN, hot-plug becomes media eject or
 media insert.
 The device remains present all time, you never remove it, but instead
 hot-plug controls if the media is present or not.


 This would require implementing at least START_STOP_UNIT and
 PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.
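A minimal sketch of the media-based approach, assuming a hypothetical per-LUN `MediaState` and handler names; it is not QEMU's scsi-disk code. It uses the standard encodings: the RMB (removable medium) bit is bit 7 of INQUIRY byte 1, START STOP UNIT is opcode 0x1B with LOEJ in bit 1 and START in bit 0 of CDB byte 4, and PREVENT ALLOW MEDIUM REMOVAL is opcode 0x1E with the PREVENT field in bits 1..0 of byte 4. A real implementation would also have to return the proper status and sense data for each outcome.

```c
#include <stdbool.h>
#include <stdint.h>

#define START_STOP_UNIT              0x1b
#define PREVENT_ALLOW_MEDIUM_REMOVAL 0x1e

typedef struct {
    bool medium_present;
    bool removal_prevented;
} MediaState;

typedef enum { CMD_OK, CMD_REMOVAL_PREVENTED, CMD_UNSUPPORTED } CmdResult;

/* INQUIRY byte 1 for a removable device: RMB is bit 7. */
static uint8_t inquiry_byte1_removable(void)
{
    return 0x80;
}

static CmdResult handle_media_cdb(MediaState *m, const uint8_t *cdb)
{
    switch (cdb[0]) {
    case PREVENT_ALLOW_MEDIUM_REMOVAL:
        /* Simplification: any nonzero PREVENT value locks the medium. */
        m->removal_prevented = (cdb[4] & 0x03) != 0;
        return CMD_OK;
    case START_STOP_UNIT: {
        bool loej = (cdb[4] & 0x02) != 0;  /* load/eject */
        bool start = (cdb[4] & 0x01) != 0;
        if (!loej)
            return CMD_OK;                 /* start/stop only, medium untouched */
        if (start) {
            m->medium_present = true;      /* load */
            return CMD_OK;
        }
        if (m->removal_prevented)
            return CMD_REMOVAL_PREVENTED;
        m->medium_present = false;         /* eject */
        return CMD_OK;
    }
    default:
        return CMD_UNSUPPORTED;
    }
}
```

The LUN itself never disappears in this model; hot-unplug becomes clearing `medium_present`, subject to the guest's lock.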


 regards
 ronnie sahlberg
 
 That would work.

Or we simply use the Peripheral Qualifier to indicate that the device is
gone; e.g. we could simply set PQ = 1, return sense code 0x25/00 and be
done with it.
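For reference, the SPC encodings behind this suggestion: byte 0 of the INQUIRY data carries the peripheral qualifier in bits 7..5 and the device type in bits 4..0, and ASC/ASCQ 0x25/0x00 is LOGICAL UNIT NOT SUPPORTED, normally reported with the ILLEGAL REQUEST sense key. The helpers below are hypothetical, not virtio-scsi code; they only show the byte layout, using fixed-format (0x70) sense data.

```c
#include <stdint.h>
#include <string.h>

#define SCSI_PQ_NOT_CONNECTED 0x1  /* PQ = 001b */
#define SCSI_TYPE_DISK        0x00 /* direct-access block device */
#define SENSE_ILLEGAL_REQUEST 0x05
#define ASC_LUN_NOT_SUPPORTED 0x25

/* Byte 0 of standard INQUIRY data: PQ in bits 7..5, device type in 4..0. */
static uint8_t inquiry_byte0(uint8_t pq, uint8_t type)
{
    return (uint8_t)(((pq & 0x7) << 5) | (type & 0x1f));
}

/* Fill an 18-byte fixed-format sense buffer. */
static void build_fixed_sense(uint8_t sense[18], uint8_t key,
                              uint8_t asc, uint8_t ascq)
{
    memset(sense, 0, 18);
    sense[0] = 0x70;       /* current error, fixed format */
    sense[2] = key & 0x0f; /* sense key */
    sense[7] = 10;         /* additional sense length: bytes 8..17 */
    sense[12] = asc;
    sense[13] = ascq;
}
```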

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-02-13 Thread John Fastabend
On 2/10/2012 7:18 AM, jamal wrote:
 Hi John,
 
 I went backwards to summarize at the top after going through your email.
 
 TL;DR version 0.1: 
 you provide a good use case where it makes sense to do things in the
 kernel. IMO, you could make the same argument if your embedded switch
 could do ACLs, IPv4 forwarding etc. And the kernel bloats.
 I am always biased toward moving all policy control to user space instead
 of bloating the kernel.
 
  
 On Thu, 2012-02-09 at 20:14 -0800, John Fastabend wrote:
 

 Hi Jamal,

 The user space app in this case would listen for FDB updates to the SW
 bridge and then mirror them at the embedded NIC. In this case it seems
 easier to just add a notifier chain and let the kernel keep these in
 sync. Otherwise we need a daemon in user space to replicate these.
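A notifier chain in this sense is a list of callbacks the bridge invokes whenever it adds or removes an FDB entry, so an embedded-switch driver can mirror the change synchronously instead of waiting for a user-space daemon. The sketch below is a user-space model with hypothetical names (`fdb_notifier`, `fdb_notify()`, `FDB_ADD`/`FDB_DEL`, `hw_switch`); it is not the kernel notifier API or the RFC patch.

```c
#include <stdint.h>
#include <string.h>

enum fdb_event { FDB_ADD, FDB_DEL };

struct fdb_notifier {
    void (*call)(struct fdb_notifier *nb, enum fdb_event ev,
                 const uint8_t mac[6], int port);
    struct fdb_notifier *next;
};

static struct fdb_notifier *fdb_chain;

static void fdb_register_notifier(struct fdb_notifier *nb)
{
    nb->next = fdb_chain;
    fdb_chain = nb;
}

/* Called by the software bridge when it learns or ages out an entry. */
static void fdb_notify(enum fdb_event ev, const uint8_t mac[6], int port)
{
    for (struct fdb_notifier *nb = fdb_chain; nb; nb = nb->next)
        nb->call(nb, ev, mac, port);
}

/* Example listener: a hypothetical embedded switch with a small table. */
#define HW_FDB_SIZE 16
struct hw_switch {
    struct fdb_notifier nb; /* first member, so the cast below is valid */
    uint8_t mac[HW_FDB_SIZE][6];
    int port[HW_FDB_SIZE];
    int used[HW_FDB_SIZE];
};

static int hw_find(const struct hw_switch *sw, const uint8_t mac[6])
{
    for (int i = 0; i < HW_FDB_SIZE; i++)
        if (sw->used[i] && memcmp(sw->mac[i], mac, 6) == 0)
            return i;
    return -1;
}

static void hw_switch_fdb_event(struct fdb_notifier *nb, enum fdb_event ev,
                                const uint8_t mac[6], int port)
{
    struct hw_switch *sw = (struct hw_switch *)nb;
    int i = hw_find(sw, mac);

    if (ev == FDB_DEL) {
        if (i >= 0)
            sw->used[i] = 0;
        return;
    }
    if (i < 0)              /* new entry: take the first free slot */
        for (i = 0; i < HW_FDB_SIZE && sw->used[i]; i++)
            ;
    if (i < HW_FDB_SIZE) {  /* table full: entry dropped in this sketch */
        memcpy(sw->mac[i], mac, 6);
        sw->port[i] = port;
        sw->used[i] = 1;
    }
}
```

An existing MAC is updated in place (e.g. when a VM moves to another port), so the hardware table never holds two entries for one address.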

 
 A user space daemon if you need to ensure synchronization. That's what I
 meant when I said there was a disadvantage over the simple case when
 the goal is always to synchronize.
 
 On the other hand if you could make the same RTM_NEWNEIGH, RTM_DELNEIGH,
 and RTM_GETNEIGH work for the bridge, embedded bridge, and macvlan you
 would have one common interface to drive these. But the bridge already
 has this protocol/msgtype so that would require either some demux or
 new protocol/msgtype pairs to be created. 

 
 The bridge is very netlink friendly these days. Given the rest of the
 network stack (*NEIGH* you mention above) talks netlink to user space
 it should be workable. 
 
 Let me think on it. I'm tempted by the simplicity of adding notifier
 hooks though.
 
 If something is missing bridge-side it may need to be added (as per
 Stephen's comment); I just took it one step further, indicating those
 notifiers also need to speak netlink.
 

Sure.

 
 Actually because the bridge is adding/removing fdb entries dynamically
 maybe it's best this gets done in the kernel. Here's the example case,
 
 [..]
 

 With the flow by letters above hope this is not too difficult to follow.
 
 (A) veth0 a virtual device transmits packet destined for ethx.y
 (B) SW bridge receives frames and updates FDB flooding to C
 (C) eth0 the PF in this case sends the frame to the HW backed by the
 embedded bridge
 
 Following so far.
 Can you have more than one PF per embedded switch? Or is the intent here
 purely to do VMs/VF separation?
 

The use case here is multiple VFs but the same solution should work with
multiple PFs as well. FDB controls should be independent of how the ports
are exposed: VFs, PFs, VMDQ/queue pairs, macvlan, etc.

 (D) The HW embedded switch has a static entry for ethx.y and forwards
 the frame to the VF or if its a broadcast frame also floods it to
 the wire and ethx.y
 
 nod.
 
 (E) ethx.y receives the frame and generates a response to the dest mac of
 veth0
 
 nod.
 Since you said in #D the entries in the switch are static, I am assuming
 at this point neither ethx.y nor veth0 exist in the embedded FDB.
 
 Now here is the potential issue,

 (G) The frame is transmitted from ethx.y with the destination address of
 veth0, but the embedded switch is not a learning switch. If the FDB
 update is done in user space, it's possible (likely?) that the FDB
 entry for veth0 has not been added to the embedded switch yet.
 
 Ok, got it - so the catch here is the switch is not capable of learning.
 I think this depends on where learning is done. Your intent is to
 use the S/W bridge as something that does the learning for you i.e in
 the kernel. This makes the s/w bridge part of MUST-have-for-this-to-run.
 And that maybe the case for your use case.
 

This is _my_ use case today.

 What if I dont wanna run the S/W bridge at all?
 I've been making the point that with a simple knob (Stephen doesn't like to
 add such a knob), the SW bridge could defer learning to user space.
 [This way you can add a lot of richness, e.g. ACLs restricting which
 MAC addresses are allowed to talk to which ones.]
 But if I bypass the s/w bridge altogether and learn in user space,
 or have a static config with which I populate the embedded switch, I don't
 see the issue.

With events and ADD/DEL/GET FDB controls we can solve both cases. This also
solves Roopa's case with macvlan where he wants to add additional addresses
to macvlan ports.

 
 Now we either have to flood the frame, which is not horrible but not
 ideal, or, worse, if the embedded switch does not support flooding, send
 it to the wire, where veth0 never receives it.
 
 If it is a switch it has to flood, no? Otherwise it sounds broken.
 

Yes, it should flood here, unless it's acting as an 802.1Qbg VEB or VEPA.

 If the SW bridge pushes
 the FDB update down into the embedded switch the address is for
 sure in the embedded switch's forwarding tables and the switching
 works as expected.
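The failure case in steps (D) to (G) comes down to the forwarding decision of a non-learning switch: it only knows the entries software has programmed into it. A hypothetical sketch of that decision (`hw_forward()`, `PORT_FLOOD` and `PORT_UPLINK` are assumed names, not any driver's API):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PORT_UPLINK (-1) /* hypothetical: the external wire */
#define PORT_FLOOD  (-2) /* hypothetical: replicate to all ports */

struct fdb_entry {
    uint8_t mac[6];
    int port;
};

/* Forwarding decision of a non-learning embedded switch. */
static int hw_forward(const struct fdb_entry *fdb, int n,
                      const uint8_t dst[6], bool can_flood)
{
    for (int i = 0; i < n; i++)
        if (memcmp(fdb[i].mac, dst, 6) == 0)
            return fdb[i].port;
    /* Miss, e.g. veth0's MAC was learned by the SW bridge but not yet
     * pushed down: flood if supported, otherwise the frame goes out
     * the wire and veth0 never sees it. */
    return can_flood ? PORT_FLOOD : PORT_UPLINK;
}
```

Pushing the bridge's FDB entry into the table before the reply arrives turns the miss into a hit, which is the synchronization argument made above.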
 
 Yes, there is a small gap between the s/w bridge learning and the
 synchronization happening to the embedded nic switch. That gap gets
 larger if you defer learning to 

Re: x86: kvmclock: abstract save/restore sched_clock_state (v2)

2012-02-13 Thread Igor Mammedov

On 02/13/2012 02:07 PM, Marcelo Tosatti wrote:


Upon resume from hibernation, CPU 0's hvclock area contains the old
values for system_time and tsc_timestamp. It is necessary for the
hypervisor to update these values with up-to-date ones before the CPU uses
them.

Abstract TSC's save/restore sched_clock_state functions and use
restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update.

Also move restore_sched_clock_state before __restore_processor_state,
since the latter calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC).
Thanks to Igor Mammedov for tracking it down.

Fixes suspend-to-disk with kvmclock.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
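The indirection this patch introduces (TSC defaults in the platform ops table, overridden by kvmclock at init time, with callers going through the table) can be modelled in user space as below. Only the member names mirror the patch; `struct platform_ops`, `kvmclock_init_model()`, `suspend_resume_cycle()` and the `events` log are illustrative stand-ins, and the KVM restore hook merely records that the clock would be re-registered.

```c
#include <string.h>

static char events[64]; /* records which implementation ran */

struct platform_ops {
    void (*save_sched_clock_state)(void);
    void (*restore_sched_clock_state)(void);
};

static void tsc_save_sched_clock_state(void)    { strcat(events, "tsc-save;"); }
static void tsc_restore_sched_clock_state(void) { strcat(events, "tsc-restore;"); }

/* KVM: nothing to save; on resume, re-register the clock so the host
 * rewrites system_time/tsc_timestamp (modelled as a log entry). */
static void kvm_save_sched_clock_state(void)    { }
static void kvm_restore_sched_clock_state(void) { strcat(events, "kvm-register;"); }

/* Defaults point at the TSC versions. */
static struct platform_ops platform = {
    .save_sched_clock_state    = tsc_save_sched_clock_state,
    .restore_sched_clock_state = tsc_restore_sched_clock_state,
};

/* Models the override done in kvmclock_init(). */
static void kvmclock_init_model(void)
{
    platform.save_sched_clock_state    = kvm_save_sched_clock_state;
    platform.restore_sched_clock_state = kvm_restore_sched_clock_state;
}

/* Callers go through the table, as save_processor_state() does. */
static void suspend_resume_cycle(void)
{
    platform.save_sched_clock_state();
    platform.restore_sched_clock_state();
}
```

Bare-metal kernels keep the TSC behaviour unchanged, and only a guest that initialised kvmclock gets the re-registration on resume.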

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 15d9915..c91e8b9 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -61,7 +61,7 @@ extern void check_tsc_sync_source(int cpu);
  extern void check_tsc_sync_target(void);

  extern int notsc_setup(char *);
-extern void save_sched_clock_state(void);
-extern void restore_sched_clock_state(void);
+extern void tsc_save_sched_clock_state(void);
+extern void tsc_restore_sched_clock_state(void);

  #endif /* _ASM_X86_TSC_H */
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 5d0afac..baaca8d 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -162,6 +162,8 @@ struct x86_cpuinit_ops {
   * @is_untracked_pat_range exclude from PAT logic
   * @nmi_init  enable NMI on cpus
   * @i8042_detect  pre-detect if i8042 controller exists
+ * @save_sched_clock_state:    save state for sched_clock() on suspend
+ * @restore_sched_clock_state: restore state for sched_clock() on resume
   */
  struct x86_platform_ops {
unsigned long (*calibrate_tsc)(void);
@@ -173,6 +175,8 @@ struct x86_platform_ops {
void (*nmi_init)(void);
unsigned char (*get_nmi_reason)(void);
int (*i8042_detect)(void);
+   void (*save_sched_clock_state)(void);
+   void (*restore_sched_clock_state)(void);
  };

  struct pci_dev;
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index ca4e735..57e6b78 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -136,6 +136,15 @@ int kvm_register_clock(char *txt)
return ret;
  }

+void kvm_save_sched_clock_state(void)
+{
+}
+
+void kvm_restore_sched_clock_state(void)
+{
+   kvm_register_clock("primary cpu clock, resume");
+}
+
  #ifdef CONFIG_X86_LOCAL_APIC
  static void __cpuinit kvm_setup_secondary_clock(void)
  {
@@ -195,6 +204,8 @@ void __init kvmclock_init(void)
x86_cpuinit.early_percpu_clock_init =
kvm_setup_secondary_clock;
  #endif
+   x86_platform.save_sched_clock_state = kvm_save_sched_clock_state;
+   x86_platform.restore_sched_clock_state = kvm_restore_sched_clock_state;
machine_ops.shutdown  = kvm_shutdown;
  #ifdef CONFIG_KEXEC
machine_ops.crash_shutdown  = kvm_crash_shutdown;
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index a62c201..aed2aa1 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -629,7 +629,7 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)

  static unsigned long long cyc2ns_suspend;

-void save_sched_clock_state(void)
+void tsc_save_sched_clock_state(void)
  {
if (!sched_clock_stable)
return;
@@ -645,7 +645,7 @@ void save_sched_clock_state(void)
   * that sched_clock() continues from the point where it was left off during
   * suspend.
   */
-void restore_sched_clock_state(void)
+void tsc_restore_sched_clock_state(void)
  {
unsigned long long offset;
unsigned long flags;
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 6f2ec53..e9f265f 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -108,7 +108,9 @@ struct x86_platform_ops x86_platform = {
.is_untracked_pat_range = is_ISA_range,
.nmi_init   = default_nmi_init,
.get_nmi_reason = default_get_nmi_reason,
-   .i8042_detect   = default_i8042_detect
+   .i8042_detect   = default_i8042_detect,
+   .save_sched_clock_state = tsc_save_sched_clock_state,
+   .restore_sched_clock_state  = tsc_restore_sched_clock_state,
  };

  EXPORT_SYMBOL_GPL(x86_platform);
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index f10c0af..0e76a28 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -114,7 +114,7 @@ static void __save_processor_state(struct saved_context *ctxt)
  void save_processor_state(void)
  {
__save_processor_state(&saved_context);
-   save_sched_clock_state();
+   x86_platform.save_sched_clock_state();
  }
  #ifdef CONFIG_X86_32
  EXPORT_SYMBOL(save_processor_state);
@@ -230,8 +230,8 @@ static void __restore_processor_state(struct saved_context *ctxt)
  /* 

[PATCH RFC] pvclock: Make pv_clock more robust and fixup it if overflow happens

2012-02-13 Thread Igor Mammedov
Instead of hunting mysterious stalls/hangs all over the kernel when an
overflow occurs in pvclock.c:pvclock_get_nsec_offset()

u64 delta = native_read_tsc() - shadow->tsc_timestamp;

and introducing hooks wherever an unexpected access is found, pv_clock
should be initialized for the calling CPU when the overflow condition is
detected.

Signed-off-by: Igor Mammedov <imamm...@redhat.com>
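Why the overflow matters: the delta is computed with unsigned 64-bit arithmetic, so if the current TSC reading is below the shadow tsc_timestamp (for example, stale values left over after resume), the subtraction wraps to a value close to 2^64 instead of going negative, and the scaled nanosecond offset becomes huge. The model below shows the wrap and the `tsc < timestamp` check the patch adds. `scale_delta()` is a simplified stand-in for pvclock_scale_delta() (shift, then multiply by a 32.32 fixed-point factor keeping the high part) and is not the kernel implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of pvclock_scale_delta(): apply the shift, then
 * compute (delta * mul_frac) >> 32 without a 128-bit type by splitting
 * delta into 32-bit halves. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    return (delta >> 32) * mul_frac +
           (((delta & 0xffffffffu) * mul_frac) >> 32);
}

/* Mirrors the patched pvclock_get_nsec_offset(): flag the case where the
 * TSC is behind the timestamp, because then tsc - timestamp has wrapped. */
static uint64_t nsec_offset(uint64_t tsc, uint64_t tsc_timestamp,
                            uint32_t mul_frac, int shift, bool *overflow)
{
    *overflow = tsc < tsc_timestamp;
    return scale_delta(tsc - tsc_timestamp, mul_frac, shift);
}
```

With mul_frac = 0x80000000 (a factor of 0.5) and shift 0, a TSC 2000 cycles ahead gives an offset of 1000 ns, while a TSC 2000 cycles behind gives about 2^63 ns, i.e. roughly 292 years, which is what makes the guest clock jump.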
---
 arch/x86/kernel/pvclock.c |   18 +++---
 1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 42eb330..b486756 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -41,9 +41,14 @@ void pvclock_set_flags(u8 flags)
valid_flags = flags;
 }
 
-static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
+static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow,
+  bool *overflow)
 {
-   u64 delta = native_read_tsc() - shadow->tsc_timestamp;
+   u64 delta;
+   u64 tsc = native_read_tsc();
+   u64 shadow_timestamp = shadow->tsc_timestamp;
+   *overflow = tsc < shadow_timestamp;
+   delta = tsc - shadow_timestamp;
return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
   shadow->tsc_shift);
 }
@@ -94,12 +99,19 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
unsigned version;
cycle_t ret, offset;
u64 last;
+   bool overflow;
 
do {
version = pvclock_get_time_values(&shadow, src);
barrier();
-   offset = pvclock_get_nsec_offset(&shadow);
+   offset = pvclock_get_nsec_offset(&shadow, &overflow);
ret = shadow.system_timestamp + offset;
+   if (unlikely(overflow)) {
+   memset(src, 0, sizeof(*src));
+   barrier();
+   x86_cpuinit.early_percpu_clock_init();
+   continue;
+   }
barrier();
} while (version != src->version);
 
-- 
1.7.7.6



Re: [PATCH 3/3] KVM: perf: kvm events analysis tool

2012-02-13 Thread David Ahern


On 02/13/2012 03:06 AM, Xiao Guangrong wrote:
 On 02/13/2012 01:32 PM, David Ahern wrote:
 
 [sorry for the top post - you would think Android would have a better mail 
 client]

 If the first patch is needed then kvm-events will not work with older, 
 unpatched kernels. That's a big limitation from a perf perspective.

 
 
 The first patch is only needed for code compilation, after kvm-events is
 compiled, you can analyse any kernels. :)

understood.

Now that I recall perf's way of handling out of tree builds, a couple of
comments:

1. you need to add the following to tools/perf/MANIFEST
arch/x86/include/asm/svm.h
arch/x86/include/asm/vmx.h
arch/x86/include/asm/kvm_host.h

2. scripts/checkpatch.pl is an unhappy camper.

I'll take a look at the code and try out the command when I get some time.

David


Re: x86: kvmclock: abstract save/restore sched_clock_state (v2)

2012-02-13 Thread Marcelo Tosatti
On Mon, Feb 13, 2012 at 04:20:24PM +0100, Igor Mammedov wrote:
 On 02/13/2012 02:07 PM, Marcelo Tosatti wrote:
 
 Upon resume from hibernation, CPU 0's hvclock area contains the old
 values for system_time and tsc_timestamp. It is necessary for the
 hypervisor to update these values with up-to-date ones before the CPU uses
 them.
 
 Abstract TSC's save/restore sched_clock_state functions and use
 restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update.
 
 Also move restore_sched_clock_state before __restore_processor_state,
 since the latter calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC).
 Thanks to Igor Mammedov for tracking it down.
 
 Fixes suspend-to-disk with kvmclock.
 
 Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
 
 diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
 index 15d9915..c91e8b9 100644
 --- a/arch/x86/include/asm/tsc.h
 +++ b/arch/x86/include/asm/tsc.h
 @@ -61,7 +61,7 @@ extern void check_tsc_sync_source(int cpu);
   extern void check_tsc_sync_target(void);
 
   extern int notsc_setup(char *);
 -extern void save_sched_clock_state(void);
 -extern void restore_sched_clock_state(void);
 +extern void tsc_save_sched_clock_state(void);
 +extern void tsc_restore_sched_clock_state(void);
 
   #endif /* _ASM_X86_TSC_H */
 diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
 index 5d0afac..baaca8d 100644
 --- a/arch/x86/include/asm/x86_init.h
 +++ b/arch/x86/include/asm/x86_init.h
 @@ -162,6 +162,8 @@ struct x86_cpuinit_ops {
* @is_untracked_pat_range exclude from PAT logic
* @nmi_init   enable NMI on cpus
* @i8042_detect   pre-detect if i8042 controller exists
 + * @save_sched_clock_state: save state for sched_clock() on suspend
 + * @restore_sched_clock_state:  restore state for sched_clock() on resume
*/
   struct x86_platform_ops {
  unsigned long (*calibrate_tsc)(void);
 @@ -173,6 +175,8 @@ struct x86_platform_ops {
  void (*nmi_init)(void);
  unsigned char (*get_nmi_reason)(void);
  int (*i8042_detect)(void);
 +void (*save_sched_clock_state)(void);
 +void (*restore_sched_clock_state)(void);
   };
 
   struct pci_dev;
 diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
 index ca4e735..57e6b78 100644
 --- a/arch/x86/kernel/kvmclock.c
 +++ b/arch/x86/kernel/kvmclock.c
 @@ -136,6 +136,15 @@ int kvm_register_clock(char *txt)
  return ret;
   }
 
 +void kvm_save_sched_clock_state(void)
 +{
 +}
 +
 +void kvm_restore_sched_clock_state(void)
 +{
 +kvm_register_clock("primary cpu clock, resume");
 +}
 +
   #ifdef CONFIG_X86_LOCAL_APIC
   static void __cpuinit kvm_setup_secondary_clock(void)
   {
 @@ -195,6 +204,8 @@ void __init kvmclock_init(void)
  x86_cpuinit.early_percpu_clock_init =
  kvm_setup_secondary_clock;
   #endif
 +x86_platform.save_sched_clock_state = kvm_save_sched_clock_state;
 +x86_platform.restore_sched_clock_state = kvm_restore_sched_clock_state;
  machine_ops.shutdown  = kvm_shutdown;
   #ifdef CONFIG_KEXEC
  machine_ops.crash_shutdown  = kvm_crash_shutdown;
 diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
 index a62c201..aed2aa1 100644
 --- a/arch/x86/kernel/tsc.c
 +++ b/arch/x86/kernel/tsc.c
 @@ -629,7 +629,7 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 
   static unsigned long long cyc2ns_suspend;
 
 -void save_sched_clock_state(void)
 +void tsc_save_sched_clock_state(void)
   {
  if (!sched_clock_stable)
  return;
 @@ -645,7 +645,7 @@ void save_sched_clock_state(void)
* that sched_clock() continues from the point where it was left off during
* suspend.
*/
 -void restore_sched_clock_state(void)
 +void tsc_restore_sched_clock_state(void)
   {
  unsigned long long offset;
  unsigned long flags;
 diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
 index 6f2ec53..e9f265f 100644
 --- a/arch/x86/kernel/x86_init.c
 +++ b/arch/x86/kernel/x86_init.c
 @@ -108,7 +108,9 @@ struct x86_platform_ops x86_platform = {
  .is_untracked_pat_range = is_ISA_range,
  .nmi_init   = default_nmi_init,
  .get_nmi_reason = default_get_nmi_reason,
 -.i8042_detect   = default_i8042_detect
 +.i8042_detect   = default_i8042_detect,
 +.save_sched_clock_state = tsc_save_sched_clock_state,
 +.restore_sched_clock_state  = tsc_restore_sched_clock_state,
   };
 
   EXPORT_SYMBOL_GPL(x86_platform);
 diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
 index f10c0af..0e76a28 100644
 --- a/arch/x86/power/cpu.c
 +++ b/arch/x86/power/cpu.c
 @@ -114,7 +114,7 @@ static void __save_processor_state(struct saved_context *ctxt)
   void save_processor_state(void)
   {
  __save_processor_state(&saved_context);
 -save_sched_clock_state();
 +x86_platform.save_sched_clock_state();
   }
   #ifdef CONFIG_X86_32
   

Re: x86: kvmclock: abstract save/restore sched_clock_state (v2)

2012-02-13 Thread Amit Shah
On (Mon) 13 Feb 2012 [11:07:27], Marcelo Tosatti wrote:
 
 Upon resume from hibernation, CPU 0's hvclock area contains the old
 values for system_time and tsc_timestamp. It is necessary for the
 hypervisor to update these values with up-to-date ones before the CPU uses
 them.
 
 Abstract TSC's save/restore sched_clock_state functions and use
 restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update.
 
 Also move restore_sched_clock_state before __restore_processor_state,
 since the latter calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC).
 Thanks to Igor Mammedov for tracking it down.
 
 Fixes suspend-to-disk with kvmclock.
 
 Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

This works fine, thanks.

Tested-by: Amit Shah <amit.s...@redhat.com>

Amit


[PATCH v3 00/15] SCSI s/g + SCSI migration + virtio-scsi

2012-02-13 Thread Paolo Bonzini
Here is v3 of the virtio-scsi driver.  Changes are:

- the virtio id is now 8, to fix a conflict in the virtio spec;

- rebased for QOM;

- changed the resid type to size_t following Stefan's advice;

- fixed sense length (patch from Christian Hoff).

The spec has been committed by Rusty (version 0.9.4), and SCSI maintainers
should be okay with including it in the 3.4 kernel.

Paolo Bonzini (13):
  dma-helpers: make QEMUSGList target independent
  dma-helpers: add dma_buf_read and dma_buf_write
  dma-helpers: add accounting wrappers
  ahci: use new DMA helpers
  scsi: pass residual amount to command_complete
  scsi: add scatter/gather functionality
  scsi-disk: enable scatter/gather functionality
  scsi: add SCSIDevice vmstate definitions
  scsi-generic: add migration support
  scsi-disk: add migration support
  virtio-scsi: add basic SCSI bus operation
  virtio-scsi: process control queue requests
  virtio-scsi: add migration support

Stefan Hajnoczi (2):
  virtio-scsi: Add virtio-scsi stub device
  virtio-scsi: Add basic request processing infrastructure

 Makefile.target                   |    1 +
 default-configs/pci.mak           |    1 +
 default-configs/s390x-softmmu.mak |    1 +
 dma-helpers.c                     |   36 +++
 dma.h                             |   20 +-
 hw/esp.c                          |    3 +-
 hw/ide/ahci.c                     |   82 +-
 hw/lsi53c895a.c                   |    2 +-
 hw/pci.h                          |    1 +
 hw/s390-virtio-bus.c              |   34 ++
 hw/s390-virtio-bus.h              |    4 +-
 hw/scsi-bus.c                     |  142 +-
 hw/scsi-disk.c                    |  120 +++-
 hw/scsi-generic.c                 |   25 ++
 hw/scsi.h                         |   22 ++-
 hw/spapr_vscsi.c                  |    2 +-
 hw/usb-msd.c                      |    2 +-
 hw/virtio-pci.c                   |   56 
 hw/virtio-pci.h                   |    2 +
 hw/virtio-scsi.c                  |  607 +
 hw/virtio-scsi.h                  |   36 +++
 hw/virtio.h                       |    3 +
 22 files changed, 1098 insertions(+), 104 deletions(-)
 create mode 100644 hw/virtio-scsi.c
 create mode 100644 hw/virtio-scsi.h

-- 
1.7.7.6



[PATCH v3 01/15] dma-helpers: make QEMUSGList target independent

2012-02-13 Thread Paolo Bonzini
scsi-disk will manage scatter/gather lists, but it does not create
individual entries, so it remains target-independent.  Make QEMUSGList
available to it.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 dma.h |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/dma.h b/dma.h
index a13209d..d50019b 100644
--- a/dma.h
+++ b/dma.h
@@ -17,6 +17,13 @@
 
 typedef struct ScatterGatherEntry ScatterGatherEntry;
 
+struct QEMUSGList {
+    ScatterGatherEntry *sg;
+    int nsg;
+    int nalloc;
+    size_t size;
+};
+
 #if defined(TARGET_PHYS_ADDR_BITS)
 typedef target_phys_addr_t dma_addr_t;
 
@@ -32,13 +39,6 @@ struct ScatterGatherEntry {
 dma_addr_t len;
 };
 
-struct QEMUSGList {
-ScatterGatherEntry *sg;
-int nsg;
-int nalloc;
-dma_addr_t size;
-};
-
 void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint);
 void qemu_sglist_add(QEMUSGList *qsg, dma_addr_t base, dma_addr_t len);
 void qemu_sglist_destroy(QEMUSGList *qsg);
-- 
1.7.7.6
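Moving the struct out of the `TARGET_PHYS_ADDR_BITS` block works because QEMUSGList only holds a pointer to ScatterGatherEntry, so target-independent code can use the list while the entry type stays incomplete there. Below is a minimal standalone sketch, assuming `uint64_t` as a stand-in for dma_addr_t and hypothetical `sglist_*` helpers that mirror the prototypes in dma.h. It is not QEMU's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Visible to target-independent code: the entry type is only declared. */
typedef struct ScatterGatherEntry ScatterGatherEntry;

typedef struct QEMUSGList {
    ScatterGatherEntry *sg;   /* pointer to an incomplete type is fine */
    int nsg;                  /* entries in use */
    int nalloc;               /* entries allocated */
    size_t size;              /* total bytes: no target-sized type needed */
} QEMUSGList;

/* Target-dependent part; uint64_t stands in for dma_addr_t here. */
typedef uint64_t dma_addr_t;
struct ScatterGatherEntry {
    dma_addr_t base;
    dma_addr_t len;
};

void sglist_init(QEMUSGList *qsg, int alloc_hint)
{
    assert(alloc_hint > 0);
    qsg->sg = malloc((size_t)alloc_hint * sizeof(ScatterGatherEntry));
    assert(qsg->sg != NULL);
    qsg->nsg = 0;
    qsg->nalloc = alloc_hint;
    qsg->size = 0;
}

void sglist_add(QEMUSGList *qsg, dma_addr_t base, dma_addr_t len)
{
    if (qsg->nsg == qsg->nalloc) {
        /* Grow the array geometrically when it is full. */
        qsg->nalloc = 2 * qsg->nalloc + 1;
        qsg->sg = realloc(qsg->sg, (size_t)qsg->nalloc * sizeof(ScatterGatherEntry));
        assert(qsg->sg != NULL);
    }
    qsg->sg[qsg->nsg].base = base;
    qsg->sg[qsg->nsg].len = len;
    qsg->nsg++;
    qsg->size += len;         /* running total used by later helpers */
}

void sglist_destroy(QEMUSGList *qsg)
{
    free(qsg->sg);
    qsg->sg = NULL;
    qsg->nsg = qsg->nalloc = 0;
    qsg->size = 0;
}
```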




[PATCH v3 02/15] dma-helpers: add dma_buf_read and dma_buf_write

2012-02-13 Thread Paolo Bonzini
These helpers perform a full transfer between an in-memory buffer and
target memory, with support for scatter/gather lists.  They will be used
to store the reply of an emulated command into a QEMUSGList provided by
the adapter.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 dma-helpers.c |   30 ++
 dma.h         |    3 +++
 2 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index f08cdb5..f53a51f 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -204,3 +204,33 @@ BlockDriverAIOCB *dma_bdrv_write(BlockDriverState *bs,
 {
     return dma_bdrv_io(bs, sg, sector, bdrv_aio_writev, cb, opaque, true);
 }
+
+
+static uint64_t dma_buf_rw(uint8_t *ptr, int32_t len, QEMUSGList *sg, bool to_dev)
+{
+    uint64_t resid;
+    int sg_cur_index;
+
+    resid = sg->size;
+    sg_cur_index = 0;
+    len = MIN(len, resid);
+    while (len > 0) {
+        ScatterGatherEntry entry = sg->sg[sg_cur_index++];
+        cpu_physical_memory_rw(entry.base, ptr, MIN(len, entry.len), !to_dev);
+        ptr += entry.len;
+        len -= entry.len;
+        resid -= entry.len;
+    }
+
+    return resid;
+}
+
+uint64_t dma_buf_read(uint8_t *ptr, int32_t len, QEMUSGList *sg)
+{
+    return dma_buf_rw(ptr, len, sg, 0);
+}
+
+uint64_t dma_buf_write(uint8_t *ptr, int32_t len, QEMUSGList *sg)
+{
+    return dma_buf_rw(ptr, len, sg, 1);
+}
diff --git a/dma.h b/dma.h
index d50019b..346ac4f 100644
--- a/dma.h
+++ b/dma.h
@@ -58,4 +58,7 @@ BlockDriverAIOCB *dma_bdrv_read(BlockDriverState *bs,
 BlockDriverAIOCB *dma_bdrv_write(BlockDriverState *bs,
  QEMUSGList *sg, uint64_t sector,
  BlockDriverCompletionFunc *cb, void *opaque);
+uint64_t dma_buf_read(uint8_t *ptr, int32_t len, QEMUSGList *sg);
+uint64_t dma_buf_write(uint8_t *ptr, int32_t len, QEMUSGList *sg);
+
 #endif
-- 
1.7.7.6
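The following standalone model shows how the residual returned by dma_buf_rw() behaves. A flat array stands in for guest memory, and `phys_rw()` stands in for cpu_physical_memory_rw(); all names are hypothetical. The sketch differs from the hunk above in one deliberate way: it advances by `xfer`, the bytes actually copied, rather than by `entry.len`. When the buffer ends partway through an entry, subtracting the full `entry.len` appears to under-report the residual. This version returns exactly the number of untransferred bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint64_t base, len; } SGEntry;
typedef struct { SGEntry *sg; int nsg; uint64_t size; } SGList;

/* Flat array standing in for guest physical memory. */
static uint8_t guest_mem[65536];

/* Stand-in for cpu_physical_memory_rw(): is_write copies buf -> guest. */
static void phys_rw(uint64_t addr, uint8_t *buf, int32_t len, bool is_write)
{
    if (is_write) {
        memcpy(&guest_mem[addr], buf, (size_t)len);
    } else {
        memcpy(buf, &guest_mem[addr], (size_t)len);
    }
}

/* Copies up to len bytes between ptr and the list; returns the number of
 * list bytes that were NOT transferred. */
uint64_t buf_rw(uint8_t *ptr, int32_t len, SGList *sg, bool to_dev)
{
    uint64_t resid = sg->size;
    int i = 0;

    if ((uint64_t)len > resid) {
        len = (int32_t)resid;
    }
    while (len > 0) {
        SGEntry entry = sg->sg[i++];
        /* Never copy more than remains in the buffer. */
        int32_t xfer = (uint64_t)len < entry.len ? len : (int32_t)entry.len;
        phys_rw(entry.base, ptr, xfer, !to_dev);
        ptr += xfer;
        len -= xfer;
        resid -= xfer;
    }
    return resid;
}
```

In the test scenario, a 6-byte buffer against a 12-byte list leaves a residual of 6. Advancing by `entry.len` instead would report 0.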




[PATCH v3 03/15] dma-helpers: add accounting wrappers

2012-02-13 Thread Paolo Bonzini
The length of the transfer is already in the sglist; the wrapper simply
fetches it.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 dma-helpers.c |    6 ++
 dma.h         |    3 +++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index f53a51f..a773489 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -234,3 +234,9 @@ uint64_t dma_buf_write(uint8_t *ptr, int32_t len, QEMUSGList *sg)
 {
     return dma_buf_rw(ptr, len, sg, 1);
 }
+
+void dma_acct_start(BlockDriverState *bs, BlockAcctCookie *cookie,
+                    QEMUSGList *sg, enum BlockAcctType type)
+{
+    bdrv_acct_start(bs, cookie, sg->size, type);
+}
diff --git a/dma.h b/dma.h
index 346ac4f..20e86d2 100644
--- a/dma.h
+++ b/dma.h
@@ -61,4 +61,7 @@ BlockDriverAIOCB *dma_bdrv_write(BlockDriverState *bs,
 uint64_t dma_buf_read(uint8_t *ptr, int32_t len, QEMUSGList *sg);
 uint64_t dma_buf_write(uint8_t *ptr, int32_t len, QEMUSGList *sg);
 
+void dma_acct_start(BlockDriverState *bs, BlockAcctCookie *cookie,
+                    QEMUSGList *sg, enum BlockAcctType type);
+
 #endif
-- 
1.7.7.6




[PATCH v3 04/15] ahci: use new DMA helpers

2012-02-13 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/ide/ahci.c |   82 +
 1 files changed, 13 insertions(+), 69 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index c87a6ca..25ed844 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -426,55 +426,6 @@ static void ahci_reg_init(AHCIState *s)
 }
 }
 
-static uint32_t read_from_sglist(uint8_t *buffer, uint32_t len,
-                                 QEMUSGList *sglist)
-{
-    uint32_t i = 0;
-    uint32_t total = 0, once;
-    ScatterGatherEntry *cur_prd;
-    uint32_t sgcount;
-
-    cur_prd = sglist->sg;
-    sgcount = sglist->nsg;
-    for (i = 0; len && sgcount; i++) {
-        once = MIN(cur_prd->len, len);
-        cpu_physical_memory_read(cur_prd->base, buffer, once);
-        cur_prd++;
-        sgcount--;
-        len -= once;
-        buffer += once;
-        total += once;
-    }
-
-    return total;
-}
-
-static uint32_t write_to_sglist(uint8_t *buffer, uint32_t len,
-                                QEMUSGList *sglist)
-{
-    uint32_t i = 0;
-    uint32_t total = 0, once;
-    ScatterGatherEntry *cur_prd;
-    uint32_t sgcount;
-
-    DPRINTF(-1, "total: 0x%x bytes\n", len);
-
-    cur_prd = sglist->sg;
-    sgcount = sglist->nsg;
-    for (i = 0; len && sgcount; i++) {
-        once = MIN(cur_prd->len, len);
-        DPRINTF(-1, "write 0x%x bytes to 0x%lx\n", once, (long)cur_prd->base);
-        cpu_physical_memory_write(cur_prd->base, buffer, once);
-        cur_prd++;
-        sgcount--;
-        len -= once;
-        buffer += once;
-        total += once;
-    }
-
-    return total;
-}
-
 static void check_cmd(AHCIState *s, int port)
 {
     AHCIPortRegs *pr = &s->dev[port].port_regs;
@@ -795,9 +746,8 @@ static void process_ncq_command(AHCIState *s, int port, uint8_t *cmd_fis,
         DPRINTF(port, "tag %d aio read %"PRId64"\n",
                 ncq_tfs->tag, ncq_tfs->lba);
 
-        bdrv_acct_start(ncq_tfs->drive->port.ifs[0].bs, &ncq_tfs->acct,
-                        (ncq_tfs->sector_count-1) * BDRV_SECTOR_SIZE,
-                        BDRV_ACCT_READ);
+        dma_acct_start(ncq_tfs->drive->port.ifs[0].bs, &ncq_tfs->acct,
+                       &ncq_tfs->sglist, BDRV_ACCT_READ);
         ncq_tfs->aiocb = dma_bdrv_read(ncq_tfs->drive->port.ifs[0].bs,
                                        &ncq_tfs->sglist, ncq_tfs->lba,
                                        ncq_cb, ncq_tfs);
@@ -809,9 +759,8 @@ static void process_ncq_command(AHCIState *s, int port, uint8_t *cmd_fis,
         DPRINTF(port, "tag %d aio write %"PRId64"\n",
                 ncq_tfs->tag, ncq_tfs->lba);
 
-        bdrv_acct_start(ncq_tfs->drive->port.ifs[0].bs, &ncq_tfs->acct,
-                        (ncq_tfs->sector_count-1) * BDRV_SECTOR_SIZE,
-                        BDRV_ACCT_WRITE);
+        dma_acct_start(ncq_tfs->drive->port.ifs[0].bs, &ncq_tfs->acct,
+                       &ncq_tfs->sglist, BDRV_ACCT_WRITE);
         ncq_tfs->aiocb = dma_bdrv_write(ncq_tfs->drive->port.ifs[0].bs,
                                         &ncq_tfs->sglist, ncq_tfs->lba,
                                         ncq_cb, ncq_tfs);
@@ -1016,12 +965,12 @@ static int ahci_start_transfer(IDEDMA *dma)
             is_write ? "writ" : "read", size, is_atapi ? "atapi" : "ata",
             has_sglist ? "" : "o");
 
-    if (is_write && has_sglist && (s->data_ptr < s->data_end)) {
-        read_from_sglist(s->data_ptr, size, &s->sg);
-    }
-
-    if (!is_write && has_sglist && (s->data_ptr < s->data_end)) {
-        write_to_sglist(s->data_ptr, size, &s->sg);
+    if (has_sglist && size) {
+        if (is_write) {
+            dma_buf_write(s->data_ptr, size, &s->sg);
+        } else {
+            dma_buf_read(s->data_ptr, size, &s->sg);
+        }
     }
 
 /* update number of transferred bytes */
@@ -1060,14 +1009,9 @@ static int ahci_dma_prepare_buf(IDEDMA *dma, int is_write)
 {
     AHCIDevice *ad = DO_UPCAST(AHCIDevice, dma, dma);
     IDEState *s = &ad->port.ifs[0];
-    int i;
 
     ahci_populate_sglist(ad, &s->sg);
-
-    s->io_buffer_size = 0;
-    for (i = 0; i < s->sg.nsg; i++) {
-        s->io_buffer_size += s->sg.sg[i].len;
-    }
+    s->io_buffer_size = s->sg.size;
 
     DPRINTF(ad->port_no, "len=%#x\n", s->io_buffer_size);
     return s->io_buffer_size != 0;
@@ -1085,9 +1029,9 @@ static int ahci_dma_rw_buf(IDEDMA *dma, int is_write)
     }
 
     if (is_write) {
-        write_to_sglist(p, l, &s->sg);
+        dma_buf_read(p, l, &s->sg);
     } else {
-        read_from_sglist(p, l, &s->sg);
+        dma_buf_write(p, l, &s->sg);
     }
 
 /* update number of transferred bytes */
-- 
1.7.7.6




[PATCH v3 06/15] scsi: add scatter/gather functionality

2012-02-13 Thread Paolo Bonzini
Scatter/gather functionality uses the newly added DMA helpers.  The
device can choose between doing DMA itself or calling scsi_req_data
as usual; in the latter case, scsi_req_data uses the same helpers to
copy the data piecewise to/from the destination area(s).

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi-bus.c |   28 ++--
 hw/scsi.h     |    3 +++
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 6a069f4..69cb3fc 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -5,6 +5,7 @@
 #include "qdev.h"
 #include "blockdev.h"
 #include "trace.h"
+#include "dma.h"
 
 static char *scsibus_get_fw_dev_path(DeviceState *dev);
 static int scsi_req_parse(SCSICommand *cmd, SCSIDevice *dev, uint8_t *buf);
@@ -651,6 +652,11 @@ int32_t scsi_req_enqueue(SCSIRequest *req)
 
     assert(!req->enqueued);
     scsi_req_ref(req);
+    if (req->bus->info->get_sg_list) {
+        req->sg = req->bus->info->get_sg_list(req);
+    } else {
+        req->sg = NULL;
+    }
     req->enqueued = true;
     QTAILQ_INSERT_TAIL(&req->dev->requests, req, next);
 
@@ -1275,14 +1281,32 @@ void scsi_req_continue(SCSIRequest *req)
Once it completes, calling scsi_req_continue will restart I/O.  */
 void scsi_req_data(SCSIRequest *req, int len)
 {
+    uint8_t *buf;
     if (req->io_canceled) {
         trace_scsi_req_data_canceled(req->dev->id, req->lun, req->tag, len);
         return;
     }
     trace_scsi_req_data(req->dev->id, req->lun, req->tag, len);
     assert(req->cmd.mode != SCSI_XFER_NONE);
-    req->resid -= len;
-    req->bus->info->transfer_data(req, len);
+    if (!req->sg) {
+        req->resid -= len;
+        req->bus->info->transfer_data(req, len);
+        return;
+    }
+
+    /* If the device calls scsi_req_data and the HBA specified a
+     * scatter/gather list, the transfer has to happen in a single
+     * step.  */
+    assert(!req->dma_started);
+    req->dma_started = true;
+
+    buf = scsi_req_get_buf(req);
+    if (req->cmd.mode == SCSI_XFER_FROM_DEV) {
+        req->resid = dma_buf_read(buf, len, req->sg);
+    } else {
+        req->resid = dma_buf_write(buf, len, req->sg);
+    }
+    scsi_req_continue(req);
 }
 
 void scsi_req_print(SCSIRequest *req)
diff --git a/hw/scsi.h b/hw/scsi.h
index e1c52d2..811f61c 100644
--- a/hw/scsi.h
+++ b/hw/scsi.h
@@ -49,6 +49,8 @@ struct SCSIRequest {
     size_t            resid;
     SCSICommand       cmd;
     BlockDriverAIOCB  *aiocb;
+    QEMUSGList        *sg;
+    bool              dma_started;
     uint8_t sense[SCSI_SENSE_BUF_SIZE];
     uint32_t sense_len;
     bool enqueued;
@@ -115,6 +117,7 @@ struct SCSIBusInfo {
 void (*transfer_data)(SCSIRequest *req, uint32_t arg);
 void (*complete)(SCSIRequest *req, uint32_t arg, size_t resid);
 void (*cancel)(SCSIRequest *req);
+QEMUSGList *(*get_sg_list)(SCSIRequest *req);
 };
 
 struct SCSIBus {
-- 
1.7.7.6
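The new branch in scsi_req_data() can be read as a small state machine. The sketch below is a simplified, hypothetical model, not QEMU code; `struct request`, `sg_copy()` and `req_data()` are illustrative names. A request either goes through the HBA's piecewise transfer_data callback, or, when the HBA supplied a list, finishes its data phase in one step and immediately continues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum xfer_mode { XFER_NONE, XFER_FROM_DEV, XFER_TO_DEV };

/* Hypothetical reduced request: only the fields scsi_req_data() touches. */
struct request {
    enum xfer_mode mode;
    size_t resid;           /* bytes not yet transferred */
    bool has_sg;            /* did the HBA supply a scatter/gather list? */
    size_t sg_size;         /* total size of that list */
    bool dma_started;
    int transfer_calls;     /* transfer_data() callbacks issued */
    int continue_calls;     /* scsi_req_continue() calls issued */
};

/* Stand-in for dma_buf_read/dma_buf_write: returns what was not copied. */
static size_t sg_copy(const struct request *req, size_t len)
{
    return req->sg_size - (len < req->sg_size ? len : req->sg_size);
}

void req_data(struct request *req, size_t len)
{
    assert(req->mode != XFER_NONE);
    if (!req->has_sg) {
        /* Old path: the HBA moves the data chunk by chunk. */
        req->resid -= len;
        req->transfer_calls++;
        return;
    }
    /* With an HBA-provided list, the whole transfer happens at once. */
    assert(!req->dma_started);
    req->dma_started = true;
    req->resid = sg_copy(req, len);
    req->continue_calls++;  /* immediately move on to the next phase */
}
```

The `dma_started` assertion captures the patch's rule that a device may call scsi_req_data only once per request when a list is present.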




[PATCH v3 07/15] scsi-disk: enable scatter/gather functionality

2012-02-13 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi-bus.c  |    1 +
 hw/scsi-disk.c |   63 ---
 2 files changed, 51 insertions(+), 13 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 69cb3fc..817aa49 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -87,6 +87,7 @@ static void scsi_dma_restart_bh(void *opaque)
                 scsi_req_continue(req);
                 break;
             case SCSI_XFER_NONE:
+                assert(!req->sg);
                 scsi_req_dequeue(req);
                 scsi_req_enqueue(req);
                 break;
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 399e51e..0e4d6ad 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -38,6 +38,7 @@ do { fprintf(stderr, "scsi-disk: " fmt , ## __VA_ARGS__); } while (0)
 #include "sysemu.h"
 #include "blockdev.h"
 #include "block_int.h"
+#include "dma.h"
 
 #ifdef __linux
 #include <scsi/sg.h>
@@ -123,6 +124,27 @@ static uint32_t scsi_init_iovec(SCSIDiskReq *r)
     return r->qiov.size / 512;
 }
 
+static void scsi_dma_complete(void *opaque, int ret)
+{
+    SCSIDiskReq *r = (SCSIDiskReq *)opaque;
+    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+    bdrv_acct_done(s->qdev.conf.bs, &r->acct);
+
+    if (ret) {
+        if (scsi_handle_rw_error(r, -ret)) {
+            goto done;
+        }
+    }
+
+    r->sector += r->sector_count;
+    r->sector_count = 0;
+    scsi_req_complete(&r->req, GOOD);
+
+done:
+    scsi_req_unref(&r->req);
+}
+
 static void scsi_read_complete(void * opaque, int ret)
 {
 SCSIDiskReq *r = (SCSIDiskReq *)opaque;
@@ -213,10 +235,17 @@ static void scsi_read_data(SCSIRequest *req)
         return;
     }
 
-    n = scsi_init_iovec(r);
-    bdrv_acct_start(s->qdev.conf.bs, &r->acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_READ);
-    r->req.aiocb = bdrv_aio_readv(s->qdev.conf.bs, r->sector, &r->qiov, n,
-                                  scsi_read_complete, r);
+    if (r->req.sg) {
+        dma_acct_start(s->qdev.conf.bs, &r->acct, r->req.sg, BDRV_ACCT_READ);
+        r->req.resid -= r->req.sg->size;
+        r->req.aiocb = dma_bdrv_read(s->qdev.conf.bs, r->req.sg, r->sector,
+                                     scsi_dma_complete, r);
+    } else {
+        n = scsi_init_iovec(r);
+        bdrv_acct_start(s->qdev.conf.bs, &r->acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_READ);
+        r->req.aiocb = bdrv_aio_readv(s->qdev.conf.bs, r->sector, &r->qiov, n,
+                                      scsi_read_complete, r);
+    }
 }
 
 /*
@@ -315,18 +344,26 @@ static void scsi_write_data(SCSIRequest *req)
         return;
     }
 
-    n = r->qiov.size / 512;
-    if (n) {
-        if (s->tray_open) {
-            scsi_write_complete(r, -ENOMEDIUM);
-            return;
-        }
+    if (!r->req.sg && !r->qiov.size) {
+        /* Called for the first time.  Ask the driver to send us more data.  */
+        scsi_write_complete(r, 0);
+        return;
+    }
+    if (s->tray_open) {
+        scsi_write_complete(r, -ENOMEDIUM);
+        return;
+    }
+
+    if (r->req.sg) {
+        dma_acct_start(s->qdev.conf.bs, &r->acct, r->req.sg, BDRV_ACCT_WRITE);
+        r->req.resid -= r->req.sg->size;
+        r->req.aiocb = dma_bdrv_write(s->qdev.conf.bs, r->req.sg, r->sector,
+                                      scsi_dma_complete, r);
+    } else {
+        n = r->qiov.size / 512;
         bdrv_acct_start(s->qdev.conf.bs, &r->acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_WRITE);
         r->req.aiocb = bdrv_aio_writev(s->qdev.conf.bs, r->sector, &r->qiov, n,
                                        scsi_write_complete, r);
-    } else {
-        /* Called for the first time.  Ask the driver to send us more data.  */
-        scsi_write_complete(r, 0);
     }
 }
 
-- 
1.7.7.6
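The reworked scsi_write_data() checks its conditions in a fixed order. The hypothetical pure function below captures that order and can help when reasoning about which path a request takes. The enum and function names are illustrative, not QEMU identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Possible outcomes of one scsi_write_data() call in the reworked code. */
enum write_action {
    ASK_FOR_DATA,      /* scsi_write_complete(r, 0): request the first chunk */
    FAIL_NO_MEDIUM,    /* scsi_write_complete(r, -ENOMEDIUM) */
    SUBMIT_DMA,        /* dma_bdrv_write() on the HBA's scatter/gather list */
    SUBMIT_IOVEC,      /* bdrv_aio_writev() on the bounce buffer */
};

/* Hypothetical pure function mirroring the order of the checks. */
enum write_action write_data_action(bool has_sg, size_t qiov_size, bool tray_open)
{
    if (!has_sg && qiov_size == 0) {
        return ASK_FOR_DATA;
    }
    if (tray_open) {
        return FAIL_NO_MEDIUM;
    }
    return has_sg ? SUBMIT_DMA : SUBMIT_IOVEC;
}
```

With a scatter/gather list there is no initial "ask for data" round trip. The data already sits in guest memory described by the list, so the first call goes straight to the tray check and DMA submission.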




[PATCH v3 08/15] scsi: add SCSIDevice vmstate definitions

2012-02-13 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi-bus.c |  107 +++--
 hw/scsi.h     |   16 
 2 files changed, 120 insertions(+), 3 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 817aa49..15841d0 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -647,10 +647,8 @@ void scsi_req_build_sense(SCSIRequest *req, SCSISense sense)
     req->sense_len = 18;
 }
 
-int32_t scsi_req_enqueue(SCSIRequest *req)
+static void scsi_req_enqueue_internal(SCSIRequest *req)
 {
-    int32_t rc;
-
     assert(!req->enqueued);
     scsi_req_ref(req);
     if (req->bus->info->get_sg_list) {
@@ -660,7 +658,14 @@ int32_t scsi_req_enqueue(SCSIRequest *req)
     }
     req->enqueued = true;
     QTAILQ_INSERT_TAIL(&req->dev->requests, req, next);
+}
 
+int32_t scsi_req_enqueue(SCSIRequest *req)
+{
+    int32_t rc;
+
+    assert(!req->retry);
+    scsi_req_enqueue_internal(req);
     scsi_req_ref(req);
     rc = req->ops->send_command(req, req->cmd.buf);
     scsi_req_unref(req);
@@ -1442,6 +1447,102 @@ SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int id, int lun)
     return target_dev;
 }
 
+/* SCSI request list.  For simplicity, pv points to the whole device */
+
+static void put_scsi_requests(QEMUFile *f, void *pv, size_t size)
+{
+    SCSIDevice *s = pv;
+    SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, s->qdev.parent_bus);
+    SCSIRequest *req;
+
+    QTAILQ_FOREACH(req, &s->requests, next) {
+        assert(!req->io_canceled);
+        assert(req->status == -1);
+        assert(req->retry);
+        assert(req->enqueued);
+
+        qemu_put_sbyte(f, 1);
+        qemu_put_buffer(f, req->cmd.buf, sizeof(req->cmd.buf));
+        qemu_put_be32s(f, &req->tag);
+        qemu_put_be32s(f, &req->lun);
+        if (bus->info->save_request) {
+            bus->info->save_request(f, req);
+        }
+        if (req->ops->save_request) {
+            req->ops->save_request(f, req);
+        }
+    }
+    qemu_put_sbyte(f, 0);
+}
+
+static int get_scsi_requests(QEMUFile *f, void *pv, size_t size)
+{
+    SCSIDevice *s = pv;
+    SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, s->qdev.parent_bus);
+
+    while (qemu_get_sbyte(f)) {
+        uint8_t buf[SCSI_CMD_BUF_SIZE];
+        uint32_t tag;
+        uint32_t lun;
+        SCSIRequest *req;
+
+        qemu_get_buffer(f, buf, sizeof(buf));
+        qemu_get_be32s(f, &tag);
+        qemu_get_be32s(f, &lun);
+        req = scsi_req_new(s, tag, lun, buf, NULL);
+        if (bus->info->load_request) {
+            req->hba_private = bus->info->load_request(f, req);
+        }
+        if (req->ops->load_request) {
+            req->ops->load_request(f, req);
+        }
+
+        /* Just restart it later.  */
+        req->retry = true;
+        scsi_req_enqueue_internal(req);
+
+        /* At this point, the request will be kept alive by the reference
+         * added by scsi_req_enqueue_internal, so we can release our reference.
+         * The HBA of course will add its own reference in the load_request
+         * callback if it needs to hold on to the SCSIRequest.
+         */
+        scsi_req_unref(req);
+    }
+
+    return 0;
+}
+
+const VMStateInfo vmstate_info_scsi_requests = {
+    .name = "scsi-requests",
+    .get  = get_scsi_requests,
+    .put  = put_scsi_requests,
+};
+
+const VMStateDescription vmstate_scsi_device = {
+    .name = "SCSIDevice",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT8(unit_attention.key, SCSIDevice),
+        VMSTATE_UINT8(unit_attention.asc, SCSIDevice),
+        VMSTATE_UINT8(unit_attention.ascq, SCSIDevice),
+        VMSTATE_BOOL(sense_is_ua, SCSIDevice),
+        VMSTATE_UINT8_ARRAY(sense, SCSIDevice, SCSI_SENSE_BUF_SIZE),
+        VMSTATE_UINT32(sense_len, SCSIDevice),
+        {
+            .name         = "requests",
+            .version_id   = 0,
+            .field_exists = NULL,
+            .size         = 0,   /* ouch */
+            .info         = &vmstate_info_scsi_requests,
+            .flags        = VMS_SINGLE,
+            .offset       = 0,
+        },
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static void scsi_device_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *k = DEVICE_CLASS(klass);
diff --git a/hw/scsi.h b/hw/scsi.h
index 811f61c..c6624ca 100644
--- a/hw/scsi.h
+++ b/hw/scsi.h
@@ -96,6 +96,16 @@ struct SCSIDevice
     uint64_t max_lba;
 };
 
+extern const VMStateDescription vmstate_scsi_device;
+
+#define VMSTATE_SCSI_DEVICE(_field, _state) { \
+    .name       = (stringify(_field)), \
+    .size       = sizeof(SCSIDevice), \
+    .vmsd       = &vmstate_scsi_device, \
+    .flags      = VMS_STRUCT, \
+    .offset     = vmstate_offset_value(_state, _field, SCSIDevice), \
+}
+
 /* cdrom.c */
 int cdrom_read_toc(int nb_sectors, 

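put_scsi_requests() and get_scsi_requests() frame a variable-length list in the migration stream. Each request is preceded by a 1 byte, and the list ends with a 0 byte, so no count is needed up front. The standalone sketch below shows that framing. It assumes a hypothetical in-memory `struct stream` in place of QEMUFile and omits the command buffer and the HBA/device save hooks that the real code writes between the fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stream standing in for QEMUFile. */
struct stream {
    uint8_t buf[1024];
    size_t wpos, rpos;
};

static void put_u8(struct stream *f, uint8_t v)
{
    assert(f->wpos < sizeof(f->buf));
    f->buf[f->wpos++] = v;
}

static uint8_t get_u8(struct stream *f)
{
    assert(f->rpos < f->wpos);
    return f->buf[f->rpos++];
}

/* Big-endian 32-bit fields, like qemu_put_be32s()/qemu_get_be32s(). */
static void put_be32(struct stream *f, uint32_t v)
{
    for (int shift = 24; shift >= 0; shift -= 8) {
        put_u8(f, (uint8_t)(v >> shift));
    }
}

static uint32_t get_be32(struct stream *f)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        v = (v << 8) | get_u8(f);
    }
    return v;
}

struct saved_req {
    uint32_t tag, lun;
};

/* Each request is preceded by a 1 marker; a 0 marker ends the list. */
void put_requests(struct stream *f, const struct saved_req *reqs, int n)
{
    for (int i = 0; i < n; i++) {
        put_u8(f, 1);
        put_be32(f, reqs[i].tag);
        put_be32(f, reqs[i].lun);
    }
    put_u8(f, 0);
}

/* Reads requests until the 0 marker; returns how many were read. */
int get_requests(struct stream *f, struct saved_req *out, int max)
{
    int n = 0;
    while (get_u8(f)) {
        assert(n < max);
        out[n].tag = get_be32(f);
        out[n].lun = get_be32(f);
        n++;
    }
    return n;
}
```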
[PATCH v3 05/15] scsi: pass residual amount to command_complete

2012-02-13 Thread Paolo Bonzini
With the upcoming sglist support, HBAs will not see any transfer_data
call and will not have a way to detect short transfers.  So pass the
residual amount of data upon command completion.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
v2-v3: fixed resid type (Stefan)

 hw/esp.c         |    3 ++-
 hw/lsi53c895a.c  |    2 +-
 hw/scsi-bus.c    |   12 
 hw/scsi.h        |    3 ++-
 hw/spapr_vscsi.c |    2 +-
 hw/usb-msd.c     |    2 +-
 6 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/hw/esp.c b/hw/esp.c
index 2f44386..991e091 100644
--- a/hw/esp.c
+++ b/hw/esp.c
@@ -390,7 +390,8 @@ static void esp_do_dma(ESPState *s)
 esp_dma_done(s);
 }
 
-static void esp_command_complete(SCSIRequest *req, uint32_t status)
+static void esp_command_complete(SCSIRequest *req, uint32_t status,
+                                 size_t resid)
 {
     ESPState *s = DO_UPCAST(ESPState, busdev.qdev, req->bus->qbus.parent);
 
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index 9a7ffe3..e36fe35 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -699,7 +699,7 @@ static int lsi_queue_req(LSIState *s, SCSIRequest *req, uint32_t len)
 }
 
  /* Callback to indicate that the SCSI layer has completed a command.  */
-static void lsi_command_complete(SCSIRequest *req, uint32_t status)
+static void lsi_command_complete(SCSIRequest *req, uint32_t status, size_t resid)
 {
     LSIState *s = DO_UPCAST(LSIState, dev.qdev, req->bus->qbus.parent);
     int out;
diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 0ee50a8..6a069f4 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -533,6 +533,8 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun,
     }
 
     req->cmd = cmd;
+    req->resid = req->cmd.xfer;
+
     switch (buf[0]) {
     case INQUIRY:
         trace_scsi_inquiry(d->id, lun, tag, cmd.buf[1], cmd.buf[2]);
@@ -1275,10 +1277,12 @@ void scsi_req_data(SCSIRequest *req, int len)
 {
     if (req->io_canceled) {
         trace_scsi_req_data_canceled(req->dev->id, req->lun, req->tag, len);
-    } else {
-        trace_scsi_req_data(req->dev->id, req->lun, req->tag, len);
-        req->bus->info->transfer_data(req, len);
+        return;
     }
+    trace_scsi_req_data(req->dev->id, req->lun, req->tag, len);
+    assert(req->cmd.mode != SCSI_XFER_NONE);
+    req->resid -= len;
+    req->bus->info->transfer_data(req, len);
 }
 
 void scsi_req_print(SCSIRequest *req)
@@ -1337,7 +1341,7 @@ void scsi_req_complete(SCSIRequest *req, int status)
 
     scsi_req_ref(req);
     scsi_req_dequeue(req);
-    req->bus->info->complete(req, req->status);
+    req->bus->info->complete(req, req->status, req->resid);
     scsi_req_unref(req);
 }
 
diff --git a/hw/scsi.h b/hw/scsi.h
index dc72b6f..e1c52d2 100644
--- a/hw/scsi.h
+++ b/hw/scsi.h
@@ -46,6 +46,7 @@ struct SCSIRequest {
     uint32_t          tag;
     uint32_t          lun;
     uint32_t          status;
+    size_t            resid;
     SCSICommand       cmd;
     BlockDriverAIOCB  *aiocb;
     uint8_t sense[SCSI_SENSE_BUF_SIZE];
@@ -112,7 +113,7 @@ struct SCSIBusInfo {
 int tcq;
 int max_channel, max_target, max_lun;
 void (*transfer_data)(SCSIRequest *req, uint32_t arg);
-void (*complete)(SCSIRequest *req, uint32_t arg);
+void (*complete)(SCSIRequest *req, uint32_t arg, size_t resid);
 void (*cancel)(SCSIRequest *req);
 };
 
diff --git a/hw/spapr_vscsi.c b/hw/spapr_vscsi.c
index 9cfce19..d7123df 100644
--- a/hw/spapr_vscsi.c
+++ b/hw/spapr_vscsi.c
@@ -494,7 +494,7 @@ static void vscsi_transfer_data(SCSIRequest *sreq, uint32_t len)
 }
 
 /* Callback to indicate that the SCSI layer has completed a transfer.  */
-static void vscsi_command_complete(SCSIRequest *sreq, uint32_t status)
+static void vscsi_command_complete(SCSIRequest *sreq, uint32_t status, size_t resid)
 {
     VSCSIState *s = DO_UPCAST(VSCSIState, vdev.qdev, sreq->bus->qbus.parent);
     vscsi_req *req = sreq->hba_private;
diff --git a/hw/usb-msd.c b/hw/usb-msd.c
index 6153376..47b8b8e 100644
--- a/hw/usb-msd.c
+++ b/hw/usb-msd.c
@@ -223,7 +223,7 @@ static void usb_msd_transfer_data(SCSIRequest *req, uint32_t len)
     }
 }
 
-static void usb_msd_command_complete(SCSIRequest *req, uint32_t status)
+static void usb_msd_command_complete(SCSIRequest *req, uint32_t status, size_t resid)
 {
     MSDState *s = DO_UPCAST(MSDState, dev.qdev, req->bus->qbus.parent);
     USBPacket *p = s->packet;
-- 
1.7.7.6
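With this patch, the residual starts at the command's transfer length, shrinks in scsi_req_data(), and is handed to the HBA's complete callback. Here is a reduced, hypothetical model of that bookkeeping. The struct layouts, `req_new()`, `req_data()`, `req_complete()` and `example_hba_complete()` are illustrative, and the length check in `req_data()` exists only in this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct request;

/* Hypothetical reduced SCSIBusInfo: only the completion callback. */
struct bus_info {
    void (*complete)(struct request *req, uint32_t status, size_t resid);
};

/* Hypothetical reduced SCSIRequest. */
struct request {
    const struct bus_info *info;
    size_t xfer;    /* transfer length from the CDB (cmd.xfer) */
    size_t resid;   /* bytes not yet transferred */
};

void req_new(struct request *req, const struct bus_info *info, size_t xfer)
{
    req->info = info;
    req->xfer = xfer;
    req->resid = xfer;          /* like the new line in scsi_req_new() */
}

void req_data(struct request *req, size_t len)
{
    assert(len <= req->resid);  /* sketch-only sanity check */
    req->resid -= len;          /* like the new line in scsi_req_data() */
}

void req_complete(struct request *req, uint32_t status)
{
    /* The HBA learns about a short transfer from the extra argument. */
    req->info->complete(req, status, req->resid);
}

/* Example HBA side that simply records what it was told. */
static uint32_t last_status;
static size_t last_resid;

static void example_hba_complete(struct request *req, uint32_t status,
                                 size_t resid)
{
    (void)req;
    last_status = status;
    last_resid = resid;
}

static const struct bus_info example_hba = { example_hba_complete };
```

An HBA can then report `xfer - resid` bytes to the guest as actually transferred, even when it never saw a transfer_data call.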




[PATCH v3 09/15] scsi-generic: add migration support

2012-02-13 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi-generic.c |   25 +
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/hw/scsi-generic.c b/hw/scsi-generic.c
index 4859212..cd62922 100644
--- a/hw/scsi-generic.c
+++ b/hw/scsi-generic.c
@@ -59,6 +59,28 @@ typedef struct SCSIGenericReq {
 sg_io_hdr_t io_header;
 } SCSIGenericReq;
 
+static void scsi_generic_save_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
+
+    qemu_put_sbe32s(f, &r->buflen);
+    if (r->buflen && r->req.cmd.mode == SCSI_XFER_TO_DEV) {
+        assert(!r->req.sg);
+        qemu_put_buffer(f, r->buf, r->req.cmd.xfer);
+    }
+}
+
+static void scsi_generic_load_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
+
+    qemu_get_sbe32s(f, &r->buflen);
+    if (r->buflen && r->req.cmd.mode == SCSI_XFER_TO_DEV) {
+        assert(!r->req.sg);
+        qemu_get_buffer(f, r->buf, r->req.cmd.xfer);
+    }
+}
+
 static void scsi_free_request(SCSIRequest *req)
 {
 SCSIGenericReq *r = DO_UPCAST(SCSIGenericReq, req, req);
@@ -446,6 +468,8 @@ const SCSIReqOps scsi_generic_req_ops = {
     .write_data   = scsi_write_data,
     .cancel_io    = scsi_cancel_io,
     .get_buf      = scsi_get_buf,
+    .load_request = scsi_generic_load_request,
+    .save_request = scsi_generic_save_request,
 };
 
 static SCSIRequest *scsi_new_request(SCSIDevice *d, uint32_t tag, uint32_t lun,
@@ -474,6 +498,7 @@ static void scsi_generic_class_initfn(ObjectClass *klass, void *data)
     dc->desc = "pass through generic scsi device (/dev/sg*)";
     dc->reset = scsi_generic_reset;
     dc->props = scsi_generic_properties;
+    dc->vmsd  = &vmstate_scsi_device;
 }
 
 static TypeInfo scsi_generic_info = {
-- 
1.7.7.6




[PATCH v3 10/15] scsi-disk: add migration support

2012-02-13 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/scsi-disk.c |   59 ---
 1 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 0e4d6ad..4d7b4eb 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -111,12 +111,12 @@ static void scsi_cancel_io(SCSIRequest *req)
     r->req.aiocb = NULL;
 }
 
-static uint32_t scsi_init_iovec(SCSIDiskReq *r)
+static uint32_t scsi_init_iovec(SCSIDiskReq *r, size_t size)
 {
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
     if (!r->iov.iov_base) {
-        r->buflen = SCSI_DMA_BUF_SIZE;
+        r->buflen = size;
         r->iov.iov_base = qemu_blockalign(s->qdev.conf.bs, r->buflen);
     }
     r->iov.iov_len = MIN(r->sector_count * 512, r->buflen);
@@ -124,6 +124,35 @@ static uint32_t scsi_init_iovec(SCSIDiskReq *r)
     return r->qiov.size / 512;
 }
 
+static void scsi_disk_save_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+
+    qemu_put_be64s(f, &r->sector);
+    qemu_put_be32s(f, &r->sector_count);
+    qemu_put_be32s(f, &r->buflen);
+    if (r->buflen && r->req.cmd.mode == SCSI_XFER_TO_DEV) {
+        qemu_put_buffer(f, r->iov.iov_base, r->iov.iov_len);
+    }
+}
+
+static void scsi_disk_load_request(QEMUFile *f, SCSIRequest *req)
+{
+    SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
+
+    qemu_get_be64s(f, &r->sector);
+    qemu_get_be32s(f, &r->sector_count);
+    qemu_get_be32s(f, &r->buflen);
+    if (r->buflen) {
+        scsi_init_iovec(r, r->buflen);
+        if (r->req.cmd.mode == SCSI_XFER_TO_DEV) {
+            qemu_get_buffer(f, r->iov.iov_base, r->iov.iov_len);
+        }
+    }
+
+    qemu_iovec_init_external(&r->qiov, &r->iov, 1);
+}
+
 static void scsi_dma_complete(void *opaque, int ret)
 {
     SCSIDiskReq *r = (SCSIDiskReq *)opaque;
@@ -241,7 +270,7 @@ static void scsi_read_data(SCSIRequest *req)
         r->req.aiocb = dma_bdrv_read(s->qdev.conf.bs, r->req.sg, r->sector,
                                      scsi_dma_complete, r);
     } else {
-        n = scsi_init_iovec(r);
+        n = scsi_init_iovec(r, SCSI_DMA_BUF_SIZE);
         bdrv_acct_start(s->qdev.conf.bs, &r->acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_READ);
         r->req.aiocb = bdrv_aio_readv(s->qdev.conf.bs, r->sector, &r->qiov, n,
                                       scsi_read_complete, r);
@@ -316,7 +345,7 @@ static void scsi_write_complete(void * opaque, int ret)
     if (r->sector_count == 0) {
         scsi_req_complete(&r->req, GOOD);
     } else {
-        scsi_init_iovec(r);
+        scsi_init_iovec(r, SCSI_DMA_BUF_SIZE);
         DPRINTF("Write complete tag=0x%x more=%d\n", r->req.tag, r->qiov.size);
         scsi_req_data(&r->req, r->qiov.size);
     }
@@ -1621,6 +1650,8 @@ static const SCSIReqOps scsi_disk_reqops = {
     .write_data   = scsi_write_data,
     .cancel_io    = scsi_cancel_io,
     .get_buf      = scsi_get_buf,
+    .load_request = scsi_disk_load_request,
+    .save_request = scsi_disk_save_request,
 };
 
 static SCSIRequest *scsi_new_request(SCSIDevice *d, uint32_t tag, uint32_t lun,
@@ -1755,6 +1786,22 @@ static Property scsi_hd_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static const VMStateDescription vmstate_scsi_disk_state = {
+    .name = "scsi-disk",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_SCSI_DEVICE(qdev, SCSIDiskState),
+        VMSTATE_BOOL(media_changed, SCSIDiskState),
+        VMSTATE_BOOL(media_event, SCSIDiskState),
+        VMSTATE_BOOL(eject_request, SCSIDiskState),
+        VMSTATE_BOOL(tray_open, SCSIDiskState),
+        VMSTATE_BOOL(tray_locked, SCSIDiskState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static void scsi_hd_class_initfn(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -1768,6 +1815,7 @@ static void scsi_hd_class_initfn(ObjectClass *klass, void *data)
     dc->desc = "virtual SCSI disk";
     dc->reset = scsi_disk_reset;
     dc->props = scsi_hd_properties;
+    dc->vmsd  = &vmstate_scsi_disk_state;
 }
 
 static TypeInfo scsi_hd_info = {
@@ -1795,6 +1843,7 @@ static void scsi_cd_class_initfn(ObjectClass *klass, void *data)
     dc->desc = "virtual SCSI CD-ROM";
     dc->reset = scsi_disk_reset;
     dc->props = scsi_cd_properties;
+    dc->vmsd  = &vmstate_scsi_disk_state;
 }
 
 static TypeInfo scsi_cd_info = {
@@ -1822,6 +1871,7 @@ static void scsi_block_class_initfn(ObjectClass *klass, void *data)
     dc->desc = "SCSI block device passthrough";
     dc->reset = scsi_disk_reset;
     dc->props = scsi_block_properties;
+    dc->vmsd  = &vmstate_scsi_disk_state;
 }
 
 static TypeInfo scsi_block_info = {
@@ -1851,6 +1901,7 @@ static void scsi_disk_class_initfn(ObjectClass *klass, void *data)
     dc->desc = "virtual SCSI disk or CD-ROM (legacy)";
     dc->reset = scsi_disk_reset;
     dc->props = scsi_disk_properties;
+    dc->vmsd  = &vmstate_scsi_disk_state;
 }
 
 static TypeInfo 

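scsi_disk_save_request() writes sector, sector_count and buflen. It follows them with the bounce-buffer contents only for guest-to-device transfers. On load, the chunk length is not stored but recomputed by scsi_init_iovec(). The standalone round-trip sketch below models that record. It assumes a hypothetical `struct disk_req` with a fixed 4096-byte buffer and uses plain byte arrays in place of QEMUFile.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical reduced SCSIDiskReq: just the migrated fields. */
struct disk_req {
    uint64_t sector;
    uint32_t sector_count;
    uint32_t buflen;
    bool to_dev;              /* cmd.mode == SCSI_XFER_TO_DEV */
    uint8_t data[4096];       /* bounce buffer (iov_base) */
    size_t iov_len;
};

static uint8_t *put_be(uint8_t *p, uint64_t v, int bytes)
{
    for (int i = bytes - 1; i >= 0; i--) {
        *p++ = (uint8_t)(v >> (8 * i));
    }
    return p;
}

static const uint8_t *get_be(const uint8_t *p, uint64_t *v, int bytes)
{
    *v = 0;
    for (int i = 0; i < bytes; i++) {
        *v = (*v << 8) | *p++;
    }
    return p;
}

/* Serializes the record; returns the number of bytes written to out. */
size_t save_req(const struct disk_req *r, uint8_t *out)
{
    uint8_t *p = out;
    p = put_be(p, r->sector, 8);
    p = put_be(p, r->sector_count, 4);
    p = put_be(p, r->buflen, 4);
    if (r->buflen && r->to_dev) {
        /* Only pending guest-to-device data needs to travel. */
        memcpy(p, r->data, r->iov_len);
        p += r->iov_len;
    }
    return (size_t)(p - out);
}

/* r->to_dev must already be set; in QEMU it comes from the re-parsed CDB. */
void load_req(struct disk_req *r, const uint8_t *in)
{
    uint64_t v;
    in = get_be(in, &v, 8); r->sector = v;
    in = get_be(in, &v, 4); r->sector_count = (uint32_t)v;
    in = get_be(in, &v, 4); r->buflen = (uint32_t)v;
    r->iov_len = 0;
    if (r->buflen) {
        assert(r->buflen <= sizeof(r->data));
        /* Recompute the chunk length as scsi_init_iovec() does. */
        r->iov_len = MIN((size_t)r->sector_count * 512, (size_t)r->buflen);
        if (r->to_dev) {
            memcpy(r->data, in, r->iov_len);
        }
    }
}
```

The transfer direction is not part of this record. On the destination it comes from the command that scsi_req_new() re-parses, which is why the sketch expects `to_dev` to be set before load_req() runs.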
[PATCH v3 11/15] virtio-scsi: Add virtio-scsi stub device

2012-02-13 Thread Paolo Bonzini
From: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

Add a useless virtio SCSI HBA device:

  qemu -device virtio-scsi-pci

Signed-off-by: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
v2->v3: changed virtio id

 Makefile.target                   |    1 +
 default-configs/pci.mak           |    1 +
 default-configs/s390x-softmmu.mak |    1 +
 hw/pci.h                          |    1 +
 hw/s390-virtio-bus.c              |   34 ++
 hw/s390-virtio-bus.h              |    4 +-
 hw/virtio-pci.c                   |   56 +
 hw/virtio-pci.h                   |    2 +
 hw/virtio-scsi.c                  |  228 +
 hw/virtio-scsi.h                  |   36 ++
 hw/virtio.h                       |    3 +
 11 files changed, 366 insertions(+), 1 deletions(-)
 create mode 100644 hw/virtio-scsi.c
 create mode 100644 hw/virtio-scsi.h

diff --git a/Makefile.target b/Makefile.target
index 29fde6e..c8f61d6 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -200,6 +200,7 @@ obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o ioport.o
 # need to fix this properly
 obj-$(CONFIG_NO_PCI) += pci-stub.o
 obj-$(CONFIG_VIRTIO) += virtio.o virtio-blk.o virtio-balloon.o virtio-net.o virtio-serial-bus.o
+obj-$(CONFIG_VIRTIO_SCSI) += virtio-scsi.o
 obj-y += vhost_net.o
 obj-$(CONFIG_VHOST_NET) += vhost.o
 obj-$(CONFIG_REALLY_VIRTFS) += 9pfs/virtio-9p-device.o
diff --git a/default-configs/pci.mak b/default-configs/pci.mak
index 9d3e1db..21e4ccf 100644
--- a/default-configs/pci.mak
+++ b/default-configs/pci.mak
@@ -1,5 +1,6 @@
 CONFIG_PCI=y
 CONFIG_VIRTIO_PCI=y
+CONFIG_VIRTIO_SCSI=y
 CONFIG_VIRTIO=y
 CONFIG_USB_UHCI=y
 CONFIG_USB_OHCI=y
diff --git a/default-configs/s390x-softmmu.mak b/default-configs/s390x-softmmu.mak
index 3005729..e588803 100644
--- a/default-configs/s390x-softmmu.mak
+++ b/default-configs/s390x-softmmu.mak
@@ -1 +1,2 @@
 CONFIG_VIRTIO=y
+CONFIG_VIRTIO_SCSI=y
diff --git a/hw/pci.h b/hw/pci.h
index 33b0b18..ff4c12d 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -75,6 +75,7 @@
 #define PCI_DEVICE_ID_VIRTIO_BLOCK   0x1001
 #define PCI_DEVICE_ID_VIRTIO_BALLOON 0x1002
 #define PCI_DEVICE_ID_VIRTIO_CONSOLE 0x1003
+#define PCI_DEVICE_ID_VIRTIO_SCSI    0x1004
 
 #define FMT_PCIBUS  PRIx64
 
diff --git a/hw/s390-virtio-bus.c b/hw/s390-virtio-bus.c
index 49140f8..3515abc 100644
--- a/hw/s390-virtio-bus.c
+++ b/hw/s390-virtio-bus.c
@@ -169,6 +169,39 @@ static int s390_virtio_serial_init(VirtIOS390Device *dev)
 return r;
 }
 
+static int s390_virtio_scsi_init(VirtIOS390Device *dev)
+{
+    VirtIODevice *vdev;
+
+    vdev = virtio_scsi_init((DeviceState *)dev, &dev->scsi);
+    if (!vdev) {
+        return -1;
+    }
+
+    return s390_virtio_device_init(dev, vdev);
+}
+
+static Property virtio_scsi_properties[] = {
+    DEFINE_VIRTIO_SCSI_PROPERTIES(VirtIOPCIProxy, host_features, scsi),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void s390_virtio_scsi_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    VirtIOS390DeviceClass *k = VIRTIO_S390_DEVICE_CLASS(klass);
+
+    k->init = s390_virtio_scsi_init;
+    dc->props = s390_virtio_scsi_properties;
+}
+
+static DeviceInfo virtio_scsi_info = {
+    .name          = "virtio-scsi-s390",
+    .parent        = TYPE_VIRTIO_S390_DEVICE,
+    .instance_size = sizeof(VirtIOS390Device),
+    .class_init    = s390_virtio_scsi_class_init,
+};
+
 static uint64_t s390_virtio_device_vq_token(VirtIOS390Device *dev, int vq)
 {
     ram_addr_t token_off;
@@ -439,6 +472,7 @@ static void s390_virtio_register(void)
     type_register_static(&s390_virtio_serial);
     type_register_static(&s390_virtio_blk);
     type_register_static(&s390_virtio_net);
+    type_register_static(&s390_virtio_scsi);
 }
 device_init(s390_virtio_register);
 
diff --git a/hw/s390-virtio-bus.h b/hw/s390-virtio-bus.h
index b5e59b7..ef534b6 100644
--- a/hw/s390-virtio-bus.h
+++ b/hw/s390-virtio-bus.h
@@ -19,6 +19,7 @@
 
 #include "virtio-net.h"
 #include "virtio-serial.h"
+#include "virtio-scsi.h"
 
 #define VIRTIO_DEV_OFFS_TYPE   0   /* 8 bits */
 #define VIRTIO_DEV_OFFS_NUM_VQ 1   /* 8 bits */
@@ -67,7 +68,8 @@ struct VirtIOS390Device {
     uint32_t host_features;
     virtio_serial_conf serial;
     virtio_net_conf net;
-};
+    VirtIOSCSIConf scsi;
+} VirtIOS390Device;
 
 typedef struct VirtIOS390Bus {
 BusState bus;
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 93fff54..08e63a6 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -21,6 +21,7 @@
 #include "virtio-blk.h"
 #include "virtio-net.h"
 #include "virtio-serial.h"
+#include "virtio-scsi.h"
 #include "pci.h"
 #include "qemu-error.h"
 #include "msix.h"
@@ -930,12 +931,67 @@ static TypeInfo virtio_balloon_info = {
     .class_init    = virtio_balloon_class_init,
 };
 
+static int virtio_scsi_init_pci(PCIDevice 

[PATCH v3 13/15] virtio-scsi: add basic SCSI bus operation

2012-02-13 Thread Paolo Bonzini
Reviewed-by: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
v2->v3: fixed sense length (Christian Hoff)

 hw/virtio-scsi.c |  110 +++--
 1 files changed, 97 insertions(+), 13 deletions(-)

diff --git a/hw/virtio-scsi.c b/hw/virtio-scsi.c
index b34c14f..21264a1 100644
--- a/hw/virtio-scsi.c
+++ b/hw/virtio-scsi.c
@@ -128,6 +128,7 @@ typedef struct {
     DeviceState *qdev;
     VirtIOSCSIConf *conf;
 
+    SCSIBus bus;
     VirtQueue *ctrl_vq;
     VirtQueue *event_vq;
     VirtQueue *cmd_vq;
@@ -156,6 +157,22 @@ typedef struct VirtIOSCSIReq {
     } resp;
 } VirtIOSCSIReq;
 
+static inline int virtio_scsi_get_lun(uint8_t *lun)
+{
+    return ((lun[2] << 8) | lun[3]) & 0x3FFF;
+}
+
+static inline SCSIDevice *virtio_scsi_device_find(VirtIOSCSI *s, uint8_t *lun)
+{
+    if (lun[0] != 1) {
+        return NULL;
+    }
+    if (lun[2] != 0 && !(lun[2] >= 0x40 && lun[2] < 0x80)) {
+        return NULL;
+    }
+    return scsi_device_find(&s->bus, 0, lun[1], virtio_scsi_get_lun(lun));
+}
+
 static void virtio_scsi_complete_req(VirtIOSCSIReq *req)
 {
     VirtIOSCSI *s = req->dev;
@@ -240,7 +257,42 @@ static void virtio_scsi_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
     }
 }
 
-static void virtio_scsi_fail_cmd_req(VirtIOSCSI *s, VirtIOSCSIReq *req)
+static void virtio_scsi_command_complete(SCSIRequest *r, uint32_t status,
+                                         size_t resid)
+{
+    VirtIOSCSIReq *req = r->hba_private;
+
+    req->resp.cmd->response = VIRTIO_SCSI_S_OK;
+    req->resp.cmd->status = status;
+    if (req->resp.cmd->status == GOOD) {
+        req->resp.cmd->resid = resid;
+    } else {
+        req->resp.cmd->resid = 0;
+        req->resp.cmd->sense_len =
+            scsi_req_get_sense(r, req->resp.cmd->sense, VIRTIO_SCSI_SENSE_SIZE);
+    }
+    virtio_scsi_complete_req(req);
+}
+
+static QEMUSGList *virtio_scsi_get_sg_list(SCSIRequest *r)
+{
+    VirtIOSCSIReq *req = r->hba_private;
+
+    return &req->qsgl;
+}
+
+static void virtio_scsi_request_cancelled(SCSIRequest *r)
+{
+    VirtIOSCSIReq *req = r->hba_private;
+
+    if (!req) {
+        return;
+    }
+    req->resp.cmd->response = VIRTIO_SCSI_S_ABORTED;
+    virtio_scsi_complete_req(req);
+}
+
+static void virtio_scsi_fail_cmd_req(VirtIOSCSIReq *req)
 {
     req->resp.cmd->response = VIRTIO_SCSI_S_FAILURE;
     virtio_scsi_complete_req(req);
@@ -250,8 +301,10 @@ static void virtio_scsi_handle_cmd(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOSCSI *s = (VirtIOSCSI *)vdev;
     VirtIOSCSIReq *req;
+    int n;
 
     while ((req = virtio_scsi_pop_req(s, vq))) {
+        SCSIDevice *d;
         int out_size, in_size;
         if (req->elem.out_num < 1 || req->elem.in_num < 1) {
             virtio_scsi_bad_req();
@@ -265,21 +318,36 @@ static void virtio_scsi_handle_cmd(VirtIODevice *vdev, VirtQueue *vq)
         }
 
         if (req->elem.out_num > 1 && req->elem.in_num > 1) {
-            virtio_scsi_fail_cmd_req(s, req);
+            virtio_scsi_fail_cmd_req(req);
             continue;
         }
 
-        req->resp.cmd->resid = 0;
-        req->resp.cmd->status_qualifier = 0;
-        req->resp.cmd->status = CHECK_CONDITION;
-        req->resp.cmd->sense_len = 4;
-        req->resp.cmd->sense[0] = 0xf0; /* Fixed format current sense */
-        req->resp.cmd->sense[1] = ILLEGAL_REQUEST;
-        req->resp.cmd->sense[2] = 0x20;
-        req->resp.cmd->sense[3] = 0x00;
-        req->resp.cmd->response = VIRTIO_SCSI_S_OK;
-
-        virtio_scsi_complete_req(req);
+        d = virtio_scsi_device_find(s, req->req.cmd->lun);
+        if (!d) {
+            req->resp.cmd->response = VIRTIO_SCSI_S_BAD_TARGET;
+            virtio_scsi_complete_req(req);
+            continue;
+        }
+        req->sreq = scsi_req_new(d, req->req.cmd->tag,
+                                 virtio_scsi_get_lun(req->req.cmd->lun),
+                                 req->req.cmd->cdb, req);
+
+        if (req->sreq->cmd.mode != SCSI_XFER_NONE) {
+            int req_mode =
+                (req->elem.in_num > 1 ? SCSI_XFER_FROM_DEV : SCSI_XFER_TO_DEV);
+
+            if (req->sreq->cmd.mode != req_mode ||
+                req->sreq->cmd.xfer > req->qsgl.size) {
+                req->resp.cmd->response = VIRTIO_SCSI_S_OVERRUN;
+                virtio_scsi_complete_req(req);
+                continue;
+            }
+        }
+
+        n = scsi_req_enqueue(req->sreq);
+        if (n) {
+            scsi_req_continue(req->sreq);
+        }
     }
 }
 
@@ -331,6 +399,17 @@ static void virtio_scsi_reset(VirtIODevice *vdev)
     s->cdb_size = VIRTIO_SCSI_CDB_SIZE;
 }
 
+static struct SCSIBusInfo virtio_scsi_scsi_info = {
+    .tcq = true,
+    .max_channel = VIRTIO_SCSI_MAX_CHANNEL,
+    .max_target = VIRTIO_SCSI_MAX_TARGET,
+    .max_lun = VIRTIO_SCSI_MAX_LUN,
+
+    .complete = virtio_scsi_command_complete,
+    .cancel = virtio_scsi_request_cancelled,
+    .get_sg_list = virtio_scsi_get_sg_list,
+};
+
 VirtIODevice 

[PATCH v3 14/15] virtio-scsi: process control queue requests

2012-02-13 Thread Paolo Bonzini
Reviewed-by: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 hw/virtio-scsi.c |  125 ++---
 1 files changed, 117 insertions(+), 8 deletions(-)

diff --git a/hw/virtio-scsi.c b/hw/virtio-scsi.c
index 21264a1..7ad60ec 100644
--- a/hw/virtio-scsi.c
+++ b/hw/virtio-scsi.c
@@ -134,6 +134,7 @@ typedef struct {
     VirtQueue *cmd_vq;
     uint32_t sense_size;
     uint32_t cdb_size;
+    bool resetting;
 } VirtIOSCSI;
 
 typedef struct VirtIOSCSIReq {
@@ -236,15 +237,95 @@ static VirtIOSCSIReq *virtio_scsi_pop_req(VirtIOSCSI *s, VirtQueue *vq)
     return req;
 }
 
-static void virtio_scsi_fail_ctrl_req(VirtIOSCSIReq *req)
+static void virtio_scsi_do_tmf(VirtIOSCSI *s, VirtIOSCSIReq *req)
 {
-    if (req->req.tmf->type == VIRTIO_SCSI_T_TMF) {
-        req->resp.tmf->response = VIRTIO_SCSI_S_FAILURE;
-    } else {
-        req->resp.an->response = VIRTIO_SCSI_S_FAILURE;
+    SCSIDevice *d = virtio_scsi_device_find(s, req->req.cmd->lun);
+    SCSIRequest *r, *next;
+    DeviceState *qdev;
+    int target;
+
+    switch (req->req.tmf->subtype) {
+    case VIRTIO_SCSI_T_TMF_ABORT_TASK:
+    case VIRTIO_SCSI_T_TMF_QUERY_TASK:
+        d = virtio_scsi_device_find(s, req->req.cmd->lun);
+        if (!d) {
+            goto fail;
+        }
+        if (d->lun != virtio_scsi_get_lun(req->req.cmd->lun)) {
+            req->resp.tmf->response = VIRTIO_SCSI_S_INCORRECT_LUN;
+            break;
+        }
+        QTAILQ_FOREACH_SAFE(r, &d->requests, next, next) {
+            if (r->tag == req->req.cmd->tag) {
+                break;
+            }
+        }
+        if (r && r->hba_private) {
+            if (req->req.tmf->subtype == VIRTIO_SCSI_T_TMF_ABORT_TASK) {
+                scsi_req_cancel(r);
+            }
+            req->resp.tmf->response = VIRTIO_SCSI_S_FUNCTION_SUCCEEDED;
+        } else {
+            req->resp.tmf->response = VIRTIO_SCSI_S_OK;
+        }
+        break;
+
+    case VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET:
+        d = virtio_scsi_device_find(s, req->req.cmd->lun);
+        if (!d) {
+            goto fail;
+        }
+        if (d->lun == virtio_scsi_get_lun(req->req.cmd->lun)) {
+            s->resetting++;
+            qdev_reset_all(&d->qdev);
+            s->resetting--;
+        }
+        break;
+
+    case VIRTIO_SCSI_T_TMF_ABORT_TASK_SET:
+    case VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET:
+    case VIRTIO_SCSI_T_TMF_QUERY_TASK_SET:
+        d = virtio_scsi_device_find(s, req->req.cmd->lun);
+        if (!d) {
+            goto fail;
+        }
+        if (d->lun != virtio_scsi_get_lun(req->req.cmd->lun)) {
+            req->resp.tmf->response = VIRTIO_SCSI_S_INCORRECT_LUN;
+            break;
+        }
+        req->resp.tmf->response = VIRTIO_SCSI_S_OK;
+        QTAILQ_FOREACH_SAFE(r, &d->requests, next, next) {
+            if (r->hba_private) {
+                if (req->req.tmf->subtype != VIRTIO_SCSI_T_TMF_QUERY_TASK) {
+                    scsi_req_cancel(r);
+                }
+                req->resp.tmf->response = VIRTIO_SCSI_S_FUNCTION_SUCCEEDED;
+            }
+        }
+        break;
+
+    case VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET:
+        target = req->req.cmd->lun[1];
+        s->resetting++;
+        QTAILQ_FOREACH(qdev, &s->bus.qbus.children, sibling) {
+            d = DO_UPCAST(SCSIDevice, qdev, qdev);
+            if (d->channel == 0 && d->id == target) {
+                qdev_reset_all(&d->qdev);
+            }
+        }
+        s->resetting--;
+        break;
+
+    case VIRTIO_SCSI_T_TMF_CLEAR_ACA:
+    default:
+        req->resp.tmf->response = VIRTIO_SCSI_S_FUNCTION_REJECTED;
+        break;
     }
 
-    virtio_scsi_complete_req(req);
+    return;
+
+fail:
+    req->resp.tmf->response = VIRTIO_SCSI_S_BAD_TARGET;
 }
 
 static void virtio_scsi_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
@@ -253,7 +334,31 @@ static void virtio_scsi_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
     VirtIOSCSIReq *req;
 
     while ((req = virtio_scsi_pop_req(s, vq))) {
-        virtio_scsi_fail_ctrl_req(req);
+        int out_size, in_size;
+        if (req->elem.out_num < 1 || req->elem.in_num < 1) {
+            virtio_scsi_bad_req();
+            continue;
+        }
+
+        out_size = req->elem.out_sg[0].iov_len;
+        in_size = req->elem.in_sg[0].iov_len;
+        if (req->req.tmf->type == VIRTIO_SCSI_T_TMF) {
+            if (out_size < sizeof(VirtIOSCSICtrlTMFReq) ||
+                in_size < sizeof(VirtIOSCSICtrlTMFResp)) {
+                virtio_scsi_bad_req();
+            }
+            virtio_scsi_do_tmf(s, req);
+
+        } else if (req->req.tmf->type == VIRTIO_SCSI_T_AN_QUERY ||
+                   req->req.tmf->type == VIRTIO_SCSI_T_AN_SUBSCRIBE) {
+            if (out_size < sizeof(VirtIOSCSICtrlANReq) ||
+                in_size < sizeof(VirtIOSCSICtrlANResp)) {
+                virtio_scsi_bad_req();
+            }
+            req->resp.an->event_actual = 0;
+            req->resp.an->response = VIRTIO_SCSI_S_OK;
+        }
+   

[PATCH v3 15/15] virtio-scsi: add migration support

2012-02-13 Thread Paolo Bonzini
Reviewed-by: Stefan Hajnoczi <stefa...@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 hw/virtio-scsi.c |   50 +-
 1 files changed, 49 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-scsi.c b/hw/virtio-scsi.c
index 7ad60ec..f5cecfc 100644
--- a/hw/virtio-scsi.c
+++ b/hw/virtio-scsi.c
@@ -237,6 +237,34 @@ static VirtIOSCSIReq *virtio_scsi_pop_req(VirtIOSCSI *s, VirtQueue *vq)
     return req;
 }
 
+static void virtio_scsi_save_request(QEMUFile *f, SCSIRequest *sreq)
+{
+    VirtIOSCSIReq *req = sreq->hba_private;
+
+    qemu_put_buffer(f, (unsigned char *)&req->elem, sizeof(req->elem));
+}
+
+static void *virtio_scsi_load_request(QEMUFile *f, SCSIRequest *sreq)
+{
+    SCSIBus *bus = sreq->bus;
+    VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
+    VirtIOSCSIReq *req;
+
+    req = g_malloc(sizeof(*req));
+    qemu_get_buffer(f, (unsigned char *)&req->elem, sizeof(req->elem));
+    virtio_scsi_parse_req(s, s->cmd_vq, req);
+
+    scsi_req_ref(sreq);
+    req->sreq = sreq;
+    if (req->sreq->cmd.mode != SCSI_XFER_NONE) {
+        int req_mode =
+            (req->elem.in_num > 1 ? SCSI_XFER_FROM_DEV : SCSI_XFER_TO_DEV);
+
+        assert(req->sreq->cmd.mode == req_mode);
+    }
+    return req;
+}
+
 static void virtio_scsi_do_tmf(VirtIOSCSI *s, VirtIOSCSIReq *req)
 {
     SCSIDevice *d = virtio_scsi_device_find(s, req->req.cmd->lun);
@@ -508,6 +536,22 @@ static void virtio_scsi_reset(VirtIODevice *vdev)
     s->cdb_size = VIRTIO_SCSI_CDB_SIZE;
 }
 
+/* The device does not have anything to save beyond the virtio data.
+ * Request data is saved with callbacks from SCSI devices.
+ */
+static void virtio_scsi_save(QEMUFile *f, void *opaque)
+{
+    VirtIOSCSI *s = opaque;
+    virtio_save(&s->vdev, f);
+}
+
+static int virtio_scsi_load(QEMUFile *f, void *opaque, int version_id)
+{
+    VirtIOSCSI *s = opaque;
+    virtio_load(&s->vdev, f);
+    return 0;
+}
+
 static struct SCSIBusInfo virtio_scsi_scsi_info = {
     .tcq = true,
     .max_channel = VIRTIO_SCSI_MAX_CHANNEL,
@@ -517,11 +561,14 @@ static struct SCSIBusInfo virtio_scsi_scsi_info = {
     .complete = virtio_scsi_command_complete,
     .cancel = virtio_scsi_request_cancelled,
     .get_sg_list = virtio_scsi_get_sg_list,
+    .save_request = virtio_scsi_save_request,
+    .load_request = virtio_scsi_load_request,
 };
 
 VirtIODevice *virtio_scsi_init(DeviceState *dev, VirtIOSCSIConf *proxyconf)
 {
     VirtIOSCSI *s;
+    static int virtio_scsi_id;
 
     s = (VirtIOSCSI *)virtio_common_init("virtio-scsi", VIRTIO_ID_SCSI,
                                          sizeof(VirtIOSCSIConfig),
@@ -548,7 +595,8 @@ VirtIODevice *virtio_scsi_init(DeviceState *dev, VirtIOSCSIConf *proxyconf)
         scsi_bus_legacy_handle_cmdline(&s->bus);
     }
 
-    /* TODO savevm */
+    register_savevm(dev, "virtio-scsi", virtio_scsi_id++, 1,
+                    virtio_scsi_save, virtio_scsi_load, s);
 
     return &s->vdev;
 }
-- 
1.7.7.6



Re: [PATCH v3 1/3] KVM: PPC: epapr: Factor out the epapr init

2012-02-13 Thread Scott Wood
On 02/12/2012 11:47 PM, Liu Yu-B13201 wrote:
> 
>> -----Original Message-----
>> From: Wood Scott-B07421
>> Sent: Saturday, February 11, 2012 2:40 AM
>> To: Liu Yu-B13201
>> Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org;
>> linuxppc-...@ozlabs.org; Wood Scott-B07421
>> Subject: Re: [PATCH v3 1/3] KVM: PPC: epapr: Factor out the epapr init
>>
>> Why are you still doing the patching inside kvm.c?
>>
> 
> Do you mean we should move kvm_hypercall_start() into epapr bit?

Yes.  This is an ePAPR mechanism; KVM just happens to be a user of it.

We should also update arch/powerpc/include/asm/epapr_hcalls.h to use
this mechanism.

-Scott



Re: [PATCH RFC] pvclock: Make pv_clock more robust and fixup it if overflow happens

2012-02-13 Thread Marcelo Tosatti
On Mon, Feb 13, 2012 at 04:45:59PM +0100, Igor Mammedov wrote:
> Instead of hunting mysterious stalls/hangs all over the kernel when
> overflow occurs at pvclock.c:pvclock_get_nsec_offset
> 
>  u64 delta = native_read_tsc() - shadow->tsc_timestamp;
> 
> and introducing hooks when places of unexpected access are found, pv_clock
> should be initialized for the calling cpu if overflow condition is detected.
> 
> Signed-off-by: Igor Mammedov <imamm...@redhat.com>

Igor,

I disagree. This is fixing the symptom, not the root cause. Additionally,
Xen also uses pvclock_clocksource_read.

How about adding a BUG_ON to detect the overflow? That way, hunting for
the problem is not necessary.

>  arch/x86/kernel/pvclock.c |   18 +++---
>  1 files changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 42eb330..b486756 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -41,9 +41,14 @@ void pvclock_set_flags(u8 flags)
>  	valid_flags = flags;
>  }
>  
> -static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
> +static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow,
> +				   bool *overflow)
>  {
> -	u64 delta = native_read_tsc() - shadow->tsc_timestamp;
> +	u64 delta;
> +	u64 tsc = native_read_tsc();
> +	u64 shadow_timestamp = shadow->tsc_timestamp;
> +	*overflow = tsc < shadow_timestamp;
> +	delta = tsc - shadow_timestamp;
>  	return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
>  				   shadow->tsc_shift);
>  }
> @@ -94,12 +99,19 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
>  	unsigned version;
>  	cycle_t ret, offset;
>  	u64 last;
> +	bool overflow;
>  
>  	do {
>  		version = pvclock_get_time_values(&shadow, src);
>  		barrier();
> -		offset = pvclock_get_nsec_offset(&shadow);
> +		offset = pvclock_get_nsec_offset(&shadow, &overflow);
>  		ret = shadow.system_timestamp + offset;
> +		if (unlikely(overflow)) {
> +			memset(src, 0, sizeof(*src));
> +			barrier();
> +			x86_cpuinit.early_percpu_clock_init();
> +			continue;
> +		}
>  		barrier();
>  	} while (version != src->version);
>  
> -- 
> 1.7.7.6
 


Re: [PATCH RFC] pvclock: Make pv_clock more robust and fixup it if overflow happens

2012-02-13 Thread Igor Mammedov

On 02/13/2012 06:48 PM, Marcelo Tosatti wrote:
> On Mon, Feb 13, 2012 at 04:45:59PM +0100, Igor Mammedov wrote:
>> Instead of hunting mysterious stalls/hangs all over the kernel when
>> overflow occurs at pvclock.c:pvclock_get_nsec_offset
>>
>>  u64 delta = native_read_tsc() - shadow->tsc_timestamp;
>>
>> and introducing hooks when places of unexpected access are found, pv_clock
>> should be initialized for the calling cpu if overflow condition is detected.
>>
>> Signed-off-by: Igor Mammedov <imamm...@redhat.com>
>
> Igor,
>
> I disagree. This is fixing the symptom, not the root cause. Additionally,
> Xen also uses pvclock_clocksource_read.
>
> How about adding a BUG_ON to detect the overflow? That way, hunting for
> the problem is not necessary.

Ok, I'll repost bug_on version.


[PATCH] BUG in pv_clock when overflow condition is detected

2012-02-13 Thread Igor Mammedov
BUG when overflow occurs at pvclock.c:pvclock_get_nsec_offset

u64 delta = native_read_tsc() - shadow->tsc_timestamp;

This might happen on an attempt to read a clock that is not yet initialized.
It won't prevent stalls and hangs, but at least the failure will no longer
be silent.

Signed-off-by: Igor Mammedov <imamm...@redhat.com>
---
 arch/x86/kernel/pvclock.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 42eb330..35a6190 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -43,7 +43,10 @@ void pvclock_set_flags(u8 flags)
 
 static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
 {
-	u64 delta = native_read_tsc() - shadow->tsc_timestamp;
+	u64 delta;
+	u64 tsc = native_read_tsc();
+	BUG_ON(tsc < shadow->tsc_timestamp);
+	delta = tsc - shadow->tsc_timestamp;
 	return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
 				   shadow->tsc_shift);
 }
-- 
1.7.7.6
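The overflow that both versions of this patch target comes from unsigned arithmetic: when the TSC value read is smaller than `tsc_timestamp`, the `u64` subtraction wraps around instead of going negative, producing an enormous delta. The minimal user-space sketch below contrasts the unchecked and checked forms. The function names are hypothetical, plain `uint64_t` parameters stand in for `native_read_tsc()` and `struct pvclock_shadow_time`, and `assert()` stands in for the kernel's `BUG_ON()`.

```c
#include <assert.h>
#include <stdint.h>

/* Unchecked form, as in the original pvclock_get_nsec_offset():
 * if tsc < tsc_timestamp, the unsigned subtraction wraps modulo 2^64. */
uint64_t tsc_delta_unchecked(uint64_t tsc, uint64_t tsc_timestamp)
{
    return tsc - tsc_timestamp;
}

/* Checked form in the spirit of the BUG_ON() version: fail loudly
 * instead of returning a wrapped delta. Unlike BUG_ON(), assert()
 * is compiled out when NDEBUG is defined. */
uint64_t tsc_delta_checked(uint64_t tsc, uint64_t tsc_timestamp)
{
    assert(tsc >= tsc_timestamp);
    return tsc - tsc_timestamp;
}
```

For example, `tsc_delta_unchecked(999, 1000)` returns `UINT64_MAX` rather than -1; once scaled to nanoseconds, such a value yields the kind of bogus clock reading the thread associates with stalls.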



Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-13 Thread Blue Swirl
On Mon, Feb 13, 2012 at 10:16, Jan Kiszka <jan.kis...@siemens.com> wrote:
> On 2012-02-11 16:25, Blue Swirl wrote:
>> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <jan.kis...@siemens.com> wrote:
>>> This enables acceleration for MMIO-based TPR register accesses of
>>> 32-bit Windows guest systems. It is mostly useful with KVM enabled,
>>> either on older Intel CPUs (without flexpriority feature, can also be
>>> manually disabled for testing) or any current AMD processor.
>>>
>>> The approach introduced here is derived from the original version of
>>> qemu-kvm. It was refactored, documented, and extended by support for
>>> user space APIC emulation, both with and without KVM acceleration. The
>>> VMState format was kept compatible, so was the ABI to the option ROM
>>> that implements the guest-side para-virtualized driver service. This
>>> enables seamless migration from qemu-kvm to upstream or, one day,
>>> between KVM and TCG mode.
>>>
>>> The basic concept goes like this:
>>>  - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
>>>    irqchip) a vmcall hypercall is registered
>>>  - VAPIC option ROM is loaded into guest
>>>  - option ROM activates TPR MMIO access reporting via port 0x7e
>>>  - TPR accesses are trapped and patched in the guest to call into option
>>>    ROM instead, VAPIC support is enabled
>>>  - option ROM TPR helpers track state in memory and invoke hypercall to
>>>    poll for pending IRQs if required
>>>
>>> Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
>>
>> I must say that I find the approach horrible, patching guests and ROMs
>> and looking up Windows internals. Taking the same approach to extreme,
>> we could for example patch Xen guest to become a KVM guest. Not that I
>> object to merging.
>
> Yes, this is horrible. But there is no real better way in the absence of
> hardware assisted virtualization of the TPR. I think MS is recommending
> this patching approach as well.

Maybe instead of routing via ROM and the hypercall, the TPR accesses
could be handled directly with guest invisible breakpoints (like GDB
breakpoints, but for QEMU internal use), much like other
instrumentation could be handled.

>>> diff --git a/hw/apic.c b/hw/apic.c
>>> index 086c544..2ebf3ca 100644
>>> --- a/hw/apic.c
>>> +++ b/hw/apic.c
>>> @@ -35,6 +35,10 @@
>>>  #define MSI_ADDR_DEST_ID_SHIFT         12
>>>  #define        MSI_ADDR_DEST_ID_MASK           0x000
>>>
>>> +#define SYNC_FROM_VAPIC                 0x1
>>> +#define SYNC_TO_VAPIC                   0x2
>>> +#define SYNC_ISR_IRR_TO_VAPIC           0x4
>>
>> Enum, please.
>
> OK.
>
>>> +
>>>  static APICCommonState *local_apics[MAX_APICS + 1];
>>>
>>>  static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode);
>>> @@ -78,6 +82,70 @@ static inline int get_bit(uint32_t *tab, int index)
>>>      return !!(tab[i] & mask);
>>>  }
>>>
>>> +/* return -1 if no bit is set */
>>> +static int get_highest_priority_int(uint32_t *tab)
>>> +{
>>> +    int i;
>>> +    for (i = 7; i >= 0; i--) {
>>> +        if (tab[i] != 0) {
>>> +            return i * 32 + fls_bit(tab[i]);
>>> +        }
>>> +    }
>>> +    return -1;
>>> +}
>>> +
>>> +static void apic_sync_vapic(APICCommonState *s, int sync_type)
>>> +{
>>> +    VAPICState vapic_state;
>>> +    size_t length;
>>> +    off_t start;
>>> +    int vector;
>>> +
>>> +    if (!s->vapic_paddr) {
>>> +        return;
>>> +    }
>>> +    if (sync_type & SYNC_FROM_VAPIC) {
>>> +        cpu_physical_memory_rw(s->vapic_paddr, (void *)&vapic_state,
>>> +                               sizeof(vapic_state), 0);
>>> +        s->tpr = vapic_state.tpr;
>>> +    }
>>> +    if (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) {
>>> +        start = offsetof(VAPICState, isr);
>>> +        length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
>>> +
>>> +        if (sync_type & SYNC_TO_VAPIC) {
>>> +            assert(qemu_cpu_is_self(s->cpu_env));
>>> +
>>> +            vapic_state.tpr = s->tpr;
>>> +            vapic_state.enabled = 1;
>>> +            start = 0;
>>> +            length = sizeof(VAPICState);
>>> +        }
>>> +
>>> +        vector = get_highest_priority_int(s->isr);
>>> +        if (vector < 0) {
>>> +            vector = 0;
>>> +        }
>>> +        vapic_state.isr = vector & 0xf0;
>>> +
>>> +        vapic_state.zero = 0;
>>> +
>>> +        vector = get_highest_priority_int(s->irr);
>>> +        if (vector < 0) {
>>> +            vector = 0;
>>> +        }
>>> +        vapic_state.irr = vector & 0xff;
>>> +
>>> +        cpu_physical_memory_write_rom(s->vapic_paddr + start,
>>> +                                      ((void *)&vapic_state) + start, length);
>>
>> This assumes that the vapic_state structure matches what the guest
>> expects without conversion. Is this true for i386 on x86_64? I didn't
>> check the structure in question.
>
> Yes, the structure in question is a packed one, stable on both guest and
> host side (the guest side is 32-bit only anyway).
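Why a packed structure is "stable on both guest and host side" can be shown with `sizeof` and `offsetof`: built from fixed-width types and marked packed, the struct gets no compiler-inserted padding, so its byte layout does not depend on the host word size. The sketch below uses hypothetical structs, not the real `VAPICState`, and the GCC/Clang `packed` attribute, which is what macros such as `QEMU_PACKED` are assumed to wrap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the compiler may pad before 'value'
 * to satisfy its 4-byte alignment. */
struct unpacked_state {
    uint8_t tag;
    uint32_t value;
};

/* Same members, packed: 'value' follows 'tag' directly at offset 1. */
struct __attribute__((packed)) packed_state {
    uint8_t tag;
    uint32_t value;
};

/* Returns the packed size: tag (1 byte) + value (4 bytes), no padding. */
size_t packed_state_size(void)
{
    assert(offsetof(struct packed_state, value) == 1);
    return sizeof(struct packed_state);
}
```

On x86-64 the unpacked variant occupies 8 bytes, while the packed one is 5 bytes regardless of whether a 32-bit or a 64-bit compiler lays it out.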

>>> diff --git a/hw/apic_common.c b/hw/apic_common.c
>>> index 588531b..1977da7 100644
>>> --- a/hw/apic_common.c
>>> +++ b/hw/apic_common.c
>>> @@ -20,8 +20,10 @@
>>>  #include "apic.h"
>>>  #include "apic_internal.h"
>>>  #include "trace.h"
>>> +#include "kvm.h"
>>>
>>>  static int apic_irq_delivered;
>>> +bool apic_report_tpr_access;
>>
>> This should go to APICCommonState.
>
> Nope, it 

Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-13 Thread Gleb Natapov
On Mon, Feb 13, 2012 at 06:50:08PM +0000, Blue Swirl wrote:
> On Mon, Feb 13, 2012 at 10:16, Jan Kiszka <jan.kis...@siemens.com> wrote:
>> On 2012-02-11 16:25, Blue Swirl wrote:
>>> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <jan.kis...@siemens.com> wrote:
 
>>> I must say that I find the approach horrible, patching guests and ROMs
>>> and looking up Windows internals. Taking the same approach to extreme,
>>> we could for example patch Xen guest to become a KVM guest. Not that I
>>> object to merging.
>>
>> Yes, this is horrible. But there is no real better way in the absence of
>> hardware assisted virtualization of the TPR. I think MS is recommending
>> this patching approach as well.
>
> Maybe instead of routing via ROM and the hypercall, the TPR accesses
> could be handled directly with guest invisible breakpoints (like GDB
> breakpoints, but for QEMU internal use), much like other
> instrumentation could be handled.
>
The hypercall is rarely called. The idea behind patching is to avoid an
exit on each TPR update. A breakpoint would cause an exit, making the
whole exercise pointless.

--
Gleb.


Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-13 Thread Jan Kiszka
On 2012-02-13 19:50, Blue Swirl wrote:
> On Mon, Feb 13, 2012 at 10:16, Jan Kiszka <jan.kis...@siemens.com> wrote:
>> On 2012-02-11 16:25, Blue Swirl wrote:
>>> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <jan.kis...@siemens.com> wrote:

>>> I must say that I find the approach horrible, patching guests and ROMs
>>> and looking up Windows internals. Taking the same approach to extreme,
>>> we could for example patch Xen guest to become a KVM guest. Not that I
>>> object to merging.
>>
>> Yes, this is horrible. But there is no real better way in the absence of
>> hardware assisted virtualization of the TPR. I think MS is recommending
>> this patching approach as well.
>
> Maybe instead of routing via ROM and the hypercall, the TPR accesses
> could be handled directly with guest invisible breakpoints (like GDB
> breakpoints, but for QEMU internal use), much like other
> instrumentation could be handled.

Gleb answered it already.

 @@ -238,6 +275,7 @@ static int apic_init_common(SysBusDevice *dev)
  {
 APICCommonState *s = APIC_COMMON(dev);
 APICCommonClass *info;
 +static DeviceState *vapic;
 static int apic_no;

 if (apic_no = MAX_APICS) {
 @@ -248,10 +286,29 @@ static int apic_init_common(SysBusDevice *dev)
 info = APIC_COMMON_GET_CLASS(s);
 info-init(s);

 -sysbus_init_mmio(s-busdev, s-io_memory);
 +sysbus_init_mmio(dev, s-io_memory);
 +
 +if (!vapic  s-vapic_control  VAPIC_ENABLE_MASK) {
 +vapic = sysbus_create_simple(kvmvapic, -1, NULL);
 +}
 +s-vapic = vapic;
 +if (apic_report_tpr_access  info-enable_tpr_reporting) {

 I think you should not rely on apic_report_tpr_access being in sane
 condition during class init.

 It is mandatory, e.g. for CPU hotplug, as reporting needs to be
 consistent across all VCPUs. Therefore it is a static global, set to
 false initially. However, you are right, we lack proper clearing of the
 access report feature on reset, not only in this variable.
 
 I'd also set it to false initially.

It's a global variable, thus initialized to false by definition.
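Jan's point rests on the C rule that objects with static storage duration (file-scope globals and function-local statics) without an explicit initializer are zero-initialized. The same rule makes the function-local static DeviceState *vapic in the quoted patch start out as NULL, so the shared kvmvapic device is created only once. A small self-contained sketch; Device, vapic_instance, tpr_reporting_enabled() and get_shared_vapic() are hypothetical stand-ins, not QEMU APIs:

```c
#include <stdbool.h>
#include <stddef.h>

/* File-scope object with static storage duration and no initializer:
 * C guarantees it starts out zero, i.e. false. */
static bool report_tpr_access;

/* Hypothetical stand-in for a qdev device; not a QEMU type. */
typedef struct Device {
    int id;
} Device;

static Device vapic_instance = { 42 };

bool tpr_reporting_enabled(void)
{
    return report_tpr_access;
}

/* Mirrors the quoted pattern: the function-local static pointer is
 * implicitly NULL, so the shared device is "created" on the first call
 * only and every later caller gets the same pointer back. */
Device *get_shared_vapic(int *creations)
{
    static Device *vapic;          /* zero-initialized: NULL */

    if (!vapic) {
        vapic = &vapic_instance;   /* stands in for sysbus_create_simple() */
        (*creations)++;
    }
    return vapic;
}
```

Zero-initialization covers the initial state only; as Jan notes, resetting the reporting flag on system reset still needs explicit code.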

 +
 +#define VAPIC_CPU_SHIFT 7
 +
 +#define ROM_BLOCK_SIZE  512
 +#define ROM_BLOCK_MASK  (~(ROM_BLOCK_SIZE - 1))
 +
 +typedef struct VAPICHandlers {
 +    uint32_t set_tpr;
 +    uint32_t set_tpr_eax;
 +    uint32_t get_tpr[8];
 +    uint32_t get_tpr_stack;
 +} QEMU_PACKED VAPICHandlers;
 +
 +typedef struct GuestROMState {
 +    char signature[8];
 +    uint32_t vaddr;

 This does not look 64 bit clean.

 It's packed.
 
 I meant the virtual address could be 64 bits on a 64-bit host, not
 structure packing.

This is for 32-bit guests only. 64-bit Windows doesn't access the TPR
via MMIO and thus does not activate the VAPIC.
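The ROM_BLOCK_SIZE / ROM_BLOCK_MASK macros quoted above use the usual power-of-two rounding trick: ANDing with ~(512 - 1) clears the low 9 bits. A minimal sketch, assuming 32-bit guest addresses as described for the 32-bit-only case; rom_block_start() and rom_block_align_up() are illustrative helpers, not functions from the patch:

```c
#include <stdint.h>

#define ROM_BLOCK_SIZE  512
#define ROM_BLOCK_MASK  (~(ROM_BLOCK_SIZE - 1))

/* Round an address down to the start of its 512-byte ROM block. */
uint32_t rom_block_start(uint32_t addr)
{
    return addr & ROM_BLOCK_MASK;
}

/* Round a size up to a whole number of 512-byte ROM blocks. */
uint32_t rom_block_align_up(uint32_t size)
{
    return (size + ROM_BLOCK_SIZE - 1) & ROM_BLOCK_MASK;
}
```

Legacy option ROM lengths are counted in 512-byte blocks, which is presumably why the patch works with this granularity.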

 +    uint32_t state;
 +    uint32_t rom_state_paddr;
 +    uint32_t rom_state_vaddr;
 +    uint32_t vapic_paddr;
 +    uint32_t real_tpr_addr;
 +    GuestROMState rom_state;
 +    size_t rom_size;
 +} VAPICROMState;
 +
 +#define TPR_INSTR_IS_WRITE  0x1
 +#define TPR_INSTR_ABS_MODRM 0x2
 +#define TPR_INSTR_MATCH_MODRM_REG   0x4
 +
 +typedef struct TPRInstruction {
 +    uint8_t opcode;
 +    uint8_t modrm_reg;
 +    unsigned int flags;
 +    size_t length;
 +    off_t addr_offset;
 +} TPRInstruction;

 Also here the order is pessimized.

 Don't see the gain here, though.
 
 There is a two-byte hole between modrm_reg and flags, and maybe also 4
 bytes between length and addr_offset (if size_t is 32 bits but off_t
 64 bits). I'd reverse the order so that members with the largest alignment
 needs come first.

Well, but this won't make the struct smaller. I prefer to keep the
ordering in which we also initialize it.
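The padding argument can be checked with offsetof() and sizeof(). The sketch below copies TPRInstruction from the patch and adds a hypothetical reordered variant with the largest-alignment members first; tpr_hole_after_modrm_reg() is an illustrative helper, and the exact numbers depend on the ABI.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Field order as quoted from the patch. */
typedef struct TPRInstruction {
    uint8_t opcode;
    uint8_t modrm_reg;
    unsigned int flags;
    size_t length;
    off_t addr_offset;
} TPRInstruction;

/* Hypothetical reordering: largest alignment requirement first. */
typedef struct TPRInstructionReordered {
    off_t addr_offset;
    size_t length;
    unsigned int flags;
    uint8_t opcode;
    uint8_t modrm_reg;
} TPRInstructionReordered;

/* Padding bytes the compiler inserts between modrm_reg and flags. */
size_t tpr_hole_after_modrm_reg(void)
{
    return offsetof(TPRInstruction, flags)
           - (offsetof(TPRInstruction, modrm_reg) + sizeof(uint8_t));
}
```

On an LP64 Linux host (8-byte size_t and off_t) both variants are 24 bytes: reordering only moves the two padding bytes to the tail of the struct, which is consistent with the reply that it won't make the struct smaller.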

 
 +static int find_real_tpr_addr(VAPICROMState *s, CPUState *env)
 +{
 +target_phys_addr_t 

Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

2012-02-13 Thread ronnie sahlberg
On Tue, Feb 14, 2012 at 2:12 AM, Hannes Reinecke h...@suse.de wrote:
 On 02/13/2012 02:18 PM, Michael S. Tsirkin wrote:
 On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote:
 On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin m...@redhat.com 
 wrote:
 On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
 Only if you use the pci multi-function option but that kills
 standard hot unplug

 It doesn't kill it as such, rather you can't unplug luns individually.

 Isn't that just a consequence of the current implementation rather than
 a SCSI limitation?

 Yes.

 A different way to do hotplug could be to flag all devices as removable
 in the standard inq page then
 leave the LUN there persistently and what you remove/add is not the
 LUN device itself but just the media in the device.

 Instead of hot-plug removing the LUN, hot-plug becomes a media eject or
 media insert.
 The device remains present at all times; you never remove it. Instead,
 hot-plug controls whether the medium is present or not.


 This would require implementing at least START_STOP_UNIT and
 PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.


 regards
 ronnie sahlberg

 That would work.

 Or we simply use the Peripheral Qualifier to indicate that the device is gone;
 e.g. we could simply set PQ = 1, return sense code 0x25/00 and be done
 with ...


That is still similar to ripping a device out from under the guest without
notice, and it can surprise the guest.


Removable media is a standard feature in SCSI SBC (and other command sets).
The nice part of removable media is that it activates a contract
between the device and the guest
to prevent removal of the media when the guest depends on the media
not being removed.

I.e. if you have an SBC device with the removable-media bit set,
this tells the initiator that the medium can be removed and that it
should be prepared for this to happen.
So when you mount such an SBC device in the guest, the guest will issue
a PREVENT_ALLOW_MEDIUM_REMOVAL
to tell the device that the medium is in use and may not be removed.
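Both the removable-media approach and the Peripheral Qualifier suggestion live in the first two bytes of the standard INQUIRY data: the PERIPHERAL QUALIFIER is bits 7-5 of byte 0 (next to the PERIPHERAL DEVICE TYPE in bits 4-0), and the RMB (removable medium) bit is bit 7 of byte 1. A minimal sketch; inquiry_set_header() is a hypothetical helper, and a real emulation fills in the remaining INQUIRY bytes as well.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Fill bytes 0 and 1 of standard INQUIRY data (hypothetical helper).
 * Byte 0: PERIPHERAL QUALIFIER in bits 7-5, PERIPHERAL DEVICE TYPE in bits 4-0.
 * Byte 1: RMB (removable medium) in bit 7; the other bits are left clear here.
 */
void inquiry_set_header(uint8_t *buf, uint8_t pq, uint8_t dev_type,
                        bool removable)
{
    buf[0] = (uint8_t)(((pq & 0x07) << 5) | (dev_type & 0x1f));
    buf[1] = removable ? 0x80 : 0x00;
}
```

The "device is gone" variant corresponds to calling this with pq = 1, while the removable-media approach keeps pq = 0 and sets RMB so the LUN stays visible.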

This automatically provides you with a mechanism where any guest can
signal to qemu when qemu may or may not remove the device/medium.



In addition to implementing PREVENT_ALLOW_MEDIUM_REMOVAL emulation,
qemu would also need to check the prevent-allow status before it
allows the device to be removed.

If nothing else, using this approach will automatically provide a
channel from the guest kernel to qemu to tell qemu when a device may
be unplugged and when it is not safe to unplug the device.
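A minimal sketch of that tracking step, assuming a 6-byte PREVENT ALLOW MEDIUM REMOVAL CDB (operation code 1Eh) whose PREVENT field in byte 4 is read as "prevent" when bit 0 is set and "allow" when it is clear. LunRemovalState, track_prevent_allow() and can_unplug() are hypothetical names, not existing qemu functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PREVENT_ALLOW_MEDIUM_REMOVAL 0x1e  /* SCSI operation code */

/* Hypothetical per-LUN state kept by the emulated device. */
typedef struct LunRemovalState {
    bool prevent_removal;   /* last PREVENT value received from the guest */
} LunRemovalState;

/* Called for every CDB the guest sends; only PREVENT ALLOW MEDIUM REMOVAL
 * changes the state. Bit 0 of byte 4 set means prevent, clear means allow. */
void track_prevent_allow(LunRemovalState *s, const uint8_t *cdb, size_t len)
{
    if (len < 6 || cdb[0] != PREVENT_ALLOW_MEDIUM_REMOVAL) {
        return;
    }
    s->prevent_removal = (cdb[4] & 0x01) != 0;
}

/* Checked by the host before it honors an unplug request. */
bool can_unplug(const LunRemovalState *s)
{
    return !s->prevent_removal;
}
```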



regards
ronnie sahlberg
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

2012-02-13 Thread ronnie sahlberg
On Tue, Feb 14, 2012 at 7:42 AM, ronnie sahlberg
ronniesahlb...@gmail.com wrote:
 On Tue, Feb 14, 2012 at 2:12 AM, Hannes Reinecke h...@suse.de wrote:
 On 02/13/2012 02:18 PM, Michael S. Tsirkin wrote:
 On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote:
 On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin m...@redhat.com 
 wrote:
 On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
 Only if you use the pci multi-function option but that kills
 standard hot unplug

 It doesn't kill it as such, rather you can't unplug luns individually.

 Isn't that just a consequence of the current implementation rather than
 a SCSI limitation?

 Yes.

 A different way to do hotplug could be to flag all devices as removable
 in the standard inq page then
 leave the LUN there persistently and what you remove/add is not the
 LUN device itself but just the media in the device.

 Instead of hot-plug removing the LUN, hot-plug becomes a media eject or
 media insert.
 The device remains present at all times; you never remove it. Instead,
 hot-plug controls whether the medium is present or not.


 This would require implementing at least START_STOP_UNIT and
 PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.


 regards
 ronnie sahlberg

 That would work.

 Or we simply use the Peripheral Qualifier to indicate that the device is gone;
 e.g. we could simply set PQ = 1, return sense code 0x25/00 and be done
 with ...


 That is still similar to ripping a device out from under the guest without
 notice, and it can surprise the guest.


 Removable media is a standard feature in SCSI SBC (and other command sets).
 The nice part of removable media is that it activates a contract
 between the device and the guest
 to prevent removal of the media when the guest depends on the media
 not being removed.

 I.e. if you have an SBC device with the removable-media bit set,
 this tells the initiator that the medium can be removed and that it
 should be prepared for this to happen.
 So when you mount such an SBC device in the guest, the guest will issue
 a PREVENT_ALLOW_MEDIUM_REMOVAL
 to tell the device that the medium is in use and may not be removed.


What I mean is that if /dev/sdb is removable and
you mount it as 'mount /dev/sdb1 /mnt',
this will automatically cause the guest kernel to send a
PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb to prevent removal.

When you unmount /dev/sdb1, the guest kernel will automagically send
PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb and allow removal of the
media again.


If you capture this command and track the prevent/allow removal
status, you automatically get a channel through which qemu will
know when it is safe to unplug the device and when it is not safe to
unplug the device.
This is a nice feature.


KVM call agenda for Tuesday 14

2012-02-13 Thread Juan Quintela

Hi

Please send in any agenda items you are interested in covering.

Cheers,

Juan.



level in kvm_mmu_page_role

2012-02-13 Thread Sanidhya Kashyap
I have been going through the kvm code but didn't get the significance
of level in kvm_mmu_page_role. It would be nice if someone could
explain what it is used for.

Thanks,
Sanidhya



Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

2012-02-13 Thread Michael S. Tsirkin
On Tue, Feb 14, 2012 at 07:53:26AM +1100, ronnie sahlberg wrote:
 On Tue, Feb 14, 2012 at 7:42 AM, ronnie sahlberg
 ronniesahlb...@gmail.com wrote:
  On Tue, Feb 14, 2012 at 2:12 AM, Hannes Reinecke h...@suse.de wrote:
  On 02/13/2012 02:18 PM, Michael S. Tsirkin wrote:
  On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote:
  On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin m...@redhat.com 
  wrote:
  On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
  Only if you use the pci multi-function option but that kills
  standard hot unplug
 
  It doesn't kill it as such, rather you can't unplug luns individually.
 
  Isn't that just a consequence of the current implementation rather than
  a SCSI limitation?
 
  Yes.
 
  A different way to do hotplug could be to flag all devices as removable
  in the standard inq page then
  leave the LUN there persistently and what you remove/add is not the
  LUN device itself but just the media in the device.
 
  Instead of hot-plug removing the LUN, hot-plug becomes a media eject or
  media insert.
  The device remains present at all times; you never remove it. Instead,
  hot-plug controls whether the medium is present or not.
 
 
  This would require implementing at least START_STOP_UNIT and
  PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.
 
 
  regards
  ronnie sahlberg
 
  That would work.
 
  Or we simply use the Peripheral Qualifier to indicate that the device is gone;
  e.g. we could simply set PQ = 1, return sense code 0x25/00 and be done
  with ...
 
 
  That is still similar to ripping a device out from under the guest without
  notice, and it can surprise the guest.
 
 
  Removable media is a standard feature in SCSI SBC (and other command sets).
  The nice part of removable media is that it activates a contract
  between the device and the guest
  to prevent removal of the media when the guest depends on the media
  not being removed.
 
  I.e. if you have an SBC device with the removable-media bit set,
  this tells the initiator that the medium can be removed and that it
  should be prepared for this to happen.
  So when you mount such an SBC device in the guest, the guest will issue
  a PREVENT_ALLOW_MEDIUM_REMOVAL
  to tell the device that the medium is in use and may not be removed.
 
 
 What I mean is that if /dev/sdb is removable and
 you mount it as 'mount /dev/sdb1 /mnt',
 this will automatically cause the guest kernel to send a
 PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb to prevent removal.
 
 When you unmount /dev/sdb1, the guest kernel will automagically send
 PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb and allow removal of the
 media again.
 
 
 If you capture this command and track the prevent/allow removal
 status, you automatically get a channel through which qemu will
 know when it is safe to unplug the device and when it is not safe to
 unplug the device.
 This is a nice feature.

Presumably there's a way for the device to notify the OS
that the user requested removal, as well?


Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

2012-02-13 Thread ronnie sahlberg
On Tue, Feb 14, 2012 at 9:59 AM, Michael S. Tsirkin m...@redhat.com wrote:
 On Tue, Feb 14, 2012 at 07:53:26AM +1100, ronnie sahlberg wrote:
 On Tue, Feb 14, 2012 at 7:42 AM, ronnie sahlberg
 ronniesahlb...@gmail.com wrote:
  On Tue, Feb 14, 2012 at 2:12 AM, Hannes Reinecke h...@suse.de wrote:
  On 02/13/2012 02:18 PM, Michael S. Tsirkin wrote:
  On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote:
  On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin m...@redhat.com 
  wrote:
  On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
  Only if you use the pci multi-function option but that kills
  standard hot unplug
 
  It doesn't kill it as such, rather you can't unplug luns individually.
 
  Isn't that just a consequence of the current implementation rather than
  a SCSI limitation?
 
  Yes.
 
  A different way to do hotplug could be to flag all devices as removable
  in the standard inq page then
  leave the LUN there persistently and what you remove/add is not the
  LUN device itself but just the media in the device.
 
  Instead of hot-plug removing the LUN, hot-plug becomes a media eject or
  media insert.
  The device remains present at all times; you never remove it. Instead,
  hot-plug controls whether the medium is present or not.
 
 
  This would require implementing at least START_STOP_UNIT and
  PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.
 
 
  regards
  ronnie sahlberg
 
  That would work.
 
  Or we simply use the Peripheral Qualifier to indicate that the device is gone;
  e.g. we could simply set PQ = 1, return sense code 0x25/00 and be done
  with ...
 
 
  That is still similar to ripping a device out from under the guest without
  notice, and it can surprise the guest.
 
 
  Removable media is a standard feature in SCSI SBC (and other command sets).
  The nice part of removable media is that it activates a contract
  between the device and the guest
  to prevent removal of the media when the guest depends on the media
  not being removed.
 
  I.e. if you have an SBC device with the removable-media bit set,
  this tells the initiator that the medium can be removed and that it
  should be prepared for this to happen.
  So when you mount such an SBC device in the guest, the guest will issue
  a PREVENT_ALLOW_MEDIUM_REMOVAL
  to tell the device that the medium is in use and may not be removed.
 

 What I mean is that if /dev/sdb is removable and
 you mount it as 'mount /dev/sdb1 /mnt',
 this will automatically cause the guest kernel to send a
 PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb to prevent removal.

 When you unmount /dev/sdb1, the guest kernel will automagically send
 PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb and allow removal of the
 media again.


 If you capture this command and track the prevent/allow removal
 status, you automatically get a channel through which qemu will
 know when it is safe to unplug the device and when it is not safe to
 unplug the device.
 This is a nice feature.

 Presumably there's a way for the device to notify the OS
 that the user requested removal, as well?


I think that is done by responding with sense data to one of the commands,
like the TEST_UNIT_READY that the initiator (guest kernel)
sends every few seconds.

5Ah/01h  DT WROM BK  OPERATOR MEDIUM REMOVAL REQUEST

This sense code should be the one to use.


I don't know if the Linux SCSI initiator honors this or what it will do.
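Reporting this condition means returning fixed-format sense data with ASC 5Ah and ASCQ 01h. A minimal sketch, assuming the condition is reported with the UNIT ATTENTION sense key (6h); build_fixed_sense() is a hypothetical helper that fills the standard 18-byte fixed layout (response code 70h, sense key in byte 2, additional sense length in byte 7, ASC and ASCQ in bytes 12 and 13).

```c
#include <stdint.h>
#include <string.h>

#define SENSE_KEY_UNIT_ATTENTION 0x06
#define FIXED_SENSE_LEN 18

/* Fill an 18-byte fixed-format sense buffer (hypothetical helper). */
void build_fixed_sense(uint8_t *buf, uint8_t key, uint8_t asc, uint8_t ascq)
{
    memset(buf, 0, FIXED_SENSE_LEN);
    buf[0] = 0x70;                  /* current error, fixed format */
    buf[2] = key & 0x0f;            /* SENSE KEY */
    buf[7] = FIXED_SENSE_LEN - 8;   /* ADDITIONAL SENSE LENGTH: bytes 8..17 */
    buf[12] = asc;                  /* ADDITIONAL SENSE CODE */
    buf[13] = ascq;                 /* ADDITIONAL SENSE CODE QUALIFIER */
}
```

The same helper could also produce the 0x25/00 (LOGICAL UNIT NOT SUPPORTED) sense mentioned earlier in the thread, presumably with the ILLEGAL REQUEST key: build_fixed_sense(buf, 0x05, 0x25, 0x00).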



I guess something like this could work?

IF device is marked as prevent-removal THEN
    send OPERATOR MEDIUM REMOVAL REQUEST to the initiator
    wait xyz seconds
    IF device is still marked as prevent-removal THEN
        ask operator: "guest refused to release the LUN, do you want to
        forcefully remove it?"
    ELSE
        eject the media
    FI
ELSE
    eject the media
FI
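The same flow can be written as a small C state sketch, split into a request step and a timeout step so that the wait does not block. LunState, unplug_request(), unplug_timeout() and the UnplugAction values are hypothetical names, and the length of the "xyz seconds" wait is left to the caller.

```c
#include <stdbool.h>

typedef enum {
    UNPLUG_EJECT_NOW,       /* guest allows removal: eject immediately */
    UNPLUG_WAIT_FOR_GUEST,  /* removal request queued, re-check later */
    UNPLUG_ASK_OPERATOR     /* guest still refuses after the wait */
} UnplugAction;

typedef struct LunState {
    bool prevent_removal;     /* tracked from PREVENT ALLOW MEDIUM REMOVAL */
    bool removal_requested;   /* report 5Ah/01h on the next TEST UNIT READY */
} LunState;

/* Step 1: the operator asks to unplug the LUN. */
UnplugAction unplug_request(LunState *s)
{
    if (!s->prevent_removal) {
        return UNPLUG_EJECT_NOW;
    }
    s->removal_requested = true;
    return UNPLUG_WAIT_FOR_GUEST;
}

/* Step 2: called once the caller's timeout has expired. */
UnplugAction unplug_timeout(LunState *s)
{
    s->removal_requested = false;
    return s->prevent_removal ? UNPLUG_ASK_OPERATOR : UNPLUG_EJECT_NOW;
}
```

While removal_requested is set, the emulated device would answer the guest's next TEST UNIT READY with the removal-request sense; if the guest then unmounts, its ALLOW command clears prevent_removal before the timeout fires.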


Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

2012-02-13 Thread Michael S. Tsirkin
On Tue, Feb 14, 2012 at 10:30:59AM +1100, ronnie sahlberg wrote:
 On Tue, Feb 14, 2012 at 9:59 AM, Michael S. Tsirkin m...@redhat.com wrote:
  On Tue, Feb 14, 2012 at 07:53:26AM +1100, ronnie sahlberg wrote:
  On Tue, Feb 14, 2012 at 7:42 AM, ronnie sahlberg
  ronniesahlb...@gmail.com wrote:
   On Tue, Feb 14, 2012 at 2:12 AM, Hannes Reinecke h...@suse.de wrote:
   On 02/13/2012 02:18 PM, Michael S. Tsirkin wrote:
   On Tue, Feb 14, 2012 at 12:13:36AM +1100, ronnie sahlberg wrote:
   On Tue, Feb 14, 2012 at 12:00 AM, Michael S. Tsirkin 
   m...@redhat.com wrote:
   On Mon, Feb 13, 2012 at 02:54:03PM +0200, Dor Laor wrote:
   Only if you use the pci multi-function option but that kills
   standard hot unplug
  
   It doesn't kill it as such, rather you can't unplug luns 
   individually.
  
   Isn't that just a consequence of the current implementation rather than
   a SCSI limitation?
  
   Yes.
  
   A different way to do hotplug could be to flag all devices as removable
   in the standard inq page then
   leave the LUN there persistently and what you remove/add is not the
   LUN device itself but just the media in the device.
  
   Instead of hot-plug removing the LUN, hot-plug becomes a media eject or
   media insert.
   The device remains present at all times; you never remove it. Instead,
   hot-plug controls whether the medium is present or not.
  
  
   This would require implementing at least START_STOP_UNIT and
   PREVENT_ALLOW_MEDIUM_REMOVAL opcode emulation from SBC.
  
  
   regards
   ronnie sahlberg
  
   That would work.
  
   Or we simply use the Peripheral Qualifier to indicate that the device is gone;
   e.g. we could simply set PQ = 1, return sense code 0x25/00 and be done
   with ...
  
  
   That is still similar to ripping a device out from under the guest without
   notice, and it can surprise the guest.
  
  
   Removable media is a standard feature in SCSI SBC (and other command sets).
   The nice part of removable media is that it activates a contract
   between the device and the guest
   to prevent removal of the media when the guest depends on the media
   not being removed.
  
   I.e. if you have an SBC device with the removable-media bit set,
   this tells the initiator that the medium can be removed and that it
   should be prepared for this to happen.
   So when you mount such an SBC device in the guest, the guest will issue
   a PREVENT_ALLOW_MEDIUM_REMOVAL
   to tell the device that the medium is in use and may not be removed.
  
 
  What I mean is that if /dev/sdb is removable and
  you mount it as 'mount /dev/sdb1 /mnt',
  this will automatically cause the guest kernel to send a
  PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb to prevent removal.
 
  When you unmount /dev/sdb1, the guest kernel will automagically send
  PREVENT_ALLOW_MEDIUM_REMOVAL to /dev/sdb and allow removal of the
  media again.
 
 
  If you capture this command and track the prevent/allow removal
  status, you automatically get a channel through which qemu will
  know when it is safe to unplug the device and when it is not safe to
  unplug the device.
  This is a nice feature.
 
  Presumably there's a way for the device to notify the OS
  that the user requested removal, as well?
 
 
 I think that is done by responding with sense data to one of the commands,
 like the TEST_UNIT_READY that the initiator (guest kernel)
 sends every few seconds.

Does it do this even for mounted media?
I didn't realize ...


Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

2012-02-13 Thread Rusty Russell
On Mon, 13 Feb 2012 10:19:56 +0100, Paolo Bonzini pbonz...@redhat.com wrote:
 block layer _is_ growing support for new operations: discard is already 
 there, write same is in the works, extended copy will also come in due 
 time.  Perhaps we'll add them to virtio-blk, perhaps not.

FYI, I'd take patches for discard in virtio_blk today; it's a no-brainer
in a virtual device.

But I wouldn't want extended copy and write same.

Thanks,
Rusty.


Re: [PATCH RFC] seabios: add OSHP method stub

2012-02-13 Thread Kevin O'Connor
On Mon, Feb 13, 2012 at 11:33:08AM +0200, Michael S. Tsirkin wrote:
 To allow guests to load the native SHPC driver
 for a bridge, we must declare an OSHP method
 for the appropriate device which lets the OS
 take control of the SHPC.
 As we don't access SHPC at the moment, we
 don't need to do anything - just report success.

The patch is fine with me, but since this is really qemu/kvm specific,
please provide an ack from one of the qemu/kvm maintainers.

-Kevin


Re: [PATCH RFC] seabios: add OSHP method stub

2012-02-13 Thread Kevin O'Connor
On Tue, Feb 14, 2012 at 02:43:45AM +0200, Michael S. Tsirkin wrote:
 On Mon, Feb 13, 2012 at 07:34:55PM -0500, Kevin O'Connor wrote:
  On Mon, Feb 13, 2012 at 11:33:08AM +0200, Michael S. Tsirkin wrote:
   To allow guests to load the native SHPC driver
   for a bridge, we must declare an OSHP method
   for the appropriate device which lets the OS
   take control of the SHPC.
   As we don't access SHPC at the moment, we
   don't need to do anything - just report success.
  
  The patch is fine with me, but since this is really qemu/kvm specific,
  please provide an ack from one of the qemu/kvm maintainers.
  
  -Kevin
 
 I expect no problem with this,
 though I'm wondering what makes it qemu specific.

Only kvm/qemu use the ACPI tables in seabios.

In a nutshell, I don't know what a SHPC is (nor OSHP), so I'm looking
for an additional Ack.

-Kevin

