Re: qemu-kvm crashes with Assertion ... failed.

2010-03-24 Thread André Weidemann

Hi,

On 24.03.2010 17:23, Avi Kivity wrote:


On 03/24/2010 06:20 PM, André Weidemann wrote:

Does this happen with a guest installed on kvm, or just with the guest
that (guessing from the name) was imported from vmware?



I booted the VM via PXE into an Ubuntu Live CD image. I only added the
Windows disk image, so I could copy the resulting Excel file (from
iozone) to this disk. The Windows 7 on this disk was installed under
kvm 0.12.3.



What version of Ubuntu? Can you post a way to reproduce this reliably
(how you created the disk etc.)




In that particular case I ran Ubuntu Desktop 9.10 x86_64 inside the VM. 
I set my server up so I can boot any machine on the network via PXE. I 
used the live CD that you can download here: 
http://141.30.3.84/ubuntu-releases/9.10/ubuntu-9.10-desktop-amd64.iso


Since this setup is rather special, I instead booted the VM from the CD ISO and 
installed Ubuntu onto an LV (100GB). This way, I think, it is easier to 
reproduce.


At first I installed Ubuntu on an LV which resides on a VG that is on a 
RAID5 created via mdadm (4x 1TB HDDs).


Here is the command line I used to start the VM:
qemu-system-x86_64 -cpu core2duo -vga cirrus -boot order=d -vnc 
192.168.3.42:2 -k de -smp 4,cores=4 -m 1024 -net 
nic,model=e1000,macaddr=DE:AD:BE:EF:12:3A -net 
tap,script=/usr/local/bin/qemu-ifup  -monitor pty -name 
Ubuntu9.10test,process=Ubuntu9.10test -cdrom 
/tftpboot/ubuntu-9.10-desktop-amd64.iso -drive 
file=/dev/storage/UbuntuTest,if=ide,index=1,cache=none,aio=native


I booted into the Ubuntu Live CD and chose to install Ubuntu from the 
desktop. I had the partitioner install Ubuntu onto the entire disk. I 
did not set up any partitions manually. During the install process, the 
VM crashed again. The VM did not always crash at the same stage of the 
installation process, but it did crash every time I tried to install, 
and I tried 3 times in a row.


I then changed index=1 to index=0 and ran another 3 tests. The VM 
crashed 2 out of 3 times during installation.


To rule out that the problem is related to the LV on the RAID5, I then 
installed Ubuntu on a physical drive (250GB SATA). For this I changed 
the command line to:
qemu-system-x86_64 -cpu core2duo -vga cirrus -boot order=d -vnc 
192.168.3.42:2 -k de -smp 4,cores=4 -m 1024 -net 
nic,model=e1000,macaddr=DE:AD:BE:EF:12:3A -net 
tap,script=/usr/local/bin/qemu-ifup  -monitor pty -name 
Ubuntu9.10test,process=Ubuntu9.10test -cdrom 
/tftpboot/ubuntu-9.10-desktop-amd64.iso -drive 
file=/dev/sdg,if=ide,index=0,cache=none,aio=native
I gave it another three tries. 2 out of 3 installations made the VM 
crash at around 95% and 43%.


So the crash does not seem to be related to running iozone inside the 
VM, but to disk access in general.


On the server I am running Ubuntu Server 9.10 x86_64 with kernel 
2.6.31-20-server. The server is equipped with an Intel Q9300 CPU and 8GB 
RAM.

If you need more information, let me know.

 André
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] vhost-blk implementation

2010-03-24 Thread Avi Kivity

On 03/24/2010 10:05 PM, Christoph Hellwig wrote:

On Tue, Mar 23, 2010 at 12:03:14PM +0200, Avi Kivity wrote:
   

I also think it should be done at the bio layer.  File I/O is going to
be slower, if we do vhost-blk we should concentrate on maximum
performance.  The block layer also exposes more functionality we can use
(asynchronous barriers for example).
 

The block layer is more flexible, but that limits you to only stack
directly on top of a block device, which is extremely inflexible.
   


We still have a virtio implementation in userspace for file-based images.

In any case, the file APIs are not asynchronous so we'll need a thread 
pool.  That will probably minimize the difference in performance between 
the userspace and kernel implementations.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



[PATCH v3 1/1] Shared memory uio_pci driver

2010-03-24 Thread Cam Macdonell
This patch adds a driver for my shared memory PCI device using the uio_pci
interface.  The driver has three memory regions.  The first memory region is for
device registers for sending interrupts. The second BAR is for receiving MSI-X
interrupts and the third memory region maps the shared memory.  The device only
exports the first and third memory regions to userspace.

This driver supports MSI-X and regular pin interrupts.  Currently, the number of
MSI vectors is set to 4 which could be increased, but the driver will work with
fewer vectors.  If MSI is not available, then regular interrupts will be used.
---
 drivers/uio/Kconfig   |8 ++
 drivers/uio/Makefile  |1 +
 drivers/uio/uio_ivshmem.c |  235 +
 3 files changed, 244 insertions(+), 0 deletions(-)
 create mode 100644 drivers/uio/uio_ivshmem.c

diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
index 1da73ec..b92cded 100644
--- a/drivers/uio/Kconfig
+++ b/drivers/uio/Kconfig
@@ -74,6 +74,14 @@ config UIO_SERCOS3
 
  If you compile this as a module, it will be called uio_sercos3.
 
+config UIO_IVSHMEM
+   tristate "KVM shared memory PCI driver"
+   default n
+   help
+ Userspace I/O interface for the KVM shared memory device.  This
+ driver will make available two memory regions, the first is
+ registers and the second is a region for sharing between VMs.
+
 config UIO_PCI_GENERIC
tristate "Generic driver for PCI 2.3 and PCI Express cards"
depends on PCI
diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
index 18fd818..25c1ca5 100644
--- a/drivers/uio/Makefile
+++ b/drivers/uio/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_UIO_AEC)   += uio_aec.o
 obj-$(CONFIG_UIO_SERCOS3)  += uio_sercos3.o
 obj-$(CONFIG_UIO_PCI_GENERIC)  += uio_pci_generic.o
 obj-$(CONFIG_UIO_NETX) += uio_netx.o
+obj-$(CONFIG_UIO_IVSHMEM) += uio_ivshmem.o
diff --git a/drivers/uio/uio_ivshmem.c b/drivers/uio/uio_ivshmem.c
new file mode 100644
index 000..607435b
--- /dev/null
+++ b/drivers/uio/uio_ivshmem.c
@@ -0,0 +1,235 @@
+/*
+ * UIO IVShmem Driver
+ *
+ * (C) 2009 Cam Macdonell
+ * based on Hilscher CIF card driver (C) 2007 Hans J. Koch 
+ *
+ * Licensed under GPL version 2 only.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define IntrStatus 0x04
+#define IntrMask 0x00
+
+struct ivshmem_info {
+struct uio_info *uio;
+struct pci_dev *dev;
+char (*msix_names)[256];
+struct msix_entry *msix_entries;
+int nvectors;
+};
+
+static irqreturn_t ivshmem_handler(int irq, struct uio_info *dev_info)
+{
+
+void __iomem *plx_intscr = dev_info->mem[0].internal_addr
++ IntrStatus;
+u32 val;
+
+val = readl(plx_intscr);
+if (val == 0)
+return IRQ_NONE;
+
+printk(KERN_INFO "Regular interrupt (val = %d)\n", val);
+return IRQ_HANDLED;
+}
+
+static irqreturn_t ivshmem_msix_handler(int irq, void *opaque)
+{
+
+struct uio_info * dev_info = (struct uio_info *) opaque;
+
+/* we have to do this explicitly when using MSI-X */
+uio_event_notify(dev_info);
+printk(KERN_INFO "MSI-X interrupt (%d)\n", irq);
+return IRQ_HANDLED;
+
+}
+
+static int request_msix_vectors(struct ivshmem_info *ivs_info, int nvectors)
+{
+int i, err;
+const char *name = "ivshmem";
+
+printk(KERN_INFO "devname is %s\n", name);
+ivs_info->nvectors = nvectors;
+
+
+ivs_info->msix_entries = kmalloc(nvectors * sizeof *ivs_info->msix_entries,
+GFP_KERNEL);
+ivs_info->msix_names = kmalloc(nvectors * sizeof *ivs_info->msix_names,
+GFP_KERNEL);
+
+for (i = 0; i < nvectors; ++i)
+ivs_info->msix_entries[i].entry = i;
+
+err = pci_enable_msix(ivs_info->dev, ivs_info->msix_entries,
+ivs_info->nvectors);
+if (err > 0) {
+ivs_info->nvectors = err; /* msi-x positive error code
+ returns the number available*/
+err = pci_enable_msix(ivs_info->dev, ivs_info->msix_entries,
+ivs_info->nvectors);
+if (err > 0) {
+printk(KERN_INFO "no MSI (%d). Back to INTx.\n", err);
+return -ENOSPC;
+}
+}
+
+printk(KERN_INFO "err is %d\n", err);
+if (err) return err;
+
+for (i = 0; i < ivs_info->nvectors; i++) {
+
+snprintf(ivs_info->msix_names[i], sizeof *ivs_info->msix_names,
+"%s-config", name);
+
+ivs_info->msix_entries[i].entry = i;
+err = request_irq(ivs_info->msix_entries[i].vector,
+ivshmem_msix_handler, 0,
+ivs_info->msix_names[i], ivs_info->uio);
+
+if (err) {
+return -ENOSPC;
+}
+}
+
+return 0;
+}
+
+static int __devinit ivshmem_pci_probe(struct pci_dev *dev,
+const struct pci_device_id *id)
+{
+struct uio_info *info;
+struct ivshmem_info * ivshmem_info;
+int nvectors = 4;
+
+inf

[PATCH v3 2/2] Inter-VM shared memory PCI device

2010-03-24 Thread Cam Macdonell
Support an inter-vm shared memory device that maps a shared-memory object as a
PCI device in the guest.  This patch also supports interrupts between guests by
communicating over a unix domain socket.  This patch applies to the qemu-kvm
repository.

-device ivshmem,size=[,shm=]

Interrupts are supported between multiple VMs by using a shared memory server
that is reached through a chardev socket.

-device ivshmem,size=[,shm=][,chardev=][,msi=on]
[,irqfd=on][,nvectors=n]
-chardev socket,path=,id=

Sample programs, init scripts and the shared memory server are available in a
git repo here:

www.gitorious.org/nahanni
---
 Makefile.target |3 +
 hw/ivshmem.c|  622 +++
 qemu-char.c |6 +
 qemu-char.h |3 +
 4 files changed, 634 insertions(+), 0 deletions(-)
 create mode 100644 hw/ivshmem.c

diff --git a/Makefile.target b/Makefile.target
index 4d88543..15edf19 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -219,6 +219,9 @@ obj-y += pcnet.o
 obj-y += rtl8139.o
 obj-y += e1000.o
 
+# Inter-VM PCI shared memory
+obj-y += ivshmem.o
+
 # Hardware support
 obj-i386-y = ide/core.o ide/qdev.o ide/isa.o ide/pci.o ide/piix.o
 obj-i386-y += pckbd.o $(sound-obj-y) dma.o
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
new file mode 100644
index 000..c76aae3
--- /dev/null
+++ b/hw/ivshmem.c
@@ -0,0 +1,622 @@
+/*
+ * Inter-VM Shared Memory PCI device.
+ *
+ * Author:
+ *  Cam Macdonell 
+ *
+ * Based On: cirrus_vga.c and rtl8139.c
+ *
+ * This code is licensed under the GNU GPL v2.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "hw.h"
+#include "console.h"
+#include "pc.h"
+#include "pci.h"
+#include "sysemu.h"
+
+#include "msix.h"
+#include "qemu-kvm.h"
+#include "libkvm.h"
+
+#include 
+#include 
+#include 
+#include 
+
+#define PCI_COMMAND_IOACCESS0x0001
+#define PCI_COMMAND_MEMACCESS   0x0002
+
+#define DEBUG_IVSHMEM
+
+#define IVSHMEM_IRQFD   0
+#define IVSHMEM_MSI 1
+
+#ifdef DEBUG_IVSHMEM
+#define IVSHMEM_DPRINTF(fmt, args...)\
+do {printf("IVSHMEM: " fmt, ##args); } while (0)
+#else
+#define IVSHMEM_DPRINTF(fmt, args...)
+#endif
+
+#define NEW_GUEST_VAL UINT_MAX
+
+typedef struct IVShmemState {
+PCIDevice dev;
+uint32_t intrmask;
+uint32_t intrstatus;
+uint32_t doorbell;
+
+CharDriverState * chr;
+CharDriverState * eventfd_chr;
+int ivshmem_mmio_io_addr;
+
+pcibus_t mmio_addr;
+uint8_t *ivshmem_ptr;
+unsigned long ivshmem_offset;
+unsigned int ivshmem_size;
+int shm_fd; /* shared memory file descriptor */
+
+struct kvm_ioeventfd ioeventfds[16];
+int eventfds[16]; /* for now we have a limit of 16 inter-connected guests */
+int eventfd_posn;
+int num_eventfds;
+uint32_t vectors;
+uint32_t features;
+
+char * shmobj;
+uint32_t size; /*size of shared memory in MB*/
+} IVShmemState;
+
+/* registers for the Inter-VM shared memory device */
+enum ivshmem_registers {
+IntrMask = 0,
+IntrStatus = 4,
+IVPosition = 8,
+Doorbell = 12,
+};
+
+static inline uint32_t ivshmem_has_feature(IVShmemState *ivs, int feature) {
+return (ivs->features & (1 << feature));
+}
+
+static inline int is_power_of_two(int x) {return (x & (x-1)) == 0;}
+
+static void ivshmem_map(PCIDevice *pci_dev, int region_num,
+pcibus_t addr, pcibus_t size, int type)
+{
+IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev);
+
+IVSHMEM_DPRINTF("addr = %u size = %u\n", (uint32_t)addr, (uint32_t)size);
+cpu_register_physical_memory(addr, s->ivshmem_size, s->ivshmem_offset);
+
+}
+
+/* accessing registers - based on rtl8139 */
+static void ivshmem_update_irq(IVShmemState *s, int val)
+{
+int isr;
+isr = (s->intrstatus & s->intrmask) & 0x;
+
+/* don't print ISR resets */
+if (isr) {
+IVSHMEM_DPRINTF("Set IRQ to %d (%04x %04x)\n",
+   isr ? 1 : 0, s->intrstatus, s->intrmask);
+}
+
+qemu_set_irq(s->dev.irq[0], (isr != 0));
+}
+
+static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val)
+{
+IVSHMEM_DPRINTF("IntrMask write(w) val = 0x%04x\n", val);
+
+s->intrmask = val;
+
+ivshmem_update_irq(s, val);
+}
+
+static uint32_t ivshmem_IntrMask_read(IVShmemState *s)
+{
+uint32_t ret = s->intrmask;
+
+IVSHMEM_DPRINTF("intrmask read(w) val = 0x%04x\n", ret);
+
+return ret;
+}
+
+static void ivshmem_IntrStatus_write(IVShmemState *s, uint32_t val)
+{
+IVSHMEM_DPRINTF("IntrStatus write(w) val = 0x%04x\n", val);
+
+s->intrstatus = val;
+
+ivshmem_update_irq(s, val);
+return;
+}
+
+static uint32_t ivshmem_IntrStatus_read(IVShmemState *s)
+{
+uint32_t ret = s->intrstatus;
+
+/* reading ISR clears all interrupts */
+s->intrstatus = 0;
+
+ivshmem_update_irq(s, 0);
+
+return ret;
+}
+
+static void ivshmem_io_writew(void *opaque, uint8_t addr, uint32_t val)
+{
+
+IVSHMEM_DP

[PATCH v3 0/2] Inter-VM shared memory PCI device

2010-03-24 Thread Cam Macdonell
Support an inter-vm shared memory device that maps a shared-memory object
as a PCI device in the guest.  This patch also supports interrupts between
guests by communicating over a unix domain socket.  This patch applies to the
qemu-kvm repository.

Changes in this version are using the qdev format and optional use of MSI and
ioeventfd/irqfd.

The non-interrupt version is supported by passing the shm parameter

-device ivshmem,size=,[shm=]

which will simply map the shm object into a BAR.

Interrupts are supported between multiple VMs by using a shared memory server
that each VM connects to through a socket character device

-device ivshmem,size=[,chardev=][,irqfd=on]
[,msi=on][,nvectors=n]
-chardev socket,path=,id=

The server passes file descriptors for the shared memory object and eventfds
(our interrupt mechanism) to the respective qemu instances.

When using interrupts, VMs communicate with a shared memory server that passes
the shared memory object file descriptor using SCM_RIGHTS.  The server assigns
each VM an ID number and sends this ID number to the Qemu process along with a
series of eventfd file descriptors, one per guest using the shared memory
server.  These eventfds will be used to send interrupts between guests.  Each
guest listens on the eventfd corresponding to their ID and may use the others
for sending interrupts to other guests.
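
The eventfd behaviour this design relies on can be tried in isolation.  The
following is only a sketch of the Linux eventfd(2) semantics, not code from the
patch: a write adds to a 64-bit counter, and a read returns the accumulated
count and resets it.  The helper names notify_peer() and
wait_for_notification() are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Hypothetical helper: signal a peer by adding `count` to the counter of
 * its eventfd.  Returns 0 on success, -1 on error. */
static int notify_peer(int efd, uint64_t count)
{
    return write(efd, &count, sizeof(count)) == (ssize_t)sizeof(count) ? 0 : -1;
}

/* Hypothetical helper: consume pending notifications.  On a blocking
 * eventfd this sleeps until the counter is non-zero, then stores the
 * accumulated count in *count and resets the counter to zero. */
static int wait_for_notification(int efd, uint64_t *count)
{
    assert(count != NULL);
    return read(efd, count, sizeof(*count)) == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

With an eventfd created by eventfd(0, 0), two notifications of 1 and 2 followed
by one wait yield a count of 3, so a receiver learns that it was signalled but
not by whom; that information has to travel through the shared memory region.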

enum ivshmem_registers {
IntrMask = 0,
IntrStatus = 4,
IVPosition = 8,
Doorbell = 12
};

The first two registers are the interrupt mask and status registers.  Mask and
status are only used with pin-based interrupts.  They are unused with MSI
interrupts.  The IVPosition register is read-only and reports the guest's ID
number.  Interrupts are triggered when a message is received on the guest's
eventfd from another VM.  To trigger an event, a guest must write to another
guest's Doorbell.  The "Doorbells" begin at offset 12.  A particular guest's
doorbell offset in the MMIO region is equal to

guest_id * 32 + Doorbell

The doorbell register for each guest is 32 bits wide.  The doorbell-per-guest
design was motivated by use with ioeventfd.

The semantics of the value written to the doorbell depend on whether the
device is using MSI or a regular pin-based interrupt.

Regular Interrupts
------------------

If regular interrupts are used (due to either a guest not supporting MSI or the
user specifying not to use them on the command-line) then the value written to
a guest's doorbell is what the guest's status register will be set to.

A status of (2^32 - 1) indicates that a new guest has joined.  Guests
should not send a message of this value for any other reason.
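
A guest-side consumer of the status register might interpret the value along
these lines.  This is a sketch only: the enum and classify_status() are
hypothetical names, and treating a status of 0 as "nothing pending" is an
assumption borrowed from the interrupt handler in the uio driver patch.

```c
#include <assert.h>
#include <stdint.h>

/* 2^32 - 1, the value reserved for "a new guest has joined". */
#define NEW_GUEST_VAL UINT32_MAX
static_assert(NEW_GUEST_VAL == 0xffffffffu, "reserved value is 2^32 - 1");

/* Hypothetical classification of a status register value. */
enum ivshmem_event {
    IVSHMEM_EV_NONE,       /* status == 0: nothing pending (assumption) */
    IVSHMEM_EV_NEW_GUEST,  /* status == 2^32 - 1 */
    IVSHMEM_EV_MESSAGE     /* any other value written by a peer */
};

static enum ivshmem_event classify_status(uint32_t status)
{
    if (status == 0)
        return IVSHMEM_EV_NONE;
    if (status == NEW_GUEST_VAL)
        return IVSHMEM_EV_NEW_GUEST;
    return IVSHMEM_EV_MESSAGE;
}
```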

Message Signalled Interrupts
----------------------------

The important thing to remember with MSI is that it is only a signal, no
status is set (since MSI interrupts are not shared).  All information other
than the interrupt itself should be communicated via the shared memory region.
MSI is on by default.  It can be turned off by passing msi=off as a device
parameter.

If the device uses MSI then the value written to the doorbell is the MSI vector
that will be raised.  Vector 0 is used to notify that a new guest has joined.
Vector 0 cannot be triggered by another guest since a value of 0 does not
trigger an eventfd.

ioeventfd/irqfd
---------------

ioeventfd/irqfd is turned on by passing irqfd=on as a device parameter (it is
off by default).  When using ioeventfd/irqfd, the only interrupt value that can
be passed to another guest is 1, regardless of the value written to that
guest's Doorbell.

Sample programs, init scripts and the shared memory server are available in a
git repo here:

www.gitorious.org/nahanni

Cam Macdonell (2):
  Support adding a file to qemu's ram allocation
  Inter-VM shared memory PCI device

 Makefile.target |3 +
 cpu-common.h|1 +
 exec.c  |   33 +++
 hw/ivshmem.c|  622 +++
 qemu-char.c |6 +
 qemu-char.h |3 +
 6 files changed, 668 insertions(+), 0 deletions(-)
 create mode 100644 hw/ivshmem.c



[PATCH v3 1/2] Support adding a file to qemu's ram allocation

2010-03-24 Thread Cam Macdonell
This avoids the need of using qemu_ram_alloc and mmap with MAP_FIXED to map a
host file into guest RAM.  This function mmaps the opened file anywhere and adds
the memory to the ram blocks.

Usage is

qemu_ram_mmap(fd, size, MAP_SHARED, offset);
---
 cpu-common.h |1 +
 exec.c   |   33 +
 2 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/cpu-common.h b/cpu-common.h
index 6cae15b..dffe4e7 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -32,6 +32,7 @@ static inline void cpu_register_physical_memory(target_phys_addr_t start_addr,
 }
 
 ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr);
+ram_addr_t qemu_ram_mmap(int, ram_addr_t, int, int);
 ram_addr_t qemu_ram_alloc(ram_addr_t);
 void qemu_ram_free(ram_addr_t addr);
 /* This should only be used for ram local to a device.  */
diff --git a/exec.c b/exec.c
index 3b4426e..6f4e747 100644
--- a/exec.c
+++ b/exec.c
@@ -2727,6 +2727,39 @@ static void *file_ram_alloc(ram_addr_t memory, const char *path)
 }
 #endif
 
+ram_addr_t qemu_ram_mmap(int fd, ram_addr_t size, int flags, int offset)
+{
+RAMBlock *new_block;
+
+size = TARGET_PAGE_ALIGN(size);
+new_block = qemu_malloc(sizeof(*new_block));
+
+// map the file passed as a parameter to be this part of memory
+new_block->host = mmap(0, size, PROT_READ|PROT_WRITE, flags, fd, offset);
+
+#ifdef MADV_MERGEABLE
+madvise(new_block->host, size, MADV_MERGEABLE);
+#endif
+
+new_block->offset = last_ram_offset;
+new_block->length = size;
+
+new_block->next = ram_blocks;
+ram_blocks = new_block;
+
+phys_ram_dirty = qemu_realloc(phys_ram_dirty,
+(last_ram_offset + size) >> TARGET_PAGE_BITS);
+memset(phys_ram_dirty + (last_ram_offset >> TARGET_PAGE_BITS),
+   0xff, size >> TARGET_PAGE_BITS);
+
+last_ram_offset += size;
+
+if (kvm_enabled())
+kvm_setup_guest_memory(new_block->host, size);
+
+return new_block->offset;
+}
+
 ram_addr_t qemu_ram_alloc(ram_addr_t size)
 {
 RAMBlock *new_block;
-- 
1.6.6.1
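
The core of the approach, mapping an already opened file MAP_SHARED at an
address chosen by the kernel rather than with MAP_FIXED, can be tried outside
of qemu.  The sketch below is standalone and hypothetical (map_shared_file()
and open_backing_file() are not qemu functions), and unlike the patch it checks
the mmap() result against MAP_FAILED.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical stand-in for the shared memory object: an unlinked
 * temporary file grown to `size` bytes.  Returns an fd or -1. */
static int open_backing_file(size_t size)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = dup(fileno(f));   /* keep the file alive after fclose() */
    fclose(f);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map `size` bytes of `fd` wherever the kernel chooses (no MAP_FIXED).
 * `offset` must be a multiple of the page size.  Returns NULL on failure. */
static void *map_shared_file(int fd, size_t size, off_t offset)
{
    void *host = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                      fd, offset);
    return host == MAP_FAILED ? NULL : host;
}
```

Two such mappings of the same file see each other's writes, which is the
property the inter-VM shared memory device depends on.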



KVM autotest patch queue report 03-25-2010

2010-03-24 Thread Lucas Meneghel Rodrigues
This week we had a strong influx of new patches, with 19 patches already
applied out of 21. If you want to check out the commits made, you can
refer to:

http://autotest.kernel.org/timeline


Summary
=======

Total patches: 10
Reviewed patches: 5
Reviews unfinished: 5

Autotest patchwork
http://patchwork.test.kernel.org/project/autotest/list/


 
[KVM-AUTOTEST,5/5] KVM test: abort-on-error mode 2010-03-23 Michael Goldish lmr Under Review

Code looks good. Testing stage.


 
[KVM-AUTOTEST,v3] KVM test: take frequent screendumps during all tests 2010-03-23 Michael Goldish lmr Under Review

Code looks good. Testing stage.


 
[KVM-AUTOTEST] Opensuse unattended install 2010-03-23 yogi lmr Under Review

I am concerned about a timeout added at the end of unattended install,
which could have bad side effects on other guests. Testing stage.


KVM-Test: Add kvm userspace unit test 2010-03-05 sshang lmr Under Review

Naphtali Sprei is working on support for running the unit tests with a new
in-qemu infrastructure made by Avi, so this test will wait a little bit
so we can merge both approaches into one.


[2/2] KVM test: Add cpu_set subtest 2010-02-25 Lucas Meneghel Rodrigues lmr Under Review

This patch will stay on the queue until the feature tested gets in a
better shape on KVM upstream


KVM test: Add support for ipv6 addresses 2010-02-24 Lucas Meneghel Rodrigues lmr Under Review

This test was reviewed and the decision is that it will stay on the
queue until we have more extensive guest network testing.


KVM test: Memory ballooning test for KVM guest 2010-02-11 pradeep lmr Under Review

Made comments to the patch originator, waiting for a revised version.


KVM-test: Add a subtest 'qemu_img' 2010-01-29 Yolkfull Chow lmr Under Review

Made comments to the patch originator, waiting for a revised version.


[2/2] KVM test: subtest migration: Add rem_host and rem_port for migrate() 2009-12-08 Yolkfull Chow lmr Under Review
[1/2,-,V3] Add a server-side test - kvm_migration 2009-12-08 Yolkfull Chow lmr Under Review

This patchset still needs full review, but remote migration is something
that we want to take slowly, so we can have a first version integrated
upstream with a good round of testing. Right now I am not sure if the
approach on this patchset is the right way of approaching the problem.



Re: MSI-X not enabled for ixgbe device-passthrough

2010-03-24 Thread Sheng Yang
On Wednesday 24 March 2010 23:54:15 Hannes Reinecke wrote:
> Hi all,
> 
> I'm trying to setup a system with device-passthrough for
> an ixgbe NIC.
> The device itself seems to work, but it isn't using MSI-X.
> So some more advanced features like DCB offloading etc
> won't work.

What does lspci show inside the guest?

And some guest dmesg would also help.

-- 
regards
Yang, Sheng

> 
> lspci output of the device:
> 07:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit Network
>  Connection (rev 01) Subsystem: Intel Corporation Ethernet Server Adapter
>  X520-2 Flags: bus master, fast devsel, latency 0, IRQ 24
> Memory at f5c8 (64-bit, prefetchable) [size=512K]
> I/O ports at 5000 [size=32]
> Memory at f5c7 (64-bit, prefetchable) [size=16K]
> [virtual] Expansion ROM at e710 [disabled] [size=512K]
> Capabilities: [40] Power Management version 3
> Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+
>  Count=1/1 Enable- Capabilities: [70] MSI-X: Enable+ Mask- TabSize=64
> Capabilities: [a0] Express Endpoint, MSI 00
> Capabilities: [100] Advanced Error Reporting
> UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>  RxOF- MalfTLP- ECRC- UnsupReq+ ACSVoil- UEMsk:  DLP- SDES- TLP- FCP-
>  CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSVoil-
>  UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
>  ECRC- UnsupReq- ACSVoil- CESta:  RxErr- BadTLP- BadDLLP- Rollover-
>  Timeout- NonFatalErr+ CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout-
>  NonFatalErr- AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+
>  ChkEn- Capabilities: [140] Device Serial Number 40-9e-3c-ff-ff-21-1b-00
>  Capabilities: [150] Alternative Routing-ID Interpretation (ARI) ARICap:
>  MFVC- ACS-, Next Function: 1
> ARICtl: MFVC- ACS-, Function Group: 0
> Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
> IOVCap: Migration-, Interrupt Message Number: 000
> IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
> IOVSta: Migration-
> Initial VFs: 64, Total VFs: 64, Number of VFs: 64, Function
>  Dependency Link: 00 VF offset: 128, stride: 2, Device ID: 10ed
> Supported Page Size: 0553, System Page Size: 0001
> VF Migration: offset: , BIR: 1
> Kernel driver in use: ixgbe
> Kernel modules: ixgbe
> 
> please let me know if you need more information.
> 
> Cheers,
> 
> Hannes
> 


RE: [Qemu-devel] Re: KVM call agenda for Mar 23

2010-03-24 Thread Zhang, Xiantao
Jes Sorensen wrote:
> On 03/23/10 13:45, Anthony Liguori wrote:
>> I don't think we can pull in:
>> 
>> - extboot
>> - ia64
>> - in-kernel pit[1]
>> - associated command line options
>> - device passthrough
>> 
>> The question is, if we dropped those things, would people actually
>> use qemu.git instead of qemu-kvm.git. If the answer is "no", what
>> set of things do we need in order for people to focus on qemu.git
>> instead of qemu-kvm.git.
> 
> I am not sure if anyone is still actively working on ia64. According
> to the qemu-kvm.git logs, there hasn't been any real ia64 changes to
> the code since my last commit in June of last year and then a couple
> of minor configure bits.
> 
> IMHO we can just let it rot - not sure if Xiantao is still interested?
For the ia64 part, maybe we can keep the current qemu-kvm.git for the users.
It is not a must to push it into Qemu upstream.
Xiantao



Re: [Autotest] [PATCH 3/4] KVM test: Add scrashme into guest test

2010-03-24 Thread Lucas Meneghel Rodrigues
This test was applied, but I changed the argument list to execute the
test with --mode=random, that is, random syscalls will be tested with
random values.

On Mon, Mar 22, 2010 at 4:45 AM, Jason Wang  wrote:
> This patch let the scrashme run in the guest. Scrashme is one kind of
> fuzzing or stress test through systemcall. It should be useful in
> testing the VMM.
>
> Signed-off-by: Jason Wang 
> ---
>  client/tests/kvm/autotest_control/scrashme.control |   14 ++
>  client/tests/kvm/tests_base.cfg.sample             |    3 +++
>  2 files changed, 17 insertions(+), 0 deletions(-)
>  create mode 100644 client/tests/kvm/autotest_control/scrashme.control
>
> diff --git a/client/tests/kvm/autotest_control/scrashme.control b/client/tests/kvm/autotest_control/scrashme.control
> new file mode 100644
> index 000..ba2466f
> --- /dev/null
> +++ b/client/tests/kvm/autotest_control/scrashme.control
> @@ -0,0 +1,14 @@
> +NAME='scrashme'
> +AUTHOR='Yi Yang '
> +TEST_CATEGORY='Stress'
> +TEST_CLASS='Kernel'
> +TEST_TYPE='client'
> +TIME='MEDIUM'
> +DOC='''
> +Runs the scrashme suite located at:
> +http://www.codemonkey.org.uk/projects/scrashme/
> +
> +Runs the scrashme syscalls test suite. This test mode will exercise
> +kernel syscalls randomically, or in a sequential fashion.
> +'''
> +job.run_test('scrashme')
> diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
> index 861759e..8cc83a9 100644
> --- a/client/tests/kvm/tests_base.cfg.sample
> +++ b/client/tests/kvm/tests_base.cfg.sample
> @@ -139,6 +139,9 @@ variants:
>             - tsc:
>                 test_name = tsc
>                 test_control_file = tsc.control
> +            - scrashme:
> +                test_name = scrashme
> +                test_control_file = scrashme.control
>
>     - linux_s3:     install setup unattended_install
>         type = linux_s3
>
> ___
> Autotest mailing list
> autot...@test.kernel.org
> http://test.kernel.org/cgi-bin/mailman/listinfo/autotest
>



-- 
Lucas


[KVM-AUTOTEST PATCH] KVM test: tests_base.cfg.sample: remove kill_vm_gracefully=no for system_powerdown

2010-03-24 Thread Michael Goldish
There's no good reason to set kill_vm_gracefully=no for system_powerdown.
Furthermore, killing the VM ungracefully (with 'quit') can severely damage the
filesystem (I witnessed that at least once).
Therefore, revert to the default, which is kill_vm_gracefully=yes.

Signed-off-by: Michael Goldish 
---
 client/tests/kvm/tests_base.cfg.sample |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index b8288fc..c9dfd0b 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -302,7 +302,6 @@ variants:
 shutdown_method = system_powerdown
 sleep_before_powerdown = 20
 kill_vm = yes
-kill_vm_gracefully = no
 
 - system_reset: install setup unattended_install
 type = boot
-- 
1.5.4.1



[patch 3/8] test: protect fwcfg accesses with lock

2010-03-24 Thread Marcelo Tosatti
So that multiple CPUs can access fwcfg safely.

Signed-off-by: Marcelo Tosatti 

Index: qemu-kvm/kvm/user/test/lib/x86/fwcfg.c
===================================================================
--- qemu-kvm.orig/kvm/user/test/lib/x86/fwcfg.c
+++ qemu-kvm/kvm/user/test/lib/x86/fwcfg.c
@@ -1,4 +1,7 @@
 #include "fwcfg.h"
+#include "smp.h"
+
+static struct spinlock lock;
 
 uint64_t fwcfg_get_u(uint16_t index, int bytes)
 {
@@ -6,11 +9,13 @@ uint64_t fwcfg_get_u(uint16_t index, int
 uint8_t b;
 int i;
 
+spin_lock(&lock);
 asm volatile ("out %0, %1" : : "a"(index), "d"((uint16_t)BIOS_CFG_IOPORT));
 for (i = 0; i < bytes; ++i) {
 asm volatile ("in %1, %0" : "=a"(b) : "d"((uint16_t)(BIOS_CFG_IOPORT + 1)));
 r |= (uint64_t)b << (i * 8);
 }
+spin_unlock(&lock);
 return r;
 }
 




[patch 5/8] testdev: add port to create/delete memslots

2010-03-24 Thread Marcelo Tosatti
Signed-off-by: Marcelo Tosatti 

Index: qemu-kvm/hw/testdev.c
===================================================================
--- qemu-kvm.orig/hw/testdev.c
+++ qemu-kvm/hw/testdev.c
@@ -5,6 +5,10 @@
 struct testdev {
 ISADevice dev;
 CharDriverState *chr;
+struct memslot {
+target_phys_addr_t start;
+target_phys_addr_t end;
+} memslot;
 };
 
 static void test_device_serial_write(void *opaque, uint32_t addr, uint32_t data)
@@ -90,6 +94,45 @@ static CPUWriteMemoryFunc * const test_i
 test_iomem_writel,
 };
 
+#define CMD_CREATE_SLOT 0x0
+#define CMD_DELETE_SLOT 0x1
+
+static void test_device_memslot_write(void *opaque, uint32_t addr, uint32_t data)
+{
+uint32_t port = addr - 0x2018;
+struct testdev *dev = opaque;
+
+switch(port) {
+case 0:
+dev->memslot.start = 0;
+case 4:
+dev->memslot.start |= (unsigned long)data << (port * 8);
+break;
+case 8:
+dev->memslot.end = 0;
+case 12:
+dev->memslot.end |= (unsigned long)data << ((port-8) * 8);
+break;
+case 16:
+if (data == CMD_CREATE_SLOT) {
+ram_addr_t ram_addr, size;
+
+size = dev->memslot.end - dev->memslot.start;
+
+ram_addr = qemu_ram_alloc(size);
+cpu_register_physical_memory(dev->memslot.start, size, ram_addr);
+}
+else if (data == CMD_DELETE_SLOT) {
+ram_addr_t size = dev->memslot.end - dev->memslot.start;
+
+cpu_register_physical_memory(dev->memslot.start, size,
+ IO_MEM_UNASSIGNED);
+}
+default:
+break;
+}
+}
+
 static int init_test_device(ISADevice *isa)
 {
 struct testdev *dev = DO_UPCAST(struct testdev, dev, isa);
@@ -105,6 +148,8 @@ static int init_test_device(ISADevice *i
 register_ioport_read(0xe0, 1, 4, test_device_ioport_read, dev);
 register_ioport_write(0xe0, 1, 4, test_device_ioport_write, dev);
 register_ioport_write(0x2000, 24, 1, test_device_irq_line, NULL);
+register_ioport_write(0x2018, 20, 4, test_device_memslot_write, dev);
+
 iomem_buf = qemu_mallocz(0x1);
 iomem = cpu_register_io_memory(test_iomem_read, test_iomem_write, NULL);
 cpu_register_physical_memory(0xff00, 0x1, iomem);
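
The register protocol added above (two 32-bit writes per address, then a command) can be sketched as a small stand-alone model. The names struct slot_regs and slot_write() are hypothetical, and the model uses uint64_t where the patch casts to unsigned long, which is 64 bits wide only on LP64 hosts. The offsets are the port number minus 0x2018.

```c
#include <stdint.h>

#define CMD_CREATE_SLOT 0x0
#define CMD_DELETE_SLOT 0x1

/* Illustrative model of the five 32-bit registers starting at port 0x2018. */
struct slot_regs {
    uint64_t start;
    uint64_t end;
    int last_cmd;       /* -1 until a command is written */
};

/* Decode one 32-bit write; `offset` is the port minus 0x2018. */
static void slot_write(struct slot_regs *r, uint32_t offset, uint32_t data)
{
    switch (offset) {
    case 0:                     /* low half: also resets the old value */
        r->start = data;
        break;
    case 4:                     /* high half */
        r->start |= (uint64_t)data << 32;
        break;
    case 8:
        r->end = data;
        break;
    case 12:
        r->end |= (uint64_t)data << 32;
        break;
    case 16:                    /* command register */
        r->last_cmd = (int)data;
        break;
    default:
        break;
    }
}
```

A guest therefore has to write the low half before the high half of each address, because the low-half write clears the previous value.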




[patch 0/8] add slot deletion, rmap chain tests

2010-03-24 Thread Marcelo Tosatti




[patch 2/8] test: add pagefault exception handler

2010-03-24 Thread Marcelo Tosatti
The handler prints cr2 and exits.

Signed-off-by: Marcelo Tosatti 

Index: qemu-kvm/kvm/user/test/lib/x86/smp.c
===
--- qemu-kvm.orig/kvm/user/test/lib/x86/smp.c
+++ qemu-kvm/kvm/user/test/lib/x86/smp.c
@@ -5,6 +5,7 @@
 #include "fwcfg.h"
 
 #define IPI_VECTOR 0x20
+#define PF_VECTOR 0xe
 
 static struct spinlock ipi_lock;
 static void (*ipi_function)(void *data);
@@ -18,6 +19,20 @@ static __attribute__((used)) void ipi()
 ipi_done = 1;
 }
 
+unsigned long read_cr2()
+{
+unsigned long cr2;
+
+asm volatile ("mov %%cr2, %0" : "=r"(cr2));
+return cr2;
+}
+
+static __attribute__((used)) void pf()
+{
+printf("PF: %lx\n", read_cr2());
+asm ("call exit");
+}
+
 asm (
  "ipi_entry: \n"
  "   call ipi \n"
@@ -28,21 +43,31 @@ asm (
 #endif
  );
 
+asm (
+ "pf_entry: \n"
+ "   call pf \n"
+#ifndef __x86_64__
+ "   iret"
+#else
+ "   iretq"
+#endif
+ );
+
 
-static void set_ipi_descriptor(void (*ipi_entry)(void))
+static void set_exp_descriptor(void (*entry)(void), unsigned vec)
 {
-unsigned short *desc = (void *)(IPI_VECTOR * sizeof(long) * 2);
+unsigned short *desc = (void *)(vec * sizeof(long) * 2);
 unsigned short cs;
-unsigned long ipi = (unsigned long)ipi_entry;
+unsigned long fn = (unsigned long)entry;
 
 asm ("mov %%cs, %0" : "=r"(cs));
-desc[0] = ipi;
+desc[0] = fn;
 desc[1] = cs;
 desc[2] = 0x8e00;
-desc[3] = ipi >> 16;
+desc[3] = fn >> 16;
 #ifdef __x86_64__
-desc[4] = ipi >> 32;
-desc[5] = ipi >> 48;
+desc[4] = fn >> 32;
+desc[5] = fn >> 48;
 desc[6] = 0;
 desc[7] = 0;
 #endif
@@ -118,8 +143,10 @@ void smp_init_ids(void)
 {
 int i;
 void ipi_entry(void);
+void pf_entry(void);
 
-set_ipi_descriptor(ipi_entry);
+set_exp_descriptor(ipi_entry, IPI_VECTOR);
+set_exp_descriptor(pf_entry, PF_VECTOR);
 
 setup_smp_id(0);
 for (i = 1; i < cpu_count(); ++i)
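
As a rough illustration of what set_exp_descriptor() writes in the 64-bit case, the gate layout can be packed and checked in ordinary C. pack_gate64() and gate_offset64() are hypothetical helper names; the sketch assumes 16-byte gates, which matches vec * sizeof(long) * 2 on x86_64.

```c
#include <stdint.h>

/* Pack a 64-bit-mode interrupt gate as eight 16-bit words. */
static void pack_gate64(uint16_t desc[8], uint64_t handler, uint16_t cs)
{
    desc[0] = (uint16_t)handler;           /* offset bits 15:0 */
    desc[1] = cs;                          /* code segment selector */
    desc[2] = 0x8e00;                      /* present, DPL 0, interrupt gate */
    desc[3] = (uint16_t)(handler >> 16);   /* offset bits 31:16 */
    desc[4] = (uint16_t)(handler >> 32);   /* offset bits 47:32 */
    desc[5] = (uint16_t)(handler >> 48);   /* offset bits 63:48 */
    desc[6] = 0;                           /* reserved */
    desc[7] = 0;
}

/* Byte offset of a vector's descriptor in a table of 16-byte gates. */
static uint64_t gate_offset64(unsigned vec)
{
    return (uint64_t)vec * 16;
}
```

With the table based at address 0, the page-fault gate (vector 0xe) lands at byte offset 0xe0 and the IPI gate (vector 0x20) at 0x200.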




[patch 1/8] test: allow functions to execute on non-irq context remotely

2010-03-24 Thread Marcelo Tosatti
This allows code to run on a remote CPU outside interrupt context, so that
CPU can still receive interrupts while the code runs.

Also move late smp initialization to common code, and the smp loop
to C code.

Signed-off-by: Marcelo Tosatti 

Index: qemu-kvm/kvm/user/test/lib/x86/smp.c
===
--- qemu-kvm.orig/kvm/user/test/lib/x86/smp.c
+++ qemu-kvm/kvm/user/test/lib/x86/smp.c
@@ -114,7 +114,7 @@ void on_cpu_async(int cpu, void (*functi
 }
 
 
-void smp_init(void)
+void smp_init_ids(void)
 {
 int i;
 void ipi_entry(void);
@@ -125,4 +125,57 @@ void smp_init(void)
 for (i = 1; i < cpu_count(); ++i)
 on_cpu(i, setup_smp_id, 0);
 
+   printf("detected %d cpus\n", cpu_count());
+}
+
+static void *smp_function(void)
+{
+void *fn;
+
+asm ("mov %%gs:8, %0" : "=r"(fn));
+return fn;
+}
+
+static void setup_smp_function(void *data)
+{
+asm ("mov %0, %%gs:8" : : "r"(data) : "memory");
+}
+
+static void *smp_data(void)
+{
+void *fn;
+
+asm ("mov %%gs:16, %0" : "=r"(fn));
+return fn;
+}
+
+static void setup_smp_data(void *data)
+{
+asm ("mov %0, %%gs:16" : : "r"(data) : "memory");
+}
+
+void on_cpu_noipi(int cpu, void (*function)(void *data), void *data)
+{
+if (cpu == smp_id())
+function(data);
+else {
+on_cpu(cpu, setup_smp_data, data);
+on_cpu(cpu, setup_smp_function, function);
+}
+}
+
+void smp_loop(void)
+{
+void (*fn)(void *data);
+void *data;
+
+asm volatile ("hlt");
+if (smp_id() == 0)
+return;
+
+fn = smp_function();
+if (fn) {
+fn(smp_data());
+setup_smp_function(0);
+}
 }
Index: qemu-kvm/kvm/user/test/lib/x86/smp.h
===
--- qemu-kvm.orig/kvm/user/test/lib/x86/smp.h
+++ qemu-kvm/kvm/user/test/lib/x86/smp.h
@@ -5,12 +5,11 @@ struct spinlock {
 int v;
 };
 
-void smp_init(void);
-
 int cpu_count(void);
 int smp_id(void);
 void on_cpu(int cpu, void (*function)(void *data), void *data);
 void on_cpu_async(int cpu, void (*function)(void *data), void *data);
+void on_cpu_noipi(int cpu, void (*function)(void *data), void *data);
 void spin_lock(struct spinlock *lock);
 void spin_unlock(struct spinlock *lock);
 
Index: qemu-kvm/kvm/user/test/x86/cstart64.S
===
--- qemu-kvm.orig/kvm/user/test/x86/cstart64.S
+++ qemu-kvm/kvm/user/test/x86/cstart64.S
@@ -165,7 +165,7 @@ ap_start64:
nop
lock incw cpu_online_count
 
-1: hlt
+1: call smp_loop
jmp 1b
 
 start64:
@@ -174,6 +174,7 @@ start64:
call enable_apic
call smp_init
call enable_x2apic
+   call smp_init_ids
call main
mov %eax, %edi
call exit
Index: qemu-kvm/kvm/user/test/x86/smptest.c
===
--- qemu-kvm.orig/kvm/user/test/x86/smptest.c
+++ qemu-kvm/kvm/user/test/x86/smptest.c
@@ -15,8 +15,6 @@ int main()
 int ncpus;
 int i;
 
-smp_init();
-
 ncpus = cpu_count();
 printf("found %d cpus\n", ncpus);
 for (i = 0; i < ncpus; ++i)
Index: qemu-kvm/kvm/user/test/x86/vmexit.c
===
--- qemu-kvm.orig/kvm/user/test/x86/vmexit.c
+++ qemu-kvm/kvm/user/test/x86/vmexit.c
@@ -155,8 +155,6 @@ int main(void)
 {
int i;
 
-   smp_init();
-
for (i = 0; i < ARRAY_SIZE(tests); ++i)
do_test(&tests[i]);
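
The hand-off used by on_cpu_noipi() and smp_loop() is essentially a per-CPU mailbox. A single-threaded sketch of that pattern follows; struct cpu_mailbox, mailbox_post() and mailbox_poll() are hypothetical names standing in for the %gs:8 and %gs:16 slots, and the IPI that wakes the target from hlt is not modelled.

```c
#include <stddef.h>

/* Hypothetical per-CPU mailbox standing in for the %gs:8 and %gs:16 slots. */
struct cpu_mailbox {
    void (*fn)(void *data);
    void *data;
};

/* Sender side: publish the argument first, then the function pointer. */
static void mailbox_post(struct cpu_mailbox *mb, void (*fn)(void *), void *data)
{
    mb->data = data;
    mb->fn = fn;
}

/* Receiver side: one iteration of the polling loop; returns 1 if work ran. */
static int mailbox_poll(struct cpu_mailbox *mb)
{
    void (*fn)(void *) = mb->fn;

    if (!fn)
        return 0;
    fn(mb->data);
    mb->fn = NULL;      /* mark the slot free again */
    return 1;
}

/* Example payload: increment the int that `data` points to. */
static void bump(void *data)
{
    ++*(int *)data;
}
```

Because the function is called from the polling loop rather than from the IPI handler, it runs with the interrupt already acknowledged and returned from.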
 




[patch 6/8] test: parallel faults vs slot deletion

2010-03-24 Thread Marcelo Tosatti
Signed-off-by: Marcelo Tosatti 

Index: qemu-kvm/kvm/user/config-x86-common.mak
===
--- qemu-kvm.orig/kvm/user/config-x86-common.mak
+++ qemu-kvm/kvm/user/config-x86-common.mak
@@ -42,6 +42,9 @@ $(TEST_DIR)/sieve.flat: $(cstart.o) $(TE
$(TEST_DIR)/print.o $(TEST_DIR)/vm.o
  
 $(TEST_DIR)/vmexit.flat: $(cstart.o) $(TEST_DIR)/vmexit.o
+
+$(TEST_DIR)/slot_deletion.flat: $(cstart.o) $(TEST_DIR)/slot_deletion.o \
+   $(TEST_DIR)/print.o $(TEST_DIR)/vm.o
  
 $(TEST_DIR)/test32.flat: $(TEST_DIR)/test32.o
 
Index: qemu-kvm/kvm/user/config-x86_64.mak
===
--- qemu-kvm.orig/kvm/user/config-x86_64.mak
+++ qemu-kvm/kvm/user/config-x86_64.mak
@@ -7,6 +7,7 @@ CFLAGS += -D__x86_64__
 tests = $(TEST_DIR)/access.flat $(TEST_DIR)/sieve.flat \
   $(TEST_DIR)/simple.flat $(TEST_DIR)/stringio.flat \
   $(TEST_DIR)/memtest1.flat $(TEST_DIR)/emulator.flat \
-  $(TEST_DIR)/hypercall.flat $(TEST_DIR)/apic.flat
+  $(TEST_DIR)/hypercall.flat $(TEST_DIR)/apic.flat \
+  $(TEST_DIR)/slot_deletion.flat
 
 include config-x86-common.mak
Index: qemu-kvm/kvm/user/test/x86/slot_deletion.c
===
--- /dev/null
+++ qemu-kvm/kvm/user/test/x86/slot_deletion.c
@@ -0,0 +1,130 @@
+/* test parallel faults vs slot deletion */
+
+#include "libcflat.h"
+#include "vm.h"
+#include "smp.h"
+
+static unsigned int inl(unsigned short port)
+{
+unsigned int val;
+asm volatile ("inl %w1, %0":"=a" (val):"Nd" (port));
+return val;
+}
+
+static void outl(unsigned int data, unsigned short port)
+{
+asm volatile ("outl %0, %1"::"a" (data), "d" (port));
+}
+
+static int write_mem_slot (unsigned long start, unsigned long end)
+{
+outl(start, 0x2018);
+outl((unsigned long) start >> 32, 0x201c);
+
+outl(end, 0x2020);
+outl((unsigned long) end >> 32, 0x2024);
+return 0;
+}
+
+#define CMD_CREATE_SLOT 0x0
+#define CMD_DELETE_SLOT 0x1
+
+int create_mem_slot(unsigned long start, unsigned long end)
+{
+write_mem_slot (start, end);
+outl(CMD_CREATE_SLOT, 0x2028);
+return 0;
+}
+
+int delete_mem_slot(unsigned long start, unsigned long end)
+{
+write_mem_slot (start, end);
+outl(CMD_DELETE_SLOT, 0x2028);
+return 0;
+}
+
+void map_addr_with_pte_phys(void *virt_addr, unsigned long pte_phys,
+   void *target_page)
+{
+/* 1:1 map the pagetable inside memslot */
+install_page(phys_to_virt(read_cr3()), pte_phys, (void *)pte_phys);
+install_pte(phys_to_virt(read_cr3()), 1, virt_addr,
+   virt_to_phys(target_page) | PTE_PRESENT | PTE_WRITE,
+   (void *) pte_phys);
+}
+
+#define define_barrier(x) int count_##x = 0;\
+static void barrier_##x(void) { \
+count_##x++;\
+while (count_##x < cpu_count());\
+}
+
+define_barrier(cr3);
+define_barrier(fault);
+define_barrier(done);
+
+static void fault_vaddr(void *data)
+{
+unsigned long *target_map = data;
+
+barrier_fault();
+*target_map = 0;
+barrier_done();
+}
+
+void run_test(void)
+{
+unsigned long start, end, pte_phys;
+void *target_page, *virt_addr;
+int i;
+
+start = inl(0xd1);
+end = start + (PAGE_SIZE * 1000);
+create_mem_slot(start, end);
+target_page = alloc_page();
+
+pte_phys = start;
+virt_addr = (void *) 0xfa000;
+for (i = 2; i <= cpu_count(); i++) {
+map_addr_with_pte_phys(virt_addr, pte_phys, target_page);
+pte_phys += PAGE_SIZE;
+virt_addr += PAGE_SIZE * 512;
+}
+
+count_fault = 0;
+count_done = 0;
+pte_phys = start;
+virt_addr = (void *)0xfa000;
+for (i = 2; i <= cpu_count(); i++) {
+on_cpu_noipi(i-1, fault_vaddr, virt_addr);
+pte_phys += PAGE_SIZE;
+virt_addr += PAGE_SIZE * 512;
+}
+
+barrier_fault();
+delete_mem_slot(start, end);
+barrier_done();
+}
+
+static void setup_cr3 (void *cr3)
+{
+load_cr3(virt_to_phys(cr3));
+barrier_cr3();
+}
+
+int main (void)
+{
+int i;
+
+setup_vm();
+for (i = 2; i <= cpu_count(); i++)
+on_cpu_noipi(i-1, setup_cr3, (void *)read_cr3());
+
+barrier_cr3();
+
+for (i = 0; i < 100; i++)
+run_test();
+
+printf("SUCCESS\n");
+return 0;
+}
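
The define_barrier() macro above increments a plain int from several CPUs. A possible variant, shown only as a sketch and not as what the patch does, uses C11 atomics so that concurrent arrivals cannot be lost. struct spin_barrier and the barrier_*() helpers below are hypothetical names.

```c
#include <stdatomic.h>

/* One-shot counting barrier; a variant of define_barrier() using C11 atomics. */
struct spin_barrier {
    atomic_int count;
};

/* Reset before each use, as run_test() does with count_fault and count_done. */
static void barrier_reset(struct spin_barrier *b)
{
    atomic_store(&b->count, 0);
}

/* Record this CPU's arrival and return how many have arrived so far. */
static int barrier_arrive(struct spin_barrier *b)
{
    return atomic_fetch_add(&b->count, 1) + 1;
}

/* True once `ncpus` participants have arrived. */
static int barrier_full(struct spin_barrier *b, int ncpus)
{
    return atomic_load(&b->count) >= ncpus;
}

/* Arrive, then spin until everyone else has arrived too. */
static void barrier_wait(struct spin_barrier *b, int ncpus)
{
    barrier_arrive(b);
    while (!barrier_full(b, ncpus))
        ;   /* busy-wait: the test guest has no scheduler to yield to */
}
```

The atomic load in the spin loop also keeps the compiler from hoisting the read of the counter out of the loop.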




[patch 7/8] test: bump max vcpus to 64

2010-03-24 Thread Marcelo Tosatti
Signed-off-by: Marcelo Tosatti 

Index: qemu-kvm/kvm/user/test/x86/cstart64.S
===
--- qemu-kvm.orig/kvm/user/test/x86/cstart64.S
+++ qemu-kvm/kvm/user/test/x86/cstart64.S
@@ -6,7 +6,7 @@ boot_idt = 0
 
 ipi_vector = 0x20
 
-max_cpus = 4
+max_cpus = 64
 
 .bss
 




[patch 4/8] test: export vm helpers

2010-03-24 Thread Marcelo Tosatti
To be used by the following patches. Also make install_pte take an argument
indicating the physical location of the pagetable.

Signed-off-by: Marcelo Tosatti 
 
Index: qemu-kvm/kvm/user/test/x86/vm.c
===
--- qemu-kvm.orig/kvm/user/test/x86/vm.c
+++ qemu-kvm/kvm/user/test/x86/vm.c
@@ -3,22 +3,9 @@
 
 void print(const char *s);
 
-#define PAGE_SIZE 4096ul
-#define LARGE_PAGE_SIZE (512 * PAGE_SIZE)
-
 static void *free = 0;
 static void *vfree_top = 0;
 
-static unsigned long virt_to_phys(const void *virt) 
-{ 
-return (unsigned long)virt;
-}
-
-static void *phys_to_virt(unsigned long phys)
-{
-return (void *)phys;
-}
-
 void *memset(void *data, int c, unsigned long len)
 {
 char *s = data;
@@ -61,15 +48,11 @@ void free_page(void *page)
 extern char edata;
 static unsigned long end_of_memory;
 
-#define PTE_PRESENT (1ull << 0)
-#define PTE_PSE (1ull << 7)
-#define PTE_WRITE   (1ull << 1)
-#define PTE_ADDR(0xff000ull)
-
-static void install_pte(unsigned long *cr3, 
-   int pte_level, 
+void install_pte(unsigned long *cr3,
+   int pte_level,
void *virt,
-   unsigned long pte)
+   unsigned long pte,
+   unsigned long *pt_page)
 {
 int level;
 unsigned long *pt = cr3;
@@ -78,7 +61,11 @@ static void install_pte(unsigned long *c
 for (level = 4; level > pte_level; --level) {
offset = ((unsigned long)virt >> ((level-1) * 9 + 12)) & 511;
if (!(pt[offset] & PTE_PRESENT)) {
-   unsigned long *new_pt = alloc_page();
+   unsigned long *new_pt = pt_page;
+if (!new_pt)
+new_pt = alloc_page();
+else
+pt_page = 0;
memset(new_pt, 0, PAGE_SIZE);
pt[offset] = virt_to_phys(new_pt) | PTE_PRESENT | PTE_WRITE;
}
@@ -108,58 +95,20 @@ static unsigned long get_pte(unsigned lo
 return pte;
 }
 
-static void install_large_page(unsigned long *cr3, 
-  unsigned long phys,
-  void *virt)
+void install_large_page(unsigned long *cr3,
+  unsigned long phys,
+  void *virt)
 {
-install_pte(cr3, 2, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_PSE);
+install_pte(cr3, 2, virt, phys | PTE_PRESENT | PTE_WRITE | PTE_PSE, 0);
 }
 
-static void install_page(unsigned long *cr3, 
-unsigned long phys,
-void *virt)
+void install_page(unsigned long *cr3,
+  unsigned long phys,
+  void *virt)
 {
-install_pte(cr3, 1, virt, phys | PTE_PRESENT | PTE_WRITE);
-}
-
-static inline void load_cr3(unsigned long cr3)
-{
-asm ( "mov %0, %%cr3" : : "r"(cr3) );
-}
-
-static inline unsigned long read_cr3()
-{
-unsigned long cr3;
-
-asm volatile ( "mov %%cr3, %0" : "=r"(cr3) );
-return cr3;
+install_pte(cr3, 1, virt, phys | PTE_PRESENT | PTE_WRITE, 0);
 }
 
-static inline void load_cr0(unsigned long cr0)
-{
-asm volatile ( "mov %0, %%cr0" : : "r"(cr0) );
-}
-
-static inline unsigned long read_cr0()
-{
-unsigned long cr0;
-
-asm volatile ( "mov %%cr0, %0" : "=r"(cr0) );
-return cr0;
-}
-
-static inline void load_cr4(unsigned long cr4)
-{
-asm volatile ( "mov %0, %%cr4" : : "r"(cr4) );
-}
-
-static inline unsigned long read_cr4()
-{
-unsigned long cr4;
-
-asm volatile ( "mov %%cr4, %0" : "=r"(cr4) );
-return cr4;
-}
 
 struct gdt_table_descr
 {
Index: qemu-kvm/kvm/user/test/x86/vm.h
===
--- qemu-kvm.orig/kvm/user/test/x86/vm.h
+++ qemu-kvm/kvm/user/test/x86/vm.h
@@ -1,10 +1,72 @@
 #ifndef VM_H
 #define VM_H
 
+#define PAGE_SIZE 4096ul
+#define LARGE_PAGE_SIZE (512 * PAGE_SIZE)
+
+#define PTE_PRESENT (1ull << 0)
+#define PTE_PSE (1ull << 7)
+#define PTE_WRITE   (1ull << 1)
+#define PTE_ADDR(0xff000ull)
+
 void setup_vm();
 
 void *vmalloc(unsigned long size);
 void vfree(void *mem);
 void *vmap(unsigned long long phys, unsigned long size);
 
+void install_pte(unsigned long *cr3,
+int pte_level,
+void *virt,
+unsigned long pte,
+unsigned long *pt_page);
+
+void *alloc_page();
+
+void install_large_page(unsigned long *cr3,unsigned long phys,
+   void *virt);
+void install_page(unsigned long *cr3, unsigned long phys, void *virt);
+
+static inline unsigned long virt_to_phys(const void *virt)
+{
+return (unsigned long)virt;
+}
+
+static inline void *phys_to_virt(unsigned long phys)
+{
+return (void *)phys;
+}
+
+
+static inline void load_cr3(unsigned long cr3)
+{
+asm ( "mov %0, %%cr3" : : "r"(cr3) );
+}
+
+static inline unsigned long read_cr3()
+{
+unsigned long cr3;
+
+asm volatil

[patch 8/8] test: long rmap chains

2010-03-24 Thread Marcelo Tosatti
Signed-off-by: Marcelo Tosatti 

Index: qemu-kvm/kvm/user/config-x86-common.mak
===
--- qemu-kvm.orig/kvm/user/config-x86-common.mak
+++ qemu-kvm/kvm/user/config-x86-common.mak
@@ -45,6 +45,9 @@ $(TEST_DIR)/vmexit.flat: $(cstart.o) $(T
 
 $(TEST_DIR)/slot_deletion.flat: $(cstart.o) $(TEST_DIR)/slot_deletion.o \
$(TEST_DIR)/print.o $(TEST_DIR)/vm.o
+
+$(TEST_DIR)/rmap_chain.flat: $(cstart.o) $(TEST_DIR)/rmap_chain.o \
+   $(TEST_DIR)/print.o $(TEST_DIR)/vm.o
  
 $(TEST_DIR)/test32.flat: $(TEST_DIR)/test32.o
 
Index: qemu-kvm/kvm/user/config-x86_64.mak
===
--- qemu-kvm.orig/kvm/user/config-x86_64.mak
+++ qemu-kvm/kvm/user/config-x86_64.mak
@@ -8,6 +8,6 @@ tests = $(TEST_DIR)/access.flat $(TEST_D
   $(TEST_DIR)/simple.flat $(TEST_DIR)/stringio.flat \
   $(TEST_DIR)/memtest1.flat $(TEST_DIR)/emulator.flat \
   $(TEST_DIR)/hypercall.flat $(TEST_DIR)/apic.flat \
-  $(TEST_DIR)/slot_deletion.flat
+  $(TEST_DIR)/slot_deletion.flat $(TEST_DIR)/rmap_chain.flat
 
 include config-x86-common.mak
Index: qemu-kvm/kvm/user/test/x86/rmap_chain.c
===
--- /dev/null
+++ qemu-kvm/kvm/user/test/x86/rmap_chain.c
@@ -0,0 +1,53 @@
+/* test long rmap chains */
+
+#include "libcflat.h"
+#include "vm.h"
+#include "smp.h"
+
+void print(const char *s);
+
+static unsigned int inl(unsigned short port)
+{
+unsigned int val;
+asm volatile ("inl %w1, %0":"=a" (val):"Nd" (port));
+return val;
+}
+
+int main (void)
+{
+int i;
+int nr_pages;
+void *target_page, *virt_addr;
+
+setup_vm();
+
+nr_pages = inl(0xd1) / PAGE_SIZE;
+nr_pages -= 1000;
+target_page = alloc_page();
+
+virt_addr = (void *) 0xfa000;
+for (i = 0; i < nr_pages; i++) {
+install_page(phys_to_virt(read_cr3()), virt_to_phys(target_page),
+ virt_addr);
+virt_addr += PAGE_SIZE;
+}
+printf("created %d mappings\n", nr_pages);
+
+virt_addr = (void *) 0xfa000;
+for (i = 0; i < nr_pages; i++) {
+unsigned long *touch = virt_addr;
+
+*touch = 0;
+virt_addr += PAGE_SIZE;
+}
+printf("instantiated mappings\n");
+
+virt_addr += PAGE_SIZE;
+install_pte(phys_to_virt(read_cr3()), 1, virt_addr,
+0 | PTE_PRESENT | PTE_WRITE, target_page);
+
+*(unsigned long *)virt_addr = 0;
+printf("SUCCESS\n");
+
+return 0;
+}
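
To make the address arithmetic in this test easier to follow, here is a small sketch of the page-table index calculation that install_pte() performs, plus the mapping count. pt_index() and rmap_test_pages() are hypothetical helpers, and the sketch assumes that port 0xd1 reports guest RAM size in bytes, as the division by PAGE_SIZE suggests.

```c
#include <stdint.h>

#define PAGE_SHIFT    12
#define PT_INDEX_BITS 9

/* Index into the page table at `level` (1 = PTE level, 4 = top level). */
static unsigned pt_index(uint64_t virt, int level)
{
    return (virt >> ((level - 1) * PT_INDEX_BITS + PAGE_SHIFT)) & 511;
}

/* Number of 4 KiB mappings the test creates for a given RAM size in bytes. */
static long rmap_test_pages(uint64_t mem_bytes)
{
    return (long)(mem_bytes >> PAGE_SHIFT) - 1000;
}
```

Every one of those virtual pages points at the same target_page, which is what makes the reverse-map chain for that physical page long. The same index arithmetic explains the PAGE_SIZE * 512 stride in slot_deletion.c: it moves to the next level-2 entry, so each CPU gets its own level-1 page table.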




[PATCH 06/21] KVM: PPC: Don't reload FPU with invalid values

2010-03-24 Thread Alexander Graf
When the guest activates the FPU, we load it up. That's fine when
it wasn't activated before on the host, but if it was, we end up
reloading the FPU values from the last time the FPU was deactivated on
the host, without writing the proper values back to the vcpu struct.

This patch checks if the FPU is enabled already and if so just doesn't
bother activating it, making FPU operations survive guest context switches.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 400ae0a..029e1be 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -701,6 +701,11 @@ static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr,
return RESUME_GUEST;
}
 
+   /* We already own the ext */
+   if (vcpu->arch.guest_owned_ext & msr) {
+   return RESUME_GUEST;
+   }
+
 #ifdef DEBUG_EXT
printk(KERN_INFO "Loading up ext 0x%lx\n", msr);
 #endif
-- 
1.6.0.2
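
The effect of the added check can be shown with a tiny model. struct ext_state, handle_ext() and the EXT_* bit values are illustrative assumptions, not the kernel's definitions; the loads counter stands in for calls such as kvmppc_load_up_fpu().

```c
#include <stdint.h>

#define EXT_FP  0x1u   /* illustrative bit values, not the real MSR bits */
#define EXT_VEC 0x2u

struct ext_state {
    uint32_t guest_owned_ext;   /* facilities currently loaded for the guest */
    int loads;                  /* how many times state was (re)loaded */
};

/* Load a facility for the guest unless the guest already owns it. */
static void handle_ext(struct ext_state *s, uint32_t ext)
{
    if (s->guest_owned_ext & ext)
        return;                 /* already live: reloading would clobber it */
    s->loads++;                 /* stands in for kvmppc_load_up_fpu() etc. */
    s->guest_owned_ext |= ext;
}
```

A second request for a facility the guest already owns becomes a no-op, so the live register contents are not overwritten with stale saved values.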



[PATCH 11/21] KVM: PPC: Implement emulation for lbzux and lhax

2010-03-24 Thread Alexander Graf
We get MMIOs with the weirdest instructions. But every time we do,
we need to improve our emulator to implement them.

So let's do that - this time it's the turn of lbzux and lhax.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/emulate.c |   20 
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 2410ec2..dbb5d68 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -38,10 +38,12 @@
 #define OP_31_XOP_LBZX  87
 #define OP_31_XOP_STWX  151
 #define OP_31_XOP_STBX  215
+#define OP_31_XOP_LBZUX 119
 #define OP_31_XOP_STBUX 247
 #define OP_31_XOP_LHZX  279
 #define OP_31_XOP_LHZUX 311
 #define OP_31_XOP_MFSPR 339
+#define OP_31_XOP_LHAX  343
 #define OP_31_XOP_STHX  407
 #define OP_31_XOP_STHUX 439
 #define OP_31_XOP_MTSPR 467
@@ -173,6 +175,19 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1);
break;
 
+   case OP_31_XOP_LBZUX:
+   rt = get_rt(inst);
+   ra = get_ra(inst);
+   rb = get_rb(inst);
+
+   ea = kvmppc_get_gpr(vcpu, rb);
+   if (ra)
+   ea += kvmppc_get_gpr(vcpu, ra);
+
+   emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1);
+   kvmppc_set_gpr(vcpu, ra, ea);
+   break;
+
case OP_31_XOP_STWX:
rs = get_rs(inst);
emulated = kvmppc_handle_store(run, vcpu,
@@ -202,6 +217,11 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
kvmppc_set_gpr(vcpu, rs, ea);
break;
 
+   case OP_31_XOP_LHAX:
+   rt = get_rt(inst);
+   emulated = kvmppc_handle_loads(run, vcpu, rt, 2, 1);
+   break;
+
case OP_31_XOP_LHZX:
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 2, 1);
-- 
1.6.0.2
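
The two instructions differ from their neighbours in small ways: lbzux writes the effective address back to RA, and lhax sign-extends the loaded halfword. A sketch of just that arithmetic, with hypothetical helper names indexed_ea(), sext16() and zext16():

```c
#include <stdint.h>

/* Effective address of an X-form indexed access: (RA|0) + (RB). */
static uint64_t indexed_ea(const uint64_t gpr[32], int ra, int rb)
{
    uint64_t ea = gpr[rb];

    if (ra)                     /* RA = 0 means "use zero", not r0 */
        ea += gpr[ra];
    return ea;
}

/* lhax: a 16-bit load whose result is sign-extended into the register. */
static int64_t sext16(uint16_t v)
{
    /* Flip the sign bit, then subtract its weight: a portable sign extension. */
    return (int64_t)(v ^ 0x8000) - 0x8000;
}

/* lhzx: the same load, zero-extended. */
static uint64_t zext16(uint16_t v)
{
    return v;
}
```

In the patch the sign extension is what kvmppc_handle_loads() provides, while the update form stores ea into RA after the load has been set up.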



[PATCH 08/21] KVM: PPC: Implement mfsr emulation

2010-03-24 Thread Alexander Graf
We already emulate the mfsrin instruction, which passes the SR number
in a register value. But we lacked support for mfsr, which encodes the
SR number in the opcode.

So let's implement it.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_64_emulate.c |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_emulate.c b/arch/powerpc/kvm/book3s_64_emulate.c
index c989214..8d7a78d 100644
--- a/arch/powerpc/kvm/book3s_64_emulate.c
+++ b/arch/powerpc/kvm/book3s_64_emulate.c
@@ -35,6 +35,7 @@
 #define OP_31_XOP_SLBMTE   402
 #define OP_31_XOP_SLBIE434
 #define OP_31_XOP_SLBIA498
+#define OP_31_XOP_MFSR 595
 #define OP_31_XOP_MFSRIN   659
 #define OP_31_XOP_SLBMFEV  851
 #define OP_31_XOP_EIOIO854
@@ -90,6 +91,18 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
case OP_31_XOP_MTMSR:
kvmppc_set_msr(vcpu, kvmppc_get_gpr(vcpu, get_rs(inst)));
break;
+   case OP_31_XOP_MFSR:
+   {
+   int srnum;
+
+   srnum = kvmppc_get_field(inst, 12 + 32, 15 + 32);
+   if (vcpu->arch.mmu.mfsrin) {
+   u32 sr;
+   sr = vcpu->arch.mmu.mfsrin(vcpu, srnum);
+   kvmppc_set_gpr(vcpu, get_rt(inst), sr);
+   }
+   break;
+   }
case OP_31_XOP_MFSRIN:
{
int srnum;
-- 
1.6.0.2
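
The field extraction is the only new part compared to mfsrin. Below is a sketch of pulling the SR number out of instruction bits 12..15 in IBM numbering, where bit 0 is the most significant bit. insn_field() and encode_mfsr() are hypothetical helpers working on a 32-bit word; the patch itself calls kvmppc_get_field() with 32 added to each bit position, which suggests 64-bit numbering there.

```c
#include <stdint.h>

/* Extract bits msb..lsb of a 32-bit instruction, IBM numbering (bit 0 = MSB). */
static uint32_t insn_field(uint32_t inst, int msb, int lsb)
{
    uint32_t mask = (1u << (lsb - msb + 1)) - 1;

    return (inst >> (31 - lsb)) & mask;
}

/* Hypothetical encoder for "mfsr rt, sr", used only to build test inputs. */
static uint32_t encode_mfsr(unsigned rt, unsigned sr)
{
    return (31u << 26) | (rt << 21) | (sr << 16) | (595u << 1);
}
```

For example, insn_field(encode_mfsr(5, 13), 12, 15) yields the segment register number 13, and bits 21..30 yield the extended opcode 595 defined as OP_31_XOP_MFSR.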



[PATCH 01/21] KVM: PPC: Ensure split mode works

2010-03-24 Thread Alexander Graf
On PowerPC we can go into MMU Split Mode. That means that either
data relocation is on but instruction relocation is off or vice
versa.

That mode didn't work properly, as we weren't always flushing
entries when going into a new split mode, potentially mapping
different code or data than we're supposed to.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s.h |9 +++---
 arch/powerpc/kvm/book3s.c |   46 +---
 2 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index e6ea974..14d0262 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -99,10 +99,11 @@ struct kvmppc_vcpu_book3s {
 #define CONTEXT_GUEST  1
 #define CONTEXT_GUEST_END  2
 
-#define VSID_REAL  0xfff0
-#define VSID_REAL_DR   0xffe0
-#define VSID_REAL_IR   0xffd0
-#define VSID_BAT   0xffc0
+#define VSID_REAL_DR   0x7ff0
+#define VSID_REAL_IR   0x7fe0
+#define VSID_SPLIT_MASK0x7fe0
+#define VSID_REAL  0x7fc0
+#define VSID_BAT   0x7fb0
 #define VSID_PR0x8000
 
 extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, u64 ea, u64 ea_mask);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 94c229d..c2ffb91 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -133,6 +133,14 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
 
if (((vcpu->arch.msr & (MSR_IR|MSR_DR)) != (old_msr & (MSR_IR|MSR_DR))) ||
(vcpu->arch.msr & MSR_PR) != (old_msr & MSR_PR)) {
+   bool dr = (vcpu->arch.msr & MSR_DR) ? true : false;
+   bool ir = (vcpu->arch.msr & MSR_IR) ? true : false;
+
+   /* Flush split mode PTEs */
+   if (dr != ir)
+   kvmppc_mmu_pte_vflush(vcpu, VSID_SPLIT_MASK,
+ VSID_SPLIT_MASK);
+
kvmppc_mmu_flush_segments(vcpu);
kvmppc_mmu_map_segment(vcpu, vcpu->arch.pc);
}
@@ -395,15 +403,7 @@ static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data,
} else {
pte->eaddr = eaddr;
pte->raddr = eaddr & 0x;
-   pte->vpage = eaddr >> 12;
-   switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
-   case 0:
-   pte->vpage |= VSID_REAL;
-   case MSR_DR:
-   pte->vpage |= VSID_REAL_DR;
-   case MSR_IR:
-   pte->vpage |= VSID_REAL_IR;
-   }
+   pte->vpage = VSID_REAL | eaddr >> 12;
pte->may_read = true;
pte->may_write = true;
pte->may_execute = true;
@@ -512,12 +512,10 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
int page_found = 0;
struct kvmppc_pte pte;
bool is_mmio = false;
+   bool dr = (vcpu->arch.msr & MSR_DR) ? true : false;
+   bool ir = (vcpu->arch.msr & MSR_IR) ? true : false;
 
-   if ( vec == BOOK3S_INTERRUPT_DATA_STORAGE ) {
-   relocated = (vcpu->arch.msr & MSR_DR);
-   } else {
-   relocated = (vcpu->arch.msr & MSR_IR);
-   }
+   relocated = data ? dr : ir;
 
/* Resolve real address if translation turned on */
if (relocated) {
@@ -529,14 +527,18 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
pte.raddr = eaddr & 0x;
pte.eaddr = eaddr;
pte.vpage = eaddr >> 12;
-   switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
-   case 0:
-   pte.vpage |= VSID_REAL;
-   case MSR_DR:
-   pte.vpage |= VSID_REAL_DR;
-   case MSR_IR:
-   pte.vpage |= VSID_REAL_IR;
-   }
+   }
+
+   switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+   case 0:
+   pte.vpage |= VSID_REAL;
+   break;
+   case MSR_DR:
+   pte.vpage |= VSID_REAL_DR;
+   break;
+   case MSR_IR:
+   pte.vpage |= VSID_REAL_IR;
+   break;
}
 
if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
-- 
1.6.0.2
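
The tagging logic after this patch can be summarised in a short sketch. The enum values are placeholders (the real VSID_* constants live in kvm_book3s.h), and split_tag() and in_split_mode() are hypothetical names. Each case returns on its own, so no tag bleeds into another, which the removed switch without break statements allowed.

```c
#include <stdint.h>

#define MSR_IR 0x20u    /* instruction relocate */
#define MSR_DR 0x10u    /* data relocate */

/* Placeholder tags; the real VSID_* values are defined in kvm_book3s.h. */
enum vsid_tag { TAG_NONE, TAG_REAL, TAG_REAL_DR, TAG_REAL_IR };

/* Pick the tag for a shadow mapping from the relocation bits in the MSR. */
static enum vsid_tag split_tag(uint32_t msr)
{
    switch (msr & (MSR_DR | MSR_IR)) {
    case 0:
        return TAG_REAL;
    case MSR_DR:
        return TAG_REAL_DR;
    case MSR_IR:
        return TAG_REAL_IR;
    default:
        return TAG_NONE;        /* both on: no real-mode tag is applied */
    }
}

/* Split mode means exactly one of DR/IR is set. */
static int in_split_mode(uint32_t msr)
{
    return !(msr & MSR_DR) != !(msr & MSR_IR);
}
```

in_split_mode() corresponds to the dr != ir test that triggers the extra flush in kvmppc_set_msr().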



[PATCH 02/21] KVM: PPC: Allow userspace to unset the IRQ line

2010-03-24 Thread Alexander Graf
Userspace can tell us that it wants to trigger an interrupt. But
so far it can't tell us that it wants to stop triggering one.

So let's interpret the parameter to the ioctl that we have anyways
to tell us if we want to raise or lower the interrupt line.

Signed-off-by: Alexander Graf 

v2 -> v3:

 - Add CAP for unset irq
---
 arch/powerpc/include/asm/kvm.h |3 +++
 arch/powerpc/include/asm/kvm_ppc.h |2 ++
 arch/powerpc/kvm/book3s.c  |6 ++
 arch/powerpc/kvm/powerpc.c |6 +-
 include/linux/kvm.h|1 +
 5 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h
index 19bae31..6c5547d 100644
--- a/arch/powerpc/include/asm/kvm.h
+++ b/arch/powerpc/include/asm/kvm.h
@@ -84,4 +84,7 @@ struct kvm_guest_debug_arch {
 #define KVM_REG_QPR0x0040
 #define KVM_REG_FQPR   0x0060
 
+#define KVM_INTERRUPT_SET  -1U
+#define KVM_INTERRUPT_UNSET-2U
+
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index c7fcdd7..6a2464e 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -92,6 +92,8 @@ extern void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_dequeue_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
struct kvm_interrupt *irq);
+extern void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu,
+ struct kvm_interrupt *irq);
 
 extern int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
   unsigned int op, int *advance);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index c2ffb91..9e0bc47 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -230,6 +230,12 @@ void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_EXTERNAL);
 }
 
+void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu,
+  struct kvm_interrupt *irq)
+{
+   kvmppc_book3s_dequeue_irqprio(vcpu, BOOK3S_INTERRUPT_EXTERNAL);
+}
+
 int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
 {
int deliver = 1;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a0e3172..af873d9 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -148,6 +148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
switch (ext) {
case KVM_CAP_PPC_SEGSTATE:
case KVM_CAP_PPC_PAIRED_SINGLES:
+   case KVM_CAP_PPC_UNSET_IRQ:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -450,7 +451,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq)
 {
-   kvmppc_core_queue_external(vcpu, irq);
+   if (irq->irq == KVM_INTERRUPT_UNSET)
+   kvmppc_core_dequeue_external(vcpu, irq);
+   else
+   kvmppc_core_queue_external(vcpu, irq);
 
if (waitqueue_active(&vcpu->wq)) {
wake_up_interruptible(&vcpu->wq);
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ce28767..c36d093 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -507,6 +507,7 @@ struct kvm_ioeventfd {
 #define KVM_CAP_DEBUGREGS 50
 #endif
 #define KVM_CAP_X86_ROBUST_SINGLESTEP 51
+#define KVM_CAP_PPC_UNSET_IRQ 53
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
1.6.0.2
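
The new dispatch in kvm_vcpu_ioctl_interrupt() boils down to one comparison. A stand-alone model follows, with struct irq_model and model_interrupt() as hypothetical names and a single flag standing in for the queued external interrupt:

```c
#include <stdint.h>

#define KVM_INTERRUPT_SET   -1U
#define KVM_INTERRUPT_UNSET -2U

struct irq_model {
    int external_pending;   /* stands in for the queued external interrupt */
};

/* Mirror of the dispatch: UNSET lowers the line, anything else raises it. */
static void model_interrupt(struct irq_model *m, uint32_t irq)
{
    if (irq == KVM_INTERRUPT_UNSET)
        m->external_pending = 0;
    else
        m->external_pending = 1;
}
```

Since every value other than KVM_INTERRUPT_UNSET still raises the line, the old behaviour is kept for any other parameter value. A caller that wants to lower the line would presumably probe KVM_CAP_PPC_UNSET_IRQ first.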



[PATCH 00/21] KVM: PPC: MOL bringup patches v3

2010-03-24 Thread Alexander Graf
Mac-on-Linux has always lacked PPC64 host support. This is going to
change now!

This patchset contains minor patches to enable MOL, but is mostly about
bug fixes that came out of running Mac OS X. With this set and the
current svn version of MOL I have 10.4.11 running as a guest on a 970MP
as well as a PS3 host.


v1 -> v2:

 - Add documentation for EXIT_OSI and ENABLE_CAP
 - Add flags to enable_cap
 - Add build fix for !CONFIG_VSX
 - Remove in-paca register check

v2 -> v3:

 - Document that EXIT_OSI is not migration safe
 - Add CAP for ENABLE_CAP
 - Improve documentation for ENABLE_CAP
 - Add alignment emulation for stfs, stfd, lfd
 - Add alignment DAR emulation
 - Add CAP for unset irq
 - new: Fix dcbz emulation
 - new: Add emulation for dcba
 - new: Add check if pte was mapped secondary (PS3 fix)
 - new: Use ULL for big numbers
 - new: Make bools bitfields
 - new: Disable MSR_FEx for Cell hosts (PS3 speedup)


Alexander Graf (21):
  KVM: PPC: Ensure split mode works
  KVM: PPC: Allow userspace to unset the IRQ line
  KVM: PPC: Make DSISR 32 bits wide
  KVM: PPC: Book3S_32 guest MMU fixes
  KVM: PPC: Split instruction reading out
  KVM: PPC: Don't reload FPU with invalid values
  KVM: PPC: Load VCPU for register fetching
  KVM: PPC: Implement mfsr emulation
  KVM: PPC: Implement BAT reads
  KVM: PPC: Make XER load 32 bit
  KVM: PPC: Implement emulation for lbzux and lhax
  KVM: PPC: Implement alignment interrupt
  KVM: Add support for enabling capabilities per-vcpu
  KVM: PPC: Add OSI hypercall interface
  KVM: PPC: Make build work without CONFIG_VSX/ALTIVEC
  KVM: PPC: Fix dcbz emulation
  KVM: PPC: Add emulation for dcba
  KVM: PPC: Add check if pte was mapped secondary
  KVM: PPC: Use ULL for big numbers
  KVM: PPC: Make bools bitfields
  KVM: PPC: Disable MSR_FEx for Cell hosts

 Documentation/kvm/api.txt   |   54 -
 arch/powerpc/include/asm/kvm.h  |3 +
 arch/powerpc/include/asm/kvm_book3s.h   |   47 +---
 arch/powerpc/include/asm/kvm_host.h |   10 +-
 arch/powerpc/include/asm/kvm_ppc.h  |2 +
 arch/powerpc/kvm/book3s.c   |  191 +++---
 arch/powerpc/kvm/book3s_32_mmu.c|   30 -
 arch/powerpc/kvm/book3s_64_emulate.c|  146 +++-
 arch/powerpc/kvm/book3s_64_interrupts.S |2 +-
 arch/powerpc/kvm/book3s_64_mmu_host.c   |7 +
 arch/powerpc/kvm/book3s_64_slb.S|2 +-
 arch/powerpc/kvm/emulate.c  |   20 
 arch/powerpc/kvm/powerpc.c  |   45 +++-
 include/linux/kvm.h |   19 +++
 14 files changed, 468 insertions(+), 110 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 12/21] KVM: PPC: Implement alignment interrupt

2010-03-24 Thread Alexander Graf
Mac OS X has some applications - namely the Finder - that require alignment
interrupts to work properly. So we need to implement them.

The specs for the 970 and the 750 differ here, though. While the 750 requires
the DSISR and DAR fields to reflect some instruction bits (DSISR) and the fault
address (DAR), the 970 declares this an optional feature. So we need
to reconstruct DSISR and DAR manually.
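
As a rough illustration of the reconstruction described above, here is a
standalone sketch of the D-form case. The helper names dform_alignment_dsisr
and make_dform are hypothetical and only exist for this example; the shifts
and masks mirror the ones used in the patch below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone helper; mirrors the D-form case of the patch. */
static uint32_t dform_alignment_dsisr(uint32_t inst)
{
    uint32_t dsisr = 0;

    dsisr |= (inst >> 12) & 0x4000; /* DSISR bit 17     <- instruction bit 5 */
    dsisr |= (inst >> 17) & 0x3c00; /* DSISR bits 18:21 <- instruction bits 1:4 */
    dsisr |= (inst >> 16) & 0x03ff; /* DSISR bits 22:31 <- RT/FRT and RA */

    return dsisr;
}

/* Build a D-form instruction word: opcode | RT | RA | 16-bit displacement. */
static uint32_t make_dform(uint32_t op, uint32_t rt, uint32_t ra, uint16_t d)
{
    assert(op < 64 && rt < 32 && ra < 32);
    return (op << 26) | (rt << 21) | (ra << 16) | d;
}
```

For "lfd f1, 3(r4)" (opcode 50, instruction word 0xC8240003) this yields a
DSISR of 0x2424.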

Signed-off-by: Alexander Graf 

---

v2 -> v3:

 - add emulation for stfs, stfd, lfd
 - add DAR emulation
---
 arch/powerpc/include/asm/kvm_book3s.h |2 +
 arch/powerpc/kvm/book3s.c |   10 
 arch/powerpc/kvm/book3s_64_emulate.c  |   75 +
 3 files changed, 87 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index b47b2f5..bea7637 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -131,6 +131,8 @@ extern void kvmppc_rmcall(ulong srr0, ulong srr1);
 extern void kvmppc_load_up_fpu(void);
 extern void kvmppc_load_up_altivec(void);
 extern void kvmppc_load_up_vsx(void);
+extern u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst);
+extern ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst);
 
 static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 585dc91..130a9a1 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -905,6 +905,16 @@ program_interrupt:
}
break;
}
+   case BOOK3S_INTERRUPT_ALIGNMENT:
+   if (kvmppc_read_inst(vcpu) == EMULATE_DONE) {
+   to_book3s(vcpu)->dsisr = kvmppc_alignment_dsisr(vcpu,
+   vcpu->arch.last_inst);
+   vcpu->arch.dear = kvmppc_alignment_dar(vcpu,
+   vcpu->arch.last_inst);
+   kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+   }
+   r = RESUME_GUEST;
+   break;
case BOOK3S_INTERRUPT_MACHINE_CHECK:
case BOOK3S_INTERRUPT_TRACE:
kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
diff --git a/arch/powerpc/kvm/book3s_64_emulate.c 
b/arch/powerpc/kvm/book3s_64_emulate.c
index 39d5003..1e5cf8d 100644
--- a/arch/powerpc/kvm/book3s_64_emulate.c
+++ b/arch/powerpc/kvm/book3s_64_emulate.c
@@ -44,6 +44,11 @@
 /* DCBZ is actually 1014, but we patch it to 1010 so we get a trap */
 #define OP_31_XOP_DCBZ 1010
 
+#define OP_LFS 48
+#define OP_LFD 50
+#define OP_STFS 52
+#define OP_STFD54
+
 #define SPRN_GQR0  912
 #define SPRN_GQR1  913
 #define SPRN_GQR2  914
@@ -474,3 +479,73 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int 
sprn, int rt)
return emulated;
 }
 
+u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst)
+{
+   u32 dsisr = 0;
+
+   /*
+* This is what the spec says about DSISR bits (not mentioned = 0):
+*
+* 12:13[DS]Set to bits 30:31
+* 15:16[X] Set to bits 29:30
+* 17   [X] Set to bit 25
+*  [D/DS]  Set to bit 5
+* 18:21[X] Set to bits 21:24
+*  [D/DS]  Set to bits 1:4
+* 22:26Set to bits 6:10 (RT/RS/FRT/FRS)
+* 27:31Set to bits 11:15 (RA)
+*/
+
+   switch (get_op(inst)) {
+   /* D-form */
+   case OP_LFS:
+   case OP_LFD:
+   case OP_STFD:
+   case OP_STFS:
+   dsisr |= (inst >> 12) & 0x4000; /* bit 17 */
+   dsisr |= (inst >> 17) & 0x3c00; /* bits 18:21 */
+   break;
+   /* X-form */
+   case 31:
+   dsisr |= (inst << 14) & 0x18000; /* bits 15:16 */
+   dsisr |= (inst << 8)  & 0x04000; /* bit 17 */
+   dsisr |= (inst << 3)  & 0x03c00; /* bits 18:21 */
+   break;
+   default:
+   printk(KERN_INFO "KVM: Unaligned instruction 0x%x\n", inst);
+   break;
+   }
+
+   dsisr |= (inst >> 16) & 0x03ff; /* bits 22:31 */
+
+   return dsisr;
+}
+
+ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
+{
+   ulong dar = 0;
+   ulong ra;
+
+   switch (get_op(inst)) {
+   case OP_LFS:
+   case OP_LFD:
+   case OP_STFD:
+   case OP_STFS:
+   ra = get_ra(inst);
+   if (ra)
+   dar = kvmppc_get_gpr(vcpu, ra);
+   dar += (s32)((s16)inst);
+   break;
+   case 31:
+   ra = get_ra(inst);
+   if (ra)
+   dar = kvmppc_get_gpr(vcpu, ra);
+  

[PATCH 07/21] KVM: PPC: Load VCPU for register fetching

2010-03-24 Thread Alexander Graf
When trying to read or store vcpu register data, we should also make
sure the vcpu is actually loaded, so we're 100% sure we get the correct
values.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 029e1be..585dc91 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -955,6 +955,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, 
struct kvm_regs *regs)
 {
int i;
 
+   vcpu_load(vcpu);
+
regs->pc = vcpu->arch.pc;
regs->cr = kvmppc_get_cr(vcpu);
regs->ctr = vcpu->arch.ctr;
@@ -975,6 +977,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, 
struct kvm_regs *regs)
for (i = 0; i < ARRAY_SIZE(regs->gpr); i++)
regs->gpr[i] = kvmppc_get_gpr(vcpu, i);
 
+   vcpu_put(vcpu);
+
return 0;
 }
 
@@ -982,6 +986,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, 
struct kvm_regs *regs)
 {
int i;
 
+   vcpu_load(vcpu);
+
vcpu->arch.pc = regs->pc;
kvmppc_set_cr(vcpu, regs->cr);
vcpu->arch.ctr = regs->ctr;
@@ -1001,6 +1007,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, 
struct kvm_regs *regs)
for (i = 0; i < ARRAY_SIZE(regs->gpr); i++)
kvmppc_set_gpr(vcpu, i, regs->gpr[i]);
 
+   vcpu_put(vcpu);
+
return 0;
 }
 
-- 
1.6.0.2



[PATCH 19/21] KVM: PPC: Use ULL for big numbers

2010-03-24 Thread Alexander Graf
Some constants were bigger than ints. Let's mark them as such so we don't
accidentally truncate them.
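
To show the kind of truncation an explicit suffix guards against, here is a
small sketch. It does not use the constants from this patch; the value
0x80000000 and the function names are made up for the example, and it assumes
a platform where int is 32 bits wide.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* The sketch only holds where int is 32 bits wide. */
static_assert(sizeof(int) * CHAR_BIT == 32, "sketch assumes 32-bit int");

static uint64_t shift_without_suffix(void)
{
    /* 0x80000000 has type unsigned int here, so the shift is done in
     * 32 bits and wraps to 0 before the result is widened. */
    return 0x80000000 << 1;
}

static uint64_t shift_with_ull(void)
{
    /* With the ULL suffix the whole expression is evaluated in 64 bits. */
    return 0x80000000ULL << 1;
}
```

The first function returns 0, the second 0x100000000, even though both are
stored into a 64-bit result.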

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s.h |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 7e243b2..8a6b4c5 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -100,12 +100,12 @@ struct kvmppc_vcpu_book3s {
 #define CONTEXT_GUEST  1
 #define CONTEXT_GUEST_END  2
 
-#define VSID_REAL_DR   0x7ff0
-#define VSID_REAL_IR   0x7fe0
-#define VSID_SPLIT_MASK 0x7fe0
-#define VSID_REAL  0x7fc0
-#define VSID_BAT   0x7fb0
-#define VSID_PR 0x8000
+#define VSID_REAL_DR   0x7ff0ULL
+#define VSID_REAL_IR   0x7fe0ULL
+#define VSID_SPLIT_MASK 0x7fe0ULL
+#define VSID_REAL  0x7fc0ULL
+#define VSID_BAT   0x7fb0ULL
+#define VSID_PR 0x8000ULL
 
 extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, u64 ea, u64 ea_mask);
 extern void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 vp, u64 vp_mask);
-- 
1.6.0.2



[PATCH 04/21] KVM: PPC: Book3S_32 guest MMU fixes

2010-03-24 Thread Alexander Graf
This patch makes the VSID of mapped pages always reflect all special cases
we have, like split mode.

It also changes the tlbie mask to 0x0000 according to the spec. The mask
we used before was incorrect.
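
The new "valid" flag can be pictured with the following userspace sketch. It
assumes the classic 32-bit PowerPC segment register layout (T, Ks, Kp, N and
a 24-bit VSID); struct sr_model, decode_sr and sr_to_vsid are hypothetical
names, and the masks are taken from that layout rather than copied from the
diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of a decoded segment register. */
struct sr_model {
    uint32_t vsid;
    bool valid, Ks, Kp, nx;
};

static struct sr_model decode_sr(uint32_t value)
{
    struct sr_model sr;

    sr.vsid  = value & 0x00ffffffu;      /* assumed: 24-bit VSID field */
    sr.valid = !(value & 0x80000000u);   /* top bit set: treated as not valid */
    sr.Ks    = (value & 0x40000000u) != 0;
    sr.Kp    = (value & 0x20000000u) != 0;
    sr.nx    = (value & 0x10000000u) != 0;
    return sr;
}

/* Returns 0 and stores the VSID, or -1 when the segment is not valid. */
static int sr_to_vsid(uint32_t sr_value, uint64_t *vsid)
{
    struct sr_model sr = decode_sr(sr_value);

    assert(vsid != NULL);
    if (!sr.valid)
        return -1;
    *vsid = sr.vsid;
    return 0;
}
```

The -1 return corresponds to the early exit that the patch adds to the
MSR_DR|MSR_IR case of the esid-to-vsid lookup.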

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s.h |1 +
 arch/powerpc/kvm/book3s_32_mmu.c  |   30 +++---
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 9f5a992..b47b2f5 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -44,6 +44,7 @@ struct kvmppc_sr {
bool Ks;
bool Kp;
bool nx;
+   bool valid;
 };
 
 struct kvmppc_bat {
diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index 1483a9b..7071e22 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -57,6 +57,8 @@ static inline bool check_debug_ip(struct kvm_vcpu *vcpu)
 
 static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
  struct kvmppc_pte *pte, bool data);
+static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid,
+u64 *vsid);
 
 static struct kvmppc_sr *find_sr(struct kvmppc_vcpu_book3s *vcpu_book3s, gva_t 
eaddr)
 {
@@ -66,13 +68,14 @@ static struct kvmppc_sr *find_sr(struct kvmppc_vcpu_book3s 
*vcpu_book3s, gva_t e
 static u64 kvmppc_mmu_book3s_32_ea_to_vp(struct kvm_vcpu *vcpu, gva_t eaddr,
 bool data)
 {
-   struct kvmppc_sr *sre = find_sr(to_book3s(vcpu), eaddr);
+   u64 vsid;
struct kvmppc_pte pte;
 
if (!kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, &pte, data))
return pte.vpage;
 
-   return (((u64)eaddr >> 12) & 0x) | (((u64)sre->vsid) << 16);
+   kvmppc_mmu_book3s_32_esid_to_vsid(vcpu, eaddr >> SID_SHIFT, &vsid);
+   return (((u64)eaddr >> 12) & 0x) | (vsid << 16);
 }
 
 static void kvmppc_mmu_book3s_32_reset_msr(struct kvm_vcpu *vcpu)
@@ -142,8 +145,13 @@ static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu 
*vcpu, gva_t eaddr,
bat->bepi_mask);
}
if ((eaddr & bat->bepi_mask) == bat->bepi) {
+   u64 vsid;
+   kvmppc_mmu_book3s_32_esid_to_vsid(vcpu,
+   eaddr >> SID_SHIFT, &vsid);
+   vsid <<= 16;
+   pte->vpage = (((u64)eaddr >> 12) & 0x) | vsid;
+
pte->raddr = bat->brpn | (eaddr & ~bat->bepi_mask);
-   pte->vpage = (eaddr >> 12) | VSID_BAT;
pte->may_read = bat->pp;
pte->may_write = bat->pp > 1;
pte->may_execute = true;
@@ -302,6 +310,7 @@ static void kvmppc_mmu_book3s_32_mtsrin(struct kvm_vcpu 
*vcpu, u32 srnum,
/* And then put in the new SR */
sre->raw = value;
sre->vsid = (value & 0x0fff);
+   sre->valid = (value & 0x8000) ? false : true;
sre->Ks = (value & 0x4000) ? true : false;
sre->Kp = (value & 0x2000) ? true : false;
sre->nx = (value & 0x1000) ? true : false;
@@ -312,7 +321,7 @@ static void kvmppc_mmu_book3s_32_mtsrin(struct kvm_vcpu 
*vcpu, u32 srnum,
 
 static void kvmppc_mmu_book3s_32_tlbie(struct kvm_vcpu *vcpu, ulong ea, bool 
large)
 {
-   kvmppc_mmu_pte_flush(vcpu, ea, ~0xFFFULL);
+   kvmppc_mmu_pte_flush(vcpu, ea, 0x0000);
 }
 
 static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid,
@@ -333,15 +342,22 @@ static int kvmppc_mmu_book3s_32_esid_to_vsid(struct 
kvm_vcpu *vcpu, u64 esid,
break;
case MSR_DR|MSR_IR:
{
-   ulong ea;
-   ea = esid << SID_SHIFT;
-   *vsid = find_sr(to_book3s(vcpu), ea)->vsid;
+   ulong ea = esid << SID_SHIFT;
+   struct kvmppc_sr *sr = find_sr(to_book3s(vcpu), ea);
+
+   if (!sr->valid)
+   return -1;
+
+   *vsid = sr->vsid;
break;
}
default:
BUG();
}
 
+   if (vcpu->arch.msr & MSR_PR)
+   *vsid |= VSID_PR;
+
return 0;
 }
 
-- 
1.6.0.2



[PATCH 10/21] KVM: PPC: Make XER load 32 bit

2010-03-24 Thread Alexander Graf
We have a 32 bit value in the PACA to store XER in. We also do an stw
when storing XER in there. But then we load it with ld, completely
screwing it up on every entry.

Welcome to the Big Endian world.
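
The effect can be modelled in portable C by spelling out the big-endian byte
order by hand. This is only a sketch: the helper names are invented, and the
"neighbour" word stands in for whatever happens to follow the 32-bit XER slot
in memory.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value in big-endian byte order, like stw on PowerPC. */
static void store32_be(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Load 32 bits in big-endian order, like lwz. */
static uint32_t load32_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Load 64 bits in big-endian order, like ld. */
static uint64_t load64_be(const uint8_t *p)
{
    return ((uint64_t)load32_be(p) << 32) | load32_be(p + 4);
}

/* A 4-byte XER slot followed by 4 bytes of unrelated data, read with ld. */
static uint64_t xer_via_ld(uint32_t xer, uint32_t neighbour)
{
    uint8_t slot[8];

    store32_be(slot, xer);
    store32_be(slot + 4, neighbour);
    return load64_be(slot);
}

/* The same slot read with lwz. */
static uint32_t xer_via_lwz(uint32_t xer, uint32_t neighbour)
{
    uint8_t slot[8];

    store32_be(slot, xer);
    store32_be(slot + 4, neighbour);
    return load32_be(slot);
}
```

With the 64-bit load the stored XER ends up in the upper half of the register
and the lower half is filled with the neighbouring data; the 32-bit load
returns exactly what was stored.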

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_64_slb.S |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S
index 35b7627..0919679 100644
--- a/arch/powerpc/kvm/book3s_64_slb.S
+++ b/arch/powerpc/kvm/book3s_64_slb.S
@@ -145,7 +145,7 @@ slb_do_enter:
lwz r11, (PACA_KVM_CR)(r13)
mtcr    r11
 
-   ld  r11, (PACA_KVM_XER)(r13)
+   lwz r11, (PACA_KVM_XER)(r13)
mtxer   r11
 
ld  r11, (PACA_KVM_R11)(r13)
-- 
1.6.0.2



[PATCH 09/21] KVM: PPC: Implement BAT reads

2010-03-24 Thread Alexander Graf
BATs can not only be written to, they can also be read back!
So let's implement emulation for reading BAT values again.

While at it, I also made BAT setting flush the segment cache,
so we're absolutely sure there's no MMU state left when writing
BATs.
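
The index and half selection used by the read path can be sketched as
follows. The SPR numbers 528 and 535 are assumptions stated here for
illustration, and only the IBAT0..IBAT3 range is modelled; ibat_index and
bat_read_half are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define SPRN_IBAT0U 528 /* assumed SPR number; IBAT0L follows at 529 */
#define SPRN_IBAT3L 535 /* assumed SPR number */

/* Each BAT owns two consecutive SPRs (upper, lower), hence the divide by 2. */
static int ibat_index(int sprn)
{
    assert(sprn >= SPRN_IBAT0U && sprn <= SPRN_IBAT3L);
    return (sprn - SPRN_IBAT0U) / 2;
}

/* Odd SPR numbers select the high 32 bits of raw, even ones the low 32 bits,
 * which is the selection the read function in the patch performs. */
static uint32_t bat_read_half(uint64_t raw, int sprn)
{
    if (sprn % 2)
        return (uint32_t)(raw >> 32);
    return (uint32_t)raw;
}
```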

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_64_emulate.c |   35 ++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_emulate.c 
b/arch/powerpc/kvm/book3s_64_emulate.c
index 8d7a78d..39d5003 100644
--- a/arch/powerpc/kvm/book3s_64_emulate.c
+++ b/arch/powerpc/kvm/book3s_64_emulate.c
@@ -239,6 +239,34 @@ void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct 
kvmppc_bat *bat, bool upper,
}
 }
 
+static u32 kvmppc_read_bat(struct kvm_vcpu *vcpu, int sprn)
+{
+   struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+   struct kvmppc_bat *bat;
+
+   switch (sprn) {
+   case SPRN_IBAT0U ... SPRN_IBAT3L:
+   bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
+   break;
+   case SPRN_IBAT4U ... SPRN_IBAT7L:
+   bat = &vcpu_book3s->ibat[4 + ((sprn - SPRN_IBAT4U) / 2)];
+   break;
+   case SPRN_DBAT0U ... SPRN_DBAT3L:
+   bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
+   break;
+   case SPRN_DBAT4U ... SPRN_DBAT7L:
+   bat = &vcpu_book3s->dbat[4 + ((sprn - SPRN_DBAT4U) / 2)];
+   break;
+   default:
+   BUG();
+   }
+
+   if (sprn % 2)
+   return bat->raw >> 32;
+   else
+   return bat->raw;
+}
+
 static void kvmppc_write_bat(struct kvm_vcpu *vcpu, int sprn, u32 val)
 {
struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
@@ -290,6 +318,7 @@ int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int 
sprn, int rs)
/* BAT writes happen so rarely that we're ok to flush
 * everything here */
kvmppc_mmu_pte_flush(vcpu, 0, 0);
+   kvmppc_mmu_flush_segments(vcpu);
break;
case SPRN_HID0:
to_book3s(vcpu)->hid[0] = spr_val;
@@ -373,6 +402,12 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int 
sprn, int rt)
int emulated = EMULATE_DONE;
 
switch (sprn) {
+   case SPRN_IBAT0U ... SPRN_IBAT3L:
+   case SPRN_IBAT4U ... SPRN_IBAT7L:
+   case SPRN_DBAT0U ... SPRN_DBAT3L:
+   case SPRN_DBAT4U ... SPRN_DBAT7L:
+   kvmppc_set_gpr(vcpu, rt, kvmppc_read_bat(vcpu, sprn));
+   break;
case SPRN_SDR1:
kvmppc_set_gpr(vcpu, rt, to_book3s(vcpu)->sdr1);
break;
-- 
1.6.0.2



[PATCH 13/21] KVM: Add support for enabling capabilities per-vcpu

2010-03-24 Thread Alexander Graf
Sometimes we don't want all capabilities to be available to all
our vcpus. One example of that is the OSI interface, implemented
in the next patch.

To have a generic mechanism for enabling capabilities individually,
this patch introduces a new ioctl that can be used
for this purpose. That way features we don't want in all guests or
userspace configurations can just not be enabled and we're good.
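
From userspace, using the new ioctl could look roughly like this. It is a
minimal sketch: enable_vcpu_cap is a hypothetical wrapper, vcpu_fd is assumed
to be a file descriptor obtained from KVM_CREATE_VCPU, and the capability
number is whatever the caller has already checked with KVM_CHECK_EXTENSION.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Enable one capability on a vcpu file descriptor.
 * Returns 0 on success and -1 on error, as the ioctl does. */
static int enable_vcpu_cap(int vcpu_fd, uint32_t cap)
{
    struct kvm_enable_cap ec;

    /* flags has to be 0 for now; args and pad are zeroed as well */
    memset(&ec, 0, sizeof(ec));
    ec.cap = cap;

    return ioctl(vcpu_fd, KVM_ENABLE_CAP, &ec);
}
```

A capability that needs initial values would additionally fill ec.args before
the call.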

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - Add flags to enable_cap
  - Update documentation for kvm_enable_cap

v2 -> v3:

  - Add CAP for ENABLE_CAP
  - Improve documentation for ENABLE_CAP
---
 Documentation/kvm/api.txt  |   35 +++
 arch/powerpc/kvm/powerpc.c |   27 +++
 include/linux/kvm.h|   12 
 3 files changed, 74 insertions(+), 0 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index d170cb4..3da2240 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -749,6 +749,41 @@ Writes debug registers into the vcpu.
 See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
 yet and must be cleared on entry.
 
+4.34 KVM_ENABLE_CAP
+
+Capability: KVM_CAP_ENABLE_CAP
+Architectures: ppc
+Type: vcpu ioctl
+Parameters: struct kvm_enable_cap (in)
+Returns: 0 on success; -1 on error
+
+Not all extensions are enabled by default. Using this ioctl the application
+can enable an extension, making it available to the guest.
+
+On systems that do not support this ioctl, it always fails. On systems that
+do support it, it only works for extensions that are supported for enablement.
+
+To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should
+be used.
+
+struct kvm_enable_cap {
+   /* in */
+   __u32 cap;
+
+The capability that is supposed to get enabled.
+
+   __u32 flags;
+
+A bitfield indicating future enhancements. Has to be 0 for now.
+
+   __u64 args[4];
+
+Arguments for enabling a feature. If a feature needs initial values to
+function properly, this is the place to put them.
+
+   __u8  pad[64];
+};
+
 
 5. The kvm_run structure
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index af873d9..2092157 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -149,6 +149,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_PPC_SEGSTATE:
case KVM_CAP_PPC_PAIRED_SINGLES:
case KVM_CAP_PPC_UNSET_IRQ:
+   case KVM_CAP_ENABLE_CAP:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -464,6 +465,23 @@ int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct 
kvm_interrupt *irq)
return 0;
 }
 
+static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
+struct kvm_enable_cap *cap)
+{
+   int r;
+
+   if (cap->flags)
+   return -EINVAL;
+
+   switch (cap->cap) {
+   default:
+   r = -EINVAL;
+   break;
+   }
+
+   return r;
+}
+
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 struct kvm_mp_state *mp_state)
 {
@@ -492,6 +510,15 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = kvm_vcpu_ioctl_interrupt(vcpu, &irq);
break;
}
+   case KVM_ENABLE_CAP:
+   {
+   struct kvm_enable_cap cap;
+   r = -EFAULT;
+   if (copy_from_user(&cap, argp, sizeof(cap)))
+   goto out;
+   r = kvm_vcpu_ioctl_enable_cap(vcpu, &cap);
+   break;
+   }
default:
r = -EINVAL;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index c36d093..d9e920e 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -400,6 +400,15 @@ struct kvm_ioeventfd {
__u8  pad[36];
 };
 
+/* for KVM_ENABLE_CAP */
+struct kvm_enable_cap {
+   /* in */
+   __u32 cap;
+   __u32 flags;
+   __u64 args[4];
+   __u8  pad[64];
+};
+
 #define KVMIO 0xAE
 
 /*
@@ -508,6 +517,7 @@ struct kvm_ioeventfd {
 #endif
 #define KVM_CAP_X86_ROBUST_SINGLESTEP 51
 #define KVM_CAP_PPC_UNSET_IRQ 53
+#define KVM_CAP_ENABLE_CAP 54
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -697,6 +707,8 @@ struct kvm_clock_data {
 /* Available with KVM_CAP_DEBUGREGS */
 #define KVM_GET_DEBUGREGS _IOR(KVMIO,  0xa1, struct kvm_debugregs)
 #define KVM_SET_DEBUGREGS _IOW(KVMIO,  0xa2, struct kvm_debugregs)
+/* No need for CAP, because then it just always fails */
+#define KVM_ENABLE_CAP _IOW(KVMIO,  0xa3, struct kvm_enable_cap)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
 
-- 
1.6.0.2



[PATCH 14/21] KVM: PPC: Add OSI hypercall interface

2010-03-24 Thread Alexander Graf
MOL uses its own hypercall interface to call back into userspace when
the guest wants to do something.

So let's implement that as an exit reason, specify it with a CAP and
only really use it when userspace wants us to.

The only user of it so far is MOL.
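
The decision between "OSI call" and "ordinary syscall" can be sketched as a
small predicate. struct vcpu_model and is_osi_call are hypothetical names for
this example; the magic values and the 32-bit comparison follow the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OSI_SC_MAGIC_R3 0x113724FA
#define OSI_SC_MAGIC_R4 0x77810F9B

/* Hypothetical model of the vcpu state the check looks at. */
struct vcpu_model {
    uint64_t gpr[32];
    bool osi_enabled;
};

/* True when an 'sc' should exit to userspace as an OSI call instead of being
 * delivered to the guest as a normal system call. Only the low 32 bits of
 * r3 and r4 are compared. */
static bool is_osi_call(const struct vcpu_model *vcpu)
{
    assert(vcpu != NULL);
    return vcpu->osi_enabled &&
           (uint32_t)vcpu->gpr[3] == OSI_SC_MAGIC_R3 &&
           (uint32_t)vcpu->gpr[4] == OSI_SC_MAGIC_R4;
}
```

Because the check is gated on osi_enabled, a guest that happens to load the
magic values without userspace having enabled the capability still gets a
regular syscall interrupt.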

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - Add documentation for OSI exit struct

v2 -> v3:

  - Document that EXIT_OSI is not migration safe
---
 Documentation/kvm/api.txt |   19 ---
 arch/powerpc/include/asm/kvm_book3s.h |5 +
 arch/powerpc/include/asm/kvm_host.h   |2 ++
 arch/powerpc/kvm/book3s.c |   24 ++--
 arch/powerpc/kvm/powerpc.c|   12 
 include/linux/kvm.h   |6 ++
 6 files changed, 59 insertions(+), 9 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index 3da2240..f0a9337 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -895,9 +895,9 @@ executed a memory-mapped I/O instruction which could not be 
satisfied
 by kvm.  The 'data' member contains the written data if 'is_write' is
 true, and should be filled by application code otherwise.
 
-NOTE: For KVM_EXIT_IO and KVM_EXIT_MMIO, the corresponding operations
-are complete (and guest state is consistent) only after userspace has
-re-entered the kernel with KVM_RUN.  The kernel side will first finish
+NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO and KVM_EXIT_OSI, the corresponding
+operations are complete (and guest state is consistent) only after userspace
+has re-entered the kernel with KVM_RUN.  The kernel side will first finish
 incomplete operations and then check for pending signals.  Userspace
 can re-enter the guest with an unmasked signal pending to complete
 pending operations.
@@ -952,6 +952,19 @@ s390 specific.
 
 powerpc specific.
 
+   /* KVM_EXIT_OSI */
+   struct {
+   __u64 gprs[32];
+   } osi;
+
+MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
+hypercalls and exit with this exit struct that contains all the guest gprs.
+
+If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
+Userspace can now handle the hypercall and when it's done modify the gprs as
+necessary. Upon guest entry all guest GPRs will then be replaced by the values
+in this struct.
+
/* Fix the size of the union. */
char padding[256];
};
diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index bea7637..7e243b2 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -148,6 +148,11 @@ static inline ulong dsisr(void)
 
 extern void kvm_return_point(void);
 
+/* Magic register values loaded into r3 and r4 before the 'sc' assembly
+ * instruction for the OSI hypercalls */
+#define OSI_SC_MAGIC_R3 0x113724FA
+#define OSI_SC_MAGIC_R4 0x77810F9B
+
 #define INS_DCBZ   0x7c0007ec
 
 #endif /* __ASM_KVM_BOOK3S_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 0ebda67..486f1ca 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -273,6 +273,8 @@ struct kvm_vcpu_arch {
u8 mmio_sign_extend;
u8 dcr_needed;
u8 dcr_is_write;
+   u8 osi_needed;
+   u8 osi_enabled;
 
u32 cpr0_cfgaddr; /* holds the last set cpr0_cfgaddr */
 
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 130a9a1..d6105d9 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -871,12 +871,24 @@ program_interrupt:
break;
}
case BOOK3S_INTERRUPT_SYSCALL:
-#ifdef EXIT_DEBUG
-   printk(KERN_INFO "Syscall Nr %d\n", (int)kvmppc_get_gpr(vcpu, 
0));
-#endif
-   vcpu->stat.syscall_exits++;
-   kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
-   r = RESUME_GUEST;
+   // XXX make user settable
+   if (vcpu->arch.osi_enabled &&
+   (((u32)kvmppc_get_gpr(vcpu, 3)) == OSI_SC_MAGIC_R3) &&
+   (((u32)kvmppc_get_gpr(vcpu, 4)) == OSI_SC_MAGIC_R4)) {
+   u64 *gprs = run->osi.gprs;
+   int i;
+
+   run->exit_reason = KVM_EXIT_OSI;
+   for (i = 0; i < 32; i++)
+   gprs[i] = kvmppc_get_gpr(vcpu, i);
+   vcpu->arch.osi_needed = 1;
+   r = RESUME_HOST_NV;
+
+   } else {
+   vcpu->stat.syscall_exits++;
+   kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+   r = RESUME_GUEST;
+   }
break;
case BOOK3S_INTERRUPT_FP_UNAVAIL:
case BOOK3S_INTERRUPT_ALTIVEC:
diff --git a/arch/p

[PATCH 17/21] KVM: PPC: Add emulation for dcba

2010-03-24 Thread Alexander Graf
Mac OS X uses the dcba instruction. According to the specification it doesn't
guarantee any functionality, so let's just emulate it as nop.
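
For reference, recognising dcba in an instruction word can be sketched like
this. get_op and get_xop are written out here under the assumption that the
primary opcode sits in the top 6 bits and the extended opcode in bits 1..10
counted from the least significant bit; make_dcba is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OP_31_XOP_DCBA 758

static uint32_t get_op(uint32_t inst)  { return inst >> 26; }
static uint32_t get_xop(uint32_t inst) { return (inst >> 1) & 0x3ff; }

/* dcba is primary opcode 31 with extended opcode 758. */
static bool is_dcba(uint32_t inst)
{
    return get_op(inst) == 31 && get_xop(inst) == OP_31_XOP_DCBA;
}

/* Build "dcba ra, rb" for testing the decoder. */
static uint32_t make_dcba(uint32_t ra, uint32_t rb)
{
    assert(ra < 32 && rb < 32);
    return (31u << 26) | (ra << 16) | (rb << 11) | (OP_31_XOP_DCBA << 1);
}
```

An emulator that treats the instruction as a NOP only has to recognise it and
advance the program counter.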

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_64_emulate.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_emulate.c 
b/arch/powerpc/kvm/book3s_64_emulate.c
index bbd1590..8f50776 100644
--- a/arch/powerpc/kvm/book3s_64_emulate.c
+++ b/arch/powerpc/kvm/book3s_64_emulate.c
@@ -37,6 +37,7 @@
 #define OP_31_XOP_SLBIA 498
 #define OP_31_XOP_MFSR 595
 #define OP_31_XOP_MFSRIN   659
+#define OP_31_XOP_DCBA 758
 #define OP_31_XOP_SLBMFEV  851
 #define OP_31_XOP_EIOIO 854
 #define OP_31_XOP_SLBMFEE  915
@@ -183,6 +184,9 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
kvmppc_set_gpr(vcpu, get_rt(inst), t);
}
break;
+   case OP_31_XOP_DCBA:
+   /* Gets treated as NOP */
+   break;
case OP_31_XOP_DCBZ:
{
ulong rb = kvmppc_get_gpr(vcpu, get_rb(inst));
-- 
1.6.0.2



[PATCH 21/21] KVM: PPC: Disable MSR_FEx for Cell hosts

2010-03-24 Thread Alexander Graf
Cell can't handle MSR_FE0 and MSR_FE1 too well. It gets dog slow.
So let's just override the guest whenever we see one of the two and mask them
out. See commit ddf5f75a16b3e7460ffee881795aa168dffcd0cf for reference.
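
The masking itself is simple bit arithmetic. In the sketch below the bit
positions of MSR_FE0 and MSR_FE1 (11 and 8) are assumptions stated for the
example, and both function names are hypothetical; it also assumes the mask
is applied with a plain AND whenever the guest sets its MSR.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_FE0 (1ULL << 11) /* assumed bit position */
#define MSR_FE1 (1ULL << 8)  /* assumed bit position */

/* Remove both floating point exception mode bits from the allowed mask. */
static uint64_t cell_msr_mask(uint64_t msr_mask)
{
    return msr_mask & ~(MSR_FE0 | MSR_FE1);
}

/* What the guest ends up with after the mask has been applied. */
static uint64_t apply_guest_msr(uint64_t guest_msr, uint64_t msr_mask)
{
    return guest_msr & msr_mask;
}
```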

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 1a12ef2..a7ab2ea 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -356,6 +356,10 @@ void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr)
!strcmp(cur_cpu_spec->platform, "ppc970"))
vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
 
+   /* Cell performs badly if MSR_FEx are set. So let's hope nobody
+  really needs them in a VM on Cell and force disable them. */
+   if (!strcmp(cur_cpu_spec->platform, "ppc-cell-be"))
+   to_book3s(vcpu)->msr_mask &= ~(MSR_FE0 | MSR_FE1);
 }
 
 /* Book3s_32 CPUs always have 32 bytes cache line size, which Linux assumes. To
-- 
1.6.0.2



[PATCH 18/21] KVM: PPC: Add check if pte was mapped secondary

2010-03-24 Thread Alexander Graf
Some HTAB providers (namely the PS3) ignore the SECONDARY flag. They
just put an entry in the htab as secondary when they see fit.

So we need to check the return value of htab_insert to remember the
correct slot id so we can actually invalidate the entry again.

Fixes KVM on the PS3.
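
The slot fix-up can be sketched in isolation. The value 0x8 for the
"secondary" flag in the return code is an assumption made for this example,
hpte_slot is a hypothetical name, and when the caller asked for a secondary
entry the hash passed in is assumed to be the secondary hash already.

```c
#include <assert.h>
#include <stdint.h>

#define HPTES_PER_GROUP  8
#define PTEIDX_SECONDARY 0x8 /* assumed value of _PTEIDX_SECONDARY */

/* ret is the value returned by the insert routine: the low 3 bits are the
 * index within the group, bit 3 says the entry went to the secondary group. */
static uint64_t hpte_slot(uint64_t hash, uint64_t hash_mask, long ret,
                          int asked_secondary)
{
    uint64_t hpteg;

    assert(ret >= 0);

    /* We asked for primary but got secondary: use the complemented hash. */
    if ((ret & PTEIDX_SECONDARY) && !asked_secondary)
        hash = ~hash;

    hpteg = (hash & hash_mask) * HPTES_PER_GROUP;
    return hpteg + (ret & 7);
}
```

Without the fix-up, the remembered slot would point into the primary group
and a later invalidate would miss the entry.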

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_64_mmu_host.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c 
b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 25bd4ed..a01e9c5 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -270,6 +270,13 @@ map_again:
(rflags & HPTE_R_N) ? '-' : 'x',
orig_pte->eaddr, hpteg, va, orig_pte->vpage, 
hpaddr);
 
+   /* The ppc_md code may give us a secondary entry even though we
+  asked for a primary. Fix up. */
+   if ((ret & _PTEIDX_SECONDARY) && !(vflags & HPTE_V_SECONDARY)) {
+   hash = ~hash;
+   hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
+   }
+
pte->slot = hpteg + (ret & 7);
pte->host_va = va;
pte->pte = *orig_pte;
-- 
1.6.0.2



[PATCH 20/21] KVM: PPC: Make bools bitfields

2010-03-24 Thread Alexander Graf
Bool defaults to at least byte width. We usually only want to waste a single
bit on this. So let's move all the bool values to bitfields, potentially
saving memory.
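
A quick way to see the effect is to compare two flag-only structs. The struct
and function names below are made up for the example, and exact sizes are
implementation-defined, so the sketch only relies on the bitfield variant not
being larger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Seven flags, one bool (at least one byte) each. */
struct flags_plain {
    bool valid, Ks, Kp, nx, large, tb, class_;
};

/* The same seven flags packed into single bits. */
struct flags_bits {
    bool valid  : 1;
    bool Ks     : 1;
    bool Kp     : 1;
    bool nx     : 1;
    bool large  : 1;
    bool tb     : 1;
    bool class_ : 1;
};

static_assert(sizeof(struct flags_bits) <= sizeof(struct flags_plain),
              "bitfields should not make the struct larger");

static size_t plain_size(void) { return sizeof(struct flags_plain); }
static size_t bits_size(void)  { return sizeof(struct flags_bits); }
```

Note that when the flags follow 64-bit members, trailing padding can absorb
the difference, which is why the saving is only potential.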

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s.h |   28 ++--
 arch/powerpc/include/asm/kvm_host.h   |6 +++---
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 8a6b4c5..ee79921 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -29,40 +29,40 @@ struct kvmppc_slb {
u64 vsid;
u64 orige;
u64 origv;
-   bool valid;
-   bool Ks;
-   bool Kp;
-   bool nx;
-   bool large; /* PTEs are 16MB */
-   bool tb;/* 1TB segment */
-   bool class;
+   bool valid  : 1;
+   bool Ks : 1;
+   bool Kp : 1;
+   bool nx : 1;
+   bool large  : 1;/* PTEs are 16MB */
+   bool tb : 1;/* 1TB segment */
+   bool class  : 1;
 };
 
 struct kvmppc_sr {
u32 raw;
u32 vsid;
-   bool Ks;
-   bool Kp;
-   bool nx;
-   bool valid;
+   bool Ks : 1;
+   bool Kp : 1;
+   bool nx : 1;
+   bool valid  : 1;
 };
 
 struct kvmppc_bat {
u64 raw;
u32 bepi;
u32 bepi_mask;
-   bool vs;
-   bool vp;
u32 brpn;
u8 wimg;
u8 pp;
+   bool vs : 1;
+   bool vp : 1;
 };
 
 struct kvmppc_sid_map {
u64 guest_vsid;
u64 guest_esid;
u64 host_vsid;
-   bool valid;
+   bool valid  : 1;
 };
 
 #define SID_MAP_BITS 9
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 486f1ca..5869a48 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -127,9 +127,9 @@ struct kvmppc_pte {
u64 eaddr;
u64 vpage;
u64 raddr;
-   bool may_read;
-   bool may_write;
-   bool may_execute;
+   bool may_read   : 1;
+   bool may_write  : 1;
+   bool may_execute: 1;
 };
 
 struct kvmppc_mmu {
-- 
1.6.0.2



[PATCH 16/21] KVM: PPC: Fix dcbz emulation

2010-03-24 Thread Alexander Graf
On most systems we need to emulate dcbz when running 32 bit guests. So
far we've been rather slack, not giving correct DSISR values to the guest.

This patch makes the emulation more accurate, introducing a difference
between "page not mapped" and "write protection fault". While at it, it
also speeds up dcbz emulation by an order of magnitude by using kmap.
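
The instruction patching that makes dcbz trap can be sketched on a plain
array. patch_dcbz here is a hypothetical userspace function; the clear mask
0xfffffff7 is an assumption derived from the comment that extended opcode
1014 is turned into 1010, and the kmap handling of the real code is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INS_DCBZ 0x7c0007ec

/* Rewrite every dcbz in the buffer into a reserved instruction so that it
 * traps. Returns the number of instructions patched. */
static size_t patch_dcbz(uint32_t *words, size_t n)
{
    size_t i, patched = 0;

    assert(words != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if ((words[i] & 0xff0007ff) == INS_DCBZ) {
            words[i] &= 0xfffffff7; /* extended opcode 1014 -> 1010 */
            patched++;
        }
    }
    return patched;
}
```

For example, "dcbz r3, r4" (0x7c0327ec) becomes 0x7c0327e4, while other
instructions in the buffer are left untouched.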

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s.c|   56 +
 arch/powerpc/kvm/book3s_64_emulate.c |   19 +--
 2 files changed, 37 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 7912d72..1a12ef2 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
 
@@ -368,34 +369,29 @@ void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr)
  */
 static void kvmppc_patch_dcbz(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
 {
-   bool touched = false;
-   hva_t hpage;
+   struct page *hpage;
+   u64 hpage_offset;
u32 *page;
int i;
 
-   hpage = gfn_to_hva(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
-   if (kvm_is_error_hva(hpage))
+   hpage = gfn_to_page(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
+   if (is_error_page(hpage))
return;
 
-   hpage |= pte->raddr & ~PAGE_MASK;
-   hpage &= ~0xFFFULL;
-
-   page = vmalloc(HW_PAGE_SIZE);
-
-   if (copy_from_user(page, (void __user *)hpage, HW_PAGE_SIZE))
-   goto out;
+   hpage_offset = pte->raddr & ~PAGE_MASK;
+   hpage_offset &= ~0xFFFULL;
+   hpage_offset /= 4;
 
-   for (i=0; i < HW_PAGE_SIZE / 4; i++)
-   if ((page[i] & 0xff0007ff) == INS_DCBZ) {
-   page[i] &= 0xfff7; // reserved instruction, so we 
trap
-   touched = true;
-   }
+   get_page(hpage);
+   page = kmap_atomic(hpage, KM_USER0);
 
-   if (touched)
-   copy_to_user((void __user *)hpage, page, HW_PAGE_SIZE);
+   /* patch dcbz into reserved instruction, so we trap */
+   for (i=hpage_offset; i < hpage_offset + (HW_PAGE_SIZE / 4); i++)
+   if ((page[i] & 0xff0007ff) == INS_DCBZ)
+   page[i] &= 0xfffffff7;
 
-out:
-   vfree(page);
+   kunmap_atomic(page, KM_USER0);
+   put_page(hpage);
 }
 
 static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data,
@@ -448,30 +444,21 @@ int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
  bool data)
 {
struct kvmppc_pte pte;
-   hva_t hva = *eaddr;
 
vcpu->stat.st++;
 
if (kvmppc_xlate(vcpu, *eaddr, data, &pte))
-   goto nopte;
+   return -ENOENT;
 
*eaddr = pte.raddr;
 
-   hva = kvmppc_pte_to_hva(vcpu, &pte, false);
-   if (kvm_is_error_hva(hva))
-   goto mmio;
+   if (!pte.may_write)
+   return -EPERM;
 
-   if (copy_to_user((void __user *)hva, ptr, size)) {
-   printk(KERN_INFO "kvmppc_st at 0x%lx failed\n", hva);
-   goto mmio;
-   }
+   if (kvm_write_guest(vcpu->kvm, pte.raddr, ptr, size))
+   return EMULATE_DO_MMIO;
 
return EMULATE_DONE;
-
-nopte:
-   return -ENOENT;
-mmio:
-   return EMULATE_DO_MMIO;
 }
 
 int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
@@ -786,6 +773,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 * that no guest that needs the dcbz hack does NX.
 */
kvmppc_mmu_pte_flush(vcpu, vcpu->arch.pc, ~0xFFFULL);
+   r = RESUME_GUEST;
} else {
vcpu->arch.msr |= vcpu->arch.shadow_srr1 & 0x5800;
kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
diff --git a/arch/powerpc/kvm/book3s_64_emulate.c b/arch/powerpc/kvm/book3s_64_emulate.c
index 1e5cf8d..bbd1590 100644
--- a/arch/powerpc/kvm/book3s_64_emulate.c
+++ b/arch/powerpc/kvm/book3s_64_emulate.c
@@ -189,6 +189,8 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
ulong ra = 0;
ulong addr, vaddr;
u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
+   u32 dsisr;
+   int r;
 
if (get_ra(inst))
ra = kvmppc_get_gpr(vcpu, get_ra(inst));
@@ -198,14 +200,23 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
addr &= 0xffffffff;
vaddr = addr;
 
-   if (kvmppc_st(vcpu, &addr, 32, zeros, true)) {
+   r = kvmppc_st(vcpu, &addr, 32, zeros, true);
+   if
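
(The patch text is cut off at this point in the archive.)

The patching loop in kvmppc_patch_dcbz can be tried outside the kernel. The following is only a rough sketch under stated assumptions: the value of INS_DCBZ and the function name patch_dcbz_words are not taken from the patch above, and the mask that clears a single opcode bit is assumed to be 0xfffffff7. It works on a plain array of instruction words instead of a kmap'ed guest page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding of dcbz with the register fields zeroed. */
#define INS_DCBZ 0x7c0007ecu

/*
 * Hypothetical stand-alone version of the patching loop: rewrite every
 * dcbz found in `count` instruction words and return how many words were
 * changed. Clearing bit 3 is assumed to produce a reserved instruction
 * that traps, as the comment in the patch describes.
 */
static size_t patch_dcbz_words(uint32_t *words, size_t count)
{
    size_t patched = 0;

    assert(words != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        /* The mask ignores the RA/RB register fields of the instruction. */
        if ((words[i] & 0xff0007ffu) == INS_DCBZ) {
            words[i] &= 0xfffffff7u;
            patched++;
        }
    }
    return patched;
}
```

Calling it on an array that holds two dcbz encodings and one unrelated word should report two patched words and leave the third untouched.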

[PATCH 15/21] KVM: PPC: Make build work without CONFIG_VSX/ALTIVEC

2010-03-24 Thread Alexander Graf
The FPU/Altivec/VSX enablement also brought access to some structure
elements that are only defined when the respective config options
are enabled.

Unfortunately I forgot to check for the config options at some places,
so let's do that now.

Unbreaks the build when CONFIG_VSX is not set.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index d6105d9..7912d72 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -608,7 +608,9 @@ void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr)
 {
struct thread_struct *t = &current->thread;
u64 *vcpu_fpr = vcpu->arch.fpr;
+#ifdef CONFIG_VSX
u64 *vcpu_vsx = vcpu->arch.vsr;
+#endif
u64 *thread_fpr = (u64*)t->fpr;
int i;
 
@@ -688,7 +690,9 @@ static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr,
 {
struct thread_struct *t = &current->thread;
u64 *vcpu_fpr = vcpu->arch.fpr;
+#ifdef CONFIG_VSX
u64 *vcpu_vsx = vcpu->arch.vsr;
+#endif
u64 *thread_fpr = (u64*)t->fpr;
int i;
 
@@ -1219,8 +1223,12 @@ int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
int ret;
struct thread_struct ext_bkp;
+#ifdef CONFIG_ALTIVEC
bool save_vec = current->thread.used_vr;
+#endif
+#ifdef CONFIG_VSX
bool save_vsx = current->thread.used_vsr;
+#endif
ulong ext_msr;
 
/* No need to go into the guest when all we do is going out */
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 05/21] KVM: PPC: Split instruction reading out

2010-03-24 Thread Alexander Graf
The current check_ext function reads the instruction and then does
the checking. Let's split the reading out so we can reuse it for
different functions.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s.c |   24 
 1 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 9e0bc47..400ae0a 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -650,26 +650,34 @@ void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr)
kvmppc_recalc_shadow_msr(vcpu);
 }
 
-static int kvmppc_check_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr)
+static int kvmppc_read_inst(struct kvm_vcpu *vcpu)
 {
ulong srr0 = vcpu->arch.pc;
int ret;
 
-   /* Need to do paired single emulation? */
-   if (!(vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE))
-   return EMULATE_DONE;
-
-   /* Read out the instruction */
ret = kvmppc_ld(vcpu, &srr0, sizeof(u32), &vcpu->arch.last_inst, false);
if (ret == -ENOENT) {
vcpu->arch.msr = kvmppc_set_field(vcpu->arch.msr, 33, 33, 1);
vcpu->arch.msr = kvmppc_set_field(vcpu->arch.msr, 34, 36, 0);
vcpu->arch.msr = kvmppc_set_field(vcpu->arch.msr, 42, 47, 0);
kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_INST_STORAGE);
-   } else if(ret == EMULATE_DONE) {
+   return EMULATE_AGAIN;
+   }
+
+   return EMULATE_DONE;
+}
+
+static int kvmppc_check_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr)
+{
+
+   /* Need to do paired single emulation? */
+   if (!(vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE))
+   return EMULATE_DONE;
+
+   /* Read out the instruction */
+   if (kvmppc_read_inst(vcpu) == EMULATE_DONE)
/* Need to emulate */
return EMULATE_FAIL;
-   }
 
return EMULATE_AGAIN;
 }
-- 
1.6.0.2



[PATCH 03/21] KVM: PPC: Make DSISR 32 bits wide

2010-03-24 Thread Alexander Graf
DSISR is only defined as 32 bits wide. So let's reflect that in the
structs too.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s.h   |2 +-
 arch/powerpc/include/asm/kvm_host.h |2 +-
 arch/powerpc/kvm/book3s_64_interrupts.S |2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 14d0262..9f5a992 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -84,8 +84,8 @@ struct kvmppc_vcpu_book3s {
u64 hid[6];
u64 gqr[8];
int slb_nr;
+   u32 dsisr;
u64 sdr1;
-   u64 dsisr;
u64 hior;
u64 msr_mask;
u64 vsid_first;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 119deb4..0ebda67 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -260,7 +260,7 @@ struct kvm_vcpu_arch {
 
u32 last_inst;
 #ifdef CONFIG_PPC64
-   ulong fault_dsisr;
+   u32 fault_dsisr;
 #endif
ulong fault_dear;
ulong fault_esr;
diff --git a/arch/powerpc/kvm/book3s_64_interrupts.S b/arch/powerpc/kvm/book3s_64_interrupts.S
index c1584d0..faca876 100644
--- a/arch/powerpc/kvm/book3s_64_interrupts.S
+++ b/arch/powerpc/kvm/book3s_64_interrupts.S
@@ -171,7 +171,7 @@ kvmppc_handler_highmem:
std r3, VCPU_PC(r7)
std r4, VCPU_SHADOW_SRR1(r7)
std r5, VCPU_FAULT_DEAR(r7)
-   std r6, VCPU_FAULT_DSISR(r7)
+   stw r6, VCPU_FAULT_DSISR(r7)
 
ld  r5, VCPU_HFLAGS(r7)
rldicl. r5, r5, 0, 63   /* CR = ((r5 & 1) == 0) */
-- 
1.6.0.2



Re: [Qemu-devel] [PATCH] qemu: jaso-parser: Output the content of invalid keyword

2010-03-24 Thread Luiz Capitulino
On Wed, 24 Mar 2010 17:00:14 +0100
Markus Armbruster  wrote:

> Amos Kong  writes:
> 
> > When some invalid word 'unknowcmd' is input through the QMP port, qemu
> > outputs this error message:
> > "parse error: invalid keyword `%s'"
> > This patch makes qemu output the content of invalid keyword, like:
> > "parse error: invalid keyword `unknowcmd'"
> >
> > Signed-off-by: Amos Kong 
> 
> Looks good to me.
> 
> Hint: it's best to put a version in the subject when you respin, like
> [PATCH v2] ...

 Yes, and maintainers may miss a patch down a thread (and it's a good
opportunity to fix the subject).


Re: [RFC] vhost-blk implementation

2010-03-24 Thread Badari Pulavarty

Christoph Hellwig wrote:
>> Inspired by vhost-net implementation, I did initial prototype
>> of vhost-blk to see if it provides any benefits over QEMU virtio-blk.
>> I haven't handled all the error cases, fixed naming conventions etc.,
>> but the implementation is stable to play with. I tried not to deviate
>> from vhost-net implementation where possible.
>
> Can you also send the qemu side of it?

It's pretty hacky, and I based it on an old patch (vhost-net) from MST for
simplicity. I haven't focused on cleaning it up; I will rebase it on MST's
latest code once it gets into QEMU.

Thanks,
Badari

---
hw/virtio-blk.c |  199 
1 file changed, 199 insertions(+)

Index: vhost/hw/virtio-blk.c
===
--- vhost.orig/hw/virtio-blk.c  2010-02-25 16:47:04.0 -0500
+++ vhost/hw/virtio-blk.c   2010-03-17 14:07:26.477430740 -0400
@@ -18,6 +18,7 @@
#ifdef __linux__
# include 
#endif
+#include 

typedef struct VirtIOBlock
{
@@ -28,8 +29,13 @@
char serial_str[BLOCK_SERIAL_STRLEN + 1];
QEMUBH *bh;
size_t config_size;
+uint8_t vhost_started;
} VirtIOBlock;

+typedef struct BDRVRawState {
+int fd;
+} BDRVRawState;
+
static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
{
return (VirtIOBlock *)vdev;
@@ -501,6 +507,198 @@
return 0;
}

+#if 1
+#include "linux/vhost.h"
+#include 
+#include 
+#include "vhost.h"
+
+int vhost_blk_fd;
+
+struct slot_info {
+unsigned long phys_addr;
+unsigned long len;
+unsigned long userspace_addr;
+unsigned flags;
+int logging_count;
+};
+
+extern struct slot_info slots[KVM_MAX_NUM_MEM_REGIONS];
+
+static int vhost_blk_start(struct VirtIODevice *vdev)
+{
+   target_phys_addr_t s, l, a;
+   int r, num, idx = 0;
+   struct vhost_vring_state state;
+   struct vhost_vring_file file;
+   struct vhost_vring_addr addr;
+   unsigned long long used_phys;
+   void *desc, *avail, *used;
+   int i, n =0;
+   struct VirtQueue *q = virtio_queue(vdev, idx);
+   VirtIOBlock *vb = to_virtio_blk(vdev);
+   struct vhost_memory *mem;
+   BDRVRawState *st = vb->bs->opaque;
+
+   vhost_blk_fd = open("/dev/vhost-blk", O_RDWR);
+   if (vhost_blk_fd < 0) {
+   fprintf(stderr, "unable to open vhost-blk\n");
+   return -errno;
+   }
+
+   r = ioctl(vhost_blk_fd, VHOST_SET_OWNER, NULL);
+if (r < 0) {
+   fprintf(stderr, "ioctl VHOST_SET_OWNER failed\n");
+return -errno;
+   }
+
+for (i = 0; i < KVM_MAX_NUM_MEM_REGIONS; ++i) {
+if (!slots[i].len ||
+   (slots[i].flags & KVM_MEM_LOG_DIRTY_PAGES)) {
+  continue;
+}
+++n;
+}
+
+mem = qemu_mallocz(offsetof(struct vhost_memory, regions) +
+   n * sizeof(struct vhost_memory_region));
+if (!mem)
+return -ENOMEM;
+
+mem->nregions = n;
+n = 0;
+for (i = 0; i < KVM_MAX_NUM_MEM_REGIONS; ++i) {
+if (!slots[i].len || (slots[i].flags &
+   KVM_MEM_LOG_DIRTY_PAGES)) {
+continue;
+}
+mem->regions[n].guest_phys_addr = slots[i].phys_addr;
+mem->regions[n].memory_size = slots[i].len;
+mem->regions[n].userspace_addr = slots[i].userspace_addr;
+++n;
+}
+
+r = ioctl(vhost_blk_fd, VHOST_SET_MEM_TABLE, mem);
+if (r < 0)
+return -errno;
+
+   state.index = idx;
+   num = state.num = virtio_queue_get_num(vdev, idx);
+   r = ioctl(vhost_blk_fd, VHOST_SET_VRING_NUM, &state);
+if (r) {
+   fprintf(stderr, "ioctl VHOST_SET_VRING_NUM failed\n");
+return -errno;
+}
+
+   state.num = virtio_queue_last_avail_idx(vdev, idx);
+   r = ioctl(vhost_blk_fd, VHOST_SET_VRING_BASE, &state);
+   if (r) {
+   fprintf(stderr, "ioctl VHOST_SET_VRING_BASE failed\n");
+return -errno;
+   }
+
+   s = l = sizeof(struct vring_desc) * num;
+   a = virtio_queue_get_desc(vdev, idx);
+   desc = cpu_physical_memory_map(a, &l, 0);
+   if (!desc || l != s) {
+r = -ENOMEM;
+goto fail_alloc;
+   }
+   s = l = offsetof(struct vring_avail, ring) +
+sizeof(u_int64_t) * num;
+a = virtio_queue_get_avail(vdev, idx);
+avail = cpu_physical_memory_map(a, &l, 0);
+if (!avail || l != s) {
+r = -ENOMEM;
+goto fail_alloc;
+}
+s = l = offsetof(struct vring_used, ring) +
+sizeof(struct vring_used_elem) * num;
+used_phys = a = virtio_queue_get_used(vdev, idx);
+used = cpu_physical_memory_map(a, &l,

Re: [RFC] vhost-blk implementation

2010-03-24 Thread Badari Pulavarty

Christoph Hellwig wrote:
>> Inspired by vhost-net implementation, I did initial prototype
>> of vhost-blk to see if it provides any benefits over QEMU virtio-blk.
>> I haven't handled all the error cases, fixed naming conventions etc.,
>> but the implementation is stable to play with. I tried not to deviate
>> from vhost-net implementation where possible.
>
> Can you also send the qemu side of it?
>
>> with vhost-blk:
>>
>> # time dd if=/dev/vda of=/dev/null bs=128k iflag=direct
>> 640000+0 records in
>> 640000+0 records out
>> 83886080000 bytes (84 GB) copied, 126.135 seconds, 665 MB/s
>>
>> real    2m6.137s
>> user    0m0.281s
>> sys     0m14.725s
>>
>> without vhost-blk: (virtio)
>> ---
>>
>> # time dd if=/dev/vda of=/dev/null bs=128k iflag=direct
>> 640000+0 records in
>> 640000+0 records out
>> 83886080000 bytes (84 GB) copied, 275.466 seconds, 305 MB/s
>>
>> real    4m35.468s
>> user    0m0.373s
>> sys     0m48.074s



> Which caching mode is this?  I assume data=writeback, because otherwise
> you'd be doing synchronous I/O directly from the handler.

Yes. This is with the default (writeback) cache model. As mentioned earlier,
readahead is helping here, and in most cases the data would already be in
the pagecache.

>> +static int do_handle_io(struct file *file, uint32_t type, uint64_t sector,
>> +   struct iovec *iov, int in)
>> +{
>> +   loff_t pos = sector << 8;
>> +   int ret = 0;
>> +
>> +   if (type & VIRTIO_BLK_T_FLUSH)  {
>> +   ret = vfs_fsync(file, file->f_path.dentry, 1);
>> +   } else if (type & VIRTIO_BLK_T_OUT) {
>> +   ret = vfs_writev(file, iov, in, &pos);
>> +   } else {
>> +   ret = vfs_readv(file, iov, in, &pos);
>> +   }
>> +   return ret;
>
> I have to admit I don't understand the vhost architecture at all, but
> where do the actual data pointers used by the iovecs reside?
> vfs_readv/writev expect both the iovec itself and the buffers
> pointed to by it to reside in userspace, so just using kernel buffers
> here will break badly on architectures with different user/kernel
> mappings.  A lot of this is fixable using simple set_fs & co tricks,
> but for direct I/O which uses get_user_pages even that will fail badly.

The iovecs and buffers are user-space pointers (from the host kernel's point
of view). They are guest addresses, so I don't need to do any set_fs tricks.

> Also it seems like you're doing all the I/O synchronous here?  For
> data=writeback operations that could explain the read speedup
> as you're avoiding context switches, but for actual write I/O
> which has to get data to disk (either directly from vfs_writev or
> later through vfs_fsync) this seems like a really bad idea stealing
> a lot of guest time that should happen in the background.

Yes. QEMU virtio-blk batches up all the writes and hands off the work to
another thread. When the writes are complete, it sends a status completion.
Since I am doing everything synchronously (even though it's a write to the
pagecache), one request at a time, that explains the slowdown. We need to
find a way to

1) batch IO writes together
2) hand off to another thread to do the IO, so that vhost-thread can handle
   the next set of requests
3) update the status on the completion

What should I do here? I can create a bunch of kernel threads to do the IO
for me, or somehow fit and reuse the AIO io_submit() mechanism. What's the
best way here?

I hate to duplicate all the code VFS is doing.


> Other than that the code seems quite nice and simple, but one huge
> problem is that it'll only support raw images, and thus misses out
> on all the "nice" image formats used in qemu deployments, especially
> qcow2.  It's also missing the ioctl magic we're having in various
> places, both for controlling host devices like cdroms and SG
> passthrough.

True... unfortunately, I don't understand all of those (qcow2) details yet!
I need to read up on those to even make a comment :(

Thanks,
Badari




Re: [RFC] vhost-blk implementation

2010-03-24 Thread Christoph Hellwig
On Tue, Mar 23, 2010 at 12:03:14PM +0200, Avi Kivity wrote:
> I also think it should be done at the bio layer.  File I/O is going to  
> be slower, if we do vhost-blk we should concentrate on maximum  
> performance.  The block layer also exposes more functionality we can use  
> (asynchronous barriers for example).

The block layer is more flexible, but that limits you to only stack
directly on top of a block device, which is extremely inflexible.



Re: [RFC] vhost-blk implementation

2010-03-24 Thread Christoph Hellwig
> Inspired by vhost-net implementation, I did initial prototype 
> of vhost-blk to see if it provides any benefits over QEMU virtio-blk.
> I haven't handled all the error cases, fixed naming conventions etc.,
> but the implementation is stable to play with. I tried not to deviate
> from vhost-net implementation where possible.

Can you also send the qemu side of it?

> with vhost-blk:
> 
> 
> # time dd if=/dev/vda of=/dev/null bs=128k iflag=direct
> 640000+0 records in
> 640000+0 records out
> 83886080000 bytes (84 GB) copied, 126.135 seconds, 665 MB/s
> 
> real    2m6.137s
> user    0m0.281s
> sys     0m14.725s
> 
> without vhost-blk: (virtio)
> ---
> 
> # time dd if=/dev/vda of=/dev/null bs=128k iflag=direct
> 640000+0 records in
> 640000+0 records out
> 83886080000 bytes (84 GB) copied, 275.466 seconds, 305 MB/s
> 
> real    4m35.468s
> user    0m0.373s
> sys     0m48.074s

Which caching mode is this?  I assume data=writeback, because otherwise
you'd be doing synchronous I/O directly from the handler.

> +static int do_handle_io(struct file *file, uint32_t type, uint64_t sector,
> + struct iovec *iov, int in)
> +{
> + loff_t pos = sector << 8;
> + int ret = 0;
> +
> + if (type & VIRTIO_BLK_T_FLUSH)  {
> + ret = vfs_fsync(file, file->f_path.dentry, 1);
> + } else if (type & VIRTIO_BLK_T_OUT) {
> + ret = vfs_writev(file, iov, in, &pos);
> + } else {
> + ret = vfs_readv(file, iov, in, &pos);
> + }
> + return ret;

I have to admit I don't understand the vhost architecture at all, but
where do the actual data pointers used by the iovecs reside?
vfs_readv/writev expect both the iovec itself and the buffers
pointed to by it to reside in userspace, so just using kernel buffers
here will break badly on architectures with different user/kernel
mappings.  A lot of this is fixable using simple set_fs & co tricks,
but for direct I/O which uses get_user_pages even that will fail badly.

Also it seems like you're doing all the I/O synchronous here?  For
data=writeback operations that could explain the read speedup
as you're avoiding context switches, but for actual write I/O
which has to get data to disk (either directly from vfs_writev or
later through vfs_fsync) this seems like a really bad idea stealing
a lot of guest time that should happen in the background.


Other than that the code seems quite nice and simple, but one huge
problem is that it'll only support raw images, and thus misses out
on all the "nice" image formats used in qemu deployments, especially
qcow2.  It's also missing the ioctl magic we're having in various
places, both for controlling host devices like cdroms and SG
passthrough.


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Arnaldo Carvalho de Melo
Em Wed, Mar 24, 2010 at 08:20:10PM +0200, Avi Kivity escreveu:
> On 03/24/2010 07:47 PM, Arnaldo Carvalho de Melo wrote:
>> Em Wed, Mar 24, 2010 at 06:09:30PM +0200, Avi Kivity escreveu:
>>
>>> Doesn't perf already has a dependency on naming conventions for finding
>>> debug information?
>>>  
>> It looks at several places, from most symbol rich (/usr/lib/debug/, aka
>> -debuginfo packages, where we have full symtabs) to poorest (the
>> packaged binary, where we may just have a .dynsym).
>>
>> In an ideal world, it would just get the build-id (a SHA1 cookie that is
>> in an ELF section inserted in every binary (aka DSOs), kernel module,
>> kallsyms or vmlinux file) and use that to look first in a local cache
>> (implemented in perf for a long time already) or in some symbol server.
>>
>> For instance, for a random perf.data file I collected here in my machine
>> I have:
>>
>> [a...@doppio linux-2.6-tip]$ perf buildid-list | grep libpthread
>> 5c68f7afeb33309c78037e374b0deee84dd441f6 /lib64/libpthread-2.10.2.so
>> [a...@doppio linux-2.6-tip]$
>>
>> So I don't have to access /lib64/libpthread-2.10.2.so directly, nor some
>> convention to get a debuginfo in a local file like:
>>
>> /usr/lib/debug/lib64/libpthread-2.10.2.so.debug
>>
>> Instead the tools look at:
>>
>> [a...@doppio linux-2.6-tip]$ l 
>> ~/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6
>> lrwxrwxrwx 1 acme acme 73 2010-01-06 18:53 
>> /home/acme/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6 ->  
>> ../../lib64/libpthread-2.10.2.so/5c68f7afeb33309c78037e374b0deee84dd441f6*
>>
>> To find the file for that specific build-id, not the one installed in my
>> machine (or on the different machine, of a different architecture) that
>> may be completely unrelated, a new one, or one for a different arch.

> Thanks.  I believe qemu could easily act as a symbol server for this use  
> case.

Agreed, but it doesn't even have to :-)

We just need to get the build-id in the PERF_RECORD_MMAP event somehow
and then get this symbol from elsewhere, say the same DVD/RHN
channel/Debian Repository/embedded developer toolkit image not
stripped/whatever.

Or it may already be in the local cache from last week's perf report
session :-)

- Arnaldo


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 07:47 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Mar 24, 2010 at 06:09:30PM +0200, Avi Kivity escreveu:
>> Doesn't perf already has a dependency on naming conventions for finding
>> debug information?
>
> It looks at several places, from most symbol rich (/usr/lib/debug/, aka
> -debuginfo packages, where we have full symtabs) to poorest (the
> packaged binary, where we may just have a .dynsym).
>
> In an ideal world, it would just get the build-id (a SHA1 cookie that is
> in an ELF section inserted in every binary (aka DSOs), kernel module,
> kallsyms or vmlinux file) and use that to look first in a local cache
> (implemented in perf for a long time already) or in some symbol server.
>
> For instance, for a random perf.data file I collected here in my machine
> I have:
>
> [a...@doppio linux-2.6-tip]$ perf buildid-list | grep libpthread
> 5c68f7afeb33309c78037e374b0deee84dd441f6 /lib64/libpthread-2.10.2.so
> [a...@doppio linux-2.6-tip]$
>
> So I don't have to access /lib64/libpthread-2.10.2.so directly, nor some
> convention to get a debuginfo in a local file like:
>
> /usr/lib/debug/lib64/libpthread-2.10.2.so.debug
>
> Instead the tools look at:
>
> [a...@doppio linux-2.6-tip]$ l
> ~/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6
> lrwxrwxrwx 1 acme acme 73 2010-01-06 18:53
> /home/acme/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6 ->
> ../../lib64/libpthread-2.10.2.so/5c68f7afeb33309c78037e374b0deee84dd441f6*
>
> To find the file for that specific build-id, not the one installed in my
> machine (or on the different machine, of a different architecture) that
> may be completely unrelated, a new one, or one for a different arch.

Thanks.  I believe qemu could easily act as a symbol server for this use
case.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Arnaldo Carvalho de Melo
Em Wed, Mar 24, 2010 at 06:09:30PM +0200, Avi Kivity escreveu:
> Doesn't perf already has a dependency on naming conventions for finding  
> debug information?

It looks at several places, from most symbol rich (/usr/lib/debug/, aka
-debuginfo packages, where we have full symtabs) to poorest (the
packaged binary, where we may just have a .dynsym).

In an ideal world, it would just get the build-id (a SHA1 cookie that is
in an ELF section inserted in every binary (aka DSOs), kernel module,
kallsyms or vmlinux file) and use that to look first in a local cache
(implemented in perf for a long time already) or in some symbol server.

For instance, for a random perf.data file I collected here in my machine
I have:

[a...@doppio linux-2.6-tip]$ perf buildid-list | grep libpthread
5c68f7afeb33309c78037e374b0deee84dd441f6 /lib64/libpthread-2.10.2.so
[a...@doppio linux-2.6-tip]$

So I don't have to access /lib64/libpthread-2.10.2.so directly, nor some
convention to get a debuginfo in a local file like:

/usr/lib/debug/lib64/libpthread-2.10.2.so.debug

Instead the tools look at:

[a...@doppio linux-2.6-tip]$ l 
~/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6
lrwxrwxrwx 1 acme acme 73 2010-01-06 18:53 
/home/acme/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6 -> 
../../lib64/libpthread-2.10.2.so/5c68f7afeb33309c78037e374b0deee84dd441f6*

To find the file for that specific build-id, not the one installed in my
machine (or on the different machine, of a different architecture) that
may be completely unrelated, a new one, or one for a different arch.

- Arnaldo
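
The cache layout shown in the listing above splits the build-id into a two-character directory and a 38-character file name. A minimal sketch of that mapping follows; it is not perf's own code, and the helper name buildid_cache_path and its parameters are made up for illustration. It assumes a 40-character hexadecimal SHA-1 build-id, as in the example output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of perf): build the cache path
 *   <debug_dir>/.build-id/<first 2 hex digits>/<remaining 38 hex digits>
 * for a 40-character SHA-1 build-id. Returns 0 on success and -1 if the
 * build-id has the wrong length or the result does not fit in out.
 */
static int buildid_cache_path(const char *debug_dir, const char *build_id,
                              char *out, size_t out_len)
{
    assert(debug_dir != NULL && build_id != NULL && out != NULL);

    if (strlen(build_id) != 40)
        return -1;

    /* "%.2s" prints only the first two characters of the build-id. */
    int n = snprintf(out, out_len, "%s/.build-id/%.2s/%s",
                     debug_dir, build_id, build_id + 2);
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return 0;
}
```

With debug_dir set to /home/acme/.debug and the libpthread build-id from the listing, the helper produces the same symlink path that the `l` command shows.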


Re: [PATCH] qemu: jaso-parser: Output the content of invalid keyword

2010-03-24 Thread Richard Henderson
On 03/24/2010 08:12 AM, Amos Kong wrote:
> 
> When some invalid word 'unknowcmd' is input through the QMP port, qemu
> outputs this error message:
> "parse error: invalid keyword `%s'"
> This patch makes qemu output the content of invalid keyword, like:
> "parse error: invalid keyword `unknowcmd'"
> 
> Signed-off-by: Amos Kong 

Acked-by: Richard Henderson 

> ---
>  json-parser.c |8 +++-
>  1 files changed, 7 insertions(+), 1 deletions(-)
> 
> diff --git a/json-parser.c b/json-parser.c
> index 579928f..b55d763 100644
> --- a/json-parser.c
> +++ b/json-parser.c
> @@ -12,6 +12,7 @@
>   */
>  
>  #include 
> +#include 
>  
>  #include "qemu-common.h"
>  #include "qstring.h"
> @@ -93,7 +94,12 @@ static int token_is_escape(QObject *obj, const char *value)
>   */
>  static void parse_error(JSONParserContext *ctxt, QObject *token, const char *msg, ...)
>  {
> -fprintf(stderr, "parse error: %s\n", msg);
> +va_list ap;
> +va_start(ap, msg);
> +fprintf(stderr, "parse error: ");
> +vfprintf(stderr, msg, ap);
> +fprintf(stderr, "\n");
> +va_end(ap);
>  }
>  
>  /**
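
The change quoted above forwards the variadic arguments to vfprintf so that `%s' in the message is actually expanded. The same pattern can be sketched in isolation; format_parse_error below is a hypothetical variant, not a QEMU function, and it writes into a caller-supplied buffer instead of stderr so that the result can be inspected.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical variant of the quoted parse_error(): format into a buffer
 * instead of printing to stderr. Returns the number of characters written
 * (excluding the terminating NUL), or -1 on error or truncation.
 */
static int format_parse_error(char *buf, size_t len, const char *msg, ...)
{
    assert(buf != NULL && msg != NULL);

    int prefix = snprintf(buf, len, "parse error: ");
    if (prefix < 0 || (size_t)prefix >= len)
        return -1;

    va_list ap;
    va_start(ap, msg);
    /* msg is used as the format string; ap carries its arguments. */
    int body = vsnprintf(buf + prefix, len - (size_t)prefix, msg, ap);
    va_end(ap);

    if (body < 0 || (size_t)body >= len - (size_t)prefix)
        return -1;
    return prefix + body;
}
```

Passing the format "invalid keyword `%s'" with the argument "unknowcmd" yields the expanded message from the commit description rather than the literal `%s'.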



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 06:47 PM, Avi Kivity wrote:


> It's true.  If the kernel provides something, there are fewer things
> that can break.  But if your system is so broken that you can't
> resolve uids, fix that before running perf.  Must we design perf for
> that case?
>
> After all, 'ls -l' will break under the same circumstances.  It's hard
> to imagine doing useful work when that doesn't work.



Also, perf itself will hang if it needs to access a file using autofs or 
nfs, and those are broken.


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 06:45 PM, Joerg Roedel wrote:



>> That's just what I want to do.  Leave it in userspace and then they can
>> deal with it without telling us about it.
>
> They can't do that with a directory in /proc?

   


I don't know.

--
error compiling committee.c: too many arguments to function



[PATCH] KVM: allow bit 10 to be cleared in MSR_IA32_MC4_CTL

2010-03-24 Thread Andre Przywara
There is a quirk for AMD K8 CPUs in many Linux kernels (see
arch/x86/kernel/cpu/mcheck/mce.c:__mcheck_cpu_apply_quirks()) that
clears bit 10 in that MCE-related MSR. KVM can only cope with all
zeros or all ones, so it will inject a #GP into the guest, which
makes the guest panic.
So let's add a quirk to the quirk and ignore this single cleared bit.
This fixes -cpu kvm64 on all machines and -cpu host on K8 machines
with some guest Linux kernels.

Signed-off-by: Andre Przywara 
---
 arch/x86/kvm/x86.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 097ad3a..a58c634 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -910,9 +910,13 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 data)
if (msr >= MSR_IA32_MC0_CTL &&
msr < MSR_IA32_MC0_CTL + 4 * bank_num) {
u32 offset = msr - MSR_IA32_MC0_CTL;
-   /* only 0 or all 1s can be written to IA32_MCi_CTL */
+   /* only 0 or all 1s can be written to IA32_MCi_CTL
+* some Linux kernels though clear bit 10 in bank 4 to
+* workaround a BIOS/GART TBL issue on AMD K8s, ignore
+* this to avoid an uncatched #GP in the guest
+*/
if ((offset & 0x3) == 0 &&
-   data != 0 && data != ~(u64)0)
+   data != 0 && (data | (1 << 10)) != ~(u64)0)
return -1;
vcpu->arch.mce_banks[offset] = data;
break;
-- 
1.6.4
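
The relaxed check in the patch above can be exercised on its own. The following is a minimal sketch, not KVM code: mci_ctl_value_ok and MC4_CTL_QUIRK_BIT are names made up for illustration, and the function only mirrors the condition in the diff (all zeros, all ones, or all ones with bit 10 cleared).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit that some Linux guests clear in the bank 4 control MSR on AMD K8. */
#define MC4_CTL_QUIRK_BIT ((uint64_t)1 << 10)

/*
 * Hypothetical helper (not a KVM function): returns true when a value
 * written to an IA32_MCi_CTL register would be accepted under the rule
 * "all zeros, all ones, or all ones with bit 10 cleared".
 */
static bool mci_ctl_value_ok(uint64_t data)
{
    if (data == 0)
        return true;
    /* OR-ing bit 10 back in maps both accepted non-zero patterns to ~0. */
    return (data | MC4_CTL_QUIRK_BIT) == ~(uint64_t)0;
}
```

Any other cleared bit, such as bit 11, still fails the check, so the guest would still receive a #GP for it.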




Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 06:40 PM, Joerg Roedel wrote:



>> Looks trivial to find a guest, less so with enumerating (still doable).
>
> Not so trivial and even more likely to break. Even if perf has the pid of
> the process and wants to find the directory it has to do:
>
> 1. Get the uid of the process
> 2. Find the username for the uid
> 3. Use the username to find the home-directory
>
> Steps 2. and 3. need nsswitch and/or pam access to get this information
> from whatever source the admin has configured. And depending on what the
> source is it may be temporarily unavailable causing nasty timeouts. In
> short, there are many weak parts in that chain making it more likely to
> break.
   


It's true.  If the kernel provides something, there are fewer things 
that can break.  But if your system is so broken that you can't resolve 
uids, fix that before running perf.  Must we design perf for that case?


After all, 'ls -l' will break under the same circumstances.  It's hard 
to imagine doing useful work when that doesn't work.



A kernel-based approach with /proc//kvm does not have those issues
(and to repeat myself, it is independent from the userspace being used).
   


It has other issues, which are IMO more problematic.

--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Peter Zijlstra
On Wed, 2010-03-24 at 17:23 +0100, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 05:03:42PM +0100, Peter Zijlstra wrote:
> > On Wed, 2010-03-24 at 16:01 +0100, Joerg Roedel wrote:
> > 
> > > What I meant was: perf-kernel puts the guest-name into every sample and
> > > perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
> > > symbols. I leave the question of how the guest-fs is exposed to the host
> > > out of this discussion. We should discuss this separately.
> > 
> > I'd much prefer a pid like suggested later, keeps the samples smaller.
> > 
> > But that said, we need guest kernel events like mmap and context
> > switches too, otherwise we simply can't make sense of guest userspace
> > addresses, we need to know the guest address space layout.
> 
> With the filesystem approach all we need is the pid of the guest
> process. Then we can access proc//maps of the guest and read out the
> address space layout, no?

No, what if it maps new things after you read it? But still getting the
pid of the guest process seems non trivial without guest kernel support.


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 06:32:51PM +0200, Avi Kivity wrote:
> On 03/24/2010 06:31 PM, Joerg Roedel wrote:

> That's just what I want to do.  Leave it in userspace and then they can  
> deal with it without telling us about it.

They can't do that with a directory in /proc?



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 06:09:30PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:59 PM, Joerg Roedel wrote:
>>>> I am not tied to /sys/kvm. We could also use /proc//kvm/ for
>>>> example. This would keep anything in the process space (except for the
>>>> global list of VMs which we should have anyway).
>>>>
>>> How about ~/.qemu/guests/$pid?
>>>
>> That makes it hard for perf to find it and even harder to get a list of
>> all VMs.
>
> Looks trivial to find a guest, less so with enumerating (still doable).

Not so trivial and even more likely to break. Even if perf has the pid of
the process and wants to find the directory it has to do:

1. Get the uid of the process
2. Find the username for the uid
3. Use the username to find the home-directory

Steps 2. and 3. need nsswitch and/or pam access to get this information
from whatever source the admin has configured. And depending on what the
source is it may be temporarily unavailable causing nasty timeouts. In
short, there are many weak parts in that chain making it more likely to
break.
A kernel-based approach with /proc//kvm does not have those issues
(and to repeat myself, it is independent from the userspace being used).
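
The three lookup steps listed above can be sketched in C roughly as follows. This is only an illustration under assumptions: the helper name guest_dir_for_pid is hypothetical, the owner of the /proc/<pid> directory is taken as the uid of the process (which is normally, but not always, the case), and steps 2 and 3 are collapsed into a single getpwuid() call, which is the call that goes through NSS and can therefore block or fail when the configured source is unavailable.

```c
#include <pwd.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: build "<home>/.qemu/guests/<pid>" for the owner
 * of process 'pid'. Returns 0 on success, -1 on any failure.
 */
int guest_dir_for_pid(pid_t pid, char *out, size_t outlen)
{
    char procpath[64];
    struct stat st;
    struct passwd *pw;
    int n;

    /* Step 1: take the owner of /proc/<pid> as the uid of the process. */
    snprintf(procpath, sizeof(procpath), "/proc/%ld", (long)pid);
    if (stat(procpath, &st) != 0)
        return -1;

    /* Steps 2 and 3: uid -> passwd entry -> home directory (via NSS). */
    pw = getpwuid(st.st_uid);
    if (pw == NULL || pw->pw_dir == NULL)
        return -1;

    n = snprintf(out, outlen, "%s/.qemu/guests/%ld", pw->pw_dir, (long)pid);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}
```

A caller would pass the qemu pid and a buffer; every failure path (missing /proc entry, unresolvable uid, short buffer) ends in -1, which is the fragility being pointed out.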

Joerg



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 06:31 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 06:20:38PM +0200, Avi Kivity wrote:
>> On 03/24/2010 06:17 PM, Joerg Roedel wrote:
>>> But is this not only one entity more for
>>> sVirt to handle? I would leave that decision to the sVirt developers.
>>> Does attaching the same label as for the VM resources mean that root
>>> could not access it anymore?
>>
>> IIUC processes run under a context, and there's a policy somewhere that
>> tells you which context can access which label (and with what
>> permissions).  There was a server on the Internet once that gave you
>> root access and invited you to attack it.  No idea if anyone succeeded
>> or not (I got bored after about a minute).
>>
>> So it depends on the policy.  If you attach the same label, that means
>> all files with the same label have the same access permissions.  I think.
>
> So if this is true we can introduce a 'trace' label and add all contexts
> that should be allowed to trace to it.
> But we probably should leave the details to the security experts ;-)

That's just what I want to do.  Leave it in userspace and then they can
deal with it without telling us about it.


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 06:20:38PM +0200, Avi Kivity wrote:
> On 03/24/2010 06:17 PM, Joerg Roedel wrote:
>> But is this not only one entity more for
>> sVirt to handle? I would leave that decision to the sVirt developers.
>> Does attaching the same label as for the VM resources mean that root
>> could not access it anymore?
>>
>
> IIUC processes run under a context, and there's a policy somewhere that  
> tells you which context can access which label (and with what  
> permissions).  There was a server on the Internet once that gave you  
> root access and invited you to attack it.  No idea if anyone succeeded  
> or not (I got bored after about a minute).
>
> So it depends on the policy.  If you attach the same label, that means  
> all files with the same label have the same access permissions.  I think.

So if this is true we can introduce a 'trace' label and add all contexts
that should be allowed to trace to it.
But we probably should leave the details to the security experts ;-)

Joerg



Re: qemu-kvm crashes with Assertion ... failed.

2010-03-24 Thread Avi Kivity

On 03/24/2010 06:20 PM, André Weidemann wrote:
>> Does this happen with a guest installed on kvm, or just with the guest
>> that (guessing from the name) was imported from vmware?
>
> I booted the VM via PXE into an Ubuntu Live CD image. I only added the
> Windows disk image, so I could copy the resulting Excel file (from
> iozone) to this disk. The Windows 7 on this disk was installed under
> kvm 0.12.3.

What version of Ubuntu?  Can you post a way to reproduce this reliably
(how you created the disk etc.)


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 05:03:42PM +0100, Peter Zijlstra wrote:
> On Wed, 2010-03-24 at 16:01 +0100, Joerg Roedel wrote:
> 
> > What I meant was: perf-kernel puts the guest-name into every sample and
> > perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
> > symbols. I leave the question of how the guest-fs is exposed to the host
> > out of this discussion. We should discuss this separately.
> 
> I'd much prefer a pid like suggested later, keeps the samples smaller.
> 
> But that said, we need guest kernel events like mmap and context
> switches too, otherwise we simply can't make sense of guest userspace
> addresses, we need to know the guest address space layout.

With the filesystem approach all we need is the pid of the guest
process. Then we can access proc//maps of the guest and read out the
address space layout, no?

Joerg



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 06:17 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 05:52:54PM +0200, Avi Kivity wrote:
>> On 03/24/2010 05:50 PM, Joerg Roedel wrote:
>>> If we go the /proc//kvm way then the directory should probably
>>> inherit the label from /proc//?
>>
>> That's a security policy.  The security people like their policies
>> outside the kernel.
>>
>> For example, they may want a label that allows a trace context to read
>> the data, and also qemu itself for introspection.
>
> Hm, I am not a security expert.

I'm out of my depth here as well.

> But is this not only one entity more for
> sVirt to handle? I would leave that decision to the sVirt developers.
> Does attaching the same label as for the VM resources mean that root
> could not access it anymore?

IIUC processes run under a context, and there's a policy somewhere that
tells you which context can access which label (and with what
permissions).  There was a server on the Internet once that gave you
root access and invited you to attack it.  No idea if anyone succeeded
or not (I got bored after about a minute).

So it depends on the policy.  If you attach the same label, that means
all files with the same label have the same access permissions.  I think.


--
error compiling committee.c: too many arguments to function



Re: qemu-kvm crashes with Assertion ... failed.

2010-03-24 Thread André Weidemann

Hi,
On 24.03.2010 13:17, Avi Kivity wrote:
> On 03/17/2010 11:14 PM, André Weidemann wrote:
>>>> qemu-system-x86_64 -cpu core2duo -vga cirrus -boot order=ndc -vnc
>>>> 192.168.3.42:2 -k de -smp 4,cores=4 -drive
>>>> file=/vmware/Windows7Test_600G.img,if=ide,index=0,cache=writeback -m
>>>> 1024 -net nic,model=e1000,macaddr=DE:AD:BE:EF:12:3A -net
>>>> tap,script=/usr/local/bin/qemu-ifup -monitor pty -name
>>>> Windows7test,process=Windows7test -drive
>>>> file=/dev/storage/Windows7test,if=ide,index=1,cache=none,aio=native
>>>
>>> Andre,
>>>
>>> Can you try qemu-kvm-0.12.3 ?
>>
>> I did the following:
>> git clone git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
>> qemu-kvm-2010-03-17
>> cd qemu-kvm-2010-03-17
>> git checkout -b test qemu-kvm-0.12.3
>> ./configure
>> make -j6 && make install
>>
>> I started the VM again exactly as I did the last time and it crashed
>> again with the same error message.
>> "qemu-system-x86_64:
>> /usr/local/src/qemu-kvm-2010-03-17/hw/ide/internal.h:507:
>> bmdma_active_if: Assertion `bmdma->unit != (uint8_t)-1' failed."
>
> Does this happen with a guest installed on kvm, or just with the guest
> that (guessing from the name) was imported from vmware?

I booted the VM via PXE into an Ubuntu Live CD image. I only added the
Windows disk image, so I could copy the resulting Excel file (from
iozone) to this disk. The Windows 7 on this disk was installed under kvm
0.12.3.


Regards
 André


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 06:03 PM, Peter Zijlstra wrote:
> On Wed, 2010-03-24 at 16:01 +0100, Joerg Roedel wrote:
>
>> What I meant was: perf-kernel puts the guest-name into every sample and
>> perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
>> symbols. I leave the question of how the guest-fs is exposed to the host
>> out of this discussion. We should discuss this separately.
>
> I'd much prefer a pid like suggested later, keeps the samples smaller.
>
> But that said, we need guest kernel events like mmap and context
> switches too, otherwise we simply can't make sense of guest userspace
> addresses, we need to know the guest address space layout.

The kernel knows some of the address space layout, qemu knows all of it.

> So aside from a filesystem content, we first need mmap and context
> switch events to find the files we need to access.

This only works for the guest kernel, we don't know anything about guest
processes [1].

> And while I appreciate all the security talk, it's basically pointless
> anyway, the host can access it anyway, everybody agrees on that, but
> still you're arguing the case..

root can access anything, but we're not talking about root.  The idea is
to protect against a guest that has exploited its qemu and is now
attacking the host and its fellow guests.  uid protection is no good
since we want to isolate the guest from host processes belonging to the
same uid and from other guests running under the same uid.

[1] We can find out guest pids if we teach the kernel what to
dereference, i.e. gs:offset1->offset2->offset3.  Of course this varies
from kernel to kernel, so we need some kind of bytecode that we can run
in perf nmi context.  Kind of what we need to run an unwinder for
-fomit-frame-pointer.


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 05:52:54PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:50 PM, Joerg Roedel wrote:
>> If we go the /proc//kvm way then the directory should probably
>> inherit the label from /proc//?
>
> That's a security policy.  The security people like their policies  
> outside the kernel.
>
> For example, they may want a label that allows a trace context to read  
> the data, and also qemu itself for introspection.

Hm, I am not a security expert. But is this not only one entity more for
sVirt to handle? I would leave that decision to the sVirt developers.
Does attaching the same label as for the VM resources mean that root
could not access it anymore?

Joerg



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 05:59 PM, Joerg Roedel wrote:
>>> I am not tied to /sys/kvm. We could also use /proc//kvm/ for
>>> example. This would keep anything in the process space (except for the
>>> global list of VMs which we should have anyway).
>>
>> How about ~/.qemu/guests/$pid?
>
> That makes it hard for perf to find it and even harder to get a list of
> all VMs.

Looks trivial to find a guest, less so with enumerating (still doable).

> With /proc//kvm/guest we could symlink all guest
> directories to /proc/kvm/ and perf reads the list from there. Also perf
> can easily derive the directory for a guest from its pid.
> Last but not least it's kernel-created and thus independent from the
> userspace part being used.

Doesn't perf already have a dependency on naming conventions for finding
debug information?


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Peter Zijlstra
On Wed, 2010-03-24 at 16:01 +0100, Joerg Roedel wrote:

> What I meant was: perf-kernel puts the guest-name into every sample and
> perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
> symbols. I leave the question of how the guest-fs is exposed to the host
> out of this discussion. We should discuss this separately.

I'd much prefer a pid like suggested later, keeps the samples smaller.

But that said, we need guest kernel events like mmap and context
switches too, otherwise we simply can't make sense of guest userspace
addresses, we need to know the guest address space layout.

So aside from a filesystem content, we first need mmap and context
switch events to find the files we need to access.

And while I appreciate all the security talk, it's basically pointless
anyway, the host can access it anyway, everybody agrees on that, but
still you're arguing the case..


Re: [Qemu-devel] [PATCH] qemu: jaso-parser: Output the content of invalid keyword

2010-03-24 Thread Markus Armbruster
Amos Kong  writes:

> When some invalid word 'unknowcmd' is input through the QMP port, qemu outputs this
> error message:
> "parse error: invalid keyword `%s'"
> This patch makes qemu output the content of invalid keyword, like:
> "parse error: invalid keyword `unknowcmd'"
>
> Signed-off-by: Amos Kong 

Looks good to me.

Hint: it's best to put a version in the subject when you respin, like
[PATCH v2] ...


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 05:49:42PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:46 PM, Joerg Roedel wrote:
>> On Wed, Mar 24, 2010 at 05:12:55PM +0200, Avi Kivity wrote:
>>
>>> On 03/24/2010 05:01 PM, Joerg Roedel wrote:
>>>> $ cd /sys/kvm/guest0
>>>> $ ls -l
>>>> -r 1 root root 0 2009-08-17 12:05 name
>>>> dr-x-- 1 root root 0 2009-08-17 12:05 fs
>>>> $ cat name
>>>> guest0
>>>> $ # ...
>>>>
>>>> The fs/ directory is used as the mount point for the guest root fs.
>>>>
>>> The problem is /sys/kvm, not /sys/kvm/fs.
>>>
>> I am not tied to /sys/kvm. We could also use /proc//kvm/ for
>> example. This would keep anything in the process space (except for the
>> global list of VMs which we should have anyway).
>>
>
> How about ~/.qemu/guests/$pid?

That makes it hard for perf to find it and even harder to get a list of
all VMs. With /proc//kvm/guest we could symlink all guest
directories to /proc/kvm/ and perf reads the list from there. Also perf
can easily derive the directory for a guest from its pid.
Last but not least it's kernel-created and thus independent from the
userspace part being used.

Joerg



MSI-X not enabled for ixgbe device-passthrough

2010-03-24 Thread Hannes Reinecke
Hi all,

I'm trying to set up a system with device-passthrough for
an ixgbe NIC.
The device itself seems to work, but it isn't using MSI-X.
So some more advanced features like DCB offloading etc
won't work.

lspci output of the device:
07:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit Network Connection (rev 01)
        Subsystem: Intel Corporation Ethernet Server Adapter X520-2
        Flags: bus master, fast devsel, latency 0, IRQ 24
        Memory at f5c8 (64-bit, prefetchable) [size=512K]
        I/O ports at 5000 [size=32]
        Memory at f5c7 (64-bit, prefetchable) [size=16K]
        [virtual] Expansion ROM at e710 [disabled] [size=512K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+ Count=1/1 Enable-
        Capabilities: [70] MSI-X: Enable+ Mask- TabSize=64
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSVoil-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSVoil-
                UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSVoil-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140] Device Serial Number 40-9e-3c-ff-ff-21-1b-00
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 1
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration-, Interrupt Message Number: 000
                IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
                IOVSta: Migration-
                Initial VFs: 64, Total VFs: 64, Number of VFs: 64, Function Dependency Link: 00
                VF offset: 128, stride: 2, Device ID: 10ed
                Supported Page Size: 0553, System Page Size: 0001
                VF Migration: offset: , BIR: 1
        Kernel driver in use: ixgbe
        Kernel modules: ixgbe

please let me know if you need more information.
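
For anyone comparing this against the same capability as seen from inside the guest, the "Enable", "Mask" and "TabSize" values that lspci prints for MSI-X come from the 16-bit Message Control word of the capability. A minimal decoding sketch follows; the struct and function names are hypothetical, and the bit positions (bit 15 enable, bit 14 function mask, bits 10:0 table size minus one) are the ones defined for the MSI-X capability in the PCI specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of the MSI-X Message Control word. */
struct msix_ctrl {
    bool enabled;        /* bit 15: MSI-X Enable */
    bool masked;         /* bit 14: Function Mask */
    unsigned table_size; /* bits 10:0 hold the table size minus one */
};

struct msix_ctrl msix_decode_ctrl(uint16_t ctrl)
{
    struct msix_ctrl c;

    c.enabled = (ctrl & 0x8000) != 0;
    c.masked = (ctrl & 0x4000) != 0;
    c.table_size = (ctrl & 0x07ffu) + 1u;
    return c;
}
```

A control word of 0x803f decodes to the "Enable+ Mask- TabSize=64" state shown in the listing above.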

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 05:50 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 05:43:31PM +0200, Avi Kivity wrote:
>> On 03/24/2010 05:37 PM, Joerg Roedel wrote:
>>> Even better. So a guest which breaks out can't even access its own
>>> /sys/kvm/ directory. Perfect, it doesn't need that access anyway.
>>
>> But what security label does that directory have?  How can we make sure
>> that whoever needs access to those files, gets them?
>>
>> Automatically created objects don't work well with that model.  They're
>> simply missing information.
>
> If we go the /proc//kvm way then the directory should probably
> inherit the label from /proc//?

That's a security policy.  The security people like their policies
outside the kernel.

For example, they may want a label that allows a trace context to read
the data, and also qemu itself for introspection.

> Same could be applied to /sys/kvm/guest/ if we decide for it. The VM is
> still bound to a single process with a /proc/ after all.

Ditto.

--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 05:43:31PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:37 PM, Joerg Roedel wrote:
>> Even better. So a guest which breaks out can't even access its own
>> /sys/kvm/ directory. Perfect, it doesn't need that access anyway.
>
> But what security label does that directory have?  How can we make sure  
> that whoever needs access to those files, gets them?
>
> Automatically created objects don't work well with that model.  They're  
> simply missing information.

If we go the /proc//kvm way then the directory should probably
inherit the label from /proc//?
Same could be applied to /sys/kvm/guest/ if we decide for it. The VM is
still bound to a single process with a /proc/ after all.

Joerg



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 05:46 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 05:12:55PM +0200, Avi Kivity wrote:
>> On 03/24/2010 05:01 PM, Joerg Roedel wrote:
>>> $ cd /sys/kvm/guest0
>>> $ ls -l
>>> -r 1 root root 0 2009-08-17 12:05 name
>>> dr-x-- 1 root root 0 2009-08-17 12:05 fs
>>> $ cat name
>>> guest0
>>> $ # ...
>>>
>>> The fs/ directory is used as the mount point for the guest root fs.
>>
>> The problem is /sys/kvm, not /sys/kvm/fs.
>
> I am not tied to /sys/kvm. We could also use /proc//kvm/ for
> example. This would keep anything in the process space (except for the
> global list of VMs which we should have anyway).

How about ~/.qemu/guests/$pid?

--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 05:12:55PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:01 PM, Joerg Roedel wrote:
>> $ cd /sys/kvm/guest0
>> $ ls -l
>> -r 1 root root 0 2009-08-17 12:05 name
>> dr-x-- 1 root root 0 2009-08-17 12:05 fs
>> $ cat name
>> guest0
>> $ # ...
>>
>> The fs/ directory is used as the mount point for the guest root fs.
>
> The problem is /sys/kvm, not /sys/kvm/fs.

I am not tied to /sys/kvm. We could also use /proc//kvm/ for
example. This would keep anything in the process space (except for the
global list of VMs which we should have anyway).

>> What I meant was: perf-kernel puts the guest-name into every sample and
>> perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
>> symbols. I leave the question of how the guest-fs is exposed to the host
>> out of this discussion. We should discuss this separately.
>
> How I see it: perf-kernel puts the guest pid into every sample, and  
> perf-userspace uses that to resolve to a mountpoint served by fuse, or  
> to a unix domain socket that serves the files.

We need a bit more information than just the qemu-pid, but yes, this
would also work out.

>> If a vm breaks into qemu it can access the host file system which is the
>> bigger problem. In this case there is no isolation anymore. From that
>> context it can even kill other VMs of the same user independent of a
>> hypothetical /sys/kvm/.
>
> It cannot.  sVirt labels the disk image and other files qemu needs with  
> the appropriate label, and everything else is off limits.  Even if you  
> run the guest as root, it won't have access to other files.

See my reply to Daniel's email.

>> Yes, but its different from the implementation point-of-view. For the
>> user it surely all plays together.
>
> We need qemu to cooperate for mmio tracing, and we can cooperate with  
> qemu for symbol resolution.  If it prevents adding another kernel API,  
> that's a win from my POV.

That's true. Probably qemu can inject this information in the
kvm-trace-events stream.

Joerg



Re: KVM: x86: document KVM_REQ_PENDING_TIMER usage

2010-03-24 Thread Marcelo Tosatti
On Wed, Mar 24, 2010 at 09:10:54AM +0800, 王箫 wrote:
> Thanks for pointing that, but is it possible that explicitly check the
> pending timer with kvm_cpu_has_pending_timer() in vcpu_enter_guest()? There
> seems some function duplication between KVM_REQ_PENDING_TIMER and
> ktimer->pending.

Right. KVM_REQ_PENDING_TIMER is per vcpu, and it's one bit, while there
might be multiple ktimers per vcpu (it's a shortcut between hrtimers and
guest entry, bypassing irq injection).

Yes, there is some duplication.


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 05:37 PM, Joerg Roedel wrote:
>> No it can't. With sVirt every single VM has a custom security label and
>> the policy only allows it access to disks / files with a matching label,
>> and prevents it attacking any other VMs or processes on the host. This
>> confines the scope of any exploit in QEMU to those resources the admin
>> has explicitly assigned to the guest.
>
> Even better. So a guest which breaks out can't even access its own
> /sys/kvm/ directory. Perfect, it doesn't need that access anyway.

But what security label does that directory have?  How can we make sure
that whoever needs access to those files, gets them?

Automatically created objects don't work well with that model.  They're
simply missing information.


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 03:26:53PM +, Daniel P. Berrange wrote:
> On Wed, Mar 24, 2010 at 04:01:37PM +0100, Joerg Roedel wrote:
> > >> An approach like: "The files are owned and only readable by the same
> > >> user that started the vm." might be a good start. So a user can measure
> > >> its own guests and root can measure all of them.
> > >
> > > That's not how sVirt works.  sVirt isolates a user's VMs from each  
> > > other, so if a guest breaks into qemu it can't break into other guests  
> > > owned by the same user.
> > 
> > If a vm breaks into qemu it can access the host file system which is the
> > bigger problem. In this case there is no isolation anymore. From that
> > context it can even kill other VMs of the same user independent of a
> > hypothetical /sys/kvm/.
> 
> No it can't. With sVirt every single VM has a custom security label and
> the policy only allows it access to disks / files with a matching label,
> and prevents it attacking any other VMs or processes on the host. This
> confines the scope of any exploit in QEMU to those resources the admin
> has explicitly assigned to the guest.

Even better. So a guest which breaks out can't even access its own
/sys/kvm/ directory. Perfect, it doesn't need that access anyway.

Joerg



kvm@vger.kernel.org

2010-03-24 Thread Michael Tokarev
After a series of tries, I finally made my OEM copy
of Windows 7 to work in KVM using the original
registration key.

In short, one need SCIC table from the BIOS on the
original hardware, -- it should be in the BIOS in
the virtual machine too.  And second part is that
other tables in our virtual BIOS need to have the
same OEM identification as the SLIC table - namely
FACP (in case of kmv), and also XSDT and RSDT if
present.

The way to insert a custom acpi table is the -acpitable
parameter.  But unfortunately kvm does not provide a way
to insert a whole table this way, together with the header --
instead, it expects the header on the command line.  It
is possible to extract the header into printable form for
the command line, and cut it from the slic.bin, but I used
a different way: I modified hw/acpi.c to load the whole thing from
the given file.
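
For readers unfamiliar with the layout being discussed, here is a
minimal sketch of the standard 36-byte ACPI table header and its
checksum rule (all bytes of the table sum to 0 modulo 256).  It is
only an illustration, not the hw/acpi.c change described above; the
helper names and the synthetic table contents are made up.

```python
import struct
from typing import NamedTuple

# Standard 36-byte ACPI system description table header, little-endian.
ACPI_HEADER = struct.Struct("<4sIBB6s8sI4sI")
CHECKSUM_OFFSET = 9  # signature (4) + length (4) + revision (1)


class AcpiHeader(NamedTuple):
    signature: bytes
    length: int
    revision: int
    checksum: int
    oem_id: bytes
    oem_table_id: bytes
    oem_revision: int
    creator_id: bytes
    creator_revision: int


def parse_acpi_table(blob: bytes) -> AcpiHeader:
    """Parse the header of a complete ACPI table and validate it."""
    if len(blob) < ACPI_HEADER.size:
        raise ValueError("table shorter than the 36-byte ACPI header")
    hdr = AcpiHeader(*ACPI_HEADER.unpack_from(blob))
    if hdr.length != len(blob):
        raise ValueError("header length field does not match the data size")
    # Every byte of a valid table, checksum byte included, sums to 0 mod 256.
    if sum(blob) % 256 != 0:
        raise ValueError("bad ACPI checksum")
    return hdr


def build_demo_table(signature: bytes, oem_id: bytes,
                     oem_table_id: bytes, payload: bytes) -> bytes:
    """Build a synthetic table (made-up contents) with a correct checksum."""
    length = ACPI_HEADER.size + len(payload)
    raw = bytearray(ACPI_HEADER.pack(signature, length, 1, 0, oem_id,
                                     oem_table_id, 1, b"DEMO", 1) + payload)
    raw[CHECKSUM_OFFSET] = (-sum(raw)) % 256
    return bytes(raw)
```

With a real dump one would call parse_acpi_table(open("slic.bin",
"rb").read()) to confirm that the file carries its own header and a
valid checksum before handing it to the emulator.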

After doing that, and giving -acpitable file=slic.bin (with
file= parameter being my quick-n-dirty addition) to kvm,
I was able to see the correct SLIC table in /sys/firmware/
acpi/tables/SLIC in linux.  But windows refused to activate.

So the next step was to modify the OEM string that seabios
places in the other tables.  For that, in src/acpi.c I just added

  memcpy(h->oem_id, "_ASUS_Notebook", 14);

to build_header() routine (yes it is a notebook from Asus with
licensed version of windows7 professional).
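
A note on why a single 14-byte copy is enough: in the standard ACPI
header the 6-byte OEM ID is followed directly by the 8-byte OEM table
ID, so "_ASUS_" plus "Notebook" fills both fields.  The sketch below
shows the same patch applied to a table held in memory, assuming the
checksum has to be recomputed afterwards; the function name is
hypothetical and this is not the seabios code.

```python
HEADER_SIZE = 36
CHECKSUM_OFFSET = 9
OEM_ID_OFFSET = 10        # oem_id occupies bytes 10..15
OEM_TABLE_ID_OFFSET = 16  # oem_table_id occupies bytes 16..23


def set_oem_strings(table: bytes, oem: bytes) -> bytes:
    """Overwrite oem_id + oem_table_id and fix up the checksum."""
    if len(oem) != 14:
        raise ValueError("expected 6 bytes of OEM ID plus 8 bytes of table ID")
    if len(table) < HEADER_SIZE:
        raise ValueError("table shorter than the ACPI header")
    raw = bytearray(table)
    raw[OEM_ID_OFFSET:OEM_ID_OFFSET + 14] = oem
    # Zero the old checksum, then choose the byte that makes the sum 0 mod 256.
    raw[CHECKSUM_OFFSET] = 0
    raw[CHECKSUM_OFFSET] = (-sum(raw)) % 256
    return bytes(raw)
```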

And after that step windows happily told me that I'm now using
a genuine copy of it and the activation is completed.

As far as I can see I have the right to run my licensed copy this
way, on the same notebook it was purchased with.

So.. the real question is: while this quick-n-dirty proof of
concept works, it's not the way to go.  What can be done to
simplify the whole thing and to do it the Right Way?

At least having a way to accept complete acpi table (with header
and checksum and everything) is - IMHO - a good thing.

But I'm not sure about the OEM ID strings in other tables in seabios --
it is quite ugly, both in implementation (how to tell the bios to change
its identity?) and in the whole fact of it, since we're lying to the
(virtual) machine.  But from another point of view, it should be a
good debugging tool, since some software behaves differently given
one or another string in the BIOS...

Comments, anyone?

Thanks!

/mjt


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Daniel P. Berrange
On Wed, Mar 24, 2010 at 04:01:37PM +0100, Joerg Roedel wrote:
> >> An approach like: "The files are owned and only readable by the same
> >> user that started the vm." might be a good start. So a user can measure
> >> its own guests and root can measure all of them.
> >
> > That's not how sVirt works.  sVirt isolates a user's VMs from each  
> > other, so if a guest breaks into qemu it can't break into other guests  
> > owned by the same user.
> 
> If a vm breaks into qemu it can access the host file system which is the
> bigger problem. In this case there is no isolation anymore. From that
> context it can even kill other VMs of the same user independent of a
> hypothetical /sys/kvm/.

No it can't. With sVirt every single VM has a custom security label and
the policy only allows it access to disks / files with a matching label,
and prevents it from attacking any other VMs or processes on the host. This
confines the scope of any exploit in QEMU to those resources the admin
has explicitly assigned to the guest.
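
The label-matching rule described above can be pictured with a toy
model.  This is only a sketch of the idea (a process may touch a
resource only if the labels match); it is not SELinux's actual policy
evaluation, and the category names below are invented.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Label:
    """Toy security label: just a set of category names."""
    categories: FrozenSet[str]


def may_access(process: Label, resource: Label) -> bool:
    """Allow access only when the resource carries the process's exact label."""
    return process.categories == resource.categories


# Hypothetical labels for two guests owned by the same user.
guest_a = Label(frozenset({"c10", "c20"}))
guest_b = Label(frozenset({"c30", "c40"}))
disk_a = Label(frozenset({"c10", "c20"}))
disk_b = Label(frozenset({"c30", "c40"}))
host_file = Label(frozenset())  # a host file never labelled for any guest
```

In this model an exploited qemu for guest A reaches disk_a only; disk_b
and host_file are refused even though the same user owns both guests.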

Regards,
Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


Re: KVM Test report, kernel 647e9e... qemu 7811d4...

2010-03-24 Thread Avi Kivity

On 03/08/2010 08:40 AM, Hao, Xudong wrote:
> Hi, all,
> This is KVM biweekly test result against kvm.git:
> 647e9ec3b543ea04d49a7323dfe0070682ed8465 and qemu-kvm.git:
> 7811d4e8ec057d25db68f900be1f09a142faca49.
>
> In the last month, KVM testing was blocked by one qemu-img issue and two qemu
> build issues. Now the qemu build issues and the qemu-img bug are all fixed.
>
> 2. ltp diotest running time is 2.54 times what it was before
> https://sourceforge.net/tracker/?func=detail&aid=2723366&group_id=180599&atid=893831

Can you check the performance of this with cache=writeback?

The comment on the report referring to cache=writethrough is incorrect (I
think?)


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 05:01 PM, Joerg Roedel wrote:
>> But when I weigh the benefit of truly transparent system-wide perf
>> integration for users who don't use libvirt but do use perf, versus
>> the cost of transforming kvm from a single-process API to a
>> system-wide API with all the complications that I've listed, it comes
>> out in favour of not adding the API.
>
> It's not a transformation, it's an extension. The current per-process
> /dev/kvm stays mostly untouched. It's all about having something like
> this:
>
> $ cd /sys/kvm/guest0
> $ ls -l
> -r-------- 1 root root 0 2009-08-17 12:05 name
> dr-x------ 1 root root 0 2009-08-17 12:05 fs
> $ cat name
> guest0
> $ # ...
>
> The fs/ directory is used as the mount point for the guest root fs.

The problem is /sys/kvm, not /sys/kvm/fs.

>>> The samples will be tagged with the guest-name (and some additional
>>> information perf needs). Perf userspace can access the symbols then
>>> through /sys/kvm/guest0/fs/...
>>
>> I take that as a yes?  So we need a virtio-serial client in the kernel
>> (which might be exploitable by a malicious guest if buggy) and a
>> fs-over-virtio-serial client in the kernel (also exploitable).
>
> What I meant was: perf-kernel puts the guest-name into every sample and
> perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
> symbols. I leave the question of how the guest-fs is exposed to the host
> out of this discussion. We should discuss this separately.

How I see it: perf-kernel puts the guest pid into every sample, and
perf-userspace uses that to resolve to a mountpoint served by fuse, or
to a unix domain socket that serves the files.
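
As a rough illustration of the pid-based lookup described here, the
sketch below keeps a userspace table from qemu pid to a mountpoint and
resolves guest paths through it.  The class name, the registration step
and the example mountpoint are all assumptions made for the example;
nothing like this exists in perf.

```python
import posixpath
from typing import Dict, Optional


class GuestFileResolver:
    """Userspace table mapping a qemu pid to where its guest files appear."""

    def __init__(self) -> None:
        self._mounts: Dict[int, str] = {}

    def register(self, qemu_pid: int, mountpoint: str) -> None:
        """Record the (e.g. fuse) mountpoint that serves this guest's files."""
        self._mounts[qemu_pid] = mountpoint

    def unregister(self, qemu_pid: int) -> None:
        """Forget a guest, for instance when its qemu process exits."""
        self._mounts.pop(qemu_pid, None)

    def resolve(self, qemu_pid: int, guest_path: str) -> Optional[str]:
        """Translate a path inside the guest to a host path, if known."""
        mountpoint = self._mounts.get(qemu_pid)
        if mountpoint is None:
            return None
        return posixpath.join(mountpoint, guest_path.lstrip("/"))
```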

>>> An approach like: "The files are owned and only readable by the same
>>> user that started the vm." might be a good start. So a user can measure
>>> its own guests and root can measure all of them.
>>
>> That's not how sVirt works.  sVirt isolates a user's VMs from each
>> other, so if a guest breaks into qemu it can't break into other guests
>> owned by the same user.
>
> If a vm breaks into qemu it can access the host file system which is the
> bigger problem. In this case there is no isolation anymore. From that
> context it can even kill other VMs of the same user independent of a
> hypothetical /sys/kvm/.

It cannot.  sVirt labels the disk image and other files qemu needs with
the appropriate label, and everything else is off limits.  Even if you
run the guest as root, it won't have access to other files.

>>> Yeah that would be interesting information. But it is more related to
>>> tracing than to pmu measurements.  The information which you
>>> mentioned above are probably better captured by an extension of
>>> trace-events to userspace.
>>
>> It's all related.  You start with perf, see a problem with mmio, call up
>> a histogram of mmio or interrupts or whatever, then zoom in on the
>> misbehaving device.
>
> Yes, but its different from the implementation point-of-view. For the
> user it surely all plays together.

We need qemu to cooperate for mmio tracing, and we can cooperate with
qemu for symbol resolution.  If it prevents adding another kernel API,
that's a win from my POV.


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 04:24 PM, Alexander Graf wrote:
> Avi Kivity wrote:
>> On 03/24/2010 03:53 PM, Alexander Graf wrote:
>>>> Someone needs to know about the new guest to fetch its symbols.  Or do
>>>> you want that part in the kernel too?
>>>
>>> How about we add a virtio "guest file system access" device? The guest
>>> would then expose its own file system using that device.
>>>
>>> On the host side this would simply be a -virtioguestfs
>>> unix:/tmp/guest.fs and you'd get a unix socket that gives you full
>>> access to the guest file system by using commands. I envision something
>>> like:
>>
>> The idea is to use a dedicated channel over virtio-serial.  If the
>> channel is present the file server can serve files over it.
>
> The file server being a kernel module inside the guest? We want to be
> able to serve things as early and hassle free as possible, so in this
> case I agree with Ingo that a kernel module is superior.

No, just a daemon.  If it's important enough we can get distributions to
package it by default, and then it will be hassle free.  If "early
enough" is also so important, we can get it to start up on initrd.  If
it's really critical, we can patch grub to serve the files as well.

>>> SEND: GET /proc/version
>>> RECV: Linux version 2.6.27.37-0.1-default (ge...@buildhost) (gcc version
>>> 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15
>>> 14:56:58 +0200
>>>
>>> Now all we need is integration in perf to enumerate virtual machines
>>> based on libvirt. If you want to run qemu-kvm directly, just go with
>>> --guestfs=/tmp/guest.fs and perf could fetch all required information
>>> automatically.
>>>
>>> This should solve all issues while staying 100% in user space, right?
>>
>> Yeah, needs a fuse filesystem to populate the host namespace (kind of
>> sshfs over virtio-serial).
>
> I don't see why we need a fuse filesystem. We can of course create one
> later on. But for now all you need is a user connecting to that socket.

If the perf app knows the protocol, no problem.  But leave perf with
pure filesystem access and hide the details in fuse.


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 03:57:39PM +0200, Avi Kivity wrote:
> On 03/24/2010 03:46 PM, Joerg Roedel wrote:

>> Someone who uses libvirt and virt-manager by default is probably not
>> interested in this feature at the same level a kvm developer is. And
>> developers tend not to use libvirt for low-level kvm development.  A
>> number of developers have stated in this thread already that they would
>> appreciate a solution for guest enumeration that would not involve
>> libvirt.
>
> So would I.

Great.

> But when I weigh the benefit of truly transparent  system-wide perf
> integration for users who don't use libvirt but do use  perf, versus
> the cost of transforming kvm from a single-process API to a
> system-wide API with all the complications that I've listed, it comes
> out in favour of not adding the API.

It's not a transformation, it's an extension. The current per-process
/dev/kvm stays mostly untouched. It's all about having something like
this:

$ cd /sys/kvm/guest0
$ ls -l
-r-------- 1 root root 0 2009-08-17 12:05 name
dr-x------ 1 root root 0 2009-08-17 12:05 fs
$ cat name
guest0
$ # ...

The fs/ directory is used as the mount point for the guest root fs.

>> The samples will be tagged with the guest-name (and some additional
>> information perf needs). Perf userspace can access the symbols then
>> through /sys/kvm/guest0/fs/...
>
> I take that as a yes?  So we need a virtio-serial client in the kernel  
> (which might be exploitable by a malicious guest if buggy) and a  
> fs-over-virtio-serial client in the kernel (also exploitable).

What I meant was: perf-kernel puts the guest-name into every sample and
perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
symbols. I leave the question of how the guest-fs is exposed to the host
out of this discussion. We should discuss this separately.
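
To make the lookup concrete, here is a small sketch of how perf
userspace might turn a guest name from a sample into a host-side path
under the proposed layout.  /sys/kvm does not exist; the root path and
the function name are taken from or invented for this proposal only.

```python
from pathlib import PurePosixPath

# Hypothetical sysfs root from the proposal; not an existing kernel interface.
SYSFS_KVM_ROOT = PurePosixPath("/sys/kvm")


def guest_file_path(guest_name: str, guest_path: str) -> PurePosixPath:
    """Map (guest name, absolute path inside the guest) to a host path."""
    if not guest_name or "/" in guest_name or guest_name in (".", ".."):
        raise ValueError("invalid guest name")
    rel = PurePosixPath(guest_path)
    if not rel.is_absolute():
        raise ValueError("guest path must be absolute")
    if ".." in rel.parts:
        raise ValueError("guest path must not escape the guest root")
    return SYSFS_KVM_ROOT / guest_name / "fs" / rel.relative_to("/")
```

For example, guest_file_path("guest0", "/boot/vmlinux") yields
/sys/kvm/guest0/fs/boot/vmlinux, which is where the symbols would be
read from.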


>> An approach like: "The files are owned and only readable by the same
>> user that started the vm." might be a good start. So a user can measure
>> its own guests and root can measure all of them.
>
> That's not how sVirt works.  sVirt isolates a user's VMs from each  
> other, so if a guest breaks into qemu it can't break into other guests  
> owned by the same user.

If a vm breaks into qemu it can access the host file system which is the
bigger problem. In this case there is no isolation anymore. From that
context it can even kill other VMs of the same user independent of a
hypothetical /sys/kvm/.

>> Yeah that would be interesting information. But it is more related to
>> tracing than to pmu measurements.  The information which you
>> mentioned above are probably better captured by an extension of
>> trace-events to userspace.
>
> It's all related.  You start with perf, see a problem with mmio, call up  
> a histogram of mmio or interrupts or whatever, then zoom in on the  
> misbehaving device.

Yes, but it's different from the implementation point of view. For the
user it surely all plays together.

Joerg



Re: [PATCH v2] KVM test: Enable timedrift for Linux guests

2010-03-24 Thread Lucas Meneghel Rodrigues
On Wed, Mar 24, 2010 at 3:25 AM, Jason Wang  wrote:
> We should also test timedrift for Linux guests, especially for guests
> with pvclock. So this patch enables timedrift for Linux guests.
>
> Changes from v1:
> - Correct the wrong name for guest load cleaning
> - Use -no-kvm-pit-reinjection for linux guests and -rtc-td-hack for
> windows guests.

Here I have a little doubt if the test is useful only while running
under these command line options (since we indeed have timedrift
failures without them). Maybe it makes more sense to create variants
with these options, to ensure that this command line will also be
tested. Michael?

> Signed-off-by: Jason Wang 
> ---
>  client/tests/kvm/tests_base.cfg.sample |   14 ++++++++++++--
>  1 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
> index 8cc83a9..29a2430 100644
> --- a/client/tests/kvm/tests_base.cfg.sample
> +++ b/client/tests/kvm/tests_base.cfg.sample
> @@ -147,7 +147,6 @@ variants:
>         type = linux_s3
>
>     - timedrift:    install setup unattended_install
> -        extra_params += " -rtc-td-hack"
>         variants:
>             - with_load:
>                 type = timedrift
> @@ -330,7 +329,7 @@ variants:
>  variants:
>     # Linux section
>     - @Linux:
> -        no timedrift autoit
> +        no autoit
>         shutdown_command = shutdown -h now
>         reboot_command = shutdown -r now
>         status_test_command = echo $?
> @@ -342,6 +341,16 @@ variants:
>         file_transfer_port = 22
>         mem_chk_cmd = dmidecode -t 17 | awk -F: '/Size/ {print $2}'
>         cpu_chk_cmd = grep -c processor /proc/cpuinfo
> +        timedrift:
> +            extra_params += " -no-kvm-pit-reinjection"
> +            time_command = date +'TIME: %a %m/%d/%Y %H:%M:%S.%N'
> +            time_filter_re = "(?:TIME: \w\w\w )(.{19})(?:\.\d\d)"
> +            time_format = "%m/%d/%Y %H:%M:%S"
> +            guest_load_command = "dd if=/dev/urandom of=/dev/null"
> +            guest_load_instances = 2
> +            guest_load_stop_command = "killall -9 dd"
> +            host_load_command = "bzip2 -c --best /dev/urandom > /dev/null"
> +            host_load_instances = 8
>
>         variants:
>             - Fedora:
> @@ -717,6 +726,7 @@ variants:
>         stress_boot:
>             alive_test_cmd = systeminfo
>         timedrift:
> +            extra_params += " -rtc-td-hack"
>             time_command = "echo TIME: %date% %time%"
>             time_filter_re = "(?<=TIME: \w\w\w ).{19}(?=\.\d\d)"
>             time_format = "%m/%d/%Y %H:%M:%S"
>
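
A quick way to sanity-check the time_filter_re / time_format pairs in
the patch above is to run them over sample command output.  The sketch
assumes the harness applies re.findall() and takes the first result
(which would explain why the Linux pattern uses a capture group and the
Windows one uses lookarounds); the sample output strings are made up,
and the real Windows output depends on locale.

```python
import re
import time

LINUX_FILTER = r"(?:TIME: \w\w\w )(.{19})(?:\.\d\d)"
WINDOWS_FILTER = r"(?<=TIME: \w\w\w ).{19}(?=\.\d\d)"
TIME_FORMAT = "%m/%d/%Y %H:%M:%S"


def parse_guest_time(output: str, time_filter_re: str,
                     time_format: str = TIME_FORMAT) -> time.struct_time:
    """Extract the timestamp from command output and parse it."""
    matches = re.findall(time_filter_re, output)
    if not matches:
        raise ValueError("time_filter_re did not match the command output")
    # With one capture group findall() returns the group; with none it
    # returns the whole match.  Both patterns therefore yield 19 characters.
    return time.strptime(matches[0], time_format)


# Made-up samples of what the two time_command settings might print.
linux_sample = "TIME: Wed 03/24/2010 15:26:53.123456789"
windows_sample = "TIME: Wed 03/24/2010 15:26:53.12"
```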



-- 
Lucas


Re: [Autotest] [PATCH 2/4] KVM test: Add TSC into guest test

2010-03-24 Thread Lucas Meneghel Rodrigues
On Wed, Mar 24, 2010 at 2:59 AM, Jason Wang  wrote:
> Lucas Meneghel Rodrigues wrote:
>> On Mon, Mar 22, 2010 at 4:45 AM, Jason Wang  wrote:
>>
>>> The tsc test checks whether the TSCs of the processors are
>>> synchronized, which is useful for testing virtual TSC.
>>>
>>
>> The only thing that needs to be corrected here is that this test needs
>> -smp > 1 to work (actually, even numbers work better), so I'll make
>> it not available with -smp 1. Thanks for the patch, I am going to put
>> it upstream soon!
>>
>>
> I agree, and maybe we'd better also use smp > 1 in the test of
> monotonic_time ?

Monotonic time does not have this restriction, as far as I know... Unless I
am very mistaken.

>>> Signed-off-by: Jason Wang 
>>> ---
>>>  client/tests/kvm/autotest_control/tsc.control |   13 +++++++++++++
>>>  client/tests/kvm/tests_base.cfg.sample        |    3 +++
>>>  2 files changed, 16 insertions(+), 0 deletions(-)
>>>  create mode 100644 client/tests/kvm/autotest_control/tsc.control
>>>
>>> diff --git a/client/tests/kvm/autotest_control/tsc.control b/client/tests/kvm/autotest_control/tsc.control
>>> new file mode 100644
>>> index 0000000..0c1c65a
>>> --- /dev/null
>>> +++ b/client/tests/kvm/autotest_control/tsc.control
>>> @@ -0,0 +1,13 @@
>>> +NAME = 'Check TSC'
>>> +AUTHOR = 'Michael Davidson '
>>> +TIME = 'MEDIUM'
>>> +TEST_CLASS = 'Kernel'
>>> +TEST_CATEGORY = 'Functional'
>>> +TEST_TYPE = 'client'
>>> +DOC = """
>>> +checktsc is a user space program that checks TSC synchronization
>>> +between pairs of CPUs on an SMP system using a technique borrowed
>>> +from the Linux 2.6.18 kernel.
>>> +"""
>>> +
>>> +job.run_test('tsc')
>>> diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
>>> index 2af6a05..861759e 100644
>>> --- a/client/tests/kvm/tests_base.cfg.sample
>>> +++ b/client/tests/kvm/tests_base.cfg.sample
>>> @@ -136,6 +136,9 @@ variants:
>>>             - monotonic_time:
>>>                 test_name = monotonic_time
>>>                 test_control_file = monotonic_time.control
>>> +            - tsc:
>>> +                test_name = tsc
>>> +                test_control_file = tsc.control
>>>
>>>     - linux_s3:     install setup unattended_install
>>>         type = linux_s3
>>>
>>> ___
>>> Autotest mailing list
>>> autot...@test.kernel.org
>>> http://test.kernel.org/cgi-bin/mailman/listinfo/autotest
>>>
>>>
>>
>>
>>
>>



-- 
Lucas


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Alexander Graf
Avi Kivity wrote:
> On 03/24/2010 03:53 PM, Alexander Graf wrote:
>>
>>> Someone needs to know about the new guest to fetch its symbols.  Or do
>>> you want that part in the kernel too?
>>>  
>>
>> How about we add a virtio "guest file system access" device? The guest
>> would then expose its own file system using that device.
>>
>> On the host side this would simply be a -virtioguestfs
>> unix:/tmp/guest.fs and you'd get a unix socket that gives you full
>> access to the guest file system by using commands. I envision something
>> like:
>>
>
> The idea is to use a dedicated channel over virtio-serial.  If the
> channel is present the file server can serve files over it.

The file server being a kernel module inside the guest? We want to be
able to serve things as early and hassle free as possible, so in this
case I agree with Ingo that a kernel module is superior.

>
>> SEND: GET /proc/version
>> RECV: Linux version 2.6.27.37-0.1-default (ge...@buildhost) (gcc version
>> 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15
>> 14:56:58 +0200
>>
>> Now all we need is integration in perf to enumerate virtual machines
>> based on libvirt. If you want to run qemu-kvm directly, just go with
>> --guestfs=/tmp/guest.fs and perf could fetch all required information
>> automatically.
>>
>> This should solve all issues while staying 100% in user space, right?
>>
>
> Yeah, needs a fuse filesystem to populate the host namespace (kind of
> sshfs over virtio-serial).

I don't see why we need a fuse filesystem. We can of course create one
later on. But for now all you need is a user connecting to that socket.


Alex




Re: [KVM PATCH] pci passthrough: zap option rom scanning.

2010-03-24 Thread Alexander Graf
Marcelo Tosatti wrote:
> On Wed, Jan 20, 2010 at 11:58:48AM +0100, Gerd Hoffmann wrote:
>   
>> Nowadays (qemu 0.12) seabios loads option roms from pci rom bars.  So
>> there is no need any more to scan for option roms and have qemu load
>> them.  Zap the code.
>>
>> Signed-off-by: Gerd Hoffmann 
>> 
>
> Applied, thanks.
>   

Without this patch (0.12.3) I get the following error message when
trying to pass 2 functions of an ixgbe adapter to the guest:

falla:/abuild/agraf/qemu-kvm/:[0]# qemu-kvm -pcidevice host=07:00.0
-pcidevice host=07:00.1 -nographic -append console=ttyS0 -kernel
/boot/vmlinuz.x -initrd /boot/initrd.x
device: 07:00.0: driver="pci-assign" host="07:00.0"
device: 07:00.1: driver="pci-assign" host="07:00.1"
rom: requested regions overlap (rom 07:00.1. free=0xac00,
addr=0x)
rom loading failed


The same code works with qemu-kvm.git. Cherry picking this commit
(51c0dad5ce383be94ca7c46e491ada17cc9ec416) also makes it work in
0.12-stable.


Thus I'd suggest we also take this patch into 0.12-stable.


Alex




Re: [PATCH] qemu-kvm: json-parser: Output the content of invalid keyword

2010-03-24 Thread Luiz Capitulino
On Wed, 24 Mar 2010 11:29:40 +0100
Aurelien Jarno  wrote:

> Hi,
> 
> On Wed, Mar 24, 2010 at 06:00:53PM +0800, Amos Kong wrote:
> > When some invalid words are entered on the QMP port, qemu outputs this error message:
> > "parse error: invalid keyword `%s'"
> > This patch makes qemu output the content.
> 
> Is this patch for QEMU or KVM? If it is for QEMU, you should put the
> QEMU mailing list in Cc:. If it is for KVM, I don't have commit access
> there.

 It's for QEMU and looks good to me.

 Amos, can you resend there please? Just to take the right route..

> 
> > Signed-off-by: Amos Kong 
> > ---
> >  json-parser.c |    7 ++++++-
> >  1 files changed, 6 insertions(+), 1 deletions(-)
> > 
> > diff --git a/json-parser.c b/json-parser.c
> > index 579928f..98a82af 100644
> > --- a/json-parser.c
> > +++ b/json-parser.c
> > @@ -12,6 +12,7 @@
> >   */
> >  
> >  #include 
> > +#include 
> >  
> >  #include "qemu-common.h"
> >  #include "qstring.h"
> > @@ -93,7 +94,11 @@ static int token_is_escape(QObject *obj, const char *value)
> >   */
> >  static void parse_error(JSONParserContext *ctxt, QObject *token, const char *msg, ...)
> >  {
> > -fprintf(stderr, "parse error: %s\n", msg);
> > +va_list ap;
> > +va_start(ap, msg);
> > +fprintf(stderr, "parse error:");
> > +vfprintf(stderr, msg, ap);
> > +fprintf(stderr, "\n");
> >  }
> >  
> >  /**
> > -- 
> > 1.6.3.3
> > 
> > 
> 



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 03:53 PM, Alexander Graf wrote:
>> Someone needs to know about the new guest to fetch its symbols.  Or do
>> you want that part in the kernel too?
>
> How about we add a virtio "guest file system access" device? The guest
> would then expose its own file system using that device.
>
> On the host side this would simply be a -virtioguestfs
> unix:/tmp/guest.fs and you'd get a unix socket that gives you full
> access to the guest file system by using commands. I envision something
> like:

The idea is to use a dedicated channel over virtio-serial.  If the
channel is present the file server can serve files over it.

> SEND: GET /proc/version
> RECV: Linux version 2.6.27.37-0.1-default (ge...@buildhost) (gcc version
> 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15
> 14:56:58 +0200
>
> Now all we need is integration in perf to enumerate virtual machines
> based on libvirt. If you want to run qemu-kvm directly, just go with
> --guestfs=/tmp/guest.fs and perf could fetch all required information
> automatically.
>
> This should solve all issues while staying 100% in user space, right?

Yeah, needs a fuse filesystem to populate the host namespace (kind of
sshfs over virtio-serial).


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 03:46 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 03:05:02PM +0200, Avi Kivity wrote:
>> On 03/24/2010 02:50 PM, Joerg Roedel wrote:
>>> I don't want the tool for myself only. A typical perf user expects that
>>> it works transparently.
>>
>> A typical kvm user uses libvirt, so we can integrate it with that.
>
> Someone who uses libvirt and virt-manager by default is probably not
> interested in this feature at the same level a kvm developer is. And
> developers tend not to use libvirt for low-level kvm development.  A
> number of developers have stated in this thread already that they would
> appreciate a solution for guest enumeration that would not involve
> libvirt.

So would I.  But when I weigh the benefit of truly transparent
system-wide perf integration for users who don't use libvirt but do use
perf, versus the cost of transforming kvm from a single-process API to a
system-wide API with all the complications that I've listed, it comes
out in favour of not adding the API.

Those few users can probably script something to cover their needs.

>> Someone needs to know about the new guest to fetch its symbols.  Or do
>> you want that part in the kernel too?
>
> The samples will be tagged with the guest-name (and some additional
> information perf needs). Perf userspace can access the symbols then
> through /sys/kvm/guest0/fs/...

I take that as a yes?  So we need a virtio-serial client in the kernel
(which might be exploitable by a malicious guest if buggy) and a
fs-over-virtio-serial client in the kernel (also exploitable).

>>> Depends on how it is designed. A filesystem approach was already
>>> mentioned. We could create /sys/kvm/ for example to expose information
>>> about virtual machines to userspace. This would not require any new
>>> security hooks.
>>
>> Who would set the security context on those files?
>
> An approach like: "The files are owned and only readable by the same
> user that started the vm." might be a good start. So a user can measure
> its own guests and root can measure all of them.

That's not how sVirt works.  sVirt isolates a user's VMs from each
other, so if a guest breaks into qemu it can't break into other guests
owned by the same user.

The users who need this API (!libvirt and perf) probably don't care
about sVirt, but a new API must not break it.

>> Plus, we need cgroup support so you can't see one container's guests
>> from an unrelated container.
>
> cgroup support is an issue but we can solve that too. It's in general
> still less complex than going through the whole libvirt-qemu-kvm stack.

It's a tradeoff.  IMO, going through qemu is the better way, and also
provides more information.

>> Integration with qemu would allow perf to tell us that the guest is
>> hitting the interrupt status register of a virtio-blk device in pci
>> slot 5 (the information is already available through the kvm_mmio
>> trace event, but only qemu can decode it).
>
> Yeah that would be interesting information. But it is more related to
> tracing than to pmu measurements.
> The information which you mentioned above is probably better
> captured by an extension of trace-events to userspace.

It's all related.  You start with perf, see a problem with mmio, call up
a histogram of mmio or interrupts or whatever, then zoom in on the
misbehaving device.
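
The "histogram of mmio" step can be sketched in a few lines of
post-processing over trace output.  The line format assumed below
("kvm_mmio: mmio <op> len N gpa 0x... val 0x...") is an assumption
about how the kvm_mmio trace event is printed and may differ between
kernel versions; the sample lines in the usage note are invented.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed textual form of a kvm_mmio trace line; adjust to the real output.
MMIO_RE = re.compile(
    r"kvm_mmio: mmio (read|write) len (\d+) gpa (0x[0-9a-fA-F]+) val (0x[0-9a-fA-F]+)"
)


def mmio_histogram(lines: Iterable[str]) -> Counter:
    """Count mmio accesses per (operation, guest physical address)."""
    hist: Counter = Counter()
    for line in lines:
        m = MMIO_RE.search(line)
        if m:
            key: Tuple[str, int] = (m.group(1), int(m.group(3), 16))
            hist[key] += 1
    return hist
```

Feeding it a captured trace and calling hist.most_common(5) would show
the hottest guest physical addresses; mapping an address back to "the
interrupt status register of a virtio-blk device" is the part that
needs qemu's knowledge of the device layout.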


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Alexander Graf
Avi Kivity wrote:
> On 03/24/2010 02:50 PM, Joerg Roedel wrote:
>>
>>> You can always provide the kernel and module paths as command line
>>> parameters.  It just won't be transparently usable, but if you're using
>>> qemu from the command line, presumably you can live with that.
>>>  
>> I don't want the tool for myself only. A typical perf user expects that
>> it works transparently.
>>
>
> A typical kvm user uses libvirt, so we can integrate it with that.
>
>>>> Could be easily done using notifier chains already in the kernel.
>>>> Probably implemented with much less than 100 lines of additional code.

>>> And a userspace interface for that.
>>>  
>> Not necessarily. The perf event is configured to measure systemwide kvm
>> by userspace. The kernel side of perf takes care that it stays
>> system-wide even with added vm instances. So in this case the consumer
>> for the notifier would be the perf kernel part. No userspace interface
>> required.
>>
>
> Someone needs to know about the new guest to fetch its symbols.  Or do
> you want that part in the kernel too?
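
For readers who have not met the pattern, the notifier-chain idea
quoted above can be modelled in userspace terms as a list of callbacks
invoked on each VM lifecycle event.  This is only an analogy for the
in-kernel mechanism; the event names and the SystemWideCounter class
are invented for the illustration.

```python
from typing import Callable, List, Set

VM_CREATED = "vm_created"
VM_DESTROYED = "vm_destroyed"

Callback = Callable[[str, str], None]


class NotifierChain:
    """Minimal notifier chain: registered callbacks see every event."""

    def __init__(self) -> None:
        self._callbacks: List[Callback] = []

    def register(self, callback: Callback) -> None:
        self._callbacks.append(callback)

    def unregister(self, callback: Callback) -> None:
        self._callbacks.remove(callback)

    def call(self, event: str, vm_id: str) -> None:
        # Iterate over a copy so callbacks may unregister themselves.
        for callback in list(self._callbacks):
            callback(event, vm_id)


class SystemWideCounter:
    """Stand-in for the perf side: tracks which VMs are being measured."""

    def __init__(self, chain: NotifierChain) -> None:
        self.measured: Set[str] = set()
        chain.register(self.on_event)

    def on_event(self, event: str, vm_id: str) -> None:
        if event == VM_CREATED:
            self.measured.add(vm_id)
        elif event == VM_DESTROYED:
            self.measured.discard(vm_id)
```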


How about we add a virtio "guest file system access" device? The guest
would then expose its own file system using that device.

On the host side this would simply be a -virtioguestfs
unix:/tmp/guest.fs and you'd get a unix socket that gives you full
access to the guest file system by using commands. I envision something
like:

SEND: GET /proc/version
RECV: Linux version 2.6.27.37-0.1-default (ge...@buildhost) (gcc version
4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15
14:56:58 +0200
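
A minimal sketch of such an exchange, assuming a line-based "GET
<path>" request and a reply that ends when the server closes the
connection.  Neither the framing nor the function names come from an
existing qemu interface; a socketpair and an in-memory file table stand
in for the unix socket and the guest file system.

```python
import socket
import threading
from typing import Dict


def read_request_line(conn: socket.socket) -> str:
    """Read bytes up to and including the first newline."""
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = conn.recv(1)
        if not chunk:
            break
        buf += chunk
    return buf.decode().strip()


def serve_one_request(conn: socket.socket, files: Dict[str, bytes]) -> None:
    """Guest side: answer a single GET from an in-memory file table."""
    with conn:
        verb, _, path = read_request_line(conn).partition(" ")
        if verb == "GET" and path in files:
            conn.sendall(files[path])
    # Leaving the 'with' block closes the socket, which ends the reply.


def fetch(conn: socket.socket, path: str) -> bytes:
    """Host side: send the request and read until the peer closes."""
    with conn:
        conn.sendall(f"GET {path}\n".encode())
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def demo(path: str = "/proc/version") -> bytes:
    """Run one request over a socketpair standing in for the unix socket."""
    host_end, guest_end = socket.socketpair()
    files = {"/proc/version": b"Linux version X.Y.Z (made-up sample)\n"}
    worker = threading.Thread(target=serve_one_request, args=(guest_end, files))
    worker.start()
    reply = fetch(host_end, path)
    worker.join()
    return reply
```

An unknown path simply yields an empty reply here; a real protocol
would need explicit error and length framing.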

Now all we need is integration in perf to enumerate virtual machines
based on libvirt. If you want to run qemu-kvm directly, just go with
--guestfs=/tmp/guest.fs and perf could fetch all required information
automatically.

This should solve all issues while staying 100% in user space, right?


Alex



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 03:05:02PM +0200, Avi Kivity wrote:
> On 03/24/2010 02:50 PM, Joerg Roedel wrote:

>> I don't want the tool for myself only. A typical perf user expects that
>> it works transparently.
>
> A typical kvm user uses libvirt, so we can integrate it with that.

Someone who uses libvirt and virt-manager by default is probably not
interested in this feature at the same level a kvm developer is. And
developers tend not to use libvirt for low-level kvm development.  A
number of developers have stated in this thread already that they would
appreciate a solution for guest enumeration that would not involve
libvirt.

> Someone needs to know about the new guest to fetch its symbols.  Or do  
> you want that part in the kernel too?

The samples will be tagged with the guest-name (and some additional
information perf needs). Perf userspace can access the symbols then
through /sys/kvm/guest0/fs/...

>> Depends on how it is designed. A filesystem approach was already
>> mentioned. We could create /sys/kvm/ for example to expose information
>> about virtual machines to userspace. This would not require any new
>> security hooks.
>
> Who would set the security context on those files?

An approach like: "The files are owned and only readable by the same
user that started the vm." might be a good start. So a user can measure
its own guests and root can measure all of them.

> Plus, we need cgroup  support so you can't see one container's guests
> from an unrelated container.

cgroup support is an issue but we can solve that too. It's in general
still less complex than going through the whole libvirt-qemu-kvm stack.

> Integration with qemu would allow perf to tell us that the guest is
> hitting the interrupt status register of a virtio-blk device in pci
> slot 5 (the information is already available through the kvm_mmio
> trace event, but  only qemu can decode it).

Yeah, that would be interesting information. But it is more related to
tracing than to pmu measurements. The information you mentioned above
is probably better captured by an extension of trace-events to
userspace.

Joerg



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 02:50 PM, Joerg Roedel wrote:
>> You can always provide the kernel and module paths as command line
>> parameters.  It just won't be transparently usable, but if you're using
>> qemu from the command line, presumably you can live with that.
>
> I don't want the tool for myself only. A typical perf user expects that
> it works transparently.

A typical kvm user uses libvirt, so we can integrate it with that.

>>> Could be easily done using notifier chains already in the kernel.
>>> Probably implemented with much less than 100 lines of additional code.
>>
>> And a userspace interface for that.
>
> Not necessarily. The perf event is configured to measure systemwide kvm
> by userspace. The kernel side of perf takes care that it stays
> system-wide even with added vm instances. So in this case the consumer
> for the notifier would be the perf kernel part. No userspace interface
> required.

Someone needs to know about the new guest to fetch its symbols.  Or do
you want that part in the kernel too?

>> If we make an API, I'd like it to be generally useful.
>
> That's hard to do at this point since we don't know what people will use
> it for. We should keep it simple in the beginning and add new features
> as they are requested and make sense in this context.

IMO this use case is too rare to warrant its own API, especially as there
are alternatives.

>> It's a total headache.  For example, we'd need security module hooks to
>> determine access permissions.  So far we managed to avoid that since kvm
>> doesn't allow you to access any information beyond what you provided it
>> directly.
>
> Depends on how it is designed. A filesystem approach was already
> mentioned. We could create /sys/kvm/ for example to expose information
> about virtual machines to userspace. This would not require any new
> security hooks.

Who would set the security context on those files?  Plus, we need cgroup
support so you can't see one container's guests from an unrelated container.

>> Copying the objects is a one time cost.  If you run perf for more than a
>> second or two, it would fetch and cache all of the data.  It's really
>> the same problem with non-guest profiling, only magnified a bit.
>
> I don't think we can cache filesystem data of a running guest on the
> host. It is too hard to keep such a cache coherent.

I don't see any choice.  The guest can change its symbols at any time
(say by kexec), without any notification.

>>>> Other userspaces can also provide this functionality, like they have to
>>>> provide disk, network, and display emulation.  The kernel is not a huge
>>>> library.
>
> If two userspaces run in parallel what is the single instance where perf
> can get a list of guests from?

I don't know.  Surely that's solvable though.

>> kvm.ko has only a small subset of the information that is used to define
>> a guest.
>
> The subset is not small. It contains all guest vcpus, the complete
> interrupt routing hardware emulation and manages even the guest's
> memory.

It doesn't contain most of the mmio and pio address space.  Integration
with qemu would allow perf to tell us that the guest is hitting the
interrupt status register of a virtio-blk device in pci slot 5 (the
information is already available through the kvm_mmio trace event, but
only qemu can decode it).


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 02:08:17PM +0200, Avi Kivity wrote:
> On 03/24/2010 01:59 PM, Joerg Roedel wrote:

> You can always provide the kernel and module paths as command line  
> parameters.  It just won't be transparently usable, but if you're using  
> qemu from the command line, presumably you can live with that.

I don't want the tool for myself only. A typical perf user expects that
it works transparently.

>> Could be easily done using notifier chains already in the kernel.
>> Probably implemented with much less than 100 lines of additional code.
>
> And a userspace interface for that.

Not necessarily. The perf event is configured to measure systemwide kvm
by userspace. The kernel side of perf takes care that it stays
system-wide even with added vm instances. So in this case the consumer
for the notifier would be the perf kernel part. No userspace interface
required.
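
To make the idea concrete, here is a rough userspace model of it. It is not the kernel's notifier API; `NotifierChain`, `SystemWideCounter` and the event names are hypothetical stand-ins, written in Python only to show who registers and who gets called when a vm instance is added or removed.

```python
VM_CREATED = "vm_created"
VM_DESTROYED = "vm_destroyed"


class NotifierChain:
    """Minimal stand-in for a notifier chain: an ordered list of callbacks."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        self._callbacks.append(callback)

    def unregister(self, callback):
        self._callbacks.remove(callback)

    def call(self, event, data):
        # Iterate over a copy so a callback may unregister itself safely.
        for callback in list(self._callbacks):
            callback(event, data)


class SystemWideCounter:
    """Hypothetical perf-side consumer: tracks every guest it should measure."""

    def __init__(self, chain):
        self.guests = set()
        chain.register(self.on_event)

    def on_event(self, event, guest_id):
        if event == VM_CREATED:
            self.guests.add(guest_id)
        elif event == VM_DESTROYED:
            self.guests.discard(guest_id)
```

In this model the kvm side only ever calls `chain.call(VM_CREATED, guest_id)`; the consumer keeps its own list up to date, so nothing has to be exposed to userspace for the counter to stay system-wide.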

> If we make an API, I'd like it to be generally useful.

That's hard to do at this point since we don't know what people will use
it for. We should keep it simple in the beginning and add new features
as they are requested and make sense in this context.

> It's a total headache.  For example, we'd need security module hooks to  
> determine access permissions.  So far we managed to avoid that since kvm  
> doesn't allow you to access any information beyond what you provided it  
> directly.

Depends on how it is designed. A filesystem approach was already
mentioned. We could create /sys/kvm/ for example to expose information
about virtual machines to userspace. This would not require any new
security hooks.

> Copying the objects is a one time cost.  If you run perf for more than a  
> second or two, it would fetch and cache all of the data.  It's really  
> the same problem with non-guest profiling, only magnified a bit.

I don't think we can cache filesystem data of a running guest on the
host. It is too hard to keep such a cache coherent.

>>> Other userspaces can also provide this functionality, like they have to
>>> provide disk, network, and display emulation.  The kernel is not a huge
>>> library.

If two userspaces run in parallel what is the single instance where perf
can get a list of guests from?

> kvm.ko has only a small subset of the information that is used to define  
> a guest.

The subset is not small. It contains all guest vcpus, the complete
interrupt routing hardware emulation and manages even the guest's
memory.

Joerg



[KVM-AUTOTEST PATCH] KVM test: increase default timeout for autotest.sleeptest

2010-03-24 Thread Michael Goldish
Signed-off-by: Michael Goldish 
---
 client/tests/kvm/tests_base.cfg.sample |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 249f1b4..b8288fc 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -111,7 +111,7 @@ variants:
 variants:
 - sleeptest:
 test_name = sleeptest
-test_timeout = 30
+test_timeout = 120
 test_control_file = sleeptest.control
 - dbench:
 test_name = dbench
-- 
1.5.4.1



Re: [PATCHv6 0/4] qemu-kvm: vhost net port

2010-03-24 Thread Avi Kivity

On 03/17/2010 03:04 PM, Michael S. Tsirkin wrote:
> This is a port of the vhost v6 patch set I posted previously to qemu-kvm,
> for those that want to get good performance out of it :) This patchset
> needs to be applied when the qemu.git one gets merged; this includes
> irqchip support.

Ping me when this happens please.

--
error compiling committee.c: too many arguments to function



Re: PCI device passthrough / memory mapping issue

2010-03-24 Thread Avi Kivity

On 03/19/2010 12:46 AM, Fede wrote:
> I'm currently working to enable vga passthrough in kvm.
>
> assigned_dev_iomem_map: e_phys=1000 r_virt=0x7fa64af0e000 type=0
> len=0200 region_num=3
> kvm_register_phys_mem:580 memory: gpa: 1000, size: 200, uaddr:
> 7fa64af0e000, slot: 7,flags: 0
> create_userspace_phys_mem: File exists
> assigned_dev_iomem_map: Error: create new mapping failed
>
> This worked a month ago. But after some git updates there's a problem.
>
> When the real device regions are mapped from real virtual memory to
> guest physical addresses in kvm, it overlaps region 3 with the guest
> physical memory assigned to kernel space (0x1000 to 0x1020)
>
> I've been trying to look for the answer to this question: Why is gpa
> 0x1000 chosen and not any other free memory space?

The BIOS generally assigns addresses, then the OS updates them if it
wants to.  Is the crash before or after the OS has started loading?

If before, suggest you add some printfs to seabios to explain its decisions.

> It seems that these addresses are being chosen here (for example, for region 3):
> assigned_dev_pci_read_config: (4.0): address=001c val=0x len=4
> assigned_dev_pci_write_config: (4.0): address=001c val=0x len=4
> assigned_dev_pci_read_config: (4.0): address=001c val=0xfe00 len=4

The above sequence is how the bios determines the BAR size (0x200
in this case).

> assigned_dev_pci_write_config: (4.0): address=001c val=0x len=4

Now it clears the mess from the previous step.

> assigned_dev_pci_read_config: (4.0): address=001c val=0x len=4
> assigned_dev_pci_write_config: (4.0): address=001c val=0x1000 len=4

Here it assigns a new address.  It's clearly wrong.  A log in seabios
will explain this.

> What does this mean? Why are PCI BARs being written and read? Why do
> the values that are written differ from the ones that are read back
> afterwards?

BARs are not RAM cells.  The least significant address bits are
hardwired to zero, that's how the BIOS detects the BAR size (and
required alignment).

> The last write is the gpa that is used.
>
> Is this a bug? I can't find the source of these gpa addresses. I need
> to change them.

Looks like a bug in seabios.
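
For anyone following along, the BAR sizing handshake described above (write all ones, read back, restore) can be modelled in a few lines. This is a sketch under assumptions, not seabios or qemu code: `MemoryBar32` and `probe_bar_size` are made-up names, the 16 MiB size and the addresses are illustrative and not taken from the log, and only a plain 32-bit memory BAR is modelled.

```python
class MemoryBar32:
    """Model of a 32-bit memory BAR; low address bits are hardwired to zero."""

    FLAG_MASK = 0xF  # bits 3:0 of a memory BAR are read-only type flags

    def __init__(self, size, flags=0x0):
        if size < 16 or size & (size - 1):
            raise ValueError("BAR size must be a power of two >= 16")
        self.size = size
        self.flags = flags & self.FLAG_MASK
        self.value = 0

    def write(self, val):
        # Only the address bits at or above log2(size) are writable.
        self.value = val & 0xFFFFFFFF & ~(self.size - 1)

    def read(self):
        return self.value | self.flags


def probe_bar_size(bar):
    """Size a BAR the way firmware does: write all ones, read back, restore."""
    original = bar.read()
    bar.write(0xFFFFFFFF)
    readback = bar.read()
    bar.write(original)  # put the previous address back
    mask = readback & ~MemoryBar32.FLAG_MASK & 0xFFFFFFFF
    # The lowest writable address bit gives the size: two's complement of mask.
    return (~mask + 1) & 0xFFFFFFFF
```

This is also why a value read back from a BAR can differ from the value just written: the device simply ignores the bits it does not implement.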

--
error compiling committee.c: too many arguments to function



Re: qemu-kvm crashes with Assertion ... failed.

2010-03-24 Thread Avi Kivity

On 03/17/2010 11:14 PM, André Weidemann wrote:
>>> qemu-system-x86_64 -cpu core2duo -vga cirrus -boot order=ndc -vnc
>>> 192.168.3.42:2 -k de -smp 4,cores=4 -drive
>>> file=/vmware/Windows7Test_600G.img,if=ide,index=0,cache=writeback -m
>>> 1024 -net nic,model=e1000,macaddr=DE:AD:BE:EF:12:3A -net
>>> tap,script=/usr/local/bin/qemu-ifup  -monitor pty -name
>>> Windows7test,process=Windows7test -drive
>>> file=/dev/storage/Windows7test,if=ide,index=1,cache=none,aio=native
>>
>> Andre,
>>
>> Can you try qemu-kvm-0.12.3 ?
>
> I did the following:
> git clone git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git qemu-kvm-2010-03-17
> cd qemu-kvm-2010-03-17
> git checkout -b test qemu-kvm-0.12.3
> ./configure
> make -j6 && make install
>
> I started the VM again exactly as I did the last time and it crashed
> again with the same error message.
> "qemu-system-x86_64:
> /usr/local/src/qemu-kvm-2010-03-17/hw/ide/internal.h:507:
> bmdma_active_if: Assertion `bmdma->unit != (uint8_t)-1' failed."

Does this happen with a guest installed on kvm, or just with the guest
that (guessing from the name) was imported from vmware?


--
error compiling committee.c: too many arguments to function



Re: SIGSEGV with -smp 17+, and error handling around...

2010-03-24 Thread Avi Kivity

On 03/17/2010 10:12 PM, Michael Tokarev wrote:
> When run with -smp 17 or greater, kvm
> fails like this:
>
> $ kvm -smp 17
> kvm_create_vcpu: Invalid argument
> kvm_setup_mce FAILED: Invalid argument
> KVM_SET_LAPIC failed
> Segmentation fault
> $ _
>
> In qemu-kvm.c, the kvm_create_vcpu() routine
> (which is used in a vcpu thread to set up
> vcpu) is declared as void, i.e., no error
> return.  And the code that calls it blindly
> assumes that it will never fail...
>
> But the first error message above is from the kernel,
> which - apparently - refuses to create the 17th vCPU.
> Hence we have a vcpu thread which is empty/dummy and
> not even fully initialized... so it fails later
> in the game.
>
> This all looks quite... raw, not polished ;)
>
> Can we somehow handle the (several possible) errors
> in that (and other) places, and how can we act
> on them?  Abort?  Warn the user and reduce the number
> of vcpus accordingly (seems wrong, esp. if it were
> some of the first vcpus or one in the middle which failed)...

A patch from Alex fixing this was just committed.  I'll apply it to the
stable branch as well.


--
error compiling committee.c: too many arguments to function


