date:20090515

Re: [PATCH v2] Shared memory device with interrupt support

2009-05-15 Thread Cam Macdonell



On 15-May-09, at 8:54 PM, Kumar, Venkat wrote:


Cam,

A question on interrupts as well.
What is "unix:path" that needs to be passed in the argument list?
Can it be any string?


It has to be a valid path on the host.  It will create a unix domain  
socket on that path.




If my understanding is correct, both VMs that want to communicate
would give this path on the command line, with one of them
specifying "server".


Exactly, the one with the "server" in the parameter list will wait for  
a connection before booting.


Cam



Thx,
Venkat






Support an inter-VM shared memory device that maps a shared-memory object
as a PCI device in the guest.  This patch also supports interrupts between
guests by communicating over a unix domain socket.  This patch applies to the
qemu-kvm repository.

This device now creates a qemu character device and sends 1-byte messages to
trigger interrupts.  Interrupts are triggered by writing to the "Doorbell"
register on the shared memory PCI device.  The lower 8 bits of the value
written to this register are sent as the 1-byte message so different meanings
of interrupts can be supported.
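
As a rough illustration of the doorbell encoding described above, the sketch below shows how the low 8 bits of a written value would become the 1-byte message. It is not taken from the patch; `doorbell_to_message` is a hypothetical helper name used only here.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: keep only the low
 * 8 bits of the value the guest wrote to the Doorbell register.
 * That single byte is what would travel over the character device. */
uint8_t doorbell_to_message(uint32_t written)
{
    return (uint8_t)(written & 0xffu);
}
```

Under this reading, two guests could agree on up to 256 distinct message codes; any higher bits of the written value are simply not transmitted.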

Interrupts are currently supported only between 2 VMs.  One VM must act as
the server by adding "server" to the command-line argument.  Shared memory
devices are created with the following command line:

-ivhshmem ,,[unix:][,server]

Interrupts can also be used between host and guest by implementing a
listener on the host.
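
A host-side listener along these lines might look like the following minimal sketch. It assumes the messages arrive as single bytes on a stream-type unix domain socket; the function names `ivshmem_host_listen` and `ivshmem_read_doorbell` are hypothetical and do not appear in the patch.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Create a listening unix domain socket at `path`.
 * Returns the listening fd, or -1 on error. */
int ivshmem_host_listen(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;                      /* path would not fit */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);        /* length checked above */

    unlink(path);                       /* drop a stale socket file, if any */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until one doorbell byte arrives on a connected socket.
 * Returns 1 and stores the byte on success, 0 on EOF, -1 on error. */
int ivshmem_read_doorbell(int conn_fd, uint8_t *msg)
{
    ssize_t n;

    do {
        n = read(conn_fd, msg, 1);
    } while (n < 0 && errno == EINTR);  /* retry if interrupted by a signal */
    return (int)n;
}
```

A caller would accept() a connection on the listening descriptor and then loop over ivshmem_read_doorbell(). Since the host process is the listening side here, the guest would presumably be started without "server"; that is an inference from the description, not something the patch states.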

Cam

---
Makefile.target |3 +
hw/ivshmem.c|  421 +++
hw/pc.c |6 +
hw/pc.h |3 +
qemu-options.hx |   14 ++
sysemu.h|8 +
vl.c|   14 ++
7 files changed, 469 insertions(+), 0 deletions(-)
create mode 100644 hw/ivshmem.c

diff --git a/Makefile.target b/Makefile.target
index b68a689..3190bba 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -643,6 +643,9 @@ OBJS += pcnet.o
OBJS += rtl8139.o
OBJS += e1000.o

+# Inter-VM PCI shared memory
+OBJS += ivshmem.o
+
# Generic watchdog support and some watchdog devices
OBJS += watchdog.o
OBJS += wdt_ib700.o wdt_i6300esb.o
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
new file mode 100644
index 000..95e2268
--- /dev/null
+++ b/hw/ivshmem.c
@@ -0,0 +1,421 @@
+/*
+ * Inter-VM Shared Memory PCI device.
+ *
+ * Author:
+ *  Cam Macdonell 
+ *
+ * Based On: cirrus_vga.c and rtl8139.c
+ *
+ * This code is licensed under the GNU GPL v2.
+ */
+
+#include "hw.h"
+#include "console.h"
+#include "pc.h"
+#include "pci.h"
+#include "sysemu.h"
+
+#include "qemu-common.h"
+#include 
+
+#define PCI_COMMAND_IOACCESS    0x0001
+#define PCI_COMMAND_MEMACCESS   0x0002
+#define PCI_COMMAND_BUSMASTER   0x0004
+
+//#define DEBUG_IVSHMEM
+
+#ifdef DEBUG_IVSHMEM
+#define IVSHMEM_DPRINTF(fmt, args...)\
+do {printf("IVSHMEM: " fmt, ##args); } while (0)
+#else
+#define IVSHMEM_DPRINTF(fmt, args...)
+#endif
+
+typedef struct IVShmemState {
+uint16_t intrmask;
+uint16_t intrstatus;
+uint16_t doorbell;
+uint8_t *ivshmem_ptr;
+unsigned long ivshmem_offset;
+unsigned int ivshmem_size;
+unsigned long bios_offset;
+unsigned int bios_size;
+target_phys_addr_t base_ctrl;
+int it_shift;
+PCIDevice *pci_dev;
+CharDriverState * chr;
+unsigned long map_addr;
+unsigned long map_end;
+int ivshmem_mmio_io_addr;
+} IVShmemState;
+
+typedef struct PCI_IVShmemState {
+PCIDevice dev;
+IVShmemState ivshmem_state;
+} PCI_IVShmemState;
+
+typedef struct IVShmemDesc {
+char name[1024];
+char * chrdev;
+int size;
+} IVShmemDesc;
+
+
+/* registers for the Inter-VM shared memory device */
+enum ivshmem_registers {
+IntrMask = 0,
+IntrStatus = 16,
+Doorbell = 32
+};
+
+static int num_ivshmem_devices = 0;
+static IVShmemDesc ivshmem_desc;
+
+static void ivshmem_map(PCIDevice *pci_dev, int region_num,
+uint32_t addr, uint32_t size, int type)
+{
+PCI_IVShmemState *d = (PCI_IVShmemState *)pci_dev;
+IVShmemState *s = &d->ivshmem_state;
+
+IVSHMEM_DPRINTF("addr = %u size = %u\n", addr, size);
+cpu_register_physical_memory(addr, s->ivshmem_size, s->ivshmem_offset);
+
+}
+
+void ivshmem_init(const char * optarg) {
+
+char * temp;
+char * ivshmem_sz;
+int size;
+
+num_ivshmem_devices++;
+
+/* currently we only support 1 device */
+if (num_ivshmem_devices > MAX_IVSHMEM_DEVICES) {
+return;
+}
+
+temp = strdup(optarg);
+snprintf(ivshmem_desc.name, 1024, "/%s", strsep(&temp,","));
+ivshmem_sz=strsep(&temp,",");
+if (ivshmem_sz != NULL){
+size = atol(ivshmem_sz);
+} else {
+size = -1;
+}
+
+ivshmem_desc.chrdev = strsep(&temp,"\0");
+
+if ( size == -1) {
+ivshmem_desc.size = TARGET_PAGE_SIZE;
+} else {
+ivshmem_desc.size = size*1024*1024;
+}
+IVSHMEM_DPRINTF("optarg

Re: [PATCH v2] Shared memory device with interrupt support

2009-05-15 Thread Cam Macdonell



On 15-May-09, at 8:45 PM, Kumar, Venkat wrote:


Hi Cam, I have gone through your latest shared memory patch.
I have a few questions and comments.

Comment:-
+if (ivshmem_enabled) {
+ivshmem_init(ivshmem_device);
+ram_size += ivshmem_get_size();
+}
+

In your initial patch this part of the patch is

+if (ivshmem_enabled) {
+ivshmem_init(ivshmem_device);
+phys_ram_size += ivshmem_get_size();
+}

I think the phys_ram_size += ivshmem_get_size(); is correct.


Hi Venkat,

Not with the newer qemu that qemu-kvm uses.   The newer patch is for  
qemu-kvm, not kvm-userspace.  There is no longer a variable named  
phys_ram_size in pc.c in qemu-kvm.




Question:-
You are giving the desired virtual address for mmapping the shared
memory object as "s->ivshmem_ptr", which is "phys_ram_base +
s->ivshmem_offset". This desired virtual address is nothing but the
base virtual address of the memory that you are allocating after
incrementing phys_ram_size. So now s->ivshmem_ptr would point to a
new set of memory, which is the shared memory region instead of
memory allocated through qemu_alloc_physram, which means that if pages
are allocated for the "s->ivshmem_ptr" virtual address range, those
pages can never be addressed again. Correct me if my understanding
is wrong.


I don't think so.  With the mmap call, I specify MAP_FIXED, which
requires that the memory in the shared memory object be mapped at the
address given in the first parameter (s->ivshmem_ptr).  Without
MAP_FIXED, mmap would treat that address only as a hint and could place
the mapping elsewhere, but with MAP_FIXED it maps onto the already
reserved space that ivshmem_ptr points to, which was allocated with
qemu_ram_alloc().


I hope that answers your question,

Cam



-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]  
On Behalf Of Cam Macdonell

Sent: Thursday, May 07, 2009 9:47 PM
To: kvm@vger.kernel.org
Cc: Cam Macdonell
Subject: [PATCH v2] Shared memory device with interrupt support

Support an inter-VM shared memory device that maps a shared-memory
object as a PCI device in the guest.  This patch also supports
interrupts between guests by communicating over a unix domain socket.
This patch applies to the qemu-kvm repository.


This device now creates a qemu character device and sends 1-byte
messages to trigger interrupts.  Interrupts are triggered by writing to
the "Doorbell" register on the shared memory PCI device.  The lower
8 bits of the value written to this register are sent as the 1-byte
message so different meanings of interrupts can be supported.


Interrupts are currently supported only between 2 VMs.  One VM must
act as the server by adding "server" to the command-line argument.
Shared memory devices are created with the following command line:


-ivhshmem ,,[unix:][,server]

Interrupts can also be used between host and guest by
implementing a listener on the host.


Cam

---
Makefile.target |3 +
hw/ivshmem.c|  421 +++
hw/pc.c |6 +
hw/pc.h |3 +
qemu-options.hx |   14 ++
sysemu.h|8 +
vl.c|   14 ++
7 files changed, 469 insertions(+), 0 deletions(-)
create mode 100644 hw/ivshmem.c

diff --git a/Makefile.target b/Makefile.target
index b68a689..3190bba 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -643,6 +643,9 @@ OBJS += pcnet.o
OBJS += rtl8139.o
OBJS += e1000.o

+# Inter-VM PCI shared memory
+OBJS += ivshmem.o
+
# Generic watchdog support and some watchdog devices
OBJS += watchdog.o
OBJS += wdt_ib700.o wdt_i6300esb.o
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
new file mode 100644
index 000..95e2268
--- /dev/null
+++ b/hw/ivshmem.c
@@ -0,0 +1,421 @@
+/*
+ * Inter-VM Shared Memory PCI device.
+ *
+ * Author:
+ *  Cam Macdonell 
+ *
+ * Based On: cirrus_vga.c and rtl8139.c
+ *
+ * This code is licensed under the GNU GPL v2.
+ */
+
+#include "hw.h"
+#include "console.h"
+#include "pc.h"
+#include "pci.h"
+#include "sysemu.h"
+
+#include "qemu-common.h"
+#include 
+
+#define PCI_COMMAND_IOACCESS    0x0001
+#define PCI_COMMAND_MEMACCESS   0x0002
+#define PCI_COMMAND_BUSMASTER   0x0004
+
+//#define DEBUG_IVSHMEM
+
+#ifdef DEBUG_IVSHMEM
+#define IVSHMEM_DPRINTF(fmt, args...)\
+do {printf("IVSHMEM: " fmt, ##args); } while (0)
+#else
+#define IVSHMEM_DPRINTF(fmt, args...)
+#endif
+
+typedef struct IVShmemState {
+uint16_t intrmask;
+uint16_t intrstatus;
+uint16_t doorbell;
+uint8_t *ivshmem_ptr;
+unsigned long ivshmem_offset;
+unsigned int ivshmem_size;
+unsigned long bios_offset;
+unsigned int bios_size;
+target_phys_addr_t base_ctrl;
+int it_shift;
+PCIDevice *pci_dev;
+CharDriverState * chr;
+unsigned long map_addr;
+unsigned long map_end;
+int ivshmem_mmio_io_addr;
+} IVShmemState;
+
+typedef struct PCI_IVShme

[PATCH 1/2] Clean up MADT Table Creation

2009-05-15 Thread Beth Kon

This patch is based on the recent patch from Vincent Minet. I split Vincent's
changes into 2 patches (to separate MADT and RSDT table cleanup, as suggested by
Marcelo) and added a bit to them. And to give credit where it is due, this
cleanup is also related to the patch Marcelo provided when the HPET addition 
tripped over the same problem. (Thanks again Marcelo :-) 

This patch moves all the table layout calculations to the same area of
acpi_bios_init. This prevents corruption problems when, in the middle of
filling in the tables, the MADT table size grows. The idea is to do all the 
layout in one section, then fill things in afterwards. It also corrects a 
problem where the madt table was memset to 0 before the final size of the 
table had been determined.
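
The idea of fixing the MADT size before anything is filled in can be sketched as below. This is an illustration only: the function names are hypothetical, and the sizes passed in are whatever the real structures (header, processor and I/O APIC entries, one override entry) happen to occupy.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the interrupt-override entries implied by an IRQ mask:
 * one entry per set bit among the 16 ISA IRQs. */
unsigned madt_override_count(uint32_t irq_mask)
{
    unsigned i, n = 0;

    for (i = 0; i < 16; i++)
        if (irq_mask & (1U << i))
            n++;
    return n;
}

/* Final MADT size, computed during layout rather than while filling.
 * `fixed_size` covers the header plus processor and I/O APIC entries;
 * `override_size` is the size of one interrupt-override entry. */
size_t madt_final_size(size_t fixed_size, size_t override_size,
                       uint32_t irq_mask)
{
    return fixed_size + override_size * madt_override_count(irq_mask);
}
```

Because the result is known before addr is advanced past the MADT, every later table can be placed after the full table, and the memset of the MADT can cover its complete size.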

Signed-off-by: Beth Kon 

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index cbd5f15..7f62e4f 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1665,6 +1665,7 @@ void acpi_bios_init(void)
 
 addr = (addr + 7) & ~7;
 madt_addr = addr;
+madt = (void *)(addr);
 madt_size = sizeof(*madt) +
 sizeof(struct madt_processor_apic) * MAX_CPUS +
 #ifdef BX_QEMU
@@ -1672,7 +1673,11 @@ void acpi_bios_init(void)
 #else
 sizeof(struct madt_io_apic);
 #endif
-madt = (void *)(addr);
+for ( i = 0; i < 16; i++ ) {
+if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
+madt_size += sizeof(struct madt_int_override);
+}
+}
 addr += madt_size;
 
 #ifdef BX_QEMU
@@ -1786,7 +1791,6 @@ void acpi_bios_init(void)
 continue;
 }
 int_override++;
-madt_size += sizeof(struct madt_int_override);
 }
 acpi_build_table_header((struct acpi_table_header *)madt,
 "APIC", madt_size, 1);
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] Clean up RSDT Table Creation

2009-05-15 Thread Beth Kon

This patch is also based on the patch by Vincent Minet. It corrects the size
calculation of the RSDT, and checks for overflow of MAX_RSDT_ENTRIES, 
assuming that the external table entry count is contained within
MAX_RSDT_ENTRIES.
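
A bounded append of RSDT entries, together with a reported size that counts only the entries in use, could be sketched as follows. Note the assumptions: the struct, the function names and the limit `SKETCH_MAX_ENTRIES` are all hypothetical, and this variant checks the bound before storing, whereas the patch below tests nb_rsdt_entries after the store.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ENTRIES 4    /* illustrative limit, not the BIOS value */

struct rsdt_sketch {
    uint32_t entries[SKETCH_MAX_ENTRIES];
    size_t   count;             /* entries currently in use */
};

/* Append one table address. Returns 0 on success, -1 if the table
 * is already full. The check runs before the store, so the array
 * is never written out of bounds. */
int rsdt_append(struct rsdt_sketch *r, uint32_t table_addr)
{
    if (r->count >= SKETCH_MAX_ENTRIES)
        return -1;
    r->entries[r->count++] = table_addr;
    return 0;
}

/* Size the RSDT should report: header plus the entries in use,
 * each counted exactly once. */
size_t rsdt_reported_size(size_t header_size, const struct rsdt_sketch *r)
{
    return header_size + r->count * sizeof(uint32_t);
}
```

Counting entries in one place avoids the double counting that the original size calculation suffered from when external tables were present.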

Signed-off-by: Beth Kon 

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 7f62e4f..ac8f9c5 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
 addr = base_addr = ram_size - ACPI_DATA_SIZE;
 rsdt_addr = addr;
 rsdt = (void *)(addr);
-rsdt_size = sizeof(*rsdt) + external_tables * 4;
+rsdt_size = sizeof(*rsdt);
 addr += rsdt_size;
 
 fadt_addr = addr;
@@ -1873,16 +1873,6 @@ void acpi_bios_init(void)
  "HPET", sizeof(*hpet), 1);
 #endif
 
-acpi_additional_tables(); /* resets cfg to required entry */
-for(i = 0; i < external_tables; i++) {
-uint16_t len;
-if(acpi_load_table(i, addr, &len) < 0)
-BX_PANIC("Failed to load ACPI table from QEMU\n");
-rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
-addr += len;
-if(addr >= ram_size)
-BX_PANIC("ACPI table overflow\n");
-}
 #endif
 
 /* RSDT */
@@ -1895,6 +1885,19 @@ void acpi_bios_init(void)
 //  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
 if (nb_numa_nodes > 0)
 rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
+acpi_additional_tables(); /* resets cfg to required entry */
+/* external_tables load must occur last to 
+ * properly check for MAX_RSDT_ENTRIES overflow.
+ */
+for(i = 0; i < external_tables; i++) {
+uint16_t len;
+if(acpi_load_table(i, addr, &len) < 0)
+BX_PANIC("Failed to load ACPI table from QEMU\n");
+rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
+addr += len;
+if((addr >= ram_size) || (nb_rsdt_entries > MAX_RSDT_ENTRIES)) 
+BX_PANIC("ACPI table overflow\n");
+}
 #endif
 rsdt_size -= MAX_RSDT_ENTRIES * 4;
 rsdt_size += nb_rsdt_entries * 4;

Re: Subject:[PATCH 1/2] Clean up MADT Table Creation

2009-05-15 Thread Beth Kon


Beth Kon wrote:

This patch is also based on the patch by Vincent Minet. It corrects the size
calculation of the RSDT, and checks for overflow of MAX_RSDT_ENTRIES, 
assuming that the external table entry count is contained within

MAX_RSDT_ENTRIES.

Signed-off-by: Beth Kon 

  
This should have been patch 2/2. I think git-send-email didn't like that 
I didn't have a space after Subject: . Let me try to resend with the 
space added.


RE: [PATCH v2] Shared memory device with interrupt support

2009-05-15 Thread Kumar, Venkat

Hi Cam, I have gone through your latest shared memory patch.
I have a few questions and comments.

Comment:-
+if (ivshmem_enabled) {
+ivshmem_init(ivshmem_device);
+ram_size += ivshmem_get_size();
+}
+

In your initial patch this part of the patch is

+if (ivshmem_enabled) {
+ivshmem_init(ivshmem_device);
+phys_ram_size += ivshmem_get_size();
+}

I think the phys_ram_size += ivshmem_get_size(); is correct.

Question:-
You are giving the desired virtual address for mmapping the shared memory object
as "s->ivshmem_ptr", which is "phys_ram_base + s->ivshmem_offset". This desired
virtual address is nothing but the base virtual address of the memory that you
are allocating after incrementing phys_ram_size. So now s->ivshmem_ptr would
point to a new set of memory, which is the shared memory region instead of
memory allocated through qemu_alloc_physram, which means that if pages are
allocated for the "s->ivshmem_ptr" virtual address range, those pages can never
be addressed again. Correct me if my understanding is wrong.

Thx,

Venkat


-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of 
Cam Macdonell
Sent: Thursday, May 07, 2009 9:47 PM
To: kvm@vger.kernel.org
Cc: Cam Macdonell
Subject: [PATCH v2] Shared memory device with interrupt support

Support an inter-VM shared memory device that maps a shared-memory object
as a PCI device in the guest.  This patch also supports interrupts between
guests by communicating over a unix domain socket.  This patch applies to the
qemu-kvm repository.

This device now creates a qemu character device and sends 1-byte messages to
trigger interrupts.  Interrupts are triggered by writing to the "Doorbell"
register on the shared memory PCI device.  The lower 8 bits of the value
written to this register are sent as the 1-byte message so different meanings
of interrupts can be supported.

Interrupts are currently supported only between 2 VMs.  One VM must act as
the server by adding "server" to the command-line argument.  Shared memory
devices are created with the following command line:

-ivhshmem ,,[unix:][,server]

Interrupts can also be used between host and guest by implementing a
listener on the host.

Cam

---
 Makefile.target |3 +
 hw/ivshmem.c|  421 +++
 hw/pc.c |6 +
 hw/pc.h |3 +
 qemu-options.hx |   14 ++
 sysemu.h|8 +
 vl.c|   14 ++
 7 files changed, 469 insertions(+), 0 deletions(-)
 create mode 100644 hw/ivshmem.c

diff --git a/Makefile.target b/Makefile.target
index b68a689..3190bba 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -643,6 +643,9 @@ OBJS += pcnet.o
 OBJS += rtl8139.o
 OBJS += e1000.o

+# Inter-VM PCI shared memory
+OBJS += ivshmem.o
+
 # Generic watchdog support and some watchdog devices
 OBJS += watchdog.o
 OBJS += wdt_ib700.o wdt_i6300esb.o
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
new file mode 100644
index 000..95e2268
--- /dev/null
+++ b/hw/ivshmem.c
@@ -0,0 +1,421 @@
+/*
+ * Inter-VM Shared Memory PCI device.
+ *
+ * Author:
+ *  Cam Macdonell 
+ *
+ * Based On: cirrus_vga.c and rtl8139.c
+ *
+ * This code is licensed under the GNU GPL v2.
+ */
+
+#include "hw.h"
+#include "console.h"
+#include "pc.h"
+#include "pci.h"
+#include "sysemu.h"
+
+#include "qemu-common.h"
+#include 
+
+#define PCI_COMMAND_IOACCESS    0x0001
+#define PCI_COMMAND_MEMACCESS   0x0002
+#define PCI_COMMAND_BUSMASTER   0x0004
+
+//#define DEBUG_IVSHMEM
+
+#ifdef DEBUG_IVSHMEM
+#define IVSHMEM_DPRINTF(fmt, args...)\
+do {printf("IVSHMEM: " fmt, ##args); } while (0)
+#else
+#define IVSHMEM_DPRINTF(fmt, args...)
+#endif
+
+typedef struct IVShmemState {
+uint16_t intrmask;
+uint16_t intrstatus;
+uint16_t doorbell;
+uint8_t *ivshmem_ptr;
+unsigned long ivshmem_offset;
+unsigned int ivshmem_size;
+unsigned long bios_offset;
+unsigned int bios_size;
+target_phys_addr_t base_ctrl;
+int it_shift;
+PCIDevice *pci_dev;
+CharDriverState * chr;
+unsigned long map_addr;
+unsigned long map_end;
+int ivshmem_mmio_io_addr;
+} IVShmemState;
+
+typedef struct PCI_IVShmemState {
+PCIDevice dev;
+IVShmemState ivshmem_state;
+} PCI_IVShmemState;
+
+typedef struct IVShmemDesc {
+char name[1024];
+char * chrdev;
+int size;
+} IVShmemDesc;
+
+
+/* registers for the Inter-VM shared memory device */
+enum ivshmem_registers {
+IntrMask = 0,
+IntrStatus = 16,
+Doorbell = 32
+};
+
+static int num_ivshmem_devices = 0;
+static IVShmemDesc ivshmem_desc;
+
+static void ivshmem_map(PCIDevice *pci_dev, int region_num,
+uint32_t addr, uint32_t size, int type)
+{
+PCI_IVShmemState *d = (PCI_IVShmemState *)pci_dev;
+IVShmemState *s = &d->ivshmem_state;
+
+IVSHMEM_DPRINTF("addr = %u size = %u\n", addr, size);
+

Subject:[PATCH 1/2] Clean up MADT Table Creation

2009-05-15 Thread Beth Kon


This patch is also based on the patch by Vincent Minet. It corrects the size
calculation of the RSDT, and checks for overflow of MAX_RSDT_ENTRIES, 
assuming that the external table entry count is contained within
MAX_RSDT_ENTRIES.

Signed-off-by: Beth Kon 

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 7f62e4f..ac8f9c5 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
 addr = base_addr = ram_size - ACPI_DATA_SIZE;
 rsdt_addr = addr;
 rsdt = (void *)(addr);
-rsdt_size = sizeof(*rsdt) + external_tables * 4;
+rsdt_size = sizeof(*rsdt);
 addr += rsdt_size;
 
 fadt_addr = addr;
@@ -1873,16 +1873,6 @@ void acpi_bios_init(void)
  "HPET", sizeof(*hpet), 1);
 #endif
 
-acpi_additional_tables(); /* resets cfg to required entry */
-for(i = 0; i < external_tables; i++) {
-uint16_t len;
-if(acpi_load_table(i, addr, &len) < 0)
-BX_PANIC("Failed to load ACPI table from QEMU\n");
-rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
-addr += len;
-if(addr >= ram_size)
-BX_PANIC("ACPI table overflow\n");
-}
 #endif
 
 /* RSDT */
@@ -1895,6 +1885,19 @@ void acpi_bios_init(void)
 //  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
 if (nb_numa_nodes > 0)
 rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
+acpi_additional_tables(); /* resets cfg to required entry */
+/* external_tables load must occur last to 
+ * properly check for MAX_RSDT_ENTRIES overflow.
+ */
+for(i = 0; i < external_tables; i++) {
+uint16_t len;
+if(acpi_load_table(i, addr, &len) < 0)
+BX_PANIC("Failed to load ACPI table from QEMU\n");
+rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
+addr += len;
+if((addr >= ram_size) || (nb_rsdt_entries > MAX_RSDT_ENTRIES)) 
+BX_PANIC("ACPI table overflow\n");
+}
 #endif
 rsdt_size -= MAX_RSDT_ENTRIES * 4;
 rsdt_size += nb_rsdt_entries * 4;

Subject:[PATCH 1/2] Clean up MADT Table Creation

2009-05-15 Thread Beth Kon


This patch is based on the recent patch from Vincent Minet. I split Vincent's
changes into 2 patches (to separate MADT and RSDT table cleanup, as suggested by
Marcelo) and added a bit to them. And to give credit where it is due, this
cleanup is also related to the patch Marcelo provided when the HPET addition 
tripped over the same problem. (Thanks again Marcelo :-) 

This patch moves all the table layout calculations to the same area of
acpi_bios_init. This prevents corruption problems when, in the middle of
filling in the tables, the MADT table size grows. The idea is to do all the 
layout in one section, then fill things in afterwards. It also corrects a 
problem where the madt table was memset to 0 before the final size of the 
table had been determined.

Signed-off-by: Beth Kon 

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index cbd5f15..7f62e4f 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1665,6 +1665,7 @@ void acpi_bios_init(void)
 
 addr = (addr + 7) & ~7;
 madt_addr = addr;
+madt = (void *)(addr);
 madt_size = sizeof(*madt) +
 sizeof(struct madt_processor_apic) * MAX_CPUS +
 #ifdef BX_QEMU
@@ -1672,7 +1673,11 @@ void acpi_bios_init(void)
 #else
 sizeof(struct madt_io_apic);
 #endif
-madt = (void *)(addr);
+for ( i = 0; i < 16; i++ ) {
+if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
+madt_size += sizeof(struct madt_int_override);
+}
+}
 addr += madt_size;
 
 #ifdef BX_QEMU
@@ -1786,7 +1791,6 @@ void acpi_bios_init(void)
 continue;
 }
 int_override++;
-madt_size += sizeof(struct madt_int_override);
 }
 acpi_build_table_header((struct acpi_table_header *)madt,
 "APIC", madt_size, 1);

Re: XP smp using a lot of CPU [SOLVED]

2009-05-15 Thread Brian Jackson



On May 15, 2009, at 3:24 PM, Ross Boylan wrote:


Using ACPI fixes the problem; CPU usage is now quite low.  Start line
was
sudo vdeq kvm -net nic,vlan=1,macaddr=52:54:a0:12:01:00 \
   -net vde,vlan=1,sock=/var/run/vde2/tap0.ctl \
   -boot d -cdrom /usr/local/backup/XPProSP3.iso \
   -std-vga -hda /dev/turtle/XP00 \
   -soundhw es1370 -localtime -m 1G -smp 2
I switched to -boot c later.

I ended up doing a fresh install; my repair got mucked up and I got the
message "The requested lookup key was not found in any active activation
context" when I entered a location into MSIE, including when I tried to
run Windows Update.  Googling showed this might indicate some permission
or file corruption issues.  They may have happened during my earlier
(virtual) system hang.

My experience suggests a theory: if you use SMP with XP (i.e., more than
1 virtual processor) you should enable acpi, i.e., not say -no-acpi.  If
this is true, the advice to run windows with -no-acpi should probably be
updated.  It's possible single CPU systems are affected as well.



I removed the note about -no-acpi from the howto on the wiki. I don't  
think that's been true for a long time.


--Iggy





Ross




Re: [PATCH v4] qemu-kvm: Make PC speaker emulation aware of in-kernel PIT

2009-05-15 Thread Marcelo Tosatti

On Thu, May 14, 2009 at 10:43:05PM +0200, Jan Kiszka wrote:
> When using the in-kernel PIT the speaker emulation has to synchronize
> the PIT state with KVM. Enhance the existing speaker sound device and
> allow it to take over port 0x61 by using KVM_CREATE_PIT2 where
> available. This unbreaks -soundhw pcspk in KVM mode.
> 
> Changes in v4:
>  - preserve full PIT state across read-modify-write
>  - update kvm.h
> 
> Changes in v3:
>  - re-added incorrectly dropped kvm_enabled checks
> 
> Changes in v2:
>  - rebased over qemu-kvm and KVM_CREATE_PIT2
>  - refactored hooks in pcspk
> 
> Signed-off-by: Jan Kiszka 

Jan,

You always attempt to use KVM_CREATE_PIT2, so if, say, on migration the
destination does not support the new ioctl, you naturally fall back to the
in-kernel dummy. Seems the right thing to do.

Would be nice to avoid sprinkling KVM details inside hw/pcspk.c, though,
but that is another problem.

Looks good (and v3 kernel patch).


Re: XP smp using a lot of CPU [SOLVED]

2009-05-15 Thread Ross Boylan

Using ACPI fixes the problem; CPU usage is now quite low.  Start line
was
sudo vdeq kvm -net nic,vlan=1,macaddr=52:54:a0:12:01:00 \
-net vde,vlan=1,sock=/var/run/vde2/tap0.ctl \
-boot d -cdrom /usr/local/backup/XPProSP3.iso \
-std-vga -hda /dev/turtle/XP00 \
-soundhw es1370 -localtime -m 1G -smp 2
I switched to -boot c later.

I ended up doing a fresh install; my repair got mucked up and I got the
message "The requested lookup key was not found in any active activation
context" when I entered a location into MSIE, including when I tried to
run Windows Update.  Googling showed this might indicate some permission
or file corruption issues.  They may have happened during my earlier
(virtual) system hang.

My experience suggests a theory: if you use SMP with XP (i.e., more than
1 virtual processor) you should enable acpi, i.e., not say -no-acpi.  If
this is true, the advice to run windows with -no-acpi should probably be
updated.  It's possible single CPU systems are affected as well.

Ross


Re: [PATCH] bios: Fix MADT corruption and RSDT size when using -acpitable

2009-05-15 Thread Beth Kon


Marcelo Tosatti wrote:

Beth,

On Thu, May 14, 2009 at 12:20:29PM -0400, Beth Kon wrote:
  

Anthony Liguori wrote:


Vincent Minet wrote:
  

External ACPI tables are counted twice for the RSDT size and the load
address for the first external table is in the MADT (interrupt override
entries are overwritten).

Signed-off-by: Vincent Minet 
  


Beth,

I think you had a patch attempting to address the same issue.  It was  
a bit more involved though.


Which is the proper fix and are they both to the same problem?
  
They are for 2 different bases. My patch was for qemu's bochs bios and  
this is for qemu-kvm/kvm/bios/rombios32.c. They are pretty divergent in  
this area of setting up the ACPI tables. My patch is still needed for  
the qemu base. I hope we'll be getting to one base soon :-)


Assuming the intent of the code was for MAX_RSDT_ENTRIES to include  
external_tables, this patch looks correct. I think one additional check  
would be needed (in my patch) to make sure that the code doesn't exceed  
MAX_RSDT_ENTRIES when the external tables are being loaded.


My patch also puts all the code that calculates madt_size in the same
place, at the beginning of the table layout. I believe this is neater
and will avoid problems like this one in the future. As much as
possible, I think it best to get all the tables laid out, then fill
them in. If for some reason this is not acceptable, we need to add a big
note that no tables should be laid out after the madt because the madt
may grow further down in the code and overwrite the other table.



I like this better too, see questions/comments below.

  

Regards,

Anthony Liguori

  

---
 kvm/bios/rombios32.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index cbd5f15..289361b 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
 addr = base_addr = ram_size - ACPI_DATA_SIZE;
 rsdt_addr = addr;
 rsdt = (void *)(addr);
-rsdt_size = sizeof(*rsdt) + external_tables * 4;
+rsdt_size = sizeof(*rsdt);
 addr += rsdt_size;
  fadt_addr = addr;
@@ -1787,6 +1787,7 @@ void acpi_bios_init(void)
 }
 int_override++;
 madt_size += sizeof(struct madt_int_override);
+addr += sizeof(struct madt_int_override);
 }
 acpi_build_table_header((struct acpi_table_header *)madt,
 "APIC", madt_size, 1);
  




  

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index cbd5f15..23835b6 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
 addr = base_addr = ram_size - ACPI_DATA_SIZE;
 rsdt_addr = addr;
 rsdt = (void *)(addr);
-rsdt_size = sizeof(*rsdt) + external_tables * 4;
+rsdt_size = sizeof(*rsdt);
 addr += rsdt_size;
 
 fadt_addr = addr;

@@ -1665,6 +1665,7 @@ void acpi_bios_init(void)
 
 addr = (addr + 7) & ~7;

 madt_addr = addr;
+madt = (void *)(addr);
 madt_size = sizeof(*madt) +
 sizeof(struct madt_processor_apic) * MAX_CPUS +
 #ifdef BX_QEMU
@@ -1672,7 +1673,11 @@ void acpi_bios_init(void)
 #else
 sizeof(struct madt_io_apic);
 #endif
-madt = (void *)(addr);
+for ( i = 0; i < 16; i++ ) {
+if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
+madt_size += sizeof(struct madt_int_override);
+}
+}
 addr += madt_size;



This bug could only affect the HPET descriptor, right?
  
I'm not sure what you're asking. There were 2 bugs that Vincent pointed 
out. The first caused an incorrect rsdt_size to be reported, and the 
second (missing addr += sizeof(struct madt_int_override)) caused 
corruption of whatever came after the MADT. But even if his patch were 
applied, any future code that added a table and manipulated addr between 
the following points:


...
(about line 1676)
madt = (void *)(addr);
addr += madt_size;
...
(about line 1789)
madt_size += sizeof(struct madt_int_override);
addr += sizeof(struct madt_int_override);

would have wound up causing some kind of corruption, as happened with 
the HPET. Also the "memset(madt, 0, madt_size)" around line 1740 was not 
using the complete madt_size.


So this seems undesirable, and that's why I suggested moving all addr 
manipulation (with the exception of additional tables at the very end) 
to the same section of the table layout code. Seems best to manage 
madt_size all in one place.
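
The hazard can be reproduced in miniature with a bump pointer. The sketch below is a simplified model under stated assumptions: the `span` type and both helpers are hypothetical, alignment is ignored, and the sizes in the usage note are made-up numbers rather than real ACPI structure sizes.

```c
#include <stdint.h>

/* One table placed by a bump pointer: bytes [start, start + size). */
struct span {
    uint32_t start;
    uint32_t size;
};

/* Place a table of `size` bytes at *addr and advance the pointer,
 * mirroring the "table = addr; addr += table_size" pattern. */
struct span place_table(uint32_t *addr, uint32_t size)
{
    struct span s = { *addr, size };

    *addr += size;
    return s;
}

/* Nonzero if the two spans share at least one byte. */
int spans_overlap(struct span a, struct span b)
{
    return a.start < b.start + b.size && b.start < a.start + a.size;
}
```

If a 64-byte MADT is placed, a second table is placed right behind it, and the MADT then grows by 10 bytes, spans_overlap() reports a collision. Placing the MADT with its final 74-byte size first leaves the second table untouched, which is the argument for doing all size calculation in the layout section.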


  

 #ifdef BX_QEMU
@@ -1786,7 +1791,6 @@ void acpi_bios_init(void)
 continue;
 }
 int_override++;
-madt_size += sizeof(struct madt_int_override);
 }
 acpi_build_table_header((struct acpi_table_header *)madt,
 "APIC", madt_size, 1);
@@ -1868,17 +1872,6 @@ void acpi_bios_init(void)
 acpi_build_table_header(

Re: RFC: convert KVMTRACE to event traces

2009-05-15 Thread Marcelo Tosatti

On Fri, May 15, 2009 at 01:10:34PM -0400, Christoph Hellwig wrote:
> On Thu, May 14, 2009 at 05:30:16PM -0300, Marcelo Tosatti wrote:
> > +   trace_kvm_cr_write(cr, val);
> > switch (cr) {
> > case 0:
> > -   kvm_set_cr0(vcpu, kvm_register_read(vcpu, reg));
> > +   kvm_set_cr0(vcpu, val);
> > skip_emulated_instruction(vcpu);
> 
> Do we really need one trace point covering all cr writes, _and_ one for
> each specific register?

There is one tracepoint named kvm_cr that covers cr reads and writes.

trace_kvm_cr_read/trace_kvm_cr_write are macros that expand to
trace_kvm_cr(rw=1 or rw=0). Perhaps that is not a very good idea.

> 
> > if (!npt_enabled)
> > -   KVMTRACE_3D(PAGE_FAULT, &svm->vcpu, error_code,
> > -   (u32)fault_address, (u32)(fault_address >> 32),
> > -   handler);
> > +   trace_kvm_page_fault(fault_address, error_code);
> > else
> > -   KVMTRACE_3D(TDP_FAULT, &svm->vcpu, error_code,
> > -   (u32)fault_address, (u32)(fault_address >> 32),
> > -   handler);
> > +   trace_kvm_tdp_page_fault(fault_address, error_code);
> 
> Again this seems a bit cumbersome.  Why not just one tracepoint for
> page faults, with a flag if we're using npt or not?

The issue is that these faults mean different things. With npt disabled
the fault is a guest fault (like a normal page fault), but with npt
enabled the fault indicates that the host page tables the hardware uses
to do the translation are not set up correctly.

I did unify them as you suggest but reverted back to separate
tracepoints because the unification might be confusing.

Can be unified later if desirable.

> > +ifeq ($(CONFIG_TRACEPOINTS),y)
> > +trace-objs = kvm-traces.o
> > +arch-trace-objs = kvm-traces-arch.o
> > +endif
> > +
> >  EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
> >  
> >  kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
> > -   i8254.o
> > +   i8254.o $(trace-objs)
> >  obj-$(CONFIG_KVM) += kvm.o
> > -kvm-intel-objs = vmx.o
> > +kvm-intel-objs = vmx.o $(arch-trace-objs)
> >  obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
> > -kvm-amd-objs = svm.o
> > +kvm-amd-objs = svm.o $(arch-trace-objs)
> >  obj-$(CONFIG_KVM_AMD) += kvm-amd.o
> 
> The option to select event tracing bits is CONFIG_EVENT_TRACING and the
> makefile syntax used here (both the original makefile and the additions)
> is rather awkward.
> 
> A proper arch/x86/kvm/Makefile including tracing bits should look like
> the following:
> 
> -- snip --
> EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
> 
> kvm-y += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
>  coalesced_mmio.o irq_comm.o)
> kvm-$(CONFIG_KVM_TRACE)   += $(addprefix ../../../virt/kvm/, kvm_trace.o)
> kvm-$(CONFIG_IOMMU_API)   += $(addprefix ../../../virt/kvm/, iommu.o)
> kvm-y += x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
>  i8254.o
> 
> kvm-$(CONFIG_EVENT_TRACING) += kvm-traces.o
> kvm-arch-trace-$(CONFIG_EVENT_TRACING) += kvm-traces-arch.o
> 
> kvm-intel-y   += vmx.o $(kvm-arch-trace-y)
> kvm-amd-y += svm.o $(kvm-arch-trace-y)
> 
> obj-$(CONFIG_KVM) += kvm.o
> obj-$(CONFIG_KVM_INTEL)   += kvm-intel.o
> obj-$(CONFIG_KVM_AMD) += kvm-amd.o
> -- snip --
> 
> and do we actually still need kvm_trace.o after this?

Your version looks much nicer. kvm_trace.o can disappear as soon as 
this is in Avi's tree and a decent replacement for user/kvm_trace.c 
is in qemu-kvm.git.

> Anyway, I'll send the upstream part of the makefile cleanup out ASAP,
> then you can rebase later.

OK.

> 
> > Index: linux-2.6-x86-2/arch/x86/kvm/kvm-traces.c
> > ===
> > --- /dev/null
> > +++ linux-2.6-x86-2/arch/x86/kvm/kvm-traces.c
> > @@ -0,0 +1,5 @@
> > +#include 
> > +
> > +
> > +#define CREATE_TRACE_POINTS
> > +#include 
> 
> Can't we just put this into some other common .c file?  That would also
> reduce the amount of makefile magic required.
> 
> > Index: linux-2.6-x86-2/arch/x86/kvm/kvm-traces-arch.c
> > ===
> > --- /dev/null
> > +++ linux-2.6-x86-2/arch/x86/kvm/kvm-traces-arch.c
> > @@ -0,0 +1,5 @@
> > +#include 
> > +
> > +
> > +#define CREATE_TRACE_POINTS
> > +#include 
> 
> Same for this one, especially as the makefile hackery required for this
> one is even worse..

Probably for both. Now that you mention it, I can't explain the reason
for the separate C files. Will put this up in a git tree in a couple of
hours.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4 resend 0/6] ATS capability support for Intel IOMMU

2009-05-15 Thread Jesse Barnes

On Thu, 14 May 2009 10:32:05 +0800
Yu Zhao  wrote:

> This patch series implements Address Translation Service support for
> the Intel IOMMU. The PCIe Endpoint that supports ATS capability can
> request the DMA address translation from the IOMMU and cache the
> translation itself. This can alleviate IOMMU TLB pressure and improve
> the hardware performance in the I/O virtualization environment.
> 
> The ATS is one of PCI-SIG I/O Virtualization (IOV) Specifications. The
> spec can be found at: http://www.pcisig.com/specifications/iov/ats/
> (it requires membership).

These ones can go through David's tree.  You can add my:
Acked-by: Jesse Barnes 

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center

Re: RFC: convert KVMTRACE to event traces

2009-05-15 Thread Christoph Hellwig

On Thu, May 14, 2009 at 05:30:16PM -0300, Marcelo Tosatti wrote:
> + trace_kvm_cr_write(cr, val);
>   switch (cr) {
>   case 0:
> - kvm_set_cr0(vcpu, kvm_register_read(vcpu, reg));
> + kvm_set_cr0(vcpu, val);
>   skip_emulated_instruction(vcpu);

Do we really need one trace point covering all cr writes, _and_ one for
each specific register?

>   if (!npt_enabled)
> - KVMTRACE_3D(PAGE_FAULT, &svm->vcpu, error_code,
> - (u32)fault_address, (u32)(fault_address >> 32),
> - handler);
> + trace_kvm_page_fault(fault_address, error_code);
>   else
> - KVMTRACE_3D(TDP_FAULT, &svm->vcpu, error_code,
> - (u32)fault_address, (u32)(fault_address >> 32),
> - handler);
> + trace_kvm_tdp_page_fault(fault_address, error_code);

Again this seems a bit cumbersome.  Why not just one tracepoint for
page faults, with a flag if we're using npt or not?

> +ifeq ($(CONFIG_TRACEPOINTS),y)
> +trace-objs = kvm-traces.o
> +arch-trace-objs = kvm-traces-arch.o
> +endif
> +
>  EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
>  
>  kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
> - i8254.o
> + i8254.o $(trace-objs)
>  obj-$(CONFIG_KVM) += kvm.o
> -kvm-intel-objs = vmx.o
> +kvm-intel-objs = vmx.o $(arch-trace-objs)
>  obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
> -kvm-amd-objs = svm.o
> +kvm-amd-objs = svm.o $(arch-trace-objs)
>  obj-$(CONFIG_KVM_AMD) += kvm-amd.o

The option to select event tracing bits is CONFIG_EVENT_TRACING and the
makefile syntax used here (both the original makefile and the additions)
is rather awkward.

A proper arch/x86/kvm/Makefile including tracing bits should look like
the following:

-- snip --
EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm

kvm-y   += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
   coalesced_mmio.o irq_comm.o)
kvm-$(CONFIG_KVM_TRACE) += $(addprefix ../../../virt/kvm/, kvm_trace.o)
kvm-$(CONFIG_IOMMU_API) += $(addprefix ../../../virt/kvm/, iommu.o)
kvm-y   += x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
   i8254.o

kvm-$(CONFIG_EVENT_TRACING) += kvm-traces.o
kvm-arch-trace-$(CONFIG_EVENT_TRACING) += kvm-traces-arch.o

kvm-intel-y += vmx.o $(kvm-arch-trace-y)
kvm-amd-y   += svm.o $(kvm-arch-trace-y)

obj-$(CONFIG_KVM)   += kvm.o
obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
obj-$(CONFIG_KVM_AMD)   += kvm-amd.o
-- snip --

and do we actually still need kvm_trace.o after this?

Anyway, I'll send the upstream part of the makefile cleanup out ASAP,
then you can rebase later.

> Index: linux-2.6-x86-2/arch/x86/kvm/kvm-traces.c
> ===
> --- /dev/null
> +++ linux-2.6-x86-2/arch/x86/kvm/kvm-traces.c
> @@ -0,0 +1,5 @@
> +#include 
> +
> +
> +#define CREATE_TRACE_POINTS
> +#include 

Can't we just put this into some other common .c file?  That would also
reduce the amount of makefile magic required.

> Index: linux-2.6-x86-2/arch/x86/kvm/kvm-traces-arch.c
> ===
> --- /dev/null
> +++ linux-2.6-x86-2/arch/x86/kvm/kvm-traces-arch.c
> @@ -0,0 +1,5 @@
> +#include 
> +
> +
> +#define CREATE_TRACE_POINTS
> +#include 

Same for this one, especially as the makefile hackery required for this
one is even worse..


[PATCH v3] kmod: Add distclean rule

2009-05-15 Thread Jan Kiszka

The smaller the patch... sigh.

>

Remove the configure output config.kbuild, config.mak and arch links via
distclean.

Signed-off-by: Jan Kiszka 
---

 Makefile |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/Makefile b/Makefile
index dad5f0b..cef121d 100644
--- a/Makefile
+++ b/Makefile
@@ -68,3 +68,6 @@ rpm:  all
 
 clean:
$(MAKE) -C $(KERNELDIR) M=`pwd` $@
+
+distclean: clean
+   rm -f config.kbuild config.mak include/asm include-compat/asm

[PATCH v2] qemu-kvm: add iosignalfd support

2009-05-15 Thread Gregory Haskins

An iosignalfd allows an eventfd to attach to a specific PIO/MMIO region in the
guest.  Any guest-writes to that region will trigger an eventfd signal.
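
The library-level flag bits this patch adds to libkvm.h do not share values with the kernel ABI bits, so the wrappers translate between them before issuing the ioctl. A standalone sketch of just that translation, using the constants from this patch and from the kernel-side patch; the helper name to_kernel_flags() is hypothetical and does not appear in either patch:

```c
#include <assert.h>

/* Library-level option bits, as defined in libkvm.h by this patch. */
enum {
    iosignalfd_option_pio,
    iosignalfd_option_cookie,
};
#define IOSIGNALFD_FLAG_PIO    (1 << iosignalfd_option_pio)
#define IOSIGNALFD_FLAG_COOKIE (1 << iosignalfd_option_cookie)

/* Kernel ABI bits, as defined in include/linux/kvm.h by the kernel patch. */
#define KVM_IOSIGNALFD_FLAG_DEASSIGN (1 << 0)
#define KVM_IOSIGNALFD_FLAG_PIO      (1 << 1)
#define KVM_IOSIGNALFD_FLAG_COOKIE   (1 << 2)

/*
 * Hypothetical helper: map library flags to kernel flags.  As in the
 * patch, the cookie flag is only forwarded on the deassign path.
 */
static unsigned int to_kernel_flags(int flags, int deassign)
{
    unsigned int kflags = deassign ? KVM_IOSIGNALFD_FLAG_DEASSIGN : 0;

    /* Reject bits this sketch does not know about. */
    assert((flags & ~(IOSIGNALFD_FLAG_PIO | IOSIGNALFD_FLAG_COOKIE)) == 0);

    if (flags & IOSIGNALFD_FLAG_PIO)
        kflags |= KVM_IOSIGNALFD_FLAG_PIO;
    if (deassign && (flags & IOSIGNALFD_FLAG_COOKIE))
        kflags |= KVM_IOSIGNALFD_FLAG_COOKIE;

    return kflags;
}
```

Note that IOSIGNALFD_FLAG_PIO is bit 0 in the library while KVM_IOSIGNALFD_FLAG_PIO is bit 1 in the kernel, so passing the library flags straight through would be read as a deassign request.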

For more details, see the kernel side patches submitted here:

http://lkml.org/lkml/2009/5/15/303

Signed-off-by: Gregory Haskins 
---

 kvm/libkvm/libkvm.c |   68 +++
 kvm/libkvm/libkvm.h |   39 +
 2 files changed, 107 insertions(+), 0 deletions(-)

diff --git a/kvm/libkvm/libkvm.c b/kvm/libkvm/libkvm.c
index ccab985..dc3414f 100644
--- a/kvm/libkvm/libkvm.c
+++ b/kvm/libkvm/libkvm.c
@@ -1501,3 +1501,71 @@ int kvm_destroy_irqfd(kvm_context_t kvm, int fd, int gsi, int flags)
 }
 
 #endif /* KVM_CAP_IRQFD */
+
+#ifdef KVM_CAP_IOSIGNALFD
+
+int kvm_assign_iosignalfd(kvm_context_t kvm, unsigned long cookie,
+ unsigned long addr, size_t len,
+ int fd, int flags)
+{
+   int r;
+   int type = flags & IOSIGNALFD_FLAG_PIO; 
+   struct kvm_iosignalfd data = {
+   .cookie = cookie,
+   .addr   = addr,
+   .len    = len,
+   .fd     = fd,
+   .flags  = type ? KVM_IOSIGNALFD_FLAG_PIO : 0,
+   };
+
+   if (!kvm_check_extension(kvm, KVM_CAP_IOSIGNALFD))
+   return -ENOENT;
+
+   r = ioctl(kvm->vm_fd, KVM_IOSIGNALFD, &data);
+   if (r == -1)
+   r = -errno;
+   return r;
+}
+
+int kvm_deassign_iosignalfd(kvm_context_t kvm, unsigned long cookie,
+   unsigned long addr, int flags)
+{
+   int r;
+   int type = flags & IOSIGNALFD_FLAG_PIO; 
+   int cvalid = flags & IOSIGNALFD_FLAG_COOKIE;
+   struct kvm_iosignalfd data = {
+   .cookie  = cookie,
+   .addr    = addr,
+   .flags   = KVM_IOSIGNALFD_FLAG_DEASSIGN |
+   (type ? KVM_IOSIGNALFD_FLAG_PIO : 0) |
+   (cvalid ? KVM_IOSIGNALFD_FLAG_COOKIE : 0),
+   };
+
+   if (!kvm_check_extension(kvm, KVM_CAP_IOSIGNALFD))
+   return -ENOENT;
+
+   r = ioctl(kvm->vm_fd, KVM_IOSIGNALFD, &data);
+   if (r == -1)
+   r = -errno;
+   return r;
+}
+
+#else /* KVM_CAP_IOSIGNALFD */
+
+int kvm_assign_iosignalfd(kvm_context_t kvm, unsigned long cookie,
+ unsigned long addr, size_t len,
+ int fd, int flags)
+{
+   return -ENOENT;
+}
+
+int kvm_deassign_iosignalfd(kvm_context_t kvm, unsigned long cookie,
+   unsigned long addr, int flags)
+{
+   return -ENOENT;
+}
+
+#endif /* KVM_CAP_IOSIGNALFD */
+
+
+
diff --git a/kvm/libkvm/libkvm.h b/kvm/libkvm/libkvm.h
index 3ccbe3d..ea81e55 100644
--- a/kvm/libkvm/libkvm.h
+++ b/kvm/libkvm/libkvm.h
@@ -882,6 +882,45 @@ int kvm_create_irqfd(kvm_context_t kvm, int gsi, int flags);
  */
 int kvm_destroy_irqfd(kvm_context_t kvm, int fd, int gsi, int flags);
 
+enum {
+   iosignalfd_option_pio,
+   iosignalfd_option_cookie,
+};
+
+#define IOSIGNALFD_FLAG_PIO    (1 << iosignalfd_option_pio)
+#define IOSIGNALFD_FLAG_COOKIE (1 << iosignalfd_option_cookie)
+
+/*!
+ * \brief Assign an eventfd to an IO port (PIO or MMIO)
+ *
+ * Assigns an eventfd based file-descriptor to a specific PIO or MMIO
+ * address range.  Any guest writes to the specified range will generate
+ * an eventfd signal.
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param cookie A user-assigned cookie for optional use in deassign
+ * \param addr The IO address
+ * \param len The length of the IO region at the address
+ * \param fd The eventfd file-descriptor
+ * \param flags FLAG_PIO: PIO, else MMIO
+ */
+int kvm_assign_iosignalfd(kvm_context_t kvm, unsigned long cookie,
+ unsigned long addr, size_t len,
+ int fd, int flags);
+
+/*!
+ * \brief Deassign an iosignalfd from a previously registered IO port
+ *
+ * Deassigns an iosignalfd previously registered with kvm_assign_iosignalfd()
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param cookie The cookie to (optionally) match (must specify FLAG_COOKIE)
+ * \param addr The IO address to deassign
+ * \param flags FLAG_PIO: PIO, else MMIO, FLAG_COOKIE: cookie is valid  
+ */
+int kvm_deassign_iosignalfd(kvm_context_t kvm, unsigned long cookie,
+   unsigned long addr, int flags);
+
 #ifdef KVM_CAP_DEVICE_MSIX
 int kvm_assign_set_msix_nr(kvm_context_t kvm,
   struct kvm_assigned_msix_nr *msix_nr);


Re: [KVM PATCH v2 0/4] iosignalfd

2009-05-15 Thread Gregory Haskins

Gregory Haskins wrote:
> [
>
> Applies to kvm.git:833367b57c plus the irqfd patch, v8, as posted here:
>
> http://lkml.org/lkml/2009/5/14/258
>   

I should also mention: NOT FOR INCLUSION

I am still testing this code, so this is an rfc for now.
> ]
>
> This is v2 of the series.  For more details, please see the header to
> patch 4/4.
>
> [
>Changelog:
>
>   v2:
>*) added optional data-matching capability (via cookie field)
>*) changed name from iofd to iosignalfd
>*) added io_bus unregister function
>*) implemented deassign feature
>
>   v1:
>*) original release (integrated into irqfd v7 series as "iofd")
> ]
>
> ---
>
> Gregory Haskins (4):
>   kvm: add iosignalfd support
>   kvm: add io_bus unregister function
>   kvm: add return value to kvm_io_bus_register_dev
>   eventfd: export eventfd interfaces for module use
>
>
>  arch/x86/kvm/i8254.c  |7 +-
>  arch/x86/kvm/i8259.c  |5 +
>  fs/eventfd.c  |3 +
>  include/linux/kvm.h   |   15 
>  include/linux/kvm_host.h  |   10 ++-
>  virt/kvm/coalesced_mmio.c |4 +
>  virt/kvm/eventfd.c|  154 +
>  virt/kvm/ioapic.c |4 +
>  virt/kvm/kvm_main.c   |   62 --
>  9 files changed, 249 insertions(+), 15 deletions(-)
>
>   





[KVM PATCH v2 4/4] kvm: add iosignalfd support

2009-05-15 Thread Gregory Haskins

iosignalfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
signal when written to by a guest.  Host userspace can register any arbitrary
IO address with a corresponding eventfd and then pass the eventfd to a
specific end-point of interest for handling.

Normal IO requires a blocking round-trip since the operation may cause
side-effects in the emulated model or may return data to the caller.
Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
"heavy-weight" exit back to userspace, and is ultimately serviced by qemu's
device model synchronously before returning control back to the vcpu.

However, there is a subclass of IO which acts purely as a trigger for
other IO (such as to kick off an out-of-band DMA request, etc).  For these
patterns, the synchronous call is particularly expensive since we really
only want to simply get our notification transmitted asynchronously and
return as quickly as possible.  All the synchronous infrastructure to ensure
proper data-dependencies are met in the normal IO case is just unnecessary
overhead for signalling.  This adds additional computational load on the
system, as well as latency to the signalling path.

Therefore, we provide a mechanism for registration of an in-kernel trigger
point that allows the VCPU to only require a very brief, lightweight
exit just long enough to signal an eventfd.  This also means that any
clients compatible with the eventfd interface (which includes userspace
and kernelspace equally well) can now register to be notified. The end
result should be a more flexible and higher performance notification API
for the backend KVM hypervisor and peripheral components.
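
For readers unfamiliar with eventfd, here is a minimal userspace sketch of the consumer side. It relies only on the standard eventfd(2) interface: each signal adds to a 64-bit counter, and one read() returns the accumulated count and resets it. The write() calls stand in for the signals that the in-kernel trigger would generate on guest IO, and the helper names ring() and drain() are hypothetical.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal the eventfd once; returns 0 on success, -1 on error. */
static int ring(int efd)
{
    uint64_t one = 1;

    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Read and reset the counter.  Returns the number of signals since the
 * last drain, or 0 if the read fails (for a non-blocking eventfd that
 * includes the case where nothing is pending).
 */
static uint64_t drain(int efd)
{
    uint64_t count = 0;

    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return 0;
    return count;
}
```

A consumer would typically create the descriptor with eventfd(0, EFD_NONBLOCK), hand it to whatever produces the signals, and call drain() whenever poll() reports it readable. Several signals arriving before the consumer runs are coalesced into one wakeup carrying the total count.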

To test this theory, we built a test-harness called "doorbell".  This
module has a function called "doorbell_ring()" which simply increments a
counter for each time the doorbell is signaled.  It supports signalling
from either an eventfd, or an ioctl().

We then wired up two paths to the doorbell: one goes through QEMU, via a
registered io region and the doorbell ioctl().  The other is direct, via
iosignalfd.

You can download this test harness here:

ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2

The measured results are as follows:

qemu-mmio:   110000 iops, 9.09us rtt
iosignalfd-mmio: 200100 iops, 5.00us rtt
iosignalfd-pio:  367300 iops, 2.72us rtt

I didn't measure qemu-pio, because I have to figure out how to register a
PIO region with qemu's device model, and I got lazy.  However, for now we
can extrapolate: based on the data from the NULLIO runs (+2.56us for MMIO
and -350ns for HC), we get:

qemu-pio:  153139 iops, 6.53us rtt
iosignalfd-hc: 412585 iops, 2.37us rtt

These are just for fun, for now, until I can gather more data.
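
The qemu-pio estimate follows from the measured qemu-mmio round trip: subtract the 2.56us MMIO penalty from 9.09us and invert the result. A small sketch of that arithmetic (the helper name iops_from_rtt_us() is hypothetical):

```c
/* Convert a round-trip time in microseconds to whole operations per second. */
static long iops_from_rtt_us(double rtt_us)
{
    return (long)(1e6 / rtt_us);
}
```

Calling iops_from_rtt_us(9.09 - 2.56) reproduces the 153139 iops quoted for qemu-pio.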

Here is a graph for your convenience:

http://developer.novell.com/wiki/images/7/76/Iofd-chart.png

The conclusion to draw is that we save about 4us by skipping the userspace
hop.



Signed-off-by: Gregory Haskins 
---

 include/linux/kvm.h  |   15 
 include/linux/kvm_host.h |2 +
 virt/kvm/eventfd.c   |  154 ++
 virt/kvm/kvm_main.c  |   13 
 4 files changed, 184 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index a1ecc6a..9372b12 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -292,6 +292,19 @@ struct kvm_guest_debug {
struct kvm_guest_debug_arch arch;
 };
 
+#define KVM_IOSIGNALFD_FLAG_DEASSIGN  (1 << 0)
+#define KVM_IOSIGNALFD_FLAG_PIO       (1 << 1)
+#define KVM_IOSIGNALFD_FLAG_COOKIE    (1 << 2)
+
+struct kvm_iosignalfd {
+   __u64 cookie;
+   __u64 addr;
+   __u32 len;
+   __u32 fd;
+   __u32 flags;
+   __u8  pad[12];
+};
+
 #define KVM_TRC_SHIFT   16
 /*
  * kvm trace categories
@@ -416,6 +429,7 @@ struct kvm_trace_rec {
 /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */
 #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30
 #define KVM_CAP_IRQFD 31
+#define KVM_CAP_IOSIGNALFD 32
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -509,6 +523,7 @@ struct kvm_irqfd {
_IOW(KVMIO, 0x74, struct kvm_assigned_msix_entry)
 #define KVM_DEASSIGN_DEV_IRQ   _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
 #define KVM_IRQFD  _IOW(KVMIO, 0x76, struct kvm_irqfd)
+#define KVM_IOSIGNALFD _IOW(KVMIO, 0x77, struct kvm_iosignalfd)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 214089f..4e4b174 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -137,6 +137,7 @@ struct kvm {
struct kvm_io_bus mmio_bus;
struct kvm_io_bus pio_bus;
struct list_head irqfds;
+   struct list_head iosignalfds;
struct kvm_vm_stat stat;
struct kvm_arch arch;
atomic_t users_count;
@@ -530,5 +531,6 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
 
 int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int fla

[KVM PATCH v2 3/4] kvm: add io_bus unregister function

2009-05-15 Thread Gregory Haskins

We want to support the notion of dynamic MMIO/PIO registrations and
therefore will need to support both register as well as unregister.

However, the current io_bus code is structured as a linear array and
is not conducive to unregistering, so refactor to allow "holes" in the
array.  We then enhance the API with an unregister function.
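
A userspace sketch of the slot-array scheme, assuming a plain pointer array in which NULL marks a free slot. Locking is left out here, whereas the patch takes a spinlock around both operations, and the struct and function names are simplified stand-ins for the kernel ones.

```c
#include <errno.h>
#include <stddef.h>

#define NR_SLOTS 6   /* mirrors NR_IOBUS_DEVS */

struct io_dev { int id; };
struct io_bus { struct io_dev *devs[NR_SLOTS]; };

/* Store dev in the first free slot; -ENOSPC if every slot is taken. */
static int bus_register(struct io_bus *bus, struct io_dev *dev)
{
    for (size_t i = 0; i < NR_SLOTS; i++) {
        if (bus->devs[i] == NULL) {
            bus->devs[i] = dev;
            return 0;
        }
    }
    return -ENOSPC;
}

/* Clear the slot holding dev, leaving a hole; -ENOENT if it is not there. */
static int bus_unregister(struct io_bus *bus, struct io_dev *dev)
{
    for (size_t i = 0; i < NR_SLOTS; i++) {
        if (bus->devs[i] == dev) {
            bus->devs[i] = NULL;
            return 0;
        }
    }
    return -ENOENT;
}
```

Because a hole can sit anywhere in the array, every walker has to visit all NR_SLOTS entries and skip NULL ones, which is why the lookup and destroy loops in the patch drop dev_count and gain a NULL check.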

Signed-off-by: Gregory Haskins 
---

 include/linux/kvm_host.h |4 +++-
 virt/kvm/kvm_main.c  |   48 ++
 2 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 94c1a11..214089f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -52,7 +52,7 @@ extern struct kmem_cache *kvm_vcpu_cache;
  * in one place.
  */
 struct kvm_io_bus {
-   int   dev_count;
+   spinlock_t lock;
 #define NR_IOBUS_DEVS 6
struct kvm_io_device *devs[NR_IOBUS_DEVS];
 };
@@ -63,6 +63,8 @@ struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
  gpa_t addr, int len, int is_write);
 int kvm_io_bus_register_dev(struct kvm_io_bus *bus,
struct kvm_io_device *dev);
+int kvm_io_bus_unregister_dev(struct kvm_io_bus *bus,
+   struct kvm_io_device *dev);
 
 struct kvm_vcpu {
struct kvm *kvm;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 60ba0cf..5f5e443 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2433,16 +2433,18 @@ static struct notifier_block kvm_reboot_notifier = {
 void kvm_io_bus_init(struct kvm_io_bus *bus)
 {
memset(bus, 0, sizeof(*bus));
+   spin_lock_init(&bus->lock);
 }
 
 void kvm_io_bus_destroy(struct kvm_io_bus *bus)
 {
int i;
 
-   for (i = 0; i < bus->dev_count; i++) {
+   for (i = 0; i < NR_IOBUS_DEVS; i++) {
struct kvm_io_device *pos = bus->devs[i];
 
-   kvm_iodevice_destructor(pos);
+   if (pos)
+   kvm_iodevice_destructor(pos);
}
 }
 
@@ -2451,10 +2453,10 @@ struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
 {
int i;
 
-   for (i = 0; i < bus->dev_count; i++) {
+   for (i = 0; i < NR_IOBUS_DEVS; i++) {
struct kvm_io_device *pos = bus->devs[i];
 
-   if (pos->in_range(pos, addr, len, is_write))
+   if (pos && pos->in_range(pos, addr, len, is_write))
return pos;
}
 
@@ -2463,12 +2465,42 @@ struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
 
 int kvm_io_bus_register_dev(struct kvm_io_bus *bus, struct kvm_io_device *dev)
 {
-   if (bus->dev_count > (NR_IOBUS_DEVS-1))
-   return -ENOSPC;
+   int i;
 
-   bus->devs[bus->dev_count++] = dev;
+   spin_lock(&bus->lock);
 
-   return 0;
+   for (i = 0; i < NR_IOBUS_DEVS; i++) {
+   if (bus->devs[i])
+   continue;
+
+   bus->devs[i] = dev;
+   spin_unlock(&bus->lock);
+   return 0;
+   }
+
+   spin_unlock(&bus->lock);
+
+   return -ENOSPC;
+}
+
+int kvm_io_bus_unregister_dev(struct kvm_io_bus *bus, struct kvm_io_device *dev)
+{
+   int i;
+
+   spin_lock(&bus->lock);
+
+   for (i = 0; i < NR_IOBUS_DEVS; i++) {
+
+   if (bus->devs[i] == dev) {
+   bus->devs[i] = NULL;
+   spin_unlock(&bus->lock);
+   return 0;
+   }
+   }
+
+   spin_unlock(&bus->lock);
+
+   return -ENOENT;
 }
 
 static struct notifier_block kvm_cpu_notifier = {


[KVM PATCH v2 2/4] kvm: add return value to kvm_io_bus_register_dev

2009-05-15 Thread Gregory Haskins

Today this function returns void and will internally BUG_ON if it fails.
We want to create dynamic MMIO/PIO entries driven from userspace later in
the series, so enhance this API to return an error code on failure.

We also fix up all the callsites to check the return code and BUG_ON if
it fails.

The net result should be identical behavior both before and after this
patch.  We are simply laying the groundwork for the dynamic usage.
Signed-off-by: Gregory Haskins 
---

 arch/x86/kvm/i8254.c  |7 +--
 arch/x86/kvm/i8259.c  |5 -
 include/linux/kvm_host.h  |4 ++--
 virt/kvm/coalesced_mmio.c |4 +++-
 virt/kvm/ioapic.c |4 +++-
 virt/kvm/kvm_main.c   |7 +--
 6 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 4d6f0d2..cc274d6 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -564,6 +564,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm)
 {
struct kvm_pit *pit;
struct kvm_kpit_state *pit_state;
+   int ret;
 
pit = kzalloc(sizeof(struct kvm_pit), GFP_KERNEL);
if (!pit)
@@ -584,13 +585,15 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm)
pit->dev.write = pit_ioport_write;
pit->dev.in_range = pit_in_range;
pit->dev.private = pit;
-   kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev);
+   ret = kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev);
+   BUG_ON(ret < 0);
 
pit->speaker_dev.read = speaker_ioport_read;
pit->speaker_dev.write = speaker_ioport_write;
pit->speaker_dev.in_range = speaker_in_range;
pit->speaker_dev.private = pit;
-   kvm_io_bus_register_dev(&kvm->pio_bus, &pit->speaker_dev);
+   ret = kvm_io_bus_register_dev(&kvm->pio_bus, &pit->speaker_dev);
+   BUG_ON(ret < 0);
 
kvm->arch.vpit = pit;
pit->kvm = kvm;
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 1ccb50c..7d39b5b 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -519,6 +519,8 @@ static void pic_irq_request(void *opaque, int level)
 struct kvm_pic *kvm_create_pic(struct kvm *kvm)
 {
struct kvm_pic *s;
+   int ret;
+
s = kzalloc(sizeof(struct kvm_pic), GFP_KERNEL);
if (!s)
return NULL;
@@ -538,6 +540,7 @@ struct kvm_pic *kvm_create_pic(struct kvm *kvm)
s->dev.write = picdev_write;
s->dev.in_range = picdev_in_range;
s->dev.private = s;
-   kvm_io_bus_register_dev(&kvm->pio_bus, &s->dev);
+   ret = kvm_io_bus_register_dev(&kvm->pio_bus, &s->dev);
+   BUG_ON(ret < 0);
return s;
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index dc91610..94c1a11 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -61,8 +61,8 @@ void kvm_io_bus_init(struct kvm_io_bus *bus);
 void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus,
  gpa_t addr, int len, int is_write);
-void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
-struct kvm_io_device *dev);
+int kvm_io_bus_register_dev(struct kvm_io_bus *bus,
+   struct kvm_io_device *dev);
 
 struct kvm_vcpu {
struct kvm *kvm;
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 5ae620d..19945e1 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -86,6 +86,7 @@ static void coalesced_mmio_destructor(struct kvm_io_device *this)
 int kvm_coalesced_mmio_init(struct kvm *kvm)
 {
struct kvm_coalesced_mmio_dev *dev;
+   int ret;
 
dev = kzalloc(sizeof(struct kvm_coalesced_mmio_dev), GFP_KERNEL);
if (!dev)
@@ -96,7 +97,8 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
dev->dev.private  = dev;
dev->kvm = kvm;
kvm->coalesced_mmio_dev = dev;
-   kvm_io_bus_register_dev(&kvm->mmio_bus, &dev->dev);
+   ret = kvm_io_bus_register_dev(&kvm->mmio_bus, &dev->dev);
+   BUG_ON(ret < 0);
 
return 0;
 }
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 1eddae9..3eee4c9 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -317,6 +317,7 @@ void kvm_ioapic_reset(struct kvm_ioapic *ioapic)
 int kvm_ioapic_init(struct kvm *kvm)
 {
struct kvm_ioapic *ioapic;
+   int ret;
 
ioapic = kzalloc(sizeof(struct kvm_ioapic), GFP_KERNEL);
if (!ioapic)
@@ -328,7 +329,8 @@ int kvm_ioapic_init(struct kvm *kvm)
ioapic->dev.in_range = ioapic_in_range;
ioapic->dev.private = ioapic;
ioapic->kvm = kvm;
-   kvm_io_bus_register_dev(&kvm->mmio_bus, &ioapic->dev);
+   ret = kvm_io_bus_register_dev(&kvm->mmio_bus, &ioapic->dev);
+   BUG_ON(ret < 0);
return 0;
 }
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b2db766..60ba0cf 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2

[KVM PATCH v2 1/4] eventfd: export eventfd interfaces for module use

2009-05-15 Thread Gregory Haskins

We want to use eventfd from KVM which can be compiled as a module, so
export the interfaces.

Signed-off-by: Gregory Haskins 
---

 fs/eventfd.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 2a701d5..3f0e197 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct eventfd_ctx {
wait_queue_head_t wqh;
@@ -56,6 +57,7 @@ int eventfd_signal(struct file *file, int n)
 
return n;
 }
+EXPORT_SYMBOL_GPL(eventfd_signal);
 
 static int eventfd_release(struct inode *inode, struct file *file)
 {
@@ -197,6 +199,7 @@ struct file *eventfd_fget(int fd)
 
return file;
 }
+EXPORT_SYMBOL_GPL(eventfd_fget);
 
 SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
 {


[KVM PATCH v2 0/4] iosignalfd

2009-05-15 Thread Gregory Haskins

[

Applies to kvm.git:833367b57c plus the irqfd patch, v8, as posted here:

http://lkml.org/lkml/2009/5/14/258

]

This is v2 of the series.  For more details, please see the header to
patch 4/4.

[
   Changelog:

  v2:
   *) added optional data-matching capability (via cookie field)
   *) changed name from iofd to iosignalfd
   *) added io_bus unregister function
   *) implemented deassign feature

  v1:
   *) original release (integrated into irqfd v7 series as "iofd")
]

---

Gregory Haskins (4):
  kvm: add iosignalfd support
  kvm: add io_bus unregister function
  kvm: add return value to kvm_io_bus_register_dev
  eventfd: export eventfd interfaces for module use


 arch/x86/kvm/i8254.c  |7 +-
 arch/x86/kvm/i8259.c  |5 +
 fs/eventfd.c  |3 +
 include/linux/kvm.h   |   15 
 include/linux/kvm_host.h  |   10 ++-
 virt/kvm/coalesced_mmio.c |4 +
 virt/kvm/eventfd.c|  154 +
 virt/kvm/ioapic.c |4 +
 virt/kvm/kvm_main.c   |   62 --
 9 files changed, 249 insertions(+), 15 deletions(-)


[PATCH v2] kmod: Add distclean rule

2009-05-15 Thread Jan Kiszka

Jan Kiszka wrote:
> --- a/Makefile
> +++ b/Makefile
> @@ -68,3 +68,6 @@ rpm:all
>  
>  clean:
>   $(MAKE) -C $(KERNELDIR) M=`pwd` $@
> +
> +distclean:
> + rm -f config.kbuild config.mak

This one is "cleaner":

->

Remove the configure output config.kbuild and config.mak via distclean.

Signed-off-by: Jan Kiszka 
---

 Makefile |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/Makefile b/Makefile
index dad5f0b..a4c59c9 100644
--- a/Makefile
+++ b/Makefile
@@ -68,3 +68,6 @@ rpm:  all
 
 clean:
$(MAKE) -C $(KERNELDIR) M=`pwd` $@
+
+distclean: clean
+   rm -f config.kbuild config.mak

Re: [PATCH] bios: Fix MADT corruption and RSDT size when using -acpitable

2009-05-15 Thread Marcelo Tosatti

Beth,

On Thu, May 14, 2009 at 12:20:29PM -0400, Beth Kon wrote:
> Anthony Liguori wrote:
>> Vincent Minet wrote:
>>> External ACPI tables are counted twice for the RSDT size and the load
>>> address for the first external table is in the MADT (interrupt override
>>> entries are overwritten).
>>>
>>> Signed-off-by: Vincent Minet 
>>>   
>>
>> Beth,
>>
>> I think you had a patch attempting to address the same issue.  It was  
>> a bit more involved though.
>>
>> Which is the proper fix and are they both to the same problem?
> They are for 2 different bases. My patch was for qemu's bochs bios and  
> this is for qemu-kvm/kvm/bios/rombios32.c. They are pretty divergent in  
> this area of setting up the ACPI tables. My patch is still needed for  
> the qemu base. I hope we'll be getting to one base soon :-)
>
> Assuming the intent of the code was for MAX_RSDT_ENTRIES to include  
> external_tables, this patch looks correct. I think one additional check  
> would be needed (in my patch) to make sure that the code doesn't exceed  
> MAX_RSDT_ENTRIES when the external tables are being loaded.
>
> My patch also puts all the code that calculates madt_size in the same  
> place, at the beginning of the table layout. I believe this is neater  
> and will avoid problems like this one in the future. As much as  
> possible, I think it best to get all the tables laid out, then fill
> them in. If for some reason this is not acceptable, we need to add a big
> note that no tables should be laid out after the madt because the madt
> may grow further down in the code and overwrite the other table.

I like this better too, see questions/comments below.
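
The up-front layout described in the quoted text can be sketched as a small helper that computes the final MADT size before any later table is placed. This is only an illustration under stated assumptions: the name madt_total_size, its parameters, and any entry sizes passed to it are hypothetical and are not taken from rombios32.c; only the loop over 16 ISA IRQs mirrors the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the full MADT size in one place, so that
 * no other table is placed in memory the MADT will later grow into.
 * All sizes are passed in by the caller; none are real structure sizes.
 */
size_t madt_total_size(size_t header_size, size_t cpu_entry_size,
                       unsigned max_cpus, size_t io_apic_size,
                       size_t int_override_size, uint32_t isa_irq_mask)
{
    size_t size = header_size + cpu_entry_size * max_cpus + io_apic_size;
    unsigned i;

    /* Only 16 ISA IRQs exist, so higher mask bits make no sense. */
    assert((isa_irq_mask >> 16) == 0);

    /* One interrupt override entry per IRQ set in the mask. */
    for (i = 0; i < 16; i++) {
        if (isa_irq_mask & (1u << i))
            size += int_override_size;
    }
    return size;
}
```

Advancing the layout address by this total once, before any following table is placed, is what keeps override entries added later from overwriting a neighbouring table.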

>>
>> Regards,
>>
>> Anthony Liguori
>>
>>> ---
>>>  kvm/bios/rombios32.c |3 ++-
>>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
>>> index cbd5f15..289361b 100755
>>> --- a/kvm/bios/rombios32.c
>>> +++ b/kvm/bios/rombios32.c
>>> @@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
>>>  addr = base_addr = ram_size - ACPI_DATA_SIZE;
>>>  rsdt_addr = addr;
>>>  rsdt = (void *)(addr);
>>> -rsdt_size = sizeof(*rsdt) + external_tables * 4;
>>> +rsdt_size = sizeof(*rsdt);
>>>  addr += rsdt_size;
>>>   fadt_addr = addr;
>>> @@ -1787,6 +1787,7 @@ void acpi_bios_init(void)
>>>  }
>>>  int_override++;
>>>  madt_size += sizeof(struct madt_int_override);
>>> +addr += sizeof(struct madt_int_override);
>>>  }
>>>  acpi_build_table_header((struct acpi_table_header *)madt,
>>>  "APIC", madt_size, 1);
>>>   


> diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
> index cbd5f15..23835b6 100755
> --- a/kvm/bios/rombios32.c
> +++ b/kvm/bios/rombios32.c
> @@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
>  addr = base_addr = ram_size - ACPI_DATA_SIZE;
>  rsdt_addr = addr;
>  rsdt = (void *)(addr);
> -rsdt_size = sizeof(*rsdt) + external_tables * 4;
> +rsdt_size = sizeof(*rsdt);
>  addr += rsdt_size;
>  
>  fadt_addr = addr;
> @@ -1665,6 +1665,7 @@ void acpi_bios_init(void)
>  
>  addr = (addr + 7) & ~7;
>  madt_addr = addr;
> +madt = (void *)(addr);
>  madt_size = sizeof(*madt) +
>  sizeof(struct madt_processor_apic) * MAX_CPUS +
>  #ifdef BX_QEMU
> @@ -1672,7 +1673,11 @@ void acpi_bios_init(void)
>  #else
>  sizeof(struct madt_io_apic);
>  #endif
> -madt = (void *)(addr);
> +for ( i = 0; i < 16; i++ ) {
> +if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
> +madt_size += sizeof(struct madt_int_override);
> +}
> +}
>  addr += madt_size;

This bug could only affect the HPET descriptor, right?

>  #ifdef BX_QEMU
> @@ -1786,7 +1791,6 @@ void acpi_bios_init(void)
>  continue;
>  }
>  int_override++;
> -madt_size += sizeof(struct madt_int_override);
>  }
>  acpi_build_table_header((struct acpi_table_header *)madt,
>  "APIC", madt_size, 1);
> @@ -1868,17 +1872,6 @@ void acpi_bios_init(void)
>  acpi_build_table_header((struct  acpi_table_header *)hpet,
>   "HPET", sizeof(*hpet), 1);
>  #endif
> -
> -acpi_additional_tables(); /* resets cfg to required entry */
> -for(i = 0; i < external_tables; i++) {
> -uint16_t len;
> -if(acpi_load_table(i, addr, &len) < 0)
> -BX_PANIC("Failed to load ACPI table from QEMU\n");
> -rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
> -addr += len;
> -if(addr >= ram_size)
> -BX_PANIC("ACPI table overflow\n");
> -}

The external ACPI tables fix(es) are logically separate from the MADT
int override size calculation, so could they be separate patches?

>  #endif
>  
>  /* RSDT */
> @@ -1891,6 +1884,16 @@ void acpi_bios_init(void)
>  //  rsdt->table_offset_e

[PATCH] kmod: Update .gitignore

2009-05-15 Thread Jan Kiszka

Signed-off-by: Jan Kiszka 
---

 .gitignore |  118 +++-
 1 files changed, 60 insertions(+), 58 deletions(-)

diff --git a/.gitignore b/.gitignore
index 22a8200..bdebd0a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,64 +3,66 @@
 *~
 *.flat
 *.a
-config.mak
 .*.cmd
-qemu/config-host.h
-qemu/config-host.mak
-user/test/bootstrap
-user/kvmctl
-qemu/dyngen
-qemu/x86_64-softmmu
-qemu/qemu-img
-qemu/qemu-nbd
 *.ko
 *.mod.c
-bios/*.bin
-bios/*.sym
-bios/*.txt
-bios/acpi-dsdt.aml
-vgabios/*.bin
-vgabios/*.txt
-extboot/extboot.bin
-extboot/extboot.img
-extboot/signrom
-kernel/config.kbuild
-kernel/modules.order
-kernel/Module.symvers
-kernel/Modules.symvers
-kernel/Module.markers
-kernel/.tmp_versions
-kernel/include-compat/asm
-kernel/include-compat/asm-x86/asm-x86
-kernel/include
-kernel/x86/modules.order
-kernel/x86/i825[49].[ch]
-kernel/x86/kvm_main.c
-kernel/x86/kvm_svm.h
-kernel/x86/vmx.[ch]
-kernel/x86/svm.[ch]
-kernel/x86/mmu.[ch]
-kernel/x86/paging_tmpl.h
-kernel/x86/x86_emulate.[ch]
-kernel/x86/ioapic.[ch]
-kernel/x86/iodev.h
-kernel/x86/irq.[ch]
-kernel/x86/kvm_trace.c
-kernel/x86/lapic.[ch]
-kernel/x86/tss.h
-kernel/x86/x86.[ch]
-kernel/x86/coalesced_mmio.[ch]
-kernel/x86/kvm_cache_regs.h
-kernel/x86/vtd.c
-kernel/x86/irq_comm.c
-kernel/x86/timer.c
-kernel/x86/kvm_timer.h
-kernel/x86/iommu.c
-qemu/pc-bios/extboot.bin
-qemu/qemu-doc.html
-qemu/*.[18]
-qemu/*.pod
-qemu/qemu-tech.html
-qemu/qemu-options.texi
-user/kvmtrace
-user/test/x86/bootstrap
+config.kbuild
+config.mak
+modules.order
+Module.symvers
+Modules.symvers
+Module.markers
+.tmp_versions
+include-compat/asm
+include-compat/asm-x86/asm-x86
+include
+x86/modules.order
+x86/i825[49].[ch]
+x86/kvm_main.c
+x86/kvm_svm.h
+x86/vmx.[ch]
+x86/svm.[ch]
+x86/mmu.[ch]
+x86/paging_tmpl.h
+x86/x86_emulate.[ch]
+x86/ioapic.[ch]
+x86/iodev.h
+x86/irq.[ch]
+x86/kvm_trace.c
+x86/lapic.[ch]
+x86/tss.h
+x86/x86.[ch]
+x86/coalesced_mmio.[ch]
+x86/kvm_cache_regs.h
+x86/vtd.c
+x86/irq_comm.c
+x86/timer.c
+x86/kvm_timer.h
+x86/iommu.c
+ia64/asm-offsets.c
+ia64/coalesced_mmio.[ch]
+ia64/ioapic.[ch]
+ia64/iodev.h
+ia64/iommu.c
+ia64/irq.h
+ia64/irq_comm.c
+ia64/kvm-ia64.c
+ia64/kvm_fw.c
+ia64/kvm_lib.c
+ia64/kvm_main.c
+ia64/kvm_minstate.h
+ia64/kvm_trace.c
+ia64/lapic.h
+ia64/memcpy.S
+ia64/memset.S
+ia64/misc.h
+ia64/mmio.c
+ia64/optvfault.S
+ia64/process.c
+ia64/trampoline.S
+ia64/vcpu.[ch]
+ia64/vmm.c
+ia64/vmm_ivt.S
+ia64/vti.h
+ia64/vtlb.c
+.stgit-*

[PATCH] kmod: Add distclean rule

2009-05-15 Thread Jan Kiszka

Remove the configure output config.kbuild and config.mak via distclean.

Signed-off-by: Jan Kiszka 
---

 Makefile |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/Makefile b/Makefile
index dad5f0b..75aab71 100644
--- a/Makefile
+++ b/Makefile
@@ -68,3 +68,6 @@ rpm:  all
 
 clean:
$(MAKE) -C $(KERNELDIR) M=`pwd` $@
+
+distclean:
+   rm -f config.kbuild config.mak

Re: XP smp using a lot of CPU

2009-05-15 Thread Ross Boylan

On Fri, 2009-05-15 at 11:56 -0300, Marcelo Tosatti wrote:
> Ross,
> 
> Can you confirm the qemu process CPU consumption is down to acceptable
> levels if you don't specify -no-acpi?
> 
> Thanks
Simply starting without -no-acpi did not help.  I tried to do a Windows
XP repair, but seemed to end up basically doing a reinstall.  The system
now seems to be hung.

I'm probably going to end up trying a fresh install; I'll report more
results when I have them.
> 
> 
> On Thu, May 14, 2009 at 01:01:11PM -0700, Ross Boylan wrote:
> > On Wed, 2009-05-13 at 09:56 +0300, Avi Kivity wrote:
> > > Ross Boylan wrote:
> > > > I just installed XP into a new VM, specifying -smp 2 for the machine.
> > > > According to top, it's using nearly 200% of a cpu even when I'm not
> > > > doing anything.
> > > >
> > > > Is this real CPU usage, or just a reporting problem (just as my disk
> > > > image is big according to ls, but isn't really)?
> > > >
> > > > If it's real, is there anything I can do about it?
> > > >
> > > > kvm 0.7.2 on Debian Lenny (but 2.6.29 kernel), amd64.  Xeon chips; 32
> > > > bit version of XP pro installed, now fully patched (including the
> > > > Windows Genuine Advantage stuff, though I cancelled it when it wanted to
> > > > run).  
> > > >
> > > > Task manager in XP shows virtually no CPU usage.
> > > >
> > > > Please cc me on responses.
> > > >
> > > >   
> > > 
> > > I'm guessing Windows uses a pio port to sleep, which kvm doesn't 
> > > support.  Can you provide kvm_stat output?
> > markov:~# kvm_stat -1
> > efer_reload                  0         0
> > exits                  9921384       566
> > fpu_reload              267970         0
> > halt_exits                   1         0
> > halt_wakeup                  3         0
> > host_state_reload      4026050        17
> > hypercalls                   0         0
> > insn_emulation         1329455         0
> > insn_emulation_fail        154         0
> > invlpg                  176773         0
> > io_exits               3818270         0
> > irq_exits              1434046       566
> > irq_injections          326730         0
> > irq_window              164827         0
> > largepages                   0         0
> > mmio_exits               35892         0
> > mmu_cache_miss           29760         0
> > mmu_flooded              19908         0
> > mmu_pde_zapped           15557         0
> > mmu_pte_updated          82088         0
> > mmu_pte_write            97990         0
> > mmu_recycled                 0         0
> > mmu_shadow_zapped        43276         0
> > mmu_unsync                 891         0
> > mmu_unsync_global            0         0
> > nmi_injections               0         0
> > nmi_window                   0         0
> > pf_fixed               1231164         0
> > pf_guest                276083         0
> > remote_tlb_flush        115606         0
> > request_irq                  0         0
> > request_nmi                  0         0
> > signal_exits                 5         0
> > tlb_flush               960198         0
> > 
> > This is with the VM displaying the XP "It is now safe to turn off your
> > computer".  CPU remains about 200% from kvm.  Invoked with
> > sudo vdeq kvm -net nic,vlan=1,macaddr=52:54:a0:12:01:00 \
> > -net vde,vlan=1,sock=/var/run/vde2/tap0.ctl \
> > -std-vga -hda XP.raw \
> > -boot c \
> > -soundhw es1370 -localtime -no-acpi  -m 1G -smp 2
> > 
> > Next I'll try fiddling with acpi.
> > 
> > -- 
> > Ross Boylan  wk:  (415) 514-8146
> > 185 Berry St #5700   r...@biostat.ucsf.edu
> > Dept of Epidemiology and Biostatistics   fax: (415) 514-8150
> > University of California, San Francisco
> > San Francisco, CA 94107-1739 hm:  (415) 550-1062
> > 

Re: just a dump

2009-05-15 Thread Marcelo Tosatti

On Wed, May 13, 2009 at 12:20:26AM +0200, Hans de Bruin wrote:
> Hans de Bruin wrote:
>> Starting two VMs simultaneously ends in a crash
>>
>> linux 30-rc5
>> kvm-qemu kvm-85-378-g143eb2b
>> proc AMD dualcore
>>
>> vm's like:
>>
>> #!/bin/sh
>> n=10
>> cdrom=/iso/server2008x64.iso
>> drive=file=/kvm/disks/vm$n
>> mem=1024
>> cpu=qemu64
>> vga=std
>> mac=52:54:00:12:34:$n
>> bridge=br1
>>
>> qemu-system-x86_64 -cdrom $cdrom -drive $drive -m $mem -cpu $cpu -vga  
>> $vga -net nic,macaddr=$mac -net tap,script=/etc/qemu/$bridge
>>
>>
> another dmesg:

Hans,

The oopses below point to the possibility of a hardware problem, 
similar to:

https://bugzilla.redhat.com/show_bug.cgi?id=480779

Can you please rule it out with memtest86?

>
> device tap0 entered promiscuous mode
> br1: topology change detected, propagating
> br1: port 1(tap0) entering forwarding state
> device tap1 entered promiscuous mode
> br1: topology change detected, propagating
> br1: port 2(tap1) entering forwarding state
> tap0: no IPv6 routers present
> tap1: no IPv6 routers present
> kvm: 2915: cpu0 unimplemented perfctr wrmsr: 0xc001 data 0x0
> kvm: 2915: cpu0 unimplemented perfctr wrmsr: 0xc0010001 data 0x0
> kvm: 2915: cpu0 unimplemented perfctr wrmsr: 0xc0010002 data 0x0
> kvm: 2915: cpu0 unimplemented perfctr wrmsr: 0xc0010003 data 0x0
> kvm: 2914: cpu0 unimplemented perfctr wrmsr: 0xc001 data 0x0
> kvm: 2914: cpu0 unimplemented perfctr wrmsr: 0xc0010001 data 0x0
> kvm: 2914: cpu0 unimplemented perfctr wrmsr: 0xc0010002 data 0x0
> kvm: 2914: cpu0 unimplemented perfctr wrmsr: 0xc0010003 data 0x0
> rmap_remove: 880100de5500 8 0->BUG
> [ cut here ]
> kernel BUG at arch/x86/kvm/mmu.c:576!
> invalid opcode:  [#1] SMP
> last sysfs file: /sys/devices/pci:00/:00:10.0/:01:09.0/resource
> CPU 1
> Modules linked in:
> Pid: 2925, comm: qemu-system-x86 Not tainted 2.6.30-rc5 #3 System  
> Product Name
> RIP: 0010:[]  [] rmap_remove+0x151/0x200
> RSP: 0018:8801a0d379f8  EFLAGS: 00010292
> RAX: 002a RBX: 0008 RCX: 809a3b40
> RDX: 88002804d000 RSI: 0046 RDI: 809a3a34
> RBP: 8801a0d37a28 R08: 8777 R09: 
> R10:  R11:  R12: 
> R13: 880100de5500 R14: 880101e23580 R15: 8801a0e1c000
> FS:  4270d950(0063) GS:88002804d000() knlGS:07faa000
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 014a8c18 CR3: 0001a0c62000 CR4: 06e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: 0ff0 DR7: 0400
> Process qemu-system-x86 (pid: 2925, threadinfo 8801a0d36000, task  
> 8801af3605a0)
> Stack:
>  8801a0d37a28   
>  0500 880101e23580 8801a0d37ac8 8021ad8d
>   8801 0003020d 0016e772
> Call Trace:
>  [] paging64_sync_page+0x9d/0x1a0
>  [] ? rmap_write_protect+0xd5/0x150
>  [] kvm_sync_page+0x6b/0x90
>  [] mmu_sync_children+0xcd/0x120
>  [] ? x86_emulate_insn+0x292/0x4d30
>  [] ? x86_decode_insn+0x412/0xf10
>  [] mmu_sync_roots+0xc2/0xd0
>  [] kvm_mmu_load+0x138/0x200
>  [] ? handle_exit+0x14a/0x2c0
>  [] kvm_arch_vcpu_ioctl_run+0x863/0xaa0
>  [] ? kvm_vm_ioctl+0x165/0x910
>  [] ? do_futex+0x679/0x9a0
>  [] kvm_vcpu_ioctl+0x5d3/0x790
>  [] ? common_interrupt+0xe/0x13
>  [] ? __dequeue_entity+0x2b/0x50
>  [] vfs_ioctl+0x31/0x90
>  [] do_vfs_ioctl+0x2f1/0x4e0
>  [] sys_ioctl+0x82/0xa0
>  [] system_call_fastpath+0x16/0x1b
> Code: 04 75 e7 48 8b 47 20 49 89 fb 48 85 c0 0f 84 b7 00 00 00 48 89 c7  
> eb d0 49 8b 55 00 4c 89 ee 48 c7 c7 b8 2e 7f 80 e8 1f 29
> 04 00 <0f> 0b eb fe 48 8b 4f 18 48 85 c9 0f 94 c2 83 fe 02 0f 9e c0 84
> RIP  [] rmap_remove+0x151/0x200
>  RSP 
> ---[ end trace c11385df745a1fea ]---
> BUG: unable to handle kernel NULL pointer dereference at 0058
> IP: [] mmu_page_remove_parent_pte+0xc/0x100
> PGD 1a0ca8067 PUD 1a0ca9067 PMD 0
> Oops:  [#2] SMP
> last sysfs file: /sys/devices/pci:00/:00:10.0/:01:09.0/resource
> CPU 0
> Modules linked in:
> Pid: 2926, comm: qemu-system-x86 Tainted: G  D2.6.30-rc5 #3  
> System Product Name
> RIP: 0010:[]  []  
> mmu_page_remove_parent_pte+0xc/0x100
> RSP: 0018:8801a0da57a8  EFLAGS: 00010292
> RAX:  RBX:  RCX: 002b
> RDX: e200 RSI: 8800ccac0220 RDI: 
> RBP: 8801a0da57b8 R08: 006a R09: 8800ccd85e70
> R10:  R11:  R12: 8800ccac0220
> R13: 8800ccd85dc0 R14: 0044 R15: 8801a0db
> FS:  40fbc950(0063) GS:880028034000() knlGS:07fd5000
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0058 CR3: 0001a0c63000 CR4: 06e0
> DR0: 000

Re: [PATCH 5/6] Nested SVM: Implement INVLPGA

2009-05-15 Thread Joerg Roedel

On Fri, May 15, 2009 at 10:22:19AM +0200, Alexander Graf wrote:
> SVM adds another way to do INVLPG by ASID which Hyper-V makes use of,
> so let's implement it!
> 
> For now we just do the same thing invlpg does, as asid switching
> means we flush the mmu anyways. That might change one day though.
> 
> Signed-off-by: Alexander Graf 
> ---
>  arch/x86/kvm/svm.c |   14 +-
>  1 files changed, 13 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 30e6b43..b2c6cf3 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1785,6 +1785,18 @@ static int clgi_interception(struct vcpu_svm *svm, 
> struct kvm_run *kvm_run)
>   return 1;
>  }
>  
> +static int invlpga_interception(struct vcpu_svm *svm, struct kvm_run 
> *kvm_run)
> +{
> + struct kvm_vcpu *vcpu = &svm->vcpu;
> + nsvm_printk("INVLPGA\n");
> + svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
> + skip_emulated_instruction(&svm->vcpu);
> +
> + kvm_mmu_reset_context(vcpu);
> + kvm_mmu_load(vcpu);
> + return 1;
> +}
> +

Hmm, since we flush the TLB on every nested-guest entry I think we can
make this function a nop.

>  static int invalid_op_interception(struct vcpu_svm *svm,
>  struct kvm_run *kvm_run)
>  {
> @@ -2130,7 +2142,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
>   [SVM_EXIT_INVD] = emulate_on_interception,
>   [SVM_EXIT_HLT]  = halt_interception,
>   [SVM_EXIT_INVLPG]   = invlpg_interception,
> - [SVM_EXIT_INVLPGA]  = invalid_op_interception,
> + [SVM_EXIT_INVLPGA]  = invlpga_interception,
>   [SVM_EXIT_IOIO] = io_interception,
>   [SVM_EXIT_MSR]  = msr_interception,
>   [SVM_EXIT_TASK_SWITCH]  = task_switch_interception,
> -- 
> 1.6.0.2
> 
> 

-- 
   | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System| 
 Research  | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
 Center| Sitz: Dornach, Gemeinde Aschheim, Landkreis München
   | Registergericht München, HRB Nr. 43632


Re: [PATCH 3/6] Emulator: Inject #PF when page was not found

2009-05-15 Thread Joerg Roedel

On Fri, May 15, 2009 at 10:22:17AM +0200, Alexander Graf wrote:
> If we couldn't find a page on read_emulated, it might be a good
> idea to tell the guest about that and inject a #PF.
> 
> We do the same already for write faults. I don't know why it was
> not implemented for reads.

Have you checked that the emulator will never ever do speculative reads?
This may be the reason why the fault was not injected here.

> 
> Signed-off-by: Alexander Graf 
> ---
>  arch/x86/kvm/x86.c |7 +--
>  1 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5fcde2c..5aa1219 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2131,10 +2131,13 @@ static int emulator_read_emulated(unsigned long addr,
>   goto mmio;
>  
>   if (kvm_read_guest_virt(addr, val, bytes, vcpu)
> - == X86EMUL_CONTINUE)
> + == X86EMUL_CONTINUE) {
>   return X86EMUL_CONTINUE;
> - if (gpa == UNMAPPED_GVA)
> + }
> + if (gpa == UNMAPPED_GVA) {
> + kvm_inject_page_fault(vcpu, addr, 0);
>   return X86EMUL_PROPAGATE_FAULT;
> + }
>  
>  mmio:
>   /*
> -- 
> 1.6.0.2
> 
> 

-- 
   | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System| 
 Research  | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
 Center| Sitz: Dornach, Gemeinde Aschheim, Landkreis München
   | Registergericht München, HRB Nr. 43632


Re: Status of pci passthrough work?

2009-05-15 Thread Amit Shah

On (Fri) May 15 2009 [07:14:05], Passera, Pablo R wrote:
> Hi Amit,
> Thanks for your answer. I was able to get your userspace pvdma
> version. So now I am using the PVDMA-patched kernel and the PVDMA-patched
> userspace. However, I am not able to start the VM. I am running qemu with the
> following options (I am trying without any PCI passthrough first):
> 
> ./qemu/x86_64-softmmu/qemu-system-x86_64 -hda /root/kvm/dm2.img -m 256 -net none
> 
> The SDL window appears, but it hangs after showing the message "Press F12 for
> boot menu.". I am not getting any message in either qemu or dmesg. Do
> you know what could be happening? Maybe a kernel compile option? It would be
> great if you could send me the .config file that you used to compile it, just
> to check the options.

Can you try out a few things, like booting off the 'avi' branch in
userspace and / or the 'kvm' branch of the kernel tree? Just to rule out
the bugs in the device assignment code.

Amit

Re: [PATCH 2/6] MMU: don't bail on PAT bits in PTE

2009-05-15 Thread Joerg Roedel

On Fri, May 15, 2009 at 12:53:42PM +0200, Alexander Graf wrote:
>
> On 15.05.2009, at 12:25, Michael S. Tsirkin wrote:
>
>> On Fri, May 15, 2009 at 10:22:16AM +0200, Alexander Graf wrote:
>>> A 64bit PTE can have bit7 set to 1 which means "Use this bit for the 
>>> PAT".
>>> Currently KVM's MMU code treats this bit as reserved, even though  
>>> it's not.
>>>
>>> As long as we're not required to make use of the PAT bits which is  
>>> only
>>> required for DMA/MMIO from my understanding, we can safely ignore it.
>>>
>>> Hyper-V uses this bit for kernel PTEs.
>>>
>>> Signed-off-by: Alexander Graf 
>>> ---
>>> arch/x86/kvm/mmu.c |2 +-
>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>> index 8fcdae9..cce055a 100644
>>> --- a/arch/x86/kvm/mmu.c
>>> +++ b/arch/x86/kvm/mmu.c
>>> @@ -2169,7 +2169,7 @@ static void reset_rsvds_bits_mask(struct  
>>> kvm_vcpu *vcpu, int level)
>>> context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
>>> rsvd_bits(maxphyaddr, 51) |
>>> rsvd_bits(13, 20);  /* large page */
>>> -   context->rsvd_bits_mask[1][0] = ~0ull;
>>> +   context->rsvd_bits_mask[1][0] = 0ull;
>>> break;
>>> }
>>> }
>>
>> Just to make sure I understand what this does: if guest sets bit7,  
>> will
>> bit7 get set in shadow PTEs as well?
>
> I don't see any code that interprets bit7, so the shadow PTE should be  
> completely unaffected.
>
> But to be sure I asked Jörg to take a look at it as well, as he's more  
> familiar with the x86 SPT code than I am :-).

The PAT bit is not propagated into the shadow page tables. Anyway, the
problem is fixed the wrong way in this patch. The real problem is that a
4kb pte is checked with mask considered for large pages (which do not
exist on walker level 0). The attached patch fixes it the better way
imho.

>From 7530aef3ed580b70a74224f8c04857754501c496 Mon Sep 17 00:00:00 2001
From: Joerg Roedel 
Date: Fri, 15 May 2009 15:14:19 +0200
Subject: [PATCH] kvm/mmu: fix reserved bit checking on 4kb pte level

The reserved bits checking code looks at bit 7 of the pte to determine
if it has to use the mask for a large pte or a normal pde. This does not
work on 4kb pte level because bit 7 is used there for PAT. Account this
in the checking function.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/mmu.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 479e748..8d9552e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2124,9 +2124,11 @@ static void paging_free(struct kvm_vcpu *vcpu)
 
 static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level)
 {
-   int bit7;
+   int bit7 = 0;
+
+   if (level != PT_PAGE_TABLE_LEVEL)
+   bit7 = (gpte >> 7) & 1;
 
-   bit7 = (gpte >> 7) & 1;
return (gpte & vcpu->arch.mmu.rsvd_bits_mask[bit7][level-1]) != 0;
 }
 
-- 
1.6.2.4
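
The check in the patch above can be tried in isolation with a stand-alone sketch. The struct rsvd_masks type and MAX_LEVELS are hypothetical stand-ins for vcpu->arch.mmu.rsvd_bits_mask, and the mask values a caller fills in are illustrative; only the bit 7 handling follows the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PT_PAGE_TABLE_LEVEL 1   /* 4kb pte level, named as in the patch */
#define MAX_LEVELS          4   /* assumption: four paging levels */

/* Hypothetical stand-in for vcpu->arch.mmu.rsvd_bits_mask. */
struct rsvd_masks {
    /* [0][level-1]: normal entry, [1][level-1]: large-page entry */
    uint64_t mask[2][MAX_LEVELS];
};

bool is_rsvd_bits_set(const struct rsvd_masks *m, uint64_t gpte, int level)
{
    int bit7 = 0;

    assert(level >= 1 && level <= MAX_LEVELS);

    /*
     * At the 4kb pte level bit 7 is the PAT bit, not the page-size bit,
     * so it must not select the large-page mask there.
     */
    if (level != PT_PAGE_TABLE_LEVEL)
        bit7 = (gpte >> 7) & 1;

    return (gpte & m->mask[bit7][level - 1]) != 0;
}
```

With this shape, a 4kb pte that has PAT set is always checked against mask[0][0], so the value stored in mask[1][0] no longer matters for it.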


-- 
   | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System| 
 Research  | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
 Center| Sitz: Dornach, Gemeinde Aschheim, Landkreis München
   | Registergericht München, HRB Nr. 43632


Re: [PATCH -tip] x86: kvm/x86.c use MSR names in place of address

2009-05-15 Thread Jaswinder Singh Rajput

On Thu, 2009-05-14 at 11:00 +0530, Jaswinder Singh Rajput wrote:

> Here is the patch:
> 
> [PATCH -tip] x86: kvm/x86.c use MSR names in place of address
> 
> Replace 0xc0010010 with MSR_K8_SYSCFG and 0xc0010015 with MSR_K7_HWCR.
> 
> Signed-off-by: Jaswinder Singh Rajput 
> ---

This patch also applies to the kvm tree without any changes.

Thanks,

--
JSR


RE: Status of pci passthrough work?

2009-05-15 Thread Passera, Pablo R

Hi Amit,
Thanks for your answer. I was able to get your userspace pvdma version.
So now I am using the PVDMA-patched kernel and the PVDMA-patched userspace.
However, I am not able to start the VM. I am running qemu with the following
options (I am trying without any PCI passthrough first):

./qemu/x86_64-softmmu/qemu-system-x86_64 -hda /root/kvm/dm2.img -m 256 -net none

The SDL window appears, but it hangs after showing the message "Press F12 for
boot menu.". I am not getting any message in either qemu or dmesg. Do you
know what could be happening? Maybe a kernel compile option? It would be great
if you could send me the .config file that you used to compile it, just to check
the options.

Thanks,
Pablo

>-----Original Message-----
>From: Amit Shah [mailto:amit.s...@redhat.com]
>Sent: Friday, May 15, 2009 8:00 AM
>To: Passera, Pablo R
>Cc: kvm@vger.kernel.org
>Subject: Re: Status of pci passthrough work?
>
>Hello,
>
>On (Thu) May 14 2009 [11:08:29], Passera, Pablo R wrote:
>> Amit,
>> I am trying to use PVDMA. I've downloaded a kernel snapshot from
>> your kvm git, but I couldn't download a snapshot or the repo from
>> your kvm-userspace tree. I tried to launch the VM using kvm-85 user
>> space but it hangs before loading it. Should it work with kvm-85 user
>> space? Do you have the userspace patches for PVDMA?
>
>The pvdma userspace patches are at
>
>http://git.kernel.org/?p=linux/kernel/git/amit/kvm-userspace.git;a=shortlog;h=pvdma
>
>(look for the branch 'pvdma' in the tree).
>
>   Amit

Re: [PATCH -tip] x86: kvm replace MSR_IA32_TIME_STAMP_COUNTER with MSR_IA32_TSC of msr-index.h

2009-05-15 Thread Jaswinder Singh Rajput

Hello Avi,

On Thu, 2009-05-14 at 11:57 +0530, Jaswinder Singh Rajput wrote:
> Use standard msr-index.h's MSR declaration.
> 
> MSR_IA32_TSC is better than MSR_IA32_TIME_STAMP_COUNTER as it also solves
> 80 column issue.
> 
> Signed-off-by: Jaswinder Singh Rajput 
> ---

If this patch looks sane to you, please apply it to the kvm tree.

Here is the updated patch based on kvm tree:

[PATCH] x86: kvm replace MSR_IA32_TIME_STAMP_COUNTER with MSR_IA32_TSC of 
msr-index.h

Use standard msr-index.h's MSR declaration.

MSR_IA32_TSC is better than MSR_IA32_TIME_STAMP_COUNTER as it also solves
80 column issue.

Signed-off-by: Jaswinder Singh Rajput 
---
 arch/x86/include/asm/kvm_host.h |2 --
 arch/x86/kvm/svm.c  |4 ++--
 arch/x86/kvm/vmx.c  |4 ++--
 arch/x86/kvm/x86.c  |5 ++---
 4 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 716a4ec..5c72897 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -753,8 +753,6 @@ static inline void kvm_inject_gp(struct kvm_vcpu *vcpu, u32 
error_code)
kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
 }
 
-#define MSR_IA32_TIME_STAMP_COUNTER0x010
-
 #define TSS_IOPB_BASE_OFFSET 0x66
 #define TSS_BASE_SIZE 0x68
 #define TSS_IOPB_SIZE (65536 / 8)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 71510e0..dd667dd 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1953,7 +1953,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned 
ecx, u64 *data)
struct vcpu_svm *svm = to_svm(vcpu);
 
switch (ecx) {
-   case MSR_IA32_TIME_STAMP_COUNTER: {
+   case MSR_IA32_TSC: {
u64 tsc;
 
rdtscll(tsc);
@@ -2043,7 +2043,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned 
ecx, u64 data)
struct vcpu_svm *svm = to_svm(vcpu);
 
switch (ecx) {
-   case MSR_IA32_TIME_STAMP_COUNTER: {
+   case MSR_IA32_TSC: {
u64 tsc;
 
rdtscll(tsc);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fe2ce2b..98e6915 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -931,7 +931,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 
msr_index, u64 *pdata)
case MSR_EFER:
return kvm_get_msr_common(vcpu, msr_index, pdata);
 #endif
-   case MSR_IA32_TIME_STAMP_COUNTER:
+   case MSR_IA32_TSC:
data = guest_read_tsc();
break;
case MSR_IA32_SYSENTER_CS:
@@ -991,7 +991,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 
msr_index, u64 data)
case MSR_IA32_SYSENTER_ESP:
vmcs_writel(GUEST_SYSENTER_ESP, data);
break;
-   case MSR_IA32_TIME_STAMP_COUNTER:
+   case MSR_IA32_TSC:
rdtscll(host_tsc);
guest_write_tsc(data, host_tsc);
break;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 44e87a5..4150edb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -462,7 +462,7 @@ static u32 msrs_to_save[] = {
 #ifdef CONFIG_X86_64
MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
 #endif
-   MSR_IA32_TIME_STAMP_COUNTER, MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
+   MSR_IA32_TSC, MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
MSR_IA32_PERF_STATUS, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA
 };
 
@@ -640,8 +640,7 @@ static void kvm_write_guest_time(struct kvm_vcpu *v)
 
/* Keep irq disabled to prevent changes to the clock */
local_irq_save(flags);
-   kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER,
- &vcpu->hv_clock.tsc_timestamp);
+   kvm_get_msr(v, MSR_IA32_TSC, &vcpu->hv_clock.tsc_timestamp);
ktime_get_ts(&ts);
local_irq_restore(flags);
 
-- 
1.6.1.1




[PATCH] Set bit 1 in disabled processor's _STA

2009-05-15 Thread Glauber Costa

This patch sets bit 1 in the disabled processor's _STA.
According to the ACPI spec, this bit means:
 "Set if the device is enabled and decoding its resources."

Without it, Windows 2008 device manager shows the processors
as malfunctioning hardware.

Signed-off-by: Glauber Costa 
---
 kvm/bios/acpi-dsdt.dsl |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl
index c756fed..c53816c 100755
--- a/kvm/bios/acpi-dsdt.dsl
+++ b/kvm/bios/acpi-dsdt.dsl
@@ -56,7 +56,7 @@ DefinitionBlock (
 }   \
 Method (_STA) { \
 If (CRST(nr)) { Return(0xF) }   \
-Else { Return(0x9) }\
+Else { Return(0xB) }\
 }   \
 }   \
 
-- 
1.5.6.6
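
The change from 0x9 to 0xB is easier to read with the _STA bits spelled out. A minimal sketch, assuming the standard ACPI meaning of bits 0 to 3; the macro and function names are hypothetical and do not appear in acpi-dsdt.dsl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* _STA bits as defined by the ACPI spec; the names are made up here. */
#define STA_PRESENT     (1u << 0)  /* device is present */
#define STA_ENABLED     (1u << 1)  /* enabled and decoding its resources */
#define STA_SHOW_IN_UI  (1u << 2)  /* should be shown in the UI */
#define STA_FUNCTIONING (1u << 3)  /* functioning properly */

/* Value the patched _STA method returns for a processor. */
uint32_t cpu_sta(bool cpu_online)
{
    if (cpu_online)
        return STA_PRESENT | STA_ENABLED | STA_SHOW_IN_UI | STA_FUNCTIONING;
    /* Disabled CPU: hidden from the UI, but bit 1 is now set. */
    return STA_PRESENT | STA_ENABLED | STA_FUNCTIONING;
}

bool sta_is_enabled(uint32_t sta)
{
    /* Bits 5..31 are reserved and must be clear. */
    assert((sta >> 5) == 0);
    return (sta & STA_ENABLED) != 0;
}
```

The old value 0x9 is STA_PRESENT | STA_FUNCTIONING, so sta_is_enabled(0x9) is false, while the new value 0xB has bit 1 set.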


Re: [PATCH] Don't try to mess with CPUID when running nested SVM

2009-05-15 Thread Avi Kivity


Alexander Graf wrote:
>>> When using nested SVM we usually want the guest to see the exact
>>> CPUID values we gave it and not some mangled ones.
>>
>> That would be triggered by -cpu host, not nesting.
>
> Oh we have -cpu host already?

No, we don't :)

> hm - treating the hypervisor bit like any other cpuid bit sounds like
> a good idea. I'm wondering though which way should be preferred. I
> usually don't want to have the hypervisor bit set - but maybe I'm in the
> minority.

Windows requires the hypervisor bit to be set in order to pass some testing
program.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


Re: [PATCH] Don't try to mess with CPUID when running nested SVM

2009-05-15 Thread Alexander Graf



On 15.05.2009, at 13:09, Avi Kivity wrote:


> Alexander Graf wrote:
>> When using nested SVM we usually want the guest to see the exact
>> CPUID values we gave it and not some mangled ones.
>
> That would be triggered by -cpu host, not nesting.

Oh we have -cpu host already? If so, we don't need that hackery of
course :-)

>> @@ -1506,7 +1506,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>>  *edx = env->cpuid_features;
>>  /* "Hypervisor present" bit required for Microsoft SVVP */
>> -if (kvm_enabled())
>> +if (kvm_enabled() && !kvm_nested)
>>  *ecx |= (1 << 31);
>>  break;
>
> -cpu host,-hypervisor


hm - treating the hypervisor bit like any other cpuid bit sounds like  
a good idea. I'm wondering though which way should be preferred. I  
usually don't want to have the hypervisor bit set - but maybe I'm the  
minority.
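
To make the "treat it like any other cpuid bit" idea concrete, here is a
minimal Python sketch. It is not qemu code: the apply_cpu_flags helper and
its flag table are hypothetical names made up for illustration. Only the bit
position (CPUID leaf 1, ECX bit 31) is taken from the patch quoted above.

```python
# Hypothetical model of "-cpu host,-hypervisor" style flag handling.
# Only the bit position (leaf 1, ECX bit 31) comes from the quoted patch.
CPUID_1_ECX_BITS = {"hypervisor": 31}


def apply_cpu_flags(ecx, flags):
    """Apply '+name' / '-name' feature flags to a CPUID.1:ECX value."""
    for flag in flags:
        sign, name = flag[0], flag[1:]
        if sign not in ("+", "-"):
            raise ValueError("flag must start with '+' or '-': %r" % flag)
        mask = 1 << CPUID_1_ECX_BITS[name]
        if sign == "+":
            ecx |= mask
        else:
            ecx &= ~mask
    return ecx & 0xFFFFFFFF


# Default behaviour under KVM: the bit is set.
default_ecx = apply_cpu_flags(0, ["+hypervisor"])
# What a user who does not want the bit would ask for.
masked_ecx = apply_cpu_flags(default_ecx, ["-hypervisor"])
```

With this shape, which default is preferred becomes a policy question only;
either side can override it from the command line.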


Alex


Re: [PATCH] Don't try to mess with CPUID when running nested SVM

2009-05-15 Thread Avi Kivity


Alexander Graf wrote:
> When using nested SVM we usually want the guest to see the exact CPUID values
> we gave it and not some mangled ones.

That would be triggered by -cpu host, not nesting.

> @@ -1506,7 +1506,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>  *edx = env->cpuid_features;
>
>  /* "Hypervisor present" bit required for Microsoft SVVP */
> -if (kvm_enabled())
> +if (kvm_enabled() && !kvm_nested)
>  *ecx |= (1 << 31);
>  break;

-cpu host,-hypervisor

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


Re: Status of pci passthrough work?

2009-05-15 Thread Amit Shah

Hello,

On (Thu) May 14 2009 [11:08:29], Passera, Pablo R wrote:
> Amit,
> I am trying to use PVDMA. I've downloaded a kernel snapshot from
> your kvm git, but I couldn't download a snapshot or the repo from your
> kvm-userspace tree. I tried to launch the VM using kvm-85 user space but it
> hangs before loading it. Should it work with kvm-85 user space? Do you have
> the userspace patches for PVDMA?

The pvdma userspace patches are at

http://git.kernel.org/?p=linux/kernel/git/amit/kvm-userspace.git;a=shortlog;h=pvdma

(look for the branch 'pvdma' in the tree).

Amit

Re: [PATCH 2/6] MMU: don't bail on PAT bits in PTE

2009-05-15 Thread Alexander Graf



On 15.05.2009, at 12:25, Michael S. Tsirkin wrote:


> On Fri, May 15, 2009 at 10:22:16AM +0200, Alexander Graf wrote:
>> A 64bit PTE can have bit7 set to 1 which means "Use this bit for the PAT".
>> Currently KVM's MMU code treats this bit as reserved, even though it's not.
>>
>> As long as we're not required to make use of the PAT bits which is only
>> required for DMA/MMIO from my understanding, we can safely ignore it.
>>
>> Hyper-V uses this bit for kernel PTEs.
>>
>> Signed-off-by: Alexander Graf
>> ---
>>  arch/x86/kvm/mmu.c |2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 8fcdae9..cce055a 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2169,7 +2169,7 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
>>  context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
>>  rsvd_bits(maxphyaddr, 51) |
>>  rsvd_bits(13, 20);  /* large page */
>> -   context->rsvd_bits_mask[1][0] = ~0ull;
>> +   context->rsvd_bits_mask[1][0] = 0ull;
>>  break;
>>  }
>>  }
>
> Just to make sure I understand what this does: if guest sets bit7, will
> bit7 get set in shadow PTEs as well?


I don't see any code that interprets bit7, so the shadow PTE should be  
completely unaffected.


But to be sure I asked Jörg to take a look at it as well, as he's more  
familiar with the x86 SPT code than I am :-).
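
For readers following along, the effect of the one-line change on the
reserved-bit check can be modelled in a few lines of Python. This is only a
sketch under two assumptions that are not stated in this thread: that the
mask table is indexed as rsvd_bits_mask[bit7][level - 1], and that
rsvd_bits(s, e) sets bits s through e inclusive. It says nothing about what
ends up in the shadow PTE.

```python
ALL_ONES = (1 << 64) - 1


def rsvd_bits(s, e):
    """Mask with bits s..e (inclusive) set."""
    return ((1 << (e - s + 1)) - 1) << s


def is_rsvd_bits_set(mask_table, gpte, level):
    """True if the guest PTE has a bit set that the mask marks reserved."""
    bit7 = (gpte >> 7) & 1
    return (gpte & mask_table[bit7][level - 1]) != 0


# A present 4K guest PTE with bit 7 (PAT) set.
gpte = 0x1000 | (1 << 7) | 1

before = {1: [ALL_ONES]}  # ~0ull: every bit counts as reserved
after = {1: [0]}          # 0ull: nothing is reserved when bit 7 is set

faults_before = is_rsvd_bits_set(before, gpte, level=1)
faults_after = is_rsvd_bits_set(after, gpte, level=1)
```

In this model the PTE is rejected with the old mask and accepted with the
new one, which matches the intent described in the patch.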


Alex


Re: [PATCH 0/6] Add rudimentary Hyper-V guest support

2009-05-15 Thread Alexander Graf



On 15.05.2009, at 10:22, Alexander Graf wrote:

> Now that we have nested SVM in place, let's make use of it and virtualize
> something non-kvm.
> The first interesting target that came to my mind here was Hyper-V.
>
> This patchset makes Windows Server 2008 boot with Hyper-V, which runs
> the "dom0" in virtualized mode already. I haven't been able to run a
> second VM within for now though, but maybe I just wasn't patient enough ;-).


In order to find out why things were slow with nested SVM I hacked  
intercept reporting into debugfs in my local tree and found pretty  
interesting results (using NPT):


 SVM_EXIT_CLGI         3888080   0
 SVM_EXIT_CPUID           3460   0
 SVM_EXIT_CR0_SEL_WRI        0   0
 SVM_EXIT_ERR                0   0
 SVM_EXIT_FERR_FREEZE        0   0
 SVM_EXIT_GDTR_READ          0   0
 SVM_EXIT_GDTR_WRITE         0   0
 SVM_EXIT_HLT            40186   0
 SVM_EXIT_ICEBP              0   0
 SVM_EXIT_IDTR_READ          0   0
 SVM_EXIT_IDTR_WRITE         0   0
 SVM_EXIT_INIT               0   0
 SVM_EXIT_INTR          193173   0
 SVM_EXIT_INVD               0   0
 SVM_EXIT_INVLPG             1   0
 SVM_EXIT_INVLPGA       536994   0
 SVM_EXIT_IOIO         3450484   0
 SVM_EXIT_IRET               0   0
 SVM_EXIT_LDTR_READ          0   0
 SVM_EXIT_LDTR_WRITE         0   0
 SVM_EXIT_MONITOR            0   0
 SVM_EXIT_MSR           124614   0
 SVM_EXIT_MWAIT              0   0
 SVM_EXIT_MWAIT_COND         0   0
 SVM_EXIT_NMI                0   0
 SVM_EXIT_NPF          1040416   0
 SVM_EXIT_PAUSE              0   0
 SVM_EXIT_POPF               0   0
 SVM_EXIT_PUSHF              0   0
 SVM_EXIT_RDPMC              0   0
 SVM_EXIT_RDTSC              0   0
 SVM_EXIT_RDTSCP             0   0
 SVM_EXIT_RSM                0   0
 SVM_EXIT_SHUTDOWN           0   0
 SVM_EXIT_SKINIT             0   0
 SVM_EXIT_SMI               20   0
 SVM_EXIT_STGI         3888080   0
 SVM_EXIT_SWINT              0   0
 SVM_EXIT_TASK_SWITCH        0   0
 SVM_EXIT_TR_READ            0   0
 SVM_EXIT_TR_WRITE           0   0
 SVM_EXIT_VINTR         402865   0
 SVM_EXIT_VMLOAD       3888096   0
 SVM_EXIT_VMMCALL       767288   0
 SVM_EXIT_VMRUN        3888096   0
 SVM_EXIT_VMSAVE       3888096   0
 SVM_EXIT_WBINVD            64   0


So apparently most of the intercepts come from the SVM helper calls
(clgi, stgi, vmload, vmsave). I guess I need to get back to the  
"emulate when GIF=0" approach to get things fast.


Alex
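
As a quick cross-check of the conclusion in this message, the non-zero
counters from the debugfs dump can be ranked with a few lines of Python. The
numbers are copied from the dump; the script itself is only an illustrative
sketch and not part of the debugfs hack mentioned in the mail.

```python
# Non-zero intercept counts copied from the debugfs dump in this message.
exits = {
    "CLGI": 3888080, "CPUID": 3460, "HLT": 40186, "INTR": 193173,
    "INVLPG": 1, "INVLPGA": 536994, "IOIO": 3450484, "MSR": 124614,
    "NPF": 1040416, "SMI": 20, "STGI": 3888080, "VINTR": 402865,
    "VMLOAD": 3888096, "VMMCALL": 767288, "VMRUN": 3888096,
    "VMSAVE": 3888096, "WBINVD": 64,
}

total = sum(exits.values())

# The "SVM helper calls" named in the mail.
helpers = ("CLGI", "STGI", "VMLOAD", "VMSAVE")
helper_share = sum(exits[name] for name in helpers) / float(total)

# Print the six most frequent intercepts with their share of the total.
ranked = sorted(exits.items(), key=lambda kv: kv[1], reverse=True)
for name, count in ranked[:6]:
    print("%-8s %9d  %5.1f%%" % (name, count, 100.0 * count / total))
print("helper share: %.1f%%" % (100.0 * helper_share))
```

On these figures the four helper instructions alone work out to roughly 60%
of all recorded intercepts, with VMRUN and IOIO being the next largest
contributors.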

Re: kvm build error with latest commit

2009-05-15 Thread Avi Kivity


Xu, Jiajun wrote:
> Hi all,
> Latest kvm can not build with 2.6.30-rc4 kernel. Could anyone help on the issue?
>
> Error as following:
>
> make[1]: Leaving directory
> `/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm'

The external module is now built using the kvm-kmod repository:

 http://git.kernel.org/?p=virt/kvm/kvm-kmod.git;a=summary

If you clone it and run 'git submodule init; git submodule update', it
will create a linux-2.6 directory.  Afterwards all you need is to pull
from both repositories, and make sync and make rpm will work as usual.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


Re: [PATCH 2/6] MMU: don't bail on PAT bits in PTE

2009-05-15 Thread Michael S. Tsirkin

On Fri, May 15, 2009 at 10:22:16AM +0200, Alexander Graf wrote:
> A 64bit PTE can have bit7 set to 1 which means "Use this bit for the PAT".
> Currently KVM's MMU code treats this bit as reserved, even though it's not.
> 
> As long as we're not required to make use of the PAT bits which is only
> required for DMA/MMIO from my understanding, we can safely ignore it.
> 
> Hyper-V uses this bit for kernel PTEs.
> 
> Signed-off-by: Alexander Graf 
> ---
>  arch/x86/kvm/mmu.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 8fcdae9..cce055a 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2169,7 +2169,7 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu 
> *vcpu, int level)
>   context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
>   rsvd_bits(maxphyaddr, 51) |
>   rsvd_bits(13, 20);  /* large page */
> - context->rsvd_bits_mask[1][0] = ~0ull;
> + context->rsvd_bits_mask[1][0] = 0ull;
>   break;
>   }
>  }

Just to make sure I understand what this does: if guest sets bit7, will
bit7 get set in shadow PTEs as well?

-- 
MST

[PATCH] Don't try to mess with CPUID when running nested SVM

2009-05-15 Thread Alexander Graf

When using nested SVM we usually want the guest to see the exact CPUID values
we gave it and not some mangled ones.

Hyper-V for example doesn't even start when the "hypervisor present" bit is set.

Signed-off-by: Alexander Graf 
---
 target-i386/helper.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-i386/helper.c b/target-i386/helper.c
index 24fcea8..5f56698 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1496,7 +1496,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
  * isn't supported in compatibility mode on Intel.  so advertise the
  * actuall cpu, and say goodbye to migration between different vendors
  * is you use compatibility mode. */
-if (kvm_enabled())
+if (kvm_enabled() && !kvm_nested)
 host_cpuid(0, 0, NULL, ebx, ecx, edx);
 break;
 case 1:
@@ -1506,7 +1506,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 *edx = env->cpuid_features;
 
 /* "Hypervisor present" bit required for Microsoft SVVP */
-if (kvm_enabled())
+if (kvm_enabled() && !kvm_nested)
 *ecx |= (1 << 31);
 break;
 case 2:
-- 
1.6.0.2


[PATCH] Add external-module-compat header for MSR_VM_IGNNE

2009-05-15 Thread Alexander Graf

This patch adds a compat definition for MSR_VM_IGNNE

Signed-off-by: Alexander Graf 
---
 kvm/kernel/x86/external-module-compat.h |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kvm/kernel/x86/external-module-compat.h 
b/kvm/kernel/x86/external-module-compat.h
index 8f9aae0..da42d7b 100644
--- a/kvm/kernel/x86/external-module-compat.h
+++ b/kvm/kernel/x86/external-module-compat.h
@@ -30,6 +30,11 @@
 #define MSR_VM_CR   0xc0010114
 #endif
 
+#ifndef MSR_VM_IGNNE
+#define MSR_VM_IGNNE    0xc0010115
+#endif
+
+
 #ifndef MSR_VM_HSAVE_PA
 #define MSR_VM_HSAVE_PA 0xc0010117
 #endif
-- 
1.6.0.2


Re: kvm-autotest: The automation plans?

2009-05-15 Thread jason wang


Michael Goldish wrote:
> - "sudhir kumar" wrote:
>
>> On Thu, May 14, 2009 at 12:22 PM, jason wang wrote:
>>> sudhir kumar wrote:
>>>> Hi Uri/Lucas,
>>>>
>>>> Do you have any plans for enhancing kvm-autotest?
>>>> I was looking mainly at the following 2 aspects:
>>>>
>>>> (1).
>>>> We have standalone migration only. Are there any plans for enhancing
>>>> kvm-autotest so that we can trigger migration while a workload is
>>>> running?
>>>> Something like this:
>>>> Start a workload (maybe n instances of it).
>>>> Let the test execute for some time.
>>>> Trigger migration.
>>>> Log into the target.
>>>> Check if the migration is successful.
>>>> Check if the test results are consistent.
>>>
>>> We have some patches for ping-pong migration and workload adding. The
>>> migration is based on a public bridge and workload adding is based on
>>> running a benchmark in the background of the guest.
>>
>> Cool. I would like to have a look at them. So how do you manage the
>> background process/thread?

Yes, we will try to send it here as soon as possible. The background
workload could be added through various methods. We could use a simple
algorithm as follows:

run_migration2():
    pid = run_autotest_background(test, params, env, "dbench", "control.60")

    Do ping-pong migration ...

    wait_autotest_background(pid)

run_autotest_background() would fork a subprocess to run the function
run_autotest() and catch its exceptions.
wait_autotest_background(pid) would wait until the background benchmark
completes and analyse the result through the return value of the subprocess.
The child process only works as long as the ssh connection stays alive
during migration.

I believe this could be also achieved through job.parallel()
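
The fork-and-wait scheme described above can be sketched in plain Python as
follows. This is a simplified illustration, not the patch that will be
posted: the function names follow this mail, but the reduced argument lists,
the bodies, and the stand-in run_autotest() are assumptions made for the
example. It relies on os.fork(), so it is POSIX only.

```python
import os
import time


def run_autotest(test_name, control):
    """Stand-in for the real benchmark run; raises on failure."""
    time.sleep(0.1)
    if test_name == "broken":
        raise RuntimeError("benchmark failed")


def run_autotest_background(test_name, control):
    """Fork a child that runs the benchmark; return the child's pid."""
    pid = os.fork()
    if pid == 0:
        # Child: report success or failure through the exit status.
        status = 0
        try:
            run_autotest(test_name, control)
        except Exception:
            status = 1
        os._exit(status)
    return pid


def wait_autotest_background(pid):
    """Block until the child exits; True if it exited with status 0."""
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


pid = run_autotest_background("dbench", "control.60")
# ... the ping-pong migration would run here in the parent ...
passed = wait_autotest_background(pid)
```

The exit status is the only channel between child and parent here, which
keeps the sketch short; richer results would need a pipe or a results file.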

>>>> (2).
>>>> How can we run N parallel instances of a test? Will the current
>>>> configuration be easily able to support it?
>>>>
>>>> Please provide your thoughts on the above features.
>>>
>>> The parallelized instances could be easily achieved through
>>> job.parallel() of the autotest framework, and that is what we have used
>>> in our tests. We have made some helper routines such as get_free_port
>>> reentrant through a file lock.
>>> We've implemented the following test cases: timedrift (already sent
>>> here), savevm/loadvm, suspend/resume, jumboframe, migration between two
>>> machines and others. We will send them here for review in the following
>>> weeks.
>>> There are some other things that could be improved:
>>> 1) Current kvm_test.cfg.sample/kvm_test.cfg is transparent to the
>>> autotest server UI. This would make it hard to configure the tests on
>>> the server side. During our test, we have merged it into control and
>>> made it configurable through the "editing control file" function of the
>>> autotest server side web UI.
>>
>> Not much clue here. But I would like to keep the control file as
>> simple as possible and as independent of test scenarios as
>> possible. kvm_tests.cfg should be the right file unless it
>> is impossible to do by using it.
>>
>>> 2) Public bridge support: I've sent a patch (TAP network support in
>>> kvm-autotest); this patch needs an external DHCP server and requires
>>> nmap support. I don't know whether the method of the original
>>> kvm_runtest_old (DHCP server of private bridge) is preferable.
>>
>> The old approach is better. Not everyone might be able to run an external
>> DHCP server for running the test. I do not see any issue with the old
>> approach.
>
> We're taking more of a minimalist approach in kvm_runtest_2: the
> framework should handle only the things directly related to testing.
> Configuring and running a DHCP server is and should be beyond the scope
> of the KVM-Autotest framework. To emulate the old behavior, you can just
> start the DHCP server yourself locally. If you wish, maybe we can
> bundle example scripts with the framework that will do this for the user,
> but they should not be an integral part of the framework in my opinion.
>
>> --
>> Sudhir Kumar

kvm build error with latest commit

2009-05-15 Thread Xu, Jiajun


Hi all,
Latest kvm can not build with 2.6.30-rc4 kernel. Could anyone help on the issue?

Error as following:

make[1]: Leaving directory 
`/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm'
+ make -C kernel LINUX=2.6.30-rc4
make[1]: Entering directory 
`/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel'
make -j20 -C /lib/modules/2.6.30-rc4/build M=`pwd` \
LINUXINCLUDE="-I`pwd`/include -Iinclude \
 \
-Iarch/x86/include -I`pwd`/include-compat \
-include include/linux/autoconf.h \
-include `pwd`/x86/external-module-compat.h "
make[2]: Entering directory `/mnt/sdb1/kernel/src/redhat/BUILD/kernel-2.6.30rc4'
  LD  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/built-in.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/svm.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/../external-module-compat.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/vmx.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/vmx-debug.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/kvm_main.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/x86.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/mmu.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/x86_emulate.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/../anon_inodes.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/irq.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/i8259.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/ioapic.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/preempt.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/i8254.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/coalesced_mmio.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/irq_comm.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/timer.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/iommu.o
  CC [M]  
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/lapic.o
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/x86.c:
 In function 'do_cpuid_ent':
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/x86.c:1327:
 error: 'X86_FEATURE_MOVBE' undeclared (first use in this function)
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/x86.c:1327:
 error: (Each undeclared identifier is reported only once
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/x86.c:1327:
 error: for each function it appears in.)
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/x86.c:1327:
 error: 'X86_FEATURE_POPCNT' undeclared (first use in this function)
make[4]: *** 
[/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/x86.o]
 Error 1
make[4]: *** Waiting for unfinished jobs
make[3]: *** 
[/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86]
 Error 2
make[2]: *** 
[_module_/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel]
 Error 2
make[2]: Leaving directory `/mnt/sdb1/kernel/src/redhat/BUILD/kernel-2.6.30rc4'
make[1]: *** [all] Error 2
make[1]: Leaving directory 
`/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel'
error: Bad exit status from /var/tmp/rpm-tmp.94190 (%build)


RPM build errors:
Bad exit status from /var/tmp/rpm-tmp.94190 (%build)
make: *** [rpm] Error 1

Best Regards
Jiajun

Re: [PATCH][KVM-AUTOTEST] TAP network support in kvm-autotest

2009-05-15 Thread jason wang


Michael Goldish wrote:

Hi Michael, thanks for your comments.

> Hi Jason,
>
> We already have patches that implement similar functionality here in
> TLV, as mentioned in the to-do list (item #4 under 'Framework').
> They're not yet committed upstream because they're still quite fresh.

OK, I will pay more attention to the to-do list.

> Still, your patch looks good and is quite similar to mine. The main
> difference is that I use MAC/IP address pools specified by the user,
> instead of random MACs with arp/nmap to detect the matching IP
> addresses.

We've considered the use of MAC/IP address pools, but this method needs to
handle the case of multiple kvm-autotest instances running on multiple
guests. The MAC pools must not overlap when using public bridges.

> I will post my patch to the mailing list soon, but it will come
> together with quite a few other patches that I haven't posted yet, so
> please be patient.
>
> Comments/questions:
>
> Why do you use nmap in addition to arp? In what cases will arp not
> suffice? I'm a little put off by the fact that nmap imposes an
> additional requirement on the host. Three hosts I've tried don't come
> with nmap installed by default.
  
We use nmap to make sure the guest IP could be finally found somehow. 
During our tests, the scripts may fail to get the IP address of guest 
when host iptables is turned on.
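
The arp half of that probing can be illustrated with a short Python sketch.
The lookup code of the patch itself is not shown in this thread, so
everything here is an assumption made for illustration: the ip_for_mac
helper is hypothetical, the input is text in the column layout of Linux's
/proc/net/arp, and the sample addresses are made up. An entry only shows up
in that table once the host has exchanged traffic with the guest.

```python
def ip_for_mac(arp_table_text, mac):
    """Return the IP listed for `mac` in /proc/net/arp style text, or None."""
    mac = mac.lower()
    # Skip the header row; columns are IP, HW type, flags, HW address, ...
    for line in arp_table_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 4 and fields[3].lower() == mac:
            return fields[0]
    return None


# Made-up sample data in the /proc/net/arp layout.
sample = """IP address       HW type     Flags       HW address            Mask     Device
192.168.122.45   0x1         0x2         00:16:30:01:02:03     *        switch
192.168.122.1    0x1         0x2         52:54:00:aa:bb:cc     *        switch
"""

guest_ip = ip_for_mac(sample, "00:16:30:01:02:03")
```

A caller would fall back to an active scan only when this lookup returns
None.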

> Please see additional comments below.
>
> - "Jason Wang" wrote:
>
>> Hi All:
>> This patch tries to add tap network support in kvm-autotest. Multiple
>> nics connected to different bridges could be achieved through this
>> script. Public bridge is important for testing real network traffic
>> and migration. The patch gives each nic a randomly generated mac
>> address. The ip address required in the test could be dynamically
>> probed through nmap/arp. Only the ip address of the first NIC is used
>> throughout the test.
>>
>> Example:
>> nics = nic1 nic2
>> network = bridge
>> bridge = switch
>> ifup =/etc/qemu-ifup-switch
>> ifdown =/etc/qemu-ifdown-switch
>>
>> This would make the virtual machine have two nics, both of which are
>> connected to a bridge with the name of 'switch'. Ifup/ifdown scripts
>> are also specified.
>>
>> Another Example:
>> nics = nic1 nic2
>> network = bridge
>> bridge = switch
>> bridge_nic2 = virbr0
>> ifup =/etc/qemu-ifup-switch
>> ifup_nic2 = /etc/qemu-ifup-virbr0
>>
>> This would make the virtual machine have two nics: nic1 is connected
>> to bridge 'switch' and nic2 is connected to bridge 'virbr0'.
>>
>> Public mode and user mode nic could also be mixed:
>> nics = nic1 nic2
>> network = bridge
>> network_nic2 = user
>>
>> Looking forward to comments and suggestions.
>>
>> From: jason
>> Date: Wed, 13 May 2009 16:15:28 +0800
>> Subject: [PATCH] Add tap networking support.
>>
>> ---
>>  client/tests/kvm_runtest_2/kvm_utils.py |7 +++
>>  client/tests/kvm_runtest_2/kvm_vm.py|   74 ++-
>>  2 files changed, 69 insertions(+), 12 deletions(-)
>>
>> diff --git a/client/tests/kvm_runtest_2/kvm_utils.py b/client/tests/kvm_runtest_2/kvm_utils.py
>> index be8ad95..0d1f7f8 100644
>> --- a/client/tests/kvm_runtest_2/kvm_utils.py
>> +++ b/client/tests/kvm_runtest_2/kvm_utils.py
>> @@ -773,3 +773,10 @@ def md5sum_file(filename, size=None):
>>  size -= len(data)
>>  f.close()
>>  return o.hexdigest()
>> +
>> +def random_mac():
>> +mac=[0x00,0x16,0x30,
>> + random.randint(0x00,0x09),
>> + random.randint(0x00,0x09),
>> + random.randint(0x00,0x09)]
>> +return ':'.join(map(lambda x: "%02x" %x,mac))
>
> Random MAC addresses will not necessarily work everywhere, as far as
> I know. That's why I prefer user specified MAC/IP address ranges.
  
Yes, maybe we could use user specified mac address prefix or more useful 
algorithm to generate mac address.
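
A prefix-based variant could look like the sketch below. It is only an
illustration of the idea, not a proposed patch: the prefix parameter is a
hypothetical addition, and its default simply reuses the 00:16:30 bytes from
the random_mac() in the patch. The trailing octets are drawn from the full
0x00-0xff range here, which is an assumption about the intent, since the
posted version only draws values from 0x00 to 0x09.

```python
import random


def random_mac(prefix="00:16:30"):
    """Return a MAC made of a fixed prefix plus random trailing octets."""
    octets = [int(part, 16) for part in prefix.split(":")]
    if not 1 <= len(octets) <= 5 or any(not 0 <= o <= 0xFF for o in octets):
        raise ValueError("bad MAC prefix: %r" % prefix)
    # Fill the remaining octets with random values.
    while len(octets) < 6:
        octets.append(random.randint(0x00, 0xFF))
    return ":".join("%02x" % o for o in octets)


mac = random_mac()
```

Giving each host its own prefix would keep the generated addresses from
overlapping on a shared public bridge, which is the concern raised about
address pools earlier in this mail.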

>> diff --git a/client/tests/kvm_runtest_2/kvm_vm.py b/client/tests/kvm_runtest_2/kvm_vm.py
>> index fab839f..ea7dab6 100644
>> --- a/client/tests/kvm_runtest_2/kvm_vm.py
>> +++ b/client/tests/kvm_runtest_2/kvm_vm.py
>> @@ -105,6 +105,10 @@ class VM:
>>  self.qemu_path = qemu_path
>>  self.image_dir = image_dir
>>  self.iso_dir = iso_dir
>> +self.macaddr = []
>> +for nic_name in kvm_utils.get_sub_dict_names(params,"nics"):
>> +macaddr = kvm_utils.random_mac()
>> +self.macaddr.append(macaddr)
>>
>>  def verify_process_identity(self):
>>  """Make sure .pid really points to the original qemu process.
>> @@ -189,9 +193,25 @@ class VM:
>>  for nic_name in kvm_utils.get_sub_dict_names(params, "nics"):
>>  nic_params = kvm_utils.get_sub_dict(params, nic_name)
>>  qemu_cmd += " -net nic,vlan=%d" % vlan
>> +net = nic_params.get("network")
>> +if net == "bridge":
>> +qemu_cmd += ",macaddr=%s" % self.macaddr[vlan]
>>  if nic_params.get("nic_model"):
>>  qemu_cmd += ",model=%s" % nic_params.get("nic_model")
>> -qemu_cmd += " -net user,vlan=%d" % vlan
>> +if net == "bridge":
>> +qemu_cmd += " -net tap,vlan=%d" % vlan
>> +ifup = nic_params.get("ifup")
>> +if ifup:
>> +

[PATCH 1/6] Add definition for IGNNE MSR

2009-05-15 Thread Alexander Graf

Hyper-V tries to access MSR_IGNNE, so let's at least have a definition
for it in our headers.

Signed-off-by: Alexander Graf 
---
 arch/x86/include/asm/msr-index.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ec41fc1..e273549 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -372,6 +372,7 @@
 /* AMD-V MSRs */
 
 #define MSR_VM_CR   0xc0010114
+#define MSR_VM_IGNNE    0xc0010115
 #define MSR_VM_HSAVE_PA 0xc0010117
 
 #endif /* _ASM_X86_MSR_INDEX_H */
-- 
1.6.0.2


[PATCH 6/6] Nested SVM: Improve interrupt injection

2009-05-15 Thread Alexander Graf

While trying to get Hyper-V running, I realized that the interrupt injection
mechanisms that are in place right now are not 100% correct.

This patch makes nested SVM's interrupt injection behave more like on a
real machine.

Signed-off-by: Alexander Graf 
---
 arch/x86/kvm/svm.c |   40 +---
 1 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index b2c6cf3..1d22d46 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1517,7 +1517,8 @@ static int nested_svm_vmexit_real(struct vcpu_svm *svm, 
void *arg1,
/* Kill any pending exceptions */
if (svm->vcpu.arch.exception.pending == true)
nsvm_printk("WARNING: Pending Exception\n");
-   svm->vcpu.arch.exception.pending = false;
+   kvm_clear_exception_queue(&svm->vcpu);
+   kvm_clear_interrupt_queue(&svm->vcpu);
 
/* Restore selected save entries */
svm->vmcb->save.es = hsave->save.es;
@@ -1585,7 +1586,8 @@ static int nested_svm_vmrun(struct vcpu_svm *svm, void 
*arg1,
svm->nested_vmcb = svm->vmcb->save.rax;
 
/* Clear internal status */
-   svm->vcpu.arch.exception.pending = false;
+   kvm_clear_exception_queue(&svm->vcpu);
+   kvm_clear_interrupt_queue(&svm->vcpu);
 
/* Save the old vmcb, so we don't need to pick what we save, but
   can restore everything when a VMEXIT occurs */
@@ -2276,21 +2278,15 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, 
int irq)
((/*control->int_vector >> 4*/ 0xf) << V_INTR_PRIO_SHIFT);
 }
 
-static void svm_queue_irq(struct kvm_vcpu *vcpu, unsigned nr)
-{
-   struct vcpu_svm *svm = to_svm(vcpu);
-
-   svm->vmcb->control.event_inj = nr |
-   SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;
-}
-
 static void svm_set_irq(struct kvm_vcpu *vcpu, int irq)
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
-   nested_svm_intr(svm);
+   if(!(svm->vcpu.arch.hflags & HF_GIF_MASK))
+   return;
 
-   svm_queue_irq(vcpu, irq);
+   svm->vmcb->control.event_inj = irq |
+   SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;
 }
 
 static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
@@ -2318,13 +2314,25 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu)
struct vmcb *vmcb = svm->vmcb;
return (vmcb->save.rflags & X86_EFLAGS_IF) &&
!(vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) &&
-   (svm->vcpu.arch.hflags & HF_GIF_MASK);
+   (svm->vcpu.arch.hflags & HF_GIF_MASK) &&
+   !is_nested(svm);
 }
 
 static void enable_irq_window(struct kvm_vcpu *vcpu)
 {
-   svm_set_vintr(to_svm(vcpu));
-   svm_inject_irq(to_svm(vcpu), 0x0);
+   struct vcpu_svm *svm = to_svm(vcpu);
+   nsvm_printk("Trying to open IRQ window\n");
+
+   nested_svm_intr(svm);
+
+   /* In case GIF=0 we can't rely on the CPU to tell us when
+* GIF becomes 1, because that's a separate STGI/VMRUN intercept.
+* The next time we get that intercept, this function will be
+* called again though and we'll get the vintr intercept. */
+   if (svm->vcpu.arch.hflags & HF_GIF_MASK) {
+   svm_set_vintr(svm);
+   svm_inject_irq(svm, 0x0);
+   }
 }
 
 static void enable_nmi_window(struct kvm_vcpu *vcpu)
@@ -2392,6 +2400,8 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
case SVM_EXITINTINFO_TYPE_EXEPT:
/* In case of software exception do not reinject an exception
   vector, but re-execute and instruction instead */
+   if (is_nested(svm))
+   break;
if (vector == BP_VECTOR || vector == OF_VECTOR)
break;
if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) {
-- 
1.6.0.2


[PATCH 2/6] MMU: don't bail on PAT bits in PTE

2009-05-15 Thread Alexander Graf

A 64bit PTE can have bit7 set to 1 which means "Use this bit for the PAT".
Currently KVM's MMU code treats this bit as reserved, even though it's not.

As long as we're not required to make use of the PAT bits which is only
required for DMA/MMIO from my understanding, we can safely ignore it.

Hyper-V uses this bit for kernel PTEs.

Signed-off-by: Alexander Graf 
---
 arch/x86/kvm/mmu.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8fcdae9..cce055a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2169,7 +2169,7 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, 
int level)
context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
rsvd_bits(maxphyaddr, 51) |
rsvd_bits(13, 20);  /* large page */
-   context->rsvd_bits_mask[1][0] = ~0ull;
+   context->rsvd_bits_mask[1][0] = 0ull;
break;
}
 }
-- 
1.6.0.2


[PATCH 0/6] Add rudimentary Hyper-V guest support

2009-05-15 Thread Alexander Graf

Now that we have nested SVM in place, let's make use of it and virtualize
something non-kvm.
The first interesting target that came to my mind here was Hyper-V.

This patchset makes Windows Server 2008 boot with Hyper-V, which runs
the "dom0" in virtualized mode already. I haven't been able to run a
second VM within for now though, but maybe I just wasn't patient enough ;-).

Alexander Graf (6):
  Add definition for IGNNE MSR
  MMU: don't bail on PAT bits in PTE
  Emulator: Inject #PF when page was not found
  Implement Hyper-V MSRs
  Nested SVM: Implement INVLPGA
  Nested SVM: Improve interrupt injection

 arch/x86/include/asm/msr-index.h |1 +
 arch/x86/kvm/mmu.c   |2 +-
 arch/x86/kvm/svm.c   |   59 +++--
 arch/x86/kvm/x86.c   |7 +++-
 4 files changed, 50 insertions(+), 19 deletions(-)


[PATCH 4/6] Implement Hyper-V MSRs

2009-05-15 Thread Alexander Graf

Hyper-V uses some MSRs, some of which are actually reserved for BIOS usage.

But let's be nice today and let it have its way, because otherwise it fails
terribly.

For MSRs where I could find a name I used the name, otherwise they're just
added in their hex form for now.

Signed-off-by: Alexander Graf 
---
 arch/x86/kvm/svm.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ef43a18..30e6b43 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1932,6 +1932,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
 		*data = svm->hsave_msr;
 		break;
 	case MSR_VM_CR:
+	case 0x4081:
 		*data = 0;
 		break;
 	case MSR_IA32_UCODE_REV:
@@ -2034,6 +2035,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
 	case MSR_VM_HSAVE_PA:
 		svm->hsave_msr = data;
 		break;
+	case MSR_VM_CR:
+	case MSR_VM_IGNNE:
+	case MSR_K8_HWCR:
+		break;
 	default:
 		return kvm_set_msr_common(vcpu, ecx, data);
 	}
-- 
1.6.0.2
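
The behaviour added by the two hunks above can be modelled outside the
kernel. This is only a sketch under stated assumptions, not the KVM code:
the function names, the "0 = handled here, 1 = fall back to the common
handler" return convention, and the numeric MSR values are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MSR numbers, for illustration only. */
#define MSR_K8_HWCR   0xc0010015u
#define MSR_VM_CR     0xc0010114u
#define MSR_VM_IGNNE  0xc0010115u

/*
 * Hypothetical read path: the MSRs listed here read as zero.
 * Returns 0 when handled, 1 when the caller should fall back
 * to some common handler.
 */
static int sketch_get_msr(uint32_t index, uint64_t *data)
{
	switch (index) {
	case MSR_VM_CR:
	case 0x4081:	/* no name known, kept in hex as in the patch */
		*data = 0;
		return 0;
	default:
		return 1;
	}
}

/*
 * Hypothetical write path: writes to the listed MSRs are accepted
 * and silently discarded, so the guest does not see a fault.
 */
static int sketch_set_msr(uint32_t index, uint64_t data)
{
	(void)data;	/* value is intentionally ignored */

	switch (index) {
	case MSR_VM_CR:
	case MSR_VM_IGNNE:
	case MSR_K8_HWCR:
		return 0;
	default:
		return 1;
	}
}
```

Reading MSR_VM_CR through sketch_get_msr() yields 0, and a write to
MSR_K8_HWCR through sketch_set_msr() is swallowed without side effects.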


[PATCH 5/6] Nested SVM: Implement INVLPGA

2009-05-15 Thread Alexander Graf

SVM adds another way to do INVLPG by ASID which Hyper-V makes use of,
so let's implement it!

For now we just do the same thing invlpg does, as asid switching
means we flush the mmu anyways. That might change one day though.

Signed-off-by: Alexander Graf 
---
 arch/x86/kvm/svm.c |   14 +-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 30e6b43..b2c6cf3 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1785,6 +1785,18 @@ static int clgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 	return 1;
 }
 
+static int invlpga_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	nsvm_printk("INVLPGA\n");
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+
+	kvm_mmu_reset_context(vcpu);
+	kvm_mmu_load(vcpu);
+	return 1;
+}
+
 static int invalid_op_interception(struct vcpu_svm *svm,
 				   struct kvm_run *kvm_run)
 {
@@ -2130,7 +2142,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 	[SVM_EXIT_INVD]			= emulate_on_interception,
 	[SVM_EXIT_HLT]			= halt_interception,
 	[SVM_EXIT_INVLPG]		= invlpg_interception,
-	[SVM_EXIT_INVLPGA]		= invalid_op_interception,
+	[SVM_EXIT_INVLPGA]		= invlpga_interception,
 	[SVM_EXIT_IOIO]			= io_interception,
 	[SVM_EXIT_MSR]			= msr_interception,
 	[SVM_EXIT_TASK_SWITCH]		= task_switch_interception,
-- 
1.6.0.2
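
The patch above works by swapping one entry in a table of handler
function pointers indexed by exit code. A stripped-down model of that
dispatch pattern is sketched below. Everything in it is hypothetical
(the enum values, struct vcpu_sketch, the handler names); the only facts
carried over are that the handler skips a 3-byte instruction (INVLPGA
encodes as 0F 01 DF), resets the MMU context, and returns 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical exit codes; the real SVM exit codes differ. */
enum exit_code { EXIT_HLT, EXIT_INVLPG, EXIT_INVLPGA, EXIT_MAX };

/* Hypothetical stand-in for the vcpu state touched by the handler. */
struct vcpu_sketch {
	unsigned long rip;
	int mmu_resets;
};

/* Skip the instruction, then flush; returns 1 like the handler above. */
static int invlpga_handler(struct vcpu_sketch *v)
{
	v->rip += 3;		/* INVLPGA is three bytes long */
	v->mmu_resets++;	/* stand-in for an MMU context reset and reload */
	return 1;
}

/* Table indexed by exit code; unlisted entries stay NULL. */
static int (*const handlers[EXIT_MAX])(struct vcpu_sketch *) = {
	[EXIT_INVLPGA] = invlpga_handler,
};

/* Look up and call the handler; -1 means "no handler installed". */
static int dispatch(struct vcpu_sketch *v, enum exit_code code)
{
	if ((unsigned int)code >= EXIT_MAX || !handlers[code])
		return -1;
	return handlers[code](v);
}
```

With designated initializers, changing which function serves an exit code
is a one-line edit to the table, which is all the second hunk does.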


[PATCH 3/6] Emulator: Inject #PF when page was not found

2009-05-15 Thread Alexander Graf

If we couldn't find a page on read_emulated, it might be a good
idea to tell the guest about that and inject a #PF.

We do the same already for write faults. I don't know why it was
not implemented for reads.

Signed-off-by: Alexander Graf 
---
 arch/x86/kvm/x86.c |    7 +++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5fcde2c..5aa1219 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2131,10 +2131,13 @@ static int emulator_read_emulated(unsigned long addr,
 		goto mmio;
 
 	if (kvm_read_guest_virt(addr, val, bytes, vcpu)
-				== X86EMUL_CONTINUE)
+				== X86EMUL_CONTINUE) {
 		return X86EMUL_CONTINUE;
-	if (gpa == UNMAPPED_GVA)
+	}
+	if (gpa == UNMAPPED_GVA) {
+		kvm_inject_page_fault(vcpu, addr, 0);
 		return X86EMUL_PROPAGATE_FAULT;
+	}
 
 mmio:
 	/*
-- 
1.6.0.2


RE: event injection MACROs

2009-05-15 Thread Dong, Eddie

Gleb Natapov wrote:
> On Thu, May 14, 2009 at 10:34:11PM +0800, Dong, Eddie wrote:
>> Gleb Natapov wrote:
>>> On Thu, May 14, 2009 at 09:43:33PM +0800, Dong, Eddie wrote:
>>>> Avi Kivity wrote:
>>>>> Dong, Eddie wrote:
>>>>>> OK.
>>>>>> Also back to Gleb's question, the reason I want to do that is to
>>>>>> simplify event generation mechanism in current KVM.
>>>>>> 
>>>>>> Today KVM use additional layer of exception/nmi/interrupt such as
>>>>>> vcpu.arch.exception.pending, vcpu->arch.interrupt.pending &
>>>>>> vcpu->arch.nmi_injected. All those additional layer is due to
>>>>>> compete of VM_ENTRY_INTR_INFO_FIELD
>>>>>> write to inject the event. Both SVM & VMX has only one resource
>>>>>> to inject the virtual event but KVM generates 3 catagory of
>>>>>> events in parallel which further requires additional
>>>>>> logic to dictate among them.
>>>>> 
>>>>> I thought of using a queue to hold all pending events (in a common
>>>>> format), sort it by priority, and inject the head.
>>>> 
>>>> The SDM Table 5-4 requires to merge 2 events together, i.e.
>>>> convert to #DF/ Triple fault or inject serially when 2 events
>>>> happens no matter NMI, IRQ or exception. 
>>>> 
>>>> As if considering above events merging activity, that is a single
>>>> element queue.
>>> I don't know how you got to this conclusion from you previous
>>> statement. See explanation to table 5-2 for instate where it is
>>> stated that interrupt should be held pending if there is exception
>>> with higher priority. Should be held pending where? In the queue,
>>> like we do. Note that low prio exceptions are just dropped since
>>> they will be regenerated.
>> 
>> I have different understanding here.
>> My understanding is that "held" means NO INTA in HW, i.e. LAPIC
>> still hold this IRQ. 
>> 
> And what if INTA already happened and CPU is ready to fetch IDT for
> interrupt vector and at this very moment CPU faults?

If INTA happens, that means the interrupt is delivered. If its delivery
triggers another exception, that is what Table 5-4 handles.

My understanding is that it is a 2-stage process. Table 5-2 talks about
events that happen before delivery, which the HW needs to prioritize.
Once a decision is made, the highest-priority one is delivered, but its
delivery could then trigger another exception, e.g. when fetching the IDT.
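
The second stage (an event faulting while an earlier one is being
delivered) can be written down as a small classifier. This is a sketch
of the merge rule from the SDM's double-fault conditions table (the one
this thread calls Table 5-4), not code from KVM. The type and function
names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the three classes follow the SDM's terminology. */
enum event_class { EV_BENIGN, EV_CONTRIBUTORY, EV_PAGE_FAULT };

#define DE_VECTOR  0	/* divide error */
#define TS_VECTOR 10	/* invalid TSS */
#define NP_VECTOR 11	/* segment not present */
#define SS_VECTOR 12	/* stack fault */
#define GP_VECTOR 13	/* general protection */
#define PF_VECTOR 14	/* page fault */

/* Classify an event; NMIs and external IRQs always count as benign. */
static enum event_class classify(int vector, bool is_exception)
{
	if (!is_exception)
		return EV_BENIGN;

	switch (vector) {
	case PF_VECTOR:
		return EV_PAGE_FAULT;
	case DE_VECTOR:
	case TS_VECTOR:
	case NP_VECTOR:
	case SS_VECTOR:
	case GP_VECTOR:
		return EV_CONTRIBUTORY;
	default:
		return EV_BENIGN;
	}
}

/*
 * True when a second event, raised while delivering the first, has to
 * be merged into #DF. Every other combination is handled serially.
 */
static bool merges_to_double_fault(enum event_class first,
				   enum event_class second)
{
	if (first == EV_CONTRIBUTORY && second == EV_CONTRIBUTORY)
		return true;
	if (first == EV_PAGE_FAULT && second != EV_BENIGN)
		return true;
	return false;
}
```

For example, a #GP raised while delivering an NMI is handled serially,
whereas a #PF raised while delivering a #PF becomes #DF.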

The current exception.pending/interrupt.pending/nmi_injected doesn't match
either of the above: interrupt/nmi is only for failed event injection, and
there is a strange fixed priority check when an event is really injected:
exception > failed NMI > failed IRQ > new NMI > new IRQ.

Table 5-2 looks to be missing from current KVM IMO, except for a wrong
(but minor) exception > NMI > IRQ sequence.
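
To make the fixed ordering concrete, here is a minimal sketch of such a
check. It only encodes the sequence stated above (exception > failed NMI
> failed IRQ > new NMI > new IRQ). The struct, its field names and the
enum are hypothetical and do not correspond to actual KVM fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags, one per event source named in the ordering. */
struct pending_events {
	bool exception;
	bool failed_nmi;	/* NMI whose earlier injection did not complete */
	bool failed_irq;	/* IRQ whose earlier injection did not complete */
	bool new_nmi;
	bool new_irq;
};

enum inject_choice {
	INJECT_NONE,
	INJECT_EXCEPTION,
	INJECT_FAILED_NMI,
	INJECT_FAILED_IRQ,
	INJECT_NEW_NMI,
	INJECT_NEW_IRQ,
};

/* Pick the single event to inject, checking sources in fixed order. */
static enum inject_choice pick_event(const struct pending_events *p)
{
	if (p->exception)
		return INJECT_EXCEPTION;
	if (p->failed_nmi)
		return INJECT_FAILED_NMI;
	if (p->failed_irq)
		return INJECT_FAILED_IRQ;
	if (p->new_nmi)
		return INJECT_NEW_NMI;
	if (p->new_irq)
		return INJECT_NEW_IRQ;
	return INJECT_NONE;
}
```

Note how a previously failed IRQ wins over a newly arrived NMI in this
ordering, which is the part that does not follow Table 5-2.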

> 
>>> 
>>>> We could have either:  1) A pure SW "queue" that will be flush to
>>>> HW register later (VM_ENTRY_INTR_INFO_FIELD), 2) Direct use HW
>>>> register. 
 
>>> We have three event sources 1) exceptions 2) IRQ 3) NMI. We should
>>> have queue of three elements sorted by priority. On each entry we
>>> should 
>> 
>> Table 5-4 alreadys says NMI/IRQ is BENIGN.
> Table 5-2 applies here not table 5-4 I think.
> 
>> 
>>> inject an event with highest priority. And remove it from queue on
>>> exit.
>> 
>> The problem is that we have to decide to inject only one of above 3,
>> and discard the rest. Whether priority them or merge (to one event
>> as Table 5-4) is another story. 
> Only a small number of event are merged into #DF. Most handled
> serially (SDM does not define what serially means unfortunately), so
> I don't understand where "discard the rest" is come from. We can

vmx_complete_interrupts clears all of them at the next EXIT.

Even from the HW point of view, if there is a pending NMI, IRQ and
exception, the CPU picks the highest one (NMI), ignores/discards the IRQ
(but the LAPIC still holds the IRQ, so it can be re-injected), and
completely discards the exception.

I am not saying that discarding is a problem, only that it is unnecessary
to keep all 3. The only difference is when to discard the other 2: at
queue_exception/irq/nmi time or later on (even at next EXIT time), which is
the same to me.

> discard exception since it will be regenerated anyway, but IRQ and
> NMI is another story. SDM says that IRQ should be held pending (once
> again not much explanation here), nothing about NMI.
> 
>>> 
 
>>>> A potential benefit is that it can avoid duplicated code and
>>>> potential bugs in current code as following patch shows if I
>>>> understand correctly: 
>>>> 
>>>> --- a/arch/x86/kvm/vmx.c
>>>> +++ b/arch/x86/kvm/vmx.c
>>>> @@ -2599,7 +2599,7 @@ static int handle_exception(struct kvm_vcpu
>>>> *vcpu, struct kvm_run *kvm_run) cr2 =
>>>> vmcs_readl(EXIT_QUALIFICATION);
>>>> KVMTRACE_3D(PAGE_FAULT, vcpu,
>>>> error_code, (u32)cr2, (u32)((u64)cr2 >> 32), handler); -
>>>> if (vcpu->arch.interrupt.pending || vcpu->arch.exception.pending )
>>>> + if (vcpu->arch.interrupt.pending ||
>>>> vcpu->arch.exception.pending  ||
>>>> vcpu->arch.n
