Re: [PATCH v2] Shared memory device with interrupt support
On 15-May-09, at 8:54 PM, Kumar, Venkat wrote:

> Cam,
>
> A question on interrupts as well.
> What is the "unix:path" that needs to be passed in the argument list? Can it be any string?

It has to be a valid path on the host. It will create a unix domain socket on that path.

> If my understanding is correct, both VMs that want to communicate would give this path on the command line, with one of them specifying "server".

Exactly, the one with "server" in the parameter list will wait for a connection before booting.

Cam

> Thx,
> Venkat
>
> Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guests by communicating over a unix domain socket. This patch applies to the qemu-kvm repository.
>
> This device now creates a qemu character device and sends 1-byte messages to trigger interrupts. Interrupts are triggered by writing to the "Doorbell" register on the shared memory PCI device. The lower 8 bits of the value written to this register are sent as the 1-byte message so different meanings of interrupts can be supported.
>
> Interrupts are only supported between 2 VMs currently. One VM must act as the server by adding "server" to the command-line argument. Shared memory devices are created with the following command-line:
>
> -ivhshmem ,,[unix:][,server]
>
> Interrupts can also be used between host and guest by implementing a listener on the host.
>
> Cam
> ---
>  Makefile.target |    3 +
>  hw/ivshmem.c    |  421 +++
>  hw/pc.c         |    6 +
>  hw/pc.h         |    3 +
>  qemu-options.hx |   14 ++
>  sysemu.h        |    8 +
>  vl.c            |   14 ++
>  7 files changed, 469 insertions(+), 0 deletions(-)
>  create mode 100644 hw/ivshmem.c
>
> diff --git a/Makefile.target b/Makefile.target
> index b68a689..3190bba 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -643,6 +643,9 @@ OBJS += pcnet.o
>  OBJS += rtl8139.o
>  OBJS += e1000.o
>  
> +# Inter-VM PCI shared memory
> +OBJS += ivshmem.o
> +
>  # Generic watchdog support and some watchdog devices
>  OBJS += watchdog.o
>  OBJS += wdt_ib700.o wdt_i6300esb.o
> diff --git a/hw/ivshmem.c b/hw/ivshmem.c
> new file mode 100644
> index 000..95e2268
> --- /dev/null
> +++ b/hw/ivshmem.c
> @@ -0,0 +1,421 @@
> +/*
> + * Inter-VM Shared Memory PCI device.
> + *
> + * Author:
> + *      Cam Macdonell
> + *
> + * Based On: cirrus_vga.c and rtl8139.c
> + *
> + * This code is licensed under the GNU GPL v2.
> + */
> +
> +#include "hw.h"
> +#include "console.h"
> +#include "pc.h"
> +#include "pci.h"
> +#include "sysemu.h"
> +
> +#include "qemu-common.h"
> +#include
> +
> +#define PCI_COMMAND_IOACCESS    0x0001
> +#define PCI_COMMAND_MEMACCESS   0x0002
> +#define PCI_COMMAND_BUSMASTER   0x0004
> +
> +//#define DEBUG_IVSHMEM
> +
> +#ifdef DEBUG_IVSHMEM
> +#define IVSHMEM_DPRINTF(fmt, args...)        \
> +    do {printf("IVSHMEM: " fmt, ##args); } while (0)
> +#else
> +#define IVSHMEM_DPRINTF(fmt, args...)
> +#endif
> +
> +typedef struct IVShmemState {
> +    uint16_t intrmask;
> +    uint16_t intrstatus;
> +    uint16_t doorbell;
> +    uint8_t *ivshmem_ptr;
> +    unsigned long ivshmem_offset;
> +    unsigned int ivshmem_size;
> +    unsigned long bios_offset;
> +    unsigned int bios_size;
> +    target_phys_addr_t base_ctrl;
> +    int it_shift;
> +    PCIDevice *pci_dev;
> +    CharDriverState * chr;
> +    unsigned long map_addr;
> +    unsigned long map_end;
> +    int ivshmem_mmio_io_addr;
> +} IVShmemState;
> +
> +typedef struct PCI_IVShmemState {
> +    PCIDevice dev;
> +    IVShmemState ivshmem_state;
> +} PCI_IVShmemState;
> +
> +typedef struct IVShmemDesc {
> +    char name[1024];
> +    char * chrdev;
> +    int size;
> +} IVShmemDesc;
> +
> +
> +/* registers for the Inter-VM shared memory device */
> +enum ivshmem_registers {
> +    IntrMask = 0,
> +    IntrStatus = 16,
> +    Doorbell = 32
> +};
> +
> +static int num_ivshmem_devices = 0;
> +static IVShmemDesc ivshmem_desc;
> +
> +static void ivshmem_map(PCIDevice *pci_dev, int region_num,
> +                        uint32_t addr, uint32_t size, int type)
> +{
> +    PCI_IVShmemState *d = (PCI_IVShmemState *)pci_dev;
> +    IVShmemState *s = &d->ivshmem_state;
> +
> +    IVSHMEM_DPRINTF("addr = %u size = %u\n", addr, size);
> +    cpu_register_physical_memory(addr, s->ivshmem_size, s->ivshmem_offset);
> +
> +}
> +
> +void ivshmem_init(const char * optarg) {
> +
> +    char * temp;
> +    char * ivshmem_sz;
> +    int size;
> +
> +    num_ivshmem_devices++;
> +
> +    /* currently we only support 1 device */
> +    if (num_ivshmem_devices > MAX_IVSHMEM_DEVICES) {
> +        return;
> +    }
> +
> +    temp = strdup(optarg);
> +    snprintf(ivshmem_desc.name, 1024, "/%s", strsep(&temp,","));
> +    ivshmem_sz=strsep(&temp,",");
> +    if (ivshmem_sz != NULL){
> +        size = atol(ivshmem_sz);
> +    } else {
> +        size = -1;
> +    }
> +
> +    ivshmem_desc.chrdev = strsep(&temp,"\0");
> +
> +    if ( size == -1) {
> +        ivshmem_desc.size = TARGET_PAGE_SIZE;
> +    } else {
> +        ivshmem_desc.size = size*1024*1024;
> +    }
> +    IVSHMEM_DPRINTF("optarg
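
The host-side listener mentioned in the patch description could look roughly like the sketch below. It is not part of the patch: the helper names (doorbell_message, connect_unix, read_doorbell) are made up for illustration, and it assumes the guest was started with "server" so that the host program connects as the client and then receives one byte per doorbell write.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Per the patch description, only the low 8 bits of the value written
 * to the Doorbell register travel over the socket. */
uint8_t doorbell_message(uint32_t written)
{
    return (uint8_t)(written & 0xff);
}

/* Connect to the unix domain socket at `path` (hypothetical helper).
 * Returns a connected fd, or -1 on any failure. */
int connect_unix(const char *path)
{
    struct sockaddr_un sa;
    int fd;

    if (strlen(path) >= sizeof(sa.sun_path))
        return -1;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sun_family = AF_UNIX;
    strcpy(sa.sun_path, path);

    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until one 1-byte doorbell message arrives on fd.
 * Returns 1 on success, 0 if the peer closed the socket, -1 on error. */
int read_doorbell(int fd, uint8_t *msg)
{
    ssize_t n;

    do {
        n = read(fd, msg, 1);
    } while (n < 0 && errno == EINTR);

    return (int)n;
}
```

A listener would call connect_unix() with the same path given to the guest and then loop on read_doorbell(), dispatching on the byte value.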
Re: [PATCH v2] Shared memory device with interrupt support
On 15-May-09, at 8:45 PM, Kumar, Venkat wrote:

> Hi Cam,
>
> I have gone through your latest shared memory patch. I have a few questions and comments.
>
> Comment:-
>
> +if (ivshmem_enabled) {
> +    ivshmem_init(ivshmem_device);
> +    ram_size += ivshmem_get_size();
> +}
> +
>
> In your initial patch this part of the patch is
>
> +if (ivshmem_enabled) {
> +    ivshmem_init(ivshmem_device);
> +    phys_ram_size += ivshmem_get_size();
> +}
>
> I think the phys_ram_size += ivshmem_get_size(); is correct.

Hi Venkat,

Not with the newer qemu that qemu-kvm uses. The newer patch is for qemu-kvm, not kvm-userspace. There is no longer a variable named phys_ram_size in pc.c in qemu-kvm.

> Question:-
>
> You are giving the desired virtual address for mmaping the shared memory object as "s->ivshmem_ptr", which is "phys_ram_base + s->ivshmem_offset". This desired virtual address is nothing but the base virtual address of the memory that you are allocating after incrementing phys_ram_size. So now s->ivshmem_ptr would point to a new set of memory, which is the shared memory region, instead of the memory allocated through qemu_alloc_physram. That means if pages are allocated for the "s->ivshmem_ptr" virtual address range, then those pages can never be addressed again. Correct me if my understanding is wrong.

I don't think so. With the mmap call, I specify MAP_FIXED, which requires that the memory in the shared memory object be mapped to the address given in the first parameter (s->ivshmem_ptr). If MAP_FIXED is not specified then mmap would allocate the memory and map on to it, but with MAP_FIXED it maps onto the already reserved space that ivshmem_ptr points to and that was allocated with qemu_ram_alloc().

I hope that answers your question,
Cam

> -----Original Message-----
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Cam Macdonell
> Sent: Thursday, May 07, 2009 9:47 PM
> To: kvm@vger.kernel.org
> Cc: Cam Macdonell
> Subject: [PATCH v2] Shared memory device with interrupt support
[PATCH 1/2] Clean up MADT Table Creation
This patch is based on the recent patch from Vincent Minet. I split Vincent's changes into 2 patches (to separate MADT and RSDT table cleanup, as suggested by Marcelo) and added a bit to them. And to give credit where it is due, this cleanup is also related to the patch Marcelo provided when the HPET addition tripped over the same problem. (Thanks again Marcelo :-)

This patch moves all the table layout calculations to the same area of acpi_bios_init. This prevents corruption problems when, in the middle of filling in the tables, the MADT table size grows. The idea is to do all the layout in one section, then fill things in afterwards. It also corrects a problem where the madt table was memset to 0 before the final size of the table had been determined.

Signed-off-by: Beth Kon

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index cbd5f15..7f62e4f 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1665,6 +1665,7 @@ void acpi_bios_init(void)
 
     addr = (addr + 7) & ~7;
     madt_addr = addr;
+    madt = (void *)(addr);
     madt_size = sizeof(*madt) +
         sizeof(struct madt_processor_apic) * MAX_CPUS +
 #ifdef BX_QEMU
@@ -1672,7 +1673,11 @@ void acpi_bios_init(void)
 #else
         sizeof(struct madt_io_apic);
 #endif
-    madt = (void *)(addr);
+    for ( i = 0; i < 16; i++ ) {
+        if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
+            madt_size += sizeof(struct madt_int_override);
+        }
+    }
     addr += madt_size;
 
 #ifdef BX_QEMU
@@ -1786,7 +1791,6 @@ void acpi_bios_init(void)
             continue;
         }
         int_override++;
-        madt_size += sizeof(struct madt_int_override);
     }
     acpi_build_table_header((struct acpi_table_header *)madt,
                             "APIC", madt_size, 1);
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
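
The size calculation that the patch moves into the layout section can be tried standalone. The sketch below is only an illustration of the idea: the function names are hypothetical and the sizes passed in are placeholders, not the real rombios32.c structure sizes. The point is that the full MADT size, including one interrupt override entry per set bit in the ISA IRQ mask, is known before any table after the MADT is assigned an address.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the set bits among IRQs 0..15, one override entry per bit,
 * mirroring the loop the patch adds to the layout section. */
unsigned count_overrides(uint32_t isa_irq_mask)
{
    unsigned i, n = 0;

    for (i = 0; i < 16; i++) {
        if (isa_irq_mask & (1U << i))
            n++;
    }
    return n;
}

/* Total MADT size computed up front from its parts (all sizes are
 * supplied by the caller; nothing here depends on the real structs). */
size_t madt_total_size(size_t header_size, size_t cpu_entry_size,
                       unsigned max_cpus, size_t io_apic_size,
                       size_t override_size, uint32_t isa_irq_mask)
{
    return header_size
         + cpu_entry_size * max_cpus
         + io_apic_size
         + override_size * count_overrides(isa_irq_mask);
}
```

Because the result is final, a later memset or a later table address derived from it cannot be invalidated by the fill-in code.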
[PATCH 2/2] Clean up RSDT Table Creation
This patch is also based on the patch by Vincent Minet. It corrects the size calculation of the RSDT, and checks for overflow of MAX_RSDT_ENTRIES, assuming that the external table entry count is contained within MAX_RSDT_ENTRIES.

Signed-off-by: Beth Kon

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 7f62e4f..ac8f9c5 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
     addr = base_addr = ram_size - ACPI_DATA_SIZE;
     rsdt_addr = addr;
     rsdt = (void *)(addr);
-    rsdt_size = sizeof(*rsdt) + external_tables * 4;
+    rsdt_size = sizeof(*rsdt);
     addr += rsdt_size;
 
     fadt_addr = addr;
@@ -1873,16 +1873,6 @@ void acpi_bios_init(void)
                             "HPET", sizeof(*hpet), 1);
 #endif
 
-    acpi_additional_tables(); /* resets cfg to required entry */
-    for(i = 0; i < external_tables; i++) {
-        uint16_t len;
-        if(acpi_load_table(i, addr, &len) < 0)
-            BX_PANIC("Failed to load ACPI table from QEMU\n");
-        rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
-        addr += len;
-        if(addr >= ram_size)
-            BX_PANIC("ACPI table overflow\n");
-    }
 #endif
 
     /* RSDT */
@@ -1895,6 +1885,19 @@ void acpi_bios_init(void)
 //  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
     if (nb_numa_nodes > 0)
         rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
+    acpi_additional_tables(); /* resets cfg to required entry */
+    /* external_tables load must occur last to
+     * properly check for MAX_RSDT_ENTRIES overflow.
+     */
+    for(i = 0; i < external_tables; i++) {
+        uint16_t len;
+        if(acpi_load_table(i, addr, &len) < 0)
+            BX_PANIC("Failed to load ACPI table from QEMU\n");
+        rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr);
+        addr += len;
+        if((addr >= ram_size) || (nb_rsdt_entries > MAX_RSDT_ENTRIES))
+            BX_PANIC("ACPI table overflow\n");
+    }
 #endif
     rsdt_size -= MAX_RSDT_ENTRIES * 4;
     rsdt_size += nb_rsdt_entries * 4;
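
As a side note on the overflow check, the loop in the patch stores the entry first and tests nb_rsdt_entries afterwards. A possible variation, shown here only as a sketch with hypothetical helper names and not as what the patch does, is to test the count before the store so that the array is never written past its last slot. The second helper restates the final size arithmetic from the end of the patch (4 bytes per entry).

```c
#include <stddef.h>
#include <stdint.h>

/* Append one table address to the RSDT entry array.
 * Returns 0 on success, -1 if the array is already full; in the
 * failure case nothing is written and the count is unchanged. */
int rsdt_add_entry(uint32_t *entries, unsigned *nb_entries,
                   unsigned max_entries, uint32_t table_addr)
{
    if (*nb_entries >= max_entries)
        return -1;

    entries[(*nb_entries)++] = table_addr;
    return 0;
}

/* Size actually reported for the RSDT: the full struct size minus the
 * unused tail of the entry array (4 bytes per 32-bit entry). */
size_t rsdt_final_size(size_t struct_size, unsigned max_entries,
                       unsigned nb_entries)
{
    return struct_size - 4u * max_entries + 4u * nb_entries;
}
```

In the BIOS the -1 case would presumably map to the existing BX_PANIC("ACPI table overflow\n") path.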
Re: Subject:[PATCH 1/2] Clean up MADT Table Creation
Beth Kon wrote:
> This patch is also based on the patch by Vincent Minet. It corrects the size calculation of the RSDT, and checks for overflow of MAX_RSDT_ENTRIES, assuming that the external table entry count is contained within MAX_RSDT_ENTRIES.
>
> Signed-off-by: Beth Kon

This should have been patch 2/2. I think git-send-email didn't like that I didn't have a space after Subject: . Let me try to resend with the space added.
RE: [PATCH v2] Shared memory device with interrupt support
Hi Cam,

I have gone through your latest shared memory patch. I have a few questions and comments.

Comment:-

+if (ivshmem_enabled) {
+    ivshmem_init(ivshmem_device);
+    ram_size += ivshmem_get_size();
+}
+

In your initial patch this part of the patch is

+if (ivshmem_enabled) {
+    ivshmem_init(ivshmem_device);
+    phys_ram_size += ivshmem_get_size();
+}

I think the phys_ram_size += ivshmem_get_size(); is correct.

Question:-

You are giving the desired virtual address for mmaping the shared memory object as "s->ivshmem_ptr", which is "phys_ram_base + s->ivshmem_offset". This desired virtual address is nothing but the base virtual address of the memory that you are allocating after incrementing phys_ram_size. So now s->ivshmem_ptr would point to a new set of memory, which is the shared memory region, instead of the memory allocated through qemu_alloc_physram. That means if pages are allocated for the "s->ivshmem_ptr" virtual address range, then those pages can never be addressed again. Correct me if my understanding is wrong.

Thx,
Venkat

-----Original Message-----
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Cam Macdonell
Sent: Thursday, May 07, 2009 9:47 PM
To: kvm@vger.kernel.org
Cc: Cam Macdonell
Subject: [PATCH v2] Shared memory device with interrupt support
Re: XP smp using a lot of CPU [SOLVED]
On May 15, 2009, at 3:24 PM, Ross Boylan wrote:

> Using ACPI fixes the problem; CPU usage is now quite low.
>
> Start line was
>
> sudo vdeq kvm -net nic,vlan=1,macaddr=52:54:a0:12:01:00 \
>   -net vde,vlan=1,sock=/var/run/vde2/tap0.ctl \
>   -boot d -cdrom /usr/local/backup/XPProSP3.iso \
>   -std-vga -hda /dev/turtle/XP00 \
>   -soundhw es1370 -localtime -m 1G -smp 2
>
> I switched to -boot c later.
>
> I ended up doing a fresh install; my repair got mucked up and I got the message "The requested lookup key was not found in any active activation context" when I entered a location into MSIE, including when I tried to run Windows Update. Googling showed this might indicate some permission or file corruption issues. They may have happened during my earlier (virtual) system hang.
>
> My experience suggests a theory: if you use SMP with XP (i.e., more than 1 virtual processor) you should enable ACPI, i.e., not say -no-acpi. If this is true, the advice to run Windows with -no-acpi should probably be updated. It's possible single CPU systems are affected as well.

I removed the note about -no-acpi from the howto on the wiki. I don't think that's been true for a long time.

--Iggy

> Ross
Re: [PATCH v4] qemu-kvm: Make PC speaker emulation aware of in-kernel PIT
On Thu, May 14, 2009 at 10:43:05PM +0200, Jan Kiszka wrote:
> When using the in-kernel PIT the speaker emulation has to synchronize
> the PIT state with KVM. Enhance the existing speaker sound device and
> allow it to take over port 0x61 by using KVM_CREATE_PIT2 where
> available. This unbreaks -soundhw pcspk in KVM mode.
>
> Changes in v4:
>  - preserve full PIT state across read-modify-write
>  - update kvm.h
>
> Changes in v3:
>  - re-added incorrectly dropped kvm_enabled checks
>
> Changes in v2:
>  - rebased over qemu-kvm and KVM_CREATE_PIT2
>  - refactored hooks in pcspk
>
> Signed-off-by: Jan Kiszka

Jan,

You always attempt to use KVM_CREATE_PIT2, so if, say, on migration the destination does not support the new ioctl, you naturally fall back to the in-kernel dummy. Seems the right thing to do.

It would be nice to avoid sprinkling KVM details inside hw/pcspk.c, but that is another problem.

Looks good (and v3 kernel patch).
Re: XP smp using a lot of CPU [SOLVED]
Using ACPI fixes the problem; CPU usage is now quite low.

Start line was

sudo vdeq kvm -net nic,vlan=1,macaddr=52:54:a0:12:01:00 \
  -net vde,vlan=1,sock=/var/run/vde2/tap0.ctl \
  -boot d -cdrom /usr/local/backup/XPProSP3.iso \
  -std-vga -hda /dev/turtle/XP00 \
  -soundhw es1370 -localtime -m 1G -smp 2

I switched to -boot c later.

I ended up doing a fresh install; my repair got mucked up and I got the message "The requested lookup key was not found in any active activation context" when I entered a location into MSIE, including when I tried to run Windows Update. Googling showed this might indicate some permission or file corruption issues. They may have happened during my earlier (virtual) system hang.

My experience suggests a theory: if you use SMP with XP (i.e., more than 1 virtual processor) you should enable ACPI, i.e., not say -no-acpi. If this is true, the advice to run Windows with -no-acpi should probably be updated. It's possible single CPU systems are affected as well.

Ross
Re: [PATCH] bios: Fix MADT corruption and RSDT size when using -acpitable
Marcelo Tosatti wrote:
> Beth,
>
> On Thu, May 14, 2009 at 12:20:29PM -0400, Beth Kon wrote:
> > Anthony Liguori wrote:
> > > Vincent Minet wrote:
> > > > External ACPI tables are counted twice for the RSDT size and the load
> > > > address for the first external table is in the MADT (interrupt override
> > > > entries are overwritten).
> > > >
> > > > Signed-off-by: Vincent Minet
> > >
> > > Beth,
> > >
> > > I think you had a patch attempting to address the same issue. It was a
> > > bit more involved though. Which is the proper fix and are they both to
> > > the same problem?
> >
> > They are for 2 different bases. My patch was for qemu's bochs bios and
> > this is for qemu-kvm/kvm/bios/rombios32.c. They are pretty divergent in
> > this area of setting up the ACPI tables. My patch is still needed for
> > the qemu base. I hope we'll be getting to one base soon :-)
> >
> > Assuming the intent of the code was for MAX_RSDT_ENTRIES to include
> > external_tables, this patch looks correct. I think one additional check
> > would be needed (in my patch) to make sure that the code doesn't exceed
> > MAX_RSDT_ENTRIES when the external tables are being loaded.
> >
> > My patch also puts all the code that calculates madt_size in the same
> > place, at the beginning of the table layout. I believe this is neater
> > and will avoid problems like this one in the future. As much as
> > possible, I think it best to get all the tables laid out, then fill
> > them in. If for some reason this is not acceptable, we need to add a big
> > note that no tables should be laid out after the madt because the madt
> > may grow further down in the code and overwrite the other table.
>
> I like this better too, see questions/comments below.
>
> > > Regards,
> > >
> > > Anthony Liguori
> > >
> > > > ---
> > > >  kvm/bios/rombios32.c |    3 ++-
> > > >  1 files changed, 2 insertions(+), 1 deletions(-)
> > > >
> > > > diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
> > > > index cbd5f15..289361b 100755
> > > > --- a/kvm/bios/rombios32.c
> > > > +++ b/kvm/bios/rombios32.c
> > > > @@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
> > > >      addr = base_addr = ram_size - ACPI_DATA_SIZE;
> > > >      rsdt_addr = addr;
> > > >      rsdt = (void *)(addr);
> > > > -    rsdt_size = sizeof(*rsdt) + external_tables * 4;
> > > > +    rsdt_size = sizeof(*rsdt);
> > > >      addr += rsdt_size;
> > > >
> > > >      fadt_addr = addr;
> > > > @@ -1787,6 +1787,7 @@ void acpi_bios_init(void)
> > > >          }
> > > >          int_override++;
> > > >          madt_size += sizeof(struct madt_int_override);
> > > > +        addr += sizeof(struct madt_int_override);
> > > >      }
> > > >      acpi_build_table_header((struct acpi_table_header *)madt,
> > > >                              "APIC", madt_size, 1);
> >
> > diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
> > index cbd5f15..23835b6 100755
> > --- a/kvm/bios/rombios32.c
> > +++ b/kvm/bios/rombios32.c
> > @@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
> >      addr = base_addr = ram_size - ACPI_DATA_SIZE;
> >      rsdt_addr = addr;
> >      rsdt = (void *)(addr);
> > -    rsdt_size = sizeof(*rsdt) + external_tables * 4;
> > +    rsdt_size = sizeof(*rsdt);
> >      addr += rsdt_size;
> >
> >      fadt_addr = addr;
> > @@ -1665,6 +1665,7 @@ void acpi_bios_init(void)
> >
> >      addr = (addr + 7) & ~7;
> >      madt_addr = addr;
> > +    madt = (void *)(addr);
> >      madt_size = sizeof(*madt) +
> >          sizeof(struct madt_processor_apic) * MAX_CPUS +
> >  #ifdef BX_QEMU
> > @@ -1672,7 +1673,11 @@ void acpi_bios_init(void)
> >  #else
> >          sizeof(struct madt_io_apic);
> >  #endif
> > -    madt = (void *)(addr);
> > +    for ( i = 0; i < 16; i++ ) {
> > +        if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
> > +            madt_size += sizeof(struct madt_int_override);
> > +        }
> > +    }
> >      addr += madt_size;
>
> This bug could only affect the HPET descriptor right?

I'm not sure what you're asking. There were 2 bugs that Vincent pointed out. The first caused an incorrect rsdt_size to be reported, and the second (missing addr += sizeof(struct madt_int_override)) caused corruption of whatever came after the MADT. But even if his patch were applied, any future code that added a table and manipulated addr between the following points:

    ... (about line 1676)
    madt = (void *)(addr);
    addr += madt_size;

    ... (about line 1789)
    madt_size += sizeof(struct madt_int_override);
    addr += sizeof(struct madt_int_override);

would have wound up causing some kind of corruption, as happened with the HPET. Also the "memset(madt, 0, madt_size)" around line 1740 was not using the complete madt_size.

So this seems undesirable, and that's why I suggested moving all addr manipulation (with the exception of additional tables at the very end) to the same section of the table layout code. Seems best to manage madt_size all in one place.

> >
> >  #ifdef BX_QEMU
> > @@ -1786,7 +1791,6 @@ void acpi_bios_init(void)
> >              continue;
> >          }
> >          int_override++;
> > -        madt_size += sizeof(struct madt_int_override);
> >      }
> >      acpi_build_table_header((struct acpi_table_header *)madt,
> >                              "APIC", madt_size, 1);
> > @@ -1868,17 +1872,6 @@ void acpi_bios_init(void)
> >      acpi_build_table_header(
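
The "lay everything out first, then fill in" approach argued for above amounts to a small bump allocator over addr. The sketch below is an illustration under assumptions, not code from rombios32.c: align_up and layout_reserve are hypothetical names, and the alignment is assumed to be a power of two, as with the (addr + 7) & ~7 idiom in the patch.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power
 * of two); align_up(addr, 8) is the (addr + 7) & ~7 idiom. */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Reserve `size` bytes for one table at the next aligned address.
 * Returns the table's start and advances *addr past its end, so the
 * next reservation can never overlap this one as long as `size` is
 * already final when the call is made. */
uint32_t layout_reserve(uint32_t *addr, uint32_t size, uint32_t align)
{
    uint32_t start = align_up(*addr, align);

    *addr = start + size;
    return start;
}
```

The corruption described in the thread corresponds to growing a table's size after a later layout_reserve call has already handed out the bytes that follow it.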
Re: RFC: convert KVMTRACE to event traces
On Fri, May 15, 2009 at 01:10:34PM -0400, Christoph Hellwig wrote: > On Thu, May 14, 2009 at 05:30:16PM -0300, Marcelo Tosatti wrote: > > + trace_kvm_cr_write(cr, val); > > switch (cr) { > > case 0: > > - kvm_set_cr0(vcpu, kvm_register_read(vcpu, reg)); > > + kvm_set_cr0(vcpu, val); > > skip_emulated_instruction(vcpu); > > Do we really need one trace point covering all cr writes, _and_ one for > each specific register? There is one tracepoint named kvm_cr that covers cr reads and writes. kvm_trace_cr_read/kvm_trace_cr_write are macros that expand to kvm_trace_cr(rw=1 or rw=0). Perhaps that is not a very good idea. > > > if (!npt_enabled) > > - KVMTRACE_3D(PAGE_FAULT, &svm->vcpu, error_code, > > - (u32)fault_address, (u32)(fault_address >> 32), > > - handler); > > + trace_kvm_page_fault(fault_address, error_code); > > else > > - KVMTRACE_3D(TDP_FAULT, &svm->vcpu, error_code, > > - (u32)fault_address, (u32)(fault_address >> 32), > > - handler); > > + trace_kvm_tdp_page_fault(fault_address, error_code); > > Again this seems a bit cumbersome. Why not just one tracepoint for > page faults, with a flag if we're using npt or not? Issue is the meaning of these faults is different. With npt disabled the fault is a guest fault (like a normal pagefault), but with npt enabled the fault indicates the host pagetables the hardware uses to do the translation are not set up correctly. I did unify them as you suggest but reverted back to separate tracepoints because the unification might be confusing. Can be unified later if desirable. 
> > +ifeq ($(CONFIG_TRACEPOINTS),y) > > +trace-objs = kvm-traces.o > > +arch-trace-objs = kvm-traces-arch.o > > +endif > > + > > EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm > > > > kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o > > \ > > - i8254.o > > + i8254.o $(trace-objs) > > obj-$(CONFIG_KVM) += kvm.o > > -kvm-intel-objs = vmx.o > > +kvm-intel-objs = vmx.o $(arch-trace-objs) > > obj-$(CONFIG_KVM_INTEL) += kvm-intel.o > > -kvm-amd-objs = svm.o > > +kvm-amd-objs = svm.o $(arch-trace-objs) > > obj-$(CONFIG_KVM_AMD) += kvm-amd.o > > The option to select even tracing bits is CONFIG_EVENT_TRACING and the > makefile syntax used here (both the original makefile and the additions) > is rather awkward. > > A proper arch/x86/kvm/Makefile including tracing bits should look like > the following: > > -- snip -- > EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm > > kvm-y += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \ > coalesced_mmio.o irq_comm.o) > kvm-$(CONFIG_KVM_TRACE) += $(addprefix ../../../virt/kvm/, kvm_trace.o) > kvm-$(CONFIG_IOMMU_API) += $(addprefix ../../../virt/kvm/, iommu.o) > kmv-y += x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \ > i8254.o > > kvm-$(CONFIG_EVENT_TRACING) += kvm-traces.o > kvm-arch-trace-$(CONFIG_EVENT_TRACING) += kvm-traces-arch.o > > kvm-intel-y += vmx.o $(kvm-arch-trace-y) > kvm-amd-y += svm.o $(kvm-arch-trace-y) > > obj-$(CONFIG_KVM) += kvm.o > obj-$(CONFIG_KVM_INTEL) += kvm-intel.o > obj-$(CONFIG_KVM_AMD) += kvm-amd.o > -- snip -- > > and do we actually still need kvm_trace.o after this? Your version looks much nicer. kvm_trace.o can disappear as soon as this is in Avi's tree and a decent replacement for user/kvm_trace.c is in qemu-kvm.git. > Anyway, I'll send the upstream part of the makefile cleanup out ASAP, > then you can rebase later. OK. 
> > > Index: linux-2.6-x86-2/arch/x86/kvm/kvm-traces.c > > === > > --- /dev/null > > +++ linux-2.6-x86-2/arch/x86/kvm/kvm-traces.c > > @@ -0,0 +1,5 @@ > > +#include > > + > > + > > +#define CREATE_TRACE_POINTS > > +#include > > Can't we just put this into some other common .c file? That would also > reduce the amount of makefile magic required. > > > Index: linux-2.6-x86-2/arch/x86/kvm/kvm-traces-arch.c > > === > > --- /dev/null > > +++ linux-2.6-x86-2/arch/x86/kvm/kvm-traces-arch.c > > @@ -0,0 +1,5 @@ > > +#include > > + > > + > > +#define CREATE_TRACE_POINTS > > +#include > > Same for this one, especially as the makefile hackery required for this > one is even worse.. Probably for both. Now that you say I can't explain the reason for the separate C files. Will put this up in a git tree in a couple of hours. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
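For readers following the tracepoint discussion: the arrangement described in the reply (one kvm_cr tracepoint, with read and write wrappers that expand to it) can be sketched in plain userspace C. This is only an illustration. trace_kvm_cr() below is a hypothetical stand-in that counts and prints events, not a real kernel tracepoint, and the convention rw=0 for reads and rw=1 for writes is an assumption, since the mail does not say which value means which.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the single kvm_cr tracepoint: it counts
 * events per direction instead of emitting a real trace record. */
static unsigned int cr_trace_count[2];

static void trace_kvm_cr(int rw, unsigned int cr, unsigned long val)
{
	assert(rw == 0 || rw == 1);
	cr_trace_count[rw]++;
	printf("kvm_cr: %s cr%u val=0x%lx\n", rw ? "write" : "read", cr, val);
}

/* Thin wrappers that only select the direction.
 * Assumed convention: 0 = read, 1 = write. */
#define trace_kvm_cr_read(cr, val)	trace_kvm_cr(0, (cr), (val))
#define trace_kvm_cr_write(cr, val)	trace_kvm_cr(1, (cr), (val))
```

With this shape a consumer filters on one event name and one rw field, which is the trade-off being questioned in the review.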
Re: [PATCH v4 resend 0/6] ATS capability support for Intel IOMMU
On Thu, 14 May 2009 10:32:05 +0800 Yu Zhao wrote:
> This patch series implements Address Translation Service support for
> the Intel IOMMU. The PCIe Endpoint that supports ATS capability can
> request the DMA address translation from the IOMMU and cache the
> translation itself. This can alleviate IOMMU TLB pressure and improve
> the hardware performance in the I/O virtualization environment.
>
> The ATS is one of PCI-SIG I/O Virtualization (IOV) Specifications. The
> spec can be found at: http://www.pcisig.com/specifications/iov/ats/
> (it requires membership).

These ones can go through David's tree. You can add my:

Acked-by: Jesse Barnes

Thanks,
--
Jesse Barnes, Intel Open Source Technology Center
Re: RFC: convert KVMTRACE to event traces
On Thu, May 14, 2009 at 05:30:16PM -0300, Marcelo Tosatti wrote:
> +	trace_kvm_cr_write(cr, val);
> 	switch (cr) {
> 	case 0:
> -		kvm_set_cr0(vcpu, kvm_register_read(vcpu, reg));
> +		kvm_set_cr0(vcpu, val);
> 		skip_emulated_instruction(vcpu);

Do we really need one trace point covering all cr writes, _and_ one for
each specific register?

> 	if (!npt_enabled)
> -		KVMTRACE_3D(PAGE_FAULT, &svm->vcpu, error_code,
> -			    (u32)fault_address, (u32)(fault_address >> 32),
> -			    handler);
> +		trace_kvm_page_fault(fault_address, error_code);
> 	else
> -		KVMTRACE_3D(TDP_FAULT, &svm->vcpu, error_code,
> -			    (u32)fault_address, (u32)(fault_address >> 32),
> -			    handler);
> +		trace_kvm_tdp_page_fault(fault_address, error_code);

Again this seems a bit cumbersome. Why not just one tracepoint for
page faults, with a flag if we're using npt or not?

> +ifeq ($(CONFIG_TRACEPOINTS),y)
> +trace-objs = kvm-traces.o
> +arch-trace-objs = kvm-traces-arch.o
> +endif
> +
>  EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
>
>  kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
> -	i8254.o
> +	i8254.o $(trace-objs)
>  obj-$(CONFIG_KVM) += kvm.o
> -kvm-intel-objs = vmx.o
> +kvm-intel-objs = vmx.o $(arch-trace-objs)
>  obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
> -kvm-amd-objs = svm.o
> +kvm-amd-objs = svm.o $(arch-trace-objs)
>  obj-$(CONFIG_KVM_AMD) += kvm-amd.o

The option to select event tracing bits is CONFIG_EVENT_TRACING and the
makefile syntax used here (both the original makefile and the additions)
is rather awkward.

A proper arch/x86/kvm/Makefile including tracing bits should look like
the following:

-- snip --
EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm

kvm-y += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
		coalesced_mmio.o irq_comm.o)
kvm-$(CONFIG_KVM_TRACE) += $(addprefix ../../../virt/kvm/, kvm_trace.o)
kvm-$(CONFIG_IOMMU_API) += $(addprefix ../../../virt/kvm/, iommu.o)
kvm-y += x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
		i8254.o

kvm-$(CONFIG_EVENT_TRACING) += kvm-traces.o
kvm-arch-trace-$(CONFIG_EVENT_TRACING) += kvm-traces-arch.o

kvm-intel-y += vmx.o $(kvm-arch-trace-y)
kvm-amd-y += svm.o $(kvm-arch-trace-y)

obj-$(CONFIG_KVM) += kvm.o
obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
obj-$(CONFIG_KVM_AMD) += kvm-amd.o
-- snip --

and do we actually still need kvm_trace.o after this?

Anyway, I'll send the upstream part of the makefile cleanup out ASAP,
then you can rebase later.

> Index: linux-2.6-x86-2/arch/x86/kvm/kvm-traces.c
> ===
> --- /dev/null
> +++ linux-2.6-x86-2/arch/x86/kvm/kvm-traces.c
> @@ -0,0 +1,5 @@
> +#include
> +
> +
> +#define CREATE_TRACE_POINTS
> +#include

Can't we just put this into some other common .c file? That would also
reduce the amount of makefile magic required.

> Index: linux-2.6-x86-2/arch/x86/kvm/kvm-traces-arch.c
> ===
> --- /dev/null
> +++ linux-2.6-x86-2/arch/x86/kvm/kvm-traces-arch.c
> @@ -0,0 +1,5 @@
> +#include
> +
> +
> +#define CREATE_TRACE_POINTS
> +#include

Same for this one, especially as the makefile hackery required for this
one is even worse..
[PATCH v3] kmod: Add distclean rule
The smaller the patch... sigh.

>

Remove the configure output config.kbuild, config.mak and arch links via
distclean.

Signed-off-by: Jan Kiszka
---
 Makefile |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/Makefile b/Makefile
index dad5f0b..cef121d 100644
--- a/Makefile
+++ b/Makefile
@@ -68,3 +68,6 @@ rpm: all
 
 clean:
 	$(MAKE) -C $(KERNELDIR) M=`pwd` $@
+
+distclean: clean
+	rm -f config.kbuild config.mak include/asm include-compat/asm
[PATCH v2] qemu-kvm: add iosignalfd support
An iosignalfd allows an eventfd to attach to a specific PIO/MMIO region in the guest. Any guest-writes to that region will trigger an eventfd signal. For more details, see the kernel side patches submitted here: http://lkml.org/lkml/2009/5/15/303 Signed-off-by: Gregory Haskins --- kvm/libkvm/libkvm.c | 68 +++ kvm/libkvm/libkvm.h | 39 + 2 files changed, 107 insertions(+), 0 deletions(-) diff --git a/kvm/libkvm/libkvm.c b/kvm/libkvm/libkvm.c index ccab985..dc3414f 100644 --- a/kvm/libkvm/libkvm.c +++ b/kvm/libkvm/libkvm.c @@ -1501,3 +1501,71 @@ int kvm_destroy_irqfd(kvm_context_t kvm, int fd, int gsi, int flags) } #endif /* KVM_CAP_IRQFD */ + +#ifdef KVM_CAP_IOSIGNALFD + +int kvm_assign_iosignalfd(kvm_context_t kvm, unsigned long cookie, + unsigned long addr, size_t len, + int fd, int flags) +{ + int r; + int type = flags & IOSIGNALFD_FLAG_PIO; + struct kvm_iosignalfd data = { + .cookie = cookie, + .addr = addr, + .len= len, + .fd = fd, + .flags = type ? KVM_IOSIGNALFD_FLAG_PIO : 0, + }; + + if (!kvm_check_extension(kvm, KVM_CAP_IOSIGNALFD)) + return -ENOENT; + + r = ioctl(kvm->vm_fd, KVM_IOSIGNALFD, &data); + if (r == -1) + r = -errno; + return r; +} + +int kvm_deassign_iosignalfd(kvm_context_t kvm, unsigned long cookie, + unsigned long addr, int flags) +{ + int r; + int type = flags & IOSIGNALFD_FLAG_PIO; + int cvalid = flags & IOSIGNALFD_FLAG_COOKIE; + struct kvm_iosignalfd data = { + .cookie = cookie, + .addr= addr, + .flags = KVM_IOSIGNALFD_FLAG_DEASSIGN | + (type ? KVM_IOSIGNALFD_FLAG_PIO : 0) | + (cvalid ? 
KVM_IOSIGNALFD_FLAG_COOKIE : 0), + }; + + if (!kvm_check_extension(kvm, KVM_CAP_IOSIGNALFD)) + return -ENOENT; + + r = ioctl(kvm->vm_fd, KVM_IOSIGNALFD, &data); + if (r == -1) + r = -errno; + return r; +} + +#else /* KVM_CAP_IOSIGNALFD */ + +int kvm_assign_iosignalfd(kvm_context_t kvm, unsigned long cookie, + unsigned long addr, size_t len, + int fd, int flags) +{ + return -ENOENT; +} + +int kvm_deassign_iosignalfd(kvm_context_t kvm, unsigned long cookie, + unsigned long addr, int flags) +{ + return -ENOENT; +} + +#endif /* KVM_CAP_IOSIGNALFD */ + + + diff --git a/kvm/libkvm/libkvm.h b/kvm/libkvm/libkvm.h index 3ccbe3d..ea81e55 100644 --- a/kvm/libkvm/libkvm.h +++ b/kvm/libkvm/libkvm.h @@ -882,6 +882,45 @@ int kvm_create_irqfd(kvm_context_t kvm, int gsi, int flags); */ int kvm_destroy_irqfd(kvm_context_t kvm, int fd, int gsi, int flags); +enum { + iosignalfd_option_pio, + iosignalfd_option_cookie, +}; + +#define IOSIGNALFD_FLAG_PIO(1 << iosignalfd_option_pio) +#define IOSIGNALFD_FLAG_COOKIE (1 << iosignalfd_option_cookie) + +/*! + * \brief Assign an eventfd to an IO port (PIO or MMIO) + * + * Assigns an eventfd based file-descriptor to a specific PIO or MMIO + * address range. Any guest writes to the specified range will generate + * an eventfd signal. + * + * \param kvm Pointer to the current kvm_context + * \param cookie A user-assigned cookie for optional use in deassign + * \param addr The IO address + * \param len The length of the IO region at the address + * \param fd The eventfd file-descriptor + * \param flags FLAG_PIO: PIO, else MMIO + */ +int kvm_assign_iosignalfd(kvm_context_t kvm, unsigned long cookie, + unsigned long addr, size_t len, + int fd, int flags); + +/*! 
+ * \brief Deassign an iosignalfd from a previously registered IO port
+ *
+ * Deassigns an iosignalfd previously registered with kvm_assign_iosignalfd()
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param cookie The cookie to (optionally) match (must specify FLAG_COOKIE)
+ * \param addr The IO address to deassign
+ * \param flags FLAG_PIO: PIO, else MMIO, FLAG_COOKIE: cookie is valid
+ */
+int kvm_deassign_iosignalfd(kvm_context_t kvm, unsigned long cookie,
+			    unsigned long addr, int flags);
+
 #ifdef KVM_CAP_DEVICE_MSIX
 int kvm_assign_set_msix_nr(kvm_context_t kvm,
 			   struct kvm_assigned_msix_nr *msix_nr);
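The flag handling in kvm_assign_iosignalfd() and kvm_deassign_iosignalfd() is easy to misread in diff form, because the libkvm option bits and the kernel flag bits use different positions. The sketch below isolates just that translation. The bit definitions are copied from this patch and from the kernel-side patch (include/linux/kvm.h); the two helper functions are hypothetical names introduced here for illustration and are not part of libkvm.

```c
/* libkvm-side option bits, as defined in the libkvm.h hunk of this patch. */
enum {
	iosignalfd_option_pio,
	iosignalfd_option_cookie,
};

#define IOSIGNALFD_FLAG_PIO	(1 << iosignalfd_option_pio)
#define IOSIGNALFD_FLAG_COOKIE	(1 << iosignalfd_option_cookie)

/* Kernel-side flag bits, as defined in the kernel patch. */
#define KVM_IOSIGNALFD_FLAG_DEASSIGN	(1 << 0)
#define KVM_IOSIGNALFD_FLAG_PIO		(1 << 1)
#define KVM_IOSIGNALFD_FLAG_COOKIE	(1 << 2)

/* Hypothetical helper: kernel flag word for an assign request.
 * Only the PIO bit is forwarded; everything else is dropped. */
static unsigned int iosignalfd_assign_flags(int flags)
{
	return (flags & IOSIGNALFD_FLAG_PIO) ? KVM_IOSIGNALFD_FLAG_PIO : 0;
}

/* Hypothetical helper: kernel flag word for a deassign request.
 * DEASSIGN is always set; PIO and COOKIE are forwarded if requested. */
static unsigned int iosignalfd_deassign_flags(int flags)
{
	unsigned int kflags = KVM_IOSIGNALFD_FLAG_DEASSIGN;

	if (flags & IOSIGNALFD_FLAG_PIO)
		kflags |= KVM_IOSIGNALFD_FLAG_PIO;
	if (flags & IOSIGNALFD_FLAG_COOKIE)
		kflags |= KVM_IOSIGNALFD_FLAG_COOKIE;
	return kflags;
}
```

Note that, as in the patch, the assign path ignores IOSIGNALFD_FLAG_COOKIE even though a cookie value is always passed in the structure.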
Re: [KVM PATCH v2 0/4] iosignalfd
Gregory Haskins wrote:
> [
>
> Applies to kvm.git:833367b57c plus the irqfd patch, v8, as posted here:
>
> http://lkml.org/lkml/2009/5/14/258
>

I should also mention: NOT FOR INCLUSION

I am still testing this code, so this is an rfc for now.

> ]
>
> This is v2 of the series. For more details, please see the header to
> patch 4/4.
>
> [
> Changelog:
>
> v2:
> *) added optional data-matching capability (via cookie field)
> *) changed name from iofd to iosignalfd
> *) added io_bus unregister function
> *) implemented deassign feature
>
> v1:
> *) original release (integrated into irqfd v7 series as "iofd")
> ]
>
> ---
>
> Gregory Haskins (4):
>       kvm: add iosignalfd support
>       kvm: add io_bus unregister function
>       kvm: add return value to kvm_io_bus_register_dev
>       eventfd: export eventfd interfaces for module use
>
>
>  arch/x86/kvm/i8254.c      |    7 +-
>  arch/x86/kvm/i8259.c      |    5 +
>  fs/eventfd.c              |    3 +
>  include/linux/kvm.h       |   15
>  include/linux/kvm_host.h  |   10 ++-
>  virt/kvm/coalesced_mmio.c |    4 +
>  virt/kvm/eventfd.c        |  154 +
>  virt/kvm/ioapic.c         |    4 +
>  virt/kvm/kvm_main.c       |   62 --
>  9 files changed, 249 insertions(+), 15 deletions(-)
>
>
[KVM PATCH v2 4/4] kvm: add iosignalfd support
iosignalfd is a mechanism to register PIO/MMIO regions to trigger an eventfd signal when written to by a guest. Host userspace can register any arbitrary IO address with a corresponding eventfd and then pass the eventfd to a specific end-point of interest for handling.

Normal IO requires a blocking round-trip since the operation may cause side-effects in the emulated model or may return data to the caller. Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM "heavy-weight" exit back to userspace, and is ultimately serviced by qemu's device model synchronously before returning control back to the vcpu.

However, there is a subclass of IO which acts purely as a trigger for other IO (such as to kick off an out-of-band DMA request, etc). For these patterns, the synchronous call is particularly expensive since we really only want to get our notification transmitted asynchronously and return as quickly as possible. All the synchronous infrastructure to ensure proper data-dependencies are met in the normal IO case is just unnecessary overhead for signalling. This adds additional computational load on the system, as well as latency to the signalling path.

Therefore, we provide a mechanism for registration of an in-kernel trigger point that allows the VCPU to only require a very brief, lightweight exit just long enough to signal an eventfd. This also means that any clients compatible with the eventfd interface (which includes userspace and kernelspace equally well) can now register to be notified. The end result should be a more flexible and higher performance notification API for the backend KVM hypervisor and peripheral components.

To test this theory, we built a test-harness called "doorbell". This module has a function called "doorbell_ring()" which simply increments a counter for each time the doorbell is signaled. It supports signalling from either an eventfd, or an ioctl().
We then wired up two paths to the doorbell: One via QEMU via a registered io region and through the doorbell ioctl(). The other is direct via iosignalfd. You can download this test harness here: ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2 The measured results are as follows: qemu-mmio: 11 iops, 9.09us rtt iosignalfd-mmio: 200100 iops, 5.00us rtt iosignalfd-pio: 367300 iops, 2.72us rtt I didn't measure qemu-pio, because I have to figure out how to register a PIO region with qemu's device model, and I got lazy. However, for now we can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO, and -350ns for HC, we get: qemu-pio: 153139 iops, 6.53us rtt iosignalfd-hc: 412585 iops, 2.37us rtt these are just for fun, for now, until I can gather more data. Here is a graph for your convenience: http://developer.novell.com/wiki/images/7/76/Iofd-chart.png The conclusion to draw is that we save about 4us by skipping the userspace hop. Signed-off-by: Gregory Haskins --- include/linux/kvm.h | 15 include/linux/kvm_host.h |2 + virt/kvm/eventfd.c | 154 ++ virt/kvm/kvm_main.c | 13 4 files changed, 184 insertions(+), 0 deletions(-) diff --git a/include/linux/kvm.h b/include/linux/kvm.h index a1ecc6a..9372b12 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -292,6 +292,19 @@ struct kvm_guest_debug { struct kvm_guest_debug_arch arch; }; +#define KVM_IOSIGNALFD_FLAG_DEASSIGN (1 << 0) +#define KVM_IOSIGNALFD_FLAG_PIO (1 << 1) +#define KVM_IOSIGNALFD_FLAG_COOKIE(1 << 2) + +struct kvm_iosignalfd { + __u64 cookie; + __u64 addr; + __u32 len; + __u32 fd; + __u32 flags; + __u8 pad[12]; +}; + #define KVM_TRC_SHIFT 16 /* * kvm trace categories @@ -416,6 +429,7 @@ struct kvm_trace_rec { /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */ #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30 #define KVM_CAP_IRQFD 31 +#define KVM_CAP_IOSIGNALFD 32 #ifdef KVM_CAP_IRQ_ROUTING @@ -509,6 +523,7 @@ struct kvm_irqfd { _IOW(KVMIO, 0x74, struct kvm_assigned_msix_entry) #define 
KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq) #define KVM_IRQFD _IOW(KVMIO, 0x76, struct kvm_irqfd) +#define KVM_IOSIGNALFD _IOW(KVMIO, 0x77, struct kvm_iosignalfd) /* * ioctls for vcpu fds diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 214089f..4e4b174 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -137,6 +137,7 @@ struct kvm { struct kvm_io_bus mmio_bus; struct kvm_io_bus pio_bus; struct list_head irqfds; + struct list_head iosignalfds; struct kvm_vm_stat stat; struct kvm_arch arch; atomic_t users_count; @@ -530,5 +531,6 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {} int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int fla
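As background for the patch above: the consumer side of an iosignalfd is an ordinary eventfd, so its semantics can be tried out without KVM. The following is a minimal userspace sketch, assuming Linux and the standard eventfd(2) interface. It does not issue the KVM_IOSIGNALFD ioctl (that needs a kernel with this series applied); plain write() calls stand in for the guest writes that would signal the eventfd. The function name signal_and_drain() is hypothetical.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal an eventfd `times` times, then read it once.
 * Returns the counter value seen by the reader, or -1 on error
 * (including the case where nothing was signalled). */
static long long signal_and_drain(unsigned int times)
{
	uint64_t one = 1, seen = 0;
	unsigned int i;
	int fd = eventfd(0, EFD_NONBLOCK);

	if (fd < 0)
		return -1;

	/* Each 8-byte write adds its value to the eventfd counter. */
	for (i = 0; i < times; i++) {
		if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
			close(fd);
			return -1;
		}
	}

	/* One read returns the accumulated count and resets it to zero.
	 * With EFD_NONBLOCK it fails (EAGAIN) if the count is zero. */
	if (read(fd, &seen, sizeof(seen)) != (ssize_t)sizeof(seen)) {
		close(fd);
		return -1;
	}

	close(fd);
	return (long long)seen;
}
```

The point relevant to the design is that signals coalesce: several triggers that arrive before the consumer runs are delivered as one wakeup carrying a count, which suits the "pure trigger" class of IO the commit message describes.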
[KVM PATCH v2 3/4] kvm: add io_bus unregister function
We want to support the notion of dynamic MMIO/PIO registrations and therefore will need to support both register as well as unregister. However, the current io_bus code is structured as a linear array and is not conducive to unregistering, so refactor to allow "holes" in the array. We then enhance the API with an unregister function. Signed-off-by: Gregory Haskins --- include/linux/kvm_host.h |4 +++- virt/kvm/kvm_main.c | 48 ++ 2 files changed, 43 insertions(+), 9 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 94c1a11..214089f 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -52,7 +52,7 @@ extern struct kmem_cache *kvm_vcpu_cache; * in one place. */ struct kvm_io_bus { - int dev_count; + spinlock_t lock; #define NR_IOBUS_DEVS 6 struct kvm_io_device *devs[NR_IOBUS_DEVS]; }; @@ -63,6 +63,8 @@ struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus, gpa_t addr, int len, int is_write); int kvm_io_bus_register_dev(struct kvm_io_bus *bus, struct kvm_io_device *dev); +int kvm_io_bus_unregister_dev(struct kvm_io_bus *bus, + struct kvm_io_device *dev); struct kvm_vcpu { struct kvm *kvm; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 60ba0cf..5f5e443 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2433,16 +2433,18 @@ static struct notifier_block kvm_reboot_notifier = { void kvm_io_bus_init(struct kvm_io_bus *bus) { memset(bus, 0, sizeof(*bus)); + spin_lock_init(&bus->lock); } void kvm_io_bus_destroy(struct kvm_io_bus *bus) { int i; - for (i = 0; i < bus->dev_count; i++) { + for (i = 0; i < NR_IOBUS_DEVS; i++) { struct kvm_io_device *pos = bus->devs[i]; - kvm_iodevice_destructor(pos); + if (pos) + kvm_iodevice_destructor(pos); } } @@ -2451,10 +2453,10 @@ struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus, { int i; - for (i = 0; i < bus->dev_count; i++) { + for (i = 0; i < NR_IOBUS_DEVS; i++) { struct kvm_io_device *pos = bus->devs[i]; - if (pos->in_range(pos, addr, 
len, is_write)) + if (pos && pos->in_range(pos, addr, len, is_write)) return pos; } @@ -2463,12 +2465,42 @@ struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus, int kvm_io_bus_register_dev(struct kvm_io_bus *bus, struct kvm_io_device *dev) { - if (bus->dev_count > (NR_IOBUS_DEVS-1)) - return -ENOSPC; + int i; - bus->devs[bus->dev_count++] = dev; + spin_lock(&bus->lock); - return 0; + for (i = 0; i < NR_IOBUS_DEVS; i++) { + if (bus->devs[i]) + continue; + + bus->devs[i] = dev; + spin_unlock(&bus->lock); + return 0; + } + + spin_unlock(&bus->lock); + + return -ENOSPC; +} + +int kvm_io_bus_unregister_dev(struct kvm_io_bus *bus, struct kvm_io_device *dev) +{ + int i; + + spin_lock(&bus->lock); + + for (i = 0; i < NR_IOBUS_DEVS; i++) { + + if (bus->devs[i] == dev) { + bus->devs[i] = NULL; + spin_unlock(&bus->lock); + return 0; + } + } + + spin_unlock(&bus->lock); + + return -ENOENT; } static struct notifier_block kvm_cpu_notifier = { -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
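The refactoring in this patch (a fixed array where NULL marks a free slot, so entries can be removed without compacting) can be exercised outside the kernel. The sketch below mirrors the register/unregister logic of the patch in plain C. It is an illustration under stated assumptions: struct io_device is a hypothetical stand-in for struct kvm_io_device, the function names drop the kvm_ prefix, and the spinlock is omitted, so this version is not safe for concurrent callers.

```c
#include <errno.h>
#include <stddef.h>

#define NR_IOBUS_DEVS 6

/* Hypothetical stand-in for struct kvm_io_device. */
struct io_device {
	int id;
};

struct io_bus {
	struct io_device *devs[NR_IOBUS_DEVS];	/* NULL marks a free slot */
};

/* Place dev in the first free slot; -ENOSPC if the bus is full. */
static int io_bus_register_dev(struct io_bus *bus, struct io_device *dev)
{
	int i;

	for (i = 0; i < NR_IOBUS_DEVS; i++) {
		if (bus->devs[i])
			continue;
		bus->devs[i] = dev;
		return 0;
	}
	return -ENOSPC;
}

/* Clear the slot holding dev, leaving a hole; -ENOENT if not found. */
static int io_bus_unregister_dev(struct io_bus *bus, struct io_device *dev)
{
	int i;

	for (i = 0; i < NR_IOBUS_DEVS; i++) {
		if (bus->devs[i] == dev) {
			bus->devs[i] = NULL;
			return 0;
		}
	}
	return -ENOENT;
}
```

Because holes are allowed, every iteration over devs[] has to cover all NR_IOBUS_DEVS slots and skip NULL entries, which is why the patch also touches kvm_io_bus_destroy() and kvm_io_bus_find_dev().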
[KVM PATCH v2 2/4] kvm: add return value to kvm_io_bus_register_dev
Today this function returns void and will internally BUG_ON if it fails. We want to create dynamic MMIO/PIO entries driven from userspace later in the series, so enhance this API to return an error code on failure. We also fix up all the callsites to check the return code and BUG_ON if it fails. The net result should be identical behavior both before and after this patch. We are simply laying the groundwork for the dynamic usage Signed-off-by: Gregory Haskins --- arch/x86/kvm/i8254.c |7 +-- arch/x86/kvm/i8259.c |5 - include/linux/kvm_host.h |4 ++-- virt/kvm/coalesced_mmio.c |4 +++- virt/kvm/ioapic.c |4 +++- virt/kvm/kvm_main.c |7 +-- 6 files changed, 22 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index 4d6f0d2..cc274d6 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -564,6 +564,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm) { struct kvm_pit *pit; struct kvm_kpit_state *pit_state; + int ret; pit = kzalloc(sizeof(struct kvm_pit), GFP_KERNEL); if (!pit) @@ -584,13 +585,15 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm) pit->dev.write = pit_ioport_write; pit->dev.in_range = pit_in_range; pit->dev.private = pit; - kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev); + ret = kvm_io_bus_register_dev(&kvm->pio_bus, &pit->dev); + BUG_ON(ret < 0); pit->speaker_dev.read = speaker_ioport_read; pit->speaker_dev.write = speaker_ioport_write; pit->speaker_dev.in_range = speaker_in_range; pit->speaker_dev.private = pit; - kvm_io_bus_register_dev(&kvm->pio_bus, &pit->speaker_dev); + ret = kvm_io_bus_register_dev(&kvm->pio_bus, &pit->speaker_dev); + BUG_ON(ret < 0); kvm->arch.vpit = pit; pit->kvm = kvm; diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c index 1ccb50c..7d39b5b 100644 --- a/arch/x86/kvm/i8259.c +++ b/arch/x86/kvm/i8259.c @@ -519,6 +519,8 @@ static void pic_irq_request(void *opaque, int level) struct kvm_pic *kvm_create_pic(struct kvm *kvm) { struct kvm_pic *s; + int ret; + s = 
kzalloc(sizeof(struct kvm_pic), GFP_KERNEL); if (!s) return NULL; @@ -538,6 +540,7 @@ struct kvm_pic *kvm_create_pic(struct kvm *kvm) s->dev.write = picdev_write; s->dev.in_range = picdev_in_range; s->dev.private = s; - kvm_io_bus_register_dev(&kvm->pio_bus, &s->dev); + ret = kvm_io_bus_register_dev(&kvm->pio_bus, &s->dev); + BUG_ON(ret < 0); return s; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index dc91610..94c1a11 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -61,8 +61,8 @@ void kvm_io_bus_init(struct kvm_io_bus *bus); void kvm_io_bus_destroy(struct kvm_io_bus *bus); struct kvm_io_device *kvm_io_bus_find_dev(struct kvm_io_bus *bus, gpa_t addr, int len, int is_write); -void kvm_io_bus_register_dev(struct kvm_io_bus *bus, -struct kvm_io_device *dev); +int kvm_io_bus_register_dev(struct kvm_io_bus *bus, + struct kvm_io_device *dev); struct kvm_vcpu { struct kvm *kvm; diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c index 5ae620d..19945e1 100644 --- a/virt/kvm/coalesced_mmio.c +++ b/virt/kvm/coalesced_mmio.c @@ -86,6 +86,7 @@ static void coalesced_mmio_destructor(struct kvm_io_device *this) int kvm_coalesced_mmio_init(struct kvm *kvm) { struct kvm_coalesced_mmio_dev *dev; + int ret; dev = kzalloc(sizeof(struct kvm_coalesced_mmio_dev), GFP_KERNEL); if (!dev) @@ -96,7 +97,8 @@ int kvm_coalesced_mmio_init(struct kvm *kvm) dev->dev.private = dev; dev->kvm = kvm; kvm->coalesced_mmio_dev = dev; - kvm_io_bus_register_dev(&kvm->mmio_bus, &dev->dev); + ret = kvm_io_bus_register_dev(&kvm->mmio_bus, &dev->dev); + BUG_ON(ret < 0); return 0; } diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index 1eddae9..3eee4c9 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -317,6 +317,7 @@ void kvm_ioapic_reset(struct kvm_ioapic *ioapic) int kvm_ioapic_init(struct kvm *kvm) { struct kvm_ioapic *ioapic; + int ret; ioapic = kzalloc(sizeof(struct kvm_ioapic), GFP_KERNEL); if (!ioapic) @@ -328,7 +329,8 @@ int 
kvm_ioapic_init(struct kvm *kvm) ioapic->dev.in_range = ioapic_in_range; ioapic->dev.private = ioapic; ioapic->kvm = kvm; - kvm_io_bus_register_dev(&kvm->mmio_bus, &ioapic->dev); + ret = kvm_io_bus_register_dev(&kvm->mmio_bus, &ioapic->dev); + BUG_ON(ret < 0); return 0; } diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index b2db766..60ba0cf 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2
[KVM PATCH v2 1/4] eventfd: export eventfd interfaces for module use
We want to use eventfd from KVM which can be compiled as a module, so
export the interfaces.

Signed-off-by: Gregory Haskins
---

 fs/eventfd.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 2a701d5..3f0e197 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 
 struct eventfd_ctx {
 	wait_queue_head_t wqh;
@@ -56,6 +57,7 @@ int eventfd_signal(struct file *file, int n)
 
 	return n;
 }
+EXPORT_SYMBOL_GPL(eventfd_signal);
 
 static int eventfd_release(struct inode *inode, struct file *file)
 {
@@ -197,6 +199,7 @@ struct file *eventfd_fget(int fd)
 
 	return file;
 }
+EXPORT_SYMBOL_GPL(eventfd_fget);
 
 SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
 {
[KVM PATCH v2 0/4] iosignalfd
[ Applies to kvm.git:833367b57c plus the irqfd patch, v8, as posted here:

  http://lkml.org/lkml/2009/5/14/258
]

This is v2 of the series. For more details, please see the header to
patch 4/4.

[
Changelog:

   v2:
      *) added optional data-matching capability (via cookie field)
      *) changed name from iofd to iosignalfd
      *) added io_bus unregister function
      *) implemented deassign feature

   v1:
      *) original release (integrated into irqfd v7 series as "iofd")
]

---

Gregory Haskins (4):
      kvm: add iosignalfd support
      kvm: add io_bus unregister function
      kvm: add return value to kvm_io_bus_register_dev
      eventfd: export eventfd interfaces for module use


 arch/x86/kvm/i8254.c      |    7 +-
 arch/x86/kvm/i8259.c      |    5 +
 fs/eventfd.c              |    3 +
 include/linux/kvm.h       |   15
 include/linux/kvm_host.h  |   10 ++-
 virt/kvm/coalesced_mmio.c |    4 +
 virt/kvm/eventfd.c        |  154 +
 virt/kvm/ioapic.c         |    4 +
 virt/kvm/kvm_main.c       |   62 --
 9 files changed, 249 insertions(+), 15 deletions(-)

--
Signature
[PATCH v2] kmod: Add distclean rule
Jan Kiszka wrote:
> --- a/Makefile
> +++ b/Makefile
> @@ -68,3 +68,6 @@ rpm: all
>
>  clean:
>  	$(MAKE) -C $(KERNELDIR) M=`pwd` $@
> +
> +distclean:
> +	rm -f config.kbuild config.mak

This one is "cleaner":

->

Remove the configure output config.kbuild and config.mak via distclean.

Signed-off-by: Jan Kiszka
---
 Makefile |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/Makefile b/Makefile
index dad5f0b..a4c59c9 100644
--- a/Makefile
+++ b/Makefile
@@ -68,3 +68,6 @@ rpm: all
 
 clean:
 	$(MAKE) -C $(KERNELDIR) M=`pwd` $@
+
+distclean: clean
+	rm -f config.kbuild config.mak
Re: [PATCH] bios: Fix MADT corruption and RSDT size when using -acpitable
Beth,

On Thu, May 14, 2009 at 12:20:29PM -0400, Beth Kon wrote:
> Anthony Liguori wrote:
>> Vincent Minet wrote:
>>> External ACPI tables are counted twice for the RSDT size and the load
>>> address for the first external table is in the MADT (interrupt override
>>> entries are overwritten).
>>>
>>> Signed-off-by: Vincent Minet
>>>
>>
>> Beth,
>>
>> I think you had a patch attempting to address the same issue. It was
>> a bit more involved though.
>>
>> Which is the proper fix and are they both to the same problem?
> They are for 2 different bases. My patch was for qemu's bochs bios and
> this is for qemu-kvm/kvm/bios/rombios32.c. They are pretty divergent in
> this area of setting up the ACPI tables. My patch is still needed for
> the qemu base. I hope we'll be getting to one base soon :-)
>
> Assuming the intent of the code was for MAX_RSDT_ENTRIES to include
> external_tables, this patch looks correct. I think one additional check
> would be needed (in my patch) to make sure that the code doesn't exceed
> MAX_RSDT_ENTRIES when the external tables are being loaded.
>
> My patch also puts all the code that calculates madt_size in the same
> place, at the beginning of the table layout. I believe this is neater
> and will avoid problems like this one in the future. As much as
> possible, I think it best to get all the tables laid out, then fill
> them in. If for some reason this is not acceptable, we need to add a big
> note that no tables should be laid out after the madt because the madt
> may grow further down in the code and overwrite the other table.

I like this better too, see questions/comments below.
>> >> Regards, >> >> Anthony Liguori >> >>> --- >>> kvm/bios/rombios32.c |3 ++- >>> 1 files changed, 2 insertions(+), 1 deletions(-) >>> >>> diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c >>> index cbd5f15..289361b 100755 >>> --- a/kvm/bios/rombios32.c >>> +++ b/kvm/bios/rombios32.c >>> @@ -1626,7 +1626,7 @@ void acpi_bios_init(void) >>> addr = base_addr = ram_size - ACPI_DATA_SIZE; >>> rsdt_addr = addr; >>> rsdt = (void *)(addr); >>> -rsdt_size = sizeof(*rsdt) + external_tables * 4; >>> +rsdt_size = sizeof(*rsdt); >>> addr += rsdt_size; >>> fadt_addr = addr; >>> @@ -1787,6 +1787,7 @@ void acpi_bios_init(void) >>> } >>> int_override++; >>> madt_size += sizeof(struct madt_int_override); >>> +addr += sizeof(struct madt_int_override); >>> } >>> acpi_build_table_header((struct acpi_table_header *)madt, >>> "APIC", madt_size, 1); >>> > diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c > index cbd5f15..23835b6 100755 > --- a/kvm/bios/rombios32.c > +++ b/kvm/bios/rombios32.c > @@ -1626,7 +1626,7 @@ void acpi_bios_init(void) > addr = base_addr = ram_size - ACPI_DATA_SIZE; > rsdt_addr = addr; > rsdt = (void *)(addr); > -rsdt_size = sizeof(*rsdt) + external_tables * 4; > +rsdt_size = sizeof(*rsdt); > addr += rsdt_size; > > fadt_addr = addr; > @@ -1665,6 +1665,7 @@ void acpi_bios_init(void) > > addr = (addr + 7) & ~7; > madt_addr = addr; > +madt = (void *)(addr); > madt_size = sizeof(*madt) + > sizeof(struct madt_processor_apic) * MAX_CPUS + > #ifdef BX_QEMU > @@ -1672,7 +1673,11 @@ void acpi_bios_init(void) > #else > sizeof(struct madt_io_apic); > #endif > -madt = (void *)(addr); > +for ( i = 0; i < 16; i++ ) { > +if ( PCI_ISA_IRQ_MASK & (1U << i) ) { > +madt_size += sizeof(struct madt_int_override); > +} > +} > addr += madt_size; This bug could only affect the HPET descriptor right? 
> #ifdef BX_QEMU > @@ -1786,7 +1791,6 @@ void acpi_bios_init(void) > continue; > } > int_override++; > -madt_size += sizeof(struct madt_int_override); > } > acpi_build_table_header((struct acpi_table_header *)madt, > "APIC", madt_size, 1); > @@ -1868,17 +1872,6 @@ void acpi_bios_init(void) > acpi_build_table_header((struct acpi_table_header *)hpet, > "HPET", sizeof(*hpet), 1); > #endif > - > -acpi_additional_tables(); /* resets cfg to required entry */ > -for(i = 0; i < external_tables; i++) { > -uint16_t len; > -if(acpi_load_table(i, addr, &len) < 0) > -BX_PANIC("Failed to load ACPI table from QEMU\n"); > -rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(addr); > -addr += len; > -if(addr >= ram_size) > -BX_PANIC("ACPI table overflow\n"); > -} The external ACPI tables fix(es) are logically separate from the MADT intoverride size calculation, and so they could be separate patches? > #endif > > /* RSDT */ > @@ -1891,6 +1884,16 @@ void acpi_bios_init(void) > // rsdt->table_offset_e
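To make the "lay out first, fill in later" argument concrete: if the number of interrupt override entries is counted from the IRQ mask before addr is advanced, the MADT's final size is known up front and no later table can land inside it. The sketch below shows that calculation in isolation. It is not the rombios32.c code. The two size constants are hypothetical placeholders for the sizeof() expressions in the real file, and the mask value used in the usage note is only an example.

```c
/* Hypothetical sizes, for illustration only; rombios32.c derives the
 * real values from sizeof() on its MADT structures. */
#define MADT_FIXED_SIZE		64u	/* header + APIC entries (assumed) */
#define INT_OVERRIDE_SIZE	10u	/* one override entry (assumed) */

/* Count the ISA IRQs (0..15) selected in the mask; each one gets an
 * interrupt override entry, as in the loop added by the patch. */
static unsigned int count_int_overrides(unsigned int irq_mask)
{
	unsigned int i, n = 0;

	for (i = 0; i < 16; i++) {
		if (irq_mask & (1U << i))
			n++;
	}
	return n;
}

/* Reserve the complete MADT at addr before any later table is placed.
 * Stores the full table size in *madt_size and returns the first
 * address after the table. */
static unsigned long layout_madt(unsigned long addr, unsigned int irq_mask,
				 unsigned int *madt_size)
{
	*madt_size = MADT_FIXED_SIZE +
		     count_int_overrides(irq_mask) * INT_OVERRIDE_SIZE;
	return addr + *madt_size;
}
```

For example, with a mask of 0x0e20 (IRQs 5, 9, 10 and 11) four override entries are reserved, so a table placed at the returned address, such as the HPET, cannot be overwritten when the overrides are filled in later.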
[PATCH] kmod: Update .gitignore
Signed-off-by: Jan Kiszka --- .gitignore | 118 +++- 1 files changed, 60 insertions(+), 58 deletions(-) diff --git a/.gitignore b/.gitignore index 22a8200..bdebd0a 100644 --- a/.gitignore +++ b/.gitignore @@ -3,64 +3,66 @@ *~ *.flat *.a -config.mak .*.cmd -qemu/config-host.h -qemu/config-host.mak -user/test/bootstrap -user/kvmctl -qemu/dyngen -qemu/x86_64-softmmu -qemu/qemu-img -qemu/qemu-nbd *.ko *.mod.c -bios/*.bin -bios/*.sym -bios/*.txt -bios/acpi-dsdt.aml -vgabios/*.bin -vgabios/*.txt -extboot/extboot.bin -extboot/extboot.img -extboot/signrom -kernel/config.kbuild -kernel/modules.order -kernel/Module.symvers -kernel/Modules.symvers -kernel/Module.markers -kernel/.tmp_versions -kernel/include-compat/asm -kernel/include-compat/asm-x86/asm-x86 -kernel/include -kernel/x86/modules.order -kernel/x86/i825[49].[ch] -kernel/x86/kvm_main.c -kernel/x86/kvm_svm.h -kernel/x86/vmx.[ch] -kernel/x86/svm.[ch] -kernel/x86/mmu.[ch] -kernel/x86/paging_tmpl.h -kernel/x86/x86_emulate.[ch] -kernel/x86/ioapic.[ch] -kernel/x86/iodev.h -kernel/x86/irq.[ch] -kernel/x86/kvm_trace.c -kernel/x86/lapic.[ch] -kernel/x86/tss.h -kernel/x86/x86.[ch] -kernel/x86/coalesced_mmio.[ch] -kernel/x86/kvm_cache_regs.h -kernel/x86/vtd.c -kernel/x86/irq_comm.c -kernel/x86/timer.c -kernel/x86/kvm_timer.h -kernel/x86/iommu.c -qemu/pc-bios/extboot.bin -qemu/qemu-doc.html -qemu/*.[18] -qemu/*.pod -qemu/qemu-tech.html -qemu/qemu-options.texi -user/kvmtrace -user/test/x86/bootstrap +config.kbuild +config.mak +modules.order +Module.symvers +Modules.symvers +Module.markers +.tmp_versions +include-compat/asm +include-compat/asm-x86/asm-x86 +include +x86/modules.order +x86/i825[49].[ch] +x86/kvm_main.c +x86/kvm_svm.h +x86/vmx.[ch] +x86/svm.[ch] +x86/mmu.[ch] +x86/paging_tmpl.h +x86/x86_emulate.[ch] +x86/ioapic.[ch] +x86/iodev.h +x86/irq.[ch] +x86/kvm_trace.c +x86/lapic.[ch] +x86/tss.h +x86/x86.[ch] +x86/coalesced_mmio.[ch] +x86/kvm_cache_regs.h +x86/vtd.c +x86/irq_comm.c +x86/timer.c +x86/kvm_timer.h +x86/iommu.c 
+ia64/asm-offsets.c +ia64/coalesced_mmio.[ch] +ia64/ioapic.[ch] +ia64/iodev.h +ia64/iommu.c +ia64/irq.h +ia64/irq_comm.c +ia64/kvm-ia64.c +ia64/kvm_fw.c +ia64/kvm_lib.c +ia64/kvm_main.c +ia64/kvm_minstate.h +ia64/kvm_trace.c +ia64/lapic.h +ia64/memcpy.S +ia64/memset.S +ia64/misc.h +ia64/mmio.c +ia64/optvfault.S +ia64/process.c +ia64/trampoline.S +ia64/vcpu.[ch] +ia64/vmm.c +ia64/vmm_ivt.S +ia64/vti.h +ia64/vtlb.c +.stgit-* -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kmod: Add distclean rule
Remove the configure output config.kbuild and config.mak via distclean.

Signed-off-by: Jan Kiszka
---
 Makefile |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/Makefile b/Makefile
index dad5f0b..75aab71 100644
--- a/Makefile
+++ b/Makefile
@@ -68,3 +68,6 @@ rpm: all
 
 clean:
 	$(MAKE) -C $(KERNELDIR) M=`pwd` $@
+
+distclean:
+	rm -f config.kbuild config.mak
Re: XP smp using a lot of CPU
On Fri, 2009-05-15 at 11:56 -0300, Marcelo Tosatti wrote: > Ross, > > Can you confirm the qemu process CPU consumption is down to acceptable > levels if you dont specify -no-acpi? > > Thanks Simply starting without -no-acpi did not help. I tried to do a Windows XP repair, but seemed to end up basically doing a reinstall. The system now seems to be hung up. I'm probably going to end up trying a fresh install; I'll report more results when I have them. > > > On Thu, May 14, 2009 at 01:01:11PM -0700, Ross Boylan wrote: > > On Wed, 2009-05-13 at 09:56 +0300, Avi Kivity wrote: > > > Ross Boylan wrote: > > > > I just installed XP into a new VM, specifying -smp 2 for the machine. > > > > According to top, it's using nearly 200% of a cpu even when I'm not > > > > doing anything. > > > > > > > > Is this real CPU useage, or just a reporting problem (just as my disk > > > > image is big according to ls, but isn't really)? > > > > > > > > If it's real, is there anything I can do about it? > > > > > > > > kvm 0.7.2 on Debian Lenny (but 2.6.29 kernel), amd64. Xeon chips; 32 > > > > bit version of XP pro installed, now fully patched (including the > > > > Windows Genuine Advantage stuff, though I cancelled it when it wanted to > > > > run). > > > > > > > > Task manager in XP shows virtually no CPU useage. > > > > > > > > Please cc me on responses. > > > > > > > > > > > > > > I'm guessing Windows uses a pio port to sleep, which kvm doesn't > > > support. Can you provide kvm_stat output? 
> > markov:~# kvm_stat -1 > > efer_reload0 0 > > exits9921384 566 > > fpu_reload267970 0 > > halt_exits 1 0 > > halt_wakeup3 0 > > host_state_reload402605017 > > hypercalls 0 0 > > insn_emulation 1329455 0 > > insn_emulation_fail 154 0 > > invlpg176773 0 > > io_exits 3818270 0 > > irq_exits1434046 566 > > irq_injections326730 0 > > irq_window164827 0 > > largepages 0 0 > > mmio_exits 35892 0 > > mmu_cache_miss 29760 0 > > mmu_flooded19908 0 > > mmu_pde_zapped 15557 0 > > mmu_pte_updated82088 0 > > mmu_pte_write 97990 0 > > mmu_recycled 0 0 > > mmu_shadow_zapped 43276 0 > > mmu_unsync 891 0 > > mmu_unsync_global 0 0 > > nmi_injections 0 0 > > nmi_window 0 0 > > pf_fixed 1231164 0 > > pf_guest 276083 0 > > remote_tlb_flush 115606 0 > > request_irq0 0 > > request_nmi0 0 > > signal_exits 5 0 > > tlb_flush 960198 0 > > > > This is with the VM displaying the XP "It is now safe to turn off your > > computer". CPU remains about 200% from kvm. Invoked with > > sudo vdeq kvm -net nic,vlan=1,macaddr=52:54:a0:12:01:00 \ > > -net vde,vlan=1,sock=/var/run/vde2/tap0.ctl \ > > -std-vga -hda XP.raw \ > > -boot c \ > > -soundhw es1370 -localtime -no-acpi -m 1G -smp 2 > > > > Next I'll trying fiddling with acpi. > > > > -- > > Ross Boylan wk: (415) 514-8146 > > 185 Berry St #5700 r...@biostat.ucsf.edu > > Dept of Epidemiology and Biostatistics fax: (415) 514-8150 > > University of California, San Francisco > > San Francisco, CA 94107-1739 hm: (415) 550-1062 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe kvm" in > > the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: just a dump
On Wed, May 13, 2009 at 12:20:26AM +0200, Hans de Bruin wrote: > Hans de Bruin wrote: >> Staring to vms simultaneously end in crash >> >> linux 30-rc5 >> kvm-qemu kvm-85-378-g143eb2b >> proc AMD dualcore >> >> vm's like: >> >> #!/bin/sh >> n=10 >> cdrom=/iso/server2008x64.iso >> drive=file=/kvm/disks/vm$n >> mem=1024 >> cpu=qemu64 >> vga=std >> mac=52:54:00:12:34:$n >> bridge=br1 >> >> qemu-system-x86_64 -cdrom $cdrom -drive $drive -m $mem -cpu $cpu -vga >> $vga -net nic,macaddr=$mac -net tap,script=/etc/qemu/$bridge >> >> > another dmesg: Hans, The oopses below point to the possibility of a hardware problem, similar to: https://bugzilla.redhat.com/show_bug.cgi?id=480779 Can you please rule it out with memtest86? > > device tap0 entered promiscuous mode > br1: topology change detected, propagating > br1: port 1(tap0) entering forwarding state > device tap1 entered promiscuous mode > br1: topology change detected, propagating > br1: port 2(tap1) entering forwarding state > tap0: no IPv6 routers present > tap1: no IPv6 routers present > kvm: 2915: cpu0 unimplemented perfctr wrmsr: 0xc001 data 0x0 > kvm: 2915: cpu0 unimplemented perfctr wrmsr: 0xc0010001 data 0x0 > kvm: 2915: cpu0 unimplemented perfctr wrmsr: 0xc0010002 data 0x0 > kvm: 2915: cpu0 unimplemented perfctr wrmsr: 0xc0010003 data 0x0 > kvm: 2914: cpu0 unimplemented perfctr wrmsr: 0xc001 data 0x0 > kvm: 2914: cpu0 unimplemented perfctr wrmsr: 0xc0010001 data 0x0 > kvm: 2914: cpu0 unimplemented perfctr wrmsr: 0xc0010002 data 0x0 > kvm: 2914: cpu0 unimplemented perfctr wrmsr: 0xc0010003 data 0x0 > rmap_remove: 880100de5500 8 0->BUG > [ cut here ] > kernel BUG at arch/x86/kvm/mmu.c:576! 
> invalid opcode: [#1] SMP > last sysfs file: /sys/devices/pci:00/:00:10.0/:01:09.0/resource > CPU 1 > Modules linked in: > Pid: 2925, comm: qemu-system-x86 Not tainted 2.6.30-rc5 #3 System > Product Name > RIP: 0010:[] [] rmap_remove+0x151/0x200 > RSP: 0018:8801a0d379f8 EFLAGS: 00010292 > RAX: 002a RBX: 0008 RCX: 809a3b40 > RDX: 88002804d000 RSI: 0046 RDI: 809a3a34 > RBP: 8801a0d37a28 R08: 8777 R09: > R10: R11: R12: > R13: 880100de5500 R14: 880101e23580 R15: 8801a0e1c000 > FS: 4270d950(0063) GS:88002804d000() knlGS:07faa000 > CS: 0010 DS: ES: CR0: 80050033 > CR2: 014a8c18 CR3: 0001a0c62000 CR4: 06e0 > DR0: DR1: DR2: > DR3: DR6: 0ff0 DR7: 0400 > Process qemu-system-x86 (pid: 2925, threadinfo 8801a0d36000, task > 8801af3605a0) > Stack: > 8801a0d37a28 > 0500 880101e23580 8801a0d37ac8 8021ad8d > 8801 0003020d 0016e772 > Call Trace: > [] paging64_sync_page+0x9d/0x1a0 > [] ? rmap_write_protect+0xd5/0x150 > [] kvm_sync_page+0x6b/0x90 > [] mmu_sync_children+0xcd/0x120 > [] ? x86_emulate_insn+0x292/0x4d30 > [] ? x86_decode_insn+0x412/0xf10 > [] mmu_sync_roots+0xc2/0xd0 > [] kvm_mmu_load+0x138/0x200 > [] ? handle_exit+0x14a/0x2c0 > [] kvm_arch_vcpu_ioctl_run+0x863/0xaa0 > [] ? kvm_vm_ioctl+0x165/0x910 > [] ? do_futex+0x679/0x9a0 > [] kvm_vcpu_ioctl+0x5d3/0x790 > [] ? common_interrupt+0xe/0x13 > [] ? 
__dequeue_entity+0x2b/0x50 > [] vfs_ioctl+0x31/0x90 > [] do_vfs_ioctl+0x2f1/0x4e0 > [] sys_ioctl+0x82/0xa0 > [] system_call_fastpath+0x16/0x1b > Code: 04 75 e7 48 8b 47 20 49 89 fb 48 85 c0 0f 84 b7 00 00 00 48 89 c7 > eb d0 49 8b 55 00 4c 89 ee 48 c7 c7 b8 2e 7f 80 e8 1f 29 > 04 00 <0f> 0b eb fe 48 8b 4f 18 48 85 c9 0f 94 c2 83 fe 02 0f 9e c0 84 > RIP [] rmap_remove+0x151/0x200 > RSP > ---[ end trace c11385df745a1fea ]--- > BUG: unable to handle kernel NULL pointer dereference at 0058 > IP: [] mmu_page_remove_parent_pte+0xc/0x100 > PGD 1a0ca8067 PUD 1a0ca9067 PMD 0 > Oops: [#2] SMP > last sysfs file: /sys/devices/pci:00/:00:10.0/:01:09.0/resource > CPU 0 > Modules linked in: > Pid: 2926, comm: qemu-system-x86 Tainted: G D2.6.30-rc5 #3 > System Product Name > RIP: 0010:[] [] > mmu_page_remove_parent_pte+0xc/0x100 > RSP: 0018:8801a0da57a8 EFLAGS: 00010292 > RAX: RBX: RCX: 002b > RDX: e200 RSI: 8800ccac0220 RDI: > RBP: 8801a0da57b8 R08: 006a R09: 8800ccd85e70 > R10: R11: R12: 8800ccac0220 > R13: 8800ccd85dc0 R14: 0044 R15: 8801a0db > FS: 40fbc950(0063) GS:880028034000() knlGS:07fd5000 > CS: 0010 DS: ES: CR0: 80050033 > CR2: 0058 CR3: 0001a0c63000 CR4: 06e0 > DR0: 000
Re: [PATCH 5/6] Nested SVM: Implement INVLPGA
On Fri, May 15, 2009 at 10:22:19AM +0200, Alexander Graf wrote: > SVM adds another way to do INVLPG by ASID which Hyper-V makes use of, > so let's implement it! > > For now we just do the same thing invlpg does, as asid switching > means we flush the mmu anyways. That might change one day though. > > Signed-off-by: Alexander Graf > --- > arch/x86/kvm/svm.c | 14 +- > 1 files changed, 13 insertions(+), 1 deletions(-) > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index 30e6b43..b2c6cf3 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -1785,6 +1785,18 @@ static int clgi_interception(struct vcpu_svm *svm, > struct kvm_run *kvm_run) > return 1; > } > > +static int invlpga_interception(struct vcpu_svm *svm, struct kvm_run > *kvm_run) > +{ > + struct kvm_vcpu *vcpu = &svm->vcpu; > + nsvm_printk("INVLPGA\n"); > + svm->next_rip = kvm_rip_read(&svm->vcpu) + 3; > + skip_emulated_instruction(&svm->vcpu); > + > + kvm_mmu_reset_context(vcpu); > + kvm_mmu_load(vcpu); > + return 1; > +} > + Hmm, since we flush the TLB on every nested-guest entry I think we can make this function a nop. > static int invalid_op_interception(struct vcpu_svm *svm, > struct kvm_run *kvm_run) > { > @@ -2130,7 +2142,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm, > [SVM_EXIT_INVD] = emulate_on_interception, > [SVM_EXIT_HLT] = halt_interception, > [SVM_EXIT_INVLPG] = invlpg_interception, > - [SVM_EXIT_INVLPGA] = invalid_op_interception, > + [SVM_EXIT_INVLPGA] = invlpga_interception, > [SVM_EXIT_IOIO] = io_interception, > [SVM_EXIT_MSR] = msr_interception, > [SVM_EXIT_TASK_SWITCH] = task_switch_interception, > -- > 1.6.0.2 > > -- | Advanced Micro Devices GmbH Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München System| Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni Center| Sitz: Dornach, Gemeinde Aschheim, Landkreis München | Registergericht München, HRB Nr. 
43632 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/6] Emulator: Inject #PF when page was not found
On Fri, May 15, 2009 at 10:22:17AM +0200, Alexander Graf wrote: > If we couldn't find a page on read_emulated, it might be a good > idea to tell the guest about that and inject a #PF. > > We do the same already for write faults. I don't know why it was > not implemented for reads. Have you checked that the emulator will never ever do speculative reads? This may be the reason why the fault was not injected here. > > Signed-off-by: Alexander Graf > --- > arch/x86/kvm/x86.c |7 +-- > 1 files changed, 5 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 5fcde2c..5aa1219 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -2131,10 +2131,13 @@ static int emulator_read_emulated(unsigned long addr, > goto mmio; > > if (kvm_read_guest_virt(addr, val, bytes, vcpu) > - == X86EMUL_CONTINUE) > + == X86EMUL_CONTINUE) { > return X86EMUL_CONTINUE; > - if (gpa == UNMAPPED_GVA) > + } > + if (gpa == UNMAPPED_GVA) { > + kvm_inject_page_fault(vcpu, addr, 0); > return X86EMUL_PROPAGATE_FAULT; > + } > > mmio: > /* > -- > 1.6.0.2 > > -- | Advanced Micro Devices GmbH Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München System| Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni Center| Sitz: Dornach, Gemeinde Aschheim, Landkreis München | Registergericht München, HRB Nr. 43632 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Status of pci passthrough work?
On (Fri) May 15 2009 [07:14:05], Passera, Pablo R wrote: > Hi Amit, > Thanks for your answer. I was able to get your userspace pvdma > version. So now, I am using the PVDMA patched kernel and the PVDMA patches > userspace. However, I am not able to start the VM. I am running qemu with the > following options (I am trying without any pci passthrough first) > > ./qemu/x86_64-softmmu/qemu-system-x86_64 -hda /root/kvm/dm2.img -m 256 -net > none > > The SDL windows appear but it hangs after showing the message "Press F12 for > boot menu.". I am not getting any message neither in qemu nor in dmesg. Do > you know what could be happening? May be a kernel compile option? It would be > great if you can send me the .config file that you used to compile it, just > to check the options. Can you try out a few things, like booting off the 'avi' branch in userspace and/or the 'kvm' branch of the kernel tree? Just to rule out bugs in the device assignment code. Amit
Re: [PATCH 2/6] MMU: don't bail on PAT bits in PTE
On Fri, May 15, 2009 at 12:53:42PM +0200, Alexander Graf wrote: > > On 15.05.2009, at 12:25, Michael S. Tsirkin wrote: > >> On Fri, May 15, 2009 at 10:22:16AM +0200, Alexander Graf wrote: >>> A 64bit PTE can have bit7 set to 1 which means "Use this bit for the >>> PAT". >>> Currently KVM's MMU code treats this bit as reserved, even though >>> it's not. >>> >>> As long as we're not required to make use of the PAT bits which is >>> only >>> required for DMA/MMIO from my understanding, we can safely ignore it. >>> >>> Hyper-V uses this bit for kernel PTEs. >>> >>> Signed-off-by: Alexander Graf >>> --- >>> arch/x86/kvm/mmu.c |2 +- >>> 1 files changed, 1 insertions(+), 1 deletions(-) >>> >>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c >>> index 8fcdae9..cce055a 100644 >>> --- a/arch/x86/kvm/mmu.c >>> +++ b/arch/x86/kvm/mmu.c >>> @@ -2169,7 +2169,7 @@ static void reset_rsvds_bits_mask(struct >>> kvm_vcpu *vcpu, int level) >>> context->rsvd_bits_mask[1][1] = exb_bit_rsvd | >>> rsvd_bits(maxphyaddr, 51) | >>> rsvd_bits(13, 20); /* large page */ >>> - context->rsvd_bits_mask[1][0] = ~0ull; >>> + context->rsvd_bits_mask[1][0] = 0ull; >>> break; >>> } >>> } >> >> Just to make sure I understand what this does: if guest sets bit7, >> will >> bit7 get set in shadow PTEs as well? > > I don't see any code that interprets bit7, so the shadow PTE should be > completely unaffected. > > But to be sure I asked Jörg to take a look at it as well, as he's more > familiar with the x86 SPT code than I am :-). The PAT bit is not propagated into the shadow page tables. Anyway, the problem is fixed the wrong way in this patch. The real problem is that a 4kb pte is checked with mask considered for large pages (which do not exist on walker level 0). The attached patch fixes it the better way imho. 
From 7530aef3ed580b70a74224f8c04857754501c496 Mon Sep 17 00:00:00 2001
From: Joerg Roedel
Date: Fri, 15 May 2009 15:14:19 +0200
Subject: [PATCH] kvm/mmu: fix reserved bit checking on 4kb pte level

The reserved bits checking code looks at bit 7 of the pte to determine
if it has to use the mask for a large pte or a normal pde. This does not
work on 4kb pte level because bit 7 is used there for PAT. Account this
in the checking function.

Signed-off-by: Joerg Roedel
---
 arch/x86/kvm/mmu.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 479e748..8d9552e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2124,9 +2124,11 @@ static void paging_free(struct kvm_vcpu *vcpu)
 static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level)
 {
-	int bit7;
+	int bit7 = 0;
+
+	if (level != PT_PAGE_TABLE_LEVEL)
+		bit7 = (gpte >> 7) & 1;
 
-	bit7 = (gpte >> 7) & 1;
 	return (gpte & vcpu->arch.mmu.rsvd_bits_mask[bit7][level-1]) != 0;
 }
 
-- 
1.6.2.4

-- 
           | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System    |
 Research  | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
 Center    | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
           | Registergericht München, HRB Nr. 43632
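For readers following along without a kernel tree, here is a minimal userspace sketch of the check in the patch above. It is not the kernel code: struct rsvd_masks, PT_MAX_LEVEL and the mask values used to exercise it are hypothetical stand-ins, and PT_PAGE_TABLE_LEVEL is assumed to be 1 (the 4kb level). It only illustrates why bit 7 must be ignored when selecting the mask on the last level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define PT_PAGE_TABLE_LEVEL 1   /* level of a 4kb pte */
#define PT_MAX_LEVEL        4

/* Hypothetical stand-in for vcpu->arch.mmu.rsvd_bits_mask. */
struct rsvd_masks {
    /* mask[0][level-1]: normal entry, mask[1][level-1]: large-page entry */
    uint64_t mask[2][PT_MAX_LEVEL];
};

static bool is_rsvd_bits_set(const struct rsvd_masks *m, uint64_t gpte, int level)
{
    int bit7 = 0;

    assert(m != NULL);
    assert(level >= 1 && level <= PT_MAX_LEVEL);

    /*
     * Above the 4kb level, bit 7 is the page-size bit and selects the
     * large-page mask. On the 4kb level bit 7 is PAT, so the normal
     * mask is always used there.
     */
    if (level != PT_PAGE_TABLE_LEVEL)
        bit7 = (gpte >> 7) & 1;

    return (gpte & m->mask[bit7][level - 1]) != 0;
}
```

With the old "everything reserved" entry still in mask[1][0], a 4kb pte that has only the PAT bit set no longer trips the check, while a large-page entry on level 2 is still validated against its own mask.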
Re: [PATCH -tip] x86: kvm/x86.c use MSR names in place of address
On Thu, 2009-05-14 at 11:00 +0530, Jaswinder Singh Rajput wrote: > Here is the patch: > > [PATCH -tip] x86: kvm/x86.c use MSR names in place of address > > Replace 0xc0010010 with MSR_K8_SYSCFG and 0xc0010015 with MSR_K7_HWCR. > > Signed-off-by: Jaswinder Singh Rajput > --- This patch also applies to the kvm tree without any changes. Thanks, -- JSR
RE: Status of pci passthrough work?
Hi Amit, Thanks for your answer. I was able to get your userspace pvdma version. So now I am using the PVDMA-patched kernel and the PVDMA-patched userspace. However, I am not able to start the VM. I am running qemu with the following options (I am trying without any pci passthrough first): ./qemu/x86_64-softmmu/qemu-system-x86_64 -hda /root/kvm/dm2.img -m 256 -net none The SDL window appears, but it hangs after showing the message "Press F12 for boot menu.". I am not getting any message in either qemu or dmesg. Do you know what could be happening? Maybe a kernel compile option? It would be great if you could send me the .config file that you used to compile it, just to check the options. Thanks, Pablo >-Original Message- >From: Amit Shah [mailto:amit.s...@redhat.com] >Sent: Friday, May 15, 2009 8:00 AM >To: Passera, Pablo R >Cc: kvm@vger.kernel.org >Subject: Re: Status of pci passthrough work? > >Hello, > >On (Thu) May 14 2009 [11:08:29], Passera, Pablo R wrote: >> Amit, >> I trying to use PVDMA. I've downloaded a kernel snapshot from >the your kvm git, but I couldn't download a snapshot or the repo from >your kvm-userspace tree. I tried to launch the VM using kvm-85 user >space but it hangs before loading it. Should it work with kvm-85 user >space? Do you have the userspace patches for PVDMA? > >The pvdma userspace patches are at > >http://git.kernel.org/?p=linux/kernel/git/amit/kvm-userspace.git;a=shortlog;h=pvdma > >(look for the branch 'pvdma' in the tree). > > Amit
Re: [PATCH -tip] x86: kvm replace MSR_IA32_TIME_STAMP_COUNTER with MSR_IA32_TSC of msr-index.h
Hello Avi, On Thu, 2009-05-14 at 11:57 +0530, Jaswinder Singh Rajput wrote: > Use standard msr-index.h's MSR declaration. > > MSR_IA32_TSC is better than MSR_IA32_TIME_STAMP_COUNTER as it also solves > 80 column issue. > > Signed-off-by: Jaswinder Singh Rajput > --- If this patch looks sane to you, please apply it to the kvm tree. Here is the updated patch based on the kvm tree: [PATCH] x86: kvm replace MSR_IA32_TIME_STAMP_COUNTER with MSR_IA32_TSC of msr-index.h Use standard msr-index.h's MSR declaration. MSR_IA32_TSC is better than MSR_IA32_TIME_STAMP_COUNTER as it also solves 80 column issue. Signed-off-by: Jaswinder Singh Rajput --- arch/x86/include/asm/kvm_host.h |2 -- arch/x86/kvm/svm.c |4 ++-- arch/x86/kvm/vmx.c |4 ++-- arch/x86/kvm/x86.c |5 ++--- 4 files changed, 6 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 716a4ec..5c72897 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -753,8 +753,6 @@ static inline void kvm_inject_gp(struct kvm_vcpu *vcpu, u32 error_code) kvm_queue_exception_e(vcpu, GP_VECTOR, error_code); } -#define MSR_IA32_TIME_STAMP_COUNTER0x010 - #define TSS_IOPB_BASE_OFFSET 0x66 #define TSS_BASE_SIZE 0x68 #define TSS_IOPB_SIZE (65536 / 8) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 71510e0..dd667dd 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1953,7 +1953,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data) struct vcpu_svm *svm = to_svm(vcpu); switch (ecx) { - case MSR_IA32_TIME_STAMP_COUNTER: { + case MSR_IA32_TSC: { u64 tsc; rdtscll(tsc); @@ -2043,7 +2043,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data) struct vcpu_svm *svm = to_svm(vcpu); switch (ecx) { - case MSR_IA32_TIME_STAMP_COUNTER: { + case MSR_IA32_TSC: { u64 tsc; rdtscll(tsc); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index fe2ce2b..98e6915 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ 
-931,7 +931,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata) case MSR_EFER: return kvm_get_msr_common(vcpu, msr_index, pdata); #endif - case MSR_IA32_TIME_STAMP_COUNTER: + case MSR_IA32_TSC: data = guest_read_tsc(); break; case MSR_IA32_SYSENTER_CS: @@ -991,7 +991,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data) case MSR_IA32_SYSENTER_ESP: vmcs_writel(GUEST_SYSENTER_ESP, data); break; - case MSR_IA32_TIME_STAMP_COUNTER: + case MSR_IA32_TSC: rdtscll(host_tsc); guest_write_tsc(data, host_tsc); break; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 44e87a5..4150edb 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -462,7 +462,7 @@ static u32 msrs_to_save[] = { #ifdef CONFIG_X86_64 MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR, #endif - MSR_IA32_TIME_STAMP_COUNTER, MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, + MSR_IA32_TSC, MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, MSR_IA32_PERF_STATUS, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA }; @@ -640,8 +640,7 @@ static void kvm_write_guest_time(struct kvm_vcpu *v) /* Keep irq disabled to prevent changes to the clock */ local_irq_save(flags); - kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER, - &vcpu->hv_clock.tsc_timestamp); + kvm_get_msr(v, MSR_IA32_TSC, &vcpu->hv_clock.tsc_timestamp); ktime_get_ts(&ts); local_irq_restore(flags); -- 1.6.1.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Set bit 1 in disabled processor's _STA
This patch sets bit 1 in a disabled processor's _STA. According to the ACPI
spec, this bit means: "Set if the device is enabled and decoding its
resources." Without it, the Windows 2008 device manager shows the processors
as malfunctioning hardware.

Signed-off-by: Glauber Costa
---
 kvm/bios/acpi-dsdt.dsl |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl
index c756fed..c53816c 100755
--- a/kvm/bios/acpi-dsdt.dsl
+++ b/kvm/bios/acpi-dsdt.dsl
@@ -56,7 +56,7 @@ DefinitionBlock (
             } \
             Method (_STA) { \
                 If (CRST(nr)) { Return(0xF) } \
-                Else { Return(0x9) } \
+                Else { Return(0xB) } \
             } \
         } \
-- 
1.5.6.6
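To make the 0x9 to 0xB change easier to read, here is a small sketch that decodes the _STA values used in the patch. The bit names (STA_PRESENT and friends) and the helper processor_sta() are made up for this illustration; only the bit positions follow the ACPI description of _STA, and the return values are the ones from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names for the low _STA bits (names are not from the patch). */
#define STA_PRESENT     (1u << 0)  /* device is present */
#define STA_ENABLED     (1u << 1)  /* enabled and decoding its resources */
#define STA_SHOW_IN_UI  (1u << 2)  /* should be shown in the UI */
#define STA_FUNCTIONING (1u << 3)  /* functioning properly */

/* Hypothetical helper mirroring the Method (_STA) body after the patch. */
static uint32_t processor_sta(int cpu_enabled)
{
    assert(cpu_enabled == 0 || cpu_enabled == 1);

    if (cpu_enabled)
        return 0xF;  /* present, enabled, shown, functioning */
    return 0xB;      /* present, enabled, functioning, but not shown */
}
```

The old value 0x9 is STA_PRESENT | STA_FUNCTIONING, so the only difference for a disabled processor is that bit 1 is now reported as set.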
Re: [PATCH] Don't try to mess with CPUID when running nested SVM
Alexander Graf wrote:
>>> When using nested SVM we usually want the guest to see the exact CPUID
>>> values we gave it and not some mangled ones.
>>
>> That would be triggered by -cpu host, not nesting.
>
> Oh we have -cpu host already?

No, we don't :)

> hm - treating the hypervisor bit like any other cpuid bit sounds like
> a good idea. I'm wondering though which way should be preferred. I
> usually don't want to have the hypervisor bit set - but maybe I'm the
> minority.

Windows requires the hypervisor bit to be set in order to pass some test
program.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: [PATCH] Don't try to mess with CPUID when running nested SVM
On 15.05.2009, at 13:09, Avi Kivity wrote:

> Alexander Graf wrote:
>> When using nested SVM we usually want the guest to see the exact CPUID
>> values we gave it and not some mangled ones.
>
> That would be triggered by -cpu host, not nesting.

Oh we have -cpu host already? If so, we don't need that hackery of course :-)

>> @@ -1506,7 +1506,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>>          *edx = env->cpuid_features;
>>          /* "Hypervisor present" bit required for Microsoft SVVP */
>> -        if (kvm_enabled())
>> +        if (kvm_enabled() && !kvm_nested)
>>              *ecx |= (1 << 31);
>>          break;
>
> -cpu host,-hypervisor

hm - treating the hypervisor bit like any other cpuid bit sounds like a good
idea. I'm wondering though which way should be preferred. I usually don't
want to have the hypervisor bit set - but maybe I'm the minority.

Alex
Re: [PATCH] Don't try to mess with CPUID when running nested SVM
Alexander Graf wrote:
> When using nested SVM we usually want the guest to see the exact CPUID
> values we gave it and not some mangled ones.

That would be triggered by -cpu host, not nesting.

> @@ -1506,7 +1506,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>          *edx = env->cpuid_features;
>          /* "Hypervisor present" bit required for Microsoft SVVP */
> -        if (kvm_enabled())
> +        if (kvm_enabled() && !kvm_nested)
>              *ecx |= (1 << 31);
>          break;

-cpu host,-hypervisor

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
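As a rough illustration of the "-cpu host,-hypervisor" idea, i.e. treating the hypervisor bit like any other CPUID feature bit instead of special-casing nesting, here is a standalone sketch. It is not qemu code: struct cpu_feature_override and cpuid_1_ecx() are hypothetical names, and the plus/minus masks only model what a "+flag,-flag" style option might resolve to. The bit position (CPUID leaf 1, ECX bit 31) is the one set in the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID_EXT_HYPERVISOR (1u << 31)  /* CPUID leaf 1, ECX bit 31 */

/* Hypothetical result of parsing "+flag,-flag" suffixes of a -cpu option. */
struct cpu_feature_override {
    uint32_t plus_ecx;   /* bits the user forced on */
    uint32_t minus_ecx;  /* bits the user forced off */
};

static uint32_t cpuid_1_ecx(uint32_t base_ecx, int kvm_enabled,
                            const struct cpu_feature_override *ov)
{
    uint32_t ecx = base_ecx;

    assert(ov != NULL);

    /* Default as in the quoted code: advertise the hypervisor under kvm. */
    if (kvm_enabled)
        ecx |= CPUID_EXT_HYPERVISOR;

    /* User overrides are applied last, so "-hypervisor" wins. */
    ecx |= ov->plus_ecx;
    ecx &= ~ov->minus_ecx;
    return ecx;
}
```

With this shape no kvm_nested check is needed: a nested setup that wants the bit hidden would simply pass minus_ecx = CPUID_EXT_HYPERVISOR, while the default keeps the bit that SVVP expects.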
Re: Status of pci passthrough work?
Hello, On (Thu) May 14 2009 [11:08:29], Passera, Pablo R wrote: > Amit, > I trying to use PVDMA. I've downloaded a kernel snapshot from the > your kvm git, but I couldn't download a snapshot or the repo from your > kvm-userspace tree. I tried to launch the VM using kvm-85 user space but it > hangs before loading it. Should it work with kvm-85 user space? Do you have > the userspace patches for PVDMA? The pvdma userspace patches are at http://git.kernel.org/?p=linux/kernel/git/amit/kvm-userspace.git;a=shortlog;h=pvdma (look for the branch 'pvdma' in the tree). Amit
Re: [PATCH 2/6] MMU: don't bail on PAT bits in PTE
On 15.05.2009, at 12:25, Michael S. Tsirkin wrote: On Fri, May 15, 2009 at 10:22:16AM +0200, Alexander Graf wrote: A 64bit PTE can have bit7 set to 1 which means "Use this bit for the PAT". Currently KVM's MMU code treats this bit as reserved, even though it's not. As long as we're not required to make use of the PAT bits which is only required for DMA/MMIO from my understanding, we can safely ignore it. Hyper-V uses this bit for kernel PTEs. Signed-off-by: Alexander Graf --- arch/x86/kvm/mmu.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 8fcdae9..cce055a 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2169,7 +2169,7 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level) context->rsvd_bits_mask[1][1] = exb_bit_rsvd | rsvd_bits(maxphyaddr, 51) | rsvd_bits(13, 20); /* large page */ - context->rsvd_bits_mask[1][0] = ~0ull; + context->rsvd_bits_mask[1][0] = 0ull; break; } } Just to make sure I understand what this does: if guest sets bit7, will bit7 get set in shadow PTEs as well? I don't see any code that interprets bit7, so the shadow PTE should be completely unaffected. But to be sure I asked Jörg to take a look at it as well, as he's more familiar with the x86 SPT code than I am :-). Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] Add rudimentary Hyper-V guest support
On 15.05.2009, at 10:22, Alexander Graf wrote:

> Now that we have nested SVM in place, let's make use of it and virtualize
> something non-kvm. The first interesting target that came to my mind here
> was Hyper-V. This patchset makes Windows Server 2008 boot with Hyper-V,
> which runs the "dom0" in virtualized mode already. I haven't been able to
> run a second VM within for now though, but maybe I just wasn't patient
> enough ;-).

In order to find out why things were slow with nested SVM I hacked intercept
reporting into debugfs in my local tree and found pretty interesting results
(using NPT):

SVM_EXIT_CLGI          3888080   0
SVM_EXIT_CPUID            3460   0
SVM_EXIT_CR0_SEL_WRI         0   0
SVM_EXIT_ERR                 0   0
SVM_EXIT_FERR_FREEZE         0   0
SVM_EXIT_GDTR_READ           0   0
SVM_EXIT_GDTR_WRITE          0   0
SVM_EXIT_HLT             40186   0
SVM_EXIT_ICEBP               0   0
SVM_EXIT_IDTR_READ           0   0
SVM_EXIT_IDTR_WRITE          0   0
SVM_EXIT_INIT                0   0
SVM_EXIT_INTR           193173   0
SVM_EXIT_INVD                0   0
SVM_EXIT_INVLPG              1   0
SVM_EXIT_INVLPGA        536994   0
SVM_EXIT_IOIO          3450484   0
SVM_EXIT_IRET                0   0
SVM_EXIT_LDTR_READ           0   0
SVM_EXIT_LDTR_WRITE          0   0
SVM_EXIT_MONITOR             0   0
SVM_EXIT_MSR            124614   0
SVM_EXIT_MWAIT               0   0
SVM_EXIT_MWAIT_COND          0   0
SVM_EXIT_NMI                 0   0
SVM_EXIT_NPF           1040416   0
SVM_EXIT_PAUSE               0   0
SVM_EXIT_POPF                0   0
SVM_EXIT_PUSHF               0   0
SVM_EXIT_RDPMC               0   0
SVM_EXIT_RDTSC               0   0
SVM_EXIT_RDTSCP              0   0
SVM_EXIT_RSM                 0   0
SVM_EXIT_SHUTDOWN            0   0
SVM_EXIT_SKINIT              0   0
SVM_EXIT_SMI                20   0
SVM_EXIT_STGI          3888080   0
SVM_EXIT_SWINT               0   0
SVM_EXIT_TASK_SWITCH         0   0
SVM_EXIT_TR_READ             0   0
SVM_EXIT_TR_WRITE            0   0
SVM_EXIT_VINTR          402865   0
SVM_EXIT_VMLOAD        3888096   0
SVM_EXIT_VMMCALL        767288   0
SVM_EXIT_VMRUN         3888096   0
SVM_EXIT_VMSAVE        3888096   0
SVM_EXIT_WBINVD             64   0

So apparently most of the intercepts come from the SVM helper calls (clgi,
stgi, vmload, vmsave). I guess I need to get back to the "emulate when GIF=0"
approach to get things fast.

Alex
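To put a number on "most", here is a quick back-of-the-envelope sketch that sums the non-zero counters from the message above and computes the share taken by the four helper intercepts. It assumes the first numeric column is the cumulative count; the struct and function names are made up for this illustration and are not part of the debugfs hack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct exit_count {
    const char *name;   /* SVM_EXIT_ suffix */
    uint64_t count;     /* first numeric column of the table */
};

/* Non-zero rows copied from the message above. */
static const struct exit_count exits[] = {
    { "CLGI", 3888080 },   { "CPUID", 3460 },      { "HLT", 40186 },
    { "INTR", 193173 },    { "INVLPG", 1 },        { "INVLPGA", 536994 },
    { "IOIO", 3450484 },   { "MSR", 124614 },      { "NPF", 1040416 },
    { "SMI", 20 },         { "STGI", 3888080 },    { "VINTR", 402865 },
    { "VMLOAD", 3888096 }, { "VMMCALL", 767288 },  { "VMRUN", 3888096 },
    { "VMSAVE", 3888096 }, { "WBINVD", 64 },
};

#define N_EXITS (sizeof(exits) / sizeof(exits[0]))

/* Returns 1 for the helper instructions named in the mail. */
static int is_svm_helper(const char *name)
{
    static const char *const helpers[] = { "CLGI", "STGI", "VMLOAD", "VMSAVE" };
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(helpers) / sizeof(helpers[0]); i++)
        if (strcmp(name, helpers[i]) == 0)
            return 1;
    return 0;
}

/* Sums all counters, or only the helper ones if helpers_only is non-zero. */
static uint64_t sum_exits(int helpers_only)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i < N_EXITS; i++)
        if (!helpers_only || is_svm_helper(exits[i].name))
            sum += exits[i].count;
    return sum;
}
```

Under that reading, clgi, stgi, vmload and vmsave together account for 15552352 of 26000013 intercepts, just under 60%, and adding VMRUN itself would push the share to roughly three quarters.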
Re: kvm build error with latest commit
Xu, Jiajun wrote:
> Hi all,
> Latest kvm can not build with 2.6.30-rc4 kernel. Could anyone help on the issue? Error as following:
>
> make[1]: Leaving directory `/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm'

The external module is now built using the kvm-kmod repository:

http://git.kernel.org/?p=virt/kvm/kvm-kmod.git;a=summary

If you clone it and run 'git submodule init; git submodule update', it will create a linux-2.6 directory. Afterwards, all you need to do is pull from both repositories, and 'make sync' and 'make rpm' will work as usual.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [PATCH 2/6] MMU: don't bail on PAT bits in PTE
On Fri, May 15, 2009 at 10:22:16AM +0200, Alexander Graf wrote:
> A 64bit PTE can have bit7 set to 1 which means "Use this bit for the PAT".
> Currently KVM's MMU code treats this bit as reserved, even though it's not.
>
> As long as we're not required to make use of the PAT bits which is only
> required for DMA/MMIO from my understanding, we can safely ignore it.
>
> Hyper-V uses this bit for kernel PTEs.
>
> Signed-off-by: Alexander Graf
> ---
>  arch/x86/kvm/mmu.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 8fcdae9..cce055a 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2169,7 +2169,7 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
>  		context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
>  			rsvd_bits(maxphyaddr, 51) |
>  			rsvd_bits(13, 20);	/* large page */
> -		context->rsvd_bits_mask[1][0] = ~0ull;
> +		context->rsvd_bits_mask[1][0] = 0ull;
>  		break;
>  	}
>  }

Just to make sure I understand what this does: if guest sets bit7, will bit7 get set in shadow PTEs as well?

--
MST
[PATCH] Don't try to mess with CPUID when running nested SVM
When using nested SVM we usually want the guest to see the exact CPUID values we gave it and not some mangled ones. Hyper-V for example doesn't even start when the "hypervisor present" bit is set.

Signed-off-by: Alexander Graf
---
 target-i386/helper.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-i386/helper.c b/target-i386/helper.c
index 24fcea8..5f56698 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1496,7 +1496,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
          * isn't supported in compatibility mode on Intel.  so advertise the
          * actuall cpu, and say goodbye to migration between different vendors
          * is you use compatibility mode. */
-        if (kvm_enabled())
+        if (kvm_enabled() && !kvm_nested)
             host_cpuid(0, 0, NULL, ebx, ecx, edx);
         break;
     case 1:
@@ -1506,7 +1506,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         *edx = env->cpuid_features;

         /* "Hypervisor present" bit required for Microsoft SVVP */
-        if (kvm_enabled())
+        if (kvm_enabled() && !kvm_nested)
             *ecx |= (1 << 31);
         break;
     case 2:
--
1.6.0.2
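For readers who want to see the effect of the second hunk in isolation, here is a small Python model of the ECX handling for CPUID leaf 1. It is a sketch, not QEMU code: the function `cpuid1_ecx` and its parameters are hypothetical stand-ins for the `kvm_enabled()` and `kvm_nested` checks in the patch, and only the bit position (bit 31, as in `1 << 31`) is taken from the diff.

```python
# Bit 31 of CPUID.1:ECX, set in the patch via `*ecx |= (1 << 31)`.
HYPERVISOR_PRESENT = 1 << 31


def cpuid1_ecx(ext_features, kvm_enabled, kvm_nested):
    """Model the patched logic: advertise the hypervisor bit only when
    KVM is in use and nested SVM is not requested (hypothetical helper)."""
    ecx = ext_features
    if kvm_enabled and not kvm_nested:
        ecx |= HYPERVISOR_PRESENT
    return ecx & 0xFFFFFFFF


# Plain KVM guest: the bit is advertised.
print(hex(cpuid1_ecx(0x1, kvm_enabled=True, kvm_nested=False)))
# Nested SVM guest (the Hyper-V case from the mail): the bit stays clear.
print(hex(cpuid1_ecx(0x1, kvm_enabled=True, kvm_nested=True)))
```

The point of the model is only that the guest-visible value is left exactly as configured once nesting is enabled.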
[PATCH] Add external-module-compat header for MSR_VM_IGNNE
This patch adds a compat definition for MSR_VM_IGNNE

Signed-off-by: Alexander Graf
---
 kvm/kernel/x86/external-module-compat.h |    5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kvm/kernel/x86/external-module-compat.h b/kvm/kernel/x86/external-module-compat.h
index 8f9aae0..da42d7b 100644
--- a/kvm/kernel/x86/external-module-compat.h
+++ b/kvm/kernel/x86/external-module-compat.h
@@ -30,6 +30,11 @@
 #define MSR_VM_CR                       0xc0010114
 #endif

+#ifndef MSR_VM_IGNNE
+#define MSR_VM_IGNNE                    0xc0010115
+#endif
+
+
 #ifndef MSR_VM_HSAVE_PA
 #define MSR_VM_HSAVE_PA                 0xc0010117
 #endif
--
1.6.0.2
Re: kvm-autotest: The automation plans?
Michael Goldish wrote:
> - "sudhir kumar" wrote:
>> On Thu, May 14, 2009 at 12:22 PM, jason wang wrote:
>>> sudhir kumar wrote:
>>>> Hi Uri/Lucas,
>>>>
>>>> Do you have any plans for enhancing kvm-autotest? I was looking mainly at the following 2 aspects:
>>>>
>>>> (1). We have standalone migration only. Are there any plans for enhancing kvm-autotest so that we can trigger migration while a workload is running? Something like this:
>>>> Start a workload (maybe n instances of it).
>>>> Let the test execute for some time.
>>>> Trigger migration.
>>>> Log into the target.
>>>> Check if the migration is successful.
>>>> Check if the test results are consistent.
>>> We have some patches for ping-pong migration and workload adding. The migration is based on a public bridge and workload adding is based on running a benchmark in the background of the guest.
>> Cool. I would like to have a look at them. So how do you manage the background process/thread?

Yes, we will try to send it here as soon as possible. The background workload could be added through various methods. We could use a simple algorithm as follows:

run_migration2():
    pid = run_autotest_background(test, params, env, "dbench", "control.60")
    Do ping-pong migration
    ...
    wait_autotest_background(pid)

run_autotest_background() would fork a subprocess to run the function run_autotest() and catch its exception. wait_autotest_background(pid) would wait until the background benchmark completes and analyse the result through the return value of the subprocess. The child process works only as long as the ssh connection stays alive during migration. I believe this could also be achieved through job.parallel().

>>>> (2). How can we run N parallel instances of a test? Will the current configuration be easily able to support it?
>>>>
>>>> Please provide your thoughts on the above features.
>>> The parallelized instances could be easily achieved through job.parallel() of the autotest framework, and that is what we have used in our tests. We have made some helper routines such as get_free_port reentrant through a file lock.
>>>
>>> We've implemented the following test cases: timedrift (already sent here), savevm/loadvm, suspend/resume, jumboframe, migration between two machines and others. We will send them here for review in the following weeks.
>>>
>>> There are some other things that could be improved:
>>>
>>> 1) The current kvm_test.cfg.sample/kvm_test.cfg is transparent to the autotest server UI. This makes it hard to configure the tests on the server side. During our tests, we merged it into the control file so that it can be configured through the "editing control file" function of the autotest server-side web UI.
>> Not much clue here. But I would like to keep the control file as simple as possible and as independent of test scenarios as possible. kvm_tests.cfg should be the right file until and unless it is impossible to do by using it.
>>> 2) Public bridge support: I've sent a patch (TAP network support in kvm-autotest); this patch needs an external DHCP server and requires nmap support. I don't know whether the method of the original kvm_runtest_old (DHCP server on a private bridge) is preferable.
>> The old approach is better. Not everyone may be able to run an external DHCP server for running the test. I do not see any issue with the old approach.
> We're taking more of a minimalist approach in kvm_runtest_2: the framework should handle only the things directly related to testing. Configuring and running a DHCP server is and should be beyond the scope of the KVM-Autotest framework. To emulate the old behavior, you can just start the DHCP server yourself locally. If you wish, maybe we can bundle example scripts with the framework that will do this for the user, but they should not be an integral part of the framework in my opinion.
>> --
>> Sudhir Kumar
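The run_migration2() outline discussed in this thread can be sketched in plain Python. This is an illustration under assumptions, not the kvm-autotest code: it swaps the forked subprocess for a worker thread from the standard library's `concurrent.futures` so the example stays self-contained, and `run_with_background`, `workload` and `foreground_steps` are hypothetical names. The idea it keeps from the thread is the same: start the workload, do the ping-pong migration steps in the foreground, then wait for the workload and treat an exception in it as a failure.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_background(workload, foreground_steps):
    """Start `workload` in the background, run each foreground step,
    then wait for the workload.

    Returns (ok, detail): ok is False when the workload raised, and
    detail is then the exception; otherwise detail is its return value.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(workload)      # like run_autotest_background()
        for step in foreground_steps:       # e.g. the ping-pong migration
            step()
        try:
            return True, future.result()    # like wait_autotest_background()
        except Exception as exc:
            return False, exc


# Usage with stand-in callables instead of a real benchmark and migration.
log = []
ok, detail = run_with_background(
    lambda: "dbench finished",
    [lambda: log.append("migrate A -> B"), lambda: log.append("migrate B -> A")],
)
print(ok, detail, log)
```

A real test would still need the guest session to survive the migration, as noted in the thread; the sketch does not model that.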
kvm build error with latest commit
Hi all,
Latest kvm can not build with 2.6.30-rc4 kernel. Could anyone help on the issue? Error as following:

make[1]: Leaving directory `/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm'
+ make -C kernel LINUX=2.6.30-rc4
make[1]: Entering directory `/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel'
make -j20 -C /lib/modules/2.6.30-rc4/build M=`pwd` \
        LINUXINCLUDE="-I`pwd`/include -Iinclude \
         \
        -Iarch/x86/include -I`pwd`/include-compat \
        -include include/linux/autoconf.h \
        -include `pwd`/x86/external-module-compat.h "
make[2]: Entering directory `/mnt/sdb1/kernel/src/redhat/BUILD/kernel-2.6.30rc4'
  LD      /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/built-in.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/svm.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/../external-module-compat.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/vmx.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/vmx-debug.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/kvm_main.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/x86.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/mmu.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/x86_emulate.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/../anon_inodes.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/irq.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/i8259.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/ioapic.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/preempt.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/i8254.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/coalesced_mmio.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/irq_comm.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/timer.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/iommu.o
  CC [M]  /workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/lapic.o
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/x86.c: In function 'do_cpuid_ent':
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/x86.c:1327: error: 'X86_FEATURE_MOVBE' undeclared (first use in this function)
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/x86.c:1327: error: (Each undeclared identifier is reported only once
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/x86.c:1327: error: for each function it appears in.)
/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/x86.c:1327: error: 'X86_FEATURE_POPCNT' undeclared (first use in this function)
make[4]: *** [/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86/x86.o] Error 1
make[4]: *** Waiting for unfinished jobs
make[3]: *** [/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel/x86] Error 2
make[2]: *** [_module_/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel] Error 2
make[2]: Leaving directory `/mnt/sdb1/kernel/src/redhat/BUILD/kernel-2.6.30rc4'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/workspace/ia32e/nightly/kvm-master-2.6.30-rc4-20090515011054681/qemu-kvm/kvm/rpmtop/BUILD/kernel'
error: Bad exit status from /var/tmp/rpm-tmp.94190 (%build)

RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.94190 (%build)
make: *** [rpm] Error 1

Best Regards
Jiajun
Re: [PATCH][KVM-AUTOTEST] TAP network support in kvm-autotest
Michael Goldish wrote:

Hi Michael, thanks for your comments.

> Hi Jason,
>
> We already have patches that implement similar functionality here in TLV, as mentioned in the to-do list (item #4 under 'Framework'). They're not yet committed upstream because they're still quite fresh.

OK, I will pay more attention to the to-do list.

> Still, your patch looks good and is quite similar to mine. The main difference is that I use MAC/IP address pools specified by the user, instead of random MACs with arp/nmap to detect the matching IP addresses.

We've considered the use of MAC/IP address pools, but this method needs to handle the case of multiple kvm-autotest instances running multiple guests. The MAC pools must not overlap when using public bridges.

> I will post my patch to the mailing list soon, but it will come together with quite a few other patches that I haven't posted yet, so please be patient.
>
> Comments/questions:
>
> Why do you use nmap in addition to arp? In what cases will arp not suffice? I'm a little put off by the fact that nmap imposes an additional requirement on the host. Three hosts I've tried don't come with nmap installed by default.

We use nmap to make sure the guest IP can eventually be found. During our tests, the scripts sometimes failed to get the IP address of the guest when host iptables was turned on.

> Please see additional comments below.
>
> - "Jason Wang" wrote:
>
> > Hi All:
> >
> > This patch tries to add tap network support in kvm-autotest. Multiple nics connected to different bridges could be achieved through this script. Public bridge is important for testing real network traffic and migration. The patch gives each nic a randomly generated mac address. The ip address required in the test could be dynamically probed through nmap/arp. Only the ip address of the first NIC is used through the test.
> >
> > Example:
> >
> > nics = nic1 nic2
> > network = bridge
> > bridge = switch
> > ifup = /etc/qemu-ifup-switch
> > ifdown = /etc/qemu-ifdown-switch
> >
> > This would make the virtual machine have two nics, both of which are connected to a bridge with the name of 'switch'. Ifup/ifdown scripts are also specified.
> >
> > Another Example:
> >
> > nics = nic1 nic2
> > network = bridge
> > bridge = switch
> > bridge_nic2 = virbr0
> > ifup = /etc/qemu-ifup-switch
> > ifup_nic2 = /etc/qemu-ifup-virbr0
> >
> > This would make the virtual machine have two nics: nic1 is connected to bridge 'switch' and nic2 is connected to bridge 'virbr0'.
> >
> > Public mode and user mode nics could also be mixed:
> >
> > nics = nic1 nic2
> > network = bridge
> > network_nic2 = user
> >
> > Looking forward to comments and suggestions.
> >
> > From: jason
> > Date: Wed, 13 May 2009 16:15:28 +0800
> > Subject: [PATCH] Add tap networking support.
> >
> > ---
> >  client/tests/kvm_runtest_2/kvm_utils.py |    7 +++
> >  client/tests/kvm_runtest_2/kvm_vm.py    |   74 ++-
> >  2 files changed, 69 insertions(+), 12 deletions(-)
> >
> > diff --git a/client/tests/kvm_runtest_2/kvm_utils.py b/client/tests/kvm_runtest_2/kvm_utils.py
> > index be8ad95..0d1f7f8 100644
> > --- a/client/tests/kvm_runtest_2/kvm_utils.py
> > +++ b/client/tests/kvm_runtest_2/kvm_utils.py
> > @@ -773,3 +773,10 @@ def md5sum_file(filename, size=None):
> >          size -= len(data)
> >      f.close()
> >      return o.hexdigest()
> > +
> > +def random_mac():
> > +    mac=[0x00,0x16,0x30,
> > +         random.randint(0x00,0x09),
> > +         random.randint(0x00,0x09),
> > +         random.randint(0x00,0x09)]
> > +    return ':'.join(map(lambda x: "%02x" %x,mac))
>
> Random MAC addresses will not necessarily work everywhere, as far as I know. That's why I prefer user specified MAC/IP address ranges.

Yes, maybe we could use a user-specified mac address prefix or a more useful algorithm to generate the mac address.

> > diff --git a/client/tests/kvm_runtest_2/kvm_vm.py b/client/tests/kvm_runtest_2/kvm_vm.py
> > index fab839f..ea7dab6 100644
> > --- a/client/tests/kvm_runtest_2/kvm_vm.py
> > +++ b/client/tests/kvm_runtest_2/kvm_vm.py
> > @@ -105,6 +105,10 @@ class VM:
> >          self.qemu_path = qemu_path
> >          self.image_dir = image_dir
> >          self.iso_dir = iso_dir
> > +        self.macaddr = []
> > +        for nic_name in kvm_utils.get_sub_dict_names(params,"nics"):
> > +            macaddr = kvm_utils.random_mac()
> > +            self.macaddr.append(macaddr)
> >
> >      def verify_process_identity(self):
> >          """Make sure .pid really points to the original qemu process.
> > @@ -189,9 +193,25 @@ class VM:
> >          for nic_name in kvm_utils.get_sub_dict_names(params, "nics"):
> >              nic_params = kvm_utils.get_sub_dict(params, nic_name)
> >              qemu_cmd += " -net nic,vlan=%d" % vlan
> > +            net = nic_params.get("network")
> > +            if net == "bridge":
> > +                qemu_cmd += ",macaddr=%s" % self.macaddr[vlan]
> >              if nic_params.get("nic_model"):
> >                  qemu_cmd += ",model=%s" % nic_params.get("nic_model")
> > -            qemu_cmd += " -net user,vlan=%d" % vlan
> > +            if net == "bridge":
> > +                qemu_cmd += " -net tap,vlan=%d" % vlan
> > +                ifup = nic_params.get("ifup")
> > +                if ifup:
> > +
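The "user specified mac address prefix" idea raised in this thread could look roughly like the sketch below. It is not part of the posted patch: the `prefix` and `rng` parameters are assumptions added for illustration, and the default prefix simply reuses the 00:16:30 bytes from random_mac() in the diff. Note also that `random.randint(0x00, 0x09)` in the patch draws each trailing octet from only ten values, so the sketch uses the full 0x00..0xff range instead.

```python
import random


def random_mac(prefix=(0x00, 0x16, 0x30), rng=random):
    """Return a MAC address string that starts with `prefix` and is
    completed with random octets (hypothetical variant of the patch's helper)."""
    if not 0 < len(prefix) <= 6:
        raise ValueError("prefix must contain between 1 and 6 octets")
    # Fill the remaining octets from the full byte range.
    tail = [rng.randint(0x00, 0xff) for _ in range(6 - len(prefix))]
    return ":".join("%02x" % octet for octet in list(prefix) + tail)


# Default prefix from the patch, and a user-chosen four-octet prefix.
print(random_mac())
print(random_mac(prefix=(0x52, 0x54, 0x00, 0x12)))
```

Passing a seeded `random.Random` instance as `rng` makes the generated addresses reproducible between runs, which may help when several test instances must avoid overlapping addresses.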
[PATCH 1/6] Add definition for IGNNE MSR
Hyper-V tries to access MSR_IGNNE, so let's at least have a definition for it in our headers.

Signed-off-by: Alexander Graf
---
 arch/x86/include/asm/msr-index.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ec41fc1..e273549 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -372,6 +372,7 @@
 /* AMD-V MSRs */

 #define MSR_VM_CR                       0xc0010114
+#define MSR_VM_IGNNE                    0xc0010115
 #define MSR_VM_HSAVE_PA                 0xc0010117

 #endif /* _ASM_X86_MSR_INDEX_H */
--
1.6.0.2
[PATCH 6/6] Nested SVM: Improve interrupt injection
While trying to get Hyper-V running, I realized that the interrupt injection mechanisms that are in place right now are not 100% correct.

This patch makes nested SVM's interrupt injection behave more like on a real machine.

Signed-off-by: Alexander Graf
---
 arch/x86/kvm/svm.c |   40 +---
 1 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index b2c6cf3..1d22d46 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1517,7 +1517,8 @@ static int nested_svm_vmexit_real(struct vcpu_svm *svm, void *arg1,
 	/* Kill any pending exceptions */
 	if (svm->vcpu.arch.exception.pending == true)
 		nsvm_printk("WARNING: Pending Exception\n");
-	svm->vcpu.arch.exception.pending = false;
+	kvm_clear_exception_queue(&svm->vcpu);
+	kvm_clear_interrupt_queue(&svm->vcpu);

 	/* Restore selected save entries */
 	svm->vmcb->save.es = hsave->save.es;
@@ -1585,7 +1586,8 @@ static int nested_svm_vmrun(struct vcpu_svm *svm, void *arg1,
 	svm->nested_vmcb = svm->vmcb->save.rax;

 	/* Clear internal status */
-	svm->vcpu.arch.exception.pending = false;
+	kvm_clear_exception_queue(&svm->vcpu);
+	kvm_clear_interrupt_queue(&svm->vcpu);

 	/* Save the old vmcb, so we don't need to pick what we save, but
 	   can restore everything when a VMEXIT occurs */
@@ -2276,21 +2278,15 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq)
 		((/*control->int_vector >> 4*/ 0xf) << V_INTR_PRIO_SHIFT);
 }

-static void svm_queue_irq(struct kvm_vcpu *vcpu, unsigned nr)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-
-	svm->vmcb->control.event_inj = nr |
-		SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;
-}
-
 static void svm_set_irq(struct kvm_vcpu *vcpu, int irq)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);

-	nested_svm_intr(svm);
+	if(!(svm->vcpu.arch.hflags & HF_GIF_MASK))
+		return;

-	svm_queue_irq(vcpu, irq);
+	svm->vmcb->control.event_inj = irq |
+		SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;
 }

 static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
@@ -2318,13 +2314,25 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu)
 	struct vmcb *vmcb = svm->vmcb;
 	return (vmcb->save.rflags & X86_EFLAGS_IF) &&
 		!(vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) &&
-		(svm->vcpu.arch.hflags & HF_GIF_MASK);
+		(svm->vcpu.arch.hflags & HF_GIF_MASK) &&
+		!is_nested(svm);
 }

 static void enable_irq_window(struct kvm_vcpu *vcpu)
 {
-	svm_set_vintr(to_svm(vcpu));
-	svm_inject_irq(to_svm(vcpu), 0x0);
+	struct vcpu_svm *svm = to_svm(vcpu);
+	nsvm_printk("Trying to open IRQ window\n");
+
+	nested_svm_intr(svm);
+
+	/* In case GIF=0 we can't rely on the CPU to tell us when
+	 * GIF becomes 1, because that's a separate STGI/VMRUN intercept.
+	 * The next time we get that intercept, this function will be
+	 * called again though and we'll get the vintr intercept. */
+	if (svm->vcpu.arch.hflags & HF_GIF_MASK) {
+		svm_set_vintr(svm);
+		svm_inject_irq(svm, 0x0);
+	}
 }

 static void enable_nmi_window(struct kvm_vcpu *vcpu)
@@ -2392,6 +2400,8 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
 	case SVM_EXITINTINFO_TYPE_EXEPT:
 		/* In case of software exception do not reinject an exception
 		   vector, but re-execute and instruction instead */
+		if (is_nested(svm))
+			break;
 		if (vector == BP_VECTOR || vector == OF_VECTOR)
 			break;
 		if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) {
--
1.6.0.2
[PATCH 2/6] MMU: don't bail on PAT bits in PTE
A 64bit PTE can have bit7 set to 1 which means "Use this bit for the PAT". Currently KVM's MMU code treats this bit as reserved, even though it's not.

As long as we're not required to make use of the PAT bits which is only required for DMA/MMIO from my understanding, we can safely ignore it.

Hyper-V uses this bit for kernel PTEs.

Signed-off-by: Alexander Graf
---
 arch/x86/kvm/mmu.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8fcdae9..cce055a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2169,7 +2169,7 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
 		context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
 			rsvd_bits(maxphyaddr, 51) |
 			rsvd_bits(13, 20);	/* large page */
-		context->rsvd_bits_mask[1][0] = ~0ull;
+		context->rsvd_bits_mask[1][0] = 0ull;
 		break;
 	}
 }
--
1.6.0.2
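To make the effect of the one-line change easier to see, here is a small Python model of a reserved-bit check. It is a sketch under assumptions: `rsvd_bits(s, e)` is assumed to build a mask with bits s..e (inclusive) set, as its use in the diff suggests, and `hits_reserved` is a hypothetical helper that stands in for however the MMU code compares a guest PTE against the mask. How KVM selects which mask entry applies to a given PTE is not modelled.

```python
def rsvd_bits(start, end):
    """Mask with bits start..end (inclusive) set; assumed semantics of the
    helper of the same name used in the diff."""
    return ((1 << (end - start + 1)) - 1) << start


PAT_BIT = 1 << 7               # bit7 of a 64bit PTE, per the patch description
ALL_RESERVED = (1 << 64) - 1   # models ~0ull (old value)
NONE_RESERVED = 0              # models 0ull (new value)


def hits_reserved(pte, mask):
    """True if the PTE has any bit set that the mask marks as reserved."""
    return (pte & mask) != 0


pte = 0x1000 | PAT_BIT | 0x1   # some present PTE with bit7 set
print(hits_reserved(pte, ALL_RESERVED))    # old mask: every set bit faults
print(hits_reserved(pte, NONE_RESERVED))   # new mask: nothing is reserved
print(hex(rsvd_bits(13, 20)))              # the large-page range from the diff
```

With the old all-ones mask any PTE checked against this entry is rejected, while the new zero mask accepts all of them, which is why the reply in this thread asks what then happens to bit7 in the shadow PTEs.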
[PATCH 0/6] Add rudimentary Hyper-V guest support
Now that we have nested SVM in place, let's make use of it and virtualize something non-kvm. The first interesting target that came to my mind here was Hyper-V.

This patchset makes Windows Server 2008 boot with Hyper-V, which runs the "dom0" in virtualized mode already. I haven't been able to run a second VM within for now though, but maybe I just wasn't patient enough ;-).

Alexander Graf (6):
  Add definition for IGNNE MSR
  MMU: don't bail on PAT bits in PTE
  Emulator: Inject #PF when page was not found
  Implement Hyper-V MSRs
  Nested SVM: Implement INVLPGA
  Nested SVM: Improve interrupt injection

 arch/x86/include/asm/msr-index.h |    1 +
 arch/x86/kvm/mmu.c               |    2 +-
 arch/x86/kvm/svm.c               |   59 +++--
 arch/x86/kvm/x86.c               |    7 +++-
 4 files changed, 50 insertions(+), 19 deletions(-)
[PATCH 4/6] Implement Hyper-V MSRs
Hyper-V uses some MSRs, some of which are actually reserved for BIOS usage.

But let's be nice today and have it its way, because otherwise it fails terribly.

For MSRs where I could find a name I used the name, otherwise they're just added in their hex form for now.

Signed-off-by: Alexander Graf
---
 arch/x86/kvm/svm.c |    5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ef43a18..30e6b43 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1932,6 +1932,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
 		*data = svm->hsave_msr;
 		break;
 	case MSR_VM_CR:
+	case 0x4081:
 		*data = 0;
 		break;
 	case MSR_IA32_UCODE_REV:
@@ -2034,6 +2035,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
 	case MSR_VM_HSAVE_PA:
 		svm->hsave_msr = data;
 		break;
+	case MSR_VM_CR:
+	case MSR_VM_IGNNE:
+	case MSR_K8_HWCR:
+		break;
 	default:
 		return kvm_set_msr_common(vcpu, ecx, data);
 	}
--
1.6.0.2
[PATCH 5/6] Nested SVM: Implement INVLPGA
SVM adds another way to do INVLPG by ASID which Hyper-V makes use of, so let's implement it!

For now we just do the same thing invlpg does, as asid switching means we flush the mmu anyways. That might change one day though.

Signed-off-by: Alexander Graf
---
 arch/x86/kvm/svm.c |   14 +-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 30e6b43..b2c6cf3 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1785,6 +1785,18 @@ static int clgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 	return 1;
 }

+static int invlpga_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	nsvm_printk("INVLPGA\n");
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+
+	kvm_mmu_reset_context(vcpu);
+	kvm_mmu_load(vcpu);
+	return 1;
+}
+
 static int invalid_op_interception(struct vcpu_svm *svm,
 				   struct kvm_run *kvm_run)
 {
@@ -2130,7 +2142,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 	[SVM_EXIT_INVD]				= emulate_on_interception,
 	[SVM_EXIT_HLT]				= halt_interception,
 	[SVM_EXIT_INVLPG]			= invlpg_interception,
-	[SVM_EXIT_INVLPGA]			= invalid_op_interception,
+	[SVM_EXIT_INVLPGA]			= invlpga_interception,
 	[SVM_EXIT_IOIO]				= io_interception,
 	[SVM_EXIT_MSR]				= msr_interception,
 	[SVM_EXIT_TASK_SWITCH]			= task_switch_interception,
--
1.6.0.2
[PATCH 3/6] Emulator: Inject #PF when page was not found
If we couldn't find a page on read_emulated, it might be a good idea to tell the guest about that and inject a #PF.

We do the same already for write faults. I don't know why it was not implemented for reads.

Signed-off-by: Alexander Graf
---
 arch/x86/kvm/x86.c |    7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5fcde2c..5aa1219 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2131,10 +2131,13 @@ static int emulator_read_emulated(unsigned long addr,
 		goto mmio;

 	if (kvm_read_guest_virt(addr, val, bytes, vcpu)
-				== X86EMUL_CONTINUE)
+				== X86EMUL_CONTINUE) {
 		return X86EMUL_CONTINUE;
-	if (gpa == UNMAPPED_GVA)
+	}
+	if (gpa == UNMAPPED_GVA) {
+		kvm_inject_page_fault(vcpu, addr, 0);
 		return X86EMUL_PROPAGATE_FAULT;
+	}

 mmio:
 	/*
--
1.6.0.2
RE: event injection MACROs
Gleb Natapov wrote:
> On Thu, May 14, 2009 at 10:34:11PM +0800, Dong, Eddie wrote:
>> Gleb Natapov wrote:
>>> On Thu, May 14, 2009 at 09:43:33PM +0800, Dong, Eddie wrote:
>>>> Avi Kivity wrote:
>>>>> Dong, Eddie wrote:
>>>>>> OK.
>>>>>> Also back to Gleb's question, the reason I want to do that is to
>>>>>> simplify event generation mechanism in current KVM.
>>>>>>
>>>>>> Today KVM use additional layer of exception/nmi/interrupt such as
>>>>>> vcpu.arch.exception.pending, vcpu->arch.interrupt.pending &
>>>>>> vcpu->arch.nmi_injected. All those additional layer is due to
>>>>>> compete of VM_ENTRY_INTR_INFO_FIELD
>>>>>> write to inject the event. Both SVM & VMX has only one resource
>>>>>> to inject the virtual event but KVM generates 3 category of
>>>>>> events in parallel which further requires additional
>>>>>> logic to dictate among them.
>>>>>
>>>>> I thought of using a queue to hold all pending events (in a common
>>>>> format), sort it by priority, and inject the head.
>>>>
>>>> The SDM Table 5-4 requires to merge 2 events together, i.e. convert to
>>>> #DF/ Triple fault or inject serially when 2 events happens no matter
>>>> NMI, IRQ or exception.
>>>>
>>>> As if considering above events merging activity, that is a single
>>>> element queue.
>>> I don't know how you got to this conclusion from your previous
>>> statement. See explanation to table 5-2 for instance where it is
>>> stated that interrupt should be held pending if there is exception
>>> with higher priority. Should be held pending where? In the queue,
>>> like we do. Note that low prio exceptions are just dropped since
>>> they will be regenerated.
>>
>> I have different understanding here.
>> My understanding is that "held" means NO INTA in HW, i.e. LAPIC
>> still hold this IRQ.
>>
> And what if INTA already happened and CPU is ready to fetch IDT for
> interrupt vector and at this very moment CPU faults?

If INTA happens, that means it is delivered. If its delivery triggers another exception, that is what Table5-4 handles.

My understanding is that it is 2 stage process. Table 5-2 talk about events happening before delivery, so that HW needs to prioritize them. Once a decision is made, the highest one is delivered but then it could trigger another exception when fetching IDT etc.

Current exception.pending/interrupt.pending/nmi_injected doesn't match either of above, interrupt/nmi is only for failed event injection, and a strange fixed priority check when it is really injected: exception > failed NMI > failed IRQ > new NMI > new IRQ.

Table 5-2 looks missed in current KVM IMO except a wrong (but minor) exception > NMI > IRQ sequence.

>
>>>> We could have either: 1) A pure SW "queue" that will be flush to HW
>>>> register later (VM_ENTRY_INTR_INFO_FIELD), 2) Direct use HW register.
>>>>
>>> We have three event sources 1) exceptions 2) IRQ 3) NMI. We should
>>> have queue of three elements sorted by priority. On each entry we
>>> should
>>
>> Table 5-4 already says NMI/IRQ is BENIGN.
> Table 5-2 applies here not table 5-4 I think.
>
>>
>>> inject an event with highest priority. And remove it from queue on
>>> exit.
>>
>> The problem is that we have to decide to inject only one of above 3,
>> and discard the rest. Whether priority them or merge (to one event
>> as Table 5-4) is another story.
> Only a small number of event are merged into #DF. Most handled
> serially (SDM does not define what serially means unfortunately), so
> I don't understand where "discard the rest" is come from. We can

vmx_complete_interrupts clear all of them at next EXIT.

Even from HW point of view, if there are pending NMI/IRQ/exception, CPU pick highest one, NMI, ignore/discard IRQ (but LAPIC still holds IRQ, thus it can be re-injected), completely discard exception.

I don't say discarding has any problem, but unnecessary to keep all of 3. the only difference is when to discard the rest 2, at queue_exception/irq/nmi time or later on (even at next EXIT time), which is same to me.

> discard exception since it will be regenerated anyway, but IRQ and
> NMI is another story. SDM says that IRQ should be held pending (once
> again not much explanation here), nothing about NMI.
>
>>>> A potential benefit is that it can avoid duplicated code and
>>>> potential bugs in current code as following patch shows if I
>>>> understand correctly:
>>>>
>>>> --- a/arch/x86/kvm/vmx.c
>>>> +++ b/arch/x86/kvm/vmx.c
>>>> @@ -2599,7 +2599,7 @@ static int handle_exception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>>>  		cr2 = vmcs_readl(EXIT_QUALIFICATION);
>>>>  		KVMTRACE_3D(PAGE_FAULT, vcpu, error_code, (u32)cr2,
>>>>  			    (u32)((u64)cr2 >> 32), handler);
>>>> -		if (vcpu->arch.interrupt.pending || vcpu->arch.exception.pending )
>>>> +		if (vcpu->arch.interrupt.pending || vcpu->arch.exception.pending || vcpu->arch.n