RE: [PATCH 5/6] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests
Su, Disheng wrote: > Amit Shah wrote: >> This patch has been contributed to by the following people: >> >> From: Or Sagi <[EMAIL PROTECTED]> >> From: Nir Peleg <[EMAIL PROTECTED]> >> From: Amit Shah <[EMAIL PROTECTED]> >> From: Ben-Ami Yassour <[EMAIL PROTECTED]> >> From: Weidong Han <[EMAIL PROTECTED]> >> From: Glauber de Oliveira Costa <[EMAIL PROTECTED]> >> >> With this patch, we can assign a device on the host machine to a >> guest. >> >> A new command-line option, -pcidevice is added. >> To invoke it for a device sitting at PCI bus:dev.fn 04:08.0, use >> this: >> >> -pcidevice host=04:08.0 >> >> * The host driver for the device, if any, is to be removed before >> assigning the device (else device assignment will fail). >> >> * A device that shares IRQ with another host device cannot currently >> be assigned. >> >> * The RAW_IO capability is needed for this to work >> >> This works only with the in-kernel irqchip method; to use the >> userspace irqchip, a kernel module (irqhook) and some extra changes >> are needed. >> >> Signed-off-by: Amit Shah <[EMAIL PROTECTED]> >> --- >> qemu/Makefile.target|1 + >> qemu/hw/device-assignment.c | 619 >> +++ >> qemu/hw/device-assignment.h | 98 +++ qemu/hw/pc.c >> |6 + qemu/hw/pci.c |7 + >> qemu/vl.c | 18 ++ >> 6 files changed, 749 insertions(+), 0 deletions(-) >> create mode 100644 qemu/hw/device-assignment.c >> create mode 100644 qemu/hw/device-assignment.h >> >> diff --git a/qemu/Makefile.target b/qemu/Makefile.target >> index d9bdeca..05a1d84 100644 >> --- a/qemu/Makefile.target >> +++ b/qemu/Makefile.target >> @@ -621,6 +621,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) >> dma.o OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o >> pc.o OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o >> OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o +OBJS+= >> device-assignment.o ifeq ($(USE_KVM_PIT), 1) >> OBJS+= i8254-kvm.o >> endif >> diff --git a/qemu/hw/device-assignment.c >> b/qemu/hw/device-assignment.c new file mode 100644 index >> 000..5ba21a0 --- /dev/null >> +++ b/qemu/hw/device-assignment.c >> @@ -0,0 +1,619 @@ >> +/* >> + * Copyright (c) 2007, Neocleus Corporation. >> + * >> + * This program is free software; you can redistribute it and/or >> modify it + * under the terms and conditions of the GNU General >> Public License, + * version 2, as published by the Free Software >> Foundation. + * + * This program is distributed in the hope it will >> be useful, but WITHOUT + * ANY WARRANTY; without even the implied >> warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. >> See the GNU General Public License for + * more details. >> + * >> + * You should have received a copy of the GNU General Public License >> along with + * this program; if not, write to the Free Software >> Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA >> 02111-1307 USA. + * + * >> + * Assign a PCI device from the host to a guest VM. + * >> + * Adapted for KVM by Qumranet. >> + * >> + * Copyright (c) 2007, Neocleus, Alex Novik ([EMAIL PROTECTED]) >> + * Copyright (c) 2007, Neocleus, Guy Zana ([EMAIL PROTECTED]) >> + * Copyright (C) 2008, Qumranet, Amit Shah ([EMAIL PROTECTED]) >> + * Copyright (C) 2008, Red Hat, Amit Shah ([EMAIL PROTECTED]) + >> */ +#include >> +#include >> +#include "qemu-kvm.h" >> +#include "hw.h" >> +#include "pc.h" >> +#include "sysemu.h" >> +#include "console.h" >> +#include >> +#include "device-assignment.h" >> + >> +/* From linux/ioport.h */ >> +#define IORESOURCE_IO 0x0100 /* Resource type */ >> +#define IORESOURCE_MEM 0x0200 >> +#define IORESOURCE_IRQ 0x0400 >> +#define IORESOURCE_DMA 0x0800 >> +#define IORESOURCE_PREFETCH 0x1000 /* No side effects */ + >> +/* #define DEVICE_ASSIGNMENT_DEBUG 1 */ >> + >> +#ifdef DEVICE_ASSIGNMENT_DEBUG >> +#define DEBUG(fmt, args...) \ >> +do { \ >> + fprintf(stderr, "%s: " fmt, __func__ , ## args);\ +} >> while (0) +#else >> +#define DEBUG(fmt, args...) do { } while(0) >> +#endif >> + >> +static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr, >> + uint32_t value) +{ >> +AssignedDevRegion *r_access = (AssignedDevRegion *)opaque; >> +uint32_t r_pio = (unsigned long)r_access->r_virtbase >> ++ (addr - r_access->e_physbase); >> + >> +DEBUG(stderr, "%s: r_pio=%08x e_physbase=%08x" >> + " r_virtbase=%08lx value=%08x\n", >> + __func__, r_pio, (int)r_access->e_physbase, >> + (unsigned long)r_access->r_virtbase, value); + >> outb(value, r_pio); +} >> + >> +static void assigned_dev_ioport_writew(void *opaque, uint32_t addr, >> + uint32_t value) +{ >> +AssignedDevRe
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Eduardo Habkost <[EMAIL PROTECTED]> writes: > On Sun, Oct 26, 2008 at 05:07:45PM +0200, Avi Kivity wrote: >> Eric W. Biederman wrote: > > Is it possible to disable vmx mode before we enable interrrupts in the > kdump kernel? > > You need IPIs to disable vmx on smp. >>> >>> Thank you. Reading your description and taking a quick look at >>> the code in hardware disable it does not appear that there is >>> anything needed (other than restricting ourselves it running >>> uniprocessor in the kdump kernel) that needs to happen. >>> >>> Certainly it would be nice to have kvm disabled in hardware, >>> but if you are proposing using the existing hardware disable >>> I must say that the cure looks much worse than the disease. >>> >> >> Certainly you don't want to issue IPIs when kdump()ing. It's not >> unlikely that the other cpus have interrupts permanently disabled. >> >> (we can use NMI IPIs, but that will likely be messy) > > NMI IPIs are already used on x86 native_machine_crash_shutdown(), so > it wouldn't get more messy that it is currently. We just need to add > another bit of code to the code that already runs on an NMI handler. Yes. And handling of those NMIs is best effort. Nothing fails if they don't actually run. > My question is: is a notifier chain too much complexity for a sensible > piece of code like that? If so, a compile-time hook on that point > would be safer, but then it wouldn't work when KVM is compiled as a > out-of-tree module. Well we could fairly easily have a non-modular function that does. if (vmx_present && vmx_enabled) { turn_off_vmx(); } Which at first skim looks like it is all of about 10-20 machine instructions. There are a few real places where we need code on the kdump path because there it is not possible to do the work any other way. However we need to think long and hard about that because placing the code anywhere besides in a broken and failing kernel is going to be easier to maintain and more reliable. I oppose an atomic notifier because it makes the review essentially impossible. If any module can come in and register a notifier we can't know what code is running on that code path and we can't be certain the code is safe in an abnormal case to run on that code path. Right now we only need to support vmx on the kdump path because of what appears to be a hardware design bug. Enabling vmx apparently disables standard functions like an INIT IPI. Things like this do happen but they should be rare. > Good point. My problem was a hang when booting the kdump kernel, but it > may also cause problems later, when the kdump kernel reboots. What was the cause of the hang when booting the kdump kernel? Eric -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 5/6] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests
Amit Shah wrote: > This patch has been contributed to by the following people: > > From: Or Sagi <[EMAIL PROTECTED]> > From: Nir Peleg <[EMAIL PROTECTED]> > From: Amit Shah <[EMAIL PROTECTED]> > From: Ben-Ami Yassour <[EMAIL PROTECTED]> > From: Weidong Han <[EMAIL PROTECTED]> > From: Glauber de Oliveira Costa <[EMAIL PROTECTED]> > > With this patch, we can assign a device on the host machine to a > guest. > > A new command-line option, -pcidevice is added. > To invoke it for a device sitting at PCI bus:dev.fn 04:08.0, use this: > > -pcidevice host=04:08.0 > > * The host driver for the device, if any, is to be removed before > assigning the device (else device assignment will fail). > > * A device that shares IRQ with another host device cannot currently > be assigned. > > * The RAW_IO capability is needed for this to work > > This works only with the in-kernel irqchip method; to use the > userspace irqchip, a kernel module (irqhook) and some extra changes > are needed. > > Signed-off-by: Amit Shah <[EMAIL PROTECTED]> > --- > qemu/Makefile.target|1 + > qemu/hw/device-assignment.c | 619 > +++ > qemu/hw/device-assignment.h | 98 +++ qemu/hw/pc.c > |6 + qemu/hw/pci.c |7 + > qemu/vl.c | 18 ++ > 6 files changed, 749 insertions(+), 0 deletions(-) > create mode 100644 qemu/hw/device-assignment.c > create mode 100644 qemu/hw/device-assignment.h > > diff --git a/qemu/Makefile.target b/qemu/Makefile.target > index d9bdeca..05a1d84 100644 > --- a/qemu/Makefile.target > +++ b/qemu/Makefile.target > @@ -621,6 +621,7 @@ OBJS+= ide.o pckbd.o ps2.o vga.o $(SOUND_HW) dma.o > OBJS+= fdc.o mc146818rtc.o serial.o i8259.o i8254.o pcspk.o pc.o > OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o > OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o extboot.o > +OBJS+= device-assignment.o > ifeq ($(USE_KVM_PIT), 1) > OBJS+= i8254-kvm.o > endif > diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c > new file mode 100644 > index 000..5ba21a0 > --- /dev/null > +++ b/qemu/hw/device-assignment.c > @@ -0,0 +1,619 @@ > +/* > + * Copyright (c) 2007, Neocleus Corporation. > + * > + * This program is free software; you can redistribute it and/or > modify it + * under the terms and conditions of the GNU General > Public License, + * version 2, as published by the Free Software > Foundation. + * > + * This program is distributed in the hope it will be useful, but > WITHOUT + * ANY WARRANTY; without even the implied warranty of > MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU > General Public License for + * more details. > + * > + * You should have received a copy of the GNU General Public License > along with + * this program; if not, write to the Free Software > Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA > 02111-1307 USA. + * > + * > + * Assign a PCI device from the host to a guest VM. > + * > + * Adapted for KVM by Qumranet. > + * > + * Copyright (c) 2007, Neocleus, Alex Novik ([EMAIL PROTECTED]) > + * Copyright (c) 2007, Neocleus, Guy Zana ([EMAIL PROTECTED]) > + * Copyright (C) 2008, Qumranet, Amit Shah ([EMAIL PROTECTED]) > + * Copyright (C) 2008, Red Hat, Amit Shah ([EMAIL PROTECTED]) > + */ > +#include > +#include > +#include "qemu-kvm.h" > +#include "hw.h" > +#include "pc.h" > +#include "sysemu.h" > +#include "console.h" > +#include > +#include "device-assignment.h" > + > +/* From linux/ioport.h */ > +#define IORESOURCE_IO 0x0100 /* Resource type */ > +#define IORESOURCE_MEM 0x0200 > +#define IORESOURCE_IRQ 0x0400 > +#define IORESOURCE_DMA 0x0800 > +#define IORESOURCE_PREFETCH 0x1000 /* No side effects */ > + > +/* #define DEVICE_ASSIGNMENT_DEBUG 1 */ > + > +#ifdef DEVICE_ASSIGNMENT_DEBUG > +#define DEBUG(fmt, args...) \ > +do { \ > + fprintf(stderr, "%s: " fmt, __func__ , ## args);\ > +} while (0) > +#else > +#define DEBUG(fmt, args...) do { } while(0) > +#endif > + > +static void assigned_dev_ioport_writeb(void *opaque, uint32_t addr, > + uint32_t value) > +{ > +AssignedDevRegion *r_access = (AssignedDevRegion *)opaque; > +uint32_t r_pio = (unsigned long)r_access->r_virtbase > ++ (addr - r_access->e_physbase); > + > +DEBUG(stderr, "%s: r_pio=%08x e_physbase=%08x" > + " r_virtbase=%08lx value=%08x\n", > + __func__, r_pio, (int)r_access->e_physbase, > + (unsigned long)r_access->r_virtbase, value); > +outb(value, r_pio); > +} > + > +static void assigned_dev_ioport_writew(void *opaque, uint32_t addr, > + uint32_t value) > +{ > +AssignedDevRegion *r_access = (AssignedDevRegion *)opaque; > +uint32_t r_pio = (unsigned long)r_access->r_virtbase > ++ (addr - r_access
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Avi Kivity <[EMAIL PROTECTED]> writes: > Eric W. Biederman wrote: >>> so reboots don't >>> work. It also assigns some memory to the cpu; if the new kernel isn't aware > of >>> it, >> >> Not a problem for a kdump kernel, as it lives out of a reserved region >> of memory. >> > > But it is a problem for general kexec. Agreed. We certainly want to cleanly disable vmx on a normal kexec reboot. Is it possible to disable vmx mode before we enable interrrupts in the kdump kernel? >>> You need IPIs to disable vmx on smp. >>> >> >> Thank you. Reading your description and taking a quick look at >> the code in hardware disable it does not appear that there is >> anything needed (other than restricting ourselves it running >> uniprocessor in the kdump kernel) that needs to happen. >> >> Certainly it would be nice to have kvm disabled in hardware, >> but if you are proposing using the existing hardware disable >> I must say that the cure looks much worse than the disease. >> > > Certainly you don't want to issue IPIs when kdump()ing. It's not unlikely > that > the other cpus have interrupts permanently disabled. > > (we can use NMI IPIs, but that will likely be messy) > >> It looks like the disable function is all of about 20 assembly >> instructions so I would not have a problem if he had a >> little inline function we could call that test to see if >> vmx is enabled and disable it in the case of kexec on panic. >> >> The normal polite shutdown. That just looks like asking for trouble. >> > > But what happens when the kdump kernel reboots? If it is uniprocessor, it > will > never have a chance to disable vmx on other cpus. Using acpi reset (now > default) works around this on some machines, but not all. Mostly likely any reboot path will trigger software to toggle the reset line on the board. At least that has been my experience, and the reason we don't see many problems when we fail to properly shutdown devices before reboot. If vmx persists across that it would seem to be very broken by design. In any case if reboot fails after a kdump that is a minor thing. Someone can always push the power button or the equivalent. My objections are: on_each_cpu called from after a panic looks like a good way to loose control of the box and never return. Adding a notifier looks like a good way too add functionality onto the kdump path that never gets reviewed for robustness after a kernel crash and thus a good way to remove the usefulness of the kexec on panic kernel path. Eric -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
On Sun, Oct 26, 2008 at 05:07:45PM +0200, Avi Kivity wrote: > Eric W. Biederman wrote: Is it possible to disable vmx mode before we enable interrrupts in the kdump kernel? >>> You need IPIs to disable vmx on smp. >>> >> >> Thank you. Reading your description and taking a quick look at >> the code in hardware disable it does not appear that there is >> anything needed (other than restricting ourselves it running >> uniprocessor in the kdump kernel) that needs to happen. >> >> Certainly it would be nice to have kvm disabled in hardware, >> but if you are proposing using the existing hardware disable >> I must say that the cure looks much worse than the disease. >> > > Certainly you don't want to issue IPIs when kdump()ing. It's not > unlikely that the other cpus have interrupts permanently disabled. > > (we can use NMI IPIs, but that will likely be messy) NMI IPIs are already used on x86 native_machine_crash_shutdown(), so it wouldn't get more messy that it is currently. We just need to add another bit of code to the code that already runs on an NMI handler. My question is: is a notifier chain too much complexity for a sensible piece of code like that? If so, a compile-time hook on that point would be safer, but then it wouldn't work when KVM is compiled as a out-of-tree module. > >> It looks like the disable function is all of about 20 assembly >> instructions so I would not have a problem if he had a >> little inline function we could call that test to see if >> vmx is enabled and disable it in the case of kexec on panic. >> >> The normal polite shutdown. That just looks like asking for trouble. >> > > But what happens when the kdump kernel reboots? If it is uniprocessor, > it will never have a chance to disable vmx on other cpus. Using acpi > reset (now default) works around this on some machines, but not all. Good point. My problem was a hang when booting the kdump kernel, but it may also cause problems later, when the kdump kernel reboots. -- Eduardo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2198658 ] USB Passthrough is broken in the latest code
Bugs item #2198658, was opened at 2008-10-26 20:06 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2198658&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: qemu Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jamie Kirkpatrick (gringostarr) Assigned to: Nobody/Anonymous (nobody) Summary: USB Passthrough is broken in the latest code Initial Comment: I found that in the latest source I got the error message: "Could not add USB device ${device_id}" when trying to use my printer with an ubuntu guest. This previously worked until I updated, built and installed the latest kvm-userspace source last week to the latest Git source. After an hour or so digging around on the web I found out that this seems to be a known issue with the current qemu source, which was recently integrated into the Git trunk. A patch that fixed the issue against the KVM-77 sources can be found here: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/156085. My host is Debian Lenny running a 2.6.27 rc kernel.. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2198658&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: external module: test for EXT_CONFIG_KVM_TRACE instead of CONFIG_KVM_TRACE
Eduardo Habkost wrote: This patch complements 800e7a37ca9ace991a6a36a0046551ec30ec3b9c (kvm: external module: adjust KVM_TRACE support to account for host kernel). The config variable defined on config.kbuild was renamed to EXT_CONFIG_KVM_TRACE, but the check for the configure setting on kernel/Makefile was not changed to the new name, making the build system try to compile kvm_trace.c without -DEXT_CONFIG_KVM_TRACE=y, if building using --with-kvm-trace. Applied, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] kvm: bios: Update e820 table for EPT real mode pagetable
Sheng Yang wrote: On Friday 17 October 2008 15:17:52 Sheng Yang wrote: I remembered I had sent this long long ago, but happened to find it missing in upstream... Signed-off-by: Sheng Yang <[EMAIL PROTECTED]> Avi? Thanks for persisting; applied. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Eric W. Biederman wrote: so reboots don't work. It also assigns some memory to the cpu; if the new kernel isn't aware of it, Not a problem for a kdump kernel, as it lives out of a reserved region of memory. But it is a problem for general kexec. Is it possible to disable vmx mode before we enable interrrupts in the kdump kernel? You need IPIs to disable vmx on smp. Thank you. Reading your description and taking a quick look at the code in hardware disable it does not appear that there is anything needed (other than restricting ourselves it running uniprocessor in the kdump kernel) that needs to happen. Certainly it would be nice to have kvm disabled in hardware, but if you are proposing using the existing hardware disable I must say that the cure looks much worse than the disease. Certainly you don't want to issue IPIs when kdump()ing. It's not unlikely that the other cpus have interrupts permanently disabled. (we can use NMI IPIs, but that will likely be messy) It looks like the disable function is all of about 20 assembly instructions so I would not have a problem if he had a little inline function we could call that test to see if vmx is enabled and disable it in the case of kexec on panic. The normal polite shutdown. That just looks like asking for trouble. But what happens when the kdump kernel reboots? If it is uniprocessor, it will never have a chance to disable vmx on other cpus. Using acpi reset (now default) works around this on some machines, but not all. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/8] Add S3 state to DSDT. Handle resume event in the BIOS.
Gleb Natapov wrote: Signed-off-by: Gleb Natapov <[EMAIL PROTECTED]> --- bios/acpi-dsdt.dsl | 33 ++- bios/rombios.c | 35 + bios/rombios32.c | 74 3 files changed, 130 insertions(+), 12 deletions(-) diff --git a/bios/acpi-dsdt.dsl b/bios/acpi-dsdt.dsl index 577b3fe..02f53a1 100755 --- a/bios/acpi-dsdt.dsl +++ b/bios/acpi-dsdt.dsl @@ -65,6 +65,32 @@ DefinitionBlock ( gen_processor(14, E) } +/* + * S3 (suspend-to-ram), S4 (suspend-to-disc) and S5 (power-off) type codes: + * must match piix4 emulation. + */ +Name (\_S3, Package (0x04) +{ +0x01, /* PM1a_CNT.SLP_TYP */ +0x01, /* PM1b_CNT.SLP_TYP */ +Zero, /* reserved */ +Zero /* reserved */ +}) +Name (\_S4, Package (0x04) +{ +Zero, /* PM1a_CNT.SLP_TYP */ +Zero, /* PM1b_CNT.SLP_TYP */ +Zero, /* reserved */ +Zero /* reserved */ +}) +Name (\_S5, Package (0x04) +{ +Zero, /* PM1a_CNT.SLP_TYP */ +Zero, /* PM1b_CNT.SLP_TYP */ +Zero, /* reserved */ +Zero /* reserved */ +}) + Scope (\) { /* Debug Output */ @@ -626,13 +652,6 @@ DefinitionBlock ( } } -/* S5 = power off state */ -Name (_S5, Package (4) { -0x00, // PM1a_CNT.SLP_TYP -0x00, // PM2a_CNT.SLP_TYP -0x00, // reserved -0x00, // reserved -}) Scope (\_GPE) { Any reason for the code movement? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] KVM: Fix kvm_free_physmem memory leak.
François Diakhate wrote: Make sure that kvm_free_physmem actually frees memory when a memory slot is not user allocated. @@ -4195,7 +4195,7 @@ int kvm_arch_set_memory_region(struct kvm *kvm, memslot->userspace_addr = userspace_addr; spin_unlock(&kvm->mmu_lock); } else { - if (!old.user_alloc && old.rmap) { + if (!old.user_alloc && old.rmap && current->mm) { int ret; What's the purpose of this? diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index a87f45e..b0d7435 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -639,8 +639,17 @@ void kvm_free_physmem(struct kvm *kvm) { int i; - for (i = 0; i < kvm->nmemslots; ++i) + for (i = 0; i < kvm->nmemslots; ++i) { + struct kvm_userspace_memory_region mem = { + .slot = i, + .guest_phys_addr = kvm->memslots[i].base_gfn << PAGE_SHIFT, + .memory_size = 0, + .flags = 0, + }; + + kvm_set_memory_region(kvm, &mem, kvm->memslots[i].user_alloc); kvm_free_physmem_slot(&kvm->memslots[i], NULL); + } } Better to fix kvm_free_physmem_slot() if it doesn't handle !user_allocated memory properly. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Avi Kivity <[EMAIL PROTECTED]> writes: > Eric W. Biederman wrote: >> Why do we need to disable vmx mode before booting a normal linux kernel? >> > > vmx mode blocks INIT (even on the host; not just on the guests) *blink* broken hardware design there. > so reboots don't > work. It also assigns some memory to the cpu; if the new kernel isn't aware > of > it, Not a problem for a kdump kernel, as it lives out of a reserved region of memory. > the cpu and the kernel would both think it belongs to them. Finally, if vmx > mode is enabled, you can't start kvm on the new kernel. This isn't especially interesting in the crash dump scenario. >> Is it possible to disable vmx mode before we enable interrrupts in the >> kdump kernel? >> > > You need IPIs to disable vmx on smp. Thank you. Reading your description and taking a quick look at the code in hardware disable it does not appear that there is anything needed (other than restricting ourselves it running uniprocessor in the kdump kernel) that needs to happen. Certainly it would be nice to have kvm disabled in hardware, but if you are proposing using the existing hardware disable I must say that the cure looks much worse than the disease. It looks like the disable function is all of about 20 assembly instructions so I would not have a problem if he had a little inline function we could call that test to see if vmx is enabled and disable it in the case of kexec on panic. The normal polite shutdown. That just looks like asking for trouble. Eric -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RESEND][PATCH] KVM: VMX: Report VNMI emulation
Jan Kiszka wrote: In case we ever have to debug possibly NMI-related issues of the guest, it may help to correlate them with the VNMI emulation for older VMX CPUs. Signed-off-by: Jan Kiszka <[EMAIL PROTECTED]> --- arch/x86/kvm/vmx.c |4 1 file changed, 4 insertions(+) Index: b/arch/x86/kvm/vmx.c === --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3671,6 +3671,10 @@ static int __init vmx_init(void) ept_sync_global(); + if (!cpu_has_virtual_nmis()) + printk(KERN_WARNING + "kvm: emulating NMI window via interrupt window\n"); + This is best done via /proc/cpuinfo... (vnmi or vmx_vnmi) what's the status on those patches? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] virtio_net: Fix leaked netdev->refcnt
Mark McLoughlin wrote: If after receiving some packets and refilling the queue with buffers, we detect that more packets are available then we re-schedule the queue and process them. This re-scheduling - i.e. calling __netif_rx_schedule() - causes a netdev reference to be taken. Once we've finally run out of buffers to process, we return zero and net_rx_action() drops the reference taken by the original call to _netif_rx_schedule() in e.g. skb_recv_done(). The reference taken by re-scheduling is always leaked, leading to: waiting for eth0 to become free. Usage count = 132568 Fix by immediately dropping the extra reference taken. Applied all three, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvmtrace: Allow user to clean up stale trace setup
Eduardo Habkost wrote: If kvmtrace crashes or gets killed while collecting trace data, stale trace setup information may be enabled on the kernel, and currently there is no way to force the previous trace setup to be overwritten or disabled. The following two patches against kvm-userspace will add a new command-line option to kvmtrace (-f), that will allow the user to forcibly clean up an existing trace setup before enabling trace. The long term plan is to drop kvmtrace in favor of the more generic frameworks (which provide a common trace transport, IIRC). Can you check if this has been merged (or if enough of it is in mainline)? This is exactly the sort of issues that are best handled by a generic implementation. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/6] qemu: piix: Introduce functions to get pin number from irq and vice versa
Amit Shah wrote: +int piix3_get_pin(int pic_irq) +{ +int i; +for (i = 0; i < 4; i++) +if (piix3_dev->config[0x60+i] == pic_irq) +return i; +return -1; +} What happens if two pci interrupts are routed to one irq line? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/6] KVM/userspace: Device Assignment: Add ioctl wrappers needed for assigning devices
Amit Shah wrote: +#ifdef KVM_CAP_DEVICE_ASSIGNMENT +int kvm_assign_pci_device(kvm_context_t kvm, + struct kvm_assigned_pci_dev *assigned_dev) +{ + return ioctl(kvm->vm_fd, KVM_ASSIGN_PCI_DEVICE, assigned_dev); Convert -1s to -errno, to avoid problems with errno being overwritten later. +} + +int kvm_assign_irq(kvm_context_t kvm, + struct kvm_assigned_irq *assigned_irq) +{ + return ioctl(kvm->vm_fd, KVM_ASSIGN_IRQ, assigned_irq); +} +#endif -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/6] KVM/userspace: Device Assignment: Support for assigning PCI devices to guests
Anthony Liguori wrote: I don't think having a kvm_enabled() check here is very useful. I think device-assignment.c should be conditional on USE_KVM, and the only kvm_enabled() check should be when creating the initial device assignment. Practically speaking, QEMU is never going to support device assignment outside of the context of KVM because I strongly doubt anything like irqhook will make it upstream. Userspace interrupt handlers are actually possible with MSI; we should see if uio is open to adding support for that. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Eric W. Biederman wrote: Why do we need to disable vmx mode before booting a normal linux kernel? vmx mode blocks INIT (even on the host; not just on the guests) so reboots don't work. It also assigns some memory to the cpu; if the new kernel isn't aware of it, the cpu and the kernel would both think it belongs to them. Finally, if vmx mode is enabled, you can't start kvm on the new kernel. Is it possible to disable vmx mode before we enable interrrupts in the kdump kernel? You need IPIs to disable vmx on smp. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Eduardo Habkost wrote: Considering they touch both KVM and kexec, which tree would be best way to get them in? kvm.git will be happy to hold these patches, provided they are acked by the relevant maintainer. (Avi: the patches were sent only to kexec and kvm mailing lists, initially. If it's better to submit them to your address also so it gets on your queue, please let me know) Please do copy me. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/3] KVM: MMU: prepopulate the shadow on invlpg
Marcelo Tosatti wrote: If the guest executes invlpg on a non present entry, peek into th pagetable and attempt to prepopulate the shadow entry. Also stop dirty fault updates from interfering with the fork dete For RHEL3 and 32-bit Vista guests the success rate is high. With Win2003 there is a 1:2 success/fail ratio, but even then compilation benchmarks are slightly faster. 2000 and XP rarely execute invlpg on non present entries. 2% improvement on RHEL3/AIM7. Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]> Index: kvm/arch/x86/kvm/mmu.c === --- kvm.orig/arch/x86/kvm/mmu.c +++ kvm/arch/x86/kvm/mmu.c @@ -2365,7 +2365,8 @@ static void kvm_mmu_access_page(struct k } void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, - const u8 *new, int bytes) + const u8 *new, int bytes, + bool speculative) kvm_mmu_pte_write()s are always speculative. Maybe this is misnamed? Perhaps 'guest_initiated' (with the opposite meaning). @@ -222,7 +223,7 @@ walk: if (ret) goto walk; pte |= PT_DIRTY_MASK; - kvm_mmu_pte_write(vcpu, pte_gpa, (u8 *)&pte, sizeof(pte)); + kvm_mmu_pte_write(vcpu, pte_gpa, (u8 *)&pte, sizeof(pte), 1); This is definitely not a speculative write. But it is !guest_initiated. @@ -467,10 +468,18 @@ static int FNAME(shadow_invlpg_entry)(st struct kvm_vcpu *vcpu, u64 addr, u64 *sptep, int level) { + struct shadow_walker *sw = + container_of(_sw, struct shadow_walker, walker); if (level == PT_PAGE_TABLE_LEVEL) { - if (is_shadow_present_pte(*sptep)) + struct kvm_mmu_page *sp = page_header(__pa(sptep)); blank line after declarations. Or move to head of function. + sw->pte_gpa = (sp->gfn << PAGE_SHIFT); + sw->pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t); + + if (is_shadow_present_pte(*sptep)) { rmap_remove(vcpu->kvm, sptep); + sw->pte_gpa = -1; Why? The pte could have heen replaced (for example, a write access to a cow page). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 2/3] KVM: MMU: skip global pgtables on sync due to cr3 switch
Marcelo Tosatti wrote: Skip syncing global pages on cr3 switch (but not on cr4/cr0). This is important for Linux 32-bit guests with PAE, where the kmap page is marked as global. Patch is good, but won't apply without the first. { u64 spte; int ret = 0; u64 mt_mask = shadow_mt_mask; + struct kvm_mmu_page *sp = page_header(__pa(shadow_pte)); + + if (!global && sp->global) { + sp->global = 0; A slight deficiency in this approach is that a page can't transition from !global to global. I don't think this is frequent, so we don't need to deal with it. But a more accurate approach is to keep a count of non-global present mappings, and to activate the global logic when this count is nonzero. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 1/3] KVM: MMU: collapse remote TLB flushes on root sync
Marcelo Tosatti wrote: Instead of flushing remote TLB's at every page resync, do an initial pass to write protect the sptes, collapsing the flushes on a single remote TLB invalidation. kernbench is 2.3% faster on 4-way guest. Improvements have been seen with other loads such as AIM7. Avi: feel free to change this if you dislike the style (I do, but can't think of anything nicer). static void mmu_sync_children(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) { struct sync_walker walker = { - .walker = { .entry = mmu_sync_fn, }, + .walker = { .entry = mmu_wprotect_fn, + .clear_unsync = false, }, .vcpu = vcpu, + .write_protected = false }; + /* collapse the TLB flushes as an optimization */ + mmu_unsync_walk(sp, &walker.walker); + if (walker.write_protected) + kvm_flush_remote_tlbs(vcpu->kvm); + + walker.walker.entry = mmu_sync_fn; + walker.walker.clear_unsync = true; + while (mmu_unsync_walk(sp, &walker.walker)) cond_resched_lock(&vcpu->kvm->mmu_lock); We're always doing two passes here, which is a bit sad. How about having a single pass which: - collects unsync pages into an array - exits on no more unsync pages or max array size reached Then, iterate over the array: - write protect all pages - flush tlb - sync pages Loop until the root is synced. If the number of pages to sync is typically small, and the array is sized to be larger than this, then we only walk the pagetables once. btw, our walkers are a bit awkward (though still better than what we had before). If we rewrite them into for_each style iterators, the code could become cleaner and shorter. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html