PCI device not properly reset after VFIO
Hi Alex,

I've been playing around with VFIO and megasas (of course). What I did now was switching between VFIO and 'normal' operation, i.e. emulated access.

megasas is happily running under VFIO, but when I do an emergency stop like killing the QEMU session, the PCI device is not properly reset. I.e. when I load 'megaraid_sas' after unbinding the vfio_pci module, the driver cannot initialize the card and waits forever for the firmware state to change.

I need to do a proper PCI reset via

echo 1 > /sys/bus/pci/device//reset

to get it into a working state again.

Looking at vfio_pci_disable(), pci reset is called before the config state and BARs are restored. Seeing that vfio_pci_enable() calls pci reset right at the start, too, before modifying anything, I do wonder whether the pci reset is at the correct location for disable. I would have expected to call pci reset in vfio_pci_disable() _after_ we have restored the configuration, to ensure a sane state after reset. And, as experience shows, we do need to call it there.

So what is the rationale for the pci reset? Can we move it to the end of vfio_pci_disable() or do we need to call pci reset twice?

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
h...@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost-blk: Add vhost-blk support v2
On Thu, Oct 18, 2012 at 02:50:56PM +1030, Rusty Russell wrote:
> Asias He writes:
> >>> +#define BLK_HDR 0
> >>
> >> What's this for, exactly? Please add a comment.
> >
> > The block header is in the first and separate buffer.
>
> Please don't assume this! We're trying to fix all the assumptions in
> qemu at the moment.
>
> vhost_net handles this correctly, taking bytes off the descriptor chain
> as required.
>
> Thanks,
> Rusty.

BTW are we agreed on the spec update that makes cmd 32 bytes?
Re: How to do fast accesses to LAPIC TPR under kvm?
On 2012-10-17 21:24, Stefan Fritsch wrote:
> Hi,
>
> OpenBSD/i386 seems to be one of the few operating systems that still
> uses the LAPIC task priority register for interrupt handling. On AMD

Yeah, only very special OSes do this... ;)

> CPUs and on older Intel CPUs without the flexpriority feature, this
> causes a huge performance impact on kvm. I have seen slowdown by a
> factor of 10.
>
> Is there a way to use the TPR under kvm without the slowdown? There
> are some MSRs inherited from Hyper-V, but using these does not make
> that much difference. I think this is because they still cause a
> vmexit for every TPR access. I expect the same is true for x2apic
> emulation, isn't it?

Didn't study the Hyper-V interface yet, but the trick is indeed to avoid as many vmexits as possible, specifically when lowering the TPR value has no effect as no interrupts are pending.

>
> There is also the kvmvapic, but kvm does not expose a sane interface
> to it and only uses it for Windows XP specific binary patching.

The kvmvapic is not a classic paravirtual interface in that it does not really require guest OS awareness. But it requires the guest to accept being patched. That's the case for certain Windows versions. Also, the option ROMs, including our kvmvapic "ROM", have to be mapped at fixed, accessible addresses to allow jumping to them from a patched TPR instruction.

Therefore, we limited the patching to known OS versions, to avoid messing around with other, untested OSes. However, it may be possible to accept OpenBSD as well by adjusting the tests in kvmvapic and possibly adjusting some other details.

>
> Another possibility is TPR access via CR8 on AMD, but the missing
> cr8_legacy CPUID bit and this discussion [1] make me believe that this
> is not supported under kvm, at least in 32bit mode. Could this be
> easily fixed? If yes, would it solve the performance problems, i.e.
> offer performance comparable to Intel's flexpriority feature?

Anything that unconditionally traps does not help, and CR8 accesses do trap unconditionally.

>
> OpenBSD seems to be reluctant to stop using the TPR. In fact, in a
> recent discussion, there has been a suggestion that OpenBSD should
> switch to using TPR also on OpenBSD/amd64 to solve some problems with
> boot interrupts. How do you expect this would affect performance under
> kvm (if using CR8)?
>
> Or do you have any other suggestions? One could also modify kvm to
> expose a real interface to kvmvapic, e.g. allow the guest OS to
> provide the virtual address of the option rom and the offset of the
> CPU number in the %fs segment, instead of using hard coded values for
> Windows XP.

Of course, though in fact all we need is a stable address. See vapic_write() for the existing PV interface (between option ROM and hypervisor so far). We can extend it as long as it is compatible with the existing one.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
Re: [PATCH] vhost-blk: Add vhost-blk support v2
Asias He writes:
>>> +#define BLK_HDR 0
>>
>> What's this for, exactly? Please add a comment.
>
> The block header is in the first and separate buffer.

Please don't assume this! We're trying to fix all the assumptions in qemu at the moment.

vhost_net handles this correctly, taking bytes off the descriptor chain as required.

Thanks,
Rusty.
Re: [PATCH 0/3] x86: clear vmcss on all cpus when doing kdump if necessary
On 2012-10-17 18:16, Avi Kivity wrote:
> On 10/17/2012 04:28 AM, Zhang Yanfei wrote:
>> On 2012-10-15 23:43, Avi Kivity wrote:
>>> On 10/12/2012 08:40 AM, Zhang Yanfei wrote:
>>>> Currently, kdump just makes all the logical processors leave VMX operation by
>>>> executing VMXOFF instruction, so any VMCSs active on the logical processors may
>>>> be corrupted. But, sometimes, we need the VMCSs to debug guest images contained
>>>> in the host vmcore. To prevent the corruption, we should VMCLEAR the VMCSs before
>>>> executing the VMXOFF instruction.
>>>
>>> How have you verified that VMXOFF doesn't flush cached VMCSs already?
>>>
>>
>> I tried some tests, for example, I made copies for every vmcs, and in the kdump
>> path, I backed up all the loaded vmcs into the copies before vmxoff.
>> After generating the vmcore, I retrieve the vmcss and their copies, and compare them,
>> no differences.
>>
>> Another test is using VMCLEAR to clear all the loaded vmcs before VMXOFF,
>> and compare the vmcss and their copies, there are indeed differences between the
>> vmcs and its copy.
>>
>> I know the tests may be not so convincing, for example, I used memcpy to back up
>> the vmcss and it is an ordinary memory operation. But to ensure the non-corruption
>> of the vmcss in the vmcore, I think we should VMCLEAR the vmcss before VMXOFF just
>> as the Intel spec says.
>
> Sorry, I was unclear -- I was referring to the spec, I wasn't sure
> whether VMXOFF is defined to flush VMCSes or whether it just invalidates
> on-chip caches so that it won't flush them out in the future, corrupting
> memory. We don't want to depend on actual behaviour as it may change
> with future version.
>
> Copying some Intel folk, maybe they can clarify it.
>

Yes, the Intel spec says "may be" about the VMCS-corruption thing.

From chapter 24.10.1 in Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3C: System Programming Guide, Part 3, there is the description:

"If a logical processor leaves VMX operation, any VMCSs active on that logical processor may be corrupted (see below). To prevent such corruption of a VMCS that may be used either after a return to VMX operation or on another logical processor, software should VMCLEAR that VMCS before executing the VMXOFF instruction or removing power from the processor (e.g., as part of a transition to the S3 and S4 power states)."

Our purpose is to make sure the VMCSs in the vmcore are up to date and not corrupted. So according to the description above, no matter whether VMXOFF is defined to flush VMCSs or whether it just invalidates on-chip caches, we had better VMCLEAR the VMCSs before executing VMXOFF.

Thanks
Zhang Yanfei
RE: [PATCH] KVM: ia64: remove unused variable in kvm_release_vm_pages()
Acked-by: Xiantao Zhang

> -----Original Message-----
> From: Wei Yongjun [mailto:weiyj...@gmail.com]
> Sent: Wednesday, October 17, 2012 11:04 PM
> To: a...@redhat.com; mtosa...@redhat.com; Zhang, Xiantao; Luck, Tony; Yu, Fenghua
> Cc: yongjun_...@trendmicro.com.cn; kvm@vger.kernel.org; kvm-i...@vger.kernel.org; linux-i...@vger.kernel.org
> Subject: [PATCH] KVM: ia64: remove unused variable in kvm_release_vm_pages()
>
> From: Wei Yongjun
>
> The variable base_gfn is initialized but never used otherwise, so remove the
> unused variable.
>
> dpatch engine is used to auto generate this patch.
> (https://github.com/weiyj/dpatch)
>
> Signed-off-by: Wei Yongjun
> ---
>  arch/ia64/kvm/kvm-ia64.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> index 8b3a9c0..c71acd7 100644
> --- a/arch/ia64/kvm/kvm-ia64.c
> +++ b/arch/ia64/kvm/kvm-ia64.c
> @@ -1362,11 +1362,9 @@ static void kvm_release_vm_pages(struct kvm *kvm)
>  	struct kvm_memslots *slots;
>  	struct kvm_memory_slot *memslot;
>  	int j;
> -	unsigned long base_gfn;
>
>  	slots = kvm_memslots(kvm);
>  	kvm_for_each_memslot(memslot, slots) {
> -		base_gfn = memslot->base_gfn;
>  		for (j = 0; j < memslot->npages; j++) {
>  			if (memslot->rmap[j])
>  				put_page((struct page *)memslot->rmap[j]);
Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
On Thu, 2012-10-18 at 09:10 +1100, Paul Mackerras wrote:
>
> With the XICS, there are two types of irqchip: a source controller and
> a presentation controller. There is one presentation controller per
> vcpu and typically one source controller per PCI host bridge (a source
> controller can manage multiple sources). The "buid" above is
> basically an identifier for a source controller.
>
> So with the above, it would be quite easy to add new types and
> arguments for them.

The only possible issue is that, AFAIK, the ioctl number depends on the structure size, no? If it does, then we should add some padding to the union to leave room for new types.

Cheers,
Ben.
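To illustrate the point about the structure size: on Linux the `_IOW()` macro encodes `sizeof` of the argument type into the request number, so enlarging the union later would yield a different ioctl number. The following is only a minimal sketch under stated assumptions. The `demo_*` struct names, the `DEMO_*` macros and the 64-byte pad are hypothetical and are not taken from the patches under discussion; only the 0xAE magic (the value of KVMIO) and the 0xac number come from the real header and from this thread.

```c
#include <stdint.h>
#include <sys/ioctl.h>	/* _IOW() and _IOC_SIZE() on Linux */

#define DEMO_KVMIO 0xAE	/* same value as KVMIO in <linux/kvm.h> */

/* Hypothetical argument struct without padding: 4 + 8 = 12 bytes. */
struct demo_args_unpadded {
	uint32_t type;
	union {
		struct { uint32_t flags; } icp;
		struct { uint32_t flags; uint16_t buid; uint16_t nr_irqs; } ics;
	} u;
};

/* Same layout, but the union reserves room for future irqchip types. */
struct demo_args_padded {
	uint32_t type;
	union {
		struct { uint32_t flags; } icp;
		struct { uint32_t flags; uint16_t buid; uint16_t nr_irqs; } ics;
		uint8_t pad[64];	/* assumed pad size, for illustration only */
	} u;
};

/* Same magic and same number; only the argument type differs. */
#define DEMO_CREATE_UNPADDED _IOW(DEMO_KVMIO, 0xac, struct demo_args_unpadded)
#define DEMO_CREATE_PADDED   _IOW(DEMO_KVMIO, 0xac, struct demo_args_padded)

/* Size field that the kernel would see encoded in each request number. */
unsigned int demo_encoded_size_unpadded(void)
{
	return _IOC_SIZE(DEMO_CREATE_UNPADDED);
}

unsigned int demo_encoded_size_padded(void)
{
	return _IOC_SIZE(DEMO_CREATE_PADDED);
}
```

With the pad in place from the start, a new member that fits inside it leaves `sizeof` and therefore the request number unchanged.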
Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
On Wed, Oct 17, 2012 at 04:39:57PM -0400, Christoffer Dall wrote:
> On Wed, Oct 17, 2012 at 4:38 PM, Alexander Graf wrote:
> > On 10/14/2012 02:04 AM, Christoffer Dall wrote:
> >>
> >> *** warning: this RFC patch series is only compile-tested ***
> >>
> >> We need a way to specify the address at which we expect VMs to access
> >> the interrupt controller (both the emulated distributor and the hardware
> >> interface supporting virtualization). User space should decide on this
> >> address as user space decides on an emulated board and loads a device
> >> tree describing these details directly to the guest.
> >>
> >> Instead of modifying or copying KVM_CREATE_IRQCHIP to an ARM specific
> >> ioctl with a highly device specific set of parameters, we try
> >> something slightly more generic, that should fit well with how user
> >> space (read QEMU) first builds the individual devices and later sets up
> >> the emulated platform.
> >
> >
> > Have you talked to Ben about this one? He wanted to design a new, more
> > flexible irqchip API that would work for XICS & MPIC. Maybe there's some
> > room for cooperation here?
> >
> I have not - Ben, what do you have in mind?

I've taken over Ben's patches in this area and I'm currently working on getting them ready for submission. So far we only have XICS emulation, and it is accessed through hypercalls, so there are no addresses in the create-irqchip ioctl argument yet.

What we have so far is a new ioctl:

#define KVM_CREATE_IRQCHIP_ARGS	_IOW(KVMIO, 0xac, struct kvm_irqchip_args)

where kvm_irqchip_args is defined in an arch header and currently looks like this:

/* for KVM_CAP_SPAPR_XICS */
#define __KVM_HAVE_IRQCHIP_ARGS
struct kvm_irqchip_args {
#define KVM_IRQCHIP_TYPE_ICP	0	/* XICS: ICP (presentation controller) */
#define KVM_IRQCHIP_TYPE_ICS	1	/* XICS: ICS (source controller) */
	__u32 type;
	union {
		/* XICS ICP arguments. This needs to be called once before
		 * creating any VCPU to initialize the main kernel XICS data
		 * structures.
		 */
		struct {
#define KVM_ICP_FLAG_NOREALMODE	0x0001	/* Disable real mode ICP */
			__u32 flags;
		} icp;

		/* XICS ICS arguments. You can call this for every BUID you
		 * want to make available.
		 *
		 * The BUID is 12 bits, the interrupt number within a BUID
		 * is up to 12 bits as well. The resulting interrupt numbers
		 * exposed to the guest are BUID || IRQ which is 24 bit
		 *
		 * BUID cannot be 0.
		 */
		struct {
			__u32 flags;
			__u16 buid;
			__u16 nr_irqs;
		} ics;
	};
};

With the XICS, there are two types of irqchip: a source controller and a presentation controller. There is one presentation controller per vcpu and typically one source controller per PCI host bridge (a source controller can manage multiple sources). The "buid" above is basically an identifier for a source controller.

So with the above, it would be quite easy to add new types and arguments for them.

Thoughts?

Paul.
RE: [PATCH] Added call parameter to track whether invocation originated with guest or elsewhere
OK, agreed it is not pretty.

Thanks,

Will

> -----Original Message-----
> From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
> Sent: Wednesday, October 17, 2012 7:09 AM
> To: Avi Kivity
> Cc: Auld, Will; Will Auld; kvm@vger.kernel.org; Zhang, Xiantao; Liu, Jinsong
> Subject: Re: [PATCH] Added call parameter to track whether invocation
> originated with guest or elsewhere
>
> On Wed, Oct 17, 2012 at 12:35:33PM +0200, Avi Kivity wrote:
> > On 10/17/2012 04:10 AM, Will Auld wrote:
> > > Signed-off-by: Will Auld
> > > ---
> > >
> > > Resending to full list
> > >
> > > Marcelo,
> > >
> > > This patch is what I believe you ask for as foundational for later
> > > patches to address IA32_TSC_ADJUST.
> > >
> >
> > Please write a changelog to reflect the motivation.
> >
> > All those bool parameters scattered all over the place aren't very
> > pretty. Usually we solve this with helpers that embed the parameter
> > name (kvm_set_msr() vs. kvm_set_msr_host()) but there are too many
> > functions for this to work here.
> >
> > Marcelo, any ideas?
>
> It's easier to read
>
> kvm_x86_ops->kvm_set_msr()
> kvm_x86_ops->kvm_set_msr_host()
>
> than
>
> kvm_x86_ops->kvm_set_msr(,false)
> kvm_x86_ops->kvm_set_msr(,true)
>
> So you're right.
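The "helpers that embed the parameter name" pattern mentioned above can be sketched in plain C as follows. This is an illustration only: `struct demo_vcpu` and all `demo_*` functions are hypothetical stand-ins, not the KVM implementation or the code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-vCPU state touched by an MSR write. */
struct demo_vcpu {
	uint64_t msr_value;
	bool last_write_from_host;
};

/* Common implementation; only this function sees the bool. */
int demo_set_msr_common(struct demo_vcpu *vcpu, uint64_t data,
			bool host_initiated)
{
	vcpu->msr_value = data;
	vcpu->last_write_from_host = host_initiated;
	return 0;
}

/* Guest-initiated write: the name says so, no bare "false" at call sites. */
int demo_set_msr(struct demo_vcpu *vcpu, uint64_t data)
{
	return demo_set_msr_common(vcpu, data, false);
}

/* Host-initiated write (e.g. from user space): no bare "true" either. */
int demo_set_msr_host(struct demo_vcpu *vcpu, uint64_t data)
{
	return demo_set_msr_common(vcpu, data, true);
}
```

The trade-off raised in the thread still applies: every function on the call path that forwards the flag would need such a pair, which is why the wrappers may not scale here.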
Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
On Wed, 2012-10-17 at 16:39 -0400, Christoffer Dall wrote:
> > Have you talked to Ben about this one? He wanted to design a new, more
> > flexible irqchip API that would work for XICS & MPIC. Maybe there's some
> > room for cooperation here?
> >
> I have not - Ben, what do you have in mind?

I've been sidetracked to some other stuff so for now Paul (CC) is taking over my interrupt patches. We initially changed KVM_CREATE_IRQCHIP to take an argument but that was causing an x86 ABI breakage (ioctl number changing). So we'll probably be creating a new one.

From there, nothing fancy really, just an ioctl with an IRQ chip type at the beginning followed by a union of type-specific parameters.

The main problem we haven't sorted out yet is how to replace some of the horrors related to mapping interrupts that have tendrils all the way into virtio-pci etc... in qemu that don't apply to us (well mostly) and the interaction with in-kernel generated interrupts to avoid going through qemu for vhost etc...

Cheers,
Ben.
Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
On Wed, Oct 17, 2012 at 4:38 PM, Alexander Graf wrote:
> On 10/14/2012 02:04 AM, Christoffer Dall wrote:
>>
>> *** warning: this RFC patch series is only compile-tested ***
>>
>> We need a way to specify the address at which we expect VMs to access
>> the interrupt controller (both the emulated distributor and the hardware
>> interface supporting virtualization). User space should decide on this
>> address as user space decides on an emulated board and loads a device
>> tree describing these details directly to the guest.
>>
>> Instead of modifying or copying KVM_CREATE_IRQCHIP to an ARM specific
>> ioctl with a highly device specific set of parameters, we try
>> something slightly more generic, that should fit well with how user
>> space (read QEMU) first builds the individual devices and later sets up
>> the emulated platform.
>
>
> Have you talked to Ben about this one? He wanted to design a new, more
> flexible irqchip API that would work for XICS & MPIC. Maybe there's some
> room for cooperation here?
>

I have not - Ben, what do you have in mind?

-Christoffer
Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl
On Wed, Oct 17, 2012 at 4:31 PM, Peter Maydell wrote:
> On 17 October 2012 21:23, Christoffer Dall wrote:
>> On Wed, Oct 17, 2012 at 4:21 PM, Peter Maydell wrote:
>>>> +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is
>>>> +initally run.
>>>
>>> "initially".
>>
>> thanks a bunch for those, and sorry about the sloppiness.
>
> No problem. Also just noticed "platform" there :-)
>

I'll spell check the diff just to be sure. :)
Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses
On 10/14/2012 02:04 AM, Christoffer Dall wrote:
> *** warning: this RFC patch series is only compile-tested ***
>
> We need a way to specify the address at which we expect VMs to access
> the interrupt controller (both the emulated distributor and the hardware
> interface supporting virtualization). User space should decide on this
> address as user space decides on an emulated board and loads a device
> tree describing these details directly to the guest.
>
> Instead of modifying or copying KVM_CREATE_IRQCHIP to an ARM specific
> ioctl with a highly device specific set of parameters, we try
> something slightly more generic, that should fit well with how user
> space (read QEMU) first builds the individual devices and later sets up
> the emulated platform.

Have you talked to Ben about this one? He wanted to design a new, more flexible irqchip API that would work for XICS & MPIC. Maybe there's some room for cooperation here?

Alex
Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl
On 17 October 2012 21:23, Christoffer Dall wrote:
> On Wed, Oct 17, 2012 at 4:21 PM, Peter Maydell wrote:
>>> +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is
>>> +initally run.
>>
>> "initially".
>
> thanks a bunch for those, and sorry about the sloppiness.

No problem. Also just noticed "platform" there :-)

-- PMM
Re: [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
On 14 October 2012 01:04, Christoffer Dall wrote:
> On ARM (and possibly other architectures) some bits are specific to the
> model being emulated for the guest and user space needs a way to tell
> the kernel about those bits. An example is mmio device base addresses,
> where KVM must know the base address for a given device to properly
> emulate mmio accesses within a certain address range or directly map a
> device with virtualization extensions into the guest address space.
>
> We try to make this API slightly more generic than for our specific use,
> but so far only the VGIC uses this feature.
>
> Signed-off-by: Christoffer Dall
> ---
>  Documentation/virtual/kvm/api.txt |   30 ++
>  arch/arm/include/asm/kvm.h        |   13 +
>  arch/arm/include/asm/kvm_mmu.h    |    1 +
>  arch/arm/include/asm/kvm_vgic.h   |    6 ++
>  arch/arm/kvm/arm.c                |   31 ++-
>  arch/arm/kvm/vgic.c               |   34 +++---
>  include/linux/kvm.h               |    8
>  7 files changed, 119 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 26e953d..30ddcac 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2118,6 +2118,36 @@ for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is
>  initally run.
>
>
> +4.80 KVM_SET_DEVICE_ADDRESS
> +
> +Capability: KVM_CAP_SET_DEVICE_ADDRESS
> +Architectures: arm
> +Type: vm ioctl
> +Parameters: struct kvm_device_address (in)
> +Returns: 0 on success, -1 on error
> +Errors:
> +  ENODEV: The device id is unknwown

"unknown"

> +  ENXIO:  Device not supported in configuration

"in this configuration" ? (I'm guessing this is for "you tried to map a GIC when this CPU doesn't have a GIC" and similar errors?)

> +  E2BIG:  Address outside of guest physical address space

I would say "outside" rather than "outside of" here.

> +
> +struct kvm_device_address {
> +	__u32 id;
> +	__u64 addr;
> +};
> +
> +Specify a device address in the guest's physical address space where guests
> +can access emulated or directly exposed devices, which the host kernel needs
> +to know about. The id field is an architecture specific identifier for a
> +specific device.
> +
> +ARM divides the id field into two parts, a device ID and an address type id

We should be consistent about whether ID is capitalised or not.

> +specific to the individual device.
> +
> +  bits:  | 31...16   | 15...0       |
> +  field: | device id | addr type id |

This doesn't say whether userspace is allowed to make this ioctl multiple times for the same device. This could be any of:
 * undefined behaviour
 * second call fails with some errno
 * second call overrides first one

It also doesn't say that you're supposed to call this after CREATE and before INIT of the irqchip. (Nor does it say what happens if you call it at some other time.)

-- PMM
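The bit layout quoted above (device id in bits 31...16, address type id in bits 15...0) can be sketched with a few helpers. This is a hedged illustration: the `demo_*` function names and the `DEMO_*` constants are hypothetical and are not defined by the patch; only the field split itself comes from the quoted documentation.

```c
#include <stdint.h>

/* Assumed constants derived from the quoted layout, not from the patch. */
#define DEMO_DEVICE_ID_SHIFT	16
#define DEMO_ADDR_TYPE_MASK	0xffffu

/* Build the 32-bit id field from its two 16-bit halves. */
uint32_t demo_pack_id(uint16_t device_id, uint16_t addr_type_id)
{
	return ((uint32_t)device_id << DEMO_DEVICE_ID_SHIFT) | addr_type_id;
}

/* Upper half, bits 31...16: which device is being described. */
uint16_t demo_device_id(uint32_t id)
{
	return (uint16_t)(id >> DEMO_DEVICE_ID_SHIFT);
}

/* Lower half, bits 15...0: which address of that device is being set. */
uint16_t demo_addr_type_id(uint32_t id)
{
	return (uint16_t)(id & DEMO_ADDR_TYPE_MASK);
}
```

User space would pack the value before issuing the ioctl, and the kernel side would split it again to pick the device and then the address slot within it.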
Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl
On Wed, Oct 17, 2012 at 4:21 PM, Peter Maydell wrote: > On 14 October 2012 01:04, Christoffer Dall > wrote: >> Used to initialize the in-kernel interrupt controller. On ARM we need to >> map the virtual generic interrupt controller (vGIC) into Hyp the guest's >> physicall address space so the guest can access the virtual cpu >> interface. This must be done after the IRQ chips is create and after a >> base address has been provided for the emulated platform (patch is >> following), but before the CPU is initally run. > > I've now written the code for that patch but don't have access to a machine > with the ARM cross compile setup to build it until tomorrow. > >> >> Signed-off-by: Christoffer Dall >> --- >> Documentation/virtual/kvm/api.txt | 16 >> arch/arm/kvm/arm.c|1 + >> include/linux/kvm.h |3 +++ >> 3 files changed, 20 insertions(+) >> >> diff --git a/Documentation/virtual/kvm/api.txt >> b/Documentation/virtual/kvm/api.txt >> index 25eacc6..26e953d 100644 >> --- a/Documentation/virtual/kvm/api.txt >> +++ b/Documentation/virtual/kvm/api.txt >> @@ -2102,6 +2102,22 @@ This ioctl returns the guest registers that are >> supported for the >> KVM_GET_ONE_REG/KVM_SET_ONE_REG calls. >> >> >> +4.79 KVM_INIT_IRQCHIP >> + >> +Capability: KVM_CAP_INIT_IRQCHIP >> +Architectures: arm >> +Type: vm ioctl >> +Parameters: none >> +Returns: 0 on success, -1 on error >> + >> +Initialize the in-kernel interrupt controller. On ARM we need to map the >> +virtual generic interrupt controller (vGIC) into Hyp the guest's physicall > > Should that "Hyp" be deleted? yup > > "physical" > >> +address space so the guest can access the virtual cpu interface. This must >> be >> +done after the IRQ chips is create and after a base address has been >> provided > > "chip". "created". > >> +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU >> is >> +initally run. > > "initially". thanks a bunch for those, and sorry about the sloppyness. 
>
> (all these typos are also in your commit message)
>

yeah, you caught my -ECUTANDPASTE there ;)
Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl
On 14 October 2012 01:04, Christoffer Dall wrote:
> Used to initialize the in-kernel interrupt controller. On ARM we need to
> map the virtual generic interrupt controller (vGIC) into Hyp the guest's
> physicall address space so the guest can access the virtual cpu
> interface. This must be done after the IRQ chips is create and after a
> base address has been provided for the emulated platform (patch is
> following), but before the CPU is initally run.

I've now written the code for that patch but don't have access to a machine with the ARM cross compile setup to build it until tomorrow.

>
> Signed-off-by: Christoffer Dall
> ---
>  Documentation/virtual/kvm/api.txt |   16
>  arch/arm/kvm/arm.c                |    1 +
>  include/linux/kvm.h               |    3 +++
>  3 files changed, 20 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 25eacc6..26e953d 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2102,6 +2102,22 @@ This ioctl returns the guest registers that are supported for the
>  KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
>
>
> +4.79 KVM_INIT_IRQCHIP
> +
> +Capability: KVM_CAP_INIT_IRQCHIP
> +Architectures: arm
> +Type: vm ioctl
> +Parameters: none
> +Returns: 0 on success, -1 on error
> +
> +Initialize the in-kernel interrupt controller. On ARM we need to map the
> +virtual generic interrupt controller (vGIC) into Hyp the guest's physicall

Should that "Hyp" be deleted?

"physical"

> +address space so the guest can access the virtual cpu interface. This must be
> +done after the IRQ chips is create and after a base address has been provided

"chip". "created".

> +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is
> +initally run.

"initially".

(all these typos are also in your commit message)

> +
> +
>  5. The kvm_run structure
>
>
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index f8c377b..85c76e4 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -195,6 +195,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	switch (ext) {
>  #ifdef CONFIG_KVM_ARM_VGIC
>  	case KVM_CAP_IRQCHIP:
> +	case KVM_CAP_INIT_IRQCHIP:
>  		r = vgic_present;
>  		break;
>  #endif
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 8091b1d..90ee023 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -626,6 +626,7 @@ struct kvm_ppc_smmu_info {
>  #ifdef __KVM_HAVE_READONLY_MEM
>  #define KVM_CAP_READONLY_MEM 81
>  #endif
> +#define KVM_CAP_INIT_IRQCHIP 82
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -839,6 +840,8 @@ struct kvm_s390_ucas_mapping {
>  #define KVM_PPC_GET_SMMU_INFO	  _IOR(KVMIO,  0xa6, struct kvm_ppc_smmu_info)
>  /* Available with KVM_CAP_PPC_ALLOC_HTAB */
>  #define KVM_PPC_ALLOCATE_HTAB	  _IOWR(KVMIO, 0xa7, __u32)
> +/* Available with KVM_CAP_INIT_IRQCHIP */
> +#define KVM_INIT_IRQCHIP	  _IO(KVMIO, 0xa8)
>
>  /*
>   * ioctls for vcpu fds
> --
> 1.7.9.5
>

-- PMM
How to do fast accesses to LAPIC TPR under kvm?
Hi,

OpenBSD/i386 seems to be one of the few operating systems that still uses the LAPIC task priority register for interrupt handling. On AMD CPUs and on older Intel CPUs without the flexpriority feature, this causes a huge performance impact on kvm. I have seen slowdown by a factor of 10.

Is there a way to use the TPR under kvm without the slowdown? There are some MSRs inherited from Hyper-V, but using these does not make that much difference. I think this is because they still cause a vmexit for every TPR access. I expect the same is true for x2apic emulation, isn't it?

There is also the kvmvapic, but kvm does not expose a sane interface to it and only uses it for Windows XP specific binary patching.

Another possibility is TPR access via CR8 on AMD, but the missing cr8_legacy CPUID bit and this discussion [1] make me believe that this is not supported under kvm, at least in 32bit mode. Could this be easily fixed? If yes, would it solve the performance problems, i.e. offer performance comparable to Intel's flexpriority feature?

OpenBSD seems to be reluctant to stop using the TPR. In fact, in a recent discussion, there has been a suggestion that OpenBSD should switch to using TPR also on OpenBSD/amd64 to solve some problems with boot interrupts. How do you expect this would affect performance under kvm (if using CR8)?

Or do you have any other suggestions? One could also modify kvm to expose a real interface to kvmvapic, e.g. allow the guest OS to provide the virtual address of the option rom and the offset of the CPU number in the %fs segment, instead of using hard coded values for Windows XP.

Cheers,
Stefan

[1] http://www.mail-archive.com/kvm@vger.kernel.org/msg30627.html
Re: I/O errors in guest OS after repeated migration
On Wednesday, October 17, 2012 10:45:14 AM Guido Winkelmann wrote: > Am Dienstag, 16. Oktober 2012, 12:44:27 schrieb Brian Jackson: > > On Tuesday, October 16, 2012 11:33:44 AM Guido Winkelmann wrote: > > > The commandline, as generated by libvirtd, looks like this: > > > > > > LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin > > > QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.15 -enable-kvm -m 1024 > > > -smp 1,sockets=1,cores=1,threads=1 -name migratetest2 -uuid > > > ddbf11e9-387e-902b-4849-8c3067dc42a2 -nodefconfig -nodefaults -chardev > > > socket,id=charmonitor,path=/var/lib/libvirt/qemu/migratetest2.monitor,s > > > erv e > > > r,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc > > > -no-reboot -no- shutdown -device > > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive > > > file=/data/migratetest2_system,if=none,id=drive-virtio- > > > disk0,format=qcow2,cache=none -device virtio-blk- > > > pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio- > > > disk0,bootindex=1 -drive > > > file=/data/migratetest2_data-1,if=none,id=drive- > > > virtio-disk1,format=qcow2,cache=none -device virtio-blk- > > > pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk > > > 1 - netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=28 -device > > > virtio-net- > > > pci,netdev=hostnet0,id=net0,mac=02:00:00:00:00:0c,bus=pci.0,addr=0x3 > > > -vnc 127.0.0.1:2,password -k de -vga cirrus -incoming > > > tcp:0.0.0.0:49153 -device > > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 > > > > I see qcow2 in there. Live migration of qcow2 was a new feature in 1.0. > > Have you tried other formats or different qemu/kvm versions? > > I tried the same thing with a raw image file instead of qcow2, and the > problem still happens. 
From the /var/log/messages of the guest: > > Oct 17 17:10:34 localhost sshd[2368]: nss_ldap: could not search LDAP > server - Server is unavailable > Oct 17 17:10:39 localhost kernel: [ 126.800075] eth0: no IPv6 routers > present Oct 17 17:10:52 localhost kernel: [ 140.335783] Clocksource tsc > unstable (delta = -70265501 ns) > Oct 17 17:12:04 localhost /O error on device vda1, logical block 1858765 > Oct 17 17:12:04 localhost kernel: [ 212.070584] Buffer I/O error on device > vda1, logical block 1858766 > Oct 17 17:12:04 localhost kernel: [ 212.070587] Buffer I/O error on device > vda1, logical block 1858767 > Oct 17 17:12:04 localhost kernel: [ 212.070589] Buffer I/O error on device > vda1, logical block 1858768 > Oct 17 17:12:04 localhost kernel: [ 212.070592] Buffer I/O error on device > vda1, logical block 1858769 > Oct 17 17:12:04 localhost kernel: [ 212.070595] Buffer I/O error on device > vda1, logical block 1858770 > Oct 17 17:12:04 localhost kernel: [ 212.070597] Buffer I/O error on device > vda1, logical block 1858771 > Oct 17 17:12:04 localhost kernel: [ 212.070600] Buffer I/O error on device > vda1, logical block 1858772 > Oct 17 17:12:04 localhost kernel: [ 212.070602] Buffer I/O error on device > vda1, logical block 1858773 > Oct 17 17:12:04 localhost kernel: [ 212.070605] Buffer I/O error on device > vda1, logical block 1858774 > Oct 17 17:12:04 localhost kernel: [ 212.070607] Buffer I/O error on device > vda1, logical block 1858775 > Oct 17 17:12:04 localhost kernel: [ 212.070610] Buffer I/O error on device > vda1, logical block 1858776 > Oct 17 17:12:04 localhost kernel: [ 212.070612] Buffer I/O error on device > vda1, logical block 1858777 > Oct 17 17:12:04 localhost kernel: [ 212.070615] Buffer I/O error on device > vda1, logical block 1858778 > Oct 17 17:12:04 localhost kernel: [ 212.070617] Buffer I/O error on device > vda1, logical block 1858779 > > (I was writing a large file at the time, to make sure I actually catch I/O > errors as they 
happen) What about newer versions of qemu/kvm? But of course, if those work, your next task is going to be to git bisect it or file a bug with your distro, which is using an ancient version of qemu/kvm. > > Guido
Re: I/O errors in guest OS after repeated migration
On Wednesday, October 17, 2012 06:54:00 AM Guido Winkelmann wrote: > Am Dienstag, 16. Oktober 2012, 12:44:27 schrieb Brian Jackson: > > On Tuesday, October 16, 2012 11:33:44 AM Guido Winkelmann wrote: > [...] > > > > The commandline, as generated by libvirtd, looks like this: > > > > > > LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin > > > QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.15 -enable-kvm -m 1024 > > > -smp 1,sockets=1,cores=1,threads=1 -name migratetest2 -uuid > > > ddbf11e9-387e-902b-4849-8c3067dc42a2 -nodefconfig -nodefaults -chardev > > > socket,id=charmonitor,path=/var/lib/libvirt/qemu/migratetest2.monitor,s > > > erv e > > > r,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc > > > -no-reboot -no- shutdown -device > > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive > > > file=/data/migratetest2_system,if=none,id=drive-virtio- > > > disk0,format=qcow2,cache=none -device virtio-blk- > > > pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio- > > > disk0,bootindex=1 -drive > > > file=/data/migratetest2_data-1,if=none,id=drive- > > > virtio-disk1,format=qcow2,cache=none -device virtio-blk- > > > pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk > > > 1 - netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=28 -device > > > virtio-net- > > > pci,netdev=hostnet0,id=net0,mac=02:00:00:00:00:0c,bus=pci.0,addr=0x3 > > > -vnc 127.0.0.1:2,password -k de -vga cirrus -incoming > > > tcp:0.0.0.0:49153 -device > > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 > > > > I see qcow2 in there. Live migration of qcow2 was a new feature in 1.0. > > Have you tried other formats or different qemu/kvm versions? > > Are you sure about that? Because I'm fairly certain I have been using live > migration since at least 0.14, if not 0.13, and I have always been using > qcow2 as the image format for the disks... > > I can still try with other image formats, though. Yes, see the release notes for 1.0. 
It may have worked by chance before that, but it wasn't guaranteed to work. There was no blacklisting feature then like there is now to stop it. > > Guido
Re: [PATCH 0/2] KVM: PPC: Support ioeventfd
On 10/17/2012 04:50 PM, Avi Kivity wrote: > On 10/16/2012 04:49 PM, Alexander Graf wrote: >>> If there is a lot of prioritization and/or queuing logic, then yes. But >>> what about MSI? Doesn't that have a direct path? >> >> Nope. Well, yes, in a certain special case where the MPIC pushes the >> interrupt vector on interrupt delivery into a special register. But not >> for the "normal" case. > > Ok. The patches are fine then, but would be good to add the PIO check. Yup, will do as a separate patch. Alex
Re: kvm-unit test behavior
On 10/17/2012 06:08 PM, Conny Seidel wrote: > Hi, > > > we are seeing something strange when running the KVM unit-tests on > recent KVM and "older" CPUs (K8 Family). > A patch was just applied fixing this; it will be merged upstream in a few days. -- error compiling committee.c: too many arguments to function
kvm-unit test behavior
Hi, we are seeing something strange when running the KVM unit-tests on recent KVM and "older" CPUs (K8 Family). [ cut here ] WARNING: at arch/x86/kvm/../../../virt/kvm/kvm_main.c:1325 kvm_release_pfn_clean+0x5b/0x60 [kvm]() Hardware name: WARTHOG Modules linked in: tun nfsv4 auth_rpcgss nfsv3 nfs_acl nfs fscache lockd sunrpc bridge stp llc ipv6 amd8111e mii powernow_k8 freq_table kvm_amd kvm serio_raw pcspkr k8temp amd64_edac_mod edac_core edac_mce_amd i2c_amd756 amd_rng i2c_amd8111 sg shpchp ext3 jbd mbcache sd_mod crc_t10dif sr_mod cdrom sata_sil ata_generic pata_acpi pata_amd radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod Pid: 2084, comm: qemu-kvm Not tainted 3.6.0.20121010_ecefbd9-1.el6.osrc.x86_64 #1 Call Trace: [] warn_slowpath_common+0x7f/0xc0 [] warn_slowpath_null+0x1a/0x20 [] kvm_release_pfn_clean+0x5b/0x60 [kvm] [] paging64_fetch+0x1eb/0x370 [kvm] [] ? __gfn_to_pfn+0x6f/0x80 [kvm] [] ? gfn_to_pfn_async+0x1a/0x20 [kvm] [] ? try_async_pf+0x4b/0x1f0 [kvm] [] paging64_page_fault+0x293/0x2d0 [kvm] [] ? kfree+0x2c/0x120 [] kvm_mmu_page_fault+0x27/0xd0 [kvm] [] pf_interception+0xa4/0x170 [kvm_amd] [] handle_exit+0x146/0x2d0 [kvm_amd] [] ? kvm_get_cr8+0x1d/0x30 [kvm] [] ? svm_vcpu_run+0x425/0x530 [kvm_amd] [] vcpu_enter_guest+0x39c/0x6b0 [kvm] [] __vcpu_run+0x1e8/0x320 [kvm] [] kvm_arch_vcpu_ioctl_run+0x9a/0x1f0 [kvm] [] kvm_vcpu_ioctl+0x4a8/0x590 [kvm] [] do_vfs_ioctl+0x8c/0x340 [] sys_ioctl+0xa1/0xb0 [] ? __audit_syscall_exit+0x3d6/0x430 [] system_call_fastpath+0x16/0x1b ---[ end trace bc3b9055849b3814 ]--- The failing tests are svm and svm-disable, which seem to loop forever once started. 
Begin logfile: enabling apic enabling apic paging enabled cr0 = 80010011 cr3 = 7fff000 cr4 = 20 null: PASS vmrun: PASS vmrun intercept check: PASS cr3 read intercept: PASS enabling apic enabling apic paging enabled cr0 = 80010011 cr3 = 7fff000 cr4 = 20 null: PASS vmrun: PASS vmrun intercept check: PASS cr3 read intercept: PASS # goes on until the test is killed. Anyone seen this behavior? -- Kind regards. Conny Seidel Email: conny.sei...@amd.com GnuPG-Key: 0xA6AB055D Fingerprint: 17C4 5DB2 7C4C C1C7 1452 8148 F139 7C09 A6AB 055D Advanced Micro Devices GmbH, Einsteinring 24, 85609 Dornach General Managers: Alberto Bozzo Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen HRB Nr. 43632
Re: I/O errors in guest OS after repeated migration
On Tuesday, 16 October 2012, 12:44:27, Brian Jackson wrote: > On Tuesday, October 16, 2012 11:33:44 AM Guido Winkelmann wrote: > > The commandline, as generated by libvirtd, looks like this: > > > > LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin > > QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.15 -enable-kvm -m 1024 > > -smp 1,sockets=1,cores=1,threads=1 -name migratetest2 -uuid > > ddbf11e9-387e-902b-4849-8c3067dc42a2 -nodefconfig -nodefaults -chardev > > socket,id=charmonitor,path=/var/lib/libvirt/qemu/migratetest2.monitor,serv > > e > > r,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc > > -no-reboot -no- shutdown -device > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive > > file=/data/migratetest2_system,if=none,id=drive-virtio- > > disk0,format=qcow2,cache=none -device virtio-blk- > > pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio- > > disk0,bootindex=1 -drive file=/data/migratetest2_data-1,if=none,id=drive- > > virtio-disk1,format=qcow2,cache=none -device virtio-blk- > > pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 - > > netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net- > > pci,netdev=hostnet0,id=net0,mac=02:00:00:00:00:0c,bus=pci.0,addr=0x3 -vnc > > 127.0.0.1:2,password -k de -vga cirrus -incoming tcp:0.0.0.0:49153 -device > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 > > I see qcow2 in there. Live migration of qcow2 was a new feature in 1.0. Have > you tried other formats or different qemu/kvm versions? I tried the same thing with a raw image file instead of qcow2, and the problem still happens.
From the /var/log/messages of the guest: Oct 17 17:10:34 localhost sshd[2368]: nss_ldap: could not search LDAP server - Server is unavailable Oct 17 17:10:39 localhost kernel: [ 126.800075] eth0: no IPv6 routers present Oct 17 17:10:52 localhost kernel: [ 140.335783] Clocksource tsc unstable (delta = -70265501 ns) Oct 17 17:12:04 localhost /O error on device vda1, logical block 1858765 Oct 17 17:12:04 localhost kernel: [ 212.070584] Buffer I/O error on device vda1, logical block 1858766 Oct 17 17:12:04 localhost kernel: [ 212.070587] Buffer I/O error on device vda1, logical block 1858767 Oct 17 17:12:04 localhost kernel: [ 212.070589] Buffer I/O error on device vda1, logical block 1858768 Oct 17 17:12:04 localhost kernel: [ 212.070592] Buffer I/O error on device vda1, logical block 1858769 Oct 17 17:12:04 localhost kernel: [ 212.070595] Buffer I/O error on device vda1, logical block 1858770 Oct 17 17:12:04 localhost kernel: [ 212.070597] Buffer I/O error on device vda1, logical block 1858771 Oct 17 17:12:04 localhost kernel: [ 212.070600] Buffer I/O error on device vda1, logical block 1858772 Oct 17 17:12:04 localhost kernel: [ 212.070602] Buffer I/O error on device vda1, logical block 1858773 Oct 17 17:12:04 localhost kernel: [ 212.070605] Buffer I/O error on device vda1, logical block 1858774 Oct 17 17:12:04 localhost kernel: [ 212.070607] Buffer I/O error on device vda1, logical block 1858775 Oct 17 17:12:04 localhost kernel: [ 212.070610] Buffer I/O error on device vda1, logical block 1858776 Oct 17 17:12:04 localhost kernel: [ 212.070612] Buffer I/O error on device vda1, logical block 1858777 Oct 17 17:12:04 localhost kernel: [ 212.070615] Buffer I/O error on device vda1, logical block 1858778 Oct 17 17:12:04 localhost kernel: [ 212.070617] Buffer I/O error on device vda1, logical block 1858779 (I was writing a large file at the time, to make sure I actually catch I/O errors as they happen) Guido
Re: [PATCH RFC V2 0/5] Separate consigned (expected steal) from steal time.
On Wed, 2012-10-17 at 21:14 +0400, Glauber Costa wrote: > On 10/17/2012 06:23 AM, Michael Wolf wrote: > > In the case of where you have a system that is running in a > > capped or overcommitted environment the user may see steal time > > being reported in accounting tools such as top or vmstat. This can > > cause confusion for the end user. To ease the confusion this patch set > > adds the idea of consigned (expected steal) time. The host will separate > > the consigned time from the steal time. The consignment limit passed to the > > host will be the amount of steal time expected within a fixed period of > > time. Any other steal time accruing during that period will show as the > > traditional steal time. > > > > TODO: > > * Change native_clock to take params and not return a value > > * Change update_rq_clock_task > > > > Changes from V1: > > * Removed the steal time allowed percentage from the guest > > * Moved the separation of consigned (expected steal) and steal time to the > > host. > > * No longer include a sysctl interface. > > > > You are showing this in the guest somewhere, but tools like top will > still not show it. So for quite a while, it achieves nothing. > > Of course this is a barrier that any new statistic has to go through. So > while annoying, this is per-se ultimately not a blocker. > > What I still fail to see, is how this is useful information to be shown > in the guest. Honestly, if I'm in a guest VM or container, any time > during which I am not running is time I lost. It doesn't matter if this > was expected or not. This still seems to me as a host-side problem, to > be solved entirely by tooling. > What tools like top and vmstat will show is altered. When I put time in the consign bucket it does not show up in steal. So now as long as the system is performing as expected the user will see 100% and 0% steal. I added the consign field to /proc/stat so that all time accrued in the period is accounted for and also for debugging purposes. 
The user won't care about consign and will not see it.
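To make the accounting described in this thread concrete, here is a rough sketch of the split for a single period: steal time up to the consignment limit is booked as consigned, and only the excess is reported as steal. This is an illustration under that reading of the cover letter, not code from the patch set; the struct and function names are invented.

```c
#include <stdint.h>

/* Hypothetical result type; field names are illustrative only. */
struct steal_split {
	uint64_t consigned;	/* expected steal, hidden from top/vmstat */
	uint64_t steal;		/* steal beyond the consignment limit */
};

/*
 * Split the raw steal time observed in one period.
 * consign_limit is the amount of steal expected in that period.
 */
static struct steal_split split_steal(uint64_t raw_steal, uint64_t consign_limit)
{
	struct steal_split s;

	s.consigned = raw_steal < consign_limit ? raw_steal : consign_limit;
	s.steal = raw_steal - s.consigned;
	return s;
}
```

With a limit of 50 time units, a period with 30 units of raw steal shows 0 steal, while a period with 80 units shows 30 steal and 50 consigned.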
[PATCHv4 2/2] kvm: deliver msi interrupts from irq handler
We can deliver certain interrupts, notably MSI, from atomic context. Use kvm_set_irq_inatomic, to implement an irq handler for msi. This reduces the pressure on scheduler in case where host and guest irq share a host cpu.

Signed-off-by: Michael S. Tsirkin
---
 virt/kvm/assigned-dev.c | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 23a41a9..3642239 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -105,6 +105,15 @@ static irqreturn_t kvm_assigned_dev_thread_intx(int irq, void *dev_id)
 }
 
 #ifdef __KVM_HAVE_MSI
+static irqreturn_t kvm_assigned_dev_msi(int irq, void *dev_id)
+{
+	struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
+	int ret = kvm_set_irq_inatomic(assigned_dev->kvm,
+				       assigned_dev->irq_source_id,
+				       assigned_dev->guest_irq, 1);
+	return unlikely(ret == -EWOULDBLOCK) ? IRQ_WAKE_THREAD : IRQ_HANDLED;
+}
+
 static irqreturn_t kvm_assigned_dev_thread_msi(int irq, void *dev_id)
 {
 	struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
@@ -117,6 +126,23 @@ static irqreturn_t kvm_assigned_dev_thread_msi(int irq, void *dev_id)
 #endif
 
 #ifdef __KVM_HAVE_MSIX
+static irqreturn_t kvm_assigned_dev_msix(int irq, void *dev_id)
+{
+	struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
+	int index = find_index_from_host_irq(assigned_dev, irq);
+	u32 vector;
+	int ret = 0;
+
+	if (index >= 0) {
+		vector = assigned_dev->guest_msix_entries[index].vector;
+		ret = kvm_set_irq_inatomic(assigned_dev->kvm,
+					   assigned_dev->irq_source_id,
+					   vector, 1);
+	}
+
+	return unlikely(ret == -EWOULDBLOCK) ? IRQ_WAKE_THREAD : IRQ_HANDLED;
+}
+
 static irqreturn_t kvm_assigned_dev_thread_msix(int irq, void *dev_id)
 {
 	struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
@@ -334,11 +360,6 @@ static int assigned_device_enable_host_intx(struct kvm *kvm,
 }
 
 #ifdef __KVM_HAVE_MSI
-static irqreturn_t kvm_assigned_dev_msi(int irq, void *dev_id)
-{
-	return IRQ_WAKE_THREAD;
-}
-
 static int assigned_device_enable_host_msi(struct kvm *kvm,
 					   struct kvm_assigned_dev_kernel *dev)
 {
@@ -363,11 +384,6 @@ static int assigned_device_enable_host_msi(struct kvm *kvm,
 #endif
 
 #ifdef __KVM_HAVE_MSIX
-static irqreturn_t kvm_assigned_dev_msix(int irq, void *dev_id)
-{
-	return IRQ_WAKE_THREAD;
-}
-
 static int assigned_device_enable_host_msix(struct kvm *kvm,
 					    struct kvm_assigned_dev_kernel *dev)
 {
--
MST
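The control flow this patch introduces (try to inject from the hard IRQ handler, fall back to the threaded handler only on -EWOULDBLOCK) can be modelled in plain user-space C. The sketch below is a simplified stand-in, not kernel code: every sketch_* name is invented, and the injection function only pretends that MSI routes always succeed atomically, whereas the real kvm_set_irq_inatomic can also return -EWOULDBLOCK for an MSI route when fast delivery is not possible.

```c
#include <errno.h>

/* Invented stand-ins for IRQ_HANDLED / IRQ_WAKE_THREAD. */
enum sketch_irqreturn {
	SKETCH_IRQ_HANDLED,
	SKETCH_IRQ_WAKE_THREAD,
};

/*
 * Stand-in for kvm_set_irq_inatomic(): returns -EWOULDBLOCK when the
 * interrupt cannot be delivered from atomic context. Treating every
 * MSI route as deliverable is an assumption of this sketch.
 */
static int sketch_inject_inatomic(int route_is_msi)
{
	return route_is_msi ? 1 : -EWOULDBLOCK;
}

/* Hard handler: deliver directly if possible, else defer to the thread. */
static enum sketch_irqreturn sketch_hard_handler(int route_is_msi)
{
	int ret = sketch_inject_inatomic(route_is_msi);

	return ret == -EWOULDBLOCK ? SKETCH_IRQ_WAKE_THREAD
				   : SKETCH_IRQ_HANDLED;
}
```

The point of the design is that the common case never wakes the IRQ thread, which is where the reduced scheduler pressure mentioned in the changelog would come from.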
[PATCHv4 1/2] kvm: add kvm_set_irq_inatomic
Add an API to inject IRQ from atomic context. Return EWOULDBLOCK if impossible (e.g. for multicast). Only MSI is supported ATM. Signed-off-by: Michael S. Tsirkin --- include/linux/kvm_host.h | 1 + virt/kvm/irq_comm.c | 83 +--- 2 files changed, 72 insertions(+), 12 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 93bfc9f..e165c09 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -677,6 +677,7 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic, unsigned long *deliver_bitmask); #endif int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level); +int kvm_set_irq_inatomic(struct kvm *kvm, int irq_source_id, u32 irq, int level); int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm *kvm, int irq_source_id, int level); void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin); diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index 2eb58af..656fa45 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -102,6 +102,23 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, return r; } +static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e, + struct kvm_lapic_irq *irq) +{ + trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data); + + irq->dest_id = (e->msi.address_lo & + MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT; + irq->vector = (e->msi.data & + MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT; + irq->dest_mode = (1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo; + irq->trig_mode = (1 << MSI_DATA_TRIGGER_SHIFT) & e->msi.data; + irq->delivery_mode = e->msi.data & 0x700; + irq->level = 1; + irq->shorthand = 0; + /* TODO Deal with RH bit of MSI message address */ +} + int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm, int irq_source_id, int level) { @@ -110,22 +127,26 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e, if (!level) return -1; - 
trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data); + kvm_set_msi_irq(e, &irq); - irq.dest_id = (e->msi.address_lo & - MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT; - irq.vector = (e->msi.data & - MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT; - irq.dest_mode = (1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo; - irq.trig_mode = (1 << MSI_DATA_TRIGGER_SHIFT) & e->msi.data; - irq.delivery_mode = e->msi.data & 0x700; - irq.level = 1; - irq.shorthand = 0; - - /* TODO Deal with RH bit of MSI message address */ return kvm_irq_delivery_to_apic(kvm, NULL, &irq); } + +static int kvm_set_msi_inatomic(struct kvm_kernel_irq_routing_entry *e, +struct kvm *kvm) +{ + struct kvm_lapic_irq irq; + int r; + + kvm_set_msi_irq(e, &irq); + + if (kvm_irq_delivery_to_apic_fast(kvm, NULL, &irq, &r)) + return r; + else + return -EWOULDBLOCK; +} + int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi) { struct kvm_kernel_irq_routing_entry route; @@ -178,6 +199,44 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level) return ret; } +/* + * Deliver an IRQ in an atomic context if we can, or return a failure, + * user can retry in a process context. + * Return value: + * -EWOULDBLOCK - Can't deliver in atomic context: retry in a process context. + * Other values - No need to retry. + */ +int kvm_set_irq_inatomic(struct kvm *kvm, int irq_source_id, u32 irq, int level) +{ + struct kvm_kernel_irq_routing_entry *e; + int ret = -EINVAL; + struct kvm_irq_routing_table *irq_rt; + struct hlist_node *n; + + trace_kvm_set_irq(irq, level, irq_source_id); + + /* +* Injection into either PIC or IOAPIC might need to scan all CPUs, +* which would need to be retried from thread context; when same GSI +* is connected to both PIC and IOAPIC, we'd have to report a +* partial failure here. +* Since there's no easy way to do this, we only support injecting MSI +* which is limited to 1:1 GSI mapping. 
+*/ + rcu_read_lock(); + irq_rt = rcu_dereference(kvm->irq_routing); + if (irq < irq_rt->nr_rt_entries) + hlist_for_each_entry(e, n, &irq_rt->map[irq], link) { + if (likely(e->type == KVM_IRQ_ROUTING_MSI)) + ret = kvm_set_msi_inatomic(e, kvm); + else + ret = -EWOULDBLOCK; + break; + } + rcu_read_unlock(); + re
[PATCHv4 0/2] kvm: direct msix injection
We can deliver certain interrupts, notably MSIX, from atomic context. Here's an untested patch to do this (compiled only). Changes from v2: Don't inject broadcast interrupts directly Changes from v1: Tried to address comments from v1, except unifying with kvm_set_irq: passing flags to it looks too ugly. Added a comment. Jan, you said you can test this? Michael S. Tsirkin (2): kvm: add kvm_set_irq_inatomic kvm: deliver msi interrupts from irq handler include/linux/kvm_host.h | 1 + virt/kvm/assigned-dev.c | 36 +++-- virt/kvm/irq_comm.c | 83 +--- 3 files changed, 98 insertions(+), 22 deletions(-) -- MST
Re: [PATCH 0/2] KVM: PPC: Support ioeventfd
On 10/16/2012 04:49 PM, Alexander Graf wrote: >> If there is a lot of prioritization and/or queuing logic, then yes. But >> what about MSI? Doesn't that have a direct path? > > Nope. Well, yes, in a certain special case where the MPIC pushes the > interrupt vector on interrupt delivery into a special register. But not > for the "normal" case. Ok. The patches are fine then, but would be good to add the PIO check. -- error compiling committee.c: too many arguments to function
Re: [PATCH v5 2/6] KVM: MMU: remove mmu_is_invalid
On 10/16/2012 02:08 PM, Xiao Guangrong wrote: > Remove mmu_is_invalid and use is_invalid_pfn instead Applied 2-5 to next; 6 depends on 1, so will wait until it is merged upstream. -- error compiling committee.c: too many arguments to function
Re: [PATCH] Added call parameter to track whether invocation originated with guest or elsewhere
On 10/17/2012 04:09 PM, Marcelo Tosatti wrote: > On Wed, Oct 17, 2012 at 12:35:33PM +0200, Avi Kivity wrote: >> On 10/17/2012 04:10 AM, Will Auld wrote: >> > Signed-off-by: Will Auld >> > --- >> > >> > Resending to full list >> > >> > Marcelo, >> > >> > This patch is what I believe you ask for as foundational for later >> > patches to address IA32_TSC_ADJUST. >> > >> >> Please write a changelog to reflect the motivation. >> >> All those bool parameters scattered all over the place aren't very >> pretty. Usually we solve this with helpers that embed the parameter >> name (kvm_set_msr() vs. kvm_set_msr_host()) but there are too many >> functions for this to work here. >> >> Marcelo, any ideas? > > Its easier to read > > kvm_x86_ops->kvm_set_msr() > kvm_x86_ops->kvm_set_msr_host() > > then > > kvm_x86_ops->kvm_set_msr(,false) > kvm_x86_ops->kvm_set_msr(,true) > > So you're right. Yes, but we have a million functions for setting MSRs. Maybe struct msr { bool host_requested; u32 index; u64 data; }; and change all the APIs to use that. -- error compiling committee.c: too many arguments to function
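A rough user-space sketch of what the struct-based calling convention suggested above might look like. Only the three fields come from the mail; the type names, the vcpu stand-in and the setter are invented for illustration and do not reflect any actual kvm API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields as proposed in the mail, with stdint types in place of u32/u64. */
struct msr_sketch {
	bool host_requested;	/* true if the write came from the host side */
	uint32_t index;
	uint64_t data;
};

/* Invented stand-in for per-vcpu state. */
struct vcpu_sketch {
	uint64_t last_value;
	unsigned int guest_writes;
};

/*
 * Hypothetical setter: one signature serves both origins, and the
 * origin travels with the request instead of as a bare bool argument.
 */
static int set_msr_sketch(struct vcpu_sketch *vcpu, const struct msr_sketch *msr)
{
	vcpu->last_value = msr->data;
	if (!msr->host_requested)
		vcpu->guest_writes++;
	return 0;
}
```

Compared with kvm_set_msr(..., true), a designated initializer such as { .host_requested = true, ... } names the flag at the call site, which seems to be the readability concern raised in the thread.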
[PATCH 07/10] UAPI: Put a comment into uapi/asm-generic/kvm_para.h and use it from arches
Make uapi/asm-generic/kvm_para.h non-empty by addition of a comment to stop the patch program from deleting it when it creates it.

Then delete empty arch-specific uapi/asm/kvm_para.h files and tell the Kbuild files to use the generic instead.

Should this perhaps instead be a #warning or #error that the facility is unsupported on this arch?

Signed-off-by: David Howells
cc: Arnd Bergmann
cc: Avi Kivity
cc: Marcelo Tosatti
cc: kvm@vger.kernel.org
---

 arch/ia64/include/uapi/asm/Kbuild     |    2 ++
 arch/ia64/include/uapi/asm/kvm_para.h |    0
 arch/s390/include/uapi/asm/Kbuild     |    2 ++
 arch/s390/include/uapi/asm/kvm_para.h |    0
 include/uapi/asm-generic/kvm_para.h   |    4 ++++
 5 files changed, 8 insertions(+)
 delete mode 100644 arch/ia64/include/uapi/asm/kvm_para.h
 delete mode 100644 arch/s390/include/uapi/asm/kvm_para.h

diff --git a/arch/ia64/include/uapi/asm/Kbuild b/arch/ia64/include/uapi/asm/Kbuild
index 30cafac..1b3f5eb 100644
--- a/arch/ia64/include/uapi/asm/Kbuild
+++ b/arch/ia64/include/uapi/asm/Kbuild
@@ -1,6 +1,8 @@
 # UAPI Header export list
 include include/uapi/asm-generic/Kbuild.asm
 
+generic-y += kvm_para.h
+
 header-y += auxvec.h
 header-y += bitsperlong.h
 header-y += break.h
diff --git a/arch/ia64/include/uapi/asm/kvm_para.h b/arch/ia64/include/uapi/asm/kvm_para.h
deleted file mode 100644
index e69de29..0000000
diff --git a/arch/s390/include/uapi/asm/Kbuild b/arch/s390/include/uapi/asm/Kbuild
index 7bf68ff..59b67ed 100644
--- a/arch/s390/include/uapi/asm/Kbuild
+++ b/arch/s390/include/uapi/asm/Kbuild
@@ -1,6 +1,8 @@
 # UAPI Header export list
 include include/uapi/asm-generic/Kbuild.asm
 
+generic-y += kvm_para.h
+
 header-y += auxvec.h
 header-y += bitsperlong.h
 header-y += byteorder.h
diff --git a/arch/s390/include/uapi/asm/kvm_para.h b/arch/s390/include/uapi/asm/kvm_para.h
deleted file mode 100644
index e69de29..0000000
diff --git a/include/uapi/asm-generic/kvm_para.h b/include/uapi/asm-generic/kvm_para.h
index e69de29..486f0af 100644
--- a/include/uapi/asm-generic/kvm_para.h
+++ b/include/uapi/asm-generic/kvm_para.h
@@ -0,0 +1,4 @@
+/*
+ * There isn't anything here, but the file must not be empty or patch
+ * will delete it.
+ */
Re: [PATCH] Added call parameter to track whether invocation originated with guest or elsewhere
On Wed, Oct 17, 2012 at 12:35:33PM +0200, Avi Kivity wrote: > On 10/17/2012 04:10 AM, Will Auld wrote: > > Signed-off-by: Will Auld > > --- > > > > Resending to full list > > > > Marcelo, > > > > This patch is what I believe you ask for as foundational for later > > patches to address IA32_TSC_ADJUST. > > > > Please write a changelog to reflect the motivation. > > All those bool parameters scattered all over the place aren't very > pretty. Usually we solve this with helpers that embed the parameter > name (kvm_set_msr() vs. kvm_set_msr_host()) but there are too many > functions for this to work here. > > Marcelo, any ideas? It's easier to read kvm_x86_ops->kvm_set_msr() kvm_x86_ops->kvm_set_msr_host() than kvm_x86_ops->kvm_set_msr(,false) kvm_x86_ops->kvm_set_msr(,true) So you're right.
Re: [PATCH 07/10] UAPI: Put a comment into uapi/asm-generic/kvm_para.h and use it from arches
On Wednesday 17 October 2012, David Howells wrote: > Make uapi/asm-generic/kvm_para.h non-empty by addition of a comment to stop > the patch program from deleting it when it creates it. > > Then delete empty arch-specific uapi/asm/kvm_para.h files and tell the Kbuild > files to use the generic instead. > > Should this perhaps instead be a #warning or #error that the facility is > unsupported on this arch? Just an empty file is fine by me, but an #error also sounds reasonable if we want users to be able to write autoconf tests for it. > Signed-off-by: David Howells > cc: Arnd Bergmann > cc: Avi Kivity > cc: Marcelo Tosatti > cc: kvm@vger.kernel.org Acked-by: Arnd Bergmann
Re: [PATCH v5 1/6] KVM: MMU: fix release noslot pfn
On 10/16/2012 02:07 PM, Xiao Guangrong wrote: > We can not directly call kvm_release_pfn_clean to release the pfn > since we can meet noslot pfn which is used to cache mmio info into > spte Applied to master for 3.7, 3.6, thanks. -- error compiling committee.c: too many arguments to function
Re: I/O errors in guest OS after repeated migration
On Tuesday, 16 October 2012, 12:44:27, Brian Jackson wrote: > On Tuesday, October 16, 2012 11:33:44 AM Guido Winkelmann wrote: [...] > > The commandline, as generated by libvirtd, looks like this: > > > > LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin > > QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.15 -enable-kvm -m 1024 > > -smp 1,sockets=1,cores=1,threads=1 -name migratetest2 -uuid > > ddbf11e9-387e-902b-4849-8c3067dc42a2 -nodefconfig -nodefaults -chardev > > socket,id=charmonitor,path=/var/lib/libvirt/qemu/migratetest2.monitor,server,nowait > > -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc > > -no-reboot -no-shutdown -device > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive > > file=/data/migratetest2_system,if=none,id=drive-virtio-disk0,format=qcow2,cache=none > > -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 > > -drive file=/data/migratetest2_data-1,if=none,id=drive-virtio-disk1,format=qcow2,cache=none > > -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 > > -netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=28 > > -device virtio-net-pci,netdev=hostnet0,id=net0,mac=02:00:00:00:00:0c,bus=pci.0,addr=0x3 > > -vnc 127.0.0.1:2,password -k de -vga cirrus -incoming tcp:0.0.0.0:49153 > > -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 > > I see qcow2 in there. Live migration of qcow2 was a new feature in 1.0. Have > you tried other formats or different qemu/kvm versions? Are you sure about that? Because I'm fairly certain I have been using live migration since at least 0.14, if not 0.13, and I have always been using qcow2 as the image format for the disks... I can still try with other image formats, though. Guido
Re: KVM on NFS
On 10/17/2012 01:04 PM, Andrew Holway wrote: > > >> O_DIRECT is good. I/O schedulers don't affect NFS so no need to tune >> anything on the host. You might experiment with switching to the >> deadline scheduler in the guest. > > Ill give it a go. Any ideas how I should be tuning my NFS? Not really. The defaults should work well enough. -- error compiling committee.c: too many arguments to function
Re: KVM on NFS
> O_DIRECT is good. I/O schedulers don't affect NFS so no need to tune > anything on the host. You might experiment with switching to the > deadline scheduler in the guest. I'll give it a go. Any ideas how I should be tuning my NFS?
Re: KVM on NFS
On 10/17/2012 11:20 AM, Andrew Holway wrote: > Hello, > > I am testing KVM on an Oracle NFS box that I have. > > Does the list have any advice on best practice? I remember reading that there > is stuff you can do with I/O schedulers and stuff to make it more efficient. > > My VMs will primarily be running mysql databases. I am currently using > o_direct. > O_DIRECT is good. I/O schedulers don't affect NFS so no need to tune anything on the host. You might experiment with switching to the deadline scheduler in the guest. -- error compiling committee.c: too many arguments to function
Re: [PATCH] Added call parameter to track whether invocation originated with guest or elsewhere
On 10/17/2012 04:10 AM, Will Auld wrote: > Signed-off-by: Will Auld > --- > > Resending to full list > > Marcelo, > > This patch is what I believe you ask for as foundational for later > patches to address IA32_TSC_ADJUST. > Please write a changelog to reflect the motivation. All those bool parameters scattered all over the place aren't very pretty. Usually we solve this with helpers that embed the parameter name (kvm_set_msr() vs. kvm_set_msr_host()) but there are too many functions for this to work here. Marcelo, any ideas? > Thanks, > > Will > > arch/x86/include/asm/kvm_host.h | 8 > arch/x86/kvm/svm.c | 18 ++ > arch/x86/kvm/vmx.c | 18 ++ > arch/x86/kvm/x86.c | 18 ++ > arch/x86/kvm/x86.h | 2 +- > 5 files changed, 35 insertions(+), 29 deletions(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 09155d6..c06f0d1 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -621,7 +621,7 @@ struct kvm_x86_ops { > void (*set_guest_debug)(struct kvm_vcpu *vcpu, > struct kvm_guest_debug *dbg); > int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata); > - int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); > + int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data, bool > guest_initiated); > u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg); > void (*get_segment)(struct kvm_vcpu *vcpu, > struct kvm_segment *var, int seg); > @@ -684,7 +684,7 @@ struct kvm_x86_ops { > bool (*has_wbinvd_exit)(void); > > void (*set_tsc_khz)(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool > scale); > - void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset); > + void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset, bool > guest_initiated); > > u64 (*compute_tsc_offset)(struct kvm_vcpu *vcpu, u64 target_tsc); > u64 (*read_l1_tsc)(struct kvm_vcpu *vcpu); > @@ -772,7 +772,7 @@ static inline int emulate_instruction(struct kvm_vcpu > *vcpu, > > void kvm_enable_efer_bits(u64); > int 
kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data); > -int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); > +int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data, bool > guest_initiated); > > struct x86_emulate_ctxt; > > @@ -799,7 +799,7 @@ void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, > int *l); > int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr); > > int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata); > -int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data); > +int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool > guest_initiated); > > unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu); > void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags); > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index baead95..424be27 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -1012,7 +1012,8 @@ static void svm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 > user_tsc_khz, bool scale) > svm->tsc_ratio = ratio; > } > > -static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset) > +static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset, > + bool guest_initiated) > { > struct vcpu_svm *svm = to_svm(vcpu); > u64 g_tsc_offset = 0; > @@ -1255,7 +1256,7 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm > *kvm, unsigned int id) > svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT; > svm->asid_generation = 0; > init_vmcb(svm); > - kvm_write_tsc(&svm->vcpu, 0); > + kvm_write_tsc(&svm->vcpu, 0, false /*Not Guest Initiated*/); > > err = fx_init(&svm->vcpu); > if (err) > @@ -3147,13 +3148,14 @@ static int svm_set_vm_cr(struct kvm_vcpu *vcpu, u64 > data) > return 0; > } > > -static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data) > +static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data, > + bool guest_initiated) > { > struct vcpu_svm *svm = to_svm(vcpu); > > switch (ecx) { > case MSR_IA32_TSC: > - kvm_write_tsc(vcpu, 
data); > + kvm_write_tsc(vcpu, data, guest_initiated); > break; > case MSR_STAR: > svm->vmcb->save.star = data; > @@ -3208,12 +3210,12 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, > unsigned ecx, u64 data) > vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", > ecx, data); > break; > default: > - return kvm_set_msr_common(vcpu, ecx, data); > + return kvm_set_msr_common(vcpu, ecx, data, guest_initiated); > } > return 0
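The alternative mentioned in the reply above, helpers that embed the parameter name instead of a bare bool at every call site, can be sketched in plain C outside the kernel. Everything here (struct toy_vcpu, toy_set_msr_common() and the _guest/_host wrappers) is hypothetical and only illustrates the pattern; it is not the KVM code, and it does not address the concern that there are too many functions for this to scale.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kvm_vcpu: remembers the last MSR write. */
struct toy_vcpu {
	uint32_t last_msr;
	uint64_t last_data;
	bool last_guest_initiated;
};

/* Single worker that still takes the flag, as in the patch under review. */
static int toy_set_msr_common(struct toy_vcpu *vcpu, uint32_t msr_index,
			      uint64_t data, bool guest_initiated)
{
	vcpu->last_msr = msr_index;
	vcpu->last_data = data;
	vcpu->last_guest_initiated = guest_initiated;
	return 0;
}

/* Write that originates from the guest (for example a WRMSR exit). */
static inline int toy_set_msr_guest(struct toy_vcpu *vcpu, uint32_t msr_index,
				    uint64_t data)
{
	return toy_set_msr_common(vcpu, msr_index, data, true);
}

/* Write that originates from the host (for example userspace restoring state). */
static inline int toy_set_msr_host(struct toy_vcpu *vcpu, uint32_t msr_index,
				   uint64_t data)
{
	return toy_set_msr_common(vcpu, msr_index, data, false);
}
```

Call sites then read toy_set_msr_host(vcpu, 0x10, value) rather than passing an unnamed false, which is the readability gain the reply is after.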
Re: [PATCH 5/5] KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT
On 10/16/2012 11:52 PM, Paul Mackerras wrote: > On Tue, Oct 16, 2012 at 03:06:33PM +0200, Avi Kivity wrote: >> On 10/16/2012 01:58 PM, Paul Mackerras wrote: >> > On Tue, Oct 16, 2012 at 12:06:58PM +0200, Avi Kivity wrote: >> >> Does/should the fd support O_NONBLOCK and poll? (=waiting for an entry >> >> to change). >> > >> > No. >> >> This forces userspace to dedicate a thread for the HPT. > > Why? Reads never block in any case. Ok. This parallels KVM_GET_DIRTY_LOG. >> >> I meant the internal data structure that holds HPT entries. > > Oh, that's just an array, and userspace already knows how big it is. > >> I guess I don't understand the index. Do we expect changes to be in >> contiguous ranges? And invalid entries to be contiguous as well? That >> doesn't fit with how hash tables work. Does the index represent the >> position of the entry within the table, or something else? > > The index is just the position in the array. Typically, in each group > of 8 it will tend to be the low-numbered ones that are valid, since > creating an entry usually uses the first empty slot. So I expect that > on the first pass, most of the records will represent 8 HPTEs. On > subsequent passes, probably most records will represent a single HPTE. So it's a form of RLE compression. Ok. >> >> 16MiB is transferred in ~0.15 sec on GbE, much faster with 10GbE. Does >> it warrant a live migration protocol? > > The qemu people I talked to seemed to think so. > >> > Because it is a hash table, updates tend to be scattered throughout >> > the whole table, which is another reason why per-page dirty tracking >> > and updates would be pretty inefficient. >> >> This suggests a stream format that includes the index in every entry. > > That would amount to dropping the n_valid and n_invalid fields from > the current header format. 
That would be less efficient for the > initial pass (assuming we achieve an average n_valid of at least 2 on > the initial pass), and probably less efficient for the incremental > updates, since a newly-invalidated entry would have to be represented > as 16 zero bytes rather than just an 8-byte header with n_valid=0 and > n_invalid=1. I'm assuming here that the initial pass would omit > invalid entries. I agree. But let's have some measurements to make sure. > >> > >> > As for the change rate, it depends on the application of course, but >> > basically every time the guest changes a PTE in its Linux page tables >> > we do the corresponding change to the corresponding HPT entry, so the >> > rate can be quite high. Workloads that do a lot of fork, exit, mmap, >> > exec, etc. have a high rate of HPT updates. >> >> If the rate is high enough, then there's no point in a live update. > > True, but doesn't that argument apply to memory pages as well? In some cases it does. The question is what happens in practice. If you migrate a kernel build, how many entries are sent in the guest stopped phase? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
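To put rough numbers on the size argument quoted above, here is a minimal sketch that counts the bytes a run-length stream would need. The 16-byte HPTE and the 8-byte header carrying n_valid and n_invalid come from the thread; the field widths in struct toy_hpt_header, the function names, and the rule that one record holds a run of valid entries followed by a count of invalid ones are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define HPTE_BYTES   16u /* one HPT entry, per the thread */
#define HEADER_BYTES 8u  /* one record header, per the thread */

/* Hypothetical layout; only the 8-byte total size is taken from the thread. */
struct toy_hpt_header {
	uint32_t index;     /* position of the first entry in the array */
	uint16_t n_valid;   /* valid entries that follow the header */
	uint16_t n_invalid; /* invalid entries implied after them, no payload */
};

/*
 * Bytes needed to describe entries [0, n) when every entry must be sent.
 * valid[i] is nonzero for a valid HPTE. Each record covers a run of valid
 * entries followed by a run of invalid ones.
 */
static size_t toy_rle_stream_bytes(const uint8_t *valid, size_t n)
{
	size_t bytes = 0;
	size_t i = 0;

	while (i < n) {
		size_t n_valid = 0, n_invalid = 0;

		while (i < n && valid[i] && n_valid < UINT16_MAX) {
			n_valid++;
			i++;
		}
		while (i < n && !valid[i] && n_invalid < UINT16_MAX) {
			n_invalid++;
			i++;
		}
		bytes += HEADER_BYTES + n_valid * HPTE_BYTES;
	}
	return bytes;
}

/* Bytes for a format that tags each of n entries with its own index. */
static size_t toy_indexed_stream_bytes(size_t n, size_t index_bytes)
{
	return n * (index_bytes + HPTE_BYTES);
}
```

Under these assumptions a group of eight with three valid entries costs one header plus three HPTEs, and a single newly invalidated entry costs just the header, against at least 16 bytes in the per-entry format. Real measurements, as requested in the reply, would still be needed.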
Re: [PATCH 5/5] KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT
On 10/16/2012 10:03 PM, Anthony Liguori wrote: >> >> This forces userspace to dedicate a thread for the HPT. > > If no changes are available, does read return a size > 0? I don't think > it's necessary to support polling. The kernel should always be able to > respond to userspace here. The only catch is whether to return !0 read > sizes when there are no changes. > > At any case, I can't see why a dedicated thread is needed. QEMU is > going to poll HPT based on how fast we can send data over the wire. That means spinning if we can send the data faster than we dirty it. But we do that anyway for memory. -- error compiling committee.c: too many arguments to function
Re: Secure migration of LVM based guests over WAN
On 16.10.2012 12:10, Avi Kivity wrote: > On 10/16/2012 11:48 AM, Lukas Laukamp wrote: >> On 16.10.2012 11:40, Avi Kivity wrote: >>> On 10/16/2012 11:12 AM, Lukas Laukamp wrote: >>>> Hey all, I have a question about a solution for migrate LVM based guests directly over the network. So the situation: Two KVM hosts with libvirt, multiple LVM based guests Want to do: Migrate a LVM based guest directly to the other host over an secure connection I know that migration is possible when the VM disks are stored on an NFS, GFS2 filer/cluster etc. So would it be possible to do an offline migration directly with netcat or something like that? >>> If all you need is offline, you can use scp to copy each volume to the destination volume. Make sure the guests are shut down when you do that. It is also possible to do a live migration, but unless the destination and source are in the same IP subnet, the guests are going to lose connectivity. >> Hello Avi, so can I simply copy an logical volume to the path of the volume group with scp? > Yes. Best to enable compression to avoid sending zero blocks. >> For the live migration theme, it would be no problem when the guests looses connectivity, how could be done a live migration? > See the -b option to the migrate command. I will read up a little on live migration. Best Regards
Re: [PATCH 0/3] x86: clear vmcss on all cpus when doing kdump if necessary
On 10/17/2012 04:28 AM, Zhang Yanfei wrote: > On 2012-10-15 23:43, Avi Kivity wrote: >> On 10/12/2012 08:40 AM, Zhang Yanfei wrote: >>> Currently, kdump just makes all the logical processors leave VMX operation >>> by >>> executing VMXOFF instruction, so any VMCSs active on the logical processors >>> may >>> be corrupted. But, sometimes, we need the VMCSs to debug guest images >>> contained >>> in the host vmcore. To prevent the corruption, we should VMCLEAR the VMCSs >>> before >>> executing the VMXOFF instruction. >> >> How have you verified that VMXOFF doesn't flush cached VMCSs already? >> > > I tried some tests, for example, I made copies for every vmcs, and in the > kdump > path, I backed up all the loaded vmcs into the copies before vmxoff. > After generating the vmcore, I retrieve the vmcss and their copies, and > compare them, > no differences. > > Another test is using VMCLEAR to clear all the loaded vmcs before VMXOFF, > and compare the vmcss and their copies, there are indeed differences between > the > vmcs and its copy. > > I know the tests may be not so convincing, for example, I used memcpy to back > up > the vmcss and it is an ordinary memory operation. But to ensure the > non-corruption > of the vmcss in the vmcore, I think we should VMCLEAR the vmcss before VMXOFF > just > as the Intel spec says. Sorry, I was unclear -- I was referring to the spec, I wasn't sure whether VMXOFF is defined to flush VMCSes or whether it just invalidates on-chip caches so that it won't flush them out in the future, corrupting memory. We don't want to depend on actual behaviour as it may change with future versions. Copying some Intel folk, maybe they can clarify it. > >>> >>> The patch set provides an alternative way to clear VMCSs related to guests >>> on all cpus when host is doing kdump. >>> >> >> I'm not sure the sysctl is really necessary. The only reason to turn if >> off is if the corruption is so severe that the loaded vmcs list itself >> causes a crash. 
I think it should be rare enough that we can do it >> unconditionally. >> > > You mean not using sysctl and just let VMCLEAR-VMCSS be a default behaviour? > If so, > I agree with you. Yes, that's what I meant. -- error compiling committee.c: too many arguments to function
Re: [Patch]KVM: enabling per domain PLE
On 10/17/2012 10:02 AM, Hu, Xuekun wrote: >> >> The problem with this is that it requires an administrator to understand the >> workload, not only of the guest, but also of other guests on the machine. >> With low overcommit, a high PLE window reduces unneeded exits, but with >> high overcommit we need those exits to reduce spinning. >> >> In addition, most kvm hosts don't have an administrator. They are controlled >> by a management system, which means we'll need some algorithm in >> userspace to control the PLE window. Taking the two together, we need a >> dynamic (for changing workloads) algorithm. >> >> There are threads discussing this dynamic algorithm, we are making slow >> progress because it's such a difficult problem, but I think this is much more >> useful than anything requiring user intervention. > > Avi, agreed that dynamic adaptive ple should be the best solution. However > currently it is a difficult problem like you said. Our solution just gives > user > a choice who know how to set the two PLE values. So the solution is a > compromise > solution, which should be better than nothing, for now? :-) Let's see how the PLE thread works out. Yes the patches give the user control, but we need to make sure the user knows how to control it (in fact your patch doesn't even update the documentation). Just throwing out a new ioctl, even if it is documented, doesn't mean that userspace will begin to use it, or that users will exploit it. Do you have a specific use case in mind? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v3 06/19] Implement "-dimm" command line option
On 10/17/2012 11:19 AM, Vasilis Liaskovitis wrote: >> >> I don't think so, but probably there's a limit of DIMMs that real >> controllers have, something like 8 max. > > In the case of i440fx specifically, do you mean that we should model the DRB > (Dram row boundary registers in section 3.2.19 of the i440fx spec) ? > > The i440fx DRB registers only supports up to 8 DRAM rows (let's say 1 row > maps 1-1 to a DimmDevice for this discussion) and only supports up to 2GB of > memory afaict (bit 31 and above is ignored). > > I 'd rather not model this part of the i440fx - having only 8 DIMMs seems too > restrictive. The rest of the patchset supports up to 255 DIMMs so it would be > a > waste imho to model an old pc memory controller that only supports 8 DIMMs. > > There was also an old discussion about i440fx modeling here: > https://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg02705.html > the general direction was that i440fx is too old and we don't want to > precisely > emulate the DRB registers, since they lack flexibility. > > Possible solutions: > > 1) is there a newer and more flexible chipset that we could model? Look for q35 on this list. > > 2) model and document ^--- the critical bit > a generic (non-existent) i440fx that would support more > and larger DIMMs. E.g. support 255 DIMMs. If we want to use a description > similar to the i440fx DRB registers, the registers would take up a lot of > space. > In i440fx there is one 8-bit DRB register per DIMM, and DRB[i] describes how > many 8MB chunks are contained in DIMMs 0...i. So, the register values are > cumulative (and total described memory cannot exceed 256x8MB = 2GB) Our i440fx has already been extended by support for pci and cpu hotplug, and I see no reason not to extend it for memory. We can allocate extra mmio space for registers if needed. Usually I'm against this sort of thing, but in this case we don't have much choice. 
> > We could for example model: > - an 8-bit non-cumulative register for each DIMM, denoting how many > 128MB chunks it contains. This allowes 32GB for each DIMM, and with 255 DIMMs > we > describe a bit less than 8TB. These registers require 255 bytes. > - a 16-bit cumulative register for each DIMM again for 128MB chunks. This > allows > us to describe 8TB of memory (but the registers take up double the space, > because > they describe cumulative memory amounts) There is no reason to save space. Why not have two 64-bit registers per DIMM, one describing the size and the other the base address, both in bytes? Use a few low order bits for control. > > 3) let everything be handled/abstracted by dimmbus - the chipset DRB modelling > is not done (at least for i440fx, other machines could). This is the least > precise > in terms of emulation. On the other hand, if we are not really trying to > emulate > the real (too restrictive) hardware, does it matter? We could emulate base memory using the chipset, and extra memory using the scheme above. This allows guests that are tied to the chipset to work, and guests that have more awareness (seabios) to use the extra features. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
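The suggestion above of two 64-bit registers per DIMM, with a few low-order bits used for control, can be sketched as follows. The number of control bits (12, which assumes sizes and base addresses are 4 KiB aligned), the enable bit, and all names are assumptions for illustration; nothing here is an existing QEMU interface.

```c
#include <stdint.h>

/* Assumption: values are 4 KiB aligned, so the low 12 bits are free. */
#define TOY_DIMM_CTRL_BITS   12u
#define TOY_DIMM_CTRL_MASK   ((UINT64_C(1) << TOY_DIMM_CTRL_BITS) - 1)
#define TOY_DIMM_CTRL_ENABLE UINT64_C(1) /* hypothetical control flag */

/* Hypothetical per-DIMM register pair: size and base address, in bytes. */
struct toy_dimm_regs {
	uint64_t size;
	uint64_t base;
};

/* Combine a byte value (size or base) with control bits in one register. */
static inline uint64_t toy_dimm_reg_pack(uint64_t bytes, uint64_t ctrl)
{
	return (bytes & ~TOY_DIMM_CTRL_MASK) | (ctrl & TOY_DIMM_CTRL_MASK);
}

/* Extract the byte value, dropping the control bits. */
static inline uint64_t toy_dimm_reg_bytes(uint64_t reg)
{
	return reg & ~TOY_DIMM_CTRL_MASK;
}

/* Extract only the control bits. */
static inline uint64_t toy_dimm_reg_ctrl(uint64_t reg)
{
	return reg & TOY_DIMM_CTRL_MASK;
}
```

With byte-granular 64-bit values there is no chunk size to pick and no cumulative encoding to decode, which is the simplification being argued for.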
Re: KVM ept flush
On 10/16/2012 08:50 PM, Rohan Sharma wrote: > Thanks for the reply. > I have one more question. > If I do munmap of the RAM allocated in qemu, > will the changes be reflected in KVM Ept. Yes. Those changes will be reflected. See kvm_mmu_notifier_invalidate_page(), and related. > I guess there is some mmu notifier which ensures that entries of EPT > are synced with the host entries. > > On Tue, Oct 16, 2012 at 8:27 PM, Avi Kivity wrote: >> On 10/16/2012 01:57 PM, Rohan Sharma wrote: >>> Is there a way to flush ept entries in qemu-kvm. >> >> No. >> >> >> -- >> error compiling committee.c: too many arguments to function > -- error compiling committee.c: too many arguments to function
KVM on NFS
Hello, I am testing KVM on an Oracle NFS box that I have. Does the list have any advice on best practice? I remember reading that there are things you can do with I/O schedulers and the like to make it more efficient. My VMs will primarily be running MySQL databases. I am currently using O_DIRECT. Thanks, Andrew
Re: [RFC PATCH v3 06/19] Implement "-dimm" command line option
On Sat, Oct 13, 2012 at 08:57:19AM +, Blue Swirl wrote: > On Tue, Oct 9, 2012 at 5:04 PM, Vasilis Liaskovitis > wrote: > >> snip > >> Maybe even the dimmbus device shouldn't exist by itself after all, or > >> it should be pretty much invisible to users. On real HW, the memory > >> controller or south bridge handles the memory. For i440fx, it's part > >> of the same chipset. So I think we should just add qdev properties to > >> i440fx to specify the sizes, nodes etc. Then i440fx should create the > >> dimmbus device unconditionally using the properties. The default > >> properties should create a sane configuration, otherwise -global > >> i440fx.dimm_size=512M etc. could be used. Then the bus would be > >> populated as before or with device_add. > > > > hmm the problem with using only i440fx properties, is that size/nodes look > > dimm specific to me, not chipset-memcontroller specific. Unless we only > > allow > > uniform size dimms. Is it possible to have a dynamic list of sizes/nodes > > pairs as > > properties of a qdev device? > > I don't think so, but probably there's a limit of DIMMs that real > controllers have, something like 8 max. In the case of i440fx specifically, do you mean that we should model the DRB (Dram row boundary registers in section 3.2.19 of the i440fx spec) ? The i440fx DRB registers only supports up to 8 DRAM rows (let's say 1 row maps 1-1 to a DimmDevice for this discussion) and only supports up to 2GB of memory afaict (bit 31 and above is ignored). I 'd rather not model this part of the i440fx - having only 8 DIMMs seems too restrictive. The rest of the patchset supports up to 255 DIMMs so it would be a waste imho to model an old pc memory controller that only supports 8 DIMMs. 
There was also an old discussion about i440fx modeling here: https://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg02705.html The general direction was that i440fx is too old and we don't want to precisely emulate the DRB registers, since they lack flexibility. Possible solutions: 1) is there a newer and more flexible chipset that we could model? 2) model and document a generic (non-existent) i440fx that would support more and larger DIMMs. E.g. support 255 DIMMs. If we want to use a description similar to the i440fx DRB registers, the registers would take up a lot of space. In i440fx there is one 8-bit DRB register per DIMM, and DRB[i] describes how many 8MB chunks are contained in DIMMs 0...i. So, the register values are cumulative (and total described memory cannot exceed 256x8MB = 2GB). We could for example model: - an 8-bit non-cumulative register for each DIMM, denoting how many 128MB chunks it contains. This allows 32GB for each DIMM, and with 255 DIMMs we describe a bit less than 8TB. These registers require 255 bytes. - a 16-bit cumulative register for each DIMM again for 128MB chunks. This allows us to describe 8TB of memory (but the registers take up double the space, because they describe cumulative memory amounts) 3) let everything be handled/abstracted by dimmbus - the chipset DRB modelling is not done (at least for i440fx, other machines could). This is the least precise in terms of emulation. On the other hand, if we are not really trying to emulate the real (too restrictive) hardware, does it matter? thanks, - Vasilis
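The cumulative DRB encoding described in this message can be made concrete with a small decoder. This is only a sketch of the arithmetic as stated here (one 8-bit register per row, each counting 8MB chunks in rows 0..i); the function and macro names are hypothetical and it does not model the real chipset registers in any other respect.

```c
#include <stdint.h>

#define TOY_DRB_ROWS     8  /* i440fx describes up to 8 DRAM rows */
#define TOY_DRB_CHUNK_MB 8u /* each DRB unit stands for 8MB */

/*
 * Size in MB of row i. Because DRB[i] is cumulative, the size of a single
 * row is the difference to the previous register.
 */
static inline uint32_t toy_drb_row_mb(const uint8_t drb[TOY_DRB_ROWS], int i)
{
	uint8_t prev = (i == 0) ? 0 : drb[i - 1];

	return (uint32_t)(drb[i] - prev) * TOY_DRB_CHUNK_MB;
}

/* Total described memory in MB: the last cumulative register. */
static inline uint32_t toy_drb_total_mb(const uint8_t drb[TOY_DRB_ROWS])
{
	return (uint32_t)drb[TOY_DRB_ROWS - 1] * TOY_DRB_CHUNK_MB;
}
```

For example, registers {16, 48, 48, ...} describe a 128MB row, a 256MB row and six empty rows, 384MB in total. The 8-bit width is what caps the total near 2GB.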
Re: [3.5.0 BUG] vmx_handle_exit: unexpected, valid vectoring info (0x80000b0e)
On Wed, Oct 17, 2012 at 03:04:49PM +0800, Xiao Guangrong wrote: > On 10/17/2012 02:43 PM, Fengguang Wu wrote: > > On Wed, Oct 17, 2012 at 02:26:22PM +0800, Xiao Guangrong wrote: > >> On 09/14/2012 01:57 PM, Xiao Guangrong wrote: > >>> On 09/12/2012 04:15 PM, Avi Kivity wrote: > On 09/12/2012 07:40 AM, Fengguang Wu wrote: > > Hi, > > > > 3 of my test boxes running v3.5 kernel become unaccessible and I find > > two of them kept emitting this dmesg: > > > > vmx_handle_exit: unexpected, valid vectoring info (0x8b0e) and exit > > reason is 0x31 > > > > The other one has froze and the above lines are the last dmesg. > > Any ideas? > > First, that printk should be rate-limited. > > Second, we should add EXIT_REASON_EPT_MISCONFIG (0x31) to > > if ((vectoring_info & VECTORING_INFO_VALID_MASK) && > (exit_reason != EXIT_REASON_EXCEPTION_NMI && > exit_reason != EXIT_REASON_EPT_VIOLATION && > exit_reason != EXIT_REASON_TASK_SWITCH)) > printk(KERN_WARNING "%s: unexpected, valid vectoring info " > "(0x%x) and exit reason is 0x%x\n", > __func__, vectoring_info, exit_reason); > > since it's easily caused by the guest. > >>> > >>> Yes, i will do these. > >>> > > Third, it's really unexpected. It seems the guest was attempting to > deliver a page fault exception (0x0e) but encountered an mmio page > during delivery (in the IDT, TSS, stack, or page tables). Is this > reproducible? If so it's easy to patch kvm to halt in that case and > allow examining the guest via qemu. > > >>> > >>> Have no idea yet why the box was frozen under this case, will try to > >>> write a test case, > >>> hope it can help me to find the reason out. > >>> > >> > >> Still did not know why linux kernel triggered it. I have posted > >> a patchset to report an internal error for this case, hoping > >> Fengguang can reproduce it after the patchset and Qemu's dump > >> can help us to find the reason out. > >> > >> I will keep working on it. > > > > Thanks! Shall I run some patched kernel, or just 3.6.0? 
> > The patchset is under review. Can be found at: > https://lkml.org/lkml/2012/10/17/31 Thanks, I'll try it. Thanks, Fengguang
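The second suggestion quoted in this thread, adding EXIT_REASON_EPT_MISCONFIG (0x31) to the list of exit reasons for which valid vectoring info is not surprising, can be sketched as a standalone predicate. The function names are hypothetical. The numeric constants are the values I believe the kernel's VMX headers use; only 0x31 is confirmed by the thread itself, so treat the others as assumptions. The printk rate limiting from the first suggestion is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define VECTORING_INFO_VECTOR_MASK 0x000000ffu
#define VECTORING_INFO_VALID_MASK  0x80000000u

#define EXIT_REASON_EXCEPTION_NMI 0u
#define EXIT_REASON_TASK_SWITCH   9u
#define EXIT_REASON_EPT_VIOLATION 48u
#define EXIT_REASON_EPT_MISCONFIG 49u /* 0x31, as in the reported dmesg */

/* True when the combination would still deserve the warning. */
static inline bool toy_unexpected_vectoring(uint32_t vectoring_info,
					    uint32_t exit_reason)
{
	return (vectoring_info & VECTORING_INFO_VALID_MASK) &&
	       exit_reason != EXIT_REASON_EXCEPTION_NMI &&
	       exit_reason != EXIT_REASON_EPT_VIOLATION &&
	       exit_reason != EXIT_REASON_EPT_MISCONFIG &&
	       exit_reason != EXIT_REASON_TASK_SWITCH;
}

/* Vector number of the event that was being delivered. */
static inline uint8_t toy_vectoring_vector(uint32_t vectoring_info)
{
	return (uint8_t)(vectoring_info & VECTORING_INFO_VECTOR_MASK);
}
```

For the value in the subject line, 0x80000b0e, the valid bit is set and the vector is 0x0e (page fault), which matches the analysis quoted above. With exit reason 0x31 the predicate returns false, so the warning would no longer fire.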
Re: [PATCH RFC V2 0/5] Separate consigned (expected steal) from steal time.
On 10/17/2012 06:23 AM, Michael Wolf wrote: > In the case of where you have a system that is running in a > capped or overcommitted environment the user may see steal time > being reported in accounting tools such as top or vmstat. This can > cause confusion for the end user. To ease the confusion this patch set > adds the idea of consigned (expected steal) time. The host will separate > the consigned time from the steal time. The consignment limit passed to the > host will be the amount of steal time expected within a fixed period of > time. Any other steal time accruing during that period will show as the > traditional steal time. > > TODO: > * Change native_clock to take params and not return a value > * Change update_rq_clock_task > > Changes from V1: > * Removed the steal time allowed percentage from the guest > * Moved the separation of consigned (expected steal) and steal time to the > host. > * No longer include a sysctl interface. > You are showing this in the guest somewhere, but tools like top will still not show it. So for quite a while, it achieves nothing. Of course this is a barrier that any new statistic has to go through. So while annoying, this is per-se ultimately not a blocker. What I still fail to see, is how this is useful information to be shown in the guest. Honestly, if I'm in a guest VM or container, any time during which I am not running is time I lost. It doesn't matter if this was expected or not. This still seems to me as a host-side problem, to be solved entirely by tooling. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
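The split described in the quoted cover letter, where steal time up to a consignment limit counts as consigned and anything beyond it remains ordinary steal time, amounts to a clamp per accounting period. The sketch below is an assumption-laden illustration: the struct, the function name and the nanosecond units are hypothetical and are not taken from the patch set.

```c
#include <stdint.h>

/* Hypothetical result of splitting one period's stolen time. */
struct toy_steal_split {
	uint64_t consigned_ns; /* expected steal, up to the limit */
	uint64_t steal_ns;     /* anything beyond the limit */
};

/*
 * Split the time stolen during one period. consign_limit_ns is the amount
 * of steal time expected within that period.
 */
static inline struct toy_steal_split
toy_split_steal(uint64_t stolen_ns, uint64_t consign_limit_ns)
{
	struct toy_steal_split s;

	s.consigned_ns = stolen_ns < consign_limit_ns ? stolen_ns
						      : consign_limit_ns;
	s.steal_ns = stolen_ns - s.consigned_ns;
	return s;
}
```

Whichever side does the computation, the guest still lost the sum of both fields, which is the objection raised in the reply.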
RE: [Patch]KVM: enabling per domain PLE
> > The problem with this is that it requires an administrator to understand the > workload, not only of the guest, but also of other guests on the machine. > With low overcommit, a high PLE window reduces unneeded exits, but with > high overcommit we need those exits to reduce spinning. > > In addition, most kvm hosts don't have an administrator. They are controlled > by a management system, which means we'll need some algorithm in > userspace to control the PLE window. Taking the two together, we need a > dynamic (for changing workloads) algorithm. > > There are threads discussing this dynamic algorithm, we are making slow > progress because it's such a difficult problem, but I think this is much more > useful than anything requiring user intervention. Avi, agreed that a dynamic, adaptive PLE should be the best solution. However, as you said, it is currently a difficult problem. Our solution just gives a choice to users who know how to set the two PLE values. So it is a compromise, which should be better than nothing for now? :-) Your comments?
Re: KVM_MAX_VCPUS
On Wed, Oct 17, 2012 at 02:57:15AM +0000, Wei, Bing (WeiBing, MCXS-SH) wrote:
> For pCPU/core and VCPUS/logical cpu mapping, It should be 8 multiple. 254 is
> reasonable. Or something I miss?
>
I am not sure what you mean. Can you clarify?

> -----Original Message-----
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf
> Of Vinod, Chegu
> Sent: Sunday, October 14, 2012 9:43 PM
> To: Gleb Natapov
> Cc: Sasha Levin; KVM
> Subject: Re: KVM_MAX_VCPUS
>
> On 10/14/2012 2:08 AM, Gleb Natapov wrote:
> > On Sat, Oct 13, 2012 at 10:32:13PM -0400, Sasha Levin wrote:
> >> On 10/13/2012 06:29 PM, Chegu Vinod wrote:
> >>> Hello,
> >>>
> >>> Wanted to get a clarification about KVM_MAX_VCPUS(currently set to 254)
> >>> in kvm_host.h file. The kvm_vcpu *vcpus array is sized based on
> >>> KVM_MAX_VCPUS.
> >>> (i.e. a max of 254 elements in the array).
> >>>
> >>> An 8bit APIC id should allow for 256 ID's. Reserving one for Broadcast should
> >>> leave 255 ID's. Is there one more ID reserved for some other purpose ? (hence
> >>> leading to KVM_MAX_VCPUS being set to 254 and not 255).
> >> Another ID goes to the IO-APIC.
> >>
> > This is not really needed on KVM. We can enlarge KVM_MAX_VCPUS to 255.
>
> Thanks for clarification! ( We did suspect the IO-APIC...but weren't
> quite sure).
>
> Vinod
>
> > --
> > Gleb.
> > .
> >

--
Gleb.
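The ID arithmetic discussed in this thread can be written out as a short sketch. The constant and function names below are hypothetical and are not taken from kvm_host.h; only the numbers (8-bit APIC ID, one ID for broadcast, one for the IO-APIC) come from the messages above.

```c
#include <assert.h>

/* Hypothetical names; the values follow the discussion in the thread. */
enum {
    APIC_ID_BITS       = 8,
    APIC_ID_COUNT      = 1 << APIC_ID_BITS, /* 256 possible IDs */
    RESERVED_BROADCAST = 1,                 /* one ID for broadcast */
    RESERVED_IOAPIC    = 1                  /* one ID for the IO-APIC */
};

/*
 * Number of IDs left for VCPUs. Pass nonzero to also reserve an ID
 * for the IO-APIC, which Gleb notes is not really needed on KVM.
 */
static int max_vcpus(int reserve_ioapic)
{
    int reserved = RESERVED_BROADCAST;

    if (reserve_ioapic)
        reserved += RESERVED_IOAPIC;
    return APIC_ID_COUNT - reserved;
}
```

Reserving both IDs gives the current limit of 254; dropping the IO-APIC reservation gives the proposed 255.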
Re: [3.5.0 BUG] vmx_handle_exit: unexpected, valid vectoring info (0x80000b0e)
On 10/17/2012 02:43 PM, Fengguang Wu wrote:
> On Wed, Oct 17, 2012 at 02:26:22PM +0800, Xiao Guangrong wrote:
>> On 09/14/2012 01:57 PM, Xiao Guangrong wrote:
>>> On 09/12/2012 04:15 PM, Avi Kivity wrote:
>>>> On 09/12/2012 07:40 AM, Fengguang Wu wrote:
>>>>> Hi,
>>>>>
>>>>> 3 of my test boxes running v3.5 kernel become unaccessible and I find
>>>>> two of them kept emitting this dmesg:
>>>>>
>>>>> vmx_handle_exit: unexpected, valid vectoring info (0x80000b0e) and exit
>>>>> reason is 0x31
>>>>>
>>>>> The other one has froze and the above lines are the last dmesg.
>>>>> Any ideas?
>>>>
>>>> First, that printk should be rate-limited. Second, we should add
>>>> EXIT_REASON_EPT_MISCONFIG (0x31) to
>>>>
>>>>     if ((vectoring_info & VECTORING_INFO_VALID_MASK) &&
>>>>         (exit_reason != EXIT_REASON_EXCEPTION_NMI &&
>>>>          exit_reason != EXIT_REASON_EPT_VIOLATION &&
>>>>          exit_reason != EXIT_REASON_TASK_SWITCH))
>>>>             printk(KERN_WARNING "%s: unexpected, valid vectoring info "
>>>>                    "(0x%x) and exit reason is 0x%x\n",
>>>>                    __func__, vectoring_info, exit_reason);
>>>>
>>>> since it's easily caused by the guest.
>>>
>>> Yes, i will do these.
>>>
>>>> Third, it's really unexpected. It seems the guest was attempting to
>>>> deliver a page fault exception (0x0e) but encountered an mmio page
>>>> during delivery (in the IDT, TSS, stack, or page tables). Is this
>>>> reproducible? If so it's easy to patch kvm to halt in that case and
>>>> allow examining the guest via qemu.
>>>
>>> Have no idea yet why the box was frozen under this case, will try to write
>>> a test case,
>>> hope it can help me to find the reason out.
>>>
>>
>> Still did not know why linux kernel triggered it. I have posted
>> a patchset to report an internal error for this case, hoping
>> Fengguang can reproduce it after the patchset and Qemu's dump
>> can help us to find the reason out.
>>
>> I will keep working on it.
>
> Thanks! Shall I run some patched kernel, or just 3.6.0?

The patchset is under review. Can be found at: https://lkml.org/lkml/2012/10/17/31

>
> Another problem I sometimes run into is, dmesg no longer works in the
> test boxes that run lots of KVMs. It aborts with an error message:
>
> dmesg: klogctl failed: Bad address

Interesting, will fight for it. :)
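As a reading aid for the value in the subject line, here is a minimal userspace sketch that decodes the vectoring info and applies the check Avi quotes with EXIT_REASON_EPT_MISCONFIG added, as he suggests. The bit layout and exit-reason numbers are written from my understanding of the Intel SDM and the kernel's vmx.h and should be verified against those sources; the helper names (`vectoring_vector`, `vectoring_type`, `vectoring_is_unexpected`) are hypothetical, and the rate limiting he also asks for is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IDT-vectoring information field layout (assumed, per Intel SDM). */
#define VECTORING_INFO_VECTOR_MASK       0x000000ffu /* bits 7:0 */
#define VECTORING_INFO_TYPE_MASK         0x00000700u /* bits 10:8 */
#define VECTORING_INFO_DELIVER_CODE_MASK 0x00000800u /* bit 11 */
#define VECTORING_INFO_VALID_MASK        0x80000000u /* bit 31 */

/* Basic exit reasons (assumed values, per Intel SDM). */
#define EXIT_REASON_EXCEPTION_NMI 0
#define EXIT_REASON_TASK_SWITCH   9
#define EXIT_REASON_EPT_VIOLATION 48
#define EXIT_REASON_EPT_MISCONFIG 49 /* 0x31, the reason in the dmesg */

/* Vector of the event that was being delivered. */
static unsigned int vectoring_vector(uint32_t info)
{
    return info & VECTORING_INFO_VECTOR_MASK;
}

/* Event type; 3 means hardware exception. */
static unsigned int vectoring_type(uint32_t info)
{
    return (info & VECTORING_INFO_TYPE_MASK) >> 8;
}

/*
 * True when valid vectoring info arrives with an exit reason that is
 * not on the expected list. EPT misconfig is on the list here.
 */
static bool vectoring_is_unexpected(uint32_t info, uint32_t exit_reason)
{
    if (!(info & VECTORING_INFO_VALID_MASK))
        return false;
    return exit_reason != EXIT_REASON_EXCEPTION_NMI &&
           exit_reason != EXIT_REASON_EPT_VIOLATION &&
           exit_reason != EXIT_REASON_EPT_MISCONFIG &&
           exit_reason != EXIT_REASON_TASK_SWITCH;
}
```

Under these assumptions, 0x80000b0e decodes as valid, type 3 (hardware exception), error code present, vector 0x0e (page fault), which matches Avi's reading that the guest hit an mmio page while delivering a page fault. With the extra condition, exit reason 0x31 no longer triggers the warning.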