Current differences between qemu --enable-kvm and qemu-kvm?
Hi all,

is there an existing summary that shows the rough or actual differences between qemu --enable-kvm and qemu-kvm? I tested both versions with the same compile and start options. The CPU performance results are identical; only the boot time of my guest system seemed a bit faster with qemu-kvm (not measured, it just felt that way).

Thanks.

Best regards,
Erik

--
To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] Add support for the GUEST_SMBASE VMCS field for Intel VT-x.
On 05/18/2012 12:34 PM, Matthias Lange wrote:
> Hi,
>
> I was playing around with kvm's nested virtualization feature on Intel VT-x. When trying to access the GUEST_SMBASE (offset 0x4828) field of the VMCS I got a "VMREAD/VMWRITE from/to unsupported VMCS component" error. According to the Intel manual this field is not optional. The error results from the vmcs_field_to_offset function in vmx.c because the offset of GUEST_SMBASE is not defined.
>
> The following patch adds support for the GUEST_SMBASE field. This allows hypervisors running inside kvm read/write access to this field. I have tested this to work on a Core i5 machine.
>
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index 31f180c..6a14720 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -194,6 +194,7 @@ enum vmcs_field {
>  	GUEST_TR_AR_BYTES = 0x4822,
>  	GUEST_INTERRUPTIBILITY_INFO = 0x4824,
>  	GUEST_ACTIVITY_STATE= 0X4826,
> +	GUEST_SMBASE= 0x4828,
>  	GUEST_SYSENTER_CS = 0x482A,
>  	HOST_IA32_SYSENTER_CS = 0x4c00,
>  	CR0_GUEST_HOST_MASK = 0x6000,
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 4ff0ab9..0063743 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -285,6 +285,7 @@ struct __packed vmcs12 {
>  	u32 guest_tr_ar_bytes;
>  	u32 guest_interruptibility_info;
>  	u32 guest_activity_state;
> +	u32 guest_smbase;
>  	u32 guest_sysenter_cs;
>  	u32 host_ia32_sysenter_cs;
>  	u32 padding32[8]; /* room for future expansion */
> @@ -546,6 +547,7 @@ static unsigned short vmcs_field_to_offset_table[] = {

vmcs12 is an ABI, so you can't insert fields at random. Grab one from padding32.

--
error compiling committee.c: too many arguments to function
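The ABI point in the reply can be illustrated with a small sketch. The structs below are a simplified, hypothetical stand-in for vmcs12 (only a few fields; the names `layout_v1`, `layout_v2` and `layout_broken` are invented here), and the packed attribute assumes a GCC- or Clang-style compiler. It shows that taking a slot from the padding array keeps every existing offset and the total size unchanged, while inserting a field in the middle shifts everything after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical original layout: three fields plus reserved padding. */
struct __attribute__((packed)) layout_v1 {
	uint32_t guest_activity_state;
	uint32_t guest_sysenter_cs;
	uint32_t host_ia32_sysenter_cs;
	uint32_t padding32[8];	/* room for future expansion */
};

/* ABI-safe extension: the new field consumes one padding slot. */
struct __attribute__((packed)) layout_v2 {
	uint32_t guest_activity_state;
	uint32_t guest_sysenter_cs;
	uint32_t host_ia32_sysenter_cs;
	uint32_t guest_smbase;	/* formerly padding32[0] */
	uint32_t padding32[7];
};

/* What inserting "at random" looks like: the new field sits mid-struct. */
struct __attribute__((packed)) layout_broken {
	uint32_t guest_activity_state;
	uint32_t guest_smbase;
	uint32_t guest_sysenter_cs;
	uint32_t host_ia32_sysenter_cs;
	uint32_t padding32[8];
};

/* Compile-time checks: v2 is layout-compatible with v1. */
_Static_assert(sizeof(struct layout_v2) == sizeof(struct layout_v1),
	       "total size must not change");
_Static_assert(offsetof(struct layout_v2, guest_sysenter_cs) ==
	       offsetof(struct layout_v1, guest_sysenter_cs),
	       "existing fields must keep their offsets");
_Static_assert(offsetof(struct layout_v2, guest_smbase) ==
	       offsetof(struct layout_v1, padding32),
	       "new field must live where padding used to be");

/* Returns 1 if the mid-struct insertion moved an existing field. */
static int broken_layout_shifts_fields(void)
{
	return offsetof(struct layout_broken, guest_sysenter_cs) !=
	       offsetof(struct layout_v1, guest_sysenter_cs);
}
```

Any saved or migrated image written with the old layout stays readable under `layout_v2`, which is the property the review comment is protecting.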
[PATCH v2 0/3] Minor vcpu->requests improvements
Nothing spectacular, just regularization of the code.

v2: fix endless loop in patch 3 where a reload would set a bit in vcpu->requests and abort the entry

Avi Kivity (3):
  KVM: Simplify KVM_REQ_EVENT/req_int_win handling
  KVM: Optimize vcpu->requests slow path slightly
  KVM: Move mmu reload out of line

 arch/x86/kvm/mmu.c |  4 ++-
 arch/x86/kvm/svm.c |  1 +
 arch/x86/kvm/x86.c | 73
 3 files changed, 43 insertions(+), 35 deletions(-)

--
1.7.10.1
[PATCH v2 1/3] KVM: Simplify KVM_REQ_EVENT/req_int_win handling
Put the KVM_REQ_EVENT block in the regular vcpu->requests if (), instead of its own little check.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/x86.c | 30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b78f89d..953e692 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5233,6 +5233,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		vcpu->run->request_interrupt_window;
 	bool req_immediate_exit = 0;
 
+	if (unlikely(req_int_win))
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+
 	if (vcpu->requests) {
 		if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
 			kvm_mmu_unload(vcpu);
@@ -5277,20 +5280,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_handle_pmu_event(vcpu);
 		if (kvm_check_request(KVM_REQ_PMI, vcpu))
 			kvm_deliver_pmi(vcpu);
-	}
-
-	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
-		inject_pending_event(vcpu);
-
-		/* enable NMI/IRQ window open exits if needed */
-		if (vcpu->arch.nmi_pending)
-			kvm_x86_ops->enable_nmi_window(vcpu);
-		else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
-			kvm_x86_ops->enable_irq_window(vcpu);
-
-		if (kvm_lapic_enabled(vcpu)) {
-			update_cr8_intercept(vcpu);
-			kvm_lapic_sync_to_vapic(vcpu);
+		if (kvm_check_request(KVM_REQ_EVENT, vcpu)) {
+			inject_pending_event(vcpu);
+
+			/* enable NMI/IRQ window open exits if needed */
+			if (vcpu->arch.nmi_pending)
+				kvm_x86_ops->enable_nmi_window(vcpu);
+			else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
+				kvm_x86_ops->enable_irq_window(vcpu);
+
+			if (kvm_lapic_enabled(vcpu)) {
+				update_cr8_intercept(vcpu);
+				kvm_lapic_sync_to_vapic(vcpu);
+			}
 		}
 	}
 
--
1.7.10.1
[PATCH v2 2/3] KVM: Optimize vcpu->requests slow path slightly
Instead of using an atomic operation per active request, use just one to get all requests at once, then check them with local ops. This probably isn't any faster, since simultaneous requests are rare, but it does reduce code size.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/x86.c | 33 ++---
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 953e692..c0209eb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5232,55 +5232,58 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	bool req_int_win = !irqchip_in_kernel(vcpu->kvm) &&
 		vcpu->run->request_interrupt_window;
 	bool req_immediate_exit = 0;
+	ulong reqs;
 
 	if (unlikely(req_int_win))
 		kvm_make_request(KVM_REQ_EVENT, vcpu);
 
 	if (vcpu->requests) {
-		if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
+		reqs = xchg(&vcpu->requests, 0UL);
+
+		if (test_bit(KVM_REQ_MMU_RELOAD, &reqs))
 			kvm_mmu_unload(vcpu);
-		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
+		if (test_bit(KVM_REQ_MIGRATE_TIMER, &reqs))
 			__kvm_migrate_timers(vcpu);
-		if (kvm_check_request(KVM_REQ_CLOCK_UPDATE, vcpu)) {
+		if (test_bit(KVM_REQ_CLOCK_UPDATE, &reqs)) {
 			r = kvm_guest_time_update(vcpu);
 			if (unlikely(r))
 				goto out;
 		}
-		if (kvm_check_request(KVM_REQ_MMU_SYNC, vcpu))
+		if (test_bit(KVM_REQ_MMU_SYNC, &reqs))
 			kvm_mmu_sync_roots(vcpu);
-		if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
+		if (test_bit(KVM_REQ_TLB_FLUSH, &reqs))
 			kvm_x86_ops->tlb_flush(vcpu);
-		if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {
+		if (test_bit(KVM_REQ_REPORT_TPR_ACCESS, &reqs)) {
 			vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS;
 			r = 0;
 			goto out;
 		}
-		if (kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu)) {
+		if (test_bit(KVM_REQ_TRIPLE_FAULT, &reqs)) {
 			vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
 			r = 0;
 			goto out;
 		}
-		if (kvm_check_request(KVM_REQ_DEACTIVATE_FPU, vcpu)) {
+		if (test_bit(KVM_REQ_DEACTIVATE_FPU, &reqs)) {
 			vcpu->fpu_active = 0;
 			kvm_x86_ops->fpu_deactivate(vcpu);
 		}
-		if (kvm_check_request(KVM_REQ_APF_HALT, vcpu)) {
+		if (test_bit(KVM_REQ_APF_HALT, &reqs)) {
 			/* Page is swapped out. Do synthetic halt */
 			vcpu->arch.apf.halted = true;
 			r = 1;
 			goto out;
 		}
-		if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu))
+		if (test_bit(KVM_REQ_STEAL_UPDATE, &reqs))
 			record_steal_time(vcpu);
-		if (kvm_check_request(KVM_REQ_NMI, vcpu))
+		if (test_bit(KVM_REQ_NMI, &reqs))
 			process_nmi(vcpu);
 		req_immediate_exit =
-			kvm_check_request(KVM_REQ_IMMEDIATE_EXIT, vcpu);
-		if (kvm_check_request(KVM_REQ_PMU, vcpu))
+			test_bit(KVM_REQ_IMMEDIATE_EXIT, &reqs);
+		if (test_bit(KVM_REQ_PMU, &reqs))
 			kvm_handle_pmu_event(vcpu);
-		if (kvm_check_request(KVM_REQ_PMI, vcpu))
+		if (test_bit(KVM_REQ_PMI, &reqs))
 			kvm_deliver_pmi(vcpu);
-		if (kvm_check_request(KVM_REQ_EVENT, vcpu)) {
+		if (test_bit(KVM_REQ_EVENT, &reqs)) {
 			inject_pending_event(vcpu);
 
 			/* enable NMI/IRQ window open exits if needed */
--
1.7.10.1
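The idea in this patch (one atomic exchange to collect every pending request, then plain local bit tests) can be tried outside the kernel. What follows is a minimal user-space sketch using C11 atomics, not the kernel code: `struct vcpu_sketch`, the `REQ_*` bit numbers and the counters are all hypothetical names made up for illustration, and `atomic_exchange` stands in for the kernel's `xchg()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request bit numbers; not the kernel's KVM_REQ_* values. */
enum { REQ_TLB_FLUSH = 0, REQ_CLOCK_UPDATE = 1, REQ_EVENT = 2 };

struct vcpu_sketch {
	atomic_ulong requests;	/* bits may be set by other threads */
	int tlb_flushes;
	int clock_updates;
	int events;
};

static void vcpu_sketch_init(struct vcpu_sketch *v)
{
	atomic_init(&v->requests, 0UL);
	v->tlb_flushes = 0;
	v->clock_updates = 0;
	v->events = 0;
}

/* Producer side: atomically set one request bit. */
static void make_request(struct vcpu_sketch *v, int req)
{
	atomic_fetch_or(&v->requests, 1UL << req);
}

/* Local, non-atomic test on the private snapshot. */
static bool snapshot_has(unsigned long reqs, int req)
{
	return (reqs >> req) & 1UL;
}

/* Consumer side: one atomic op grabs and clears all pending bits. */
static void process_requests(struct vcpu_sketch *v)
{
	unsigned long reqs;

	if (!atomic_load(&v->requests))
		return;		/* fast path: nothing pending */

	reqs = atomic_exchange(&v->requests, 0UL);

	if (snapshot_has(reqs, REQ_TLB_FLUSH))
		v->tlb_flushes++;
	if (snapshot_has(reqs, REQ_CLOCK_UPDATE))
		v->clock_updates++;
	if (snapshot_has(reqs, REQ_EVENT))
		v->events++;
}
```

One consequence of this design is that a handler which bails out early has already consumed the whole snapshot, so any request it did not get to must be raised again by hand.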
[PATCH v2 3/3] KVM: Move mmu reload out of line
Currently we check that the mmu root exists before every entry. Use the existing KVM_REQ_MMU_RELOAD mechanism instead, by making it really reload the mmu, and by adding the request to mmu initialization code.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/mmu.c |  4 +++-
 arch/x86/kvm/svm.c |  1 +
 arch/x86/kvm/x86.c | 14 +++---
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 72102e0..589fdaa 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3181,7 +3181,8 @@ void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu)
 static void paging_new_cr3(struct kvm_vcpu *vcpu)
 {
 	pgprintk("%s: cr3 %lx\n", __func__, kvm_read_cr3(vcpu));
-	mmu_free_roots(vcpu);
+	kvm_mmu_unload(vcpu);
+	kvm_mmu_load(vcpu);
 }
 
 static unsigned long get_cr3(struct kvm_vcpu *vcpu)
@@ -3470,6 +3471,7 @@ static int init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 
 static int init_kvm_mmu(struct kvm_vcpu *vcpu)
 {
+	kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
 	if (mmu_is_nested(vcpu))
 		return init_kvm_nested_mmu(vcpu);
 	else if (tdp_enabled)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f75af40..98f13d7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2523,6 +2523,7 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
 
 	if (nested_vmcb->control.nested_ctl) {
 		kvm_mmu_unload(&svm->vcpu);
+		kvm_make_request(KVM_REQ_MMU_RELOAD, &svm->vcpu);
 		svm->nested.nested_cr3 = nested_vmcb->control.nested_cr3;
 		nested_svm_init_mmu_context(&svm->vcpu);
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c0209eb..946933a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5240,8 +5240,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (vcpu->requests) {
 		reqs = xchg(&vcpu->requests, 0UL);
 
-		if (test_bit(KVM_REQ_MMU_RELOAD, &reqs))
+		if (test_bit(KVM_REQ_MMU_RELOAD, &reqs)) {
 			kvm_mmu_unload(vcpu);
+			r = kvm_mmu_reload(vcpu);
+			if (unlikely(r)) {
+				kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
+				goto out;
+			}
+		}
 		if (test_bit(KVM_REQ_MIGRATE_TIMER, &reqs))
 			__kvm_migrate_timers(vcpu);
 		if (test_bit(KVM_REQ_CLOCK_UPDATE, &reqs)) {
@@ -5299,12 +5305,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	r = kvm_mmu_reload(vcpu);
-	if (unlikely(r)) {
-		kvm_x86_ops->cancel_injection(vcpu);
-		goto out;
-	}
-
 	preempt_disable();
 
 	kvm_x86_ops->prepare_guest_switch(vcpu);
--
1.7.10.1
Re: Workload spikes on KVM host when doing IO on a guest...
On 05/20/2012 03:55 AM, Erik Brakkee wrote:
> Hi,
>
> I am seeing high workload spikes of approx. 15 when I do IO inside a KVM guest, for instance
>
>     dd if=/dev/zero bs=1G count=1 of=hog
>
> When I execute a similar command on the host to write a file on the same physical disk, the workload only goes to about 3.

This is not surprising. Each I/O request executes in a thread.

> I am using virtio on the guest with cache mode none. Also, I am using the noop IO scheduler on the guest and the deadline IO scheduler on the host. The guest is allocated a logical volume from the host.

With logical volumes, you can use -drive ...,aio=native to avoid the threads. The load will disappear.

> When I execute the dd command on the guest, it finishes almost instantaneously but when I execute it on the host I have to wait for approx 10 seconds. Specifically, on the guest I see a transfer speed of approx. 600 MB/s and on the host I get 75.9MB/s. The figure for the host is most reliable as this is close to what the hard disks can handle (WD enterprise class SATA hard disks).

Try dd oflag=direct to force the data to disk. No idea why the host doesn't finish instantaneously.

> What appears to be happening is that somehow it forwards all IO from the guest immediately to the host, just as if write back caching was used.

Write back caching is indeed used, since you did not specify oflag=direct.

> Is this some known issue in this version of KVM and should I simply upgrade (or replace the host with a centos 6.2 system). Or is there a simple configuration that can fix this?

Nothing is broken, so it doesn't need fixing. The high load is not an indication of anything.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 2/2 v2] KVM: Avoid wasting pages for small lpage_info arrays
On 05/20/2012 07:15 AM, Takuya Yoshikawa wrote:
> From: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
>
> lpage_info is created for each large level even when the memory slot is not for RAM. This means that when we add one slot for a PCI device, we end up allocating at least KVM_NR_PAGE_SIZES - 1 pages by vmalloc().
>
> To make things worse, there is an increasing number of devices which would result in more pages being wasted this way.
>
> This patch mitigates this problem by using kvm_kvzalloc().

Thanks, applied to 'queue'.

--
error compiling committee.c: too many arguments to function
Re: [RFC][PATCH v2 00/11] uq/master: irqfd-based interrupt injection for virtio/vhost
On 05/17/2012 04:32 PM, Jan Kiszka wrote:
> [ changes in v2: rebase over uq/master ]
>
> This series is another major milestone of merging qemu-kvm into upstream. It implements the required interfaces and logic to directly inject MSI-X interrupts generated by the vhost-net kernel module into the KVM in-kernel irqchip. This involves
>  - establishing MSI vector notifiers, so far triggered on relevant MSI-X configuration changes of subscribed PCI devices
>  - support for static vIRQ-to-MSI routes
>  - an API for linking an IRQFD with such a vIRQ
>  - the usage of these services in virtio-pci to enable direct injection
>
> The series also contains some smaller refactorings of the KVM IRQ routing API such as automatic committing of route changes. It applies on top of the KVM MSI support series [1] posted recently. The complete stack is available at
>
>     git://git.kiszka.org/qemu-kvm.git queues/kvm-msi-irqfd
>
> If the proposed API is acceptable, I will also provide some morphing patches for qemu-kvm to make the merge of both trees smoother.
>
> After this series, the only reasons to still use qemu-kvm for production purposes will be PCI device assignment and potential dependencies on legacy command line switches as well as vmstate formats (when requiring backward migration support). However, the majority of users should be able to switch to upstream QEMU seamlessly and finally receive the same level of performance on x86.

Thanks, applied.

--
error compiling committee.c: too many arguments to function
Re: [RFC][PATCH v2 00/11] uq/master: irqfd-based interrupt injection for virtio/vhost
On Thu, May 17, 2012 at 10:32:28AM -0300, Jan Kiszka wrote:
> After this series, the only reasons to still use qemu-kvm for production purposes will be PCI device assignment

Yay!

By the way, there are probably not many reasons to keep the assignment code out of qemu.git. It duplicates a ton of code from core pci, but that's easier to fix in-tree than out of tree.

--
MST
Re: [RFC][PATCH v2 00/11] uq/master: irqfd-based interrupt injection for virtio/vhost
On 05/20/2012 05:42 PM, Michael S. Tsirkin wrote:
> On Thu, May 17, 2012 at 10:32:28AM -0300, Jan Kiszka wrote:
>> After this series, the only reasons to still use qemu-kvm for production purposes will be PCI device assignment
>
> Yay!
>
> By the way, there are probably not many reasons to keep the assignment code out of qemu.git. It duplicates a ton of code from core pci, but that's easier to fix in-tree than out of tree.

Right. And Jan, if you want to push device assignment to qemu.git, please update it in qemu-kvm.git instead of rewriting it in qemu.git.

--
error compiling committee.c: too many arguments to function
Re: Aros (icaros) system fails to reboot in kvm
On 05/10/2012 06:09 PM, Michal Suchanek wrote:
> It does not work. Without the patch KVM git experiences emulation error, with the patch it just locks up.

I just tested it, and it worked for me; kvm.git next. Fastest reboot I've ever seen.

> Also I don't see why would they use movntps for framebuffer. The graphics is up and running, only reboot hits this unimplemented opcode.

Strange, yes. Maybe it's a bad reset implementation in qemu, which leads the guest to misdetect the graphics card, and then use movntps.

What version of qemu are you using?

--
error compiling committee.c: too many arguments to function
Re: qemu kvm: Accept PCID feature
On 05/10/2012 07:43 AM, Mao, Junjie wrote:
> This patch makes Qemu accept the PCID feature specified from configuration or command line options.

Please post to the qemu mailing list. It's not kvm specific (it isn't supported by tcg, but that's another story).

--
error compiling committee.c: too many arguments to function
Re: Workload spikes on KVM host when doing IO on a guest...
On 05/20/2012 08:29 PM, Erik Brakkee wrote:
> Avi Kivity wrote:
>> On 05/20/2012 08:02 PM, Erik Brakkee wrote:
>>> [...] Thanks for this information. Unfortunately, io=native in domain.xml is not supported by opensuse 11.3. It is supported in 12.1 so it appears that the version of KVM I have on the server is too old. I tried it on a system running the newer version and indeed, as you say the load disappears completely when using io=native. I am going to update the host now (probably to centos 6.2) to get rid of this problem.
>>
>> To be clear: it's not a problem. It's completely normal, and doesn't affect anything.
>
> The only problem with it is that it leads to high workload spikes, which is normally a reason to have a good look at what is going on. In this case, the newer version of KVM should help eliminate these spikes, so that the next time I see a spike in the workload I know that I have to look into something.

Problem is, it doesn't mean anything important. It's the count of running threads plus the count of threads uninterruptibly waiting on a mutex. It's absolutely meaningless.

> I noticed the issue after I started monitoring the server and all VMs using zabbix (www.zabbix.com) and made a graph showing the workload of the hosts and that of all guests. See below. Falcon is the host and sparrow is a continuous integration server which is creating an updated RPM repository and writing a lot of files.
>
> Still the whole area of workload is a bit confusing to me. Is the effect of native IO simply that some of the IO work is not being counted anymore as part of the workload because the work is no longer done in user space?

No, it no longer holds a mutex. Yet it does exactly the same thing. That's an indication that the counter is meaningless.

(If the counter doesn't drop on an idle machine, that usually indicates trouble; but that's not the case)

--
error compiling committee.c: too many arguments to function
Re: [PATCH] ARM: KVM: Check the cpuid we're being asked to emulate.
On Wed, May 16, 2012 at 7:58 PM, Rusty Russell <ru...@rustcorp.com.au> wrote:
> On Mon, 14 May 2012 18:57:20 -0400, Christoffer Dall <c.d...@virtualopensystems.com> wrote:
>> On Thu, Mar 22, 2012 at 8:41 PM, Rusty Russell <rusty.russ...@linaro.org> wrote:
>>> As our emulation gets more sophisticated, we need to know what CPU model we're dealing with. Particularly for some of the nastier workarounds. Let's start with Cortex A-15.
>>>
>>> We can then test the MIDR elsewhere in the code, knowing that it's one of a finite set of allowed values.
>>
>> (Revisiting this now)
>>
>> The intent is good, this patch is not the right way to do it though.
>
> I think we want an explicit ioctl to tell the kernel what CPU; since the kernel initialized the regs, it needs to know.

Not sure of your point exactly, but if I understand correctly, what you're saying is that since the kernel initializes all the regs (at least it's going to) we want an ioctl to say "this is the cpu for which you will initialize the regs"?

That also makes for a more friendly user space interface than "you need to set this register to this cryptic value to emulate this cpu"...

-Christoffer
RE: [PATCH v3] KVM: x86: Implement PCID/INVPCID for guests with EPT
-----Original Message-----
From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
Sent: Saturday, May 19, 2012 5:51 AM
To: Mao, Junjie
Cc: n...@math.technion.ac.il; 'kvm@vger.kernel.org'
Subject: Re: [PATCH v3] KVM: x86: Implement PCID/INVPCID for guests with EPT

> On Fri, May 18, 2012 at 06:17:05AM +0000, Mao, Junjie wrote:
> > This patch handles PCID/INVPCID for guests.
> >
> > Process-context identifiers (PCIDs) are a facility by which a logical processor may cache information for multiple linear-address spaces so that the processor may retain cached information when software switches to a different linear address space. Refer to section 4.10.1 in IA32 Intel Software Developer's Manual Volume 3A for details.
> >
> > For guests with EPT, the PCID feature is enabled and INVPCID behaves as running natively.
> > For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD.
> >
> > Changes from v2:
> >   Separate management of PCID and INVPCID
> >   Prevent PCID bit in CPUID from exposing on guest hypervisors
> >   Don't check the lower 12 bits when loading cr3 if cr4.PCIDE is set
> >   Explicitly disable INVPCID for L2 guests
> >   Support both enable and disable INVPCID in vmx_cpuid_update()
> >
> > Changes from v1:
> >   Move cr0/cr4 writing checks to x86.c
> >   Update comments for the reason why PCID is disabled for non-EPT guests
> >   Do not support PCID/INVPCID for nested guests at present
> >   Clean up useless symbols
> >
> > Signed-off-by: Junjie Mao junjie@intel.com
> > ---
> >  arch/x86/include/asm/cpufeature.h      |  1 +
> >  arch/x86/include/asm/kvm_host.h        |  5 ++-
> >  arch/x86/include/asm/processor-flags.h |  2 +
> >  arch/x86/include/asm/vmx.h             |  2 +
> >  arch/x86/kvm/cpuid.c                   |  6 ++-
> >  arch/x86/kvm/cpuid.h                   |  8 +
> >  arch/x86/kvm/svm.c                     | 12
> >  arch/x86/kvm/vmx.c                     | 49 +++-
> >  arch/x86/kvm/x86.c                     | 24 +--
> >  9 files changed, 102 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> > index 8d67d42..1aedbc0 100644
> > --- a/arch/x86/include/asm/cpufeature.h
> > +++ b/arch/x86/include/asm/cpufeature.h
> > @@ -203,6 +203,7 @@
> >  #define X86_FEATURE_SMEP	(9*32+ 7) /* Supervisor Mode Execution Protection */
> >  #define X86_FEATURE_BMI2	(9*32+ 8) /* 2nd group bit manipulation extensions */
> >  #define X86_FEATURE_ERMS	(9*32+ 9) /* Enhanced REP MOVSB/STOSB */
> > +#define X86_FEATURE_INVPCID	(9*32+10) /* INVPCID instruction */
> >
> >  #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 74c9edf..2c250e6 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -47,12 +47,13 @@
> >
> >  #define CR3_PAE_RESERVED_BITS ((X86_CR3_PWT | X86_CR3_PCD) - 1)
> >  #define CR3_NONPAE_RESERVED_BITS ((PAGE_SIZE-1) & ~(X86_CR3_PWT | X86_CR3_PCD))
> > +#define CR3_PCID_ENABLED_RESERVED_BITS 0xFFFFFF0000000000ULL
> >  #define CR3_L_MODE_RESERVED_BITS (CR3_NONPAE_RESERVED_BITS |	\
> >  				  0xFFFFFF0000000000ULL)
> >  #define CR4_RESERVED_BITS \
> >  	(~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\
> >  			  | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE \
> > -			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR \
> > +			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
> >  			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_RDWRGSFS \
> >  			  | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE))
> >
> > @@ -660,6 +661,8 @@ struct kvm_x86_ops {
> >  	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
> >  	int (*get_lpage_level)(void);
> >  	bool (*rdtscp_supported)(void);
> > +	bool (*pcid_supported)(void);
> > +	bool (*invpcid_supported)(void);
> >  	void (*adjust_tsc_offset)(struct kvm_vcpu *vcpu, s64 adjustment, bool host);
> >
> >  	void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
> > diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h
> > index f8ab3ea..aea1d1d 100644
> > --- a/arch/x86/include/asm/processor-flags.h
> > +++ b/arch/x86/include/asm/processor-flags.h
> > @@ -44,6 +44,7 @@
> >   */
> >  #define X86_CR3_PWT	0x00000008 /* Page Write Through */
> >  #define X86_CR3_PCD	0x00000010 /* Page Cache Disable */
> > +#define X86_CR3_PCID_MASK 0x00000fff /* PCID Mask */
> >
> >  /*
> >   * Intel CPU features in CR4
> > @@ -61,6 +62,7 @@
> >  #define X86_CR4_OSXMMEXCPT 0x00000400 /* enable unmasked SSE exceptions */
> >  #define X86_CR4_VMXE	0x00002000 /* enable VMX virtualization */
> >  #define X86_CR4_RDWRGSFS 0x00010000 /* enable RDWRGSFS support */
> > +#define X86_CR4_PCIDE	0x00020000 /* enable PCID support */
> >  #define X86_CR4_OSXSAVE 0x00040000 /* enable xsave
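
The changelog item about not checking the lower 12 bits of cr3 when cr4.PCIDE is set follows from how PCID is encoded: with CR4.PCIDE = 1, bits 11:0 of CR3 carry the process-context identifier instead of flag and reserved bits. A minimal stand-alone sketch of that decoding is given below. The macro and function names are made up for this illustration (they are not the kernel's), and the base-address helper deliberately ignores the upper reserved bits.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants; values follow the architectural bit positions. */
#define SKETCH_CR3_PCID_MASK	0xfffULL	/* CR3 bits 11:0 */
#define SKETCH_CR4_PCIDE	(1ULL << 17)	/* CR4 bit 17 */

/*
 * Return the PCID encoded in cr3. When CR4.PCIDE is clear the
 * processor behaves as if the current PCID were 0.
 */
static uint16_t sketch_cr3_pcid(uint64_t cr3, uint64_t cr4)
{
	if (!(cr4 & SKETCH_CR4_PCIDE))
		return 0;
	return (uint16_t)(cr3 & SKETCH_CR3_PCID_MASK);
}

/* Strip the low 12 bits to get the page-aligned part of cr3. */
static uint64_t sketch_cr3_base(uint64_t cr3)
{
	return cr3 & ~SKETCH_CR3_PCID_MASK;
}
```

For example, a cr3 value of 0x12345005 with PCIDE set decodes to PCID 5 and base 0x12345000.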
Re: [PATCH] ARM: KVM: Check the cpuid we're being asked to emulate.
On Sun, 20 May 2012 14:34:48 -0400, Christoffer Dall <c.d...@virtualopensystems.com> wrote:
> On Wed, May 16, 2012 at 7:58 PM, Rusty Russell <ru...@rustcorp.com.au> wrote:
>> On Mon, 14 May 2012 18:57:20 -0400, Christoffer Dall <c.d...@virtualopensystems.com> wrote:
>>> On Thu, Mar 22, 2012 at 8:41 PM, Rusty Russell <rusty.russ...@linaro.org> wrote:
>>>> As our emulation gets more sophisticated, we need to know what CPU model we're dealing with. Particularly for some of the nastier workarounds. Let's start with Cortex A-15.
>>>>
>>>> We can then test the MIDR elsewhere in the code, knowing that it's one of a finite set of allowed values.
>>>
>>> (Revisiting this now)
>>>
>>> The intent is good, this patch is not the right way to do it though.
>>
>> I think we want an explicit ioctl to tell the kernel what CPU; since the kernel initialized the regs, it needs to know.
>
> Not sure of your point exactly, but if I understand correctly, what you're saying is that since the kernel initializes all the regs (at least it's going to) we want an ioctl to say "this is the cpu for which you will initialize the regs"?
>
> That also makes for a more friendly user space interface than "you need to set this register to this cryptic value to emulate this cpu"...

Yes, exactly. Esp. since it also affects some of the cp15 emulation hacks.

Cheers,
Rusty.
Re: [PATCH v2 0/5] Export offsets of VMCS fields as note information for kdump
On 05/21/2012 01:43, Avi Kivity wrote:
> On 05/16/2012 10:50 AM, zhangyanfei wrote:
>> This patch set exports offsets of VMCS fields as note information for kdump. We call it VMCSINFO. The purpose of VMCSINFO is to retrieve runtime state of guest machine image, such as registers, in host machine's crash dump as VMCS format. The problem is that VMCS internal is hidden by Intel in its specification. So, we solve this problem by reverse engineering implemented in this patch set. The VMCSINFO is exported via sysfs to kexec-tools just like VMCOREINFO.
>>
>> Here are two use cases for two features that we want.
>>
>> 1) Create guest machine's crash dumpfile from host machine's crash dumpfile
>>
>> In general, we want to use this feature on failure analysis for the system where the processing depends on the communication between host and guest machines to look into the system from both machines' viewpoints.
>>
>> As a concrete situation, consider where there's heartbeat monitoring feature on the guest machine's side, where we need to determine in which machine side the cause of heartbeat stop lies. In our actual experiments, we encountered such situation and we found the cause of the bug was in host's process scheduler so guest machine's vcpu stopped for a long time and then led to heartbeat stop.
>>
>> The module that judges heartbeat stop is on guest machine, so we need to debug guest machine's data. But if the cause lies in host machine side, we need to look into host machine's crash dump.
>
> Do you mean, that a heartbeat failure in the guest lead to host panic?
>
> My expectation is that a problem in the guest will cause the guest to panic and perhaps produce a dump; the host will remain up.

The point is that before our investigation, we didn't know which side leads to this buggy situation. Maybe a bug in host machine or the guest machine itself causes a heartbeat failure.

So we want to get both host machine's crash dump and guest machine's crash dump *at the same time*. Then we could use userspace tools to get guest machine crash dump from host machine's and analyse them separately to find which side causes the problem.

>> Without this feature, we first create guest machine's dump and then create host machine's, but there's only a short time between two processings, during which it's unlikely that buggy situation remains.
>>
>> So, we think the feature is useful to debug both guest machine's and host machine's sides at the same time, and expect we can make failure analysis efficiently.
>>
>> Of course, we believe this feature is commonly useful on the situation where guest machine doesn't work well due to something of host machine's.
>>
>> 2) Get offsets of VMCS information on the CPU running on the host machine
>>
>> If kdump doesn't work well, then it means we cannot use kvm API to get register values of guest machine and they are still left on its vmcs region. In the case, we use crash dump mechanism running outside of linux kernel, such as sadump, a firmware-based crash dump. Then VMCS information is then necessary.
>
> Shouldn't sadump then expose the VMCS offsets? Perhaps bundling them into its dump file?

Firmware-based crash dump doesn't concern the OS running on the machine. So it will not do any OS handling when the machine crashes.

Thanks
Zhang Yanfei
RE: [PATCH 4/4] Enabling Access bit when doing memory swapping
-Original Message- From: Marcelo Tosatti [mailto:mtosa...@redhat.com] Sent: Friday, May 18, 2012 10:23 AM To: Xudong Hao Cc: a...@redhat.com; kvm@vger.kernel.org; linux-ker...@vger.kernel.org; Shan, Haitao; Zhang, Xiantao; Hao, Xudong Subject: Re: [PATCH 4/4] Enabling Access bit when doing memory swapping On Wed, May 16, 2012 at 09:12:30AM +0800, Xudong Hao wrote: Enabling Access bit when doing memory swapping. Signed-off-by: Haitao Shan haitao.s...@intel.com Signed-off-by: Xudong Hao xudong@intel.com --- arch/x86/kvm/mmu.c | 13 +++-- arch/x86/kvm/vmx.c |6 -- 2 files changed, 11 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index ff053ca..5f55f98 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1166,7 +1166,8 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp, int young = 0; /* -* Emulate the accessed bit for EPT, by checking if this page has +* In case of absence of EPT Access and Dirty Bits supports, +* emulate the accessed bit for EPT, by checking if this page has * an EPT mapping, and clearing it if it does. On the next access, * a new EPT mapping will be established. * This has some overhead, but not as much as the cost of swapping @@ -1179,11 +1180,11 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp, while (spte) { int _young; u64 _spte = *spte; - BUG_ON(!(_spte PT_PRESENT_MASK)); - _young = _spte PT_ACCESSED_MASK; + BUG_ON(!is_shadow_present_pte(_spte)); + _young = _spte shadow_accessed_mask; if (_young) { young = 1; - clear_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte); + *spte = ~shadow_accessed_mask; } Now a dirty bit can be lost. Is there a reason to remove the clear_bit? The clear_bit() is called in shadown and EPT A/D mode, because PT_ACCESSED_SHIFT does not make sense for EPT A/D bit, so use the code shadow_accessed_mask to mask the bit for both of them. 
Re: [Qemu-devel] Memory Tracking API
On 05/18/2012 12:17 AM, Richard W.M. Jones wrote:

On Thu, May 17, 2012 at 11:36:24PM +0530, Jaspal wrote:

Hi, Is it possible to keep a count of the reads/writes taking place in a VM using qemu (with kvm as the hypervisor)? Is there an API (or any patch) for it?

Memory reads and writes are surely going to generate a huge amount of output! There are various DEBUG_* symbols at the top of exec.c and ioport.c. I've only used a few of these:

DEBUG_UNASSIGNED - prints a message when an unmapped page is referenced (TCG only, presumably?)
DEBUG_IOPORT - prints a message when any I/O port is referenced
DEBUG_UNUSED_IOPORT - prints a message when a non-emulated I/O port is referenced

There are several more if you look at the code.

Rich.

When are these functions called: kvm_read_guest_page, kvm_read_guest_atomic and kvm_write_guest_page, present in kvm_main.c? When qemu wants to read/write a page? If qemu has to read/write the VM's memory (RAM), does the process always involve kvm?

Thanks,
Jaspal
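As an illustration of the compile-time DEBUG_* approach mentioned above, here is a small standalone sketch that counts reads and writes and optionally logs each one. It is not QEMU or KVM code: `DEBUG_ACCESS`, `LOG_ACCESS`, `struct access_stats`, `tracked_read` and `tracked_write` are all hypothetical names, and the byte array merely stands in for guest RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Build with -DDEBUG_ACCESS to print a message for every access. */
#ifdef DEBUG_ACCESS
#define LOG_ACCESS(...) fprintf(stderr, __VA_ARGS__)
#else
#define LOG_ACCESS(...) do { } while (0)
#endif

/* Running totals of the accesses seen so far. */
struct access_stats {
	uint64_t reads;
	uint64_t writes;
};

/* Read one byte through the counting wrapper. */
static uint8_t tracked_read(struct access_stats *st, const uint8_t *ram, size_t addr)
{
	assert(st != NULL && ram != NULL);
	st->reads++;
	LOG_ACCESS("read  addr=%zu\n", addr);
	return ram[addr];
}

/* Write one byte through the counting wrapper. */
static void tracked_write(struct access_stats *st, uint8_t *ram, size_t addr, uint8_t val)
{
	assert(st != NULL && ram != NULL);
	st->writes++;
	LOG_ACCESS("write addr=%zu val=%u\n", addr, (unsigned)val);
	ram[addr] = val;
}
```

Keeping the counters separate from the logging means the counts stay cheap, while the per-access messages, which are what would generate a huge amount of output, are only compiled in on request.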
Biweekly KVM Test report, kernel 51bfd299... qemu a1fce560...
Hi All,

This is KVM upstream test result against kvm.git 51bfd2998113e1f8ce8dcf853407b76a04b5f2a0 based on kernel 3.4.0-rc7, and qemu-kvm.git a1fce560c0e5f287ed65d2aaadb3e59578aaa983.

We found 1 new bug and 1 bug got fixed in the past two weeks.

New issue (1):
1. disk error when guest boot up via qcow2 image
   https://bugs.launchpad.net/qemu/+bug/1002121
   -- Should be a regression on qemu-kvm.

Fixed issue (1):
1. one of the two assigned NICs doesn't work in SMP guest.
   https://bugs.launchpad.net/qemu/+bug/953754

Old issue (1):
1. (Nested-virt) L1 (kvm on kvm) guest panic with parameter "-cpu host" in qemu command line.
   https://bugs.launchpad.net/qemu/+bug/994378

Test environment:
Platform       Westmere-EP    Sandybridge-EP
CPU Cores      24             32
Memory size    24G            32G

Best Regards,
Yongjie Ren (Jay)
[PATCH v2 1/4] Add EPT A/D bits definitions
Add EPT A/D bits definitions.

Signed-off-by: Haitao Shan haitao.s...@intel.com
Signed-off-by: Xudong Hao xudong@intel.com
---
 arch/x86/include/asm/vmx.h | 4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 31f180c..de007c2 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -404,6 +404,7 @@ enum vmcs_field {
 #define VMX_EPTP_WB_BIT				(1ull << 14)
 #define VMX_EPT_2MB_PAGE_BIT			(1ull << 16)
 #define VMX_EPT_1GB_PAGE_BIT			(1ull << 17)
+#define VMX_EPT_AD_BIT				(1ull << 21)
 #define VMX_EPT_EXTENT_INDIVIDUAL_BIT		(1ull << 24)
 #define VMX_EPT_EXTENT_CONTEXT_BIT		(1ull << 25)
 #define VMX_EPT_EXTENT_GLOBAL_BIT		(1ull << 26)
@@ -415,11 +416,14 @@ enum vmcs_field {
 #define VMX_EPT_MAX_GAW				0x4
 #define VMX_EPT_MT_EPTE_SHIFT			3
 #define VMX_EPT_GAW_EPTP_SHIFT			3
+#define VMX_EPT_AD_ENABLE_BIT			(1ull << 6)
 #define VMX_EPT_DEFAULT_MT			0x6ull
 #define VMX_EPT_READABLE_MASK			0x1ull
 #define VMX_EPT_WRITABLE_MASK			0x2ull
 #define VMX_EPT_EXECUTABLE_MASK			0x4ull
 #define VMX_EPT_IPAT_BIT			(1ull << 6)
+#define VMX_EPT_ACCESS_BIT			(1ull << 8)
+#define VMX_EPT_DIRTY_BIT			(1ull << 9)

 #define VMX_EPT_IDENTITY_PAGETABLE_ADDR		0xfffbc000ul

--
1.5.6
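To show how the two per-entry definitions might be consumed, here is a small standalone sketch that tests and clears the accessed and dirty bits of a 64-bit EPT entry value. The macro values are copied from the patch above; the helper names `ept_entry_accessed`, `ept_entry_dirty` and `ept_entry_clear_ad` are hypothetical and do not appear in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the patch above. */
#define VMX_EPT_READABLE_MASK   0x1ull
#define VMX_EPT_WRITABLE_MASK   0x2ull
#define VMX_EPT_EXECUTABLE_MASK 0x4ull
#define VMX_EPT_ACCESS_BIT      (1ull << 8)
#define VMX_EPT_DIRTY_BIT       (1ull << 9)

/* True if the entry has been accessed since the bit was last cleared. */
static bool ept_entry_accessed(uint64_t epte)
{
	return (epte & VMX_EPT_ACCESS_BIT) != 0;
}

/* True if the page mapped by the entry has been written to. */
static bool ept_entry_dirty(uint64_t epte)
{
	return (epte & VMX_EPT_DIRTY_BIT) != 0;
}

/* Return the entry with both A/D bits cleared and everything else kept. */
static uint64_t ept_entry_clear_ad(uint64_t epte)
{
	return epte & ~(VMX_EPT_ACCESS_BIT | VMX_EPT_DIRTY_BIT);
}
```

Note that VMX_EPT_AD_BIT (bit 21) and VMX_EPT_AD_ENABLE_BIT (bit 6) from the same patch are tested against different values in the rest of the series: the first against vmx_capability.ept, the second against the EPTP, so they are left out of this per-entry sketch.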
[PATCH v2 4/4] Enabling Access bit when doing memory swapping
Enabling Access bit when doing memory swapping.

Signed-off-by: Haitao Shan haitao.s...@intel.com
Signed-off-by: Xudong Hao xudong@intel.com
---
 arch/x86/kvm/mmu.c | 13 +++--
 arch/x86/kvm/vmx.c |  6 --
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 07424cf..392bdf3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1232,7 +1232,8 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	int young = 0;

 	/*
-	 * Emulate the accessed bit for EPT, by checking if this page has
+	 * In case of absence of EPT Access and Dirty Bits supports,
+	 * emulate the accessed bit for EPT, by checking if this page has
 	 * an EPT mapping, and clearing it if it does. On the next access,
 	 * a new EPT mapping will be established.
 	 * This has some overhead, but not as much as the cost of swapping
@@ -1243,11 +1244,11 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
 	     sptep = rmap_get_next(&iter)) {
-		BUG_ON(!(*sptep & PT_PRESENT_MASK));
+		BUG_ON(!is_shadow_present_pte(*sptep));

-		if (*sptep & PT_ACCESSED_MASK) {
+		if (*sptep & shadow_accessed_mask) {
 			young = 1;
-			clear_bit(PT_ACCESSED_SHIFT, (unsigned long *)sptep);
+			*sptep &= ~shadow_accessed_mask;
 		}
 	}

@@ -1271,9 +1272,9 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
 	     sptep = rmap_get_next(&iter)) {
-		BUG_ON(!(*sptep & PT_PRESENT_MASK));
+		BUG_ON(!is_shadow_present_pte(*sptep));

-		if (*sptep & PT_ACCESSED_MASK) {
+		if (*sptep & shadow_accessed_mask) {
 			young = 1;
 			break;
 		}
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e8003b6..342ea2e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7259,8 +7259,10 @@ static int __init vmx_init(void)
 	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false);

 	if (enable_ept) {
-		kvm_mmu_set_mask_ptes(0ull, 0ull, 0ull, 0ull,
-				VMX_EPT_EXECUTABLE_MASK);
+		kvm_mmu_set_mask_ptes(0ull,
+			(enable_ept_ad_bits) ? VMX_EPT_ACCESS_BIT : 0ull,
+			(enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
+			0ull, VMX_EPT_EXECUTABLE_MASK);
 		ept_set_mmio_spte_mask();
 		kvm_enable_tdp();
 	} else

--
1.5.6
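The two halves of this patch fit together as follows: vmx_init() picks the accessed/dirty masks depending on enable_ept_ad_bits, and the aging functions then test and clear whatever mask was chosen. Below is a simplified userspace sketch of that flow. A plain array stands in for the rmap chain, and `set_mask_ptes`, `age_sptes` and `test_age_sptes` are hypothetical stand-ins, not the kernel functions. The sketch also does not model the fallback described in the code comment (dropping the EPT mapping when hardware A/D bits are absent).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VMX_EPT_ACCESS_BIT (1ull << 8)
#define VMX_EPT_DIRTY_BIT  (1ull << 9)

/* Masks selected once at init time, as in the vmx_init() hunk. */
static uint64_t shadow_accessed_mask;
static uint64_t shadow_dirty_mask;

static void set_mask_ptes(bool enable_ept_ad_bits)
{
	shadow_accessed_mask = enable_ept_ad_bits ? VMX_EPT_ACCESS_BIT : 0ull;
	shadow_dirty_mask = enable_ept_ad_bits ? VMX_EPT_DIRTY_BIT : 0ull;
}

/* Report whether any entry was accessed, clearing the accessed bit. */
static int age_sptes(uint64_t *sptes, size_t n)
{
	int young = 0;

	assert(sptes != NULL);
	for (size_t i = 0; i < n; i++) {
		if (sptes[i] & shadow_accessed_mask) {
			young = 1;
			sptes[i] &= ~shadow_accessed_mask; /* non-atomic, as in the patch */
		}
	}
	return young;
}

/* Report whether any entry was accessed, without modifying anything. */
static int test_age_sptes(const uint64_t *sptes, size_t n)
{
	assert(sptes != NULL);
	for (size_t i = 0; i < n; i++) {
		if (sptes[i] & shadow_accessed_mask)
			return 1;
	}
	return 0;
}
```

With the masks disabled, both functions in this sketch report "not young" and leave the entries untouched, which is why the real code needs a separate path for hardware without A/D bits.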
[PATCH v2 2/4] Add parameter to control A/D bits support
Add a kernel module parameter to control A/D bits support; it is on by default.

Signed-off-by: Haitao Shan haitao.s...@intel.com
Signed-off-by: Xudong Hao xudong@intel.com
---
 arch/x86/kvm/vmx.c | 12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3062ea9..f3858bf 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -71,6 +71,9 @@ static bool __read_mostly enable_unrestricted_guest = 1;
 module_param_named(unrestricted_guest,
 			enable_unrestricted_guest, bool, S_IRUGO);

+static bool __read_mostly enable_ept_ad_bits = 1;
+module_param_named(eptad, enable_ept_ad_bits, bool, S_IRUGO);
+
 static bool __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);

@@ -786,6 +789,11 @@ static inline bool cpu_has_vmx_ept_4levels(void)
 	return vmx_capability.ept & VMX_EPT_PAGE_WALK_4_BIT;
 }

+static inline bool cpu_has_vmx_ept_ad_bits(void)
+{
+	return vmx_capability.ept & VMX_EPT_AD_BIT;
+}
+
 static inline bool cpu_has_vmx_invept_individual_addr(void)
 {
 	return vmx_capability.ept & VMX_EPT_EXTENT_INDIVIDUAL_BIT;
@@ -2624,8 +2632,12 @@ static __init int hardware_setup(void)
 	    !cpu_has_vmx_ept_4levels()) {
 		enable_ept = 0;
 		enable_unrestricted_guest = 0;
+		enable_ept_ad_bits = 0;
 	}

+	if (!cpu_has_vmx_ept_ad_bits())
+		enable_ept_ad_bits = 0;
+
 	if (!cpu_has_vmx_unrestricted_guest())
 		enable_unrestricted_guest = 0;

--
1.5.6
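The gating logic in hardware_setup() can be read as "requested by the user AND supported by the hardware". Here is a standalone sketch of that resolution step, under some stated assumptions: `struct ept_features` and `resolve_ept_features` are hypothetical names, only the 4-level page-walk check is modelled for EPT itself, and the value of VMX_EPT_PAGE_WALK_4_BIT (bit 6) is not shown in this thread and is assumed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability bits; VMX_EPT_AD_BIT is from patch 1/4, the other is assumed. */
#define VMX_EPT_PAGE_WALK_4_BIT (1ull << 6)
#define VMX_EPT_AD_BIT          (1ull << 21)

/* What the user asked for on input, what will actually be used on output. */
struct ept_features {
	bool ept;
	bool ad_bits;
};

/* Clear every requested feature that the capability word does not report. */
static struct ept_features resolve_ept_features(uint64_t ept_cap,
						struct ept_features requested)
{
	struct ept_features f = requested;

	if (!(ept_cap & VMX_EPT_PAGE_WALK_4_BIT)) {
		/* No usable EPT, so the A/D bits cannot be used either. */
		f.ept = false;
		f.ad_bits = false;
	}

	if (!(ept_cap & VMX_EPT_AD_BIT))
		f.ad_bits = false;

	return f;
}
```

Since the parameter is registered as `eptad` with S_IRUGO, it is read-only after load; the user-visible way to turn the feature off would be at module load time, and the resolution above only ever turns features off, never on.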
[PATCH v2 3/4] Enable EPT A/D bits if supported by turning on relevant bit in EPTP
Enable EPT A/D bits if the processor supports them, by setting the relevant bit in the EPTP.

Signed-off-by: Haitao Shan haitao.s...@intel.com
Signed-off-by: Xudong Hao xudong@intel.com
---
 arch/x86/kvm/vmx.c | 2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f3858bf..e8003b6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3018,6 +3018,8 @@ static u64 construct_eptp(unsigned long root_hpa)
 	/* TODO write the value reading from MSR */
 	eptp = VMX_EPT_DEFAULT_MT |
 		VMX_EPT_DEFAULT_GAW << VMX_EPT_GAW_EPTP_SHIFT;
+	if (enable_ept_ad_bits)
+		eptp |= VMX_EPT_AD_ENABLE_BIT;
 	eptp |= (root_hpa & PAGE_MASK);

 	return eptp;

--
1.5.6
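For a concrete feel of the resulting EPTP value, here is a userspace sketch of the same computation. It takes the enable flag as a parameter instead of reading a global, and it assumes a default guest address width of 3 (VMX_EPT_DEFAULT_GAW is used by the patch, but its value is not shown in this thread) and 4 KiB pages for PAGE_MASK. The other constants are taken from patch 1/4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VMX_EPT_DEFAULT_MT      0x6ull       /* write-back memory type */
#define VMX_EPT_DEFAULT_GAW     3ull         /* assumed: 4-level walk minus one */
#define VMX_EPT_GAW_EPTP_SHIFT  3
#define VMX_EPT_AD_ENABLE_BIT   (1ull << 6)
#define PAGE_MASK               (~0xfffull)  /* assumed: 4 KiB pages */

/* Build an EPTP from the root table address and the A/D enable flag. */
static uint64_t construct_eptp(uint64_t root_hpa, bool enable_ept_ad_bits)
{
	uint64_t eptp;

	eptp = VMX_EPT_DEFAULT_MT |
	       (VMX_EPT_DEFAULT_GAW << VMX_EPT_GAW_EPTP_SHIFT);
	if (enable_ept_ad_bits)
		eptp |= VMX_EPT_AD_ENABLE_BIT;
	eptp |= (root_hpa & PAGE_MASK);   /* low 12 bits carry the flags */

	return eptp;
}
```

Under these assumptions, a root table at 0x12345000 gives 0x1234501e without A/D bits and 0x1234505e with them; the only difference is bit 6.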