Re: dirty page tracking in kvm/qemu -- page faults inevitable?
On 07/30/2014 12:09 AM, Xiao Guangrong wrote:
> On 07/30/2014 06:12 AM, Chris Friesen wrote:
>> Hi,
>>
>> I've got an issue where we're hitting major performance penalties while
>> doing live migration, and it seems like it might be due to page faults
>> triggering hypervisor exits, and then we get stuck waiting for the
>> iothread lock which is held by the qemu dirty page scanning code.
>
> I am afraid that using dirty-bit instead of write-protection may make the
> case even worse for iothread-lock because we need to walk whole sptes to get
> dirty-set pages, however currently we only need to walk the page set in the
> bitmap.

I found a document at
"http://ftp.software-sources.co.il/Processor_Architecture_Update-Bob_Valentine.pdf"
which talks about the benefits of Haswell. One of the items reads:

"New Accessed and Dirty bits for Extended Page Tables (EPT) eliminates
major cause of vmexits"

Is that accurate? If so, then it seems like it should allow for the VM
to run without trying to exit the hypervisor, and as long as it just
does in-memory operations it won't contend on the iothread lock.

Chris
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr
These are not specific to e500hv but applicable for bookehv
(as per comment from Scott Wood on my patch
"kvm: ppc: bookehv: Added wrapper macros for shadow registers").

Signed-off-by: Bharat Bhushan
---
 arch/powerpc/include/asm/kvm_ppc.h | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index cbee453..2ae2897 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -540,16 +540,16 @@ static inline bool kvmppc_shared_big_endian(struct kvm_vcpu *vcpu)
 #endif
 }
 
-#define SPRNG_WRAPPER_GET(reg, e500hv_spr) \
+#define SPRNG_WRAPPER_GET(reg, bookehv_spr) \
 static inline ulong kvmppc_get_##reg(struct kvm_vcpu *vcpu) \
 { \
-	return mfspr(e500hv_spr); \
+	return mfspr(bookehv_spr); \
 } \
 
-#define SPRNG_WRAPPER_SET(reg, e500hv_spr) \
+#define SPRNG_WRAPPER_SET(reg, bookehv_spr) \
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, ulong val) \
 { \
-	mtspr(e500hv_spr, val); \
+	mtspr(bookehv_spr, val); \
 } \
 
 #define SHARED_WRAPPER_GET(reg, size) \
@@ -574,18 +574,18 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val) \
 	SHARED_WRAPPER_GET(reg, size) \
 	SHARED_WRAPPER_SET(reg, size) \
 
-#define SPRNG_WRAPPER(reg, e500hv_spr) \
-	SPRNG_WRAPPER_GET(reg, e500hv_spr) \
-	SPRNG_WRAPPER_SET(reg, e500hv_spr) \
+#define SPRNG_WRAPPER(reg, bookehv_spr) \
+	SPRNG_WRAPPER_GET(reg, bookehv_spr) \
+	SPRNG_WRAPPER_SET(reg, bookehv_spr) \
 
 #ifdef CONFIG_KVM_BOOKE_HV
-#define SHARED_SPRNG_WRAPPER(reg, size, e500hv_spr) \
-	SPRNG_WRAPPER(reg, e500hv_spr) \
+#define SHARED_SPRNG_WRAPPER(reg, size, bookehv_spr) \
+	SPRNG_WRAPPER(reg, bookehv_spr) \
 
 #else
-#define SHARED_SPRNG_WRAPPER(reg, size, e500hv_spr) \
+#define SHARED_SPRNG_WRAPPER(reg, size, bookehv_spr) \
 	SHARED_WRAPPER(reg, size) \
 
 #endif
-- 
1.9.3
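For readers unfamiliar with the wrapper macros touched above, here is a minimal userspace sketch of the same token-pasting pattern. It is an illustration only: `struct demo_vcpu` and the `demo_*` names are hypothetical, and plain struct fields stand in for the `mfspr()`/`mtspr()` accesses that the real macros perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vcpu; the real struct kvm_vcpu is far larger. */
struct demo_vcpu {
	uint64_t sprg0;
	uint64_t sprg1;
};

/* "##" pastes the register name into the function name: demo_get_sprg0(). */
#define DEMO_WRAPPER_GET(reg)						\
static inline uint64_t demo_get_##reg(const struct demo_vcpu *vcpu)	\
{									\
	assert(vcpu != NULL);						\
	return vcpu->reg;						\
}

/* Same idea for the setter: demo_set_sprg0(). */
#define DEMO_WRAPPER_SET(reg)						\
static inline void demo_set_##reg(struct demo_vcpu *vcpu, uint64_t val)	\
{									\
	assert(vcpu != NULL);						\
	vcpu->reg = val;						\
}

/* One invocation emits both accessors, as SPRNG_WRAPPER() does in the patch. */
#define DEMO_WRAPPER(reg)	\
	DEMO_WRAPPER_GET(reg)	\
	DEMO_WRAPPER_SET(reg)

DEMO_WRAPPER(sprg0)
DEMO_WRAPPER(sprg1)
```

After expansion, `demo_set_sprg0(&v, 42)` followed by `demo_get_sprg0(&v)` yields 42, and `sprg1` gets its own independent pair of accessors. The macro parameter name (`e500hv_spr` vs. `bookehv_spr` in the patch) never reaches the generated code, which is why the rename is purely cosmetic.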
Re: [PATCH] KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr
On 30.07.14 11:33, Bharat Bhushan wrote:
> These are not specific to e500hv but applicable for bookehv
> (as per comment from Scott Wood on my patch
> "kvm: ppc: bookehv: Added wrapper macros for shadow registers").
>
> Signed-off-by: Bharat Bhushan

Thanks, applied to kvm-ppc-queue.

Alex
Re: [PATCH v2] kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform
On Fri, Jul 25 2014 at 4:29:12 pm BST, Will Deacon wrote:
> If the physical address of GICV isn't page-aligned, then we end up
> creating a stage-2 mapping of the page containing it, which causes us to
> map neighbouring memory locations directly into the guest.
>
> As an example, consider a platform with GICV at physical 0x2c02f000
> running a 64k-page host kernel. If qemu maps this into the guest at
> 0x8001, then guest physical addresses 0x8001 - 0x8001efff will
> map host physical region 0x2c02 - 0x2c02efff. Accesses to these
> physical regions may cause UNPREDICTABLE behaviour, for example, on the
> Juno platform this will cause an SError exception to EL3, which brings
> down the entire physical CPU resulting in RCU stalls / HYP panics / host
> crashing / wasted weeks of debugging.
>
> SBSA recommends that systems alias the 4k GICV across the bounding 64k
> region, in which case GICV physical could be described as 0x2c02 in
> the above scenario.
>
> This patch fixes the problem by failing the vgic probe if the physical
> base address or the size of GICV aren't page-aligned. Note that this
> generated a warning in dmesg about freeing enabled IRQs, so I had to
> move the IRQ enabling later in the probe.
>
> Cc: Christoffer Dall
> Cc: Marc Zyngier
> Cc: Gleb Natapov
> Cc: Paolo Bonzini
> Cc: Joel Schopp
> Cc: Don Dutile
> Acked-by: Peter Maydell
> Signed-off-by: Will Deacon

Looks good to me:

Acked-by: Marc Zyngier

Christoffer, can you please take this as an urgent fix?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny.
[PATCH] KVM: nVMX: nested TPR shadow/threshold emulation
This patch fixes bug https://bugzilla.kernel.org/show_bug.cgi?id=61411

The TPR shadow/threshold feature is important for speeding up Windows
guests. It is also a required feature for certain VMMs.

We map the virtual APIC page address and the TPR threshold from the L1
VMCS. If a TPR_BELOW_THRESHOLD VM exit is triggered by the L2 guest and
L1 is interested in it, we inject it into the L1 VMM for handling.

Signed-off-by: Wanpeng Li
---
 arch/x86/kvm/vmx.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a3845b8..f60846c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2331,7 +2331,7 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 		CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING |
 		CPU_BASED_USE_IO_BITMAPS | CPU_BASED_MONITOR_EXITING |
 		CPU_BASED_RDPMC_EXITING | CPU_BASED_RDTSC_EXITING |
-		CPU_BASED_PAUSE_EXITING |
+		CPU_BASED_PAUSE_EXITING | CPU_BASED_TPR_SHADOW |
 		CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
 	/*
 	 * We can allow some features even when not supported by the
@@ -6937,7 +6937,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 	case EXIT_REASON_MCE_DURING_VMENTRY:
 		return 0;
 	case EXIT_REASON_TPR_BELOW_THRESHOLD:
-		return 1;
+		return nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW);
 	case EXIT_REASON_APIC_ACCESS:
 		return nested_cpu_has2(vmcs12,
 			SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
@@ -7058,6 +7058,9 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 
 static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 {
+	if (is_guest_mode(vcpu))
+		return;
+
 	if (irr == -1 || tpr < irr) {
 		vmcs_write32(TPR_THRESHOLD, 0);
 		return;
@@ -7962,14 +7965,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 			if (!vmx->rdtscp_enabled)
 				exec_control &= ~SECONDARY_EXEC_RDTSCP;
 			/* Take the following fields only from vmcs12 */
-			exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
-					  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
+			exec_control &= ~(SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
 					  SECONDARY_EXEC_APIC_REGISTER_VIRT);
 			if (nested_cpu_has(vmcs12,
 					CPU_BASED_ACTIVATE_SECONDARY_CONTROLS))
 				exec_control |= vmcs12->secondary_vm_exec_control;
 
 			if (exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) {
+				struct page *virtual_apic_page;
 				/*
 				 * Translate L1 physical address to host physical
 				 * address for vmcs02. Keep the page pinned, so this
@@ -7992,6 +7995,15 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 				else
 					vmcs_write64(APIC_ACCESS_ADDR,
 						page_to_phys(vmx->nested.apic_access_page));
+
+				virtual_apic_page = nested_get_page(vcpu,
+					vmcs12->virtual_apic_page_addr);
+				if (vmcs_read64(VIRTUAL_APIC_PAGE_ADDR) !=
+					page_to_phys(virtual_apic_page))
+					vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
+						page_to_phys(virtual_apic_page));
+				nested_release_page(virtual_apic_page);
+
 			} else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
 				exec_control |=
 					SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
@@ -8002,6 +8014,8 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 		vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
 	}
 
+	if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
+		vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
 	/*
 	 * Set host-state according to L0's settings (vmcs12 is irrelevant here)
-- 
1.9.1
[PATCH] kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform
From: Will Deacon

If the physical address of GICV isn't page-aligned, then we end up
creating a stage-2 mapping of the page containing it, which causes us to
map neighbouring memory locations directly into the guest.

As an example, consider a platform with GICV at physical 0x2c02f000
running a 64k-page host kernel. If qemu maps this into the guest at
0x8001, then guest physical addresses 0x8001 - 0x8001efff will
map host physical region 0x2c02 - 0x2c02efff. Accesses to these
physical regions may cause UNPREDICTABLE behaviour, for example, on the
Juno platform this will cause an SError exception to EL3, which brings
down the entire physical CPU resulting in RCU stalls / HYP panics / host
crashing / wasted weeks of debugging.

SBSA recommends that systems alias the 4k GICV across the bounding 64k
region, in which case GICV physical could be described as 0x2c02 in
the above scenario.

This patch fixes the problem by failing the vgic probe if the physical
base address or the size of GICV aren't page-aligned. Note that this
generated a warning in dmesg about freeing enabled IRQs, so I had to
move the IRQ enabling later in the probe.

Cc: Christoffer Dall
Cc: Marc Zyngier
Cc: Gleb Natapov
Cc: Paolo Bonzini
Cc: Joel Schopp
Cc: Don Dutile
Acked-by: Peter Maydell
Acked-by: Joel Schopp
Acked-by: Marc Zyngier
Signed-off-by: Will Deacon
Signed-off-by: Christoffer Dall
---
 virt/kvm/arm/vgic.c | 24
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 56ff9be..476d3bf 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1526,17 +1526,33 @@ int kvm_vgic_hyp_init(void)
 		goto out_unmap;
 	}
 
-	kvm_info("%s@%llx IRQ%d\n", vgic_node->name,
-		 vctrl_res.start, vgic_maint_irq);
-	on_each_cpu(vgic_init_maintenance_interrupt, NULL, 1);
-
 	if (of_address_to_resource(vgic_node, 3, &vcpu_res)) {
 		kvm_err("Cannot obtain VCPU resource\n");
 		ret = -ENXIO;
 		goto out_unmap;
 	}
+
+	if (!PAGE_ALIGNED(vcpu_res.start)) {
+		kvm_err("GICV physical address 0x%llx not page aligned\n",
+			(unsigned long long)vcpu_res.start);
+		ret = -ENXIO;
+		goto out_unmap;
+	}
+
+	if (!PAGE_ALIGNED(resource_size(&vcpu_res))) {
+		kvm_err("GICV size 0x%llx not a multiple of page size 0x%lx\n",
+			(unsigned long long)resource_size(&vcpu_res),
+			PAGE_SIZE);
+		ret = -ENXIO;
+		goto out_unmap;
+	}
+
 	vgic_vcpu_base = vcpu_res.start;
 
+	kvm_info("%s@%llx IRQ%d\n", vgic_node->name,
+		 vctrl_res.start, vgic_maint_irq);
+	on_each_cpu(vgic_init_maintenance_interrupt, NULL, 1);
+
 	goto out;
 
 out_unmap:
-- 
2.0.0
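To make the alignment arithmetic behind the two new probe checks concrete, here is a small userspace sketch. The helper names (`page_base`, `is_page_aligned`, `gicv_region_ok`) are made up for this illustration and are not kernel APIs; the page size is passed in explicitly instead of coming from `PAGE_SIZE`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Start of the page containing addr; page_size must be a power of two. */
static inline uint64_t page_base(uint64_t addr, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return addr & ~(page_size - 1);
}

/* True if value is a multiple of page_size. */
static inline bool is_page_aligned(uint64_t value, uint64_t page_size)
{
	return page_base(value, page_size) == value;
}

/*
 * Mirrors the two checks added to the probe: both the GICV base and its
 * size have to be page multiples, otherwise a stage-2 mapping would expose
 * neighbouring physical memory to the guest.
 */
static inline bool gicv_region_ok(uint64_t base, uint64_t size,
				  uint64_t page_size)
{
	return is_page_aligned(base, page_size) &&
	       is_page_aligned(size, page_size);
}
```

With 64 KiB pages, `page_base(0x2c02f000, 0x10000)` is `0x2c020000`, so a 4 KiB GICV at `0x2c02f000` sits 0xf000 bytes into its page and `gicv_region_ok()` rejects it. The same region passes on a 4 KiB-page host.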
Re: [PATCH v2] kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform
On Wed, Jul 30, 2014 at 11:47:40AM +0100, Marc Zyngier wrote:
> On Fri, Jul 25 2014 at 4:29:12 pm BST, Will Deacon wrote:
> > If the physical address of GICV isn't page-aligned, then we end up
> > creating a stage-2 mapping of the page containing it, which causes us to
> > map neighbouring memory locations directly into the guest.
> >
> > As an example, consider a platform with GICV at physical 0x2c02f000
> > running a 64k-page host kernel. If qemu maps this into the guest at
> > 0x8001, then guest physical addresses 0x8001 - 0x8001efff will
> > map host physical region 0x2c02 - 0x2c02efff. Accesses to these
> > physical regions may cause UNPREDICTABLE behaviour, for example, on the
> > Juno platform this will cause an SError exception to EL3, which brings
> > down the entire physical CPU resulting in RCU stalls / HYP panics / host
> > crashing / wasted weeks of debugging.
> >
> > SBSA recommends that systems alias the 4k GICV across the bounding 64k
> > region, in which case GICV physical could be described as 0x2c02 in
> > the above scenario.
> >
> > This patch fixes the problem by failing the vgic probe if the physical
> > base address or the size of GICV aren't page-aligned. Note that this
> > generated a warning in dmesg about freeing enabled IRQs, so I had to
> > move the IRQ enabling later in the probe.
> >
> > Cc: Christoffer Dall
> > Cc: Marc Zyngier
> > Cc: Gleb Natapov
> > Cc: Paolo Bonzini
> > Cc: Joel Schopp
> > Cc: Don Dutile
> > Acked-by: Peter Maydell
> > Signed-off-by: Will Deacon
>
> Looks good to me:
>
> Acked-by: Marc Zyngier
>
> Christoffer, can you please take this as an urgent fix?
>

Yes, sorry for the delay,

Applied to master and notified the KVM guys to try and get it into 3.16.

Thanks,
-Christoffer
[GIT PULL] KVM/ARM Urgent fix for 3.16
Hi Paolo and Gleb,

Is there any chance you can get this urgent fix (which allows KVM guest
to bring down the entire system on some 64K enabled ARM64 hosts) merged
for 3.16?

The following changes since commit bb18b526a9d8d4a3fe56f234d5013b9f6036978d:

  Merge tag 'signed-for-3.16' of git://github.com/agraf/linux-2.6 into kvm-master (2014-07-08 12:08:58 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git tags/kvm-arm-for-3.16-rc7

for you to fetch changes up to 63afbe7a0ac184ef8485dac4914e87b211b5bfaa:

  kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform (2014-07-30 14:35:42 +0200)

---
Will Deacon (1):
      kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform

 virt/kvm/arm/vgic.c | 24
 1 file changed, 20 insertions(+), 4 deletions(-)
[PATCH] KVM: PPC: HV: Remove generic instruction emulation
Now that we have properly split load/store instruction emulation and generic
instruction emulation, we can move the generic one from kvm.ko to kvm-pr.ko
on book3s_64.

This reduces the attack surface and amount of code loaded on HV KVM kernels.

Signed-off-by: Alexander Graf
---
 arch/powerpc/kvm/Makefile   |  2 +-
 arch/powerpc/kvm/trace_pr.h | 20
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 1ccd7a1..2d590de 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -48,6 +48,7 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) := \
 
 kvm-pr-y := \
 	fpu.o \
+	emulate.o \
 	book3s_paired_singles.o \
 	book3s_pr.o \
 	book3s_pr_papr.o \
@@ -91,7 +92,6 @@ kvm-book3s_64-module-objs += \
 	$(KVM)/kvm_main.o \
 	$(KVM)/eventfd.o \
 	powerpc.o \
-	emulate.o \
 	emulate_loadstore.o \
 	book3s.o \
 	book3s_64_vio.o \
diff --git a/arch/powerpc/kvm/trace_pr.h b/arch/powerpc/kvm/trace_pr.h
index e1357cd..a674f09 100644
--- a/arch/powerpc/kvm/trace_pr.h
+++ b/arch/powerpc/kvm/trace_pr.h
@@ -291,6 +291,26 @@ TRACE_EVENT(kvm_unmap_hva,
 	TP_printk("unmap hva 0x%lx\n", __entry->hva)
 );
 
+TRACE_EVENT(kvm_ppc_instr,
+	TP_PROTO(unsigned int inst, unsigned long _pc, unsigned int emulate),
+	TP_ARGS(inst, _pc, emulate),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, inst)
+		__field(unsigned long, pc)
+		__field(unsigned int, emulate)
+	),
+
+	TP_fast_assign(
+		__entry->inst = inst;
+		__entry->pc = _pc;
+		__entry->emulate = emulate;
+	),
+
+	TP_printk("inst %u pc 0x%lx emulate %u\n",
+		  __entry->inst, __entry->pc, __entry->emulate)
+);
+
 #endif /* _TRACE_KVM_H */
 
 /* This part must be outside protection */
-- 
1.8.1.4
[GIT PULL] Final KVM change for 3.16
Linus,

The following changes since commit bb18b526a9d8d4a3fe56f234d5013b9f6036978d:

  Merge tag 'signed-for-3.16' of git://github.com/agraf/linux-2.6 into kvm-master (2014-07-08 12:08:58 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to 63afbe7a0ac184ef8485dac4914e87b211b5bfaa:

  kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform (2014-07-30 14:35:42 +0200)

Fix a bug which allows KVM guests to bring down the entire system on some
64K enabled ARM64 hosts.

Will Deacon (1):
      kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform

 virt/kvm/arm/vgic.c | 24
 1 file changed, 20 insertions(+), 4 deletions(-)
Re: [GIT PULL] KVM/ARM Urgent fix for 3.16
On 30/07/2014 14:55, Christoffer Dall wrote:
> Hi Paolo and Gleb,
>
> Is there any chance you can get this urgent fix (which allows KVM guest
> to bring down the entire system on some 64K enabled ARM64 hosts) merged
> for 3.16?
>
> The following changes since commit bb18b526a9d8d4a3fe56f234d5013b9f6036978d:
>
>   Merge tag 'signed-for-3.16' of git://github.com/agraf/linux-2.6 into kvm-master (2014-07-08 12:08:58 +0200)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git tags/kvm-arm-for-3.16-rc7
>
> for you to fetch changes up to 63afbe7a0ac184ef8485dac4914e87b211b5bfaa:
>
>   kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform (2014-07-30 14:35:42 +0200)
>
> ---
> Will Deacon (1):
>       kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform
>
>  virt/kvm/arm/vgic.c | 24
>  1 file changed, 20 insertions(+), 4 deletions(-)
>

I think Gleb is on vacation now, but unfortunately I've already had
enough this year. I re-sent the pull request from
git://git.kernel.org/pub/scm/virt/kvm/kvm.git, even though you had CCed
Linus here already.

Paolo
Re: [PATCH 2/3] watchdog: control hard lockup detection default
On Fri, Jul 25, 2014 at 01:25:11PM +0200, Andrew Jones wrote:
> > to enable hard lockup detection explicitly.
> >
> > I think changing the 'watchdog_thresh' while 'watchdog_running' is true
> > should _not_ enable hard lockup detection as a side-effect, because a
> > user may have a 'sysctl.conf' entry such as
> >
> >    kernel.watchdog_thresh = ...
> >
> > or may only want to change the 'watchdog_thresh' on the fly.
> >
> > I think the following flow of execution could cause such undesired
> > side-effect.
> >
> >    proc_dowatchdog
> >      if (watchdog_user_enabled && watchdog_thresh) {
> >
> >          watchdog_enable_hardlockup_detector
> >            hardlockup_detector_enabled = true
> >
> >          watchdog_enable_all_cpus
> >            if (!watchdog_running) {
> >                ...
> >            } else if (sample_period_changed)
> >                update_timers_all_cpus
> >                  for_each_online_cpu
> >                    update_timers
> >                      watchdog_nmi_disable
> >                      ...
> >                      watchdog_nmi_enable
> >
> >                        watchdog_hardlockup_detector_is_enabled
> >                          return true
> >
> >                        enable perf counter for hard lockup detection
> >
> > Regards,
> >
> > Uli
>
> Nice catch. Looks like this will need a v2. Paolo, do we have a
> consensus on the proc echoing? Or should that be revisited in the v2 as
> well?

As discussed privately, how about something like this to handle that case:
(applied on top of these patches)

Cheers,
Don

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 34eca29..027fb6c 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -666,7 +666,12 @@ int proc_dowatchdog(struct ctl_table *table, int write,
 	 * watchdog_*_all_cpus() function takes care of this.
 	 */
 	if (watchdog_user_enabled && watchdog_thresh) {
-		watchdog_enable_hardlockup_detector(true);
+		/*
+		 * Prevent a change in watchdog_thresh accidentally overriding
+		 * the enablement of the hardlockup detector.
+		 */
+		if (watchdog_user_enabled != old_enabled)
+			watchdog_enable_hardlockup_detector(true);
 		err = watchdog_enable_all_cpus(old_thresh != watchdog_thresh);
 	} else
 		watchdog_disable_all_cpus();
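The effect of the proposed check can be modelled in a few lines of userspace C. This is only a sketch of the decision logic in the hunk above: `struct wd_state` and `wd_write()` are hypothetical names, and the per-CPU timer and NMI handling done by `watchdog_enable_all_cpus()` is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the sysctl state; not the kernel's real variables. */
struct wd_state {
	bool user_enabled;        /* models watchdog_user_enabled */
	int thresh;               /* models watchdog_thresh */
	bool hardlockup_enabled;  /* models hardlockup_detector_enabled */
};

/* Models one write to the watchdog sysctls with the proposed check applied. */
static inline void wd_write(struct wd_state *s, bool new_enabled,
			    int new_thresh)
{
	assert(s != NULL);

	bool old_enabled = s->user_enabled;

	s->user_enabled = new_enabled;
	s->thresh = new_thresh;

	if (s->user_enabled && s->thresh) {
		/* Only a change of the enable flag turns the detector on. */
		if (s->user_enabled != old_enabled)
			s->hardlockup_enabled = true;
	}
}
```

Starting from an enabled watchdog with the hard lockup detector off, a write that only changes the threshold leaves the detector off, while a disable followed by an enable turns it on.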
Re: [PATCH 2/3] watchdog: control hard lockup detection default
On 30/07/2014 15:43, Don Zickus wrote:
>> > Nice catch. Looks like this will need a v2. Paolo, do we have a
>> > consensus on the proc echoing? Or should that be revisited in the v2 as
>> > well?
> As discussed privately, how about something like this to handle that case:
> (applied on top of these patches)

Don, what do you think about proc?

My opinion is still what I mentioned earlier in the thread, i.e. that if
the file says "1", writing "0" and then "1" should not constitute a
change with respect to the initial state.

Paolo
Re: [PATCH] KVM: nVMX: nested TPR shadow/threshold emulation
On 30/07/2014 14:04, Wanpeng Li wrote:
> @@ -7962,14 +7965,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  			if (!vmx->rdtscp_enabled)
>  				exec_control &= ~SECONDARY_EXEC_RDTSCP;
>  			/* Take the following fields only from vmcs12 */
> -			exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
> -					  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
> +			exec_control &= ~(SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
>  					  SECONDARY_EXEC_APIC_REGISTER_VIRT);

This change is wrong. You don't have to take L0's "virtualize APIC
accesses" setting into account, because while running L2 you cannot
modify L1's CR8 (only the virtual nested one).

> +
> +				virtual_apic_page = nested_get_page(vcpu,
> +					vmcs12->virtual_apic_page_addr);
> +				if (vmcs_read64(VIRTUAL_APIC_PAGE_ADDR) !=
> +					page_to_phys(virtual_apic_page))
> +					vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
> +						page_to_phys(virtual_apic_page));
> +				nested_release_page(virtual_apic_page);
> +

You cannot release this page here. You need to do exactly the same
thing that is done for apic_access_page.

One thing:

> +	if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
> +		vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);

I think you can just do this write unconditionally, since most
hypervisors will enable this. Also, you probably can add the tpr
threshold field to the read-write fields for shadow VMCS.

Paolo
Re: [PATCH] KVM: vmx: remove duplicate vmx_mpx_supported()
On 29/07/2014 23:14, Chris J Arges wrote:
> Remove a function which was added by both 93c4adc7afe and 36be0b9deb2.
>
> Signed-off-by: Chris J Arges
> ---
>  arch/x86/kvm/vmx.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 801332e..c4ea039 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -740,7 +740,6 @@ static u32 vmx_segment_access_rights(struct kvm_segment *var);
>  static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
>  static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
>  static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
> -static bool vmx_mpx_supported(void);
>
>  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
>  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
>

Thanks, applying.

Paolo
Re: dirty page tracking in kvm/qemu -- page faults inevitable?
On 30/07/2014 09:41, Chris Friesen wrote:
>> I am afraid that using dirty-bit instead of write-protection may make the
>> case even worse for iothread-lock because we need to walk whole sptes to get
>> dirty-set pages, however currently we only need to walk the page set in the
>> bitmap.
>
> I found a document at
> "http://ftp.software-sources.co.il/Processor_Architecture_Update-Bob_Valentine.pdf"
> which talks about the benefits of Haswell. One of the items reads:
>
> "New Accessed and Dirty bits for Extended Page Tables (EPT) eliminates
> major cause of vmexits"
>
> Is that accurate? If so, then it seems like it should allow for the VM
> to run without trying to exit the hypervisor, and as long as it just
> does in-memory operations it won't contend on the iothread lock.

True, but:

1) the problem is fishing the information out of the page tables and
passing it up to userspace. You have to process the whole EPT tree one
page at a time, instead of doing it 64 bits at a time. Also, one source
of bad performance is having to split all entries of the EPT page tables
down to 4K, and you get that anyway.

2) You should not get to userspace simply for marking a page as locked.
As you describe it, your problem seems to be contention between QEMU
threads, KVM is not involved.

3) What version of QEMU are you using? Things have been improving
steadily, and we probably will get to using atomic operations instead of
the iothread lock to protect the migration dirty bitmap.

Paolo
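To illustrate what "64 bits at a time" buys in point 1, here is a rough userspace sketch of walking a dirty bitmap word by word. It is not QEMU or KVM code; `lowest_set_bit()` and `collect_dirty_pages()` are names invented for this example, and one bit per page is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the lowest set bit; word must be non-zero. */
static inline unsigned lowest_set_bit(uint64_t word)
{
	unsigned idx = 0;

	assert(word != 0);
	while ((word & 1) == 0) {
		word >>= 1;
		idx++;
	}
	return idx;
}

/*
 * Scan a bitmap with one bit per page. Stores up to max_out dirty page
 * numbers in out and returns the total number of dirty pages found.
 */
static inline size_t collect_dirty_pages(const uint64_t *bitmap,
					 size_t nwords,
					 uint64_t *out, size_t max_out)
{
	size_t found = 0;

	for (size_t w = 0; w < nwords; w++) {
		uint64_t word = bitmap[w];

		/* A zero word rules out 64 clean pages with one comparison. */
		while (word != 0) {
			unsigned bit = lowest_set_bit(word);

			if (found < max_out)
				out[found] = (uint64_t)w * 64 + bit;
			found++;
			word &= word - 1;	/* clear the lowest set bit */
		}
	}
	return found;
}
```

When few pages are dirty, most words are zero and cost a single comparison each, whereas reading dirty bits out of page tables has to visit every entry individually.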
Re: dirty page tracking in kvm/qemu -- page faults inevitable?
On 07/30/2014 09:42 AM, Paolo Bonzini wrote:
> On 30/07/2014 09:41, Chris Friesen wrote:
>> I found a document at
>> "http://ftp.software-sources.co.il/Processor_Architecture_Update-Bob_Valentine.pdf"
>> which talks about the benefits of Haswell. One of the items reads:
>>
>> "New Accessed and Dirty bits for Extended Page Tables (EPT) eliminates
>> major cause of vmexits"
>>
>> Is that accurate? If so, then it seems like it should allow for the VM
>> to run without trying to exit the hypervisor, and as long as it just
>> does in-memory operations it won't contend on the iothread lock.

> 2) You should not get to userspace simply for marking a page as locked.
> As you describe it, your problem seems to be contention between QEMU
> threads, KVM is not involved.

What about writing to a page where we're tracking dirty pages? Would
that get back up to qemu or would that be handled entirely in the kvm
kernel module?

I was assuming that it was due to the page faults since as far as I know
the app in the VM is just doing packet processing from/to memory-mapped
circular buffers--the qemu threads in question aren't doing "normal" I/O
but something is causing them to try to acquire the iothread lock.

> 3) What version of QEMU are you using? Things have been improving
> steadily, and we probably will get to using atomic operations instead of
> the iothread lock to protect the migration dirty bitmap.

We're currently on 1.4.2. We're looking at trying out 1.7 to see if
it's better, but we've got some local patches that would need to get
ported.

Chris
[PATCH] KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table
Currently, the EOI exit bitmap (used for APICv) does not include
interrupts that are masked. However, this can cause a bug that manifests
as an interrupt storm inside the guest. Alex Williamson reported the
bug and is the one who really debugged this; I only wrote the patch. :)

The scenario involves a multi-function PCI device with OHCI and EHCI
USB functions and an audio function, all assigned to the guest, where
both USB functions use legacy INTx interrupts.

As soon as the guest boots, interrupts for these devices turn into an
interrupt storm in the guest; the host does not see the interrupt storm.
Basically the EOI path does not work, and the guest continues to see the
interrupt over and over, even after it attempts to mask it at the APIC.
The bug is only visible with older kernels (RHEL6.5, based on 2.6.32
with not many changes in the area of APIC/IOAPIC handling).

Alex then tried forcing bit 59 (corresponding to the USB functions' IRQ)
on in the eoi_exit_bitmap and TMR, and things then work. What happens
is that VFIO asserts IRQ11, then KVM recomputes the EOI exit bitmap.
Bit 59 is not set in it because the RTE was masked, so the IOAPIC never
sees the EOI and the interrupt continues to fire in the guest.

Probably, the guest is masking the interrupt in the redirection table in
the interrupt routine, i.e. while the interrupt is set in a LAPIC's ISR.
The simplest fix is to ignore the masking state: we would rather have
an unnecessary exit than a missed IRQ ACK, and anyway IOAPIC interrupts
are not as performance-sensitive as, for example, MSIs.

[Thanks to Alex for his precise description of the problem
 and initial debugging effort. A lot of the text above is
 based on emails exchanged with him.]

Reported-by: Alex Williamson
Cc: sta...@vger.kernel.org
Signed-off-by: Paolo Bonzini
---
 virt/kvm/ioapic.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 2458a1dc2ba9..e8ce34c9db32 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -254,10 +254,9 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
 	spin_lock(&ioapic->lock);
 	for (index = 0; index < IOAPIC_NUM_PINS; index++) {
 		e = &ioapic->redirtbl[index];
-		if (!e->fields.mask &&
-		    (e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
-		     kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC,
-					  index) || index == RTC_GSI)) {
+		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
+		    kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index) ||
+		    index == RTC_GSI) {
 			if (kvm_apic_match_dest(vcpu, NULL, 0,
 				e->fields.dest_id, e->fields.dest_mode)) {
 				__set_bit(e->fields.vector,
-- 
1.8.3.1
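As a rough illustration of the behaviour after this change, the following userspace sketch builds a 256-bit EOI exit bitmap from a simplified redirection table. It is not the kernel code: `struct demo_rte` and `demo_scan_entries()` are hypothetical, and the notifier, `RTC_GSI` and destination-matching conditions from `kvm_ioapic_scan_entry()` are left out, so only the trigger mode decides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified redirection table entry. */
struct demo_rte {
	uint8_t vector;
	bool masked;
	bool level_triggered;
};

/* Recompute a 256-bit EOI exit bitmap (4 x 64 bits) from the table. */
static inline void demo_scan_entries(const struct demo_rte *tbl, size_t n,
				     uint64_t eoi_exit_bitmap[4])
{
	assert(tbl != NULL && eoi_exit_bitmap != NULL);
	memset(eoi_exit_bitmap, 0, 4 * sizeof(uint64_t));

	for (size_t i = 0; i < n; i++) {
		const struct demo_rte *e = &tbl[i];

		/*
		 * e->masked is deliberately not consulted: a masked entry
		 * may still have an interrupt in service that needs its EOI
		 * to reach the IOAPIC.
		 */
		if (e->level_triggered)
			eoi_exit_bitmap[e->vector / 64] |=
				UINT64_C(1) << (e->vector % 64);
	}
}
```

With a masked, level-triggered entry for vector 59, bit 59 of the first word ends up set, which matches what Alex forced by hand in the scenario described above. An edge-triggered entry contributes nothing in this simplified model.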
Re: dirty page tracking in kvm/qemu -- page faults inevitable?
Il 30/07/2014 18:02, Chris Friesen ha scritto: > On 07/30/2014 09:42 AM, Paolo Bonzini wrote: >> Il 30/07/2014 09:41, Chris Friesen ha scritto: >>> I found a document at >>> "http://ftp.software-sources.co.il/Processor_Architecture_Update-Bob_Valentine.pdf"; >>> >>> which talks about the benefits of Haswell. One of the items reads: >>> >>> "New Accessed and Dirty bits for Extended Page Tables (EPT) eliminates >>> major cause of vmexits" >>> >>> Is that accurate? If so, then it seems like it should allow for the VM >>> to run without trying to exit the hypervisor, and as long as it just >>> does in-memory operations it won't contend on the iothread lock. > >> 2) You should not get to userspace simply for marking a page as locked. >> As you describe it, your problem seems to be contention between QEMU >> threads, KVM is not involved. > > What about writing to a page where we're tracking dirty pages? Would > that get back up to qemu or would that be handled entirely in the kvm > kernel module? It's handle inside the kernel module. Every now and then QEMU asks the kernel for the dirty pages and ORs the bitmap returned by KVM with its own. All this is done under the iothread lock. > I was assuming that it was due to the page faults since as far as I know > the app in the VM is just doing packet processing from/to memory-mapped > circular buffers--the qemu threads in question aren't doing "normal" I/O > but something is causing them to try to acquire the iothread lock. > >> 3) What version of QEMU are you using? Things have been improving >> steadily, and we probably will get to using atomic operations instead of >> the iothread lock to protect the migration dirty bitmap. > > We're currently on 1.4.2. We're looking at trying out 1.7 to see if > it's better, but we've got some local patches that would need to get > ported. >From a quick "git describe" 2.0 is needed. The patches end at commit ae2810c (memory: syncronize kvm bitmap using bitmaps operations, 2013-11-05). 
Paolo
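The merge step Paolo describes (QEMU ORs the bitmap returned by KVM into its own) can be sketched roughly as below. This is only an illustration under stated assumptions: `dirty_bitmap_merge` is a hypothetical helper, not QEMU's actual function, and the locking discussed in the thread is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not QEMU's real code): OR the bitmap returned by
 * the kernel into the caller's own bitmap, one bit per guest page, and
 * return how many pages became newly dirty on the caller's side.
 */
static size_t dirty_bitmap_merge(unsigned long *own,
                                 const unsigned long *from_kvm,
                                 size_t nwords)
{
    size_t newly_dirty = 0;

    assert(own != NULL && from_kvm != NULL);
    for (size_t i = 0; i < nwords; i++) {
        /* Bits the kernel reports that were not yet set on our side. */
        unsigned long fresh = from_kvm[i] & ~own[i];

        own[i] |= from_kvm[i];
        /* Count the set bits in fresh by clearing the lowest one each time. */
        for (; fresh != 0; fresh &= fresh - 1)
            newly_dirty++;
    }
    return newly_dirty;
}
```

In the version of QEMU under discussion this kind of walk would run with the iothread lock held, which is where the contention reported in the thread would come from.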
Re: [PATCH] KVM: PPC: HV: Remove generic instruction emulation
On 30/07/2014 15:27, Alexander Graf wrote: > Now that we have properly split load/store instruction emulation and generic > instruction emulation, we can move the generic one from kvm.ko to kvm-pr.ko > on book3s_64. > > This reduces the attack surface and amount of code loaded on HV KVM kernels. Can emulation races happen on HV KVM as they can on x86? That is, one CPU writes to MMIO while another patches instructions, so that basically anything can end up in the hands of the emulator? On PPC it may even happen simply because of a missing icache invalidation, I think, since it doesn't support self-modifying code without explicit invalidation. Paolo > Signed-off-by: Alexander Graf > --- > arch/powerpc/kvm/Makefile | 2 +- > arch/powerpc/kvm/trace_pr.h | 20 > 2 files changed, 21 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile > index 1ccd7a1..2d590de 100644 > --- a/arch/powerpc/kvm/Makefile > +++ b/arch/powerpc/kvm/Makefile > @@ -48,6 +48,7 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) > := \ > > kvm-pr-y := \ > fpu.o \ > + emulate.o \ > book3s_paired_singles.o \ > book3s_pr.o \ > book3s_pr_papr.o \ > @@ -91,7 +92,6 @@ kvm-book3s_64-module-objs += \ > $(KVM)/kvm_main.o \ > $(KVM)/eventfd.o \ > powerpc.o \ > - emulate.o \ > emulate_loadstore.o \ > book3s.o \ > book3s_64_vio.o \ > diff --git a/arch/powerpc/kvm/trace_pr.h b/arch/powerpc/kvm/trace_pr.h > index e1357cd..a674f09 100644 > --- a/arch/powerpc/kvm/trace_pr.h > +++ b/arch/powerpc/kvm/trace_pr.h > @@ -291,6 +291,26 @@ TRACE_EVENT(kvm_unmap_hva, > TP_printk("unmap hva 0x%lx\n", __entry->hva) > ); > > +TRACE_EVENT(kvm_ppc_instr, > + TP_PROTO(unsigned int inst, unsigned long _pc, unsigned int emulate), > + TP_ARGS(inst, _pc, emulate), > + > + TP_STRUCT__entry( > + __field(unsigned int, inst) > + __field(unsigned long, pc ) > + __field(unsigned int, emulate ) > + ), > + > + TP_fast_assign( > + __entry->inst = inst; > + __entry->pc = _pc;
> + __entry->emulate= emulate; > + ), > + > + TP_printk("inst %u pc 0x%lx emulate %u\n", > + __entry->inst, __entry->pc, __entry->emulate) > +); > + > #endif /* _TRACE_KVM_H */ > > /* This part must be outside protection */ >
Re: [PATCH] KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table
On Wed, 2014-07-30 at 18:12 +0200, Paolo Bonzini wrote: > Currently, the EOI exit bitmap (used for APICv) does not include > interrupts that are masked. However, this can cause a bug that manifests > as an interrupt storm inside the guest. Alex Williamson reported the > bug and is the one who really debugged this; I only wrote the patch. :) > > The scenario involves a multi-function PCI device with OHCI and EHCI > USB functions and an audio function, all assigned to the guest, where > both USB functions use legacy INTx interrupts. > > As soon as the guest boots, interrupts for these devices turn into an > interrupt storm in the guest; the host does not see the interrupt storm. > Basically the EOI path does not work, and the guest continues to see the > interrupt over and over, even after it attempts to mask it at the APIC. > The bug is only visible with older kernels (RHEL6.5, based on 2.6.32 > with not many changes in the area of APIC/IOAPIC handling). > > Alex then tried forcing bit 59 (corresponding to the USB functions' IRQ) > on in the eoi_exit_bitmap and TMR, and things then work. What happens > is that VFIO asserts IRQ11, then KVM recomputes the EOI exit bitmap. > It does not have set bit 59 because the RTE was masked, so the IOAPIC > never sees the EOI and the interrupt continues to fire in the guest. > > Probably, the guest is masking the interrupt in the redirection table in > the interrupt routine, i.e. while the interrupt is set in a LAPIC's ISR. > The simplest fix is to ignore the masking state, we would rather have > an unnecessary exit rather than a missed IRQ ACK and anyway IOAPIC > interrupts are not as performance-sensitive as for example MSIs. > > [Thanks to Alex for his precise description of the problem > and initial debugging effort. A lot of the text above is > based on emails exchanged with him.] 
> > Reported-by: Alex Williamson > Cc: sta...@vger.kernel.org > Signed-off-by: Paolo Bonzini Thanks Paolo Tested-by: Alex Williamson > --- > virt/kvm/ioapic.c | 7 +++ > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c > index 2458a1dc2ba9..e8ce34c9db32 100644 > --- a/virt/kvm/ioapic.c > +++ b/virt/kvm/ioapic.c > @@ -254,10 +254,9 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 > *eoi_exit_bitmap, > spin_lock(&ioapic->lock); > for (index = 0; index < IOAPIC_NUM_PINS; index++) { > e = &ioapic->redirtbl[index]; > - if (!e->fields.mask && > - (e->fields.trig_mode == IOAPIC_LEVEL_TRIG || > - kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, > - index) || index == RTC_GSI)) { > + if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG || > + kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, > index) || > + index == RTC_GSI) { > if (kvm_apic_match_dest(vcpu, NULL, 0, > e->fields.dest_id, e->fields.dest_mode)) { > __set_bit(e->fields.vector,
Re: [PATCH 2/3] watchdog: control hard lockup detection default
On Wed, Jul 30, 2014 at 04:16:38PM +0200, Paolo Bonzini wrote: > On 30/07/2014 15:43, Don Zickus wrote: > >> > Nice catch. Looks like this will need a v2. Paolo, do we have a > >> > consensus on the proc echoing? Or should that be revisited in the v2 as > >> > well? > > As discussed privately, how about something like this to handle that case: > > (applied on top of these patches) > > Don, what do you think about proc? > > My opinion is still what I mentioned earlier in the thread, i.e. that if > the file says "1", writing "0" and then "1" should not constitute a > change WRT to the initial state. > I can agree. The problem is there are two things this proc value controls, softlockup and hardlockup. I have always tried to keep them both disabled or enabled together. This patchset tries to separate them for an edge case. Hence the proc value becomes slightly confusing. I don't know the right way to solve this without introducing more proc values. We have /proc/sys/kernel/nmi_watchdog and /proc/sys/kernel/watchdog which point to the same internal variable. Do I separate them and have 'nmi_watchdog' just mean hardlockup and 'watchdog' mean softlockup? Then we can be clear on what the output is. Or does 'watchdog' represent a superset of 'nmi_watchdog' && softlockup? That is where the confusion lies. Cheers, Don
Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register
On Wed, 2014-07-30 at 00:21 -0500, Bhushan Bharat-R65777 wrote: > > > -Original Message- > > From: Wood Scott-B07421 > > Sent: Tuesday, July 29, 2014 3:22 AM > > To: Bhushan Bharat-R65777 > > Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Yoder > > Stuart- > > B08248 > > Subject: Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest > > visible register > > > > On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote: > > > This is not used and even I do not remember why this was added in > > > first place. > > > > > > Signed-off-by: Bharat Bhushan > > > --- > > > arch/powerpc/kvm/booke.c | 2 -- > > > 1 file changed, 2 deletions(-) > > > > > > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index > > > ab62109..a5ee42c 100644 > > > --- a/arch/powerpc/kvm/booke.c > > > +++ b/arch/powerpc/kvm/booke.c > > > @@ -1804,8 +1804,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct > > > kvm_vcpu > > *vcpu, > > > kvm_guest_protect_msr(vcpu, MSR_DE, true); > > > vcpu->guest_debug = dbg->control; > > > vcpu->arch.shadow_dbg_reg.dbcr0 = 0; > > > - /* Set DBCR0_EDM in guest visible DBCR0 register. */ > > > - vcpu->arch.dbg_reg.dbcr0 = DBCR0_EDM; > > > > > > if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) > > > vcpu->arch.shadow_dbg_reg.dbcr0 |= DBCR0_IDM | DBCR0_IC; > > > > This was intended to let the guest know that the host owns the debug > > resources, > > by analogy to what a JTAG debugger would do. > > > > The Power ISA has this "Virtualized Implementation Note": > > > > It is the responsibility of the hypervisor to ensure that > > DBCR0[EDM] is consistent with usage of DEP. > > Ok, That means that if MSRP_DEP is set then set DBCR0_EDM and if MSRP_DEP is > clear then clear DBCR0_EDM, right? > We need to implement above mentioned this. We should probably clear EDM only when guest debug emulation is working and enabled (i.e. not until at least patch 6/6). 
-Scott
RE: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register
> -Original Message- > From: Wood Scott-B07421 > Sent: Wednesday, July 30, 2014 11:18 PM > To: Bhushan Bharat-R65777 > Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Yoder Stuart- > B08248 > Subject: Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest > visible register > > On Wed, 2014-07-30 at 00:21 -0500, Bhushan Bharat-R65777 wrote: > > > > > -Original Message- > > > From: Wood Scott-B07421 > > > Sent: Tuesday, July 29, 2014 3:22 AM > > > To: Bhushan Bharat-R65777 > > > Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; > > > Yoder Stuart- > > > B08248 > > > Subject: Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM > > > in guest visible register > > > > > > On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote: > > > > This is not used and even I do not remember why this was added in > > > > first place. > > > > > > > > Signed-off-by: Bharat Bhushan > > > > --- > > > > arch/powerpc/kvm/booke.c | 2 -- > > > > 1 file changed, 2 deletions(-) > > > > > > > > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c > > > > index ab62109..a5ee42c 100644 > > > > --- a/arch/powerpc/kvm/booke.c > > > > +++ b/arch/powerpc/kvm/booke.c > > > > @@ -1804,8 +1804,6 @@ int > > > > kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu > > > *vcpu, > > > > kvm_guest_protect_msr(vcpu, MSR_DE, true); > > > > vcpu->guest_debug = dbg->control; > > > > vcpu->arch.shadow_dbg_reg.dbcr0 = 0; > > > > - /* Set DBCR0_EDM in guest visible DBCR0 register. */ > > > > - vcpu->arch.dbg_reg.dbcr0 = DBCR0_EDM; > > > > > > > > if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) > > > > vcpu->arch.shadow_dbg_reg.dbcr0 |= DBCR0_IDM | DBCR0_IC; > > > > > > This was intended to let the guest know that the host owns the debug > > > resources, by analogy to what a JTAG debugger would do. 
> > > > > > The Power ISA has this "Virtualized Implementation Note": > > > > > > It is the responsibility of the hypervisor to ensure that > > > DBCR0[EDM] is consistent with usage of DEP. > > > > Ok, That means that if MSRP_DEP is set then set DBCR0_EDM and if MSRP_DEP > > is > clear then clear DBCR0_EDM, right? > > We need to implement above mentioned this. > > We should probably clear EDM only when guest debug emulation is working and > enabled (i.e. not until at least patch 6/6). But if EDM is set then guest debug emulation will not start/allowed. Thanks -Bharat > > -Scott >
Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register
On Wed, 2014-07-30 at 12:57 -0500, Bhushan Bharat-R65777 wrote: > > > -Original Message- > > From: Wood Scott-B07421 > > Sent: Wednesday, July 30, 2014 11:18 PM > > To: Bhushan Bharat-R65777 > > Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Yoder > > Stuart- > > B08248 > > Subject: Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest > > visible register > > > > On Wed, 2014-07-30 at 00:21 -0500, Bhushan Bharat-R65777 wrote: > > > > > > > -Original Message- > > > > From: Wood Scott-B07421 > > > > Sent: Tuesday, July 29, 2014 3:22 AM > > > > To: Bhushan Bharat-R65777 > > > > Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; > > > > Yoder Stuart- > > > > B08248 > > > > Subject: Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM > > > > in guest visible register > > > > > > > > On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote: > > > > > This is not used and even I do not remember why this was added in > > > > > first place. > > > > > > > > > > Signed-off-by: Bharat Bhushan > > > > > --- > > > > > arch/powerpc/kvm/booke.c | 2 -- > > > > > 1 file changed, 2 deletions(-) > > > > > > > > > > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c > > > > > index ab62109..a5ee42c 100644 > > > > > --- a/arch/powerpc/kvm/booke.c > > > > > +++ b/arch/powerpc/kvm/booke.c > > > > > @@ -1804,8 +1804,6 @@ int > > > > > kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu > > > > *vcpu, > > > > > kvm_guest_protect_msr(vcpu, MSR_DE, true); > > > > > vcpu->guest_debug = dbg->control; > > > > > vcpu->arch.shadow_dbg_reg.dbcr0 = 0; > > > > > - /* Set DBCR0_EDM in guest visible DBCR0 register. 
*/ > > > > > - vcpu->arch.dbg_reg.dbcr0 = DBCR0_EDM; > > > > > > > > > > if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) > > > > > vcpu->arch.shadow_dbg_reg.dbcr0 |= DBCR0_IDM | DBCR0_IC; > > > > > > > > This was intended to let the guest know that the host owns the debug > > > > resources, by analogy to what a JTAG debugger would do. > > > > > > > > The Power ISA has this "Virtualized Implementation Note": > > > > > > > > It is the responsibility of the hypervisor to ensure that > > > > DBCR0[EDM] is consistent with usage of DEP. > > > > > > Ok, That means that if MSRP_DEP is set then set DBCR0_EDM and if > > > MSRP_DEP is > > clear then clear DBCR0_EDM, right? > > > We need to implement above mentioned this. > > > > We should probably clear EDM only when guest debug emulation is working and > > enabled (i.e. not until at least patch 6/6). > > But if EDM is set then guest debug emulation will not start/allowed. I don't mean after the guest tries to write to the registers -- I mean after the code has been added to KVM to allow it to work. -Scott
Re: [GIT PULL] KVM/ARM Urgent fix for 3.16
On Wed, Jul 30, 2014 at 03:34:11PM +0200, Paolo Bonzini wrote: > On 30/07/2014 14:55, Christoffer Dall wrote: > > Hi Paolo and Gleb, > > > > Is there any chance you can get this urgent fix (for a bug that allows a KVM guest > > to bring down the entire system on some 64K enabled ARM64 hosts) merged > > for 3.16? > > > > The following changes since commit bb18b526a9d8d4a3fe56f234d5013b9f6036978d: > > > > Merge tag 'signed-for-3.16' of git://github.com/agraf/linux-2.6 into > > kvm-master (2014-07-08 12:08:58 +0200) > > > > are available in the git repository at: > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git > > tags/kvm-arm-for-3.16-rc7 > > > > for you to fetch changes up to 63afbe7a0ac184ef8485dac4914e87b211b5bfaa: > > > > kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform > > (2014-07-30 14:35:42 +0200) > > > > --- > > Will Deacon (1): > > kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform > > > > virt/kvm/arm/vgic.c | 24 > > 1 file changed, 20 insertions(+), 4 deletions(-) > > > > I think Gleb is on vacation now, but unfortunately I've already had > enough this year. > > I re-sent the pull request from > git://git.kernel.org/pub/scm/virt/kvm/kvm.git, even though you had CCed > Linus here already. > I cc'ed Linus in case you were on vacation and since this is urgent and last minute. In any case, thanks to all for dealing with this quickly. -Christoffer
Re: [PATCH] KVM: PPC: HV: Remove generic instruction emulation
On 30.07.14 18:21, Paolo Bonzini wrote: On 30/07/2014 15:27, Alexander Graf wrote: Now that we have properly split load/store instruction emulation and generic instruction emulation, we can move the generic one from kvm.ko to kvm-pr.ko on book3s_64. This reduces the attack surface and amount of code loaded on HV KVM kernels. Can emulation races happen on HV KVM like you can have on x86? Basically one CPU writes to MMIO while the other patches instructions so that basically anything can end up in the hands of the emulator? On PPC it may even happen simply because of a missing icache invalidation, I think, since it doesn't support self-modifying code without explicit invalidation. Yes, this is perfectly possible. As of my last patch set we will never enter the generic emulator for HV KVM, so that race is moot (we just inject a PROGRAM interrupt into the guest). With this patch even the code to emulate these bits doesn't exist in the kernel anymore if you don't modprobe kvm-pr.ko. Alex
Re: [PATCH] KVM: PPC: HV: Remove generic instruction emulation
On 30/07/2014 20:57, Alexander Graf wrote: > Yes, this is perfectly possible. As of my last patch set we will never > enter the generic emulator for HV KVM, so that race is moot (we just > inject a PROGRAM interrupt into the guest). With this patch even the > code to emulate these bits doesn't exist in the kernel anymore if you > don't modprobe kvm-pr.ko. What is a PROGRAM interrupt? :) Paolo
Re: [PATCH] KVM: PPC: HV: Remove generic instruction emulation
On 30.07.14 21:47, Paolo Bonzini wrote: On 30/07/2014 20:57, Alexander Graf wrote: Yes, this is perfectly possible. As of my last patch set we will never enter the generic emulator for HV KVM, so that race is moot (we just inject a PROGRAM interrupt into the guest). With this patch even the code to emulate these bits doesn't exist in the kernel anymore if you don't modprobe kvm-pr.ko. What is a PROGRAM interrupt? :) The thing that happens when you invoke an illegal or privileged instruction ;) Alex
[PATCH kvm-unit-tests 0/3] x86: svm: minimal IOIO testing
So far the only "multi-stage" test was assembly only, so we have to implement register save/restore around vmrun. Paolo Paolo Bonzini (3): x86: svm: load/save all GPRs x86: svm: initialize IO bitmap x86: svm: IOIO testing x86/svm.c | 191 -- 1 file changed, 186 insertions(+), 5 deletions(-) -- 1.8.3.1
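Patch 1/3 of this series keeps the guest registers in a global `struct regs` and addresses its fields from inline assembly by fixed byte offsets (`regs+0x8`, `regs+0x80`, and so on). A minimal sketch, assuming a C11 compiler, of how those offsets could be pinned down at compile time. The `static_assert` lines are an addition for illustration and are not part of the posted patch; the field order is copied from the patch as posted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

/* Field order copied from the patch as posted. */
struct regs {
    u64 rax, rcx, rdx, rbx, cr2, rbp, rsi, rdi;
    u64 r8, r9, r10, r11, r12, r13, r14, r15;
    u64 rflags;
};

/* Offsets the inline assembly relies on. */
static_assert(offsetof(struct regs, rax) == 0x00, "rax is copied from regs+0");
static_assert(offsetof(struct regs, rdi) == 0x38, "rdi carries the test pointer");
static_assert(offsetof(struct regs, rflags) == 0x80, "rflags follows 16 slots");

/*
 * As posted, the slot at 0x8 is named rcx, while the macro exchanges
 * %rbx with regs+0x8. Because the same exchange sequence is used for
 * both load and save, this looks like a naming difference only.
 */
static_assert(offsetof(struct regs, rcx) == 0x08, "slot name vs. xchg operand");
```

The only general-purpose field the patch sets by name from C is `regs.rdi`, and its offset does agree with the `xchg %%rdi, regs+0x38` line.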
[PATCH kvm-unit-tests 2/3] x86: svm: initialize IO bitmap
Signed-off-by: Paolo Bonzini --- x86/svm.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/x86/svm.c b/x86/svm.c index 4b7f06e..2cf5c81 100644 --- a/x86/svm.c +++ b/x86/svm.c @@ -36,6 +36,9 @@ u64 latclgi_max; u64 latclgi_min; u64 runs; +u8 *io_bitmap; +u8 io_bitmap_area[16384]; + static bool npt_supported(void) { return cpuid(0x8000000A).d & 1; @@ -53,6 +56,8 @@ static void setup_svm(void) scratch_page = alloc_page(); +io_bitmap = (void *) (((ulong)io_bitmap_area + 4095) & ~4095); + if (!npt_supported()) return; @@ -149,6 +154,7 @@ static void vmcb_ident(struct vmcb *vmcb) save->g_pat = rdmsr(MSR_IA32_CR_PAT); save->dbgctl = rdmsr(MSR_IA32_DEBUGCTLMSR); ctrl->intercept = (1ULL << INTERCEPT_VMRUN) | (1ULL << INTERCEPT_VMMCALL); +ctrl->iopm_base_pa = virt_to_phys(io_bitmap); if (npt_supported()) { ctrl->nested_ctl = 1; -- 1.8.3.1
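The `(addr + 4095) & ~4095` expression in the patch rounds the start of an oversized static buffer up to the next 4 KiB boundary. A small stand-alone sketch of the same idea follows; `page_align_up` is a hypothetical helper name, and the 12 KiB figure used in the usage note is the I/O permission map size as I understand it from the AMD manual, so treat it as an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Oversized backing store, as in the patch: 16 KiB. */
static uint8_t io_bitmap_area[16384];

/* Round an address up to the next PAGE_SIZE boundary. */
static uint8_t *page_align_up(uint8_t *p)
{
    uintptr_t addr = (uintptr_t)p;

    /* The mask trick below only works for a power-of-two page size. */
    assert((PAGE_SIZE & (PAGE_SIZE - 1)) == 0);
    return (uint8_t *)((addr + PAGE_SIZE - 1) & ~(uintptr_t)(PAGE_SIZE - 1));
}
```

Aligning wastes at most 4095 bytes at the front, so a 16 KiB area always leaves room for a page-aligned 12 KiB map.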
[PATCH kvm-unit-tests 3/3] x86: svm: IOIO testing
Testing the bitmap handling so far, does not cover string instructions yet. Signed-off-by: Paolo Bonzini --- x86/svm.c | 126 ++ 1 file changed, 126 insertions(+) diff --git a/x86/svm.c b/x86/svm.c index 2cf5c81..290c33e 100644 --- a/x86/svm.c +++ b/x86/svm.c @@ -6,6 +6,7 @@ #include "vm.h" #include "smp.h" #include "types.h" +#include "io.h" /* for the nested page table*/ u64 *pml4e; @@ -505,6 +506,129 @@ static bool check_mode_switch(struct test *test) return test->scratch == 2; } +static void prepare_ioio(struct test *test) +{ +test->vmcb->control.intercept |= (1ULL << INTERCEPT_IOIO_PROT); +test->scratch = 0; +memset(io_bitmap, 0, 8192); +io_bitmap[8192] = 0xFF; +} + +int get_test_stage(struct test *test) +{ +barrier(); +return test->scratch; +} + +void inc_test_stage(struct test *test) +{ +barrier(); +test->scratch++; +barrier(); +} + +static void test_ioio(struct test *test) +{ +// stage 0, test IO pass +inb(0x5000); +outb(0x0, 0x5000); +if (get_test_stage(test) != 0) +goto fail; + +// test IO width, in/out +io_bitmap[0] = 0xFF; +inc_test_stage(test); +inb(0x0); +if (get_test_stage(test) != 2) +goto fail; + +outw(0x0, 0x0); +if (get_test_stage(test) != 3) +goto fail; + +inl(0x0); +if (get_test_stage(test) != 4) +goto fail; + +// test low/high IO port +io_bitmap[0x5000 / 8] = (1 << (0x5000 % 8)); +inb(0x5000); +if (get_test_stage(test) != 5) +goto fail; + +io_bitmap[0x9000 / 8] = (1 << (0x9000 % 8)); +inw(0x9000); +if (get_test_stage(test) != 6) +goto fail; + +// test partial pass +io_bitmap[0x5000 / 8] = (1 << (0x5000 % 8)); +inl(0x4FFF); +if (get_test_stage(test) != 7) +goto fail; + +// test across pages +inc_test_stage(test); +inl(0x7FFF); +if (get_test_stage(test) != 8) +goto fail; + +inc_test_stage(test); +io_bitmap[0x8000 / 8] = 1 << (0x8000 % 8); +inl(0x7FFF); +if (get_test_stage(test) != 10) +goto fail; + +io_bitmap[0] = 0; +inl(0x); +if (get_test_stage(test) != 11) +goto fail; + +io_bitmap[0] = 0xFF; +io_bitmap[8192] = 0; +inl(0x); 
+inc_test_stage(test); +if (get_test_stage(test) != 12) +goto fail; + +return; + +fail: +printf("test failure, stage %d\n", get_test_stage(test)); +test->scratch = -1; +} + +static bool ioio_finished(struct test *test) +{ +unsigned port, size; + +/* Only expect IOIO intercepts */ +if (test->vmcb->control.exit_code == SVM_EXIT_VMMCALL) +return true; + +if (test->vmcb->control.exit_code != SVM_EXIT_IOIO) +return true; + +/* one step forward */ +test->scratch += 1; + +port = test->vmcb->control.exit_info_1 >> 16; +size = (test->vmcb->control.exit_info_1 >> SVM_IOIO_SIZE_SHIFT) & 7; + +while (size--) { +io_bitmap[port / 8] &= ~(1 << (port & 7)); +port++; +} + +return false; +} + +static bool check_ioio(struct test *test) +{ +memset(io_bitmap, 0, 8193); +return test->scratch != -1; +} + static void prepare_asid_zero(struct test *test) { test->vmcb->control.asid = 0; @@ -804,6 +928,8 @@ static struct test tests[] = { default_finished, null_check }, { "vmrun", default_supported, default_prepare, test_vmrun, default_finished, check_vmrun }, +{ "ioio", default_supported, prepare_ioio, test_ioio, + ioio_finished, check_ioio }, { "vmrun intercept check", default_supported, prepare_no_vmrun_int, null_test, default_finished, check_no_vmrun_int }, { "cr3 read intercept", default_supported, prepare_cr3_intercept, -- 1.8.3.1
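The bitmap arithmetic the test relies on can be modelled in isolation. The sketch below is only a software model under the assumption that an access of `size` bytes at `port` is intercepted when any of the bits `port .. port+size-1` is set, which is what the "partial pass" stage of the test appears to exercise. `ioio_intercepted` and `ioio_clear` are hypothetical names; the second mirrors the clearing loop in `ioio_finished`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * One bit per I/O port. An access is treated as intercepted if any of
 * the bits covering its bytes is set.
 */
static bool ioio_intercepted(const uint8_t *bitmap, unsigned port, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);
    for (unsigned p = port; p < port + size; p++) {
        if (bitmap[p / 8] & (1u << (p % 8)))
            return true;
    }
    return false;
}

/* Mirror of the handler in the patch: clear the bits that were just hit. */
static void ioio_clear(uint8_t *bitmap, unsigned port, unsigned size)
{
    while (size--) {
        bitmap[port / 8] &= (uint8_t)~(1u << (port % 8));
        port++;
    }
}
```

This also suggests why the test uses an 8193-byte area: a 4-byte access starting at the last port reaches bits beyond the first 8192 bytes of the map.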
[PATCH kvm-unit-tests 1/3] x86: svm: load/save all GPRs
The cr2 field is unused, but I prefer to keep it the same as vmx (it is also unused there). Signed-off-by: Paolo Bonzini --- x86/svm.c | 59 ++- 1 file changed, 54 insertions(+), 5 deletions(-) diff --git a/x86/svm.c b/x86/svm.c index 3e45426..4b7f06e 100644 --- a/x86/svm.c +++ b/x86/svm.c @@ -174,6 +174,48 @@ static void test_thunk(struct test *test) asm volatile ("vmmcall" : : : "memory"); } +struct regs { +u64 rax; +u64 rcx; +u64 rdx; +u64 rbx; +u64 cr2; +u64 rbp; +u64 rsi; +u64 rdi; +u64 r8; +u64 r9; +u64 r10; +u64 r11; +u64 r12; +u64 r13; +u64 r14; +u64 r15; +u64 rflags; +}; + +struct regs regs; + +// rax handled specially below + +#define SAVE_GPR_C \ +"xchg %%rbx, regs+0x8\n\t" \ +"xchg %%rcx, regs+0x10\n\t" \ +"xchg %%rdx, regs+0x18\n\t" \ +"xchg %%rbp, regs+0x28\n\t" \ +"xchg %%rsi, regs+0x30\n\t" \ +"xchg %%rdi, regs+0x38\n\t" \ +"xchg %%r8, regs+0x40\n\t" \ +"xchg %%r9, regs+0x48\n\t" \ +"xchg %%r10, regs+0x50\n\t" \ +"xchg %%r11, regs+0x58\n\t" \ +"xchg %%r12, regs+0x60\n\t" \ +"xchg %%r13, regs+0x68\n\t" \ +"xchg %%r14, regs+0x70\n\t" \ +"xchg %%r15, regs+0x78\n\t" + +#define LOAD_GPR_C SAVE_GPR_C + static bool test_run(struct test *test, struct vmcb *vmcb) { u64 vmcb_phys = virt_to_phys(vmcb); @@ -184,19 +226,26 @@ static bool test_run(struct test *test, struct vmcb *vmcb) test->prepare(test); vmcb->save.rip = (ulong)test_thunk; vmcb->save.rsp = (ulong)(guest_stack + ARRAY_SIZE(guest_stack)); +regs.rdi = (ulong)test; do { tsc_start = rdtsc(); asm volatile ( "clgi \n\t" "vmload \n\t" -"push %%rbp \n\t" -"push %1 \n\t" +"mov regs+0x80, %%r15\n\t" // rflags +"mov %%r15, 0x170(%0)\n\t" +"mov regs, %%r15\n\t" // rax +"mov %%r15, 0x1f8(%0)\n\t" +LOAD_GPR_C "vmrun \n\t" -"pop %1 \n\t" -"pop %%rbp \n\t" +SAVE_GPR_C +"mov 0x170(%0), %%r15\n\t" // rflags +"mov %%r15, regs+0x80\n\t" +"mov 0x1f8(%0), %%r15\n\t" // rax +"mov %%r15, regs\n\t" "vmsave \n\t" "stgi" -: : "a"(vmcb_phys), "D"(test) +: : "a"(vmcb_phys) : "rbx", "rcx", "rdx", "rsi", "r8", "r9", "r10",
"r11" , "r12", "r13", "r14", "r15", "memory"); -- 1.8.3.1
[PATCH V2 1/4] x86/kvm: Resolve some missing-initializers warnings
Resolve some missing-initializers warnings that appear in W=2 builds. They are resolved by adding the name as a parameter to the macros and having the macro generate all four fields of the structure. Signed-off-by: Mark Rustad Signed-off-by: Jeff Kirsher --- V2: Change macro to supply all four fields instead of using a designated initializer. Also fix up the array terminator. --- arch/x86/kvm/x86.c | 71 ++-- 1 file changed, 36 insertions(+), 35 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ef432f891d30..623aea52ceba 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -82,8 +82,9 @@ u64 __read_mostly efer_reserved_bits = ~((u64)(EFER_SCE | EFER_LME | EFER_LMA)); static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE); #endif -#define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM -#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU +#define VM_STAT(name, x) name, offsetof(struct kvm, stat.x), KVM_STAT_VM, NULL +#define VCPU_STAT(name, x) name, offsetof(struct kvm_vcpu, stat.x), \ + KVM_STAT_VCPU, NULL static void update_cr8_intercept(struct kvm_vcpu *vcpu); static void process_nmi(struct kvm_vcpu *vcpu); @@ -128,39 +129,39 @@ static struct kvm_shared_msrs_global __read_mostly shared_msrs_global; static struct kvm_shared_msrs __percpu *shared_msrs; struct kvm_stats_debugfs_item debugfs_entries[] = { - { "pf_fixed", VCPU_STAT(pf_fixed) }, - { "pf_guest", VCPU_STAT(pf_guest) }, - { "tlb_flush", VCPU_STAT(tlb_flush) }, - { "invlpg", VCPU_STAT(invlpg) }, - { "exits", VCPU_STAT(exits) }, - { "io_exits", VCPU_STAT(io_exits) }, - { "mmio_exits", VCPU_STAT(mmio_exits) }, - { "signal_exits", VCPU_STAT(signal_exits) }, - { "irq_window", VCPU_STAT(irq_window_exits) }, - { "nmi_window", VCPU_STAT(nmi_window_exits) }, - { "halt_exits", VCPU_STAT(halt_exits) }, - { "halt_wakeup", VCPU_STAT(halt_wakeup) }, - { "hypercalls", VCPU_STAT(hypercalls) }, - { "request_irq", VCPU_STAT(request_irq_exits) }, - { "irq_exits", 
VCPU_STAT(irq_exits) }, - { "host_state_reload", VCPU_STAT(host_state_reload) }, - { "efer_reload", VCPU_STAT(efer_reload) }, - { "fpu_reload", VCPU_STAT(fpu_reload) }, - { "insn_emulation", VCPU_STAT(insn_emulation) }, - { "insn_emulation_fail", VCPU_STAT(insn_emulation_fail) }, - { "irq_injections", VCPU_STAT(irq_injections) }, - { "nmi_injections", VCPU_STAT(nmi_injections) }, - { "mmu_shadow_zapped", VM_STAT(mmu_shadow_zapped) }, - { "mmu_pte_write", VM_STAT(mmu_pte_write) }, - { "mmu_pte_updated", VM_STAT(mmu_pte_updated) }, - { "mmu_pde_zapped", VM_STAT(mmu_pde_zapped) }, - { "mmu_flooded", VM_STAT(mmu_flooded) }, - { "mmu_recycled", VM_STAT(mmu_recycled) }, - { "mmu_cache_miss", VM_STAT(mmu_cache_miss) }, - { "mmu_unsync", VM_STAT(mmu_unsync) }, - { "remote_tlb_flush", VM_STAT(remote_tlb_flush) }, - { "largepages", VM_STAT(lpages) }, - { NULL } + { VCPU_STAT("pf_fixed", pf_fixed) }, + { VCPU_STAT("pf_guest", pf_guest) }, + { VCPU_STAT("tlb_flush", tlb_flush) }, + { VCPU_STAT("invlpg", invlpg) }, + { VCPU_STAT("exits", exits) }, + { VCPU_STAT("io_exits", io_exits) }, + { VCPU_STAT("mmio_exits", mmio_exits) }, + { VCPU_STAT("signal_exits", signal_exits) }, + { VCPU_STAT("irq_window", irq_window_exits) }, + { VCPU_STAT("nmi_window", nmi_window_exits) }, + { VCPU_STAT("halt_exits", halt_exits) }, + { VCPU_STAT("halt_wakeup", halt_wakeup) }, + { VCPU_STAT("hypercalls", hypercalls) }, + { VCPU_STAT("request_irq", request_irq_exits) }, + { VCPU_STAT("irq_exits", irq_exits) }, + { VCPU_STAT("host_state_reload", host_state_reload) }, + { VCPU_STAT("efer_reload", efer_reload) }, + { VCPU_STAT("fpu_reload", fpu_reload) }, + { VCPU_STAT("insn_emulation", insn_emulation) }, + { VCPU_STAT("insn_emulation_fail", insn_emulation_fail) }, + { VCPU_STAT("irq_injections", irq_injections) }, + { VCPU_STAT("nmi_injections", nmi_injections) }, + { VM_STAT("mmu_shadow_zapped", mmu_shadow_zapped) }, + { VM_STAT("mmu_pte_write", mmu_pte_write) }, + { VM_STAT("mmu_pte_updated", 
mmu_pte_updated) }, + { VM_STAT("mmu_pde_zapped", mmu_pde_zapped) }, + { VM_STAT("mmu_flooded", mmu_flooded) }, + { VM_STAT("mmu_recycled", mmu_recycled) }, + { VM_STAT("mmu_cache_miss", mmu_cache_miss) }, + { VM_STAT("mmu_unsync", mmu_unsync) }, + { VM_STAT("remote_tlb_flush", remote_tlb_flush) }, + { VM_STAT("largepages", lpages) }, + { NULL, 0, 0, NULL } }; u64 __read_mostly host_xcr0;
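The approach in this patch, having the macro emit all four fields so that no initializer is left implicit, can be shown with a reduced example. Everything below is a hypothetical stand-in for the kernel types (`struct stats_item`, `struct vm_stats`, `STAT_VM`), chosen only to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types, for illustration only. */
enum stat_kind { STAT_VM, STAT_VCPU };
struct vm_stats { unsigned long remote_tlb_flush; unsigned long lpages; };

struct stats_item {
    const char *name;
    int offset;
    enum stat_kind kind;
    void *dentry;
};

/* The macro supplies every field, so no member is left uninitialized. */
#define VM_STAT(name, x) name, (int)offsetof(struct vm_stats, x), STAT_VM, NULL

static const struct stats_item entries[] = {
    { VM_STAT("remote_tlb_flush", remote_tlb_flush) },
    { VM_STAT("largepages", lpages) },
    { NULL, 0, 0, NULL }          /* fully spelled-out terminator */
};

/* Count entries up to the NULL-named terminator. */
static size_t count_entries(const struct stats_item *it)
{
    size_t n = 0;

    assert(it != NULL);
    while (it[n].name != NULL)
        n++;
    return n;
}
```

Compared with a designated initializer, this keeps the table entries positional, which is the trade-off the V2 note at the top of the patch mentions.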
[PATCH V2 3/4] x86/kvm: Resolve shadow warnings in macro expansion
Resolve shadow warnings that appear in W=2 builds. Instead of using ret to hold the return pointer, save the length in a new variable saved_len and compute the pointer on exit. This also resolves a very technical error, in that ret was declared as a const char *, when it really was a char * const, which theoretically could have allowed the compiler to do something wrong. Signed-off-by: Mark Rustad Signed-off-by: Jeff Kirsher --- Changes in V2: - Instead of renaming all inner variables, just delete the ret variable in favor of the new saved_len variable. --- arch/x86/kvm/mmutrace.h |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h index 9d2e0ffcb190..5aaf35641768 100644 --- a/arch/x86/kvm/mmutrace.h +++ b/arch/x86/kvm/mmutrace.h @@ -22,7 +22,7 @@ __entry->unsync = sp->unsync; #define KVM_MMU_PAGE_PRINTK() ({ \ - const char *ret = p->buffer + p->len; \ + const u32 saved_len = p->len; \ static const char *access_str[] = { \ "---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux" \ }; \ @@ -41,7 +41,7 @@ role.nxe ? "" : "!", \ __entry->root_count, \ __entry->unsync ? "unsync" : "sync", 0); \ - ret;\ + p->buffer + saved_len; \ }) #define kvm_mmu_trace_pferr_flags \
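The idea of remembering an offset and computing the pointer on exit can be sketched outside the macro. This is an illustration only: `struct seq_buf` and `seq_append` are hypothetical, written as an ordinary function rather than a statement expression, and do not reproduce the kernel's trace_seq interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a trace buffer, for illustration only. */
struct seq_buf {
    char   buffer[64];
    size_t len;
};

/*
 * Append text and return a pointer to where this record begins. The
 * start is remembered as an offset (saved_len) and turned into a
 * pointer only at the end, so no variable named ret is needed.
 */
static const char *seq_append(struct seq_buf *p, const char *text)
{
    const size_t saved_len = p->len;
    size_t room;
    int n;

    assert(p->len < sizeof(p->buffer));
    room = sizeof(p->buffer) - p->len;
    n = snprintf(p->buffer + p->len, room, "%s", text);
    if (n > 0)
        p->len += ((size_t)n < room) ? (size_t)n : room - 1;

    return p->buffer + saved_len;
}
```

In the macro form, dropping the `ret` declaration is what removes the name that could shadow a `ret` already visible at the expansion site.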
hang after seabios
Hi,

Locally I have a Supermicro server running OEL 6.5 with KVM, and it can run virt-sysprep and libguestfs-test-tool with no problem.

Linux 2.6.39-400.215.6.el6uek.x86_64
qemu-kvm-0.12.1.2-2.415.el6_5.10.x86_64
seabios-0.6.1.2-28.el6.x86_64

However, I have a server in a datacenter (Sun X4-2) running the same versions, and libguestfs-test-tool hangs when launching KVM. virt-sysprep also hangs the same way when trying to access a disk image, so I'm using libguestfs-test-tool as my example:

[root@kvm]# libguestfs-test-tool
* IMPORTANT NOTICE
*
* When reporting bugs, include the COMPLETE, UNEDITED
* output below in your bug report.
*
LIBGUESTFS_APPEND=edd=off
PATH=/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
SELinux: Enforcing
library version: 1.20.11rhel=6,release=2.el6
guestfs_get_append: edd=off
guestfs_get_attach_method: appliance
guestfs_get_autosync: 1
guestfs_get_cachedir: /var/tmp
guestfs_get_direct: 0
guestfs_get_memsize: 500
guestfs_get_network: 0
guestfs_get_path: /usr/lib64/guestfs
guestfs_get_pgroup: 0
guestfs_get_qemu: /usr/libexec/qemu-kvm
guestfs_get_recovery_proc: 1
guestfs_get_selinux: 0
guestfs_get_smp: 1
guestfs_get_tmpdir: /tmp
guestfs_get_trace: 0
guestfs_get_verbose: 1
host_cpu: x86_64
Launching appliance, timeout set to 600 seconds.
libguestfs: launch: attach-method=appliance
libguestfs: launch: tmpdir=/tmp/libguestfspx9994
libguestfs: launch: umask=0077
libguestfs: launch: euid=0
libguestfs: command: run: febootstrap-supermin-helper
libguestfs: command: run: \ --verbose
libguestfs: command: run: \ -f checksum
libguestfs: command: run: \ /usr/lib64/guestfs/supermin.d
libguestfs: command: run: \ x86_64
supermin helper [0ms] whitelist = (not specified), host_cpu = x86_64, kernel = (null), initrd = (null), appliance = (null)
supermin helper [0ms] inputs[0] = /usr/lib64/guestfs/supermin.d
checking modpath /lib/modules/2.6.32-279.el6.x86_64 is a directory
picked vmlinuz-2.6.32-279.el6.x86_64 because modpath /lib/modules/2.6.32-279.el6.x86_64 exists
checking modpath /lib/modules/2.6.39-200.24.1.el6uek.x86_64 is a directory
picked vmlinuz-2.6.39-200.24.1.el6uek.x86_64 because modpath /lib/modules/2.6.39-200.24.1.el6uek.x86_64 exists
supermin helper [0ms] finished creating kernel
supermin helper [0ms] visiting /usr/lib64/guestfs/supermin.d
supermin helper [0ms] visiting /usr/lib64/guestfs/supermin.d/base.img
supermin helper [0ms] visiting /usr/lib64/guestfs/supermin.d/daemon.img
supermin helper [0ms] visiting /usr/lib64/guestfs/supermin.d/hostfiles
supermin helper [00020ms] visiting /usr/lib64/guestfs/supermin.d/init.img
supermin helper [00020ms] visiting /usr/lib64/guestfs/supermin.d/udev-rules.img
supermin helper [00020ms] adding kernel modules
supermin helper [00051ms] finished creating appliance
libguestfs: checksum of existing appliance: 4805d2b09b84366bd753e62706693476b59c3971f4c1808739426b92f8baa3bf
libguestfs: [00054ms] begin testing qemu features
libguestfs: command: run: /usr/libexec/qemu-kvm
libguestfs: command: run: \ -nographic
libguestfs: command: run: \ -help
libguestfs: command: run: /usr/libexec/qemu-kvm
libguestfs: command: run: \ -nographic
libguestfs: command: run: \ -version
libguestfs: qemu version 0.12
libguestfs: command: run: /usr/libexec/qemu-kvm
libguestfs: command: run: \ -nographic
libguestfs: command: run: \ -machine accel=kvm:tcg
libguestfs: command: run: \ -device ?
libguestfs: [00182ms] finished testing qemu features
libguestfs: accept_from_daemon: 0x2266e00 g->state = 1
[00183ms] /usr/libexec/qemu-kvm \
    -global virtio-blk-pci.scsi=off \
    -nodefconfig \
    -nodefaults \
    -nographic \
    -machine accel=kvm:tcg \
    -cpu host,+kvmclock \
    -m 500 \
    -no-reboot \
    -kernel /var/tmp/.guestfs-0/kernel.47903 \
    -initrd /var/tmp/.guestfs-0/initrd.47903 \
    -device virtio-scsi-pci,id=scsi \
    -drive file=/tmp/libguestfs-test-tool-sda-Iakpwe,cache=none,format=raw,id=hd0,if=none \
    -device scsi-hd,drive=hd0 \
    -drive file=/var/tmp/.guestfs-0/root.47903,snapshot=on,id=appliance,if=none,cache=unsafe \
    -device scsi-hd,drive=appliance \
    -device virtio-serial \
    -serial stdio \
    -device sga \
    -chardev socket,path=/tmp/libguestfspx9994/guestfsd.sock,id=channel0 \
    -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
    -append 'panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0
Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
On Wed, 2014-07-30 at 01:43 -0500, Bhushan Bharat-R65777 wrote:
>
> > -----Original Message-----
> > From: Wood Scott-B07421
> > Sent: Tuesday, July 29, 2014 3:58 AM
> > To: Bhushan Bharat-R65777
> > Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Yoder Stuart-B08248
> > Subject: Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
> >
> > Userspace might be interested in
> > the raw value,
>
> With the current design, If userspace is interested then it will not
> get the DBSR.

Oh, because DBSR isn't currently implemented in sregs or one reg?

> But why userspace will be interested?

Do you expose all of the hardware's debugging features in your high-level interface?

> > plus it's a change from the current API semantics.
>
> Can you please let us know how ?

It looked like it was removing dbsr visibility and the requirement for userspace to clear dbsr. I guess the old way was that the value in vcpu->arch.dbsr didn't matter until the next debug exception, when it would be overwritten by the new SPRN_DBSR?

> > > +	case SPRN_DBCR2:
> > > +		/*
> > > +		 * If userspace is debugging guest then guest
> > > +		 * can not access debug registers.
> > > +		 */
> > > +		if (vcpu->guest_debug)
> > > +			break;
> > > +
> > > +		debug_inst = true;
> > > +		vcpu->arch.dbg_reg.dbcr2 = spr_val;
> > > +		vcpu->arch.shadow_dbg_reg.dbcr2 = spr_val;
> > >  		break;
> >
> > In what circumstances can the architected and shadow registers differ?
>
> As of now they are same. But I think that if we want to implement other
> features like "Freeze Timer (FT)" then they can be different.

I don't think we can possibly implement Freeze Timer.

> > >  	case SPRN_DBSR:
> > > +		/*
> > > +		 * If userspace is debugging guest then guest
> > > +		 * can not access debug registers.
> > > +		 */
> > > +		if (vcpu->guest_debug)
> > > +			break;
> > > +
> > >  		vcpu->arch.dbsr &= ~spr_val;
> > > +		if (vcpu->arch.dbsr == 0)
> > > +			kvmppc_core_dequeue_debug(vcpu);
> > >  		break;
> >
> > Not all DBSR bits cause an exception, e.g. IDE and MRR.
>
> I am not sure what we should in that case ?
>
> As we are currently emulating a subset of debug events (IAC, DAC, IC,
> BT and TIE --- DBCR0 emulation) then we should expose status of those
> events in guest dbsr and rest should be cleared ?

I'm not saying they need to be exposed to the guest, but I don't see where you filter out bits like these.

> > > @@ -273,6 +397,10 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
> > >  		emulated = EMULATE_FAIL;
> > >  	}
> > >
> > > +	if (debug_inst) {
> > > +		switch_booke_debug_regs(&vcpu->arch.shadow_dbg_reg);
> > > +		current->thread.debug = vcpu->arch.shadow_dbg_reg;
> > > +	}
> >
> > Could you explain what's going on with regard to copying the registers
> > into current->thread.debug? Why is it done after loading the registers
> > into the hardware (is there a race if we get preempted in the middle)?
>
> Yes, and this was something I was not clear when writing this code.
> Should we have preempt disable-enable around this.

Can they be reordered instead?

-Scott
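The filtering concern raised in this message can be modelled in a few lines of plain C. This is only a sketch under stated assumptions, not the patch's code: `struct demo_vcpu`, `demo_record_dbsr()` and `demo_write_dbsr()` are hypothetical names, the `DEMO_DBSR_*` bit positions are placeholders chosen for illustration rather than the architected layout, and the `debug_queued` flag stands in for a queued debug exception. The point it illustrates is that if status-only bits such as IDE or MRR leak into the guest-visible dbsr, a guest that clears only the event bits it knows about leaves dbsr non-zero, so the write-one-to-clear path never dequeues the exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Placeholder bit positions, for illustration only. The real DBSR layout
 * comes from the Book E architecture and the kernel headers.
 */
#define DEMO_DBSR_IDE	(1u << 31)	/* status only, no debug exception */
#define DEMO_DBSR_MRR	(3u << 28)	/* status only, no debug exception */
#define DEMO_DBSR_IC	(1u << 27)
#define DEMO_DBSR_BT	(1u << 26)
#define DEMO_DBSR_TIE	(1u << 24)
#define DEMO_DBSR_IAC1	(1u << 23)
#define DEMO_DBSR_DAC1R	(1u << 19)

/* Events the emulation is assumed to deliver to the guest. */
#define DEMO_DBSR_EMULATED (DEMO_DBSR_IC | DEMO_DBSR_BT | DEMO_DBSR_TIE | \
			    DEMO_DBSR_IAC1 | DEMO_DBSR_DAC1R)

struct demo_vcpu {
	uint32_t dbsr;		/* guest-visible debug status */
	bool debug_queued;	/* stands in for a queued debug exception */
};

/* Record a hardware status value, keeping only the emulated event bits. */
static void demo_record_dbsr(struct demo_vcpu *v, uint32_t hw_dbsr)
{
	v->dbsr |= hw_dbsr & DEMO_DBSR_EMULATED;
	v->debug_queued = (v->dbsr != 0);
}

/* Guest write to DBSR: write-one-to-clear, dequeue once nothing is left. */
static void demo_write_dbsr(struct demo_vcpu *v, uint32_t spr_val)
{
	v->dbsr &= ~spr_val;
	if (v->dbsr == 0)
		v->debug_queued = false;
}
```

With the mask in place, recording `DEMO_DBSR_IDE | DEMO_DBSR_IC` leaves only the IC bit set, and a guest write of that bit clears the status and drops the queued exception. Which bits belong in the mask is exactly the open question in the thread, so the set used here is an assumption.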
RE: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Thursday, July 31, 2014 8:18 AM
> To: Bhushan Bharat-R65777
> Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Yoder Stuart-B08248
> Subject: Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
>
> On Wed, 2014-07-30 at 01:43 -0500, Bhushan Bharat-R65777 wrote:
> >
> > > -----Original Message-----
> > > From: Wood Scott-B07421
> > > Sent: Tuesday, July 29, 2014 3:58 AM
> > > To: Bhushan Bharat-R65777
> > > Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Yoder Stuart-B08248
> > > Subject: Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
> > >
> > > Userspace might be interested in
> > > the raw value,
> >
> > With the current design, If userspace is interested then it will not
> > get the DBSR.
>
> Oh, because DBSR isn't currently implemented in sregs or one reg?

That is one reason. Another is that if we give dbsr visibility to userspace, then userspace has to clear dbsr when handling KVM_EXIT_DEBUG. And we think there is no gain in doing that because
"
 - QEMU cannot inject a debug interrupt into the guest (it does not know whether the guest is able to handle a debug interrupt; MSR_DE), so it will always clear DBSR.
 - If QEMU always has to clear DBSR when handling KVM_EXIT_DEBUG, then clearing dbsr in the kernel avoids doing it through SET_SREGS/set_one_reg()
"
This makes dbsr not visible to userspace.

Also, this (clearing of dbsr) should not be part of this patch; it should be a separate patch. I will do that in the next version.

>
> > But why userspace will be interested?
>
> Do you expose all of the hardware's debugging features in your high-level
> interface?

We support h/w breakpoints, watchpoints and IC (single stepping), and the status in the userspace exit provides all the information userspace requires.

>
> > > plus it's a change from the current API semantics.
> >
> > Can you please let us know how ?
>
> It looked like it was removing dbsr visibility and the requirement for userspace
> to clear dbsr. I guess the old way was that the value in
> vcpu->arch.dbsr didn't matter until the next debug exception, when it
> would be overwritten by the new SPRN_DBSR?

But that means the old dbsr will be visible to userspace, which is even worse than it not being visible, no?

Also, this can lead to the old dbsr being visible to the guest once userspace releases the debug resources, but this can be solved by clearing dbsr in kvm_arch_vcpu_ioctl_set_guest_debug() -> " if (!(dbg->control & KVM_GUESTDBG_ENABLE)) { }".

>
> > > > +	case SPRN_DBCR2:
> > > > +		/*
> > > > +		 * If userspace is debugging guest then guest
> > > > +		 * can not access debug registers.
> > > > +		 */
> > > > +		if (vcpu->guest_debug)
> > > > +			break;
> > > > +
> > > > +		debug_inst = true;
> > > > +		vcpu->arch.dbg_reg.dbcr2 = spr_val;
> > > > +		vcpu->arch.shadow_dbg_reg.dbcr2 = spr_val;
> > > >  		break;
> > >
> > > In what circumstances can the architected and shadow registers differ?
> >
> > As of now they are same. But I think that if we want to implement other
> > features like "Freeze Timer (FT)" then they can be different.
>
> I don't think we can possibly implement Freeze Timer.

Maybe, but in my opinion we should keep this open.

>
> > > >  	case SPRN_DBSR:
> > > > +		/*
> > > > +		 * If userspace is debugging guest then guest
> > > > +		 * can not access debug registers.
> > > > +		 */
> > > > +		if (vcpu->guest_debug)
> > > > +			break;
> > > > +
> > > >  		vcpu->arch.dbsr &= ~spr_val;
> > > > +		if (vcpu->arch.dbsr == 0)
> > > > +			kvmppc_core_dequeue_debug(vcpu);
> > > >  		break;
> > >
> > > Not all DBSR bits cause an exception, e.g. IDE and MRR.
> >
> > I am not sure what we should in that case ?
> >
> > As we are currently emulating a subset of debug events (IAC, DAC, IC,
> > BT and TIE --- DBCR0 emulation) then we should expose status of those
> > events in guest dbsr and rest should be cleared ?
>
> I'm not saying they need to be exposed to the guest, but I don't see where you
> filter out bits like these.

I am trying to understand which bits should be filtered out: all bits except IACx, DACx, IC, BT and TIE (the same event set filtering done when setting DBCR0)? I.e. IDE, UDE, MRR, IRPT, RET, CIRPT and CRET should be filtered out?

>
> > > > @@ -273,6 +397,10 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
> > > >  		emulated = EMULATE_FAIL;
> > > >  	}
> > > >
> > > > +	if (debug_inst) {
> > > > +		switch_booke_debug_regs(&vcpu->arch.shadow_dbg_reg);
> > > > +		current->thread.debug = vcpu->arch.shadow_dbg_reg;
> >