Re: [PATCH] KVM: nVMX: nested TPR shadow/threshold emulation
On 01/08/2014 02:57, Zhang, Yang Z wrote:
>> TPR_THRESHOLD will likely be written as zero, but the processor will
>> never use it anyway. It's just a small optimization because
>> nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) will almost always be true.
>
> Theoretically, you are right. But we should not expect all VMMs to follow
> it. It is not worth violating the SDM just to save the cost of two or
> three instructions.

Yes, you do need an if (cpu_has_vmx_tpr_shadow()) around the vmcs_write32.
But still, checking nested_cpu_has is not strictly necessary. Right now
they both are a single AND, but I have plans to change all of the
cpu_has_*() checks to static keys.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] KVM: nVMX: nested TPR shadow/threshold emulation
Paolo Bonzini wrote on 2014-08-01:
> On 01/08/2014 02:57, Zhang, Yang Z wrote:
>>> TPR_THRESHOLD will likely be written as zero, but the processor will
>>> never use it anyway. It's just a small optimization because
>>> nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) will almost always be true.
>>
>> Theoretically, you are right. But we should not expect all VMMs to follow
>> it. It is not worth violating the SDM just to save the cost of two or
>> three instructions.
>
> Yes, you do need an if (cpu_has_vmx_tpr_shadow()) around the vmcs_write32.
> But still, checking nested_cpu_has is not strictly necessary. Right now
> they both are a single AND, but I have plans to change all of the
> cpu_has_*() checks to static keys.

See the v2 patch. It isn't a problem anymore.

> Paolo

Best regards,
Yang
Re: Integrity in untrusted environments
On 31/07/2014 23:25, Shiva V wrote:
> Hello,
> I am exploring ideas to implement a service inside a virtual machine on
> untrusted hypervisors under current cloud infrastructures. In particular,
> I am interested in how one can verify the integrity of the service in an
> environment where the hypervisor is not trusted.
>
> This is my setup:
> 1. I have two virtual machines (normal client VMs).
> 2. VM-A is executing a service and VM-B wants to verify its integrity.
> 3. Both are executing on an untrusted hypervisor.
>
> Though Intel SGX will solve this by using the concept of enclaves, it's
> not publicly available yet.
>
> One could also use SMM to verify the integrity. But since this is a
> time-based approach, one could easily exploit the time window in between.
>
> I was drilling down into this idea: we know the Write xor Execute memory
> protection scheme. Using this idea, if we could lock down the VM-A memory
> pages where the service is running, along with the corresponding
> page-table entries, and then have handler code that temporarily unlocks
> them for legitimate updates, then one could verify the integrity of the
> running service.

You can make a malicious hypervisor that makes all executable pages also
writable, but hides the fact from the running process.

But really, if you control the hypervisor you can just write to guest
memory as you wish. SMM will be emulated by the hypervisor.

If the hypervisor is untrusted, you cannot solve _everything_. For the
third time, what attacks are you trying to protect from?

Paolo
[PATCH v2] KVM: nVMX: nested TPR shadow/threshold emulation
This patch fixes bug https://bugzilla.kernel.org/show_bug.cgi?id=61411

The TPR shadow/threshold feature is important for speeding up Windows
guests. Besides, it is a required feature for certain VMMs.

We map the virtual APIC page address and TPR threshold from the L1 VMCS.
If a TPR_BELOW_THRESHOLD VM exit is triggered by the L2 guest and L1 is
interested in it, we inject it into the L1 VMM for handling.

Signed-off-by: Wanpeng Li <wanpeng...@linux.intel.com>
---
v1 -> v2:
 * don't take L0's "virtualize APIC accesses" setting into account
 * virtual_apic_page does exactly the same thing that is done for
   apic_access_page
 * add the TPR threshold field to the read-write fields for shadow VMCS

 arch/x86/kvm/vmx.c | 33 +++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a3845b8..0e6e95e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -379,6 +379,7 @@ struct nested_vmx {
 	 * we must keep them pinned while L2 runs.
 	 */
 	struct page *apic_access_page;
+	struct page *virtual_apic_page;
 	u64 msr_ia32_feature_control;
 
 	struct hrtimer preemption_timer;
@@ -533,6 +534,7 @@ static int max_shadow_read_only_fields =
 	ARRAY_SIZE(shadow_read_only_fields);
 
 static unsigned long shadow_read_write_fields[] = {
+	TPR_THRESHOLD,
 	GUEST_RIP,
 	GUEST_RSP,
 	GUEST_CR0,
@@ -2331,7 +2333,7 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 		CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING |
 		CPU_BASED_USE_IO_BITMAPS | CPU_BASED_MONITOR_EXITING |
 		CPU_BASED_RDPMC_EXITING | CPU_BASED_RDTSC_EXITING |
-		CPU_BASED_PAUSE_EXITING |
+		CPU_BASED_PAUSE_EXITING | CPU_BASED_TPR_SHADOW |
 		CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
 	/*
 	 * We can allow some features even when not supported by the
@@ -6149,6 +6151,10 @@ static void free_nested(struct vcpu_vmx *vmx)
 		nested_release_page(vmx->nested.apic_access_page);
 		vmx->nested.apic_access_page = 0;
 	}
+	if (vmx->nested.virtual_apic_page) {
+		nested_release_page(vmx->nested.virtual_apic_page);
+		vmx->nested.virtual_apic_page = 0;
+	}
 
 	nested_free_all_saved_vmcss(vmx);
 }
@@ -6937,7 +6943,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 	case EXIT_REASON_MCE_DURING_VMENTRY:
 		return 0;
 	case EXIT_REASON_TPR_BELOW_THRESHOLD:
-		return 1;
+		return nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW);
 	case EXIT_REASON_APIC_ACCESS:
 		return nested_cpu_has2(vmcs12,
 			SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
@@ -7058,6 +7064,9 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 
 static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 {
+	if (is_guest_mode(vcpu))
+		return;
+
 	if (irr == -1 || tpr < irr) {
 		vmcs_write32(TPR_THRESHOLD, 0);
 		return;
@@ -8025,6 +8034,22 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
 	exec_control &= ~CPU_BASED_TPR_SHADOW;
 	exec_control |= vmcs12->cpu_based_vm_exec_control;
+
+	if (exec_control & CPU_BASED_TPR_SHADOW) {
+		if (vmx->nested.virtual_apic_page)
+			nested_release_page(vmx->nested.virtual_apic_page);
+		vmx->nested.virtual_apic_page =
+			nested_get_page(vcpu, vmcs12->virtual_apic_page_addr);
+		if (!vmx->nested.virtual_apic_page)
+			exec_control &=
+				~CPU_BASED_TPR_SHADOW;
+		else
+			vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
+				page_to_phys(vmx->nested.virtual_apic_page));
+
+		vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
+	}
+
 	/*
 	 * Merging of IO and MSR bitmaps not currently supported.
 	 * Rather, exit every time.
@@ -8793,6 +8818,10 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 		nested_release_page(vmx->nested.apic_access_page);
 		vmx->nested.apic_access_page = 0;
 	}
+	if (vmx->nested.virtual_apic_page) {
+		nested_release_page(vmx->nested.virtual_apic_page);
+		vmx->nested.virtual_apic_page = 0;
+	}
 
 	/*
 	 * Exiting from L2 to L1, we're now back to L1 which thinks it just
-- 
1.9.1
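For readers following the thread, here is a minimal user-space sketch of the threshold choice that update_cr8_intercept() makes for L1, together with the condition under which the processor raises a TPR_BELOW_THRESHOLD exit. It is a model only: pick_tpr_threshold() and tpr_below_threshold() are hypothetical names that do not exist in vmx.c, and tpr/irr are assumed to be 4-bit priority classes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the choice made in update_cr8_intercept():
 * tpr and irr are 4-bit priority classes; irr == -1 means nothing pending.
 * Returns the value that would be written to TPR_THRESHOLD.
 */
static uint32_t pick_tpr_threshold(int tpr, int irr)
{
	assert(tpr >= 0 && tpr <= 15);
	if (irr == -1 || tpr < irr)
		return 0;	/* the TPR masks nothing pending: no exit wanted */
	return (uint32_t)irr;	/* exit once the guest lowers TPR below irr */
}

/*
 * Model of the check done after a TPR update: the exit is raised when
 * bits 3:0 of the threshold exceed bits 7:4 of the virtual TPR.
 */
static int tpr_below_threshold(uint32_t threshold, uint8_t vtpr)
{
	return (threshold & 0xf) > (uint32_t)(vtpr >> 4);
}
```

With the patch applied, the value for L2 comes from vmcs12->tpr_threshold instead, which is why update_cr8_intercept() returns early in guest mode.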
[PATCH 1/2] KVM: nVMX: Fix nested vmexit ack intr before load vmcs01
An external interrupt will cause an L1 vmexit with reason "external
interrupt" when L2 is running. L1 will then pick up the interrupt through
vmcs12 if L1 set the ack interrupt bit. Commit 77b0f5d (KVM: nVMX: Ack and
write vector info to intr_info if L1 asks us to) gets the interrupt that
belongs to L1 before loading vmcs01, which is wrong. In particular, it
makes L1's interrupt acknowledgement behave weirdly with APICv, since
APICv is for L1 instead of L2. This patch fixes it by acking the
interrupt after loading vmcs01.

Signed-off-by: Wanpeng Li <wanpeng...@linux.intel.com>
---
 arch/x86/kvm/vmx.c | 16
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e618f34..b8122b3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8754,14 +8754,6 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	prepare_vmcs12(vcpu, vmcs12, exit_reason, exit_intr_info,
 		       exit_qualification);
 
-	if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
-	    && nested_exit_intr_ack_set(vcpu)) {
-		int irq = kvm_cpu_get_interrupt(vcpu);
-		WARN_ON(irq < 0);
-		vmcs12->vm_exit_intr_info = irq |
-			INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
-	}
-
 	trace_kvm_nested_vmexit_inject(vmcs12->vm_exit_reason,
 				       vmcs12->exit_qualification,
 				       vmcs12->idt_vectoring_info_field,
@@ -8771,6 +8763,14 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 
 	vmx_load_vmcs01(vcpu);
 
+	if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
+	    && nested_exit_intr_ack_set(vcpu)) {
+		int irq = kvm_cpu_get_interrupt(vcpu);
+		WARN_ON(irq < 0);
+		vmcs12->vm_exit_intr_info = irq |
+			INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
+	}
+
 	vm_entry_controls_init(vmx, vmcs_read32(VM_ENTRY_CONTROLS));
 	vm_exit_controls_init(vmx, vmcs_read32(VM_EXIT_CONTROLS));
 	vmx_segment_cache_clear(vmx);
-- 
1.9.1
[PATCH 2/2] KVM: nVMX: fix acknowledge interrupt on exit when APICv is in use
After commit 77b0f5d (KVM: nVMX: Ack and write vector info to intr_info
if L1 asks us to), "Acknowledge interrupt on exit" behavior can be
emulated. To do so, KVM will ask the APIC for the interrupt vector during
a nested vmexit if VM_EXIT_ACK_INTR_ON_EXIT is set. With APICv,
kvm_get_apic_interrupt would return -1 and give the following WARNING:

Call Trace:
 [81493563] dump_stack+0x49/0x5e
 [8103f0eb] warn_slowpath_common+0x7c/0x96
 [a059709a] ? nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
 [8103f11a] warn_slowpath_null+0x15/0x17
 [a059709a] nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
 [a0594295] ? nested_vmx_exit_handled+0x6a/0x39e [kvm_intel]
 [a0537931] ? kvm_apic_has_interrupt+0x80/0xd5 [kvm]
 [a05972ec] vmx_check_nested_events+0xc3/0xd3 [kvm_intel]
 [a051ebe9] inject_pending_event+0xd0/0x16e [kvm]
 [a051efa0] vcpu_enter_guest+0x319/0x704 [kvm]

If APIC-v is enabled, all interrupts to L1 are delivered through APIC-v.
But when L2 is running, an external interrupt will cause an L1 vmexit with
reason "external interrupt". L1 will then pick up the interrupt through
vmcs12. When L1 acks the interrupt, APIC-v hardware will still do the vEOI
update, since APIC-v is enabled while L1 is running. The problem is that
the interrupt was not delivered through APIC-v hardware, which means
SVI/RVI/vPPR are not set, but the hardware requires them when doing the
vEOI update. The solution is that, when L1 tries to pick up the interrupt
from vmcs12, the hypervisor helps to update SVI/RVI/vPPR so that the
following vEOI and vPPR updates work correctly.

Also, since the interrupt is delivered through vmcs12, APIC-v hardware
will not clear vIRR, so the hypervisor needs to clear it before L1 runs.

Suggested-by: Paolo Bonzini <pbonz...@redhat.com>
Suggested-by: Zhang, Yang Z <yang.z.zh...@intel.com>
Signed-off-by: Wanpeng Li <wanpeng...@linux.intel.com>
---
 arch/x86/kvm/lapic.c | 18 ++
 arch/x86/kvm/lapic.h | 1 +
 arch/x86/kvm/vmx.c | 10 ++
 3 files changed, 29 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 3855103..06942b9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -534,6 +534,24 @@ static void apic_set_tpr(struct kvm_lapic *apic, u32 tpr)
 	apic_update_ppr(apic);
 }
 
+int kvm_lapic_ack_apicv(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	int vec;
+
+	vec = kvm_apic_has_interrupt(vcpu);
+
+	if (vec == -1)
+		return vec;
+
+	apic_set_vector(vec, apic->regs + APIC_ISR);
+	apic_update_ppr(apic);
+	apic_clear_vector(vec, apic->regs + APIC_IRR);
+
+	return vec;
+}
+EXPORT_SYMBOL_GPL(kvm_lapic_ack_apicv);
+
 int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest)
 {
 	return dest == 0xff || kvm_apic_id(apic) == dest;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6a11845..ead1392 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -169,5 +169,6 @@ static inline bool kvm_apic_has_events(struct kvm_vcpu *vcpu)
 }
 
 bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
+int kvm_lapic_ack_apicv(struct kvm_vcpu *vcpu);
 
 #endif
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b8122b3..c604f3c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8766,6 +8766,16 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
 	    && nested_exit_intr_ack_set(vcpu)) {
 		int irq = kvm_cpu_get_interrupt(vcpu);
+
+		if (irq < 0 && kvm_apic_vid_enabled(vcpu->kvm)) {
+			irq = kvm_lapic_ack_apicv(vcpu);
+			if (irq >= 0) {
+				vmx_hwapic_isr_update(vcpu->kvm, irq);
+				/* try to update RVI */
+				kvm_make_request(KVM_REQ_EVENT, vcpu);
+			}
+		}
+
 		WARN_ON(irq < 0);
 		vmcs12->vm_exit_intr_info = irq |
 			INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
-- 
1.9.1
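The acknowledge sequence described above (take the highest pending vector, mark it in service, refresh PPR, clear it from IRR) can be sketched in user space as follows. This is an illustration under assumptions, not the KVM code: struct apic_model and every helper name here are hypothetical, and the registers are modelled as plain 256-bit bitmaps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of the local APIC state used here. */
struct apic_model {
	uint32_t irr[8];	/* 256 pending-interrupt bits */
	uint32_t isr[8];	/* 256 in-service bits */
	uint8_t tpr;
	uint8_t ppr;
};

static void set_vec(uint32_t *reg, int vec)
{
	assert(vec >= 0 && vec < 256);
	reg[vec / 32] |= 1u << (vec % 32);
}

static void clear_vec(uint32_t *reg, int vec)
{
	assert(vec >= 0 && vec < 256);
	reg[vec / 32] &= ~(1u << (vec % 32));
}

/* Highest set vector in a 256-bit register, or -1 if none is set. */
static int highest_vec(const uint32_t *reg)
{
	for (int vec = 255; vec >= 0; vec--)
		if (reg[vec / 32] & (1u << (vec % 32)))
			return vec;
	return -1;
}

/* PPR is the higher of TPR and the priority class of the highest ISR bit. */
static void update_ppr(struct apic_model *a)
{
	int isrv = highest_vec(a->isr);
	uint8_t isr_class = isrv < 0 ? 0 : (uint8_t)(isrv & 0xf0);

	a->ppr = ((a->tpr & 0xf0) >= isr_class) ? a->tpr : isr_class;
}

/*
 * Sketch of the acknowledge step: pick the highest pending vector whose
 * priority class beats PPR, move it from IRR to ISR, and refresh PPR.
 * Returns -1 if nothing is deliverable.
 */
static int ack_highest_pending(struct apic_model *a)
{
	int vec;

	update_ppr(a);
	vec = highest_vec(a->irr);
	if (vec == -1 || (vec & 0xf0) <= (a->ppr & 0xf0))
		return -1;

	set_vec(a->isr, vec);
	update_ppr(a);
	clear_vec(a->irr, vec);
	return vec;
}
```

In the patch, the in-service vector is additionally pushed to hardware via vmx_hwapic_isr_update() and a KVM_REQ_EVENT request is raised so that RVI gets recomputed; the model above leaves both out.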
Re: [PATCH 2/2] KVM: nVMX: fix acknowledge interrupt on exit when APICv is in use
Please ignore this duplicate one.

On 14-8-1 4:13 PM, Wanpeng Li wrote:
> After commit 77b0f5d (KVM: nVMX: Ack and write vector info to intr_info
> if L1 asks us to), "Acknowledge interrupt on exit" behavior can be
> emulated. To do so, KVM will ask the APIC for the interrupt vector during
> a nested vmexit if VM_EXIT_ACK_INTR_ON_EXIT is set.
> [...]
Re: [PATCH 1/2] x86: AMD: mark TSC unstable on APU family 15h models 10h-1fh
On Thu, Jul 31, 2014 at 09:47:12AM +, Igor Mammedov wrote:
> Due to erratum #778 from the Revision Guide for AMD Family 15h Models
> 10h-1Fh Processors (Publication # 48931, Issue Date: May 2013,
> Revision: 3.10), the TSC on an affected processor core may drift under
> certain conditions, which causes initially synchronized TSCs to become
> unsynchronized.
>
> As a result, the TSC clocksource becomes unsuitable for use as wallclock
> and it breaks pvclock when it's running with the PVCLOCK_TSC_STABLE_BIT
> flag set. That causes backwards clock jumps when pvclock is first read
> on a CPU with a drifted TSC and then on a CPU where the TSC was stable
> or had a lower drift rate.
>
> To fix the issue, mark the TSC as unstable on affected CPUs, so it won't
> be used as clocksource. That in turn disables the master_clock mechanism
> in KVM and forces pvclock to use a global clock counter that can't go
> backwards.
>
> Signed-off-by: Igor Mammedov <imamm...@redhat.com>

Acked-by: Borislav Petkov <b...@suse.de>
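The patch body is not quoted in this reply, so the following is only a sketch of how the affected range (family 15h, models 10h-1Fh) could be recognized from the CPUID leaf 1 EAX value. The function names are hypothetical, and the decoding follows the usual AMD convention in which the extended family and model fields apply when the base family is 0xf.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the display family from CPUID leaf 1 EAX. */
static unsigned int cpuid_family(uint32_t eax)
{
	unsigned int base = (eax >> 8) & 0xf;
	unsigned int ext = (eax >> 20) & 0xff;

	return base == 0xf ? base + ext : base;
}

/* Decode the display model; extended model bits apply when base family is 0xf. */
static unsigned int cpuid_model(uint32_t eax)
{
	unsigned int base_family = (eax >> 8) & 0xf;
	unsigned int model = (eax >> 4) & 0xf;

	if (base_family == 0xf)
		model |= ((eax >> 16) & 0xf) << 4;
	return model;
}

/* Hypothetical predicate for the range named in the erratum. */
static int tsc_drift_affected(uint32_t eax)
{
	return cpuid_family(eax) == 0x15 &&
	       cpuid_model(eax) >= 0x10 && cpuid_model(eax) <= 0x1f;
}
```

Inside the kernel the same test would normally be written against the already decoded family and model fields of the CPU info structure rather than raw CPUID output.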
Re: [PATCH v2] KVM: nVMX: nested TPR shadow/threshold emulation
On 01/08/2014 10:09, Wanpeng Li wrote:
> This patch fixes bug https://bugzilla.kernel.org/show_bug.cgi?id=61411
> [...]
> @@ -8025,6 +8034,22 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  	exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
>  	exec_control &= ~CPU_BASED_TPR_SHADOW;
>  	exec_control |= vmcs12->cpu_based_vm_exec_control;
> +
> +	if (exec_control & CPU_BASED_TPR_SHADOW) {
> +		if (vmx->nested.virtual_apic_page)
> +			nested_release_page(vmx->nested.virtual_apic_page);
> +		vmx->nested.virtual_apic_page =
> +			nested_get_page(vcpu, vmcs12->virtual_apic_page_addr);
> +		if (!vmx->nested.virtual_apic_page)
> +			exec_control &=
> +				~CPU_BASED_TPR_SHADOW;

This will cause L1 to miss exits when L2 writes to CR8. I think the only
sensible thing to do if this happens is fail the vmentry.

The problem is that while the APIC access page field is used to trap
reads/writes to the APIC access page itself, here the processor will
read/write the virtual APIC page when L2 does CR8 accesses.

Paolo

> +		else
> +			vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
> +				page_to_phys(vmx->nested.virtual_apic_page));
> +
> +		vmcs_write32(TPR_THRESHOLD, vmcs12->tpr_threshold);
> +	}
> +
> [...]
Re: [PATCH] arm64: KVM: export current vcpu-pause state via pseudo regs
Christoffer Dall writes:

> On Thu, Jul 31, 2014 at 04:14:51PM +0100, Alex Bennée wrote:
>>
>> Christoffer Dall writes:
>>
>>> On Wed, Jul 09, 2014 at 02:55:12PM +0100, Alex Bennée wrote:
>>>> To cleanly restore an SMP VM we need to ensure that the current pause
>>>> state of each vcpu is correctly recorded. Things could get confused if
>>>> the CPU starts running after migration restore completes when it was
>>>> paused before its state was captured.
>>>>
>> <snip>
>>>> +/* Power state (PSCI), not real registers */
>>>> +#define KVM_REG_ARM_PSCI		(0x0014 << KVM_REG_ARM_COPROC_SHIFT)
>>>> +#define KVM_REG_ARM_PSCI_REG(n) \
>>>> +	(KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_PSCI | \
>>>> +	 (n & ~KVM_REG_ARM_COPROC_MASK))
>>>
>>> I don't understand this mask, why isn't this (n & 0x))
>>
>> I was trying to use the existing masks, but of course if anyone changes
>> that it would be an ABI change so probably not worth it.
>>
>
> the KVM_REG_ARM_COPROC_MASK is part of the uapi IIRC, so that's not the
> issue, but that mask doesn't cover all the upper bits, so it feels weird
> to use that to me.

Yeah, I missed that. I could do a:

#define KVM_REG_ARM_COPROC_INDEX_MASK ((1<<KVM_REG_ARM_COPROC_SHIFT)-1)

and use that. I generally try to avoid hardcoded numbers, but I could be
being a little OCD here ;-)

>>> Can you add the 32-bit counterpart as part of this patch?
>>
>> Same patch? Sure.
>
> really up to you if you want to split it up into two patches, but I
> think it's small enough that you can just create one patch.

Given the similarity of this code between arm and arm64, I'm wondering if
it's worth doing an arch/arm/kvm/guest_common.c or something to reduce the
amount of copy-paste stuff?

-- 
Alex Bennée
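To make the mask discussion concrete, here is a small user-space sketch comparing the macro as posted with the index mask proposed in the reply. The numeric values of KVM_REG_ARM64, KVM_REG_SIZE_U64, KVM_REG_ARM_COPROC_MASK and KVM_REG_ARM_COPROC_SHIFT are written from memory of the uapi headers and should be treated as assumptions; the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values as recalled from the uapi headers; treat them as assumptions. */
#define KVM_REG_ARM64			0x6000000000000000ULL
#define KVM_REG_SIZE_U64		0x0030000000000000ULL
#define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000ULL
#define KVM_REG_ARM_COPROC_SHIFT	16

/* From the patch under discussion. */
#define KVM_REG_ARM_PSCI		(0x0014ULL << KVM_REG_ARM_COPROC_SHIFT)

/* Index mask proposed in the reply: only the bits below the coproc field. */
#define KVM_REG_ARM_COPROC_INDEX_MASK	((1ULL << KVM_REG_ARM_COPROC_SHIFT) - 1)

/* Register id as posted: clears only the coproc field of n. */
static uint64_t psci_reg_posted(uint64_t n)
{
	return KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_PSCI |
	       (n & ~KVM_REG_ARM_COPROC_MASK);
}

/* Register id with the proposed index mask: keeps only the low 16 bits of n. */
static uint64_t psci_reg_index_mask(uint64_t n)
{
	return KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_PSCI |
	       (n & KVM_REG_ARM_COPROC_INDEX_MASK);
}
```

For small indices the two forms agree; they differ only when n has bits set above the coproc field, which is the case the review comment is worried about.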
[PULL 19/63] KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE
Some data structures are always stored in big endian. Among those are the
LPPACA fields as well as the shadow slb. These structures might be shared
with a hypervisor.

So whenever we access those fields, make sure we do so in big endian byte
order.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index e66c1e38..bf5270e 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -32,10 +32,6 @@
 
 #define VCPU_GPRS_TM(reg) (((reg) * ULONG_SIZE) + VCPU_GPR_TM)
 
-#ifdef __LITTLE_ENDIAN__
-#error Need to fix lppaca and SLB shadow accesses in little endian mode
-#endif
-
 /* Values in HSTATE_NAPPING(r13) */
 #define NAPPING_CEDE	1
 #define NAPPING_NOVCPU	2
@@ -595,9 +591,10 @@ kvmppc_got_guest:
 	ld	r3, VCPU_VPA(r4)
 	cmpdi	r3, 0
 	beq	25f
-	lwz	r5, LPPACA_YIELDCOUNT(r3)
+	li	r6, LPPACA_YIELDCOUNT
+	LWZX_BE	r5, r3, r6
 	addi	r5, r5, 1
-	stw	r5, LPPACA_YIELDCOUNT(r3)
+	STWX_BE	r5, r3, r6
 	li	r6, 1
 	stb	r6, VCPU_VPA_DIRTY(r4)
 25:
@@ -1442,9 +1439,10 @@ END_FTR_SECTION_IFCLR(CPU_FTR_TM)
 	ld	r8, VCPU_VPA(r9)	/* do they have a VPA? */
 	cmpdi	r8, 0
 	beq	25f
-	lwz	r3, LPPACA_YIELDCOUNT(r8)
+	li	r4, LPPACA_YIELDCOUNT
+	LWZX_BE	r3, r8, r4
 	addi	r3, r3, 1
-	stw	r3, LPPACA_YIELDCOUNT(r8)
+	STWX_BE	r3, r8, r4
 	li	r3, 1
 	stb	r3, VCPU_VPA_DIRTY(r9)
 25:
@@ -1757,8 +1755,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 33:	ld	r8,PACA_SLBSHADOWPTR(r13)
 
 	.rept	SLB_NUM_BOLTED
-	ld	r5,SLBSHADOW_SAVEAREA(r8)
-	ld	r6,SLBSHADOW_SAVEAREA+8(r8)
+	li	r3, SLBSHADOW_SAVEAREA
+	LDX_BE	r5, r8, r3
+	addi	r3, r3, 8
+	LDX_BE	r6, r8, r3
 	andis.	r7,r5,SLB_ESID_V@h
 	beq	1f
 	slbmte	r6,r5
-- 
1.8.1.4
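The same idea expressed in portable C rather than in the assembly macros: a field that is defined as big endian is read and written byte by byte, so the result does not depend on the host byte order. This is a sketch only; load_be32(), store_be32() and bump_yield_count() are hypothetical names and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit big-endian field, whatever the host byte order is. */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write a 32-bit value as big endian, whatever the host byte order is. */
static void store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Mirrors the load / add 1 / store sequence applied to the yield count. */
static void bump_yield_count(uint8_t *field)
{
	assert(field != 0);
	store_be32(field, load_be32(field) + 1);
}
```

A plain native-order load and store would increment the wrong byte of such a field on a little-endian host, which is what the removed #error was guarding against.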
[PULL 29/63] kvm: ppc: bookehv: Added wrapper macros for shadow registers
From: Bharat Bhushan bharat.bhus...@freescale.com

There are shadow registers such as GSPRG[0-3], GSRR0 and GSRR1 on BOOKE-HV,
and these shadow registers are guest accessible, so they need to be updated
on BOOKE-HV. This patch adds new macros for the get/set helpers of shadow
registers.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_ppc.h | 44 +++---
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index e2fd5a1..6520d09 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -472,8 +472,20 @@ static inline bool kvmppc_shared_big_endian(struct kvm_vcpu *vcpu)
 #endif
 }
 
+#define SPRNG_WRAPPER_GET(reg, e500hv_spr)				\
+static inline ulong kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
+{									\
+	return mfspr(e500hv_spr);					\
+}									\
+
+#define SPRNG_WRAPPER_SET(reg, e500hv_spr)				\
+static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, ulong val)	\
+{									\
+	mtspr(e500hv_spr, val);						\
+}									\
+
 #define SHARED_WRAPPER_GET(reg, size)					\
-static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)	\
+static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
 {									\
 	if (kvmppc_shared_big_endian(vcpu))				\
 	       return be##size##_to_cpu(vcpu->arch.shared->reg);	\
@@ -494,14 +506,30 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 	SHARED_WRAPPER_GET(reg, size)					\
 	SHARED_WRAPPER_SET(reg, size)					\
 
+#define SPRNG_WRAPPER(reg, e500hv_spr)					\
+	SPRNG_WRAPPER_GET(reg, e500hv_spr)				\
+	SPRNG_WRAPPER_SET(reg, e500hv_spr)				\
+
+#ifdef CONFIG_KVM_BOOKE_HV
+
+#define SHARED_SPRNG_WRAPPER(reg, size, e500hv_spr)			\
+	SPRNG_WRAPPER(reg, e500hv_spr)					\
+
+#else
+
+#define SHARED_SPRNG_WRAPPER(reg, size, e500hv_spr)			\
+	SHARED_WRAPPER(reg, size)					\
+
+#endif
+
 SHARED_WRAPPER(critical, 64)
-SHARED_WRAPPER(sprg0, 64)
-SHARED_WRAPPER(sprg1, 64)
-SHARED_WRAPPER(sprg2, 64)
-SHARED_WRAPPER(sprg3, 64)
-SHARED_WRAPPER(srr0, 64)
-SHARED_WRAPPER(srr1, 64)
-SHARED_WRAPPER(dar, 64)
+SHARED_SPRNG_WRAPPER(sprg0, 64, SPRN_GSPRG0)
+SHARED_SPRNG_WRAPPER(sprg1, 64, SPRN_GSPRG1)
+SHARED_SPRNG_WRAPPER(sprg2, 64, SPRN_GSPRG2)
+SHARED_SPRNG_WRAPPER(sprg3, 64, SPRN_GSPRG3)
+SHARED_SPRNG_WRAPPER(srr0, 64, SPRN_GSRR0)
+SHARED_SPRNG_WRAPPER(srr1, 64, SPRN_GSRR1)
+SHARED_SPRNG_WRAPPER(dar, 64, SPRN_GDEAR)
 SHARED_WRAPPER_GET(msr, 64)
 static inline void kvmppc_set_msr_fast(struct kvm_vcpu *vcpu, u64 val)
 {
-- 
1.8.1.4
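The wrapper scheme in this patch can be tried outside the kernel. The following is only a userspace sketch of the idea, not the kernel code: fake_spr, struct fake_vcpu, FAKE_BOOKE_HV and all macro names are hypothetical stand-ins for mfspr()/mtspr(), the shared page and CONFIG_KVM_BOOKE_HV. It shows how one macro invocation per register yields either register-backed or memory-backed accessors with the same names, so callers do not need to know which backing store is in use.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the hardware SPR file and the shared page. */
enum { FAKE_SPRN_GSPRG0, FAKE_SPRN_GSRR0, FAKE_SPR_COUNT };
uint64_t fake_spr[FAKE_SPR_COUNT];

struct fake_vcpu {
	uint64_t sprg0;
	uint64_t srr0;
};

/* Register-backed accessors (models the BOOKE-HV case). */
#define SPR_WRAPPER(reg, spr)						\
static inline uint64_t get_##reg(struct fake_vcpu *v)			\
{									\
	(void)v;							\
	return fake_spr[spr];						\
}									\
static inline void set_##reg(struct fake_vcpu *v, uint64_t val)	\
{									\
	(void)v;							\
	fake_spr[spr] = val;						\
}

/* Memory-backed accessors (models the shared-page case). */
#define MEM_WRAPPER(reg)						\
static inline uint64_t get_##reg(struct fake_vcpu *v)			\
{									\
	return v->reg;							\
}									\
static inline void set_##reg(struct fake_vcpu *v, uint64_t val)	\
{									\
	v->reg = val;							\
}

/* One switch decides which backing store every register uses. */
#ifdef FAKE_BOOKE_HV
#define SHARED_SPR_WRAPPER(reg, spr)	SPR_WRAPPER(reg, spr)
#else
#define SHARED_SPR_WRAPPER(reg, spr)	MEM_WRAPPER(reg)
#endif

SHARED_SPR_WRAPPER(sprg0, FAKE_SPRN_GSPRG0)
SHARED_SPR_WRAPPER(srr0, FAKE_SPRN_GSRR0)
```

Building with -DFAKE_BOOKE_HV switches every accessor to the register-backed variant without touching any caller.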
[PULL 07/63] KVM: PPC: Book3S HV: Fix ABIv2 indirect branch issue
From: Anton Blanchard an...@samba.org

To establish addressability quickly, ABIv2 requires the target address of
the function being called to be in r12.

Signed-off-by: Anton Blanchard an...@samba.org
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 868347e..da1cac5 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1913,8 +1913,8 @@ hcall_try_real_mode:
 	lwax	r3,r3,r4
 	cmpwi	r3,0
 	beq	guest_exit_cont
-	add	r3,r3,r4
-	mtctr	r3
+	add	r12,r3,r4
+	mtctr	r12
 	mr	r3,r9		/* get vcpu pointer */
 	ld	r4,VCPU_GPR(R4)(r9)
 	bctrl
-- 
1.8.1.4
[PULL 52/63] KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page
From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

When calculating the lower bits of the AVA field, use the shift count based
on the base page size. Also add the missing segment size and remove a stale
comment.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Acked-by: Paul Mackerras pau...@samba.org
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 6 --
 arch/powerpc/kvm/book3s_hv.c | 6 --
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index e504f88..07cf9df 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -147,6 +147,8 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 	 */
 	/* This covers 14..54 bits of va*/
 	rb = (v & ~0x7fUL) << 16;		/* AVA field */
+
+	rb |= v >> (62 - 8);			/* B field */
 	/*
 	 * AVA in v had cleared lower 23 bits. We need to derive
 	 * that from pteg index
@@ -177,10 +179,10 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 	{
 		int aval_shift;
 		/*
-		 * remaining 7bits of AVA/LP fields
+		 * remaining bits of AVA/LP fields
 		 * Also contain the rr bits of LP
 		 */
-		rb |= (va_low & 0x7f) << 16;
+		rb |= (va_low << mmu_psize_defs[b_psize].shift) & 0x7ff000;
 		/*
 		 * Now clear not needed LP bits based on actual psize
 		 */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c470d55..27cced9 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2064,12 +2064,6 @@ static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
 	(*sps)->page_shift = def->shift;
 	(*sps)->slb_enc = def->sllp;
 	(*sps)->enc[0].page_shift = def->shift;
-	/*
-	 * Only return base page encoding. We don't want to return
-	 * all the supporting pte_enc, because our H_ENTER doesn't
-	 * support MPSS yet. Once they do, we can start passing all
-	 * support pte_enc here
-	 */
 	(*sps)->enc[0].pte_enc = def->penc[linux_psize];
 	/*
 	 * Add 16MB MPSS support if host supports it
-- 
1.8.1.4
[PULL 56/63] KVM: PPC: Use kvm_read_guest in kvmppc_ld
We have a nice and handy helper to read from guest physical address space,
so we should make use of it in kvmppc_ld as we already do for its
counterpart in kvmppc_st.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/powerpc.c | 27 ++-
 1 file changed, 2 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 3d59730..be40886 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -309,19 +309,6 @@ int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvmppc_emulate_mmio);
 
-static hva_t kvmppc_pte_to_hva(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
-{
-	hva_t hpage;
-
-	hpage = gfn_to_hva(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
-	if (kvm_is_error_hva(hpage))
-		goto err;
-
-	return hpage | (pte->raddr & ~PAGE_MASK);
-err:
-	return KVM_HVA_ERR_BAD;
-}
-
 int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 	      bool data)
 {
@@ -351,7 +338,6 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 	      bool data)
 {
 	struct kvmppc_pte pte;
-	hva_t hva = *eaddr;
 	int rc;
 
 	vcpu->stat.ld++;
@@ -369,19 +355,10 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 	if (!data && !pte.may_execute)
 		return -ENOEXEC;
 
-	hva = kvmppc_pte_to_hva(vcpu, &pte);
-	if (kvm_is_error_hva(hva))
-		goto mmio;
-
-	if (copy_from_user(ptr, (void __user *)hva, size)) {
-		printk(KERN_INFO "kvmppc_ld at 0x%lx failed\n", hva);
-		goto mmio;
-	}
+	if (kvm_read_guest(vcpu->kvm, pte.raddr, ptr, size))
+		return EMULATE_DO_MMIO;
 
 	return EMULATE_DONE;
-
-mmio:
-	return EMULATE_DO_MMIO;
 }
 EXPORT_SYMBOL_GPL(kvmppc_ld);
 
-- 
1.8.1.4
[PULL 12/63] KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling
From: Paul Mackerras pau...@samba.org This provides a way for userspace controls which sPAPR hcalls get handled in the kernel. Each hcall can be individually enabled or disabled for in-kernel handling, except for H_RTAS. The exception for H_RTAS is because userspace can already control whether individual RTAS functions are handled in-kernel or not via the KVM_PPC_RTAS_DEFINE_TOKEN ioctl, and because the numeric value for H_RTAS is out of the normal sequence of hcall numbers. Hcalls are enabled or disabled using the KVM_ENABLE_CAP ioctl for the KVM_CAP_PPC_ENABLE_HCALL capability on the file descriptor for the VM. The args field of the struct kvm_enable_cap specifies the hcall number in args[0] and the enable/disable flag in args[1]; 0 means disable in-kernel handling (so that the hcall will always cause an exit to userspace) and 1 means enable. Enabling or disabling in-kernel handling of an hcall is effective across the whole VM. The ability for KVM_ENABLE_CAP to be used on a VM file descriptor on PowerPC is new, added by this commit. The KVM_CAP_ENABLE_CAP_VM capability advertises that this ability exists. When a VM is created, an initial set of hcalls are enabled for in-kernel handling. The set that is enabled is the set that have an in-kernel implementation at this point. Any new hcall implementations from this point onwards should not be added to the default set without a good reason. No distinction is made between real-mode and virtual-mode hcall implementations; the one setting controls them both. 
Signed-off-by: Paul Mackerras pau...@samba.org Signed-off-by: Alexander Graf ag...@suse.de --- Documentation/virtual/kvm/api.txt | 41 -- arch/powerpc/include/asm/kvm_book3s.h | 1 + arch/powerpc/include/asm/kvm_host.h | 2 ++ arch/powerpc/kernel/asm-offsets.c | 1 + arch/powerpc/kvm/book3s_hv.c| 51 + arch/powerpc/kvm/book3s_hv_rmhandlers.S | 11 +++ arch/powerpc/kvm/book3s_pr.c| 5 arch/powerpc/kvm/book3s_pr_papr.c | 37 arch/powerpc/kvm/powerpc.c | 45 + include/uapi/linux/kvm.h| 1 + 10 files changed, 193 insertions(+), 2 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 0fe3649..5c54d19 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2863,8 +2863,8 @@ The fields in each entry are defined as follows: this function/index combination -6. Capabilities that can be enabled +6. Capabilities that can be enabled on vCPUs + There are certain capabilities that change the behavior of the virtual CPU when enabled. To enable them, please see section 4.37. Below you can find a list of @@ -3002,3 +3002,40 @@ Parameters: args[0] is the XICS device fd args[1] is the XICS CPU number (server ID) for this vcpu This capability connects the vcpu to an in-kernel XICS device. + + +7. Capabilities that can be enabled on VMs +-- + +There are certain capabilities that change the behavior of the virtual +machine when enabled. To enable them, please see section 4.37. Below +you can find a list of capabilities and what their effect on the VM +is when enabling them. + +The following information is provided along with the description: + + Architectures: which instruction set architectures provide this ioctl. + x86 includes both i386 and x86_64. + + Parameters: what parameters are accepted by the capability. + + Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL) + are not detailed, but errors with specific meanings are. 
+ + +7.1 KVM_CAP_PPC_ENABLE_HCALL + +Architectures: ppc +Parameters: args[0] is the sPAPR hcall number + args[1] is 0 to disable, 1 to enable in-kernel handling + +This capability controls whether individual sPAPR hypercalls (hcalls) +get handled by the kernel or not. Enabling or disabling in-kernel +handling of an hcall is effective across the VM. On creation, an +initial set of hcalls are enabled for in-kernel handling, which +consists of those hcalls for which in-kernel handlers were implemented +before this capability was implemented. If disabled, the kernel will +not to attempt to handle the hcall, but will always exit to userspace +to handle it. Note that it may not make sense to enable some and +disable others of a group of related hcalls, but KVM does not prevent +userspace from doing that. diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index a20cc0b..052ab2a 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -187,6 +187,7 @@ extern void kvmppc_hv_entry_trampoline(void); extern u32 kvmppc_alignment_dsisr(struct
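The enable/disable semantics described in this message can be modelled as a per-VM bitmap. What follows is a minimal userspace sketch, not the kernel implementation: struct vm_hcalls, SKETCH_MAX_HCALL and both function names are hypothetical, and the sketch assumes that hcall numbers are multiples of 4 so that hcall/4 can serve as the bit index. The H_RTAS exception is not modelled.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical upper bound on hcall numbers, for illustration only. */
#define SKETCH_MAX_HCALL	0x450UL
#define BITS_PER_WORD		(sizeof(unsigned long) * CHAR_BIT)
/* Enough words to hold bit indices 0 .. SKETCH_MAX_HCALL / 4. */
#define HCALL_WORDS		((SKETCH_MAX_HCALL / 4 + BITS_PER_WORD) / BITS_PER_WORD)

/* Per-VM state: one bit per hcall, set = handled in the kernel. */
struct vm_hcalls {
	unsigned long enabled[HCALL_WORDS];
};

/* Mirrors args[0] = hcall number, args[1] = 0/1; returns -1 on bad input. */
static int hcall_set_enabled(struct vm_hcalls *vm, unsigned long hcall,
			     bool enable)
{
	unsigned long bit, mask;

	if (hcall > SKETCH_MAX_HCALL || (hcall & 3))
		return -1;

	bit = hcall / 4;
	mask = 1UL << (bit % BITS_PER_WORD);
	if (enable)
		vm->enabled[bit / BITS_PER_WORD] |= mask;
	else
		vm->enabled[bit / BITS_PER_WORD] &= ~mask;
	return 0;
}

/* False means the hcall would always exit to userspace. */
static bool hcall_handled_in_kernel(const struct vm_hcalls *vm,
				    unsigned long hcall)
{
	unsigned long bit;

	if (hcall > SKETCH_MAX_HCALL || (hcall & 3))
		return false;

	bit = hcall / 4;
	return (vm->enabled[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1;
}
```

Because the state lives in the VM structure rather than in a vcpu, a change made through one call is visible to every vcpu, which matches the "effective across the whole VM" wording above.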
[PULL 24/63] KVM: PPC: Book3S: Move vcore definition to end of kvm_arch struct
When building KVM with a lot of vcores (NR_CPUS is big), we can potentially
get out of the ld immediate range for dereferences inside that struct. Move
the array to the end of our kvm_arch struct.

This fixes compilation issues with NR_CPUS=2048 for me.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_host.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index faf2f0e..855ba4d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -255,7 +255,6 @@ struct kvm_arch {
 	atomic_t hpte_mod_interest;
 	spinlock_t slot_phys_lock;
 	cpumask_t need_tlb_flush;
-	struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
 	int hpt_cma_alloc;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
@@ -273,6 +272,10 @@ struct kvm_arch {
 	struct kvmppc_xics *xics;
 #endif
 	struct kvmppc_ops *kvm_ops;
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	/* This array can grow quite large, keep it at the end */
+	struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
+#endif
 };
 
 /*
-- 
1.8.1.4
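The effect of the move can be illustrated with offsetof(). This is only a sketch with hypothetical struct names and a hypothetical array size of 8192 entries (chosen so the array alone exceeds the limit on both 32-bit and 64-bit builds); it assumes the load displacement is a signed 16-bit byte offset, so any field placed after a large array may land beyond what a single load can address.

```c
#include <stddef.h>

/* Hypothetical size, picked so the array exceeds 32 KiB on any build. */
#define SKETCH_VCORES	8192

/* Assumed limit: largest positive signed 16-bit displacement. */
#define SKETCH_MAX_DISP	32767

/* Old layout: fields after the array inherit a huge offset. */
struct arch_array_first {
	void *vcores[SKETCH_VCORES];
	int hpt_cma_alloc;
	void *kvm_ops;
};

/* New layout: the array goes last, small fields keep small offsets. */
struct arch_array_last {
	int hpt_cma_alloc;
	void *kvm_ops;
	void *vcores[SKETCH_VCORES];
};

/* Nonzero if a field at this offset fits a single displacement. */
static int reachable(size_t offset)
{
	return offset <= SKETCH_MAX_DISP;
}
```

With the array last, only accesses to the array itself can need a larger offset, and those are typically indexed rather than encoded as fixed displacements.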
[PULL 28/63] KVM: PPC: Book3S: Make magic page properly 4k mappable
The magic page is defined as a 4k page of per-vCPU data that is shared between the guest and the host to accelerate accesses to privileged registers. However, when the host is using 64k page size granularity we weren't quite as strict about that rule anymore. Instead, we partially treated all of the upper 64k as magic page and mapped only the uppermost 4k with the actual magic contents. This works well enough for Linux which doesn't use any memory in kernel space in the upper 64k, but Mac OS X got upset. So this patch makes magic page actually stay in a 4k range even on 64k page size hosts. This patch fixes magic page usage with Mac OS X (using MOL) on 64k PAGE_SIZE hosts for me. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_book3s.h | 2 +- arch/powerpc/kvm/book3s.c | 12 ++-- arch/powerpc/kvm/book3s_32_mmu_host.c | 7 +++ arch/powerpc/kvm/book3s_64_mmu_host.c | 5 +++-- arch/powerpc/kvm/book3s_pr.c | 13 ++--- arch/powerpc/kvm/powerpc.c| 19 +++ 6 files changed, 38 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index b1cf18d..20fb6f2 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -158,7 +158,7 @@ extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat, bool upper, u32 val); extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr); extern int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu); -extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, bool writing, +extern pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing, bool *writable); extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev, unsigned long *rmap, long pte_index, int realmode); diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 1d13764..31facfc 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -354,18 
+354,18 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvmppc_core_prepare_to_enter); -pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, bool writing, +pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing, bool *writable) { - ulong mp_pa = vcpu-arch.magic_page_pa; + ulong mp_pa = vcpu-arch.magic_page_pa KVM_PAM; + gfn_t gfn = gpa PAGE_SHIFT; if (!(kvmppc_get_msr(vcpu) MSR_SF)) mp_pa = (uint32_t)mp_pa; /* Magic page override */ - if (unlikely(mp_pa) - unlikely(((gfn PAGE_SHIFT) KVM_PAM) == -((mp_pa PAGE_MASK) KVM_PAM))) { + gpa = ~0xFFFULL; + if (unlikely(mp_pa) unlikely((gpa KVM_PAM) == mp_pa)) { ulong shared_page = ((ulong)vcpu-arch.shared) PAGE_MASK; pfn_t pfn; @@ -378,7 +378,7 @@ pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, bool writing, return gfn_to_pfn_prot(vcpu-kvm, gfn, writing, writable); } -EXPORT_SYMBOL_GPL(kvmppc_gfn_to_pfn); +EXPORT_SYMBOL_GPL(kvmppc_gpa_to_pfn); static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data, bool iswrite, struct kvmppc_pte *pte) diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c index 678e753..2035d16 100644 --- a/arch/powerpc/kvm/book3s_32_mmu_host.c +++ b/arch/powerpc/kvm/book3s_32_mmu_host.c @@ -156,11 +156,10 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte, bool writable; /* Get host physical address for gpa */ - hpaddr = kvmppc_gfn_to_pfn(vcpu, orig_pte-raddr PAGE_SHIFT, - iswrite, writable); + hpaddr = kvmppc_gpa_to_pfn(vcpu, orig_pte-raddr, iswrite, writable); if (is_error_noslot_pfn(hpaddr)) { - printk(KERN_INFO Couldn't get guest page for gfn %lx!\n, -orig_pte-eaddr); + printk(KERN_INFO Couldn't get guest page for gpa %lx!\n, +orig_pte-raddr); r = -EINVAL; goto out; } diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c index 0ac9839..b982d92 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_host.c +++ 
b/arch/powerpc/kvm/book3s_64_mmu_host.c @@ -104,9 +104,10 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte, smp_rmb(); /* Get host physical address for gpa */ - pfn = kvmppc_gfn_to_pfn(vcpu, gfn, iswrite, writable); + pfn = kvmppc_gpa_to_pfn(vcpu, orig_pte-raddr, iswrite, writable); if (is_error_noslot_pfn(pfn)) { - printk(KERN_INFO Couldn't get guest page for gfn
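The 4k magic-page check introduced by this patch can be sketched in isolation. The function below is a hypothetical userspace rendering, not the kernel code: is_magic_page() and SKETCH_KVM_PAM are made-up names, and the mask value is an assumption. It shows why comparing at 4k granularity, rather than at the host page size, keeps the rest of a 64k host page out of the override.

```c
#include <stdint.h>

/* Assumed physical address mask, for illustration only. */
#define SKETCH_KVM_PAM	UINT64_C(0x0fffffffffffffff)

/*
 * Returns nonzero if gpa falls inside the 4k magic page.
 * magic_page_pa is expected to be 4k aligned; 0 means "no magic page".
 */
static int is_magic_page(uint64_t gpa, uint64_t magic_page_pa,
			 int guest_is_64bit)
{
	uint64_t mp_pa = magic_page_pa & SKETCH_KVM_PAM;

	/* A 32-bit guest only sees the low 32 bits of the address. */
	if (!guest_is_64bit)
		mp_pa = (uint32_t)mp_pa;

	/* Compare at 4k granularity regardless of the host page size. */
	gpa &= ~UINT64_C(0xFFF);
	return mp_pa != 0 && (gpa & SKETCH_KVM_PAM) == mp_pa;
}
```

An address in the same 64k host page but a different 4k page no longer matches, which is the behaviour the commit message describes.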
[PULL 53/63] KVM: PPC: Implement kvmppc_xlate for all targets
We have a nice API to find the translated GPAs of a GVA including protection flags. So far we only use it on Book3S, but there's no reason the same shouldn't be used on BookE as well. Implement a kvmppc_xlate() version for BookE and clean it up to make it more readable in general. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_ppc.h | 13 ++ arch/powerpc/kvm/book3s.c | 12 ++--- arch/powerpc/kvm/booke.c | 51 ++ 3 files changed, 72 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index e381363..1a60af9 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -52,6 +52,16 @@ enum instruction_type { INST_SC,/* system call */ }; +enum xlate_instdata { + XLATE_INST, /* translate instruction address */ + XLATE_DATA /* translate data address */ +}; + +enum xlate_readwrite { + XLATE_READ, /* check for read permissions */ + XLATE_WRITE /* check for write permissions */ +}; + extern int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu); extern int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu); extern void kvmppc_handler_highmem(void); @@ -94,6 +104,9 @@ extern gpa_t kvmppc_mmu_xlate(struct kvm_vcpu *vcpu, unsigned int gtlb_index, gva_t eaddr); extern void kvmppc_mmu_dtlb_miss(struct kvm_vcpu *vcpu); extern void kvmppc_mmu_itlb_miss(struct kvm_vcpu *vcpu); +extern int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, + enum xlate_instdata xlid, enum xlate_readwrite xlrw, + struct kvmppc_pte *pte); extern struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id); diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index a3cbada..0b6c84e 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -380,9 +380,11 @@ pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing, } EXPORT_SYMBOL_GPL(kvmppc_gpa_to_pfn); -static int kvmppc_xlate(struct kvm_vcpu *vcpu, 
ulong eaddr, bool data, - bool iswrite, struct kvmppc_pte *pte) +int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, enum xlate_instdata xlid, +enum xlate_readwrite xlrw, struct kvmppc_pte *pte) { + bool data = (xlid == XLATE_DATA); + bool iswrite = (xlrw == XLATE_WRITE); int relocated = (kvmppc_get_msr(vcpu) (data ? MSR_DR : MSR_IR)); int r; @@ -434,7 +436,8 @@ int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, vcpu-stat.st++; - r = kvmppc_xlate(vcpu, *eaddr, data, true, pte); + r = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST, +XLATE_WRITE, pte); if (r 0) return r; @@ -459,7 +462,8 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, vcpu-stat.ld++; - rc = kvmppc_xlate(vcpu, *eaddr, data, false, pte); + rc = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST, + XLATE_READ, pte); if (rc) return rc; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 97bcde2..2f697b4 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -1785,6 +1785,57 @@ void kvm_guest_protect_msr(struct kvm_vcpu *vcpu, ulong prot_bitmap, bool set) #endif } +int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, enum xlate_instdata xlid, +enum xlate_readwrite xlrw, struct kvmppc_pte *pte) +{ + int gtlb_index; + gpa_t gpaddr; + +#ifdef CONFIG_KVM_E500V2 + if (!(vcpu-arch.shared-msr MSR_PR) + (eaddr PAGE_MASK) == vcpu-arch.magic_page_ea) { + pte-eaddr = eaddr; + pte-raddr = (vcpu-arch.magic_page_pa PAGE_MASK) | +(eaddr ~PAGE_MASK); + pte-vpage = eaddr PAGE_SHIFT; + pte-may_read = true; + pte-may_write = true; + pte-may_execute = true; + + return 0; + } +#endif + + /* Check the guest TLB. */ + switch (xlid) { + case XLATE_INST: + gtlb_index = kvmppc_mmu_itlb_index(vcpu, eaddr); + break; + case XLATE_DATA: + gtlb_index = kvmppc_mmu_dtlb_index(vcpu, eaddr); + break; + default: + BUG(); + } + + /* Do we have a TLB entry at all? 
*/ + if (gtlb_index 0) + return -ENOENT; + + gpaddr = kvmppc_mmu_xlate(vcpu, gtlb_index, eaddr); + + pte-eaddr = eaddr; + pte-raddr = (gpaddr PAGE_MASK) | (eaddr ~PAGE_MASK); + pte-vpage = eaddr
[PULL 03/63] KVM: PPC: BOOK3S: PR: Emulate instruction counter
From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Writing to IC is not allowed in the privileged mode. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h | 1 + arch/powerpc/kvm/book3s.c | 6 ++ arch/powerpc/kvm/book3s_emulate.c | 3 +++ arch/powerpc/kvm/book3s_hv.c| 6 -- arch/powerpc/kvm/book3s_pr.c| 4 5 files changed, 14 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index bd3caea..f9ae696 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -506,6 +506,7 @@ struct kvm_vcpu_arch { /* Time base value when we entered the guest */ u64 entry_tb; u64 entry_vtb; + u64 entry_ic; u32 tcr; ulong tsr; /* we need to perform set/clr_bits() which requires ulong */ u32 ivor[64]; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index ddce1ea..90aa5c7 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -649,6 +649,9 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) case KVM_REG_PPC_VTB: val = get_reg_val(reg-id, vcpu-arch.vtb); break; + case KVM_REG_PPC_IC: + val = get_reg_val(reg-id, vcpu-arch.ic); + break; default: r = -EINVAL; break; @@ -756,6 +759,9 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) case KVM_REG_PPC_VTB: vcpu-arch.vtb = set_reg_val(reg-id, val); break; + case KVM_REG_PPC_IC: + vcpu-arch.ic = set_reg_val(reg-id, val); + break; default: r = -EINVAL; break; diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 1bb16a5..84fddcd 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -580,6 +580,9 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val case SPRN_VTB: *spr_val = vcpu-arch.vtb; break; + case SPRN_IC: + *spr_val = vcpu-arch.ic; + break; case 
SPRN_GQR0: case SPRN_GQR1: case SPRN_GQR2: diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 315e884..1562acf 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -894,9 +894,6 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id, case KVM_REG_PPC_CIABR: *val = get_reg_val(id, vcpu-arch.ciabr); break; - case KVM_REG_PPC_IC: - *val = get_reg_val(id, vcpu-arch.ic); - break; case KVM_REG_PPC_CSIGR: *val = get_reg_val(id, vcpu-arch.csigr); break; @@ -1091,9 +1088,6 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id, if ((vcpu-arch.ciabr CIABR_PRIV) == CIABR_PRIV_HYPER) vcpu-arch.ciabr = ~CIABR_PRIV;/* disable */ break; - case KVM_REG_PPC_IC: - vcpu-arch.ic = set_reg_val(id, *val); - break; case KVM_REG_PPC_CSIGR: vcpu-arch.csigr = set_reg_val(id, *val); break; diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index d2deb9e..3da412e 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -126,6 +126,8 @@ void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu, */ vcpu-arch.entry_tb = get_tb(); vcpu-arch.entry_vtb = get_vtb(); + if (cpu_has_feature(CPU_FTR_ARCH_207S)) + vcpu-arch.entry_ic = mfspr(SPRN_IC); svcpu-in_use = true; } @@ -178,6 +180,8 @@ void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu, vcpu-arch.purr += get_tb() - vcpu-arch.entry_tb; vcpu-arch.spurr += get_tb() - vcpu-arch.entry_tb; vcpu-arch.vtb += get_vtb() - vcpu-arch.entry_vtb; + if (cpu_has_feature(CPU_FTR_ARCH_207S)) + vcpu-arch.ic += mfspr(SPRN_IC) - vcpu-arch.entry_ic; svcpu-in_use = false; out: -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
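The accounting pattern used for IC in this patch (snapshot on entry, add the difference on exit) can be sketched on its own. The names struct ic_state, guest_enter() and guest_exit() are hypothetical, and the hardware counter is passed in as a plain argument instead of being read with mfspr(). Unsigned 64-bit subtraction keeps the delta correct even if the counter wraps between entry and exit.

```c
#include <stdint.h>

struct ic_state {
	uint64_t ic;		/* guest-visible instruction count */
	uint64_t entry_ic;	/* hardware counter value at guest entry */
};

/* Remember where the hardware counter stood when the guest started. */
static void guest_enter(struct ic_state *s, uint64_t hw_ic)
{
	s->entry_ic = hw_ic;
}

/* Credit the guest with only what elapsed while it was running. */
static void guest_exit(struct ic_state *s, uint64_t hw_ic)
{
	s->ic += hw_ic - s->entry_ic;
}
```

Instructions executed by the host between an exit and the next entry are never added, so the guest-visible value only advances while the guest runs.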
[PULL 04/63] KVM: PPC: Book3s PR: Disable AIL mode with OPAL
When we're using PR KVM we must not allow the CPU to take interrupts
in virtual mode, as the SLB does not contain host kernel mappings
when running inside the guest context.

To make sure we get good performance for non-KVM tasks but still
properly functioning PR KVM, let's just disable AIL whenever a vcpu is
scheduled in.

This is fundamentally different from how we deal with AIL on pSeries
type machines where we disable AIL for the whole machine as soon as
a single KVM VM is up.

The reason for that is easy - on pSeries we do not have control over
per-cpu configuration of AIL. We also don't want to mess with CPU hotplug
races and AIL configuration, so setting it per CPU is easier and more
flexible.

This patch fixes running PR KVM on POWER8 bare metal for me.

Signed-off-by: Alexander Graf ag...@suse.de
Acked-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_pr.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 3da412e..8ea7da4 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -71,6 +71,12 @@ static void kvmppc_core_vcpu_load_pr(struct kvm_vcpu *vcpu, int cpu)
 	svcpu->in_use = 0;
 	svcpu_put(svcpu);
 #endif
+
+	/* Disable AIL if supported */
+	if (cpu_has_feature(CPU_FTR_HVMODE) &&
+	    cpu_has_feature(CPU_FTR_ARCH_207S))
+		mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~LPCR_AIL);
+
 	vcpu->cpu = smp_processor_id();
 #ifdef CONFIG_PPC_BOOK3S_32
 	current->thread.kvm_shadow_vcpu = vcpu->arch.shadow_vcpu;
@@ -91,6 +97,12 @@ static void kvmppc_core_vcpu_put_pr(struct kvm_vcpu *vcpu)
 
 	kvmppc_giveup_ext(vcpu, MSR_FP | MSR_VEC | MSR_VSX);
 	kvmppc_giveup_fac(vcpu, FSCR_TAR_LG);
+
+	/* Enable AIL if supported */
+	if (cpu_has_feature(CPU_FTR_HVMODE) &&
+	    cpu_has_feature(CPU_FTR_ARCH_207S))
+		mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_AIL_3);
+
 	vcpu->cpu = -1;
 }
 
-- 
1.8.1.4
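The per-CPU toggle in this patch is a read-modify-write of the AIL field in LPCR. A minimal sketch of just the bit manipulation follows; the register value is passed in and returned instead of using mfspr()/mtspr(), the function names are hypothetical, and the two constants are assumed values for the AIL field mask and the "AIL = 3" setting rather than definitions taken from this patch.

```c
#include <stdint.h>

/* Assumed encodings, for illustration only. */
#define SKETCH_LPCR_AIL		UINT64_C(0x01800000)	/* AIL field mask */
#define SKETCH_LPCR_AIL_3	UINT64_C(0x01800000)	/* AIL = 3 */

/* vcpu scheduled in: clear the AIL field, leave all other bits alone. */
static uint64_t lpcr_on_vcpu_load(uint64_t lpcr)
{
	return lpcr & ~SKETCH_LPCR_AIL;
}

/* vcpu scheduled out: set AIL back to 3 for host and non-KVM tasks. */
static uint64_t lpcr_on_vcpu_put(uint64_t lpcr)
{
	return lpcr | SKETCH_LPCR_AIL_3;
}
```

Since only the AIL field is touched, unrelated LPCR bits survive a load/put round trip unchanged.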
[PULL 55/63] KVM: PPC: Remove kvmppc_bad_hva()
We have a proper define for invalid HVA numbers. Use that instead of the
ppc specific kvmppc_bad_hva().

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/powerpc.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2c5a1c3..3d59730 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -309,11 +309,6 @@ int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvmppc_emulate_mmio);
 
-static hva_t kvmppc_bad_hva(void)
-{
-	return PAGE_OFFSET;
-}
-
 static hva_t kvmppc_pte_to_hva(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
 {
 	hva_t hpage;
@@ -324,7 +319,7 @@ static hva_t kvmppc_pte_to_hva(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
 
 	return hpage | (pte->raddr & ~PAGE_MASK);
 err:
-	return kvmppc_bad_hva();
+	return KVM_HVA_ERR_BAD;
 }
 
 int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
-- 
1.8.1.4
[PULL 45/63] KVM: PPC: Book3S PR: Take SRCU read lock around RTAS kvm_read_guest() call
From: Paul Mackerras pau...@samba.org

This does for PR KVM what c9438092cae4 ("KVM: PPC: Book3S HV: Take SRCU
read lock around kvm_read_guest() call") did for HV KVM, that is,
eliminate a "suspicious rcu_dereference_check() usage!" warning by
taking the SRCU lock around the call to kvmppc_rtas_hcall().

It also fixes a return of RESUME_HOST to return EMULATE_FAIL instead,
since kvmppc_h_pr() is supposed to return EMULATE_* values.

Signed-off-by: Paul Mackerras pau...@samba.org
Cc: sta...@vger.kernel.org
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_pr_papr.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index 6d0143f..ce3c893 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -267,6 +267,8 @@ static int kvmppc_h_pr_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
 
 int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
 {
+	int rc, idx;
+
 	if (cmd <= MAX_HCALL_OPCODE &&
 	    !test_bit(cmd/4, vcpu->kvm->arch.enabled_hcalls))
 		return EMULATE_FAIL;
@@ -299,8 +301,11 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
 		break;
 	case H_RTAS:
 		if (list_empty(&vcpu->kvm->arch.rtas_tokens))
-			return RESUME_HOST;
-		if (kvmppc_rtas_hcall(vcpu))
+			break;
+		idx = srcu_read_lock(&vcpu->kvm->srcu);
+		rc = kvmppc_rtas_hcall(vcpu);
+		srcu_read_unlock(&vcpu->kvm->srcu, idx);
+		if (rc)
 			break;
 		kvmppc_set_gpr(vcpu, 3, 0);
 		return EMULATE_DONE;
-- 
1.8.1.4
[PULL 59/63] KVM: PPC: Expose helper functions for data/inst faults
We're going to implement guest code interpretation in KVM for some rare corner cases. This code needs to be able to inject data and instruction faults into the guest when it encounters them. Expose generic APIs to do this in a reasonably subarch agnostic fashion. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_ppc.h | 8 arch/powerpc/kvm/book3s.c | 17 + arch/powerpc/kvm/booke.c | 16 ++-- 3 files changed, 35 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 2214ee6..cbee453 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -132,6 +132,14 @@ extern void kvmppc_core_dequeue_dec(struct kvm_vcpu *vcpu); extern void kvmppc_core_queue_external(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq); extern void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu); +extern void kvmppc_core_queue_dtlb_miss(struct kvm_vcpu *vcpu, ulong dear_flags, + ulong esr_flags); +extern void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu, + ulong dear_flags, + ulong esr_flags); +extern void kvmppc_core_queue_itlb_miss(struct kvm_vcpu *vcpu); +extern void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, + ulong esr_flags); extern void kvmppc_core_flush_tlb(struct kvm_vcpu *vcpu); extern int kvmppc_core_check_requests(struct kvm_vcpu *vcpu); diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index de8da33..dd03f6b 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -230,6 +230,23 @@ void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu) kvmppc_book3s_dequeue_irqprio(vcpu, BOOK3S_INTERRUPT_EXTERNAL_LEVEL); } +void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu, ulong dar, + ulong flags) +{ + kvmppc_set_dar(vcpu, dar); + kvmppc_set_dsisr(vcpu, flags); + kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_DATA_STORAGE); +} + +void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, ulong flags) +{ + u64 
msr = kvmppc_get_msr(vcpu); + msr = ~(SRR1_ISI_NOPT | SRR1_ISI_N_OR_G | SRR1_ISI_PROT); + msr |= flags (SRR1_ISI_NOPT | SRR1_ISI_N_OR_G | SRR1_ISI_PROT); + kvmppc_set_msr_fast(vcpu, msr); + kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_INST_STORAGE); +} + int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority) { int deliver = 1; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 2f697b4..f30948a 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -185,24 +185,28 @@ static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu, set_bit(priority, vcpu-arch.pending_exceptions); } -static void kvmppc_core_queue_dtlb_miss(struct kvm_vcpu *vcpu, -ulong dear_flags, ulong esr_flags) +void kvmppc_core_queue_dtlb_miss(struct kvm_vcpu *vcpu, +ulong dear_flags, ulong esr_flags) { vcpu-arch.queued_dear = dear_flags; vcpu-arch.queued_esr = esr_flags; kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DTLB_MISS); } -static void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu, - ulong dear_flags, ulong esr_flags) +void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu, + ulong dear_flags, ulong esr_flags) { vcpu-arch.queued_dear = dear_flags; vcpu-arch.queued_esr = esr_flags; kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DATA_STORAGE); } -static void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, - ulong esr_flags) +void kvmppc_core_queue_itlb_miss(struct kvm_vcpu *vcpu) +{ + kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_ITLB_MISS); +} + +void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, ulong esr_flags) { vcpu-arch.queued_esr = esr_flags; kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_INST_STORAGE); -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 60/63] KVM: PPC: Remove DCR handling
DCR handling was only needed for 440 KVM. Since we removed it, we can also remove handling of DCR accesses. Signed-off-by: Alexander Graf ag...@suse.de --- Documentation/virtual/kvm/api.txt | 6 +++--- arch/powerpc/include/asm/kvm_host.h | 4 arch/powerpc/include/asm/kvm_ppc.h | 1 - arch/powerpc/kvm/booke.c| 5 - arch/powerpc/kvm/powerpc.c | 10 -- arch/powerpc/kvm/timing.c | 1 - arch/powerpc/kvm/timing.h | 3 --- include/uapi/linux/kvm.h| 4 ++-- 8 files changed, 5 insertions(+), 29 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 8898caf..a21ff22 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2613,8 +2613,8 @@ The 'data' member contains, in its first 'len' bytes, the value as it would appear if the VCPU performed a load or store of the appropriate width directly to the byte array. -NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR, - KVM_EXIT_PAPR and KVM_EXIT_EPR the corresponding +NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI KVM_EXIT_PAPR and + KVM_EXIT_EPR the corresponding operations are complete (and guest state is consistent) only after userspace has re-entered the kernel with KVM_RUN. The kernel side will first finish incomplete operations and then check for pending signals. Userspace @@ -2685,7 +2685,7 @@ Principles of Operation Book in the Chapter for Dynamic Address Translation __u8 is_write; } dcr; -powerpc specific. +Deprecated - was used for 440 KVM. 
/* KVM_EXIT_OSI */ struct { diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 66f5b59..98d9dd5 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -94,7 +94,6 @@ struct kvm_vm_stat { struct kvm_vcpu_stat { u32 sum_exits; u32 mmio_exits; - u32 dcr_exits; u32 signal_exits; u32 light_exits; /* Account for special types of light exits: */ @@ -126,7 +125,6 @@ struct kvm_vcpu_stat { enum kvm_exit_types { MMIO_EXITS, - DCR_EXITS, SIGNAL_EXITS, ITLB_REAL_MISS_EXITS, ITLB_VIRT_MISS_EXITS, @@ -601,8 +599,6 @@ struct kvm_vcpu_arch { u8 io_gpr; /* GPR used as IO source/target */ u8 mmio_is_bigendian; u8 mmio_sign_extend; - u8 dcr_needed; - u8 dcr_is_write; u8 osi_needed; u8 osi_enabled; u8 papr_enabled; diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index cbee453..8e36c1e 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -41,7 +41,6 @@ enum emulation_result { EMULATE_DONE, /* no further processing */ EMULATE_DO_MMIO, /* kvm_run filled with MMIO request */ - EMULATE_DO_DCR, /* kvm_run filled with DCR request */ EMULATE_FAIL, /* can't emulate this instruction */ EMULATE_AGAIN,/* something went wrong. 
go again */ EMULATE_EXIT_USER,/* emulation requires exit to user-space */ diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index f30948a..b4c89fa 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -51,7 +51,6 @@ unsigned long kvmppc_booke_handlers; struct kvm_stats_debugfs_item debugfs_entries[] = { { mmio, VCPU_STAT(mmio_exits) }, - { dcr,VCPU_STAT(dcr_exits) }, { sig,VCPU_STAT(signal_exits) }, { itlb_r, VCPU_STAT(itlb_real_miss_exits) }, { itlb_v, VCPU_STAT(itlb_virt_miss_exits) }, @@ -709,10 +708,6 @@ static int emulation_exit(struct kvm_run *run, struct kvm_vcpu *vcpu) case EMULATE_AGAIN: return RESUME_GUEST; - case EMULATE_DO_DCR: - run-exit_reason = KVM_EXIT_DCR; - return RESUME_HOST; - case EMULATE_FAIL: printk(KERN_CRIT %s: emulation at %lx failed (%08x)\n, __func__, vcpu-arch.pc, vcpu-arch.last_inst); diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index c14ed15..288b4bb 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -743,12 +743,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) #endif } -static void kvmppc_complete_dcr_load(struct kvm_vcpu *vcpu, - struct kvm_run *run) -{ - kvmppc_set_gpr(vcpu, vcpu-arch.io_gpr, run-dcr.data); -} - static void kvmppc_complete_mmio_load(struct kvm_vcpu *vcpu, struct kvm_run *run) { @@ -945,10 +939,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) if (!vcpu-mmio_is_write) kvmppc_complete_mmio_load(vcpu, run);
[PULL 16/63] PPC: Add asm helpers for BE 32bit load/store
From assembly code we might need explicit big-endian access not only to
64-bit values, but sometimes also to 32-bit ones. Add helpers that allow for
easy use of lwzx/stwx in their respective byte-reverse or native form.

Signed-off-by: Alexander Graf <ag...@suse.de>
CC: Benjamin Herrenschmidt <b...@kernel.crashing.org>
---
 arch/powerpc/include/asm/asm-compat.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/include/asm/asm-compat.h b/arch/powerpc/include/asm/asm-compat.h
index 4b237aa..21be8ae 100644
--- a/arch/powerpc/include/asm/asm-compat.h
+++ b/arch/powerpc/include/asm/asm-compat.h
@@ -34,10 +34,14 @@
 #define PPC_MIN_STKFRM	112
 
 #ifdef __BIG_ENDIAN__
+#define LWZX_BE	stringify_in_c(lwzx)
 #define LDX_BE	stringify_in_c(ldx)
+#define STWX_BE	stringify_in_c(stwx)
 #define STDX_BE	stringify_in_c(stdx)
 #else
+#define LWZX_BE	stringify_in_c(lwbrx)
 #define LDX_BE	stringify_in_c(ldbrx)
+#define STWX_BE	stringify_in_c(stwbrx)
 #define STDX_BE	stringify_in_c(stdbrx)
 #endif
 
-- 
1.8.1.4
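The LWZX_BE/STWX_BE macros above pick either the native or the byte-reversing instruction at build time, so the access is big-endian regardless of how the kernel was built. As a rough illustration of the same contract in portable C (load_be32() and store_be32() are hypothetical names, not kernel helpers):

```c
#include <stdint.h>

/* Read a 32-bit big-endian value from memory, whatever the host endianness. */
static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 32-bit value to memory in big-endian byte order. */
static void store_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}
```

The assembly helpers get the same effect with a single instruction (lwzx or lwbrx, stwx or stwbrx) instead of shifting bytes by hand.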
[PULL 50/63] KVM: Allow KVM_CHECK_EXTENSION on the vm fd
The KVM_CHECK_EXTENSION is only available on the kvm fd today. Unfortunately on PPC some of the capabilities change depending on the way a VM was created. So instead we need a way to expose capabilities as VM ioctl, so that we can see which VM type we're using (HV or PR). To enable this, add the KVM_CHECK_EXTENSION ioctl to our vm ioctl portfolio. Signed-off-by: Alexander Graf ag...@suse.de Acked-by: Paolo Bonzini pbonz...@redhat.com --- Documentation/virtual/kvm/api.txt | 7 +++-- include/uapi/linux/kvm.h | 1 + virt/kvm/kvm_main.c | 58 +-- 3 files changed, 37 insertions(+), 29 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 884f819..8898caf 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -148,9 +148,9 @@ of banks, as set via the KVM_X86_SETUP_MCE ioctl. 4.4 KVM_CHECK_EXTENSION -Capability: basic +Capability: basic, KVM_CAP_CHECK_EXTENSION_VM for vm ioctl Architectures: all -Type: system ioctl +Type: system ioctl, vm ioctl Parameters: extension identifier (KVM_CAP_*) Returns: 0 if unsupported; 1 (or some other positive integer) if supported @@ -160,6 +160,9 @@ receives an integer that describes the extension availability. Generally 0 means no and 1 means yes, but some extensions may report additional information in the integer return value. +Based on their initialization different VMs may have different capabilities. 
+It is thus encouraged to use the vm ioctl to query for capabilities (available +with KVM_CAP_CHECK_EXTENSION_VM on the vm fd) 4.5 KVM_GET_VCPU_MMAP_SIZE diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 0418b74..51776ca 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -759,6 +759,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_ARM_PSCI_0_2 102 #define KVM_CAP_PPC_FIXUP_HCALL 103 #define KVM_CAP_PPC_ENABLE_HCALL 104 +#define KVM_CAP_CHECK_EXTENSION_VM 105 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index e28f3ca..1b95cc9 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2324,6 +2324,34 @@ static int kvm_ioctl_create_device(struct kvm *kvm, return 0; } +static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg) +{ + switch (arg) { + case KVM_CAP_USER_MEMORY: + case KVM_CAP_DESTROY_MEMORY_REGION_WORKS: + case KVM_CAP_JOIN_MEMORY_REGIONS_WORKS: +#ifdef CONFIG_KVM_APIC_ARCHITECTURE + case KVM_CAP_SET_BOOT_CPU_ID: +#endif + case KVM_CAP_INTERNAL_ERROR_DATA: +#ifdef CONFIG_HAVE_KVM_MSI + case KVM_CAP_SIGNAL_MSI: +#endif +#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING + case KVM_CAP_IRQFD_RESAMPLE: +#endif + case KVM_CAP_CHECK_EXTENSION_VM: + return 1; +#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING + case KVM_CAP_IRQ_ROUTING: + return KVM_MAX_IRQ_ROUTES; +#endif + default: + break; + } + return kvm_vm_ioctl_check_extension(kvm, arg); +} + static long kvm_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { @@ -2487,6 +2515,9 @@ static long kvm_vm_ioctl(struct file *filp, r = 0; break; } + case KVM_CHECK_EXTENSION: + r = kvm_vm_ioctl_check_extension_generic(kvm, arg); + break; default: r = kvm_arch_vm_ioctl(filp, ioctl, arg); if (r == -ENOTTY) @@ -2571,33 +2602,6 @@ static int kvm_dev_ioctl_create_vm(unsigned long type) return r; } -static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg) -{ - switch (arg) { - case KVM_CAP_USER_MEMORY: - case 
KVM_CAP_DESTROY_MEMORY_REGION_WORKS: - case KVM_CAP_JOIN_MEMORY_REGIONS_WORKS: -#ifdef CONFIG_KVM_APIC_ARCHITECTURE - case KVM_CAP_SET_BOOT_CPU_ID: -#endif - case KVM_CAP_INTERNAL_ERROR_DATA: -#ifdef CONFIG_HAVE_KVM_MSI - case KVM_CAP_SIGNAL_MSI: -#endif -#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING - case KVM_CAP_IRQFD_RESAMPLE: -#endif - return 1; -#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING - case KVM_CAP_IRQ_ROUTING: - return KVM_MAX_IRQ_ROUTES; -#endif - default: - break; - } - return kvm_vm_ioctl_check_extension(kvm, arg); -} - static long kvm_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
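From userspace, the change above means a capability can be queried on the vm fd once the kernel advertises KVM_CAP_CHECK_EXTENSION_VM. A minimal sketch, assuming a Linux system with the KVM uapi headers installed; query_cap() is a hypothetical helper name, and the fallback define simply repeats the value from the patch for older headers:

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Fallback for headers that predate the capability (value from the patch). */
#ifndef KVM_CAP_CHECK_EXTENSION_VM
#define KVM_CAP_CHECK_EXTENSION_VM 105
#endif

/*
 * Hypothetical helper: query a capability, preferring the vm fd when the
 * kernel reports KVM_CAP_CHECK_EXTENSION_VM on the system fd.
 * Returns the ioctl result (0 = unsupported, >0 = supported), or -1 on error.
 */
static int query_cap(int kvm_fd, int vm_fd, unsigned long cap)
{
	int has_vm_check = ioctl(kvm_fd, KVM_CHECK_EXTENSION,
				 (unsigned long)KVM_CAP_CHECK_EXTENSION_VM);

	if (has_vm_check < 0)
		return -1;
	if (has_vm_check > 0 && vm_fd >= 0)
		return ioctl(vm_fd, KVM_CHECK_EXTENSION, cap);
	/* Older kernel: only the system fd answers KVM_CHECK_EXTENSION. */
	return ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap);
}
```

A caller would typically open /dev/kvm, create a VM with KVM_CREATE_VM, and pass both descriptors in; on PPC the vm fd answer can then differ between HV and PR guests.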
[PULL 47/63] Split out struct kvmppc_vcore creation to separate function
From: Stewart Smith <stew...@linux.vnet.ibm.com>

No code changes, just split it out to a function so that the addition
of micro partition prefetch buffer allocation (in a subsequent patch) looks
neater and doesn't require excessive indentation.

Signed-off-by: Stewart Smith <stew...@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <pau...@samba.org>
Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/kvm/book3s_hv.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 0c5266e..5042ccc 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1303,6 +1303,26 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 	return r;
 }
 
+static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
+{
+	struct kvmppc_vcore *vcore;
+
+	vcore = kzalloc(sizeof(struct kvmppc_vcore), GFP_KERNEL);
+
+	if (vcore == NULL)
+		return NULL;
+
+	INIT_LIST_HEAD(&vcore->runnable_threads);
+	spin_lock_init(&vcore->lock);
+	init_waitqueue_head(&vcore->wq);
+	vcore->preempt_tb = TB_NIL;
+	vcore->lpcr = kvm->arch.lpcr;
+	vcore->first_vcpuid = core * threads_per_subcore;
+	vcore->kvm = kvm;
+
+	return vcore;
+}
+
 static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
 						   unsigned int id)
 {
@@ -1354,16 +1374,7 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
 	mutex_lock(&kvm->lock);
 	vcore = kvm->arch.vcores[core];
 	if (!vcore) {
-		vcore = kzalloc(sizeof(struct kvmppc_vcore), GFP_KERNEL);
-		if (vcore) {
-			INIT_LIST_HEAD(&vcore->runnable_threads);
-			spin_lock_init(&vcore->lock);
-			init_waitqueue_head(&vcore->wq);
-			vcore->preempt_tb = TB_NIL;
-			vcore->lpcr = kvm->arch.lpcr;
-			vcore->first_vcpuid = core * threads_per_subcore;
-			vcore->kvm = kvm;
-		}
+		vcore = kvmppc_vcore_create(kvm, core);
 		kvm->arch.vcores[core] = vcore;
 		kvm->arch.online_vcores++;
 	}
-- 
1.8.1.4
[PULL 54/63] KVM: PPC: Move kvmppc_ld/st to common code
We have enough common infrastructure now to resolve GVA-GPA mappings at runtime. With this we can move our book3s specific helpers to load / store in guest virtual address space to common code as well. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_book3s.h | 2 +- arch/powerpc/include/asm/kvm_host.h | 4 +- arch/powerpc/include/asm/kvm_ppc.h| 4 ++ arch/powerpc/kvm/book3s.c | 81 --- arch/powerpc/kvm/powerpc.c| 81 +++ 5 files changed, 88 insertions(+), 84 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index a86ca65..172fd6d 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -148,8 +148,8 @@ extern void kvmppc_mmu_hpte_sysexit(void); extern int kvmppc_mmu_hv_init(void); extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hc); +/* XXX remove this export when load_last_inst() is generic */ extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data); -extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data); extern void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec); extern void kvmppc_book3s_dequeue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec); diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 11385bb..66f5b59 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -111,15 +111,15 @@ struct kvm_vcpu_stat { u32 halt_wakeup; u32 dbell_exits; u32 gdbell_exits; + u32 ld; + u32 st; #ifdef CONFIG_PPC_BOOK3S u32 pf_storage; u32 pf_instruc; u32 sp_storage; u32 sp_instruc; u32 queue_intr; - u32 ld; u32 ld_slow; - u32 st; u32 st_slow; #endif }; diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 1a60af9..17fa277 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -80,6 +80,10 @@ 
extern int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu, extern int kvmppc_load_last_inst(struct kvm_vcpu *vcpu, enum instruction_type type, u32 *inst); +extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, +bool data); +extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, +bool data); extern int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu); extern int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu); diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 0b6c84e..de8da33 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -410,87 +410,6 @@ int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, enum xlate_instdata xlid, return r; } -static hva_t kvmppc_bad_hva(void) -{ - return PAGE_OFFSET; -} - -static hva_t kvmppc_pte_to_hva(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte) -{ - hva_t hpage; - - hpage = gfn_to_hva(vcpu-kvm, pte-raddr PAGE_SHIFT); - if (kvm_is_error_hva(hpage)) - goto err; - - return hpage | (pte-raddr ~PAGE_MASK); -err: - return kvmppc_bad_hva(); -} - -int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, - bool data) -{ - struct kvmppc_pte pte; - int r; - - vcpu-stat.st++; - - r = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST, -XLATE_WRITE, pte); - if (r 0) - return r; - - *eaddr = pte.raddr; - - if (!pte.may_write) - return -EPERM; - - if (kvm_write_guest(vcpu-kvm, pte.raddr, ptr, size)) - return EMULATE_DO_MMIO; - - return EMULATE_DONE; -} -EXPORT_SYMBOL_GPL(kvmppc_st); - -int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, - bool data) -{ - struct kvmppc_pte pte; - hva_t hva = *eaddr; - int rc; - - vcpu-stat.ld++; - - rc = kvmppc_xlate(vcpu, *eaddr, data ? 
XLATE_DATA : XLATE_INST, - XLATE_READ, pte); - if (rc) - return rc; - - *eaddr = pte.raddr; - - if (!pte.may_read) - return -EPERM; - - if (!data !pte.may_execute) - return -ENOEXEC; - - hva = kvmppc_pte_to_hva(vcpu, pte); - if (kvm_is_error_hva(hva)) - goto mmio; - - if (copy_from_user(ptr, (void __user *)hva, size)) { - printk(KERN_INFO kvmppc_ld at 0x%lx failed\n, hva); - goto mmio; - } - - return
[PULL 58/63] KVM: PPC: Separate loadstore emulation from priv emulation
Today the instruction emulator can get called via 2 separate code paths. It can either be called by MMIO emulation detection code or by privileged instruction traps. This is bad, as both code paths prepare the environment differently. For MMIO emulation we already know the virtual address we faulted on, so instructions there don't have to actually fetch that information. Split out the two separate use cases into separate files. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_ppc.h | 1 + arch/powerpc/kvm/Makefile| 4 +- arch/powerpc/kvm/emulate.c | 192 + arch/powerpc/kvm/emulate_loadstore.c | 272 +++ arch/powerpc/kvm/powerpc.c | 2 +- 5 files changed, 278 insertions(+), 193 deletions(-) create mode 100644 arch/powerpc/kvm/emulate_loadstore.c diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 17fa277..2214ee6 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -86,6 +86,7 @@ extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data); extern int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu); +extern int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu); extern int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu); extern void kvmppc_emulate_dec(struct kvm_vcpu *vcpu); extern u32 kvmppc_get_dec(struct kvm_vcpu *vcpu, u64 tb); diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile index 777f894..1ccd7a1 100644 --- a/arch/powerpc/kvm/Makefile +++ b/arch/powerpc/kvm/Makefile @@ -13,8 +13,9 @@ common-objs-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \ CFLAGS_e500_mmu.o := -I. CFLAGS_e500_mmu_host.o := -I. CFLAGS_emulate.o := -I. +CFLAGS_emulate_loadstore.o := -I. 
-common-objs-y += powerpc.o emulate.o +common-objs-y += powerpc.o emulate.o emulate_loadstore.o obj-$(CONFIG_KVM_EXIT_TIMING) += timing.o obj-$(CONFIG_KVM_BOOK3S_HANDLER) += book3s_exports.o @@ -91,6 +92,7 @@ kvm-book3s_64-module-objs += \ $(KVM)/eventfd.o \ powerpc.o \ emulate.o \ + emulate_loadstore.o \ book3s.o \ book3s_64_vio.o \ book3s_rtas.o \ diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c index c5c64b6..e96b50d 100644 --- a/arch/powerpc/kvm/emulate.c +++ b/arch/powerpc/kvm/emulate.c @@ -207,25 +207,12 @@ static int kvmppc_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt) return emulated; } -/* XXX to do: - * lhax - * lhaux - * lswx - * lswi - * stswx - * stswi - * lha - * lhau - * lmw - * stmw - * - */ /* XXX Should probably auto-generate instruction decoding for a particular core * from opcode tables in the future. */ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu) { u32 inst; - int ra, rs, rt, sprn; + int rs, rt, sprn; enum emulation_result emulated; int advance = 1; @@ -238,7 +225,6 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu) pr_debug(Emulating opcode %d / %d\n, get_op(inst), get_xop(inst)); - ra = get_ra(inst); rs = get_rs(inst); rt = get_rt(inst); sprn = get_sprn(inst); @@ -270,200 +256,24 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu) #endif advance = 0; break; - case OP_31_XOP_LWZX: - emulated = kvmppc_handle_load(run, vcpu, rt, 4, 1); - break; - - case OP_31_XOP_LBZX: - emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1); - break; - - case OP_31_XOP_LBZUX: - emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1); - kvmppc_set_gpr(vcpu, ra, vcpu-arch.vaddr_accessed); - break; - - case OP_31_XOP_STWX: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), - 4, 1); - break; - - case OP_31_XOP_STBX: - emulated = kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), - 1, 1); - break; - - case OP_31_XOP_STBUX: - emulated = 
kvmppc_handle_store(run, vcpu, - kvmppc_get_gpr(vcpu, rs), - 1, 1); - kvmppc_set_gpr(vcpu, ra, vcpu-arch.vaddr_accessed); - break; -
[PULL 37/63] KVM: PPC: Book3s: Remove kvmppc_read_inst() function
From: Mihai Caraman mihai.cara...@freescale.com In the context of replacing kvmppc_ld() function calls with a version of kvmppc_get_last_inst() which allow to fail, Alex Graf suggested this: If we get EMULATE_AGAIN, we just have to make sure we go back into the guest. No need to inject an ISI into the guest - it'll do that all by itself. With an error returning kvmppc_get_last_inst we can just use completely get rid of kvmppc_read_inst() and only use kvmppc_get_last_inst() instead. As a intermediate step get rid of kvmppc_read_inst() and only use kvmppc_ld() instead. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/book3s_pr.c | 85 ++-- 1 file changed, 34 insertions(+), 51 deletions(-) diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index e40765f..e76aec3 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -710,42 +710,6 @@ static void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac) #endif } -static int kvmppc_read_inst(struct kvm_vcpu *vcpu) -{ - ulong srr0 = kvmppc_get_pc(vcpu); - u32 last_inst = kvmppc_get_last_inst(vcpu); - int ret; - - ret = kvmppc_ld(vcpu, srr0, sizeof(u32), last_inst, false); - if (ret == -ENOENT) { - ulong msr = kvmppc_get_msr(vcpu); - - msr = kvmppc_set_field(msr, 33, 33, 1); - msr = kvmppc_set_field(msr, 34, 36, 0); - msr = kvmppc_set_field(msr, 42, 47, 0); - kvmppc_set_msr_fast(vcpu, msr); - kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_INST_STORAGE); - return EMULATE_AGAIN; - } - - return EMULATE_DONE; -} - -static int kvmppc_check_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr) -{ - - /* Need to do paired single emulation? 
*/ - if (!(vcpu-arch.hflags BOOK3S_HFLAG_PAIRED_SINGLE)) - return EMULATE_DONE; - - /* Read out the instruction */ - if (kvmppc_read_inst(vcpu) == EMULATE_DONE) - /* Need to emulate */ - return EMULATE_FAIL; - - return EMULATE_AGAIN; -} - /* Handle external providers (FPU, Altivec, VSX) */ static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr, ulong msr) @@ -1149,31 +1113,49 @@ program_interrupt: case BOOK3S_INTERRUPT_VSX: { int ext_msr = 0; + int emul; + ulong pc; + u32 last_inst; + + if (vcpu-arch.hflags BOOK3S_HFLAG_PAIRED_SINGLE) { + /* Do paired single instruction emulation */ + pc = kvmppc_get_pc(vcpu); + last_inst = kvmppc_get_last_inst(vcpu); + emul = kvmppc_ld(vcpu, pc, sizeof(u32), last_inst, +false); + if (emul == EMULATE_DONE) + goto program_interrupt; + else + r = RESUME_GUEST; - switch (exit_nr) { - case BOOK3S_INTERRUPT_FP_UNAVAIL: ext_msr = MSR_FP; break; - case BOOK3S_INTERRUPT_ALTIVEC:ext_msr = MSR_VEC; break; - case BOOK3S_INTERRUPT_VSX:ext_msr = MSR_VSX; break; + break; } - switch (kvmppc_check_ext(vcpu, exit_nr)) { - case EMULATE_DONE: - /* everything ok - let's enable the ext */ - r = kvmppc_handle_ext(vcpu, exit_nr, ext_msr); + /* Enable external provider */ + switch (exit_nr) { + case BOOK3S_INTERRUPT_FP_UNAVAIL: + ext_msr = MSR_FP; break; - case EMULATE_FAIL: - /* we need to emulate this instruction */ - goto program_interrupt; + + case BOOK3S_INTERRUPT_ALTIVEC: + ext_msr = MSR_VEC; break; - default: - /* nothing to worry about - go again */ + + case BOOK3S_INTERRUPT_VSX: + ext_msr = MSR_VSX; break; } + + r = kvmppc_handle_ext(vcpu, exit_nr, ext_msr); break; } case BOOK3S_INTERRUPT_ALIGNMENT: - if (kvmppc_read_inst(vcpu) == EMULATE_DONE) { - u32 last_inst = kvmppc_get_last_inst(vcpu); + { + ulong pc = kvmppc_get_pc(vcpu); + u32 last_inst = kvmppc_get_last_inst(vcpu); + int emul = kvmppc_ld(vcpu, pc, sizeof(u32), last_inst, false); + + if (emul == EMULATE_DONE) { u32 dsisr; u64 dar; @@ -1187,6
[PULL 42/63] KVM: PPC: Remove comment saying SPRG1 is used for vcpu pointer
From: Bharat Bhushan <bharat.bhus...@freescale.com>

Scott Wood pointed out that we are no longer using SPRG1 for the vcpu
pointer, but SPRN_SPRG_THREAD = SPRG3 (thread->vcpu). So this comment
is no longer valid.

Note: SPRN_SPRG3R is not supported (there is no need for it as of now).
If we want to support it in the future, we will have to switch to using
SPRG1 for the vcpu pointer.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/include/asm/reg.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index c8f3381..0ef17ad 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -944,9 +944,6 @@
  *	readable variant for reads, which can avoid a fault
  *	with KVM type virtualization.
  *
- * (*) Under KVM, the host SPRG1 is used to point to
- * the current VCPU data structure
- *
  * 32-bit 8xx:
  *	- SPRG0 scratch for exception vectors
  *	- SPRG1 scratch for exception vectors
-- 
1.8.1.4
[PULL 43/63] KVM: PPC: Remove 440 support
The 440 target hasn't been functioning properly for a few releases, and
before that I was the only one who fixed a very serious bug, which indicates
to me that nobody used it before either.

Furthermore, KVM on 440 is slow to the point of being unusable.

We don't have to carry along completely unused code. Remove 440 and give
us one less thing to worry about.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 Documentation/powerpc/00-INDEX        |   2 -
 Documentation/powerpc/kvm_440.txt     |  41 ---
 arch/powerpc/Kconfig.debug            |   4 +-
 arch/powerpc/configs/ppc44x_defconfig |   1 -
 arch/powerpc/include/asm/kvm_44x.h    |  67 -
 arch/powerpc/include/asm/kvm_asm.h    |   1 -
 arch/powerpc/include/asm/kvm_host.h   |   3 -
 arch/powerpc/kvm/44x.c                | 237 ---
 arch/powerpc/kvm/44x_emulate.c        | 194 -
 arch/powerpc/kvm/44x_tlb.c            | 528 --
 arch/powerpc/kvm/44x_tlb.h            |  86 --
 arch/powerpc/kvm/Kconfig              |  16 +-
 arch/powerpc/kvm/Makefile             |  12 -
 arch/powerpc/kvm/booke.h              |   7 -
 arch/powerpc/kvm/booke_interrupts.S   |   5 -
 arch/powerpc/kvm/bookehv_interrupts.S |   1 -
 arch/powerpc/kvm/powerpc.c            |   1 -
 17 files changed, 2 insertions(+), 1204 deletions(-)
 delete mode 100644 Documentation/powerpc/kvm_440.txt
 delete mode 100644 arch/powerpc/include/asm/kvm_44x.h
 delete mode 100644 arch/powerpc/kvm/44x.c
 delete mode 100644 arch/powerpc/kvm/44x_emulate.c
 delete mode 100644 arch/powerpc/kvm/44x_tlb.c
 delete mode 100644 arch/powerpc/kvm/44x_tlb.h

diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX
index 6db73df..a68784d 100644
--- a/Documentation/powerpc/00-INDEX
+++ b/Documentation/powerpc/00-INDEX
@@ -17,8 +17,6 @@ firmware-assisted-dump.txt
 	- Documentation on the firmware assisted dump mechanism fadump.
 hvcs.txt
 	- IBM Hypervisor Virtual Console Server Installation Guide
-kvm_440.txt
-	- Various notes on the implementation of KVM for PowerPC 440.
 mpc52xx.txt
 	- Linux 2.6.x on MPC52xx family
 pmu-ebb.txt
diff --git a/Documentation/powerpc/kvm_440.txt b/Documentation/powerpc/kvm_440.txt
deleted file mode 100644
index c02a003..0000000
--- a/Documentation/powerpc/kvm_440.txt
+++ /dev/null
@@ -1,41 +0,0 @@
-Hollis Blanchard <holl...@us.ibm.com>
-15 Apr 2008
-
-Various notes on the implementation of KVM for PowerPC 440:
-
-To enforce isolation, host userspace, guest kernel, and guest userspace all
-run at user privilege level. Only the host kernel runs in supervisor mode.
-Executing privileged instructions in the guest traps into KVM (in the host
-kernel), where we decode and emulate them. Through this technique, unmodified
-440 Linux kernels can be run (slowly) as guests. Future performance work will
-focus on reducing the overhead and frequency of these traps.
-
-The usual code flow is started from userspace invoking an run ioctl, which
-causes KVM to switch into guest context. We use IVPR to hijack the host
-interrupt vectors while running the guest, which allows us to direct all
-interrupts to kvmppc_handle_interrupt(). At this point, we could either
-- handle the interrupt completely (e.g. emulate mtspr SPRG0), or
-- let the host interrupt handler run (e.g. when the decrementer fires), or
-- return to host userspace (e.g. when the guest performs device MMIO)
-
-Address spaces: We take advantage of the fact that Linux doesn't use the AS=1
-address space (in host or guest), which gives us virtual address space to use
-for guest mappings. While the guest is running, the host kernel remains mapped
-in AS=0, but the guest can only use AS=1 mappings.
-
-TLB entries: The TLB entries covering the host linear mapping remain
-present while running the guest. This reduces the overhead of lightweight
-exits, which are handled by KVM running in the host kernel. We keep three
-copies of the TLB:
- - guest TLB: contents of the TLB as the guest sees it
- - shadow TLB: the TLB that is actually in hardware while guest is running
- - host TLB: to restore TLB state when context switching guest -> host
-When a TLB miss occurs because a mapping was not present in the shadow TLB,
-but was present in the guest TLB, KVM handles the fault without invoking the
-guest. Large guest pages are backed by multiple 4KB shadow pages through this
-mechanism.
-
-IO: MMIO and DCR accesses are emulated by userspace. We use virtio for network
-and block IO, so those drivers must be enabled in the guest. It's possible
-that some qemu device emulation (e.g. e1000 or rtl8139) may also work with
-little effort.
diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 790352f..93500f6 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -202,9 +202,7 @@ config PPC_EARLY_DEBUG_BEAT
 
 config PPC_EARLY_DEBUG_44x
 	bool "Early serial debugging for IBM/AMCC 44x CPUs"
-	# PPC_EARLY_DEBUG on 440 leaves
[PULL 46/63] KVM: PPC: Book3S: Make kvmppc_ld return a more accurate error indication
From: Paul Mackerras <pau...@samba.org>

At present, kvmppc_ld calls kvmppc_xlate, and if kvmppc_xlate returns
any error indication, it returns -ENOENT, which is taken to mean an
HPTE not found error.  However, the error could have been a segment
not found (no SLB entry) or a permission error.  Similarly,
kvmppc_pte_to_hva currently does permission checking, but any error
from it is taken by kvmppc_ld to mean that the access is an emulated
MMIO access.  Also, kvmppc_ld does no execute permission checking.

This fixes these problems by (a) returning any error from kvmppc_xlate
directly, (b) moving the permission check from kvmppc_pte_to_hva
into kvmppc_ld, and (c) adding an execute permission check to kvmppc_ld.

This is similar to what was done for kvmppc_st() by commit 82ff911317c3
("KVM: PPC: Deflect page write faults properly in kvmppc_st").

Signed-off-by: Paul Mackerras <pau...@samba.org>
Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/kvm/book3s.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 37ca8a0..a3cbada 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -413,17 +413,10 @@ static hva_t kvmppc_bad_hva(void)
 	return PAGE_OFFSET;
 }
 
-static hva_t kvmppc_pte_to_hva(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte,
-			       bool read)
+static hva_t kvmppc_pte_to_hva(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
 {
 	hva_t hpage;
 
-	if (read && !pte->may_read)
-		goto err;
-
-	if (!read && !pte->may_write)
-		goto err;
-
 	hpage = gfn_to_hva(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
 	if (kvm_is_error_hva(hpage))
 		goto err;
@@ -462,15 +455,23 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 {
 	struct kvmppc_pte pte;
 	hva_t hva = *eaddr;
+	int rc;
 
 	vcpu->stat.ld++;
 
-	if (kvmppc_xlate(vcpu, *eaddr, data, false, &pte))
-		goto nopte;
+	rc = kvmppc_xlate(vcpu, *eaddr, data, false, &pte);
+	if (rc)
+		return rc;
 
 	*eaddr = pte.raddr;
 
-	hva = kvmppc_pte_to_hva(vcpu, &pte, true);
+	if (!pte.may_read)
+		return -EPERM;
+
+	if (!data && !pte.may_execute)
+		return -ENOEXEC;
+
+	hva = kvmppc_pte_to_hva(vcpu, &pte);
 	if (kvm_is_error_hva(hva))
 		goto mmio;
 
@@ -481,8 +482,6 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 
 	return EMULATE_DONE;
 
-nopte:
-	return -ENOENT;
 mmio:
 	return EMULATE_DO_MMIO;
 }
-- 
1.8.1.4
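The order of checks described in the commit message above can be modelled in a few lines of stand-alone C. This is only a sketch under stated assumptions: `demo_pte`, `demo_ld` and the `DEMO_EMULATE_*` values are hypothetical stand-ins, not the kernel's types or functions, and the translation step is reduced to a return code passed in by the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a translation result; not struct kvmppc_pte. */
struct demo_pte {
	bool may_read;
	bool may_execute;
	bool backed_by_ram;	/* false models an access that needs MMIO emulation */
};

/* Hypothetical result codes for the non-error outcomes. */
enum { DEMO_EMULATE_DONE = 0, DEMO_EMULATE_DO_MMIO = 1 };

/*
 * xlate_rc models the return value of the translation step:
 * 0 on success, a negative errno otherwise.
 */
static int demo_ld(int xlate_rc, const struct demo_pte *pte, bool data)
{
	assert(pte != NULL);

	/* (a) pass any translation error through unchanged */
	if (xlate_rc)
		return xlate_rc;

	/* (b) read permission is checked by the load path itself */
	if (!pte->may_read)
		return -EPERM;

	/* (c) instruction fetches additionally need execute permission */
	if (!data && !pte->may_execute)
		return -ENOEXEC;

	/* only a failed host-address lookup is treated as MMIO */
	if (!pte->backed_by_ram)
		return DEMO_EMULATE_DO_MMIO;

	return DEMO_EMULATE_DONE;
}
```

With this ordering a caller can tell a missing translation, a permission fault and an MMIO access apart, which a single -ENOENT return did not allow.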
[PULL 44/63] KVM: PPC: Book3S: Fix LPCR one_reg interface
From: Alexey Kardashevskiy <a...@ozlabs.ru>

Unfortunately, the LPCR got defined as a 32-bit register in the
one_reg interface.  This is a problem because KVM allows userspace
to control the DPFD (default prefetch depth) field, which is in the
upper 32 bits.  The result is that DPFD always gets set to 0, which
reduces performance in the guest.

We can't just change KVM_REG_PPC_LPCR to be a 64-bit register ID,
since that would break existing userspace binaries.  Instead we define
a new KVM_REG_PPC_LPCR_64 id which is 64-bit.  Userspace can still use
the old KVM_REG_PPC_LPCR id, but it now only modifies those fields in
the bottom 32 bits that userspace can modify (ILE, TC and AIL).
If userspace uses the new KVM_REG_PPC_LPCR_64 id, it can modify DPFD
as well.

Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
Signed-off-by: Paul Mackerras <pau...@samba.org>
Cc: sta...@vger.kernel.org
Signed-off-by: Alexander Graf <ag...@suse.de>
---
 Documentation/virtual/kvm/api.txt   |  3 ++-
 arch/powerpc/include/uapi/asm/kvm.h |  1 +
 arch/powerpc/kvm/book3s_hv.c        | 13 +++++++++++--
 arch/powerpc/kvm/book3s_pr.c        |  2 ++
 4 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 6955318..884f819 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1869,7 +1869,8 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_PID	| 64
   PPC   | KVM_REG_PPC_ACOP	| 64
   PPC   | KVM_REG_PPC_VRSAVE	| 32
-  PPC   | KVM_REG_PPC_LPCR	| 64
+  PPC   | KVM_REG_PPC_LPCR	| 32
+  PPC   | KVM_REG_PPC_LPCR_64	| 64
   PPC   | KVM_REG_PPC_PPR	| 64
   PPC   | KVM_REG_PPC_ARCH_COMPAT 32
   PPC   | KVM_REG_PPC_DABRX	| 32
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 0e56d9e..e0e49db 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -548,6 +548,7 @@ struct kvm_get_htab_header {
 
 #define KVM_REG_PPC_VRSAVE	(KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xb4)
 #define KVM_REG_PPC_LPCR	(KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xb5)
+#define KVM_REG_PPC_LPCR_64	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xb5)
 #define KVM_REG_PPC_PPR		(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xb6)
 
 /* Architecture compatibility level */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index f1281c4..0c5266e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -863,7 +863,8 @@ static int kvm_arch_vcpu_ioctl_set_sregs_hv(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
-static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr)
+static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
+		bool preserve_top32)
 {
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 	u64 mask;
@@ -898,6 +899,10 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr)
 	mask = LPCR_DPFD | LPCR_ILE | LPCR_TC;
 	if (cpu_has_feature(CPU_FTR_ARCH_207S))
 		mask |= LPCR_AIL;
+
+	/* Broken 32-bit version of LPCR must not clear top bits */
+	if (preserve_top32)
+		mask &= 0xFFFFFFFF;
 	vc->lpcr = (vc->lpcr & ~mask) | (new_lpcr & mask);
 	spin_unlock(&vc->lock);
 }
@@ -1011,6 +1016,7 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 		*val = get_reg_val(id, vcpu->arch.vcore->tb_offset);
 		break;
 	case KVM_REG_PPC_LPCR:
+	case KVM_REG_PPC_LPCR_64:
 		*val = get_reg_val(id, vcpu->arch.vcore->lpcr);
 		break;
 	case KVM_REG_PPC_PPR:
@@ -1216,7 +1222,10 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 				ALIGN(set_reg_val(id, *val), 1UL << 24);
 		break;
 	case KVM_REG_PPC_LPCR:
-		kvmppc_set_lpcr(vcpu, set_reg_val(id, *val));
+		kvmppc_set_lpcr(vcpu, set_reg_val(id, *val), true);
+		break;
+	case KVM_REG_PPC_LPCR_64:
+		kvmppc_set_lpcr(vcpu, set_reg_val(id, *val), false);
 		break;
 	case KVM_REG_PPC_PPR:
 		vcpu->arch.ppr = set_reg_val(id, *val);
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index b18f2d4..e7a1fa2 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1314,6 +1314,7 @@ static int kvmppc_get_one_reg_pr(struct kvm_vcpu *vcpu, u64 id,
 		*val = get_reg_val(id, to_book3s(vcpu)->hior);
 		break;
 	case KVM_REG_PPC_LPCR:
+	case KVM_REG_PPC_LPCR_64:
 		/*
 		 * We are only interested in the LPCR_ILE bit
 		 */
@@ -1349,6 +1350,7 @@ static int kvmppc_set_one_reg_pr(struct kvm_vcpu *vcpu, u64 id,
 		to_book3s(vcpu)->hior_explicit = true;
 		break;
 	case KVM_REG_PPC_LPCR:
+	case KVM_REG_PPC_LPCR_64:
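The masking rule from the LPCR commit message (the old 32-bit id may only touch fields in the low word, the new 64-bit id may also change DPFD) can be tried out in isolation. The following is a minimal sketch, not the kernel code: `demo_vcore`, `demo_set_lpcr` and the `DEMO_LPCR_*` bit positions are assumptions chosen for illustration, and AIL is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative field masks; the exact bit positions are assumptions. */
#define DEMO_LPCR_DPFD	(UINT64_C(7) << 52)	/* lives in the upper 32 bits */
#define DEMO_LPCR_ILE	UINT64_C(0x02000000)
#define DEMO_LPCR_TC	UINT64_C(0x00000200)

/* Hypothetical stand-in for the per-core state holding the LPCR value. */
struct demo_vcore {
	uint64_t lpcr;
};

static void demo_set_lpcr(struct demo_vcore *vc, uint64_t new_lpcr,
			  bool preserve_top32)
{
	uint64_t mask = DEMO_LPCR_DPFD | DEMO_LPCR_ILE | DEMO_LPCR_TC;

	assert(vc != NULL);

	/* A 32-bit write carries no upper half, so it must not modify it. */
	if (preserve_top32)
		mask &= UINT64_C(0xFFFFFFFF);

	/* Only the bits selected by mask are taken from the new value. */
	vc->lpcr = (vc->lpcr & ~mask) | (new_lpcr & mask);
}
```

A write through the 32-bit path leaves a previously set DPFD field alone, while the same value written through the 64-bit path replaces it.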
[PULL 62/63] KVM: PPC: HV: Remove generic instruction emulation
Now that we have properly split load/store instruction emulation and generic
instruction emulation, we can move the generic one from kvm.ko to kvm-pr.ko
on book3s_64.

This reduces the attack surface and amount of code loaded on HV KVM kernels.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/kvm/Makefile   |  2 +-
 arch/powerpc/kvm/trace_pr.h | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 1ccd7a1..2d590de 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -48,6 +48,7 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) := \
 
 kvm-pr-y := \
 	fpu.o \
+	emulate.o \
 	book3s_paired_singles.o \
 	book3s_pr.o \
 	book3s_pr_papr.o \
@@ -91,7 +92,6 @@ kvm-book3s_64-module-objs += \
 	$(KVM)/kvm_main.o \
 	$(KVM)/eventfd.o \
 	powerpc.o \
-	emulate.o \
 	emulate_loadstore.o \
 	book3s.o \
 	book3s_64_vio.o \
diff --git a/arch/powerpc/kvm/trace_pr.h b/arch/powerpc/kvm/trace_pr.h
index e1357cd..a674f09 100644
--- a/arch/powerpc/kvm/trace_pr.h
+++ b/arch/powerpc/kvm/trace_pr.h
@@ -291,6 +291,26 @@ TRACE_EVENT(kvm_unmap_hva,
 	TP_printk("unmap hva 0x%lx\n", __entry->hva)
 );
 
+TRACE_EVENT(kvm_ppc_instr,
+	TP_PROTO(unsigned int inst, unsigned long _pc, unsigned int emulate),
+	TP_ARGS(inst, _pc, emulate),
+
+	TP_STRUCT__entry(
+		__field(	unsigned int,	inst		)
+		__field(	unsigned long,	pc		)
+		__field(	unsigned int,	emulate		)
+	),
+
+	TP_fast_assign(
+		__entry->inst		= inst;
+		__entry->pc		= _pc;
+		__entry->emulate	= emulate;
+	),
+
+	TP_printk("inst %u pc 0x%lx emulate %u\n",
+		  __entry->inst, __entry->pc, __entry->emulate)
+);
+
 #endif /* _TRACE_KVM_H */
 
 /* This part must be outside protection */
-- 
1.8.1.4
[PULL 41/63] KVM: PPC: Booke-hv: Add one reg interface for SPRG9
From: Bharat Bhushan <bharat.bhus...@freescale.com>

We now support SPRG9 for the guest, so also add a one_reg interface for it.

Note: Changes are in bookehv code only, as we do not have SPRG9 on booke-pr.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/include/uapi/asm/kvm.h |  1 +
 arch/powerpc/kvm/e500mc.c           | 22 ++++++++++++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 2bc4a94..0e56d9e 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -555,6 +555,7 @@ struct kvm_get_htab_header {
 
 #define KVM_REG_PPC_DABRX	(KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xb8)
 #define KVM_REG_PPC_WORT	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xb9)
+#define KVM_REG_PPC_SPRG9	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xba)
 
 /* Transactional Memory checkpointed state:
  * This is all GPRs, all VSX regs and a subset of SPRs
diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
index 690499d..164bad2 100644
--- a/arch/powerpc/kvm/e500mc.c
+++ b/arch/powerpc/kvm/e500mc.c
@@ -267,14 +267,32 @@ static int kvmppc_core_set_sregs_e500mc(struct kvm_vcpu *vcpu,
 static int kvmppc_get_one_reg_e500mc(struct kvm_vcpu *vcpu, u64 id,
 			      union kvmppc_one_reg *val)
 {
-	int r = kvmppc_get_one_reg_e500_tlb(vcpu, id, val);
+	int r = 0;
+
+	switch (id) {
+	case KVM_REG_PPC_SPRG9:
+		*val = get_reg_val(id, vcpu->arch.sprg9);
+		break;
+	default:
+		r = kvmppc_get_one_reg_e500_tlb(vcpu, id, val);
+	}
+
 	return r;
 }
 
 static int kvmppc_set_one_reg_e500mc(struct kvm_vcpu *vcpu, u64 id,
 			      union kvmppc_one_reg *val)
 {
-	int r = kvmppc_set_one_reg_e500_tlb(vcpu, id, val);
+	int r = 0;
+
+	switch (id) {
+	case KVM_REG_PPC_SPRG9:
+		vcpu->arch.sprg9 = set_reg_val(id, *val);
+		break;
+	default:
+		r = kvmppc_set_one_reg_e500_tlb(vcpu, id, val);
+	}
+
 	return r;
 }
 
-- 
1.8.1.4
[PULL 63/63] KVM: PPC: PR: Handle FSCR feature deselects
We handle FSCR feature bits (well, TAR only really today) lazily when the guest
starts using them. So when a guest activates the bit and later uses that feature
we enable it for real in hardware.

However, when the guest stops using that bit we don't stop setting it in
hardware. That means we can potentially lose a trap that the guest expects to
happen because it thinks a feature is not active.

This patch adds support to drop TAR when the guest turns it off in FSCR. While
at it, it also restricts FSCR access to 64bit systems - 32bit ones don't have it.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/include/asm/kvm_book3s.h | 1 +
 arch/powerpc/kvm/book3s_emulate.c     | 6 +++---
 arch/powerpc/kvm/book3s_pr.c          | 9 +++++++++
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 6166791..6acf0c2 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -182,6 +182,7 @@ extern long kvmppc_hv_get_dirty_log(struct kvm *kvm,
 			struct kvm_memory_slot *memslot, unsigned long *map);
 extern void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr,
 			unsigned long mask);
+extern void kvmppc_set_fscr(struct kvm_vcpu *vcpu, u64 fscr);
 
 extern void kvmppc_entry_trampoline(void);
 extern void kvmppc_hv_entry_trampoline(void);
diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
index 84fddcd..5a2bc4b 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -449,10 +449,10 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
 	case SPRN_GQR7:
 		to_book3s(vcpu)->gqr[sprn - SPRN_GQR0] = spr_val;
 		break;
+#ifdef CONFIG_PPC_BOOK3S_64
 	case SPRN_FSCR:
-		vcpu->arch.fscr = spr_val;
+		kvmppc_set_fscr(vcpu, spr_val);
 		break;
-#ifdef CONFIG_PPC_BOOK3S_64
 	case SPRN_BESCR:
 		vcpu->arch.bescr = spr_val;
 		break;
@@ -593,10 +593,10 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val
 	case SPRN_GQR7:
 		*spr_val = to_book3s(vcpu)->gqr[sprn - SPRN_GQR0];
 		break;
+#ifdef CONFIG_PPC_BOOK3S_64
 	case SPRN_FSCR:
 		*spr_val = vcpu->arch.fscr;
 		break;
-#ifdef CONFIG_PPC_BOOK3S_64
 	case SPRN_BESCR:
 		*spr_val = vcpu->arch.bescr;
 		break;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index e7a1fa2..faffb27 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -871,6 +871,15 @@ static int kvmppc_handle_fac(struct kvm_vcpu *vcpu, ulong fac)
 
 	return RESUME_GUEST;
 }
+
+void kvmppc_set_fscr(struct kvm_vcpu *vcpu, u64 fscr)
+{
+	if ((vcpu->arch.fscr & FSCR_TAR) && !(fscr & FSCR_TAR)) {
+		/* TAR got dropped, drop it in shadow too */
+		kvmppc_giveup_fac(vcpu, FSCR_TAR_LG);
+	}
+	vcpu->arch.fscr = fscr;
+}
 #endif
 
 int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
-- 
1.8.1.4
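The lazy enable / eager drop behaviour described in the FSCR commit message can be illustrated with a small state model. This is a sketch only: `demo_vcpu`, `demo_giveup_tar`, `demo_set_fscr` and the `DEMO_FSCR_TAR` bit position are hypothetical names and values, and the "hardware" state is just a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit for the TAR facility; the position is an assumption. */
#define DEMO_FSCR_TAR	(UINT64_C(1) << 8)

/* Hypothetical vcpu state: the guest-visible FSCR plus a shadow flag. */
struct demo_vcpu {
	uint64_t fscr;		/* what the guest believes is enabled */
	bool tar_live_in_hw;	/* set lazily on first use of the facility */
};

/* Stand-in for giving the facility back: stop enabling it in hardware. */
static void demo_giveup_tar(struct demo_vcpu *vcpu)
{
	vcpu->tar_live_in_hw = false;
}

static void demo_set_fscr(struct demo_vcpu *vcpu, uint64_t fscr)
{
	assert(vcpu != NULL);

	/* The bit went from set to clear: drop the shadow state as well. */
	if ((vcpu->fscr & DEMO_FSCR_TAR) && !(fscr & DEMO_FSCR_TAR))
		demo_giveup_tar(vcpu);

	vcpu->fscr = fscr;
}
```

Setting the bit again does not re-enable the facility in this model; as in the description above, that only happens lazily when the guest actually uses it, so the guest keeps getting the trap it expects while the bit is clear.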
[PULL 51/63] KVM: PPC: Book3S: Provide different CAPs based on HV or PR mode
With Book3S KVM we can create both PR and HV VMs in parallel on the same
machine. That gives us new challenges on the CAPs we return - both have
different capabilities.

When we get asked about CAPs on the kvm fd, there's nothing we can do. We
can try to be smart and assume we're running HV if HV is available, PR
otherwise. However, with the newly added VM CHECK_EXTENSION we can now ask for
capabilities directly on a VM which knows whether it's PR or HV.

With this patch I can successfully expose KVM PVINFO data to user space
in the PR case, fixing magic page mapping for PAPR guests.

Signed-off-by: Alexander Graf <ag...@suse.de>
Acked-by: Paolo Bonzini <pbonz...@redhat.com>
---
 arch/powerpc/kvm/powerpc.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index d870bac..eaa57da 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -394,11 +394,17 @@ void kvm_arch_sync_events(struct kvm *kvm)
 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 {
 	int r;
-	/* FIXME!!
-	 * Should some of this be vm ioctl ? is it possible now ?
-	 */
+	/* Assume we're using HV mode when the HV module is loaded */
 	int hv_enabled = kvmppc_hv_ops ? 1 : 0;
 
+	if (kvm) {
+		/*
+		 * Hooray - we know which VM type we're running on. Depend on
+		 * that rather than the guess above.
+		 */
+		hv_enabled = is_kvmppc_hv_enabled(kvm);
+	}
+
 	switch (ext) {
 #ifdef CONFIG_BOOKE
 	case KVM_CAP_PPC_BOOKE_SREGS:
-- 
1.8.1.4
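The "guess when asked on the kvm fd, know when asked on a VM" logic can be sketched outside the kernel as follows. Everything here is hypothetical scaffolding for illustration: `demo_vm`, `demo_hv_module_loaded` and `demo_hv_enabled` are not kernel names, and a NULL pointer stands in for a query that has no VM attached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical VM object that remembers which flavour it was created as. */
struct demo_vm {
	bool is_hv;
};

/* Models whether the HV backend is available on this host. */
static bool demo_hv_module_loaded = true;

/* Returns 1 if capabilities should be reported for HV, 0 for PR. */
static int demo_hv_enabled(const struct demo_vm *vm)
{
	/* Without a VM, the best available guess is "HV if it is loaded". */
	int hv = demo_hv_module_loaded ? 1 : 0;

	/* With a VM, its actual type overrides the guess. */
	if (vm)
		hv = vm->is_hv ? 1 : 0;

	assert(hv == 0 || hv == 1);
	return hv;
}
```

A PR VM created on a host that also has HV available therefore gets PR answers when queried on the VM, whereas a query without a VM falls back to the guess.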
[PULL 61/63] KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr
From: Bharat Bhushan <bharat.bhus...@freescale.com>

These are not specific to e500hv but apply to bookehv in general
(as per a comment from Scott Wood on my patch
"kvm: ppc: bookehv: Added wrapper macros for shadow registers").

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/include/asm/kvm_ppc.h | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 8e36c1e..fb86a22 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -539,16 +539,16 @@ static inline bool kvmppc_shared_big_endian(struct kvm_vcpu *vcpu)
 #endif
 }
 
-#define SPRNG_WRAPPER_GET(reg, e500hv_spr)				\
+#define SPRNG_WRAPPER_GET(reg, bookehv_spr)				\
 static inline ulong kvmppc_get_##reg(struct kvm_vcpu *vcpu)		\
 {									\
-	return mfspr(e500hv_spr);					\
+	return mfspr(bookehv_spr);					\
 }									\
 
-#define SPRNG_WRAPPER_SET(reg, e500hv_spr)				\
+#define SPRNG_WRAPPER_SET(reg, bookehv_spr)				\
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, ulong val)	\
 {									\
-	mtspr(e500hv_spr, val);						\
+	mtspr(bookehv_spr, val);					\
 }									\
 
 #define SHARED_WRAPPER_GET(reg, size)					\
@@ -573,18 +573,18 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)	\
 	SHARED_WRAPPER_GET(reg, size)					\
 	SHARED_WRAPPER_SET(reg, size)					\
 
-#define SPRNG_WRAPPER(reg, e500hv_spr)					\
-	SPRNG_WRAPPER_GET(reg, e500hv_spr)				\
-	SPRNG_WRAPPER_SET(reg, e500hv_spr)				\
+#define SPRNG_WRAPPER(reg, bookehv_spr)					\
+	SPRNG_WRAPPER_GET(reg, bookehv_spr)				\
+	SPRNG_WRAPPER_SET(reg, bookehv_spr)				\
 
 #ifdef CONFIG_KVM_BOOKE_HV
 
-#define SHARED_SPRNG_WRAPPER(reg, size, e500hv_spr)			\
-	SPRNG_WRAPPER(reg, e500hv_spr)					\
+#define SHARED_SPRNG_WRAPPER(reg, size, bookehv_spr)			\
+	SPRNG_WRAPPER(reg, bookehv_spr)					\
 
 #else
 
-#define SHARED_SPRNG_WRAPPER(reg, size, e500hv_spr)			\
+#define SHARED_SPRNG_WRAPPER(reg, size, bookehv_spr)			\
 	SHARED_WRAPPER(reg, size)					\
 
 #endif
-- 
1.8.1.4
[PULL 49/63] KVM: Rename and add argument to check_extension
In preparation to make the check_extension function available to VM scope
we add a struct kvm * argument to the function header and rename the function
accordingly. It will still be called from the /dev/kvm fd, but with a NULL
argument for struct kvm *.

Signed-off-by: Alexander Graf <ag...@suse.de>
Acked-by: Paolo Bonzini <pbonz...@redhat.com>
---
 arch/arm/kvm/arm.c         | 2 +-
 arch/ia64/kvm/kvm-ia64.c   | 2 +-
 arch/mips/kvm/mips.c       | 2 +-
 arch/powerpc/kvm/powerpc.c | 2 +-
 arch/s390/kvm/kvm-s390.c   | 2 +-
 arch/x86/kvm/x86.c         | 2 +-
 include/linux/kvm_host.h   | 2 +-
 virt/kvm/kvm_main.c        | 6 +++---
 8 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 3c82b37..cb77f999 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -184,7 +184,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	}
 }
 
-int kvm_dev_ioctl_check_extension(long ext)
+int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 {
 	int r;
 	switch (ext) {
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 6a4309b..0729ba6 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -190,7 +190,7 @@ void kvm_arch_check_processor_compat(void *rtn)
 	*(int *)rtn = 0;
 }
 
-int kvm_dev_ioctl_check_extension(long ext)
+int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 {
 
 	int r;
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index d687c6e..3ca79aa 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -885,7 +885,7 @@ int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
 	return VM_FAULT_SIGBUS;
 }
 
-int kvm_dev_ioctl_check_extension(long ext)
+int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 {
 	int r;
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 8e03568..d870bac 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -391,7 +391,7 @@ void kvm_arch_sync_events(struct kvm *kvm)
 {
 }
 
-int kvm_dev_ioctl_check_extension(long ext)
+int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 {
 	int r;
 	/* FIXME!!
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 2f3e14f..00268ca 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -146,7 +146,7 @@ long kvm_arch_dev_ioctl(struct file *filp,
 	return -EINVAL;
 }
 
-int kvm_dev_ioctl_check_extension(long ext)
+int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 {
 	int r;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5a8691b..5a62d91 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2616,7 +2616,7 @@ out:
 	return r;
 }
 
-int kvm_dev_ioctl_check_extension(long ext)
+int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 {
 	int r;
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ec4e3bd..5065b95 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -602,7 +602,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg);
 int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf);
 
-int kvm_dev_ioctl_check_extension(long ext);
+int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext);
 
 int kvm_get_dirty_log(struct kvm *kvm,
 			struct kvm_dirty_log *log, int *is_dirty);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4b6c01b..e28f3ca 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2571,7 +2571,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
 	return r;
 }
 
-static long kvm_dev_ioctl_check_extension_generic(long arg)
+static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 {
 	switch (arg) {
 	case KVM_CAP_USER_MEMORY:
@@ -2595,7 +2595,7 @@ static long kvm_dev_ioctl_check_extension_generic(long arg)
 	default:
 		break;
 	}
-	return kvm_dev_ioctl_check_extension(arg);
+	return kvm_vm_ioctl_check_extension(kvm, arg);
 }
 
 static long kvm_dev_ioctl(struct file *filp,
@@ -2614,7 +2614,7 @@ static long kvm_dev_ioctl(struct file *filp,
 		r = kvm_dev_ioctl_create_vm(arg);
 		break;
 	case KVM_CHECK_EXTENSION:
-		r = kvm_dev_ioctl_check_extension_generic(arg);
+		r = kvm_vm_ioctl_check_extension_generic(NULL, arg);
 		break;
 	case KVM_GET_VCPU_MMAP_SIZE:
 		r = -EINVAL;
-- 
1.8.1.4
[PULL 27/63] KVM: PPC: Book3S: Add hack for split real mode
Today we handle split real mode by mapping both instruction and data faults
into a special virtual address space that only exists during the split mode
phase.

This is good enough to catch 32bit Linux guests that use split real mode for
copy_from/to_user. In this case we're always prefixed with 0xc0000000 for our
instruction pointer and can map the user space process freely below there.

However, that approach fails when we're running KVM inside of KVM. Here the 1st
level last_inst reader may well be in the same virtual page as a 2nd level
interrupt handler.

It also fails when running Mac OS X guests. Here we have a 4G/4G split, so a
kernel copy_from/to_user implementation can easily overlap with user space
addresses.

The architecturally correct way to fix this would be to implement an instruction
interpreter in KVM that kicks in whenever we go into split real mode. This
interpreter, however, would not receive a great amount of testing and would be a
lot of bloat for a reasonably isolated corner case.

So I went back to the drawing board and tried to come up with a way to make
split real mode work with a single flat address space. And then I realized that
we could get away with the same trick that makes it work for Linux:

Whenever we see an instruction address during split real mode that may collide,
we just move it higher up the virtual address space to a place that hopefully
does not collide (keep your fingers crossed!).

That approach does work surprisingly well. I am able to successfully run
Mac OS X guests with KVM and QEMU (no split real mode hacks like MOL) when I
apply a tiny timing probe hack to QEMU. I'd say this is a win over even more
broken split real mode :).

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/include/asm/kvm_asm.h    |  1 +
 arch/powerpc/include/asm/kvm_book3s.h |  3 +++
 arch/powerpc/kvm/book3s.c             | 19 ++++++++++++++
 arch/powerpc/kvm/book3s_pr.c          | 48 +++++++++++++++++++++++++++++++++++
 4 files changed, 71 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index 9601741..3f3e530 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -131,6 +131,7 @@
 #define BOOK3S_HFLAG_NATIVE_PS			0x8
 #define BOOK3S_HFLAG_MULTI_PGSIZE		0x10
 #define BOOK3S_HFLAG_NEW_TLBIE			0x20
+#define BOOK3S_HFLAG_SPLIT_HACK			0x40
 
 #define RESUME_FLAG_NV          (1<<0)  /* Reload guest nonvolatile state? */
 #define RESUME_FLAG_HOST        (1<<1)  /* Resume host? */
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 8ac5392..b1cf18d 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -324,4 +324,7 @@ static inline bool is_kvmppc_resume_guest(int r)
 /* LPIDs we support with this build -- runtime limit may be lower */
 #define KVMPPC_NR_LPIDS			(LPID_RSVD + 1)
 
+#define SPLIT_HACK_MASK			0xff000000
+#define SPLIT_HACK_OFFS			0xfb000000
+
 #endif /* __ASM_KVM_BOOK3S_H__ */
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 9624c56..1d13764 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -72,6 +72,17 @@ void kvmppc_core_load_guest_debugstate(struct kvm_vcpu *vcpu)
 {
 }
 
+void kvmppc_unfixup_split_real(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.hflags & BOOK3S_HFLAG_SPLIT_HACK) {
+		ulong pc = kvmppc_get_pc(vcpu);
+		if ((pc & SPLIT_HACK_MASK) == SPLIT_HACK_OFFS)
+			kvmppc_set_pc(vcpu, pc & ~SPLIT_HACK_MASK);
+		vcpu->arch.hflags &= ~BOOK3S_HFLAG_SPLIT_HACK;
+	}
+}
+EXPORT_SYMBOL_GPL(kvmppc_unfixup_split_real);
+
 static inline unsigned long kvmppc_interrupt_offset(struct kvm_vcpu *vcpu)
 {
 	if (!is_kvmppc_hv_enabled(vcpu->kvm))
@@ -118,6 +129,7 @@ static inline bool kvmppc_critical_section(struct kvm_vcpu *vcpu)
 
 void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags)
 {
+	kvmppc_unfixup_split_real(vcpu);
 	kvmppc_set_srr0(vcpu, kvmppc_get_pc(vcpu));
 	kvmppc_set_srr1(vcpu, kvmppc_get_msr(vcpu) | flags);
 	kvmppc_set_pc(vcpu, kvmppc_interrupt_offset(vcpu) + vec);
@@ -384,6 +396,13 @@ static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data,
 		pte->may_write = true;
 		pte->may_execute = true;
 		r = 0;
+
+		if ((kvmppc_get_msr(vcpu) & (MSR_IR | MSR_DR)) == MSR_DR &&
+		    !data) {
+			if ((vcpu->arch.hflags & BOOK3S_HFLAG_SPLIT_HACK) &&
+			    ((eaddr & SPLIT_HACK_MASK) == SPLIT_HACK_OFFS))
+				pte->raddr &= ~SPLIT_HACK_MASK;
+		}
 	}
 
 	return r;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 15fd6c2..6125f60 100644
---
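The address trick from the split real mode description (move a possibly colliding instruction address into a high window, and strip the window again before the address becomes guest-visible) can be sketched on plain integers. This is an illustration under assumptions, not the patch's code: the `DEMO_*` constants, `demo_fixup` and `demo_unfixup` are hypothetical, addresses are taken to be 32 bits wide, and the rule "only relocate addresses whose top byte is zero" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed window: the top byte selects it, 0xfb marks a relocated address. */
#define DEMO_SPLIT_HACK_MASK	UINT32_C(0xff000000)
#define DEMO_SPLIT_HACK_OFFS	UINT32_C(0xfb000000)

/*
 * Relocate *pc into the high window. Returns 1 if the address was moved,
 * 0 if its top byte was already in use and it was left untouched.
 */
static int demo_fixup(uint32_t *pc)
{
	assert(pc != NULL);

	if (*pc & DEMO_SPLIT_HACK_MASK)
		return 0;

	*pc |= DEMO_SPLIT_HACK_OFFS;
	return 1;
}

/* Undo the relocation; addresses outside the window pass through as-is. */
static uint32_t demo_unfixup(uint32_t pc)
{
	if ((pc & DEMO_SPLIT_HACK_MASK) == DEMO_SPLIT_HACK_OFFS)
		return pc & ~DEMO_SPLIT_HACK_MASK;
	return pc;
}
```

The scheme is only as safe as the assumption that nothing else lives in the chosen window, which is the "keep your fingers crossed" caveat in the description.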
[PULL 40/63] kvm: ppc: bookehv: Save restore SPRN_SPRG9 on guest entry exit
From: Bharat Bhushan <bharat.bhus...@freescale.com>

SPRN_SPRG9 is used by the debug interrupt handler, so saving and restoring it
on guest entry and exit is required for debug support.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/include/asm/kvm_host.h   | 1 +
 arch/powerpc/kernel/asm-offsets.c     | 1 +
 arch/powerpc/kvm/bookehv_interrupts.S | 4 ++++
 3 files changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 855ba4d..562f685 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -587,6 +587,7 @@ struct kvm_vcpu_arch {
 	u32 mmucfg;
 	u32 eptcfg;
 	u32 epr;
+	u64 sprg9;
 	u32 pwrmgtcr0;
 	u32 crit_save;
 	/* guest debug registers*/
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 17ffcb4..ab9ae04 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -668,6 +668,7 @@ int main(void)
 	DEFINE(VCPU_LR, offsetof(struct kvm_vcpu, arch.lr));
 	DEFINE(VCPU_CTR, offsetof(struct kvm_vcpu, arch.ctr));
 	DEFINE(VCPU_PC, offsetof(struct kvm_vcpu, arch.pc));
+	DEFINE(VCPU_SPRG9, offsetof(struct kvm_vcpu, arch.sprg9));
 	DEFINE(VCPU_LAST_INST, offsetof(struct kvm_vcpu, arch.last_inst));
 	DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
 	DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index e000b39..b4f8fba 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -398,6 +398,7 @@ _GLOBAL(kvmppc_resume_host)
 #ifdef CONFIG_64BIT
 	PPC_LL	r3, PACA_SPRG_VDSO(r13)
 #endif
+	mfspr	r5, SPRN_SPRG9
 	PPC_STD(r6, VCPU_SHARED_SPRG4, r11)
 	mfspr	r8, SPRN_SPRG6
 	PPC_STD(r7, VCPU_SHARED_SPRG5, r11)
@@ -405,6 +406,7 @@ _GLOBAL(kvmppc_resume_host)
 #ifdef CONFIG_64BIT
 	mtspr	SPRN_SPRG_VDSO_WRITE, r3
 #endif
+	PPC_STD(r5, VCPU_SPRG9, r4)
 	PPC_STD(r8, VCPU_SHARED_SPRG6, r11)
 	mfxer	r3
 	PPC_STD(r9, VCPU_SHARED_SPRG7, r11)
@@ -639,7 +641,9 @@ lightweight_exit:
 	mtspr	SPRN_SPRG5W, r6
 	PPC_LD(r8, VCPU_SHARED_SPRG7, r11)
 	mtspr	SPRN_SPRG6W, r7
+	PPC_LD(r5, VCPU_SPRG9, r4)
 	mtspr	SPRN_SPRG7W, r8
+	mtspr	SPRN_SPRG9, r5
 
 	/* Load some guest volatiles. */
 	PPC_LL	r3, VCPU_LR(r4)
-- 
1.8.1.4
[PULL 48/63] Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8
From: Stewart Smith <stew...@linux.vnet.ibm.com>

The POWER8 processor has a Micro Partition Prefetch Engine, which is
a fancy way of saying "has a way to store and load the contents of L2, or
of L2 plus the MRU way of L3 cache". We initiate the storing of the log
(list of addresses) using the logmpp instruction and start restore by
writing to an SPR.

The logmpp instruction takes parameters in a single 64bit register:
- starting address of the table to store log of L2/L2+L3 cache contents
  - 32kb for L2
  - 128kb for L2+L3
  - Aligned relative to maximum size of the table (32kb or 128kb)
- Log control (no-op, L2 only, L2 and L3, abort logout)

We should abort any ongoing logging before initiating one.

To initiate restore, we write to the MPPR SPR. The format of what to write
to the SPR is similar to the logmpp instruction parameter:
- starting address of the table to read from (same alignment requirements)
- table size (no data, until end of table)
- prefetch rate (from fastest possible to slower: about every 8, 16, 24 or
  32 cycles)

The idea behind loading and storing the contents of L2/L3 cache is to
reduce memory latency in a system that is frequently swapping vcores on
a physical CPU.

The best case scenario for doing this is when some vcores are doing very
cache heavy workloads. The worst case is when they have about 0 cache hits,
so we just generate needless memory operations.

This implementation just does L2 store/load. In my benchmarks this proves
to be useful.

Benchmark 1:
- 16 core POWER8
- 3x Ubuntu 14.04LTS guests (LE) with 8 VCPUs each
- No split core/SMT
- two guests running sysbench memory test.
  sysbench --test=memory --num-threads=8 run
- one guest running apache bench (of default HTML page)
  ab -n 49 -c 400 http://localhost/

This benchmark aims to measure performance of a real world application
(apache) where other guests are cache hot with their own workloads. The
sysbench memory benchmark does pointer sized writes to a (small) memory
buffer in a loop.

In this benchmark with this patch I can see an improvement both in requests
per second (~5%) and in mean and median response times (again, about 5%).
The spread of minimum and maximum response times was largely unchanged.

Benchmark 2:
- Same VM config as benchmark 1
- all three guests running sysbench memory benchmark

This benchmark aims to see if there is a positive or negative effect on this
cache heavy benchmark. Due to the nature of the benchmark (stores) we may
not see a difference in performance, but rather, hopefully, an improvement
in consistency of performance (when a vcore is switched in, it doesn't have
to wait many times for cachelines to be pulled in).

The results of this benchmark are improvements in consistency of performance
rather than performance itself. With this patch, the few outliers in duration
go away and we get more consistent performance in each guest.

Benchmark 3:
- same 3 guests and CPU configuration as benchmark 1 and 2.
- two idle guests
- 1 guest running STREAM benchmark

This scenario also saw performance improvement with this patch. On Copy and
Scale workloads from STREAM, I got 5-6% improvement with this patch. For
Add and Triad, it was around 10% (or more).

Benchmark 4:
- same 3 guests as previous benchmarks
- two guests running sysbench --memory, distinctly different cache heavy
  workload
- one guest running STREAM benchmark.

Similar improvements to benchmark 3.

Benchmark 5:
- 1 guest, 8 VCPUs, Ubuntu 14.04
- Host configured with split core (SMT8, subcores-per-core=4)
- STREAM benchmark

In this benchmark, we see a 10-20% performance improvement across the board
of STREAM benchmark results with this patch.

Based on preliminary investigation and microbenchmarks
by Prerna Saxena <pre...@linux.vnet.ibm.com>

Signed-off-by: Stewart Smith <stew...@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <pau...@samba.org>
Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/include/asm/cache.h      |  7 +++++
 arch/powerpc/include/asm/kvm_host.h   |  2 ++
 arch/powerpc/include/asm/ppc-opcode.h | 17 +++++++++++
 arch/powerpc/include/asm/reg.h        |  1 +
 arch/powerpc/kvm/book3s_hv.c          | 57 ++++++++++++++++++++++++++++++++++-
 5 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index ed0afc1..34a05a1 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -3,6 +3,7 @@
 
 #ifdef __KERNEL__
 
+#include <asm/reg.h>
 
 /* bytes per L1 cache line */
 #if defined(CONFIG_8xx) || defined(CONFIG_403GCX)
@@ -39,6 +40,12 @@ struct ppc64_caches {
 };
 
 extern struct ppc64_caches ppc64_caches;
+
+static inline void logmpp(u64 x)
+{
+	asm volatile(PPC_LOGMPP(R1) : : "r" (x));
+}
+
 #endif /* __powerpc64__ && ! __ASSEMBLY__ */
 
 #if defined(__ASSEMBLY__)
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 5fe2b5d..11385bb 100644
---
[PULL 57/63] KVM: PPC: Handle magic page in kvmppc_ld/st
We use kvmppc_ld and kvmppc_st to emulate load/store instructions that may as well access the magic page. Special case it out so that we can properly access it.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h |  7 +++
 arch/powerpc/include/asm/kvm_booke.h  | 10 ++
 arch/powerpc/kvm/powerpc.c            | 22 ++
 3 files changed, 39 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 172fd6d..6166791 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -286,6 +286,13 @@ static inline bool is_kvmppc_resume_guest(int r)
 	return (r == RESUME_GUEST || r == RESUME_GUEST_NV);
 }
 
+static inline bool is_kvmppc_hv_enabled(struct kvm *kvm);
+static inline bool kvmppc_supports_magic_page(struct kvm_vcpu *vcpu)
+{
+	/* Only PR KVM supports the magic page */
+	return !is_kvmppc_hv_enabled(vcpu->kvm);
+}
+
 /* Magic register values loaded into r3 and r4 before the 'sc' assembly
  * instruction for the OSI hypercalls */
 #define OSI_SC_MAGIC_R3			0x113724FA
diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h
index cbb1990..f7aa5cc 100644
--- a/arch/powerpc/include/asm/kvm_booke.h
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -103,4 +103,14 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.fault_dear;
 }
+
+static inline bool kvmppc_supports_magic_page(struct kvm_vcpu *vcpu)
+{
+	/* Magic page is only supported on e500v2 */
+#ifdef CONFIG_KVM_E500V2
+	return true;
+#else
+	return false;
+#endif
+}
 #endif /* __ASM_KVM_BOOKE_H__ */
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index be40886..544d1d3 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -312,6 +312,7 @@ EXPORT_SYMBOL_GPL(kvmppc_emulate_mmio);
 int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 	      bool data)
 {
+	ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM & PAGE_MASK;
 	struct kvmppc_pte pte;
 	int r;
 
@@ -327,6 +328,16 @@ int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 	if (!pte.may_write)
 		return -EPERM;
 
+	/* Magic page override */
+	if (kvmppc_supports_magic_page(vcpu) && mp_pa &&
+	    ((pte.raddr & KVM_PAM & PAGE_MASK) == mp_pa) &&
+	    !(kvmppc_get_msr(vcpu) & MSR_PR)) {
+		void *magic = vcpu->arch.shared;
+		magic += pte.eaddr & 0xfff;
+		memcpy(magic, ptr, size);
+		return EMULATE_DONE;
+	}
+
 	if (kvm_write_guest(vcpu->kvm, pte.raddr, ptr, size))
 		return EMULATE_DO_MMIO;
 
@@ -337,6 +348,7 @@ EXPORT_SYMBOL_GPL(kvmppc_st);
 int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 	      bool data)
 {
+	ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM & PAGE_MASK;
 	struct kvmppc_pte pte;
 	int rc;
 
@@ -355,6 +367,16 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 	if (!data && !pte.may_execute)
 		return -ENOEXEC;
 
+	/* Magic page override */
+	if (kvmppc_supports_magic_page(vcpu) && mp_pa &&
+	    ((pte.raddr & KVM_PAM & PAGE_MASK) == mp_pa) &&
+	    !(kvmppc_get_msr(vcpu) & MSR_PR)) {
+		void *magic = vcpu->arch.shared;
+		magic += pte.eaddr & 0xfff;
+		memcpy(ptr, magic, size);
+		return EMULATE_DONE;
+	}
+
 	if (kvm_read_guest(vcpu->kvm, pte.raddr, ptr, size))
 		return EMULATE_DO_MMIO;
 
--
1.8.1.4
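For readers following the magic page override in this patch, here is a minimal userspace sketch of the store-side check. It is not the kernel code: `struct sketch_vcpu`, `sketch_magic_store()` and the `SKETCH_*` constants are hypothetical stand-ins, the KVM_PAM mask is left out, and the bounds check on the copy is an addition that the patch itself does not contain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_MASK (~(uint64_t)(SKETCH_PAGE_SIZE - 1))

/* Hypothetical stand-in for the vcpu fields the patch consults. */
struct sketch_vcpu {
	uint64_t magic_page_pa;    /* 0 means no magic page registered */
	bool supports_magic_page;  /* models kvmppc_supports_magic_page() */
	bool problem_state;        /* models MSR_PR being set in the guest */
	uint8_t shared[SKETCH_PAGE_SIZE]; /* models vcpu->arch.shared */
};

/*
 * Returns true when the store hit the magic page and was satisfied
 * from the shared area, false when the caller should fall through
 * to the normal guest-memory write path.
 */
static bool sketch_magic_store(struct sketch_vcpu *v, uint64_t raddr,
			       uint64_t eaddr, const void *ptr, size_t size)
{
	uint64_t mp_pa = v->magic_page_pa & SKETCH_PAGE_MASK;
	size_t off = (size_t)(eaddr & 0xfff);

	if (!v->supports_magic_page || !mp_pa)
		return false;
	if ((raddr & SKETCH_PAGE_MASK) != mp_pa)
		return false;
	if (v->problem_state)	/* only supervisor-mode accesses qualify */
		return false;
	if (off + size > SKETCH_PAGE_SIZE)	/* extra check, not in the patch */
		return false;

	memcpy(v->shared + off, ptr, size);
	return true;
}
```

The load side would be the same test with the memcpy() direction reversed, which is why the patch adds an identical condition to both kvmppc_ld() and kvmppc_st().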
[PULL 06/63] KVM: PPC: Book3S PR: Handle hyp doorbell exits
If we're running PR KVM in HV mode, we may get hypervisor doorbell interrupts. Handle those the same way we treat normal doorbells.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_pr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 8ea7da4..3b82e86 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -988,6 +988,7 @@ int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	case BOOK3S_INTERRUPT_DECREMENTER:
 	case BOOK3S_INTERRUPT_HV_DECREMENTER:
 	case BOOK3S_INTERRUPT_DOORBELL:
+	case BOOK3S_INTERRUPT_H_DOORBELL:
 		vcpu->stat.dec_exits++;
 		r = RESUME_GUEST;
 		break;
--
1.8.1.4
[PULL 39/63] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
From: Mihai Caraman mihai.cara...@freescale.com

On book3e, KVM uses the dedicated load external pid (lwepx) instruction to read the guest's last instruction on the exit path. lwepx exceptions (DTLB_MISS, DSI and LRAT), generated by loading a guest address, need to be handled by KVM. These exceptions are generated in a substituted guest translation context (EPLC[EGS] = 1) from host context (MSR[GS] = 0).

Currently, KVM hooks only interrupts generated from guest context (MSR[GS] = 1), doing minimal checks on the fast path to avoid host performance degradation. lwepx exceptions originate from host state (MSR[GS] = 0), which implies additional checks in the DO_KVM macro (besides the current MSR[GS] = 1) by looking at the Exception Syndrome Register (ESR[EPID]) and the External PID Load Context Register (EPLC[EGS]). Doing this on each Data TLB miss exception is obviously too intrusive for the host.

Read the guest's last instruction from kvmppc_load_last_inst() by searching for the physical address and kmap it. This addresses the TODO for TLB eviction and execute-but-not-read entries, and allows us to get rid of lwepx until we are able to handle failures.

A simple stress benchmark shows a 1% sys performance degradation compared with the previous approach (lwepx without failure handling):

time for i in `seq 1 1`; do /bin/echo > /dev/null; done

real    0m 8.85s
user    0m 4.34s
sys     0m 4.48s

vs

real    0m 8.84s
user    0m 4.36s
sys     0m 4.44s

A solution to use lwepx and to handle its exceptions in KVM would be to temporarily hijack the interrupt vector from the host. This imposes additional synchronizations for cores like FSL e6500 that share host IVOR registers between hardware threads. This optimized solution can be developed later on top of this patch.
Signed-off-by: Mihai Caraman mihai.cara...@freescale.com Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/booke.c | 44 + arch/powerpc/kvm/bookehv_interrupts.S | 37 -- arch/powerpc/kvm/e500_mmu_host.c | 92 +++ 3 files changed, 145 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 50df5e3..97bcde2 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -819,6 +819,28 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu, } } +static int kvmppc_resume_inst_load(struct kvm_run *run, struct kvm_vcpu *vcpu, + enum emulation_result emulated, u32 last_inst) +{ + switch (emulated) { + case EMULATE_AGAIN: + return RESUME_GUEST; + + case EMULATE_FAIL: + pr_debug(%s: load instruction from guest address %lx failed\n, + __func__, vcpu-arch.pc); + /* For debugging, encode the failing instruction and +* report it to userspace. */ + run-hw.hardware_exit_reason = ~0ULL 32; + run-hw.hardware_exit_reason |= last_inst; + kvmppc_core_queue_program(vcpu, ESR_PIL); + return RESUME_HOST; + + default: + BUG(); + } +} + /** * kvmppc_handle_exit * @@ -830,6 +852,8 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, int r = RESUME_HOST; int s; int idx; + u32 last_inst = KVM_INST_FETCH_FAILED; + enum emulation_result emulated = EMULATE_DONE; /* update before a new last_exit_type is rewritten */ kvmppc_update_timing_stats(vcpu); @@ -837,6 +861,20 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, /* restart interrupts if they were meant for the host */ kvmppc_restart_interrupt(vcpu, exit_nr); + /* +* get last instruction before beeing preempted +* TODO: for e6500 check also BOOKE_INTERRUPT_LRAT_ERROR ESR_DATA +*/ + switch (exit_nr) { + case BOOKE_INTERRUPT_DATA_STORAGE: + case BOOKE_INTERRUPT_DTLB_MISS: + case BOOKE_INTERRUPT_HV_PRIV: + emulated = kvmppc_get_last_inst(vcpu, false, last_inst); + break; + default: + break; + } + local_irq_enable(); 
trace_kvm_exit(exit_nr, vcpu); @@ -845,6 +883,11 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, run-exit_reason = KVM_EXIT_UNKNOWN; run-ready_for_interrupt_injection = 1; + if (emulated != EMULATE_DONE) { + r = kvmppc_resume_inst_load(run, vcpu, emulated, last_inst); + goto out; + } + switch (exit_nr) { case BOOKE_INTERRUPT_MACHINE_CHECK: printk(MACHINE CHECK: %lx\n, mfspr(SPRN_MCSR)); @@ -1134,6 +1177,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, BUG(); } +out: /* * To avoid clobbering exit_reason, only check for signals if we * aren't already
[PULL 31/63] kvm: ppc: booke: Use the shared struct helpers of SPRN_DEAR
From: Bharat Bhushan bharat.bhus...@freescale.com Uses kvmppc_set_dar() and kvmppc_get_dar() helper functions Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/booke.c | 24 +++- 1 file changed, 3 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 3b43adb..8e8b14b 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -292,24 +292,6 @@ static void set_guest_mcsrr(struct kvm_vcpu *vcpu, unsigned long srr0, u32 srr1) vcpu-arch.mcsrr1 = srr1; } -static unsigned long get_guest_dear(struct kvm_vcpu *vcpu) -{ -#ifdef CONFIG_KVM_BOOKE_HV - return mfspr(SPRN_GDEAR); -#else - return vcpu-arch.shared-dar; -#endif -} - -static void set_guest_dear(struct kvm_vcpu *vcpu, unsigned long dear) -{ -#ifdef CONFIG_KVM_BOOKE_HV - mtspr(SPRN_GDEAR, dear); -#else - vcpu-arch.shared-dar = dear; -#endif -} - static unsigned long get_guest_esr(struct kvm_vcpu *vcpu) { #ifdef CONFIG_KVM_BOOKE_HV @@ -447,7 +429,7 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, if (update_esr == true) set_guest_esr(vcpu, vcpu-arch.queued_esr); if (update_dear == true) - set_guest_dear(vcpu, vcpu-arch.queued_dear); + kvmppc_set_dar(vcpu, vcpu-arch.queued_dear); if (update_epr == true) { if (vcpu-arch.epr_flags KVMPPC_EPR_USER) kvm_make_request(KVM_REQ_EPR_EXIT, vcpu); @@ -1317,7 +1299,7 @@ static void get_sregs_base(struct kvm_vcpu *vcpu, sregs-u.e.csrr1 = vcpu-arch.csrr1; sregs-u.e.mcsr = vcpu-arch.mcsr; sregs-u.e.esr = get_guest_esr(vcpu); - sregs-u.e.dear = get_guest_dear(vcpu); + sregs-u.e.dear = kvmppc_get_dar(vcpu); sregs-u.e.tsr = vcpu-arch.tsr; sregs-u.e.tcr = vcpu-arch.tcr; sregs-u.e.dec = kvmppc_get_dec(vcpu, tb); @@ -1335,7 +1317,7 @@ static int set_sregs_base(struct kvm_vcpu *vcpu, vcpu-arch.csrr1 = sregs-u.e.csrr1; vcpu-arch.mcsr = sregs-u.e.mcsr; set_guest_esr(vcpu, sregs-u.e.esr); - set_guest_dear(vcpu, sregs-u.e.dear); + 
kvmppc_set_dar(vcpu, sregs-u.e.dear); vcpu-arch.vrsave = sregs-u.e.vrsave; kvmppc_set_tcr(vcpu, sregs-u.e.tcr); -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 30/63] kvm: ppc: booke: Use the shared struct helpers of SRR0 and SRR1
From: Bharat Bhushan bharat.bhus...@freescale.com Use kvmppc_set_srr0/srr1() and kvmppc_get_srr0/srr1() helper functions Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/booke.c | 17 ++--- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index ab62109..3b43adb 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -266,13 +266,8 @@ static void kvmppc_core_dequeue_watchdog(struct kvm_vcpu *vcpu) static void set_guest_srr(struct kvm_vcpu *vcpu, unsigned long srr0, u32 srr1) { -#ifdef CONFIG_KVM_BOOKE_HV - mtspr(SPRN_GSRR0, srr0); - mtspr(SPRN_GSRR1, srr1); -#else - vcpu-arch.shared-srr0 = srr0; - vcpu-arch.shared-srr1 = srr1; -#endif + kvmppc_set_srr0(vcpu, srr0); + kvmppc_set_srr1(vcpu, srr1); } static void set_guest_csrr(struct kvm_vcpu *vcpu, unsigned long srr0, u32 srr1) @@ -1265,8 +1260,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs-lr = vcpu-arch.lr; regs-xer = kvmppc_get_xer(vcpu); regs-msr = vcpu-arch.shared-msr; - regs-srr0 = vcpu-arch.shared-srr0; - regs-srr1 = vcpu-arch.shared-srr1; + regs-srr0 = kvmppc_get_srr0(vcpu); + regs-srr1 = kvmppc_get_srr1(vcpu); regs-pid = vcpu-arch.pid; regs-sprg0 = vcpu-arch.shared-sprg0; regs-sprg1 = vcpu-arch.shared-sprg1; @@ -1293,8 +1288,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) vcpu-arch.lr = regs-lr; kvmppc_set_xer(vcpu, regs-xer); kvmppc_set_msr(vcpu, regs-msr); - vcpu-arch.shared-srr0 = regs-srr0; - vcpu-arch.shared-srr1 = regs-srr1; + kvmppc_set_srr0(vcpu, regs-srr0); + kvmppc_set_srr1(vcpu, regs-srr1); kvmppc_set_pid(vcpu, regs-pid); vcpu-arch.shared-sprg0 = regs-sprg0; vcpu-arch.shared-sprg1 = regs-sprg1; -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at 
http://vger.kernel.org/majordomo-info.html
[PULL 14/63] KVM: PPC: Book3S HV: Add H_SET_MODE hcall handling
From: Michael Neuling mi...@neuling.org

This adds support for the H_SET_MODE hcall. This hcall is a multiplexer that has several functions, some of which are called rarely, and some of which are potentially called very frequently. Here we add support for the functions that set the debug registers CIABR (Completed Instruction Address Breakpoint Register) and DAWR/DAWRX (Data Address Watchpoint Register and eXtension), since they could be updated by the guest as often as every context switch.

This also adds a kvmppc_power8_compatible() function to test whether a guest is compatible with POWER8 or not. The CIABR and DAWR/X only exist on POWER8.

Signed-off-by: Michael Neuling mi...@neuling.org
Signed-off-by: Paul Mackerras pau...@samba.org
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/hvcall.h |  6 +
 arch/powerpc/kvm/book3s_hv.c      | 52 ++-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 5dbbb29..85bc8c0 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -279,6 +279,12 @@
 #define H_GET_24X7_DATA		0xF07C
 #define H_GET_PERF_COUNTER_INFO	0xF080
 
+/* Values for 2nd argument to H_SET_MODE */
+#define H_SET_MODE_RESOURCE_SET_CIABR		1
+#define H_SET_MODE_RESOURCE_SET_DAWR		2
+#define H_SET_MODE_RESOURCE_ADDR_TRANS_MODE	3
+#define H_SET_MODE_RESOURCE_LE			4
+
 #ifndef __ASSEMBLY__
 
 /**
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c4377c7..7db9df2 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -557,6 +557,48 @@ static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
 	vcpu->arch.dtl.dirty = true;
 }
 
+static bool kvmppc_power8_compatible(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.vcore->arch_compat >= PVR_ARCH_207)
+		return true;
+	if ((!vcpu->arch.vcore->arch_compat) &&
+	    cpu_has_feature(CPU_FTR_ARCH_207S))
+		return true;
+	return false;
+}
+
+static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags,
+			     unsigned long resource, unsigned long value1,
+			     unsigned long value2)
+{
+	switch (resource) {
+	case H_SET_MODE_RESOURCE_SET_CIABR:
+		if (!kvmppc_power8_compatible(vcpu))
+			return H_P2;
+		if (value2)
+			return H_P4;
+		if (mflags)
+			return H_UNSUPPORTED_FLAG_START;
+		/* Guests can't breakpoint the hypervisor */
+		if ((value1 & CIABR_PRIV) == CIABR_PRIV_HYPER)
+			return H_P3;
+		vcpu->arch.ciabr  = value1;
+		return H_SUCCESS;
+	case H_SET_MODE_RESOURCE_SET_DAWR:
+		if (!kvmppc_power8_compatible(vcpu))
+			return H_P2;
+		if (mflags)
+			return H_UNSUPPORTED_FLAG_START;
+		if (value2 & DABRX_HYP)
+			return H_P4;
+		vcpu->arch.dawr  = value1;
+		vcpu->arch.dawrx = value2;
+		return H_SUCCESS;
+	default:
+		return H_TOO_HARD;
+	}
+}
+
 int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 {
 	unsigned long req = kvmppc_get_gpr(vcpu, 3);
@@ -626,7 +668,14 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 
 		/* Send the error out to userspace via KVM_RUN */
 		return rc;
-
+	case H_SET_MODE:
+		ret = kvmppc_h_set_mode(vcpu, kvmppc_get_gpr(vcpu, 4),
+					kvmppc_get_gpr(vcpu, 5),
+					kvmppc_get_gpr(vcpu, 6),
+					kvmppc_get_gpr(vcpu, 7));
+		if (ret == H_TOO_HARD)
+			return RESUME_HOST;
+		break;
 	case H_XIRR:
 	case H_CPPR:
 	case H_EOI:
@@ -652,6 +701,7 @@ static int kvmppc_hcall_impl_hv(unsigned long cmd)
 	case H_PROD:
 	case H_CONFER:
 	case H_REGISTER_VPA:
+	case H_SET_MODE:
 #ifdef CONFIG_KVM_XICS
 	case H_XIRR:
 	case H_CPPR:
--
1.8.1.4
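To make the multiplexer structure of this hcall easier to follow, here is a small standalone sketch of the same dispatch logic. Everything prefixed `sketch_` or `SK_` is hypothetical: the return codes are a plain enum rather than the real H_* values, and the privilege and hypervisor bit masks are placeholder constants chosen only so the example compiles outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder return codes; the real H_* constants live in hvcall.h. */
enum sketch_rc {
	SK_SUCCESS, SK_P2, SK_P3, SK_P4, SK_UNSUPPORTED_FLAG, SK_TOO_HARD
};

/* Resource selectors, mirroring the values the patch defines. */
enum sketch_resource { SK_RES_SET_CIABR = 1, SK_RES_SET_DAWR = 2 };

/* Placeholder bit masks, assumed for illustration only. */
#define SK_CIABR_PRIV		0x3ull
#define SK_CIABR_PRIV_HYPER	0x3ull
#define SK_DAWRX_HYP		0x4ull

struct sketch_vcpu {
	bool power8_compatible;	/* models kvmppc_power8_compatible() */
	uint64_t ciabr, dawr, dawrx;
};

static enum sketch_rc sketch_h_set_mode(struct sketch_vcpu *v, uint64_t mflags,
					uint64_t resource, uint64_t value1,
					uint64_t value2)
{
	switch (resource) {
	case SK_RES_SET_CIABR:
		if (!v->power8_compatible)
			return SK_P2;
		if (value2)
			return SK_P4;
		if (mflags)
			return SK_UNSUPPORTED_FLAG;
		/* A guest may not place a breakpoint on hypervisor code. */
		if ((value1 & SK_CIABR_PRIV) == SK_CIABR_PRIV_HYPER)
			return SK_P3;
		v->ciabr = value1;
		return SK_SUCCESS;
	case SK_RES_SET_DAWR:
		if (!v->power8_compatible)
			return SK_P2;
		if (mflags)
			return SK_UNSUPPORTED_FLAG;
		if (value2 & SK_DAWRX_HYP)
			return SK_P4;
		v->dawr = value1;
		v->dawrx = value2;
		return SK_SUCCESS;
	default:
		/* Anything else is left for a slower path to handle. */
		return SK_TOO_HARD;
	}
}
```

The default case is the interesting design choice: only the two frequently used resources are handled here, and every other resource is reported as too hard, which in the patch causes an exit to the host (RESUME_HOST).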
[PULL 09/63] KVM: PPC: Book3S PR: Fix ABIv2 on LE
We switched to ABIv2 on Little Endian systems now which gets rid of the dotted function names. Branch to the actual functions when we see such a system.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_interrupts.S | 4
 arch/powerpc/kvm/book3s_rmhandlers.S | 4
 2 files changed, 8 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_interrupts.S b/arch/powerpc/kvm/book3s_interrupts.S
index e2c29e3..d044b8b 100644
--- a/arch/powerpc/kvm/book3s_interrupts.S
+++ b/arch/powerpc/kvm/book3s_interrupts.S
@@ -25,7 +25,11 @@
 #include <asm/exception-64s.h>
 
 #if defined(CONFIG_PPC_BOOK3S_64)
+#if defined(_CALL_ELF) && _CALL_ELF == 2
+#define FUNC(name) 		name
+#else
 #define FUNC(name) 		GLUE(.,name)
+#endif
 #define GET_SHADOW_VCPU(reg)    addi	reg, r13, PACA_SVCPU
 
 #elif defined(CONFIG_PPC_BOOK3S_32)
diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S b/arch/powerpc/kvm/book3s_rmhandlers.S
index 4850a22..16c4d88 100644
--- a/arch/powerpc/kvm/book3s_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_rmhandlers.S
@@ -36,7 +36,11 @@
 
 #if defined(CONFIG_PPC_BOOK3S_64)
 
+#if defined(_CALL_ELF) && _CALL_ELF == 2
+#define FUNC(name) 		name
+#else
 #define FUNC(name) 		GLUE(.,name)
+#endif
 
 #elif defined(CONFIG_PPC_BOOK3S_32)
 
--
1.8.1.4
[PULL 21/63] KVM: PPC: Book3S HV: Fix ABIv2 on LE
For code that doesn't live in modules we can just branch to the real function names, giving us compatibility with ABIv1 and ABIv2. Do this for the compiled-in code of HV KVM. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 364ca0c..855521e 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -668,9 +668,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_TM) mr r31, r4 addir3, r31, VCPU_FPRS_TM - bl .load_fp_state + bl load_fp_state addir3, r31, VCPU_VRS_TM - bl .load_vr_state + bl load_vr_state mr r4, r31 lwz r7, VCPU_VRSAVE_TM(r4) mtspr SPRN_VRSAVE, r7 @@ -1414,9 +1414,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_TM) /* Save FP/VSX. */ addir3, r9, VCPU_FPRS_TM - bl .store_fp_state + bl store_fp_state addir3, r9, VCPU_VRS_TM - bl .store_vr_state + bl store_vr_state mfspr r6, SPRN_VRSAVE stw r6, VCPU_VRSAVE_TM(r9) 1: @@ -2430,11 +2430,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX) mtmsrd r8 isync addir3,r3,VCPU_FPRS - bl .store_fp_state + bl store_fp_state #ifdef CONFIG_ALTIVEC BEGIN_FTR_SECTION addir3,r31,VCPU_VRS - bl .store_vr_state + bl store_vr_state END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) #endif mfspr r6,SPRN_VRSAVE @@ -2466,11 +2466,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX) mtmsrd r8 isync addir3,r4,VCPU_FPRS - bl .load_fp_state + bl load_fp_state #ifdef CONFIG_ALTIVEC BEGIN_FTR_SECTION addir3,r31,VCPU_VRS - bl .load_vr_state + bl load_vr_state END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC) #endif lwz r7,VCPU_VRSAVE(r31) -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 33/63] kvm: ppc: booke: Use the shared struct helpers for SPRN_SPRG0-7
From: Bharat Bhushan bharat.bhus...@freescale.com Use kvmppc_set_sprg[0-7]() and kvmppc_get_sprg[0-7]() helper functions Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/booke.c | 32 arch/powerpc/kvm/booke_emulate.c | 8 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 25a7e70..34562d4 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -1227,14 +1227,14 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs-srr0 = kvmppc_get_srr0(vcpu); regs-srr1 = kvmppc_get_srr1(vcpu); regs-pid = vcpu-arch.pid; - regs-sprg0 = vcpu-arch.shared-sprg0; - regs-sprg1 = vcpu-arch.shared-sprg1; - regs-sprg2 = vcpu-arch.shared-sprg2; - regs-sprg3 = vcpu-arch.shared-sprg3; - regs-sprg4 = vcpu-arch.shared-sprg4; - regs-sprg5 = vcpu-arch.shared-sprg5; - regs-sprg6 = vcpu-arch.shared-sprg6; - regs-sprg7 = vcpu-arch.shared-sprg7; + regs-sprg0 = kvmppc_get_sprg0(vcpu); + regs-sprg1 = kvmppc_get_sprg1(vcpu); + regs-sprg2 = kvmppc_get_sprg2(vcpu); + regs-sprg3 = kvmppc_get_sprg3(vcpu); + regs-sprg4 = kvmppc_get_sprg4(vcpu); + regs-sprg5 = kvmppc_get_sprg5(vcpu); + regs-sprg6 = kvmppc_get_sprg6(vcpu); + regs-sprg7 = kvmppc_get_sprg7(vcpu); for (i = 0; i ARRAY_SIZE(regs-gpr); i++) regs-gpr[i] = kvmppc_get_gpr(vcpu, i); @@ -1255,14 +1255,14 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_srr0(vcpu, regs-srr0); kvmppc_set_srr1(vcpu, regs-srr1); kvmppc_set_pid(vcpu, regs-pid); - vcpu-arch.shared-sprg0 = regs-sprg0; - vcpu-arch.shared-sprg1 = regs-sprg1; - vcpu-arch.shared-sprg2 = regs-sprg2; - vcpu-arch.shared-sprg3 = regs-sprg3; - vcpu-arch.shared-sprg4 = regs-sprg4; - vcpu-arch.shared-sprg5 = regs-sprg5; - vcpu-arch.shared-sprg6 = regs-sprg6; - vcpu-arch.shared-sprg7 = regs-sprg7; + kvmppc_set_sprg0(vcpu, regs-sprg0); + kvmppc_set_sprg1(vcpu, regs-sprg1); + 
kvmppc_set_sprg2(vcpu, regs-sprg2); + kvmppc_set_sprg3(vcpu, regs-sprg3); + kvmppc_set_sprg4(vcpu, regs-sprg4); + kvmppc_set_sprg5(vcpu, regs-sprg5); + kvmppc_set_sprg6(vcpu, regs-sprg6); + kvmppc_set_sprg7(vcpu, regs-sprg7); for (i = 0; i ARRAY_SIZE(regs-gpr); i++) kvmppc_set_gpr(vcpu, i, regs-gpr[i]); diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c index 27a4b28..28c1588 100644 --- a/arch/powerpc/kvm/booke_emulate.c +++ b/arch/powerpc/kvm/booke_emulate.c @@ -165,16 +165,16 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val) * guest (PR-mode only). */ case SPRN_SPRG4: - vcpu-arch.shared-sprg4 = spr_val; + kvmppc_set_sprg4(vcpu, spr_val); break; case SPRN_SPRG5: - vcpu-arch.shared-sprg5 = spr_val; + kvmppc_set_sprg5(vcpu, spr_val); break; case SPRN_SPRG6: - vcpu-arch.shared-sprg6 = spr_val; + kvmppc_set_sprg6(vcpu, spr_val); break; case SPRN_SPRG7: - vcpu-arch.shared-sprg7 = spr_val; + kvmppc_set_sprg7(vcpu, spr_val); break; case SPRN_IVPR: -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 36/63] KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1
From: Mihai Caraman mihai.cara...@freescale.com

Add missing defines MAS0_GET_TLBSEL() and MAS1_GET_TSIZE() for Book3E.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/mmu-book3e.h | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h
index 8d24f78..cd4f04a 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -40,9 +40,11 @@
 
 /* MAS registers bit definitions */
 
-#define MAS0_TLBSEL_MASK        0x30000000
-#define MAS0_TLBSEL_SHIFT       28
-#define MAS0_TLBSEL(x)          (((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
+#define MAS0_TLBSEL_MASK	0x30000000
+#define MAS0_TLBSEL_SHIFT	28
+#define MAS0_TLBSEL(x)		(((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
+#define MAS0_GET_TLBSEL(mas0)	(((mas0) & MAS0_TLBSEL_MASK) >> \
+			MAS0_TLBSEL_SHIFT)
 #define MAS0_ESEL_MASK		0x0FFF0000
 #define MAS0_ESEL_SHIFT		16
 #define MAS0_ESEL(x)		(((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK)
@@ -60,6 +62,7 @@
 #define MAS1_TSIZE_MASK		0x0f80
 #define MAS1_TSIZE_SHIFT	7
 #define MAS1_TSIZE(x)		(((x) << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK)
+#define MAS1_GET_TSIZE(mas1)	(((mas1) & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT)
 
 #define MAS2_EPN		(~0xFFFUL)
 #define MAS2_X0			0x0040
--
1.8.1.4
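The new MAS0_GET_TLBSEL() and MAS1_GET_TSIZE() macros are the inverse of the existing encoders: mask first, then shift down. A minimal userspace sketch of that round trip follows. The `SK_` names are hypothetical copies made for illustration; the mask and shift values are taken to match the patch (a two-bit field at bit 28 and a five-bit field at bit 7), and `uint32_t` helpers are used so the shifts stay unsigned.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative copies of the field layout; names are not the kernel's. */
#define SK_MAS0_TLBSEL_MASK	0x30000000u
#define SK_MAS0_TLBSEL_SHIFT	28
#define SK_MAS1_TSIZE_MASK	0x00000f80u
#define SK_MAS1_TSIZE_SHIFT	7

/* Encode: shift the value into place, then clip it to the field. */
static uint32_t sk_mas0_tlbsel(uint32_t x)
{
	return (x << SK_MAS0_TLBSEL_SHIFT) & SK_MAS0_TLBSEL_MASK;
}

/* Decode: isolate the field, then shift it back down to bit 0. */
static uint32_t sk_mas0_get_tlbsel(uint32_t mas0)
{
	return (mas0 & SK_MAS0_TLBSEL_MASK) >> SK_MAS0_TLBSEL_SHIFT;
}

static uint32_t sk_mas1_tsize(uint32_t x)
{
	return (x << SK_MAS1_TSIZE_SHIFT) & SK_MAS1_TSIZE_MASK;
}

static uint32_t sk_mas1_get_tsize(uint32_t mas1)
{
	return (mas1 & SK_MAS1_TSIZE_MASK) >> SK_MAS1_TSIZE_SHIFT;
}
```

Decoding ignores bits outside the field, so a full register value can be passed in directly without pre-masking.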
[PULL 15/63] KVM: PPC: e500: Fix default tlb for victim hint
From: Mihai Caraman mihai.cara...@freescale.com

The TLB search operation used for the victim hint relies on the default TLB set by the host. When hardware tablewalk support is enabled in the host, the default TLB is TLB1, which leads KVM to evict the bolted entry. Set and restore the default TLB when searching for the victim hint.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
Reviewed-by: Scott Wood scottw...@freescale.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/mmu-book3e.h | 5 -
 arch/powerpc/kvm/e500_mmu_host.c      | 4
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h
index d0918e0..8d24f78 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -40,7 +40,9 @@
 
 /* MAS registers bit definitions */
 
-#define MAS0_TLBSEL(x)		(((x) << 28) & 0x30000000)
+#define MAS0_TLBSEL_MASK        0x30000000
+#define MAS0_TLBSEL_SHIFT       28
+#define MAS0_TLBSEL(x)          (((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
 #define MAS0_ESEL_MASK		0x0FFF0000
 #define MAS0_ESEL_SHIFT		16
 #define MAS0_ESEL(x)		(((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK)
@@ -86,6 +88,7 @@
 #define MAS3_SPSIZE		0x003e
 #define MAS3_SPSIZE_SHIFT	1
 
+#define MAS4_TLBSEL_MASK	MAS0_TLBSEL_MASK
 #define MAS4_TLBSELD(x)	MAS0_TLBSEL(x)
 #define MAS4_INDD		0x8000	/* Default IND */
 #define MAS4_TSIZED(x)		MAS1_TSIZE(x)
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index dd2cc03..79677d7 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -107,11 +107,15 @@ static u32 get_host_mas0(unsigned long eaddr)
 {
 	unsigned long flags;
 	u32 mas0;
+	u32 mas4;
 
 	local_irq_save(flags);
 	mtspr(SPRN_MAS6, 0);
+	mas4 = mfspr(SPRN_MAS4);
+	mtspr(SPRN_MAS4, mas4 & ~MAS4_TLBSEL_MASK);
 	asm volatile("tlbsx 0, %0" : : "b" (eaddr & ~CONFIG_PAGE_OFFSET));
 	mas0 = mfspr(SPRN_MAS0);
+	mtspr(SPRN_MAS4, mas4);
 	local_irq_restore(flags);
 
 	return mas0;
--
1.8.1.4
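The fix in get_host_mas0() is a save, override, restore sequence around the search. The sketch below models it in plain C with a simulated register, purely as an illustration: `sk_mas4`, `sk_search()` and the other `sk_` names are hypothetical, there is no real SPR access or interrupt masking here, and the mask value is assumed to be the two-bit TLBSEL field at bit 28.

```c
#include <assert.h>
#include <stdint.h>

#define SK_MAS4_TLBSEL_MASK	0x30000000u
#define SK_MAS4_TLBSEL_SHIFT	28

/* Simulated register standing in for SPRN_MAS4. */
static uint32_t sk_mas4;
/* Records what the search observed, so the override can be checked. */
static uint32_t sk_seen_during_search;

/* Stand-in for the tlbsx search: reports which default TLB was selected. */
static uint32_t sk_search(void)
{
	sk_seen_during_search = sk_mas4;
	return (sk_mas4 & SK_MAS4_TLBSEL_MASK) >> SK_MAS4_TLBSEL_SHIFT;
}

/* Force the default TLB select to 0 for the search, then restore it. */
static uint32_t sk_search_with_tlb0_default(void)
{
	uint32_t saved = sk_mas4;
	uint32_t result;

	sk_mas4 = saved & ~SK_MAS4_TLBSEL_MASK;	/* clear only the select field */
	result = sk_search();
	sk_mas4 = saved;			/* host value is put back intact */

	return result;
}
```

In the patch the whole sequence already runs between local_irq_save() and local_irq_restore(), so nothing else on that CPU can observe the temporarily modified MAS4.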
[PULL 23/63] KVM: PPC: e500: Emulate power management control SPR
From: Mihai Caraman mihai.cara...@freescale.com

For the FSL e6500 core the kernel uses the power management SPR register (PWRMGTCR0) to enable idle power down for cores and devices by setting up the idle count period at boot time. With the host already controlling the power management configuration the guest could simply benefit from it, so emulate the guest request as a general store.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_host.h |  1 +
 arch/powerpc/kvm/e500_emulate.c     | 12
 2 files changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 62b2cee..faf2f0e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -584,6 +584,7 @@ struct kvm_vcpu_arch {
 	u32 mmucfg;
 	u32 eptcfg;
 	u32 epr;
+	u32 pwrmgtcr0;
 	u32 crit_save;
 	/* guest debug registers*/
 	struct debug_reg dbg_reg;
diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c
index 002d517..c99c40e 100644
--- a/arch/powerpc/kvm/e500_emulate.c
+++ b/arch/powerpc/kvm/e500_emulate.c
@@ -250,6 +250,14 @@ int kvmppc_core_emulate_mtspr_e500(struct kvm_vcpu *vcpu, int sprn, ulong spr_va
 				spr_val);
 		break;
 
+	case SPRN_PWRMGTCR0:
+		/*
+		 * Guest relies on host power management configurations
+		 * Treat the request as a general store
+		 */
+		vcpu->arch.pwrmgtcr0 = spr_val;
+		break;
+
 	/* extra exceptions */
 	case SPRN_IVOR32:
 		vcpu->arch.ivor[BOOKE_IRQPRIO_SPE_UNAVAIL] = spr_val;
@@ -368,6 +376,10 @@ int kvmppc_core_emulate_mfspr_e500(struct kvm_vcpu *vcpu, int sprn, ulong *spr_v
 		*spr_val = vcpu->arch.eptcfg;
 		break;
 
+	case SPRN_PWRMGTCR0:
+		*spr_val = vcpu->arch.pwrmgtcr0;
+		break;
+
 	/* extra exceptions */
 	case SPRN_IVOR32:
 		*spr_val = vcpu->arch.ivor[BOOKE_IRQPRIO_SPE_UNAVAIL];
--
1.8.1.4
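"Emulate as a general store" means the write is only remembered in per-vcpu state and a later read hands the same value back; nothing reaches the hardware register. A minimal standalone sketch of that pairing, with hypothetical names throughout (`sketch_vcpu`, `sketch_mtspr()`, `sketch_mfspr()` and the `SK_SPRN_PWRMGTCR0` number are all made up for this example):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical SPR number, chosen only for the example. */
enum { SK_SPRN_PWRMGTCR0 = 1 };

struct sketch_vcpu {
	uint32_t pwrmgtcr0;	/* shadow copy; the host keeps the real setting */
};

/* Returns 0 if the SPR write was emulated, -1 if the SPR is not handled. */
static int sketch_mtspr(struct sketch_vcpu *v, int sprn, uint64_t val)
{
	switch (sprn) {
	case SK_SPRN_PWRMGTCR0:
		/* Remember the guest's value; do not touch hardware. */
		v->pwrmgtcr0 = (uint32_t)val;
		return 0;
	default:
		return -1;
	}
}

/* Returns 0 and fills *val if the SPR read was emulated, -1 otherwise. */
static int sketch_mfspr(const struct sketch_vcpu *v, int sprn, uint64_t *val)
{
	switch (sprn) {
	case SK_SPRN_PWRMGTCR0:
		*val = v->pwrmgtcr0;
		return 0;
	default:
		return -1;
	}
}
```

The guest sees a register that behaves consistently across writes and reads, while the host's own power management configuration stays in effect.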
[PULL 22/63] KVM: PPC: Book3S HV: Enable for little endian hosts
Now that we've fixed all the issues that HV KVM code had on little endian hosts, we can enable it in the kernel configuration for users to play with.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index d6a53b9..8aeeda1 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -75,7 +75,6 @@ config KVM_BOOK3S_64
 config KVM_BOOK3S_64_HV
 	tristate "KVM support for POWER7 and PPC970 using hypervisor mode in host"
 	depends on KVM_BOOK3S_64
-	depends on !CPU_LITTLE_ENDIAN
 	select KVM_BOOK3S_HV_POSSIBLE
 	select MMU_NOTIFIER
 	select CMA
--
1.8.1.4
[PULL 17/63] KVM: PPC: Book3S HV: Make HTAB code LE host aware
When running on an LE host all data structures are kept in little endian
byte order. However, the HTAB still needs to be maintained in big endian.

So every time we access any HTAB we need to make sure we do so in the right
byte order. Fix up all accesses to manually byte swap.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/include/asm/kvm_book3s.h    |   4 +-
 arch/powerpc/include/asm/kvm_book3s_64.h |  15 +++-
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 128 ++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 146 ++-
 4 files changed, 164 insertions(+), 129 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index ceb70aa..8ac5392 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -162,9 +162,9 @@ extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, bool writing,
 			bool *writable);
 extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
 			unsigned long *rmap, long pte_index, int realmode);
-extern void kvmppc_invalidate_hpte(struct kvm *kvm, unsigned long *hptep,
+extern void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
 			unsigned long pte_index);
-void kvmppc_clear_ref_hpte(struct kvm *kvm, unsigned long *hptep,
+void kvmppc_clear_ref_hpte(struct kvm *kvm, __be64 *hptep,
 			unsigned long pte_index);
 extern void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long addr,
 			unsigned long *nb_ret);
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index c7871f3..e504f88 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -59,20 +59,29 @@ extern unsigned long kvm_rma_pages;
 /* These bits are reserved in the guest view of the HPTE */
 #define HPTE_GR_RESERVED	HPTE_GR_MODIFIED
 
-static inline long try_lock_hpte(unsigned long *hpte, unsigned long bits)
+static inline long try_lock_hpte(__be64 *hpte, unsigned long bits)
 {
 	unsigned long tmp, old;
+	__be64 be_lockbit, be_bits;
+
+	/*
+	 * We load/store in native endian, but the HTAB is in big endian. If
+	 * we byte swap all data we apply on the PTE we're implicitly correct
+	 * again.
+	 */
+	be_lockbit = cpu_to_be64(HPTE_V_HVLOCK);
+	be_bits = cpu_to_be64(bits);
 
 	asm volatile("	ldarx	%0,0,%2\n"
 		     "	and.	%1,%0,%3\n"
 		     "	bne	2f\n"
-		     "	ori	%0,%0,%4\n"
+		     "	or	%0,%0,%4\n"
 		     "	stdcx.	%0,0,%2\n"
 		     "	beq+	2f\n"
 		     "	mr	%1,%3\n"
 		     "2:	isync"
 		     : "=&r" (tmp), "=&r" (old)
-		     : "r" (hpte), "r" (bits), "i" (HPTE_V_HVLOCK)
+		     : "r" (hpte), "r" (be_bits), "r" (be_lockbit)
 		     : "cc", "memory");
 	return old == 0;
 }
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 8056107..2d154d9 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -450,7 +450,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	unsigned long slb_v;
 	unsigned long pp, key;
 	unsigned long v, gr;
-	unsigned long *hptep;
+	__be64 *hptep;
 	int index;
 	int virtmode = vcpu->arch.shregs.msr & (data ? MSR_DR : MSR_IR);
 
@@ -473,13 +473,13 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 		preempt_enable();
 		return -ENOENT;
 	}
-	hptep = (unsigned long *)(kvm->arch.hpt_virt + (index << 4));
-	v = hptep[0] & ~HPTE_V_HVLOCK;
+	hptep = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
+	v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
 	gr = kvm->arch.revmap[index].guest_rpte;
 
 	/* Unlock the HPTE */
 	asm volatile("lwsync" : : : "memory");
-	hptep[0] = v;
+	hptep[0] = cpu_to_be64(v);
 	preempt_enable();
 
 	gpte->eaddr = eaddr;
@@ -583,7 +583,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 				unsigned long ea, unsigned long dsisr)
 {
 	struct kvm *kvm = vcpu->kvm;
-	unsigned long *hptep, hpte[3], r;
+	unsigned long hpte[3], r;
+	__be64 *hptep;
 	unsigned long mmu_seq, psize, pte_size;
 	unsigned long gpa_base, gfn_base;
 	unsigned long gpa, gfn, hva, pfn;
@@ -606,16 +607,16 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	if (ea != vcpu->arch.pgfault_addr)
 		return RESUME_GUEST;
 	index = vcpu->arch.pgfault_index;
-	hptep = (unsigned long *)(kvm->arch.hpt_virt + (index
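
For readers following along, here is a minimal userspace sketch of the two patterns the patch uses: swapping a value on every load/store of a big endian word, and swapping the mask once so that bit operations can run on the raw word (as try_lock_hpte() does). All sketch_* names and SKETCH_LOCK_BIT are made up for illustration; they are not the kernel's helpers, and the lock here is not atomic.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative lock bit only; this is not the real HPTE_V_HVLOCK value. */
#define SKETCH_LOCK_BIT 0x40ULL

/* Unconditional 64-bit byte swap. */
static uint64_t sketch_swab64(uint64_t x)
{
	x = ((x & 0x00000000ffffffffULL) << 32) | (x >> 32);
	x = ((x & 0x0000ffff0000ffffULL) << 16) |
	    ((x >> 16) & 0x0000ffff0000ffffULL);
	x = ((x & 0x00ff00ff00ff00ffULL) << 8) |
	    ((x >> 8) & 0x00ff00ff00ff00ffULL);
	return x;
}

/* Runtime endian probe, standing in for the kernel's compile-time choice. */
static int sketch_host_is_le(void)
{
	const uint16_t probe = 1;

	return *(const unsigned char *)&probe == 1;
}

/* Hypothetical stand-ins for cpu_to_be64() and be64_to_cpu(). */
static uint64_t sketch_cpu_to_be64(uint64_t x)
{
	return sketch_host_is_le() ? sketch_swab64(x) : x;
}

static uint64_t sketch_be64_to_cpu(uint64_t x)
{
	return sketch_host_is_le() ? sketch_swab64(x) : x;
}

/* Pattern 1: convert on load, modify in host order, convert on store. */
static void sketch_unlock(uint64_t *be_word)
{
	uint64_t v;

	assert(be_word);
	v = sketch_be64_to_cpu(*be_word) & ~SKETCH_LOCK_BIT;
	*be_word = sketch_cpu_to_be64(v);
}

/* Pattern 2: swap the mask once, then test/set bits on the raw BE word. */
static int sketch_try_lock(uint64_t *be_word)
{
	uint64_t be_lockbit = sketch_cpu_to_be64(SKETCH_LOCK_BIT);

	assert(be_word);
	if (*be_word & be_lockbit)
		return 0;	/* already locked */
	*be_word |= be_lockbit;
	return 1;
}
```

Pattern 2 works because AND and OR act on each bit independently, so applying the same byte permutation to both operands gives the byte-permuted result.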
[PULL 25/63] KVM: PPC: Deflect page write faults properly in kvmppc_st
When we have a page that we're not allowed to write to, xlate() will already
tell us -EPERM on lookup of that page. With the code as is we change it into
a page missing error which a guest may get confused about. Instead, just
tell the caller about the -EPERM directly.

This fixes Mac OS X guests when run with DCBZ32 emulation.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/kvm/book3s.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index bd75902..9624c56 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -418,11 +418,13 @@ int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 	      bool data)
 {
 	struct kvmppc_pte pte;
+	int r;
 
 	vcpu->stat.st++;
 
-	if (kvmppc_xlate(vcpu, *eaddr, data, true, &pte))
-		return -ENOENT;
+	r = kvmppc_xlate(vcpu, *eaddr, data, true, &pte);
+	if (r < 0)
+		return r;
 
 	*eaddr = pte.raddr;
 
-- 
1.8.1.4
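
The behavioural difference can be shown with a small standalone sketch. The sketch_* functions and the page enum are hypothetical stand-ins, not the KVM functions; they only model "lookup returns 0, -ENOENT or -EPERM" and the two ways a store path can react.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical page states for the sketch. */
enum sketch_page { SKETCH_PAGE_RW, SKETCH_PAGE_RO, SKETCH_PAGE_MISSING };

/* Stand-in for a translation lookup: 0 on success, negative errno on failure. */
static int sketch_xlate(enum sketch_page page, int iswrite)
{
	assert(iswrite == 0 || iswrite == 1);
	if (page == SKETCH_PAGE_MISSING)
		return -ENOENT;
	if (page == SKETCH_PAGE_RO && iswrite)
		return -EPERM;
	return 0;
}

/* Old behaviour: every lookup failure is reported as "page missing". */
static int sketch_store_old(enum sketch_page page)
{
	if (sketch_xlate(page, 1))
		return -ENOENT;
	return 0;
}

/* New behaviour: the lookup's own error code reaches the caller. */
static int sketch_store_new(enum sketch_page page)
{
	int r = sketch_xlate(page, 1);

	if (r < 0)
		return r;
	return 0;
}
```

With a read-only page the old variant reports -ENOENT while the new one reports -EPERM, which lets the caller raise a protection fault instead of a page-missing fault.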
[PULL 05/63] KVM: PPC: Book3s HV: Fix tlbie compile error
Some compilers complain about uninitialized variables in the compute_tlbie_rb
function. When you follow the code path you'll realize that we'll never get
to that point, but the compiler isn't all that smart.

So just default to 4k page sizes for everything, making the compiler happy
and the code slightly easier to read.

Signed-off-by: Alexander Graf <ag...@suse.de>
Acked-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index fddb72b..c7871f3 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -110,16 +110,12 @@ static inline int __hpte_actual_psize(unsigned int lp, int psize)
 static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 					     unsigned long pte_index)
 {
-	int b_psize, a_psize;
+	int b_psize = MMU_PAGE_4K, a_psize = MMU_PAGE_4K;
 	unsigned int penc;
 	unsigned long rb = 0, va_low, sllp;
 	unsigned int lp = (r >> LP_SHIFT) & ((1 << LP_BITS) - 1);
 
-	if (!(v & HPTE_V_LARGE)) {
-		/* both base and actual psize is 4k */
-		b_psize = MMU_PAGE_4K;
-		a_psize = MMU_PAGE_4K;
-	} else {
+	if (v & HPTE_V_LARGE) {
 		for (b_psize = 0; b_psize < MMU_PAGE_COUNT; b_psize++) {
 
 			/* valid entries have a shift value */
-- 
1.8.1.4
[PULL 08/63] KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC()
From: Anton Blanchard <an...@samba.org>

Both kvmppc_hv_entry_trampoline and kvmppc_entry_trampoline are
assembly functions that are exported to modules and also require
a valid r2.

As such we need to use _GLOBAL_TOC so we provide a global entry
point that establishes the TOC (r2).

Signed-off-by: Anton Blanchard <an...@samba.org>
Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +-
 arch/powerpc/kvm/book3s_rmhandlers.S    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index da1cac5..64ac56f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -48,7 +48,7 @@
  *
  * LR = return address to continue at after eventually re-enabling MMU
  */
-_GLOBAL(kvmppc_hv_entry_trampoline)
+_GLOBAL_TOC(kvmppc_hv_entry_trampoline)
 	mflr	r0
 	std	r0, PPC_LR_STKOFF(r1)
 	stdu	r1, -112(r1)
diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S b/arch/powerpc/kvm/book3s_rmhandlers.S
index 9eec675..4850a22 100644
--- a/arch/powerpc/kvm/book3s_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_rmhandlers.S
@@ -146,7 +146,7 @@ kvmppc_handler_skip_ins:
  * On entry, r4 contains the guest shadow MSR
  * MSR.EE has to be 0 when calling this function
  */
-_GLOBAL(kvmppc_entry_trampoline)
+_GLOBAL_TOC(kvmppc_entry_trampoline)
 	mfmsr	r5
 	LOAD_REG_ADDR(r7, kvmppc_handler_trampoline_enter)
 	toreal(r7)
-- 
1.8.1.4
[PULL 02/63] KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
From: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>

The virtual time base register is a per-VM, per-CPU register that needs
to be saved and restored on VM exit and entry. Writing to VTB is not
allowed in the privileged mode.

Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
[agraf: fix compile error]
Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/include/asm/kvm_host.h | 1 +
 arch/powerpc/include/asm/reg.h      | 9 +++++++++
 arch/powerpc/include/asm/time.h     | 9 +++++++++
 arch/powerpc/kvm/book3s.c           | 6 ++++++
 arch/powerpc/kvm/book3s_emulate.c   | 3 +++
 arch/powerpc/kvm/book3s_hv.c        | 6 ------
 arch/powerpc/kvm/book3s_pr.c        | 3 ++-
 7 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 4a58731..bd3caea 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -505,6 +505,7 @@ struct kvm_vcpu_arch {
 #endif
 	/* Time base value when we entered the guest */
 	u64 entry_tb;
+	u64 entry_vtb;
 	u32 tcr;
 	ulong tsr; /* we need to perform set/clr_bits() which requires ulong */
 	u32 ivor[64];
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index bffd89d..c8f3381 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1203,6 +1203,15 @@
 		     : "r" ((unsigned long)(v)) \
 		     : "memory")
 
+static inline unsigned long mfvtb (void)
+{
+#ifdef CONFIG_PPC_BOOK3S_64
+	if (cpu_has_feature(CPU_FTR_ARCH_207S))
+		return mfspr(SPRN_VTB);
+#endif
+	return 0;
+}
+
 #ifdef __powerpc64__
 #if defined(CONFIG_PPC_CELL) || defined(CONFIG_PPC_FSL_BOOK3E)
 #define mftb()		({unsigned long rval;				\
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 1d428e60..03cbada 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -102,6 +102,15 @@ static inline u64 get_rtc(void)
 	return (u64)hi * 1000000000 + lo;
 }
 
+static inline u64 get_vtb(void)
+{
+#ifdef CONFIG_PPC_BOOK3S_64
+	if (cpu_has_feature(CPU_FTR_ARCH_207S))
+		return mfvtb();
+#endif
+	return 0;
+}
+
 #ifdef CONFIG_PPC64
 static inline u64 get_tb(void)
 {
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index c254c27..ddce1ea 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -646,6 +646,9 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 		case KVM_REG_PPC_BESCR:
 			val = get_reg_val(reg->id, vcpu->arch.bescr);
 			break;
+		case KVM_REG_PPC_VTB:
+			val = get_reg_val(reg->id, vcpu->arch.vtb);
+			break;
 		default:
 			r = -EINVAL;
 			break;
@@ -750,6 +753,9 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 		case KVM_REG_PPC_BESCR:
 			vcpu->arch.bescr = set_reg_val(reg->id, val);
 			break;
+		case KVM_REG_PPC_VTB:
+			vcpu->arch.vtb = set_reg_val(reg->id, val);
+			break;
 		default:
 			r = -EINVAL;
 			break;
diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
index 3565e77..1bb16a5 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -577,6 +577,9 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val
 		 */
 		*spr_val = vcpu->arch.spurr;
 		break;
+	case SPRN_VTB:
+		*spr_val = vcpu->arch.vtb;
+		break;
 	case SPRN_GQR0:
 	case SPRN_GQR1:
 	case SPRN_GQR2:
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 7a12edb..315e884 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -897,9 +897,6 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 	case KVM_REG_PPC_IC:
 		*val = get_reg_val(id, vcpu->arch.ic);
 		break;
-	case KVM_REG_PPC_VTB:
-		*val = get_reg_val(id, vcpu->arch.vtb);
-		break;
 	case KVM_REG_PPC_CSIGR:
 		*val = get_reg_val(id, vcpu->arch.csigr);
 		break;
@@ -1097,9 +1094,6 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 	case KVM_REG_PPC_IC:
 		vcpu->arch.ic = set_reg_val(id, *val);
 		break;
-	case KVM_REG_PPC_VTB:
-		vcpu->arch.vtb = set_reg_val(id, *val);
-		break;
 	case KVM_REG_PPC_CSIGR:
 		vcpu->arch.csigr =
[PULL 13/63] KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled
From: Paul Mackerras <pau...@samba.org>

This adds code to check that when the KVM_CAP_PPC_ENABLE_HCALL
capability is used to enable or disable in-kernel handling of an
hcall, that the hcall is actually implemented by the kernel.
If not an EINVAL error is returned.

This also checks the default-enabled list of hcalls and prints a
warning if any hcall there is not actually implemented.

Signed-off-by: Paul Mackerras <pau...@samba.org>
Signed-off-by: Alexander Graf <ag...@suse.de>
---
 Documentation/virtual/kvm/api.txt       |  4 ++++
 arch/powerpc/include/asm/kvm_book3s.h   |  3 +++
 arch/powerpc/include/asm/kvm_ppc.h      |  2 +-
 arch/powerpc/kvm/book3s.c               |  5 +++++
 arch/powerpc/kvm/book3s_hv.c            | 31 +-
 arch/powerpc/kvm/book3s_hv_builtin.c    | 13 +++++++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  1 +
 arch/powerpc/kvm/book3s_pr.c            |  3 +++
 arch/powerpc/kvm/book3s_pr_papr.c       | 29 +-
 arch/powerpc/kvm/powerpc.c              |  2 ++
 10 files changed, 88 insertions(+), 5 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 5c54d19..6955318 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3039,3 +3039,7 @@ not to attempt to handle the hcall, but will always exit to userspace
 to handle it.  Note that it may not make sense to enable some and
 disable others of a group of related hcalls, but KVM does not prevent
 userspace from doing that.
+
+If the hcall number specified is not one that has an in-kernel
+implementation, the KVM_ENABLE_CAP ioctl will fail with an EINVAL
+error.
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 052ab2a..ceb70aa 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -146,6 +146,7 @@ extern void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *
 extern int kvmppc_mmu_hpte_sysinit(void);
 extern void kvmppc_mmu_hpte_sysexit(void);
 extern int kvmppc_mmu_hv_init(void);
+extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hc);
 
 extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data);
 extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data);
@@ -188,6 +189,8 @@ extern u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst);
 extern ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst);
 extern int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd);
 extern void kvmppc_pr_init_default_hcalls(struct kvm *kvm);
+extern int kvmppc_hcall_impl_pr(unsigned long cmd);
+extern int kvmppc_hcall_impl_hv_realmode(unsigned long cmd);
 extern void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu,
 				 struct kvm_vcpu *vcpu);
 extern void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 9c89cdd..e2fd5a1 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -228,7 +228,7 @@ struct kvmppc_ops {
 	void (*fast_vcpu_kick)(struct kvm_vcpu *vcpu);
 	long (*arch_vm_ioctl)(struct file *filp, unsigned int ioctl,
 			      unsigned long arg);
-
+	int (*hcall_implemented)(unsigned long hcall);
 };
 
 extern struct kvmppc_ops *kvmppc_hv_ops;
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 90aa5c7..bd75902 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -925,6 +925,11 @@ int kvmppc_core_check_processor_compat(void)
 	return 0;
 }
 
+int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hcall)
+{
+	return kvm->arch.kvm_ops->hcall_implemented(hcall);
+}
+
 static int kvmppc_book3s_init(void)
 {
 	int r;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cf445d2..c4377c7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -645,6 +645,28 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 	return RESUME_GUEST;
 }
 
+static int kvmppc_hcall_impl_hv(unsigned long cmd)
+{
+	switch (cmd) {
+	case H_CEDE:
+	case H_PROD:
+	case H_CONFER:
+	case H_REGISTER_VPA:
+#ifdef CONFIG_KVM_XICS
+	case H_XIRR:
+	case H_CPPR:
+	case H_EOI:
+	case H_IPI:
+	case H_IPOLL:
+	case H_XIRR_X:
+#endif
+		return 1;
+	}
+
+	/* See if it's in the real-mode table */
+	return kvmppc_hcall_impl_hv_realmode(cmd);
+}
+
 static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
 				 struct task_struct *tsk)
 {
@@ -2451,9 +2473,13 @@ static unsigned int default_hcall_list[] = {
 static void init_default_hcalls(void)
 {
 	int i;
+
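
As a rough standalone illustration of the check described in the commit message (refuse to enable or disable an hcall that has no in-kernel implementation), something like the following would do. The hcall numbers, the table size and all sketch_* names are invented for the example and do not correspond to real PAPR hcall values or KVM data structures.

```c
#include <assert.h>
#include <errno.h>

/* Made-up hcall numbers; real hcalls are also multiples of 4. */
#define SKETCH_H_FOO		0x04UL
#define SKETCH_H_BAR		0x08UL
#define SKETCH_H_UNIMPL		0x0cUL
#define SKETCH_MAX_HCALL	0x40UL

/* One "enabled" flag per hcall number. */
static unsigned char sketch_enabled[SKETCH_MAX_HCALL / 4 + 1];

/* Returns 1 if the sketch "kernel" has a handler for this hcall. */
static int sketch_hcall_implemented(unsigned long hcall)
{
	switch (hcall) {
	case SKETCH_H_FOO:
	case SKETCH_H_BAR:
		return 1;
	}
	return 0;
}

/* Enable or disable in-kernel handling; -EINVAL for unknown hcalls. */
static int sketch_enable_hcall(unsigned long hcall, int enable)
{
	assert(enable == 0 || enable == 1);
	if (hcall > SKETCH_MAX_HCALL || (hcall & 3))
		return -EINVAL;
	if (!sketch_hcall_implemented(hcall))
		return -EINVAL;
	sketch_enabled[hcall / 4] = (unsigned char)enable;
	return 0;
}
```

The point of the check is that userspace finds out at configuration time, rather than at the first guest hcall, that the kernel cannot handle the number it asked for.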
[PULL 34/63] kvm: ppc: Add SPRN_EPR get helper function
From: Bharat Bhushan <bharat.bhus...@freescale.com>

kvmppc_set_epr() is already defined in asm/kvm_ppc.h, so rename the
get_epr helper function and move it to the same file.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
[agraf: remove duplicate return]
Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/include/asm/kvm_ppc.h | 11 +++++++++++
 arch/powerpc/kvm/booke.c           | 11 +----------
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index c95bdbd..246fb9a 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -392,6 +392,17 @@ static inline int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
 	{ return 0; }
 #endif
 
+static inline unsigned long kvmppc_get_epr(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_KVM_BOOKE_HV
+	return mfspr(SPRN_GEPR);
+#elif defined(CONFIG_BOOKE)
+	return vcpu->arch.epr;
+#else
+	return 0;
+#endif
+}
+
 static inline void kvmppc_set_epr(struct kvm_vcpu *vcpu, u32 epr)
 {
 #ifdef CONFIG_KVM_BOOKE_HV
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 34562d4..a06ef6b 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -292,15 +292,6 @@ static void set_guest_mcsrr(struct kvm_vcpu *vcpu, unsigned long srr0, u32 srr1)
 	vcpu->arch.mcsrr1 = srr1;
 }
 
-static unsigned long get_guest_epr(struct kvm_vcpu *vcpu)
-{
-#ifdef CONFIG_KVM_BOOKE_HV
-	return mfspr(SPRN_GEPR);
-#else
-	return vcpu->arch.epr;
-#endif
-}
-
 /* Deliver the interrupt of the corresponding priority, if possible. */
 static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
                                         unsigned int priority)
@@ -1452,7 +1443,7 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 		val = get_reg_val(reg->id, vcpu->arch.dbg_reg.dac2);
 		break;
 	case KVM_REG_PPC_EPR: {
-		u32 epr = get_guest_epr(vcpu);
+		u32 epr = kvmppc_get_epr(vcpu);
 		val = get_reg_val(reg->id, epr);
 		break;
 	}
-- 
1.8.1.4
[PULL 35/63] KVM: PPC: e500mc: Revert add load inst fixup
From: Mihai Caraman <mihai.cara...@freescale.com>

The commit 1d628af7 "add load inst fixup" made an attempt to handle
failures generated by reading the guest current instruction. The fixup
code that was added works by chance, hiding the real issue.

The load external pid (lwepx) instruction, used by KVM to read guest
instructions, is executed in a substituted guest translation context
(EPLC[EGS] = 1). In consequence lwepx's TLB error and data storage
interrupts need to be handled by KVM, even though these interrupts
are generated from host context (MSR[GS] = 0) where lwepx is executed.

Currently, KVM hooks only interrupts generated from guest context
(MSR[GS] = 1), doing minimal checks on the fast path to avoid host
performance degradation. As a result, the host kernel handles lwepx
faults searching the faulting guest data address (loaded in DEAR) in
its own Logical Partition ID (LPID) 0 context. In case a host translation
is found the execution returns to the lwepx instruction instead of the
fixup, the host ending up in an infinite loop.

Revert the commit "add load inst fixup". The lwepx issue will be addressed
in a subsequent patch without needing fixup code.

Signed-off-by: Mihai Caraman <mihai.cara...@freescale.com>
Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/kvm/bookehv_interrupts.S | 26 +-------------------------
 1 file changed, 1 insertion(+), 25 deletions(-)

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index a1712b8..6ff4480 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -29,7 +29,6 @@
 #include <asm/asm-compat.h>
 #include <asm/asm-offsets.h>
 #include <asm/bitsperlong.h>
-#include <asm/thread_info.h>
 
 #ifdef CONFIG_64BIT
 #include <asm/exception-64e.h>
@@ -164,32 +163,9 @@
 	PPC_STL	r30, VCPU_GPR(R30)(r4)
 	PPC_STL	r31, VCPU_GPR(R31)(r4)
 	mtspr	SPRN_EPLC, r8
-
-	/* disable preemption, so we are sure we hit the fixup handler */
-	CURRENT_THREAD_INFO(r8, r1)
-	li	r7, 1
-	stw	r7, TI_PREEMPT(r8)
-
 	isync
-
-	/*
-	 * In case the read goes wrong, we catch it and write an invalid value
-	 * in LAST_INST instead.
-	 */
-1:	lwepx	r9, 0, r5
-2:
-.section .fixup, "ax"
-3:	li	r9, KVM_INST_FETCH_FAILED
-	b	2b
-.previous
-.section __ex_table,"a"
-	PPC_LONG_ALIGN
-	PPC_LONG 1b,3b
-.previous
-
+	lwepx	r9, 0, r5
 	mtspr	SPRN_EPLC, r3
-	li	r7, 0
-	stw	r7, TI_PREEMPT(r8)
 	stw	r9, VCPU_LAST_INST(r4)
 	.endif
 
-- 
1.8.1.4
[PULL 11/63] KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
From: Mihai Caraman <mihai.cara...@freescale.com>

On vcpu schedule, the condition checked for tlb pollution is too loose.
The tlb entries of a vcpu become polluted (vs stale) only when a different
vcpu within the same logical partition runs in-between. Optimize the tlb
invalidation condition keeping last_vcpu per logical partition id.

With the new invalidation condition, a guest shows 4% performance improvement
on P5020DS while running a memory stress application with the cpu
oversubscribed, the other guest running a cpu intensive workload.

Guest - old invalidation condition
  real 3.89
  user 3.87
  sys 0.01

Guest - enhanced invalidation condition
  real 3.75
  user 3.73
  sys 0.01

Host
  real 3.70
  user 1.85
  sys 0.00

The memory stress application accesses 4KB pages backed by 75% of available
TLB0 entries:

char foo[ENTRIES][4096] __attribute__ ((aligned (4096)));

int main()
{
	char bar;
	int i, j;

	for (i = 0; i < ITERATIONS; i++)
		for (j = 0; j < ENTRIES; j++)
			bar = foo[j][0];

	return 0;
}

Signed-off-by: Mihai Caraman <mihai.cara...@freescale.com>
Reviewed-by: Scott Wood <scottw...@freescale.com>
Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/kvm/e500mc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
index 17e4562..690499d 100644
--- a/arch/powerpc/kvm/e500mc.c
+++ b/arch/powerpc/kvm/e500mc.c
@@ -110,7 +110,7 @@ void kvmppc_mmu_msr_notify(struct kvm_vcpu *vcpu, u32 old_msr)
 {
 }
 
-static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu_on_cpu);
+static DEFINE_PER_CPU(struct kvm_vcpu *[KVMPPC_NR_LPIDS], last_vcpu_of_lpid);
 
 static void kvmppc_core_vcpu_load_e500mc(struct kvm_vcpu *vcpu, int cpu)
 {
@@ -141,9 +141,9 @@ static void kvmppc_core_vcpu_load_e500mc(struct kvm_vcpu *vcpu, int cpu)
 	mtspr(SPRN_GESR, vcpu->arch.shared->esr);
 
 	if (vcpu->arch.oldpir != mfspr(SPRN_PIR) ||
-	    __get_cpu_var(last_vcpu_on_cpu) != vcpu) {
+	    __get_cpu_var(last_vcpu_of_lpid)[vcpu->kvm->arch.lpid] != vcpu) {
 		kvmppc_e500_tlbil_all(vcpu_e500);
-		__get_cpu_var(last_vcpu_on_cpu) = vcpu;
+		__get_cpu_var(last_vcpu_of_lpid)[vcpu->kvm->arch.lpid] = vcpu;
 	}
 
 	kvmppc_load_guest_fp(vcpu);
-- 
1.8.1.4
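
To make the difference between the two conditions concrete, here is a small single-CPU model. It is only a sketch: the sketch_* names and SKETCH_NR_LPIDS are invented, the oldpir check from the real code is left out, and "flush" is reduced to a return value.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_LPIDS 4	/* invented size, not KVMPPC_NR_LPIDS */

struct sketch_vcpu {
	int lpid;	/* logical partition the vcpu belongs to */
	int id;
};

/* Old scheme: remember the last vcpu that ran on this CPU. */
static const struct sketch_vcpu *sketch_last_on_cpu;

/* New scheme: remember the last vcpu per logical partition. */
static const struct sketch_vcpu *sketch_last_of_lpid[SKETCH_NR_LPIDS];

/* Each function returns 1 if the TLB would be invalidated on this load. */
static int sketch_load_old(const struct sketch_vcpu *v)
{
	if (sketch_last_on_cpu != v) {
		sketch_last_on_cpu = v;
		return 1;
	}
	return 0;
}

static int sketch_load_new(const struct sketch_vcpu *v)
{
	assert(v->lpid >= 0 && v->lpid < SKETCH_NR_LPIDS);
	if (sketch_last_of_lpid[v->lpid] != v) {
		sketch_last_of_lpid[v->lpid] = v;
		return 1;
	}
	return 0;
}
```

When two vcpus from different partitions alternate on one CPU, the old condition flushes on every switch, while the new one flushes only when another vcpu of the same partition ran in between.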
[PULL 20/63] KVM: PPC: Book3S HV: Access XICS in BE
On the exit path from the guest we check what type of interrupt we received
if we received one. This means we're doing hardware access to the XICS
interrupt controller.

However, when running on a little endian system, this access is byte
reversed. So let's make sure to swizzle the bytes back again and virtually
make XICS accesses big endian.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index bf5270e..364ca0c 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2350,7 +2350,18 @@ kvmppc_read_intr:
 	cmpdi	r6, 0
 	beq-	1f
 	lwzcix	r0, r6, r7
-	rlwinm.	r3, r0, 0, 0xffffff
+	/*
+	 * Save XIRR for later. Since we get in in reverse endian on LE
+	 * systems, save it byte reversed and fetch it back in host endian.
+	 */
+	li	r3, HSTATE_SAVED_XIRR
+	STWX_BE	r0, r3, r13
+#ifdef __LITTLE_ENDIAN__
+	lwz	r3, HSTATE_SAVED_XIRR(r13)
+#else
+	mr	r3, r0
+#endif
+	rlwinm.	r3, r3, 0, 0xffffff
 	sync
 	beq	1f			/* if nothing pending in the ICP */
 
@@ -2382,10 +2393,9 @@ kvmppc_read_intr:
 	li	r3, -1
 1:	blr
 
-42:	/* It's not an IPI and it's for the host, stash it in the PACA
-	 * before exit, it will be picked up by the host ICP driver
+42:	/* It's not an IPI and it's for the host. We saved a copy of XIRR in
+	 * the PACA earlier, it will be picked up by the host ICP driver
 	 */
-	stw	r0, HSTATE_SAVED_XIRR(r13)
 	li	r3, 1
 	b	1b
 
-- 
1.8.1.4
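
In C terms, the idea is to interpret the 32-bit XIRR value as big endian regardless of host byte order and only then pick out its fields. The sketch below assumes the usual XIRR layout (priority in the top byte, interrupt source in the low 24 bits); the sketch_* helpers are illustrative and unrelated to the assembly macros in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a big endian 32-bit value from memory, independent of host order. */
static uint32_t sketch_load_be32(const unsigned char *p)
{
	assert(p);
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Low 24 bits: interrupt source number (0 means nothing pending). */
static uint32_t sketch_xirr_source(uint32_t xirr)
{
	return xirr & 0xffffff;
}

/* Top 8 bits: priority. */
static uint32_t sketch_xirr_priority(uint32_t xirr)
{
	return xirr >> 24;
}
```

A plain native load on a little endian host would put the priority byte at the bottom of the word, so masking with 0xffffff would return the wrong bits; that is the failure mode the byte swap avoids.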
[PULL 10/63] KVM: PPC: Book3S PR: Fix sparse endian checks
While sending sparse with endian checks over the code base, it triggered at
some places that were missing casts or had wrong types. Fix them up.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/kvm/book3s_pr_papr.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index 52a63bf..f7c25c6 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -40,8 +40,9 @@ static int kvmppc_h_pr_enter(struct kvm_vcpu *vcpu)
 {
 	long flags = kvmppc_get_gpr(vcpu, 4);
 	long pte_index = kvmppc_get_gpr(vcpu, 5);
-	unsigned long pteg[2 * 8];
-	unsigned long pteg_addr, i, *hpte;
+	__be64 pteg[2 * 8];
+	__be64 *hpte;
+	unsigned long pteg_addr, i;
 	long int ret;
 
 	i = pte_index & 7;
@@ -93,8 +94,8 @@ static int kvmppc_h_pr_remove(struct kvm_vcpu *vcpu)
 	pteg = get_pteg_addr(vcpu, pte_index);
 	mutex_lock(&vcpu->kvm->arch.hpt_mutex);
 	copy_from_user(pte, (void __user *)pteg, sizeof(pte));
-	pte[0] = be64_to_cpu(pte[0]);
-	pte[1] = be64_to_cpu(pte[1]);
+	pte[0] = be64_to_cpu((__force __be64)pte[0]);
+	pte[1] = be64_to_cpu((__force __be64)pte[1]);
 
 	ret = H_NOT_FOUND;
 	if ((pte[0] & HPTE_V_VALID) == 0 ||
@@ -171,8 +172,8 @@ static int kvmppc_h_pr_bulk_remove(struct kvm_vcpu *vcpu)
 
 		pteg = get_pteg_addr(vcpu, tsh & H_BULK_REMOVE_PTEX);
 		copy_from_user(pte, (void __user *)pteg, sizeof(pte));
-		pte[0] = be64_to_cpu(pte[0]);
-		pte[1] = be64_to_cpu(pte[1]);
+		pte[0] = be64_to_cpu((__force __be64)pte[0]);
+		pte[1] = be64_to_cpu((__force __be64)pte[1]);
 
 		/* tsl = AVPN */
 		flags = (tsh & H_BULK_REMOVE_FLAGS) >> 26;
@@ -211,8 +212,8 @@ static int kvmppc_h_pr_protect(struct kvm_vcpu *vcpu)
 	pteg = get_pteg_addr(vcpu, pte_index);
 	mutex_lock(&vcpu->kvm->arch.hpt_mutex);
 	copy_from_user(pte, (void __user *)pteg, sizeof(pte));
-	pte[0] = be64_to_cpu(pte[0]);
-	pte[1] = be64_to_cpu(pte[1]);
+	pte[0] = be64_to_cpu((__force __be64)pte[0]);
+	pte[1] = be64_to_cpu((__force __be64)pte[1]);
 
 	ret = H_NOT_FOUND;
 	if ((pte[0] & HPTE_V_VALID) == 0 ||
@@ -231,8 +232,8 @@ static int kvmppc_h_pr_protect(struct kvm_vcpu *vcpu)
 
 	rb = compute_tlbie_rb(v, r, pte_index);
 	vcpu->arch.mmu.tlbie(vcpu, rb, rb & 1 ? true : false);
-	pte[0] = cpu_to_be64(pte[0]);
-	pte[1] = cpu_to_be64(pte[1]);
+	pte[0] = (__force u64)cpu_to_be64(pte[0]);
+	pte[1] = (__force u64)cpu_to_be64(pte[1]);
 	copy_to_user((void __user *)pteg, pte, sizeof(pte));
 	ret = H_SUCCESS;
 
-- 
1.8.1.4
[PULL 38/63] KVM: PPC: Allow kvmppc_get_last_inst() to fail
From: Mihai Caraman <mihai.cara...@freescale.com>

On book3e, guest last instruction is read on the exit path using load
external pid (lwepx) dedicated instruction. This load operation may fail
due to TLB eviction and execute-but-not-read entries.

This patch lays down the path for an alternative solution to read the guest
last instruction, by allowing the kvmppc_get_last_inst() function to fail.
Architecture specific implementations of kvmppc_load_last_inst() may read
the last guest instruction and instruct the emulation layer to re-execute
the guest in case of failure.

Make kvmppc_get_last_inst() definition common between architectures.

Signed-off-by: Mihai Caraman <mihai.cara...@freescale.com>
Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/include/asm/kvm_book3s.h    | 26 --------------------------
 arch/powerpc/include/asm/kvm_booke.h     |  5 -----
 arch/powerpc/include/asm/kvm_ppc.h       | 31 +++++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s.c                | 17
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 17
 arch/powerpc/kvm/book3s_paired_singles.c | 38 +-
 arch/powerpc/kvm/book3s_pr.c             | 45 +-
 arch/powerpc/kvm/booke.c                 |  3 +++
 arch/powerpc/kvm/e500_mmu_host.c         |  6 +
 arch/powerpc/kvm/emulate.c               | 18 -
 arch/powerpc/kvm/powerpc.c               | 11 +-
 11 files changed, 140 insertions(+), 77 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 20fb6f2..a86ca65 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -276,32 +276,6 @@ static inline bool kvmppc_need_byteswap(struct kvm_vcpu *vcpu)
 	return (kvmppc_get_msr(vcpu) & MSR_LE) != (MSR_KERNEL & MSR_LE);
 }
 
-static inline u32 kvmppc_get_last_inst_internal(struct kvm_vcpu *vcpu, ulong pc)
-{
-	/* Load the instruction manually if it failed to do so in the
-	 * exit path */
-	if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
-		kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
-
-	return kvmppc_need_byteswap(vcpu) ? swab32(vcpu->arch.last_inst) :
-		vcpu->arch.last_inst;
-}
-
-static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)
-{
-	return kvmppc_get_last_inst_internal(vcpu, kvmppc_get_pc(vcpu));
-}
-
-/*
- * Like kvmppc_get_last_inst(), but for fetching a sc instruction.
- * Because the sc instruction sets SRR0 to point to the following
- * instruction, we have to fetch from pc - 4.
- */
-static inline u32 kvmppc_get_last_sc(struct kvm_vcpu *vcpu)
-{
-	return kvmppc_get_last_inst_internal(vcpu, kvmppc_get_pc(vcpu) - 4);
-}
-
 static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.fault_dar;
diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h
index c7aed61..cbb1990 100644
--- a/arch/powerpc/include/asm/kvm_booke.h
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -69,11 +69,6 @@ static inline bool kvmppc_need_byteswap(struct kvm_vcpu *vcpu)
 	return false;
 }
 
-static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)
-{
-	return vcpu->arch.last_inst;
-}
-
 static inline void kvmppc_set_ctr(struct kvm_vcpu *vcpu, ulong val)
 {
 	vcpu->arch.ctr = val;
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 246fb9a..e381363 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -47,6 +47,11 @@ enum emulation_result {
 	EMULATE_EXIT_USER,    /* emulation requires exit to user-space */
 };
 
+enum instruction_type {
+	INST_GENERIC,
+	INST_SC,		/* system call */
+};
+
 extern int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
 extern int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
 extern void kvmppc_handler_highmem(void);
@@ -62,6 +67,9 @@ extern int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu,
 			       u64 val, unsigned int bytes,
 			       int is_default_endian);
 
+extern int kvmppc_load_last_inst(struct kvm_vcpu *vcpu,
+				 enum instruction_type type, u32 *inst);
+
 extern int kvmppc_emulate_instruction(struct kvm_run *run,
                                       struct kvm_vcpu *vcpu);
 extern int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu);
@@ -234,6 +242,29 @@ struct kvmppc_ops {
 extern struct kvmppc_ops *kvmppc_hv_ops;
 extern struct kvmppc_ops *kvmppc_pr_ops;
 
+static inline int kvmppc_get_last_inst(struct kvm_vcpu *vcpu,
+					enum instruction_type type, u32 *inst)
+{
+	int ret = EMULATE_DONE;
+	u32 fetched_inst;
+
+	/* Load the instruction manually if it failed to do so in the
+	 * exit path */
+	if
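
The shape of the new interface, a status return plus an out parameter instead of returning the instruction directly, can be sketched roughly as below. This is an assumption-laden model, not the patch's code: the sketch_* types, the SKETCH_AGAIN result and the sentinel value are invented, and the byte-swap handling of the real helper is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder sentinel; the real KVM_INST_FETCH_FAILED value is not assumed. */
#define SKETCH_INST_FETCH_FAILED 0xffffffffu

enum sketch_result {
	SKETCH_DONE,	/* instruction is available in *inst */
	SKETCH_AGAIN,	/* could not read it; caller should re-enter the guest */
};

struct sketch_vcpu {
	uint32_t last_inst;	/* cached on the exit path, or the sentinel */
	int reload_works;	/* models whether a manual reload can succeed */
	uint32_t guest_mem_inst;	/* what a successful reload would read */
};

/* Stand-in for an arch-specific loader that is allowed to fail. */
static enum sketch_result sketch_load_last_inst(struct sketch_vcpu *v,
						uint32_t *inst)
{
	if (!v->reload_works)
		return SKETCH_AGAIN;
	*inst = v->guest_mem_inst;
	return SKETCH_DONE;
}

/* Common wrapper: reload only if the exit path failed to fetch. */
static enum sketch_result sketch_get_last_inst(struct sketch_vcpu *v,
					       uint32_t *inst)
{
	enum sketch_result ret = SKETCH_DONE;

	assert(inst);
	if (v->last_inst == SKETCH_INST_FETCH_FAILED)
		ret = sketch_load_last_inst(v, &v->last_inst);
	if (ret == SKETCH_DONE)
		*inst = v->last_inst;
	return ret;
}
```

Callers then have to check the status before decoding, and on failure simply resume the guest so the instruction is fetched again on the next exit.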
[PULL 32/63] kvm: ppc: booke: Add shared struct helpers of SPRN_ESR
From: Bharat Bhushan <bharat.bhus...@freescale.com>

Add and use kvmppc_set_esr() and kvmppc_get_esr() helper functions

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/include/asm/kvm_ppc.h |  1 +
 arch/powerpc/kvm/booke.c           | 24 +++---------------------
 2 files changed, 4 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 6520d09..c95bdbd 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -530,6 +530,7 @@ SHARED_SPRNG_WRAPPER(sprg3, 64, SPRN_GSPRG3)
 SHARED_SPRNG_WRAPPER(srr0, 64, SPRN_GSRR0)
 SHARED_SPRNG_WRAPPER(srr1, 64, SPRN_GSRR1)
 SHARED_SPRNG_WRAPPER(dar, 64, SPRN_GDEAR)
+SHARED_SPRNG_WRAPPER(esr, 64, SPRN_GESR)
 SHARED_WRAPPER_GET(msr, 64)
 static inline void kvmppc_set_msr_fast(struct kvm_vcpu *vcpu, u64 val)
 {
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 8e8b14b..25a7e70 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -292,24 +292,6 @@ static void set_guest_mcsrr(struct kvm_vcpu *vcpu, unsigned long srr0, u32 srr1)
 	vcpu->arch.mcsrr1 = srr1;
 }
 
-static unsigned long get_guest_esr(struct kvm_vcpu *vcpu)
-{
-#ifdef CONFIG_KVM_BOOKE_HV
-	return mfspr(SPRN_GESR);
-#else
-	return vcpu->arch.shared->esr;
-#endif
-}
-
-static void set_guest_esr(struct kvm_vcpu *vcpu, u32 esr)
-{
-#ifdef CONFIG_KVM_BOOKE_HV
-	mtspr(SPRN_GESR, esr);
-#else
-	vcpu->arch.shared->esr = esr;
-#endif
-}
-
 static unsigned long get_guest_epr(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_KVM_BOOKE_HV
@@ -427,7 +409,7 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
 
 		vcpu->arch.pc = vcpu->arch.ivpr | vcpu->arch.ivor[priority];
 		if (update_esr == true)
-			set_guest_esr(vcpu, vcpu->arch.queued_esr);
+			kvmppc_set_esr(vcpu, vcpu->arch.queued_esr);
 		if (update_dear == true)
 			kvmppc_set_dar(vcpu, vcpu->arch.queued_dear);
 		if (update_epr == true) {
@@ -1298,7 +1280,7 @@ static void get_sregs_base(struct kvm_vcpu *vcpu,
 	sregs->u.e.csrr0 = vcpu->arch.csrr0;
 	sregs->u.e.csrr1 = vcpu->arch.csrr1;
 	sregs->u.e.mcsr = vcpu->arch.mcsr;
-	sregs->u.e.esr = get_guest_esr(vcpu);
+	sregs->u.e.esr = kvmppc_get_esr(vcpu);
 	sregs->u.e.dear = kvmppc_get_dar(vcpu);
 	sregs->u.e.tsr = vcpu->arch.tsr;
 	sregs->u.e.tcr = vcpu->arch.tcr;
@@ -1316,7 +1298,7 @@ static int set_sregs_base(struct kvm_vcpu *vcpu,
 	vcpu->arch.csrr0 = sregs->u.e.csrr0;
 	vcpu->arch.csrr1 = sregs->u.e.csrr1;
 	vcpu->arch.mcsr = sregs->u.e.mcsr;
-	set_guest_esr(vcpu, sregs->u.e.esr);
+	kvmppc_set_esr(vcpu, sregs->u.e.esr);
 	kvmppc_set_dar(vcpu, sregs->u.e.dear);
 	vcpu->arch.vrsave = sregs->u.e.vrsave;
 	kvmppc_set_tcr(vcpu, sregs->u.e.tcr);
--
1.8.1.4
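The patch above replaces two hand-written accessors, each with its own #ifdef, by a single macro invocation. A minimal standalone sketch of that pattern follows. Every name in it (demo_shared, demo_vcpu, DEMO_SHARED_WRAPPER) is hypothetical and only illustrates the idea; it is not the kernel's SHARED_SPRNG_WRAPPER definition, and it models only the shared-struct path, not the mfspr()/mtspr() path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
struct demo_shared {
    uint64_t esr;
    uint64_t dar;
};

struct demo_vcpu {
    struct demo_shared *shared;
};

/*
 * Generate demo_get_<reg>() and demo_set_<reg>() for one field of the
 * shared struct. A build with hardware guest registers could expand the
 * same macro to register reads and writes instead, so callers need no
 * #ifdef of their own.
 */
#define DEMO_SHARED_WRAPPER(reg)                                          \
static inline uint64_t demo_get_##reg(const struct demo_vcpu *vcpu)       \
{                                                                         \
    assert(vcpu && vcpu->shared);                                         \
    return vcpu->shared->reg;                                             \
}                                                                         \
static inline void demo_set_##reg(struct demo_vcpu *vcpu, uint64_t val)   \
{                                                                         \
    assert(vcpu && vcpu->shared);                                         \
    vcpu->shared->reg = val;                                              \
}

DEMO_SHARED_WRAPPER(esr)
DEMO_SHARED_WRAPPER(dar)
```

Adding a register then costs one line, which is what the single added SHARED_SPRNG_WRAPPER(esr, ...) line in the patch suggests.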
[PULL 01/63] KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation
From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com We use time base for PURR and SPURR emulation with PR KVM since we are emulating a single threaded core. When using time base we need to make sure that we don't accumulate time spent in the host in PURR and SPURR value. Also we don't need to emulate mtspr because both the registers are hypervisor resource. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_book3s.h | 2 -- arch/powerpc/include/asm/kvm_host.h | 4 ++-- arch/powerpc/kvm/book3s_emulate.c | 16 arch/powerpc/kvm/book3s_pr.c | 11 +++ 4 files changed, 21 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index f52f656..a20cc0b 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -83,8 +83,6 @@ struct kvmppc_vcpu_book3s { u64 sdr1; u64 hior; u64 msr_mask; - u64 purr_offset; - u64 spurr_offset; #ifdef CONFIG_PPC_BOOK3S_32 u32 vsid_pool[VSID_POOL_SIZE]; u32 vsid_next; diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index bb66d8b..4a58731 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -503,8 +503,8 @@ struct kvm_vcpu_arch { #ifdef CONFIG_BOOKE u32 decar; #endif - u32 tbl; - u32 tbu; + /* Time base value when we entered the guest */ + u64 entry_tb; u32 tcr; ulong tsr; /* we need to perform set/clr_bits() which requires ulong */ u32 ivor[64]; diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 3f29526..3565e77 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -439,12 +439,6 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val) (mfmsr() MSR_HV)) vcpu-arch.hflags |= BOOK3S_HFLAG_DCBZ32; break; - case SPRN_PURR: - to_book3s(vcpu)-purr_offset = spr_val - get_tb(); - break; 
- case SPRN_SPURR: - to_book3s(vcpu)-spurr_offset = spr_val - get_tb(); - break; case SPRN_GQR0: case SPRN_GQR1: case SPRN_GQR2: @@ -572,10 +566,16 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val *spr_val = 0; break; case SPRN_PURR: - *spr_val = get_tb() + to_book3s(vcpu)-purr_offset; + /* +* On exit we would have updated purr +*/ + *spr_val = vcpu-arch.purr; break; case SPRN_SPURR: - *spr_val = get_tb() + to_book3s(vcpu)-purr_offset; + /* +* On exit we would have updated spurr +*/ + *spr_val = vcpu-arch.spurr; break; case SPRN_GQR0: case SPRN_GQR1: diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index 8eef1e5..671f5c92 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -120,6 +120,11 @@ void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu, #ifdef CONFIG_PPC_BOOK3S_64 svcpu-shadow_fscr = vcpu-arch.shadow_fscr; #endif + /* +* Now also save the current time base value. We use this +* to find the guest purr and spurr value. +*/ + vcpu-arch.entry_tb = get_tb(); svcpu-in_use = true; } @@ -166,6 +171,12 @@ void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu, #ifdef CONFIG_PPC_BOOK3S_64 vcpu-arch.shadow_fscr = svcpu-shadow_fscr; #endif + /* +* Update purr and spurr using time base on exit. +*/ + vcpu-arch.purr += get_tb() - vcpu-arch.entry_tb; + vcpu-arch.spurr += get_tb() - vcpu-arch.entry_tb; + svcpu-in_use = false; out: -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
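The accounting described in this patch can be sketched in isolation: sample the time base on guest entry, and on exit add only the elapsed ticks to PURR and SPURR, so host time never accumulates. The struct and function names below are hypothetical, and the time base is passed in as a parameter rather than read with get_tb(). Unlike the patch, the sketch samples the time base once per exit so both registers receive the same delta.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical miniature of the vcpu fields the patch touches. */
struct demo_vcpu {
    uint64_t entry_tb;  /* time base sampled at guest entry */
    uint64_t purr;      /* accumulated guest-only ticks */
    uint64_t spurr;
};

/* Record the time base when control is handed to the guest. */
static void demo_guest_enter(struct demo_vcpu *vcpu, uint64_t now_tb)
{
    vcpu->entry_tb = now_tb;
}

/* On exit, credit only the ticks that elapsed while the guest ran. */
static void demo_guest_exit(struct demo_vcpu *vcpu, uint64_t now_tb)
{
    uint64_t delta;

    assert(now_tb >= vcpu->entry_tb);  /* assumes a monotonic time base */
    delta = now_tb - vcpu->entry_tb;
    vcpu->purr += delta;
    vcpu->spurr += delta;
}
```

With entries at ticks 100 and 400 and exits at 150 and 420, the guest sees a PURR of 70; the 250 ticks spent in the host in between are not counted.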
[PULL 00/63] ppc patch queue 2014-08-01
Hi Paolo / Marcelo,

This is my current patch queue for ppc. Please pull.

Alex


The following changes since commit 9f6226a762c7ae02f6a23a3d4fc552dafa57ea23:

  arch: x86: kvm: x86.c: Cleaning up variable is set more than once (2014-06-30 16:52:04 +0200)

are available in the git repository at:

  git://github.com/agraf/linux-2.6.git tags/signed-kvm-ppc-next

for you to fetch changes up to 8e6afa36e754be84b468d7df9e5aa71cf4003f3b:

  KVM: PPC: PR: Handle FSCR feature deselects (2014-07-31 10:23:46 +0200)

Patch queue for ppc - 2014-08-01

Highlights in this release include:

  - BookE: Rework instruction fetch, not racy anymore now
  - BookE HV: Fix ONE_REG accessors for some in-hardware registers
  - Book3S: Good number of LE host fixes, enable HV on LE
  - Book3S: Some misc bug fixes
  - Book3S HV: Add in-guest debug support
  - Book3S HV: Preload cache lines on context switch
  - Remove 440 support

Alexander Graf (31):
      KVM: PPC: Book3s PR: Disable AIL mode with OPAL
      KVM: PPC: Book3s HV: Fix tlbie compile error
      KVM: PPC: Book3S PR: Handle hyp doorbell exits
      KVM: PPC: Book3S PR: Fix ABIv2 on LE
      KVM: PPC: Book3S PR: Fix sparse endian checks
      PPC: Add asm helpers for BE 32bit load/store
      KVM: PPC: Book3S HV: Make HTAB code LE host aware
      KVM: PPC: Book3S HV: Access guest VPA in BE
      KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE
      KVM: PPC: Book3S HV: Access XICS in BE
      KVM: PPC: Book3S HV: Fix ABIv2 on LE
      KVM: PPC: Book3S HV: Enable for little endian hosts
      KVM: PPC: Book3S: Move vcore definition to end of kvm_arch struct
      KVM: PPC: Deflect page write faults properly in kvmppc_st
      KVM: PPC: Book3S: Stop PTE lookup on write errors
      KVM: PPC: Book3S: Add hack for split real mode
      KVM: PPC: Book3S: Make magic page properly 4k mappable
      KVM: PPC: Remove 440 support
      KVM: Rename and add argument to check_extension
      KVM: Allow KVM_CHECK_EXTENSION on the vm fd
      KVM: PPC: Book3S: Provide different CAPs based on HV or PR mode
      KVM: PPC: Implement kvmppc_xlate for all targets
      KVM: PPC: Move kvmppc_ld/st to common code
      KVM: PPC: Remove kvmppc_bad_hva()
      KVM: PPC: Use kvm_read_guest in kvmppc_ld
      KVM: PPC: Handle magic page in kvmppc_ld/st
      KVM: PPC: Separate loadstore emulation from priv emulation
      KVM: PPC: Expose helper functions for data/inst faults
      KVM: PPC: Remove DCR handling
      KVM: PPC: HV: Remove generic instruction emulation
      KVM: PPC: PR: Handle FSCR feature deselects

Alexey Kardashevskiy (1):
      KVM: PPC: Book3S: Fix LPCR one_reg interface

Aneesh Kumar K.V (4):
      KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation
      KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
      KVM: PPC: BOOK3S: PR: Emulate instruction counter
      KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page

Anton Blanchard (2):
      KVM: PPC: Book3S HV: Fix ABIv2 indirect branch issue
      KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC()

Bharat Bhushan (10):
      kvm: ppc: bookehv: Added wrapper macros for shadow registers
      kvm: ppc: booke: Use the shared struct helpers of SRR0 and SRR1
      kvm: ppc: booke: Use the shared struct helpers of SPRN_DEAR
      kvm: ppc: booke: Add shared struct helpers of SPRN_ESR
      kvm: ppc: booke: Use the shared struct helpers for SPRN_SPRG0-7
      kvm: ppc: Add SPRN_EPR get helper function
      kvm: ppc: bookehv: Save restore SPRN_SPRG9 on guest entry exit
      KVM: PPC: Booke-hv: Add one reg interface for SPRG9
      KVM: PPC: Remove comment saying SPRG1 is used for vcpu pointer
      KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr

Michael Neuling (1):
      KVM: PPC: Book3S HV: Add H_SET_MODE hcall handling

Mihai Caraman (8):
      KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
      KVM: PPC: e500: Fix default tlb for victim hint
      KVM: PPC: e500: Emulate power management control SPR
      KVM: PPC: e500mc: Revert add load inst fixup
      KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1
      KVM: PPC: Book3s: Remove kvmppc_read_inst() function
      KVM: PPC: Allow kvmppc_get_last_inst() to fail
      KVM: PPC: Bookehv: Get vcpu's last instruction for emulation

Paul Mackerras (4):
      KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling
      KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled
      KVM: PPC: Book3S PR: Take SRCU read lock around RTAS kvm_read_guest() call
      KVM: PPC: Book3S: Make kvmppc_ld return a more accurate error indication

Stewart Smith (2):
      Split out struct kvmppc_vcore creation to separate function
      Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8

Alexander Graf (31):
      KVM: PPC:
[PULL 26/63] KVM: PPC: Book3S: Stop PTE lookup on write errors
When a page lookup failed because we're not allowed to write to the page, we
should not overwrite that value with another lookup on the second PTEG which
will return page not found. Instead, we should just tell the caller that we
had a permission problem.

This fixes Mac OS X guests looping endlessly in page lookup code for me.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/powerpc/kvm/book3s_32_mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index 93503bb..cd0b073 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -335,7 +335,7 @@ static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	if (r < 0)
 		r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte,
 						   data, iswrite, true);
-	if (r < 0)
+	if (r == -ENOENT)
 		r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte,
 						   data, iswrite, false);
 
--
1.8.1.4
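The effect of narrowing the retry condition can be shown with a small standalone sketch. The lookup table here is invented for illustration (one read-only page in the primary group, one page in the secondary group), and the function names and the use of -EPERM for the permission case are assumptions, not the kernel's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical lookup in one PTE group: 0 on a hit, -ENOENT when the
 * page is not there, -EPERM when it is there but may not be written.
 */
static int demo_lookup_pteg(unsigned long eaddr, bool iswrite, bool primary)
{
    if (primary && eaddr == 0x1000)
        return iswrite ? -EPERM : 0;    /* read-only page */
    if (!primary && eaddr == 0x2000)
        return 0;                       /* only in the secondary group */
    return -ENOENT;
}

static int demo_xlate(unsigned long eaddr, bool iswrite)
{
    int r = demo_lookup_pteg(eaddr, iswrite, true);

    /* Only a miss justifies the second lookup; keep any other error. */
    if (r == -ENOENT)
        r = demo_lookup_pteg(eaddr, iswrite, false);
    return r;
}
```

With the old `r < 0` test, a write to 0x1000 would fall through to the secondary lookup and come back as "not found", hiding the permission problem from the caller.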
[PULL 18/63] KVM: PPC: Book3S HV: Access guest VPA in BE
There are a few shared data structures between the host and the guest. Most of them get registered through the VPA interface. These data structures are defined to always be in big endian byte order, so let's make sure we always access them in big endian. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/book3s_hv.c | 22 +++--- arch/powerpc/kvm/book3s_hv_ras.c | 6 +++--- 2 files changed, 14 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 7db9df2..f1281c4 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -272,7 +272,7 @@ struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id) static void init_vpa(struct kvm_vcpu *vcpu, struct lppaca *vpa) { vpa-__old_status |= LPPACA_OLD_SHARED_PROC; - vpa-yield_count = 1; + vpa-yield_count = cpu_to_be32(1); } static int set_vpa(struct kvm_vcpu *vcpu, struct kvmppc_vpa *v, @@ -295,8 +295,8 @@ static int set_vpa(struct kvm_vcpu *vcpu, struct kvmppc_vpa *v, struct reg_vpa { u32 dummy; union { - u16 hword; - u32 word; + __be16 hword; + __be32 word; } length; }; @@ -335,9 +335,9 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu *vcpu, if (va == NULL) return H_PARAMETER; if (subfunc == H_VPA_REG_VPA) - len = ((struct reg_vpa *)va)-length.hword; + len = be16_to_cpu(((struct reg_vpa *)va)-length.hword); else - len = ((struct reg_vpa *)va)-length.word; + len = be32_to_cpu(((struct reg_vpa *)va)-length.word); kvmppc_unpin_guest_page(kvm, va, vpa, false); /* Check length */ @@ -542,18 +542,18 @@ static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu, return; memset(dt, 0, sizeof(struct dtl_entry)); dt-dispatch_reason = 7; - dt-processor_id = vc-pcpu + vcpu-arch.ptid; - dt-timebase = now + vc-tb_offset; - dt-enqueue_to_dispatch_time = stolen; - dt-srr0 = kvmppc_get_pc(vcpu); - dt-srr1 = vcpu-arch.shregs.msr; + dt-processor_id = cpu_to_be16(vc-pcpu + vcpu-arch.ptid); + dt-timebase = cpu_to_be64(now + vc-tb_offset); + 
dt-enqueue_to_dispatch_time = cpu_to_be32(stolen); + dt-srr0 = cpu_to_be64(kvmppc_get_pc(vcpu)); + dt-srr1 = cpu_to_be64(vcpu-arch.shregs.msr); ++dt; if (dt == vcpu-arch.dtl.pinned_end) dt = vcpu-arch.dtl.pinned_addr; vcpu-arch.dtl_ptr = dt; /* order writing *dt vs. writing vpa-dtl_idx */ smp_wmb(); - vpa-dtl_idx = ++vcpu-arch.dtl_index; + vpa-dtl_idx = cpu_to_be64(++vcpu-arch.dtl_index); vcpu-arch.dtl.dirty = true; } diff --git a/arch/powerpc/kvm/book3s_hv_ras.c b/arch/powerpc/kvm/book3s_hv_ras.c index 3a5c568..d562c8e 100644 --- a/arch/powerpc/kvm/book3s_hv_ras.c +++ b/arch/powerpc/kvm/book3s_hv_ras.c @@ -45,14 +45,14 @@ static void reload_slb(struct kvm_vcpu *vcpu) return; /* Sanity check */ - n = min_t(u32, slb-persistent, SLB_MIN_SIZE); + n = min_t(u32, be32_to_cpu(slb-persistent), SLB_MIN_SIZE); if ((void *) slb-save_area[n] vcpu-arch.slb_shadow.pinned_end) return; /* Load up the SLB from that */ for (i = 0; i n; ++i) { - unsigned long rb = slb-save_area[i].esid; - unsigned long rs = slb-save_area[i].vsid; + unsigned long rb = be64_to_cpu(slb-save_area[i].esid); + unsigned long rs = be64_to_cpu(slb-save_area[i].vsid); rb = (rb ~0xFFFul) | i; /* insert entry number */ asm volatile(slbmte %0,%1 : : r (rs), r (rb)); -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
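The conversions in this patch make the in-memory layout independent of host byte order. A portable sketch of the same idea, using byte-wise helpers in place of the kernel's cpu_to_be32() and be32_to_cpu(), is shown below; the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value most-significant byte first, on any host. */
static void demo_store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Load a 32-bit big-endian value into host byte order. */
static uint32_t demo_load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Storing a yield count of 1 this way always produces the bytes 00 00 00 01, which is what a guest reading the field as big endian expects whether the host is LE or BE.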
RE: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
-Original Message- From: Wood Scott-B07421 Sent: Friday, August 01, 2014 2:16 AM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Yoder Stuart- B08248 Subject: Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception On Thu, 2014-07-31 at 01:15 -0500, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Thursday, July 31, 2014 8:18 AM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Yoder Stuart- B08248 Subject: Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception On Wed, 2014-07-30 at 01:43 -0500, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Tuesday, July 29, 2014 3:58 AM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Yoder Stuart- B08248 Subject: Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception Userspace might be interested in the raw value, With the current design, If userspace is interested then it will not get the DBSR. Oh, because DBSR isn't currently implemented in sregs or one reg? That is one reason. Another is that if we give dbsr visibility to userspace then userspace have to clear dbsr in handling KVM_EXIT_DEBUG. Right -- since I didn't realize DBSR wasn't already exposed, I thought userspace already had this responsibility. It looked like it was removing dbsr visibility and the requirement for userspace to clear dbsr. I guess the old way was that the value in vcpu-arch.dbsr didn't matter until the next debug exception, when it would be overwritten by the new SPRN_DBSR? But that means old dbsr will be visibility to userspace, which is even bad than not visible, no? Also this can lead to old dbsr visible to guest once userspace releases debug resources, but this can be solved by clearing dbsr in kvm_arch_vcpu_ioctl_set_guest_debug() - if (!(dbg-control KVM_GUESTDBG_ENABLE)) { }. 
I wasn't suggesting that you keep it that way, just clarifying my understanding of the current code. + case SPRN_DBCR2: + /* +* If userspace is debugging guest then guest +* can not access debug registers. +*/ + if (vcpu-guest_debug) + break; + + debug_inst = true; + vcpu-arch.dbg_reg.dbcr2 = spr_val; + vcpu-arch.shadow_dbg_reg.dbcr2 = spr_val; break; In what circumstances can the architected and shadow registers differ? As of now they are same. But I think that if we want to implement other features like Freeze Timer (FT) then they can be different. I don't think we can possibly implement Freeze Timer. May be, but in my opinion we should keep this open. We're not talking about API here -- the implementation should be kept simple if there's no imminent need for shadow registers. I am not sure what we should in that case ? As we are currently emulating a subset of debug events (IAC, DAC, IC, BT and TIE --- DBCR0 emulation) then we should expose status of those events in guest dbsr and rest should be cleared ? I'm not saying they need to be exposed to the guest, but I don't see where you filter out bits like these. I am trying to get what all bits should be filtered out, all bits except IACx, DACx, IC, BT and TIE (same as event set filtering done when setting DBCR0) ? i.e IDE, UDE, MRR, IRPT, RET, CIRPT, CRET should be filtered out? Bits like IRPT and RET don't really matter, as you shouldn't see them happen. Likewise MRR if you're sure you've cleared it since boot. We can clear MRR bits when update vcpu-arch-dbsr with SPRM_DBSR in kvm debug handler But IDE could be set any time an asynchronous exception happens. I don't think you should filter it out, but instead make sure that it doesn't cause an exception to be delivered. So this means that in kvmpp_handle_debug() if DBSR_IDE is set then do not inject debug interrupt and on dbsr write emulation, deque the debug interrupt even if DBSR_IDE is set. 
	case SPRN_DBSR:
		vcpu->arch.dbsr &= ~spr_val;
		if (!(vcpu->arch.dbsr & ~DBSR_IDE))
			kvmppc_core_dequeue_debug(vcpu);
		break;

or

		vcpu->arch.dbsr &= ~(spr_val | DBSR_IDE);
		if (!vcpu->arch.dbsr)
			kvmppc_core_dequeue_debug(vcpu);
		break;

Thanks
-Bharat

-Scott
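The first of the two variants discussed above can be sketched as a standalone function: a guest write to DBSR clears the bits that are set in the written value, and the pending debug interrupt is dropped once nothing but IDE remains. The struct, the debug_pending flag standing in for kvmppc_core_dequeue_debug(), and the bit values are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; treat them as assumptions. */
#define DEMO_DBSR_IDE   0x80000000u
#define DEMO_DBSR_IAC1  0x00800000u
#define DEMO_DBSR_IAC2  0x00400000u

struct demo_vcpu {
    uint32_t dbsr;
    bool debug_pending;
};

/* Emulate a guest mtspr to DBSR (write-one-to-clear semantics). */
static void demo_write_dbsr(struct demo_vcpu *vcpu, uint32_t spr_val)
{
    vcpu->dbsr &= ~spr_val;

    /* IDE alone must not keep a debug interrupt queued. */
    if (!(vcpu->dbsr & ~DEMO_DBSR_IDE))
        vcpu->debug_pending = false;
}
```

The second variant differs only in that it also clears IDE on every write, so the stored value never carries IDE across a guest write.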
Re: [PATCH] arm64: KVM: export current vcpu-pause state via pseudo regs
Christoffer Dall writes: On Thu, Jul 31, 2014 at 05:45:28PM +0100, Peter Maydell wrote: On 31 July 2014 17:38, Christoffer Dall christoffer.d...@linaro.org wrote: If we are not complaining when setting the pause value to false if it was true before, then we probably also need to wake up the thread in case this is called from another thread, right? or perhaps we should just return an error if you're trying to un-pause a CPU through this interface, h. Wouldn't it be an error to mess with any register when the system is not in a quiescent state? I was assuming that the wake state is dealt with when the run loop finally restarts. The ABI doesn't really define it as an error (the ABI doesn't enforce anything right now) so the question is, does it ever make sense to clear the pause flag through this ioctl? If not, I think we should just err on the side of caution and specify in the docs that this is not supported and return an error. Consider the case where the reset state of the system is CPU 0 running, CPUs 1..N stopped, and we're doing an incoming migration to a state where all CPUs are running. In that case we'll be using this ioctl to clear the pause flag, right? (We'll also obviously need to set the PC and other register state correctly before resuming the guest.) Doh, you're right, I somehow had it in my mind that when you send the thread a signal, the pause flag would be cleared, but that goes against the whole idea of a CPU being turned off for KVM. But wouldn't we then have to also wake up the thread when clearing the pause flag? It feels strange that the ioctl can clear the pause flag, but keep the thread on a wake-queue, and then userspace has to send the thread a signal of some sort to wake it up? snip Isn't the vCPU off the wait-queue by definition if the ioctl exits and you go through the KVM_SET_ONE_REG stuff? 
Once you re-enter the KVM_RUN ioctl it sees the pause flag as cleared and falls straight through into kvm_guest_enter(); otherwise it will again wait on wait_event_interruptible(*wq, !vcpu->arch.pause).

--
Alex Bennée
Re: [PULL 16/63] PPC: Add asm helpers for BE 32bit load/store
On Fri, 2014-08-01 at 11:17 +0200, Alexander Graf wrote:
> From assembly code we might not only have to explicitly BE access 64bit values,
> but sometimes also 32bit ones. Add helpers that allow for easy use of lwzx/stwx
> in their respective byte-reverse or native form.
>
> Signed-off-by: Alexander Graf <ag...@suse.de>

Acked-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>

> ---
>  arch/powerpc/include/asm/asm-compat.h | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/asm-compat.h b/arch/powerpc/include/asm/asm-compat.h
> index 4b237aa..21be8ae 100644
> --- a/arch/powerpc/include/asm/asm-compat.h
> +++ b/arch/powerpc/include/asm/asm-compat.h
> @@ -34,10 +34,14 @@
>  #define PPC_MIN_STKFRM 112
>  
>  #ifdef __BIG_ENDIAN__
> +#define LWZX_BE stringify_in_c(lwzx)
>  #define LDX_BE stringify_in_c(ldx)
> +#define STWX_BE stringify_in_c(stwx)
>  #define STDX_BE stringify_in_c(stdx)
>  #else
> +#define LWZX_BE stringify_in_c(lwbrx)
>  #define LDX_BE stringify_in_c(ldbrx)
> +#define STWX_BE stringify_in_c(stwbrx)
>  #define STDX_BE stringify_in_c(stdbrx)
>  #endif
>  
[PATCH] arm64: KVM: fix 64bit CP15 VM access for 32bit guests
Commit f0a3eaff71b8 (ARM64: KVM: fix big endian issue in
access_vm_reg for 32bit guest) changed the way we handle CP15
VM accesses, so that all 64bit accesses are done via vcpu_sys_reg.

This looks like a good idea as it solves endianness issues in an
elegant way, except for one small detail: the register index
doesn't refer to the same array! We end up corrupting some random
data structure instead.

Fix this by reverting to the original code, except for the introduction
of a vcpu_cp15_64_high macro that deals with the endianness thing.

Tested on Juno with 32bit SMP guests.

Cc: Victor Kamensky <victor.kamen...@linaro.org>
Cc: Christoffer Dall <christoffer.d...@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
---
Christoffer, can you please have a look at this one and queue it if you
find it acceptable?

Thanks,

	M.

 arch/arm64/include/asm/kvm_host.h | 6 ++++--
 arch/arm64/kvm/sys_regs.c         | 7 +++++--
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 79812be..e10c45a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -149,9 +149,11 @@ struct kvm_vcpu_arch {
 #define vcpu_cp15(v,r) ((v)->arch.ctxt.copro[(r)])
 
 #ifdef CONFIG_CPU_BIG_ENDIAN
-#define vcpu_cp15_64_low(v,r) ((v)->arch.ctxt.copro[((r) + 1)])
+#define vcpu_cp15_64_high(v,r) vcpu_cp15((v),(r))
+#define vcpu_cp15_64_low(v,r) vcpu_cp15((v),(r) + 1)
 #else
-#define vcpu_cp15_64_low(v,r) ((v)->arch.ctxt.copro[((r) + 0)])
+#define vcpu_cp15_64_high(v,r) vcpu_cp15((v),(r) + 1)
+#define vcpu_cp15_64_low(v,r) vcpu_cp15((v),(r))
 #endif
 
 struct kvm_vm_stat {
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index a4fd526..5805e7c 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -135,10 +135,13 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 	BUG_ON(!p->is_write);
 
 	val = *vcpu_reg(vcpu, p->Rt);
-	if (!p->is_aarch32 || !p->is_32bit)
+	if (!p->is_aarch32) {
 		vcpu_sys_reg(vcpu, r->reg) = val;
-	else
+	} else {
+		if (!p->is_32bit)
+			vcpu_cp15_64_high(vcpu, r->reg) = val >> 32;
 		vcpu_cp15_64_low(vcpu, r->reg) = val & 0xffffffffUL;
+	}
 
 	return true;
 }
--
2.0.0
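The high/low slot selection in this patch can be tried outside the kernel. The sketch below assumes that a 64-bit CP15 register occupies two adjacent 32-bit slots that overlay one 64-bit value in host byte order; the demo_* names are hypothetical, and `__BYTE_ORDER__` is a GCC/Clang predefined macro standing in for CONFIG_CPU_BIG_ENDIAN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Pick the slot holding each half so that both slots overlay a u64. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define DEMO_HIGH(r) (r)
#define DEMO_LOW(r)  ((r) + 1)
#else
#define DEMO_HIGH(r) ((r) + 1)
#define DEMO_LOW(r)  (r)
#endif

/* A 32-bit access updates only the low half; a 64-bit one updates both. */
static void demo_write_cp15(uint32_t *copro, unsigned int r,
                            uint64_t val, bool is_32bit)
{
    if (!is_32bit)
        copro[DEMO_HIGH(r)] = (uint32_t)(val >> 32);
    copro[DEMO_LOW(r)] = (uint32_t)(val & 0xffffffffu);
}

/* Read both slots back as the single 64-bit value they overlay. */
static uint64_t demo_read64(const uint32_t *copro, unsigned int r)
{
    uint64_t v;

    memcpy(&v, &copro[r], sizeof(v));
    return v;
}
```

Writing 0x1122334455667788 as a 64-bit access and then 0xdeadbeef as a 32-bit access to the same register leaves 0x11223344deadbeef in the 64-bit view on either byte order.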
Re: [PATCH] arm64: KVM: fix 64bit CP15 VM access for 32bit guests
On Fri, Aug 01, 2014 at 12:00:36PM +0100, Marc Zyngier wrote:
> Commit f0a3eaff71b8 (ARM64: KVM: fix big endian issue in
> access_vm_reg for 32bit guest) changed the way we handle CP15
> VM accesses, so that all 64bit accesses are done via vcpu_sys_reg.
>
> This looks like a good idea as it solves endianness issues in an
> elegant way, except for one small detail: the register index
> doesn't refer to the same array! We end up corrupting some random
> data structure instead.

Ouch!

> Fix this by reverting to the original code, except for the introduction
> of a vcpu_cp15_64_high macro that deals with the endianness thing.
>
> Tested on Juno with 32bit SMP guests.
>
> Cc: Victor Kamensky <victor.kamen...@linaro.org>
> Cc: Christoffer Dall <christoffer.d...@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
> ---
> Christoffer, can you please have a look at this one and queue it if you
> find it acceptable?

Good catch, it looks good, I'll queue it on kvmarm/next right away.

Reviewed-by: Christoffer Dall <christoffer.d...@linaro.org>

-Christoffer
Re: [RFC PATCH 04/17] COLO info: use colo info to tell migration target colo is enabled
* Yang Hongyang (yan...@cn.fujitsu.com) wrote: migrate colo info to migration target to tell the target colo is enabled. If I understand this correctly this means that you send a 'colo info' device information for migrations that don't have COLO enabled; that's bad because it breaks migration unless the destination has it; I guess it's OK if you were to guard it with a thing so it didn't do it for old machine-types. You could use the QEMU_VM_COMMAND sections I've created for postcopy; ( http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg00889.html ) and add a QEMU_VM_CMD_COLO to indicate you want the destination to become an SVM, then check the capability near the start of migration and send the command. Or perhaps there's a way to add the colo-info device on the command line so it's not always there. Dave Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- Makefile.objs | 1 + include/migration/migration-colo.h | 3 ++ migration-colo-comm.c | 68 ++ vl.c | 4 +++ 4 files changed, 76 insertions(+) create mode 100644 migration-colo-comm.c diff --git a/Makefile.objs b/Makefile.objs index cab5824..1836a68 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -50,6 +50,7 @@ common-obj-$(CONFIG_POSIX) += os-posix.o common-obj-$(CONFIG_LINUX) += fsdev/ common-obj-y += migration.o migration-tcp.o +common-obj-y += migration-colo-comm.o common-obj-$(CONFIG_COLO) += migration-colo.o common-obj-y += vmstate.o common-obj-y += qemu-file.o diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h index 35b384c..e3735d8 100644 --- a/include/migration/migration-colo.h +++ b/include/migration/migration-colo.h @@ -12,6 +12,9 @@ #define QEMU_MIGRATION_COLO_H #include qemu-common.h +#include migration/migration.h + +void colo_info_mig_init(void); bool colo_supported(void); diff --git a/migration-colo-comm.c b/migration-colo-comm.c new file mode 100644 index 000..ccbc246 --- /dev/null +++ b/migration-colo-comm.c @@ -0,0 +1,68 @@ +/* + * COarse-grain 
LOck-stepping Virtual Machines for Non-stop Service (COLO) + * (a.k.a. Fault Tolerance or Continuous Replication) + * + * Copyright (C) 2014 FUJITSU LIMITED + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. + * + */ + +#include migration/migration-colo.h + +#define DEBUG_COLO + +#ifdef DEBUG_COLO +#define DPRINTF(fmt, ...) \ +do { fprintf(stdout, COLO: fmt, ## __VA_ARGS__); } while (0) +#else +#define DPRINTF(fmt, ...) \ +do { } while (0) +#endif + +static bool colo_requested; + +/* save */ + +static bool migrate_use_colo(void) +{ +MigrationState *s = migrate_get_current(); +return s-enabled_capabilities[MIGRATION_CAPABILITY_COLO]; +} + +static void colo_info_save(QEMUFile *f, void *opaque) +{ +qemu_put_byte(f, migrate_use_colo()); +} + +/* restore */ + +static int colo_info_load(QEMUFile *f, void *opaque, int version_id) +{ +int value = qemu_get_byte(f); + +if (value !colo_supported()) { +fprintf(stderr, COLO is not supported\n); +return -EINVAL; +} + +if (value !colo_requested) { +DPRINTF(COLO requested!\n); +} + +colo_requested = value; + +return 0; +} + +static SaveVMHandlers savevm_colo_info_handlers = { +.save_state = colo_info_save, +.load_state = colo_info_load, +}; + +void colo_info_mig_init(void) +{ +register_savevm_live(NULL, colo info, -1, 1, + savevm_colo_info_handlers, NULL); +} diff --git a/vl.c b/vl.c index fe451aa..1a282d8 100644 --- a/vl.c +++ b/vl.c @@ -89,6 +89,7 @@ int main(int argc, char **argv) #include sysemu/dma.h #include audio/audio.h #include migration/migration.h +#include migration/migration-colo.h #include sysemu/kvm.h #include qapi/qmp/qjson.h #include qemu/option.h @@ -4339,6 +4340,9 @@ int main(int argc, char **argv, char **envp) blk_mig_init(); ram_mig_init(); +if (colo_supported()) { +colo_info_mig_init(); +} /* open the virtual block devices */ if (snapshot) -- 1.9.1 -- Dr. 
David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [RFC PATCH 05/17] COLO save: integrate COLO checkpointed save into qemu migration
* Yang Hongyang (yan...@cn.fujitsu.com) wrote:
> Integrate COLO checkpointed save flow into qemu migration.
> Add a migrate state: MIG_STATE_COLO, enter this migrate state
> after the first live migration successfully finished.
> Create a colo thread to do the checkpointed save.

In postcopy I added a 'migration_already_active' function to merge all the
different places that check for ACTIVE/SETUP etc.
( http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg00850.html )

> +    /*TODO: COLO checkpointed save loop*/
> +
> +    if (s->state != MIG_STATE_ERROR) {
> +        migrate_set_state(s, MIG_STATE_COLO, MIG_STATE_COMPLETED);
> +    }

I thought migrate_set_state only changed the state if the old state matched
the 1st value - i.e. I think it'll only change to COMPLETED if the state is
COLO; so I don't think you need the if.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
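The behaviour Dave describes is that of a compare-and-swap: the state moves to the new value only if it still holds the expected old value. A minimal C11 sketch of that semantics follows. The enum, struct and function names are hypothetical, and whether QEMU's migrate_set_state is implemented exactly this way is an assumption here, in line with the "I thought" in the message above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum demo_state { DEMO_ACTIVE, DEMO_COLO, DEMO_ERROR, DEMO_COMPLETED };

struct demo_migration {
    _Atomic int state;
};

/*
 * Move to new_state only if the current state is still old_state.
 * Returns true when the transition happened.
 */
static bool demo_set_state(struct demo_migration *s,
                           int old_state, int new_state)
{
    return atomic_compare_exchange_strong(&s->state, &old_state, new_state);
}
```

Under this semantics a migration that has already dropped to the error state is left untouched by a COLO to COMPLETED transition, so a separate check for the error state before the call adds nothing.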
Re: [RFC PATCH 07/17] COLO buffer: implement colo buffer as well as QEMUFileOps based on it
* Yang Hongyang (yan...@cn.fujitsu.com) wrote:
> We need a buffer to store migration data.
>
> On the save side:
> all saved data is written into the colo buffer first, so that we know
> the total size of the migration data. This also separates the data
> transmission from the colo control data; we use colo control data over
> the socket fd to synchronize both sides' state.
>
> On the restore side:
> all migration data is read into the colo buffer first, then loaded
> from the buffer. If a network error happens during data transmission,
> the slave can still function because the migration data has not yet
> been loaded.

This is very similar to the QEMUSizedBuffer based QEMUFiles that Stefan Berger
wrote and that I use in both my postcopy and BER patchsets:

http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg00846.html

(and to the similar code from Isaku Yamahata).

I think we should be able to use a shared version even if we need some changes.

> Signed-off-by: Yang Hongyang <yan...@cn.fujitsu.com>
> ---
>  migration-colo.c | 112 +++
>  1 file changed, 112 insertions(+)
>
> diff --git a/migration-colo.c b/migration-colo.c
> index d566b9d..b90d9b6 100644
> --- a/migration-colo.c
> +++ b/migration-colo.c
> @@ -11,6 +11,7 @@
>  #include "qemu/main-loop.h"
>  #include "qemu/thread.h"
>  #include "block/coroutine.h"
> +#include "qemu/error-report.h"
>  #include "migration/migration-colo.h"
>  
>  static QEMUBH *colo_bh;
> @@ -20,14 +21,122 @@ bool colo_supported(void)
>      return true;
>  }
>  
> +/* colo buffer */
> +
> +#define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL)
> +#define COLO_BUFFER_MAX_SIZE (1000*1000*1000*10ULL)

Powers of 2 are nicer!

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [RFC PATCH 10/17] COLO ctl: introduce is_slave() and is_master()
* Yang Hongyang (yan...@cn.fujitsu.com) wrote: is_slave is to determine whether the QEMU instance is a slave (migration target) at runtime. is_master is to determine whether the QEMU instance is a master (migration starter) at runtime. These 2 APIs will be used later. Since the names are made global in patch 15, I think it's best to do it here, but also use a more specific name for them, like colo_is_master. Dave Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- migration-colo.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/migration-colo.c b/migration-colo.c index 802f8b0..2699e77 100644 --- a/migration-colo.c +++ b/migration-colo.c @@ -187,6 +187,12 @@ static const QEMUFileOps colo_read_ops = { /* save */ +static __attribute__((unused)) bool is_master(void) +{ +MigrationState *s = migrate_get_current(); +return (s->state == MIG_STATE_COLO); +} + static void *colo_thread(void *opaque) { MigrationState *s = opaque; @@ -275,6 +281,11 @@ void colo_init_checkpointer(MigrationState *s) static Coroutine *colo; +static __attribute__((unused)) bool is_slave(void) +{ +return colo != NULL; +} + /* * return: * 0: start a checkpoint -- 1.9.1 -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [RFC PATCH 11/17] COLO ctl: implement colo checkpoint protocol
* Yang Hongyang (yan...@cn.fujitsu.com) wrote: implement colo checkpoint protocol. Checkpoint synchronizing points:

                  Primary                 Secondary
  NEW             @
                                          Suspend
  SUSPENDED                               @
                  Suspend&Save state
  SEND            @
                  Send state              Receive state
  RECEIVED                                @
                  Flush network           Load state
  LOADED                                  @
                  Resume                  Resume

                  Start Comparing

NOTE: 1) '@' marks who sends the message. 2) Every sync-point is synchronized by the two sides with only one handshake (single direction) for low latency. If stricter synchronization is required, an opposite-direction sync-point should be added. 3) Since sync-points are single direction, the remote side may have gone forward a lot by the time this side receives the sync-point. Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- migration-colo.c | 268 +-- 1 file changed, 262 insertions(+), 6 deletions(-) diff --git a/migration-colo.c b/migration-colo.c index 2699e77..a708872 100644 --- a/migration-colo.c +++ b/migration-colo.c @@ -24,6 +24,41 @@ */ #define CHKPOINT_TIMER 1 +enum { +COLO_READY = 0x46, + +/* + * Checkpoint synchronzing points. + * + * Primary Secondary + * NEW @ + * Suspend + * SUSPENDED @ + * Suspend&Save state + * SEND @ + * Send state Receive state + * RECEIVED @ + * Flush network Load state + * LOADED @ + * Resume Resume + * + * Start Comparing + * NOTE: + * 1) '@' who sends the message + * 2) Every sync-point is synchronized by two sides with only + *one handshake(single direction) for low-latency. + *If more strict synchronization is required, a opposite direction + *sync-point should be added. + * 3) Since sync-points are single direction, the remote side may + *go forward a lot when this side just receives the sync-point.
+ */ +COLO_CHECKPOINT_NEW, +COLO_CHECKPOINT_SUSPENDED, +COLO_CHECKPOINT_SEND, +COLO_CHECKPOINT_RECEIVED, +COLO_CHECKPOINT_LOADED, +}; + static QEMUBH *colo_bh; bool colo_supported(void) @@ -185,30 +220,161 @@ static const QEMUFileOps colo_read_ops = { .close = colo_close, }; +/* colo checkpoint control helper */ +static bool is_master(void); +static bool is_slave(void); + +static void ctl_error_handler(void *opaque, int err) +{ +if (is_slave()) { +/* TODO: determine whether we need to failover */ +/* FIXME: we will not failover currently, just kill slave */ +error_report("error: colo transmission failed!\n"); +exit(1); +} else if (is_master()) { +/* Master still alive, do not failover */ +error_report("error: colo transmission failed!\n"); +return; +} else { +error_report("COLO: Unexpected error happend!\n"); +exit(EXIT_FAILURE); +} +} + +static int colo_ctl_put(QEMUFile *f, uint64_t request) +{ +int ret = 0; + +qemu_put_be64(f, request); +qemu_fflush(f); + +ret = qemu_file_get_error(f); +if (ret < 0) { +ctl_error_handler(f, ret); +return 1; +} + +return ret; +} + +static int colo_ctl_get_value(QEMUFile *f, uint64_t *value) +{ +int ret = 0; +uint64_t temp; + +temp = qemu_get_be64(f); + +ret = qemu_file_get_error(f); +if (ret < 0) { +ctl_error_handler(f, ret); +return 1; +} + +*value = temp; +return 0; +} + +static int colo_ctl_get(QEMUFile *f, uint64_t require) +{ +int ret; +uint64_t value; + +ret = colo_ctl_get_value(f, &value); +if (ret) { +return ret; +} + +if (value != require) { +error_report("unexpected state received!\n"); I find it useful to print the expected/received state to be able to figure out what went wrong.
+exit(1); +} + +return ret; +} + /* save */ -static __attribute__((unused)) bool is_master(void) +static bool is_master(void) { MigrationState *s = migrate_get_current(); return (s->state == MIG_STATE_COLO); } +static int do_colo_transaction(MigrationState *s, QEMUFile *control, + QEMUFile *trans) +{ +int ret; + +ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW); +
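The review comment about printing the expected and received state could look roughly like the following. The enum values mirror the patch (COLO_READY = 0x46, then consecutive); the helpers colo_ctl_name and colo_ctl_check are hypothetical names introduced for this sketch, and the message is written into a buffer rather than passed to error_report so the example stays standalone.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Values mirror the enum in the patch: COLO_READY = 0x46, then consecutive. */
enum {
    COLO_READY = 0x46,
    COLO_CHECKPOINT_NEW,
    COLO_CHECKPOINT_SUSPENDED,
    COLO_CHECKPOINT_SEND,
    COLO_CHECKPOINT_RECEIVED,
    COLO_CHECKPOINT_LOADED,
};

/* Hypothetical helper: map a control value to a printable name. */
static const char *colo_ctl_name(uint64_t v)
{
    switch (v) {
    case COLO_READY:                return "READY";
    case COLO_CHECKPOINT_NEW:       return "NEW";
    case COLO_CHECKPOINT_SUSPENDED: return "SUSPENDED";
    case COLO_CHECKPOINT_SEND:      return "SEND";
    case COLO_CHECKPOINT_RECEIVED:  return "RECEIVED";
    case COLO_CHECKPOINT_LOADED:    return "LOADED";
    default:                        return "UNKNOWN";
    }
}

/*
 * Hypothetical helper: returns 0 when the received value matches; otherwise
 * formats a diagnostic naming both values into msg and returns -1.
 */
static int colo_ctl_check(uint64_t expected, uint64_t received,
                          char *msg, size_t len)
{
    assert(msg != NULL && len > 0);
    if (expected == received) {
        return 0;
    }
    snprintf(msg, len,
             "unexpected state: expected %s (0x%" PRIx64 "), "
             "received %s (0x%" PRIx64 ")",
             colo_ctl_name(expected), expected,
             colo_ctl_name(received), received);
    return -1;
}
```

Including the raw value next to the name helps when the peer sends something outside the enum, for example a vmstate size that was read as a command.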
Re: [RFC PATCH 13/17] COLO ctl: implement colo save
* Yang Hongyang (yan...@cn.fujitsu.com) wrote: implement colo save My postcopy 'QEMU_VM_CMD_PACKAGED' does something similar to parts of this with the QEMUSizedBuffer, we might be able to share some more: https://lists.nongnu.org/archive/html/qemu-devel/2014-07/msg00886.html +/* we send the total size of the vmstate first */ +ret = colo_ctl_put(s->file, colo_buffer.used); +if (ret) { +goto out; +} + +qemu_put_buffer_async(s->file, colo_buffer.data, colo_buffer.used); +ret = qemu_file_get_error(s->file); +if (ret < 0) { +goto out; +} +qemu_fflush(s->file); Is there a reason to use _async here? I thought the only gain is if you were going to do other writes in the shadow of the async, with the fflush immediately after I'm not sure it helps. Dave ret = colo_ctl_get(control, COLO_CHECKPOINT_RECEIVED); if (ret) { goto out; } -/* TODO: Flush network etc. */ +/* Flush network etc. */ +colo_compare_flush(); ret = colo_ctl_get(control, COLO_CHECKPOINT_LOADED); if (ret) { goto out; } -/* TODO: resume master */ +colo_compare_resume(); +ret = 0; out: +/* resume master */ +qemu_mutex_lock_iothread(); +vm_start(); +qemu_mutex_unlock_iothread(); + return ret; } -- 1.9.1 -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [RFC PATCH 16/17] COLO ram cache: implement colo ram cache on slaver
* Yang Hongyang (yan...@cn.fujitsu.com) wrote: The ram cache was initially the same as PVM's memory. At checkpoint, we cache the dirty memory of PVM into ram cache (so that ram cache is always the same as PVM's memory at every checkpoint), flush cached memory to SVM after we received all PVM dirty memory (only needed to flush memory that was both dirty on PVM and SVM since last checkpoint). (Typo: 'r' on the end of the title) I think I understand the need for the cache, to be able to restore pages that the SVM has modified that the PVM hadn't; however, if I understand the change here (to host_from_stream_offset), the SVM will load the snapshot into the ram_cache rather than directly into host memory - why is this necessary? If the SVM's CPU is stopped at this point, couldn't it load snapshot pages directly into host memory, clearing pages in the SVM's bitmap, so that the only pages that then get copied in flush_cache are the pages that the SVM modified but the PVM *didn't* include in the snapshot? I can see that you would need to do it the way you've done it if the snapshot-load could fail (at the same time the PVM failed) and thus the old SVM state would be the surviving state, but how could it fail at this point given the whole stream is in the colo-buffer? +static void ram_flush_cache(void); static int ram_load(QEMUFile *f, void *opaque, int version_id) { ram_addr_t addr; int flags, ret = 0; static uint64_t seq_iter; +bool need_flush = false; Probably better as 'ram_cache_needs_flush' Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
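The alternative Dave asks about can be sketched with a toy model: snapshot pages go straight into host memory and clear the SVM dirty bit, so the flush only reverts pages the SVM touched that the snapshot did not cover. Everything here (struct svm_ram, the tiny page size, the helper names) is hypothetical, and keeping the cache updated on load is an assumption of this sketch, not something the patch or the review states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4   /* tiny pages keep the example readable */
#define DEMO_NPAGES    4

struct svm_ram {
    unsigned char host[DEMO_NPAGES][DEMO_PAGE_SIZE];  /* SVM's live memory */
    unsigned char cache[DEMO_NPAGES][DEMO_PAGE_SIZE]; /* PVM memory at last checkpoint */
    bool svm_dirty[DEMO_NPAGES];                      /* written by SVM since then */
};

/* Load one snapshot page directly into host memory (and keep the cache current). */
static void load_snapshot_page(struct svm_ram *r, size_t page,
                               const unsigned char *data)
{
    assert(page < DEMO_NPAGES);
    memcpy(r->host[page], data, DEMO_PAGE_SIZE);
    memcpy(r->cache[page], data, DEMO_PAGE_SIZE);
    r->svm_dirty[page] = false;   /* already overwritten with PVM content */
}

/*
 * Revert the pages the SVM dirtied that the snapshot did not include.
 * Returns the number of pages copied from the cache.
 */
static size_t flush_cache(struct svm_ram *r)
{
    size_t copied = 0;

    for (size_t i = 0; i < DEMO_NPAGES; i++) {
        if (r->svm_dirty[i]) {
            memcpy(r->host[i], r->cache[i], DEMO_PAGE_SIZE);
            r->svm_dirty[i] = false;
            copied++;
        }
    }
    return copied;
}
```

The trade-off Dave raises still applies: if loading could fail part-way, host memory would already be partially overwritten, which is the case the cache-first approach in the patch protects against.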
Re: [RFC PATCH 15/17] COLO save: reuse migration bitmap under colo checkpoint
* Yang Hongyang (yan...@cn.fujitsu.com) wrote: reuse migration bitmap under colo checkpoint, only send dirty pages per-checkpoint. Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- arch_init.c| 20 +++- include/migration/migration-colo.h | 2 ++ migration-colo.c | 6 ++ stubs/migration-colo.c | 10 ++ 4 files changed, 33 insertions(+), 5 deletions(-) diff --git a/arch_init.c b/arch_init.c index 8ddaf35..c84e6c8 100644 --- a/arch_init.c +++ b/arch_init.c @@ -52,6 +52,7 @@ #include exec/ram_addr.h #include hw/acpi/acpi.h #include qemu/host-utils.h +#include migration/migration-colo.h #ifdef DEBUG_ARCH_INIT #define DPRINTF(fmt, ...) \ @@ -769,6 +770,15 @@ static int ram_save_setup(QEMUFile *f, void *opaque) RAMBlock *block; int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */ +/* + * migration has already setup the bitmap, reuse it. + */ +if (is_master()) { +qemu_mutex_lock_ramlist(); +reset_ram_globals(); +goto out_setup; +} + mig_throttle_on = false; dirty_rate_high_cnt = 0; bitmap_sync_count = 0; @@ -828,6 +838,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) migration_bitmap_sync(); qemu_mutex_unlock_iothread(); +out_setup: qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE); QTAILQ_FOREACH(block, ram_list.blocks, next) { Is it necessary to send the block list for each of your snapshots? Dave @@ -937,7 +948,14 @@ static int ram_save_complete(QEMUFile *f, void *opaque) } ram_control_after_iterate(f, RAM_CONTROL_FINISH); -migration_end(); + +/* + * Since we need to reuse dirty bitmap in colo, + * don't cleanup the bitmap. 
+ */ +if (!migrate_use_colo() || migration_has_failed(migrate_get_current())) { +migration_end(); +} qemu_mutex_unlock_ramlist(); qemu_put_be64(f, RAM_SAVE_FLAG_EOS); diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h index 861fa27..c286a60 100644 --- a/include/migration/migration-colo.h +++ b/include/migration/migration-colo.h @@ -21,10 +21,12 @@ bool colo_supported(void); /* save */ bool migrate_use_colo(void); void colo_init_checkpointer(MigrationState *s); +bool is_master(void); /* restore */ bool restore_use_colo(void); void restore_exit_colo(void); +bool is_slave(void); void colo_process_incoming_checkpoints(QEMUFile *f); diff --git a/migration-colo.c b/migration-colo.c index 8596845..13a6a57 100644 --- a/migration-colo.c +++ b/migration-colo.c @@ -222,8 +222,6 @@ static const QEMUFileOps colo_read_ops = { }; /* colo checkpoint control helper */ -static bool is_master(void); -static bool is_slave(void); static void ctl_error_handler(void *opaque, int err) { @@ -295,7 +293,7 @@ static int colo_ctl_get(QEMUFile *f, uint64_t require) /* save */ -static bool is_master(void) +bool is_master(void) { MigrationState *s = migrate_get_current(); return (s-state == MIG_STATE_COLO); @@ -499,7 +497,7 @@ void colo_init_checkpointer(MigrationState *s) static Coroutine *colo; -static bool is_slave(void) +bool is_slave(void) { return colo != NULL; } diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c index 55f0d37..ef65be6 100644 --- a/stubs/migration-colo.c +++ b/stubs/migration-colo.c @@ -22,3 +22,13 @@ void colo_init_checkpointer(MigrationState *s) void colo_process_incoming_checkpoints(QEMUFile *f) { } + +bool is_master(void) +{ +return false; +} + +bool is_slave(void) +{ +return false; +} -- 1.9.1 -- Dr. 
David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: Integrity in untrusted environments
Paolo Bonzini pbonzini at redhat.com writes Hello, I am exploring ideas to implement a service inside a virtual machine on untrusted hypervisors under current cloud infrastructures. Particularly, I am interested how one can verify the integrity of the service in an environment where hypervisor is not trusted. This is my setup. 1. I have two virtual machines. (Normal client VM's). 2. VM-A is executing a service and VM-B wants to verify its integrity. 3. Both are executing on untrusted hypervisor. Though, Intel SGX will solve this, by using the concept of enclaves, its not publicly available yet. One could also use SMM to verify the integrity. But since this is time based approach, one could easily exploit between the time window. I was drilling down this idea, We know Write xor Execute Memory Protection Scheme. Using this idea,If we could lock down the VM-A memory pages where the service is running and also corresponding page-table entries, then have a handler code that temporarily unlocks them for legitimate updates, then one could verify the integrity of the service running. You can make a malicious hypervisor that makes all executable pages also writable, but hides the fact to the running process. But really, if you control the hypervisor you can just write to guest memory as you wish. SMM will be emulated by the hypervisor. If the hypervisor is untrusted, you cannot solve _everything_. For the third time, what attacks are you trying to protect from? Paolo Thanks Paolo, I was considering all critical attacks possible that a client virtual machine could have under the untrusted hypervisor scenarios. For example,Memory based,Hypervisor based and few major side channel attacks. I am ignoring the network based attacks for the time being. And one more question to your reply. I did'nt understand as to what you were trying to describe here You can make a malicious hypervisor that makes all executable pages also writable, but hides the fact to the running process. 
But really, if you control the hypervisor you can just write to guest memory as you wish. This is my understanding; correct me if I am wrong here. If we lock down the code pages of the genuine hypervisor as I discussed before, isn't that sufficient? Essentially, the hypervisor is the one that handles the traps from the virtual machines for execution. So even if the hypervisor wishes to write to the client virtual machine, the attempt will be caught, since the memory pages of the hypervisor are locked down and essentially non-bypassable.
Re: [RFC PATCH 00/17] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
* Yang Hongyang (yan...@cn.fujitsu.com) wrote: Virtual machine (VM) replication is a well known technique for providing application-agnostic software-implemented hardware fault tolerance non-stop service. COLO is a high availability solution. Both primary VM (PVM) and secondary VM (SVM) run in parallel. They receive the same request from client, and generate response in parallel too. If the response packets from PVM and SVM are identical, they are released immediately. Otherwise, a VM checkpoint (on demand) is conducted. The idea is presented in Xen summit 2012, and 2013, and academia paper in SOCC 2013. It's also presented in KVM forum 2013: http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf Please refer to above document for detailed information. Please also refer to previous posted RFC proposal: http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html Hi Yang, Thanks for this set of patches (and I've replied to many individually). The patchset is also hosted on github: https://github.com/macrosheep/qemu/tree/colo_v0.1 This patchset is RFC, implements the frame of colo, without failover and nic/disk replication. But it is ready for demo the COLO idea above QEMU-Kvm. Steps using this patchset to get an overview of COLO: 1. configure the source with --enable-colo option 2. compile 3. just like QEMU's normal migration, run 2 QEMU VM: - Primary VM - Secondary VM with -incoming tcp:[IP]:[PORT] option 4. on Primary VM's QEMU monitor, run following command: migrate_set_capability colo on migrate tcp:[IP]:[PORT] 5. done you will see two runing VMs, whenever you make changes to PVM, SVM will be synced to PVM's state. TODO list: 1. failover 2. nic replication 3. 
disk replication[COLO Disk manager] I wonder if there are any parts that can be borrowed from other code to get it going; I notice that the reverse execution patchset has a network packet record/replay mode: https://lists.gnu.org/archive/html/qemu-devel/2014-07/msg00157.html What was used for the nic comparison in the 2013 kvm forum paper? Dave Any comments/feedbacks are warmly welcomed. Thanks, Yang Yang Hongyang (17): configure: add CONFIG_COLO to switch COLO support COLO: introduce an api colo_supported() to indicate COLO support COLO migration: add a migration capability 'colo' COLO info: use colo info to tell migration target colo is enabled COLO save: integrate COLO checkpointed save into qemu migration COLO restore: integrate COLO checkpointed restore into qemu restore COLO buffer: implement colo buffer as well as QEMUFileOps based on it COLO: disable qdev hotplug COLO ctl: implement API's that communicate with colo agent COLO ctl: introduce is_slave() and is_master() COLO ctl: implement colo checkpoint protocol COLO ctl: add a RunState RUN_STATE_COLO COLO ctl: implement colo save COLO ctl: implement colo restore COLO save: reuse migration bitmap under colo checkpoint COLO ram cache: implement colo ram cache on slaver HACK: trigger checkpoint every 500ms Makefile.objs | 2 + arch_init.c| 174 +- configure | 14 + include/exec/cpu-all.h | 1 + include/migration/migration-colo.h | 36 +++ include/migration/migration.h | 13 + include/qapi/qmp/qerror.h | 3 + migration-colo-comm.c | 78 + migration-colo.c | 643 + migration.c| 45 ++- qapi-schema.json | 9 +- stubs/Makefile.objs| 1 + stubs/migration-colo.c | 34 ++ vl.c | 12 + 14 files changed, 1044 insertions(+), 21 deletions(-) create mode 100644 include/migration/migration-colo.h create mode 100644 migration-colo-comm.c create mode 100644 migration-colo.c create mode 100644 stubs/migration-colo.c -- 1.9.1 -- Dr. 
David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Fwd: Question and Performance of Intel's APIC-v on Xeon E5 v2
Hi folks, I recently got an Intel Xeon E5-2609 v2 machine with APIC-v support. I ran some performance tests under Linux kernel 3.11 and have some doubts about the new APIC-v feature. I would appreciate any comments; please correct me if I'm wrong.

My understanding of APIC-v is that it mainly consists of:
1) Virtual interrupt delivery (the same as posted interrupt), which saves KVM from injecting vAPIC interrupts manually. In other words, it posts an interrupt to the guest without sending an IPI, which would cause an external interrupt exit.
2) EOI virtualization, so the guest acknowledging the interrupt incurs no EOI exit. (However, sometimes it exits.)
3) Virtualized APIC registers, so reads/writes won't trap into the hypervisor. However, some APIC writes still trigger a VM exit, but it becomes trap-like instead of fault-like. (I don't know which APIC writes cause an exit and which do not.)

=== Experiment A Result ===
1. virtio network with vhost, iperf TCP experiments, enable/disable APIC-v

[With APIC-v]
Total EXIT rate: 4351.1 exits per second
-- VM EXIT Breakdown --
reason                exit/sec    Avg(us)
IO_INSTRUCTION            1428  81.471931
EXCEPTION_NMI               69   7.906276
EXTERNAL_INTERRUPT        1866   7.317781
MSR_WRITE                  970   1.504932

[Without APIC-v]
Total EXIT rate: 83510.1 exits per second
-- VM EXIT Breakdown --
reason                exit/sec    Avg(us)
IO_INSTRUCTION           18428  81.471931
EXTERNAL_INTERRUPT       31166   7.317781
MSR_WRITE                30970   1.504932

The VM exit rate drops from 83k/sec to 4.3k/sec because:
- the 31166 EXTERNAL_INTERRUPT exits mainly come from vhost sending IPIs, which APIC-v's posted interrupt avoids.
- the 30970 MSR_WRITE exits come from EOI, which APIC-v's EOI virtualization avoids.
- however, APIC-v still has 1866 EXTERNAL_INTERRUPT and 970 MSR_WRITE exits; I found these are due to the timer. I confirmed this with the next experiment.

=== Experiment B Result ===
I ran cyclictest in the VM and measured the VM exit behavior. cyclictest is configured to generate 1k timer events per second. With or without APIC-v, I got similar results, as below:

Total number of EXITs: 156919, rate: 5225.18 exits per second
-- VM EXIT Breakdown --
reason                exit/sec    Avg(us)
IO_INSTRUCTION              18  47.412613
EXTERNAL_INTERRUPT        3085   3.022567
MSR_WRITE                 2095   1.330987

I found that APIC-v does not improve timer interrupt delivery; it seems that posted interrupt does not work for the LAPIC timer. If so, why? So the 3085 EXTERNAL_INTERRUPT exits are due to timer expiration interrupts, and the 2095 MSR_WRITE exits are due to EOI and programming the TMICT (part of the APIC's registers). Does this contradict APIC-v's claim that APIC writes are direct, without a VM exit?

Thank you, and any comments are welcome. Regards, William Tu
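To put the Experiment A totals in proportion, the relative drop in exit rate can be computed directly from the two reported figures (83510.1 and 4351.1 exits per second), which works out to roughly a 94.8% reduction. The function name below is a hypothetical helper for this arithmetic only.

```c
#include <assert.h>

/*
 * Percentage by which the exit rate dropped, given the rate before and
 * after enabling a feature (both in exits per second).
 */
static double exit_reduction_percent(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

Called as exit_reduction_percent(83510.1, 4351.1), this gives a value just under 95, consistent with the "83k/sec to 4.3k/sec" summary in the message.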
kvm-unit-tests failures
Hi, We are planning on running kvm-unit-tests as part of our test suite, but I've noticed that many tests fail (even when running the latest kvm tip). After searching I found many BZ entries that seem to point at this master bug for tracking these issues: https://bugzilla.redhat.com/show_bug.cgi?id=1079979 However, this bug is private and cannot be viewed by the public. I'd like to know how to help report issues that we observe with testing in order to help fix these tests, or understand any progress being made to fix them already. Is there a public bug that everybody can view to track these issues? Should I be reporting new bugs with failures in the unit tests? Where is the appropriate place to file bugs against kvm-unit-tests and discuss issues? Thanks, --chris j arges
[RFC][PATCH] kvm: x86: fix stale mmio cache bug
The following events can lead to an incorrect KVM_EXIT_MMIO bubbling up to userspace: (1) Guest accesses gpa X without a memory slot. The gfn is cached in struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets the SPTE write-execute-noread so that future accesses cause EPT_MISCONFIGs. (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION covering the page just accessed. (3) Guest attempts to read or write to gpa X again. On Intel, this generates an EPT_MISCONFIG. The memory slot generation number that was incremented in (2) would normally take care of this but we fast path mmio faults through quickly_check_mmio_pf(), which only checks the per-vcpu mmio cache. Since we hit the cache, KVM passes a KVM_EXIT_MMIO up to userspace. This patch fixes the issue by clearing the mmio cache in the KVM_MR_CREATE code path. - introduce KVM_REQ_CLEAR_MMIO_CACHE for clearing all vcpu mmio caches. - extend vcpu_clear_mmio_info to clear mmio_gfn in addition to mmio_gva, since both can be used to fast path mmio faults. - issue KVM_REQ_CLEAR_MMIO_CACHE during memslot creation to flush the mmio cache. - in mmu_sync_roots, unconditionally clear the mmio cache since even direct_map (e.g. tdp) hosts use it. 
Signed-off-by: David Matlack dmatl...@google.com --- arch/x86/kvm/mmu.c | 3 ++- arch/x86/kvm/x86.c | 5 + arch/x86/kvm/x86.h | 8 +--- include/linux/kvm_host.h | 2 ++ virt/kvm/kvm_main.c | 10 +- 5 files changed, 19 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 9314678..8d50b84 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -3157,13 +3157,14 @@ static void mmu_sync_roots(struct kvm_vcpu *vcpu) int i; struct kvm_mmu_page *sp; + vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY); + if (vcpu->arch.mmu.direct_map) return; if (!VALID_PAGE(vcpu->arch.mmu.root_hpa)) return; - vcpu_clear_mmio_info(vcpu, ~0ul); kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC); if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) { hpa_t root = vcpu->arch.mmu.root_hpa; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ef432f8..05b5629 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6001,6 +6001,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) kvm_deliver_pmi(vcpu); if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu)) vcpu_scan_ioapic(vcpu); + + if (kvm_check_request(KVM_REQ_CLEAR_MMIO_CACHE, vcpu)) + vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY); } if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) { @@ -7281,6 +7284,8 @@ void kvm_arch_memslots_updated(struct kvm *kvm) * mmio generation may have reached its maximum value. */ kvm_mmu_invalidate_mmio_sptes(kvm); + + kvm_make_all_vcpus_request(kvm, KVM_REQ_CLEAR_MMIO_CACHE); } int kvm_arch_prepare_memory_region(struct kvm *kvm, diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 8c97bac..41ef197 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -81,15 +81,17 @@ static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu, } /* - * Clear the mmio cache info for the given gva, - * specially, if gva is ~0ul, we clear all mmio cache info. + * Clear the mmio cache info for the given gva. If gva is MMIO_GVA_ANY, + * unconditionally clear the mmio cache.
 */ +#define MMIO_GVA_ANY (~0ul) static inline void vcpu_clear_mmio_info(struct kvm_vcpu *vcpu, gva_t gva) { - if (gva != (~0ul) && vcpu->arch.mmio_gva != (gva & PAGE_MASK)) + if (gva != MMIO_GVA_ANY && vcpu->arch.mmio_gva != (gva & PAGE_MASK)) return; vcpu->arch.mmio_gva = 0; + vcpu->arch.mmio_gfn = 0; } static inline bool vcpu_match_mmio_gva(struct kvm_vcpu *vcpu, unsigned long gva) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index ec4e3bd..e4edaff 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -136,6 +136,7 @@ static inline bool is_error_page(struct page *page) #define KVM_REQ_GLOBAL_CLOCK_UPDATE 22 #define KVM_REQ_ENABLE_IBS 23 #define KVM_REQ_DISABLE_IBS 24 +#define KVM_REQ_CLEAR_MMIO_CACHE 25 #define KVM_USERSPACE_IRQ_SOURCE_ID 0 #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID 1 @@ -591,6 +592,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu); void kvm_load_guest_fpu(struct kvm_vcpu *vcpu); void kvm_put_guest_fpu(struct kvm_vcpu *vcpu); +bool kvm_make_all_vcpus_request(struct kvm *kvm, unsigned int req); void kvm_flush_remote_tlbs(struct kvm *kvm); void kvm_reload_remote_mmus(struct kvm *kvm); void kvm_make_mclock_inprogress_request(struct kvm *kvm); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 4b6c01b..d09527a 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -152,7
Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
On Fri, 2014-08-01 at 04:34 -0500, Bhushan Bharat-R65777 wrote: on dbsr write emulation, dequeue the debug interrupt even if DBSR_IDE is set. case SPRN_DBSR: vcpu->arch.dbsr &= ~spr_val; if (!(vcpu->arch.dbsr & ~DBSR_IDE)) kvmppc_core_dequeue_debug(vcpu); break; or vcpu->arch.dbsr &= ~(spr_val | DBSR_IDE); if (!vcpu->arch.dbsr) kvmppc_core_dequeue_debug(vcpu); break; The first option. I see no reason to have KVM forcibly clear DBSR[IDE]. -Scott
Re: [RFC][PATCH] kvm: x86: fix stale mmio cache bug
On Aug 2, 2014, at 7:54 AM, David Matlack dmatl...@google.com wrote: The following events can lead to an incorrect KVM_EXIT_MMIO bubbling up to userspace: (1) Guest accesses gpa X without a memory slot. The gfn is cached in struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets the SPTE write-execute-noread so that future accesses cause EPT_MISCONFIGs. (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION covering the page just accessed. (3) Guest attempts to read or write to gpa X again. On Intel, this generates an EPT_MISCONFIG. The memory slot generation number that was incremented in (2) would normally take care of this but we fast path mmio faults through quickly_check_mmio_pf(), which only checks the per-vcpu mmio cache. Since we hit the cache, KVM passes a KVM_EXIT_MMIO up to userspace. Good catch, thank you, David! This patch fixes the issue by clearing the mmio cache in the KVM_MR_CREATE code path. - introduce KVM_REQ_CLEAR_MMIO_CACHE for clearing all vcpu mmio caches. - extend vcpu_clear_mmio_info to clear mmio_gfn in addition to mmio_gva, since both can be used to fast path mmio faults. - issue KVM_REQ_CLEAR_MMIO_CACHE during memslot creation to flush the mmio cache. - in mmu_sync_roots, unconditionally clear the mmio cache since even direct_map (e.g. tdp) hosts use it. I would prefer to also cache the spte’s generation number and then check it in quickly_check_mmio_pf().
[PULL 04/63] KVM: PPC: Book3s PR: Disable AIL mode with OPAL
When we're using PR KVM we must not allow the CPU to take interrupts in virtual mode, as the SLB does not contain host kernel mappings when running inside the guest context.

To make sure we get good performance for non-KVM tasks but still properly functioning PR KVM, let's just disable AIL whenever a vcpu is scheduled in.

This is fundamentally different from how we deal with AIL on pSeries type machines where we disable AIL for the whole machine as soon as a single KVM VM is up.

The reason for that is easy - on pSeries we do not have control over per-cpu configuration of AIL. We also don't want to mess with CPU hotplug races and AIL configuration, so setting it per CPU is easier and more flexible.

This patch fixes running PR KVM on POWER8 bare metal for me.

Signed-off-by: Alexander Graf ag...@suse.de
Acked-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_pr.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 3da412e..8ea7da4 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -71,6 +71,12 @@ static void kvmppc_core_vcpu_load_pr(struct kvm_vcpu *vcpu, int cpu)
 	svcpu->in_use = 0;
 	svcpu_put(svcpu);
 #endif
+
+	/* Disable AIL if supported */
+	if (cpu_has_feature(CPU_FTR_HVMODE) &&
+	    cpu_has_feature(CPU_FTR_ARCH_207S))
+		mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~LPCR_AIL);
+
 	vcpu->cpu = smp_processor_id();
 #ifdef CONFIG_PPC_BOOK3S_32
 	current->thread.kvm_shadow_vcpu = vcpu->arch.shadow_vcpu;
@@ -91,6 +97,12 @@ static void kvmppc_core_vcpu_put_pr(struct kvm_vcpu *vcpu)
 
 	kvmppc_giveup_ext(vcpu, MSR_FP | MSR_VEC | MSR_VSX);
 	kvmppc_giveup_fac(vcpu, FSCR_TAR_LG);
+
+	/* Enable AIL if supported */
+	if (cpu_has_feature(CPU_FTR_HVMODE) &&
+	    cpu_has_feature(CPU_FTR_ARCH_207S))
+		mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_AIL_3);
+
 	vcpu->cpu = -1;
 }
 
--
1.8.1.4
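The per-CPU toggle described above can be modelled in plain C. This is only a sketch, not the kernel code: fake_lpcr and ail_supported are hypothetical stand-ins for the real LPCR register (accessed with mfspr()/mtspr()) and for the two cpu_has_feature() checks, and the numeric values of the two constants are assumptions modelled on the kernel's LPCR_AIL definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, modelled on the kernel's LPCR_AIL definitions. */
#define LPCR_AIL   0x01800000ULL /* mask covering the AIL field */
#define LPCR_AIL_3 0x01800000ULL /* AIL mode 3: interrupts taken in virtual mode */

/* Hypothetical stand-in for one CPU's LPCR register. */
static uint64_t fake_lpcr;

/* Hypothetical stand-in for the HVMODE and ARCH_207S feature checks. */
static int ail_supported = 1;

/* vcpu scheduled in: clear the AIL field, leave every other bit untouched. */
static void sketch_vcpu_load(void)
{
	if (ail_supported)
		fake_lpcr &= ~LPCR_AIL;
}

/* vcpu scheduled out: turn AIL back on so host tasks keep the fast path. */
static void sketch_vcpu_put(void)
{
	if (ail_supported)
		fake_lpcr |= LPCR_AIL_3;
}

/* Usage: one load/put cycle on a register that has an unrelated bit set. */
static void sketch_ail_demo(void)
{
	fake_lpcr = LPCR_AIL_3 | 0x10ULL;
	sketch_vcpu_load();
	assert((fake_lpcr & LPCR_AIL) == 0);	/* AIL off while the guest runs */
	assert((fake_lpcr & 0x10ULL) != 0);	/* unrelated bit preserved */
	sketch_vcpu_put();
	assert((fake_lpcr & LPCR_AIL) == LPCR_AIL_3);
}
```

Because the read-modify-write only touches the AIL field, the toggle is safe to repeat on every schedule in and out without tracking previous state.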
[PULL 19/63] KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE
Some data structures are always stored in big endian. Among those are the LPPACA fields as well as the shadow slb. These structures might be shared with a hypervisor.

So whenever we access those fields, make sure we do so in big endian byte order.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index e66c1e38..bf5270e 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -32,10 +32,6 @@
 
 #define VCPU_GPRS_TM(reg) (((reg) * ULONG_SIZE) + VCPU_GPR_TM)
 
-#ifdef __LITTLE_ENDIAN__
-#error Need to fix lppaca and SLB shadow accesses in little endian mode
-#endif
-
 /* Values in HSTATE_NAPPING(r13) */
 #define NAPPING_CEDE	1
 #define NAPPING_NOVCPU	2
@@ -595,9 +591,10 @@ kvmppc_got_guest:
 	ld	r3, VCPU_VPA(r4)
 	cmpdi	r3, 0
 	beq	25f
-	lwz	r5, LPPACA_YIELDCOUNT(r3)
+	li	r6, LPPACA_YIELDCOUNT
+	LWZX_BE	r5, r3, r6
 	addi	r5, r5, 1
-	stw	r5, LPPACA_YIELDCOUNT(r3)
+	STWX_BE	r5, r3, r6
 	li	r6, 1
 	stb	r6, VCPU_VPA_DIRTY(r4)
 25:
@@ -1442,9 +1439,10 @@ END_FTR_SECTION_IFCLR(CPU_FTR_TM)
 	ld	r8, VCPU_VPA(r9)	/* do they have a VPA? */
 	cmpdi	r8, 0
 	beq	25f
-	lwz	r3, LPPACA_YIELDCOUNT(r8)
+	li	r4, LPPACA_YIELDCOUNT
+	LWZX_BE	r3, r8, r4
 	addi	r3, r3, 1
-	stw	r3, LPPACA_YIELDCOUNT(r8)
+	STWX_BE	r3, r8, r4
 	li	r3, 1
 	stb	r3, VCPU_VPA_DIRTY(r9)
 25:
@@ -1757,8 +1755,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 33:	ld	r8,PACA_SLBSHADOWPTR(r13)
 
 	.rept	SLB_NUM_BOLTED
-	ld	r5,SLBSHADOW_SAVEAREA(r8)
-	ld	r6,SLBSHADOW_SAVEAREA+8(r8)
+	li	r3, SLBSHADOW_SAVEAREA
+	LDX_BE	r5, r8, r3
+	addi	r3, r3, 8
+	LDX_BE	r6, r8, r3
 	andis.	r7,r5,SLB_ESID_V@h
 	beq	1f
 	slbmte	r6,r5
--
1.8.1.4
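For readers less familiar with the assembly, the yield count update above amounts to "load a 32-bit big endian field, add one, store it back big endian" no matter what byte order the host runs in. A minimal C sketch of that idea follows; load_be32(), store_be32() and bump_yield_count() are hypothetical helper names for illustration, not kernel functions, and the byte-wise access is just one portable way to get the effect of the LWZX_BE/STWX_BE macros.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit big endian value byte by byte; works on any host endianness. */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write a 32-bit value in big endian byte order. */
static void store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Increment a big endian counter in shared memory, as done for the yield count. */
static void bump_yield_count(uint8_t *field)
{
	store_be32(field, load_be32(field) + 1);
}

/* Usage: the carry propagates correctly across byte boundaries. */
static void sketch_be_demo(void)
{
	uint8_t field[4] = { 0x00, 0x00, 0x00, 0xff };

	bump_yield_count(field);
	assert(field[2] == 0x01 && field[3] == 0x00);
}
```

A plain native-endian load on a little endian host would read the same four bytes as 0xff000000, which is why the removed #error guard existed in the first place.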
[PULL 28/63] KVM: PPC: Book3S: Make magic page properly 4k mappable
The magic page is defined as a 4k page of per-vCPU data that is shared between the guest and the host to accelerate accesses to privileged registers.

However, when the host is using 64k page size granularity we weren't quite as strict about that rule anymore. Instead, we partially treated all of the upper 64k as magic page and mapped only the uppermost 4k with the actual magic contents.

This works well enough for Linux which doesn't use any memory in kernel space in the upper 64k, but Mac OS X got upset. So this patch makes magic page actually stay in a 4k range even on 64k page size hosts.

This patch fixes magic page usage with Mac OS X (using MOL) on 64k PAGE_SIZE hosts for me.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h | 2 +-
 arch/powerpc/kvm/book3s.c | 12 ++--
 arch/powerpc/kvm/book3s_32_mmu_host.c | 7 +++
 arch/powerpc/kvm/book3s_64_mmu_host.c | 5 +++--
 arch/powerpc/kvm/book3s_pr.c | 13 ++--
 arch/powerpc/kvm/powerpc.c | 19 +++
 6 files changed, 38 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index b1cf18d..20fb6f2 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -158,7 +158,7 @@ extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat,
 			   bool upper, u32 val);
 extern void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr);
 extern int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu);
-extern pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, bool writing,
+extern pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing,
 			bool *writable);
 extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
 			unsigned long *rmap, long pte_index, int realmode);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 1d13764..31facfc 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -354,18 +354,18 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvmppc_core_prepare_to_enter);
 
-pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, bool writing,
+pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing,
 			bool *writable)
 {
-	ulong mp_pa = vcpu->arch.magic_page_pa;
+	ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM;
+	gfn_t gfn = gpa >> PAGE_SHIFT;
 
 	if (!(kvmppc_get_msr(vcpu) & MSR_SF))
 		mp_pa = (uint32_t)mp_pa;
 
 	/* Magic page override */
-	if (unlikely(mp_pa) &&
-	    unlikely(((gfn << PAGE_SHIFT) & KVM_PAM) ==
-		     ((mp_pa & PAGE_MASK) & KVM_PAM))) {
+	gpa &= ~0xFFFULL;
+	if (unlikely(mp_pa) && unlikely((gpa & KVM_PAM) == mp_pa)) {
 		ulong shared_page = ((ulong)vcpu->arch.shared) & PAGE_MASK;
 		pfn_t pfn;
 
@@ -378,7 +378,7 @@ pfn_t kvmppc_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, bool writing,
 
 	return gfn_to_pfn_prot(vcpu->kvm, gfn, writing, writable);
 }
-EXPORT_SYMBOL_GPL(kvmppc_gfn_to_pfn);
+EXPORT_SYMBOL_GPL(kvmppc_gpa_to_pfn);
 
 static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data,
 			bool iswrite, struct kvmppc_pte *pte)
diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c
index 678e753..2035d16 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -156,11 +156,10 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
 	bool writable;
 
 	/* Get host physical address for gpa */
-	hpaddr = kvmppc_gfn_to_pfn(vcpu, orig_pte->raddr >> PAGE_SHIFT,
-				   iswrite, &writable);
+	hpaddr = kvmppc_gpa_to_pfn(vcpu, orig_pte->raddr, iswrite, &writable);
 	if (is_error_noslot_pfn(hpaddr)) {
-		printk(KERN_INFO "Couldn't get guest page for gfn %lx!\n",
-				 orig_pte->eaddr);
+		printk(KERN_INFO "Couldn't get guest page for gpa %lx!\n",
+				 orig_pte->raddr);
 		r = -EINVAL;
 		goto out;
 	}
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 0ac9839..b982d92 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -104,9 +104,10 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte,
 	smp_rmb();
 
 	/* Get host physical address for gpa */
-	pfn = kvmppc_gfn_to_pfn(vcpu, gfn, iswrite, &writable);
+	pfn = kvmppc_gpa_to_pfn(vcpu, orig_pte->raddr, iswrite, &writable);
 	if (is_error_noslot_pfn(pfn)) {
-		printk(KERN_INFO "Couldn't get guest page for gfn
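The difference between the old and new magic page checks is easiest to see with concrete addresses. The sketch below assumes a 64k host page size and an assumed value for the physical address mask; old_is_magic() and new_is_magic() are hypothetical names that mirror the two conditions in the book3s.c hunk, and the 32-bit truncation for guests without MSR_SF is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_KVM_PAM    0x0fffffffffffffffULL /* assumed physical address mask */
#define SKETCH_PAGE_SHIFT 16                    /* assumed 64k host pages */
#define SKETCH_PAGE_MASK  (~((1ULL << SKETCH_PAGE_SHIFT) - 1))

/* Old check: compares at host page granularity, so all 64k match. */
static bool old_is_magic(uint64_t gfn, uint64_t mp_pa)
{
	return mp_pa &&
	       (((gfn << SKETCH_PAGE_SHIFT) & SKETCH_KVM_PAM) ==
		((mp_pa & SKETCH_PAGE_MASK) & SKETCH_KVM_PAM));
}

/* New check: compares at 4k granularity regardless of host page size. */
static bool new_is_magic(uint64_t gpa, uint64_t mp_pa)
{
	mp_pa &= SKETCH_KVM_PAM;
	gpa &= ~0xFFFULL;
	return mp_pa && ((gpa & SKETCH_KVM_PAM) == mp_pa);
}

/* Usage: magic page in the uppermost 4k of a 32-bit address space. */
static void sketch_magic_demo(void)
{
	const uint64_t mp_pa = 0xfffff000ULL;
	const uint64_t neighbour = 0xffff2000ULL; /* same 64k page, other 4k */

	assert(old_is_magic(neighbour >> SKETCH_PAGE_SHIFT, mp_pa));
	assert(!new_is_magic(neighbour, mp_pa));
	assert(new_is_magic(0xfffff123ULL, mp_pa)); /* inside the magic 4k */
}
```

Passing the full guest physical address instead of a frame number is what makes the stricter comparison possible: on a 64k host a gfn has already discarded the bits that tell the 4k pages apart.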
[PULL 07/63] KVM: PPC: Book3S HV: Fix ABIv2 indirect branch issue
From: Anton Blanchard an...@samba.org

To establish addressability quickly, ABIv2 requires the target address of the function being called to be in r12.

Signed-off-by: Anton Blanchard an...@samba.org
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 868347e..da1cac5 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1913,8 +1913,8 @@ hcall_try_real_mode:
 	lwax	r3,r3,r4
 	cmpwi	r3,0
 	beq	guest_exit_cont
-	add	r3,r3,r4
-	mtctr	r3
+	add	r12,r3,r4
+	mtctr	r12
 	mr	r3,r9		/* get vcpu pointer */
 	ld	r4,VCPU_GPR(R4)(r9)
 	bctrl
--
1.8.1.4