Re: [PATCH] KVM: ARM: enable Cortex A7 hosts
On Wed, Sep 25, 2013 at 02:32:31PM +0900, Simon Horman wrote:
> On Thu, Sep 19, 2013 at 02:01:48PM +0200, Ulrich Hecht wrote:
> > KVM runs fine on Cortex A7 cores, so they should be enabled.
> > Tested on an APE6EVM board (r8a73a4 SoC).
> >
> > Signed-off-by: Ulrich Hecht <ulrich.he...@gmail.com>
>
> Hi Ulrich,
>
> I'm not entirely sure, but it seems to me that you should expand the CC
> list of this patch somehow, as it doesn't seem to have got any attention
> in the week since you sent it. Adding kvm...@lists.cs.columbia.edu would
> be a good start.
>
>   # ./scripts/get_maintainer.pl -f arch/arm/kvm/guest.c
>   Christoffer Dall <christoffer.d...@linaro.org> (supporter:KERNEL VIRTUAL MA...)
>   Gleb Natapov <g...@redhat.com> (supporter:KERNEL VIRTUAL MA...)
>   Paolo Bonzini <pbonz...@redhat.com> (supporter:KERNEL VIRTUAL MA...)
>   Russell King <li...@arm.linux.org.uk> (maintainer:ARM PORT)
>   kvm...@lists.cs.columbia.edu (open list:KERNEL VIRTUAL MA...)
>   kvm@vger.kernel.org (open list:KERNEL VIRTUAL MA...)
>   linux-arm-ker...@lists.infradead.org (moderated list:ARM PORT)
>   linux-ker...@vger.kernel.org (open list)
>
> > ---
> >  arch/arm/kvm/guest.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
> > index 152d036..05c62d5 100644
> > --- a/arch/arm/kvm/guest.c
> > +++ b/arch/arm/kvm/guest.c
> > @@ -192,6 +192,8 @@ int __attribute_const__ kvm_target_cpu(void)
> >  	switch (part_number) {
> >  	case ARM_CPU_PART_CORTEX_A15:
> >  		return KVM_ARM_TARGET_CORTEX_A15;
> > +	case ARM_CPU_PART_CORTEX_A7:
> > +		return KVM_ARM_TARGET_CORTEX_A15;
> >  	default:
> >  		return -EINVAL;
> >  	}
> > --
> > 1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

--
	Gleb.
[PATCH] KVM: VMX: do not check bit 12 of EPT violation exit qualification when undefined
Bit 12 is undefined in any of the following cases:
- If the "NMI exiting" VM-execution control is 1 and the "virtual NMIs"
  VM-execution control is 0.
- If the VM exit sets the valid bit in the IDT-vectoring information field.

Signed-off-by: Gleb Natapov <g...@redhat.com>

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a1216de..0e06c1c4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5345,7 +5345,9 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	 * There are errata that may cause this bit to not be set:
 	 * AAK134, BY25.
 	 */
-	if (exit_qualification & INTR_INFO_UNBLOCK_NMI)
+	if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
+	    cpu_has_virtual_nmis() &&
+	    (exit_qualification & INTR_INFO_UNBLOCK_NMI))
 		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
 			      GUEST_INTR_STATE_NMI);

 	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
--
	Gleb.
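For readers without the SDM at hand, the rule the commit message cites can be restated as a small predicate. This is a sketch with invented names, not KVM code; it only encodes the two "undefined" cases listed above:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not from the patch): returns true when bit 12
 * ("NMI unblocking due to IRET") of the EPT-violation exit qualification
 * carries a defined value, per the two cases in the commit message. */
static bool exit_qual_bit12_defined(bool nmi_exiting_ctrl,
                                    bool virtual_nmis_ctrl,
                                    bool idt_vectoring_valid)
{
    /* Case 1: "NMI exiting" is 1 while "virtual NMIs" is 0. */
    if (nmi_exiting_ctrl && !virtual_nmis_ctrl)
        return false;
    /* Case 2: the VM exit set the valid bit in IDT-vectoring info. */
    if (idt_vectoring_valid)
        return false;
    return true;
}
```

The guard the patch adds mirrors this predicate: the bit is consulted only when `cpu_has_virtual_nmis()` holds and the IDT-vectoring information field is not valid.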
Yield CPU on poll for serial ports
Hi,

Is it possible to implement a "Yield CPU on poll" option similar to VMware's:
http://www.vmware.com/support/ws45/doc/devices_serial_ws.html#1023869 ?

I use KVM for Windows kernel development, and it would be great not to have
100% CPU load during a debugging-over-serial-line session.

Thanks,
Dmitry
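For illustration only: the VMware option the poster refers to boils down to backing off when the polled serial port stays idle, instead of spinning at 100% CPU. The sketch below is an invented userspace analogy (names and constants are assumptions, not QEMU/KVM code) computing a capped exponential backoff from the number of consecutive empty polls:

```c
#include <assert.h>

/* Illustrative sketch only -- not QEMU/KVM code.  Returns how long to
 * sleep before the next probe of an idle serial port: nothing while data
 * is flowing, then an exponentially growing, capped backoff. */
static long poll_backoff_ns(unsigned idle_polls)
{
    const long base_ns = 50000;     /* 50 us after the first empty poll */
    const long cap_ns  = 5000000;   /* never sleep longer than 5 ms */
    long ns;

    if (idle_polls == 0)
        return 0;                   /* data just seen: keep polling hard */
    if (idle_polls > 8)
        idle_polls = 8;             /* bound the shift */
    ns = base_ns << (idle_polls - 1);
    return ns < cap_ns ? ns : cap_ns;
}
```

A poll loop would sleep for `poll_backoff_ns(idle)` nanoseconds between probes (e.g. via `nanosleep`), resetting `idle` to zero whenever a byte arrives, so latency stays low under load while an idle debug session stops burning a core.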
Re: [GIT PULL] KVM/ARM Fixes for 3.12-rc2
On 24/09/2013 21:49, Christoffer Dall wrote:
> The following changes since commit 62d228b8c676232eca579f91cc0782b060a59097:
>
>   Merge branch 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm (2013-09-17 22:20:30 -0400)
>
> are available in the git repository at:
>
>   git://git.linaro.org/people/cdall/linux-kvm-arm.git tags/kvm-arm-fixes-3.12-rc2
>
> for you to fetch changes up to ac570e0493815e0b41681c89cb50d66421429d27:
>
>   ARM: kvm: rename cpu_reset to avoid name clash (2013-09-24 11:15:05 -0700)
>
> ----------------------------------------------------------------
> KVM/ARM Fixes for 3.12-rc2
>
> Fixes compilation error with KVM/ARM enabled.
>
> ----------------------------------------------------------------
> Olof Johansson (1):
>       ARM: kvm: rename cpu_reset to avoid name clash
>
>  arch/arm/kvm/reset.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Pulled, thanks.

Paolo
Re: [PATCH] KVM: VMX: do not check bit 12 of EPT violation exit qualification when undefined
On 25/09/2013 09:58, Gleb Natapov wrote:
> Bit 12 is undefined in any of the following cases:
> - If the "NMI exiting" VM-execution control is 1 and the "virtual NMIs"
>   VM-execution control is 0.
> - If the VM exit sets the valid bit in the IDT-vectoring information field.
>
> Signed-off-by: Gleb Natapov <g...@redhat.com>
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a1216de..0e06c1c4 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5345,7 +5345,9 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>  	 * There are errata that may cause this bit to not be set:
>  	 * AAK134, BY25.
>  	 */
> -	if (exit_qualification & INTR_INFO_UNBLOCK_NMI)
> +	if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
> +	    cpu_has_virtual_nmis() &&
> +	    (exit_qualification & INTR_INFO_UNBLOCK_NMI))
>  		vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
>  			      GUEST_INTR_STATE_NMI);
>
>  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);

Applied to kvm/master, thanks.

Paolo
[PATCH 2/4] KVM: nVMX: Do not put exception that caused vmexit to IDT_VECTORING_INFO
If an exception causes a vmexit directly, it should not be reported in
IDT_VECTORING_INFO during the exit. For that we need to be able to
distinguish between an exception that is injected into the nested VM and
one that is reinjected because its delivery failed. Fortunately we already
have a mechanism to do so for nested SVM, so here we just use the correct
function to requeue exceptions, and make sure that a reinjected exception
is not moved to IDT_VECTORING_INFO during vmexit emulation and not
re-checked for interception during delivery.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/vmx.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7eb0512..663bc5e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1921,7 +1921,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;

-	if (nr == PF_VECTOR && is_guest_mode(vcpu) &&
+	if (!reinject && nr == PF_VECTOR && is_guest_mode(vcpu) &&
 	    !vmx->nested.nested_run_pending && nested_pf_handled(vcpu))
 		return;

@@ -7053,9 +7053,9 @@ static void __vmx_complete_interrupts(struct kvm_vcpu *vcpu,
 	case INTR_TYPE_HARD_EXCEPTION:
 		if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) {
 			u32 err = vmcs_read32(error_code_field);
-			kvm_queue_exception_e(vcpu, vector, err);
+			kvm_requeue_exception_e(vcpu, vector, err);
 		} else
-			kvm_queue_exception(vcpu, vector);
+			kvm_requeue_exception(vcpu, vector);
 		break;
 	case INTR_TYPE_SOFT_INTR:
 		vcpu->arch.event_exit_inst_len = vmcs_read32(instr_len_field);
@@ -8013,7 +8013,7 @@ static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu,
 	u32 idt_vectoring;
 	unsigned int nr;

-	if (vcpu->arch.exception.pending) {
+	if (vcpu->arch.exception.pending && vcpu->arch.exception.reinject) {
 		nr = vcpu->arch.exception.nr;
 		idt_vectoring = nr | VECTORING_INFO_VALID_MASK;
--
1.8.4.rc3
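The distinction the patch leans on can be sketched as a tiny predicate (invented names, not the actual KVM structures): on vmexit emulation, only an exception whose delivery had already started and failed (one queued via the "requeue" path) belongs in the vectoring-info field; a fresh exception that itself caused the vmexit does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the state the patch distinguishes. */
struct pending_exception {
    bool pending;
    bool reinject;   /* set by the requeue path, not by a fresh queue */
};

/* Should this exception be saved into vmcs12's IDT_VECTORING_INFO when
 * emulating a vmexit?  Only if it was mid-delivery (reinjected). */
static bool goes_to_idt_vectoring_info(const struct pending_exception *e)
{
    return e->pending && e->reinject;
}
```

This mirrors the `vcpu->arch.exception.pending && vcpu->arch.exception.reinject` test the patch adds to vmcs12_save_pending_event().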
[PATCH 3/4] KVM: nVMX: Check all exceptions for intercept during delivery to L2
All exceptions should be checked for intercept during delivery to L2, but
we check only #PF currently. Drop nested_run_pending while we are at it,
since an exception cannot be injected during vmentry anyway.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/vmx.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 663bc5e..5bfa09d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1902,12 +1902,11 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
  * a #PF exception (this is the only case in which KVM injects a #PF when L2
  * is running).
  */
-static int nested_pf_handled(struct kvm_vcpu *vcpu)
+static int nested_ex_handled(struct kvm_vcpu *vcpu, unsigned nr)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

-	/* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */
-	if (!(vmcs12->exception_bitmap & (1u << PF_VECTOR)))
+	if (!(vmcs12->exception_bitmap & (1u << nr)))
 		return 0;

 	nested_vmx_vmexit(vcpu);
@@ -1921,8 +1920,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;

-	if (!reinject && nr == PF_VECTOR && is_guest_mode(vcpu) &&
-	    !vmx->nested.nested_run_pending && nested_pf_handled(vcpu))
+	if (!reinject && is_guest_mode(vcpu) && nested_ex_handled(vcpu, nr))
 		return;

 	if (has_error_code) {
--
1.8.4.rc3
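The check generalized here is simple enough to state on its own. A sketch of the core test (the real nested_ex_handled() also performs the vmexit as a side effect): L1 intercepts exception vector `nr` iff bit `nr` is set in vmcs12's exception bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UD_VECTOR  6
#define PF_VECTOR 14

/* Sketch of the per-vector intercept test that the patch extends from
 * #PF-only to all exception vectors. */
static bool l1_intercepts_exception(uint32_t exception_bitmap, unsigned nr)
{
    return (exception_bitmap & (1u << nr)) != 0;
}
```

Before the patch, only `PF_VECTOR` was ever tested against the bitmap; afterwards the same bit test applies to whatever vector is being queued.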
[PATCH 0/4] Fix shadow-on-shadow nested VMX
This series fixes shadow-on-shadow nested VMX (at least for me).

Gleb Natapov (4):
  KVM: nVMX: Amend nested_run_pending logic
  KVM: nVMX: Do not put exception that caused vmexit to IDT_VECTORING_INFO
  KVM: nVMX: Check all exceptions for intercept during delivery to L2
  KVM: nVMX: Do not generate #DF if #PF happens during exception delivery
    into L2

 arch/x86/kvm/vmx.c | 60 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 38 insertions(+), 22 deletions(-)

--
1.8.4.rc3
[PATCH 4/4] KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2
If #PF happens during delivery of an exception into L2, and L1 also does
not have the page mapped in its shadow page table, then L0 needs to
generate a vmexit to L1 with the original event in IDT_VECTORING_INFO, but
the current code combines both exceptions and generates #DF instead. Fix
that by providing an nVMX-specific function to handle page faults during
page table walks that handles this case correctly.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/vmx.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5bfa09d..07c36fd 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7520,6 +7520,20 @@ static void nested_ept_uninit_mmu_context(struct kvm_vcpu *vcpu)
 	vcpu->arch.walk_mmu = &vcpu->arch.mmu;
 }

+static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,
+		struct x86_exception *fault)
+{
+	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+
+	WARN_ON(!is_guest_mode(vcpu));
+
+	/* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */
+	if (vmcs12->exception_bitmap & (1u << PF_VECTOR))
+		nested_vmx_vmexit(vcpu);
+	else
+		kvm_inject_page_fault(vcpu, fault);
+}
+
 /*
  * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
  * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function merges it
@@ -7773,6 +7787,9 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	kvm_set_cr3(vcpu, vmcs12->guest_cr3);
 	kvm_mmu_reset_context(vcpu);

+	if (!enable_ept)
+		vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested;
+
 	/*
 	 * L1 may access the L2's PDPTR, so save them to construct vmcs12
 	 */
@@ -8232,6 +8249,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 	kvm_set_cr3(vcpu, vmcs12->host_cr3);
 	kvm_mmu_reset_context(vcpu);

+	if (!enable_ept)
+		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
+
 	if (enable_vpid) {
 		/*
 		 * Trivially support vpid by letting L2s share their parent
--
1.8.4.rc3
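Why the old code produced #DF: when a new exception is raised while a previous one is still being delivered, KVM merges them following the SDM's double-fault classification. A simplified sketch of that rule (invented helper names; benign/contributory classes per the SDM, with #PF as its own class) shows why #PF during delivery of another #PF escalates:

```c
#include <assert.h>
#include <stdbool.h>

#define DE_VECTOR  0
#define DB_VECTOR  1
#define TS_VECTOR 10
#define GP_VECTOR 13
#define PF_VECTOR 14

/* Contributory class: #DE, #TS, #NP, #SS, #GP (vectors 0 and 10-13). */
static bool is_contributory(unsigned nr)
{
    return nr == DE_VECTOR || (nr >= TS_VECTOR && nr <= GP_VECTOR);
}

/* Simplified SDM rule: delivering 'second' while 'first' was in flight
 * escalates to #DF for contributory + contributory, or for #PF followed
 * by anything non-benign.  This is the merge the patch avoids performing
 * when L1 intercepts the #PF. */
static bool escalates_to_df(unsigned first, unsigned second)
{
    if (first == PF_VECTOR &&
        (second == PF_VECTOR || is_contributory(second)))
        return true;
    return is_contributory(first) && is_contributory(second);
}
```

The common failing case in the commit message is `escalates_to_df(PF_VECTOR, PF_VECTOR)`: a #PF being delivered into L2 faults again on L1's shadow page tables, and the unfixed code merged the pair into #DF instead of exiting to L1.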
[PATCH 1/4] KVM: nVMX: Amend nested_run_pending logic
An EXIT_REASON_VMLAUNCH/EXIT_REASON_VMRESUME exit does not mean that the
nested VM will actually run during the next entry. Move the setting of
nested_run_pending closer to the vmentry emulation code, and move its
clearing close to the vmexit, to minimize the amount of code that will
erroneously run with nested_run_pending set.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
 arch/x86/kvm/vmx.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e6e8fbc..7eb0512 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6742,20 +6742,6 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 	if (vmx->emulation_required)
 		return handle_invalid_guest_state(vcpu);

-	/*
-	 * the KVM_REQ_EVENT optimization bit is only on for one entry, and if
-	 * we did not inject a still-pending event to L1 now because of
-	 * nested_run_pending, we need to re-enable this bit.
-	 */
-	if (vmx->nested.nested_run_pending)
-		kvm_make_request(KVM_REQ_EVENT, vcpu);
-
-	if (!is_guest_mode(vcpu) && (exit_reason == EXIT_REASON_VMLAUNCH ||
-	    exit_reason == EXIT_REASON_VMRESUME))
-		vmx->nested.nested_run_pending = 1;
-	else
-		vmx->nested.nested_run_pending = 0;
-
 	if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) {
 		nested_vmx_vmexit(vcpu);
 		return 1;
@@ -7290,6 +7276,16 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
 	trace_kvm_exit(vmx->exit_reason, vcpu, KVM_ISA_VMX);

+	/*
+	 * the KVM_REQ_EVENT optimization bit is only on for one entry, and if
+	 * we did not inject a still-pending event to L1 now because of
+	 * nested_run_pending, we need to re-enable this bit.
+	 */
+	if (vmx->nested.nested_run_pending)
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+
+	vmx->nested.nested_run_pending = 0;
+
 	vmx_complete_atomic_exit(vmx);
 	vmx_recover_nmi_blocking(vmx);
 	vmx_complete_interrupts(vmx);
@@ -7948,6 +7944,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)

 	enter_guest_mode(vcpu);

+	vmx->nested.nested_run_pending = 1;
+
 	vmx->nested.vmcs01_tsc_offset = vmcs_read64(TSC_OFFSET);

 	cpu = get_cpu();
--
1.8.4.rc3
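The lifecycle the patch establishes can be modeled abstractly (invented names, not the real KVM structures): the flag is set when vmentry to L2 is emulated, and cleared right after the hardware vmexit, re-arming the deferred event request if one was held back.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the narrowed nested_run_pending window. */
struct nested_model {
    bool run_pending;
    bool event_request;   /* stands in for KVM_REQ_EVENT */
};

/* Set at vmentry emulation (the nested_vmx_run() side of the patch). */
static void model_nested_vmx_run(struct nested_model *m)
{
    m->run_pending = true;
}

/* Cleared immediately after the vmexit (the vmx_vcpu_run() side):
 * if an event injection was deferred because of the flag, re-enable
 * the one-shot event request before dropping the flag. */
static void model_vcpu_run_exit(struct nested_model *m)
{
    if (m->run_pending)
        m->event_request = true;
    m->run_pending = false;
}
```

Between those two points no other exit-handling code runs with the flag set, which is exactly the property the commit message asks for.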
Re: [PATCH 3/4] KVM: nVMX: Check all exceptions for intercept during delivery to L2
On 25/09/2013 11:51, Gleb Natapov wrote:
> All exceptions should be checked for intercept during delivery to L2, but
> we check only #PF currently. Drop nested_run_pending while we are at it,
> since an exception cannot be injected during vmentry anyway.
>
> Signed-off-by: Gleb Natapov <g...@redhat.com>
> [...]
> @@ -1921,8 +1920,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	u32 intr_info = nr | INTR_INFO_VALID_MASK;
>
> -	if (!reinject && nr == PF_VECTOR && is_guest_mode(vcpu) &&
> -	    !vmx->nested.nested_run_pending && nested_pf_handled(vcpu))
> +	if (!reinject && is_guest_mode(vcpu) && nested_ex_handled(vcpu, nr))

The code is now pretty similar to what svm.c does. Do we want to move the
is_guest_mode(vcpu) check into nested_ex_handled, too? (Or vice versa,
take it out in svm.c.)

Perhaps you could also name the function nested_vmx_check_exception.

Paolo

>  		return;
>
>  	if (has_error_code) {
Re: [PATCH 3/4] KVM: nVMX: Check all exceptions for intercept during delivery to L2
On Wed, Sep 25, 2013 at 12:38:20PM +0200, Paolo Bonzini wrote:
> On 25/09/2013 11:51, Gleb Natapov wrote:
> > [...]
> > -	if (!reinject && nr == PF_VECTOR && is_guest_mode(vcpu) &&
> > -	    !vmx->nested.nested_run_pending && nested_pf_handled(vcpu))
> > +	if (!reinject && is_guest_mode(vcpu) && nested_ex_handled(vcpu, nr))
>
> The code is now pretty similar to what svm.c does. Do we want to move the
> is_guest_mode(vcpu) check into nested_ex_handled, too? (Or vice versa,
> take it out in svm.c.)
>
> Perhaps you could also name the function nested_vmx_check_exception.
>
I want to try to move the logic into common code eventually. I do not mind
renaming for now, but it will have to wait for the next week :)

--
	Gleb.
Re: [PATCH 4/4] KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2
On 25/09/2013 11:51, Gleb Natapov wrote:
> If #PF happens during delivery of an exception into L2, and L1 also does
> not have the page mapped in its shadow page table, then L0 needs to
> generate a vmexit to L1 with the original event in IDT_VECTORING_INFO, but
> the current code combines both exceptions and generates #DF instead. Fix
> that by providing an nVMX-specific function to handle page faults during
> page table walks that handles this case correctly.
>
> Signed-off-by: Gleb Natapov <g...@redhat.com>
> [...]
> @@ -8232,6 +8249,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>  	kvm_set_cr3(vcpu, vmcs12->host_cr3);
>  	kvm_mmu_reset_context(vcpu);
>
> +	if (!enable_ept)
> +		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;

This is strictly speaking not needed, because kvm_mmu_reset_context takes
care of it.

But I wonder if it is cleaner to not touch the struct here, and instead add
a new member to kvm_x86_ops---used directly in init_kvm_softmmu like
kvm_x86_ops->set_cr3. The new member can do something like

	if (is_guest_mode(vcpu)) {
		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
		if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) {
			nested_vmx_vmexit(vcpu);
			return;
		}
	}
	kvm_inject_page_fault(vcpu, fault);

Marcelo, Jan, what do you think?

Alex (or Gleb :)), do you have any idea why SVM does not need this?

Paolo

> +
>  	if (enable_vpid) {
>  		/*
>  		 * Trivially support vpid by letting L2s share their parent
Re: [PATCH 3/4] KVM: nVMX: Check all exceptions for intercept during delivery to L2
On 25/09/2013 13:00, Gleb Natapov wrote:
> > The code is now pretty similar to what svm.c does. Do we want to move the
> > is_guest_mode(vcpu) check into nested_ex_handled, too? (Or vice versa,
> > take it out in svm.c.)
> >
> > Perhaps you could also name the function nested_vmx_check_exception.
>
> I want to try to move the logic into common code eventually. I do not mind
> renaming for now, but it will have to wait for the next week :)

I can rename while applying the patch. Making more logic common to vmx and
svm can wait.

Paolo
Re: [PATCH 4/4] KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2
On Wed, Sep 25, 2013 at 01:24:49PM +0200, Paolo Bonzini wrote:
> On 25/09/2013 11:51, Gleb Natapov wrote:
> > [...]
> > @@ -8232,6 +8249,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
> >  	kvm_set_cr3(vcpu, vmcs12->host_cr3);
> >  	kvm_mmu_reset_context(vcpu);
> >
> > +	if (!enable_ept)
> > +		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
>
> This is strictly speaking not needed, because kvm_mmu_reset_context takes
> care of it.
>
Yeah, but better to make it explicit; it does not hurt and makes it more
clear what is going on. Or at least add a comment above
kvm_mmu_reset_context() about this side effect.

> But I wonder if it is cleaner to not touch the struct here, and instead add
> a new member to kvm_x86_ops---used directly in init_kvm_softmmu like
> kvm_x86_ops->set_cr3. The new member can do something like
>
> 	if (is_guest_mode(vcpu)) {
> 		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> 		if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) {
> 			nested_vmx_vmexit(vcpu);
> 			return;
> 		}
> 	}
> 	kvm_inject_page_fault(vcpu, fault);
>
I do not quite understand what you mean here. inject_page_fault() is called
from the depths of page table walking. How will the code know not to call
the new member in some circumstances?

> Marcelo, Jan, what do you think?
>
> Alex (or Gleb :)), do you have any idea why SVM does not need this?
>
It's probably needed there too. At least I fail to see why it does not.
Without that patch the guest actually boots (most of the time), but
sometimes random processes crash with a double fault exception.

--
	Gleb.
Re: [PATCH 3/4] KVM: nVMX: Check all exceptions for intercept during delivery to L2
On Wed, Sep 25, 2013 at 01:25:39PM +0200, Paolo Bonzini wrote:
> On 25/09/2013 13:00, Gleb Natapov wrote:
> > I want to try to move the logic into common code eventually. I do not mind
> > renaming for now, but it will have to wait for the next week :)
>
> I can rename while applying the patch. Making more logic common to vmx and
> svm can wait.
>
Yes, of course; logic unification is not for the immediate future.

--
	Gleb.
Re: [PATCH 4/4] KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2
On 25/09/2013 13:51, Gleb Natapov wrote:
> On Wed, Sep 25, 2013 at 01:24:49PM +0200, Paolo Bonzini wrote:
> > This is strictly speaking not needed, because kvm_mmu_reset_context takes
> > care of it.
>
> Yeah, but better to make it explicit; it does not hurt and makes it more
> clear what is going on. Or at least add a comment above
> kvm_mmu_reset_context() about this side effect.

Yes, I agree the code is cleaner like you wrote it.

> > But I wonder if it is cleaner to not touch the struct here, and instead
> > add a new member to kvm_x86_ops---used directly in init_kvm_softmmu like
> > kvm_x86_ops->set_cr3.
>
> I do not quite understand what you mean here. inject_page_fault() is called
> from the depths of page table walking. How will the code know not to call
> the new member in some circumstances?

IIUC the new function is called if and only if is_guest_mode(vcpu) &&
!enable_ept. So what I'm suggesting is something like this:

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -735,6 +735,8 @@ struct kvm_x86_ops {
 	void (*adjust_tsc_offset)(struct kvm_vcpu *vcpu, s64 adjustment, bool host);

 	void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
+	void (*inject_softmmu_page_fault)(struct kvm_vcpu *vcpu,
+					  struct x86_exception *fault);

 	void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry);
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3805,7 +3805,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 	vcpu->arch.walk_mmu->set_cr3           = kvm_x86_ops->set_cr3;
 	vcpu->arch.walk_mmu->get_cr3           = get_cr3;
 	vcpu->arch.walk_mmu->get_pdptr         = kvm_pdptr_read;
-	vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
+	vcpu->arch.walk_mmu->inject_page_fault = kvm_x86_ops->inject_softmmu_page_fault;

 	return r;
 }
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7499,6 +7499,20 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 	vmcs12->guest_physical_address = fault->address;
 }

+static void vmx_inject_softmmu_page_fault(struct kvm_vcpu *vcpu,
+		struct x86_exception *fault)
+{
+	if (is_guest_mode(vcpu)) {
+		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+		if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) {
+			nested_vmx_vmexit(vcpu);
+			return;
+		}
+	}
+
+	kvm_inject_page_fault(vcpu, fault);
+}
+
 /* Callbacks for nested_ept_init_mmu_context: */

 static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
@@ -8490,6 +8504,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.read_l1_tsc = vmx_read_l1_tsc,

 	.set_tdp_cr3 = vmx_set_cr3,
+	.inject_softmmu_page_fault = vmx_inject_softmmu_page_fault,

 	.check_intercept = vmx_check_intercept,
 	.handle_external_intr = vmx_handle_external_intr,
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4347,6 +4347,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.read_l1_tsc = svm_read_l1_tsc,

 	.set_tdp_cr3 = set_tdp_cr3,
+	.inject_softmmu_page_fault = kvm_inject_page_fault,	/* FIXME */

 	.check_intercept = svm_check_intercept,
 	.handle_external_intr = svm_handle_external_intr,

> > Alex (or Gleb :)), do you have any idea why SVM does not need this?
>
> It's probably needed there too. At least I fail to see why it does not.
> Without that patch the guest actually boots (most of the time), but
> sometimes random processes crash with a double fault exception.

Sounds indeed like the same bug.

Paolo
Re: [PATCH 4/4] KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2
On Wed, Sep 25, 2013 at 02:08:09PM +0200, Paolo Bonzini wrote:
Il 25/09/2013 13:51, Gleb Natapov ha scritto:
On Wed, Sep 25, 2013 at 01:24:49PM +0200, Paolo Bonzini wrote:
Il 25/09/2013 11:51, Gleb Natapov ha scritto:

@@ -7773,6 +7787,9 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	kvm_set_cr3(vcpu, vmcs12->guest_cr3);
 	kvm_mmu_reset_context(vcpu);
+	if (!enable_ept)
+		vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested;
+
 	/*
 	 * L1 may access the L2's PDPTR, so save them to construct vmcs12
 	 */
@@ -8232,6 +8249,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 	kvm_set_cr3(vcpu, vmcs12->host_cr3);
 	kvm_mmu_reset_context(vcpu);
+	if (!enable_ept)
+		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;

This is strictly speaking not needed, because kvm_mmu_reset_context takes care of it.

Yeah, but better to make it explicit; it does not hurt and makes it clearer what is going on. Or at least add a comment above kvm_mmu_reset_context() about this side effect.

Yes, I agree the code is cleaner the way you wrote it. But I wonder if it is cleaner to not touch the struct here, and instead add a new member to kvm_x86_ops---used directly in init_kvm_softmmu like kvm_x86_ops->set_cr3. The new member can do something like

	if (is_guest_mode(vcpu)) {
		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
		if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) {
			nested_vmx_vmexit(vcpu);
			return;
		}
	}
	kvm_inject_page_fault(vcpu, fault);

I do not quite understand what you mean here. inject_page_fault() is called from the depths of page table walking. How would the code know not to call the new member in some circumstances?

IIUC the new function is called if and only if is_guest_mode(vcpu) && !enable_ept. So what I'm suggesting is something like this:

Ah, I see, so you propose to check for guest mode and enable_ept in the function instead of switching to another function, but switching to another function is how the code was designed to work. Nested NPT/EPT provide their own function too, but there is nothing that stops you from checking which MMU you are in inside the function itself.

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -735,6 +735,8 @@ struct kvm_x86_ops {
 	void (*adjust_tsc_offset)(struct kvm_vcpu *vcpu, s64 adjustment, bool host);
 	void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
+	void (*inject_softmmu_page_fault)(struct kvm_vcpu *vcpu,
+					  struct x86_exception *fault);
 	void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry);
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3805,7 +3805,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 	vcpu->arch.walk_mmu->set_cr3           = kvm_x86_ops->set_cr3;
 	vcpu->arch.walk_mmu->get_cr3           = get_cr3;
 	vcpu->arch.walk_mmu->get_pdptr         = kvm_pdptr_read;
-	vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
+	vcpu->arch.walk_mmu->inject_page_fault = kvm_x86_ops->inject_softmmu_page_fault;
 	return r;
 }
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7499,6 +7499,20 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 	vmcs12->guest_physical_address = fault->address;
 }
+static void vmx_inject_softmmu_page_fault(struct kvm_vcpu *vcpu,
+					  struct x86_exception *fault)
+{
+	if (is_guest_mode(vcpu)) {

	is_guest_mode(vcpu) && !enable_ept

+		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+		if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) {
+			nested_vmx_vmexit(vcpu);
+			return;
+		}
+	}
+
+	kvm_inject_page_fault(vcpu, fault);
+}
+
 /* Callbacks for nested_ept_init_mmu_context: */
 static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
@@ -8490,6 +8504,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.read_l1_tsc = vmx_read_l1_tsc,
 	.set_tdp_cr3 = vmx_set_cr3,
+	.inject_nested_tdp_pagefault = vmx_set_cr3,
 	.check_intercept = vmx_check_intercept,
 	.handle_external_intr = vmx_handle_external_intr,
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4347,6 +4347,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.read_l1_tsc = svm_read_l1_tsc,
 	.set_tdp_cr3 = set_tdp_cr3,
+	.inject_nested_tdp_pagefault = kvm_inject_page_fault, /*FIXME*/
 	.check_intercept = svm_check_intercept,
 	.handle_external_intr = svm_handle_external_intr,

Alex (or Gleb :)), do you have any idea why SVM does not need this?

It's probably needed there too; at least I fail to see why it would not be. Without that patch guest is actually booting (most of the times),
Re: [PATCH 4/4] KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2
Il 25/09/2013 14:21, Gleb Natapov ha scritto:
On Wed, Sep 25, 2013 at 02:08:09PM +0200, Paolo Bonzini wrote:
Il 25/09/2013 13:51, Gleb Natapov ha scritto:
On Wed, Sep 25, 2013 at 01:24:49PM +0200, Paolo Bonzini wrote:
Il 25/09/2013 11:51, Gleb Natapov ha scritto:

@@ -7773,6 +7787,9 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	kvm_set_cr3(vcpu, vmcs12->guest_cr3);
 	kvm_mmu_reset_context(vcpu);
+	if (!enable_ept)
+		vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested;
+
 	/*
 	 * L1 may access the L2's PDPTR, so save them to construct vmcs12
 	 */
@@ -8232,6 +8249,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 	kvm_set_cr3(vcpu, vmcs12->host_cr3);
 	kvm_mmu_reset_context(vcpu);
+	if (!enable_ept)
+		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;

This is strictly speaking not needed, because kvm_mmu_reset_context takes care of it.

Yeah, but better to make it explicit; it does not hurt and makes it clearer what is going on. Or at least add a comment above kvm_mmu_reset_context() about this side effect.

Yes, I agree the code is cleaner the way you wrote it. But I wonder if it is cleaner to not touch the struct here, and instead add a new member to kvm_x86_ops---used directly in init_kvm_softmmu like kvm_x86_ops->set_cr3. The new member can do something like

	if (is_guest_mode(vcpu)) {
		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
		if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) {
			nested_vmx_vmexit(vcpu);
			return;
		}
	}
	kvm_inject_page_fault(vcpu, fault);

I do not quite understand what you mean here. inject_page_fault() is called from the depths of page table walking. How would the code know not to call the new member in some circumstances?

IIUC the new function is called if and only if is_guest_mode(vcpu) && !enable_ept. So what I'm suggesting is something like this:

Ah, I see, so you propose to check for guest mode and enable_ept in the function instead of switching to another function, but switching to another function is how the code was designed to work.

You do not need to check enable_ept if I understand the code correctly, because the new function is specifically called in init_kvm_softmmu, i.e. not for nested_mmu and not for tdp_enabled. I'm asking because I didn't find any other place that modifies function pointers this way after kvm_mmu_reset_context.

Nested NPT/EPT provide their own function too, but there is nothing that stops you from checking which MMU you are in inside the function itself.

The difference is that NPT/EPT use a completely different paging mode for nested and non-nested (non-nested uses direct mapping, nested uses shadow mapping). Shadow paging is really the same thing for nested and non-nested; you just have to do the injection the right way.

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -735,6 +735,8 @@ struct kvm_x86_ops {
 	void (*adjust_tsc_offset)(struct kvm_vcpu *vcpu, s64 adjustment, bool host);
 	void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
+	void (*inject_softmmu_page_fault)(struct kvm_vcpu *vcpu,
+					  struct x86_exception *fault);
 	void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry);
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3805,7 +3805,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 	vcpu->arch.walk_mmu->set_cr3           = kvm_x86_ops->set_cr3;
 	vcpu->arch.walk_mmu->get_cr3           = get_cr3;
 	vcpu->arch.walk_mmu->get_pdptr         = kvm_pdptr_read;
-	vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
+	vcpu->arch.walk_mmu->inject_page_fault = kvm_x86_ops->inject_softmmu_page_fault;
 	return r;
 }
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7499,6 +7499,20 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 	vmcs12->guest_physical_address = fault->address;
 }
+static void vmx_inject_softmmu_page_fault(struct kvm_vcpu *vcpu,
+					  struct x86_exception *fault)
+{
+	if (is_guest_mode(vcpu)) {

	is_guest_mode(vcpu) && !enable_ept

You don't really need to check for enable_ept (perhaps WARN_ON(enable_ept) instead), because the function is not used always, only in init_kvm_softmmu.

I described what I saw with VMX, I am not saying the same happens with SVM :) I just do not see why it should not, and the non-fatality of the bug can explain why it was missed.

Interesting, I got something completely different. The guest just got stuck before even getting to the GRUB prompt. I'm trying your patches now...

Paolo
Re: [PATCH 4/4] KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2
On Wed, Sep 25, 2013 at 03:26:56PM +0200, Paolo Bonzini wrote:
Il 25/09/2013 14:21, Gleb Natapov ha scritto:
On Wed, Sep 25, 2013 at 02:08:09PM +0200, Paolo Bonzini wrote:
Il 25/09/2013 13:51, Gleb Natapov ha scritto:
On Wed, Sep 25, 2013 at 01:24:49PM +0200, Paolo Bonzini wrote:
Il 25/09/2013 11:51, Gleb Natapov ha scritto:

@@ -7773,6 +7787,9 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	kvm_set_cr3(vcpu, vmcs12->guest_cr3);
 	kvm_mmu_reset_context(vcpu);
+	if (!enable_ept)
+		vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested;
+
 	/*
 	 * L1 may access the L2's PDPTR, so save them to construct vmcs12
 	 */
@@ -8232,6 +8249,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 	kvm_set_cr3(vcpu, vmcs12->host_cr3);
 	kvm_mmu_reset_context(vcpu);
+	if (!enable_ept)
+		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;

This is strictly speaking not needed, because kvm_mmu_reset_context takes care of it.

Yeah, but better to make it explicit; it does not hurt and makes it clearer what is going on. Or at least add a comment above kvm_mmu_reset_context() about this side effect.

Yes, I agree the code is cleaner the way you wrote it. But I wonder if it is cleaner to not touch the struct here, and instead add a new member to kvm_x86_ops---used directly in init_kvm_softmmu like kvm_x86_ops->set_cr3. The new member can do something like

	if (is_guest_mode(vcpu)) {
		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
		if (vmcs12->exception_bitmap & (1u << PF_VECTOR)) {
			nested_vmx_vmexit(vcpu);
			return;
		}
	}
	kvm_inject_page_fault(vcpu, fault);

I do not quite understand what you mean here. inject_page_fault() is called from the depths of page table walking. How would the code know not to call the new member in some circumstances?

IIUC the new function is called if and only if is_guest_mode(vcpu) && !enable_ept. So what I'm suggesting is something like this:

Ah, I see, so you propose to check for guest mode and enable_ept in the function instead of switching to another function, but switching to another function is how the code was designed to work.

You do not need to check enable_ept if I understand the code correctly, because the new function is specifically called in init_kvm_softmmu, i.e. not for nested_mmu and not for tdp_enabled.

Ah, true. With EPT the shadow page code will not run, so the function will not be called.

I'm asking because I didn't find any other place that modifies function pointers this way after kvm_mmu_reset_context.

nested_ept_init_mmu_context() modifies the pointer. Not after kvm_mmu_reset_context(), but it does not matter much. The canonical way to do what I did here would be to create a special mmu context to handle shadow-on-shadow, but the only difference between it and the regular shadow one would be this pointer, so it looked like overkill. Just changing the pointer does the trick.

Nested NPT/EPT provide their own function too, but there is nothing that stops you from checking which MMU you are in inside the function itself.

The difference is that NPT/EPT use a completely different paging mode for nested and non-nested (non-nested uses direct mapping, nested uses shadow mapping). Shadow paging is really the same thing for nested and non-nested; you just have to do the injection the right way.

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -735,6 +735,8 @@ struct kvm_x86_ops {
 	void (*adjust_tsc_offset)(struct kvm_vcpu *vcpu, s64 adjustment, bool host);
 	void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
+	void (*inject_softmmu_page_fault)(struct kvm_vcpu *vcpu,
+					  struct x86_exception *fault);
 	void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry);
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3805,7 +3805,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 	vcpu->arch.walk_mmu->set_cr3           = kvm_x86_ops->set_cr3;
 	vcpu->arch.walk_mmu->get_cr3           = get_cr3;
 	vcpu->arch.walk_mmu->get_pdptr         = kvm_pdptr_read;
-	vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
+	vcpu->arch.walk_mmu->inject_page_fault = kvm_x86_ops->inject_softmmu_page_fault;
 	return r;
 }
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7499,6 +7499,20 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 	vmcs12->guest_physical_address = fault->address;
 }
+static void vmx_inject_softmmu_page_fault(struct kvm_vcpu *vcpu,
+					  struct x86_exception *fault)
+{
+	if (is_guest_mode(vcpu)) {

	is_guest_mode(vcpu) && !enable_ept

You don't really need to check for enable_ept (perhaps
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto:

This patch contains the following two changes:
1. Fix the bug in nested preemption timer support. If a vmexit from L2 to L0 occurs for a reason not emulated by L1, the preemption timer value should be saved on such exits.
2. Add support for the "Save VMX-preemption timer value" VM-Exit control to nVMX.

With this patch, nested VMX preemption timer features are fully supported.

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
ChangeLog to v4:
	Format changes and remove a flag in nested_vmx.
 arch/x86/include/uapi/asm/msr-index.h |  1 +
 arch/x86/kvm/vmx.c                    | 44 ++++++++++++++++++++++++++++++++--
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index bb04650..b93e09a 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -536,6 +536,7 @@
 /* MSR_IA32_VMX_MISC bits */
 #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
+#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE   0x1F

 /* AMD-V MSRs */
 #define MSR_VM_CR                       0xc0010114
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1f1da43..e1fa13a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2204,7 +2204,13 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #ifdef CONFIG_X86_64
 		VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
-		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
+		VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
+		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+	if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) ||
+	    !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
+		nested_vmx_exit_ctls_high &= ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+		nested_vmx_pinbased_ctls_high &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
+	}
 	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
 				      VM_EXIT_LOAD_IA32_EFER);
@@ -6707,6 +6713,27 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
 	*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
 }
+static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
+{
+	u64 delta_tsc_l1;
+	u32 preempt_val_l1, preempt_val_l2, preempt_scale;
+
+	if (!(get_vmcs12(vcpu)->pin_based_vm_exec_control &
+	      PIN_BASED_VMX_PREEMPTION_TIMER))
+		return;
+	preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
+			MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
+	preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+	delta_tsc_l1 = vmx_read_l1_tsc(vcpu, native_read_tsc())
+		- vcpu->arch.last_guest_tsc;
+	preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
+	if (preempt_val_l2 <= preempt_val_l1)
+		preempt_val_l2 = 0;
+	else
+		preempt_val_l2 -= preempt_val_l1;
+	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
+}
+
 /*
  * The guest has exited.  See if we can fix it or if we need userspace
  * assistance.
@@ -7131,6 +7158,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	atomic_switch_perf_msrs(vmx);
 	debugctlmsr = get_debugctlmsr();
+	if (is_guest_mode(vcpu) && !(vmx->nested.nested_run_pending))
+		nested_adjust_preemption_timer(vcpu);
 	vmx->__launched = vmx->loaded_vmcs->launched;
 	asm(
 		/* Store host registers */
@@ -7518,6 +7547,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 exec_control;
+	u32 exit_control;

 	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
 	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
@@ -7691,7 +7721,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	 * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
 	 * bits are further modified by vmx_set_efer() below.
 	 */
-	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
+	exit_control = vmcs_config.vmexit_ctrl;
+	if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
+		exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+	vmcs_write32(VM_EXIT_CONTROLS, exit_control);

 	/* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
 	 * emulated by vmx_set_efer(), below.
@@ -8090,6 +8123,13 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	vmcs12->guest_pending_dbg_exceptions =
 		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
+	if ((vmcs12->pin_based_vm_exec_control &
+	     PIN_BASED_VMX_PREEMPTION_TIMER) &&
+	    (vmcs12->vm_exit_controls &
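The arithmetic in nested_adjust_preemption_timer() above can be sketched in isolation: the preemption timer counts down once every 2^scale TSC cycles (scale comes from MSR_IA32_VMX_MISC bits 4:0), so the L1 TSC cycles that elapsed while L0 handled the exit are converted to timer ticks and subtracted from L2's saved timer value, saturating at zero. A minimal model of that calculation (plain Python; the function name and signature are mine, not KVM's):

```python
def adjust_preemption_timer(saved_timer_l2: int,
                            delta_tsc_l1: int,
                            preempt_scale: int) -> int:
    """Recompute L2's VMX preemption timer after time spent in L0.

    saved_timer_l2: timer value saved at the last exit from L2
    delta_tsc_l1:   guest (L1) TSC cycles that elapsed meanwhile
    preempt_scale:  X from MSR_IA32_VMX_MISC[4:0]; the timer ticks
                    down once every 2**X TSC cycles
    """
    # Convert elapsed TSC cycles into preemption-timer ticks.
    elapsed_ticks = delta_tsc_l1 >> preempt_scale
    if saved_timer_l2 <= elapsed_ticks:
        return 0  # timer has already expired; it should fire immediately
    return saved_timer_l2 - elapsed_ticks

# 64 TSC cycles at scale 5 is 2 ticks, so a saved value of 100 becomes 98.
print(adjust_preemption_timer(100, 64, 5))  # -> 98
```

This mirrors the saturating subtraction in the patch (`preempt_val_l2 = 0` when the elapsed ticks meet or exceed the saved value), which is what prevents the timer from wrapping around to a huge value.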
Re: [PATCH 3/4] KVM: nVMX: Check all exceptions for intercept during delivery to L2
Il 25/09/2013 11:51, Gleb Natapov ha scritto:

All exceptions should be checked for intercept during delivery to L2, but we check only #PF currently. Drop nested_run_pending while we are at it, since an exception cannot be injected during vmentry anyway.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/vmx.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 663bc5e..5bfa09d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1902,12 +1902,11 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
  * a #PF exception (this is the only case in which KVM injects a #PF when L2
  * is running).
  */
-static int nested_pf_handled(struct kvm_vcpu *vcpu)
+static int nested_ex_handled(struct kvm_vcpu *vcpu, unsigned nr)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	/* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */

Just one more question, why drop this comment? It doesn't seem incorrect in the particular case where nr == PF_VECTOR.

-	if (!(vmcs12->exception_bitmap & (1u << PF_VECTOR)))
+	if (!(vmcs12->exception_bitmap & (1u << nr)))
 		return 0;
 	nested_vmx_vmexit(vcpu);
@@ -1921,8 +1920,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;
-	if (!reinject && nr == PF_VECTOR && is_guest_mode(vcpu) &&
-	    !vmx->nested.nested_run_pending && nested_pf_handled(vcpu))
+	if (!reinject && is_guest_mode(vcpu) && nested_ex_handled(vcpu, nr))
 		return;
 	if (has_error_code) {
Re: [PATCH 4/4] KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2
Il 25/09/2013 15:36, Gleb Natapov ha scritto:

I'm asking because I didn't find any other place that modifies function pointers this way after kvm_mmu_reset_context.

nested_ept_init_mmu_context() modifies the pointer. Not after kvm_mmu_reset_context(), but it does not matter much. The canonical way to do what I did here would be to create a special mmu context to handle shadow-on-shadow, but the only difference between it and the regular shadow one would be this pointer, so it looked like overkill. Just changing the pointer does the trick.

Ok, I think it's possible to clean up the code a bit, but I can do that on top of your patches.

Paolo
Re: [PATCH 3/4] KVM: nVMX: Check all exceptions for intercept during delivery to L2
On Wed, Sep 25, 2013 at 04:00:16PM +0200, Paolo Bonzini wrote:
Il 25/09/2013 11:51, Gleb Natapov ha scritto:

All exceptions should be checked for intercept during delivery to L2, but we check only #PF currently. Drop nested_run_pending while we are at it, since an exception cannot be injected during vmentry anyway.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/kvm/vmx.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 663bc5e..5bfa09d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1902,12 +1902,11 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
  * a #PF exception (this is the only case in which KVM injects a #PF when L2
  * is running).
  */
-static int nested_pf_handled(struct kvm_vcpu *vcpu)
+static int nested_ex_handled(struct kvm_vcpu *vcpu, unsigned nr)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	/* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */

Just one more question, why drop this comment? It doesn't seem incorrect in the particular case where nr == PF_VECTOR.

I moved it to vmx_inject_page_fault_nested().

-	if (!(vmcs12->exception_bitmap & (1u << PF_VECTOR)))
+	if (!(vmcs12->exception_bitmap & (1u << nr)))
 		return 0;
 	nested_vmx_vmexit(vcpu);
@@ -1921,8 +1920,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;
-	if (!reinject && nr == PF_VECTOR && is_guest_mode(vcpu) &&
-	    !vmx->nested.nested_run_pending && nested_pf_handled(vcpu))
+	if (!reinject && is_guest_mode(vcpu) && nested_ex_handled(vcpu, nr))
 		return;
 	if (has_error_code) {
--
Gleb.
Re: [PATCH 3/4] KVM: nVMX: Check all exceptions for intercept during delivery to L2
Il 25/09/2013 16:19, Gleb Natapov ha scritto:

-static int nested_pf_handled(struct kvm_vcpu *vcpu)
+static int nested_ex_handled(struct kvm_vcpu *vcpu, unsigned nr)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	/* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */

Just one more question, why drop this comment? It doesn't seem incorrect in the particular case where nr == PF_VECTOR.

I moved it to vmx_inject_page_fault_nested().

Ah, so if you test it there, you shouldn't get here unless the PFEC_MASK/MATCH matches. In the case of EPT, you run with L1 PFEC_MASK/MATCH so you do not need any check.

Paolo
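For readers unfamiliar with the PFEC_MATCH/PFEC_MASK mechanism being discussed: VMX does not decide #PF interception from the exception bitmap alone. A page fault causes a VM exit iff bit 14 (PF) of the exception bitmap equals the truth value of (error_code & PFEC_MASK) == PFEC_MATCH, which lets a hypervisor intercept only faults with certain error codes. A simplified model of that rule (a sketch of the architectural behavior, not KVM code; the function name is mine):

```python
PF_VECTOR = 14

def pf_causes_vmexit(exception_bitmap: int, pfec_mask: int,
                     pfec_match: int, error_code: int) -> bool:
    """Model the VMX decision of whether a guest #PF traps to the hypervisor.

    A page fault causes a VM exit iff EB[14] equals the truth value of
    (error_code & PFEC_MASK) == PFEC_MATCH.  With MASK = MATCH = 0 the
    comparison is always true, so only EB[14] matters -- the simple case
    the EB.PF-only check in nested_ex_handled() covers.
    """
    eb_pf = bool(exception_bitmap & (1 << PF_VECTOR))
    pfec_hit = (error_code & pfec_mask) == pfec_match
    return eb_pf == pfec_hit

# With MASK = MATCH = 0, the exception-bitmap bit alone decides:
print(pf_causes_vmexit(1 << PF_VECTOR, 0, 0, 0x2))  # -> True
```

Note the equality (rather than AND) means a hypervisor can also clear EB[14] and use MASK/MATCH to intercept only the *non*-matching error codes.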
Re: How to share filesystem
On Tuesday, September 24, 2013 3:38:39 AM CDT, Ross Boylan wrote:

I would like to have access to the same file system from the host and the guest. Can anyone recommend the best way to do this, considering ease of use, safety (concurrent access from guest and host does not corrupt) and performance? For example, I would like to restore files from backup using the host, but write to filesystems used by the guest.

I have previously used kvm mostly with disks that are based on LVM logical volumes, e.g. -hda /dev/turtle/Squeeze00. Since the LVs are virtual disks, I can't just mount them in the host AFAIK. Among the alternatives I can think of are using NFS and using NBD. Maybe there's some kind of loopback device I could use on the disk image to access it from the host.

I would suggest NFS/CIFS generally speaking. 9p/virtio is an option, but NFS and CIFS are going to be far more stable, tested, and fast.

Host: Debian GNU/Linux wheezy, amd64 architecture, qemu-kvm 1.1.2
Guest: Debian GNU/Linux lenny i386.
Host processor is a recent i5 with good virtualization (flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms)

Thanks.
Ross Boylan
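On the "loopback device" idea raised above: if the guest disk is an LVM logical volume with a partition table inside it, kpartx can map the partitions on the host so they can be mounted directly. This only works safely while the guest is powered off; it is a sketch using the LV name from the example above, not a tested recipe:

```shell
# WARNING: only do this while the guest is shut down.  Mounting the same
# filesystem from host and guest simultaneously WILL corrupt it; for
# concurrent access use NFS/CIFS as suggested above.
kpartx -av /dev/turtle/Squeeze00        # add mappings, e.g. /dev/mapper/turtle-Squeeze00p1
mount /dev/mapper/turtle-Squeeze00p1 /mnt/guestroot
# ... restore files from backup into /mnt/guestroot ...
umount /mnt/guestroot
kpartx -dv /dev/turtle/Squeeze00        # delete the partition mappings again
```

The exact /dev/mapper names depend on the volume group and partition layout; check the kpartx -av output before mounting.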
Re: [PATCH 3/4] KVM: nVMX: Check all exceptions for intercept during delivery to L2
On Wed, Sep 25, 2013 at 04:22:48PM +0200, Paolo Bonzini wrote:
Il 25/09/2013 16:19, Gleb Natapov ha scritto:

-static int nested_pf_handled(struct kvm_vcpu *vcpu)
+static int nested_ex_handled(struct kvm_vcpu *vcpu, unsigned nr)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
-	/* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */

Just one more question, why drop this comment? It doesn't seem incorrect in the particular case where nr == PF_VECTOR.

I moved it to vmx_inject_page_fault_nested().

Ah, so if you test it there, you shouldn't get here unless the PFEC_MASK/MATCH matches.

Yes, a #PF will never reach here in the case of shadow-on-shadow.

--
Gleb.
Re: [PATCH v2] arm32: kvm: rename CONFIG_KVM_ARM_MAX_VCPUS
On Tue, Sep 24, 2013 at 04:09:28PM +0200, Andrew Jones wrote:

Drop the _ARM_ part of the name. We can then introduce a config option like this to aarch64 and other arches using the same name - allowing grep to show them all. Also update the help text to describe the option more completely.

Signed-off-by: Andrew Jones drjo...@redhat.com
---
v2: reword help text some more
---
 arch/arm/include/asm/kvm_host.h | 4 ++--
 arch/arm/kvm/Kconfig            | 9 +++++----
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 7d22517d80711..c614d3eb176c6 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -25,8 +25,8 @@
 #include <asm/fpstate.h>
 #include <kvm/arm_arch_timer.h>
-#if defined(CONFIG_KVM_ARM_MAX_VCPUS)
-#define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
+#if defined(CONFIG_KVM_MAX_VCPUS)
+#define KVM_MAX_VCPUS CONFIG_KVM_MAX_VCPUS
 #else
 #define KVM_MAX_VCPUS 0
 #endif
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index ebf5015508b52..8e56ccf45edce 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -40,16 +40,17 @@ config KVM_ARM_HOST
 	---help---
 	  Provides host support for ARM processors.
-config KVM_ARM_MAX_VCPUS
+config KVM_MAX_VCPUS
 	int "Number maximum supported virtual CPUs per VM"
 	depends on KVM_ARM_HOST
 	default 4
 	help
-	  Static number of max supported virtual CPUs per VM.
-	  If you choose a high number, the vcpu structures will be quite
-	  large, so only choose a reasonable number that you expect to
-	  actually use.
+	  The default is set to the highest number of vcpus that
+	  current hardware supports. Choosing a lower number decreases
+	  the size of the VM data structure. This number may also be
+	  increased.

Maybe I'm being forgetful, but what do you mean by "This number may also be increased"?

-Christoffer
Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency
On Sun, 2013-09-15 at 22:40 +1000, Alexey Kardashevskiy wrote: On 09/14/2013 02:25 AM, Alex Williamson wrote: On Fri, 2013-09-13 at 18:49 +1000, Alexey Kardashevskiy wrote: On 09/13/2013 07:23 AM, Alex Williamson wrote: So far we've succeeded at making KVM and VFIO mostly unaware of each other, but there's an important point where that breaks down. Intel VT-d hardware may or may not support snoop control. When snoop control is available, intel-iommu promotes No-Snoop transactions on PCIe to be cache coherent. That allows KVM to handle things like the x86 WBINVD opcode as a nop. When the hardware does not support this, KVM must implement a hardware-visible WBINVD for the guest. We could simply let userspace tell KVM how to handle WBINVD, but it's privileged for a reason. Allowing an arbitrary user to enable physical WBINVD gives them more access to the hardware. Previously, this has only been enabled for guests supporting legacy PCI device assignment. In such cases it's necessary for proper guest execution. We therefore create a new KVM-VFIO virtual device. The user can add and remove VFIO groups to this device via file descriptors. KVM makes use of the VFIO external user interface to validate that the user has access to physical hardware and gets the coherency state of the IOMMU from VFIO. This provides equivalent functionality to legacy KVM assignment, while keeping (nearly) all the bits isolated. The one intrusion is the resulting flag indicating the coherency state. For this RFC it's placed on the x86 kvm_arch struct, however I know POWER has interest in using the VFIO external user interface, and I'm hoping we can share a common KVM-VFIO device. Perhaps they care about No-Snoop handling as well or the code can be #ifdef'd. POWER does not support this cache-non-coherent stuff at all (at least book3s, i.e. server; not sure about others).
Then it's easy for your IOMMU API interface to return always cache coherent or never cache coherent or whatever ;) Regarding reusing this device with external API for POWER - I posted a patch which introduces a KVM device to link KVM with IOMMU but besides the list of groups registered in KVM, it also provides the way to find a group by LIOBN (logical bus number) which is used in DMA map/unmap hypercalls. So in my case the kvm_vfio_group struct needs LIOBN and it would be nice to have window_size there too (for a quick boundary check). I am not sure we want to mix everything here. It is in [PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel handling if you are interested (kvmppc_spapr_tce_iommu_device). Yes, I stole the code to get the vfio symbols from your code. The convergence I was hoping to achieve is that KVM doesn't really want to know about VFIO and vice versa. We can therefore at least limit the intrusion by sharing a common device. Obviously for you it will need some extra interfaces to associate an LIOBN to a group, but we keep both the kernel and userspace cleaner by avoiding duplication where we can. Is this really not extensible to your usage? Well, I do not have a good taste for this kind of stuff so I cannot tell for sure. I can reuse this device and hack it to do whatever I need but how? There are 2 things I need from the KVM device: 1. associate an IOMMU group with a LIOBN. Does QEMU know this? We could set another attribute to specify the LIOBN for a group. 2. get a pointer to an IOMMU group by LIOBN (used to get ppc's IOMMU table pointer, which is IOMMU data of an IOMMU group, so we could take a shortcut here). Once we have a VFIO device with a VFIO group added and the LIOBN attribute set, isn't this just a matter of some access code? There are 3 possible interfaces to use: A. set/get attributes B. ioctl C. additional API. I think we need to differentiate user interfaces from kernel interfaces.
Within the kernel, we make no guarantees about interfaces and we can change them to meet our needs. That includes the vfio external user interface. For userspace, we need to be careful not to break things. I suggest we use the set/get/has attribute interface that's already part of KVM for the user interface. What I posted a week ago uses A for 1 and C for 2. Now we move this to virt/kvm/vfio.c. I don't care where it lives, other than we both have a need for it, so it seems like the core of it should not live in architecture specific directories. A for 1 is more or less ok but how exactly? Yet another attribute? Platform specific bus ID? In your patch attr->addr is not really an address (to be accessed with get_user()) but just an fd. Is that a problem? For 2 - there are already A and B interfaces so we do not want C, right? What kind of a good device has a backdoor? :) But A and B do not have access control to prevent the user space from receiving an IOMMU group pointer (which it would not be able to use anyway
Re: [PATCH 6/8] KVM: arm-vgic: Add vgic reg access from dev attr
On Sun, Aug 25, 2013 at 04:21:58PM +0100, Alexander Graf wrote: On 23.08.2013, at 20:20, Christoffer Dall wrote: Add infrastructure to handle distributor and cpu interface register accesses through the KVM_{GET/SET}_DEVICE_ATTR interface by adding the KVM_DEV_ARM_VGIC_GRP_DIST_REGS and KVM_DEV_ARM_VGIC_GRP_CPU_REGS groups and defining the semantics of the attr field to be the MMIO offset as specified in the GICv2 specs. Missing register accesses or other changes in individual register access functions to support save/restore of the VGIC state is added in subsequent patches. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- Documentation/virtual/kvm/devices/arm-vgic.txt | 35 ++ virt/kvm/arm/vgic.c | 143 ++ 2 files changed, 178 insertions(+) diff --git a/Documentation/virtual/kvm/devices/arm-vgic.txt b/Documentation/virtual/kvm/devices/arm-vgic.txt index c9febb2..1b68475 100644 --- a/Documentation/virtual/kvm/devices/arm-vgic.txt +++ b/Documentation/virtual/kvm/devices/arm-vgic.txt @@ -19,3 +19,38 @@ Groups: KVM_VGIC_V2_ADDR_TYPE_CPU (rw, 64-bit) Base address in the guest physical address space of the GIC virtual cpu interface register mappings. + + KVM_DEV_ARM_VGIC_GRP_DIST_REGS + Attributes: +The attr field of kvm_device_attr encodes two values: +bits: | 63 .. 40 | 39 .. 32 | 31 .. 0 | +values: | reserved | cpu id | offset | + +All distributor regs are (rw, 32-bit) + +The offset is relative to the Distributor base address as defined in the +GICv2 specs. Getting or setting such a register has the same effect as +reading or writing the register on the actual hardware from the cpu +specified with cpu id field. Note that most distributor fields are not +banked, but return the same value regardless of the cpu id used to access +the register. 
+ Limitations: +- Priorities are not implemented, and registers are RAZ/WI + Errors: +- ENODEV: Getting or setting this register is not yet supported + + KVM_DEV_ARM_VGIC_GRP_CPU_REGS + Attributes: +The attr field of kvm_device_attr encodes two values: +bits: | 63 .. 40 | 39 .. 32 | 31 .. 0 | +values: | reserved | cpu id | offset | + +All CPU regs are (rw, 32-bit) + +The offset specifies the offset from the CPU interface base address as +defined in the GICv2 specs. Getting or setting such a register has the +same effect as reading or writing the register on the actual hardware. + Limitations: +- Priorities are not implemented, and registers are RAZ/WI + Errors: +- ENODEV: Getting or setting this register is not yet supported diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 629caeb..e31625c 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -591,11 +591,29 @@ static bool handle_mmio_sgi_reg(struct kvm_vcpu *vcpu, return false; } +static bool handle_mmio_sgi_clear(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + return false; +} + +static bool handle_mmio_sgi_set(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + return false; +} + /* * I would have liked to use the kvm_bus_io_*() API instead, but it * cannot cope with banked registers (only the VM pointer is passed * around, and we need the vcpu). One of these days, someone please * fix it! + * + * Note that the handle_mmio implementations should not use the phys_addr + * field from the kvm_exit_mmio struct as this will not have any sane values + * when used to save/restore state from user space. 
*/ struct mmio_range { phys_addr_t base; @@ -665,6 +683,16 @@ static const struct mmio_range vgic_dist_ranges[] = { .len = 4, .handle_mmio = handle_mmio_sgi_reg, }, + { + .base = GIC_DIST_SGI_CLEAR, + .len = VGIC_NR_SGIS, + .handle_mmio = handle_mmio_sgi_clear, + }, + { + .base = GIC_DIST_SGI_SET, + .len = VGIC_NR_SGIS, + .handle_mmio = handle_mmio_sgi_set, + }, {} }; @@ -1543,6 +1571,80 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write) return r; } +static bool handle_cpu_mmio_misc(struct kvm_vcpu *vcpu, +struct kvm_exit_mmio *mmio, phys_addr_t offset) +{ + return true; +} + +static const struct mmio_range vgic_cpu_ranges[] = { + { + .base
Re: [PATCH 8/8] KVM: arm-vgic: Support CPU interface reg access
On Sun, Aug 25, 2013 at 04:24:20PM +0100, Alexander Graf wrote: On 23.08.2013, at 20:20, Christoffer Dall wrote: Implement support for the CPU interface register access driven by MMIO address offsets from the CPU interface base address. Useful for user space to support save/restore of the VGIC state. This commit adds support only for the same logic as the current VGIC support, and no more. For example, the active priority registers are handled as RAZ/WI, just like setting priorities on the emulated distributor. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- virt/kvm/arm/vgic.c | 66 +++ 1 file changed, 62 insertions(+), 4 deletions(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index d44b5a1..257dbae 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1684,9 +1684,67 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write) static bool handle_cpu_mmio_misc(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { - return true; + struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; + u32 reg, mask = 0, shift = 0; + bool updated = false; + + switch (offset & ~0x3) { + case GIC_CPU_CTRL: + mask = GICH_VMCR_CTRL_MASK; + shift = GICH_VMCR_CTRL_SHIFT; + break; + case GIC_CPU_PRIMASK: + mask = GICH_VMCR_PRIMASK_MASK; + shift = GICH_VMCR_PRIMASK_SHIFT; + break; + case GIC_CPU_BINPOINT: + mask = GICH_VMCR_BINPOINT_MASK; + shift = GICH_VMCR_BINPOINT_SHIFT; + break; + case GIC_CPU_ALIAS_BINPOINT: + mask = GICH_VMCR_ALIAS_BINPOINT_MASK; + shift = GICH_VMCR_ALIAS_BINPOINT_SHIFT; + break; + } + + if (!mmio->is_write) { + reg = (vgic_cpu->vgic_vmcr & mask) >> shift; + memcpy(mmio->data, &reg, sizeof(reg)); + } else { + memcpy(&reg, mmio->data, sizeof(reg)); + reg = (reg << shift) & mask; + if (reg != (vgic_cpu->vgic_vmcr & mask)) + updated = true; + vgic_cpu->vgic_vmcr &= ~mask; + vgic_cpu->vgic_vmcr |= reg; + } + return updated; +} + +static bool handle_mmio_abpr(struct kvm_vcpu *vcpu, +struct kvm_exit_mmio *mmio, phys_addr_t 
offset) +{ + return handle_cpu_mmio_misc(vcpu, mmio, GIC_CPU_ALIAS_BINPOINT); +} + +static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + u32 reg; + + if (mmio->is_write) + return false; + + reg = 0x0002043B; This wants a comment and probably also a #define :). Marc, where does the 0x4b0 product id code come from for the distributor IIDR? Would this be satisfying? diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 5214424..558be38 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -71,6 +71,9 @@ #define VGIC_ADDR_UNDEF (-1) #define IS_VGIC_ADDR_UNDEF(_x) ((_x) == VGIC_ADDR_UNDEF) +#define GIC_PRODUCT_ID 0x4b0 +#define ARM_JEP106_IMPLEMENTER 0x43b + /* Physical address of vgic virtual cpu interface */ static phys_addr_t vgic_vcpu_base; @@ -331,7 +334,7 @@ static bool handle_mmio_misc(struct kvm_vcpu *vcpu, break; case 8: /* IIDR */ - reg = 0x4B00043B; + reg = (GIC_PRODUCT_ID << 20) | ARM_JEP106_IMPLEMENTER; vgic_reg_access(mmio, &reg, word_offset, ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED); break; @@ -1734,7 +1737,7 @@ static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu, if (mmio->is_write) return false; - reg = 0x0002043B; + reg = (GIC_PRODUCT_ID << 20) | (0x2 << 16) | ARM_JEP106_IMPLEMENTER; mmio_data_write(mmio, ~0, reg); return false; } Thanks, -Christoffer
[Bug 61971] Migration problem of qemu-kvm guests
https://bugzilla.kernel.org/show_bug.cgi?id=61971 Magnus Boden mag...@boden.cx changed: What | Removed | Added Kernel Version | 3.10.5 | 3.11.1 --- Comment #1 from Magnus Boden mag...@boden.cx --- Hello, Since the kernel I used was a debian backport I compiled my own kernel from kernel.org 3.11.1 and it has the same problem. Regards Magnus -- You are receiving this mail because: You are watching the assignee of the bug.
Re: [PATCH 8/8] KVM: arm-vgic: Support CPU interface reg access
On 25.09.2013, at 23:30, Christoffer Dall wrote: On Sun, Aug 25, 2013 at 04:24:20PM +0100, Alexander Graf wrote: On 23.08.2013, at 20:20, Christoffer Dall wrote: Implement support for the CPU interface register access driven by MMIO address offsets from the CPU interface base address. Useful for user space to support save/restore of the VGIC state. This commit adds support only for the same logic as the current VGIC support, and no more. For example, the active priority registers are handled as RAZ/WI, just like setting priorities on the emulated distributor. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- virt/kvm/arm/vgic.c | 66 +++ 1 file changed, 62 insertions(+), 4 deletions(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index d44b5a1..257dbae 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1684,9 +1684,67 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write) static bool handle_cpu_mmio_misc(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { - return true; + struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; + u32 reg, mask = 0, shift = 0; + bool updated = false; + + switch (offset & ~0x3) { + case GIC_CPU_CTRL: + mask = GICH_VMCR_CTRL_MASK; + shift = GICH_VMCR_CTRL_SHIFT; + break; + case GIC_CPU_PRIMASK: + mask = GICH_VMCR_PRIMASK_MASK; + shift = GICH_VMCR_PRIMASK_SHIFT; + break; + case GIC_CPU_BINPOINT: + mask = GICH_VMCR_BINPOINT_MASK; + shift = GICH_VMCR_BINPOINT_SHIFT; + break; + case GIC_CPU_ALIAS_BINPOINT: + mask = GICH_VMCR_ALIAS_BINPOINT_MASK; + shift = GICH_VMCR_ALIAS_BINPOINT_SHIFT; + break; + } + + if (!mmio->is_write) { + reg = (vgic_cpu->vgic_vmcr & mask) >> shift; + memcpy(mmio->data, &reg, sizeof(reg)); + } else { + memcpy(&reg, mmio->data, sizeof(reg)); + reg = (reg << shift) & mask; + if (reg != (vgic_cpu->vgic_vmcr & mask)) + updated = true; + vgic_cpu->vgic_vmcr &= ~mask; + vgic_cpu->vgic_vmcr |= reg; + } + return updated; +} + +static bool handle_mmio_abpr(struct kvm_vcpu 
*vcpu, +struct kvm_exit_mmio *mmio, phys_addr_t offset) +{ + return handle_cpu_mmio_misc(vcpu, mmio, GIC_CPU_ALIAS_BINPOINT); +} + +static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + u32 reg; + + if (mmio->is_write) + return false; + + reg = 0x0002043B; This wants a comment and probably also a #define :). Marc, where does the 0x4b0 product id code come from for the distributor IIDR? Would this be satisfying? diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 5214424..558be38 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -71,6 +71,9 @@ #define VGIC_ADDR_UNDEF (-1) #define IS_VGIC_ADDR_UNDEF(_x) ((_x) == VGIC_ADDR_UNDEF) +#define GIC_PRODUCT_ID 0x4b0 This is a specific GIC version. PL390 for example is 0x3b0: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0416b/Beiggeba.html That should be reflected in the #define. If it means GICv2 then it should be GICV2_PRODUCT_ID for example. +#define ARM_JEP106_IMPLEMENTER 0x43b I think naming this JEP106_IMPLEMENTER_ARM makes it more obvious that we're talking about the implementer code for ARM. Or maybe just only IMPLEMENTER_ARM. + /* Physical address of vgic virtual cpu interface */ static phys_addr_t vgic_vcpu_base; @@ -331,7 +334,7 @@ static bool handle_mmio_misc(struct kvm_vcpu *vcpu, break; case 8: /* IIDR */ - reg = 0x4B00043B; + reg = (GIC_PRODUCT_ID << 20) | ARM_JEP106_IMPLEMENTER; vgic_reg_access(mmio, &reg, word_offset, ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED); break; @@ -1734,7 +1737,7 @@ static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu, if (mmio->is_write) return false; - reg = 0x0002043B; + reg = (GIC_PRODUCT_ID << 20) | (0x2 << 16) | ARM_JEP106_IMPLEMENTER; What is the 0x2 here? 
Alex mmio_data_write(mmio, ~0, reg); return false; } Thanks, -Christoffer
Re: [PATCH 00/18] KVM: PPC: Fixes for PR and preparation for POWER8
On 20.09.2013, at 06:52, Paul Mackerras wrote: This patch series contains updated versions of patches that have been posted before, plus one new compilation fix (for PR KVM without CONFIG_ALTIVEC), plus a patch to allow the guest VRSAVE register to be accessed with the ONE_REG interface on Book E. The first few patches are preparation for POWER8 support. Following that there are several patches that improve PR KVM's MMU emulation and prepare for being able to compile both HV and PR KVM in the one kernel. The series stops short of allowing them to coexist, though, since the details of how that should best be done are still being discussed. Please apply. Thanks, applied all (with minor cosmetic fixes) to kvm-ppc-queue. Alex
Re: [PATCH 8/8] KVM: arm-vgic: Support CPU interface reg access
On Thu, Sep 26, 2013 at 12:37:03AM +0200, Alexander Graf wrote: On 25.09.2013, at 23:30, Christoffer Dall wrote: On Sun, Aug 25, 2013 at 04:24:20PM +0100, Alexander Graf wrote: On 23.08.2013, at 20:20, Christoffer Dall wrote: Implement support for the CPU interface register access driven by MMIO address offsets from the CPU interface base address. Useful for user space to support save/restore of the VGIC state. This commit adds support only for the same logic as the current VGIC support, and no more. For example, the active priority registers are handled as RAZ/WI, just like setting priorities on the emulated distributor. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- virt/kvm/arm/vgic.c | 66 +++ 1 file changed, 62 insertions(+), 4 deletions(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index d44b5a1..257dbae 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1684,9 +1684,67 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write) static bool handle_cpu_mmio_misc(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { - return true; + struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; + u32 reg, mask = 0, shift = 0; + bool updated = false; + + switch (offset & ~0x3) { + case GIC_CPU_CTRL: + mask = GICH_VMCR_CTRL_MASK; + shift = GICH_VMCR_CTRL_SHIFT; + break; + case GIC_CPU_PRIMASK: + mask = GICH_VMCR_PRIMASK_MASK; + shift = GICH_VMCR_PRIMASK_SHIFT; + break; + case GIC_CPU_BINPOINT: + mask = GICH_VMCR_BINPOINT_MASK; + shift = GICH_VMCR_BINPOINT_SHIFT; + break; + case GIC_CPU_ALIAS_BINPOINT: + mask = GICH_VMCR_ALIAS_BINPOINT_MASK; + shift = GICH_VMCR_ALIAS_BINPOINT_SHIFT; + break; + } + + if (!mmio->is_write) { + reg = (vgic_cpu->vgic_vmcr & mask) >> shift; + memcpy(mmio->data, &reg, sizeof(reg)); + } else { + memcpy(&reg, mmio->data, sizeof(reg)); + reg = (reg << shift) & mask; + if (reg != (vgic_cpu->vgic_vmcr & mask)) + updated = true; + vgic_cpu->vgic_vmcr &= ~mask; + vgic_cpu->vgic_vmcr |= reg; + } + return 
updated; +} + +static bool handle_mmio_abpr(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, phys_addr_t offset) +{ + return handle_cpu_mmio_misc(vcpu, mmio, GIC_CPU_ALIAS_BINPOINT); +} + +static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + u32 reg; + + if (mmio-is_write) + return false; + + reg = 0x0002043B; This wants a comment and probably also a #define :). Marc, where does the 0x4b0 product id code come from for the distributor IIDR? Would this be satisfying? diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 5214424..558be38 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -71,6 +71,9 @@ #define VGIC_ADDR_UNDEF (-1) #define IS_VGIC_ADDR_UNDEF(_x) ((_x) == VGIC_ADDR_UNDEF) +#define GIC_PRODUCT_ID 0x4b0 This is a specific GIC version. PL390 for example is 0x3b0: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0416b/Beiggeba.html That should be reflected in the #define. If it means GICv2 then it should be GICV2_PRODUCT_ID for example. I know what field in the register it is thanks :) But I couldn't find 0x4b0 anywhere in the docs, so I'm asking Marc where he got it from. I don't believe it means GICv2, but a specific implementation of a GICv2, and once I have more info I can change the define name, I suspect this is potentially something made-up to indicate that this is the KVM VGIC... +#define ARM_JEP106_IMPLEMENTER 0x43b I think naming this JEP106_IMPLEMENTER_ARM makes it more obvious that we're talking about the implementer code for ARM. Or maybe just only IMPLEMENTER_ARM. 
ok + /* Physical address of vgic virtual cpu interface */ static phys_addr_t vgic_vcpu_base; @@ -331,7 +334,7 @@ static bool handle_mmio_misc(struct kvm_vcpu *vcpu, break; case 8: /* IIDR */ - reg = 0x4B00043B; + reg = (GIC_PRODUCT_ID << 20) | ARM_JEP106_IMPLEMENTER; vgic_reg_access(mmio, &reg, word_offset, ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED); break; @@ -1734,7 +1737,7 @@ static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu, if (mmio->is_write) return false; - reg = 0x0002043B; + reg = (GIC_PRODUCT_ID << 20) | (0x2 << 16) | ARM_JEP106_IMPLEMENTER; What is the 0x2 here? GICv2, and now
Re: [PATCH 8/8] KVM: arm-vgic: Support CPU interface reg access
On 26.09.2013, at 03:36, Alexander Graf wrote: On 26.09.2013, at 03:15, Alexander Graf wrote: On 26.09.2013, at 02:54, Christoffer Dall wrote: On Thu, Sep 26, 2013 at 12:37:03AM +0200, Alexander Graf wrote: On 25.09.2013, at 23:30, Christoffer Dall wrote: On Sun, Aug 25, 2013 at 04:24:20PM +0100, Alexander Graf wrote: On 23.08.2013, at 20:20, Christoffer Dall wrote: Implement support for the CPU interface register access driven by MMIO address offsets from the CPU interface base address. Useful for user space to support save/restore of the VGIC state. This commit adds support only for the same logic as the current VGIC support, and no more. For example, the active priority registers are handled as RAZ/WI, just like setting priorities on the emulated distributor. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- virt/kvm/arm/vgic.c | 66 +++ 1 file changed, 62 insertions(+), 4 deletions(-) diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index d44b5a1..257dbae 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1684,9 +1684,67 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write) static bool handle_cpu_mmio_misc(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { - return true; + struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; + u32 reg, mask = 0, shift = 0; + bool updated = false; + + switch (offset & ~0x3) { + case GIC_CPU_CTRL: + mask = GICH_VMCR_CTRL_MASK; + shift = GICH_VMCR_CTRL_SHIFT; + break; + case GIC_CPU_PRIMASK: + mask = GICH_VMCR_PRIMASK_MASK; + shift = GICH_VMCR_PRIMASK_SHIFT; + break; + case GIC_CPU_BINPOINT: + mask = GICH_VMCR_BINPOINT_MASK; + shift = GICH_VMCR_BINPOINT_SHIFT; + break; + case GIC_CPU_ALIAS_BINPOINT: + mask = GICH_VMCR_ALIAS_BINPOINT_MASK; + shift = GICH_VMCR_ALIAS_BINPOINT_SHIFT; + break; + } + + if (!mmio->is_write) { + reg = (vgic_cpu->vgic_vmcr & mask) >> shift; + memcpy(mmio->data, &reg, sizeof(reg)); + } else { + memcpy(&reg, mmio->data, sizeof(reg)); + reg = (reg << shift) & mask; + if (reg != (vgic_cpu->vgic_vmcr & mask)) + updated = true; + vgic_cpu->vgic_vmcr &= ~mask; + vgic_cpu->vgic_vmcr |= reg; + } + return updated; +} + +static bool handle_mmio_abpr(struct kvm_vcpu *vcpu, +struct kvm_exit_mmio *mmio, phys_addr_t offset) +{ + return handle_cpu_mmio_misc(vcpu, mmio, GIC_CPU_ALIAS_BINPOINT); +} + +static bool handle_cpu_mmio_ident(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + u32 reg; + + if (mmio->is_write) + return false; + + reg = 0x0002043B; This wants a comment and probably also a #define :). Marc, where does the 0x4b0 product id code come from for the distributor IIDR? Would this be satisfying? ^ diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 5214424..558be38 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -71,6 +71,9 @@ #define VGIC_ADDR_UNDEF (-1) #define IS_VGIC_ADDR_UNDEF(_x) ((_x) == VGIC_ADDR_UNDEF) +#define GIC_PRODUCT_ID 0x4b0 This is a specific GIC version. PL390 for example is 0x3b0: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0416b/Beiggeba.html That should be reflected in the #define. If it means GICv2 then it should be GICV2_PRODUCT_ID for example. I know what field in the register it is thanks :) But I couldn't find 0x4b0 anywhere in the docs, so I'm asking Marc where he got it from. I don't believe it means GICv2, but a Ah, ok. Then the answer to your question above is a simple no as the name doesn't really tell us everything we want to know yet :). specific implementation of a GICv2, and once I have more info I can change the define name, I suspect this is potentially something made-up to indicate that this is the KVM VGIC... Hrm, makes sense. So that also explains why there's a special version field. It doesn't explain why it only gets set in one of the IIDR variants though. Is this on purpose? From what I can tell, the CPU and Distributor interfaces both should return the same number here. Hrm. Curious. 
According to http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0438i/BABGBHBG.html the proper values for IIDR on an A15 are: GICD_IIDR 0x043B GICC_IIDR 0x0002043B So what do the fields mean in each
Re: [PATCH V3 4/6] vhost_net: determine whether or not to use zerocopy at one time
On 09/23/2013 03:16 PM, Michael S. Tsirkin wrote: On Thu, Sep 05, 2013 at 10:54:44AM +0800, Jason Wang wrote: On 09/04/2013 07:59 PM, Michael S. Tsirkin wrote: On Mon, Sep 02, 2013 at 04:40:59PM +0800, Jason Wang wrote: Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if upend_idx != done_idx we still set zcopy_used to true and rollback this choice later. This could be avoided by determining zerocopy once by checking all conditions at one time before. Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/vhost/net.c | 47 --- 1 files changed, 20 insertions(+), 27 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 8a6dd0d..3f89dea 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -404,43 +404,36 @@ static void handle_tx(struct vhost_net *net) iov_length(nvq->hdr, s), hdr_size); break; } - zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN || - nvq->upend_idx != nvq->done_idx); + + zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN && + (nvq->upend_idx + 1) % UIO_MAXIOV != + nvq->done_idx Thinking about this, this looks strange. The original idea was that once we start doing zcopy, we keep using the heads ring even for short packets until no zcopy is outstanding. What's the reason for keep using the heads ring? To keep completions in order. Ok, I will do some test to see the impact. What's the logic behind (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx here? Because we initialize both upend_idx and done_idx to zero, so upend_idx != done_idx could not be used to check whether or not the heads ring were full. But what does ring full have to do with zerocopy use? It was used to forbid the zerocopy when heads ring are full, but since we have the limitation now, it could be removed.
Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency
On 09/26/2013 06:19 AM, Alex Williamson wrote: On Sun, 2013-09-15 at 22:40 +1000, Alexey Kardashevskiy wrote: On 09/14/2013 02:25 AM, Alex Williamson wrote: On Fri, 2013-09-13 at 18:49 +1000, Alexey Kardashevskiy wrote: On 09/13/2013 07:23 AM, Alex Williamson wrote: So far we've succeeded at making KVM and VFIO mostly unaware of each other, but there's an important point where that breaks down. Intel VT-d hardware may or may not support snoop control. When snoop control is available, intel-iommu promotes No-Snoop transactions on PCIe to be cache coherent. That allows KVM to handle things like the x86 WBINVD opcode as a nop. When the hardware does not support this, KVM must implement a hardware visible WBINVD for the guest. We could simply let userspace tell KVM how to handle WBINVD, but it's privileged for a reason. Allowing an arbitrary user to enable physical WBINVD gives them more access to the hardware. Previously, this has only been enabled for guests supporting legacy PCI device assignment. In such cases it's necessary for proper guest execution. We therefore create a new KVM-VFIO virtual device. The user can add and remove VFIO groups to this device via file descriptors. KVM makes use of the VFIO external user interface to validate that the user has access to physical hardware and gets the coherency state of the IOMMU from VFIO. This provides equivalent functionality to legacy KVM assignment, while keeping (nearly) all the bits isolated. The one intrusion is the resulting flag indicating the coherency state. For this RFC it's placed on the x86 kvm_arch struct, however I know POWER has interest in using the VFIO external user interface, and I'm hoping we can share a common KVM-VFIO device. Perhaps they care about No-Snoop handling as well or the code can be #ifdef'd. POWER does not support (at least book3s - server, not sure about others) this cache-non-coherent stuff at all. 
Then it's easy for your IOMMU API interface to return always cache coherent or never cache coherent or whatever ;) Regarding reusing this device with external API for POWER - I posted a patch which introduces a KVM device to link KVM with IOMMU but besides the list of groups registered in KVM, it also provides the way to find a group by LIOBN (logical bus number) which is used in DMA map/unmap hypercalls. So in my case the kvm_vfio_group struct needs LIOBN and it would be nice to have window_size there too (for a quick boundary check). I am not sure we want to mix everything here. It is in [PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel handling if you are interested (kvmppc_spapr_tce_iommu_device). Yes, I stole the code to get the vfio symbols from your code. The convergence I was hoping to achieve is that KVM doesn't really want to know about VFIO and vice versa. We can therefore at least limit the intrusion by sharing a common device. Obviously for you it will need some extra interfaces to associate an LIOBN to a group, but we keep both the kernel and userspace cleaner by avoiding duplication where we can. Is this really not extensible to your usage? Well, I do not have a good taste for this kind of stuff so I cannot tell for sure. I can reuse this device and hack it to do whatever I need but how? There are 2 things I need from the KVM device: 1. associate IOMMU group with LIOBN Does QEMU know this? We could set another attribute to specify the LIOBN for a group. QEMU knows as QEMU decides on the LIOBN number. And yes, we could do that. 2. get a pointer to an IOMMU group by LIOBN (used to get ppc's IOMMU table pointer which is an IOMMU data of an IOMMU group so we could take a shortcut here). Once we have a VFIO device with a VFIO group added and the LIOBN attribute set, isn't this just a matter of some access code? The lookup function will be called from what we call a realmode on PPC64, i.e. when MMU is off and kvm.ko code is not available. 
So we will either have to put this lookup function to a separate virt/kvm/vfio_rm.c or compile the whole thing into the kernel image but this is not a big issue anyway. You can have a look for example at book3s_64_vio_hv.o vs. book3s_64_vio.o to get a better picture if you like.

There are 3 possible interfaces to use:
A. set/get attributes
B. ioctl
C. additional API

I think we need to differentiate user interfaces from kernel interfaces. Within the kernel, we make no guarantees about interfaces and we can change them to meet our needs. That includes the vfio external user interface. For userspace, we need to be careful not to break things. I suggest we use the set/get/has attribute interface that's already part of KVM for the user interface.

What I posted a week ago uses A for 1 and C for 2. Now we move this to virt/kvm/vfio.c. I don't care where it lives, other than we both have a need for it, so it seems like the core of it should not live in architecture specific
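For reference, the set/get/has attribute interface suggested above is driven from userspace through KVM's device-attribute ioctls. A sketch of what adding a group might look like (the struct mirrors the uapi layout of struct kvm_device_attr; the KVM_DEV_VFIO_* constant values and the helper are hypothetical placeholders, since the attribute numbering was still under discussion at this point):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the field layout of KVM's uapi struct kvm_device_attr. */
struct kvm_device_attr {
    uint32_t flags;
    uint32_t group;  /* attribute group, e.g. a VFIO-group class */
    uint64_t attr;   /* attribute within the group, e.g. ADD/DEL */
    uint64_t addr;   /* userspace pointer, e.g. to a VFIO group fd */
};

#define KVM_DEV_VFIO_GROUP      1  /* hypothetical value */
#define KVM_DEV_VFIO_GROUP_ADD  1  /* hypothetical value */

/* Userspace would fill this in and issue KVM_SET_DEVICE_ATTR on the
 * device fd returned by KVM_CREATE_DEVICE for the KVM-VFIO device. */
static struct kvm_device_attr make_group_add_attr(int *group_fd)
{
    struct kvm_device_attr a = {
        .flags = 0,
        .group = KVM_DEV_VFIO_GROUP,
        .attr  = KVM_DEV_VFIO_GROUP_ADD,
        .addr  = (uint64_t)(uintptr_t)group_fd,
    };
    return a;
}
```

A POWER-specific attribute to bind an LIOBN to an already-added group would slot into the same scheme as another (group, attr) pair.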
Re: [PATCH] powerpc/kvm: Handle the boundary condition correctly
Hi Alex, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com writes: Ok, please give me an example with real numbers and why it breaks. http://mid.gmane.org/1376995766-16526-4-git-send-email-aneesh.ku...@linux.vnet.ibm.com Didn't quite get what you are looking for. As explained before, we now need to pass an array with array size 3 even though we know we need to read only 2 entries, because the kernel doesn't loop correctly. But we need to do that regardless, because newer QEMU needs to be able to run on older kernels, no? Yes. So userspace will have to pass an array of size 3. But that should not prevent us from fixing this, right? Do we still want this patch or should I drop this? -aneesh -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/18] KVM: PPC: Fixes for PR and preparation for POWER8
On 20.09.2013, at 06:52, Paul Mackerras wrote: This patch series contains updated versions of patches that have been posted before, plus one new compilation fix (for PR KVM without CONFIG_ALTIVEC), plus a patch to allow the guest VRSAVE register to be accessed with the ONE_REG interface on Book E. The first few patches are preparation for POWER8 support. Following that there are several patches that improve PR KVM's MMU emulation and prepare for being able to compile both HV and PR KVM in the one kernel. The series stops short of allowing them to coexist, though, since the details of how that should best be done are still being discussed. Please apply. Thanks, applied all (with minor cosmetic fixes) to kvm-ppc-queue. Alex
Re: [PATCH 1/2] KVM: Implement default IRQ routing
On Mon, Sep 23, 2013 at 09:34:01PM +0300, Gleb Natapov wrote: On Mon, Sep 23, 2013 at 09:24:14PM +1000, Paul Mackerras wrote: On Sun, Sep 22, 2013 at 03:32:53PM +0300, Gleb Natapov wrote: On Tue, Sep 17, 2013 at 07:18:40PM +1000, Paul Mackerras wrote:

This implements a simple way to express the case of IRQ routing where there is one in-kernel PIC and the system interrupts (GSIs) are routed 1-1 to the corresponding pins of the PIC. This is expressed by having kvm->irq_routing == NULL with a skeleton irq routing entry in the new kvm->default_irq_route field. This provides a convenient way to provide backwards compatibility when adding IRQ routing capability to an existing in-kernel PIC, such as the XICS emulation on powerpc.

Why not create simple 1-1 irq routing table? It will take a little bit more memory, but there will be no need for kvm->irq_routing == NULL special handling.

The short answer is that userspace wants to use interrupt source numbers (i.e. pin numbers for the inputs to the emulated XICS) that are scattered throughout a large space, since that mirrors what real hardware does. More specifically, hardware divides up the interrupt source number into two fields, each of typically 12 bits, where the more significant field identifies an interrupt source unit (ISU) and the less significant field identifies an interrupt within the ISU. Each PCI host bridge would have an ISU, for example, and there can be ISUs associated with other things that attach directly to the interconnect fabric (coprocessors, cluster interconnects, etc.). Today, QEMU creates a virtual ISU numbered 1 for the emulated PCI host bridge, which means for example that virtio devices get interrupt pin numbers starting at 4096. So, I could have increased KVM_IRQCHIP_NUM_PINS to some multiple of 4096, say 16384, which would allow for 3 ISUs.
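The two-field split described above can be illustrated with the 12-bit widths from the text (the helper names are hypothetical; only the field layout comes from the discussion):

```c
#include <assert.h>
#include <stdint.h>

#define ISU_SHIFT 12                      /* low 12 bits: source within the ISU */
#define SRC_MASK  ((1u << ISU_SHIFT) - 1)

/* Which interrupt source unit an interrupt source number belongs to. */
static uint32_t irq_isu(uint32_t irq) { return irq >> ISU_SHIFT; }

/* Which source within that ISU. */
static uint32_t irq_src(uint32_t irq) { return irq & SRC_MASK; }
```

This is why QEMU's virtio interrupts start at pin 4096: they are source 0 of the virtual ISU numbered 1, i.e. (1 << 12) + 0.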
But that would bloat out struct kvm_irq_routing_table to over 64kB, and if I wanted 1-1 mappings between GSI and pins for all of them, the routing table would be over 960kB.

Yes, this is not an option. GSI is just a cookie for anything but x86 non-MSI interrupts. So the way to use the irq routing table to deliver XICS irqs is to register a GSI->XICS irq mapping and by triggering GSI, which is just an arbitrary number, userspace tells the kernel that the XICS irq that was registered for that GSI should be injected.

Yes, that's fine as far as it goes, but the trouble is that the existing data structures (specifically the chip[][] array in struct kvm_irq_routing_table) don't handle well the case where the pin numbers are large and/or sparse. In other words, using a small compact set of GSI numbers wouldn't help, because it's not the GSI -> pin mapping that is the problem, it is the reverse pin -> GSI mapping. There is a compatibility concern too -- if I want existing userspace to run, I would have to create 1-1 default mappings for at least the first (say) 4200 pins or so, which would use up about 294kB. That really doesn't seem worth it compared to just using the null routing table pointer to indicate an unlimited 1-1 mapping.

Let me check that I understand you correctly. Existing userspace already uses XICS irq directly and now you want to switch this interface to use irq routing. New userspace will register a GSI->XICS irq mapping as described above; this is what [2/2] does. Is this correct?

I don't particularly want irq routing, but it appears to be unavoidable if one wants IRQFD, which I do want. There is no particular advantage to userspace in using a GSI -> XICS irq mapping. If userspace did set one up it would be the identity mapping.

Paul.
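The size figures quoted in this exchange can be reproduced with back-of-envelope arithmetic. In the sketch below, the chip[][] slot size is the 4 bytes of an int; the per-entry size of ~60 bytes is an assumption standing in for struct kvm_kernel_irq_routing_entry plus its list linkage, chosen to match the quoted 960kB:

```c
#include <assert.h>

enum {
    NUM_PINS   = 16384,  /* KVM_IRQCHIP_NUM_PINS raised to cover 3 ISUs */
    CHIP_SLOT  = 4,      /* one int per pin in the chip[][] array */
    ENTRY_SIZE = 60,     /* assumed per-entry footprint, roughly 48-64 bytes */
};

/* Size of one irqchip's row of the chip[][] pin -> GSI array. */
static unsigned long chip_array_bytes(void)
{
    return (unsigned long)NUM_PINS * CHIP_SLOT;
}

/* Size of 1-1 routing entries for every pin. */
static unsigned long routing_entries_bytes(void)
{
    return (unsigned long)NUM_PINS * ENTRY_SIZE;
}
```

One irqchip row alone is 64kB, so the whole struct is "over 64kB", and full 1-1 entries land at 960kB, which is why Paul prefers a null routing-table pointer meaning "unlimited identity mapping" over materializing the table.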