Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume
Gleb Natapov g...@redhat.com wrote on 29/10/2009 19:31:05:

On Wed, Oct 28, 2009 at 06:23:42PM +0200, Orit Wasserman wrote:
Gleb Natapov wrote on 25/10/2009 11:44:
On Thu, Oct 22, 2009 at 05:46:16PM +0200, Orit Wasserman wrote:
Gleb Natapov wrote on 22/10/2009 11:05:
On Wed, Oct 21, 2009 at 04:43:44PM +0200, Orit Wasserman wrote:

@@ -4641,10 +4955,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 	int type;
 	bool idtv_info_valid;

-	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
+	if (vmx->nested.nested_mode)
+		return;
+

Why return here? What does the function do that should not be done in nested mode?

In nested mode L0 injects an interrupt into L2 in only one scenario: if there is an IDT_VALID event and L0 decides to run L2 again and not switch back to L1. In all other cases the injection is handled by L1.

This is exactly the kind of scenario that is handled by vmx_complete_interrupts().
(vmx|svm)_complete_interrupts() stores the pending event in an arch-agnostic way, and re-injection is handled by x86.c. You bypass this logic by inserting a return here and introducing the nested_handle_valid_idt() function below.

The only location where we can truly know if we are switching to L1 is in vmx_vcpu_run, because enable_irq_window (which is called after handling the exit) can decide to switch to L1 because of an interrupt.

enable_irq_window() will be called after the L2 VMCS has been set up for event re-injection by the previous call to inject_pending_event(). As far as I can see this should work for interrupt injection. For exceptions we should probably require the L2 guest to re-execute the faulting instruction for now, like svm does.

The main issue is that L0 doesn't inject events into L2; the L1 hypervisor does (we want to keep the nested hypervisor semantics as much as possible). Only if the event was caused by the fact that L2 is a nested guest and L1 can't handle it will L0 re-inject an event into L2, for example an IDT event with a page fault that is caused by a missing entry in SPT02 (the shadow page table L0 creates for L2). In this case, when vmx_complete_interrupts is called, L0 doesn't know yet whether the page fault should be handled by it or by L1 (that is decided later, when handling the exit).

So what? When it is decided that an L2 exit is needed, the pending event will be transferred into L2's idt_vectoring_info. Otherwise the event will be re-injected by the usual mechanism. BTW, I don't see where your current code sets up L2's idt_vectoring_info when it is decided that L1 should handle event re-injection.

Suppose we are executing an L2 guest and we get an exit. There are two possible scenarios here:

A) The L2 exit will be handled by the L1 guest hypervisor. In this case, when we switch to L1, the IDT-vectoring-info field is copied from vmcs(02) to vmcs(12) in prepare_vmcs_12 (part of the nested_vmx_vmexit path). It is now L1's responsibility to deal with the IDT and do the corresponding logic.
B) The L2 exit will be handled only by L0. In this case we never switch to L1. L0 handles the exit and resumes L2. Any pending event in the vmcs(02) IDT-vectoring-info field is injected into L2 when L0 resumes it. KVM handles the IDT at the end of vmx_vcpu_run, calling vmx_complete_interrupts.

The decision whether to switch to L1 is made at the following points:
1) nested_vmx_check_exception (called from vmx_queue_exception)
2) nested_vmx_intr (called from vmx_interrupt_allowed and enable_irq_window)
3) vmx_handle_exit

From the x86 perspective the flow looks as follows:

vcpu_enter_guest {
	1
	2
	run (includes vmx_complete_interrupts)
	3
}

All these functions are called
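The two scenarios above reduce to a small decision table: with a valid IDT-vectoring event pending after an L2 exit, who re-injects depends only on whether the exit is reflected to L1. The following is a hypothetical model for illustration, not code from the patch; `event_injector()` and its argument names are invented:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model (invented names, not the actual KVM code): after an L2
 * exit with a valid IDT-vectoring-info event pending, the party that
 * re-injects the event depends on whether the exit is reflected to L1
 * (scenario A) or handled entirely in L0, which then resumes L2 (scenario B). */
enum injector { INJECT_BY_L1, INJECT_BY_L0 };

static enum injector event_injector(bool idtv_info_valid, bool exit_goes_to_l1)
{
	if (!idtv_info_valid)
		return INJECT_BY_L1;	/* no pending event: L1 drives injection as usual */
	if (exit_goes_to_l1)
		return INJECT_BY_L1;	/* A: idt_vectoring_info copied vmcs(02) -> vmcs(12) */
	return INJECT_BY_L0;		/* B: L0 re-injects when it resumes L2 */
}
```

This is why the bypass only matters on the resume-L2 path: whenever the exit goes to L1, re-injection is L1's problem by construction.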
Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume
On Wed, Oct 28, 2009 at 06:23:42PM +0200, Orit Wasserman wrote:
Gleb Natapov wrote on 25/10/2009 11:44:
On Thu, Oct 22, 2009 at 05:46:16PM +0200, Orit Wasserman wrote:
Gleb Natapov wrote on 22/10/2009 11:05:
On Wed, Oct 21, 2009 at 04:43:44PM +0200, Orit Wasserman wrote:

@@ -4641,10 +4955,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 	int type;
 	bool idtv_info_valid;

-	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
+	if (vmx->nested.nested_mode)
+		return;
+

Why return here? What does the function do that should not be done in nested mode?

In nested mode L0 injects an interrupt into L2 in only one scenario: if there is an IDT_VALID event and L0 decides to run L2 again and not switch back to L1. In all other cases the injection is handled by L1.

This is exactly the kind of scenario that is handled by vmx_complete_interrupts(). (vmx|svm)_complete_interrupts() stores the pending event in an arch-agnostic way, and re-injection is handled by x86.c. You bypass this logic by inserting a return here and introducing the nested_handle_valid_idt() function below.
The only location where we can truly know if we are switching to L1 is in vmx_vcpu_run, because enable_irq_window (which is called after handling the exit) can decide to switch to L1 because of an interrupt.

enable_irq_window() will be called after the L2 VMCS has been set up for event re-injection by the previous call to inject_pending_event(). As far as I can see this should work for interrupt injection. For exceptions we should probably require the L2 guest to re-execute the faulting instruction for now, like svm does.

The main issue is that L0 doesn't inject events into L2; the L1 hypervisor does (we want to keep the nested hypervisor semantics as much as possible). Only if the event was caused by the fact that L2 is a nested guest and L1 can't handle it will L0 re-inject an event into L2, for example an IDT event with a page fault that is caused by a missing entry in SPT02 (the shadow page table L0 creates for L2). In this case, when vmx_complete_interrupts is called, L0 doesn't know yet whether the page fault should be handled by it or by L1 (that is decided later, when handling the exit).

So what? When it is decided that an L2 exit is needed, the pending event will be transferred into L2's idt_vectoring_info. Otherwise the event will be re-injected by the usual mechanism. BTW, I don't see where your current code sets up L2's idt_vectoring_info when it is decided that L1 should handle event re-injection.

In most other cases, L0 will switch to L1, and L1 will decide if there will be re-injection (depending on the L1 hypervisor logic) and update the L2 VMCS accordingly. To simplify our code, it was easier to bypass vmx_complete_interrupts when it is called (after running L2) and to add nested_handle_valid_idt just before running L2.
+	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+
 	/* Handle machine checks before interrupts are enabled */
 	if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
 	    || (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI

@@ -4747,6 +5064,60 @@ static void fixup_rmode_irq(struct vcpu_vmx *vmx)
 		| vmx->rmode.irq.vector;
 }

+static int nested_handle_valid_idt(struct kvm_vcpu *vcpu)
+{

It seems by this function you are trying to bypass the general event re-injection logic. Why?

See above.

The logic implemented by this function is handled in x86.c in an arch-agnostic way. Is there something wrong with this?

See my comment before.

Sometimes it is wrong to re-inject events from L0 into L2 directly. If L2 was not able to handle an event because its IDT is not mapped by the L1 shadow page table, we should generate a PF vmexit with valid IDT-vectoring info to L1 and let L1 handle the event re-injection.

--
	Gleb.
Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume
Gleb Natapov g...@redhat.com wrote on 25/10/2009 11:44:31:

On Thu, Oct 22, 2009 at 05:46:16PM +0200, Orit Wasserman wrote:
Gleb Natapov wrote on 22/10/2009 11:05:
On Wed, Oct 21, 2009 at 04:43:44PM +0200, Orit Wasserman wrote:

@@ -4641,10 +4955,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 	int type;
 	bool idtv_info_valid;

-	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
+	if (vmx->nested.nested_mode)
+		return;
+

Why return here? What does the function do that should not be done in nested mode?

In nested mode L0 injects an interrupt into L2 in only one scenario: if there is an IDT_VALID event and L0 decides to run L2 again and not switch back to L1. In all other cases the injection is handled by L1.

This is exactly the kind of scenario that is handled by vmx_complete_interrupts(). (vmx|svm)_complete_interrupts() stores the pending event in an arch-agnostic way, and re-injection is handled by x86.c. You bypass this logic by inserting a return here and introducing the nested_handle_valid_idt() function below.

The only location where we can truly know if we are switching to L1 is in vmx_vcpu_run, because enable_irq_window (which is called after handling the exit) can decide to switch to L1 because of an interrupt.
enable_irq_window() will be called after the L2 VMCS has been set up for event re-injection by the previous call to inject_pending_event(). As far as I can see this should work for interrupt injection. For exceptions we should probably require the L2 guest to re-execute the faulting instruction for now, like svm does.

The main issue is that L0 doesn't inject events into L2; the L1 hypervisor does (we want to keep the nested hypervisor semantics as much as possible). Only if the event was caused by the fact that L2 is a nested guest and L1 can't handle it will L0 re-inject an event into L2, for example an IDT event with a page fault that is caused by a missing entry in SPT02 (the shadow page table L0 creates for L2). In this case, when vmx_complete_interrupts is called, L0 doesn't know yet whether the page fault should be handled by it or by L1 (that is decided later, when handling the exit).

In most other cases, L0 will switch to L1, and L1 will decide if there will be re-injection (depending on the L1 hypervisor logic) and update the L2 VMCS accordingly. To simplify our code, it was easier to bypass vmx_complete_interrupts when it is called (after running L2) and to add nested_handle_valid_idt just before running L2.

+	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+
 	/* Handle machine checks before interrupts are enabled */
 	if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
 	    || (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI

@@ -4747,6 +5064,60 @@ static void fixup_rmode_irq(struct vcpu_vmx *vmx)
 		| vmx->rmode.irq.vector;
 }

+static int nested_handle_valid_idt(struct kvm_vcpu *vcpu)
+{

It seems by this function you are trying to bypass the general event re-injection logic. Why?

See above.

The logic implemented by this function is handled in x86.c in an arch-agnostic way. Is there something wrong with this?

See my comment before.

Sometimes it is wrong to re-inject events from L0 into L2 directly.
If L2 was not able to handle an event because its IDT is not mapped by the L1 shadow page table, we should generate a PF vmexit with valid IDT-vectoring info to L1 and let L1 handle the event re-injection.

--
	Gleb.
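The rule Gleb states can be sketched as a small helper. This is a hypothetical illustration of the idea, not code from the patch; `fake_vmcs12` and `reflect_event_delivery_fault()` are invented names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IDT_VECTORING_VALID (1u << 31)	/* valid bit, as in the VMX IDT-vectoring field */

/* Hypothetical sketch (invented names): if delivering a pending event to L2
 * would fault because L1's shadow page table does not map L2's IDT, L0 must
 * not re-inject directly; it should synthesize a page-fault vmexit to L1
 * carrying the original event in vmcs(12)'s idt_vectoring_info, so that L1
 * owns the re-injection decision. */
struct fake_vmcs12 {
	uint32_t idt_vectoring_info;
};

static bool reflect_event_delivery_fault(bool l2_idt_mapped,
					 uint32_t pending_vector,
					 struct fake_vmcs12 *vmcs12)
{
	if (l2_idt_mapped)
		return false;	/* safe for L0 to re-inject into L2 directly */
	vmcs12->idt_vectoring_info = pending_vector | IDT_VECTORING_VALID;
	return true;		/* reflect a PF exit to L1 instead */
}
```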
Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume
On Thu, Oct 22, 2009 at 05:46:16PM +0200, Orit Wasserman wrote:
Gleb Natapov wrote on 22/10/2009 11:05:
On Wed, Oct 21, 2009 at 04:43:44PM +0200, Orit Wasserman wrote:

@@ -4641,10 +4955,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 	int type;
 	bool idtv_info_valid;

-	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
+	if (vmx->nested.nested_mode)
+		return;
+

Why return here? What does the function do that should not be done in nested mode?

In nested mode L0 injects an interrupt into L2 in only one scenario: if there is an IDT_VALID event and L0 decides to run L2 again and not switch back to L1. In all other cases the injection is handled by L1.

This is exactly the kind of scenario that is handled by vmx_complete_interrupts(). (vmx|svm)_complete_interrupts() stores the pending event in an arch-agnostic way, and re-injection is handled by x86.c. You bypass this logic by inserting a return here and introducing the nested_handle_valid_idt() function below.

The only location where we can truly know if we are switching to L1 is in vmx_vcpu_run, because enable_irq_window (which is called after handling the exit) can decide to switch to L1 because of an interrupt.

enable_irq_window() will be called after the L2 VMCS has been set up for event re-injection by the previous call to inject_pending_event(). As far as I can see this should work for interrupt injection. For exceptions we should probably require the L2 guest to re-execute the faulting instruction for now, like svm does.
In order to simplify our code, it was easier to bypass vmx_complete_interrupts when it is called (after running L2) and to add nested_handle_valid_idt just before running L2.

+	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+
 	/* Handle machine checks before interrupts are enabled */
 	if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
 	    || (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI

@@ -4747,6 +5064,60 @@ static void fixup_rmode_irq(struct vcpu_vmx *vmx)
 		| vmx->rmode.irq.vector;
 }

+static int nested_handle_valid_idt(struct kvm_vcpu *vcpu)
+{

It seems by this function you are trying to bypass the general event re-injection logic. Why?

See above.

The logic implemented by this function is handled in x86.c in an arch-agnostic way. Is there something wrong with this?

See my comment before.

Sometimes it is wrong to re-inject events from L0 into L2 directly. If L2 was not able to handle an event because its IDT is not mapped by the L1 shadow page table, we should generate a PF vmexit with valid IDT-vectoring info to L1 and let L1 handle the event re-injection.

--
	Gleb.
Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume
On Wed, Oct 21, 2009 at 04:43:44PM +0200, Orit Wasserman wrote:

@@ -4641,10 +4955,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 	int type;
 	bool idtv_info_valid;

-	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
+	if (vmx->nested.nested_mode)
+		return;
+

Why return here? What does the function do that should not be done in nested mode?

In nested mode L0 injects an interrupt into L2 in only one scenario: if there is an IDT_VALID event and L0 decides to run L2 again and not switch back to L1. In all other cases the injection is handled by L1.

This is exactly the kind of scenario that is handled by vmx_complete_interrupts(). (vmx|svm)_complete_interrupts() stores the pending event in an arch-agnostic way, and re-injection is handled by x86.c. You bypass this logic by inserting a return here and introducing the nested_handle_valid_idt() function below.

+	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+
 	/* Handle machine checks before interrupts are enabled */
 	if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
 	    || (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI

@@ -4747,6 +5064,60 @@ static void fixup_rmode_irq(struct vcpu_vmx *vmx)
 		| vmx->rmode.irq.vector;
 }

+static int nested_handle_valid_idt(struct kvm_vcpu *vcpu)
+{

It seems by this function you are trying to bypass the general event re-injection logic. Why?

See above.

The logic implemented by this function is handled in x86.c in an arch-agnostic way. Is there something wrong with this?

+	vmx->launched = vmx->nested.l2_state->launched;
+

Can you explain why the ->launched logic is needed?

It is possible L1 called vmlaunch but we didn't actually run L2 (for example, there was an interrupt and enable_irq_window switched back to L1 before running L2). L1 thinks the vmlaunch was successful and calls vmresume the next time, but KVM needs to call vmlaunch for L2.

handle_vmlaunch() and handle_vmresume() are exactly the same. Why does KVM need to run one and not the other?
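The bookkeeping being debated above can be sketched in a few lines. This is a hypothetical model with invented names (`struct l2_state`, `enter_l2()`), not the patch itself: L1 cannot be trusted to know whether hardware VMLAUNCH actually ran, so L0 tracks that per vmcs(02) and picks VMLAUNCH or VMRESUME itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model (invented names): L1 may believe its VMLAUNCH succeeded
 * even though L0 never entered L2 (e.g. enable_irq_window switched back to L1
 * first), so L1 will use VMRESUME next time. L0 therefore ignores which
 * instruction L1 executed and keeps its own per-vmcs(02) launched flag. */
struct l2_state { bool launched; };

/* Returns true if the hardware entry must be a VMLAUNCH; marks the vmcs
 * as launched once the entry completes. */
static bool enter_l2(struct l2_state *l2)
{
	bool use_vmlaunch = !l2->launched;
	l2->launched = true;
	return use_vmlaunch;
}
```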
+static int nested_vmx_vmexit(struct kvm_vcpu *vcpu,
+			     bool is_interrupt)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int initial_pfu_active = vcpu->fpu_active;
+
+	if (!vmx->nested.nested_mode) {
+		printk(KERN_INFO "WARNING: %s called but not in nested mode\n",
+		       __func__);
+		return 0;
+	}
+
+	save_msrs(vmx->guest_msrs, vmx->save_nmsrs);
+
+	sync_cached_regs_to_vmcs(vcpu);
+
+	if (!nested_map_shadow_vmcs(vcpu)) {
+		printk(KERN_INFO "Error mapping shadow vmcs\n");
+		set_rflags_to_vmx_fail_valid(vcpu);

An error during vmexit should set the abort flag, not change rflags.

I think this is more a vmlaunch/vmresume error (in the code that switches back to L1).

How is this a vmlaunch/vmresume error? This function is called to exit from the L2 guest while on the L2 vmcs. It is called asynchronously with respect to the L2 guest, and you modify the L2 guest's rflags register at an unpredictable place here.

--
	Gleb.
Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume
On 10/21/2009 04:43 PM, Orit Wasserman wrote:

It is possible L1 called vmlaunch but we didn't actually run L2 (for example, there was an interrupt and enable_irq_window switched back to L1 before running L2). L1 thinks the vmlaunch was successful and calls vmresume the next time, but KVM needs to call vmlaunch for L2.

Is it really possible? If vmlaunch is started, it should complete unconditionally for L1. The irq window should be recalculated based on the guest vmcs (exit on external interrupt, guest eflags/cr8, etc.).

--
error compiling committee.c: too many arguments to function
Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume
Avi Kivity a...@redhat.com wrote on 20/10/2009 06:56:39:

On 10/15/2009 11:41 PM, or...@il.ibm.com wrote:
From: Orit Wasserman or...@il.ibm.com

---
 arch/x86/kvm/vmx.c | 1173 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 1148 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6a4c252..e814029 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -209,6 +209,7 @@ struct __attribute__ ((__packed__)) level_state {
 	struct vmcs *vmcs;
 	int cpu;
 	int launched;
+	bool first_launch;
 };

 struct nested_vmx {
@@ -216,6 +217,12 @@ struct nested_vmx {
 	bool vmxon;
 	/* What is the location of the vmcs l1 keeps for l2? (in level1 gpa) */
 	u64 vmptr;
+	/* Are we running nested guest */
+	bool nested_mode;
+	/* L1 requested VMLAUNCH or VMRESUME but we didn't run L2 yet */
+	bool nested_run_pending;
+	/* flag indicating if there was a valid IDT after exiting from l2 */
+	bool nested_valid_idt;

Did you mean valid_idt_vectoring_info?

Yes.

No need to prefix everything with nested_ inside nested_vmx.

OK.
+void prepare_vmcs_12(struct kvm_vcpu *vcpu)
+{
+	struct shadow_vmcs *l2_shadow_vmcs =
+		get_shadow_vmcs(vcpu);
+
+	l2_shadow_vmcs->guest_es_selector = vmcs_read16(GUEST_ES_SELECTOR);
+	l2_shadow_vmcs->guest_cs_selector = vmcs_read16(GUEST_CS_SELECTOR);
+	l2_shadow_vmcs->guest_ss_selector = vmcs_read16(GUEST_SS_SELECTOR);
+	l2_shadow_vmcs->guest_ds_selector = vmcs_read16(GUEST_DS_SELECTOR);
+	l2_shadow_vmcs->guest_fs_selector = vmcs_read16(GUEST_FS_SELECTOR);
+	l2_shadow_vmcs->guest_gs_selector = vmcs_read16(GUEST_GS_SELECTOR);
+	l2_shadow_vmcs->guest_ldtr_selector = vmcs_read16(GUEST_LDTR_SELECTOR);
+	l2_shadow_vmcs->guest_tr_selector = vmcs_read16(GUEST_TR_SELECTOR);
+
+	l2_shadow_vmcs->tsc_offset = vmcs_read64(TSC_OFFSET);
+	l2_shadow_vmcs->guest_physical_address =
+		vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+	l2_shadow_vmcs->vmcs_link_pointer = vmcs_read64(VMCS_LINK_POINTER);

Physical addresses need translation, no?

If you are referring to GUEST_PHYSICAL_ADDRESS, then there is no need for translation for L1. It needs to stay an L2 physical address.

+	l2_shadow_vmcs->guest_cr0 = vmcs_readl(GUEST_CR0);
+
+	l2_shadow_vmcs->guest_cr4 = vmcs_readl(GUEST_CR4);

We don't allow the guest to modify these, so no need to read them. If you do, you need to remove the bits that we modify.

You are correct, for example CR0.TS; it will be fixed in the next set of patches.

+
+int load_vmcs_common(struct shadow_vmcs *src)
+{
+
+	vmcs_write64(VMCS_LINK_POINTER, src->vmcs_link_pointer);

Why load this?

At the moment it is not used, maybe in the future. I can add a check to see if it was changed.

+	vmcs_write64(GUEST_IA32_DEBUGCTL, src->guest_ia32_debugctl);

I think some features there are dangerous.

I will look into it.

+	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, src->vm_entry_msr_load_count);

Need to verify? Also need to validate the loaded MSRs and run them through kvm_set_msr() instead of letting the cpu do it.

I will add the checks.
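The reviewer's suggestion about the MSR load list can be sketched as follows. This is a hypothetical illustration, not KVM code: `set_msr()` stands in for `kvm_set_msr()`, and the single-entry whitelist is invented. The idea is to walk L1's list in software so that an unknown or unsafe MSR fails the emulated VM entry instead of faulting on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch (invented names): rather than copying
 * vm_entry_msr_load_count into vmcs(02) and letting the CPU load L1's MSR
 * list on entry, walk the list and feed each entry through a validating
 * setter, so bad entries fail the emulated entry cleanly. */
struct msr_entry { uint32_t index; uint64_t value; };

#define MSR_EFER 0xc0000080u	/* real MSR number, whitelist itself is illustrative */

static int set_msr(uint32_t index, uint64_t value)
{
	(void)value;
	return index == MSR_EFER ? 0 : -1;	/* reject anything not whitelisted */
}

static int load_guest_msr_list(const struct msr_entry *list, size_t count)
{
	for (size_t i = 0; i < count; i++)
		if (set_msr(list[i].index, list[i].value))
			return -1;	/* fail the emulated VM entry */
	return 0;
}
```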
--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume
Gleb Natapov g...@redhat.com wrote on 22/10/2009 11:04:58:

On Wed, Oct 21, 2009 at 04:43:44PM +0200, Orit Wasserman wrote:

@@ -4641,10 +4955,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 	int type;
 	bool idtv_info_valid;

-	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
-	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
+	if (vmx->nested.nested_mode)
+		return;
+

Why return here? What does the function do that should not be done in nested mode?

In nested mode L0 injects an interrupt into L2 in only one scenario: if there is an IDT_VALID event and L0 decides to run L2 again and not switch back to L1. In all other cases the injection is handled by L1.

This is exactly the kind of scenario that is handled by vmx_complete_interrupts(). (vmx|svm)_complete_interrupts() stores the pending event in an arch-agnostic way, and re-injection is handled by x86.c. You bypass this logic by inserting a return here and introducing the nested_handle_valid_idt() function below.

The only location where we can truly know if we are switching to L1 is in vmx_vcpu_run, because enable_irq_window (which is called after handling the exit) can decide to switch to L1 because of an interrupt. In order to simplify our code, it was easier to bypass vmx_complete_interrupts when it is called (after running L2) and to add nested_handle_valid_idt just before running L2.
+	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+
 	/* Handle machine checks before interrupts are enabled */
 	if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
 	    || (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI

@@ -4747,6 +5064,60 @@ static void fixup_rmode_irq(struct vcpu_vmx *vmx)
 		| vmx->rmode.irq.vector;
 }

+static int nested_handle_valid_idt(struct kvm_vcpu *vcpu)
+{

It seems by this function you are trying to bypass the general event re-injection logic. Why?

See above.

The logic implemented by this function is handled in x86.c in an arch-agnostic way. Is there something wrong with this?

See my comment before.

+	vmx->launched = vmx->nested.l2_state->launched;
+

Can you explain why the ->launched logic is needed?

It is possible L1 called vmlaunch but we didn't actually run L2 (for example, there was an interrupt and enable_irq_window switched back to L1 before running L2). L1 thinks the vmlaunch was successful and calls vmresume the next time, but KVM needs to call vmlaunch for L2.

handle_vmlaunch() and handle_vmresume() are exactly the same. Why does KVM need to run one and not the other?

Yes, they are very similar (almost the same code); the only difference is the check of vmclear. We need to emulate the vmx hardware behavior for those two commands and check the VMCS(12) state.

+static int nested_vmx_vmexit(struct kvm_vcpu *vcpu,
+			     bool is_interrupt)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int initial_pfu_active = vcpu->fpu_active;
+
+	if (!vmx->nested.nested_mode) {
+		printk(KERN_INFO "WARNING: %s called but not in nested mode\n",
+		       __func__);
+		return 0;
+	}
+
+	save_msrs(vmx->guest_msrs, vmx->save_nmsrs);
+
+	sync_cached_regs_to_vmcs(vcpu);
+
+	if (!nested_map_shadow_vmcs(vcpu)) {
+		printk(KERN_INFO "Error mapping shadow vmcs\n");
+		set_rflags_to_vmx_fail_valid(vcpu);

An error during vmexit should set the abort flag, not change rflags.

I think this is more a vmlaunch/vmresume error (in the code that switches back to L1).

How is this a vmlaunch/vmresume error?
This function is called to exit from the L2 guest while on the L2 vmcs. It is called asynchronously with respect to the L2 guest, and you modify the L2 guest's rflags register at an unpredictable place here.

OK.

--
	Gleb.
Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume
On Thu, Oct 15, 2009 at 04:41:46PM +0200, or...@il.ibm.com wrote:
From: Orit Wasserman or...@il.ibm.com

---
 arch/x86/kvm/vmx.c | 1173 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 1148 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6a4c252..e814029 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -209,6 +209,7 @@ struct __attribute__ ((__packed__)) level_state {
 	struct vmcs *vmcs;
 	int cpu;
 	int launched;
+	bool first_launch;
 };

 struct nested_vmx {
@@ -216,6 +217,12 @@ struct nested_vmx {
 	bool vmxon;
 	/* What is the location of the vmcs l1 keeps for l2? (in level1 gpa) */
 	u64 vmptr;
+	/* Are we running nested guest */
+	bool nested_mode;
+	/* L1 requested VMLAUNCH or VMRESUME but we didn't run L2 yet */
+	bool nested_run_pending;
+	/* flag indicating if there was a valid IDT after exiting from l2 */
+	bool nested_valid_idt;
 	/*
 	 * Level 2 state : includes vmcs,registers and
 	 * a copy of vmcs12 for vmread/vmwrite
@@ -240,6 +247,10 @@ static inline int vmcs_field_length(unsigned long field)
 	return (VMCS_FIELD_LENGTH_MASK & field) >> 13;
 }

+#define NESTED_VM_EXIT_CONTROLS_MASK (~(VM_EXIT_LOAD_IA32_PAT | \
+					VM_EXIT_SAVE_IA32_PAT))
+#define NESTED_VM_ENTRY_CONTROLS_MASK (~(VM_ENTRY_LOAD_IA32_PAT | \
+					 VM_ENTRY_IA32E_MODE))
 struct vmcs {
 	u32 revision_id;
 	u32 abort;
@@ -303,6 +314,12 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
 	return container_of(vcpu, struct vcpu_vmx, vcpu);
 }

+static inline struct shadow_vmcs *get_shadow_vmcs(struct kvm_vcpu *vcpu)
+{
+	WARN_ON(!to_vmx(vcpu)->nested.l2_state->shadow_vmcs);
+	return to_vmx(vcpu)->nested.l2_state->shadow_vmcs;
+}
+
 #define SHADOW_VMCS_OFFSET(x) offsetof(struct shadow_vmcs, x)

 static unsigned short vmcs_field_to_offset_table[HOST_RIP+1] = {
@@ -822,8 +839,16 @@ static struct kvm_vmx_segment_field {
 static void ept_save_pdptrs(struct kvm_vcpu *vcpu);
 static int nested_vmx_check_permission(struct kvm_vcpu *vcpu);
+static int nested_vmx_check_exception(struct vcpu_vmx *vmx, unsigned nr,
+				      bool has_error_code, u32 error_code);
+static int nested_vmx_intr(struct kvm_vcpu *vcpu);
 static int create_l1_state(struct kvm_vcpu *vcpu);
 static int create_l2_state(struct kvm_vcpu *vcpu);
+static int launch_guest(struct kvm_vcpu *vcpu);
+static int nested_vmx_exit_handled_msr(struct kvm_vcpu *vcpu);
+static int nested_vmx_exit_handled(struct kvm_vcpu *vcpu, bool kvm_override);
+static int nested_vmx_vmexit(struct kvm_vcpu *vcpu,
+			     bool is_interrupt);

 /*
  * Keep MSR_K6_STAR at the end, as setup_msrs() will try to optimize it
@@ -940,6 +965,18 @@ static inline bool cpu_has_vmx_ept_2m_page(void)
 	return !!(vmx_capability.ept & VMX_EPT_2MB_PAGE_BIT);
 }

+static inline int is_exception(u32 intr_info)
+{
+	return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
+		== (INTR_TYPE_HARD_EXCEPTION | INTR_INFO_VALID_MASK);
+}
+
+static inline int is_nmi(u32 intr_info)
+{
+	return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
+		== (INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK);
+}
+
 static inline int cpu_has_vmx_invept_individual_addr(void)
 {
 	return !!(vmx_capability.ept & VMX_EPT_EXTENT_INDIVIDUAL_BIT);
@@ -990,6 +1027,51 @@ static inline bool report_flexpriority(void)
 	return flexpriority_enabled;
 }

+static inline int nested_cpu_has_vmx_tpr_shadow(struct kvm_vcpu *vcpu)
+{
+	return cpu_has_vmx_tpr_shadow() &&
+		get_shadow_vmcs(vcpu)->cpu_based_vm_exec_control &
+		CPU_BASED_TPR_SHADOW;
+}
+
+static inline int nested_cpu_has_secondary_exec_ctrls(struct kvm_vcpu *vcpu)
+{
+	return cpu_has_secondary_exec_ctrls() &&
+		get_shadow_vmcs(vcpu)->cpu_based_vm_exec_control &
+		CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
+}
+
+static inline bool nested_vm_need_virtualize_apic_accesses(struct kvm_vcpu
+							   *vcpu)
+{
+	return get_shadow_vmcs(vcpu)->secondary_vm_exec_control &
+		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+}
+
+static inline int nested_cpu_has_vmx_ept(struct kvm_vcpu *vcpu)
+{
+	return get_shadow_vmcs(vcpu)->
+		secondary_vm_exec_control & SECONDARY_EXEC_ENABLE_EPT;
+}
+
+static inline int nested_cpu_has_vmx_vpid(struct kvm_vcpu *vcpu)
+{
+	return get_shadow_vmcs(vcpu)->secondary_vm_exec_control &
+		SECONDARY_EXEC_ENABLE_VPID;
+}
+
+static inline int nested_cpu_has_vmx_pat(struct kvm_vcpu *vcpu)
+{
+	return get_shadow_vmcs(vcpu)->vm_entry_controls &
Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume
On 10/15/2009 11:41 PM, or...@il.ibm.com wrote:
From: Orit Wasserman or...@il.ibm.com

---
 arch/x86/kvm/vmx.c | 1173 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 1148 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6a4c252..e814029 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -209,6 +209,7 @@ struct __attribute__ ((__packed__)) level_state {
 	struct vmcs *vmcs;
 	int cpu;
 	int launched;
+	bool first_launch;
 };

 struct nested_vmx {
@@ -216,6 +217,12 @@ struct nested_vmx {
 	bool vmxon;
 	/* What is the location of the vmcs l1 keeps for l2? (in level1 gpa) */
 	u64 vmptr;
+	/* Are we running nested guest */
+	bool nested_mode;
+	/* L1 requested VMLAUNCH or VMRESUME but we didn't run L2 yet */
+	bool nested_run_pending;
+	/* flag indicating if there was a valid IDT after exiting from l2 */
+	bool nested_valid_idt;

Did you mean valid_idt_vectoring_info?

No need to prefix everything with nested_ inside nested_vmx.

+void prepare_vmcs_12(struct kvm_vcpu *vcpu)
+{
+	struct shadow_vmcs *l2_shadow_vmcs =
+		get_shadow_vmcs(vcpu);
+
+	l2_shadow_vmcs->guest_es_selector = vmcs_read16(GUEST_ES_SELECTOR);
+	l2_shadow_vmcs->guest_cs_selector = vmcs_read16(GUEST_CS_SELECTOR);
+	l2_shadow_vmcs->guest_ss_selector = vmcs_read16(GUEST_SS_SELECTOR);
+	l2_shadow_vmcs->guest_ds_selector = vmcs_read16(GUEST_DS_SELECTOR);
+	l2_shadow_vmcs->guest_fs_selector = vmcs_read16(GUEST_FS_SELECTOR);
+	l2_shadow_vmcs->guest_gs_selector = vmcs_read16(GUEST_GS_SELECTOR);
+	l2_shadow_vmcs->guest_ldtr_selector = vmcs_read16(GUEST_LDTR_SELECTOR);
+	l2_shadow_vmcs->guest_tr_selector = vmcs_read16(GUEST_TR_SELECTOR);
+
+	l2_shadow_vmcs->tsc_offset = vmcs_read64(TSC_OFFSET);
+	l2_shadow_vmcs->guest_physical_address =
+		vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+	l2_shadow_vmcs->vmcs_link_pointer = vmcs_read64(VMCS_LINK_POINTER);

Physical addresses need translation, no?
+	l2_shadow_vmcs->guest_cr0 = vmcs_readl(GUEST_CR0);
+
+	l2_shadow_vmcs->guest_cr4 = vmcs_readl(GUEST_CR4);

We don't allow the guest to modify these, so no need to read them. If you do, you need to remove the bits that we modify.

+
+int load_vmcs_common(struct shadow_vmcs *src)
+{
+
+	vmcs_write64(VMCS_LINK_POINTER, src->vmcs_link_pointer);

Why load this?

+	vmcs_write64(GUEST_IA32_DEBUGCTL, src->guest_ia32_debugctl);

I think some features there are dangerous.

+	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, src->vm_entry_msr_load_count);

Need to verify? Also need to validate the loaded MSRs and run them through kvm_set_msr() instead of letting the cpu do it.