On 2013-07-02 17:15, Gleb Natapov wrote:
> On Tue, Jul 02, 2013 at 04:28:56PM +0200, Jan Kiszka wrote:
>> On 2013-07-02 15:59, Gleb Natapov wrote:
>>> On Tue, Jul 02, 2013 at 03:01:24AM +0000, Zhang, Yang Z wrote:
>>>> Since this series is pending in mail list for long time. And it's really a 
>>>> big feature for Nested. Also, I doubt the original authors(Jun and 
>>>> Nahav)should not have enough time to continue it. So I will pick it up. :)
>>>>
>>>> See comments below:
>>>>
>>>> Paolo Bonzini wrote on 2013-05-20:
>>>>> Il 19/05/2013 06:52, Jun Nakajima ha scritto:
>>>>>> From: Nadav Har'El <n...@il.ibm.com>
>>>>>>
>>>>>> Recent KVM, since
>>>>> http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577
>>>>>> switch the EFER MSR when EPT is used and the host and guest have 
>>>>>> different
>>>>>> NX bits. So if we add support for nested EPT (L1 guest using EPT to run 
>>>>>> L2)
>>>>>> and want to be able to run recent KVM as L1, we need to allow L1 to use 
>>>>>> this
>>>>>> EFER switching feature.
>>>>>>
>>>>>> To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER if
>>>>> available,
>>>>>> and if it isn't, it uses the generic VM_ENTRY/EXIT_MSR_LOAD. This patch 
>>>>>> adds
>>>>>> support for the former (the latter is still unsupported).
>>>>>>
>>>>>> Nested entry and exit emulation (prepare_vmcs_02 and
>>>>> load_vmcs12_host_state,
>>>>>> respectively) already handled VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So
>>>>> all
>>>>>> that's left to do in this patch is to properly advertise this feature to 
>>>>>> L1.
>>>>>>
>>>>>> Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by L0, by
>>>>> using
>>>>>> vmx_set_efer (which itself sets one of several vmcs02 fields), so we 
>>>>>> always
>>>>>> support this feature, regardless of whether the host supports it.
>>>>>>
>>>>>> Signed-off-by: Nadav Har'El <n...@il.ibm.com>
>>>>>> Signed-off-by: Jun Nakajima <jun.nakaj...@intel.com>
>>>>>> Signed-off-by: Xinhao Xu <xinhao...@intel.com>
>>>>>> ---
>>>>>>  arch/x86/kvm/vmx.c | 23 ++++++++++++++++-------
>>>>>>  1 file changed, 16 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>>> index 260a919..fb9cae5 100644
>>>>>> --- a/arch/x86/kvm/vmx.c
>>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>>> @@ -2192,7 +2192,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>>>>>>  #else
>>>>>>          nested_vmx_exit_ctls_high = 0;
>>>>>>  #endif
>>>>>> -        nested_vmx_exit_ctls_high |=
>>>>> VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
>>>>>> +        nested_vmx_exit_ctls_high |=
>>>>> (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
>>>>>> +                                      VM_EXIT_LOAD_IA32_EFER);
>>>>>>
>>>>>>          /* entry controls */
>>>>>>          rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
>>>>>> @@ -2201,8 +2202,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>>>>>>          nested_vmx_entry_ctls_low =
>>>>> VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
>>>>>>          nested_vmx_entry_ctls_high &=
>>>>>>                  VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_IA32E_MODE;
>>>>>> -        nested_vmx_entry_ctls_high |=
>>>>> VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
>>>>>> -
>>>>>> +        nested_vmx_entry_ctls_high |=
>>>>> (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR |
>>>>>> +                                       VM_ENTRY_LOAD_IA32_EFER);
>>>>>>          /* cpu-based controls */
>>>>>>          rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
>>>>>>                  nested_vmx_procbased_ctls_low, 
>>>>>> nested_vmx_procbased_ctls_high);
>>>>>> @@ -7492,10 +7493,18 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu,
>>>>> struct vmcs12 *vmcs12)
>>>>>>          vcpu->arch.cr0_guest_owned_bits &= ~vmcs12->cr0_guest_host_mask;
>>>>>>          vmcs_writel(CR0_GUEST_HOST_MASK,
>>>>> ~vcpu->arch.cr0_guest_owned_bits);
>>>>>>
>>>>>> -        /* Note: IA32_MODE, LOAD_IA32_EFER are modified by vmx_set_efer
>>>>> below */
>>>>>> -        vmcs_write32(VM_EXIT_CONTROLS,
>>>>>> -                vmcs12->vm_exit_controls | vmcs_config.vmexit_ctrl);
>>>>>> -        vmcs_write32(VM_ENTRY_CONTROLS, vmcs12->vm_entry_controls |
>>>>>> +        /* L2->L1 exit controls are emulated - the hardware exit is to 
>>>>>> L0 so
>>>>>> +         * we should use its exit controls. Note that IA32_MODE, 
>>>>>> LOAD_IA32_EFER
>>>>>> +         * bits are further modified by vmx_set_efer() below.
>>>>>> +         */
>>>>>> +        vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
>>>> This is wrong. We cannot use L0 exit control directly.
>>>> LOAD_PERF_GLOBAL_CTRL, LOAD_HOST_EFE, LOAD_HOST_PAT, ACK_INTR_ON_EXIT 
>>>> should use host's exit control. But others, still need use (vmcs12|host).
>>>>
>>> I do not see why. We always intercept DR7/PAT/EFER, so save is emulated
>>> too. Host address space size always come from L0 and preemption timer is
>>> not supported for nested IIRC and when it will be host will have to save
>>> it on exit anyway for correct emulation.
>>
>> Preemption timer is already supported and works fine as far as I tested.
>> KVM doesn't use it for L1, so we do not need to save/restore it - IIRC.
>>
> So what happens if L1 configures it to value X after X/2 ticks L0 exit
> happen and L0 gets back to L2 directly. The counter will be X again
> instead of X/2.

Likely. Yes, we need to improve our emulation by setting "Save
VMX-preemption timer value" or emulate this in software if the hardware
lacks support for it (was this flag introduced after the preemption
timer itself?).

Jan


Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to