On Thu, Mar 07, 2013 at 09:27:10AM +0100, Jan Kiszka wrote:
> On 2013-03-07 09:12, Jan Kiszka wrote:
> > On 2013-03-07 08:51, Gleb Natapov wrote:
> >> On Mon, Mar 04, 2013 at 08:40:29PM +0100, Jan Kiszka wrote:
> >>> The logic for calculating the value with which we call kvm_set_cr0/4 was
> >>> broken (this will definitely become visible with nested unrestricted
> >>> guest mode support). Also, we performed the CR0_ALWAYSON check too early
> >>> when in guest mode.
> >>>
> >>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned
> >>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and
> >>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus,
> >>> are not suited as input.
> >>>
> >>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and
> >>> refuse the update if it fails. To be fully consistent, we now implement
> >>> this check for CR4 as well.
> >>>
> >>> Finally, we have to set the shadow to the value L2 wanted to write
> >>> originally.
> >>>
> >>> Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
> >>> ---
> >>>
> >>> Changes in v2:
> >>>  - keep the non-misleading part of the comment in handle_set_cr0
> >>>
> >>>  arch/x86/kvm/vmx.c |   46 +++++++++++++++++++++++++++++++---------------
> >>>  1 files changed, 31 insertions(+), 15 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>> index 7cc566b..832b7b4 100644
> >>> --- a/arch/x86/kvm/vmx.c
> >>> +++ b/arch/x86/kvm/vmx.c
> >>> @@ -4605,37 +4605,53 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
> >>>  /* called to set cr0 as appropriate for a mov-to-cr0 exit. */
> >>>  static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
> >>>  {
> >>> - if (to_vmx(vcpu)->nested.vmxon &&
> >>> -     ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON))
> >>> -         return 1;
> >>> -
> >>>   if (is_guest_mode(vcpu)) {
> >>> +         struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> >>> +         unsigned long orig_val = val;
> >>> +
> >>>           /*
> >>>            * We get here when L2 changed cr0 in a way that did not change
> >>>            * any of L1's shadowed bits (see nested_vmx_exit_handled_cr),
> >>> -          * but did change L0 shadowed bits. This can currently happen
> >>> -          * with the TS bit: L0 may want to leave TS on (for lazy fpu
> >>> -          * loading) while pretending to allow the guest to change it.
> >>> +          * but did change L0 shadowed bits.
> >>>            */
> >>> -         if (kvm_set_cr0(vcpu, (val & vcpu->arch.cr0_guest_owned_bits) |
> >>> -                  (vcpu->arch.cr0 & ~vcpu->arch.cr0_guest_owned_bits)))
> >>> +         val = (val & ~vmcs12->cr0_guest_host_mask) |
> >>> +                 (vmcs_read64(GUEST_CR0) & vmcs12->cr0_guest_host_mask);
> >> I think using GUEST_CR0 here is incorrect. It contains a combination of
> >> bits set by L2, L1 and L0, while here we need only the L2/L1 mix, which
> >> is in vcpu->arch.cr0 (almost, but good enough for this case). Why does
> >> vcpu->arch.cr0 contain the right L2/L1 mix?
> > 
> > An L0/L1 mix, rather. E.g., kvm_set_cr0 unconditionally injects X86_CR0_ET
> > and masks out reserved bits. But you are right, GUEST_CR0 isn't much
> > better. And maybe that mangling in kvm_set_cr0 is a corner case we can
> > ignore.
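> > 
> > For reference, the mangling I mean, abbreviated from kvm_set_cr0() in
> > arch/x86/kvm/x86.c (only the two lines relevant here):
> > 
> >         cr0 |= X86_CR0_ET;              /* ET is forced on */
> >         cr0 &= ~CR0_RESERVED_BITS;      /* reserved bits are cleared */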
> > 
> >> Because it was set to vmcs12->guest_cr0 during the L2 #vmentry. While L2
> >> is running, three things may happen to CR0:
> >>
> >>  1. L2 writes to a bit that is shadowed neither by L1 nor by L0. The
> >>     write goes straight to GUEST_CR0.
> >>  2. L2 writes to a bit shadowed by L1. An L1 #vmexit will be emulated. On
> >>     the next #vmentry, vcpu->arch.cr0 will be set to whatever value L1
> >>     calculated.
> >>  3. L2 writes to a bit shadowed by L0 but not by L1. This is the case we
> >>     are handling here, and if we do it right, vcpu->arch.cr0 will be
> >>     up-to-date at the end.
> >>
> >> The only case in which vcpu->arch.cr0 does not hold an up-to-date value
> >> while this code is running is if case 1 happened, but since the L2 guest
> >> is overwriting cr0 here anyway, it does not matter what it previously set
> >> in GUEST_CR0. The correct bits are in the new cr0 value provided by val
> >> and accessible as (val & ~vmcs12->cr0_guest_host_mask).
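> >>
> >> I.e., a rough sketch of the merge I have in mind (untested), with the
> >> L1-shadowed bits taken from vcpu->arch.cr0 instead of GUEST_CR0:
> >>
> >>         val = (val & ~vmcs12->cr0_guest_host_mask) |
> >>               (vcpu->arch.cr0 & vmcs12->cr0_guest_host_mask);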
> > 
> > I need to think about it again. Maybe vmcs12->guest_cr0 is best, but
> > that's a shot from the hip right now.
> 
> Yes, vmcs12->guest_cr0/4 is the correct input. We cannot afford to pick
> up L0-mangled bits from arch.cr0/4, as we will perform checks on the
> result and report errors back to the guest. Do you agree?
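> 
> I.e., the merge in handle_set_cr0 would become something like this
> (sketch only):
> 
>         val = (val & ~vmcs12->cr0_guest_host_mask) |
>               (vmcs12->guest_cr0 & vmcs12->cr0_guest_host_mask);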
> 
See my answer to your previous email about when vmcs12->guest_cr0/4 may
contain an incorrect value. As for the mangling in arch.cr0/4, I think it
is benign, but we could do an L1 #vmexit if a reserved bit is not zero or
the ET bit is not set in the L2-provided cr0, and let L1 handle it.
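
Roughly like this (just a sketch, untested; it assumes nested_vmx_vmexit()
can be called directly at this point in handle_set_cr0):

	if (is_guest_mode(vcpu) &&
	    ((val & CR0_RESERVED_BITS) || !(val & X86_CR0_ET))) {
		/* Reflect the bogus cr0 write to L1 instead of letting
		 * kvm_set_cr0() silently fix it up behind L1's back. */
		nested_vmx_vmexit(vcpu);
		return 1;
	}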

--
                        Gleb.