On 2013-03-07 09:12, Jan Kiszka wrote:
> On 2013-03-07 08:51, Gleb Natapov wrote:
>> On Mon, Mar 04, 2013 at 08:40:29PM +0100, Jan Kiszka wrote:
>>> The logic for calculating the value with which we call kvm_set_cr0/4 was
>>> broken (will definitely be visible with nested unrestricted guest mode
>>> support). Also, we performed the check regarding CR0_ALWAYSON too early
>>> when in guest mode.
>>>
>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned
>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and
>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus,
>>> are not suited as input.
>>>
>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and
>>> refuse the update if it fails. To be fully consistent, we implement this
>>> check now also for CR4.
>>>
>>> Finally, we have to set the shadow to the value L2 wanted to write
>>> originally.
>>>
>>> Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
>>> ---
>>>
>>> Changes in v2:
>>>  - keep the non-misleading part of the comment in handle_set_cr0
>>>
>>>  arch/x86/kvm/vmx.c |   46 +++++++++++++++++++++++++++++++---------------
>>>  1 files changed, 31 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index 7cc566b..832b7b4 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -4605,37 +4605,53 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
>>>  /* called to set cr0 as appropriate for a mov-to-cr0 exit. */
>>>  static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
>>>  {
>>> -    if (to_vmx(vcpu)->nested.vmxon &&
>>> -        ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON))
>>> -        return 1;
>>> -
>>>      if (is_guest_mode(vcpu)) {
>>> +        struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
>>> +        unsigned long orig_val = val;
>>> +
>>>          /*
>>>           * We get here when L2 changed cr0 in a way that did not change
>>>           * any of L1's shadowed bits (see nested_vmx_exit_handled_cr),
>>> -         * but did change L0 shadowed bits. This can currently happen
>>> -         * with the TS bit: L0 may want to leave TS on (for lazy fpu
>>> -         * loading) while pretending to allow the guest to change it.
>>> +         * but did change L0 shadowed bits.
>>>           */
>>> -        if (kvm_set_cr0(vcpu, (val & vcpu->arch.cr0_guest_owned_bits) |
>>> -             (vcpu->arch.cr0 & ~vcpu->arch.cr0_guest_owned_bits)))
>>> +        val = (val & ~vmcs12->cr0_guest_host_mask) |
>>> +            (vmcs_read64(GUEST_CR0) & vmcs12->cr0_guest_host_mask);
>> I think using GUEST_CR0 here is incorrect. It contains combination of bits
>> set by L2, L1 and L0 and here we need to get only L2/L1 mix which is in
>> vcpu->arch.cr0 (almost, but good enough for this case). Why vcpu->arch.cr0
>> contains right L2/L1 mix?
>
> L0/L1. E.g., kvm_set_cr0 unconditionally injects X86_CR0_ET and masks
> out reserved bits. But you are right, GUEST_CR0 isn't much better. And
> maybe that mangling in kvm_set_cr0 is a corner case we can ignore.
>
>> Because it was set to vmcs12->guest_cr0 during
>> L2 #vmentry. While L2 is running three things may happen to CR0:
>>
>> 1. L2 writes to a bit that is not shadowed neither by L1 nor by L0. It
>>    will go strait to GUEST_CR0.
>> 2. L2 writes to a bit shadowed by L1. L1 #vmexit will be emulated. On the
>>    next #vmetry vcpu->arch.cr0 will be set to whatever value L1 calculated.
>> 3. L2 writes to a bit shadowed by L0, but not L1. This is the case we
>>    are handling here. And if we will do it right vcpu->arch.cr0 will be
>>    up-to-date at the end.
>>
>> The only case when, while this code running, vcpu->arch.cr0 has not
>> up-to-date value is if 1 happened, but since L2 guest overwriting cr0
>> here anyway it does not matter what it previously set in GUEST_CR0. The
>> correct bits are in a new cr0 value provided by val and accessible by
>> (val & ~vmcs12->cr0_guest_host_mask).
>
> I need to think about it again. Maybe vmcs12->guest_cr0 is best, but
> that's a shot from the hips now.
Yes, vmcs12->guest_cr0/4 is the correct input. We cannot tolerate picking
up L0-mangled bits from arch.cr0/4, as we will perform checks on the
result and report errors back to the guest. Do you agree?

Marcelo, can you drop this version of the patch (I do not see it in
public git yet) and wait for a v3 that I will send later? Otherwise I
will fix it on top.

Thanks,
Jan
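For illustration only, the merge-then-check idea discussed in this thread could be sketched roughly as follows. This is a standalone toy, not the v3 patch: the helper names ex_merge_cr and ex_cr_allowed and the EX_CR0_* constants are hypothetical, l1_guest_cr stands in for vmcs12->guest_cr0/4, l1_mask for vmcs12->cr0/4_guest_host_mask, and the always-on value is only an assumed example of what VMXON_CR0_ALWAYSON might contain.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative CR0 bits; EX_CR0_ALWAYSON is an assumed stand-in for
 * VMXON_CR0_ALWAYSON, not a value taken from the patch. */
#define EX_CR0_PE UINT64_C(0x00000001)
#define EX_CR0_NE UINT64_C(0x00000020)
#define EX_CR0_PG UINT64_C(0x80000000)
#define EX_CR0_ALWAYSON (EX_CR0_PE | EX_CR0_NE | EX_CR0_PG)

/*
 * Hypothetical helper: bits set in l1_mask are owned by L1, so they are
 * taken from L1's view of the register (l1_guest_cr); all other bits
 * come from the value L2 tried to write (val).
 */
static uint64_t ex_merge_cr(uint64_t val, uint64_t l1_mask,
                            uint64_t l1_guest_cr)
{
    return (val & ~l1_mask) | (l1_guest_cr & l1_mask);
}

/* Hypothetical helper: the update is acceptable only if every always-on
 * bit is set in the merged value. */
static bool ex_cr_allowed(uint64_t merged, uint64_t always_on)
{
    return (merged & always_on) == always_on;
}
```

With L1 owning PE and PG and holding PE|NE|PG in its view, an L2 write of NE|TS merges to PE|NE|TS|PG and passes the check, while an L2 write of TS alone loses NE and would be refused.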