Re: [PATCH 18/24] Exiting from L2 to L1

Avi Kivity Sun, 12 Sep 2010 10:21:49 -0700

 On 09/12/2010 07:05 PM, Nadav Har'El wrote:

+       vmcs12->guest_ia32_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);

Without msr bitmaps, cannot change.

I added a TODO before this (and a couple of others) for future
optimization.
I'm not even convinced how much quicker it is to check the MSR bitmap
before
doing vmcs_read64, vs just to going ahead and vmreading it in any case.

IIRC we don't support msr bitmaps now, so no need to check anything.

I do think we support msr bitmaps... E.g., we have
nested_vmx_exit_handled_msr() to check whether L1 requires an exit for a
certain MSR access.  Where don't we support them? (but I'm not denying the
possiblity that this support still has holes or bugs).


I was just talking from memory.  If you do support them, that's great.

Note that kvm itself doesn't support give the guest control ofDEBUGCTLMSR, so you should just be able to read it from the shadow value(which strangely doesn't exist - I'll post a fix).

In general, avoid vmcs reads as much as possible.  Just think of your
code running in a guest!

Yes. On the other hand, I don't want to be sorry in the future when I want
to support some feature, but because I wanted to shave off 1% of the L2->L1
switching time, and 0.01% of the total run time (and I'm just making
numbers up...), I now need to find a dozen places where things need to change
to support this feature. On the other hand, this will likely happen anyway ;-)

Well, with msrs you have two cases: those which the guest controls andthose which are shadowed. So all we need is a systematic way fordealing with the two types.

+       vmcs12->vm_entry_intr_info_field =
+               vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);

Autocleared, no need to read.

Well, we need to clear the "valid bit" on exit, so we don't mistakenly
inject
the same interrupt twice.

I don't think so.  We write this field as part of guest entry (that is,
after the decision re which vmcs to use, yes?), so guest entry will
always follow writing this field.  Since guest entry clears the field,
reading it after an exit will necessarily return 0.

Well, you obviously know the KVM code much better than I do, but from what
I saw, I thought (maybe I misunderstood) that in normal (non-nested) KVM,
this field only gets written on injection, not on every entry, so the code
relies on the fact that the processor turns off the "valid" bit during exit,
to avoid the same event being injected when the same field value is used for
another entry.


Correct.

I can only find code which resets this field in vmx_vcpu_reset(),
but that doesn't get called on every entry, right? Or am I missing something?

prepare_vmcs12() is called in response for a 2->1 vmexit which is firsttrapped by 0, yes? Because it's called immediately after a vmexit,VM_ENTRY_INTR_INFO_FIELD is guaranteed to have been cleared by theprocessor.

There are two cases where VM_ENTRY_INTR_INFO_FIELD can potentially notbe cleared by hardware:

1. if we call prepare_vmcs12() between injection and entry. This cannothappen AFAICT.2. if the vmexit was really a failed 1->2 vmentry, and if the processordoesn't clear VM_ENTRY_INTR_INFO_FIELD in response to vm entry failures(need to check scripture)

If neither of these are valid, the code can be removed. If only thesecond, we might make it conditional.

What can happen is that the contents of the field is transferred to the
IDT_VECTORING_INFO field or VM_EXIT_INTR_INFO field.

(question: on a failed vmentry, is this field cleared?)

I don't know the answer :-)


Sheng?

There were two ways to do it: 1. clear it ourselves,
or 2. copy the value from vmcs02 where the processor already cleared it.
There are pros and cons for each approach, but I'll change like you
suggest,
to clear it ourselves:

        vmcs12->vm_entry_intr_info_field&= ~INTR_INFO_VALID_MASK;

That's really a temporary variable, I don't think you need to touch it.

But we need to emulate the correct VMX behavior. According to the spec, the
"valid" bit on this field needs to be cleared on vmexit, so we need to do it
also on emulated exits from L2 to L1. If we're sure that we already cleared it
earlier, then fine, but if not (and like I said, I failed to find this code),
we need to do it now, on exit - either by explictly clearing the bit or by
copying a value where the processor cleared this bit (arguably, the former is
more correct emulation).

Sorry, I misread it as vmx->idt_vectoring_info which is a temporaryvariable used to cache IDT_VECTORING_INFO. Ignore my remark.

I didn't mean register independent helper; one function for cr0 and one
function for cr4 so the reader can just see the name and pretend to
understand what it does, instead of seeing a bunch of incomprehensible
bitwise operations.

Ok, done:

/*
  * On a nested exit from L2 to L1, vmcs12.guest_cr0 might not be up-to-date
  * because L2 may have changed some cr0 bits directly (see CRO_GUEST_HOST_MASK)
  * without L0 trapping the change and updating vmcs12.
  * This function returns the value we should put in vmcs12.guest_cr0. It's not
  * enough to just return the current (vmcs02) GUEST_CR0. This may not be the
  * guest cr0 that L1 thought it was giving its L2 guest - it is possible that
  * L1 wished to allow its guest to set a cr0 bit directly, but we (L0) asked
  * to trap this change and instead set just the read shadow. If this is the
  * case, we need to copy these read-shadow bits back to vmcs12.guest_cr0, where
  * L1 believes they already are.
  */
static inline unsigned long
vmcs12_guest_cr0(struct kvm_vcpu *vcpu, struct vmcs_fields *vmcs12)


newline...

{
        unsigned long guest_cr0_bits =
                vcpu->arch.cr0_guest_owned_bits | vmcs12->cr0_guest_host_mask;
        return (vmcs_readl(GUEST_CR0)&  guest_cr0_bits) |
                (vmcs_readl(CR0_READ_SHADOW)&  ~guest_cr0_bits);
}

and the call becomes just:

        vmcs12->guest_cr0 = vmcs12_guest_cr0(vcpu, vmcs12);

which is easy to glance over (but doesn't say much about what it is doing).


It's a little easier to digest, at least.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 18/24] Exiting from L2 to L1

Reply via email to