Il 05/03/2013 10:37, Gleb Natapov ha scritto:
> > > > Not at all. I'm keeping the state in a single place, mp_state. I just
> > > > have to make sure that I do not loose asynchronous events - what INIT
> > > > and SIPI are.
> > >
> > > As evident from this code:
> > >  +           if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED ||
> > >  +               test_bit(KVM_REQ_INIT, &vcpu->requests)) {
> > > the state is in two places.
> > That's just to protect the content of sipi_vector during delivery.

Is that even needed?  Can we just do an unconditional:

        vcpu->arch.sipi_vector = vector;
        /* make sure sipi_vector is visible for the receiver */
        smp_wmb();
        kvm_make_request(KVM_REQ_SIPI, vcpu);
        kvm_vcpu_kick(vcpu);

and check in the request handler that we did get an INIT?

> > We could drop the complete if clause if we protected that variable 
> > differently.
> 
> I understand why the code is here. I am saying that this is the evidence
> that the state is in two places.

I agree with Gleb here.  The state is in two places.  I'm not saying that
using requests is wrong, it sounds nice especially for nested INIT.  But
there is something nasty in the patches so far.

BTW checking the requests in kvm_apic_has_interrupt seems nicer than
synchronizing in kvm_arch_cpu_runnable; it is a bit ugly to have side
effects in kvm_arch_cpu_runnable.  Then you can actually _process_
the requests only in vcpu_enter_guest and kvm_arch_vcpu_ioctl_get_mpstate().
In kvm_arch_vcpu_ioctl_put_mpstate(), KVM_MP_STATE_SIPI_RECEIVED has to
become KVM_MP_STATE_INIT_RECEIVED with KVM_REQ_SIPI set.

>>> > > 
>>>>> > >>> To overcome this we can either deprecated
>>>>> > >>> KVM_MP_STATE_INIT_RECEIVED/KVM_MP_STATE_SIPI_RECEIVED values for 
>>>>> > >>> mp_state
>>>>> > >>> (use it only for migration purposes) and use separate state in APIC
>>>>> > >>> to hold those event, like with nmi, or why not go with Paolo's 
>>>>> > >>> simple
>>>>> > >>> cmpxchg one?
>>>> > >>
>>>> > >> We need to replace most, if not all, manipulations of mp_state with
>>>> > >> cmpxchg, verifying the state transitions there. 

Let's take again the matrix I sent yesterday, fixed to make UNINIT
unreachable (that transition can only be done by userspace):

from \ to    RUNNABLE     UNINIT      INIT     HALTED       SIPI
RUNNABLE       n/a          NO        yes       yes         NO
UNINIT         NO           n/a       yes       NO          NO
INIT           NO           NO        yes       NO          yes
HALTED         yes          NO        yes       n/a         NO
SIPI           yes          NO        yes       NO          n/a

We have:

- anything -> INIT: this is asynchronous, and INIT always wins.  No need
for a cmpxchg.

- RUNNABLE -> HALTED: cmpxchg needed

- INIT -> SIPI: we can have a race between the last two interrupts in an
INIT-INIT-SIPI (or INIT-SIPI-INIT) sequence, with these two interrupts sent
by two different processors.  The same race should also be present in real
hardware, and should never happen (the BSP should send SIPIs to all other
processors).  No need for a cmpxchg.

- HALTED -> RUNNABLE: racy vs. INIT too, cmpxchg needed

- SIPI -> RUNNABLE: there is the same race with going back to INIT; no need
for a cmpxchg.


But it probably will not scale well with nested virt.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to