Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-20 Thread Avi Kivity

On 09/19/2011 06:57 PM, Marcelo Tosatti wrote:

> >  >Decrement when setting nmi_injected = false, increment when setting
> >  >nmi_injected = true, in vmx/svm.c.
> >
> >  That gives a queue length of 3: one running nmi and nmi_pending = 2.
>
> Increment through the same wrapper that will collapse the second and next,
> also used by kvm_inject_nmi.

Cannot be done outside vcpu context.

--
error compiling committee.c: too many arguments to function



Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-19 Thread Marcelo Tosatti
On Mon, Sep 19, 2011 at 06:37:35PM +0300, Avi Kivity wrote:
> On 09/19/2011 06:22 PM, Marcelo Tosatti wrote:
> >On Mon, Sep 19, 2011 at 06:09:39PM +0300, Avi Kivity wrote:
> >>  On 09/19/2011 05:54 PM, Marcelo Tosatti wrote:
> >>  >On Mon, Sep 19, 2011 at 05:30:27PM +0300, Avi Kivity wrote:
> >>  >>   On 09/19/2011 04:54 PM, Marcelo Tosatti wrote:
> >>  >>   >>>>
> >>  >>   >>>> Yes, due to NMI-blocked-by-STI.  A really touchy area.
> >>  >>   >>>And we don't need the window exit notification then? I don't
> >>  >>   >>>understand what nmi_in_progress is supposed to do here.
> >>  >>   >>
> >>  >>   >>We need the window notification in both cases.  If we're recovering
> >>  >>   >>from STI, then we don't need to collapse NMIs.  If we're completing
> >>  >>   >>an NMI handler, then we do need to collapse NMIs (since the queue
> >>  >>   >>length is two, and we just completed one).
> >>  >>   >
> >>  >>   >I don't understand what is the point with nmi_in_progress, and the
> >>  >>   >above hunk, either. Can't inject_nmi do:
> >>  >>   >
> >>  >>   >if (nmi_injected + atomic_read(nmi_pending) < 2)
> >>  >>   >   atomic_inc(nmi_pending)
> >>  >>   >
> >>  >>   >Instead of collapsing somewhere else?
> >>  >>
> >>  >>   We could.  It's not atomic though - two threads executing in
> >>  >>   parallel could raise the value to three.  Could do a cmpxchg loop
> >>  >>   does an increment bounded to two.  I guess this is a lot clearer,
> >>  >>   thanks.
> >>  >>
> >>  >>   >You'd also have to change
> >>  >>   >nmi_injected handling in arch code so its value is not "hidden", in
> >>  >>   >complete_interrupts().
> >>  >>
> >>  >>   Or maybe make raising nmi_injected not decrement nmi_pending.  So:
> >>  >>
> >>  >> nmi_pending: total number of interrupts in queue
> >>  >> nmi_injected: of these, how many are currently being injected
> >>  >>
> >>  >>   yes?
> >>  >
> >>  >Yes, at the expense of decrementing on subarch code (which is fine,
> >>  >apparently).
> >>  >
> >>
> >>  Hm, we have no place to decrement.
> >
> >Decrement when setting nmi_injected = false, increment when setting
> >nmi_injected = true, in vmx/svm.c.
> 
> That gives a queue length of 3: one running nmi and nmi_pending = 2.

Increment through the same wrapper that will collapse the second and next, also
used by kvm_inject_nmi.

> >>   We need to do that when IRET
> >>  executes, but we don't want to request an NMI window exit in the
> >>  common case of nmi_pending = 1.
> >
> >Do not enable nmi window if nmi_injected = true?
> >
> 
> We have to, since we need a back-to-back nmi if the queue length > 1
> (including the running nmi).
> 
> -- 
> error compiling committee.c: too many arguments to function


Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-19 Thread Avi Kivity

On 09/19/2011 06:22 PM, Marcelo Tosatti wrote:

> On Mon, Sep 19, 2011 at 06:09:39PM +0300, Avi Kivity wrote:
> >  On 09/19/2011 05:54 PM, Marcelo Tosatti wrote:
> >  >On Mon, Sep 19, 2011 at 05:30:27PM +0300, Avi Kivity wrote:
> >  >>   On 09/19/2011 04:54 PM, Marcelo Tosatti wrote:
> >  >>   >>>>
> >  >>   >>>> Yes, due to NMI-blocked-by-STI.  A really touchy area.
> >  >>   >>>And we don't need the window exit notification then? I don't
> >  >>   >>>understand what nmi_in_progress is supposed to do here.
> >  >>   >>
> >  >>   >>We need the window notification in both cases.  If we're recovering
> >  >>   >>from STI, then we don't need to collapse NMIs.  If we're completing
> >  >>   >>an NMI handler, then we do need to collapse NMIs (since the queue
> >  >>   >>length is two, and we just completed one).
> >  >>   >
> >  >>   >I don't understand what is the point with nmi_in_progress, and the
> >  >>   >above hunk, either. Can't inject_nmi do:
> >  >>   >
> >  >>   >if (nmi_injected + atomic_read(nmi_pending) < 2)
> >  >>   >   atomic_inc(nmi_pending)
> >  >>   >
> >  >>   >Instead of collapsing somewhere else?
> >  >>
> >  >>   We could.  It's not atomic though - two threads executing in
> >  >>   parallel could raise the value to three.  Could do a cmpxchg loop
> >  >>   does an increment bounded to two.  I guess this is a lot clearer,
> >  >>   thanks.
> >  >>
> >  >>   >You'd also have to change
> >  >>   >nmi_injected handling in arch code so its value is not "hidden", in
> >  >>   >complete_interrupts().
> >  >>
> >  >>   Or maybe make raising nmi_injected not decrement nmi_pending.  So:
> >  >>
> >  >> nmi_pending: total number of interrupts in queue
> >  >> nmi_injected: of these, how many are currently being injected
> >  >>
> >  >>   yes?
> >  >
> >  >Yes, at the expense of decrementing on subarch code (which is fine,
> >  >apparently).
> >  >
> >
> >  Hm, we have no place to decrement.
>
> Decrement when setting nmi_injected = false, increment when setting
> nmi_injected = true, in vmx/svm.c.

That gives a queue length of 3: one running nmi and nmi_pending = 2.

> >   We need to do that when IRET
> >  executes, but we don't want to request an NMI window exit in the
> >  common case of nmi_pending = 1.
>
> Do not enable nmi window if nmi_injected = true?

We have to, since we need a back-to-back nmi if the queue length > 1
(including the running nmi).
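
As a rough sketch of that rule (an assumption about how inject_pending_event()
might spell it, not the posted patch), with nmi_pending counting every queued
NMI including the one currently being handled:

    /* Sketch only: a second queued NMI means we have to learn when the
     * running handler's IRET completes, so ask for an NMI-window exit.
     * The NMI-blocked-by-STI case would need the window as well. */
    if (atomic_read(&vcpu->arch.nmi_pending) > 1)
            kvm_x86_ops->enable_nmi_window(vcpu);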


--
error compiling committee.c: too many arguments to function



Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-19 Thread Marcelo Tosatti
On Mon, Sep 19, 2011 at 06:09:39PM +0300, Avi Kivity wrote:
> On 09/19/2011 05:54 PM, Marcelo Tosatti wrote:
> >On Mon, Sep 19, 2011 at 05:30:27PM +0300, Avi Kivity wrote:
> >>  On 09/19/2011 04:54 PM, Marcelo Tosatti wrote:
> >>  >>   >>
> >>  >>   >>Yes, due to NMI-blocked-by-STI.  A really touchy area.
> >>  >>   >And we don't need the window exit notification then? I don't
> >>  >>   >understand what nmi_in_progress is supposed to do here.
> >>  >>
> >>  >>   We need the window notification in both cases.  If we're recovering
> >>  >>   from STI, then we don't need to collapse NMIs.  If we're completing
> >>  >>   an NMI handler, then we do need to collapse NMIs (since the queue
> >>  >>   length is two, and we just completed one).
> >>  >
> >>  >I don't understand what is the point with nmi_in_progress, and the above
> >>  >hunk, either. Can't inject_nmi do:
> >>  >
> >>  >if (nmi_injected + atomic_read(nmi_pending) < 2)
> >>  >  atomic_inc(nmi_pending)
> >>  >
> >>  >Instead of collapsing somewhere else?
> >>
> >>  We could.  It's not atomic though - two threads executing in
> >>  parallel could raise the value to three.  Could do a cmpxchg loop
> >>  does an increment bounded to two.  I guess this is a lot clearer,
> >>  thanks.
> >>
> >>  >You'd also have to change
> >>  >nmi_injected handling in arch code so its value is not "hidden", in
> >>  >complete_interrupts().
> >>
> >>  Or maybe make raising nmi_injected not decrement nmi_pending.  So:
> >>
> >>nmi_pending: total number of interrupts in queue
> >>nmi_injected: of these, how many are currently being injected
> >>
> >>  yes?
> >
> >Yes, at the expense of decrementing on subarch code (which is fine,
> >apparently).
> >
> 
> Hm, we have no place to decrement. 

Decrement when setting nmi_injected = false, increment when setting
nmi_injected = true, in vmx/svm.c.

>  We need to do that when IRET
> executes, but we don't want to request an NMI window exit in the
> common case of nmi_pending = 1.

Do not enable nmi window if nmi_injected = true?



Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-19 Thread Avi Kivity

On 09/19/2011 06:09 PM, Avi Kivity wrote:

> On 09/19/2011 05:54 PM, Marcelo Tosatti wrote:
>
> > On Mon, Sep 19, 2011 at 05:30:27PM +0300, Avi Kivity wrote:
> > >  On 09/19/2011 04:54 PM, Marcelo Tosatti wrote:
> > > >> >>
> > > >> >>Yes, due to NMI-blocked-by-STI.  A really touchy area.
> > > >> >And we don't need the window exit notification then? I don't
> > > >> >understand what nmi_in_progress is supposed to do here.
> > > >>
> > > >>   We need the window notification in both cases.  If we're recovering
> > > >>   from STI, then we don't need to collapse NMIs.  If we're completing
> > > >>   an NMI handler, then we do need to collapse NMIs (since the queue
> > > >>   length is two, and we just completed one).
> > > >
> > > >I don't understand what is the point with nmi_in_progress, and the
> > > >above hunk, either. Can't inject_nmi do:
> > > >
> > > >if (nmi_injected + atomic_read(nmi_pending) < 2)
> > > >  atomic_inc(nmi_pending)
> > > >
> > > >Instead of collapsing somewhere else?
> > >
> > >  We could.  It's not atomic though - two threads executing in
> > >  parallel could raise the value to three.  Could do a cmpxchg loop
> > >  does an increment bounded to two.  I guess this is a lot clearer,
> > >  thanks.
> > >
> > > >You'd also have to change
> > > >nmi_injected handling in arch code so its value is not "hidden", in
> > > >complete_interrupts().
> > >
> > >  Or maybe make raising nmi_injected not decrement nmi_pending.  So:
> > >
> > >nmi_pending: total number of interrupts in queue
> > >nmi_injected: of these, how many are currently being injected
> > >
> > >  yes?
> >
> > Yes, at the expense of decrementing on subarch code (which is fine,
> > apparently).
>
> Hm, we have no place to decrement.  We need to do that when IRET
> executes, but we don't want to request an NMI window exit in the
> common case of nmi_pending = 1.

I guess we have to change kvm_inject_nmi to run in vcpu context, where
it has access to more stuff.
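
A minimal sketch of that direction (the helper name process_nmi and the exact
limits are illustrative assumptions, not the posted patch): keep
kvm_inject_nmi() callable from any context and do the bounding from the vcpu
loop, where nmi_injected can be read safely:

    void kvm_inject_nmi(struct kvm_vcpu *vcpu)
    {
            atomic_inc(&vcpu->arch.nmi_pending);
            kvm_make_request(KVM_REQ_EVENT, vcpu);
    }

    /* Hypothetical vcpu-context helper, run before injecting events: clamp
     * the counter so the queue (plus any NMI already being injected) never
     * exceeds two. */
    static void process_nmi(struct kvm_vcpu *vcpu)
    {
            int limit = vcpu->arch.nmi_injected ? 1 : 2;

            if (atomic_read(&vcpu->arch.nmi_pending) > limit)
                    atomic_set(&vcpu->arch.nmi_pending, limit);
    }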


--
error compiling committee.c: too many arguments to function



Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-19 Thread Avi Kivity

On 09/19/2011 05:54 PM, Marcelo Tosatti wrote:

> On Mon, Sep 19, 2011 at 05:30:27PM +0300, Avi Kivity wrote:
> >  On 09/19/2011 04:54 PM, Marcelo Tosatti wrote:
> >  >>   >>
> >  >>   >>Yes, due to NMI-blocked-by-STI.  A really touchy area.
> >  >>   >And we don't need the window exit notification then? I don't understand
> >  >>   >what nmi_in_progress is supposed to do here.
> >  >>
> >  >>   We need the window notification in both cases.  If we're recovering
> >  >>   from STI, then we don't need to collapse NMIs.  If we're completing
> >  >>   an NMI handler, then we do need to collapse NMIs (since the queue
> >  >>   length is two, and we just completed one).
> >  >
> >  >I don't understand what is the point with nmi_in_progress, and the above
> >  >hunk, either. Can't inject_nmi do:
> >  >
> >  >if (nmi_injected + atomic_read(nmi_pending) < 2)
> >  >  atomic_inc(nmi_pending)
> >  >
> >  >Instead of collapsing somewhere else?
> >
> >  We could.  It's not atomic though - two threads executing in
> >  parallel could raise the value to three.  Could do a cmpxchg loop
> >  does an increment bounded to two.  I guess this is a lot clearer,
> >  thanks.
> >
> >  >You'd also have to change
> >  >nmi_injected handling in arch code so its value is not "hidden", in
> >  >complete_interrupts().
> >
> >  Or maybe make raising nmi_injected not decrement nmi_pending.  So:
> >
> >nmi_pending: total number of interrupts in queue
> >nmi_injected: of these, how many are currently being injected
> >
> >  yes?
>
> Yes, at the expense of decrementing on subarch code (which is fine,
> apparently).

Hm, we have no place to decrement.  We need to do that when IRET
executes, but we don't want to request an NMI window exit in the common
case of nmi_pending = 1.


--
error compiling committee.c: too many arguments to function



Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-19 Thread Marcelo Tosatti
On Mon, Sep 19, 2011 at 05:30:27PM +0300, Avi Kivity wrote:
> On 09/19/2011 04:54 PM, Marcelo Tosatti wrote:
> >>  >>
> >>  >>   Yes, due to NMI-blocked-by-STI.  A really touchy area.
> >>  >And we don't need the window exit notification then? I don't understand
> >>  >what nmi_in_progress is supposed to do here.
> >>
> >>  We need the window notification in both cases.  If we're recovering
> >>  from STI, then we don't need to collapse NMIs.  If we're completing
> >>  an NMI handler, then we do need to collapse NMIs (since the queue
> >>  length is two, and we just completed one).
> >
> >I don't understand what is the point with nmi_in_progress, and the above
> >hunk, either. Can't inject_nmi do:
> >
> >>  >if (nmi_injected + atomic_read(nmi_pending) < 2)
> > atomic_inc(nmi_pending)
> >
> >Instead of collapsing somewhere else?
> 
> We could.  It's not atomic though - two threads executing in
> parallel could raise the value to three.  Could do a cmpxchg loop
> does an increment bounded to two.  I guess this is a lot clearer,
> thanks.
> 
> >You'd also have to change
> >nmi_injected handling in arch code so its value is not "hidden", in
> >complete_interrupts().
> 
> Or maybe make raising nmi_injected not decrement nmi_pending.  So:
> 
>   nmi_pending: total number of interrupts in queue
>   nmi_injected: of these, how many are currently being injected
> 
> yes?

Yes, at the expense of decrementing on subarch code (which is fine,
apparently).



Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-19 Thread Avi Kivity

On 09/19/2011 04:54 PM, Marcelo Tosatti wrote:

> >  >>
> >  >>   Yes, due to NMI-blocked-by-STI.  A really touchy area.
> >  >And we don't need the window exit notification then? I don't understand
> >  >what nmi_in_progress is supposed to do here.
> >
> >  We need the window notification in both cases.  If we're recovering
> >  from STI, then we don't need to collapse NMIs.  If we're completing
> >  an NMI handler, then we do need to collapse NMIs (since the queue
> >  length is two, and we just completed one).
>
> I don't understand what is the point with nmi_in_progress, and the above
> hunk, either. Can't inject_nmi do:
>
> if (nmi_injected + atomic_read(nmi_pending) < 2)
>  atomic_inc(nmi_pending)
>
> Instead of collapsing somewhere else?

We could.  It's not atomic though - two threads executing in parallel
could raise the value to three.  Could do a cmpxchg loop that does an
increment bounded to two.  I guess this is a lot clearer, thanks.
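
For illustration, a minimal sketch of such a cmpxchg loop (hypothetical helper
name, not code from the posted patch), capping the count at two - one NMI in
service plus one queued behind it:

    static void nmi_queue_inc_bounded(struct kvm_vcpu *vcpu)
    {
            int old;

            for (;;) {
                    old = atomic_read(&vcpu->arch.nmi_pending);
                    if (old >= 2)
                            break;  /* queue full: collapse this NMI */
                    if (atomic_cmpxchg(&vcpu->arch.nmi_pending,
                                       old, old + 1) == old)
                            break;  /* increment taken */
            }
    }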



> You'd also have to change
> nmi_injected handling in arch code so its value is not "hidden", in
> complete_interrupts().

Or maybe make raising nmi_injected not decrement nmi_pending.  So:

  nmi_pending: total number of interrupts in queue
  nmi_injected: of these, how many are currently being injected

yes?

--
error compiling committee.c: too many arguments to function



Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-19 Thread Marcelo Tosatti
On Thu, Sep 15, 2011 at 08:48:58PM +0300, Avi Kivity wrote:
> On 09/15/2011 08:25 PM, Jan Kiszka wrote:
> >>
> >>  I think so.  Suppose the vcpu enters just after kvm_make_request(); it
> >>  sees KVM_REQ_EVENT and clears it, but doesn't see nmi_pending because it
> >>  wasn't set set.  Then comes a kick, the guest is reentered with
> >>  nmi_pending set but KVM_REQ_EVENT clear and sails through the check and
> >>  enters the guest.  The NMI is delayed until the next KVM_REQ_EVENT.
> >
> >That makes sense - and the old code looks more strange now.
> 
> I think it dates to the time all NMIs were synchronous.
> 
> 
>   /* try to inject new event if pending */
>    -  if (vcpu->arch.nmi_pending) {
>    +  if (atomic_read(&vcpu->arch.nmi_pending)) {
>   if (kvm_x86_ops->nmi_allowed(vcpu)) {
>    -  vcpu->arch.nmi_pending = false;
>    +  atomic_dec(&vcpu->arch.nmi_pending);
> >>>
> >>>  Here we lost NMIs in the past by overwriting nmi_pending while another
> >>>  one was already queued, right?
> >>
> >>  One place, yes.  The other is kvm_inject_nmi() - if the first nmi didn't
> >>  get picked up by the vcpu by the time the second nmi arrives, we lose
> >>  the second nmi.
> >
> >Thinking this through again, it's actually not yet clear to me what we
> >are modeling here: If two NMI events arrive almost perfectly in
> >parallel, does the real hardware guarantee that they will always cause
> >two NMI events in the CPU? Then this is required.
> 
> It's not 100% clear from the SDM, but this is what I understood from
> it.  And it's needed - the NMI handlers are now being reworked to
> handle just one NMI source (hopefully the cheapest) in the handler,
> and if we detect a back-to-back NMI, handle all possible NMI
> sources.  This optimization is needed in turn so we can use Jeremy's
> paravirt spinlock framework, which requires a sleep primitive and a
> wake-up-even-if-the-sleeper-has-interrupts-disabled primitive.  i
> thought of using HLT and NMIs respectively, but that means we need a
> cheap handler (i.e. don't go reading PMU MSRs).
> 
> >Otherwise I just lost understanding again why we were loosing NMIs here
> >and in kvm_inject_nmi (maybe elsewhere then?).
> 
> Because of that.
> 
> >>
>   if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>   inject_pending_event(vcpu);
> 
>   /* enable NMI/IRQ window open exits if needed */
>    -  if (nmi_pending)
>    +  if (atomic_read(&vcpu->arch.nmi_pending)
>    +  &&   nmi_in_progress(vcpu))
> >>>
> >>>  Is nmi_pending&&   !nmi_in_progress possible at all?
> >>
> >>  Yes, due to NMI-blocked-by-STI.  A really touchy area.
> >And we don't need the window exit notification then? I don't understand
> >what nmi_in_progress is supposed to do here.
> 
> We need the window notification in both cases.  If we're recovering
> from STI, then we don't need to collapse NMIs.  If we're completing
> an NMI handler, then we do need to collapse NMIs (since the queue
> length is two, and we just completed one).

I don't understand what is the point with nmi_in_progress, and the above
hunk, either. Can't inject_nmi do:

if (nmi_injected + atomic_read(nmi_pending) < 2)
atomic_inc(nmi_pending)

Instead of collapsing somewhere else? You'd also have to change
nmi_injected handling in arch code so its value is not "hidden", in
complete_interrupts().

> >>
> >>>  If not, what will happen next?
> >>
> >>  The NMI window will open and we'll inject the NMI.
> >
> >How will we know this? We do not request the exit, that's my worry.
> 
> I think we do? Oh, but this patch breaks it.



Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-15 Thread Avi Kivity

On 09/15/2011 08:25 PM, Jan Kiszka wrote:

> >
> >  I think so.  Suppose the vcpu enters just after kvm_make_request(); it
> >  sees KVM_REQ_EVENT and clears it, but doesn't see nmi_pending because it
> >  wasn't set set.  Then comes a kick, the guest is reentered with
> >  nmi_pending set but KVM_REQ_EVENT clear and sails through the check and
> >  enters the guest.  The NMI is delayed until the next KVM_REQ_EVENT.
>
> That makes sense - and the old code looks more strange now.

I think it dates to the time all NMIs were synchronous.

> >>>
> >>>/* try to inject new event if pending */
> >>>   -if (vcpu->arch.nmi_pending) {
> >>>   +if (atomic_read(&vcpu->arch.nmi_pending)) {
> >>>if (kvm_x86_ops->nmi_allowed(vcpu)) {
> >>>   -vcpu->arch.nmi_pending = false;
> >>>   +atomic_dec(&vcpu->arch.nmi_pending);
> >>
> >>  Here we lost NMIs in the past by overwriting nmi_pending while another
> >>  one was already queued, right?
> >
> >  One place, yes.  The other is kvm_inject_nmi() - if the first nmi didn't
> >  get picked up by the vcpu by the time the second nmi arrives, we lose
> >  the second nmi.
>
> Thinking this through again, it's actually not yet clear to me what we
> are modeling here: If two NMI events arrive almost perfectly in
> parallel, does the real hardware guarantee that they will always cause
> two NMI events in the CPU? Then this is required.

It's not 100% clear from the SDM, but this is what I understood from
it.  And it's needed - the NMI handlers are now being reworked to handle
just one NMI source (hopefully the cheapest) in the handler, and if we
detect a back-to-back NMI, handle all possible NMI sources.  This
optimization is needed in turn so we can use Jeremy's paravirt spinlock
framework, which requires a sleep primitive and a
wake-up-even-if-the-sleeper-has-interrupts-disabled primitive.  I
thought of using HLT and NMIs respectively, but that means we need a
cheap handler (i.e. don't go reading PMU MSRs).

> Otherwise I just lost understanding again why we were loosing NMIs here
> and in kvm_inject_nmi (maybe elsewhere then?).

Because of that.

> >
> >>>if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> >>>inject_pending_event(vcpu);
> >>>
> >>>/* enable NMI/IRQ window open exits if needed */
> >>>   -if (nmi_pending)
> >>>   +if (atomic_read(&vcpu->arch.nmi_pending)
> >>>   +&&   nmi_in_progress(vcpu))
> >>
> >>  Is nmi_pending && !nmi_in_progress possible at all?
> >
> >  Yes, due to NMI-blocked-by-STI.  A really touchy area.
>
> And we don't need the window exit notification then? I don't understand
> what nmi_in_progress is supposed to do here.

We need the window notification in both cases.  If we're recovering from
STI, then we don't need to collapse NMIs.  If we're completing an NMI
handler, then we do need to collapse NMIs (since the queue length is
two, and we just completed one).
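
For reference, the patch's definition of nmi_in_progress() is not quoted
anywhere in this thread; a plausible guess is that it keys off the vcpu's NMI
mask (handler still running) rather than the transient STI interrupt shadow,
something like:

    /* Hypothetical definition - an assumption, not the posted code. */
    static bool nmi_in_progress(struct kvm_vcpu *vcpu)
    {
            return kvm_x86_ops->get_nmi_mask(vcpu);
    }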



> >
> >>  If not, what will happen next?
> >
> >  The NMI window will open and we'll inject the NMI.
>
> How will we know this? We do not request the exit, that's my worry.

I think we do? Oh, but this patch breaks it.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-15 Thread Jan Kiszka
On 2011-09-15 19:02, Avi Kivity wrote:
> On 09/15/2011 07:01 PM, Jan Kiszka wrote:
>> On 2011-09-15 16:45, Avi Kivity wrote:
>>>  If simultaneous NMIs happen, we're supposed to queue the second
>>>  and next (collapsing them), but currently we sometimes collapse
>>>  the second into the first.
>>
>> Can you describe the race in a few more details here ("sometimes" sounds
>> like "I don't know when" :) )?
> 
> In this case it was "I'm in a hurry".
> 
>>>
>>>   void kvm_inject_nmi(struct kvm_vcpu *vcpu)
>>>   {
>>>  +  atomic_inc(&vcpu->arch.nmi_pending);
>>> kvm_make_request(KVM_REQ_EVENT, vcpu);
>>>  -  vcpu->arch.nmi_pending = 1;
>>
>> Does the reordering matter?
> 
> I think so.  Suppose the vcpu enters just after kvm_make_request(); it 
> sees KVM_REQ_EVENT and clears it, but doesn't see nmi_pending because it 
> wasn't set set.  Then comes a kick, the guest is reentered with 
> nmi_pending set but KVM_REQ_EVENT clear and sails through the check and 
> enters the guest.  The NMI is delayed until the next KVM_REQ_EVENT.

That makes sense - and the old code looks more strange now.

> 
>> Do we need barriers?
> 
> Yes.
> 
>>
>>>  @@ -5570,9 +5570,9 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
>>> }
>>>
>>> /* try to inject new event if pending */
>>>  -  if (vcpu->arch.nmi_pending) {
>>>  +  if (atomic_read(&vcpu->arch.nmi_pending)) {
>>> if (kvm_x86_ops->nmi_allowed(vcpu)) {
>>>  -  vcpu->arch.nmi_pending = false;
>>>  +  atomic_dec(&vcpu->arch.nmi_pending);
>>
>> Here we lost NMIs in the past by overwriting nmi_pending while another
>> one was already queued, right?
> 
> One place, yes.  The other is kvm_inject_nmi() - if the first nmi didn't 
> get picked up by the vcpu by the time the second nmi arrives, we lose 
> the second nmi.

Thinking this through again, it's actually not yet clear to me what we
are modeling here: If two NMI events arrive almost perfectly in
parallel, does the real hardware guarantee that they will always cause
two NMI events in the CPU? Then this is required.

Otherwise I just lost understanding again why we were losing NMIs here
and in kvm_inject_nmi (maybe elsewhere then?).

> 
>>> if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>>> inject_pending_event(vcpu);
>>>
>>> /* enable NMI/IRQ window open exits if needed */
>>>  -  if (nmi_pending)
>>>  +  if (atomic_read(&vcpu->arch.nmi_pending)
>>>  +  &&  nmi_in_progress(vcpu))
>>
>> Is nmi_pending && !nmi_in_progress possible at all?
> 
> Yes, due to NMI-blocked-by-STI.  A really touchy area.

And we don't need the window exit notification then? I don't understand
what nmi_in_progress is supposed to do here.

> 
>> Is it rather a BUG
>> condition?
> 
> No.
> 
>> If not, what will happen next?
> 
> The NMI window will open and we'll inject the NMI.

How will we know this? We do not request the exit, that's my worry.

>  But I think we have 
> a bug here - we should only kvm_collapse_nmis() if an NMI handler was 
> indeed running, yet we do it unconditionally.
> 
>>>
>>>  +static inline void kvm_collapse_pending_nmis(struct kvm_vcpu *vcpu)
>>>  +{
>>>  +  /* Collapse all NMIs queued while an NMI handler was running to one */
>>>  +  if (atomic_read(&vcpu->arch.nmi_pending))
>>>  +  atomic_set(&vcpu->arch.nmi_pending, 1);
>>
>> Is it OK that NMIs injected after the collapse will increment this to > 1
>> again? Or is that impossible?
>>
> 
> It's possible and okay.  We're now completing execution of IRET.  Doing 
> atomic_set() after atomic_inc() means the NMI happened before IRET 
> completed, and vice versa.  Since these events are asynchronous, we're 
> free to choose one or the other (a self-IPI-NMI just before the IRET 
> must be swallowed, and a self-IPI-NMI just after the IRET would only be 
> executed after the next time around the handler).

Need to think through this separately.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-15 Thread Avi Kivity

On 09/15/2011 07:01 PM, Jan Kiszka wrote:

> On 2011-09-15 16:45, Avi Kivity wrote:
> >  If simultaneous NMIs happen, we're supposed to queue the second
> >  and next (collapsing them), but currently we sometimes collapse
> >  the second into the first.
>
> Can you describe the race in a few more details here ("sometimes" sounds
> like "I don't know when" :) )?

In this case it was "I'm in a hurry".

> >
> >   void kvm_inject_nmi(struct kvm_vcpu *vcpu)
> >   {
> >  + atomic_inc(&vcpu->arch.nmi_pending);
> >kvm_make_request(KVM_REQ_EVENT, vcpu);
> >  - vcpu->arch.nmi_pending = 1;
>
> Does the reordering matter?

I think so.  Suppose the vcpu enters just after kvm_make_request(); it
sees KVM_REQ_EVENT and clears it, but doesn't see nmi_pending because it
wasn't set yet.  Then comes a kick, the guest is reentered with
nmi_pending set but KVM_REQ_EVENT clear and sails through the check and
enters the guest.  The NMI is delayed until the next KVM_REQ_EVENT.
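
To make the ordering concrete, a minimal sketch of the injection path being
described (the explicit barrier is an assumption about how the barrier
question below would be settled, not part of the posted patch):

    void kvm_inject_nmi(struct kvm_vcpu *vcpu)
    {
            atomic_inc(&vcpu->arch.nmi_pending);
            /* A vcpu that observes KVM_REQ_EVENT must also observe the
             * incremented counter, or the NMI is delayed as described. */
            smp_mb__after_atomic_inc();
            kvm_make_request(KVM_REQ_EVENT, vcpu);
    }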



> Do we need barriers?

Yes.

> >  @@ -5570,9 +5570,9 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
> >}
> >
> >/* try to inject new event if pending */
> >  - if (vcpu->arch.nmi_pending) {
> >  + if (atomic_read(&vcpu->arch.nmi_pending)) {
> >if (kvm_x86_ops->nmi_allowed(vcpu)) {
> >  - vcpu->arch.nmi_pending = false;
> >  + atomic_dec(&vcpu->arch.nmi_pending);
>
> Here we lost NMIs in the past by overwriting nmi_pending while another
> one was already queued, right?

One place, yes.  The other is kvm_inject_nmi() - if the first nmi didn't
get picked up by the vcpu by the time the second nmi arrives, we lose
the second nmi.

> >if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> >inject_pending_event(vcpu);
> >
> >/* enable NMI/IRQ window open exits if needed */
> >  - if (nmi_pending)
> >  + if (atomic_read(&vcpu->arch.nmi_pending)
> >  + &&  nmi_in_progress(vcpu))
>
> Is nmi_pending && !nmi_in_progress possible at all?

Yes, due to NMI-blocked-by-STI.  A really touchy area.

> Is it rather a BUG
> condition?

No.

> If not, what will happen next?

The NMI window will open and we'll inject the NMI.  But I think we have
a bug here - we should only kvm_collapse_nmis() if an NMI handler was
indeed running, yet we do it unconditionally.

> >
> >  +static inline void kvm_collapse_pending_nmis(struct kvm_vcpu *vcpu)
> >  +{
> >  + /* Collapse all NMIs queued while an NMI handler was running to one */
> >  + if (atomic_read(&vcpu->arch.nmi_pending))
> >  + atomic_set(&vcpu->arch.nmi_pending, 1);
>
> Is it OK that NMIs injected after the collapse will increment this to > 1
> again? Or is that impossible?

It's possible and okay.  We're now completing execution of IRET.  Doing
atomic_set() after atomic_inc() means the NMI happened before IRET
completed, and vice versa.  Since these events are asynchronous, we're
free to choose one or the other (a self-IPI-NMI just before the IRET
must be swallowed, and a self-IPI-NMI just after the IRET would only be
executed after the next time around the handler).



--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC] KVM: Fix simultaneous NMIs

2011-09-15 Thread Jan Kiszka
On 2011-09-15 16:45, Avi Kivity wrote:
> If simultaneous NMIs happen, we're supposed to queue the second
> and next (collapsing them), but currently we sometimes collapse
> the second into the first.

Can you describe the race in a few more details here ("sometimes" sounds
like "I don't know when" :) )?

> 
> Fix by using a counter for pending NMIs instead of a bool; collapsing
> happens when the NMI window reopens.
> 
> Signed-off-by: Avi Kivity 
> ---
> 
> Not sure whether this interacts correctly with NMI-masked-by-STI or with
> save/restore.
> 
>  arch/x86/include/asm/kvm_host.h |2 +-
>  arch/x86/kvm/svm.c  |1 +
>  arch/x86/kvm/vmx.c  |3 ++-
>  arch/x86/kvm/x86.c  |   33 +++--
>  arch/x86/kvm/x86.h  |7 +++
>  5 files changed, 26 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 6ab4241..3a95885 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -413,7 +413,7 @@ struct kvm_vcpu_arch {
>   u32  tsc_catchup_mult;
>   s8   tsc_catchup_shift;
>  
> - bool nmi_pending;
> + atomic_t nmi_pending;
>   bool nmi_injected;
>  
>   struct mtrr_state_type mtrr_state;
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index e7ed4b1..d4c792f 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -3609,6 +3609,7 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
>   if ((svm->vcpu.arch.hflags & HF_IRET_MASK)
>   && kvm_rip_read(&svm->vcpu) != svm->nmi_iret_rip) {
>   svm->vcpu.arch.hflags &= ~(HF_NMI_MASK | HF_IRET_MASK);
> + kvm_collapse_pending_nmis(&svm->vcpu);
>   kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
>   }
>  
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a0d6bd9..745dadb 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4761,6 +4761,7 @@ static int handle_nmi_window(struct kvm_vcpu *vcpu)
>   cpu_based_vm_exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
>   vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
>   ++vcpu->stat.nmi_window_exits;
> + kvm_collapse_pending_nmis(vcpu);
>   kvm_make_request(KVM_REQ_EVENT, vcpu);
>  
>   return 1;
> @@ -5790,7 +5791,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>   if (vmx_interrupt_allowed(vcpu)) {
>   vmx->soft_vnmi_blocked = 0;
>   } else if (vmx->vnmi_blocked_time > 10LL &&
> -vcpu->arch.nmi_pending) {
> +atomic_read(&vcpu->arch.nmi_pending)) {
>   /*
>* This CPU don't support us in finding the end of an
>* NMI-blocked window if the guest runs with IRQs
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6b37f18..d4f45e0 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -359,8 +359,8 @@ void kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
>  
>  void kvm_inject_nmi(struct kvm_vcpu *vcpu)
>  {
> + atomic_inc(&vcpu->arch.nmi_pending);
>   kvm_make_request(KVM_REQ_EVENT, vcpu);
> - vcpu->arch.nmi_pending = 1;

Does the reordering matter? Do we need barriers?

>  }
>  EXPORT_SYMBOL_GPL(kvm_inject_nmi);
>  
> @@ -2844,7 +2844,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>   KVM_X86_SHADOW_INT_MOV_SS | KVM_X86_SHADOW_INT_STI);
>  
>   events->nmi.injected = vcpu->arch.nmi_injected;
> - events->nmi.pending = vcpu->arch.nmi_pending;
> + events->nmi.pending = atomic_read(&vcpu->arch.nmi_pending) != 0;
>   events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
>   events->nmi.pad = 0;
>  
> @@ -2878,7 +2878,7 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>  
>   vcpu->arch.nmi_injected = events->nmi.injected;
>   if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
> - vcpu->arch.nmi_pending = events->nmi.pending;
> + atomic_set(&vcpu->arch.nmi_pending, events->nmi.pending);
>   kvm_x86_ops->set_nmi_mask(vcpu, events->nmi.masked);
>  
>   if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR)
> @@ -4763,7 +4763,7 @@ int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip)
>   kvm_set_rflags(vcpu, ctxt->eflags);
>  
>   if (irq == NMI_VECTOR)
> - vcpu->arch.nmi_pending = false;
> + atomic_set(&vcpu->arch.nmi_pending, 0);
>   else
>   vcpu->arch.interrupt.pending = false;
>  
> @@ -5570,9 +5570,9 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
>   }
>  
>   /* try to inject new event if pending */
> - if (vcpu->arch.nmi_pending) {
> + if (atomic_read(&vcpu->arch.nmi_pending)) {
>   if (kvm_x86_ops->nmi_allo