Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-08 Thread Christoffer Dall
On Fri, Apr 07, 2017 at 07:47:29PM +0800, Paolo Bonzini wrote:
> 
> 
> On 06/04/2017 22:14, Christoffer Dall wrote:
> >> So it seems like there are no races after all in KVM/ARM code
> >
> > No races after Drew's fix has been applied to set vcpu->mode =
> > IN_GUEST_MODE, before checking the pause flag, correct?  (I think that's
> > what the spin model below is modeling).
> 
> Yes.  All of them include the vcpu->mode = IN_GUEST_MODE assignment and
> (implicitly because spin is sequentially consistent) the memory barrier.
> 
> >> After this experiment, I think I like Drew's KVM_REQ_PAUSE more than I did
> >> yesterday.  However, yet another alternative is to leave pause/power_off as
> >> they are, while taking some inspiration from his patch to do some cleanups:
> >>
> >> 1) change the "if"
> >>
> >> if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
> >> vcpu->arch.power_off || vcpu->arch.pause) {
> >>
> >> to test kvm_requests_pending instead of pause/power_off
> >>
> >> 2) clear KVM_REQ_VCPU_EXIT before the other "if":
> >>
> >> if (vcpu->arch.power_off || vcpu->arch.pause)
> >> vcpu_sleep(vcpu);
> >
> > I like using requests as only requests from one thread to the VCPU
> > thread, and not to maintain specific state about a VCPU.
> > 
> > The benefit of Drew's approach is that since these pieces of state are
> > boolean, you can have just a single check in the critical path in the
> > run loop instead of having to access multiple fields.
> 
> I think you'd still need two checks for KVM_REQ_PAUSE/KVM_REQ_POWEROFF:
> one to check whether to sleep, and one to check whether to abort the
> vmentry.
> 
> Pause and power_off could be merged into a single bitmask if necessary, too.

True, a sort of flags.

I think at this point, I'll let Drew decide what looks cleanest when he
writes up the next revision and review those.

Thanks for the thorough help and checking!
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-07 Thread Paolo Bonzini


On 06/04/2017 22:14, Christoffer Dall wrote:
>> So it seems like there are no races after all in KVM/ARM code
>
> No races after Drew's fix has been applied to set vcpu->mode =
> IN_GUEST_MODE, before checking the pause flag, correct?  (I think that's
> what the spin model below is modeling).

Yes.  All of them include the vcpu->mode = IN_GUEST_MODE assignment and
(implicitly because spin is sequentially consistent) the memory barrier.

>> After this experiment, I think I like Drew's KVM_REQ_PAUSE more than I did
>> yesterday.  However, yet another alternative is to leave pause/power_off as
>> they are, while taking some inspiration from his patch to do some cleanups:
>>
>> 1) change the "if"
>>
>> if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
>> vcpu->arch.power_off || vcpu->arch.pause) {
>>
>> to test kvm_requests_pending instead of pause/power_off
>>
>> 2) clear KVM_REQ_VCPU_EXIT before the other "if":
>>
>> if (vcpu->arch.power_off || vcpu->arch.pause)
>> vcpu_sleep(vcpu);
>
> I like using requests as only requests from one thread to the VCPU
> thread, and not to maintain specific state about a VCPU.
> 
> The benefit of Drew's approach is that since these pieces of state are
> boolean, you can have just a single check in the critical path in the
> run loop instead of having to access multiple fields.

I think you'd still need two checks for KVM_REQ_PAUSE/KVM_REQ_POWEROFF:
one to check whether to sleep, and one to check whether to abort the
vmentry.

Pause and power_off could be merged into a single bitmask if necessary, too.

Paolo
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-06 Thread Christoffer Dall
On Wed, Apr 05, 2017 at 01:37:10PM +0200, Paolo Bonzini wrote:
> 
> 
> On 05/04/2017 09:09, Christoffer Dall wrote:
> >>>  - In the explanation you wrote, you use the term 'we' a lot, but when
> >>>talking about SMP barriers, I think it only makes sense to talk about
> >>>actions and observations between multiple CPUs and we have to be
> >>>specific about which CPU observes or does what with respect to the
> >>>other.  Maybe I'm being a stickler here, but there something here
> >>>which is making me uneasy.
> >> The write1-mb-if(read2) / write2-mb-if(read1) pattern is pretty common,
> >> so I think it is justified to cut the ordering on the reasoning and just
> >> focus on what the two memory locations and conditions mean.
> > ok, but the pattern above was not common to me (and I'm pretty sure I'm
> > not the only fool in the bunch here), so if we can reference something
> > that explains that this is a known pattern which has been tried and
> > proven, that would be even better.
> 
> I found https://lwn.net/Articles/573436/ which shows this example:
> 
>   CPU 0   CPU 1
>   -   --
>   WRITE_ONCE(x, 1);   WRITE_ONCE(y, 1);
>   smp_mb();   smp_mb();
>   r2 = READ_ONCE(y);  r4 = READ_ONCE(x);
> 
> And says that it is a bug if r2 == 0 && r4 == 0.  This is exactly what 
> happens in KVM:
> 
>   CPU 0   CPU 1
>   -   --
>   vcpu->mode = IN_GUEST_MODE; kvm_make_request(REQ, vcpu);
>   smp_mb();   smp_mb();
>   r2 = kvm_request_pending(vcpu)  r4 = (vcpu->mode == IN_GUEST_MODE)
>   if (r2) if (r4)
>   abort entry kick();
> 
> If r2 sees no request and r4 doesn't kick there would be a bug.
> But why can't this happen?
> 
> - if no request is pending at the time of the read to r2, CPU 1 must
> not have executed kvm_make_request yet.  In CPU 0, kvm_request_pending
> must happen after vcpu->mode is set to IN_GUEST_MODE, therefore CPU 1
> will read IN_GUEST_MODE and kick.
> 
> - if no kick happens in CPU 1, CPU 0 must not have set vcpu->mode yet.
> In CPU 1, vcpu->mode is read after setting the request bit, therefore
> CPU 0 will see the request bit and abort the guest entry.
> 
> >>>  - Finally, it feels very hard to prove the correctness of this, and
> >>>equally hard to test it (given how long we've been running with
> >>>apparently racy code).  I would hope that we could abstract some of
> >>>this into architecture generic things, that someone who eat memory
> >>>barriers for breakfast could help us verify, but again, maybe this is
> >>>Radim's series I'm asking for here.
> >>
> >> What I can do here is to suggest copying the paradigms from x86, which
> >> is quite battle tested (Windows hammers it really hard).
> >
> > That sounds reasonable, but I think part of the problem was that we
> > simply didn't understand what the paradigms were (see the
> > kvm_make_all_cpus_request above as an example), so Drew's action about
> > documenting what this all is and the constraints of using it is really
> > important for me to do that.
> 
> Yes, totally agreed on that.
> 
> >> For QEMU I did use model checking in the past for some similarly hairy
> >> synchronization code, but that is really just "executable documentation"
> >> because the model is not written in C.
> >>
> > I played with using blast on some of the KVM/ARM code a long time ago,
> > and while I was able to find a bug with it, it was sort of an obvious
> > bug, and the things I was able to do with it was pretty limited to the
> > problems I could imagine myself anyhow.  Perhaps this is what you mean
> > with executable documentation.
> 
> I prepared three examples of a spin model for KVM vCPU kicking, and
> the outcome was actually pretty surprising: the mode check seems not
> to be necessary.
> 
> I haven't covered all x86 cases so I'm not going to remove it right
> ahead, but for ARM it really seems like EXITING_GUEST_MODE is nothing
> but an optimization of consecutive kvm_vcpu_kicks.
> 
> All three models can use C preprocessor #defines to inject bugs:
> 
> - kvm-arm-pause.promela: the "paused" mechanism; the model proves that
>   the "paused" test in the interrupt-disabled region is necessary
> 
> - kvm-req.promela: the requests mechanism; the model proves that
>   the requests check in the interrupt-disabled region is necessary
> 
> - kvm-x86-pi.promela: the x86 posted interrupt mechanism (simplified
>   a bit); the model proves that KVM must disable interrupts before
>   checking for interrupts injected while outside guest mode
>   (commit b95234c84004, "kvm: x86: do not use KVM_REQ_EVENT for APICv
>   interrupt injection", 2017-02-15)
> 
> So it seems like there are no races

Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-05 Thread Paolo Bonzini


On 05/04/2017 09:09, Christoffer Dall wrote:
>>>  - In the explanation you wrote, you use the term 'we' a lot, but when
>>>talking about SMP barriers, I think it only makes sense to talk about
>>>actions and observations between multiple CPUs and we have to be
>>>specific about which CPU observes or does what with respect to the
>>>other.  Maybe I'm being a stickler here, but there something here
>>>which is making me uneasy.
>> The write1-mb-if(read2) / write2-mb-if(read1) pattern is pretty common,
>> so I think it is justified to cut the ordering on the reasoning and just
>> focus on what the two memory locations and conditions mean.
> ok, but the pattern above was not common to me (and I'm pretty sure I'm
> not the only fool in the bunch here), so if we can reference something
> that explains that this is a known pattern which has been tried and
> proven, that would be even better.

I found https://lwn.net/Articles/573436/ which shows this example:

  CPU 0 CPU 1
  - --
  WRITE_ONCE(x, 1); WRITE_ONCE(y, 1);
  smp_mb(); smp_mb();
  r2 = READ_ONCE(y);r4 = READ_ONCE(x);

And says that it is a bug if r2 == 0 && r4 == 0.  This is exactly what 
happens in KVM:

  CPU 0 CPU 1
  - --
  vcpu->mode = IN_GUEST_MODE;   kvm_make_request(REQ, vcpu);
  smp_mb(); smp_mb();
  r2 = kvm_request_pending(vcpu)r4 = (vcpu->mode == IN_GUEST_MODE)
  if (r2)   if (r4)
abort entry kick();

If r2 sees no request and r4 doesn't kick there would be a bug.
But why can't this happen?

- if no request is pending at the time of the read to r2, CPU 1 must
not have executed kvm_make_request yet.  In CPU 0, kvm_request_pending
must happen after vcpu->mode is set to IN_GUEST_MODE, therefore CPU 1
will read IN_GUEST_MODE and kick.

- if no kick happens in CPU 1, CPU 0 must not have set vcpu->mode yet.
In CPU 1, vcpu->mode is read after setting the request bit, therefore
CPU 0 will see the request bit and abort the guest entry.

>>>  - Finally, it feels very hard to prove the correctness of this, and
>>>equally hard to test it (given how long we've been running with
>>>apparently racy code).  I would hope that we could abstract some of
>>>this into architecture generic things, that someone who eat memory
>>>barriers for breakfast could help us verify, but again, maybe this is
>>>Radim's series I'm asking for here.
>>
>> What I can do here is to suggest copying the paradigms from x86, which
>> is quite battle tested (Windows hammers it really hard).
>
> That sounds reasonable, but I think part of the problem was that we
> simply didn't understand what the paradigms were (see the
> kvm_make_all_cpus_request above as an example), so Drew's action about
> documenting what this all is and the constraints of using it is really
> important for me to do that.

Yes, totally agreed on that.

>> For QEMU I did use model checking in the past for some similarly hairy
>> synchronization code, but that is really just "executable documentation"
>> because the model is not written in C.
>>
> I played with using blast on some of the KVM/ARM code a long time ago,
> and while I was able to find a bug with it, it was sort of an obvious
> bug, and the things I was able to do with it was pretty limited to the
> problems I could imagine myself anyhow.  Perhaps this is what you mean
> with executable documentation.

I prepared three examples of a spin model for KVM vCPU kicking, and
the outcome was actually pretty surprising: the mode check seems not
to be necessary.

I haven't covered all x86 cases so I'm not going to remove it right
ahead, but for ARM it really seems like EXITING_GUEST_MODE is nothing
but an optimization of consecutive kvm_vcpu_kicks.

All three models can use C preprocessor #defines to inject bugs:

- kvm-arm-pause.promela: the "paused" mechanism; the model proves that
  the "paused" test in the interrupt-disabled region is necessary

- kvm-req.promela: the requests mechanism; the model proves that
  the requests check in the interrupt-disabled region is necessary

- kvm-x86-pi.promela: the x86 posted interrupt mechanism (simplified
  a bit); the model proves that KVM must disable interrupts before
  checking for interrupts injected while outside guest mode
  (commit b95234c84004, "kvm: x86: do not use KVM_REQ_EVENT for APICv
  interrupt injection", 2017-02-15)

So it seems like there are no races after all in KVM/ARM code, though
the code can still be cleaned up.  And I have been convinced of the wrong
thing all the time. :)

But why is KVM/ARM using KVM_REQ_VCPU_EXIT
just fine without checking for requests (kvm-req.promela)?  Because,
as mentioned earlier in 

Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-05 Thread Christoffer Dall
On Tue, Apr 04, 2017 at 10:10:15PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 21:04, Christoffer Dall wrote:
> >  - (On a related work, I suddenly felt it weird that
> >kvm_make_all_cpus_request() doesn't wake up sleeping VCPUs, but only
> >sends an IPI; does this mean that calling this function should be
> >followed by a kick() for each VCPU?  Maybe Radim was looking at this
> >in his series already.)
> 
> Yes, kvm_make_all_cpus_request in x86 is only used for "non urgent"
> requests, i.e. things to do before the next guest entry.
> 

Ah, another good thing to document somewhere.

> >  - In the explanation you wrote, you use the term 'we' a lot, but when
> >talking about SMP barriers, I think it only makes sense to talk about
> >actions and observations between multiple CPUs and we have to be
> >specific about which CPU observes or does what with respect to the
> >other.  Maybe I'm being a stickler here, but there something here
> >which is making me uneasy.
> 
> The write1-mb-if(read2) / write2-mb-if(read1) pattern is pretty common,
> so I think it is justified to cut the ordering on the reasoning and just
> focus on what the two memory locations and conditions mean.

ok, but the pattern above was not common to me (and I'm pretty sure I'm
not the only fool in the bunch here), so if we can reference something
that explains that this is a known pattern which has been tried and
proven, that would be even better.

> But I'd
> wait for v3, since I'm sure that Drew also understands the
> synchronization better.
> 

Yes, I'm confident that v3 will be great ;)

> >  - Finally, it feels very hard to prove the correctness of this, and
> >equally hard to test it (given how long we've been running with
> >apparently racy code).  I would hope that we could abstract some of
> >this into architecture generic things, that someone who eat memory
> >barriers for breakfast could help us verify, but again, maybe this is
> >Radim's series I'm asking for here.
> 
> What I can do here is to suggest copying the paradigms from x86, which
> is quite battle tested (Windows hammers it really hard).

That sounds reasonable, but I think part of the problem was that we
simply didn't understand what the paradigms were (see the
kvm_make_all_cpus_request above as an example), so Drew's action about
documenting what this all is and the constraints of using it is really
important for me to do that.

> 
> For QEMU I did use model checking in the past for some similarly hairy
> synchronization code, but that is really just "executable documentation"
> because the model is not written in C.
> 

I played with using blast on some of the KVM/ARM code a long time ago,
and while I was able to find a bug with it, it was sort of an obvious
bug, and the things I was able to do with it was pretty limited to the
problems I could imagine myself anyhow.  Perhaps this is what you mean
with executable documentation.  In any case, I feel it starts with
documentation.

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Paolo Bonzini


On 04/04/2017 21:04, Christoffer Dall wrote:
>  - (On a related work, I suddenly felt it weird that
>kvm_make_all_cpus_request() doesn't wake up sleeping VCPUs, but only
>sends an IPI; does this mean that calling this function should be
>followed by a kick() for each VCPU?  Maybe Radim was looking at this
>in his series already.)

Yes, kvm_make_all_cpus_request in x86 is only used for "non urgent"
requests, i.e. things to do before the next guest entry.

>  - In the explanation you wrote, you use the term 'we' a lot, but when
>talking about SMP barriers, I think it only makes sense to talk about
>actions and observations between multiple CPUs and we have to be
>specific about which CPU observes or does what with respect to the
>other.  Maybe I'm being a stickler here, but there something here
>which is making me uneasy.

The write1-mb-if(read2) / write2-mb-if(read1) pattern is pretty common,
so I think it is justified to cut the ordering on the reasoning and just
focus on what the two memory locations and conditions mean.  But I'd
wait for v3, since I'm sure that Drew also understands the
synchronization better.

>  - Finally, it feels very hard to prove the correctness of this, and
>equally hard to test it (given how long we've been running with
>apparently racy code).  I would hope that we could abstract some of
>this into architecture generic things, that someone who eat memory
>barriers for breakfast could help us verify, but again, maybe this is
>Radim's series I'm asking for here.

What I can do here is to suggest copying the paradigms from x86, which
is quite battle tested (Windows hammers it really hard).

For QEMU I did use model checking in the past for some similarly hairy
synchronization code, but that is really just "executable documentation"
because the model is not written in C.

Paolo
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Christoffer Dall
On Tue, Apr 04, 2017 at 07:57:18PM +0200, Andrew Jones wrote:
> On Tue, Apr 04, 2017 at 06:04:17PM +0200, Christoffer Dall wrote:
> > On Fri, Mar 31, 2017 at 06:06:53PM +0200, Andrew Jones wrote:
> > > This not only ensures visibility of changes to pause by using
> > > atomic ops, but also plugs a small race where a vcpu could get its
> > > pause state enabled just after its last check before entering the
> > > guest. With this patch, while the vcpu will still initially enter
> > > the guest, it will exit immediately due to the IPI sent by the vcpu
> > > kick issued after making the vcpu request.
> > > 
> > > We use bitops, rather than kvm_make/check_request(), because we
> > > don't need the barriers they provide,
> > 
> > why not?
> 
> I'll add that it's because the only state of interest is the request bit
> itself.  When the request is observable then we're good to go, no need to
> ensure that at the time the request is observable, something else is too.
> 
> > 
> > > nor do we want the side-effect
> > > of kvm_check_request() clearing the request. For pause, only the
> > > requester should do the clearing.
> > > 
> > > Signed-off-by: Andrew Jones 
> > > ---
> > >  arch/arm/include/asm/kvm_host.h   |  5 +
> > >  arch/arm/kvm/arm.c| 45 
> > > +++
> > >  arch/arm64/include/asm/kvm_host.h |  5 +
> > >  3 files changed, 33 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/arch/arm/include/asm/kvm_host.h 
> > > b/arch/arm/include/asm/kvm_host.h
> > > index 31ee468ce667..52c25536d254 100644
> > > --- a/arch/arm/include/asm/kvm_host.h
> > > +++ b/arch/arm/include/asm/kvm_host.h
> > > @@ -45,7 +45,7 @@
> > >  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
> > >  #endif
> > >  
> > > -#define KVM_REQ_VCPU_EXIT8
> > > +#define KVM_REQ_PAUSE8
> > >  
> > >  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
> > >  int __attribute_const__ kvm_target_cpu(void);
> > > @@ -173,9 +173,6 @@ struct kvm_vcpu_arch {
> > >   /* vcpu power-off state */
> > >   bool power_off;
> > >  
> > > -  /* Don't run the guest (internal implementation need) */
> > > - bool pause;
> > > -
> > >   /* IO related fields */
> > >   struct kvm_decode mmio_decode;
> > >  
> > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > > index 314eb6abe1ff..f3bfbb5f3d96 100644
> > > --- a/arch/arm/kvm/arm.c
> > > +++ b/arch/arm/kvm/arm.c
> > > @@ -94,6 +94,18 @@ struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
> > >  
> > >  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
> > >  {
> > > + /*
> > > +  * If we return true from this function, then it means the vcpu is
> > > +  * either in guest mode, or has already indicated that it's in guest
> > > +  * mode. The indication is done by setting ->mode to IN_GUEST_MODE,
> > > +  * and must be done before the final kvm_request_pending() read. It's
> > > +  * important that the observability of that order be enforced and that
> > > +  * the request receiving CPU can observe any new request before the
> > > +  * requester issues a kick. Thus, the general barrier below pairs with
> > > +  * the general barrier in kvm_arch_vcpu_ioctl_run() which divides the
> > > +  * write to ->mode and the final request pending read.
> > > +  */
> > 
> > I am having a hard time understanding this comment.  For example, I
> > don't understand the difference between 'is either in guest mode or has
> > already indicated it's in guest mode'.  Which case is which again, and
> > how are we checking for two cases below?
> > 
> > Also, the stuff about observability of an order is hard to follow, and
> > the comment assumes the reader is thinking about the specific race when
> > entering the guest.
> > 
> > I think we should focus on getting the documentation in place, refer to
> > the documentation from here, and be much more brief and say something
> > like:
> > 
> > /*
> >  * The memory barrier below pairs with the barrier in
> >  * kvm_arch_vcpu_ioctl_run() between writes to vcpu->mode
> >  * and reading vcpu->requests before entering the guest.
> >  *
> >  * Ensures that the VCPU thread's CPU can observe changes to
> >  * vcpu->requests written prior to calling this function before
> >  * it writes vcpu->mode = IN_GUEST_MODE, and correspondingly
> >  * ensures that this CPU observes vcpu->mode == IN_GUEST_MODE
> >  * only if the VCPU thread's CPU could observe writes to
> >  * vcpu->requests from this CPU.
> >  /
> > 
> > Is this correct?  I'm not really sure anymore?
> 
> It's confusing because we have cross dependencies on the negatives of
> two conditions.
> 
> Here's the cross dependencies:
> 
>   vcpu->mode = IN_GUEST_MODE;   ---   ---  kvm_make_request(REQ, vcpu);
>   smp_mb();\ / smp_mb();
> X
>/ \
>   if (kvm_request_pending(vcpu))<--   -->  if (vcpu->mode == IN_GUEST

Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Paolo Bonzini


On 04/04/2017 20:18, Andrew Jones wrote:
>> My understanding is that KVM-ARM is using KVM_REQ_VCPU_EXIT simply to
>> reuse the smp_call_function_many code in kvm_make_all_cpus_request.
>> Once you add EXITING_GUEST_MODE, ARM can just add a new function
>> kvm_kick_all_cpus and use it for both pause and power_off.
>
> I was wondering about the justification of
> 'if (vcpu->mode == EXITING_GUEST_MODE)' in the x86 code, as it seemed
> redundant to me with the requests.  I'll have another think on it to see
> if request-less kicks can be satisfied in all cases by this, as long as we
> have the mode setting, barrier, mode checking order ensured in vcpu run.

Yes, this is the justification.  You should add that to
kvm_arch_vcpu_ioctl_run to close the race window (as well as the
kvm_request_pending, just for good measure).  These two are not really
optional, they are part of how kvm_vcpu_exiting_guest_mode and requests
are supposed to work.  kvm_vcpu_exiting_guest_mode is optional, but ARM
is using it and it's a pity to undo it.

Once you have done this, you can choose whether to use requests or not
for pause and poweroff, but I think it will not be necessary.

Paolo
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Christoffer Dall
On Tue, Apr 04, 2017 at 08:15:09PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 19:57, Christoffer Dall wrote:
> >> Right.  That code does
> >>
> >> tmp->arch.power_off = true;
> >> kvm_vcpu_kick(tmp);
> >>
> >> and I think what's really missing in arm.c is the "if (vcpu->mode ==
> >> EXITING_GUEST_MODE)" check that is found in x86.c.  Then pausing can
> >> also simply use kvm_vcpu_kick.
> > I see, that's why the cmpxchg() works the way it does.  We just still
> > need to move the vcpu->mode = IN_GUEST_MODE before our
> > with-interrupts-disabled check.
> > 
> > What I'm not sure is why you can get away without using a memory barrier
> > or WRITE_ONCE on x86, but is this simply because x86 is a strongly
> > ordered architecture?
> 
> x86 does have a memory barrier:
> 
> vcpu->mode = IN_GUEST_MODE;
> 
> srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> smp_mb__after_srcu_read_unlock();

duh, the long complicated barrier version made me totally miss it.
Sorry.

> 
> /*
>  * This handles the case where a posted interrupt was
>  * notified with kvm_vcpu_kick.
>  */
> if (kvm_lapic_enabled(vcpu)) {
> if (kvm_x86_ops->sync_pir_to_irr && vcpu->arch.apicv_active)
> kvm_x86_ops->sync_pir_to_irr(vcpu);
> }
> 
> if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> 
> and WRITE_ONCE is not needed if you have a memory barrier (though I find it
> more self-documenting to use it anyway).
> 

ok, thanks.

-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Andrew Jones
On Tue, Apr 04, 2017 at 07:35:11PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 19:19, Christoffer Dall wrote:
> > On Tue, Apr 04, 2017 at 06:24:36PM +0200, Paolo Bonzini wrote:
> >>
> >>
> >> On 04/04/2017 18:04, Christoffer Dall wrote:
>  For pause, only the requester should do the clearing.
> >>
> >> This suggests that maybe this should not be a request.  The request
> >> would be just the need to act on a GIC command, exactly as before this 
> >> patch.
> > 
> > Maybe the semantics should be:
> > 
> > requester:vcpu:
> > ---
> > make_requet(vcpu, KVM_REQ_PAUSE);
> >   handles the request by
> >   clearing it and setting
> >   vcpu->pause = true;
> > wait until vcpu->pause == true
> > make_request(vcpu, KVM_REQ_UNPAUSE);
> >   vcpus 'wake up' clear the
> >   UNPAUSE request and set
> >   vcpu->pause = false;

I thought of this originally, but then decided to [ab]use the concept
of pause being a boolean and requests being bits in a bitmap.  Simpler,
but arguably not as clean.

> > 
> > The benefit would be that we get to re-use the complicated "figure out
> > the VCPU mode and whether or not we should send an IPI and get the
> > barriers right" stuff.
> 
> I don't think that's necessary.  As long as the complicated stuff avoids
> that you enter the VCPU, the next run through the loop will
> find that 'vcpu->arch.power_off || vcpu->arch.pause' is true and
> go to sleep.
> 
> >> What I don't understand is:
> >>
>  With this patch, while the vcpu will still initially enter
>  the guest, it will exit immediately due to the IPI sent by the vcpu
>  kick issued after making the vcpu request.
> >>
> >> Isn't this also true of KVM_REQ_VCPU_EXIT that was used before?

As you state below, KVM_REQ_VCPU_EXIT was getting used as a
kick-all-vcpus, but without the request/mode stuff it wasn't
sufficient for the small race window.

> >>
> >> So this:
> >>
> >> +  vcpu->arch.power_off || kvm_request_pending(vcpu)) {
> >> +  WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
> >>
> >> is the crux of the fix, you can keep using vcpu->arch.pause.
> > 
> > Probably; I feel like there's a fix here which should be a separate
> > patch from using a different requests instead of the KVM_REQ_VCPU_EXIT +
> > the pause flag.
> 
> Yeah, and then the pause flag can stay.
> 
> >> By the way, vcpu->arch.power_off can go away from this "if" too because
> >> KVM_RUN and KVM_SET_MP_STATE are mutually exclusive through the vcpu mutex.
> > 
> > But we also allow setting the power_off flag from the in-kernel PSCI
> > emulation in the context of another VCPU thread.
> 
> Right.  That code does
> 
> tmp->arch.power_off = true;
> kvm_vcpu_kick(tmp);
> 
> and I think what's really missing in arm.c is the "if (vcpu->mode ==
> EXITING_GUEST_MODE)" check that is found in x86.c.  Then pausing can
> also simply use kvm_vcpu_kick.
> 
> My understanding is that KVM-ARM is using KVM_REQ_VCPU_EXIT simply to
> reuse the smp_call_function_many code in kvm_make_all_cpus_request.
> Once you add EXITING_GUEST_MODE, ARM can just add a new function
> kvm_kick_all_cpus and use it for both pause and power_off.
> 

I was wondering about the justification of
'if (vcpu->mode == EXITING_GUEST_MODE)' in the x86 code, as it seemed
redundant to me with the requests.  I'll have another think on it to see
if request-less kicks can be satisfied in all cases by this, as long as we
have the mode setting, barrier, mode checking order ensured in vcpu run.

Thanks,
drew
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Paolo Bonzini


On 04/04/2017 19:57, Christoffer Dall wrote:
>> Right.  That code does
>>
>> tmp->arch.power_off = true;
>> kvm_vcpu_kick(tmp);
>>
>> and I think what's really missing in arm.c is the "if (vcpu->mode ==
>> EXITING_GUEST_MODE)" check that is found in x86.c.  Then pausing can
>> also simply use kvm_vcpu_kick.
> I see, that's why the cmpxchg() works the way it does.  We just still
> need to move the vcpu->mode = IN_GUEST_MODE before our
> with-interrupts-disabled check.
> 
> What I'm not sure is why you can get away without using a memory barrier
> or WRITE_ONCE on x86, but is this simply because x86 is a strongly
> ordered architecture?

x86 does have a memory barrier:

vcpu->mode = IN_GUEST_MODE;

srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
smp_mb__after_srcu_read_unlock();

/*
 * This handles the case where a posted interrupt was
 * notified with kvm_vcpu_kick.
 */
if (kvm_lapic_enabled(vcpu)) {
if (kvm_x86_ops->sync_pir_to_irr && vcpu->arch.apicv_active)
kvm_x86_ops->sync_pir_to_irr(vcpu);
}

if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests

and WRITE_ONCE is not needed if you have a memory barrier (though I find it
more self-documenting to use it anyway).

Paolo
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Andrew Jones
On Tue, Apr 04, 2017 at 06:04:17PM +0200, Christoffer Dall wrote:
> On Fri, Mar 31, 2017 at 06:06:53PM +0200, Andrew Jones wrote:
> > This not only ensures visibility of changes to pause by using
> > atomic ops, but also plugs a small race where a vcpu could get its
> > pause state enabled just after its last check before entering the
> > guest. With this patch, while the vcpu will still initially enter
> > the guest, it will exit immediately due to the IPI sent by the vcpu
> > kick issued after making the vcpu request.
> > 
> > We use bitops, rather than kvm_make/check_request(), because we
> > don't need the barriers they provide,
> 
> why not?

I'll add that it's because the only state of interest is the request bit
itself.  When the request is observable then we're good to go, no need to
ensure that at the time the request is observable, something else is too.

> 
> > nor do we want the side-effect
> > of kvm_check_request() clearing the request. For pause, only the
> > requester should do the clearing.
> > 
> > Signed-off-by: Andrew Jones 
> > ---
> >  arch/arm/include/asm/kvm_host.h   |  5 +
> >  arch/arm/kvm/arm.c| 45 
> > +++
> >  arch/arm64/include/asm/kvm_host.h |  5 +
> >  3 files changed, 33 insertions(+), 22 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h 
> > b/arch/arm/include/asm/kvm_host.h
> > index 31ee468ce667..52c25536d254 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -45,7 +45,7 @@
> >  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
> >  #endif
> >  
> > -#define KVM_REQ_VCPU_EXIT  8
> > +#define KVM_REQ_PAUSE  8
> >  
> >  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
> >  int __attribute_const__ kvm_target_cpu(void);
> > @@ -173,9 +173,6 @@ struct kvm_vcpu_arch {
> > /* vcpu power-off state */
> > bool power_off;
> >  
> > -/* Don't run the guest (internal implementation need) */
> > -   bool pause;
> > -
> > /* IO related fields */
> > struct kvm_decode mmio_decode;
> >  
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index 314eb6abe1ff..f3bfbb5f3d96 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -94,6 +94,18 @@ struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
> >  
> >  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
> >  {
> > +   /*
> > +* If we return true from this function, then it means the vcpu is
> > +* either in guest mode, or has already indicated that it's in guest
> > +* mode. The indication is done by setting ->mode to IN_GUEST_MODE,
> > +* and must be done before the final kvm_request_pending() read. It's
> > +* important that the observability of that order be enforced and that
> > +* the request receiving CPU can observe any new request before the
> > +* requester issues a kick. Thus, the general barrier below pairs with
> > +* the general barrier in kvm_arch_vcpu_ioctl_run() which divides the
> > +* write to ->mode and the final request pending read.
> > +*/
> 
> I am having a hard time understanding this comment.  For example, I
> don't understand the difference between 'is either in guest mode or has
> already indicated it's in guest mode'.  Which case is which again, and
> how are we checking for two cases below?
> 
> Also, the stuff about observability of an order is hard to follow, and
> the comment assumes the reader is thinking about the specific race when
> entering the guest.
> 
> I think we should focus on getting the documentation in place, refer to
> the documentation from here, and be much more brief and say something
> like:
> 
>   /*
>* The memory barrier below pairs with the barrier in
>* kvm_arch_vcpu_ioctl_run() between writes to vcpu->mode
>* and reading vcpu->requests before entering the guest.
>*
>* Ensures that the VCPU thread's CPU can observe changes to
>* vcpu->requests written prior to calling this function before
>* it writes vcpu->mode = IN_GUEST_MODE, and correspondingly
>* ensures that this CPU observes vcpu->mode == IN_GUEST_MODE
>* only if the VCPU thread's CPU could observe writes to
>* vcpu->requests from this CPU.
>/
> 
> Is this correct?  I'm not really sure anymore?

It's confusing because we have cross dependencies on the negatives of
two conditions.

Here's the cross dependencies:

  vcpu->mode = IN_GUEST_MODE;   ---   ---  kvm_make_request(REQ, vcpu);
  smp_mb();\ / smp_mb();
X
   / \
  if (kvm_request_pending(vcpu))<--   -->  if (vcpu->mode == IN_GUEST_MODE)

On each side the smp_mb() ensures no reordering of the pair of operations
that each side has.  I.e. on the LHS the requests LOAD cannot be ordered
before the mode STORE and on the RHS side the mode LOAD cannot be ordered
befor

Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Christoffer Dall
On Tue, Apr 04, 2017 at 07:35:11PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 19:19, Christoffer Dall wrote:
> > On Tue, Apr 04, 2017 at 06:24:36PM +0200, Paolo Bonzini wrote:
> >>
> >>
> >> On 04/04/2017 18:04, Christoffer Dall wrote:
>  For pause, only the requester should do the clearing.
> >>
> >> This suggests that maybe this should not be a request.  The request
> >> would be just the need to act on a GIC command, exactly as before this 
> >> patch.
> > 
> > Maybe the semantics should be:
> > 
> > requester:vcpu:
> > ---
> > make_requet(vcpu, KVM_REQ_PAUSE);
> >   handles the request by
> >   clearing it and setting
> >   vcpu->pause = true;
> > wait until vcpu->pause == true
> > make_request(vcpu, KVM_REQ_UNPAUSE);
> >   vcpus 'wake up' clear the
> >   UNPAUSE request and set
> >   vcpu->pause = false;
> > 
> > The benefit would be that we get to re-use the complicated "figure out
> > the VCPU mode and whether or not we should send an IPI and get the
> > barriers right" stuff.
> 
> I don't think that's necessary.  As long as the complicated stuff avoids
> that you enter the VCPU, the next run through the loop will
> find that 'vcpu->arch.power_off || vcpu->arch.pause' is true and
> go to sleep.
> 
> >> What I don't understand is:
> >>
>  With this patch, while the vcpu will still initially enter
>  the guest, it will exit immediately due to the IPI sent by the vcpu
>  kick issued after making the vcpu request.
> >>
> >> Isn't this also true of KVM_REQ_VCPU_EXIT that was used before?
> >>
> >> So this:
> >>
> >> +  vcpu->arch.power_off || kvm_request_pending(vcpu)) {
> >> +  WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
> >>
> >> is the crux of the fix, you can keep using vcpu->arch.pause.
> > 
> > Probably; I feel like there's a fix here which should be a separate
> > patch from using a different requests instead of the KVM_REQ_VCPU_EXIT +
> > the pause flag.
> 
> Yeah, and then the pause flag can stay.
> 
> >> By the way, vcpu->arch.power_off can go away from this "if" too because
> >> KVM_RUN and KVM_SET_MP_STATE are mutually exclusive through the vcpu mutex.
> > 
> > But we also allow setting the power_off flag from the in-kernel PSCI
> > emulation in the context of another VCPU thread.
> 
> Right.  That code does
> 
> tmp->arch.power_off = true;
> kvm_vcpu_kick(tmp);
> 
> and I think what's really missing in arm.c is the "if (vcpu->mode ==
> EXITING_GUEST_MODE)" check that is found in x86.c.  Then pausing can
> also simply use kvm_vcpu_kick.

I see, that's why the cmpxchg() works the way it does.  We just still
need to move the vcpu->mode = IN_GUEST_MODE before our
with-interrupts-disabled check.

What I'm not sure is why you can get away without using a memory barrier
or WRITE_ONCE on x86, but is this simply because x86 is a strongly
ordered architecture?

> 
> My understanding is that KVM-ARM is using KVM_REQ_VCPU_EXIT simply to
> reuse the smp_call_function_many code in kvm_make_all_cpus_request.

Your understanding is correct.

> Once you add EXITING_GUEST_MODE, ARM can just add a new function
> kvm_kick_all_cpus and use it for both pause and power_off.
> 

Yes, that should work.

I think Drew's approach should also work, but at this point, I'm not
really sure which approach is better than the other.


Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Paolo Bonzini


On 04/04/2017 19:19, Christoffer Dall wrote:
> On Tue, Apr 04, 2017 at 06:24:36PM +0200, Paolo Bonzini wrote:
>>
>>
>> On 04/04/2017 18:04, Christoffer Dall wrote:
 For pause, only the requester should do the clearing.
>>
>> This suggests that maybe this should not be a request.  The request
>> would be just the need to act on a GIC command, exactly as before this patch.
> 
> Maybe the semantics should be:
> 
> requester:vcpu:
> ---
> make_requet(vcpu, KVM_REQ_PAUSE);
>   handles the request by
> clearing it and setting
> vcpu->pause = true;
> wait until vcpu->pause == true
> make_request(vcpu, KVM_REQ_UNPAUSE);
>   vcpus 'wake up' clear the
> UNPAUSE request and set
> vcpu->pause = false;
> 
> The benefit would be that we get to re-use the complicated "figure out
> the VCPU mode and whether or not we should send an IPI and get the
> barriers right" stuff.

I don't think that's necessary.  As long as the complicated stuff avoids
that you enter the VCPU, the next run through the loop will
find that 'vcpu->arch.power_off || vcpu->arch.pause' is true and
go to sleep.

>> What I don't understand is:
>>
 With this patch, while the vcpu will still initially enter
 the guest, it will exit immediately due to the IPI sent by the vcpu
 kick issued after making the vcpu request.
>>
>> Isn't this also true of KVM_REQ_VCPU_EXIT that was used before?
>>
>> So this:
>>
>> +vcpu->arch.power_off || kvm_request_pending(vcpu)) {
>> +WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
>>
>> is the crux of the fix, you can keep using vcpu->arch.pause.
> 
> Probably; I feel like there's a fix here which should be a separate
> patch from using a different requests instead of the KVM_REQ_VCPU_EXIT +
> the pause flag.

Yeah, and then the pause flag can stay.

>> By the way, vcpu->arch.power_off can go away from this "if" too because
>> KVM_RUN and KVM_SET_MP_STATE are mutually exclusive through the vcpu mutex.
> 
> But we also allow setting the power_off flag from the in-kernel PSCI
> emulation in the context of another VCPU thread.

Right.  That code does

tmp->arch.power_off = true;
kvm_vcpu_kick(tmp);

and I think what's really missing in arm.c is the "if (vcpu->mode ==
EXITING_GUEST_MODE)" check that is found in x86.c.  Then pausing can
also simply use kvm_vcpu_kick.

My understanding is that KVM-ARM is using KVM_REQ_VCPU_EXIT simply to
reuse the smp_call_function_many code in kvm_make_all_cpus_request.
Once you add EXITING_GUEST_MODE, ARM can just add a new function
kvm_kick_all_cpus and use it for both pause and power_off.

Paolo
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Christoffer Dall
On Tue, Apr 04, 2017 at 06:24:36PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 18:04, Christoffer Dall wrote:
> >> For pause, only the requester should do the clearing.
> 
> This suggests that maybe this should not be a request.  The request
> would be just the need to act on a GIC command, exactly as before this patch.

Maybe the semantics should be:

requester:vcpu:
---
make_requet(vcpu, KVM_REQ_PAUSE);
  handles the request by
  clearing it and setting
  vcpu->pause = true;
wait until vcpu->pause == true
make_request(vcpu, KVM_REQ_UNPAUSE);
  vcpus 'wake up' clear the
  UNPAUSE request and set
  vcpu->pause = false;

The benefit would be that we get to re-use the complicated "figure out
the VCPU mode and whether or not we should send an IPI and get the
barriers right" stuff.

> 
> What I don't understand is:
> 
> >> With this patch, while the vcpu will still initially enter
> >> the guest, it will exit immediately due to the IPI sent by the vcpu
> >> kick issued after making the vcpu request.
> 
> Isn't this also true of KVM_REQ_VCPU_EXIT that was used before?
> 
> So this:
> 
> + vcpu->arch.power_off || kvm_request_pending(vcpu)) {
> + WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
> 
> is the crux of the fix, you can keep using vcpu->arch.pause.

Probably; I feel like there's a fix here which should be a separate
patch from using a different requests instead of the KVM_REQ_VCPU_EXIT +
the pause flag.

> 
> By the way, vcpu->arch.power_off can go away from this "if" too because
> KVM_RUN and KVM_SET_MP_STATE are mutually exclusive through the vcpu mutex.

But we also allow setting the power_off flag from the in-kernel PSCI
emulation in the context of another VCPU thread.

> The earlier check is enough:
> 
>  if (vcpu->arch.power_off || vcpu->arch.pause)
>  vcpu_sleep(vcpu);
> 
> 
> >> +  /*
> >> +   * Indicate we're in guest mode now, before doing a final
> >> +   * check for pending vcpu requests. The general barrier
> >> +   * pairs with the one in kvm_arch_vcpu_should_kick().
> >> +   * Please see the comment there for more details.
> >> +   */
> >> +  WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);
> >> +  smp_mb();
> > 
> > There are two changes here:
> > 
> > there's a change from a normal write to a WRITE_ONCE and there's also a
> > change to that adds a memory barrier.  I feel like I'd like to know if
> > these are tied together or two separate cleanups.  I also wonder if we
> > could split out more general changes from the pause thing to have a
> > better log of why we changed the run loop?
> 
> You probably should just use smp_store_mb here.
> 

That looks cleaner at least.

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Andrew Jones
On Tue, Apr 04, 2017 at 04:51:40PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 16:47, Andrew Jones wrote:
> >>> -#define KVM_REQ_VCPU_EXIT8
> >>> +#define KVM_REQ_PAUSE8
> >> Small nit: can we have a #define for this 8? KVM_REQ_ARCH_BASE, or
> >> something along those lines?
> > Sounds good to me.  Should I even do something like
> > 
> >  #define KVM_REQ_ARCH_BASE 8
> > 
> >  #define KVM_ARCH_REQ(bit) ({ \
> >  BUILD_BUG_ON(((bit) + KVM_REQ_ARCH_BASE) >= BITS_PER_LONG); \
> 
> Please make this 32 so that we don't fail on 32-bit machines.
> 
> or even
> 
> BUILD_BUG_ON((unsigned)(bit) >= BITS_PER_LONG - KVM_REQ_ARCH_BASE);
> 
> in case someone is crazy enough to pass a negative value!

Will do.

Thanks,
drew

> 
> Paolo
> 
> >  ((bit) + KVM_REQ_ARCH_BASE); \
> >  })
> > 
> >  #define KVM_REQ_PAUSE KVM_ARCH_REQ(0)
> > 
> > or would that be overkill?  Also, whether we switch to just the base
> > define, or the macro, I guess it would be good to do for all
> > architectures.
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Paolo Bonzini


On 04/04/2017 18:04, Christoffer Dall wrote:
>> For pause, only the requester should do the clearing.

This suggests that maybe this should not be a request.  The request
would be just the need to act on a GIC command, exactly as before this patch.

What I don't understand is:

>> With this patch, while the vcpu will still initially enter
>> the guest, it will exit immediately due to the IPI sent by the vcpu
>> kick issued after making the vcpu request.

Isn't this also true of KVM_REQ_VCPU_EXIT that was used before?

So this:

+   vcpu->arch.power_off || kvm_request_pending(vcpu)) {
+   WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);

is the crux of the fix, you can keep using vcpu->arch.pause.

By the way, vcpu->arch.power_off can go away from this "if" too because
KVM_RUN and KVM_SET_MP_STATE are mutually exclusive through the vcpu mutex.
The earlier check is enough:

 if (vcpu->arch.power_off || vcpu->arch.pause)
 vcpu_sleep(vcpu);


>> +/*
>> + * Indicate we're in guest mode now, before doing a final
>> + * check for pending vcpu requests. The general barrier
>> + * pairs with the one in kvm_arch_vcpu_should_kick().
>> + * Please see the comment there for more details.
>> + */
>> +WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);
>> +smp_mb();
> 
> There are two changes here:
> 
> there's a change from a normal write to a WRITE_ONCE and there's also a
> change to that adds a memory barrier.  I feel like I'd like to know if
> these are tied together or two separate cleanups.  I also wonder if we
> could split out more general changes from the pause thing to have a
> better log of why we changed the run loop?

You probably should just use smp_store_mb here.

Paolo
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Christoffer Dall
On Fri, Mar 31, 2017 at 06:06:53PM +0200, Andrew Jones wrote:
> This not only ensures visibility of changes to pause by using
> atomic ops, but also plugs a small race where a vcpu could get its
> pause state enabled just after its last check before entering the
> guest. With this patch, while the vcpu will still initially enter
> the guest, it will exit immediately due to the IPI sent by the vcpu
> kick issued after making the vcpu request.
> 
> We use bitops, rather than kvm_make/check_request(), because we
> don't need the barriers they provide,

why not?

> nor do we want the side-effect
> of kvm_check_request() clearing the request. For pause, only the
> requester should do the clearing.
> 
> Signed-off-by: Andrew Jones 
> ---
>  arch/arm/include/asm/kvm_host.h   |  5 +
>  arch/arm/kvm/arm.c| 45 
> +++
>  arch/arm64/include/asm/kvm_host.h |  5 +
>  3 files changed, 33 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 31ee468ce667..52c25536d254 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -45,7 +45,7 @@
>  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
>  #endif
>  
> -#define KVM_REQ_VCPU_EXIT8
> +#define KVM_REQ_PAUSE8
>  
>  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>  int __attribute_const__ kvm_target_cpu(void);
> @@ -173,9 +173,6 @@ struct kvm_vcpu_arch {
>   /* vcpu power-off state */
>   bool power_off;
>  
> -  /* Don't run the guest (internal implementation need) */
> - bool pause;
> -
>   /* IO related fields */
>   struct kvm_decode mmio_decode;
>  
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 314eb6abe1ff..f3bfbb5f3d96 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -94,6 +94,18 @@ struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
>  
>  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
>  {
> + /*
> +  * If we return true from this function, then it means the vcpu is
> +  * either in guest mode, or has already indicated that it's in guest
> +  * mode. The indication is done by setting ->mode to IN_GUEST_MODE,
> +  * and must be done before the final kvm_request_pending() read. It's
> +  * important that the observability of that order be enforced and that
> +  * the request receiving CPU can observe any new request before the
> +  * requester issues a kick. Thus, the general barrier below pairs with
> +  * the general barrier in kvm_arch_vcpu_ioctl_run() which divides the
> +  * write to ->mode and the final request pending read.
> +  */

I am having a hard time understanding this comment.  For example, I
don't understand the difference between 'is either in guest mode or has
already indicated it's in guest mode'.  Which case is which again, and
how are we checking for two cases below?

Also, the stuff about observability of an order is hard to follow, and
the comment assumes the reader is thinking about the specific race when
entering the guest.

I think we should focus on getting the documentation in place, refer to
the documentation from here, and be much more brief and say something
like:

/*
 * The memory barrier below pairs with the barrier in
 * kvm_arch_vcpu_ioctl_run() between writes to vcpu->mode
 * and reading vcpu->requests before entering the guest.
 *
 * Ensures that the VCPU thread's CPU can observe changes to
 * vcpu->requests written prior to calling this function before
 * it writes vcpu->mode = IN_GUEST_MODE, and correspondingly
 * ensures that this CPU observes vcpu->mode == IN_GUEST_MODE
 * only if the VCPU thread's CPU could observe writes to
 * vcpu->requests from this CPU.
 /

Is this correct?  I'm not really sure anymore?

There's also the obvious fact that we're adding this memory barrier
inside a funciton that checks if we should kick a vcpu, and there's no
documentation that says that this is always called in association with
setting a request, is there?

I finally don't undertand why this would be a requirement only on ARM?

> + smp_mb();
>   return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
>  }
>  
> @@ -404,7 +416,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>  int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>  {
>   return ((!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v))
> - && !v->arch.power_off && !v->arch.pause);
> + && !v->arch.power_off
> + && !test_bit(KVM_REQ_PAUSE, &v->requests));
>  }
>  
>  /* Just ensure a guest exit from a particular CPU */
> @@ -535,17 +548,12 @@ bool kvm_arch_intc_initialized(struct kvm *kvm)
>  
>  void kvm_arm_halt_guest(struct kvm *kvm)
>  {
> - int i;
> - struct kvm_vcpu *vcpu;
> -
> - kvm_for_each_vcpu(i, vcpu, kvm

Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Marc Zyngier
On 04/04/17 15:51, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 16:47, Andrew Jones wrote:
 -#define KVM_REQ_VCPU_EXIT 8
 +#define KVM_REQ_PAUSE 8
>>> Small nit: can we have a #define for this 8? KVM_REQ_ARCH_BASE, or
>>> something along those lines?
>> Sounds good to me.  Should I even do something like
>>
>>  #define KVM_REQ_ARCH_BASE 8
>>
>>  #define KVM_ARCH_REQ(bit) ({ \
>>  BUILD_BUG_ON(((bit) + KVM_REQ_ARCH_BASE) >= BITS_PER_LONG); \
> 
> Please make this 32 so that we don't fail on 32-bit machines.
> 
> or even
> 
> BUILD_BUG_ON((unsigned)(bit) >= BITS_PER_LONG - KVM_REQ_ARCH_BASE);
> 
> in case someone is crazy enough to pass a negative value!
> 
> Paolo
> 
>>  ((bit) + KVM_REQ_ARCH_BASE); \
>>  })
>>
>>  #define KVM_REQ_PAUSE KVM_ARCH_REQ(0)
>>
>> or would that be overkill?  Also, whether we switch to just the base
>> define, or the macro, I guess it would be good to do for all
>> architectures.
> 

Both suggestions look good to me.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Paolo Bonzini


On 04/04/2017 16:47, Andrew Jones wrote:
>>> -#define KVM_REQ_VCPU_EXIT  8
>>> +#define KVM_REQ_PAUSE  8
>> Small nit: can we have a #define for this 8? KVM_REQ_ARCH_BASE, or
>> something along those lines?
> Sounds good to me.  Should I even do something like
> 
>  #define KVM_REQ_ARCH_BASE 8
> 
>  #define KVM_ARCH_REQ(bit) ({ \
>  BUILD_BUG_ON(((bit) + KVM_REQ_ARCH_BASE) >= BITS_PER_LONG); \

Please make this 32 so that we don't fail on 32-bit machines.

or even

BUILD_BUG_ON((unsigned)(bit) >= BITS_PER_LONG - KVM_REQ_ARCH_BASE);

in case someone is crazy enough to pass a negative value!

Paolo

>  ((bit) + KVM_REQ_ARCH_BASE); \
>  })
> 
>  #define KVM_REQ_PAUSE KVM_ARCH_REQ(0)
> 
> or would that be overkill?  Also, whether we switch to just the base
> define, or the macro, I guess it would be good to do for all
> architectures.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Andrew Jones
On Tue, Apr 04, 2017 at 02:39:19PM +0100, Marc Zyngier wrote:
> On 31/03/17 17:06, Andrew Jones wrote:
> > This not only ensures visibility of changes to pause by using
> > atomic ops, but also plugs a small race where a vcpu could get its
> > pause state enabled just after its last check before entering the
> > guest. With this patch, while the vcpu will still initially enter
> > the guest, it will exit immediately due to the IPI sent by the vcpu
> > kick issued after making the vcpu request.
> > 
> > We use bitops, rather than kvm_make/check_request(), because we
> > don't need the barriers they provide, nor do we want the side-effect
> > of kvm_check_request() clearing the request. For pause, only the
> > requester should do the clearing.
> > 
> > Signed-off-by: Andrew Jones 
> > ---
> >  arch/arm/include/asm/kvm_host.h   |  5 +
> >  arch/arm/kvm/arm.c| 45 
> > +++
> >  arch/arm64/include/asm/kvm_host.h |  5 +
> >  3 files changed, 33 insertions(+), 22 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h 
> > b/arch/arm/include/asm/kvm_host.h
> > index 31ee468ce667..52c25536d254 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -45,7 +45,7 @@
> >  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
> >  #endif
> >  
> > -#define KVM_REQ_VCPU_EXIT  8
> > +#define KVM_REQ_PAUSE  8
> 
> Small nit: can we have a #define for this 8? KVM_REQ_ARCH_BASE, or
> something along those lines?

Sounds good to me.  Should I even do something like

 #define KVM_REQ_ARCH_BASE 8

 #define KVM_ARCH_REQ(bit) ({ \
 BUILD_BUG_ON(((bit) + KVM_REQ_ARCH_BASE) >= BITS_PER_LONG); \
 ((bit) + KVM_REQ_ARCH_BASE); \
 })

 #define KVM_REQ_PAUSE KVM_ARCH_REQ(0)

or would that be overkill?  Also, whether we switch to just the base
define, or the macro, I guess it would be good to do for all
architectures.

Thanks,
drew

> 
> I've otherwise started hammering this series over a number of systems,
> looking good so far.
> 
> Thanks,
> 
>   M.
> -- 
> Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-04-04 Thread Marc Zyngier
On 31/03/17 17:06, Andrew Jones wrote:
> This not only ensures visibility of changes to pause by using
> atomic ops, but also plugs a small race where a vcpu could get its
> pause state enabled just after its last check before entering the
> guest. With this patch, while the vcpu will still initially enter
> the guest, it will exit immediately due to the IPI sent by the vcpu
> kick issued after making the vcpu request.
> 
> We use bitops, rather than kvm_make/check_request(), because we
> don't need the barriers they provide, nor do we want the side-effect
> of kvm_check_request() clearing the request. For pause, only the
> requester should do the clearing.
> 
> Signed-off-by: Andrew Jones 
> ---
>  arch/arm/include/asm/kvm_host.h   |  5 +
>  arch/arm/kvm/arm.c| 45 
> +++
>  arch/arm64/include/asm/kvm_host.h |  5 +
>  3 files changed, 33 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 31ee468ce667..52c25536d254 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -45,7 +45,7 @@
>  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
>  #endif
>  
> -#define KVM_REQ_VCPU_EXIT8
> +#define KVM_REQ_PAUSE8

Small nit: can we have a #define for this 8? KVM_REQ_ARCH_BASE, or
something along those lines?

I've otherwise started hammering this series over a number of systems,
looking good so far.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request

2017-03-31 Thread Andrew Jones
This not only ensures visibility of changes to pause by using
atomic ops, but also plugs a small race where a vcpu could get its
pause state enabled just after its last check before entering the
guest. With this patch, while the vcpu will still initially enter
the guest, it will exit immediately due to the IPI sent by the vcpu
kick issued after making the vcpu request.

We use bitops, rather than kvm_make/check_request(), because we
don't need the barriers they provide, nor do we want the side-effect
of kvm_check_request() clearing the request. For pause, only the
requester should do the clearing.

Signed-off-by: Andrew Jones 
---
 arch/arm/include/asm/kvm_host.h   |  5 +
 arch/arm/kvm/arm.c| 45 +++
 arch/arm64/include/asm/kvm_host.h |  5 +
 3 files changed, 33 insertions(+), 22 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 31ee468ce667..52c25536d254 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -45,7 +45,7 @@
 #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
 #endif
 
-#define KVM_REQ_VCPU_EXIT  8
+#define KVM_REQ_PAUSE  8
 
 u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
 int __attribute_const__ kvm_target_cpu(void);
@@ -173,9 +173,6 @@ struct kvm_vcpu_arch {
/* vcpu power-off state */
bool power_off;
 
-/* Don't run the guest (internal implementation need) */
-   bool pause;
-
/* IO related fields */
struct kvm_decode mmio_decode;
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 314eb6abe1ff..f3bfbb5f3d96 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -94,6 +94,18 @@ struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
 
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
+   /*
+* If we return true from this function, then it means the vcpu is
+* either in guest mode, or has already indicated that it's in guest
+* mode. The indication is done by setting ->mode to IN_GUEST_MODE,
+* and must be done before the final kvm_request_pending() read. It's
+* important that the observability of that order be enforced and that
+* the request receiving CPU can observe any new request before the
+* requester issues a kick. Thus, the general barrier below pairs with
+* the general barrier in kvm_arch_vcpu_ioctl_run() which divides the
+* write to ->mode and the final request pending read.
+*/
+   smp_mb();
return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
 }
 
@@ -404,7 +416,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
return ((!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v))
-   && !v->arch.power_off && !v->arch.pause);
+   && !v->arch.power_off
+   && !test_bit(KVM_REQ_PAUSE, &v->requests));
 }
 
 /* Just ensure a guest exit from a particular CPU */
@@ -535,17 +548,12 @@ bool kvm_arch_intc_initialized(struct kvm *kvm)
 
 void kvm_arm_halt_guest(struct kvm *kvm)
 {
-   int i;
-   struct kvm_vcpu *vcpu;
-
-   kvm_for_each_vcpu(i, vcpu, kvm)
-   vcpu->arch.pause = true;
-   kvm_make_all_cpus_request(kvm, KVM_REQ_VCPU_EXIT);
+   kvm_make_all_cpus_request(kvm, KVM_REQ_PAUSE);
 }
 
 void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
 {
-   vcpu->arch.pause = true;
+   set_bit(KVM_REQ_PAUSE, &vcpu->requests);
kvm_vcpu_kick(vcpu);
 }
 
@@ -553,7 +561,7 @@ void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu)
 {
struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
 
-   vcpu->arch.pause = false;
+   clear_bit(KVM_REQ_PAUSE, &vcpu->requests);
swake_up(wq);
 }
 
@@ -571,7 +579,7 @@ static void vcpu_sleep(struct kvm_vcpu *vcpu)
struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
 
swait_event_interruptible(*wq, ((!vcpu->arch.power_off) &&
-  (!vcpu->arch.pause)));
+   (!test_bit(KVM_REQ_PAUSE, &vcpu->requests;
 }
 
 static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
@@ -624,7 +632,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
 
update_vttbr(vcpu->kvm);
 
-   if (vcpu->arch.power_off || vcpu->arch.pause)
+   if (vcpu->arch.power_off || test_bit(KVM_REQ_PAUSE, 
&vcpu->requests))
vcpu_sleep(vcpu);
 
/*
@@ -647,8 +655,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
run->exit_reason = KVM_EXIT_INTR;
}
 
+   /*
+* Indicate we're in guest mode now, before doing a final
+* check for pending vcpu requests. The general barrier
+* pairs with the one in kvm_arch_vcpu_should_kick().
+* Pl