Re: [PATCH v2 8/9] KVM: arm/arm64: fix race in kvm_psci_vcpu_on

2017-04-05 Thread Christoffer Dall
On Wed, Apr 05, 2017 at 11:12:12AM +0200, Andrew Jones wrote:
> On Wed, Apr 05, 2017 at 10:50:05AM +0200, Christoffer Dall wrote:
> > On Wed, Apr 05, 2017 at 10:35:59AM +0200, Andrew Jones wrote:
> > > On Tue, Apr 04, 2017 at 09:42:08PM +0200, Christoffer Dall wrote:
> > > > On Fri, Mar 31, 2017 at 06:06:57PM +0200, Andrew Jones wrote:
> > > > > From: Levente Kurusa 
> > > > > 
> > > > > When two vcpus issue PSCI_CPU_ON on the same core at the same time,
> > > > > then it's possible for them to both enter the target vcpu's setup
> > > > > at the same time. This results in unexpected behaviors at best,
> > > > > and the potential for some nasty bugs at worst.
> > > > > 
> > > > > Signed-off-by: Levente Kurusa 
> > > > > Signed-off-by: Andrew Jones 
> > > > > ---
> > > > >  arch/arm/kvm/psci.c | 4 ++--
> > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> > > > > index f732484abc7a..0204daa899b1 100644
> > > > > --- a/arch/arm/kvm/psci.c
> > > > > +++ b/arch/arm/kvm/psci.c
> > > > > @@ -88,7 +88,8 @@ static unsigned long kvm_psci_vcpu_on(struct 
> > > > > kvm_vcpu *source_vcpu)
> > > > >*/
> > > > >   if (!vcpu)
> > > > >   return PSCI_RET_INVALID_PARAMS;
> > > > > - if (!test_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> > > > > +
> > > > > + if (!test_and_clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> > > > >   if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
> > > > >   return PSCI_RET_ALREADY_ON;
> > > > >   else
> > > > > @@ -116,7 +117,6 @@ static unsigned long kvm_psci_vcpu_on(struct 
> > > > > kvm_vcpu *source_vcpu)
> > > > >* the general puspose registers are undefined upon CPU_ON.
> > > > >*/
> > > > >   vcpu_set_reg(vcpu, 0, context_id);
> > > > > - clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
> > > > >  
> > > > >   wq = kvm_arch_vcpu_wq(vcpu);
> > > > >   swake_up(wq);
> > > > > -- 
> > > > > 2.9.3
> > > > > 
> > > > 
> > > > Depending on what you end up doing with the requests, if you keep the
> > > > bool flag you could just use the kvm->lock mutex instead.
> > > > 
> > > > Have you considered if there are any potential races between
> > > > kvm_psci_system_off() being called on one VCPU while two other VPCUs are
> > > > turning on the same CPU that is being turend off as part of system-wide
> > > > power down as well?
> > > 
> > > Sounds like a nice unit test.  I haven't considered it, but I guess
> > > the kvm_psci_system_off/reset calling VCPU will ultimately "win", as
> > > it'll cause an exit to userspace that initiates a shutdown/reset.
> > > When the VCPUs are restarted then vcpu init should reset the power_off
> > > state correctly.  As long as the race this patch addresses is fixed, then
> > > I'm not sure there should be any risk with the actual system_off/reset
> > > being delayed wrt a vcpu being "on'ed" again, nor with there being more
> > > than one VCPU trying to "on" it at the same time.
> > > 
> > > > 
> > > > I'm wondering if this means we should take the kvm->lock at a higher
> > > > level when handling PSCI events...
> > > 
> > > That would simplify our analysis of the PSCI emulation, but I'm not
> > > sure we want to give a guest the power to constantly acquire that
> > > mutex with a barrage of PSCI calls.  Maybe we should create a PSCI
> > > mutex?  In order to avoid holding it too long we may want power_off to
> > > be more than a boolean though, i.e. the PENDING state might also be
> > > a good idea to represent.
> > > 
> > 
> > Hmm, the kvm->lock mutex is per-VM, so if a VM wants to use its CPU
> > resources by taking its own mutex, I don't really see the problem.
> 
> I was worried about management paths that lead to a need for that
> lock. For example, I see x86's kvm_free_vcpus(), called from
> kvm_arch_destroy_vm(), acquires it. A quick grep of ARM code doesn't
> reveal anything though.
> 
Even in that case, PSCI is guaranteed to make progress, right?  So I
still don't understand the challenge.

In any case, I'll have a look over this patch again when you respin.

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 8/9] KVM: arm/arm64: fix race in kvm_psci_vcpu_on

2017-04-05 Thread Andrew Jones
On Wed, Apr 05, 2017 at 10:50:05AM +0200, Christoffer Dall wrote:
> On Wed, Apr 05, 2017 at 10:35:59AM +0200, Andrew Jones wrote:
> > On Tue, Apr 04, 2017 at 09:42:08PM +0200, Christoffer Dall wrote:
> > > On Fri, Mar 31, 2017 at 06:06:57PM +0200, Andrew Jones wrote:
> > > > From: Levente Kurusa 
> > > > 
> > > > When two vcpus issue PSCI_CPU_ON on the same core at the same time,
> > > > then it's possible for them to both enter the target vcpu's setup
> > > > at the same time. This results in unexpected behaviors at best,
> > > > and the potential for some nasty bugs at worst.
> > > > 
> > > > Signed-off-by: Levente Kurusa 
> > > > Signed-off-by: Andrew Jones 
> > > > ---
> > > >  arch/arm/kvm/psci.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> > > > index f732484abc7a..0204daa899b1 100644
> > > > --- a/arch/arm/kvm/psci.c
> > > > +++ b/arch/arm/kvm/psci.c
> > > > @@ -88,7 +88,8 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu 
> > > > *source_vcpu)
> > > >  */
> > > > if (!vcpu)
> > > > return PSCI_RET_INVALID_PARAMS;
> > > > -   if (!test_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> > > > +
> > > > +   if (!test_and_clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> > > > if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
> > > > return PSCI_RET_ALREADY_ON;
> > > > else
> > > > @@ -116,7 +117,6 @@ static unsigned long kvm_psci_vcpu_on(struct 
> > > > kvm_vcpu *source_vcpu)
> > > >  * the general puspose registers are undefined upon CPU_ON.
> > > >  */
> > > > vcpu_set_reg(vcpu, 0, context_id);
> > > > -   clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
> > > >  
> > > > wq = kvm_arch_vcpu_wq(vcpu);
> > > > swake_up(wq);
> > > > -- 
> > > > 2.9.3
> > > > 
> > > 
> > > Depending on what you end up doing with the requests, if you keep the
> > > bool flag you could just use the kvm->lock mutex instead.
> > > 
> > > Have you considered if there are any potential races between
> > > kvm_psci_system_off() being called on one VCPU while two other VPCUs are
> > > turning on the same CPU that is being turend off as part of system-wide
> > > power down as well?
> > 
> > Sounds like a nice unit test.  I haven't considered it, but I guess
> > the kvm_psci_system_off/reset calling VCPU will ultimately "win", as
> > it'll cause an exit to userspace that initiates a shutdown/reset.
> > When the VCPUs are restarted then vcpu init should reset the power_off
> > state correctly.  As long as the race this patch addresses is fixed, then
> > I'm not sure there should be any risk with the actual system_off/reset
> > being delayed wrt a vcpu being "on'ed" again, nor with there being more
> > than one VCPU trying to "on" it at the same time.
> > 
> > > 
> > > I'm wondering if this means we should take the kvm->lock at a higher
> > > level when handling PSCI events...
> > 
> > That would simplify our analysis of the PSCI emulation, but I'm not
> > sure we want to give a guest the power to constantly acquire that
> > mutex with a barrage of PSCI calls.  Maybe we should create a PSCI
> > mutex?  In order to avoid holding it too long we may want power_off to
> > be more than a boolean though, i.e. the PENDING state might also be
> > a good idea to represent.
> > 
> 
> Hmm, the kvm->lock mutex is per-VM, so if a VM wants to use its CPU
> resources by taking its own mutex, I don't really see the problem.

I was worried about management paths that lead to a need for that
lock. For example, I see x86's kvm_free_vcpus(), called from
kvm_arch_destroy_vm(), acquires it. A quick grep of ARM code doesn't
reveal anything though.

Thanks,
drew
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 8/9] KVM: arm/arm64: fix race in kvm_psci_vcpu_on

2017-04-05 Thread Christoffer Dall
On Wed, Apr 05, 2017 at 10:35:59AM +0200, Andrew Jones wrote:
> On Tue, Apr 04, 2017 at 09:42:08PM +0200, Christoffer Dall wrote:
> > On Fri, Mar 31, 2017 at 06:06:57PM +0200, Andrew Jones wrote:
> > > From: Levente Kurusa 
> > > 
> > > When two vcpus issue PSCI_CPU_ON on the same core at the same time,
> > > then it's possible for them to both enter the target vcpu's setup
> > > at the same time. This results in unexpected behaviors at best,
> > > and the potential for some nasty bugs at worst.
> > > 
> > > Signed-off-by: Levente Kurusa 
> > > Signed-off-by: Andrew Jones 
> > > ---
> > >  arch/arm/kvm/psci.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> > > index f732484abc7a..0204daa899b1 100644
> > > --- a/arch/arm/kvm/psci.c
> > > +++ b/arch/arm/kvm/psci.c
> > > @@ -88,7 +88,8 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu 
> > > *source_vcpu)
> > >*/
> > >   if (!vcpu)
> > >   return PSCI_RET_INVALID_PARAMS;
> > > - if (!test_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> > > +
> > > + if (!test_and_clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> > >   if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
> > >   return PSCI_RET_ALREADY_ON;
> > >   else
> > > @@ -116,7 +117,6 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu 
> > > *source_vcpu)
> > >* the general puspose registers are undefined upon CPU_ON.
> > >*/
> > >   vcpu_set_reg(vcpu, 0, context_id);
> > > - clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
> > >  
> > >   wq = kvm_arch_vcpu_wq(vcpu);
> > >   swake_up(wq);
> > > -- 
> > > 2.9.3
> > > 
> > 
> > Depending on what you end up doing with the requests, if you keep the
> > bool flag you could just use the kvm->lock mutex instead.
> > 
> > Have you considered if there are any potential races between
> > kvm_psci_system_off() being called on one VCPU while two other VPCUs are
> > turning on the same CPU that is being turend off as part of system-wide
> > power down as well?
> 
> Sounds like a nice unit test.  I haven't considered it, but I guess
> the kvm_psci_system_off/reset calling VCPU will ultimately "win", as
> it'll cause an exit to userspace that initiates a shutdown/reset.
> When the VCPUs are restarted then vcpu init should reset the power_off
> state correctly.  As long as the race this patch addresses is fixed, then
> I'm not sure there should be any risk with the actual system_off/reset
> being delayed wrt a vcpu being "on'ed" again, nor with there being more
> than one VCPU trying to "on" it at the same time.
> 
> > 
> > I'm wondering if this means we should take the kvm->lock at a higher
> > level when handling PSCI events...
> 
> That would simplify our analysis of the PSCI emulation, but I'm not
> sure we want to give a guest the power to constantly acquire that
> mutex with a barrage of PSCI calls.  Maybe we should create a PSCI
> mutex?  In order to avoid holding it too long we may want power_off to
> be more than a boolean though, i.e. the PENDING state might also be
> a good idea to represent.
> 

Hmm, the kvm->lock mutex is per-VM, so if a VM wants to use its CPU
resources by taking its own mutex, I don't really see the problem.

-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 8/9] KVM: arm/arm64: fix race in kvm_psci_vcpu_on

2017-04-05 Thread Andrew Jones
On Tue, Apr 04, 2017 at 09:42:08PM +0200, Christoffer Dall wrote:
> On Fri, Mar 31, 2017 at 06:06:57PM +0200, Andrew Jones wrote:
> > From: Levente Kurusa 
> > 
> > When two vcpus issue PSCI_CPU_ON on the same core at the same time,
> > then it's possible for them to both enter the target vcpu's setup
> > at the same time. This results in unexpected behaviors at best,
> > and the potential for some nasty bugs at worst.
> > 
> > Signed-off-by: Levente Kurusa 
> > Signed-off-by: Andrew Jones 
> > ---
> >  arch/arm/kvm/psci.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> > index f732484abc7a..0204daa899b1 100644
> > --- a/arch/arm/kvm/psci.c
> > +++ b/arch/arm/kvm/psci.c
> > @@ -88,7 +88,8 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu 
> > *source_vcpu)
> >  */
> > if (!vcpu)
> > return PSCI_RET_INVALID_PARAMS;
> > -   if (!test_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> > +
> > +   if (!test_and_clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> > if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
> > return PSCI_RET_ALREADY_ON;
> > else
> > @@ -116,7 +117,6 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu 
> > *source_vcpu)
> >  * the general puspose registers are undefined upon CPU_ON.
> >  */
> > vcpu_set_reg(vcpu, 0, context_id);
> > -   clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
> >  
> > wq = kvm_arch_vcpu_wq(vcpu);
> > swake_up(wq);
> > -- 
> > 2.9.3
> > 
> 
> Depending on what you end up doing with the requests, if you keep the
> bool flag you could just use the kvm->lock mutex instead.
> 
> Have you considered if there are any potential races between
> kvm_psci_system_off() being called on one VCPU while two other VPCUs are
> turning on the same CPU that is being turend off as part of system-wide
> power down as well?

Sounds like a nice unit test.  I haven't considered it, but I guess
the kvm_psci_system_off/reset calling VCPU will ultimately "win", as
it'll cause an exit to userspace that initiates a shutdown/reset.
When the VCPUs are restarted then vcpu init should reset the power_off
state correctly.  As long as the race this patch addresses is fixed, then
I'm not sure there should be any risk with the actual system_off/reset
being delayed wrt a vcpu being "on'ed" again, nor with there being more
than one VCPU trying to "on" it at the same time.

> 
> I'm wondering if this means we should take the kvm->lock at a higher
> level when handling PSCI events...

That would simplify our analysis of the PSCI emulation, but I'm not
sure we want to give a guest the power to constantly acquire that
mutex with a barrage of PSCI calls.  Maybe we should create a PSCI
mutex?  In order to avoid holding it too long we may want power_off to
be more than a boolean though, i.e. the PENDING state might also be
a good idea to represent.

Thanks,
drew
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 8/9] KVM: arm/arm64: fix race in kvm_psci_vcpu_on

2017-04-04 Thread Christoffer Dall
On Fri, Mar 31, 2017 at 06:06:57PM +0200, Andrew Jones wrote:
> From: Levente Kurusa 
> 
> When two vcpus issue PSCI_CPU_ON on the same core at the same time,
> then it's possible for them to both enter the target vcpu's setup
> at the same time. This results in unexpected behaviors at best,
> and the potential for some nasty bugs at worst.
> 
> Signed-off-by: Levente Kurusa 
> Signed-off-by: Andrew Jones 
> ---
>  arch/arm/kvm/psci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> index f732484abc7a..0204daa899b1 100644
> --- a/arch/arm/kvm/psci.c
> +++ b/arch/arm/kvm/psci.c
> @@ -88,7 +88,8 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu 
> *source_vcpu)
>*/
>   if (!vcpu)
>   return PSCI_RET_INVALID_PARAMS;
> - if (!test_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> +
> + if (!test_and_clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
>   if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
>   return PSCI_RET_ALREADY_ON;
>   else
> @@ -116,7 +117,6 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu 
> *source_vcpu)
>* the general puspose registers are undefined upon CPU_ON.
>*/
>   vcpu_set_reg(vcpu, 0, context_id);
> - clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
>  
>   wq = kvm_arch_vcpu_wq(vcpu);
>   swake_up(wq);
> -- 
> 2.9.3
> 

Depending on what you end up doing with the requests, if you keep the
bool flag you could just use the kvm->lock mutex instead.

Have you considered if there are any potential races between
kvm_psci_system_off() being called on one VCPU while two other VPCUs are
turning on the same CPU that is being turend off as part of system-wide
power down as well?

I'm wondering if this means we should take the kvm->lock at a higher
level when handling PSCI events...

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm