Re: [PATCH v5 16/16] KVM: PPC: Book3S HV: XIVE: introduce a 'release' device operation

2019-04-11 Thread Cédric Le Goater
On 4/11/19 12:27 PM, Paul Mackerras wrote:
> On Wed, Apr 10, 2019 at 07:04:48PM +0200, Cédric Le Goater wrote:
>> When a P9 sPAPR VM boots, the CAS negotiation process determines which
>> interrupt mode to use (XICS legacy or XIVE native) and invokes a
>> machine reset to activate the chosen mode.
>>
>> To be able to switch from one mode to another, we introduce the
>> capability to release a KVM device without destroying the VM. The KVM
>> device interface is extended with a new 'release' operation which is
>> called when the file descriptor of the device is closed.
> 
> Unfortunately, I think there is now a memory leak:
> 
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index ea2018ae1cd7..ea2619d5ca98 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -2938,6 +2938,19 @@ static int kvm_device_release(struct inode *inode, 
>> struct file *filp)
>>  struct kvm_device *dev = filp->private_data;
>>  struct kvm *kvm = dev->kvm;
>>  
>> +if (!dev)
>> +return -ENODEV;
>> +
>> +if (dev->kvm != kvm)
>> +return -EPERM;
>> +
>> +if (dev->ops->release) {
>> +mutex_lock(&kvm->lock);
>> +list_del(&dev->vm_node);
> 
> Because the device is now no longer in the kvm->devices list,
> kvm_destroy_devices() won't find it there and therefore won't call the
> device's destroy method.  In fact now the device's destroy method will
> never get called; I can't see how kvmppc_xive_free() or
> kvmppc_xive_native_free() will ever get called.  Thus the memory for
> the kvmppc_xive structs will never get freed as far as I can see.

ah yes. indeed ...

> We could fix that by freeing both of the kvm->arch.xive_devices
> entries at VM destruction time.

That is what I was doing in the first patch I sent : 

http://patchwork.ozlabs.org/patch/1082303/

It worked fine and then, I had this better (worse) idea which I included 
in v5. 
 
> If it is true that any device that has a release method will never see
> its destroy method being called, then that needs to be documented
> clearly somewhere.

Yes. Closing the fd should take care of it. I have to rework that patch.

Thanks,

C.


Re: [PATCH v5 16/16] KVM: PPC: Book3S HV: XIVE: introduce a 'release' device operation

2019-04-11 Thread Paul Mackerras
On Wed, Apr 10, 2019 at 07:04:48PM +0200, Cédric Le Goater wrote:
> When a P9 sPAPR VM boots, the CAS negotiation process determines which
> interrupt mode to use (XICS legacy or XIVE native) and invokes a
> machine reset to activate the chosen mode.
> 
> To be able to switch from one mode to another, we introduce the
> capability to release a KVM device without destroying the VM. The KVM
> device interface is extended with a new 'release' operation which is
> called when the file descriptor of the device is closed.

Unfortunately, I think there is now a memory leak:

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ea2018ae1cd7..ea2619d5ca98 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2938,6 +2938,19 @@ static int kvm_device_release(struct inode *inode, 
> struct file *filp)
>   struct kvm_device *dev = filp->private_data;
>   struct kvm *kvm = dev->kvm;
>  
> + if (!dev)
> + return -ENODEV;
> +
> + if (dev->kvm != kvm)
> + return -EPERM;
> +
> + if (dev->ops->release) {
> + mutex_lock(&kvm->lock);
> + list_del(&dev->vm_node);

Because the device is now no longer in the kvm->devices list,
kvm_destroy_devices() won't find it there and therefore won't call the
device's destroy method.  In fact now the device's destroy method will
never get called; I can't see how kvmppc_xive_free() or
kvmppc_xive_native_free() will ever get called.  Thus the memory for
the kvmppc_xive structs will never get freed as far as I can see.

We could fix that by freeing both of the kvm->arch.xive_devices
entries at VM destruction time.

If it is true that any device that has a release method will never see
its destroy method being called, then that needs to be documented
clearly somewhere.

Paul.


Re: [PATCH v5 16/16] KVM: PPC: Book3S HV: XIVE: introduce a 'release' device operation

2019-04-10 Thread Cédric Le Goater
On 4/11/19 6:38 AM, David Gibson wrote:
> On Thu, Apr 11, 2019 at 01:16:25PM +1000, Paul Mackerras wrote:
>> On Wed, Apr 10, 2019 at 07:04:48PM +0200, Cédric Le Goater wrote:
>>> When a P9 sPAPR VM boots, the CAS negotiation process determines which
>>> interrupt mode to use (XICS legacy or XIVE native) and invokes a
>>> machine reset to activate the chosen mode.
>>>
>>> To be able to switch from one mode to another, we introduce the
>>> capability to release a KVM device without destroying the VM. The KVM
>>> device interface is extended with a new 'release' operation which is
>>> called when the file descriptor of the device is closed.
>>
>> I believe the release operation is not called until all of the mmaps
>> using the fd are unmapped - which is a good thing for us, since it
>> means the guest can't possibly be accessing the XIVE directly.
yes.

>> You might want to reword that last paragraph to mention that.

ok. 

>>> Such operations are defined for the XICS-on-XIVE and the XIVE native
>>> KVM devices. They clear the vCPU interrupt presenters that could be
>>> attached and then destroy the device.
>>>
>>> This is not considered as a safe operation as the vCPUs are still
>>> running and could be referencing the KVM device through their
>>> presenters. To protect the system from any breakage, the kvmppc_xive
>>> objects representing both KVM devices are now stored in an array under
>>> the VM. Allocation is performed on first usage and memory is freed
>>> only when the VM exits.
>>
>> One quick comment below:
>>
>>> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
>>> index 480a3fc6b9fd..064a9f2ae678 100644
>>> --- a/arch/powerpc/kvm/book3s_xive.c
>>> +++ b/arch/powerpc/kvm/book3s_xive.c
>>> @@ -1100,11 +1100,19 @@ void kvmppc_xive_disable_vcpu_interrupts(struct 
>>> kvm_vcpu *vcpu)
>>>  void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
>>>  {
>>> struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
>>> -   struct kvmppc_xive *xive = xc->xive;
>>> +   struct kvmppc_xive *xive;
>>> int i;
>>>  
>>> +   if (!kvmppc_xics_enabled(vcpu))
>>> +   return;
>>
>> Should that be kvmppc_xive_enabled() rather than xics?
> 
> I think I asked that on an earlier iteration, and the answer is no.
> The names are confusing, but this file is all about xics-on-xive
> rather than xive native.  So here we're checking what's available from
> the guest's point of view, so "xics", but most of the surrounding
> functions are named "xive" because that's the backend.
> 

yes. 

The relevant part is at the end of the kvmppc_xive_connect_vcpu() routine :

  int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
 struct kvm_vcpu *vcpu, u32 cpu)
  {
...
vcpu->arch.irq_type = KVMPPC_IRQ_XICS;
return 0;
  }



David suggested a few cleanups that we could do in the xics-on-xive 
device. We might want to introduce a KVMPPC_IRQ_XICS_ON_XIVE flag also. 
First, I would like to get rid of references to the kvmppc_xive struct 
and remove some useless attributes to improve locking.

Once the XIVE native mode is merged, all kernels above 4.14 running on 
a P9 sPAPR guest will switch to XIVE and the xics-on-xive device will 
only be useful for nested.


C.


Re: [PATCH v5 16/16] KVM: PPC: Book3S HV: XIVE: introduce a 'release' device operation

2019-04-10 Thread David Gibson
On Thu, Apr 11, 2019 at 01:16:25PM +1000, Paul Mackerras wrote:
> On Wed, Apr 10, 2019 at 07:04:48PM +0200, Cédric Le Goater wrote:
> > When a P9 sPAPR VM boots, the CAS negotiation process determines which
> > interrupt mode to use (XICS legacy or XIVE native) and invokes a
> > machine reset to activate the chosen mode.
> > 
> > To be able to switch from one mode to another, we introduce the
> > capability to release a KVM device without destroying the VM. The KVM
> > device interface is extended with a new 'release' operation which is
> > called when the file descriptor of the device is closed.
> 
> I believe the release operation is not called until all of the mmaps
> using the fd are unmapped - which is a good thing for us, since it
> means the guest can't possibly be accessing the XIVE directly.
> You might want to reword that last paragraph to mention that.
> 
> > Such operations are defined for the XICS-on-XIVE and the XIVE native
> > KVM devices. They clear the vCPU interrupt presenters that could be
> > attached and then destroy the device.
> > 
> > This is not considered as a safe operation as the vCPUs are still
> > running and could be referencing the KVM device through their
> > presenters. To protect the system from any breakage, the kvmppc_xive
> > objects representing both KVM devices are now stored in an array under
> > the VM. Allocation is performed on first usage and memory is freed
> > only when the VM exits.
> 
> One quick comment below:
> 
> > diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> > index 480a3fc6b9fd..064a9f2ae678 100644
> > --- a/arch/powerpc/kvm/book3s_xive.c
> > +++ b/arch/powerpc/kvm/book3s_xive.c
> > @@ -1100,11 +1100,19 @@ void kvmppc_xive_disable_vcpu_interrupts(struct 
> > kvm_vcpu *vcpu)
> >  void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
> >  {
> > struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
> > -   struct kvmppc_xive *xive = xc->xive;
> > +   struct kvmppc_xive *xive;
> > int i;
> >  
> > +   if (!kvmppc_xics_enabled(vcpu))
> > +   return;
> 
> Should that be kvmppc_xive_enabled() rather than xics?

I think I asked that on an earlier iteration, and the answer is no.
The names are confusing, but this file is all about xics-on-xive
rather than xive native.  So here we're checking what's available from
the guest's point of view, so "xics", but most of the surrounding
functions are named "xive" because that's the backend.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v5 16/16] KVM: PPC: Book3S HV: XIVE: introduce a 'release' device operation

2019-04-10 Thread Paul Mackerras
On Wed, Apr 10, 2019 at 07:04:48PM +0200, Cédric Le Goater wrote:
> When a P9 sPAPR VM boots, the CAS negotiation process determines which
> interrupt mode to use (XICS legacy or XIVE native) and invokes a
> machine reset to activate the chosen mode.
> 
> To be able to switch from one mode to another, we introduce the
> capability to release a KVM device without destroying the VM. The KVM
> device interface is extended with a new 'release' operation which is
> called when the file descriptor of the device is closed.

I believe the release operation is not called until all of the mmaps
using the fd are unmapped - which is a good thing for us, since it
means the guest can't possibly be accessing the XIVE directly.
You might want to reword that last paragraph to mention that.

> Such operations are defined for the XICS-on-XIVE and the XIVE native
> KVM devices. They clear the vCPU interrupt presenters that could be
> attached and then destroy the device.
> 
> This is not considered as a safe operation as the vCPUs are still
> running and could be referencing the KVM device through their
> presenters. To protect the system from any breakage, the kvmppc_xive
> objects representing both KVM devices are now stored in an array under
> the VM. Allocation is performed on first usage and memory is freed
> only when the VM exits.

One quick comment below:

> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index 480a3fc6b9fd..064a9f2ae678 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -1100,11 +1100,19 @@ void kvmppc_xive_disable_vcpu_interrupts(struct 
> kvm_vcpu *vcpu)
>  void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
>  {
>   struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
> - struct kvmppc_xive *xive = xc->xive;
> + struct kvmppc_xive *xive;
>   int i;
>  
> + if (!kvmppc_xics_enabled(vcpu))
> + return;

Should that be kvmppc_xive_enabled() rather than xics?

Paul.