Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs

2019-11-27 Thread Greg Kurz
On Wed, 27 Nov 2019 10:47:45 +0100
Frederic Barrat  wrote:

> 
> 
> Le 27/11/2019 à 10:33, Greg Kurz a écrit :
> > On Wed, 27 Nov 2019 10:10:13 +0100
> > Frederic Barrat  wrote:
> > 
> >>
> >>
> >> Le 27/11/2019 à 09:24, Greg Kurz a écrit :
> >>> On Wed, 27 Nov 2019 18:09:40 +1100
> >>> Alexey Kardashevskiy  wrote:
> >>>
> 
> 
>  On 20/11/2019 12:28, Oliver O'Halloran wrote:
> > The comment here implies that we don't need to take a ref to the pci_dev
> > because the ioda_pe will always have one. This implies that the current
> > expection is that the pci_dev for an NPU device will *never* be torn
> > down since the ioda_pe having a ref to the device will prevent the
> > release function from being called.
> >
> > In other words, the desired behaviour here appears to be leaking a ref.
> >
> > Nice!
> 
> 
>  There is a history: https://patchwork.ozlabs.org/patch/1088078/
> 
>  We did not fix anything in particular then, we do not seem to be fixing
>  anything now (in other words - we cannot test it in a normal natural
>  way). I'd drop this one.
> 
> >>>
> >>> Yeah, I didn't fix anything at the time. Just reverted to the ref
> >>> count behavior we had before:
> >>>
> >>> https://patchwork.ozlabs.org/patch/829172/
> >>>
> >>> Frederic recently posted his take on the same topic from the OpenCAPI
> >>> point of view:
> >>>
> >>> http://patchwork.ozlabs.org/patch/1198947/
> >>>
> >>> He seems to indicate the NPU devices as the real culprit because
> >>> nobody ever cared for them to be removable. Fixing that seems be
> >>> a chore nobody really wants to address obviously... :-\
> >>
> >>
> >> I had taken a stab at not leaking a ref for the nvlink devices and do
> >> the proper thing regarding ref counting (i.e. fixing all the callers of
> >> get_pci_dev() to drop the reference when they were done). With that, I
> >> could see that the ref count of the nvlink devices could drop to 0
> >> (calling remove for the device in /sys) and that the devices could go away.
> >>
> >> But then, I realized it's not necessarily desirable at this point. There
> >> are several comments in the code saying the npu devices (for nvlink)
> >> don't go away, there's no device release callback defined when it seems
> >> there should be, at least to handle releasing PEs All in all, it
> >> seems that some work would be needed. And if it hasn't been required by
> >> now...
> >>
> > 
> > If everyone is ok with leaking a reference in the NPU case, I guess
> > this isn't a problem. But if we move forward with Oliver's patch, a
> > pci_dev_put() would be needed for OpenCAPI, correct ?
> 
> 
> No, these code paths are nvlink-only.
> 

Oh yes indeed. Then this patch and yours fit well together :)

>Fred
> 
> 
> 
> >> Fred
> >>
> >>
> 
> 
> >
> > Signed-off-by: Oliver O'Halloran 
> > ---
> >arch/powerpc/platforms/powernv/npu-dma.c | 11 +++
> >1 file changed, 3 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
> > b/arch/powerpc/platforms/powernv/npu-dma.c
> > index 72d3749da02c..2eb6e6d45a98 100644
> > --- a/arch/powerpc/platforms/powernv/npu-dma.c
> > +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> > @@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct 
> > device_node *dn)
> > break;
> >
> > /*
> > -* pci_get_domain_bus_and_slot() increased the reference count 
> > of
> > -* the PCI device, but callers don't need that actually as the 
> > PE
> > -* already holds a reference to the device. Since callers aren't
> > -* aware of the reference count change, call pci_dev_put() now 
> > to
> > -* avoid leaks.
> > +* NB: for_each_pci_dev() elevates the pci_dev refcount.
> > +* Caller is responsible for dropping the ref when it's
> > +* finished with it.
> >  */
> > -   if (pdev)
> > -   pci_dev_put(pdev);
> > -
> > return pdev;
> >}
> >
> >
> 
> >>>
> >>
> > 
> 



Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs

2019-11-27 Thread Greg Kurz
On Wed, 27 Nov 2019 10:10:13 +0100
Frederic Barrat  wrote:

> 
> 
> Le 27/11/2019 à 09:24, Greg Kurz a écrit :
> > On Wed, 27 Nov 2019 18:09:40 +1100
> > Alexey Kardashevskiy  wrote:
> > 
> >>
> >>
> >> On 20/11/2019 12:28, Oliver O'Halloran wrote:
> >>> The comment here implies that we don't need to take a ref to the pci_dev
> >>> because the ioda_pe will always have one. This implies that the current
> >>> expection is that the pci_dev for an NPU device will *never* be torn
> >>> down since the ioda_pe having a ref to the device will prevent the
> >>> release function from being called.
> >>>
> >>> In other words, the desired behaviour here appears to be leaking a ref.
> >>>
> >>> Nice!
> >>
> >>
> >> There is a history: https://patchwork.ozlabs.org/patch/1088078/
> >>
> >> We did not fix anything in particular then, we do not seem to be fixing
> >> anything now (in other words - we cannot test it in a normal natural
> >> way). I'd drop this one.
> >>
> > 
> > Yeah, I didn't fix anything at the time. Just reverted to the ref
> > count behavior we had before:
> > 
> > https://patchwork.ozlabs.org/patch/829172/
> > 
> > Frederic recently posted his take on the same topic from the OpenCAPI
> > point of view:
> > 
> > http://patchwork.ozlabs.org/patch/1198947/
> > 
> > He seems to indicate the NPU devices as the real culprit because
> > nobody ever cared for them to be removable. Fixing that seems be
> > a chore nobody really wants to address obviously... :-\
> 
> 
> I had taken a stab at not leaking a ref for the nvlink devices and do 
> the proper thing regarding ref counting (i.e. fixing all the callers of 
> get_pci_dev() to drop the reference when they were done). With that, I 
> could see that the ref count of the nvlink devices could drop to 0 
> (calling remove for the device in /sys) and that the devices could go away.
> 
> But then, I realized it's not necessarily desirable at this point. There 
> are several comments in the code saying the npu devices (for nvlink) 
> don't go away, there's no device release callback defined when it seems 
> there should be, at least to handle releasing PEs All in all, it 
> seems that some work would be needed. And if it hasn't been required by 
> now...
> 

If everyone is ok with leaking a reference in the NPU case, I guess
this isn't a problem. But if we move forward with Oliver's patch, a
pci_dev_put() would be needed for OpenCAPI, correct ?

>Fred
> 
> 
> >>
> >>
> >>>
> >>> Signed-off-by: Oliver O'Halloran 
> >>> ---
> >>>   arch/powerpc/platforms/powernv/npu-dma.c | 11 +++
> >>>   1 file changed, 3 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
> >>> b/arch/powerpc/platforms/powernv/npu-dma.c
> >>> index 72d3749da02c..2eb6e6d45a98 100644
> >>> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> >>> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> >>> @@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct device_node 
> >>> *dn)
> >>>   break;
> >>>   
> >>>   /*
> >>> -  * pci_get_domain_bus_and_slot() increased the reference count of
> >>> -  * the PCI device, but callers don't need that actually as the PE
> >>> -  * already holds a reference to the device. Since callers aren't
> >>> -  * aware of the reference count change, call pci_dev_put() now to
> >>> -  * avoid leaks.
> >>> +  * NB: for_each_pci_dev() elevates the pci_dev refcount.
> >>> +  * Caller is responsible for dropping the ref when it's
> >>> +  * finished with it.
> >>>*/
> >>> - if (pdev)
> >>> - pci_dev_put(pdev);
> >>> -
> >>>   return pdev;
> >>>   }
> >>>   
> >>>
> >>
> > 
> 



Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs

2019-11-27 Thread Greg Kurz
On Wed, 27 Nov 2019 20:40:00 +1100
"Oliver O'Halloran"  wrote:

> On Wed, Nov 27, 2019 at 8:34 PM Greg Kurz  wrote:
> >
> >
> > If everyone is ok with leaking a reference in the NPU case, I guess
> > this isn't a problem. But if we move forward with Oliver's patch, a
> > pci_dev_put() would be needed for OpenCAPI, correct ?
> 
> Yes, but I think that's fair enough. By convention it's the callers
> responsibility to drop the ref when it calls a function that returns a
> refcounted object. Doing anything else creates a race condition since
> the object's count could drop to zero before the caller starts using
> it.
> 

Sure, you're right, especially with Frederic's patch that drops
the pci_dev_get(dev) in pnv_ioda_setup_dev_PE().

> Oliver



Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs

2019-11-27 Thread Frederic Barrat




Le 27/11/2019 à 10:33, Greg Kurz a écrit :

On Wed, 27 Nov 2019 10:10:13 +0100
Frederic Barrat  wrote:




Le 27/11/2019 à 09:24, Greg Kurz a écrit :

On Wed, 27 Nov 2019 18:09:40 +1100
Alexey Kardashevskiy  wrote:




On 20/11/2019 12:28, Oliver O'Halloran wrote:

The comment here implies that we don't need to take a ref to the pci_dev
because the ioda_pe will always have one. This implies that the current
expection is that the pci_dev for an NPU device will *never* be torn
down since the ioda_pe having a ref to the device will prevent the
release function from being called.

In other words, the desired behaviour here appears to be leaking a ref.

Nice!



There is a history: https://patchwork.ozlabs.org/patch/1088078/

We did not fix anything in particular then, we do not seem to be fixing
anything now (in other words - we cannot test it in a normal natural
way). I'd drop this one.



Yeah, I didn't fix anything at the time. Just reverted to the ref
count behavior we had before:

https://patchwork.ozlabs.org/patch/829172/

Frederic recently posted his take on the same topic from the OpenCAPI
point of view:

http://patchwork.ozlabs.org/patch/1198947/

He seems to indicate the NPU devices as the real culprit because
nobody ever cared for them to be removable. Fixing that seems be
a chore nobody really wants to address obviously... :-\



I had taken a stab at not leaking a ref for the nvlink devices and do
the proper thing regarding ref counting (i.e. fixing all the callers of
get_pci_dev() to drop the reference when they were done). With that, I
could see that the ref count of the nvlink devices could drop to 0
(calling remove for the device in /sys) and that the devices could go away.

But then, I realized it's not necessarily desirable at this point. There
are several comments in the code saying the npu devices (for nvlink)
don't go away, there's no device release callback defined when it seems
there should be, at least to handle releasing PEs All in all, it
seems that some work would be needed. And if it hasn't been required by
now...



If everyone is ok with leaking a reference in the NPU case, I guess
this isn't a problem. But if we move forward with Oliver's patch, a
pci_dev_put() would be needed for OpenCAPI, correct ?



No, these code paths are nvlink-only.

  Fred




Fred







Signed-off-by: Oliver O'Halloran 
---
   arch/powerpc/platforms/powernv/npu-dma.c | 11 +++
   1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
b/arch/powerpc/platforms/powernv/npu-dma.c
index 72d3749da02c..2eb6e6d45a98 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct device_node *dn)
break;
   
   	/*

-* pci_get_domain_bus_and_slot() increased the reference count of
-* the PCI device, but callers don't need that actually as the PE
-* already holds a reference to the device. Since callers aren't
-* aware of the reference count change, call pci_dev_put() now to
-* avoid leaks.
+* NB: for_each_pci_dev() elevates the pci_dev refcount.
+* Caller is responsible for dropping the ref when it's
+* finished with it.
 */
-   if (pdev)
-   pci_dev_put(pdev);
-
return pdev;
   }
   













Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs

2019-11-27 Thread Oliver O'Halloran
On Wed, Nov 27, 2019 at 8:34 PM Greg Kurz  wrote:
>
>
> If everyone is ok with leaking a reference in the NPU case, I guess
> this isn't a problem. But if we move forward with Oliver's patch, a
> pci_dev_put() would be needed for OpenCAPI, correct ?

Yes, but I think that's fair enough. By convention it's the callers
responsibility to drop the ref when it calls a function that returns a
refcounted object. Doing anything else creates a race condition since
the object's count could drop to zero before the caller starts using
it.

Oliver


Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs

2019-11-27 Thread Frederic Barrat




Le 27/11/2019 à 09:24, Greg Kurz a écrit :

On Wed, 27 Nov 2019 18:09:40 +1100
Alexey Kardashevskiy  wrote:




On 20/11/2019 12:28, Oliver O'Halloran wrote:

The comment here implies that we don't need to take a ref to the pci_dev
because the ioda_pe will always have one. This implies that the current
expection is that the pci_dev for an NPU device will *never* be torn
down since the ioda_pe having a ref to the device will prevent the
release function from being called.

In other words, the desired behaviour here appears to be leaking a ref.

Nice!



There is a history: https://patchwork.ozlabs.org/patch/1088078/

We did not fix anything in particular then, we do not seem to be fixing
anything now (in other words - we cannot test it in a normal natural
way). I'd drop this one.



Yeah, I didn't fix anything at the time. Just reverted to the ref
count behavior we had before:

https://patchwork.ozlabs.org/patch/829172/

Frederic recently posted his take on the same topic from the OpenCAPI
point of view:

http://patchwork.ozlabs.org/patch/1198947/

He seems to indicate the NPU devices as the real culprit because
nobody ever cared for them to be removable. Fixing that seems be
a chore nobody really wants to address obviously... :-\



I had taken a stab at not leaking a ref for the nvlink devices and do 
the proper thing regarding ref counting (i.e. fixing all the callers of 
get_pci_dev() to drop the reference when they were done). With that, I 
could see that the ref count of the nvlink devices could drop to 0 
(calling remove for the device in /sys) and that the devices could go away.


But then, I realized it's not necessarily desirable at this point. There 
are several comments in the code saying the npu devices (for nvlink) 
don't go away, there's no device release callback defined when it seems 
there should be, at least to handle releasing PEs All in all, it 
seems that some work would be needed. And if it hasn't been required by 
now...


  Fred







Signed-off-by: Oliver O'Halloran 
---
  arch/powerpc/platforms/powernv/npu-dma.c | 11 +++
  1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
b/arch/powerpc/platforms/powernv/npu-dma.c
index 72d3749da02c..2eb6e6d45a98 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct device_node *dn)
break;
  
  	/*

-* pci_get_domain_bus_and_slot() increased the reference count of
-* the PCI device, but callers don't need that actually as the PE
-* already holds a reference to the device. Since callers aren't
-* aware of the reference count change, call pci_dev_put() now to
-* avoid leaks.
+* NB: for_each_pci_dev() elevates the pci_dev refcount.
+* Caller is responsible for dropping the ref when it's
+* finished with it.
 */
-   if (pdev)
-   pci_dev_put(pdev);
-
return pdev;
  }
  









Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs

2019-11-27 Thread Greg Kurz
On Wed, 27 Nov 2019 18:09:40 +1100
Alexey Kardashevskiy  wrote:

> 
> 
> On 20/11/2019 12:28, Oliver O'Halloran wrote:
> > The comment here implies that we don't need to take a ref to the pci_dev
> > because the ioda_pe will always have one. This implies that the current
> > expection is that the pci_dev for an NPU device will *never* be torn
> > down since the ioda_pe having a ref to the device will prevent the
> > release function from being called.
> > 
> > In other words, the desired behaviour here appears to be leaking a ref.
> > 
> > Nice!
> 
> 
> There is a history: https://patchwork.ozlabs.org/patch/1088078/
> 
> We did not fix anything in particular then, we do not seem to be fixing
> anything now (in other words - we cannot test it in a normal natural
> way). I'd drop this one.
> 

Yeah, I didn't fix anything at the time. Just reverted to the ref
count behavior we had before:

https://patchwork.ozlabs.org/patch/829172/

Frederic recently posted his take on the same topic from the OpenCAPI
point of view:

http://patchwork.ozlabs.org/patch/1198947/

He seems to indicate the NPU devices as the real culprit because
nobody ever cared for them to be removable. Fixing that seems be
a chore nobody really wants to address obviously... :-\

> 
> 
> > 
> > Signed-off-by: Oliver O'Halloran 
> > ---
> >  arch/powerpc/platforms/powernv/npu-dma.c | 11 +++
> >  1 file changed, 3 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
> > b/arch/powerpc/platforms/powernv/npu-dma.c
> > index 72d3749da02c..2eb6e6d45a98 100644
> > --- a/arch/powerpc/platforms/powernv/npu-dma.c
> > +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> > @@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct device_node 
> > *dn)
> > break;
> >  
> > /*
> > -* pci_get_domain_bus_and_slot() increased the reference count of
> > -* the PCI device, but callers don't need that actually as the PE
> > -* already holds a reference to the device. Since callers aren't
> > -* aware of the reference count change, call pci_dev_put() now to
> > -* avoid leaks.
> > +* NB: for_each_pci_dev() elevates the pci_dev refcount.
> > +* Caller is responsible for dropping the ref when it's
> > +* finished with it.
> >  */
> > -   if (pdev)
> > -   pci_dev_put(pdev);
> > -
> > return pdev;
> >  }
> >  
> > 
> 



Re: [Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs

2019-11-26 Thread Alexey Kardashevskiy



On 20/11/2019 12:28, Oliver O'Halloran wrote:
> The comment here implies that we don't need to take a ref to the pci_dev
> because the ioda_pe will always have one. This implies that the current
> expection is that the pci_dev for an NPU device will *never* be torn
> down since the ioda_pe having a ref to the device will prevent the
> release function from being called.
> 
> In other words, the desired behaviour here appears to be leaking a ref.
> 
> Nice!


There is a history: https://patchwork.ozlabs.org/patch/1088078/

We did not fix anything in particular then, we do not seem to be fixing
anything now (in other words - we cannot test it in a normal natural
way). I'd drop this one.



> 
> Signed-off-by: Oliver O'Halloran 
> ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 11 +++
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
> b/arch/powerpc/platforms/powernv/npu-dma.c
> index 72d3749da02c..2eb6e6d45a98 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct device_node *dn)
>   break;
>  
>   /*
> -  * pci_get_domain_bus_and_slot() increased the reference count of
> -  * the PCI device, but callers don't need that actually as the PE
> -  * already holds a reference to the device. Since callers aren't
> -  * aware of the reference count change, call pci_dev_put() now to
> -  * avoid leaks.
> +  * NB: for_each_pci_dev() elevates the pci_dev refcount.
> +  * Caller is responsible for dropping the ref when it's
> +  * finished with it.
>*/
> - if (pdev)
> - pci_dev_put(pdev);
> -
>   return pdev;
>  }
>  
> 

-- 
Alexey