RE: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-27 Thread Chen, Guchun
[Public]

Hi Andrey and Christian,

I just send out a new patch to address this, I am not sure if I understand your 
point correctly. Please review.

The patch is to stop scheduler in fence_hw_fini and start the scheduler in 
fence_hw_init.

Regards,
Guchun

-Original Message-
From: Grodzovsky, Andrey  
Sent: Monday, August 23, 2021 10:42 PM
To: Christian König ; Chen, Guchun 
; Alex Deucher ; Mike Lothian 
; Koenig, Christian 
Cc: amd-gfx list ; Gao, Likun 
; Zhang, Hawking ; Deucher, Alexander 

Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 
test (v2)


On 2021-08-23 2:50 a.m., Christian König wrote:
> Good mornings guys,
>
> Andrey has a rather valid concern here, but I think we need to 
> approach this from a more high level view.
>
> When hw_fini is called we should make sure that the scheduler can't 
> submit any more work to the hardware, because the hw is finalized and 
> not expected to response any more.
>
> As far as I can see the cleanest approach would be to stop the 
> scheduler in hw_fini and fully clean it up in sw_fini. That would also 
> fit quite nicely with how GPU reset is supposed to work I think.
>
> Problem is that this is currently done outside of the fence code for 
> the at least the reset case, so before we restructure that we need to 
> stick with what we have.
>
> Andrey do you think it would be any problem if we stop the scheduler 
> manually in the hot plug case as well?


As long as it's 'parked' inside HW fini - meaning the thread submitting to HW 
is done I think it should cover hot unplug as well.

Andrey


>
> Thanks,
> Christian.
>
> Am 23.08.21 um 08:36 schrieb Chen, Guchun:
>> [Public]
>>
>> Hi Andrey,
>>
>> Thanks for your notice. The cause why moving drm_sched_fini to 
>> sw_fini is it's a SW behavior and part of SW shutdown, so hw_fini 
>> should not touch it. But if the race, that scheduler on the ring 
>> possibly keeps submitting jobs which causes un-empty ring is there, 
>> possibly we still need to call drm_sched_fini first in hw_fini to 
>> stop job submission first.
>>
>> @Koenig, Christian what's your opinion?
>>
>> Regards,
>> Guchun
>>
>> -Original Message-
>> From: Alex Deucher 
>> Sent: Friday, August 20, 2021 2:13 AM
>> To: Mike Lothian 
>> Cc: Grodzovsky, Andrey ; Chen, Guchun 
>> ; amd-gfx list ; 
>> Gao, Likun ; Koenig, Christian 
>> ; Zhang, Hawking ; 
>> Deucher, Alexander 
>> Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver 
>> fini in s3 test (v2)
>>
>> Please go ahead.  Thanks!
>>
>> Alex
>>
>> On Thu, Aug 19, 2021 at 8:05 AM Mike Lothian 
>> wrote:
>>> Hi
>>>
>>> Do I need to open a new bug report for this?
>>>
>>> Cheers
>>>
>>> Mike
>>>
>>> On Wed, 18 Aug 2021 at 06:26, Andrey Grodzovsky 
>>>  wrote:
>>>>
>>>> On 2021-08-02 1:16 a.m., Guchun Chen wrote:
>>>>> In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to 
>>>>> stop scheduler in s3 test, otherwise, fence related failure will 
>>>>> arrive after resume. To fix this and for a better clean up, move 
>>>>> drm_sched_fini from fence_hw_fini to fence_sw_fini, as it's part 
>>>>> of driver shutdown, and should never be called in hw_fini.
>>>>>
>>>>> v2: rename amdgpu_fence_driver_init to 
>>>>> amdgpu_fence_driver_sw_init, to keep sw_init and sw_fini paired.
>>>>>
>>>>> Fixes: cd87a6dcf6af drm/amdgpu: adjust fence driver enable 
>>>>> sequence
>>>>> Suggested-by: Christian König 
>>>>> Signed-off-by: Guchun Chen 
>>>>> ---
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 ++---
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +++-
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  4 ++--
>>>>>    3 files changed, 11 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> index b1d2dc39e8be..9e53ff851496 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> @@ -3646,9 +3646,9 @@ int amdgpu_device_init(struct amdgpu_device 
>>>>> *adev,
>>>>>
>>>>>    fence_driver_init:
>>>>> 

Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-23 Thread Andrey Grodzovsky



On 2021-08-23 2:50 a.m., Christian König wrote:

Good mornings guys,

Andrey has a rather valid concern here, but I think we need to 
approach this from a more high level view.


When hw_fini is called we should make sure that the scheduler can't 
submit any more work to the hardware, because the hw is finalized and 
not expected to response any more.


As far as I can see the cleanest approach would be to stop the 
scheduler in hw_fini and fully clean it up in sw_fini. That would also 
fit quite nicely with how GPU reset is supposed to work I think.


Problem is that this is currently done outside of the fence code for 
the at least the reset case, so before we restructure that we need to 
stick with what we have.


Andrey do you think it would be any problem if we stop the scheduler 
manually in the hot plug case as well?



As long as it's 'parked' inside HW fini - meaning the thread submitting 
to HW is done I think it should cover hot unplug as well.


Andrey




Thanks,
Christian.

Am 23.08.21 um 08:36 schrieb Chen, Guchun:

[Public]

Hi Andrey,

Thanks for your notice. The cause why moving drm_sched_fini to 
sw_fini is it's a SW behavior and part of SW shutdown, so hw_fini 
should not touch it. But if the race, that scheduler on the ring 
possibly keeps submitting jobs which causes un-empty ring is there, 
possibly we still need to call drm_sched_fini first in hw_fini to 
stop job submission first.


@Koenig, Christian what's your opinion?

Regards,
Guchun

-Original Message-
From: Alex Deucher 
Sent: Friday, August 20, 2021 2:13 AM
To: Mike Lothian 
Cc: Grodzovsky, Andrey ; Chen, Guchun 
; amd-gfx list ; 
Gao, Likun ; Koenig, Christian 
; Zhang, Hawking ; 
Deucher, Alexander 
Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver 
fini in s3 test (v2)


Please go ahead.  Thanks!

Alex

On Thu, Aug 19, 2021 at 8:05 AM Mike Lothian  
wrote:

Hi

Do I need to open a new bug report for this?

Cheers

Mike

On Wed, 18 Aug 2021 at 06:26, Andrey Grodzovsky 
 wrote:


On 2021-08-02 1:16 a.m., Guchun Chen wrote:

In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to
stop scheduler in s3 test, otherwise, fence related failure will
arrive after resume. To fix this and for a better clean up, move
drm_sched_fini from fence_hw_fini to fence_sw_fini, as it's part of
driver shutdown, and should never be called in hw_fini.

v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init,
to keep sw_init and sw_fini paired.

Fixes: cd87a6dcf6af drm/amdgpu: adjust fence driver enable sequence
Suggested-by: Christian König 
Signed-off-by: Guchun Chen 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 ++---
   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +++-
   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  4 ++--
   3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b1d2dc39e8be..9e53ff851496 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3646,9 +3646,9 @@ int amdgpu_device_init(struct amdgpu_device
*adev,

   fence_driver_init:
   /* Fence driver */
- r = amdgpu_fence_driver_init(adev);
+ r = amdgpu_fence_driver_sw_init(adev);
   if (r) {
- dev_err(adev->dev, "amdgpu_fence_driver_init 
failed\n");

+ dev_err(adev->dev, "amdgpu_fence_driver_sw_init
+ failed\n");
   amdgpu_vf_error_put(adev, 
AMDGIM_ERROR_VF_FENCE_INIT_FAIL, 0, 0);

   goto failed;
   }
@@ -3988,7 +3988,6 @@ int amdgpu_device_resume(struct drm_device 
*dev, bool fbcon)

   }
   amdgpu_fence_driver_hw_init(adev);

-
   r = amdgpu_device_ip_late_init(adev);
   if (r)
   return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 49c5c7331c53..7495911516c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -498,7 +498,7 @@ int amdgpu_fence_driver_init_ring(struct 
amdgpu_ring *ring,

   }

   /**
- * amdgpu_fence_driver_init - init the fence driver
+ * amdgpu_fence_driver_sw_init - init the fence driver
    * for all possible rings.
    *
    * @adev: amdgpu device pointer
@@ -509,13 +509,13 @@ int amdgpu_fence_driver_init_ring(struct 
amdgpu_ring *ring,

    * amdgpu_fence_driver_start_ring().
    * Returns 0 for success.
    */
-int amdgpu_fence_driver_init(struct amdgpu_device *adev)
+int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev)
   {
   return 0;
   }

   /**
- * amdgpu_fence_driver_fini - tear down the fence driver
+ * amdgpu_fence_driver_hw_fini - tear down the fence driver
    * for all possible rings.
    *
    * @adev: amdgpu device pointer
@@ -531,8 +531,7 @@ void amdgpu_fence_driver_hw_fini(struct
amdgpu_device *ad

Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-22 Thread Christian König

Good mornings guys,

Andrey has a rather valid concern here, but I think we need to approach 
this from a more high level view.


When hw_fini is called we should make sure that the scheduler can't 
submit any more work to the hardware, because the hw is finalized and 
not expected to response any more.


As far as I can see the cleanest approach would be to stop the scheduler 
in hw_fini and fully clean it up in sw_fini. That would also fit quite 
nicely with how GPU reset is supposed to work I think.


Problem is that this is currently done outside of the fence code for the 
at least the reset case, so before we restructure that we need to stick 
with what we have.


Andrey do you think it would be any problem if we stop the scheduler 
manually in the hot plug case as well?


Thanks,
Christian.

Am 23.08.21 um 08:36 schrieb Chen, Guchun:

[Public]

Hi Andrey,

Thanks for your notice. The cause why moving drm_sched_fini to sw_fini is it's 
a SW behavior and part of SW shutdown, so hw_fini should not touch it. But if 
the race, that scheduler on the ring possibly keeps submitting jobs which 
causes un-empty ring is there, possibly we still need to call drm_sched_fini 
first in hw_fini to stop job submission first.

@Koenig, Christian what's your opinion?

Regards,
Guchun

-Original Message-
From: Alex Deucher 
Sent: Friday, August 20, 2021 2:13 AM
To: Mike Lothian 
Cc: Grodzovsky, Andrey ; Chen, Guchun ; amd-gfx list 
; Gao, Likun ; Koenig, Christian 
; Zhang, Hawking ; Deucher, Alexander 

Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 
test (v2)

Please go ahead.  Thanks!

Alex

On Thu, Aug 19, 2021 at 8:05 AM Mike Lothian  wrote:

Hi

Do I need to open a new bug report for this?

Cheers

Mike

On Wed, 18 Aug 2021 at 06:26, Andrey Grodzovsky  
wrote:


On 2021-08-02 1:16 a.m., Guchun Chen wrote:

In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to
stop scheduler in s3 test, otherwise, fence related failure will
arrive after resume. To fix this and for a better clean up, move
drm_sched_fini from fence_hw_fini to fence_sw_fini, as it's part of
driver shutdown, and should never be called in hw_fini.

v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init,
to keep sw_init and sw_fini paired.

Fixes: cd87a6dcf6af drm/amdgpu: adjust fence driver enable sequence
Suggested-by: Christian König 
Signed-off-by: Guchun Chen 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 ++---
   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +++-
   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  4 ++--
   3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b1d2dc39e8be..9e53ff851496 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3646,9 +3646,9 @@ int amdgpu_device_init(struct amdgpu_device
*adev,

   fence_driver_init:
   /* Fence driver */
- r = amdgpu_fence_driver_init(adev);
+ r = amdgpu_fence_driver_sw_init(adev);
   if (r) {
- dev_err(adev->dev, "amdgpu_fence_driver_init failed\n");
+ dev_err(adev->dev, "amdgpu_fence_driver_sw_init
+ failed\n");
   amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_FENCE_INIT_FAIL, 0, 0);
   goto failed;
   }
@@ -3988,7 +3988,6 @@ int amdgpu_device_resume(struct drm_device *dev, bool 
fbcon)
   }
   amdgpu_fence_driver_hw_init(adev);

-
   r = amdgpu_device_ip_late_init(adev);
   if (r)
   return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 49c5c7331c53..7495911516c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -498,7 +498,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
   }

   /**
- * amdgpu_fence_driver_init - init the fence driver
+ * amdgpu_fence_driver_sw_init - init the fence driver
* for all possible rings.
*
* @adev: amdgpu device pointer
@@ -509,13 +509,13 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring 
*ring,
* amdgpu_fence_driver_start_ring().
* Returns 0 for success.
*/
-int amdgpu_fence_driver_init(struct amdgpu_device *adev)
+int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev)
   {
   return 0;
   }

   /**
- * amdgpu_fence_driver_fini - tear down the fence driver
+ * amdgpu_fence_driver_hw_fini - tear down the fence driver
* for all possible rings.
*
* @adev: amdgpu device pointer
@@ -531,8 +531,7 @@ void amdgpu_fence_driver_hw_fini(struct
amdgpu_device *adev)

   if (!ring || !ring->fence_drv.initialized)
   continue;
- if (!ring->no_scheduler)
- drm_sched_fini(&ring->sched);
+
   /* You can't wait for HW to

RE: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-22 Thread Chen, Guchun
[Public]

Hi Andrey,

Thanks for your notice. The cause why moving drm_sched_fini to sw_fini is it's 
a SW behavior and part of SW shutdown, so hw_fini should not touch it. But if 
the race, that scheduler on the ring possibly keeps submitting jobs which 
causes un-empty ring is there, possibly we still need to call drm_sched_fini 
first in hw_fini to stop job submission first.

@Koenig, Christian what's your opinion?

Regards,
Guchun

-Original Message-
From: Alex Deucher  
Sent: Friday, August 20, 2021 2:13 AM
To: Mike Lothian 
Cc: Grodzovsky, Andrey ; Chen, Guchun 
; amd-gfx list ; Gao, Likun 
; Koenig, Christian ; Zhang, 
Hawking ; Deucher, Alexander 
Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 
test (v2)

Please go ahead.  Thanks!

Alex

On Thu, Aug 19, 2021 at 8:05 AM Mike Lothian  wrote:
>
> Hi
>
> Do I need to open a new bug report for this?
>
> Cheers
>
> Mike
>
> On Wed, 18 Aug 2021 at 06:26, Andrey Grodzovsky  
> wrote:
>>
>>
>> On 2021-08-02 1:16 a.m., Guchun Chen wrote:
>> > In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to 
>> > stop scheduler in s3 test, otherwise, fence related failure will 
>> > arrive after resume. To fix this and for a better clean up, move 
>> > drm_sched_fini from fence_hw_fini to fence_sw_fini, as it's part of 
>> > driver shutdown, and should never be called in hw_fini.
>> >
>> > v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init, 
>> > to keep sw_init and sw_fini paired.
>> >
>> > Fixes: cd87a6dcf6af drm/amdgpu: adjust fence driver enable sequence
>> > Suggested-by: Christian König 
>> > Signed-off-by: Guchun Chen 
>> > ---
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 ++---
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +++-
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  4 ++--
>> >   3 files changed, 11 insertions(+), 10 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> > index b1d2dc39e8be..9e53ff851496 100644
>> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> > @@ -3646,9 +3646,9 @@ int amdgpu_device_init(struct amdgpu_device 
>> > *adev,
>> >
>> >   fence_driver_init:
>> >   /* Fence driver */
>> > - r = amdgpu_fence_driver_init(adev);
>> > + r = amdgpu_fence_driver_sw_init(adev);
>> >   if (r) {
>> > - dev_err(adev->dev, "amdgpu_fence_driver_init failed\n");
>> > + dev_err(adev->dev, "amdgpu_fence_driver_sw_init 
>> > + failed\n");
>> >   amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_FENCE_INIT_FAIL, 
>> > 0, 0);
>> >   goto failed;
>> >   }
>> > @@ -3988,7 +3988,6 @@ int amdgpu_device_resume(struct drm_device *dev, 
>> > bool fbcon)
>> >   }
>> >   amdgpu_fence_driver_hw_init(adev);
>> >
>> > -
>> >   r = amdgpu_device_ip_late_init(adev);
>> >   if (r)
>> >   return r;
>> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> > index 49c5c7331c53..7495911516c2 100644
>> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> > @@ -498,7 +498,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring 
>> > *ring,
>> >   }
>> >
>> >   /**
>> > - * amdgpu_fence_driver_init - init the fence driver
>> > + * amdgpu_fence_driver_sw_init - init the fence driver
>> >* for all possible rings.
>> >*
>> >* @adev: amdgpu device pointer
>> > @@ -509,13 +509,13 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring 
>> > *ring,
>> >* amdgpu_fence_driver_start_ring().
>> >* Returns 0 for success.
>> >*/
>> > -int amdgpu_fence_driver_init(struct amdgpu_device *adev)
>> > +int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev)
>> >   {
>> >   return 0;
>> >   }
>> >
>> >   /**
>> > - * amdgpu_fence_driver_fini - tear down the fence driver
>> > + * amdgpu_fence_driver_hw_fini - tear down the fence driver
>> >* for all possible rings.
>> >*
>> >* @adev: amdgpu device pointer
>> > @@ -531,8 +531,7 @@ 

Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-19 Thread Alex Deucher
Please go ahead.  Thanks!

Alex

On Thu, Aug 19, 2021 at 8:05 AM Mike Lothian  wrote:
>
> Hi
>
> Do I need to open a new bug report for this?
>
> Cheers
>
> Mike
>
> On Wed, 18 Aug 2021 at 06:26, Andrey Grodzovsky  
> wrote:
>>
>>
>> On 2021-08-02 1:16 a.m., Guchun Chen wrote:
>> > In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to stop
>> > scheduler in s3 test, otherwise, fence related failure will arrive
>> > after resume. To fix this and for a better clean up, move drm_sched_fini
>> > from fence_hw_fini to fence_sw_fini, as it's part of driver shutdown, and
>> > should never be called in hw_fini.
>> >
>> > v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init,
>> > to keep sw_init and sw_fini paired.
>> >
>> > Fixes: cd87a6dcf6af drm/amdgpu: adjust fence driver enable sequence
>> > Suggested-by: Christian König 
>> > Signed-off-by: Guchun Chen 
>> > ---
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 ++---
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +++-
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  4 ++--
>> >   3 files changed, 11 insertions(+), 10 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> > index b1d2dc39e8be..9e53ff851496 100644
>> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> > @@ -3646,9 +3646,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>> >
>> >   fence_driver_init:
>> >   /* Fence driver */
>> > - r = amdgpu_fence_driver_init(adev);
>> > + r = amdgpu_fence_driver_sw_init(adev);
>> >   if (r) {
>> > - dev_err(adev->dev, "amdgpu_fence_driver_init failed\n");
>> > + dev_err(adev->dev, "amdgpu_fence_driver_sw_init failed\n");
>> >   amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_FENCE_INIT_FAIL, 
>> > 0, 0);
>> >   goto failed;
>> >   }
>> > @@ -3988,7 +3988,6 @@ int amdgpu_device_resume(struct drm_device *dev, 
>> > bool fbcon)
>> >   }
>> >   amdgpu_fence_driver_hw_init(adev);
>> >
>> > -
>> >   r = amdgpu_device_ip_late_init(adev);
>> >   if (r)
>> >   return r;
>> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> > index 49c5c7331c53..7495911516c2 100644
>> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> > @@ -498,7 +498,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring 
>> > *ring,
>> >   }
>> >
>> >   /**
>> > - * amdgpu_fence_driver_init - init the fence driver
>> > + * amdgpu_fence_driver_sw_init - init the fence driver
>> >* for all possible rings.
>> >*
>> >* @adev: amdgpu device pointer
>> > @@ -509,13 +509,13 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring 
>> > *ring,
>> >* amdgpu_fence_driver_start_ring().
>> >* Returns 0 for success.
>> >*/
>> > -int amdgpu_fence_driver_init(struct amdgpu_device *adev)
>> > +int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev)
>> >   {
>> >   return 0;
>> >   }
>> >
>> >   /**
>> > - * amdgpu_fence_driver_fini - tear down the fence driver
>> > + * amdgpu_fence_driver_hw_fini - tear down the fence driver
>> >* for all possible rings.
>> >*
>> >* @adev: amdgpu device pointer
>> > @@ -531,8 +531,7 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device 
>> > *adev)
>> >
>> >   if (!ring || !ring->fence_drv.initialized)
>> >   continue;
>> > - if (!ring->no_scheduler)
>> > - drm_sched_fini(&ring->sched);
>> > +
>> >   /* You can't wait for HW to signal if it's gone */
>> >   if (!drm_dev_is_unplugged(&adev->ddev))
>> >   r = amdgpu_fence_wait_empty(ring);
>>
>>
>> Sorry for late notice, missed this patch. By moving drm_sched_fini
>> past amdgpu_fence_wait_empty a race is created as even after you waited
>> for all fences on the ring to signal the sw scheduler will keep submitting
>> new jobs on the ring and so the ring won't stay empty.
>>
>> For hot device removal also we want to prevent any access to HW past PCI
>> removal
>> in order to not do any MMIO accesses inside the physical MMIO range that
>> no longer
>> belongs to this device after it's removal by the PCI core. Stopping all
>> the schedulers prevents any MMIO
>> accesses done during job submissions and that why drm_sched_fini was
>> done as part of amdgpu_fence_driver_hw_fini
>> and not amdgpu_fence_driver_sw_fini
>>
>> Andrey
>>
>> > @@ -560,6 +559,9 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device 
>> > *adev)
>> >   if (!ring || !ring->fence_drv.initialized)
>> >   continue;
>> >
>> > + if (!ring->no_scheduler)
>> > + drm_sched_fini(&ring->sched);
>> > +
>> >   for (j = 0; j <= ring->fence_drv.num_fences_mask; +

Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-19 Thread Mike Lothian
Hi

Do I need to open a new bug report for this?

Cheers

Mike

On Wed, 18 Aug 2021 at 06:26, Andrey Grodzovsky 
wrote:

>
> On 2021-08-02 1:16 a.m., Guchun Chen wrote:
> > In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to stop
> > scheduler in s3 test, otherwise, fence related failure will arrive
> > after resume. To fix this and for a better clean up, move drm_sched_fini
> > from fence_hw_fini to fence_sw_fini, as it's part of driver shutdown, and
> > should never be called in hw_fini.
> >
> > v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init,
> > to keep sw_init and sw_fini paired.
> >
> > Fixes: cd87a6dcf6af drm/amdgpu: adjust fence driver enable sequence
> > Suggested-by: Christian König 
> > Signed-off-by: Guchun Chen 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 ++---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +++-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  4 ++--
> >   3 files changed, 11 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index b1d2dc39e8be..9e53ff851496 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -3646,9 +3646,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> >
> >   fence_driver_init:
> >   /* Fence driver */
> > - r = amdgpu_fence_driver_init(adev);
> > + r = amdgpu_fence_driver_sw_init(adev);
> >   if (r) {
> > - dev_err(adev->dev, "amdgpu_fence_driver_init failed\n");
> > + dev_err(adev->dev, "amdgpu_fence_driver_sw_init failed\n");
> >   amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_FENCE_INIT_FAIL,
> 0, 0);
> >   goto failed;
> >   }
> > @@ -3988,7 +3988,6 @@ int amdgpu_device_resume(struct drm_device *dev,
> bool fbcon)
> >   }
> >   amdgpu_fence_driver_hw_init(adev);
> >
> > -
> >   r = amdgpu_device_ip_late_init(adev);
> >   if (r)
> >   return r;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > index 49c5c7331c53..7495911516c2 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > @@ -498,7 +498,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring
> *ring,
> >   }
> >
> >   /**
> > - * amdgpu_fence_driver_init - init the fence driver
> > + * amdgpu_fence_driver_sw_init - init the fence driver
> >* for all possible rings.
> >*
> >* @adev: amdgpu device pointer
> > @@ -509,13 +509,13 @@ int amdgpu_fence_driver_init_ring(struct
> amdgpu_ring *ring,
> >* amdgpu_fence_driver_start_ring().
> >* Returns 0 for success.
> >*/
> > -int amdgpu_fence_driver_init(struct amdgpu_device *adev)
> > +int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev)
> >   {
> >   return 0;
> >   }
> >
> >   /**
> > - * amdgpu_fence_driver_fini - tear down the fence driver
> > + * amdgpu_fence_driver_hw_fini - tear down the fence driver
> >* for all possible rings.
> >*
> >* @adev: amdgpu device pointer
> > @@ -531,8 +531,7 @@ void amdgpu_fence_driver_hw_fini(struct
> amdgpu_device *adev)
> >
> >   if (!ring || !ring->fence_drv.initialized)
> >   continue;
> > - if (!ring->no_scheduler)
> > - drm_sched_fini(&ring->sched);
> > +
> >   /* You can't wait for HW to signal if it's gone */
> >   if (!drm_dev_is_unplugged(&adev->ddev))
> >   r = amdgpu_fence_wait_empty(ring);
>
>
> Sorry for late notice, missed this patch. By moving drm_sched_fini
> past amdgpu_fence_wait_empty a race is created as even after you waited
> for all fences on the ring to signal the sw scheduler will keep submitting
> new jobs on the ring and so the ring won't stay empty.
>
> For hot device removal also we want to prevent any access to HW past PCI
> removal
> in order to not do any MMIO accesses inside the physical MMIO range that
> no longer
> belongs to this device after it's removal by the PCI core. Stopping all
> the schedulers prevents any MMIO
> accesses done during job submissions and that why drm_sched_fini was
> done as part of amdgpu_fence_driver_hw_fini
> and not amdgpu_fence_driver_sw_fini
>
> Andrey
>
> > @@ -560,6 +559,9 @@ void amdgpu_fence_driver_sw_fini(struct
> amdgpu_device *adev)
> >   if (!ring || !ring->fence_drv.initialized)
> >   continue;
> >
> > + if (!ring->no_scheduler)
> > + drm_sched_fini(&ring->sched);
> > +
> >   for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
> >   dma_fence_put(ring->fence_drv.fences[j]);
> >   kfree(ring->fence_drv.fences);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > index 27adffa7658d..9

Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-17 Thread Andrey Grodzovsky



On 2021-08-02 1:16 a.m., Guchun Chen wrote:

In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to stop
scheduler in s3 test, otherwise, fence related failure will arrive
after resume. To fix this and for a better clean up, move drm_sched_fini
from fence_hw_fini to fence_sw_fini, as it's part of driver shutdown, and
should never be called in hw_fini.

v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init,
to keep sw_init and sw_fini paired.

Fixes: cd87a6dcf6af drm/amdgpu: adjust fence driver enable sequence
Suggested-by: Christian König 
Signed-off-by: Guchun Chen 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  4 ++--
  3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b1d2dc39e8be..9e53ff851496 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3646,9 +3646,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
  
  fence_driver_init:

/* Fence driver */
-   r = amdgpu_fence_driver_init(adev);
+   r = amdgpu_fence_driver_sw_init(adev);
if (r) {
-   dev_err(adev->dev, "amdgpu_fence_driver_init failed\n");
+   dev_err(adev->dev, "amdgpu_fence_driver_sw_init failed\n");
amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_FENCE_INIT_FAIL, 0, 
0);
goto failed;
}
@@ -3988,7 +3988,6 @@ int amdgpu_device_resume(struct drm_device *dev, bool 
fbcon)
}
amdgpu_fence_driver_hw_init(adev);
  
-

r = amdgpu_device_ip_late_init(adev);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 49c5c7331c53..7495911516c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -498,7 +498,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
  }
  
  /**

- * amdgpu_fence_driver_init - init the fence driver
+ * amdgpu_fence_driver_sw_init - init the fence driver
   * for all possible rings.
   *
   * @adev: amdgpu device pointer
@@ -509,13 +509,13 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring 
*ring,
   * amdgpu_fence_driver_start_ring().
   * Returns 0 for success.
   */
-int amdgpu_fence_driver_init(struct amdgpu_device *adev)
+int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev)
  {
return 0;
  }
  
  /**

- * amdgpu_fence_driver_fini - tear down the fence driver
+ * amdgpu_fence_driver_hw_fini - tear down the fence driver
   * for all possible rings.
   *
   * @adev: amdgpu device pointer
@@ -531,8 +531,7 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev)
  
  		if (!ring || !ring->fence_drv.initialized)

continue;
-   if (!ring->no_scheduler)
-   drm_sched_fini(&ring->sched);
+
/* You can't wait for HW to signal if it's gone */
if (!drm_dev_is_unplugged(&adev->ddev))
r = amdgpu_fence_wait_empty(ring);



Sorry for late notice, missed this patch. By moving drm_sched_fini
past amdgpu_fence_wait_empty a race is created as even after you waited
for all fences on the ring to signal the sw scheduler will keep submitting
new jobs on the ring and so the ring won't stay empty.

For hot device removal also we want to prevent any access to HW past PCI 
removal
in order to not do any MMIO accesses inside the physical MMIO range that 
no longer
belongs to this device after it's removal by the PCI core. Stopping all 
the schedulers prevents any MMIO
accesses done during job submissions and that why drm_sched_fini was 
done as part of amdgpu_fence_driver_hw_fini

and not amdgpu_fence_driver_sw_fini

Andrey


@@ -560,6 +559,9 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
if (!ring || !ring->fence_drv.initialized)
continue;
  
+		if (!ring->no_scheduler)

+   drm_sched_fini(&ring->sched);
+
for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
dma_fence_put(ring->fence_drv.fences[j]);
kfree(ring->fence_drv.fences);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 27adffa7658d..9c11ced4312c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -106,7 +106,6 @@ struct amdgpu_fence_driver {
struct dma_fence**fences;
  };
  
-int amdgpu_fence_driver_init(struct amdgpu_device *adev);

  void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring);
  
  int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,

@@ -115,9 +114,10 @@ int amdgpu_fence_driver_init_ring

Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-17 Thread Mike Lothian
Hi

I've just noticed something similar when starting weston, I still see it
with this patch, but not on linus's tree

I'll confirm for sure tomorrow and send the stack trace if I can save it

Cheers

Mike

On Tue, 3 Aug 2021 at 02:56, Chen, Guchun  wrote:

> [Public]
>
> Hi Alex,
>
> I submitted the patch before your message, I will take care of this next
> time.
>
> Regards,
> Guchun
>
> -Original Message-
> From: Alex Deucher 
> Sent: Monday, August 2, 2021 9:35 PM
> To: Chen, Guchun 
> Cc: Christian König ;
> amd-gfx@lists.freedesktop.org; Gao, Likun ; Koenig,
> Christian ; Zhang, Hawking <
> hawking.zh...@amd.com>; Deucher, Alexander 
> Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in
> s3 test (v2)
>
> On Mon, Aug 2, 2021 at 4:23 AM Chen, Guchun  wrote:
> >
> > [Public]
> >
> > Thank you, Christian.
> >
> > Regarding fence_drv.initialized, it looks to a bit redundant, anyway let
> me look into this more.
>
> Does this patch fix this bug?
>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1668&data=04%7C01%7CGuchun.Chen%40amd.com%7C2bf8bebf5b424751572408d955ba66e8%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637635081353279181%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=FuAo44Ws5SnuCxt45A%2Fqmu%2B3OfEkat1G%2BixO8G9uDVc%3D&reserved=0
>
> If so, please add:
> Bug:
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1668&data=04%7C01%7CGuchun.Chen%40amd.com%7C2bf8bebf5b424751572408d955ba66e8%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637635081353279181%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=FuAo44Ws5SnuCxt45A%2Fqmu%2B3OfEkat1G%2BixO8G9uDVc%3D&reserved=0
> to the commit message.
>
> Alex
>
> >
> > Regards,
> > Guchun
> >
> > -Original Message-
> > From: Christian König 
> > Sent: Monday, August 2, 2021 2:56 PM
> > To: Chen, Guchun ; amd-gfx@lists.freedesktop.org;
> > Gao, Likun ; Koenig, Christian
> > ; Zhang, Hawking ;
> > Deucher, Alexander 
> > Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver
> > fini in s3 test (v2)
> >
> > Am 02.08.21 um 07:16 schrieb Guchun Chen:
> > > In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to
> > > stop scheduler in s3 test, otherwise, fence related failure will
> > > arrive after resume. To fix this and for a better clean up, move
> > > drm_sched_fini from fence_hw_fini to fence_sw_fini, as it's part of
> > > driver shutdown, and should never be called in hw_fini.
> > >
> > > v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init,
> > > to keep sw_init and sw_fini paired.
> > >
> > > Fixes: cd87a6dcf6af drm/amdgpu: adjust fence driver enable sequence
> > > Suggested-by: Christian König 
> > > Signed-off-by: Guchun Chen 
> >
> > It's a bit ambiguous now what fence_drv.initialized means, but I think
> we can live with that for now.
> >
> > Patch is Reviewed-by: Christian König .
> >
> > Regards,
> > Christian.
> >
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 ++---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +++-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  4 ++--
> > >   3 files changed, 11 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > index b1d2dc39e8be..9e53ff851496 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > @@ -3646,9 +3646,9 @@ int amdgpu_device_init(struct amdgpu_device
> > > *adev,
> > >
> > >   fence_driver_init:
> > >   /* Fence driver */
> > > - r = amdgpu_fence_driver_init(adev);
> > > + r = amdgpu_fence_driver_sw_init(adev);
> > >   if (r) {
> > > - dev_err(adev->dev, "amdgpu_fence_driver_init failed\n");
> > > + dev_err(adev->dev, "amdgpu_fence_driver_sw_init
> > > + failed\n");
> > >   amdgpu_vf_error_put(adev,
> AMDGIM_ERROR_VF_FENCE_INIT_FAIL, 0, 0);
> > >   goto failed;
> > >   }
> > > @@ -3988,7 +3988,6 @@ int am

RE: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-02 Thread Chen, Guchun
[Public]

Hi Alex,

I submitted the patch before your message, I will take care of this next time.

Regards,
Guchun

-Original Message-
From: Alex Deucher  
Sent: Monday, August 2, 2021 9:35 PM
To: Chen, Guchun 
Cc: Christian König ; 
amd-gfx@lists.freedesktop.org; Gao, Likun ; Koenig, 
Christian ; Zhang, Hawking ; 
Deucher, Alexander 
Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 
test (v2)

On Mon, Aug 2, 2021 at 4:23 AM Chen, Guchun  wrote:
>
> [Public]
>
> Thank you, Christian.
>
> Regarding fence_drv.initialized, it looks to a bit redundant, anyway let me 
> look into this more.

Does this patch fix this bug?
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1668&data=04%7C01%7CGuchun.Chen%40amd.com%7C2bf8bebf5b424751572408d955ba66e8%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637635081353279181%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=FuAo44Ws5SnuCxt45A%2Fqmu%2B3OfEkat1G%2BixO8G9uDVc%3D&reserved=0

If so, please add:
Bug: 
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1668&data=04%7C01%7CGuchun.Chen%40amd.com%7C2bf8bebf5b424751572408d955ba66e8%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637635081353279181%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=FuAo44Ws5SnuCxt45A%2Fqmu%2B3OfEkat1G%2BixO8G9uDVc%3D&reserved=0
to the commit message.

Alex

>
> Regards,
> Guchun
>
> -Original Message-
> From: Christian König 
> Sent: Monday, August 2, 2021 2:56 PM
> To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; 
> Gao, Likun ; Koenig, Christian 
> ; Zhang, Hawking ; 
> Deucher, Alexander 
> Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver 
> fini in s3 test (v2)
>
> Am 02.08.21 um 07:16 schrieb Guchun Chen:
> > In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to 
> > stop scheduler in s3 test, otherwise, fence related failure will 
> > arrive after resume. To fix this and for a better clean up, move 
> > drm_sched_fini from fence_hw_fini to fence_sw_fini, as it's part of 
> > driver shutdown, and should never be called in hw_fini.
> >
> > v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init, 
> > to keep sw_init and sw_fini paired.
> >
> > Fixes: cd87a6dcf6af drm/amdgpu: adjust fence driver enable sequence
> > Suggested-by: Christian König 
> > Signed-off-by: Guchun Chen 
>
> It's a bit ambiguous now what fence_drv.initialized means, but I think we can 
> live with that for now.
>
> Patch is Reviewed-by: Christian König .
>
> Regards,
> Christian.
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 ++---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +++-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  4 ++--
> >   3 files changed, 11 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index b1d2dc39e8be..9e53ff851496 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -3646,9 +3646,9 @@ int amdgpu_device_init(struct amdgpu_device 
> > *adev,
> >
> >   fence_driver_init:
> >   /* Fence driver */
> > - r = amdgpu_fence_driver_init(adev);
> > + r = amdgpu_fence_driver_sw_init(adev);
> >   if (r) {
> > - dev_err(adev->dev, "amdgpu_fence_driver_init failed\n");
> > + dev_err(adev->dev, "amdgpu_fence_driver_sw_init 
> > + failed\n");
> >   amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_FENCE_INIT_FAIL, 0, 
> > 0);
> >   goto failed;
> >   }
> > @@ -3988,7 +3988,6 @@ int amdgpu_device_resume(struct drm_device *dev, bool 
> > fbcon)
> >   }
> >   amdgpu_fence_driver_hw_init(adev);
> >
> > -
> >   r = amdgpu_device_ip_late_init(adev);
> >   if (r)
> >   return r;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > index 49c5c7331c53..7495911516c2 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > @@ -498,7 +498,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring 
> > *ring,
> >   }
> >
> >   /**
> > - * amdgpu_fence_driv

Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-02 Thread Mike Lothian
I've just tested it and it seem to fix my issue

Feel free to add my

Tested-by: Mike Lothian 

On Mon, 2 Aug 2021 at 14:35, Alex Deucher  wrote:

> On Mon, Aug 2, 2021 at 4:23 AM Chen, Guchun  wrote:
> >
> > [Public]
> >
> > Thank you, Christian.
> >
> > Regarding fence_drv.initialized, it looks to a bit redundant, anyway let
> me look into this more.
>
> Does this patch fix this bug?
> https://gitlab.freedesktop.org/drm/amd/-/issues/1668
>
> If so, please add:
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1668
> to the commit message.
>
> Alex
>
> >
> > Regards,
> > Guchun
> >
> > -Original Message-
> > From: Christian König 
> > Sent: Monday, August 2, 2021 2:56 PM
> > To: Chen, Guchun ; amd-gfx@lists.freedesktop.org;
> Gao, Likun ; Koenig, Christian <
> christian.koe...@amd.com>; Zhang, Hawking ;
> Deucher, Alexander 
> > Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini
> in s3 test (v2)
> >
> > Am 02.08.21 um 07:16 schrieb Guchun Chen:
> > > In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to stop
> > > scheduler in s3 test, otherwise, fence related failure will arrive
> > > after resume. To fix this and for a better clean up, move
> > > drm_sched_fini from fence_hw_fini to fence_sw_fini, as it's part of
> > > driver shutdown, and should never be called in hw_fini.
> > >
> > > v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init, to
> > > keep sw_init and sw_fini paired.
> > >
> > > Fixes: cd87a6dcf6af drm/amdgpu: adjust fence driver enable sequence
> > > Suggested-by: Christian König 
> > > Signed-off-by: Guchun Chen 
> >
> > It's a bit ambiguous now what fence_drv.initialized means, but I think
> we can live with that for now.
> >
> > Patch is Reviewed-by: Christian König .
> >
> > Regards,
> > Christian.
> >
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 ++---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +++-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  4 ++--
> > >   3 files changed, 11 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > index b1d2dc39e8be..9e53ff851496 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > @@ -3646,9 +3646,9 @@ int amdgpu_device_init(struct amdgpu_device
> > > *adev,
> > >
> > >   fence_driver_init:
> > >   /* Fence driver */
> > > - r = amdgpu_fence_driver_init(adev);
> > > + r = amdgpu_fence_driver_sw_init(adev);
> > >   if (r) {
> > > - dev_err(adev->dev, "amdgpu_fence_driver_init failed\n");
> > > + dev_err(adev->dev, "amdgpu_fence_driver_sw_init
> failed\n");
> > >   amdgpu_vf_error_put(adev,
> AMDGIM_ERROR_VF_FENCE_INIT_FAIL, 0, 0);
> > >   goto failed;
> > >   }
> > > @@ -3988,7 +3988,6 @@ int amdgpu_device_resume(struct drm_device *dev,
> bool fbcon)
> > >   }
> > >   amdgpu_fence_driver_hw_init(adev);
> > >
> > > -
> > >   r = amdgpu_device_ip_late_init(adev);
> > >   if (r)
> > >   return r;
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > index 49c5c7331c53..7495911516c2 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > @@ -498,7 +498,7 @@ int amdgpu_fence_driver_init_ring(struct
> amdgpu_ring *ring,
> > >   }
> > >
> > >   /**
> > > - * amdgpu_fence_driver_init - init the fence driver
> > > + * amdgpu_fence_driver_sw_init - init the fence driver
> > >* for all possible rings.
> > >*
> > >* @adev: amdgpu device pointer
> > > @@ -509,13 +509,13 @@ int amdgpu_fence_driver_init_ring(struct
> amdgpu_ring *ring,
> > >* amdgpu_fence_driver_start_ring().
> > >* Returns 0 for success.
> > >*/
> > > -int amdgpu_fence_driver_init(struct amdgpu_device *adev)
> > > +int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev)
> > >   {
> > >   return 0;
> > >   }
&

Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-02 Thread Alex Deucher
On Mon, Aug 2, 2021 at 4:23 AM Chen, Guchun  wrote:
>
> [Public]
>
> Thank you, Christian.
>
> Regarding fence_drv.initialized, it looks to a bit redundant, anyway let me 
> look into this more.

Does this patch fix this bug?
https://gitlab.freedesktop.org/drm/amd/-/issues/1668

If so, please add:
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1668
to the commit message.

Alex

>
> Regards,
> Guchun
>
> -Original Message-
> From: Christian König 
> Sent: Monday, August 2, 2021 2:56 PM
> To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; Gao, 
> Likun ; Koenig, Christian ; 
> Zhang, Hawking ; Deucher, Alexander 
> 
> Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 
> test (v2)
>
> Am 02.08.21 um 07:16 schrieb Guchun Chen:
> > In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to stop
> > scheduler in s3 test, otherwise, fence related failure will arrive
> > after resume. To fix this and for a better clean up, move
> > drm_sched_fini from fence_hw_fini to fence_sw_fini, as it's part of
> > driver shutdown, and should never be called in hw_fini.
> >
> > v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init, to
> > keep sw_init and sw_fini paired.
> >
> > Fixes: cd87a6dcf6af drm/amdgpu: adjust fence driver enable sequence
> > Suggested-by: Christian König 
> > Signed-off-by: Guchun Chen 
>
> It's a bit ambiguous now what fence_drv.initialized means, but I think we can 
> live with that for now.
>
> Patch is Reviewed-by: Christian König .
>
> Regards,
> Christian.
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 ++---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +++-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  4 ++--
> >   3 files changed, 11 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index b1d2dc39e8be..9e53ff851496 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -3646,9 +3646,9 @@ int amdgpu_device_init(struct amdgpu_device
> > *adev,
> >
> >   fence_driver_init:
> >   /* Fence driver */
> > - r = amdgpu_fence_driver_init(adev);
> > + r = amdgpu_fence_driver_sw_init(adev);
> >   if (r) {
> > - dev_err(adev->dev, "amdgpu_fence_driver_init failed\n");
> > + dev_err(adev->dev, "amdgpu_fence_driver_sw_init failed\n");
> >   amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_FENCE_INIT_FAIL, 0, 
> > 0);
> >   goto failed;
> >   }
> > @@ -3988,7 +3988,6 @@ int amdgpu_device_resume(struct drm_device *dev, bool 
> > fbcon)
> >   }
> >   amdgpu_fence_driver_hw_init(adev);
> >
> > -
> >   r = amdgpu_device_ip_late_init(adev);
> >   if (r)
> >   return r;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > index 49c5c7331c53..7495911516c2 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > @@ -498,7 +498,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring 
> > *ring,
> >   }
> >
> >   /**
> > - * amdgpu_fence_driver_init - init the fence driver
> > + * amdgpu_fence_driver_sw_init - init the fence driver
> >* for all possible rings.
> >*
> >* @adev: amdgpu device pointer
> > @@ -509,13 +509,13 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring 
> > *ring,
> >* amdgpu_fence_driver_start_ring().
> >* Returns 0 for success.
> >*/
> > -int amdgpu_fence_driver_init(struct amdgpu_device *adev)
> > +int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev)
> >   {
> >   return 0;
> >   }
> >
> >   /**
> > - * amdgpu_fence_driver_fini - tear down the fence driver
> > + * amdgpu_fence_driver_hw_fini - tear down the fence driver
> >* for all possible rings.
> >*
> >* @adev: amdgpu device pointer
> > @@ -531,8 +531,7 @@ void amdgpu_fence_driver_hw_fini(struct
> > amdgpu_device *adev)
> >
> >   if (!ring || !ring->fence_drv.initialized)
> >   continue;
> > - if (!ring->no_scheduler)
> > - drm_sched_fini(&ring->sched);
> > +
> >   /* You can't wait for HW to si

RE: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-02 Thread Chen, Guchun
[Public]

Thank you, Christian.

Regarding fence_drv.initialized, it looks to a bit redundant, anyway let me 
look into this more.

Regards,
Guchun

-Original Message-
From: Christian König  
Sent: Monday, August 2, 2021 2:56 PM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; Gao, 
Likun ; Koenig, Christian ; Zhang, 
Hawking ; Deucher, Alexander 
Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 
test (v2)

Am 02.08.21 um 07:16 schrieb Guchun Chen:
> In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to stop 
> scheduler in s3 test, otherwise, fence related failure will arrive 
> after resume. To fix this and for a better clean up, move 
> drm_sched_fini from fence_hw_fini to fence_sw_fini, as it's part of 
> driver shutdown, and should never be called in hw_fini.
>
> v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init, to 
> keep sw_init and sw_fini paired.
>
> Fixes: cd87a6dcf6af drm/amdgpu: adjust fence driver enable sequence
> Suggested-by: Christian König 
> Signed-off-by: Guchun Chen 

It's a bit ambiguous now what fence_drv.initialized means, but I think we can 
live with that for now.

Patch is Reviewed-by: Christian König .

Regards,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 ++---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  4 ++--
>   3 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index b1d2dc39e8be..9e53ff851496 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3646,9 +3646,9 @@ int amdgpu_device_init(struct amdgpu_device 
> *adev,
>   
>   fence_driver_init:
>   /* Fence driver */
> - r = amdgpu_fence_driver_init(adev);
> + r = amdgpu_fence_driver_sw_init(adev);
>   if (r) {
> - dev_err(adev->dev, "amdgpu_fence_driver_init failed\n");
> + dev_err(adev->dev, "amdgpu_fence_driver_sw_init failed\n");
>   amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_FENCE_INIT_FAIL, 0, 
> 0);
>   goto failed;
>   }
> @@ -3988,7 +3988,6 @@ int amdgpu_device_resume(struct drm_device *dev, bool 
> fbcon)
>   }
>   amdgpu_fence_driver_hw_init(adev);
>   
> -
>   r = amdgpu_device_ip_late_init(adev);
>   if (r)
>   return r;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 49c5c7331c53..7495911516c2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -498,7 +498,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring 
> *ring,
>   }
>   
>   /**
> - * amdgpu_fence_driver_init - init the fence driver
> + * amdgpu_fence_driver_sw_init - init the fence driver
>* for all possible rings.
>*
>* @adev: amdgpu device pointer
> @@ -509,13 +509,13 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring 
> *ring,
>* amdgpu_fence_driver_start_ring().
>* Returns 0 for success.
>*/
> -int amdgpu_fence_driver_init(struct amdgpu_device *adev)
> +int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev)
>   {
>   return 0;
>   }
>   
>   /**
> - * amdgpu_fence_driver_fini - tear down the fence driver
> + * amdgpu_fence_driver_hw_fini - tear down the fence driver
>* for all possible rings.
>*
>* @adev: amdgpu device pointer
> @@ -531,8 +531,7 @@ void amdgpu_fence_driver_hw_fini(struct 
> amdgpu_device *adev)
>   
>   if (!ring || !ring->fence_drv.initialized)
>   continue;
> - if (!ring->no_scheduler)
> - drm_sched_fini(&ring->sched);
> +
>   /* You can't wait for HW to signal if it's gone */
>   if (!drm_dev_is_unplugged(&adev->ddev))
>   r = amdgpu_fence_wait_empty(ring); @@ -560,6 +559,9 @@ 
> void 
> amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
>   if (!ring || !ring->fence_drv.initialized)
>   continue;
>   
> + if (!ring->no_scheduler)
> + drm_sched_fini(&ring->sched);
> +
>   for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
>   dma_fence_put(ring->fence_drv.fences[j]);
>   kfree(ring->fence_drv.fences);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
&g

Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-01 Thread Christian König

Am 02.08.21 um 07:16 schrieb Guchun Chen:

In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to stop
scheduler in s3 test, otherwise, fence related failure will arrive
after resume. To fix this and for a better clean up, move drm_sched_fini
from fence_hw_fini to fence_sw_fini, as it's part of driver shutdown, and
should never be called in hw_fini.

v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init,
to keep sw_init and sw_fini paired.

Fixes: cd87a6dcf6af drm/amdgpu: adjust fence driver enable sequence
Suggested-by: Christian König 
Signed-off-by: Guchun Chen 


It's a bit ambiguous now what fence_drv.initialized means, but I think 
we can live with that for now.


Patch is Reviewed-by: Christian König .

Regards,
Christian.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 ++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  4 ++--
  3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b1d2dc39e8be..9e53ff851496 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3646,9 +3646,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
  
  fence_driver_init:

/* Fence driver */
-   r = amdgpu_fence_driver_init(adev);
+   r = amdgpu_fence_driver_sw_init(adev);
if (r) {
-   dev_err(adev->dev, "amdgpu_fence_driver_init failed\n");
+   dev_err(adev->dev, "amdgpu_fence_driver_sw_init failed\n");
amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_FENCE_INIT_FAIL, 0, 
0);
goto failed;
}
@@ -3988,7 +3988,6 @@ int amdgpu_device_resume(struct drm_device *dev, bool 
fbcon)
}
amdgpu_fence_driver_hw_init(adev);
  
-

r = amdgpu_device_ip_late_init(adev);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 49c5c7331c53..7495911516c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -498,7 +498,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
  }
  
  /**

- * amdgpu_fence_driver_init - init the fence driver
+ * amdgpu_fence_driver_sw_init - init the fence driver
   * for all possible rings.
   *
   * @adev: amdgpu device pointer
@@ -509,13 +509,13 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring 
*ring,
   * amdgpu_fence_driver_start_ring().
   * Returns 0 for success.
   */
-int amdgpu_fence_driver_init(struct amdgpu_device *adev)
+int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev)
  {
return 0;
  }
  
  /**

- * amdgpu_fence_driver_fini - tear down the fence driver
+ * amdgpu_fence_driver_hw_fini - tear down the fence driver
   * for all possible rings.
   *
   * @adev: amdgpu device pointer
@@ -531,8 +531,7 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev)
  
  		if (!ring || !ring->fence_drv.initialized)

continue;
-   if (!ring->no_scheduler)
-   drm_sched_fini(&ring->sched);
+
/* You can't wait for HW to signal if it's gone */
if (!drm_dev_is_unplugged(&adev->ddev))
r = amdgpu_fence_wait_empty(ring);
@@ -560,6 +559,9 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
if (!ring || !ring->fence_drv.initialized)
continue;
  
+		if (!ring->no_scheduler)

+   drm_sched_fini(&ring->sched);
+
for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
dma_fence_put(ring->fence_drv.fences[j]);
kfree(ring->fence_drv.fences);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 27adffa7658d..9c11ced4312c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -106,7 +106,6 @@ struct amdgpu_fence_driver {
struct dma_fence**fences;
  };
  
-int amdgpu_fence_driver_init(struct amdgpu_device *adev);

  void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring);
  
  int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,

@@ -115,9 +114,10 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
  int amdgpu_fence_driver_start_ring(struct amdgpu_ring *ring,
   struct amdgpu_irq_src *irq_src,
   unsigned irq_type);
+void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev);
  void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev);
+int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev);
  void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev);
-void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev);
  int amdgpu