On 8/13/2021 3:59 PM, Michel Dänzer wrote:
From: Michel Dänzer <mdaen...@redhat.com>

schedule_delayed_work does not push back the work if it was already
scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
was disabled and re-enabled again during those 100 ms.

This resulted in frame drops / stutter with the upcoming mutter 41
release on Navi 14, due to constantly enabling GFXOFF in the HW and
disabling it again (for getting the GPU clock counter).

To fix this, call cancel_delayed_work_sync when GFXOFF transitions from
enabled to disabled. This makes sure the delayed work will be scheduled
as intended in the reverse case.

In order to avoid a deadlock, amdgpu_device_delay_enable_gfx_off needs
to use mutex_trylock instead of mutex_lock.

v2:
* Use cancel_delayed_work_sync & mutex_trylock instead of
   mod_delayed_work.

Signed-off-by: Michel Dänzer <mdaen...@redhat.com>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ++++++++++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    | 13 +++++++------
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h    |  3 +++
  3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f3fd5ec710b6..8b025f70706c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2777,7 +2777,16 @@ static void amdgpu_device_delay_enable_gfx_off(struct 
work_struct *work)
        struct amdgpu_device *adev =
                container_of(work, struct amdgpu_device, 
gfx.gfx_off_delay_work.work);
- mutex_lock(&adev->gfx.gfx_off_mutex);
+       /* mutex_lock could deadlock with cancel_delayed_work_sync in 
amdgpu_gfx_off_ctrl. */
+       if (!mutex_trylock(&adev->gfx.gfx_off_mutex)) {
+               /* If there's a bug which causes amdgpu_gfx_off_ctrl to be 
called with enable=true
+                * when adev->gfx.gfx_off_req_count is already 0, we might race 
with that.
+                * Re-schedule to make sure gfx off will be re-enabled in the 
HW eventually.
+                */
+               schedule_delayed_work(&adev->gfx.gfx_off_delay_work, 
AMDGPU_GFX_OFF_DELAY_ENABLE);
+               return;

This is not needed and is just creating another thread to contend for mutex. The checks below take care of enabling gfxoff correctly. If it's already in gfx_off state, it doesn't do anything. So I don't see why this change is needed.

The other problem is amdgpu_get_gfx_off_status() also uses the same mutex. So it won't be knowing which thread it would be contending against and blindly creates more work items.

+       }
+
        if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
                if (!amdgpu_dpm_set_powergating_by_smu(adev, 
AMD_IP_BLOCK_TYPE_GFX, true))
                        adev->gfx.gfx_off_state = true;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index a0be0772c8b3..da4c46db3093 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -28,9 +28,6 @@
  #include "amdgpu_rlc.h"
  #include "amdgpu_ras.h"
-/* delay 0.1 second to enable gfx off feature */
-#define GFX_OFF_DELAY_ENABLE         msecs_to_jiffies(100)
-
  /*
   * GPU GFX IP block helpers function.
   */
@@ -569,9 +566,13 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool 
enable)
                adev->gfx.gfx_off_req_count--;
if (enable && !adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
-               schedule_delayed_work(&adev->gfx.gfx_off_delay_work, 
GFX_OFF_DELAY_ENABLE);
-       } else if (!enable && adev->gfx.gfx_off_state) {
-               if (!amdgpu_dpm_set_powergating_by_smu(adev, 
AMD_IP_BLOCK_TYPE_GFX, false)) {
+               schedule_delayed_work(&adev->gfx.gfx_off_delay_work, 
AMDGPU_GFX_OFF_DELAY_ENABLE);
+       } else if (!enable) {
+               if (adev->gfx.gfx_off_req_count == 1 && 
!adev->gfx.gfx_off_state)
+                       cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work);

This has the deadlock problem as discussed in the other thread.

Thanks,
Lijo

+               if (adev->gfx.gfx_off_state &&
+                   !amdgpu_dpm_set_powergating_by_smu(adev, 
AMD_IP_BLOCK_TYPE_GFX, false)) {
                        adev->gfx.gfx_off_state = false;
if (adev->gfx.funcs->init_spm_golden) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index d43fe2ed8116..dcdb505bb7f4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -32,6 +32,9 @@
  #include "amdgpu_rlc.h"
  #include "soc15.h"
+/* delay 0.1 second to enable gfx off feature */
+#define AMDGPU_GFX_OFF_DELAY_ENABLE msecs_to_jiffies(100)
+
  /* GFX current status */
  #define AMDGPU_GFX_NORMAL_MODE                        0x00000000L
  #define AMDGPU_GFX_SAFE_MODE                  0x00000001L

Reply via email to