On 03.11.22 at 05:06, Victor Zhao wrote:
- clear kiq ring after suspend/resume under sriov to avoid kiq ring test failure
- update irq after resume to fix kiq interrupt loss

Good to see that somebody is looking into this. Is that enough to get suspend/resume working with SRIOV?


Signed-off-by: Victor Zhao <victor.z...@amd.com>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c     | 2 ++
  2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 522820eeaa59..5b9f992e4607 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4197,6 +4197,8 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
        }
        /* Make sure IB tests flushed */
+       if (amdgpu_sriov_vf(adev))
+               amdgpu_irq_gpu_reset_resume_helper(adev);

This is a pretty clear NAK because that should happen during resume anyway. If it doesn't, we have a bug somewhere else and this change just hides it.

        flush_delayed_work(&adev->delayed_init_work);
        if (adev->in_s0ix) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 7853d3ca58cf..49d34c7bbf20 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -6909,6 +6909,8 @@ static int gfx_v10_0_kiq_init_queue(struct amdgpu_ring *ring)
                mutex_unlock(&adev->srbm_mutex);
        } else {
                memset((void *)mqd, 0, sizeof(*mqd));
+               if (amdgpu_sriov_vf(adev) && adev->in_suspend)
+                       amdgpu_ring_clear_ring(ring);

Is there any good reason not to always clear the KIQ ring here? E.g. also on bare metal and during load/reset?

Regards,
Christian.

                mutex_lock(&adev->srbm_mutex);
                nv_grbm_select(adev, ring->me, ring->pipe, ring->queue, 0);
                amdgpu_ring_init_mqd(ring);
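
To make the question about clearing the KIQ ring unconditionally concrete, here is a minimal userspace sketch in C. It is not the amdgpu code. The struct, the ring size, the NOP value and all model_* names are hypothetical stand-ins, and it assumes that clearing a ring means filling every slot with a NOP packet. It only contrasts the condition posted in the patch with the always-clear variant raised in the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; NOT the amdgpu structures. */
#define MODEL_RING_DWORDS 8u          /* ring size in dwords, power of two */
#define MODEL_NOP         0xffff1000u /* placeholder NOP value for this model */

struct model_ring {
	uint32_t buf[MODEL_RING_DWORDS];
	uint32_t buf_mask;                /* ring size in dwords minus one */
};

/* Assumed meaning of "clear": overwrite every slot with the NOP value. */
static void model_ring_clear(struct model_ring *ring)
{
	/* The mask arithmetic only works for power-of-two ring sizes. */
	assert(((ring->buf_mask + 1) & ring->buf_mask) == 0);
	for (uint32_t i = 0; i <= ring->buf_mask; i++)
		ring->buf[i] = MODEL_NOP;
}

/* Condition as posted in the patch: SR-IOV VF and suspend/resume only. */
static void model_init_conditional(struct model_ring *ring,
				   bool sriov_vf, bool in_suspend)
{
	if (sriov_vf && in_suspend)
		model_ring_clear(ring);
}

/* Variant raised in the review: clear on every queue init. */
static void model_init_always(struct model_ring *ring)
{
	model_ring_clear(ring);
}

/* Count slots that still hold something other than the NOP value. */
static unsigned int model_count_stale(const struct model_ring *ring)
{
	unsigned int n = 0;

	for (uint32_t i = 0; i <= ring->buf_mask; i++)
		if (ring->buf[i] != MODEL_NOP)
			n++;
	return n;
}
```

In this model, model_init_conditional() leaves stale contents in place whenever either flag is false, while model_init_always() never does. Whether stale KIQ ring contents can actually cause trouble on bare metal or during load/reset is the open question in the review, and the sketch does not answer it.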
