Re: [PATCH] drm/amd/amdgpu: move inc gpu_reset_counter after drm_sched_stop

2021-02-25 Thread Christian König

Am 25.02.21 um 10:16 schrieb Jingwen Chen:

Move gpu_reset_counter after drm_sched_stop to avoid race
condition caused by job submitted between reset_count +1 and
drm_sched_stop.

Signed-off-by: Jingwen Chen 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f0f7ed42ee7f..703b96cf3560 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4447,7 +4447,6 @@ static bool amdgpu_device_lock_adev(struct amdgpu_device 
*adev,
down_write(>reset_sem);
}
  
-	atomic_inc(>gpu_reset_counter);

switch (amdgpu_asic_reset_method(adev)) {
case AMD_RESET_METHOD_MODE1:
adev->mp1_state = PP_MP1_STATE_SHUTDOWN;
@@ -4708,6 +4707,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
if (need_emergency_restart)
amdgpu_job_stop_all_jobs_on_sched(>sched);
}
+   atomic_inc(_adev->gpu_reset_counter);
}
  
  	if (need_emergency_restart)

@@ -5050,6 +5050,7 @@ pci_ers_result_t amdgpu_pci_error_detected(struct pci_dev 
*pdev, pci_channel_sta
  
  			drm_sched_stop(>sched, NULL);

}
+   atomic_inc(>gpu_reset_counter);
return PCI_ERS_RESULT_NEED_RESET;
case pci_channel_io_perm_failure:
/* Permanent error, prepare for device removal */


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amd/amdgpu: move inc gpu_reset_counter after drm_sched_stop

2021-02-25 Thread Jingwen Chen
Move gpu_reset_counter after drm_sched_stop to avoid race
condition caused by job submitted between reset_count +1 and
drm_sched_stop.

Signed-off-by: Jingwen Chen 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f0f7ed42ee7f..703b96cf3560 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4447,7 +4447,6 @@ static bool amdgpu_device_lock_adev(struct amdgpu_device 
*adev,
down_write(>reset_sem);
}
 
-   atomic_inc(>gpu_reset_counter);
switch (amdgpu_asic_reset_method(adev)) {
case AMD_RESET_METHOD_MODE1:
adev->mp1_state = PP_MP1_STATE_SHUTDOWN;
@@ -4708,6 +4707,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
if (need_emergency_restart)
amdgpu_job_stop_all_jobs_on_sched(>sched);
}
+   atomic_inc(_adev->gpu_reset_counter);
}
 
if (need_emergency_restart)
@@ -5050,6 +5050,7 @@ pci_ers_result_t amdgpu_pci_error_detected(struct pci_dev 
*pdev, pci_channel_sta
 
drm_sched_stop(>sched, NULL);
}
+   atomic_inc(>gpu_reset_counter);
return PCI_ERS_RESULT_NEED_RESET;
case pci_channel_io_perm_failure:
/* Permanent error, prepare for device removal */
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx