Re: [PATCH] drm/amdgpu: Move reset domain locking in DPC handler

Andrey Grodzovsky Thu, 14 Apr 2022 07:31:36 -0700


On 2022-04-14 02:40, Christian König wrote:



On 13.04.22 at 21:31, Andrey Grodzovsky wrote:

Lock the reset domain unconditionally, because on resume
we unlock it unconditionally.
This solves a mutex deadlock when handling both FATAL
and non-FATAL PCI errors one after another.

Signed-off-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +++++++-------
  1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1cc488a767d8..c65f25e3a0fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

@@ -5531,18 +5531,18 @@ pci_ers_result_t amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta

        adev->pci_channel_state = state;
+    /*
+     * Locking adev->reset_domain->sem will prevent any external access
+     * to GPU during PCI error recovery
+     */
+    amdgpu_device_lock_reset_domain(adev->reset_domain);
+    amdgpu_device_set_mp1_state(adev);
+
      switch (state) {
      case pci_channel_io_normal:
          return PCI_ERS_RESULT_CAN_RECOVER;


BTW: Where are we unlocking that again?



In amdgpu_pci_resume, but you made me realize I can do this better.
I will be back with V2.

Andrey

      /* Fatal error, prepare for slot reset */
      case pci_channel_io_frozen:
-        /*
-         * Locking adev->reset_domain->sem will prevent any external access
-         * to GPU during PCI error recovery
-         */
-        amdgpu_device_lock_reset_domain(adev->reset_domain);
-        amdgpu_device_set_mp1_state(adev);
-
          /*
           * Block any work scheduling as we do for regular GPU reset
           * for the duration of the recovery
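
The lock/unlock imbalance described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the amdgpu code: every toy_* name is hypothetical, the reset domain is reduced to a plain counter, and it assumes (as the commit message says) that resume unlocks unconditionally and runs after each error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two PCI channel states discussed above. */
enum toy_channel_state { TOY_IO_NORMAL, TOY_IO_FROZEN };

/* Toy reset domain: depth = locks taken minus locks released. */
struct toy_reset_domain { int depth; };

static void toy_lock(struct toy_reset_domain *d)
{
    assert(d != NULL);
    d->depth++;
}

static void toy_unlock(struct toy_reset_domain *d)
{
    assert(d != NULL);
    d->depth--;
}

/* Before the patch: the lock is taken only in the fatal (frozen) case. */
static void toy_error_detected_old(struct toy_reset_domain *d,
                                   enum toy_channel_state s)
{
    if (s == TOY_IO_FROZEN)
        toy_lock(d);
}

/* After the patch: the lock is taken before the switch, for every state. */
static void toy_error_detected_new(struct toy_reset_domain *d,
                                   enum toy_channel_state s)
{
    (void)s;
    toy_lock(d);
}

/* Assumption from the commit message: resume unlocks unconditionally. */
static void toy_resume(struct toy_reset_domain *d)
{
    toy_unlock(d);
}

/*
 * Replays a fatal error followed by a non-fatal one, each followed by
 * resume, and returns the final lock depth. Zero means balanced.
 */
int toy_run_sequence(bool patched)
{
    const enum toy_channel_state seq[] = { TOY_IO_FROZEN, TOY_IO_NORMAL };
    struct toy_reset_domain d = { 0 };

    for (size_t i = 0; i < sizeof(seq) / sizeof(seq[0]); i++) {
        if (patched)
            toy_error_detected_new(&d, seq[i]);
        else
            toy_error_detected_old(&d, seq[i]);
        toy_resume(&d);
    }
    return d.depth;
}
```

In this model, toy_run_sequence(false) ends with a non-zero depth because the non-fatal pass unlocks something it never locked, while toy_run_sequence(true) stays balanced. How such an imbalance turns into the reported deadlock in the real driver is not modelled here.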
