On Thu, Feb 5, 2026 at 11:52 AM Mario Limonciello
<[email protected]> wrote:
>
> The commit 28695ca09d32 ("drm/amd: Clean up kfd node on surprise
> disconnect") introduced early KFD cleanup when drm_dev_is_unplugged()
> returns true. However, this causes hangs during normal module unload
> (rmmod amdgpu).
>
> The issue occurs because drm_dev_unplug() is called in amdgpu_pci_remove()
> for all removal scenarios, not just surprise disconnects. This was done
> intentionally in commit 39934d3ed5725c ("Revert "drm/amdgpu: TA unload
> messages are not actually sent to psp when amdgpu is uninstalled"") to
> fix IGT PCI software unplug test failures. As a result,
> drm_dev_is_unplugged() returns true even during normal module unload,
> triggering the early KFD cleanup inappropriately.
>
> The correct check should distinguish between:
> - Actual surprise disconnect (eGPU unplugged): pci_dev_is_disconnected()
>   returns true
> - Normal module unload (rmmod): pci_dev_is_disconnected() returns false
>
> Replace drm_dev_is_unplugged() with pci_dev_is_disconnected() to ensure
> the early cleanup only happens during true hardware disconnect events.
>
> Reported-by: Cal Peake <[email protected]>
> Closes: https://lore.kernel.org/all/[email protected]/
> Fixes: 28695ca09d32 ("drm/amd: Clean up kfd node on surprise disconnect")
> Signed-off-by: Mario Limonciello <[email protected]>

Acked-by: Alex Deucher <[email protected]>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index d2c3885de711f..8900e0dc8a61d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5068,7 +5068,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
>          * before ip_fini_early to prevent kfd locking refcount issues by calling
>          * amdgpu_amdkfd_suspend()
>          */
> -       if (drm_dev_is_unplugged(adev_to_drm(adev)))
> +       if (pci_dev_is_disconnected(adev->pdev))
>                 amdgpu_amdkfd_device_fini_sw(adev);
>
>         amdgpu_device_ip_fini_early(adev);
> @@ -5080,7 +5080,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
>
>         amdgpu_gart_dummy_page_fini(adev);
>
> -       if (drm_dev_is_unplugged(adev_to_drm(adev)))
> +       if (pci_dev_is_disconnected(adev->pdev))
>                 amdgpu_device_unmap_mmio(adev);
>
>  }
> --
> 2.52.0
>
