The patch below does not apply to the 6.12-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <[email protected]>.
Thanks, Sasha ------------------ original commit in Linus's tree ------------------ >From f7afda7fcd169a9168695247d07ad94cf7b9798f Mon Sep 17 00:00:00 2001 From: Mario Limonciello <[email protected]> Date: Thu, 5 Feb 2026 10:42:54 -0600 Subject: [PATCH] drm/amd: Fix hang on amdgpu unload by using pci_dev_is_disconnected() The commit 6a23e7b4332c ("drm/amd: Clean up kfd node on surprise disconnect") introduced early KFD cleanup when drm_dev_is_unplugged() returns true. However, this causes hangs during normal module unload (rmmod amdgpu). The issue occurs because drm_dev_unplug() is called in amdgpu_pci_remove() for all removal scenarios, not just surprise disconnects. This was done intentionally in commit 39934d3ed572 ("Revert "drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled"") to fix IGT PCI software unplug test failures. As a result, drm_dev_is_unplugged() returns true even during normal module unload, triggering the early KFD cleanup inappropriately. The correct check should distinguish between: - Actual surprise disconnect (eGPU unplugged): pci_dev_is_disconnected() returns true - Normal module unload (rmmod): pci_dev_is_disconnected() returns false Replace drm_dev_is_unplugged() with pci_dev_is_disconnected() to ensure the early cleanup only happens during true hardware disconnect events. Cc: [email protected] Reported-by: Cal Peake <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Fixes: 6a23e7b4332c ("drm/amd: Clean up kfd node on surprise disconnect") Acked-by: Alex Deucher <[email protected]> Signed-off-by: Mario Limonciello <[email protected]> Signed-off-by: Alex Deucher <[email protected]> --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 528990a595ec9..9758221413814 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4924,7 +4924,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) * before ip_fini_early to prevent kfd locking refcount issues by calling * amdgpu_amdkfd_suspend() */ - if (drm_dev_is_unplugged(adev_to_drm(adev))) + if (pci_dev_is_disconnected(adev->pdev)) amdgpu_amdkfd_device_fini_sw(adev); amdgpu_device_ip_fini_early(adev); @@ -4936,7 +4936,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) amdgpu_gart_dummy_page_fini(adev); - if (drm_dev_is_unplugged(adev_to_drm(adev))) + if (pci_dev_is_disconnected(adev->pdev)) amdgpu_device_unmap_mmio(adev); } -- 2.51.0
