On Thu, Feb 5, 2026 at 11:52 AM Mario Limonciello <[email protected]> wrote:
>
> The commit 28695ca09d32 ("drm/amd: Clean up kfd node on surprise
> disconnect") introduced early KFD cleanup when drm_dev_is_unplugged()
> returns true. However, this causes hangs during normal module unload
> (rmmod amdgpu).
>
> The issue occurs because drm_dev_unplug() is called in amdgpu_pci_remove()
> for all removal scenarios, not just surprise disconnects. This was done
> intentionally in commit 39934d3ed5725c ("Revert "drm/amdgpu: TA unload
> messages are not actually sent to psp when amdgpu is uninstalled"") to
> fix IGT PCI software unplug test failures. As a result,
> drm_dev_is_unplugged() returns true even during normal module unload,
> triggering the early KFD cleanup inappropriately.
>
> The correct check should distinguish between:
> - Actual surprise disconnect (eGPU unplugged): pci_dev_is_disconnected()
>   returns true
> - Normal module unload (rmmod): pci_dev_is_disconnected() returns false
>
> Replace drm_dev_is_unplugged() with pci_dev_is_disconnected() to ensure
> the early cleanup only happens during true hardware disconnect events.
>
> Reported-by: Cal Peake <[email protected]>
> Closes: https://lore.kernel.org/all/[email protected]/
> Fixes: 28695ca09d32 ("drm/amd: Clean up kfd node on surprise disconnect")
> Signed-off-by: Mario Limonciello <[email protected]>

Acked-by: Alex Deucher <[email protected]>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index d2c3885de711f..8900e0dc8a61d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5068,7 +5068,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
>          * before ip_fini_early to prevent kfd locking refcount issues by calling
>          * amdgpu_amdkfd_suspend()
>          */
> -       if (drm_dev_is_unplugged(adev_to_drm(adev)))
> +       if (pci_dev_is_disconnected(adev->pdev))
>                 amdgpu_amdkfd_device_fini_sw(adev);
>
>         amdgpu_device_ip_fini_early(adev);
> @@ -5080,7 +5080,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
>
>         amdgpu_gart_dummy_page_fini(adev);
>
> -       if (drm_dev_is_unplugged(adev_to_drm(adev)))
> +       if (pci_dev_is_disconnected(adev->pdev))
>                 amdgpu_device_unmap_mmio(adev);
>
>  }
> --
> 2.52.0
>
