[Public] Looks good to me.
Reviewed-by: Kent Russell <[email protected]>

> -----Original Message-----
> From: Mario Limonciello (AMD) <[email protected]>
> Sent: Wednesday, January 7, 2026 4:37 PM
> To: [email protected]
> Cc: Mario Limonciello (AMD) <[email protected]>; Russell, Kent
> <[email protected]>
> Subject: [PATCH] drm/amd: Clean up kfd node on surprise disconnect
>
> When an eGPU is unplugged the KFD topology should also be destroyed
> for that GPU. This never happens because the fini_sw callbacks never
> get to run. Run them manually before calling amdgpu_device_ip_fini_early()
> when a device has already been disconnected.
>
> This location is intentionally chosen to make sure that the kfd locking
> refcount doesn't get incremented unintentionally.
>
> Cc: [email protected]
> Closes: https://community.frame.work/t/amd-egpu-on-linux/8691/33
> Signed-off-by: Mario Limonciello (AMD) <[email protected]>
> ---
> v2:
>  * Move the call earlier in amdgpu_device_fini_hw() to fix locking
>    refcount issues
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 021ecc988ff79..f167ba1b6ffcb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5251,6 +5251,14 @@ void amdgpu_device_fini_hw(struct amdgpu_device
> *adev)
>
>  	amdgpu_ttm_set_buffer_funcs_status(adev, false);
>
> +	/*
> +	 * device went through surprise hotplug; we need to destroy topology
> +	 * before ip_fini_early to prevent kfd locking refcount issues by
> calling
> +	 * amdgpu_amdkfd_suspend()
> +	 */
> +	if (drm_dev_is_unplugged(adev_to_drm(adev)))
> +		amdgpu_amdkfd_device_fini_sw(adev);
> +
>  	amdgpu_device_ip_fini_early(adev);
>
>  	amdgpu_irq_fini_hw(adev);
> --
> 2.43.0
