On 2025-10-15 16:11, Philip Yang wrote:
In the mmu notifier release callback, stop user queues to be safe, because
the SVM memory is about to be unmapped from the CPU.

Suggested-by: Felix Kuehling <[email protected]>
Signed-off-by: Philip Yang <[email protected]>
---
  drivers/gpu/drm/amd/amdkfd/kfd_process.c | 7 ++++++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 0341f570f3d1..e2a0ae0394b8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1221,11 +1221,16 @@ static void kfd_process_free_notifier(struct mmu_notifier *mn)
  static void kfd_process_notifier_release_internal(struct kfd_process *p)
  {
-       int i;
+       int i, r;
        cancel_delayed_work_sync(&p->eviction_work);
        cancel_delayed_work_sync(&p->restore_work);
+       WARN(debug_evictions, "Evicting pid %d", p->lead_thread->pid);
+       r = kfd_process_evict_queues(p, KFD_QUEUE_EVICTION_TRIGGER_SVM);

Is there a reason why we can't just call kfd_process_dequeue_from_all_devices here, and remove that call from kfd_process_wq_release? We don't need to call this an eviction. The queues get removed on process termination anyway. We're just doing it a bit earlier now.
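
A rough, untested model of that ordering, in case it helps the discussion. The struct and the mock_* functions are stand-ins invented for illustration, not the real KFD types or helpers. The sketch only shows the dequeue moving from the late release work into the notifier release path.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct kfd_process (hypothetical, illustration only). */
struct mock_process {
	int n_queues;       /* user queues that still exist */
	int dequeue_calls;  /* how many times the dequeue helper ran */
};

/* Stand-in for kfd_process_dequeue_from_all_devices(): drops all queues. */
static void mock_dequeue_from_all_devices(struct mock_process *p)
{
	p->n_queues = 0;
	p->dequeue_calls++;
}

/* Models the mmu notifier release path: queues go away here, early. */
static void mock_notifier_release(struct mock_process *p)
{
	mock_dequeue_from_all_devices(p);
}

/*
 * Models the later release work: it no longer dequeues anything.
 * Assumption of this sketch: it can rely on the queues being gone already.
 */
static void mock_wq_release(struct mock_process *p)
{
	assert(p->n_queues == 0);
}
```

In this model the dequeue runs exactly once, in the notifier release path, and nothing is reported as an eviction.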

Regards,
  Felix


+       if (r)
+               pr_debug("failed %d to quiesce KFD queues\n", r);
+
        for (i = 0; i < p->n_pdds; i++) {
                struct kfd_process_device *pdd = p->pdds[i];
