On 2025-10-17 18:43, Felix Kuehling wrote:

On 2025-10-15 16:11, Philip Yang wrote:
In the MMU notifier release callback, stop user queues to be safe, because
the SVM memory is about to be unmapped from the CPU.

Suggested-by: Felix Kuehling <[email protected]>
Signed-off-by: Philip Yang <[email protected]>
---
  drivers/gpu/drm/amd/amdkfd/kfd_process.c | 7 ++++++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 0341f570f3d1..e2a0ae0394b8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1221,11 +1221,16 @@ static void kfd_process_free_notifier(struct mmu_notifier *mn)
  static void kfd_process_notifier_release_internal(struct kfd_process *p)
  {
-    int i;
+    int i, r;

      cancel_delayed_work_sync(&p->eviction_work);
      cancel_delayed_work_sync(&p->restore_work);

+    WARN(debug_evictions, "Evicting pid %d", p->lead_thread->pid);
+    r = kfd_process_evict_queues(p, KFD_QUEUE_EVICTION_TRIGGER_SVM);

Is there a reason why we can't just call kfd_process_dequeue_from_all_devices here, and remove that call from kfd_process_wq_release? We don't need to call this an eviction. The queues get removed on process termination anyway. We're just doing it a bit earlier now.

The MMU release notifier callback doesn't hold the mmap lock, so it is safe to call kfd_process_dequeue_from_all_devices here. I will send a new version for review.

Regards,

Philip


Regards,
  Felix


+    if (r)
+        pr_debug("failed %d to quiesce KFD queues\n", r);
+
      for (i = 0; i < p->n_pdds; i++) {
          struct kfd_process_device *pdd = p->pdds[i];
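
The alternative raised in the review (dequeue the queues in the release notifier rather than evicting them, and drop the later call from the work-queue release) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every `mock_` type and function below is hypothetical and stands in for the role of the corresponding KFD code, not for its real structures, locking, or signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; these are NOT the real KFD structures. */
#define MOCK_MAX_PDDS 4

struct mock_pdd {
    int active_queues;      /* queues still present on this device */
};

struct mock_process {
    struct mock_pdd pdds[MOCK_MAX_PDDS];
    size_t n_pdds;
    bool queues_dequeued;   /* set once teardown has removed the queues */
};

/*
 * Stands in for the role of kfd_process_dequeue_from_all_devices:
 * remove every queue on every device. Calling it again is harmless.
 * Returns the number of queues removed by this call.
 */
static int mock_dequeue_from_all_devices(struct mock_process *p)
{
    int removed = 0;

    for (size_t i = 0; i < p->n_pdds; i++) {
        removed += p->pdds[i].active_queues;
        p->pdds[i].active_queues = 0;
    }
    p->queues_dequeued = true;
    return removed;
}

/*
 * Stands in for the MMU notifier release path: the queues are removed
 * here, before the CPU mappings go away, instead of being evicted.
 */
static int mock_notifier_release(struct mock_process *p)
{
    return mock_dequeue_from_all_devices(p);
}

/*
 * Stands in for the later work-queue release: it no longer dequeues
 * anything and only checks that the earlier path already did.
 */
static void mock_wq_release(struct mock_process *p)
{
    assert(p->queues_dequeued);
    for (size_t i = 0; i < p->n_pdds; i++)
        assert(p->pdds[i].active_queues == 0);
}
```

In this model the queues are removed once, at notifier release, and the later release step only verifies that nothing is left. Whether the real driver needs additional synchronization around that call is not something this sketch addresses.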
