On 10/15/2025 4:33 PM, Philip Yang wrote:

On 2025-10-15 16:40, Chen, Xiaogang wrote:


On 10/15/2025 3:11 PM, Philip Yang wrote:
In the mmu notifier release callback, stop the user queues to be safe, because
the SVM memory is about to be unmapped from the CPU.

Suggested-by: Felix Kuehling <[email protected]>
Signed-off-by: Philip Yang <[email protected]>
---
  drivers/gpu/drm/amd/amdkfd/kfd_process.c | 7 ++++++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 0341f570f3d1..e2a0ae0394b8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1221,11 +1221,16 @@ static void kfd_process_free_notifier(struct mmu_notifier *mn)
 
 static void kfd_process_notifier_release_internal(struct kfd_process *p)
 {
-    int i;
+    int i, r;
 
     cancel_delayed_work_sync(&p->eviction_work);
     cancel_delayed_work_sync(&p->restore_work);
 
+    WARN(debug_evictions, "Evicting pid %d", p->lead_thread->pid);

Should this be a warning or a debug message? I see this WARN is used in several places. If the queues of KFD process p are still running when we get here, we need to stop them. That is not an error, so I think a debug message is more suitable.

The module parameter debug_evictions can be set to true; WARN is used to dump the call trace to help understand why the queues are evicted. By default debug_evictions is false.
I agree with stopping the KFD process's queues during KFD process release. I just wonder whether the WARN should be changed to a debug message. We could use dump_stack() to dump the stack anyway, but that is not relevant to this patch.
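A rough userspace C sketch of the two options discussed above, for illustration only. All names are hypothetical stand-ins: the `debug_evictions` global mimics the module parameter, `warn_if()` mimics the kernel's `WARN(cond, fmt, ...)`, which prints only when `cond` is true and returns `cond` (the real macro also dumps a call trace, which this sketch omits), and `debug_log()` mimics `pr_debug()`, which stays silent unless debug output is enabled, for example through dynamic debug. The alternative raised above, a debug message plus `dump_stack()`, would correspond to `debug_log()` followed by a stack dump.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the debug_evictions module parameter (default: false). */
static bool debug_evictions = false;

/* Hypothetical stand-in for enabling pr_debug() output at runtime. */
static bool dyndbg_enabled = false;

/*
 * Rough analogue of WARN(cond, fmt, ...): prints only when cond is true and
 * returns cond so callers can branch on it. The kernel macro additionally
 * dumps a call trace; this sketch only prints the message.
 */
static int warn_if(bool cond, const char *fmt, ...)
{
    if (cond) {
        va_list ap;
        va_start(ap, fmt);
        fputs("WARNING: ", stderr);
        vfprintf(stderr, fmt, ap);
        fputc('\n', stderr);
        va_end(ap);
    }
    return cond ? 1 : 0;
}

/* Rough analogue of pr_debug(): silent unless enabled; returns 1 if it printed. */
static int debug_log(const char *fmt, ...)
{
    if (!dyndbg_enabled)
        return 0;
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    return 1;
}

/* Mirrors where the patch emits its message: just before evicting the queues. */
static int report_eviction(int pid)
{
    return warn_if(debug_evictions, "Evicting pid %d", pid);
}
```

With the defaults both paths print nothing; setting `debug_evictions = true` makes `report_eviction()` print and return 1, which is the opt-in behaviour the WARN relies on.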

+    r = kfd_process_evict_queues(p, KFD_QUEUE_EVICTION_TRIGGER_SVM);

The eviction reason KFD_QUEUE_EVICTION_TRIGGER_SVM is not a good fit here, since this is a general KFD process release. Maybe we need another enum value.

Defining a new profiling event requires a rocprofiler API change; KFD_QUEUE_EVICTION_TRIGGER_SVM seems to be the closest event for the MMU notifier.

That is awkward. We could add an enum value at the end that rocprofiler would not know about for now.
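A minimal C sketch of why appending the new value at the end is the compatible choice, assuming the trigger reaches the profiler as a plain integer. The enum names, `old_names`, and `old_tool_decode()` are hypothetical and are not the real kfd_ioctl.h or rocprofiler definitions: existing values keep their numbers, so an older consumer still decodes them, and it only needs a fallback for the value it has never seen.

```c
/* Hypothetical trigger list that an existing tool was built against. */
enum old_trigger {
    OLD_TRIGGER_SVM,      /* 0 */
    OLD_TRIGGER_USERPTR,  /* 1 */
    OLD_TRIGGER_TTM,      /* 2 */
    OLD_TRIGGER_COUNT     /* sentinel: number of values the old tool knows */
};

/* Hypothetical newer kernel-side list: one value appended at the end. */
enum new_trigger {
    NEW_TRIGGER_SVM,              /* still 0 */
    NEW_TRIGGER_USERPTR,          /* still 1 */
    NEW_TRIGGER_TTM,              /* still 2 */
    NEW_TRIGGER_PROCESS_RELEASE   /* new value 3, unknown to the old tool */
};

static const char *const old_names[OLD_TRIGGER_COUNT] = {
    "svm", "userptr", "ttm",
};
static const char unknown_name[] = "unknown";

/* Old consumer: decodes the values it knows and falls back for anything newer. */
static const char *old_tool_decode(int raw)
{
    if (raw < 0 || raw >= OLD_TRIGGER_COUNT)
        return unknown_name;
    return old_names[raw];
}
```

Inserting the new value in the middle instead would renumber the later triggers, and an old tool would silently mislabel those events rather than reporting them as unknown.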

Regards

Xiaogang



Regards,

Philip

Regards

Xiaogang

+    if (r)
+        pr_debug("failed %d to quiesce KFD queues\n", r);
+
     for (i = 0; i < p->n_pdds; i++) {
         struct kfd_process_device *pdd = p->pdds[i];
