On 2025-10-15 16:40, Chen, Xiaogang wrote:
On 10/15/2025 3:11 PM, Philip Yang wrote:
In the mmu notifier release callback, stop user queues to be safe, because
the SVM memory is about to be unmapped from the CPU.
Suggested-by: Felix Kuehling <[email protected]>
Signed-off-by: Philip Yang <[email protected]>
---
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 0341f570f3d1..e2a0ae0394b8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1221,11 +1221,16 @@ static void kfd_process_free_notifier(struct mmu_notifier *mn)
static void kfd_process_notifier_release_internal(struct kfd_process *p)
{
- int i;
+ int i, r;
cancel_delayed_work_sync(&p->eviction_work);
cancel_delayed_work_sync(&p->restore_work);
+ WARN(debug_evictions, "Evicting pid %d", p->lead_thread->pid);
Should this be a warning or a debug message? I see this WARN is used in
several places. If the queues of kfd process p are still running when we
get here, we need to stop them. That is not an error, so I think a debug
message is more suitable.
The module parameter debug_evictions can be set to true so that WARN
dumps the call trace, which helps in understanding why the queues are
evicted. By default debug_evictions is false, so the WARN does not fire.
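To illustrate the gating described above, here is a minimal userspace sketch, not kernel code. It assumes only what this thread states: WARN fires when its condition is true, and debug_evictions defaults to false. The names sketch_warn_evicting, sketch_release and warn_count are made up for this illustration, and the real WARN also dumps a stack trace, which the sketch replaces with a single message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the module parameter; false by default, as in the thread. */
static bool debug_evictions = false;

/* Hypothetical counter: how many times the simulated warning fired. */
static int warn_count = 0;

/*
 * Userspace approximation of WARN(cond, "Evicting pid %d", pid):
 * it reports only when cond is true and returns cond.
 * The kernel macro would also dump a call trace at this point.
 */
static bool sketch_warn_evicting(bool cond, int pid)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: Evicting pid %d (call trace would follow)\n", pid);
	}
	return cond;
}

/*
 * Hypothetical stand-in for the release path: the warning is optional,
 * while the queue eviction itself would happen unconditionally.
 */
static int sketch_release(int pid)
{
	assert(pid > 0); /* sketch-only sanity check */
	sketch_warn_evicting(debug_evictions, pid);
	/* ... queues would be evicted here regardless of debug_evictions ... */
	return 0;
}
```

With the default setting, sketch_release() prints nothing; after setting debug_evictions to true, each call emits one warning, while the eviction step itself is unaffected in both cases.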
+ r = kfd_process_evict_queues(p, KFD_QUEUE_EVICTION_TRIGGER_SVM);
The eviction reason KFD_QUEUE_EVICTION_TRIGGER_SVM is not a good fit here,
as this is the general kfd process release path. Maybe another enum value
is needed.
Defining a new profiling event requires a rocprofiler API change;
KFD_QUEUE_EVICTION_TRIGGER_SVM seems the closest existing event for the
mmu notifier.
Regards,
Philip
Regards
Xiaogang
+ if (r)
+ pr_debug("failed %d to quiesce KFD queues\n", r);
+
for (i = 0; i < p->n_pdds; i++) {
struct kfd_process_device *pdd = p->pdds[i];