On 2025-10-15 17:01, Chen, Xiaogang wrote:
On 10/15/2025 3:11 PM, Philip Yang wrote:
Only show the warning message if the process mm is still alive when a
queue buffer is freed and the queues are evicted.
If kfd_lookup_process_by_mm returns NULL, the process has already
exited and the mm is gone, so it is fine to free the queue buffer.
But another question is: why is a prange still alive when its kfd
process is gone?
The application process has exited, but the kfd process structure still
exists and is available. The issue is a race condition:
do_exit
  exit_mmap
    a. mmu mm release notifier, schedule kfd release wq to destroy
       queue
    unmap_vmas
      b. mmu_notifier_range(.. MMU_NOTIFY_UNMAP...)
Step b is executed to unmap the CWSR svm range before the kfd release
wq scheduled in step a destroys the queue.
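The ordering above can be modeled in user space. This is only a sketch: the struct and the model_* functions are hypothetical stand-ins, not kernel APIs, and they assume that step a merely schedules the queue destruction while the deferred work is what actually drops the queue reference on the range.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model_state {
	bool release_work_pending; /* step a has scheduled the deferred destroy */
	int queue_refcount;        /* queues still referencing the CWSR range */
};

/* Step a: the mm release notifier only schedules the work (assumption). */
void model_mm_release(struct model_state *s)
{
	s->release_work_pending = true;
}

/* Deferred work: destroys the queues, dropping their range references. */
void model_release_work(struct model_state *s)
{
	assert(s->release_work_pending);
	s->queue_refcount = 0;
	s->release_work_pending = false;
}

/* Step b: the unmap notifier checks whether a queue still uses the range. */
bool model_unmap_sees_live_queue(const struct model_state *s)
{
	return s->queue_refcount > 0;
}
```

If model_mm_release() runs and model_unmap_sees_live_queue() is called before model_release_work(), the unmap path still observes a non-zero refcount, which is the window described above.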
When a prange is unmapped, the queues that use it should already have
been stopped. If not, there is a problem somewhere. This warning message
needs to be sent whether or not the kfd process exists.
I think the real problem here is that the kfd process needs to stay
alive as long as any of its resources is still alive. In this case,
since the prange is still alive, its kfd process should not have been
released (p should not be NULL). Otherwise we need to wait until all
pranges of this process have been released, and only then release the
kfd process.
The kfd process structure is freed in kfd_process_wq_release, after
svm_range_list_fini.
Regards,
Philip
Regards
Xiaogang
Fixes: b049504e211e ("drm/amdkfd: Validate user queue svm memory residency")
Signed-off-by: Philip Yang <[email protected]>
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 4d4a47313f5b..d1b2f8525f80 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2487,7 +2487,9 @@ svm_range_unmap_from_cpu(struct mm_struct *mm, struct svm_range *prange,
bool unmap_parent;
uint32_t i;
- if (atomic_read(&prange->queue_refcount)) {
+ p = kfd_lookup_process_by_mm(mm);
+
+ if (p && atomic_read(&prange->queue_refcount)) {
int r;
pr_warn("Freeing queue vital buffer 0x%lx, queue evicted\n",
@@ -2497,7 +2499,6 @@ svm_range_unmap_from_cpu(struct mm_struct *mm, struct svm_range *prange,
pr_debug("failed %d to quiesce KFD queues\n", r);
}
- p = kfd_lookup_process_by_mm(mm);
if (!p)
return;
svms = &p->svms;
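
For reference, the control flow after this patch can be sketched in user space as below. This is a simplified model under stated assumptions, not the driver code: struct model_process, struct unmap_result and model_unmap() are hypothetical names, the process pointer stands in for the result of kfd_lookup_process_by_mm(), and the warn-and-quiesce step is reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kfd_process. */
struct model_process {
	int unused;
};

/* What the modeled unmap path did: hypothetical, for illustration only. */
struct unmap_result {
	bool warned;    /* the warning was printed and queues were quiesced */
	bool proceeded; /* the range unmap continued past the NULL check */
};

struct unmap_result model_unmap(const struct model_process *p,
				int queue_refcount)
{
	struct unmap_result r = { false, false };

	assert(queue_refcount >= 0);

	/* Patched check: warn only if the process lookup succeeded. */
	if (p && queue_refcount > 0)
		r.warned = true;

	/* Unchanged early return when the process is already gone. */
	if (!p)
		return r;

	r.proceeded = true;
	return r;
}
```

With a NULL process the model neither warns nor proceeds, which is the behavior change under discussion: a non-zero queue_refcount is silently ignored once the lookup fails.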