Move dequeuing and destroying user queues from kfd_process_wq_release
to the mmu notifier release callback. This ensures the GPU no longer
accesses system memory, because the CPU frees the process memory after
the mmu notifier release callback returns.

Destroying a queue releases the svm prange queue_refcount, so this also
removes the false positive "Freeing queue vital buffer" warning message
when an application crashes or is killed.

Suggested-by: Felix Kuehling <[email protected]>
Signed-off-by: Philip Yang <[email protected]>
---
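Note for reviewers: the ordering argument in the commit message can be
sketched as a small user-space model. This is only an illustration under
stated assumptions, not driver code: struct model_process and every
model_* function are hypothetical names, and exit_mmap is modelled as
freeing memory right after the release callback returns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_process {
	bool queues_active;	/* GPU may still access process memory */
	bool memory_mapped;	/* CPU has not yet freed the address space */
};

/* Models the mmu notifier release callback: stop and destroy queues. */
void model_notifier_release(struct model_process *p)
{
	p->queues_active = false;
}

/* Models exit_mmap freeing memory after the release callback returns. */
void model_exit_mmap(struct model_process *p)
{
	/* Invariant the patch aims for: no active queue at this point. */
	assert(!p->queues_active);
	p->memory_mapped = false;
}

/* Models the deferred release work: queues are already gone here. */
bool model_wq_release(const struct model_process *p)
{
	return !p->queues_active && !p->memory_mapped;
}

/* Runs the teardown steps in the order this patch establishes. */
bool model_teardown(struct model_process *p)
{
	model_notifier_release(p);
	model_exit_mmap(p);
	return model_wq_release(p);
}
```

In this model, leaving the queue teardown to model_wq_release (the old
ordering) would trip the assertion in model_exit_mmap, which corresponds
to the window where the GPU could still touch memory being freed.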
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 849456ac498b..b429ee4c4ed7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1162,9 +1162,6 @@ static void kfd_process_wq_release(struct work_struct *work)
                                             release_work);
        struct dma_fence *ef;
 
-       kfd_process_dequeue_from_all_devices(p);
-       pqm_uninit(&p->pqm);
-
        /*
         * If GPU in reset, user queues may still running, wait for reset complete.
         */
@@ -1226,6 +1223,14 @@ static void kfd_process_notifier_release_internal(struct kfd_process *p)
        cancel_delayed_work_sync(&p->eviction_work);
        cancel_delayed_work_sync(&p->restore_work);
 
+       /*
+        * Dequeue and remove user queues because exit_mmap frees process
+        * memory; it is not safe for the GPU to access system memory after
+        * the mmu release notifier callback returns.
+        */
+       kfd_process_dequeue_from_all_devices(p);
+       pqm_uninit(&p->pqm);
+
        for (i = 0; i < p->n_pdds; i++) {
                struct kfd_process_device *pdd = p->pdds[i];
 
-- 
2.49.0
