Re: [PATCH] drm/amdkfd: Fix eviction fence handling

2024-04-18 Thread Ba, Gang

Tested-by: Gang BA 
Reviewed-by: Gang BA 

From: Kuehling, Felix 
Sent: Wednesday, April 17, 2024 11:14 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Ba, Gang ; Prosyak, Vitaly 
Subject: [PATCH] drm/amdkfd: Fix eviction fence handling

Handle the case where dma_fence_get_rcu_safe() returns NULL.

If restore work is already scheduled, only update its timer. The same
work item cannot be queued twice, so undo the extra queue eviction.

Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs")
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index b79986412cd8..aafdf064651f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1922,6 +1922,8 @@ static int signal_eviction_fence(struct kfd_process *p)
 	rcu_read_lock();
 	ef = dma_fence_get_rcu_safe(&p->ef);
 	rcu_read_unlock();
+	if (!ef)
+		return -EINVAL;
 
 	ret = dma_fence_signal(ef);
 	dma_fence_put(ef);
@@ -1949,10 +1951,9 @@ static void evict_process_worker(struct work_struct *work)
 		 * they are responsible stopping the queues and scheduling
 		 * the restore work.
 		 */
-		if (!signal_eviction_fence(p))
-			queue_delayed_work(kfd_restore_wq, &p->restore_work,
-				msecs_to_jiffies(PROCESS_RESTORE_TIME_MS));
-		else
+		if (signal_eviction_fence(p) ||
+		    mod_delayed_work(kfd_restore_wq, &p->restore_work,
+				     msecs_to_jiffies(PROCESS_RESTORE_TIME_MS)))
 			kfd_process_restore_queues(p);
 
 		pr_debug("Finished evicting pasid 0x%x\n", p->pasid);
--
2.34.1
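
The first hunk guards against a missing eviction fence before signaling it. As a rough userspace illustration only (not the driver code), the sketch below mimics that control flow. The mock_* types and helpers are hypothetical stand-ins for struct dma_fence, struct kfd_process, dma_fence_get_rcu_safe(), dma_fence_signal() and dma_fence_put(); RCU and real reference counting are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_fence: a refcount and a flag. */
struct mock_fence {
	int refcount;
	int signaled;
};

/* Hypothetical stand-in for struct kfd_process; ef may be NULL. */
struct mock_process {
	struct mock_fence *ef;
};

/* Assumed contract: return NULL if no fence is installed,
 * otherwise take a reference and return the fence.
 */
static struct mock_fence *mock_fence_get(struct mock_fence **slot)
{
	struct mock_fence *f = *slot;

	if (f)
		f->refcount++;
	return f;
}

/* Return 0 if this caller signaled the fence, -EINVAL if it was
 * already signaled by someone else.
 */
static int mock_fence_signal(struct mock_fence *f)
{
	if (f->signaled)
		return -EINVAL;
	f->signaled = 1;
	return 0;
}

static void mock_fence_put(struct mock_fence *f)
{
	assert(f->refcount > 0);
	f->refcount--;
}

/* Same shape as the patched function: bail out early on a NULL fence
 * so that the signal and put helpers never see a NULL pointer.
 */
static int signal_eviction_fence(struct mock_process *p)
{
	struct mock_fence *ef = mock_fence_get(&p->ef);
	int ret;

	if (!ef)
		return -EINVAL;

	ret = mock_fence_signal(ef);
	mock_fence_put(ef);
	return ret;
}
```

In this sketch a missing fence and an already signaled fence both yield a nonzero return, so a caller can treat them the same way, which is what the second hunk relies on.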



Re: [PATCH] drm/amdkfd: Fix eviction fence handling

2024-04-18 Thread Philip Yang

  


On 2024-04-17 23:14, Felix Kuehling wrote:


Handle the case where dma_fence_get_rcu_safe() returns NULL.

If restore work is already scheduled, only update its timer. The same
work item cannot be queued twice, so undo the extra queue eviction.

Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs")
Signed-off-by: Felix Kuehling 

Reviewed-by: Philip Yang 

  
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index b79986412cd8..aafdf064651f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1922,6 +1922,8 @@ static int signal_eviction_fence(struct kfd_process *p)
 	rcu_read_lock();
 	ef = dma_fence_get_rcu_safe(&p->ef);
 	rcu_read_unlock();
+	if (!ef)
+		return -EINVAL;
 
 	ret = dma_fence_signal(ef);
 	dma_fence_put(ef);
@@ -1949,10 +1951,9 @@ static void evict_process_worker(struct work_struct *work)
 		 * they are responsible stopping the queues and scheduling
 		 * the restore work.
 		 */
-		if (!signal_eviction_fence(p))
-			queue_delayed_work(kfd_restore_wq, &p->restore_work,
-				msecs_to_jiffies(PROCESS_RESTORE_TIME_MS));
-		else
+		if (signal_eviction_fence(p) ||
+		    mod_delayed_work(kfd_restore_wq, &p->restore_work,
+				     msecs_to_jiffies(PROCESS_RESTORE_TIME_MS)))
 			kfd_process_restore_queues(p);
 
 		pr_debug("Finished evicting pasid 0x%x\n", p->pasid);


  



[PATCH] drm/amdkfd: Fix eviction fence handling

2024-04-17 Thread Felix Kuehling
Handle the case where dma_fence_get_rcu_safe() returns NULL.

If restore work is already scheduled, only update its timer. The same
work item cannot be queued twice, so undo the extra queue eviction.

Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs")
Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index b79986412cd8..aafdf064651f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1922,6 +1922,8 @@ static int signal_eviction_fence(struct kfd_process *p)
 	rcu_read_lock();
 	ef = dma_fence_get_rcu_safe(&p->ef);
 	rcu_read_unlock();
+	if (!ef)
+		return -EINVAL;
 
 	ret = dma_fence_signal(ef);
 	dma_fence_put(ef);
@@ -1949,10 +1951,9 @@ static void evict_process_worker(struct work_struct *work)
 		 * they are responsible stopping the queues and scheduling
 		 * the restore work.
 		 */
-		if (!signal_eviction_fence(p))
-			queue_delayed_work(kfd_restore_wq, &p->restore_work,
-				msecs_to_jiffies(PROCESS_RESTORE_TIME_MS));
-		else
+		if (signal_eviction_fence(p) ||
+		    mod_delayed_work(kfd_restore_wq, &p->restore_work,
+				     msecs_to_jiffies(PROCESS_RESTORE_TIME_MS)))
 			kfd_process_restore_queues(p);
 
 		pr_debug("Finished evicting pasid 0x%x\n", p->pasid);
-- 
2.34.1
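
To illustrate the accounting that the second hunk is about, here is a small userspace model, offered as a sketch under stated assumptions rather than as the driver's implementation. It assumes the documented return convention of mod_delayed_work() (false if the work was idle and got queued, true if it was already pending and only its timer was modified). Every sketch_* name, the evict_count field and SKETCH_RESTORE_DELAY are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum { SKETCH_RESTORE_DELAY = 100 };	/* hypothetical delay, in ticks */

/* Hypothetical stand-in for struct delayed_work. */
struct sketch_delayed_work {
	bool pending;
	unsigned long expires;
};

/* Hypothetical process state: evict_count models how many queue
 * evictions are outstanding and still need a matching restore.
 */
struct sketch_process {
	int evict_count;
	bool fence_signaled;
	struct sketch_delayed_work restore_work;
};

/* Mirrors the mod_delayed_work() return convention: false if the work
 * was idle and is now queued, true if it was pending and only the
 * timer changed.
 */
static bool sketch_mod_delayed_work(struct sketch_delayed_work *w,
				    unsigned long now, unsigned long delay)
{
	bool was_pending = w->pending;

	w->pending = true;
	w->expires = now + delay;
	return was_pending;
}

/* Returns 0 if this caller signaled the fence, nonzero otherwise. */
static int sketch_signal_eviction_fence(struct sketch_process *p)
{
	if (p->fence_signaled)
		return -EINVAL;
	p->fence_signaled = true;
	return 0;
}

/* Same branch structure as the patched evict_process_worker(). */
static void sketch_evict_worker(struct sketch_process *p, unsigned long now)
{
	p->evict_count++;	/* stands in for kfd_process_evict_queues() */

	if (sketch_signal_eviction_fence(p) ||
	    sketch_mod_delayed_work(&p->restore_work, now,
				    SKETCH_RESTORE_DELAY))
		p->evict_count--;	/* stands in for kfd_process_restore_queues() */
}

/* The single pending restore undoes exactly one eviction. */
static void sketch_restore_worker(struct sketch_process *p)
{
	assert(p->evict_count > 0);
	p->restore_work.pending = false;
	p->evict_count--;
	p->fence_signaled = false;	/* models installing a fresh fence */
}
```

In this model, a second eviction that finds the restore work already pending only pushes the timer out and drops its own eviction again, so the one restore that eventually runs brings evict_count back to zero. With a queue_delayed_work()-style call that silently declines to queue an already pending work item, the count would stay at one after the restore.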