On 2025-10-17 14:33, Ahmad Rehman wrote:
> The patch adds the sleep to yield for the runtime to act on
> the EXCEPTION event. This allows the runtime/app to execute
> actions on signal reception before driver gets a chance to
> move ahead with the sequence.
>
> Signed-off-by: Ahmad Rehman <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_events.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index 82905f3e54dd..8dfb796fd506 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -1329,6 +1329,13 @@ void kfd_signal_reset_event(struct kfd_node *dev)
>  		}
>  		rcu_read_unlock();
> +
> +		/*
> +		 * Since the set_event is asynchronous, putting a delay
> +		 * to give runtime sometime to act on the EXCEPTION before
> +		 * driver moves ahead.
> +		 */
> +		ssleep(2);

This adds a 2s sleep inside a loop that iterates over all processes
using KFD. If you have multiple KFD processes running, that could add up
to a significant delay.
I also don't like waiting before the srcu_read_unlock, because that
would block other threads synchronizing with the kfd_processes_srcu
(mostly kfd_process_notifier_release).
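
To put a number on the first concern, here is a small userspace model, not
kernel code. It assumes a hypothetical fake_ssleep() that only records the
requested delay instead of sleeping, and it compares the patch as posted (one
delay per process, inside the loop) with one possible alternative (a single
delay after every process has been signaled). The alternative is only an
illustration and has not been proposed or agreed on in this thread.

```c
/* Userspace model only: nothing here is a kernel API. */
#include <stddef.h>

#define RESET_DELAY_SECONDS 2u /* same value as the ssleep(2) in the patch */

/* Accumulates the delay requested so far (hypothetical bookkeeping). */
static unsigned int total_delay_seconds;

/* Hypothetical stand-in for ssleep(): records the delay, does not sleep. */
static void fake_ssleep(unsigned int seconds)
{
	total_delay_seconds += seconds;
}

/* Models the patch as posted: one delay per process, inside the loop. */
unsigned int delay_inside_loop(size_t nr_processes)
{
	total_delay_seconds = 0;
	for (size_t i = 0; i < nr_processes; i++) {
		/* ... signal the reset event to process i ... */
		fake_ssleep(RESET_DELAY_SECONDS);
	}
	return total_delay_seconds;
}

/* Models one alternative: signal every process, then delay once. */
unsigned int delay_after_loop(size_t nr_processes)
{
	total_delay_seconds = 0;
	for (size_t i = 0; i < nr_processes; i++) {
		/* ... signal the reset event to process i ... */
		(void)i;
	}
	/* The read-side critical section would already be closed here. */
	if (nr_processes > 0)
		fake_ssleep(RESET_DELAY_SECONDS);
	return total_delay_seconds;
}
```

In this model, 8 KFD processes cost 16 seconds with the delay inside the loop
and 2 seconds with a single delay after it. Whether any fixed delay is the
right tool at all is a separate question.
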
What exactly is it that you want to prevent "moving on" to here?
Regards,
Felix

>  	}
>  	srcu_read_unlock(&kfd_processes_srcu, idx);
>  }