Re: [PATCH 5/5] drm/sched: Make use of a "done" list (v2)

2020-12-07 Thread Luben Tuikov
On 2020-12-04 3:16 a.m., Christian König wrote:
> Am 04.12.20 um 04:17 schrieb Luben Tuikov:
>> The drm_sched_job_done() callback now moves done
>> jobs from the pending list to a "done" list.
>>
>> In drm_sched_job_timedout(), make use of the
>> status returned by a GPU driver's job timeout
>> handler to decide whether to leave the oldest
>> job in the pending list, or to send it off to
>> the done list. If a driver's job timeout
>> callback returns a status indicating that the
>> job is done, the job is added to the done list
>> and the done thread is woken up. If the job
>> needs more time, it is left on the pending
>> list and the timeout timer is restarted.
>>
>> The idea is that a GPU driver can check the IP
>> to which the passed-in job belongs and determine
>> whether the IP is alive and well, or if it needs
>> more time to complete this job and perhaps others
>> also executing on it.
>>
>> In drm_sched_job_timedout(), the main scheduler
>> thread is now parked before calling a driver's
>> timedout_job callback, so that it does not
>> compete by pushing jobs down to the GPU while
>> the recovery method is taking place.
>>
>> Eliminate the polling mechanism of picking out done
>> jobs from the pending list, i.e. eliminate
>> drm_sched_get_cleanup_job().
>>
>> This also prevents the oldest job from
>> disappearing from the pending list while the
>> driver's timeout handler is called.
>>
>> Various other optimizations to the GPU scheduler
>> and job recovery are possible with this format.
>>
>> Signed-off-by: Luben Tuikov 
>>
>> Cc: Alexander Deucher 
>> Cc: Andrey Grodzovsky 
>> Cc: Christian König 
>> Cc: Daniel Vetter 
>> Cc: Lucas Stach 
>> Cc: Russell King 
>> Cc: Christian Gmeiner 
>> Cc: Qiang Yu 
>> Cc: Rob Herring 
>> Cc: Tomeu Vizoso 
>> Cc: Steven Price 
>> Cc: Alyssa Rosenzweig 
>> Cc: Eric Anholt 
>>
>> v2: Dispense with the done thread, so as to keep
>>  the cache hot on the same processor.
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 247 +
>>   include/drm/gpu_scheduler.h|   4 +
>>   2 files changed, 134 insertions(+), 117 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index b9876cad94f2..d77180b44998 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -164,7 +164,9 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
>>* drm_sched_job_done - complete a job
>>* @s_job: pointer to the job which is done
>>*
>> - * Finish the job's fence and wake up the worker thread.
>> + * Move the completed task to the done list,
>> + * signal its fence to mark it finished,
>> + * and wake up the worker thread.
>>*/
>>   static void drm_sched_job_done(struct drm_sched_job *s_job)
>>   {
>> @@ -176,9 +178,14 @@ static void drm_sched_job_done(struct drm_sched_job 
>> *s_job)
>>   
>>  trace_drm_sched_process_job(s_fence);
>>   
>> +spin_lock(&sched->job_list_lock);
>> +list_move(&s_job->list, &sched->done_list);
>> +spin_unlock(&sched->job_list_lock);
>> +
> 
> That is racy, as soon as the spinlock is dropped the job and with it the 
> s_fence might have been destroyed.

Yeah, I had it the other way around (the correct way),
and changed it--not sure why. I'll revert it. Thanks
for catching this.
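For clarity, the ordering being agreed on can be sketched in isolation like this (plain C with illustrative stand-in names, not the actual scheduler code): signal the finished fence while the job is still on the pending list, and only then move the job to the done list, after which a concurrent free path may reclaim it.

```c
#include <assert.h>

/* Illustrative stand-ins only: a two-step trace records the order in
 * which drm_sched_job_done() would perform its work after the fix.
 * The finished fence is signaled while the job is still on the
 * pending list; only then is the job moved to the done list, where a
 * concurrent free path may destroy it at any moment. */
enum op { OP_FENCE_SIGNALED = 1, OP_MOVED_TO_DONE = 2 };

struct job_trace {
	enum op trace[2];
	int n;
};

static void job_done_fixed(struct job_trace *j)
{
	/* Step 1: get/signal/put the finished fence first; the job is
	 * not yet visible on the done list, so it cannot be freed. */
	j->trace[j->n++] = OP_FENCE_SIGNALED;

	/* Step 2: publish the job on the done list under the list
	 * lock; after this we must not touch the job again. */
	j->trace[j->n++] = OP_MOVED_TO_DONE;
}
```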

Regards,
Luben

> 
>>  dma_fence_get(&s_fence->finished);
>>  drm_sched_fence_finished(s_fence);
>>  dma_fence_put(&s_fence->finished);
> 
> In other words this here needs to come first.
> 
> Regards,
> Christian.
> 
>> +
>>  wake_up_interruptible(&sched->wake_up_worker);
>>   }
>>   
>> @@ -309,6 +316,37 @@ static void drm_sched_job_begin(struct drm_sched_job 
>> *s_job)
>>  spin_unlock(&sched->job_list_lock);
>>   }
>>   
>> +/** drm_sched_job_timedout -- a timeout timer fired
>> + * @work: pointer to work_struct
>> + *
>> + * First, park the scheduler thread whose IP timed out,
>> + * so that we don't race with the scheduler thread pushing
>> + * jobs down the IP as we try to investigate what
>> + * happened and give drivers a chance to recover.
>> + *
>> + * Second, take the first job in the pending list
>> + * (oldest), leave it in the pending list and call the
>> + * driver's job timeout callback to find out what
>> + * happened, passing this job as the suspect one.
>> + *
>> + * The driver may return DRM_TASK_STATUS_COMPLETE,
>> + * which means the task is not in the IP(*) and we move
>> + * it to the done list to free it.
>> + *
>> + * (*) A reason for this would be, say, that the job
>> + * completed in due time, or the driver has aborted
>> + * this job using driver specific methods in the
>> + * timedout_job callback and has now removed it from
>> + * the hardware.
>> + *
>> + * Or, the driver may return DRM_TASK_STATUS_ALIVE, to
>> + * indicate that it had inquired about this job, and it
>> + * has verified that this job is alive and well, and
>> + * that the DRM layer should give this task more time
>> + * to complete. In this case, we restart the timeout timer.
>> + *
>> + * Lastly, we unpark the scheduler thread.
>> + */

[PATCH 5/5] drm/sched: Make use of a "done" list (v2)

2020-12-03 Thread Luben Tuikov
The drm_sched_job_done() callback now moves done
jobs from the pending list to a "done" list.

In drm_sched_job_timedout(), make use of the
status returned by a GPU driver's job timeout
handler to decide whether to leave the oldest
job in the pending list, or to send it off to
the done list. If a driver's job timeout
callback returns a status indicating that the
job is done, the job is added to the done list
and the done thread is woken up. If the job
needs more time, it is left on the pending
list and the timeout timer is restarted.
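The decision described above can be sketched in isolation like so (plain C with illustrative stand-in types; the real code operates on struct drm_sched_job and the scheduler's lists, and these names are assumptions for the sketch):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel types, showing only the
 * decision the timeout handler makes based on the driver's answer:
 * COMPLETE moves the job to the done list so it can be freed,
 * ALIVE leaves it pending and restarts the timeout timer. */
enum drm_task_status { DRM_TASK_STATUS_COMPLETE, DRM_TASK_STATUS_ALIVE };

struct fake_job {
	bool on_done_list;     /* moved off the pending list */
	bool timer_restarted;  /* timeout timer armed again  */
};

static void handle_timedout_job(struct fake_job *job,
				enum drm_task_status status)
{
	if (status == DRM_TASK_STATUS_COMPLETE) {
		/* The driver says the job is no longer in the IP:
		 * hand it to the done list to be freed. */
		job->on_done_list = true;
	} else {
		/* The driver says the job is alive and well:
		 * leave it pending and give it more time. */
		job->timer_restarted = true;
	}
}
```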

The idea is that a GPU driver can check the IP
to which the passed-in job belongs and determine
whether the IP is alive and well, or if it needs
more time to complete this job and perhaps others
also executing on it.

In drm_sched_job_timedout(), the main scheduler
thread is now parked before calling a driver's
timedout_job callback, so that it does not
compete by pushing jobs down to the GPU while
the recovery method is taking place.
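The park/unpark step corresponds to the kernel's kthread_park()/kthread_unpark(). As a userspace analogy only (pthreads, illustrative names, not the kernel API), the effect on the scheduler thread can be mimicked with a flag and a condition variable: park() returns only once the worker has acknowledged that it is no longer pushing jobs.

```c
#include <pthread.h>
#include <stdbool.h>

/* Userspace analogy of parking a scheduler thread: "parking" stops
 * the worker from pushing more jobs while recovery runs, "unparking"
 * lets it continue. The kernel uses kthread_park()/kthread_unpark(). */
struct sched_sim {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool parked;     /* recovery asked the worker to stop */
	bool in_park;    /* worker acknowledged the request   */
	bool done;       /* worker finished all of its jobs   */
	int jobs_pushed;
};

static void *worker(void *arg)
{
	struct sched_sim *s = arg;

	for (int i = 0; i < 100; i++) {
		pthread_mutex_lock(&s->lock);
		while (s->parked) {
			s->in_park = true;
			pthread_cond_broadcast(&s->cond);
			pthread_cond_wait(&s->cond, &s->lock);
		}
		s->in_park = false;
		s->jobs_pushed++;    /* "push one job to the GPU" */
		pthread_mutex_unlock(&s->lock);
	}
	pthread_mutex_lock(&s->lock);
	s->done = true;
	pthread_cond_broadcast(&s->cond);
	pthread_mutex_unlock(&s->lock);
	return NULL;
}

/* Block until the worker is provably not touching the hardware. */
static void park(struct sched_sim *s)
{
	pthread_mutex_lock(&s->lock);
	s->parked = true;
	while (!s->in_park && !s->done)
		pthread_cond_wait(&s->cond, &s->lock);
	pthread_mutex_unlock(&s->lock);
}

static void unpark(struct sched_sim *s)
{
	pthread_mutex_lock(&s->lock);
	s->parked = false;
	pthread_cond_broadcast(&s->cond);
	pthread_mutex_unlock(&s->lock);
}
```

While parked, the worker sits in pthread_cond_wait() and rechecks the flag before pushing anything, so no job can sneak out during recovery.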

Eliminate the polling mechanism of picking out done
jobs from the pending list, i.e. eliminate
drm_sched_get_cleanup_job().

This also prevents the oldest job from
disappearing from the pending list while the
driver's timeout handler is called.

Various other optimizations to the GPU scheduler
and job recovery are possible with this format.

Signed-off-by: Luben Tuikov 

Cc: Alexander Deucher 
Cc: Andrey Grodzovsky 
Cc: Christian König 
Cc: Daniel Vetter 
Cc: Lucas Stach 
Cc: Russell King 
Cc: Christian Gmeiner 
Cc: Qiang Yu 
Cc: Rob Herring 
Cc: Tomeu Vizoso 
Cc: Steven Price 
Cc: Alyssa Rosenzweig 
Cc: Eric Anholt 

v2: Dispense with the done thread, so as to keep
the cache hot on the same processor.
---
 drivers/gpu/drm/scheduler/sched_main.c | 247 +
 include/drm/gpu_scheduler.h|   4 +
 2 files changed, 134 insertions(+), 117 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index b9876cad94f2..d77180b44998 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -164,7 +164,9 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
  * drm_sched_job_done - complete a job
  * @s_job: pointer to the job which is done
  *
- * Finish the job's fence and wake up the worker thread.
+ * Move the completed task to the done list,
+ * signal its fence to mark it finished,
+ * and wake up the worker thread.
  */
 static void drm_sched_job_done(struct drm_sched_job *s_job)
 {
@@ -176,9 +178,14 @@ static void drm_sched_job_done(struct drm_sched_job *s_job)
 
trace_drm_sched_process_job(s_fence);
 
+   spin_lock(&sched->job_list_lock);
+   list_move(&s_job->list, &sched->done_list);
+   spin_unlock(&sched->job_list_lock);
+
dma_fence_get(&s_fence->finished);
drm_sched_fence_finished(s_fence);
dma_fence_put(&s_fence->finished);
+
wake_up_interruptible(&sched->wake_up_worker);
 }
 
@@ -309,6 +316,37 @@ static void drm_sched_job_begin(struct drm_sched_job 
*s_job)
spin_unlock(&sched->job_list_lock);
 }
 
+/** drm_sched_job_timedout -- a timeout timer fired
+ * @work: pointer to work_struct
+ *
+ * First, park the scheduler thread whose IP timed out,
+ * so that we don't race with the scheduler thread pushing
+ * jobs down the IP as we try to investigate what
+ * happened and give drivers a chance to recover.
+ *
+ * Second, take the first job in the pending list
+ * (oldest), leave it in the pending list and call the
+ * driver's job timeout callback to find out what
+ * happened, passing this job as the suspect one.
+ *
+ * The driver may return DRM_TASK_STATUS_COMPLETE,
+ * which means the task is not in the IP(*) and we move
+ * it to the done list to free it.
+ *
+ * (*) A reason for this would be, say, that the job
+ * completed in due time, or the driver has aborted
+ * this job using driver specific methods in the
+ * timedout_job callback and has now removed it from
+ * the hardware.
+ *
+ * Or, the driver may return DRM_TASK_STATUS_ALIVE, to
+ * indicate that it had inquired about this job, and it
+ * has verified that this job is alive and well, and
+ * that the DRM layer should give this task more time
+ * to complete. In this case, we restart the timeout timer.
+ *
+ * Lastly, we unpark the scheduler thread.
+ */
 static void drm_sched_job_timedout(struct work_struct *work)
 {
struct drm_gpu_scheduler *sched;
@@ -316,37 +354,32 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
 
sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
-   /* Protects against concurrent deletion in drm_sched_get_cleanup_job */
+   kthread_park(sched->thread);
+
spin_lock(&sched->job_list_lock);
job = list_first_entry_or_null(&sched->pending_list,
   struct drm_sched_job, list);
+   spin_unlock(&sched->job_list_lock);
 
if (job) {
-   /*
-* Remove the bad jo