Hi Nicolai,

Ping?  :-)

Tom


On 28/09/17 11:01 AM, Christian König wrote:
On 28.09.2017 at 16:55, Nicolai Hähnle wrote:
From: Nicolai Hähnle <nicolai.haeh...@amd.com>

Highly concurrent Piglit runs can trigger a race condition where a pending
SDMA job on a buffer object is never executed because the corresponding
process is killed (perhaps due to a crash). Since the job's fences were
never signaled, the buffer object was effectively leaked. Worse, the
buffer was stuck wherever it happened to be at the time, possibly in VRAM.

The symptom was user space processes stuck in interruptible waits with
kernel stacks like:

     [<ffffffffbc5e6722>] dma_fence_default_wait+0x112/0x250
     [<ffffffffbc5e6399>] dma_fence_wait_timeout+0x39/0xf0
     [<ffffffffbc5e82d2>] reservation_object_wait_timeout_rcu+0x1c2/0x300
     [<ffffffffc03ce56f>] ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0 [ttm]
     [<ffffffffc03cf1ea>] ttm_mem_evict_first+0xba/0x1a0 [ttm]
     [<ffffffffc03cf611>] ttm_bo_mem_space+0x341/0x4c0 [ttm]
     [<ffffffffc03cfc54>] ttm_bo_validate+0xd4/0x150 [ttm]
     [<ffffffffc03cffbd>] ttm_bo_init_reserved+0x2ed/0x420 [ttm]
     [<ffffffffc042f523>] amdgpu_bo_create_restricted+0x1f3/0x470 [amdgpu]
     [<ffffffffc042f9fa>] amdgpu_bo_create+0xda/0x220 [amdgpu]
     [<ffffffffc04349ea>] amdgpu_gem_object_create+0xaa/0x140 [amdgpu]
     [<ffffffffc0434f97>] amdgpu_gem_create_ioctl+0x97/0x120 [amdgpu]
     [<ffffffffc037ddba>] drm_ioctl+0x1fa/0x480 [drm]
     [<ffffffffc041904f>] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
     [<ffffffffbc23db33>] do_vfs_ioctl+0xa3/0x5f0
     [<ffffffffbc23e0f9>] SyS_ioctl+0x79/0x90
     [<ffffffffbc864ffb>] entry_SYSCALL_64_fastpath+0x1e/0xad
     [<ffffffffffffffff>] 0xffffffffffffffff
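
In essence, the wait at the top of that stack can never complete: TTM waits for all fences on the buffer's reservation object, but those fences belong to a job that will never run. Roughly (a simplified sketch of the eviction wait, not the actual TTM code; bo stands in for the buffer being evicted):

     /* Eviction/cleanup must wait for every fence on the buffer to
      * signal before the buffer may be moved or freed. If the scheduler
      * never executes the pending job, its fences never signal and this
      * wait blocks forever.
      */
     long lret = reservation_object_wait_timeout_rcu(bo->resv,
                                                     true, /* wait_all */
                                                     true, /* interruptible */
                                                     MAX_SCHEDULE_TIMEOUT);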

Signed-off-by: Nicolai Hähnle <nicolai.haeh...@amd.com>
Acked-by: Christian König <christian.koe...@amd.com>
---
  drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 7 ++++++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
index 54eb77cffd9b..32a99e980d78 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
@@ -220,22 +220,27 @@ void amd_sched_entity_fini(struct amd_gpu_scheduler *sched,
                      amd_sched_entity_is_idle(entity));
      amd_sched_rq_remove_entity(rq, entity);
      if (r) {
          struct amd_sched_job *job;
          /* Park the kernel thread for a moment to make sure it isn't
           * processing our entity.
           */
          kthread_park(sched->thread);
          kthread_unpark(sched->thread);
-        while (kfifo_out(&entity->job_queue, &job, sizeof(job)))
+        while (kfifo_out(&entity->job_queue, &job, sizeof(job))) {
+            struct amd_sched_fence *s_fence = job->s_fence;
+            amd_sched_fence_scheduled(s_fence);

It would be really nice to have an error code set on s_fence->finished before it is signaled; use dma_fence_set_error() for this.
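
For example, something like this in the drain loop (a sketch only; -ESRCH for "owning process has exited" is just one possible choice of error code, the patch doesn't specify one):

        while (kfifo_out(&entity->job_queue, &job, sizeof(job))) {
                struct amd_sched_fence *s_fence = job->s_fence;

                /* Set an error on the finished fence so waiters can tell
                 * the job was never executed. dma_fence_set_error() must
                 * be called before the fence is signaled.
                 * -ESRCH here is only a suggested error code.
                 */
                dma_fence_set_error(&s_fence->finished, -ESRCH);
                amd_sched_fence_scheduled(s_fence);
                amd_sched_fence_finished(s_fence);
                dma_fence_put(&s_fence->finished);
                sched->ops->free_job(job);
        }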

In addition to that, it would be nice to note in the subject line that this is a rather important bug fix.

With that fixed the whole series is Reviewed-by: Christian König <christian.koe...@amd.com>.

Regards,
Christian.

+            amd_sched_fence_finished(s_fence);
+            dma_fence_put(&s_fence->finished);
              sched->ops->free_job(job);
+        }
      }
      kfifo_free(&entity->job_queue);
  }

  static void amd_sched_entity_wakeup(struct dma_fence *f, struct dma_fence_cb *cb)
  {
      struct amd_sched_entity *entity =
          container_of(cb, struct amd_sched_entity, cb);
      entity->dependency = NULL;


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
