Re: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset v2

2017-04-28 Thread zhoucm1
Agree, but libdrm doesn't allow concurrent submissions from the same 
context; they are serialized by the 
'pthread_mutex_lock(&context->sequence_mutex);' protection 
in amdgpu_cs_submit_one.
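
A minimal sketch of the serialization described above, for illustration only. It is not libdrm code: demo_context, demo_submit_one and demo_run_two_threads are hypothetical names, and only the sequence_mutex name is taken from this message. It assumes the whole submission runs under one per-context mutex, so two threads cannot interleave inside the same context.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a userspace context; not the libdrm struct. */
struct demo_context {
	pthread_mutex_t sequence_mutex;
	uint64_t last_seq;	/* last sequence number handed out */
};

struct demo_worker_args {
	struct demo_context *ctx;
	int count;
};

/* Hypothetical submit helper: the whole submission runs under the lock. */
static uint64_t demo_submit_one(struct demo_context *ctx)
{
	uint64_t seq;

	pthread_mutex_lock(&ctx->sequence_mutex);
	seq = ++ctx->last_seq;
	pthread_mutex_unlock(&ctx->sequence_mutex);
	return seq;
}

static void *demo_worker(void *arg)
{
	struct demo_worker_args *args = arg;
	uint64_t prev = 0;

	for (int i = 0; i < args->count; i++) {
		uint64_t seq = demo_submit_one(args->ctx);

		/* Each thread must observe strictly increasing numbers. */
		assert(seq > prev);
		prev = seq;
	}
	return NULL;
}

/* Submit from two threads on one context; return the final sequence. */
uint64_t demo_run_two_threads(int per_thread)
{
	struct demo_context ctx = { .last_seq = 0 };
	struct demo_worker_args args = { .ctx = &ctx, .count = per_thread };
	pthread_t a, b;

	pthread_mutex_init(&ctx.sequence_mutex, NULL);
	if (pthread_create(&a, NULL, demo_worker, &args) != 0)
		return 0;
	if (pthread_create(&b, NULL, demo_worker, &args) != 0) {
		pthread_join(a, NULL);
		return 0;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	pthread_mutex_destroy(&ctx.sequence_mutex);
	return ctx.last_seq;
}
```

Because no increment can be lost under the mutex, demo_run_two_threads(1000) ends at sequence number 2000. This only covers callers that go through such a userspace lock; it says nothing about other clients of the kernel interface.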


Regards,
David Zhou
On 2017-04-28 16:15, Christian König wrote:
Indeed, but after a bit of thinking I've found another problem with 
that patch.


When two threads are pushing jobs into the same scheduler context we 
don't guarantee correct execution order any more!


Before that patch it was handled by the exclusiveness we had because 
of reserving the VM page tables, but now nothing prevents us from 
calling amd_sched_entity_push_job() in nondeterministic order.


In other words we need an additional lock in amdgpu_ctx_ring or 
something like that.


Regards,
Christian.

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



Re: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset v2

2017-04-28 Thread Christian König
Indeed, but after a bit of thinking I've found another problem with that 
patch.


When two threads are pushing jobs into the same scheduler context we 
don't guarantee correct execution order any more!


Before that patch it was handled by the exclusiveness we had because of 
reserving the VM page tables, but now nothing prevents us from calling 
amd_sched_entity_push_job() in nondeterministic order.


In other words we need an additional lock in amdgpu_ctx_ring or 
something like that.
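
A rough userspace sketch of that idea, not driver code. The names demo_ctx_ring, demo_submit and demo_order_is_preserved are hypothetical, and the array is only a stand-in for the scheduler entity queue. The assumption being illustrated is that taking the sequence number and pushing the job must happen inside the same critical section, so that queue order always matches sequence order.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_QUEUE_CAP 4096
#define DEMO_PER_THREAD 1000

/* Hypothetical per-context, per-ring state. */
struct demo_ctx_ring {
	pthread_mutex_t lock;		/* the additional lock */
	uint64_t next_seq;
	uint64_t queue[DEMO_QUEUE_CAP];	/* stand-in for the entity queue */
	size_t queued;
};

/* Take the sequence number and push the job under one lock. */
static int demo_submit(struct demo_ctx_ring *cring)
{
	int ret = 0;

	pthread_mutex_lock(&cring->lock);
	if (cring->queued < DEMO_QUEUE_CAP) {
		uint64_t seq = ++cring->next_seq;	/* like adding the fence */

		cring->queue[cring->queued++] = seq;	/* like pushing the job */
	} else {
		ret = -1;
	}
	pthread_mutex_unlock(&cring->lock);
	return ret;
}

static void *demo_submitter(void *arg)
{
	struct demo_ctx_ring *cring = arg;

	for (int i = 0; i < DEMO_PER_THREAD; i++) {
		int ret = demo_submit(cring);

		assert(ret == 0);
		(void)ret;
	}
	return NULL;
}

/* Returns 1 if the queue order matches the sequence order, else 0. */
int demo_order_is_preserved(void)
{
	static struct demo_ctx_ring cring;
	pthread_t a, b;

	cring.next_seq = 0;
	cring.queued = 0;
	pthread_mutex_init(&cring.lock, NULL);
	if (pthread_create(&a, NULL, demo_submitter, &cring) != 0)
		return 0;
	if (pthread_create(&b, NULL, demo_submitter, &cring) != 0) {
		pthread_join(a, NULL);
		return 0;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	pthread_mutex_destroy(&cring.lock);

	if (cring.queued != 2 * DEMO_PER_THREAD)
		return 0;
	for (size_t i = 0; i < cring.queued; i++)
		if (cring.queue[i] != i + 1)
			return 0;
	return 1;
}
```

If the sequence number were taken under the lock but the push happened after dropping it, two threads could push in the opposite order from their sequence numbers, which is the problem described above.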


Regards,
Christian.



RE: [PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset v2

2017-04-27 Thread Zhang, Jerry
Nice catch!
Reviewed-by: Junwei Zhang <jerry.zh...@amd.com>

Regards,
Jerry (Junwei Zhang)

Linux Base Graphics
SRDC Software Development


[PATCH] drm/amdgpu: fix deadlock of reservation between cs and gpu reset v2

2017-04-27 Thread Chunming Zhou
The following deadlock can happen during a gpu reset:
1. During the gpu reset, cs can continue until the sw queue is full; the 
job push then waits while still holding the pd reservation.
2. The gpu_reset routine also needs the pd reservation to restore the page 
tables from their shadows.
3. cs is waiting for gpu_reset to complete, but gpu_reset is waiting for cs 
to release the reservation.
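
The three steps above can be pictured with a small model. This is a sketch under stated assumptions, not driver code: demo_state and the demo_* functions are hypothetical, and a boolean stands in for the pd reservation. It only records whether the reservation is still held at the point where the job push may block.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names exist in the driver. */
struct demo_state {
	bool pd_reserved;		/* submitter holds the pd reservation */
	bool reserved_during_push;	/* sampled where the push may block */
};

static void demo_push_job(struct demo_state *s)
{
	/*
	 * In the scenario above, this is where the submitter may sleep
	 * until the sw queue drains. Record whether the reservation is
	 * still held at that moment.
	 */
	s->reserved_during_push = s->pd_reserved;
}

/* Old ordering: push first, release the reservation afterwards. */
bool demo_old_order_blocks_reset(void)
{
	struct demo_state s = { .pd_reserved = true };

	demo_push_job(&s);
	s.pd_reserved = false;
	return s.reserved_during_push;
}

/* New ordering: release the reservation, then push. */
bool demo_new_order_blocks_reset(void)
{
	struct demo_state s = { .pd_reserved = true };

	s.pd_reserved = false;
	demo_push_job(&s);
	return s.reserved_during_push;
}
```

With this model, assert(demo_old_order_blocks_reset()) and assert(!demo_new_order_blocks_reset()) both hold: only the old ordering can sleep in the push while a reset is waiting for the same reservation.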

v2: handle amdgpu_cs_submit error path.

Change-Id: I0f66d04b2bef3433035109623c8a5c5992c84202
Signed-off-by: Chunming Zhou 
Reviewed-by: Christian König 
Reviewed-by: Junwei Zhang 
Reviewed-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 26168df..699f5fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1074,6 +1074,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	cs->out.handle = amdgpu_ctx_add_fence(p->ctx, ring, p->fence);
 	job->uf_sequence = cs->out.handle;
 	amdgpu_job_free_resources(job);
+	amdgpu_cs_parser_fini(p, 0, true);
 
 	trace_amdgpu_cs_ioctl(job);
 	amd_sched_entity_push_job(&job->base);
@@ -1129,7 +1130,10 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 		goto out;
 
 	r = amdgpu_cs_submit(&parser, cs);
+	if (r)
+		goto out;
 
+	return 0;
 out:
 	amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
 	return r;
-- 
1.9.1
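
The v2 change to amdgpu_cs_ioctl() in the diff above follows a common cleanup-ownership pattern, sketched here in plain C. This is a sketch, not the driver code: demo_parser, demo_submit and demo_ioctl_fini_count are hypothetical, and it assumes that the submit function fails only before it has done its own cleanup. The point is that cleanup runs exactly once on both the success path and the error path.

```c
#include <assert.h>

/* Hypothetical parser that just counts how often it was cleaned up. */
struct demo_parser {
	int fini_calls;
};

static void demo_parser_fini(struct demo_parser *p)
{
	p->fini_calls++;
}

/*
 * Hypothetical submit step. On success it performs the cleanup itself
 * (before the job would be pushed). On failure it returns early, so
 * the caller still owns the cleanup.
 */
static int demo_submit(struct demo_parser *p, int fail)
{
	if (fail)
		return -1;

	demo_parser_fini(p);
	return 0;
}

/*
 * Mirrors the control flow of the patched ioctl, but returns the
 * number of cleanup calls instead of the error code so the behaviour
 * can be checked.
 */
int demo_ioctl_fini_count(int fail)
{
	struct demo_parser parser = { .fini_calls = 0 };
	int r;

	r = demo_submit(&parser, fail);
	if (r)
		goto out;

	return parser.fini_calls;	/* success: submit already cleaned up */
out:
	demo_parser_fini(&parser);	/* error: cleanup is still owed here */
	return parser.fini_calls;
}
```

Both demo_ioctl_fini_count(0) and demo_ioctl_fini_count(1) return 1. Without the early return on success, the success path would fall through to the out label and clean up a second time.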
