Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-24 Thread Christian König

No, I'm busy with P2P and recoverable page faults.

Christian.

On 24.04.19 at 11:03, zhoucm1 wrote:


> OK, let's drop mine. Then could you draft a solution for that?
>
> -David


Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-24 Thread zhoucm1

OK, let's drop mine. Then could you draft a solution for that?


-David



Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-24 Thread Koenig, Christian
On 24.04.19 at 10:11, zhoucm1 wrote:
> On 24.04.19 at 16:07, Christian König wrote:
>> This is used in a work item, so you don't need to check for signals.
>
> Will remove.
>
>> And checking if the LRU is populated is mandatory here
>
> How to check it outside of TTM? The code is in dm.

Well just use a static cast on the first entry of the LRU.

We can't upstream that solution anyway, so just a hack should do for now.
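For illustration, the hack could look something like this (an editor's
sketch, not code from the thread; glob, man and the lru field are
placeholder names, and real TTM keeps one LRU list per priority):

	struct ttm_buffer_object *bo;

	/* Hack: peek at the first LRU entry to see whether there is
	 * anything left to evict before retrying. */
	spin_lock(&glob->lru_lock);
	bo = list_first_entry_or_null(&man->lru, struct ttm_buffer_object, lru);
	spin_unlock(&glob->lru_lock);
	if (!bo)
		return -ENOMEM;	/* LRU empty: give up instead of looping */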


>> or otherwise you can run into an endless loop.
>
> I already added a timeout for that.

Thinking more about it, we most likely actually need to grab the lock
on the first BO entry from the LRU.



Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-24 Thread zhoucm1



On 24.04.19 at 16:07, Christian König wrote:
> This is used in a work item, so you don't need to check for signals.

Will remove.

> And checking if the LRU is populated is mandatory here

How to check it outside of TTM? The code is in dm.

> or otherwise you can run into an endless loop.

I already added a timeout for that.

-David



Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-24 Thread Christian König

This is used in a work item, so you don't need to check for signals.

And checking if the LRU is populated is mandatory here, otherwise you
can run into an endless loop.


Christian.


Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-24 Thread Koenig, Christian
On 24.04.19 at 10:01, Daniel Vetter wrote:
> On Tue, Apr 23, 2019 at 4:42 PM Christian König wrote:
>> Well that's not so easy offhand.
>>
>> The basic problem here is that when you busy wait at this place you can 
>> easily run into situations where application A busy waits for B while B busy 
>> waits for A -> deadlock.
>>
>> So what we need here is the deadlock detection logic of the ww_mutex. To use 
>> this we at least need to do the following steps:
>>
>> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>>
>> 2. If we then run into this EBUSY condition in TTM check if the BO we need 
>> memory for (or rather the ww_mutex of its reservation object) has a ticket 
>> assigned.
>>
>> 3. If we have a ticket we grab a reference to the first BO on the LRU, drop 
>> the LRU lock and try to grab the reservation lock with the ticket.
>>
>> 4. If getting the reservation lock with the ticket succeeded we check if the 
>> BO is still the first one on the LRU in question (the BO could have moved).
> I don't think you actually need this check. Once you're in this slow
> reclaim mode all hope for performance is pretty much lost (you're
> thrashing VRAM terribly), forward progress matters. Also, less code :-)

This step is not for performance, but for correctness.

When you drop the LRU lock and grab the ww_mutex in the slow path you 
need to double check that this BO wasn't evicted by somebody else in the 
meantime.
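A minimal sketch of that double check (an editor's illustration with
placeholder bo/lru names, not actual TTM code):

	/* slow path: LRU lock already dropped, we hold a reference on "first" */
	ret = ww_mutex_lock(&first->resv_lock, ticket);	/* deadlock-detecting */
	if (ret)	/* -EDEADLK: back off, caller returns -EBUSY */
		return ret;

	spin_lock(&lru->lock);
	if (first != list_first_entry_or_null(&lru->list, struct bo, lru_link)) {
		/* somebody evicted or moved it in the meantime */
		spin_unlock(&lru->lock);
		ww_mutex_unlock(&first->resv_lock);
		return -EBUSY;
	}
	spin_unlock(&lru->lock);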

>> 5. If the BO is still the first one on the LRU in question we try to evict 
>> it as we would evict any other BO.
>>
>> 6. If any of the "If's" above fail we just back off and return -EBUSY.
> Another idea I pondered (but never implemented) is a slow reclaim lru
> lock. Essentially there'd be two ways to walk the lru and evict bo:
>
> - fast path: spinlock + trylock, like today
>
> - slow path: ww_mutex lru lock, plus every bo is reserved (nested
> within that ww_mutex lru lock) with a full ww_mutex_lock. Guaranteed
> forward progress.

Offhand I don't see any advantage to this. So what is the benefit?

Regards,
Christian.


Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-24 Thread Daniel Vetter
On Tue, Apr 23, 2019 at 4:42 PM Christian König wrote:
>
> Well that's not so easy offhand.
>
> The basic problem here is that when you busy wait at this place you can 
> easily run into situations where application A busy waits for B while B busy 
> waits for A -> deadlock.
>
> So what we need here is the deadlock detection logic of the ww_mutex. To use 
> this we at least need to do the following steps:
>
> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
>
> 2. If we then run into this EBUSY condition in TTM check if the BO we need 
> memory for (or rather the ww_mutex of its reservation object) has a ticket 
> assigned.
>
> 3. If we have a ticket we grab a reference to the first BO on the LRU, drop 
> the LRU lock and try to grab the reservation lock with the ticket.
>
> 4. If getting the reservation lock with the ticket succeeded we check if the 
> BO is still the first one on the LRU in question (the BO could have moved).

I don't think you actually need this check. Once you're in this slow
reclaim mode all hope for performance is pretty much lost (you're
thrashing VRAM terribly), forward progress matters. Also, less code :-)

> 5. If the BO is still the first one on the LRU in question we try to evict it 
> as we would evict any other BO.
>
> 6. If any of the "If's" above fail we just back off and return -EBUSY.

Another idea I pondered (but never implemented) is a slow reclaim lru
lock. Essentially there'd be two ways to walk the lru and evict bo:

- fast path: spinlock + trylock, like today

- slow path: ww_mutex lru lock, plus every bo is reserved (nested
within that ww_mutex lru lock) with a full ww_mutex_lock. Guaranteed
forward progress.

Transition would be that as soon as someone hits an EBUSY, they set
the slow reclaim flag (while holding the quick reclaim spinlock
quickly, which will drain anyone still stuck in fast reclaim path).
Every time fast reclaim acquires the spinlock it needs to check for the
slow reclaim flag, and if that's set, fall back to slow reclaim.

Transitioning out of slow reclaim would only happen once the thread
(with its ww ticket) that hit the EBUSY has completed whatever it was
trying to do (either successfully, or failed because even evicting
everyone else didn't give you enough vram). Tricky part here is making
sure threads still in slow reclaim don't blow up if we switch back.
Since only ever one thread can be actually doing slow reclaim
(everyone is serialized on the single ww mutex lru lock) should be
doable by checking for the slow reclaim condition once you have the
lru ww_mutex and if the slow reclaim condition is lifted, switch back
to fast reclaim.

The slow reclaim condition might also need to be a full reference
count, to handle multiple threads hitting EBUSY/slow reclaim without
the book-keeping getting all confused.

Upshot of this is that it's guaranteeing forward progress, but the perf
cliff might be too steep if this happens too often. You might need to
round it off with 1-2 retries when you hit EBUSY, before forcing slow
reclaim.
-Daniel
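In code, the mode switch could look roughly like this (an editor's sketch;
every name is a placeholder, the refcount stands in for the "slow reclaim
condition" above, and the drain logic for threads still in the fast path
is elided):

	static atomic_t slow_reclaim = ATOMIC_INIT(0);
	static struct ww_mutex lru_ww_lock;	/* init once with ww_mutex_init() */

	static int evict_first(struct ww_acquire_ctx *ticket)
	{
		int ret;

		if (atomic_read(&slow_reclaim) == 0) {
			ret = evict_fast();	/* spinlock + trylock, like today */
			if (ret != -EBUSY)
				return ret;
		}

		/* fast path hit EBUSY (or slow mode already active): go slow */
		atomic_inc(&slow_reclaim);
		ret = ww_mutex_lock(&lru_ww_lock, ticket);	/* one walker at a time */
		if (!ret) {
			ret = evict_slow(ticket);	/* full ww_mutex_lock per BO */
			ww_mutex_unlock(&lru_ww_lock);
		}
		atomic_dec(&slow_reclaim);	/* last one out re-enables fast reclaim */

		return ret;
	}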


Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-24 Thread zhoucm1

How about the new attached patch?


-David


Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-24 Thread Christian König
That would change the semantics of ttm_bo_mem_space() and so could
change the return code in an IOCTL as well. Not a good idea, because that
could have a lot of side effects.


Instead in the calling DC code you could check if you get an -ENOMEM and 
then call schedule().


If after the schedule() we see that we now have BOs on the LRU, we can
try again and see if pinning the frame buffer succeeds.


Christian.
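For illustration, the retry in the calling DC code could look roughly like
this (an editor's sketch; amdgpu_bo_pin() and schedule() are real, while
rbo is the frame buffer BO and lru_has_bos() stands in for the LRU check
discussed elsewhere in the thread):

	int r;

	do {
		r = amdgpu_bo_pin(rbo, AMDGPU_GEM_DOMAIN_VRAM);
		if (r != -ENOMEM)
			break;		/* pinned, or a hard failure */
		schedule();		/* give other tasks a chance to free BOs */
	} while (lru_has_bos());	/* retry only if BOs showed up on the LRU */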

Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-24 Thread zhoucm1

I made a patch as attached.

I'm not sure how to turn your proposal into a patch. Could you make a patch
for it if mine isn't enough?


-David


Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-24 Thread Christian König

> how about just adding a wrapper for the pin function as below?

I considered this as well and don't think it will work reliably.

We could use it as a band-aid for this specific problem, but in general
we need to improve the handling in TTM to resolve those kinds of
resource conflicts.


Regards,
Christian.


Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-23 Thread Zhou, David(ChunMing)
> 3. If we have a ticket we grab a reference to the first BO on the LRU, drop
> the LRU lock and try to grab the reservation lock with the ticket.

The BO on the LRU is already locked by the CS user; can it be dropped here by
the DC user? And after the DC user grabs its lock with the ticket, how does
CS grab it again?

If you think waiting in TTM has this risk, how about just adding a wrapper
for the pin function as below?

amdgpu_get_pin_bo_timeout()
{
	do {
		amdgpu_bo_reserve();
		r = amdgpu_bo_pin();
		if (!r)
			break;
		amdgpu_bo_unreserve();
		timeout--;
	} while (timeout > 0);
}
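Fleshed out with the real amdgpu helpers, the wrapper could look like this
(an editor's sketch; amdgpu_get_pin_bo_timeout() itself is David's
hypothetical name, and the back-off and return conventions are guesses):

	static int amdgpu_get_pin_bo_timeout(struct amdgpu_bo *bo, u32 domain,
					     unsigned int tries)
	{
		int r;

		do {
			r = amdgpu_bo_reserve(bo, false);
			if (r)
				return r;

			r = amdgpu_bo_pin(bo, domain);
			amdgpu_bo_unreserve(bo);
			if (!r)
				return 0;	/* pinned; the pin survives unreserve */

			usleep_range(1000, 2000);	/* brief back-off between tries */
		} while (--tries);

		return -EBUSY;
	}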


Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-23 Thread Christian König

Well that's not so easy off hand.

The basic problem here is that when you busy-wait at this place you can 
easily run into situations where application A busy-waits for B while B 
busy-waits for A -> deadlock.


So what we need here is the deadlock detection logic of the ww_mutex. To 
use this we at least need to do the following steps:


1. Reserve the BO in DC using a ww_mutex ticket (trivial).

2. If we then run into this EBUSY condition in TTM check if the BO we 
need memory for (or rather the ww_mutex of its reservation object) has a 
ticket assigned.


3. If we have a ticket we grab a reference to the first BO on the LRU, 
drop the LRU lock and try to grab the reservation lock with the ticket.


4. If getting the reservation lock with the ticket succeeded we check if 
the BO is still the first one on the LRU in question (the BO could have 
moved).


5. If the BO is still the first one on the LRU in question we try to 
evict it as we would evict any other BO.


6. If any of the "If's" above fail we just back off and return -EBUSY.

Steps 2-5 are certainly not trivial, but doable as far as I can see.

Have fun :)
Christian.

Am 23.04.19 um 15:19 schrieb Zhou, David(ChunMing):
How about adding another condition on ctx->resv inline to address your 
concern? As long as we don't wait on the same user, it shouldn't lead to a 
deadlock.


Otherwise, any other idea?

-------- Original Message --------
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
,dri-devel@lists.freedesktop.org

CC:

Well that is certainly a NAK because it can lead to deadlock in the
memory management.

You can't just busy wait with all those locks held.

Regards,
Christian.

Am 23.04.19 um 03:45 schrieb Liang, Prike:
> Acked-by: Prike Liang 
>
> Thanks,
> Prike
> -----Original Message-----
> From: Chunming Zhou 
> Sent: Monday, April 22, 2019 6:39 PM
> To: dri-devel@lists.freedesktop.org
> Cc: Liang, Prike ; Zhou, David(ChunMing)
> Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>
> A heavy GPU job could occupy memory for a long time, which could lead to
> other users failing to get memory.

>
> Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
> Signed-off-by: Chunming Zhou 
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 7c484729f9b2..6c596cc24bec 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>  		if (mem->mm_node)
>  			break;
>  		ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
> -		if (unlikely(ret != 0))
> -			return ret;
> +		if (unlikely(ret != 0)) {
> +			if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
> +				return ret;
> +		}
>  	} while (1);
>  	mem->mem_type = mem_type;
>  	return ttm_bo_add_move_fence(bo, man, mem);
> --
> 2.17.1
>

Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-23 Thread Zhou, David(ChunMing)
How about adding another condition on ctx->resv inline to address your 
concern? As long as we don't wait on the same user, it shouldn't lead to a 
deadlock.

Otherwise, any other idea?

-------- Original Message --------
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Liang, Prike" ,"Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org
CC:

Well that is certainly a NAK because it can lead to deadlock in the
memory management.

You can't just busy wait with all those locks held.

Regards,
Christian.

Am 23.04.19 um 03:45 schrieb Liang, Prike:
> Acked-by: Prike Liang 
>
> Thanks,
> Prike
> -----Original Message-----
> From: Chunming Zhou 
> Sent: Monday, April 22, 2019 6:39 PM
> To: dri-devel@lists.freedesktop.org
> Cc: Liang, Prike ; Zhou, David(ChunMing)
> Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>
> A heavy GPU job could occupy memory for a long time, which could lead to
> other users failing to get memory.
>
> Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
> Signed-off-by: Chunming Zhou 
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 7c484729f9b2..6c596cc24bec 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>  		if (mem->mm_node)
>  			break;
>  		ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
> -		if (unlikely(ret != 0))
> -			return ret;
> +		if (unlikely(ret != 0)) {
> +			if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
> +				return ret;
> +		}
>  	} while (1);
>  	mem->mem_type = mem_type;
>  	return ttm_bo_add_move_fence(bo, man, mem);
> --
> 2.17.1
>

Re: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-23 Thread Christian König
Well that is certainly a NAK because it can lead to deadlock in the 
memory management.


You can't just busy wait with all those locks held.

Regards,
Christian.

Am 23.04.19 um 03:45 schrieb Liang, Prike:

Acked-by: Prike Liang 

Thanks,
Prike
-----Original Message-----
From: Chunming Zhou 
Sent: Monday, April 22, 2019 6:39 PM
To: dri-devel@lists.freedesktop.org
Cc: Liang, Prike ; Zhou, David(ChunMing)
Subject: [PATCH] ttm: wait mem space if user allow while gpu busy

A heavy GPU job could occupy memory for a long time, which could lead to other 
users failing to get memory.

Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
Signed-off-by: Chunming Zhou 
---
  drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 7c484729f9b2..6c596cc24bec 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
 		if (mem->mm_node)
 			break;
 		ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
-		if (unlikely(ret != 0))
-			return ret;
+		if (unlikely(ret != 0)) {
+			if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
+				return ret;
+		}
 	} while (1);
 	mem->mem_type = mem_type;
 	return ttm_bo_add_move_fence(bo, man, mem);
--
2.17.1


RE: [PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-22 Thread Liang, Prike
Acked-by: Prike Liang 

Thanks,
Prike
-----Original Message-----
From: Chunming Zhou  
Sent: Monday, April 22, 2019 6:39 PM
To: dri-devel@lists.freedesktop.org
Cc: Liang, Prike ; Zhou, David(ChunMing)
Subject: [PATCH] ttm: wait mem space if user allow while gpu busy

A heavy GPU job could occupy memory for a long time, which could lead to other 
users failing to get memory.

Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 7c484729f9b2..6c596cc24bec 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
 		if (mem->mm_node)
 			break;
 		ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
-		if (unlikely(ret != 0))
-			return ret;
+		if (unlikely(ret != 0)) {
+			if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
+				return ret;
+		}
 	} while (1);
 	mem->mem_type = mem_type;
 	return ttm_bo_add_move_fence(bo, man, mem);
--
2.17.1


[PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-22 Thread Chunming Zhou
A heavy GPU job could occupy memory for a long time, which could lead to other
users failing to get memory.

Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 7c484729f9b2..6c596cc24bec 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
 		if (mem->mm_node)
 			break;
 		ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
-		if (unlikely(ret != 0))
-			return ret;
+		if (unlikely(ret != 0)) {
+			if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
+				return ret;
+		}
 	} while (1);
 	mem->mem_type = mem_type;
 	return ttm_bo_add_move_fence(bo, man, mem);
-- 
2.17.1
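For context, a sketch of how a caller would opt in to the retry behaviour this 
patch adds (bo and placement are assumed to be set up elsewhere; with 
no_wait_gpu false, ttm_bo_mem_force_space() now keeps retrying eviction on 
-EBUSY instead of returning the error):

	struct ttm_operation_ctx ctx = {
		.interruptible = true,
		.no_wait_gpu   = false,	/* allow the retry loop added above */
	};
	int ret;

	/* validation may now loop inside TTM until eviction succeeds
	 * or a real error occurs */
	ret = ttm_bo_validate(bo, &placement, &ctx);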
