Re: [Intel-gfx] [PATCH] drm/i915: Reduce i915_request_alloc retirement to local context

2019-01-09 Thread Tvrtko Ursulin


On 09/01/2019 12:06, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-01-09 11:56:15)
> > 
> > On 07/01/2019 15:29, Chris Wilson wrote:
> > > In the continual quest to reduce the amount of global work required when
> > > submitting requests, replace i915_retire_requests() after allocation
> > > failure with retiring just our ring.
> > > 
> > > References: 11abf0c5a021 ("drm/i915: Limit the backpressure for i915_request allocation")
> > > Signed-off-by: Chris Wilson 
> > > Cc: Tvrtko Ursulin 
> > > ---
> > >   drivers/gpu/drm/i915/i915_request.c | 33 ++++++++++++++++++++++++---------
> > >   1 file changed, 24 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > > index 1e158eb8cb97..9ba218c6029b 100644
> > > --- a/drivers/gpu/drm/i915/i915_request.c
> > > +++ b/drivers/gpu/drm/i915/i915_request.c
> > > @@ -477,6 +477,29 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> > >   	return NOTIFY_DONE;
> > >   }
> > >   
> > > +static noinline struct i915_request *
> > > +i915_request_alloc_slow(struct intel_context *ce)
> > > +{
> > > +	struct intel_ring *ring = ce->ring;
> > > +	struct i915_request *rq, *next;
> > > +
> > > +	list_for_each_entry_safe(rq, next, &ring->request_list, ring_link) {
> > > +		/* Ratelimit ourselves to prevent oom from malicious clients */
> > > +		if (&next->ring_link == &ring->request_list) {
> > 
> > list_is_last(next, &ring->request_list) ?
> 
> Tried it (needs list_is_last(&next->ring_link, ...)), but I slightly
> preferred not implying that next was a valid request here, and keeping
> the form matching the list termination.
> 
> > > +			cond_synchronize_rcu(rq->rcustate);
> > > +			break; /* keep the last objects for the next request */
> > > +		}
> > > +
> > > +		if (!i915_request_completed(rq))
> > > +			break;
> > > +
> > > +		/* Retire our old requests in the hope that we free some */
> > > +		i915_request_retire(rq);
> > 
> > The RCU wait against the last submitted rq is also gone. Now it only
> > syncs against the next to last rq, unless there are more than two live
> > requests. Is this what you intended?
> 
> Nah, I was trying to be too smart, forgetting that we didn't walk the
> entire list. The RCU wait is against the last rq (since next is the
> list head at that point, so unchanged wrt using list_last_entry), but
> we break on seeing a busy request, so there is no ratelimiting if you
> keep the GPU busy (not quite as intended!).
> 
> > If the ring timeline is a list of r-r-r-R-R-R (r=completed,
> > R=pending) then it looks like it will not sync on anything.
> > 
> > And if the list is r-r-r-r it will sync against a completed rq. Which I
> > hope is a no-op, but still, the loop logic looks potentially dodgy.
> > 
> > It also has a higher level vulnerability to one hog timeline starving
> > the rest I think.
> 
> Also? Other than forgetting the earlier break preventing the throttling,
> what else do you see wrong with throttling along a timeline/ring?

I was on the wrong track when thinking about the removal of global
retire. I thought the hog on one timeline would be able to starve the
other timeline, but the hog will eventually hit the allocation failure
and sync against itself. I think it's fine.


Regards,

Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Reduce i915_request_alloc retirement to local context

2019-01-09 Thread Chris Wilson
Quoting Tvrtko Ursulin (2019-01-09 11:56:15)
> 
> On 07/01/2019 15:29, Chris Wilson wrote:
> > In the continual quest to reduce the amount of global work required when
> > submitting requests, replace i915_retire_requests() after allocation
> > failure with retiring just our ring.
> > 
> > References: 11abf0c5a021 ("drm/i915: Limit the backpressure for i915_request allocation")
> > Signed-off-by: Chris Wilson 
> > Cc: Tvrtko Ursulin 
> > ---
> >   drivers/gpu/drm/i915/i915_request.c | 33 ++++++++++++++++++++++++---------
> >   1 file changed, 24 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 1e158eb8cb97..9ba218c6029b 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -477,6 +477,29 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> >   return NOTIFY_DONE;
> >   }
> >   
> > +static noinline struct i915_request *
> > +i915_request_alloc_slow(struct intel_context *ce)
> > +{
> > + struct intel_ring *ring = ce->ring;
> > + struct i915_request *rq, *next;
> > +
> > + list_for_each_entry_safe(rq, next, &ring->request_list, ring_link) {
> > +		/* Ratelimit ourselves to prevent oom from malicious clients */
> > +		if (&next->ring_link == &ring->request_list) {
> 
> list_is_last(next, &ring->request_list) ?

Tried it (needs list_is_last(&next->ring_link, ...)), but I slightly
preferred not implying that next was a valid request here, and keeping
the form matching the list termination.
 
> > + cond_synchronize_rcu(rq->rcustate);
> > + break; /* keep the last objects for the next request 
> > */
> > + }
> > +
> > + if (!i915_request_completed(rq))
> > + break;
> > +
> > + /* Retire our old requests in the hope that we free some */
> > + i915_request_retire(rq);
> The RCU wait against the last submitted rq is also gone. Now it only 
> syncs against the next to last rq, unless there are more than two live 
> requests. Is this what you intended?

Nah, I was trying to be too smart, forgetting that we didn't walk the
entire list. The RCU wait is against the last rq (since next is the
list head at that point, so unchanged wrt using list_last_entry), but
we break on seeing a busy request, so there is no ratelimiting if you
keep the GPU busy (not quite as intended!).
 
> If the ring timeline is a list of r-r-r-R-R-R (r=completed, 
> R=pending) then it looks like it will not sync on anything.
> 
> And if the list is r-r-r-r it will sync against a completed rq. Which I 
> hope is a no-op, but still, the loop logic looks potentially dodgy.
> 
> It also has a higher level vulnerability to one hog timeline starving 
> the rest I think.

Also? Other than forgetting the earlier break preventing the throttling,
what else do you see wrong with throttling along a timeline/ring?
-Chris


Re: [Intel-gfx] [PATCH] drm/i915: Reduce i915_request_alloc retirement to local context

2019-01-09 Thread Tvrtko Ursulin


On 07/01/2019 15:29, Chris Wilson wrote:
> In the continual quest to reduce the amount of global work required when
> submitting requests, replace i915_retire_requests() after allocation
> failure with retiring just our ring.
> 
> References: 11abf0c5a021 ("drm/i915: Limit the backpressure for i915_request allocation")
> Signed-off-by: Chris Wilson 
> Cc: Tvrtko Ursulin 
> ---
>   drivers/gpu/drm/i915/i915_request.c | 33 ++++++++++++++++++++++++---------
>   1 file changed, 24 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 1e158eb8cb97..9ba218c6029b 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -477,6 +477,29 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>   	return NOTIFY_DONE;
>   }
>   
> +static noinline struct i915_request *
> +i915_request_alloc_slow(struct intel_context *ce)
> +{
> +	struct intel_ring *ring = ce->ring;
> +	struct i915_request *rq, *next;
> +
> +	list_for_each_entry_safe(rq, next, &ring->request_list, ring_link) {
> +		/* Ratelimit ourselves to prevent oom from malicious clients */
> +		if (&next->ring_link == &ring->request_list) {

list_is_last(next, &ring->request_list) ?

> +			cond_synchronize_rcu(rq->rcustate);
> +			break; /* keep the last objects for the next request */
> +		}
> +
> +		if (!i915_request_completed(rq))
> +			break;
> +
> +		/* Retire our old requests in the hope that we free some */
> +		i915_request_retire(rq);

The RCU wait against the last submitted rq is also gone. Now it only 
syncs against the next to last rq, unless there are more than two live 
requests. Is this what you intended?

If the ring timeline is a list of r-r-r-R-R-R (r=completed, 
R=pending) then it looks like it will not sync on anything.

And if the list is r-r-r-r it will sync against a completed rq. Which I 
hope is a no-op, but still, the loop logic looks potentially dodgy.

It also has a higher level vulnerability to one hog timeline starving 
the rest I think.


Regards,

Tvrtko


> +	}
> +
> +	return kmem_cache_alloc(ce->gem_context->i915->requests, GFP_KERNEL);
> +}
> +
>   /**
>    * i915_request_alloc - allocate a request structure
>    *
> @@ -559,15 +582,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	rq = kmem_cache_alloc(i915->requests,
>   			      GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
>   	if (unlikely(!rq)) {
> -		i915_retire_requests(i915);
> -
> -		/* Ratelimit ourselves to prevent oom from malicious clients */
> -		rq = i915_gem_active_raw(&ce->ring->timeline->last_request,
> -					 &i915->drm.struct_mutex);
> -		if (rq)
> -			cond_synchronize_rcu(rq->rcustate);
> -
> -		rq = kmem_cache_alloc(i915->requests, GFP_KERNEL);
> +		rq = i915_request_alloc_slow(ce);
>   		if (!rq) {
>   			ret = -ENOMEM;
>   			goto err_unreserve;



[Intel-gfx] [PATCH] drm/i915: Reduce i915_request_alloc retirement to local context

2019-01-07 Thread Chris Wilson
In the continual quest to reduce the amount of global work required when
submitting requests, replace i915_retire_requests() after allocation
failure with retiring just our ring.

References: 11abf0c5a021 ("drm/i915: Limit the backpressure for i915_request allocation")
Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_request.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 1e158eb8cb97..9ba218c6029b 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -477,6 +477,29 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
return NOTIFY_DONE;
 }
 
+static noinline struct i915_request *
+i915_request_alloc_slow(struct intel_context *ce)
+{
+   struct intel_ring *ring = ce->ring;
+   struct i915_request *rq, *next;
+
+   list_for_each_entry_safe(rq, next, &ring->request_list, ring_link) {
+   /* Ratelimit ourselves to prevent oom from malicious clients */
+   if (&next->ring_link == &ring->request_list) {
+   cond_synchronize_rcu(rq->rcustate);
+   break; /* keep the last objects for the next request */
+   }
+
+   if (!i915_request_completed(rq))
+   break;
+
+   /* Retire our old requests in the hope that we free some */
+   i915_request_retire(rq);
+   }
+
+   return kmem_cache_alloc(ce->gem_context->i915->requests, GFP_KERNEL);
+}
+
 /**
  * i915_request_alloc - allocate a request structure
  *
@@ -559,15 +582,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
rq = kmem_cache_alloc(i915->requests,
  GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
if (unlikely(!rq)) {
-   i915_retire_requests(i915);
-
-   /* Ratelimit ourselves to prevent oom from malicious clients */
-   rq = i915_gem_active_raw(&ce->ring->timeline->last_request,
-&i915->drm.struct_mutex);
-   if (rq)
-   cond_synchronize_rcu(rq->rcustate);
-
-   rq = kmem_cache_alloc(i915->requests, GFP_KERNEL);
+   rq = i915_request_alloc_slow(ce);
if (!rq) {
ret = -ENOMEM;
goto err_unreserve;
-- 
2.20.1
