Re: [Intel-gfx] [PATCH] drm/i915: Reduce i915_request_alloc retirement to local context
On 09/01/2019 12:06, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-01-09 11:56:15)
> > On 07/01/2019 15:29, Chris Wilson wrote:
> > > In the continual quest to reduce the amount of global work required when
> > > submitting requests, replace i915_retire_requests() after allocation
> > > failure with retiring just our ring.
> > >
> > > References: 11abf0c5a021 ("drm/i915: Limit the backpressure for i915_request allocation")
> > > Signed-off-by: Chris Wilson
> > > Cc: Tvrtko Ursulin
> > > ---
> > >  drivers/gpu/drm/i915/i915_request.c | 33 +
> > >  1 file changed, 24 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > > index 1e158eb8cb97..9ba218c6029b 100644
> > > --- a/drivers/gpu/drm/i915/i915_request.c
> > > +++ b/drivers/gpu/drm/i915/i915_request.c
> > > @@ -477,6 +477,29 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> > >  	return NOTIFY_DONE;
> > >  }
> > >
> > > +static noinline struct i915_request *
> > > +i915_request_alloc_slow(struct intel_context *ce)
> > > +{
> > > +	struct intel_ring *ring = ce->ring;
> > > +	struct i915_request *rq, *next;
> > > +
> > > +	list_for_each_entry_safe(rq, next, &ring->request_list, ring_link) {
> > > +		/* Ratelimit ourselves to prevent oom from malicious clients */
> > > +		if (&next->ring_link == &ring->request_list) {
> >
> > list_is_last(next, &ring->request_list) ?
>
> Tried it (needs list_is_last(&next->ring_link,...)), but I slightly
> preferred not implying that next was a valid request here, and keeping
> the matching form to list termination.
>
> > > +			cond_synchronize_rcu(rq->rcustate);
> > > +			break; /* keep the last objects for the next request */
> > > +		}
> > > +
> > > +		if (!i915_request_completed(rq))
> > > +			break;
> > > +
> > > +		/* Retire our old requests in the hope that we free some */
> > > +		i915_request_retire(rq);
> >
> > The RCU wait against the last submitted rq is also gone. Now it only
> > syncs against the next to last rq, unless there are more than two live
> > requests. Is this what you intended?
>
> Nah, I was trying to be too smart, forgetting that we didn't walk the
> entire list.
>
> The RCU wait is against the last rq (since next is the list head at
> that point, so unchanged wrt using list_last_entry), but we break on
> seeing a busy request, so no ratelimiting if you keep the GPU busy (not
> quite as intended!).
>
> > If the ring timeline is a list of r-r-r-R-R-R (r=completed,
> > R=pending) then it looks like it will not sync on anything.
> >
> > And if the list is r-r-r-r it will sync against a completed rq. Which I
> > hope is a no-op, but still, the loop logic looks potentially dodgy.
> >
> > It also has a higher level vulnerability to one hog timeline starving
> > the rest I think.
>
> Also? Other than forgetting the earlier break preventing the throttling,
> what else do you see wrong with throttling along a timeline/ring?

I was on the wrong track when thinking about the removal of global
retire. I thought the hog on one timeline would be able to starve the
other timeline, but the hog will eventually hit the allocation failure
and sync against itself.

I think it's fine.

Regards,

Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
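The exchange above settles on two points: the ratelimit should not be skipped by the early break, and throttling along a single ring is acceptable because a hog ends up syncing against itself. One way to picture an ordering that satisfies both is the userspace sketch below. It is only a model under stated assumptions, not the patch as posted or any later revision: the ring is an array of booleans (true = completed), `model_throttle_then_retire()` and `struct throttle_result` are hypothetical names, and "syncing" just records the index of the request that would be passed to cond_synchronize_rcu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result record; not a kernel structure. */
struct throttle_result {
	size_t retired;	/* requests retired from the head of the ring */
	int synced_on;	/* index used for the RCU ratelimit, or -1 if none */
};

/*
 * Model only: completed[i] is true for a completed request ('r') and
 * false for a pending one ('R'), oldest first.
 */
static struct throttle_result
model_throttle_then_retire(const bool *completed, size_t count)
{
	struct throttle_result res = { .retired = 0, .synced_on = -1 };
	size_t i;

	assert(completed || count == 0);

	if (count == 0)
		return res; /* nothing outstanding on this ring */

	/* Ratelimit first, always against the last request on the ring. */
	res.synced_on = (int)(count - 1);

	/* Then retire completed requests from the head, keeping the last. */
	for (i = 0; i + 1 < count && completed[i]; i++)
		res.retired++;

	return res;
}
```

In this model a ring of r-r-r-R-R-R still retires the three completed requests, but the ratelimit is applied against the final pending request instead of being skipped, so a client that keeps its own ring busy throttles itself.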
Re: [Intel-gfx] [PATCH] drm/i915: Reduce i915_request_alloc retirement to local context
Quoting Tvrtko Ursulin (2019-01-09 11:56:15)
>
> On 07/01/2019 15:29, Chris Wilson wrote:
> > In the continual quest to reduce the amount of global work required when
> > submitting requests, replace i915_retire_requests() after allocation
> > failure with retiring just our ring.
> >
> > References: 11abf0c5a021 ("drm/i915: Limit the backpressure for
> > i915_request allocation")
> > Signed-off-by: Chris Wilson
> > Cc: Tvrtko Ursulin
> > ---
> >  drivers/gpu/drm/i915/i915_request.c | 33 +
> >  1 file changed, 24 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 1e158eb8cb97..9ba218c6029b 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -477,6 +477,29 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> >  	return NOTIFY_DONE;
> >  }
> >
> > +static noinline struct i915_request *
> > +i915_request_alloc_slow(struct intel_context *ce)
> > +{
> > +	struct intel_ring *ring = ce->ring;
> > +	struct i915_request *rq, *next;
> > +
> > +	list_for_each_entry_safe(rq, next, &ring->request_list, ring_link) {
> > +		/* Ratelimit ourselves to prevent oom from malicious clients */
> > +		if (&next->ring_link == &ring->request_list) {
>
> list_is_last(next, &ring->request_list) ?

Tried it (needs list_is_last(&next->ring_link,...)), but I slightly
preferred not implying that next was a valid request here, and keeping
the matching form to list termination.

> > +			cond_synchronize_rcu(rq->rcustate);
> > +			break; /* keep the last objects for the next request */
> > +		}
> > +
> > +		if (!i915_request_completed(rq))
> > +			break;
> > +
> > +		/* Retire our old requests in the hope that we free some */
> > +		i915_request_retire(rq);
>
> The RCU wait against the last submitted rq is also gone. Now it only
> syncs against the next to last rq, unless there are more than two live
> requests. Is this what you intended?

Nah, I was trying to be too smart, forgetting that we didn't walk the
entire list.

The RCU wait is against the last rq (since next is the list head at
that point, so unchanged wrt using list_last_entry), but we break on
seeing a busy request, so no ratelimiting if you keep the GPU busy (not
quite as intended!).

> If the ring timeline is a list of r-r-r-R-R-R (r=completed,
> R=pending) then it looks like it will not sync on anything.
>
> And if the list is r-r-r-r it will sync against a completed rq. Which I
> hope is a no-op, but still, the loop logic looks potentially dodgy.
>
> It also has a higher level vulnerability to one hog timeline starving
> the rest I think.

Also? Other than forgetting the earlier break preventing the throttling,
what else do you see wrong with throttling along a timeline/ring?
-Chris
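For anyone comparing the open-coded termination test with list_is_last(): the kernel helper returns true when its first argument's ->next is the list head. The sketch below checks, on a three-entry list, at which step of a list_for_each_entry_safe()-style walk each spelling first fires. The list helpers here are minimal userspace stand-ins written for this example, and `find_first_hits()` and `struct first_hit` are hypothetical names; the observation in the usage note is a reading of the helper's definition, not something stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-ins for the kernel's struct list_head helpers. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Mirrors the kernel helper: true when @node is the final entry. */
static bool list_is_last(const struct list_head *node,
			 const struct list_head *head)
{
	return node->next == head;
}

/* Step (0-based) at which each test first fires, or -1 if it never does. */
struct first_hit {
	int head_compare;	/* next == head, the form used in the patch */
	int is_last_on_rq;	/* list_is_last() applied to the current link */
	int is_last_on_next;	/* list_is_last() applied to the next link */
};

static struct first_hit find_first_hits(struct list_head *head)
{
	struct first_hit hit = { -1, -1, -1 };
	struct list_head *pos, *next;
	int index = 0;

	assert(head);

	/* Same shape as list_for_each_entry_safe(): cache next up front. */
	for (pos = head->next; pos != head; pos = next, index++) {
		next = pos->next;

		if (hit.head_compare < 0 && next == head)
			hit.head_compare = index;
		if (hit.is_last_on_rq < 0 && list_is_last(pos, head))
			hit.is_last_on_rq = index;
		if (hit.is_last_on_next < 0 && list_is_last(next, head))
			hit.is_last_on_next = index;
	}

	return hit;
}
```

On a three-entry list the head comparison and list_is_last() on the current entry both fire at the final entry, while list_is_last() on next fires one entry earlier. In this model, then, the like-for-like replacement for the open-coded test would take the current request's link rather than next's.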
Re: [Intel-gfx] [PATCH] drm/i915: Reduce i915_request_alloc retirement to local context
On 07/01/2019 15:29, Chris Wilson wrote:
> In the continual quest to reduce the amount of global work required when
> submitting requests, replace i915_retire_requests() after allocation
> failure with retiring just our ring.
>
> References: 11abf0c5a021 ("drm/i915: Limit the backpressure for i915_request allocation")
> Signed-off-by: Chris Wilson
> Cc: Tvrtko Ursulin
> ---
>  drivers/gpu/drm/i915/i915_request.c | 33 +
>  1 file changed, 24 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 1e158eb8cb97..9ba218c6029b 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -477,6 +477,29 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>  	return NOTIFY_DONE;
>  }
>
> +static noinline struct i915_request *
> +i915_request_alloc_slow(struct intel_context *ce)
> +{
> +	struct intel_ring *ring = ce->ring;
> +	struct i915_request *rq, *next;
> +
> +	list_for_each_entry_safe(rq, next, &ring->request_list, ring_link) {
> +		/* Ratelimit ourselves to prevent oom from malicious clients */
> +		if (&next->ring_link == &ring->request_list) {

list_is_last(next, &ring->request_list) ?

> +			cond_synchronize_rcu(rq->rcustate);
> +			break; /* keep the last objects for the next request */
> +		}
> +
> +		if (!i915_request_completed(rq))
> +			break;
> +
> +		/* Retire our old requests in the hope that we free some */
> +		i915_request_retire(rq);

The RCU wait against the last submitted rq is also gone. Now it only
syncs against the next to last rq, unless there are more than two live
requests. Is this what you intended?

If the ring timeline is a list of r-r-r-R-R-R (r=completed,
R=pending) then it looks like it will not sync on anything.

And if the list is r-r-r-r it will sync against a completed rq. Which I
hope is a no-op, but still, the loop logic looks potentially dodgy.

It also has a higher level vulnerability to one hog timeline starving
the rest I think.

Regards,

Tvrtko

> +	}
> +
> +	return kmem_cache_alloc(ce->gem_context->i915->requests, GFP_KERNEL);
> +}
> +
>  /**
>   * i915_request_alloc - allocate a request structure
>   *
> @@ -559,15 +582,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>  	rq = kmem_cache_alloc(i915->requests,
>  			      GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
>  	if (unlikely(!rq)) {
> -		i915_retire_requests(i915);
> -
> -		/* Ratelimit ourselves to prevent oom from malicious clients */
> -		rq = i915_gem_active_raw(&ce->ring->timeline->last_request,
> -					 &i915->drm.struct_mutex);
> -		if (rq)
> -			cond_synchronize_rcu(rq->rcustate);
> -
> -		rq = kmem_cache_alloc(i915->requests, GFP_KERNEL);
> +		rq = i915_request_alloc_slow(ce);
>  		if (!rq) {
>  			ret = -ENOMEM;
>  			goto err_unreserve;
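The two cases raised in the review (r-r-r-R-R-R and r-r-r-r) can be traced with a small userspace model of the loop as posted. This is a sketch under assumptions, not driver code: the ring is an array of booleans (true = completed), `model_alloc_slow()` and `struct slow_path_result` are hypothetical names, and the index recorded in `synced_on` stands in for the request that would be handed to cond_synchronize_rcu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result record; not a kernel structure. */
struct slow_path_result {
	size_t retired;	/* requests retired from the head of the ring */
	int synced_on;	/* index used for the RCU ratelimit, or -1 if none */
};

/*
 * Model only: completed[i] is true for a completed request ('r') and
 * false for a pending one ('R'), oldest first.
 */
static struct slow_path_result
model_alloc_slow(const bool *completed, size_t count)
{
	struct slow_path_result res = { .retired = 0, .synced_on = -1 };
	size_t i;

	assert(completed || count == 0);

	for (i = 0; i < count; i++) {
		/* Stands in for "&next->ring_link == &ring->request_list". */
		if (i + 1 == count) {
			res.synced_on = (int)i;
			break; /* the last request is kept, not retired */
		}

		if (!completed[i])
			break; /* busy request: leave before any ratelimiting */

		res.retired++;
	}

	return res;
}
```

For r-r-r-R-R-R the model retires three requests and never reaches the ratelimit, matching the "will not sync on anything" observation. For r-r-r-r it retires three and syncs on the final, already completed request.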
[Intel-gfx] [PATCH] drm/i915: Reduce i915_request_alloc retirement to local context
In the continual quest to reduce the amount of global work required when
submitting requests, replace i915_retire_requests() after allocation
failure with retiring just our ring.

References: 11abf0c5a021 ("drm/i915: Limit the backpressure for i915_request allocation")
Signed-off-by: Chris Wilson
Cc: Tvrtko Ursulin
---
 drivers/gpu/drm/i915/i915_request.c | 33 +
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 1e158eb8cb97..9ba218c6029b 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -477,6 +477,29 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 	return NOTIFY_DONE;
 }
 
+static noinline struct i915_request *
+i915_request_alloc_slow(struct intel_context *ce)
+{
+	struct intel_ring *ring = ce->ring;
+	struct i915_request *rq, *next;
+
+	list_for_each_entry_safe(rq, next, &ring->request_list, ring_link) {
+		/* Ratelimit ourselves to prevent oom from malicious clients */
+		if (&next->ring_link == &ring->request_list) {
+			cond_synchronize_rcu(rq->rcustate);
+			break; /* keep the last objects for the next request */
+		}
+
+		if (!i915_request_completed(rq))
+			break;
+
+		/* Retire our old requests in the hope that we free some */
+		i915_request_retire(rq);
+	}
+
+	return kmem_cache_alloc(ce->gem_context->i915->requests, GFP_KERNEL);
+}
+
 /**
  * i915_request_alloc - allocate a request structure
  *
@@ -559,15 +582,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	rq = kmem_cache_alloc(i915->requests,
 			      GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
 	if (unlikely(!rq)) {
-		i915_retire_requests(i915);
-
-		/* Ratelimit ourselves to prevent oom from malicious clients */
-		rq = i915_gem_active_raw(&ce->ring->timeline->last_request,
-					 &i915->drm.struct_mutex);
-		if (rq)
-			cond_synchronize_rcu(rq->rcustate);
-
-		rq = kmem_cache_alloc(i915->requests, GFP_KERNEL);
+		rq = i915_request_alloc_slow(ce);
 		if (!rq) {
 			ret = -ENOMEM;
 			goto err_unreserve;
-- 
2.20.1