Re: [Intel-gfx] [PATCH 2/4] drm/i915: Retire requests along rings

2018-04-23 Thread Tvrtko Ursulin



On 23/04/2018 10:06, Chris Wilson wrote:

Quoting Tvrtko Ursulin (2018-04-23 09:47:57)


On 20/04/2018 14:20, Chris Wilson wrote:

   void i915_retire_requests(struct drm_i915_private *i915)
   {
- struct intel_engine_cs *engine;
- enum intel_engine_id id;
+ struct intel_ring *ring, *next;
   
   lockdep_assert_held(>drm.struct_mutex);
   
   if (!i915->gt.active_requests)

   return;
   
- for_each_engine(engine, i915, id)

- engine_retire_requests(engine);
+ list_for_each_entry_safe(ring, next, >gt.rings, link)
+ ring_retire_requests(ring);


[Continuing from previous discussion on try-bot.]

I had a thought that this could be managed more efficiently in a very
simple manner.

We rename timeline->inflight_seqnos to something like live_requests and
manage this per ctx timeline, from request_add to request_retire. At the
same time we only have ring->ring_link be linked to
i915->gt.(live_)rings while the ctx timeline live_requests count is
greater than zero. In other words list_add on 0->1, list_del on 1->0.

This way the retire path does not have to walk all known rings, but only
all rings with live (unretired) reqeusts.

What do you think?


I wouldn't resurrect inflight_seqnos for this, we can simply use
list_empty(ring->request_list). Slight icky feeling every time we do
anything under struct_mutex, but by this point I itch all over.


Right, that's even simpler.


Seems like a small enough patch to try after. I think I class it as an
optimisation, so would like to get the bigger change in engine to ring
soaking first.


Ok, as you wish.

Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 2/4] drm/i915: Retire requests along rings

2018-04-23 Thread Chris Wilson
Quoting Tvrtko Ursulin (2018-04-23 09:47:57)
> 
> On 20/04/2018 14:20, Chris Wilson wrote:
> >   void i915_retire_requests(struct drm_i915_private *i915)
> >   {
> > - struct intel_engine_cs *engine;
> > - enum intel_engine_id id;
> > + struct intel_ring *ring, *next;
> >   
> >   lockdep_assert_held(>drm.struct_mutex);
> >   
> >   if (!i915->gt.active_requests)
> >   return;
> >   
> > - for_each_engine(engine, i915, id)
> > - engine_retire_requests(engine);
> > + list_for_each_entry_safe(ring, next, >gt.rings, link)
> > + ring_retire_requests(ring);
> 
> [Continuing from previous discussion on try-bot.]
> 
> I had a thought that this could be managed more efficiently in a very 
> simple manner.
> 
> We rename timeline->inflight_seqnos to something like live_requests and 
> manage this per ctx timeline, from request_add to request_retire. At the 
> same time we only have ring->ring_link be linked to 
> i915->gt.(live_)rings while the ctx timeline live_requests count is 
> greater than zero. In other words list_add on 0->1, list_del on 1->0.
> 
> This way the retire path does not have to walk all known rings, but only 
> all rings with live (unretired) reqeusts.
> 
> What do you think?

I wouldn't resurrect inflight_seqnos for this, we can simply use
list_empty(ring->request_list). Slight icky feeling every time we do
anything under struct_mutex, but by this point I itch all over.

Seems like a small enough patch to try after. I think I class it as an
optimisation, so would like to get the bigger change in engine to ring
soaking first.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 2/4] drm/i915: Retire requests along rings

2018-04-23 Thread Tvrtko Ursulin


On 20/04/2018 14:20, Chris Wilson wrote:

In the next patch, rings are the central timeline as requests may jump
between engines. Therefore in the future as we retire in order along the
engine timeline, we may retire out-of-order within a ring (as the ring now
occurs along multiple engines), leading to much hilarity in miscomputing
the position of ring->head.

As an added bonus, retiring along the ring reduces the penalty of having
one execlists client do cleanup for another (old legacy submission
shares a ring between all clients). The downside is that slow and
irregular (off the critical path) process of cleaning up stale requests
after userspace becomes a modicum less efficient.

In the long run, it will become apparent that the ordered
ring->request_list matches the ring->timeline, a fun challenge for the
future will be unifying the two lists to avoid duplication!

Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
---
  drivers/gpu/drm/i915/i915_drv.h   |  3 +-
  drivers/gpu/drm/i915/i915_gem.c   |  1 +
  drivers/gpu/drm/i915/i915_request.c   | 43 ---
  drivers/gpu/drm/i915/intel_ringbuffer.c   |  5 +++
  drivers/gpu/drm/i915/intel_ringbuffer.h   |  1 +
  drivers/gpu/drm/i915/selftests/mock_engine.c  | 27 +---
  .../gpu/drm/i915/selftests/mock_gem_device.c  |  2 +
  7 files changed, 50 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 028691108125..e177d2bda87d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2055,8 +2055,9 @@ struct drm_i915_private {
void (*resume)(struct drm_i915_private *);
void (*cleanup_engine)(struct intel_engine_cs *engine);
  
-		struct list_head timelines;

struct i915_gem_timeline global_timeline;
+   struct list_head timelines;
+   struct list_head rings;
u32 active_requests;
  
  		/**

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 795ca83aed7a..906e2395c245 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5600,6 +5600,7 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
goto err_dependencies;
  
  	mutex_lock(_priv->drm.struct_mutex);

+   INIT_LIST_HEAD(_priv->gt.rings);
INIT_LIST_HEAD(_priv->gt.timelines);
err = i915_gem_timeline_init__global(dev_priv);
mutex_unlock(_priv->drm.struct_mutex);
diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index 1437538d5bfa..0bf949ec9f1a 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -292,6 +292,7 @@ void i915_gem_retire_noop(struct i915_gem_active *active,
  
  static void advance_ring(struct i915_request *request)

  {
+   struct intel_ring *ring = request->ring;
unsigned int tail;
  
  	/*

@@ -303,7 +304,9 @@ static void advance_ring(struct i915_request *request)
 * Note this requires that we are always called in request
 * completion order.
 */
-   if (list_is_last(>ring_link, >ring->request_list)) {
+   GEM_BUG_ON(request != list_first_entry(>request_list,
+  typeof(*request), ring_link));
+   if (list_is_last(>ring_link, >request_list)) {
/*
 * We may race here with execlists resubmitting this request
 * as we retire it. The resubmission will move the ring->tail
@@ -318,7 +321,7 @@ static void advance_ring(struct i915_request *request)
}
list_del(>ring_link);
  
-	request->ring->head = tail;

+   ring->head = tail;
  }
  
  static void free_capture_list(struct i915_request *request)

@@ -424,18 +427,18 @@ static void i915_request_retire(struct i915_request 
*request)
  
  void i915_request_retire_upto(struct i915_request *rq)

  {
-   struct intel_engine_cs *engine = rq->engine;
+   struct intel_ring *ring = rq->ring;
struct i915_request *tmp;
  
  	lockdep_assert_held(>i915->drm.struct_mutex);

GEM_BUG_ON(!i915_request_completed(rq));
  
-	if (list_empty(>link))

+   if (list_empty(>ring_link))
return;
  
  	do {

-   tmp = list_first_entry(>timeline->requests,
-  typeof(*tmp), link);
+   tmp = list_first_entry(>request_list,
+  typeof(*tmp), ring_link);
  
  		i915_request_retire(tmp);

} while (tmp != rq);
@@ -644,9 +647,9 @@ i915_request_alloc(struct intel_engine_cs *engine, struct 
i915_gem_context *ctx)
if (ret)
goto err_unreserve;
  
-	/* Move the oldest request to the slab-cache (if not in use!) */

-   rq = list_first_entry_or_null(>timeline->requests,
-   

[Intel-gfx] [PATCH 2/4] drm/i915: Retire requests along rings

2018-04-20 Thread Chris Wilson
In the next patch, rings are the central timeline as requests may jump
between engines. Therefore in the future as we retire in order along the
engine timeline, we may retire out-of-order within a ring (as the ring now
occurs along multiple engines), leading to much hilarity in miscomputing
the position of ring->head.

As an added bonus, retiring along the ring reduces the penalty of having
one execlists client do cleanup for another (old legacy submission
shares a ring between all clients). The downside is that slow and
irregular (off the critical path) process of cleaning up stale requests
after userspace becomes a modicum less efficient.

In the long run, it will become apparent that the ordered
ring->request_list matches the ring->timeline, a fun challenge for the
future will be unifying the two lists to avoid duplication!

Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/i915_drv.h   |  3 +-
 drivers/gpu/drm/i915/i915_gem.c   |  1 +
 drivers/gpu/drm/i915/i915_request.c   | 43 ---
 drivers/gpu/drm/i915/intel_ringbuffer.c   |  5 +++
 drivers/gpu/drm/i915/intel_ringbuffer.h   |  1 +
 drivers/gpu/drm/i915/selftests/mock_engine.c  | 27 +---
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  2 +
 7 files changed, 50 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 028691108125..e177d2bda87d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2055,8 +2055,9 @@ struct drm_i915_private {
void (*resume)(struct drm_i915_private *);
void (*cleanup_engine)(struct intel_engine_cs *engine);
 
-   struct list_head timelines;
struct i915_gem_timeline global_timeline;
+   struct list_head timelines;
+   struct list_head rings;
u32 active_requests;
 
/**
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 795ca83aed7a..906e2395c245 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5600,6 +5600,7 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
goto err_dependencies;
 
mutex_lock(_priv->drm.struct_mutex);
+   INIT_LIST_HEAD(_priv->gt.rings);
INIT_LIST_HEAD(_priv->gt.timelines);
err = i915_gem_timeline_init__global(dev_priv);
mutex_unlock(_priv->drm.struct_mutex);
diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index 1437538d5bfa..0bf949ec9f1a 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -292,6 +292,7 @@ void i915_gem_retire_noop(struct i915_gem_active *active,
 
 static void advance_ring(struct i915_request *request)
 {
+   struct intel_ring *ring = request->ring;
unsigned int tail;
 
/*
@@ -303,7 +304,9 @@ static void advance_ring(struct i915_request *request)
 * Note this requires that we are always called in request
 * completion order.
 */
-   if (list_is_last(>ring_link, >ring->request_list)) {
+   GEM_BUG_ON(request != list_first_entry(>request_list,
+  typeof(*request), ring_link));
+   if (list_is_last(>ring_link, >request_list)) {
/*
 * We may race here with execlists resubmitting this request
 * as we retire it. The resubmission will move the ring->tail
@@ -318,7 +321,7 @@ static void advance_ring(struct i915_request *request)
}
list_del(>ring_link);
 
-   request->ring->head = tail;
+   ring->head = tail;
 }
 
 static void free_capture_list(struct i915_request *request)
@@ -424,18 +427,18 @@ static void i915_request_retire(struct i915_request 
*request)
 
 void i915_request_retire_upto(struct i915_request *rq)
 {
-   struct intel_engine_cs *engine = rq->engine;
+   struct intel_ring *ring = rq->ring;
struct i915_request *tmp;
 
lockdep_assert_held(>i915->drm.struct_mutex);
GEM_BUG_ON(!i915_request_completed(rq));
 
-   if (list_empty(>link))
+   if (list_empty(>ring_link))
return;
 
do {
-   tmp = list_first_entry(>timeline->requests,
-  typeof(*tmp), link);
+   tmp = list_first_entry(>request_list,
+  typeof(*tmp), ring_link);
 
i915_request_retire(tmp);
} while (tmp != rq);
@@ -644,9 +647,9 @@ i915_request_alloc(struct intel_engine_cs *engine, struct 
i915_gem_context *ctx)
if (ret)
goto err_unreserve;
 
-   /* Move the oldest request to the slab-cache (if not in use!) */
-   rq = list_first_entry_or_null(>timeline->requests,
-