On 02-06-16 at 15:25, Tvrtko Ursulin wrote:
>
> On 01/06/16 18:07, john.c.harri...@intel.com wrote:
>> From: John Harrison <john.c.harri...@intel.com>
>>
>> The intended usage model for struct fence is that the signalled status
>> should be set on demand rather than polled. That is, there should not
>> be a need for a 'signaled' function to be called every time the status
>> is queried. Instead, 'something' should be done to enable a signal
>> callback from the hardware which will update the state directly. In
>> the case of requests, this is the seqno update interrupt. The idea is
>> that this callback will only be enabled on demand when something
>> actually tries to wait on the fence.
>>
>> This change removes the polling test and replaces it with the callback
>> scheme. Each fence is added to a 'please poke me' list at the start of
>> i915_add_request(). The interrupt handler then scans through the 'poke
>> me' list when a new seqno pops out and signals any matching
>> fence/request. The fence is then removed from the list so the entire
>> request stack does not need to be scanned every time. Note that the
>> fence is added to the list before the commands to generate the seqno
>> interrupt are added to the ring. Thus the sequence is guaranteed to be
>> race free if the interrupt is already enabled.
>>
>> Note that the interrupt is only enabled on demand (i.e. when
>> __wait_request() is called). Thus there is still a potential race when
>> enabling the interrupt as the request may already have completed.
>> However, this is simply solved by calling the interrupt processing
>> code immediately after enabling the interrupt and thereby checking for
>> already completed requests.
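As a minimal sketch of that enable-then-check pattern (hypothetical names, not
the patch's actual code):

static bool example_enable_signaling(struct fence *f)
{
	struct example_request *req = container_of(f, typeof(*req), fence);

	/* Arm the hardware interrupt that will signal this fence. */
	example_irq_get(req->engine);

	/*
	 * The request may already have completed before the interrupt
	 * was armed, so scan for completed requests immediately to
	 * close the race described above.
	 */
	example_check_completed(req->engine);

	return true;
}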
>>
>> Lastly, the ring clean up code has the possibility to cancel
>> outstanding requests (e.g. because TDR has reset the ring). These
>> requests will never get signalled and so must be removed from the
>> signal list manually. This is done by setting a 'cancelled' flag and
>> then calling the regular notify/retire code path rather than
>> attempting to duplicate the list manipulation and clean up code in
>> multiple places. This also avoids any race condition where the
>> cancellation request might occur after, or during, the arrival of the
>> completion interrupt.
>>
>> v2: Updated to take advantage of the request unreference no longer
>> requiring the mutex lock.
>>
>> v3: Move the signal list processing around to prevent unsubmitted
>> requests being added to the list. This was occurring on Android
>> because the native sync implementation calls the
>> fence->enable_signaling API immediately on fence creation.
>>
>> v4: Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
>> 'link' instead of 'list'. Added support for returning an error code on
>> a cancelled fence. Update list processing to be more efficient/safer
>> with respect to spinlocks.
>>
>> v5: Made i915_gem_request_submit a static as it is only ever called
>> from one place.
>>
>> Fixed up the low latency wait optimisation. The time delay between the
>> seqno value being written to memory and the driver's ISR running can be
>> significant, at least for the wait request micro-benchmark. This can
>> be greatly improved by explicitly checking for seqno updates in the
>> pre-wait busy poll loop. Also added some documentation comments to the
>> busy poll code.
>>
>> Fixed up support for the faking of lost interrupts
>> (test_irq_rings/missed_irq_rings). That is, there is an IGT test that
>> tells the driver to lose interrupts deliberately and then checks that
>> everything still works as expected (albeit much slower).
>>
>> Updates from review comments: use non IRQ-save spinlocking, early exit
>> on WARN and improved comments (Tvrtko Ursulin).
>>
>> v6: Updated to newer nightly and resolved conflicts around the
>> wait_request busy spin optimisation. Also fixed a race condition
>> between this early exit path and the regular completion path.
>>
>> v7: Updated to newer nightly - lots of ring -> engine renaming plus an
>> interface change on get_seqno(). Also added a list_empty() check
>> before acquiring spinlocks and doing list processing.
>>
>> v8: Updated to newer nightly - changes to request clean up code mean
>> none of the deferred free mess is needed any more.
>>
>> v9: Moved the request completion processing out of the interrupt
>> handler and into a worker thread (Chris Wilson).
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <john.c.harri...@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
>> Cc: Maarten Lankhorst <maarten.lankho...@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_dma.c         |   9 +-
>>   drivers/gpu/drm/i915/i915_drv.h         |  11 ++
>>   drivers/gpu/drm/i915/i915_gem.c         | 248 +++++++++++++++++++++++++++++---
>>   drivers/gpu/drm/i915/i915_irq.c         |   2 +
>>   drivers/gpu/drm/i915/intel_lrc.c        |   5 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.c |   5 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |   3 +
>>   7 files changed, 260 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_dma.c 
>> b/drivers/gpu/drm/i915/i915_dma.c
>> index 07edaed..f8f60bb 100644
>> --- a/drivers/gpu/drm/i915/i915_dma.c
>> +++ b/drivers/gpu/drm/i915/i915_dma.c
>> @@ -1019,9 +1019,13 @@ static int i915_workqueues_init(struct 
>> drm_i915_private *dev_priv)
>>       if (dev_priv->wq == NULL)
>>           goto out_err;
>>
>> +    dev_priv->req_wq = alloc_ordered_workqueue("i915-rq", 0);
>> +    if (dev_priv->req_wq == NULL)
>> +        goto out_free_wq;
>> +
>
> Single (per-device) ordered workqueue will serialize interrupt processing 
> across all engines to one thread. Together with the fact that the request 
> worker does not seem to need a sleeping context, I am thinking that a tasklet 
> per engine would be much better (see engine->irq_tasklet for an example).
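For illustration, a per-engine tasklet along those lines might look roughly
like this (a sketch only; request_tasklet is a hypothetical new field, and it
assumes i915_gem_request_notify() is safe to run in softirq context):

static void i915_gem_request_tasklet(unsigned long data)
{
	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;

	i915_gem_request_notify(engine, false);
}

/* At engine init, next to the existing irq_tasklet: */
tasklet_init(&engine->request_tasklet,
	     i915_gem_request_tasklet, (unsigned long)engine);

/* And in notify_ring(), instead of queue_work(): */
tasklet_schedule(&engine->request_tasklet);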
>
>>       dev_priv->hotplug.dp_wq = alloc_ordered_workqueue("i915-dp", 0);
>>       if (dev_priv->hotplug.dp_wq == NULL)
>> -        goto out_free_wq;
>> +        goto out_free_req_wq;
>>
>>       dev_priv->gpu_error.hangcheck_wq =
>>           alloc_ordered_workqueue("i915-hangcheck", 0);
>> @@ -1032,6 +1036,8 @@ static int i915_workqueues_init(struct 
>> drm_i915_private *dev_priv)
>>
>>   out_free_dp_wq:
>>       destroy_workqueue(dev_priv->hotplug.dp_wq);
>> +out_free_req_wq:
>> +    destroy_workqueue(dev_priv->req_wq);
>>   out_free_wq:
>>       destroy_workqueue(dev_priv->wq);
>>   out_err:
>> @@ -1044,6 +1050,7 @@ static void i915_workqueues_cleanup(struct 
>> drm_i915_private *dev_priv)
>>   {
>>       destroy_workqueue(dev_priv->gpu_error.hangcheck_wq);
>>       destroy_workqueue(dev_priv->hotplug.dp_wq);
>> +    destroy_workqueue(dev_priv->req_wq);
>>       destroy_workqueue(dev_priv->wq);
>>   }
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index 69c3412..5a7f256 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1851,6 +1851,9 @@ struct drm_i915_private {
>>        */
>>       struct workqueue_struct *wq;
>>
>> +    /* Work queue for request completion processing */
>> +    struct workqueue_struct *req_wq;
>> +
>>       /* Display functions */
>>       struct drm_i915_display_funcs display;
>>
>> @@ -2359,6 +2362,10 @@ struct drm_i915_gem_request {
>>        */
>>       struct fence fence;
>>       struct rcu_head rcu_head;
>> +    struct list_head signal_link;
>> +    bool cancelled;
>> +    bool irq_enabled;
>> +    bool signal_requested;
>>
>>       /** On Which ring this request was generated */
>>       struct drm_i915_private *i915;
>> @@ -2460,6 +2467,10 @@ struct drm_i915_gem_request {
>>   struct drm_i915_gem_request * __must_check
>>   i915_gem_request_alloc(struct intel_engine_cs *engine,
>>                  struct i915_gem_context *ctx);
>> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
>> +                       bool fence_locked);
>> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool 
>> fence_locked);
>> +void i915_gem_request_worker(struct work_struct *work);
>>
>>   static inline bool i915_gem_request_completed(struct drm_i915_gem_request 
>> *req)
>>   {
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index 97e3138..83cf9b0 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -39,6 +39,8 @@
>>   #include <linux/pci.h>
>>   #include <linux/dma-buf.h>
>>
>> +static void i915_gem_request_submit(struct drm_i915_gem_request *req);
>> +
>>   static void i915_gem_object_flush_gtt_write_domain(struct 
>> drm_i915_gem_object *obj);
>>   static void i915_gem_object_flush_cpu_write_domain(struct 
>> drm_i915_gem_object *obj);
>>   static void
>> @@ -1237,9 +1239,8 @@ int __i915_wait_request(struct drm_i915_gem_request 
>> *req,
>>   {
>>       struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
>>       struct drm_i915_private *dev_priv = req->i915;
>> -    const bool irq_test_in_progress =
>> -        ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & 
>> intel_engine_flag(engine);
>>       int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
>> +    uint32_t seqno;
>>       DEFINE_WAIT(wait);
>>       unsigned long timeout_expire;
>>       s64 before = 0; /* Only to silence a compiler warning. */
>> @@ -1247,9 +1248,6 @@ int __i915_wait_request(struct drm_i915_gem_request 
>> *req,
>>
>>       WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>>
>> -    if (list_empty(&req->list))
>> -        return 0;
>> -
>>       if (i915_gem_request_completed(req))
>>           return 0;
>>
>> @@ -1275,15 +1273,17 @@ int __i915_wait_request(struct drm_i915_gem_request 
>> *req,
>>       trace_i915_gem_request_wait_begin(req);
>>
>>       /* Optimistic spin for the next jiffie before touching IRQs */
>> -    ret = __i915_spin_request(req, state);
>> -    if (ret == 0)
>> -        goto out;
>> -
>> -    if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
>> -        ret = -ENODEV;
>> -        goto out;
>> +    if (req->seqno) {
>
> I think this needs a comment because it is so unusual and new that req->seqno 
> == 0 is a special path. It should explain why and how it can happen here.
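Something like the following would do, assuming (as this series implies) that
req->seqno is only assigned once the request is actually submitted to the
hardware:

	/*
	 * req->seqno is only assigned when the request is submitted to
	 * the hardware, so a value of zero means the request has not
	 * been submitted yet and there is no seqno to busy-spin on.
	 */
	if (req->seqno) {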
>
>> +        ret = __i915_spin_request(req, state);
>> +        if (ret == 0)
>> +            goto out;
>>       }
>>
>> +    /*
>> +     * Enable interrupt completion of the request.
>> +     */
>> +    fence_enable_sw_signaling(&req->fence);
>> +
>>       for (;;) {
>>           struct timer_list timer;
>>
>> @@ -1306,6 +1306,21 @@ int __i915_wait_request(struct drm_i915_gem_request 
>> *req,
>>               break;
>>           }
>>
>> +        if (req->seqno) {
>> +            /*
>> +             * There is quite a lot of latency in the user interrupt
>> +             * path. So do an explicit seqno check and potentially
>> +             * remove all that delay.
>> +             */
>> +            if (req->engine->irq_seqno_barrier)
>> +                req->engine->irq_seqno_barrier(req->engine);
>> +            seqno = engine->get_seqno(engine);
>> +            if (i915_seqno_passed(seqno, req->seqno)) {
>> +                ret = 0;
>> +                break;
>> +            }
>> +        }
>> +
>>           if (signal_pending_state(state, current)) {
>>               ret = -ERESTARTSYS;
>>               break;
>> @@ -1332,14 +1347,32 @@ int __i915_wait_request(struct drm_i915_gem_request 
>> *req,
>>               destroy_timer_on_stack(&timer);
>>           }
>>       }
>> -    if (!irq_test_in_progress)
>> -        engine->irq_put(engine);
>>
>>       finish_wait(&engine->irq_queue, &wait);
>
> Hm I don't understand why our custom waiting remains? Shouldn't fence_wait 
> just be called after the optimistic spin, more or less?
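For reference, the fence route after the optimistic spin would be roughly the
following (a sketch; fence_wait_timeout() is the existing dma-buf helper, but
the error mapping here is an assumption):

	signed long t;

	t = fence_wait_timeout(&req->fence, interruptible,
			       timeout ? nsecs_to_jiffies(*timeout)
				       : MAX_SCHEDULE_TIMEOUT);
	if (t > 0)
		ret = 0;		/* completed */
	else if (t == 0)
		ret = -ETIME;		/* timed out */
	else
		ret = t;		/* e.g. -ERESTARTSYS */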
>
>>
>>   out:
>>       trace_i915_gem_request_wait_end(req);
>>
>> +    if ((ret == 0) && (req->seqno)) {
>> +        if (req->engine->irq_seqno_barrier)
>> +            req->engine->irq_seqno_barrier(req->engine);
>> +        seqno = engine->get_seqno(engine);
>> +        if (i915_seqno_passed(seqno, req->seqno) &&
>> +            !i915_gem_request_completed(req)) {
>> +            /*
>> +             * Make sure the request is marked as completed before
>> +             * returning. NB: Need to acquire the spinlock around
>> the whole call to avoid a race condition where the
>> interrupt handler is running concurrently and could
>> cause this invocation to exit early even though the
>> request has not actually been fully processed yet.
>> +             */
>> +            spin_lock_irq(&req->engine->fence_lock);
>> +            i915_gem_request_notify(req->engine, true);
>> +            spin_unlock_irq(&req->engine->fence_lock);
>> +        }
>> +    }
>> +
>>       if (timeout) {
>>           s64 tres = *timeout - (ktime_get_raw_ns() - before);
>>
>> @@ -1405,6 +1438,11 @@ static void i915_gem_request_retire(struct 
>> drm_i915_gem_request *request)
>>   {
>>       trace_i915_gem_request_retire(request);
>>
>> +    if (request->irq_enabled) {
>> +        request->engine->irq_put(request->engine);
>> +        request->irq_enabled = false;
>
> What protects request->irq_enabled, here versus the enable_signalling bit? It 
> can be called from the external fence users, which would take the fence_lock, 
> but here it does not.
>
>> +    }
>
>> +
>>       /* We know the GPU must have read the request to have
>>        * sent us the seqno + interrupt, so use the position
>>        * of tail of the request to update the last known position
>> @@ -1418,6 +1456,22 @@ static void i915_gem_request_retire(struct 
>> drm_i915_gem_request *request)
>>       list_del_init(&request->list);
>>       i915_gem_request_remove_from_client(request);
>>
>> +    /*
>> +     * In case the request is still in the signal pending list,
>> +     * e.g. due to being cancelled by TDR, preemption, etc.
>> +     */
>> +    if (!list_empty(&request->signal_link)) {
>
> No locking required here?
Considering the locked function is used, I'm assuming this function holds the 
fence_lock.

If not, something's seriously wrong.
>> +        /*
>> +         * The request must be marked as cancelled and the underlying
>> +         * fence as failed. NB: There is no explicit fence fail API,
>> +         * there is only a manual poke and signal.
>> +         */
>> +        request->cancelled = true;
>> +        /* How to propagate to any associated sync_fence??? */
^This way works, so the comment can be removed.

There's deliberately no way to cancel a fence; it's the same path but with the 
status member set.

If you have a fence for another driver, there's really no good way to handle 
failure, so you have to treat it as if it succeeded.
>> +        request->fence.status = -EIO;
>> +        fence_signal_locked(&request->fence);
>
> And here?
>
>> +    }
>> +
>>       if (request->previous_context) {
>>           if (i915.enable_execlists)
>>               intel_lr_context_unpin(request->previous_context,
>> @@ -2670,6 +2724,12 @@ void __i915_add_request(struct drm_i915_gem_request 
>> *request,
>>        */
>>       request->postfix = intel_ring_get_tail(ringbuf);
>>
>> +    /*
>> +     * Add the fence to the pending list before emitting the commands to
>> +     * generate a seqno notification interrupt.
>> +     */
>> +    i915_gem_request_submit(request);
>> +
>>       if (i915.enable_execlists)
>>           ret = engine->emit_request(request);
>>       else {
>> @@ -2755,25 +2815,154 @@ static void i915_gem_request_free(struct fence 
>> *req_fence)
>>       struct drm_i915_gem_request *req;
>>
>>       req = container_of(req_fence, typeof(*req), fence);
>> +
>> +    WARN_ON(req->irq_enabled);
>
> How useful is this? If it went wrong, engine irq reference counting would be 
> bad. Okay, no one would notice, but we could then stick some other WARNs 
> here, like !list_empty(&req->list) and who knows what, which we don't have, 
> so I am just wondering if this one brings any value.
>
>> +
>>       call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
>>   }
>>
>> -static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>> +/*
>> + * The request is about to be submitted to the hardware so add the fence to
>> + * the list of signalable fences.
>> + *
>> + * NB: This does not necessarily enable interrupts yet. That only occurs on
>> + * demand when the request is actually waited on. However, adding it to the
>> + * list early ensures that there is no race condition where the interrupt
>> + * could pop out prematurely and thus be completely lost. The race is merely
>> + * that the interrupt must be manually checked for after being enabled.
>> + */
>> +static void i915_gem_request_submit(struct drm_i915_gem_request *req)
>>   {
>> -    /* Interrupt driven fences are not implemented yet.*/
>> -    WARN(true, "This should not be called!");
>> -    return true;
>> +    /*
>> +     * Always enable signal processing for the request's fence object
>> +     * before that request is submitted to the hardware. Thus there is no
>> +     * race condition whereby the interrupt could pop out before the
>> +     * request has been added to the signal list. Hence no need to check
>> +     * for completion, undo the list add and return false.
>> +     */
>> +    i915_gem_request_reference(req);
>> +    spin_lock_irq(&req->engine->fence_lock);
>> +    WARN_ON(!list_empty(&req->signal_link));
>> +    list_add_tail(&req->signal_link, &req->engine->fence_signal_list);
>> +    spin_unlock_irq(&req->engine->fence_lock);
>> +
>> +    /*
>> +     * NB: Interrupts are only enabled on demand. Thus there is still a
>> +     * race where the request could complete before the interrupt has
>> +     * been enabled. Thus care must be taken at that point.
>> +     */
>> +
>> +    /* Have interrupts already been requested? */
>> +    if (req->signal_requested)
>> +        i915_gem_request_enable_interrupt(req, false);
>
> I am thinking that the fence lock could be held here until the end of the 
> function and in that way i915_gem_request_enable_interrupt would not need the 
> fence_locked parameter any more.
>
> It would probably also be safer with regards to accessing 
> req->signal_requested. I am not sure that enable_signaling and this path 
> otherwise can't race and miss signal_requested getting set.
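That is, something like this sketch, where i915_gem_request_enable_interrupt()
would be reworked to assume the lock is already held:

static void i915_gem_request_submit(struct drm_i915_gem_request *req)
{
	i915_gem_request_reference(req);

	spin_lock_irq(&req->engine->fence_lock);
	WARN_ON(!list_empty(&req->signal_link));
	list_add_tail(&req->signal_link, &req->engine->fence_signal_list);

	/* signal_requested cannot be missed while the lock is held. */
	if (req->signal_requested)
		i915_gem_request_enable_interrupt(req);
	spin_unlock_irq(&req->engine->fence_lock);
}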
>
>> +}
>> +
>> +/*
>> + * The request is being actively waited on, so enable interrupt based
>> + * completion signalling.
>> + */
>> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
>> +                       bool fence_locked)
>> +{
>> +    struct drm_i915_private *dev_priv = req->engine->i915;
>> +    const bool irq_test_in_progress =
>> +        ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
>> +                        intel_engine_flag(req->engine);
>> +
>> +    if (req->irq_enabled)
>> +        return;
>> +
>> +    if (irq_test_in_progress)
>> +        return;
>> +
>> +    if (!WARN_ON(!req->engine->irq_get(req->engine)))
>> +        req->irq_enabled = true;
>
> The double negation confused me a bit. It is probably not ideal since 
> WARN_ONs go to the out-of-line section, so in a way it deliberately penalises 
> the fast and expected path. I think it would be better to put a WARN on the 
> else path.
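i.e. something like:

	if (likely(req->engine->irq_get(req->engine)))
		req->irq_enabled = true;
	else
		WARN(1, "irq_get failed\n");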
>
>> +
>> +    /*
>> +     * Because the interrupt is only enabled on demand, there is a race
>> +     * where the interrupt can fire before anyone is looking for it. So
>> +     * do an explicit check for missed interrupts.
>> +     */
>> +    i915_gem_request_notify(req->engine, fence_locked);
>>   }
>>
>> -static bool i915_gem_request_is_completed(struct fence *req_fence)
>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>>   {
>>       struct drm_i915_gem_request *req = container_of(req_fence,
>>                            typeof(*req), fence);
>> +
>> +    /*
>> +     * No need to actually enable interrupt based processing until the
>> +     * request has been submitted to the hardware. At which point
>> +     * 'i915_gem_request_submit()' is called. So only really enable
>> +     * signalling in there. Just set a flag to say that interrupts are
>> +     * wanted when the request is eventually submitted. On the other hand
>> +     * if the request has already been submitted then interrupts do need
>> +     * to be enabled now.
>> +     */
>> +
>> +    req->signal_requested = true;
>> +
>> +    if (!list_empty(&req->signal_link))
>
> In what scenarios is the list_empty check needed? Can someone somehow enable 
> signalling on a fence not yet submitted?
>> +        i915_gem_request_enable_interrupt(req, true);
>> +
>> +    return true;
>> +}
>> +
>> +/**
>> + * i915_gem_request_worker - request work handler callback.
>> + * @work: Work structure
>> + * Called in response to a seqno interrupt to process the completed 
>> requests.
>> + */
>> +void i915_gem_request_worker(struct work_struct *work)
>> +{
>> +    struct intel_engine_cs *engine;
>> +
>> +    engine = container_of(work, struct intel_engine_cs, request_work);
>> +    i915_gem_request_notify(engine, false);
>> +}
>> +
>> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool 
>> fence_locked)
>> +{
>> +    struct drm_i915_gem_request *req, *req_next;
>> +    unsigned long flags;
>>       u32 seqno;
>>
>> -    seqno = req->engine->get_seqno(req->engine);
>> +    if (list_empty(&engine->fence_signal_list))
>
> Okay, this without the lock still makes me nervous. I'd rather not have to 
> think about why it is safe and can't miss a wakeup.
>
> Also I would be leaning toward having i915_gem_request_notify and 
> i915_gem_request_notify__unlocked. With the enable_interrupt simplification 
> I suggested, I think it would look better and be more consistent with the 
> rest of the driver.
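For illustration, the usual split would look roughly like this (a sketch based
on the body quoted below; the locked variant assumes the caller holds
engine->fence_lock):

/* Caller must hold engine->fence_lock. */
static void i915_gem_request_notify(struct intel_engine_cs *engine)
{
	struct drm_i915_gem_request *req, *req_next;
	u32 seqno;

	if (engine->irq_seqno_barrier)
		engine->irq_seqno_barrier(engine);
	seqno = engine->get_seqno(engine);

	list_for_each_entry_safe(req, req_next,
				 &engine->fence_signal_list, signal_link) {
		if (!req->cancelled &&
		    !i915_seqno_passed(seqno, req->seqno))
			break;

		/* Remove from the list before retire can see it. */
		list_del_init(&req->signal_link);

		if (!req->cancelled)
			fence_signal_locked(&req->fence);

		if (req->irq_enabled) {
			req->engine->irq_put(req->engine);
			req->irq_enabled = false;
		}

		i915_gem_request_unreference(req);
	}
}

void i915_gem_request_notify__unlocked(struct intel_engine_cs *engine)
{
	spin_lock_irq(&engine->fence_lock);
	i915_gem_request_notify(engine);
	spin_unlock_irq(&engine->fence_lock);
}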
>
>> +        return;
>> +
>> +    if (!fence_locked)
>> +        spin_lock_irqsave(&engine->fence_lock, flags);
>
> Not called from hard irq context so can be just spin_lock_irq.
>
> But if you agree to go with the tasklet it would then be spin_lock_bh.
The fence lock is always taken with spin_lock_irq; if this requires _bh then 
it can't go into the tasklet.
>>
>> -    return i915_seqno_passed(seqno, req->seqno);
>> +    if (engine->irq_seqno_barrier)
>> +        engine->irq_seqno_barrier(engine);
>> +    seqno = engine->get_seqno(engine);
>> +
>> +    list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, 
>> signal_link) {
>> +        if (!req->cancelled) {
>> +            if (!i915_seqno_passed(seqno, req->seqno))
>> +                break;
>
> Merge to one if statement?
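i.e.:

		if (!req->cancelled &&
		    !i915_seqno_passed(seqno, req->seqno))
			break;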
>
>> +        }
>> +
>> +        /*
>> +         * Start by removing the fence from the signal list otherwise
>> +         * the retire code can run concurrently and get confused.
>> +         */
>> +        list_del_init(&req->signal_link);
>> +
>> +        if (!req->cancelled)
>> +            fence_signal_locked(&req->fence);
>
> I forgot how signalling errors to userspace works. Does that still work for 
> cancelled fences in this series?
>
>> +
>> +        if (req->irq_enabled) {
>> +            req->engine->irq_put(req->engine);
>> +            req->irq_enabled = false;
>> +        }
>> +
>> +        i915_gem_request_unreference(req);
>> +    }
>> +
>> +    if (!fence_locked)
>> +        spin_unlock_irqrestore(&engine->fence_lock, flags);
>>   }
>>
>>   static const char *i915_gem_request_get_driver_name(struct fence 
>> *req_fence)
>> @@ -2816,7 +3005,6 @@ static void i915_gem_request_fence_value_str(struct 
>> fence *req_fence,
>>
>>   static const struct fence_ops i915_gem_request_fops = {
>>       .enable_signaling    = i915_gem_request_enable_signaling,
>> -    .signaled        = i915_gem_request_is_completed,
>>       .wait            = fence_default_wait,
>>       .release        = i915_gem_request_free,
>>       .get_driver_name    = i915_gem_request_get_driver_name,
>> @@ -2902,6 +3090,7 @@ __i915_gem_request_alloc(struct intel_engine_cs 
>> *engine,
>>       req->ctx  = ctx;
>>       i915_gem_context_reference(req->ctx);
>>
>> +    INIT_LIST_HEAD(&req->signal_link);
>>       fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
>>              ctx->engine[engine->id].fence_timeline.fence_context,
>>              
>> i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));
>> @@ -3036,6 +3225,13 @@ static void i915_gem_reset_engine_cleanup(struct 
>> drm_i915_private *dev_priv,
>>           i915_gem_request_retire(request);
>>       }
>>
>> +    /*
>> +     * Tidy up anything left over. This includes a call to
>> +     * i915_gem_request_notify() which will make sure that any requests
>> +     * that were on the signal pending list get also cleaned up.
>> +     */
>> +    i915_gem_retire_requests_ring(engine);
>
> Hmm.. but this function has just walked the same lists that this will, and 
> done the same processing. Why call this from here? It looks bad to me; the 
> two are different special cases of a similar thing, so I can't see that 
> calling this from here makes sense.
>
>> +
>>       /* Having flushed all requests from all queues, we know that all
>>        * ringbuffers must now be empty. However, since we do not reclaim
>>        * all space when retiring the request (to prevent HEADs colliding
>> @@ -3082,6 +3278,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs 
>> *engine)
>>   {
>>       WARN_ON(i915_verify_lists(engine->dev));
>>
>> +    /*
>> +     * If no-one has waited on a request recently then interrupts will
>> +     * not have been enabled and thus no requests will ever be marked as
>> +     * completed. So do an interrupt check now.
>> +     */
>> +    i915_gem_request_notify(engine, false);
>
> Would it work to signal the fence from the existing loop a bit above in this 
> function, which already walks the request list in search of completed ones? 
> Or maybe even in i915_gem_request_retire?
>
> I am thinking about doing less list walking and better integration with the 
> core GEM. Downside would be more traffic on the fence_lock, hmm.. not sure 
> then. It just looks a bit bolted on like this.
>
> I don't see it being a noticeable cost so perhaps it can stay like this for 
> now.
>
>> +
>>       /* Retire requests first as we use it above for the early return.
>>        * If we retire requests last, we may use a later seqno and so clear
>>        * the requests lists without clearing the active list, leading to
>> @@ -5102,6 +5305,7 @@ init_engine_lists(struct intel_engine_cs *engine)
>>   {
>>       INIT_LIST_HEAD(&engine->active_list);
>>       INIT_LIST_HEAD(&engine->request_list);
>> +    INIT_LIST_HEAD(&engine->fence_signal_list);
>>   }
>>
>>   void
>> diff --git a/drivers/gpu/drm/i915/i915_irq.c 
>> b/drivers/gpu/drm/i915/i915_irq.c
>> index f780421..a87a3c5 100644
>> --- a/drivers/gpu/drm/i915/i915_irq.c
>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>> @@ -994,6 +994,8 @@ static void notify_ring(struct intel_engine_cs *engine)
>>       trace_i915_gem_request_notify(engine);
>>       engine->user_interrupts++;
>>
>> +    queue_work(engine->i915->req_wq, &engine->request_work);
>> +
>>       wake_up_all(&engine->irq_queue);
>
> Yes, that is the weird part: why does engine->irq_queue have to remain with 
> this patch?
>
>>   }
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c 
>> b/drivers/gpu/drm/i915/intel_lrc.c
>> index f126bcb..134759d 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1879,6 +1879,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs 
>> *engine)
>>
>>       dev_priv = engine->i915;
>>
>> +    cancel_work_sync(&engine->request_work);
>> +
>>       if (engine->buffer) {
>>           intel_logical_ring_stop(engine);
>>           WARN_ON((I915_READ_MODE(engine) & MODE_IDLE) == 0);
>> @@ -2027,6 +2029,7 @@ logical_ring_setup(struct drm_device *dev, enum 
>> intel_engine_id id)
>>
>>       INIT_LIST_HEAD(&engine->active_list);
>>       INIT_LIST_HEAD(&engine->request_list);
>> +    INIT_LIST_HEAD(&engine->fence_signal_list);
>>       INIT_LIST_HEAD(&engine->buffers);
>>       INIT_LIST_HEAD(&engine->execlist_queue);
>>       spin_lock_init(&engine->execlist_lock);
>> @@ -2035,6 +2038,8 @@ logical_ring_setup(struct drm_device *dev, enum 
>> intel_engine_id id)
>>       tasklet_init(&engine->irq_tasklet,
>>                intel_lrc_irq_handler, (unsigned long)engine);
>>
>> +    INIT_WORK(&engine->request_work, i915_gem_request_worker);
>> +
>>       logical_ring_init_platform_invariants(engine);
>>       logical_ring_default_vfuncs(engine);
>>       logical_ring_default_irqs(engine, info->irq_shift);
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
>> b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index fbd3f12..1641096 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -2254,6 +2254,7 @@ static int intel_init_ring_buffer(struct drm_device 
>> *dev,
>>       INIT_LIST_HEAD(&engine->request_list);
>>       INIT_LIST_HEAD(&engine->execlist_queue);
>>       INIT_LIST_HEAD(&engine->buffers);
>> +    INIT_LIST_HEAD(&engine->fence_signal_list);
>>       spin_lock_init(&engine->fence_lock);
>>       i915_gem_batch_pool_init(dev, &engine->batch_pool);
>>       memset(engine->semaphore.sync_seqno, 0,
>> @@ -2261,6 +2262,8 @@ static int intel_init_ring_buffer(struct drm_device 
>> *dev,
>>
>>       init_waitqueue_head(&engine->irq_queue);
>>
>> +    INIT_WORK(&engine->request_work, i915_gem_request_worker);
>> +
>>       ringbuf = intel_engine_create_ringbuffer(engine, 32 * PAGE_SIZE);
>>       if (IS_ERR(ringbuf)) {
>>           ret = PTR_ERR(ringbuf);
>> @@ -2307,6 +2310,8 @@ void intel_cleanup_engine(struct intel_engine_cs 
>> *engine)
>>
>>       dev_priv = engine->i915;
>>
>> +    cancel_work_sync(&engine->request_work);
>> +
>>       if (engine->buffer) {
>>           intel_stop_engine(engine);
>>           WARN_ON(!IS_GEN2(dev_priv) && (I915_READ_MODE(engine) & MODE_IDLE) 
>> == 0);
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
>> b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index 3f39daf..51779b4 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -347,6 +347,9 @@ struct intel_engine_cs {
>>       u32 (*get_cmd_length_mask)(u32 cmd_header);
>>
>>       spinlock_t fence_lock;
>> +    struct list_head fence_signal_list;
>> +
>> +    struct work_struct request_work;
>>   };
>>
>>   static inline bool 
