Re: [Intel-gfx] [PATCH] drm/i915: Delegate our irq handler to a thread
On 2019-09-26 16:44:33 [+0100], Chris Wilson wrote:
> > It's all edge interrupts -- although for gen2/3 my memory is hazy. But
> > the GPU (circa gen6) can generate more than enough interrupts to
> > saturate a CPU.
>
> So everything older than gen5 has MSI disabled it appears and needs
> ONESHOT.

Also ACPI/PCI-quirks may decide that MSI is broken on the system and
disable it.
If you end up with a shared handler, you can't mix ONESHOT among the
handlers: either all of them have that flag set or none of them. In that
case you need to provide a tiny primary handler which just disables the
IRQ (in the HW) and the threaded handler has to enable it again (at the
end of its routine).

> -Chris

Sebastian
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
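The manual-ONESHOT pattern Sebastian describes maps onto request_threaded_irq() roughly as below. This is a hypothetical sketch, not i915 code: the doodad_* struct and helpers are invented for illustration, and the actual mask/unmask mechanism depends on the device.

```c
/* Hypothetical device; all doodad_* names are invented. The primary
 * handler masks the interrupt in the hardware itself, so a level line
 * de-asserts without needing IRQF_ONESHOT -- which could not be mixed
 * with ONESHOT-less handlers on a shared line anyway. */
static irqreturn_t doodad_primary(int irq, void *arg)
{
	struct doodad *dd = arg;

	if (!doodad_irq_pending(dd))
		return IRQ_NONE;	/* shared line: not our device */

	doodad_mask_irq(dd);		/* disable the IRQ in the HW */
	return IRQ_WAKE_THREAD;
}

static irqreturn_t doodad_thread(int irq, void *arg)
{
	struct doodad *dd = arg;

	doodad_process_events(dd);	/* the real work, in thread context */
	doodad_unmask_irq(dd);		/* enable it again at the end */
	return IRQ_HANDLED;
}

/* IRQF_ONESHOT is deliberately not set, so this handler can coexist
 * with other handlers on the shared line that do not set it either. */
ret = request_threaded_irq(irq, doodad_primary, doodad_thread,
			   IRQF_SHARED, "doodad", dd);
```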
Re: [Intel-gfx] [PATCH] drm/i915: Delegate our irq handler to a thread
Quoting Chris Wilson (2019-09-26 15:25:38)
> Moving our primary irq handler to a RT thread incurs an extra 1us delay
> in processing interrupts. This is most noticeable in waking up client
> threads, where it adds about 20% of extra latency. It also imposes a
> delay in feeding the GPU, an extra 1us before signaling secondary
> engines and extra latency in resubmitting work to keep the GPU busy.
> The latter case is insignificant as the latency is hidden by the active
> GPU, and preempt-to-busy ensures that no extra latency is incurred for
> preemption.

Fwiw, gem_wsim says the impact is around 1.25% for transcode (just
looking at the first hour or so of runs), which is enough to put the
idea on the back burner for now.
-Chris
Re: [Intel-gfx] [PATCH] drm/i915: Delegate our irq handler to a thread
Quoting Brian Welty (2019-09-26 19:57:17)
>
> On 9/26/2019 7:25 AM, Chris Wilson wrote:
> > Moving our primary irq handler to a RT thread incurs an extra 1us delay
> > in processing interrupts. This is most noticeable in waking up client
> > threads, where it adds about 20% of extra latency. It also imposes a
> > delay in feeding the GPU, an extra 1us before signaling secondary
> > engines and extra latency in resubmitting work to keep the GPU busy.
> > The latter case is insignificant as the latency is hidden by the
> > active GPU, and preempt-to-busy ensures that no extra latency is
> > incurred for preemption.
> >
> > The benefit is that we reduced the impact on the rest of the system,
> > the cycletest shows a reduction from 5us mean latency to 2us, with the
> > maximum observed latency (in a 2 minute window) reduced by over 160us.
>
> Hi Chris,
>
> I'm curious to understand this a little better.
> If the only benefit of this patch is improving overall system
> performance, then can you say why i915 interrupt handling doesn't fit
> into what I thought was the common usage of threaded interrupts...

The way cyclictest determines system impact is via latency in delivery
of a timer to userspace; it observes irq-off time. We use that as a
convenient metric to determine how long we block the system while
servicing the gpu.

> Usually with request_threaded_irq(), I thought typically both handlers
> are specified and so you'd only fall back to the threaded handler when
> the interrupt context handler is overwhelmed?
> Like so:
>
>     while (HW events need some action) {
>         do something ...
>         if (too overwhelmed) /* ie. reduce load on system */
>             return IRQ_WAKE_THREAD;
>     }
>     return IRQ_HANDLED;
>
> Likely you considered something like the above.
> I'm just looking to understand why using only the threaded interrupt
> handler is best in this case.
> (Also maybe discuss this a little bit further in the commit message...?)

Convenience.
The primary motivation for the patch was to test Sebastian's patches in
our CI, which are targeting PREEMPT_RT issues; he noted those issues
would also be reproduced in a threaded irq handler. The measurements I
did were to assuage my own gut response that a threaded irq handler
would be devastating to our latency metrics (although a 1us extra
latency is terrible if our goal happens to be sub-1us gpu->client
latency!). (I was contemplating adding IRQF_NO_THREAD to avoid the
issue.)

The problem I feel with the overflow of primary to thread is deciding
what value is acceptable latency for the rest of the system; and why
bother to kick off a second thread if all it does is service a tasklet
-- would we be better served by a dedicated thread per-submission-thread?
(Which might sound familiar -- it's what we used before tasklets.) It's
definitely a tradeoff; our latency versus the rest of the system. Though
there were a few advantages to having an RT thread spawned for the
softirq (which I guess is because RT threads are granted a frequency
boost), and if we can reduce the number of irq-off spinlocks, that may
have much greater advantages across the driver that outweigh the cost of
the extra context switch in client wakeup.

> Sorry if this was perhaps discussed in response to Tvrtko's question.
> I didn't quite follow the fast/slow separation mentioned in that thread.

Our interrupt handlers are effectively split into 2 phases. First we
read the master control and various IIR, and ack the interrupt, after
which the GPU is free to raise an interrupt again. Then we process the
interrupt, handling the various bits asserted in the IIR (signaling
clients, feeding the GPU its next context, completing vblanks etc).
The first ack should be fast -- you can demonstrate the impact of not
doing the ack/handler split by measuring the throughput on multiple
engines (a context-switch event on the second engine will be noticeably
blocked by the interrupt handler on the first, if you are not careful).
(gen11+ breaks this split atm as we haven't worked out how we are meant
to ack the banked GT IIR yet, or at least no one has seen Icelake to
measure the impact.)
-Chris
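The two-phase ack/process split described above can be sketched roughly as follows. This is an illustrative outline only, not the real i915 handler: the register and helper names (gt_read, gt_write, the bit names) are invented, and the gen-specific ack sequence is considerably more involved.

```c
/* Hypothetical sketch of the ack/process split; names are invented. */
static irqreturn_t gpu_irq_handler(int irq, void *arg)
{
	struct gpu *gpu = arg;
	u32 master, iir;

	/* Phase 1: fast ack. Latch and clear the pending events so the
	 * GPU is free to raise the next interrupt immediately. */
	master = gt_read(gpu, MASTER_IRQ_CTL);
	if (!master)
		return IRQ_NONE;		/* shared line, not us */
	gt_write(gpu, MASTER_IRQ_CTL, 0);	/* gate delivery while acking */
	iir = gt_read(gpu, GT_IIR);
	gt_write(gpu, GT_IIR, iir);		/* ack the latched events */
	gt_write(gpu, MASTER_IRQ_CTL, master);	/* re-enable delivery */

	/* Phase 2: slower processing of what we latched; a context-switch
	 * event on another engine is no longer blocked behind us. */
	if (iir & GT_USER_INTERRUPT)
		signal_waiting_clients(gpu);
	if (iir & GT_CONTEXT_SWITCH)
		submit_next_context(gpu);
	if (iir & GT_VBLANK)
		complete_vblank(gpu);

	return IRQ_HANDLED;
}
```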
Re: [Intel-gfx] [PATCH] drm/i915: Delegate our irq handler to a thread
On 9/26/2019 7:25 AM, Chris Wilson wrote:
> Moving our primary irq handler to a RT thread incurs an extra 1us delay
> in processing interrupts. This is most noticeable in waking up client
> threads, where it adds about 20% of extra latency. It also imposes a
> delay in feeding the GPU, an extra 1us before signaling secondary
> engines and extra latency in resubmitting work to keep the GPU busy.
> The latter case is insignificant as the latency is hidden by the active
> GPU, and preempt-to-busy ensures that no extra latency is incurred for
> preemption.
>
> The benefit is that we reduced the impact on the rest of the system, the
> cycletest shows a reduction from 5us mean latency to 2us, with the
> maximum observed latency (in a 2 minute window) reduced by over 160us.

Hi Chris,

I'm curious to understand this a little better.
If the only benefit of this patch is improving overall system
performance, then can you say why i915 interrupt handling doesn't fit
into what I thought was the common usage of threaded interrupts...

Usually with request_threaded_irq(), I thought typically both handlers
are specified and so you'd only fall back to the threaded handler when
the interrupt context handler is overwhelmed?
Like so:

    while (HW events need some action) {
        do something ...
        if (too overwhelmed) /* ie. reduce load on system */
            return IRQ_WAKE_THREAD;
    }
    return IRQ_HANDLED;

Likely you considered something like the above.
I'm just looking to understand why using only the threaded interrupt
handler is best in this case.
(Also maybe discuss this a little bit further in the commit message...?)

Sorry if this was perhaps discussed in response to Tvrtko's question.
I didn't quite follow the fast/slow separation mentioned in that thread.
Thanks,
-Brian

> Signed-off-by: Chris Wilson
> Cc: Joonas Lahtinen
> Cc: Tvrtko Ursulin
> Cc: Clark Williams
> Cc: Sebastian Andrzej Siewior
> ---
> Note this should need the fixes in
> 20190926105644.16703-2-bige...@linutronix.de, by itself this should be a
> test vehicle to exercise that patch!
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index bc83f094065a..f3df7714a3f3 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -4491,8 +4491,8 @@ int intel_irq_install(struct drm_i915_private *dev_priv)
>
>  	intel_irq_reset(dev_priv);
>
> -	ret = request_irq(irq, intel_irq_handler(dev_priv),
> -			  IRQF_SHARED, DRIVER_NAME, dev_priv);
> +	ret = request_threaded_irq(irq, NULL, intel_irq_handler(dev_priv),
> +				   IRQF_SHARED, DRIVER_NAME, dev_priv);
>  	if (ret < 0) {
>  		dev_priv->drm.irq_enabled = false;
>  		return ret;
Re: [Intel-gfx] [PATCH] drm/i915: Delegate our irq handler to a thread
On 2019-09-26 16:40:34 [+0100], Chris Wilson wrote:
>
> It's all edge interrupts -- although for gen2/3 my memory is hazy. But
> the GPU (circa gen6) can generate more than enough interrupts to
> saturate a CPU.

:)

> -Chris

Sebastian
Re: [Intel-gfx] [PATCH] drm/i915: Delegate our irq handler to a thread
Quoting Chris Wilson (2019-09-26 16:40:34)
> Quoting Sebastian Andrzej Siewior (2019-09-26 16:32:52)
> > On 2019-09-26 16:24:59 [+0100], Chris Wilson wrote:
> > > > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > > > > index bc83f094065a..f3df7714a3f3 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_irq.c
> > > > > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > > > @@ -4491,8 +4491,8 @@ int intel_irq_install(struct drm_i915_private *dev_priv)
> > > > >
> > > > >  	intel_irq_reset(dev_priv);
> > > > >
> > > > > -	ret = request_irq(irq, intel_irq_handler(dev_priv),
> > > > > -			  IRQF_SHARED, DRIVER_NAME, dev_priv);
> > > > > +	ret = request_threaded_irq(irq, NULL, intel_irq_handler(dev_priv),
> > > > > +				   IRQF_SHARED, DRIVER_NAME, dev_priv);
> > > >
> > > > I think you should add IRQF_ONESHOT. Otherwise a LEVEL interrupt will
> > > > keep interrupting the CPU and you never manage to switch to the thread.
> > >
> > > The interrupts only keep coming if we feed the GPU, and we only feed
> > > the GPU if we service the interrupt. That should provide a natural
> > > quiescence :)
> >
> > In IRQ-context your primary handler gets invoked which wakes the thread
> > (whatever intel_irq_handler() returns). Then you leave the IRQ context
> > and should switch to your IRQ-handler. This will never happen because
> > the IRQ line remains asserted and the CPU ends up in the primary
> > handler again.
> > An EDGE typed IRQ wouldn't notice a difference. But a LEVEL typed IRQ
> > will remain asserted until the hardware de-asserts the interrupt again.
> > Since you never reach your thread handler, I don't see how this can
> > happen.
>
> It's all edge interrupts -- although for gen2/3 my memory is hazy. But
> the GPU (circa gen6) can generate more than enough interrupts to
> saturate a CPU.

So everything older than gen5 has MSI disabled it appears and needs
ONESHOT.
-Chris
Re: [Intel-gfx] [PATCH] drm/i915: Delegate our irq handler to a thread
Quoting Sebastian Andrzej Siewior (2019-09-26 16:32:52)
> On 2019-09-26 16:24:59 [+0100], Chris Wilson wrote:
> > > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > > > index bc83f094065a..f3df7714a3f3 100644
> > > > --- a/drivers/gpu/drm/i915/i915_irq.c
> > > > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > > @@ -4491,8 +4491,8 @@ int intel_irq_install(struct drm_i915_private *dev_priv)
> > > >
> > > >  	intel_irq_reset(dev_priv);
> > > >
> > > > -	ret = request_irq(irq, intel_irq_handler(dev_priv),
> > > > -			  IRQF_SHARED, DRIVER_NAME, dev_priv);
> > > > +	ret = request_threaded_irq(irq, NULL, intel_irq_handler(dev_priv),
> > > > +				   IRQF_SHARED, DRIVER_NAME, dev_priv);
> > >
> > > I think you should add IRQF_ONESHOT. Otherwise a LEVEL interrupt will
> > > keep interrupting the CPU and you never manage to switch to the thread.
> >
> > The interrupts only keep coming if we feed the GPU, and we only feed
> > the GPU if we service the interrupt. That should provide a natural
> > quiescence :)
>
> In IRQ-context your primary handler gets invoked which wakes the thread
> (whatever intel_irq_handler() returns). Then you leave the IRQ context
> and should switch to your IRQ-handler. This will never happen because
> the IRQ line remains asserted and the CPU ends up in the primary
> handler again.
> An EDGE typed IRQ wouldn't notice a difference. But a LEVEL typed IRQ
> will remain asserted until the hardware de-asserts the interrupt again.
> Since you never reach your thread handler, I don't see how this can
> happen.

It's all edge interrupts -- although for gen2/3 my memory is hazy. But
the GPU (circa gen6) can generate more than enough interrupts to
saturate a CPU.
-Chris
Re: [Intel-gfx] [PATCH] drm/i915: Delegate our irq handler to a thread
On 2019-09-26 16:24:59 [+0100], Chris Wilson wrote:
> > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > > index bc83f094065a..f3df7714a3f3 100644
> > > --- a/drivers/gpu/drm/i915/i915_irq.c
> > > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > @@ -4491,8 +4491,8 @@ int intel_irq_install(struct drm_i915_private *dev_priv)
> > >
> > >  	intel_irq_reset(dev_priv);
> > >
> > > -	ret = request_irq(irq, intel_irq_handler(dev_priv),
> > > -			  IRQF_SHARED, DRIVER_NAME, dev_priv);
> > > +	ret = request_threaded_irq(irq, NULL, intel_irq_handler(dev_priv),
> > > +				   IRQF_SHARED, DRIVER_NAME, dev_priv);
> >
> > I think you should add IRQF_ONESHOT. Otherwise a LEVEL interrupt will
> > keep interrupting the CPU and you never manage to switch to the thread.
>
> The interrupts only keep coming if we feed the GPU, and we only feed the
> GPU if we service the interrupt. That should provide a natural
> quiescence :)

In IRQ-context your primary handler gets invoked which wakes the thread
(whatever intel_irq_handler() returns). Then you leave the IRQ context
and should switch to your IRQ-handler. This will never happen because
the IRQ line remains asserted and the CPU ends up in the primary handler
again.
An EDGE typed IRQ wouldn't notice a difference. But a LEVEL typed IRQ
will remain asserted until the hardware de-asserts the interrupt again.
Since you never reach your thread handler, I don't see how this can
happen.

> -Chris

Sebastian
Re: [Intel-gfx] [PATCH] drm/i915: Delegate our irq handler to a thread
Quoting Sebastian Andrzej Siewior (2019-09-26 16:13:08)
> On 2019-09-26 15:25:38 [+0100], Chris Wilson wrote:
> > Moving our primary irq handler to a RT thread incurs an extra 1us delay
> > in processing interrupts. This is most noticeable in waking up client
> > threads, where it adds about 20% of extra latency. It also imposes a
> > delay in feeding the GPU, an extra 1us before signaling secondary
> > engines and extra latency in resubmitting work to keep the GPU busy.
> > The latter case is insignificant as the latency is hidden by the
> > active GPU, and preempt-to-busy ensures that no extra latency is
> > incurred for preemption.
> >
> > The benefit is that we reduced the impact on the rest of the system,
> > the cycletest shows a reduction from 5us mean latency to 2us, with the
> > maximum observed latency (in a 2 minute window) reduced by over 160us.
>
> cycletest is the name of the tool I don't know about, right? It is not
> cyclictest with a typo?

It's a tool like cyclictest that measures the timer latency in one
thread while flooding the system with GPU work, filling the page cache,
grabbing as many THP as possible etc. (It's actually called
gem_syslatency.)

> I don't know how many interrupts you have in a system but if this is the
> only one for the card then you could remove _irqsave() from locking if
> the lock is never observed in IRQ context (but then you have irqwork
> which requires this irqsave).

True, pushing the irq into a thread should allow us to re-evaluate a lot
of our spin_lock_irq.

> > Signed-off-by: Chris Wilson
> > Cc: Joonas Lahtinen
> > Cc: Tvrtko Ursulin
> > Cc: Clark Williams
> > Cc: Sebastian Andrzej Siewior
> > ---
> > Note this should need the fixes in
> > 20190926105644.16703-2-bige...@linutronix.de, by itself this should be
> > a test vehicle to exercise that patch!
> > ---
> >  drivers/gpu/drm/i915/i915_irq.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index bc83f094065a..f3df7714a3f3 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -4491,8 +4491,8 @@ int intel_irq_install(struct drm_i915_private *dev_priv)
> >
> >  	intel_irq_reset(dev_priv);
> >
> > -	ret = request_irq(irq, intel_irq_handler(dev_priv),
> > -			  IRQF_SHARED, DRIVER_NAME, dev_priv);
> > +	ret = request_threaded_irq(irq, NULL, intel_irq_handler(dev_priv),
> > +				   IRQF_SHARED, DRIVER_NAME, dev_priv);
>
> I think you should add IRQF_ONESHOT. Otherwise a LEVEL interrupt will
> keep interrupting the CPU and you never manage to switch to the thread.

The interrupts only keep coming if we feed the GPU, and we only feed the
GPU if we service the interrupt. That should provide a natural
quiescence :)
-Chris
Re: [Intel-gfx] [PATCH] drm/i915: Delegate our irq handler to a thread
On 2019-09-26 15:57:07 [+0100], Tvrtko Ursulin wrote:
> 2. What about our tasklets - with threaded irqs we don't need them any
> more, right? So in this case they just add additional latency.

If you enqueue / schedule tasklets from your threaded handler then this
will wake up ksoftirqd and perform the work there (based on my memory of
the code).

> Regards,
>
> Tvrtko

Sebastian
Re: [Intel-gfx] [PATCH] drm/i915: Delegate our irq handler to a thread
On 2019-09-26 15:25:38 [+0100], Chris Wilson wrote:
> Moving our primary irq handler to a RT thread incurs an extra 1us delay
> in processing interrupts. This is most noticeable in waking up client
> threads, where it adds about 20% of extra latency. It also imposes a
> delay in feeding the GPU, an extra 1us before signaling secondary
> engines and extra latency in resubmitting work to keep the GPU busy.
> The latter case is insignificant as the latency is hidden by the active
> GPU, and preempt-to-busy ensures that no extra latency is incurred for
> preemption.
>
> The benefit is that we reduced the impact on the rest of the system, the
> cycletest shows a reduction from 5us mean latency to 2us, with the
> maximum observed latency (in a 2 minute window) reduced by over 160us.

cycletest is the name of the tool I don't know about, right? It is not
cyclictest with a typo?

I don't know how many interrupts you have in a system but if this is the
only one for the card then you could remove _irqsave() from locking if
the lock is never observed in IRQ context (but then you have irqwork
which requires this irqsave).

> Signed-off-by: Chris Wilson
> Cc: Joonas Lahtinen
> Cc: Tvrtko Ursulin
> Cc: Clark Williams
> Cc: Sebastian Andrzej Siewior
> ---
> Note this should need the fixes in
> 20190926105644.16703-2-bige...@linutronix.de, by itself this should be a
> test vehicle to exercise that patch!
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index bc83f094065a..f3df7714a3f3 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -4491,8 +4491,8 @@ int intel_irq_install(struct drm_i915_private *dev_priv)
>
>  	intel_irq_reset(dev_priv);
>
> -	ret = request_irq(irq, intel_irq_handler(dev_priv),
> -			  IRQF_SHARED, DRIVER_NAME, dev_priv);
> +	ret = request_threaded_irq(irq, NULL, intel_irq_handler(dev_priv),
> +				   IRQF_SHARED, DRIVER_NAME, dev_priv);

I think you should add IRQF_ONESHOT. Otherwise a LEVEL interrupt will
keep interrupting the CPU and you never manage to switch to the thread.

>  	if (ret < 0) {
>  		dev_priv->drm.irq_enabled = false;
>  		return ret;

Sebastian
Re: [Intel-gfx] [PATCH] drm/i915: Delegate our irq handler to a thread
Quoting Tvrtko Ursulin (2019-09-26 15:57:07)
>
> On 26/09/2019 15:25, Chris Wilson wrote:
> > Moving our primary irq handler to a RT thread incurs an extra 1us delay
> > in processing interrupts. This is most noticeable in waking up client
> > threads, where it adds about 20% of extra latency. It also imposes a
> > delay in feeding the GPU, an extra 1us before signaling secondary
> > engines and extra latency in resubmitting work to keep the GPU busy.
> > The latter case is insignificant as the latency is hidden by the
> > active GPU, and preempt-to-busy ensures that no extra latency is
> > incurred for preemption.
> >
> > The benefit is that we reduced the impact on the rest of the system,
> > the cycletest shows a reduction from 5us mean latency to 2us, with the
> > maximum observed latency (in a 2 minute window) reduced by over 160us.
> >
> > Signed-off-by: Chris Wilson
> > Cc: Joonas Lahtinen
> > Cc: Tvrtko Ursulin
> > Cc: Clark Williams
> > Cc: Sebastian Andrzej Siewior
> > ---
> > Note this should need the fixes in
> > 20190926105644.16703-2-bige...@linutronix.de, by itself this should be
> > a test vehicle to exercise that patch!
> > ---
> >  drivers/gpu/drm/i915/i915_irq.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index bc83f094065a..f3df7714a3f3 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -4491,8 +4491,8 @@ int intel_irq_install(struct drm_i915_private *dev_priv)
> >
> >  	intel_irq_reset(dev_priv);
> >
> > -	ret = request_irq(irq, intel_irq_handler(dev_priv),
> > -			  IRQF_SHARED, DRIVER_NAME, dev_priv);
> > +	ret = request_threaded_irq(irq, NULL, intel_irq_handler(dev_priv),
> > +				   IRQF_SHARED, DRIVER_NAME, dev_priv);
> >  	if (ret < 0) {
> >  		dev_priv->drm.irq_enabled = false;
> >  		return ret;
>
> Two questions:
>
> 1. Should we split out the master_ctl handling into the fast handler?
> Although can we pass anything from the fast to threaded handler? If not
> we'd have to re-read the master_ctl from the threaded handler.

I did look at using the primary/threaded irq handler to do some
fast/slow handling (but RT is probably going to force the primary into a
thread, so the long term prospect there looked bleak ;)

In our pre-gen11 model we could certainly do the ack in the fast primary
handler, pushing the processing into the secondary thread. Post gen11 is
more annoying and we probably only do the master_ctl immediately. (It's
very tempting to put the GT handler into fast, but that defeats the
PREEMPT_RT objections ;)

We can accumulate into i915 anything we want to pass from primary to
secondary. Or we can use per-cpu handlers and data structures.

> 2. What about our tasklets - with threaded irqs we don't need them any
> more, right? So in this case they just add additional latency.

Our tasklets run immediately within the RT thread. That appears to be a
positive outcome of this. We keep our direct submission so our initial
latency is not impacted at all, and the impact of the thread latency is
minimised by keeping the GPU busy. So the only (easily?) measurable
impact is client wakeup. (I have a modicum of concern that there is a
priority inversion here for an RT client waiting on the GPU.)
-Chris
Re: [Intel-gfx] [PATCH] drm/i915: Delegate our irq handler to a thread
On 26/09/2019 15:25, Chris Wilson wrote:
> Moving our primary irq handler to a RT thread incurs an extra 1us delay
> in processing interrupts. This is most noticeable in waking up client
> threads, where it adds about 20% of extra latency. It also imposes a
> delay in feeding the GPU, an extra 1us before signaling secondary
> engines and extra latency in resubmitting work to keep the GPU busy.
> The latter case is insignificant as the latency is hidden by the active
> GPU, and preempt-to-busy ensures that no extra latency is incurred for
> preemption.
>
> The benefit is that we reduced the impact on the rest of the system, the
> cycletest shows a reduction from 5us mean latency to 2us, with the
> maximum observed latency (in a 2 minute window) reduced by over 160us.
>
> Signed-off-by: Chris Wilson
> Cc: Joonas Lahtinen
> Cc: Tvrtko Ursulin
> Cc: Clark Williams
> Cc: Sebastian Andrzej Siewior
> ---
> Note this should need the fixes in
> 20190926105644.16703-2-bige...@linutronix.de, by itself this should be a
> test vehicle to exercise that patch!
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index bc83f094065a..f3df7714a3f3 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -4491,8 +4491,8 @@ int intel_irq_install(struct drm_i915_private *dev_priv)
>
>  	intel_irq_reset(dev_priv);
>
> -	ret = request_irq(irq, intel_irq_handler(dev_priv),
> -			  IRQF_SHARED, DRIVER_NAME, dev_priv);
> +	ret = request_threaded_irq(irq, NULL, intel_irq_handler(dev_priv),
> +				   IRQF_SHARED, DRIVER_NAME, dev_priv);
>  	if (ret < 0) {
>  		dev_priv->drm.irq_enabled = false;
>  		return ret;

Two questions:

1. Should we split out the master_ctl handling into the fast handler?
Although can we pass anything from the fast to threaded handler? If not
we'd have to re-read the master_ctl from the threaded handler.

2. What about our tasklets - with threaded irqs we don't need them any
more, right? So in this case they just add additional latency.

Regards,

Tvrtko
[Intel-gfx] [PATCH] drm/i915: Delegate our irq handler to a thread
Moving our primary irq handler to a RT thread incurs an extra 1us delay
in processing interrupts. This is most noticeable in waking up client
threads, where it adds about 20% of extra latency. It also imposes a
delay in feeding the GPU, an extra 1us before signaling secondary
engines and extra latency in resubmitting work to keep the GPU busy. The
latter case is insignificant as the latency is hidden by the active GPU,
and preempt-to-busy ensures that no extra latency is incurred for
preemption.

The benefit is that we reduced the impact on the rest of the system, the
cycletest shows a reduction from 5us mean latency to 2us, with the
maximum observed latency (in a 2 minute window) reduced by over 160us.

Signed-off-by: Chris Wilson
Cc: Joonas Lahtinen
Cc: Tvrtko Ursulin
Cc: Clark Williams
Cc: Sebastian Andrzej Siewior
---
Note this should need the fixes in
20190926105644.16703-2-bige...@linutronix.de, by itself this should be a
test vehicle to exercise that patch!
---
 drivers/gpu/drm/i915/i915_irq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index bc83f094065a..f3df7714a3f3 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -4491,8 +4491,8 @@ int intel_irq_install(struct drm_i915_private *dev_priv)
 
 	intel_irq_reset(dev_priv);
 
-	ret = request_irq(irq, intel_irq_handler(dev_priv),
-			  IRQF_SHARED, DRIVER_NAME, dev_priv);
+	ret = request_threaded_irq(irq, NULL, intel_irq_handler(dev_priv),
+				   IRQF_SHARED, DRIVER_NAME, dev_priv);
 	if (ret < 0) {
 		dev_priv->drm.irq_enabled = false;
 		return ret;
-- 
2.23.0