Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker

2016-07-22 Thread Chris Wilson
On Fri, Jul 22, 2016 at 11:10:28AM +0100, Tvrtko Ursulin wrote:
> 
> Would canceling the idle worker be too expensive?

It wasn't so much that; I was trying to keep runtime suspend simple:
the GT takes the wakelock to prevent suspend as required, so the runtime
management callbacks need no knowledge of all the users of the device.
(It means the users then have to be conscious that, if they don't hold
an explicit wakelock, they should check rpm first.)
-Chris
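
The "check rpm first" rule above can be illustrated with a minimal
userspace sketch. Everything here is a toy model for illustration: the
toy_* names and struct toy_device are hypothetical and are not the i915
or runtime-PM API. The suspend callback only looks at the wakelock
count, and a worker that holds no wakelock of its own takes a reference
only if the device is already awake.

```c
#include <stdbool.h>

/* Toy stand-in for a runtime-managed device (hypothetical, not i915). */
struct toy_device {
	int wakerefs;		/* explicit wakelocks currently held */
	bool awake;		/* false once runtime suspend has completed */
	int hw_accesses;	/* counts simulated hardware accesses */
};

/* Take an explicit wakelock, resuming the device if necessary. */
static void toy_rpm_get(struct toy_device *dev)
{
	dev->wakerefs++;
	dev->awake = true;
}

/* Drop a reference taken by toy_rpm_get() or toy_rpm_get_if_awake(). */
static void toy_rpm_put(struct toy_device *dev)
{
	dev->wakerefs--;
}

/*
 * Runtime suspend callback: it only checks the wakelock count and needs
 * no knowledge of who the individual users are.
 */
static bool toy_runtime_suspend(struct toy_device *dev)
{
	if (dev->wakerefs > 0)
		return false;
	dev->awake = false;
	return true;
}

/*
 * For users without their own wakelock: take a reference only if the
 * device is already awake, and never wake it up.
 */
static bool toy_rpm_get_if_awake(struct toy_device *dev)
{
	if (!dev->awake)
		return false;
	dev->wakerefs++;
	return true;
}

/* A background worker that holds no wakelock of its own. */
static bool toy_idle_worker(struct toy_device *dev)
{
	if (!toy_rpm_get_if_awake(dev))
		return false;	/* device is suspended: leave it alone */
	dev->hw_accesses++;	/* suspend is blocked while the ref is held */
	toy_rpm_put(dev);
	return true;
}
```

In this model the worker touches the hardware while another user holds
a wakelock, and backs off once the device has been suspended.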

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker

2016-07-22 Thread Tvrtko Ursulin


On 21/07/16 12:04, Chris Wilson wrote:

On Thu, Jul 21, 2016 at 11:28:02AM +0100, Tvrtko Ursulin wrote:

On 21/07/16 11:10, Chris Wilson wrote:

On Thu, Jul 21, 2016 at 10:58:05AM +0100, Tvrtko Ursulin wrote:


On 21/07/16 07:57, Chris Wilson wrote:

During the idle-worker we disable the hangcheck and so kick any waiters
that should have been completed (since the GPU is now idle). Unlike the
hangcheck, we do not take any care to avoid the race between the irq
handler and ourselves, and so it is possible for us to declare a missed
interrupt even as the bottom-half is being scheduled to run. Let's
ignore this race to stop a potential false-positive error.


If the bottom half is scheduled to run then..


References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
Signed-off-by: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Tvrtko Ursulin 
---
  drivers/gpu/drm/i915/i915_gem.c | 7 +++----
  1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 40047eb48826..9e826585edb2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work)
rearm_hangcheck = false;

stuck_engines = intel_kick_waiters(dev_priv);


... this will not return a stuck engine, since there is a bh
task assigned all the way until the bh exits.


It reports if it wakes up a waiter on any engine. If the bh is already
running, we cannot know if it has missed the seqno update. If it isn't
running yet, we cannot know if it is about to be run.


Oh, I read the logic as the complete opposite of what it is.

Since the idle worker runs 100ms after last retirement, that would
mean a really slow waiter or what?


It is dubious. But the idle worker runs 100ms after the first time we
detect all engines are idle, and may be running as we detect all engines
are idle again. The only thing for sure is that in some cases bdw-u
is reaching the idle-worker with an unwoken engine (and that there is
a race here in declaring it as a missed interrupt). I wasn't that
concerned about the race because of that 100ms delay where everything
should have been idle, but on reflection that 100ms is not guaranteed.


Would canceling the idle worker be too expensive?

Either way, looks OK to me.

Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko





Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker

2016-07-21 Thread Chris Wilson
On Thu, Jul 21, 2016 at 11:28:02AM +0100, Tvrtko Ursulin wrote:
> 
> On 21/07/16 11:10, Chris Wilson wrote:
> >On Thu, Jul 21, 2016 at 10:58:05AM +0100, Tvrtko Ursulin wrote:
> >>
> >>On 21/07/16 07:57, Chris Wilson wrote:
> >>>During the idle-worker we disable the hangcheck and so kick any waiters
> >>>that should have been completed (since the GPU is now idle). Unlike the
> >>>hangcheck, we do not take any care to avoid the race between the irq
> >>>handler and ourselves, and so it is possible for us to declare a missed
> >>>interrupt even as the bottom-half is being scheduled to run. Let's
> >>>ignore this race to stop a potential false-positive error.
> >>
> >>If the bottom half is scheduled to run then..
> >>
> >>>References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
> >>>Signed-off-by: Chris Wilson 
> >>>Cc: Joonas Lahtinen 
> >>>Cc: Tvrtko Ursulin 
> >>>---
> >>>  drivers/gpu/drm/i915/i915_gem.c | 7 +++----
> >>>  1 file changed, 3 insertions(+), 4 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> >>>b/drivers/gpu/drm/i915/i915_gem.c
> >>>index 40047eb48826..9e826585edb2 100644
> >>>--- a/drivers/gpu/drm/i915/i915_gem.c
> >>>+++ b/drivers/gpu/drm/i915/i915_gem.c
> >>>@@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work)
> >>>   rearm_hangcheck = false;
> >>>
> >>>   stuck_engines = intel_kick_waiters(dev_priv);
> >>
> >>... this will not return a stuck engine, since there is a bh
> >>task assigned all the way until the bh exits.
> >
> >It reports if it wakes up a waiter on any engine. If the bh is already
> >running, we cannot know if it has missed the seqno update. If it isn't
> >running yet, we cannot know if it is about to be run.
> 
> Oh, I read the logic as the complete opposite of what it is.
> 
> Since the idle worker runs 100ms after last retirement, that would
> mean a really slow waiter or what?

It is dubious. But the idle worker runs 100ms after the first time we
detect all engines are idle, and may be running as we detect all engines
are idle again. The only thing for sure is that in some cases bdw-u
is reaching the idle-worker with an unwoken engine (and that there is
a race here in declaring it as a missed interrupt). I wasn't that
concerned about the race because of that 100ms delay where everything
should have been idle, but on reflection that 100ms is not guaranteed.
-Chris
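
Why the 100ms is not guaranteed can be seen in a small sketch. It
assumes the worker is queued with semantics like the kernel's
queue_delayed_work(), which leaves an already-pending item (and its
timer) untouched; whether the driver queues it exactly this way is an
assumption here, and the toy_* names are hypothetical.

```c
#include <stdbool.h>

#define TOY_IDLE_DELAY_MS 100

/* Toy delayed work item with at most one pending timer. */
struct toy_delayed_work {
	bool pending;
	long expires_ms;
};

/*
 * Queue the work unless it is already pending. A pending timer is not
 * pushed back, so the expiry stays tied to the first request.
 * Returns true if the work was newly queued.
 */
static bool toy_queue_delayed(struct toy_delayed_work *w, long now_ms)
{
	if (w->pending)
		return false;
	w->pending = true;
	w->expires_ms = now_ms + TOY_IDLE_DELAY_MS;
	return true;
}

/*
 * How long the engines have actually been idle when the worker fires,
 * given the time at which idleness was most recently detected.
 */
static long toy_idle_time_at_expiry(const struct toy_delayed_work *w,
				    long last_idle_ms)
{
	return w->expires_ms - last_idle_ms;
}
```

If idleness is first detected at t=0, the GPU becomes busy again, and
idleness is detected once more at t=95, the second request does not
re-arm the timer: the worker still fires at t=100, only 5ms after the
engines last went idle.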

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker

2016-07-21 Thread Tvrtko Ursulin


On 21/07/16 11:10, Chris Wilson wrote:

On Thu, Jul 21, 2016 at 10:58:05AM +0100, Tvrtko Ursulin wrote:


On 21/07/16 07:57, Chris Wilson wrote:

During the idle-worker we disable the hangcheck and so kick any waiters
that should have been completed (since the GPU is now idle). Unlike the
hangcheck, we do not take any care to avoid the race between the irq
handler and ourselves, and so it is possible for us to declare a missed
interrupt even as the bottom-half is being scheduled to run. Let's
ignore this race to stop a potential false-positive error.


If the bottom half is scheduled to run then..


References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
Signed-off-by: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Tvrtko Ursulin 
---
  drivers/gpu/drm/i915/i915_gem.c | 7 +++----
  1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 40047eb48826..9e826585edb2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work)
rearm_hangcheck = false;

stuck_engines = intel_kick_waiters(dev_priv);


... this will not return a stuck engine, since there is a bh
task assigned all the way until the bh exits.


It reports if it wakes up a waiter on any engine. If the bh is already
running, we cannot know if it has missed the seqno update. If it isn't
running yet, we cannot know if it is about to be run.


Oh, I read the logic as the complete opposite of what it is.

Since the idle worker runs 100ms after last retirement, that would mean 
a really slow waiter or what?


Regards,

Tvrtko



Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker

2016-07-21 Thread Chris Wilson
On Thu, Jul 21, 2016 at 10:58:05AM +0100, Tvrtko Ursulin wrote:
> 
> On 21/07/16 07:57, Chris Wilson wrote:
> >During the idle-worker we disable the hangcheck and so kick any waiters
> >that should have been completed (since the GPU is now idle). Unlike the
> >hangcheck, we do not take any care to avoid the race between the irq
> >handler and ourselves, and so it is possible for us to declare a missed
> >interrupt even as the bottom-half is being scheduled to run. Let's
> >ignore this race to stop a potential false-positive error.
> 
> If the bottom half is scheduled to run then..
> 
> >References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
> >Signed-off-by: Chris Wilson 
> >Cc: Joonas Lahtinen 
> >Cc: Tvrtko Ursulin 
> >---
> >  drivers/gpu/drm/i915/i915_gem.c | 7 +++----
> >  1 file changed, 3 insertions(+), 4 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> >b/drivers/gpu/drm/i915/i915_gem.c
> >index 40047eb48826..9e826585edb2 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work)
> > rearm_hangcheck = false;
> >
> > stuck_engines = intel_kick_waiters(dev_priv);
> 
> ... this will not return a stuck engine, since there is a bh
> task assigned all the way until the bh exits.

It reports if it wakes up a waiter on any engine. If the bh is already
running, we cannot know if it has missed the seqno update. If it isn't
running yet, we cannot know if it is about to be run.
-Chris
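
A minimal sketch of the point, under stated assumptions: the toy_*
structures and functions below are hypothetical stand-ins, not the
driver's waiter tracking. The kick only reports which engines had a
sleeping waiter; it cannot see whether a bottom-half is already pending
for that engine, so treating every kicked waiter as a missed interrupt
can produce a false positive.

```c
#include <stdbool.h>

#define TOY_NUM_ENGINES 4

/* Toy per-engine state (hypothetical, not the i915 structures). */
struct toy_engine {
	bool has_waiter;	/* a waiter is still asleep on this engine */
	bool bh_pending;	/* irq seen, bottom-half queued but not run */
};

struct toy_gpu {
	struct toy_engine engine[TOY_NUM_ENGINES];
	unsigned int missed_irq_rings;
};

/*
 * Wake any sleeping waiter and return a mask of the engines on which
 * one was woken. Note that bh_pending is not consulted: the kick has
 * no way of telling whether the bottom-half was about to do the wakeup.
 */
static unsigned int toy_kick_waiters(struct toy_gpu *gpu)
{
	unsigned int mask = 0;

	for (int i = 0; i < TOY_NUM_ENGINES; i++) {
		if (gpu->engine[i].has_waiter) {
			gpu->engine[i].has_waiter = false;
			mask |= 1u << i;
		}
	}
	return mask;
}

/* Old behaviour: every kicked waiter is recorded as a missed irq. */
static unsigned int toy_idle_worker_old(struct toy_gpu *gpu)
{
	unsigned int stuck = toy_kick_waiters(gpu);

	gpu->missed_irq_rings |= stuck;
	return stuck;
}

/* Patched behaviour: still kick and report, but record nothing. */
static unsigned int toy_idle_worker_new(struct toy_gpu *gpu)
{
	return toy_kick_waiters(gpu);
}
```

With a waiter on engine 1 whose bottom-half is merely pending, the old
variant marks engine 1 as having missed an interrupt, while the patched
variant returns the same mask for logging and leaves missed_irq_rings
untouched.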

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker

2016-07-21 Thread Tvrtko Ursulin


On 21/07/16 07:57, Chris Wilson wrote:

During the idle-worker we disable the hangcheck and so kick any waiters
that should have been completed (since the GPU is now idle). Unlike the
hangcheck, we do not take any care to avoid the race between the irq
handler and ourselves, and so it is possible for us to declare a missed
interrupt even as the bottom-half is being scheduled to run. Let's
ignore this race to stop a potential false-positive error.


If the bottom half is scheduled to run then..


References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
Signed-off-by: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Tvrtko Ursulin 
---
  drivers/gpu/drm/i915/i915_gem.c | 7 +++----
  1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 40047eb48826..9e826585edb2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work)
rearm_hangcheck = false;

stuck_engines = intel_kick_waiters(dev_priv);


... this will not return a stuck engine, since there is a bh task
assigned all the way until the bh exits.


So I don't get it. :)

Regards,

Tvrtko


-	if (unlikely(stuck_engines)) {
-		DRM_DEBUG_DRIVER("kicked stuck waiters...missed irq\n");
-		dev_priv->gpu_error.missed_irq_rings |= stuck_engines;
-	}
+	if (unlikely(stuck_engines))
+		DRM_DEBUG_DRIVER("kicked stuck waiters (%x)...missed irq?\n",
+				 stuck_engines);

 	if (INTEL_GEN(dev_priv) >= 6)
 		gen6_rps_idle(dev_priv);

