Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker
On Fri, Jul 22, 2016 at 11:10:28AM +0100, Tvrtko Ursulin wrote:
>
> Would canceling the idle worker be too expensive?

It wasn't so much that; I was trying to keep runtime suspend simple.
That is, the GT takes the wakelock to prevent suspend as required,
rather than having knowledge of all the users of the device inside the
runtime management callbacks.

(It means the users then have to be conscious that if they don't hold
an explicit wakelock, they should check rpm first.)
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
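The "check rpm first" rule above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the i915 code: `struct rpm_model`, `rpm_get()`, `rpm_put()`, `rpm_get_if_in_use()` and `background_user()` are hypothetical names invented here, and a plain atomic counter stands in for the device's wakelock count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device's runtime-PM state. */
struct rpm_model {
	atomic_int wakerefs;	/* 0 means the device may be suspended */
};

static void rpm_init(struct rpm_model *rpm)
{
	atomic_init(&rpm->wakerefs, 0);
}

/* Explicit wakelock: the holder keeps the device awake until rpm_put(). */
static void rpm_get(struct rpm_model *rpm)
{
	atomic_fetch_add(&rpm->wakerefs, 1);
}

static void rpm_put(struct rpm_model *rpm)
{
	int old = atomic_fetch_sub(&rpm->wakerefs, 1);

	assert(old > 0);	/* unbalanced put */
}

/* Take a reference only if somebody else already keeps the device awake. */
static bool rpm_get_if_in_use(struct rpm_model *rpm)
{
	int old = atomic_load(&rpm->wakerefs);

	while (old > 0) {
		/* On failure, 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&rpm->wakerefs, &old, old + 1))
			return true;
	}
	return false;
}

/* A user without its own wakelock: check rpm first, else skip the work. */
static bool background_user(struct rpm_model *rpm, int *hw_accesses)
{
	if (!rpm_get_if_in_use(rpm))
		return false;	/* device may be asleep; do not touch it */

	(*hw_accesses)++;	/* stands in for a register access */
	rpm_put(rpm);
	return true;
}
```

In this model the suspend path never needs a list of users: whoever is busy holds a reference, and everyone else backs off when the count is zero.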
Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker
On 21/07/16 12:04, Chris Wilson wrote:
>On Thu, Jul 21, 2016 at 11:28:02AM +0100, Tvrtko Ursulin wrote:
>>
>>On 21/07/16 11:10, Chris Wilson wrote:
>>>On Thu, Jul 21, 2016 at 10:58:05AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>>On 21/07/16 07:57, Chris Wilson wrote:
>>>>>During the idle-worker we disable the hangcheck and so kick any waiters
>>>>>that should have been completed (since the GPU is now idle). Unlike the
>>>>>hangcheck, we do not take any care to avoid the race between the irq
>>>>>handler and ourselves, and so it is possible for us to declare a missed
>>>>>interrupt even as the bottom-half is being scheduled to run. Let's
>>>>>ignore this race to stop a potential false-positive error.
>>>>
>>>>If the bottom half is scheduled to run then...
>>>>
>>>>>References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
>>>>>Signed-off-by: Chris Wilson
>>>>>Cc: Joonas Lahtinen
>>>>>Cc: Tvrtko Ursulin
>>>>>---
>>>>> drivers/gpu/drm/i915/i915_gem.c | 7 +++
>>>>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>>>>
>>>>>diff --git a/drivers/gpu/drm/i915/i915_gem.c
>>>>>b/drivers/gpu/drm/i915/i915_gem.c
>>>>>index 40047eb48826..9e826585edb2 100644
>>>>>--- a/drivers/gpu/drm/i915/i915_gem.c
>>>>>+++ b/drivers/gpu/drm/i915/i915_gem.c
>>>>>@@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work)
>>>>> 	rearm_hangcheck = false;
>>>>>
>>>>> 	stuck_engines = intel_kick_waiters(dev_priv);
>>>>
>>>>... this will not return a stuck engine since there is a bh
>>>>task assigned until the bh exits.
>>>
>>>It reports if it wakes up a waiter on any engine. If the bh is already
>>>running, we cannot know if it has missed the seqno update. If it isn't
>>>running yet, we cannot know if it is about to be run.
>>
>>Oh I read the logic as the complete opposite of what it is.
>>
>>Since the idle worker runs 100ms after last retirement, that would
>>mean a really slow waiter or what?
>
>It is dubious. But the idle worker runs 100ms after the first time we
>detect all engines are idle and may be running as we detect all engines
>are idle again. The only thing for sure is that in some cases bdw-u is
>reaching the idle-worker with an unwoken engine (and that there is a
>race here in declaring it as a missed interrupt). I wasn't that
>concerned about the race because of that 100ms delay where everything
>should have been idle, but on reflection that 100ms is not guaranteed.

Would canceling the idle worker be too expensive?

Either way, looks OK to me.

Reviewed-by: Tvrtko Ursulin

Regards,

Tvrtko
Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker
On Thu, Jul 21, 2016 at 11:28:02AM +0100, Tvrtko Ursulin wrote:
>
> On 21/07/16 11:10, Chris Wilson wrote:
> >On Thu, Jul 21, 2016 at 10:58:05AM +0100, Tvrtko Ursulin wrote:
> >>
> >>On 21/07/16 07:57, Chris Wilson wrote:
> >>>During the idle-worker we disable the hangcheck and so kick any waiters
> >>>that should have been completed (since the GPU is now idle). Unlike the
> >>>hangcheck, we do not take any care to avoid the race between the irq
> >>>handler and ourselves, and so it is possible for us to declare a missed
> >>>interrupt even as the bottom-half is being scheduled to run. Let's
> >>>ignore this race to stop a potential false-positive error.
> >>
> >>If the bottom half is scheduled to run then...
> >>
> >>>References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
> >>>Signed-off-by: Chris Wilson
> >>>Cc: Joonas Lahtinen
> >>>Cc: Tvrtko Ursulin
> >>>---
> >>> drivers/gpu/drm/i915/i915_gem.c | 7 +++
> >>> 1 file changed, 3 insertions(+), 4 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_gem.c
> >>>b/drivers/gpu/drm/i915/i915_gem.c
> >>>index 40047eb48826..9e826585edb2 100644
> >>>--- a/drivers/gpu/drm/i915/i915_gem.c
> >>>+++ b/drivers/gpu/drm/i915/i915_gem.c
> >>>@@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work)
> >>> 	rearm_hangcheck = false;
> >>>
> >>> 	stuck_engines = intel_kick_waiters(dev_priv);
> >>
> >>... this will not return a stuck engine since there is a bh
> >>task assigned until the bh exits.
> >
> >It reports if it wakes up a waiter on any engine. If the bh is already
> >running, we cannot know if it has missed the seqno update. If it isn't
> >running yet, we cannot know if it is about to be run.
>
> Oh I read the logic as the complete opposite of what it is.
>
> Since the idle worker runs 100ms after last retirement, that would
> mean a really slow waiter or what?

It is dubious. But the idle worker runs 100ms after the first time we
detect all engines are idle and may be running as we detect all engines
are idle again. The only thing for sure is that in some cases bdw-u is
reaching the idle-worker with an unwoken engine (and that there is a
race here in declaring it as a missed interrupt). I wasn't that
concerned about the race because of that 100ms delay where everything
should have been idle, but on reflection that 100ms is not guaranteed.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker
On 21/07/16 11:10, Chris Wilson wrote:
>On Thu, Jul 21, 2016 at 10:58:05AM +0100, Tvrtko Ursulin wrote:
>>
>>On 21/07/16 07:57, Chris Wilson wrote:
>>>During the idle-worker we disable the hangcheck and so kick any waiters
>>>that should have been completed (since the GPU is now idle). Unlike the
>>>hangcheck, we do not take any care to avoid the race between the irq
>>>handler and ourselves, and so it is possible for us to declare a missed
>>>interrupt even as the bottom-half is being scheduled to run. Let's
>>>ignore this race to stop a potential false-positive error.
>>
>>If the bottom half is scheduled to run then...
>>
>>>References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
>>>Signed-off-by: Chris Wilson
>>>Cc: Joonas Lahtinen
>>>Cc: Tvrtko Ursulin
>>>---
>>> drivers/gpu/drm/i915/i915_gem.c | 7 +++
>>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>>
>>>diff --git a/drivers/gpu/drm/i915/i915_gem.c
>>>b/drivers/gpu/drm/i915/i915_gem.c
>>>index 40047eb48826..9e826585edb2 100644
>>>--- a/drivers/gpu/drm/i915/i915_gem.c
>>>+++ b/drivers/gpu/drm/i915/i915_gem.c
>>>@@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work)
>>> 	rearm_hangcheck = false;
>>>
>>> 	stuck_engines = intel_kick_waiters(dev_priv);
>>
>>... this will not return a stuck engine since there is a bh
>>task assigned until the bh exits.
>
>It reports if it wakes up a waiter on any engine. If the bh is already
>running, we cannot know if it has missed the seqno update. If it isn't
>running yet, we cannot know if it is about to be run.

Oh I read the logic as the complete opposite of what it is.

Since the idle worker runs 100ms after last retirement, that would
mean a really slow waiter or what?

Regards,

Tvrtko
Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker
On Thu, Jul 21, 2016 at 10:58:05AM +0100, Tvrtko Ursulin wrote:
>
> On 21/07/16 07:57, Chris Wilson wrote:
> >During the idle-worker we disable the hangcheck and so kick any waiters
> >that should have been completed (since the GPU is now idle). Unlike the
> >hangcheck, we do not take any care to avoid the race between the irq
> >handler and ourselves, and so it is possible for us to declare a missed
> >interrupt even as the bottom-half is being scheduled to run. Let's
> >ignore this race to stop a potential false-positive error.
>
> If the bottom half is scheduled to run then...
>
> >References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
> >Signed-off-by: Chris Wilson
> >Cc: Joonas Lahtinen
> >Cc: Tvrtko Ursulin
> >---
> > drivers/gpu/drm/i915/i915_gem.c | 7 +++
> > 1 file changed, 3 insertions(+), 4 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c
> >b/drivers/gpu/drm/i915/i915_gem.c
> >index 40047eb48826..9e826585edb2 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work)
> > 	rearm_hangcheck = false;
> >
> > 	stuck_engines = intel_kick_waiters(dev_priv);
>
> ... this will not return a stuck engine since there is a bh
> task assigned until the bh exits.

It reports if it wakes up a waiter on any engine. If the bh is already
running, we cannot know if it has missed the seqno update. If it isn't
running yet, we cannot know if it is about to be run.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
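The point that "kicked a waiter" does not imply "missed an interrupt" can be shown with a toy, single-threaded model. It is a sketch only: apart from the idea of kicking waiters taken from the thread (`intel_kick_waiters()`), every name below (`struct engine_model`, `irq_handler()`, `bottom_half()`, `kick_waiters()`, `truly_missed()`, `NUM_ENGINES`) is hypothetical. The `irq_raised` flag is ground truth that exists only in the model; the real idle worker cannot observe it reliably, which is the whole problem.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_ENGINES 2	/* hypothetical engine count for the model */

struct engine_model {
	bool waiter_asleep;	/* a waiter has not been woken yet */
	bool irq_raised;	/* model-only ground truth: the irq did fire */
	bool bh_pending;	/* bottom half scheduled but not yet run */
};

/* Interrupt handler: records the irq and only schedules the bottom half. */
static void irq_handler(struct engine_model *e)
{
	e->irq_raised = true;
	e->bh_pending = true;
}

/* Bottom half: this is what would normally wake the waiter. */
static void bottom_half(struct engine_model *e)
{
	if (e->bh_pending) {
		e->bh_pending = false;
		e->waiter_asleep = false;
	}
}

/* Wake every sleeping waiter; return a mask of engines that were kicked. */
static unsigned int kick_waiters(struct engine_model *engines)
{
	unsigned int mask = 0;

	for (int i = 0; i < NUM_ENGINES; i++) {
		if (engines[i].waiter_asleep) {
			engines[i].waiter_asleep = false;
			mask |= 1u << i;
		}
	}
	return mask;
}

/* Engines whose waiter sleeps although no interrupt ever arrived. */
static unsigned int truly_missed(const struct engine_model *engines)
{
	unsigned int mask = 0;

	for (int i = 0; i < NUM_ENGINES; i++) {
		if (engines[i].waiter_asleep && !engines[i].irq_raised)
			mask |= 1u << i;
	}
	return mask;
}
```

If the interrupt has fired on engine 0 but its bottom half has not yet run when the kick happens, `kick_waiters()` reports engine 0 even though `truly_missed()` does not. OR-ing the kick mask into a missed-irq mask would then record a false positive, which is why the patch only logs the mask.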
Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker
On 21/07/16 07:57, Chris Wilson wrote:
>During the idle-worker we disable the hangcheck and so kick any waiters
>that should have been completed (since the GPU is now idle). Unlike the
>hangcheck, we do not take any care to avoid the race between the irq
>handler and ourselves, and so it is possible for us to declare a missed
>interrupt even as the bottom-half is being scheduled to run. Let's
>ignore this race to stop a potential false-positive error.

If the bottom half is scheduled to run then...

>References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
>Signed-off-by: Chris Wilson
>Cc: Joonas Lahtinen
>Cc: Tvrtko Ursulin
>---
> drivers/gpu/drm/i915/i915_gem.c | 7 +++
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/i915_gem.c
>b/drivers/gpu/drm/i915/i915_gem.c
>index 40047eb48826..9e826585edb2 100644
>--- a/drivers/gpu/drm/i915/i915_gem.c
>+++ b/drivers/gpu/drm/i915/i915_gem.c
>@@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work)
> 	rearm_hangcheck = false;
>
> 	stuck_engines = intel_kick_waiters(dev_priv);

... this will not return a stuck engine since there is a bh task
assigned until the bh exits.

So I don't get it. :)

Regards,

Tvrtko

>-	if (unlikely(stuck_engines)) {
>-		DRM_DEBUG_DRIVER("kicked stuck waiters...missed irq\n");
>-		dev_priv->gpu_error.missed_irq_rings |= stuck_engines;
>-	}
>+	if (unlikely(stuck_engines))
>+		DRM_DEBUG_DRIVER("kicked stuck waiters (%x)...missed irq?\n",
>+				 stuck_engines);
>
> 	if (INTEL_GEN(dev_priv) >= 6)
> 		gen6_rps_idle(dev_priv);