Hi Andrzej,

On Friday, 26 April 2024 18:13:02 CEST Nirmoy Das wrote:
> 
> On 4/23/2024 6:23 PM, Janusz Krzysztofik wrote:
> > From: Chris Wilson <ch...@chris-wilson.co.uk>
> >
> > The breadcrumbs use a GT wakeref for guarding the interrupt, but are
> > disarmed during release of the engine wakeref. This leaves a hole where
> > we may attach a breadcrumb just as the engine is parking (after it has
> > parked its breadcrumbs), execute the irq worker with some signalers still
> > attached, but never be woken again.
> >
> > That issue manifests itself in CI with IGT runner timeouts while tests
> > are waiting indefinitely for release of all GT wakerefs.
> >
> > <6> [209.151778] i915: Running 
> > live_engine_pm_selftests/live_engine_busy_stats
> > <7> [209.231628] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] 
> > disabling PW_5
> > <7> [209.231816] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] 
> > disabling PW_4
> > <7> [209.231944] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] 
> > disabling PW_3
> > <7> [209.232056] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] 
> > disabling PW_2
> > <7> [209.232166] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] 
> > disabling DC_off
> > <7> [209.232270] i915 0000:00:02.0: [drm:skl_enable_dc6 [i915]] Enabling DC6
> > <7> [209.232368] i915 0000:00:02.0: [drm:gen9_set_dc_state.part.0 [i915]] 
> > Setting DC state from 00 to 02
> > <4> [299.356116] [IGT] Inactivity timeout exceeded. Killing the current 
> > test with SIGQUIT.
> > ...
> > <6> [299.356526] sysrq: Show State
> > ...
> > <6> [299.373964] task:i915_selftest   state:D stack:11784 pid:5578  
> > tgid:5578  ppid:873    flags:0x00004002
> > <6> [299.373967] Call Trace:
> > <6> [299.373968]  <TASK>
> > <6> [299.373970]  __schedule+0x3bb/0xda0
> > <6> [299.373974]  schedule+0x41/0x110
> > <6> [299.373976]  intel_wakeref_wait_for_idle+0x82/0x100 [i915]
> > <6> [299.374083]  ? __pfx_var_wake_function+0x10/0x10
> > <6> [299.374087]  live_engine_busy_stats+0x9b/0x500 [i915]
> > <6> [299.374173]  __i915_subtests+0xbe/0x240 [i915]
> > <6> [299.374277]  ? __pfx___intel_gt_live_setup+0x10/0x10 [i915]
> > <6> [299.374369]  ? __pfx___intel_gt_live_teardown+0x10/0x10 [i915]
> > <6> [299.374456]  intel_engine_live_selftests+0x1c/0x30 [i915]
> > <6> [299.374547]  __run_selftests+0xbb/0x190 [i915]
> > <6> [299.374635]  i915_live_selftests+0x4b/0x90 [i915]
> > <6> [299.374717]  i915_pci_probe+0x10d/0x210 [i915]
> >
> > At the end of the interrupt worker, if there are no more engines awake,
> > disarm the breadcrumb and go to sleep.
> >
> > Fixes: 9d5612ca165a ("drm/i915/gt: Defer enabling the breadcrumb interrupt 
> > to after submission")
> > Closes: https://gitlab.freedesktop.org/drm/intel/issues/10026
> > Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
> > Cc: Andrzej Hajda <andrzej.ha...@intel.com>
> > Cc: <sta...@vger.kernel.org> # v5.12+
> > Signed-off-by: Janusz Krzysztofik <janusz.krzyszto...@linux.intel.com>
> 
> 
> Acked-by: Nirmoy Das <nirmoy....@intel.com>
> 
> I will let others/Andrzej r-b this as I am not very familiar with the code.

This patch should be familiar to you, could you please take a look?

Thanks,
Janusz

> 
> 
> Thanks,
> 
> Nirmoy
> 
> > ---
> >   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 15 +++++++--------
> >   1 file changed, 7 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c 
> > b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > index d650beb8ed22f..20b9b04ec1e0b 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > @@ -263,8 +263,13 @@ static void signal_irq_work(struct irq_work *work)
> >             i915_request_put(rq);
> >     }
> >   
> > +   /* Lazy irq enabling after HW submission */
> >     if (!READ_ONCE(b->irq_armed) && !list_empty(&b->signalers))
> >             intel_breadcrumbs_arm_irq(b);
> > +
> > +   /* And confirm that we still want irqs enabled before we yield */
> > +   if (READ_ONCE(b->irq_armed) && !atomic_read(&b->active))
> > +           intel_breadcrumbs_disarm_irq(b);
> >   }
> >   
> >   struct intel_breadcrumbs *
> > @@ -315,13 +320,7 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs 
> > *b)
> >             return;
> >   
> >     /* Kick the work once more to drain the signalers, and disarm the irq */
> > -   irq_work_sync(&b->irq_work);
> > -   while (READ_ONCE(b->irq_armed) && !atomic_read(&b->active)) {
> > -           local_irq_disable();
> > -           signal_irq_work(&b->irq_work);
> > -           local_irq_enable();
> > -           cond_resched();
> > -   }
> > +   irq_work_queue(&b->irq_work);
> >   }
> >   
> >   void intel_breadcrumbs_free(struct kref *kref)
> > @@ -404,7 +403,7 @@ static void insert_breadcrumb(struct i915_request *rq)
> >      * the request as it may have completed and raised the interrupt as
> >      * we were attaching it into the lists.
> >      */
> > -   if (!b->irq_armed || __i915_request_is_complete(rq))
> > +   if (!READ_ONCE(b->irq_armed) || __i915_request_is_complete(rq))
> >             irq_work_queue(&b->irq_work);
> >   }
> >   
> 




Reply via email to