Hi Nitin,
On Thu, Dec 05, 2024 at 04:03:38PM +0000, Gote, Nitin R wrote:
> > On Thu, Dec 05, 2024 at 05:27:36PM +0530, Nitin Gote wrote:
> > > Issue is seen again where engine resets fails because the engine
> > > resumes from an incorrect RING_HEAD. So, increase a time if at first
> > > the write doesn't succeed and retry again.
> > >
> > > Fixes: 6ef0e3ef2662 ("drm/i915/gt: Retry RING_HEAD reset until it get
> > > sticks")
> >
> > Is this a real fix or is it more of a fine tuning?
>
> Here we can say this for more fine tuning as issue seen again and
> that's why I have added fixes : 6ef0e3ef2662.
I thinkt he Fixes: tag is not necessary, here. You are not really
fixing a bug.
> > > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12806
> > > Signed-off-by: Nitin Gote <[email protected]>
> >
> > ...
> >
> > > @@ -231,7 +231,7 @@ static int xcs_resume(struct intel_engine_cs *engine)
> > > set_pp_dir(engine);
> > >
> > > /* First wake the ring up to an empty/idle ring */
> > > - for ((kt) = ktime_get() + (2 * NSEC_PER_MSEC);
> > > + for ((kt) = ktime_get() + (50 * NSEC_PER_MSEC);
> >
> > Where is this 50 coming from?
>
> Well, here HEAD is still not 0 even after writing in it.
> So, it could be the timing issue. I discussed this with Chris and we thought
> It is better to add 50ms instead of 2ms delay here to let HEAD write complete.
> I tested this on trybot for Haswell/Ivybridge platform
> https://patchwork.freedesktop.org/series/141779/ and
> I see BAT is successful and shards issues are not related.
Can you please add a comment explaining why you are using 50ms?
You can also mention that you experimented with different values
and determined that 50ms works best based on testing.
Andi