On Thu, 5 Oct 2017, Gabriel Beddingfield wrote:

> Hi Thomas,
> 
> On Thu, Oct 5, 2017 at 11:01 AM, Thomas Gleixner <t...@linutronix.de> wrote:
> >> > Which SoC/clocksource driver are you talking about?
> >>
> >> NXP i.MX 6SoloX
> >> drivers/clocksource/timer-imx-gpt.c
> >
> > So that clocksource driver looks correct. Do you have an idea in which
> > context this time jump happens? Does it happen when you exercise your high
> > frequency suspend/resume dance or is that happening just when you let the
> > machine run forever as well?
> 
> We couldn't devise any reproduction steps. We observed it happening at
> unexpected times in a fleet of devices -- and we couldn't find any
> patterns to clue us in.

Ok. Did you talk to NXP about that? Or did you try to exercise reads in a
loop to detect the wreckage and maybe a pattern in there?

> > The timekeeping_resume() path definitely has an issue:
> >
> >         cycle_now = tk_clock_read(&tk->tkr_mono);
> >         if ((clock->flags & CLOCK_SOURCE_SUSPEND_NONSTOP) &&
> >                 cycle_now > tk->tkr_mono.cycle_last) {
> >
> > This works nice for clocksources which wont wrap across suspend/resume but
> > not for those which can. That cycle_now -> cycle_last check should take
> > cs-mask into account ...
> >
> > Of course for clocksources which can wrap within realistic suspend times,
> > which 36 hours might be accounted for, this would need an extra sanity
> > check against a RTC whether wrap time has been exceeded.
> >
> > I haven't thought it through whether that buggered check fully explains
> > what you are observing, but it's wrong nevertheless. John?
> 
> Nah. It looks like the consequence is that you'll either fail to inject
> the sleep time or you'll fall back to having the RTC inject the sleep
> time. In our case, we never sleep for more than a couple of minutes so
> the error would be seconds rather than hours.

Fair enough. It's still wrong though and wants to be fixed.

Thanks,

        tglx

Reply via email to