Jeff,

On Thu, 21 Jan 2016, Jeff Merkey wrote:
> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
> {
>       cycle_t delta;
>       s64 nsec;
> 
>       delta = timekeeping_get_delta(tkr);
> 
>       nsec = delta * tkr->mult + tkr->xtime_nsec;
>       nsec >>= tkr->shift;    << wrap caused here
> 
>       /* If arch requires, add in get_arch_timeoffset() */
>       return nsec + arch_gettimeoffset();
> }
> 
> You only have 64 bits of register and the numbers being calculated
> here are big.   By way of example, I observed the following during
> normal operations:
> 
> delta  (RAX)   |     tkr->mult (RDX)
> 
> 0x157876            0x65ee27
> 0xf1855               0x65f158
> 0x16cf05             0x65f408
> 303bc3                0x65f154
> 
> When this bug occurs different story.
> 
> delta  (RAX)    |     tkr->mult (RDX)
> 
> 0x243283994b8     0x65233
> 
> So it goes like this:
> 
>       nsec = delta * tkr->mult + tkr->xtime_nsec;
>         0x243283994b8 * 0x65233
>         imul   rax,rdx = 0xE6A2Ce1f1ea690a8
> 
>       nsec >>= tkr->shift;    << wrap caused here
>         sar    rax,cl  =  0xFFFFFFE6BFB3B7C3

That SAR is siomply wrong here. It must be an SHR and it is at least when I'm
looking at the assembly of my machine.

> the sar instruction doesn't just shift, it backfills the signedness of
> the value, so this instruction is not doing what the C code is asking
> it to do.  I am guessing that somewhere in this mass of macros,
> something may have gotten declared wrong or incomplete (declared
> signed ?).

There is no macro involved.

timekeeping_get_ns
{
        nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
}

> The assembler output for this section that calls the macro to
> calculate nsecs shows the sar instruction:
> 
>       delta = timekeeping_get_delta(tkr);
> 
>       nsec = delta * tkr->mult + tkr->xtime_nsec;
>      29b:     48 0f af c2             imul   %rdx,%rax
>      29f:     48 03 05 00 00 00 00    add    0x0(%rip),%rax        # 2a6
> <ktime_get_ts64+0xc6>
>       nsec >>= tkr->shift;
>      2a6:     48 d3 f8                sar    %cl,%rax

And this is fundamentally wrong. Why is the compiler emitting SAR instead of
SHR here? Here is the assembly output from my kernel:

        nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
     27e:       48 0f af c5             imul   %rbp,%rax
     282:       48 01 d8                add    %rbx,%rax
     285:       48 d3 e8                shr    %cl,%rax

        } while (read_seqcount_retry(&tk_core.seq, seq));


So the first thing which needs to be figured out is WHY this results in a SAR
on your compiler.

> There is another problem with the tkr->read returning an unchanging,
> unclearable number when this bug occurs for the delta value.  I
> appears for whatever reason the clock has gone to sleep or gone away
> and is no longer updating its counters.
> 
> static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
> {
>       cycle_t cycle_now, delta;
> 
>       /* read clocksource */
>       cycle_now = tkr->read(tkr->clock); << returns the same value after
> this bug happens
> 
>       /* calculate the delta since the last update_wall_time */
>       delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); <<
> cycle last is also the same value.
> 
>       return delta;
> }

If that value does not change, then the timekeeping update is not
running. That might happen because the timer interrupt is not happening or
whatever got wreckaged.
 
> I would check how these structs are defined and the vars in them to
> see if somewhere they are declared as signed values to the compiler,
> because that's what it thinks it was given to compile.

Sure. Here you go:

        nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;

delta, mult, xtime_nsec and shift are unsigned. The only signed value is nsec.

Does that issue go away if you apply the patch below?

Thanks,

        tglx

8<-----------
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 34b4cedfa80d..d405bcdf9d40 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -301,7 +301,7 @@ static inline u32 arch_gettimeoffset(void) { return 0; }
 static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 {
        cycle_t delta;
-       s64 nsec;
+       u64 nsec;
 
        delta = timekeeping_get_delta(tkr);
 

Reply via email to