One other item worth mentioning is the effect of this bug on the user space daemons. Since this bug is in all kernels from 3.6-v4.4, folks will see the following behavior of systemd when this bug fires off and the value wraps:
id 0a0c809bcdc0812145d9c35456e46335650601b7 reason: systemd-journald killed by SIGABRT time: Sun 10 Jan 2016 11:56:16 AM MST cmdline: /usr/lib/systemd/systemd-journald package: systemd-219-19.el7 uid: 0 (root) count: 2 Directory: /var/tmp/abrt/ccpp-2016-01-10-11:56:16-6281 id 38f7bb372d0c14cb912e9d225fb474b931889e77 reason: __epoll_wait_nocancel(): systemd-logind killed by SIGABRT time: Fri 08 Jan 2016 01:33:42 PM MST cmdline: /usr/lib/systemd/systemd-logind package: systemd-219-19.el7 uid: 0 (root) count: 39 Directory: /var/tmp/abrt/ccpp-2016-01-08-13:33:42-630 Reported: https://retrace.fedoraproject.org/faf/reports/bthash/307b26a77cc6d5005ce2fdf18ff010fe3dc94401 So if folks see aborts in systemd that look like the above, it's this bug. I traced the systemd crash as well and its caused by the system returning garbage via system calls to systemd when that rax value wraps. Jeff. On 1/21/16, Jeff Merkey <linux....@gmail.com> wrote: > Hi Thomas > > That patch works and so does the patch currently in linux-next. Yeah ! > > The code on your machine is apparently from the linux-next tree. In > 4.4 is busted and that was the build I was using since that's the last > build I have MDB released on. If you compare v4.4 vs. linux-next you > will see someone already has patched this problem -- its just not in > Linus tree yet. > > v4.4 > > static inline s64 timekeeping_get_ns(struct tk_read_base *tkr) > { > cycle_t delta; > s64 nsec; > > delta = timekeeping_get_delta(tkr); > > nsec = delta * tkr->mult + tkr->xtime_nsec; > nsec >>= tkr->shift; << causes sar to be used > > /* If arch requires, add in get_arch_timeoffset() */ > return nsec + arch_gettimeoffset(); > } > > linux-next > > static inline s64 timekeeping_get_ns(struct tk_read_base *tkr) > { > cycle_t delta; > u64 nsec; > > delta = timekeeping_get_delta(tkr); > > nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift; << shr used > here > > /* If arch requires, add in get_arch_timeoffset() */ > return (s64)(nsec + arch_gettimeoffset()); > } > > > This bug goes way back in the linux revs, so I have to retro apply > this patch back to v3.6 in the MDB patch series. This bug was > introduced in the late linux 3.6 kernels and exists in all of them up > to v4.4. > > Jeff > > > On 1/21/16, Thomas Gleixner <t...@linutronix.de> wrote: >> Jeff, >> >> On Thu, 21 Jan 2016, Jeff Merkey wrote: >>> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr) >>> { >>> cycle_t delta; >>> s64 nsec; >>> >>> delta = timekeeping_get_delta(tkr); >>> >>> nsec = delta * tkr->mult + tkr->xtime_nsec; >>> nsec >>= tkr->shift; << wrap caused here >>> >>> /* If arch requires, add in get_arch_timeoffset() */ >>> return nsec + arch_gettimeoffset(); >>> } >>> >>> You only have 64 bits of register and the numbers being calculated >>> here are big. By way of example, I observed the following during >>> normal operations: >>> >>> delta (RAX) | tkr->mult (RDX) >>> >>> 0x157876 0x65ee27 >>> 0xf1855 0x65f158 >>> 0x16cf05 0x65f408 >>> 303bc3 0x65f154 >>> >>> When this bug occurs different story. >>> >>> delta (RAX) | tkr->mult (RDX) >>> >>> 0x243283994b8 0x65233 >>> >>> So it goes like this: >>> >>> nsec = delta * tkr->mult + tkr->xtime_nsec; >>> 0x243283994b8 * 0x65233 >>> imul rax,rdx = 0xE6A2Ce1f1ea690a8 >>> >>> nsec >>= tkr->shift; << wrap caused here >>> sar rax,cl = 0xFFFFFFE6BFB3B7C3 >> >> That SAR is siomply wrong here. It must be an SHR and it is at least when >> I'm >> looking at the assembly of my machine. >> >>> the sar instruction doesn't just shift, it backfills the signedness of >>> the value, so this instruction is not doing what the C code is asking >>> it to do. I am guessing that somewhere in this mass of macros, >>> something may have gotten declared wrong or incomplete (declared >>> signed ?). >> >> There is no macro involved. >> >> timekeeping_get_ns >> { >> nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift; >> } >> >>> The assembler output for this section that calls the macro to >>> calculate nsecs shows the sar instruction: >>> >>> delta = timekeeping_get_delta(tkr); >>> >>> nsec = delta * tkr->mult + tkr->xtime_nsec; >>> 29b: 48 0f af c2 imul %rdx,%rax >>> 29f: 48 03 05 00 00 00 00 add 0x0(%rip),%rax # 2a6 >>> <ktime_get_ts64+0xc6> >>> nsec >>= tkr->shift; >>> 2a6: 48 d3 f8 sar %cl,%rax >> >> And this is fundamentally wrong. Why is the compiler emitting SAR instead >> of >> SHR here? Here is the assembly output from my kernel: >> >> nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift; >> 27e: 48 0f af c5 imul %rbp,%rax >> 282: 48 01 d8 add %rbx,%rax >> 285: 48 d3 e8 shr %cl,%rax >> >> } while (read_seqcount_retry(&tk_core.seq, seq)); >> >> >> So the first thing which needs to be figured out is WHY this results in a >> SAR >> on your compiler. >> >>> There is another problem with the tkr->read returning an unchanging, >>> unclearable number when this bug occurs for the delta value. I >>> appears for whatever reason the clock has gone to sleep or gone away >>> and is no longer updating its counters. >>> >>> static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr) >>> { >>> cycle_t cycle_now, delta; >>> >>> /* read clocksource */ >>> cycle_now = tkr->read(tkr->clock); << returns the same value after >>> this bug happens >>> >>> /* calculate the delta since the last update_wall_time */ >>> delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); << >>> cycle last is also the same value. >>> >>> return delta; >>> } >> >> If that value does not change, then the timekeeping update is not >> running. That might happen because the timer interrupt is not happening >> or >> whatever got wreckaged. >> >>> I would check how these structs are defined and the vars in them to >>> see if somewhere they are declared as signed values to the compiler, >>> because that's what it thinks it was given to compile. >> >> Sure. Here you go: >> >> nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift; >> >> delta, mult, xtime_nsec and shift are unsigned. The only signed value is >> nsec. >> >> Does that issue go away if you apply the patch below? >> >> Thanks, >> >> tglx >> >> 8<----------- >> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c >> index 34b4cedfa80d..d405bcdf9d40 100644 >> --- a/kernel/time/timekeeping.c >> +++ b/kernel/time/timekeeping.c >> @@ -301,7 +301,7 @@ static inline u32 arch_gettimeoffset(void) { return >> 0; >> } >> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr) >> { >> cycle_t delta; >> - s64 nsec; >> + u64 nsec; >> >> delta = timekeeping_get_delta(tkr); >> >> >