----- On Feb 14, 2017, at 3:02 PM, Trent Piepho tpie...@kymetacorp.com wrote:
> On Thu, 2017-02-09 at 20:51 -0500, mathieu.desnoyers wrote: >> On 32-bit systems, the current algorithm assume to have one clock read >> per 32-bit LSB overflow period, which is not guaranteed. It also has an >> issue on the first clock reads after module load, because the initial >> value for the last LSB is 0. It can cause the time to stay stuck at the >> same value for a few seconds at the beginning of the trace. > > I tested this on my 32-bit ARM system and it has solved the issue with > timestamp being fixed at zero until the first time the TSC wraps. > > I'm not sure how to properly test that NMIs do not generate any reverse > timestamps. NMIs don't exist on ARM. They have FIQ which are close (fast interrupts), but no NMIs. It would be a good thing to ensure that either FIQs are disabled when the OS updates the clock, or that no tracing is ever done in FIQ handlers. I don't think FIQs are disabled when updating the clock, but I suspect that no tracing is done in FIQ handlers. Also, lttng-modules, with that fix on ARM, just discard events that would come from NMI context (and since nmi context don't exist on ARM, it never discards anything). > > >> Fix this by using the non-nmi-safe clock source on 32-bit systems, >> except for x86 systems that support double-word cmpxchg (cmpxchg8b). > > Is there a reason to restrict this to x86 systems? ARM supports 64-bit > cmpxchg too, and indeed that's what I was testing. Since ARM don't have NMI, I don't see why we should care about making the clock source nmi-safe, especially since the algorithm adds a cmpxchg64_local (slight performance overhead). This might end up being useful if we ever want to trace fast interrupt handlers though. > >> + >> #if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,17,0) \ >> + && (BITS_PER_LONG == 64 || CONFIG_X86_CMPXCHG64) \ >> && !defined(LTTNG_CLOCK_NMI_SAFE_BROKEN)) > > This fails to compile on ARM, as CONFIG_X86_CMPXCHG64 is not defined. > From what I can see, all ARM have cmpxchg64_local() so I modified this > to allow ARM support, used defined(), etc. > > It also appears that even if CONFIG_X86_CMPXCHG64 is not defined on x86, > there is still support via another definition of cmpxchg64_local() using > cmpxchg8b_emu. Or does this version lack some atomic in the face of NMI > ability that the cmpxchg8b instruction would have? Indeed, cmpxchg8b_emu disables interrupts on UP within its emulation, which is not safe against NMIs. I am tempted to only use the nmi-safe clock source on architectures where it matters, and ARM32/64 don't have NMIs. So I would use the usual seqlock-protected clock source on ARM32/64, unless we end up figuring out that FIQ handlers are instrumented with tracepoints. For x86 32/64, when 64-bit local cmpxchg is natively available, I would like to use the nmi-safe clock source. However, using double-word CAS on x86-32 is something that would be a new behavior in lttng-modules, and I don't want to introduce this as a fix into the stable branches. This is why the commit that landed into master currently simply revert to the non-nmi-safe clock source on x86-32, and don't trace NMIs on x86-32 for now. I would be OK to introduce use of double-word CAS on x86-32 as a new feature for upcoming lttng-modules versions (2.10+). But I would only use it when CONFIG_X86_CMPXCHG64 is defined: it tells us that the architecture does have a 64-bit cmpxchg instruction. Thoughts ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev