Hi, On 2022-07-01 01:23:01 -0700, Lukas Fittl wrote: > On Fri, Jun 12, 2020 at 4:28 PM Andres Freund <and...@anarazel.de> wrote: > > > The attached patches are really just a prototype. I'm also not really > > planning to work on getting this into a "production ready" patchset > > anytime soon. I developed it primarily because I found it the overhead > > made it too hard to nail down in which part of a query tree performance > > changed. If somebody else wants to continue from here... > > > > I do think it's be a pretty significant improvement if we could reduce > > the timing overhead of EXPLAIN ANALYZE by this much. Even if requires a > > bunch of low-level code. > > > > Based on an off-list conversation with Andres, I decided to dust off this > old > patch for using rdtsc directly. The significant EXPLAIN ANALYZE performance > improvements (especially when using rdtsc instead of rdtsc*p*) seem to > warrant > giving this a more thorough look. > > See attached an updated patch (adding it to the July commitfest), with a few > changes: > > - Keep using clock_gettime() as a fallback if we decide to not use rdtsc
Yep. > - Fallback to /proc/cpuinfo for clock frequency, if cpuid(0x16) doesn't work I suspect that this might not be needed anymore. Seems like it'd be ok to just fall back to clock_gettime() in that case. > - In an abundance of caution, for now I've decided to only enable this if we > are on Linux/x86, and the current kernel clocksource is TSC (the kernel > has > quite sophisticated logic around making this decision, see [1]) I think our requirements are a bit lower than the kernel's - we're not tracking wall clock over an extended period... > Note that if we implemented the decision logic ourselves (instead of relying > on the Linux kernel), I'd be most worried about older virtualization > technology. In my understanding getting this right is notably more > complicated > than just checking cpuid, see [2]. > Known WIP problems with this patch version: > > * There appears to be a timing discrepancy I haven't yet worked out, where > the \timing data reported by psql doesn't match what EXPLAIN ANALYZE is > reporting. With Andres' earlier test case, I'm seeing a consistent ~700ms > higher for \timing than for the EXPLAIN ANALYZE time reported on the > server > side, only when rdtsc measurement is used -- its likely there is a problem > somewhere with how we perform the cycles to time conversion Could you explain a bit more what you're seeing? I just tested your patches and didn't see that here. > * Possibly related, the floating point handling for the cycles_to_sec > variable > is problematic in terms of precision (see FIXME, taken over from Andres' > POC) And probably also performance... > Open questions from me: > > 1) Do we need to account for different TSC offsets on different CPUs in SMP > systems? (the Linux kernel certainly has logic to that extent, but [3] > suggests this is no longer a problem on Nehalem and newer chips, i.e. > those > having an invariant TSC) I don't think we should cater to systems where we need that. > 2) Should we have a setting "--with-tsc" for configure? (instead of always > enabling it when on Linux/x86 with a TSC clocksource) Probably not worth it. > 3) Are there cases where we actually want to use rdtsc*p*? (i.e. wait for > current instructions to finish -- the prior discussion seemed to suggest > we don't want it for node instruction measurements, but possibly we do > want > this in other cases?) I was wondering about that too... Perhaps we should add a INSTR_TIME_SET_CURRENT_BARRIER() or such? > 4) Should we support using the "mrs" instruction on ARM? (which is similar > to > rdtsc, see [4]) I'd leave that for later personally. > #define NS_PER_S INT64CONST(1000000000) > #define US_PER_S INT64CONST(1000000) > #define MS_PER_S INT64CONST(1000) > @@ -95,17 +104,37 @@ typedef int64 instr_time; > > #define INSTR_TIME_SET_ZERO(t) ((t) = 0) > > -static inline instr_time pg_clock_gettime_ns(void) > +extern double cycles_to_sec; > + > +bool use_rdtsc; This should be extern and inside the ifdef below. > +#if defined(__x86_64__) && defined(__linux__) > +extern void pg_clock_gettime_initialize_rdtsc(void); > +#endif > + > +static inline instr_time pg_clock_gettime_ref_cycles(void) > { > struct timespec tmp; > > +#if defined(__x86_64__) && defined(__linux__) > + if (use_rdtsc) > + return __rdtsc(); > +#endif > + > clock_gettime(PG_INSTR_CLOCK, &tmp); > > return tmp.tv_sec * NS_PER_S + tmp.tv_nsec; > } > Greetings, Andres Freund