Hi, On 2026-02-23 16:24:57 +0100, David Geier wrote: > The code wasn't compiling properly on Windows because __x86_64__ is not > defined in Visual C++. I've changed the code to use > > #if defined(__x86_64__) || defined(_M_X64)
Independently of this patchset I wonder if it'd be worth introducing a PG_ARCH_X64 or such, to avoid this kind of thing. > I've tested v8 of the patch (= v7 plus aforementioned changes) on > Windows. I'm reporting the best of 3 runs. > > lotsarows test with parallelism disabled: > > master: 2781 ms > v7: 2776 ms (timing_clock_source = 'system') > v7: 2091 ms (timing_clock_source = 'tsc') Nice. > pg_test_timing: > > master: 27.04 ns > v7: 16.59 ns (QueryxPerformanceCounter) > v7: 13.69 ns (RDTSCP) > v7: 9.42 ns (RDTSC) Very nice. Unfortunately, on linux, applying up to 0002 causes a small regression in pg_test_timing. With cpuidle disabled, performance governor, pinned to one core. pg_test_timing, turboboost disabled: 412f78c66ee 27.70 ns 0002 28.48 ns pg_test_timing, turboboost enabled: 412f78c66ee 20.41 ns 0002 21.04 ns However, I tried, but failed, to produce an actual EXPLAIN ANALYZE that shows that difference. All the differences I see are well below the run-to-run noise. Which makes sense - the increase in overhead here probably is visible because it increases the dependency chain inside the loop, which wouldn't be visible in a normal explain (and of course, with more patches applied, a lot more is gained). > From 25b58d2890e65a95ce426a0b80fab41c1c99bd8f Mon Sep 17 00:00:00 2001 > From: Lukas Fittl <[email protected]> > Date: Sat, 31 Jan 2026 08:49:46 -0800 > Subject: [PATCH v8 1/4] Check for HAVE__CPUIDEX and HAVE__GET_CPUID_COUNT > separately > > Previously we would only check for the availability of __cpuidex if > the related __get_cpuid_count was not available on a platform. But there > are cases where we want to be able to call __cpuidex as the only viable > option, specifically, when accessing a high leaf like VM Hypervisor > information (0x40000000), which __get_cpuid_count does not allow. > > This will be used in an future commit to access Hypervisor information > about the TSC frequency of x86 CPUs, where available. 
> > Note that __cpuidex is defined in cpuid.h for GCC/clang, but in intrin.h > for MSVC. Because we now set HAVE__CPUIDEX for GCC/clang when available, > adjust existing code to check for _MSC_VER when including intrin.h. > > Author: Lukas Fittl <[email protected]> > Reviewed-by: > Discussion: > https://www.postgresql.org/message-id/flat/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de > # Check for XSAVE intrinsics > diff --git a/meson.build b/meson.build > index ebfb85e93e5..312c919eaa4 100644 > --- a/meson.build > +++ b/meson.build > @@ -2080,7 +2080,8 @@ elif cc.links(''' > endif > > > -# Check for __get_cpuid_count() and __cpuidex() in a similar fashion. > +# Check for __get_cpuid_count() and __cpuidex() separately, since we > sometimes > +# need __cpuidex() even if __get_cpuid_count() is available. > if cc.links(''' > #include <cpuid.h> > int main(int arg, char **argv) > @@ -2091,8 +2092,13 @@ if cc.links(''' > ''', name: '__get_cpuid_count', > args: test_c_args) > cdata.set('HAVE__GET_CPUID_COUNT', 1) > -elif cc.links(''' > +endif > +if cc.links(''' > + #ifdef _MSC_VER > #include <intrin.h> > + #else > + #include <cpuid.h> > + #endif > int main(int arg, char **argv) > { > unsigned int exx[4] = {0, 0, 0, 0}; FWIW, this seems to trigger a warning locally: /srv/dev/build/postgres/m-dev-assert/meson-private/tmpw34r2pnc/testfile.c: In function 'main': /srv/dev/build/postgres/m-dev-assert/meson-private/tmpw34r2pnc/testfile.c:10:19: warning: pointer targets in passing argument 1 of '__cpuidex' differ in signe> 10 | __cpuidex(exx, 7, 0); | ^~~ | | | unsigned int * In file included from /srv/dev/build/postgres/m-dev-assert/meson-private/tmpw34r2pnc/testfile.c:5: /home/andres/build/gcc/master/install/lib/gcc/x86_64-pc-linux-gnu/16/include/cpuid.h:361:16: note: expected 'int *' but argument is of type 'unsigned int *' 361 | __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf) | ~~~~^~~~~~~~~~~~~~~ ----------- Checking if "__cpuidex" links: YES > diff --git 
a/src/port/pg_crc32c_sse42_choose.c > b/src/port/pg_crc32c_sse42_choose.c > index f586476964f..7a75380b483 100644 > --- a/src/port/pg_crc32c_sse42_choose.c > +++ b/src/port/pg_crc32c_sse42_choose.c > @@ -20,11 +20,11 @@ > > #include "c.h" > > -#if defined(HAVE__GET_CPUID) || defined(HAVE__GET_CPUID_COUNT) > +#if defined(HAVE__GET_CPUID) || defined(HAVE__GET_CPUID_COUNT) || > (defined(HAVE__CPUIDEX) && !defined(_MSC_VER)) > #include <cpuid.h> > #endif Why would we want to include cpuid.h with msvc if one of the other variables is defined? > -#if defined(HAVE__CPUID) || defined(HAVE__CPUIDEX) > +#if defined(HAVE__CPUID) || (defined(HAVE__CPUIDEX) && defined(_MSC_VER)) > #include <intrin.h> > #endif And here, why would we want to include intrin.h if HAVE__CPUID is defined? Seems like this should just be something like #if defined(HAVE__GET_CPUID) || defined(HAVE__GET_CPUID_COUNT) || defined(HAVE__CPUIDEX) #if defined(_MSC_VER) #include <intrin.h> #else #include <cpuid.h> #endif /* defined(_MSC_VER) */ #endif > From 2392d95626599a1b5562f9216eb8c334db99c932 Mon Sep 17 00:00:00 2001 > From: Lukas Fittl <[email protected]> > Date: Fri, 25 Jul 2025 17:57:20 -0700 > Subject: [PATCH v8 2/4] Timing: Streamline ticks to nanosecond conversion > across platforms > > The timing infrastructure (INSTR_* macros) measures time elapsed using > clock_gettime() on POSIX systems, which returns the time as nanoseconds, > and QueryPerformanceCounter() on Windows, which is a specialized timing > clock source that returns a tick counter that needs to be converted to > nanoseconds using the result of QueryPerformanceFrequency(). > > This conversion currently happens ad-hoc on Windows, e.g. when calling > INSTR_TIME_GET_NANOSEC, which calls QueryPerformanceFrequency() on every > invocation, despite the frequency being stable after program start, > incurring unnecessary overhead. It also causes a fractured implementation > where macros are defined differently between platforms. 
> > To ease code readability, and prepare for a future change that intends > to use a ticks-to-nanosecond conversion on x86-64 for TSC use, introduce > a new pg_ticks_to_ns() function that gets called on all platforms. > > This function relies on a separately initialized ticks_per_ns_scaled > value, that represents the conversion ratio. This value is initialized > from QueryPerformanceFrequency() on Windows, and set to zero on x86-64 > POSIX systems, which results in the ticks being treated as nanoseconds. > Other architectures always directly return the original ticks. > > To support this, pg_initialize_timing() is introduced, and is now > mandatory for both the backend and any frontend programs to call before > utilizing INSTR_* macros. I wonder if it's worth trying to transparently initialize in the overflow codepath. Probably not, but worth explicitly considering. > In passing modify pg_test_timing to reduce the per-loop overhead caused > by repeated divisions in INSTR_TIME_GET_NANOSEC when the ticks variable > has become very large. Instead diff first and then turn it into nanosecs. I'd like to see this broken out into a separate change. > diff --git a/src/bin/pg_test_timing/pg_test_timing.c > b/src/bin/pg_test_timing/pg_test_timing.c > index a5621251afc..9fd630a490a 100644 > @@ -182,9 +184,8 @@ test_timing(unsigned int duration) > bits; > > prev = cur; > - INSTR_TIME_SET_CURRENT(temp); > - cur = INSTR_TIME_GET_NANOSEC(temp); > - diff = cur - prev; > + INSTR_TIME_SET_CURRENT(cur); > + diff = INSTR_TIME_DIFF_NANOSEC(cur, prev); > > /* Did time go backwards? */ > if (unlikely(diff < 0)) FWIW, I don't think this needs a special INSTR_TIME macro, it could just use INSTR_TIME_SUBTRACT() and INSTR_TIME_GET_NANOSEC(). 
> diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c > index cb4e986092e..c8b233be16c 100644 > --- a/src/bin/pgbench/pgbench.c > +++ b/src/bin/pgbench/pgbench.c > @@ -7334,6 +7334,9 @@ main(int argc, char **argv) > initRandomState(&state[i].cs_func_rs); > } > > + /* initialize timing infrastructure (required for INSTR_* calls) */ > + pg_initialize_timing(); > + > /* opening connection... */ > con = doConnect(); > if (con == NULL) FWIW, I also verified that I am unable to measure overhead in pgbench due to the more expensive conversion. Not surprised, but it did seem like a possibility, because pgbench unfortunately always converts the gathered time to microseconds, rather than computing a difference between two timestamps. > + > +/* > + * Stores what the number of ticks needs to be multiplied with to end up > + * with nanoseconds using integer math. > + * > + * On certain platforms (currently Windows) the ticks to nanoseconds > conversion > + * requires floating point math because: > + * > + * sec = ticks / frequency_hz > + * ns = ticks / frequency_hz * 1,000,000,000 > + * ns = ticks * (1,000,000,000 / frequency_hz) > + * ns = ticks * (1,000,000 / frequency_khz) <-- now in kilohertz > + * > + * Here, 'ns' is usually a floating number. For example for a 2.5 GHz CPU > + * the scaling factor becomes 1,000,000 / 2,500,000 = 1.2. > + * > + * To be able to use integer math we work around the lack of precision. We > + * first scale the integer up and after the multiplication by the number > + * of ticks in INSTR_TIME_GET_NANOSEC() we divide again by the same value. > + * We picked the scaler such that it provides enough precision and is a > + * power-of-two which allows for shifting instead of doing an integer > + * division. We utilize unsigned integers even though ticks are stored as a > + * signed value because that encourages compilers to generate better > assembly. 
> + * On all other platforms we are using clock_gettime(), which uses > nanoseconds > + * as ticks. Hence, we set the multiplier to zero, which causes > pg_ticks_to_ns > + * to return the original value. > + */ > +uint64 ticks_per_ns_scaled = 0; > +uint64 max_ticks_no_overflow = 0; > + > +static void set_ticks_per_ns(void); > + > +void > +pg_initialize_timing() > +{ > + set_ticks_per_ns(); > +} > + > +#ifndef WIN32 > + > +static void > +set_ticks_per_ns() > +{ > + ticks_per_ns_scaled = 0; > + max_ticks_no_overflow = 0; > +} > + > +#else /* WIN32 */ > + > +/* GetTimerFrequency returns counts per second */ > +static inline double > +GetTimerFrequency(void) > +{ > + LARGE_INTEGER f; > + > + QueryPerformanceFrequency(&f); > + return (double) f.QuadPart; > +} > + > +static void > +set_ticks_per_ns() > +{ > + ticks_per_ns_scaled = INT64CONST(1000000000) * TICKS_TO_NS_PRECISION / > GetTimerFrequency(); This should probably use NS_PER_S. I wonder whether we should use an explicit shift here and in pg_ticks_to_ns(), to avoid having to rely on the compiler to do so for us. > +static inline int64 > +pg_ticks_to_ns(int64 ticks) > +{ > +#if defined(__x86_64__) || defined(_M_X64) > + int64 ns = 0; > + > + if (ticks_per_ns_scaled == 0) > + return ticks; There should be a comment explaining (or referencing another explanation of) why this exists. > + /* > + * Would multiplication overflow? If so perform computation in two > parts. > + * Check overflow without actually overflowing via: a * b > max <=> a > > + * max / b > + */ > + if (unlikely(ticks > (int64) max_ticks_no_overflow)) The "via" comment seems a bit misplaced, given that the transformation is not really utilized here (but at the point where max_ticks_no_overflow is computed). > + { > + /* > + * Compute how often the maximum number of ticks fits > completely into > + * the number of elapsed ticks and convert that number into > + * nanoseconds. Then multiply by the count to arrive at the > final > + * value. 
In a 2nd step we adjust the number of elapsed ticks > and > + * convert the remaining ticks. > + */ > + int64 count = ticks / max_ticks_no_overflow; > + int64 max_ns = max_ticks_no_overflow * > ticks_per_ns_scaled / TICKS_TO_NS_PRECISION; > + > + ns = max_ns * count; > + > + /* > + * Subtract the ticks that we now already accounted for, so > that they > + * don't get counted twice. > + */ > + ticks -= count * max_ticks_no_overflow; > + Assert(ticks >= 0); I think we could perhaps make the overflow case a good bit cheaper, by avoiding any divisions with a non-constant factor (assuming I haven't blown the logic below). Instead of doing a division we can "transform back" into the non-scaled representation, I think? ns = (ticks * ticks_per_ns_scaled) / TICKS_TO_NS_PRECISION equals, assuming arbitrary precision ns = (ticks / TICKS_TO_NS_PRECISION) * ticks_per_ns_scaled and not assuming arbitrary precision: count = ticks // TICKS_TO_NS_PRECISION rem_ticks = ticks - (count * TICKS_TO_NS_PRECISION) ns = count * ticks_per_ns_scaled + rem_ticks * ticks_per_ns_scaled // TICKS_TO_NS_PRECISION None of which afaict would overflow? > --- a/src/common/instr_time.c > +++ b/src/common/instr_time.c > @@ -20,8 +20,8 @@ > * Stores what the number of ticks needs to be multiplied with to end up > * with nanoseconds using integer math. > * > - * On certain platforms (currently Windows) the ticks to nanoseconds > conversion > - * requires floating point math because: > + * In certain cases (TSC on x86-64, and QueryPerformanceCounter on Windows) > + * the ticks to nanoseconds conversion requires floating point math because: > * > * sec = ticks / frequency_hz > * ns = ticks / frequency_hz * 1,000,000,000 > @@ -39,7 +39,7 @@ > * division. We utilize unsigned integers even though ticks are stored as a > * signed value because that encourages compilers to generate better > assembly. 
> * > - * On all other platforms we are using clock_gettime(), which uses > nanoseconds > + * In all other cases we are using clock_gettime(), which uses nanoseconds > * as ticks. Hence, we set the multiplier to zero, which causes > pg_ticks_to_ns > * to return the original value. > */ > @@ -48,16 +48,57 @@ uint64 max_ticks_no_overflow = 0; > > static void set_ticks_per_ns(void); > > +int timing_clock_source = TIMING_CLOCK_SOURCE_AUTO; > + > +#if defined(__x86_64__) || defined(_M_X64) > +/* Indicates if TSC instructions (RDTSC and RDTSCP) are usable. */ > +extern bool has_usable_tsc; > + > +static void tsc_initialize(void); > +static bool tsc_use_by_default(void); > +static void set_ticks_per_ns_for_tsc(void); > +static bool set_tsc_frequency_khz(void); > +static bool is_rdtscp_available(void); > +#endif > + > void > pg_initialize_timing() > { > +#if defined(__x86_64__) || defined(_M_X64) > + tsc_initialize(); > +#endif > + > + set_ticks_per_ns(); > +} > + > +bool > +pg_set_timing_clock_source(TimingClockSourceType source) > +{ > +#if defined(__x86_64__) || defined(_M_X64) > + switch (source) > + { > + case TIMING_CLOCK_SOURCE_AUTO: > + use_tsc = has_usable_tsc && tsc_use_by_default(); > + break; > + case TIMING_CLOCK_SOURCE_SYSTEM: > + use_tsc = false; > + break; > + case TIMING_CLOCK_SOURCE_TSC: > + if (!has_usable_tsc) /* Tell caller TSC is not > usable */ > + return false; > + use_tsc = true; > + break; > + } > +#endif > set_ticks_per_ns(); > + timing_clock_source = source; > + return true; > } Perhaps this should ensure that pg_initialize_timing() has already been called? 
> +bool > +check_timing_clock_source(int *newval, void **extra, GucSource source) > +{ > +#if defined(__x86_64__) || defined(_M_X64) > + pg_initialize_timing(); > + > + if (*newval == TIMING_CLOCK_SOURCE_TSC && !has_usable_tsc) > + { > + GUC_check_errdetail("TSC is not supported as fast clock > source"); > + return false; The GUC name doesn't refer to "fast", so probably this shouldn't either? > +const char * > +show_timing_clock_source() > +{ > +#if defined(__x86_64__) || defined(_M_X64) > + TimingClockSourceType effective_source = > pg_current_timing_clock_source(); > + > + switch (timing_clock_source) > + { > + case TIMING_CLOCK_SOURCE_AUTO: > + if (effective_source == TIMING_CLOCK_SOURCE_TSC) > + return "auto (tsc)"; > + else > + return "auto (system)"; > + case TIMING_CLOCK_SOURCE_SYSTEM: > + return "system"; > + case TIMING_CLOCK_SOURCE_TSC: > + return "tsc"; > + } > +#else > + switch (timing_clock_source) > + { > + case TIMING_CLOCK_SOURCE_AUTO: > + return "auto (system)"; > + case TIMING_CLOCK_SOURCE_SYSTEM: > + return "system"; > + } > +#endif Seems like it'd be nicer if we had one switch with the ifdef-ery inside the TIMING_CLOCK_SOURCE_AUTO case? If we add support for tsc based clock sources on arm as well, this would get a bit unmanageable. > +static uint32 tsc_frequency_khz = 0; > + > +/* > + * Decide whether we use the RDTSC/RDTSCP instructions at runtime, for > Linux/x86-64, > + * instead of incurring the overhead of a full clock_gettime() call. > + * > + * This can't be reliably determined at compile time, since the > + * availability of an "invariant" TSC (that is not affected by CPU > + * frequency changes) is dependent on the CPU architecture. Additionally, > + * there are cases where TSC availability is impacted by virtualization, > + * where a simple cpuid feature check would not be enough. > + */ > +static void > +tsc_initialize(void) > +{ > + /* > + * Compute baseline CPU peformance, determines speed at which the TSC > + * advances. 
> + */ > + if (!set_tsc_frequency_khz()) > + return; > + > + has_usable_tsc = is_rdtscp_available(); > +} > + > +/* > + * Decides whether to use TSC clock source if the user did not specify it > + * one way or the other, and it is available (checked separately). > + * > + * Currently only enabled by default on Linux, since Linux already does a > + * significant amount of work to determine whether TSC is a viable clock > + * source. > + */ > +static bool > +tsc_use_by_default() Postgres style is still funcname(void). > +{ > +#if defined(__linux__) > + FILE *fp = > fopen("/sys/devices/system/clocksource/clocksource0/current_clocksource", > "r"); > + char buf[128]; > + > + if (!fp) > + return false; > + > + if (fgets(buf, sizeof(buf), fp) != NULL && strcmp(buf, "tsc\n") == 0) > + { > + fclose(fp); > + return true; > + } > + > + fclose(fp); > +#endif > + > + return false; > +} I think this will often disable tsc on VMs, due to linux defaulting to kvm-clock in KVM VMs. Do we care about that? If the tsc is not actually viable, is it still listed in /sys/devices/system/clocksource/clocksource0/available_clocksource ? > + > +#define CPUID_HYPERVISOR_VMWARE(words) (words[1] == 0x61774d56 && words[2] > == 0x4d566572 && words[3] == 0x65726177) /* VMwareVMware */ > +#define CPUID_HYPERVISOR_KVM(words) (words[1] == 0x4b4d564b && words[2] == > 0x564b4d56 && words[3] == 0x0000004d) /* KVMKVMKVM */ > + > +static bool > +set_tsc_frequency_khz() > +{ > + uint32 r[4] = {0, 0, 0, 0}; > + > +#if defined(HAVE__GET_CPUID) > + __get_cpuid(0x15, &r[0] /* denominator */ , &r[1] /* numerator */ , > &r[2] /* hz */ , &r[3]); > +#elif defined(HAVE__CPUID) > + __cpuid(r, 0x15); > +#else > +#error cpuid instruction not available > +#endif > + > + if (r[2] > 0) > + { > + if (r[0] == 0 || r[1] == 0) > + return false; > + > + tsc_frequency_khz = r[2] / 1000 * r[1] / r[0]; > + return true; > + } I think there should be some explanation about what this is testing. 
That explanation could perhaps include a reference to the relevant documents. > + /* Some CPUs only report frequency in 16H */ Ditto. > +#if defined(HAVE__GET_CPUID) > + __get_cpuid(0x16, &r[0] /* base_mhz */ , &r[1], &r[2], &r[3]); > +#elif defined(HAVE__CPUID) > + __cpuid(r, 0x16); > +#else > +#error cpuid instruction not available > +#endif Perhaps we could package the __get_cpuid / __cpuid thing in a wrapper, instead of repeating the ifdefery three times? > index 985b6b5af88..e7191c5d6cd 100644 > --- a/src/include/portability/instr_time.h > +++ b/src/include/portability/instr_time.h > @@ -4,9 +4,10 @@ > * portable high-precision interval timing > * > * This file provides an abstraction layer to hide portability issues in > - * interval timing. On Unix we use clock_gettime(), and on Windows we use > - * QueryPerformanceCounter(). These macros also give some breathing room to > - * use other high-precision-timing APIs. > + * interval timing. On x86 we use the RDTSC/RDTSCP instruction directly in > + * certain cases, or alternatively clock_gettime() on Unix-like systems and > + * QueryPerformanceCounter() on Windows. These macros also give some > breathing > + * room to use other high-precision-timing APIs. > * > * The basic data type is instr_time, which all callers should treat as an > * opaque typedef. instr_time can store either an absolute time (of > @@ -17,10 +18,11 @@ > * > * INSTR_TIME_SET_ZERO(t) set t to zero (memset is > acceptable too) > * > - * INSTR_TIME_SET_CURRENT(t) set t to current time > + * INSTR_TIME_SET_CURRENT_FAST(t) set t to current time without waiting > + * for > instructions in out-of-order window > * > - * INSTR_TIME_SET_CURRENT_LAZY(t) set t to current time if t is zero, > - * > evaluates to whether t changed > + * INSTR_TIME_SET_CURRENT(t) set t to current time while waiting for > + * > instructions in OOO to retire > * > * INSTR_TIME_ADD(x, y) x += y > * I'd probably remove INSTR_TIME_SET_CURRENT_LAZY in a prep commit. 
> @@ -93,13 +95,54 @@ typedef struct instr_time > extern PGDLLIMPORT uint64 ticks_per_ns_scaled; > extern PGDLLIMPORT uint64 max_ticks_no_overflow; > > +#if defined(__x86_64__) || defined(_M_X64) > +#include <immintrin.h> Why do we need to include immintrin.h in instr_time.h? Including immintrin.h makes compilation a lot slower:

$ echo '#include <immintrin.h>'|gcc -ftime-report -xc -o /dev/null -c -
 Time variable                          wall               GGC
 phase setup             :   0.00 (  1%)   1905k (  6%)
 phase parsing           :   0.50 ( 99%)     30M ( 94%)
 preprocessing           :   0.09 ( 18%)   5375k ( 16%)
 lexical analysis        :   0.02 (  5%)       0 (  0%)
 parser (global)         :   0.31 ( 61%)     12M ( 39%)
 parser inl. func. body  :   0.08 ( 15%)     12M ( 39%)
 TOTAL                   :   0.51            32M

> int > @@ -46,10 +46,47 @@ main(int argc, char *argv[]) > /* initialize timing infrastructure (required for INSTR_* calls) */ > pg_initialize_timing(); > > - loop_count = test_timing(test_duration); > - > + /* > + * First, test default (non-fast) timing code. A clock source for that > is > + * always available. Hence, we can unconditionally output the result. > + */ > + loop_count = test_timing(test_duration, TIMING_CLOCK_SOURCE_SYSTEM, > false); > + output(loop_count); > + > +#if defined(__x86_64__) || defined(_M_X64) I don't love that test_timing.c now has architecture-specific checks. Could we abstract this a bit more? > + /* > + * If on a supported architecture, test the RDTSC clock source. This > clock > + * source is not always available. In that case the loop count will be 0 > + * and we don't print. > + * > + * We first emit RDTSCP timings, which is slower, and gets used for > higher > + * precision measurements when the TSC clock source is enabled. We emit > + * RDTSC second, which is used for faster timing measurements with lower > + * precision. 
> + */ > + printf("\n"); > + loop_count = test_timing(test_duration, TIMING_CLOCK_SOURCE_TSC, false); > + if (loop_count > 0) > + { > + output(loop_count); > + printf("\n"); > + > + /* Now, emit fast timing measurements */ > + loop_count = test_timing(test_duration, > TIMING_CLOCK_SOURCE_TSC, true); > + output(loop_count); > + printf("\n"); > + > + pg_set_timing_clock_source(TIMING_CLOCK_SOURCE_AUTO); > + if (pg_current_timing_clock_source() == TIMING_CLOCK_SOURCE_TSC) > + printf(_("TSC clock source will be used by default, > unless timing_clock_source is set to 'system'.\n")); > + else > + printf(_("TSC clock source will not be used by default, > unless timing_clock_source is set to 'tsc'.\n")); > + } > + else > + printf(_("TSC clock source is not usable. Likely unable to > determine TSC frequency. are you running in an unsupported virtualized > environment?.\n")); > +#endif > + A bit weird that most of the output stuff is handled in output(), but then some of it is handled directly in main() now, some of it in test_timing(). Greetings, Andres Freund
