Hi,

See attached v11 (and moved to the PG19-2 commitfest), split into a new set of patches:

0001 - Improve the __cpuidex check added for a different purpose in 792752af4eb5 to:

- Fix a typo (configure was incorrectly checking for "__get_cpuidex", whereas meson.build was doing it correctly)
- Add support for non-MSVC compilers as well (e.g. GCC 11+), where __cpuidex is defined in cpuid.h, not intrin.h

This change should be independently committable, though I believe we wouldn't use cpuidex on non-MSVC compilers today in practice.

0002 - The core patch rebased, which, as before:

- Adds INSTR_TIME_SET_CURRENT_FAST (which uses RDTSC if available) and uses it for InstrStartNode/InstrStopNode
- Changes INSTR_TIME_SET_CURRENT to directly use RDTSCP if available (instead of pg_clock_gettime)
- Keeps using pg_clock_gettime for both unless we're on Linux x86 and the clocksource is set to "tsc" (see note below re: that aspect)

0003 - Changes to the pg_test_timing utility:

- Shows the time source in use (clock_gettime + clock type / RDTSC / RDTSCP)
- Allows checking the latency of the "fast" time source (RDTSC) with the new "--fast" option, and warns if it's not available
- Avoids the INSTR_TIME_GET_NANOSEC slowness that Andres reported, by diffing the ticks first and then calculating nanoseconds

Note the other pg_test_timing changes regarding nanoseconds should all have been addressed by 0b096e379e6f, I believe.

On Wed, Jul 16, 2025 at 5:48 PM Andres Freund <and...@anarazel.de> wrote:
> Applying just patch 2 results in a performance *regression* in pg_test_timing
> on my machine, which is due to always hitting the unlikely() path in
> INSTR_TIME_GET_NANOSEC() when INSTR_TIME_GET_NANOSEC() is used for an
> "absolute" timestamp, rather than a differential timestamp. Which in turn
> means hitting a division instruction every time, which on my slightly older
> hardware is slower. That may be because my workstation has been up for 40
> days, but clearly that can't lead us down to the slow-path

Assuming you didn't restart your workstation, can you retest with this patch set?
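
For illustration, here is a minimal sketch of why converting an absolute tick count can land on a division path while converting a tick difference does not. It assumes a hypothetical multiply-and-shift conversion with a division fallback. The names (tick_scale, make_scale, ticks_to_ns, TICKS_SHIFT) and the 3 GHz frequency used in the note below are made up for this example and are not the patch's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-point shift for the fast path (not taken from the patch). */
#define TICKS_SHIFT 14

typedef struct
{
    uint64_t khz;       /* assumed TSC frequency in kHz */
    uint64_t mult;      /* nanoseconds per tick, scaled by 2^TICKS_SHIFT */
    uint64_t max_fast;  /* largest tick count whose product fits in 64 bits */
} tick_scale;

static tick_scale
make_scale(uint64_t khz)
{
    tick_scale s;

    assert(khz > 0);
    s.khz = khz;
    /* ns = ticks * 1000000 / khz, precomputed as a scaled multiplier */
    s.mult = (UINT64_C(1000000) << TICKS_SHIFT) / khz;
    assert(s.mult > 0);
    s.max_fast = UINT64_MAX / s.mult;
    return s;
}

/*
 * Convert ticks to nanoseconds. Small values take the multiply-and-shift
 * path (approximate, since mult is rounded down); values that would
 * overflow the multiplication fall back to division.
 */
static uint64_t
ticks_to_ns(const tick_scale *s, uint64_t ticks, bool *slow_path)
{
    if (ticks <= s->max_fast)
    {
        *slow_path = false;
        return (ticks * s->mult) >> TICKS_SHIFT;
    }

    *slow_path = true;
    return (ticks / s->khz) * UINT64_C(1000000) +
        ((ticks % s->khz) * UINT64_C(1000000)) / s->khz;
}
```

In this sketch, a 3 GHz TSC after 40 days of uptime reads roughly 1.04e16 ticks, which is above the fast-path limit, so every conversion of an absolute reading divides. Subtracting two nearby readings first yields a small value that stays on the multiply-and-shift path.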
I believe the pg_test_timing changes should address this problem, by avoiding calculations with the absolute (very large) ticks value.

> Open questions I have:
> > - Could we rely on checking whether the TSC timesource is invariant (via
> >   CPUID), instead of relying on Linux choosing it as a clocksource?
>
> I don't see why not?

Thinking this through again, my worry would be that our detection logic for whether the TSC is safe to use directly is much less sophisticated than that of the Linux Kernel - and the Linux Kernel also allows configuring the clock source explicitly, if the detection goes wrong.

For example, David had previously brought up the worry that accessing the TSC directly in a VM can be very slow when the TSC is emulated. The Linux Kernel indeed has checks for this, e.g. in the context of Xen:

https://github.com/torvalds/linux/blob/b711733e89a3f84c8e1e56e2328f9a0fa5facc7c/arch/x86/xen/time.c#L490

Maybe introducing a GUC for this is the way to go, with an OS-dependent "auto" setting?

Thanks,
Lukas

--
Lukas Fittl

v11-0003-pg_test_timing-Add-fast-flag-to-test-fast-timing.patch
Description: Binary data
v11-0002-Use-time-stamp-counter-to-measure-time-on-Linux-.patch
Description: Binary data
v11-0001-cpuidex-check-Support-detecting-newer-GCC-versio.patch
Description: Binary data