Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

Lukas Fittl Sun, 27 Jul 2025 12:51:55 -0700

Hi,

See attached v11 (and moved to the PG19-2 commitfest), split into a new set
of patches:

0001 - Improve the __cpuidex check added for a different purpose
in 792752af4eb5 to:

- Fix a typo (configure was incorrectly checking for "__get_cpuidex",
whereas meson.build was checking for the correct name)
- Add support for non-MSVC compilers as well (e.g. GCC 11+), where
__cpuidex is defined in cpuid.h, not intrin.h

This change should be independently committable, though we wouldn't use
cpuidex on non-MSVC compilers today in practice, I believe.

0002 - The core patch rebased, which, as before:

- Adds INSTR_TIME_SET_CURRENT_FAST (which uses RDTSC if available) and uses
it for InstrStartNode/InstrStopNode
- Changes INSTR_TIME_SET_CURRENT to directly use RDTSCP if available
(instead of pg_clock_gettime)
- Keeps utilizing pg_clock_gettime for both unless we're on Linux x86 and
the clocksource is set to "tsc" (see note below re: that aspect)
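
To illustrate the clocksource condition in the last item, here is a minimal
sketch of reading the kernel's current clocksource from sysfs. This is not
the code from the patch; the function names are hypothetical, and the only
thing taken from Linux itself is the sysfs path:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if the first line of the file at
 * 'path' is exactly "tsc". Taking the path as an argument keeps the
 * parsing testable without depending on the real sysfs.
 */
bool
clocksource_file_is_tsc(const char *path)
{
    char        buf[64];
    bool        result = false;
    FILE       *fp = fopen(path, "r");

    if (fp == NULL)
        return false;           /* not Linux, or sysfs unavailable */

    if (fgets(buf, sizeof(buf), fp) != NULL)
    {
        buf[strcspn(buf, "\n")] = '\0';     /* strip trailing newline */
        result = (strcmp(buf, "tsc") == 0);
    }
    fclose(fp);
    return result;
}

/* Checks the clocksource the running Linux kernel has selected. */
bool
linux_clocksource_is_tsc(void)
{
    return clocksource_file_is_tsc(
        "/sys/devices/system/clocksource/clocksource0/current_clocksource");
}
```

A caller would keep using the clock_gettime-based path whenever
linux_clocksource_is_tsc() returns false, which also covers non-Linux
platforms where the file does not exist.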

0003 - Changes to pg_test_timing utility:

- Show the used time source (clock_gettime + clock type / RDTSC / RDTSCP)
- Allows checking the latency of the "fast" time source (RDTSC) with the
new "--fast" option, and warns if it's not available
- Avoids the INSTR_TIME_GET_NANOSEC slowness that Andres reported by
diffing the ticks first and then calculating nanosecs

Note that the other pg_test_timing changes regarding nanoseconds should all
have been addressed by 0b096e379e6f, I believe.

On Wed, Jul 16, 2025 at 5:48 PM Andres Freund wrote:

> Applying just patch 2 results in a performance *regression* in
> pg_test_timing
> on my machine, which is due to always hitting the unlikely() path in
> INSTR_TIME_GET_NANOSEC() when INSTR_TIME_GET_NANOSEC() is used for an
> "absolute" timestamp, rather than a differential timestamp. Which in turn
> means hitting a division instruction every time, which on my slightly older
> hardware is slower.  That may be because my workstation has been up for 40
> days, but clearly that can't lead us down to the slow-path
>

Assuming you didn't restart your workstation, can you retest with this
patch set?

I believe the pg_test_timing changes should address this problem, by
avoiding calculations with the absolute (very large) ticks value.
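
As a rough illustration of why diffing first helps, here is a sketch of a
multiply-and-shift conversion with an overflow fallback. The struct, the
function names and the shift value are assumptions made for this example
and are not taken from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

#define TICKS_TO_NS_SHIFT 14    /* illustrative value, an assumption */

typedef struct TickScale
{
    uint64_t    tsc_khz;        /* TSC ticks per millisecond */
    uint64_t    mult;           /* (1e6 << shift) / tsc_khz */
    uint64_t    max_fast_ticks; /* largest ticks where ticks * mult fits */
} TickScale;

/* Precompute the scale; assumes a realistic, nonzero tsc_khz. */
TickScale
tick_scale_init(uint64_t tsc_khz)
{
    TickScale   s;

    s.tsc_khz = tsc_khz;
    s.mult = (UINT64_C(1000000) << TICKS_TO_NS_SHIFT) / tsc_khz;
    s.max_fast_ticks = UINT64_MAX / s.mult;
    return s;
}

/* Convert ticks to nanoseconds; *slow reports whether we had to divide. */
uint64_t
ticks_to_ns(const TickScale *s, uint64_t ticks, bool *slow)
{
    if (ticks <= s->max_fast_ticks)
    {
        *slow = false;
        return (ticks * s->mult) >> TICKS_TO_NS_SHIFT;
    }

    /* ticks * mult would overflow 64 bits, so fall back to division */
    *slow = true;
    return (ticks / s->tsc_khz) * UINT64_C(1000000) +
        (ticks % s->tsc_khz) * UINT64_C(1000000) / s->tsc_khz;
}

/* Subtract in ticks first, so only the small difference is converted. */
uint64_t
elapsed_ns(const TickScale *s, uint64_t start, uint64_t end, bool *slow)
{
    return ticks_to_ns(s, end - start, slow);
}
```

With these example constants and a 2 GHz TSC, an absolute tick value after
40 days of uptime is past max_fast_ticks and takes the division path,
while a difference of a few thousand ticks stays on the multiply path.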

> > Open questions I have:
> > - Could we rely on checking whether the TSC timesource is invariant (via
> > CPUID), instead of relying on Linux choosing it as a clocksource?
>
> I don't see why not?

Thinking this through again, my worry is that our detection logic for
whether the TSC is safe to use directly would be much less sophisticated
than that of the Linux Kernel. The Linux Kernel also allows configuring the
clock source explicitly, if its detection goes wrong.
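
For reference, the CPUID check itself would be small. Below is a sketch
assuming GCC or Clang on x86, where cpuid.h provides __get_cpuid; the
function and macro names are made up for this example. The invariant TSC
flag is bit 8 of EDX in leaf 0x80000007. Note that this covers only that
one flag, and none of the additional checks the kernel performs:

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_GNU_X86_CPUID 1
#endif

#define CPUID_APM_LEAF              0x80000007u
#define CPUID_APM_EDX_INVARIANT_TSC (UINT32_C(1) << 8)

/* Pure bit test, split out so it can be checked without CPUID access. */
bool
edx_reports_invariant_tsc(uint32_t edx)
{
    return (edx & CPUID_APM_EDX_INVARIANT_TSC) != 0;
}

/* Returns true only if the CPU advertises an invariant TSC. */
bool
cpu_has_invariant_tsc(void)
{
#ifdef HAVE_GNU_X86_CPUID
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if the requested leaf is not supported */
    if (__get_cpuid(CPUID_APM_LEAF, &eax, &ebx, &ecx, &edx) == 0)
        return false;
    return edx_reports_invariant_tsc(edx);
#else
    return false;               /* other platforms: assume not usable */
#endif
}
```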

For example, David had previously brought up the worry that accessing the
TSC directly in a VM can be very slow when the TSC is emulated. The Linux
Kernel indeed has checks for this, e.g. in the context of Xen:
https://github.com/torvalds/linux/blob/b711733e89a3f84c8e1e56e2328f9a0fa5facc7c/arch/x86/xen/time.c#L490

Maybe introducing a GUC for this is the way to go, with an OS-dependent
"auto" setting?
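
To make the idea concrete, here is a sketch of how such a setting could be
resolved. Nothing like this exists in the patch set yet; the enum, its
values and the function are all hypothetical:

```c
#include <stdbool.h>

/* Hypothetical values for a timing-source setting. */
typedef enum TimingSource
{
    TIMING_SOURCE_AUTO,
    TIMING_SOURCE_CLOCK_GETTIME,
    TIMING_SOURCE_TSC
} TimingSource;

/*
 * Resolve "auto" to a concrete source. An explicit setting always wins,
 * which gives users an override if the automatic detection goes wrong.
 */
TimingSource
resolve_timing_source(TimingSource configured,
                      bool on_linux_x86,
                      bool kernel_clocksource_is_tsc)
{
    if (configured != TIMING_SOURCE_AUTO)
        return configured;

    /* On Linux x86, defer to the kernel's own choice of clocksource */
    if (on_linux_x86 && kernel_clocksource_is_tsc)
        return TIMING_SOURCE_TSC;

    return TIMING_SOURCE_CLOCK_GETTIME;
}
```

Other operating systems could plug their own detection into the "auto"
branch, while the explicit values would behave the same everywhere.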

Thanks,
Lukas

-- 
Lukas Fittl

v11-0003-pg_test_timing-Add-fast-flag-to-test-fast-timing.patch

v11-0002-Use-time-stamp-counter-to-measure-time-on-Linux-.patch

v11-0001-cpuidex-check-Support-detecting-newer-GCC-versio.patch
