Here is the latest patch, with documentation covering only the utility itself. The old general discussion has been moved to the PostgreSQL Wiki, with a link to it in the "See Also" section.
Also added a flag to select the number of direct values to show.

On Sun, Nov 3, 2024 at 11:19 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> Hannu Krosing <han...@google.com> writes:
> > Did I understand correctly that you would prefer the documentation part to
> > be much smaller than it is now, and all the current discussion about things
> > that are not strictly about pg_test_timing to not be in the docs for it?
>
> Well, I would like for the docs not to readily get stale again.
> I don't foresee us maintaining this page better in future than
> we have so far.
>
> > My current plan is to move the other discussions around timing from the
> > docs to the PostgreSQL Wiki.
>
> That could work.
>
>                         regards, tom lane
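For reviewers, the bucketing rule behind the new output is easy to state in
isolation. Below is a minimal standalone sketch (my own illustration, not part
of the patch; it inlines the pg_leftmost_one_pos32() logic as a shift loop so
it compiles outside the source tree):

#include <stdio.h>

/*
 * Shifted-by-one log2 bucketing, as in the patch: a zero duration goes
 * into bucket 0, and a duration d > 0 goes into bucket floor(log2(d)) + 1,
 * whose upper bound is (1 << bucket) - 1 ns.
 */
static int
duration_bucket(unsigned int diff_ns)
{
	int		bits = 0;

	while (diff_ns)				/* same result as pg_leftmost_one_pos32() + 1 */
	{
		diff_ns >>= 1;
		bits++;
	}
	return bits;
}

int
main(void)
{
	unsigned int samples[] = {0, 1, 3, 7, 15, 16, 31, 1000};

	for (int i = 0; i < 8; i++)
	{
		int		b = duration_bucket(samples[i]);

		printf("%4u ns -> bucket %2d (<= %ld ns)\n",
			   samples[i], b, (1L << b) - 1);
	}
	return 0;
}

With the patch applied, the longer direct listing is requested with e.g.
"pg_test_timing -N 20" or "pg_test_timing --list-n-direct-values=20".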
From 56cf14f282a99e5431daf05f97de454f82358a9a Mon Sep 17 00:00:00 2001
From: Hannu Krosing <han...@google.com>
Date: Sun, 6 Jul 2025 20:17:24 +0200
Subject: [PATCH v3] added option to select number of direct values to show,
 added documentation, moved old general discussion to PostgreSQL wiki

---
 doc/src/sgml/ref/pgtesttiming.sgml      | 247 ++++++------------------
 src/bin/pg_test_timing/pg_test_timing.c | 128 +++++++++---
 2 files changed, 162 insertions(+), 213 deletions(-)

diff --git a/doc/src/sgml/ref/pgtesttiming.sgml b/doc/src/sgml/ref/pgtesttiming.sgml
index a5eb3aa25e0..4b03bf77487 100644
--- a/doc/src/sgml/ref/pgtesttiming.sgml
+++ b/doc/src/sgml/ref/pgtesttiming.sgml
@@ -32,6 +32,12 @@ PostgreSQL documentation
  <para>
   <application>pg_test_timing</application> is a tool to measure the timing
   overhead on your system and confirm that the system time never moves backwards.
+  </para>
+  <para>
+   This utility can also be used to determine whether the <varname>track_io_timing</varname>
+   configuration parameter is safe to turn on.
+  </para>
+  <para>
   Systems that are slow to collect timing data can give less accurate
   <command>EXPLAIN ANALYZE</command> results.
  </para>
@@ -59,6 +65,18 @@ PostgreSQL documentation
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><option>-N <replaceable class="parameter">list_n_direct_values</replaceable></option></term>
+     <term><option>--list-n-direct-values=<replaceable class="parameter">list_n_direct_values</replaceable></option></term>
+     <listitem>
+      <para>
+       Specifies the number of direct nanosecond values to list.  The first 1024
+       nanosecond values are always counted, to help in understanding the behaviour
+       of the system clock, but by default only the first 10 values with a non-zero count are shown.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry>
      <term><option>-V</option></term>
      <term><option>--version</option></term>
@@ -92,205 +110,65 @@ PostgreSQL documentation
   <title>Interpreting Results</title>
 
   <para>
-   Good results will show most (>90%) individual timing calls take less than
-   one microsecond. Average per loop overhead will be even lower, below 100
-   nanoseconds. This example from an Intel i7-860 system using a TSC clock
-   source shows excellent performance:
-
-<screen><![CDATA[
-Testing timing overhead for 3 seconds.
-Per loop time including overhead: 35.96 ns
-Histogram of timing durations:
-  < us   % of total      count
-     1     96.40465   80435604
-     2      3.59518    2999652
-     4      0.00015        126
-     8      0.00002         13
-    16      0.00000          2
-]]></screen>
+   The output has four columns, with rows showing a shifted-by-one log2(ns)
+   histogram of the timing calls.  This is not the classic log2(n+1) histogram:
+   zeros are counted separately, and the log2(ns) buckets start from value == 1.
  </para>
 
   <para>
-   Note that different units are used for the per loop time than the
-   histogram. The loop can have resolution within a few nanoseconds (ns),
-   while the individual timing calls can only resolve down to one microsecond
-   (us).
+   <itemizedlist spacing="compact">
+    <listitem><simpara>the nanosecond value that is larger than or equal to the counted duration (the bucket's upper bound)</simpara></listitem>
+    <listitem><simpara>percentage of values in this bucket</simpara></listitem>
+    <listitem><simpara>running percentage of values in this and previous buckets</simpara></listitem>
+    <listitem><simpara>count of timings in this bucket</simpara></listitem>
+   </itemizedlist>
  </para>
 
-  </refsect2>
-
-  <refsect2>
-   <title>Measuring Executor Timing Overhead</title>
-
   <para>
-   When the query executor is running a statement using
-   <command>EXPLAIN ANALYZE</command>, individual operations are timed as well
-   as showing a summary. The overhead of your system can be checked by
-   counting rows with the <application>psql</application> program:
-
-<screen>
-CREATE TABLE t AS SELECT * FROM generate_series(1,100000);
-\timing
-SELECT COUNT(*) FROM t;
-EXPLAIN ANALYZE SELECT COUNT(*) FROM t;
-</screen>
+   The results below show that 99% of the timing calls took between 8 and 31
+   nanoseconds (the first block).
  </para>
 
   <para>
-   The i7-860 system measured runs the count query in 9.8 ms while
-   the <command>EXPLAIN ANALYZE</command> version takes 16.6 ms, each
-   processing just over 100,000 rows. That 6.8 ms difference means the timing
-   overhead per row is 68 ns, about twice what pg_test_timing estimated it
-   would be. Even that relatively small amount of overhead is making the fully
-   timed count statement take almost 70% longer. On more substantial queries,
-   the timing overhead would be less problematic.
+   The second block goes into more detail, showing that the fastest calls took
+   13 ns and that 99% took 18 ns or less.
  </para>
-  </refsect2>
-
-  <refsect2>
-   <title>Changing Time Sources</title>
   <para>
-   On some newer Linux systems, it's possible to change the clock source used
-   to collect timing data at any time. A second example shows the slowdown
-   possible from switching to the slower acpi_pm time source, on the same
-   system used for the fast results above:
-
 <screen><![CDATA[
-# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
-tsc hpet acpi_pm
-# echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
-# pg_test_timing
-Per loop time including overhead: 722.92 ns
+Testing timing overhead for 3 seconds.
+Per loop time including overhead: 15.09 ns
 Histogram of timing durations:
-  < us   % of total      count
-     1     27.84870    1155682
-     2     72.05956    2990371
-     4      0.07810       3241
-     8      0.01357        563
-    16      0.00007          3
+   <= ns % of total  running %      count
+       0     0.0000     0.0000          0
+       1     0.0000     0.0000          0
+       3     0.0000     0.0000          0
+       7     0.0000     0.0000          0
+      15    78.2383    78.2383  155509081
+      31    21.6825    99.9208   43096793
+      63     0.0755    99.9963     150061
+     127     0.0010    99.9972       1926
+     255     0.0011    99.9983       2097
+     511     0.0012    99.9995       2470
+    1023     0.0001    99.9997        248
+    2047     0.0002    99.9999        403
+    4095     0.0001    99.9999        142
+    8191     0.0000   100.0000         55
+   16383     0.0000   100.0000         45
+   32767     0.0000   100.0000         10
+   65535     0.0000   100.0000          6
+
+Timing durations less than 1024 ns, first 10 values:
+      ns % of total  running %      count
+      13     0.0560     0.0560     111398
+      14    21.9167    21.9727   43562342
+      15    56.2656    78.2383  111835341
+      16    19.0465    97.2848   37857493
+      17     1.5008    98.7856    2983064
+      18     0.5467    99.3324    1086682
+      19     0.2145    99.5469     426359
+      20     0.1422    99.6891     282614
+      21     0.0553    99.7444     110002
+      22     0.0252    99.7696      50099
 ]]></screen>
  </para>
 
-  <para>
-   In this configuration, the sample <command>EXPLAIN ANALYZE</command> above
-   takes 115.9 ms. That's 1061 ns of timing overhead, again a small multiple
-   of what's measured directly by this utility. That much timing overhead
-   means the actual query itself is only taking a tiny fraction of the
-   accounted for time, most of it is being consumed in overhead instead. In
-   this configuration, any <command>EXPLAIN ANALYZE</command> totals involving
-   many timed operations would be inflated significantly by timing overhead.
-  </para>
-
-  <para>
-   FreeBSD also allows changing the time source on the fly, and it logs
-   information about the timer selected during boot:
-
-<screen>
-# dmesg | grep "Timecounter"
-Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
-Timecounter "i8254" frequency 1193182 Hz quality 0
-Timecounters tick every 10.000 msec
-Timecounter "TSC" frequency 2531787134 Hz quality 800
-# sysctl kern.timecounter.hardware=TSC
-kern.timecounter.hardware: ACPI-fast -> TSC
-</screen>
-  </para>
-
-  <para>
-   Other systems may only allow setting the time source on boot. On older
-   Linux systems the "clock" kernel setting is the only way to make this sort
-   of change. And even on some more recent ones, the only option you'll see
-   for a clock source is "jiffies". Jiffies are the older Linux software clock
-   implementation, which can have good resolution when it's backed by fast
-   enough timing hardware, as in this example:
-
-<screen><![CDATA[
-$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
-jiffies
-$ dmesg | grep time.c
-time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
-time.c: Detected 2400.153 MHz processor.
-$ pg_test_timing
-Testing timing overhead for 3 seconds.
-Per timing duration including loop overhead: 97.75 ns
-Histogram of timing durations:
-  < us   % of total      count
-     1     90.23734   27694571
-     2      9.75277    2993204
-     4      0.00981       3010
-     8      0.00007         22
-    16      0.00000          1
-    32      0.00000          1
-]]></screen></para>
-  </refsect2>
-
-  <refsect2>
-   <title>Clock Hardware and Timing Accuracy</title>
-
-  <para>
-   Collecting accurate timing information is normally done on computers using
-   hardware clocks with various levels of accuracy. With some hardware the
-   operating systems can pass the system clock time almost directly to
-   programs. A system clock can also be derived from a chip that simply
-   provides timing interrupts, periodic ticks at some known time interval. In
-   either case, operating system kernels provide a clock source that hides
-   these details. But the accuracy of that clock source and how quickly it can
-   return results varies based on the underlying hardware.
-  </para>
-
-  <para>
-   Inaccurate time keeping can result in system instability. Test any change
-   to the clock source very carefully. Operating system defaults are sometimes
-   made to favor reliability over best accuracy. And if you are using a virtual
-   machine, look into the recommended time sources compatible with it. Virtual
-   hardware faces additional difficulties when emulating timers, and there are
-   often per operating system settings suggested by vendors.
-  </para>
-
-  <para>
-   The Time Stamp Counter (TSC) clock source is the most accurate one available
-   on current generation CPUs. It's the preferred way to track the system time
-   when it's supported by the operating system and the TSC clock is
-   reliable. There are several ways that TSC can fail to provide an accurate
-   timing source, making it unreliable. Older systems can have a TSC clock that
-   varies based on the CPU temperature, making it unusable for timing. Trying
-   to use TSC on some older multicore CPUs can give a reported time that's
-   inconsistent among multiple cores. This can result in the time going
-   backwards, a problem this program checks for. And even the newest systems
-   can fail to provide accurate TSC timing with very aggressive power saving
-   configurations.
-  </para>
-
-  <para>
-   Newer operating systems may check for the known TSC problems and switch to a
-   slower, more stable clock source when they are seen. If your system
-   supports TSC time but doesn't default to that, it may be disabled for a good
-   reason. And some operating systems may not detect all the possible problems
-   correctly, or will allow using TSC even in situations where it's known to be
-   inaccurate.
-  </para>
-
-  <para>
-   The High Precision Event Timer (HPET) is the preferred timer on systems
-   where it's available and TSC is not accurate. The timer chip itself is
-   programmable to allow up to 100 nanosecond resolution, but you may not see
-   that much accuracy in your system clock.
-  </para>
-
-  <para>
-   Advanced Configuration and Power Interface (ACPI) provides a Power
-   Management (PM) Timer, which Linux refers to as the acpi_pm. The clock
-   derived from acpi_pm will at best provide 300 nanosecond resolution.
-  </para>
-
-  <para>
-   Timers used on older PC hardware include the 8254 Programmable Interval
-   Timer (PIT), the real-time clock (RTC), the Advanced Programmable Interrupt
-   Controller (APIC) timer, and the Cyclone timer. These timers aim for
-   millisecond resolution.
-  </para>
-  </refsect2>
 </refsect1>
 
 <refsect1>
@@ -298,6 +176,7 @@ Histogram of timing durations:
 
  <simplelist type="inline">
   <member><xref linkend="sql-explain"/></member>
+  <member><ulink url="https://wiki.postgresql.org/wiki/Pg_test_timing">General discussion about timing on the PostgreSQL Wiki</ulink></member>
 </simplelist>
 </refsect1>
</refentry>

diff --git a/src/bin/pg_test_timing/pg_test_timing.c b/src/bin/pg_test_timing/pg_test_timing.c
index ce7aad4b25a..270ab41ea92 100644
--- a/src/bin/pg_test_timing/pg_test_timing.c
+++ b/src/bin/pg_test_timing/pg_test_timing.c
@@ -9,19 +9,25 @@
 #include <limits.h>
 
 #include "getopt_long.h"
+#include "port/pg_bitutils.h"
 #include "portability/instr_time.h"
 
 static const char *progname;
 
 static unsigned int test_duration = 3;
+static unsigned int list_n_direct_values = 10;
+
+/* record duration in powers of 2 nanoseconds */
+static long long int histogram[32];
+
+/* record counts of the first NUM_DIRECT (1024) durations directly */
+#define NUM_DIRECT 1024
+static long long int direct_histogram[NUM_DIRECT];
 
 static void handle_args(int argc, char *argv[]);
 static uint64 test_timing(unsigned int duration);
 static void output(uint64 loop_count);
 
-/* record duration in powers of 2 microseconds */
-static long long int histogram[32];
-
 int
 main(int argc, char *argv[])
 {
@@ -44,6 +50,7 @@ handle_args(int argc, char *argv[])
 {
 	static struct option long_options[] = {
 		{"duration", required_argument, NULL, 'd'},
+		{"list-n-direct-values", required_argument, NULL, 'N'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -56,7 +63,7 @@ handle_args(int argc, char *argv[])
 	{
 		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
 		{
-			printf(_("Usage: %s [-d DURATION]\n"), progname);
+			printf(_("Usage: %s [-d DURATION] [-N <number of direct values>]\n"), progname);
 			exit(0);
 		}
 		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
@@ -66,7 +73,7 @@ handle_args(int argc, char *argv[])
 		}
 	}
 
-	while ((option = getopt_long(argc, argv, "d:",
+	while ((option = getopt_long(argc, argv, "d:N:",
 								 long_options, &optindex)) != -1)
 	{
 		switch (option)
@@ -93,6 +100,28 @@ handle_args(int argc, char *argv[])
 				}
 				break;
 
+			case 'N':
+				errno = 0;
+				optval = strtoul(optarg, &endptr, 10);
+
+				if (endptr == optarg || *endptr != '\0' ||
+					errno != 0 || optval != (unsigned int) optval)
+				{
+					fprintf(stderr, _("%s: invalid argument for option %s\n"),
+							progname, "--list-n-direct-values");
+					fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+					exit(1);
+				}
+
+				list_n_direct_values = (unsigned int) optval;
+				if (list_n_direct_values > NUM_DIRECT)
+				{
+					fprintf(stderr, _("%s: %s must be in range %u..%u\n"),
+							progname, "--list-n-direct-values", 0, NUM_DIRECT);
+					exit(1);
+				}
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						progname);
@@ -111,7 +140,6 @@ handle_args(int argc, char *argv[])
 		exit(1);
 	}
 
-
 	printf(ngettext("Testing timing overhead for %u second.\n",
 					"Testing timing overhead for %u seconds.\n",
 					test_duration),
@@ -130,19 +158,19 @@ test_timing(unsigned int duration)
 				end_time,
 				temp;
 
-	total_time = duration > 0 ? duration * INT64CONST(1000000) : 0;
+	total_time = duration > 0 ? duration * INT64CONST(1000000000) : 0;
 
 	INSTR_TIME_SET_CURRENT(start_time);
-	cur = INSTR_TIME_GET_MICROSEC(start_time);
+	cur = INSTR_TIME_GET_NANOSEC(start_time);
 
 	while (time_elapsed < total_time)
 	{
 		int32		diff,
-					bits = 0;
+					bits;
 
 		prev = cur;
 		INSTR_TIME_SET_CURRENT(temp);
-		cur = INSTR_TIME_GET_MICROSEC(temp);
+		cur = INSTR_TIME_GET_NANOSEC(temp);
 		diff = cur - prev;
 
 		/* Did time go backwards? */
@@ -154,18 +182,21 @@ test_timing(unsigned int duration)
 		}
 
 		/* What is the highest bit in the time diff? */
-		while (diff)
-		{
-			diff >>= 1;
-			bits++;
-		}
+		if (diff > 0)
+			bits = pg_leftmost_one_pos32(diff) + 1;
+		else
+			bits = 0;
 
 		/* Update appropriate duration bucket */
 		histogram[bits]++;
 
+		/* Update direct histogram of time diffs */
+		if (diff < NUM_DIRECT)
+			direct_histogram[diff]++;
+
 		loop_count++;
 		INSTR_TIME_SUBTRACT(temp, start_time);
-		time_elapsed = INSTR_TIME_GET_MICROSEC(temp);
+		time_elapsed = INSTR_TIME_GET_NANOSEC(temp);
 	}
 
 	INSTR_TIME_SET_CURRENT(end_time);
@@ -181,28 +212,67 @@ test_timing(unsigned int duration)
 static void
 output(uint64 loop_count)
 {
-	int64		max_bit = 31,
+	int			max_bit = 31,
 				i;
-	char	   *header1 = _("< us");
+	char	   *header1 = _("<= ns");
+	char	   *header1b = _("ns");
 	char	   *header2 = /* xgettext:no-c-format */ _("% of total");
-	char	   *header3 = _("count");
+	char	   *header3 = /* xgettext:no-c-format */ _("running %");
+	char	   *header4 = _("count");
 	int			len1 = strlen(header1);
 	int			len2 = strlen(header2);
 	int			len3 = strlen(header3);
+	int			len4 = strlen(header4);
+	double		rprct;
 
 	/* find highest bit value */
 	while (max_bit > 0 && histogram[max_bit] == 0)
 		max_bit--;
 
+	/* set minimum column widths */
+	len1 = Max(8, len1);
+	len2 = Max(10, len2);
+	len3 = Max(10, len3);
+	len4 = Max(10, len4);
+
 	printf(_("Histogram of timing durations:\n"));
-	printf("%*s %*s %*s\n",
-		   Max(6, len1), header1,
-		   Max(10, len2), header2,
-		   Max(10, len3), header3);
-
-	for (i = 0; i <= max_bit; i++)
-		printf("%*ld %*.5f %*lld\n",
-			   Max(6, len1), 1l << i,
-			   Max(10, len2) - 1, (double) histogram[i] * 100 / loop_count,
-			   Max(10, len3), histogram[i]);
+	printf("%*s %*s %*s %*s\n",
+		   len1, header1,
+		   len2, header2,
+		   len3, header3,
+		   len4, header4);
+
+	for (i = 0, rprct = 0; i <= max_bit; i++)
+	{
+		double		prct = (double) histogram[i] * 100 / loop_count;
+
+		rprct += prct;
+		printf("%*ld %*.4f %*.4f %*lld\n",
+			   len1, (1L << i) - 1,
+			   len2, prct,
+			   len3, rprct,
+			   len4, histogram[i]);
+	}
+
+	printf(_("\nTiming durations less than %d ns, first %u values:\n"), NUM_DIRECT,
+		   list_n_direct_values);
+	printf("%*s %*s %*s %*s\n",
+		   len1, header1b,
+		   len2, header2,
+		   len3, header3,
+		   len4, header4);
+
+	for (i = 0, rprct = 0; i < NUM_DIRECT && list_n_direct_values; i++)
+	{
+		double		prct = (double) direct_histogram[i] * 100 / loop_count;
+
+		rprct += prct;
+		if (direct_histogram[i])
+		{
+			list_n_direct_values--;
+			printf("%*d %*.4f %*.4f %*lld\n",
+				   len1, i,
+				   len2, prct,
+				   len3, rprct,
+				   len4, direct_histogram[i]);
+		}
+	}
 }
-- 
2.50.0.727.gbf7dc18ff4-goog