Here is the latest patch, with documentation covering only the utility itself. The old general discussion has been moved to the PostgreSQL Wiki, with a link to it in the "See Also" section.
Also added a flag to select the number of direct values to show.

On Sun, Nov 3, 2024 at 11:19 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> Hannu Krosing <han...@google.com> writes:
> > Did I understand correctly that you would prefer the documentation part to
> > be much smaller than it is now, and all the current discussion about things
> > that are not strictly about pg_test_timing to not be in the docs for it?
>
> Well, I would like for the docs not to readily get stale again.
> I don't foresee us maintaining this page better in future than
> we have so far.
>
> > My current plan is to move the other discussions around timing from the
> > docs to the PostgreSQL Wiki.
>
> That could work.
>
>                         regards, tom lane
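For reviewers, the bucketing rule behind the new output is easy to state in
isolation. Below is a minimal standalone sketch (my own illustration, not part
of the patch; it inlines the pg_leftmost_one_pos32() logic as a shift loop so
it compiles outside the source tree):

#include <stdio.h>

/*
 * Shifted-by-one log2 bucketing, as in the patch: a zero duration goes
 * into bucket 0, and a duration d > 0 goes into bucket floor(log2(d)) + 1,
 * whose upper bound is (1 << bucket) - 1 ns.
 */
static int
duration_bucket(unsigned int diff_ns)
{
	int		bits = 0;

	while (diff_ns)				/* same result as pg_leftmost_one_pos32() + 1 */
	{
		diff_ns >>= 1;
		bits++;
	}
	return bits;
}

int
main(void)
{
	unsigned int samples[] = {0, 1, 3, 7, 15, 16, 31, 1000};

	for (int i = 0; i < 8; i++)
	{
		int		b = duration_bucket(samples[i]);

		printf("%4u ns -> bucket %2d (<= %ld ns)\n",
			   samples[i], b, (1L << b) - 1);
	}
	return 0;
}

With the patch applied, the longer direct listing is requested with e.g.
"pg_test_timing -N 20" or "pg_test_timing --list-n-direct-values=20".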
From 56cf14f282a99e5431daf05f97de454f82358a9a Mon Sep 17 00:00:00 2001
From: Hannu Krosing <han...@google.com>
Date: Sun, 6 Jul 2025 20:17:24 +0200
Subject: [PATCH v3] added option to select number of direct values to show,
 added documentation, moved old general discussion to PostgreSQL wiki

---
 doc/src/sgml/ref/pgtesttiming.sgml      | 247 ++++++------------------
 src/bin/pg_test_timing/pg_test_timing.c | 128 +++++++++---
 2 files changed, 162 insertions(+), 213 deletions(-)

diff --git a/doc/src/sgml/ref/pgtesttiming.sgml b/doc/src/sgml/ref/pgtesttiming.sgml
index a5eb3aa25e0..4b03bf77487 100644
--- a/doc/src/sgml/ref/pgtesttiming.sgml
+++ b/doc/src/sgml/ref/pgtesttiming.sgml
@@ -32,6 +32,12 @@ PostgreSQL documentation
  <para>
   <application>pg_test_timing</application> is a tool to measure the timing
   overhead on your system and confirm that the system time never moves backwards.
+  </para>
+  <para>
+   This utility can also be used to determine whether the <varname>track_io_timing</varname>
+   configuration parameter is safe to turn on.
+  </para>
+  <para>
   Systems that are slow to collect timing data can give less accurate
   <command>EXPLAIN ANALYZE</command> results.
  </para>
@@ -59,6 +65,18 @@ PostgreSQL documentation
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><option>-N <replaceable class="parameter">list_n_direct_values</replaceable></option></term>
+     <term><option>--list-n-direct-values=<replaceable class="parameter">list_n_direct_values</replaceable></option></term>
+     <listitem>
+      <para>
+       Specifies the number of direct nanosecond values to list.  The first 1024
+       nanosecond values are always counted, to help in understanding the behaviour
+       of the system clock, but by default only the first 10 values with a non-zero count are shown.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry>
      <term><option>-V</option></term>
      <term><option>--version</option></term>
@@ -92,205 +110,65 @@ PostgreSQL documentation
   <title>Interpreting Results</title>
 
   <para>
-   Good results will show most (>90%) individual timing calls take less than
-   one microsecond. Average per loop overhead will be even lower, below 100
-   nanoseconds. This example from an Intel i7-860 system using a TSC clock
-   source shows excellent performance:
-
-<screen><![CDATA[
-Testing timing overhead for 3 seconds.
-Per loop time including overhead: 35.96 ns
-Histogram of timing durations:
-  < us   % of total      count
-     1     96.40465   80435604
-     2      3.59518    2999652
-     4      0.00015        126
-     8      0.00002         13
-    16      0.00000          2
-]]></screen>
+   The output has four columns, with rows showing a shifted-by-one log2(ns)
+   histogram of the timing calls.  This is not the classic log2(n+1) histogram:
+   zeros are counted separately, and the log2(ns) buckets start from value == 1.
  </para>
 
   <para>
-   Note that different units are used for the per loop time than the
-   histogram. The loop can have resolution within a few nanoseconds (ns),
-   while the individual timing calls can only resolve down to one microsecond
-   (us).
+   <itemizedlist spacing="compact">
+    <listitem><simpara>the nanosecond value that is larger than or equal to the counted duration (the bucket's upper bound)</simpara></listitem>
+    <listitem><simpara>percentage of values in this bucket</simpara></listitem>
+    <listitem><simpara>running percentage of values in this and previous buckets</simpara></listitem>
+    <listitem><simpara>count of timings in this bucket</simpara></listitem>
+   </itemizedlist>
  </para>
 
-  </refsect2>
-
-  <refsect2>
-   <title>Measuring Executor Timing Overhead</title>
-
   <para>
-   When the query executor is running a statement using
-   <command>EXPLAIN ANALYZE</command>, individual operations are timed as well
-   as showing a summary. The overhead of your system can be checked by
-   counting rows with the <application>psql</application> program:
-
-<screen>
-CREATE TABLE t AS SELECT * FROM generate_series(1,100000);
-\timing
-SELECT COUNT(*) FROM t;
-EXPLAIN ANALYZE SELECT COUNT(*) FROM t;
-</screen>
+   The results below show that 99% of the timing calls took between 8 and 31
+   nanoseconds (the first block).
  </para>
 
   <para>
-   The i7-860 system measured runs the count query in 9.8 ms while
-   the <command>EXPLAIN ANALYZE</command> version takes 16.6 ms, each
-   processing just over 100,000 rows. That 6.8 ms difference means the timing
-   overhead per row is 68 ns, about twice what pg_test_timing estimated it
-   would be. Even that relatively small amount of overhead is making the fully
-   timed count statement take almost 70% longer. On more substantial queries,
-   the timing overhead would be less problematic.
+   The second block goes into more detail, showing that the fastest calls took
+   13 ns and that 99% took 18 ns or less.
  </para>
-  </refsect2>
-
-  <refsect2>
-   <title>Changing Time Sources</title>
   <para>
-   On some newer Linux systems, it's possible to change the clock source used
-   to collect timing data at any time. A second example shows the slowdown
-   possible from switching to the slower acpi_pm time source, on the same
-   system used for the fast results above:
-
 <screen><![CDATA[
-# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
-tsc hpet acpi_pm
-# echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
-# pg_test_timing
-Per loop time including overhead: 722.92 ns
+Testing timing overhead for 3 seconds.
+Per loop time including overhead: 15.09 ns
 Histogram of timing durations:
-  < us   % of total      count
-     1     27.84870    1155682
-     2     72.05956    2990371
-     4      0.07810       3241
-     8      0.01357        563
-    16      0.00007          3
+   <= ns % of total  running %      count
+       0     0.0000     0.0000          0
+       1     0.0000     0.0000          0
+       3     0.0000     0.0000          0
+       7     0.0000     0.0000          0
+      15    78.2383    78.2383  155509081
+      31    21.6825    99.9208   43096793
+      63     0.0755    99.9963     150061
+     127     0.0010    99.9972       1926
+     255     0.0011    99.9983       2097
+     511     0.0012    99.9995       2470
+    1023     0.0001    99.9997        248
+    2047     0.0002    99.9999        403
+    4095     0.0001    99.9999        142
+    8191     0.0000   100.0000         55
+   16383     0.0000   100.0000         45
+   32767     0.0000   100.0000         10
+   65535     0.0000   100.0000          6
+
+Timing durations less than 1024 ns, first 10 values:
+      ns % of total  running %      count
+      13     0.0560     0.0560     111398
+      14    21.9167    21.9727   43562342
+      15    56.2656    78.2383  111835341
+      16    19.0465    97.2848   37857493
+      17     1.5008    98.7856    2983064
+      18     0.5467    99.3324    1086682
+      19     0.2145    99.5469     426359
+      20     0.1422    99.6891     282614
+      21     0.0553    99.7444     110002
+      22     0.0252    99.7696      50099
 ]]></screen>
  </para>
 
-  <para>
-   In this configuration, the sample <command>EXPLAIN ANALYZE</command> above
-   takes 115.9 ms. That's 1061 ns of timing overhead, again a small multiple
-   of what's measured directly by this utility. That much timing overhead
-   means the actual query itself is only taking a tiny fraction of the
-   accounted for time, most of it is being consumed in overhead instead. In
-   this configuration, any <command>EXPLAIN ANALYZE</command> totals involving
-   many timed operations would be inflated significantly by timing overhead.
-  </para>
-
-  <para>
-   FreeBSD also allows changing the time source on the fly, and it logs
-   information about the timer selected during boot:
-
-<screen>
-# dmesg | grep "Timecounter"
-Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
-Timecounter "i8254" frequency 1193182 Hz quality 0
-Timecounters tick every 10.000 msec
-Timecounter "TSC" frequency 2531787134 Hz quality 800
-# sysctl kern.timecounter.hardware=TSC
-kern.timecounter.hardware: ACPI-fast -> TSC
-</screen>
-  </para>
-
-  <para>
-   Other systems may only allow setting the time source on boot. On older
-   Linux systems the "clock" kernel setting is the only way to make this sort
-   of change. And even on some more recent ones, the only option you'll see
-   for a clock source is "jiffies". Jiffies are the older Linux software clock
-   implementation, which can have good resolution when it's backed by fast
-   enough timing hardware, as in this example:
-
-<screen><![CDATA[
-$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
-jiffies
-$ dmesg | grep time.c
-time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
-time.c: Detected 2400.153 MHz processor.
-$ pg_test_timing
-Testing timing overhead for 3 seconds.
-Per timing duration including loop overhead: 97.75 ns
-Histogram of timing durations:
-  < us   % of total      count
-     1     90.23734   27694571
-     2      9.75277    2993204
-     4      0.00981       3010
-     8      0.00007         22
-    16      0.00000          1
-    32      0.00000          1
-]]></screen></para>
-  </refsect2>
-
-  <refsect2>
-   <title>Clock Hardware and Timing Accuracy</title>
-
-  <para>
-   Collecting accurate timing information is normally done on computers using
-   hardware clocks with various levels of accuracy. With some hardware the
-   operating systems can pass the system clock time almost directly to
-   programs. A system clock can also be derived from a chip that simply
-   provides timing interrupts, periodic ticks at some known time interval. In
-   either case, operating system kernels provide a clock source that hides
-   these details. But the accuracy of that clock source and how quickly it can
-   return results varies based on the underlying hardware.
-  </para>
-
-  <para>
-   Inaccurate time keeping can result in system instability. Test any change
-   to the clock source very carefully. Operating system defaults are sometimes
-   made to favor reliability over best accuracy. And if you are using a virtual
-   machine, look into the recommended time sources compatible with it. Virtual
-   hardware faces additional difficulties when emulating timers, and there are
-   often per operating system settings suggested by vendors.
-  </para>
-
-  <para>
-   The Time Stamp Counter (TSC) clock source is the most accurate one available
-   on current generation CPUs. It's the preferred way to track the system time
-   when it's supported by the operating system and the TSC clock is
-   reliable. There are several ways that TSC can fail to provide an accurate
-   timing source, making it unreliable. Older systems can have a TSC clock that
-   varies based on the CPU temperature, making it unusable for timing. Trying
-   to use TSC on some older multicore CPUs can give a reported time that's
-   inconsistent among multiple cores. This can result in the time going
-   backwards, a problem this program checks for. And even the newest systems
-   can fail to provide accurate TSC timing with very aggressive power saving
-   configurations.
-  </para>
-
-  <para>
-   Newer operating systems may check for the known TSC problems and switch to a
-   slower, more stable clock source when they are seen. If your system
-   supports TSC time but doesn't default to that, it may be disabled for a good
-   reason. And some operating systems may not detect all the possible problems
-   correctly, or will allow using TSC even in situations where it's known to be
-   inaccurate.
-  </para>
-
-  <para>
-   The High Precision Event Timer (HPET) is the preferred timer on systems
-   where it's available and TSC is not accurate. The timer chip itself is
-   programmable to allow up to 100 nanosecond resolution, but you may not see
-   that much accuracy in your system clock.
-  </para>
-
-  <para>
-   Advanced Configuration and Power Interface (ACPI) provides a Power
-   Management (PM) Timer, which Linux refers to as the acpi_pm. The clock
-   derived from acpi_pm will at best provide 300 nanosecond resolution.
-  </para>
-
-  <para>
-   Timers used on older PC hardware include the 8254 Programmable Interval
-   Timer (PIT), the real-time clock (RTC), the Advanced Programmable Interrupt
-   Controller (APIC) timer, and the Cyclone timer. These timers aim for
-   millisecond resolution.
-  </para>
-  </refsect2>
 </refsect1>
 
 <refsect1>
@@ -298,6 +176,7 @@ Histogram of timing durations:
 
  <simplelist type="inline">
   <member><xref linkend="sql-explain"/></member>
+  <member><ulink url="https://wiki.postgresql.org/wiki/Pg_test_timing">General discussion about timing on the PostgreSQL Wiki</ulink></member>
 </simplelist>
 </refsect1>
</refentry>

diff --git a/src/bin/pg_test_timing/pg_test_timing.c b/src/bin/pg_test_timing/pg_test_timing.c
index ce7aad4b25a..270ab41ea92 100644
--- a/src/bin/pg_test_timing/pg_test_timing.c
+++ b/src/bin/pg_test_timing/pg_test_timing.c
@@ -9,19 +9,25 @@
 #include <limits.h>
 
 #include "getopt_long.h"
+#include "port/pg_bitutils.h"
 #include "portability/instr_time.h"
 
 static const char *progname;
 
 static unsigned int test_duration = 3;
+static unsigned int list_n_direct_values = 10;
+
+/* record duration in powers of 2 nanoseconds */
+static long long int histogram[32];
+
+/* record counts of the first NUM_DIRECT (1024) durations directly */
+#define NUM_DIRECT 1024
+static long long int direct_histogram[NUM_DIRECT];
 
 static void handle_args(int argc, char *argv[]);
 static uint64 test_timing(unsigned int duration);
 static void output(uint64 loop_count);
 
-/* record duration in powers of 2 microseconds */
-static long long int histogram[32];
-
 int
 main(int argc, char *argv[])
 {
@@ -44,6 +50,7 @@ handle_args(int argc, char *argv[])
 {
 	static struct option long_options[] = {
 		{"duration", required_argument, NULL, 'd'},
+		{"list-n-direct-values", required_argument, NULL, 'N'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -56,7 +63,7 @@ handle_args(int argc, char *argv[])
 	{
 		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
 		{
-			printf(_("Usage: %s [-d DURATION]\n"), progname);
+			printf(_("Usage: %s [-d DURATION] [-N <number of direct values>]\n"), progname);
 			exit(0);
 		}
 		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
@@ -66,7 +73,7 @@ handle_args(int argc, char *argv[])
 		}
 	}
 
-	while ((option = getopt_long(argc, argv, "d:",
+	while ((option = getopt_long(argc, argv, "d:N:",
 								 long_options, &optindex)) != -1)
 	{
 		switch (option)
@@ -93,6 +100,28 @@ handle_args(int argc, char *argv[])
 				}
 				break;
 
+			case 'N':
+				errno = 0;
+				optval = strtoul(optarg, &endptr, 10);
+
+				if (endptr == optarg || *endptr != '\0' ||
+					errno != 0 || optval != (unsigned int) optval)
+				{
+					fprintf(stderr, _("%s: invalid argument for option %s\n"),
+							progname, "--list-n-direct-values");
+					fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+					exit(1);
+				}
+
+				list_n_direct_values = (unsigned int) optval;
+				if (list_n_direct_values > NUM_DIRECT)
+				{
+					fprintf(stderr, _("%s: %s must be in range %u..%u\n"),
+							progname, "--list-n-direct-values", 0, NUM_DIRECT);
+					exit(1);
+				}
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						progname);
@@ -111,7 +140,6 @@ handle_args(int argc, char *argv[])
 		exit(1);
 	}
 
-
 	printf(ngettext("Testing timing overhead for %u second.\n",
 					"Testing timing overhead for %u seconds.\n",
 					test_duration),
@@ -130,19 +158,19 @@ test_timing(unsigned int duration)
 				end_time,
 				temp;
 
-	total_time = duration > 0 ? duration * INT64CONST(1000000) : 0;
+	total_time = duration > 0 ? duration * INT64CONST(1000000000) : 0;
 
 	INSTR_TIME_SET_CURRENT(start_time);
-	cur = INSTR_TIME_GET_MICROSEC(start_time);
+	cur = INSTR_TIME_GET_NANOSEC(start_time);
 
 	while (time_elapsed < total_time)
 	{
 		int32		diff,
-					bits = 0;
+					bits;
 
 		prev = cur;
 		INSTR_TIME_SET_CURRENT(temp);
-		cur = INSTR_TIME_GET_MICROSEC(temp);
+		cur = INSTR_TIME_GET_NANOSEC(temp);
 		diff = cur - prev;
 
 		/* Did time go backwards? */
@@ -154,18 +182,21 @@ test_timing(unsigned int duration)
 		}
 
 		/* What is the highest bit in the time diff? */
-		while (diff)
-		{
-			diff >>= 1;
-			bits++;
-		}
+		if (diff > 0)
+			bits = pg_leftmost_one_pos32(diff) + 1;
+		else
+			bits = 0;
 
 		/* Update appropriate duration bucket */
 		histogram[bits]++;
 
+		/* Update direct histogram of time diffs */
+		if (diff < NUM_DIRECT)
+			direct_histogram[diff]++;
+
 		loop_count++;
 		INSTR_TIME_SUBTRACT(temp, start_time);
-		time_elapsed = INSTR_TIME_GET_MICROSEC(temp);
+		time_elapsed = INSTR_TIME_GET_NANOSEC(temp);
 	}
 
 	INSTR_TIME_SET_CURRENT(end_time);
@@ -181,28 +212,67 @@ test_timing(unsigned int duration)
 static void
 output(uint64 loop_count)
 {
-	int64		max_bit = 31,
+	int			max_bit = 31,
 				i;
-	char	   *header1 = _("< us");
+	char	   *header1 = _("<= ns");
+	char	   *header1b = _("ns");
 	char	   *header2 = /* xgettext:no-c-format */ _("% of total");
-	char	   *header3 = _("count");
+	char	   *header3 = /* xgettext:no-c-format */ _("running %");
+	char	   *header4 = _("count");
 	int			len1 = strlen(header1);
 	int			len2 = strlen(header2);
 	int			len3 = strlen(header3);
+	int			len4 = strlen(header4);
+	double		rprct;
 
 	/* find highest bit value */
 	while (max_bit > 0 && histogram[max_bit] == 0)
 		max_bit--;
 
+	/* set minimum column widths */
+	len1 = Max(8, len1);
+	len2 = Max(10, len2);
+	len3 = Max(10, len3);
+	len4 = Max(10, len4);
+
 	printf(_("Histogram of timing durations:\n"));
-	printf("%*s %*s %*s\n",
-		   Max(6, len1), header1,
-		   Max(10, len2), header2,
-		   Max(10, len3), header3);
-
-	for (i = 0; i <= max_bit; i++)
-		printf("%*ld %*.5f %*lld\n",
-			   Max(6, len1), 1l << i,
-			   Max(10, len2) - 1, (double) histogram[i] * 100 / loop_count,
-			   Max(10, len3), histogram[i]);
+	printf("%*s %*s %*s %*s\n",
+		   len1, header1,
+		   len2, header2,
+		   len3, header3,
+		   len4, header4);
+
+	for (i = 0, rprct = 0; i <= max_bit; i++)
+	{
+		double		prct = (double) histogram[i] * 100 / loop_count;
+
+		rprct += prct;
+		printf("%*ld %*.4f %*.4f %*lld\n",
+			   len1, (1L << i) - 1,
+			   len2, prct,
+			   len3, rprct,
+			   len4, histogram[i]);
+	}
+
+	printf(_("\nTiming durations less than %d ns, first %u values:\n"), NUM_DIRECT,
+		   list_n_direct_values);
+	printf("%*s %*s %*s %*s\n",
+		   len1, header1b,
+		   len2, header2,
+		   len3, header3,
+		   len4, header4);
+
+	for (i = 0, rprct = 0; i < NUM_DIRECT && list_n_direct_values; i++)
+	{
+		double		prct = (double) direct_histogram[i] * 100 / loop_count;
+
+		rprct += prct;
+		if (direct_histogram[i])
+		{
+			list_n_direct_values--;
+			printf("%*d %*.4f %*.4f %*lld\n",
+				   len1, i,
+				   len2, prct,
+				   len3, rprct,
+				   len4, direct_histogram[i]);
+		}
+	}
 }
-- 
2.50.0.727.gbf7dc18ff4-goog