Re: [HACKERS] pgbench progress report improvements - split 3 v2

Fabien COELHO Tue, 24 Sep 2013 13:43:15 -0700

Split 3 of the initial submission, which actually deal with data measured andreported on stderr under various options.
This version currently takes into account many comments by Noah Mish. Inparticular, the default "no report" behavior under benchmarking is notchanged, although I really think that it should. Also, the standard deviationis only shown when available, which is not in under all settings, because ofconcerns expressed about the cost of additional calls to gettimeofday. ISTMthat these concerned are misplaced in this particular case.

Yet another version to fulfill requirements by Robert Haas not to changecurrent default behaviors. The initialization progress is reported every100-k rows, and there is no progress report for benchmarking.




Improve pgbench measurements & progress report

These changes are coupled because measures are changed, and their
reporting as well. Submitting separate patches for these different
features would result in conflicting or dependent patches, so I
wish to avoid that if possible.

 - Use same progress and quiet options both under init & bench.

   However, the default reporting is NOT changed, as required by
   Robert Haas. It is currently every 100k rows when initializing,
   and nothing when benchmarking.

   I would suggest to change this default to a few seconds for both.
   The 100-k row report is very verbose an unreadable on good hardware.
   For benchmarking, the rational is that it is error prone as it must
   run a long time to be significant because of warmup effects (esp on HDD,
   less true on SSD). Seeing how the current performance evolve would
   help pgbench users realise that. See down below a sample output.

 - Measure transaction latency instead of computing it,
   for --rate and --progress.

   The previous computed figure does not make sense under throttling:
   as sleep throttling time was included in the figures, the displayed
   results were plain false.

   The latency and its standard deviation are printed out under progress
   and in the final report when available.

   It could be made "always" available, but that would require to accept
   additional gettimeofday calls. I do not think that there is a
   performance issue here, but there is significant opposition to the idea.

 - Take thread start time at the beginning of the thread (!)

   Otherwise it includes pretty slow thread/fork system start times in
   the measurements. May help with bug #8326. This also helps with throttling,
   as the start time is used to adjust the rate: if it is not the actual
   start time, there is a bit of a rush at the beginning in order to
   catch up. Taking the start time when the thread actually starts is
   the sane thing to do to avoid surprises at possibly strange measures
   on short runs.

Sample output under initialization with --progress=1

  creating tables...
  1126000 of 3000000 tuples (37%) done (elapsed 1.00 s, remaining 1.67 s).
  2106000 of 3000000 tuples (70%) done (elapsed 2.00 s, remaining 0.85 s).
  2824000 of 3000000 tuples (94%) done (elapsed 3.00 s, remaining 0.19 s).
  3000000 of 3000000 tuples (100%) done (elapsed 3.19 s, remaining 0.00 s).
  vacuum...
  set primary keys...
  done.

Sample output under benchmarking with --progress=1

  progress: 1.0 s, 2626.1 tps, 0.374 stddev 0.597 ms lat
  progress: 2.0 s, 2766.6 tps, 0.358 stddev 0.588 ms lat
  progress: 3.0 s, 2567.4 tps, 0.385 stddev 0.665 ms lat
  progress: 4.0 s, 3014.2 tps, 0.328 stddev 0.593 ms lat
  progress: 5.0 s, 2959.3 tps, 0.334 stddev 0.553 ms lat
  ...
  progress: 16.0 s, 5009.2 tps, 0.197 stddev 0.381 ms lat
  ...
  progress: 24.0 s, 7051.2 tps, 0.139 stddev 0.284 ms lat
  ...
  progress: 50.0 s, 6860.5 tps, 0.143 stddev 0.052 ms lat
  ...

The "warmup" is quite fast because the DB is on a SSD. In the beginning
the standard deviation is well over the average transaction time, but
when the steady state is reached (later) it is much below.

--
Fabien.

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index ad8e272..934244b 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -167,11 +167,13 @@ char	   *index_tablespace = NULL;
 #define SCALE_32BIT_THRESHOLD 20000
 
 bool		use_log;			/* log transaction latencies to a file */
-bool		use_quiet;			/* quiet logging onto stderr */
 int			agg_interval;		/* log aggregates instead of individual
 								 * transactions */
-int			progress = 0;       /* thread progress report every this seconds */
+bool		use_quiet = false;	/* more or less "quiet" logging onto stderr */
+int			progress = -1;      /* report every this seconds.
+								   0 is no report, -1 is not set yet */
 int         progress_nclients = 0; /* number of clients for progress report */
+int         progress_nthreads = 0; /* number of threads for progress report */
 bool		is_connect;			/* establish connection for each transaction */
 bool		is_latencies;		/* report per-command latencies */
 int			main_pid;			/* main process id used in log filename */
@@ -214,6 +216,8 @@ typedef struct
 	int			nvariables;
 	instr_time	txn_begin;		/* used for measuring transaction latencies */
 	instr_time	stmt_begin;		/* used for measuring statement latencies */
+	int64       txn_latencies;	/* cumulated latencies */
+	int64		txn_sqlats;		/* cumulated square latencies */
 	bool		is_throttled;	/* whether transaction throttling is done */
 	int			use_file;		/* index in sql_files for this client */
 	bool		prepared[MAX_FILES];
@@ -243,6 +247,8 @@ typedef struct
 {
 	instr_time	conn_time;
 	int			xacts;
+	int64		latencies;
+	int64		sqlats;
 	int64       throttle_lag;
 	int64       throttle_lag_max;
 } TResult;
@@ -347,7 +353,6 @@ usage(void)
 		   "  -i, --initialize         invokes initialization mode\n"
 		   "  -F, --fillfactor=NUM     set fill factor\n"
 		   "  -n, --no-vacuum          do not run VACUUM after initialization\n"
-		   "  -q, --quiet              quiet logging (one message each 5 seconds)\n"
 		   "  -s, --scale=NUM          scaling factor\n"
 		   "  --foreign-keys           create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
@@ -367,7 +372,6 @@ usage(void)
 		                                           "(default: simple)\n"
 		   "  -n, --no-vacuum          do not run VACUUM before tests\n"
 		   "  -N, --skip-some-updates  skip updates of pgbench_tellers and pgbench_branches\n"
-		   "  -P, --progress=NUM       show thread progress report every NUM seconds\n"
 		   "  -r, --report-latencies   report average latency per command\n"
 		   "  -R, --rate=SPEC          target rate in transactions per second\n"
 		   "  -s, --scale=NUM          report this scale factor in output\n"
@@ -382,6 +386,9 @@ usage(void)
 		   "  -d, --debug              print debugging output\n"
 		   "  -h, --host=HOSTNAME      database server host or socket directory\n"
 		   "  -p, --port=PORT          database server port number\n"
+		   "  -P, --progress=NUM       show progress report every NUM seconds\n"
+		   "  -q, --quiet              bench: no progress report\n"
+		   "                           init: every 5 second progress report\n"
 		   "  -U, --username=USERNAME  connect as specified database user\n"
 		   "  -V, --version            output version information, then exit\n"
 		   "  -?, --help               show this help, then exit\n"
@@ -1003,6 +1010,21 @@ top:
 		}
 
 		/*
+		 * always record latency under progress or throttling
+		 */
+		if ((progress || throttle_delay) && commands[st->state + 1] == NULL)
+		{
+			instr_time	now, diff;
+			int64 latency;
+			INSTR_TIME_SET_CURRENT(now);
+			diff = now;
+			INSTR_TIME_SUBTRACT(diff, st->txn_begin);
+			latency = INSTR_TIME_GET_MICROSEC(diff);
+			st->txn_latencies += latency;
+			st->txn_sqlats += latency*latency;
+		}
+
+		/*
 		 * if transaction finished, record the time it took in the log
 		 */
 		if (logfile && commands[st->state + 1] == NULL)
@@ -1191,8 +1213,11 @@ top:
 		goto top;
 	}
 
-	/* Record transaction start time if logging is enabled */
-	if (logfile && st->state == 0)
+	/*
+	 * Record transaction start time
+	 * if logging, progress or throttling is enabled
+	 */
+	if ((logfile || progress || throttle_delay) && st->state == 0)
 		INSTR_TIME_SET_CURRENT(st->txn_begin);
 
 	/* Record statement start time if per-command latencies are requested */
@@ -1682,7 +1707,7 @@ init(bool is_no_vacuum)
 		 * If we want to stick with the original logging, print a message each
 		 * 100k inserted rows.
 		 */
-		if ((!use_quiet) && (j % 100000 == 0))
+		if ((!progress) && (j % 100000 == 0))
 		{
 			INSTR_TIME_SET_CURRENT(diff);
 			INSTR_TIME_SUBTRACT(diff, start);
@@ -1695,8 +1720,8 @@ init(bool is_no_vacuum)
 					(int) (((int64) j * 100) / (naccounts * scale)),
 					elapsed_sec, remaining_sec);
 		}
-		/* let's not call the timing for each row, but only each 100 rows */
-		else if (use_quiet && (j % 100 == 0))
+		/* gettimeofday is called every 1000 rows, should be under 0.01 s */
+		else if (progress && (j % 1000 == 0))
 		{
 			INSTR_TIME_SET_CURRENT(diff);
 			INSTR_TIME_SUBTRACT(diff, start);
@@ -1705,14 +1730,14 @@ init(bool is_no_vacuum)
 			remaining_sec = (scale * naccounts - j) * elapsed_sec / j;
 
 			/* have we reached the next interval (or end)? */
-			if ((j == scale * naccounts) || (elapsed_sec >= log_interval * LOG_STEP_SECONDS))
+			if ((j == scale * naccounts) || (elapsed_sec >= log_interval * progress))
 			{
 				fprintf(stderr, INT64_FORMAT " of " INT64_FORMAT " tuples (%d%%) done (elapsed %.2f s, remaining %.2f s).\n",
 						j, (int64) naccounts * scale,
 						(int) (((int64) j * 100) / (naccounts * scale)), elapsed_sec, remaining_sec);
 
 				/* skip to the next interval */
-				log_interval = (int) ceil(elapsed_sec / LOG_STEP_SECONDS);
+				log_interval = (int) ceil(elapsed_sec / progress);
 			}
 		}
 
@@ -2096,6 +2121,7 @@ static void
 printResults(int ttype, int normal_xacts, int nclients,
 			 TState *threads, int nthreads,
 			 instr_time total_time, instr_time conn_total_time,
+			 int64 total_latencies, int64 total_sqlats,
 			 int64 throttle_lag, int64 throttle_lag_max)
 {
 	double		time_include,
@@ -2135,6 +2161,22 @@ printResults(int ttype, int normal_xacts, int nclients,
 			   normal_xacts);
 	}
 
+	if (throttle_delay || progress)
+	{
+		/* compute and show latency average and standard deviation */
+		double latency = 0.001 * total_latencies / normal_xacts;
+		double sqlat = (double) total_sqlats / normal_xacts;
+		printf("latency average: %.3f ms\n"
+			   "latency stddev: %.3f ms\n",
+			   latency, 0.001 * sqrt(sqlat - 1000000*latency*latency));
+	}
+	else
+ 	{
+		/* only an average latency computed from the duration is available */
+		printf("latency average: %.3f ms\n",
+			   1000.0 * duration / normal_xacts);
+	}
+
 	if (throttle_delay)
 	{
 		/*
@@ -2143,8 +2185,9 @@ printResults(int ttype, int normal_xacts, int nclients,
 		 * transaction.  The measured lag may be caused by thread/client load,
 		 * the database load, or the Poisson throttling process.
 		 */
-		printf("average rate limit schedule lag: %.3f ms (max %.3f ms)\n",
-			   0.001 * throttle_lag / normal_xacts, 0.001 * throttle_lag_max);
+		printf("rate limit lag: avg %.3f (max %.3f) ms\n",
+			   0.001 * throttle_lag / normal_xacts,
+			   0.001 * throttle_lag_max);
 	}
 
 	printf("tps = %f (including connections establishing)\n", tps_include);
@@ -2254,7 +2297,9 @@ main(int argc, char **argv)
 	instr_time	start_time;		/* start up time */
 	instr_time	total_time;
 	instr_time	conn_total_time;
-	int			total_xacts;
+	int			total_xacts = 0;
+	int64		total_latencies = 0;
+	int64		total_sqlats = 0;
 	int64       throttle_lag = 0;
 	int64       throttle_lag_max = 0;
 
@@ -2519,6 +2564,13 @@ main(int argc, char **argv)
     /* compute a per thread delay */
 	throttle_delay *= nthreads;
 
+	/* set default progress report time
+	 * the previous default every 100-k row report for -i is preserved.
+	 * the no progress report by default for benchmarking is also preserved.
+	 */
+	if (progress == -1)
+		progress = is_init_mode && use_quiet? LOG_STEP_SECONDS: 0;
+
 	if (argc > optind)
 		dbName = argv[optind];
 	else
@@ -2554,13 +2606,6 @@ main(int argc, char **argv)
 		exit(1);
 	}
 
-	/* -q may be used only with -i */
-	if (use_quiet && !is_init_mode)
-	{
-		fprintf(stderr, "quiet-logging is allowed only in initialization mode (-i)\n");
-		exit(1);
-	}
-
 	/* --sampling-rate may must not be used with --aggregate-interval */
 	if (sample_rate > 0.0 && agg_interval > 0)
 	{
@@ -2608,6 +2653,7 @@ main(int argc, char **argv)
 	 */
 	main_pid = (int) getpid();
 	progress_nclients = nclients;
+	progress_nthreads = nthreads;
 
 	if (nclients > 1)
 	{
@@ -2796,8 +2842,6 @@ main(int argc, char **argv)
 	{
 		TState	   *thread = &threads[i];
 
-		INSTR_TIME_SET_CURRENT(thread->start_time);
-
 		/* the first thread (i = 0) is executed by main thread */
 		if (i > 0)
 		{
@@ -2816,15 +2860,16 @@ main(int argc, char **argv)
 	}
 
 	/* wait for threads and accumulate results */
-	total_xacts = 0;
 	INSTR_TIME_SET_ZERO(conn_total_time);
 	for (i = 0; i < nthreads; i++)
 	{
 		void	   *ret = NULL;
 
 		if (threads[i].thread == INVALID_THREAD)
+			/* run thread 0 */
 			ret = threadRun(&threads[i]);
 		else
+			/* wait for other threads */
 			pthread_join(threads[i].thread, &ret);
 
 		if (ret != NULL)
@@ -2832,6 +2877,8 @@ main(int argc, char **argv)
 			TResult    *r = (TResult *) ret;
 
 			total_xacts += r->xacts;
+			total_latencies += r->latencies;
+			total_sqlats += r->sqlats;
 			throttle_lag += r->throttle_lag;
 			if (r->throttle_lag_max > throttle_lag_max)
 				throttle_lag_max = r->throttle_lag_max;
@@ -2845,7 +2892,8 @@ main(int argc, char **argv)
 	INSTR_TIME_SET_CURRENT(total_time);
 	INSTR_TIME_SUBTRACT(total_time, start_time);
 	printResults(ttype, total_xacts, nclients, threads, nthreads,
-				 total_time, conn_total_time, throttle_lag, throttle_lag_max);
+				 total_time, conn_total_time,
+				 total_latencies, total_sqlats, throttle_lag, throttle_lag_max);
 
 	return 0;
 }
@@ -2863,21 +2911,24 @@ threadRun(void *arg)
 	int			remains = nstate;		/* number of remaining clients */
 	int			i;
 	/* for reporting progress: */
-	int64       thread_start = INSTR_TIME_GET_MICROSEC(thread->start_time);
-	int64		last_report = thread_start;
-	int64		next_report = last_report + progress * 1000000;
-	int64		last_count = 0;
+	int64		thread_start, last_report, next_report;
+	int64		last_count = 0, last_lats = 0, last_sqlats = 0, last_lags = 0;
 
 	AggVals		aggs;
 
 	/*
-	 * Initialize throttling rate target for all of the thread's clients.  It
-	 * might be a little more accurate to reset thread->start_time here too.
-	 * The possible drift seems too small relative to typical throttle delay
-	 * times to worry about it.
+	 * This is the real start time of the thread. Take it here to avoid
+	 * including thread creation time in measures.
 	 */
-	INSTR_TIME_SET_CURRENT(start);
-	thread->throttle_trigger = INSTR_TIME_GET_MICROSEC(start);
+	INSTR_TIME_SET_CURRENT(thread->start_time);
+
+	/*
+	 * Initialize throttling rate target for all of the thread's clients.
+	 */
+	thread_start = INSTR_TIME_GET_MICROSEC(thread->start_time);
+	last_report = thread_start;
+	next_report = last_report + progress * 1000000;
+	thread->throttle_trigger = thread_start;
 	thread->throttle_lag = 0;
 	thread->throttle_lag_max = 0;
 
@@ -3066,21 +3117,40 @@ threadRun(void *arg)
 			if (now >= next_report)
 			{
 				/* generate and show report */
-				int64 count = 0;
+				int64 count = 0, lats = 0, sqlats = 0;
+				int64 lags = thread->throttle_lag;
 				int64 run = now - last_report;
-				float tps, total_run, latency;
+				double tps, total_run, latency, sqlat, stdev, lag;
 
 				for (i = 0 ; i < nstate ; i++)
+				{
 					count += state[i].cnt;
+					lats   += state[i].txn_latencies;
+					sqlats += state[i].txn_sqlats;
+				}
 
 				total_run = (now - thread_start) / 1000000.0;
 				tps = 1000000.0 * (count - last_count) / run;
-				latency = 1000.0 * nstate / tps;
+				latency = 0.001 * (lats - last_lats) / (count - last_count);
+				sqlat = 1.0 * (sqlats - last_sqlats) / (count - last_count);
+				stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
+				lag = 0.001 * (lags - last_lags) / (count - last_count);
 
-				fprintf(stderr, "progress %d: %.1f s, %.1f tps, %.3f ms lat\n",
-						thread->tid, total_run, tps, latency);
+				if (throttle_delay)
+					fprintf(stderr,
+							"progress %d: %.1f s, %.1f tps, "
+							"%.3f stddev %.3f ms lat, %.3f ms lag\n",
+							thread->tid, total_run, tps, latency, stdev, lag);
+				else
+					fprintf(stderr,
+							"progress %d: %.1f s, %.1f tps, "
+							"%.3f stddev %.3f ms lat\n",
+							thread->tid, total_run, tps, latency, stdev);
 
 				last_count = count;
+				last_lats = lats;
+				last_sqlats = sqlats;
+				last_lags = lags;
 				last_report = now;
 				next_report += progress * 1000000;
 			}
@@ -3096,21 +3166,42 @@ threadRun(void *arg)
 			if (now >= next_report)
 			{
 				/* generate and show report */
-				int64 count = 0;
+				int64 count = 0, lats = 0, sqlats = 0, lags = 0;
 				int64 run = now - last_report;
-				float tps, total_run, latency;
+				float tps, total_run, latency, sqlat, lag, stdev;
 
 				for (i = 0 ; i < progress_nclients ; i++)
+				{
 					count += state[i].cnt;
+					lats   += state[i].txn_latencies;
+					sqlats += state[i].txn_sqlats;
+				}
+
+				for (i = 0 ; i < progress_nthreads ; i++)
+					lags += thread[i].throttle_lag;
 
 				total_run = (now - thread_start) / 1000000.0;
 				tps = 1000000.0 * (count - last_count) / run;
-				latency = 1000.0 * progress_nclients / tps;
+				latency = 0.001 * (lats - last_lats) / (count - last_count);
+				sqlat = 1.0 * (sqlats - last_sqlats) / (count - last_count);
+				stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
+				lag = 0.001 * (lags - last_lags) / (count - last_count);
 
-				fprintf(stderr, "progress: %.1f s, %.1f tps, %.3f ms lat\n",
-						total_run, tps, latency);
+				if (throttle_delay)
+					fprintf(stderr,
+							"progress: %.1f s, %.1f tps, "
+							"%.3f stddev %.3f ms lat, %.3f ms lag\n",
+							total_run, tps, latency, stdev, lag);
+				else
+					fprintf(stderr,
+							"progress: %.1f s, %.1f tps, "
+							"%.3f stddev %.3f ms lat\n",
+							total_run, tps, latency, stdev);
 
 				last_count = count;
+				last_lats = lats;
+				last_sqlats = sqlats;
+				last_lags = lags;
 				last_report = now;
 				next_report += progress * 1000000;
 			}
@@ -3122,8 +3213,14 @@ done:
 	INSTR_TIME_SET_CURRENT(start);
 	disconnect_all(state, nstate);
 	result->xacts = 0;
+	result->latencies = 0;
+	result->sqlats = 0;
 	for (i = 0; i < nstate; i++)
+	{
 		result->xacts += state[i].cnt;
+		result->latencies += state[i].txn_latencies;
+		result->sqlats += state[i].txn_sqlats;
+	}
 	result->throttle_lag = thread->throttle_lag;
 	result->throttle_lag_max = thread->throttle_lag_max;
 	INSTR_TIME_SET_CURRENT(end);
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index 49a79b1..825592d 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -182,18 +182,6 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
      </varlistentry>
 
      <varlistentry>
-      <term><option>-q</option></term>
-      <term><option>--quiet</option></term>
-      <listitem>
-       <para>
-        Switch logging to quiet mode, producing only one progress message per 5
-        seconds. The default logging prints one message each 100000 rows, which
-        often outputs many lines per second (especially on good hardware).
-       </para>
-      </listitem>
-     </varlistentry>
-
-     <varlistentry>
       <term><option>-s</option> <replaceable>scale_factor</></term>
       <term><option>--scale=</option><replaceable>scale_factor</></term>
       <listitem>
@@ -408,16 +396,6 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
      </varlistentry>
 
      <varlistentry>
-      <term><option>-P</option> <replaceable>sec</></term>
-      <term><option>--progress=</option><replaceable>sec</></term>
-      <listitem>
-       <para>
-        Show progress report every <literal>sec</> seconds.
-       </para>
-      </listitem>
-     </varlistentry>
-
-     <varlistentry>
       <term><option>-R</option> <replaceable>rate</></term>
       <term><option>--rate=</option><replaceable>rate</></term>
       <listitem>
@@ -591,6 +569,39 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
      </varlistentry>
 
      <varlistentry>
+      <term><option>-P</option> <replaceable>sec</></term>
+      <term><option>--progress=</option><replaceable>sec</></term>
+      <listitem>
+       <para>
+        Show progress report every <literal>sec</> seconds.
+
+        When initializing, the report displays the number of inserted rows
+        and its total percentage, the passed time and estimated remaining time
+        for completion.
+
+        When benchmarking, the report displays the time since the beginning
+        of the run, the tps since the last report, the transaction latency
+        average and standard deviation. Under throttling (<option>-R</option>),
+        also includes the lag time behind schedule.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-q</option></term>
+      <term><option>--quiet</option></term>
+      <listitem>
+       <para>
+        When initializing: switch to every 5 seconds progress report instead
+        of the default possibly verbose every 100-k rows progress report.
+       </para>
+       <para>
+        When benchmarking: switch off regular progress reports.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>-U</option> <replaceable>login</></term>
       <term><option>--username=</option><replaceable>login</></term>
       <listitem>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgbench progress report improvements - split 3 v2

Reply via email to