Re: [HACKERS] pgbench throttling latency limit

Heikki Linnakangas Thu, 02 Oct 2014 03:52:42 -0700

On 09/15/2014 08:46 PM, Fabien COELHO wrote:

I'm not sure I like the idea of printing a percentage.  It might be
unclear what the denominator was if somebody feels the urge to work
back to the actual number of skipped transactions.  I mean, I guess
it's probably just the value you passed to -R, so maybe that's easy
enough, but then why bother dividing in the first place?  The user can
do that easily enough if they want the data that way.


Indeed "skipped" and "late" per second may have an unclear denominator. If
you divide by the time, the unit would be "tps", but 120 tps performance
including 20 late tps, plus 10 skipped tps... I do not think it is that
clear. Reporting "tps" for transaction *not* performed looks strange.

Maybe late transactions could be given as a percentage of all processed
transactions in the interval. But for skipped the percentage of what? The
only number that would make sense is the total number of transactions
schedule in the interval, but that would mean that the denominator for
late would be different than the denominator for skipped, which is
basically uncomprehensible.

Hmm. I guess the absolute number makes sense, if you expect that thereare normally zero skipped transactions, or at least a very small number.It's more like a "good or no good" indicator. Ok, I'm fine with that.


The version I'm now working on prints output like this:

$ ./pgbench -T10 -P1  --rate=1600 --latency-limit=10
starting vacuum...end.
progress: 1.0 s, 1579.0 tps, lat 2.973 ms stddev 2.493, lag 2.414 ms, 4 skipped
progress: 2.0 s, 1570.0 tps, lat 2.140 ms stddev 1.783, lag 1.599 ms, 0 skipped
progress: 3.0 s, 1663.0 tps, lat 2.372 ms stddev 1.742, lag 1.843 ms, 4 skipped
progress: 4.0 s, 1603.2 tps, lat 2.435 ms stddev 2.247, lag 1.902 ms, 13 skipped
progress: 5.0 s, 1540.9 tps, lat 1.845 ms stddev 1.270, lag 1.303 ms, 0 skipped
progress: 6.0 s, 1588.0 tps, lat 1.630 ms stddev 1.003, lag 1.097 ms, 0 skipped
progress: 7.0 s, 1577.0 tps, lat 2.071 ms stddev 1.445, lag 1.517 ms, 0 skipped
progress: 8.0 s, 1669.9 tps, lat 2.375 ms stddev 1.917, lag 1.846 ms, 0 skipped
progress: 9.0 s, 1636.0 tps, lat 2.801 ms stddev 2.354, lag 2.250 ms, 5 skipped
progress: 10.0 s, 1606.1 tps, lat 2.751 ms stddev 2.117, lag 2.197 ms, 2 skipped
transaction type: TPC-B (sort of)
scaling factor: 5
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 16034
number of transactions skipped: 28 (0.174 %)
number of transactions above the 10.0 ms latency limit: 70 (0.436 %)
latency average: 2.343 ms
latency stddev: 1.940 ms
rate limit schedule lag: avg 1.801 (max 9.994) ms
tps = 1603.370819 (including connections establishing)
tps = 1603.619536 (excluding connections establishing)

Those progress lines are 79 or 80 characters wide, so they *just* fit ina 80-char terminal. Of course, if any of the printed numbers werehigher, it would not fit. I don't see how to usefully make it moreterse, though. I think we can live with this - these days it shouldn'tbe a huge problem to enlare the terminal to make the output fit.

Here are new patches, again the first one is just refactoring, and thesecond one contains this feature. I'm planning to commit the first oneshortly, and the second one later after people have had a chance to lookat it.

Greg: As the author of pgbench-tools, what do you think of this patch?The log file format, in particular.


- Heikki

>From 512fde5dc3fde5fc1368b3bf0c09e3ea8e022fad Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <[email protected]>
Date: Thu, 2 Oct 2014 12:58:14 +0300
Subject: [PATCH 1/2] Refactor pgbench log-writing code to a separate function.

The doCustom function was incredibly long, this makes it a little bit more
readable.
---
 contrib/pgbench/pgbench.c | 340 +++++++++++++++++++++++-----------------------
 1 file changed, 169 insertions(+), 171 deletions(-)

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 087e0d3..c14a577 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -347,6 +347,9 @@ static char *select_only = {
 static void setalarm(int seconds);
 static void *threadRun(void *arg);
 
+static void doLog(TState *thread, CState *st, FILE *logfile, instr_time *now,
+	  AggVals *agg);
+
 static void
 usage(void)
 {
@@ -1016,6 +1019,16 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVa
 	PGresult   *res;
 	Command   **commands;
 	bool		trans_needs_throttle = false;
+	instr_time	now;
+
+	/*
+	 * gettimeofday() isn't free, so we get the current timestamp lazily the
+	 * first time it's needed, and reuse the same value throughout this
+	 * function after that. This also ensures that e.g. the calculated latency
+	 * reported in the log file and in the totals are the same. Zero means
+	 * "not set yet".
+	 */
+	INSTR_TIME_SET_ZERO(now);
 
 top:
 	commands = sql_files[st->use_file];
@@ -1049,10 +1062,10 @@ top:
 
 	if (st->sleeping)
 	{							/* are we sleeping? */
-		instr_time	now;
 		int64		now_us;
 
-		INSTR_TIME_SET_CURRENT(now);
+		if (INSTR_TIME_IS_ZERO(now))
+			INSTR_TIME_SET_CURRENT(now);
 		now_us = INSTR_TIME_GET_MICROSEC(now);
 		if (st->txn_scheduled <= now_us)
 		{
@@ -1074,11 +1087,6 @@ top:
 
 	if (st->listen)
 	{							/* are we receiver? */
-		instr_time	now;
-		bool		now_valid = false;
-
-		INSTR_TIME_SET_ZERO(now); /* initialize to keep compiler quiet */
-
 		if (commands[st->state]->type == SQL_COMMAND)
 		{
 			if (debug)
@@ -1100,181 +1108,40 @@ top:
 		{
 			int			cnum = commands[st->state]->command_num;
 
-			if (!now_valid)
-			{
+			if (INSTR_TIME_IS_ZERO(now))
 				INSTR_TIME_SET_CURRENT(now);
-				now_valid = true;
-			}
 			INSTR_TIME_ACCUM_DIFF(thread->exec_elapsed[cnum],
 								  now, st->stmt_begin);
 			thread->exec_count[cnum]++;
 		}
 
-		/* transaction finished: record latency under progress or throttling */
-		if ((progress || throttle_delay) && commands[st->state + 1] == NULL)
+		/* transaction finished: calculate latency and log the transaction */
+		if (commands[st->state + 1] == NULL)
 		{
-			int64		latency;
-
-			if (!now_valid)
+			/* only calculate latency if an option is used that needs it */
+			if (progress || throttle_delay)
 			{
-				INSTR_TIME_SET_CURRENT(now);
-				now_valid = true;
-			}
-
-			latency = INSTR_TIME_GET_MICROSEC(now) - st->txn_scheduled;
+				int64		latency;
 
-			st->txn_latencies += latency;
-
-			/*
-			 * XXX In a long benchmark run of high-latency transactions, this
-			 * int64 addition eventually overflows.  For example, 100 threads
-			 * running 10s transactions will overflow it in 2.56 hours.  With
-			 * a more-typical OLTP workload of .1s transactions, overflow
-			 * would take 256 hours.
-			 */
-			st->txn_sqlats += latency * latency;
-		}
-
-		/*
-		 * if transaction finished, record the time it took in the log
-		 */
-		if (logfile && commands[st->state + 1] == NULL)
-		{
-			double		lag;
-			double		latency;
-
-			/*
-			 * write the log entry if this row belongs to the random sample,
-			 * or no sampling rate was given which means log everything.
-			 */
-			if (sample_rate == 0.0 ||
-				pg_erand48(thread->random_state) <= sample_rate)
-			{
-				if (!now_valid)
-				{
+				if (INSTR_TIME_IS_ZERO(now))
 					INSTR_TIME_SET_CURRENT(now);
-					now_valid = true;
-				}
-				latency = (double) (INSTR_TIME_GET_MICROSEC(now) - st->txn_scheduled);
-				lag = (double) (INSTR_TIME_GET_MICROSEC(st->txn_begin) - st->txn_scheduled);
-
-				/* should we aggregate the results or not? */
-				if (agg_interval > 0)
-				{
-					/*
-					 * are we still in the same interval? if yes, accumulate
-					 * the values (print them otherwise)
-					 */
-					if (agg->start_time + agg_interval >= INSTR_TIME_GET_DOUBLE(now))
-					{
-						agg->cnt += 1;
-						agg->sum_latency += latency;
-						agg->sum2_latency += latency * latency;
-
-						/* first in this aggregation interval */
-						if ((agg->cnt == 1) || (latency < agg->min_latency))
-							agg->min_latency = latency;
-
-						if ((agg->cnt == 1) || (latency > agg->max_latency))
-							agg->max_latency = latency;
-
-						/* and the same for schedule lag */
-						if (throttle_delay)
-						{
-							agg->sum_lag += lag;
-							agg->sum2_lag += lag * lag;
-
-							if ((agg->cnt == 1) || (lag < agg->min_lag))
-								agg->min_lag = lag;
-							if ((agg->cnt == 1) || (lag > agg->max_lag))
-								agg->max_lag = lag;
-						}
-					}
-					else
-					{
-						/*
-						 * Loop until we reach the interval of the current
-						 * transaction (and print all the empty intervals in
-						 * between).
-						 */
-						while (agg->start_time + agg_interval < INSTR_TIME_GET_DOUBLE(now))
-						{
-							/*
-							 * This is a non-Windows branch (thanks to the
-							 * ifdef in usage), so we don't need to handle
-							 * this in a special way (see below).
-							 */
-							fprintf(logfile, "%ld %d %.0f %.0f %.0f %.0f",
-									agg->start_time,
-									agg->cnt,
-									agg->sum_latency,
-									agg->sum2_latency,
-									agg->min_latency,
-									agg->max_latency);
-							if (throttle_delay)
-								fprintf(logfile, " %.0f %.0f %.0f %.0f",
-										agg->sum_lag,
-										agg->sum2_lag,
-										agg->min_lag,
-										agg->max_lag);
-							fputc('\n', logfile);
-
-							/* move to the next inteval */
-							agg->start_time = agg->start_time + agg_interval;
-
-							/* reset for "no transaction" intervals */
-							agg->cnt = 0;
-							agg->min_latency = 0;
-							agg->max_latency = 0;
-							agg->sum_latency = 0;
-							agg->sum2_latency = 0;
-							agg->min_lag = 0;
-							agg->max_lag = 0;
-							agg->sum_lag = 0;
-							agg->sum2_lag = 0;
-						}
-
-						/*
-						 * and now update the reset values (include the
-						 * current)
-						 */
-						agg->cnt = 1;
-						agg->min_latency = latency;
-						agg->max_latency = latency;
-						agg->sum_latency = latency;
-						agg->sum2_latency = latency * latency;
-						agg->min_lag = lag;
-						agg->max_lag = lag;
-						agg->sum_lag = lag;
-						agg->sum2_lag = lag * lag;
-					}
-				}
-				else
-				{
-					/* no, print raw transactions */
-#ifndef WIN32
-
-					/*
-					 * This is more than we really ought to know about
-					 * instr_time
-					 */
-					fprintf(logfile, "%d %d %.0f %d %ld %ld",
-							st->id, st->cnt, latency, st->use_file,
-							(long) now.tv_sec, (long) now.tv_usec);
-#else
-
-					/*
-					 * On Windows, instr_time doesn't provide a timestamp
-					 * anyway
-					 */
-					fprintf(logfile, "%d %d %.0f %d 0 0",
-							st->id, st->cnt, latency, st->use_file);
-#endif
-					if (throttle_delay)
-						fprintf(logfile, " %.0f", lag);
-					fputc('\n', logfile);
-				}
+				latency = INSTR_TIME_GET_MICROSEC(now) - st->txn_scheduled;
+
+				st->txn_latencies += latency;
+
+				/*
+				 * XXX In a long benchmark run of high-latency transactions,
+				 * this int64 addition eventually overflows.  For example, 100
+				 * threads running 10s transactions will overflow it in 2.56
+				 * hours.  With a more-typical OLTP workload of .1s
+				 * transactions, overflow would take 256 hours.
+				 */
+				st->txn_sqlats += latency * latency;
 			}
+
+			/* record the time it took in the log */
+			if (logfile)
+				doLog(thread, st, logfile, &now, agg);
 		}
 
 		if (commands[st->state]->type == SQL_COMMAND)
@@ -1734,6 +1601,137 @@ top:
 	return true;
 }
 
+/*
+ * print log entry after completing one transaction.
+ */
+static void
+doLog(TState *thread, CState *st, FILE *logfile, instr_time *now, AggVals *agg)
+{
+	double		lag;
+	double		latency;
+
+	/*
+	 * Skip the log entry if sampling is enabled and this row doesn't belong
+	 * to the random sample.
+	 */
+	if (sample_rate != 0.0 &&
+		pg_erand48(thread->random_state) > sample_rate)
+		return;
+
+	if (INSTR_TIME_IS_ZERO(*now))
+		INSTR_TIME_SET_CURRENT(*now);
+
+	latency = (double) (INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled);
+	lag = (double) (INSTR_TIME_GET_MICROSEC(st->txn_begin) - st->txn_scheduled);
+
+	/* should we aggregate the results or not? */
+	if (agg_interval > 0)
+	{
+		/*
+		 * Are we still in the same interval? If yes, accumulate the values
+		 * (print them otherwise)
+		 */
+		if (agg->start_time + agg_interval >= INSTR_TIME_GET_DOUBLE(*now))
+		{
+			agg->cnt += 1;
+			agg->sum_latency += latency;
+			agg->sum2_latency += latency * latency;
+
+			/* first in this aggregation interval */
+			if ((agg->cnt == 1) || (latency < agg->min_latency))
+				agg->min_latency = latency;
+
+			if ((agg->cnt == 1) || (latency > agg->max_latency))
+				agg->max_latency = latency;
+
+			/* and the same for schedule lag */
+			if (throttle_delay)
+			{
+				agg->sum_lag += lag;
+				agg->sum2_lag += lag * lag;
+
+				if ((agg->cnt == 1) || (lag < agg->min_lag))
+					agg->min_lag = lag;
+				if ((agg->cnt == 1) || (lag > agg->max_lag))
+					agg->max_lag = lag;
+			}
+		}
+		else
+		{
+			/*
+			 * Loop until we reach the interval of the current transaction
+			 * (and print all the empty intervals in between).
+			 */
+			while (agg->start_time + agg_interval < INSTR_TIME_GET_DOUBLE(*now))
+			{
+				/*
+				 * This is a non-Windows branch (thanks to the
+				 * ifdef in usage), so we don't need to handle
+				 * this in a special way (see below).
+				 */
+				fprintf(logfile, "%ld %d %.0f %.0f %.0f %.0f",
+						agg->start_time,
+						agg->cnt,
+						agg->sum_latency,
+						agg->sum2_latency,
+						agg->min_latency,
+						agg->max_latency);
+				if (throttle_delay)
+					fprintf(logfile, " %.0f %.0f %.0f %.0f",
+							agg->sum_lag,
+							agg->sum2_lag,
+							agg->min_lag,
+							agg->max_lag);
+				fputc('\n', logfile);
+
+				/* move to the next inteval */
+				agg->start_time = agg->start_time + agg_interval;
+
+				/* reset for "no transaction" intervals */
+				agg->cnt = 0;
+				agg->min_latency = 0;
+				agg->max_latency = 0;
+				agg->sum_latency = 0;
+				agg->sum2_latency = 0;
+				agg->min_lag = 0;
+				agg->max_lag = 0;
+				agg->sum_lag = 0;
+				agg->sum2_lag = 0;
+			}
+
+			/* reset the values to include only the current transaction. */
+			agg->cnt = 1;
+			agg->min_latency = latency;
+			agg->max_latency = latency;
+			agg->sum_latency = latency;
+			agg->sum2_latency = latency * latency;
+			agg->min_lag = lag;
+			agg->max_lag = lag;
+			agg->sum_lag = lag;
+			agg->sum2_lag = lag * lag;
+		}
+	}
+	else
+	{
+		/* no, print raw transactions */
+#ifndef WIN32
+
+		/* This is more than we really ought to know about instr_time */
+		fprintf(logfile, "%d %d %.0f %d %ld %ld",
+				st->id, st->cnt, latency, st->use_file,
+				(long) now->tv_sec, (long) now->tv_usec);
+#else
+
+		/* On Windows, instr_time doesn't provide a timestamp anyway */
+		fprintf(logfile, "%d %d %.0f %d 0 0",
+				st->id, st->cnt, latency, st->use_file);
+#endif
+		if (throttle_delay)
+			fprintf(logfile, " %.0f", lag);
+		fputc('\n', logfile);
+	}
+}
+
 /* discard connections */
 static void
 disconnect_all(CState *state, int length)
-- 
2.1.1

>From 834cc53a95bda4710e3364643e56772451170d38 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <[email protected]>
Date: Thu, 2 Oct 2014 13:42:13 +0300
Subject: [PATCH 2/2] Add --latency-limit option to pgbench.

This allows transactions that take longer than specified limit to be counted
separately. With --rate, transactions that are already late by the time we
get to execute them are skipped altogether. Using --latency-limit with
--rate allows you to "catch up" more quickly, if there's a hickup in the
server causing a lot of transactions to stall momentarily.

Fabien COELHO, reviewed by Rukh Meski and heavily refactored by me.
---
 contrib/pgbench/pgbench.c | 228 +++++++++++++++++++++++++++++++++++-----------
 doc/src/sgml/pgbench.sgml |  43 +++++++--
 2 files changed, 207 insertions(+), 64 deletions(-)

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index c14a577..1ed412c 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -141,6 +141,18 @@ double		sample_rate = 0.0;
 int64		throttle_delay = 0;
 
 /*
+ * Transactions which take longer that this limit are counted as late
+ * and reported as such, although they are completed anyway.
+ *
+ * When under throttling: execution time slots which are more than
+ * this late (in us) are simply skipped, and the corresponding transaction
+ * is counted as such... it is not even started;
+ * otherwise above the limit transactions are counted as such, with the latency
+ * measured wrt the transaction schedule, not its actual start.
+ */
+int64		latency_limit = 0;
+
+/*
  * tablespace selection
  */
 char	   *tablespace = NULL;
@@ -238,6 +250,8 @@ typedef struct
 	int64		throttle_trigger;		/* previous/next throttling (us) */
 	int64		throttle_lag;	/* total transaction lag behind throttling */
 	int64		throttle_lag_max;		/* max transaction lag */
+	int64		throttle_latency_skipped; /* lagging transactions skipped */
+	int64		latency_late;	/* late transactions */
 } TState;
 
 #define INVALID_THREAD		((pthread_t) 0)
@@ -250,6 +264,8 @@ typedef struct
 	int64		sqlats;
 	int64		throttle_lag;
 	int64		throttle_lag_max;
+	int64		throttle_latency_skipped;
+	int64		latency_late;
 } TResult;
 
 /*
@@ -284,6 +300,8 @@ typedef struct
 
 	long		start_time;		/* when does the interval start */
 	int			cnt;			/* number of transactions */
+	int			skipped;		/* number of transactions skipped under
+								 * --rate and --latency-limit */
 
 	double		min_latency;	/* min/max latencies */
 	double		max_latency;
@@ -348,7 +366,7 @@ static void setalarm(int seconds);
 static void *threadRun(void *arg);
 
 static void doLog(TState *thread, CState *st, FILE *logfile, instr_time *now,
-	  AggVals *agg);
+	  AggVals *agg, bool skipped);
 
 static void
 usage(void)
@@ -375,6 +393,10 @@ usage(void)
 		 "  -f, --file=FILENAME      read transaction script from FILENAME\n"
 		   "  -j, --jobs=NUM           number of threads (default: 1)\n"
 		   "  -l, --log                write transaction times to log file\n"
+		   "  -L, --latency-limit=NUM  count transactions lasting more than NUM ms.\n"
+		   "                           under throttling (--rate), transactions behind schedule\n"
+		   "                           more than NUM ms are skipped, and those finishing more\n"
+		   "                           than NUM ms after their scheduled start are counted.\n"
 		   "  -M, --protocol=simple|extended|prepared\n"
 		   "                           protocol for submitting queries (default: simple)\n"
 		   "  -n, --no-vacuum          do not run VACUUM before tests\n"
@@ -994,7 +1016,9 @@ void
 agg_vals_init(AggVals *aggs, instr_time start)
 {
 	/* basic counters */
-	aggs->cnt = 0;				/* number of transactions */
+	aggs->cnt = 0;				/* number of transactions (includes skipped) */
+	aggs->skipped = 0;			/* xacts skipped under --rate --latency-limit */
+
 	aggs->sum_latency = 0;		/* SUM(latency) */
 	aggs->sum2_latency = 0;				/* SUM(latency*latency) */
 
@@ -1050,6 +1074,30 @@ top:
 		int64		wait = getPoissonRand(thread, throttle_delay);
 
 		thread->throttle_trigger += wait;
+		
+		/*
+		 * If this --lateny-limit is used, and this slot is already late so
+		 * that the transaction will miss the latency limit even if it
+		 * completed immediately, we skip this time slot and iterate till the
+		 * next slot that isn't late yet.
+		 */
+		if (latency_limit)
+		{
+			int64		now_us;
+
+			if (INSTR_TIME_IS_ZERO(now))
+				INSTR_TIME_SET_CURRENT(now);
+			now_us = INSTR_TIME_GET_MICROSEC(now);
+			while (thread->throttle_trigger < now_us - latency_limit)
+			{
+				wait = getPoissonRand(thread, throttle_delay);
+				thread->throttle_trigger += wait;
+				thread->throttle_latency_skipped++;
+
+				if (logfile)
+					doLog(thread, st, logfile, &now, agg, true);
+			}
+		}
 
 		st->txn_scheduled = thread->throttle_trigger;
 		st->sleeping = 1;
@@ -1119,12 +1167,13 @@ top:
 		if (commands[st->state + 1] == NULL)
 		{
 			/* only calculate latency if an option is used that needs it */
-			if (progress || throttle_delay)
+			if (progress || throttle_delay || latency_limit)
 			{
 				int64		latency;
 
 				if (INSTR_TIME_IS_ZERO(now))
 					INSTR_TIME_SET_CURRENT(now);
+
 				latency = INSTR_TIME_GET_MICROSEC(now) - st->txn_scheduled;
 
 				st->txn_latencies += latency;
@@ -1137,11 +1186,15 @@ top:
 				 * transactions, overflow would take 256 hours.
 				 */
 				st->txn_sqlats += latency * latency;
+
+				/* record over the limit transactions if needed. */
+				if (latency_limit && latency > latency_limit)
+					thread->latency_late++;
 			}
 
 			/* record the time it took in the log */
 			if (logfile)
-				doLog(thread, st, logfile, &now, agg);
+				doLog(thread, st, logfile, &now, agg, false);
 		}
 
 		if (commands[st->state]->type == SQL_COMMAND)
@@ -1227,7 +1280,7 @@ top:
 	}
 
 	/* Record transaction start time under logging, progress or throttling */
-	if ((logfile || progress || throttle_delay) && st->state == 0)
+	if ((logfile || progress || throttle_delay || latency_limit) && st->state == 0)
 	{
 		INSTR_TIME_SET_CURRENT(st->txn_begin);
 
@@ -1605,7 +1658,8 @@ top:
  * print log entry after completing one transaction.
  */
 static void
-doLog(TState *thread, CState *st, FILE *logfile, instr_time *now, AggVals *agg)
+doLog(TState *thread, CState *st, FILE *logfile, instr_time *now, AggVals *agg,
+	  bool skipped)
 {
 	double		lag;
 	double		latency;
@@ -1634,26 +1688,34 @@ doLog(TState *thread, CState *st, FILE *logfile, instr_time *now, AggVals *agg)
 		if (agg->start_time + agg_interval >= INSTR_TIME_GET_DOUBLE(*now))
 		{
 			agg->cnt += 1;
-			agg->sum_latency += latency;
-			agg->sum2_latency += latency * latency;
+			if (skipped)
+			{
+				/* there is no latency to record if the transaction was skipped */
+				agg->skipped += 1;
+			}
+			else
+			{
+				agg->sum_latency += latency;
+				agg->sum2_latency += latency * latency;
 
-			/* first in this aggregation interval */
-			if ((agg->cnt == 1) || (latency < agg->min_latency))
-				agg->min_latency = latency;
+				/* first in this aggregation interval */
+				if ((agg->cnt == 1) || (latency < agg->min_latency))
+					agg->min_latency = latency;
 
-			if ((agg->cnt == 1) || (latency > agg->max_latency))
-				agg->max_latency = latency;
+				if ((agg->cnt == 1) || (latency > agg->max_latency))
+					agg->max_latency = latency;
 
-			/* and the same for schedule lag */
-			if (throttle_delay)
-			{
-				agg->sum_lag += lag;
-				agg->sum2_lag += lag * lag;
+				/* and the same for schedule lag */
+				if (throttle_delay)
+				{
+					agg->sum_lag += lag;
+					agg->sum2_lag += lag * lag;
 
-				if ((agg->cnt == 1) || (lag < agg->min_lag))
-					agg->min_lag = lag;
-				if ((agg->cnt == 1) || (lag > agg->max_lag))
-					agg->max_lag = lag;
+					if ((agg->cnt == 1) || (lag < agg->min_lag))
+						agg->min_lag = lag;
+					if ((agg->cnt == 1) || (lag > agg->max_lag))
+						agg->max_lag = lag;
+				}
 			}
 		}
 		else
@@ -1677,11 +1739,15 @@ doLog(TState *thread, CState *st, FILE *logfile, instr_time *now, AggVals *agg)
 						agg->min_latency,
 						agg->max_latency);
 				if (throttle_delay)
+				{
 					fprintf(logfile, " %.0f %.0f %.0f %.0f",
 							agg->sum_lag,
 							agg->sum2_lag,
 							agg->min_lag,
 							agg->max_lag);
+					if (latency_limit)
+						fprintf(logfile, " %d", agg->skipped);
+				}
 				fputc('\n', logfile);
 
 				/* move to the next inteval */
@@ -1689,6 +1755,7 @@ doLog(TState *thread, CState *st, FILE *logfile, instr_time *now, AggVals *agg)
 
 				/* reset for "no transaction" intervals */
 				agg->cnt = 0;
+				agg->skipped = 0;
 				agg->min_latency = 0;
 				agg->max_latency = 0;
 				agg->sum_latency = 0;
@@ -1701,10 +1768,11 @@ doLog(TState *thread, CState *st, FILE *logfile, instr_time *now, AggVals *agg)
 
 			/* reset the values to include only the current transaction. */
 			agg->cnt = 1;
+			agg->skipped = skipped ? 1 : 0;
 			agg->min_latency = latency;
 			agg->max_latency = latency;
-			agg->sum_latency = latency;
-			agg->sum2_latency = latency * latency;
+			agg->sum_latency = skipped ? 0.0 : latency;
+			agg->sum2_latency = skipped ? 0.0 : latency * latency;
 			agg->min_lag = lag;
 			agg->max_lag = lag;
 			agg->sum_lag = lag;
@@ -1717,14 +1785,23 @@ doLog(TState *thread, CState *st, FILE *logfile, instr_time *now, AggVals *agg)
 #ifndef WIN32
 
 		/* This is more than we really ought to know about instr_time */
-		fprintf(logfile, "%d %d %.0f %d %ld %ld",
-				st->id, st->cnt, latency, st->use_file,
-				(long) now->tv_sec, (long) now->tv_usec);
+		if (skipped)
+			fprintf(logfile, "%d %d skipped %d %ld %ld",
+					st->id, st->cnt, st->use_file,
+					(long) now->tv_sec, (long) now->tv_usec);
+		else
+			fprintf(logfile, "%d %d %.0f %d %ld %ld",
+					st->id, st->cnt, latency, st->use_file,
+					(long) now->tv_sec, (long) now->tv_usec);
 #else
 
 		/* On Windows, instr_time doesn't provide a timestamp anyway */
-		fprintf(logfile, "%d %d %.0f %d 0 0",
-				st->id, st->cnt, latency, st->use_file);
+		if (skipped)
+			fprintf(logfile, "%d %d skipped %d 0 0",
+					st->id, st->cnt, st->use_file);
+		else
+			fprintf(logfile, "%d %d %.0f %d 0 0",
+					st->id, st->cnt, latency, st->use_file);
 #endif
 		if (throttle_delay)
 			fprintf(logfile, " %.0f", lag);
@@ -2424,7 +2501,8 @@ printResults(int ttype, int64 normal_xacts, int nclients,
 			 TState *threads, int nthreads,
 			 instr_time total_time, instr_time conn_total_time,
 			 int64 total_latencies, int64 total_sqlats,
-			 int64 throttle_lag, int64 throttle_lag_max)
+			 int64 throttle_lag, int64 throttle_lag_max,
+			 int64 throttle_latency_skipped, int64 latency_late)
 {
 	double		time_include,
 				tps_include,
@@ -2463,7 +2541,17 @@ printResults(int ttype, int64 normal_xacts, int nclients,
 			   normal_xacts);
 	}
 
-	if (throttle_delay || progress)
+	if (throttle_delay && latency_limit)
+		printf("number of transactions skipped: " INT64_FORMAT " (%.3f %%)\n",
+			   throttle_latency_skipped,
+			   100.0 * throttle_latency_skipped / (throttle_latency_skipped + normal_xacts));
+
+	if (latency_limit)
+		printf("number of transactions above the %.1f ms latency limit: " INT64_FORMAT " (%.3f %%)\n",
+			   latency_limit / 1000.0, latency_late,
+			   100.0 * latency_late / (throttle_latency_skipped + normal_xacts));
+
+	if (throttle_delay || progress || latency_limit)
 	{
 		/* compute and show latency average and standard deviation */
 		double		latency = 0.001 * total_latencies / normal_xacts;
@@ -2578,6 +2666,7 @@ main(int argc, char **argv)
 		{"sampling-rate", required_argument, NULL, 4},
 		{"aggregate-interval", required_argument, NULL, 5},
 		{"rate", required_argument, NULL, 'R'},
+		{"latency-limit", required_argument, NULL, 'L'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -2607,6 +2696,8 @@ main(int argc, char **argv)
 	int64		total_sqlats = 0;
 	int64		throttle_lag = 0;
 	int64		throttle_lag_max = 0;
+	int64		throttle_latency_skipped = 0;
+	int64		latency_late = 0;
 
 	int			i;
 
@@ -2651,7 +2742,7 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
-	while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:P:R:", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -2848,6 +2939,18 @@ main(int argc, char **argv)
 					throttle_delay = (int64) (1000000.0 / throttle_value);
 				}
 				break;
+			case 'L':
+				{
+					double limit_ms = atof(optarg);
+					if (limit_ms <= 0.0)
+					{
+						fprintf(stderr, "invalid latency limit: %s\n", optarg);
+						exit(1);
+					}
+					benchmarking_option_set = true;
+					latency_limit = (int64) (limit_ms * 1000);
+				}
+				break;
 			case 0:
 				/* This covers long options which take no argument. */
 				if (foreign_keys || unlogged_tables)
@@ -3143,6 +3246,8 @@ main(int argc, char **argv)
 		thread->random_state[0] = random();
 		thread->random_state[1] = random();
 		thread->random_state[2] = random();
+		thread->throttle_latency_skipped = 0;
+		thread->latency_late = 0;
 
 		if (is_latencies)
 		{
@@ -3217,6 +3322,8 @@ main(int argc, char **argv)
 			total_latencies += r->latencies;
 			total_sqlats += r->sqlats;
 			throttle_lag += r->throttle_lag;
+			throttle_latency_skipped += r->throttle_latency_skipped;
+			latency_late += r->latency_late;
 			if (r->throttle_lag_max > throttle_lag_max)
 				throttle_lag_max = r->throttle_lag_max;
 			INSTR_TIME_ADD(conn_total_time, r->conn_time);
@@ -3239,7 +3346,8 @@ main(int argc, char **argv)
 	INSTR_TIME_SUBTRACT(total_time, start_time);
 	printResults(ttype, total_xacts, nclients, threads, nthreads,
 				 total_time, conn_total_time, total_latencies, total_sqlats,
-				 throttle_lag, throttle_lag_max);
+				 throttle_lag, throttle_lag_max, throttle_latency_skipped,
+				 latency_late);
 
 	return 0;
 }
@@ -3264,7 +3372,8 @@ threadRun(void *arg)
 	int64		last_count = 0,
 				last_lats = 0,
 				last_sqlats = 0,
-				last_lags = 0;
+				last_lags = 0,
+				last_skipped = 0;
 
 	AggVals		aggs;
 
@@ -3467,7 +3576,8 @@ threadRun(void *arg)
 				/* generate and show report */
 				int64		count = 0,
 							lats = 0,
-							sqlats = 0;
+							sqlats = 0,
+							skipped = 0;
 				int64		lags = thread->throttle_lag;
 				int64		run = now - last_report;
 				double		tps,
@@ -3490,23 +3600,26 @@ threadRun(void *arg)
 				sqlat = 1.0 * (sqlats - last_sqlats) / (count - last_count);
 				stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
 				lag = 0.001 * (lags - last_lags) / (count - last_count);
+				skipped = thread->throttle_latency_skipped - last_skipped;
 
+				fprintf(stderr,
+						"progress %d: %.1f s, %.1f tps, "
+						"lat %.3f ms stddev %.3f",
+						thread->tid, total_run, tps, latency, stdev);
 				if (throttle_delay)
-					fprintf(stderr,
-							"progress %d: %.1f s, %.1f tps, "
-							"lat %.3f ms stddev %.3f, lag %.3f ms\n",
-							thread->tid, total_run, tps, latency, stdev, lag);
-				else
-					fprintf(stderr,
-							"progress %d: %.1f s, %.1f tps, "
-							"lat %.3f ms stddev %.3f\n",
-							thread->tid, total_run, tps, latency, stdev);
+				{
+					fprintf(stderr, ", lag %.3f ms", lag);
+					if (latency_limit)
+						fprintf(stderr, ", skipped " INT64_FORMAT, skipped);
+				}
+				fprintf(stderr, "\n");
 
 				last_count = count;
 				last_lats = lats;
 				last_sqlats = sqlats;
 				last_lags = lags;
 				last_report = now;
+				last_skipped = thread->throttle_latency_skipped;
 				next_report += (int64) progress *1000000;
 			}
 		}
@@ -3525,7 +3638,8 @@ threadRun(void *arg)
 				int64		count = 0,
 							lats = 0,
 							sqlats = 0,
-							lags = 0;
+							lags = 0,
+							skipped = 0;
 				int64		run = now - last_report;
 				double		tps,
 							total_run,
@@ -3550,23 +3664,26 @@ threadRun(void *arg)
 				sqlat = 1.0 * (sqlats - last_sqlats) / (count - last_count);
 				stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
 				lag = 0.001 * (lags - last_lags) / (count - last_count);
+				skipped = thread->throttle_latency_skipped - last_skipped;
 
+				fprintf(stderr,
+						"progress: %.1f s, %.1f tps, "
+						"lat %.3f ms stddev %.3f",
+						total_run, tps, latency, stdev);
 				if (throttle_delay)
-					fprintf(stderr,
-							"progress: %.1f s, %.1f tps, "
-							"lat %.3f ms stddev %.3f, lag %.3f ms\n",
-							total_run, tps, latency, stdev, lag);
-				else
-					fprintf(stderr,
-							"progress: %.1f s, %.1f tps, "
-							"lat %.3f ms stddev %.3f\n",
-							total_run, tps, latency, stdev);
+				{
+					fprintf(stderr, ", lag %.3f ms", lag);
+					if (latency_limit)
+						fprintf(stderr, ", " INT64_FORMAT " skipped", skipped);
+				}
+				fprintf(stderr, "\n");
 
 				last_count = count;
 				last_lats = lats;
 				last_sqlats = sqlats;
 				last_lags = lags;
 				last_report = now;
+				last_skipped = thread->throttle_latency_skipped;
 				next_report += (int64) progress *1000000;
 			}
 		}
@@ -3587,6 +3704,9 @@ done:
 	}
 	result->throttle_lag = thread->throttle_lag;
 	result->throttle_lag_max = thread->throttle_lag_max;
+	result->throttle_latency_skipped = thread->throttle_latency_skipped;
+	result->latency_late = thread->latency_late;
+
 	INSTR_TIME_SET_CURRENT(end);
 	INSTR_TIME_ACCUM_DIFF(result->conn_time, end, start);
 	if (logfile)
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index c4e0cbd..b6770c6 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -345,6 +345,23 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
      </varlistentry>
 
      <varlistentry>
+      <term><option>-L</option> <replaceable>limit</></term>
+      <term><option>--latency-limit=</option><replaceable>limit</></term>
+      <listitem>
+       <para>
+        Under normal operations, transaction which last more than
+        <replaceable>limit</> milliseconds are counted and reported.
+       </para>
+       <para>
+        Under throttling option <option>--rate=...</>, transactions behind
+        schedule by <replaceable>limit</> ms are simply skipped and counted
+        as such, then transactions finished later than the scheduled time plus
+        <replaceable>limit</> are counted and reported.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>-M</option> <replaceable>querymode</></term>
       <term><option>--protocol=</option><replaceable>querymode</></term>
       <listitem>
@@ -439,7 +456,9 @@ pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</>
         schedule moves forward based on when the client first started, not
         when the previous transaction ended.  That approach means that when
         transactions go past their original scheduled end time, it is
-        possible for later ones to catch up again.
+        possible for later ones to catch up again. It can be changed with
+        option <option>--latency-limit</> which skips and counts transactions
+        late by more than this delay.
        </para>
        <para>
         When throttling is active, the transaction latency reported at the
@@ -940,7 +959,7 @@ END;
    The format of the log is:
 
 <synopsis>
-<replaceable>client_id</> <replaceable>transaction_no</> <replaceable>time</> <replaceable>file_no</> <replaceable>time_epoch</> <replaceable>time_us</> [<replaceable>schedule_lag</replaceable>]
+<replaceable>client_id</> <replaceable>transaction_no</> <replaceable>time</> <replaceable>file_no</> <replaceable>time_epoch</> <replaceable>time_us</> [<replaceable>schedule_lag</replaceable> [<replaceable>skipped_transactions</>]]
 </synopsis>
 
    where <replaceable>time</> is the total elapsed transaction time in microseconds,
@@ -950,10 +969,13 @@ END;
    UNIX epoch format timestamp and an offset
    in microseconds (suitable for creating an ISO 8601
    timestamp with fractional seconds) showing when
-   the transaction completed. The last field, <replaceable>schedule_lag</>, is
-   the difference between the transaction's scheduled start time, and the
-   time it actually started, in microseconds. It is only present when the
-   <option>--rate</> option is used.
+   the transaction completed.
+   Field <replaceable>schedule_lag</> is the difference between the transaction's
+   scheduled start time, and the time it actually started, in microseconds. It is only present
+   when the <option>--rate</> option is used.
+   The last field <replaceable>skipped_transactions</> reports the number of transactions
+   skipped because they were too far behind schedule. It is only present when both options
+   <option>--rate</> and <option>--latency-limit</> are used.
   </para>
 
   <para>
@@ -974,12 +996,12 @@ END;
 
  <refsect2>
   <title>Aggregated Logging</title>
-  
+
   <para>
    With the <option>--aggregate-interval</option> option, the logs use a bit different format:
 
 <synopsis>
-<replaceable>interval_start</> <replaceable>num_of_transactions</> <replaceable>latency_sum</> <replaceable>latency_2_sum</> <replaceable>min_latency</> <replaceable>max_latency</> [<replaceable>lag_sum</> <replaceable>lag_2_sum</> <replaceable>min_lag</> <replaceable>max_lag</>]
+<replaceable>interval_start</> <replaceable>num_of_transactions</> <replaceable>latency_sum</> <replaceable>latency_2_sum</> <replaceable>min_latency</> <replaceable>max_latency</> [<replaceable>lag_sum</> <replaceable>lag_2_sum</> <replaceable>min_lag</> <replaceable>max_lag</> [<replaceable>skipped_transactions</>]]
 </synopsis>
 
    where <replaceable>interval_start</> is the start of the interval (UNIX epoch
@@ -990,8 +1012,9 @@ END;
    <replaceable>latency_2_sum</> is a sum of 2nd powers of latencies. The last two
    fields are <replaceable>min_latency</> - a minimum latency within the interval, and
    <replaceable>max_latency</> - maximum latency within the interval. A transaction is
-   counted into the interval when it was committed. The last four fields, 
-   <replaceable>lag_sum</>, <replaceable>lag_2_sum</>, <replaceable>min_lag</>, and <replaceable>max_lag</>, are only present if the --rate option is used.
+   counted into the interval when it was committed. The fields in the end,
+   <replaceable>lag_sum</>, <replaceable>lag_2_sum</>, <replaceable>min_lag</>, and <replaceable>max_lag</>, are only present if the <option>--rate</> option is used.
+   The very last one <replaceable>skipped_transactions</> also requires option <option>--latency-limit</>.
    They are calculated from the time each transaction had to wait for the
    previous one to finish, i.e. the difference between each transaction's
    scheduled start time and the time it actually started.
-- 
2.1.1

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgbench throttling latency limit

Reply via email to