Hello Chapman,

Thanks for the review,

The tests assume that stdlib random/srandom behavior is standard thus
deterministic between platform.

Is the behavior of srandom() and the system generator really so precisely
specified that seed 5432 will produce the same values hardcoded in the
tests on all platforms? [...]

Good question.

I'm hoping that in practice it would be the same, or that their would be very few cases (eg linux vs windows vs macos...). I was counting on the the buildfarm to tell me if I'm wrong, and fix it if needed.

To have the test run pgbench twice with the same seed and compare the
results sounds safer.

Interesting idea. The test script would be more complicated to do that, though. I would prefer to bet on "random" determinism (:-) and resort to such a solution only if it is not the case. Or maybe just put back some "\d+" to keep it simple.

This is a debatable strategy.

I did some wordsmithing of the doc, which I hope was not presumptuous of me, only as a conversation starter. [...]

Thanks for the doc improvements.

The documentation doesn't say that --random-seed= (or PGBENCH_RANDOM_SEED=)
will have the same effect as 'time', and indeed, I really think it should
be unset (defaulting to 'time'), or 'time', or 'rand', or an integer,
or an error.

Ok, done.

The error, right now, says only "expecting an unsigned integer"; it should also mention time and rand.

Ok, done.

Should 'rand' be something that conveys more about its meaning, 'strong' perhaps?

Hmmm. "Random means random":-). I have no opinion about whether it would be better. ISTM that "strong" would require some explanations. I let is as "rand" for now.

The documentation doesn't mention the allowed range of the unsigned
integer (which I get, because 'unsigned int' is exactly the signature
for srandom, but somebody might read "unsigned integer" in a more
generic sense).

Ok. I extended so that it works with octal, decimal and hexadecimal, and updated the doc accordingly. I did not add range information though.

I'm not sure what would be a better way to say it.
The base (only decimal, as now in the code) isn't specified either.

Sure.

Maybe the documentation should mention that the output now includes the random seed being used, so that (even without planning ahead) [...]

It does so only if the seed is explicitely set, otherwise it keeps the previous behavior of being silent. I added a sentence about that.

Something more may need to be said in the doc about reproducibility. I think
the real story is that a run can be reproduced if the number of clients is
equal to the number of threads.

Yes, good point. I tried to hide the issue under the "as far as random numbers are concerned". I've tried to improve this point in the doc.

Although each thread has its own generator state, each client does not (if there is more than one per thread), and as soon as real select() delays start happening in CSTATE_WAIT_RESULT, the clients dealt out to a given thread may not be hitting that thread's generator in a deterministic order.

Yep. This may evolve, for instance the error handling patch needs to restart transactions so it adds a state into the client.

--
Fabien.
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 1519fe7..7e81a51 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -680,6 +680,45 @@ pgbench <optional> <replaceable>options</replaceable> </optional> <replaceable>d
      </varlistentry>
 
      <varlistentry>
+      <term><option>--random-seed=</option><replaceable>SEED</replaceable></term>
+      <listitem>
+       <para>
+        Set random generator seed.  Seeds the system random number generator,
+        which then produces a sequence of initial generator states, one for
+        each thread.
+        Values for <replaceable>SEED</replaceable> may be:
+        <literal>time</literal> (the default, the seed is based on the current time),
+        <literal>rand</literal> (use a strong random source, failing if none
+        is available), or any unsigned octal (<literal>012470</literal>),
+        decimal (<literal>5432</literal>) or
+        hexedecimal (<literal>0x1538</literal>) integer value.
+        The random generator is invoked explicitly from a pgbench script
+        (<literal>random...</literal> functions) or implicitly (for instance option
+        <option>--rate</option> uses it to schedule transactions).
+        When explicitely set, the value used for seeding is shown on the terminal.
+        Any value allowed for <replaceable>SEED</replaceable> may also be
+        provided through the environment variable
+        <literal>PGBENCH_RANDOM_SEED</literal>.
+        To ensure that the provided seed impacts all possible uses, put this option
+        first or use the environment variable.
+      </para>
+      <para>
+        Setting the seed explicitly allows to reproduce a <command>pgbench</command>
+        run exactly, as far as random numbers are concerned.
+        As the random state is managed per thread, this means the exact same
+        <command>pgbench</command> run for an identical invocation if there is one
+        client per thread and there are no external or data dependencies.
+        From a statistical viewpoint reproducing runs exactly is a bad idea because
+        it can hide the performance variability or improve performance unduly,
+        e.g. by hitting the same pages as a previous run.
+        However, it may also be of great help for debugging, for instance
+        re-running a tricky case which leads to an error.
+        Use wisely.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>--sampling-rate=<replaceable>rate</replaceable></option></term>
       <listitem>
        <para>
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index fc2c734..d4ff4c7 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -557,6 +557,7 @@ usage(void)
 		   "  --log-prefix=PREFIX      prefix for transaction time log file\n"
 		   "                           (default: \"pgbench_log\")\n"
 		   "  --progress-timestamp     use Unix epoch timestamps for progress\n"
+		   "  --random-seed=SEED       set random seed (\"time\", \"rand\", integer)\n"
 		   "  --sampling-rate=NUM      fraction of transactions to log (e.g., 0.01 for 1%%)\n"
 		   "\nCommon options:\n"
 		   "  -d, --debug              print debugging output\n"
@@ -4010,6 +4011,55 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	}
 }
 
+/* call srandom based on some seed. NULL triggers the default behavior. */
+static void
+set_random_seed(const char *seed)
+{
+	unsigned int iseed;
+
+	if (seed == NULL || strcmp(seed, "time") == 0)
+	{
+		/* rely on current time */
+		instr_time	now;
+		INSTR_TIME_SET_CURRENT(now);
+		iseed = (unsigned int) INSTR_TIME_GET_MICROSEC(now);
+	}
+	else if (strcmp(seed, "rand") == 0)
+	{
+		/* use some "strong" random source */
+		if (!pg_strong_random(&iseed, sizeof(iseed)))
+		{
+			fprintf(stderr, "cannot seed random from a strong source\n");
+			exit(1);
+		}
+	}
+	else
+	{
+		/* parse uint seed value */
+		char garbage;
+		if (!
+			/* hexa */
+			((seed[0] == '0' && (seed[1] == 'x' || seed[1] == 'X') &&
+			 sscanf(seed, "%x%c", &iseed, &garbage) == 1) ||
+			 /* octal */
+			(seed[0] == '0' && seed[1] != 'x' && seed[1] != 'X' &&
+			 sscanf(seed, "%o%c", &iseed, &garbage) == 1) ||
+			 /* decimal */
+			 (seed[0] != '0' &&
+			  sscanf(seed, "%u%c", &iseed, &garbage) == 1)))
+		{
+			fprintf(stderr,
+					"error while scanning '%s', expecting an unsigned integer, 'time' or 'rand'\n",
+					seed);
+			exit(1);
+		}
+	}
+
+	if (seed != NULL)
+		fprintf(stderr, "setting random seed to %u\n", iseed);
+	srandom(iseed);
+}
+
 
 int
 main(int argc, char **argv)
@@ -4052,6 +4102,7 @@ main(int argc, char **argv)
 		{"progress-timestamp", no_argument, NULL, 6},
 		{"log-prefix", required_argument, NULL, 7},
 		{"foreign-keys", no_argument, NULL, 8},
+		{"random-seed", required_argument, NULL, 9},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -4120,6 +4171,9 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
+	/* set random seed early, because it may be used while parsing scripts. */
+	set_random_seed(getenv("PGBENCH_RANDOM_SEED"));
+
 	while ((c = getopt_long(argc, argv, "iI:h:nvp:dqb:SNc:j:Crs:t:T:U:lf:D:F:M:P:R:L:", long_options, &optindex)) != -1)
 	{
 		char	   *script;
@@ -4392,6 +4446,10 @@ main(int argc, char **argv)
 				initialization_option_set = true;
 				foreign_keys = true;
 				break;
+			case 9:				/* random-seed */
+				benchmarking_option_set = true;
+				set_random_seed(optarg);
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -4698,10 +4756,6 @@ main(int argc, char **argv)
 	}
 	PQfinish(con);
 
-	/* set random seed */
-	INSTR_TIME_SET_CURRENT(start_time);
-	srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
-
 	/* set up thread data structures */
 	threads = (TState *) pg_malloc(sizeof(TState) * nthreads);
 	nclients_dealt = 0;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 3dd080e..7fee671 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -210,11 +210,16 @@ COMMIT;
 } });
 
 # test expressions
+# command 1..3 and 23 depend on random seed which is used to call srandom.
 pgbench(
-	'-t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808',
+	'--random-seed=0x1538 -t 1 -Dfoo=-10.1 -Dbla=false -Di=+3 -Dminint=-9223372036854775808',
 	0,
 	[ qr{type: .*/001_pgbench_expressions}, qr{processed: 1/1} ],
-	[   qr{command=4.: int 4\b},
+	[   qr{setting random seed to 5432\b},
+		qr{command=1.: int 28\b}, # uniform random
+		qr{command=2.: int 7\b}, # exponential random
+		qr{command=3.: int 47\b}, # gaussian random
+		qr{command=4.: int 4\b},
 		qr{command=5.: int 5\b},
 		qr{command=6.: int 6\b},
 		qr{command=7.: int 7\b},
@@ -232,7 +237,7 @@ pgbench(
 		qr{command=19.: double 19\b},
 		qr{command=20.: double 20\b},
 		qr{command=21.: int 9223372036854775807\b},
-		qr{command=23.: int [1-9]\b},
+		qr{command=23.: int 1\b}, # zipfian random
 		qr{command=24.: double -27\b},
 		qr{command=25.: double 1024\b},
 		qr{command=26.: double 1\b},
diff --git a/src/bin/pgbench/t/002_pgbench_no_server.pl b/src/bin/pgbench/t/002_pgbench_no_server.pl
index 6ea55f8..6853650 100644
--- a/src/bin/pgbench/t/002_pgbench_no_server.pl
+++ b/src/bin/pgbench/t/002_pgbench_no_server.pl
@@ -78,6 +78,8 @@ my @options = (
 	[ 'invalid init step', '-i -I dta',
 		[qr{unrecognized initialization step},
 		 qr{allowed steps are} ] ],
+	[ 'bad random seed', '--random-seed=one',
+		[qr{error while scanning 'one', expecting an unsigned integer} ] ],
 
 	# loging sub-options
 	[   'sampling => log', '--sampling-rate=0.01',

Reply via email to