Hello,

+# note: this test is time sensitive, and may fail on a very
+#       loaded host.
+# note: --progress-timestamp is not tested
+my $delay = pgbench(
+       '-T 2 -P 1 -l --aggregate-interval=1 -S -b se@2'
+       . ' --rate=20 --latency-limit=1000 -j ' . $nthreads
+       . ' -c 3 -r',
+       0,
+       [   qr{type: multiple},
+               qr{clients: 3},
+               qr{threads: $nthreads},
+               qr{duration: 2 s},
+               qr{script 1: .* select only},
+               qr{script 2: .* select only},
+               qr{statement latencies in milliseconds},
+               qr{FROM pgbench_accounts} ],
+       [ qr{vacuum}, qr{progress: 1\b} ],
+       'pgbench progress', undef,
+       "--log-prefix=$bdir/001_pgbench_log_1");

Could we make that shorter at 1s?  That will shorten the duration of
the test run.  It is easy to miss that this test is for
--aggregate-interval (aka the logAgg() path) without a comment.

It is for -T, -P and --aggregate-interval. The units of all these is seconds, the minimum is 1, I put 2 so that It is pretty sure to get some output. We could try 1, but I'm less confident that an output is ensured in all cases on a very slow host which may decide to shorten the run before having shown a progress for instance.

+# cool check that we are around 2 seconds
+# The rate may results in an unlucky schedule which triggers
+# an early exit, hence the loose bound.
+ok(0.0 < $delay && $delay < 4.0, "-T 2 run around 2 seconds");

The command itself would not fail, but we would just fail on a machine
where the difference in start/stop time is higher than 4 seconds,
right?

Yep. It could detect a struck pgbench process which was one of the reported issue. Maybe there should be a timeout added.

On RPI-level machines, this could fail reliably.

Dunno, Not sure what RPI means. Probably not "Retail Price Index"… maybe Rasberry-Pi? In that case, the O-4 second leeway is intended to be loose enough to accomodate such hosts, but I cannot test that.

I am not completely sure what's the additional value we can get from that extra test, to be honest.

This would be to detect a somehow stuck process. It could be looser if necessary. Or removed, or preferably commented out, or enabled with some options (eg an environment variable? configure option?). Such control could also skip all 3 calls, in which case the 2 seconds is not an issue.

+# $nthreads threads, 2 seconds, but due to timing imprecision we might get
+# only 1 or as many as 3 progress reports per thread.
+check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3,
+       qr{^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$});
+
Now this one is good and actually stable thanks to the fact that we'd
get at least the final logs, and the main complain we got to discuss
about on this thread was the format of the logs.

Yep, this test would have probably detected the epoch issue reported by Greg.

I would say to give up on the first test, and keep the second.

You mean remove the time check and keep the log check. I'd like to keep a time check, or at least have it in comment so that I can enable it simply.

But, is this regex correct? Shouldn't we check for six integer fields only with the first one having a minimal number of digits for the epoch?

Epoch (seconds since 1970-01-01?) is currently 10 digits. Not sure how well it would work if some host have another zero start date.

Given the options of the bench run, there are that many fields in the log output, I'm not sure why we would want to check for less?

--
Fabien.

Reply via email to