Re: [HACKERS] pgbench regression test failure
The following review has been posted through the commitfest application:

    make installcheck-world: tested, failed
    Implements feature:      not tested
    Spec compliant:          not tested
    Documentation:           not tested

This causes the pgbench tests to fail (consistently) with:

    not ok 194 - pgbench late throttling stdout /(?^:processed: [01]/10)/

When I run pgbench manually (-t 10 --rate=10 --latency-limit=1 -n -r) I get:

    number of transactions actually processed: 10/10
    number of transactions skipped: 10 (100.000 %)

Prior to the patch I was getting:

    number of transactions actually processed: 0/10
    number of transactions skipped: 10 (100.000 %)

@@ -3539,7 +3542,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	{
 		printf("number of transactions per client: %d\n", nxacts);
 		printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
-			   total->cnt - total->skipped, nxacts * nclients);
+			   total->cnt, nxacts * nclients);

I think you want ntx instead of total->cnt here.

The new status of this patch is: Waiting on Author

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench regression test failure
Hello Tom,

> # progress: 2.6 s, 6.9 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms, 18 skipped
> # progress: 3.0 s, 0.0 tps, lat -nan ms stddev -nan, lag -nan ms, 0 skipped
> # progress: 4.0 s, 1.0 tps, lat 2682.730 ms stddev 0.000, lag 985.509 ms, 0 skipped

> (BTW, the "-nan" bits suggest an actual pgbench bug, independently of
> anything else.)

From my point of view, NaN is expected when no transactions were executed
in the interval: if there was no transaction, it does not make sense to
talk about its latency, so NaN is the right answer.

However, the above "6.9 tps, lat 0.000 ms stddev 0.000, lag 0.000" is
inconsistent: as 6.9 = 18 / 2.6, it means that the progress tps
calculation should exclude skipped transactions...

The attached patch attempts to report more consistent figures in the
progress and final reports when transactions are skipped:

  sh> cat sleep-100.sql
  \sleep 100 ms
  SELECT 1;

  sh> ./pgbench -P 1 -t 100 -f sleep-100.sql -R 20 -L 1
  [...]
  progress: 1.0 s, 7.0 tps, lat 100.145 ms stddev 0.042, lag 0.000 ms, 16 skipped
  progress: 2.0 s, 6.0 tps, lat 100.133 ms stddev 0.040, lag 0.021 ms, 7 skipped
  progress: 3.0 s, 9.0 tps, lat 100.115 ms stddev 0.016, lag 0.000 ms, 11 skipped
  [...]
  number of transactions actually processed: 38/100
  number of transactions skipped: 62 (62.000 %)
  number of transactions above the 1.0 ms latency limit: 38 (38.000 %)
  latency average = 100.142 ms
  tps = 7.091010 (including connections establishing)
  tps = 7.094144 (excluding connections establishing)
  script statistics:
   - number of transactions skipped: 62 (62.000%)

--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e37496c..9ca9734 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -2584,7 +2584,7 @@ processXactStats(TState *thread, CState *st, instr_time *now,
 		doLog(thread, st, agg, skipped, latency, lag);
 
 	/* XXX could use a mutex here, but we choose not to */
-	if (per_script_stats)
+	if (per_script_stats || latency_limit)
 		accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
 }
 
@@ -3522,11 +3522,14 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	double		time_include,
 				tps_include,
 				tps_exclude;
+	int64		ntx = total->cnt - total->skipped;
 
 	time_include = INSTR_TIME_GET_DOUBLE(total_time);
-	tps_include = total->cnt / time_include;
-	tps_exclude = total->cnt / (time_include -
-					 (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));
+
+	/* tps is about actually executed transactions */
+	tps_include = ntx / time_include;
+	tps_exclude = ntx /
+		(time_include - (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));
 
 	/* Report test parameters.
 	 */
 	printf("transaction type: %s\n",
@@ -3539,7 +3542,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	{
 		printf("number of transactions per client: %d\n", nxacts);
 		printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
-			   total->cnt - total->skipped, nxacts * nclients);
+			   total->cnt, nxacts * nclients);
 	}
 	else
 	{
@@ -4660,7 +4663,8 @@ threadRun(void *arg)
 			{
 				/* generate and show report */
 				StatsData	cur;
-				int64		run = now - last_report;
+				int64		run = now - last_report,
+							ntx;
 				double		tps,
 							total_run,
 							latency,
@@ -4675,7 +4679,7 @@ threadRun(void *arg)
 				 * XXX: No locking. There is no guarantee that we get an
 				 * atomic snapshot of the transaction count and latencies, so
 				 * these figures can well be off by a small amount. The
-				 * progress is report's purpose is to give a quick overview of
+				 * progress report's purpose is to give a quick overview of
 				 * how the test is going, so that shouldn't matter too much.
 				 * (If a read from a 64-bit integer is not atomic, you might
 				 * get a "torn" read and completely bogus latencies though!)
@@ -4689,15 +4693,14 @@ threadRun(void *arg)
 					cur.skipped += thread[i].stats.skipped;
 				}
 
+				/* we count only actually executed transactions */
+				ntx = (cur.cnt - cur.skipped) - (last.cnt - last.skipped);
 				total_run = (now - thread_start) / 1000000.0;
-				tps = 1000000.0 * (cur.cnt - last.cnt) / run;
-				latency = 0.001 * (cur.latency.sum - last.latency.sum) /
-					(cur.cnt - last.cnt);
-				sqlat = 1.0 * (cur.latency.sum2 - last.latency.sum2)
-					/ (cur.cnt - last.cnt);
+				tps = 1000000.0 * ntx / run;
+				latency = 0.001 * (cur.latency.sum - last.latency.sum) / ntx;
+				sqlat = 1.0 * (cur.latency.sum2 - last.latency.sum2) / ntx;
 				stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
-				lag = 0.001 * (cur.lag.sum - last.lag.sum) /
-					(cur.cnt - last.cnt);
+				lag = 0.001 * (cur.lag.sum - last.lag.sum) / ntx;
 
 				if (progress_timestamp)
 				{
@@ -4714,6 +4717,7 @@ threadRun(void *arg)
 							 (long) tv.tv_sec, (long) (tv.tv_usec / 1000));
 				}
 				else
+					/* round seconds are expected, but the thread may
Re: [HACKERS] pgbench regression test failure
Fabien COELHO writes:
>> [...] After another week of buildfarm runs, we have a few more cases of
>> 3 rows of output, and none of more than 3 or less than 1.  So I went
>> ahead and pushed your patch.  I'm still suspicious of these results, but
>> we might as well try to make the buildfarm green pending investigation
>> of how this is happening.

> Yep. I keep the issue of pgbench tap test determinism in my todo list,
> among other things.

> I think that it should be at least clearer under which condition (load?
> luck? bug?) the result may be 1 or 3 when 2 are expected, which needs
> some thinking.

skink blew up real good just now:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-09-23%2010%3A50%3A01

the critical bit being

#   Failed test 'pgbench progress stderr /(?^:progress: 1\b)/'
#   at /home/andres/build/buildfarm/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm line 369.
#          'starting vacuum...end.
# progress: 2.6 s, 6.9 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms, 18 skipped
# progress: 3.0 s, 0.0 tps, lat -nan ms stddev -nan, lag -nan ms, 0 skipped
# progress: 4.0 s, 1.0 tps, lat 2682.730 ms stddev 0.000, lag 985.509 ms, 0 skipped
# '
#     doesn't match '(?^:progress: 1\b)'

#   Failed test 'transaction count for 001_pgbench_log_1.15981 (5)'
#   at t/001_pgbench_with_server.pl line 438.

#   Failed test 'transaction count for 001_pgbench_log_1.15981.1 (4)'
#   at t/001_pgbench_with_server.pl line 438.
# Looks like you failed 3 tests of 233.

That's exceeded my patience with this test case, so I've removed it for
the moment.  We can put it back as soon as we figure some way to make it
more robust.

(BTW, the "-nan" bits suggest an actual pgbench bug, independently of
anything else.)

Possibly you can duplicate skink's issue by running things under valgrind.

			regards, tom lane
Re: [HACKERS] pgbench regression test failure
> [...] After another week of buildfarm runs, we have a few more cases of
> 3 rows of output, and none of more than 3 or less than 1.  So I went
> ahead and pushed your patch.  I'm still suspicious of these results, but
> we might as well try to make the buildfarm green pending investigation
> of how this is happening.

Yep. I keep the issue of pgbench tap test determinism in my todo list,
among other things.

I think that it should be at least clearer under which condition (load?
luck? bug?) the result may be 1 or 3 when 2 are expected, which needs
some thinking.

--
Fabien.
Re: [HACKERS] pgbench regression test failure
Fabien COELHO writes:
>> It could be as simple as putting the check-for-done at the bottom of the
>> loop not the top, perhaps.

> I agree that it is best if tests should work in all reasonable
> conditions, including a somehow overloaded host...

> I'm going to think about it, but I'm not sure of the best approach. In
> the mean time, ISTM that the issue has not been encountered (yet), so
> this is not a pressing issue. Maybe under -T > --aggregate-interval
> pgbench could go on over the limit if the log file has not been written
> at all, but that would be some kind of kludge for this specific test...

After another week of buildfarm runs, we have a few more cases of 3 rows
of output, and none of more than 3 or less than 1.  So I went ahead and
pushed your patch.  I'm still suspicious of these results, but we might
as well try to make the buildfarm green pending investigation of how this
is happening.

			regards, tom lane
Re: [HACKERS] pgbench regression test failure
>>> I have a serious, serious dislike for tests that seem to work until
>>> they're run on a heavily loaded machine.

>> I'm not that sure the error message was because of that.

> No, this particular failure (probably) wasn't.  But now that I've
> realized that this test case is timing-sensitive, I'm worried about what
> will happen when it's run on a sufficiently slow or loaded machine.

>>> I would not necessarily object to doing something in the code that
>>> would guarantee that, though.

>> Hmmm. Interesting point.

> It could be as simple as putting the check-for-done at the bottom of the
> loop not the top, perhaps.

I agree that it is best if tests should work in all reasonable
conditions, including a somehow overloaded host...

I'm going to think about it, but I'm not sure of the best approach. In
the mean time, ISTM that the issue has not been encountered (yet), so
this is not a pressing issue. Maybe under -T > --aggregate-interval
pgbench could go on over the limit if the log file has not been written
at all, but that would be some kind of kludge for this specific test...

Note that to get test coverage for -T, assuming that even a loaded host
can generate at least a one-line log every second during that time is
kind of a hard assumption...

Maybe some tests could be "warnings", i.e. it could be acceptable to
accept a failure once in a while in specific conditions, if this is rare
enough and documented. ISTM that there is such a test for random output.

--
Fabien.
Re: [HACKERS] pgbench regression test failure
Fabien COELHO writes:
>> I have a serious, serious dislike for tests that seem to work until
>> they're run on a heavily loaded machine.

> I'm not that sure the error message was because of that.

No, this particular failure (probably) wasn't.  But now that I've
realized that this test case is timing-sensitive, I'm worried about what
will happen when it's run on a sufficiently slow or loaded machine.

>> I would not necessarily object to doing something in the code that
>> would guarantee that, though.

> Hmmm. Interesting point.

It could be as simple as putting the check-for-done at the bottom of the
loop not the top, perhaps.

			regards, tom lane
Re: [HACKERS] pgbench regression test failure
> I have a serious, serious dislike for tests that seem to work until
> they're run on a heavily loaded machine.

I'm not that sure the error message was because of that. ISTM that it was
rather finding 3 seconds in two because it started just at the right
time, or maybe because of slowness induced by load and the order in which
the different checks are performed.

> So unless there is some reason why pgbench is *guaranteed* to run at
> least one transaction per thread, I'd rather the test not assume that.

Well, pgbench is for testing performance... so if the checks allow zero
performance, that's quite annoying as well :-) The tests are designed to
require very low performance (e.g. there are a lot of -t 1 when only one
transaction is enough to check a point), but maybe some tests assume a
minimal requirement, say 10 tps with 2 threads...

> I would not necessarily object to doing something in the code that
> would guarantee that, though.

Hmmm. Interesting point.

There could be a client-side synchronization barrier, e.g. something like
"\sync :nclients/nthreads", which could be easy enough to implement with
pthreads, and quite error-prone to use, but that would probably be okay
for validation purposes. Or maybe we could expose something at the SQL
level, e.g. "SELECT synchro('synchroname', howmanyclientstowait);", which
would be harder to implement server-side but possibly doable as well.

A simpler option may be to introduce a synchronization barrier at thread
start, so that all threads start together and that would set the "zero"
time. I'm not sure that would solve the potential issue you raise,
although it would help. Currently the statistics collection and outputs
are performed by thread 0 in addition to the client it runs, so that
pgbench would work even if there are no threads, but it also means that
under a heavy load some things may not be done at the target time but a
little bit later, if some thread is stuck somewhere, although the async
protocol tries to avoid that.

--
Fabien.
Re: [HACKERS] pgbench regression test failure
Fabien COELHO writes:
> By definition, parallelism induces non-determinism. When I put 2
> seconds, the intention was that I would get a non-empty trace with an
> "every second" aggregation. I would rather take a longer test than allow
> an empty file: the point is to check that something is generated, but
> avoiding a longer test is desirable. So I would suggest to stick to
> between 1 and 3, and if it fails then maybe add one second...

That's a losing game.  You can't ever guarantee that N seconds is enough
for slow, heavily loaded machines, and cranking up N just penalizes
developers who are testing under normal circumstances.  I have a serious,
serious dislike for tests that seem to work until they're run on a
heavily loaded machine.

So unless there is some reason why pgbench is *guaranteed* to run at
least one transaction per thread, I'd rather the test not assume that.
I would not necessarily object to doing something in the code that would
guarantee that, though.

			regards, tom lane
Re: [HACKERS] pgbench regression test failure
>>> Apparently, one of the threads ran 3 transactions where the test
>>> script expects it to run at most 2.  Is this a pgbench bug, or is the
>>> test being overoptimistic about how exact the "-T 2" cutoff is?

>> Probably both? It seems that cutting off on time is not a precise
>> science, so I suggest to accept 1, 2 and 3 lines, see attached.

> Before I'd deciphered the test output fully, I was actually guessing
> that the problem was the opposite, namely too few lines.

The test was waiting for between 1 and 2 lines, so I assumed that 3 was
the number of lines found.

> Isn't it possible that some thread is slow enough to start up that it
> doesn't get to run any transactions?  IOW, do we need to allow 0 to 3
> lines?

By definition, parallelism induces non-determinism. When I put 2 seconds,
the intention was that I would get a non-empty trace with an "every
second" aggregation. I would rather take a longer test than allow an
empty file: the point is to check that something is generated, but
avoiding a longer test is desirable. So I would suggest to stick to
between 1 and 3, and if it fails then maybe add one second...

--
Fabien.
Re: [HACKERS] pgbench regression test failure
Fabien COELHO writes:
>> Apparently, one of the threads ran 3 transactions where the test script
>> expects it to run at most 2.  Is this a pgbench bug, or is the test
>> being overoptimistic about how exact the "-T 2" cutoff is?

> Probably both? It seems that cutting off on time is not a precise
> science, so I suggest to accept 1, 2 and 3 lines, see attached.

Before I'd deciphered the test output fully, I was actually guessing that
the problem was the opposite, namely too few lines.  Isn't it possible
that some thread is slow enough to start up that it doesn't get to run
any transactions?  IOW, do we need to allow 0 to 3 lines?

			regards, tom lane
Re: [HACKERS] pgbench regression test failure
> francolin just showed a non-reproducing failure in the new pgbench tests:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin&dt=2017-09-12%2014%3A00%3A02

>    not ok 211 - transaction count for 001_pgbench_log_1.31583 (3)
>    #   Failed test 'transaction count for 001_pgbench_log_1.31583 (3)'
>    #   at t/001_pgbench_with_server.pl line 438.

> Apparently, one of the threads ran 3 transactions where the test script
> expects it to run at most 2.  Is this a pgbench bug, or is the test
> being overoptimistic about how exact the "-T 2" cutoff is?

Probably both? It seems that cutting off on time is not a precise
science, so I suggest to accept 1, 2 and 3 lines, see attached.

--
Fabien.

diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 3609b9b..1fe3433 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -463,7 +463,8 @@ pgbench(
 	'pgbench progress');
 
 # $nthreads threads, 2 seconds, sometimes only one aggregated line is written
-check_pgbench_logs('001_pgbench_log_1', $nthreads, 1, 2,
+# and sometimes 3 lines...
+check_pgbench_logs('001_pgbench_log_1', $nthreads, 1, 3,
 	qr{^\d+ \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$});
 
 # with sampling rate
[HACKERS] pgbench regression test failure
francolin just showed a non-reproducing failure in the new pgbench tests:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin&dt=2017-09-12%2014%3A00%3A02

The relevant part of the log is

# Running: pgbench -T 2 -P 1 -l --log-prefix=001_pgbench_log_1 --aggregate-interval=1 -S -b se@2 --rate=20 --latency-limit=1000 -j 2 -c 3 -r
ok 198 - pgbench progress status (got 0 vs expected 0)
ok 199 - pgbench progress stdout /(?^:type: multiple)/
ok 200 - pgbench progress stdout /(?^:clients: 3)/
ok 201 - pgbench progress stdout /(?^:threads: 2)/
ok 202 - pgbench progress stdout /(?^:duration: 2 s)/
ok 203 - pgbench progress stdout /(?^:script 1: .* select only)/
ok 204 - pgbench progress stdout /(?^:script 2: .* select only)/
ok 205 - pgbench progress stdout /(?^:statement latencies in milliseconds)/
ok 206 - pgbench progress stdout /(?^:FROM pgbench_accounts)/
ok 207 - pgbench progress stderr /(?^:vacuum)/
ok 208 - pgbench progress stderr /(?^:progress: 1\b)/
ok 209 - number of log files
ok 210 - file name format
not ok 211 - transaction count for 001_pgbench_log_1.31583 (3)

#   Failed test 'transaction count for 001_pgbench_log_1.31583 (3)'
#   at t/001_pgbench_with_server.pl line 438.
ok 212 - transaction format for 001_pgbench_log_1
ok 213 - transaction count for 001_pgbench_log_1.31583.1 (2)
ok 214 - transaction format for 001_pgbench_log_1
ok 215 - remove log files

Apparently, one of the threads ran 3 transactions where the test script
expects it to run at most 2.  Is this a pgbench bug, or is the test
being overoptimistic about how exact the "-T 2" cutoff is?

			regards, tom lane