Re: [HACKERS] pgbench regression test failure
The following review has been posted through the commitfest application:

    make installcheck-world:  tested, failed
    Implements feature:       not tested
    Spec compliant:           not tested
    Documentation:            not tested

This causes the pgbench tests to fail (consistently) with

    not ok 194 - pgbench late throttling stdout /(?^:processed: [01]/10)/

When I run pgbench manually (-t 10 --rate=10 --latency-limit=1 -n -r) I get

    number of transactions actually processed: 10/10
    number of transactions skipped: 10 (100.000 %)

Prior to the patch I was getting

    number of transactions actually processed: 0/10
    number of transactions skipped: 10 (100.000 %)

    @@ -3539,7 +3542,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
     {
     	printf("number of transactions per client: %d\n", nxacts);
     	printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
    -		   total->cnt - total->skipped, nxacts * nclients);
    +		   total->cnt, nxacts * nclients);

I think you want ntx instead of total->cnt here.

The new status of this patch is: Waiting on Author

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
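The reviewer's point can be sketched in a few lines (a Python stand-in for illustration only; pgbench itself is C, and `processed_count` is an invented name mirroring the suggested `ntx = total->cnt - total->skipped`):

```python
# "Actually processed" must exclude skipped transactions; printing the
# raw counter reports skipped transactions as if they had run.
def processed_count(total_cnt, total_skipped):
    """Transactions actually processed = attempted minus skipped."""
    return total_cnt - total_skipped

# With -t 10 --rate=10 --latency-limit=1, all 10 transactions were
# skipped, so the report should read 0/10 rather than 10/10.
print(processed_count(10, 10))  # 0
```

This matches the pre-patch output quoted above, which the test expects.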
Re: [HACKERS] fresh regression - regproc result contains unwanted schema
2017-10-14 17:26 GMT+02:00 Tom Lane:
> Pavel Stehule writes:
> > When a function is overloaded, the regproc result contains the schema,
> > although it is on the search_path.
>
> There's no "fresh regression" here; it's done that more or less since
> we invented schemas.  See regprocout:
>
>  * Would this proc be found (uniquely!) by regprocin? If not,
>  * qualify it.
>
> git blame dates that comment to commit 52200bef of 2002-04-25.
>
> Admittedly, qualifying the name might not be sufficient to disambiguate,
> but regprocout doesn't have any other tool in its toolbox, so it uses
> the hammer it's got.  If you're overloading functions, you really need
> to use regprocedure, not regproc.

It is a false alarm; I am sorry, I shot myself in the foot.  Thank you for
the explanation.

Nice evening.

Pavel

> 			regards, tom lane
Re: [HACKERS] fresh regression - regproc result contains unwanted schema
Pavel Stehule writes:
> When a function is overloaded, the regproc result contains the schema,
> although it is on the search_path.

There's no "fresh regression" here; it's done that more or less since
we invented schemas.  See regprocout:

 * Would this proc be found (uniquely!) by regprocin? If not,
 * qualify it.

git blame dates that comment to commit 52200bef of 2002-04-25.

Admittedly, qualifying the name might not be sufficient to disambiguate,
but regprocout doesn't have any other tool in its toolbox, so it uses
the hammer it's got.  If you're overloading functions, you really need
to use regprocedure, not regproc.

			regards, tom lane
Re: [HACKERS] pgbench regression test failure
Hello Tom,

>  # progress: 2.6 s, 6.9 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms, 18 skipped
>  # progress: 3.0 s, 0.0 tps, lat -nan ms stddev -nan, lag -nan ms, 0 skipped
>  # progress: 4.0 s, 1.0 tps, lat 2682.730 ms stddev 0.000, lag 985.509 ms, 0 skipped
>
> (BTW, the "-nan" bits suggest an actual pgbench bug, independently of
> anything else.)

From my point of view, NaN is expected when no tests were executed in the
interval: if there was no transaction, it does not make sense to talk about
its latency, so NaN is the right answer.

However, the above "6.9 tps, lat 0.000, stddev 0.000, lag 0.000" is
inconsistent.  As 6.9 = 18 / 2.6, it means that the progress tps
calculation should remove skipped transactions...

Attached patch attempts to report more consistent figures in the progress
and in the final report when transactions are skipped.

  sh> cat sleep-100.sql
  \sleep 100 ms
  SELECT 1;

  sh> ./pgbench -P 1 -t 100 -f sleep-100.sql -R 20 -L 1
  [...]
  progress: 1.0 s, 7.0 tps, lat 100.145 ms stddev 0.042, lag 0.000 ms, 16 skipped
  progress: 2.0 s, 6.0 tps, lat 100.133 ms stddev 0.040, lag 0.021 ms, 7 skipped
  progress: 3.0 s, 9.0 tps, lat 100.115 ms stddev 0.016, lag 0.000 ms, 11 skipped
  [...]
  number of transactions actually processed: 38/100
  number of transactions skipped: 62 (62.000 %)
  number of transactions above the 1.0 ms latency limit: 38 (38.000 %)
  latency average = 100.142 ms
  tps = 7.091010 (including connections establishing)
  tps = 7.094144 (excluding connections establishing)
  script statistics:
   - number of transactions skipped: 62 (62.000%)

--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e37496c..9ca9734 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -2584,7 +2584,7 @@ processXactStats(TState *thread, CState *st, instr_time *now,
 		doLog(thread, st, agg, skipped, latency, lag);
 
 	/* XXX could use a mutex here, but we choose not to */
-	if (per_script_stats)
+	if (per_script_stats || latency_limit)
 		accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
 }
 
@@ -3522,11 +3522,14 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	double		time_include,
 				tps_include,
 				tps_exclude;
+	int64		ntx = total->cnt - total->skipped;
 
 	time_include = INSTR_TIME_GET_DOUBLE(total_time);
-	tps_include = total->cnt / time_include;
-	tps_exclude = total->cnt / (time_include -
-					(INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));
+
+	/* tps is about actually executed transactions */
+	tps_include = ntx / time_include;
+	tps_exclude = ntx /
+		(time_include - (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));
 
 	/* Report test parameters. */
 	printf("transaction type: %s\n",
@@ -3539,7 +3542,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	{
 		printf("number of transactions per client: %d\n", nxacts);
 		printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
-			   total->cnt - total->skipped, nxacts * nclients);
+			   total->cnt, nxacts * nclients);
 	}
 	else
 	{
@@ -4660,7 +4663,8 @@ threadRun(void *arg)
 			{
 				/* generate and show report */
 				StatsData	cur;
-				int64		run = now - last_report;
+				int64		run = now - last_report,
+							ntx;
 				double		tps,
 							total_run,
 							latency,
@@ -4675,7 +4679,7 @@ threadRun(void *arg)
 				 * XXX: No locking. There is no guarantee that we get an
 				 * atomic snapshot of the transaction count and latencies, so
 				 * these figures can well be off by a small amount. The
-				 * progress is report's purpose is to give a quick overview of
+				 * progress report's purpose is to give a quick overview of
 				 * how the test is going, so that shouldn't matter too much.
 				 * (If a read from a 64-bit integer is not atomic, you might
 				 * get a "torn" read and completely bogus latencies though!)
@@ -4689,15 +4693,14 @@ threadRun(void *arg)
 					cur.skipped += thread[i].stats.skipped;
 				}
 
+				/* we count only actually executed transactions */
+				ntx = (cur.cnt - cur.skipped) - (last.cnt - last.skipped);
 				total_run = (now - thread_start) / 1000000.0;
-				tps = 1000000.0 * (cur.cnt - last.cnt) / run;
-				latency = 0.001 * (cur.latency.sum - last.latency.sum) /
-					(cur.cnt - last.cnt);
-				sqlat = 1.0 * (cur.latency.sum2 - last.latency.sum2)
-					/ (cur.cnt - last.cnt);
+				tps = 1000000.0 * ntx / run;
+				latency = 0.001 * (cur.latency.sum - last.latency.sum) / ntx;
+				sqlat = 1.0 * (cur.latency.sum2 - last.latency.sum2) / ntx;
 				stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
-				lag = 0.001 * (cur.lag.sum - last.lag.sum) /
-					(cur.cnt - last.cnt);
+				lag = 0.001 * (cur.lag.sum - last.lag.sum) / ntx;
 
 				if (progress_timestamp)
 				{
@@ -4714,6 +4717,7 @@ threadRun(void *arg)
 						(long) tv.tv_sec, (long) (tv.tv_usec / 1000));
 				}
 				else
+					/* round seconds are expected, but the thread may
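The patch's per-interval arithmetic can be mirrored in a small Python model (variable names are mine, not pgbench's C code) to show both the corrected tps and where the "-nan" fields come from:

```python
import math

# Per-interval progress stats computed over ntx, the non-skipped
# transactions only.  Latency sums are in microseconds, run length in
# microseconds, as in pgbench.  When ntx is 0 there is no latency to
# report: C's 0.0/0.0 yields NaN, hence the "-nan" seen on the buildfarm.
def progress_stats(cur_cnt, cur_skipped, last_cnt, last_skipped,
                   lat_sum_us, last_lat_sum_us, run_us):
    ntx = (cur_cnt - cur_skipped) - (last_cnt - last_skipped)
    tps = 1_000_000.0 * ntx / run_us
    latency_ms = (0.001 * (lat_sum_us - last_lat_sum_us) / ntx
                  if ntx else math.nan)
    return ntx, tps, latency_ms
```

For example, an interval with 7 executed transactions totalling 700 ms of latency gives 7.0 tps and a 100 ms average; an interval where everything was skipped gives 0.0 tps and NaN latency, consistent with the sample output above.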
Re: [HACKERS] pgbench regression test failure
Fabien COELHO writes:
>> [...] After another week of buildfarm runs, we have a few more cases of
>> 3 rows of output, and none of more than 3 or less than 1.  So I went
>> ahead and pushed your patch.  I'm still suspicious of these results, but
>> we might as well try to make the buildfarm green pending investigation
>> of how this is happening.

> Yep. I keep the issue of pgbench TAP test determinism in my todo list,
> among other things.

> I think that it should be at least clearer under which condition (load?
> luck? bug?) the result may be 1 or 3 when 2 are expected, which needs
> some thinking.

skink blew up real good just now:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-09-23%2010%3A50%3A01

the critical bit being

  #   Failed test 'pgbench progress stderr /(?^:progress: 1\b)/'
  #   at /home/andres/build/buildfarm/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm line 369.
  #          'starting vacuum...end.
  # progress: 2.6 s, 6.9 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms, 18 skipped
  # progress: 3.0 s, 0.0 tps, lat -nan ms stddev -nan, lag -nan ms, 0 skipped
  # progress: 4.0 s, 1.0 tps, lat 2682.730 ms stddev 0.000, lag 985.509 ms, 0 skipped
  # '
  #     doesn't match '(?^:progress: 1\b)'
  #   Failed test 'transaction count for 001_pgbench_log_1.15981 (5)'
  #   at t/001_pgbench_with_server.pl line 438.
  #   Failed test 'transaction count for 001_pgbench_log_1.15981.1 (4)'
  #   at t/001_pgbench_with_server.pl line 438.
  # Looks like you failed 3 tests of 233.

That's exceeded my patience with this test case, so I've removed it for
the moment.  We can put it back as soon as we figure some way to make it
more robust.

(BTW, the "-nan" bits suggest an actual pgbench bug, independently of
anything else.)

Possibly you can duplicate skink's issue by running things under
valgrind.

			regards, tom lane
Re: [HACKERS] pgbench regression test failure
> [...] After another week of buildfarm runs, we have a few more cases of
> 3 rows of output, and none of more than 3 or less than 1.  So I went
> ahead and pushed your patch.  I'm still suspicious of these results, but
> we might as well try to make the buildfarm green pending investigation
> of how this is happening.

Yep. I keep the issue of pgbench TAP test determinism in my todo list,
among other things.

I think that it should be at least clearer under which condition (load?
luck? bug?) the result may be 1 or 3 when 2 are expected, which needs
some thinking.

--
Fabien.
Re: [HACKERS] pgbench regression test failure
Fabien COELHO writes:
>> It could be as simple as putting the check-for-done at the bottom of the
>> loop not the top, perhaps.

> I agree that it is best if tests work in all reasonable conditions,
> including a somewhat overloaded host...

> I'm going to think about it, but I'm not sure of the best approach.  In
> the mean time, ISTM that the issue has not been encountered (yet), so
> this is not a pressing issue.  Maybe under -T > --aggregate-interval
> pgbench could go on over the limit if the log file has not been written
> at all, but that would be some kind of kludge for this specific test...

After another week of buildfarm runs, we have a few more cases of 3 rows
of output, and none of more than 3 or less than 1.  So I went ahead and
pushed your patch.  I'm still suspicious of these results, but we might
as well try to make the buildfarm green pending investigation of how
this is happening.

			regards, tom lane
Re: [HACKERS] pgbench regression test failure
>>> I have a serious, serious dislike for tests that seem to work until
>>> they're run on a heavily loaded machine.

>> I'm not that sure the error message was because of that.

> No, this particular failure (probably) wasn't.  But now that I've
> realized that this test case is timing-sensitive, I'm worried about
> what will happen when it's run on a sufficiently slow or loaded machine.

>>> I would not necessarily object to doing something in the code that
>>> would guarantee that, though.

>> Hmmm. Interesting point.

> It could be as simple as putting the check-for-done at the bottom of the
> loop not the top, perhaps.

I agree that it is best if tests work in all reasonable conditions,
including a somewhat overloaded host...

I'm going to think about it, but I'm not sure of the best approach.  In
the mean time, ISTM that the issue has not been encountered (yet), so
this is not a pressing issue.  Maybe under -T > --aggregate-interval
pgbench could go on over the limit if the log file has not been written
at all, but that would be some kind of kludge for this specific test...

Note that to get test coverage for -T, one has to assume that even a
loaded host can generate at least a one-line log every second during that
time, which is kind of a hard assumption...

Maybe some tests could be "warnings", i.e. it could be acceptable to
accept a failure once in a while in specific conditions, if it is rare
enough and documented.  ISTM that there is such a test for random output.

--
Fabien.
Re: [HACKERS] pgbench regression test failure
Fabien COELHO writes:
>> I have a serious, serious dislike for tests that seem to work until
>> they're run on a heavily loaded machine.

> I'm not that sure the error message was because of that.

No, this particular failure (probably) wasn't.  But now that I've
realized that this test case is timing-sensitive, I'm worried about what
will happen when it's run on a sufficiently slow or loaded machine.

>> I would not necessarily object to doing something in the code that
>> would guarantee that, though.

> Hmmm. Interesting point.

It could be as simple as putting the check-for-done at the bottom of the
loop not the top, perhaps.

			regards, tom lane
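The check-at-the-bottom suggestion can be illustrated with a toy loop (a Python sketch, not pgbench's actual thread code): with the stop condition tested after the work, every thread completes at least one transaction even if its deadline already passed before it was first scheduled.

```python
# Checking the deadline at the top can yield zero iterations on a slow,
# overloaded host; checking at the bottom guarantees at least one.
def run_check_at_top(deadline_passed, work):
    done = 0
    while not deadline_passed():
        work()
        done += 1
    return done          # may be 0 if the thread started late

def run_check_at_bottom(deadline_passed, work):
    done = 0
    while True:
        work()
        done += 1
        if deadline_passed():
            break
    return done          # always >= 1
```

With a deadline that has already expired, the first variant runs nothing while the second still runs once, which is exactly the guarantee the test needs.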
Re: [HACKERS] pgbench regression test failure
> I have a serious, serious dislike for tests that seem to work until
> they're run on a heavily loaded machine.

I'm not that sure the error message was because of that.  ISTM that it
was rather finding 3 seconds in two because it started just at the right
time, or maybe because of slowness induced by load and the order in which
the different checks are performed.

> So unless there is some reason why pgbench is *guaranteed* to run at
> least one transaction per thread, I'd rather the test not assume that.

Well, pgbench is for testing performance... so if the checks allow zero
performance, that's quite annoying as well:-)  The tests are designed to
require very low performance (e.g. there are a lot of -t 1, when only one
transaction is enough to check a point), but maybe some tests assume a
minimal requirement, maybe 10 tps with 2 threads...

> I would not necessarily object to doing something in the code that
> would guarantee that, though.

Hmmm. Interesting point.

There could be a client-side synchronization barrier, e.g. something like
"\sync :nclients/nthreads" could be easy enough to implement with
pthreads, and quite error prone to use, but probably that could be okay
for validation purposes.  Or maybe we could expose something at the SQL
level, e.g. "SELECT synchro('synchroname', howmanyclientstowait);" which
would be harder to implement server-side but possibly doable as well.

A simpler option may be to introduce a synchronization barrier at thread
start, so that all threads start together and that would set the "zero"
time.  Not sure that would solve the potential issue you raise, although
that would help.

Currently the statistics collection and outputs are performed by thread 0
in addition to the client it runs, so that pgbench would work even if
there are no threads, but it also means that under a heavy load some
things may not be done at the target time but a little bit later, if some
thread is stuck somewhere, although the async protocol tries to avoid
that.

--
Fabien.
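The thread-start barrier idea sketches easily in Python (pgbench would use pthreads; everything below is an invented stand-in, not pgbench code): workers block on a barrier after their uneven startup, so the benchmark's "zero" time is taken at essentially the same instant in every thread.

```python
import threading
import time

NTHREADS = 4
start_barrier = threading.Barrier(NTHREADS)
start_times = [0.0] * NTHREADS

def worker(i):
    time.sleep(0.01 * i)               # simulate staggered thread startup
    start_barrier.wait()               # wait for the slowest starter
    start_times[i] = time.monotonic()  # per-thread benchmark "zero" time

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NTHREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Despite start-up sleeps of up to 30 ms, the recorded "zero" times are
# tightly clustered because the barrier releases all threads at once.
spread = max(start_times) - min(start_times)
```

The pthreads equivalent would be `pthread_barrier_wait` after thread creation; as noted above, this aligns the start but does not by itself guarantee each thread runs a transaction.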
Re: [HACKERS] pgbench regression test failure
Fabien COELHO writes:
> By definition, parallelism induces non-determinism.  When I put 2
> seconds, the intention was that I would get a non-empty trace with an
> "every second" aggregation.  I would rather take a longer test than
> allow an empty file: the point is to check that something is generated,
> but avoiding a longer test is desirable.  So I would suggest to stick to
> between 1 and 3, and if it fails then maybe add one second...

That's a losing game.  You can't ever guarantee that N seconds is enough
for slow, heavily loaded machines, and cranking up N just penalizes
developers who are testing under normal circumstances.  I have a serious,
serious dislike for tests that seem to work until they're run on a
heavily loaded machine.

So unless there is some reason why pgbench is *guaranteed* to run at
least one transaction per thread, I'd rather the test not assume that.

I would not necessarily object to doing something in the code that would
guarantee that, though.

			regards, tom lane
Re: [HACKERS] pgbench regression test failure
>>> Apparently, one of the threads ran 3 transactions where the test script
>>> expects it to run at most 2.  Is this a pgbench bug, or is the test
>>> being overoptimistic about how exact the "-T 2" cutoff is?

>> Probably both?  It seems that cutting off on time is not a precise
>> science, so I suggest to accept 1, 2 and 3 lines, see attached.

> Before I'd deciphered the test output fully, I was actually guessing
> that the problem was the opposite, namely too few lines.  The test was
> waiting for between 1 and 2 lines, so I assumed that 3 should be the
> number of lines found.  Isn't it possible that some thread is slow
> enough to start up that it doesn't get to run any transactions?  IOW,
> do we need to allow 0 to 3 lines?

By definition, parallelism induces non-determinism.  When I put 2
seconds, the intention was that I would get a non-empty trace with an
"every second" aggregation.  I would rather take a longer test than allow
an empty file: the point is to check that something is generated, but
avoiding a longer test is desirable.  So I would suggest to stick to
between 1 and 3, and if it fails then maybe add one second...

--
Fabien.
Re: [HACKERS] pgbench regression test failure
Fabien COELHO writes:
>> Apparently, one of the threads ran 3 transactions where the test script
>> expects it to run at most 2.  Is this a pgbench bug, or is the test
>> being overoptimistic about how exact the "-T 2" cutoff is?

> Probably both?  It seems that cutting off on time is not a precise
> science, so I suggest to accept 1, 2 and 3 lines, see attached.

Before I'd deciphered the test output fully, I was actually guessing that
the problem was the opposite, namely too few lines.  Isn't it possible
that some thread is slow enough to start up that it doesn't get to run
any transactions?  IOW, do we need to allow 0 to 3 lines?

			regards, tom lane
Re: [HACKERS] pgbench regression test failure
> francolin just showed a non-reproducing failure in the new pgbench tests:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin&dt=2017-09-12%2014%3A00%3A02
>
>   not ok 211 - transaction count for 001_pgbench_log_1.31583 (3)
>   #   Failed test 'transaction count for 001_pgbench_log_1.31583 (3)'
>   #   at t/001_pgbench_with_server.pl line 438.
>
> Apparently, one of the threads ran 3 transactions where the test script
> expects it to run at most 2.  Is this a pgbench bug, or is the test
> being overoptimistic about how exact the "-T 2" cutoff is?

Probably both?  It seems that cutting off on time is not a precise
science, so I suggest to accept 1, 2 and 3 lines, see attached.

--
Fabien.

diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 3609b9b..1fe3433 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -463,7 +463,8 @@ pgbench(
 	'pgbench progress');
 
 # $nthreads threads, 2 seconds, sometimes only one aggregated line is written
-check_pgbench_logs('001_pgbench_log_1', $nthreads, 1, 2,
+# and sometimes 3 lines...
+check_pgbench_logs('001_pgbench_log_1', $nthreads, 1, 3,
 	qr{^\d+ \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$});
 
 # with sampling rate
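A back-of-envelope model shows why 1 to 3 lines are all plausible for a "-T 2" run with one-second aggregation.  This is my simplification, not the test's real logic: assume one aggregate line is emitted per whole-second boundary crossed, and that the effective run length jitters around 2 s on a loaded host.

```python
import math

# Number of whole-second boundaries a run crosses, as a proxy for the
# number of aggregate log lines it emits (interval = 1 s).
def aggregate_lines(start, duration, interval=1.0):
    return (math.floor((start + duration) / interval)
            - math.floor(start / interval))
```

A run of exactly 2 s crosses 2 boundaries; one cut short by scheduling crosses only 1; one that starts late in a second and overruns crosses 3.  Hence accepting 1-3 lines.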
Re: [HACKERS] generate_series regression 9.6->10
Thanks Tom.  This showed up in a regression test of ours that built the
test data using generate_series, so it's not a critical production issue
or anything, just a surprise change in behaviour.

P.

On Wed, May 24, 2017 at 10:28 AM, Tom Lane wrote:
> Paul Ramsey writes:
> > The behaviour of generate_series seems to have changed a little, at
> > least in conjunction w/ CTEs.
>
> What's changed is the behavior of multiple SRFs in a SELECT's
> targetlist, cf
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=69f4b9c85f168ae006929eec44fc44d569e846b9
>
> specifically this comment:
>
>     While moving SRF evaluation to ProjectSet would allow to retain the
>     old "least common multiple" behavior when multiple SRFs are present
>     in one targetlist (i.e. continue returning rows until all SRFs are
>     at the end of their input at the same time), we decided to instead
>     only return rows till all SRFs are exhausted, returning NULL for
>     already exhausted ones.  We deemed the previous behavior to be too
>     confusing, unexpected and actually not particularly useful.
>
> I see the current v10 release notes have failed miserably at documenting
> this :-(.  Will try to improve that.
>
> 			regards, tom lane
Re: [HACKERS] generate_series regression 9.6->10
Paul Ramsey writes:
> The behaviour of generate_series seems to have changed a little, at
> least in conjunction w/ CTEs.

What's changed is the behavior of multiple SRFs in a SELECT's targetlist,
cf

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=69f4b9c85f168ae006929eec44fc44d569e846b9

specifically this comment:

    While moving SRF evaluation to ProjectSet would allow to retain the
    old "least common multiple" behavior when multiple SRFs are present
    in one targetlist (i.e. continue returning rows until all SRFs are
    at the end of their input at the same time), we decided to instead
    only return rows till all SRFs are exhausted, returning NULL for
    already exhausted ones.  We deemed the previous behavior to be too
    confusing, unexpected and actually not particularly useful.

I see the current v10 release notes have failed miserably at documenting
this :-(.  Will try to improve that.

			regards, tom lane
Re: [HACKERS] generate_series regression 9.6->10
On 2017-05-24 10:09:19 -0700, Paul Ramsey wrote:
> The behaviour of generate_series seems to have changed a little, at
> least in conjunction w/ CTEs.  Under 9.6 (and prior) this query returns
> 2127 rows, with no nulls:
>
> with
> ij as (select i, j from generate_series(1, 10) i, generate_series(1, 10) j),
> iijj as (select generate_series(1, i) as a, generate_series(1, j) b from ij)
> select a, b from iijj;
>
> Under 10, it returns only 715 rows, with many nulls.

Right, that's expected - we probably need to expand on that in the
release notes.

Before v10, targetlists with multiple SRFs were evaluated using a "least
common multiple" logic.  I.e. if you have

  SELECT generate_series(1,2), generate_series(1,4);

once the first SRF is exhausted it was restarted.  Only once all SRFs
stopped returning rows at the same time were things stopped.  Going
forward, once either SRF stops returning rows, it'll return NULL until
all SRFs are exhausted.

Makes sense?  Is that a problem for you?  If so, what do you use the LCM
logic for in practical terms?

Greetings,

Andres Freund
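The two behaviors can be modeled with a few lines of Python (an itertools analogue, not PostgreSQL internals): v10 pads exhausted SRFs with NULL (`None` here), while pre-v10 restarted them up to the least common multiple of their lengths.

```python
from itertools import zip_longest
from math import lcm

def srf_v10(*cols):
    # v10 behavior: advance all SRFs together, NULL-pad exhausted ones.
    return list(zip_longest(*cols))

def srf_pre_v10(*cols):
    # Pre-v10 behavior: restart each SRF until they all end at once,
    # i.e. produce lcm(len(a), len(b), ...) rows.
    n = lcm(*(len(c) for c in cols))
    return [tuple(c[i % len(c)] for c in cols) for i in range(n)]

a, b = [1, 2], [1, 2, 3, 4]
print(srf_v10(a, b))      # [(1, 1), (2, 2), (None, 3), (None, 4)]
print(srf_pre_v10(a, b))  # [(1, 1), (2, 2), (1, 3), (2, 4)]
```

The row counts Paul reported follow directly: summing over i, j in 1..10, the pre-v10 run yields sum(lcm(i, j)) = 2127 rows, the v10 run sum(max(i, j)) = 715.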
Re: [HACKERS] Possible regression with gather merge.
On Wed, Mar 22, 2017 at 3:39 AM, Rushabh Lathia wrote:
> Looking at the explain analyze output of both plans, it's clear that GM
> is taking longer as it's using an external merge sort on disk, whereas
> the other plan performs a top-N heapsort.  For the normal sort path, it
> can consider the limit as a bound, but for GM it's not possible.

Right, good catch.  Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Possible regression with gather merge.
On Wed, Mar 22, 2017 at 1:09 PM, Rushabh Lathia wrote:
> In the code, gather merge considers the limit for the sort in
> create_ordered_paths(), which is wrong.  There, GM should not consider
> the limit while doing the costing for the sort node.
>
> Attached patch fixes the bug.

Thanks, Rushabh.  Now I see the non-parallel plan being picked by
default:

create table test as (select id, (random()*1)::int as v1, random() as v2
from generate_series(1,1000) id);

postgres=# explain analyze select * from test order by v1, v2 limit 10;
                                  QUERY PLAN
---------------------------------------------------------------
 Limit  (cost=370146.98..370147.00 rows=10 width=16) (actual time=1169.685..1169.686 rows=10 loops=1)
   ->  Sort  (cost=370146.98..395146.63 rows=860 width=16) (actual time=1169.683..1169.684 rows=10 loops=1)
         Sort Key: v1, v2
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on test  (cost=0.00..154053.60 rows=860 width=16) (actual time=0.016..590.176 rows=1000 loops=1)
 Planning time: 0.070 ms
 Execution time: 1169.706 ms
(7 rows)

Another find, by accident: setting max_parallel_workers_per_gather to a
higher value than the default results in more parallel workers, but the
query runs slower than the plan with fewer workers.

postgres=# set max_parallel_workers_per_gather = default;
SET
postgres=# explain analyze select * from test order by v1, v2 limit 1000;
                                  QUERY PLAN
---------------------------------------------------------------
 Limit  (cost=697263.86..1669540.28 rows=8333216 width=16) (actual time=2212.437..8207.161 rows=1000 loops=1)
   ->  Gather Merge  (cost=697263.86..1669540.28 rows=8333216 width=16) (actual time=2212.436..7600.478 rows=1000 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=696263.84..706680.36 rows=4166608 width=16) (actual time=2173.756..3105.512 rows=333 loops=3)
               Sort Key: v1, v2
               Sort Method: external merge  Disk: 86648kB
               ->  Parallel Seq Scan on test  (cost=0.00..95721.08 rows=4166608 width=16) (actual time=0.030..240.486 rows=333 loops=3)
 Planning time: 0.096 ms
 Execution time: 8537.214 ms
(10 rows)

postgres=# set max_parallel_workers_per_gather = 10;
SET
postgres=# explain analyze select * from test order by v1, v2 limit 1000;
                                  QUERY PLAN
---------------------------------------------------------------
 Limit  (cost=431168.44..1628498.09 rows=860 width=16) (actual time=1525.120..13273.805 rows=1000 loops=1)
   ->  Gather Merge  (cost=431168.44..1628498.09 rows=860 width=16) (actual time=1525.119..12650.621 rows=1000 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Sort  (cost=430168.39..436418.30 rows=2499965 width=16) (actual time=1472.799..2133.571 rows=200 loops=5)
               Sort Key: v1, v2
               Sort Method: external merge  Disk: 50336kB
               ->  Parallel Seq Scan on test  (cost=0.00..79054.65 rows=2499965 width=16) (actual time=0.047..201.405 rows=200 loops=5)
 Planning time: 0.077 ms
 Execution time: 13622.319 ms
(10 rows)

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Possible regression with gather merge.
Hi,

postgres=# explain analyze select * from test order by v1, v2 limit 10;
                                  QUERY PLAN
---------------------------------------------------------------
 Limit  (cost=19576.71..19577.88 rows=10 width=16) (actual time=408.842..408.853 rows=10 loops=1)
   ->  Gather Merge  (cost=19576.71..116805.80 rows=84 width=16) (actual time=408.841..408.850 rows=10 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=18576.69..19618.36 rows=416667 width=16) (actual time=393.602..394.080 rows=911 loops=3)
               Sort Key: v1, v2
               Sort Method: external merge  Disk: 8128kB
               ->  Parallel Seq Scan on test  (cost=0.00..9572.67 rows=416667 width=16) (actual time=0.053..46.238 rows=33 loops=3)
 Planning time: 0.118 ms
 Execution time: 414.763 ms
(10 rows)

postgres=# set enable_gathermerge = off;
SET
postgres=# explain analyze select * from test order by v1, v2 limit 10;
                                  QUERY PLAN
---------------------------------------------------------------
 Limit  (cost=37015.64..37015.67 rows=10 width=16) (actual time=268.859..268.861 rows=10 loops=1)
   ->  Sort  (cost=37015.64..39515.64 rows=100 width=16) (actual time=268.858..268.859 rows=10 loops=1)
         Sort Key: v1, v2
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on test  (cost=0.00..15406.00 rows=100 width=16) (actual time=0.085..107.841 rows=100 loops=1)
 Planning time: 0.163 ms
 Execution time: 268.897 ms
(7 rows)

Looking at the explain analyze output of both plans, it's clear that GM
is taking longer as it's using an external merge sort on disk, whereas
the other plan performs a top-N heapsort.  For the normal sort path, it
can consider the limit as a bound, but for GM it's not possible.

In the code, gather merge considers the limit for the sort in
create_ordered_paths(), which is wrong.  There, GM should not consider
the limit while doing the costing for the sort node.

Attached patch fixes the bug.

On Wed, Mar 22, 2017 at 12:05 PM, Rushabh Lathia wrote:
> Thanks for reporting, I am looking into this.
>
> On Wed, Mar 22, 2017 at 11:51 AM, Mithun Cy wrote:
>> Adding more rows to the table makes gather merge execution time very
>> slow when compared to the non-parallel plan we get after disabling
>> gather merge.
>>
>> create table test as (select id, (random()*1)::int as v1, random() as
>> v2 from generate_series(1,1) id);
>>
>> postgres=# set max_parallel_workers_per_gather = default;
>> SET
>> postgres=# explain analyze select * from test order by v1, v2 limit 10;
>>                                   QUERY PLAN
>> ---------------------------------------------------------------
>>  Limit  (cost=1858610.53..1858611.70 rows=10 width=16) (actual time=31103.880..31103.885 rows=10 loops=1)
>>    ->  Gather Merge  (cost=1858610.53..11581520.05 rows=8406 width=16) (actual time=31103.878..31103.882 rows=10 loops=1)
>>          Workers Planned: 2
>>          Workers Launched: 2
>>          ->  Sort  (cost=1857610.50..1961777.26 rows=41666703 width=16) (actual time=30560.865..30561.046 rows=911 loops=3)
>>                Sort Key: v1, v2
>>                Sort Method: external merge  Disk: 841584kB
>>                ->  Parallel Seq Scan on test  (cost=0.00..957208.03 rows=41666703 width=16) (actual time=0.050..2330.275 rows= loops=3)
>>  Planning time: 0.292 ms
>>  Execution time: 31502.896 ms
>> (10 rows)
>>
>> postgres=# set max_parallel_workers_per_gather = 0;
>> SET
>> postgres=# explain analyze select * from test order by v1, v2 limit 10;
>>                                   QUERY PLAN
>> ---------------------------------------------------------------
>>  Limit  (cost=3701507.83..3701507.85 rows=10 width=16) (actual time=13231.264..13231.266 rows=10 loops=1)
>>    ->  Sort  (cost=3701507.83..3951508.05 rows=10088 width=16) (actual time=13231.261..13231.262 rows=10 loops=1)
>>          Sort Key: v1, v2
>>          Sort Method: top-N heapsort  Memory: 25kB
>>          ->  Seq Scan on test  (cost=0.00..1540541.88 rows=10088 width=16) (actual time=0.045..6759.363 rows=1 loops=1)
>>  Planning time: 0.131 ms
>>  Execution time: 13231.299 ms
>> (7 rows)
>>
>> On Wed, Mar 22, 2017 at 11:07 AM, Mithun Cy wrote:
>> > I accidentally encountered a case where gather merge was picked as
>> > default but disabling same by setting max_parallel_workers_per_gather
>> > = 0; produced a non-parallel plan which was
Re: [HACKERS] Possible regression with gather merge.
Thanks for reporting, I am looking into this. On Wed, Mar 22, 2017 at 11:51 AM, Mithun Cywrote: > Adding more rows to table make gather merge execution time very slow > when compared to non-parallel plan we get after disabling gather > merge. > > create table test as (select id, (random()*1)::int as v1, random() as > v2 from generate_series(1,1) id); > > postgres=# set max_parallel_workers_per_gather = default; > SET > postgres=# explain analyze select * from test order by v1, v2 limit 10; > > QUERY PLAN > > > > Limit (cost=1858610.53..1858611.70 rows=10 width=16) (actual > time=31103.880..31103.885 rows=10 loops=1) >-> Gather Merge (cost=1858610.53..11581520.05 rows=8406 > width=16) (actual time=31103.878..31103.882 rows=10 loops=1) > Workers Planned: 2 > Workers Launched: 2 > -> Sort (cost=1857610.50..1961777.26 rows=41666703 > width=16) (actual time=30560.865..30561.046 rows=911 loops=3) >Sort Key: v1, v2 >Sort Method: external merge Disk: 841584kB >-> Parallel Seq Scan on test (cost=0.00..957208.03 > rows=41666703 width=16) (actual time=0.050..2330.275 rows= > loops=3) > Planning time: 0.292 ms > Execution time: 31502.896 ms > (10 rows) > > postgres=# set max_parallel_workers_per_gather = 0; > SET > postgres=# explain analyze select * from test order by v1, v2 limit 10; > QUERY > PLAN > > > Limit (cost=3701507.83..3701507.85 rows=10 width=16) (actual > time=13231.264..13231.266 rows=10 loops=1) >-> Sort (cost=3701507.83..3951508.05 rows=10088 width=16) > (actual time=13231.261..13231.262 rows=10 loops=1) > Sort Key: v1, v2 > Sort Method: top-N heapsort Memory: 25kB > -> Seq Scan on test (cost=0.00..1540541.88 rows=10088 > width=16) (actual time=0.045..6759.363 rows=1 loops=1) > Planning time: 0.131 ms > Execution time: 13231.299 ms > (7 rows) > > On Wed, Mar 22, 2017 at 11:07 AM, Mithun Cy > wrote: > > I accidently encountered a case where gather merge was picked as > > default but disabling same by setting max_parallel_workers_per_gather > > = 0; produced a 
non-parallel plan which was faster than gather merge, > > but its cost is marked too high when compared to gather merge. > > > > I guess we need some cost adjustment is planner code. > > > > Test setting > > = > > create table test as (select id, (random()*1)::int as v1, random() as > > v2 from generate_series(1,100) id); > > create index test_v1_idx on test (v1); > > > > > > Server setting is default. > > > > > > postgres=# explain analyze select * from test order by v1, v2 limit 10; > >QUERY > > PLAN > > > > > > Limit (cost=19576.71..19577.88 rows=10 width=16) (actual > > time=265.989..265.995 rows=10 loops=1) > >-> Gather Merge (cost=19576.71..116805.80 rows=84 width=16) > > (actual time=265.987..265.992 rows=10 loops=1) > > Workers Planned: 2 > > Workers Launched: 2 > > -> Sort (cost=18576.69..19618.36 rows=416667 width=16) > > (actual time=250.202..250.424 rows=911 loops=3) > >Sort Key: v1, v2 > >Sort Method: external merge Disk: 9272kB > >-> Parallel Seq Scan on test (cost=0.00..9572.67 > > rows=416667 width=16) (actual time=0.053..41.397 rows=33 loops=3) > > Planning time: 0.193 ms > > Execution time: 271.222 ms > > > > postgres=# set max_parallel_workers_per_gather = 0; > > SET > > postgres=# explain analyze select * from test order by v1, v2 limit 10; > > QUERY PLAN > > > - > > Limit (cost=37015.64..37015.67 rows=10 width=16) (actual > > time=211.582..211.584 rows=10 loops=1) > >-> Sort (cost=37015.64..39515.64 rows=100 width=16) (actual > > time=211.581..211.582 rows=10 loops=1) > > Sort Key: v1, v2 > > Sort Method: top-N heapsort Memory: 25kB > > -> Seq Scan on test (cost=0.00..15406.00 rows=100 > > width=16) (actual time=0.085..107.522 rows=100 loops=1) > > Planning time: 0.093 ms > > Execution time: 211.608 ms > > (7 rows) > > > > > > > > -- > > Thanks and Regards > > Mithun C Y > > EnterpriseDB: http://www.enterprisedb.com > > > >
Re: [HACKERS] Possible regression with gather merge.
Adding more rows to table make gather merge execution time very slow when compared to non-parallel plan we get after disabling gather merge. create table test as (select id, (random()*1)::int as v1, random() as v2 from generate_series(1,1) id); postgres=# set max_parallel_workers_per_gather = default; SET postgres=# explain analyze select * from test order by v1, v2 limit 10; QUERY PLAN Limit (cost=1858610.53..1858611.70 rows=10 width=16) (actual time=31103.880..31103.885 rows=10 loops=1) -> Gather Merge (cost=1858610.53..11581520.05 rows=8406 width=16) (actual time=31103.878..31103.882 rows=10 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=1857610.50..1961777.26 rows=41666703 width=16) (actual time=30560.865..30561.046 rows=911 loops=3) Sort Key: v1, v2 Sort Method: external merge Disk: 841584kB -> Parallel Seq Scan on test (cost=0.00..957208.03 rows=41666703 width=16) (actual time=0.050..2330.275 rows= loops=3) Planning time: 0.292 ms Execution time: 31502.896 ms (10 rows) postgres=# set max_parallel_workers_per_gather = 0; SET postgres=# explain analyze select * from test order by v1, v2 limit 10; QUERY PLAN Limit (cost=3701507.83..3701507.85 rows=10 width=16) (actual time=13231.264..13231.266 rows=10 loops=1) -> Sort (cost=3701507.83..3951508.05 rows=10088 width=16) (actual time=13231.261..13231.262 rows=10 loops=1) Sort Key: v1, v2 Sort Method: top-N heapsort Memory: 25kB -> Seq Scan on test (cost=0.00..1540541.88 rows=10088 width=16) (actual time=0.045..6759.363 rows=1 loops=1) Planning time: 0.131 ms Execution time: 13231.299 ms (7 rows) On Wed, Mar 22, 2017 at 11:07 AM, Mithun Cywrote: > I accidently encountered a case where gather merge was picked as > default but disabling same by setting max_parallel_workers_per_gather > = 0; produced a non-parallel plan which was faster than gather merge, > but its cost is marked too high when compared to gather merge. > > I guess we need some cost adjustment is planner code. 
> > Test setting > = > create table test as (select id, (random()*1)::int as v1, random() as > v2 from generate_series(1,100) id); > create index test_v1_idx on test (v1); > > > Server setting is default. > > > postgres=# explain analyze select * from test order by v1, v2 limit 10; >QUERY > PLAN > > Limit (cost=19576.71..19577.88 rows=10 width=16) (actual > time=265.989..265.995 rows=10 loops=1) >-> Gather Merge (cost=19576.71..116805.80 rows=84 width=16) > (actual time=265.987..265.992 rows=10 loops=1) > Workers Planned: 2 > Workers Launched: 2 > -> Sort (cost=18576.69..19618.36 rows=416667 width=16) > (actual time=250.202..250.424 rows=911 loops=3) >Sort Key: v1, v2 >Sort Method: external merge Disk: 9272kB >-> Parallel Seq Scan on test (cost=0.00..9572.67 > rows=416667 width=16) (actual time=0.053..41.397 rows=33 loops=3) > Planning time: 0.193 ms > Execution time: 271.222 ms > > postgres=# set max_parallel_workers_per_gather = 0; > SET > postgres=# explain analyze select * from test order by v1, v2 limit 10; > QUERY PLAN > - > Limit (cost=37015.64..37015.67 rows=10 width=16) (actual > time=211.582..211.584 rows=10 loops=1) >-> Sort (cost=37015.64..39515.64 rows=100 width=16) (actual > time=211.581..211.582 rows=10 loops=1) > Sort Key: v1, v2 > Sort Method: top-N heapsort Memory: 25kB > -> Seq Scan on test (cost=0.00..15406.00 rows=100 > width=16) (actual time=0.085..107.522 rows=100 loops=1) > Planning time: 0.093 ms > Execution time: 211.608 ms > (7 rows) > > > > -- > Thanks and Regards > Mithun C Y > EnterpriseDB: http://www.enterprisedb.com -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
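The planner-vs-reality gap Mithun describes in the smaller test case above reduces to a two-line comparison: the planner picks the plan with the lower estimated cost, while the actual runtimes rank the other way. A minimal illustration, with the figures taken from the EXPLAIN ANALYZE output quoted above (illustrative arithmetic only, not planner code):

```python
# The planner's choice vs. reality, using figures from the smaller test
# case quoted above (estimated total cost of the Limit node, and actual
# execution time). Illustrative arithmetic only, not planner code.
plans = {
    "gather_merge": {"est_cost": 19577.88, "actual_ms": 271.222},
    "serial_sort":  {"est_cost": 37015.67, "actual_ms": 211.608},
}

chosen = min(plans, key=lambda p: plans[p]["est_cost"])
fastest = min(plans, key=lambda p: plans[p]["actual_ms"])
print(chosen)   # gather_merge (what the planner picks)
print(fastest)  # serial_sort (what actually runs faster)
```

The suggestion upthread that "we need some cost adjustment in planner code" is exactly about closing this gap between the two orderings.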
Re: [HACKERS] Possible regression regarding estimating relation width in FDWs
Ronan Dunklauwrites: > While working on adapting the Multicorn FDW for 9.6, I noticed that there is > a > regression with regards to estimating the remote relation width. Hm. In the old code it was basically a chance artifact that this example happens to produce the "right" width estimate. There are plenty of other cases where the per-column width estimates were already the relevant ones, for example if you join the foreign table to another one. postgres_fdw is falling down on the job by not making the per-column width estimates consistent with the overall relation width estimate. We could imagine extending use_remote_estimate mode to collect per-column width estimates from the remote end, but that would add quite a lot of cost. It's not really necessary either, IMO, because you can instead ANALYZE the foreign table to cause column width estimates to be computed and stored locally. If you do that in this example, you find another interesting thing about HEAD's behavior: regression=# EXPLAIN SELECT * FROM foreign_table; QUERY PLAN -- Foreign Scan on foreign_table (cost=100.00..101.03 rows=1 width=32) (1 row) regression=# ANALYZE foreign_table; ANALYZE regression=# EXPLAIN SELECT * FROM foreign_table; QUERY PLAN - Foreign Scan on foreign_table (cost=100.00..101.03 rows=1 width=40004) (1 row) The width estimate is now based on the decompressed/detoasted column width, which is really the right thing here because that is the representation we'll be working with for any local operations --- estimating the size of a hash table using the remote's toasted column width, for example, is just wrong. So I don't see any basis for arguing that 472 is the "right" width to use for the foreign table. 
In both HEAD and 9.5, join cases (post-ANALYZE) look pretty wacko: regression=# EXPLAIN SELECT * FROM foreign_table cross join int8_tbl; QUERY PLAN - Nested Loop (cost=100.00..102.13 rows=5 width=40020) -> Foreign Scan on foreign_table (cost=100.00..101.03 rows=1 width=472) -> Seq Scan on int8_tbl (cost=0.00..1.05 rows=5 width=16) (3 rows) The top-level estimate here is actually right, IMV, but the width estimate for the ForeignScan is not. In view of this, I'm a bit tempted to double down on the ANALYZE dependency by having postgres_fdw not pay attention to the remote's width estimates at all, but just use whatever column width data is cached locally, and sum those numbers to get a relation width. That would be more reliable if we've done an ANALYZE recently, and I'm not convinced it'd be worse if we have not. Anyway, the bottom line is that the behavior of 9.5 and before is not so great in this area that I feel a need to be bug-compatible with it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
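Tom's suggestion (have postgres_fdw ignore the remote's width estimate and instead sum the locally cached per-column width statistics) amounts to something like the following sketch. The function name, the fallback default, and the column widths are made up for illustration; this is not the actual postgres_fdw planner code.

```python
# Illustrative model of the width-estimation idea discussed above: derive a
# foreign relation's row-width estimate by summing the per-column average
# widths cached locally by ANALYZE, rather than trusting the remote estimate.
# Names and numbers are hypothetical, not postgres_fdw internals.

def estimate_rel_width(column_avg_widths, default_width=32):
    """Sum per-column average widths; fall back to a default for columns
    that have no locally cached statistics (i.e. never ANALYZEd)."""
    total = 0
    for width in column_avg_widths:
        total += width if width is not None else default_width
    return total

# A table like the one in the example: one wide (detoasted) text column
# plus an int4. After a local ANALYZE, the decompressed column width
# dominates, matching the jump from width=32 to width=40004 above.
print(estimate_rel_width([40000, 4]))   # 40004
print(estimate_rel_width([None, 4]))    # no stats yet: 32 + 4 = 36
```

The appeal of this scheme is consistency: the ForeignScan's width and any join's width would then be derived from the same locally stored statistics, avoiding the wacko mixed estimates shown in the join plan above.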
Re: [HACKERS] large regression for parallel COPY
On 2016-04-05 17:12:11 -0400, Robert Haas wrote: > On Wed, Mar 30, 2016 at 4:10 PM, Andres Freundwrote: > > Indeed. On SSDs I see about a 25-35% gain, on HDDs about 5%. If I > > increase the size of backend_flush_after to 64 (like it's for bgwriter) > > I however do get about 15% for HDDs as well. > > I tried the same test mentioned in the original post on cthulhu (EDB > machine, CentOS 7.2, 8 sockets, 8 cores per socket, 2 threads per > core, Xeon E7-8830 @ 2.13 GHz). I attempted to test both the effects > of multi_extend_v21 and the *_flush_after settings. The machine has > both HD and SSD, but I used HD for this test. > master, logged tables, 4 parallel copies: > 1m15.411s, 1m14.248s, 1m15.040s > master, logged tables, 1 copy: > 0m28.336s, 0m28.040s, 0m29.576s > multi_extend_v21, logged tables, 4 parallel copies: > 0m46.058s, 0m44.515s, 0m45.688s > multi_extend_v21, logged tables, 1 copy: > 0m28.440s, 0m28.129s, 0m30.698s > master, logged tables, 4 parallel copies, {backend,bgwriter}_flush_after=0: > 1m2.817s, 1m4.467s, 1m12.319s > multi_extend_v21, logged tables, 4 parallel copies, > {backend,bgwriter}_flush_after=0: 0m41.301s, 0m41.104s, 0m41.342s > master, logged tables, 1 copy, {backend,bgwriter}_flush_after=0: > 0m26.948s, 0m26.829s, 0m26.616s Any chance you could repeat with backend_flush_after set to 64? I wonder if the current value isn't just too small a default for HDDs due to their increased latency. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] large regression for parallel COPY
Hello Robert, I tried the same test mentioned in the original post on cthulhu (EDB machine, CentOS 7.2, 8 sockets, 8 cores per socket, 2 threads per core, Xeon E7-8830 @ 2.13 GHz). I attempted to test both the effects of multi_extend_v21 and the *_flush_after settings. I'm not sure of {backend,writer}_flush_after intrinsic effectiveness, especially on HDDs, because although for the checkpointer (checkpoint_flush_after) there is a great deal of effort to generate large sequential writes, there is no such provisions for other write activities. I'm not sure how the write activity of the "parallel" copy is organized, but that sounds like it will generate less sequential writes than before, and the negative performance impact could be accentuated by flushing. This might suggest that the benefit of these two settings are more irregular/hard to predict, so their default value should be 0 (aka off)? Or maybe warn clearly in the documentation about the uncertain effects of these two settings? -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] large regression for parallel COPY
On Wed, Mar 30, 2016 at 4:10 PM, Andres Freundwrote: > Indeed. On SSDs I see about a 25-35% gain, on HDDs about 5%. If I > increase the size of backend_flush_after to 64 (like it's for bgwriter) > I however do get about 15% for HDDs as well. I tried the same test mentioned in the original post on cthulhu (EDB machine, CentOS 7.2, 8 sockets, 8 cores per socket, 2 threads per core, Xeon E7-8830 @ 2.13 GHz). I attempted to test both the effects of multi_extend_v21 and the *_flush_after settings. The machine has both HD and SSD, but I used HD for this test. master, logged tables, 4 parallel copies: 1m15.411s, 1m14.248s, 1m15.040s master, logged tables, 1 copy: 0m28.336s, 0m28.040s, 0m29.576s multi_extend_v21, logged tables, 4 parallel copies: 0m46.058s, 0m44.515s, 0m45.688s multi_extend_v21, logged tables, 1 copy: 0m28.440s, 0m28.129s, 0m30.698s master, logged tables, 4 parallel copies, {backend,bgwriter}_flush_after=0: 1m2.817s, 1m4.467s, 1m12.319s multi_extend_v21, logged tables, 4 parallel copies, {backend,bgwriter}_flush_after=0: 0m41.301s, 0m41.104s, 0m41.342s master, logged tables, 1 copy, {backend,bgwriter}_flush_after=0: 0m26.948s, 0m26.829s, 0m26.616s So the flushing is a small win with only 1 parallel copy, but with 4 parallel copies it's a significant loss. However, the relation extension patch reduces the regression significantly, probably because it makes it far more likely that a backend doing a flush is flushing a consecutive range of blocks all of which it added to the relation, so that there is no interleaving. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
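For what it's worth, the size of the regression in the numbers above can be computed from the medians of the reported timings. A throwaway helper (names assumed, timing strings copied from the message):

```python
# Convert the "1m15.411s"-style timings reported above to seconds and
# compare medians across configurations. Throwaway helper for the
# arithmetic; the timing strings are copied from the message above.
import re
from statistics import median

def to_seconds(t):
    m = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s", t)
    return int(m.group(1) or 0) * 60 + float(m.group(2))

# master, logged tables, 4 parallel copies, with and without flushing:
with_flush = [to_seconds(t) for t in ["1m15.411s", "1m14.248s", "1m15.040s"]]
no_flush = [to_seconds(t) for t in ["1m2.817s", "1m4.467s", "1m12.319s"]]

# On this machine, flushing costs roughly 16% with 4 parallel copies.
slowdown = median(with_flush) / median(no_flush)
print(round(slowdown, 2))  # 1.16
```

The same arithmetic applied to the multi_extend_v21 rows (roughly 45s vs 41s) shows the extension patch shrinking that gap considerably, which matches the interleaving explanation given above.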
Re: [HACKERS] large regression for parallel COPY
On 03/30/2016 01:10 PM, Andres Freund wrote: On 2016-03-30 15:50:21 -0400, Robert Haas wrote: On Thu, Mar 10, 2016 at 8:29 PM, Andres Freundwrote: Allow to trigger kernel writeback after a configurable number of writes. While testing out Dilip Kumar's relation extension patch today, I discovered (with some help from Andres) that this causes nasty regressions when doing parallel COPY on hydra (3.2.6-3.fc16.ppc64, lousy disk subsystem). Unless Fedora/Redhat fixed the 3.2 kernel in a subsequent patch (.6-3?) then I would look hard right at that. The kernel from 3.2 - 3.8 is going to be miserable for anything that is doing concurrent writes. I understand that this is a regression regardless but I think we need wider testing to see if the changes are somehow related. Sincerely, jD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] large regression for parallel COPY
On 2016-03-30 15:50:21 -0400, Robert Haas wrote: > On Thu, Mar 10, 2016 at 8:29 PM, Andres Freundwrote: > > Allow to trigger kernel writeback after a configurable number of writes. > > While testing out Dilip Kumar's relation extension patch today, I > discovered (with some help from Andres) that this causes nasty > regressions when doing parallel COPY on hydra (3.2.6-3.fc16.ppc64, > lousy disk subsystem). What I did was (1) run pgbench -i -s 100, (2) > copy the results to a file, (3) truncate and drop the indexes on the > original table, and (4) try copying in one or more copies of the data > from the file. Typical command line: > > time pgbench -n -f f -t 1 -c 4 -j 4 && psql -c "select > pg_size_pretty(pg_relation_size('pgbench_accounts'));" && time psql -c > checkpoint && psql -c "truncate pgbench_accounts; checkpoint;" > > With default settings against > 96f8373cad5d6066baeb7a1c5a88f6f5c9661974, pgbench takes 9 to 9.5 > minutes and the subsequent checkpoint takes 9 seconds. After setting > , it takes 1 minute and 11 seconds and the subsequent checkpoint takes > 11 seconds. With a single copy of the data (that is, -c 1 -j 1 but > otherwise as above), it takes 28-29 seconds with default settings and > 26-27 seconds with backend_flush_after=0, bgwriter_flush_after=0. So > the difference is rather small with a straight-up COPY, but with 4 > copies running at the same time, it's near enough to an order of > magnitude. > > Andres reports that on his machine, non-zero *_flush_after settings > make things faster, not slower, so apparently this is > hardware-dependent or kernel-dependent. Nevertheless, it seems to me > that we should try to get some broader testing here to see which > experience is typical. Indeed. On SSDs I see about a 25-35% gain, on HDDs about 5%. If I increase the size of backend_flush_after to 64 (like it's for bgwriter) I however do get about 15% for HDDs as well. 
I wonder if the default value of backend_flush_after is too small for some scenarios. I've reasoned that backend_flush_after should have a *lower* default value than e.g. checkpointer or bgwriter, because there's many concurrent writers increasing the total amount of unflushed dirty writes. Which is true for OLTP write workloads; but less so for bulk load. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
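Andres's reasoning reduces to simple arithmetic: the total amount of unflushed dirty data scales with the number of concurrent writers times the per-writer threshold. A sketch of that model follows; the 64-page bgwriter figure comes from the messages above, while the 16-page backend figure and the writer counts are assumptions chosen for illustration.

```python
# Rough model of total unflushed dirty data before writeback is triggered:
# each writer can accumulate flush_after pages of 8 kB before flushing.
# The 16-page backend setting and writer counts below are assumed values
# for illustration, not documented defaults.
BLCKSZ = 8192  # PostgreSQL's default block size

def unflushed_bytes(n_writers, flush_after_pages):
    return n_writers * flush_after_pages * BLCKSZ

# One bgwriter at 64 pages vs. many OLTP backends at a smaller setting:
print(unflushed_bytes(1, 64) // 1024)    # bgwriter alone: 512 kB
print(unflushed_bytes(100, 16) // 1024)  # 100 backends at 16 pages: 12800 kB
# With only a few bulk-load writers, a larger backend_flush_after (e.g. 64)
# keeps writes more sequential without much total dirty data:
print(unflushed_bytes(4, 64) // 1024)    # 4 parallel COPYs: 2048 kB
```

This is why the lower per-backend default makes sense for OLTP (many writers) but penalizes bulk load (few writers), as observed in the HDD numbers upthread.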
Re: [HACKERS] Improving regression tests to check LOCK TABLE and table permissions
On 05/11/2015 07:52 PM, Michael Paquier wrote: Hi all, As mentioned in this thread, it would be good to have regression tests to test the interactions with permissions and LOCK TABLE: http://www.postgresql.org/message-id/20150511195335.ge30...@tamriel.snowman.net Attached is a patch achieving that. Note that it does some checks on the modes SHARE ACCESS, ROW EXCLUSIVE and ACCESS EXCLUSIVE to check all the code paths of LockTableAclCheck@lockcmds.c. I'll add an entry in the next CF to keep track of it. Regards, Committed and pushed to master and 9.5 Joe -- Joe Conway -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improving regression tests to check LOCK TABLE and table permissions
On Wed, Jul 8, 2015 at 6:39 AM, Joe Conway wrote: Hash: SHA1 Committed and pushed to master and 9.5 Thanks, Joe! -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] brin regression test intermittent failures
Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Evidently there is a problem right there. If I simply add an order by tenthous as proposed by Peter, many more errors appear; and what errors appear differs if I change shared_buffers. I think the real fix for this is to change the hand-picked values used in the brinopers table, so that they all pass the test using some reasonable ORDER BY specification in the populating query (probably tenk1.unique1). I may be confused, but why would the physical ordering of the table entries make a difference to the correct answers for this test? (I can certainly see why that might break the brin code, but not why it should change the seqscan's answers.) We create the brintest using a scan of tenk1 LIMIT 100, without specifying the order. So whether we find rows that match each test query is pure chance. Also, what I'd just noticed is that all of the cases that are failing are ones where the expected number of matching rows is exactly 1. I am wondering if the test is sometimes just missing random rows, and we're not seeing any reported problem unless that makes it go down to no rows. (But I do not know how that could simultaneously affect the seqscan case ...) Yeah, we compare the ctid sets of the results, and we assume that a seqscan would get that correctly. I think it would be a good idea to extend the brinopers table to include the number of expected matches, and to complain if that's not what we got, rather than simply checking for zero. That sounds reasonable. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] brin regression test intermittent failures
I wrote: I think it would be a good idea to extend the brinopers table to include the number of expected matches, and to complain if that's not what we got, rather than simply checking for zero. Also, further experimentation shows that there are about 30 entries in the brinopers table that give rise to seqscan plans even when we're commanding a bitmap scan, presumably because those operators aren't brin-indexable. They're not the problematic cases, but things like ((charcol)::text 'A'::text) Is there a reason to have such things in the table, or is this just a thinko? Or is it actually a bug that we're getting such plans? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] brin regression test intermittent failures
Tom Lane wrote: I wrote: I think it would be a good idea to extend the brinopers table to include the number of expected matches, and to complain if that's not what we got, rather than simply checking for zero. Also, further experimentation shows that there are about 30 entries in the brinopers table that give rise to seqscan plans even when we're commanding a bitmap scan, presumably because those operators aren't brin-indexable. They're not the problematic cases, but things like ((charcol)::text 'A'::text) Is there a reason to have such things in the table, or is this just a thinko? Or is it actually a bug that we're getting such plans? No, I left those there knowing that there are no plans involving brin -- in a way, they provide some future proofing if some of those operators are made indexable later. I couldn't think of a way to test that the plans are actually using the brin index or not, but if we can do that in some way, that would be good. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] brin regression test intermittent failures
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: Also, further experimentation shows that there are about 30 entries in the brinopers table that give rise to seqscan plans even when we're commanding a bitmap scan, presumably because those operators aren't brin-indexable. They're not the problematic cases, but things like ((charcol)::text 'A'::text) Is there a reason to have such things in the table, or is this just a thinko? Or is it actually a bug that we're getting such plans? No, I left those there knowing that there are no plans involving brin -- in a way, they provide some future proofing if some of those operators are made indexable later. On closer investigation, I think the ones involving charcol are a flat out bug in the test, namely failure to quote "char". Observe: regression=# explain select ctid from brintest where charcol = 'A'::char; QUERY PLAN -- Seq Scan on brintest (cost=0.00..101.88 rows=1 width=6) Filter: ((charcol)::text = 'A'::text) (2 rows) regression=# explain select ctid from brintest where charcol = 'A'::"char"; QUERY PLAN --- Bitmap Heap Scan on brintest (cost=48.02..58.50 rows=3 width=6) Recheck Cond: (charcol = 'A'::"char") -> Bitmap Index Scan on brinidx (cost=0.00..48.02 rows=3 width=0) Index Cond: (charcol = 'A'::"char") (4 rows) Presumably we'd like to test the latter case not the former. The other cases that I found involve cidrcol, and seem to represent an actual bug in the brin planning logic, ie failure to disregard a no-op cast. I'll look closer. I couldn't think of a way to test that the plans are actually using the brin index or not, but if we can do that in some way, that would be good. Yeah, we can do that --- the way I found out there's a problem is to modify the test script to check the output of EXPLAIN. 
So at this point it looks like (1) chipmunk's issue might be explained by lack of forced ORDER BY; (2) the test script could be improved to test more carefully, and it has got an issue with char vs char; (3) there might be a planner bug. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] brin regression test intermittent failures
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: I may be confused, but why would the physical ordering of the table entries make a difference to the correct answers for this test? (I can certainly see why that might break the brin code, but not why it should change the seqscan's answers.) We create the brintest using a scan of tenk1 LIMIT 100, without specifying the order. So whether we find rows that match each test query is pure chance. Oooh ... normally that would not matter, but I wonder if what's happening on chipmunk is that the synchronized-seqscan logic kicks in and causes us to read some other part of tenk1 than we normally would, as a consequence of some concurrent activity or other. The connection to smaller than normal shared_buffers would be that it would change our idea of what's a large enough table to justify using synchronized seqscans. Peter's patch failed to hit the place where this matters, btw. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
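For readers following along: the synchronized-seqscan heuristic is, roughly, that a table larger than a quarter of shared_buffers qualifies for scan synchronization (NBuffers / 4, checked in initscan() in heapam.c). A sketch of why a 10MB shared_buffers animal could behave differently from the default; the tenk1 size used here is a rough assumed figure, not measured.

```python
# The synchronized-seqscan heuristic, roughly: a seqscan participates in
# scan synchronization when the table exceeds a quarter of shared_buffers
# (NBuffers / 4). The tenk1 block count below is an assumed rough figure.
BLCKSZ = 8192

def uses_syncscan(table_blocks, shared_buffers_mb):
    nbuffers = shared_buffers_mb * 1024 * 1024 // BLCKSZ
    return table_blocks > nbuffers // 4

tenk1_blocks = 345  # assumed size of tenk1 in the regression database
print(uses_syncscan(tenk1_blocks, 128))  # default 128MB shared_buffers: False
print(uses_syncscan(tenk1_blocks, 10))   # chipmunk's 10MB setting: True
```

Under a synchronized scan the LIMIT 100 populating query can start from an arbitrary point in tenk1, which would explain why the failures correlate with the small shared_buffers setting.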
Re: [HACKERS] brin regression test intermittent failures
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: Fixed, see 79f2b5d583e2e2a7; but AFAICS this has no real-world impact so it does not explain whatever is happening on chipmunk. Ah, thanks for diagnosing that. The chipmunk failure is strange -- notice it only references the = operators, except for type box for which it's ~= that fails. The test includes a lot of operators ... Actually not --- if you browse through the last half dozen failures on chipmunk you will notice that (1) the set of operators complained of varies a bit from one failure to the next; (2) more often than not, this is one of the failures: WARNING: no results for (boxcol,@,box,((1,2),(300,400))) Certainly the majority of the complaints are about equality operators, but not quite all of them. Also, we have quite a number of ARM boxes: apart from chipmunk we have gull, hamster, mereswine, dangomushi, axolotl, grison. (hamster and chipmunk report hostname -m as armv6l, the others armv7l). All of them are running Linux, either Fedora or Debian. Most are using gcc, compilation flags look pretty standard. I have no idea what might be different about chipmunk compared to any other ARM buildfarm critter ... Heikki, any thoughts on that? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] brin regression test intermittent failures
Tom Lane wrote: Actually not --- if you browse through the last half dozen failures on chipmunk you will notice that (1) the set of operators complained of varies a bit from one failure to the next; (2) more often than not, this is one of the failures: WARNING: no results for (boxcol,@,box,((1,2),(300,400))) Certainly the majority of the complaints are about equality operators, but not quite all of them. Hm. Well, what this message says is that we ran that query using both BRIN and seqscan, and that in both cases no row was returned. Note that if the BRIN and seqscan cases had returned different sets of rows, the error message would have been different. So this might be related to the way the test table is created, rather than to a bug in BRIN. Peter G. recently pointed out that this seems to be relying on an index-only scan on table tenk1 and suggested an ORDER BY. Maybe that assumption is being violated on chipmunk and so the table populated is different than what the table actually expects. I just noticed that chipmunk has shared_buffers=10MB on its buildfarm config. I don't see that in any of the other ARM animals. Maybe that can change the plan choice. I will test locally with reduced shared_buffers and see if I can reproduce the results. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] brin regression test intermittent failures
Alvaro Herrera wrote: Hm. Well, what this message says is that we ran that query using both BRIN and seqscan, and that in both cases no row was returned. Note that if the BRIN and seqscan cases had returned different sets of rows, the error message would have been different. So this might be related to the way the test table is created, rather than to a bug in BRIN. Peter G. recently pointed out that this seems to be relying on an index-only scan on table tenk1 and suggested an ORDER BY. Maybe that assumption is being violated on chipmunk and so the table populated is different than what the table actually expects. Evidently there is a problem right there. If I simply add an order by tenthous as proposed by Peter, many more errors appear; and what errors appear differs if I change shared_buffers. I think the real fix for this is to change the hand-picked values used in the brinopers table, so that they all pass the test using some reasonable ORDER BY specification in the populating query (probably tenk1.unique1). -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] brin regression test intermittent failures
Alvaro Herrera alvhe...@2ndquadrant.com writes:
> Evidently there is a problem right there. If I simply add an order by tenthous as proposed by Peter, many more errors appear; and what errors appear differs if I change shared_buffers. I think the real fix for this is to change the hand-picked values used in the brinopers table, so that they all pass the test using some reasonable ORDER BY specification in the populating query (probably tenk1.unique1).

I may be confused, but why would the physical ordering of the table entries make a difference to the correct answers for this test? (I can certainly see why that might break the brin code, but not why it should change the seqscan's answers.)

Also, what I'd just noticed is that all of the cases that are failing are ones where the expected number of matching rows is exactly 1. I am wondering if the test is sometimes just missing random rows, and we're not seeing any reported problem unless that makes it go down to no rows. (But I do not know how that could simultaneously affect the seqscan case ...)

I think it would be a good idea to extend the brinopers table to include the number of expected matches, and to complain if that's not what we got, rather than simply checking for zero.

regards, tom lane
Re: [HACKERS] brin regression test intermittent failures
I wrote:
> The other cases that I found involve cidrcol, and seem to represent an actual bug in the brin planning logic, ie failure to disregard a no-op cast. I'll look closer.

I leapt to the wrong conclusion on that one. The reason for failure to match to an index column had nothing to do with the extra cast, and everything to do with the fact that there was no such index column. I think we're probably good now, though it'd be wise to keep an eye on chipmunk for awhile to verify that it doesn't see the problem anymore.

regards, tom lane
Re: [HACKERS] brin regression test intermittent failures
Tom Lane wrote:
> I wrote:
> > The other cases that I found involve cidrcol, and seem to represent an actual bug in the brin planning logic, ie failure to disregard a no-op cast. I'll look closer.
> I leapt to the wrong conclusion on that one. The reason for failure to match to an index column had nothing to do with the extra cast, and everything to do with the fact that there was no such index column.

Oops! Thanks for reviewing this.

> I think we're probably good now, though it'd be wise to keep an eye on chipmunk for awhile to verify that it doesn't see the problem anymore.

Will do.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] hstore_plpython regression test does not work on Python 3
Peter Eisentraut pete...@gmx.net writes:
> On 5/26/15 5:19 PM, Oskari Saarenmaa wrote:
> > Looks like that animal uses Python 3.4. Python 3.3 and newer versions default to using a random seed for hashing objects into dicts which makes the order of dict elements random; see https://docs.python.org/3/using/cmdline.html#cmdoption-R
> Ah, good catch. That explains the, well, randomness. I can reproduce the test failures with PYTHONHASHSEED=2. But I haven't been successful getting that environment variable set so that it works in the installcheck case.

Yeah, there's pretty much no chance of controlling the postmaster's environment in installcheck cases.

> Instead, I have rewritten the tests to use asserts instead of textual comparisons. See attached patch. Comments?

If that works back to Python 2.3 or whatever is the oldest we support, sounds good to me.

regards, tom lane
Re: [HACKERS] hstore_plpython regression test does not work on Python 3
On 5/26/15 5:19 PM, Oskari Saarenmaa wrote: [1] http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=jaguarundibr=HEAD Looks like that animal uses Python 3.4. Python 3.3 and newer versions default to using a random seed for hashing objects into dicts which makes the order of dict elements random; see https://docs.python.org/3/using/cmdline.html#cmdoption-R Ah, good catch. That explains the, well, randomness. I can reproduce the test failures with PYTHONHASHSEED=2. But I haven't been successful getting that environment variable set so that it works in the installcheck case. Instead, I have rewritten the tests to use asserts instead of textual comparisons. See attached patch. Comments? diff --git a/contrib/hstore_plpython/expected/hstore_plpython.out b/contrib/hstore_plpython/expected/hstore_plpython.out index 6252836..b7a6a92 100644 --- a/contrib/hstore_plpython/expected/hstore_plpython.out +++ b/contrib/hstore_plpython/expected/hstore_plpython.out @@ -43,12 +43,10 @@ CREATE FUNCTION test1arr(val hstore[]) RETURNS int LANGUAGE plpythonu TRANSFORM FOR TYPE hstore AS $$ -plpy.info(repr(val)) +assert(val == [{'aa': 'bb', 'cc': None}, {'dd': 'ee'}]) return len(val) $$; SELECT test1arr(array['aa=bb, cc=NULL'::hstore, 'dd=ee']); -INFO: [{'aa': 'bb', 'cc': None}, {'dd': 'ee'}] -CONTEXT: PL/Python function test1arr test1arr -- 2 @@ -88,18 +86,14 @@ LANGUAGE plpythonu TRANSFORM FOR TYPE hstore AS $$ rv = plpy.execute(SELECT 'aa=bb, cc=NULL'::hstore AS col1) -plpy.info(repr(rv[0][col1])) +assert(rv[0][col1] == {'aa': 'bb', 'cc': None}) val = {'a': 1, 'b': 'boo', 'c': None} plan = plpy.prepare(SELECT $1::text AS col1, [hstore]) rv = plpy.execute(plan, [val]) -plpy.info(repr(rv[0][col1])) +assert(rv[0][col1] == 'a=1, b=boo, c=NULL') $$; SELECT test3(); -INFO: {'aa': 'bb', 'cc': None} -CONTEXT: PL/Python function test3 -INFO: 'a=1, b=boo, c=NULL' -CONTEXT: PL/Python function test3 test3 --- @@ -118,7 +112,7 @@ CREATE FUNCTION test4() RETURNS trigger LANGUAGE plpythonu TRANSFORM FOR 
TYPE hstore AS $$ -plpy.info(Trigger row: {'a': %r, 'b': %r} % (TD[new][a], TD[new][b])) +assert(TD[new] == {'a': 1, 'b': {'aa': 'bb', 'cc': None}}) if TD[new][a] == 1: TD[new][b] = {'a': 1, 'b': 'boo', 'c': None} @@ -126,8 +120,6 @@ return MODIFY $$; CREATE TRIGGER test4 BEFORE UPDATE ON test1 FOR EACH ROW EXECUTE PROCEDURE test4(); UPDATE test1 SET a = a; -INFO: Trigger row: {'a': 1, 'b': {'aa': 'bb', 'cc': None}} -CONTEXT: PL/Python function test4 SELECT * FROM test1; a |b ---+- diff --git a/contrib/hstore_plpython/sql/hstore_plpython.sql b/contrib/hstore_plpython/sql/hstore_plpython.sql index 872d8c6..9ff2ebc 100644 --- a/contrib/hstore_plpython/sql/hstore_plpython.sql +++ b/contrib/hstore_plpython/sql/hstore_plpython.sql @@ -37,7 +37,7 @@ CREATE FUNCTION test1arr(val hstore[]) RETURNS int LANGUAGE plpythonu TRANSFORM FOR TYPE hstore AS $$ -plpy.info(repr(val)) +assert(val == [{'aa': 'bb', 'cc': None}, {'dd': 'ee'}]) return len(val) $$; @@ -74,12 +74,12 @@ CREATE FUNCTION test3() RETURNS void TRANSFORM FOR TYPE hstore AS $$ rv = plpy.execute(SELECT 'aa=bb, cc=NULL'::hstore AS col1) -plpy.info(repr(rv[0][col1])) +assert(rv[0][col1] == {'aa': 'bb', 'cc': None}) val = {'a': 1, 'b': 'boo', 'c': None} plan = plpy.prepare(SELECT $1::text AS col1, [hstore]) rv = plpy.execute(plan, [val]) -plpy.info(repr(rv[0][col1])) +assert(rv[0][col1] == 'a=1, b=boo, c=NULL') $$; SELECT test3(); @@ -94,7 +94,7 @@ CREATE FUNCTION test4() RETURNS trigger LANGUAGE plpythonu TRANSFORM FOR TYPE hstore AS $$ -plpy.info(Trigger row: {'a': %r, 'b': %r} % (TD[new][a], TD[new][b])) +assert(TD[new] == {'a': 1, 'b': {'aa': 'bb', 'cc': None}}) if TD[new][a] == 1: TD[new][b] = {'a': 1, 'b': 'boo', 'c': None} -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hstore_plpython regression test does not work on Python 3
29.05.2015, 03:12, Peter Eisentraut wrote:
> On 5/26/15 5:19 PM, Oskari Saarenmaa wrote:
> > [1] http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=jaguarundibr=HEAD Looks like that animal uses Python 3.4. Python 3.3 and newer versions default to using a random seed for hashing objects into dicts which makes the order of dict elements random; see https://docs.python.org/3/using/cmdline.html#cmdoption-R
> Ah, good catch. That explains the, well, randomness. I can reproduce the test failures with PYTHONHASHSEED=2. But I haven't been successful getting that environment variable set so that it works in the installcheck case. Instead, I have rewritten the tests to use asserts instead of textual comparisons. See attached patch. Comments?

Looks good to me.

/ Oskari
Re: [HACKERS] brin regression test intermittent failures
Tom Lane wrote:
> Peter Geoghegan p...@heroku.com writes:
> > I meant to get around to looking into it, but FWIW I see BRIN-related Valgrind issues. e.g.:
> Fixed, see 79f2b5d583e2e2a7; but AFAICS this has no real-world impact so it does not explain whatever is happening on chipmunk.

Ah, thanks for diagnosing that.

The chipmunk failure is strange -- notice it only references the = operators, except for type box for which it's ~= that fails. The test includes a lot of operators ...

Also, we have quite a number of ARM boxes: apart from chipmunk we have gull, hamster, mereswine, dangomushi, axolotl, grison. (hamster and chipmunk report hostname -m as armv6l, the others armv7l). All of them are running Linux, either Fedora or Debian. Most are using gcc, compilation flags look pretty standard.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] hstore_plpython regression test does not work on Python 3
22.05.2015, 09:44, Christian Ullrich kirjoitti: * Peter Eisentraut wrote: On 5/16/15 12:06 PM, Tom Lane wrote: As exhibited for instance here: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spoonbilldt=2015-05-16%2011%3A00%3A07 I've been able to replicate this on a Fedora 21 box: works fine with Python 2, fails with Python 3. Seems like we still have an issue with reliance on a system-provided sort method. Pushed a fix, tested with 2.3 .. 3.4. There is still a sorting problem (of sorts). jaguarundi [1] keeps failing intermittently like this: *** 47,53 return len(val) $$; SELECT test1arr(array['aa=bb, cc=NULL'::hstore, 'dd=ee']); ! INFO: [{'aa': 'bb', 'cc': None}, {'dd': 'ee'}] CONTEXT: PL/Python function test1arr test1arr -- --- 47,53 return len(val) $$; SELECT test1arr(array['aa=bb, cc=NULL'::hstore, 'dd=ee']); ! INFO: [{'cc': None, 'aa': 'bb'}, {'dd': 'ee'}] CONTEXT: PL/Python function test1arr test1arr -- I cannot find any other animal that does the same, but I doubt it's due to CCA this time. Should dict tests perhaps output sorted(thedict.items()) instead? Testing dict formatting could be done with single-item dicts. [1] http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=jaguarundibr=HEAD Looks like that animal uses Python 3.4. Python 3.3 and newer versions default to using a random seed for hashing objects into dicts which makes the order of dict elements random; see https://docs.python.org/3/using/cmdline.html#cmdoption-R The test case could be changed to use sorted(dict.items()) always, but there are multiple places where it would need to be applied. Setting the PYTHONHASHSEED environment variable to a stable value would probably be easier. / Oskari -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
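The Python behavior behind both proposed fixes is easy to demonstrate in isolation. This is a standalone sketch, not part of the proposed patch:

```python
# Dict equality compares keys and values, not ordering, so an
# assert-based check is immune to hash randomization; the textual
# repr that plpy.info printed is not, because dict iteration order
# can vary from run to run under a randomized hash seed.
a = {"aa": "bb", "cc": None}
b = {"cc": None, "aa": "bb"}

assert a == b  # order-independent: passes under any PYTHONHASHSEED

# If a stable textual form were wanted instead, sorting the items
# pins down the order regardless of the hash seed:
stable = sorted(a.items())
assert stable == [("aa", "bb"), ("cc", None)]
```

Setting PYTHONHASHSEED to a fixed value would also stabilize the printed output, but as noted downthread there is no good way to inject that into the postmaster's environment for the installcheck case, which is why rewriting the tests around asserts is the more robust fix.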
Re: [HACKERS] brin regression test intermittent failures
Andrew Dunstan and...@dunslane.net writes:
> There's something odd about the brin regression tests. They seem to generate intermittent failures, which suggests some sort of race condition or ordering failure. See for example http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=fulmardt=2015-05-15%2001%3A02%3A28 and http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=sittelladt=2015-05-15%2021%3A08%3A38

I found the cause of this symptom today. Alvaro said he'd added the autovacuum_enabled=off option to the brintest table to prevent autovac from screwing up this expected result ... but that only stops autovacuum from summarizing the table. Guess what is in the concurrently-executed gist.sql test, at line 40.

While we could and perhaps should change that command to a more narrowly targeted "vacuum analyze gist_tbl;", this will not prevent someone from reintroducing an untargeted vacuum command in one of the concurrent tests later. I think a future-proof fix would require either making brintest a temp table (losing all test coverage of WAL logging :-() or changing the test so that it does not expect a specific result from brin_summarize_new_values.

Or, maybe better, let's lose the brin_summarize_new_values call altogether. What does it test that wouldn't be better done by explicitly running "vacuum brintest;"?

Also worth noting is that there's a completely different failure symptom that's shown up a few times, eg here: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunkdt=2015-05-25%2009%3A56%3A55

This makes it look like brintest sometimes contains no rows at all, which is difficult to explain ...

regards, tom lane
Re: [HACKERS] brin regression test intermittent failures
Peter Geoghegan p...@heroku.com writes:
> I meant to get around to looking into it, but FWIW I see BRIN-related Valgrind issues. e.g.:

Fixed, see 79f2b5d583e2e2a7; but AFAICS this has no real-world impact so it does not explain whatever is happening on chipmunk.

regards, tom lane
Re: [HACKERS] brin regression test intermittent failures
I wrote:
> Peter Geoghegan p...@heroku.com writes:
> > I meant to get around to looking into it, but FWIW I see BRIN-related Valgrind issues. e.g.:
> Fixed, see 79f2b5d583e2e2a7; but AFAICS this has no real-world impact so it does not explain whatever is happening on chipmunk.

BTW, after some further trawling in the buildfarm logs, it seems that that "no results for" failure mode has been seen *only* on chipmunk, where it's happened in roughly half the runs since that particular test case went in. So it's definitely platform-specific. It's less clear whether the test case is bogus, or it's exposing a bug added elsewhere in db5f98ab4fa44bc5, or the bug was pre-existing but not exposed by any older test case.

regards, tom lane
Re: [HACKERS] brin regression test intermittent failures
On Mon, May 25, 2015 at 3:25 PM, Tom Lane t...@sss.pgh.pa.us wrote: Also worth noting is that there's a completely different failure symptom that's shown up a few times, eg here: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunkdt=2015-05-25%2009%3A56%3A55 This makes it look like brintest sometimes contains no rows at all, which is difficult to explain ... I meant to get around to looking into it, but FWIW I see BRIN-related Valgrind issues. e.g.: 2015-05-20 23:44:42.419 PDT 14787 LOG: statement: CREATE INDEX brinidx ON brintest USING brin ( byteacol, charcol, namecol, int8col, int2col, int4col, textcol, oidcol, tidcol, float4col, float8col, macaddrcol, inetcol inet_inclusion_ops, inetcol inet_minmax_ops, bpcharcol, datecol, timecol, timestampcol, timestamptzcol, intervalcol, timetzcol, bitcol, varbitcol, numericcol, uuidcol, int4rangecol, lsncol, boxcol ) with (pages_per_range = 1); ==14787== Unaddressable byte(s) found during client check request ==14787==at 0x7E19AD: PageAddItem (bufpage.c:314) ==14787==by 0x4693AD: brin_doinsert (brin_pageops.c:315) ==14787==by 0x46844C: form_and_insert_tuple (brin.c:1122) ==14787==by 0x4672AD: brinbuildCallback (brin.c:540) ==14787==by 0x544D35: IndexBuildHeapRangeScan (index.c:2549) ==14787==by 0x54426A: IndexBuildHeapScan (index.c:2162) ==14787==by 0x4676EA: brinbuild (brin.c:638) ==14787==by 0x944D19: OidFunctionCall3Coll (fmgr.c:1649) ==14787==by 0x543F75: index_build (index.c:2025) ==14787==by 0x542C4D: index_create (index.c:1100) ==14787==by 0x614E52: DefineIndex (indexcmds.c:605) ==14787==by 0x7F29CC: ProcessUtilitySlow (utility.c:1258) ==14787== Address 0x6932a3a is 1,738 bytes inside a block of size 8,192 alloc'd ==14787==at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==14787==by 0x966E86: AllocSetAlloc (aset.c:847) ==14787==by 0x969B19: palloc (mcxt.c:825) ==14787==by 0x839203: datumCopy (datum.c:171) ==14787==by 0x46D6B3: brin_minmax_add_value (brin_minmax.c:105) 
==14787==by 0x944116: FunctionCall4Coll (fmgr.c:1375) ==14787==by 0x4673AF: brinbuildCallback (brin.c:562) ==14787==by 0x544D35: IndexBuildHeapRangeScan (index.c:2549) ==14787==by 0x54426A: IndexBuildHeapScan (index.c:2162) ==14787==by 0x4676EA: brinbuild (brin.c:638) ==14787==by 0x944D19: OidFunctionCall3Coll (fmgr.c:1649) ==14787==by 0x543F75: index_build (index.c:2025) ==14787== { insert_a_suppression_name_here Memcheck:User fun:PageAddItem fun:brin_doinsert fun:form_and_insert_tuple fun:brinbuildCallback fun:IndexBuildHeapRangeScan fun:IndexBuildHeapScan fun:brinbuild fun:OidFunctionCall3Coll fun:index_build fun:index_create fun:DefineIndex fun:ProcessUtilitySlow } ==14787== Invalid read of size 8 ==14787==at 0x4C2F79E: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==14787==by 0x7E19DB: PageAddItem (bufpage.c:317) ==14787==by 0x4693AD: brin_doinsert (brin_pageops.c:315) ==14787==by 0x46844C: form_and_insert_tuple (brin.c:1122) ==14787==by 0x4672AD: brinbuildCallback (brin.c:540) ==14787==by 0x544D35: IndexBuildHeapRangeScan (index.c:2549) ==14787==by 0x54426A: IndexBuildHeapScan (index.c:2162) ==14787==by 0x4676EA: brinbuild (brin.c:638) ==14787==by 0x944D19: OidFunctionCall3Coll (fmgr.c:1649) ==14787==by 0x543F75: index_build (index.c:2025) ==14787==by 0x542C4D: index_create (index.c:1100) ==14787==by 0x614E52: DefineIndex (indexcmds.c:605) ==14787== Address 0x6932a38 is 728 bytes inside a block of size 730 client-defined ==14787==at 0x969DC2: palloc0 (mcxt.c:864) ==14787==by 0x46B9FE: brin_form_tuple (brin_tuple.c:166) ==14787==by 0x46840B: form_and_insert_tuple (brin.c:1120) ==14787==by 0x4672AD: brinbuildCallback (brin.c:540) ==14787==by 0x544D35: IndexBuildHeapRangeScan (index.c:2549) ==14787==by 0x54426A: IndexBuildHeapScan (index.c:2162) ==14787==by 0x4676EA: brinbuild (brin.c:638) ==14787==by 0x944D19: OidFunctionCall3Coll (fmgr.c:1649) ==14787==by 0x543F75: index_build (index.c:2025) ==14787==by 0x542C4D: index_create 
(index.c:1100) ==14787==by 0x614E52: DefineIndex (indexcmds.c:605) ==14787==by 0x7F29CC: ProcessUtilitySlow (utility.c:1258) ==14787== { insert_a_suppression_name_here Memcheck:Addr8 fun:memcpy@@GLIBC_2.14 fun:PageAddItem fun:brin_doinsert fun:form_and_insert_tuple fun:brinbuildCallback fun:IndexBuildHeapRangeScan fun:IndexBuildHeapScan fun:brinbuild fun:OidFunctionCall3Coll fun:index_build fun:index_create fun:DefineIndex } -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hstore_plpython regression test does not work on Python 3
On 5/16/15 12:06 PM, Tom Lane wrote:
> As exhibited for instance here: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spoonbilldt=2015-05-16%2011%3A00%3A07 I've been able to replicate this on a Fedora 21 box: works fine with Python 2, fails with Python 3. Seems like we still have an issue with reliance on a system-provided sort method.

Pushed a fix, tested with 2.3 .. 3.4.
Re: [HACKERS] brin regression test intermittent failures
Andrew Dunstan wrote:
> There's something odd about the brin regression tests. They seem to generate intermittent failures, which suggests some sort of race condition or ordering failure. See for example http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=fulmardt=2015-05-15%2001%3A02%3A28 and http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=sittelladt=2015-05-15%2021%3A08%3A38

Yeah it's pretty odd. I guess the way to figure out what is going on is to get the test to print out the index contents in case of failure. I guess I could do something with \gset. (The way to print out the index is to use the pageinspect functions. One problem is that at the time the brin test is run we don't have pageinspect.)

Of course, if I could reproduce the issue locally, this would be a lot easier.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] brin regression test intermittent failures
Alvaro Herrera alvhe...@2ndquadrant.com writes:
> Tom Lane wrote:
> > Just from reading the documentation, couldn't the symptom we're seeing arise from autovacuum having hit the table right before brin_summarize_new_values got called?
> Well, I added an autovacuum_enabled=off to that table recently precisely because that was my hypothesis. It didn't work though, so it must be something else.

Ah. Not having noticed that, I'd locally added a pg_sleep(60) right before the brin_summarize_new_values call, and failed to reproduce any problem. So it's not AV doing something, but it sure smells like something close to that.

Is there a good reason why we need to exercise brin_summarize_new_values as such here, rather than just doing a manual VACUUM on the table? And if there is, do we really need to verify its result value? I mean, even without whatever sort of race condition we're talking about, that expected result of 5 looks pretty darn phase-of-the-moon-dependent to me.

regards, tom lane
Re: [HACKERS] brin regression test intermittent failures
Alvaro Herrera alvhe...@2ndquadrant.com writes:
> Andrew Dunstan wrote:
> > There's something odd about the brin regression tests. They seem to generate intermittent failures, which suggests some sort of race condition or ordering failure. See for example http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=fulmardt=2015-05-15%2001%3A02%3A28 and http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=sittelladt=2015-05-15%2021%3A08%3A38
> Yeah it's pretty odd.

Oooh. I saw the sittella failure and assumed it was triggered by the latest BRIN additions, but that fulmar failure is from before those hit.

Just from reading the documentation, couldn't the symptom we're seeing arise from autovacuum having hit the table right before brin_summarize_new_values got called?

regards, tom lane
Re: [HACKERS] brin regression test intermittent failures
Tom Lane wrote:
> Alvaro Herrera alvhe...@2ndquadrant.com writes:
> > Andrew Dunstan wrote:
> > > There's something odd about the brin regression tests. They seem to generate intermittent failures, which suggests some sort of race condition or ordering failure. See for example http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=fulmardt=2015-05-15%2001%3A02%3A28 and http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=sittelladt=2015-05-15%2021%3A08%3A38
> > Yeah it's pretty odd.
> Oooh. I saw the sittella failure and assumed it was triggered by the latest BRIN additions, but that fulmar failure is from before those hit. Just from reading the documentation, couldn't the symptom we're seeing arise from autovacuum having hit the table right before brin_summarize_new_values got called?

Well, I added an autovacuum_enabled=off to that table recently precisely because that was my hypothesis. It didn't work though, so it must be something else.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Add regression tests for autocommit-off mode for psql and fix some omissions
On Mon, Oct 6, 2014 at 03:49:37PM +0200, Feike Steenbergen wrote:
> On 6 October 2014 14:09, Michael Paquier michael.paqu...@gmail.com wrote:
> > That's a good catch and it should be a separate patch. This could even be considered for a back-patch down to 9.2. Thoughts?
> If I isolate DROP INDEX concurrently, this patch would do the trick.

Patch applied for 9.5. Thanks.

--
Bruce Momjian br...@momjian.us        http://momjian.us
EnterpriseDB                          http://enterprisedb.com

+ Everyone has their own god. +
Re: [HACKERS] SSL regression test suite
On Thu, Dec 04, 2014 at 02:42:41PM +0200, Heikki Linnakangas wrote:
> On 10/06/2014 04:21 PM, Heikki Linnakangas wrote:
> > This probably needs some further cleanup before it's ready for committing. One issues is that it creates a temporary cluster that listens for TCP connections on localhost, which isn't safe on a multi-user system.
> This issue remains. There isn't much we can do about it; SSL doesn't work over Unix domain sockets. We could make it work, but that's a whole different feature.

A large subset of the test suite could be made secure. Omit or lock down trustdb, and skip affected tests. (Perhaps have an --unsafe-tests option to reactivate them.) Instead of distributing frozen keys, generate all keys on-demand. Ensure that key files have secure file modes from the start.

> Having said that, ... How do people feel about including this test suite in the source tree? It's probably not suitable for running as part of make check-world, but it's extremely handy if you're working on a patch related to SSL. I'd like to commit this, even if it has some rough edges. That way we can improve it later, rather than have it fall into oblivion. Any objections?

+1 for having this suite in the tree, even if check-world ignores it. Echoing Tom's comment, the README should mention its security weakness.

Thanks, nm
Re: [HACKERS] SSL regression test suite
On 10/06/2014 04:21 PM, Heikki Linnakangas wrote: Here's a new version of the SSL regression suite I wrote earlier. It now specifies both host and hostaddr in the connection string as Andres suggested, so it no longer requires changes to network configuration. I added a bunch of tests for the SAN feature that Alexey Klyukin wrote and was committed earlier. Plus a lot of miscellaneous cleanup. And here's another version. It now includes tests for CRLs, and uses a root CA that's used to sign the server and client CA's certificates, to test that using intermediary CAs work. This probably needs some further cleanup before it's ready for committing. One issues is that it creates a temporary cluster that listens for TCP connections on localhost, which isn't safe on a multi-user system. This issue remains. There isn't much we can do about it; SSL doesn't work over Unix domain sockets. We could make it work, but that's a whole different feature. How do people feel about including this test suite in the source tree? It's probably not suitable for running as part of make check-world, but it's extremely handy if you're working on a patch related to SSL. I'd like to commit this, even if it has some rough edges. That way we can improve it later, rather than have it fall into oblivion. Any objections? - Heikki diff --git a/src/test/Makefile b/src/test/Makefile index 9238860..1d6f789 100644 --- a/src/test/Makefile +++ b/src/test/Makefile @@ -12,7 +12,7 @@ subdir = src/test top_builddir = ../.. include $(top_builddir)/src/Makefile.global -SUBDIRS = regress isolation modules +SUBDIRS = regress isolation modules ssl # We want to recurse to all subdirs for all standard targets, except that # installcheck and install should not recurse into the subdirectory modules. 
diff --git a/src/test/ssl/Makefile b/src/test/ssl/Makefile new file mode 100644 index 000..194267b --- /dev/null +++ b/src/test/ssl/Makefile @@ -0,0 +1,126 @@ +#- +# +# Makefile for src/test/ssl +# +# Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group +# Portions Copyright (c) 1994, Regents of the University of California +# +# src/test/ssl/Makefile +# +#- + +subdir = src/test/ssl +top_builddir = ../../.. +include $(top_builddir)/src/Makefile.global + +CERTIFICATES := server_ca server-cn-and-alt-names \ + server-cn-only server-single-alt-name server-multiple-alt-names \ + server-no-names server-revoked server-ss \ + client_ca client client-revoked \ + root_ca + +SSLFILES := $(CERTIFICATES:%=ssl/%.key) $(CERTIFICATES:%=ssl/%.crt) \ + ssl/client.crl ssl/server.crl ssl/root.crl \ + ssl/both-cas-1.crt ssl/both-cas-2.crt \ + ssl/root+server_ca.crt ssl/root+server.crl \ + ssl/root+client_ca.crt ssl/root+client.crl + +sslfiles: $(SSLFILES) + +ssl/new_certs_dir: + mkdir ssl/new_certs_dir + +# Rule for creating private/public key pairs +ssl/%.key: + openssl genrsa -out $@ 1024 + chmod 0600 $@ + +# Rules for creating root CA certificates +ssl/root_ca.crt: ssl/root_ca.key cas.config + touch ssl/root_ca-certindex + openssl req -new -out ssl/root_ca.crt -x509 -config cas.config -config root_ca.config -key ssl/root_ca.key + echo 01 ssl/root_ca.srl + +# for client and server CAs +ssl/%_ca.crt: ssl/%_ca.key %_ca.config ssl/root_ca.crt ssl/new_certs_dir + touch ssl/$*_ca-certindex + openssl req -new -out ssl/temp_ca.crt -config cas.config -config $*_ca.config -key ssl/$*_ca.key +# Sign the certificate with the root CA + openssl ca -name root_ca -batch -config cas.config -in ssl/temp_ca.crt -out ssl/temp_ca_signed.crt + openssl x509 -in ssl/temp_ca_signed.crt -out ssl/$*_ca.crt # to keep just the PEM cert + rm ssl/temp_ca.crt ssl/temp_ca_signed.crt + echo 01 ssl/$*_ca.srl + +# Server certificates, signed by server CA: +ssl/server-%.crt: ssl/server-%.key 
ssl/server_ca.crt server-%.config + openssl req -new -key ssl/server-$*.key -out ssl/server-$*.csr -config server-$*.config + openssl ca -name server_ca -batch -config cas.config -in ssl/server-$*.csr -out ssl/temp.crt -extensions v3_req -extfile server-$*.config + openssl x509 -in ssl/temp.crt -out ssl/server-$*.crt # to keep just the PEM cert + rm ssl/server-$*.csr + +# Self-signed version of server-cn-only.crt +ssl/server-ss.crt: ssl/server-cn-only.key ssl/server-cn-only.crt server-cn-only.config + openssl req -new -key ssl/server-cn-only.key -out ssl/server-ss.csr -config server-cn-only.config + openssl x509 -req -days 1 -in ssl/server-ss.csr -signkey ssl/server-cn-only.key -out ssl/server-ss.crt -extensions v3_req -extfile server-cn-only.config + rm ssl/server-ss.csr + +# Client certificate, signed by the client CA: +ssl/client.crt: ssl/client.key ssl/client_ca.crt + openssl req -new -key ssl/client.key -out ssl/client.csr -config client.config + openssl ca -name client_ca -batch -out ssl/temp.crt -config cas.config -infiles
Re: [HACKERS] SSL regression test suite
On Thu, Dec 04, 2014 at 02:42:41PM +0200, Heikki Linnakangas wrote:
> On 10/06/2014 04:21 PM, Heikki Linnakangas wrote:
>> This probably needs some further cleanup before it's ready for
>> committing. One issue is that it creates a temporary cluster that
>> listens for TCP connections on localhost, which isn't safe on a
>> multi-user system.
>
> This issue remains. There isn't much we can do about it; SSL doesn't
> work over Unix domain sockets. We could make it work, but that's a
> whole different feature.
>
> How do people feel about including this test suite in the source tree?
> It's probably not suitable for running as part of make check-world,

What makes it unsuitable?

> but it's extremely handy if you're working on a patch related to SSL.
> I'd like to commit this, even if it has some rough edges. That way we
> can improve it later, rather than have it fall into oblivion. Any
> objections?

Not from me :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter  Skype: davidfetter
XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSL regression test suite
Heikki Linnakangas <hlinnakan...@vmware.com> writes:
> On 10/06/2014 04:21 PM, Heikki Linnakangas wrote:
>> This probably needs some further cleanup before it's ready for
>> committing. One issue is that it creates a temporary cluster that
>> listens for TCP connections on localhost, which isn't safe on a
>> multi-user system.
>
> This issue remains. There isn't much we can do about it; SSL doesn't
> work over Unix domain sockets. We could make it work, but that's a
> whole different feature.
>
> How do people feel about including this test suite in the source tree?
> It's probably not suitable for running as part of make check-world, but
> it's extremely handy if you're working on a patch related to SSL. I'd
> like to commit this, even if it has some rough edges. That way we can
> improve it later, rather than have it fall into oblivion. Any
> objections?

As long as it's not run by any standard target, and there's some
documentation explaining why not, I see no reason it can't be in the tree.

			regards, tom lane
Re: [HACKERS] SSL regression test suite
Heikki Linnakangas wrote:

> How do people feel about including this test suite in the source tree?

+1

> It's probably not suitable for running as part of make check-world, but
> it's extremely handy if you're working on a patch related to SSL. I'd
> like to commit this, even if it has some rough edges. That way we can
> improve it later, rather than have it fall into oblivion. Any
> objections?

To prevent it from breaking, one idea is to have one or more buildfarm
animals that run this test as a separate module.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Performance regression: 9.2+ vs. ScalarArrayOpExpr vs. ORDER BY
I wrote:
> Andrew Gierth <and...@tao11.riddles.org.uk> writes:
>> "Bruce" == Bruce Momjian <br...@momjian.us> writes:
>> Bruce> Uh, did this ever get addressed?
>> It did not.
>
> It dropped off the radar screen (I think I'd assumed the patch would
> appear in the next commitfest, which it didn't unless I missed
> something). I'll make a note to look at it once I've finished with the
> timezone abbreviation mess.

I've pushed this patch with some further redesign of build_index_paths'
API --- if we're going to have it reporting about what it found, we
should extend that to the other case of non-amsearcharray indexes too.

			regards, tom lane
Re: [HACKERS] Performance regression: 9.2+ vs. ScalarArrayOpExpr vs. ORDER BY
>>>>> "Bruce" == Bruce Momjian <br...@momjian.us> writes:

 Bruce> Uh, did this ever get addressed?

It did not.

-- 
Andrew (irc:RhodiumToad)
Re: [HACKERS] Performance regression: 9.2+ vs. ScalarArrayOpExpr vs. ORDER BY
Andrew Gierth <and...@tao11.riddles.org.uk> writes:
>>>>> "Bruce" == Bruce Momjian <br...@momjian.us> writes:
> Bruce> Uh, did this ever get addressed?
> It did not.

It dropped off the radar screen (I think I'd assumed the patch would
appear in the next commitfest, which it didn't unless I missed
something). I'll make a note to look at it once I've finished with the
timezone abbreviation mess.

			regards, tom lane
Re: [HACKERS] Performance regression: 9.2+ vs. ScalarArrayOpExpr vs. ORDER BY
Uh, did this ever get addressed?

---

On Sun, Jul 6, 2014 at 08:56:00PM +0100, Andrew Gierth wrote:
> >>>>> "Tom" == Tom Lane <t...@sss.pgh.pa.us> writes:
>
>  > I've experimented with the attached patch, which detects when this
>  > situation might have occurred and does another pass to try and build
>  > ordered scans without the SAOP condition. However, the results may
>  > not be quite ideal, because at least in some test queries (not all)
>  > the scan with the SAOP included in the indexquals is being costed
>  > higher than the same scan with the SAOP moved to a Filter, which
>  > seems unreasonable.
>
>  Tom> I'm not convinced that that's a-priori unreasonable, since the
>  Tom> SAOP will result in multiple index scans under the hood.
>  Tom> Conceivably we ought to try the path with and without SAOPs all
>  Tom> the time.
>
> OK, here's a patch that always retries on lower SAOP clauses, assuming
> that a SAOP in the first column is always applicable - or is even that
> assumption too strong?
>
> --
> Andrew (irc:RhodiumToad)

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 42dcb11..cfcfbfc 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -50,7 +50,8 @@ typedef enum
 {
 	SAOP_PER_AM,	/* Use ScalarArrayOpExpr if amsearcharray */
 	SAOP_ALLOW,		/* Use ScalarArrayOpExpr for all indexes */
-	SAOP_REQUIRE	/* Require ScalarArrayOpExpr to be used */
+	SAOP_REQUIRE,	/* Require ScalarArrayOpExpr to be used */
+	SAOP_SKIP_LOWER	/* Require lower ScalarArrayOpExpr to be eliminated */
 } SaOpControl;
 
 /* Whether we are looking for plain indexscan, bitmap scan, or either */
@@ -118,7 +119,8 @@ static void get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 static List *build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 				  IndexOptInfo *index, IndexClauseSet *clauses,
 				  bool useful_predicate,
-				  SaOpControl saop_control, ScanTypeControl scantype);
+				  SaOpControl saop_control, ScanTypeControl scantype,
+				  bool *saop_retry);
 static List *build_paths_for_OR(PlannerInfo *root, RelOptInfo *rel,
 				  List *clauses, List *other_clauses);
 static List *generate_bitmap_or_paths(PlannerInfo *root, RelOptInfo *rel,
@@ -734,6 +736,7 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 {
 	List	   *indexpaths;
 	ListCell   *lc;
+	bool		saop_retry = false;
 
 	/*
 	 * Build simple index paths using the clauses.  Allow ScalarArrayOpExpr
@@ -742,7 +745,23 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	indexpaths = build_index_paths(root, rel,
 								   index, clauses,
 								   index->predOK,
-								   SAOP_PER_AM, ST_ANYSCAN);
+								   SAOP_PER_AM, ST_ANYSCAN, &saop_retry);
+
+	/*
+	 * If we allowed any ScalarArrayOpExprs on an index with a useful sort
+	 * ordering, then try again without them, since otherwise we miss
+	 * important paths where the index ordering is relevant.
+	 */
+	if (saop_retry)
+	{
+		indexpaths = list_concat(indexpaths,
+								 build_index_paths(root, rel,
+												   index, clauses,
+												   index->predOK,
+												   SAOP_SKIP_LOWER,
+												   ST_ANYSCAN,
+												   NULL));
+	}
 
 	/*
 	 * Submit all the ones that can form plain IndexScan plans to add_path. (A
@@ -779,7 +798,7 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 		indexpaths = build_index_paths(root, rel,
 									   index, clauses,
 									   false,
-									   SAOP_REQUIRE, ST_BITMAPSCAN);
+									   SAOP_REQUIRE,
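The shape of query this patch targets can be sketched as follows. The table, index, and values are hypothetical (not from the thread); the comments describe the behavior the patch fixes, as the thread explains it:

```sql
-- Hypothetical schema: a btree index on (category, severity)
CREATE TABLE events (category int, severity int, payload text);
CREATE INDEX events_cat_sev_idx ON events (category, severity);

-- The = ANY clause below is a ScalarArrayOpExpr (SAOP) on a lower-order
-- index column.  When it is pushed into the index quals, the scan runs
-- once per array element, so the planner stops treating the index as
-- delivering sorted output and can miss the ordered plan that would
-- satisfy the ORDER BY ... LIMIT cheaply.  The patch retries path
-- generation with such lower-column SAOPs left out of the index quals
-- (SAOP_SKIP_LOWER), so the ordered path gets built and costed too.
SELECT *
FROM events
WHERE category = 42
  AND severity = ANY (ARRAY[1, 2, 3])
ORDER BY category, severity
LIMIT 10;
```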
Re: [HACKERS] Add regression tests for autocommit-off mode for psql and fix some omissions
Apologies for the previous message, I didn't send the full version.

On 6 October 2014 16:01, Tom Lane <t...@sss.pgh.pa.us> wrote:
> What class of bug would that prevent exactly?

ERROR: [...] cannot run inside a transaction block

when:
- running psql in AUTOCOMMIT off
- not having started a transaction yet

Currently some statements (ALTER TYPE name ADD VALUE, DROP INDEX
CONCURRENTLY) can only be run in psql when enabling autocommit (which I
consider a bug - either in the code, or in the documentation), whilst
many others (VACUUM, CREATE DATABASE) can be run in AUTOCOMMIT off
because they will not implicitly create a transaction in psql.

> It seems to me like something that would normally get forgotten when we
> add any new such statement.

I think that is probably true; it has already been forgotten to be added
to psql for a few commands. Perhaps I am the only one using
autocommit-off mode and we shouldn't put effort into fixing this?

For me the reason to add some tests was to make sure that the current
behaviour will not change in future versions; the function
command_no_begin might be added to, modified, or rewritten.

On 7 October 2014 01:41, Jim Nasby <jim.na...@bluetreble.com> wrote:
> The options I see...
> 1) If there's a definitive way to tell from backend source code what
>    commands disallow transactions then we can just use that information
>    to generate the list of commands psql shouldn't do that with.
> 2) Always run the regression test with auto-commit turned off.
> 3) Run the regression in both modes (presumably only on the build farm
>    due to how long it would take).

1) I don't know about a definitive way. I used grep to find all
statements calling PreventTransactionChain.

2)
- I expect most people use autocommit-on; so only running it in
  autocommit-off would not test the majority of users.
- autocommit-off also obliges you to explicitly roll back transactions
  after errors occur; this would probably mean a rewrite of some tests?
kind regards,

Feike Steenbergen
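Feike mentions using grep to find the statements that call PreventTransactionChain. Against a source checkout that search would look roughly like `grep -rn "PreventTransactionChain" src/backend --include='*.c'`. Below is a self-contained sketch of the same technique, run against a toy file standing in for a backend source file (the file name and function are made up for illustration):

```shell
set -e
dir=$(mktemp -d)

# A stand-in for a backend source file that guards a statement
# against running inside a transaction block:
cat > "$dir/fake_indexcmds.c" <<'EOF'
void
DropIndexConcurrently(bool isTopLevel)
{
    PreventTransactionChain(isTopLevel, "DROP INDEX CONCURRENTLY");
}
EOF

# The search Feike describes: every caller shows up with file and line
matches=$(grep -rn "PreventTransactionChain" "$dir" | wc -l)
echo "$matches"
rm -rf "$dir"
```

Each matching line identifies one statement that would need an entry in psql's command_no_begin list, which is what makes this kind of grep a reasonable cross-check even if it isn't "definitive".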
Re: [HACKERS] Add regression tests for autocommit-off mode for psql and fix some omissions
On 10/7/14, 9:11 AM, Feike Steenbergen wrote:
> Perhaps I am the only one using autocommit-off mode

You most definitely aren't.

> and we shouldn't put effort into fixing this?

It's not clear to me that this is fixing a problem, to be honest. If
you're running autocommit=off, you have an expectation that you can roll
back commands at will. It's fine if I can't roll back a VACUUM, for
example, since I would practically never want to do that. But ALTER
TYPE .. ADD VALUE ..; is an entirely different beast. That one's
permanent; there's no DROP equivalent. If the command is just executed,
and I can't roll it back, wouldn't that be a serious violation of the
principle of least astonishment?

DROP INDEX CONCURRENTLY has a bit of the same problem. You can CREATE
INDEX CONCURRENTLY, but it might take days in some cases.

I think that just running the command is a bad idea, and if we want to
fix something here we should focus on just providing a better error
message.

.marko
Re: [HACKERS] Add regression tests for autocommit-off mode for psql and fix some omissions
On 7 October 2014 09:55, Marko Tiikkaja <ma...@joh.to> wrote:
> It's not clear to me that this is fixing a problem, to be honest. If
> you're running autocommit=off, you have an expectation that you can
> roll back commands at will. It's fine if I can't roll back a VACUUM,
> for example, since I would practically never want to do that. But ALTER
> TYPE .. ADD VALUE ..; is an entirely different beast. That one's
> permanent; there's no DROP equivalent. If the command is just executed,
> and I can't roll it back, wouldn't that be a serious violation of the
> principle of least astonishment?

I think you have a valid and good point; however the autocommit-off mode
can currently already execute statements which cannot be rolled back.
Perhaps it is a good idea to not allow any of these statements in
autocommit-off mode to prevent astonishment from users, but that would
be a discussion of itself. My reason for proposing this is to have all
these commands treated consistently.

The expectation of being able to roll back commands at will cannot be
fulfilled currently; many statements that are allowed with
autocommit-off fall into the "different beast" category.

Currently the following statements call PreventTransactionChain and do
not generate errors in autocommit-off mode:
- REINDEX DATABASE
- CREATE INDEX CONCURRENTLY
- ALTER SYSTEM
- CREATE DATABASE
- DROP DATABASE
- CREATE TABLESPACE
- DROP TABLESPACE
- CLUSTER
- VACUUM

The following statements call PreventTransactionChain and do generate
errors in autocommit-off mode:
- DROP INDEX CONCURRENTLY
- ALTER DATABASE ... SET TABLESPACE
- ALTER TYPE ... ADD

I don't see why these last three should be treated separately from the
first list; we should either allow all, or none of these statements IMHO.

kind regards,

Feike Steenbergen
Re: [HACKERS] Add regression tests for autocommit-off mode for psql and fix some omissions
On 10/7/14, 2:11 AM, Feike Steenbergen wrote:
> On 7 October 2014 01:41, Jim Nasby <jim.na...@bluetreble.com> wrote:
>> The options I see...
>> 1) If there's a definitive way to tell from backend source code what
>>    commands disallow transactions then we can just use that
>>    information to generate the list of commands psql shouldn't do that
>>    with.
>
> 1) I don't know about a definitive way. I used grep to find all
> statements calling PreventTransactionChain.

Perhaps it wouldn't be too horrific to create some perl code that would
figure out what all of those commands are, and we could then use that to
generate the appropriate list for psql.

> 2)
> - I expect most people use autocommit-on; so only running it in
>   autocommit-off would not test the majority of users.
> - autocommit-off also obliges you to explicitly rollback transactions
>   after errors occur; this would probably mean a rewrite of some tests?

Well, that is at least doable, but probably rather ugly. It would
probably be less ugly if our test framework had a way to test for errors
(ala pgTap).

Where I was going with this is a full-on brute-force test: execute every
possible command with autocommit turned off. We don't need to check that
each command does what it's supposed to do, only that it can execute. Of
course, the huge problem with that is knowing how to actually
successfully run each command. :(

Theoretically the tests could be structured in such a way that there's a
subset of tests that just see if the command even executes, but creating
that is obviously a lot of work and with our current test framework
probably a real pain to maintain.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Add regression tests for autocommit-off mode for psql and fix some omissions
On Mon, Oct 6, 2014 at 7:36 PM, Feike Steenbergen
<feikesteenber...@gmail.com> wrote:
> I would like to propose to add a regression test for all statements
> that call PreventTransactionChain in autocommit-off mode. I propose to
> add these tests to src/test/regress/sql/psql.sql as this is a
> psql-specific mode. Alternatively an isolated test called
> autocommit.sql could be created.

Putting all this stuff in psql.sql is good enough IMO.

> During the writing of the regression test I found another statement not
> covered in the current function: DROP INDEX CONCURRENTLY.

That's a good catch and it should be a separate patch. This could even
be considered for a back-patch down to 9.2. Thoughts?

> I have created a patch consisting of a regression test and adding DROP
> INDEX CONCURRENTLY to command_no_begin.

CREATE DATABASE and DROP DATABASE are not commands present (not
allowed?) in the regression suite. ALTER SYSTEM has no tests as well,
and REINDEX DATABASE may take time, so they may be better ripped off...
Also tests for CLUSTER without arguments, transaction commands, DISCARD
and VACUUM would be good things.

Regards,
-- 
Michael
Re: [HACKERS] SSL regression test suite
On 08/12/2014 03:53 PM, Heikki Linnakangas wrote:
> On 08/12/2014 02:28 PM, Andres Freund wrote:
>> On 2014-08-12 14:01:18 +0300, Heikki Linnakangas wrote:
>>> Also, to test sslmode=verify-full, where the client checks that the
>>> server certificate's hostname matches the hostname that it connected
>>> to, you need to have two aliases for the same server, one that
>>> matches the certificate and one that doesn't. But I think I found a
>>> way around that part; if the certificate is set up for localhost, and
>>> connect to 127.0.0.1, you get a mismatch.
>>
>> Alternatively, and to e.g. test wildcard certs and such, I think you
>> can specify both host and hostaddr to connect without actually doing a
>> dns lookup.
>
> Oh, I didn't know that's possible! Yeah, that's a good solution.

Here's a new version of the SSL regression suite I wrote earlier. It now
specifies both host and hostaddr in the connection string as Andres
suggested, so it no longer requires changes to network configuration. I
added a bunch of tests for the SAN feature that Alexey Klyukin wrote and
was committed earlier. Plus a lot of miscellaneous cleanup.

This probably needs some further cleanup before it's ready for
committing. One issue is that it creates a temporary cluster that
listens for TCP connections on localhost, which isn't safe on a
multi-user system.

- Heikki

diff --git a/src/test/Makefile b/src/test/Makefile
index 0fd7eab..e6a7154 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,6 +12,6 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = regress isolation
+SUBDIRS = regress isolation ssl
 
 $(recurse)
diff --git a/src/test/ssl/Makefile b/src/test/ssl/Makefile
new file mode 100644
index 000..8e0db47
--- /dev/null
+++ b/src/test/ssl/Makefile
@@ -0,0 +1,59 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/ssl
+#
+# Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/ssl/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/ssl
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+CERTIFICATES := serverroot server-cn-and-alt-names \
+	server-cn-only server-single-alt-name server-multiple-alt-names \
+	server-no-names \
+	clientroot client
+
+SSLFILES := $(CERTIFICATES:%=ssl/%.key) $(CERTIFICATES:%=ssl/%.crt)
+
+sslfiles: $(SSLFILES)
+
+# Rule for creating private/public key pairs
+ssl/%.key:
+	openssl genrsa -out $@ 1024
+	chmod 0600 $@
+
+# Rule for creating CA certificates (client and server)
+ssl/%root.crt: ssl/%root.key %root.config
+	openssl req -new -key ssl/$*root.key -days 36500 -out ssl/$*root.crt -x509 -config $*root.config
+	echo 00 > ssl/$*root.srl
+
+# Server certificates, signed by server root CA:
+ssl/server-%.crt: ssl/server-%.key ssl/serverroot.crt
+# Generate a Certificate Sign Request (CSR)
+	openssl req -new -key ssl/server-$*.key -out ssl/server-$*.csr -config server-$*.config
+# Sign the certificate with the right CA
+	openssl x509 -req -in ssl/server-$*.csr -CA ssl/serverroot.crt -CAkey ssl/serverroot.key -CAserial ssl/serverroot.srl -out ssl/server-$*.crt -extfile server-$*.config -extensions v3_req
+	rm ssl/server-$*.csr
+
+# Client certificate, signed by the client root CA:
+ssl/client.crt: ssl/client.key ssl/clientroot.crt
+# Generate a Certificate Sign Request (CSR)
+	openssl req -new -key ssl/client.key -out ssl/client.csr -config client.config
+# Sign the certificate with the right CA
+	openssl x509 -req -in ssl/client.csr -CA ssl/clientroot.crt -CAkey ssl/clientroot.key -CAserial ssl/clientroot.srl -out ssl/client.crt
+	rm ssl/client.csr
+
+sslfiles-clean:
+	rm -f $(SSLFILES) ssl/client-root.srl ssl/server-root.srl
+
+check:
+	$(prove_check)
+
+installcheck:
+	rm -rf tmp_check
+	$(prove_installcheck)
diff --git a/src/test/ssl/README b/src/test/ssl/README
new file mode 100644
index 000..dfd2d79
--- /dev/null
+++ b/src/test/ssl/README
@@ -0,0 +1,43 @@
+src/test/ssl/README
+
+SSL regression tests
+====================
+
+This directory contains a test suite for SSL support.
+
+Running the tests
+=================
+
+make check
+
+Certificates
+============
+
+The test suite needs a set of public/private key pairs and certificates to
+run:
+
+serverroot.crt: CA used to sign server certificates
+clientroot.crt: CA used to sign client certificates
+server-*.crt: server certificate, with small variations in the hostnames
+  present in the certificate.
+client.crt: a client certificate, for user "ssltestuser"
+
+For convenience, these keypairs and certificates are included in the ssl/
+subdirectory, but the Makefile also contains a rule, "make sslfiles", to
+recreate them if you want to make changes.
+
+
+TODO
+====
+
+* Allow the client-side of the tests to be run on different host easily.
+  Currently,
Re: [HACKERS] Add regression tests for autocommit-off mode for psql and fix some omissions
On 6 October 2014 14:09, Michael Paquier <michael.paqu...@gmail.com> wrote:
> That's a good catch and it should be a separate patch. This could even
> be considered for a back-patch down to 9.2. Thoughts?

If I isolate DROP INDEX CONCURRENTLY, this patch would do the trick.

Attachment: 20141006_drop_index_concurrently.patch (binary data)
Re: [HACKERS] Add regression tests for autocommit-off mode for psql and fix some omissions
Feike Steenbergen <feikesteenber...@gmail.com> writes:
> I would like to propose to add a regression test for all statements
> that call PreventTransactionChain in autocommit-off mode.

What class of bug would that prevent exactly? It seems to me like
something that would normally get forgotten when we add any new such
statement.

			regards, tom lane
Re: [HACKERS] Add regression tests for autocommit-off mode for psql and fix some omissions
It would test that when setting AUTOCOMMIT to off you will not run into:

ERROR: [...] cannot run inside a transaction block

when issuing one of these PreventTransactionChain commands (the list of
such commands lives in src/bin/psql/common.c).
Re: [HACKERS] Add regression tests for autocommit-off mode for psql and fix some omissions
On 10/6/14, 9:59 AM, Feike Steenbergen wrote:
> It would test that when setting AUTOCOMMIT to off that you will not run
> into:
> ERROR: [...] cannot run inside a transaction block
> when issuing one of these PreventTransactionChain commands. In
> src/bin/psql/common.c

Yes, but what happens when a new non-transaction command is added? If we
forget to exclude it in psql, we'll certainly also forget to add it to
the unit test.

The options I see...

1) If there's a definitive way to tell from backend source code what
commands disallow transactions then we can just use that information to
generate the list of commands psql shouldn't do that with.

2) Always run the regression test with auto-commit turned off.

3) Run the regression in both modes (presumably only on the build farm
due to how long it would take).

-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] SSL regression test suite
On 08/05/2014 10:46 PM, Robert Haas wrote:
> On Mon, Aug 4, 2014 at 10:38 AM, Heikki Linnakangas
> <hlinnakan...@vmware.com> wrote:
>> Now that we use TAP for testing client tools, I think we can use that
>> to test various SSL options too. I came up with the attached.
>> Comments?
>>
>> It currently assumes that the client's and the server's hostnames are
>> postgres-client.test and postgres-server.test, respectively. That
>> makes it a bit tricky to run on a single system. The README includes
>> instructions; basically you need to set up an additional loopback
>> device, and add entries to /etc/hosts for that.
>
> That seems so onerous that I think few people will do it, and not
> regularly, resulting in the tests breaking and nobody noticing.
> Reconfiguring the loopback interface like that requires root privilege,
> and won't survive a reboot, and doing it in the system configuration
> will require hackery specific to the particular flavor of Linux you're
> running, and probably other hackery on non-Linux systems (never mind
> Windows).

Yeah, you're probably right.

> Why can't you make it work over 127.0.0.1?

I wanted it to be easy to run the client and the server on different
hosts. As soon as we have more than one SSL implementation, it would be
really nice to do interoperability testing between a client and a server
using different implementations.

Also, to test sslmode=verify-full, where the client checks that the
server certificate's hostname matches the hostname that it connected to,
you need to have two aliases for the same server, one that matches the
certificate and one that doesn't. But I think I found a way around that
part; if the certificate is set up for localhost, and you connect to
127.0.0.1, you get a mismatch.

So, I got rid of the DNS setup; it only depends on localhost/127.0.0.1
now. Patch attached. That means that it's not easy to run the client and
the server on different hosts, but we can improve that later.
- Heikki

commit 140c590ca86a0ba4a6b422e4b618cd459b84175f
Author: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date:   Wed Aug 6 18:43:39 2014 +0300

    Refactor cert file stuff in client

diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index f950fc3..cee7b2e 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -780,57 +780,21 @@ destroy_ssl_system(void)
 static int
 initialize_SSL(PGconn *conn)
 {
-	struct stat buf;
-	char		homedir[MAXPGPATH];
-	char		fnbuf[MAXPGPATH];
-	char		sebuf[256];
-	bool		have_homedir;
-	bool		have_cert;
 	EVP_PKEY   *pkey = NULL;
-
-	/*
-	 * We'll need the home directory if any of the relevant parameters are
-	 * defaulted.  If pqGetHomeDirectory fails, act as though none of the
-	 * files could be found.
-	 */
-	if (!(conn->sslcert && strlen(conn->sslcert) > 0) ||
-		!(conn->sslkey && strlen(conn->sslkey) > 0) ||
-		!(conn->sslrootcert && strlen(conn->sslrootcert) > 0) ||
-		!(conn->sslcrl && strlen(conn->sslcrl) > 0))
-		have_homedir = pqGetHomeDirectory(homedir, sizeof(homedir));
-	else	/* won't need it */
-		have_homedir = false;
-
-	/* Read the client certificate file */
-	if (conn->sslcert && strlen(conn->sslcert) > 0)
-		strncpy(fnbuf, conn->sslcert, sizeof(fnbuf));
-	else if (have_homedir)
-		snprintf(fnbuf, sizeof(fnbuf), "%s/%s", homedir, USER_CERT_FILE);
-	else
-		fnbuf[0] = '\0';
-
-	if (fnbuf[0] == '\0')
-	{
-		/* no home directory, proceed without a client cert */
-		have_cert = false;
-	}
-	else if (stat(fnbuf, &buf) != 0)
-	{
-		/*
-		 * If file is not present, just go on without a client cert; server
-		 * might or might not accept the connection.  Any other error,
-		 * however, is grounds for complaint.
-		 */
-		if (errno != ENOENT && errno != ENOTDIR)
-		{
-			printfPQExpBuffer(&conn->errorMessage,
-					   libpq_gettext("could not open certificate file \"%s\": %s\n"),
-							  fnbuf, pqStrerror(errno, sebuf, sizeof(sebuf)));
-			return -1;
-		}
-		have_cert = false;
-	}
-	else
+	char	   *sslcertfile = NULL;
+	char	   *engine = NULL;
+	char	   *keyname = NULL;
+	char	   *sslkeyfile = NULL;
+	char	   *sslrootcert = NULL;
+	char	   *sslcrl = NULL;
+	int			ret = -1;
+
+	if (!pqsecure_get_ssl_files(conn,
+								&sslcertfile, &sslkeyfile, &engine, &keyname,
+								&sslrootcert, &sslcrl))
+		return PGRES_POLLING_READING;
+
+	if (sslcertfile)
 	{
 		/*
 		 * Cert file exists, so load it.  Since OpenSSL doesn't provide the
@@ -855,216 +819,146 @@ initialize_SSL(PGconn *conn)
 		{
 			printfPQExpBuffer(&conn->errorMessage,
							  libpq_gettext("could not acquire mutex: %s\n"), strerror(rc));
-			return -1;
+			goto fail;
 		}
 #endif
-		if (SSL_CTX_use_certificate_chain_file(SSL_context, fnbuf) != 1)
+		if (SSL_CTX_use_certificate_chain_file(SSL_context, sslcertfile) != 1)
 		{
 			char	   *err = SSLerrmessage();
 
 			printfPQExpBuffer(&conn->errorMessage,
					   libpq_gettext("could not read certificate file \"%s\": %s\n"),
-							  fnbuf, err);
+							  sslcertfile, err);
 			SSLerrfree(err);
 #ifdef
Re: [HACKERS] SSL regression test suite
On 2014-08-12 14:01:18 +0300, Heikki Linnakangas wrote:
> On 08/05/2014 10:46 PM, Robert Haas wrote:
>> Why can't you make it work over 127.0.0.1?
>
> I wanted it to be easy to run the client and the server on different
> hosts. As soon as we have more than one SSL implementation, it would be
> really nice to do interoperability testing between a client and a
> server using different implementations.
>
> Also, to test sslmode=verify-full, where the client checks that the
> server certificate's hostname matches the hostname that it connected
> to, you need to have two aliases for the same server, one that matches
> the certificate and one that doesn't. But I think I found a way around
> that part; if the certificate is set up for localhost, and connect to
> 127.0.0.1, you get a mismatch.

Alternatively, and to e.g. test wildcard certs and such, I think you can
specify both host and hostaddr to connect without actually doing a dns
lookup.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] SSL regression test suite
On 08/12/2014 02:28 PM, Andres Freund wrote: On 2014-08-12 14:01:18 +0300, Heikki Linnakangas wrote: Also, to test sslmode=verify-full, where the client checks that the server certificate's hostname matches the hostname that it connected to, you need to have two aliases for the same server, one that matches the certificate and one that doesn't. But I think I found a way around that part; if the certificate is set up for localhost, and you connect to 127.0.0.1, you get a mismatch. Alternatively, and to e.g. test wildcard certs and such, I think you can specify both host and hostaddr to connect without actually doing a DNS lookup. Oh, I didn't know that's possible! Yeah, that's a good solution. - Heikki
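[Editorial note: the host/hostaddr trick above can be spelled out as a libpq connection string. This is a sketch, not from the thread; the hostnames match the ones the proposed test suite assumes.]

```
# The TCP connection goes directly to 127.0.0.1 (hostaddr takes precedence
# for addressing, so no DNS lookup happens), while the certificate check
# under sslmode=verify-full is still done against "postgres-server.test".
host=postgres-server.test hostaddr=127.0.0.1 port=5432 sslmode=verify-full
```

Pointing host at a name the certificate does not cover (while keeping hostaddr the same) then exercises the mismatch case.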
Re: [HACKERS] SSL regression test suite
On Mon, Aug 4, 2014 at 10:38 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Now that we use TAP for testing client tools, I think we can use that to test various SSL options too. I came up with the attached. Comments? It currently assumes that the client's and the server's hostnames are postgres-client.test and postgres-server.test, respectively. That makes it a bit tricky to run on a single system. The README includes instructions; basically you need to set up an additional loopback device, and add entries to /etc/hosts for that. That seems so onerous that I think few people will do it, and not regularly, resulting in the tests breaking and nobody noticing. Reconfiguring the loopback interface like that requires root privilege, and won't survive a reboot, and doing it in the system configuration will require hackery specific to the particular flavor of Linux you're running, and probably other hackery on non-Linux systems (never mind Windows). Why can't you make it work over 127.0.0.1? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Performance regression: 9.2+ vs. ScalarArrayOpExpr vs. ORDER BY
Andrew Gierth and...@tao11.riddles.org.uk writes: commit 807a40c5 fixed a bug in handling of (new in 9.2) functionality of ScalarArrayOpExpr in btree index quals, forcing the results of scans including such a qual to be treated as unordered (because the order can in fact be wrong). However, this kills any chance of using the same index _without_ the SAOP to get the benefit of its ordering. Hm, good point. I've experimented with the attached patch, which detects when this situation might have occurred and does another pass to try and build ordered scans without the SAOP condition. However, the results may not be quite ideal, because at least in some test queries (not all) the scan with the SAOP included in the indexquals is being costed higher than the same scan with the SAOP moved to a Filter, which seems unreasonable. I'm not convinced that that's a-priori unreasonable, since the SAOP will result in multiple index scans under the hood. Conceivably we ought to try the path with and without SAOPs all the time. regards, tom lane
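[Editorial note: a minimal illustration of the planner situation under discussion; the table and index names are made up, not from the thread.]

```sql
CREATE TABLE t (a int, b int);
CREATE INDEX t_a_b_idx ON t (a, b);

-- With the SAOP in the index quals, the scan's output cannot be assumed
-- ordered, because a = ANY(...) is executed as multiple index scans under
-- the hood, so an ORDER BY needs an explicit Sort:
SELECT * FROM t WHERE a = ANY (ARRAY[1, 2, 3]) ORDER BY a, b;

-- The same index scanned without a SAOP returns rows already sorted,
-- which is the ordering benefit commit 807a40c5 inadvertently gave up:
SELECT * FROM t WHERE a = 1 ORDER BY b;
```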
Re: [HACKERS] Performance regression: 9.2+ vs. ScalarArrayOpExpr vs. ORDER BY
Tom == Tom Lane t...@sss.pgh.pa.us writes:

 > I've experimented with the attached patch, which detects when this
 > situation might have occurred and does another pass to try and build
 > ordered scans without the SAOP condition. However, the results may not
 > be quite ideal, because at least in some test queries (not all) the
 > scan with the SAOP included in the indexquals is being costed higher
 > than the same scan with the SAOP moved to a Filter, which seems
 > unreasonable.

 Tom> I'm not convinced that that's a-priori unreasonable, since the
 Tom> SAOP will result in multiple index scans under the hood.
 Tom> Conceivably we ought to try the path with and without SAOPs all the time.

OK, here's a patch that always retries on lower SAOP clauses, assuming that a SAOP in the first column is always applicable - or is even that assumption too strong?

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 42dcb11..cfcfbfc 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -50,7 +50,8 @@ typedef enum
 {
 	SAOP_PER_AM,	/* Use ScalarArrayOpExpr if amsearcharray */
 	SAOP_ALLOW,		/* Use ScalarArrayOpExpr for all indexes */
-	SAOP_REQUIRE	/* Require ScalarArrayOpExpr to be used */
+	SAOP_REQUIRE,	/* Require ScalarArrayOpExpr to be used */
+	SAOP_SKIP_LOWER	/* Require lower ScalarArrayOpExpr to be eliminated */
 } SaOpControl;
 
 /* Whether we are looking for plain indexscan, bitmap scan, or either */
@@ -118,7 +119,8 @@ static void get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 static List *build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 				  IndexOptInfo *index, IndexClauseSet *clauses,
 				  bool useful_predicate,
-				  SaOpControl saop_control, ScanTypeControl scantype);
+				  SaOpControl saop_control, ScanTypeControl scantype,
+				  bool *saop_retry);
 static List *build_paths_for_OR(PlannerInfo *root, RelOptInfo *rel,
 				   List *clauses, List *other_clauses);
 static List *generate_bitmap_or_paths(PlannerInfo *root, RelOptInfo *rel,
@@ -734,6 +736,7 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 {
 	List	   *indexpaths;
 	ListCell   *lc;
+	bool		saop_retry = false;
 
 	/*
 	 * Build simple index paths using the clauses. Allow ScalarArrayOpExpr
@@ -742,7 +745,23 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	indexpaths = build_index_paths(root, rel,
 								   index, clauses,
 								   index->predOK,
-								   SAOP_PER_AM, ST_ANYSCAN);
+								   SAOP_PER_AM, ST_ANYSCAN, &saop_retry);
+
+	/*
+	 * If we allowed any ScalarArrayOpExprs on an index with a useful sort
+	 * ordering, then try again without them, since otherwise we miss
+	 * important paths where the index ordering is relevant.
+	 */
+	if (saop_retry)
+	{
+		indexpaths = list_concat(indexpaths,
+								 build_index_paths(root, rel,
+												   index, clauses,
+												   index->predOK,
+												   SAOP_SKIP_LOWER,
+												   ST_ANYSCAN,
+												   NULL));
+	}
 
 	/*
 	 * Submit all the ones that can form plain IndexScan plans to add_path. (A
@@ -779,7 +798,7 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 		indexpaths = build_index_paths(root, rel,
 									   index, clauses,
 									   false,
-									   SAOP_REQUIRE, ST_BITMAPSCAN);
+									   SAOP_REQUIRE, ST_BITMAPSCAN, NULL);
 		*bitindexpaths = list_concat(*bitindexpaths, indexpaths);
 	}
 }
@@ -816,12 +835,14 @@
  * 'useful_predicate' indicates whether the index has a useful predicate
  * 'saop_control' indicates whether ScalarArrayOpExpr clauses can be used
  * 'scantype' indicates whether we need plain or bitmap scan support
+ * 'saop_retry' indicates whether a SAOP_SKIP_LOWER retry is worthwhile
  */
 static List *
 build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 				  IndexOptInfo *index, IndexClauseSet *clauses,
 				  bool useful_predicate,
-				  SaOpControl saop_control, ScanTypeControl scantype)
+				  SaOpControl saop_control, ScanTypeControl scantype,
+				  bool *saop_retry)
 {
 	List	   *result = NIL;
 	IndexPath  *ipath;
@@ -877,7 +898,9 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	 * assuming that the scan result is ordered.  (Actually, the result is
 	 * still ordered if there are equality constraints for all earlier
 	 * columns, but it seems too expensive and non-modular for this code to be
-	 * aware of that refinement.)
+	 * aware of that refinement.)  But if saop_control is SAOP_SKIP_LOWER, we
+	 * skip exactly these clauses that break sorting, and don't bother
+	 * building any paths otherwise.
 	 *
 	 * We also build a Relids set showing which outer rels are required by the
 	 * selected clauses.  Any lateral_relids are included in that, but not
@@ -901,9 +924,13 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 			/* Ignore if not supported by index */
 			if (saop_control
Re: [HACKERS] performance regression in 9.2/9.3
On 05/06/14 23:09, Linos wrote: On 05/06/14 19:39, Tom Lane wrote: Merlin Moncure mmonc...@gmail.com writes: On Thu, Jun 5, 2014 at 9:54 AM, Linos i...@linos.es wrote: What I don't understand is why the statistics have this bad information, all my tests are done on a database just restored and analyzed. Can I do something to improve the quality of my database statistics and let the planner do better choices? Maybe increase the statistics target of the columns involved? By that I meant row count estimates coming out of the joins are way off. This is pushing the planner into making bad choices. The most pervasive problem I see is that the row count estimate boils down to '1' at some juncture causing the server to favor nestloop/index scan when something like a hash join would likely be more appropriate. There's some fairly wacko stuff going on in this example, like why is the inner HashAggregate costed so much higher by 9.3 than 8.4, when the inputs are basically the same? And why does 9.3 fail to suppress the SubqueryScan on ven, when 8.4 does get rid of it? And why is the final output rows estimate so much higher in 9.3? That one is actually higher than the product of the two nestloop inputs, which looks like possibly a bug. I think what's happening is that 9.3 is picking what it knows to be a less than optimal join method so that it can sort the output by means of the ordered scan Index Scan using referencia_key on modelo mo, and thereby avoid an explicit sort of what it thinks would be 42512461 rows. With a closer-to-reality estimate there, it would have gone for a plan more similar to 8.4's, ie, hash joins and then an explicit sort. 
There is a lot going on in this plan that we haven't been told about; for instance at least one of the query's tables seems to actually be a view, and some other ones appear to be inheritance trees with partitioning constraints, and I'm suspicious that some of the aggregates might be user-defined functions with higher than normal costs. I'd like to see a self-contained test case, by which I mean full details about the table/view schemas; it's not clear whether the actual data is very important here. regards, tom lane Query 2 doesn't use any view and you can find the schema here: http://pastebin.com/Nkv7FwRr Query 1 use 5 views: ticket_cabecera, ticket_linea, reserva_cabecera, reserva_linea and tarifa_proveedor_modelo_precio, I have factored out the four first with the same result as before, you can find the new query and the new plan here: http://pastebin.com/7u2Dkyxp http://explain.depesz.com/s/2V9d Actually the execution time is worse than before. About the last view if I change join from tarifa_proveedor_modelo_precio to tarifa_modelo_precio (a table with nearly the same structure as the view) the query is executed much faster, but I get a similar time changing the (MIN(cab.time_stamp_recepcion)::DATE = ) to (WHERE cab.time_stamp_recepcion::date = ) in the ent subquery that never was a view. Anyway I included tarifa_modelo_precio to the query1 schema file for reference and you can find the plan using tarifa_modelo_precio instead of the view tarifa_proveedor_modelo_precio here: http://explain.depesz.com/s/4gV query1 schema file: http://pastebin.com/JpqM87dr Regards, Miguel Angel. Hello, Is this information enough? I could try to assemble a complete test case but I have very little time right now because I am trying to meet a very difficult deadline. I will do ASAP if needed. Regards, Miguel Angel. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] performance regression in 9.2/9.3
On Mon, Jun 9, 2014 at 9:51 AM, Linos i...@linos.es wrote: Hello, Is this information enough? I could try to assemble a complete test case but I have very little time right now because I am trying to meet a very difficult deadline. I will do ASAP if needed. It is not -- it was enough to diagnose a potential problem but not the solution. Tom was pretty clear: I'd like to see a self-contained test case, by which I mean full details about the table/view schemas; it's not clear whether the actual data is very important here.. merlin
Re: [HACKERS] performance regression in 9.2/9.3
On 09/06/14 16:55, Merlin Moncure wrote: On Mon, Jun 9, 2014 at 9:51 AM, Linos i...@linos.es wrote: Hello, Is this information enough? I could try to assemble a complete test case but I have very little time right now because I am trying to meet a very difficult deadline. I will do ASAP if needed. It is not -- it was enough to diagnose a potential problem but not the solution. Tom was pretty clear: I'd like to see a self-contained test case, by which I mean full details about the table/view schemas; it's not clear whether the actual data is very important here.. merlin Merlin, in the email I replied to are attached the table/view schemas, I was referring to this information as enough or not. Tom said full details about the table/view schemas and these details are attached to the original email I replied to. Miguel Angel.
Re: [HACKERS] performance regression in 9.2/9.3
On Mon, Jun 9, 2014 at 10:00 AM, Linos i...@linos.es wrote: On 09/06/14 16:55, Merlin Moncure wrote: On Mon, Jun 9, 2014 at 9:51 AM, Linos i...@linos.es wrote: Hello, Is this information enough? I could try to assemble a complete test case but I have very little time right now because I am trying to meet a very difficult deadline. I will do ASAP if needed. It is not -- it was enough to diagnose a potential problem but not the solution. Tom was pretty clear: I'd like to see a self-contained test case, by which I mean full details about the table/view schemas; it's not clear whether the actual data is very important here.. merlin Merlin, in the email I replied to are attached the table/view schemas, I was referring to this information as enough or not. Tom said full details about the table/view schemas and these details are attached to the original email I replied to. A self contained test case would generally imply a precise sequence of steps (possibly with supplied data, or some manipulations via generate_series) that would reproduce the issue locally. Since data may not be required, you might be able to get away with a 'schema only dump', but you'd need to make sure to include necessary statistics (mostly what you'd need is in pg_statistic which you'd have to join against pg_class, pg_attribute and pg_namespace). Ideally, you'd be able to restore your schema only dump on a blank database with autovacuum disabled, hack in your statistics, and verify your query produced the same plan. Then (and only then) you could tar up your schema only file, the statistics data, and the query to update the data, and your query with the bad plan which you've triple checked matched your problem condition's plan, and send it to Tom. There might be some things I've missed but getting a blank database to reproduce your problem with a minimum number of steps is key. 
merlin
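[Editorial note: the statistics extraction Merlin describes can be sketched as a catalog query. This is not from the thread; adjust the schema filter for your own objects.]

```sql
-- Pull per-column statistics by joining pg_statistic against pg_class,
-- pg_attribute and pg_namespace, as suggested above.
SELECT n.nspname, c.relname, a.attname, s.*
FROM pg_statistic s
JOIN pg_class c ON c.oid = s.starelid
JOIN pg_attribute a ON a.attrelid = s.starelid AND a.attnum = s.staattnum
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema');
```

Saving this output lets you hack the same statistics into a schema-only restore (with autovacuum disabled) and check that the bad plan reproduces there.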
Re: [HACKERS] performance regression in 9.2/9.3
On 09/06/14 17:30, Merlin Moncure wrote: On Mon, Jun 9, 2014 at 10:00 AM, Linos i...@linos.es wrote: On 09/06/14 16:55, Merlin Moncure wrote: On Mon, Jun 9, 2014 at 9:51 AM, Linos i...@linos.es wrote: Hello, Is this information enough? I could try to assemble a complete test case but I have very little time right now because I am trying to meet a very difficult deadline. I will do ASAP if needed. It is not -- it was enough to diagnose a potential problem but not the solution. Tom was pretty clear: I'd like to see a self-contained test case, by which I mean full details about the table/view schemas; it's not clear whether the actual data is very important here.. merlin Merlin, in the email I replied to are attached the table/view schemas, I was referring to this information as enough or not. Tom said full details about the table/view schemas and these details are attached to the original email I replied to. A self contained test case would generally imply a precise sequence of steps (possibly with supplied data, or some manipulations via generate_series) that would reproduce the issue locally. Since data may not be required, you might be able to get away with a 'schema only dump', but you'd need to make sure to include necessary statistics (mostly what you'd need is in pg_statistic which you'd have to join against pg_class, pg_attribute and pg_namespace). Ideally, you'd be able to restore your schema only dump on a blank database with autovacuum disabled, hack in your statistics, and verify your query produced the same plan. Then (and only then) you could tar up your schema only file, the statistics data, and the query to update the data, and your query with the bad plan which you've triple checked matched your problem condition's plan, and send it to Tom. There might be some things I've missed but getting a blank database to reproduce your problem with a minimum number of steps is key. 
merlin

Oh, I understand now; sorry for the misunderstanding. I will prepare the complete test case ASAP. Thank you for the explanation, Merlin.

Miguel Angel.
Re: [HACKERS] performance regression in 9.2/9.3
Linos i...@linos.es writes: On 05/06/14 19:39, Tom Lane wrote: I'd like to see a self-contained test case, by which I mean full details about the table/view schemas; it's not clear whether the actual data is very important here. query1 schema file: http://pastebin.com/JpqM87dr Sorry about the delay on getting back to this. I downloaded the above schema file and tried to run the originally given query with it, and it failed because the query refers to a couple of tienda columns that don't exist anywhere in this schema. When you submit an updated version, please make sure that all the moving parts match ;-). regards, tom lane
Re: [HACKERS] performance regression in 9.2/9.3
On 09/06/14 18:31, Tom Lane wrote: Linos i...@linos.es writes: On 05/06/14 19:39, Tom Lane wrote: I'd like to see a self-contained test case, by which I mean full details about the table/view schemas; it's not clear whether the actual data is very important here. query1 schema file: http://pastebin.com/JpqM87dr Sorry about the delay on getting back to this. I downloaded the above schema file and tried to run the originally given query with it, and it failed because the query refers to a couple of tienda columns that don't exist anywhere in this schema. When you submit an updated version, please make sure that all the moving parts match ;-). regards, tom lane Tom, are you trying the modified query 1 I posted in the email where you found the schema link? I changed it a little bit to remove 4 views; those views were where the tienda columns came from. Here you can find the modified query and the new explain without these views. http://pastebin.com/7u2Dkyxp http://explain.depesz.com/s/2V9d Anyway, Merlin told me how to create a more complete self-contained case without data; I will try to do it ASAP. I am really busy right now trying to meet a deadline, but I will try to find the time to create this test case. Thank you Tom. Regards, Miguel Angel.
Re: [HACKERS] performance regression in 9.2/9.3
On 05/06/14 13:32, Linos wrote: Hello all, This is a continuation of the thread found here: http://www.postgresql.org/message-id/538f2578.9080...@linos.es Considering this seems to be a problem with the planner I thought that maybe it would be a better idea to post this problem here. To summarize the original thread: I upgraded a medium (17Gb) database from PostgreSQL 8.4 to 9.3 and many of the queries my application uses started performing a lot slower. Merlin advised me to try disabling nestloop; this helped out for the particular query I was asking about but it is not a solution that I can/would like to use in the general case. I simplified the original query a little bit and I have added another one with the same problem.

query 1: http://pastebin.com/32QxbNqW
query 1 postgres 9.3 nestloop enabled: http://explain.depesz.com/s/6WX
query 1 postgres 8.4: http://explain.depesz.com/s/Q7V
query 1 postgres 9.3 nestloop disabled: http://explain.depesz.com/s/w1n
query 1 postgres 9.3 changed having min(ts_recepcion) = for where ts_recepcion =: http://explain.depesz.com/s/H5V

query 2: http://pastebin.com/JmfPcRg8
query 2 postgres 9.3 nestloop enabled: http://explain.depesz.com/s/EY7
query 2 postgres 8.4: http://explain.depesz.com/s/Xc4
query 2 postgres 9.3 nestloop disabled: http://explain.depesz.com/s/oO6O
query 2 postgres 9.3 changed between to equal for date filter: http://explain.depesz.com/s/cP2H

As you can see in these links the problem disappears when I disable nestloop. Another thing I discovered making different combinations of changes is that it seems to be related to date/timestamp fields; small changes to the queries fix the problem without disabling nestloop.

For example, in query 1 changing this:

WHERE cab.id_almacen_destino = 109
GROUP BY mo.modelo_id
HAVING MIN(cab.time_stamp_recepcion)::date = (current_date - interval '30 days')::date

to this:

WHERE cab.id_almacen_destino = 109
  AND cab.time_stamp_recepcion::date = (current_date - interval '30 days')::date
GROUP BY mo.modelo_id

in the first subquery fixed the execution time problem. I know the result is not the same; the second change is a better example. In query 2 changing this:

WHERE fecha BETWEEN '2014-05-19' AND '2014-05-19'

to this:

WHERE fecha = '2014-05-19'

fixes the problem, as you can see in the different explains. These changes are not needed to make PostgreSQL 8.4 take the correct plan, but they are in 9.2/9.3; I haven't tried 9.1 or 9.0 yet. Merlin advised me to create a small test case; the thing is that the tables involved can be pretty large. The best way to create a good test case would be to use generate_series or something alike to try to replicate this problem from zero without any dump, no? Regards, Miguel Angel.

Hi, to put a little more data on the table: on 9.1 I can reproduce the query 1 problem but not the query 2 problem. Regards, Miguel Angel.
Re: [HACKERS] performance regression in 9.2/9.3
On Thu, Jun 5, 2014 at 6:32 AM, Linos i...@linos.es wrote: Hello all, This is a continuation of the thread found here: http://www.postgresql.org/message-id/538f2578.9080...@linos.es Considering this seems to be a problem with the planner I thought that maybe would be a better idea to post this problem here. To summarize the original thread I upgraded a medium (17Gb) database from PostgreSQL 8.4 to 9.3 and many of the queries my application uses started performing a lot slower, Merlin advised me to try disabling nestloop, this helped out for the particular query I was asking about but it is not a solution that I can/would like to use in the general case. I simplified a little bit the original query and I have added another one with same problem. I believe the basic problem (this is just one example; I've anecdotally seen this myself) is that changes in the query planner (which I don't follow and fully understand) in recent versions seem to be such that the planner makes better decisions in the presence of good information but in certain cases makes worse choices when dealing with bad information. Statistics errors tend to accumulate and magnify in complicated plans, especially when the SQL is not optimally written. I have no clue what the right solution is. There's been several discussions about 'plan risk' and trying to get the server to pick plans with better worse case behavior in cases where statistics are demonstrably suspicious. Maybe that would work but ISTM is a huge research item that won't get solved quickly or even necessarily pan out in the end. Nevertheless, user supplied test cases demonstrating performance regressions (bonus if it can be scripted out of generate_series) are going to be key drivers in finding a solution. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] performance regression in 9.2/9.3
On 05/06/14 16:40, Merlin Moncure wrote: On Thu, Jun 5, 2014 at 6:32 AM, Linos i...@linos.es wrote: Hello all, This is a continuation of the thread found here: http://www.postgresql.org/message-id/538f2578.9080...@linos.es Considering this seems to be a problem with the planner I thought that maybe would be a better idea to post this problem here. To summarize the original thread I upgraded a medium (17Gb) database from PostgreSQL 8.4 to 9.3 and many of the queries my application uses started performing a lot slower, Merlin advised me to try disabling nestloop, this helped out for the particular query I was asking about but it is not a solution that I can/would like to use in the general case. I simplified a little bit the original query and I have added another one with same problem. I believe the basic problem (this is just one example; I've anecdotally seen this myself) is that changes in the query planner (which I don't follow and fully understand) in recent versions seem to be such that the planner makes better decisions in the presence of good information but in certain cases makes worse choices when dealing with bad information. Statistics errors tend to accumulate and magnify in complicated plans, especially when the SQL is not optimally written. I have no clue what the right solution is. There's been several discussions about 'plan risk' and trying to get the server to pick plans with better worse case behavior in cases where statistics are demonstrably suspicious. Maybe that would work but ISTM is a huge research item that won't get solved quickly or even necessarily pan out in the end. Nevertheless, user supplied test cases demonstrating performance regressions (bonus if it can be scripted out of generate_series) are going to be key drivers in finding a solution. merlin What I don't understand is why the statistics have this bad information, all my tests are done on a database just restored and analyzed. 
Can I do something to improve the quality of my database statistics and let the planner do better choices? Maybe increase the statistics target of the columns involved? Regards, Miguel Angel. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] performance regression in 9.2/9.3
On 05/06/14 16:40, Merlin Moncure wrote: On Thu, Jun 5, 2014 at 6:32 AM, Linos i...@linos.es wrote: Hello all, This is a continuation of the thread found here: http://www.postgresql.org/message-id/538f2578.9080...@linos.es Considering this seems to be a problem with the planner I thought that maybe would be a better idea to post this problem here. To summarize the original thread I upgraded a medium (17Gb) database from PostgreSQL 8.4 to 9.3 and many of the queries my application uses started performing a lot slower, Merlin advised me to try disabling nestloop, this helped out for the particular query I was asking about but it is not a solution that I can/would like to use in the general case. I simplified a little bit the original query and I have added another one with same problem. I believe the basic problem (this is just one example; I've anecdotally seen this myself) is that changes in the query planner (which I don't follow and fully understand) in recent versions seem to be such that the planner makes better decisions in the presence of good information but in certain cases makes worse choices when dealing with bad information. Statistics errors tend to accumulate and magnify in complicated plans, especially when the SQL is not optimally written. I have no clue what the right solution is. There's been several discussions about 'plan risk' and trying to get the server to pick plans with better worse case behavior in cases where statistics are demonstrably suspicious. Maybe that would work but ISTM is a huge research item that won't get solved quickly or even necessarily pan out in the end. Nevertheless, user supplied test cases demonstrating performance regressions (bonus if it can be scripted out of generate_series) are going to be key drivers in finding a solution. 
merlin I tried setting statistics to 1 on albaran_entrada_cabecera.time_stamp_recepcion (query 1) and ticket_cabecera.fecha (query 2), query 2 is fixed after analyze with the new statistics target (with 5000 as target is fixed too) but query 1 doesn't improve. Regards, Miguel Angel. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
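[Editorial note: for reference, per-column statistics targets are adjusted as below. This is a sketch, not from the thread; the table/column names are the ones mentioned above and the target value is just an example.]

```sql
-- Raise the statistics target for the date column, then re-gather stats;
-- the new target only takes effect after ANALYZE.
ALTER TABLE ticket_cabecera ALTER COLUMN fecha SET STATISTICS 1000;
ANALYZE ticket_cabecera;
```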
Re: [HACKERS] performance regression in 9.2/9.3
On Thu, Jun 5, 2014 at 9:54 AM, Linos i...@linos.es wrote: What I don't understand is why the statistics have this bad information, all my tests are done on a database just restored and analyzed. Can I do something to improve the quality of my database statistics and let the planner do better choices? Maybe increase the statistics target of the columns involved? By that I meant row count estimates coming out of the joins are way off. This is pushing the planner into making bad choices. The most pervasive problem I see is that the row count estimate boils down to '1' at some juncture causing the server to favor nestloop/index scan when something like a hash join would likely be more appropriate. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] performance regression in 9.2/9.3
On Thu, Jun 5, 2014 at 3:54 PM, Linos i...@linos.es wrote: What I don't understand is why the statistics have this bad information, all my tests are done on a database just restored and analyzed. Can I do something to improve the quality of my database statistics and let the planner do better choices? Maybe increase the statistics target of the columns involved? The statistics don't seem different at all in this case. The planner is predicting more or less the same results right up to the top level join where it thinks it'll be joining 200 rows by 92,000 rows. In 8.4 it predicted the join will produce 200 rows but in 9.4 it's predicting the join will produce 42 million rows. That's a pretty big difference. The actual number of rows it's seeing is about 2000x68 in both versions. I think in this case part of the answer is just that if your estimates are wrong then the planner will make bad deductions and it'll just be luck whether one set of bad deductions will produce better or worse plans than another set of bad deductions. The particular bad deduction here is that 9.3 is better able to deduce the ordering of the aggregates and avoid the extra sort. In 8.4 it probably wasn't aware of any plans that would produce rows in the right order. But why is it guessing the join will produce 42 million in 9.4 and only 200 in 8.4? -- greg
Re: [HACKERS] performance regression in 9.2/9.3
Merlin Moncure <mmonc...@gmail.com> writes:
> On Thu, Jun 5, 2014 at 9:54 AM, Linos <i...@linos.es> wrote:
>> What I don't understand is why the statistics have this bad information; all my tests are done on a database just restored and analyzed. Can I do something to improve the quality of my database statistics and let the planner make better choices? Maybe increase the statistics target of the columns involved?
> By that I meant that the row count estimates coming out of the joins are way off. This is pushing the planner into making bad choices. The most pervasive problem I see is that the row count estimate boils down to '1' at some juncture, causing the server to favor a nestloop/index scan when something like a hash join would likely be more appropriate.

There's some fairly wacko stuff going on in this example, like why is the inner HashAggregate costed so much higher by 9.3 than 8.4, when the inputs are basically the same? And why does 9.3 fail to suppress the SubqueryScan on "ven", when 8.4 does get rid of it? And why is the final output rows estimate so much higher in 9.3? That one is actually higher than the product of the two nestloop inputs, which looks like possibly a bug.

I think what's happening is that 9.3 is picking what it knows to be a less-than-optimal join method so that it can sort the output by means of the ordered scan "Index Scan using referencia_key on modelo mo", and thereby avoid an explicit sort of what it thinks would be 42512461 rows. With a closer-to-reality estimate there, it would have gone for a plan more similar to 8.4's, ie, hash joins and then an explicit sort.

There is a lot going on in this plan that we haven't been told about; for instance, at least one of the query's tables seems to actually be a view, some other ones appear to be inheritance trees with partitioning constraints, and I'm suspicious that some of the aggregates might be user-defined functions with higher-than-normal costs.

I'd like to see a self-contained test case, by which I mean full details about the table/view schemas; it's not clear whether the actual data is very important here.

			regards, tom lane
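A hedged sketch of the kind of diagnosis being discussed here: compare estimated and actual row counts node by node, then check whether the nestloop choice itself is the problem. The query text below is a placeholder, not a query from this thread:

```sql
-- EXPLAIN ANALYZE prints "rows=<estimate>" next to "actual ... rows=<real>"
-- for every plan node; the first node where they diverge by orders of
-- magnitude is where the planner's deductions start to go wrong.
EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;  -- the problem query goes here

-- Session-local diagnostic only (never a production setting): if steering
-- the planner away from nestloops makes the query much faster, the row
-- estimates, not the executor, are the thing to chase.
BEGIN;
SET LOCAL enable_nestloop = off;
EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;  -- same query
ROLLBACK;
```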
Re: [HACKERS] performance regression in 9.2/9.3
On 05/06/14 19:39, Tom Lane wrote:
> Merlin Moncure <mmonc...@gmail.com> writes:
>> On Thu, Jun 5, 2014 at 9:54 AM, Linos <i...@linos.es> wrote:
>>> What I don't understand is why the statistics have this bad information; all my tests are done on a database just restored and analyzed. Can I do something to improve the quality of my database statistics and let the planner make better choices? Maybe increase the statistics target of the columns involved?
>> By that I meant that the row count estimates coming out of the joins are way off. This is pushing the planner into making bad choices. The most pervasive problem I see is that the row count estimate boils down to '1' at some juncture, causing the server to favor a nestloop/index scan when something like a hash join would likely be more appropriate.
>
> There's some fairly wacko stuff going on in this example, like why is the inner HashAggregate costed so much higher by 9.3 than 8.4, when the inputs are basically the same? And why does 9.3 fail to suppress the SubqueryScan on "ven", when 8.4 does get rid of it? And why is the final output rows estimate so much higher in 9.3? That one is actually higher than the product of the two nestloop inputs, which looks like possibly a bug.
>
> I think what's happening is that 9.3 is picking what it knows to be a less-than-optimal join method so that it can sort the output by means of the ordered scan "Index Scan using referencia_key on modelo mo", and thereby avoid an explicit sort of what it thinks would be 42512461 rows. With a closer-to-reality estimate there, it would have gone for a plan more similar to 8.4's, ie, hash joins and then an explicit sort.
>
> There is a lot going on in this plan that we haven't been told about; for instance, at least one of the query's tables seems to actually be a view, some other ones appear to be inheritance trees with partitioning constraints, and I'm suspicious that some of the aggregates might be user-defined functions with higher-than-normal costs.
>
> I'd like to see a self-contained test case, by which I mean full details about the table/view schemas; it's not clear whether the actual data is very important here.
>
> 			regards, tom lane

Query 2 doesn't use any view, and you can find the schema here: http://pastebin.com/Nkv7FwRr

Query 1 uses 5 views: ticket_cabecera, ticket_linea, reserva_cabecera, reserva_linea and tarifa_proveedor_modelo_precio. I have factored out the first four with the same result as before; you can find the new query and the new plan here:
http://pastebin.com/7u2Dkyxp
http://explain.depesz.com/s/2V9d

Actually the execution time is worse than before. About the last view: if I change the join from tarifa_proveedor_modelo_precio to tarifa_modelo_precio (a table with nearly the same structure as the view), the query executes much faster, but I get a similar time changing the (MIN(cab.time_stamp_recepcion)::DATE = ) to (WHERE cab.time_stamp_recepcion::date = ) in the "ent" subquery, which never was a view.

Anyway, I included tarifa_modelo_precio in the query1 schema file for reference, and you can find the plan using tarifa_modelo_precio instead of the view tarifa_proveedor_modelo_precio here: http://explain.depesz.com/s/4gV

query1 schema file: http://pastebin.com/JpqM87dr

Regards,
Miguel Angel.
Re: [HACKERS] small regression adjustment
Andrew Dunstan <and...@dunslane.net> writes:
> It occurred to me, after having to change I think 9 files to clean up a small mess in the jsonb regression tests the other day, that we might usefully expose the inputdir and outputdir to psql as variables when running pg_regress. Then we might be able to do things like this, quite independent of location:
>     \set datafile :inputdir/data/mystuff.data
>     COPY mytable FROM :'datafile';

If we could get rid of the run-time-generated-test-file facility altogether, I could get excited about this; but just getting rid of the COPY special cases isn't enough for that. Looking at convert_sourcefiles_in, it seems like we'd also need solutions for these dynamic substitutions:

    replace_string(line, "@testtablespace@", testtablespace);
    replace_string(line, "@libdir@", dlpath);
    replace_string(line, "@DLSUFFIX@", DLSUFFIX);

At least this one seems rather difficult to fix in this fashion:

    output/create_function_1.source:83:ERROR: could not find function "nosuchsymbol" in file "@libdir@/regress@DLSUFFIX@"

(I'm a bit inclined to think that we could dispense with @DLSUFFIX@ altogether; explicit use of the platform's library suffix has been deprecated for at least a decade. But the others are harder.)

			regards, tom lane
Re: [HACKERS] small regression adjustment
On 03/26/2014 11:37 AM, Tom Lane wrote:
> Andrew Dunstan <and...@dunslane.net> writes:
>> It occurred to me, after having to change I think 9 files to clean up a small mess in the jsonb regression tests the other day, that we might usefully expose the inputdir and outputdir to psql as variables when running pg_regress. Then we might be able to do things like this, quite independent of location:
>>     \set datafile :inputdir/data/mystuff.data
>>     COPY mytable FROM :'datafile';
>
> If we could get rid of the run-time-generated-test-file facility altogether, I could get excited about this; but just getting rid of the COPY special cases isn't enough for that. Looking at convert_sourcefiles_in, it seems like we'd also need solutions for these dynamic substitutions:
>     replace_string(line, "@testtablespace@", testtablespace);
>     replace_string(line, "@libdir@", dlpath);
>     replace_string(line, "@DLSUFFIX@", DLSUFFIX);
> At least this one seems rather difficult to fix in this fashion:
>     output/create_function_1.source:83:ERROR: could not find function "nosuchsymbol" in file "@libdir@/regress@DLSUFFIX@"
> (I'm a bit inclined to think that we could dispense with @DLSUFFIX@ altogether; explicit use of the platform's library suffix has been deprecated for at least a decade. But the others are harder.)

Well, maybe we should change dfmgr.c to stop outputting the DLSUFFIX in its error message. I haven't tried with the other two; I will when I get a spare moment. But even if we find it too troublesome to get rid of the substitution part altogether, I think minimizing the need for it would still be worth doing. It would help extension authors, for example, who are most likely to want to use it to load data files for testing their extensions.

cheers

andrew
Re: [HACKERS] small regression adjustment
Andrew Dunstan <and...@dunslane.net> writes:
> On 03/26/2014 11:37 AM, Tom Lane wrote:
>> At least this one seems rather difficult to fix in this fashion:
>>     output/create_function_1.source:83:ERROR: could not find function "nosuchsymbol" in file "@libdir@/regress@DLSUFFIX@"
>> (I'm a bit inclined to think that we could dispense with @DLSUFFIX@ altogether; explicit use of the platform's library suffix has been deprecated for at least a decade. But the others are harder.)
> Well, maybe we should change dfmgr.c to stop outputting the DLSUFFIX in its error message.

If it weren't in the input command, it wouldn't be in the message either, I think. It's the @libdir@ part of that that's problematic.

> But even if we find it too troublesome to get rid of the substitution part altogether, I think minimizing the need for it would still be worth doing. It would help extension authors, for example, who are most likely to want to use it to load data files for testing their extensions.

Yeah, I suppose getting down to one file needing a replacement would still be a significant improvement over the current situation.

			regards, tom lane
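If the proposal in this thread were adopted, a regression script could reference its data files through psql variables instead of relying on @...@ substitution at build time. A sketch of the intended usage follows; note that :inputdir as a pg_regress-provided variable is exactly the feature being proposed here, not something that exists, and the file and table names are hypothetical:

```sql
-- pg_regress would be expected to hand the directory to psql, e.g. via
-- something like "-v inputdir=/path/to/input" (an assumption, not a
-- committed interface); the test script then stays location-independent.
\set datafile :inputdir/data/mystuff.data
COPY mytable FROM :'datafile';
```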
Re: [HACKERS] ECPG regression tests generating warnings
On Sun, Jan 12, 2014 at 08:28:57AM -0800, Kevin Grittner wrote:
> desc.pgc:55: WARNING: descriptor "outdesc" does not exist
> desc.pgc:86: WARNING: descriptor "outdesc" does not exist

Thanks, I didn't notice; fixed.

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
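For readers unfamiliar with ECPG descriptors: warnings of this kind are what ecpg emits when a descriptor name is referenced outside the span between its ALLOCATE and DEALLOCATE statements. A minimal sketch of the correct ordering, for context only; the statement name is illustrative, and this fragment requires the ecpg preprocessor rather than a plain C compiler:

```c
/* A descriptor must be allocated before any statement refers to it. */
EXEC SQL ALLOCATE DESCRIPTOR outdesc;

EXEC SQL PREPARE stmt FROM "SELECT 1 AS answer";
/* Fill the descriptor with the result-column metadata of the query. */
EXEC SQL DESCRIBE stmt INTO SQL DESCRIPTOR outdesc;

/* ... read items out with GET DESCRIPTOR here ... */

/* Any reference after this point draws the "does not exist" warning. */
EXEC SQL DEALLOCATE DESCRIPTOR outdesc;
```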