Re: [HACKERS] Reducing ClogControlLock contention
On 30 June 2015 at 08:22, Simon Riggs si...@2ndquadrant.com wrote:

> This contention is masked by contention elsewhere, e.g. ProcArrayLock, so
> the need for testing here should come once other patches ahead of this
> are in.

Let me explain more clearly. Andres' patch to cache snapshots and reduce
ProcArrayLock was interesting, but not initially compelling. We now have a
solution that commits in batches, which will reduce the number of times the
ProcArray changes - this will heighten the benefit from Andres' snapshot
cache patch.

So the order of testing/commit should be:

1. Queued commit patch
2. ProcArray cache patch
3. Clog shared commit patch (this one)

I didn't hear recent mention of Robert's chash patch, but IIRC it was
effective and so we hope to see it again soon also.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
On Tue, Jun 30, 2015 at 7:37 PM, Rahila Syed rahilasye...@gmail.com wrote:

> Hello Hackers,
>
> Following is a proposal for a feature to calculate VACUUM progress.
>
> Use Case: Measuring progress of long running VACUUMs to help DBAs make an
> informed decision whether to continue running VACUUM or abort it.

+1

I was thinking recently that it would be very cool to see some estimation
of the progress of VACUUM and CLUSTER in a view similar to
pg_stat_activity, or in the ps title.

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
On Tue, Jun 30, 2015 at 1:07 PM, Rahila Syed rahilasye...@gmail.com wrote:

> Hello Hackers,
>
> Following is a proposal for a feature to calculate VACUUM progress.
>
> Use Case: Measuring progress of long running VACUUMs to help DBAs make an
> informed decision whether to continue running VACUUM or abort it.

+1

I am excited to know how the progress reporting works when any of the
statements gets blocked on locks. Rather than displaying the stats in the
LOG, shall we have this in a pg_stat_vacuum_activity [a new catalog for all
auto-vacuum stats]?

Best Regards,
Dinesh
manojadinesh.blogspot.com

> Design: A shared preload library to store progress information from
> different backends running VACUUM, calculate remaining time for each and
> display progress in the form of a view.
>
> VACUUM needs to be instrumented with a hook to collect progress
> information (pages vacuumed/scanned) periodically. The patch attached
> implements a new hook to store the vacuumed_pages and scanned_pages
> counts at the end of each page scanned by VACUUM. This information is
> stored in a shared memory structure. In addition to measuring progress,
> this hook function also calculates the remaining time for VACUUM.
>
> The frequency of collecting progress information can be reduced by
> appending delays in between hook function calls. Also, a GUC parameter
> log_vacuum_min_duration can be used. This will cause VACUUM progress to
> be calculated only if VACUUM runs more than the specified number of
> milliseconds. A value of zero calculates VACUUM progress for each page
> processed; -1 disables logging.
> Progress calculation:
>
> percent_complete = scanned_pages * 100 / total_pages_to_be_scanned;
> remaining_time = elapsed_time * (total_pages_to_be_scanned - scanned_pages) / scanned_pages;
>
> Shared memory struct:
>
> typedef struct PgStat_VacuumStats
> {
>     Oid    databaseoid;
>     Oid    tableoid;
>     int32  vacuumed_pages;
>     int32  total_pages;
>     int32  scanned_pages;
>     double elapsed_time;
>     double remaining_time;
> } PgStat_VacuumStats[max_connections];
>
> Reporting:
>
> A view named 'pg_maintenance_progress' can be created using the values in
> the struct above. pg_stat_maintenance can be called from any other
> backend and will display the progress of each running VACUUM.
>
> Other uses of the hook in VACUUM:
>
> The cost of VACUUM in terms of pages hit, missed and dirtied, the same as
> autovacuum, can be collected using this hook. Autovacuum does it at the
> end of VACUUM for each table; it can instead be done while VACUUM on a
> table is in progress. This can be helpful to track manual VACUUMs too,
> not just autovacuum. Read/write (I/O) rates can be computed along the
> lines of autovacuum. Read rate patterns can be used to help tune future
> VACUUMs on the table (like shared_buffers tuning). Other resource usage
> can also be collected using the progress checker hook.
>
> Attached patch is a POC patch of progress calculation for a single
> backend. Also attached is a brief snapshot of the output log.
Re: [HACKERS] Reducing ClogControlLock contention
On Tue, Jun 30, 2015 at 12:52 PM, Simon Riggs si...@2ndquadrant.com wrote:

> On 30 June 2015 at 08:13, Michael Paquier michael.paqu...@gmail.com wrote:
>
>> Could it be possible to see some performance numbers? For example with a
>> simple pgbench script doing a bunch of tiny transactions, with many
>> concurrent sessions (perhaps hundreds).
>
> I'm more interested to see if people think it is safe. This contention is
> masked by contention elsewhere, e.g. ProcArrayLock, so the need for
> testing here should come once other patches ahead of this are in.

Exactly, and another lock that can mask this improvement is WALWriteLock,
but for that we can take the performance data with synchronous_commit off.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
Hi

2015-06-30 9:37 GMT+02:00 Rahila Syed rahilasye...@gmail.com:

> Hello Hackers,
>
> Following is a proposal for a feature to calculate VACUUM progress.

Interesting idea - I would like to see it integrated into core.

> Use Case: Measuring progress of long running VACUUMs to help DBAs make an
> informed decision whether to continue running VACUUM or abort it.
>
> Design: A shared preload library to store progress information from
> different backends running VACUUM, calculate remaining time for each and
> display progress in the form of a view.

Probably a similar idea can be used for REINDEX, CREATE INDEX and COPY TO
statements.

I thought about the possibilities of progress visualization - one
possibility is one or two special columns in the pg_stat_activity table.
This info can be interesting for VACUUM started by autovacuum too.

Regards

Pavel
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
On 30 June 2015 at 08:52, Pavel Stehule pavel.steh...@gmail.com wrote:

> I thought about the possibilities of progress visualization - one
> possibility is one or two special columns in the pg_stat_activity table.
> This info can be interesting for VACUUM started by autovacuum too.

Yes, I suggest just a single column on pg_stat_activity called pct_complete

  trace_completion_interval = 5s (default)

Every interval, we report the current % complete for any operation that
supports it. We just show NULL if the current operation has not reported
anything or never will.

We do this for VACUUM first, then we can begin adding other operations as
we work out how (for that operation).

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Reducing ClogControlLock contention
On 30 June 2015 at 08:13, Michael Paquier michael.paqu...@gmail.com wrote:

> On Tue, Jun 30, 2015 at 4:02 PM, Simon Riggs si...@2ndquadrant.com wrote:
>
>> ClogControlLock contention is high at commit time. This appears to be
>> due to the fact that ClogControlLock is acquired in Exclusive mode prior
>> to marking commit, which then gets starved by backends running
>> TransactionIdGetStatus().
>>
>> Proposal for improving this is to acquire the ClogControlLock in Shared
>> mode, if possible. This is safe because people checking visibility of an
>> xid must always run TransactionIdIsInProgress() first to avoid race
>> conditions, which will always return true for the transaction we are
>> currently committing. As a result, we never get concurrent access to the
>> same bits in clog, which would require a barrier.
>>
>> Two concurrent writers might access the same word concurrently, so we
>> protect against that with a new CommitLock. We could partition that by
>> pageno also, if needed.
>
> Could it be possible to see some performance numbers? For example with a
> simple pgbench script doing a bunch of tiny transactions, with many
> concurrent sessions (perhaps hundreds).

I'm more interested to see if people think it is safe. This contention is
masked by contention elsewhere, e.g. ProcArrayLock, so the need for testing
here should come once other patches ahead of this are in.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] Missing return value checks in jsonfuncs.c
Hi all,

Coverity is pointing out that a couple of return value checks are missing
for JsonbIteratorNext in jsonfuncs.c. Patch is attached to prepend (void)
to them, since as far as I can tell we just want to skip the value
iterated on.

Regards,
-- 
Michael

diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 13d5b7a..b4258d8 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -3362,8 +3362,7 @@ jsonb_delete(PG_FUNCTION_ARGS)
 		{
 			/* skip corresponding value as well */
 			if (r == WJB_KEY)
-				JsonbIteratorNext(&it, &v, true);
-
+				(void) JsonbIteratorNext(&it, &v, true);
 			continue;
 		}
@@ -3436,7 +3435,7 @@ jsonb_delete_idx(PG_FUNCTION_ARGS)
 		if (i++ == idx)
 		{
 			if (r == WJB_KEY)
-				JsonbIteratorNext(&it, &v, true);	/* skip value */
+				(void) JsonbIteratorNext(&it, &v, true);	/* skip value */
 			continue;
 		}
 	}
Re: [HACKERS] Memory leak with XLogFileCopy since de768844 (WAL file with .partial)
On Tue, Jun 9, 2015 at 10:09 AM, Michael Paquier michael.paqu...@gmail.com wrote:

> On Tue, Jun 9, 2015 at 6:25 AM, Heikki Linnakangas wrote:
>
>> I'm still not sure if I should've just reverted that refactoring, to
>> make XLogFileCopy() look the same in master and back-branches, which
>> makes back-patching easier, or keep the refactoring, because it makes
>> the code slightly nicer. But the current situation is the worst of both
>> worlds: the interface of XLogFileCopy() is no better than it used to be,
>> but it's different enough to cause merge conflicts. At this point, it's
>> probably best to revert the code to look the same as in 9.4.
>
> That's a valid concern. What about the attached then? I think that it is
> still good to keep the upto argument, to copy only data up to the switch
> point at recovery exit. InstallXLogFileSegment() changes a bit as well
> because of the modifications of its arguments.

Applied. Thanks!

Regards,

-- 
Fujii Masao
Re: [HACKERS] Bug in bttext_abbrev_convert()
On Tue, Jun 30, 2015 at 5:25 PM, Jim Nasby jim.na...@bluetreble.com wrote:

> Isn't this the kind of thing Coverity's supposed to find?

I don't know, but in general I'm not very excited about static analysis
tools. The best things that they have going for them are that they're
available, and that they don't require test coverage in the same way as
running the regression tests with Valgrind enabled does.

There is no real testing of sorting in the regression tests. It would be
nice to have a way of generating a large and varied selection of sort
operations programmatically, to catch this kind of thing. pg_regress is
not really up to it. The same applies to various other cases where having
a lot of expected output makes using pg_regress infeasible.

-- 
Peter Geoghegan
[HACKERS] WAL-related tools and .paritial WAL file
Hi,

The WAL-related tools, i.e., pg_archivecleanup, pg_resetxlog and
pg_xlogdump, don't seem to properly handle the .partial WAL file. I think
that we should fix at least pg_archivecleanup; otherwise, in a system
using pg_archivecleanup to clean up old archived files, an archived
.partial WAL file will never be removed. Thoughts?

To make pg_archivecleanup detect the .partial WAL file, I'd like to use
the IsPartialXLogFileName() macro that commit 179cdd0 added. Fortunately
Michael proposed a patch which makes the macros in xlog_internal.h usable
in the WAL-related tools, and it's better to apply the fix based on his
patch. So my plan is to apply his patch first and then apply the fix to
pg_archivecleanup.

http://www.postgresql.org/message-id/CAB7nPqSiKU+8HjVe_J05btonC5E7kvmRaLpGS6EaEQe=dy3...@mail.gmail.com

Regarding pg_resetxlog and pg_xlogdump, do we need to change them as well
so that they can handle the .partial file properly? pg_resetxlog cannot
clean up a .partial file even if it should be removed. But the remaining
.partial file is harmless and will be recycled later by a checkpoint, so
extra treatment of the .partial file in pg_resetxlog may not be necessary.
IOW, maybe we can document that as a limitation of pg_resetxlog. Also
regarding pg_xlogdump, we can just document, for example, "please get rid
of the .partial suffix from the WAL file name if you want to dump it with
pg_xlogdump". Thoughts?

Regards,

-- 
Fujii Masao
Re: [HACKERS] pg_basebackup and replication slots
On 5/21/15 8:42 AM, Peter Eisentraut wrote:

> I wonder why pg_basebackup doesn't have any support for replication
> slots. When relying on replication slots to hang on to WAL data, there is
> a gap between when pg_basebackup finishes and streaming replication is
> started where WAL data could be thrown away by the primary. Looking at
> the code, the -X stream method could easily specify a replication slot.
> (Might be nice if it could also create it in the same run.)

Here is a patch set for this.

(If you're looking at the patch and wondering why there is no code to
actually do anything with the replication slot, that's because the code
that does the WAL streaming is already aware of replication slots because
of the pg_receivexlog functionality that it also serves. So all that needs
to be done is set replication_slot.)

See also the concurrent discussion on (optionally?) initializing
restart_lsn when the replication slot is created
(http://www.postgresql.org/message-id/flat/b8d538ac5587c84898b261fcb8c7d8a41fde1...@ex10-mbx-36009.ant.amazon.com#b8d538ac5587c84898b261fcb8c7d8a41fde1...@ex10-mbx-36009.ant.amazon.com),
which might have an impact on the details of this change.
From a718282cbc18566732a0d9c3e82c8fca752f4812 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut pete...@gmx.net
Date: Tue, 30 Jun 2015 21:15:05 -0400
Subject: [PATCH 1/3] pg_basebackup: Add tests for -R option

---
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 9 +
 src/test/perl/TestLib.pm                     | 8
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index c8c9250..0ddfffe 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -2,7 +2,7 @@
 use warnings;
 use Cwd;
 use TestLib;
-use Test::More tests => 35;
+use Test::More tests => 39;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -138,3 +138,10 @@ command_ok([ 'pg_basebackup', '-D', "$tempdir/tarbackup_l3", '-Ft' ],
 	'pg_basebackup tar with long symlink target');
 psql 'postgres', "DROP TABLESPACE tblspc3;";
+
+command_ok([ 'pg_basebackup', '-D', "$tempdir/backupR", '-R' ],
+	'pg_basebackup -R runs');
+ok(-f "$tempdir/backupR/recovery.conf", 'recovery.conf was created');
+my $recovery_conf = slurp_file "$tempdir/backupR/recovery.conf";
+like($recovery_conf, qr/^standby_mode = 'on'$/m, 'recovery.conf sets standby_mode');
+like($recovery_conf, qr/^primary_conninfo = '.*port=$ENV{PGPORT}.*'$/m, 'recovery.conf sets primary_conninfo');
diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm
index ef42366..60bd7d5 100644
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -11,6 +11,7 @@ our @EXPORT = qw(
 	start_test_server
 	restart_test_server
 	psql
+	slurp_file
 
 	system_or_bail
 	command_ok
@@ -129,6 +130,13 @@ sub psql
 	run [ 'psql', '-X', '-q', '-d', $dbname, '-f', '-' ], '<', \$sql or die;
 }
 
+sub slurp_file
+{
+	local $/;
+	local @ARGV = @_;
+	my $contents = <>;
+	return $contents;
+}
+
 sub system_or_bail
 {
 	system(@_) == 0 or BAIL_OUT("system @_ failed: $?");
-- 
2.4.5

From 934309249f1fed9fd008eccd32906fbfd763811a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut pete...@gmx.net
Date: Tue, 30 Jun 2015 21:15:29 -0400
Subject: [PATCH 2/3] pg_basebackup: Add tests for -X option

---
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 13 +
 src/test/perl/TestLib.pm                     | 10 ++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 0ddfffe..52f1575 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -2,7 +2,7 @@
 use warnings;
 use Cwd;
 use TestLib;
-use Test::More tests => 39;
+use Test::More tests => 44;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -46,6 +46,10 @@
 	'pg_basebackup runs');
 ok(-f "$tempdir/backup/PG_VERSION", 'backup was created');
 
+is_deeply([sort(slurp_dir("$tempdir/backup/pg_xlog/"))],
+	[sort qw(. .. archive_status)],
+	'no WAL files copied');
+
 command_ok(
 	[ 'pg_basebackup', '-D', "$tempdir/backup2", '--xlogdir',
 		"$tempdir/xlog2" ],
@@ -145,3 +149,10 @@
 my $recovery_conf = slurp_file "$tempdir/backupR/recovery.conf";
 like($recovery_conf, qr/^standby_mode = 'on'$/m, 'recovery.conf sets standby_mode');
 like($recovery_conf, qr/^primary_conninfo = '.*port=$ENV{PGPORT}.*'$/m, 'recovery.conf sets primary_conninfo');
+
+command_ok([ 'pg_basebackup', '-D', "$tempdir/backupxf", '-X', 'fetch' ],
+	'pg_basebackup -X fetch runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_xlog")), 'WAL files copied');
+command_ok([ 'pg_basebackup', '-D', "$tempdir/backupxs", '-X', 'stream' ],
+	'pg_basebackup -X stream runs');
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_xlog")), 'WAL files copied');
diff --git
Re: [HACKERS] Memory leak with XLogFileCopy since de768844 (WAL file with .partial)
On Wed, Jul 1, 2015 at 10:58 AM, Fujii Masao wrote:

> On Tue, Jun 9, 2015 at 10:09 AM, Michael Paquier wrote:
>
>> That's a valid concern. What about the attached then? I think that it is
>> still good to keep the upto argument, to copy only data up to the switch
>> point at recovery exit. InstallXLogFileSegment() changes a bit as well
>> because of the modifications of its arguments.
>
> Applied. Thanks!

Thanks for showing up.
-- 
Michael
Re: [HACKERS] Bug in bttext_abbrev_convert()
On 6/29/15 6:47 PM, Peter Geoghegan wrote:

> As we all know, the state of automated testing is pretty lamentable. This
> is the kind of thing that we could catch more easily in the future if
> better infrastructure were in place. I caught this by eyeballing
> bttext_abbrev_convert() with slightly fresher eyes than the last time I
> looked.

Isn't this the kind of thing Coverity's supposed to find?

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Dereferenced pointer in tablesample.c
On 2015-06-30 09:10, Michael Paquier wrote:

> Hi all,
> (Petr in CC)
>
> Coverity is complaining about the following pointer dereference in
> tablesample_init@tablesample.c:
>
> +	ExprState  *argstate = ExecInitExpr(argexpr, (PlanState *) scanstate);
> +
> +	if (argstate == NULL)
> +	{
> +		fcinfo.argnull[i] = true;
> +		fcinfo.arg[i] = (Datum) 0;;
> +	}
> +
> +	fcinfo.arg[i] = ExecEvalExpr(argstate, econtext,
> +								 &fcinfo.argnull[i], NULL);
>
> If argexpr is NULL when calling ExecInitExpr(), argstate is going to be
> NULL and dereferenced afterwards; see execQual.c for more details. Hence
> I think that the patch attached should be applied. Thoughts?

Well, yes, the ExecEvalExpr call should be in an else block if we kept the
NULL logic there. However, after rereading the code, ISTM that
ExecInitExpr will only return NULL if the argexpr is NULL, and argexpr is
added by ParseTableSample using transformExpr on every argument coming
from the grammar; those are a_exprs which AFAIK will never be NULL. So I
actually think that argstate can never be NULL in practice.

Given the above I would just remove the if statement here - it's not
present in any other code that does ExecInitExpr/ExecEvalExpr either. It's
most likely a relic of the code that didn't treat the repeatable clause
separately and just put it into the args List. Patch attached.

-- 
Petr Jelinek                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/src/backend/access/tablesample/tablesample.c b/src/backend/access/tablesample/tablesample.c
index 44a2434..d263fe2 100644
--- a/src/backend/access/tablesample/tablesample.c
+++ b/src/backend/access/tablesample/tablesample.c
@@ -110,12 +110,6 @@ tablesample_init(SampleScanState *scanstate, TableSampleClause *tablesample)
 		Expr	   *argexpr = (Expr *) lfirst(arg);
 		ExprState  *argstate = ExecInitExpr(argexpr, (PlanState *) scanstate);
 
-		if (argstate == NULL)
-		{
-			fcinfo.argnull[i] = true;
-			fcinfo.arg[i] = (Datum) 0;;
-		}
-
 		fcinfo.arg[i] = ExecEvalExpr(argstate, econtext,
 									 &fcinfo.argnull[i], NULL);
 		i++;
[HACKERS] Missing checks on return value of timestamp2tm in datetime.c
Hi all,

timestamp2tm is called close to 40 times in the backend source code,
returning -1 in case of failure. However, there are two places in
datetime.c where we do not check its return value: GetCurrentDateTime and
GetCurrentTimeUsec. This does not really matter much in practice, as the
timestamp used is GetCurrentTransactionStartTimestamp(), but for
correctness shouldn't we check the return code and report
ERRCODE_DATETIME_VALUE_OUT_OF_RANGE on error? Hence the attached patch.

Regards,
-- 
Michael
[HACKERS] Unneeded NULL-pointer check in FreeSpaceMapTruncateRel
Hi all,

In the category of nitpicky code-style issues, FreeSpaceMapTruncateRel is
doing a NULL-pointer check on something that has been dereferenced on all
the code paths leading to this check. (Yes, that's not interesting for
common humans; Coverity sees things based on correctness.)

Regards,
-- 
Michael

diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index fddb47c..c948cb6 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -308,8 +308,7 @@ FreeSpaceMapTruncateRel(Relation rel, BlockNumber nblocks)
 	 * at the next command boundary. But this ensures it isn't outright wrong
 	 * until then.
 	 */
-	if (rel->rd_smgr)
-		rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
+	rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
 }
 
 /*
Re: [HACKERS] Refactor to split nodeAgg.c?
On 30 June 2015 at 14:33, Jeff Davis pg...@j-davis.com wrote:

> I was going to rebase my HashAgg patch, and got some conflicts related to
> the grouping sets patch. I could probably sort them out, but I think that
> may be the tipping point where we want to break up nodeAgg.c into
> nodeSortedAgg.c and nodeHashAgg.c, and probably a common file as well.
> This would also (I hope) be convenient for Simon and David Rowley, who
> have been hacking on aggregates in general.

That would be more inconvenient right now, as I have a pending patch which
makes quite a number of changes all over nodeAgg.c.

https://commitfest.postgresql.org/5/271/

Regards

David Rowley

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] 9.5 branch splitoff
Barring objections, I'll create the REL9_5_STABLE branch and stamp HEAD as
9.6devel sometime this afternoon, maybe around 1800 UTC.

			regards, tom lane
Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive
On Fri, Jun 26, 2015 at 6:26 PM, Robert Haas robertmh...@gmail.com wrote:

> On Thu, Jun 25, 2015 at 11:57 PM, Amit Kapila amit.kapil...@gmail.com wrote:
>
>> 3. Add new view 'pg_stat_wait_event' with following info:
>>    pid - process id of this backend
>>    waiting - true for any form of wait, false otherwise
>>    wait_event_type - Heavy Weight Lock, Light Weight Lock, I/O wait, etc
>>    wait_event - Lock (Relation), Lock (Relation Extension), etc
>
> I am pretty unconvinced that it's a good idea to try to split up the wait
> event into two columns. I'm only proposing ~20 wait states, so there's
> something like 5 bits of information here. Spreading that over two text
> columns is a lot, and note that Amit's text would basically recapitulate
> the contents of the first column in the second one, which I cannot get
> excited about.
>
>> There is an advantage in splitting the columns, which is that if the
>> wait_event_type column indicates Heavy Weight Lock, then the user can go
>> and check further details in pg_locks. He could do that even by looking
>> at the wait_event column, but that might not be as straightforward as
>> with a wait_event_type column.
>
> It's just a matter of writing event_type LIKE 'Lock %' instead of
> event_type = 'Lock'.

Yes, that way it can be done, and maybe that is not inconvenient for the
user, but there are other types of information the user might need, like
the distinct resources on which waits are possible, which again he can
easily find with a separate event_type column.

I think some more discussion is required before we can freeze the user
interface for this feature, but in the meantime I have prepared an initial
patch adding a new column wait_event to pg_stat_activity. For now, I have
added support for heavyweight locks, lightweight locks [1] and the buffer
cleanup lock. I could add other types (spinlock delay sleep, I/O, network
I/O, etc.) if there is no objection to the approach the patch uses to
implement this feature.

[1] For LWLocks, I have currently used the wait_event OtherLock for locks
other than NUM_FIXED_LWLOCKS (refer to the function NumLWLocks to see all
the types of LWLocks). The reason is that there is no straightforward way
to get the id (lockid) of such locks, as for some of them (like
shared_buffers, MaxBackends) the number of locks depends on run-time
configuration parameters. I think if we want to handle those, then we
could either do some math to find the lockid based on the run-time values
of these parameters, or we could add a tag to the LWLock structure (which
indicates the lock type) and fill it during lock initialization, or maybe
there is some other better way to do it.

I have still not added documentation and have not changed anything for the
waiting column in pg_stat_activity, as I think before that we need to
finalize the user interface. Apart from that, as mentioned above, waits
for some event types (like I/O, network I/O, etc.) are not yet covered,
and I think separate functions (like the existing
pg_stat_get_backend_waiting) will be required, which again depends on the
user interface.

Suggestions?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

extend_pg_stat_activity_wait_event_v1.patch
Description: Binary data
Re: [HACKERS] Refactor to split nodeAgg.c?
Hi,

On 2015-06-29 19:33:58 -0700, Jeff Davis wrote:

> I was going to rebase my HashAgg patch, and got some conflicts related to
> the grouping sets patch. I could probably sort them out, but I think that
> may be the tipping point where we want to break up nodeAgg.c into
> nodeSortedAgg.c and nodeHashAgg.c, and probably a common file as well.

I'm not sure that's going to be helpful and clean without a significant
amount of duplication. Grouping sets right now use sorting, but Andrew
Gierth is already working on a patch that employs hashing for individual
groups of groupings that support it, where the aggregated state is deemed
small enough. That implies a fair amount of coupling between the sorted
and hashed aggregation modes.

I'm not sure that conflicts due to grouping sets can be taken as an
argument to split the file - I doubt there'd be significantly fewer with a
split-up, since common data structures have been changed.

That said, I think e.g. the lowest level of interaction with aggregation
functions and transition layers could be split off without too much pain.
Re: [HACKERS] Solaris testers wanted for strxfrm() behavior
Noah Misch n...@leadboat.com writes:

> On Sun, Jun 28, 2015 at 07:00:14PM -0400, Tom Lane wrote:
>
>> Another idea would be to make a test during postmaster start to see if
>> this bug exists, and fail if so. I'm generally on board with the thought
>> that we don't need to work on systems with such a bad bug, but it would
>> be a good thing if the failure was clean and produced a helpful error
>> message, rather than looking like a Postgres bug.
>
> Failing cleanly on unpatched Solaris is adequate, agreed. A check at
> postmaster start isn't enough, because the postmaster might run in the C
> locale while individual databases or collations use problem locales. The
> safest thing is to test after every setlocale(LC_COLLATE) and
> newlocale(LC_COLLATE). That's once at backend start and once per backend
> per collation used, more frequent than I would like.

Hmm. I was thinking more along the lines of making a single test by
momentarily switching into a known-buggy locale. However, your results
here imply that there are or were more than one bug with this symptom,
which may make a catchall test impossible.

			regards, tom lane
Re: [HACKERS] Bug in bttext_abbrev_convert()
On Wed, Jul 1, 2015 at 9:36 AM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jun 30, 2015 at 5:25 PM, Jim Nasby jim.na...@bluetreble.com wrote: Isn't this the kind of thing Coverity's supposed to find? I don't know, but in general I'm not very excited about static analysis tools. The best things that they have going for them is that they're available, and don't require test coverage in the same way as running the regression tests with Valgrind enabled. Well, yes. It should have complained about that if it were a perfect tool able to emulate all code paths with all possible variable values. But Coverity is not actually that perfect, and sometimes it misses the shot, like here. In my experience, when you look at reports from a static tool, you need to first look at other code paths that share similarities with the problem reported, and you will usually see that what the static tool saw is just the tip of the iceberg. The human factor is decisive in this case. There is no real testing of sorting in the regression tests. It would be nice to have a way of generating a large and varied selection of sort operations programmatically, to catch this kind of thing. pg_regress is not really up to it. The same applies to various other cases where having a lot of expected output makes using pg_regress infeasible. Well, the issue here is twofold: 1) Having a buildfarm member running Valgrind would greatly help. 2) This code path is not used at all AFAIK in the test suite, so we definitely lack regression tests here. In your opinion, what would be a sort set large enough to trigger this code path? The idea is not to make the regression test suite take too much time because of it, and not to leave the server created by pg_regress with a PGDATA folder that is too large. For example, could a btree index on a suitable data set do it on at least one platform?
Regards, -- Michael
Re: [HACKERS] Bug in bttext_abbrev_convert()
On Tue, Jun 30, 2015 at 9:39 PM, Michael Paquier michael.paqu...@gmail.com wrote: There is no real testing of sorting in the regression tests. It would be nice to have a way of generating a large and varied selection of sort operations programmatically, to catch this kind of thing. pg_regress is not really up to it. The same applies to various other cases where having a lot of expected output makes using pg_regress infeasible. Well, the issue is double here: 1) Having a buildfarm member with valgrind would greatly help. Not sure that I agree with that. It wouldn't hurt, but people are running Valgrind often enough these days that I think it's unlikely that having it run consistently actually adds all that much. Maybe it would make a difference to do it on a 32-bit platform, since most of us have not used a 32-bit machine as our primary development environment in about a decade, and we probably miss some things because of that. If you run Valgrind on Postgres master these days, you actually have a pretty good chance of not finding any issues, which is great. shared_buffers support for Valgrind would also help a lot (e.g. making it so that referencing a shared buffer that isn't pinned causes a Valgrind error -- Noah and I discussed this a while back; should be doable). If you're looking for a new project, may I highly recommend working on that. :-) 2) This code path is not used at all AFAIK in the test suite, so we definitely lack regression tests here. In your opinion what would be a sort set large enough to be able to trigger this code path? The idea is to not make the regression test suite take too much time because of it, and not to have the server created by pg_regress running the regression tests having a too large PGDATA folder. For example, could a btree index do it with a correct data set do it on at least one platform? Maybe. 
The trick would be constructing a case where many different code paths are covered, including the many different permutations and combinations of how an external sort can go (number of runs, merge steps, datatypes, etc). In general, I think that there is a lot of value to be added by someone making it their explicit goal to increase test coverage, as measured by a tool like gcov (plus subjective expert analysis, of course), particularly when it comes to things like memory corruption bugs (hopefully including shared memory corruption bugs, and hopefully including recovery). If someone was doing that in a focused way, then the codepath where we must explicitly pfree() (in the case of this bug) would probably have coverage, and then Valgrind probably would catch this. As long as that has to be a part of adding things to the standard regression test suite (particularly with sorts), a suite which is expected to run quickly, and as long as we're entirely relying on pg_regress, we will not make progress here. Maybe if there was an extended regression test suite that was explicitly about meeting a code coverage goal we'd do better. Yes, I think mechanically increasing code coverage is useful as an end in itself (although I do accept that meeting a particular coverage percentage is often not especially useful). Increasing coverage has led me to the right question before, just as static analysis has done that for you. For example, the effort of increasing coverage can find dead code, and you can intuitively get a sense of where to look for bugs manually by determining mechanically what code paths are hard to hit, or are naturally seldom hit. It would be nice to have an HTML report from gcov always available on the internet. That would be something useful to automate, IMV. Watching that regress over time might provide useful insight, but I only use gcov a couple of times a year, so that's not going to happen on its own.
-- Peter Geoghegan
Re: [HACKERS] Bug in bttext_abbrev_convert()
On Tue, Jun 30, 2015 at 10:25 PM, Peter Geoghegan p...@heroku.com wrote: It would be nice to always have a html report from gcov always available on the internet. That would be something useful to automate, IMV. Watching that regress over time might provide useful insight, but I only use gcov a couple of times a year, so that's not going to happen on its own. I generated such a report just now, and noticed this:

213  1137 :     tuplesort_performsort(btspool->sortstate);
214  1131 :     if (btspool2)
215     0 :         tuplesort_performsort(btspool2->sortstate);
216        :
217  1131 :     wstate.heap = btspool->heap;
218  1131 :     wstate.index = btspool->index;

The regression tests have zero coverage for this tuplesort_performsort() btspool2 case. That's a fairly common case to have no coverage for, and that took me all of 5 minutes to find. -- Peter Geoghegan
Re: [HACKERS] Bug in bttext_abbrev_convert()
On Tue, Jun 30, 2015 at 10:35 PM, Peter Geoghegan p...@heroku.com wrote: The regression tests have zero coverage for this tuplesort_performsort() btspool2 case. That's a fairly common case to have no coverage for, and that took me all of 5 minutes to find. BTW, I looked here because I added a bunch of sortsupport stuff to _bt_load() in 9.5. It appears that that new code is entirely without coverage. -- Peter Geoghegan
Re: [HACKERS] Parallel Seq Scan
On Tue, Jun 30, 2015 at 4:00 AM, Jeff Davis pg...@j-davis.com wrote: [Jumping in without catching up on entire thread. No problem. Please let me know if these questions have already been covered.] 1. Can you change the name to something like ParallelHeapScan? Parallel Sequential is a contradiction. (I know this is bikeshedding and I won't protest further if you keep the name.) What are you asking to change the name for? We have two nodes in the patch (Funnel and PartialSeqScan). Funnel is the name given to that node because it is quite generic and can be used in multiple ways (other than a plain parallel sequential scan), and the other node is named PartialSeqScan because it performs part of a sequential scan. 2. Where is the speedup coming from? How much of it is CPU and IO overlapping (i.e. not leaving disk or CPU idle while the other is working), and how much from the CPU parallelism? I know this is difficult to answer rigorously, but it would be nice to have some breakdown even if for a specific machine. Yes, you are right, and we have done quite some testing (on the hardware available) with this patch (with different approaches) to see how much difference it creates for IO and CPU. With respect to IO we have found that it doesn't help much [1], though it helps when the data is cached, and there are really good benefits in terms of CPU [2]. In terms of completeness, I think we should add some documentation for this patch. One way is to describe the execution mechanism in src/backend/access/transam/README.parallel and then explain the new configuration knobs in the documentation (.sgml files). Also, we could have a separate page in the documentation under the Server Programming section (Parallel Query - Parallel Scan; Parallel Scan Examples; ...). Another thing to think about at this stage is whether we need to break up this patch and, if yes, how to split it into multiple patches so that the review is easier to complete.
I could see that it can be split into 2 or 3 patches. a. Infrastructure for parallel execution, like some of the stuff in execparallel.c, heapam.c, tqueue.c, etc. and all other generic (non-node-specific) code. b. Node-specific (Funnel and PartialSeqScan) code for the optimiser and executor. c. Documentation. Suggestions? [1] - http://www.postgresql.org/message-id/caa4ek1jhcmn2x1ljq4bomlapt+btouid5vqqk5g6ddfv69i...@mail.gmail.com [2] - Refer to slides 14-15 of the presentation at PGCon; I can repost the data here if required. https://www.pgcon.org/2015/schedule/events/785.en.html With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
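[Editor's note: a minimal sketch of the execution model described above — several "partial scan" workers each read a disjoint subset of the table's rows, and a "funnel" gathers their output into one stream. This is illustrative Python, not the patch's C code; the striping scheme and names are assumptions for the example.]

```python
import queue
import threading

def partial_scan(table, worker, nworkers, out):
    # Each "PartialSeqScan" worker reads a disjoint subset of rows;
    # here rows are striped across workers by position.
    for i in range(worker, len(table), nworkers):
        out.put(table[i])

def funnel_scan(table, nworkers=3):
    # The "Funnel" gathers rows from all partial scans into one stream.
    out = queue.Queue()
    workers = [threading.Thread(target=partial_scan,
                                args=(table, w, nworkers, out))
               for w in range(nworkers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return [out.get() for _ in range(len(table))]

table = list(range(100, 110))
rows = funnel_scan(table)
# Every row is returned exactly once, but order across workers is not fixed.
```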
Re: [HACKERS] WAL-related tools and .partial WAL file
On Wed, Jul 1, 2015 at 9:09 AM, Fujii Masao masao.fu...@gmail.com wrote: Also, regarding pg_xlogdump, we can just document, for example, "please remove the .partial suffix from the WAL file name if you want to dump it with pg_xlogdump". Can't we skip such files in pg_xlogdump? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
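[Editor's note: the behavior suggested above — skipping .partial segments rather than failing on them — amounts to a filename filter. A hedged Python sketch of the idea (function name and file list are illustrative, not pg_xlogdump's actual code):]

```python
def segments_to_dump(filenames):
    # Skip .partial segments: the suffix marks a WAL file that was still
    # being written at promotion, so its tail is not complete, valid WAL.
    return [f for f in filenames if not f.endswith(".partial")]

candidates = [
    "000000010000000000000004",
    "000000010000000000000005.partial",
    "000000020000000000000005",
]
dumpable = segments_to_dump(candidates)
```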
Re: [HACKERS] Streaming replication for psycopg2
On Thu, Jun 4, 2015 at 5:49 PM, Shulgin, Oleksandr oleksandr.shul...@zalando.de wrote: On Tue, Jun 2, 2015 at 2:23 PM, Shulgin, Oleksandr oleksandr.shul...@zalando.de wrote: Hello, I've submitted a patch to psycopg2 to support streaming replication protocol (COPY_BOTH): https://github.com/psycopg/psycopg2/pull/322 It would be great if more people had a chance to take a look and provide feedback about it. In particular, please see example usage at this github comment[1]. Some bikeshedding is really needed here. :-) Hello again, I have updated the pull request above to address the feedback I've gathered from using this construct internally and from other sources. Now included, the lower level asynchronous interface that gives the user more control, for the price of doing select() and keeping track of keepalive timeouts manually. It would be nice if someone could review this. The final look can be found by using this link (docs included): https://github.com/psycopg/psycopg2/pull/322/files Thanks. -- Alex
Re: [HACKERS] thread_test's sched_yield requires -lrt on solaris
Oskari Saarenmaa wrote: I configured the dingo and binturong Solaris 10 animals to build 9.3 some time ago but apparently they always failed the configure phase. Turns out this is caused by thread_test's usage of sched_yield which is in librt on Solaris but which is not pulled in by anything on 9.3 and earlier on my box. Apparently the other Solaris animal (castoroides) requires librt for fdatasync, but that's not required on my system. On 9.4 and master librt is required for shm_open so the check doesn't fail there. Attached a patch to check for sched_yield in configure, the patch only applies against 9.0 - 9.3 which are using autoconf 2.63. We should probably check for sched_yield anyway on all branches even if it's not strictly required on 9.4+ at the moment. I'm going to apply this patch to all branches. (I don't think it's so cool to rely on librt being pulled in by unrelated functions.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] LWLock deadlock and gdb advice
On Mon, Jun 29, 2015 at 11:28 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes jeff.ja...@gmail.com wrote: Is there a way to use gdb to figure out who holds the lock they are waiting for? Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG defined? That might do it. I hadn't planned on running into this problem, so had not compiled accordingly. I thought LOCK_DEBUG would produce too much output, but now I see it doesn't print anything unless Trace_lwlocks is on (but it still makes more info available to gdb, as Amit mentioned), so I think that should be OK. Since I messed up the gdb session causing the postmaster to SIGKILL all the children and destroy the evidence, I'll repeat the run compiled with LOCK_DEBUG and see what it looks like in the morning. I've gotten the LWLock deadlock again. User backend 24841 holds WALInsertLock 7 and is blocked attempting to acquire 6. So it seems to be violating the lock ordering rules (although I don't see that rule spelled out in xlog.c). The Checkpoint process, meanwhile, has acquired WALInsertLocks 0 through 6 and is blocked on 7. I'm not sure where to go from here.
24841: (gdb) f 2 (gdb) p held_lwlocks $1 = {{lock = 0x7f70e11fae40, mode = LW_EXCLUSIVE}, {lock = 0x7f70e13df080, mode = LW_EXCLUSIVE}, {lock = 0x7f70e11d4280, mode = LW_EXCLUSIVE}, {lock = 0x7f70e11d4280, mode = LW_EXCLUSIVE}, {lock = 0x0, mode = LW_EXCLUSIVE} repeats 196 times} (gdb) p T_NAME((LWLock *) held_lwlocks[1]) $2 = 0x84cee1 WALInsertLocks (gdb) p T_ID((LWLock *) held_lwlocks[1]) (gdb) p lock $3 = (LWLock *) 0x7f70e13df000 (gdb) p T_NAME((LWLock *) lock) $7 = 0x84cee1 WALInsertLocks (gdb) p T_ID((LWLock *) lock) $8 = 6 (gdb) p *(lock) $1 = {mutex = 0 '\000', tranche = 1, state = {value = 1627389952}, nwaiters = {value = 2}, waiters = {head = {prev = 0x7f70e9e29e58, next = 0x7f70e9e2a140}}, owner = 0x7f70e9e2d260} (gdb) p *(lock-owner) $5 = {links = {prev = 0x0, next = 0x0}, sem = {semId = 539754537, semNum = 0}, waitStatus = 0, procLatch = {is_set = 0, is_shared = 1 '\001', owner_pid = 24820}, lxid = 0, pid = 24820, pgprocno = 112, backendId = -1, databaseId = 0, roleId = 0, recoveryConflictPending = 0 '\000', lwWaiting = 1 '\001', lwWaitMode = 0 '\000', lwWaitLink = { prev = 0x7f70e13df090, next = 0x7f70e13df090}, waitLock = 0x0, waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0, waitLSN = 0, syncRepState = 0, syncRepLinks = {prev = 0x0, next = 0x0}, myProcLocks = {{prev = 0x7f70e9e2d2f0, next = 0x7f70e9e2d2f0}, {prev = 0x7f70e9e2d300, next = 0x7f70e9e2d300}, {prev = 0x7f70e9e2d310, next = 0x7f70e9e2d310}, { prev = 0x7f70e9e2d320, next = 0x7f70e9e2d320}, {prev = 0x7f70e9e2d330, next = 0x7f70e9e2d330}, {prev = 0x7f70e9e2d340, next = 0x7f70e9e2d340}, {prev = 0x7f70e9e2d350, next = 0x7f70e9e2d350}, {prev = 0x7f70e9e2d360, next = 0x7f70e9e2d360}, {prev = 0x7f70e9e2d370, next = 0x7f70e9e2d370}, {prev = 0x7f70e9e2d380, next = 0x7f70e9e2d380}, { prev = 0x7f70e9e2d390, next = 0x7f70e9e2d390}, {prev = 0x7f70e9e2d3a0, next = 0x7f70e9e2d3a0}, {prev = 0x7f70e9e2d3b0, next = 0x7f70e9e2d3b0}, {prev = 0x7f70e9e2d3c0, next = 0x7f70e9e2d3c0}, {prev = 0x7f70e9e2d3d0, 
next = 0x7f70e9e2d3d0}, {prev = 0x7f70e9e2d3e0, next = 0x7f70e9e2d3e0}}, subxids = {xids = {0 repeats 64 times}}, backendLock = 0x7f70e13dad00, fpLockBits = 0, fpRelId = {0 repeats 16 times}, fpVXIDLock = 0 '\000', fpLocalTransactionId = 0} #0 0x003dcb6eaf27 in semop () from /lib64/libc.so.6 #1 0x00671aef in PGSemaphoreLock (sema=0x7f70e9e2a108) at pg_sema.c:387 #2 0x006d79ac in LWLockWaitForVar (lock=0x7f70e13df000, valptr=0x7f70e13df028, oldval=0, newval=0x7fffb33cbd48) at lwlock.c:1406 #3 0x004ee006 in WaitXLogInsertionsToFinish (upto=41590734848) at xlog.c:1576 #4 0x004ee94b in AdvanceXLInsertBuffer (upto=41594920960, opportunistic=value optimized out) at xlog.c:1888 #5 0x004f3c42 in GetXLogBuffer (ptr=41594920960) at xlog.c:1669 #6 0x004f40e7 in CopyXLogRecordToWAL (rdata=0x19bc920, fpw_lsn=value optimized out) at xlog.c:1313 #7 XLogInsertRecord (rdata=0x19bc920, fpw_lsn=value optimized out) at xlog.c:1008 #8 0x004f7c6c in XLogInsert (rmid=13 '\r', info=32 ' ') at xloginsert.c:453 #9 0x0047e210 in ginPlaceToPage (btree=0x7fffb33cc070, stack=0x1a5bfe0, insertdata=value optimized out, updateblkno=value optimized out, childbuf=0, buildStats=0x0) at ginbtree.c:418 #10 0x0047f50d in ginInsertValue (btree=0x7fffb33cc070, stack=0x1a5bfe0, insertdata=0x7fffb33cc0f0, buildStats=0x0) at ginbtree.c:748 #11 0x00475deb in ginEntryInsert (ginstate=0x7fffb33cc470, attnum=5544, key=1, category=value optimized out, items=0x7f70e10d9140, nitem=51, buildStats=0x0) at gininsert.c:234 #12 0x0048602c in ginInsertCleanup
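[Editor's note: the deadlock reported above is the classic inconsistent-ordering pattern — one process holds lock 7 and wants 6 while another holds 0..6 and wants 7. A minimal Python sketch of the ordering rule that prevents it (the lock array and thread names stand in for WALInsertLocks and the two processes; this is not PostgreSQL's code):]

```python
import threading

locks = [threading.Lock() for _ in range(8)]  # stand-ins for WALInsertLocks 0..7

def acquire_ordered(wanted):
    # Acquiring in ascending index order means no two threads can each
    # hold a lock the other is waiting on, so deadlock is impossible.
    for i in sorted(wanted):
        locks[i].acquire()
    for i in sorted(wanted, reverse=True):
        locks[i].release()

# The two participants from the report: one wants {6, 7}, the other 0..7.
t1 = threading.Thread(target=acquire_ordered, args=([7, 6],))
t2 = threading.Thread(target=acquire_ordered, args=(list(range(8)),))
for t in (t1, t2):
    t.start()
for t in (t1, t2):
    t.join(timeout=10)
deadlocked = t1.is_alive() or t2.is_alive()
```

Had t1 taken 7 before 6 while t2 walked 0..7 in order, the two could block each other forever; sorting the acquisition order removes that interleaving.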
Re: [HACKERS] pg_restore -t should match views, matviews, and foreign tables
Hi, I am sending a review of this trivial patch. 1. This patch enables restoring only a selected view, materialized view, foreign table, or sequence. Currently the option -t works with tables only; all other relation-like objects are quietly ignored. With this patch, the check on type is extended to allow the other relation types described by the pg_class system catalog. The implementation is trivial: + strcmp(te->desc, "TABLE DATA") == 0 || + strcmp(te->desc, "VIEW") == 0 || + strcmp(te->desc, "FOREIGN TABLE") == 0 || + strcmp(te->desc, "MATERIALIZED VIEW") == 0 || + strcmp(te->desc, "MATERIALIZED VIEW DATA") == 0 || + strcmp(te->desc, "SEQUENCE") == 0) 2. There were no objections against this patch. 3. There were no problems with patching and compilation. 4. This feature is documented well enough. There is an open question whether the related help line should be changed. The current text is not 100% accurate, but it is short, readable, and understandable. -S, --superuser=NAME superuser user name to use for disabling triggers -t, --table=NAME restore named table -T, --trigger=NAME restore named trigger 5. All tests passed. 6. There are no new tests, but the pg_dump-related software has no tests yet anyway. I don't see any issues - this patch is really trivial and without risks. It is already working on the pg_dump side, so the fix on the pg_restore side is natural. Regards Pavel 2015-04-01 5:01 GMT+02:00 Craig Ringer cr...@2ndquadrant.com: Following on from this -bugs post: http://www.postgresql.org/message-id/camsr+ygj50tvtvk4dbp66gajeoc0kap6kxfehaom+neqmhv...@mail.gmail.com this patch adds support for views, foreign tables, and materialised views to the pg_restore -t flag. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
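[Editor's note: the effect of the chained strcmp() checks discussed above can be sketched as a set-membership test. The function and field names here are illustrative Python, not pg_restore's actual C; "TABLE" handling before the patch is an assumption for the example.]

```python
# Entry types (te->desc values) that -t should match after the patch;
# plain tables and their data were already accepted before it.
TABLE_LIKE_TYPES = {
    "TABLE", "TABLE DATA", "VIEW", "FOREIGN TABLE",
    "MATERIALIZED VIEW", "MATERIALIZED VIEW DATA", "SEQUENCE",
}

def matches_table_filter(te_desc, te_tag, wanted_name):
    # -t NAME now matches any table-like relation, not just plain tables.
    return te_desc in TABLE_LIKE_TYPES and te_tag == wanted_name

r1 = matches_table_filter("MATERIALIZED VIEW", "mv_sales", "mv_sales")
r2 = matches_table_filter("TRIGGER", "trg_sales", "trg_sales")
```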
Re: [HACKERS] Refactoring speculative insertion with unique indexes a little
On 06/11/2015 02:19 AM, Peter Geoghegan wrote: Currently, speculative insertion (the INSERT ... ON CONFLICT DO UPDATE executor/storage infrastructure) uses checkUnique == UNIQUE_CHECK_PARTIAL for unique indexes, which is a constant originally only used by deferred unique constraints. It occurred to me that this has a number of disadvantages: * It confuses separation of concerns. Pushing down this information to the nbtree AM makes it clear why it's slightly special from a speculative insertion point of view. For example, the nbtree AM does not actually require livelock insurance (for unique indexes, although in principle not for nbtree-based exclusion constraints, which are possible). * UNIQUE_CHECK_PARTIAL is not only not the same thing as UNIQUE_CHECK_SPECULATIVE (a new constant for the enum). It's also naturally mutually exclusive with it (since we do not and cannot support deferred unique constraints as arbiters). Let's represent this directly. * It makes a conflict not detected by the pre-check always insert an index tuple, even though that occurs after a point where it's already been established that the pointed-to TID is doomed -- it must go on to be super deleted. Why bother bloating the index? I'm actually not really motivated by wanting to reduce bloat here (that was always something that I thought was a non-issue with *any* implemented speculative insertion prototype [1]). Rather, by actually physically inserting an index tuple unnecessarily, the implication is that it makes sense to do so (perhaps for roughly the same reason it makes sense with deferred unique constraints, or some other complicated and subtle reason.). AFAICT that implication is incorrect, though; I see no reason why inserting that index tuple serves any purpose, and it clearly *can* be avoided with little effort. I agree it would be cleaner to have a separate CHECK_UNIQUE_XXX code for speculative insertions. 
You've defined CHECK_UNIQUE_SPECULATIVE as like CHECK_UNIQUE_PARTIAL, but you don't have to insert the index tuple if there's a conflict. I think it'd be better to define it as like CHECK_UNIQUE_YES, but return FALSE instead of throwing an error on conflict. The difference is that the aminsert would not be allowed to return FALSE when there is no conflict. Actually, even without this patch, a dummy implementation of aminsert that always returns FALSE in CHECK_UNIQUE_PARTIAL mode would be legal per the docs, but would loop forever. So that would be nice to fix or document away, even though it's not a problem for B-tree currently. Attached patch updates speculative insertion along these lines. In passing, I have made ExecInsertIndexTuples() give up when a speculative insertion conflict is detected. Again, this is not about bloat prevention; it's about making the code easier to understand by not doing something that is unnecessary, and in avoiding that also avoiding the implication that it is necessary. There are already enough complicated interactions that *are* necessary (e.g. livelock avoidance for exclusion constraints). Let us make our intent clearer. Hmm, not sure I like those changes. ExecInsertIndexTuples's behaviour now depends on whether specConflict is passed, but that seems weird as specConflict is an output parameter. The patch also updates the AM interface documentation (the part covering unique indexes). It seems quite natural to me to document the theory of operation for speculative insertion there. Yeah, although I find the explanation pretty long-winded and difficult to understand ;-). - Heikki
Re: [HACKERS] LWLock deadlock and gdb advice
On 06/30/2015 07:37 PM, Alvaro Herrera wrote: Jeff Janes wrote: I've gotten the LWLock deadlock again. User backend 24841 holds the WALInsertLocks 7 and is blocked attempting to acquire 6 . So it seems to be violating the lock ordering rules (although I don't see that rule spelled out in xlog.c) Hmm, interesting -- pg_stat_statement is trying to re-do an operation that involves updating a GIN index, while WAL-logging for the original update is still ongoing, it seems. I don't think pg_stat_statement has anything to do with this. You can see from the backtrace that pg_stat_statement is enabled, as the call went through the pgss_ExecutorRun executor hook, but that's all. - Heikki
Re: [HACKERS] LWLock deadlock and gdb advice
Jeff Janes wrote: I've gotten the LWLock deadlock again. User backend 24841 holds the WALInsertLocks 7 and is blocked attempting to acquire 6 . So it seems to be violating the lock ordering rules (although I don't see that rule spelled out in xlog.c) Hmm, interesting -- pg_stat_statement is trying to re-do an operation that involves updating a GIN index, while WAL-logging for the original update is still ongoing, it seems. #0 0x003dcb6eaf27 in semop () from /lib64/libc.so.6 #1 0x00671aef in PGSemaphoreLock (sema=0x7f70e9e2a108) at pg_sema.c:387 #2 0x006d79ac in LWLockWaitForVar (lock=0x7f70e13df000, valptr=0x7f70e13df028, oldval=0, newval=0x7fffb33cbd48) at lwlock.c:1406 #3 0x004ee006 in WaitXLogInsertionsToFinish (upto=41590734848) at xlog.c:1576 #4 0x004ee94b in AdvanceXLInsertBuffer (upto=41594920960, opportunistic=value optimized out) at xlog.c:1888 #5 0x004f3c42 in GetXLogBuffer (ptr=41594920960) at xlog.c:1669 #6 0x004f40e7 in CopyXLogRecordToWAL (rdata=0x19bc920, fpw_lsn=value optimized out) at xlog.c:1313 #7 XLogInsertRecord (rdata=0x19bc920, fpw_lsn=value optimized out) at xlog.c:1008 #8 0x004f7c6c in XLogInsert (rmid=13 '\r', info=32 ' ') at xloginsert.c:453 #9 0x0047e210 in ginPlaceToPage (btree=0x7fffb33cc070, stack=0x1a5bfe0, insertdata=value optimized out, updateblkno=value optimized out, childbuf=0, buildStats=0x0) at ginbtree.c:418 #10 0x0047f50d in ginInsertValue (btree=0x7fffb33cc070, stack=0x1a5bfe0, insertdata=0x7fffb33cc0f0, buildStats=0x0) at ginbtree.c:748 #11 0x00475deb in ginEntryInsert (ginstate=0x7fffb33cc470, attnum=5544, key=1, category=value optimized out, items=0x7f70e10d9140, nitem=51, buildStats=0x0) at gininsert.c:234 #12 0x0048602c in ginInsertCleanup (ginstate=0x7fffb33cc470, vac_delay=value optimized out, stats=0x0) at ginfast.c:843 #13 0x004871b9 in ginHeapTupleFastInsert (ginstate=0x7fffb33cc470, collector=value optimized out) at ginfast.c:436 #14 0x0047625a in gininsert (fcinfo=value optimized out) at gininsert.c:531 #15 
0x007dc483 in FunctionCall6Coll (flinfo=value optimized out, collation=value optimized out, arg1=value optimized out, arg2=value optimized out, arg3=value optimized out, arg4=value optimized out, arg5=140122789534360, arg6=0) at fmgr.c:1436 #16 0x004b2247 in index_insert (indexRelation=0x7f70e1190e60, values=0x7fffb33cef50, isnull=0x7fffb33cf050 , heap_t_ctid=0x1a4a794, heapRelation=0x7f70e1185a98, checkUnique=UNIQUE_CHECK_NO) at indexam.c:226 #17 0x005d2e67 in ExecInsertIndexTuples (slot=0x1a497f0, tupleid=0x1a4a794, estate=0x1a40768, noDupErr=0 '\000', specConflict=0x0, arbiterIndexes=0x0) at execIndexing.c:384 #18 0x005f0161 in ExecUpdate (tupleid=0x7fffb33cf250, oldtuple=0x0, slot=0x1a497f0, planSlot=0x1a42498, epqstate=0x1a40a70, estate=0x1a40768, canSetTag=1 '\001') at nodeModifyTable.c:978 #19 0x005f0b2a in ExecModifyTable (node=0x1a409d0) at nodeModifyTable.c:1436 #20 0x005d6cc8 in ExecProcNode (node=0x1a409d0) at execProcnode.c:389 #21 0x005d5402 in ExecutePlan (queryDesc=0x1a2b220, direction=value optimized out, count=0) at execMain.c:1549 #22 standard_ExecutorRun (queryDesc=0x1a2b220, direction=value optimized out, count=0) at execMain.c:337 #23 0x7f70ea39af3b in pgss_ExecutorRun (queryDesc=0x1a2b220, direction=ForwardScanDirection, count=0) at pg_stat_statements.c:881 #24 0x006ea38f in ProcessQuery (plan=0x1a2b3d8, sourceText=0x1a2af68 update foo set count=count+1 where text_array @ $1, params=0x1a2afe8, dest=value optimized out, completionTag=0x7fffb33cf660 ) at pquery.c:185 #25 0x006ea5ec in PortalRunMulti (portal=0x19a6128, isTopLevel=1 '\001', dest=0xc55020, altdest=0xc55020, completionTag=0x7fffb33cf660 ) at pquery.c:1279 #26 0x006eadb3 in PortalRun (portal=0x19a6128, count=9223372036854775807, isTopLevel=1 '\001', dest=0x1990630, altdest=0x1990630, completionTag=0x7fffb33cf660 ) at pquery.c:816 #27 0x006e6ffb in exec_execute_message (portal_name=0x1990218 , max_rows=9223372036854775807) at postgres.c:1988 #28 0x006e8c15 in PostgresMain (argc=value 
optimized out, argv=value optimized out, dbname=0x19a3210 jjanes, username=value optimized out) at postgres.c:4088 #29 0x006855dd in BackendRun (argc=value optimized out, argv=value optimized out) at postmaster.c:4159 #30 BackendStartup (argc=value optimized out, argv=value optimized out) at postmaster.c:3835 #31 ServerLoop (argc=value optimized out, argv=value optimized out) at postmaster.c:1609 #32 PostmasterMain (argc=value optimized out, argv=value optimized out) at postmaster.c:1254 #33 0x00610ab0 in main (argc=4, argv=0x1976cf0) at main.c:221 -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development,
Re: [HACKERS] Dereferenced pointer in tablesample.c
Petr Jelinek p...@2ndquadrant.com writes: On 2015-06-30 09:10, Michael Paquier wrote: If the expression passed to ExecInitExpr() is NULL, argstate is going to be NULL and dereferenced afterwards; see execQual.c for more details. Hence I think that the patch attached should be applied. Thoughts? Well, yes, the ExecEvalExpr should be in the else block if we'd keep the NULL logic there. However, after rereading the code, ISTM ExecInitExpr will only return NULL if the argexpr is NULL, and argexpr is added by ParseTableSample using transformExpr on every argument that comes from the grammar, and those are a_exprs which AFAIK will never be NULL. So I actually think that the argstate can never be NULL in practice. Indeed. ParseTableSample() is badly in need of a rewrite, but I agree that it's not going to produce null expression trees. Patch attached. Will push this shortly. regards, tom lane
Re: [HACKERS] pg_rewind failure by file deletion in source server
On 06/29/2015 09:44 AM, Michael Paquier wrote: On Mon, Jun 29, 2015 at 4:55 AM, Heikki Linnakangas wrote: But we'll still need to handle the pg_xlog symlink case somehow. Perhaps it would be enough to special-case pg_xlog for now. Well, sure, pg_rewind does not copy the soft links either way. Now it would be nice to have an option to be able to recreate the soft link of at least pg_xlog even if it can be scripted as well after a run. Hmm. I'm starting to think that pg_rewind should ignore pg_xlog entirely. In any non-trivial scenarios, just copying all the files from pg_xlog isn't enough anyway, and you need to set up a recovery.conf after running pg_rewind that contains a restore_command or primary_conninfo, to fetch the WAL. So you can argue that by not copying pg_xlog automatically, we're actually doing a favour to the DBA, by forcing him to set up the recovery.conf file correctly. Because if you just test simple scenarios where not much time has passed between the failover and running pg_rewind, it might be enough to just copy all the WAL currently in pg_xlog, but it would not be enough if more time had passed and not all the required WAL is present in pg_xlog anymore. And by not copying the WAL, we can avoid some copying, as restore_command or streaming replication will only copy what's needed, while pg_rewind would copy all WAL it can find in the target's data directory. pg_basebackup also doesn't include any WAL, unless you pass the --xlog option. It would be nice to also add an optional --xlog option to pg_rewind, but with pg_rewind it's possible that all the required WAL isn't present in the pg_xlog directory anymore, so you wouldn't always achieve the same effect of making the backup self-contained. So, I propose the attached. It makes pg_rewind ignore the pg_xlog directory in both the source and the target.
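[Editor's note: the proposed behavior — walking the data directory but pruning pg_xlog from the copy set — can be sketched as a directory-walk filter. This is illustrative Python with a toy layout, not pg_rewind's actual C implementation.]

```python
import os
import tempfile

def files_to_copy(datadir):
    # Walk the data directory as a rewind/copy tool would, but prune
    # pg_xlog: WAL is fetched later via restore_command or streaming.
    found = []
    for root, dirs, files in os.walk(datadir):
        if root == datadir and "pg_xlog" in dirs:
            dirs.remove("pg_xlog")  # prune the whole WAL directory
        for f in files:
            found.append(os.path.relpath(os.path.join(root, f), datadir))
    return sorted(found)

# Toy layout standing in for a real PGDATA.
d = tempfile.mkdtemp()
os.makedirs(os.path.join(d, "pg_xlog"))
os.makedirs(os.path.join(d, "base"))
for p in ("pg_xlog/000000010000000000000001", "base/12345", "PG_VERSION"):
    open(os.path.join(d, *p.split("/")), "w").close()

copied = files_to_copy(d)
```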
- Heikki

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 32dc83f..bb1d640 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -49,12 +49,13 @@ PostgreSQL documentation
  <para>
   The result is equivalent to replacing the target data directory with the
-  source one. All files are copied, including configuration files. The
-  advantage of <application>pg_rewind</> over taking a new base backup, or
-  tools like <application>rsync</>, is that <application>pg_rewind</> does
-  not require reading through all unchanged files in the cluster. That makes
-  it a lot faster when the database is large and only a small portion of it
-  differs between the clusters.
+  source one. All files in the data directory are copied, including
+  configuration files, but excluding the WAL log directory,
+  <filename>pg_xlog</>. The advantage of <application>pg_rewind</> over
+  taking a new base backup, or tools like <application>rsync</>, is that
+  <application>pg_rewind</> does not require reading through all unchanged
+  files in the cluster. That makes it a lot faster when the database is
+  large and only a small portion of it differs between the clusters.
  </para>

  <para>
@@ -74,12 +75,12 @@ PostgreSQL documentation
   When the target server is started up for the first time after running
   <application>pg_rewind</>, it will go into recovery mode and replay all
   WAL generated in the source server after the point of divergence.
-  If some of the WAL was no longer available in the source server when
-  <application>pg_rewind</> was run, and therefore could not be copied by
-  <application>pg_rewind</> session, it needs to be made available when the
-  target server is started up. That can be done by creating a
-  <filename>recovery.conf</> file in the target data directory with a
-  suitable <varname>restore_command</>.
+  Like after restoring from a base backup using continuous archiving,
+  the source server's WAL must be made available to the rewound server.
+  This can be done by creating a <filename>recovery.conf</> file in the
+  target data directory with a suitable <varname>restore_command</> or
+  <varname>primary_conninfo</> line, or by copying the required WAL files
+  manually into <filename>pg_xlog</> in the target data directory.
  </para>
 </refsect1>

diff --git a/src/bin/pg_rewind/RewindTest.pm b/src/bin/pg_rewind/RewindTest.pm
index 5219ec9..82e3f4c 100644
--- a/src/bin/pg_rewind/RewindTest.pm
+++ b/src/bin/pg_rewind/RewindTest.pm
@@ -284,14 +284,12 @@ sub run_pg_rewind
 	# Keep a temporary postgresql.conf for master node or it would be
 	# overwritten during the rewind.
-	copy(
-		"$test_master_datadir/postgresql.conf",
-		"$testroot/master-postgresql.conf.tmp");
+	copy("$test_master_datadir/postgresql.conf",
+		"$testroot/master-postgresql.conf.tmp");

 	# Now run pg_rewind
 	if ($test_mode eq "local")
 	{
-		# Do rewind using a local pgdata as source
 		# Stop the master and be ready to perform the rewind
 		system_or_bail(
@@ -306,10 +304,22 @@
 		$log_path, '21');
ok($result, 'pg_rewind
Re: [HACKERS] LWLock deadlock and gdb advice
On 06/30/2015 07:05 PM, Jeff Janes wrote:
> On Mon, Jun 29, 2015 at 11:28 PM, Jeff Janes jeff.ja...@gmail.com wrote:
>> On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan p...@heroku.com wrote:
>>> On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes jeff.ja...@gmail.com wrote:
>>>> Is there a way to use gdb to figure out who holds the lock they are waiting for?
>>> Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG defined? That might do it.
>> I hadn't planned on running into this problem, so had not compiled accordingly. I thought LOCK_DEBUG would produce too much output, but now I see it doesn't print anything unless Trace_lwlocks is on (but it still makes more info available to gdb, as Amit mentioned), so I think that should be OK. Since I messed up the gdb session, causing the postmaster to SIGKILL all the children and destroy the evidence, I'll repeat the run compiled with LOCK_DEBUG and see what it looks like in the morning.
> I've gotten the LWLock deadlock again. User backend 24841 holds WALInsertLock 7 and is blocked attempting to acquire 6, so it seems to be violating the lock ordering rules (although I don't see that rule spelled out in xlog.c). The checkpoint process, meanwhile, has acquired WALInsertLocks 0 through 6 and is blocked on 7. I'm not sure where to go from here.

The user backend 24841 is waiting in a LWLockWaitForVar() call, on WALInsertLock 6, and oldval == 0. The checkpointer is holding WALInsertLock 6, but it has set the insertingat value to PG_UINT64_MAX. The LWLockWaitForVar() call should not be blocked on that, because 0 != PG_UINT64_MAX.

Looks like the LWLock scalability patch that made LWLock acquisition use atomic ops instead of a spinlock broke this. LWLockWaitForVar() is supposed to guarantee that it always wakes up from sleep when the variable's value changes.
But there is now a race condition in LWLockWaitForVar that wasn't there in 9.4:

		if (mustwait)
		{
			/*
			 * Perform comparison using spinlock as we can't rely on atomic 64
			 * bit reads/stores.
			 */
#ifdef LWLOCK_STATS
			lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
#else
			SpinLockAcquire(&lock->mutex);
#endif

			/*
			 * XXX: We can significantly optimize this on platforms with 64bit
			 * atomics.
			 */
			value = *valptr;
			if (value != oldval)
			{
				result = false;
				mustwait = false;
				*newval = value;
			}
			else
				mustwait = true;
			SpinLockRelease(&lock->mutex);
		}
		else
			mustwait = false;

		if (!mustwait)
			break;				/* the lock was free or value didn't match */

		/*
		 * Add myself to wait queue. Note that this is racy, somebody else
		 * could wakeup before we're finished queuing. NB: We're using nearly
		 * the same twice-in-a-row lock acquisition protocol as
		 * LWLockAcquire(). Check its comments for details.
		 */
		LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, false);

After the spinlock is released above, but before the LWLockQueueSelf() call, it's possible that another backend comes in, acquires the lock, changes the variable's value, and releases the lock again. In 9.4, the spinlock was not released until the process was queued.

Looking at LWLockAcquireWithVar(), it's also no longer holding the spinlock while it updates the Var. That's not cool either :-(.

- Heikki
Re: [HACKERS] 9.5 release notes
I've gone through the release notes and added comments referencing commits as discussed earlier. Additionally I've improved and added a bunch of items. Further stuff that came up while looking:

* 2014-09-25 [b0d81ad] Heikki..: Add -D option to specify data directory to pg_c..
  New options, should possibly be documented?

* 2014-10-02 [3acc10c9] Robert..: Increase the number of buffer mapping partition..
  Should we mention this? This has been patched by a number of downstream vendors and users, so it's probably worth calling out?

* 2014-11-18 [606c012] Simon ..: Reduce btree scan overhead for < and > strategies
  Worthy of a note in the performance section.

* 2014-11-22 [eca2b9b] Andrew..: Rework echo_hidden for \sf and \ef from commit ..
  Seems far too minor in comparison to other stuff left out.

* 2014-11-07 [7516f52] Alvaro..: BRIN: Block Range Indexes
  Looks to me like that should just be Alvaro.

* 2014-11-22 [b62f94c] Tom Lane: Allow simplification of EXISTS() subqueries con..
  Planner change, I think it's good to mention those.

* 2014-11-28 [e384ed6] Tom Lane: Improve typcache: cache negative lookup results..
  Should perhaps, together with other cache enhancing entries, be mentioned?

* 2014-12-08 [519b075] Simon ..: Use GetSystemTimeAsFileTime directly in win32
  2014-12-08 [8001fe6] Simon ..: Windows: use GetSystemTimePreciseAsFileTime if ..
  Timer resolution isn't an unimportant thing for people using explain?

* 2014-12-12 [7e354ab] Andrew..: Add several generator functions for jsonb that ..
  2015-05-12 [c694701] Andrew..: Additional functions and operators for jsonb
  2015-05-31 [37def42] Andrew..: Rename jsonb_replace to jsonb_set and allow it ..
  2014-12-12 [237a882] Andrew..: Add json_strip_nulls and jsonb_strip_nulls fun..
  Should probably be merged? Similar set of authors and too many similar release note entries.

* 2014-12-23 [d7ee82e] Alvaro..: Add SQL-callable pg_get_object_address
  2014-12-30 [a676201] Alvaro..: Add pg_identify_object_as_address
  Should be merged.
* 2015-01-13 [4bad60e] Andres..: Allow latches to wait for socket writability wi..
  2015-01-14 [59f71a0] Andres..: Add a default local latch for use in signal han..
  2015-01-17 [ff44fba] Andres..: Replace walsender's latch with the general shar..
  2015-02-03 [387da18] Andres..: Use a nonblocking socket for FE/BE communicatio..
  2015-02-03 [4f85fde] Andres..: Introduce and use infrastructure for interrupt ..
  2015-02-03 [4fe384b] Andres..: Process 'die' interrupts while reading/writing ..
  2015-02-03 [6647248] Andres..: Don't allow immediate interrupts during authent..
  2015-02-03 [675] Andres..: Move deadlock and other interrupt handling in p..
  2015-02-13 [80788a4] Heikki..: Simplify waiting logic in reading from/writing ..

* 2015-01-17 [9402869] Heikki..: Advance backend's advertised xmin more aggressi..
  This is a pretty big efficiency boon for systems with longer nontrivial transactions.

* 2015-01-29 [ed12700] Andres..: Align buffer descriptors to cache line boundari..
  Maybe worthwhile to mention?

* 2015-02-16 [9e3ad1a] Tom Lane: Use fast path in plpgsql's RETURN/RETURN NEXT i..
  2015-02-28 [e524cbd] Tom Lane: Track typmods in plpgsql expression evaluation ..
  2015-03-04 [1345cc6] Tom Lane: Use standard casting mechanism to convert types..
  2015-03-11 [21dcda2] Tom Lane: Allocate ParamListInfo once per plpgsql functio..
  Maybe reformulate to generalize the array performance into plpgsql and mention arrays and RETURN? And combine with the casting change entry, because that's also about performance?

* 2015-02-20 [09d8d11] Tom Lane: Use FLEXIBLE_ARRAY_MEMBER in a bunch more places.
  2015-02-20 [5740be6] Tom Lane: Some more FLEXIBLE_ARRAY_MEMBER hacking.
  2015-02-20 [e38b1eb] Tom Lane: Use FLEXIBLE_ARRAY_MEMBER in struct varlena.
  2015-02-20 [c110eff] Tom Lane: Use FLEXIBLE_ARRAY_MEMBER in struct RecordIOData.
  2015-02-20 [33a3b03] Tom Lane: Use FLEXIBLE_ARRAY_MEMBER in some more places.
  2015-02-20 [33b2a2c] Tom Lane: Fix statically allocated struct with FLEXIBLE_A..
  2015-02-21 [f2874fe] Tom Lane: Some more FLEXIBLE_ARRAY_MEMBER fixes.
  2015-02-21 [e1a11d9] Tom Lane: Use FLEXIBLE_ARRAY_MEMBER for HeapTupleHeaderDa..
  2015-02-21 [2e21121] Tom Lane: Use FLEXIBLE_ARRAY_MEMBER in a number of other ..
  Maybe add a note in the source code section? Not sure if relevant enough.

* 2015-02-21 [b419865] Jeff D..: In array_agg(), don't create a new context for ..
  Performance improvement?

* 2015-02-27 [f65e827] Tom Lane: Invent a memory context reset/delete callback m..
  Interesting for devs, maybe add?

* 2015-03-11 [b557226] Tom Lane: Improve planner's cost estimation in the presen..
  2015-03-03 [b989619] Tom Lane: Fix cost estimation for indexscans on expensive..
  2015-03-11 [f4abd02] Tom Lane: Support flattening of empty-FROM subqueries and..
  2015-05-04 [2503982] Tom Lane: Improve procost estimates for some text search ..
Re: [HACKERS] LWLock deadlock and gdb advice
On 2015-06-30 21:08:53 +0300, Heikki Linnakangas wrote:
> 			/*
> 			 * XXX: We can significantly optimize this on platforms with 64bit
> 			 * atomics.
> 			 */
> 			value = *valptr;
> 			if (value != oldval)
> 			{
> 				result = false;
> 				mustwait = false;
> 				*newval = value;
> 			}
> 			else
> 				mustwait = true;
> 			SpinLockRelease(&lock->mutex);
> 		}
> 		else
> 			mustwait = false;
>
> 		if (!mustwait)
> 			break;				/* the lock was free or value didn't match */
>
> 		/*
> 		 * Add myself to wait queue. Note that this is racy, somebody else
> 		 * could wakeup before we're finished queuing. NB: We're using nearly
> 		 * the same twice-in-a-row lock acquisition protocol as
> 		 * LWLockAcquire(). Check its comments for details.
> 		 */
> 		LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, false);
>
> After the spinlock is released above, but before the LWLockQueueSelf() call, it's possible that another backend comes in, acquires the lock, changes the variable's value, and releases the lock again. In 9.4, the spinlock was not released until the process was queued.

Hm. Right. A recheck of the value after the queuing should be sufficient to fix? That's how we deal with the exact same scenarios for the normal lock state, so that shouldn't be very hard to add.

> Looking at LWLockAcquireWithVar(), it's also no longer holding the spinlock while it updates the Var. That's not cool either :-(.

Hm. I'd somehow assumed holding the lwlock ought to be sufficient, but it really isn't. This var stuff isn't fitting all that well into the lwlock code.

Greetings,

Andres Freund
[HACKERS] Mention column name in error messages
Hi all,

As far as I know, there is currently no way to find out which column is triggering an error on an INSERT or ALTER COLUMN statement. Example:

# create table foo(bar varchar(4), baz varchar(2));
CREATE TABLE
# insert into foo values ('foo!', 'ok');
INSERT 0 1
# insert into foo values ('foo2!', 'ok');
ERROR: value too long for type character varying(4)
# insert into foo values ('foo!', 'ok2');
ERROR: value too long for type character varying(2)

I browsed the list and found a conversation from last year (http://www.postgresql.org/message-id/3214.1390155...@sss.pgh.pa.us) that discussed adding the actual value to the output. The behavior I am proposing differs in the sense that we will be able to see in the ERROR which column triggered the error:

# create table foo(bar varchar(4), baz varchar(2));
CREATE TABLE
# insert into foo values ('foo!', 'ok');
INSERT 0 1
# insert into foo values ('foo2!', 'ok');
ERROR: value too long for bar of type character varying(4)
# insert into foo values ('foo!', 'ok2');
ERROR: value too long for baz of type character varying(2)

In that same conversation, Tom Lane said:

> [...] People have speculated about ways to name the target column in the error report, which would probably be far more useful; but it's not real clear how to do that in our system structure.

Given my very restricted knowledge of PG's codebase, I am not sure whether my modifications are legitimate, so please don't hesitate to comment on them and point out where things are subpar relative to PG's codebase. In any case, it's to be considered WIP for the moment.

Thanks in advance,
Franck

--
Franck Verrot
https://github.com/franckverrot
https://twitter.com/franckverrot

verbose-validation.patch Description: Binary data
Re: [HACKERS] LWLock deadlock and gdb advice
On 06/30/2015 10:09 PM, Andres Freund wrote:
> On 2015-06-30 21:08:53 +0300, Heikki Linnakangas wrote:
>> 			/*
>> 			 * XXX: We can significantly optimize this on platforms with 64bit
>> 			 * atomics.
>> 			 */
>> 			value = *valptr;
>> 			if (value != oldval)
>> 			{
>> 				result = false;
>> 				mustwait = false;
>> 				*newval = value;
>> 			}
>> 			else
>> 				mustwait = true;
>> 			SpinLockRelease(&lock->mutex);
>> 		}
>> 		else
>> 			mustwait = false;
>>
>> 		if (!mustwait)
>> 			break;				/* the lock was free or value didn't match */
>>
>> 		/*
>> 		 * Add myself to wait queue. Note that this is racy, somebody else
>> 		 * could wakeup before we're finished queuing. NB: We're using nearly
>> 		 * the same twice-in-a-row lock acquisition protocol as
>> 		 * LWLockAcquire(). Check its comments for details.
>> 		 */
>> 		LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, false);
>>
>> After the spinlock is released above, but before the LWLockQueueSelf() call, it's possible that another backend comes in, acquires the lock, changes the variable's value, and releases the lock again. In 9.4, the spinlock was not released until the process was queued.
> Hm. Right. A recheck of the value after the queuing should be sufficient to fix? That's how we deal with the exact same scenarios for the normal lock state, so that shouldn't be very hard to add.

Yeah. It's probably more efficient to not release the spinlock between checking the value and LWLockQueueSelf() though.

>> Looking at LWLockAcquireWithVar(), it's also no longer holding the spinlock while it updates the Var. That's not cool either :-(.
> Hm. I'd somehow assumed holding the lwlock ought to be sufficient, but it really isn't. This var stuff isn't fitting all that well into the lwlock code.

I'll take a stab at fixing this tomorrow. I take the blame for not documenting the semantics of LWLockAcquireWithVar() and friends well enough...

- Heikki
Re: [HACKERS] LWLock deadlock and gdb advice
On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote:
>> Hm. Right. A recheck of the value after the queuing should be sufficient to fix? That's how we deal with the exact same scenarios for the normal lock state, so that shouldn't be very hard to add.
> Yeah. It's probably more efficient to not release the spinlock between checking the value and LWLockQueueSelf() though.

Right now LWLockQueueSelf() takes spinlocks etc. itself, and is used that way in a bunch of callsites, so that'd be harder. Additionally, I'm planning to get rid of the spinlocks around queuing (they show up as bottlenecks in contended workloads), so it seems more future-proof not to assume that either way. On top of that, I think we should, when available (or using the same type of fallback as for 32-bit?), use 64-bit atomics for the var anyway.

> I'll take a stab at fixing this tomorrow.

Thanks! Do you mean both or just the second issue?

Greetings,

Andres Freund
Re: [HACKERS] pg_trgm version 1.2
On Tue, Jun 30, 2015 at 2:46 AM, Alexander Korotkov a.korot...@postgrespro.ru wrote:
> On Sun, Jun 28, 2015 at 1:17 AM, Jeff Janes jeff.ja...@gmail.com wrote:
>> This patch implements version 1.2 of contrib module pg_trgm. This supports the triconsistent function, introduced in version 9.4 of the server, to make it faster to implement indexed queries where some keys are common and some are rare.
> Thank you for the patch! Lack of pg_trgm triconsistent support was a significant miss after the fast scan implementation.
>> I've included the paths to both upgrade and downgrade between 1.1 and 1.2, although after doing so you must close and restart the session before you can be sure the change has taken effect. There is no change to the on-disk index structure.
> pg_trgm--1.1.sql and pg_trgm--1.1--1.2.sql are useful for debug, but do you expect them in the final commit? As I can see, in other contribs we have only the last version and upgrade scripts.

I had thought that pg_trgm--1.1.sql was needed for pg_upgrade to work, but I see that that is not the case. I did see another downgrade path for a different module, but on closer inspection it was one that I wrote while trying to analyze that module, and then never removed. I have no objection to removing pg_trgm--1.2--1.1.sql before the commit, but I don't see what we gain by doing so. If a downgrade is feasible and has been tested, why not include it?

...

>> You could get the same benefit just by increasing MAX_MAYBE_ENTRIES (in core) from 4 to some higher value (which it probably should be anyway, but there will always be a case where it needs to be higher than you can afford it to be, so a real solution is needed).
> Actually, it depends on how long it takes to calculate the consistent function. The cheaper the consistent function is, the higher MAX_MAYBE_ENTRIES could be. Since all functions in PostgreSQL may define their cost, GIN could calculate MAX_MAYBE_ENTRIES from the cost of the consistent function.
The consistent function gets called on every candidate tuple, so if it is expensive then it is also all the more worthwhile to reduce the set of candidate tuples. Perhaps MAX_MAYBE_ENTRIES could be calculated from the log of the maximum of the predictNumberResult entries? Anyway, a subject for a different day.

>> There may also be some gains in the similarity and regex cases, but I didn't really analyze those for performance.
>> I've thought about how to document this change. Looking at examples of other contrib modules with multiple versions, I decided that we don't document them, other than in the release notes. The same patch applies to 9.4 code with a minor conflict in the Makefile, and gives benefits there as well.
> Some other notes about the patch:
> * You allocate boolcheck array and don't use it.

That was a bug (at least notionally). trigramsMatchGraph was supposed to be getting boolcheck, not check, passed in to it. It may not have been a bug in practice, because GIN_MAYBE and GIN_TRUE both test as true when cast to booleans. But it seems wrong to rely on that. Or was it intended to work this way? I'm surprised the compiler suite doesn't emit some kind of warning on that.

> * Check coding style and formatting, in particular check[i]==GIN_TRUE should be check[i] == GIN_TRUE.

Sorry about that, fixed. I also changed it in other places to check[i] != GIN_FALSE, rather than checking against both GIN_TRUE and GIN_MAYBE. At first I was concerned we might decide to add a 4th option to the type, which would render != GIN_FALSE the wrong way to test for it. But since it is called GinTernaryValue, I think we wouldn't add a fourth thing to it. But perhaps the more verbose form is clearer to some people.

> * I think some comments are needed in gin_trgm_triconsistent() about trigramsMatchGraph(). gin_trgm_triconsistent() may use trigramsMatchGraph() that way because trigramsMatchGraph() implements a monotonic boolean function.
I have a function-level comment that in all cases, GIN_TRUE is at least as favorable to inclusion of a tuple as GIN_MAYBE. Should I reiterate that comment before trigramsMatchGraph() as well? Calling it a monotonic boolean function is precise and concise, but probably less understandable to people reading the code. At least, if I saw that, I would need to go do some reading before I knew what it meant.

Update attached, with the downgrade path still included for now.

Thanks,

Jeff

pg_trgm_1_2_v002.patch Description: Binary data
[HACKERS] Dereferenced pointer in tablesample.c
Hi all,
(Petr in CC)

Coverity is complaining about the following pointer dereference in tablesample_init@tablesample.c:

+		ExprState  *argstate = ExecInitExpr(argexpr, (PlanState *) scanstate);
+
+		if (argstate == NULL)
+		{
+			fcinfo.argnull[i] = true;
+			fcinfo.arg[i] = (Datum) 0;;
+		}
+
+		fcinfo.arg[i] = ExecEvalExpr(argstate, econtext,
+									 &fcinfo.argnull[i], NULL);

If the expression argexpr is NULL when calling ExecInitExpr(), argstate is going to be NULL and dereferenced afterwards; see execQual.c for more details. Hence I think that the patch attached should be applied. Thoughts?

At the same time I noted a double semicolon, fixed as well in the attached.

Regards,
-- Michael

diff --git a/src/backend/access/tablesample/tablesample.c b/src/backend/access/tablesample/tablesample.c
index 44a2434..9d443b1 100644
--- a/src/backend/access/tablesample/tablesample.c
+++ b/src/backend/access/tablesample/tablesample.c
@@ -113,11 +113,13 @@ tablesample_init(SampleScanState *scanstate, TableSampleClause *tablesample)
 		if (argstate == NULL)
 		{
 			fcinfo.argnull[i] = true;
-			fcinfo.arg[i] = (Datum) 0;;
+			fcinfo.arg[i] = (Datum) 0;
+		}
+		else
+		{
+			fcinfo.arg[i] = ExecEvalExpr(argstate, econtext,
+										 &fcinfo.argnull[i], NULL);
 		}
-
-		fcinfo.arg[i] = ExecEvalExpr(argstate, econtext,
-									 &fcinfo.argnull[i], NULL);
 		i++;
 	}
 	Assert(i == fcinfo.nargs);
Re: [HACKERS] Reduce ProcArrayLock contention
On Tue, Jun 30, 2015 at 11:53 AM, Simon Riggs si...@2ndquadrant.com wrote:
> On 30 June 2015 at 04:21, Amit Kapila amit.kapil...@gmail.com wrote:
>> Now, I would like to briefly explain how the allow-one-waker idea has helped to improve the patch, as not everybody here was present in that un-conference.
> The same idea applies for marking commits in clog, for which I have been sitting on a patch for a month or so and will post now I'm done travelling.

Sure, and I think we might want to try something similar even for XLogFlush, where we use LWLockAcquireOrWait for WALWriteLock. Not sure how it will work out in that case as I/O is involved, but I think it is worth trying.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] drop/truncate table sucks for large values of shared buffers
On Tue, Jun 30, 2015 at 11:00 AM, Simon Riggs si...@2ndquadrant.com wrote:
> On 30 June 2015 at 05:02, Amit Kapila amit.kapil...@gmail.com wrote:
>> On Mon, Jun 29, 2015 at 7:18 PM, Simon Riggs si...@2ndquadrant.com wrote:
>>> On 28 June 2015 at 17:17, Tom Lane t...@sss.pgh.pa.us wrote:
>>> If lseek fails badly then SeqScans would give *silent* data loss, which in my view is worse. Just added pages aren't the only thing we might miss if lseek is badly wrong.
>> So for the purpose of this patch, do we need to assume that lseek can give us the wrong size of file, and we should add preventive checks and other handling for the same? I am okay to change that way, if we are going to have that as an assumption in our code wherever we are using it or will use it in future; otherwise we will end up with some preventive checks which are actually not required.
> They're preventative checks. You always hope it is wasted effort.

I am not sure if preventative checks (without a real need) are okay if they are not cheap, which could happen in this case. I think validating the buffer tag would require a rel or sys cache lookup.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
[HACKERS] Reducing ClogControlLock contention
ClogControlLock contention is high at commit time. This appears to be due to the fact that ClogControlLock is acquired in Exclusive mode prior to marking commit, which then gets starved by backends running TransactionIdGetStatus().

Proposal for improving this is to acquire the ClogControlLock in Shared mode, if possible. This is safe because people checking visibility of an xid must always run TransactionIdIsInProgress() first to avoid race conditions, which will always return true for the transaction we are currently committing. As a result, we never get concurrent access to the same bits in clog, which would require a barrier.

Two concurrent writers might access the same word concurrently, so we protect against that with a new CommitLock. We could partition that by pageno also, if needed.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services

clog_optimize.v3.patch Description: Binary data
Re: [HACKERS] Reducing ClogControlLock contention
On Tue, Jun 30, 2015 at 4:02 PM, Simon Riggs si...@2ndquadrant.com wrote:
> ClogControlLock contention is high at commit time. This appears to be due to the fact that ClogControlLock is acquired in Exclusive mode prior to marking commit, which then gets starved by backends running TransactionIdGetStatus().
> Proposal for improving this is to acquire the ClogControlLock in Shared mode, if possible. This is safe because people checking visibility of an xid must always run TransactionIdIsInProgress() first to avoid race conditions, which will always return true for the transaction we are currently committing. As a result, we never get concurrent access to the same bits in clog, which would require a barrier. Two concurrent writers might access the same word concurrently, so we protect against that with a new CommitLock. We could partition that by pageno also, if needed.

Could it be possible to see some performance numbers? For example with a simple pgbench script doing a bunch of tiny transactions, with many concurrent sessions (perhaps hundreds).

--
Michael
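One sketch of such a test is a custom pgbench script issuing one small write per transaction, committed immediately — the table and column names below are made up for illustration, not from the thread:

```
\setrandom aid 1 100000
BEGIN;
INSERT INTO clog_bench(v) VALUES (:aid);
COMMIT;
```

After creating the (hypothetical) table with CREATE TABLE clog_bench(v int), this could be run at high client counts, e.g. pgbench -n -f tiny_commit.sql -c 128 -j 8 -T 60, comparing tps with and without the patch.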
Re: [HACKERS] Reduce ProcArrayLock contention
On 30 June 2015 at 03:43, Robert Haas robertmh...@gmail.com wrote:
> On Mon, Jun 29, 2015 at 1:22 PM, Simon Riggs si...@2ndquadrant.com wrote:
>> Yes, I know. And we all had a long conversation about how to do it without waking up the other procs. Forming a list, like we use for sync rep, and having just a single process walk the queue was the way I suggested then and previously. Weird.
> I am not sure what your point is. Are you complaining that you didn't get a design credit for this patch? If so, I think that's a bit petty. I agree that you mentioned something along these lines at PGCon, but Amit and I have been discussing this every week for over a month, so it's not as if the conversations at PGCon were the only ones, or the first. Nor is there a conspiracy to deprive Simon Riggs of credit for his ideas. I believe that you should assume good faith and take it for granted that Amit credited who he believed that he got his ideas from. The fact that you may have had similar ideas does not mean that he got his from you. It probably does mean that they are good ideas, since we are apparently all thinking in the same way.

Oh, I accept multiple people can have the same ideas. That happens to me a lot around here.

What I find weird is that the discussion was so intense about LWLockAcquireOrWait that when someone presented a solution there were people that didn't notice. It makes me wonder whether large group discussions are worth it.

What I find even weirder is the thought that there were another 100 people in the room, and it makes me wonder whether others present had even better ideas but didn't speak up for some other reason. I guess that happens on-list all the time; it's just that we seldom experience the number of people watching our discussions. I wonder how many times people give good ideas and we ignore them, myself included.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] LWLock deadlock and gdb advice
On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan p...@heroku.com wrote:
> On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes jeff.ja...@gmail.com wrote:
>> Is there a way to use gdb to figure out who holds the lock they are waiting for?
> Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG defined? That might do it.

I hadn't planned on running into this problem, so had not compiled accordingly. I thought LOCK_DEBUG would produce too much output, but now I see it doesn't print anything unless Trace_lwlocks is on (but it still makes more info available to gdb, as Amit mentioned), so I think that should be OK. Since I messed up the gdb session, causing the postmaster to SIGKILL all the children and destroy the evidence, I'll repeat the run compiled with LOCK_DEBUG and see what it looks like in the morning.

Thanks,

Jeff
Re: [HACKERS] drop/truncate table sucks for large values of shared buffers
On 30 June 2015 at 07:34, Amit Kapila amit.kapil...@gmail.com wrote:
> On Tue, Jun 30, 2015 at 11:00 AM, Simon Riggs si...@2ndquadrant.com wrote:
>> On 30 June 2015 at 05:02, Amit Kapila amit.kapil...@gmail.com wrote:
>>> On Mon, Jun 29, 2015 at 7:18 PM, Simon Riggs si...@2ndquadrant.com wrote:
>>>> On 28 June 2015 at 17:17, Tom Lane t...@sss.pgh.pa.us wrote:
>>>> If lseek fails badly then SeqScans would give *silent* data loss, which in my view is worse. Just added pages aren't the only thing we might miss if lseek is badly wrong.
>>> So for the purpose of this patch, do we need to assume that lseek can give us the wrong size of file, and we should add preventive checks and other handling for the same? I am okay to change that way, if we are going to have that as an assumption in our code wherever we are using it or will use it in future; otherwise we will end up with some preventive checks which are actually not required.
>> They're preventative checks. You always hope it is wasted effort.
> I am not sure if preventative checks (without a real need) are okay if they are not cheap, which could happen in this case. I think validating the buffer tag would require a rel or sys cache lookup.

True, so don't do that. Keep a list of dropped relations and have the checkpoint process scan the buffer pool every 64 tables, kinda like AbsorbFsync. All the heavy lifting gets done in a background process and we know we're safe.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Reduce ProcArrayLock contention
On 30 June 2015 at 04:21, Amit Kapila amit.kapil...@gmail.com wrote: Now, I would like to briefly explain how the allow-one-waker idea has helped to improve the patch, as not everybody here was present at that unconference. The same idea applies to marking commits in clog, for which I have been sitting on a patch for a month or so and will post now that I'm done travelling. These ideas have been around for some time and are even listed on the PostgreSQL TODO: http://archives.postgresql.org/pgsql-hackers/2007-09/msg00206.php -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Reduce ProcArrayLock contention
On 30 June 2015 at 07:30, Amit Kapila amit.kapil...@gmail.com wrote: Sure, and I think we might want to try something similar even for XLogFlush, where we use LWLockAcquireOrWait for WALWriteLock. I'm not sure how it will work out in that case as I/O is involved, but I think it is worth trying. +1 -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Reduce ProcArrayLock contention
On Mon, Jun 29, 2015 at 11:14 PM, Simon Riggs si...@2ndquadrant.com wrote: What I find weird is that the discussion was so intense about LWLockAcquireOrWait that when someone presented a solution there were people that didn't notice. It makes me wonder whether large group discussions are worth it. I didn't think of this myself, but it feels like something I could have thought of easily. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PANIC in GIN code
On 06/30/2015 02:18 AM, Jeff Janes wrote: On Mon, Jun 29, 2015 at 2:08 PM, Heikki Linnakangas hlinn...@iki.fi wrote: I just pushed a fix for this, but unfortunately it didn't make it into 9.5alpha1. Thanks. I think that that fixed it. It survived for over an hour this time. Thanks. I grepped through all the other call sites of XLogInitBufferForRedo() to look for similar bugs, and sure enough, there was one more bug of the same sort, in B-tree page deletion. That one was in 9.4 as well. Fixed that too, and added an assertion into PageGetSpecialPointer() that would help to catch this sort of bug earlier in the future. - Heikki
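A hedged sketch of the kind of sanity check being described (the real page header and PageGetSpecialPointer() live in src/include/storage/bufpage.h; the struct and bounds below are simplified stand-ins, not the actual assertion that was committed):

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192            /* default PostgreSQL block size */

/* Hypothetical, trimmed-down page header. */
typedef struct
{
    uint16_t    pd_lower;      /* offset to start of free space */
    uint16_t    pd_upper;      /* offset to end of free space */
    uint16_t    pd_special;    /* offset to start of special space */
} FakePageHeader;

/* Refuse to hand out a "special space" offset on a page whose header
 * was never initialized (pd_special is 0 on an all-zeroes page) or is
 * otherwise out of bounds.  Catching this at the accessor means a
 * redo routine that forgot to initialize the page fails fast with an
 * assertion instead of silently scribbling on garbage. */
static uint16_t
page_get_special_offset(const FakePageHeader *page)
{
    assert(page->pd_special >= sizeof(FakePageHeader));
    assert(page->pd_special <= BLCKSZ);
    return page->pd_special;
}
```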
Re: [HACKERS] Solaris testers wanted for strxfrm() behavior
On 06/29/2015 07:53 PM, Robert Haas wrote: On Mon, Jun 29, 2015 at 6:07 PM, Josh Berkus j...@agliodbs.com wrote: On 06/29/2015 02:08 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: Joyent confirms that the bug is fixed on SmartOS: The more interesting bit of information would be *when* it was fixed. Answer: not certain, but fixed at least 2 years ago. 2 years is definitely not long enough to assume that no unpatched machines exist in the wild. :-( Well, AFAIK, 90% of SmartOS users are on the Joyent Cloud and get upgraded automatically. It's not much used as a portable OS. So for them, it is enough. Given that Theo's patch against Illumos went in in 2012, I think we can take that as our cutoff date for Illumos. The question is: how many folks out there are running PostgreSQL on Solaris 10? And are they at all likely to upgrade to PostgreSQL 9.5? Unfortunately, these questions are coming up right when Bjorn is on vacation ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Solaris testers wanted for strxfrm() behavior
Josh Berkus j...@agliodbs.com writes: The question is: how many folks out there are running PostgreSQL on Solaris 10? And are they at all likely to upgrade to PostgreSQL 9.5? That's only the pertinent question if the bug exists on Solaris 10, which I don't think we know, do we? Oskari Saarenmaa's results upthread seemed to indicate not. regards, tom lane
Re: [HACKERS] Mention column name in error messages
Franck Verrot fra...@verrot.fr writes: As far as I know, there is currently no way to find which column is triggering an error on an INSERT or ALTER COLUMN statement. Example: Indeed ... Given my very restricted knowledge of PG's codebase I am not sure whether my modifications are legitimate or not, so please don't hesitate to comment on it and point out where things fall short of PG's standards. In any case, it's to be considered WIP for the moment. I think the way you're going at this is fundamentally wrong. It amounts to an (undocumented) change to the API for datatype input functions, and the only way it can work is to change *every datatype's* input function to know about it. That's not practical. More, it doesn't cover errors that are thrown from someplace other than a datatype input function. Evaluation errors in domain check constraint expressions are an example that would be very hard to manage this way. And the approach definitely doesn't scale nicely when you start thinking about other contexts wherein a datatype input operation or coercion might fail. What seems more likely to lead to a usable patch is to arrange for the extra information you want to be emitted as error context, via an error context callback that gets installed at the right times. We already have that set up for datatype-specific errors detected by COPY IN, for example. If you feed garbage data to COPY you get something like ERROR: invalid input syntax for integer: "foo" CONTEXT: COPY int8_tbl, line 2, column q2: "foo" with no need for int8in to be directly aware of the context. You should try adapting that methodology for the cases you're worried about. regards, tom lane
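To make the callback mechanism concrete, here is a minimal, self-contained mock of the pattern being described (the real machinery is ErrorContextCallback and error_context_stack in src/include/utils/elog.h; everything below, including the struct and function names, is a simplified stand-in): the code reporting an error walks a stack of callbacks, and each callback appends context about what its caller was doing, so the datatype input function itself stays ignorant of COPY.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical mock of an error-context stack entry. */
typedef struct ErrCtx
{
    struct ErrCtx *previous;
    void        (*callback) (void *arg, char *buf, size_t buflen);
    void         *arg;
} ErrCtx;

static ErrCtx *err_ctx_stack = NULL;

/* State that COPY would carry around while loading rows. */
typedef struct
{
    const char *relname;
    const char *colname;
    int         lineno;
} CopyState;

/* Context callback: formats "COPY <rel>, line N, column <col>" without
 * the datatype input function knowing anything about COPY. */
static void
copy_error_callback(void *arg, char *buf, size_t buflen)
{
    CopyState  *cs = arg;

    snprintf(buf, buflen, "COPY %s, line %d, column %s",
             cs->relname, cs->lineno, cs->colname);
}

/* Build the CONTEXT line an error report would carry by walking the
 * stack of installed callbacks. */
static void
build_context(char *buf, size_t buflen)
{
    buf[0] = '\0';
    for (ErrCtx *c = err_ctx_stack; c != NULL; c = c->previous)
        c->callback(c->arg, buf, buflen);
}
```

Usage follows the push/pop discipline Tom alludes to: COPY pushes an entry onto the stack before calling the input function and pops it afterwards, so any error raised in between picks up the context automatically.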
Re: [HACKERS] pg_trgm version 1.2
Jeff Janes jeff.ja...@gmail.com writes: On Tue, Jun 30, 2015 at 2:46 AM, Alexander Korotkov a.korot...@postgrespro.ru wrote: pg_trgm--1.1.sql and pg_trgm--1.1--1.2.sql are useful for debugging, but do you expect them in the final commit? As I can see in other contribs we have only the last version and upgrade scripts. I did see another downgrade path for a different module, but on closer inspection it was one that I wrote while trying to analyze that module, and then never removed. I have no objection to removing pg_trgm--1.2--1.1.sql before the commit, but I don't see what we gain by doing so. If a downgrade is feasible and has been tested, why not include it? Because we don't want to support 1.1 anymore once 1.2 exists. You're supposing that just because you wrote the downgrade script and think it works, there's no further costs associated with having that. Personally, I don't even want to review such a script, let alone document its existence and why someone might want to use it, let alone support 1.1 into the far future. regards, tom lane