[HACKERS] Simplifying the interface of UpdateMinRecoveryPoint
Hi all, As of now UpdateMinRecoveryPoint() is using two arguments: - lsn, to check if the minimum recovery point should be updated to that position - force, a boolean flag to decide if the update should be enforced or not. However those two arguments are correlated: if lsn is InvalidXLogRecPtr, the minimum recovery point update will be enforced. Hence, why not simplify its interface and remove the force flag? See attached. Thanks, -- Michael diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index e4645a3..5d69bd3 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -827,7 +827,7 @@ static void RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRec static void UpdateLastRemovedPtr(char *filename); static void ValidateXLOGDirectoryStructure(void); static void CleanupBackupHistory(void); -static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force); +static void UpdateMinRecoveryPoint(XLogRecPtr lsn); static XLogRecord *ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode, bool fetching_ckpt); static void CheckRecoveryConsistency(void); @@ -2489,14 +2489,16 @@ XLogGetReplicationSlotMinimumLSN(void) * If we crash during recovery, we must reach this point again before the * database is consistent. * - * If 'force' is true, 'lsn' argument is ignored. Otherwise, minRecoveryPoint - * is only updated if it's not already greater than or equal to 'lsn'. + * If 'lsn' is InvalidXLogRecPtr, enforce its advance. Otherwise, + * minRecoveryPoint is only updated if it's not already greater than or + * equal to 'lsn'.
*/ static void -UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force) +UpdateMinRecoveryPoint(XLogRecPtr lsn) { /* Quick check using our local copy of the variable */ - if (!updateMinRecoveryPoint || (!force && lsn <= minRecoveryPoint)) + if (!updateMinRecoveryPoint || + (!XLogRecPtrIsInvalid(lsn) && lsn <= minRecoveryPoint)) return; LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); @@ -2512,7 +2514,7 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force) */ if (minRecoveryPoint == 0) updateMinRecoveryPoint = false; - else if (force || minRecoveryPoint < lsn) + else if (XLogRecPtrIsInvalid(lsn) || minRecoveryPoint < lsn) { XLogRecPtr newMinRecoveryPoint; TimeLineID newMinRecoveryPointTLI; @@ -2520,8 +2522,7 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force) /* * To avoid having to update the control file too often, we update it * all the way to the last record being replayed, even though 'lsn' - * would suffice for correctness. This also allows the 'force' case - * to not need a valid 'lsn' value. + * would suffice for correctness. * * Another important reason for doing it this way is that the passed * 'lsn' value could be bogus, i.e., past the end of available WAL, if @@ -2535,7 +2536,7 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force) newMinRecoveryPointTLI = XLogCtl->replayEndTLI; SpinLockRelease(&XLogCtl->info_lck); - if (!force && newMinRecoveryPoint < lsn) + if (!XLogRecPtrIsInvalid(lsn) && newMinRecoveryPoint < lsn) elog(WARNING, "xlog min recovery request %X/%X is past current point %X/%X", (uint32) (lsn >> 32), (uint32) lsn, @@ -2582,7 +2583,8 @@ XLogFlush(XLogRecPtr record) */ if (!XLogInsertAllowed()) { - UpdateMinRecoveryPoint(record, false); + Assert(!XLogRecPtrIsInvalid(record)); + UpdateMinRecoveryPoint(record); return; } @@ -5244,7 +5246,7 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog) /* * Update min recovery point one last time.
*/ - UpdateMinRecoveryPoint(InvalidXLogRecPtr, true); + UpdateMinRecoveryPoint(InvalidXLogRecPtr); /* * If the ending log segment is still open, close it (to avoid problems on @@ -8750,7 +8752,7 @@ CreateRestartPoint(int flags) (uint32) (lastCheckPoint.redo >> 32), (uint32) lastCheckPoint.redo))); - UpdateMinRecoveryPoint(InvalidXLogRecPtr, true); + UpdateMinRecoveryPoint(InvalidXLogRecPtr); if (flags & CHECKPOINT_IS_SHUTDOWN) { LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] RecoveryTargetTLI dead variable in XLogCtlData
Hi all, I just bumped into $subject, a variable that is never set and never used: --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -631,8 +631,6 @@ typedef struct XLogCtlData TimeLineID replayEndTLI; /* timestamp of last COMMIT/ABORT record replayed (or being replayed) */ TimestampTz recoveryLastXTime; - /* current effective recovery target timeline */ - TimeLineID RecoveryTargetTLI; Thanks, -- Michael diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index e4645a3..232f533 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -631,8 +631,6 @@ typedef struct XLogCtlData TimeLineID replayEndTLI; /* timestamp of last COMMIT/ABORT record replayed (or being replayed) */ TimestampTz recoveryLastXTime; - /* current effective recovery target timeline */ - TimeLineID RecoveryTargetTLI; /* * timestamp of when we started replaying the current chunk of WAL data,
Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails
On Tue, Jul 12, 2016 at 12:22 PM, Amit Kapila wrote: > I think updating minRecoveryPoint unconditionally can change its > purpose in some cases. Refer below comments in code: > > * minRecoveryPoint is updated to the latest replayed LSN whenever we > * flush a data change during archive recovery. That guards against > * starting archive recovery, aborting it, and restarting with an earlier > * stop location. If we've already flushed data changes from WAL record X > * to disk, we mustn't start up until we reach X again. > > Now, as per above rule, the value of minRecoveryPoint can be much > smaller than XLogCtl->replayEndRecPtr. I think this can change the > rules when we can allow read-only connections. That would delay the moment read-only connections in hot standby are allowed in the case where a server is stopped with "immediate", then restarted with a different target if no data has been flushed when replaying. > I think your and Kyotaro-san's point that minRecoveryPoint should be > updated to support back-ups on standby has merits, but I think doing > it unconditionally might lead to change in behaviour in some cases. If we want to tackle the case I mentioned above, one way is to just update minRecoveryPoint when an exclusive or a non-exclusive backup is being taken by looking at their status in shared memory. See for example the patch (1) attached, but this does not save us from the case where a node replays WAL, does not have data flushes, and from which a backup is taken, in the case where this node gets restarted later with the immediate mode and has different replay targets. Another way that just popped into my mind is to add dedicated fields to XLogCtl that set the stop LSN of a backup the way it should be instead of using minRecoveryPoint. In short we'd update those fields in CreateRestartPoint and UpdateMinRecoveryPoint under XLogCtl->info_lck. The good thing is that this lock is already taken there. See patch (2) accomplishing that.
Thinking about that, patch (2) is far cleaner, and takes care of not advancing minRecoveryPoint where not needed, but it does it for the base backups as they should be. So the dependency between the minimum recovery point and the end locations of a backup get reduced. -- Michael diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index e4645a3..fa37ff1 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -8739,8 +8739,10 @@ CreateRestartPoint(int flags) * immediate shutdown, though. * * We don't explicitly advance minRecoveryPoint when we do create a - * restartpoint. It's assumed that flushing the buffers will do that as a - * side-effect. + * restartpoint as it is assumed that flushing the buffers will do that as + * a side-effect except when a backup is running to ensure that the start + * LSN location of a backup is not newer than minRecoveryPoint which is + * used as the stop location of a backup. */ if (XLogRecPtrIsInvalid(lastCheckPointRecPtr) || lastCheckPoint.redo <= ControlFile->checkPointCopy.redo) @@ -8762,6 +8764,8 @@ CreateRestartPoint(int flags) LWLockRelease(CheckpointLock); return false; } + else if (BackupInProgress(false)) + UpdateMinRecoveryPoint(InvalidXLogRecPtr, true); /* * Update the shared RedoRecPtr so that the startup process can calculate @@ -10866,14 +10870,28 @@ rm_redo_error_callback(void *arg) /* * BackupInProgress: check if online backup mode is active * - * This is done by checking for existence of the "backup_label" file. + * This is done by checking for existence of the "backup_label" file or by + * looking at the shared memory status of backups. 
*/ bool -BackupInProgress(void) +BackupInProgress(bool label_check) { - struct stat stat_buf; + bool res; + + if (label_check) + { + struct stat stat_buf; + res = (stat(BACKUP_LABEL_FILE, &stat_buf) == 0); + } + else + { + WALInsertLockAcquireExclusive(); + res = XLogCtl->Insert.nonExclusiveBackups > 0 || + XLogCtl->Insert.exclusiveBackup; + WALInsertLockRelease(); + } - return (stat(BACKUP_LABEL_FILE, &stat_buf) == 0); + return res; } /* diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c index 33383b4..2f381cd 100644 --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -636,7 +636,7 @@ pg_xlog_location_diff(PG_FUNCTION_ARGS) Datum pg_is_in_backup(PG_FUNCTION_ARGS) { - PG_RETURN_BOOL(BackupInProgress()); + PG_RETURN_BOOL(BackupInProgress(true)); } /* diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 19d11e0..6795c69 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -3527,7 +3527,7 @@ PostmasterStateMachine(void) /* * PM_WAIT_BACKUP state ends when online backup mode is not active. */ - if (!BackupInProgress()) + if (!BackupInProgress(true)) pmState
Re: [HACKERS] pg_basebackup wish list
On Wed, Jul 13, 2016 at 1:53 AM, Jeff Janes wrote: > I've been having some adventures with pg_basebackup lately, and had > some suggestions based on those. One more item: pg_basebackup never removes the directory specified by the -D option, even if execution fails. initdb behaves that way, and I think doing the same would be helpful for backup operations. Regards, Masahiko Sawada
Re: [HACKERS] Showing parallel status in \df+
* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote: > On 7/12/16 12:17 PM, Tom Lane wrote: > > It's sounding to me like we have consensus on this proposal to further > > change \df+ to replace the "Source code" column with "Internal name", > > which is prosrc for C and internal-language functions but NULL otherwise. > > > > If I've not heard objections by tomorrow I'll go make that change. > > > > Are we satisfied with telling people to use \sf to see the source code > > for a PL function? Or should there be another variant of \df that > > still provides source code? > > I'm quite fond of having the full source code show in \df+ and I'm curious how it's useful and in what way \sf does not accomplish what you use \df+ for. I understand that's a change, but I believe it's a positive one and would make \df+ much more generally useful. I tend to resort to selecting columns out of pg_proc more often than I use \df+, which is certainly not what we're going for. > I'm > against removing it on short notice past beta2 We've already had to change the structure of \df+; I'm not convinced that avoiding doing so further now, just to do so again in the next release, is actually a better answer than changing it now. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Showing parallel status in \df+
Peter Eisentraut wrote: > I'm quite fond of having the full source code show in \df+ and I'm > against removing it on short notice past beta2, discussed under a > "false" subject heading. How do you use it? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] IMPORT FOREIGN SCHEMA can't be run in in pl/pgsql due to INTO
Merlin Moncure writes: > On Mon, Jul 11, 2016 at 3:09 PM, Tom Lane wrote: >> While we can certainly hack it by something along the lines of not >> recognizing INTO when the first token was IMPORT, the whole thing >> seems awfully messy and fragile. And it will certainly break again >> the next time somebody decides that INTO is le mot juste in some new >> SQL command. I wish we could think of a safer, more future-proof >> solution. I have no idea what that would be, though, short of >> deprecating INTO altogether. > This is a natural consequence of having two > almost-but-not-quite-the-same grammars handling the same shared > language. There are a similar set of annoyances compiling C with a > C++ compiler as we all know. In a perfect world, SQL procedural > extensions would be a proper superset and we'd have *one* grammar > handling everything. Among other niceties this would make moving > forward with stored procedures a much simpler discussion. Well, C'est > la vie :-D. Yeah, I was just belly-aching ;-). Not much choice in the short term. Fix pushed. regards, tom lane
Re: [COMMITTERS] [HACKERS] Logical decoding
On Mon, Jul 11, 2016 at 2:13 AM, Michael Paquier wrote: > > On Mon, Jul 11, 2016 at 2:03 PM, Craig Ringer wrote: > > On 9 July 2016 at 01:57, Joshua Bay wrote: > >> and where is this code in the codebase? > > > > src/backend/replication/logical/* > > src/backend/replication/walsender.c > > src/backend/access/transam/xlogreader.c > > src/include/access/xlogreader.h > > src/include/replication/output_plugin.h > > contrib/test_decoding/ > > doc/src/sgml/logicaldecoding.sgml > > Some other references: > https://wiki.postgresql.org/images/7/77/Fosdem-2015-logical-decoding.pdf > > And some other examples of plugins: > https://github.com/leptonix/decoding-json > https://github.com/xstevens/decoderbufs > https://github.com/confluentinc/bottledwater-pg (for kafka) > https://github.com/michaelpq/pg_plugins/tree/master/decoder_raw (wrote this one) > Nice, also we have wal2json [1]. Regards, [1] https://github.com/eulerto/wal2json -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog: http://fabriziomello.github.io >> Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello >> Github: http://github.com/fabriziomello
Re: [HACKERS] pgbench - minor fix for meta command only scripts
Fabien COELHO writes: > Ok. Here is an updated version, with a better suffix and a simplified > comment. Doesn't this break the handling of latency calculations, or at least make the results completely different for the last metacommand than what they would be for a non-last command? It looks like it needs to loop back so that the latency calculation is completed for the metacommand before it can exit. Seems to me it would probably make more sense to fall out at the end of the "transaction finished" if-block, around line 1923 in HEAD. (The code structure in here seems like a complete mess to me, but probably now is not the time to refactor it.) regards, tom lane
Re: [HACKERS] GiST index build versus NaN coordinates
I wrote: > Andreas Seltenreich writes: >> The infinite loop from the bug report was triggered. Further, two >> previously unseen errors are logged: >> ERROR: timestamp cannot be NaN >> ERROR: getQuadrant: impossible case >> The first is probably as boring as it gets, the second one is from the >> getQuadrant() in spgquadtreeproc.c. > Yeah, the first one is presumably from float8_timestamptz() intentionally > rejecting a NaN, which seems fine. >> Curiously, the getQuadrant()s in geo_spgist.c and rangetypes_spgist.c do >> not have such a check. I guess the boxes will just end up in an >> undefined position in the index for these. > Right, we probably want them all to apply some consistent ordering --- > doesn't matter so much what it is, but float8's rule is as good as any. I looked into these a bit more closely. I think rangetypes_spgist.c is fine as-is: it's relying on the range component type to provide a total ordering of its values, and if the component type fails to do that, it's the fault of the component type not the range code. spgquadtreeproc.c and geo_spgist.c both have got NaN issues, but the code in both of them is rather tightly tied to the fuzzy-geometric-comparisons logic that Emre has been messing with. I think we ought to put "what to do with NaNs?" into that same can of worms, rather than try to resolve it separately. (Yeah, I know, I just made Emre's job even harder.) regards, tom lane
Re: [HACKERS] dumping database privileges broken in 9.6
All, * Noah Misch (n...@leadboat.com) wrote: > On Wed, Jun 29, 2016 at 11:50:17AM -0400, Stephen Frost wrote: > > * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote: > > > Do this: > > > > > > CREATE DATABASE test1; > > > REVOKE CONNECT ON DATABASE test1 FROM PUBLIC; > > > > > > Run pg_dumpall. > > > > > > In 9.5, this produces > > > > > > CREATE DATABASE test1 WITH TEMPLATE = template0 OWNER = peter; > > > REVOKE ALL ON DATABASE test1 FROM PUBLIC; > > > REVOKE ALL ON DATABASE test1 FROM peter; > > > GRANT ALL ON DATABASE test1 TO peter; > > > GRANT TEMPORARY ON DATABASE test1 TO PUBLIC; > > > > > > In 9.6, this produces only > > > > > > CREATE DATABASE test1 WITH TEMPLATE = template0 OWNER = peter; > > > GRANT TEMPORARY ON DATABASE test1 TO PUBLIC; > > > GRANT ALL ON DATABASE test1 TO peter; > > > > > > Note that the REVOKE statements are missing. This does not > > > correctly recreate the original state. > > > > I see what happened here, the query in dumpCreateDB() needs to be > > adjusted to pull the default information to then pass to > > buildACLComments(), similar to how the objects handled by pg_dump work. > > The oversight was in thinking that databases didn't have any default > > rights granted, which clearly isn't correct. > > > > I'll take care of that in the next day or so and add an appropriate > > regression test. > > This PostgreSQL 9.6 open item is past due for your status update. Kindly send > a status update within 24 hours, and include a date for your subsequent status > update. Refer to the policy on open item ownership: > http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com Attached is a patch to address this. After much consideration and deliberation, I went with the simpler solution to simply dump out the database privileges based on what a new creation of those privileges would yield, resulting in output similar to pre-9.6. 
We document that template1 is allowed to be dropped/recreated, which greatly complicates using pg_init_privs to record and produce a delta against the initdb-time values, as we lose the connection between pg_init_privs and the "template1" database as soon as it is dropped (something which can't be done with objects in that catalog). Comments welcome. Thanks! Stephen From fc87c0a07f37d7b4bbcf067d22d31467a511e865 Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Tue, 12 Jul 2016 15:37:35 -0400 Subject: [PATCH] Correctly dump database ACLs --- src/bin/pg_dump/pg_dumpall.c | 54 +++--- src/bin/pg_dump/t/002_pg_dump.pl | 153 +++ 2 files changed, 195 insertions(+), 12 deletions(-) diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c index d4fb03e..99c0f90 100644 --- a/src/bin/pg_dump/pg_dumpall.c +++ b/src/bin/pg_dump/pg_dumpall.c @@ -1284,14 +1284,43 @@ dumpCreateDB(PGconn *conn) PQclear(res); - /* Now collect all the information about databases to dump */ - if (server_version >= 90300) + + /* + * Now collect all the information about databases to dump. + * + * For the database ACLs, as of 9.6, we extract both the positive (as + * datacl) and negative (as rdatacl) ACLs, relative to the default ACL for + * databases, which are then passed to buildACLCommands() below. + * + * See buildACLQueries() and buildACLCommands(). + * + * Note that we do not support initial privileges (pg_init_privs) on + * databases.
+ */ + if (server_version >= 90600) + res = executeQuery(conn, + "SELECT datname, " + "coalesce(rolname, (select rolname from pg_authid where oid=(select datdba from pg_database where datname='template0'))), " + "pg_encoding_to_char(d.encoding), " + "datcollate, datctype, datfrozenxid, datminmxid, " + "datistemplate, " + "(SELECT pg_catalog.array_agg(acl) FROM (SELECT pg_catalog.unnest(coalesce(datacl,pg_catalog.acldefault('d',datdba))) AS acl " + "EXCEPT SELECT pg_catalog.unnest(pg_catalog.acldefault('d',datdba))) as foo)" + "AS datacl," + "(SELECT pg_catalog.array_agg(acl) FROM (SELECT pg_catalog.unnest(pg_catalog.acldefault('d',datdba)) AS acl " + "EXCEPT SELECT pg_catalog.unnest(coalesce(datacl,pg_catalog.acldefault('d',datdba)))) as foo)" + "AS rdatacl," + "datconnlimit, " + "(SELECT spcname FROM pg_tablespace t WHERE t.oid = d.dattablespace) AS dattablespace " + "FROM pg_database d LEFT JOIN pg_authid u ON (datdba = u.oid) " + "WHERE datallowconn ORDER BY 1"); else if (server_version >= 90300) res = executeQuery(conn, "SELECT datname, " "coalesce(rolname, (select rolname from pg_authid where oid=(select datdba from pg_database where datname='template0'))), " "pg_encoding_to_char(d.encoding), " "datcollate, datctype, datfrozenxid, datminmxid, " "datistemplate, datacl, '' as
[HACKERS] DO with a large amount of statements get stuck with high memory consumption
I've noticed that pl/pgsql functions/do commands do not behave well when the statement resolves and frees memory. To be clear: FOR i in 1..100 LOOP INSERT INTO foo VALUES (i); END LOOP; ...runs just fine while BEGIN INSERT INTO foo VALUES (1); INSERT INTO foo VALUES (2); ... INSERT INTO foo VALUES (100); END; (for the curious, create a script yourself via copy ( select 'do $$begin create temp table foo(i int);' union all select format('insert into foo values (%s);', i) from generate_series(1,100) i union all select 'raise notice ''abandon all hope!''; end; $$;' ) to '/tmp/breakit.sql';) ...will consume amounts of resident memory proportional to the number of statements and eventually crash the server. The problem is obvious; each statement causes a plan to get created and the server gets stuck in a loop where SPI_freeplan() is called repeatedly. Everything is working as designed I guess, but when this happens it's really unpleasant: the query is uncancellable and unterminatable, nicht gut. A pg_ctl kill ABRT will do the trick but I was quite astonished to see linux take a few minutes to clean up the mess (!) on a somewhat pokey virtualized server with lots of memory. With even as little as ten thousand statements the cleanup time far exceeds the runtime of the statement block. I guess the key takeaway here is, "don't do that"; pl/pgsql aggressively generates plans and turns out to be a poor choice for bulk loading because of all the plan caching. Having said that, I can't help but wonder if there should be a (perhaps user configurable) limit to the amount of SPI plans a single function call should be able to acquire, on the basis that you are going to smack into very poor behaviors in the memory subsystem. Stepping back, I can't help but wonder what the value of all the plan caching going on is at all for statement blocks. Loops might comprise a notable exception, noted.
I'd humbly submit though that (relative to functions) it's much more likely to want to do something like insert a lot of statements, where it is impossible to utilize any cached plans. This is not an academic gripe -- I just exploded production :-D. merlin
Re: [HACKERS] GiST index build versus NaN coordinates
I wrote: > I looked into the problem reported in bug #14238, > https://www.postgresql.org/message-id/20160708151747.1426.60...@wrigleys.postgresql.org > I think that probably the most reasonable answer is to replace all the > raw "double" comparisons in this code with float8_cmp_internal() or > something that's the moral equivalent of that. Does anyone want to > propose something else? Here's a draft patch that modifies gistproc.c to treat NaNs essentially as though they are infinities. I've only touched the code associated with GiST index insertion, not with searches, so it's quite possible that searches will still not work right in the presence of NaNs. However, this does fix the index-build infinite loop complained of originally. I've not yet looked at the SPGiST issues reported by Andreas. regards, tom lane diff --git a/src/backend/access/gist/gistproc.c b/src/backend/access/gist/gistproc.c index e8213e2..d47211a 100644 *** a/src/backend/access/gist/gistproc.c --- b/src/backend/access/gist/gistproc.c *** *** 17,36 */ #include "postgres.h" #include "access/gist.h" #include "access/stratnum.h" #include "utils/geo_decls.h" static bool gist_box_leaf_consistent(BOX *key, BOX *query, StrategyNumber strategy); - static double size_box(BOX *box); static bool rtree_internal_consistent(BOX *key, BOX *query, StrategyNumber strategy); /* Minimum accepted ratio of split */ #define LIMIT_RATIO 0.3 /** * Box ops --- 17,47 */ #include "postgres.h" + #include <math.h> + #include "access/gist.h" #include "access/stratnum.h" + #include "utils/builtins.h" #include "utils/geo_decls.h" static bool gist_box_leaf_consistent(BOX *key, BOX *query, StrategyNumber strategy); static bool rtree_internal_consistent(BOX *key, BOX *query, StrategyNumber strategy); /* Minimum accepted ratio of split */ #define LIMIT_RATIO 0.3 + /* Convenience macros for NaN-aware comparisons */ + #define FLOAT8_EQ(a,b) (float8_cmp_internal(a, b) == 0) + #define FLOAT8_LT(a,b) (float8_cmp_internal(a, b) < 0) + #define
FLOAT8_LE(a,b) (float8_cmp_internal(a, b) <= 0) + #define FLOAT8_GT(a,b) (float8_cmp_internal(a, b) > 0) + #define FLOAT8_GE(a,b) (float8_cmp_internal(a, b) >= 0) + #define FLOAT8_MAX(a,b) (FLOAT8_GT(a, b) ? (a) : (b)) + #define FLOAT8_MIN(a,b) (FLOAT8_LT(a, b) ? (a) : (b)) + /** * Box ops *** static bool rtree_internal_consistent(BO *** 40,51 * Calculates union of two boxes, a and b. The result is stored in *n. */ static void ! rt_box_union(BOX *n, BOX *a, BOX *b) { ! n->high.x = Max(a->high.x, b->high.x); ! n->high.y = Max(a->high.y, b->high.y); ! n->low.x = Min(a->low.x, b->low.x); ! n->low.y = Min(a->low.y, b->low.y); } /* --- 51,103 * Calculates union of two boxes, a and b. The result is stored in *n. */ static void ! rt_box_union(BOX *n, const BOX *a, const BOX *b) { ! n->high.x = FLOAT8_MAX(a->high.x, b->high.x); ! n->high.y = FLOAT8_MAX(a->high.y, b->high.y); ! n->low.x = FLOAT8_MIN(a->low.x, b->low.x); ! n->low.y = FLOAT8_MIN(a->low.y, b->low.y); ! } ! ! /* ! * Size of a BOX for penalty-calculation purposes. ! * The result can be +Infinity, but not NaN. ! */ ! static double ! size_box(const BOX *box) ! { ! /* ! * Check for zero-width cases. Note that we define the size of a zero- ! * by-infinity box as zero. It's important to special-case this somehow, ! * as naively multiplying infinity by zero will produce NaN. ! * ! * The less-than cases should not happen, but if they do, say "zero". ! */ ! if (FLOAT8_LE(box->high.x, box->low.x) || ! FLOAT8_LE(box->high.y, box->low.y)) ! return 0.0; ! ! /* ! * We treat NaN as larger than +Infinity, so any distance involving a NaN ! * and a non-NaN is infinite. Note the previous check eliminated the ! * possibility that the low fields are NaNs. ! */ ! if (isnan(box->high.x) || isnan(box->high.y)) ! return get_float8_infinity(); ! return (box->high.x - box->low.x) * (box->high.y - box->low.y); ! } ! ! /* ! * Return amount by which the union of the two boxes is larger than ! * the original BOX's area. 
The result can be +Infinity, but not NaN. ! */ ! static double ! box_penalty(const BOX *original, const BOX *new) ! { ! BOX unionbox; ! ! rt_box_union(&unionbox, original, new); ! return size_box(&unionbox) - size_box(original); } /* *** gist_box_consistent(PG_FUNCTION_ARGS) *** 85,100 strategy)); } static void ! adjustBox(BOX *b, BOX *addon) { ! if (b->high.x < addon->high.x) b->high.x = addon->high.x; ! if (b->low.x > addon->low.x) b->low.x = addon->low.x; ! if (b->high.y < addon->high.y) b->high.y = addon->high.y; ! if (b->low.y > addon->low.y) b->low.y = addon->low.y; } ---
Re: [HACKERS] pg_basebackup wish list
On Tue, Jul 12, 2016 at 11:06:39AM -0700, Jeff Janes wrote: > On Tue, Jul 12, 2016 at 10:48 AM, Peter Eisentraut wrote: > > On 7/12/16 12:53 PM, Jeff Janes wrote: > >> The --help message for pg_basebackup says: > >> > >> -Z, --compress=0-9 compress tar output with given compression level > >> > >> But -Z0 is then rejected as 'invalid compression level "0"'. The real > >> docs do say 1-9, only the --help message has this bug. Trivial patch > >> attached. > > > > pg_dump --help and man page say it supports 0..9. Maybe we should make > > that more consistent. > > pg_dump actually does support -Z0, though. Well, sort of. It outputs > plain text. Rather than plain text wrapped in some kind of dummy gzip > header, which is what I had naively expected. > > Is that what -Z0 in pg_basebackup should do as well, just output > uncompressed tar data, and not add the ".gz" to the "base.tar" file > name? > > Cheers, > > Jeff > Hi, Yes, please support the no compression option. It can be useful in situations where the bottleneck is the compression itself (quite easily done with zlib based options, another plug for a higher performance option). Regards, Ken
Re: [HACKERS] pg_basebackup wish list
On Tue, Jul 12, 2016 at 10:48 AM, Peter Eisentraut wrote: > On 7/12/16 12:53 PM, Jeff Janes wrote: >> The --help message for pg_basebackup says: >> >> -Z, --compress=0-9 compress tar output with given compression level >> >> But -Z0 is then rejected as 'invalid compression level "0"'. The real >> docs do say 1-9, only the --help message has this bug. Trivial patch >> attached. > > pg_dump --help and man page say it supports 0..9. Maybe we should make > that more consistent. pg_dump actually does support -Z0, though. Well, sort of. It outputs plain text. Rather than plain text wrapped in some kind of dummy gzip header, which is what I had naively expected. Is that what -Z0 in pg_basebackup should do as well, just output uncompressed tar data, and not add the ".gz" to the "base.tar" file name? Cheers, Jeff
Re: [HACKERS] pgbench - compute & show latency consistently
On 7/9/16 4:42 AM, Fabien COELHO wrote:
>  number of transactions per client: 1000
>  number of transactions actually processed: 1/1
> -latency average = 15.844 ms
> -latency stddev = 2.715 ms
> +latency average: 15.844 ms
> +latency stddev: 2.715 ms
>  tps = 618.764555 (including connections establishing)
>  tps = 622.977698 (excluding connections establishing)

I think what you have here is that colons separate input parameters and equal signs separate result output. So I think it's OK the way it is.

Maybe a better improvement would be introducing section headings like "test parameters" and "test results".

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Fix Error Message for allocate_recordbuf() Failure
Thank you for the feedback.

We had encountered this error message when allocating a record buf of 344 bytes. The message "record length 344 at %X/%X too long", along with the comment /* We treat this as a "bogus data" condition */, masked the OOM condition, implying an error in the log record size calculation logic. The actual allocation that failed was for 5 * Max(BLCKSZ, XLOG_BLCKSZ), a much larger allocation.

Shoaib

On Sun, Jul 10, 2016 at 10:58 PM, Craig Ringer wrote:
> On 11 July 2016 at 12:04, Michael Paquier wrote:
>> On Sat, Jul 9, 2016 at 2:58 AM, Shoaib Lari wrote:
>>> Besides making the error message more informative, we had to modify
>>> allocate_recordbuf() to return the actual number of bytes that were
>>> being allocated.
>>
>> -        report_invalid_record(state, "record length %u at %X/%X too long",
>> -                              total_len,
>> -                              (uint32) (RecPtr >> 32), (uint32) RecPtr);
>> +        report_invalid_record(state,
>> +                              "cannot allocate %u bytes for record length %u at %X/%X",
>> +                              newSizeOut, total_len, (uint32) (RecPtr >> 32),
>> +                              (uint32) RecPtr);
>>
>> It does not look like a good idea to me to complicate the interface of
>> allocate_recordbuf just to make one error message more verbose; it is
>> basically a complaint that palloc_extended(MCXT_ALLOC_NO_OOM) does not
>> mention to the user the size that it has tried to allocate before
>> returning NULL. Do you have a use case that would prove to be useful
>> if this extra piece of information is provided? Because it seems to me
>> that we don't really care how much memory it has failed to allocate;
>> we only want to know that it *has* failed and take actions only based
>> on that.
>
> I actually find details of how much memory we tried to allocate to be
> really useful in other places that do emit it. Sometimes it's been an
> important clue when trying to figure out what's going on on a remote
> system with no or limited access. Even if it just tells me "we were
> probably incrementally allocating lots of small values since this
> failure is small" vs "this allocation is huge, look for something
> unusually large" or "this allocation is impossibly huge / some
> suspicious value, look for memory corruption".
>
> I tend to agree with your suggestion about a better approach, but do
> think emitting details on allocation failures is useful. At least when
> people turn VM overcommit off ...
>
> --
> Craig Ringer    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pg_basebackup wish list
On 7/12/16 12:53 PM, Jeff Janes wrote:
> The --help message for pg_basebackup says:
>
>   -Z, --compress=0-9     compress tar output with given compression level
>
> But -Z0 is then rejected as 'invalid compression level "0"'. The real
> docs do say 1-9; only the --help message has this bug. Trivial patch
> attached.

pg_dump --help and man page say it supports 0..9. Maybe we should make that more consistent.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Showing parallel status in \df+
On 7/12/16 12:17 PM, Tom Lane wrote:
> It's sounding to me like we have consensus on this proposal to further
> change \df+ to replace the "Source code" column with "Internal name",
> which is prosrc for C and internal-language functions but NULL otherwise.
>
> If I've not heard objections by tomorrow I'll go make that change.
>
> Are we satisfied with telling people to use \sf to see the source code
> for a PL function? Or should there be another variant of \df that
> still provides source code?

I'm quite fond of having the full source code show in \df+ and I'm against removing it on short notice past beta2, discussed under a "false" subject heading. This is long-standing, intentional behavior, not a regression, and changing it should get wider consultation. Please submit a patch to the next commit fest instead.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Showing parallel status in \df+
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost writes:
>> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>>> Are we satisfied with telling people to use \sf to see the source code
>>> for a PL function? Or should there be another variant of \df that
>>> still provides source code?
>
>> I don't see the point in having a \df variant be the same as what \sf
>> is. I could possibly see extending \sf in some way, if there are things
>> that it doesn't currently do that \df does (and those things are
>> useful).
>
> I certainly agree that \sf already does what it does just fine. The
> question is more about whether anyone is likely to think that removing
> source code from \df+ output constitutes an important loss of
> functionality.

Right, I understood that to be your question and was intending to answer it with "no."

> I had some vague ideas about inventing a new \df behavior modeled on
> the way that \d+ shows view definitions, that is, put the function body
> in a footer rather than in the tabular output proper. So you could
> imagine something like
>
>   # \df++ foo*
>    Schema | Name | ...
>   --------+------+-...
>    public | fooa | ...
>    public | foob | ...
>   Source code for fooa(int, text):
>   ... body of fooa ...
>   Source code for foob(text, text, numeric):
>   ... body of foob ...
>
> But I'm not sure it's worth the trouble. And anyway we could add this
> later.

Agreed on both counts.

Thanks!

Stephen
Re: [HACKERS] Showing parallel status in \df+
Stephen Frost writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Are we satisfied with telling people to use \sf to see the source code
>> for a PL function? Or should there be another variant of \df that
>> still provides source code?
>
> I don't see the point in having a \df variant be the same as what \sf
> is. I could possibly see extending \sf in some way, if there are things
> that it doesn't currently do that \df does (and those things are
> useful).

I certainly agree that \sf already does what it does just fine. The question is more about whether anyone is likely to think that removing source code from \df+ output constitutes an important loss of functionality.

I had some vague ideas about inventing a new \df behavior modeled on the way that \d+ shows view definitions, that is, put the function body in a footer rather than in the tabular output proper. So you could imagine something like

    # \df++ foo*
     Schema | Name | ...
    --------+------+-...
     public | fooa | ...
     public | foob | ...
    Source code for fooa(int, text):
    ... body of fooa ...
    Source code for foob(text, text, numeric):
    ... body of foob ...

But I'm not sure it's worth the trouble. And anyway we could add this later.

regards, tom lane
[HACKERS] pg_basebackup wish list
I've been having some adventures with pg_basebackup lately, and had some suggestions based on those.

The --help message for pg_basebackup says:

    -Z, --compress=0-9     compress tar output with given compression level

But -Z0 is then rejected as 'invalid compression level "0"'. The real docs do say 1-9; only the --help message has this bug. Trivial patch attached.

These ones I have not written code for yet:

The progress reporting for pg_basebackup is pretty terse:

    858117/7060099 kB (12%), 0/1 tablespace

I think we should at least add a count-up timer showing the seconds it has been running. I can always use my own stopwatch, but that is not very friendly and is easy to forget to start. And maybe change the reporting units from kB to MB when the pre-scan says the total size exceeds some threshold? At the same limit pg_size_pretty does?

If I use the verbose flag, then the progress message includes the name of the file being written to by the client. However, in -Ft mode, this is always "something/base.tar" (unless you are using tablespaces), which is not terribly useful. Should it instead report the name of the file being read on the server end?

When using pg_basebackup from the wrong version, the error message it reports is pretty unhelpful:

    pg_basebackup: could not initiate base backup: ERROR:  syntax error

Could we have a newer version of pg_basebackup capture that error and inject a HINT, or is there a better solution for getting a better error message?

Cheers,

Jeff

pg_basebackup_compress.patch (attached)
Re: [HACKERS] Showing parallel status in \df+
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost writes:
>> Agreed. I don't have any issue with "Language", really, but I agree
>> that "Source code" makes the output pretty ridiculous. I also liked the
>> idea of changing the name to "internal name" or something along those
>> lines, rather than having it be "source code", if we keep the column
>> for C/internal functions. Keeping it as "source code" wouldn't be
>> accurate.
>
> It's sounding to me like we have consensus on this proposal to further
> change \df+ to replace the "Source code" column with "Internal name",
> which is prosrc for C and internal-language functions but NULL otherwise.
>
> If I've not heard objections by tomorrow I'll go make that change.
>
> Are we satisfied with telling people to use \sf to see the source code
> for a PL function? Or should there be another variant of \df that
> still provides source code?

I don't see the point in having a \df variant be the same as what \sf is. I could possibly see extending \sf in some way, if there are things that it doesn't currently do that \df does (and those things are useful).

Thanks!

Stephen
Re: [HACKERS] GiST index build versus NaN coordinates
Andreas Seltenreich writes:
> Tom Lane writes:
>> More generally, this example makes me fearful that NaN coordinates in
>> geometric values are likely to cause all sorts of issues. It's too late
>> to disallow them, probably, but I wonder how we can identify other bugs
>> of this ilk.
>
> Sounds like some fuzz testing with nan/infinity is in order. sqlsmith
> doesn't generate any float literals, but it calls functions to satisfy
> its need for values of specific types. Adding suitable functions[1] to
> the regression db, I made the following observations:

This is really useful, thanks!

> The infinite loop from the bug report was triggered. Further, two
> previously unseen errors are logged:
>
>     ERROR:  timestamp cannot be NaN
>     ERROR:  getQuadrant: impossible case
>
> The first is probably as boring as it gets, the second one is from the
> getQuadrant() in spgquadtreeproc.c.

Yeah, the first one is presumably from float8_timestamptz() intentionally rejecting a NaN, which seems fine.

> Curiously, the getQuadrant()s in geo_spgist.c and rangetypes_spgist.c do
> not have such a check. I guess the boxes will just end up in an
> undefined position in the index for these.

Right, we probably want them all to apply some consistent ordering --- it doesn't matter so much what it is, but float8's rule is as good as any.

regards, tom lane
Re: [HACKERS] use of SEQ_MINVALUE in btree_gin
Peter Eisentraut writes:
> btree_gin uses SEQ_MINVALUE as a way to get the smallest int64 value.
> This is actually wrong because the smallest int64 value is
> SEQ_MINVALUE - 1, so this might be slightly broken.
>
> The whole thing was done as a convenience when INT64_IS_BUSTED had to be
> considered, but I think we can get rid of that now. See attached
> proposed patch.

+1. I agree that this is actually a bug fix, so it should be back-patched.

regards, tom lane
Re: [HACKERS] Showing parallel status in \df+
Stephen Frost writes:
> Agreed. I don't have any issue with "Language", really, but I agree
> that "Source code" makes the output pretty ridiculous. I also liked the
> idea of changing the name to "internal name" or something along those
> lines, rather than having it be "source code", if we keep the column
> for C/internal functions. Keeping it as "source code" wouldn't be
> accurate.

It's sounding to me like we have consensus on this proposal to further change \df+ to replace the "Source code" column with "Internal name", which is prosrc for C and internal-language functions but NULL otherwise.

If I've not heard objections by tomorrow I'll go make that change.

Are we satisfied with telling people to use \sf to see the source code for a PL function? Or should there be another variant of \df that still provides source code?

regards, tom lane
Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold
On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund wrote:
> I'm a bit confused, why aren't we simply adding LSN interlock
> checks for toast? Doesn't look that hard? Seems like a much more
> natural course of fixing this issue?

I took some time trying to see what you have in mind, and I'm really not "getting it". I definitely applaud you for spotting the problem, but this suggestion for solving it doesn't seem to be useful. Basically, after turning this suggestion every way I could, I see two alternative ways to implement it.

(1) Whenever TestForOldSnapshot() checks a heap page, check whether the related toast is OK for all visible tuples on that page. It would be enough to check one toast tuple for one value per heap tuple, but still -- this would be really nasty from a performance perspective.

(2) To deal with the fact that only about 7% of the BufferGetPage() calls need to make this check, all functions and macros which read toast data from the table would need extra parameters, and all call sites for the toast API would need to have such context information passed to them, so they could specify this correctly. Ugh.

Compared to those options, the approach I was taking, where the fix is "automatic" but some workloads with old_snapshot_threshold on would sequentially read some toast indexes more often, seems pretty tame.

Do you see some other option that fits what you describe? I'll give you a couple days to respond before coding the patch.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] One process per session lack of sharing
Hi

> amatv...@bitec.ru writes:
>> Is there any plan to implement "session per thread" or "shared
>> sessions between thread"?
>
> No, not really. The amount of overhead that would add --- eg, the need
> for locking on what used to be single-use caches --- makes the benefit
> highly questionable.

A two-layer cache is the best answer.

> Also, most people who need this find that sticking
> a connection pooler in front of the database solves their problem

It has some disadvantages -- lack of temporary tables, for example. Practical usage of such tables with a connection pooler is highly questionable. And so on.

> so there's not that much motivation to do a ton of work inside the
> database to solve it there.

It is clear. Thank you.
Re: [HACKERS] One process per session lack of sharing
amatv...@bitec.ru writes:
> Is there any plan to implement "session per thread" or "shared
> sessions between thread"?

No, not really. The amount of overhead that would add --- eg, the need for locking on what used to be single-use caches --- makes the benefit highly questionable. Also, most people who need this find that sticking a connection pooler in front of the database solves their problem, so there's not that much motivation to do a ton of work inside the database to solve it there.

regards, tom lane
[HACKERS] One process per session lack of sharing
Hi

Is there any plan to implement "session per thread" or "shared sessions between threads"?

We have analyzed the possibility of contributing a PL/pgSQL-to-JVM-bytecode compiler, but with the current process model this idea is far from optimal. (The VM could be different, of course; but currently we use Oracle, and the JVM is important for us.) We have run into a lack of resource sharing. In our tests, memory usage per session is:

    Oracle:        about 5M
    MS SQL Server: about 4M
    PostgreSQL:    about 160M

It's discussed on pgsql-gene...@postgresql.org: http://www.mail-archive.com/pgsql-general@postgresql.org/msg206452.html

> I think the "problem" that he is having is fixable only by changing how
> PostgreSQL itself works. His problem is a PL/pgSQL function which is 11K
> lines in length. When invoked, this function is "compiled" into a large
> tokenized parse tree. This parse tree is only usable in the session which
> invoked the function. Apparently this parse tree takes a lot of memory.
> And "n" concurrent users of this, highly used, function will therefore
> require "n" times as much memory because the parse tree is _not_
> shareable. This is explained in:
> https://www.postgresql.org/docs/9.5/static/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING

Another interesting answer (from Karl Czajkowski, in private):

> But, I searched the archives of the mailing list, and when others have
> previously suggested such caching or reuse, it was immediately shot
> down by core developers.
Re: [HACKERS] IMPORT FOREIGN SCHEMA can't be run in in pl/pgsql due to INTO
On Mon, Jul 11, 2016 at 3:09 PM, Tom Lane wrote:
> Merlin Moncure writes:
>> Currently pl/pgsql interprets the mandatory INTO of IMPORT FOREIGN
>> SCHEMA as INTO variable.
>
> Ugh, that's definitely a bug.
>
>> I estimate this to be a minor oversight in
>> pl/pgsql parsing with respect to the introduction of this statement.
>
> While we can certainly hack it by something along the lines of not
> recognizing INTO when the first token was IMPORT, the whole thing
> seems awfully messy and fragile. And it will certainly break again
> the next time somebody decides that INTO is le mot juste in some new
> SQL command. I wish we could think of a safer, more future-proof
> solution. I have no idea what that would be, though, short of
> deprecating INTO altogether.

This is a natural consequence of having two almost-but-not-quite-the-same grammars handling the same shared language. There is a similar set of annoyances compiling C with a C++ compiler, as we all know. In a perfect world, SQL procedural extensions would be a proper superset and we'd have *one* grammar handling everything. Among other niceties this would make moving forward with stored procedures a much simpler discussion. Well, c'est la vie :-D.

merlin
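The hack Tom sketches -- don't treat INTO as a PL/pgSQL capture clause when the statement's first token is IMPORT -- can be modeled with a toy tokenizer. This is emphatically not the real PL/pgSQL scanner, just an illustration of the ambiguity and why the fix is fragile (the exception list must grow with the SQL grammar).

```python
# Toy model of PL/pgSQL's INTO handling.  Normally INTO marks "capture
# the result into a PL/pgSQL variable"; the workaround special-cases
# commands whose own grammar uses INTO, keyed on the first token.
COMMANDS_WITH_OWN_INTO = {"IMPORT"}   # assumption: must be extended
                                      # whenever SQL grows a new INTO

def split_into_clause(stmt: str):
    """Return (sql_text, capture_variable_or_None)."""
    tokens = stmt.split()
    if tokens[0].upper() in COMMANDS_WITH_OWN_INTO:
        return stmt, None             # leave INTO to the SQL command
    for i, tok in enumerate(tokens):
        if tok.upper() == "INTO":
            sql = " ".join(tokens[:i] + tokens[i + 2:])
            return sql, tokens[i + 1]
    return stmt, None

sql, var = split_into_clause("SELECT count(*) INTO n FROM t")
print(sql, "->", var)     # SELECT count(*) FROM t -> n

sql, var = split_into_clause(
    "IMPORT FOREIGN SCHEMA s FROM SERVER srv INTO local_schema")
print(var)                # None: INTO belongs to IMPORT FOREIGN SCHEMA
```

Without the special case, the second statement would wrongly have its INTO stripped and `local_schema` treated as a variable, which is the reported bug.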
Re: [HACKERS] Showing parallel status in \df+
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Michael Paquier writes:
>> On Tue, Jul 12, 2016 at 11:36 AM, Alvaro Herrera wrote:
>>> So prosrc for internal/C and NULL for others? WFM.
>
>> And so we'd remove "Language" at the same time? That does not sound
>> bad to me.
>
> Hm, I wasn't thinking of that step. The main knock on "Source code" is
> that it is usually too large to fit into the display grid --- but that
> argument doesn't work against "Language". Also, while "Language" is
> certainly an implementation detail in some sense, it is a pretty useful
> detail: it gives you a good hint about the likely speed of the function,
> for instance.

Agreed. I don't have any issue with "Language", really, but I agree that "Source code" makes the output pretty ridiculous. I also liked the idea of changing the name to "internal name" or something along those lines, rather than having it be "source code", if we keep the column for C/internal functions. Keeping it as "source code" wouldn't be accurate.

Thanks!

Stephen
[HACKERS] Oddity in handling of cached plans for FDW queries
Hi,

I noticed odd behavior in invalidating a cached plan with a foreign join due to user mapping changes. Consider:

    postgres=# create extension postgres_fdw;
    postgres=# create server loopback foreign data wrapper postgres_fdw options (dbname 'postgres');
    postgres=# create user mapping for public server loopback;
    postgres=# create table t1 (a int, b int);
    postgres=# insert into t1 values (1, 1);
    postgres=# create foreign table ft1 (a int, b int) server loopback options (table_name 't1');
    postgres=# analyze ft1;
    postgres=# create view v1 as select * from ft1;
    postgres=# create user v1_owner;
    postgres=# alter view v1 owner to v1_owner;
    postgres=# grant select on ft1 to v1_owner;
    postgres=# prepare join_stmt as select * from ft1, v1 where ft1.a = v1.a;
    postgres=# explain verbose execute join_stmt;
                                       QUERY PLAN
    ------------------------------------------------------------------------------------------
     Foreign Scan  (cost=100.00..102.04 rows=1 width=16)
       Output: ft1.a, ft1.b, ft1_1.a, ft1_1.b
       Relations: (public.ft1) INNER JOIN (public.ft1)
       Remote SQL: SELECT r1.a, r1.b, r5.a, r5.b FROM (public.t1 r1 INNER JOIN public.t1 r5 ON (((r1.a = r5.a))))
    (4 rows)

That's great.

    postgres=# create user mapping for v1_owner server loopback;
    postgres=# explain verbose execute join_stmt;
                                       QUERY PLAN
    ------------------------------------------------------------------------------
     Nested Loop  (cost=200.00..202.07 rows=1 width=16)
       Output: ft1.a, ft1.b, ft1_1.a, ft1_1.b
       Join Filter: (ft1.a = ft1_1.a)
       ->  Foreign Scan on public.ft1  (cost=100.00..101.03 rows=1 width=8)
             Output: ft1.a, ft1.b
             Remote SQL: SELECT a, b FROM public.t1
       ->  Foreign Scan on public.ft1 ft1_1  (cost=100.00..101.03 rows=1 width=8)
             Output: ft1_1.a, ft1_1.b
             Remote SQL: SELECT a, b FROM public.t1
    (9 rows)

That's also great. Since ft1 and v1 use different user mappings, the join should not be pushed down.

    postgres=# drop user mapping for v1_owner server loopback;
    postgres=# explain verbose execute join_stmt;
                                       QUERY PLAN
    ------------------------------------------------------------------------------
     Nested Loop  (cost=200.00..202.07 rows=1 width=16)
       Output: ft1.a, ft1.b, ft1_1.a, ft1_1.b
       Join Filter: (ft1.a = ft1_1.a)
       ->  Foreign Scan on public.ft1  (cost=100.00..101.03 rows=1 width=8)
             Output: ft1.a, ft1.b
             Remote SQL: SELECT a, b FROM public.t1
       ->  Foreign Scan on public.ft1 ft1_1  (cost=100.00..101.03 rows=1 width=8)
             Output: ft1_1.a, ft1_1.b
             Remote SQL: SELECT a, b FROM public.t1
    (9 rows)

Seems odd to me. Both relations use the same user mapping as before, so the join should be pushed down.

Another thing I noticed is that the code in plancache.c would invalidate a cached plan with a foreign join due to user mapping changes even in the case where user mappings are meaningless to the FDW.

To fix the first, I'd like to propose (1) replacing the existing has_foreign_join flag in the CachedPlan data structure with a new flag, say uses_user_mapping, that indicates whether a cached plan uses any user mapping, regardless of whether the cached plan has foreign joins or not (likewise, replacing the hasForeignJoin flag in PlannedStmt), and (2) invalidating the cached plan if the uses_user_mapping flag is true. I considered invalidating the cached plan only if the flag is true and there is a possibility of performing foreign joins, but it seems to me that updates on the corresponding catalog are infrequent enough that such detailed tracking is not worth the effort.

One benefit of the proposed approach is that we could make the FDW's handling of user mappings in BeginForeignScan more efficient. Currently, there is additional overhead caused by catalog re-lookups to obtain the user mapping information for the simple-foreign-table-scan case where user mappings mean something to the FDW, as in postgres_fdw (probably, updates on the catalogs are infrequent, though), but we could improve the efficiency by using the validated user mapping information created at planning time for that case as well as for the foreign-join case. For that, I'd like to propose storing the validated user mapping information into ForeignScan, not fdw_private. There is a discussion about using an ExtensibleNode [1] for that, but the proposed approach would make the FDW's job much simpler.

I don't think the above change is sufficient to fix the second. The root reason is that we currently allow the user mapping OID (rel->umid) to be InvalidOid in two cases: (1) user mappings mean something to the FDW but it can't get any user mapping at planning time, and (2) user mappings are meaningless to the FDW; so we cannot distinguish these two cases. So, I'd like to
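The proposed invalidation rule in point (1)/(2) above can be sketched with a toy plan cache. Class and method names are illustrative, not PostgreSQL's; the point is only that the flag tracks "uses any user mapping" rather than "has a foreign join", so the dropped-mapping case above gets replanned.

```python
# Toy plan cache sketching the proposal: replace has_foreign_join with
# uses_user_mapping and invalidate on any user-mapping catalog change
# when that flag is set.
class CachedPlan:
    def __init__(self, query, uses_user_mapping):
        self.query = query
        self.uses_user_mapping = uses_user_mapping
        self.valid = True

class PlanCache:
    def __init__(self):
        self.plans = []

    def add(self, plan):
        self.plans.append(plan)

    def on_user_mapping_change(self):
        # The old rule invalidated only plans containing foreign joins;
        # this rule also covers plans whose join was *not* pushed down,
        # fixing the "join not re-pushed-down after DROP USER MAPPING"
        # case from the transcript above.
        for p in self.plans:
            if p.uses_user_mapping:
                p.valid = False

cache = PlanCache()
join = CachedPlan("select * from ft1, v1 where ft1.a = v1.a",
                  uses_user_mapping=True)
local = CachedPlan("select * from t1", uses_user_mapping=False)
cache.add(join)
cache.add(local)
cache.on_user_mapping_change()
print(join.valid, local.valid)   # False True
```

A plan touching only local tables is untouched, which addresses the concern about invalidating plans for FDWs to which user mappings are meaningless only insofar as such plans would not set the flag.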
Re: [HACKERS] remove checkpoint_warning
On 07/09/2016 11:12 PM, Tom Lane wrote:
> Alvaro Herrera writes:
>> the checkpoint_warning feature was added by commit 2986aa6a668bce3cfb836
>> in November 2002, when we didn't have any logging of checkpointing at
>> all. I propose to remove it: surely anyone who cares about analyzing
>> checkpointing behavior nowadays is using the log_checkpoints feature
>> instead, which contains much more detail. The other one is just noise
>> now, and probably ignored amidst the number of other warning traffic.
>
> Hmm, not sure. ISTM log_checkpoints is oriented to people who know what
> they are doing, whereas checkpoint_warning is more targeted at trying
> to help people who don't. Perhaps you could make an argument that
> checkpoint_warning is useless because the people whom it's meant to help
> won't notice the warning anyway --- but I doubt that it's been
> "superseded" by log_checkpoints, because the latter would only be enabled
> by people who already have a clue that checkpoint performance is
> something to worry about.
>
> Or in short, this may be a fine change to make, but I don't like your
> argument for it.
>
> regards, tom lane

I think Tom is right here. log_checkpoints and checkpoint_warning are for totally different people.

We might just want to do one thing: state explicitly that the database cannot break down if this warning shows up. Many people are scared to death that this warning somehow indicates that PostgreSQL is about to go up in flames, which is of course not true. Maybe we could say "consider increasing to ensure good performance" or so ...

regards,

hans

--
Hans-Jürgen Schönig
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
Re: [HACKERS] remove checkpoint_warning
On Tue, Jul 12, 2016 at 3:32 AM, Craig Ringer wrote:
> On 11 July 2016 at 22:25, Stephen Frost wrote:
>> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>>> Alvaro Herrera writes:
>>>> the checkpoint_warning feature was added by commit 2986aa6a668bce3cfb836
>>>> in November 2002, when we didn't have any logging of checkpointing at
>>>> all. I propose to remove it: surely anyone who cares about analyzing
>>>> checkpointing behavior nowadays is using the log_checkpoints feature
>>>> instead, which contains much more detail. The other one is just noise
>>>> now, and probably ignored amidst the number of other warning traffic.
>>>
>>> Hmm, not sure. ISTM log_checkpoints is oriented to people who know what
>>> they are doing, whereas checkpoint_warning is more targeted at trying
>>> to help people who don't. Perhaps you could make an argument that
>>> checkpoint_warning is useless because the people whom it's meant to help
>>> won't notice the warning anyway --- but I doubt that it's been
>>> "superseded" by log_checkpoints, because the latter would only be enabled
>>> by people who already have a clue that checkpoint performance is
>>> something to worry about.
>>>
>>> Or in short, this may be a fine change to make, but I don't like your
>>> argument for it.
>>
>> I don't agree that it's sensible to get rid of. Having just
>> log_checkpoints will have the logs filled with checkpoints starting
>> because of XLOG, but there's no indication of that being a bad thing.
>
> Also, the warning is greppable-for and easily spotted by log-ingesting
> tools. I see no real reason to remove it.

+1, this is a useful warning.

I'd flip the original argument over instead and say that for *most* people, log_checkpoints output is mostly noise, and it's checkpoint_warning that's actually interesting. That one tells them whether they have to look at anything at all.

It's true that those who care about *analyzing* their checkpointing behaviour need to turn on log_checkpoints, but it's checkpoint_warning that tells them that they have to do this.

And the argument about it getting lost amongst other log traffic is, to me, an argument for not turning on log_checkpoints by default. That generates a lot of log entries that most people don't care for, and in doing so hides other things in the log. It's excellent to have once you need it, but it shouldn't be turned on by default.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
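For context, the two settings being contrasted in this thread are independent GUCs; a minimal postgresql.conf fragment (the values are purely illustrative, not recommendations):

```ini
# postgresql.conf -- illustrative values only
checkpoint_warning = 30s   # warn if xlog-triggered checkpoints occur
                           # closer together than this; 0 disables
log_checkpoints = off      # detailed per-checkpoint log lines; enable
                           # only when analyzing checkpoint behaviour
```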
Re: [HACKERS] GiST index build versus NaN coordinates
Tom Lane writes:
> More generally, this example makes me fearful that NaN coordinates in
> geometric values are likely to cause all sorts of issues. It's too late
> to disallow them, probably, but I wonder how we can identify other bugs
> of this ilk.

Sounds like some fuzz testing with nan/infinity is in order. sqlsmith doesn't generate any float literals, but it calls functions to satisfy its need for values of specific types. Adding suitable functions[1] to the regression db, I made the following observations:

The infinite loop from the bug report was triggered. Further, two previously unseen errors are logged:

    ERROR:  timestamp cannot be NaN
    ERROR:  getQuadrant: impossible case

The first is probably as boring as it gets; the second one is from the getQuadrant() in spgquadtreeproc.c.

Curiously, the getQuadrant()s in geo_spgist.c and rangetypes_spgist.c do not have such a check. I guess the boxes will just end up in an undefined position in the index for these.

regards
Andreas

Footnotes:
[1]

    create function smith_double_inf() returns float as
      $$select 'infinity'::float$$ language sql immutable;
    create function smith_double_ninf() returns float as
      $$select '-infinity'::float$$ language sql immutable;
    create function smith_double_nan() returns float as
      $$select 'nan'::float$$ language sql immutable;
    create function smith_real_nan() returns real as
      $$select 'nan'::real$$ language sql immutable;
    create function smith_real_inf() returns real as
      $$select 'infinity'::real$$ language sql immutable;
    create function smith_real_ninf() returns real as
      $$select '-infinity'::real$$ language sql immutable;