[HACKERS] Simplifying the interface of UpdateMinRecoveryPoint

2016-07-12 Thread Michael Paquier
Hi all,

As of now UpdateMinRecoveryPoint() is using two arguments:
- lsn, to check if the minimum recovery point should be updated to that
- force, a boolean flag to decide if the update should be enforced or not.
However, those two arguments are correlated: if lsn is
InvalidXLogRecPtr, the minimum recovery point update will be enforced.
Hence, why not simplify its interface and remove the force flag? See
attached.
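
To spell out the resulting call patterns (this just summarizes the
attached diff):

/* before: lsn and force are correlated */
UpdateMinRecoveryPoint(lsn, false);              /* conditional update */
UpdateMinRecoveryPoint(InvalidXLogRecPtr, true); /* forced update */

/* after: the LSN alone carries the intent */
UpdateMinRecoveryPoint(lsn);               /* update only if behind lsn */
UpdateMinRecoveryPoint(InvalidXLogRecPtr); /* force update to last replayed LSN */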

Thanks,
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e4645a3..5d69bd3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -827,7 +827,7 @@ static void RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRec
 static void UpdateLastRemovedPtr(char *filename);
 static void ValidateXLOGDirectoryStructure(void);
 static void CleanupBackupHistory(void);
-static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
+static void UpdateMinRecoveryPoint(XLogRecPtr lsn);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr,
 		   int emode, bool fetching_ckpt);
 static void CheckRecoveryConsistency(void);
@@ -2489,14 +2489,16 @@ XLogGetReplicationSlotMinimumLSN(void)
  * If we crash during recovery, we must reach this point again before the
  * database is consistent.
  *
- * If 'force' is true, 'lsn' argument is ignored. Otherwise, minRecoveryPoint
- * is only updated if it's not already greater than or equal to 'lsn'.
+ * If 'lsn' is InvalidXLogRecPtr, enforce its advance.  Otherwise,
+ * minRecoveryPoint is only updated if it's not already greater than or
+ * equal to 'lsn'.
  */
 static void
-UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
+UpdateMinRecoveryPoint(XLogRecPtr lsn)
 {
 	/* Quick check using our local copy of the variable */
-	if (!updateMinRecoveryPoint || (!force && lsn <= minRecoveryPoint))
+	if (!updateMinRecoveryPoint ||
+		(!XLogRecPtrIsInvalid(lsn) && lsn <= minRecoveryPoint))
 		return;
 
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
@@ -2512,7 +2514,7 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 	 */
 	if (minRecoveryPoint == 0)
 		updateMinRecoveryPoint = false;
-	else if (force || minRecoveryPoint < lsn)
+	else if (XLogRecPtrIsInvalid(lsn) || minRecoveryPoint < lsn)
 	{
 		XLogRecPtr	newMinRecoveryPoint;
 		TimeLineID	newMinRecoveryPointTLI;
@@ -2520,8 +2522,7 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 		/*
 		 * To avoid having to update the control file too often, we update it
 		 * all the way to the last record being replayed, even though 'lsn'
-		 * would suffice for correctness.  This also allows the 'force' case
-		 * to not need a valid 'lsn' value.
+		 * would suffice for correctness.
 		 *
 		 * Another important reason for doing it this way is that the passed
 		 * 'lsn' value could be bogus, i.e., past the end of available WAL, if
@@ -2535,7 +2536,7 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 		newMinRecoveryPointTLI = XLogCtl->replayEndTLI;
 		SpinLockRelease(&XLogCtl->info_lck);
 
-		if (!force && newMinRecoveryPoint < lsn)
+		if (!XLogRecPtrIsInvalid(lsn) && newMinRecoveryPoint < lsn)
 			elog(WARNING,
 			   "xlog min recovery request %X/%X is past current point %X/%X",
  (uint32) (lsn >> 32), (uint32) lsn,
@@ -2582,7 +2583,8 @@ XLogFlush(XLogRecPtr record)
 	 */
 	if (!XLogInsertAllowed())
 	{
-		UpdateMinRecoveryPoint(record, false);
+		Assert(!XLogRecPtrIsInvalid(record));
+		UpdateMinRecoveryPoint(record);
 		return;
 	}
 
@@ -5244,7 +5246,7 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 	/*
 	 * Update min recovery point one last time.
 	 */
-	UpdateMinRecoveryPoint(InvalidXLogRecPtr, true);
+	UpdateMinRecoveryPoint(InvalidXLogRecPtr);
 
 	/*
 	 * If the ending log segment is still open, close it (to avoid problems on
@@ -8750,7 +8752,7 @@ CreateRestartPoint(int flags)
 		(uint32) (lastCheckPoint.redo >> 32),
 		(uint32) lastCheckPoint.redo)));
 
-		UpdateMinRecoveryPoint(InvalidXLogRecPtr, true);
+		UpdateMinRecoveryPoint(InvalidXLogRecPtr);
 		if (flags & CHECKPOINT_IS_SHUTDOWN)
 		{
 			LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);



[HACKERS] RecoveryTargetTLI dead variable in XLogCtlData

2016-07-12 Thread Michael Paquier
Hi all,

I just bumped into $subject, a variable that is never set and never used:
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -631,8 +631,6 @@ typedef struct XLogCtlData
TimeLineID  replayEndTLI;
/* timestamp of last COMMIT/ABORT record replayed (or being replayed) */
TimestampTz recoveryLastXTime;
-   /* current effective recovery target timeline */
-   TimeLineID  RecoveryTargetTLI;

Thanks,
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e4645a3..232f533 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -631,8 +631,6 @@ typedef struct XLogCtlData
 	TimeLineID	replayEndTLI;
 	/* timestamp of last COMMIT/ABORT record replayed (or being replayed) */
 	TimestampTz recoveryLastXTime;
-	/* current effective recovery target timeline */
-	TimeLineID	RecoveryTargetTLI;
 
 	/*
 	 * timestamp of when we started replaying the current chunk of WAL data,



Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails

2016-07-12 Thread Michael Paquier
On Tue, Jul 12, 2016 at 12:22 PM, Amit Kapila  wrote:
> I think updating minRecoveryPoint unconditionally can change its
> purpose in some cases.  Refer to the comments below in the code:
>
> * minRecoveryPoint is updated to the latest replayed LSN whenever we
> * flush a data change during archive recovery. That guards against
> * starting archive recovery, aborting it, and restarting with an earlier
> * stop location. If we've already flushed data changes from WAL record X
> * to disk, we mustn't start up until we reach X again.
>
> Now, as per above rule, the value of minRecoveryPoint can be much
> smaller than XLogCtl->replayEndRecPtr.  I think this can change the
> rules when we can allow read-only connections.

That would delay the moment at which read-only connections are allowed
in hot standby, in the case where a server is stopped with "immediate"
and then restarted with a different target, if no data has been flushed
during replay.
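
To make the mechanism concrete, here is a toy model of that gate
(loosely patterned on CheckRecoveryConsistency() in xlog.c; the values
are invented for illustration):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;    /* stand-in for the real typedef */

/* Read-only connections are only allowed once replay has passed
 * minRecoveryPoint, so advancing minRecoveryPoint more eagerly can
 * only delay this moment after a restart. */
static bool
reached_consistency(XLogRecPtr min_recovery_point,
                    XLogRecPtr last_replayed_end)
{
    return min_recovery_point <= last_replayed_end;
}

int
main(void)
{
    XLogRecPtr last_replayed = 0x1000;

    printf("minRecoveryPoint=0x0800 -> consistent: %d\n",
           reached_consistency(0x0800, last_replayed));    /* 1 */
    printf("minRecoveryPoint=0x2000 -> consistent: %d\n",
           reached_consistency(0x2000, last_replayed));    /* 0 */
    return 0;
}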

> I think your and Kyotaro-san's point that minRecoveryPoint should be
> updated to support back-ups on standby has merits, but I think doing
> it unconditionally might lead to change in behaviour in some cases.

If we want to tackle the case I mentioned above, one way is to update
minRecoveryPoint only when an exclusive or a non-exclusive backup is
being taken, by looking at the backup status in shared memory. See for
example the attached patch (1). That does not, however, cover the case
of a node that replays WAL without flushing any data and from which a
backup is taken, when that node later gets restarted in immediate mode
with different replay targets.

Another way that just popped into my mind is to add dedicated fields to
XLogCtl that track the stop LSN of a backup properly, instead of using
minRecoveryPoint. In short, we'd update those fields in
CreateRestartPoint and UpdateMinRecoveryPoint under XLogCtl->info_lck;
conveniently, that lock is already taken there. See patch (2), which
accomplishes that.
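
Roughly, the idea would look like this (a sketch only; the field names
are invented here for illustration, and patch (2) is the authoritative
version):

#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-ins for the real typedefs */
typedef uint32_t TimeLineID;

/*
 * Dedicated backup-stop tracking, kept separate from minRecoveryPoint.
 * Both fields would be updated under XLogCtl->info_lck in
 * CreateRestartPoint() and UpdateMinRecoveryPoint().
 */
typedef struct BackupStopInfo
{
    XLogRecPtr  backupStopPoint;    /* stop LSN usable by a base backup */
    TimeLineID  backupStopPointTLI; /* timeline of that LSN */
} BackupStopInfo;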

Thinking about it, patch (2) is far cleaner: it avoids advancing
minRecoveryPoint where that is not needed, while still doing so for base
backups as it should. The dependency between the minimum recovery point
and the end location of a backup is thereby reduced.
-- 
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e4645a3..fa37ff1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8739,8 +8739,10 @@ CreateRestartPoint(int flags)
 	 * immediate shutdown, though.
 	 *
 	 * We don't explicitly advance minRecoveryPoint when we do create a
-	 * restartpoint. It's assumed that flushing the buffers will do that as a
-	 * side-effect.
+	 * restartpoint as it is assumed that flushing the buffers will do that as
+	 * a side-effect except when a backup is running to ensure that the start
+	 * LSN location of a backup is not newer than minRecoveryPoint which is
+	 * used as the stop location of a backup.
 	 */
 	if (XLogRecPtrIsInvalid(lastCheckPointRecPtr) ||
 		lastCheckPoint.redo <= ControlFile->checkPointCopy.redo)
@@ -8762,6 +8764,8 @@ CreateRestartPoint(int flags)
 		LWLockRelease(CheckpointLock);
 		return false;
 	}
+	else if (BackupInProgress(false))
+		UpdateMinRecoveryPoint(InvalidXLogRecPtr, true);
 
 	/*
 	 * Update the shared RedoRecPtr so that the startup process can calculate
@@ -10866,14 +10870,28 @@ rm_redo_error_callback(void *arg)
 /*
  * BackupInProgress: check if online backup mode is active
  *
- * This is done by checking for existence of the "backup_label" file.
+ * This is done by checking for existence of the "backup_label" file or by
+ * looking at the shared memory status of backups.
  */
 bool
-BackupInProgress(void)
+BackupInProgress(bool label_check)
 {
-	struct stat stat_buf;
+	bool res;
+
+	if (label_check)
+	{
+		struct stat stat_buf;
+		res = (stat(BACKUP_LABEL_FILE, &stat_buf) == 0);
+	}
+	else
+	{
+		WALInsertLockAcquireExclusive();
+		res = XLogCtl->Insert.nonExclusiveBackups > 0 ||
+			XLogCtl->Insert.exclusiveBackup;
+		WALInsertLockRelease();
+	}
 
-	return (stat(BACKUP_LABEL_FILE, &stat_buf) == 0);
+	return res;
 }
 
 /*
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 33383b4..2f381cd 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -636,7 +636,7 @@ pg_xlog_location_diff(PG_FUNCTION_ARGS)
 Datum
 pg_is_in_backup(PG_FUNCTION_ARGS)
 {
-	PG_RETURN_BOOL(BackupInProgress());
+	PG_RETURN_BOOL(BackupInProgress(true));
 }
 
 /*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 19d11e0..6795c69 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3527,7 +3527,7 @@ PostmasterStateMachine(void)
 		/*
 		 * PM_WAIT_BACKUP state ends when online backup mode is not active.
 		 */
-		if (!BackupInProgress())
+		if (!BackupInProgress(true))
 			pmState = PM_WAIT_BACKENDS;

Re: [HACKERS] pg_basebackup wish list

2016-07-12 Thread Masahiko Sawada
On Wed, Jul 13, 2016 at 1:53 AM, Jeff Janes  wrote:
> I've been having some adventures with pg_basebackup lately, and had
> some suggestions based on those.

And another thing: pg_basebackup never removes the directory specified
by the -D option even if execution fails, whereas initdb does clean up
its directory.  I think doing the same would be helpful for backup
operations.

Regards,

Masahiko Sawada




Re: [HACKERS] Showing parallel status in \df+

2016-07-12 Thread Stephen Frost
* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 7/12/16 12:17 PM, Tom Lane wrote:
> > It's sounding to me like we have consensus on this proposal to further
> > change \df+ to replace the "Source code" column with "Internal name",
> > which is prosrc for C and internal-language functions but NULL otherwise.
> > 
> > If I've not heard objections by tomorrow I'll go make that change.
> > 
> > Are we satisfied with telling people to use \sf to see the source code
> > for a PL function?  Or should there be another variant of \df that
> > still provides source code?
> 
> I'm quite fond of having the full source code show in \df+ and 

I'm curious how it's useful and in what way \sf does not accomplish what
you use \df+ for.  I understand that's a change, but I believe it's a
positive one and would make \df+ much more generally useful.  I tend to
resort to selecting columns out of pg_proc more often than I use \df+,
which is certainly not what we're going for.

> I'm
> against removing it on short notice past beta2

We've already had to change the structure of \df+; I'm not convinced
that avoiding doing so further now, just to do so again in the next
release, is actually a better answer than changing it now.

Thanks!

Stephen




Re: [HACKERS] Showing parallel status in \df+

2016-07-12 Thread Alvaro Herrera
Peter Eisentraut wrote:

> I'm quite fond of having the full source code show in \df+ and I'm
> against removing it on short notice past beta2, discussed under a
> "false" subject heading.

How do you use it?

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] IMPORT FOREIGN SCHEMA can't be run in in pl/pgsql due to INTO

2016-07-12 Thread Tom Lane
Merlin Moncure  writes:
> On Mon, Jul 11, 2016 at 3:09 PM, Tom Lane  wrote:
>> While we can certainly hack it by something along the lines of not
>> recognizing INTO when the first token was IMPORT, the whole thing
>> seems awfully messy and fragile.  And it will certainly break again
>> the next time somebody decides that INTO is le mot juste in some new
>> SQL command.  I wish we could think of a safer, more future-proof
>> solution.  I have no idea what that would be, though, short of
>> deprecating INTO altogether.

> This is a natural consequence of having two
> almost-but-not-quite-the-same grammars handling the same shared
> language.  There are a similar set of annoyances compiling C with a
> C++ compiler as we all know.  In a perfect world, SQL procedural
> extensions would be a proper superset and we'd have *one* grammar
> handling everything.  Among other niceties this would make moving
> forward with stored procedures a much simpler discussion.  Well, C'est
> la vie :-D.

Yeah, I was just belly-aching ;-).  Not much choice in the short term.
Fix pushed.

regards, tom lane




Re: [COMMITTERS] [HACKERS] Logical decoding

2016-07-12 Thread Fabrízio de Royes Mello
On Mon, Jul 11, 2016 at 2:13 AM, Michael Paquier wrote:
>
> > On Mon, Jul 11, 2016 at 2:03 PM, Craig Ringer wrote:
> > On 9 July 2016 at 01:57, Joshua Bay  wrote:
> >> and where are this code is in the codebase?
> >
> > src/backend/replication/logical/*
> > src/backend/replication/walsender.c
> > src/backend/access/transam/xlogreader.c
> > src/include/access/xlogreader.h
> > src/include/replication/output_plugin.h
> > contrib/test_decoding/
> > doc/src/sgml/logicaldecoding.sgml
>
> Some other references:
> https://wiki.postgresql.org/images/7/77/Fosdem-2015-logical-decoding.pdf
>
> And some other examples of plugins:
> https://github.com/leptonix/decoding-json
> https://github.com/xstevens/decoderbufs
> https://github.com/confluentinc/bottledwater-pg (for kafka)
> https://github.com/michaelpq/pg_plugins/tree/master/decoder_raw (wrote this one)
>

Nice, also we have wal2json [1].

Regards,


[1] https://github.com/eulerto/wal2json

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] pgbench - minor fix for meta command only scripts

2016-07-12 Thread Tom Lane
Fabien COELHO  writes:
> Ok. Here is an updated version, with a better suffix and a simplified 
> comment.

Doesn't this break the handling of latency calculations, or at least make
the results completely different for the last metacommand than what they
would be for a non-last command?  It looks like it needs to loop back so
that the latency calculation is completed for the metacommand before it
can exit.  Seems to me it would probably make more sense to fall out at
the end of the "transaction finished" if-block, around line 1923 in HEAD.

(The code structure in here seems like a complete mess to me, but probably
now is not the time to refactor it.)

regards, tom lane




Re: [HACKERS] GiST index build versus NaN coordinates

2016-07-12 Thread Tom Lane
I wrote:
> Andreas Seltenreich  writes:
>> The infinite loop from the bug report was triggered. Further, two
>> previously unseen errors are logged:
>> ERROR:  timestamp cannot be NaN
>> ERROR:  getQuadrant: impossible case
>> The first is probably as boring as it gets, the second one is from the
>> getQuadrant() in spgquadtreeproc.c.

> Yeah, the first one is presumably from float8_timestamptz() intentionally
> rejecting a NaN, which seems fine.

>> Curiously, the getQuadrant()s in geo_spgist.c and rangetypes_spgist.c do
>> not have such a check.  I guess the boxes will just end up in an
>> undefined position in the index for these.

> Right, we probably want them all to apply some consistent ordering ---
> doesn't matter so much what it is, but float8's rule is as good as any.

I looked into these a bit more closely.  I think rangetypes_spgist.c is
fine as-is: it's relying on the range component type to provide a total
ordering of its values, and if the component type fails to do that, it's
the fault of the component type not the range code.

spgquadtreeproc.c and geo_spgist.c both have got NaN issues, but the code
in both of them is rather tightly tied to the fuzzy-geometric-comparisons
logic that Emre has been messing with.  I think we ought to put "what
to do with NaNs?" into that same can of worms, rather than try to resolve
it separately.  (Yeah, I know, I just made Emre's job even harder.)

regards, tom lane




Re: [HACKERS] dumping database privileges broken in 9.6

2016-07-12 Thread Stephen Frost
All,

* Noah Misch (n...@leadboat.com) wrote:
> On Wed, Jun 29, 2016 at 11:50:17AM -0400, Stephen Frost wrote:
> > * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> > > Do this:
> > > 
> > > CREATE DATABASE test1;
> > > REVOKE CONNECT ON DATABASE test1 FROM PUBLIC;
> > > 
> > > Run pg_dumpall.
> > > 
> > > In 9.5, this produces
> > > 
> > > CREATE DATABASE test1 WITH TEMPLATE = template0 OWNER = peter;
> > > REVOKE ALL ON DATABASE test1 FROM PUBLIC;
> > > REVOKE ALL ON DATABASE test1 FROM peter;
> > > GRANT ALL ON DATABASE test1 TO peter;
> > > GRANT TEMPORARY ON DATABASE test1 TO PUBLIC;
> > > 
> > > In 9.6, this produces only
> > > 
> > > CREATE DATABASE test1 WITH TEMPLATE = template0 OWNER = peter;
> > > GRANT TEMPORARY ON DATABASE test1 TO PUBLIC;
> > > GRANT ALL ON DATABASE test1 TO peter;
> > > 
> > > Note that the REVOKE statements are missing.  This does not
> > > correctly recreate the original state.
> > 
> > I see what happened here, the query in dumpCreateDB() needs to be
> > adjusted to pull the default information to then pass to
> > buildACLComments(), similar to how the objects handled by pg_dump work.
> > The oversight was in thinking that databases didn't have any default
> > rights granted, which clearly isn't correct.
> > 
> > I'll take care of that in the next day or so and add an appropriate
> > regression test.
> 
> This PostgreSQL 9.6 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com

Attached is a patch to address this.

After much consideration and deliberation, I went with the simpler
solution to simply dump out the database privileges based on what a new
creation of those privileges would yield, resulting in output similar to
pre-9.6.  We document that template1 is allowed to be dropped/recreated,
which greatly complicates using pg_init_privs to record and produce a
delta against the initdb-time values, as we lose the connection between
pg_init_privs and the "template1" database as soon as it is dropped
(something which can't be done with objects in that catalog).

Comments welcome.

Thanks!

Stephen
From fc87c0a07f37d7b4bbcf067d22d31467a511e865 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Tue, 12 Jul 2016 15:37:35 -0400
Subject: [PATCH] Correctly dump database ACLs

---
 src/bin/pg_dump/pg_dumpall.c |  54 +++---
 src/bin/pg_dump/t/002_pg_dump.pl | 153 +++
 2 files changed, 195 insertions(+), 12 deletions(-)

diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index d4fb03e..99c0f90 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -1284,14 +1284,43 @@ dumpCreateDB(PGconn *conn)
 
 	PQclear(res);
 
-	/* Now collect all the information about databases to dump */
-	if (server_version >= 90300)
+
+	/*
+	 * Now collect all the information about databases to dump.
+	 *
+	 * For the database ACLs, as of 9.6, we extract both the positive (as
+	 * datacl) and negative (as rdatacl) ACLs, relative to the default ACL for
+	 * databases, which are then passed to buildACLCommands() below.
+	 *
+	 * See buildACLQueries() and buildACLCommands().
+	 *
+	 * Note that we do not support initial privileges (pg_init_privs) on
+	 * databases.
+	 */
+	if (server_version >= 90600)
+		res = executeQuery(conn,
+		   "SELECT datname, "
+		   "coalesce(rolname, (select rolname from pg_authid where oid=(select datdba from pg_database where datname='template0'))), "
+		   "pg_encoding_to_char(d.encoding), "
+		   "datcollate, datctype, datfrozenxid, datminmxid, "
+		   "datistemplate, "
+		   "(SELECT pg_catalog.array_agg(acl) FROM (SELECT pg_catalog.unnest(coalesce(datacl,pg_catalog.acldefault('d',datdba))) AS acl "
+		   "EXCEPT SELECT pg_catalog.unnest(pg_catalog.acldefault('d',datdba))) as foo)"
+		   "AS datacl,"
+		   "(SELECT pg_catalog.array_agg(acl) FROM (SELECT pg_catalog.unnest(pg_catalog.acldefault('d',datdba)) AS acl "
+		   "EXCEPT SELECT pg_catalog.unnest(coalesce(datacl,pg_catalog.acldefault('d',datdba as foo)"
+		   "AS rdatacl,"
+		   "datconnlimit, "
+		   "(SELECT spcname FROM pg_tablespace t WHERE t.oid = d.dattablespace) AS dattablespace "
+			  "FROM pg_database d LEFT JOIN pg_authid u ON (datdba = u.oid) "
+		   "WHERE datallowconn ORDER BY 1");
+	else if (server_version >= 90300)
 		res = executeQuery(conn,
 		   "SELECT datname, "
 		   "coalesce(rolname, (select rolname from pg_authid where oid=(select datdba from pg_database where datname='template0'))), "
 		   "pg_encoding_to_char(d.encoding), "
 		   "datcollate, datctype, datfrozenxid, datminmxid, "
-		   "datistemplate, datacl, datconnlimit, "
+		   "datistemplate, datacl, '' as 

[HACKERS] DO with a large amount of statements get stuck with high memory consumption

2016-07-12 Thread Merlin Moncure
I've noticed that pl/pgsql functions/do commands do not behave well
when a large statement block resolves and frees memory.  To be clear:

FOR i in 1..100
LOOP
  INSERT INTO foo VALUES (i);
END LOOP;

...runs just fine while

BEGIN
  INSERT INTO foo VALUES (1);
  INSERT INTO foo VALUES (2);
  ...
  INSERT INTO foo VALUES (100);
END;

(for the curious, create a script yourself via
copy (
  select
'do $$begin create temp table foo(i int);'
  union all select
format('insert into foo values (%s);', i) from generate_series(1,100) i
  union all select 'raise notice ''abandon all hope!''; end; $$;'
) to '/tmp/breakit.sql';

...while the latter consumes resident memory proportional to the number
of statements and eventually crashes the server.  The problem is obvious;
each statement causes a plan to get created and the server gets stuck
in a loop where SPI_freeplan() is called repeatedly.  Everything is
working as designed I guess, but when this happens it's really
unpleasant: the query is uncancellable and unterminatable, nicht gut.
A pg_ctl kill ABRT <pid> will do the trick but I was quite astonished
to see linux take a few minutes to clean up the mess (!) on a somewhat
pokey virtualized server with lots of memory.  With even as little as
ten thousand statements the cleanup time far exceed the runtime of the
statement block.

I guess the key takeaway here is, "don't do that"; pl/pgsql
aggressively generates plans and turns out to be a poor choice for
bulk loading because of all the plan caching.  Having said that, I
can't help but wonder if there should be a (perhaps user-configurable)
limit to the number of SPI plans a single function call should be able
to acquire, on the basis that you are otherwise going to smack into
very poor behaviors in the memory subsystem.

Stepping back, I can't help but wonder what the value of all the plan
caching is for statement blocks at all.  Loops might comprise a notable
exception, noted.  I'd humbly submit though that (relative to functions)
it's much more likely that one wants to do something like insert a lot
of statements, where it's impossible to utilize any cached plans.

This is not an academic gripe -- I just exploded production :-D.

merlin




Re: [HACKERS] GiST index build versus NaN coordinates

2016-07-12 Thread Tom Lane
I wrote:
> I looked into the problem reported in bug #14238,
> https://www.postgresql.org/message-id/20160708151747.1426.60...@wrigleys.postgresql.org
> I think that probably the most reasonable answer is to replace all the
> raw "double" comparisons in this code with float8_cmp_internal() or
> something that's the moral equivalent of that.  Does anyone want to
> propose something else?

Here's a draft patch that modifies gistproc.c to treat NaNs essentially
as though they are infinities.  I've only touched the code associated
with GiST index insertion, not with searches, so it's quite possible that
searches will still not work right in the presence of NaNs.  However,
this does fix the index-build infinite loop complained of originally.
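
To see why the naive Max()/Min() macros are the root of the trouble,
here is a small standalone program (not part of the patch) using the
Max() definition from c.h:

#include <math.h>
#include <stdio.h>

/* PostgreSQL's Max() macro; every comparison involving a NaN is false */
#define Max(x, y) ((x) > (y) ? (x) : (y))

int
main(void)
{
    double a = NAN;
    double b = 1.0;

    /* NaN > 1.0 is false, so Max() silently drops the NaN ... */
    printf("Max(NaN, 1.0) = %g\n", Max(a, b));  /* prints 1 */
    /* ... and 1.0 > NaN is false too, so argument order decides */
    printf("Max(1.0, NaN) = %g\n", Max(b, a));  /* prints nan */
    return 0;
}

With unions computed that way, a NaN coordinate can be kept or dropped
depending on argument order alone, which is exactly the kind of
inconsistency the FLOAT8_* macros below are meant to remove.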

I've not yet looked at the SPGiST issues reported by Andreas.

regards, tom lane

diff --git a/src/backend/access/gist/gistproc.c b/src/backend/access/gist/gistproc.c
index e8213e2..d47211a 100644
*** a/src/backend/access/gist/gistproc.c
--- b/src/backend/access/gist/gistproc.c
***
*** 17,36 
   */
  #include "postgres.h"
  
  #include "access/gist.h"
  #include "access/stratnum.h"
  #include "utils/geo_decls.h"
  
  
  static bool gist_box_leaf_consistent(BOX *key, BOX *query,
  		 StrategyNumber strategy);
- static double size_box(BOX *box);
  static bool rtree_internal_consistent(BOX *key, BOX *query,
  		  StrategyNumber strategy);
  
  /* Minimum accepted ratio of split */
  #define LIMIT_RATIO 0.3
  
  
  /**
   * Box ops
--- 17,47 
   */
  #include "postgres.h"
  
+ #include <math.h>
+ 
  #include "access/gist.h"
  #include "access/stratnum.h"
+ #include "utils/builtins.h"
  #include "utils/geo_decls.h"
  
  
  static bool gist_box_leaf_consistent(BOX *key, BOX *query,
  		 StrategyNumber strategy);
  static bool rtree_internal_consistent(BOX *key, BOX *query,
  		  StrategyNumber strategy);
  
  /* Minimum accepted ratio of split */
  #define LIMIT_RATIO 0.3
  
+ /* Convenience macros for NaN-aware comparisons */
+ #define FLOAT8_EQ(a,b)	(float8_cmp_internal(a, b) == 0)
+ #define FLOAT8_LT(a,b)	(float8_cmp_internal(a, b) < 0)
+ #define FLOAT8_LE(a,b)	(float8_cmp_internal(a, b) <= 0)
+ #define FLOAT8_GT(a,b)	(float8_cmp_internal(a, b) > 0)
+ #define FLOAT8_GE(a,b)	(float8_cmp_internal(a, b) >= 0)
+ #define FLOAT8_MAX(a,b)  (FLOAT8_GT(a, b) ? (a) : (b))
+ #define FLOAT8_MIN(a,b)  (FLOAT8_LT(a, b) ? (a) : (b))
+ 
  
  /**
   * Box ops
*** static bool rtree_internal_consistent(BO
*** 40,51 
   * Calculates union of two boxes, a and b. The result is stored in *n.
   */
  static void
! rt_box_union(BOX *n, BOX *a, BOX *b)
  {
! 	n->high.x = Max(a->high.x, b->high.x);
! 	n->high.y = Max(a->high.y, b->high.y);
! 	n->low.x = Min(a->low.x, b->low.x);
! 	n->low.y = Min(a->low.y, b->low.y);
  }
  
  /*
--- 51,103 
   * Calculates union of two boxes, a and b. The result is stored in *n.
   */
  static void
! rt_box_union(BOX *n, const BOX *a, const BOX *b)
  {
! 	n->high.x = FLOAT8_MAX(a->high.x, b->high.x);
! 	n->high.y = FLOAT8_MAX(a->high.y, b->high.y);
! 	n->low.x = FLOAT8_MIN(a->low.x, b->low.x);
! 	n->low.y = FLOAT8_MIN(a->low.y, b->low.y);
! }
! 
! /*
!  * Size of a BOX for penalty-calculation purposes.
!  * The result can be +Infinity, but not NaN.
!  */
! static double
! size_box(const BOX *box)
! {
! 	/*
! 	 * Check for zero-width cases.  Note that we define the size of a zero-
! 	 * by-infinity box as zero.  It's important to special-case this somehow,
! 	 * as naively multiplying infinity by zero will produce NaN.
! 	 *
! 	 * The less-than cases should not happen, but if they do, say "zero".
! 	 */
! 	if (FLOAT8_LE(box->high.x, box->low.x) ||
! 		FLOAT8_LE(box->high.y, box->low.y))
! 		return 0.0;
! 
! 	/*
! 	 * We treat NaN as larger than +Infinity, so any distance involving a NaN
! 	 * and a non-NaN is infinite.  Note the previous check eliminated the
! 	 * possibility that the low fields are NaNs.
! 	 */
! 	if (isnan(box->high.x) || isnan(box->high.y))
! 		return get_float8_infinity();
! 	return (box->high.x - box->low.x) * (box->high.y - box->low.y);
! }
! 
! /*
!  * Return amount by which the union of the two boxes is larger than
!  * the original BOX's area.  The result can be +Infinity, but not NaN.
!  */
! static double
! box_penalty(const BOX *original, const BOX *new)
! {
! 	BOX			unionbox;
! 
! 	rt_box_union(&unionbox, original, new);
! 	return size_box(&unionbox) - size_box(original);
  }
  
  /*
*** gist_box_consistent(PG_FUNCTION_ARGS)
*** 85,100 
   strategy));
  }
  
  static void
! adjustBox(BOX *b, BOX *addon)
  {
! 	if (b->high.x < addon->high.x)
  		b->high.x = addon->high.x;
! 	if (b->low.x > addon->low.x)
  		b->low.x = addon->low.x;
! 	if (b->high.y < addon->high.y)
  		b->high.y = addon->high.y;
! 	if (b->low.y > addon->low.y)
  		b->low.y = addon->low.y;
  }
  
--- 

Re: [HACKERS] pg_basebackup wish list

2016-07-12 Thread Kenneth Marshall
On Tue, Jul 12, 2016 at 11:06:39AM -0700, Jeff Janes wrote:
> On Tue, Jul 12, 2016 at 10:48 AM, Peter Eisentraut
>  wrote:
> > On 7/12/16 12:53 PM, Jeff Janes wrote:
> >> The --help message for pg_basebackup says:
> >>
> >> -Z, --compress=0-9 compress tar output with given compression level
> >>
> >> But -Z0 is then rejected as 'invalid compression level "0"'.  The real
> >> docs do say 1-9, only the --help message has this bug.  Trivial patch
> >> attached.
> >
> > pg_dump --help and man page say it supports 0..9.  Maybe we should make
> > that more consistent.
> 
> pg_dump actually does support -Z0, though.  Well, sort of.  It outputs
> plain text.  Rather than plain text wrapped in some kind of dummy gzip
> header, which is what I had naively expected.
> 
> Is that what -Z0 in pg_basebackup should do as well, just output
> uncompressed tar data, and not add the ".gz" to the "base.tar" file
> name?
> 
> Cheers,
> 
> Jeff
> 

Hi,

Yes, please support the no compression option. It can be useful in
situations where the bottleneck is the compression itself (quite
easily done with zlib based options, another plug for a higher
performance option).

Regards,
Ken




Re: [HACKERS] pg_basebackup wish list

2016-07-12 Thread Jeff Janes
On Tue, Jul 12, 2016 at 10:48 AM, Peter Eisentraut wrote:
> On 7/12/16 12:53 PM, Jeff Janes wrote:
>> The --help message for pg_basebackup says:
>>
>> -Z, --compress=0-9 compress tar output with given compression level
>>
>> But -Z0 is then rejected as 'invalid compression level "0"'.  The real
>> docs do say 1-9, only the --help message has this bug.  Trivial patch
>> attached.
>
> pg_dump --help and man page say it supports 0..9.  Maybe we should make
> that more consistent.

pg_dump actually does support -Z0, though.  Well, sort of.  It outputs
plain text.  Rather than plain text wrapped in some kind of dummy gzip
header, which is what I had naively expected.

Is that what -Z0 in pg_basebackup should do as well, just output
uncompressed tar data, and not add the ".gz" to the "base.tar" file
name?

Cheers,

Jeff




Re: [HACKERS] pgbench - compute & show latency consistently

2016-07-12 Thread Peter Eisentraut
On 7/9/16 4:42 AM, Fabien COELHO wrote:
>  number of transactions per client: 1000
>  number of transactions actually processed: 1/1
> -latency average = 15.844 ms
> -latency stddev = 2.715 ms
> +latency average: 15.844 ms
> +latency stddev: 2.715 ms
>  tps = 618.764555 (including connections establishing)
>  tps = 622.977698 (excluding connections establishing)

I think what you have here is that colons separate input parameters and
equal signs separate result output.  So I think it's OK the way it is.

Maybe a better improvement would be introducing section headings like
"test parameters" and "test results".

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Fix Error Message for allocate_recordbuf() Failure

2016-07-12 Thread Shoaib Lari
Thank you for the feedback.

We had encountered this error message when allocating a record buf of 344
bytes.  The message "record length 344 at %X/%X too long" along with the
comment /* We treat this as a "bogus data" condition */  masked the OOM
condition, implying an error in log record size calculation logic.

The actual allocation that failed was for 5 * Max(BLCKSZ, XLOG_BLCKSZ), a
much larger allocation.
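
As a standalone sketch of the reporting pattern being proposed (not the
actual xlogreader code; the sizes are taken from the numbers above):

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    unsigned int total_len = 344;          /* record length from the report */
    size_t       alloc_size = 5 * 8192;    /* 5 * Max(BLCKSZ, XLOG_BLCKSZ) */
    /* stand-in for palloc_extended(alloc_size, MCXT_ALLOC_NO_OOM),
     * which returns NULL instead of throwing on OOM */
    char        *buf = malloc(alloc_size);

    if (buf == NULL)
    {
        /* report the size we tried to allocate, not "bogus data" */
        fprintf(stderr,
                "cannot allocate %zu bytes for record of length %u\n",
                alloc_size, total_len);
        return 1;
    }
    free(buf);
    return 0;
}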

Shoaib


On Sun, Jul 10, 2016 at 10:58 PM, Craig Ringer wrote:

>
>
> On 11 July 2016 at 12:04, Michael Paquier 
> wrote:
>
>> On Sat, Jul 9, 2016 at 2:58 AM, Shoaib Lari  wrote:
>> > Besides making the error message more informative, we had to modify
>> > allocate_recordbuf() to return the actual number of bytes that were
>> being
>> > allocated.
>>
>> -   report_invalid_record(state, "record length %u at %X/%X too long",
>> - total_len,
>> - (uint32) (RecPtr >> 32), (uint32) RecPtr);
>> +   report_invalid_record(state,
>> + "cannot allocate %u bytes for record
>> length %u at %X/%X",
>> + newSizeOut, total_len, (uint32) (RecPtr >>
>> 32),
>> + (uint32) RecPtr);
>>
>> It does not look like a good idea to me to complicate the interface of
>> allocate_recordbuf just to make one error message more verbose,
>> meaning that it is basically a complaint about the fact that
>> palloc_extended(MCXT_ALLOC_NO_OOM) does not mention to the user the
>> size that it has tried to allocate before returning NULL. Do you have
>> a use case that would prove to be useful if this extra piece of
>> information is provided? Because it seems to me that we don't really
>> care if we know how much memory it has failed to allocate, we only
>> want to know that it *has* failed and take actions only based on that.
>>
>
> I actually find details of how much memory we tried to allocate to be
> really useful in other places that do emit it. Sometimes it's been an
> important clue when trying to figure out what's going on on a remote system
> with no or limited access. Even if it just tells me "we were probably
> incrementally allocating lots of small values since this failure is small"
> vs "this allocation is huge, look for something unusually large" or "this
> allocation is impossibly huge / some suspicious value look for memory
> corruption".
>
> I tend to agree with your suggestion about a better approach but do think
> emitting details on allocation failures is useful. At least when people
> turn VM overcommit off ...
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] pg_basebackup wish list

2016-07-12 Thread Peter Eisentraut
On 7/12/16 12:53 PM, Jeff Janes wrote:
> The --help message for pg_basebackup says:
> 
> -Z, --compress=0-9 compress tar output with given compression level
> 
> But -Z0 is then rejected as 'invalid compression level "0"'.  The real
> docs do say 1-9, only the --help message has this bug.  Trivial patch
> attached.

pg_dump --help and man page say it supports 0..9.  Maybe we should make
that more consistent.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Showing parallel status in \df+

2016-07-12 Thread Peter Eisentraut
On 7/12/16 12:17 PM, Tom Lane wrote:
> It's sounding to me like we have consensus on this proposal to further
> change \df+ to replace the "Source code" column with "Internal name",
> which is prosrc for C and internal-language functions but NULL otherwise.
> 
> If I've not heard objections by tomorrow I'll go make that change.
> 
> Are we satisfied with telling people to use \sf to see the source code
> for a PL function?  Or should there be another variant of \df that
> still provides source code?

I'm quite fond of having the full source code show in \df+ and I'm
against removing it on short notice past beta2, discussed under a
"false" subject heading.

This is long-standing, intentional behavior, not a regression, and
changing it should get wider consultation.  Please submit a patch to the
next commit fest instead.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Showing parallel status in \df+

2016-07-12 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> Are we satisfied with telling people to use \sf to see the source code
> >> for a PL function?  Or should there be another variant of \df that
> >> still provides source code?
> 
> > I don't see the point in having a \df variant be the same as what \sf
> > is.  I could possibly see extending \sf in some way, if there are things
> > that it doesn't currently do that \df does (and those things are
> > useful).
> 
> I certainly agree that \sf already does what it does just fine.  The
> question is more about whether anyone is likely to think that removing
> source code from \df+ output constitutes an important loss of
> functionality.

Right, I understood that to be your question and was intending to answer
it with "no."

> I had some vague ideas about inventing a new \df behavior modeled on
> the way that \d+ shows view definitions, that is, put the function body
> in a footer rather than in the tabular output proper.  So you could
> imagine something like
> 
> # \df++ foo*
>  Schema | Name | ...
> +--+-...
>  public | fooa | ...
>  public | foob | ...
> Source code for fooa(int, text):
>   ... body of fooa ...
> Source code for foob(text, text, numeric):
>   ... body of foob ...
> 
> But I'm not sure it's worth the trouble.  And anyway we could add this
> later.

Agreed on both counts.

Thanks!

Stephen




Re: [HACKERS] Showing parallel status in \df+

2016-07-12 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Are we satisfied with telling people to use \sf to see the source code
>> for a PL function?  Or should there be another variant of \df that
>> still provides source code?

> I don't see the point in having a \df variant be the same as what \sf
> is.  I could possibly see extending \sf in some way, if there are things
> that it doesn't currently do that \df does (and those things are
> useful).

I certainly agree that \sf already does what it does just fine.  The
question is more about whether anyone is likely to think that removing
source code from \df+ output constitutes an important loss of
functionality.

I had some vague ideas about inventing a new \df behavior modeled on
the way that \d+ shows view definitions, that is, put the function body
in a footer rather than in the tabular output proper.  So you could
imagine something like

# \df++ foo*
 Schema | Name | ...
+--+-...
 public | fooa | ...
 public | foob | ...
Source code for fooa(int, text):
  ... body of fooa ...
Source code for foob(text, text, numeric):
  ... body of foob ...

But I'm not sure it's worth the trouble.  And anyway we could add this
later.

regards, tom lane




[HACKERS] pg_basebackup wish list

2016-07-12 Thread Jeff Janes
I've been having some adventures with pg_basebackup lately, and had
some suggestions based on those.

The --help message for pg_basebackup says:

-Z, --compress=0-9 compress tar output with given compression level

But -Z0 is then rejected as 'invalid compression level "0"'.  The real
docs do say 1-9, only the --help message has this bug.  Trivial patch
attached.

These ones I have not written code for yet:

The progress reporting for pg_basebackup is pretty terse:

 858117/7060099 kB (12%), 0/1 tablespace

I think we should at least add a count-up timer showing the seconds it
has been running.  I can always use my own stopwatch, but that is not
very friendly and easy to forget to start.

And maybe change the reporting units from kB to MB when the pre-scan
says the total size exceeds some threshold?  At the same limit
pg_size_pretty does?
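
For reference, pg_size_pretty() switches units once a value reaches
roughly 10240 of the current unit; here is a standalone approximation
(not the server code) applied to the total above:

#include <stdint.h>
#include <stdio.h>

/* Simplified take on pg_size_pretty()'s unit selection */
static void
size_pretty(int64_t bytes, char *buf, size_t buflen)
{
    const char *units[] = {"bytes", "kB", "MB", "GB", "TB"};
    int         u = 0;
    double      v = (double) bytes;

    while (v >= 10240.0 && u < 4)
    {
        v /= 1024.0;
        u++;
    }
    snprintf(buf, buflen, "%.0f %s", v, units[u]);
}

int
main(void)
{
    char buf[32];

    size_pretty((int64_t) 7060099 * 1024, buf, sizeof(buf));
    printf("%s\n", buf);        /* prints "6895 MB" */
    return 0;
}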

If I use the verbose flag, then the progress message includes the name
of the file being written to by the client. However, in -Ft mode, this
is always "something/base.tar" (unless you are using tablespaces),
which is not terribly useful. Should it instead report the name of the
file being read on the server end?

When using pg_basebackup from the wrong version, the error message it
reports is pretty unhelpful:

pg_basebackup: could not initiate base backup: ERROR:  syntax error

Could we have a newer version of pg_basebackup capture that error and
inject a HINT, or is there a better solution for getting a better
error message?

Cheers,

Jeff


pg_basebackup_compress.patch
Description: Binary data



Re: [HACKERS] Showing parallel status in \df+

2016-07-12 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Agreed.  I don't have any issue with "Language", really, but I agree
> > that "Source code" makes the output pretty ridiculous.  I also liked the
> > idea of changing the name to "internal name" or something along those
> > lines, rather than having it be "source code", if we keep the column for
> > C/internal functions.  Keeping it as "source code" wouldn't be accurate.
> 
> It's sounding to me like we have consensus on this proposal to further
> change \df+ to replace the "Source code" column with "Internal name",
> which is prosrc for C and internal-language functions but NULL otherwise.
> 
> If I've not heard objections by tomorrow I'll go make that change.
> 
> Are we satisfied with telling people to use \sf to see the source code
> for a PL function?  Or should there be another variant of \df that
> still provides source code?

I don't see the point in having a \df variant be the same as what \sf
is.  I could possibly see extending \sf in some way, if there are things
that it doesn't currently do that \df does (and those things are
useful).

Thanks!

Stephen




Re: [HACKERS] GiST index build versus NaN coordinates

2016-07-12 Thread Tom Lane
Andreas Seltenreich  writes:
> Tom Lane writes:
>> More generally, this example makes me fearful that NaN coordinates in
>> geometric values are likely to cause all sorts of issues.  It's too late
>> to disallow them, probably, but I wonder how can we identify other bugs
>> of this ilk.

> Sounds like some fuzz testing with nan/infinity is in order.  sqlsmith
> doesn't generate any float literals, but it calls functions to satisfy
> its need for values of specific types.  Adding suitable functions[1] to
> the regression db, I made the following observations:

This is really useful, thanks!

> The infinite loop from the bug report was triggered. Further, two
> previously unseen errors are logged:
> ERROR:  timestamp cannot be NaN
> ERROR:  getQuadrant: impossible case
> The first is probably as boring as it gets, the second one is from the
> getQuadrant() in spgquadtreeproc.c.

Yeah, the first one is presumably from float8_timestamptz() intentionally
rejecting a NaN, which seems fine.

> Curiously, the getQuadrant()s in geo_spgist.c and rangetypes_spgist.c do
> not have such a check.  I guess the boxes will just end up in an
> undefined position in the index for these.

Right, we probably want them all to apply some consistent ordering ---
doesn't matter so much what it is, but float8's rule is as good as any.

regards, tom lane




Re: [HACKERS] use of SEQ_MINVALUE in btree_gin

2016-07-12 Thread Tom Lane
Peter Eisentraut  writes:
> btree_gin uses SEQ_MINVALUE as a way to get the smallest int64 value.
> This is actually wrong because the smallest int64 value is
> SEQ_MINVALUE-1, so this might be slightly broken.

> The whole thing was done as a convenience when INT64_IS_BUSTED had to be
> considered, but I think we can get rid of that now.  See attached
> proposed patch.

+1.  I agree that this is actually a bug fix, so it should be back-patched.
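
For the archives, the off-by-one in question, as a standalone sketch
using the 9.5-era definitions from sequence.h:

#include <stdint.h>
#include <stdio.h>

/* sequence.h (modulo the INT64_IS_BUSTED branch) */
#define SEQ_MAXVALUE  INT64_C(0x7FFFFFFFFFFFFFFF)
#define SEQ_MINVALUE  (-SEQ_MAXVALUE)

int
main(void)
{
    /* SEQ_MINVALUE is -(2^63 - 1), one above the smallest int64 value */
    printf("SEQ_MINVALUE = %lld\n", (long long) SEQ_MINVALUE);
    printf("INT64_MIN    = %lld\n", (long long) INT64_MIN);
    printf("difference   = %lld\n",
           (long long) (SEQ_MINVALUE - INT64_MIN));    /* prints 1 */
    return 0;
}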

regards, tom lane




Re: [HACKERS] Showing parallel status in \df+

2016-07-12 Thread Tom Lane
Stephen Frost  writes:
> Agreed.  I don't have any issue with "Language", really, but I agree
> that "Source code" makes the output pretty ridiculous.  I also liked the
> idea of changing the name to "internal name" or something along those
> lines, rather than having it be "source code", if we keep the column for
> C/internal functions.  Keeping it as "source code" wouldn't be accurate.

It's sounding to me like we have consensus on this proposal to further
change \df+ to replace the "Source code" column with "Internal name",
which is prosrc for C and internal-language functions but NULL otherwise.

If I've not heard objections by tomorrow I'll go make that change.

Are we satisfied with telling people to use \sf to see the source code
for a PL function?  Or should there be another variant of \df that
still provides source code?

regards, tom lane




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-12 Thread Kevin Grittner
On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund  wrote:

> I'm a bit confused, why aren't we simply adding LSN interlock
> checks for toast? Doesn't look that hard? Seems like a much more
> natural course of fixing this issue?

I took some time trying to see what you have in mind, and I'm
really not "getting it".  I definitely applaud you for spotting the
problem, but this suggestion for solving it doesn't seem to be
useful.

Basically, after turning this suggestion every way I could, I see
two alternative ways to implement it.

(1)  Whenever TestForOldSnapshot() checks a heap page, check
whether the related toast is OK for all visible tuples on that
page.  It would be enough to check one toast tuple for one value
per heap tuple, but still -- this would be really nasty from a
performance perspective.

(2)  To deal with the fact that only about 7% of the
BufferGetPage() calls need to make this check, all functions and
macros which read toast data from the table would need extra
parameters, and all call sites for the toast API would need to have
such context information passed to them, so they could specify this
correctly.  Ugh.

Compared to those options, the approach I was taking, where the fix
is "automatic" but some workloads where old_snapshot_threshold is
on would sequentially read some toast indexes more often seems
pretty tame.

Do you see some other option that fits what you describe?  I'll
give you a couple days to respond before coding the patch.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] One process per session lack of sharing

2016-07-12 Thread AMatveev
Hi

> amatv...@bitec.ru writes:
>> Is there any plan to implement "session per thread" or "shared
>> sessions between threads"?

> No, not really.  The amount of overhead that would add --- eg, the need
> for locking on what used to be single-use caches --- makes the benefit
> highly questionable.
A two-layer cache is the best answer.
>  Also, most people who need this find that sticking
> a connection pooler in front of the database solves their problem
It has some disadvantages, the lack of temporary tables for example:
practical usage of temporary tables with a connection pooler is highly
questionable.
And so on.
> , so
> there's not that much motivation to do a ton of work inside the database
> to solve it there.
It is clear. Thank you.




Re: [HACKERS] One process per session lack of sharing

2016-07-12 Thread Tom Lane
amatv...@bitec.ru writes:
> Is there any plan to implement "session per thread" or "shared
> sessions between threads"?

No, not really.  The amount of overhead that would add --- eg, the need
for locking on what used to be single-use caches --- makes the benefit
highly questionable.  Also, most people who need this find that sticking
a connection pooler in front of the database solves their problem, so
there's not that much motivation to do a ton of work inside the database
to solve it there.

regards, tom lane




[HACKERS] One process per session lack of sharing

2016-07-12 Thread AMatveev
Hi

Is there any plan to implement "session per thread" or "shared
sessions between threads"?
We have analyzed the possibility of contributing a PL/pgSQL-to-JVM-bytecode
compiler, but with the current thread model this idea is far from optimal.
(The VM could be different, of course, but currently we use Oracle, and
the JVM is important for us.)

We have faced some lack of resource sharing.
So in our test, memory usage per session is:
Oracle: about 5M
MSSqlServer: about 4M
PostgreSQL: about 160M

It's discussed on pgsql-gene...@postgresql.org:
http://www.mail-archive.com/pgsql-general@postgresql.org/msg206452.html


>I think the "problem" that he is having is fixable only by changing how
>PostgreSQL itself works. His problem is a PL/pgSQL function which is 11K
>lines in length. When invoked, this function is "compiled" into a large
>tokenized parse tree. This parse tree is only usable in the session which
>invoked the the function. Apparently this parse tree takes a lot of memory.
>And "n" concurrent users of this, highly used, function will therefore
>require "n" times as much memory because the parse tree is _not_
>shareable.  This is explained in:
>https://www.postgresql.org/docs/9.5/static/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING

Next interesting answer (from Karl Czajkowski, in private):
>  But, I searched the
> archives of the mailing list, and when others have previously
> suggested such caching or reuse, it was immediately shot down by core
> developers.






Re: [HACKERS] IMPORT FOREIGN SCHEMA can't be run in in pl/pgsql due to INTO

2016-07-12 Thread Merlin Moncure
On Mon, Jul 11, 2016 at 3:09 PM, Tom Lane  wrote:
> Merlin Moncure  writes:
>> Currently pl/pgsql interprets the mandatory INTO of IMPORT FOREIGN
>> SCHEMA as INTO variable.
>
> Ugh, that's definitely a bug.
>
>> I estimate this to be minor oversight in
>> pl/pgsql parsing with respect to the introduction of this statement.
>
> While we can certainly hack it by something along the lines of not
> recognizing INTO when the first token was IMPORT, the whole thing
> seems awfully messy and fragile.  And it will certainly break again
> the next time somebody decides that INTO is le mot juste in some new
> SQL command.  I wish we could think of a safer, more future-proof
> solution.  I have no idea what that would be, though, short of
> deprecating INTO altogether.

This is a natural consequence of having two
almost-but-not-quite-the-same grammars handling the same shared
language.  There are a similar set of annoyances compiling C with a
C++ compiler as we all know.  In a perfect world, SQL procedural
extensions would be a proper superset and we'd have *one* grammar
handling everything.  Among other niceties this would make moving
forward with stored procedures a much simpler discussion.  Well, C'est
la vie :-D.

merlin




Re: [HACKERS] Showing parallel status in \df+

2016-07-12 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Michael Paquier  writes:
> > On Tue, Jul 12, 2016 at 11:36 AM, Alvaro Herrera
> >  wrote:
> >> So prosrc for internal/C and NULL for others?  WFM.
> 
> > And so we'd remove "Language" at the same time? That does not sound bad to 
> > me.
> 
> Hm, I wasn't thinking of that step.  The main knock on "Source code" is
> that it is usually too large to fit into the display grid --- but that
> argument doesn't work against "Language".  Also, while "Language" is
> certainly an implementation detail in some sense, it is a pretty useful
> detail: it gives you a good hint about the likely speed of the function,
> for instance.

Agreed.  I don't have any issue with "Language", really, but I agree
that "Source code" makes the output pretty ridiculous.  I also liked the
idea of changing the name to "internal name" or something along those
lines, rather than having it be "source code", if we keep the column for
C/internal functions.  Keeping it as "source code" wouldn't be accurate.

Thanks!

Stephen




[HACKERS] Oddity in handling of cached plans for FDW queries

2016-07-12 Thread Etsuro Fujita

Hi,

I noticed odd behavior in invalidating a cached plan with a foreign join  
due to user mapping changes.  Consider:


postgres=# create extension postgres_fdw;
postgres=# create server loopback foreign data wrapper postgres_fdw  
options (dbname 'postgres');

postgres=# create user mapping for public server loopback;
postgres=# create table t1 (a int, b int);
postgres=# insert into t1 values (1, 1);
postgres=# create foreign table ft1 (a int, b int) server loopback options (table_name 't1');

postgres=# analyze ft1;
postgres=# create view v1 as select * from ft1;
postgres=# create user v1_owner;
postgres=# alter view v1 owner to v1_owner;
postgres=# grant select on ft1 to v1_owner;
postgres=# prepare join_stmt as select * from ft1, v1 where ft1.a = v1.a;
postgres=# explain verbose execute join_stmt;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=100.00..102.04 rows=1 width=16)
   Output: ft1.a, ft1.b, ft1_1.a, ft1_1.b
   Relations: (public.ft1) INNER JOIN (public.ft1)
   Remote SQL: SELECT r1.a, r1.b, r5.a, r5.b FROM (public.t1 r1 INNER JOIN public.t1 r5 ON (((r1.a = r5.a))))
(4 rows)

That's great.

postgres=# create user mapping for v1_owner server loopback;
postgres=# explain verbose execute join_stmt;
                                 QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop  (cost=200.00..202.07 rows=1 width=16)
   Output: ft1.a, ft1.b, ft1_1.a, ft1_1.b
   Join Filter: (ft1.a = ft1_1.a)
   ->  Foreign Scan on public.ft1  (cost=100.00..101.03 rows=1 width=8)
         Output: ft1.a, ft1.b
         Remote SQL: SELECT a, b FROM public.t1
   ->  Foreign Scan on public.ft1 ft1_1  (cost=100.00..101.03 rows=1 width=8)
         Output: ft1_1.a, ft1_1.b
         Remote SQL: SELECT a, b FROM public.t1
(9 rows)

That's also great.  Since ft1 and v1 use different user mappings, the  
join should not be pushed down.


postgres=# drop user mapping for v1_owner server loopback;
postgres=# explain verbose execute join_stmt;
                                 QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop  (cost=200.00..202.07 rows=1 width=16)
   Output: ft1.a, ft1.b, ft1_1.a, ft1_1.b
   Join Filter: (ft1.a = ft1_1.a)
   ->  Foreign Scan on public.ft1  (cost=100.00..101.03 rows=1 width=8)
         Output: ft1.a, ft1.b
         Remote SQL: SELECT a, b FROM public.t1
   ->  Foreign Scan on public.ft1 ft1_1  (cost=100.00..101.03 rows=1 width=8)
         Output: ft1_1.a, ft1_1.b
         Remote SQL: SELECT a, b FROM public.t1
(9 rows)

Seems odd to me.  Both relations use the same user mapping as before, so  
the join should be pushed down.
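
To double-check, planning the query from scratch at this point should
push the join down again, which would confirm that the problem is
confined to the invalidation of the already-cached plan:

postgres=# deallocate join_stmt;
postgres=# prepare join_stmt as select * from ft1, v1 where ft1.a = v1.a;
postgres=# explain verbose execute join_stmt;

I'd expect this to show a single pushed-down Foreign Scan, as in the
first plan above.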


Another thing I noticed is that the code in plancache.c would invalidate  
a cached plan with a foreign join due to user mapping changes even in  
the case where user mappings are meaningless to the FDW.


To fix the first problem, I'd like to propose (1) replacing the
existing has_foreign_join flag in the CachedPlan data structure with a
new flag, say uses_user_mapping, that indicates whether a cached plan
uses any user mapping, regardless of whether the plan contains foreign
joins (and likewise replacing the hasForeignJoin flag in PlannedStmt),
and (2) invalidating the cached plan whenever the uses_user_mapping
flag is true.  I considered invalidating the cached plan only if the
flag is true and there is a possibility of performing foreign joins,
but updates on the corresponding catalog seem infrequent enough that
such detailed tracking is not worth the effort.

One benefit of the proposed approach is that it would let us make the
FDW's handling of user mappings in BeginForeignScan more efficient.
Currently, in the simple-foreign-table-scan case where user mappings
mean something to the FDW, as in postgres_fdw, there is extra overhead
from catalog re-lookups to obtain the user mapping information (though,
again, updates on those catalogs are probably infrequent).  We could
avoid that by reusing the validated user mapping information created at
planning time, for that case as well as for the foreign-join case.  For
that, I'd like to propose storing the validated user mapping
information in ForeignScan itself, not in fdw_private.  There is a
discussion about using an ExtensibleNode [1] for that, but the proposed
approach would make the FDW's job much simpler.
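
As a sketch of what that would mean in practice: under the proposed
scheme, any user mapping DDL, for example

postgres=# alter user mapping for public server loopback
             options (add fetch_size '100');

would invalidate every cached plan whose uses_user_mapping flag is set,
foreign join or not; the bet is that such DDL is rare enough for the
extra replanning to be harmless.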


I don't think the above change is sufficient to fix the second problem.
The root cause is that we currently allow the user mapping OID
(rel->umid) to be InvalidOid in two cases: (1) user mappings mean
something to the FDW but none can be found at planning time, and (2)
user mappings are meaningless to the FDW; as things stand, we cannot
distinguish these two cases.  So, I'd like to 

Re: [HACKERS] remove checkpoint_warning

2016-07-12 Thread PostgreSQL - Hans-Jürgen Schönig



On 07/09/2016 11:12 PM, Tom Lane wrote:
> Alvaro Herrera  writes:
>> the checkpoint_warning feature was added by commit 2986aa6a668bce3cfb836
>> in November 2002 when we didn't have any logging of checkpointing at
>> all.  I propose to remove it: surely anyone who cares about analyzing
>> checkpointing behavior nowadays is using the log_checkpoint feature
>> instead, which contains much more detail.  The other one is just noise
>> now, and probably ignored amidst the number of other warning traffic.
>
> Hmm, not sure.  ISTM log_checkpoint is oriented to people who know what
> they are doing, whereas checkpoint_warning is more targeted to trying
> to help people who don't.  Perhaps you could make an argument that
> checkpoint_warning is useless because the people whom it's meant to help
> won't notice the warning anyway --- but I doubt that it's been
> "superseded" by log_checkpoint, because the latter would only be enabled
> by people who already have a clue that checkpoint performance is something
> to worry about.
>
> Or in short, this may be a fine change to make, but I don't like your
> argument for it.
>
> regards, tom lane




I think Tom is right here.
log_checkpoint and checkpoint_warning are for totally different people.
We might just want to do one thing: state explicitly that the database
cannot break down if this warning shows up.  Many people are scared to
death that this warning somehow indicates that PostgreSQL is about to
go up in flames, which is of course not true.  Maybe we could change
the hint to "consider increasing <the relevant setting> to ensure good
performance" or so ...
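
For reference, the message in question reads along these lines (quoting
from memory, so the exact wording may differ by version):

LOG:  checkpoints are occurring too frequently (9 seconds apart)
HINT:  Consider increasing the configuration parameter "max_wal_size".

so the change would only have to touch the HINT part.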


regards,

hans

--
Hans-Jürgen Schönig
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at





Re: [HACKERS] remove checkpoint_warning

2016-07-12 Thread Magnus Hagander
On Tue, Jul 12, 2016 at 3:32 AM, Craig Ringer  wrote:

> On 11 July 2016 at 22:25, Stephen Frost  wrote:
>
>> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> > Alvaro Herrera  writes:
>> > > the checkpoint_warning feature was added by commit
>> 2986aa6a668bce3cfb836
>> > > in November 2002 when we didn't have any logging of checkpointing at
>> > > all.  I propose to remove it: surely anyone who cares about analyzing
>> > > checkpointing behavior nowadays is using the log_checkpoint feature
>> > > instead, which contains much more detail.  The other one is just noise
>> > > now, and probably ignored amidst the number of other warning traffic.
>> >
>> > Hmm, not sure.  ISTM log_checkpoint is oriented to people who know what
>> > they are doing, whereas checkpoint_warning is more targeted to trying
>> > to help people who don't.  Perhaps you could make an argument that
>> > checkpoint_warning is useless because the people whom it's meant to help
>> > won't notice the warning anyway --- but I doubt that it's been
>> > "superseded" by log_checkpoint, because the latter would only be enabled
>> > by people who already have a clue that checkpoint performance is
>> something
>> > to worry about.
>> >
>> > Or in short, this may be a fine change to make, but I don't like your
>> > argument for it.
>>
>> I don't agree that it's sensible to get rid of.  Having just
>> log_checkpoints will have the logs filled with checkpoints starting
>> because of XLOG, but there's no indication of that being a bad thing.
>>
>>
> Also, the warning is greppable-for and easily spotted by log ingesting
> tools. I see no real reason to remove it.
>

+1, this is a useful warning.

I'd flip the original argument over instead and say that for *most* people,
log_checkpoint output is mostly noise, and it's checkpoint_warning that's
actually interesting.  That one tells them whether they have to look at
anything at all.  It's true that those who care about *analyzing* their
checkpointing behaviour need to turn on log_checkpoint, but it's
checkpoint_warning that tells them that they have to do this.

And the argument about it getting lost amongst other log traffic to me is
an argument for not turning on log_checkpoints by default. That generates a
lot of log entries that most people don't care for, and in doing so hides
other things in the log. It's excellent to have it once you need it, but it
shouldn't be turned on by default.
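
For example, once the warning does show up, the detailed logging can be
enabled on the fly and switched off again after the investigation, since
log_checkpoints only needs a reload (a sketch of the workflow):

postgres=# alter system set log_checkpoints = on;
postgres=# select pg_reload_conf();
-- ... analyze the checkpoint log entries ...
postgres=# alter system reset log_checkpoints;
postgres=# select pg_reload_conf();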

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] GiST index build versus NaN coordinates

2016-07-12 Thread Andreas Seltenreich
Tom Lane writes:

> More generally, this example makes me fearful that NaN coordinates in
> geometric values are likely to cause all sorts of issues.  It's too late
> to disallow them, probably, but I wonder how can we identify other bugs
> of this ilk.

Sounds like some fuzz testing with nan/infinity is in order.  sqlsmith
doesn't generate any float literals, but it calls functions to satisfy
its need for values of specific types.  Adding suitable functions [1] to
the regression db, I made the following observations:

The infinite loop from the bug report was triggered. Further, two
previously unseen errors are logged:

ERROR:  timestamp cannot be NaN
ERROR:  getQuadrant: impossible case

The first is probably as boring as it gets; the second one is from
getQuadrant() in spgquadtreeproc.c.

Curiously, the getQuadrant()s in geo_spgist.c and rangetypes_spgist.c do
not have such a check.  I guess the boxes will just end up in an
undefined position in the index for these.
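
To reproduce by hand, something along these lines should reach the same
code paths (a sketch; table names are invented, and the exact symptoms
may vary by version):

-- GiST: NaN coordinates during an index build, the shape of the case
-- in the original report (reported to loop during the build).
create table nan_points (p point);
insert into nan_points values ('(nan,nan)'::point);
create index on nan_points using gist (p);

-- SP-GiST quad tree: once inner nodes exist, a NaN coordinate makes
-- all four quadrant comparisons in getQuadrant() fail, which should
-- reach the "impossible case" elog.
create table sp_points (p point);
insert into sp_points select point(x::float8, x::float8)
  from generate_series(1, 1000) x;
create index on sp_points using spgist (p);
insert into sp_points values ('(1,nan)'::point);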

regards
Andreas

Footnotes:
[1]
create function smith_double_inf() returns float as
  $$select 'infinity'::float$$ language sql immutable;
create function smith_double_ninf() returns float as
  $$select '-infinity'::float$$ language sql immutable;
create function smith_double_nan() returns float as
  $$select 'nan'::float$$ language sql immutable;
create function smith_real_nan() returns real as
  $$select 'nan'::real$$ language sql immutable;
create function smith_real_inf() returns real as
  $$select 'infinity'::real$$ language sql immutable;
create function smith_real_ninf() returns real as
  $$select '-infinity'::real$$ language sql immutable;

