[HACKERS] Customized Options Threshold Error

2014-11-14 Thread furuyao
Hi,

When we specify a value that exceeds the valid range for a Customized Option,
the behavior differs from that of Parameter Interaction via the Configuration
File.

In the case of Parameter Interaction via the Configuration File, the server
finishes with a FATAL error when it hits a range error.
But for Customized Options, it just prints a WARNING and continues
processing with the default value.

The manual (18.16. Customized Options) says the server will issue warnings
for any unrecognized placeholders that begin with an extension name.
So when the extension name is correct but the value exceeds the valid range,
I think it should end with a FATAL error, just like Parameter Interaction
via the Configuration File.
Thoughts?

Following are the two threshold-error logs: one for a Customized Option and
the other for Parameter Interaction via the Configuration File.

Customized Options Threshold Error

 WARNING:  invalid value for parameter pg_receivexlog_mng.monitor_interval: 
2147483648
 HINT:  Value exceeds integer range.
 LOG:  registering background worker pg_receivexlog_mng
 
Parameter Interaction via Configuration File Threshold Error

 FATAL:  configuration file /dbfp/data/postgresql.conf contains errors

Regards,

--
Furuya Osamu



Re: [HACKERS] alter user/role CURRENT_USER

2014-11-14 Thread Kyotaro HORIGUCHI
Hi, this is the revised version.

 Kyotaro HORIGUCHI wrote:
 
  - Storage for new information
  
  The new struct NameId stores an identifier telling what it
  logically is, using the new enum NameIdTypes.
 
 I think NameId is a bad name for this.  My point is that NameId, as it
 stands, might be a name for anything, not just a role; and the object it
 identifies is not an Id either.  Maybe RoleSpec?

Yeah! I felt it was no good even as a generic type for
various names-of-something or their OIDs. RoleSpec sounds much better.

 Do we need a public_ok
 argument to get_nameid_oid() (get a better name for this function too)

Maybe get_rolespec_oid(), as a name after its parameter type?

 so that callers don't have to check for InvalidOid argument?  I think
 the arrangement you propose is not very convenient; it'd be best to
 avoid duplicating the check for InvalidOid in all callers of the new
 function, particularly where there was no check before.

I agree that it'd be better to keep away from duplicated
InvalidOid checks, but public_ok seems a bit myopic. Since
there's no reasonable border between functions accepting 'public'
and the others, that kind of solution would not be reasonable.

What about checking it being a PUBLIC or not *before* calling
get_rolespec_oid()?

The attached patch is modified in the following points.

 - rename NameId to RoleSpec and NameIdType to RoleSpecTypes.
 - rename get_nameid_oid() to get_rolespec_oid().
 - rename roleNamesToIds() to roleSpecsToIds().
 - rename some struct members, such as authname to authrole.
 - check whether the rolespec is PUBLIC before calling get_rolespec_oid().
 - ExecAlterDefaultPrivilegesStmt and ExecuteGrantStmt did slightly
   different things with ACL_ID_PUBLIC, but I unified them to the
   latter's behavior.
 - rebased to the current master

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

CreateStmt->authrole = NULL = ?
From 307249654c97b6449261febbfd84190fbad9111d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi horiguchi.kyot...@lab.ntt.co.jp
Date: Fri, 14 Nov 2014 17:37:22 +0900
Subject: [PATCH] ALTER USER CURRENT_USER v3

---
 src/backend/catalog/aclchk.c   |  30 +++---
 src/backend/commands/alter.c   |   2 +-
 src/backend/commands/extension.c   |   2 +-
 src/backend/commands/foreigncmds.c |  57 +--
 src/backend/commands/schemacmds.c  |  30 +-
 src/backend/commands/tablecmds.c   |   4 +-
 src/backend/commands/tablespace.c  |   2 +-
 src/backend/commands/user.c|  86 +
 src/backend/nodes/copyfuncs.c  |  37 +---
 src/backend/nodes/equalfuncs.c |  35 ---
 src/backend/parser/gram.y  | 190 +++--
 src/backend/parser/parse_utilcmd.c |   4 +-
 src/backend/utils/adt/acl.c|  34 +++
 src/include/commands/user.h|   2 +-
 src/include/nodes/nodes.h  |   1 +
 src/include/nodes/parsenodes.h |  48 +++---
 src/include/utils/acl.h|   2 +-
 17 files changed, 385 insertions(+), 181 deletions(-)

diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index d30612c..24811c6 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -430,13 +430,16 @@ ExecuteGrantStmt(GrantStmt *stmt)
 	foreach(cell, stmt->grantees)
 	{
 		PrivGrantee *grantee = (PrivGrantee *) lfirst(cell);
+		Oid grantee_uid = ACL_ID_PUBLIC;
 
-		if (grantee->rolname == NULL)
-			istmt.grantees = lappend_oid(istmt.grantees, ACL_ID_PUBLIC);
-		else
-			istmt.grantees =
-				lappend_oid(istmt.grantees,
-							get_role_oid(grantee->rolname, false));
+		/* public is mapped to ACL_ID_PUBLIC */
+		if (grantee->role->roltype != ROLESPEC_PUBLIC)
+		{
+			grantee_uid = get_rolespec_oid(grantee->role, false);
+			if (!OidIsValid(grantee_uid))
+				grantee_uid = ACL_ID_PUBLIC;
+		}
+		istmt.grantees = lappend_oid(istmt.grantees, grantee_uid);
 	}
 
 	/*
@@ -913,13 +916,16 @@ ExecAlterDefaultPrivilegesStmt(AlterDefaultPrivilegesStmt *stmt)
 	foreach(cell, action->grantees)
 	{
 		PrivGrantee *grantee = (PrivGrantee *) lfirst(cell);
+		Oid grantee_uid = ACL_ID_PUBLIC;
 
-		if (grantee->rolname == NULL)
-			iacls.grantees = lappend_oid(iacls.grantees, ACL_ID_PUBLIC);
-		else
-			iacls.grantees =
-				lappend_oid(iacls.grantees,
-							get_role_oid(grantee->rolname, false));
+		/* public is mapped to ACL_ID_PUBLIC */
+		if (grantee->role->roltype != ROLESPEC_PUBLIC)
+		{
+			grantee_uid = get_rolespec_oid(grantee->role, false);
+			if (!OidIsValid(grantee_uid))
+				grantee_uid = ACL_ID_PUBLIC;
+		}
+		iacls.grantees = lappend_oid(iacls.grantees, grantee_uid);
 	}
 
 	/*
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index c9a9baf..c53d4e5 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -678,7 +678,7 @@ AlterObjectNamespace_internal(Relation rel, Oid objid, Oid nspOid)
 Oid
 ExecAlterOwnerStmt(AlterOwnerStmt *stmt)
 {
-	Oid			newowner = get_role_oid(stmt->newowner, false);
+	Oid	

Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-11-14 Thread David Rowley
On 14 November 2014 20:37, Jim Nasby jim.na...@bluetreble.com wrote:

 On 11/12/14, 1:54 AM, David Rowley wrote:


 We'd also need to add some infrastructure to merge aggregate states
 together for this to work properly. This means it could also work for
 avg() and stddev(), etc. For max() and min() the merge functions would
 likely just be the same as the transition functions.


 Sanity check: what % of a large aggregate query fed by a seqscan is
 actually spent in the aggregate functions? Even if you look strictly at
 CPU cost, isn't there more code involved in getting data to the aggregate
 function than in the aggregation itself, except maybe for numeric?


You might be right, but that sounds like it would need all the parallel
workers to send each matching tuple to a queue to be processed by some
aggregate node. I guess this would have more of an advantage for wider
tables, tables with many dead tuples, or queries with quite a selective
WHERE clause, as less data would make it onto that queue.

Perhaps I've taken one step too far forward here. I had been thinking that
each worker would perform the partial seqscan and, in the worker context,
pass the tuples down to the aggregate node. Then, once each worker had
completed, some other (perhaps new) node type (MergeAggregateStates) would
merge all those intermediate agg states into the final agg state (which
would then be ready for the final function to be called).

Are there any plans for what will be in charge of deciding how many workers
are allocated to a parallel query? Will this be something that's done at
planning time? Or should the planner just create a parallel-friendly plan
if the plan is costly enough, and then let the executor decide how many
workers to throw at the job based on how busy the system is with other
tasks at execution time?

Regards

David Rowley


Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-11-14 Thread Simon Riggs
On 14 November 2014 07:37, Jim Nasby jim.na...@bluetreble.com wrote:
 On 11/12/14, 1:54 AM, David Rowley wrote:

 On Tue, Nov 11, 2014 at 9:29 PM, Simon Riggs si...@2ndquadrant.com
 mailto:si...@2ndquadrant.com wrote:


 This plan type is widely used in reporting queries, so will hit the
 mainline of BI applications and many Mat View creations.
 This will allow SELECT count(*) FROM foo to go faster also.

 We'd also need to add some infrastructure to merge aggregate states
 together for this to work properly. This means it could also work for
 avg() and stddev(), etc. For max() and min() the merge functions would
 likely just be the same as the transition functions.


 Sanity check: what % of a large aggregate query fed by a seqscan is
 actually spent in the aggregate functions? Even if you look strictly at
 CPU cost, isn't there more code involved in getting data to the aggregate
 function than in the aggregation itself, except maybe for numeric?

Yes, which is why I suggested pre-aggregating before collecting the
streams together.

The point is not that the aggregation is expensive; it's that the
aggregation eats data, so the required bandwidth for later steps is reduced
and hence does not become a bottleneck that renders the parallel Seq Scan
ineffective.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-11-14 Thread Simon Riggs
On 14 November 2014 01:51, David Rowley dgrowle...@gmail.com wrote:

 When I mentioned this, I didn't mean to appear to be placing a roadblock.
 I was just bringing to the table the information that COUNT(*) + COUNT(*)
 works OK for merging COUNT(*) subtotals, but AVG(n) + AVG(n) does not.

  Merge functions should be a simple patch to write. If I thought there was
 going to be a use for them in this release, I'd put my hand up to put a
 patch together.

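To make the quoted point concrete (a worked example, not from the original
mail): avg({1,2}) = 1.5 and avg({3,4,5}) = 4.0, so averaging the averages
gives 2.75, while the true avg of all five values is 3.0. The subtotals
merge correctly only if each partial state carries (sum, count):
(3, 2) merged with (12, 3) gives (15, 5), i.e. 3.0.
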
The hard part is describing and then agreeing on the semantics.

If you raise a separate post on this, copy me in and I will help.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Unintended restart after recovery error

2014-11-14 Thread Antonin Houska
Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Nov 13, 2014 at 8:30 AM, Robert Haas robertmh...@gmail.com wrote:
 It's true that if the startup process dies we don't try to restart,
 but it's also true that if the checkpointer dies we do try to restart.
 I'm not sure why this specific situation should be an exception to
 that general rule.

My distinction was during recovery vs. outside recovery, rather than
startup process vs. checkpointer. But I'm not sure it's easy enough to
recognize that the checkpointer (and maybe also the bgwriter) no longer
participates in recovery.

 442231d7f71764b8c628044e7ce2225f9aa43b6 introduced the latter rule
 for the hot-standby case.

I didn't fully understand the purpose of this condition until I saw the
commit message. Thanks for pointing it out.

 Maybe *during crash recovery* (i.e., hot standby should not be enabled) it's
 better to treat the crash of startup process as a catastrophic crash.

Yes, that's what I thought too. But I think the current StartupXLOG() does
not always (need to) determine the exact boundary between crash and archive
recovery. I'd need to think more about whether the end of crash recovery can
be safely identified during replay and signaled to the postmaster.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




[HACKERS] PostgreSQL doesn't stop propley when --slot option is specified with pg_receivexlog.

2014-11-14 Thread furuyao
Hi,

pg_ctl stop doesn't work properly if the --slot option is specified,
because WAL is flushed only when the WAL file is switched.
These processes still continue even after the postmaster has exited:
pg_receivexlog, walsender, and logger.

How to reproduce:
1.Start PostgreSQL
2.Create slot
3.Specify --slot option to pg_receivexlog and start it
4.Stop PostgreSQL

Example commands for reproduce:
1.pg_ctl start
2.psql -c "select pg_create_physical_replication_slot('test')"
3.pg_receivexlog --slot='test' -D ./testdir
4.pg_ctl stop

This happens because the --slot option causes a flush location to be set in
the feedback messages, but pg_receivexlog flushes only when the WAL file has
switched, so walsender waits forever for the WAL to be replicated.

When the --slot option is not specified, the flush location is set to
'invalid' and the write location is used to check replication, so the
processes can stop properly.

So I think that setting the flush location to 'invalid' in this case as
well would solve the problem.
Does the --slot option need to set the flush location?
Thoughts?

Regards,

--
Furuya Osamu



Re: [HACKERS] Size of regression database

2014-11-14 Thread Alvaro Herrera
Tom Lane wrote:
 I was testing backwards compatibility of pg_dumpall just now, and was
 somewhat astonished to notice the size of the output for the regression
 database compared to what it was not too long ago:
 
 -rw-rw-r--. 1 tgl tgl  4509135 Nov 13 16:19 dumpall.83
 -rw-rw-r--. 1 tgl tgl  4514441 Nov 13 16:24 dumpall.84
 -rw-rw-r--. 1 tgl tgl  4666917 Nov 13 16:15 dumpall.90
 -rw-rw-r--. 1 tgl tgl  4681235 Nov 13 16:15 dumpall.91
 -rw-rw-r--. 1 tgl tgl  5333587 Nov 13 16:15 dumpall.92
 -rw-rw-r--. 1 tgl tgl  5409083 Nov 13 16:15 dumpall.93
 -rw-rw-r--. 1 tgl tgl  5493686 Nov 13 16:15 dumpall.94
 -rw-rw-r--. 1 tgl tgl 27151777 Nov 13 16:21 dumpall.head
 
 A quick eyeball check says that that quintupling of the database size
 is all in BRIN index tests.  Could we dial that back to something a
 bit saner please?

Oops.  Sure, will see about this.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-11-14 Thread Kouhei Kaigai
 On 14 November 2014 07:37, Jim Nasby jim.na...@bluetreble.com wrote:
  On 11/12/14, 1:54 AM, David Rowley wrote:
 
  On Tue, Nov 11, 2014 at 9:29 PM, Simon Riggs si...@2ndquadrant.com
  mailto:si...@2ndquadrant.com wrote:
 
 
  This plan type is widely used in reporting queries, so will hit the
  mainline of BI applications and many Mat View creations.
  This will allow SELECT count(*) FROM foo to go faster also.
 
  We'd also need to add some infrastructure to merge aggregate states
   together for this to work properly. This means it could also work for
   avg() and stddev(), etc. For max() and min() the merge functions would
   likely just be the same as the transition functions.
 
 
  Sanity check: what % of a large aggregate query fed by a seqscan is
  actually spent in the aggregate functions? Even if you look strictly
  at CPU cost, isn't there more code involved in getting data to the
  aggregate function than in the aggregation itself, except maybe for
  numeric?
 
 Yes, which is why I suggested pre-aggregating before collecting the streams
 together.
 
 The point is not that the aggregation is expensive, its that the aggregation
 eats data and the required bandwidth for later steps is reduced and hence
 does not then become a bottleneck that renders the parallel Seq Scan
 ineffective.
 
I'd like to throw a question to community folks.
Has there been a discussion of the challenge of aggregate push-down across
relation joins in the past? It potentially reduces the number of rows to be
joined. If we already had one, I'd like to check up on that discussion.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com



Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort

2014-11-14 Thread Jeremy Harris
On 14/11/14 00:46, Simon Riggs wrote:
 Limit  (cost= rows=20 width=175) (actual time= rows=20 loops=1)
-  Sort  (cost= rows=568733 width=175) (actual time=
 rows=20 loops=1)
  Sort Method: top-N heapsort

Going off on a tangent, when I was playing with a merge-sort
implementation I propagated limit information into the sort
node, for a significant win.  Eliding the Limit node gave
a further slight win.

I wasn't convinced the use-case was common enough to justify
the replacement of quicksort (despite having consistently
fewer compares, the merge sort was slightly slower; I never
understood why) - but I never asked.  Is there any appetite
for supporting alternate sort algorithms?
-- 
Cheers,
  Jeremy





Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-11-14 Thread David Rowley
On Fri, Nov 14, 2014 at 1:27 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 12 November 2014 00:54, Robert Haas robertmh...@gmail.com wrote:
  Interestingly, I have a fairly solid idea of what proisparallel is,
  but I have no clear idea what CONTAINS NO SQL is or why it's relevant.
  I would imagine that srandom() contains no SQL under any reasonable
  definition of what that means, but it ain't parallel-safe.

 What is wrong in generating random numbers in parallel?


I was just watching Robert's talk on parallel query on YouTube... I think
the answer is at 41:09; the link below should take you there:
https://www.youtube.com/watch?v=3klfarKEtMQ#t=2469

Regards

David Rowley


Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Noah Misch n...@leadboat.com writes:
  On Thu, Nov 13, 2014 at 08:24:36PM -0500, Stephen Frost wrote:
  Agreed.  I'll take care of both and we'll make sure the new role
  attributes being added will do the same for upgrades also.
 
  That would make pg_dumpall less faithful for every role other than the
  bootstrap superuser, a net loss.  It would be defensible to do this for
  BOOTSTRAP_SUPERUSERID only.
 
 Huh?  It seems difficult to argue that it's less faithful to do this
 when the previous version didn't have the concept at all.

I believe what Noah is pointing out is that we'll end up adding
attributes which weren't there already for superusers created by users.

You're correct that we currently enable all attributes for the bootstrap
superuser and therefore a dump/restore upgrade looks different from an
initdb, unless the dump includes all new attributes for the bootstrap
superuser.

There's a couple ways to address this-

Stop enabling all the role attribute bits for the bootstrap superuser,
in which case it'd look a lot more like other superusers that a user
might create (at least, in my experience, no one bothers to set the role
attributes beyond superuser in real environments).

or

Reflect actual capability in what is viewed through the catalog.  This
might actually dovetail nicely with the role attribute representation
change which is also being discussed, were we to make pg_authid a view
which called 'has_rolX_privilege' to get the value for each attribute.
What's actually in the bitmask might end up being different, but at
least what's seen in pg_authid (and hopefully for all client tools)
would make sense.  Of course, this also has the downside that if the
superuser bit is removed later, we'd revert to whatever is actually in
the catalog for the user and that'd potentially be different for the
bootstrap superuser vs. user-created superusers.

Personally, I'm leaning towards the first as it's less clutter in the
output of psql.  If we add the role attributes proposed and continue to
enable all of them explicitly for the bootstrap superuser, the
'Attributes' column is going to get mighty wide.  I don't really see the
explicit list of attributes as helping de-mystify what is going on for
users as it's akin to root anyway- doing an 'ls' as root doesn't show
all the file permissions based on what root can do.

Thanks!

Stephen




Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Noah Misch
On Fri, Nov 14, 2014 at 08:35:25AM -0500, Stephen Frost wrote:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
  Noah Misch n...@leadboat.com writes:
   On Thu, Nov 13, 2014 at 08:24:36PM -0500, Stephen Frost wrote:
   Agreed.  I'll take care of both and we'll make sure the new role
   attributes being added will do the same for upgrades also.
  
   That would make pg_dumpall less faithful for every role other than the
   bootstrap superuser, a net loss.  It would be defensible to do this for
   BOOTSTRAP_SUPERUSERID only.
  
  Huh?  It seems difficult to argue that it's less faithful to do this
  when the previous version didn't have the concept at all.
 
 I believe what Noah is pointing out is that we'll end up adding
 attributes which weren't there already for superusers created by users.

Yes.

 There's a couple ways to address this-
 
 Stop enabling all the role attribute bits for the bootstrap superuser,
 in which case it'd look a lot more like other superusers that a user
 might create (at least, in my experience, no one bothers to set the role
 attributes beyond superuser in real environments).
 
 or [...]

 Personally, I'm leaning towards the first as it's less clutter in the
 output of psql.

I'd agree for a new design, but I see too little to gain from changing it now.
Today's behavior is fine.




Re: [HACKERS] Customized Options Threshold Error

2014-11-14 Thread Tom Lane
furu...@pm.nttdata.co.jp writes:
 When we specify a value that exceeds the valid range for a Customized
 Option, the behavior differs from that of Parameter Interaction via the
 Configuration File.

 In the case of Parameter Interaction via the Configuration File, the
 server finishes with a FATAL error when it hits a range error.
 But for Customized Options, it just prints a WARNING and continues
 processing with the default value.

I think this is based on a misunderstanding.  Bad values in
postgresql.conf only result in a FATAL error at initial postmaster
startup; if you change them and SIGHUP the server, an erroneous new
value just results in a WARNING.  This is because we don't really
want the server crashing once it's up.  The case you're showing with
registering a new background worker is something that happens after
the server is up, so it's consistent for it to produce a WARNING
not a fatal error.
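
(For reference, here is roughly how such a custom option declares its valid
range - a sketch only; the extension and GUC names are made up to mirror
the log upthread:)

#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

static int	monitor_interval = 10;

void		_PG_init(void);

void
_PG_init(void)
{
	/*
	 * Declares 1..INT_MAX as the valid range; out-of-range or unparsable
	 * values produce the "invalid value for parameter" message shown in
	 * the log upthread.
	 */
	DefineCustomIntVariable("pg_receivexlog_mng.monitor_interval",
							"Interval for the monitoring worker.",
							NULL,
							&monitor_interval,
							10,			/* boot value */
							1,			/* minimum */
							INT_MAX,	/* maximum */
							PGC_SIGHUP,
							0,			/* flags */
							NULL, NULL, NULL);
}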

regards, tom lane




Re: [HACKERS] Teaching pg_dump to use NOT VALID constraints

2014-11-14 Thread Alvaro Herrera
Peter Eisentraut wrote:
 On 11/13/14 5:07 AM, Simon Riggs wrote:
  On 13 November 2014 00:20, Jim Nasby jim.na...@bluetreble.com wrote:
  
  Isn't the real use-case here that if constraints were valid when you
  dumped, then we shouldn't have to do *any* re-validation when we load?
  (Though we'd have to be careful of that with CHECK, because that can
  call user code...)
  
  Yes, that is the use case this patch would improve considerably.
 
 But your patch doesn't really address that.  It just leaves the
 constraints as invalid, and someone will have to revalidate them later
 (how?).  What Jim was describing was a mode that creates the constraints
 as valid but doesn't actually validate them.  I can see both sides of
 that kind of feature.

This might lead to users introducing invalid data by way of declaring
constraints as valid but not checked by the system; if they turn out to be
invalid, the final state is a mess.  I would only buy such a feature if
we had some way to pass down the knowledge that the constraint was valid
in the original system through some other means; say, emit a CRC of the
COPY data in the pg_dump output that can be verified while loading, and
only allow unvalidated constraints to be marked VALID if the sum
matches.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort

2014-11-14 Thread Tom Lane
Jeremy Harris j...@wizmail.org writes:
 On 14/11/14 00:46, Simon Riggs wrote:
 Limit  (cost= rows=20 width=175) (actual time= rows=20 loops=1)
 -  Sort  (cost= rows=568733 width=175) (actual time=
 rows=20 loops=1)
 Sort Method: top-N heapsort

 Going off on a tangent, when I was playing with a merge-sort
 implementation I propagated limit information into the sort
 node, for a significant win.

I'm not entirely following.  The top-N heapsort approach already
makes use of the limit info.

If the limit is so large that the sort spills to disk, then we
stop thinking about the limit.  But I'm finding it doubtful either
that that's a case worthy of extra code or that you could get very
much win if you did add code for it.
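
(For anyone following along: the technique is just a bounded heap. A
freestanding sketch with plain ints - not the tuplesort.c code:)

#include <stdio.h>

#define TOPN 3				/* the LIMIT */

static int	heap[TOPN];		/* max-heap: root is the worst of the top N */
static int	nheap = 0;

static void
swap(int i, int j)
{
	int			tmp = heap[i];

	heap[i] = heap[j];
	heap[j] = tmp;
}

static void
sift_down(int i)
{
	for (;;)
	{
		int			largest = i;
		int			l = 2 * i + 1;
		int			r = 2 * i + 2;

		if (l < nheap && heap[l] > heap[largest])
			largest = l;
		if (r < nheap && heap[r] > heap[largest])
			largest = r;
		if (largest == i)
			break;
		swap(i, largest);
		i = largest;
	}
}

static void
offer(int v)
{
	if (nheap < TOPN)
	{
		/* heap not yet full: append and sift up */
		int			i = nheap++;

		heap[i] = v;
		while (i > 0 && heap[(i - 1) / 2] < heap[i])
		{
			swap(i, (i - 1) / 2);
			i = (i - 1) / 2;
		}
	}
	else if (v < heap[0])
	{
		/* beats the current worst of the top N: replace the root */
		heap[0] = v;
		sift_down(0);
	}
	/* otherwise the row is discarded immediately */
}

int
main(void)
{
	int			input[] = {9, 4, 7, 1, 8, 3};
	int			i;

	for (i = 0; i < 6; i++)
		offer(input[i]);
	for (i = 0; i < nheap; i++)
		printf("%d ", heap[i]);	/* the 3 smallest (1, 3, 4), heap order */
	printf("\n");
	return 0;
}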

regards, tom lane




Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2014-11-14 Thread Andres Freund
On 2014-11-14 02:04:00 -0600, Jim Nasby wrote:
 On 11/13/14, 3:50 PM, Andres Freund wrote:
 Having been responsible for a site where downtime was a 6 figure
 dollar amount per hour, I've spent a LOT of time worrying about lock
 problems. The really big issue here isn't grabbing an exclusive lock;
 it's grabbing one at some random time when no one is there to actively
 monitor what's happening. (If you can't handle *any* exclusive locks,
 that also means you can never do an ALTER TABLE ADD COLUMN either.)

 With that in mind, would it be possible to set this up so that the
 time-consuming process of building the new index file happens first,
 and then (optionally) some sort of DBA action is required to actually
 do the relfilenode swap? I realize that's not the most elegant
 solution, but it's WAY better than this feature not hitting 9.5 and
 people having to hand-code a solution.

I don't think having a multi-step version of the feature and it not
making it into 9.5 are synonymous. And I really don't want to make it even
more complex before we have the basic version in.

I think a split like your:

 Possible syntax:
 REINDEX CONCURRENTLY -- Does what current patch does
 REINDEX CONCURRENT BUILD -- Builds new files
 REINDEX CONCURRENT SWAP -- Swaps new files in

could make sense, but it's really an additional feature on top.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
 On Fri, Nov 14, 2014 at 08:35:25AM -0500, Stephen Frost wrote:
  Personally, I'm leaning towards the first as it's less clutter in the
  output of psql.
 
 I'd agree for a new design, but I see too little to gain from changing it now.
 Today's behavior is fine.

To clarify- you mean with the changes described- using usesuper for
rolreplication and rolbypassrls instead of 'false' when dumping from
older versions, correct?

Note for all- rolreplication goes back to 9.1.  Are we thinking that
change should be backpatched?

Thanks,

Stephen




Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Noah Misch (n...@leadboat.com) wrote:
 I'd agree for a new design, but I see too little to gain from changing it 
 now.
 Today's behavior is fine.

 To clarify- you mean with the changes described- using usesuper for
 rolreplication and rolbypassrls instead of 'false' when dumping from
 older versions, correct?

I think Noah is arguing for leaving the pg_dumpall queries as they
stand.  I disagree, but he's entitled to his opinion.

 Note for all- rolreplication goes back to 9.1.  Are we thinking that
 change should be backpatched?

I would say it's not really worth back-patching.

regards, tom lane




Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Noah Misch
On Fri, Nov 14, 2014 at 11:24:20AM -0500, Tom Lane wrote:
 Stephen Frost sfr...@snowman.net writes:
  * Noah Misch (n...@leadboat.com) wrote:
  I'd agree for a new design, but I see too little to gain from changing it 
  now.
  Today's behavior is fine.
 
  To clarify- you mean with the changes described- using usesuper for
  rolreplication and rolbypassrls instead of 'false' when dumping from
  older versions, correct?
 
 I think Noah is arguing for leaving the pg_dumpall queries as they
 stand.  I disagree, but he's entitled to his opinion.

Yes, that.  (Adopt Gilles Darold's fix, of course.)




Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2014-11-14 Thread Andres Freund
On 2014-11-13 11:41:18 -0500, Robert Haas wrote:
 On Wed, Nov 12, 2014 at 7:31 PM, Andres Freund and...@2ndquadrant.com wrote:
  But I think it won't work realistically. We have a *lot* of
  infrastructure that refers to indexes using its primary key. I don't
  think we want to touch all those places to also disambiguate on some
  other factor. All the relevant APIs are either just passing around oids
  or relcache entries.
 
 I'm not quite following this.  The whole point is to AVOID having two
 indexes.  You have one index which may at times have two sets of
 physical storage.

Right. But how are we going to refer to these different relfilenodes?
All the indexing infrastructure just uses OIDs and/or Relation pointers
to refer to the index. How would you hand down the knowledge of which of
the relfilenodes is supposed to be used in some callchain?

There are ugly solutions, like having a flag such as 'bool
rd_useotherfilenode' inside struct RelationData, but even ignoring the
ugliness I don't think that'd work well - what if some function called
inside the index code again starts an index lookup?

I think I might just not be getting your idea here?

  And all the
  indexing infrastructure can't deal with that without having separate
  oids & relcache entries.

Hopefully explained above?

Greetings,

Andres Freund




Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
 On Fri, Nov 14, 2014 at 11:24:20AM -0500, Tom Lane wrote:
  Stephen Frost sfr...@snowman.net writes:
   * Noah Misch (n...@leadboat.com) wrote:
   I'd agree for a new design, but I see too little to gain from changing 
   it now.
   Today's behavior is fine.
  
   To clarify- you mean with the changes described- using usesuper for
   rolreplication and rolbypassrls instead of 'false' when dumping from
   older versions, correct?
  
  I think Noah is arguing for leaving the pg_dumpall queries as they
  stand.  I disagree, but he's entitled to his opinion.
 
 Yes, that.

Ah, ok.  I'm impartial, but I do note that we're currently inconsistent,
so I'd prefer to go one way or the other.

rolcreaterole uses usesuper, while rolreplication and rolbypassrls do
not.  Noah- would you argue that we should change rolcreaterole, which
has this behavior in all released branches (though, of course, it's only
relevant when upgrading from a pre-8.1 server where we didn't have
rolcreaterole)?  What are your thoughts on the additional role
attributes which are being discussed?

 (Adopt Gilles Darold's fix, of course.)

That's been done already.

Thanks!

Stephen




Re: [HACKERS] alternative model for handling locking in parallel groups

2014-11-14 Thread Andres Freund
Hi,

On 2014-11-13 15:59:11 -0500, Robert Haas wrote:
 Discussion of my incomplete group locking patch seems to have
 converged around two points: (1) Everybody agrees that undetected
 deadlocks are unacceptable.  (2) Nobody agrees with my proposal to
 treat locks held by group members as mutually non-conflicting.  As was
 probably evident from the emails on the other thread, it was not
 initially clear to me why you'd EVER want heavyweight locks held by
 different group members to mutually conflict, but after thinking it
 over for a while, I started to think of cases where you would
 definitely want that:

 1. Suppose two or more parallel workers are doing a parallel COPY.
 Each time the relation needs to be extended, one backend or the other
 will need to take the relation extension lock in Exclusive mode.
 Clearly, taking this lock needs to exclude both workers in the same
 group and also unrelated processes.
 
 2. Suppose two or more parallel workers are doing a parallel
 sequential scan, with a filter condition of myfunc(a.somecol), and
 that myfunc(a.somecal) updates a tuple in some other table.  Access to
 update that tuple will be serialized using tuple locks, and it's no
 safer for two parallel workers to do this at the same time than it
 would be for two unrelated processes to do it at the same time.
 
 On the other hand, I think there are also some cases where you pretty
 clearly DO want the locks among group members to be mutually
 non-conflicting, such as:
 
 3. Parallel VACUUM.  VACUUM takes ShareUpdateExclusiveLock, so that
 only one process can be vacuuming a relation at the same time.  Now,
 if you've got several processes in a group that are collaborating to
 vacuum that relation, they clearly need to avoid excluding each other,
 but they still need to exclude other people.  And in particular,
 nobody else should get to start vacuuming that relation until the last
 member of the group exits.  So what you want is a
 ShareUpdateExclusiveLock that is, in effect, shared across the whole
 group, so that it's only released when the last process exits.

Note that you'd definitely not want to do this naively - currently
there are baked-in assumptions in the vacuum code that only one backend
is doing parts of it.

I think there's

 4. Parallel query on a locked relation.  Parallel query should work on
 a table created in the current transaction, or one explicitly locked
 by user action.  It's not acceptable for that to just randomly
 deadlock, and skipping parallelism altogether, while it'd probably be
 acceptable for a first version, is not going to be a good long-term
 solution.

FWIW, I think it's perfectly acceptable to refuse to work in parallel in
that scenario. And not just for now.

The biggest counterargument I can see is parallel index creation.

 It also sounds buggy and fragile for the query planner to
 try to guess whether the lock requests in the parallel workers will
 succeed or fail when issued.

I don't know. Checking whether we hold a self exclusive lock on that
relation doesn't sound very problematic to me.

 Figuring such details out is the job of
 the lock manager or the parallelism infrastructure, not the query
 planner.

I don't think you can sensibly separate the parallelism infrastructure
and the query planner.

 After thinking about these cases for a bit, I came up with a new
 possible approach to this problem.  Suppose that, at the beginning of
 parallelism, when we decide to start up workers, we grant all of the
 locks already held by the master to each worker (ignoring the normal
 rules for lock conflicts).  Thereafter, we do everything the same as
 now, with no changes to the deadlock detector.  That allows the lock
 conflicts to happen normally in the first two cases above, while
 preventing the unwanted lock conflicts in the second two cases.

I don't think that's safe enough. There's e.g. a reason why ANALYZE
requires a SUE lock. It'd definitely not be safe to simply grant the
worker a SUE lock just because the master backend has already analyzed the
relation, or something like that. You could end up with the master and
worker backends ANALYZE-ing the same relation.

That said, I can definitely see use for an infrastructure where we
explicitly can grant another backend a lock that'd conflict with one
we're already holding. But I think it'll need to be more explicit than
just granting all currently held locks at the highest current
level. And I think it's not necessarily going to be granting them all
the locks at their current levels.

What I'm thinking of is basically to add a step to execMain.c:InitPlan()
that goes through the locks and grants the child process all the locks
that are required for the statement to run - but not any possibly
preexisting higher locks.
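
In shape, something like this - purely a hypothetical sketch;
GrantLockToWorker() and statement_lock_level() are invented names, not
existing APIs:

/*
 * Hypothetical InitPlan() step: for each relation the statement touches,
 * grant the worker the lock level the statement itself needs - not any
 * stronger lock the master may already hold for other reasons.
 */
foreach(lc, plannedstmt->rtable)
{
	RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);

	if (rte->rtekind != RTE_RELATION)
		continue;
	GrantLockToWorker(worker, rte->relid,
					  statement_lock_level(plannedstmt, rte));
}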

Greetings,

Andres Freund




Re: [HACKERS] tracking commit timestamps

2014-11-14 Thread Robert Haas
On Thu, Nov 13, 2014 at 6:55 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 13 November 2014 21:24, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Nov 13, 2014 at 8:18 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Ordering transactions in LSN order is very precisely the remit of the
 existing logical decoding API. Any user that wishes to see commits
 in sequence can do so using that API. BDR already does this, as do
 other users of the decoding API. So Slony already has access to a
 useful ordering if it wishes it. We do not need to do anything *on this
 patch* to enable LSNs for Slony or anyone else. I don't see any reason
 to provide the same facility twice, in two different ways.

 Perhaps you could respond more specifically to concerns expressed
 upthread, like:

 http://www.postgresql.org/message-id/blu436-smtp28b68b9312cbe5d9125441dc...@phx.gbl

 I don't see that as a dumb argument on the face of it.

 We have a clear must-have argument for timestamps to support
 replication conflicts.

 Adding LSNs, when we already have a way to access them, is much more
 of a nice-to-have. I don't really see it as a particularly nice-to-have
 either, since the SLRU is in no way ordered.

 Scope creep is a dangerous thing, so we shouldn't, and elsewhere
 don't, collect up ideas like a bag of mixed sweets. It's easy to
 overload things to the point where they don't fly at all. The whole
 point of this is that we're building something faster than trigger
 based systems.

I think that's slamming the door closed and nailing it shut behind
you.  If we add the feature without LSNs, how will someone go back and
add that later?  It would change the on-disk format of the SLRU, so to
avoid breaking pg_upgrade, someone would have to write a conversion
utility.  Even at that, it would slow pg_upgrade down when the feature
has been used.

By way of contrast, adding the feature now is quite easy.  It just
requires storing a few more bytes per transaction.
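
As a sketch of what the per-transaction SLRU entry could look like (the
layout here is illustrative only, assuming the usual server typedefs - not
a committed format):

#include "postgres.h"
#include "access/xlogdefs.h"		/* XLogRecPtr */
#include "datatype/timestamp.h"		/* TimestampTz */

/* one entry per transaction in the commit-timestamp SLRU */
typedef struct CommitTimestampEntry
{
	TimestampTz	time;		/* commit timestamp, 8 bytes */
	XLogRecPtr	lsn;		/* LSN of the commit record, 8 more bytes */
} CommitTimestampEntry;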

I am all in favor of incremental development when possible, but not
when it so greatly magnifies the work that needs to be done.  People
have been asking for the ability to determine the commit ordering for
years, and this is a way to do that, very inexpensively, as part of a
patch that is needed anyway.  We are not talking about loading 20 new
requirements on top of this patch; that would be intolerable.  We're
talking about adding one additional piece of information that has been
requested multiple times over the years.

The way I see it, there are at least three uses for this information.
One, trigger-based replication solutions.  While logical decoding will
doubtless be preferable, I don't think trigger-based replication
solutions will go away completely, and certainly not right away.
They've wanted this since forever.  Two, for user applications that
want to know the commit order for their own purposes, as in Steve's
example.  Three, for O(1) snapshots.  Heikki's patch to make that
happen seems to have stalled a bit, but if it's ever to go anywhere it
will need something like this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Robert Haas
On Thu, Nov 13, 2014 at 6:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 What's bothering me is that I see this in pg_dumpall output from a 9.4
 or earlier database:

 ALTER ROLE postgres WITH SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN 
 REPLICATION NOBYPASSRLS;

What about leaving out NOBYPASSRLS and letting it go to whatever the default is?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-14 Thread Robert Haas
On Fri, Nov 14, 2014 at 2:55 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Fri, Nov 14, 2014 at 9:11 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Nov 13, 2014 at 10:37 PM, Amit Kapila amit.kapil...@gmail.com
 wrote:
 
 
  I have mentioned that this can be usable for Linux users as well on that
  thread, however I think we might want to provide it with an option for
  linux users.  In general, I think it is good to have this patch for
  Windows
  users and later if we find that Linux users can also get the benefit
  with
  this functionality, we can expose the same with an additional option.

 Why make it an option instead of just always doing it this way?

 To avoid extra work during archive recovery if it is not required.  I
 understand that this might not create any measurable difference, but
 still there is additional I/O involved (reading from the file) which can
 be avoided.

Yeah, but it's trivial.  We're not going to create enough tablespaces on
one cluster for the cost of dropping a few extra symlinks in place to
matter.

 OTOH, if that is okay, then I think we can avoid few #ifdef WIN32 that
 this patch introduces and can have consistency for this operation on
 both linux and Windows.

Having one code path for everything seems appealing to me, but what do
others think?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Add CREATE support to event triggers

2014-11-14 Thread Robert Haas
On Thu, Nov 13, 2014 at 7:45 AM, Andres Freund and...@2ndquadrant.com wrote:
 Right.  And that's why it's cool that logical decoding can operate
 through DDL differences.  The apply side might not be able to cope
 with what pops out, but that's not logical decoding's fault, and
 different apply-sides can adopt different policies as to how to deal
 with whatever problems crop up.

 I think pretty much all of the solutions just say oops, you're on your
 own. And I can't blame them for that. Once there's a schema difference
 and it causes problems there's really not much that can be done.

Agreed.

 I don't know exactly what you mean by a normal single master setup.
 Surely the point of logical decoding is that the replica might not be
 identical to the master.

 I actually think that that's not primarily the point if you talk about
 individual objects. The majority of objects will be exactly the same on
 all nodes. If you actually want to have differing objects on the nodes
 you'll have to opt out/in (depending on your solution) of ddl
 replication for those objects.

I'm not saying it's the primary point; I'm just saying that it can happen.

 And if it isn't, then a command that
 succeeded on the master might fail on the standby - for example,
 because an object by that name already exists there, or because a type
 doesn't exist there.  (Even if you replicate a CREATE EXTENSION
 command, there's no guarantee that the .so exists on the target.) Then
 what?

 Sure. There are reasons logical replication isn't always a win. But I
 don't see why that's a reason not to make it as robust as possible.

I agree.

 Btw, the .so problem exists for WAL shipping as well.

Not in the same way.  You might not be able to access the data on the
standby if the .so defining the datatype isn't present, but the
replication itself doesn't care.

 This is basically the same problem as multi-master replication
 conflicts, except with DDL.  Resolving replication conflicts is not a
 very easy thing to get right even if you're only concerned about the
 rows in the tables.  It's probably harder if you're worried about the
 schema, too.

 I don't think it's a sane thing to do multimaster with differing schemas
 for individual relations, except maybe additional nonunique indexes.

But that's exactly what you ARE doing, isn't it?  I mean, if you
replicate in only one direction, nothing keeps someone from modifying
things on the replica independently of BDR, and if you replicate in
both directions, well that's multi-master.

  The solution here doesn't force you to do that, does it? It's something
  that can be used by more than replication solution?

 In theory, yes.

 What's the practical point here?

I am quite doubtful about whether there will ever be a second, working
implementation, so I see all of this code - and the maintenance effort
associated with it - as something that will really only benefit BDR.
I understand that you don't see it that way, and I'm not saying that
you are offering anything in bad faith, but it looks to me like even
with all of this very substantial amount of infrastructure, you're
still going to need a big pile of additional code inside BDR to
actually make it work, and I don't hear anyone else offering to
develop something similar.

 I think generally logical replication has more failure cases than
 physical ones. Which you seem to agree with.

Yes, I agree with that.

 Physical replication is basically no-fail because it just
 says, hey, go write these bytes into this page, and we can pretty much
 always do that. But statement-based logical replication means
 basically executing arbitrary chunks of code all over the backend, and
 there is just no way to guarantee that code won't throw an error.  So
 the best we can do is to hope that those errors will get reported back
 to the user, which is going to require some kind of distributed
 transaction.  Your idea to just run the replicated DDL statements on
 the standby before committing on the master is one approach to that
 problem, and probably the simplest one, but not the only one - one can
 imagine something that resembles true clustering, for example.

 I think that's generally not what people need for primary/standby
 cases (of subsets of tables). In practice there aren't many failures
 like that besides schema differences. And there are just many use cases
 that aren't doable with physical replication, so we can't advise people
 doing that.

I can't really follow this, especially the first sentence.

 By the way, the fact that you're planning to do log-based replication
 of DML and trigger-based replication of DDL scares the crap out of me.
 I'm not sure how that's going to work at all if the two are
 interleaved in the same transaction.

 Maybe that's based on a misunderstanding. All the event trigger does is
 insert an event into a (local) queue. That's then shipped to the other
 side using the same replication mechanisms as used for rows. Then, when
 

Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Nov 14, 2014 at 2:55 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 OTOH, if that is okay, then I think we can avoid few #ifdef WIN32 that
 this patch introduces and can have consistency for this operation on
 both linux and Windows.

 Having one code path for everything seems appealing to me, but what do
 others think?

Generally I'd be in favor of avoiding platform-dependent code where
possible, but that doesn't represent a YES vote for this particular
patch.  It looks pretty messy in a quick look, even granting that the
#ifdef WIN32's would all go away.

A larger question here is about forward/backward compatibility of the
basebackup files.  Changing the representation of symlinks like this
would break that.  Maybe we don't care, not sure (is there already a
catversion check for these things?).  Changing the file format for only
some platforms seems like definitely a bad idea though.

regards, tom lane




Re: [HACKERS] Add CREATE support to event triggers

2014-11-14 Thread Andres Freund
On 2014-11-14 12:38:52 -0500, Robert Haas wrote:
  This is basically the same problem as multi-master replication
  conflicts, except with DDL.  Resolving replication conflicts is not a
  very easy thing to get right even if you're only concerned about the
  rows in the tables.  It's probably harder if you're worried about the
  schema, too.
 
  I don't think it's a sane thing to do multimaster with differing schemas
  for individual relations, except maybe additional nonunique indexes.
 
 But that's exactly what you ARE doing, isn't it?  I mean, if you
 replicate in only one direction, nothing keeps someone from modifying
 things on the replica independently of BDR, and if you replicate in
 both directions, well that's multi-master.

Meh. By that definition any logical replication solution does multi-master
replication. Either you tell your users that that's not allowed,
or you just prevent it by technical means. Absolutely the same is true
for table contents.

FWIW, in BDR we *do* prevent schemas from being modified independently
on different nodes (unless you set the 'running with scissors' guc).

 I am quite doubtful about whether there will ever be a second, working
 implementation, so I see all of this code - and the maintenance effort
 associated with it - as something that will really only benefit BDR.
 I understand that you don't see it that way, and I'm not saying that
 you are offering anything in bad faith, but it looks to me like even
 with all of this very substantial amount of infrastructure, you're
 still going to need a big pile of additional code inside BDR to
 actually make it work, and I don't hear anyone else offering to
 develop something similar.

I don't know what to say about this. I don't see any other realistic way
to perform DDL replication in logical rep, and nearly all my
conversations with users have indicated that they want that.

I think it's a good idea to structure independent features in a way that
other solutions can reuse them. But I sure as hell can't force them to
use it - especially as there's unfortunately not too much development
going on in the existing logical replication solutions for postgres.

Btw, I really think there are quite legitimate use cases for this besides
logical replication solutions - e.g. schema tracking is quite a sensible
use case.

  By the way, the fact that you're planning to do log-based replication
  of DML and trigger-based replication of DDL scares the crap out of me.
  I'm not sure how that's going to work at all if the two are
  interleaved in the same transaction.
 
  Maybe that's based on a misunderstanding. All the event trigger does is
  insert an event into a (local) queue. That's then shipped to the other
  side using the same replication mechanisms as used for rows. Then, when
  receiving changes in that ddl queue the standby performs those actions
  before continuing with the replay.
  That makes the interleaving on the standby to be pretty much the same as
  on the primary.
 
 OK.  But that's also not something that I can really imagine us ever
 adopting in core.

Well, that bit really depends on what the user (e.g. a replication
solution, or a schema tracking feature) needs. There's certain things
that you can quite easily do as part of core (e.g. insert something into
the WAL stream), that you just can't as external code.

I don't think there's any external replication solution that won't have
some form of internal queue to manipulate its internal state. For an
external solution such a queue currently pretty much has to be some
table, but for an eventual in core solution it could be done
differently.

 If we were going to do DDL replication in core, I have to believe we'd
 find a way to put all of the information in the WAL stream, not use
 triggers.

I agree that we might want to represent the transport to standbys
differently for something in core. That there's many different ways the
deparsing output could be used imo is a good reason why that part of the
mechanism isn't part of this submission.

I don't really understand the arguments against triggers in general
though. We're already using them quite extensively - I don't see why DDL
replication has to meet some completely different bar than, say, foreign
key checks or deferred uniqueness checks. We can easily add 'implicit'
event triggers that aren't defined inside the catalog if we feel like
it. I'm just not sure we really would need/want to.

  Yes, it's not trivial. And I think there's some commands where you might
  not want to try but either scream ERROR or just rereplicate the whole
  relation or such.
 
  I very strongly feel that we (as postgres devs) have a *much* higher
  chance of recognizing these cases than either some random users (that
  write slonik scripts or similar) or some replication solution authors
  that aren't closely involved with -hackers.
 
 It's clearly the job of the replication solution authors to figure
 these details out.  I'm not going to 

Re: [HACKERS] PostgreSQL doesn't stop propley when --slot option is specified with pg_receivexlog.

2014-11-14 Thread Fujii Masao
On Fri, Nov 14, 2014 at 7:22 PM,  furu...@pm.nttdata.co.jp wrote:
 Hi,

 pg_ctl stop doesn't work properly if the --slot option is specified,
 because WAL is flushed only when the WAL file is switched.
 These processes still continue even after the postmaster has exited:
 pg_receivexlog, walsender, and logger.

I could reproduce this problem. At normal shutdown, walsender keeps waiting
for the last WAL record to be replicated and flushed by pg_receivexlog. But
pg_receivexlog issues a sync only when the WAL file is switched. Thus,
since pg_receivexlog may never flush the last WAL record, walsender may
have to keep waiting indefinitely.

pg_recvlogical handles this problem by calling fsync() when it receives a
request for an immediate reply from the server. That is, at shutdown, walsender
sends the request, pg_receivexlog receives it, flushes the last WAL record,
and sends the flush location back to the server. Since walsender can see that
the last WAL record was successfully flushed in pg_receivexlog, it can
exit cleanly.

One idea is to introduce into pg_receivexlog the same logic that
pg_recvlogical has. Thoughts?
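
To make that concrete, here is a rough sketch of the receive-loop logic,
modeled on what pg_recvlogical does; walfile, blockpos, lastFlushPosition
and sendFeedback() are stand-in names for illustration, not the actual
symbols in pg_receivexlog:

	/* Sketch: handling a walsender keepalive inside the receive loop. */
	if (copybuf[0] == 'k')
	{
		/* layout: 'k', 8-byte walEnd, 8-byte timestamp, 1-byte flag */
		bool		replyRequested = copybuf[1 + 8 + 8];

		if (replyRequested)
		{
			/*
			 * Flush everything written so far, so that the flush position
			 * we report covers the very last WAL record.  This is what
			 * lets walsender stop waiting and the server shut down.
			 */
			if (walfile != -1 && fsync(walfile) != 0)
			{
				fprintf(stderr, "could not fsync WAL file: %s\n",
						strerror(errno));
				exit(1);
			}
			lastFlushPosition = blockpos;

			/* Report the new flush position back to the server at once. */
			if (!sendFeedback(conn, blockpos, time(NULL), false))
				exit(1);
		}
	}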

Regards,

-- 
Fujii Masao




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-14 Thread Robert Haas
On Fri, Nov 14, 2014 at 1:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Generally I'd be in favor of avoiding platform-dependent code where
 possible, but that doesn't represent a YES vote for this particular
 patch.  It looks pretty messy in a quick look, even granting that the
 #ifdef WIN32's would all go away.

Hmm, OK.  I have not read the patch.  Hopefully that's something that
could be fixed.

 A larger question here is about forward/backward compatibility of the
 basebackup files.  Changing the representation of symlinks like this
 would break that.  Maybe we don't care, not sure (is there already a
 catversion check for these things?).  Changing the file format for only
 some platforms seems like definitely a bad idea though.

What are the practical consequences of changing the file format?  I
think that an old backup containing symlinks could be made to work on
a new server that knows how to create them, and we should probably
design it that way, but a physical backup isn't compatible across
major versions anyway, so it doesn't have the same kinds of
repercussions as changing something like the pg_dump file format.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-14 Thread Alvaro Herrera
Amit Kapila wrote:

 2. Symlink file format:
 oid linkpath
 16387 E:\PostgreSQL\tbs
 
  Symlink file will contain entries for all the tablespaces
  under the pg_tblspc directory.  I have kept the file name as
  symlink_label (suggestions are welcome if you want some
  different name for this file).

I think symlink_label isn't a very good name.  This file is not a label
in the sense that backup_label is; it seems more a catalog to me.  And
it's not, in essence, about symlinks either, but rather about
tablespaces.  I would name it following the term tablespace catalog or
some variation thereof.

I know we don't expect that users would have to look at the file or edit
it in normal cases, but it seems better to make it be human-readable.  I
would think that the file needs to have tablespace names too, then, not
just OIDs.  Maybe we don't use the tablespace name for anything other
than documentation purposes if someone decides to look at the file, so
perhaps it should look like a comment:

oid   link path ; tablespace name

We already do this in pg_restore -l output IIRC.
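
For example, a catalog file in that style might contain lines like the
following (OIDs, paths and tablespace names invented for illustration),
with everything after the semicolon serving as documentation only:

16387 /mnt/ssd/space1 ; tbs_fast
16390 E:\PostgreSQL\tbs ; tbs_archive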

One use case mentioned upthread is having the clone be created in the
same machine as the source server.  Does your proposal help with it?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PostgreSQL doesn't stop properly when --slot option is specified with pg_receivexlog.

2014-11-14 Thread Andres Freund
On 2014-11-15 03:25:16 +0900, Fujii Masao wrote:
 On Fri, Nov 14, 2014 at 7:22 PM,  furu...@pm.nttdata.co.jp wrote:
  Hi,
 
  pg_ctl stop doesn't work properly if the --slot option is specified,
  because WAL is flushed only when the WAL file is switched.
  These processes still continue even after the postmaster has stopped:
  pg_receivexlog, walsender and logger.
 
 I could reproduce this problem. At normal shutdown, walsender keeps waiting
 for the last WAL record to be replicated and flushed in pg_receivexlog. But
 pg_receivexlog issues a sync only when the WAL file is switched. Thus,
 since pg_receivexlog may never flush the last WAL record, walsender may have
 to keep waiting indefinitely.

Right.

 pg_recvlogical handles this problem by calling fsync() when it receives a
 request for an immediate reply from the server. That is, at shutdown, walsender
 sends the request, pg_receivexlog receives it, flushes the last WAL record,
 and sends the flush location back to the server. Since walsender can see that
 the last WAL record was successfully flushed in pg_receivexlog, it can
 exit cleanly.
 
 One idea is to introduce into pg_receivexlog the same logic that
 pg_recvlogical has. Thoughts?

Sounds sane to me. Are you looking into doing that?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Size of regression database

2014-11-14 Thread Alvaro Herrera
Alvaro Herrera wrote:
 Tom Lane wrote:
  I was testing backwards compatibility of pg_dumpall just now, and was
  somewhat astonished to notice the size of the output for the regression
  database compared to what it was not too long ago:
  
  -rw-rw-r--. 1 tgl tgl  4509135 Nov 13 16:19 dumpall.83
  -rw-rw-r--. 1 tgl tgl  4514441 Nov 13 16:24 dumpall.84
  -rw-rw-r--. 1 tgl tgl  4666917 Nov 13 16:15 dumpall.90
  -rw-rw-r--. 1 tgl tgl  4681235 Nov 13 16:15 dumpall.91
  -rw-rw-r--. 1 tgl tgl  5333587 Nov 13 16:15 dumpall.92
  -rw-rw-r--. 1 tgl tgl  5409083 Nov 13 16:15 dumpall.93
  -rw-rw-r--. 1 tgl tgl  5493686 Nov 13 16:15 dumpall.94
  -rw-rw-r--. 1 tgl tgl 27151777 Nov 13 16:21 dumpall.head
  
  A quick eyeball check says that that quintupling of the database size
  is all in BRIN index tests.  Could we dial that back to something a
  bit saner please?
 
 Oops.  Sure, will see about this.

Done:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=86cf9a565069755189e08290343d2d62afdd1f52

Now a pg_dumpall for me is 5510969 bytes which seems reasonable.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] printing table in asciidoc with psql

2014-11-14 Thread Szymon Guz
On 14 November 2014 20:57, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 Is anyone going to submit a new version of this patch?




Hi Alvaro,
due to family issues I will not be able to work on it for the next 10 days.

regards,
Szymon


Re: [HACKERS] TODO request: log_long_transaction

2014-11-14 Thread Jim Nasby

On 11/7/14, 1:19 PM, Michael Banck wrote:

On Monday, 27.10.2014, at 19:29 +0000, Thom Brown wrote:

On 27 October 2014 19:21, Josh Berkus j...@agliodbs.com wrote:

 I just realized that there is one thing we can't log currently:
 transactions which last more than #ms.  This is valuable diagnostic
 information when looking for issues like causes of bloat and deadlocks.
 
 I'd like it to be on the TODO list because it seems like part of a good
 GSOC project or first-time contribution.



So effectively, log_min_duration_transaction?  Sounds useful.


FWIW, I've also wanted the equivalent of statement_timeout for transactions:
the ability to abort a transaction if it runs for too long.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort

2014-11-14 Thread Jeremy Harris
On 14/11/14 14:54, Tom Lane wrote:
 Jeremy Harris j...@wizmail.org writes:
 On 14/11/14 00:46, Simon Riggs wrote:
 Limit  (cost= rows=20 width=175) (actual time= rows=20 loops=1)
 -  Sort  (cost= rows=568733 width=175) (actual time=
 rows=20 loops=1)
 Sort Method: top-N heapsort
 
 Going off on a tangent, when I was playing with a merge-sort
 implementation I propagated limit information into the sort
 node, for a significant win.
 
 I'm not entirely following.  The top-N heapsort approach already
 makes use of the limit info.

Having gone back to look, you're right.  It was Uniq nodes I merged
(the sort handles both bounded-output and dedup).
-- 
Cheers,
  Jeremy






Re: [HACKERS] New storage parameter pages_per_range not mentioned in CREATE INDEX doc

2014-11-14 Thread Alvaro Herrera
Michael Paquier wrote:
 Hi all,
 
 The new storage parameter pages_per_range is missing from the
 documentation of CREATE INDEX. The attached patch corrects that.

Thanks!  Pushed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-14 Thread Peter Eisentraut
On 11/13/14 4:33 PM, Peter Eisentraut wrote:
 Is this still relevant after this commit?
  
  commit fb05f3ce83d225dd0f39f8860ce04082753e9e98
  Author: Peter Eisentraut pete...@gmx.net
  Date:   Sat Feb 22 13:38:06 2014 -0500
  
  pg_basebackup: Add support for relocating tablespaces
 I believe so.
 
 The commit only applies to plain output.  Amit's complaint is that tar
 utilities on Windows don't unpack symlinks, so the tar format isn't
 useful on Windows when tablespaces are used.  So he wants the recovery
 mechanism to restore the symlinks.

Um, wouldn't accepting this patch break the above-mentioned
tablespace-relocation feature, because pg_basebackup wouldn't see any
more symlinks sent down?




[HACKERS] Idle transaction cancel/timeout and SSL revisited

2014-11-14 Thread Alex Shulgin
Hello Hackers,

After reading up through the archives on the two $subj related TODO items
I'm under the impression that the patches[1,2] didn't make it mainly because
of the risk of breaking SSL internals if we try to longjmp out of the
signal handler in the middle of a blocking SSL read, and/or if we try to
report that final transaction/connection aborted notice to the client.

What if instead of trying to escape from the signal handler we would set
a flag and use it to simulate EOF after the recv() is restarted due to
EINTR?  From the backend's perspective it should look like the client has
just hung up.

Both SSL and cleartext connections are routed through secure_raw_read,
so it seems like a good candidate for checking the flag we would set in
StatementCancelHandler().  (Should we also add the same check in
interactive_getc()?)

Ultimately, in PostgresMain() the ReadCommand() would return EOF and we
can either let the existing code shut down the backend, or add special
handling that will only abort the transaction and report the case to the
client, which was the intent in [1].  And if that works out, the timeout
handler in [2] can simply reuse the flag.

A potential problem with this is that SSL might perform some eager
cleanup when seeing EOF, so the backend might need to reset/restart the
connection (is that possible?).

I'm attaching a patch (that doesn't compile yet :-p) in the hope that it
can be useful for the purpose of the discussion.  Notably,
secure_raw_read() can't know about IdleTransactionCancelPending, and I'm
not sure what would be the best way to let it see that (is it too much of
a shortcut anyway?)

--
Alex

[1] http://www.postgresql.org/message-id/1262268211.19367.10736.camel@ebony
[2] http://www.postgresql.org/message-id/538dc843.2070...@dalibo.com

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
new file mode 100644
index b05364c..377984d
*** a/src/backend/libpq/be-secure-openssl.c
--- b/src/backend/libpq/be-secure-openssl.c
*** rloop:
*** 543,550 
  	 errmsg("SSL error: %s", SSLerrmessage())));
  			/* fall through */
  		case SSL_ERROR_ZERO_RETURN:
! 			errno = ECONNRESET;
! 			n = -1;
  			break;
  		default:
  			ereport(COMMERROR,
--- 543,553 
  	 errmsg("SSL error: %s", SSLerrmessage())));
  			/* fall through */
  		case SSL_ERROR_ZERO_RETURN:
! 			if (!IdleTransactionCancelPending) /* HACK */
! 			{
! errno = ECONNRESET;
! n = -1;
! 			}
  			break;
  		default:
  			ereport(COMMERROR,
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
new file mode 100644
index 41ec1ad..44079ff
*** a/src/backend/libpq/be-secure.c
--- b/src/backend/libpq/be-secure.c
*** secure_raw_read(Port *port, void *ptr, s
*** 145,155 
  {
  	ssize_t		n;
  
! 	prepare_for_client_read();
  
  	n = recv(port->sock, ptr, len, 0);
  
! 	client_read_ended();
  
  	return n;
  }
--- 145,160 
  {
  	ssize_t		n;
  
! 	if (IdleTransactionCancelPending)
! 		n = 0; /* simulate EOF */
! 	else
! 	{
! 		prepare_for_client_read();
  
! 		n = recv(port->sock, ptr, len, 0);
  
! 		client_read_ended();
! 	}
  
  	return n;
  }
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
new file mode 100644
index cc62b2c..37437fc
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
*** interactive_getc(void)
*** 310,318 
  {
  	int			c;
  
! 	prepare_for_client_read();
! 	c = getc(stdin);
! 	client_read_ended();
  	return c;
  }
  
--- 310,323 
  {
  	int			c;
  
! 	if (IdleTransactionCancelPending)
! 		c = EOF;
! 	else
! 	{
! 		prepare_for_client_read();
! 		c = getc(stdin);
! 		client_read_ended();
! 	}
  	return c;
  }
  
*** StatementCancelHandler(SIGNAL_ARGS)
*** 2629,2642 
  	if (!proc_exit_inprogress)
  	{
  		InterruptPending = true;
! 		QueryCancelPending = true;
  
  		/*
  		 * If it's safe to interrupt, and we're waiting for input or a lock,
  		 * service the interrupt immediately
  		 */
  		if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
! 			CritSectionCount == 0)
  		{
  			/* bump holdoff count to make ProcessInterrupts() a no-op */
  			/* until we are done getting ready for it */
--- 2634,2655 
  	if (!proc_exit_inprogress)
  	{
  		InterruptPending = true;
! 
! 		if (!DoingCommandRead)
! 			QueryCancelPending = true;
! 		else if (IsTransactionOrTransactionBlock())
! 			IdleTransactionCancelPending = true;
  
  		/*
  		 * If it's safe to interrupt, and we're waiting for input or a lock,
  		 * service the interrupt immediately
  		 */
  		if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
! 			CritSectionCount == 0 && QueryCancelPending)
! 			/*
! 			 * ProcessInterrupts() does nothing in case of
! 			 * IdleTransactionCancelPending, it is handled in PostgresMain
! 			 */
  		{
  			/* bump holdoff count to make ProcessInterrupts() a no-op */
  			/* until we are done getting ready for it */
*** PostgresMain(int argc, char 

Re: [HACKERS] Idle transaction cancel/timeout and SSL revisited

2014-11-14 Thread Andres Freund
Hi,

On 2014-11-15 00:11:36 +0300, Alex Shulgin wrote: 
 After reading up through the archives on the two $subj related TODO items
 I'm under the impression that the patches[1,2] didn't make it mainly because
 of the risk of breaking SSL internals if we try to longjmp out of the
 signal handler in the middle of a blocking SSL read, and/or if we try to
 report that final transaction/connection aborted notice to the client.

I've written a patch that allows that - check
http://archives.postgresql.org/message-id/20140927225421.GE5423%40alap3.anarazel.de

I feel pretty confident that that's the way to go. I just need some time
to polish it.

 What if instead of trying to escape from the signal handler we would set
 a flag and use it to simulate EOF after the recv() is restarted due to
 EINTR?  From the backend's perspective it should look like the client has
 just hung up.

That's pretty much what the above does. Except that it actually deals
with blocking syscalls by not using them and relying on latches/select
instead.

Greetings,

Andres Freund




Re: [HACKERS] Idle transaction cancel/timeout and SSL revisited

2014-11-14 Thread Alex Shulgin

Andres Freund and...@2ndquadrant.com writes:

 On 2014-11-15 00:11:36 +0300, Alex Shulgin wrote: 
 After reading up through the archives on the two $subj related TODO items
 I'm under the impression that the patches[1,2] didn't make it mainly because
 of the risk of breaking SSL internals if we try to longjmp out of the
 signal handler in the middle of a blocking SSL read, and/or if we try to
 report that final transaction/connection aborted notice to the client.

 I've written a patch that allows that - check
 http://archives.postgresql.org/message-id/20140927225421.GE5423%40alap3.anarazel.de

 I feel pretty confident that that's the way to go. I just need some time
 to polish it.

 What if instead of trying to escape from the signal handler we would set
 a flag and use it to simulate EOF after the recv() is restarted due to
 EINTR?  From the backend's perspective it should look like the client has
 just hung up.

 That's pretty much what the above does. Except that it actually deals
 with blocking syscalls by not using them and relying on latches/select
 instead.

Neat, I must have missed it.

--
Alex




Re: [HACKERS] Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Robert Haas
On Fri, Nov 14, 2014 at 2:51 PM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Thu, Nov 13, 2014 at 6:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  What's bothering me is that I see this in pg_dumpall output from a 9.4
  or earlier database:
 
  ALTER ROLE postgres WITH SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN 
  REPLICATION NOBYPASSRLS;

 What about leaving out NOBYPASSRLS and letting it go to whatever the default 
 is?
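 (Concretely, the line quoted above would then end at REPLICATION, with
 neither BYPASSRLS nor NOBYPASSRLS emitted.)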

 I'd be fine with that- but would we want to do it for the other role
 attributes also?  Specifically rolcreaterole and rolreplication for
 older server versions.  I'm still of the opinion that we should just
 drop the explicit true for all the role attributes for the bootstrap
 superuser and then go with this suggestion to let it go to the default
 for upgrades from older versions.

Not sure; I was just brainstorming.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] On the warpath again about ill-considered inclusion nests

2014-11-14 Thread Stephen Frost
All,

* Stephen Frost (sfr...@snowman.net) wrote:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
  Well, if you *only* move RowSecurityDesc and not RowSecurityPolicy,
  okay, but that seems a bit useless/inconsistent if I'm reading it
  right that RowSecurityDesc contains a List of RowSecurityPolicy structs.
 
 Yes, good point.
 
  What seems possibly saner is to just remove the header inclusion in rel.h
  and declare the new Relation field similarly to the way we handle
  rd_fdwroutine and some other fields there:
  
  /* use struct here to avoid needing to include rowsecurity.h: */
  struct RowSecurityDesc *rsdesc; /* Row-security policy, or NULL */
 
 Makes sense to me.
 
  And while you are at it, how about renaming rsdesc to rd_rsdesc?
  The fact that whoever put in trigdesc didn't get the memo about the
  naming convention for Relation fields doesn't excuse you from following
  it.
 
 Ok.  I tend to be bad and mistakenly consider existing code 'gospel'.
 Will fix.
 
  PS: The comments for struct RowSecurityPolicy could stand to be improved.
 
 Understood, will do so.

And, done.

Thanks!

Stephen




Re: [HACKERS] initdb -S and tablespaces

2014-11-14 Thread Andres Freund
On 2014-10-30 14:30:28 +0530, Abhijit Menon-Sen wrote:
 Here's a proposed patch to initdb to make initdb -S fsync everything
 under pg_tblspc. It introduces a new function that calls walkdir on
 every entry under pg_tblspc. This is only one approach: I could have
 also changed walkdir to follow links, but that would have required a
 bunch of #ifdefs for Windows (because it doesn't have symlinks), and I
 guessed a separate function with two calls might be easier to patch into
 back branches. I've tested this patch under various conditions on Linux,
 but it could use some testing on Windows.
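
In outline, the described approach amounts to something like the following
sketch (not the committed code; walkdir() and pre_sync_fname() are the
helpers initdb already uses for the main data directory, and MAXPGPATH its
path-length constant):

	/*
	 * Sketch: apply the existing fsync walk to every entry under
	 * pg_tblspc, each of which is normally a symlink to a tablespace
	 * directory.
	 */
	static void
	fsync_pg_tblspc(const char *pg_data)
	{
		char		path[MAXPGPATH];
		DIR		   *dir;
		struct dirent *de;

		snprintf(path, sizeof(path), "%s/pg_tblspc", pg_data);

		dir = opendir(path);
		if (dir == NULL)
			return;				/* error reporting elided */

		while ((de = readdir(dir)) != NULL)
		{
			char		subpath[MAXPGPATH];

			if (strcmp(de->d_name, ".") == 0 ||
				strcmp(de->d_name, "..") == 0)
				continue;

			snprintf(subpath, sizeof(subpath), "%s/%s", path, de->d_name);
			walkdir(subpath, pre_sync_fname);
		}

		closedir(dir);
	}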

I've pushed this. The Windows buildfarm animals that run pg_upgrade (and
thus --sync-only) will have to tell us whether there's a problem. I sure
hope there's none...

Thanks for that patch!

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()

2014-11-14 Thread Andres Freund
On 2014-10-27 16:09:30 +0530, Abhijit Menon-Sen wrote:
 At 2014-09-25 22:41:18 +0200, and...@2ndquadrant.com wrote:
 
  On 2014-09-24 17:06:05 +0530, Abhijit Menon-Sen wrote:
   
   1. Move the call to ResetUnloggedRelations(UNLOGGED_RELATION_INIT) to
  earlier in StartupXLOG.
   
   2. Inside that function, issue fsync()s for the main forks we create by
  copying the _init fork.
  
  These two imo should definitely be backpatched.
 
 I've attached updated versions of these two patches. The only changes
 are to revise the comments as you suggested.

Committed and backpatched (to 9.1) these. That required also
backpatching the move of fsync_fname() to fd.c/h to 9.1-9.3.
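
In outline, the second of those changes amounts to this (a condensed
sketch using the existing fd.c helpers, not the committed code verbatim;
srcpath, dstpath and dstdir are stand-ins):

	/*
	 * After creating a main fork by copying the _init fork, make the
	 * copy durable so it survives a crash shortly after the
	 * END_OF_RECOVERY checkpoint.
	 */
	copy_file(srcpath, dstpath);
	fsync_fname(dstpath, false);	/* the new main-fork file itself */
	fsync_fname(dstdir, true);		/* and the directory containing it */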

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Noah Misch
On Fri, Nov 14, 2014 at 11:47:49AM -0500, Stephen Frost wrote:
  On Fri, Nov 14, 2014 at 11:24:20AM -0500, Tom Lane wrote:
   I think Noah is arguing for leaving the pg_dumpall queries as they
   stand.  I disagree, but he's entitled to his opinion.

 Ah, ok.  I'm impartial, but I do note that we're currently inconsistent
 and so I'd prefer to go one way or the other.
 
 rolcreaterole uses usesuper, while rolreplication and rolbypassrls do
 not.  Noah- would you argue that we should change rolcreaterole, which
 has this behavior in all released branches (though, of course, it's only
 relevant when upgrading from a pre-8.1 server where we didn't have
 rolcreaterole)?

Setting aside that I wouldn't have argued for any change here, yes.  I agree
that there's no good reason to handle rolcreaterole unlike rolreplication.
The choice between their respective techniques has behavior consequences only
if you later use ALTER ROLE x NOSUPERUSER, which I have not seen done apart
from explicit ALTER ROLE testing.  Having said that, I prefer setting these
attributes to false when dumping from a version that did not have them, for
these reasons:

1) It's fail-safe.  The hypothetical ALTER ROLE x NOSUPERUSER may leave the
   role with fewer privileges than expected, never with more privileges than
   expected.

2) It's more consistent with how folks create superuser accounts.  I've not
   seen CREATE USER x SUPERUSER CREATEROLE used.  Where SUPERUSER preempts a
   given role attribute, the norm among sites I've seen is to omit the
   attribute.  (The bootstrap superuser does turn this point on its head.)

3) It's cleaner in \du output.

I can't pinpoint a technical argument against your proposal to cease adding
excess attributes to the bootstrap superuser at initdb time.  It feels like a
case of the tail wagging the dog, changing antediluvian initdb behavior to
make pg_dumpall slightly more transparent.

So, if you desire to make this consistent, I recommend using rolreplication's
treatment as the gold standard.  That is to say, when dumping from an older
version, set to false any of these role attributes not existing in that
version.  Robert's proposal to emit neither NOBYPASSRLS nor BYPASSRLS is a
reasonable alternative, though less so for pg_dumpall --clean.  It would be
defensible to send NOBYPASSRLS under --clean and omit the option otherwise;
consider that my second choice.

 What are your thoughts on the additional role
 attributes which are being discussed?

All three of rolcreaterole, rolreplication and rolbypassrls deserve the same
dumpRoles() treatment.

nm




Re: [HACKERS] tracking commit timestamps

2014-11-14 Thread Simon Riggs
On 14 November 2014 17:12, Robert Haas robertmh...@gmail.com wrote:

 We are not talking about loading 20 new
 requirements on top of this patch; that would be intolerable.  We're
 talking about adding one additional piece of information that has been
 requested multiple times over the years.

The requested information is already available, as discussed. Logical
decoding adds commit ordering for *exactly* the purpose of using it
for replication, available to all solutions. This often requested
feature has now been added and doesn't need to be added twice.

So what we are discussing is adding a completely superfluous piece of
information.

Not including the LSN info does nothing to trigger-based replication,
which will no doubt live on happily for many years. But adding LSN
will slow down logical replication, for no purpose at all.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Useless dead struct in parse_func.h

2014-11-14 Thread Peter Geoghegan
It looks like commit 0e99be1c removed the final real use of the struct
InhPaths. Attached patch removes it entirely.

-- 
Peter Geoghegan
diff --git a/src/include/parser/parse_func.h b/src/include/parser/parse_func.h
index b9b06ae..4423bc0 100644
--- a/src/include/parser/parse_func.h
+++ b/src/include/parser/parse_func.h
@@ -18,18 +18,6 @@
 #include "parser/parse_node.h"
 
 
-/*
- *	This structure is used to explore the inheritance hierarchy above
- *	nodes in the type tree in order to disambiguate among polymorphic
- *	functions.
- */
-typedef struct _InhPaths
-{
-	int			nsupers;		/* number of superclasses */
-	Oid			self;			/* this class */
-	Oid		   *supervec;		/* vector of superclasses */
-}	InhPaths;
-
 /* Result codes for func_get_detail */
 typedef enum
 {



Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
 On Fri, Nov 14, 2014 at 11:47:49AM -0500, Stephen Frost wrote:
  rolcreaterole uses usesuper, while rolreplication and rolbypassrls do
  not.  Noah- would you argue that we should change rolcreaterole, which
  has this behavior in all released branches (though, of course, it's only
  relevant when upgrading from a pre-8.1 server where we didn't have
  rolcreaterole)?
 
 Setting aside that I wouldn't have argued for any change here, yes.  I agree
 that there's no good reason to handle rolcreaterole unlike rolreplication.
 The choice between their respective techniques has behavior consequences only
 if you later use ALTER ROLE x NOSUPERUSER, which I have not seen done apart
 from explicit ALTER ROLE testing.  Having said that, I prefer setting these
 attributes to false when dumping from a version that did not have them, for
 these reasons:
 
 1) It's fail-safe.  The hypothetical ALTER ROLE x NOSUPERUSER may leave the
role with fewer privileges than expected, never with more privileges than
expected.
 
 2) It's more consistent with how folks create superuser accounts.  I've not
seen CREATE USER x SUPERUSER CREATEROLE used.  Where SUPERUSER preempts a
given role attribute, the norm among sites I've seen is to omit the
attribute.  (The bootstrap superuser does turn this point on its head.)
 
 3) It's cleaner in \du output.

I agree with these points and my experience matches yours.  The
bootstrap user is the one anomaly at those sites, and that makes it stand
out rather than look like the majority of the superuser accounts which
exist (where more than one or two exist, anyway).

 I can't pinpoint a technical argument against your proposal to cease adding
 excess attributes to the bootstrap superuser at initdb time.  It feels like a
 case of the tail wagging the dog, changing antediluvian initdb behavior to
 make pg_dumpall slightly more transparent.

For my 2c, it feels like it was simply an attempt to reflect actual
capabilities without consideration for what happened later rather than
any particularly technical reason, and I don't feel that original
reasoning (if that's what it was, anyway) was sound.

 So, if you desire to make this consistent, I recommend using rolreplication's
 treatment as the gold standard.  That is to say, when dumping from an older
 version, set to false any of these role attributes not existing in that
 version.  Robert's proposal to emit neither NOBYPASSRLS nor BYPASSRLS is a
 reasonable alternative, though less so for pg_dumpall --clean.  It would be
 defensible to send NOBYPASSRLS under --clean and omit the option otherwise;
 consider that my second choice.

I don't see the point in including them for --clean..?  --clean states
that DROP commands would be added, not that existing roles would be
adjusted in some way.

As for using 'always false', I tend to think Robert actually has it
better by using the default for users.  Consider rolinherit: it defaults
to 'true', and while it would technically be more 'safe' to set it to
false, it wouldn't have matched what we provided under the user/group
system prior to roles.  Doing this would also reduce clutter in
pg_dumpall output.

  What are your thoughts on the additional role
  attributes which are being discussed?
 
 All three of rolcreaterole, rolreplication and rolbypassrls deserve the same
 dumpRoles() treatment.

Agreed.

Thanks!

Stephen




Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Noah Misch
On Fri, Nov 14, 2014 at 08:39:28PM -0500, Stephen Frost wrote:
 * Noah Misch (n...@leadboat.com) wrote:
  So, if you desire to make this consistent, I recommend using 
  rolreplication's
  treatment as the gold standard.  That is to say, when dumping from an older
  version, set to false any of these role attributes not existing in that
  version.  Robert's proposal to emit neither NOBYPASSRLS nor BYPASSRLS is a
  reasonable alternative, though less so for pg_dumpall --clean.  It would 
  be
  defensible to send NOBYPASSRLS under --clean and omit the option otherwise;
  consider that my second choice.
 
 I don't see the point in including them for --clean..?  --clean states
 that DROP commands would be added, not that existing roles would be
 adjusted in some way.

It does state that, but note this comment in dumpRoles():

	/*
	 * We dump CREATE ROLE followed by ALTER ROLE to ensure that the role
	 * will acquire the right properties even if it already exists (ie, it
	 * won't hurt for the CREATE to fail).  This is particularly important
	 * for the role we are connected as, since even with --clean we will
	 * have failed to drop it.  binary_upgrade cannot generate any errors,
	 * so we assume the current role is already created.
	 */

Under --clean, the right properties are those the role would have had if the
DROP ROLE had succeeded.  Those are necessarily independent of the pre-DROP
version of the role.  (Otherwise, you potentially get different outcomes
depending on which superuser restored the --clean dump.)

 As for using 'always false', I tend to think Robert actually has it
 better by using the default for users.  Consider rolinherit: it defaults
 to 'true', and while it would technically be more 'safe' to set it to
 false, it wouldn't have matched what we provided under the user/group
 system prior to roles.  Doing this would also reduce clutter in
 pg_dumpall output.

My arguments and conclusion apply only to the permission-like attributes that
are subsets of SUPERUSER.  rolinherit is indeed not in that category.




Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
 On Fri, Nov 14, 2014 at 08:39:28PM -0500, Stephen Frost wrote:
  I don't see the point in including them for --clean..?  --clean states
  that DROP commands would be added, not that existing roles would be
  adjusted in some way.
 
 It does state that, but note this comment in dumpRoles():
 
 	/*
 	 * We dump CREATE ROLE followed by ALTER ROLE to ensure that the role
 	 * will acquire the right properties even if it already exists (ie, it
 	 * won't hurt for the CREATE to fail).  This is particularly important
 	 * for the role we are connected as, since even with --clean we will
 	 * have failed to drop it.  binary_upgrade cannot generate any errors,
 	 * so we assume the current role is already created.
 	 */

Ah, yes, of course.

 Under --clean, the right properties are those the role would have had if the
 DROP ROLE had succeeded.  Those are necessarily independent of the pre-DROP
 version of the role.  (Otherwise, you potentially get different outcomes
 depending on which superuser restored the --clean dump.)

Agreed, and in this case we'd need to reset any attributes that are not
explicitly set back to their defaults, which would include NOBYPASSRLS.

  As for using 'always false', I tend to think Robert actually has it
  better by using the default for users.  Consider rolinherit: it defaults
  to 'true', and while it would technically be more 'safe' to set it to
  false, it wouldn't have matched what we provided under the user/group
  system prior to roles.  Doing this would also reduce clutter in
  pg_dumpall output.
 
 My arguments and conclusion apply only to the permission-like attributes that
 are subsets of SUPERUSER.  rolinherit is indeed not in that category.

Understood.

Thanks!

Stephen




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-14 Thread Amit Kapila
On Sat, Nov 15, 2014 at 2:21 AM, Peter Eisentraut pete...@gmx.net wrote:

 On 11/13/14 4:33 PM, Peter Eisentraut wrote:
  Is this still relevant after this commit?
  
   commit fb05f3ce83d225dd0f39f8860ce04082753e9e98
   Author: Peter Eisentraut pete...@gmx.net
   Date:   Sat Feb 22 13:38:06 2014 -0500
  
   pg_basebackup: Add support for relocating tablespaces
  I believe so.
 
  The commit only applies to plain output.  Amit's complaint is that tar
  utilities on Windows don't unpack symlinks, so the tar format isn't
  useful on Windows when tablespaces are used.  So he wants the recovery
  mechanism to restore the symlinks.

 Um, wouldn't accepting this patch break the above-mentioned
 tablespace-relocation feature, because pg_basebackup wouldn't see any
 more symlinks sent down?

No, the new feature is implemented only for the tar format, and the above
feature works only with the plain format.  The symlink information will
still be sent as before for the plain format.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-11-14 Thread Alvaro Herrera
Michael Paquier wrote:

 Btw, perhaps this diff should be pushed as a different patch as this is a
 rather different thing:
 -   if (heapRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
 
 +   if (indexRelation->rd_rel->relpersistence ==
 RELPERSISTENCE_UNLOGGED &&
 !smgrexists(indexRelation->rd_smgr, INIT_FORKNUM))
 As this is a correctness fix, it does not seem necessary to back-patch it:
 the parent relation always has the same persistence as its indexes.

There was an argument for doing it this way that only applies if this
patch went in, but I can't remember now what it was.

Anyway I pushed the patch after some slight additional changes.  Thanks!

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] New storage parameter pages_per_range not mentioned in CREATE INDEX doc

2014-11-14 Thread Alvaro Herrera
Michael Paquier wrote:
 Hi all,
 
 The new storage parameter pages_per_range is missing from the
 documentation of CREATE INDEX. The attached patch corrects that.

Ah BTW I had pushed this earlier today.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Useless dead struct in parse_func.h

2014-11-14 Thread Alvaro Herrera
Peter Geoghegan wrote:
 It looks like commit 0e99be1c removed the final real use of the struct
 InhPaths. Attached patch removes it entirely.

I didn't verify the claim about 0e99be1c, but I pushed it anyway.
Thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] tracking commit timestamps

2014-11-14 Thread Steve Singer

On 11/14/2014 08:21 PM, Simon Riggs wrote:

The requested information is already available, as discussed. Logical
decoding adds commit ordering for *exactly* the purpose of using it
for replication, available to all solutions. This often requested
feature has now been added and doesn't need to be added twice.

So what we are discussing is adding a completely superfluous piece of
information.

Not including the LSN info does nothing to trigger-based replication,
which will no doubt live on happily for many years. But adding LSN
will slow down logical replication, for no purpose at all.



Simon,
The use cases I'm talking about aren't really replication related. Often 
I have come across systems that want to do something such as 'select * 
from orders where X > the_last_row_I_saw order by X' and then do further 
processing on the order.


This is kind of awkward to do today because you don't have a good 
candidate for 'X' to order on.  Using either a sequence or an insert-row 
timestamp doesn't work well, because a transaction with a lower value for 
X might end up committing after the higher value is in a query result.
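
For example, one transaction might obtain sequence value 100 but commit 
only after another transaction that obtained 101 has committed and been 
seen; a consumer that has already recorded 101 as its high-water mark 
will then never pick up the row with 100.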


Yes, you could set up a logical WAL slot and listen on the stream of 
inserts into your order table, but that's a lot of administration overhead 
compared to just issuing an SQL query for what really is a query-type 
operation.


Using the commit timestamp for my X sounded very tempting but could 
allow duplicates.


One could argue that this patch is about replication features, and 
providing commit ordering for query purposes should be a separate patch 
to add on top of this infrastructure. I see merit to smaller, more 
focused patches, but that requires leaving the door open to easily 
extending things later.


It could also be that I'm the only one who wants to order and filter 
queries in this manner (but that would surprise me).  If the commit LSN 
has limited appeal and we decide we don't want it at all, then we 
shouldn't add it.  I've seen this type of requirement in a number of 
different systems at a number of different companies.  I've generally 
seen it dealt with either by selecting rows behind the last now() 
timestamp seen and then filtering out already-processed rows, or by 
tracking the 'processed' state of each row individually (i.e., performing 
an update on each row once it's been processed), which performs poorly.


Steve









Re: [HACKERS] New storage parameter pages_per_range not mentioned in CREATE INDEX doc

2014-11-14 Thread Michael Paquier
On Sat, Nov 15, 2014 at 1:27 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Michael Paquier wrote:
 Hi all,

 The new storage parameter pages_per_range is missing from the
 documentation of CREATE INDEX. The attached patch corrects that.

 Ah BTW I had pushed this earlier today.
Thanks.
-- 
Michael




Re: [HACKERS] WAL format and API changes (9.5)

2014-11-14 Thread Michael Paquier
On Fri, Nov 14, 2014 at 5:31 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 3) pg_xlogdump does not seem to work:
 $ pg_xlogdump 0001000D
 pg_xlogdump: FATAL:  could not find a valid record after 0/D00
This one was a mistake on my side. Please disregard this comment.
-- 
Michael




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-14 Thread Amit Kapila
On Sat, Nov 15, 2014 at 12:03 AM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 Amit Kapila wrote:

  2. Symlink file format:
  oid linkpath
  16387 E:\PostgreSQL\tbs
 
  Symlink file will contain entries for all the tablespaces
  under the pg_tblspc directory.  I have kept the file name as
  symlink_label (suggestions are welcome if you want some
  different name for this file).

 I think symlink_label isn't a very good name.  This file is not a label
 in the sense that backup_label is; it seems more a catalog to me.  And
 it's not, in essence, about symlinks either, but rather about
 tablespaces.  I would name it following the term tablespace catalog or
 some variation thereof.


This file is going to provide the symlink path for each tablespace, so
it would not be bad to have that in the file name.  I agree with you that
it's more about tablespaces.  So how about:

tablespace_symlink
symlink_tablespace
tablespace_info

 I know we don't expect that users would have to look at the file or edit
 it in normal cases, but it seems better to make it be human-readable.  I
 would think that the file needs to have tablespace names too, then, not
 just OIDs.  Maybe we don't use the tablespace name for anything other
 than documentation purposes if someone decides to look at the file, so
 perhaps it should look like a comment:

 oid   link path ; tablespace name

 We already do this in pg_restore -l output IIRC.


Okay, I will take care of doing this in the next version of the patch if
no one objects to this idea or more people are in favour of doing so.

 One use case mentioned upthread is having the clone be created in the
 same machine as the source server.  Does your proposal help with it?


Sorry, but I am not sure which proposal exactly you are referring to here.
Could you explain in more detail?

In general, if a user took the backup (in tar format) using pg_basebackup,
this patch will make it possible to restore such a backup even on the
same server.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-14 Thread Amit Kapila
On Sat, Nov 15, 2014 at 12:01 AM, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Nov 14, 2014 at 1:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Generally I'd be in favor of avoiding platform-dependent code where
  possible, but that doesn't represent a YES vote for this particular
  patch.  It looks pretty messy in a quick look, even granting that the
  #ifdef WIN32's would all go away.

 Hmm, OK.  I have not read the patch.  Hopefully that's something that
 could be fixed.

  A larger question here is about forward/backward compatibility of the
  basebackup files.  Changing the representation of symlinks like this
  would break that.  Maybe we don't care, not sure (is there already a
  catversion check for these things?).  Changing the file format for only
  some platforms seems like definitely a bad idea though.

 What are the practical consequences of changing the file format?  I
 think that an old backup containing symlinks could be made to work on
 a new server that knows how to create them,

So if I understand correctly, by *old backup* you mean a backup created
by 9.5 and by *new server* you mean a server > 9.5; if so, the current
design should handle it.

However, if the backup is created on a version < 9.5 using pg_basebackup
of the same version, trying to restore it with a server >= 9.5 won't work,
because the server won't have the information about the symlinks.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com