Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
On Sat, Mar 07, 2015 at 12:46:42AM -0500, Tom Lane wrote:
> Noah Misch writes:
> > On Thu, Mar 05, 2015 at 03:28:12PM -0600, Jim Nasby wrote:
> >> I was thinking the simpler route of just repalloc'ing... the memcpy would
> >> suck, but much less so than the extra index pass. 64M gets us 11M tuples,
> >> which probably isn't very common.
>
> > +1. Start far below 64 MiB; grow geometrically using repalloc_huge(); cap
> > growth at vac_work_mem.
>
> +1 for repalloc'ing at need, but I'm not sure about the "start far below
> 64 MiB" part. 64MB is a pretty small amount on nearly any machine these
> days (and for anybody who thinks it isn't, that's why maintenance_work_mem
> is a tunable).

True; nothing would explode, especially since the allocation would be
strictly smaller than it is today. However, I can't think of a place in
PostgreSQL where a growable allocation begins so aggressively, nor a reason
to break new ground in that respect. For comparison, tuplestore/tuplesort
start memtupsize at 1 KiB. (One could make a separate case for that
practice being wrong.)

> A different line of thought is that it would seem to make sense to have
> the initial allocation vary depending on the relation size. For instance,
> you could assume there might be 10 dead tuples per page, and hence try to
> alloc that much if it fits in vac_work_mem.

Sounds better than a fixed 64 MiB start, though I'm not sure it's better
than a fixed 256 KiB start.
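To make the shape of that proposal concrete, here is a minimal sketch of the
strategy the thread converges on: start the dead-tuple array small, double it
with repalloc_huge() on demand, and cap growth at the vac_work_mem budget.
Apart from repalloc_huge(), palloc(), and ItemPointerData, which are real,
every name below is invented for illustration; this is not actual PostgreSQL
code.

/*
 * Illustrative sketch only.  Start small, grow geometrically via
 * repalloc_huge(), never exceed the vacuum memory budget.
 */
#include "postgres.h"
#include "storage/itemptr.h"

typedef struct DeadTupleStore
{
	ItemPointer	itemptrs;	/* array of dead-tuple TIDs */
	long		num;		/* TIDs currently stored */
	long		max;		/* slots currently allocated */
	long		hard_max;	/* cap derived from vac_work_mem */
} DeadTupleStore;

static void
dt_init(DeadTupleStore *dt, long vac_work_mem_kb)
{
	dt->hard_max = vac_work_mem_kb * 1024L / sizeof(ItemPointerData);
	/* start at 256 KiB, not the whole budget */
	dt->max = Min(dt->hard_max, 256L * 1024L / sizeof(ItemPointerData));
	dt->num = 0;
	dt->itemptrs = (ItemPointer) palloc(dt->max * sizeof(ItemPointerData));
}

static void
dt_record(DeadTupleStore *dt, ItemPointer tid)
{
	if (dt->num >= dt->max)
	{
		/*
		 * Geometric growth, clamped to the budget.  A real implementation
		 * would trigger an index-vacuum pass instead of growing once the
		 * cap is reached.
		 */
		dt->max = Min(dt->hard_max, dt->max * 2);
		dt->itemptrs = (ItemPointer)
			repalloc_huge(dt->itemptrs, dt->max * sizeof(ItemPointerData));
		Assert(dt->num < dt->max);
	}
	dt->itemptrs[dt->num++] = *tid;
}

With a 256 KiB start, a table with few dead tuples never allocates more than
a token amount, while a badly bloated table reaches a 1 GiB cap after about
twelve doublings.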
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
Noah Misch writes:
> On Thu, Mar 05, 2015 at 03:28:12PM -0600, Jim Nasby wrote:
>> I was thinking the simpler route of just repalloc'ing... the memcpy would
>> suck, but much less so than the extra index pass. 64M gets us 11M tuples,
>> which probably isn't very common.

> +1. Start far below 64 MiB; grow geometrically using repalloc_huge(); cap
> growth at vac_work_mem.

+1 for repalloc'ing at need, but I'm not sure about the "start far below
64 MiB" part. 64MB is a pretty small amount on nearly any machine these
days (and for anybody who thinks it isn't, that's why maintenance_work_mem
is a tunable). I think min(64MB, vac_work_mem) might be a reasonable start
point.

A different line of thought is that it would seem to make sense to have
the initial allocation vary depending on the relation size. For instance,
you could assume there might be 10 dead tuples per page, and hence try to
alloc that much if it fits in vac_work_mem.

			regards, tom lane
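As an illustration of the sizing heuristic above (and of the thread's
arithmetic: 64 MiB divided by the 6-byte ItemPointerData is roughly
11 million TIDs), the initial allocation might be computed as in this
hypothetical sketch; the function name and the 1024-slot floor are
assumptions, not actual PostgreSQL code.

/*
 * Sketch: guess ~10 dead tuples per heap page, clamped to what
 * vac_work_mem allows, with a small floor for tiny relations.
 */
#include "postgres.h"
#include "storage/block.h"
#include "storage/itemptr.h"

static long
initial_dead_tuple_slots(BlockNumber rel_pages, long vac_work_mem_kb)
{
	long	budget = vac_work_mem_kb * 1024L / sizeof(ItemPointerData);
	long	guess = (long) rel_pages * 10;	/* ~10 dead tuples per page */

	return Min(budget, Max(guess, 1024));
}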
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
On Thu, Mar 05, 2015 at 03:28:12PM -0600, Jim Nasby wrote:
> On 3/4/15 9:10 AM, Robert Haas wrote:
> >On Wed, Feb 25, 2015 at 5:06 PM, Jim Nasby wrote:
> >>Could the large allocation[2] for the dead tuple array in lazy_space_alloc
> >>cause problems with linux OOM? [1] and some other things I've read indicate
> >>that a large mmap will count towards total system memory, including
> >>producing a failure if overcommit is disabled.
> >
> >I believe that this is possible.

I have seen that in the field, albeit on a server with a 10 GiB allocation
limit, years ago.

> >>Would it be worth avoiding the full size allocation when we can?
> >
> >Maybe. I'm not aware of any evidence that this is an actual problem
> >as opposed to a theoretical one. vacrelstats->dead_tuples is limited
> >to a 1GB allocation, which is not a trivial amount of memory, but it's
> >not huge, either. But we could consider changing the representation
> >from a single flat array to a list of chunks, with each chunk capped
> >at say 64MB. That would not only reduce the amount of memory that we
>
> I was thinking the simpler route of just repalloc'ing... the memcpy would
> suck, but much less so than the extra index pass. 64M gets us 11M tuples,
> which probably isn't very common.

+1. Start far below 64 MiB; grow geometrically using repalloc_huge(); cap
growth at vac_work_mem.
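For contrast with the repalloc route, here is a rough sketch of the chunked
representation Robert describes above: TIDs accumulate in fixed 64 MB
chunks, so no single allocation exceeds the chunk size and memory is claimed
only as dead tuples are actually found. All names are invented for
illustration, and a real implementation would also teach lazy_tid_reaped()
to binary-search each chunk in turn.

/*
 * Illustrative sketch only -- not actual PostgreSQL code.
 */
#include "postgres.h"
#include "storage/itemptr.h"

#define DT_CHUNK_SLOTS	((64L * 1024L * 1024L) / sizeof(ItemPointerData))

typedef struct DeadTupleChunk
{
	struct DeadTupleChunk *next;	/* previously filled chunk, if any */
	long		num;				/* TIDs used in this chunk */
	ItemPointerData tids[DT_CHUNK_SLOTS];
} DeadTupleChunk;

/* Append one TID, starting a new 64 MB chunk when the head fills up. */
static DeadTupleChunk *
dt_append(DeadTupleChunk *head, ItemPointer tid)
{
	if (head == NULL || head->num >= DT_CHUNK_SLOTS)
	{
		DeadTupleChunk *chunk = (DeadTupleChunk *) palloc(sizeof(DeadTupleChunk));

		chunk->next = head;
		chunk->num = 0;
		head = chunk;
	}
	head->tids[head->num++] = *tid;
	return head;
}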
Re: [HACKERS] CATUPDATE confusion?
Peter Eisentraut writes:
> On 12/29/14 7:16 PM, Adam Brightwell wrote:
>> Given this discussion, I have attached a patch that removes CATUPDATE
>> for review/discussion.

> committed this version

Hmm ... I'm not sure that summarily removing usecatupd from those three
system views was well thought out. pg_shadow, especially, has no reason to
live at all except for backwards compatibility, and clients might well
expect that column to still be there. I wonder if we'd not be better off
to keep the column in the views but have it read from rolsuper.

			regards, tom lane
Re: [HACKERS] CATUPDATE confusion?
On 12/29/14 7:16 PM, Adam Brightwell wrote:
> Given this discussion, I have attached a patch that removes CATUPDATE
> for review/discussion.

committed this version
Re: [HACKERS] MD5 authentication needs help
* Bruce Momjian (br...@momjian.us) wrote:
> On Fri, Mar 6, 2015 at 12:50:14PM -0800, Josh Berkus wrote:
> > On 03/06/2015 08:19 AM, Stephen Frost wrote:
> > > Well, server-side, we already have that- have pgbouncer run on the
> > > database server (something which I'm typically in favor of anyway) and
> > > use 'peer'. If it supported TLS then it could use certificates instead.
> > > The question is what to do after the pooler has connected and that's
> > > actually a generic issue which goes beyond poolers and into
> > > applications, basically, "how can I re-authenticate this connection
> > > using a different role." We have SET ROLE, but that gives a lot of
> > > power to the role the pooler logs in as. It'd definitely be neat to
> > > provide a way to use SCRAM or similar to do that re-authentication
> > > after the initial connection.
> >
> > Using pgbouncer on the DB server is common, but less common than using
> > it on an intermediate server or even the app server itself. So anything
> > we create needs to be implementable with all three configurations in
> > some way.
>
> I think the best solution to this would be to introduce a per-cluster
> salt that is used for every password hash. That way, you could not
> replay a pg_authid hash on another server _unless_ you had manually
> assigned the same cluster salt to both servers, or connection pooler.

Wouldn't that break the wireline protocol, unless you used a fixed set of
known challenges? Perhaps I'm not following what you mean by a
cluster-wide salt here.

	Thanks,

		Stephen
Re: [HACKERS] MD5 authentication needs help
* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, Mar 5, 2015 at 11:15:55AM -0500, Stephen Frost wrote:
> > * Bruce Momjian (br...@momjian.us) wrote:
> > > On Wed, Mar 4, 2015 at 05:56:25PM -0800, Josh Berkus wrote:
> > > > So, are we more worried about attackers getting a copy of pg_authid, or
> > > > sniffing the hash on the wire?
> > >
> > > Both. Stephen is more worried about pg_authid, but I am more worried
> > > about sniffing.
> >
> > I'm also worried about both, but if the admin is worried about sniffing
> > in their environment, they're much more likely to use TLS than to set up
> > client side certificates, kerberos, or some other strong auth mechanism,
> > simply because TLS is pretty darn easy to get working and distros set it
> > up for you by default.
>
> I think your view might be skewed. I think there are many people who care
> about password security who don't care to do TLS.

I'm not quite sure that I follow what you mean about caring for "password
security". I'll try to clarify by suggesting a few things that I think you
might mean and will respond to them. Please clarify if I've completely
missed what you're getting at here.

If the concern is database access due to an attacker who can sniff the
network data, then the approach which I suggested would make things
materially worse for users who do not employ TLS. Once the attacker has
sniffed the network for a little while, they'll have one and likely more
challenge/responses which they could use to attempt to log in with. As
discussed, our existing challenge/response system is vulnerable to
network-sniffing-based replay attacks also.

If the concern is database access due to an attacker who can see the
on-disk data, then the current situation makes it trivial for the attacker
to log in, while the approach I've suggested would require the attacker to
reverse hash(salt+auth_token) (where auth_token here is taken to represent
what we currently store on disk; even with my suggestion, the attacker
would not need to reverse the hash(username+password) to gain database
access).

If the concern is about getting at the cleartext password, then I don't
believe things are materially worse with either approach assuming a
network-based sniff attack. Both require that the attacker reverse
hash(salt+hash(username+password)) where the salt and password are not
known.

If the concern is about getting the cleartext password as it resides on
disk or in backups, we are not in a good position today. While the
password is hash'd, the "salt" used is the username, which the attacker
may know ahead of the compromise. The approach I am suggesting improves
that situation because it would bring it up to the same level of
difficulty as that of the network-based sniff attack: the attacker would
have to reverse hash(salt+hash(username+password)).

> Also, my suggestion to use a counter for the session salt, to reduce
> replay from 16k to 4 billion, has not received any comments, and it does
> not break the wire protocol. I feel that is an incremental improvement
> we should consider.

You are correct, that would improve the existing protocol regarding
database-access risk due to network-based sniffing attacks. Unfortunately,
it would not improve cleartext or database-access risk due to disk-based
attacks.

> I think you are minimizing the downside of your idea using X challenges
> instead of 16k challenges to get in. Again, if my idea is valid, it
> would be X challenges vs 4 billion challenges.

The reason I have not been as excited by this approach is that we already
have a solution for network-based sniffing attacks. As Josh mentioned,
there are users out there who even go to extraordinary lengths to thwart
network-based sniffing attacks by using stunnel with pg_bouncer. On the
flip side, while there are ways to protect backups through encryption,
many other vectors exist for disk-based attacks which result in either
database access or finding the cleartext password with much less
difficulty.

Further, this only improves the authentication handling and doesn't reduce
our exposure to other network-based attacks, including connection
hijacking, sniffing during password set/reset, data compromise as it's
sent across the wire, etc. Encouraging use of TLS addresses all of those
risks. I don't recall any complaints about these other network-based
attacks, and I do believe that's because TLS is available. Combined with
the approach I've suggested, we would reduce the risk of disk-based
attacks to the extent we're able to without breaking the protocol.

For my part, doing this, or going with my suggestion, or doing nothing
with md5, really doesn't move us forward very much, which frustrates me
greatly. I brought this suggestion to this list because it was suggested
to me as one change we could make that would reduce the risk of disk-based
attacks while trading that off for a higher risk on the side of
network-based attacks while not breaking the existing network protocol.
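To pin down the notation used throughout this exchange, here is a sketch of
the two on-disk constructions being compared: what the server stores today
(md5(password || username), so the username effectively acts as the salt)
versus the proposed hash(salt + hash(username + password)) with a random
per-user salt. The md5_hex() helper is assumed; this is notation for the
discussion, not PostgreSQL's actual implementation.

/*
 * Sketch of the constructions discussed above.  md5_hex() is an assumed
 * helper that hex-encodes the MD5 of its input.
 */
#include <stdio.h>
#include <string.h>

extern void md5_hex(const char *data, size_t len, char out[33]);	/* assumed */

/*
 * Today: the stored verifier is md5(password || username).  The username
 * is the only "salt", and an attacker may know it before the compromise.
 */
static void
current_verifier(const char *user, const char *password, char out[33])
{
	char	buf[256];

	snprintf(buf, sizeof(buf), "%s%s", password, user);
	md5_hex(buf, strlen(buf), out);
}

/*
 * Proposed: hash a random per-user salt together with the existing auth
 * token, i.e. hash(salt + hash(username + password)), so a copy of
 * pg_authid no longer works directly as a login credential.
 */
static void
proposed_verifier(const char *salt, const char *auth_token, char out[33])
{
	char	buf[256];

	snprintf(buf, sizeof(buf), "%s%s", salt, auth_token);
	md5_hex(buf, strlen(buf), out);
}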
Re: [HACKERS] Clamping result row number of joins.
I wrote: > Hm, the problem evidently is that we get a default selectivity estimate > for the ON condition. I think a proper fix for this would involve > teaching eqjoinsel (and ideally other join selectivity functions) how > to drill down into appendrels and combine estimates for the child > relations. I wrote a prototype patch for this. The additions to examine_variable() seem pretty reasonable. However, the only selectivity function I've fixed is eqjoinsel_inner(). If we do it like this, we're going to need similar recursive-boilerplate additions in basically every selectivity function, which seems like a PITA as well as a lot of code bloat. Can anyone think of a cute way to minimize that overhead? regards, tom lane diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index 4dd3f9f..aaf76be 100644 *** a/src/backend/utils/adt/selfuncs.c --- b/src/backend/utils/adt/selfuncs.c *** static double convert_one_bytea_to_scala *** 181,187 int rangelo, int rangehi); static char *convert_string_datum(Datum value, Oid typid); static double convert_timevalue_to_scalar(Datum value, Oid typid); ! static void examine_simple_variable(PlannerInfo *root, Var *var, VariableStatData *vardata); static bool get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop, Datum *min, Datum *max); --- 181,187 int rangelo, int rangehi); static char *convert_string_datum(Datum value, Oid typid); static double convert_timevalue_to_scalar(Datum value, Oid typid); ! static void examine_simple_variable(PlannerInfo *root, Var *var, Index varno, VariableStatData *vardata); static bool get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop, Datum *min, Datum *max); *** eqjoinsel_inner(Oid operator, *** 2221,2226 --- 2221,2276 float4 *numbers2 = NULL; int nnumbers2 = 0; + if (vardata1->children) + { + /* Recurse to appendrel children and compute weighted selectivity */ + double appendrelrows; + ListCell *lc; + + selec = 0; + appendrelrows = 0; + foreach(lc, vardata1->children) + { + VariableStatData *childdata = (VariableStatData *) lfirst(lc); + double cselec; + + if (childdata->rel == NULL) + continue; /* safety check */ + cselec = eqjoinsel_inner(operator, childdata, vardata2); + selec += cselec * childdata->rel->rows; + appendrelrows += childdata->rel->rows; + } + if (appendrelrows > 0) + selec /= appendrelrows; + CLAMP_PROBABILITY(selec); + return selec; + } + + if (vardata2->children) + { + /* Recurse to appendrel children and compute weighted selectivity */ + double appendrelrows; + ListCell *lc; + + selec = 0; + appendrelrows = 0; + foreach(lc, vardata2->children) + { + VariableStatData *childdata = (VariableStatData *) lfirst(lc); + double cselec; + + if (childdata->rel == NULL) + continue; /* safety check */ + cselec = eqjoinsel_inner(operator, vardata1, childdata); + selec += cselec * childdata->rel->rows; + appendrelrows += childdata->rel->rows; + } + if (appendrelrows > 0) + selec /= appendrelrows; + CLAMP_PROBABILITY(selec); + return selec; + } + nd1 = get_variable_numdistinct(vardata1, &isdefault1); nd2 = get_variable_numdistinct(vardata2, &isdefault2); *** get_restriction_variable(PlannerInfo *ro *** 4192,4198 { *varonleft = true; *other = estimate_expression_value(root, rdata.var); ! /* Assume we need no ReleaseVariableStats(rdata) here */ return true; } --- 4242,4248 { *varonleft = true; *other = estimate_expression_value(root, rdata.var); ! ReleaseVariableStats(rdata); /* usually unnecessary, but ... 
*/ return true; } *** get_restriction_variable(PlannerInfo *ro *** 4200,4206 { *varonleft = false; *other = estimate_expression_value(root, vardata->var); ! /* Assume we need no ReleaseVariableStats(*vardata) here */ *vardata = rdata; return true; } --- 4250,4256 { *varonleft = false; *other = estimate_expression_value(root, vardata->var); ! ReleaseVariableStats(*vardata); *vardata = rdata; return true; } *** get_join_variables(PlannerInfo *root, Li *** 4259,4283 * node: the expression tree to examine * varRelid: see specs for restriction selectivity functions * ! * Outputs: *vardata is filled as follows: ! * var: the input expression (with any binary relabeling stripped, if ! * it is or contains a variable; but otherwise the type is preserved) ! * rel: RelOptInfo for relation containing variable; NULL if expression ! * contains no Vars (NOTE this could point to a RelOptInfo of a ! * subquery, not one in the current query). ! * statsTuple: the pg_statistic entry for the variable, if one exists; ! * otherwise NULL. ! *
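One possible answer to the "cute way" question, sketched under explicit
assumptions (in particular, the children list the prototype adds to
VariableStatData): put the recursion into a single wrapper that takes the
underlying selectivity function as a callback, so each selectivity function
gains two short calls instead of two copied loops. Children are combined
exactly as in the prototype, as a rows-weighted average:
selec = sum(selec_i * rows_i) / sum(rows_i).

/*
 * Sketch against the prototype above; signatures simplified.
 */
#include "postgres.h"
#include "nodes/pg_list.h"
#include "utils/selfuncs.h"

typedef double (*JoinSelFunc) (Oid operator,
							   VariableStatData *vardata1,
							   VariableStatData *vardata2);

static double
appendrel_join_selec(JoinSelFunc fn, Oid operator,
					 VariableStatData *outer, VariableStatData *inner,
					 bool recurse_outer)
{
	List	   *children = recurse_outer ? outer->children : inner->children;
	double		selec = 0;
	double		totalrows = 0;
	ListCell   *lc;

	foreach(lc, children)
	{
		VariableStatData *child = (VariableStatData *) lfirst(lc);

		if (child->rel == NULL)
			continue;			/* safety check, as in the prototype */
		selec += (recurse_outer ? fn(operator, child, inner)
				  : fn(operator, outer, child)) * child->rel->rows;
		totalrows += child->rel->rows;
	}
	if (totalrows > 0)
		selec /= totalrows;
	CLAMP_PROBABILITY(selec);
	return selec;
}

eqjoinsel_inner() would then reduce its two new blocks to:

	if (vardata1->children)
		return appendrel_join_selec(eqjoinsel_inner, operator,
									vardata1, vardata2, true);
	if (vardata2->children)
		return appendrel_join_selec(eqjoinsel_inner, operator,
									vardata1, vardata2, false);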
Re: [HACKERS] MD5 authentication needs help
On Fri, Mar 6, 2015 at 12:50:14PM -0800, Josh Berkus wrote:
> On 03/06/2015 08:19 AM, Stephen Frost wrote:
> > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> >> Stephen Frost wrote:
> >> Sure. I was thinking we would have some mechanism to authenticate the
> >> connection as coming from a pooler that has been previously authorized;
> >> something as simple as a new pg_hba.conf entry type for "poolers" that
> >> are only authorized to connect to such-and-such databases, perhaps
> >> limited to such-and-such users, etc.
> >
> > Well, server-side, we already have that- have pgbouncer run on the
> > database server (something which I'm typically in favor of anyway) and
> > use 'peer'. If it supported TLS then it could use certificates instead.
> > The question is what to do after the pooler has connected and that's
> > actually a generic issue which goes beyond poolers and into
> > applications, basically, "how can I re-authenticate this connection
> > using a different role." We have SET ROLE, but that gives a lot of
> > power to the role the pooler logs in as. It'd definitely be neat to
> > provide a way to use SCRAM or similar to do that re-authentication after
> > the initial connection.
>
> Using pgbouncer on the DB server is common, but less common than using
> it on an intermediate server or even the app server itself. So anything
> we create needs to be implementable with all three configurations in
> some way.

I think the best solution to this would be to introduce a per-cluster salt
that is used for every password hash. That way, you could not replay a
pg_authid hash on another server _unless_ you had manually assigned the
same cluster salt to both servers, or connection pooler.

--
  Bruce Momjian        http://momjian.us
  EnterpriseDB         http://enterprisedb.com

  + Everyone has their own god. +
Re: [HACKERS] MD5 authentication needs help
On Thu, Mar 5, 2015 at 11:15:55AM -0500, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Wed, Mar 4, 2015 at 05:56:25PM -0800, Josh Berkus wrote:
> > > So, are we more worried about attackers getting a copy of pg_authid, or
> > > sniffing the hash on the wire?
> >
> > Both. Stephen is more worried about pg_authid, but I am more worried
> > about sniffing.
>
> I'm also worried about both, but if the admin is worried about sniffing
> in their environment, they're much more likely to use TLS than to set up
> client side certificates, kerberos, or some other strong auth mechanism,
> simply because TLS is pretty darn easy to get working and distros set it
> up for you by default.

I think your view might be skewed. I think there are many people who care
about password security who don't care to do TLS.

Also, my suggestion to use a counter for the session salt, to reduce
replay from 16k to 4 billion, has not received any comments, and it does
not break the wire protocol. I feel that is an incremental improvement we
should consider.

I think you are minimizing the downside of your idea using X challenges
instead of 16k challenges to get in. Again, if my idea is valid, it would
be X challenges vs 4 billion challenges.

--
  Bruce Momjian        http://momjian.us
  EnterpriseDB         http://enterprisedb.com

  + Everyone has their own god. +
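For readers following along, the exchange at issue works roughly as sketched
below: the server sends a 4-byte salt with its challenge and the client
answers with md5(stored_hash || salt), where stored_hash is
md5(password || username). The counter proposal above replaces the small
random salt space with a monotonically advancing 32-bit counter, so a
sniffed response replays against only one of about 4 billion future
challenges, without changing the message format. md5_hex() is an assumed
helper; this is not the actual server code.

/*
 * Rough sketch of the md5 exchange plus the proposed counter salt.
 */
#include <stdint.h>
#include <string.h>

extern void md5_hex(const char *data, size_t len, char out[33]);	/* assumed */

/* Client side: respond to the server's 4-byte salt. */
static void
md5_response(const char stored_hash[33], const char salt[4], char out[33])
{
	char	buf[36];

	memcpy(buf, stored_hash, 32);
	memcpy(buf + 32, salt, 4);
	md5_hex(buf, 36, out);
}

/*
 * Server side, proposed variant: derive the session salt from a
 * per-server counter rather than a small random space.  (The wire
 * encoding of the counter is an assumption of this sketch.)
 */
static void
next_session_salt(uint32_t *counter, char salt[4])
{
	uint32_t	v = (*counter)++;

	memcpy(salt, &v, 4);
}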
Re: [HACKERS] alter user/role CURRENT_USER
There is something odd here going on: alvherre=# alter group test add user current_user; ERROR: role "test" is a member of role "(null)" Surely (null) is not the right name to be reporting there ... Attached is a rebased patch, which also has some incomplete doc changes. With this patch applied, doing \h ALTER ROLE in psql looks quite odd: note how wide it has become. Maybe we should be doing this differently? (Hmm, why don't we accept ALL in the first SET line? Maybe that's just a mistake and the four lines should be all identical in the first half ...) alvherre=# \h alter role Command: ALTER ROLE Description: change a database role Syntax: ALTER ROLE { name | CURRENT_USER | SESSION_USER } [ WITH ] option [ ... ] where option can be: SUPERUSER | NOSUPERUSER | CREATEDB | NOCREATEDB | CREATEROLE | NOCREATEROLE | CREATEUSER | NOCREATEUSER | INHERIT | NOINHERIT | LOGIN | NOLOGIN | REPLICATION | NOREPLICATION | BYPASSRLS | NOBYPASSRLS | CONNECTION LIMIT connlimit | [ ENCRYPTED | UNENCRYPTED ] PASSWORD 'password' | VALID UNTIL 'timestamp' ALTER ROLE name RENAME TO new_name ALTER ROLE { name | CURRENT_USER | SESSION_USER } [ IN DATABASE database_name ] SET configuration_parameter { TO | = } { value | DEFAULT } ALTER ROLE { name | CURRENT_USER | SESSION_USER | ALL } [ IN DATABASE database_name ] SET configuration_parameter FROM CURRENT ALTER ROLE { name | CURRENT_USER | SESSION_USER | ALL } [ IN DATABASE database_name ] RESET configuration_parameter ALTER ROLE { name | CURRENT_USER | SESSION_USER | ALL } [ IN DATABASE database_name ] RESET ALL -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/doc/src/sgml/ref/alter_group.sgml b/doc/src/sgml/ref/alter_group.sgml index 1432242..5f0d8b4 100644 --- a/doc/src/sgml/ref/alter_group.sgml +++ b/doc/src/sgml/ref/alter_group.sgml @@ -21,10 +21,10 @@ PostgreSQL documentation -ALTER GROUP group_name ADD USER user_name [, ... ] -ALTER GROUP group_name DROP USER user_name [, ... ] +ALTER GROUP { group_name | CURRENT_USER | SESSION_USER } ADD USER user_name [, ... ] +ALTER GROUP { group_name | CURRENT_USER | SESSION_USER } DROP USER user_name [, ... ] -ALTER GROUP group_name RENAME TO new_name +ALTER GROUP { group_name | CURRENT_USER | SESSION_USER } RENAME TO new_name diff --git a/doc/src/sgml/ref/alter_role.sgml b/doc/src/sgml/ref/alter_role.sgml index 0471daa..59f6499 100644 --- a/doc/src/sgml/ref/alter_role.sgml +++ b/doc/src/sgml/ref/alter_role.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation -ALTER ROLE name [ [ WITH ] option [ ... ] ] +ALTER ROLE { name | CURRENT_USER | SESSION_USER } [ WITH ] option [ ... 
] where option can be: @@ -39,10 +39,10 @@ ALTER ROLE name [ [ WITH ] name RENAME TO new_name -ALTER ROLE name [ IN DATABASE database_name ] SET configuration_parameter { TO | = } { value | DEFAULT } -ALTER ROLE { name | ALL } [ IN DATABASE database_name ] SET configuration_parameter FROM CURRENT -ALTER ROLE { name | ALL } [ IN DATABASE database_name ] RESET configuration_parameter -ALTER ROLE { name | ALL } [ IN DATABASE database_name ] RESET ALL +ALTER ROLE { name | CURRENT_USER | SESSION_USER } [ IN DATABASE database_name ] SET configuration_parameter { TO | = } { value | DEFAULT } +ALTER ROLE { name | CURRENT_USER | SESSION_USER | ALL } [ IN DATABASE database_name ] SET configuration_parameter FROM CURRENT +ALTER ROLE { name | CURRENT_USER | SESSION_USER | ALL } [ IN DATABASE database_name ] RESET configuration_parameter +ALTER ROLE { name | CURRENT_USER | SESSION_USER | ALL } [ IN DATABASE database_name ] RESET ALL @@ -129,6 +129,25 @@ ALTER ROLE { name | ALL } [ IN DATA + CURRENT_USER + + +Indicates to +Alter the current user instead of a specifically named role. + + + + + + SESSION_USER + + +Alter the current session role instead of a specifically named role. + + + + + SUPERUSER NOSUPERUSER CREATEDB diff --git a/doc/src/sgml/ref/alter_user.sgml b/doc/src/sgml/ref/alter_user.sgml index 58ae1da..628729d 100644 --- a/doc/src/sgml/ref/alter_user.sgml +++ b/doc/src/sgml/ref/alter_user.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation -ALTER USER name [ [ WITH ] option [ ... ] ] +ALTER USER { name | CURRENT_USER | SESSION_USER } [ WITH ] option [ ... ] where option can be: @@ -38,10 +38,10 @@ ALTER USER name [ [ WITH ] name RENAME TO new_name -ALTER USER name SET configuration_parameter { TO | = } { value | DEFAULT } -ALTER USER name SET configuration_parameter FROM CURRENT -ALTER USER name RESET configuration_parameter -ALTER USER name RESET ALL +ALTER USER { name | CURRENT_USER | SESSION_USER } SET configuration_parameter { TO | = } { value | DEFAULT } +ALTER USER { name | CURRENT_USER | SESSION_USER } SET configurati
Re: [HACKERS] Bootstrap DATA is a pita
Robert Haas wrote:
> On Wed, Mar 4, 2015 at 2:27 PM, Alvaro Herrera wrote:
> > BTW one solution to the merge problem is to have unique separators for
> > each entry. For instance, instead of
>
> Speaking from entirely too much experience, that's not nearly enough.
> git only needs 3 lines of context to apply a hunk with no qualms at
> all, and it'll shade that to just 1 or 2 with little fanfare. If your
> pg_proc entries are each 20 lines long, this sort of thing will
> provide little protection.

Yeah, you're right. This is going to be a problem, and we need some
solution for it. I'm out of ideas, other than of course getting each entry
to be at most two lines long, which nobody seems to like (for good
reasons.)

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Add more tests on event triggers
Robert Haas wrote:
> On Wed, Feb 25, 2015 at 10:03 AM, Fabien COELHO wrote:
> >> You can add tests in src/test/modules/dummy_seclabel.
> >
> > Patch attached to test sec label there, in addition to the other more
> > standard checks in event_trigger.
>
> These tests seem worthwhile to me.

Pushed, thanks.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] get_object_address support for additional object types
This is extracted from the DDL deparse series. These patches add get_object_address support for the following object types: - user mappings - default ACLs - operators and functions of operator families In each case I had to create a new value in ObjectType. These object types can not be created from the parser, which is why they don't exist yet. But if we want to be able to process DDL for them, then we need to cope at this level. This is the kind of fix we need to handle the failures related to commit a486841eb11517e. There is one strange thing in the last one, which is that an opfamily member is represented in two arrays like this (objname, objargs): {opfamily identity, access method identity, number} , {left type, right type} This is a bit odd considering that operator families themselves are addressed like this instead: {opfamily identity} , {access method identity} Note that the AM name is originally in objargs and moves to objnames. The reason I did it this way is that the objargs elements can be processed completely as an array of TypeName, and therefore there's no need for an extra strange case in pg_get_object_address. But it does mean that there is some code that knows to search the strategy or function number in a specific position in the objname array. If we had more freedom on general object representation I'm sure we could do better, but it's what we have. I don't think it's a tremendous problem, considering that get_object_address gets a fairly ad-hoc representation for each object type anyway, as each gets constructed by the grammar. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From fad598488c0c15e0962808ad13825374e8a3640e Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 5 Mar 2015 12:04:39 -0300 Subject: [PATCH 1/3] deparse/core: get_object_address support user mappings --- src/backend/catalog/objectaddress.c | 67 +++- src/backend/commands/event_trigger.c | 1 + src/include/nodes/parsenodes.h | 1 + src/test/regress/expected/event_trigger.out | 12 - src/test/regress/expected/object_address.out | 19 +--- src/test/regress/sql/event_trigger.sql | 11 - src/test/regress/sql/object_address.sql | 9 ++-- 7 files changed, 107 insertions(+), 13 deletions(-) diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c index 541912b..5553ec7 100644 --- a/src/backend/catalog/objectaddress.c +++ b/src/backend/catalog/objectaddress.c @@ -520,7 +520,7 @@ ObjectTypeMap[] = /* OCLASS_FOREIGN_SERVER */ { "server", OBJECT_FOREIGN_SERVER }, /* OCLASS_USER_MAPPING */ - { "user mapping", -1 }, /* unmapped */ + { "user mapping", OBJECT_USER_MAPPING }, /* OCLASS_DEFACL */ { "default acl", -1 }, /* unmapped */ /* OCLASS_EXTENSION */ @@ -555,6 +555,8 @@ static ObjectAddress get_object_address_type(ObjectType objtype, List *objname, bool missing_ok); static ObjectAddress get_object_address_opcf(ObjectType objtype, List *objname, List *objargs, bool missing_ok); +static ObjectAddress get_object_address_usermapping(List *objname, + List *objargs, bool missing_ok); static const ObjectPropertyType *get_object_property_data(Oid class_id); static void getRelationDescription(StringInfo buffer, Oid relid); @@ -769,6 +771,10 @@ get_object_address(ObjectType objtype, List *objname, List *objargs, address.objectId = get_ts_config_oid(objname, missing_ok); address.objectSubId = 0; break; + case OBJECT_USER_MAPPING: +address = get_object_address_usermapping(objname, objargs, + missing_ok); +break; default: 
elog(ERROR, "unrecognized objtype: %d", (int) objtype); /* placate compiler, in case it thinks elog might return */ @@ -1373,6 +1379,64 @@ get_object_address_opcf(ObjectType objtype, } /* + * Find the ObjectAddress for a user mapping. + */ +static ObjectAddress +get_object_address_usermapping(List *objname, List *objargs, bool missing_ok) +{ + ObjectAddress address; + Oid userid; + char *username; + char *servername; + ForeignServer *server; + HeapTuple tp; + + ObjectAddressSet(address, UserMappingRelationId, InvalidOid); + + username = strVal(linitial(objname)); + servername = strVal(linitial(objargs)); + server = GetForeignServerByName(servername, false); + + if (strcmp(username, "public") == 0) + userid = InvalidOid; + else + { + tp = SearchSysCache1(AUTHNAME, + CStringGetDatum(username)); + if (!HeapTupleIsValid(tp)) + { + if (!missing_ok) +ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg("user mapping for user \"%s\" in server \"%s\" does not exist", +username, servername))); + return address; + } + userid = HeapTupleGetOid(tp); + ReleaseSysCache(tp); + } + + tp = SearchSysCache2(USERMAPPINGUSERSERVER, + ObjectIdGetDatum(userid), +
Re: [HACKERS] Clamping result row number of joins.
Tom Lane writes:
> Stephen Frost writes:
>> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>>> BTW, is that JOIN (VALUES(...)) thing common in applications, or did you
>>> just use it to make a compact example? If it were something worth
>>> optimizing, it seems like we could teach the planner to "pull up VALUES"
>>> in the same way that it flattens sub-selects. I'm not sure if this is
>>> worth the trouble or not, though.

>> I've certainly seen and used values() constructs in joins for a variety
>> of reasons and I do think it'd be worthwhile for the planner to know how
>> to pull up a VALUES construct.

>> Would that be a lot of effort, either code-wise or runtime-wise? My gut
>> feeling is that it wouldn't be, but you're clearly in a much better
>> position to determine that.

> My guess is that it'd be pretty simple to do if we want to do it.
> I've not looked at the code yet though.

I spent a bit of time looking at this, and realized that the blocker is
the same as the reason why we don't pull up sub-selects with empty
rangetables ("SELECT expression(s)"). Namely, that the upper query's
jointree can't handle a null subtree. (This is not only a matter of the
jointree data structure, but the fact that the whole planner identifies
relations by bitmapsets of RTE indexes, and subtrees with empty RTE sets
couldn't be told apart.)

We could probably fix both cases for order-of-a-hundred lines of new code
in prepjointree. The plan I'm thinking about is to allow such vacuous
subquery jointrees to be pulled up, but only if they are in a place in the
upper query's jointree where it's okay to delete the subtree. This would
basically be two cases: (1) the immediate parent is a FromExpr that would
have at least one remaining child, or (2) the immediate parent is an INNER
JOIN whose other child isn't also being deleted (so that we can convert
the JoinExpr to a nonempty FromExpr, or just use the other child as-is if
the JoinExpr has no quals).

More generally, it occurs to me that maybe Oracle wasn't being totally
silly when they invented DUAL. If we had a jointree representation for
"dummy relation with exactly one row" then we could substitute that in all
vacuous-jointree cases. However, it's not clear that there's any
functional advantage to replacing a VALUES Scan with a "DUAL Scan", which
is basically what would happen if we flattened a VALUES and then had to
put one of these things in instead. And having such things propagate all
through the planner, executor, EXPLAIN, etc is way more code churn than I
want to contemplate right now. The plan proposed in the preceding para is
a bit more ugly logically, but it would limit the code effects to
basically pull_up_subqueries() and its child routines.

			regards, tom lane
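A hypothetical sketch of the two-case deletability test described above;
the names are invented, and the real logic would live in
pull_up_subqueries() and its child routines in prepjointree.c.

/*
 * Sketch only: may the given child subtree simply be deleted from its
 * immediate parent in the upper query's jointree?
 */
#include "postgres.h"
#include "nodes/pg_list.h"
#include "nodes/primnodes.h"

static bool
can_delete_jointree_child(Node *parent)
{
	if (IsA(parent, FromExpr))
	{
		/* case 1: a FromExpr that would keep at least one other child */
		return list_length(((FromExpr *) parent)->fromlist) > 1;
	}
	if (IsA(parent, JoinExpr))
	{
		/*
		 * case 2: an inner join whose other child survives; the JoinExpr
		 * becomes a one-item FromExpr, or is replaced outright by the
		 * other child if there are no quals.  (Verifying that the other
		 * child is not itself being deleted is omitted here.)
		 */
		return ((JoinExpr *) parent)->jointype == JOIN_INNER;
	}
	return false;
}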
Re: [HACKERS] MD5 authentication needs help
On 03/06/2015 08:19 AM, Stephen Frost wrote:
> * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
>> Stephen Frost wrote:
>> Sure. I was thinking we would have some mechanism to authenticate the
>> connection as coming from a pooler that has been previously authorized;
>> something as simple as a new pg_hba.conf entry type for "poolers" that are
>> only authorized to connect to such-and-such databases, perhaps limited to
>> such-and-such users, etc.
>
> Well, server-side, we already have that- have pgbouncer run on the
> database server (something which I'm typically in favor of anyway) and
> use 'peer'. If it supported TLS then it could use certificates instead.
> The question is what to do after the pooler has connected and that's
> actually a generic issue which goes beyond poolers and into
> applications, basically, "how can I re-authenticate this connection
> using a different role." We have SET ROLE, but that gives a lot of
> power to the role the pooler logs in as. It'd definitely be neat to
> provide a way to use SCRAM or similar to do that re-authentication after
> the initial connection.

Using pgbouncer on the DB server is common, but less common than using it
on an intermediate server or even the app server itself. So anything we
create needs to be implementable with all three configurations in some
way.

What I'd particularly like to see, of course, is some way for a
pgbouncer-like proxy to authenticate a client connection, then to use
those credentials to authenticate to the database server, without ever
storing a reusable auth token (such as the current md5) on the pgbouncer
server. And, of course, TLS support.

> pgbouncer isn't as necessary in the kinds of environments that use
> Kerberos because you aren't having lots of connections/s from distinct
> principals to a given server. It's just not the profile that pgbouncer
> is built for, but that's kind of my point here- pgbouncer is far more
> concerned with performance than security because the assumption going in
> is that you have a trusted server connecting to a trusted server.
> That's an acceptable assumption for a lot of environments, though not
> all.

It's not that pgbouncer users don't want more secure connections; it's
that they can't have them. I know users who are using stunnel to connect
to pgbouncer because that's the only way they can secure it.

>> (Actually, is pgbouncer under active maintenance at all these days?)
>
> I understand Andrew has been helping with it, but my guess is that's
> more maintenance and less active development.

No, there's active development; look for some new pgbouncer features over
the next 6-9 months.

More importantly, pgbouncer is *used* by a large portion of our users --
somewhere between 20% and 50% based on my experience with our clients and
at conferences (our clients are actually around 70% pgbouncer-using). So
if our md5-replacement solution isn't compatible with pgbouncer, or
doesn't arrive with a replacement for pgbouncer with which it is
compatible, it will result in us supporting old-style md5 for a long time
or a lot of people using "password".

>>> Also, I don't expect we're going to remove md5 any time soon and,
>>> frankly, people using pgbouncer probably aren't worried about the issues
>>> which exist with that mechanism and care much more about performance, as
>>> it doesn't even support TLS..
>>
>> I think they accept the issues because they have no other choice, not
>> because they are really all that OK with them.

> I'd certainly be very happy to see someone interested enough in this
> issue to dedicate resources (either human time or funding) to implement
> TLS for pgbouncer...

PGX would be happy to implement this if someone wanted to find funding.
I'd expect that a couple other consulting companies would jump at it as
well. It's just that so far nobody has wanted to foot the bill.
Kickstarter, maybe?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] extend pgbench expressions with functions
This patch extends pgbench expression with functions. Currently only one "abs" function is added. The point is rather to bootstrap the infrastructure for other functions (such as hash, random variants...) to be added later. Here is a small v2 update: - with better error messages on non existing functions - a minimal documentation -- Fabien.diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y index 243c6b9..254f081 100644 --- a/contrib/pgbench/exprparse.y +++ b/contrib/pgbench/exprparse.y @@ -20,6 +20,7 @@ static PgBenchExpr *make_integer_constant(int64 ival); static PgBenchExpr *make_variable(char *varname); static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr); +static PgBenchExpr *make_func(const char *fname, PgBenchExpr *arg1); %} @@ -35,8 +36,8 @@ static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr, %type expr %type INTEGER -%type VARIABLE -%token INTEGER VARIABLE +%type VARIABLE FUNCTION +%token INTEGER VARIABLE FUNCTION %token CHAR_ERROR /* never used, will raise a syntax error */ %left '+' '-' @@ -57,6 +58,7 @@ expr: '(' expr ')' { $$ = $2; } | expr '%' expr { $$ = make_op('%', $1, $3); } | INTEGER{ $$ = make_integer_constant($1); } | VARIABLE { $$ = make_variable($1); } + | FUNCTION '(' expr ')' { $$ = make_func($1, $3); pg_free($1); } ; %% @@ -93,4 +95,33 @@ make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr) return expr; } +static struct { + char * fname; + int nargs; + PgBenchFunction tag; +} PGBENCH_FUNCTIONS[] = { + { "abs", 1, PGBENCH_ABS } +}; + +static PgBenchExpr * +make_func(const char * fname, PgBenchExpr *arg1) +{ + PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr)); + int nfunctions = sizeof(PGBENCH_FUNCTIONS) / sizeof(PGBENCH_FUNCTIONS[0]); + int i; + + expr->etype = ENODE_FUNCTION; + expr->u.function.function = PGBENCH_NONE; + + for (i = 0; i < nfunctions; i++) + if (pg_strcasecmp(fname, PGBENCH_FUNCTIONS[i].fname) == 0) + expr->u.function.function = PGBENCH_FUNCTIONS[i].tag; + + if (expr->u.function.function == PGBENCH_NONE) + yyerror("unexpected function name"); + + expr->u.function.arg1 = arg1; + return expr; +} + #include "exprscan.c" diff --git a/contrib/pgbench/exprscan.l b/contrib/pgbench/exprscan.l index 4c9229c..a276444 100644 --- a/contrib/pgbench/exprscan.l +++ b/contrib/pgbench/exprscan.l @@ -46,6 +46,7 @@ space [ \t\r\f] ")"{ yycol += yyleng; return ')'; } :[a-zA-Z0-9_]+ { yycol += yyleng; yylval.str = pg_strdup(yytext + 1); return VARIABLE; } [0-9]+ { yycol += yyleng; yylval.ival = strtoint64(yytext); return INTEGER; } +[a-zA-Z]+ { yycol += yyleng; yylval.str = pg_strdup(yytext); return FUNCTION; } [\n] { yycol = 0; yyline++; } {space} { yycol += yyleng; /* ignore */ } diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c index 706fdf5..e0cffc2 100644 --- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c @@ -959,6 +959,26 @@ evaluateExpr(CState *st, PgBenchExpr *expr, int64 *retval) return false; } + case ENODE_FUNCTION: + { +switch (expr->u.function.function) +{ + case PGBENCH_ABS: + { + int64 arg1; + if (!evaluateExpr(st, expr->u.function.arg1, &arg1)) + return false; + + *retval = arg1 > 0? 
arg1: -arg1; + return true; + } + default: + fprintf(stderr, "bad function (%d)\n", +expr->u.function.function); + return false; +} + } + default: break; } diff --git a/contrib/pgbench/pgbench.h b/contrib/pgbench/pgbench.h index 128bf11..b7d30ba 100644 --- a/contrib/pgbench/pgbench.h +++ b/contrib/pgbench/pgbench.h @@ -15,12 +15,19 @@ typedef enum PgBenchExprType { ENODE_INTEGER_CONSTANT, ENODE_VARIABLE, - ENODE_OPERATOR + ENODE_OPERATOR, + ENODE_FUNCTION } PgBenchExprType; struct PgBenchExpr; typedef struct PgBenchExpr PgBenchExpr; +typedef enum PgBenchFunction +{ + PGBENCH_NONE, + PGBENCH_ABS +} PgBenchFunction; + struct PgBenchExpr { PgBenchExprType etype; @@ -40,6 +47,11 @@ struct PgBenchExpr PgBenchExpr *lexpr; PgBenchExpr *rexpr; } operator; + struct + { + PgBenchFunction function; + PgBenchExpr *arg1; + } function; } u; }; diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml index ed12e27..0c58412 100644 --- a/doc/src/sgml/pgbench.sgml +++ b/doc/src/sgml/pgbench.sgml @@ -762,7 +762,7 @@ pgbench options dbname references to variables :variablename, and expressions composed of unary (-) or binary operators (+, -, *, /, %) - with their usual associativity, and parentheses. + with their usual associativity, function abs and parentheses. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
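Following the pattern the patch establishes, adding another one-argument
function is mostly a lookup-table entry, an enum tag, and an evaluator
case. A hypothetical sign() function might look like the fragments below,
one per file, sketched against the patch above.

/* exprparse.y: one more entry in the lookup table */
static struct {
	char *fname;
	int nargs;
	PgBenchFunction tag;
} PGBENCH_FUNCTIONS[] = {
	{ "abs",  1, PGBENCH_ABS },
	{ "sign", 1, PGBENCH_SIGN }		/* hypothetical addition */
};

/* pgbench.h: one more tag */
typedef enum PgBenchFunction
{
	PGBENCH_NONE,
	PGBENCH_ABS,
	PGBENCH_SIGN					/* hypothetical addition */
} PgBenchFunction;

/* pgbench.c, inside evaluateExpr()'s function switch */
case PGBENCH_SIGN:
	{
		int64		arg1;

		if (!evaluateExpr(st, expr->u.function.arg1, &arg1))
			return false;
		*retval = (arg1 > 0) - (arg1 < 0);	/* -1, 0, or +1 */
		return true;
	}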
Re: [HACKERS] pg_upgrade and rsync
On Fri, Mar 6, 2015 at 10:50:27AM +0300, Vladimir Borodin wrote: > What I have done is to update the pg_upgrade instructions to add this > required step. Updated doc patch attached. (I also added the --delete > flag to rsync.) Thanks so much for your detailed report. > > > It seems to work fine now. The only thing that would be nice to change is > to explicitly write that these instructions are correct for hot standby > installations too. > > + > + If you have Log-Shipping Standby Servers ( + linkend="warm-standby">), follow these steps to upgrade them (before > + starting any servers): > + > > Actually, I’ve entered this thread because it is not obvious from the > paragraph > above or any other places. Oh, very good point. I was trying to match the wording we use in the docs, but forgot that log shipping and streaming replication are specified separately. Updated patch attached. You can view the output at: http://momjian.us/tmp/pgsql/pgupgrade.html Thanks much! -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml new file mode 100644 index 07ca0dc..e25e0d0 *** a/doc/src/sgml/backup.sgml --- b/doc/src/sgml/backup.sgml *** tar -cf backup.tar /usr/local/pgsql/data *** 438,445 Another option is to use rsync to perform a file system backup. This is done by first running rsync while the database server is running, then shutting down the database !server just long enough to do a second rsync. The !second rsync will be much quicker than the first, because it has relatively little data to transfer, and the end result will be consistent because the server was down. This method allows a file system backup to be performed with minimal downtime. --- 438,447 Another option is to use rsync to perform a file system backup. This is done by first running rsync while the database server is running, then shutting down the database !server long enough to do an rsync --checksum. !(--checksum is necessary because rsync only !has file modification-time granularity of one second.) The !second rsync will be quicker than the first, because it has relatively little data to transfer, and the end result will be consistent because the server was down. This method allows a file system backup to be performed with minimal downtime. diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml new file mode 100644 index e1cd260..0d79fb5 *** a/doc/src/sgml/pgupgrade.sgml --- b/doc/src/sgml/pgupgrade.sgml *** NET STOP postgresql-8.4 *** 315,320 --- 315,325 NET STOP postgresql-9.0 + + + Streaming replication and log-shipping standby servers can remain running until + a later step. + *** pg_upgrade.exe *** 399,404 --- 404,539 + Upgrade Streaming Replication and Log-Shipping standby + servers + + + If you have Streaming Replication () or Log-Shipping () standby servers, follow these steps to + upgrade them (before starting any servers): + + + + + + Install the new PostgreSQL binaries on standby servers + + +Make sure the new binaries and support files are installed on all +standby servers. + + + + + Make sure the new standby data directories do not + exist + + +Make sure the new standby data directories do not +exist or are empty. If initdb was run, delete +the standby server data directories. + + + + + Install custom shared object files + + +Install the same custom shared object files on the new standbys +that you installed in the new master cluster. 
+ + + + + Stop standby servers + + +If the standby servers are still running, stop them now using the +above instructions. + + + + + Verify standby servers + + +To prevent old standby servers from being modified, run +pg_controldata against the primary and standby +clusters and verify that the Latest checkpoint location +values match in all clusters. (This requires the standbys to be +shut down after the primary.) + + + + + Save configuration files + + +Save any configuration files from the standbys you need to keep, +e.g. postgresql.conf, recovery.conf, +as these will be overwritten or removed in the next step. + + + + + Start and stop the new master cluster + + +In the new master cluster, change wal_level to +hot_standby in the postgresql.co
Re: [HACKERS] Weirdly pessimistic estimates in optimizer
I wrote: > I chewed on this for awhile and decided that there'd be no real harm in > taking identification of the unique expressions out of > create_unique_path() and doing it earlier, in initsplan.c; we'd need a > couple more fields in SpecialJoinInfo but that doesn't seem like a > problem. However, rel->rows is a *big* problem; we simply have not made > any join size estimates yet, and can't, because these things are done > bottom up. > However ... estimate_num_groups's dependency on its rowcount input is not > large (it's basically using it as a clamp). So conceivably we could have > get_loop_count just multiply together the sizes of the base relations > included in the semijoin's RHS to get a preliminary estimate of that > number. This would be the right thing anyway for a single relation in the > RHS, which is the most common case. It would usually be an overestimate > for join RHS, but we could hope that the output of estimate_num_groups > wouldn't be affected too badly. Attached is a draft patch that does those two things. I like the first part (replacing SpecialJoinInfo's rather ad-hoc join_quals field with something more explicitly attuned to semijoin uniqueness processing). The second part is still pretty much of a kluge, but then get_loop_count was a kluge already. This arguably makes it better. Now, on the test case you presented, this has the unfortunate effect that it now reliably chooses the "wrong" plan for both cases :-(. But I think that's a reflection of poor cost parameters (ie, test case fits handily in RAM but we've not set the cost parameters to reflect that). We do get the same rowcount and roughly-same cost estimates for both the random_fk_dupl and random_fk_uniq queries, so from that standpoint it's doing the right thing. If I reduce random_page_cost to 2 or so, it makes the choices you wanted. regards, tom lane diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 9fe8008..9f65d66 100644 *** a/src/backend/nodes/copyfuncs.c --- b/src/backend/nodes/copyfuncs.c *** _copySpecialJoinInfo(const SpecialJoinIn *** 1942,1948 COPY_SCALAR_FIELD(jointype); COPY_SCALAR_FIELD(lhs_strict); COPY_SCALAR_FIELD(delay_upper_joins); ! COPY_NODE_FIELD(join_quals); return newnode; } --- 1942,1951 COPY_SCALAR_FIELD(jointype); COPY_SCALAR_FIELD(lhs_strict); COPY_SCALAR_FIELD(delay_upper_joins); ! COPY_SCALAR_FIELD(semi_can_btree); ! COPY_SCALAR_FIELD(semi_can_hash); ! COPY_NODE_FIELD(semi_operators); ! COPY_NODE_FIELD(semi_rhs_exprs); return newnode; } diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index fe509b0..fd876fb 100644 *** a/src/backend/nodes/equalfuncs.c --- b/src/backend/nodes/equalfuncs.c *** _equalSpecialJoinInfo(const SpecialJoinI *** 798,804 COMPARE_SCALAR_FIELD(jointype); COMPARE_SCALAR_FIELD(lhs_strict); COMPARE_SCALAR_FIELD(delay_upper_joins); ! COMPARE_NODE_FIELD(join_quals); return true; } --- 798,807 COMPARE_SCALAR_FIELD(jointype); COMPARE_SCALAR_FIELD(lhs_strict); COMPARE_SCALAR_FIELD(delay_upper_joins); ! COMPARE_SCALAR_FIELD(semi_can_btree); ! COMPARE_SCALAR_FIELD(semi_can_hash); ! COMPARE_NODE_FIELD(semi_operators); ! COMPARE_NODE_FIELD(semi_rhs_exprs); return true; } diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 775f482..75c57a2 100644 *** a/src/backend/nodes/outfuncs.c --- b/src/backend/nodes/outfuncs.c *** _outSpecialJoinInfo(StringInfo str, cons *** 1945,1951 WRITE_ENUM_FIELD(jointype, JoinType); WRITE_BOOL_FIELD(lhs_strict); WRITE_BOOL_FIELD(delay_upper_joins); ! 
WRITE_NODE_FIELD(join_quals); } static void --- 1945,1954 WRITE_ENUM_FIELD(jointype, JoinType); WRITE_BOOL_FIELD(lhs_strict); WRITE_BOOL_FIELD(delay_upper_joins); ! WRITE_BOOL_FIELD(semi_can_btree); ! WRITE_BOOL_FIELD(semi_can_hash); ! WRITE_NODE_FIELD(semi_operators); ! WRITE_NODE_FIELD(semi_rhs_exprs); } static void diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index 5a9daf0..1a0d358 100644 *** a/src/backend/optimizer/path/costsize.c --- b/src/backend/optimizer/path/costsize.c *** compute_semi_anti_join_factors(PlannerIn *** 3294,3300 /* we don't bother trying to make the remaining fields valid */ norm_sjinfo.lhs_strict = false; norm_sjinfo.delay_upper_joins = false; ! norm_sjinfo.join_quals = NIL; nselec = clauselist_selectivity(root, joinquals, --- 3294,3303 /* we don't bother trying to make the remaining fields valid */ norm_sjinfo.lhs_strict = false; norm_sjinfo.delay_upper_joins = false; ! norm_sjinfo.semi_can_btree = false; ! norm_sjinfo.semi_can_hash = false; ! norm_sjinfo.semi_operators = NIL; ! norm_sjinfo.semi_rhs_exprs = NIL; nselec = clauselist_selectivity(root, joinquals, ***
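The preliminary rowcount described above (multiply the sizes of the base
relations in the semijoin's RHS, then let estimate_num_groups() clamp the
result) reduces to a loop like this sketch; names are simplified from the
real planner code.

/*
 * Sketch of get_loop_count's preliminary RHS rowcount.
 */
#include "postgres.h"
#include "nodes/relation.h"
#include "optimizer/pathnode.h"

static double
semijoin_rhs_preliminary_rows(PlannerInfo *root, Relids rhs_relids)
{
	double		rowcount = 1.0;
	int			relid = -1;

	while ((relid = bms_next_member(rhs_relids, relid)) >= 0)
	{
		RelOptInfo *rel = find_base_rel(root, relid);

		/* usually an overestimate for a join RHS, but only a clamp input */
		rowcount *= rel->rows;
	}
	return rowcount;
}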
Re: [HACKERS] Clamping result row number of joins.
Stephen Frost writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> BTW, is that JOIN (VALUES(...)) thing common in applications, or did you
>> just use it to make a compact example? If it were something worth
>> optimizing, it seems like we could teach the planner to "pull up VALUES"
>> in the same way that it flattens sub-selects. I'm not sure if this is
>> worth the trouble or not, though.

> I've certainly seen and used values() constructs in joins for a variety
> of reasons and I do think it'd be worthwhile for the planner to know how
> to pull up a VALUES construct.

> Would that be a lot of effort, either code-wise or runtime-wise? My gut
> feeling is that it wouldn't be, but you're clearly in a much better
> position to determine that.

My guess is that it'd be pretty simple to do if we want to do it. I've
not looked at the code yet though.

			regards, tom lane
Re: [HACKERS] MD5 authentication needs help
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > Stephen Frost wrote: > > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > > > Perhaps one of the requirements of a new auth method should be to allow > > > middlemen such as connection poolers. It's been over two years since I > > > had a look, but IIRC pgbouncer had the very ugly requirement of its own > > > copy of user/passwords in a file, and of course you had to update it > > > separately if you changed the password in the server. We need to make > > > it possible for it not to require any such thing. > > > > If we go this direction, we've got to be *very* careful that it's only > > when the admin enables it. man-in-the-middle attacks are quite real and > > you're essentially asking that we support them intentionally. I agree > > that we want to support connection poolers but they have an inherent > > MITM profile. > > Sure. I was thinking we would have some mechanism to authenticate the > connection as coming from a pooler that has been previously authorized; > something simple as a new pg_hba.conf entry type for "poolers" that are > only authorized to connect to such-and-such databases, perhaps limit to > such-and-such users, etc. Well, server-side, we already have that- have pgbouncer run on the database server (something which I'm typically in favor of anyway) and use 'peer'. If it supported TLS then it could use certificates instead. The question is what to do after the pooler has connected and that's actually a generic issue which goes beyond poolers and into applications, basically, "how can I re-authenticate this connection using a different role." We have SET ROLE, but that gives a lot of power to the role the pooler logs in as. It'd definitely be neat to provide a way to use SCRAM or similar to do that re-authentication after the initial connection. > I think Kerberos implementations are uncommon, and the complexity of > getting the whole thing up and running is probably the major reason; or > at least there's the common belief that it's so. Kerberos is *extremely* common- it's what Active Directory uses, after all. Where it isn't used are hosting/cloud providers and the like where they want to support any and every device their client might want to use to connect, or places which don't realize that AD uses Kerberos and that it can be leveraged to provide auth to PG. > My guess is that since there are few users, pgbouncer devs see little > reason to implement it also. Chicken and egg, perhaps. pgbouncer isn't as necessary in the kinds of environments that use Kerberos because you aren't having lots of connections/s from distinct principals to a given server. It's just not the profile that pgbouncer is built for, but that's kind of my point here- pgbouncer is far more concerned with performance than security because the assumption going in is that you have a trusted server connecting to a trusted server. That's an acceptable assumption for a lot of environments, though not all. > (Actually, is pgbouncer under active maintenance at all these days?) I understand Andrew has been helping with it, but my guess is that's more maintenance and less active development. > > Also, I don't expect we're going to remove md5 any time soon and, > > frankly, people using pgbouncer probably aren't worried about the issues > > which exist with that mechanism and care much more about performance, as > > it doesn't even support TLS.. 
> I think they accept the issues because they have no other choice, not > because they are really all that OK with them. I'd certainly be very happy to see someone interested enough in this issue to dedicate resources (either human time or funding) to implement TLS for pgbouncer... Thanks! Stephen
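For readers skimming the thread, a minimal SQL sketch of the SET ROLE approach discussed above may help (role names are invented for illustration). The pooler authenticates once as its own role and then switches identity per client session; the GRANT is exactly the "lot of power" being referred to, since the pooler role can assume the application role at will, with no further authentication:

    CREATE ROLE pooler LOGIN;    -- the role the pooler connects as, e.g. via 'peer'
    CREATE ROLE app_user LOGIN;
    GRANT app_user TO pooler;    -- lets the pooler assume app_user freely

    -- inside a pooled session:
    SET ROLE app_user;           -- act on behalf of the client
    RESET ROLE;                  -- revert before the connection is reused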
Re: [HACKERS] File based Incremental backup v8
On Fri, Mar 6, 2015 at 9:38 AM, Gabriele Bartolini wrote: > I believe the main point is to look at it from a user interface point of view. > If/when we switch to block-level incremental support, this will be > completely transparent to the end user, even if we start with a file-level > approach with LSN check. I don't think that's true. To have a real file-level incremental backup you need the ability to take the incremental backup, and then you also need the ability to take a full backup + an incremental backup taken later and reassemble a full image of the cluster on which you can run recovery. The means of doing that is going to be different for an approach that only copies certain blocks vs. one that copies whole files. Once we have the block-based approach, nobody will ever use the file-based approach, so whatever code or tools we write to do that will all be dead code, yet we'll still have to support them for many years. By the way, unless I'm missing something, this patch only seems to include the code to construct an incremental backup, but no tools whatsoever to do anything useful with it once you've got it. I think that's 100% unacceptable. Users need to be able to manipulate PostgreSQL backups using either standard operating system tools or tools provided with PostgreSQL. Some people may prefer to use something like repmgr or pitrtools or OmniPITR in addition, but that shouldn't be a requirement for incremental backup to be usable. Agile development is good, but that does not mean you can divide a big project into arbitrarily small chunks. At some point the chunks are too small to be sure that the overall direction is right, and/or individually useless. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MD5 authentication needs help
Stephen Frost wrote: > Alvaro, > > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > > Perhaps one of the requirements of a new auth method should be to allow > > middlemen such as connection poolers. It's been over two years since I > > had a look, but IIRC pgbouncer had the very ugly requirement of its own > > copy of user/passwords in a file, and of course you had to update it > > separately if you changed the password in the server. We need to make > > it possible for it not to require any such thing. > > If we go this direction, we've got to be *very* careful that it's only > when the admin enables it. Man-in-the-middle attacks are quite real and > you're essentially asking that we support them intentionally. I agree > that we want to support connection poolers but they have an inherent > MITM profile. Sure. I was thinking we would have some mechanism to authenticate the connection as coming from a pooler that has been previously authorized; something as simple as a new pg_hba.conf entry type for "poolers" that are only authorized to connect to such-and-such databases, perhaps limited to such-and-such users, etc. > Note that this is also something which is up to the pooling system and > which we can't control. A good example is Kerberos. Kerberos has had a > way for authentication to be proxied for a long time (with some controls > to say which principals are allowed to be proxied, and which systems are > allowed to proxy on behalf of other principals), but pgbouncer doesn't > support that even though it'd eliminate the need for it to have a user / > password file. True. I think Kerberos implementations are uncommon, and the complexity of getting the whole thing up and running is probably the major reason; or at least there's the common belief that it's so. My guess is that since there are few users, pgbouncer devs see little reason to implement it also. Chicken and egg, perhaps. (Actually, is pgbouncer under active maintenance at all these days?) > Also, I don't expect we're going to remove md5 any time soon and, > frankly, people using pgbouncer probably aren't worried about the issues > which exist with that mechanism and care much more about performance, as > it doesn't even support TLS.. I think they accept the issues because they have no other choice, not because they are really all that OK with them. -- Álvaro Herrera   http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CATUPDATE confusion?
All, Thanks for all the feedback and review. > I think I vote for (1). I do not like (2) because of the argument I made > > upthread that write access on system catalogs is far more dangerous than > > a naive DBA might think. (0) has some attraction but really CATUPDATE > > is one more concept than we need. > > I agree with #1 on this. > #1 makes sense to me as well. The current version of the patch takes this approach. Also, I have verified against 'master' as of c6ee39b. Thanks, Adam -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com
Re: [HACKERS] MD5 authentication needs help
Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > Stephen Frost wrote: > > * Josh Berkus (j...@agliodbs.com) wrote: > > > > 3) Using the user name for the MD5 storage salt allows the MD5 stored > > > > hash to be used on a different cluster if the user used the same > > > > password. > > > > > > This is a feature as well as a bug. For example, pgBouncer relies on > > > this aspect of md5 auth. > > > > It's not a feature and pgBouncer could be made to not rely on this. > Perhaps one of the requirements of a new auth method should be to allow > middlemen such as connection poolers. It's been over two years since I > had a look, but IIRC pgbouncer had the very ugly requirement of its own > copy of user/passwords in a file, and of course you had to update it > separately if you changed the password in the server. We need to make > it possible for it not to require any such thing. If we go this direction, we've got to be *very* careful that it's only when the admin enables it. Man-in-the-middle attacks are quite real and you're essentially asking that we support them intentionally. I agree that we want to support connection poolers but they have an inherent MITM profile. Note that this is also something which is up to the pooling system and which we can't control. A good example is Kerberos. Kerberos has had a way for authentication to be proxied for a long time (with some controls to say which principals are allowed to be proxied, and which systems are allowed to proxy on behalf of other principals), but pgbouncer doesn't support that even though it'd eliminate the need for it to have a user / password file. Also, I don't expect we're going to remove md5 any time soon and, frankly, people using pgbouncer probably aren't worried about the issues which exist with that mechanism and care much more about performance, as it doesn't even support TLS.. Thanks! Stephen
Re: [HACKERS] Clamping result row number of joins.
* Tom Lane (t...@sss.pgh.pa.us) wrote: > BTW, is that JOIN (VALUES(...)) thing common in applications, or did you > just use it to make a compact example? If it were something worth > optimizing, it seems like we could teach the planner to "pull up VALUES" > in the same way that it flattens sub-selects. I'm not sure if this is > worth the trouble or not, though. I've certainly seen and used values() constructs in joins for a variety of reasons and I do think it'd be worthwhile for the planner to know how to pull up a VALUES construct. Would that be a lot of effort, either code-wise or runtime-wise? My gut feeling is that it wouldn't be, but you're clearly in a much better position to determine that. Thanks! Stephen
Re: [HACKERS] Clamping result row number of joins.
Kyotaro HORIGUCHI writes: > Hello, I had a report that a query gets weird estimated row > numbers and it makes subsequent plans wrong. > Although the detailed explanation is shown later in this mail, > the problem here was that a query like the following makes a path > with an apparently wrong row number. > EXPLAIN SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t > JOIN (VALUES (1)) tt(x) ON tt.x = t.a; Hm, the problem evidently is that we get a default selectivity estimate for the ON condition. I think a proper fix for this would involve teaching eqjoinsel (and ideally other join selectivity functions) how to drill down into appendrels and combine estimates for the child relations. > I suppose that join nodes can safely clamp the number of their > result rows by the product of the row counts of the two underlying > paths. And if that is correct, the attached patch can correct the > estimation like this. Unfortunately, this patch is completely horrid, because it gives an unfair advantage to the parameterized-nestloop plan. (The hacks in the other two functions are no-ops, because only a nestloop plan will have a parameterized path for the appendrel.) What's more, it's only a cosmetic fix: it won't correct the planner's actual idea of the joinrel size, which means it won't have an impact on planning any additional levels of joining. We really need to fix this on the selectivity-estimate side. BTW, is that JOIN (VALUES(...)) thing common in applications, or did you just use it to make a compact example? If it were something worth optimizing, it seems like we could teach the planner to "pull up VALUES" in the same way that it flattens sub-selects. I'm not sure if this is worth the trouble or not, though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
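To make the "pull up VALUES" idea concrete, here is a hand-derived sketch of the transformation under discussion (this is not something the planner does today): a one-row VALUES list joined on equality is equivalent to substituting the constant into the outer query, which would give the selectivity code a plain restriction clause instead of a join against a relation with no statistics.

    -- the query from the report: a join against a one-row VALUES list
    SELECT a
    FROM (SELECT a FROM t1 UNION ALL SELECT 0) t
    JOIN (VALUES (1)) tt(x) ON tt.x = t.a;

    -- what pulling up the VALUES would effectively reduce it to
    SELECT a
    FROM (SELECT a FROM t1 UNION ALL SELECT 0) t
    WHERE t.a = 1;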
Re: [HACKERS] MD5 authentication needs help
Stephen Frost wrote: > * Josh Berkus (j...@agliodbs.com) wrote: > > > 3) Using the user name for the MD5 storage salt allows the MD5 stored > > > hash to be used on a different cluster if the user used the same > > > password. > > > > This is a feature as well as a bug. For example, pgBouncer relies on > > this aspect of md5 auth. > > It's not a feature and pgBouncer could be made to not rely on this. Perhaps one of the requirements of a new auth method should be to allow middlemen such as connection poolers. It's been over two years since I had a look, but IIRC pgbouncer had the very ugly requirement of its own copy of user/passwords in a file, and of course you had to update it separately if you changed the password in the server. We need to make it possible for it not to require any such thing. -- Álvaro Herrera   http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add transforms feature
Hi, I am checking this patch, but it is still broken. Regards, Pavel 2015-02-13 8:14 GMT+01:00 Michael Paquier : > > > On Mon, Dec 22, 2014 at 12:19 PM, Michael Paquier < > michael.paqu...@gmail.com> wrote: > >> On Mon, Dec 15, 2014 at 3:10 PM, Peter Eisentraut >> wrote: >> > fixed >> This patch needs a rebase, it does not apply correctly in a couple of >> places on latest HEAD (699300a): >> ./src/include/catalog/catversion.h.rej >> ./src/include/catalog/pg_proc.h.rej >> ./src/pl/plpython/plpy_procedure.c.rej >> > > I moved this patch to 2015-02 to not lose track of it and because it did > not receive many reviews... > -- > Michael >
Re: [HACKERS] File based Incremental backup v8
Hi Robert, 2015-03-06 3:10 GMT+11:00 Robert Haas : > But I agree with Fujii to the extent that I see little value in > committing this patch in the form proposed. Being smart enough to use > the LSN to identify changed blocks, but then sending the entirety of > every file anyway because you don't want to go to the trouble of > figuring out how to revise the wire protocol to identify the > individual blocks being sent and write the tools to reconstruct a full > backup based on that data, does not seem like enough of a win. I believe the main point is to look at it from a user interface point of view. If/when we switch to block-level incremental support, this will be completely transparent to the end user, even if we start with a file-level approach with LSN check. The win is already determined by the average space/time gained by users of VLDB with a good chunk of read-only data. Our Barman users with incremental backup (released recently - its algorithm can be compared to that of the file-level backup proposed by Marco) can benefit, on average, from a data deduplication ratio ranging between 50 and 70% of the cluster size. A tangible example is depicted here, with Navionics saving 8.2TB a week thanks to this approach (and 17 hours instead of 50 for backup time): http://blog.2ndquadrant.com/incremental-backup-barman-1-4-0/ However, even smaller databases will benefit. It is clear that very small databases as well as frequently updated ones won't be interested in incremental backup, but that has never been the use case for this feature. I believe that if we still think that this approach is not worth it, we are making a big mistake. The way I see it, this patch follows an agile approach and it is an important step towards incremental backup on a block basis. > As Fujii says, if we ship this patch as written, people will just keep > using the timestamp-based approach anyway. I think that allowing users to back up in an incremental way through streaming replication (even though based on files) will give more flexibility to system and database administrators for their disaster recovery solutions. Thanks, Gabriele
Re: [HACKERS] Rethinking pg_dump's function sorting code
Noah Misch writes: > Comparing argument type names sounds fine. Comparing argument type OID does > not offer enough to justify the loss of cross-cluster sort equivalence. Fair enough. > So as to stably compare f(nsp0.t) to f(nsp1.t), this should also compare the > dobj.namespace name. Ah, good point. Will fix. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
* Dean Rasheed (dean.a.rash...@gmail.com) wrote: > On 26 February 2015 at 09:50, Dean Rasheed wrote: > > On 26 February 2015 at 05:43, Stephen Frost wrote: > >> I wonder if there are some documentation updates which need to be done > >> for this also? I'm planning to look as I vaguely recall mentioning the > >> ordering of operations somewhere along the way. > I couldn't find any mention of the timing of the check in the existing > documentation, although it does vaguely imply that the check is done > before inserting any new data. There is an existing paragraph > describing the timing of USING conditions, so I added a new paragraph > immediately after that to explain when CHECK expressions are enforced, > since that seemed like the best place for it. Ah, ok, I must have been thinking about the USING discussion. Thanks for adding the paragraph about the CHECK timing. > >> I also addressed the bitrot from the column-priv leak patch. Would be > >> great to have you take a look at the latest and let me know if you have > >> any further comments or suggestions. > I looked it over again, and I'm happy with these updates, except there > was a missing "break" in the switch statement in > ExecWithCheckOptions(). Oh, thanks for catching that! > Here's an updated patch. Excellent, will review. Thanks again! Stephen
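As a rough illustration of the timing being documented here (table, role, and policy names are invented; this is a sketch of the intended behavior, not the patch's regression tests): with the patch, a policy's WITH CHECK expression is evaluated against each new row before it is written, so a violating INSERT fails without inserting anything.

    CREATE TABLE accounts (id int, owner text);
    ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
    CREATE POLICY accounts_policy ON accounts
        USING (owner = current_user)        -- which rows are visible
        WITH CHECK (owner = current_user);  -- which rows may be written

    -- connected as non-owner role "alice":
    INSERT INTO accounts VALUES (1, 'alice');  -- passes the CHECK
    INSERT INTO accounts VALUES (2, 'bob');    -- rejected before any row is written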
Re: [HACKERS] MD5 authentication needs help
Greg, * Greg Stark (st...@mit.edu) wrote: > Locked accounts are a terrible, terrible idea. All they do is hand attackers > an easy DOS vulnerability. They're pure security theatre if your > authentication isn't vulnerable to brute force attacks and an unreliable > band-aid if they are. For starters, our authentication *is* vulnerable to brute force attacks (as is any password-based system, and I doubt we're going to completely drop support for them regardless of what else we do here), and, second, the account lock-out capability is still required in NIST 800-53 rev4 and is covered by AC-7. http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-53r4.pdf AC-7 does address the DOS risk and allows organizations to unlock the account after an organization-specified delay. I've been able to address that in the past by using Kerberos for PG instead and implementing the lock-out and other requirements in this area that PG doesn't support that way, but that isn't available in all situations, which leads to far worse solutions having to be used to meet these requirements (sorry, but hacking up a PAM-based approach which uses cracklib and pam_deny is *really* ugly). > Having dealt with mechanisms for locking accounts in other databases, they're > much more complicated than they appear. You need to deal with different > requirements for different users, have multiple knobs for how it triggers > and resolves, have tools for auditing the connection attempts to determine > if they're legitimate and identify where the incorrect attempts are coming > from, and so on. And all that accomplishes in the best case scenario is > having lots of busy-work support requests responding to locked accounts > and, in the worst case scenario, upgrading minor issues into major service > outages. I agree that they're complicated and that auditing is another necessary component that we don't currently have. We are woefully behind in these areas and should certainly look to what others have done and learned over the past 10 years that these issues have more-or-less been ignored, but I don't believe we can or should continue to ignore them, as it makes PG unnecessarily more difficult to use in many areas. Thanks, Stephen
Re: [HACKERS] MD5 authentication needs help
* Albe Laurenz (laurenz.a...@wien.gv.at) wrote: > Stephen Frost wrote: > > Yes, it certainly was. I think Bruce was thinking that we could simply > > hash what goes on to disk with an additional salt that's stored, but > > that wouldn't actually work without requiring a change to the wireline > > protocol, which is the basis of this entire line of discussion, in my > > view. > This article > https://hashcat.net/misc/postgres-pth/postgres-pth.pdf > has some ideas about how to improve the situation. This falls into the same category as some other proposed changes- it requires wireline protocol changes, which means it really isn't interesting to consider. While I'm not surprised, it's certainly unfortunate that none of these articles bother to point out what would be really useful to PG users- how they can decide which risks they want to accept by choosing the authentication method. Using 'password', while it isn't great because of the poor salt used (the username), isn't vulnerable to the 'PTH' attack, and better authentication methods are available (certificates, Kerberos, PAM, etc). Admittedly, the default is md5 for most distributions, but that's because the better auth methods require depending on external systems and distribution installers can't know if those systems have been set up or not. Thanks, Stephen
Re: [HACKERS] parallel mode and parallel contexts
On Fri, Mar 6, 2015 at 7:01 AM, Amit Kapila wrote: > Today, while testing the parallel_seqscan patch, I encountered one > intermittent issue (it hangs in the function below) and I have a question > related to that function. > > +void > +WaitForParallelWorkersToFinish(ParallelContext *pcxt) > +{ > .. > + for (;;) > + { > .. > + CHECK_FOR_INTERRUPTS(); > + for (i = 0; i < pcxt->nworkers; ++i) > + { > + if (pcxt->worker[i].error_mqh != NULL) > + { > + anyone_alive = true; > + break; > + } > + } > + > + if (!anyone_alive) > + break; > + > + WaitLatch(&MyProc->procLatch, WL_LATCH_SET, -1); > + ResetLatch(&MyProc->procLatch); > + } > > Isn't there a race condition in this function such that after it finds > that some worker is still alive and before it does WaitLatch(), the > worker completes its work and exits; in such a case, who is > going to wake the backend waiting on procLatch? It doesn't matter whether some other backend sets the process latch before we reach WaitLatch() or after we begin waiting. Either way, it's fine. It would be bad if the process could exit without setting the latch at all, though. I hope that's not the case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel mode and parallel contexts
On Sun, Feb 15, 2015 at 11:48 AM, Robert Haas wrote: > > On Fri, Feb 13, 2015 at 2:22 AM, Michael Paquier > wrote: > > On Thu, Feb 12, 2015 at 3:59 AM, Robert Haas wrote: > >> > >> We're not seeing eye to eye here yet, since I don't accept your > >> example scenario and you don't accept mine. Let's keep discussing. > >> > >> Meanwhile, here's an updated patch. > > > > A lot of cool activity is showing up here, so moved the patch to CF 2015-02. > > Perhaps, Andres, you could add yourself as a reviewer? > > Here's a new version of the patch with (a) some additional > restrictions on heavyweight locking both at the start of, and during, > parallel mode and (b) a write-up in the README explaining the > restrictions and my theory of why the handling of heavyweight locking > is safe. Hopefully this goes some way towards addressing Andres's > concerns. I've also replaced the specific (and wrong) messages about > advisory locks with a more generic message, as previously proposed; > and I've fixed at least one bug. > Today, while testing the parallel_seqscan patch, I encountered one intermittent issue (it hangs in the function below) and I have a question related to that function. +void +WaitForParallelWorkersToFinish(ParallelContext *pcxt) +{ .. + for (;;) + { .. + CHECK_FOR_INTERRUPTS(); + for (i = 0; i < pcxt->nworkers; ++i) + { + if (pcxt->worker[i].error_mqh != NULL) + { + anyone_alive = true; + break; + } + } + + if (!anyone_alive) + break; + + WaitLatch(&MyProc->procLatch, WL_LATCH_SET, -1); + ResetLatch(&MyProc->procLatch); + } Isn't there a race condition in this function such that after it finds that some worker is still alive and before it does WaitLatch(), the worker completes its work and exits; in such a case, who is going to wake the backend waiting on procLatch? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] MD5 authentication needs help
Locked accounts are a terrible, terrible idea. All they do is hand attackers an easy DOS vulnerability. They're pure security theatre if your authentication isn't vulnerable to brute force attacks and an unreliable band-aid if they are. Having dealt with mechanisms for locking accounts in other databases, they're much more complicated than they appear. You need to deal with different requirements for different users, have multiple knobs for how it triggers and resolves, have tools for auditing the connection attempts to determine if they're legitimate and identify where the incorrect attempts are coming from, and so on. And all that accomplishes in the best case scenario is having lots of busy-work support requests responding to locked accounts and, in the worst case scenario, upgrading minor issues into major service outages.
Re: [HACKERS] Join push-down support for foreign tables
Hi Kaigai-san, Hanada-san, Attached please find a patch to print the column names prefixed by the relation names. I haven't tested the patch fully. The same changes will be needed for CustomPlan node specific code. Now I am able to make sense out of the Output information postgres=# explain verbose select * from ft1 join ft2 using (val); QUERY PLAN --- --- Foreign Scan (cost=100.00..125.60 rows=2560 width=12) Output: ft1.val, ft1.val2, ft2.val2 Remote SQL: SELECT r.a_0, r.a_1, l.a_1 FROM (SELECT val, val2 FROM public.lt) l (a_0, a_1) INNER JOIN (SELECT val, val2 FROM public.lt) r (a_0, a_1) ON ((r.a_0 = l.a_0)) (3 rows) On Fri, Mar 6, 2015 at 6:41 AM, Kouhei Kaigai wrote: > > Actually val and val2 come from public.lt in "r" side, but as you say > > it's too difficult to know that from EXPLAIN output. Do you have any > > idea to make the "Output" item more readable? > > > A fundamental reason why we need to have symbolic aliases here is that > postgres_fdw has remote query in cstring form. It makes implementation > complicated to deconstruct/construct a query that is once constructed > on the underlying foreign-path level. > If ForeignScan keeps items to construct remote query in expression node > form (and construction of remote query is delayed to beginning of the > executor, probably), we will be able to construct more human readable > remote query. > > However, I don't recommend to work on this great refactoring stuff > within the scope of join push-down support project. > > Thanks, > -- > NEC OSS Promotion Center / PG-Strom Project > KaiGai Kohei > > > > -Original Message- > > From: pgsql-hackers-ow...@postgresql.org > > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Shigeru Hanada > > Sent: Thursday, March 05, 2015 10:00 PM > > To: Ashutosh Bapat > > Cc: Kaigai Kouhei(海外 浩平); Robert Haas; PostgreSQL-development > > Subject: Re: [HACKERS] Join push-down support for foreign tables > > > > Hi Ashutosh, thanks for the review. > > > > 2015-03-04 19:17 GMT+09:00 Ashutosh Bapat < > ashutosh.ba...@enterprisedb.com>: > > > In create_foreignscan_path() we have lines like - > > > 1587 pathnode->path.param_info = get_baserel_parampathinfo(root, > rel, > > > 1588 > > > required_outer); > > > Now, that the same function is being used for creating foreign scan > paths > > > for joins, we should be calling get_joinrel_parampathinfo() on a join > rel > > > and get_baserel_parampathinfo() on base rel. > > > > Got it. Please let me check the difference. > > > > > > > > The patch seems to handle all the restriction clauses in the same way. > There > > > are two kinds of restriction clauses - a. join quals (specified using > ON > > > clause; optimizer might move them to the other class if that doesn't > affect > > > correctness) and b. quals on join relation (specified in the WHERE > clause, > > > optimizer might move them to the other class if that doesn't affect > > > correctness). The quals in "a" are applied while the join is being > computed > > > whereas those in "b" are applied after the join is computed. 
For > example, > > > postgres=# select * from lt; > > > val | val2 > > > -+-- > > >1 |2 > > >1 |3 > > > (2 rows) > > > > > > postgres=# select * from lt2; > > > val | val2 > > > -+-- > > >1 |2 > > > (1 row) > > > > > > postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2); > > > val | val2 | val | val2 > > > -+--+-+-- > > >1 |2 | 1 |2 > > >1 |3 | | > > > (2 rows) > > > > > > postgres=# select * from lt left join lt2 on (true) where (lt.val2 = > > > lt2.val2); > > > val | val2 | val | val2 > > > -+--+-+-- > > >1 |2 | 1 |2 > > > (1 row) > > > > > > The difference between these two kinds is evident in case of outer > joins, > > > for inner join optimizer puts all of them in class "b". The remote > query > > > sent to the foreign server has all those in ON clause. Consider foreign > > > tables ft1 and ft2 pointing to local tables on the same server. > > > postgres=# \d ft1 > > > Foreign table "public.ft1" > > > Column | Type | Modifiers | FDW Options > > > +-+---+- > > > val| integer | | > > > val2 | integer | | > > > Server: loopback > > > FDW Options: (table_name 'lt') > > > > > > postgres=# \d ft2 > > > Foreign table "public.ft2" > > > Column | Type | Modifiers | FDW Options > > > +-+---+- > > > val| integer | | > > > val2 | integer | | > > > Server: loopback > > > FDW Options: (table_name 'lt2') > > > > > > postgres=# explain verbose select * from ft1 left join ft2 on > (ft1.val2 = > > > ft2.val2) where ft1.val + ft2.val > ft1.val
[HACKERS] extend pgbench expressions with functions
This patch extends pgbench expression with functions. Currently only one "abs" function is added. The point is rather to bootstrap the infrastructure for other functions (such as hash, random variants...) to be added later. -- Fabien.diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y index 243c6b9..fd396bd 100644 --- a/contrib/pgbench/exprparse.y +++ b/contrib/pgbench/exprparse.y @@ -20,6 +20,7 @@ static PgBenchExpr *make_integer_constant(int64 ival); static PgBenchExpr *make_variable(char *varname); static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr); +static PgBenchExpr *make_func(const char *fname, PgBenchExpr *arg1); %} @@ -35,8 +36,8 @@ static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr, %type expr %type INTEGER -%type VARIABLE -%token INTEGER VARIABLE +%type VARIABLE FUNCTION +%token INTEGER VARIABLE FUNCTION %token CHAR_ERROR /* never used, will raise a syntax error */ %left '+' '-' @@ -57,6 +58,7 @@ expr: '(' expr ')' { $$ = $2; } | expr '%' expr { $$ = make_op('%', $1, $3); } | INTEGER{ $$ = make_integer_constant($1); } | VARIABLE { $$ = make_variable($1); } + | FUNCTION '(' expr ')' { $$ = make_func($1, $3); pg_free($1); } ; %% @@ -93,4 +95,32 @@ make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr) return expr; } +static struct { + char * fname; + int nargs; + PgBenchFunction tag; +} PGBENCH_FUNCTIONS[] = { + { "abs", 1, PGBENCH_ABS } +}; + +static PgBenchExpr * +make_func(const char * fname, PgBenchExpr *arg1) +{ + PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr)); + int nfunctions = sizeof(PGBENCH_FUNCTIONS) / sizeof(PGBENCH_FUNCTIONS[0]); + int i; + + expr->etype = ENODE_FUNCTION; + expr->u.function.function = PGBENCH_NONE; + + for (i = 0; i < nfunctions; i++) + if (pg_strcasecmp(fname, PGBENCH_FUNCTIONS[i].fname) == 0) + expr->u.function.function = PGBENCH_FUNCTIONS[i].tag; + + Assert(expr->u.function.function != PGBENCH_NONE); + + expr->u.function.arg1 = arg1; + return expr; +} + #include "exprscan.c" diff --git a/contrib/pgbench/exprscan.l b/contrib/pgbench/exprscan.l index 4c9229c..a276444 100644 --- a/contrib/pgbench/exprscan.l +++ b/contrib/pgbench/exprscan.l @@ -46,6 +46,7 @@ space [ \t\r\f] ")"{ yycol += yyleng; return ')'; } :[a-zA-Z0-9_]+ { yycol += yyleng; yylval.str = pg_strdup(yytext + 1); return VARIABLE; } [0-9]+ { yycol += yyleng; yylval.ival = strtoint64(yytext); return INTEGER; } +[a-zA-Z]+ { yycol += yyleng; yylval.str = pg_strdup(yytext); return FUNCTION; } [\n] { yycol = 0; yyline++; } {space} { yycol += yyleng; /* ignore */ } diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c index 706fdf5..1e9604d 100644 --- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c @@ -959,6 +959,26 @@ evaluateExpr(CState *st, PgBenchExpr *expr, int64 *retval) return false; } + case ENODE_FUNCTION: + { +switch (expr->u.function.function) +{ + case PGBENCH_ABS: + { + int64 arg1; + if (!evaluateExpr(st, expr->u.function.arg1, &arg1)) + return false; + + *retval = arg1 > 0? 
arg1: -arg1; + return true; + } + default: + fprintf(stderr, "bad function (%d)", +expr->u.function.function); + return false; +} + } + default: break; } diff --git a/contrib/pgbench/pgbench.h b/contrib/pgbench/pgbench.h index 128bf11..8697c7b 100644 --- a/contrib/pgbench/pgbench.h +++ b/contrib/pgbench/pgbench.h @@ -15,12 +15,19 @@ typedef enum PgBenchExprType { ENODE_INTEGER_CONSTANT, ENODE_VARIABLE, - ENODE_OPERATOR + ENODE_OPERATOR, + ENODE_FUNCTION } PgBenchExprType; struct PgBenchExpr; typedef struct PgBenchExpr PgBenchExpr; +typedef enum PgBenchFunction +{ + PGBENCH_NONE, + PGBENCH_ABS +} PgBenchFunction; + struct PgBenchExpr { PgBenchExprType etype; @@ -40,6 +47,11 @@ struct PgBenchExpr PgBenchExpr *lexpr; PgBenchExpr *rexpr; } operator; + struct + { + int function; + PgBenchExpr *arg1; + } function; } u; }; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
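Assuming the patch applies, a custom script along these lines should exercise the new function (variable names are made up; note that the grammar above has no unary minus, so a negative value has to be built with the binary operator):

    \set ival 0 - 42
    \set aval abs(:ival)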
Re: [HACKERS] MD5 authentication needs help
Stephen Frost wrote: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> Bruce Momjian writes: >>> Let me update my list of possible improvements: >> >>> 1) MD5 makes users feel uneasy (though our usage is mostly safe) >> >>> 2) The per-session salt sent to the client is only 32 bits, meaning >>> that it is possible to replay an observed MD5 hash in ~16k connection >>> attempts. >> >>> 3) Using the user name for the MD5 storage salt allows the MD5 stored >>> hash to be used on a different cluster if the user used the same >>> password. >> >>> 4) Using the user name for the MD5 storage salt allows the MD5 stored >>> hash to be used on the _same_ cluster. >> >>> 5) Using the user name for the MD5 storage salt causes the renaming of >>> a user to break the stored password. >> >> What happened to "possession of the contents of pg_authid is sufficient >> to log in"? I thought fixing that was one of the objectives here. > > Yes, it certainly was. I think Bruce was thinking that we could simply > hash what goes on to disk with an additional salt that's stored, but > that wouldn't actually work without requiring a change to the wireline > protocol, which is the basis of this entire line of discussion, in my > view. This article https://hashcat.net/misc/postgres-pth/postgres-pth.pdf has some ideas about how to improve the situation. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
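To spell out why the user-name salt and the pg_authid concern go together: the stored value is just the MD5 of the password concatenated with the user name, so it is identical on every cluster where the same user/password pair exists, and because the wire protocol only proves knowledge of that stored value (re-hashed with the 4-byte session salt), the stored value itself is a password equivalent. The construction can be seen from SQL (the user name and password here are arbitrary examples):

    -- what ends up in pg_authid.rolpassword for user "alice" with password "secret";
    -- the user name is the only salt, so the value is the same on every cluster
    SELECT 'md5' || md5('secret' || 'alice');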
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
On 26 February 2015 at 09:50, Dean Rasheed wrote: > On 26 February 2015 at 05:43, Stephen Frost wrote: >> I wonder if there are some documentation updates which need to be done >> for this also? I'm planning to look as I vauguely recall mentioning the >> ordering of operations somewhere along the way. >> I couldn't find any mention of the timing of the check in the existing documentation, although it does vaguely imply that the check is done before inserting any new data. There is an existing paragraph describing the timing of USING conditions, so I added a new paragraph immediately after that to explain when CHECK expressions are enforced, since that seemed like the best place for it. >> I also addressed the bitrot from the column-priv leak patch. Would be >> great to have you take a look at the latest and let me know if you have >> any further comments or suggestions. I looked it over again, and I'm happy with these updates, except there was a missing "break" in the switch statement in ExecWithCheckOptions(). Here's an updated patch. Regards, Dean diff --git a/doc/src/sgml/ref/create_policy.sgml b/doc/src/sgml/ref/create_policy.sgml new file mode 100644 index 868a6c1..49eaadc *** a/doc/src/sgml/ref/create_policy.sgml --- b/doc/src/sgml/ref/create_policy.sgml *** CREATE POLICY viewname), ! val_desc ? errdetail("Failing row contains %s.", val_desc) : ! 0)); } } } --- 1719,1747 modifiedCols, 64); ! switch (wco->kind) ! { ! case WCO_VIEW_CHECK: ! ereport(ERROR, ! (errcode(ERRCODE_WITH_CHECK_OPTION_VIOLATION), ! errmsg("new row violates WITH CHECK OPTION for \"%s\"", ! wco->relname), ! val_desc ? errdetail("Failing row contains %s.", ! val_desc) : 0)); ! break; ! case WCO_RLS_INSERT_CHECK: ! case WCO_RLS_UPDATE_CHECK: ! ereport(ERROR, ! (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), ! errmsg("new row violates row level security policy for \"%s\"", ! wco->relname), ! val_desc ? errdetail("Failing row contains %s.", ! val_desc) : 0)); ! break; ! default: ! elog(ERROR, "unrecognized WCO kind: %u", wco->kind); ! break; ! } } } } diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c new file mode 100644 index f96fb24..be879a4 *** a/src/backend/executor/nodeModifyTable.c --- b/src/backend/executor/nodeModifyTable.c *** ExecInsert(TupleTableSlot *slot, *** 253,258 --- 253,265 tuple->t_tableOid = RelationGetRelid(resultRelationDesc); /* + * Check any RLS INSERT WITH CHECK policies + */ + if (resultRelInfo->ri_WithCheckOptions != NIL) + ExecWithCheckOptions(WCO_RLS_INSERT_CHECK, + resultRelInfo, slot, estate); + + /* * Check the constraints of the tuple */ if (resultRelationDesc->rd_att->constr) *** ExecInsert(TupleTableSlot *slot, *** 287,295 list_free(recheckIndexes); ! /* Check any WITH CHECK OPTION constraints */ if (resultRelInfo->ri_WithCheckOptions != NIL) ! ExecWithCheckOptions(resultRelInfo, slot, estate); /* Process RETURNING if present */ if (resultRelInfo->ri_projectReturning) --- 294,302 list_free(recheckIndexes); ! /* Check any WITH CHECK OPTION constraints from parent views */ if (resultRelInfo->ri_WithCheckOptions != NIL) ! ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate); /* Process RETURNING if present */ if (resultRelInfo->ri_projectReturning) *** ExecUpdate(ItemPointer tupleid, *** 653,667 tuple->t_tableOid = RelationGetRelid(resultRelationDesc); /* ! * Check the constraints of the tuple * * If we generate a new candidate tuple after EvalPlanQual testing, we ! * must loop back here and recheck constraints. (We don't need to ! 
* redo triggers, however. If there are any BEFORE triggers then ! * trigger.c will have done heap_lock_tuple to lock the correct tuple, ! * so there's no need to do them again.) */ lreplace:; if (resultRelationDesc->rd_att->constr) ExecConstraints(resultRelInfo, slot, estate); --- 660,681 tuple->t_tableOid = RelationGetRelid(resultRelationDesc); /* ! * Check any RLS UPDATE WITH CHECK policies * * If we generate a new candidate tuple after EvalPlanQual testing, we ! * must loop back here and recheck any RLS policies and constraints. ! * (We don't need to redo triggers, however. If there are any BEFORE ! * triggers then trigger.c will have done heap_lock_tuple to lock the ! * correct tuple, so there's no need to do them again.) */ lreplace:; + if (resultRelInfo->ri_WithCheckOptions != NIL) + ExecWithCheckOptions(WCO_RLS_UPDATE_CHECK, + resultRelInfo, slot, estate); + + /* + * Check the constraints of the tuple + *
[HACKERS] Clamping result row number of joins.
Hello, I had a report that a query gets weird estimated row numbers and it makes subsequent plans wrong. Although the detailed explanation is shown later in this mail, the problem here was that a query like the following makes a path with an apparently wrong row number. EXPLAIN SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t JOIN (VALUES (1)) tt(x) ON tt.x = t.a; 0> Nested Loop (cost=0.43..8.51 rows=68732 width=4) 1> -> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=4) 2> -> Append (cost=0.43..8.48 rows=2 width=4) 3> -> Index Only Scan using i_t1_a on t1 4> (cost=0.43..8.46 rows=1 width=4) 5> Index Cond: (a = "*VALUES*".column1) 6> -> Subquery Scan on "*SELECT* 2" (cost=0.00..0.02 rows=1 width=4) 7> Filter: ("*VALUES*".column1 = "*SELECT* 2"."?column?") 8> -> Result (cost=0.00..0.01 rows=1 width=0) The Nested Loop at line 0 says that it emits 68732 rows even though the underlying nodes, the Values scan at line 1 and the Append at line 2, give 1 and 2 rows respectively as their results. This query actually returns only 1 row. The estimate is plainly wrong and causes the plans above it to go wrong. This is caused by the Append node, which doesn't have statistics, so eqjoinsel takes 0.5 as the default selectivity for the nested loop. Even though it is difficult to get statistics for an Append node, the nested loop could get a saner row number if it didn't ignore the parameterized path selected for the scan on t1. I suppose that join nodes can safely clamp the number of their result rows by the product of the row counts of the two underlying paths. And if that is correct, the attached patch can correct the estimation like this. > Nested Loop (cost=0.43..16.51 rows=2 width=4) >-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=4) >-> Append (cost=0.43..16.48 rows=2 width=4) > -> Index Only Scan using i_t1_a on t1 >(cost=0.43..16.45 rows=1 width=4) >Index Cond: (a = "*VALUES*".column1) > -> Subquery Scan on "*SELECT* 2" (cost=0.00..0.02 rows=1 width=4) >Filter: ("*VALUES*".column1 = "*SELECT* 2"."?column?") >-> Result (cost=0.00..0.01 rows=1 width=0) Is it correct? Does it have any bad side effects? And is this applicable for 9.5? The detailed test environment is described below. Before this fix, the top-level join is a hash join because the inner path says that it will give 68732 rows (but only 1 actually). After this fix, the outer path declares that it gives 2 rows, so the top-level join becomes a nested loop and the inner path becomes an index scan. 
=== CREATE TABLE t1 (a int); INSERT INTO t1 (SELECT (a%2)*50 + a / 2 FROM generate_series(0, 100) a); CREATE INDEX i_t1_a ON t1 (a); CREATE TABLE t2 AS SELECT * from t1; ANALYZE t1; ANALYZE t2; -- emulating the problematic situation SET join_collapse_limit TO 1; SET random_page_cost TO 8.0; SET seq_page_cost TO 0.5; EXPLAIN SELECT tt2.a FROM (SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t JOIN (VALUES (1)) tt(x) ON tt.x = t.a) tt2 JOIN t2 ON (tt2.a = t2.a); The plan before fix Hash Join (cost=30832.46..36230.60 rows=68732 width=4) (actual time=1258.880..1260.519 rows=1 loops=1) Hash Cond: ("*VALUES*".column1 = t2.a) -> Nested Loop (cost=0.43..8.51 rows=68732 width=8) (actual time=0.071..0.104 rows=1 loops=1) -> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.003 rows=1 loops=1) -> Append (cost=0.43..8.48 rows=2 width=4) (actual time=0.063..0.093 rows=1 loops=1) -> Index Only Scan using i_t1_a on t1 (cost=0.43..8.46 rows=1 width=4) (actual time=0.059..0.075 rows=1 loops=1) Index Cond: (a = "*VALUES*".column1) Heap Fetches: 2 -> Subquery Scan on "*SELECT* 2" (cost=0.00..0.02 rows=1 width=4) (actual time=0.010..0.010 rows=0 loops=1) Filter: ("*VALUES*".column1 = "*SELECT* 2"."?column?") Rows Removed by Filter: 1 -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1) -> Hash (cost=14425.01..14425.01 rows=101 width=4) (actual time=1250.274..1250.274 rows=101 loops=1) Buckets: 131072 Batches: 16 Memory Usage: 3227kB -> Seq Scan on t2 (cost=0.00..14425.01 rows=101 width=4) (actual time=0.021..608.245 rows=101 loops=1) Planning time: 0.319 ms Execution time: 1260.792 ms The plan after fix Nested Loop (cost=0.86..17.43 rows=2 width=4) (actual t
Re: [HACKERS] Rethinking pg_dump's function sorting code
On Thu, Mar 05, 2015 at 07:28:33PM -0500, Tom Lane wrote: > In bug #12832 Marko Tiikkaja points out that commit > 7b583b20b1c95acb621c71251150beef958bb603 created a rather unnecessary > dump failure hazard, since it applies pg_get_function_identity_arguments() > to every function in the database, even those that won't get dumped. > I think we should fix this by getting rid of pg_dump's use of that > function altogether. A low-tech way to sort functions of identical names > would be to compare argument type OIDs, as in the attached simple patch. > If people feel it's important to avoid depending on numerical OID order, > we could instead look up type names locally and compare them as in the > attached less-simple patch. Comparing argument type names sounds fine. Comparing argument type OID does not offer enough to justify the loss of cross-cluster sort equivalence. > Neither patch will exactly preserve the sort behavior of the current > code, but I don't think that's important. Agreed. > --- 291,313 > { > FuncInfo *fobj1 = *(FuncInfo *const *) p1; > FuncInfo *fobj2 = *(FuncInfo *const *) p2; > + int i; > > cmpval = fobj1->nargs - fobj2->nargs; > if (cmpval != 0) > return cmpval; > ! for (i = 0; i < fobj1->nargs; i++) > ! { > ! TypeInfo *argtype1 = > findTypeByOid(fobj1->argtypes[i]); > ! TypeInfo *argtype2 = > findTypeByOid(fobj2->argtypes[i]); > ! > ! if (argtype1 && argtype2) > ! { > ! cmpval = strcmp(argtype1->dobj.name, > argtype2->dobj.name); > ! if (cmpval != 0) > ! return cmpval; > ! } > ! } So as to stably compare f(nsp0.t) to f(nsp1.t), this should also compare the dobj.namespace name. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
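A minimal example of the ambiguity Noah is pointing out (schema, type, and function names invented for illustration): the two functions below agree on name, argument count, and argument type name, and differ only in the argument type's schema, so a comparator that ignores dobj.namespace cannot order them stably.

    CREATE SCHEMA nsp0;
    CREATE SCHEMA nsp1;
    CREATE TYPE nsp0.t AS (x int);
    CREATE TYPE nsp1.t AS (x int);
    -- same function name, same argument type name "t":
    CREATE FUNCTION f(nsp0.t) RETURNS int LANGUAGE sql AS 'SELECT 0';
    CREATE FUNCTION f(nsp1.t) RETURNS int LANGUAGE sql AS 'SELECT 1';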
Re: [HACKERS] Rethinking pg_dump's function sorting code
On 2015-03-06 01:28, Tom Lane wrote: In bug #12832 Marko Tiikkaja points out that commit 7b583b20b1c95acb621c71251150beef958bb603 created a rather unnecessary dump failure hazard, since it applies pg_get_function_identity_arguments() to every function in the database, even those that won't get dumped. I think we should fix this by getting rid of pg_dump's use of that function altogether. A low-tech way to sort functions of identical names would be to compare argument type OIDs, as in the attached simple patch. If people feel it's important to avoid depending on numerical OID order, we could instead look up type names locally and compare them as in the attached less-simple patch. (Both patches neglect reverting the data collection aspects of the prior commit, since that's mechanical; the only interesting part is what we'll do to sort.) Neither patch will exactly preserve the sort behavior of the current code, but I don't think that's important. Comments? I have my own cow in this ditch, but I would much prefer the sort to be done based on the type name. That way the order is still consistent between two databases where the objects were created in a different order. .m -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers