Re: [HACKERS] Event triggers + table partitioning cause server crash in current master

2017-05-15 Thread Amit Langote
On 2017/05/16 1:18, Mark Dilger wrote:
> 
>> On May 15, 2017, at 6:49 AM, Mark Dilger  wrote:
>>
>> I can confirm that this fixes the crash that I was seeing.  I have read
>> through the patch briefly, but will give it a more thorough review in the
>> next few hours.

Thanks a lot for the review.

> My only negative comment is that your patch follows a precedent of using
> event trigger commands named for AlterTable for operations other than
> an ALTER TABLE command.  You clearly are not starting the precedent
> here, as they are already used for IndexStmt and ViewStmt processing, but
> expanding the precedent by using them in DefineRelation, where they were
> not used before, seems to move in the wrong direction.  Either the functions
> should be renamed to something more general to avoid implying that the
> function is ALTER TABLE specific, or different functions should be defined
> and used, or ...?  I am uncertain what the right solution would be.

Hmm.  I think an alternative to doing what I did in my patch is to get rid
of calling AlterTableInternal() from DefineRelation() altogether.
Instead, the required ALTER TABLE commands can be added during the
transform phase, which will create a new AlterTableStmt and add it to the
list of things to be done after the relation is defined.  That way,
ProcessUtilitySlow() takes care of doing the event trigger stuff instead
of having to worry about it ourselves.  Attached updated patch does that.

PS: actually, we're talking elsewhere [1] about getting rid of adding
explicit NOT NULL constraints on range partition keys altogether, so even
if we apply this patch to fix the bug, there is a good chance that the
code will go away (and with it, of course, the bug)

I divided the patch into 2 parts.

0001 fixes the bug you reported (though not as directly as my previous
 patch did)

0002 fixes a bug I found while working on this, whereby the contents of
 CreateStmt become invalid when creating partitions, which could
 cause a crash in certain cases

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/8e2dd63d-c6fb-bb74-3c2b-ed6d63629c9d%40lab.ntt.co.jp
From 31423c4dc43587ef5221bf0ae87484199487da40 Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 15 May 2017 14:00:54 +0900
Subject: [PATCH 1/2] Fix to make partitioned tables work with event triggers

When creating range partitioned tables, AlterTableInternal is
called which expects the event trigger context info to be set by
EventTriggerAlterTableStart().  Lacking that, a crash occurred
when creating range partitioned tables if event triggers have
been defined in the database.  Instead of fixing it by adding
appropriate event trigger initialization calls in DefineRelation(),
rearrange code such that AlterTableInternal() is no longer directly
used to create NOT NULL constraints on partition key columns.
Now, transformCreateStmt() calls transformPartitionKey() which adds
an AlterTableStmt to add the necessary constraints.  Event triggers
will be taken care of properly when that statement is later
executed by ProcessUtilitySlow().

Reported by Mark Dilger
Report: https://postgr.es/m/4A9B8269-9771-4FB7-BFA3-84249800917D%40gmail.com
---
 src/backend/commands/tablecmds.c| 31 +
 src/backend/parser/parse_utilcmd.c  | 70 -
 src/test/regress/expected/event_trigger.out | 32 +
 src/test/regress/sql/event_trigger.sql  | 35 +++
 4 files changed, 128 insertions(+), 40 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e259378051..e4d2bfb6b0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -805,13 +805,11 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 	if (stmt->partspec)
 	{
 		char		strategy;
-		int			partnatts,
-	i;
+		int			partnatts;
 		AttrNumber	partattrs[PARTITION_MAX_KEYS];
 		Oid			partopclass[PARTITION_MAX_KEYS];
 		Oid			partcollation[PARTITION_MAX_KEYS];
 		List	   *partexprs = NIL;
-		List	   *cmds = NIL;
 
 		/*
 		 * We need to transform the raw parsetrees corresponding to partition
@@ -828,33 +826,6 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 		partnatts = list_length(stmt->partspec->partParams);
 		StorePartitionKey(rel, strategy, partnatts, partattrs, partexprs,
 		  partopclass, partcollation);
-
-		/* Force key columns to be NOT NULL when using range partitioning */
-		if (strategy == PARTITION_STRATEGY_RANGE)
-		{
-			for (i = 0; i < partnatts; i++)
-			{
-				AttrNumber	partattno = partattrs[i];
-				Form_pg_attribute attform = descriptor->attrs[partattno - 1];
-
-				if (partattno != 0 && !attform->attnotnull)
-				{
-					/* Add a subcommand to make this one NOT NULL */
-					AlterTableCmd *cmd = makeNode(AlterTableCmd);
-
-					cmd->subtype = AT_SetNotNull;
-					cmd->name = pstrdup(NameStr(attform->attname));
-					cmds = lappend(cmds, cmd);

Re: [HACKERS] [POC] hash partitioning

2017-05-15 Thread amul sul
On Mon, May 15, 2017 at 9:13 PM, Robert Haas  wrote:
> On Mon, May 15, 2017 at 6:57 AM, amul sul  wrote:
>>> Collation is only relevant for ordering, not equality.  Since hash
>>> opclasses provide only equality, not ordering, it's not relevant here.
>>> I'm not sure whether we should error out if it's specified or just
>>> silently ignore it.  Maybe an ERROR is a good idea?  But not sure.
>>>
>> IMHO, we could simply have a WARNING, and ignore collation, thoughts?
>>
>> Updated patches attached.
>
> I think that WARNING is rarely a good compromise between ERROR and
> nothing.  I think we should just decide whether this is legal (and
> then allow it without a WARNING) or not legal (and then ERROR).
> Telling the user that it's allowed but we don't like it doesn't really
> help much.

Understood, will throw an ERROR instead.

Thank you.

Regards,
Amul


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CTE inlining

2017-05-15 Thread David Rowley
On 13 May 2017 at 08:39, Bruce Momjian  wrote:
> To summarize, it seems we have two options if we want to add fence
> control to CTEs:
>
> 1.  add INLINE to disable the CTE fence
> 2.  add MATERIALIZE to enable the CTE fence

I think #1 is out of the question.

What would we do in cases like:

WITH INLINE cte AS (SELECT random() a)
SELECT * FROM cte UNION SELECT * FROM cte;

I assume we won't want to inline when the CTE query contains a
volatile function, and we certainly won't in cases like:

WITH INLINE cte AS (DELETE FROM a RETURNING *)
INSERT INTO b SELECT * from cte WHERE cte.value > 5;

We'd be certain to receive complaints from disgruntled users about
"Why is this not inlined when I specified INLINE?"

#2 does not suffer from that.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Duplicate usage of tablespace location?

2017-05-15 Thread Kyotaro HORIGUCHI
Hi,

At Mon, 15 May 2017 14:35:20 +0900, Michael Paquier  
wrote in 
> On Thu, May 11, 2017 at 1:09 PM, Kyotaro HORIGUCHI
>  wrote:
> > If we can accept multiple server versions share a tablespace
> > directory, pg_basebackup also can allow that situation. The
> > attached patch does that. Similary to the server code, it
> > correctly fails if the same version subdirectory exists.
> 
> +#define verify_dir_is_empty_or_create(dirname, created, found) \
> +verify_and_create_dir(dirname, created, found, false)
> This solution looks like a quick-and-dirty fix. I tend to prefer a

You're right; it was just intended to reduce the amount of
modification and to clarify what the patch does.  It is to be
replaced with the bare functions.

> solution close to what Pierre is proposing on the other thread by
> localizing things in ReceiveAndUnpackTarFile(). This makes the check
> more localized, and there is no need to complicate low-level APIs of
> pg_basebackup.c.
> 
> By the way, it may be better to keep discussions on the first thread created:
> https://www.postgresql.org/message-id/05c62730-8670-4da6-b783-52e66fb42...@pinaraf.info
> A patch has been submitted to the next CF there as well.

I noticed it after the mail upthread.  I have sent a comment on
that and moved to the thread. Thanks for noticing.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] [POC] hash partitioning

2017-05-15 Thread Ashutosh Bapat
On Mon, May 15, 2017 at 9:13 PM, Robert Haas  wrote:
> On Mon, May 15, 2017 at 6:57 AM, amul sul  wrote:
>>> Collation is only relevant for ordering, not equality.  Since hash
>>> opclasses provide only equality, not ordering, it's not relevant here.
>>> I'm not sure whether we should error out if it's specified or just
>>> silently ignore it.  Maybe an ERROR is a good idea?  But not sure.
>>>
>> IMHO, we could simply have a WARNING, and ignore collation, thoughts?
>>
>> Updated patches attached.
>
> I think that WARNING is rarely a good compromise between ERROR and
> nothing.  I think we should just decide whether this is legal (and
> then allow it without a WARNING) or not legal (and then ERROR).
> Telling the user that it's allowed but we don't like it doesn't really
> help much.

+1. We should throw an error and add a line to the documentation saying
that collation must not be specified for a hash-partitioned table.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] src/test/ssl/t/001_ssltests.pl should not tromp on file permissions

2017-05-15 Thread Tom Lane
Michael Paquier  writes:
> On Tue, May 16, 2017 at 7:05 AM, Tom Lane  wrote:
>> We could maybe make 001_ssltests.pl save and restore the file's
>> permissions, but I think probably a cleaner answer is to have it
>> make a temporary copy and set the permissions on that.

> Ah, you are talking about ssl/client.key here... Using a temporary copy
> makes the most sense to me, as there is no need to think about putting
> back the old permissions on test failure, and the original tree
> remains unmodified all the time.

Pushed with minor adjustments --- I made the test script unlink the temp
file at the end, and tightened the .gitignore patterns a bit.

regards, tom lane




Re: [HACKERS] src/test/ssl/t/001_ssltests.pl should not tromp on file permissions

2017-05-15 Thread Michael Paquier
On Tue, May 16, 2017 at 7:05 AM, Tom Lane  wrote:
> I got tripped up while building the 10beta1 tarballs by the fact
> that src/test/ssl/ssl/client.key had permissions 0600 in my git
> checkout.  After a fair amount of head-scratching, I figured out
> that this must have been a side-effect of having run the SSL regression
> tests at some point in the past.  I do not like test scripts that
> scribble on non-transient files, not even (or perhaps especially not)
> if they're "just" changing the permissions.
>
> We could maybe make 001_ssltests.pl save and restore the file's
> permissions, but I think probably a cleaner answer is to have it
> make a temporary copy and set the permissions on that.

Ah, you are talking about ssl/client.key here... Using a temporary copy
makes the most sense to me, as there is no need to think about putting
back the old permissions on test failure, and the original tree
remains unmodified all the time.
-- 
Michael


ssl-test-perms.patch
Description: Binary data



Re: [HACKERS] Obsolete sentence in CREATE SUBSCRIPTION docs

2017-05-15 Thread Tom Lane
Masahiko Sawada  writes:
> I found $subject while reading the documentation. I guess this should
> have been updated when we reworked the options syntax.

Pushed, thanks.

regards, tom lane




Re: [HACKERS] Re: [doc fix] PG10: wroing description on connect_timeout when multiple hosts are specified

2017-05-15 Thread Michael Paquier
On Tue, May 16, 2017 at 3:13 AM, Tom Lane  wrote:
> FWIW, I think the position most of us were taking is that this feature
> is meant to retry transport-level connection failures, not cases where
> we successfully make a connection to a server and then it rejects our
> login attempt.  I would classify a timeout as a transport-level failure
> as long as it occurs before we got any server response --- if it happens
> during the authentication protocol, that's less clear.  But it might not
> be very practical to distinguish those two cases.
>
> In short, +1 for retrying on timeout during connection, and I'm okay with
> retrying a timeout during authentication if it's not practical to treat
> that differently.

Sensible argument here. It could happen that a server is unresponsive,
for example in a split-brain situation.

I have been playing a bit with the patch.

+ *
+ * Returns -1 on failure, 0 if the socket is readable/writable, 1 if
+ * it timed out.
  */
pqWait is internal to libpq, so we are free to set up what we want
here. Still I think that we should be consistent with what
pqSocketCheck returns:
* >0 means that the socket is readable/writable, counting things.
* 0 is for timeout.
* -1 on failure.

+	int			ret = 0;
+	int			timeout = 0;
The declaration of "ret" should be internal in the for(;;) loop.

+   /* Attempt connection to the next host, starting the connect_timeout timer */
+   pqDropConnection(conn, true);
+   conn->addr_cur = conn->connhost[conn->whichhost].addrlist;
+   conn->status = CONNECTION_NEEDED;
+   finish_time = time(NULL) + timeout;
+   }
I think that it would be safer to not set finish_time if
conn->connect_timeout is NULL. I agree that your code works because
pqWaitTimed() will never complain on timeout reached if finish_time is
-1.  That is just for robustness' sake.

The docs say that for connect_timeout:
  
   Maximum wait for connection, in seconds (write as a decimal integer
   string). Zero or not specified means wait indefinitely.  It is not
   recommended to use a timeout of less than 2 seconds.
   This timeout applies separately to each connection attempt.
   For example, if you specify two hosts and both of them are unreachable,
   and connect_timeout is 5, the total time spent waiting for a
   connection might be up to 10 seconds.
  

It seems to me that this implies that if a timeout occurs on the first
connection, the counter is reset, which is what this patch is doing.
So we are all set.
-- 
Michael




[HACKERS] Obsolete sentence in CREATE SUBSCRIPTION docs

2017-05-15 Thread Masahiko Sawada
Hi,

I found $subject while reading the documentation. I guess this should
have been updated when we reworked the options syntax.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_create_subscription_doc.patch
Description: Binary data



Re: [HACKERS] Relcache leak when row triggers on partitions are fired by COPY

2017-05-15 Thread Amit Langote
On 2017/05/16 10:03, Thomas Munro wrote:
> On Tue, May 16, 2017 at 12:32 PM, Amit Langote
>  wrote:
>> I vote for ExecCleanupTriggerState(estate).  After your patch, there will
>> be 4 places, including afterTriggerInvokeEvents(), ExecEndPlan(), and
>> EvalPlanQualEnd(), that repeat the same block of code.
> 
> Ok, here's a patch like that.

Thanks, looks good to me.

>  The call to ExecCloseIndices() may
> technically be redundant (we never opened them).

Yes, it is.  We never call ExecOpenIndices() on the ResultRelInfos
contained in es_trig_target_relations.

Thanks,
Amit





Re: [HACKERS] Server Crashes if try to provide slot_name='none' at the time of creating subscription.

2017-05-15 Thread Masahiko Sawada
On Mon, May 15, 2017 at 8:09 PM, Masahiko Sawada  wrote:
> On Mon, May 15, 2017 at 8:06 PM, Kuntal Ghosh
>  wrote:
>> On Mon, May 15, 2017 at 4:00 PM, Masahiko Sawada  
>> wrote:
>>> On Mon, May 15, 2017 at 6:41 PM, tushar  
>>> wrote:
 Hi,

 Server Crashes if we try to provide slot_name='none' at the time of 
 creating
 subscription -

 postgres=#  create subscription sub2 connection 'dbname=postgres port=5000
 user=centos password=f' publication abc with (slot_name='none');
 NOTICE:  synchronized table states
 server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
 The connection to the server was lost. Attempting reset: Failed.
 !>

>>>
>>> Thank you for reporting.
>>> I think create_slot and enabled should be set to false forcibly when
>>> slot_name = 'none'. The attached patch fixes it; more tests and a
>>> regression test case are needed though.
>>>
>> I think the patch doesn't solve the problem completely. For example,
>> postgres=# create subscription sub3 connection 'dbname=postgres
>> port=5432 user=edb password=edb' publication abc with
>> (slot_name='none',create_slot=true);
>> ERROR:  slot_name = NONE and create slot are mutually exclusive options
>> postgres=# create subscription sub3 connection 'dbname=postgres
>> port=5432 user=edb password=edb' publication abc with
>> (create_slot=true,slot_name='none');
>> ERROR:  could not connect to the publisher: could not connect to
>> server: Connection refused
>> Is the server running locally and accepting
>> connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
>>
>> Changing the order of the subscription parameters changes the output.
>> I think the modifications should be done at the end of the function.
>> Attached a patch for the same.
>>
>
> Yes, your patch fixes it correctly.
>

I've updated Kuntal's patch, added a regression test for option
combinations, and updated the documentation.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_bug_of_parse_option_v2.patch
Description: Binary data



Re: [HACKERS] Relcache leak when row triggers on partitions are fired by COPY

2017-05-15 Thread Thomas Munro
On Tue, May 16, 2017 at 12:32 PM, Amit Langote
 wrote:
> I vote for ExecCleanupTriggerState(estate).  After your patch, there will
> be 4 places, including afterTriggerInvokeEvents(), ExecEndPlan(), and
> EvalPlanQualEnd(), that repeat the same block of code.

Ok, here's a patch like that.  The call to ExecCloseIndices() may
technically be redundant (we never opened them).

-- 
Thomas Munro
http://www.enterprisedb.com


fix-relcache-leak-in-copy-with-triggers-v2.patch
Description: Binary data



Re: [HACKERS] Relcache leak when row triggers on partitions are fired by COPY

2017-05-15 Thread Amit Langote
Hi Thomas,

On 2017/05/16 9:12, Thomas Munro wrote:
> Hi hackers,
> 
> While testing the patch I'm writing for the transition table open
> item, I noticed that we can leak Relation objects like this:
> 
> postgres=# create table parent (a text, b int) partition by list (a);
> CREATE TABLE
> postgres=# create table child partition of parent for values in ('AAA');
> CREATE TABLE
> postgres=# create or replace function my_trigger_func() returns
> trigger language plpgsql as $$ begin raise notice 'hello'; return
> null; end; $$;
> CREATE FUNCTION
> postgres=# create trigger child_trigger after insert or update or
> delete on child for each row execute procedure my_trigger_func();
> CREATE TRIGGER
> postgres=# copy parent (a, b) from stdin;
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself.
>>> AAA 42
>>> \.
> NOTICE:  hello
> WARNING:  relcache reference leak: relation "child" not closed
> COPY 1

Thanks for the test case and the patch.

> The leaked object is created by ExecGetTriggerResultRel, called by
> afterTriggerInvokeEvents.  That function relies on someone, usually
> ExecEndPlan or EvalPlanQualEnd, to clean up es_trig_target_relations.
> Shouldn't copy.c do the same thing (see attached)?  Or perhaps should
> there be an ExecCleanupTriggerState(estate) used by all these places?

I vote for ExecCleanupTriggerState(estate).  After your patch, there will
be 4 places, including afterTriggerInvokeEvents(), ExecEndPlan(), and
EvalPlanQualEnd(), that repeat the same block of code.

Thanks,
Amit





[HACKERS] Relcache leak when row triggers on partitions are fired by COPY

2017-05-15 Thread Thomas Munro
Hi hackers,

While testing the patch I'm writing for the transition table open
item, I noticed that we can leak Relation objects like this:

postgres=# create table parent (a text, b int) partition by list (a);
CREATE TABLE
postgres=# create table child partition of parent for values in ('AAA');
CREATE TABLE
postgres=# create or replace function my_trigger_func() returns
trigger language plpgsql as $$ begin raise notice 'hello'; return
null; end; $$;
CREATE FUNCTION
postgres=# create trigger child_trigger after insert or update or
delete on child for each row execute procedure my_trigger_func();
CREATE TRIGGER
postgres=# copy parent (a, b) from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> AAA 42
>> \.
NOTICE:  hello
WARNING:  relcache reference leak: relation "child" not closed
COPY 1

The leaked object is created by ExecGetTriggerResultRel, called by
afterTriggerInvokeEvents.  That function relies on someone, usually
ExecEndPlan or EvalPlanQualEnd, to clean up es_trig_target_relations.
Shouldn't copy.c do the same thing (see attached)?  Or perhaps should
there be an ExecCleanupTriggerState(estate) used by all these places?

-- 
Thomas Munro
http://www.enterprisedb.com


fix-relcache-leak-in-copy-with-triggers.patch
Description: Binary data



Re: [HACKERS] Receive buffer size for the statistics socket

2017-05-15 Thread Tom Lane
I wrote:
> I propose that it'd be a good idea to try to set the stats socket's
> receive buffer size to be a minimum of, say, 100K on all platforms.
> Code for this would be analogous to what we already have in pqcomm.c
> (circa line 760) for forcing up the send buffer size, but SO_RCVBUF
> not SO_SNDBUF.

I experimented with the attached patch.  Modern platforms such as
recent Linux and macOS seem to have default receive buffer sizes
in the 100K-200K range.  The default is less in older systems, but
even my very oldest dinosaurs will let you set it to at least 256K.
(Don't have Windows to try, though.)

I propose committing this (less the debug logging part) to HEAD
once the beta is out, and then back-patching if it doesn't break
anything and seems to improve matters on frogmouth.

regards, tom lane

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index d4feed1..d868976 100644
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***
*** 93,98 
--- 93,101 
  #define PGSTAT_POLL_LOOP_COUNT	(PGSTAT_MAX_WAIT_TIME / PGSTAT_RETRY_DELAY)
  #define PGSTAT_INQ_LOOP_COUNT	(PGSTAT_INQ_INTERVAL / PGSTAT_RETRY_DELAY)
  
+ /* Minimum receive buffer size for the collector's socket. */
+ #define PGSTAT_MIN_RCVBUF		(100 * 1024)
+ 
  
  /* --
   * The initial size hints for the hash tables used in the collector.
*** retry2:
*** 574,579 
--- 577,620 
  		goto startup_failed;
  	}
  
+ 	/*
+ 	 * Try to ensure that the socket's receive buffer is at least
+ 	 * PGSTAT_MIN_RCVBUF bytes, so that it won't easily overflow and lose
+ 	 * data.  Use of UDP protocol means that we are willing to lose data under
+ 	 * heavy load, but we don't want it to happen just because of ridiculously
+ 	 * small default buffer sizes (such as 8KB on older Windows versions).
+ 	 */
+ 	{
+ 		int			old_rcvbuf;
+ 		int			new_rcvbuf;
+ 		ACCEPT_TYPE_ARG3 rcvbufsize = sizeof(old_rcvbuf);
+ 
+ 		if (getsockopt(pgStatSock, SOL_SOCKET, SO_RCVBUF,
+ 					   (char *) &old_rcvbuf, &rcvbufsize) < 0)
+ 		{
+ 			elog(LOG, "getsockopt(SO_RCVBUF) failed: %m");
+ 			/* if we can't get existing size, always try to set it */
+ 			old_rcvbuf = 0;
+ 		}
+ 
+ 		new_rcvbuf = PGSTAT_MIN_RCVBUF;
+ 		if (old_rcvbuf < new_rcvbuf)
+ 		{
+ 			if (setsockopt(pgStatSock, SOL_SOCKET, SO_RCVBUF,
+ 						   (char *) &new_rcvbuf, sizeof(new_rcvbuf)) < 0)
+ elog(LOG, "setsockopt(SO_RCVBUF) failed: %m");
+ 		}
+ 
+ 		/* this part is just for debugging, not needed at commit: */
+ 		rcvbufsize = sizeof(new_rcvbuf);
+ 		if (getsockopt(pgStatSock, SOL_SOCKET, SO_RCVBUF,
+ 					   (char *) &new_rcvbuf, &rcvbufsize) < 0)
+ 			elog(LOG, "getsockopt(SO_RCVBUF) failed: %m");
+ 		else
+ 			elog(LOG, "getsockopt(SO_RCVBUF) was %d, now %d",
+  old_rcvbuf, new_rcvbuf);
+ 	}
+ 
  	pg_freeaddrinfo_all(hints.ai_family, addrs);
  
  	return;



[HACKERS] COPY FROM STDIN behaviour on end-of-file

2017-05-15 Thread Thomas Munro
Hi hackers,

If you hit ^D while COPY FROM STDIN is reading, then subsequent COPY
FROM STDIN commands return immediately.  That's because we never clear
the end-of-file state on the libc FILE.  Shouldn't we do that, perhaps
with something like the attached?

-- 
Thomas Munro
http://www.enterprisedb.com


clear-copy-stream-eof.patch
Description: Binary data



[HACKERS] src/test/ssl/t/001_ssltests.pl should not tromp on file permissions

2017-05-15 Thread Tom Lane
I got tripped up while building the 10beta1 tarballs by the fact
that src/test/ssl/ssl/client.key had permissions 0600 in my git
checkout.  After a fair amount of head-scratching, I figured out
that this must have been a side-effect of having run the SSL regression
tests at some point in the past.  I do not like test scripts that
scribble on non-transient files, not even (or perhaps especially not)
if they're "just" changing the permissions.

We could maybe make 001_ssltests.pl save and restore the file's
permissions, but I think probably a cleaner answer is to have it
make a temporary copy and set the permissions on that.

Thoughts?

regards, tom lane




Re: [HACKERS] Hash Functions

2017-05-15 Thread David Fetter
On Mon, May 15, 2017 at 03:26:02PM -0400, Robert Haas wrote:
> On Sun, May 14, 2017 at 9:35 PM, Andres Freund  wrote:
> > On 2017-05-14 21:22:58 -0400, Robert Haas wrote:
> >> but wanting a CHECK constraint that applies to only one partition
> >> seems pretty reasonable (e.g. CHECK that records for older years
> >> are all in the 'inactive' state, or whatever).
> >
> > On a hash-partitioned table?
> 
> No, probably not.  But do we really want the rules for partitioned
> tables to be different depending on the kind of partitioning in use?

As the discussion has devolved here, it appears that there are, at
least conceptually, two fundamentally different classes of partition:
public, which is to say meaningful to DB clients, and "private", used
for optimizations, but otherwise opaque to DB clients.

Mashing those two cases together appears to cause more problems than
it solves.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Hash Functions

2017-05-15 Thread Mark Dilger

> On May 15, 2017, at 7:48 AM, Jeff Davis  wrote:
> 
> On Sun, May 14, 2017 at 6:22 PM, Robert Haas  wrote:
>> You'd have to prohibit a heck of a lot more than that in order for
>> this to work 100% reliably.  You'd have to prohibit CHECK constraints,
>> triggers, rules, RLS policies, and UNIQUE indexes, at the least.  You
>> might be able to convince me that some of those things are useless
>> when applied to partitions, but wanting a CHECK constraint that
>> applies to only one partition seems pretty reasonable (e.g. CHECK that
>> records for older years are all in the 'inactive' state, or whatever).
>> I think getting this to work 100% reliably in all cases would require
>> an impractically large hammer.
> 
> The more I think about it the more I think hash partitions are
> "semi-logical". A check constraint on a single partition of a
> range-partitioned table makes sense, but it doesn't make sense on a
> single partition of a hash-partitioned table.

That depends on whether the user gets to specify the hash function, perhaps
indirectly by specifying a user defined opfamily.  I can imagine clever hash
functions that preserve certain properties of the incoming data, and check
constraints in development versions of the database that help verify the hash
is not violating those properties.

That's not to say such hash functions must be allowed in the hash
partitioning implementation; just that it does make sense if you squint
and look a bit sideways at it.

Mark Dilger



Re: [HACKERS] NOT NULL constraints on range partition key columns

2017-05-15 Thread Robert Haas
On Mon, May 15, 2017 at 9:12 AM, Amit Kapila  wrote:
> Can't we allow NULL to get inserted into the partition (leaf
> partition) if the user uses the partition name in Insert statement?

That would be terrible behavior - the behavior of tuple routing should
match the enforced constraints.

> For root partitions, I think for now giving an error is okay, but once
> we have default partitions (Rahila's patch), we can route NULLS to
> default partition.

Yeah, that's exactly why I think we should make the change Amit is
proposing here.  If we don't, then we won't be able to accept NULL
values even after we have the default partitioning stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal psql \gdesc

2017-05-15 Thread Pavel Stehule
2017-05-09 23:00 GMT+02:00 Fabien COELHO :

>
>>> What about detecting the empty result (eg PQntuples()==0?) and writing
>>> "Empty result" instead of the strange looking empty table above? That
>>> would just mean skipping the PrintQueryResult call in this case?
>>>
>>
>> PQntuples == 0 every time - the query is not executed.
>>
>
> I meant to test the query which collects type names, which is executed?
>

How can it help?

>
> Or check that PQnfields() == 0 on the PQdescribePrepared() result, so that
> there is no need to execute the type name collection query?
>
>> For the case "SELECT;" the empty table is correct.
>>
>
> Ok. Then write "Empty table"?
>
>> For TRUNCATE and similar commands I am not sure. The empty table is maybe
>> unusual, but it is valid - like "SELECT;".
>>
>
> I would partly disagree:
>
> "SELECT;" does indeed return an empty relation, so I agree that an empty
> table is valid whether spelled out as "Empty table" or explicitly.
>
> However, ISTM that "TRUNCATE stuff;" does *NOT* return a relation, so
> maybe "No table" would be ok, but not an empty table... ?!
>
> So I could be okay with both:
>
>   SELECT \gdesc
>   -- "Empty table" or some other string
> Or
>   -- Name | Type
>
> Although I prefer the first one, because the second looks like a bug
> somehow: I asked for a description, but nothing is described... even if the
> answer is somehow valid, it looks pretty strange.
>
> The same results do not really suit "TRUNCATE Foo \gdesc", where "No table"
> would seem more appropriate?
>
> In both case, "Empty result" is kind of neutral, it does not promise a
> table or not. Hmmm. At least not too much. Or maybe some other string such
> as "Nothing" or "No result"?
>
> Now I wonder whether the No vs Empty cases can be distinguished?


Not with the standard libpq API :(

I am sending a variant with a message instead of an empty result.

Personally I prefer the empty result over a message. It is hard to choose
good text for this message; an empty result is just an empty result in all cases.
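The constraint being discussed here can be modeled as a small pure function (a sketch, not libpq code): after describing the prepared statement without executing it, only the column list is available, and both `TRUNCATE foo` and a bare `SELECT` describe to zero columns, which is why one neutral message covers all cases.

```python
def gdesc_output(described_columns):
    """described_columns: list of (name, type) pairs obtained by describing
    the statement without executing it.  An empty list is ambiguous: it may
    mean "no result at all" (TRUNCATE) or "a result with zero columns"
    (bare SELECT), and the standard libpq API cannot tell the two apart."""
    if not described_columns:
        return "Empty result"
    return [("Name", "Type")] + list(described_columns)
```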

Regards

Pavel




>
> --
> Fabien.
>
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 3b86612..2e46f4c 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1957,6 +1957,16 @@ Tue Oct 26 21:40:57 CEST 1999
 
 
   
+\gdesc
+
+
+Show the description of the result of the current query buffer without
+actually executing it, by considering it a prepared statement.
+
+
+  
+
+  
 \gexec
 
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index b3263a9..a1c8e1d 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -88,6 +88,7 @@ static backslashResult exec_command_errverbose(PsqlScanState scan_state, bool ac
 static backslashResult exec_command_f(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_g(PsqlScanState scan_state, bool active_branch,
 			   const char *cmd);
+static backslashResult exec_command_gdesc(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_gexec(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_gset(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_help(PsqlScanState scan_state, bool active_branch);
@@ -337,6 +338,8 @@ exec_command(const char *cmd,
 		status = exec_command_f(scan_state, active_branch);
 	else if (strcmp(cmd, "g") == 0 || strcmp(cmd, "gx") == 0)
 		status = exec_command_g(scan_state, active_branch, cmd);
+	else if (strcmp(cmd, "gdesc") == 0)
+		status = exec_command_gdesc(scan_state, active_branch);
 	else if (strcmp(cmd, "gexec") == 0)
 		status = exec_command_gexec(scan_state, active_branch);
 	else if (strcmp(cmd, "gset") == 0)
@@ -1328,6 +1331,25 @@ exec_command_g(PsqlScanState scan_state, bool active_branch, const char *cmd)
 }
 
 /*
+ * \gdesc -- describe query result
+ */
+static backslashResult
+exec_command_gdesc(PsqlScanState scan_state, bool active_branch)
+{
+	backslashResult status = PSQL_CMD_SKIP_LINE;
+
+	if (active_branch)
+	{
+		pset.gdesc_flag = true;
+		status = PSQL_CMD_SEND;
+	}
+	else
+		ignore_slash_filepipe(scan_state);
+
+	return status;
+}
+
+/*
  * \gexec -- send query and execute each field of result
  */
 static backslashResult
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index a2f1259..3cffb4a 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -1323,7 +1323,93 @@ SendQuery(const char *query)
 		}
 	}
 
-	if (pset.fetch_count <= 0 || pset.gexec_flag ||
+	if (pset.gdesc_flag)
+	{
+		/*
+		 * The unnamed prepared statement is used.  It is not possible to
+		 * create an unnamed prepared statement from psql user space, so
+		 * there should not be any conflict.  At this moment it is not
+		 * possible to deallocate this prepared statement, so it lives
+		 * until end of session or until the next \gdesc call.
+		 */

Re: [HACKERS] Hash Functions

2017-05-15 Thread Robert Haas
On Sun, May 14, 2017 at 9:35 PM, Andres Freund  wrote:
> On 2017-05-14 21:22:58 -0400, Robert Haas wrote:
>> but wanting a CHECK constraint that applies to only one partition
>> seems pretty reasonable (e.g. CHECK that records for older years are
>> all in the 'inactive' state, or whatever).
>
> On a hash-partitioned table?

No, probably not.  But do we really want the rules for partitioned
tables to be different depending on the kind of partitioning in use?

> I'm not saying it can't work for any datatype, I just think it'd be a
> very bad idea to make it work for any non-trivial ones. The likelihood
> of regularly breaking or preventing us from improving things seems high.
> I'm not sure that having a design where this most of the time works for
> some datatypes is a good one.

I think you might be engaging in undue pessimism here, but I suspect
we need to actually try doing the work before we know how it will turn
out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-05-15 Thread Kevin Grittner
[Apologies to all for my recent absence from community lists, and
special thanks to Thomas and Robert for picking up the slack.]

On Tue, May 9, 2017 at 4:51 PM, Thomas Munro
 wrote:
> On Tue, May 9, 2017 at 10:29 PM, Thomas Munro  
> wrote:

> Recall that transition tables can be specified for statement-level
> triggers AND row-level triggers.  If you specify them for row-level
> triggers, then they can see all rows changed so far each time they
> fire.

No, they see all rows from the statement, each time.

test=# create table t (c int not null);
CREATE TABLE
test=# create function t_func()
test-#   returns trigger
test-#   language plpgsql
test-# as $$
test$# begin
test$#   raise notice '% / % = %',
test$#new.c,
test$#(select sum(c) from n),
test$#(select new.c::float / sum(n.c) from n);
test$#   return null;
test$# end;
test$# $$;
CREATE FUNCTION
test=# create trigger t_trig
test-#   after insert or update on t
test-#   referencing new table as n
test-#   for each row
test-#   execute procedure t_func();
CREATE TRIGGER
test=# insert into t select generate_series(1,5);
NOTICE:  1 / 15 = 0.0667
NOTICE:  2 / 15 = 0.133
NOTICE:  3 / 15 = 0.2
NOTICE:  4 / 15 = 0.267
NOTICE:  5 / 15 = 0.333
INSERT 0 5

This behavior is required for this feature by the SQL standard.
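The demonstration above can be restated as a tiny behavioral model (illustrative only, not trigger machinery): for a row-level AFTER trigger with a transition table, every per-row invocation sees the complete set of rows affected by the statement, not a running prefix.

```python
def fire_row_triggers(inserted_rows):
    # The transition table is the same full statement-level snapshot
    # for every per-row invocation of the trigger function.
    transition = list(inserted_rows)
    total = sum(transition)
    # Mirrors the NOTICE output: (new.c, sum over transition table, ratio)
    return [(new, total, new / total) for new in transition]
```

Running it on the five inserted rows reproduces the constant denominator of 15 seen in the NOTICEs.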

> Now our policy of firing the statement level triggers only for
> the named relation but firing row-level triggers for all modified
> relations leads to a tricky problem for the inheritance case: what
> type of transition tuples should the child table's row-level triggers
> see?

The record format for the object on which the trigger was declared, IMO.

> Suppose you have an inheritance hierarchy like this:
>
>  animal
>   -> mammal
>   -> cat
>
> You define a statement-level trigger on "animal" and another
> statement-level trigger on "mammal".  You define a row-level trigger
> on "cat".  When you update either "animal" or "mammal", the row
> triggers on "cat" might run.  Row-level triggers on "cat" see OLD and
> NEW as "cat" tuples, of course, but if they are configured to see
> transition tables, should they see "cat", "mammal" or "animal" tuples
> in the transition tables?  With my patch as it is, that depends on
> which level of the hierarchy you explicitly updated!

I think that the ideal behavior would be that if you define a
trigger on "cat", you see rows in the "cat" format; if you define a
trigger on rows for "mammal", you see rows in the "mammal" format;
if you define a trigger on rows for "animal", you see rows in the
"animal" format.  Also, the ideal would be that we support an ONLY
option for trigger declaration.  If your statement uses ONLY at a
given level in the hierarchy, the row triggers for only that level
would fire.  If you don't use ONLY, a row trigger at that level
would fire for operations at that level or any child level, but with
a record format matching the level of the trigger.

Now, that may be too ambitious for this release.  If so, I suggest
we not implement anything that would be broken by the above, and
throw a "not implemented" error when necessary.
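The proposed ideal behavior amounts to projecting transition tuples into the record format of the level where the trigger was declared. A minimal sketch, with invented column sets for the example hierarchy:

```python
# Hypothetical column sets for the animal -> mammal -> cat hierarchy.
SCHEMAS = {
    "animal": ("name",),
    "mammal": ("name", "fur_color"),
    "cat":    ("name", "fur_color", "lives_left"),
}

def transition_tuple(trigger_level, cat_row):
    # Project a "cat"-format row into the format of the level where the
    # trigger was declared, so each trigger sees rows in "its own" format.
    return {col: cat_row[col] for col in SCHEMAS[trigger_level]}
```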

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/




[HACKERS] Re: proposal - using names as primary names of plpgsql function parameters instead $ based names

2017-05-15 Thread Pavel Stehule
2017-05-13 18:26 GMT+02:00 Pavel Stehule :

> Hi
>
> Now, while working on plpgsql_check, I have to check function
> parameters. I can use fn_vargargnos and out_param_varno for the list of
> arguments and related varno(s). When I detect some issue, I am using
> refname. It is not too nice now, because these refnames are $ based. Long
> names are aliases only, and there is no way to find the related alias.
>
> So, my proposal: we can use names as the refname of a parameter variable,
> and the $ based name can be used as an alias. From the user's perspective
> there is no change.
>
> Comments, notes?
>

here is a patch

Regards

Pavel


>
> Regards
>
> Pavel
>


psql-named-arguments
Description: Binary data



Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur

2017-05-15 Thread Tom Lane
Robert Haas  writes:
> On Sun, May 14, 2017 at 9:50 PM, Tsunakawa, Takayuki
>  wrote:
>> By the way, could you elaborate what problem could occur if my solution is 
>> applied?  (it doesn't seem easy for me to imagine...)

> Sure.  Imagine that the user thinks that 'foo' and 'bar' are the
> relevant database servers for some service and writes 'dbname=quux
> host=foo,bar' as a connection string.  However, actually the user has
> made a mistake and 'foo' is supporting some other service entirely; it
> has no database 'quux'; the database servers which have database
> 'quux' are in fact 'bar' and 'baz'.

Even more simply, suppose that your userid is known to host bar but the
DBA has forgotten to create it on foo.  This is surely a configuration
error that ought to be rectified, not just failed past, or else you don't
have any of the redundancy you think you do.

Of course, the user would have to try connections to both foo and bar
to be sure that they're both configured correctly.  But he might try
"host=foo,bar" and "host=bar,foo" and figure he was OK, not noticing
that both connections had silently been made to bar.

The bigger picture here is that we only want to fail past transient
errors, not configuration errors.  I'm willing to err in favor of
regarding doubtful cases as transient, but most server login rejections
aren't for transient causes.

There might be specific post-connection errors that we should consider
retrying; "too many connections" is an obvious case.
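The transient-vs-configuration distinction can be sketched as a classifier (hypothetical interface; the SQLSTATE codes themselves are real PostgreSQL error codes):

```python
# SQLSTATE codes plausibly indicating a transient condition worth retrying
# on another host: 53300 = too_many_connections, 57P03 = cannot_connect_now
# (server starting up or shutting down).
TRANSIENT_SQLSTATES = {"53300", "57P03"}

def should_try_next_host(failure_kind, sqlstate=None):
    if failure_kind in ("socket_error", "connect_timeout"):
        return True   # transport-level failure: the server looks down
    if failure_kind == "server_error":
        # Configuration errors (missing database, missing role, auth
        # failure) should be surfaced, not silently failed past.
        return sqlstate in TRANSIENT_SQLSTATES
    return False
```

For example, a missing database (SQLSTATE 3D000) would stop the scan and report, while "too many connections" would advance to the next host.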

regards, tom lane




Re: [HACKERS] Re: [doc fix] PG10: wroing description on connect_timeout when multiple hosts are specified

2017-05-15 Thread Tom Lane
Robert Haas  writes:
> Takayuki Tsunakawa raised a very similar issue in another thread
> related to another open item, namely
> https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F6F5659%40G01JPEXMBYT05
> in which he argued that libpq ought to try the next host after a
> connection failure regardless of the reason for the connection
> failure.  Tom, Michael Paquier, and I all disagreed; none of us
> believe that this feature was intended to retry the connection to a
> different host after an arbitrary error reported by the remote server.
> This thread is essentially the same issue, except here the question
> isn't what should happen after we connect to a server and it returns
> an error, but rather what happens when we time out waiting to connect
> to a server.  When that happens, should we give up, or try the next
> server?

FWIW, I think the position most of us were taking is that this feature
is meant to retry transport-level connection failures, not cases where
we successfully make a connection to a server and then it rejects our
login attempt.  I would classify a timeout as a transport-level failure
as long as it occurs before we got any server response --- if it happens
during the authentication protocol, that's less clear.  But it might not
be very practical to distinguish those two cases.

In short, +1 for retrying on timeout during connection, and I'm okay with
retrying a timeout during authentication if it's not practical to treat
that differently.

regards, tom lane




Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur

2017-05-15 Thread Robert Haas
On Sun, May 14, 2017 at 9:50 PM, Tsunakawa, Takayuki
 wrote:
>> I guess not as well. That would be tricky for the user to have a different
>> behavior depending on the error returned by the server, which is why the
>> current code is doing things right IMO. Now, the feature has been designed
>> similarly to JDBC with its parametrization, so it could be surprising for
>> users to get a different failure handling compared to that. Not saying that
>> JDBC is doing it wrong, but libpq does nothing wrong either.
>
> I didn't intend to make the user have a different behavior depending on the 
> error returned by the server.  I meant attempting connection to alternative 
> hosts when the server returned an error. I thought the new libpq feature 
> tries to connect to other hosts when a connection attempt fails, where the 
> "connection" is the *database connection* (user's perspective), not the 
> *socket connection* (PG developer's perspective).  I think PgJDBC meets the 
> user's desire better -- "Please connect to some host for better HA if a 
> database server is unavailable for some reason."
>
> By the way, could you elaborate what problem could occur if my solution is 
> applied?  (it doesn't seem easy for me to imagine...)

Sure.  Imagine that the user thinks that 'foo' and 'bar' are the
relevant database servers for some service and writes 'dbname=quux
host=foo,bar' as a connection string.  However, actually the user has
made a mistake and 'foo' is supporting some other service entirely; it
has no database 'quux'; the database servers which have database
'quux' are in fact 'bar' and 'baz'.  All appears well as long as 'bar'
remains up, because the missing-database error for 'foo' is ignored
and we just connect to 'bar'.  However, when 'bar' goes down then we
are out of service instead of failing over to 'baz' as we should have
done.

Now it's quite possible that the user, if they test carefully, might
realize that things are not working as intended, because the DBA might
say "hey, all of your connections are being directed to 'bar' instead
of being load-balanced properly!".  But even if they are careful
enough to realize this, it may not be clear what has gone wrong.
Under your proposal, the connection to 'foo' could be failing for *any
reason whatsoever* from lack of connectivity to a missing database to
a missing user to a missing CONNECT privilege to an authentication
failure.  If the user looks at the server log and can pick out the
entries from their own connection attempts they can figure it out, but
otherwise they might spend quite a bit of time wondering what's wrong;
after all, libpq will report no error, as long as the connection to
the other server works.

Now, this is all arguable.  You could certainly say -- and you are
saying -- that this feature ought to be defined to retry after any
kind of failure whatsoever.  But I think what Tom and Michael and I
are saying is that this is a failover feature and therefore ought to
try the next server when the first one in the list appears to have
gone down, but not when the first one in the list is unhappy with the
connection request for some other reason.  Who is right is a judgement
call, but I don't think it's self-evident that users want to ignore
anything and everything that might have gone wrong with the connection
to the first server, rather than only those things which resemble a
down server.  It seems quite possible to me that if we had defined it
as you are proposing, somebody would now be arguing for a behavior
change in the other direction.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Re: [doc fix] PG10: wroing description on connect_timeout when multiple hosts are specified

2017-05-15 Thread Robert Haas
On Sun, May 14, 2017 at 11:45 PM, Noah Misch  wrote:
>> I'll add this item in the PostgreSQL 10 Open Items.
>
> [Action required within three days.  This is a generic notification.]

I think there is a good argument that the existing behavior is as per
the documentation, but I think we may want to change it anyway.  What
the documentation is saying - or at least what I believe I intended
for it to say - is that connect_timeout is restarted for each new
host, so you could end up waiting longer than connect_timeout - but
not forever - if you specify multiple hosts.  And I believe that
statement to be correct. Takayuki Tsunakawa is saying something
different. He's saying that when connect_timeout expires, we should
try the next host instead of giving up. That may or may not be a good
idea, but it doesn't contradict the passage from the documentation
which he quoted.  That passage from the documentation doesn't say
anything at all about what happens when connect_timeout expires.  It
only talks about how much time might pass before that happens.

Takayuki Tsunakawa raised a very similar issue in another thread
related to another open item, namely
https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F6F5659%40G01JPEXMBYT05
in which he argued that libpq ought to try the next host after a
connection failure regardless of the reason for the connection
failure.  Tom, Michael Paquier, and I all disagreed; none of us
believe that this feature was intended to retry the connection to a
different host after an arbitrary error reported by the remote server.
This thread is essentially the same issue, except here the question
isn't what should happen after we connect to a server and it returns
an error, but rather what happens when we time out waiting to connect
to a server.  When that happens, should we give up, or try the next
server?

Despite the chorus of support for the opposite conclusion on the other
thread, I'm inclined to think that it would be best to change the
behavior here as per the proposed patch.  The point of being able to
specify multiple hosts is to be able to have multiple database servers
(or perhaps, multiple ways to access the same database server) and use
whichever one of those servers is currently up.  I think that when the
server fails with a complaint like "I've never heard of the database
to which you want to connect" that's not a case of the server being
down, but some other kind of trouble that the administrator really
ought to fix; thus it's best to stop and report the error.  But if
connect_timeout expires, that sounds a whole lot like the server being
down.  It sounds morally equivalent to socket() or connect() failing
outright, which *would* trigger advancing to the next host.

So I'm inclined to accept the patch, but as a definitional change
rather than a bug fix.  However, I'd like to hear some other opinions.
I'll wait until Friday for such opinions to arrive, and then update on
next steps.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Small improvement to compactify_tuples

2017-05-15 Thread Alvaro Herrera
Sokolov Yura wrote:
> Sokolov Yura wrote on 2017-05-15 18:23:
> > Alvaro Herrera wrote on 2017-05-15 18:04:
> > > Please add these two patches to the upcoming commitfest,
> > > https://commitfest.postgresql.org/
> > 
> > Thank you for the suggestion.
> > 
> > I've created https://commitfest.postgresql.org/14/1138/
> > As I understand it, I should attach both patches to a single email
> > to be shown correctly in the commitfest entry. So I do it with this email.

> Looks like it should be a single file.

As I understand, these patches are logically separate, so putting them
together in a single file isn't such a great idea.  If you don't edit
the patches further, then you're all set because we already have the
previously archived patches.  Next commitfest starts in a few months
yet, and if you feel the need to submit corrected versions in the
meantime, please do submit in separate files.  (Some would even argue
that each should be its own thread, but I don't think that's necessary.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Event triggers + table partitioning cause server crash in current master

2017-05-15 Thread Mark Dilger

> On May 15, 2017, at 6:49 AM, Mark Dilger  wrote:
> 
> I can confirm that this fixes the crash that I was seeing.  I have read
> through the patch briefly, but will give it a more thorough review in the
> next few hours.

My only negative comment is that your patch follows a precedent of using
event trigger commands named for AlterTable for operations other than
an ALTER TABLE command.  You clearly are not starting the precedent
here, as they are already used for IndexStmt and ViewStmt processing, but
expanding the precedent by using them in DefineRelation, where they were
not used before, seems to move in the wrong direction.  Either the functions
should be renamed to something more general to avoid implying that the
function is ALTER TABLE specific, or different functions should be defined
and used, or ...?  I am uncertain what the right solution would be.

Thoughts?

Mark Dilger




Re: [HACKERS] CTE inlining

2017-05-15 Thread Adam Brusselback
From a user's perspective:

> I think most people prefer #2 because:
>
> *  most users writing queries prefer #2
> *  most users assume full optimization and it seems natural to turn
>    _off_ an optimization via a keyword
> *  while some queries can be inlined, all queries can be materialized,
>    so doing #1 means INLINE would be only a preference, which could be
>    confusing

I completely agree with this reasoning.  I have a few queries I would have
to touch to add "MATERIALIZED", but the vast majority of CTE's in my
codebase would get a speedup. It would allow usage of CTE's more freely
than now.  I currently avoid them unless it really simplifies a query
because of the optimization fence.

Not that my opinion holds any weight, but the extra keyword for enabling
the optimization fence is my preference.  By default trying to optimize
more is a good thing IMO.

> Anyway, I am very glad we are considering addressing this in PG 11.

Seconded, this is a sore spot for me when using Postgres, and I'd love to
not have it be an issue any more.

Thanks,
-Adam


Re: [HACKERS] postgres 9.6.2 update breakage

2017-05-15 Thread Peter Eisentraut
On 5/15/17 02:48, Roel Janssen wrote:
> Ah yes, I see the point.  The problem here is that when new features are
> added to PostgreSQL, and you rely upon them in your database schemas,
> downgrading will most likely cause loss of information.
> 
> Maybe we need a wrapper script that also makes a dump of all of the
> data?  Now that could become a security hole.
> 
> Or the wrapper script warns about this situation, and recommends making
> a (extra) back-up of the database before upgrading.
> 
> Or.. the upgrade is something a user should do explicitly, basically
> giving up on the "just works" concept.  Guix already provides a nice way
> to get the previous version of the exact binaries used before the
> upgrade.

The best way to manage this with PostgreSQL is to make separate packages
for each PostgreSQL major version.  I see for example that you have
packages gcc-4.9, gcc-5, gcc-6, etc.  You should do the same with
PostgreSQL, e.g., postgresql-9.5, postgresql-9.6, postgresql-10.  Then
you don't have to concern yourselves with how "upgrades" and
"downgrades" should look for the users of your packaging system.  Minor
version upgrades are just installing the new package and restarting.
Major version upgrades are figured out by the user.

Downgrades between minor versions of the same major versions should
mostly work.  They are not well tested, if at all, but I don't think
that's all that different from downgrading any other package.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Increasing parallel workers at runtime

2017-05-15 Thread Robert Haas
On Mon, May 15, 2017 at 10:06 AM, Haribabu Kommi
 wrote:
> This still needs some adjustments for the cases where
> the main backend also does the scan instead of waiting for
> the workers to finish the job, as the worker-increasing logic
> shouldn't add overhead in this case.

I think it would be pretty crazy to try relaunching workers after
every tuple, as this patch does.  The overhead of that will be very
high for queries where the number of tuples passing through the Gather
is large, whereas when the number of tuples passing through Gather is
small, or where tuples are sent all at once at the end of processing,
it will not actually be very effective at getting hold of more
workers.  A different idea is to have an area in shared memory where
queries can advertise that they didn't get all of the workers they
wanted, plus a background process that periodically tries to launch
workers to help those queries as parallel workers become available.
It can recheck for available workers after some interval, say 10s.
There are some problems there -- the process won't have bgw_notify_pid
pointing at the parallel leader -- but I think it might be best to try
to solve those problems instead of making it the leader's job to try
to grab more workers as we go along.  For one thing, the background
process idea can attempt to achieve fairness.  Suppose there are two
processes that didn't get all of their workers; one got 3 of 4, the
other 1 of 4.  When a worker becomes available, we'd presumably like
to give it to the process that got 1 of 4, rather than having the
leaders race to see who grabs the new worker first.  Similarly if
there are four workers available and two queries that each got 1 of 5
workers they wanted, we'd like to split the workers two and two,
rather than having one leader grab all four of them.  Or at least, I
think that's what we want.
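The fairness criterion described here — always top up the query that is currently furthest behind — can be sketched as a greedy loop (the interface and names are invented for illustration):

```python
def distribute(queries, available):
    """queries: {name: (got, wanted)}.  Returns extra workers granted per
    query, always handing the next available worker to the still-short
    query that currently has the fewest workers."""
    grants = {name: 0 for name in queries}
    for _ in range(available):
        short = [(got + grants[n], n)
                 for n, (got, wanted) in queries.items()
                 if got + grants[n] < wanted]
        if not short:
            break                 # everyone has what they wanted
        _, pick = min(short)      # fewest workers so far wins
        grants[pick] += 1
    return grants
```

With one worker free and queries at 3-of-4 and 1-of-4, the worker goes to the latter; with four workers free and two queries at 1-of-5 each, they split two and two.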

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Create publication syntax is not coming properly in pg_dump / pg_dumpall

2017-05-15 Thread Tom Lane
Masahiko Sawada  writes:
> Hm, it's a bug in pg_dump. The attached patch should fix both pg_dump and
> pg_dumpall.

Pushed, thanks.

regards, tom lane




Re: [HACKERS] [POC] hash partitioning

2017-05-15 Thread Dilip Kumar
On Mon, May 15, 2017 at 9:06 PM, Dilip Kumar  wrote:
> Test2:
> postgres=# insert into x2 values(100);   -- it should violate the
> partition constraint
> INSERT 0 1
>
> Seems like a bug or am I missing something completely?

Sorry, my bad. It's the modulus applied to the hash value, not to the column.
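The mix-up is easy to model: tuple routing applies the modulus to a hash of the column value, not to the value itself, so a raw value can legitimately land in a partition whose remainder doesn't match `value % modulus`. The mixing function below is invented for the demo; PostgreSQL uses its own hash opclasses.

```python
def toy_hash(v: int) -> int:
    # Invented mixing function that scrambles the low bits as well.
    v = ((v ^ (v >> 3)) * 2654435761) & 0xFFFFFFFF
    return v ^ (v >> 16)

def routes_to(v: int, modulus: int) -> int:
    # The modulus is taken on the hash, not on the column value.
    return toy_hash(v) % modulus
```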


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [POC] hash partitioning

2017-05-15 Thread Robert Haas
On Mon, May 15, 2017 at 6:57 AM, amul sul  wrote:
>> Collation is only relevant for ordering, not equality.  Since hash
>> opclasses provide only equality, not ordering, it's not relevant here.
>> I'm not sure whether we should error out if it's specified or just
>> silently ignore it.  Maybe an ERROR is a good idea?  But not sure.
>>
> IMHO, we could simply have a WARNING, and ignore collation, thoughts?
>
> Updated patches attached.

I think that WARNING is rarely a good compromise between ERROR and
nothing.  I think we should just decide whether this is legal (and
then allow it without a WARNING) or not legal (and then ERROR).
Telling the user that it's allowed but we don't like it doesn't really
help much.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Small improvement to compactify_tuples

2017-05-15 Thread Sokolov Yura

Sokolov Yura wrote on 2017-05-15 18:23:

Alvaro Herrera wrote on 2017-05-15 18:04:

Please add these two patches to the upcoming commitfest,
https://commitfest.postgresql.org/


Thank you for the suggestion.

I've created https://commitfest.postgresql.org/14/1138/
As I understand it, I should attach both patches to a single email
to be shown correctly in the commitfest entry. So I do it with this email.

Please correct me if I am doing something wrong.

With regards.


Looks like it should be a single file.

--
Sokolov Yura aka funny.falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

From 231023298aad3024e31b487c8f5d2a6e68ad1da9 Mon Sep 17 00:00:00 2001
From: Sokolov Yura 
Date: Mon, 15 May 2017 14:23:39 +0300
Subject: [PATCH 1/2] Improve compactify_tuples

Items passed to compactify_tuples are almost sorted, so calling the
general-purpose qsort incurs unnecessary function-call overhead
(swapfunc, med, itemoffcompare).

This patch implements a bucket sort:
- one pass of stable prefix sort on the high 8 bits of the offset
- insertion sort for buckets larger than 1 element

For smaller arrays, Shell sort is used instead.

Insertion and Shell sort are implemented using macros.

This approach saves 3% of CPU in a degenerate case (highly intensive
HOT random updates on an unlogged table with synchronous_commit=off),
and speeds up WAL replay (as found by Heikki Linnakangas).

A similar approach was implemented by Heikki Linnakangas some time ago,
with several differences.
---
 src/backend/storage/page/bufpage.c | 78 --
 src/include/c.h| 59 
 2 files changed, 134 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index fdf045a45b..c5630e808b 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -434,6 +434,77 @@ itemoffcompare(const void *itemidp1, const void *itemidp2)
 }
 
 /*
+ * Sort an array of itemIdSort's on itemoff, descending.
+ *
+ * It uses Shell sort. Given array is small and itemoffcompare is inlined,
+ * it is much faster than call to qsort.
+ */
+static void
+sort_itemIds_small(itemIdSort itemidbase, int nitems)
+{
+	pg_shell_sort(itemIdSortData, itemidbase, nitems, itemoffcompare);
+}
+
+/*
+ * Sort an array of itemIdSort's on itemoff, descending.
+ *
+ * It uses bucket sort:
+ * - single pass of stable prefix sort on high 8 bits
+ * - and insertion sort on buckets larger than 1 element
+ */
+static void
+sort_itemIds(itemIdSort itemidbase, int nitems)
+{
+	itemIdSortData copy[Max(MaxIndexTuplesPerPage, MaxHeapTuplesPerPage)];
+#define NSPLIT 256
+#define PREFDIV (BLCKSZ / NSPLIT)
+	/* two extra elements to emulate offset on previous step */
+	uint16		count[NSPLIT + 2] = {0};
+	int			i,
+pos,
+highbits;
+
+	Assert(nitems <= MaxIndexTuplesPerPage);
+	for (i = 0; i < nitems; i++)
+	{
+		highbits = itemidbase[i].itemoff / PREFDIV;
+		count[highbits]++;
+	}
+	/* sort in decreasing order */
+	for (i = NSPLIT - 1; i != 0; i--)
+		count[i - 1] += count[i];
+
+	/*
+	 * count[k+1] is start of bucket k, count[k] is end of bucket k, and
+	 * count[k] - count[k+1] is length of bucket k.
+	 */
+	Assert(count[0] == nitems);
+	for (i = 0; i < nitems; i++)
+	{
+		highbits = itemidbase[i].itemoff / PREFDIV;
+		pos = count[highbits + 1];
+		count[highbits + 1]++;
+		copy[pos] = itemidbase[i];
+	}
+	Assert(count[1] == nitems);
+
+	/*
+	 * count[k+2] is start of bucket k, count[k+1] is end of bucket k, and
+	 * count[k+1]-count[k+2] is length of bucket k.
+	 */
+	for (i = NSPLIT; i > 0; i--)
+	{
+		if (likely(count[i] - count[i + 1] <= 1))
+			continue;
+		pg_insertion_sort(itemIdSortData,
+		  copy + count[i + 1],
+		  count[i] - count[i + 1],
+		  itemoffcompare);
+	}
+	memcpy(itemidbase, copy, sizeof(itemIdSortData) * nitems);
+}
+
+/*
  * After removing or marking some line pointers unused, move the tuples to
  * remove the gaps caused by the removed items.
  */
@@ -444,9 +515,10 @@ compactify_tuples(itemIdSort itemidbase, int nitems, Page page)
 	Offset		upper;
 	int			i;
 
-	/* sort itemIdSortData array into decreasing itemoff order */
-	qsort((char *) itemidbase, nitems, sizeof(itemIdSortData),
-		  itemoffcompare);
+	if (nitems > 48)
+		sort_itemIds(itemidbase, nitems);
+	else
+		sort_itemIds_small(itemidbase, nitems);
 
 	upper = phdr->pd_special;
 	for (i = 0; i < nitems; i++)
diff --git a/src/include/c.h b/src/include/c.h
index fba07c651f..837940d5cf 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -962,6 +962,65 @@ typedef NameData *Name;
 #define unlikely(x) ((x) != 0)
 #endif
 
+/*
+ * pg_shell_sort - sort for small arrays with inlinable comparator.
+ * Since it is implemented as a macros it could be optimized together with
+ * comparison function.
+ * Gaps are "gap(i) = smallest prime number below e^i". They are close to
+ * Incerpi & Sedwick gaps, but looks to 
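The truncated comment above describes pg_shell_sort's gap rule. As a rough
illustration only, a generic descending Shell sort over itemoff-style keys
might look like the sketch below; the gap table {19, 7, 2, 1} and the Elem
type are guesses made up for this sketch, not the patch's actual values:

```c
#include <stdint.h>

/* Illustrative descending Shell sort for small arrays.  The gap table
 * below is a guess at the "prime near e^i" rule from the truncated
 * comment above; Elem is a stand-in for itemIdSortData. */
typedef struct { uint16_t itemoff; } Elem;

static void
shell_sort_desc(Elem *a, int n)
{
    static const int gaps[] = {19, 7, 2, 1};

    for (int g = 0; g < 4; g++)
    {
        int gap = gaps[g];

        /* gapped insertion sort; the final gap of 1 guarantees a
         * fully sorted (descending) result */
        for (int i = gap; i < n; i++)
        {
            Elem tmp = a[i];
            int  j = i;

            while (j >= gap && a[j - gap].itemoff < tmp.itemoff)
            {
                a[j] = a[j - gap];
                j -= gap;
            }
            a[j] = tmp;
        }
    }
}
```

With the final gap of 1, the last pass is a plain insertion sort, so
correctness does not depend on the earlier gaps; they only reduce the amount
of element movement.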

Re: [HACKERS] [POC] hash partitioning

2017-05-15 Thread Dilip Kumar
On Mon, May 15, 2017 at 4:27 PM, amul sul  wrote:
> Updated patches attached.

While testing the latest patch, I found some strange behaviour.

Test 1:
postgres=# create table x (a int) partition by hash(a);
CREATE TABLE
postgres=# create table x1 partition of x for values with (modulus 4,
remainder 0);
CREATE TABLE
postgres=# create table x2 partition of x for values with (modulus 4,
remainder 1);
CREATE TABLE
postgres=# insert into x values(1);
2017-05-15 20:55:20.446 IST [28045] ERROR:  no partition of relation
"x" found for row
2017-05-15 20:55:20.446 IST [28045] DETAIL:  Partition key of the
failing row contains (a) = (1).
2017-05-15 20:55:20.446 IST [28045] STATEMENT:  insert into x values(1);
ERROR:  no partition of relation "x" found for row
DETAIL:  Partition key of the failing row contains (a) = (1).

Test 2:
postgres=# insert into x2 values(100);   -- it should violate the
partition constraint
INSERT 0 1

Seems like a bug, or am I missing something completely?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Hash Functions

2017-05-15 Thread David Fetter
On Mon, May 15, 2017 at 07:48:14AM -0700, Jeff Davis wrote:
> This would mean we need to reload through the root as Andres and
> others suggested,

One refinement of this would be to traverse the partition tree,
stopping at the first place where the next branch has hash partitions,
or at any rate types which have no application-level significance, and
load from there.  This could allow for parallelizing where it's
portable to do so.

Level            Table   Partition Type

Base table:      Log     (N/A)
Next partition:  Year    (range)
Next partition:  Month   (range)
Next partition:  Day     (range)  <-- Data gets loaded no lower than here
Next partition:  *       (hash)

That last level, and any below it, doesn't have a specific name because
they're just an implementation detail, i.e. none has any
application-level meaning.

> and disable a lot of logical partitioning capabilities. I'd be a
> little worried about what we do with attaching/detaching, though.

Attaching and detaching partitions only makes sense for partitions
whose partition keys have application-level meaning anyway.

Does it make sense at this point to separate our partitions into two
categories: those which can have significance to applications, and
those which can't?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Cached plans and statement generalization

2017-05-15 Thread Robert Haas
On Wed, May 10, 2017 at 12:11 PM, Konstantin Knizhnik
 wrote:
> Robert, can you please explain why using TRY/CATCH is not safe here:
>>
>> This is definitely not a safe way of using TRY/CATCH.

This has been discussed many, many times on this mailing list before,
and I don't really want to go over it again here.  We really need a
README or some documentation about this so that we don't have to keep
answering this same question again and again.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] bumping HASH_VERSION to 3

2017-05-15 Thread Robert Haas
On Mon, May 15, 2017 at 12:08 AM, Noah Misch  wrote:
> The above-described topic is currently a PostgreSQL 10 open item.  Robert,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.

I don't believe this patch can be committed to beta1 which wraps in
just a few hours; it seems to need a bit of work.  I'll update again
by Friday based on how review unfolds this week.  I anticipate
committing something on Wednesday or Thursday unless Bruce picks this
one up.

+/* find hash and gin indexes */
+res = executeQueryOrDie(conn,
+"SELECT n.nspname, c.relname "
+"FROM   pg_catalog.pg_class c, "
+"   pg_catalog.pg_index i, "
+"   pg_catalog.pg_am a, "
+"   pg_catalog.pg_namespace n "
+"WHERE  i.indexrelid = c.oid AND "
+"   c.relam = a.oid AND "
+"   c.relnamespace = n.oid AND "
+"   a.amname = 'hash'"

Comment doesn't match code.

+if (!check_mode && db_used)
+/* mark hash indexes as invalid */
+PQclear(executeQueryOrDie(conn,

Given that we have a comment here, I'd put curly braces around this block.

+snprintf(output_path, sizeof(output_path), "reindex_hash.sql");

This looks suspiciously pointless.  The contents of output_path will
always be precisely "reindex_hash.sql"; you could just write char
*output_path = "reindex_hash.sql" and get the same effect.  Compare
this to the logic in create_script_for_cluster_analyze, which prepends
SCRIPT_PREFIX.  (I am not sure why it's necessary to prepend "./" on
Windows, but if it's needed in that case, maybe it's needed in this
case, too.)

+start_postmaster(&old_cluster, true);
+old_9_6_invalidate_hash_indexes(&old_cluster, false);
+stop_postmaster(false);

Won't this cause the hash indexes to be invalidated in the old cluster
rather than the new one?

This might need a visit from pgindent in one or two places, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Small improvement to compactify_tuples

2017-05-15 Thread Sokolov Yura

Alvaro Herrera wrote 2017-05-15 18:04:

Please add these two patches to the upcoming commitfest,
https://commitfest.postgresql.org/


Thank you for the suggestion.

I've created https://commitfest.postgresql.org/14/1138/
As I understand it, I should attach both patches to a single email
so that they show up correctly in the commitfest entry, so I am doing
that with this email.

Please correct me if I am doing something wrong.

With regards.
--
Sokolov Yura aka funny.falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

From 2cde4cb6b0c4c5868d99e13789b0ac33364d7315 Mon Sep 17 00:00:00 2001
From: Sokolov Yura 
Date: Mon, 15 May 2017 16:04:14 +0300
Subject: [PATCH 2/2] bufpage.c: simplify PageRepairFragmentation

On the assumption that a page usually does not become empty, merge the
second loop body (collecting items with storage) into the first
(counting the kinds of items).
---
 src/backend/storage/page/bufpage.c | 46 +++---
 1 file changed, 18 insertions(+), 28 deletions(-)

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index c5630e808b..61738f241f 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -555,10 +555,11 @@ PageRepairFragmentation(Page page)
 	Offset		pd_special = ((PageHeader) page)->pd_special;
 	ItemId		lp;
 	int			nline,
-nstorage,
 nunused;
 	int			i;
 	Size		totallen;
+	itemIdSortData itemidbase[MaxHeapTuplesPerPage];
+	itemIdSort	itemidptr = itemidbase;
 
 	/*
 	 * It's worth the trouble to be more paranoid here than in most places,
@@ -578,14 +579,26 @@ PageRepairFragmentation(Page page)
 		pd_lower, pd_upper, pd_special)));
 
 	nline = PageGetMaxOffsetNumber(page);
-	nunused = nstorage = 0;
+	nunused = totallen = 0;
 	for (i = FirstOffsetNumber; i <= nline; i++)
 	{
 		lp = PageGetItemId(page, i);
 		if (ItemIdIsUsed(lp))
 		{
 			if (ItemIdHasStorage(lp))
-nstorage++;
+			{
+itemidptr->offsetindex = i - 1;
+itemidptr->itemoff = ItemIdGetOffset(lp);
+if (unlikely(itemidptr->itemoff < (int) pd_upper ||
+			 itemidptr->itemoff >= (int) pd_special))
+	ereport(ERROR,
+			(errcode(ERRCODE_DATA_CORRUPTED),
+			 errmsg("corrupted item pointer: %u",
+	itemidptr->itemoff)));
+itemidptr->alignedlen = MAXALIGN(ItemIdGetLength(lp));
+totallen += itemidptr->alignedlen;
+itemidptr++;
+			}
 		}
 		else
 		{
@@ -595,7 +608,7 @@ PageRepairFragmentation(Page page)
 		}
 	}
 
-	if (nstorage == 0)
+	if (itemidptr == itemidbase)
 	{
 		/* Page is completely empty, so just reset it quickly */
 		((PageHeader) page)->pd_upper = pd_special;
@@ -603,36 +616,13 @@ PageRepairFragmentation(Page page)
 	else
 	{
 		/* Need to compact the page the hard way */
-		itemIdSortData itemidbase[MaxHeapTuplesPerPage];
-		itemIdSort	itemidptr = itemidbase;
-
-		totallen = 0;
-		for (i = 0; i < nline; i++)
-		{
-			lp = PageGetItemId(page, i + 1);
-			if (ItemIdHasStorage(lp))
-			{
-itemidptr->offsetindex = i;
-itemidptr->itemoff = ItemIdGetOffset(lp);
-if (itemidptr->itemoff < (int) pd_upper ||
-	itemidptr->itemoff >= (int) pd_special)
-	ereport(ERROR,
-			(errcode(ERRCODE_DATA_CORRUPTED),
-			 errmsg("corrupted item pointer: %u",
-	itemidptr->itemoff)));
-itemidptr->alignedlen = MAXALIGN(ItemIdGetLength(lp));
-totallen += itemidptr->alignedlen;
-itemidptr++;
-			}
-		}
-
 		if (totallen > (Size) (pd_special - pd_lower))
 			ereport(ERROR,
 	(errcode(ERRCODE_DATA_CORRUPTED),
 			   errmsg("corrupted item lengths: total %u, available space %u",
 	  (unsigned int) totallen, pd_special - pd_lower)));
 
-		compactify_tuples(itemidbase, nstorage, page);
+		compactify_tuples(itemidbase, (int) (itemidptr - itemidbase), page);
 	}
 
 	/* Set hint bit for PageAddItem */
-- 
2.11.0

From 8f75fcaba0590d5150dd956a60feee5c28f4caab Mon Sep 17 00:00:00 2001
From: Sokolov Yura 
Date: Mon, 15 May 2017 14:23:39 +0300
Subject: [PATCH] storage/page/bufpage.c: improve compactify_tuples

Items passed to compactify_tuples are almost sorted, so calling the
general-purpose qsort incurs unnecessary function-call overhead
(swapfunc, med, itemoffcompare).

This patch implements a bucket sort:
- one pass of stable prefix sort on the high 8 bits of the offset
- insertion sort for buckets larger than 1 element

For smaller arrays, Shell sort is used instead.

Insertion and Shell sort are implemented using macros.

This approach saves 3% of CPU in a degenerate case (highly intensive
HOT random updates on an unlogged table with synchronous_commit=off),
and speeds up WAL replay (as found by Heikki Linnakangas).

A similar approach was implemented by Heikki Linnakangas some time ago,
with several differences.
---
 src/backend/storage/page/bufpage.c | 78 --
 src/include/c.h| 59 
 2 files changed, 134 insertions(+), 3 

Re: [HACKERS] Small improvement to compactify_tuples

2017-05-15 Thread Alvaro Herrera
Please add these two patches to the upcoming commitfest,
https://commitfest.postgresql.org/

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Hash Functions

2017-05-15 Thread Bruce Momjian
On Mon, May 15, 2017 at 07:32:30AM -0700, Jeff Davis wrote:
> On Sun, May 14, 2017 at 8:00 PM, Bruce Momjian  wrote:
> > Do we even know that floats are precise enough to determine the
> > partition.  For example, if you have 6.1, is it possible for
> > that to be 5.999 on some systems?  Are IEEE systems all the same for
> > these values?  I would say we should disallow any approximate data type
> > for partitioning completely.
> 
> I'm inclined in this direction, as well. Hash partitioning is mostly
> useful for things that are likely to be join keys or group keys, and
> floats aren't. Same for complex user-defined types.
> 
> The real problem here is what Tom pointed out: that we would have
> trouble hashing strings in an encoding-insensitive way. Strings are
> useful as join/group keys, so it would be painful to not support them.

Well, since we can't mix encodings in the same column, why can't we just
hash the binary representation of the string?  My point is that with
hashing we aren't comparing one string with another, right?
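The point can be illustrated with any byte-oriented hash - FNV-1a is used
below purely as a stand-in (it is not what PostgreSQL's hash opclasses use).
The hash sees only bytes, so the same logical character in two encodings
(for example 'é' as UTF-8 0xC3 0xA9 versus LATIN1 0xE9) produces different
hashes, which is exactly the encoding-portability concern raised earlier:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustration only: hash the binary representation of a string with
 * FNV-1a (32-bit).  Any byte-oriented hash behaves the same way: it
 * sees only bytes, so the same text in two encodings yields different
 * hash values. */
static uint32_t
fnv1a(const unsigned char *buf, size_t len)
{
    uint32_t h = 2166136261u;       /* FNV offset basis */

    for (size_t i = 0; i < len; i++)
    {
        h ^= buf[i];
        h *= 16777619u;             /* FNV prime */
    }
    return h;
}
```

This is why hashing raw bytes works fine within one database (one encoding
per column) but makes hash values non-portable across encodings.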

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Hash Functions

2017-05-15 Thread Jeff Davis
On Sun, May 14, 2017 at 6:22 PM, Robert Haas  wrote:
> You'd have to prohibit a heck of a lot more than that in order for
> this to work 100% reliably.  You'd have to prohibit CHECK constraints,
> triggers, rules, RLS policies, and UNIQUE indexes, at the least.  You
> might be able to convince me that some of those things are useless
> when applied to partitions, but wanting a CHECK constraint that
> applies to only one partition seems pretty reasonable (e.g. CHECK that
> records for older years are all in the 'inactive' state, or whatever).
> I think getting this to work 100% reliably in all cases would require
> an impractically large hammer.

The more I think about it the more I think hash partitions are
"semi-logical". A check constraint on a single partition of a
range-partitioned table makes sense, but it doesn't make sense on a
single partition of a hash-partitioned table.

I think hash partitioning is mainly useful for parallel query (so the
optimizer needs to know about it) and maintenance commands (as you
listed in another email). But those don't need hash portability.

FKs are a little more interesting, but I actually think they still
work regardless of hash portability. If the two sides are in the same
hash opfamily, they should hash the same and it's fine. And if they
aren't, the FK has no hope of working regardless of hash portability.

This would mean we need to reload through the root as Andres and
others suggested, and disable a lot of logical partitioning
capabilities. I'd be a little worried about what we do with
attaching/detaching, though.

Regards,
Jeff Davis




Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression

2017-05-15 Thread Peter Eisentraut
On 5/10/17 09:12, Michael Paquier wrote:
> Looking at 0001 and 0002... So you are correctly blocking nextval()
> when ALTER SEQUENCE holds a lock on the sequence object. And
> concurrent calls of nextval() don't conflict. As far as I can see this
> matches the implementation of 3.
> 
> Here are some minor comments.

Committed after working in your comments.  Thanks!

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Hash Functions

2017-05-15 Thread Jeff Davis
On Sun, May 14, 2017 at 8:00 PM, Bruce Momjian  wrote:
> Do we even know that floats are precise enough to determine the
> partition.  For example, if you have 6.1, is it possible for
> that to be 5.999 on some systems?  Are IEEE systems all the same for
> these values?  I would say we should disallow any approximate data type
> for partitioning completely.

I'm inclined in this direction, as well. Hash partitioning is mostly
useful for things that are likely to be join keys or group keys, and
floats aren't. Same for complex user-defined types.

The real problem here is what Tom pointed out: that we would have
trouble hashing strings in an encoding-insensitive way. Strings are
useful as join/group keys, so it would be painful to not support them.

Perhaps there's some kind of compromise, like a pg_dump mode that
reloads through the root. Or maybe hash partitions are really a
"semi-logical" partitioning that the optimizer understands, but where
things like per-partition check constraints don't make sense.

Regards,
Jeff Davis






Re: [HACKERS] Increasing parallel workers at runtime

2017-05-15 Thread Dilip Kumar
On Mon, May 15, 2017 at 7:36 PM, Haribabu Kommi
 wrote:
>
> The wait for workers to send tuples may be
> because of too few workers, so increasing
> the parallel workers may improve the performance.

Yes, for the cases where a significant amount of work is still
pending, but it might hurt performance where the work is nearly
complete.

@@ -346,6 +346,38 @@ gather_readnext(GatherState *gatherstate)
  if (nvisited >= gatherstate->nreaders)
  {
  /*
+ * As we are going to wait for the workers to send tuples, this may be
+ * possible because of not sufficient workers that are planned?
+ * Does the gather have all the required parallel workers? If not try to get
+ * some more workers (only when all the previously allocated workers are still
+ * doing the job) before we wait, this will further increase the performance
+ * of the query as planned.
+ */
You might want to do similar changes for gather_merge_readnext.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] Increasing parallel workers at runtime

2017-05-15 Thread Haribabu Kommi
In the current parallel implementation, if the planned number of
workers is not available at the start of query execution, the query
runs with the available number of workers until it finishes.

It is possible that the required number of workers becomes available
while the query is being processed, so how about increasing the number
of parallel workers up to the planned number (only when the main
backend is about to wait for the workers to send tuples, or has to
process the plan by itself)?

The wait for workers to send tuples may be because there are too few
workers, so increasing the number of parallel workers may improve
performance.

POC Patch attached for the same.

This still needs some adjustment for the cases where the main backend
also performs the scan instead of waiting for the workers to finish
the job, as the worker-increasing logic shouldn't add overhead in that
case.

Regards,
Hari Babu
Fujitsu Australia


parallel_workers_increase_at_runtime.patch
Description: Binary data



Re: [HACKERS] Receive buffer size for the statistics socket

2017-05-15 Thread Tom Lane
Heikki Linnakangas  writes:
> On 05/14/2017 09:54 PM, Tom Lane wrote:
>> A further idea is that maybe backends should be tweaked to avoid
>> blasting large amounts of data at the stats collector in one go.
>> That would require more thought to design, though.

> The data is already sent in small < 1 kB messages, I don't see what more 
> we can do on the sender side to avoid overwhelming the receiver. Except 
> reduce the amount of data sent overall. But that only goes so far, we 
> cannot eliminate the problem altogether, unless we also lose some detail.

I was wondering about some sort of rate-throttling on the messages.
For instance, stop after sending X kilobytes, leaving remaining counts
to be sent at the next opportunity.  (Although it's not clear if you'd
ever catch up :-(.)  Or just a short sleep after every X kilobytes, to
make it more probable that the stats collector gets to run and collect
the data.
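A minimal sketch of the "short sleep after every X kilobytes" variant, with
hypothetical helper names (the real pgstat sender works on UDP datagrams and
has no such helpers); it only counts how many pauses a given message stream
would incur:

```c
#include <stddef.h>

/* Sketch of "short sleep after every X kilobytes": walk the pending
 * stats messages and pause once each THROTTLE_BYTES budget is used up,
 * giving the collector a chance to drain its receive buffer.
 * send_one()/pause_briefly() are hypothetical stand-ins. */
#define THROTTLE_BYTES (16 * 1024)

static int
count_pauses(const size_t *msg_sizes, int nmsgs)
{
    size_t sent_since_pause = 0;
    int    pauses = 0;

    for (int i = 0; i < nmsgs; i++)
    {
        /* send_one(msg_sizes[i]) would transmit the message here */
        sent_since_pause += msg_sizes[i];
        if (sent_since_pause >= THROTTLE_BYTES)
        {
            /* pause_briefly() would sleep here */
            pauses++;
            sent_since_pause = 0;
        }
    }
    return pauses;
}
```

The open question in the thread remains: whether throttling lets the
collector catch up, or merely defers the overflow.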

regards, tom lane




Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables

2017-05-15 Thread Ildus Kurbangaliev
On Mon, 15 May 2017 17:43:52 +0530
Ashutosh Bapat  wrote:

> On Mon, May 15, 2017 at 2:46 PM, Ildus Kurbangaliev
>  wrote:
> > On Mon, 15 May 2017 10:34:58 +0530
> > Dilip Kumar  wrote:
> >  
> >> On Sun, May 14, 2017 at 9:54 PM, Dilip Kumar
> >>  wrote:  
> >> > After your fix, now tupleid is invalid which is expected, but
> >> > seems like we need to do something more. As per the comments
> >> > seems like it is expected to get the oldtuple from planSlot.
> >> > But I don't see any code for handling that part.  
> >>
> >> Maybe we should do something like attached patch.
> >>  
> >
> > Hi,
> > planSlot already contains a projected tuple; you can't use it as the
> > oldtuple. I think the problem is that `rewriteTargetListUD` is called
> > only for the parent relation, so there is no `wholerow` attribute for
> > foreign tables.
> 
> Yes. postgresAddForeignUpdateTargets() which is called by
> rewriteTargetListUD injects "ctid". "wholerow" is always there. Not
> for postgres_fdw, but for other wrappers it might be bad news. ctid,
> whole row obtained from the remote postgres server will fit the tuple
> descriptor of parent, but for other FDWs the column injected by
> rewriteTargetListUD() may make the child tuple look different from
> that of the parent, so we may not pass that column down to the child.
> 

I'm trying to say that when we have a regular table as the parent and a
foreign table as a child, `wholerow` won't be added in
rewriteTargetListUD, because rewriteTargetListUD will be called only for
the parent relation. You can see that by running the script I provided
in the first message of this thread.

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] Event triggers + table partitioning cause server crash in current master

2017-05-15 Thread Mark Dilger

> On May 14, 2017, at 11:02 PM, Amit Langote  
> wrote:
> 
> On 2017/05/14 12:03, Mark Dilger wrote:
>> Hackers,
>> 
>> I discovered a reproducible crash using event triggers in the current
>> development version, 29c7d5e483acaa74a0d06dd6c70b320bb315.
>> I was getting a crash before this version, and cloned a fresh copy of
>> the sources to be sure I was up to date, so I don't think the bug can be
>> attributed to Andres' commit.  (The prior version I was testing against
>> was heavily modified by me, so I recreated the bug using the latest
>> standard, unmodified sources.)
>> 
>> I create both before and after event triggers early in the regression test
>> schedule, which then fire here and there during the following tests, leading
>> fairly reproducibly to the server crashing somewhere during the test suite.
>> These crashes do not happen for me without the event triggers being added
>> to the tests.  Many tests show as 'FAILED' simply because the logging
>> that happens in the event triggers creates unexpected output for the test.
>> Those "failures" are expected.  The server crashes are not.
>> 
>> The server logs suggest the crashes might be related to partitioned tables.
>> 
>> Please find attached the patch that includes my changes to the sources
>> for recreating this bug.  The logs and regression.diffs are a bit large; let
>> me know if you need them.
>> 
>> I built using the command
>> 
>> ./configure --enable-cassert --enable-tap-tests && make -j4 && make check
> 
> Thanks for the report and providing steps to reproduce.
> 
> It seems that it is indeed a bug related to creating range-partitioned
> tables.  DefineRelation() calls AlterTableInternal() to add NOT NULL
> constraints on the range partition key columns, but the code fails to
> first initialize the event trigger context information.  Attached patch
> should fix that.
> 
> Thanks to the above test case, I also discovered that in the case of
> creating a partition, manipulations performed by MergeAttributes() on the
> input schema list may cause it to become invalid, that is, the List
> metadata (length) will no longer match the reality, because while the
> ListCells are deleted from the input list, the List pointer passed to
> list_delete_cell does not point to the same list.  This caused a crash
> when the CreateStmt in question was subsequently passed to copyObject,
> which tried to access CreateStmt.tableElts that has become invalid as just
> described.  The attached patch also takes care of that.

I can confirm that this fixes the crash that I was seeing.  I have read
through the patch briefly, but will give it a more thorough review in the
next few hours.

Many thanks for your attention on this!

Mark Dilger



Re: [HACKERS] Small improvement to compactify_tuples

2017-05-15 Thread Sokolov Yura

Sokolov Yura wrote 2017-05-15 15:08:

Heikki Linnakangas wrote 2017-05-15 12:06:

On 05/14/2017 09:47 PM, Sokolov Yura wrote:

Good day, everyone.

I've been playing a bit with unlogged tables - just random updates on a
simple key-value table.  I've noticed the amount of CPU spent in
compactify_tuples (called by PageRepairFragmentation).  Most of the time
was spent in the qsort of itemidbase items.


Ah, I played with this too a couple of years ago, see
https://www.postgresql.org/message-id/546B89DE.7030906%40vmware.com,
but got distracted by other things and never got around to committing
it.


The itemidbase array is bounded by the number of tuples in a page, and
the itemIdSortData structure is simple, so a specialized version could
be a better choice.

The attached patch adds a combination of one pass of prefix sort with
insertion sort for larger arrays, and Shell sort for smaller arrays.
Insertion sort and Shell sort are implemented as macros and can be
reused.


Cool! Could you compare that against the bucket sort I posted in the
above thread, please?

At a quick glance, your "prefix sort" seems to be the same
algorithm as the bucket sort that I implemented. You chose 256
buckets, where I picked 32. And you're adding a Shell sort
implementation for small arrays, while I used a straight insertion
sort. Not sure what these differences mean in practice.

- Heikki


Thank you for attention.

My first version of the big-page sort was almost exactly the same as
yours.  I had a bug in my insertion sort, so I had to refactor it.
(The bug has been fixed.)

I found that the items in itemidbase are almost sorted, so it is
important to try to keep their order in the prefix sort.  So I've
changed --count[i] to count[i+1]++.
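The stability trick described above can be sketched in isolation: after
building suffix sums, count[b+1] points at the start of bucket b (in
descending bucket order), and post-incrementing it appends elements in input
order.  The Item type and function name below are made up for this sketch:

```c
#include <stdint.h>

/* Illustration of the count[i+1]++ stability trick: a one-pass
 * "prefix sort" on the high 8 bits of a 16-bit key, descending. */
#define NBUCKETS 256

typedef struct { uint16_t off; int payload; } Item;

static void
prefix_sort_desc(const Item *items, int n, Item *out)
{
    uint16_t count[NBUCKETS + 2] = {0};
    int i;

    for (i = 0; i < n; i++)
        count[items[i].off >> 8]++;

    /* suffix sums: count[b] = number of elements in buckets b..255,
     * so count[b + 1] is the start of bucket b in descending order */
    for (i = NBUCKETS - 1; i != 0; i--)
        count[i - 1] += count[i];

    /* post-increment appends each element at the end of its bucket in
     * input order, keeping the pass stable */
    for (i = 0; i < n; i++)
        out[count[(items[i].off >> 8) + 1]++] = items[i];
}
```

Placing via --count[b] instead would fill each bucket back to front and
reverse the input order within a bucket, which is why the count[i+1]++ form
is the stable one.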

And it looks like it is better to have more buckets:
- with 256 buckets, the size of a single bucket is almost always less
than 2, so the array is almost always sorted after the prefix sort pass.

But it looks like it is better to sort each bucket separately, as you
did, and as it was in my early version.

I also used your names for the functions and some of the comments.

I attached a new version of the patch.

I left the memcpy intact because it doesn't seem to take noticeable
CPU time.
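The shell sort for smaller arrays mentioned earlier could be expressed as a reusable macro roughly like this (names are made up for the example; the patch's actual macro may differ):

```c
#include <assert.h>

/*
 * Illustrative sketch of a shell sort as a reusable macro, so the
 * comparison expression is inlined at each use site instead of going
 * through a function pointer as qsort does.  Names are made up for
 * the example, not taken from the patch.
 */
#define PG_SHELL_SORT(type, a, n, cmp) \
	do { \
		int		_gap, _i, _j; \
		type	_tmp; \
		for (_gap = (n) / 2; _gap > 0; _gap /= 2) \
			for (_i = _gap; _i < (n); _i++) \
			{ \
				_tmp = (a)[_i]; \
				for (_j = _i; \
					 _j >= _gap && cmp((a)[_j - _gap], _tmp) > 0; \
					 _j -= _gap) \
					(a)[_j] = (a)[_j - _gap]; \
				(a)[_j] = _tmp; \
			} \
	} while (0)

/* an example comparator: ascending order for plain ints */
#define INT_CMP_ASC(x, y)	((x) - (y))
```

Because the comparator is a macro, each instantiation compiles to a tight loop with no indirect calls, which is where the speedup over qsort on tiny arrays comes from.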


As a follow-up, I propose to simplify PageRepairFragmentation in the
attached patch.

--
Sokolov Yura aka funny.falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

From 2cde4cb6b0c4c5868d99e13789b0ac33364d7315 Mon Sep 17 00:00:00 2001
From: Sokolov Yura 
Date: Mon, 15 May 2017 16:04:14 +0300
Subject: [PATCH 2/2] bufpage.c: simplify PageRepairFragmentation

On the assumption that a page usually doesn't become empty, merge the
second loop body (collecting the items with storage) into the first
(counting the kinds of items).
---
 src/backend/storage/page/bufpage.c | 46 +++---
 1 file changed, 18 insertions(+), 28 deletions(-)

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index c5630e808b..61738f241f 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -555,10 +555,11 @@ PageRepairFragmentation(Page page)
 	Offset		pd_special = ((PageHeader) page)->pd_special;
 	ItemId		lp;
 	int			nline,
-nstorage,
 nunused;
 	int			i;
 	Size		totallen;
+	itemIdSortData itemidbase[MaxHeapTuplesPerPage];
+	itemIdSort	itemidptr = itemidbase;
 
 	/*
 	 * It's worth the trouble to be more paranoid here than in most places,
@@ -578,14 +579,26 @@ PageRepairFragmentation(Page page)
 		pd_lower, pd_upper, pd_special)));
 
 	nline = PageGetMaxOffsetNumber(page);
-	nunused = nstorage = 0;
+	nunused = totallen = 0;
 	for (i = FirstOffsetNumber; i <= nline; i++)
 	{
 		lp = PageGetItemId(page, i);
 		if (ItemIdIsUsed(lp))
 		{
 			if (ItemIdHasStorage(lp))
-nstorage++;
+			{
+itemidptr->offsetindex = i - 1;
+itemidptr->itemoff = ItemIdGetOffset(lp);
+if (unlikely(itemidptr->itemoff < (int) pd_upper ||
+			 itemidptr->itemoff >= (int) pd_special))
+	ereport(ERROR,
+			(errcode(ERRCODE_DATA_CORRUPTED),
+			 errmsg("corrupted item pointer: %u",
+	itemidptr->itemoff)));
+itemidptr->alignedlen = MAXALIGN(ItemIdGetLength(lp));
+totallen += itemidptr->alignedlen;
+itemidptr++;
+			}
 		}
 		else
 		{
@@ -595,7 +608,7 @@ PageRepairFragmentation(Page page)
 		}
 	}
 
-	if (nstorage == 0)
+	if (itemidptr == itemidbase)
 	{
 		/* Page is completely empty, so just reset it quickly */
 		((PageHeader) page)->pd_upper = pd_special;
@@ -603,36 +616,13 @@ PageRepairFragmentation(Page page)
 	else
 	{
 		/* Need to compact the page the hard way */
-		itemIdSortData itemidbase[MaxHeapTuplesPerPage];
-		itemIdSort	itemidptr = itemidbase;
-
-		totallen = 0;
-		for (i = 0; i < nline; i++)
-		{
-			lp = PageGetItemId(page, i + 1);
-			if (ItemIdHasStorage(lp))
-			{
-itemidptr->offsetindex = i;
-itemidptr->itemoff = ItemIdGetOffset(lp);
-if (itemidptr->itemoff < (int) pd_upper ||
-	itemidptr->itemoff >= (int) pd_special)
-	

Re: [HACKERS] NOT NULL constraints on range partition key columns

2017-05-15 Thread Amit Kapila
On Mon, May 15, 2017 at 11:46 AM, Amit Langote
 wrote:
> Starting a new thread to discuss the proposal I put forward in [1] to stop
> creating explicit NOT NULL constraint on range partition keys that are
> simple columns.  I said the following:
>
> On 2017/05/12 11:20, Robert Haas wrote:
>> On Thu, May 11, 2017 at 10:15 PM, Amit Langote
>>  wrote:
>>> On 2017/05/12 10:42, Robert Haas wrote:
 Actually, I think that not supporting nulls for range partitioning may
 have been a fairly bad decision.
>>>
>>> I think the relevant discussion concluded [1] that way, because we
>>> couldn't decide which interface to provide for specifying where NULLs are
>>> placed or because we decided to think about it later.
>>
>> Yeah, but I have a feeling that marking the columns NOT NULL is going
>> to make it really hard to support that in the future when we get the
>> syntax hammered out.  If it had only affected the partition
>> constraints that'd be different.
>
> So, adding keycol IS NOT NULL (like we currently do for expressions) in
> the implicit partition constraint would be more future-proof than
> generating an actual catalogued NOT NULL constraint on the keycol?  I now
> tend to think it would be better.  Directly inserting into a range
> partition with a NULL value for a column currently generates a "null value
> in column \"%s\" violates not-null constraint" instead of perhaps more
> relevant "new row for relation \"%s\" violates partition constraint".
> That said, we *do* document the fact that a NOT NULL constraint is added
> on range key columns, but we might as well document instead that we don't
> currently support routing tuples with NULL values in the partition key
> through a range-partitioned table and so NULL values cause error.
>

Can't we allow NULL to be inserted into the partition (leaf
partition) if the user uses the partition name in the INSERT statement?
For root partitions, I think giving an error is okay for now, but once
we have default partitions (Rahila's patch), we can route NULLs to the
default partition.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres 9.6.2 update breakage

2017-05-15 Thread Roel Janssen

Jan Nieuwenhuizen writes:

> Roel Janssen writes:
>
>> So, it would be something like:
>> postgres pg_upgrade \
>> ...
>
> It's great to have a recipe `that works', so thanks!
>
> However, whether or not we automate this, I cannot help wondering if
> we should support downgrading -- at least to the previous version
> in this case?
>
> If I'm not mistaken, everything else in GuixSD will run if I select a
> previous system generation in Grub...except for this?
>
> Is involving postgres developers an option? I'm sure at least one of
> the postgresql hackers [cc] is already looking at Guix[SD].
>
> Greetings,
> janneke

Ah yes, I see the point.  The problem here is that when new features are
added to PostgreSQL, and you rely upon them in your database schemas,
downgrading will most likely cause loss of information.

Maybe we need a wrapper script that also makes a dump of all of the
data?  Now that could become a security hole.

Or the wrapper script warns about this situation, and recommends making
a (extra) back-up of the database before upgrading.

Or.. the upgrade is something a user should do explicitly, basically
giving up on the "just works" concept.  Guix already provides a nice way
to get the previous version of the exact binaries used before the
upgrade.

Kind regards,
Roel Janssen





Re: [HACKERS] Small improvement to compactify_tuples

2017-05-15 Thread Sokolov Yura


--
Sokolov Yura aka funny.falcon
Postgres Professional: https://postgrespro.ru
The Russian PostgreSQL Company

From 8f75fcaba0590d5150dd956a60feee5c28f4caab Mon Sep 17 00:00:00 2001
From: Sokolov Yura 
Date: Mon, 15 May 2017 14:23:39 +0300
Subject: [PATCH] storage/page/bufpage.c: improve compactify_tuples

Items passed to compactify_tuples are almost sorted, so a call to the
general qsort incurs unnecessary overhead from function calls
(swapfunc, med, itemoffcompare).

This patch implements a bucket sort:
- one pass of stable prefix sort on the high 8 bits of the offset
- and insertion sort for buckets larger than 1 element

For smaller arrays, shell sort is used instead.

Insertion sort and shell sort are implemented as macros.

This approach saves about 3% of CPU in a degenerate case (highly
intensive HOT random updates on an unlogged table with
synchronous_commit = off), and speeds up WAL replay (as found by
Heikki Linnakangas).

The same approach was implemented by Heikki Linnakangas some time ago,
with several differences.
---
 src/backend/storage/page/bufpage.c | 78 --
 src/include/c.h| 59 
 2 files changed, 134 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index fdf045a45b..c5630e808b 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -434,6 +434,77 @@ itemoffcompare(const void *itemidp1, const void *itemidp2)
 }
 
 /*
+ * Sort an array of itemIdSort's on itemoff, descending.
+ *
+ * It uses Shell sort. Given array is small and itemoffcompare is inlined,
+ * it is much faster than call to qsort.
+ */
+static void
+sort_itemIds_small(itemIdSort itemidbase, int nitems)
+{
+	pg_shell_sort(itemIdSortData, itemidbase, nitems, itemoffcompare);
+}
+
+/*
+ * Sort an array of itemIdSort's on itemoff, descending.
+ *
+ * It uses bucket sort:
+ * - single pass of stable prefix sort on high 8 bits
+ * - and insertion sort on buckets larger than 1 element
+ */
+static void
+sort_itemIds(itemIdSort itemidbase, int nitems)
+{
+	itemIdSortData copy[Max(MaxIndexTuplesPerPage, MaxHeapTuplesPerPage)];
+#define NSPLIT 256
+#define PREFDIV (BLCKSZ / NSPLIT)
+	/* two extra elements to emulate offset on previous step */
+	uint16		count[NSPLIT + 2] = {0};
+	int			i,
+pos,
+highbits;
+
+	Assert(nitems <= MaxIndexTuplesPerPage);
+	for (i = 0; i < nitems; i++)
+	{
+		highbits = itemidbase[i].itemoff / PREFDIV;
+		count[highbits]++;
+	}
+	/* sort in decreasing order */
+	for (i = NSPLIT - 1; i != 0; i--)
+		count[i - 1] += count[i];
+
+	/*
+	 * count[k+1] is start of bucket k, count[k] is end of bucket k, and
+	 * count[k] - count[k+1] is length of bucket k.
+	 */
+	Assert(count[0] == nitems);
+	for (i = 0; i < nitems; i++)
+	{
+		highbits = 

Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables

2017-05-15 Thread Ashutosh Bapat
On Mon, May 15, 2017 at 6:04 PM, Dilip Kumar  wrote:
> On Mon, May 15, 2017 at 5:43 PM, Ashutosh Bapat
>  wrote:
>> Yes. postgresAddForeignUpdateTargets() which is called by
>> rewriteTargetListUD injects "ctid". "wholerow" is always there. Not
>> for postgres_fdw, but for other wrappers it might be bad news. ctid,
>> whole row obtained from the remote postgres server will fit the tuple
>> descriptor of parent, but for other FDWs the column injected by
>> rewriteTargetListUD() may make the child tuple look different from
>> that of the parent, so we may not pass that column down to the child.
>
> But can't we call rewriteTargetListUD for all the inheritors if the
> inheritor is a foreign table, which will in turn call
> postgresAddForeignUpdateTargets?
>
Yes. But it's not necessary for all children to use the same FDW.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables

2017-05-15 Thread Dilip Kumar
On Mon, May 15, 2017 at 5:43 PM, Ashutosh Bapat
 wrote:
> Yes. postgresAddForeignUpdateTargets() which is called by
> rewriteTargetListUD injects "ctid". "wholerow" is always there. Not
> for postgres_fdw, but for other wrappers it might be bad news. ctid,
> whole row obtained from the remote postgres server will fit the tuple
> descriptor of parent, but for other FDWs the column injected by
> rewriteTargetListUD() may make the child tuple look different from
> that of the parent, so we may not pass that column down to the child.

But can't we call rewriteTargetListUD for all the inheritors if the
inheritor is a foreign table, which will in turn call
postgresAddForeignUpdateTargets?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables

2017-05-15 Thread Ashutosh Bapat
On Mon, May 15, 2017 at 2:46 PM, Ildus Kurbangaliev
 wrote:
> On Mon, 15 May 2017 10:34:58 +0530
> Dilip Kumar  wrote:
>
>> On Sun, May 14, 2017 at 9:54 PM, Dilip Kumar 
>> wrote:
>> > After your fix, now tupleid is invalid which is expected, but seems
>> > like we need to do something more. As per the comments seems like it
>> > is expected to get the oldtuple from planSlot.  But I don't see any
>> > code for handling that part.
>>
>> Maybe we should do something like attached patch.
>>
>
> Hi,
> planSlot contains an already-projected tuple, so you can't use it as
> oldtuple. I think the problem is that `rewriteTargetListUD` is called
> only for the parent relation, so there is no `wholerow` attribute for
> foreign tables.

Yes. postgresAddForeignUpdateTargets() which is called by
rewriteTargetListUD injects "ctid". "wholerow" is always there. Not
for postgres_fdw, but for other wrappers it might be bad news. ctid,
whole row obtained from the remote postgres server will fit the tuple
descriptor of parent, but for other FDWs the column injected by
rewriteTargetListUD() may make the child tuple look different from
that of the parent, so we may not pass that column down to the child.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] UPDATE of partition key

2017-05-15 Thread Robert Haas
On Fri, May 12, 2017 at 3:07 AM, Amit Kapila  wrote:
> I agree with you that it might not be straightforward to make it work,
> but now that earliest it can go is v11, do we want to try doing
> something other than just documenting it.  What I could read from this
> e-mail thread is that you are intending towards just documenting it
> for the first cut of this feature. However, both Greg and Simon are of
> opinion that we should do something about this and even patch Author
> (Amit Khandekar) has shown some inclination to do something about this
> point (return error to the user in some way), so I think we can't
> ignore this point.
>
> I think now that we have some more time, it is better to try something
> based on a couple of ideas floating in this thread to address this
> point and see if we can come up with something doable without a big
> architecture change.
>
> What is your take on this point now?

I still don't think it's worth spending a bit on this, especially not
with WARM probably gobbling up multiple bits.  Reclaiming the bits
seems like a good idea, but spending one on this still seems to me
like it's probably not the best use of our increasingly-limited supply
of infomask bits.  Now, Simon and Greg may still feel otherwise, of
course.

I could get behind providing an option to turn this behavior on and
off at the level of the partitioned table.  That would use a reloption
rather than an infomask bit, so no scarce resource is being consumed.
I suspect that most people don't update the partition keys at all (so
they don't care either way) and the ones who do are probably either
depending on EPQ (in which case they most likely want to just disallow
all UPDATE-row-movement) or not (in which case they again don't care).
If I understand correctly, the only people who will benefit from
consuming an infomask bit are the people who update their partition
keys AND depend on EPQ BUT only for non-key updates AND need the
system to make sure that they don't accidentally rely on it for the
case of an EPQ update.  That seems (to me, anyway) like it's got to be
a really small percentage of actual users, but I just work here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Server Crashes if try to provide slot_name='none' at the time of creating subscription.

2017-05-15 Thread Kuntal Ghosh
On Mon, May 15, 2017 at 5:04 PM, Masahiko Sawada  wrote:
> On Mon, May 15, 2017 at 8:22 PM, Kuntal Ghosh
>  wrote:
>> On Mon, May 15, 2017 at 4:39 PM, Masahiko Sawada  
>> wrote:
>>
>> While testing with logical replication, I've found that the server
>> hangs if we create a subscription to the same server. For example,
>>
>> $ ./pg_ctl -D Test start -o "-p 5432"
>> $ ./psql -p 5432 postgres -c "CREATE PUBLICATION abc for all tables
>> with (publish='delete');"
>> $ ./psql -p 5432 postgres -c "create subscription sub connection
>> 'dbname=postgres port=5432 user=edb password=edb' publication abc with
>> (slot_name='abcslot');"
>> NOTICE:  synchronized table states
>> LOG:  logical decoding found initial starting point at 0/162DF18
>> DETAIL:  Waiting for transactions (approximately 1) older than 560 to end.
>>
>> And, it hangs. Is this an expected behavior?
>
> I guess this is what we're discussing on [1].
> Petr answered on that,
> ===
> Yes that's result of how logical replication slots work, the transaction
> that needs to finish is your transaction. It can be worked around by
> creating the slot manually via the SQL interface for example and create
> the subscription using WITH (NOCREATE SLOT, SLOT NAME = 'your slot') .
> ===
>
> [1] https://www.postgresql.org/message-id/20170426165954.GK14000%40momjian.us
>
Ohh I see. Thank you.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Server Crashes if try to provide slot_name='none' at the time of creating subscription.

2017-05-15 Thread Masahiko Sawada
On Mon, May 15, 2017 at 8:22 PM, Kuntal Ghosh
 wrote:
> On Mon, May 15, 2017 at 4:39 PM, Masahiko Sawada  
> wrote:
>
> While testing with logical replication, I've found that the server
> hangs if we create a subscription to the same server. For example,
>
> $ ./pg_ctl -D Test start -o "-p 5432"
> $ ./psql -p 5432 postgres -c "CREATE PUBLICATION abc for all tables
> with (publish='delete');"
> $ ./psql -p 5432 postgres -c "create subscription sub connection
> 'dbname=postgres port=5432 user=edb password=edb' publication abc with
> (slot_name='abcslot');"
> NOTICE:  synchronized table states
> LOG:  logical decoding found initial starting point at 0/162DF18
> DETAIL:  Waiting for transactions (approximately 1) older than 560 to end.
>
> And, it hangs. Is this an expected behavior?

I guess this is what we're discussing on [1].
Petr answered on that,
===
Yes that's result of how logical replication slots work, the transaction
that needs to finish is your transaction. It can be worked around by
creating the slot manually via the SQL interface for example and create
the subscription using WITH (NOCREATE SLOT, SLOT NAME = 'your slot') .
===

[1] https://www.postgresql.org/message-id/20170426165954.GK14000%40momjian.us

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Server Crashes if try to provide slot_name='none' at the time of creating subscription.

2017-05-15 Thread Kuntal Ghosh
On Mon, May 15, 2017 at 4:39 PM, Masahiko Sawada  wrote:

While testing with logical replication, I've found that the server
hangs if we create a subscription to the same server. For example,

$ ./pg_ctl -D Test start -o "-p 5432"
$ ./psql -p 5432 postgres -c "CREATE PUBLICATION abc for all tables
with (publish='delete');"
$ ./psql -p 5432 postgres -c "create subscription sub connection
'dbname=postgres port=5432 user=edb password=edb' publication abc with
(slot_name='abcslot');"
NOTICE:  synchronized table states
LOG:  logical decoding found initial starting point at 0/162DF18
DETAIL:  Waiting for transactions (approximately 1) older than 560 to end.

And, it hangs. Is this an expected behavior?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Server Crashes if try to provide slot_name='none' at the time of creating subscription.

2017-05-15 Thread Masahiko Sawada
On Mon, May 15, 2017 at 8:06 PM, Kuntal Ghosh
 wrote:
> On Mon, May 15, 2017 at 4:00 PM, Masahiko Sawada  
> wrote:
>> On Mon, May 15, 2017 at 6:41 PM, tushar  
>> wrote:
>>> Hi,
>>>
>>> Server Crashes if we try to provide slot_name='none' at the time of creating
>>> subscription -
>>>
>>> postgres=#  create subscription sub2 connection 'dbname=postgres port=5000
>>> user=centos password=f' publication abc with (slot_name='none');
>>> NOTICE:  synchronized table states
>>> server closed the connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> The connection to the server was lost. Attempting reset: Failed.
>>> !>
>>>
>>
>> Thank you for reporting.
>> I think create_slot and enabled should be forcibly set to false when
>> slot_name = 'none'. The attached patch fixes it, though more tests and
>> a regression test case are needed.
>>
> I think the patch doesn't solve the problem completely. For example,
> postgres=# create subscription sub3 connection 'dbname=postgres
> port=5432 user=edb password=edb' publication abc with
> (slot_name='none',create_slot=true);
> ERROR:  slot_name = NONE and create slot are mutually exclusive options
> postgres=# create subscription sub3 connection 'dbname=postgres
> port=5432 user=edb password=edb' publication abc with
> (create_slot=true,slot_name='none');
> ERROR:  could not connect to the publisher: could not connect to
> server: Connection refused
> Is the server running locally and accepting
> connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
>
Changing the order of the subscription parameters changes the output.
I think the modifications should be done at the end of the function.
Attached is a patch for the same.
>

Yes, your patch fixes it correctly.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Server Crashes if try to provide slot_name='none' at the time of creating subscription.

2017-05-15 Thread Kuntal Ghosh
On Mon, May 15, 2017 at 4:00 PM, Masahiko Sawada  wrote:
> On Mon, May 15, 2017 at 6:41 PM, tushar  wrote:
>> Hi,
>>
>> Server Crashes if we try to provide slot_name='none' at the time of creating
>> subscription -
>>
>> postgres=#  create subscription sub2 connection 'dbname=postgres port=5000
>> user=centos password=f' publication abc with (slot_name='none');
>> NOTICE:  synchronized table states
>> server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> The connection to the server was lost. Attempting reset: Failed.
>> !>
>>
>
> Thank you for reporting.
> I think create_slot and enabled should be forcibly set to false when
> slot_name = 'none'. The attached patch fixes it, though more tests and
> a regression test case are needed.
>
I think the patch doesn't solve the problem completely. For example,
postgres=# create subscription sub3 connection 'dbname=postgres
port=5432 user=edb password=edb' publication abc with
(slot_name='none',create_slot=true);
ERROR:  slot_name = NONE and create slot are mutually exclusive options
postgres=# create subscription sub3 connection 'dbname=postgres
port=5432 user=edb password=edb' publication abc with
(create_slot=true,slot_name='none');
ERROR:  could not connect to the publisher: could not connect to
server: Connection refused
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

Changing the order of the subscription parameters changes the output.
I think the modifications should be done at the end of the function.
Attached is a patch for the same.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


fix_bug_of_parse_option_v1.patch
Description: application/download



Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-05-15 Thread Kyotaro HORIGUCHI
Hello,

At Fri, 12 May 2017 17:24:07 +0900, Masahiko Sawada  
wrote in 
> On Fri, May 12, 2017 at 11:24 AM, Masahiko Sawada  
> wrote:
> > On Thu, May 11, 2017 at 6:16 PM, Petr Jelinek
> >  wrote:
> >> On 11/05/17 10:10, Masahiko Sawada wrote:
> >>> On Thu, May 11, 2017 at 4:06 PM, Michael Paquier
> >>>  wrote:
>  On Wed, May 10, 2017 at 11:57 AM, Masahiko Sawada 
>   wrote:
> > Barring any objections, I'll add these two issues to open item.
> 
>  It seems to me that those open items have not been added yet to the
>  list. If I am following correctly, they could be defined as follows:
 -  Dropping subscription may get stuck if done during tablesync.
>  -- Analyze deadlock issues with DROP SUBSCRIPTION and apply worker 
>  process.
> >>
> >> I think the solution to this is to reintroduce the LWLock that was
> >> removed and replaced with the exclusive lock on catalog [1]. I am afraid
> >> that correct way of handling this is to do both LWLock and catalog lock
> >> (first LWLock under which we kill the workers and then catalog lock so
> >> that something that prevents launcher from restarting them is held till
> >> the end of transaction).
> >
> > I agree to reintroduce LWLock and to stop logical rep worker first and
> > then modify catalog. That way we can reduce catalog lock level (maybe
> > to RowExclusiveLock) so that the apply worker can see it. Also, I
> > think we need to avoid holding the LWLock until the end of the
> > transaction, because that is not a good idea and could cause a
> > deadlock. For example, we could commit the transaction in
> > DropSubscription after cleaning up the pg_subscription record and all
> > its dependencies, and then start a new transaction for the remaining
> > work. Of course, we would also need to disallow DROP SUBSCRIPTION
> > being executed inside a user transaction.
> 
> Attached two draft patches to solve these issues.
> 
> The attached 0001 patch reintroduces LogicalRepLauncherLock and makes
> DROP SUBSCRIPTION keep holding it until commit. To prevent the
> possibility of deadlock, I disallowed DROP SUBSCRIPTION being called
> inside a transaction block, but there might be a more sensible
> solution for this. Please give me feedback.

+* Protect against launcher restarting the worker. This lock will
+* be released at commit.

This is wrong. COMMIT doesn't release left-over LWLocks; only
ABORT does (more precisely, the release seems intended to fire on
ERRORs). So with this patch, the second DROP SUBSCRIPTION gets stuck
on the LWLock acquired the first time. And as Petr said, holding an
LWLock for such a duration seems bad.

The cause seems to be that workers ignore SIGTERM under certain
conditions. One of the choke points is GetSubscription, the other
is get_subscription_list. I think we can handle both cases
without LWLocks.

The attached patch does that.

- heap_close + UnlockRelationOid in get_subscription_list() is
  equivalent to a single heap_close or relation_close, but I kept it
  for the apparent symmetry.

- 0.5 seconds for the sleep in ApplyWorkerMain is quite
  arbitrary. NAPTIME_PER_CYCLE * 1000 could be used instead.

- NULL MySubscription without SIGTERM might not need to be an
  ERROR.

Any more thoughts?


FYI, I reproduced the situation with the following steps. This
reliably reproduced the problem for me without having to insert any
delays.

# Creating 5 tables with 10 rows on the publisher
create table t1 (a int);
...
create table t5 (a int);
insert into t1 (select * from generate_series(0, 9) a);
...
insert into t5 (select * from generate_series(0, 9) a);
create publication p1 for table t1, t2, t3, t4, t5;


# Subscribe them, wait 1 second, then unsubscribe.
create table t1 (a int);
...
create table t5 (a int);
truncate t1, t2, t3, t4, t5;
create subscription s1 CONNECTION 'host=/tmp port=5432 dbname=postgres' publication p1;
select pg_sleep(1);
drop subscription s1;

Repeated test can be performed by repeatedly enter the last line.

>  -- Avoid orphaned tablesync worker if apply worker exits before
>  changing its status.
> >>>
> >>
> >> The behavior question I have about this is if sync workers should die
> >> when apply worker dies (ie they are tied to apply worker) or if they
> >> should be tied to the subscription.
> >>
> >> I guess taking down all the sync workers when apply worker has exited is
> >> easier to solve. Of course it means that if apply worker restarts in
> >> middle of table synchronization, the table synchronization will have to
> >> start from scratch. That being said, in normal operation apply worker
> >> should only exit/restart if subscription has changed or has been
> >> dropped/disabled and I think sync workers want to exit/restart in that
> >> situation as 

Re: [HACKERS] Patch to fix documentation about AFTER triggers

2017-05-15 Thread Ashutosh Bapat
On Sat, May 13, 2017 at 2:45 AM, Paul Jungwirth
 wrote:
> Here is a patch to amend the docs here:
>
> https://www.postgresql.org/docs/devel/static/plpgsql-trigger.html
>
> In the example for an AFTER trigger, you see this code:
>
> --
> -- Create a row in emp_audit to reflect the operation performed on emp,
> -- make use of the special variable TG_OP to work out the operation.
> --
> IF (TG_OP = 'DELETE') THEN
> INSERT INTO emp_audit SELECT 'D', now(), user, OLD.*;
> RETURN OLD;
>  ELSIF (TG_OP = 'UPDATE') THEN
> INSERT INTO emp_audit SELECT 'U', now(), user, NEW.*;
> RETURN NEW;
> ELSIF (TG_OP = 'INSERT') THEN
> INSERT INTO emp_audit SELECT 'I', now(), user, NEW.*;
> RETURN NEW;
> END IF;
> RETURN NULL; -- result is ignored since this is an AFTER trigger
>
> What are all those RETURNs doing in there? The comment on the final RETURN
> is correct, so returning NEW or OLD above seems confusing, and likely a
> copy/paste error.
>
> This patch just removes those three lines from the example code.

https://www.postgresql.org/docs/devel/static/trigger-definition.html says
"The return value is ignored for row-level triggers fired after an
operation, and so they can return NULL.". There's nothing wrong with
the example, returning OLD or NEW, but as you have pointed out it's
confusing. So, +1 for this change.




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-05-15 Thread amul sul
On Wed, May 10, 2017 at 10:13 PM, Robert Haas  wrote:
> On Wed, May 10, 2017 at 8:34 AM, Ashutosh Bapat
>  wrote:
>> Hash partitioning will partition the data based on the hash value of the
>> partition key. Does that require collation? Should we throw an error/warning 
>> if
>> collation is specified in PARTITION BY clause?
>
> Collation is only relevant for ordering, not equality.  Since hash
> opclasses provide only equality, not ordering, it's not relevant here.
> I'm not sure whether we should error out if it's specified or just
> silently ignore it.  Maybe an ERROR is a good idea?  But not sure.
>
IMHO, we could simply emit a WARNING and ignore the collation. Thoughts?
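For intuition, here is a minimal standalone C sketch of why hash partitioning needs only equality semantics: a row is routed by hashing the key bytes and taking the value modulo the partition count, so no ordering (and hence no collation) ever enters the picture. The FNV-1a hash and the function names here are illustrative stand-ins, not PostgreSQL's actual hash opclass machinery.

```c
#include <stdint.h>
#include <string.h>

/* FNV-1a: a simple byte-wise hash standing in for a hash opclass's
 * hash function.  Equal keys always hash equal; no comparison order
 * (and therefore no collation) is ever consulted. */
static uint32_t fnv1a(const char *key, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++)
    {
        h ^= (unsigned char) key[i];
        h *= 16777619u;
    }
    return h;
}

/* With nparts partitions, a row lands in the partition whose remainder
 * matches hash % nparts -- a pure equality-based routing decision. */
static int hash_partition_index(const char *key, int nparts)
{
    return (int) (fnv1a(key, strlen(key)) % (uint32_t) nparts);
}
```

Since two equal keys always produce the same hash, routing is deterministic regardless of any collation attached to the column.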

Updated patches attached.

Regards,
Amul


0001-Cleanup_v2.patch
Description: Binary data


0002-hash-partitioning_another_design-v5.patch
Description: Binary data



Re: [HACKERS] Server Crashes if try to provide slot_name='none' at the time of creating subscription.

2017-05-15 Thread Masahiko Sawada
On Mon, May 15, 2017 at 6:41 PM, tushar  wrote:
> Hi,
>
> Server Crashes if we try to provide slot_name='none' at the time of creating
> subscription -
>
> postgres=#  create subscription sub2 connection 'dbname=postgres port=5000
> user=centos password=f' publication abc with (slot_name='none');
> NOTICE:  synchronized table states
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>
>

Thank you for reporting.
I think create_slot and enabled should be forced to false when
slot_name = 'none'. The attached patch fixes it, though more testing
and a regression test case are still needed.
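As a rough illustration of the proposed fix-up, the sketch below forces create_slot and enabled off when slot_name = 'none', and treats an explicitly requested slot creation as a contradiction. The struct and function names are hypothetical, for illustration only; the real patch operates on CREATE SUBSCRIPTION's option-parsing code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical container for the subscription options involved. */
typedef struct SubOpts
{
    const char *slot_name;   /* NULL means "default to subscription name" */
    bool        create_slot;
    bool        enabled;
} SubOpts;

/* Returns false (an error in the real code) when the user explicitly
 * asked for a slot to be created while also saying slot_name = 'none';
 * otherwise forces create_slot and enabled off, since there is no slot
 * to stream changes from. */
static bool fixup_sub_opts(SubOpts *opts, bool create_slot_given)
{
    if (opts->slot_name && strcmp(opts->slot_name, "none") == 0)
    {
        if (create_slot_given && opts->create_slot)
            return false;            /* contradictory options */
        opts->create_slot = false;   /* forced, as the fix proposes */
        opts->enabled = false;
    }
    return true;
}
```

The key point is that the forcing happens during option parsing, before any code path tries to create or use a nonexistent slot.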

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_bug_of_parse_option.patch
Description: Binary data



Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables

2017-05-15 Thread Dilip Kumar
On Mon, May 15, 2017 at 2:46 PM, Ildus Kurbangaliev
 wrote:
> Hi,
> planSlot contains already projected tuple, you can't use it as oldtuple.
> I think problem is that `rewriteTargetListUD` called only for parent
> relation, so there is no `wholerow` attribute for foreign tables.

Oh, right!

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Create publication syntax is not coming properly in pg_dump / pg_dumpall

2017-05-15 Thread Kuntal Ghosh
On Mon, May 15, 2017 at 3:06 PM, Masahiko Sawada  wrote:
> On Mon, May 15, 2017 at 5:04 PM, tushar  wrote:
>> Hi,
>>
>> I observed that in pg_dump/pg_dumpall the 'create publication' syntax is
>> not generated correctly if only a specific value is mentioned in publish.
>>
>> Testcase to reproduce -
>>
>> \\create a publication
>>
>> postgres=# CREATE PUBLICATION abc for all tables with (publish='insert');
>> CREATE PUBLICATION
>>
>> \\take the plain dump
>>
>> [centos@centos-cpula bin]$ ./pg_dump -FP -p 5000 postgres  > /tmp/a.a
>>
>> \\check the syntax
>>
>> [centos@centos-cpula bin]$ cat /tmp/a.a |grep 'create publication abc' -i
>> CREATE PUBLICATION abc FOR ALL TABLES WITH (publish = 'insert, , ');
>>
>> \\try to execute the same syntax against psql terminal
>>
>> postgres=# CREATE PUBLICATION abc FOR ALL TABLES WITH (publish = 'insert, ,
>> ');
>> ERROR:  invalid publish list
>>
>> Same is valid for pg_dumpall as well..
>>
>
> Thank you for reporting.
>
> Hm, it's a bug in pg_dump. The attached patch should fix both pg_dump and
> pg_dumpall.
>
I've reproduced the bug, and the patch fixes the issue for me. I also
tested with different combinations of insert, delete and update.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] Server Crashes if try to provide slot_name='none' at the time of creating subscription.

2017-05-15 Thread tushar

Hi,

Server crashes if we try to provide slot_name='none' at the time of
creating a subscription -


postgres=#  create subscription sub2 connection 'dbname=postgres 
port=5000 user=centos password=f' publication abc with (slot_name='none');

NOTICE:  synchronized table states
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: [HACKERS] Create publication syntax is not coming properly in pg_dump / pg_dumpall

2017-05-15 Thread Masahiko Sawada
On Mon, May 15, 2017 at 5:04 PM, tushar  wrote:
> Hi,
>
> I observed that in pg_dump/pg_dumpall the 'create publication' syntax is
> not generated correctly if only a specific value is mentioned in publish.
>
> Testcase to reproduce -
>
> \\create a publication
>
> postgres=# CREATE PUBLICATION abc for all tables with (publish='insert');
> CREATE PUBLICATION
>
> \\take the plain dump
>
> [centos@centos-cpula bin]$ ./pg_dump -FP -p 5000 postgres  > /tmp/a.a
>
> \\check the syntax
>
> [centos@centos-cpula bin]$ cat /tmp/a.a |grep 'create publication abc' -i
> CREATE PUBLICATION abc FOR ALL TABLES WITH (publish = 'insert, , ');
>
> \\try to execute the same syntax against psql terminal
>
> postgres=# CREATE PUBLICATION abc FOR ALL TABLES WITH (publish = 'insert, ,
> ');
> ERROR:  invalid publish list
>
> Same is valid for pg_dumpall as well..
>

Thank you for reporting.

Hm, it's a bug in pg_dump. The attached patch should fix both pg_dump and
pg_dumpall.
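The dumped `'insert, , '` suggests the publish list was built from a fixed template with disabled actions left blank. A sketch of the robust approach is to join only the enabled actions, placing separators between (not after) items; the function below is illustrative, not pg_dump's actual code.

```c
#include <string.h>

/* Build a valid publish list such as "insert, delete" into buf.
 * Only enabled actions are appended, and the ", " separator is added
 * before every item after the first, so no dangling commas can appear. */
static void build_publish_list(char *buf, size_t bufsize,
                               int ins, int upd, int del)
{
    const char *names[3];
    int n = 0;

    if (ins) names[n++] = "insert";
    if (upd) names[n++] = "update";
    if (del) names[n++] = "delete";

    buf[0] = '\0';
    for (int i = 0; i < n; i++)
    {
        if (i > 0)
            strncat(buf, ", ", bufsize - strlen(buf) - 1);
        strncat(buf, names[i], bufsize - strlen(buf) - 1);
    }
}
```

With publish = 'insert' this yields `insert` rather than `insert, , `, which the server would accept on restore.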

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_pg_dump.patch
Description: Binary data



Re: [HACKERS] Receive buffer size for the statistics socket

2017-05-15 Thread Heikki Linnakangas

On 05/14/2017 09:54 PM, Tom Lane wrote:

Also, it's clear that a session could easily shove much more than 8KB at
a time out to the stats collector, because what we're doing in the stats
test does not involve touching any very large number of tables.  So I
think this is not just a test failure but is telling us about a plausible
mechanism for real-world statistics drops.

I observe a default receive buffer size around 124K on my Linux box,
which seems like it'd be far less prone to overflow than 8K.

I propose that it'd be a good idea to try to set the stats socket's
receive buffer size to be a minimum of, say, 100K on all platforms.
Code for this would be analogous to what we already have in pqcomm.c
(circa line 760) for forcing up the send buffer size, but SO_RCVBUF
not SO_SNDBUF.


Seems reasonable.
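For reference, here is a minimal POSIX sketch of the proposed forcing-up of a socket's receive buffer, analogous to the SO_SNDBUF code in pqcomm.c that Tom mentions. Error handling is simplified, and the noted semantics are Linux's (the kernel doubles the requested value to account for bookkeeping overhead); this is a sketch, not the eventual pgstat.c code.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Raise a socket's receive buffer to at least minbytes, if the current
 * value is smaller.  Returns the size actually granted (which Linux may
 * report as larger than requested), or -1 on error. */
static int force_rcvbuf(int sock, int minbytes)
{
    int cur;
    socklen_t len = sizeof(cur);

    if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &cur, &len) < 0)
        return -1;
    if (cur < minbytes &&
        setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &minbytes,
                   sizeof(minbytes)) < 0)
        return -1;
    len = sizeof(cur);
    if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &cur, &len) < 0)
        return -1;
    return cur;
}
```

The pattern (read, compare, only then set) mirrors the existing send-buffer code: platforms whose default is already large are left alone.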


A further idea is that maybe backends should be tweaked to avoid
blasting large amounts of data at the stats collector in one go.
That would require more thought to design, though.


The data is already sent in small (less than 1 kB) messages, so I don't see
what more we can do on the sender side to avoid overwhelming the receiver,
except to reduce the amount of data sent overall. But that only goes so far;
we cannot eliminate the problem altogether unless we also lose some detail.


It might nevertheless be worthwhile to reduce the overall volume. It
would avoid some overhead, even if the buffer is large enough, although
I don't remember pgstat being significant in any profiling I've done.
One thing that caught my eye at a quick glance is that we send the
inserted/deleted/updated tuple counters even for read-only cases. It
seems straightforward to detect that special case and use an abbreviated
version of PgStat_MsgTabstat without those counters when we haven't
updated anything. But again, it might not be worth the trouble.


- Heikki





Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables

2017-05-15 Thread Ildus Kurbangaliev
On Mon, 15 May 2017 10:34:58 +0530
Dilip Kumar  wrote:

> On Sun, May 14, 2017 at 9:54 PM, Dilip Kumar 
> wrote:
> > After your fix, now tupleid is invalid which is expected, but seems
> > like we need to do something more. As per the comments seems like it
> > is expected to get the oldtuple from planSlot.  But I don't see any
> > code for handling that part.  
> 
> Maybe we should do something like attached patch.
> 

Hi,
planSlot contains already projected tuple, you can't use it as oldtuple.
I think problem is that `rewriteTargetListUD` called only for parent
relation, so there is no `wholerow` attribute for foreign tables.

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] Typos in pg_basebackup.c

2017-05-15 Thread Magnus Hagander
On Mon, May 15, 2017 at 7:02 AM, Michael Paquier 
wrote:

> Hi all,
>
> Found $subject while working on the area. A patch is attached.
> Thanks,
>

Applied, thanks.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] Small improvement to compactify_tuples

2017-05-15 Thread Heikki Linnakangas

On 05/14/2017 09:47 PM, Sokolov Yura wrote:

Good day, everyone.

I've been playing a bit with unlogged tables - just random updates on a
simple key-value table. I noticed a significant amount of CPU spent in
compactify_tuples (called by PageRepairFragmentation). Most of that time
was spent in the qsort of the itemidbase items.


Ah, I played with this too a couple of years ago, see 
https://www.postgresql.org/message-id/546B89DE.7030906%40vmware.com, but 
got distracted by other things and never got around to commit that.



The itemidbase array is bounded by the number of tuples in a page, and
the itemIdSortData structure is simple, so a specialized version could
be a better choice.

The attached patch adds a combination of one pass of prefix sort with
insertion sort for larger arrays, and shell sort for smaller arrays.
The insertion sort and shell sort are implemented as macros and could
be reused.


Cool! Could you compare that against the bucket sort I posted in the 
above thread, please?


At a quick glance, your "prefix sort" seems to be the same algorithm 
as the bucket sort that I implemented. You chose 256 buckets, where I 
picked 32. And you're adding a shell sort implementation for small 
arrays, while I used a straight insertion sort. Not sure what these 
differences mean in practice.
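For readers following along, the shared idea behind both patches can be modeled in isolation: one counting pass distributes items into buckets keyed on the high byte of the 16-bit offset, after which a cheap insertion sort finishes the nearly-sorted array. This is a standalone model using 256 buckets (Yura's choice), not the code from either patch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 256

static void bucket_sort_u16(uint16_t *a, int n)
{
    int count[NBUCKETS + 1] = {0};
    uint16_t *tmp = malloc((size_t) n * sizeof(uint16_t));

    /* Pass 1: histogram of high bytes, then prefix-sum so that
     * count[b] becomes the start index of bucket b. */
    for (int i = 0; i < n; i++)
        count[(a[i] >> 8) + 1]++;
    for (int b = 0; b < NBUCKETS; b++)
        count[b + 1] += count[b];

    /* Pass 2: scatter each element into its bucket. */
    for (int i = 0; i < n; i++)
        tmp[count[a[i] >> 8]++] = a[i];
    memcpy(a, tmp, (size_t) n * sizeof(uint16_t));
    free(tmp);

    /* Pass 3: the array is now sorted by high byte, so a single
     * insertion sort finishes cheaply -- its cost is proportional to
     * the remaining inversions, which are confined within buckets. */
    for (int i = 1; i < n; i++)
    {
        uint16_t v = a[i];
        int j = i;
        while (j > 0 && a[j - 1] > v)
        {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = v;
    }
}
```

The bucket count mostly trades histogram overhead against how much work the finishing sort has left, which is presumably where the 32-vs-256 difference would show up.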


- Heikki





[HACKERS] Create publication syntax is not coming properly in pg_dump / pg_dumpall

2017-05-15 Thread tushar

Hi,

I observed that in pg_dump/pg_dumpall the 'create publication' syntax is
not generated correctly if only a specific value is mentioned in publish.


Testcase to reproduce -

\\create a publication

postgres=# CREATE PUBLICATION abc for all tables with (publish='insert');
CREATE PUBLICATION

\\take the plain dump

[centos@centos-cpula bin]$ ./pg_dump -FP -p 5000 postgres  > /tmp/a.a

\\check the syntax

[centos@centos-cpula bin]$ cat /tmp/a.a |grep 'create publication abc' -i
CREATE PUBLICATION abc FOR ALL TABLES WITH (publish = 'insert, , ');

\\try to execute the same syntax against psql terminal

postgres=# CREATE PUBLICATION abc FOR ALL TABLES WITH (publish = 
'insert, , ');

ERROR:  invalid publish list

Same is valid for pg_dumpall as well..

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





[HACKERS] NOT NULL constraints on range partition key columns

2017-05-15 Thread Amit Langote
Starting a new thread to discuss the proposal I put forward in [1] to stop
creating explicit NOT NULL constraint on range partition keys that are
simple columns.  I said the following:

On 2017/05/12 11:20, Robert Haas wrote:
> On Thu, May 11, 2017 at 10:15 PM, Amit Langote
>  wrote:
>> On 2017/05/12 10:42, Robert Haas wrote:
>>> Actually, I think that not supporting nulls for range partitioning may
>>> have been a fairly bad decision.
>>
>> I think the relevant discussion concluded [1] that way, because we
>> couldn't decide which interface to provide for specifying where NULLs are
>> placed or because we decided to think about it later.
>
> Yeah, but I have a feeling that marking the columns NOT NULL is going
> to make it really hard to support that in the future when we get the
> syntax hammered out.  If it had only affected the partition
> constraints that'd be different.

So, adding keycol IS NOT NULL (like we currently do for expressions) in
the implicit partition constraint would be more future-proof than
generating an actual catalogued NOT NULL constraint on the keycol?  I now
tend to think it would be better.  Directly inserting into a range
partition with a NULL value for a column currently generates a "null value
in column \"%s\" violates not-null constraint" instead of perhaps more
relevant "new row for relation \"%s\" violates partition constraint".
That said, we *do* document the fact that a NOT NULL constraint is added
on range key columns, but we might as well document instead that we don't
currently support routing tuples with NULL values in the partition key
through a range-partitioned table, and so NULL values cause an error.

Can we still decide to do that instead?

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/f52fb5c3-b6ad-b6ff-4bb6-145fdb805fa6%40lab.ntt.co.jp





Re: [HACKERS] Event triggers + table partitioning cause server crash in current master

2017-05-15 Thread Amit Langote
On 2017/05/14 12:03, Mark Dilger wrote:
> Hackers,
> 
> I discovered a reproducible crash using event triggers in the current
> development version, 29c7d5e483acaa74a0d06dd6c70b320bb315.
> I was getting a crash before this version, and cloned a fresh copy of
> the sources to be sure I was up to date, so I don't think the bug can be
> attributed to Andres' commit.  (The prior version I was testing against
> was heavily modified by me, so I recreated the bug using the latest
> standard, unmodified sources.)
> 
> I create both before and after event triggers early in the regression test
> schedule, which then fire here and there during the following tests, leading
> fairly reproducibly to the server crashing somewhere during the test suite.
> These crashes do not happen for me without the event triggers being added
> to the tests.  Many tests show as 'FAILED' simply because the logging
> that happens in the event triggers creates unexpected output for the test.
> Those "failures" are expected.  The server crashes are not.
> 
> The server logs suggest the crashes might be related to partitioned tables.
> 
> Please find attached the patch that includes my changes to the sources
> for recreating this bug.  The logs and regression.diffs are a bit large; let
> me know if you need them.
> 
> I built using the command
> 
> ./configure --enable-cassert --enable-tap-tests && make -j4 && make check

Thanks for the report and providing steps to reproduce.

It seems that it is indeed a bug related to creating range-partitioned
tables.  DefineRelation() calls AlterTableInternal() to add NOT NULL
constraints on the range partition key columns, but the code fails to
first initialize the event trigger context information.  Attached patch
should fix that.

Thanks to the above test case, I also discovered that in the case of
creating a partition, manipulations performed by MergeAttributes() on the
input schema list may cause it to become invalid, that is, the List
metadata (length) will no longer match reality, because while the
ListCells are deleted from the input list, the List pointer passed to
list_delete_cell does not point to the same list.  This caused a crash
when the CreateStmt in question was subsequently passed to copyObject,
which tried to access CreateStmt.tableElts that has become invalid as just
described.  The attached patch also takes care of that.

Thanks,
Amit
From 43f29ac13abda3c6ad71143d462c2dcd2c42f521 Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 15 May 2017 14:00:54 +0900
Subject: [PATCH] Fix to make partitioned tables work with event triggers

There were a couple of bugs here:

 1. When creating range partitioned tables, AlterTableInternal is
called which invokes the event trigger code.  We must perform
EventTriggerAlterTableStart() before calling AlterTableInternal
so that the event trigger context information is initialized at
all.

 2. When creating a partition, CreateStmt.tableElts is likely to
become invalid upon return from MergeAttributes().  We must replace
the invalid value with the newly constructed correct list value
computed by MergeAttributes().  For example, in the caller
ProcessUtilitySlow(), EventTriggerCollectSimpleCommand() is called
    right after returning from DefineRelation(), and it expects the
    content of CreateStmt to be valid to perform copyObject on it.

First bug was reported by Mark Dilger.
Report: https://postgr.es/m/4A9B8269-9771-4FB7-BFA3-84249800917D%40gmail.com
---
 src/backend/commands/tablecmds.c            | 17 +++++++++++
 src/test/regress/expected/event_trigger.out | 32 ++++++++++++++++++
 src/test/regress/sql/event_trigger.sql      | 35 +++++++++++++++++++
 3 files changed, 84 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b94db89b25..c3f489308a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -620,6 +620,19 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 			 stmt->relation->relpersistence,
 			 stmt->partbound != NULL,
 			 &inheritOids, &old_constraints, &parentOidCount);
+	/*
+	 * The original value of stmt->tableElts may have been obsoleted by
+	 * MergeAttributes(), especially those for partitions, wherein the
+	 * cells of stmt->tableElts are deleted.  Note that in a partition's
+	 * case, stmt->tableElts contains dummy ColumnDef's created to capture
+	 * the partition's own column constraints which are merged into the actual
+	 * column definitions derived from the parent, before deleting the
+	 * aforementioned dummy ones.  We don't need to refer to stmt->tableElts
+	 * hereafter in this function, although the caller expects it to contain
+	 * valid elements upon return.  So replace the original list with the
+	 * one that's known to be valid.
+	 */
+	stmt->tableElts = schema;
 
 	/*
 	 * Create a tuple descriptor from the relation schema.  Note that this
@@ -853,7