Re: [HACKERS] archive_timeout ignored directly after promotion

2017-06-21 Thread Michael Paquier
On Thu, Jun 22, 2017 at 2:58 AM, Andres Freund  wrote:
> One easy way to fix that would be to just wakeup the checkpointer from
> the startup process once at the end of recovery, but it'd not be
> pretty.   I think it'd be better to change the
>         do_restartpoint = RecoveryInProgress();
>
>         /*
>          * The end-of-recovery checkpoint is a real checkpoint that's
>          * performed while we're still in recovery.
>          */
>         if (flags & CHECKPOINT_END_OF_RECOVERY)
>             do_restartpoint = false;
>
> into having a per-loop 'local_in_recovery' variable, that we turn off
> once a CHECKPOINT_END_OF_RECOVERY checkpoint is requested.
>
> Comments?

By initializing this flag once, outside the for(;;) loop, we would give
the checkpointer logic more control, so that no new restartpoints can be
generated after the end-of-recovery checkpoint. This way we could
replace the hack that looks for DB_IN_ARCHIVE_RECOVERY in
CreateRestartPoint() with proper error handling.
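
For illustration, a rough sketch of what that could look like in the
checkpointer's main loop (hypothetical, not a patch):

    /* Initialized once, before entering the for(;;) loop. */
    bool        local_in_recovery = RecoveryInProgress();

    for (;;)
    {
        /* ... */

        /*
         * Once an end-of-recovery checkpoint has been requested, never
         * fall back to generating restartpoints again.
         */
        if (flags & CHECKPOINT_END_OF_RECOVERY)
            local_in_recovery = false;

        do_restartpoint = local_in_recovery;

        /* ... */
    }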
-- 
Michael




Re: [HACKERS] Race conditions with WAL sender PID lookups

2017-06-21 Thread Noah Misch
On Thu, Jun 15, 2017 at 09:04:44AM +0100, Simon Riggs wrote:
> On 15 June 2017 at 02:59, Noah Misch  wrote:
> 
> > Formally, the causative commit is the one that removed the superuser() test,
> > namely 25fff40.
> >
> > [Action required within three days.  This is a generic notification.]
> >
> > The above-described topic is currently a PostgreSQL 10 open item.  Simon,
> > since you committed the patch believed to have created it, you own this open
> > item.  If some other commit is more relevant or if this does not belong as a
> > v10 open item, please let us know.  Otherwise, please observe the policy on
> > open item ownership[1] and send a status update within three calendar days of
> > this message.  Include a date for your subsequent status update.  Testers may
> > discover new open items at any time, and I want to plan to get them all fixed
> > well in advance of shipping v10.  Consequently, I will appreciate your efforts
> > toward speedy resolution.  Thanks.
> >
> > [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 
> Thanks for letting me know. I'm on leave at present, so won't be able
> to get to this until June 20, though will make it a priority for then.

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-21 Thread Kuntal Ghosh
On Thu, Jun 22, 2017 at 9:48 AM, Michael Paquier
 wrote:
> On Thu, Jun 22, 2017 at 1:29 AM, Kuntal Ghosh
>  wrote:
>> On Wed, Jun 21, 2017 at 7:52 PM, Dilip Kumar  wrote:
>>> On Wed, Jun 21, 2017 at 7:47 PM, Kuntal Ghosh
>>>  wrote:
> IMHO, It's not a good idea to use DSM call to verify the DSA handle.
>
 Okay. Is there any particular scenario you've in mind where this may fail?
>>>
>>> It's not about failure, but about the abstraction.  When we are using
>>> the DSA we should not directly access the DSM which is under DSA.
>>>
>> Okay. I thought that I've found at least one usage of
>> dsm_find_mapping() in the code. :-)
>>
>> But, I've some more doubts.
>> 1. When should we use dsm_find_mapping()? (The first few lines of
>> dsm_attach is same as dsm_find_mapping().)
>> 2. As a user of dsa, how should we check whether my dsa handle is
>> already attached? I guess this is required because, if a user tries to
>> re-attach a dsa handle,  it's punishing the user by throwing an error
>> and the user wants to avoid such errors.
>
> From a logical point of view, there is nothing preventing the use of
> dsm_find_mapping() on a DSA handle, but the API layering looks wrong
> if you want to check for an existing mapping. So why not define a
> new API, say dsa_find_mapping(), that just wraps dsm_find_mapping()
> but has its own error handling? This would offer more flexibility for
> the future.
Yeah, that sounds reasonable. Or dsa_attach() could throw an error conditionally.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Transition tables vs ON CONFLICT

2017-06-21 Thread Thomas Munro
On Mon, Jun 19, 2017 at 11:04 AM, Andrew Gierth
 wrote:
>> "Thomas" == Thomas Munro  writes:
>
>  Thomas> That accidentally removed a comment that I wanted to keep.
>  Thomas> Here is a better version.
>
> I plan to commit this soon; if anyone has any comment to make, now would
> be a good time.

Here's patch #3 rebased for the recent reindent.

-- 
Thomas Munro
http://www.enterprisedb.com


transition-tuples-from-on-conflict-v4.patch
Description: Binary data



Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-21 Thread Thomas Munro
On Mon, Jun 19, 2017 at 11:06 AM, Andrew Gierth
 wrote:
>> "Thomas" == Thomas Munro  writes:
>
>  Thomas> Thanks both for the review.  New version of patch #2 attached.
>
> I'm looking to commit this soon; if anyone has any further comment now
> would be a good time to speak up.

Here's patch #2 rebased for the recent reindent.

-- 
Thomas Munro
http://www.enterprisedb.com


transition-tuples-from-wctes-v4.patch
Description: Binary data



Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-06-21 Thread Thomas Munro
On Mon, Jun 12, 2017 at 2:03 PM, Thomas Munro
 wrote:
> Thanks for the review.  New version of patch #1 attached.

Here's a version rebased on top of the recently reindented master branch.

-- 
Thomas Munro
http://www.enterprisedb.com


transition-tuples-from-child-tables-v12.patch
Description: Binary data



Re: [HACKERS] asynchronous execution

2017-06-21 Thread Kyotaro HORIGUCHI
The patch had conflicts. This is a new version, simply rebased onto
the current master. Further amendments will come later.

> The attached patch is rebased on the current master, but no
> substantial changes other than disallowing partitioned tables on
> async by assertion.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 32d5c143a679bcccee9ff29fe3807dfd8deae458 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 22 Feb 2017 09:07:49 +0900
Subject: [PATCH 1/4] Allow wait event set to be registered to resource owner

In certain cases a WaitEventSet needs to be released via a resource
owner. This change adds a resource owner field to WaitEventSet and
allows the creator of a WaitEventSet to specify a resource owner.
---
 src/backend/libpq/pqcomm.c|  2 +-
 src/backend/storage/ipc/latch.c   | 18 ++-
 src/backend/storage/lmgr/condition_variable.c |  2 +-
 src/backend/utils/resowner/resowner.c | 68 +++
 src/include/storage/latch.h   |  4 +-
 src/include/utils/resowner_private.h  |  8 
 6 files changed, 97 insertions(+), 5 deletions(-)

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 261e9be..c4f336d 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -201,7 +201,7 @@ pq_init(void)
 (errmsg("could not set socket to nonblocking mode: %m")));
 #endif
 
-	FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, 3);
+	FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, NULL, 3);
 	AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, MyProcPort->sock,
 	  NULL, NULL);
 	AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL);
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 07b1364..9543397 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -51,6 +51,7 @@
 #include "storage/latch.h"
 #include "storage/pmsignal.h"
 #include "storage/shmem.h"
+#include "utils/resowner_private.h"
 
 /*
  * Select the fd readiness primitive to use. Normally the "most modern"
@@ -77,6 +78,8 @@ struct WaitEventSet
 	int			nevents;		/* number of registered events */
 	int			nevents_space;	/* maximum number of events in this set */
 
+	ResourceOwner	resowner;	/* Resource owner */
+
 	/*
 	 * Array, of nevents_space length, storing the definition of events this
 	 * set is waiting for.
@@ -359,7 +362,7 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 	int			ret = 0;
 	int			rc;
 	WaitEvent	event;
-	WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, 3);
+	WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, NULL, 3);
 
 	if (wakeEvents & WL_TIMEOUT)
 		Assert(timeout >= 0);
@@ -518,12 +521,15 @@ ResetLatch(volatile Latch *latch)
  * WaitEventSetWait().
  */
 WaitEventSet *
-CreateWaitEventSet(MemoryContext context, int nevents)
+CreateWaitEventSet(MemoryContext context, ResourceOwner res, int nevents)
 {
 	WaitEventSet *set;
 	char	   *data;
 	Size		sz = 0;
 
+	if (res)
+		ResourceOwnerEnlargeWESs(res);
+
 	/*
 	 * Use MAXALIGN size/alignment to guarantee that later uses of memory are
 	 * aligned correctly. E.g. epoll_event might need 8 byte alignment on some
@@ -592,6 +598,11 @@ CreateWaitEventSet(MemoryContext context, int nevents)
 	StaticAssertStmt(WSA_INVALID_EVENT == NULL, "");
 #endif
 
+	/* Register this wait event set if requested */
+	set->resowner = res;
+	if (res)
+		ResourceOwnerRememberWES(set->resowner, set);
+
 	return set;
 }
 
@@ -633,6 +644,9 @@ FreeWaitEventSet(WaitEventSet *set)
 	}
 #endif
 
+	if (set->resowner != NULL)
+		ResourceOwnerForgetWES(set->resowner, set);
+
 	pfree(set);
 }
 
diff --git a/src/backend/storage/lmgr/condition_variable.c b/src/backend/storage/lmgr/condition_variable.c
index b4b7d28..182f759 100644
--- a/src/backend/storage/lmgr/condition_variable.c
+++ b/src/backend/storage/lmgr/condition_variable.c
@@ -66,7 +66,7 @@ ConditionVariablePrepareToSleep(ConditionVariable *cv)
 	/* Create a reusable WaitEventSet. */
 	if (cv_wait_event_set == NULL)
 	{
-		cv_wait_event_set = CreateWaitEventSet(TopMemoryContext, 1);
+		cv_wait_event_set = CreateWaitEventSet(TopMemoryContext, NULL, 1);
 		AddWaitEventToSet(cv_wait_event_set, WL_LATCH_SET, PGINVALID_SOCKET,
 		  MyLatch, NULL);
 	}
diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index 4a4a287..f2509c3 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -124,6 +124,7 @@ typedef struct ResourceOwnerData
 	ResourceArray snapshotarr;	/* snapshot references */
 	ResourceArray filearr;		/* open temporary files */
 	ResourceArray dsmarr;		/* dynamic shmem segments */
+	ResourceArray wesarr;		/* wait event sets */
 
 	/* We can remember up to MAX_RESOWNER_LOCKS references to local locks. */
 	int			nlocks;			/* number of owned locks */
@@ -169,6 +170,7 @@ static 

Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers

2017-06-21 Thread Michael Paquier
On Thu, Jun 22, 2017 at 1:52 PM, Yugo Nagata  wrote:
> On Thu, 22 Jun 2017 12:05:19 +0900
> Michael Paquier  wrote:
>> signal-able). Different thought, but I'd love to see a SQL function
>> that allows triggering SIGHUP on a specific process, like an
>> autovacuum worker to change its cost parameters. There is
>> pg_reload_conf() to do so but that's system-wide.
>
> For example, is that like this?
>
> =# alter system set autovacuum_vacuum_cost_delay to 10;
> =# select pg_reload_conf();
> =# alter system reset autovacuum_vacuum_cost_delay;

No need to reset the parameter afterwards, as that makes it sensitive to
later updates again, but you have the idea. Note that this is rather
recent: autovacuum listens to SIGHUP only since a75fb9b3.
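
As a sketch of the kind of per-process function wished for here (purely
hypothetical; nothing like this exists today, and permission checks are
omitted):

    /* SQL-callable: send SIGHUP to a single backend/worker given its PID. */
    Datum
    pg_reload_conf_process(PG_FUNCTION_ARGS)
    {
        int         pid = PG_GETARG_INT32(0);

        PG_RETURN_BOOL(kill(pid, SIGHUP) == 0);
    }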
-- 
Michael




Re: [HACKERS] Logical replication launcher never been restarted when terminated

2017-06-21 Thread Yugo Nagata
On Wed, 21 Jun 2017 15:17:20 -0400
Peter Eisentraut  wrote:

> On 6/21/17 13:03, Yugo Nagata wrote:
> > As I reported in another thread[1], when the logical replication launcher
> > is terminated by SIGTERM, it is never restarted, and we need to restart
> > the server to enable logical replication again.
> > 
> > This is because the logical replication launcher exits with exitstatus 0,
> > so if it exits with status 1 it is restarted by the postmaster normally.
> > Attached is a simple patch to fix it in this way.
> 
> Fixed, thanks!

Thanks!
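
For readers skimming the archives, the fix described above amounts to
something like this in the launcher's exit path (a sketch of the idea
only, not necessarily the committed change):

    /*
     * Exit with a nonzero status so the postmaster restarts the logical
     * replication launcher, instead of treating status 0 as a clean,
     * permanent shutdown.
     */
    proc_exit(1);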

> 
> -- 
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Yugo Nagata 




Re: [HACKERS] Fast promotion not used when doing a recovery_target PITR restore?

2017-06-21 Thread Michael Paquier
On Thu, Jun 22, 2017 at 3:04 AM, Andres Freund  wrote:
> When doing a PITR style recovery, with recovery target set, we're
> currently not doing a fast promotion, in contrast to the handling when
> doing a pg_ctl or trigger file based promotion. That can prolong making
> the server available for writes.
>
> I can't really see a reason for this?

Yes, you are right. I see no reason either why this cannot be done.
Why not just switch fast_promote to true when using
RECOVERY_TARGET_ACTION_PROMOTE? That's a bug, though not a critical
one.
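
Something along these lines where the recovery target action is applied
in xlog.c (a sketch of the idea, not the attached patch):

    /*
     * When the recovery target is reached and the configured action is
     * promotion, request fast promotion just like "pg_ctl promote" does.
     */
    if (recoveryTargetAction == RECOVERY_TARGET_ACTION_PROMOTE)
        fast_promote = true;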
-- 
Michael


recovery-stop-promote.patch
Description: Binary data



Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-21 Thread Yugo Nagata
On Thu, 22 Jun 2017 13:12:48 +0900
Michael Paquier  wrote:

> On Wed, Jun 21, 2017 at 9:15 PM, Yugo Nagata  wrote:
> > These errors continue until the process is terminated or the server is
> > restarted.
> >
> > When SIGINT is issued, the process exits from the main loop, returns
> > to sigsetjmp, and calls dsa_attach() before entering the loop again,
> > which causes the error.
> >
> > We can fix it by calling dsa_attach() before sigsetjmp. Attached is the
> > patch.
> 
> Your fix looks like a bad idea to me. After an exception occurs, the
> process should be able to re-attach to the shared memory area if it
> still exists, or create a new one if it does not. That should not be a
> one-time operation.

Thank you for your comment. I overlooked it and now I understand it.

> -- 
> Michael


-- 
Yugo Nagata 




Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers

2017-06-21 Thread Yugo Nagata
On Thu, 22 Jun 2017 12:05:19 +0900
Michael Paquier  wrote:

> On Thu, Jun 22, 2017 at 11:52 AM, Andres Freund  wrote:
> > On 2017-06-22 11:49:47 +0900, Yugo Nagata wrote:
> >> I agree that we can kill these processes with OS commands. However,
> >> it seems to me that pg_{cancel,terminate}_backend don't need to be able to
> >> kill processes other than client backends, because we can do the same thing
> >> with an OS command if necessary, and actually these functions cannot kill
> >> most other processes, for example the background writer. Are the autovacuum
> >> launcher and background workers special for these functions?
> >
> > I strongly disagree with this - I think it's quite useful to be able to
> > kill things via SQL that can hold lock on database objects.   I'm not
> > seeing which problem would be solved by prohibiting this?
> 
> +1. Controlling them as of now is useful, particularly now that all
> processes show wait events, even auxiliary ones (though not

I got it. I agree with the current behaviour, and it isn't worth changing.
In my understanding, the processes that these functions can kill (client
backends, background workers, autovacuum processes) are the ones that can
hold locks on database objects.

> signal-able). Different thought, but I'd love to see a SQL function
> that allows triggering SIGHUP on a specific process, like an
> autovacuum worker to change its cost parameters. There is
> pg_reload_conf() to do so but that's system-wide.

For example, is that like this?

=# alter system set autovacuum_vacuum_cost_delay to 10;
=# select pg_reload_conf();
=# alter system reset autovacuum_vacuum_cost_delay;

> -- 
> Michael


-- 
Yugo Nagata 




Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-21 Thread Michael Paquier
On Thu, Jun 22, 2017 at 1:29 AM, Kuntal Ghosh
 wrote:
> On Wed, Jun 21, 2017 at 7:52 PM, Dilip Kumar  wrote:
>> On Wed, Jun 21, 2017 at 7:47 PM, Kuntal Ghosh
>>  wrote:
 IMHO, It's not a good idea to use DSM call to verify the DSA handle.

>>> Okay. Is there any particular scenario you've in mind where this may fail?
>>
>> It's not about failure, but about the abstraction.  When we are using
>> the DSA we should not directly access the DSM which is under DSA.
>>
> Okay. I thought that I've found at least one usage of
> dsm_find_mapping() in the code. :-)
>
> But, I've some more doubts.
> 1. When should we use dsm_find_mapping()? (The first few lines of
> dsm_attach is same as dsm_find_mapping().)
> 2. As a user of dsa, how should we check whether my dsa handle is
> already attached? I guess this is required because, if a user tries to
> re-attach a dsa handle,  it's punishing the user by throwing an error
> and the user wants to avoid such errors.

From a logical point of view, there is nothing preventing the use of
dsm_find_mapping() on a DSA handle, but the API layering looks wrong
if you want to check for an existing mapping. So why not define a
new API, say dsa_find_mapping(), that just wraps dsm_find_mapping()
but has its own error handling? This would offer more flexibility for
the future.
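
A minimal sketch of the attachment check Kuntal was asking about, assuming
dsa_handle stays interchangeable with dsm_handle as it is today (the
function name is made up):

    /*
     * Return true if this backend is already attached to the DSA area
     * identified by the given handle.
     */
    bool
    dsa_is_attached(dsa_handle handle)
    {
        return dsm_find_mapping(handle) != NULL;
    }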
-- 
Michael




Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-21 Thread Michael Paquier
On Wed, Jun 21, 2017 at 9:15 PM, Yugo Nagata  wrote:
> These errors continue until the process is terminated or the server is
> restarted.
>
> When SIGINT is issued, the process exits from the main loop, returns
> to sigsetjmp, and calls dsa_attach() before entering the loop again,
> which causes the error.
>
> We can fix it by calling dsa_attach() before sigsetjmp. Attached is the patch.

Your fix looks like a bad idea to me. After an exception occurs, the
process should be able to re-attach to the shared memory area if it still
exists, or create a new one if it does not. That should not be a one-time
operation.
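
In other words, the error-recovery path could do something like the
following each time through (a sketch only; the field and variable names
are borrowed loosely from the current autovacuum code and may not match
exactly):

    /*
     * After longjmp'ing back from an error, make sure we have a usable
     * DSA area again: re-attach if it already exists, otherwise create
     * a fresh one.
     */
    if (area == NULL)
    {
        dsa_handle  handle = AutoVacuumShmem->av_dsa_handle;   /* assumed field */

        if (handle != 0)            /* area already created by someone else */
            area = dsa_attach(handle);
        else
        {
            area = dsa_create(my_tranche_id);   /* my_tranche_id: assumed */
            AutoVacuumShmem->av_dsa_handle = dsa_get_handle(area);
        }
    }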
-- 
Michael




Re: [HACKERS] pgrowlocks relkind check

2017-06-21 Thread Amit Langote
On 2017/06/22 12:26, Peter Eisentraut wrote:
> On 6/14/17 01:34, Amit Langote wrote:
>> How about the attached patch that teaches pgrowlocks() to error out unless
>> a meaningful result can be returned?  With the patch, it will issue a "%s
>> is not a table" message if it is not a RELKIND_RELATION, except a
>> different message will be issued for partitioned tables (for they are
>> tables).
> 
> committed

Thanks.

Regards,
Amit





Re: [HACKERS] outfuncs.c utility statement support

2017-06-21 Thread Tom Lane
Peter Eisentraut  writes:
> On 6/18/17 10:14, Tom Lane wrote:
>> Doesn't cope with backslash-quoted characters.  If we're going to bother
>> to do anything here, I think we ought to make it reversible for all
>> possible characters.

> Makes sense.  Updated patch attached.

That looks sane to me, though I've still not actually tested any
interesting cases.

regards, tom lane




Re: [HACKERS] pgrowlocks relkind check

2017-06-21 Thread Peter Eisentraut
On 6/14/17 01:34, Amit Langote wrote:
> How about the attached patch that teaches pgrowlocks() to error out unless
> a meaningful result can be returned?  With the patch, it will issue a "%s
> is not a table" message if it is not a RELKIND_RELATION, except a
> different message will be issued for partitioned tables (for they are
> tables).

committed
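
For anyone following along, the check being described has roughly this
shape (a sketch, not necessarily the exact committed wording):

    if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("\"%s\" is a partitioned table",
                        RelationGetRelationName(rel)),
                 errdetail("Partitioned tables do not contain rows.")));
    else if (rel->rd_rel->relkind != RELKIND_RELATION)
        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("\"%s\" is not a table",
                        RelationGetRelationName(rel))));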

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers

2017-06-21 Thread Michael Paquier
On Thu, Jun 22, 2017 at 11:52 AM, Andres Freund  wrote:
> On 2017-06-22 11:49:47 +0900, Yugo Nagata wrote:
>> I agree that we can kill these processes with OS commands. However,
>> it seems to me that pg_{cancel,terminate}_backend don't need to be able to
>> kill processes other than client backends, because we can do the same thing
>> with an OS command if necessary, and actually these functions cannot kill
>> most other processes, for example the background writer. Are the autovacuum
>> launcher and background workers special for these functions?
>
> I strongly disagree with this - I think it's quite useful to be able to
> kill things via SQL that can hold lock on database objects.   I'm not
> seeing which problem would be solved by prohibiting this?

+1. Controlling them as of now is useful, particularly now that all
processes show wait events, even auxiliary ones (though not
signal-able). Different thought, but I'd love to see a SQL function
that allows triggering SIGHUP on a specific process, like an
autovacuum worker to change its cost parameters. There is
pg_reload_conf() to do so but that's system-wide.
-- 
Michael




Re: [HACKERS] outfuncs.c utility statement support

2017-06-21 Thread Peter Eisentraut
On 6/18/17 10:14, Tom Lane wrote:
> pg_strtok recognizes "<>" and returns length = 0, so debackslash()
> would produce the right answer AFAICS (admittedly, I haven't tested).
> But I don't really want to do it like that because of the wasted
> palloc space and cycles.
> 
>> Maybe
>> local_node->fldname = length ? token[0] : '\0';
>> ?
> 
> Doesn't cope with backslash-quoted characters.  If we're going to bother
> to do anything here, I think we ought to make it reversible for all
> possible characters.

Makes sense.  Updated patch attached.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 3b221dc64e1e9eb74b48a97fba4bfa18fad4bf4a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 21 Jun 2017 22:57:23 -0400
Subject: [PATCH v2] Fix output of char node fields

WRITE_CHAR_FIELD() didn't do any escaping, so that for example a zero
byte would cause the whole output string to be truncated.  To fix, pass
the char through outToken(), so it is escaped like a string.  Adjust the
reading side to handle this.
---
 src/backend/nodes/outfuncs.c  | 20 +++-
 src/backend/nodes/readfuncs.c |  3 ++-
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 3a23f0bb16..b0abe9ec10 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -32,6 +32,8 @@
 #include "utils/datum.h"
 #include "utils/rel.h"
 
+static void outChar(StringInfo str, char c);
+
 
 /*
  * Macros to simplify output of different kinds of fields.  Use these
@@ -62,7 +64,8 @@
 
 /* Write a char field (ie, one ascii character) */
 #define WRITE_CHAR_FIELD(fldname) \
-   appendStringInfo(str, " :" CppAsString(fldname) " %c", node->fldname)
+   (appendStringInfo(str, " :" CppAsString(fldname) " "), \
+outChar(str, node->fldname))
 
 /* Write an enumerated-type field as an integer code */
 #define WRITE_ENUM_FIELD(fldname, enumtype) \
@@ -140,6 +143,21 @@ outToken(StringInfo str, const char *s)
}
 }
 
+/*
+ * Convert one char.  Goes through outToken() so that special characters are
+ * escaped.
+ */
+static void
+outChar(StringInfo str, char c)
+{
+   char        in[2];
+
+   in[0] = c;
+   in[1] = '\0';
+
+   outToken(str, in);
+}
+
 static void
 _outList(StringInfo str, const List *node)
 {
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 2988e8bd16..1380703cbc 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -86,7 +86,8 @@
 #define READ_CHAR_FIELD(fldname) \
token = pg_strtok(); /* skip :fldname */ \
token = pg_strtok(); /* get field value */ \
-   local_node->fldname = token[0]
+   /* avoid overhead of calling debackslash() for one char */ \
+   local_node->fldname = (length == 0) ? '\0' : (token[0] == '\\' ? token[1] : token[0])
 
 /* Read an enumerated-type field that was written as an integer code */
 #define READ_ENUM_FIELD(fldname, enumtype) \
-- 
2.13.1




Re: [HACKERS] Pluggable storage

2017-06-21 Thread Michael Paquier
On Thu, Jun 22, 2017 at 11:12 AM, Amit Langote
 wrote:
> On 2017/06/22 10:01, Michael Paquier wrote:
>> #3 implies that the index AM logic is implemented in the table
>> AM. Not saying that it is not useful, but it does not feel natural to
>> have the planner request for a sequential scan, just to have the table
>> AM secretly do some kind of index/skipping scan.
>
> I had read a relevant comment on a pluggable storage thread awhile back
> [1].  In short, the comment was that the planner should be able to get
> some intelligence, via some API, from the heap storage implementation
> about the latter's access cost characteristics.  The storage should
> provide accurate-enough cost information to the planner when such a
> request is made by, say, cost_seqscan(), so that the planner can make
> appropriate choice.  If two tables containing the same number of rows (and
> the same size in bytes, perhaps) use different storage implementations,
> then, with the planner's cost parameters remaining the same, cost_seqscan()
> will end up calculating different costs for the two tables.  Perhaps SeqScan
> would be chosen for one table but not the other, based on that.

Yeah, I agree that the costing part needs some clear attention and
thoughts, and the gains are absolutely huge with the correct
interface. That could be done in a later step though.
-- 
Michael




Re: [HACKERS] Useless code in ExecInitModifyTable

2017-06-21 Thread Amit Langote
On 2017/06/22 11:25, Etsuro Fujita wrote:
> On 2017/06/21 23:52, Robert Haas wrote:
> 
>> You're right that there is a comment missing from the function header,
>> but it's not too hard to figure out what it should be.  Just consult
>> the definition of ModifyTable itself:
>>
>>  /* RT indexes of non-leaf tables in a partition tree */
>>  List   *partitioned_rels;
> 
> I agree on that point, but I think it's better to add the missing comment
> for the create_modifytable_path argument, as proposed in [1].

Thanks for sharing that link.  I was about to send a patch to add the
comment, but it seems you beat me to it.

Thanks,
Amit





Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers

2017-06-21 Thread Andres Freund
On 2017-06-22 11:49:47 +0900, Yugo Nagata wrote:
> On Wed, 21 Jun 2017 11:04:34 -0400
> Robert Haas  wrote:
> 
> > On Wed, Jun 21, 2017 at 7:56 AM, Yugo Nagata  wrote:
> > > I have found that we can cancel/terminate autovacuum launchers and
> > > background worker processes by pg_cancel/terminate_backend function.
> > > I'm wondering this behavior is not expected and if not I want to fix it.
> > 
> > I think it is expected.  Even if we blocked it, those processes have
> > to cope gracefully with SIGTERM, because anyone with access to the OS
> > user can kill them that way by hand.
> 
> I agree that we can kill these processes with OS commands. However,
> it seems to me that pg_{cancel,terminate}_backend don't need to be able to
> kill processes other than client backends, because we can do the same thing
> with an OS command if necessary, and actually these functions cannot kill
> most other processes, for example the background writer. Are the autovacuum
> launcher and background workers special for these functions?

I strongly disagree with this - I think it's quite useful to be able to
kill things via SQL that can hold lock on database objects.   I'm not
seeing which problem would be solved by prohibiting this?

- Andres




Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers

2017-06-21 Thread Yugo Nagata
On Wed, 21 Jun 2017 11:04:34 -0400
Robert Haas  wrote:

> On Wed, Jun 21, 2017 at 7:56 AM, Yugo Nagata  wrote:
> > I have found that we can cancel/terminate autovacuum launchers and
> > background worker processes by pg_cancel/terminate_backend function.
> > I'm wondering this behavior is not expected and if not I want to fix it.
> 
> I think it is expected.  Even if we blocked it, those processes have
> to cope gracefully with SIGTERM, because anyone with access to the OS
> user can kill them that way by hand.

I agree that we can kill these processes with OS commands. However,
it seems to me that pg_{cancel,terminate}_backend don't need to be able to
kill processes other than client backends, because we can do the same thing
with an OS command if necessary, and actually these functions cannot kill
most other processes, for example the background writer. Are the autovacuum
launcher and background workers special for these functions?

> 
> > However, we can terminate background workers by pg_terminate_backend.
> > In the following example, I terminated the logical replication launcher,
> > and this process did not appear again[1].
> >
> > postgres=# select pg_terminate_backend(30902);
> >  pg_terminate_backend
> > --
> >  t
> > (1 row)
> 
> That seems to be a bug in logical replication.
> 
> > Similarly, we can terminate autovacuum launcher by pg_terminate_backend,
> > but a new process is restarted by postmaster in this case.[2]
> >
> > postgres=# select pg_terminate_backend(30900);
> >  pg_terminate_backend
> > --
> >  t
> > (1 row)
> 
> That is as I would expect.
> 
> > [2]
> > On the other hand, when we use pg_cancel_backend for autovacuum launcher,
> > it causes the following error. I'll report the detail in another thread.
> >
> >  ERROR:  can't attach the same segment more than once
> 
> I think that's a bug.
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


-- 
Yugo Nagata 




Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-21 Thread Peter Eisentraut
On 6/20/17 22:44, Noah Misch wrote:
>> A patch has been posted, and it's being reviewed.  Next update Monday.
> 
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I'm not sure how to proceed here.  Nobody else seems terribly
interested, and this is really a minor issue, so I don't want to muck
around with the locking endlessly.  Let's say that if there are no new
insights by Friday, I'll pick one of the proposed patches or just close
it without any patch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-21 Thread Peter Eisentraut
On 6/19/17 22:41, Masahiko Sawada wrote:
> On Tue, Jun 20, 2017 at 10:47 AM, Peter Eisentraut
>  wrote:
>> On 6/16/17 04:16, Masahiko Sawada wrote:
>>> A subscription relation state may have been removed already when we
>>> try to update it. SetSubscriptionRelState didn't emit an error in such
>>> case but with this patch we end up with an error. Since we shouldn't
>>> ignore such error in UpdateSubscriptionRelState I'd say we can let the
>>> user know about that possibility in the error message.
>>
>> So are you saying it's good to have the error message?
>>
> 
> Yes. An UpdateSubscriptionRelState failure means that the subscription
> relation state has disappeared, or that something else is wrong. So I
> think it's good to have it, perhaps as errdetail. Thoughts?

I think this could lead to errors in the tablesync workers if they try
to write after the entries have already been deleted as part of dropping
the subscription or the table.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-21 Thread Peter Eisentraut
On 6/20/17 19:10, Peter Eisentraut wrote:
> On 6/19/17 22:54, Masahiko Sawada wrote:
>>> It seems to me we could just take a stronger lock around
>>> RemoveSubscriptionRel(), so that workers can't write in there concurrently.
>>
>> Since we reduced the lock level of updating pg_subscription_rel by
>> commit 521fd4795e3e the same deadlock issue will appear if we just
>> take a stronger lock level.
> 
> I was thinking about a more refined approach, like in the attached
> patch.  It just changes the locking when in DropSubscription(), so that
> that doesn't fail if workers are doing stuff concurrently.  Everything
> else stays the same.

The alternative is that we use the LockSharedObject() approach that was
already alluded to, like in the attached patch.  Both approaches would
work equally fine AFAICT.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 10afa9807014e14596cb05d70ba302c86bf30dd3 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 21 Jun 2017 22:40:22 -0400
Subject: [PATCH] WIP Add locking SetSubscriptionRelState()

---
 src/backend/catalog/pg_subscription.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index c69c461b62..f794d4c526 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -28,6 +28,8 @@
 
 #include "nodes/makefuncs.h"
 
+#include "storage/lmgr.h"
+
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
@@ -246,6 +248,8 @@ SetSubscriptionRelState(Oid subid, Oid relid, char state,
boolnulls[Natts_pg_subscription_rel];
Datum   values[Natts_pg_subscription_rel];
 
+   LockSharedObject(SubscriptionRelationId, subid, 0, AccessExclusiveLock);
+
rel = heap_open(SubscriptionRelRelationId, RowExclusiveLock);
 
/* Try finding existing mapping. */
-- 
2.13.1




Re: [HACKERS] Useless code in ExecInitModifyTable

2017-06-21 Thread Etsuro Fujita

On 2017/06/21 23:52, Robert Haas wrote:


You're right that there is a comment missing from the function header,
but it's not too hard to figure out what it should be.  Just consult
the definition of ModifyTable itself:

 /* RT indexes of non-leaf tables in a partition tree */
 List   *partitioned_rels;


I agree on that point, but I think it's better to add the missing 
comment for the create_modifytable_path argument, as proposed in [1].


Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/e87c4a6d-23d7-5e7c-e8db-44ed418eb5d1%40lab.ntt.co.jp






Re: [HACKERS] Fix a typo in partition.c

2017-06-21 Thread Masahiko Sawada
On Wed, Jun 21, 2017 at 3:32 AM, Peter Eisentraut
 wrote:
> On 6/19/17 23:02, Masahiko Sawada wrote:
>> Hi,
>>
>> Attached patch for $subject.
>>
>> s/opreator/operator/
>
> fixed
>

Thank you!
I found another one.

s/retrived/retrieved/

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_typo_in_partition_c_2.patch
Description: Binary data



Re: [HACKERS] Pluggable storage

2017-06-21 Thread Amit Langote
On 2017/06/22 10:01, Michael Paquier wrote:
> #3 implies that the index AM logic is implemented in the table
> AM. Not saying that it is not useful, but it does not feel natural to
> have the planner request for a sequential scan, just to have the table
> AM secretly do some kind of index/skipping scan.

I had read a relevant comment on a pluggable storage thread awhile back
[1].  In short, the comment was that the planner should be able to get
some intelligence, via some API, from the heap storage implementation
about the latter's access cost characteristics.  The storage should
provide accurate-enough cost information to the planner when such a
request is made by, say, cost_seqscan(), so that the planner can make
appropriate choice.  If two tables containing the same number of rows (and
the same size in bytes, perhaps) use different storage implementations,
then, with the planner's cost parameters remaining the same, cost_seqscan()
will end up calculating different costs for the two tables.  Perhaps SeqScan
would be chosen for one table but not the other, based on that.
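
To make that concrete, the kind of hook being imagined might look
something like this (entirely illustrative; no such API exists today):

    /* Costing information the planner could request from a table AM. */
    typedef struct TableScanCostInfo
    {
        double      tuples;         /* estimated total tuples */
        BlockNumber pages;          /* estimated size in pages */
        double      cpu_per_tuple;  /* per-tuple CPU cost hint */
    } TableScanCostInfo;

    /*
     * Callback a table AM would provide; cost_seqscan() could consult it
     * instead of assuming heap-like storage.
     */
    typedef void (*EstimateScanCost_function) (Relation rel,
                                               TableScanCostInfo *info);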

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoY3LXVUPQVdZW70XKp5PsXffO82pXXt%3DbeegcV%2B%3DRsQgg%40mail.gmail.com





Re: [HACKERS] An attempt to reduce WALWriteLock contention

2017-06-21 Thread Kuntal Ghosh
On Thu, Jun 22, 2017 at 6:46 AM, Michael Paquier
 wrote:
> On Wed, Jun 21, 2017 at 4:57 PM, jasrajd  wrote:
>> We are also seeing contention on the walwritelock and repeated writes to the
>> same offset if we move the flush outside the lock in the Azure environment.
>> pgbench doesn't scale beyond ~8 cores without saturating the IOPs or
>> bandwidth. Is there more work being done in this area?
>
> As of now, there is no patch in the development queue for Postgres 11
> that is dedicated to this particularly lock contention. There is a
> patch for LWlocks in general with power PC, but that's all:
> https://commitfest.postgresql.org/14/984/
>
> Not sure if Kuntal has plans to submit again this patch. It is
> actually a bit sad to not see things moving on and use an approach to
> group flushes.
As of now, I've no plans to re-submit the patch. Actually, I'm not
sure what I should try next. I would love to get some advice/direction
regarding this.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] An attempt to reduce WALWriteLock contention

2017-06-21 Thread Kuntal Ghosh
On Thu, Dec 22, 2016 at 11:29 PM, Tomas Vondra
 wrote:
>
> How do these counts compare to the other wait events? For example
> CLogControlLock, which is what Amit's patch [1] is about?
>
> [1]
> https://www.postgresql.org/message-id/flat/84c22fbb-b9c4-a02f-384b-b4feb2c67193%402ndquadrant.com
>
Hello Tomas,

I'm really sorry for this late reply; I had somehow missed the thread.
Actually, I did see some performance improvement when testing together
with the CLogControlLock patch, but it then turned out that all of the
improvement came from the CLogControlLock patch alone.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] ASOF join

2017-06-21 Thread Thomas Munro
On Wed, Jun 21, 2017 at 9:46 PM, Konstantin Knizhnik
 wrote:
> Thank you for this idea. I agree that it is the best way of implementing
> ASOF join - just as optimization of standard SQL query.

Great.  I think this part definitely has potential.

> But do you think that still it will be good idea to extend SQL syntax with
> ASOF JOIN ... USING ... clause? It will significantly simplify writing
> queries like above
> and IMHO doesn't introduce some confusions with standard SQL syntax. My
> primary idea of suggesting ASOF join for Postgres was not  just building
> more efficient plan (using merge join instead of nested loop) but also
> simplifying writing of such queries. Or do you think that nobody will be
> interested in non-standard SQL extensions?

I can see the appeal, but I expect it to be difficult to convince the
project to accept a non-standard syntax for a niche use case that can
be expressed already.  Q is super terse and designed for time series
data.  SQL is neither of those things.

Some first reactions to the syntaxes you mentioned:

1.  times LEFT ASOF JOIN ticks ON ticks.time <= times.time
2.  times LEFT ASOF JOIN ticks USING (time)
3.  times LEFT ASOF JOIN ticks USING (ticks.time, times.time)

The USING ideas don't seem to be general enough, because there is no
place to say whether to use a lower or higher value if there is no
match, or did I miss something?  Relying on an ORDER BY clause in the
query to control the meaning of the join seems too weird, and making
it always (for example) <= would be an arbitrary limitation.  The
first syntax at least has enough information: when you say one of <,
>, <=, >= you also imply the search order.  I'm not sure if there are
any problems with that, perhaps when combined with other quals.

The equivalent nearly-standard syntax is definitely quite verbose, but
it has the merit of being absolutely explicit about which row from
'ticks' will be selected:

  times LEFT JOIN LATERAL (SELECT * FROM ticks
WHERE ticks.time <= times.time
 ORDER BY ticks.time DESC LIMIT 1) x ON true

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] An attempt to reduce WALWriteLock contention

2017-06-21 Thread Andres Freund
On 2017-06-21 00:57:32 -0700, jasrajd wrote:
> We are also seeing contention on the walwritelock and repeated writes to the
> same offset if we move the flush outside the lock in the Azure environment.
> pgbench doesn't scale beyond ~8 cores without saturating the IOPs or
> bandwidth. Is there more work being done in this area?

I kind of doubt that the scalability limit is directly related to this patch
- I've seen postgres scale further without that lock becoming the prime
issue.  What exactly are you measuring / observing?

Andres Freund




Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-21 Thread Andres Freund
On 2017-06-21 18:07:21 -0700, Andres Freund wrote:
> On 2017-06-22 09:03:05 +0800, Craig Ringer wrote:
> > On 22 June 2017 at 08:29, Andres Freund  wrote:
> > 
> > > I.e. we're doing tiny write send() syscalls (they should be coalesced)
> > 
> > That's likely worth doing, but can probably wait for a separate patch.
> 
> I don't think so, we should get this right, it could have API influence.
> 
> 
> > The kernel will usually do some packet aggregation unless we use
> > TCP_NODELAY (which we don't and shouldn't), and the syscall overhead
> > is IMO not worth worrying about just yet.
> 
> 1)
>             /*
>              * Select socket options: no delay of outgoing data for
>              * TCP sockets, nonblock mode, close-on-exec. Fail if any
>              * of this fails.
>              */
>             if (!IS_AF_UNIX(addr_cur->ai_family))
>             {
>                 if (!connectNoDelay(conn))
>                 {
>                     pqDropConnection(conn, true);
>                     conn->addr_cur = addr_cur->ai_next;
>                     continue;
>                 }
>             }
> 
> 2) Even if nodelay weren't set, this can still lead to smaller packets
>being sent, because you start sending normal sized tcp packets,
>rather than jumbo ones, even if configured (pretty common these
>days).
> 
> 3) Syscall overhead is actually quite significant.

Proof of the pudding:

pgbench of 10 pgbench select statements in a batch:

as submitted by Daniel:

pgbench -h localhost -M prepared -S -n -c 16 -j 16 -T 1 -P 1 -f 
~/tmp/pgbench-select-only-batch.sq
progress: 1.0 s, 24175.5 tps, lat 0.647 ms stddev 0.782
progress: 2.0 s, 27737.6 tps, lat 0.577 ms stddev 0.625
progress: 3.0 s, 28853.3 tps, lat 0.554 ms stddev 0.619
progress: 4.0 s, 26660.8 tps, lat 0.600 ms stddev 0.776
progress: 5.0 s, 30023.8 tps, lat 0.533 ms stddev 0.484
progress: 6.0 s, 29959.3 tps, lat 0.534 ms stddev 0.450
progress: 7.0 s, 29944.9 tps, lat 0.534 ms stddev 0.536
progress: 8.0 s, 30137.7 tps, lat 0.531 ms stddev 0.533
progress: 9.0 s, 30285.2 tps, lat 0.528 ms stddev 0.479
progress: 10.0 s, 30228.7 tps, lat 0.529 ms stddev 0.460
progress: 11.0 s, 29921.4 tps, lat 0.534 ms stddev 0.613
progress: 12.0 s, 29982.4 tps, lat 0.533 ms stddev 0.510
progress: 13.0 s, 29247.4 tps, lat 0.547 ms stddev 0.526
progress: 14.0 s, 28757.3 tps, lat 0.556 ms stddev 0.635
progress: 15.0 s, 29035.3 tps, lat 0.551 ms stddev 0.523
^C

sample vmstat:
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa st
19  0  0 488992 787332 2355867600 0 0 9720 455099 65 35  0  
0  0

(i.e. ~450k context switches)

hackily patched:
pgbench -h localhost -M prepared -S -n -c 16 -j 16 -T 1 -P 1 -f 
~/tmp/pgbench-select-only-batch.sq
progress: 1.0 s, 40545.2 tps, lat 0.386 ms stddev 0.625
progress: 2.0 s, 48158.0 tps, lat 0.332 ms stddev 0.277
progress: 3.0 s, 50125.7 tps, lat 0.319 ms stddev 0.204
progress: 4.0 s, 50740.6 tps, lat 0.315 ms stddev 0.250
progress: 5.0 s, 50795.6 tps, lat 0.315 ms stddev 0.246
progress: 6.0 s, 51195.6 tps, lat 0.312 ms stddev 0.207
progress: 7.0 s, 50746.7 tps, lat 0.315 ms stddev 0.264
progress: 8.0 s, 50619.1 tps, lat 0.316 ms stddev 0.250
progress: 9.0 s, 50619.4 tps, lat 0.316 ms stddev 0.228
progress: 10.0 s, 46967.8 tps, lat 0.340 ms stddev 0.499
progress: 11.0 s, 50480.1 tps, lat 0.317 ms stddev 0.239
progress: 12.0 s, 50242.5 tps, lat 0.318 ms stddev 0.286
progress: 13.0 s, 49912.7 tps, lat 0.320 ms stddev 0.266
progress: 14.0 s, 49841.7 tps, lat 0.321 ms stddev 0.271
progress: 15.0 s, 49807.1 tps, lat 0.321 ms stddev 0.248
^C

sample vmstat:
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa st
23  0  0 482008 787312 2355899600 0 0 8219 105097 87 14  0  
0  0

(i.e. ~100k context switches)

That's *localhost*.


It's completely possible that I've screwed something up here, I didn't
test it besides running pgbench, but the send/recv'd data looks like
it's similar amounts of data, just fewer syscalls.

Greetings,

Andres Freund
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e498ad61e5..aeed1649ce 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -2352,14 +2352,17 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 if (debug)
 	fprintf(stderr, "client %d receiving\n", st->id);
 
-if (!PQconsumeInput(st->con))
-{/* there's something wrong */
-	commandFailed(st, "perhaps the backend died while processing");
-	

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-21 Thread Craig Ringer
On 22 June 2017 at 09:07, Andres Freund  wrote:
> On 2017-06-22 09:03:05 +0800, Craig Ringer wrote:
>> On 22 June 2017 at 08:29, Andres Freund  wrote:
>>
>> > I.e. we're doing tiny write send() syscalls (they should be coalesced)
>>
>> That's likely worth doing, but can probably wait for a separate patch.
>
> I don't think so, we should get this right, it could have API influence.
>
>
>> The kernel will usually do some packet aggregation unless we use
>> TCP_NODELAY (which we don't and shouldn't), and the syscall overhead
>> is IMO not worth worrying about just yet.
>
> 1)
>             /*
>              * Select socket options: no delay of outgoing data for
>              * TCP sockets, nonblock mode, close-on-exec. Fail if any
>              * of this fails.
>              */
>             if (!IS_AF_UNIX(addr_cur->ai_family))
>             {
>                 if (!connectNoDelay(conn))
>                 {
>                     pqDropConnection(conn, true);
>                     conn->addr_cur = addr_cur->ai_next;
>                     continue;
>                 }
>             }
>
> 2) Even if nodelay weren't set, this can still lead to smaller packets
>being sent, because you start sending normal sized tcp packets,
>rather than jumbo ones, even if configured (pretty common these
>days).
>
> 3) Syscall overhead is actually quite significant.

Fair enough, and *headdesk* re not checking NODELAY. I thought I'd
checked for our use of that before, but I must've remembered wrong.

We could use TCP_CORK but it's not portable and it'd be better to just
collect up a buffer to dispatch.
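
The buffering idea is roughly the following (a hand-wavy sketch; the two
helper names are made up, only pqFlush() is real libpq):

    /*
     * Instead of one send() per queued query, append each message to the
     * connection's output buffer and push the batch out in one go.
     */
    while (batch_has_pending_message(conn))     /* hypothetical */
        append_message_to_outbuffer(conn);      /* hypothetical */

    (void) pqFlush(conn);       /* one (or a few) send() syscalls */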

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] An attempt to reduce WALWriteLock contention

2017-06-21 Thread Michael Paquier
On Wed, Jun 21, 2017 at 4:57 PM, jasrajd  wrote:
> We are also seeing contention on the walwritelock and repeated writes to the
> same offset if we move the flush outside the lock in the Azure environment.
> pgbench doesn't scale beyond ~8 cores without saturating the IOPs or
> bandwidth. Is there more work being done in this area?

As of now, there is no patch in the development queue for Postgres 11
that is dedicated to this particularly lock contention. There is a
patch for LWlocks in general with power PC, but that's all:
https://commitfest.postgresql.org/14/984/

Not sure if Kuntal has plans to submit again this patch. It is
actually a bit sad to not see things moving on and use an approach to
group flushes.
-- 
Michael




Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-21 Thread Andres Freund
On 2017-06-22 09:03:05 +0800, Craig Ringer wrote:
> On 22 June 2017 at 08:29, Andres Freund  wrote:
> 
> > I.e. we're doing tiny write send() syscalls (they should be coalesced)
> 
> That's likely worth doing, but can probably wait for a separate patch.

I don't think so, we should get this right, it could have API influence.


> The kernel will usually do some packet aggregation unless we use
> TCP_NODELAY (which we don't and shouldn't), and the syscall overhead
> is IMO not worth worrying about just yet.

1)
            /*
             * Select socket options: no delay of outgoing data for
             * TCP sockets, nonblock mode, close-on-exec. Fail if any
             * of this fails.
             */
            if (!IS_AF_UNIX(addr_cur->ai_family))
            {
                if (!connectNoDelay(conn))
                {
                    pqDropConnection(conn, true);
                    conn->addr_cur = addr_cur->ai_next;
                    continue;
                }
            }

2) Even if nodelay weren't set, this can still lead to smaller packets
   being sent, because you start sending normal sized tcp packets,
   rather than jumbo ones, even if configured (pretty common these
   days).

3) Syscall overhead is actually quite significant.

- Andres




Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-21 Thread Craig Ringer
On 22 June 2017 at 08:29, Andres Freund  wrote:

> I.e. we're doing tiny write send() syscalls (they should be coalesced)

That's likely worth doing, but can probably wait for a separate patch.
The kernel will usually do some packet aggregation unless we use
TCP_NODELAY (which we don't and shouldn't), and the syscall overhead
is IMO not worth worrying about just yet.

> and then completely unnecessarily call recv() over and over again
> without polling.  To me it looks very much like we're just doing either
> exactly once per command...

Yeah, that looks suspect.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Pluggable storage

2017-06-21 Thread Michael Paquier
On Thu, Jun 22, 2017 at 4:47 AM, Robert Haas  wrote:
> I think that BitmapHeapScan, at least, is applicable to any table AM
> that has TIDs.   It seems to me that in general we can imagine three
> kinds of table AMs:
>
> 1. Table AMs where a tuple can be efficiently located by a real TID.
> By a real TID, I mean that the block number part is really a block
> number and the item ID is really a location within the block.  These
> are necessarily quite similar to our current heap, but they can change
> the tuple format and page format to some degree, and it seems like in
> many cases it should be possible to plug them into our existing index
> AMs without too much heartache.  Both index scans and bitmap index
> scans ought to work.
>
> 2. Table AMs where a tuple has some other kind of locator.  For
> example, imagine an index-organized table where the locator is the
> primary key, which is a bit like what Alvaro had in mind for indirect
> indexes.  If the locator is 6 bytes or less, it could potentially be
> jammed into a TID, but I don't think that's a great idea.  For things
> like int8 or numeric, it won't work at all.  Even for other things,
> it's going to cause problems because the bit patterns won't be what
> the code is expecting; e.g. bitmap scans care about the structure of
> the TID, not just how many bits it is.  (Due credit: Somebody, maybe
> Alvaro, pointed out this problem before, at PGCon.)  For these kinds
> of tables, larger modifications to the index AMs are likely to be
> necessary, at least if we want a really general solution, or maybe we
> should have separate index AMs - e.g. btree for traditional TID-based
> heaps, and generic_btree or indirect_btree or key_btree or whatever
> for heaps with some other kind of locator.  It's not too hard to see
> how to make index scans work with this sort of structure but it's very
> unclear to me whether, or how, bitmap scans can be made to work.
>
> 3. Table AMs where a tuple doesn't really have a locator at all.  In
> these cases, we can't support any sort of index AM at all.  When the
> table is queried, there's really nothing the core system can do except
> ask the table AM for a full scan, supply the quals, and hope the table
> AM has some sort of smarts that enable it to optimize somehow.  For
> example, you can imagine converting cstore_fdw into a table AM of this
> sort - ORC has a sort of inbuilt BRIN-like indexing that allows whole
> chunks to be proven uninteresting and skipped.  (You could use chunk
> number + offset to turn this into a table AM of the previous type if
> you wanted to support secondary indexes; not sure if that'd be useful,
> but it'd certainly be harder.)
>
> I'm more interested in #1 than in #3, and more interested in #3 than
> #2, but other people may have different priorities.

Putting that in a couple of words.
1. Table AM with a 6-byte TID.
2. Table AM with a custom locator format, which could be TID-like.
3. Table AM with no locators.

Getting #1 to work first would already be really useful for users. My
take on the matter is that being able to plug in-core index AMs directly
into a type-1 table AM is more useful in the long term, as multiple table
AMs can then share an index AM that is designed nicely enough, so the
index AM logic does not need to be duplicated across table AMs. #3
implies that the index AM logic is implemented inside the table AM. Not
saying that it is not useful, but it does not feel natural to have the
planner request a sequential scan, only to have the table
AM secretly do some kind of index/skipping scan.
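
As a purely hypothetical sketch (no such API exists in the tree; the struct
and callback names are invented, and the types Relation, Snapshot,
TupleTableSlot, ItemPointer and CommandId are assumed from the usual backend
headers), a "type 1" table AM could boil down to a callback table roughly
like this:

    typedef struct TableAmRoutine
    {
        /* scans */
        void       *(*scan_begin) (Relation rel, Snapshot snapshot);
        bool        (*scan_getnext) (void *scan, TupleTableSlot *slot);
        void        (*scan_end) (void *scan);

        /* fetch by locator: a type-1 AM promises a real block/offset TID */
        bool        (*tuple_fetch) (Relation rel, ItemPointer tid,
                                    Snapshot snapshot, TupleTableSlot *slot);

        /* modifications */
        void        (*tuple_insert) (Relation rel, TupleTableSlot *slot,
                                     CommandId cid);
        void        (*tuple_delete) (Relation rel, ItemPointer tid,
                                     CommandId cid);
    } TableAmRoutine;

Index scans and bitmap scans can keep working against such an AM precisely
because tuple_fetch() takes an ItemPointer; types 2 and 3 above break that
assumption, which is where the index AM trouble starts.
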
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-21 Thread Amit Langote
On 2017/06/21 21:37, Jeevan Ladhe wrote:
> Hi Amit,
> 
> On Thu, Jun 15, 2017 at 12:31 PM, Amit Langote <
> langote_amit...@lab.ntt.co.jp> wrote:
> 
>> Oops, I meant to send one more comment.
>>
>> On 2017/06/15 15:48, Amit Langote wrote:
>>> BTW, I noticed the following in 0002
>> +errmsg("there exists a default
>> partition for table \"%s\", cannot
>> add a new partition",
>>
>> This error message style seems novel to me.  I'm not sure about the best
>> message text here, but maybe: "cannot add new partition to table \"%s\"
>> with default partition"
>>
> 
> This sounds confusing to me, what about something like:
> "\"%s\" has a default partition, cannot add a new partition."

It's the comma inside the error message that suggests to me that it's a
style that I haven't seen elsewhere in the backend code.  The primary
error message here is that the new partition cannot be created.  "%s has
default partition" seems to me to belong in errdetail() (see "What Goes
Where" in [1].)

Or write the sentence such that the comma is not required.  Anyway, we can
leave this for the committer to decide.
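
For illustration only (this is not the patch's actual code, and "parent" is
a stand-in variable), a split along the style-guide lines could look like:

    ereport(ERROR,
            (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
             errmsg("cannot add new partition to table \"%s\"",
                    RelationGetRelationName(parent)),
             errdetail("Table \"%s\" already has a default partition.",
                       RelationGetRelationName(parent))));

That keeps the primary message comma-free and pushes the "why" into
errdetail().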

> Note that this comment belongs to patch 0002, and it will go away
> in case we are going to have extended functionality i.e. patch 0003,
> as in that patch we allow user to create a new partition even in the
> cases when there exists a default partition.

Oh, that'd be great.  It's always better to get rid of the error
conditions that are hard to communicate to users. :)  (Although, this
one's not that ambiguous.)

Thanks,
Amit

[1] https://www.postgresql.org/docs/devel/static/error-style-guide.html



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ASOF join

2017-06-21 Thread Thomas Munro
On Mon, Jun 19, 2017 at 11:57 PM, Konstantin Knizhnik
 wrote:
> On 16.06.2017 19:07, David Fetter wrote:
>> If you turn your head sideways, it's very similar to the range merge
>> join Jeff Davis proposed.  https://commitfest.postgresql.org/14/1106/
>
> May be, but I do not understand how to limit result to contain exactly one
> (last) inner tuple for each outer tuple.

Yeah, it's somehow related but it's not the same thing.  I guess you
can think of the keys in the ASOF case as modelling range starts, with
range ends implied by the record with next higher/lower key.
Concretely, if every 'tick' row from my example in a nearby message
had a time but also an end time to be set when a new tick is inserted
so that each tick row had the complete effective time range for that
tick, then you could rewrite the query as "get me the tick whose
effective time range overlaps with each value of times.time" and get a
nice range merge join with Jeff's patch.  That sort of thing might be
very useful for SQL:2011 temporal query-style stuff, but it's not what
Konstantin wants to do: he wants to merge time series where the
effective range is not explicitly stored in every row.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rules on table partitions

2017-06-21 Thread Amit Langote
On 2017/06/21 18:49, Dean Rasheed wrote:
> On 20 June 2017 at 03:01, Amit Langote  wrote:
>> Hmm, yes.  The following exercise convinced me.
>>
>> create table r (a int) partition by range (a);
>> create table r1 partition of r for values from (1) to (10);
>> create rule "_RETURN" as on select to r1 do instead select * from r;
>>
>> insert into r values (1);
>> ERROR:  cannot insert into view "r1"
>> HINT:  To enable inserting into the view, provide an INSTEAD OF INSERT
>> trigger or an unconditional ON INSERT DO INSTEAD rule.
>>
>> The error is emitted by CheckValidResultRel() that is called on individual
>> leaf partitions when setting up tuple-routing in ExecInitModifyTable.
>>
>> I agree that we should forbid this case, so please find attached a patch.
>>
> 
> Committed.

Thanks, Dean.

Regards,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-21 Thread Andres Freund
On 2017-06-21 16:40:48 -0700, Andres Freund wrote:
> On 2017-06-20 17:51:23 +0200, Daniel Verite wrote:
> > Andres Freund wrote:
> > 
> > > FWIW, I still think this needs a pgbench or similar example integration,
> > > so we can actually properly measure the benefits.
> > 
> > Here's an updated version of the patch I made during review,
> > adding \beginbatch and \endbatch to pgbench.
> > The performance improvement appears clearly
> > with a custom script of this kind:
> >   \beginbatch
> >  UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
> >  ..above repeated 1000 times...
> >   \endbatch
> > 
> > versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch
> > 
> > On localhost on my desktop I tend to see a 30% difference in favor
> > of the batch mode with that kind of test.
> > On slower networks there are much bigger differences.
> 
> This is seriously impressive.  Just using the normal pgbench mixed
> workload, wrapping a whole transaction into a batch *doubles* the
> throughput.  And that's locally over a unix socket - the gain over
> actual network will be larger.

I've not analyzed this further, but something with the way network is
done isn't yet quite right either in the pgbench patch or in the libpq
patch.  You'll currently get IO like:

sendto(3, "B\0\0\0\22\0P1_2\0\0\0\0\0\0\1\0\0D\0\0\0\6P\0E\0\0\0\t\0"..., 36, 
MSG_NOSIGNAL, NULL, 0) = 36
sendto(3, "B\0\0\0\35\0P1_4\0\0\0\0\1\0\0\0\0073705952\0\1\0\0D\0"..., 47, 
MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\35\0P1_6\0\0\0\0\1\0\0\0\0077740854\0\1\0\0D\0"..., 47, 
MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\35\0P1_8\0\0\0\0\1\0\0\0\0071570280\0\1\0\0D\0"..., 47, 
MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\36\0P1_10\0\0\0\0\1\0\0\0\0072634305\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_12\0\0\0\0\1\0\0\0\0078960656\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_14\0\0\0\0\1\0\0\0\0073030370\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\35\0P1_16\0\0\0\0\1\0\0\0\006376125\0\1\0\0D\0"..., 47, 
MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\36\0P1_18\0\0\0\0\1\0\0\0\0072982423\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_20\0\0\0\0\1\0\0\0\0073860195\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_22\0\0\0\0\1\0\0\0\0072794433\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_24\0\0\0\0\1\0\0\0\0075475271\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\23\0P1_25\0\0\0\0\0\0\1\0\0D\0\0\0\6P\0E\0\0\0\t"..., 37, 
MSG_NOSIGNAL, NULL, 0) = 37
sendto(3, "S\0\0\0\4", 5, MSG_NOSIGNAL, NULL, 0) = 5
recvfrom(3, "2\0\0\0\4n\0\0\0\4C\0\0\0\nBEGIN\0002\0\0\0\4T\0\0\0!\0"..., 
16384, 0, NULL, NULL) = 775
recvfrom(3, 0x559a02667ff2, 15630, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667fb1, 15695, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667f6c, 15764, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667f2b, 15829, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667eea, 15894, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667ea9, 15959, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667e68, 16024, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667e24, 16092, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667de3, 16157, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667da2, 16222, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667d5d, 16291, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667d1c, 16356, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667d06, 16378, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)

I.e. we're doing tiny write send() syscalls (they should be coalesced)
and then completely unnecessarily call recv() over and over again
without polling.  To me it looks very much like we're just doing either
exactly once per command...
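
As a rough illustration of the second point (generic code, not the pgbench
or libpq patch; recv_when_ready is an invented helper), the usual way to
avoid spinning on EAGAIN with a non-blocking socket is to wait for
readability first:

    #include <poll.h>
    #include <sys/socket.h>

    static ssize_t
    recv_when_ready(int sock, void *buf, size_t len)
    {
        struct pollfd pfd;

        pfd.fd = sock;
        pfd.events = POLLIN;
        pfd.revents = 0;
        if (poll(&pfd, 1, -1) < 0)  /* block until data (or error) arrives */
            return -1;
        return recv(sock, buf, len, 0);
    }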

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optional message to user when terminating/cancelling backend

2017-06-21 Thread Michael Paquier
On Wed, Jun 21, 2017 at 11:42 PM, Daniel Gustafsson  wrote:
> The message is truncated in SetBackendCancelMessage() for safety, but
> pg_{cancel|terminate}_backend() could report an error for an overly long
> message, or a warning about truncation, to the caller as well.  Personally I think a warning is 
> the
> appropriate response, but I don’t really have a strong opinion.

And a NOTICE? That's what happens for relation name truncation. You
are right that having a check in SetBackendCancelMessage() makes the
most sense as bgworkers could just call the low level API. Isn't the
concept actually closer to just a backend message? This slot could be
used for other purposes than cancellation.
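
To make the truncation question concrete, here is a sketch of what the
low-level setter might do (purely illustrative; MAX_CANCEL_MSG and the slot
layout are stand-ins, not the patch's actual definitions):

    if (strlen(message) >= MAX_CANCEL_MSG)
    {
        memcpy(slot->message, message, MAX_CANCEL_MSG - 1);
        slot->message[MAX_CANCEL_MSG - 1] = '\0';
        ereport(NOTICE,     /* or WARNING, per the discussion above */
                (errmsg("message is too long and has been truncated")));
    }
    else
        strlcpy(slot->message, message, MAX_CANCEL_MSG);
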
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RLS in CTE incorrect permission failure

2017-06-21 Thread Rod Taylor
On Wed, Jun 21, 2017 at 7:46 PM, Tom Lane  wrote:

> Rod Taylor  writes:
> > In the attached script, the second insert into t2 (as part of the CTE)
> > should succeed.
>
> No, I don't think so.  You declared the check function as STABLE which
> means it is confined to seeing the same snapshot as the surrounding query.
> So it can't see anything inserted by that query.
>
> Possibly it'd work as you wish with a VOLATILE function.
>

Indeed, that works as expected.

Sorry for the noise.


-- 
Rod Taylor


Re: [HACKERS] [COMMITTERS] pgsql: Restart logical replication launcher when killed

2017-06-21 Thread Andres Freund
On 2017-06-22 08:46:35 +0900, Michael Paquier wrote:
> On Thu, Jun 22, 2017 at 4:16 AM, Peter Eisentraut  wrote:
> > Restart logical replication launcher when killed
> 
> -   /* The logical replication launcher can be stopped at any time. */
> -   proc_exit(0);
> +   /* The logical replication launcher can be stopped at any time.
> +* Use exit status 1 so the background worker is restarted. */
> +   proc_exit(1);
> I know I am noisy on the matter but... This comment format is not PG-like.

That's since been repaired by the new pgindent run.  But what I'm a bit
confused about is why we're going for proc_exit() for replication
launchers when other types of processes simply FATAL out?  Seems like a
weird change.  It's not like it's not log-worthy if somebody kills the
launcher?
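
For comparison, the way most other background processes go away on SIGTERM
is simply something like the following (illustrative; this mirrors the usual
pattern rather than quoting any particular process):

    ereport(FATAL,
            (errcode(ERRCODE_ADMIN_SHUTDOWN),
             errmsg("terminating logical replication launcher due to administrator command")));

which at least leaves a trace in the log.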

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RLS in CTE incorrect permission failure

2017-06-21 Thread Tom Lane
Rod Taylor  writes:
> In the attached script, the second insert into t2 (as part of the CTE)
> should succeed.

No, I don't think so.  You declared the check function as STABLE which
means it is confined to seeing the same snapshot as the surrounding query.
So it can't see anything inserted by that query.

Possibly it'd work as you wish with a VOLATILE function.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Restart logical replication launcher when killed

2017-06-21 Thread Michael Paquier
On Thu, Jun 22, 2017 at 4:16 AM, Peter Eisentraut  wrote:
> Restart logical replication launcher when killed

-   /* The logical replication launcher can be stopped at any time. */
-   proc_exit(0);
+   /* The logical replication launcher can be stopped at any time.
+* Use exit status 1 so the background worker is restarted. */
+   proc_exit(1);
I know I am noisy on the matter but... This comment format is not PG-like.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-21 Thread Craig Ringer
On 22 Jun. 2017 07:40, "Andres Freund"  wrote:

On 2017-06-20 17:51:23 +0200, Daniel Verite wrote:
>   Andres Freund wrote:
>
> > FWIW, I still think this needs a pgbench or similar example integration,
> > so we can actually properly measure the benefits.
>
> Here's an updated version of the patch I made during review,
> adding \beginbatch and \endbatch to pgbench.
> The performance improvement appears clearly
> with a custom script of this kind:
>   \beginbatch
>  UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
>  ..above repeated 1000 times...
>   \endbatch
>
> versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch
>
> On localhost on my desktop I tend to see a 30% difference in favor
> of the batch mode with that kind of test.
> On slower networks there are much bigger differences.

This is seriously impressive.  Just using the normal pgbench mixed
workload, wrapping a whole transaction into a batch *doubles* the
throughput.  And that's locally over a unix socket - the gain over
actual network will be larger.


In my original tests I got over a 300x improvement on WAN :) . I should
check if the same applies with pgbench.


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-21 Thread Andres Freund
On 2017-06-20 17:51:23 +0200, Daniel Verite wrote:
>   Andres Freund wrote:
> 
> > FWIW, I still think this needs a pgbench or similar example integration,
> > so we can actually properly measure the benefits.
> 
> Here's an updated version of the patch I made during review,
> adding \beginbatch and \endbatch to pgbench.
> The performance improvement appears clearly
> with a custom script of this kind:
>   \beginbatch
>  UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
>  ..above repeated 1000 times...
>   \endbatch
> 
> versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch
> 
> On localhost on my desktop I tend to see a 30% difference in favor
> of the batch mode with that kind of test.
> On slower networks there are much bigger differences.

This is seriously impressive.  Just using the normal pgbench mixed
workload, wrapping a whole transaction into a batch *doubles* the
throughput.  And that's locally over a unix socket - the gain over
actual network will be larger.

\set nbranches 1 * :scale
\set ntellers 10 * :scale
\set naccounts 10 * :scale
\set aid random(1, :naccounts)
\set bid random(1, :nbranches)
\set tid random(1, :ntellers)
\set delta random(-5000, 5000)
\beginbatch
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, 
:aid, :delta, CURRENT_TIMESTAMP);
END;
\endbatch

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] RLS in CTE incorrect permission failure

2017-06-21 Thread Rod Taylor
In the attached script, the second insert into t2 (as part of the CTE)
should succeed. My actual use case isn't much more complex; the function is
used primarily to allow peeking at columns that the function definer has
access to but a typical user does not. The function also makes it easy to copy
this policy to a number of structures.

The function within the policy doesn't seem to be able to see records
inserted by earlier statements in the CTE. Perhaps this is as simple as
adding a command counter increment in the right place?

Fails in 9.5.7 and HEAD.

-- 
Rod Taylor


cte_rls_fail.sql
Description: application/sql

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PG 10beta2 schedule

2017-06-21 Thread Tom Lane
The release team has agreed that a good time to put out beta2 will
be the second week of July, ie wrap tarballs 10 July for announcement
13 July.

I imagine beta3 will appear along with the scheduled back-branch
releases in the second week of August, but that's not positively
decided yet.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests

2017-06-21 Thread Tom Lane
Today, lorikeet failed with a new variant on the bgworker start crash:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2017-06-21%2020%3A29%3A10

This one is even more exciting than the last one, because it sure looks
like the crashing bgworker took the postmaster down with it.  That is
Not Supposed To Happen.

Wondering if we broke something here recently, I tried to reproduce it
on a Linux machine by adding a randomized Assert failure in
shm_mq_set_sender.  I don't see any such problem, even with EXEC_BACKEND;
we recover from the crash as-expected.

So I'm starting to get a distinct feeling that there's something wrong
with the cygwin port.  But I dunno what.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re-indent HEAD tomorrow?

2017-06-21 Thread Tom Lane
Andres Freund  writes:
> On 2017-06-21 17:28:32 -0400, Tom Lane wrote:
>> And I think it'd make sense to wait a few
>> months and garner some experience with back-patching from v10 into the
>> older branches, so we have more than guesses about how much pain not
>> reindenting will be for us.

> Hm.  A few days / a week or two, okay.  But a few months?  That'll incur
> the backpatch cost during all that time...

We can always move up the decision date if we find that our pain
sensors have overloaded.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re-indent HEAD tomorrow?

2017-06-21 Thread Andres Freund
On 2017-06-21 17:28:32 -0400, Tom Lane wrote:
> Right now we're really just speculating about how much pain there will
> be, on either end of this.  So it'd be interesting for somebody who's
> carrying large out-of-tree patches (EDB? Citus?) to try the new
> pgindent version on a back branch and see how much of their patches no
> longer apply afterwards.

Citus isn't a patched version of postgres anymore, but an extension (with
some ugly hacks to make that possible ...).  Therefore it shouldn't
affect us in any meaningful way.

> And I think it'd make sense to wait a few
> months and garner some experience with back-patching from v10 into the
> older branches, so we have more than guesses about how much pain not
> reindenting will be for us.

Hm.  A few days / a week or two, okay.  But a few months?  That'll incur
the backpatch cost during all that time...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE SUBSCRIPTION log noise

2017-06-21 Thread Euler Taveira
2017-06-21 15:14 GMT-03:00 Peter Eisentraut <
peter.eisentr...@2ndquadrant.com>:

> > It doesn't appear to be contingent on anything other than the content of
> > the command you just gave it.  I don't think we need a NOTICE saying
> > that it did that thing I just told it to do--that should be implicit by
> > the lack of an ERROR.
>
> I'm appreciative of this complaint.  The point is that we are allocating
> resources on a remote host that should not be forgotten.  I could go
> either way.


It is documented that a subscription can create a replication slot on the
remote host, as mentioned in the section "Replication Slot Management". If we
want to keep the message, let's downgrade it to DEBUG1 (as we do with "create
implicit index for table"), but I would prefer to rip it out.


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: [HACKERS] Phantom segment upon promotion causing troubles.

2017-06-21 Thread Andres Freund
Hi,

On 2017-06-20 16:11:32 +0300, Heikki Linnakangas wrote:
> On 06/19/2017 10:30 AM, Andres Freund wrote:
> > Greg Burek from Heroku (CCed) reported a weird issue on IM, that was
> > weird enough to be interesting.  What he'd observed was that he promoted
> > some PITR standby, and early clones of that node work, but later clones
> > did not, failing to read some segment.
> > 
> > The problems turns out to be the following:  [explanation]
> 
> Good detective work!

Thanks.


> > The minimal fix here is presumably not to use XLByteToPrevSeg() in
> > RemoveXlogFile(), but XLByteToSeg(). I don't quite see what purpose it
> > serves here - I don't think it's ever needed.
> 
> Agreed, I don't see a reason for it either.

Pushed. And found like three other things while investigating :/


> > There seems to be a larger question ehre though: Why does
> > XLogFileReadAnyTLI() probe all timelines even if they weren't a parent
> > at that period?  That seems like a bad idea, especially in more
> > complicated scenarios where some precursor timeline might live for
> > longer than it was a parent?  ISTM XLogFileReadAnyTLI() should check
> > which timeline a segment ought to come from, based on the historY?
> 
> Yeah. I've had that thought for years as well, but there has never been any
> pressing reason to bite the bullet and rewrite it, so I haven't gotten
> around to it.

Heh.  Still seems like something we should tackle - but it'd not be
urgent enough to backpatch, so it doesn't quite seem like something to
tackle *just now* :/

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re-indent HEAD tomorrow?

2017-06-21 Thread Tom Lane
Bruce Momjian  writes:
> On Wed, Jun 21, 2017 at 04:07:30PM -0400, Tom Lane wrote:
>> ... and it's done.

> You are eventually doing all active branches, right?

I don't think we'd entirely decided that we should do that, or when
to do it.  I'm not in a huge hurry; we might find some more tweaks
we want to make -- particularly around typedef collection -- before
we call it done.

For reference, the patchset wound up changing just about 1 lines,
out of a total code base approaching 1.4M lines, so less than 1%
churn.  That's not as bad as I'd thought it would be going in.
But I'm not sure if that represents an argument for or against
reindenting the back branches.  It's probably more than the number of
lines affected by that 8.1 comment-right-margin adjustment that caused
us so much back-patching pain later.

Right now we're really just speculating about how much pain there will
be, on either end of this.  So it'd be interesting for somebody who's
carrying large out-of-tree patches (EDB? Citus?) to try the new
pgindent version on a back branch and see how much of their patches no
longer apply afterwards.  And I think it'd make sense to wait a few
months and garner some experience with back-patching from v10 into the
older branches, so we have more than guesses about how much pain not
reindenting will be for us.

I'd earlier suggested that waiting till around the time of 10.0
release might be a good idea, and that still seems like a
reasonable timeframe.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Broken O(n^2) avoidance in wal segment recycling.

2017-06-21 Thread Andres Freund
Hi,

Author: Heikki Linnakangas 
Branch: master Release: REL9_5_BR [b2a5545bd] 2015-04-13 16:53:49 +0300
Branch: REL9_4_STABLE Release: REL9_4_2 [d72792d02] 2015-04-13 17:22:21 +0300
Branch: REL9_3_STABLE Release: REL9_3_7 [a800267e4] 2015-04-13 17:22:35 +0300
Branch: REL9_2_STABLE Release: REL9_2_11 [cc2939f44] 2015-04-13 17:26:59 +0300
Branch: REL9_1_STABLE Release: REL9_1_16 [ad2925e20] 2015-04-13 17:26:49 +0300
Branch: REL9_0_STABLE Release: REL9_0_20 [5b6938186] 2015-04-13 17:26:35 +0300

Don't archive bogus recycled or preallocated files after timeline switch.

That commit moved xlog file deletion from RemoveOldXlogFiles() into its own
RemoveXlogFile() routine, because it introduced a new function that also
deletes/recycles segments.

It did so moving
+   /* Needn't recheck that slot on future iterations */
+   endlogSegNo++;
into the new routine. But it's useless there, because it's just a stack
variable, which is going to be freshly computed with
+   XLByteToPrevSeg(endptr, endlogSegNo);
on the next call.

That logic was introduced in

commit 61b861421b0b849a0dffe36238b8e504624831c1
Author: Tom Lane 
Date:   2005-04-15 18:48:10 +

Modify MoveOfflineLogs/InstallXLogFileSegment to avoid O(N^2) behavior
when recycling a large number of xlog segments during checkpoint.
The former behavior searched from the same start point each time,
requiring O(checkpoint_segments^2) stat() calls to relocate all the
segments.  Instead keep track of where we stopped last time through.

but was neutered by the commit above.
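
In schematic form (generic illustration only, not the actual xlog.c code),
the point of carrying the position across iterations is:

    #include <stdbool.h>

    static bool occupied[1024];         /* stand-in for the on-disk slots */

    static int
    install_segment(int *search_from)
    {
        while (*search_from < 1024 && occupied[*search_from])
            (*search_from)++;           /* stand-in for the stat() probe */
        if (*search_from >= 1024)
            return -1;                  /* no free slot */
        occupied[*search_from] = true;
        return (*search_from)++;        /* needn't recheck that slot later */
    }

If the caller rederives search_from before every call, as happens now, each
installation re-probes all previously filled slots and the total work is
quadratic again.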

We've not heard any complaints about this afaik, but it's not something
that's easily diagnosable as being a problem.  Therefore I suspect we
should fix and backpatch this?

Heikki?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re-indent HEAD tomorrow?

2017-06-21 Thread Bruce Momjian
On Wed, Jun 21, 2017 at 04:07:30PM -0400, Tom Lane wrote:
> I wrote:
> > Barring objections, I'd like to reindent HEAD with the new version
> > of pg_bsd_indent (and correspondingly updated pgindent script)
> > tomorrow, say around 1800 UTC.
> 
> ... and it's done.

You are eventually doing all active branches, right?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-06-21 Thread Robert Haas
On Wed, Jun 21, 2017 at 1:38 PM, Amit Khandekar  wrote:
>>> Yep, it's more appropriate to use
>>> ModifyTableState->rootResultRelationInfo->ri_RelationDesc somehow.  That
>>> is, if answer to the question I raised above is positive.
>
> From what I had checked earlier when coding that part,
> rootResultRelInfo is NULL in case of inserts, unless something has
> changed in later commits. That's the reason I decided to use the first
> resultRelInfo.

We're just going around in circles here.  Saying that you decided to
use the first child's resultRelInfo because you didn't have a
resultRelInfo for the parent is an explanation of why you wrote the
code the way you did, but that doesn't make it correct.  I want to
know why you think it's correct.

I think it's probably wrong, because it seems to me that if the INSERT
code needs to use the parent's ResultRelInfo rather than the first
child's ResultRelInfo, the UPDATE code probably needs to do the same.
Commit d3cc37f1d801a6b5cad9bf179274a8d767f1ee50 got rid of
resultRelInfos for non-leaf partitions, and commit
e180c8aa8caf5c55a273d4a8e6092e77ff3cff10 added the resultRelInfo back
for the topmost parent, because otherwise it didn't work correctly.
If every partition in the hierarchy has a different attribute
ordering, then it seems to me that it must surely matter which of
those attribute orderings we pick.  It's hard to imagine that we can
pick *either* the parent's attribute ordering *or* that of the first
child and nothing will be different - the attribute numbers inside the
returning lists and WCOs we create have got to get used somehow, so
surely it matters which attribute numbers we use, doesn't it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making server name part of the startup message

2017-06-21 Thread Tom Lane
Satyanarayana Narlapuram  writes:
>> Why do we need to incur a protocol break to add another one?

> This is optional and is not a protocol break.

Yes, it is.  We've been around on this sort of thing before and we
understand the consequences.  If the option is carried in the startup
message, the client has to send it without knowing whether the server
is of new enough version to accept it.  If not, the server will reject
the connection (with a scary looking message in its log) and the client
then has to retry without the option.  This is not distinguishable from
what you have to do if you consider the startup message as belonging
to a new protocol version 4 instead of 3.

We have done this in the past, but it's painful, subject to bugs,
and generally is a pretty high price to pay for a marginal feature.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-06-21 Thread Robert Haas
On Wed, Jun 21, 2017 at 1:37 PM, Amit Khandekar  wrote:
>> e.g. the table is partitioned on order number, and you do UPDATE
>> lineitem SET product_code = 'K372B' WHERE product_code = 'K372'.
>
> This query does not update order number, so here there is no
> partition-key-update. Are you thinking that the patch is generating
> the per-leaf-partition WCO expressions even for a update not involving
> a partition key ?

No, it just wasn't a great example.  Sorry.

>> Second, it will amount to a functional bug if you get a
>> different answer than the planner did.
>
> Actually, the per-leaf WCOs are meant to be executed on the
> destination partitions where the tuple is moved, while the WCOs
> belonging to the per-subplan resultRelInfo are meant for the
> resultRelinfo used for the UPDATE plans. So actually it should not
> matter whether they look same or different, because they are fired at
> different objects. Now these objects can happen to be the same
> relations though.
>
> But in any case, it's not clear to me how the mapped WCO and the
> planner's WCO would yield a different answer if they are both the same
> relation. I am possibly missing something. The planner has already
> generated the withCheckOptions for each of the resultRelInfo. And then
> we are using one of those to re-generate the WCO for a leaf partition
> by only adjusting the attnos. If there is already a WCO generated in
> the planner for that leaf partition (because that partition was
> present in mtstate->resultRelInfo), then the re-built WCO should be
> exactly look same as the earlier one, because they are the same
> relations, and so the attnos generated in them would be same since the
> Relation TupleDesc is the same.

If the planner's WCOs and mapped WCOs are always the same, then I
think we should try to avoid generating both.  If they can be
different, but that's intentional and correct, then there's no
substantive problem with the patch but the comments need to make it
clear why we are generating both.

> Actually I meant, "above works for only local updates. For
> row-movement-updates, we need per-leaf partition WCOs, because when
> the row is inserted into target partition, that partition may be not
> be included in the above planner resultRelInfo, so we need WCOs for
> all partitions". I think this said comment should be sufficient if I
> add this in the code ?

Let's not get too focused on updating the comment until we are in
agreement about what the code ought to be doing.  I'm not clear
whether you accept the point that the patch needs to be changed to
avoid generating the same WCOs and returning lists in both the planner
and the executor.

>> Also, I feel like it's probably not correct to use the first result
>> relation as the nominal relation for building WCOs and returning lists
>> anyway.  I mean, if the first result relation has a different column
>> order than the parent relation, isn't this just broken?  If it works
>> for some reason, the comments don't explain what that reason is.
>
> Not sure why parent relation should come into picture. As long as the
> first result relation belongs to one of the partitions in the whole
> partition tree, we should be able to use that to build WCOs of any
> other partitions, because they have a common set of attributes having
> the same name. So we are bound to find each of the attributes of first
> resultRelInfo in the other leaf partitions during attno mapping.

Well, at least for returning lists, we've got to generate the
returning lists so that they all match the column order of the parent,
not the parent's first child.  Otherwise, for example, UPDATE
parent_table ... RETURNING * will not work correctly.  The tuples
returned by the returning clause have to have the attribute order of
parent_table, not the attribute order of parent_table's first child.
I'm not sure whether WCOs have the same issue, but it's not clear to
me why they wouldn't: they contain a qual which is an expression tree,
and presumably there are Var nodes in there someplace, and if so, then
they have varattnos that have to be right for the purpose for which
they're going to be used.
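
A tiny worked example of why the source of the attnos matters (hypothetical
tables, illustration only):

    /*
     * parent:  a = attno 1, b = attno 2
     * child1:  b = attno 1, a = attno 2   (same columns, different order)
     *
     * A Var referencing "a" that was built against child1 carries
     * varattno 2; reusing that expression tree for the parent, or for a
     * child declared in the parent's order, would silently read column b.
     * Hence a remapping step is needed, e.g. child1 attno -> parent attno:
     */
    static const AttrNumber child1_to_parent_map[] = {2, 1};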

>> +for (i = 0; i < num_rels; i++)
>> +{
>> +ResultRelInfo *resultRelInfo = _rels[i];
>> +Relationrel = resultRelInfo->ri_RelationDesc;
>> +Bitmapset *expr_attrs = NULL;
>> +
>> +pull_varattnos((Node *) rel->rd_partcheck, 1, _attrs);
>> +
>> +/* Both bitmaps are offset by FirstLowInvalidHeapAttributeNumber. */
>> +if (bms_overlap(expr_attrs, GetUpdatedColumns(resultRelInfo, 
>> estate)))
>> +return true;
>> +}
>>
>> This seems like an awfully expensive way of performing this test.
>> Under what circumstances could this be true for some result relations
>> and false for others;
>
> One resultRelinfo can have no partition key column used in its quals,
> but the next resultRelinfo can have quite different quals, and these

Re: [HACKERS] postgresql transactons not fully isolated

2017-06-21 Thread Merlin Moncure
On Tue, Jun 20, 2017 at 2:58 PM, Merlin Moncure  wrote:
> On Tue, Jun 20, 2017 at 2:34 PM, David G. Johnston
>  wrote:
>> On Tue, Jun 20, 2017 at 12:22 PM, Chapman Flack  
>> wrote:
>>> I get the reported result (DELETE 0 and a table containing 2 and 3)
>>> in both 'read committed' and 'read uncommitted'.
>>
>> Practically speaking those are a single transaction isolation mode.
>>
>> https://www.postgresql.org/docs/10/static/transaction-iso.html
>>
>> I think Merlin has mis-read the article he linked to.  The example
>> being used there never claims to be done under serialization and seems
>> to describe an example of the perils of relying on the default
>> isolation level.
>
> oops -- could be operator error :-)


Yep, I made the rookie mistake of setting the transaction isolation level
outside a transaction block (so it immediately evaporated) rather than
changing the default.  Sorry for the noise; serialization failures are
raised, and that is acceptable behavior per the spec, AIUI.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re-indent HEAD tomorrow?

2017-06-21 Thread Tom Lane
I wrote:
> Barring objections, I'd like to reindent HEAD with the new version
> of pg_bsd_indent (and correspondingly updated pgindent script)
> tomorrow, say around 1800 UTC.

... and it's done.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pluggable storage

2017-06-21 Thread Robert Haas
On Mon, Jun 12, 2017 at 9:50 PM, Haribabu Kommi
 wrote:
> Open Items:
>
> 1. The BitmapHeapScan and TableSampleScan are tightly coupled with
> HeapTuple and HeapScanDesc, So these scans are directly operating
> on those structures and providing the result.
>
> These scan types may not be applicable to different storage formats.
> So how to handle them?

I think that BitmapHeapScan, at least, is applicable to any table AM
that has TIDs.   It seems to me that in general we can imagine three
kinds of table AMs:

1. Table AMs where a tuple can be efficiently located by a real TID.
By a real TID, I mean that the block number part is really a block
number and the item ID is really a location within the block.  These
are necessarily quite similar to our current heap, but they can change
the tuple format and page format to some degree, and it seems like in
many cases it should be possible to plug them into our existing index
AMs without too much heartache.  Both index scans and bitmap index
scans ought to work.

2. Table AMs where a tuple has some other kind of locator.  For
example, imagine an index-organized table where the locator is the
primary key, which is a bit like what Alvaro had in mind for indirect
indexes.  If the locator is 6 bytes or less, it could potentially be
jammed into a TID, but I don't think that's a great idea.  For things
like int8 or numeric, it won't work at all.  Even for other things,
it's going to cause problems because the bit patterns won't be what
the code is expecting; e.g. bitmap scans care about the structure of
the TID, not just how many bits it is.  (Due credit: Somebody, maybe
Alvaro, pointed out this problem before, at PGCon.)  For these kinds
of tables, larger modifications to the index AMs are likely to be
necessary, at least if we want a really general solution, or maybe we
should have separate index AMs - e.g. btree for traditional TID-based
heaps, and generic_btree or indirect_btree or key_btree or whatever
for heaps with some other kind of locator.  It's not too hard to see
how to make index scans work with this sort of structure but it's very
unclear to me whether, or how, bitmap scans can be made to work.

3. Table AMs where a tuple doesn't really have a locator at all.  In
these cases, we can't support any sort of index AM at all.  When the
table is queried, there's really nothing the core system can do except
ask the table AM for a full scan, supply the quals, and hope the table
AM has some sort of smarts that enable it to optimize somehow.  For
example, you can imagine converting cstore_fdw into a table AM of this
sort - ORC has a sort of inbuilt BRIN-like indexing that allows whole
chunks to be proven uninteresting and skipped.  (You could use chunk
number + offset to turn this into a table AM of the previous type if
you wanted to support secondary indexes; not sure if that'd be useful,
but it'd certainly be harder.)

I'm more interested in #1 than in #3, and more interested in #3 than
#2, but other people may have different priorities.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding connection id in the startup message

2017-06-21 Thread David G. Johnston
On Wed, Jun 21, 2017 at 12:15 PM, Robert Haas  wrote:
> I think the problem is real,
> but I'm not sure that this is the best solution.  On the other hand,
> I'm also not entirely sure I understand the proposal yet.

Given the problems with changing the protocol it does seem like
generalizing away from "connection id" to "arbitrary payload" would be
useful.

Like, add a new "attachment map" area where any application-aware node
that encounters the message can attach data.  If some node further
downstream sends a response message the contents of the attachment map
would be sent back as-is.

The advantage of "connection id" is that the messages are still of
fixed size whereas the more flexible arrangement would necessitate
that message sizes vary.  In return middleware gets a place to store
session information that can be used more broadly than a simple
externally generated connection id.

David J.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Missing comment for ResultRelInfo in execnodes.h

2017-06-21 Thread Peter Eisentraut
On 6/20/17 22:37, Etsuro Fujita wrote:
> On 2017/06/21 3:30, Peter Eisentraut wrote:
>> On 6/20/17 05:50, Etsuro Fujita wrote:
>>> Here is a small patch to add a comment on its new member PartitionRoot.
>>
>> The existing comment style is kind of unusual.  How about the attached
>> to clean it up a bit?
> 
> +1 for that change.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making server name part of the startup message

2017-06-21 Thread Satyanarayana Narlapuram
PgBouncer, for example, assumes that database names are unique across the 
database clusters it is serving. Our front-end gateways can serve tens of 
thousands of Postgres servers spanning multiple customers and organizations, 
and enforcing unique database names on the users of the service is not 
possible.

> For the original idea in this thread, using something like dbname@server 
> seems a more logical choice than username@server.


We considered this option, but connecting to the database from GUI tools is 
then not very intuitive, or even possible. Also, the \c command in psql would 
require including the full cluster_name every time the user connects to a 
different database.


Thanks,
Satya
From: Magnus Hagander [mailto:mag...@hagander.net]
Sent: Thursday, June 15, 2017 9:24 AM
To: Tom Lane 
Cc: Alvaro Herrera ; Satyanarayana Narlapuram 
; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Making server name part of the startup message

On Thu, Jun 15, 2017 at 5:57 PM, Tom Lane  wrote:
Alvaro Herrera  writes:
> Tom Lane wrote:
>> This makes no sense at all.  The client is telling the server what the
>> server's name is?

> I think for instance you could have one pgbouncer instance (or whatever
> pooler) pointing to several different servers.  So the client connects
> to the pooler and indicates which of the servers to connect to.

I should think that in such cases, the end client is exactly not what
you want to be choosing which server it gets redirected to.  You'd
be wanting to base that on policies defined at the pooler.  There are
already plenty of client-supplied attributes you could use as inputs
for such policies (user name and application name, for instance).
Why do we need to incur a protocol break to add another one?

The normal one to use for pgbouncer today is, well, "database name". You can 
then have pgbouncer map different databases to different backend servers. It's 
fairly common in my experience to have things like "dbname" and "dbname-ro" 
(for example) as different database names with one mapping to the master and 
one mapping to a load-balanced set of standbys, and things like that. ISTM that 
using the database name is a good choice for that.

For the original idea in this thread, using something like dbname@server seems 
a more logical choice than username@server.

TBH, so maybe I'm misunderstanding the original issue?

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/


Re: [HACKERS] Making server name part of the startup message

2017-06-21 Thread Satyanarayana Narlapuram
> I should think that in such cases, the end client is exactly not what you 
> want to be choosing which server it gets redirected to.  You'd be wanting to 
> base that on policies defined at the pooler.  There are already plenty of 
> client-supplied attributes you could use as inputs for such policies (user 
> name and application name, for instance).
The pooler would be the end client for the Postgres database cluster, and the 
connection string changes are required at the pooler. There is no change in 
the connection string format in such cases.

>Why do we need to incur a protocol break to add another one?
This is optional and is not a protocol break. It doesn't make the cluster 
name field mandatory in the startup message. The server name is included in 
the startup message only if the client specifies the extra parameter in the 
connection string; otherwise it is not. In a proxy scenario, the end client's 
startup message doesn't need to include the server name, and it is optional 
for the proxy to include this field when sending the startup message to the 
server. For the Azure PostgreSQL service, setting this field is preferred 
over appending the cluster name to the user name.

Proposed LibPQ connection string format would be:

host=localhost port=5432 dbname=mydb connect_timeout=10 
include_cluster_name=true 

include_cluster_name is an optional parameter; setting it to true includes 
cluster_name in the startup message, and otherwise it is not included.

Thanks,
Satya

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Thursday, June 15, 2017 8:58 AM
To: Alvaro Herrera 
Cc: Satyanarayana Narlapuram ; 
pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Making server name part of the startup message

Alvaro Herrera  writes:
> Tom Lane wrote:
>> This makes no sense at all.  The client is telling the server what 
>> the server's name is?

> I think for instance you could have one pgbouncer instance (or 
> whatever
> pooler) pointing to several different servers.  So the client connects 
> to the pooler and indicates which of the servers to connect to.

I should think that in such cases, the end client is exactly not what you want 
to be choosing which server it gets redirected to.  You'd be wanting to base 
that on policies defined at the pooler.  There are already plenty of 
client-supplied attributes you could use as inputs for such policies (user name 
and application name, for instance).
Why do we need to incur a protocol break to add another one?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication launcher never been restarted when terminated

2017-06-21 Thread Peter Eisentraut
On 6/21/17 13:03, Yugo Nagata wrote:
> As I report in another thread[1], when the logical replication launcher 
> is terminated by SIGTERM, it never been restarted and we need to restart
> the server to enable logical replication again.
> 
> This is because the logical replication launcher exits with exitstatus 0,
> so if it exits with status 1 it is restarted by the postmaster normally.
> Attached is a simple patch to fix it in this way.

Fixed, thanks!

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding connection id in the startup message

2017-06-21 Thread Robert Haas
On Thu, Jun 15, 2017 at 9:50 AM, Tom Lane  wrote:
> Can you give a concrete example where this would have
> helped above and beyond knowing, eg, the source and time of the connection
> attempt?

I can imagine that in really high-volume use cases (such as the OP
apparently has) the number of client connections might be so large
that identification by timestamp isn't useful, and the source IP will
be obscured after the first hop through a connection pooler or other
middleware.  If you've got 100 connections per second coming in,
matching things up by timestamp across different machines is going to
be tough.

But I agree with your other concerns.  I think the problem is real,
but I'm not sure that this is the best solution.  On the other hand,
I'm also not entirely sure I understand the proposal yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding connection id in the startup message

2017-06-21 Thread Robert Haas
On Thu, Jun 15, 2017 at 3:11 AM, Satyanarayana Narlapuram
 wrote:
> The proposal is to tweak the connectivity wire protocol, and add a
> connection id (GUID) field in the startup message. We can trace the
> connection using this GUID and investigate further on where the connection
> failed.

Wire protocol changes are scary, because they can result in a need to
update every client connector and every bit of middleware that speaks
PostgreSQL, not just the server.  If we thought this feature had
enough value to justify adding it, we could add it as an optional
feature so that existing clients don't break.  But if we added it and
only libpq and the server ended up supporting it, that would be a
disappointing outcome.

> Client adds a connection id in the startup message and sends it to the server
> it is trying to connect to. The proxy logs the connection id information in its
> logs, and passes it to the server. The server logs the connection id in the
> server log, and sets it in the GUC variable (ConnectionId).

Are you imagining that this would be done by the client or by the
driver the client is using?  If the latter, it's transparent but maybe
useless, because the client won't even know that this ID got created.
If it's done by the client code proper, then it only works if not only
the server and every bit of middleware and every connector but also
every client application in the world is updated, which does not seem
like a thing that is likely to occur.

> This field can be an optional field driven by the connection parameters for
> psql (-C or --include-clientId).

This makes it sound like you're imagining it being added by an
explicit action on the part of the client, which means we're in
update-every-PostgreSQL-application-in-the-world territory.

> P.S: I am looking for initial feedback on this idea and can provide more
> design details if needed.

I think you need to build a more compelling case for why this would be
useful, why application_name isn't the right answer, and how we avoid
forcing everybody in the world to worry about this new thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GSoC 2017 Proposal for predicate locking in hash index

2017-06-21 Thread Shubham Barai
Hi,

Now that hash indexes support write-ahead logging, it would be great to add
support for predicate locking to them.
Implementing predicate locking in hash indexes seems very simple.
I have already made changes in the code. I am currently working on testing.

Here is my approach

1) PredicateLockPage()

->_hash_first()

During a scan operation, acquire a predicate lock on the primary page of a
bucket.

2) CheckForSerializableConflictIn()

->_hash_doinsert()

During an insert operation, check if there is any predicate lock on the
primary page of a bucket.


3) PredicateLockPageSplit()

In case of a bucket split, copy predicate lock from the primary page of an
old bucket to the primary page of a new bucket.
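
To make the three points above concrete, here is a rough sketch of where the
calls could sit, reusing the existing predicate.c API (the buffer and scan
variable names are stand-ins, and this is not the actual patch):

    /* 1) in _hash_first(), once the bucket's primary page buffer is pinned: */
    PredicateLockPage(rel, BufferGetBlockNumber(buf), scan->xs_snapshot);

    /* 2) in _hash_doinsert(), before the tuple is physically added: */
    CheckForSerializableConflictIn(rel, NULL, buf);

    /* 3) during a bucket split, once the new bucket's primary page exists: */
    PredicateLockPageSplit(rel,
                           BufferGetBlockNumber(old_primary_buf),
                           BufferGetBlockNumber(new_primary_buf));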

Any suggestions or corrections will be appreciated.

Regards,
Shubham






Re: [HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-21 Thread Andrew Dunstan


On 06/21/2017 02:25 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 06/21/2017 11:30 AM, Tom Lane wrote:
>>> Andrew Dunstan  writes:
 Unfortunately this seems precluded by the use of the non-standard
 "err.h". It looks like that will trip us up on mingw also.
>>> Hm, why?  Doesn't the -I search path get the right thing?
>> The file isn't present at all.
> Uh, what?  It's in the repo.
>
>   


Oh, sorry, my bad, brain fade while multitasking.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication: stuck spinlock at ReplicationSlotRelease

2017-06-21 Thread Peter Eisentraut
On 6/21/17 09:02, Albe Laurenz wrote:
> 2017-06-21 14:55:12.033 CEST [23124] LOG:  could not send data to client: 
> Broken pipe
> 2017-06-21 14:55:12.033 CEST [23124] FATAL:  connection to client lost
> 2017-06-21 14:55:17.032 CEST [23133] LOG:  logical replication apply worker 
> for subscription "reprec" has started
> DEBUG:  received replication command: IDENTIFY_SYSTEM
> DEBUG:  received replication command: START_REPLICATION SLOT "reprec" LOGICAL 
> 0/0 (proto_version '1', publication_names '"repsend"')
> 2017-06-21 14:57:24.552 CEST [23124] PANIC:  stuck spinlock detected at 
> ReplicationSlotRelease, slot.c:394
> 2017-06-21 14:57:24.885 CEST [23070] LOG:  server process (PID 23124) was 
> terminated by signal 6: Aborted
> 2017-06-21 14:57:24.885 CEST [23070] LOG:  terminating any other active 
> server processes
> 2017-06-21 14:57:24.887 CEST [23134] LOG:  could not send data to client: 
> Broken pipe
> 2017-06-21 14:57:24.890 CEST [23070] LOG:  all server processes terminated; 
> reinitializing

I can't reproduce that.  I let it loop around for about 10 minutes and
it was fine.

I notice that you have some debug settings on.  Could you share your
exact setup steps from initdb, as well as configure options, just in
case one of these settings is causing a problem?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-21 Thread Tom Lane
Andrew Dunstan  writes:
> On 06/21/2017 11:30 AM, Tom Lane wrote:
>> Andrew Dunstan  writes:
>>> Unfortunately this seems precluded by the use of the non-standard
>>> "err.h". It looks like that will trip us up on mingw also.

>> Hm, why?  Doesn't the -I search path get the right thing?

> The file isn't present at all.

Uh, what?  It's in the repo.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-21 Thread Andrew Dunstan


On 06/21/2017 11:30 AM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> Unfortunately this seems precluded by the use of the non-standard
>> "err.h". It looks like that will trip us up on mingw also.
> Hm, why?  Doesn't the -I search path get the right thing?
>
>   


The file isn't present at all. On my Linux workstation "man errx" says:

CONFORMING TO
   These functions are nonstandard BSD extensions.


cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE SUBSCRIPTION log noise

2017-06-21 Thread Peter Eisentraut
On 6/20/17 22:54, Jeff Janes wrote:
> I think this should go away:
> 
> ereport(NOTICE,
>   (errmsg("created replication slot \"%s\" on
> publisher",
>   slotname)));
> 
> It doesn't appear to be contingent on anything other than the content of
> the command you just gave it.  I don't think we need a NOTICE saying
> that it did that thing I just told it to do--that should be implicit by
> the lack of an ERROR.

I'm appreciative of this complaint.  The point is that we are allocating
resources on a remote host that should not be forgotten.  I could go
either way.

Other opinions?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fast promotion not used when doing a recovery_target PITR restore?

2017-06-21 Thread Andres Freund
Hi,

When doing a PITR-style recovery with a recovery target set, we're
currently not doing a fast promotion, in contrast to the handling when
doing a pg_ctl or trigger-file based promotion. That can delay making
the server available for writes.

I can't really see a reason for this?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] archive_timeout ignored directly after promotion

2017-06-21 Thread Andres Freund
Hi,

When promoting a standby without using the "fast promotion" logic,
e.g. by using recovery targets (which doesn't use fast promotion, for
reasons unknown to me), the checkpointer doesn't necessarily respect
archive_timeout for the first cycle after promotion.  The reason for
that is that it determines the sleep time like
		if (XLogArchiveTimeout > 0 && !RecoveryInProgress())
		{
			elapsed_secs = now - last_xlog_switch_time;
			if (elapsed_secs >= XLogArchiveTimeout)
				continue;		/* no sleep for us ... */
			cur_timeout = Min(cur_timeout, XLogArchiveTimeout - elapsed_secs);
		}
and RecoveryInProgress() will still return true.  We just fudge that
when creating the end-of-recovery checkpoint:
			/*
			 * The end-of-recovery checkpoint is a real checkpoint that's
			 * performed while we're still in recovery.
			 */
			if (flags & CHECKPOINT_END_OF_RECOVERY)
				do_restartpoint = false;

One easy way to fix that would be to just wake up the checkpointer from
the startup process once at the end of recovery, but it'd not be
pretty.   I think it'd be better to change the

			do_restartpoint = RecoveryInProgress();

			/*
			 * The end-of-recovery checkpoint is a real checkpoint that's
			 * performed while we're still in recovery.
			 */
			if (flags & CHECKPOINT_END_OF_RECOVERY)
				do_restartpoint = false;

into having a per-loop 'local_in_recovery' variable that we turn off
once a CHECKPOINT_END_OF_RECOVERY checkpoint is requested.
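
In sketch form, that'd look something like this (illustrative only; the
name and exact placement are still to be worked out):

			local_in_recovery = RecoveryInProgress();

			if (flags & CHECKPOINT_END_OF_RECOVERY)
				local_in_recovery = false;

			do_restartpoint = local_in_recovery;

			...

			/* and later in the same iteration, for the sleep computation */
			if (XLogArchiveTimeout > 0 && !local_in_recovery)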

Comments?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-06-21 Thread Amit Khandekar
On 21 June 2017 at 20:14, Robert Haas  wrote:
> On Wed, Jun 21, 2017 at 5:28 AM, Amit Langote  wrote:
>>> The comment "UPDATE/DELETE cases are handled above" is referring to
>>> the code that initializes the WCOs generated by the planner.  You've
>>> modified the comment in your patch, but the associated code: your
>>> updated comment says that only "DELETEs and local UPDATES are handled
>>> above", but in reality, *all* updates are still handled above.  And
>>> then they are handled again here.  Similarly for returning lists.
>>> It's certainly not OK for the comment to be inaccurate, but I think
>>> it's also bad to redo the work which the planner has already done,
>>> even if it makes the patch smaller.
>>
>> I guess this has to do with the UPDATE turning into DELETE+INSERT.  So, it
>> seems like WCOs are being initialized for the leaf partitions
>> (ResultRelInfos in the mt_partitions array) that are in turn are
>> initialized for the aforementioned INSERT.  That's why the term "...local
>> UPDATEs" in the new comment text.
>>
>> If that's true, I wonder if it makes sense to apply what would be
>> WCO_RLS_UPDATE_CHECK to a leaf partition that the tuple will be moved into
>> by calling ExecInsert()?
>
> I think we probably should apply the insert policy, just as we're
> executing the insert trigger.

Yes, the RLS quals should execute during tuple routing according to
whether it is an update or has been converted to an insert. I think the
tests don't quite cover the insert part. Will check.

>
>>> Also, I feel like it's probably not correct to use the first result
>>> relation as the nominal relation for building WCOs and returning lists
>>> anyway.  I mean, if the first result relation has a different column
>>> order than the parent relation, isn't this just broken?  If it works
>>> for some reason, the comments don't explain what that reason is.
>>
>> Yep, it's more appropriate to use
>> ModifyTableState->rootResultRelationInfo->ri_RelationDesc somehow.  That
>> is, if answer to the question I raised above is positive.

From what I had checked earlier when coding that part,
rootResultRelInfo is NULL in case of inserts, unless something has
changed in later commits. That's the reason I decided to use the first
resultRelInfo.


Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-06-21 Thread Amit Khandekar
On 21 June 2017 at 00:23, Robert Haas  wrote:
> On Tue, Jun 20, 2017 at 2:54 AM, Amit Khandekar  
> wrote:
>>> I guess I don't see why it should work like this.  In the INSERT case,
>>> we must build withCheckOption objects for each partition because those
>>> partitions don't appear in the plan otherwise -- but in the UPDATE
>>> case, they're already there, so why do we need to build anything at
>>> all?  Similarly for RETURNING projections.  How are the things we need
>>> for those cases not already getting built, associated with the
>>> relevant resultRelInfos?  Maybe there's a concern if some children got
>>> pruned - they could turn out later to be the children into which
>>> tuples need to be routed. But the patch makes no distinction
>>> between possibly-pruned children and any others.
>>
>> Yes, only a subset of the partitions appear in the UPDATE subplans. I
>> think typically for updates, a very small subset of the total leaf
>> partitions will be there in the plans, others would get pruned. IMHO,
>> it would not be worth having an optimization where it opens only those
>> leaf partitions which are not already there in the subplans. Without
>> the optimization, we are able to re-use the INSERT infrastructure
>> without additional changes.
>
> Well, that is possible, but certainly not guaranteed.  I mean,
> somebody could do a whole-table UPDATE, or an UPDATE that hits a
> smattering of rows in every partition;

I am not saying that it's guaranteed to be a small subset. I am saying
that it would typically be a small subset for the
update-of-partition-key case. It seems odd for a user to cause
update-row-movement across many partitions at the same time.
Generally it would be an administrative task where some or all of the
rows of one partition need their partition key updated in a way that
moves them to another partition, so there would probably be a WHERE
clause narrowing the update down to that particular partition; without
such a clause the update is slower anyway, since it redundantly scans
all the other partitions.

But, point taken, that there can always be certain cases involving
multiple table partition-key updates.

> e.g. the table is partitioned on order number, and you do UPDATE
> lineitem SET product_code = 'K372B' WHERE product_code = 'K372'.

This query does not update the order number, so there is no
partition-key update here. Are you thinking that the patch generates
the per-leaf-partition WCO expressions even for an update not involving
the partition key?

>
> Leaving that aside, the point here is that you're rebuilding
> withCheckOptions and returningLists that have already been built in
> the planner.  That's bad for two reasons.  First, it's inefficient,
> especially if there are many partitions.

Yeah, I agree that this becomes more and more redundant if the update
involves more partitions.

> Second, it will amount to a functional bug if you get a
> different answer than the planner did.

Actually, the per-leaf WCOs are meant to be executed on the
destination partitions where the tuple is moved, while the WCOs
belonging to the per-subplan resultRelInfo are meant for the
resultRelInfo used for the UPDATE plans. So actually it should not
matter whether they look the same or different, because they are fired at
different objects. These objects can, however, happen to be the same
relations.

But in any case, it's not clear to me how the mapped WCO and the
planner's WCO would yield a different answer if they are both the same
relation. I am possibly missing something. The planner has already
generated the withCheckOptions for each of the resultRelInfo. And then
we are using one of those to re-generate the WCO for a leaf partition
by only adjusting the attnos. If there is already a WCO generated in
the planner for that leaf partition (because that partition was
present in mtstate->resultRelInfo), then the re-built WCO should look
exactly the same as the earlier one, because they are the same relations,
and so the attnos generated in them would be the same since the Relation
TupleDesc is the same.

> Note this comment in the existing code:
>
> /*
>  * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
>  * that we didn't build the withCheckOptionList for each partition within
>  * the planner, but simple translation of the varattnos for each partition
>  * will suffice.  This only occurs for the INSERT case; UPDATE/DELETE
>  * cases are handled above.
>  */
>
> The comment "UPDATE/DELETE cases are handled above" is referring to
> the code that initializes the WCOs generated by the planner.  You've
> modified the comment in your patch, but the associated code: your
> updated comment says that only "DELETEs and local UPDATES are handled
> above", but in reality, *all* updates are still handled above.  And

Actually I meant, "above works for only local updates. For
row-movement-updates, 

[HACKERS] Logical replication launcher never been restarted when terminated

2017-06-21 Thread Yugo Nagata
Hi,

As I reported in another thread[1], when the logical replication launcher
is terminated by SIGTERM, it is never restarted, and we need to restart
the server to enable logical replication again.

This is because the logical replication launcher exits with exit status 0;
if it exits with status 1 instead, the postmaster restarts it normally.
Attached is a simple patch that fixes the problem this way.

However, I'm not sure this is the best way. For example, in this way,
we get the following log when the process is terminated, which we don't
get when it exits with status 0.

 LOG:  worker process: logical replication launcher (PID 11526) exited with 
exit code 1

If we don't want to get this message, we need more fixes in
CleanupBackgroundWorker() or around it.

[1]
https://www.postgresql.org/message-id/20170621205657.61d90605.nagata%40sraoss.co.jp

-- 
Yugo Nagata 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f99dd0a..a833dc5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2855,7 +2855,7 @@ ProcessInterrupts(void)
 	(errmsg("logical replication launcher shutting down")));
 
 			/* The logical replication launcher can be stopped at any time. */
-			proc_exit(0);
+			proc_exit(1);
 		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] initdb initalization failure for collation "ja_JP"

2017-06-21 Thread Marco Atzeri

On 20/06/2017 17:37, Tom Lane wrote:

I wrote:

Marco Atzeri  writes:

Building on Cygwin latest 10 beta1 or head source,
make check fails as:
...
performing post-bootstrap initialization ... 2017-05-31 23:23:22.214
CEST [16860] FATAL:  collation "ja_JP" for encoding "EUC_JP" already exists



Hmph.  Could we see the results of "locale -a | grep ja_JP" ?


Despite the lack of followup from the OP, I'm pretty troubled by this
report.  It shows that the reimplementation of OS collation data import
as pg_import_system_collations() is a whole lot more fragile than the
original coding.  We have never before trusted "locale -a" to not produce
duplicate outputs, not since the very beginning in 414c5a2e.  AFAICS,
the current coding has also lost the protections we added very shortly
after that in 853c1750f; and it has also lost the admittedly rather
arbitrary, but at least deterministic, preference order for conflicting
short aliases that was in the original initdb code.


Hi Tom,
I raised the duplication issue on the cygwin mailing list,
and one of the core developers reported that
they saw the same issue on Linux in the past.

https://cygwin.com/ml/cygwin/2017-06/msg00253.html



regards, tom lane


Regards
Marco




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Incorrect documentation about pg_stat_activity

2017-06-21 Thread Yugo Nagata
On Wed, 21 Jun 2017 16:18:50 +0200
Simon Riggs  wrote:

> On 21 June 2017 at 16:15, Yugo Nagata  wrote:
> > On Wed, 21 Jun 2017 19:08:35 +0530
> > Kuntal Ghosh  wrote:
> >
> >> On Wed, Jun 21, 2017 at 6:05 PM, Yugo Nagata  wrote:
> >> >
> >> > Attached is a patch for the documentation fix.
> >> >
> >> Please attach the patch as well. :-)
> >
> > I'm sorry, I forgot it. I attached this.
> 
> Thanks, will apply.

Thanks, too.

> 
> -- 
> Simon Riggs                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Yugo Nagata 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Broken hint bits (freeze)

2017-06-21 Thread Bruce Momjian
On Wed, Jun 21, 2017 at 07:49:21PM +0530, Amit Kapila wrote:
> On Tue, Jun 20, 2017 at 7:24 PM, Amit Kapila  wrote:
> > Hmm.  I think we need something that works with lesser effort because
> > not all users will be as knowledgeable as you are, so if they make any
> > mistakes in copying the file manually, it can lead to problems.  How
> > about issuing a notification (XLogArchiveNotifySeg) in shutdown
> > checkpoint if archiving is enabled?
> >
> 
> I have thought more about the above solution and it seems risky to
> notify archiver for incomplete WAL segments (which will be possible in
> this case as there is no guarantee that Checkpoint record will fill
> the segment).  So, it seems to me we should update the document unless
> you or someone has some solution to this problem.

The over-arching question is how do we tell users to verify that the WAL
has been replayed on the standby?  I am thinking we would say that for
streaming replication, the "Latest checkpoint location" should match on
the primary and standby, while for log shipping, the standbys should be
exactly one WAL file less than the primary.

As far as I know this is the only remaining open issue.  Sergey, please
verify.  I appreciate the work everyone has done to improve this, and
all the existing fixes have been pushed to all supported branches.  :-)

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-21 Thread Kuntal Ghosh
On Wed, Jun 21, 2017 at 7:52 PM, Dilip Kumar  wrote:
> On Wed, Jun 21, 2017 at 7:47 PM, Kuntal Ghosh
>  wrote:
>>> IMHO, It's not a good idea to use DSM call to verify the DSA handle.
>>>
>> Okay. Is there any particular scenario you've in mind where this may fail?
>
> It's not about failure, but about the abstraction.  When we are using
> the DSA we should not directly access the DSM which is under DSA.
>
Okay. I thought I had found at least one usage of
dsm_find_mapping() in the code. :-)

But I have some more doubts.
1. When should we use dsm_find_mapping()? (The first few lines of
dsm_attach() are the same as dsm_find_mapping().)
2. As a user of dsa, how should we check whether a dsa handle is
already attached? I guess this is required because, if a user tries to
re-attach a dsa handle, they are punished with an error, and the user
wants to avoid such errors.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Shortened URLs for commit messages

2017-06-21 Thread Bruce Momjian
On Wed, Jun 21, 2017 at 11:11:57AM -0400, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > Oh, here is a fixed version that requires an @ sign, which all message
> > id's have:
> > 
> > sed '/http/!s;^\(Discussion: *\)\(.*@.*\)$;\1https://postgr.es/m/\2;'
> 
> So how do you actually use this?

My commit script is here:

https://momjian.us/main/writings/pgsql/src/pgcommit

It basically runs some checks and then creates a temp file with lines
labeled by their purpose.  It edits the temp file, then runs the temp
file through a number of filters, and one of those does the URL
shortening via pgurl at:

https://momjian.us/main/writings/pgsql/src/pgurl

It also changes bare message-ids on the 'Discussion' line to shortened
URLs, removes empty labeled lines, and exits if nothing has been changed
in the editor.  It then does the commit (potentially to multiple
branches), and then the push.

The script calls many other custom scripts, but you can get an idea of
how it works.  I am happy to supply more tools as desired.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-21 Thread Tom Lane
Andres Freund  writes:
> On 2017-06-20 20:30:34 -0400, Tom Lane wrote:
>> Well, *I* don't care, but I thought we wanted to support Windows-using
>> developers as reasonably first-class citizens.

> The old one didn't have windows support either, right?

Not that I was aware of, but a large part of the point here is to
make things better than they were before.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-21 Thread Andres Freund
On 2017-06-20 20:30:34 -0400, Tom Lane wrote:
> Michael Paquier  writes:
> > On Wed, Jun 21, 2017 at 8:43 AM, Tom Lane  wrote:
> >> Yeah, I thought it would work fine with Makefile-using Windows toolchains.
> >> But people who use MSVC need something else, no?
> 
> > Are there that many anyway who care?
> 
> Well, *I* don't care, but I thought we wanted to support Windows-using
> developers as reasonably first-class citizens.

The old one didn't have windows support either, right?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Useless code in ExecInitModifyTable

2017-06-21 Thread Tom Lane
Robert Haas  writes:
> It appears to me that create_modifytable_path() has a partitioned_rels
> argument and it looks like inheritance_planner not only passes it but
> guarantees that it's non-empty when the target is a partitioned table.

Oh, somehow I missed the call in inheritance_planner.  Right.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-21 Thread Tom Lane
Andrew Dunstan  writes:
> Unfortunately this seems precluded by the use of the non-standard
> "err.h". It looks like that will trip us up on mingw also.

Hm, why?  Doesn't the -I search path get the right thing?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Default Partition for Range

2017-06-21 Thread Dilip Kumar
On Wed, Jun 21, 2017 at 8:08 PM, Robert Haas  wrote:
> I think somebody should do some testing of the existing code with
> valgrind.  And then apply the list-partitioning patch and this patch,
> and do some more testing with valgrind.  It seems to be really easy to
> miss these uninitialized access problems during code review.

I think it will help, but it may not catch all the scenarios, because in
most places we are allocating memory with palloc0 (initialized to 0), and
the content array is never initialized with RANGE_DATUM_DEFAULT except for
the first element (in the case of a DEFAULT partition).  Later, the
remaining elements may be treated as RANGE_DATUM_FINITE (because their
value is 0).

One solution could be: if the bound is DEFAULT, initialize the complete
content array for that partition bound with RANGE_DATUM_DEFAULT instead of
just the first element.  Otherwise, we need to be careful about exiting
early wherever we loop over the content array of the DEFAULT bound.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Shortened URLs for commit messages

2017-06-21 Thread Alvaro Herrera
Bruce Momjian wrote:

> Oh, here is a fixed version that requires an @ sign, which all message
> id's have:
> 
>   sed '/http/!s;^\(Discussion: *\)\(.*@.*\)$;\1https://postgr.es/m/\2;'

So how do you actually use this?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers

2017-06-21 Thread Alvaro Herrera
Yugo Nagata wrote:

> However, we can terminate background workers by pg_terminate_backend.
> In the following example, I terminated the logical replication launcher,
> and this process did not appear again[1]. 
> 
> postgres=# select pg_terminate_backend(30902);
>  pg_terminate_backend 
> --
>  t
> (1 row)

I think failing to restart the replication launcher after it stops is
probably a bug, and should be fixed by having the postmaster start another
one if it dies.

> Similarly, we can terminate autovacuum launcher by pg_terminate_backend,
> but a new process is restarted by postmaster in this case.[2]

Yeah, this is operating as intended.  You can turn autovacuum off and
the launcher should go away, and turn it back on and the launcher should
start.  So we expect the autovac launcher to respond to signals.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers

2017-06-21 Thread Robert Haas
On Wed, Jun 21, 2017 at 7:56 AM, Yugo Nagata  wrote:
> I have found that we can cancel/terminate autovacuum launchers and
> background worker processes by pg_cancel/terminate_backend function.
> I'm wondering this behavior is not expected and if not I want to fix it.

I think it is expected.  Even if we blocked it, those processes have
to cope gracefully with SIGTERM, because anyone with access to the OS
user can kill them that way by hand.

> However, we can terminate background workers by pg_terminate_backend.
> In the following example, I terminated the logical replication launcher,
> and this process did not appear again[1].
>
> postgres=# select pg_terminate_backend(30902);
>  pg_terminate_backend
> --
>  t
> (1 row)

That seems to be a bug in logical replication.

> Similarly, we can terminate autovacuum launcher by pg_terminate_backend,
> but a new process is restarted by postmaster in this case.[2]
>
> postgres=# select pg_terminate_backend(30900);
>  pg_terminate_backend
> --
>  t
> (1 row)

That is as I would expect.

> [2]
> On the other hand, when we use pg_cancel_backend for autovacuum launcher,
> it causes the following error. I'll report the detail in another thread.
>
>  ERROR:  can't attach the same segment more than once

I think that's a bug.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-06-21 Thread Masahiko Sawada
On Fri, May 19, 2017 at 11:12 AM, Masahiko Sawada  wrote:
> On Wed, May 17, 2017 at 1:30 AM, Robert Haas  wrote:
>> On Sat, May 13, 2017 at 7:27 AM, Amit Kapila  wrote:
>>> On Fri, May 12, 2017 at 9:14 AM, Tom Lane  wrote:
 Robert Haas  writes:
> On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada  
> wrote:
>> ... I'd like to propose to change relation
>> extension lock management so that it works using LWLock instead.

> That's not a good idea because it'll make the code that executes while
> holding that lock noninterruptible.

 Is that really a problem?  We typically only hold it over one kernel call,
 which ought to be noninterruptible anyway.
>>>
>>> During parallel bulk load operations, I think we hold it over multiple
>>> kernel calls.
>>
>> We do.  Also, RelationGetNumberOfBlocks() is not necessarily only one
>> kernel call, no?  Nor is vm_extend.
>
> Yeah, these functions could call more than one kernel calls while
> holding extension lock.
>
>> Also, it's not just the backend doing the filesystem operation that's
>> non-interruptible, but also any waiters, right?
>>
>> Maybe this isn't a big problem, but it does seem to be that it would
>> be better to avoid it if we can.
>>
>
> I agree to change it to be interruptible for more safety.
>

Attached is an updated version of the patch. To provide a lock mechanism
similar to LWLock but interruptible, I introduced a new lock manager for
extension locks. A lot of the code, especially locking and unlocking, is
inspired by LWLock, but it uses condition variables to wait when acquiring
the lock. The other parts are unchanged from the previous patch. This is
still a PoC patch and lacks documentation. The following are the
measurement results with the same test scripts I used before.

* Copy test script
          HEAD     Patched
4         436.6    436.1
8         561.8    561.8
16        580.7    579.4
32        588.5    597.0
64        596.1    599.0

* Insert test script
          HEAD     Patched
4         156.5    156.0
8         167.0    167.9
16        176.2    175.6
32        181.1    181.0
64        181.5    183.0

Since I replaced the heavyweight lock with a lightweight lock, I expected
performance to improve slightly over HEAD, but the result was almost the
same. I'll continue to look into it in more detail.
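
For anyone who wants the overall shape without reading the patch, the
acquire/release path looks roughly like this (a simplified sketch; the
"lock" struct and the wait event name here are placeholders, the attached
patch is what actually does it):

    /* acquire */
    ConditionVariablePrepareToSleep(&lock->cv);
    while (!pg_atomic_test_set_flag(&lock->held))
        ConditionVariableSleep(&lock->cv, WAIT_EVENT_EXTENSION_LOCK);
    ConditionVariableCancelSleep();

    /* ... extend the relation ... */

    /* release */
    pg_atomic_clear_flag(&lock->held);
    ConditionVariableBroadcast(&lock->cv);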

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Moving_extension_lock_out_of_heavyweight_lock_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Useless code in ExecInitModifyTable

2017-06-21 Thread Robert Haas
On Wed, Jun 21, 2017 at 10:33 AM, Tom Lane  wrote:
> Amit Langote  writes:
>> On 2017/06/21 16:59, Etsuro Fujita wrote:
>>> but I noticed that that function doesn't use the relation descriptor at
>>> all.  Since partitioned_rels is given in case of an UPDATE/DELETE on a
>>> partitioned table, the relation is opened in that case, but the relation
>>> descriptor isn't referenced at all in initializing WITH CHECK OPTION
>>> constraints and/or RETURNING projections.  (The mtstate->resultRelinfo
>>> array is referenced in those initialization, instead.)  So, I'd like to
>>> propose to remove this from that function.  Attached is a patch for that.
>
>> Thanks for cleaning that up.  I cannot see any problem in applying the patch.
>
> Actually, isn't ModifyTable.partitioned_rels *always* NIL?  I can't
> see anyplace in the planner that sets it differently.  I think we should
> flush the field altogether.  Anybody who doesn't want that should at least
> provide the missing comment for create_modifytable_path's partitioned_rels
> argument (and yes, the fact that that is missing isn't making me any
> more favorably disposed...)

It appears to me that create_modifytable_path() has a partitioned_rels
argument and it looks like inheritance_planner not only passes it but
guarantees that it's non-empty when the target is a partitioned table.
You're right that there is a comment missing from the function header,
but it's not too hard to figure out what it should be.  Just consult
the definition of ModifyTable itself:

/* RT indexes of non-leaf tables in a partition tree */
List   *partitioned_rels;

The point here is that if we have a partitioned table with a bunch of
descendent tables, the non-leaf partitions are never actually scanned;
there's no file on disk to scan.  Despite the lack of a scan, we still
need to arrange for those relations to be locked.
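
Just to illustrate the sort of thing the executor has to do with that
list (a sketch, not the actual code):

    ListCell   *lc;

    foreach(lc, node->partitioned_rels)
    {
        Index       rti = lfirst_int(lc);
        Oid         relid = getrelid(rti, estate->es_range_table);

        /* non-leaf partitions are locked even though they're never scanned */
        LockRelationOid(relid, RowExclusiveLock);
    }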

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-06-21 Thread Robert Haas
On Wed, Jun 21, 2017 at 5:28 AM, Amit Langote  wrote:
>> The comment "UPDATE/DELETE cases are handled above" is referring to
>> the code that initializes the WCOs generated by the planner.  You've
>> modified the comment in your patch, but the associated code: your
>> updated comment says that only "DELETEs and local UPDATES are handled
>> above", but in reality, *all* updates are still handled above.  And
>> then they are handled again here.  Similarly for returning lists.
>> It's certainly not OK for the comment to be inaccurate, but I think
>> it's also bad to redo the work which the planner has already done,
>> even if it makes the patch smaller.
>
> I guess this has to do with the UPDATE turning into DELETE+INSERT.  So, it
> seems like WCOs are being initialized for the leaf partitions
> (ResultRelInfos in the mt_partitions array) that are in turn are
> initialized for the aforementioned INSERT.  That's why the term "...local
> UPDATEs" in the new comment text.
>
> If that's true, I wonder if it makes sense to apply what would be
> WCO_RLS_UPDATE_CHECK to a leaf partition that the tuple will be moved into
> by calling ExecInsert()?

I think we probably should apply the insert policy, just as we're
executing the insert trigger.

>> Also, I feel like it's probably not correct to use the first result
>> relation as the nominal relation for building WCOs and returning lists
>> anyway.  I mean, if the first result relation has a different column
>> order than the parent relation, isn't this just broken?  If it works
>> for some reason, the comments don't explain what that reason is.
>
> Yep, it's more appropriate to use
> ModifyTableState->rootResultRelationInfo->ri_RelationDesc somehow.  That
> is, if answer to the question I raised above is positive.

The questions appear to me to be independent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

