date:20170604

Amit Kapila  writes:
> On Mon, Jun 5, 2017 at 4:00 AM, Tom Lane  wrote:
>> I took a quick look at this, and it seems rather beside the point.

> What I understood from the randomization shm allocation behavior due
> to ASLR is that when we try to allocate memory (say using
> VirtualAllocEx as in our case) at a pre-defined address, it will
> instead allocate it at a different address which can lead to a
> collision.

No.  The problem we're concerned about is that by the time we try
pgwin32_ReserveSharedMemoryRegion's call
VirtualAllocEx(hChild, UsedShmemSegAddr, UsedShmemSegSize,
   MEM_RESERVE, PAGE_READWRITE)
in the child process, something might have already taken that address
space in the child process, which would cause the call to fail not just
allocate some randomly different space.  That can't happen unless the base
executable, or ntdll.dll, or possibly something injected by an antivirus
product, has taken up address space different from what it took up in the
postmaster process.  What we are assuming here is that any such variation
is randomized, and so will probably be different in the next child process
launched by the postmaster, allowing a new process launch to possibly fix
the problem.  But once that address space is allocated in a given process,
no amount of merely retrying a new allocation request in that process is
going to make it go away.  You'd have to do something to free the existing
allocation, and we have no way to do that short of starting over.

> I think the same problem can happen during reattach as well.
> Basically, MapViewOfFileEx can fail to load image at predefined
> address (UsedShmemSegAddr).

Once we've successfully done the VirtualAllocEx call, that should hold
until the VirtualFree call in PGSharedMemoryReAttach, allowing the
immediately-following MapViewOfFileEx call to succeed.  Were that not the
case, we'd have had problems even without ASLR.  We did have problems
exactly like that before the pgwin32_ReserveSharedMemoryRegion code was
added.  So my feeling about this is that retrying the process creation as
in my prototype patch ought to be sufficient; if you think it isn't, the
burden of proof is on you.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: retry shm attach for windows (WAS: Re: [HACKERS] OK, so culicidae is still broken)

2017-06-04 Thread Amit Kapila

On Mon, Jun 5, 2017 at 4:00 AM, Tom Lane  wrote:
> Amit Kapila  writes:
>> Okay, I have added the comment to explain the same.  I have also
>> modified the patch to adjust the looping as per your suggestion.
>
> I took a quick look at this, and it seems rather beside the point.
> You can't just loop inside an already-forked process and expect
> that address space that was previously unavailable will suddenly
> become available, when the problem is that the executable itself
> (or some shared library) is mapped into the space you want.
>

What I understood from the randomization shm allocation behavior due
to ASLR is that when we try to allocate memory (say using
VirtualAllocEx as in our case) at a pre-defined address, it will
instead allocate it at a different address which can lead to a
collision.  So, it seems to me the problem to solve here is not to
attempt to allocate the address space which is allocated to another
library,  rather negate the impact of randomized address allocation
behavior.  So I thought retrying to allocate space at predefined
address is sufficient.

> What we have to do in order to retry is to fork an entire new
> process, whose address space will hopefully get laid out differently.
>
> While it's possible we could do that in a platform-independent
> way, it would be pretty invasive and I don't see the value.
> The implementation I'd had in mind originally was to put a loop
> inside postmaster.c's internal_forkexec (win32 version), such
> that if pgwin32_ReserveSharedMemoryRegion failed, it would
> terminate/close up the subprocess as it already does, but then
> go back around and do another CreateProcess.  Perhaps it's as
> simple as the attached, but I'm not in a position to test it.
>

I think the same problem can happen during reattach as well.
Basically, MapViewOfFileEx can fail to load image at predefined
address (
UsedShmemSegAddr).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Fix a typo in README.dependencies

2017-06-04 Thread atorikoshi


Hi,

I found below formula to compute selectivities, but
I think the last Probability 'P(b=?)' should be 'P(c=?)'.


 P(a=?,b=?,c=?) = P(a=?,b=?) * (d + (1-d)*P(b=?))


Attached patch fixes it, and it also adds some spaces
following another formula which is on line 86 and
computes P(a=?, b=?).

Regards,

--
Atsushi Torikoshi
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/src/backend/statistics/README.dependencies 
b/src/backend/statistics/README.dependencies
index 59f9d57..d10e4d0 100644
--- a/src/backend/statistics/README.dependencies
+++ b/src/backend/statistics/README.dependencies
@@ -90,7 +90,7 @@ Where 'd' is the degree of functional dependence (a=>b).
 With more than two equality clauses, this process happens recursively. For
 example for (a,b,c) we first use (a,b=>c) to break the computation into
 
-P(a=?,b=?,c=?) = P(a=?,b=?) * (d + (1-d)*P(b=?))
+P(a=?,b=?,c=?) = P(a=?,b=?) * (d + (1-d) * P(c=?))
 
 and then apply (a=>b) the same way on P(a=?,b=?).
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication - still unstable after all these months

2017-06-04 Thread Petr Jelinek

On 03/06/17 16:12, Jeff Janes wrote:
> 
> On Fri, Jun 2, 2017 at 4:10 PM, Petr Jelinek
> > wrote:
> 
> 
> While I was testing something for different thread I noticed that I
> manage transactions incorrectly in this patch. Fixed here, I didn't test
> it much yet (it takes a while as you know :) ). Not sure if it's related
> to the issue you've seen though.
> 
> 
> This conflicts again with Peter E's recent commit
> 3c9bc2157a4f465b3c070d1250597568d2dc285f
> 

Rebased on top of current HEAD.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
>From 067edf450967c76d000c0b4a2cddf719fae45f7f Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Wed, 31 May 2017 01:51:45 +0200
Subject: [PATCH] Improve handover logic between sync and apply workers

Make apply busy wait check the catalog instead of shmem state to ensure
that next transaction will see the expected table synchronization state.

Also make the handover always go through same set of steps to make the
overall process easier to understand and debug.
---
 src/backend/replication/logical/tablesync.c | 215 
 1 file changed, 123 insertions(+), 92 deletions(-)

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 6e268f3..8926914 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -27,27 +27,21 @@
  *		 synchronization has finished.
  *
  *	  The stream position synchronization works in multiple steps.
- *	   - Sync finishes copy and sets table state as SYNCWAIT and waits
+ *	   - Sync finishes copy and sets worker state as SYNCWAIT and waits
  *		 for state to change in a loop.
  *	   - Apply periodically checks tables that are synchronizing for SYNCWAIT.
- *		 When the desired state appears it will compare its position in the
- *		 stream with the SYNCWAIT position and based on that changes the
- *		 state to based on following rules:
- *		  - if the apply is in front of the sync in the WAL stream the new
- *			state is set to CATCHUP and apply loops until the sync process
- *			catches up to the same LSN as apply
- *		  - if the sync is in front of the apply in the WAL stream the new
- *			state is set to SYNCDONE
- *		  - if both apply and sync are at the same position in the WAL stream
- *			the state of the table is set to READY
- *	   - If the state was set to CATCHUP sync will read the stream and
- *		 apply changes until it catches up to the specified stream
- *		 position and then sets state to READY and signals apply that it
- *		 can stop waiting and exits, if the state was set to something
- *		 else than CATCHUP the sync process will simply end.
- *	   - If the state was set to SYNCDONE by apply, the apply will
- *		 continue tracking the table until it reaches the SYNCDONE stream
- *		 position at which point it sets state to READY and stops tracking.
+ *		 When the desired state appears it will set the worker state to
+ *		 CATCHUP and starts loop waiting until either table state is set to
+ *		 SYNCDONE or worker exits
+ *	   - After worker seen the state change to CATCHUP, it will read the
+ *	 stream and apply changes until it catches up to the specified stream
+ *		 position and then sets state to SYNCDONE. Note that there might be
+ *		 zero changes applied betweem CATCHUP and SYNCDONE states as the sync
+ *		 worker might be ahead of the apply.
+ *	   - Once the state was set to SYNCDONE, the apply will continue tracking
+ *	 the table until it reaches the SYNCDONE stream position at which
+ *	 point it sets state to READY and stops tracking. Again there might
+ *	 be zero changes inbetween.
  *
  *	  The catalog pg_subscription_rel is used to keep information about
  *	  subscribed tables and their state and some transient state during
@@ -56,26 +50,29 @@
  *	  Example flows look like this:
  *	   - Apply is in front:
  *		  sync:8
- *			-> set SYNCWAIT
+ *			-> set in memory SYNCWAIT
  *		  apply:10
- *			-> set CATCHUP
+ *			-> set in memory CATCHUP
  *			-> enter wait-loop
  *		  sync:10
- *			-> set READY
+ *			-> set in catalog SYNCDONE
  *			-> exit
  *		  apply:10
  *			-> exit wait-loop
  *			-> continue rep
+ *		  apply:11
+ *		-> set in catalog READY
  *	   - Sync in front:
  *		  sync:10
- *			-> set SYNCWAIT
+ *			-> set in memory SYNCWAIT
  *		  apply:8
- *			-> set SYNCDONE
+ *			-> set in memory CATCHUP
  *			-> continue per-table filtering
- *		  sync:10
+  *		  sync:10
+ *			-> set in catalog SYNCDONE
  *			-> exit
  *		  apply:10
- *			-> set READY
+ *			-> set in catalog READY
  *			-> stop per-table filtering
  *			-> continue rep
  *-
@@ -100,6 +97,7 @@
 #include "replication/walreceiver.h"
 #include

Re: [HACKERS] sketchy partcollation handling

2017-06-04 Thread Amit Langote

On 2017/06/03 1:31, Robert Haas wrote:
> If you create a partitioned table in the obvious way, partcollation ends up 0:
> 
> rhaas=# create table foo (a int, b text) partition by list (a);
> CREATE TABLE
> rhaas=# select * from pg_partitioned_table;
>  partrelid | partstrat | partnatts | partattrs | partclass |
> partcollation | partexprs
> ---+---+---+---+---+---+---
>  16420 | l | 1 | 1 | 1978  | 0 |
> (1 row)
> 
> You could argue that 0 is an OK value there; offhand, I'm not sure
> about that.

If you create index on an integer column, you'll get a 0 in indcollation:

select indcollation from pg_index where indrelid = 'foo'::regclass;
 indcollation
--
 0
(1 row)


> But there's nothing in
> https://www.postgresql.org/docs/10/static/catalog-pg-partitioned-table.html
> which indicates that some entries can be 0 rather than a valid
> collation OID.

Yeah, it might be better to explain that.  BTW, pg_index.indcollation's
description lacks a note about this too, so maybe, we should fix both.
OTOH, pg_attribute.attcollation's description mentions it:

   The defined collation of the column, or zero if the column is
   not of a collatable data type.

It might be a good idea to update partcollation's and indcollation's
description along the same lines.

> And this is definitely not OK:
> 
> rhaas=# select * from pg_depend where objid = 16420;
>  classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
> -+---+--++--+-+-
> 1259 | 16420 |0 |   2615 | 2200 |   0 | n
> 1259 | 16420 |0 |   3456 |0 |   0 | n
> (2 rows)
> 
> We shouldn't be storing a dependency on non-existing collation 0.
>
> I'm not sure whether the bug here is that we should have a valid
> collation OID there rather than 0, or whether the bug is that we
> shouldn't be recording a dependency on anything other than a real
> collation OID, but something about this is definitely not right.

I think we can call it a bug of StorePartitionKey().  I looked at the
similar code in index_create() (which actually I had originally looked at
for reference when writing the partitioning code in question) and looks
like it doesn't store the dependency for collation 0 and for the default
collation of the database.  I think the partitioning code should do the
same.  Attached find a patch for the same (which also updates the
documentation as mentioned above); with the patch:

create table p (a int) partition by range (a);
select partcollation from pg_partitioned_table;
 partcollation
---
 0
(1 row)


-- no collation dependency stored for invalid collation
select * from pg_depend where objid = 'p'::regclass;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-+---+--++--+-+-
1259 | 16434 |0 |   2615 | 2200 |   0 | n
(1 row)


create table p (a text) partition by range (a);
select partcollation from pg_partitioned_table;
 partcollation
---
 100
(1 row)

-- no collation dependency stored for the default collation
select * from pg_depend where objid = 'p'::regclass;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-+---+--++--+-+-
1259 | 16407 |0 |   2615 | 2200 |   0 | n
(1 row)


create table p (a text) partition by range (a collate "en_US");
select partcollation from pg_partitioned_table;
 partcollation
---
 12513
(1 row)

-- dependency on non-default collation is stored
select * from pg_depend where objid = 'p'::regclass;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-+---+--++--+-+-
1259 | 16410 |0 |   2615 | 2200 |   0 | n
1259 | 16410 |0 |   3456 |12513 |   0 | n
(2 rows)


BTW, the places which check whether the collation to store a dependency
for is the database default collation don't need to do that.  I mean the
following code block in all of these places:

   /* The default collation is pinned, so don't bother recording it */
   if (OidIsValid(attr->attcollation) &&
   attr->attcollation != DEFAULT_COLLATION_OID)
   {
   referenced.classId = CollationRelationId;
   referenced.objectId = attr->attcollation;
   referenced.objectSubId = 0;
   recordDependencyOn(, , DEPENDENCY_NORMAL);
   }

That's because the default collation is pinned and the dependency code
checks isObjectPinned() and does not create pg_depend entries if so.
Those places are:

AddNewAttributeTuples
StorePartitionKey
index_create
GenerateTypeDependencies

Re: [HACKERS] logical replication - still unstable after all these months

2017-06-04 Thread Mark Kirkwood




On 05/06/17 13:08, Mark Kirkwood wrote:

On 05/06/17 00:04, Erik Rijkers wrote:


On 2017-05-31 16:20, Erik Rijkers wrote:

On 2017-05-31 11:16, Petr Jelinek wrote:
[...]
Thanks to Mark's offer I was able to study the issue as it happened 
and

found the cause of this.

[0001-Improve-handover-logic-between-sync-and-apply-worker.patch]


This looks good:

-- out_20170531_1141.txt
100 -- pgbench -c 90 -j 8 -T 60 -P 12 -n   --  scale 25
100 -- All is well.

So this is 100x a 1-minute test with 100x success. (This on the most
fastidious machine (slow disks, meagre specs) that used to give 15%
failures)


[Improve-handover-logic-between-sync-and-apply-worker-v2.patch]

No errors after (several days of) running variants of this. (2500x 1 
minute runs; 12x 1-hour runs)


Same here, no errors with the v2 patch applied (approx 2 days - all 1 
minute runs)




Further, reapplying the v1 patch (with a bit of editing as I wanted to 
apply it to my current master), gets a failure with missing rows in the 
history table quite quickly. I'll put back the v2 patch and resume runs 
with that, but I'm cautiously optimistic that the v2 patch solves the issue.


regards

Mark



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

Hi,

On 2017-06-05 10:31:12 +0900, Michael Paquier wrote:
> On Mon, Jun 5, 2017 at 10:29 AM, Andres Freund  wrote:
> > Michael, Peter, Fujii, is either of you planning to review this? I'm
> > planning to commit this tomorrow morning PST, unless somebody protest
> > till then.
> 
> Yes, I am. It would be nice if you could let me 24 hours to look at it
> in details.

Sure.  Could you let me know when you're done?

Noah, I might thus not be able to resolve most of "Query handling in
Walsender is somewhat broken" by tomorrow, but it might end up being
Tuesday.  Even after that there'll be a remaining item after all these.

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

On Mon, Jun 5, 2017 at 10:29 AM, Andres Freund  wrote:
> Michael, Peter, Fujii, is either of you planning to review this? I'm
> planning to commit this tomorrow morning PST, unless somebody protest
> till then.

Yes, I am. It would be nice if you could let me 24 hours to look at it
in details.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE

On Mon, Jun 5, 2017 at 10:09 AM, Jing Wang  wrote:
> By using the patch the CURRENT_DATABASE as a keyword can be used in the
> following SQL commands:
>
> 1. COMMENT ON DATABASE CURRENT_DATABASE is ...
> 2. ALTER DATABASE CURRENT_DATABASE OWNER to ...
> 3. ALTER DATABASE CURRENT_DATABASE SET parameter ...
> 4. ALTER DATABASE CURRENT_DATABASE RESET parameter ...
> 5. SELECT CURRENT_DATABASE

You should add that to the next commit fest:
https://commitfest.postgresql.org/14/
Note that it may take a couple of months until you get some review, as
the focus now is to stabilize Postgres 10.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

On 2017-06-02 17:20:23 -0700, Andres Freund wrote:
> Attached is a *preliminary* patch series implementing this.  I've first
> reverted the previous patch, as otherwise backpatchable versions of the
> necessary patches would get too complicated, due to the signals used and
> such.

I went again through this, and the only real thing I found that there
was a leftover prototype in walsender.h.  I've in interim worked on
backpatch versions of that series, annoying conflicts, but nothing
really problematic.  The only real difference is adding SetLatch() calls
to HandleWalSndInitStopping() < 9.6, and guarding SetLatch with an if <
9.5.

As an additional patch (based on one by Petr), even though it more
belongs to
http://archives.postgresql.org/message-id/20170421014030.fdzvvvbrz4nckrow%40alap3.anarazel.de
attached is a patch unifying SIGHUP between normal and walsender
backends.  This needs to be backpatched all the way.  I've also attached
a second patch, again based on Petr's, that unifies SIGHUP handling
across all the remaining backends, but that's something that probably
more appropriate for v11, although I'm still tempted to commit it
earlier.

Michael, Peter, Fujii, is either of you planning to review this? I'm
planning to commit this tomorrow morning PST, unless somebody protest
till then.

- Andres
>From 39f95c9e85811d6759a29b293adc97567d895d69 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 2 Jun 2017 14:14:34 -0700
Subject: [PATCH 1/6] Revert "Prevent panic during shutdown checkpoint"

This reverts commit 086221cf6b1727c2baed4703c582f657b7c5350e, which
was made to master only.

The approach implemented in the above commit has some issues.  While
those could easily be fixed incrementally, doing so would make
backpatching considerably harder, so instead first revert this patch.

Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxp...@alap3.anarazel.de
---
 doc/src/sgml/monitoring.sgml|   5 -
 src/backend/access/transam/xlog.c   |   6 --
 src/backend/postmaster/postmaster.c |   7 +-
 src/backend/replication/walsender.c | 143 
 src/include/replication/walsender.h |   1 -
 src/include/replication/walsender_private.h |   3 +-
 6 files changed, 24 insertions(+), 141 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 79ca45a156..5640c0d84a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1690,11 +1690,6 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
backup: This WAL sender is sending a backup.
   
  
- 
-  
-   stopping: This WAL sender is stopping.
-  
- 

  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 399822d3fe..35ee7d1cb6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8324,12 +8324,6 @@ ShutdownXLOG(int code, Datum arg)
 	ereport(IsPostmasterEnvironment ? LOG : NOTICE,
 			(errmsg("shutting down")));
 
-	/*
-	 * Wait for WAL senders to be in stopping state.  This prevents commands
-	 * from writing new WAL.
-	 */
-	WalSndWaitStopping();
-
 	if (RecoveryInProgress())
 		CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
 	else
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 35b4ec88d3..5c79b1e40d 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2918,7 +2918,7 @@ reaper(SIGNAL_ARGS)
  * Waken walsenders for the last time. No regular backends
  * should be around anymore.
  */
-SignalChildren(SIGINT);
+SignalChildren(SIGUSR2);
 
 pmState = PM_SHUTDOWN_2;
 
@@ -3656,9 +3656,7 @@ PostmasterStateMachine(void)
 /*
  * If we get here, we are proceeding with normal shutdown. All
  * the regular children are gone, and it's time to tell the
- * checkpointer to do a shutdown checkpoint. All WAL senders
- * are told to switch to a stopping state so that the shutdown
- * checkpoint can go ahead.
+ * checkpointer to do a shutdown checkpoint.
  */
 Assert(Shutdown > NoShutdown);
 /* Start the checkpointer if not running */
@@ -3667,7 +3665,6 @@ PostmasterStateMachine(void)
 /* And tell it to shut down */
 if (CheckpointerPID != 0)
 {
-	SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
 	signal_child(CheckpointerPID, SIGUSR2);
 	pmState = PM_SHUTDOWN;
 }
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 49cce38880..aa705e5b35 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -24,14 +24,11 @@
  * are treated as not a crash but approximately normal termination;
  * the walsender will exit quickly without sending any more XLOG records.
  *
- * If the

[HACKERS] Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog

On Mon, Jun 5, 2017 at 7:24 AM, Noah Misch  wrote:
> [Action required within three days.  This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 10 open item.  Simon,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.

I have just provided a patch that takes care of solving this issue and
cleans up the lock handling for all the 2PC redo code paths.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE

2017-06-04 Thread Jing Wang

Hi all,

The attached patch is to support the feature "COMMENT ON DATABASE
CURRENT_DATABASE". The solution is based on the previous discussion in [2] .

Can't find the previous link in my email history list so create a new topic
here.

By using the patch the CURRENT_DATABASE as a keyword can be used in the
following SQL commands:

1. COMMENT ON DATABASE CURRENT_DATABASE is ...
2. ALTER DATABASE CURRENT_DATABASE OWNER to ...
3. ALTER DATABASE CURRENT_DATABASE SET parameter ...
4. ALTER DATABASE CURRENT_DATABASE RESET parameter ...
5. SELECT CURRENT_DATABASE


[1] https://www.postgresql.org/message-id/20150317171836.gc10...@momjian.us
[2]
https://www.postgresql.org/message-id/flat/CAB7nPqSTXUWAx-C5Pgw%2Bdu5jxu4QZ%3DaxQq165McmyT3UggWmuQ%40mail.gmail.com#CAB7nPqSTXUWAx-C5Pgw+du5jxu4QZ=axqq165mcmyt3uggw...@mail.gmail.com

--
Regards,
Jing Wang
Fujitsu Australia
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index e60e8e8..8895980 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -665,7 +665,8 @@ const ObjectAddress InvalidObjectAddress =
 	InvalidOid,
 	0
 };
-
+static ObjectAddress get_object_address_database(ObjectType objtype,
+List * objname, bool missing_ok);
 static ObjectAddress get_object_address_unqualified(ObjectType objtype,
 			   List *qualname, bool missing_ok);
 static ObjectAddress get_relation_by_qualified_name(ObjectType objtype,
@@ -803,6 +804,8 @@ get_object_address(ObjectType objtype, List *objname, List *objargs,
 }
 break;
 			case OBJECT_DATABASE:
+address = get_object_address_database(objtype, objname, missing_ok);
+break;
 			case OBJECT_EXTENSION:
 			case OBJECT_TABLESPACE:
 			case OBJECT_ROLE:
@@ -1042,6 +1045,44 @@ get_object_address_rv(ObjectType objtype, RangeVar *rel, List *objname,
 
 /*
  * Find an ObjectAddress for a type of object that is identified by an
+ * database name
+ */
+static ObjectAddress
+get_object_address_database(ObjectType objtype, List * objname, bool missing_ok)
+{
+	const char 		*name;
+	char			*dbname;
+	DBSpecName		*dbspecname;
+	ObjectAddress 	address;
+
+	if (list_length(objname) != 1)
+	{
+		const char *msg;
+
+		msg = gettext_noop("database name cannot be qualified");
+
+		ereport(ERROR,
+(errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("%s", _(msg;
+	}
+
+	/* Format is valid, extract the actual name. */
+	dbspecname = (DBSpecName*)linitial(objname);
+
+	if (dbspecname->dbnametype == DBSPEC_CURRENT_DATABASE )
+		dbname = get_database_name(MyDatabaseId);
+	else
+		dbname = dbspecname->dbname;
+
+	address.classId = DatabaseRelationId;
+	address.objectId = get_database_oid(dbname, missing_ok);
+	address.objectSubId = 0;
+
+	return address;
+}
+
+/*
+ * Find an ObjectAddress for a type of object that is identified by an
  * unqualified name.
  */
 static ObjectAddress
@@ -2086,8 +2127,20 @@ check_object_ownership(Oid roleid, ObjectType objtype, ObjectAddress address,
 			break;
 		case OBJECT_DATABASE:
 			if (!pg_database_ownercheck(address.objectId, roleid))
-aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_DATABASE,
-			   NameListToString(objname));
+			{
+char 		*dbname;
+DBSpecName 	*dbspecname;
+
+/* Format is valid, extract the actual name. */
+dbspecname = (DBSpecName*)linitial(objname);
+
+if (dbspecname->dbnametype == DBSPEC_CURRENT_DATABASE )
+	dbname = get_database_name(MyDatabaseId);
+else
+	dbname = dbspecname->dbname;
+
+aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_DATABASE,dbname);
+			}
 			break;
 		case OBJECT_TYPE:
 		case OBJECT_DOMAIN:
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index 1301bcb..78f6dfb 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -749,7 +749,7 @@ ExecAlterOwnerStmt(AlterOwnerStmt *stmt)
 	switch (stmt->objectType)
 	{
 		case OBJECT_DATABASE:
-			return AlterDatabaseOwner(strVal(linitial(stmt->object)), newowner);
+			return AlterDatabaseOwner(linitial(stmt->object), newowner);
 
 		case OBJECT_SCHEMA:
 			return AlterSchemaOwner(strVal(linitial(stmt->object)), newowner);
diff --git a/src/backend/commands/comment.c b/src/backend/commands/comment.c
index a0d3f8d..f2f27d7 100644
--- a/src/backend/commands/comment.c
+++ b/src/backend/commands/comment.c
@@ -53,13 +53,21 @@ CommentObject(CommentStmt *stmt)
 	 */
 	if (stmt->objtype == OBJECT_DATABASE && list_length(stmt->objname) == 1)
 	{
-		char	   *database = strVal(linitial(stmt->objname));
+		char		*dbname = NULL;
+		DBSpecName	*dbspecname = NULL;
 
-		if (!OidIsValid(get_database_oid(database, true)))
+		dbspecname = (DBSpecName*)linitial(stmt->objname);
+
+		if (dbspecname->dbnametype == DBSPEC_CURRENT_DATABASE )
+			dbname = get_database_name(MyDatabaseId);
+		else
+			dbname = dbspecname->dbname;
+
+		if (!OidIsValid(get_database_oid(dbname, true)))
 		{
 			ereport(WARNING,
 	(errcode(ERRCODE_UNDEFINED_DATABASE),
-

Re: [HACKERS] logical replication - still unstable after all these months

2017-06-04 Thread Mark Kirkwood


On 05/06/17 00:04, Erik Rijkers wrote:


On 2017-05-31 16:20, Erik Rijkers wrote:

On 2017-05-31 11:16, Petr Jelinek wrote:
[...]

Thanks to Mark's offer I was able to study the issue as it happened and
found the cause of this.

[0001-Improve-handover-logic-between-sync-and-apply-worker.patch]


This looks good:

-- out_20170531_1141.txt
100 -- pgbench -c 90 -j 8 -T 60 -P 12 -n   --  scale 25
100 -- All is well.

So this is 100x a 1-minute test with 100x success. (This on the most
fastidious machine (slow disks, meagre specs) that used to give 15%
failures)


[Improve-handover-logic-between-sync-and-apply-worker-v2.patch]

No errors after (several days of) running variants of this. (2500x 1 
minute runs; 12x 1-hour runs)


Same here, no errors with the v2 patch applied (approx 2 days - all 1 
minute runs)


regards

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should we standardize on a type for signal handler flags?

Andres Freund  writes:
> On 2017-06-04 19:14:06 -0400, Tom Lane wrote:
>> sig_atomic_t is more standards-conforming, I should think.  I'm not sure
>> if there are any current platforms where a store to a char variable
>> wouldn't be atomic, but why live dangerously?

> Well, we already have some variables that aren't actually booleans,
> although I think all of them are only read not manipulated in signal
> handlers (InterruptHoldoffCount etc).

Hm.  Well, according to POSIX one may rely on sig_atomic_t being able
to hold the values 0..127 on all platforms.  So we might be able to
get away with converting InterruptHoldoffCount to sig_atomic_t if we
needed to.  In the absence of evidence that we need to, I wouldn't.
But I have no problem with standardizing on using sig_atomic_t for
variables that are assigned to by signal handlers.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should we standardize on a type for signal handler flags?

On 2017-06-04 19:14:06 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > At the moment a number of flag variables set in signal handlers have
> > 'volatile bool' as type, others have 'volatile sig_atomic_t'.  That's
> > kinda confusing.   I think either is safe, but I think we should
> > standardize one of them.
> 
> sig_atomic_t is more standards-conforming, I should think.  I'm not sure
> if there are any current platforms where a store to a char variable
> wouldn't be atomic, but why live dangerously?

Well, we already have some variables that aren't actually booleans,
although I think all of them are only read not manipulated in signal
handlers (InterruptHoldoffCount etc).  So one could argue that there's
no safety benefit in sig_atomic_t, because we're already using in other
places.   We also already rely on int32 stores being atomic in other
parts of the code, although that's between processes not between signal
/ normal path of execution.

> I'd be inclined to let the code continue to treat the variables as
> if they were bool, ie store "true" and "false" not "1" and "0"
> into them.  That should be perfectly safe.

Indeed.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should we standardize on a type for signal handler flags?

Andres Freund  writes:
> At the moment a number of flag variables set in signal handlers have
> 'volatile bool' as type, others have 'volatile sig_atomic_t'.  That's
> kinda confusing.   I think either is safe, but I think we should
> standardize one of them.

sig_atomic_t is more standards-conforming, I should think.  I'm not sure
if there are any current platforms where a store to a char variable
wouldn't be atomic, but why live dangerously?

I'd be inclined to let the code continue to treat the variables as
if they were bool, ie store "true" and "false" not "1" and "0"
into them.  That should be perfectly safe.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Should we standardize on a type for signal handler flags?

On Mon, Jun 5, 2017 at 8:00 AM, Andres Freund  wrote:
> At the moment a number of flag variables set in signal handlers have
> 'volatile bool' as type, others have 'volatile sig_atomic_t'.  That's
> kinda confusing.   I think either is safe, but I think we should
> standardize one of them.

sig_atomic_t's definition includes a reference to signals, so I would
vote for using it instead of bool.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Should we standardize on a type for signal handler flags?

Hi,

At the moment a number of flag variables set in signal handlers have
'volatile bool' as type, others have 'volatile sig_atomic_t'.  That's
kinda confusing.   I think either is safe, but I think we should
standardize one of them.

Opinions?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Server ignores contents of SASLInitialResponse

On Fri, Jun 02, 2017 at 09:58:40PM -0700, Noah Misch wrote:
> On Tue, May 30, 2017 at 03:04:47AM +, Noah Misch wrote:
> > On Thu, May 25, 2017 at 10:52:23AM -0400, Michael Paquier wrote:
> > > On Thu, May 25, 2017 at 9:32 AM, Michael Paquier
> > >  wrote:
> > > > On Thu, May 25, 2017 at 8:51 AM, Heikki Linnakangas  
> > > > wrote:
> > > >> On 05/24/2017 11:33 PM, Michael Paquier wrote:
> > > >>> I have noticed today that the server ignores completely the contents
> > > >>> of SASLInitialResponse. ... Attached is a patch to fix the problem.
> > > >>
> > > >> Fixed, thanks!
> > > >
> > > > Thanks for the commit.
> > > 
> > > Actually, I don't think that we are completely done here. Using the
> > > patch of upthread to enforce a failure on SASLInitialResponse, I see
> > > that connecting without SSL causes the following error:
> > > psql: FATAL:  password authentication failed for user "mpaquier"
> > > But connecting with SSL returns that:
> > > psql: duplicate SASL authentication request
> > > 
> > > I have not looked at that in details yet, but it seems to me that we
> > > should not take pg_SASL_init() twice in the scram authentication code
> > > path in libpq for a single attempt.
> > 
> > [Action required within three days.  This is a generic notification.]
> > 
> > The above-described topic is currently a PostgreSQL 10 open item.  Heikki,
> > since you committed the patch believed to have created it, you own this open
> > item.  If some other commit is more relevant or if this does not belong as a
> > v10 open item, please let us know.  Otherwise, please observe the policy on
> > open item ownership[1] and send a status update within three calendar days 
> > of
> > this message.  Include a date for your subsequent status update.  Testers 
> > may
> > discover new open items at any time, and I want to plan to get them all 
> > fixed
> > well in advance of shipping v10.  Consequently, I will appreciate your 
> > efforts
> > toward speedy resolution.  Thanks.
> > 
> > [1] 
> > https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
for your status update.  Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately.  If I do not hear from you by
2017-06-05 23:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: Error while creating subscription when server is running in single user mode

On Fri, Jun 02, 2017 at 11:06:52PM -0400, Peter Eisentraut wrote:
> On 6/2/17 15:41, Tom Lane wrote:
> > It's certainly plausible that we could have the latch code just ignore
> > WL_POSTMASTER_DEATH if not IsUnderPostmaster.  I think that the original
> > reasoning for not doing that was that the calling code should know which
> > environment it's in, and not pass an unimplementable wait-exit reason;
> > so silently ignoring the bit could mask a bug.  Perhaps that argument is
> > no longer attractive.  Alternatively, we could fix the relevant call sites
> > to do "(IsUnderPostmaster ? WL_POSTMASTER_DEATH : 0)", and keep the strict
> > behavior for the majority of call sites.
> 
> There are a lot of those call sites.  (And a lot of duplicate code for
> what to do if postmaster death actually happens.)  I doubt we want to
> check them all.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] BUG #14682: row level security not work with partitioned table

On Fri, Jun 02, 2017 at 09:28:16AM +0900, Michael Paquier wrote:
> On Thu, Jun 1, 2017 at 11:13 AM, Mike Palmiotto
>  wrote:
> > This is indeed a bug. fireRIRrules is currently skipping the RLS
> > policy check when relkind == PARTITIONED_TABLES, so RLS policies are
> > not applied. The attached patch fixes the behavior.
> 
> I would expect RLS to trigger as well in this context. Note that there
> should be regression tests to avoid this failure again in the future.
> I have added an open item.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: retry shm attach for windows (WAS: Re: [HACKERS] OK, so culicidae is still broken)

Amit Kapila  writes:
> Okay, I have added the comment to explain the same.  I have also
> modified the patch to adjust the looping as per your suggestion.

I took a quick look at this, and it seems rather beside the point.
You can't just loop inside an already-forked process and expect
that address space that was previously unavailable will suddenly
become available, when the problem is that the executable itself
(or some shared library) is mapped into the space you want.
What we have to do in order to retry is to fork an entire new
process, whose address space will hopefully get laid out differently.

While it's possible we could do that in a platform-independent
way, it would be pretty invasive and I don't see the value.
The implementation I'd had in mind originally was to put a loop
inside postmaster.c's internal_forkexec (win32 version), such
that if pgwin32_ReserveSharedMemoryRegion failed, it would
terminate/close up the subprocess as it already does, but then
go back around and do another CreateProcess.  Perhaps it's as
simple as the attached, but I'm not in a position to test it.

Assuming this passes minimal smoke testing, the way to actually test
it is to undo the hack in the MSVC build scripts that disables ASLR,
and then (on a Windows version where ASLR is active) run it until
you've seen a few occurrences of the retry, which should be detectable
by looking for bleats from pgwin32_ReserveSharedMemoryRegion
in the postmaster log.

regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 35b4ec8..e30b0c6 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*** internal_forkexec(int argc, char *argv[]
*** 4487,4492 
--- 4487,4493 
  static pid_t
  internal_forkexec(int argc, char *argv[], Port *port)
  {
+ 	int			retry_count = 0;
  	STARTUPINFO si;
  	PROCESS_INFORMATION pi;
  	int			i;
*** internal_forkexec(int argc, char *argv[]
*** 4504,4509 
--- 4505,4513 
  	Assert(strncmp(argv[1], "--fork", 6) == 0);
  	Assert(argv[2] == NULL);

+ 	/* Resume here if we need to retry */
+ retry:
+ 
  	/* Set up shared memory for parameter passing */
  	ZeroMemory(, sizeof(sa));
  	sa.nLength = sizeof(sa);
*** internal_forkexec(int argc, char *argv[]
*** 4595,4616 

  	/*
  	 * Reserve the memory region used by our main shared memory segment before
! 	 * we resume the child process.
  	 */
  	if (!pgwin32_ReserveSharedMemoryRegion(pi.hProcess))
  	{
! 		/*
! 		 * Failed to reserve the memory, so terminate the newly created
! 		 * process and give up.
! 		 */
  		if (!TerminateProcess(pi.hProcess, 255))
  			ereport(LOG,
  	(errmsg_internal("could not terminate process that failed to reserve memory: error code %lu",
  	 GetLastError(;
  		CloseHandle(pi.hProcess);
  		CloseHandle(pi.hThread);
! 		return -1;/* logging done made by
!  * pgwin32_ReserveSharedMemoryRegion() */
  	}

  	/*
--- 4599,4624 

  	/*
  	 * Reserve the memory region used by our main shared memory segment before
! 	 * we resume the child process.  Normally this should succeed, but if ASLR
! 	 * is active then it might sometimes fail due to the executable having
! 	 * gotten mapped into that range.  In that case, just terminate the
! 	 * process and retry.
  	 */
  	if (!pgwin32_ReserveSharedMemoryRegion(pi.hProcess))
  	{
! 		/* pgwin32_ReserveSharedMemoryRegion already made a log entry */
  		if (!TerminateProcess(pi.hProcess, 255))
  			ereport(LOG,
  	(errmsg_internal("could not terminate process that failed to reserve memory: error code %lu",
  	 GetLastError(;
  		CloseHandle(pi.hProcess);
  		CloseHandle(pi.hThread);
! 		if (++retry_count < 100)
! 			goto retry;
! 		ereport(LOG,
! 		  (errmsg("giving up after too many tries to reserve shared memory"),
! 		   errhint("This might be caused by ASLR or antivirus software.")));
! 		return -1;
  	}

  	/*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: BUG #14680: startup process on standby encounter a deadlock of TwoPhaseStateLock when redo 2PC xlog

On Thu, Jun 01, 2017 at 01:07:53AM -0700, Michael Paquier wrote:
> On Wed, May 31, 2017 at 12:30 PM, Michael Paquier
>  wrote:
> > On Wed, May 31, 2017 at 6:57 AM, Tom Lane  wrote:
> >> wangchuant...@huawei.com writes:
> >>> startup process on standby encounter a deadlock of TwoPhaseStateLock when
> >>> redo 2PC xlog.
> >>
> >> Please provide an example of how to get into this state.
> >
> > That would help. Are you seeing in the logs something like "removing
> > future two-phase state from memory for XXX" or "removing stale
> > two-phase state from shared memory for XXX"?
> >
> > Even with that, the light-weight lock sequence taken in those code
> > paths look definitely wrong to me, we should not take twice
> > TwoPhaseStateLock in the same code path. I think that we should remove
> > the lock acquisitions in RemoveGXact() and PrepareRedoRemove, and then
> > upgrade the locks of PrescanPreparedTransactions() and
> > StandbyRecoverPreparedTransactions() to be exclusive. We still need to
> > keep a lock as CheckPointTwoPhase() may still be triggered by the
> > checkpoint. Tom, what do you think?
> 
> Attached is what I was thinking about for reference. I just came back
> from a long flight and I am pretty tired, so my brain may have missed
> something. I'll take again a look at this issue on Monday, an open
> item has been added for now.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Simon,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity

> On Jun 4, 2017, at 2:19 PM, Andres Freund  wrote:
> 
> On 2017-06-04 14:16:14 -0700, Mark Dilger wrote:
>> Sorry, I was not clear.  What I meant to get at was that if you remove from 
>> the
>> executor all support for SRFs inside case statements, you might foreclose 
>> the option
>> of extending the syntax at some later date to allow aggregates over
>> SRFs.
> 
> Seems very unlikely that we'd ever want to do that.  The right way to do
> this is to simply move the SRF into the from list.  Having the executor
> support arbitrary sources of tuples would just complicate and slow down
> already complicated and slow code...
> 
> 
>> I'm
>> not saying that this works currently, but in principle if you allowed that 
>> SUM() that
>> I put up there, you'd get back exactly one row from it, same as you get from 
>> the
>> ELSE clause.  That would seem to solve the problem without going so far as
>> completely disallowing the SRF altogether.
> 
> But what would the benefit be?

In my example, the aggregate function is taking a column from the table as
an argument, so the output of the aggregate function needs to be computed per 
row,
not just once.  And if the function is expensive, or has side-effects, you might
only want it to execute for those rows where the CASE statement is true, rather
than for all of them.  You may get that same behavior using lateral or some 
such,
I'm uncertain, but in a complicated CASE statement, it be more straightforward
to write something like:

SELECT
CASE
WHEN t.x = 'foo' THEN expensive_aggfunc1(srf1(t.y,t.z))
WHEN t.x = 'bar' THEN expensive_aggfunc2(srf2(t.y,t.z))
WHEN t.x = 'baz' THEN expensive_aggfunc3(srf3(t.y,t.z))

WHEN t.x = 'zzz' THEN expensive_aggfuncN(srfN(t.y,t.z))
ELSE 5
END
FROM mytable t;

Than to try to write it any other way.

I'm not advocating anything here, even though it may sound that way to you.
I'm just thinking this thing through, given that you may be committing a removal
of functionality that we want back at some later time.

Out of curiosity, how would you rewrite what I have above such that the
aggregate function is not inside the case statement, and the expensive_aggfuncs
are only called for those (t.y,t.z) that are actually appropriate?

Mark Dilger

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Make ANALYZE more selective about what is a "most common value"?

I've been thinking about the behavior discussed in
https://www.postgresql.org/message-id/flat/20170522132017.29944.48391%40wrigleys.postgresql.org
and it seems to me that there are a couple of things we ought to do about
it.

First, I think we need a larger hard floor on the number of occurrences
of a value that're required to make ANALYZE decide it is a "most common
value". The existing coding is willing to believe that anything that
appears at least twice in the sample is a potential MCV, but that design
originated when we were envisioning stats samples of just a few thousand
rows --- specifically, default_statistics_target was originally just 10,
leading to a 3000-row sample size. So accepting two-appearance values as
MCVs would lead to a minimum MCV frequency estimate of 1/1500. Now it
could be a tenth or a hundredth of that.

As a round number, I'm thinking that a good floor would be a frequency
estimate of 1/1000. With today's typical sample size of 3 rows,
a value would have to appear at least 30 times in the sample to be
believed to be an MCV. That seems like it gives us a reasonable margin
of error against the kind of sampling noise seen in the above-cited
thread.

Second, the code also has a rule that potential MCVs need to have an
estimated frequency at least 25% larger than what it thinks the "average"
value's frequency is. A rule of that general form seems like a good idea,
but I now think the 25% threshold is far too small to do anything useful.
In particular, in any case like this where there are more distinct values
than there are sample rows, the "average frequency" estimate will
correspond to less than one occurrence in the sample, so that this rule is
totally useless to filter anything that we would otherwise consider as an
MCV. I wonder if we shouldn't make it be "at least double the estimated
average frequency".

Thoughts?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity

On 2017-06-04 14:16:14 -0700, Mark Dilger wrote:
> Sorry, I was not clear.  What I meant to get at was that if you remove from 
> the
> executor all support for SRFs inside case statements, you might foreclose the 
> option
> of extending the syntax at some later date to allow aggregates over
> SRFs.

Seems very unlikely that we'd ever want to do that.  The right way to do
this is to simply move the SRF into the from list.  Having the executor
support arbitrary sources of tuples would just complicate and slow down
already complicated and slow code...


> I'm
> not saying that this works currently, but in principle if you allowed that 
> SUM() that
> I put up there, you'd get back exactly one row from it, same as you get from 
> the
> ELSE clause.  That would seem to solve the problem without going so far as
> completely disallowing the SRF altogether.

But what would the benefit be?


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity


> On Jun 4, 2017, at 12:35 PM, Andres Freund  wrote:
> 
> Hi Mark,
> 
> On 2017-06-04 11:55:03 -0700, Mark Dilger wrote:
>>> Yea, I'm not a big fan of the either the pre v10 or the v10 behaviour of
>>> SRFs inside coalesce/case.  Neither is really resonable imo - I'm not
>>> sure a reasonable behaviour even exists.  IIRC I'd argued in the
>>> original SRF thread that we should just throw an error, and I think we'd
>>> concluded that we'd not do so for now.
>> 
>> I am trying to get my head around the type of query you and Tom
>> are discussing.  When you say you are unsure a reasonable behavior
>> even exists, are you saying such queries have no intuitive meaning?
> 
> I'm not saying that there aren't some cases where it's intuitive, but
> there's definitely lots that don't have intuitive meaning.  Especially
> when there are multiple SRFs in the same targetlist.
> 
> Do I understand correctly that you're essentially advocating for the <
> v10 behaviour?

No, I'm not advocating either way.  I merely wanted to know which queries
you and Tom were talking about.

>  It'd be nontrivial to implement that, without loosing
> the significant benefits (Significantly slower, higher code complexity,
> weird behaviour around multiple SRFs) gained by removing the previous
> implementation.   I'd really like to see some examples of when all of
> this is useful - I've yet to see a realistic one that's not just as
> easily written differently
> 
> 
>> Can you give an example of such a query which has no intuitive
>> meaning?  Perhaps I am not thinking about the right kind of queries.
>> I have been thinking about examples like:
>> 
>> SELECT x, CASE WHEN y THEN generate_series(1,z) ELSE 5 END
>>  FROM table_with_columns_x_and_y_and_z;
>> 
>> Which to me gives 'z' output rows per table row where y is true, and
>> one output row per table row where y is false.
> 
> Try any query that has one SRF outside of the CASE, and one inside.  In
> the old behaviour that'll make the total number of rows returned nearly
> undeterministic because of the least-common-multiple behaviour.
> 
>> That could be changed with an aggregate function such as:
>> SELECT x, CASE WHEN y THEN SUM(generate_series(1,z)) ELSE 5 END
>>  FROM table_with_columns_x_and_y;
> 
> That query doesn't work.  First off, aggregates don't take set arguments
> (neither in old nor new releases), secondly aggregates are evaluated
> independently of CASE/COALESCE statements, thirdly you're missing group
> bys.  Those all are independent of the v10 changes.

Sorry, I was not clear.  What I meant to get at was that if you remove from the
executor all support for SRFs inside case statements, you might foreclose the 
option
of extending the syntax at some later date to allow aggregates over SRFs.  I'm
not saying that this works currently, but in principle if you allowed that 
SUM() that
I put up there, you'd get back exactly one row from it, same as you get from the
ELSE clause.  That would seem to solve the problem without going so far as
completely disallowing the SRF altogether.  The reasons for not putting a GROUP 
BY
clause in the example are (a) I don't know where it would go, syntactically, 
and (b)
it would defeat the purpose, because once you are grouping, you again have the
possibility of getting more than one row.

I'm not advocating implementing this; I'm just speculating about possible future
features in which SRFs inside a case statement might be allowed.

> 
>> Thanks, and my apologies if I am merely lacking sufficient imagination to
>> think of a proper example.
> 
> Might be worthwhile to reread the thread about the SRF reimplementation.
> 
> https://www.postgresql.org/message-id/20160822214023.aaxz5l4igypow...@alap3.anarazel.de

This helps, thanks!

Mark Dilger



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Continuous buildfarm failures on hamster with bin-check

On Tue, Apr 18, 2017 at 09:59:26PM +0900, Michael Paquier wrote:
> On Tue, Apr 18, 2017 at 9:35 PM, Andrew Dunstan 
>  wrote:
> > On 04/18/2017 08:23 AM, Michael Paquier wrote:
> >> Increasing wal_sender_timeout and wal_receiver_timeout can help in
> >> reducing the failures seen.
> >
> > OK, but you're only talking about a handful of these, right?
> 
> Yup, that would be one solution but that's not attacking the problem
> at its root.
> 
> > Lets's say we have a bunch of possible environment settings with names
> > that all begin with "PG_TAP_" PostgresNode.pm could check for the
> > existence of these and take action accordingly, and you could set them
> > on a buildfarm animal in the config file, or for interactive use in your
> > .profile.
> 
> That's the point I am trying to make upthread: slow buildfarm animals
> should have minimal impact on core code modifications. We could for
> example have one environment variable that lists all the parameters to
> modify in a single string and appends them at the end of
> postgresql.conf. But honestly I don't think that this is necessary if
> there is only one variable able to define a base directory for
> temporary statistics as the real bottleneck comes from there at least
> in the case of hamster. When initializing a node via PostgresNode.pm,
> we would just check for this variable, and the init() routine just
> creates a temporary folder in it, setting up temp_stats_path in
> postgresql.conf.

Each of the above approaches has fairly low impact on the code, so we should
use other criteria to choose.  I'd welcome a feature for augmenting every
postgresql.conf of every test suite (a generalization of "pg_regress
--temp-config", which has proven its value).  I can envision using it with
force_parallel_mode, default_transaction_isolation, log_*, wal_*_timeout,
autovacuum_naptime, and others.

Even for hamster, I'm skeptical that changing stats_temp_directory would
suffice.  Every hamster BinInstallCheck failure since 2017-02-13 had a "LOG:
terminating walsender process due to replication timeout".  Most, but not all,
of those replication timeouts followed a "LOG:  using stale statistics instead
of current ones because stats collector is not responding".  For the remaining
minority, I expect to eventually need wal_sender_timeout.  Example:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster=2017-02-24%2016%3A00%3A06


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Default Partition for Range

2017-06-04 Thread Beena Emerson

Hello,

On Sun, Jun 4, 2017 at 9:26 AM, Rafia Sabih
 wrote:
> On Fri, Jun 2, 2017 at 5:48 PM, Robert Haas  wrote:
>> On Fri, Jun 2, 2017 at 8:09 AM, Amit Kapila  wrote:
>>> I think if you have found spelling mistakes unrelated to this patch,
>>> then it is better to submit those as a separate patch in a new thread.
>>
>> +1.
>
> Sure, attached is the version without those changes.
>

Thank you for your comments. I have applied most of the changes. The
regression comment change from 'fail' -> ' Following statement should
fail' and 'ok' -> 'Following should complete successfully' is ignored
since other tests in the file had similar comments

The new patch is rebased over default_partition_v18.patch
[http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg315831.html]


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


default_range_partition_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity

Hi Mark,

On 2017-06-04 11:55:03 -0700, Mark Dilger wrote:
> > Yea, I'm not a big fan of the either the pre v10 or the v10 behaviour of
> > SRFs inside coalesce/case.  Neither is really resonable imo - I'm not
> > sure a reasonable behaviour even exists.  IIRC I'd argued in the
> > original SRF thread that we should just throw an error, and I think we'd
> > concluded that we'd not do so for now.
> 
> I am trying to get my head around the type of query you and Tom
> are discussing.  When you say you are unsure a reasonable behavior
> even exists, are you saying such queries have no intuitive meaning?

I'm not saying that there aren't some cases where it's intuitive, but
there's definitely lots that don't have intuitive meaning.  Especially
when there are multiple SRFs in the same targetlist.

Do I understand correctly that you're essentially advocating for the <
v10 behaviour?  It'd be nontrivial to implement that, without loosing
the significant benefits (Significantly slower, higher code complexity,
weird behaviour around multiple SRFs) gained by removing the previous
implementation.   I'd really like to see some examples of when all of
this is useful - I've yet to see a realistic one that's not just as
easily written differently

> Can you give an example of such a query which has no intuitive
> meaning?  Perhaps I am not thinking about the right kind of queries.
> I have been thinking about examples like:
> 
> SELECT x, CASE WHEN y THEN generate_series(1,z) ELSE 5 END
>   FROM table_with_columns_x_and_y_and_z;
> 
> Which to me gives 'z' output rows per table row where y is true, and
> one output row per table row where y is false.

Try any query that has one SRF outside of the CASE, and one inside.  In
the old behaviour that'll make the total number of rows returned nearly
undeterministic because of the least-common-multiple behaviour.

> That could be changed with an aggregate function such as:
> SELECT x, CASE WHEN y THEN SUM(generate_series(1,z)) ELSE 5 END
>   FROM table_with_columns_x_and_y;

That query doesn't work.  First off, aggregates don't take set arguments
(neither in old nor new releases), secondly aggregates are evaluated
independently of CASE/COALESCE statements, thirdly you're missing group
bys.  Those all are independent of the v10 changes.

> Thanks, and my apologies if I am merely lacking sufficient imagination to
> think of a proper example.

Might be worthwhile to reread the thread about the SRF reimplementation.

https://www.postgresql.org/message-id/20160822214023.aaxz5l4igypow...@alap3.anarazel.de

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade and missing loadable libraries

On Sun, Jun  4, 2017 at 02:30:58PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > I didn't want to optimize for it --- I wanted a way to detect when DROP
> > EXTENSION has no hope of working, and give more details.  I assume the
> > problem with that is the the object names are inside SQL scripts that
> > cannot be easily interrogated.  Are the pg_proc entries tied to the
> > extension in some verifiable way that we could identify orphaned pg_proc
> > lines?
> 
> You could look for 'e'-type pg_depend entries.

OK, I will run some tests later and report back.  Thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity


> On Jun 4, 2017, at 11:55 AM, Mark Dilger  wrote:
> 
> SELECT x, CASE WHEN y THEN SUM(generate_series(1,z)) ELSE 5 END
>   FROM table_with_columns_x_and_y;

Sorry, this table is supposed to be the same as the previous one,

table_with_columns_x_and_y_and_z



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity

> On Jun 2, 2017, at 8:11 PM, Andres Freund  wrote:
> 
> Hi,
> 
> 
> On 2017-06-02 22:53:00 -0400, Tom Lane wrote:
>> I think you've got enough on your plate.  I can take care of whatever
>> we decide to do here.
> 
> Thanks.
> 
> 
>> Andres Freund  writes:
 Another possibility is to say that we've broken this situation
 irretrievably and we should start throwing errors for SRFs in
 places where they'd be conditionally evaluated.  That's not real
 nice perhaps, but it's better than the way things are right now.
>> 
>>> I'd be ok with that too, but I don't really see a strong need so far.
>> 
>> The argument for this way is basically that it's better to break
>> apps visibly than silently.
> 
> Right, I got that.
> 
> 
>> The behavior for SRF-inside-CASE is
>> not going to be the same as before even if we implement the fix
>> I suggest above, and it's arguable that this new behavior is not
>> at all intuitive.
> 
> Yea, I'm not a big fan of the either the pre v10 or the v10 behaviour of
> SRFs inside coalesce/case.  Neither is really resonable imo - I'm not
> sure a reasonable behaviour even exists.  IIRC I'd argued in the
> original SRF thread that we should just throw an error, and I think we'd
> concluded that we'd not do so for now.

I am trying to get my head around the type of query you and Tom
are discussing.  When you say you are unsure a reasonable behavior
even exists, are you saying such queries have no intuitive meaning?

Can you give an example of such a query which has no intuitive
meaning?  Perhaps I am not thinking about the right kind of queries.
I have been thinking about examples like:

SELECT x, CASE WHEN y THEN generate_series(1,z) ELSE 5 END
FROM table_with_columns_x_and_y_and_z;

Which to me gives 'z' output rows per table row where y is true, and
one output row per table row where y is false.  That could be changed
with an aggregate function such as:

SELECT x, CASE WHEN y THEN SUM(generate_series(1,z)) ELSE 5 END
FROM table_with_columns_x_and_y;

Which to me gives one output row per table row regardless of whether y
is true.

Thanks, and my apologies if I am merely lacking sufficient imagination to
think of a proper example.

Mark Dilger

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-04 Thread Beena Emerson

On Mon, Jun 5, 2017 at 12:14 AM, Jeevan Ladhe
 wrote:
>
>
>>
>> What is the reason the new patch does not mention of violating rows
>> when a new partition overlaps with default?
>> Is it because more than one row could be violating the condition?
>
>
> This is because, for reporting the violating error, I had to function
> ExecBuildSlotValueDescription() public. Per Amit's comment I have
> removed this change and let the overlapping error without row contains.
> I think this is analogus to other functions that are throwing violation
> error
> but are not local to execMain.c.
>

ok thanks.


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-04 Thread Jeevan Ladhe

> What is the reason the new patch does not mention of violating rows
> when a new partition overlaps with default?
> Is it because more than one row could be violating the condition?
>

This is because, for reporting the violating error, I had to function
ExecBuildSlotValueDescription() public. Per Amit's comment I have
removed this change and let the overlapping error without row contains.
I think this is analogus to other functions that are throwing violation
error
but are not local to execMain.c.

Regards,
Jeevan Ladhe

Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-04 Thread Beena Emerson

Hello,

On Fri, Jun 2, 2017 at 1:05 AM, Jeevan Ladhe
 wrote:
> Hi,
>
> I have addressed Ashutosh's and Amit's comments in the attached patch.
>
> Please let me know if I have missed anything and any further comments.
>
> PFA.
>
> Regards,
> Jeevan Ladhe
>

What is the reason the new patch does not mention of violating rows
when a new partition overlaps with default?
Is it because more than one row could be violating the condition?

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade and missing loadable libraries

Bruce Momjian  writes:
> I didn't want to optimize for it --- I wanted a way to detect when DROP
> EXTENSION has no hope of working, and give more details.  I assume the
> problem with that is the the object names are inside SQL scripts that
> cannot be easily interrogated.  Are the pg_proc entries tied to the
> extension in some verifiable way that we could identify orphaned pg_proc
> lines?

You could look for 'e'-type pg_depend entries.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade and missing loadable libraries

On Sun, Jun  4, 2017 at 02:04:37PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > The problem is that in some cases extensions are improperly removed or
> > the extension has bugs that leaves pg_proc entries around that aren't
> > dumped, but are seen by pg_upgrade and generate an error.  In these
> > cases, and I have seen a few recently, we don't give the user any way to
> > find the cause except ask for assistance, i.e. we don't show them the
> > query we used to find the problem libraries.
> 
> Meh.  I think that sort of situation is one in which non-experts are
> going to need help in any case.  It's unlikely that pg_upgrade can,
> or should try to, offer them advice sufficient to fix the problem.
> 
> Also, I completely reject the idea that pg_upgrade's output should
> be optimized for that situation rather than the typical "you forgot
> to install these extensions in the new installation" case.

I didn't want to optimize for it --- I wanted a way to detect when DROP
EXTENSION has no hope of working, and give more details.  I assume the
problem with that is the the object names are inside SQL scripts that
cannot be easily interrogated.  Are the pg_proc entries tied to the
extension in some verifiable way that we could identify orphaned pg_proc
lines?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade and missing loadable libraries

Bruce Momjian  writes:
> The problem is that in some cases extensions are improperly removed or
> the extension has bugs that leaves pg_proc entries around that aren't
> dumped, but are seen by pg_upgrade and generate an error.  In these
> cases, and I have seen a few recently, we don't give the user any way to
> find the cause except ask for assistance, i.e. we don't show them the
> query we used to find the problem libraries.

Meh.  I think that sort of situation is one in which non-experts are
going to need help in any case.  It's unlikely that pg_upgrade can,
or should try to, offer them advice sufficient to fix the problem.

Also, I completely reject the idea that pg_upgrade's output should
be optimized for that situation rather than the typical "you forgot
to install these extensions in the new installation" case.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade and missing loadable libraries

On Sun, Jun  4, 2017 at 01:55:01PM -0400, Tom Lane wrote:
> > *  should we print all the pg_proc.pronames that are involved, not just
> > the unique library names
> > *  should we output a query helping people find the pg_proc entries
> 
> > I think there are many cases where DROP EXTENSION XXX fixes the problem,
> 
> Yes.  I think in most cases nowadays there's a one-for-one correlation
> between extensions and libraries; drilling down to the level of individual
> functions would just be confusing clutter.  I think if you just print
> a report saying "these libraries are referenced in these databases",
> that would be sufficiently usable in most cases.

OK.

> You could think about printing a script full of DROP EXTENSION commands,
> but aside from the sheer difficulty of doing that, it doesn't seem all
> that helpful.  Simply dropping every extension is usually *not* the
> right answer, and it could easily lead to data loss if done blindly.
> Usually people are going to need to stop and think anyway.

The problem is that in some cases extensions are improperly removed or
the extension has bugs that leaves pg_proc entries around that aren't
dumped, but are seen by pg_upgrade and generate an error.  In these
cases, and I have seen a few recently, we don't give the user any way to
find the cause except ask for assistance, i.e. we don't show them the
query we used to find the problem libraries.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade and missing loadable libraries

Bruce Momjian  writes:
> On Sun, Jun  4, 2017 at 01:20:12PM -0400, Alvaro Herrera wrote:
>> I think it'd be better to be exhaustive about the report, i.e. report
>> all problems in all databases, if possible.  Doing repeated pg_upgrade
>> attempts until you've nailed all the problems is boring ...

> Well, I think there are three open items:

> *  should we print all the database names involved

Yes.

> *  should we print all the pg_proc.pronames that are involved, not just
> the unique library names
> *  should we output a query helping people find the pg_proc entries

> I think there are many cases where DROP EXTENSION XXX fixes the problem,

Yes.  I think in most cases nowadays there's a one-for-one correlation
between extensions and libraries; drilling down to the level of individual
functions would just be confusing clutter.  I think if you just print
a report saying "these libraries are referenced in these databases",
that would be sufficiently usable in most cases.

You could think about printing a script full of DROP EXTENSION commands,
but aside from the sheer difficulty of doing that, it doesn't seem all
that helpful.  Simply dropping every extension is usually *not* the
right answer, and it could easily lead to data loss if done blindly.
Usually people are going to need to stop and think anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade and missing loadable libraries

On Sun, Jun  4, 2017 at 01:20:12PM -0400, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > I have seen a few reports where people are getting this pg_upgrade
> > error:
> > 
> > Your installation references loadable libraries that are missing
> > from the new installation.  You can add these libraries to the
> > new installation, or remove the functions using them from the
> > old installation.  A list of problem libraries is in the file:
> > 
> > ./loadable_libraries.txt
> > 
> > and the file contains:
> > 
> > could not load library "$libdir/pgpool-regclass":
> > ERROR:  could not access file "$libdir/pgpool-regclass": No such file 
> > or directory
> > 
> > The problem is that there is no indicate of which database to look in. 
> 
> I think it'd be better to be exhaustive about the report, i.e. report
> all problems in all databases, if possible.  Doing repeated pg_upgrade
> attempts until you've nailed all the problems is boring ...

Well, I think there are three open items:

*  should we print all the database names involved

*  should we print all the pg_proc.pronames that are involved, not just
the unique library names

*  should we output a query helping people find the pg_proc entries

I think there are many cases where DROP EXTENSION XXX fixes the problem,
and in those cases showing pg_proc.pronames or giving them a query to
find them is a negative --- they should be pointed to DROP EXTENSION. 
Can we detect when to recommend one over the other?  Can we tell which
proc entries will not be in the dump and can be ignored?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade and missing loadable libraries

2017-06-04 Thread Alvaro Herrera

Bruce Momjian wrote:
> I have seen a few reports where people are getting this pg_upgrade
> error:
> 
>   Your installation references loadable libraries that are missing
>   from the new installation.  You can add these libraries to the
>   new installation, or remove the functions using them from the
>   old installation.  A list of problem libraries is in the file:
>   
>   ./loadable_libraries.txt
>   
> and the file contains:
> 
>   could not load library "$libdir/pgpool-regclass":
>   ERROR:  could not access file "$libdir/pgpool-regclass": No such file 
> or directory
> 
> The problem is that there is no indicate of which database to look in. 

I think it'd be better to be exhaustive about the report, i.e. report
all problems in all databases, if possible.  Doing repeated pg_upgrade
attempts until you've nailed all the problems is boring ...

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade and missing loadable libraries

On 2017-06-04 13:06:25 -0400, Bruce Momjian wrote:
> This seems to be one of the last pg_upgrade problems

Famous last words.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] pg_upgrade and missing loadable libraries

I have seen a few reports where people are getting this pg_upgrade
error:

Your installation references loadable libraries that are missing
from the new installation.  You can add these libraries to the
new installation, or remove the functions using them from the
old installation.  A list of problem libraries is in the file:

./loadable_libraries.txt

and the file contains:

could not load library "$libdir/pgpool-regclass":
ERROR:  could not access file "$libdir/pgpool-regclass": No such file 
or directory

The problem is that there is no indicate of which database to look in. 
Should we adjust the output to suggest the first database that has it,
or update the instructions to mention they have to look in all
databases, and give them some instructions on finding the problem?

This seems to be one of the last pg_upgrade problems, along with
preserving optimizer statistics, which I am hoping to do for PG 11.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Index created in BEFORE trigger not updated during INSERT