Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-05-25 Thread Alvaro Herrera
On 2021-May-24, Noah Misch wrote:

> prairiedog and wrasse failed a $SUBJECT test after this (commit 8aba932).
> Also, some non-CLOBBER_CACHE_ALWAYS animals failed a test before the fix.
> These IsolationCheck failures match detach-partition-concurrently[^\n]*FAILED:
> 
>   sysname   │      snapshot       │ branch │ bfurl
> ────────────┼─────────────────────┼────────┼───────

Checking this list, these three failures can be explained by the
detach-partition-concurrently-3 failure that was just patched.

>  wrasse     │ 2021-04-21 10:38:32 │ HEAD   │ 
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2021-04-21%2010%3A38%3A32
>  prairiedog │ 2021-04-25 22:05:48 │ HEAD   │ 
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2021-04-25%2022%3A05%3A48
>  wrasse     │ 2021-05-11 03:43:40 │ HEAD   │ 
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2021-05-11%2003%3A43%3A40

Next there's a bunch whose error message is the same one we had seen
earlier in this thread; these animals are all CLOBBER_CACHE_ALWAYS:

 step s1insert: insert into d4_fk values (1);
 +ERROR:  insert or update on table "d4_fk" violates foreign key constraint 
"d4_fk_a_fkey"

>  hyrax      │ 2021-03-27 07:29:34 │ HEAD   │ 
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-03-27%2007%3A29%3A34
>  trilobite  │ 2021-03-29 18:14:24 │ HEAD   │ 
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-03-29%2018%3A14%3A24
>  hyrax      │ 2021-04-01 07:21:10 │ HEAD   │ 
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-04-01%2007%3A21%3A10
>  avocet     │ 2021-04-05 15:45:56 │ HEAD   │ 
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=avocet&dt=2021-04-05%2015%3A45%3A56
>  hyrax      │ 2021-04-06 07:15:16 │ HEAD   │ 
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-04-06%2007%3A15%3A16
>  hyrax      │ 2021-04-11 07:25:50 │ HEAD   │ 
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-04-11%2007%3A25%3A50
>  hyrax      │ 2021-04-20 18:25:37 │ HEAD   │ 
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-04-20%2018%3A25%3A37

This is fine, because the fix commit 8aba932 is dated April 22 and these
failures all predate that.


And finally there are these two:

>  topminnow  │ 2021-03-28 20:37:38 │ HEAD   │ 
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=topminnow&dt=2021-03-28%2020%3A37%3A38
>  dragonet   │ 2021-04-01 19:48:03 │ HEAD   │ 
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2021-04-01%2019%3A48%3A03

(animals that are not CCA), which expose the same problem in
detach-partition-concurrently-4 that we just fixed in
detach-partition-concurrently-3, so we should apply the same fix: add a
no-op step right after the cancel to prevent the error report from
changing.  I'll go do that after grabbing some coffee.

Thanks for digging into the reports!

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"Cada quien es cada cual y baja las escaleras como quiere" (JMSerrat)




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-05-24 Thread Amit Langote
On Mon, May 24, 2021 at 6:07 PM Noah Misch  wrote:
> On Wed, Apr 21, 2021 at 04:38:55PM -0400, Alvaro Herrera wrote:
>
> [fix to let CLOBBER_CACHE_ALWAYS pass]
>
> > Barring objections, I will get this pushed early tomorrow.
>
> prairiedog and wrasse failed a $SUBJECT test after this (commit 8aba932).
> Also, some non-CLOBBER_CACHE_ALWAYS animals failed a test before the fix.
> These IsolationCheck failures match detach-partition-concurrently[^\n]*FAILED:

FWIW, all 4 detach-partition-concurrently suites passed for me on a
build of the latest HEAD, with CPPFLAGS = -DRELCACHE_FORCE_RELEASE
-DCATCACHE_FORCE_RELEASE -DCLOBBER_CACHE_ALWAYS -D_GNU_SOURCE

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-05-24 Thread Noah Misch
On Wed, Apr 21, 2021 at 04:38:55PM -0400, Alvaro Herrera wrote:

[fix to let CLOBBER_CACHE_ALWAYS pass]

> Barring objections, I will get this pushed early tomorrow.

prairiedog and wrasse failed a $SUBJECT test after this (commit 8aba932).
Also, some non-CLOBBER_CACHE_ALWAYS animals failed a test before the fix.
These IsolationCheck failures match detach-partition-concurrently[^\n]*FAILED:

  sysname   │      snapshot       │ branch │ bfurl
────────────┼─────────────────────┼────────┼───────
 hyrax      │ 2021-03-27 07:29:34 │ HEAD   │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-03-27%2007%3A29%3A34
 topminnow  │ 2021-03-28 20:37:38 │ HEAD   │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=topminnow&dt=2021-03-28%2020%3A37%3A38
 trilobite  │ 2021-03-29 18:14:24 │ HEAD   │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-03-29%2018%3A14%3A24
 hyrax      │ 2021-04-01 07:21:10 │ HEAD   │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-04-01%2007%3A21%3A10
 dragonet   │ 2021-04-01 19:48:03 │ HEAD   │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2021-04-01%2019%3A48%3A03
 avocet     │ 2021-04-05 15:45:56 │ HEAD   │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=avocet&dt=2021-04-05%2015%3A45%3A56
 hyrax      │ 2021-04-06 07:15:16 │ HEAD   │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-04-06%2007%3A15%3A16
 hyrax      │ 2021-04-11 07:25:50 │ HEAD   │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-04-11%2007%3A25%3A50
 hyrax      │ 2021-04-20 18:25:37 │ HEAD   │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-04-20%2018%3A25%3A37
 wrasse     │ 2021-04-21 10:38:32 │ HEAD   │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2021-04-21%2010%3A38%3A32
 prairiedog │ 2021-04-25 22:05:48 │ HEAD   │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2021-04-25%2022%3A05%3A48
 wrasse     │ 2021-05-11 03:43:40 │ HEAD   │ 
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2021-05-11%2003%3A43%3A40
(12 rows)




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-05-12 Thread Amit Langote
On Fri, May 7, 2021 at 2:13 AM Álvaro Herrera  wrote:
> On 2021-Apr-30, Amit Langote wrote:
>
> > The case I was looking at is when a partition detach appears as
> > in-progress to a serializable transaction.
>
> Yeah, I was exceedingly sloppy on my reasoning about this, and you're
> right that that's what actually happens rather than what I said.
>
> > If the caller wants to omit detached partitions, such a partition ends
> > up in rd_partdesc_nodetached, with the corresponding xmin being set to
> > 0 due to the way find_inheritance_children_extended() sets
> > *detached_xmin.  The next query in the transaction that wants to omit
> > detached partitions, seeing rd_partdesc_nodetached_xmin being invalid,
> > rebuilds the partdesc, again including that partition because the
> > snapshot wouldn't have changed, and so on until the transaction ends.
> > Now, this can perhaps be "fixed" by making
> > find_inheritance_children_extended() set the xmin outside the
> > snapshot-checking block, but maybe there's no need to address this as
> > a priority.
>
> Hmm.  See below.
>
> > Rather, a point that bothers me a bit is that we're including a
> > detached partition in the partdesc labeled "nodetached" in this
> > particular case.  Maybe we should avoid that by considering in this
> > scenario that no detached partitions exist for this transaction, and
> > so initializing rd_partdesc instead of rd_partdesc_nodetached.  That
> > will let us avoid the situations where the xmin is left in an invalid
> > state.  Maybe like the attached (it also fixes a couple of
> > typos/thinkos in the previous commit).
>
> Makes sense -- applied, thanks.

Thank you.

> > Note that we still end up in the same situation as before where each
> > query in the serializable transaction that sees the detach as
> > in-progress has to rebuild the partition descriptor omitting the
> > detached partitions, even when it's clear that the detach-in-progress
> > partition will be included every time.
>
> Yeah, you're right that there is a performance hole in the case where a
> partition pending detach exists and you're using repeatable read
> transactions.  I didn't see it as terribly critical since it's supposed
> to be very transient, but I may be wrong.

Yeah, I'd hope so too.  I think RR transactions would have to be
concurrent with an interrupted DETACH CONCURRENTLY to suffer the
performance hit, and that does kind of make this a rarely occurring
case.

--
Amit Langote
EDB: http://www.enterprisedb.com




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-05-06 Thread Álvaro Herrera
On 2021-May-06, Amit Langote wrote:

> That makes sense, thanks for noticing.
> 
> How about the attached?

I tweaked the linkage; as submitted, the text in the link contained the
target entry's full syntax summary, so literally it had:

  ... see DETACH PARTITION partition_name [ CONCURRENTLY | FINALIZE ] for
  details ...

which didn't look very nice.  So I made it use <link> instead of xref
and wrote the "ALTER TABLE .. DETACH PARTITION" text.  I first tried to
fix it by adding an "xreflabel" attribute, but I didn't like it because
the text was not set in fixed-width font.

I also tweaked the wording to match the surrounding text a bit better,
at least IMO.  Feel free to suggest improvements.

Thanks!

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"That sort of implies that there are Emacs keystrokes which aren't obscure.
I've been using it daily for 2 years now and have yet to discover any key
sequence which makes any sense."(Paul Thomas)




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-05-06 Thread Álvaro Herrera
On 2021-May-05, Pavel Luzanov wrote:

> Hello,
> 
> I found this in the documentation, section '5.11.3. Partitioning Using
> Inheritance'[1]:
> "Some operations require a stronger lock when using declarative partitioning
> than when using table inheritance. For example, removing a partition from a
> partitioned table requires taking an ACCESS EXCLUSIVE lock on the parent
> table, whereas a SHARE UPDATE EXCLUSIVE lock is enough in the case of
> regular inheritance."
> 
> This point is no longer entirely valid, with some caveats. If the table has a
> default partition, then removing a partition still requires taking an ACCESS
> EXCLUSIVE lock.

Hmm, are there any other operations for which the partitioning command
takes a stronger lock than the corresponding legacy-inheritance command?
If there aren't any, then it's okay to delete this paragraph as in the
proposed patch.  But if there are any, then I think we should change the
example to mention that other operation.  I'm not sure what's a good way
to verify that, though.

Also, it remains true that without CONCURRENTLY the DETACH operation
takes AEL.  I'm not sure it's worth pointing this out in this paragraph.

> Maybe it makes sense to add some details about DETACH CONCURRENTLY to the
> section '5.11.2.2. Partition Maintenance' and completely remove this point?

Maybe you're right, though.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"Las mujeres son como hondas:  mientras más resistencia tienen,
 más lejos puedes llegar con ellas"  (Jonas Nightingale, Leap of Faith)




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-05-06 Thread Álvaro Herrera
On 2021-Apr-30, Amit Langote wrote:

> The case I was looking at is when a partition detach appears as
> in-progress to a serializable transaction.

Yeah, I was exceedingly sloppy on my reasoning about this, and you're
right that that's what actually happens rather than what I said.

> If the caller wants to omit detached partitions, such a partition ends
> up in rd_partdesc_nodetached, with the corresponding xmin being set to
> 0 due to the way find_inheritance_children_extended() sets
> *detached_xmin.  The next query in the transaction that wants to omit
> detached partitions, seeing rd_partdesc_nodetached_xmin being invalid,
> rebuilds the partdesc, again including that partition because the
> snapshot wouldn't have changed, and so on until the transaction ends.
> Now, this can perhaps be "fixed" by making
> find_inheritance_children_extended() set the xmin outside the
> snapshot-checking block, but maybe there's no need to address this as
> a priority.

Hmm.  See below.

> Rather, a point that bothers me a bit is that we're including a
> detached partition in the partdesc labeled "nodetached" in this
> particular case.  Maybe we should avoid that by considering in this
> scenario that no detached partitions exist for this transaction, and
> so initializing rd_partdesc instead of rd_partdesc_nodetached.  That
> will let us avoid the situations where the xmin is left in an invalid
> state.  Maybe like the attached (it also fixes a couple of
> typos/thinkos in the previous commit).

Makes sense -- applied, thanks.

> Note that we still end up in the same situation as before where each
> query in the serializable transaction that sees the detach as
> in-progress has to rebuild the partition descriptor omitting the
> detached partitions, even when it's clear that the detach-in-progress
> partition will be included every time.

Yeah, you're right that there is a performance hole in the case where a
partition pending detach exists and you're using repeatable read
transactions.  I didn't see it as terribly critical since it's supposed
to be very transient, but I may be wrong.

-- 
Álvaro Herrera   Valdivia, Chile
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-05-06 Thread Pavel Luzanov

On 06.05.2021 08:35, Amit Langote wrote:
> On Wed, May 5, 2021 at 7:59 PM Pavel Luzanov  wrote:
> > I found this in the documentation, section '5.11.3. Partitioning
> > Using Inheritance'[1]: "Some operations require a stronger lock when
> > using declarative partitioning than when using table inheritance. For
> > example, removing a partition from a partitioned table requires
> > taking an ACCESS EXCLUSIVE lock on the parent table, whereas a SHARE
> > UPDATE EXCLUSIVE lock is enough in the case of regular inheritance."
> >
> > This point is no longer entirely valid, with some caveats. If the
> > table has a default partition, then removing a partition still
> > requires taking an ACCESS EXCLUSIVE lock.
> >
> > Maybe it makes sense to add some details about DETACH CONCURRENTLY to
> > the section '5.11.2.2. Partition Maintenance' and completely remove
> > this point?
> >
> > 1. https://www.postgresql.org/docs/devel/ddl-partitioning.html#DDL-PARTITIONING-USING-INHERITANCE
>
> That makes sense, thanks for noticing.  How about the attached?


I like it.
Especially the link to ALTER TABLE; this avoids duplicating all
the nuances of DETACH .. CONCURRENTLY.


--
Pavel Luzanov
Postgres Professional: https://postgrespro.com
The Russian Postgres Company





Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-05-05 Thread Amit Langote
On Wed, May 5, 2021 at 7:59 PM Pavel Luzanov  wrote:
> I found this in the documentation, section '5.11.3. Partitioning Using 
> Inheritance'[1]:
> "Some operations require a stronger lock when using declarative partitioning 
> than when using table inheritance. For example, removing a partition from a 
> partitioned table requires taking an ACCESS EXCLUSIVE lock on the parent 
> table, whereas a SHARE UPDATE EXCLUSIVE lock is enough in the case of regular 
> inheritance."
>
> This point is no longer entirely valid, with some caveats. If the table has a 
> default partition, then removing a partition still requires taking an ACCESS 
> EXCLUSIVE lock.
>
> Maybe it makes sense to add some details about DETACH CONCURRENTLY to the 
> section '5.11.2.2. Partition Maintenance' and completely remove this point?
>
> 1. 
> https://www.postgresql.org/docs/devel/ddl-partitioning.html#DDL-PARTITIONING-USING-INHERITANCE

That makes sense, thanks for noticing.

How about the attached?

-- 
Amit Langote
EDB: http://www.enterprisedb.com


ddl-partitioning-note-detach-concurrenltly.patch
Description: Binary data


Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-05-05 Thread Pavel Luzanov

Hello,

I found this in the documentation, section '5.11.3. Partitioning Using 
Inheritance'[1]:
"Some operations require a stronger lock when using declarative 
partitioning than when using table inheritance. For example, removing a 
partition from a partitioned table requires taking an ACCESS EXCLUSIVE 
lock on the parent table, whereas a SHARE UPDATE EXCLUSIVE lock is 
enough in the case of regular inheritance."


This point is no longer entirely valid, with some caveats. If the table has a 
default partition, then removing a partition still requires taking an 
ACCESS EXCLUSIVE lock.


Maybe it makes sense to add some details about DETACH CONCURRENTLY to the 
section '5.11.2.2. Partition Maintenance' and completely remove this point?


1. 
https://www.postgresql.org/docs/devel/ddl-partitioning.html#DDL-PARTITIONING-USING-INHERITANCE


Pavel Luzanov
Postgres Professional: https://postgrespro.com
The Russian Postgres Company




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-30 Thread Amit Langote
(Thanks for committing the fix.)

On Thu, Apr 29, 2021 at 1:11 AM Álvaro Herrera  wrote:
> On Wed, Apr 28, 2021, at 10:21 AM, Amit Langote wrote:
> I noticed that rd_partdesc_nodetached_xmin can sometimes end up with
> value 0. You seem to be aware of that already, because otherwise you
> wouldn't have added the TransactionIdIsValid(...) check to the condition
> in RelationGetPartitionDesc(); but the comments nearby don't mention why
> such a thing might happen.  Also, I guess it can't be helped that the
> partdesc_nodetached will have to be leaked when the xmin is 0, but
> that shouldn't be as problematic as the case we discussed earlier.
>
>
> The only case I am aware of where that can happen is if the pg_inherits tuple is 
> frozen. (That's exactly what the affected test case was testing, note the 
> "VACUUM FREEZE pg_inherits" there).  So that test case blew up immediately; 
> but I think the real-world chances that people are going to be doing that are 
> pretty low, so I'm not really concerned about the leak.

The case I was looking at is when a partition detach appears as
in-progress to a serializable transaction.  If the caller wants to
omit detached partitions, such a partition ends up in
rd_partdesc_nodetached, with the corresponding xmin being set to 0 due
to the way find_inheritance_children_extended() sets *detached_xmin.
The next query in the transaction that wants to omit detached
partitions, seeing rd_partdesc_nodetached_xmin being invalid, rebuilds
the partdesc, again including that partition because the snapshot
wouldn't have changed, and so on until the transaction ends.  Now,
this can perhaps be "fixed" by making
find_inheritance_children_extended() set the xmin outside the
snapshot-checking block, but maybe there's no need to address this as
a priority.
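
To spell out the check I mean (a sketch only, paraphrasing the patch's
logic with its field names, not quoting the committed code): an invalid
(0) xmin can never pass the first test, so each such query falls
through to a rebuild:

	/*
	 * Sketch: the cached "nodetached" partdesc was built while the
	 * detach appeared committed, so reuse it only while the recorded
	 * xmin still appears committed to the active snapshot.
	 */
	if (omit_detached &&
		rel->rd_partdesc_nodetached != NULL &&
		TransactionIdIsValid(rel->rd_partdesc_nodetached_xmin))
	{
		Snapshot	snap = GetActiveSnapshot();

		if (!XidInMVCCSnapshot(rel->rd_partdesc_nodetached_xmin, snap))
			return rel->rd_partdesc_nodetached;	/* safe to reuse */
	}
	/* otherwise fall through and rebuild the partdesc */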

Rather, a point that bothers me a bit is that we're including a
detached partition in the partdesc labeled "nodetached" in this
particular case.  Maybe we should avoid that by considering in this
scenario that no detached partitions exist for this transaction, and
so initializing rd_partdesc instead of rd_partdesc_nodetached.  That
will let us avoid the situations where the xmin is left in an invalid
state.  Maybe like the attached (it also fixes a couple of
typos/thinkos in the previous commit).
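
To illustrate the proposal (a sketch with hypothetical names such as
"detached_visible", not the attached patch itself): when the active
snapshot does not see the detach as committed, the descriptor we just
built contains all partitions anyway, so it can be cached as the
regular one:

	if (omit_detached && detached_exist && detached_visible)
	{
		/* a partition was really omitted; remember how to validate it */
		rel->rd_partdesc_nodetached = partdesc;
		rel->rd_partdesc_nodetached_xmin = detached_xmin;
	}
	else
	{
		/* complete descriptor; no xmin bookkeeping needed */
		rel->rd_partdesc = partdesc;
	}

That way the stored rd_partdesc_nodetached_xmin is always valid.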

Note that we still end up in the same situation as before where each
query in the serializable transaction that sees the detach as
in-progress has to rebuild the partition descriptor omitting the
detached partitions, even when it's clear that the detach-in-progress
partition will be included every time.

--
Amit Langote
EDB: http://www.enterprisedb.com


partdesc_nodetached-only-if-detached-visible.patch
Description: Binary data


Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-28 Thread Álvaro Herrera
Pushed that now, with a one-line addition to the docs noting that only
one partition can be marked detached.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"That sort of implies that there are Emacs keystrokes which aren't obscure.
I've been using it daily for 2 years now and have yet to discover any key
sequence which makes any sense."(Paul Thomas)




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-28 Thread Álvaro Herrera
Thanks for re-reviewing! This one I hope is the last version.

On Wed, Apr 28, 2021, at 10:21 AM, Amit Langote wrote:
> I noticed that rd_partdesc_nodetached_xmin can sometimes end up with
> value 0. You seem to be aware of that already, because otherwise you
> wouldn't have added the TransactionIdIsValid(...) check to the condition
> in RelationGetPartitionDesc(); but the comments nearby don't mention why
> such a thing might happen.  Also, I guess it can't be helped that the
> partdesc_nodetached will have to be leaked when the xmin is 0, but
> that shouldn't be as problematic as the case we discussed earlier.

The only case I am aware of where that can happen is if the pg_inherits tuple is 
frozen. (That's exactly what the affected test case was testing, note the 
"VACUUM FREEZE pg_inherits" there).  So that test case blew up immediately; but 
I think the real-world chances that people are going to be doing that are 
pretty low, so I'm not really concerned about the leak.

> Would it be a bit more readable to just duplicate this stanza in the
> blocks that assign to rd_partdesc_nodetached and rd_partdesc,
> respectively?  That's not much code to duplicate and it'd be easier to
> see which context is for which partdesc.

Sure .. that's how I first wrote this code.  We don't use that style much, so 
I'm OK with backing out of it.

> +   TransactionId rd_partdesc_nodetached_xmin;  /* xmin for the above */
> 
> Could you please expand this description a bit?

Done.

>From 8b1489a17cb5cfcd30d4dfd18020c64e606c5042 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 27 Apr 2021 19:04:37 -0400
Subject: [PATCH v5] Allow a partdesc-omitting-partitions to be cached

Makes partition descriptor acquisition faster during the transient
period in which a partition is in the process of being detached.

(This incidentally fixes a bug in 8aba9322511 whereby a memory context
holding a transient partdesc is reparented to NULL, leading to permanent
leak of that memory.  Reported by Amit Langote.)

Per gripe from Amit Langote
Discussion: https://postgr.es/m/ca+hiwqfgpp1lxjzobygt9rpvtjxxkg5qg2+xch2z1q7krqz...@mail.gmail.com
---
 src/backend/catalog/pg_inherits.c |  52 -
 src/backend/commands/tablecmds.c  |  66 ++-
 src/backend/commands/trigger.c|   3 +-
 src/backend/partitioning/partdesc.c   | 101 +++-
 src/backend/utils/cache/relcache.c|  33 +-
 src/include/catalog/pg_inherits.h |   6 +-
 src/include/utils/rel.h   |  15 +++
 .../detach-partition-concurrently-3.out   | 110 +-
 .../detach-partition-concurrently-3.spec  |  14 ++-
 9 files changed, 327 insertions(+), 73 deletions(-)

diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 6447b52854..c373faf2d6 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -52,6 +52,22 @@ typedef struct SeenRelsEntry
  * then no locks are acquired, but caller must beware of race conditions
  * against possible DROPs of child relations.
  *
+ * Partitions marked as being detached are omitted; see
+ * find_inheritance_children_extended for details.
+ */
+List *
+find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
+{
+	return find_inheritance_children_extended(parentrelId, true, lockmode,
+			  NULL, NULL);
+}
+
+/*
+ * find_inheritance_children_extended
+ *
+ * As find_inheritance_children, with more options regarding detached
+ * partitions.
+ *
  * If a partition's pg_inherits row is marked "detach pending",
  * *detached_exist (if not null) is set true.
  *
@@ -60,11 +76,13 @@ typedef struct SeenRelsEntry
  * marked "detach pending" is visible to that snapshot, then that partition is
  * omitted from the output list.  This makes partitions invisible depending on
  * whether the transaction that marked those partitions as detached appears
- * committed to the active snapshot.
+ * committed to the active snapshot.  In addition, *detached_xmin (if not null)
+ * is set to the xmin of the row of the detached partition.
  */
 List *
-find_inheritance_children(Oid parentrelId, bool omit_detached,
-		  LOCKMODE lockmode, bool *detached_exist)
+find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
+   LOCKMODE lockmode, bool *detached_exist,
+   TransactionId *detached_xmin)
 {
 	List	   *list = NIL;
 	Relation	relation;
@@ -132,7 +150,32 @@ find_inheritance_children(Oid parentrelId, bool omit_detached,
 snap = GetActiveSnapshot();
 
 if (!XidInMVCCSnapshot(xmin, snap))
+{
+	if (detached_xmin)
+	{
+		/*
+		 * Two detached partitions should not occur (see
+		 * checks in MarkInheritDetached), but if they do,
+		 * track the newer of the two.  Make sure to warn the
+		 * user, so that they can clean up.  Since this is
+		 * just a cross-check against potentially corrupt
+		 * catalogs, we don't make it a full-fledged error
+		 * message.
+		 */

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-28 Thread Amit Langote
On Wed, Apr 28, 2021 at 8:32 AM Alvaro Herrera  wrote:
> On 2021-Apr-27, Alvaro Herrera wrote:
>
> > This v3 handles things as you suggested and works correctly AFAICT.  I'm
> > going to add some more test cases to verify the behavior in the
> > scenarios you showed, and get them to run under cache-clobber options to
> > make sure it's good.
>
> Yep, it seems to work.  Strangely, the new isolation case doesn't
> actually crash before the fix -- it merely throws a memory allocation
> error.

Thanks.  Yeah, it does seem to work.

I noticed that rd_partdesc_nodetached_xmin can sometimes end up with
value 0. You seem to be aware of that already, because otherwise you
wouldn't have added the TransactionIdIsValid(...) check to the condition
in RelationGetPartitionDesc(); but the comments nearby don't mention why
such a thing might happen.  Also, I guess it can't be helped that the
partdesc_nodetached will have to be leaked when the xmin is 0, but
that shouldn't be as problematic as the case we discussed earlier.

+   /*
+* But first, a kluge: if there's an old context for this type of
+* descriptor, it contains an old partition descriptor that may still be
+* referenced somewhere.  Preserve it, while not leaking it, by
+* reattaching it as a child context of the new one.  Eventually it will
+* get dropped by either RelationClose or RelationClearRelation.
+*
+* We keep the regular partdesc in rd_pdcxt, and the partdesc-excluding-
+* detached-partitions in rd_pddcxt.
+*/
+   context = is_omit ? &rel->rd_pddcxt : &rel->rd_pdcxt;
+   if (*context != NULL)
+   MemoryContextSetParent(*context, new_pdcxt);
+   *context = new_pdcxt;

Would it be a bit more readable to just duplicate this stanza in the
blocks that assign to rd_partdesc_nodetached and rd_partdesc,
respectively?  That's not much code to duplicate and it'd be easier to
see which context is for which partdesc.
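
Something like this is what I mean (sketch only, reusing the patch's
field names):

	if (is_omit)
	{
		if (rel->rd_pddcxt != NULL)
			MemoryContextSetParent(rel->rd_pddcxt, new_pdcxt);
		rel->rd_pddcxt = new_pdcxt;		/* partdesc excluding detached */
	}
	else
	{
		if (rel->rd_pdcxt != NULL)
			MemoryContextSetParent(rel->rd_pdcxt, new_pdcxt);
		rel->rd_pdcxt = new_pdcxt;		/* regular partdesc */
	}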

+   TransactionId rd_partdesc_nodetached_xmin;  /* xmin for the above */

Could you please expand this description a bit?

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-27 Thread Alvaro Herrera
On 2021-Apr-27, Alvaro Herrera wrote:

> This v3 handles things as you suggested and works correctly AFAICT.  I'm
> going to add some more test cases to verify the behavior in the
> scenarios you showed, and get them to run under cache-clobber options to
> make sure it's good.

Yep, it seems to work.  Strangely, the new isolation case doesn't
actually crash before the fix -- it merely throws a memory allocation
error.

-- 
Álvaro Herrera   Valdivia, Chile
"Linux transformó mi computadora, de una `máquina para hacer cosas',
en un aparato realmente entretenido, sobre el cual cada día aprendo
algo nuevo" (Jaime Salinas)
>From 8e41eb137b835f614ddeeb7c79e70d7e0740f522 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 27 Apr 2021 19:04:37 -0400
Subject: [PATCH v4] Allow a partdesc-omitting-partitions to be cached

Makes partition descriptor acquisition faster during the transient
period in which a partition is in the process of being detached.

(This incidentally fixes an alleged bug in 8aba9322511 whereby a
memory context holding a transient partdesc is reparented to NULL ...
leading to permanent leak of that memory.  Reported by Amit Langote.)

Per gripe from Amit Langote
Discussion: https://postgr.es/m/ca+hiwqfgpp1lxjzobygt9rpvtjxxkg5qg2+xch2z1q7krqz...@mail.gmail.com
---
 src/backend/catalog/pg_inherits.c |  52 -
 src/backend/commands/tablecmds.c  |  66 ++-
 src/backend/commands/trigger.c|   3 +-
 src/backend/partitioning/partdesc.c   | 110 --
 src/backend/utils/cache/relcache.c|  33 +-
 src/include/catalog/pg_inherits.h |   6 +-
 src/include/utils/rel.h   |   5 +
 .../detach-partition-concurrently-3.out   | 110 +-
 .../detach-partition-concurrently-3.spec  |  14 ++-
 9 files changed, 320 insertions(+), 79 deletions(-)

diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 6447b52854..c373faf2d6 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -52,6 +52,22 @@ typedef struct SeenRelsEntry
  * then no locks are acquired, but caller must beware of race conditions
  * against possible DROPs of child relations.
  *
+ * Partitions marked as being detached are omitted; see
+ * find_inheritance_children_extended for details.
+ */
+List *
+find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
+{
+	return find_inheritance_children_extended(parentrelId, true, lockmode,
+			  NULL, NULL);
+}
+
+/*
+ * find_inheritance_children_extended
+ *
+ * As find_inheritance_children, with more options regarding detached
+ * partitions.
+ *
  * If a partition's pg_inherits row is marked "detach pending",
  * *detached_exist (if not null) is set true.
  *
@@ -60,11 +76,13 @@ typedef struct SeenRelsEntry
  * marked "detach pending" is visible to that snapshot, then that partition is
  * omitted from the output list.  This makes partitions invisible depending on
  * whether the transaction that marked those partitions as detached appears
- * committed to the active snapshot.
+ * committed to the active snapshot.  In addition, *detached_xmin (if not null)
+ * is set to the xmin of the row of the detached partition.
  */
 List *
-find_inheritance_children(Oid parentrelId, bool omit_detached,
-		  LOCKMODE lockmode, bool *detached_exist)
+find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
+   LOCKMODE lockmode, bool *detached_exist,
+   TransactionId *detached_xmin)
 {
 	List	   *list = NIL;
 	Relation	relation;
@@ -132,7 +150,32 @@ find_inheritance_children(Oid parentrelId, bool omit_detached,
 snap = GetActiveSnapshot();
 
 if (!XidInMVCCSnapshot(xmin, snap))
+{
+	if (detached_xmin)
+	{
+		/*
+		 * Two detached partitions should not occur (see
+		 * checks in MarkInheritDetached), but if they do,
+		 * track the newer of the two.  Make sure to warn the
+		 * user, so that they can clean up.  Since this is
+		 * just a cross-check against potentially corrupt
+		 * catalogs, we don't make it a full-fledged error
+		 * message.
+		 */
+		if (*detached_xmin != InvalidTransactionId)
+		{
+			elog(WARNING, "more than one partition pending detach found for table with OID %u",
+ parentrelId);
+			if (TransactionIdFollows(xmin, *detached_xmin))
+*detached_xmin = xmin;
+		}
+		else
+			*detached_xmin = xmin;
+	}
+
+	/* Don't add the partition to the output list */
 	continue;
+}
 			}
 		}
 
@@ -247,8 +290,7 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
 		ListCell   *lc;
 
 		/* Get the direct children of this rel */
-		currentchildren = find_inheritance_children(currentrel, true,
-	lockmode, NULL);
+		currentchildren = find_inheritance_children(currentrel, lockmode);
 
 		/*
 		 * Add to the 

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-27 Thread Alvaro Herrera
This v3 handles things as you suggested and works correctly AFAICT.  I'm
going to add some more test cases to verify the behavior in the
scenarios you showed, and get them to run under cache-clobber options to
make sure it's good.

Thanks!

-- 
Álvaro Herrera   Valdivia, Chile
>From 145ed63c4338d7c7804329e736019a00c97f1a7a Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 26 Apr 2021 14:53:04 -0400
Subject: [PATCH v3] Allow a partdesc-omitting-partitions to be cached

Makes partition descriptor acquisition faster during the transient
period in which a partition is in the process of being detached.

Per gripe from Amit Langote
Discussion: https://postgr.es/m/ca+hiwqfgpp1lxjzobygt9rpvtjxxkg5qg2+xch2z1q7krqz...@mail.gmail.com
---
 src/backend/catalog/pg_inherits.c |  52 -
 src/backend/commands/tablecmds.c  |  66 +++-
 src/backend/commands/trigger.c|   3 +-
 src/backend/partitioning/partdesc.c   | 101 --
 src/backend/utils/cache/relcache.c|  33 +-
 src/include/catalog/pg_inherits.h |   6 +-
 src/include/utils/rel.h   |   5 +
 .../detach-partition-concurrently-3.out   |  57 +-
 .../detach-partition-concurrently-3.spec  |   9 +-
 9 files changed, 254 insertions(+), 78 deletions(-)

diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 6447b52854..c373faf2d6 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -52,6 +52,22 @@ typedef struct SeenRelsEntry
  * then no locks are acquired, but caller must beware of race conditions
  * against possible DROPs of child relations.
  *
+ * Partitions marked as being detached are omitted; see
+ * find_inheritance_children_extended for details.
+ */
+List *
+find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
+{
+	return find_inheritance_children_extended(parentrelId, true, lockmode,
+			  NULL, NULL);
+}
+
+/*
+ * find_inheritance_children_extended
+ *
+ * As find_inheritance_children, with more options regarding detached
+ * partitions.
+ *
  * If a partition's pg_inherits row is marked "detach pending",
  * *detached_exist (if not null) is set true.
  *
@@ -60,11 +76,13 @@ typedef struct SeenRelsEntry
  * marked "detach pending" is visible to that snapshot, then that partition is
  * omitted from the output list.  This makes partitions invisible depending on
  * whether the transaction that marked those partitions as detached appears
- * committed to the active snapshot.
+ * committed to the active snapshot.  In addition, *detached_xmin (if not null)
+ * is set to the xmin of the row of the detached partition.
  */
 List *
-find_inheritance_children(Oid parentrelId, bool omit_detached,
-		  LOCKMODE lockmode, bool *detached_exist)
+find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
+   LOCKMODE lockmode, bool *detached_exist,
+   TransactionId *detached_xmin)
 {
 	List	   *list = NIL;
 	Relation	relation;
@@ -132,7 +150,32 @@ find_inheritance_children(Oid parentrelId, bool omit_detached,
 snap = GetActiveSnapshot();
 
 if (!XidInMVCCSnapshot(xmin, snap))
+{
+	if (detached_xmin)
+	{
+		/*
+		 * Two detached partitions should not occur (see
+		 * checks in MarkInheritDetached), but if they do,
+		 * track the newer of the two.  Make sure to warn the
+		 * user, so that they can clean up.  Since this is
+		 * just a cross-check against potentially corrupt
+		 * catalogs, we don't make it a full-fledged error
+		 * message.
+		 */
+		if (*detached_xmin != InvalidTransactionId)
+		{
+			elog(WARNING, "more than one partition pending detach found for table with OID %u",
+ parentrelId);
+			if (TransactionIdFollows(xmin, *detached_xmin))
+*detached_xmin = xmin;
+		}
+		else
+			*detached_xmin = xmin;
+	}
+
+	/* Don't add the partition to the output list */
 	continue;
+}
 			}
 		}
 
@@ -247,8 +290,7 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
 		ListCell   *lc;
 
 		/* Get the direct children of this rel */
-		currentchildren = find_inheritance_children(currentrel, true,
-	lockmode, NULL);
+		currentchildren = find_inheritance_children(currentrel, lockmode);
 
 		/*
 		 * Add to the queue only those children not already seen. This avoids
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8e717ada28..d9ba87a2a3 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3492,7 +3492,7 @@ renameatt_internal(Oid myrelid,
 		 * expected_parents will only be 0 if we are not already recursing.
 		 */
 		if (expected_parents == 0 &&
-			find_inheritance_children(myrelid, true, NoLock, NULL) != NIL)
+			find_inheritance_children(myrelid, NoLock) != NIL)
 			ereport(ERROR,
 	

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-27 Thread Alvaro Herrera
On 2021-Apr-27, Amit Langote wrote:

> On Tue, Apr 27, 2021 at 4:34 PM Amit Langote  wrote:

> I think we may need a separate context for partdesc_nodetached, likely
> with the same kludges as rd_pdcxt.  Maybe the first problem will go
> away with that as well.

Ooh, seems I completely misunderstood what RelationClose was doing.  I
thought it was deleting the whole rd_pdcxt, *including* the "current"
partdesc.  But that's not at all what it does: it only deletes the
*children* memcontexts, so the partdesc that is currently valid remains
valid.  I agree that your proposed fix appears to be promising, in that
a separate "context tree" rd_pddcxt (?) can be used for this.  I'll try
it out now.
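
That is, roughly this (a from-memory sketch of the relcache logic, not
a verbatim quote):

	/*
	 * At refcount zero, delete only the child contexts -- the old,
	 * preserved partdescs.  rd_pdcxt itself, which holds the current
	 * partdesc, survives.
	 */
	if (RelationHasReferenceCountZero(relation) &&
		relation->rd_pdcxt != NULL)
		MemoryContextDeleteChildren(relation->rd_pdcxt);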

> Few other minor things I noticed:
> 
> +* Store it into relcache.  For snapshots built excluding detached
> +* partitions, which we save separately, we also record the
> +* pg_inherits.xmin of the detached partition that was omitted; this
> +* informs a future potential user of such a cached snapshot.
> 
> The "snapshot" in the 1st and the last sentence should be "partdesc"?

Doh, yeah.

> + * We keep two partdescs in relcache: rd_partdesc_nodetached excludes
> + * partitions marked concurrently being detached, while rd_partdesc includes
> + * them.
> 
> IMHO, describing rd_partdesc first in the sentence would be better.
> Like: rd_partdesc includes all partitions including any that are being
> concurrently detached, while rd_partdesc_nodetached excludes them.

Makes sense.

-- 
Álvaro Herrera   Valdivia, Chile




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-27 Thread Amit Langote
On Tue, Apr 27, 2021 at 4:34 PM Amit Langote  wrote:
> Thanks for the updated patch.  I've been reading it, but I noticed a
> bug in 8aba9322511f, which I thought you'd want to know to make a note
> of when committing this one.
>
> So we decided in 8aba9322511f that it is okay to make the memory
> context in which a transient partdesc is allocated a child of
> PortalContext so that it disappears when the portal does.  But
> PortalContext is apparently NULL when the planner runs, at least in
> the "simple" query protocol, so any transient partdescs built by the
> planner would effectively leak.  Not good.
>
> With this patch, partdesc_nodetached is no longer transient, so the
> problem doesn't exist.
>
> I will write more about the updated patch in a bit...

The first thing that struck me about partdesc_nodetached is that it's
not handled in RelationClearRelation(), where we (re)set a regular
partdesc to NULL so that the next RelationGetPartitionDesc() has to
build it from scratch.  I think partdesc_nodetached and the xmin
should likewise be reset.  Also in load_relcache_init_file(), although
nothing serious there.
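
Concretely, something like this in RelationClearRelation() (an untested
sketch using the patch's field names):

	/*
	 * Reset the "nodetached" partdesc and its xmin alongside the
	 * regular one, so a rebuilt entry can't hand back a stale copy.
	 */
	relation->rd_partdesc = NULL;
	relation->rd_partdesc_nodetached = NULL;
	relation->rd_partdesc_nodetached_xmin = InvalidTransactionId;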

That said, I think I may have found a couple of practical problems
with partdesc_nodetached, or more precisely with having it
side-by-side with the regular partdesc.  Maybe they can be fixed, so the
problems are not deal breakers as such for the patch's main idea.  The
problems can be seen when different queries in a serializable
transaction have to use both the regular partdesc and
partdesc_nodetached in a given relcache.  For example, try the following
after first creating a situation where the table p has a
detach-pending partition p2 (for values in (2)) and a live partition p1
(for values in (1)).

begin isolation level serializable;
insert into p values (1);
select * from p where a = 1;
insert into p values (1);

The 1st insert succeeds but the 2nd fails with:

ERROR:  no partition of relation "p" found for row
DETAIL:  Partition key of the failing row contains (a) = (1).

I haven't analyzed this error very closely but there is another
situation which causes a crash due to what appears to be a conflict
with rd_pdcxt's design:

-- run in a new session
begin isolation level serializable;
select * from p where a = 1;
insert into p values (1);
select * from p where a = 1;

The 2nd select crashes:

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The crash occurs because the planner gets handed a stale copy of
partdesc_nodetached for the 2nd select.  It gets stale because the
context it's allocated in is made a child of rd_pdcxt, which is in
turn reassigned to the context of the regular partdesc when that is
built for the insert query.  Any child contexts of rd_pdcxt are deleted
as soon as the Relation's refcount goes to zero, taking
partdesc_nodetached with them.  Note it is this code in
RelationBuildPartitionDesc():

/*
 * But first, a kluge: if there's an old rd_pdcxt, it contains an old
 * partition descriptor that may still be referenced somewhere.
 * Preserve it, while not leaking it, by reattaching it as a child
 * context of the new rd_pdcxt.  Eventually it will get dropped by
 * either RelationClose or RelationClearRelation.
 */
if (rel->rd_pdcxt != NULL)
MemoryContextSetParent(rel->rd_pdcxt, new_pdcxt);
rel->rd_pdcxt = new_pdcxt;

I think we may need a separate context for partdesc_nodetached, likely
with the same kludges as rd_pdcxt.  Maybe the first problem will go
away with that as well.

Few other minor things I noticed:

+* Store it into relcache.  For snapshots built excluding detached
+* partitions, which we save separately, we also record the
+* pg_inherits.xmin of the detached partition that was omitted; this
+* informs a future potential user of such a cached snapshot.

The "snapshot" in the 1st and the last sentence should be "partdesc"?

+ * We keep two partdescs in relcache: rd_partdesc_nodetached excludes
+ * partitions marked concurrently being detached, while rd_partdesc includes
+ * them.

IMHO, describing rd_partdesc first in the sentence would be better.
Like: rd_partdesc includes all partitions including any that are being
concurrently detached, while rd_partdesc_nodetached excludes them.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-27 Thread Amit Langote
Thanks for the updated patch.  I've been reading it, but I noticed a
bug in 8aba9322511f, which I thought you'd want to know to make a note
of when committing this one.

So we decided in 8aba9322511f that it is okay to make the memory
context in which a transient partdesc is allocated a child of
PortalContext so that it disappears when the portal does.  But
PortalContext is apparently NULL when the planner runs, at least in
the "simple" query protocol, so any transient partdescs built by the
planner would effectively leak.  Not good.
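
To illustrate the hazard (a sketch, not the actual 8aba9322511 code):

	/*
	 * If no portal is active -- e.g. while the planner runs under the
	 * simple query protocol -- PortalContext is NULL, and reparenting
	 * to it turns the transient context into a root context that
	 * nothing ever deletes, so the partdesc built in it leaks.
	 */
	MemoryContextSetParent(transient_pdcxt, PortalContext);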

With this patch, partdesc_nodetached is no longer transient, so the
problem doesn't exist.

I will write more about the updated patch in a bit...

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-26 Thread Alvaro Herrera
Sorry, I forgot to update some comments in that version.  Fixed here.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
>From cb6d9e026624656e826ea880716ee552b15203a8 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 26 Apr 2021 14:53:04 -0400
Subject: [PATCH v2] Allow a partdesc-omitting-partitions to be cached

Makes partition descriptor acquisition faster during the transient
period in which a partition is in the process of being detached.

Per gripe from Amit Langote
Discussion: https://postgr.es/m/ca+hiwqfgpp1lxjzobygt9rpvtjxxkg5qg2+xch2z1q7krqz...@mail.gmail.com
---
 src/backend/catalog/pg_inherits.c | 52 +-
 src/backend/commands/tablecmds.c  | 66 +++--
 src/backend/commands/trigger.c|  3 +-
 src/backend/partitioning/partdesc.c   | 96 ---
 src/include/catalog/pg_inherits.h |  6 +-
 src/include/utils/rel.h   |  3 +
 .../detach-partition-concurrently-3.out   | 57 ++-
 .../detach-partition-concurrently-3.spec  |  9 +-
 8 files changed, 219 insertions(+), 73 deletions(-)

diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 6447b52854..c373faf2d6 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -52,6 +52,22 @@ typedef struct SeenRelsEntry
  * then no locks are acquired, but caller must beware of race conditions
  * against possible DROPs of child relations.
  *
+ * Partitions marked as being detached are omitted; see
+ * find_inheritance_children_extended for details.
+ */
+List *
+find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
+{
+	return find_inheritance_children_extended(parentrelId, true, lockmode,
+			  NULL, NULL);
+}
+
+/*
+ * find_inheritance_children_extended
+ *
+ * As find_inheritance_children, with more options regarding detached
+ * partitions.
+ *
  * If a partition's pg_inherits row is marked "detach pending",
  * *detached_exist (if not null) is set true.
  *
@@ -60,11 +76,13 @@ typedef struct SeenRelsEntry
  * marked "detach pending" is visible to that snapshot, then that partition is
  * omitted from the output list.  This makes partitions invisible depending on
  * whether the transaction that marked those partitions as detached appears
- * committed to the active snapshot.
+ * committed to the active snapshot.  In addition, *detached_xmin (if not null)
+ * is set to the xmin of the row of the detached partition.
  */
 List *
-find_inheritance_children(Oid parentrelId, bool omit_detached,
-		  LOCKMODE lockmode, bool *detached_exist)
+find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
+   LOCKMODE lockmode, bool *detached_exist,
+   TransactionId *detached_xmin)
 {
 	List	   *list = NIL;
 	Relation	relation;
@@ -132,7 +150,32 @@ find_inheritance_children(Oid parentrelId, bool omit_detached,
 snap = GetActiveSnapshot();
 
 if (!XidInMVCCSnapshot(xmin, snap))
+{
+	if (detached_xmin)
+	{
+		/*
+		 * Two detached partitions should not occur (see
+		 * checks in MarkInheritDetached), but if they do,
+		 * track the newer of the two.  Make sure to warn the
+		 * user, so that they can clean up.  Since this is
+		 * just a cross-check against potentially corrupt
+		 * catalogs, we don't make it a full-fledged error
+		 * message.
+		 */
+		if (*detached_xmin != InvalidTransactionId)
+		{
+			elog(WARNING, "more than one partition pending detach found for table with OID %u",
+ parentrelId);
+			if (TransactionIdFollows(xmin, *detached_xmin))
+*detached_xmin = xmin;
+		}
+		else
+			*detached_xmin = xmin;
+	}
+
+	/* Don't add the partition to the output list */
 	continue;
+}
 			}
 		}
 
@@ -247,8 +290,7 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
 		ListCell   *lc;
 
 		/* Get the direct children of this rel */
-		currentchildren = find_inheritance_children(currentrel, true,
-	lockmode, NULL);
+		currentchildren = find_inheritance_children(currentrel, lockmode);
 
 		/*
 		 * Add to the queue only those children not already seen. This avoids
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7d00f4eb25..ab3c61898c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3507,7 +3507,7 @@ renameatt_internal(Oid myrelid,
 		 * expected_parents will only be 0 if we are not already recursing.
 		 */
 		if (expected_parents == 0 &&
-			find_inheritance_children(myrelid, true, NoLock, NULL) != NIL)
+			find_inheritance_children(myrelid, NoLock) != NIL)
 			ereport(ERROR,
 	(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
 	 errmsg("inherited column \"%s\" must be renamed in child tables too",
@@ -3706,7 +3706,7 @@ rename_constraint_internal(Oid myrelid,
 		else
 		{
 			if 

Re: tab-complete for ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-26 Thread Alvaro Herrera
On 2021-Apr-26, Justin Pryzby wrote:

> On Thu, Apr 22, 2021 at 04:40:35PM -0400, Alvaro Herrera wrote:
> > Would anyone oppose me pushing this for tab-completing the new keywords
> > of ALTER TABLE ..  DETACH PARTITION?
> 
> +1 to apply tab completion for v14

Pushed.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"Before you were born your parents weren't as boring as they are now. They
got that way paying your bills, cleaning up your room and listening to you
tell them how idealistic you are."  -- Charles J. Sykes' advice to teenagers




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-26 Thread Alvaro Herrera
On 2021-Apr-26, Alvaro Herrera wrote:

> > Please allow me to study the patch a bit more closely and get back tomorrow.
> 
> Sure, thanks!

Here's a more polished version.

After trying the version with the elog(ERROR) when two detached
partitions are present, I decided against it; it is unhelpful because
it doesn't let you build partition descriptors for anything.  So I made
it an elog(WARNING) (not an ereport, note), keeping the most recent
pg_inherits.xmin value.  This is not great, but it lets you out of the
situation by finalizing one of the detaches.

The other check (at ALTER TABLE .. DETACH time) is an ereport(ERROR) and
should make the first one unreachable.
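
Roughly like this, early in the DETACH CONCURRENTLY path (a sketch; the
committed check and its wording may differ):

	/* refuse to start a second concurrent detach */
	if (detached_exist)
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				 errmsg("cannot detach partition: another partition is already pending detach")));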

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
>From 89e8a7d170fc0c6701e536b9c5830e3a58f303cc Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 26 Apr 2021 14:53:04 -0400
Subject: [PATCH] Allow a partdesc-with-omitted-partitions to be cached

Makes partdesc acquisition faster during the transient period during
which a partition is in the process of being dropped.

Per gripe from Amit Langote
Discussion: https://postgr.es/m/ca+hiwqfgpp1lxjzobygt9rpvtjxxkg5qg2+xch2z1q7krqz...@mail.gmail.com
---
 src/backend/catalog/pg_inherits.c | 52 +++--
 src/backend/commands/tablecmds.c  | 66 +---
 src/backend/commands/trigger.c|  3 +-
 src/backend/partitioning/partdesc.c   | 75 ---
 src/include/catalog/pg_inherits.h |  6 +-
 src/include/utils/rel.h   |  3 +
 .../detach-partition-concurrently-3.out   | 57 +-
 .../detach-partition-concurrently-3.spec  |  9 ++-
 8 files changed, 205 insertions(+), 66 deletions(-)

diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 6447b52854..c373faf2d6 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -52,6 +52,22 @@ typedef struct SeenRelsEntry
  * then no locks are acquired, but caller must beware of race conditions
  * against possible DROPs of child relations.
  *
+ * Partitions marked as being detached are omitted; see
+ * find_inheritance_children_extended for details.
+ */
+List *
+find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
+{
+	return find_inheritance_children_extended(parentrelId, true, lockmode,
+			  NULL, NULL);
+}
+
+/*
+ * find_inheritance_children_extended
+ *
+ * As find_inheritance_children, with more options regarding detached
+ * partitions.
+ *
  * If a partition's pg_inherits row is marked "detach pending",
  * *detached_exist (if not null) is set true.
  *
@@ -60,11 +76,13 @@ typedef struct SeenRelsEntry
  * marked "detach pending" is visible to that snapshot, then that partition is
  * omitted from the output list.  This makes partitions invisible depending on
  * whether the transaction that marked those partitions as detached appears
- * committed to the active snapshot.
+ * committed to the active snapshot.  In addition, *detached_xmin (if not null)
+ * is set to the xmin of the row of the detached partition.
  */
 List *
-find_inheritance_children(Oid parentrelId, bool omit_detached,
-		  LOCKMODE lockmode, bool *detached_exist)
+find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
+   LOCKMODE lockmode, bool *detached_exist,
+   TransactionId *detached_xmin)
 {
 	List	   *list = NIL;
 	Relation	relation;
@@ -132,7 +150,32 @@ find_inheritance_children(Oid parentrelId, bool omit_detached,
 snap = GetActiveSnapshot();
 
 if (!XidInMVCCSnapshot(xmin, snap))
+{
+	if (detached_xmin)
+	{
+		/*
+		 * Two detached partitions should not occur (see
+		 * checks in MarkInheritDetached), but if they do,
+		 * track the newer of the two.  Make sure to warn the
+		 * user, so that they can clean up.  Since this is
+		 * just a cross-check against potentially corrupt
+		 * catalogs, we don't make it a full-fledged error
+		 * message.
+		 */
+		if (*detached_xmin != InvalidTransactionId)
+		{
+			elog(WARNING, "more than one partition pending detach found for table with OID %u",
+ parentrelId);
+			if (TransactionIdFollows(xmin, *detached_xmin))
+*detached_xmin = xmin;
+		}
+		else
+			*detached_xmin = xmin;
+	}
+
+	/* Don't add the partition to the output list */
 	continue;
+}
 			}
 		}
 
@@ -247,8 +290,7 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
 		ListCell   *lc;
 
 		/* Get the direct children of this rel */
-		currentchildren = find_inheritance_children(currentrel, true,
-	lockmode, NULL);
+		currentchildren = find_inheritance_children(currentrel, lockmode);
 
 		/*
 		 * Add to the queue only those children not already seen. This avoids
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 

Re: tab-complete for ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-26 Thread Justin Pryzby
On Thu, Apr 22, 2021 at 04:40:35PM -0400, Alvaro Herrera wrote:
> Would anyone oppose me pushing this for tab-completing the new keywords
> of ALTER TABLE ..  DETACH PARTITION?

+1 to apply tab completion for v14

-- 
Justin




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-26 Thread Alvaro Herrera
Hello Amit,

On 2021-Apr-26, Amit Langote wrote:

> On Sat, Apr 24, 2021 at 8:31 AM Alvaro Herrera  
> wrote:

> > I haven't added a mechanism to verify this; but with asserts on, this
> > patch will crash if you have more than one.  I think the behavior is not
> > necessarily sane with asserts off, since you'll get an arbitrary
> > detach-Xmin assigned to the partdesc, depending on catalog scan order.
> 
> Maybe this is an ignorant question but is the plan to add an elog() in
> this code path or a check (and an ereport()) somewhere in
> ATExecDetachPartition() to prevent more than one partition ending up
> in detach-pending state?

Yeah, that's what I'm planning to do.

> Please allow me to study the patch a bit more closely and get back tomorrow.

Sure, thanks!

-- 
Álvaro Herrera   Valdivia, Chile
"But static content is just dynamic content that isn't moving!"
http://smylers.hates-software.com/2007/08/15/fe244d0c.html




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-26 Thread Amit Langote
Hi Alvaro,

On Sat, Apr 24, 2021 at 8:31 AM Alvaro Herrera  wrote:
> On 2021-Apr-23, Alvaro Herrera wrote:
> > I think the patch I posted was too simple.  I think a real fix requires
> > us to keep track of exactly in what way the partdesc is outdated, so
> > that we can compare to the current situation in deciding to use that
> > partdesc or build a new one.  For example, we could keep track of a list
> > of OIDs of detached partitions (and it simplifies things a lot if we allow
> > only a single partition in this situation, because it's either zero OIDs
> > or one OID); or we can save the Xmin of the pg_inherits tuple for the
> > detached partition (and we can compare that Xmin to our current active
> > snapshot and decide whether to use that partdesc or not).
> >
> > I'll experiment with this a little more and propose a patch later today.
>
> This (POC-quality) seems to do the trick.

Thanks for the patch.

> (I restored the API of find_inheritance_children, because it was getting
> a little obnoxious.  I haven't thought this through but I think we
> should do something like it.)

+1.

> > I don't think it's too much of a problem to state that you need to
> > finalize one detach before you can do the next one.  After all, with
> > regular detach, you'd have the tables exclusively locked so you can't do
> > them in parallel anyway.  (It also increases the chances that people
> > will finalize detach operations that went unnoticed.)

That sounds reasonable.

> I haven't added a mechanism to verify this; but with asserts on, this
> patch will crash if you have more than one.  I think the behavior is not
> necessarily sane with asserts off, since you'll get an arbitrary
> detach-Xmin assigned to the partdesc, depending on catalog scan order.

Maybe this is an ignorant question but is the plan to add an elog() in
this code path or a check (and an ereport()) somewhere in
ATExecDetachPartition() to prevent more than one partition ending up
in detach-pending state?

Please allow me to study the patch a bit more closely and get back tomorrow.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-23 Thread Alvaro Herrera
On 2021-Apr-23, Alvaro Herrera wrote:

> I think the patch I posted was too simple.  I think a real fix requires
> us to keep track of exactly in what way the partdesc is outdated, so
> that we can compare to the current situation in deciding to use that
> partdesc or build a new one.  For example, we could keep track of a list
> of OIDs of detached partitions (and it simplifies things a lot if allow
> only a single partition in this situation, because it's either zero OIDs
> or one OID); or we can save the Xmin of the pg_inherits tuple for the
> detached partition (and we can compare that Xmin to our current active
> snapshot and decide whether to use that partdesc or not).
> 
> I'll experiment with this a little more and propose a patch later today.

This (POC-quality) seems to do the trick.

(I restored the API of find_inheritance_children, because it was getting
a little obnoxious.  I haven't thought this through but I think we
should do something like it.)

> I don't think it's too much of a problem to state that you need to
> finalize one detach before you can do the next one.  After all, with
> regular detach, you'd have the tables exclusively locked so you can't do
> them in parallel anyway.  (It also increases the chances that people
> will finalize detach operations that went unnoticed.)

I haven't added a mechanism to verify this; but with asserts on, this
patch will crash if you have more than one.  I think the behavior is not
necessarily sane with asserts off, since you'll get an arbitrary
detach-Xmin assigned to the partdesc, depending on catalog scan order.

-- 
Álvaro Herrera   Valdivia, Chile
"El hombre nunca sabe de lo que es capaz hasta que lo intenta" (C. Dickens)
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 6447b52854..e3ee72c0b7 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -63,8 +63,16 @@ typedef struct SeenRelsEntry
  * committed to the active snapshot.
  */
 List *
-find_inheritance_children(Oid parentrelId, bool omit_detached,
-		  LOCKMODE lockmode, bool *detached_exist)
+find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
+{
+	return find_inheritance_children_extended(parentrelId, true, lockmode,
+			  NULL, NULL);
+}
+
+List *
+find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
+   LOCKMODE lockmode, bool *detached_exist,
+   TransactionId *detached_xmin)
 {
 	List	   *list = NIL;
 	Relation	relation;
@@ -132,7 +140,15 @@ find_inheritance_children(Oid parentrelId, bool omit_detached,
 snap = GetActiveSnapshot();
 
 if (!XidInMVCCSnapshot(xmin, snap))
+{
+	/* XXX only one detached partition allowed */
+	if (detached_xmin)
+	{
+		Assert(*detached_xmin == InvalidTransactionId);
+		*detached_xmin = xmin;
+	}
 	continue;
+}
 			}
 		}
 
@@ -247,8 +263,7 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
 		ListCell   *lc;
 
 		/* Get the direct children of this rel */
-		currentchildren = find_inheritance_children(currentrel, true,
-	lockmode, NULL);
+		currentchildren = find_inheritance_children(currentrel, lockmode);
 
 		/*
 		 * Add to the queue only those children not already seen. This avoids
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7d00f4eb25..dd931d09af 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3507,7 +3507,7 @@ renameatt_internal(Oid myrelid,
 		 * expected_parents will only be 0 if we are not already recursing.
 		 */
 		if (expected_parents == 0 &&
-			find_inheritance_children(myrelid, true, NoLock, NULL) != NIL)
+			find_inheritance_children(myrelid, NoLock) != NIL)
 			ereport(ERROR,
 	(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
 	 errmsg("inherited column \"%s\" must be renamed in child tables too",
@@ -3706,7 +3706,7 @@ rename_constraint_internal(Oid myrelid,
 		else
 		{
 			if (expected_parents == 0 &&
-find_inheritance_children(myrelid, true, NoLock, NULL) != NIL)
+find_inheritance_children(myrelid, NoLock) != NIL)
 ereport(ERROR,
 		(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
 		 errmsg("inherited constraint \"%s\" must be renamed in child tables too",
@@ -6580,7 +6580,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	 */
 	if (colDef->identity &&
 		recurse &&
-		find_inheritance_children(myrelid, true, NoLock, NULL) != NIL)
+		find_inheritance_children(myrelid, NoLock) != NIL)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
  errmsg("cannot recursively add identity column to table that has child tables")));
@@ -6826,7 +6826,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	 * use find_all_inheritors to do it in one pass.
 	 */
 	children =
-		find_inheritance_children(RelationGetRelid(rel), true, lockmode, NULL);
+		

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-23 Thread Alvaro Herrera
On 2021-Apr-23, Amit Langote wrote:

> Back in the 1st session:
> 
> end;
> insert into fk select generate_series(1, 10000);
> INSERT 0 10000
> Time: 47400.792 ms (00:47.401)

I guess I was wrong about that ... the example I tried didn't have 1000s
of partitions, and I used debug print-outs to show when a new partdesc
was being rebuilt, and only six were occurring.  I'm not sure why my
case behaves so differently from yours, but clearly from the timing this
is not good.

> I am afraid we may have to fix this in the code after all, because
> there does not seem a good way to explain this away in the
> documentation.  

Yeah, looking at this case, I agree that it needs a fix of some kind.

> If I read correctly, you did try an approach of caching the
> PartitionDesc that we currently don't, no?

I think the patch I posted was too simple.  I think a real fix requires
us to keep track of exactly in what way the partdesc is outdated, so
that we can compare to the current situation in deciding to use that
partdesc or build a new one.  For example, we could keep track of a list
of OIDs of detached partitions (and it simplifies things a lot if allow
only a single partition in this situation, because it's either zero OIDs
or one OID); or we can save the Xmin of the pg_inherits tuple for the
detached partition (and we can compare that Xmin to our current active
snapshot and decide whether to use that partdesc or not).

I'll experiment with this a little more and propose a patch later today.

I don't think it's too much of a problem to state that you need to
finalize one detach before you can do the next one.  After all, with
regular detach, you'd have the tables exclusively locked so you can't do
them in parallel anyway.  (It also increases the chances that people
will finalize detach operations that went unnoticed.)

-- 
Álvaro Herrera39°49'30"S 73°17'W




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-23 Thread Amit Langote
On Fri, Apr 23, 2021 at 4:26 AM Alvaro Herrera  wrote:
> On 2021-Apr-22, Amit Langote wrote:
> > -* The reason for this check is that we want to avoid seeing the
> > +* The reason for this hack is that we want to avoid seeing the
> >  * partition as alive in RI queries during REPEATABLE READ or
> > 
> > +* SERIALIZABLE transactions.
> >
> > The comment doesn't quite make it clear why it is the RI query case
> > that necessitates this hack in the first case.
>
> I added "such queries use a different snapshot than the one used by
> regular (user) queries."  I hope that's sufficient.

Yeah, that makes sense.

> > Maybe the relation to what's going on with the partdesc

(I had to leave my desk while in the middle of typing this, but I
forget what I was going to add :()

> > BTW, I do feel a bit alarmed by the potential performance impact of
> > this.  If detached_exist of a cached partdesc is true, then RI queries
> > invoked during a bulk DML operation would have to rebuild one for
> > every tuple to be checked, right?  I haven't tried an actual example
> > yet though.
>
> Yeah, I was scared about that too (which is why I insisted on trying to
> add a cached copy of the partdesc omitting detached partitions).  But
> AFAICS what happens is that the plan for the RI query gets cached after
> a few tries; so we do build the partdesc a few times at first, but later
> we use the cached plan and so we no longer use that one.  So at least in
> the normal cases this isn't a serious problem that I can see.

Actually, ri_triggers.c (or really plancache.c) is not very good at
caching the plan when querying partitioned tables; it always chooses
to replan because a generic plan, even with runtime pruning built into
it, looks very expensive compared to a custom one.  Now that's a
problem we will have to fix sooner than later, but until then we have
to work around it.
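
As an aside, the custom-vs-generic plan choice can be observed or
forced through the plan_cache_mode GUC (available since v12).  A
sketch, using the pk_parted table from the example below; whether
forcing a generic plan actually helps the RI path is untested here:

set plan_cache_mode = force_generic_plan;
prepare q(int) as select * from pk_parted where a = $1;
execute q(1);  -- always uses the generic plan, with runtime pruning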

Here is an example that shows the problem:

create unlogged table pk_parted (a int primary key) partition by range (a);
select 'create unlogged table pk_parted_' || i || ' partition of
pk_parted for values from (' || (i-1) * 1000 + 1 || ') to (' ||  i *
1000 + 1 || ');' from generate_series(1, 1000) i;
\gexec
create unlogged table fk (a int references pk_parted);
insert into pk_parted select generate_series(1, 1000000);
begin;
select * from pk_parted where a = 1;

In another session:

alter table pk_parted detach partition pk_parted_1000 concurrently;
-- blocks, waiting for session 1's open transaction

Back in the 1st session:

end;
insert into fk select generate_series(1, 10000);
INSERT 0 10000
Time: 47400.792 ms (00:47.401)

The insert took unusually long because the PartitionDesc for
pk_parted had to be built exactly 10000 times, once per inserted row,
since there's a detach-pending partition lying around.  There is also
a danger of OOM with such an insert, because the memory of every
PartitionDesc thus built leaks into PortalContext, especially with
larger counts of PK partitions and rows inserted into the FK table.

Also, I noticed that all queries on pk_parted, not just the RI
queries, have to build the PartitionDesc every time, so take that much
longer:

-- note the planning time
explain analyze select * from pk_parted where a = 1;
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using pk_parted_1_pkey on pk_parted_1 pk_parted  (cost=0.28..8.29 rows=1 width=4) (actual time=0.016..0.017 rows=1 loops=1)
   Index Cond: (a = 1)
   Heap Fetches: 1
 Planning Time: 7.543 ms
 Execution Time: 0.044 ms
(5 rows)

Finalizing the detach makes the insert and the query finish in normal
time, because the PartitionDesc can be cached again:

alter table pk_parted detach partition pk_parted_1000 finalize;
insert into fk select generate_series(1, 10000);
INSERT 0 10000
Time: 855.336 ms

explain analyze select * from pk_parted where a = 1;
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using pk_parted_1_pkey on pk_parted_1 pk_parted  (cost=0.28..8.29 rows=1 width=4) (actual time=0.033..0.036 rows=1 loops=1)
   Index Cond: (a = 1)
   Heap Fetches: 1
 Planning Time: 0.202 ms
 Execution Time: 0.075 ms
(5 rows)

I am afraid we may have to fix this in the code after all, because
there does not seem a good way to explain this away in the
documentation.  If I read correctly, you did try an approach of
caching the PartitionDesc that we currently don't, no?

-- 
Amit Langote
EDB: http://www.enterprisedb.com




tab-complete for ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-22 Thread Alvaro Herrera
Would anyone oppose me pushing this for tab-completing the new keywords
of
ALTER TABLE ..  DETACH PARTITION?

-- 
Álvaro Herrera   Valdivia, Chile
"Por suerte hoy explotó el califont porque si no me habría muerto
 de aburrido"  (Papelucho)
>From 4ab605c1e1ed87ef92370bc6205a8b786739f774 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 22 Apr 2021 16:37:46 -0400
Subject: [PATCH] tab-complete ALTER TABLE DETACH PARTITION with
 CONCURRENTLY/FINALIZE

---
 src/bin/psql/tab-complete.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index ed84b3789c..7c493b 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2209,6 +2209,8 @@ psql_completion(const char *text, int start, int end)
 		completion_info_charp = prev3_wd;
 		COMPLETE_WITH_QUERY(Query_for_partition_of_table);
 	}
+	else if (Matches("ALTER", "TABLE", MatchAny, "DETACH", "PARTITION", MatchAny))
+		COMPLETE_WITH("CONCURRENTLY", "FINALIZE");
 
 	/* ALTER TABLESPACE  with RENAME TO, OWNER TO, SET, RESET */
 	else if (Matches("ALTER", "TABLESPACE", MatchAny))
-- 
2.20.1
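
With the patch applied, an illustrative psql session (table and
partition names hypothetical):

=# ALTER TABLE measurement DETACH PARTITION measurement_y2020 <TAB>
CONCURRENTLY  FINALIZE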



Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-22 Thread Alvaro Herrera
On 2021-Apr-22, Amit Langote wrote:

> On Thu, Apr 22, 2021 at 5:39 AM Alvaro Herrera  
> wrote:

> Reading through the latest one, seeing both include_detached and
> omit_detached being used in different parts of the code makes it a bit
> hard to keep in mind what a given code path is doing wrt detached
> partitions.  How about making it all omit_detached?

Yeah, I hesitated but wanted to do that too.  Done.

>  * Cope with partitions concurrently being detached.  When we see a
> -* partition marked "detach pending", we only include it in the set of
> -* visible partitions if caller requested all detached partitions, or
> -* if its pg_inherits tuple's xmin is still visible to the active
> -* snapshot.
> +* partition marked "detach pending", we omit it from the returned
> +* descriptor if caller requested that and the tuple's xmin does not
> +* appear in progress to the active snapshot.
> 
> It seems odd for a comment in find_inheritance_children() to talk
> about the "descriptor".   Maybe the earlier "set of visible
> partitions" wording was fine?

Absolutely -- done that way.

> -* The reason for this check is that we want to avoid seeing the
> +* The reason for this hack is that we want to avoid seeing the
>  * partition as alive in RI queries during REPEATABLE READ or
> 
> +* SERIALIZABLE transactions.
> 
> The comment doesn't quite make it clear why it is the RI query case
> that necessitates this hack in the first case.

I added "such queries use a different snapshot than the one used by
regular (user) queries."  I hope that's sufficient.

> Maybe the relation to what's going on with the partdesc
> 
> +   if (likely(rel->rd_partdesc &&
> +  (!rel->rd_partdesc->detached_exist || include_detached)))
> +   return rel->rd_partdesc;
> 
> I think it would help to have a comment about what's going here, in
> addition to the description you already wrote for
> PartitionDescData.detached_exist.  Maybe something along the lines of:
> 
> ===
> Under normal circumstances, we just return the partdesc that was
> already built.  However, if the partdesc was built at a time when
> there existed detach-pending partition(s), which the current caller
> would rather not see (omit_detached), then we build one afresh
> omitting any such partitions and return that one.
> RelationBuildPartitionDesc() makes sure that any such partdescs will
> disappear when the query finishes.
> ===
> 
> That's maybe a bit verbose but I am sure you will find a way to write
> it more succinctly.

I added some text in this spot, and also wrote some more in the comment
above RelationGetPartitionDesc and RelationBuildPartitionDesc.

> BTW, I do feel a bit alarmed by the potential performance impact of
> this.  If detached_exist of a cached partdesc is true, then RI queries
> invoked during a bulk DML operation would have to rebuild one for
> every tuple to be checked, right?  I haven't tried an actual example
> yet though.

Yeah, I was scared about that too (which is why I insisted on trying to
add a cached copy of the partdesc omitting detached partitions).  But
AFAICS what happens is that the plan for the RI query gets cached after
a few tries; so we do build the partdesc a few times at first, but later
we use the cached plan and so we no longer use that one.  So at least in
the normal cases this isn't a serious problem that I can see.

I pushed it now.  Thanks for your help,

-- 
Álvaro Herrera   Valdivia, Chile
"How strange it is to find the words "Perl" and "saner" in such close
proximity, with no apparent sense of irony. I doubt that Larry himself
could have managed it." (ncm, http://lwn.net/Articles/174769/)




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-22 Thread Amit Langote
(Sorry about being away from this for over a week.)

On Thu, Apr 22, 2021 at 5:39 AM Alvaro Herrera  wrote:
> While the approach in the previous email does pass the tests, I think
> (but couldn't find a test case to prove) it does so coincidentally, not
> because it is correct.  If I make the test for "detached exist" use the
> cached omits-partitions-partdesc, it does fail, because we had
> previously cached one that was not yet omitting the partition.  So what
> I said earlier in the thread stands: the set of partitions that are
> considered detached changes depending on what the active snapshot is,
> and therefore we *must not* cache any such descriptor.
>
> So I backtracked to my previous proposal, which saves in relcache only
> the partdesc that includes all partitions.  If any partdesc is built
> that omits partitions being detached, that one must be rebuilt afresh
> each time.  And to avoid potentially saving a lot of single-use
> partdescs in CacheMemoryContext, in the attached second patch (which I
> attach separately only to make it more obvious to review) I store such
> partdescs in PortalContext.
>
> Barring objections, I will get this pushed early tomorrow.

Thanks for updating the patch.  I have mostly cosmetic comments.

Reading through the latest one, seeing both include_detached and
omit_detached being used in different parts of the code makes it a bit
hard to keep in mind what a given code path is doing wrt detached
partitions.  How about making it all omit_detached?

 * Cope with partitions concurrently being detached.  When we see a
-* partition marked "detach pending", we only include it in the set of
-* visible partitions if caller requested all detached partitions, or
-* if its pg_inherits tuple's xmin is still visible to the active
-* snapshot.
+* partition marked "detach pending", we omit it from the returned
+* descriptor if caller requested that and the tuple's xmin does not
+* appear in progress to the active snapshot.

It seems odd for a comment in find_inheritance_children() to talk
about the "descriptor".   Maybe the earlier "set of visible
partitions" wording was fine?

-* The reason for this check is that we want to avoid seeing the
+* The reason for this hack is that we want to avoid seeing the
 * partition as alive in RI queries during REPEATABLE READ or

+* SERIALIZABLE transactions.

The comment doesn't quite make it clear why it is the RI query case
that necessitates this hack in the first case.  Maybe the relation to
what's going on with the partdesc

+   if (likely(rel->rd_partdesc &&
+  (!rel->rd_partdesc->detached_exist || include_detached)))
+   return rel->rd_partdesc;

I think it would help to have a comment about what's going here, in
addition to the description you already wrote for
PartitionDescData.detached_exist.  Maybe something along the lines of:

===
Under normal circumstances, we just return the partdesc that was
already built.  However, if the partdesc was built at a time when
there existed detach-pending partition(s), which the current caller
would rather not see (omit_detached), then we build one afresh
omitting any such partitions and return that one.
RelationBuildPartitionDesc() makes sure that any such partdescs will
disappear when the query finishes.
===

That's maybe a bit verbose but I am sure you will find a way to write
it more succinctly.

BTW, I do feel a bit alarmed by the potential performance impact of
this.  If detached_exist of a cached partdesc is true, then RI queries
invoked during a bulk DML operation would have to rebuild one for
every tuple to be checked, right?  I haven't tried an actual example
yet though.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-21 Thread Alvaro Herrera
On 2021-Apr-10, Justin Pryzby wrote:

> If it *implies* the partition constraint, then it's at least as tight (and
> maybe tighter), yes ?
> 
> I think you're concerned with the case that someone has a partition with
> "tight" bounds like (a>=200 and a<300) and a check constraint that's "less
> tight" like (a>=100 AND a<400).  In that case, the loose check constraint
> doesn't imply the tighter partition constraint, so your patch would add a
> non-redundant constraint.

... yeah, you're right, we can do as you suggest and it seems an
improvement.  I verified, as is obvious in hindsight, that the existing
constraint makes a future ATTACH of the partition with the same bounds
as before not scan the partition.
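
A quick way to see that effect is to detach and re-attach with the same
bounds; a sketch (the INFO line is emitted only when the validation
scan is skipped):

create table r (a int) partition by range (a);
create table r1 partition of r for values from (0) to (100);
alter table r detach partition r1 concurrently;
-- r1 now carries: CHECK (a IS NOT NULL AND a >= 0 AND a < 100)
alter table r attach partition r1 for values from (0) to (100);
-- INFO:  partition constraint for table "r1" is implied by existing constraints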

I pushed the patch with a small change:
PartConstraintImpliedByRelConstraint wants the constraint in
implicit-AND form (that is, a list) which is what we already have, so we
can postpone make_ands_explicit() until later.

Pushed, thanks,

-- 
Álvaro Herrera   Valdivia, Chile




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-21 Thread Alvaro Herrera
While the approach in the previous email does pass the tests, I think
(but couldn't find a test case to prove) it does so coincidentally, not
because it is correct.  If I make the test for "detached exist" use the
cached omits-partitions-partdesc, it does fail, because we had
previously cached one that was not yet omitting the partition.  So what
I said earlier in the thread stands: the set of partitions that are
considered detached changes depending on what the active snapshot is,
and therefore we *must not* cache any such descriptor.

So I backtracked to my previous proposal, which saves in relcache only
the partdesc that includes all partitions.  If any partdesc is built
that omits partitions being detached, that one must be rebuilt afresh
each time.  And to avoid potentially saving a lot of single-use
partdescs in CacheMemoryContext, in the attached second patch (which I
attach separately only to make it more obvious to review) I store such
partdescs in PortalContext.

Barring objections, I will get this pushed early tomorrow.

-- 
Álvaro Herrera   Valdivia, Chile
"Just treat us the way you want to be treated + some extra allowance
 for ignorance."(Michael Brusser)
>From a6e316fd12e5afe4dbb736e27ac83f688c7138ab Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 20 Apr 2021 18:01:18 -0400
Subject: [PATCH 1/2] Fix relcache hazard with detached partitions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During queries coming from ri_triggers.c, we need to omit partitions
that are marked pending detach -- otherwise, the RI query is tricked
into allowing a row into the referencing table whose corresponding row
is in the detached partition.  Which is bogus: once the detach operation
completes, the row becomes an orphan.

However, the code was not doing that in repeatable-read transactions,
because relcache kept a copy of the partition descriptor that included
the partition, and used it in the RI query.  This commit changes the
partdesc cache code to only keep descriptors that aren't dependent on
a snapshot (namely: those where no detached partition exist, and those
where detached partitions are included).  When a partdesc-without-
detached-partitions is requested, we create one afresh each time.

find_inheritance_children gets a new output *detached_exist boolean,
which indicates whether any partition marked pending-detach is found.
Its "include_detached" input flag is changed to "omit_detached", because
that captures the desired semantics more naturally.

This was noticed because a buildfarm member that runs with relcache
clobbering, which would not keep the improperly cached partdesc, broke
one test, which led us to realize that the expected output of that test
was bogus.  This commit also corrects that expected output.

Author: Amit Langote 
Author: Álvaro Herrera 
Discussion: https://postgr.es/m/3269784.1617215...@sss.pgh.pa.us
---
 src/backend/catalog/pg_inherits.c | 64 +++
 src/backend/commands/tablecmds.c  | 22 +++
 src/backend/commands/trigger.c|  4 +-
 src/backend/executor/execPartition.c  | 15 ++---
 src/backend/partitioning/partdesc.c   | 30 +
 src/include/catalog/pg_inherits.h |  4 +-
 src/include/partitioning/partdesc.h   | 10 ++-
 .../detach-partition-concurrently-4.out   |  1 +
 8 files changed, 88 insertions(+), 62 deletions(-)

diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index bb8b2249b1..bb6b1686ee 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -52,13 +52,19 @@ typedef struct SeenRelsEntry
  * then no locks are acquired, but caller must beware of race conditions
  * against possible DROPs of child relations.
  *
- * include_detached says to include all partitions, even if they're marked
- * detached.  Passing it as false means they might or might not be included,
- * depending on the visibility of the pg_inherits row for the active snapshot.
+ * If a partition's pg_inherits row is marked "detach pending",
+ * *detached_exist (if not null) is set true, otherwise it is set false.
+ *
+ * If omit_detached is true and there is an active snapshot (not the same as
+ * the catalog snapshot used to scan pg_inherits!) and a pg_inherits tuple
+ * marked "detach pending" is visible to that snapshot, then that partition is
+ * omitted from the output list.  This makes partitions invisible depending on
+ * whether the transaction that marked those partitions as detached appears
+ * committed to the active snapshot.
  */
 List *
-find_inheritance_children(Oid parentrelId, bool include_detached,
-		  LOCKMODE lockmode)
+find_inheritance_children(Oid parentrelId, bool omit_detached,
+		  LOCKMODE lockmode, bool *detached_exist)
 {
 	List	   *list = NIL;
 	Relation	relation;
@@ -78,6 +84,9 @@ find_inheritance_children(Oid 

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-20 Thread Alvaro Herrera
Actually I had a silly bug in the version that attempted to cache a
partdesc that omits detached partitions.  This one, while not fully
baked, seems to work correctly (on top of the previous one).

The thing that I don't fully understand is why we have to require that
the regular one be built first.

-- 
Álvaro Herrera39°49'30"S 73°17'W
"This is what I like so much about PostgreSQL.  Most of the surprises
are of the "oh wow!  That's cool" Not the "oh shit!" kind.  :)"
Scott Marlowe, http://archives.postgresql.org/pgsql-admin/2008-10/msg00152.php
>From b45e9db9cd08282f7a7fabcef944880cc3a3d415 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 20 Apr 2021 20:42:30 -0400
Subject: [PATCH] separately cache a partdesc that omits detached partitions,
 too

---
 src/backend/partitioning/partdesc.c | 21 +++--
 src/backend/utils/cache/relcache.c  |  4 
 src/include/utils/rel.h |  1 +
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index 7d019a72be..f245cbd664 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -67,10 +67,25 @@ RelationGetPartitionDesc(Relation rel, bool include_detached)
 {
 	Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
 
+	/*
+	 * The common case is that there are no partitions pending detach and
+	 * that the appropriate descriptor has already been built; use that.
+	 */
 	if (likely(rel->rd_partdesc &&
 			   (!rel->rd_partdesc->detached_exist || include_detached)))
 		return rel->rd_partdesc;
 
+	/*
+	 * If we're asked for a descriptor with detached partitions omitted, and
+	 * we have already built it, return that.
+	 */
+	if (!include_detached &&
+		rel->rd_partdesc &&
+		rel->rd_partdesc->detached_exist &&
+		rel->rd_partdesc_nodetached)
+		return rel->rd_partdesc_nodetached;
+
+	/* Otherwise build one afresh and store it for next time */
 	return RelationBuildPartitionDesc(rel, include_detached);
 }
 
@@ -278,9 +293,11 @@ RelationBuildPartitionDesc(Relation rel, bool include_detached)
 		MemoryContextSetParent(rel->rd_pdcxt, new_pdcxt);
 	rel->rd_pdcxt = new_pdcxt;
 
-	/* Store it into relcache, but only if no detached partitions exist */
-	if (!detached_exist)
+	/* Store it into the appropriate member of the relcache entry */
+	if (!detached_exist || include_detached)
 		rel->rd_partdesc = partdesc;
+	else
+		rel->rd_partdesc_nodetached = partdesc;
 
 	return partdesc;
 }
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 29702d6eab..87a7613584 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1157,6 +1157,7 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 	relation->rd_partkey = NULL;
 	relation->rd_partkeycxt = NULL;
 	relation->rd_partdesc = NULL;
+	relation->rd_partdesc_nodetached = NULL;
 	relation->rd_pdcxt = NULL;
 	relation->rd_partcheck = NIL;
 	relation->rd_partcheckvalid = false;
@@ -2672,12 +2673,14 @@ RelationClearRelation(Relation relation, bool rebuild)
 			 * newrel.
 			 */
 			relation->rd_partdesc = NULL;	/* ensure rd_partdesc is invalid */
+			relation->rd_partdesc_nodetached = NULL;
 			if (relation->rd_pdcxt != NULL) /* probably never happens */
 MemoryContextSetParent(newrel->rd_pdcxt, relation->rd_pdcxt);
 			else
 relation->rd_pdcxt = newrel->rd_pdcxt;
 			/* drop newrel's pointers so we don't destroy it below */
 			newrel->rd_partdesc = NULL;
+			newrel->rd_partdesc_nodetached = NULL;
 			newrel->rd_pdcxt = NULL;
 		}
 
@@ -5942,6 +5945,7 @@ load_relcache_init_file(bool shared)
 		rel->rd_partkey = NULL;
 		rel->rd_partkeycxt = NULL;
 		rel->rd_partdesc = NULL;
+		rel->rd_partdesc_nodetached = NULL;
 		rel->rd_pdcxt = NULL;
 		rel->rd_partcheck = NIL;
 		rel->rd_partcheckvalid = false;
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 34e25eb597..3d480a3ab2 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -127,6 +127,7 @@ typedef struct RelationData
 
 	/* data managed by RelationGetPartitionDesc: */
 	PartitionDesc rd_partdesc;	/* partition descriptor, or NULL */
+	PartitionDesc rd_partdesc_nodetached;	/* partition descriptor omitting detached partitions */
 	MemoryContext rd_pdcxt;		/* private context for rd_partdesc, if any */
 
 	/* data managed by RelationGetPartitionQual: */
-- 
2.20.1



Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-20 Thread Alvaro Herrera
OK so after mulling this over for a long time, here's a patch for this.
It solves the problem by making the partition descriptor no longer be
cached if a detached partition is omitted.  Any transaction that needs a
partition descriptor that excludes detached partitions, will have to
recreate the partdesc from scratch.  To support this, I changed the
output boolean semantics: instead of "does this partdesc include any
detached partitions" as in your patch, it now is "are there any detached
partitions".  But whenever no detached partitions exist, or when all
partitions including detached are requested, then the cached descriptor
is returned (because that necessarily has to be correct).  The main
difference this has to your patch is that we always keep the descriptor
in the cache and don't rebuild anything, unless a detached partition is
present.

I flipped the find_inheritance_children() input boolean, from
"include_detached" to "omit_detached".  This is more natural, given the
internal behavior.  You could argue to propagate that naming change to
the partdesc.h API and PartitionDirectory, but I don't think there's a
need for that.

I ran all the detach-partition-concurrently tests under
debug_invalidate_system_caches_always=1 and everything passes.

I experimented with keeping a separate cached partition descriptor that
omits the detached partition, but that brings back some trouble; I
couldn't find a way to invalidate such a cached entry in a reasonable
way.  I have the patch for that, if somebody wants to play with it.

-- 
Álvaro Herrera   Valdivia, Chile
"That sort of implies that there are Emacs keystrokes which aren't obscure.
I've been using it daily for 2 years now and have yet to discover any key
sequence which makes any sense."(Paul Thomas)
>From f95149f0fee451b8a2b49861dc813aeddf384f8f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 20 Apr 2021 18:01:18 -0400
Subject: [PATCH] Fix relcache hazard with detached partitions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During queries run by ri_triggers.c, we need to omit partitions
that are marked pending detach -- otherwise, the RI query is tricked
into allowing a row into the referencing table whose corresponding row
is in the detached partition.  Which is bogus: once the detach operation
completes, the row becomes an orphan.

However, the code was not doing that in repeatable-read transactions,
because relcache kept a copy of the partition descriptor that included
the partition, and used it in the RI query.  This commit changes the
caching code so that the partition descriptors we keep in relcache are
only those that do not contain any detached partition, *or* those that
are requested to contain all partitions.  When a partdesc-without-
detached-partitions is requested, we create one afresh each time.

This was noticed because a buildfarm member that runs with relcache
clobbering, which would not keep the improperly cached partdesc, broke
one test, which led us to realize that the expected output of that test
was bogus.  This commit also corrects that expected output.

Author: Amit Langote 
Author: Álvaro Herrera 
Discussion: https://postgr.es/m/3269784.1617215...@sss.pgh.pa.us
---
 src/backend/catalog/pg_inherits.c | 64 +++
 src/backend/commands/tablecmds.c  | 22 +++
 src/backend/commands/trigger.c|  4 +-
 src/backend/executor/execPartition.c  | 15 ++---
 src/backend/partitioning/partdesc.c   | 30 +
 src/include/catalog/pg_inherits.h |  4 +-
 src/include/partitioning/partdesc.h   | 10 ++-
 .../detach-partition-concurrently-4.out   |  1 +
 8 files changed, 88 insertions(+), 62 deletions(-)

diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index bb8b2249b1..bb6b1686ee 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -52,13 +52,19 @@ typedef struct SeenRelsEntry
  * then no locks are acquired, but caller must beware of race conditions
  * against possible DROPs of child relations.
  *
- * include_detached says to include all partitions, even if they're marked
- * detached.  Passing it as false means they might or might not be included,
- * depending on the visibility of the pg_inherits row for the active snapshot.
+ * If a partition's pg_inherits row is marked "detach pending",
+ * *detached_exist (if not null) is set true, otherwise it is set false.
+ *
+ * If omit_detached is true and there is an active snapshot (not the same as
+ * the catalog snapshot used to scan pg_inherits!) and a pg_inherits tuple
+ * marked "detach pending" is visible to that snapshot, then that partition is
+ * omitted from the output list.  This makes partitions invisible depending on
+ * whether the transaction that marked those partitions as detached appears
+ * committed to the active snapshot.
  */
 

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-13 Thread Justin Pryzby
On Sat, Apr 10, 2021 at 01:42:26PM -0500, Justin Pryzby wrote:
> On Sun, Mar 21, 2021 at 03:01:15PM -0300, Alvaro Herrera wrote:
> > > But note that it doesn't check if an existing constraint "implies" the new
> > > constraint - maybe it should.
> > 
> > Hm, I'm not sure I want to do that, because that means that if I later
> > have to attach the partition again with the same partition bounds, then
> > I might have to incur a scan to recheck the constraint.  I think we want
> > to make the new constraint be as tight as possible ...
> 
> If it *implies* the partition constraint, then it's at least as tight (and
> maybe tighter), yes ?
> 
> I think you're concerned with the case that someone has a partition with
> "tight" bounds like (a>=200 and a<300) and a check constraint that's "less
> tight" like (a>=100 AND a<400).  In that case, the loose check constraint
> doesn't imply the tighter partition constraint, so your patch would add a
> non-redundant constraint.
> 
> I'm interested in the case that someone has a check constraint that almost but
> not exactly matches the partition constraint, like (a<300 AND a>=200).  In 
> that
> case, your patch adds a redundant constraint.
> 
> I wrote a patch which seems to effect my preferred behavior - please check.

On Sat, Apr 10, 2021 at 02:13:26PM -0500, Justin Pryzby wrote:
> I suppose the docs should be updated for the exact behavior, maybe even 
> without
> this patch:
> 
> |A CHECK constraint
> |that duplicates the partition constraint is added to the partition.

I added this as an Opened Item, since it affects user-visible behavior:
whether or not a redundant, non-equal constraint is added.

-- 
Justin




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-13 Thread Alvaro Herrera
On 2021-Apr-13, Amit Langote wrote:

> Actually it occurred to me this morning that CLOBBER_CACHE_ALWAYS is
> what exposed this problem on this animal (not sure if other such
> animals did too though).  With CLOBBER_CACHE_ALWAYS, a PartitionDesc
> will be built afresh on most uses.  In this particular case, the RI
> query executed by the insert has to build a new one (for d4_primary),
> correctly excluding the detach-pending partition (d4_primary1) per the
> snapshot with which it is run.  In normal builds, it would reuse the
> one built by an earlier query in the transaction, which does include
> the detach-pending partition, thus allowing the insert trying to
> insert a row referencing that partition to succeed.  There is a
> provision in RelationGetPartitionDesc() to rebuild if any
> detach-pending partitions in the existing copy of PartitionDesc are
> not to be seen by the current query, but a bug mentioned in my earlier
> reply prevents that from kicking in.

Right -- that explanation makes perfect sense: the problem stems from
the fact that the cached copy of the partition descriptor is not valid
depending on the visibility of detached partitions for the operation
that requests the descriptor.  I think your patch is a critical part to
a complete solution, but one thing is missing: we don't actually know
that the detached partitions we see now are the same detached partitions
we saw a moment ago.  After all, a partitioned table can have several
partitions in the process of being detached; so if you just go with the
boolean "does it have any detached or not" bit, you could end up with a
descriptor that doesn't include/ignore *all* detached partitions, just
the older one(s).

I think you could fix this aspect easily by decreeing that you can
have only one partition-being-detached at one point.  So if you try to
DETACH CONCURRENTLY and there's another one in that state, raise an
error.  Maybe for simplicity we should do that anyway.

But I think there's another hidden assumption in your patch, which is
that the descriptor is rebuilt every now and then *anyway* because the
flag for detached flips between parser and executor, and because we send
invalidation messages for each detach.  I don't think we would ever
change things that would break this flipping (it sounds like planner and
executor necessarily have to be doing things differently all the time),
but it seems fragile as heck.  I would feel much safer if we just
avoided caching the wrong thing ... or perhaps keep a separate cache
entry (one descriptor including detached, another one not), to avoid
pointless rebuilds.

-- 
Álvaro Herrera   Valdivia, Chile




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-12 Thread Amit Langote
On Mon, Apr 12, 2021 at 6:23 AM Justin Pryzby  wrote:
> On Sun, Apr 11, 2021 at 05:20:35PM -0400, Alvaro Herrera wrote:
> > On 2021-Mar-31, Tom Lane wrote:
> >
> > > diff -U3 
> > > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
> > >  
> > > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
> > > --- 
> > > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
> > > 2021-03-29 20:14:21.258199311 +0200
> > > +++ 
> > > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
> > >   2021-03-30 18:54:34.272839428 +0200
> > > @@ -324,6 +324,7 @@
> > >  1
> > >  2
> > >  step s1insert: insert into d4_fk values (1);
> > > +ERROR:  insert or update on table "d4_fk" violates foreign key 
> > > constraint "d4_fk_a_fkey"
> > >  step s1c: commit;
> > >
> > >  starting permutation: s2snitch s1b s1s s2detach s1cancel s3vacfreeze s1s 
> > > s1insert s1c
> >
> > Hmm, actually, looking at this closely, I think the expected output is
> > bogus and trilobite is doing the right thing by throwing this error
> > here.  The real question is why isn't this case behaving in that way in
> > every *other* animal.
>
> I was looking/thinking at it, and wondered whether it could be a race 
> condition
> involving pg_cancel_backend()

I thought about it some and couldn't come up with an explanation as to
why pg_cancel_backend() race might be a problem.

Actually it occurred to me this morning that CLOBBER_CACHE_ALWAYS is
what exposed this problem on this animal (not sure if other such
animals did too though).  With CLOBBER_CACHE_ALWAYS, a PartitionDesc
will be built afresh on most uses.  In this particular case, the RI
query executed by the insert has to build a new one (for d4_primary),
correctly excluding the detach-pending partition (d4_primary1) per the
snapshot with which it is run.  In normal builds, it would reuse the
one built by an earlier query in the transaction, which does include
the detach-pending partition, thus allowing the insert trying to
insert a row referencing that partition to succeed.  There is a
provision in RelationGetPartitionDesc() to rebuild if any
detach-pending partitions in the existing copy of PartitionDesc are
not to be seen by the current query, but a bug mentioned in my earlier
reply prevents that from kicking in.


--
Amit Langote
EDB: http://www.enterprisedb.com




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-12 Thread Amit Langote
On Mon, Apr 12, 2021 at 4:42 PM Amit Langote  wrote:
> On Mon, Apr 12, 2021 at 6:20 AM Alvaro Herrera  
> wrote:
> > On 2021-Mar-31, Tom Lane wrote:
> >
> > > diff -U3 
> > > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
> > >  
> > > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
> > > --- 
> > > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
> > >   2021-03-29 20:14:21.258199311 +0200
> > > +++ 
> > > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
> > > 2021-03-30 18:54:34.272839428 +0200
> > > @@ -324,6 +324,7 @@
> > >  1
> > >  2
> > >  step s1insert: insert into d4_fk values (1);
> > > +ERROR:  insert or update on table "d4_fk" violates foreign key 
> > > constraint "d4_fk_a_fkey"
> > >  step s1c: commit;
> > >
> > >  starting permutation: s2snitch s1b s1s s2detach s1cancel s3vacfreeze s1s 
> > > s1insert s1c
> >
> > Hmm, actually, looking at this closely, I think the expected output is
> > bogus and trilobite is doing the right thing by throwing this error
> > here.  The real question is why isn't this case behaving in that way in
> > every *other* animal.
>
> Indeed.
>
> I can see a wrong behavior of RelationGetPartitionDesc() in a case
> that resembles the above test case.
>
> drop table if exists d4_primary, d4_primary1, d4_fk, d4_pid;
> create table d4_primary (a int primary key) partition by list (a);
> create table d4_primary1 partition of d4_primary for values in (1);
> create table d4_primary2 partition of d4_primary for values in (2);
> insert into d4_primary values (1);
> insert into d4_primary values (2);
> create table d4_fk (a int references d4_primary);
> insert into d4_fk values (2);
> create table d4_pid (pid int);
>
> Session 1:
> begin isolation level serializable;
> select * from d4_primary;
>  a
> ---
>  1
>  2
> (2 rows)
>
> Session 2:
> alter table d4_primary detach partition d4_primary1 concurrently;
> 
>
> Session 1:
> -- can see d4_primary1 as detached at this point, though still scans!
> select * from d4_primary;
>  a
> ---
>  1
>  2
> (2 rows)
> insert into d4_fk values (1);
> INSERT 0 1
> end;
>
> Session 2:
> 
> ALTER TABLE
>
> Session 1:
> -- FK violated
> select * from d4_primary;
>  a
> ---
>  2
> (1 row)
> select * from d4_fk;
>  a
> ---
>  1
> (1 row)
>
> The 2nd select that session 1 performs adds d4_primary1, whose detach
> it *sees* is pending, to the PartitionDesc, but without setting its
> includes_detached.  The subsequent insert's RI query, because it uses
> that PartitionDesc as-is, returns 1 as being present in d4_primary,
> leading to the insert succeeding.  When session 1's transaction
> commits, the waiting ALTER proceeds with committing the 2nd part of
> the DETACH, without having a chance again to check if the DETACH would
> lead to the foreign key of d4_fk being violated.
>
> It seems problematic to me that the logic of setting includes_detached
> is oblivious of the special check in find_inheritance_children() to
> decide whether "force"-include a detach-pending child based on
> cross-checking its pg_inherit tuple's xmin against the active
> snapshot.  Maybe fixing that would help, although I haven't tried that
> myself yet.

I tried that in the attached.  It is indeed the above failing
isolation test whose output needed to be adjusted.

While at it, I tried rewording the comment around that special
visibility check done to force-include detached partitions, as I got
confused by the way it's worded currently.  Actually, it may be a good
idea to add some comments around the intended include_detached
behavior in the places where PartitionDesc is used; e.g.
set_relation_partition_info() lacks a one-liner on why it's okay for
the planner to not see detached partitions.  Or perhaps, a comment for
includes_detached of PartitionDesc should describe the various cases
in which it is true and the cases in which it is not.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


PartitionDesc-includes_detached-thinko-fix.patch
Description: Binary data


Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-12 Thread Amit Langote
On Mon, Apr 12, 2021 at 6:20 AM Alvaro Herrera  wrote:
> On 2021-Mar-31, Tom Lane wrote:
>
> > diff -U3 
> > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
> >  
> > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
> > --- 
> > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
> >   2021-03-29 20:14:21.258199311 +0200
> > +++ 
> > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
> > 2021-03-30 18:54:34.272839428 +0200
> > @@ -324,6 +324,7 @@
> >  1
> >  2
> >  step s1insert: insert into d4_fk values (1);
> > +ERROR:  insert or update on table "d4_fk" violates foreign key constraint 
> > "d4_fk_a_fkey"
> >  step s1c: commit;
> >
> >  starting permutation: s2snitch s1b s1s s2detach s1cancel s3vacfreeze s1s 
> > s1insert s1c
>
> Hmm, actually, looking at this closely, I think the expected output is
> bogus and trilobite is doing the right thing by throwing this error
> here.  The real question is why isn't this case behaving in that way in
> every *other* animal.

Indeed.

I can see a wrong behavior of RelationGetPartitionDesc() in a case
that resembles the above test case.

drop table if exists d4_primary, d4_primary1, d4_fk, d4_pid;
create table d4_primary (a int primary key) partition by list (a);
create table d4_primary1 partition of d4_primary for values in (1);
create table d4_primary2 partition of d4_primary for values in (2);
insert into d4_primary values (1);
insert into d4_primary values (2);
create table d4_fk (a int references d4_primary);
insert into d4_fk values (2);
create table d4_pid (pid int);

Session 1:
begin isolation level serializable;
select * from d4_primary;
 a
---
 1
 2
(2 rows)

Session 2:
alter table d4_primary detach partition d4_primary1 concurrently;
-- blocks, waiting for session 1's open transaction

Session 1:
-- can see d4_primary1 as detached at this point, though still scans!
select * from d4_primary;
 a
---
 1
 2
(2 rows)
insert into d4_fk values (1);
INSERT 0 1
end;

Session 2:
-- the waiting detach now completes
ALTER TABLE

Session 1:
-- FK violated
select * from d4_primary;
 a
---
 2
(1 row)
select * from d4_fk;
 a
---
 1
(1 row)

The 2nd select that session 1 performs adds d4_primary1, whose detach
it *sees* is pending, to the PartitionDesc, but without setting its
includes_detached.  The subsequent insert's RI query, because it uses
that PartitionDesc as-is, returns 1 as being present in d4_primary,
leading to the insert succeeding.  When session 1's transaction
commits, the waiting ALTER proceeds with committing the 2nd part of
the DETACH, without having a chance again to check if the DETACH would
lead to the foreign key of d4_fk being violated.

It seems problematic to me that the logic of setting includes_detached
is oblivious of the special check in find_inheritance_children() to
decide whether "force"-include a detach-pending child based on
cross-checking its pg_inherit tuple's xmin against the active
snapshot.  Maybe fixing that would help, although I haven't tried that
myself yet.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-11 Thread Justin Pryzby
On Sun, Apr 11, 2021 at 05:20:35PM -0400, Alvaro Herrera wrote:
> On 2021-Mar-31, Tom Lane wrote:
> 
> > diff -U3 
> > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
> >  
> > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
> > --- 
> > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
> > 2021-03-29 20:14:21.258199311 +0200
> > +++ 
> > /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
> >   2021-03-30 18:54:34.272839428 +0200
> > @@ -324,6 +324,7 @@
> >  1  
> >  2  
> >  step s1insert: insert into d4_fk values (1);
> > +ERROR:  insert or update on table "d4_fk" violates foreign key constraint 
> > "d4_fk_a_fkey"
> >  step s1c: commit;
> >  
> >  starting permutation: s2snitch s1b s1s s2detach s1cancel s3vacfreeze s1s 
> > s1insert s1c
> 
> Hmm, actually, looking at this closely, I think the expected output is
> bogus and trilobite is doing the right thing by throwing this error
> here.  The real question is why isn't this case behaving in that way in
> every *other* animal.

I was looking/thinking at it, and wondered whether it could be a race condition
involving pg_cancel_backend()

-- 
Justin




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-11 Thread Alvaro Herrera
On 2021-Mar-31, Tom Lane wrote:

> diff -U3 
> /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
>  
> /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
> --- 
> /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
>   2021-03-29 20:14:21.258199311 +0200
> +++ 
> /home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
> 2021-03-30 18:54:34.272839428 +0200
> @@ -324,6 +324,7 @@
>  1  
>  2  
>  step s1insert: insert into d4_fk values (1);
> +ERROR:  insert or update on table "d4_fk" violates foreign key constraint 
> "d4_fk_a_fkey"
>  step s1c: commit;
>  
>  starting permutation: s2snitch s1b s1s s2detach s1cancel s3vacfreeze s1s 
> s1insert s1c

Hmm, actually, looking at this closely, I think the expected output is
bogus and trilobite is doing the right thing by throwing this error
here.  The real question is why isn't this case behaving in that way in
every *other* animal.

-- 
Álvaro Herrera   Valdivia, Chile
"Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-04-10 Thread Justin Pryzby
On Sun, Mar 21, 2021 at 03:01:15PM -0300, Alvaro Herrera wrote:
> > But note that it doesn't check if an existing constraint "implies" the new
> > constraint - maybe it should.
> 
> Hm, I'm not sure I want to do that, because that means that if I later
> have to attach the partition again with the same partition bounds, then
> I might have to incur a scan to recheck the constraint.  I think we want
> to make the new constraint be as tight as possible ...

If it *implies* the partition constraint, then it's at least as tight (and
maybe tighter), yes ?

I think you're concerned with the case that someone has a partition with
"tight" bounds like (a>=200 and a<300) and a check constraint that's "less
tight" like (a>=100 AND a<400).  In that case, the loose check constraint
doesn't imply the tighter partition constraint, so your patch would add a
non-redundant constraint.

I'm interested in the case that someone has a check constraint that almost but
not exactly matches the partition constraint, like (a<300 AND a>=200).  In that
case, your patch adds a redundant constraint.
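
Concretely, for a partition bounded as FOR VALUES FROM (200) TO (300),
a sketch of the two cases (the table name part200 is hypothetical):

-- loose check constraint: does NOT imply the partition constraint,
-- so the patch would still add a new, non-redundant constraint
alter table part200 add check (a >= 100 and a < 400);

-- equivalent but differently spelled constraint: it implies the
-- partition constraint, so adding another one would be redundant
alter table part200 add check (a < 300 and a >= 200);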

I wrote a patch which seems to effect my preferred behavior - please check.

-- 
Justin
>From 1ecc8cb17192d13f7432fe582a43ab8597b15c00 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 25 Mar 2021 21:24:10 -0500
Subject: [PATCH] DETACH CONCURRENTLY: avoid creation of redundant constraint

---
 src/backend/commands/tablecmds.c  | 29 ++-
 src/test/regress/expected/alter_table.out | 20 
 src/test/regress/sql/alter_table.sql  |  6 +
 3 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7d0bdc9ac4..62398b6882 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -17873,31 +17873,28 @@ ATExecDetachPartitionFinalize(Relation rel, RangeVar *name)
  * DetachAddConstraintIfNeeded
  *		Subroutine for ATExecDetachPartition.  Create a constraint that
  *		takes the place of the partition constraint, but avoid creating
- *		a dupe if an equivalent constraint already exists.
+ *		a dupe if a constraint already exists which implies the needed
+ *		constraint.
  */
 static void
 DetachAddConstraintIfNeeded(List **wqueue, Relation partRel)
 {
 	AlteredTableInfo *tab;
-	Expr	   *constraintExpr;
-	TupleDesc	td = RelationGetDescr(partRel);
+	List	   *constraintExpr;
 	Constraint *n;
 
-	constraintExpr = make_ands_explicit(RelationGetPartitionQual(partRel));
+	constraintExpr = (List *) eval_const_expressions(NULL,
+			(Node *) RelationGetPartitionQual(partRel));
 
-	/* If an identical constraint exists, we don't need to create one */
-	if (td->constr && td->constr->num_check > 0)
-	{
-		for (int i = 0; i < td->constr->num_check; i++)
-		{
-			Node	*thisconstr;
+	constraintExpr = (List *) make_ands_explicit(constraintExpr);
 
-			thisconstr = stringToNode(td->constr->check[i].ccbin);
-
-			if (equal(constraintExpr, thisconstr))
-return;
-		}
-	}
+	/*
+	 * Avoid adding a new constraint if the needed constraint is implied by an
+	 * existing constraint
+	 */
+	if (PartConstraintImpliedByRelConstraint(partRel,
+list_make1(constraintExpr)))
+		return;
 
 	tab = ATGetQueueEntry(wqueue, partRel);
 
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index ec14b060a6..f81bdf513b 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -4191,6 +4191,26 @@ ALTER TABLE range_parted2 DETACH PARTITION part_rp CONCURRENTLY;
 Partition key: RANGE (a)
 Number of partitions: 0
 
+-- constraint should be created
+\d part_rp
+  Table "public.part_rp"
+ Column |  Type   | Collation | Nullable | Default 
++-+---+--+-
+ a  | integer |   |  | 
+Check constraints:
+"part_rp_a_check" CHECK (a IS NOT NULL AND a >= 0 AND a < 100)
+
+CREATE TABLE part_rp100 PARTITION OF range_parted2 (CHECK (a>=123 AND a<133 AND a IS NOT NULL)) FOR VALUES FROM (100) to (200);
+ALTER TABLE range_parted2 DETACH PARTITION part_rp100 CONCURRENTLY;
+-- redundant constraint should not be created
+\d part_rp100
+ Table "public.part_rp100"
+ Column |  Type   | Collation | Nullable | Default 
++-+---+--+-
+ a  | integer |   |  | 
+Check constraints:
+"part_rp100_a_check" CHECK (a >= 123 AND a < 133 AND a IS NOT NULL)
+
 DROP TABLE range_parted2;
 -- Check ALTER TABLE commands for partitioned tables and partitions
 -- cannot add/drop column to/from *only* the parent
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 7a9c9252dc..dc0200adcb 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -2696,6 +2696,12 @@ DROP TABLE part_rpd;
 -- works fine
 ALTER TABLE range_parted2 DETACH PARTITION part_rp 

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-31 Thread Tom Lane
Alvaro Herrera  writes:
> I added that test as promised, and I couldn't find any problems, so I
> have pushed it.

Buildfarm testing suggests there's an issue under CLOBBER_CACHE_ALWAYS:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite=2021-03-29%2018%3A14%3A24

specifically

diff -U3 
/home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
 
/home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
--- 
/home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/expected/detach-partition-concurrently-4.out
2021-03-29 20:14:21.258199311 +0200
+++ 
/home/buildfarm/trilobite/buildroot/HEAD/pgsql.build/src/test/isolation/output_iso/results/detach-partition-concurrently-4.out
  2021-03-30 18:54:34.272839428 +0200
@@ -324,6 +324,7 @@
 1  
 2  
 step s1insert: insert into d4_fk values (1);
+ERROR:  insert or update on table "d4_fk" violates foreign key constraint 
"d4_fk_a_fkey"
 step s1c: commit;
 
 starting permutation: s2snitch s1b s1s s2detach s1cancel s3vacfreeze s1s 
s1insert s1c

regards, tom lane




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-25 Thread Alvaro Herrera
On 2020-Nov-30, Justin Pryzby wrote:

> On Tue, Nov 03, 2020 at 08:56:06PM -0300, Alvaro Herrera wrote:

> > * On the first transaction (the one that marks the partition as
> > detached), the partition is locked with ShareLock rather than
> > ShareUpdateExclusiveLock.  This means that DML is not allowed anymore.
> > This seems a reasonable restriction to me; surely by the time you're
> > detaching the partition you're not inserting data into it anymore.
> 
> I don't think it's an issue with your patch, but FYI that sounds like
> something I had to do recently: detach *all* partitions of various
> tables to promote their partition key column from timestamp to
> timestamptz.  And we insert directly into child tables, not routed via
> parent.
> 
> I don't doubt your patch is still useful, but not to us.  So the
> documentation should be clear about that.

FWIW since you mentioned this detail specifically: I backed away from
doing this (and use ShareUpdateExclusive), because it wasn't buying us
anything anyway.  The reason for it is that I wanted to close the hole
for RI queries, and this seemed the simplest fix; but it really *wasn't*
a fix anyway.  My later games with the active snapshot (which are
present in the version I pushed) better close this problem.  So I don't
think this would be a problem.

-- 
Álvaro Herrera   Valdivia, Chile




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-25 Thread Alvaro Herrera
I added that test as promised, and I couldn't find any problems, so I
have pushed it.

Thanks!

-- 
Álvaro Herrera   Valdivia, Chile




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-25 Thread Alvaro Herrera
On 2021-Mar-23, Alvaro Herrera wrote:

> I think a possible solution to this problem is that the "detach" flag in
> pg_inherits is not a boolean anymore, but an Xid (or maybe two Xids).
> Not sure exactly which Xid(s) yet, and I'm not sure what the exact
> rules are, but the Xid becomes a marker that indicates a horizon past
> which the partition is no longer visible.  Then, REPEATABLE READ can see the
> partition, but only if its snapshot is older than the Xid.

So a solution to this problem seems similar to (but not quite the same
as) pg_index.indcheckxmin: the partition is included in the partition
directory, or not, depending on the pg_inherits tuple visibility for the
active snapshot.  This solves the problem because the RI query uses a
fresh snapshot, for which the partition has already been detached, while
the normal REPEATABLE READ query is using the old snapshot for which the
'detach-pending' row is still seen as in progress.  With this, the weird
hack in a couple of places that needed to check the isolation level is
gone, which makes me a bit more comfortable.

So attached is v9 with this problem solved.

I'll add one more torture test, and if it works correctly I'll push it:
have a cursor in the repeatable read transaction, which can read the
referenced partitioned table and see the row in the detached partition,
but the RI query must not see that row.  Bonus: the RI query is run from
another cursor that is doing UPDATE WHERE CURRENT OF that cursor.
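
A rough sketch of that test, reusing the d4_* tables from the problem
report downthread (the exact isolation spec steps are still to be
written, and the UPDATE WHERE CURRENT OF variant is left out here):

/* session 1 */
begin isolation level repeatable read;
declare c cursor for select * from d4_primary;
fetch c;  -- sees the row in the detach-pending partition

/* session 2 */
alter table d4_primary detach partition d4_primary1 concurrently;
-- blocks, waiting for session 1

/* session 1 */
insert into d4_fk values (1);  -- the RI query must not see that row,
                               -- so this must fail
commit;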

-- 
Álvaro Herrera39°49'30"S 73°17'W
"Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)
>From 61ff1c133bb4fa7e81f119e806af9760b2faf0e4 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 13 Jul 2020 20:15:30 -0400
Subject: [PATCH v9 1/2] Let ALTER TABLE exec routines deal with the relation

This means that ATExecCmd relies on AlteredRelationInfo->rel instead of
keeping the relation as a local variable; this is useful if the
subcommand needs to modify the relation internally.  For example, a
subcommand that internally commits its transaction needs this.
---
 src/backend/commands/tablecmds.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3349bcfaa7..bf7fd6e8ae 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -157,6 +157,8 @@ typedef struct AlteredTableInfo
 	Oid			relid;			/* Relation to work on */
 	char		relkind;		/* Its relkind */
 	TupleDesc	oldDesc;		/* Pre-modification tuple descriptor */
+	/* Transiently set during Phase 2, normally set to NULL */
+	Relation	rel;
 	/* Information saved by Phase 1 for Phase 2: */
 	List	   *subcmds[AT_NUM_PASSES]; /* Lists of AlterTableCmd */
 	/* Information saved by Phases 1/2 for Phase 3: */
@@ -354,7 +356,7 @@ static void ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 	  AlterTableUtilityContext *context);
 static void ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			  AlterTableUtilityContext *context);
-static void ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
+static void ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 	  AlterTableCmd *cmd, LOCKMODE lockmode, int cur_pass,
 	  AlterTableUtilityContext *context);
 static AlterTableCmd *ATParseTransformCmd(List **wqueue, AlteredTableInfo *tab,
@@ -4569,7 +4571,6 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 		{
 			AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab);
 			List	   *subcmds = tab->subcmds[pass];
-			Relation	rel;
 			ListCell   *lcmd;
 
 			if (subcmds == NIL)
@@ -4578,10 +4579,10 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			/*
 			 * Appropriate lock was obtained by phase 1, needn't get it again
 			 */
-			rel = relation_open(tab->relid, NoLock);
+			tab->rel = relation_open(tab->relid, NoLock);
 
 			foreach(lcmd, subcmds)
-ATExecCmd(wqueue, tab, rel,
+ATExecCmd(wqueue, tab,
 		  castNode(AlterTableCmd, lfirst(lcmd)),
 		  lockmode, pass, context);
 
@@ -4593,7 +4594,11 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			if (pass == AT_PASS_ALTER_TYPE)
 ATPostAlterTypeCleanup(wqueue, tab, lockmode);
 
-			relation_close(rel, NoLock);
+			if (tab->rel)
+			{
+relation_close(tab->rel, NoLock);
+tab->rel = NULL;
+			}
 		}
 	}
 
@@ -4619,11 +4624,12 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
  * ATExecCmd: dispatch a subcommand to appropriate execution routine
  */
 static void
-ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
+ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 		  AlterTableCmd *cmd, LOCKMODE lockmode, int cur_pass,
 		  AlterTableUtilityContext *context)
 {
 	ObjectAddress address = InvalidObjectAddress;
+	Relation	rel = tab->rel;
 
 	switch (cmd->subtype)
 	{
@@ -5730,6 +5736,7 @@ ATGetQueueEntry(List **wqueue, 

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-23 Thread Alvaro Herrera
I'm coming around to the idea that the fact that you can cancel the wait
phase of DETACH CONCURRENTLY creates quite a disaster, and it's not easy
to get away from it.  The idea that REPEATABLE READ mode means that you
now see detached partitions as if they were in normal condition, is
completely at odds with that behavior. 

I think a possible solution to this problem is that the "detach" flag in
pg_inherits is not a boolean anymore, but an Xid (or maybe two Xids).
Not sure exactly which Xid(s) yet, and I'm not sure what the exact
rules are, but the Xid becomes a marker that indicates a horizon past
which the partition is no longer visible.  Then, REPEATABLE READ can see the
partition, but only if its snapshot is older than the Xid.

-- 
Álvaro Herrera   Valdivia, Chile
"La persona que no quería pecar / estaba obligada a sentarse
 en duras y empinadas sillas/ desprovistas, por cierto
 de blandos atenuantes"  (Patricio Vogel)




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-23 Thread Alvaro Herrera
On 2021-Mar-23, Alvaro Herrera wrote:

> So I was about ready to get these patches pushed, when I noticed that in
> REPEATABLE READ isolation mode it is possible to insert rows violating
> an FK referencing the partition that is being detached.  I'm not sure
> what is a good solution to this problem.

...

> I had the idea that the RI code, in REPEATABLE READ mode, used a
> different snapshot for the RI queries than the transaction snapshot.

I am definitely right about this.  So why doesn't it work?  The reason
is that when SPI goes to execute the query, it obtains a new partition
directory, and we tell it to include detached partitions precisely
because we're in REPEATABLE READ mode.

In other words, the idea that we can blanket use the snapshot-isolation
condition to decide whether to include detached partitions or not, is
bogus and needs at least the refinement that for any query that comes
from the RI system, we need a partition directory that does not include
detached partitions.
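
Concretely, with that refinement, the d4_* example from the problem
report should end up behaving like this (a sketch):

/* session 1, repeatable read, snapshot taken before the detach began */
select * from d4_primary;      -- still sees the detach-pending partition
insert into d4_fk values (1);  -- RI runs with a fresh snapshot that
                               -- excludes that partition: FK violation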

-- 
Álvaro Herrera39°49'30"S 73°17'W
"El sabio habla porque tiene algo que decir;
el tonto, porque tiene que decir algo" (Platon).




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-23 Thread Alvaro Herrera
So I was about ready to get these patches pushed, when I noticed that in
REPEATABLE READ isolation mode it is possible to insert rows violating
an FK referencing the partition that is being detached.  I'm not sure
what is a good solution to this problem.

The problem goes like this:

/* setup */
drop table if exists d4_primary, d4_primary1, d4_fk;
create table d4_primary (a int primary key) partition by list (a);
create table d4_primary1 partition of d4_primary for values in (1);
insert into d4_primary values (1);
create table d4_fk (a int references d4_primary);

/* session 1 */
begin isolation level repeatable read;
select * from d4_primary;

/* session 2 */
alter table d4_primary detach partition d4_primary1 concurrently;
-- blocks
-- Cancel wait: Ctrl-c

/* session 1 */
insert into d4_fk values (1);
commit;

At this point, d4_fk contains the value (1) which is not present in
d4_primary.

This doesn't happen in READ COMMITTED mode; the INSERT at the final step
fails with "insert or update on table d4_fk violates the foreign key",
which is what I expected to happen here too.

I had the idea that the RI code, in REPEATABLE READ mode, used a
different snapshot for the RI queries than the transaction snapshot.
Maybe I'm wrong about that.

I'm looking into that now.

-- 
Álvaro Herrera   Valdivia, Chile
"Cuando mañana llegue pelearemos segun lo que mañana exija" (Mowgli)




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-21 Thread Justin Pryzby
On Sun, Mar 21, 2021 at 01:14:20PM -0500, Justin Pryzby wrote:
> On Sun, Mar 21, 2021 at 03:01:15PM -0300, Alvaro Herrera wrote:
> > > But note that it doesn't check if an existing constraint "implies" the new
> > > constraint - maybe it should.
> > 
> > Hm, I'm not sure I want to do that, because that means that if I later
> > have to attach the partition again with the same partition bounds, then
> > I might have to incur a scan to recheck the constraint.  I think we want
> > to make the new constraint be as tight as possible ...
> 
The ATTACH PARTITION checks if any existing constraint implies the (proposed)
> partition bounds, not just if constraints are equal.  So I'm suggesting to do
> the same here.

On Sun, Mar 21, 2021 at 04:07:12PM -0300, Alvaro Herrera wrote:
> On 2021-Mar-21, Justin Pryzby wrote:
> > On Sun, Mar 21, 2021 at 03:22:00PM -0300, Alvaro Herrera wrote:
> > > So if we do that on DETACH, what would happen on ATTACH?
> > 
> > Do you mean what happens to the constraint that was already there ?
> > Nothing, since it's not ours to mess with.  Checking ImpliedBy() rather than
> > equal() doesn't change that.
> 
> No, I meant what happens regarding checking existing values in the
> table: is the table scanned even if the partition constraint is implied
> by existing table constraints?

I'm still not sure we're talking about the same thing.

Your patch adds a CHECK constraint during DETACH CONCURRENTLY, and I suggested
that it should avoid adding it if it's redundant with an existing constraint,
even if not equal().

The current behavior (since v10) is this:

postgres=# ALTER TABLE p ATTACH PARTITION p1 FOR VALUES FROM (1)TO(2);
DEBUG:  partition constraint for table "p1" is implied by existing constraints
ALTER TABLE

And that wouldn't change, except the CHECK constraint would be added
automatically during detach (if it wasn't already implied).  Maybe the CHECK
constraint should be added without CONCURRENTLY, too.  One fewer difference in
behavior.

-- 
Justin




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-21 Thread Alvaro Herrera
On 2021-Mar-21, Justin Pryzby wrote:

> On Sun, Mar 21, 2021 at 03:22:00PM -0300, Alvaro Herrera wrote:
>
> > So if we do that on DETACH, what would happen on ATTACH?
> 
> Do you mean what happens to the constraint that was already there ?
> Nothing, since it's not ours to mess with.  Checking ImpliedBy() rather than
> equal() doesn't change that.

No, I meant what happens regarding checking existing values in the
table: is the table scanned even if the partition constraint is implied
by existing table constraints?

> I proposed this a few years ago for DETACH (without concurrently), 
> specifically
> to avoid the partition scans.
> https://www.postgresql.org/message-id/20180601221428.gu5...@telsasoft.com
> |The docs say: if detaching/re-attach a partition, should first ADD CHECK to
> |avoid a slow ATTACH operation.  Perhaps DETACHing a partition could
> |implicitly CREATE a constraint which is usable when reATTACHing?

Well, I agree with you that we should add such a constraint.

-- 
Álvaro Herrera   Valdivia, Chile
"The problem with the future is that it keeps turning into the present"
(Hobbes)




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-21 Thread Justin Pryzby
On Sun, Mar 21, 2021 at 03:22:00PM -0300, Alvaro Herrera wrote:
> On 2021-Mar-21, Justin Pryzby wrote:
> 
> > On Sun, Mar 21, 2021 at 03:01:15PM -0300, Alvaro Herrera wrote:
> > > > But note that it doesn't check if an existing constraint "implies" the 
> > > > new
> > > > constraint - maybe it should.
> > > 
> > > Hm, I'm not sure I want to do that, because that means that if I later
> > > have to attach the partition again with the same partition bounds, then
> > > I might have to incur a scan to recheck the constraint.  I think we want
> > > to make the new constraint be as tight as possible ...
> > 
> > The ATTACH PARTITION checks if any existing constraint implies the
> > (proposed) partition bounds, not just if constraints are equal.  So I'm
> > suggesting to do the same here.
> 
> So if we do that on DETACH, what would happen on ATTACH?

Do you mean what happens to the constraint that was already there ?
Nothing, since it's not ours to mess with.  Checking ImpliedBy() rather than
equal() doesn't change that.

I proposed this a few years ago for DETACH (without concurrently), specifically
to avoid the partition scans.
https://www.postgresql.org/message-id/20180601221428.gu5...@telsasoft.com
|The docs say: if detaching/re-attach a partition, should first ADD CHECK to
|avoid a slow ATTACH operation.  Perhaps DETACHing a partition could
|implicitly CREATE a constraint which is usable when reATTACHing?
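
For reference, the manual workflow those docs describe looks roughly
like this (a sketch using the p/p1 tables from earlier in the thread;
bound values invented):

-- add a constraint matching the partition bound before detaching
alter table p1 add constraint p1_bound
    check (i is not null and i >= 1 and i < 2);
alter table p detach partition p1;
-- ... later, re-attaching skips the validation scan because the
-- existing constraint implies the partition bound
alter table p attach partition p1 for values from (1) to (2);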

-- 
Justin




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-21 Thread Alvaro Herrera
On 2021-Mar-21, Justin Pryzby wrote:

> On Sun, Mar 21, 2021 at 03:01:15PM -0300, Alvaro Herrera wrote:
> > > But note that it doesn't check if an existing constraint "implies" the new
> > > constraint - maybe it should.
> > 
> > Hm, I'm not sure I want to do that, because that means that if I later
> > have to attach the partition again with the same partition bounds, then
> > I might have to incur a scan to recheck the constraint.  I think we want
> > to make the new constraint be as tight as possible ...
> 
> The ATTACH PARTITION checks if any existing constraint implies the (proposed)
> partition bounds, not just if constraints are equal.  So I'm suggesting to do
> the same here.

So if we do that on DETACH, what would happen on ATTACH?

-- 
Álvaro Herrera39°49'30"S 73°17'W




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-21 Thread Justin Pryzby
On Sun, Mar 21, 2021 at 03:01:15PM -0300, Alvaro Herrera wrote:
> > But note that it doesn't check if an existing constraint "implies" the new
> > constraint - maybe it should.
> 
> Hm, I'm not sure I want to do that, because that means that if I later
> have to attach the partition again with the same partition bounds, then
> I might have to incur a scan to recheck the constraint.  I think we want
> to make the new constraint be as tight as possible ...

The ATTACH PARTITION checks if any existing constraint implies the (proposed)
partition bounds, not just if constraints are equal.  So I'm suggesting to do
the same here.

-- 
Justin




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-21 Thread Alvaro Herrera
On 2021-Mar-19, Alvaro Herrera wrote:

> diff --git a/src/backend/utils/cache/partcache.c 
> b/src/backend/utils/cache/partcache.c
> index 0fe4f55b04..6dfa3fb4a8 100644
> --- a/src/backend/utils/cache/partcache.c
> +++ b/src/backend/utils/cache/partcache.c
> @@ -352,16 +352,9 @@ generate_partition_qual(Relation rel)
>   return copyObject(rel->rd_partcheck);
>  
>   /*
> -  * Obtain parent relid; if it's invalid, then the partition is being
> -  * detached.  The constraint is NIL in that case, and let's cache that.
> +  * Obtain parent relid.  XXX explain why we need this
>*/
> - parentrelid = get_partition_parent(RelationGetRelid(rel));
> - if (parentrelid == InvalidOid)
> - {
> - rel->rd_partcheckvalid = true;
> - rel->rd_partcheck = NIL;
> - return NIL;
> - }
> + parentrelid = get_partition_parent(RelationGetRelid(rel), true);

One thing that makes me uneasy about this is that I don't understand
how this happens with your test of two psqls doing attach/detach.
(It is necessary for the case when the waiting concurrent detach is
canceled, and so this fix is necessary anyway).  In your test, no
waiting transaction is ever cancelled; so what is the period during
which the relation is not locked that causes this code to be hit?  I
fear that there's a bug in the lock protocol somewhere.

-- 
Álvaro Herrera   Valdivia, Chile
"In Europe they call me Niklaus Wirth; in the US they call me Nickel's worth.
 That's because in Europe they call me by name, and in the US by value!"




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-21 Thread Alvaro Herrera
On 2021-Mar-21, Justin Pryzby wrote:

> On Fri, Mar 19, 2021 at 10:57:37AM -0300, Alvaro Herrera wrote:
> > > Also, it "fails to avoid" adding duplicate constraints:
> > > 
> > > Check constraints:
> > > "c" CHECK (i IS NOT NULL AND i > 1 AND i < 2)
> > > "cc" CHECK (i IS NOT NULL AND i >= 1 AND i < 2)
> > > "p1_check" CHECK (true)
> > > "p1_i_check" CHECK (i IS NOT NULL AND i >= 1 AND i < 2)
> > 
> > Do you mean the "cc" and "p1_i_check" one?  I mean, if you already have
> 
> No, I started with c and cc, and it added the broken constraint p1_check 
> (which
> you say you've fixed) and the redundant constraint p1_i_check.  I guess that's
> what you meant.

Yes, that's what I meant.

> > a constraint in the partition that duplicates the partition constraint,
> > then during attach we still create our new constraint?  I guess a
> > solution to this would be to scan all constraints and see if any equals
> the expression that the new one would have.  Sounds easy enough now that I
> > write it out loud.
> 
> But it looks like DetachAddConstraintIfNeeded already intended to do that:
> 
> +if (equal(constraintExpr, thisconstr))
> +return;

Hah, so I had already done it, but forgot.

> Actually, it appears your latest notpatch resolves both these issues.

Great.

> But note that it doesn't check if an existing constraint "implies" the new
> constraint - maybe it should.

Hm, I'm not sure I want to do that, because that means that if I later
have to attach the partition again with the same partition bounds, then
I might have to incur a scan to recheck the constraint.  I think we want
to make the new constraint be as tight as possible ...

-- 
Álvaro Herrera39°49'30"S 73°17'W




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-21 Thread Justin Pryzby
On Fri, Mar 19, 2021 at 10:57:37AM -0300, Alvaro Herrera wrote:
> > Also, it "fails to avoid" adding duplicate constraints:
> > 
> > Check constraints:
> > "c" CHECK (i IS NOT NULL AND i > 1 AND i < 2)
> > "cc" CHECK (i IS NOT NULL AND i >= 1 AND i < 2)
> > "p1_check" CHECK (true)
> > "p1_i_check" CHECK (i IS NOT NULL AND i >= 1 AND i < 2)
> 
> Do you mean the "cc" and "p1_i_check" one?  I mean, if you already have

No, I started with c and cc, and it added the broken constraint p1_check (which
you say you've fixed) and the redundant constraint p1_i_check.  I guess that's
what you meant.

> a constraint in the partition that duplicates the partition constraint,
> then during attach we still create our new constraint?  I guess a
> solution to this would be to scan all constraints and see if any equals
> the expression that the new one would have.  Sounds easy enough now that I
> write it out loud.

But it looks like DetachAddConstraintIfNeeded already intended to do that:

+            if (equal(constraintExpr, thisconstr))
+                return;

Actually, it appears your latest notpatch resolves both these issues.
But note that it doesn't check if an existing constraint "implies" the new
constraint - maybe it should.

-- 
Justin




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-19 Thread Alvaro Herrera
On 2021-Mar-17, Justin Pryzby wrote:

> The v8 patch has the "broken constraint" problem.

Yeah, I had misunderstood what the problem was.  I think a good solution
to this is to have get_partition_parent return the true parent even when
a detach is pending, for one particular callsite.  (This means adjusting
all other callsites.)  Notpatch attached (applies on top of v8).

> Also, it "fails to avoid" adding duplicate constraints:
> 
> Check constraints:
> "c" CHECK (i IS NOT NULL AND i > 1 AND i < 2)
> "cc" CHECK (i IS NOT NULL AND i >= 1 AND i < 2)
> "p1_check" CHECK (true)
> "p1_i_check" CHECK (i IS NOT NULL AND i >= 1 AND i < 2)

Do you mean the "cc" and "p1_i_check" one?  I mean, if you already have
a constraint in the partition that duplicates the partition constraint,
then during attach we still create our new constraint?  I guess a
solution to this would be to scan all constraints and see if any equals
the expression that the new one would have.  Sounds easy enough now that I
write it out loud.

Thanks

-- 
Álvaro Herrera39°49'30"S 73°17'W
"Java is clearly an example of money oriented programming"  (A. Stepanov)
>From ecec91fc4744ddd9a231754afdf10ec09742f8d9 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 19 Mar 2021 10:52:26 -0300
Subject: [PATCH] Fix for generation of invalid constraints

per report from Justin Pryzby
---
 src/backend/catalog/heap.c  |  2 +-
 src/backend/catalog/partition.c | 32 ++---
 src/backend/commands/tablecmds.c| 12 +--
 src/backend/utils/cache/partcache.c | 11 ++
 src/include/catalog/partition.h |  2 +-
 5 files changed, 26 insertions(+), 33 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 1a05f2a2fe..c6c5e40a39 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1919,7 +1919,7 @@ heap_drop_with_catalog(Oid relid)
elog(ERROR, "cache lookup failed for relation %u", relid);
if (((Form_pg_class) GETSTRUCT(tuple))->relispartition)
{
-   parentOid = get_partition_parent(relid);
+   parentOid = get_partition_parent(relid, true);
LockRelationOid(parentOid, AccessExclusiveLock);
 
/*
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 797a7384ad..a7cf3d9e86 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -42,15 +42,15 @@ static void get_partition_ancestors_worker(Relation inhRel, 
Oid relid,
  *
  * Returns inheritance parent of a partition by scanning pg_inherits
  *
- * If the partition is in the process of being detached, InvalidOid is
- * returned and no error is thrown.
+ * If the partition is in the process of being detached, we raise the same
+ * error unless even_if_detached is passed.
  *
  * Note: Because this function assumes that the relation whose OID is passed
  * as an argument will have precisely one parent, it should only be called
  * when it is known that the relation is a partition.
  */
 Oid
-get_partition_parent(Oid relid)
+get_partition_parent(Oid relid, bool even_if_detached)
 {
RelationcatalogRelation;
Oid result;
@@ -60,10 +60,10 @@ get_partition_parent(Oid relid)
 
result = get_partition_parent_worker(catalogRelation, relid, );
 
-   if (detached)
-   result = InvalidOid;
-   else if (!OidIsValid(result))
+   if (!OidIsValid(result))
elog(ERROR, "could not find tuple for parent of relation %u", 
relid);
+   if (detached && !even_if_detached)
+   elog(ERROR, "relation %u has no parent because it's being 
detached", relid);
 
table_close(catalogRelation, AccessShareLock);
 
@@ -75,8 +75,8 @@ get_partition_parent(Oid relid)
  * Scan the pg_inherits relation to return the OID of the parent 
of the
  * given relation
  *
- * If the partition is being detached, InvalidOid is returned, and also
- * *detached is set true.
+ * If the partition is being detached, *detached is set true (but the original
+ * parent is still returned.)
  */
 static Oid
 get_partition_parent_worker(Relation inhRel, Oid relid, bool *detached)
@@ -106,12 +106,9 @@ get_partition_parent_worker(Relation inhRel, Oid relid, 
bool *detached)
 
/* A partition being detached has no parent */
if (form->inhdetachpending)
-   {
*detached = true;
-   result = InvalidOid;
-   }
-   else
-   result = form->inhparent;
+
+   result = form->inhparent;
}
 
systable_endscan(scan);
@@ -154,9 +151,12 @@ get_partition_ancestors_worker(Relation inhRel, Oid relid, 
List **ancestors)
Oid parentOid;
booldetached;
 
-   /* Recursion ends 

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-17 Thread Justin Pryzby
The v8 patch has the "broken constraint" problem.

Also, it "fails to avoid" adding duplicate constraints:

Check constraints:
"c" CHECK (i IS NOT NULL AND i > 1 AND i < 2)
"cc" CHECK (i IS NOT NULL AND i >= 1 AND i < 2)
"p1_check" CHECK (true)
"p1_i_check" CHECK (i IS NOT NULL AND i >= 1 AND i < 2)

> diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
> index 5c9f4af1d5..0cb846f408 100644
> --- a/doc/src/sgml/catalogs.sgml
> +++ b/doc/src/sgml/catalogs.sgml
> @@ -4485,6 +4485,16 @@ SCRAM-SHA-256$iteration 
> count:
> when using declarative partitioning.
>
>   
> +
> + 
> +  
> +   inhdetachpending bool
> +  
> +  
> +   Set to true for a partition that is in the process of being detached;
> +   false otherwise.
> +  
> + 

Remove "Set to" ?
And say true and false

Probably you'll hate the suggestion, but maybe it should be "pendingdetach".
We already have pg_settings.pending_restart.

> +  If CONCURRENTLY is specified, this process runs in 
> two
> +  transactions in order to avoid blocking other sessions that might be 
> accessing
> +  the partitioned table.  During the first transaction, a
> +  SHARE UPDATE EXCLUSIVE lock is taken on both parent 
> table and
> +  partition, and its partition is marked detached; at that point, the 
> transaction
> +  is committed and all transactions using the partitioned table are 
> waited for.
> +  Once all those transactions are gone, the second stage acquires

Instead of "gone", say "have completed" ?

> +/*
> + * MarkInheritDetached
> + *
> + * When a partition is detached from its parent concurrently, we don't
> + * remove the pg_inherits row until a second transaction; as a preparatory
> + * step, this function marks the entry as 'detached', so that other

*pending detached

> + * The strategy for concurrency is to first modify the partition catalog
> + * rows to make it visible to everyone that the partition is detached,

the inherits catalog?

> + /*
> +  * In concurrent mode, the partition is locked with 
> share-update-exclusive
> +  * in the first transaction.  This allows concurrent transactions to be
> +  * doing DML to the partition.

> + /*
> +  * Check inheritance conditions and either delete the pg_inherits row
> +  * (in non-concurrent mode) or just set the inhisdetached flag.

detachpending




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-17 Thread Alvaro Herrera
On 2021-Mar-15, Alvaro Herrera wrote:

> Here's a fixup patch to do it that way.  I tried running the commands
> you showed and one of them immediately dies with the new error message;
> I can't cause the bogus constraint to show up anymore.

Actually, that was a silly fix that didn't actually work correctly, as I
discovered immediately after sending it.  The right fix is to forbid all
commands other than DETACH PARTITION FINALIZE in a partition that's in
the process of being detached.

In the attached v8, I did that; I also added a ton more tests that
hopefully show how the feature should work in concurrent cases,
including one case in which the transaction doing the detach is
cancelled.  I also renamed "inhdetached" to "inhdetachpending", per
previous discussion, including changing how it looks in psql.

I am not aware of any other loose end in this patch; I consider this
version final.  Barring further problem reports, I'll get this pushed
tomorrow morning.

psql completion is missing. If somebody would like to contribute that,
I'd be grateful.

-- 
Álvaro Herrera39°49'30"S 73°17'W
"If you have nothing to say, maybe you need just the right tool to help you
not say it."   (New York Times, about Microsoft PowerPoint)
>From 101a20d1ccf86a255af166e198c7ac691af58c99 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 13 Jul 2020 20:15:30 -0400
Subject: [PATCH v8 1/2] Let ALTER TABLE exec routines deal with the relation

This means that ATExecCmd relies on AlteredRelationInfo->rel instead of
keeping the relation as a local variable; this is useful if the
subcommand needs to modify the relation internally.  For example, a
subcommand that internally commits its transaction needs this.
---
 src/backend/commands/tablecmds.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index ffb1308a0c..8e753f1efd 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -156,6 +156,8 @@ typedef struct AlteredTableInfo
 	Oid			relid;			/* Relation to work on */
 	char		relkind;		/* Its relkind */
 	TupleDesc	oldDesc;		/* Pre-modification tuple descriptor */
+	/* Transiently set during Phase 2, normally set to NULL */
+	Relation	rel;
 	/* Information saved by Phase 1 for Phase 2: */
 	List	   *subcmds[AT_NUM_PASSES]; /* Lists of AlterTableCmd */
 	/* Information saved by Phases 1/2 for Phase 3: */
@@ -353,7 +355,7 @@ static void ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 	  AlterTableUtilityContext *context);
 static void ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			  AlterTableUtilityContext *context);
-static void ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
+static void ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 	  AlterTableCmd *cmd, LOCKMODE lockmode, int cur_pass,
 	  AlterTableUtilityContext *context);
 static AlterTableCmd *ATParseTransformCmd(List **wqueue, AlteredTableInfo *tab,
@@ -4512,7 +4514,6 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 		{
 			AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab);
 			List	   *subcmds = tab->subcmds[pass];
-			Relation	rel;
 			ListCell   *lcmd;
 
 			if (subcmds == NIL)
@@ -4521,10 +4522,10 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			/*
 			 * Appropriate lock was obtained by phase 1, needn't get it again
 			 */
-			rel = relation_open(tab->relid, NoLock);
+			tab->rel = relation_open(tab->relid, NoLock);
 
 			foreach(lcmd, subcmds)
-ATExecCmd(wqueue, tab, rel,
+ATExecCmd(wqueue, tab,
 		  castNode(AlterTableCmd, lfirst(lcmd)),
 		  lockmode, pass, context);
 
@@ -4536,7 +4537,11 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			if (pass == AT_PASS_ALTER_TYPE)
 ATPostAlterTypeCleanup(wqueue, tab, lockmode);
 
-			relation_close(rel, NoLock);
+			if (tab->rel)
+			{
+relation_close(tab->rel, NoLock);
+tab->rel = NULL;
+			}
 		}
 	}
 
@@ -4562,11 +4567,12 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
  * ATExecCmd: dispatch a subcommand to appropriate execution routine
  */
 static void
-ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
+ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 		  AlterTableCmd *cmd, LOCKMODE lockmode, int cur_pass,
 		  AlterTableUtilityContext *context)
 {
 	ObjectAddress address = InvalidObjectAddress;
+	Relation	rel = tab->rel;
 
 	switch (cmd->subtype)
 	{
@@ -5669,6 +5675,7 @@ ATGetQueueEntry(List **wqueue, Relation rel)
 	 */
 	tab = (AlteredTableInfo *) palloc0(sizeof(AlteredTableInfo));
 	tab->relid = relid;
+	tab->rel = NULL;			/* set later */
 	tab->relkind = rel->rd_rel->relkind;
 	tab->oldDesc = CreateTupleDescCopyConstr(RelationGetDescr(rel));
 	tab->newrelpersistence = RELPERSISTENCE_PERMANENT;
-- 
2.20.1

>From 6e893b5f2e7abae9e55ba0d1487d4621272d2cdf Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-15 Thread Alvaro Herrera
On 2021-Feb-26, Alvaro Herrera wrote:

> Hmm, but if we take this approach, then we're still vulnerable to the
> problem that somebody can do DETACH CONCURRENTLY and cancel the wait (or
> crash the server), then mess up the state before doing DETACH FINALIZE:
> when they cancel the wait, the lock will be released.
> 
> I think the right fix is to disallow any action on a partition which is
> pending detach other than DETACH FINALIZE.  (Didn't do that here.)

Here's a fixup patch to do it that way.  I tried running the commands
you showed and one of them immediately dies with the new error message;
I can't cause the bogus constraint to show up anymore.

I'll clean this up for a real submission tomorrow.

-- 
Álvaro Herrera39°49'30"S 73°17'W
"The Gord often wonders why people threaten never to come back after they've
been told never to return" (www.actsofgord.com)
diff --git a/src/backend/catalog/pg_inherits.c 
b/src/backend/catalog/pg_inherits.c
index 3fd0880ca0..83093f9730 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -535,3 +535,39 @@ DeleteInheritsTuple(Oid inhrelid, Oid inhparent, bool 
detached,
 
return found;
 }
+
+bool
+PartitionHasPendingDetach(Oid partoid)
+{
+   RelationcatalogRelation;
+   ScanKeyData key;
+   SysScanDesc scan;
+   HeapTuple   inheritsTuple;
+
+   /*
+* Find pg_inherits entries by inhrelid.
+*/
+   catalogRelation = table_open(InheritsRelationId, RowExclusiveLock);
+   ScanKeyInit(,
+   Anum_pg_inherits_inhrelid,
+   BTEqualStrategyNumber, F_OIDEQ,
+   ObjectIdGetDatum(partoid));
+   scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId,
+ true, NULL, 1, );
+
+   while (HeapTupleIsValid(inheritsTuple = systable_getnext(scan)))
+   {
+   booldetached;
+
+   detached = ((Form_pg_inherits) 
GETSTRUCT(inheritsTuple))->inhdetached;
+
+   /* Done */
+   systable_endscan(scan);
+   table_close(catalogRelation, RowExclusiveLock);
+
+   return detached;
+   }
+
+   elog(ERROR, "relation %u is not a partition", partoid);
+   return false;   /* keep compiler quiet */
+}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 5986b4dd56..5074c0ff2b 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -17106,6 +17106,8 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo 
*tab, Relation rel,
 */
partRel = table_openrv(name, concurrent ? ShareLock :
   AccessExclusiveLock);
+   if (PartitionHasPendingDetach(RelationGetRelid(partRel)))
+   elog(ERROR, "hey, don't!");
 
/*
 * Check inheritance conditions and either delete the pg_inherits row
diff --git a/src/include/catalog/pg_inherits.h 
b/src/include/catalog/pg_inherits.h
index 155596907c..19243ea9e4 100644
--- a/src/include/catalog/pg_inherits.h
+++ b/src/include/catalog/pg_inherits.h
@@ -61,5 +61,6 @@ extern void StoreSingleInheritance(Oid relationId, Oid 
parentOid,
   int32 
seqNumber);
 extern bool DeleteInheritsTuple(Oid inhrelid, Oid inhparent, bool 
allow_detached,
const char 
*childname);
+extern bool PartitionHasPendingDetach(Oid partoid);
 
 #endif /* PG_INHERITS_H */


Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-11 Thread Alvaro Herrera
Rebase to current sources, to appease CF bot; no other changes.


-- 
Álvaro Herrera39°49'30"S 73°17'W
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index b1de6d0674..ea3ae56991 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -4485,6 +4485,16 @@ SCRAM-SHA-256$iteration count:
when using declarative partitioning.
   
  
+
+ 
+  
+   inhdetached bool
+  
+  
+   Set to true for a partition that is in the process of being detached;
+   false otherwise.
+  
+ 
 

   
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index c25ef5abd6..09673dcf1c 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -36,7 +36,9 @@ ALTER TABLE ALL IN TABLESPACE name
 ALTER TABLE [ IF EXISTS ] name
 ATTACH PARTITION partition_name { FOR VALUES partition_bound_spec | DEFAULT }
 ALTER TABLE [ IF EXISTS ] name
-DETACH PARTITION partition_name
+DETACH PARTITION partition_name [ CONCURRENTLY ]
+ALTER TABLE [ IF EXISTS ] name
+DETACH PARTITION partition_name FINALIZE
 
 where action is one of:
 
@@ -938,7 +940,8 @@ WITH ( MODULUS numeric_literal, REM

 

-DETACH PARTITION partition_name
+DETACH PARTITION partition_name [ { CONCURRENTLY | FINALIZE } ]
+
 
  
   This form detaches the specified partition of the target table.  The detached
@@ -947,6 +950,24 @@ WITH ( MODULUS numeric_literal, REM
   attached to the target table's indexes are detached.  Any triggers that
   were created as clones of those in the target table are removed.
  
+ 
+  If CONCURRENTLY is specified, this process runs in two
+  transactions in order to avoid blocking other sessions that might be accessing
+  the partitioned table.  During the first transaction, a
+  SHARE UPDATE EXCLUSIVE lock is taken on both parent table and
+  partition, and its partition is marked detached; at that point, the transaction
+  is committed and all transactions using the partitioned table are waited for.
+  Once all those transactions are gone, the second stage acquires
+  ACCESS EXCLUSIVE on the partition, and the detach process
+  completes.
+  CONCURRENTLY is not allowed if the
+  partitioned table contains a default partition.
+ 
+ 
+  If FINALIZE is specified, a previous
+  DETACH CONCURRENTLY invocation that was cancelled or
+  interrupted is completed.
+ 
 

 
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 9abc4a1f55..1a05f2a2fe 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2555,10 +2555,12 @@ StoreConstraints(Relation rel, List *cooked_constraints, bool is_internal)
  * Returns a list of CookedConstraint nodes that shows the cooked form of
  * the default and constraint expressions added to the relation.
  *
- * NB: caller should have opened rel with AccessExclusiveLock, and should
- * hold that lock till end of transaction.  Also, we assume the caller has
- * done a CommandCounterIncrement if necessary to make the relation's catalog
- * tuples visible.
+ * NB: caller should have opened rel with some self-conflicting lock mode,
+ * and should hold that lock till end of transaction; for normal cases that'll
+ * be AccessExclusiveLock, but if caller knows that the constraint is already
+ * enforced by some other means, it can be ShareUpdateExclusiveLock.  Also, we
+ * assume the caller has done a CommandCounterIncrement if necessary to make
+ * the relation's catalog tuples visible.
  */
 List *
 AddRelationNewConstraints(Relation rel,
@@ -3827,7 +3829,8 @@ StorePartitionBound(Relation rel, Relation parent, PartitionBoundSpec *bound)
 	 * relcache entry for that partition every time a partition is added or
 	 * removed.
 	 */
-	defaultPartOid = get_default_oid_from_partdesc(RelationGetPartitionDesc(parent));
+	defaultPartOid =
+		get_default_oid_from_partdesc(RelationGetPartitionDesc(parent, false));
 	if (OidIsValid(defaultPartOid))
 		CacheInvalidateRelcacheByRelid(defaultPartOid);
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 4ef61b5efd..28f04a0700 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1836,7 +1836,7 @@ index_concurrently_swap(Oid newIndexId, Oid oldIndexId, const char *oldName)
 		List	   *ancestors = get_partition_ancestors(oldIndexId);
 		Oid			parentIndexRelid = linitial_oid(ancestors);
 
-		DeleteInheritsTuple(oldIndexId, parentIndexRelid);
+		DeleteInheritsTuple(oldIndexId, parentIndexRelid, false, NULL);
 		StoreSingleInheritance(newIndexId, parentIndexRelid, 1);
 
 		list_free(ancestors);
@@ -2486,7 +2486,7 @@ index_drop(Oid indexId, bool concurrent, bool concurrent_lock_mode)
 	/*
 	 * fix INHERITS relation
 	 */
-	DeleteInheritsTuple(indexId, InvalidOid);
+	DeleteInheritsTuple(indexId, 

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-02-26 Thread Alvaro Herrera
On 2021-Jan-10, Justin Pryzby wrote:

> On Fri, Jan 08, 2021 at 04:14:33PM -0300, Alvaro Herrera wrote:
> > > > I ended up with apparently broken constraint when running multiple 
> > > > loops around
> > > > a concurrent detach / attach:
> > > > 
> > > > while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR 
> > > > VALUES FROM (1)TO(2)" -c "ALTER TABLE p DETACH PARTITION p1 
> > > > CONCURRENTLY"; do :; done&
> > > > while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR 
> > > > VALUES FROM (1)TO(2)" -c "ALTER TABLE p DETACH PARTITION p1 
> > > > CONCURRENTLY"; do :; done&
> > > > 
> > > > "p1_check" CHECK (true)
> > > > "p1_i_check" CHECK (i IS NOT NULL AND i >= 1 AND i < 2)
> > > 
> > > Not good.
> > 
> > Haven't had time to investigate this problem yet.
> 
> I guess it's because you committed the txn and released lock in the middle of
> the command.

Hmm, but if we take this approach, then we're still vulnerable to the
problem that somebody can do DETACH CONCURRENTLY and cancel the wait (or
crash the server), then mess up the state before doing DETACH FINALIZE:
when they cancel the wait, the lock will be released.

I think the right fix is to disallow any action on a partition which is
pending detach other than DETACH FINALIZE.  (Didn't do that here.)
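
Under that rule, the expected behavior would look roughly like this (a
sketch; the exact error message is not settled):

alter table p detach partition p1 concurrently;
-- ... cancelled while waiting ...
alter table p1 add column j int;
-- ERROR: partition "p1" is pending detach; only DETACH FINALIZE allowed
alter table p detach partition p1 finalize;  -- completes the detach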

Here's a rebase to current sources; there are no changes from v5.

-- 
Álvaro Herrera   Valdivia, Chile
"No hay ausente sin culpa ni presente sin disculpa" (Prov. francés)
>From d042b99ad3368ba0b658ac9260450ed8e39accea Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 13 Jul 2020 20:15:30 -0400
Subject: [PATCH v6 1/2] Let ALTER TABLE exec routines deal with the relation

This means that ATExecCmd relies on AlteredRelationInfo->rel instead of
keeping the relation as a local variable; this is useful if the
subcommand needs to modify the relation internally.  For example, a
subcommand that internally commits its transaction needs this.
---
 src/backend/commands/tablecmds.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b2457a6924..b990063d38 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -156,6 +156,8 @@ typedef struct AlteredTableInfo
 	Oid			relid;			/* Relation to work on */
 	char		relkind;		/* Its relkind */
 	TupleDesc	oldDesc;		/* Pre-modification tuple descriptor */
+	/* Transiently set during Phase 2, normally set to NULL */
+	Relation	rel;
 	/* Information saved by Phase 1 for Phase 2: */
 	List	   *subcmds[AT_NUM_PASSES]; /* Lists of AlterTableCmd */
 	/* Information saved by Phases 1/2 for Phase 3: */
@@ -353,7 +355,7 @@ static void ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 	  AlterTableUtilityContext *context);
 static void ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			  AlterTableUtilityContext *context);
-static void ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
+static void ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 	  AlterTableCmd *cmd, LOCKMODE lockmode, int cur_pass,
 	  AlterTableUtilityContext *context);
 static AlterTableCmd *ATParseTransformCmd(List **wqueue, AlteredTableInfo *tab,
@@ -4513,7 +4515,6 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 		{
 			AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab);
 			List	   *subcmds = tab->subcmds[pass];
-			Relation	rel;
 			ListCell   *lcmd;
 
 			if (subcmds == NIL)
@@ -4522,10 +4523,10 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			/*
 			 * Appropriate lock was obtained by phase 1, needn't get it again
 			 */
-			rel = relation_open(tab->relid, NoLock);
+			tab->rel = relation_open(tab->relid, NoLock);
 
 			foreach(lcmd, subcmds)
-ATExecCmd(wqueue, tab, rel,
+ATExecCmd(wqueue, tab,
 		  castNode(AlterTableCmd, lfirst(lcmd)),
 		  lockmode, pass, context);
 
@@ -4537,7 +4538,11 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			if (pass == AT_PASS_ALTER_TYPE)
 ATPostAlterTypeCleanup(wqueue, tab, lockmode);
 
-			relation_close(rel, NoLock);
+			if (tab->rel)
+			{
+relation_close(tab->rel, NoLock);
+tab->rel = NULL;
+			}
 		}
 	}
 
@@ -4563,11 +4568,12 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
  * ATExecCmd: dispatch a subcommand to appropriate execution routine
  */
 static void
-ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
+ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 		  AlterTableCmd *cmd, LOCKMODE lockmode, int cur_pass,
 		  AlterTableUtilityContext *context)
 {
 	ObjectAddress address = InvalidObjectAddress;
+	Relation	rel = tab->rel;
 
 	switch (cmd->subtype)
 	{
@@ -5670,6 +5676,7 @@ ATGetQueueEntry(List **wqueue, Relation rel)
 	 */
 	tab = (AlteredTableInfo *) palloc0(sizeof(AlteredTableInfo));
 	tab->relid = relid;
+	tab->rel = NULL;			/* set later */
 	tab->relkind = rel->rd_rel->relkind;
 	tab->oldDesc 

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-01-10 Thread Justin Pryzby
On Fri, Jan 08, 2021 at 04:14:33PM -0300, Alvaro Herrera wrote:
> > > I ended up with apparently broken constraint when running multiple loops 
> > > around
> > > a concurrent detach / attach:
> > > 
> > > while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR 
> > > VALUES FROM (1)TO(2)" -c "ALTER TABLE p DETACH PARTITION p1 
> > > CONCURRENTLY"; do :; done&
> > > while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR 
> > > VALUES FROM (1)TO(2)" -c "ALTER TABLE p DETACH PARTITION p1 
> > > CONCURRENTLY"; do :; done&
> > > 
> > > "p1_check" CHECK (true)
> > > "p1_i_check" CHECK (i IS NOT NULL AND i >= 1 AND i < 2)
> > 
> > Not good.
> 
> Haven't had time to investigate this problem yet.

I guess it's because you committed the txn and released lock in the middle of
the command.

-- 
Justin
>From e18c11fd5bcc4f5cd981a3219383265b55974f34 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 10 Jan 2021 15:41:43 -0600
Subject: [PATCH] fix

---
 src/backend/commands/tablecmds.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d7b9c63e5f..144c27c303 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -17131,6 +17131,7 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		Oid		partrelid,
 parentrelid;
 		LOCKTAG		tag;
+		LockRelId	partlockrelid;
 		char   *parentrelname;
 		char   *partrelname;
 
@@ -17162,6 +17163,10 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		table_close(rel, NoLock);
 		tab->rel = NULL;
 
+		partlockrelid.relId = parentrelid;
+		partlockrelid.dbId = MyDatabaseId;
+		LockRelationIdForSession(, ShareUpdateExclusiveLock);
+
 		/* Make updated catalog entry visible */
 		PopActiveSnapshot();
 		CommitTransactionCommand();
@@ -17204,7 +17209,7 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	 errmsg("partition \"%s\" was removed concurrently", partrelname)));
 
 		tab->rel = rel;
-
+		UnlockRelationIdForSession(, ShareUpdateExclusiveLock);
 	}
 
 	/* Do the final part of detaching */
@@ -17444,7 +17449,10 @@ DetachAddConstraintIfNeeded(List **wqueue, Relation partRel)
 	TupleDesc	td = RelationGetDescr(partRel);
 	Constraint *n;
 
-	constraintExpr = make_ands_explicit(RelationGetPartitionQual(partRel));
+	List *l = RelationGetPartitionQual(partRel);
+	Assert(partRel->rd_rel->relispartition);
+	Assert(l != NIL);
+	constraintExpr = make_ands_explicit(l);
 
 	/* If an identical constraint exists, we don't need to create one */
 	if (td->constr && td->constr->num_check > 0)
-- 
2.17.0



Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-01-08 Thread Alvaro Herrera
On 2020-Dec-01, Alvaro Herrera wrote:

> On 2020-Nov-30, Justin Pryzby wrote:

> Thanks for all the comments. I'll incorporate everything and submit an
> updated version later.

Here's a rebased version 5, with the typos fixed.  More comments below.

> > The attname "detached" is a stretch of what's intuitive (it's more like
> > "detachING" or half-detached).  But I think psql should for sure show 
> > something
> > more obvious to users.  Esp. seeing as psql output isn't documented.  Let's
> > figure out what to show to users and then maybe rename the column that, too.
> 
> OK.  I agree that "being detached" is the state we want users to see, or
> maybe "detach pending", or "unfinisheddetach" (ugh).  I'm not sure that
> "pg_inherits.inhbeingdetached" is a great column name.  Opinions welcome.

I haven't changed this yet; I can't make up my mind about what I like
best.

Partition of: parent FOR VALUES IN (1) UNFINISHED DETACH
Partition of: parent FOR VALUES IN (1) UNDER DETACH
Partition of: parent FOR VALUES IN (1) BEING DETACHED

> > ATExecDetachPartition:
> > Doesn't this need to lock the table before testing for default partition ?
> 
> Correct, it does.

I failed to point out that by the time ATExecDetachPartition is called,
the relation has already been locked by the invoking ALTER TABLE support
code.

> > I ended up with apparently broken constraint when running multiple loops 
> > around
> > a concurrent detach / attach:
> > 
> > while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR 
> > VALUES FROM (1)TO(2)" -c "ALTER TABLE p DETACH PARTITION p1 CONCURRENTLY"; 
> > do :; done&
> > while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR 
> > VALUES FROM (1)TO(2)" -c "ALTER TABLE p DETACH PARTITION p1 CONCURRENTLY"; 
> > do :; done&
> > 
> > "p1_check" CHECK (true)
> > "p1_i_check" CHECK (i IS NOT NULL AND i >= 1 AND i < 2)
> 
> Not good.

Haven't had time to investigate this problem yet.

-- 
Álvaro Herrera
>From d21acb0e99ed3d296db70fed2d5d036d721d611b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 13 Jul 2020 20:15:30 -0400
Subject: [PATCH v5 1/2] Let ALTER TABLE exec routines deal with the relation

This means that ATExecCmd relies on AlteredRelationInfo->rel instead of
keeping the relation as a local variable; this is useful if the
subcommand needs to modify the relation internally.  For example, a
subcommand that internally commits its transaction needs this.
---
 src/backend/commands/tablecmds.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 993da56d43..a8528a3423 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -156,6 +156,8 @@ typedef struct AlteredTableInfo
 	Oid			relid;			/* Relation to work on */
 	char		relkind;		/* Its relkind */
 	TupleDesc	oldDesc;		/* Pre-modification tuple descriptor */
+	/* Transiently set during Phase 2, normally set to NULL */
+	Relation	rel;
 	/* Information saved by Phase 1 for Phase 2: */
 	List	   *subcmds[AT_NUM_PASSES]; /* Lists of AlterTableCmd */
 	/* Information saved by Phases 1/2 for Phase 3: */
@@ -353,7 +355,7 @@ static void ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 	  AlterTableUtilityContext *context);
 static void ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			  AlterTableUtilityContext *context);
-static void ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
+static void ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 	  AlterTableCmd *cmd, LOCKMODE lockmode, int cur_pass,
 	  AlterTableUtilityContext *context);
 static AlterTableCmd *ATParseTransformCmd(List **wqueue, AlteredTableInfo *tab,
@@ -4405,7 +4407,6 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 		{
 			AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab);
 			List	   *subcmds = tab->subcmds[pass];
-			Relation	rel;
 			ListCell   *lcmd;
 
 			if (subcmds == NIL)
@@ -4414,10 +4415,10 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			/*
 			 * Appropriate lock was obtained by phase 1, needn't get it again
 			 */
-			rel = relation_open(tab->relid, NoLock);
+			tab->rel = relation_open(tab->relid, NoLock);
 
 			foreach(lcmd, subcmds)
-ATExecCmd(wqueue, tab, rel,
+ATExecCmd(wqueue, tab,
 		  castNode(AlterTableCmd, lfirst(lcmd)),
 		  lockmode, pass, context);
 
@@ -4429,7 +4430,11 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			if (pass == AT_PASS_ALTER_TYPE)
 ATPostAlterTypeCleanup(wqueue, tab, lockmode);
 
-			relation_close(rel, NoLock);
+			if (tab->rel)
+			{
+relation_close(tab->rel, NoLock);
+tab->rel = NULL;
+			}
 		}
 	}
 
@@ -4455,11 +4460,12 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
  * ATExecCmd: dispatch a subcommand to appropriate execution routine
  */
 static void
-ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-12-25 Thread Andy Fan
On Tue, Aug 4, 2020 at 7:49 AM Alvaro Herrera wrote:

> I've been working on the ability to detach a partition from a
> partitioned table, without causing blockages to concurrent activity.
> I think this operation is critical for some use cases.
>
>
This would be a great feature.  When we can't handle thousands of
partitions very well, and the user agrees to detach some old partitions
automatically, the blocking issue here would be a big obstacle to that
solution.  Thanks for working on this!



> There was a lot of great discussion which ended up in Robert completing
> a much sought implementation of non-blocking ATTACH.  DETACH was
> discussed too because it was a goal initially, but eventually dropped
> from that patch altogether. Nonetheless, that thread provided a lot of
> useful input to this implementation.  Important ones:
>
> [1]
> https://postgr.es/m/CA+TgmoYg4x7AH=_QSptvuBKf+3hUdiCa4frPkt+RvXZyjX1n=w...@mail.gmail.com
> [2]
> https://postgr.es/m/CA+TgmoaAjkTibkEr=xJg3ndbRsHHSiYi2SJgX69MVosj=lj...@mail.gmail.com
> [3]
> https://postgr.es/m/CA+TgmoY13KQZF-=hntrt9uywyx3_oyoqpu9iont49jggidp...@mail.gmail.com
>
> Attached is a patch that implements
> ALTER TABLE ... DETACH PARTITION .. CONCURRENTLY.
>
> In the previous thread we were able to implement the concurrent model
> without the extra keyword.  For this one I think that won't work; my
> implementation works in two transactions so there's a restriction that
> you can't run it in a transaction block.  Also, there's a wait phase
> that makes it slower than the non-concurrent one.  Those two drawbacks
> make me think that it's better to keep both modes available, just like
> we offer both CREATE INDEX and CREATE INDEX CONCURRENTLY.
>
> Why two transactions?  The reason is that in order for this to work, we
> make a catalog change (mark it detached), and commit so that all
> concurrent transactions can see the change.  A second transaction waits
> for anybody who holds any lock on the partitioned table and grabs Access
> Exclusive on the partition (which now no one cares about, if they're
> looking at the partitioned table), where the DDL action on the partition
> can be completed.
>
> ALTER TABLE is normally unable to run in two transactions.  I hacked it
> (0001) so that the relation can be closed and reopened in the Exec phase
> (by having the rel as part of AlteredTableInfo: when ATRewriteCatalogs
> returns, it uses that pointer to close the rel).  It turns out that this
> is sufficient to make that work.  This means that ALTER TABLE DETACH
> CONCURRENTLY cannot work as part of a multi-command ALTER TABLE, but
> that's already enforced by the grammar anyway.
>
> DETACH CONCURRENTLY doesn't work if a default partition exists.  It's
> just too problematic a case; you would still need to have AEL on the
> default partition.
>
>
> I haven't yet experimented with queries running in a standby in tandem
> with a detach.
>
> --
> Álvaro Herrera
>


-- 
Best Regards
Andy Fan


Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-12-01 Thread Alvaro Herrera
Hi Justin,

Thanks for all the comments. I'll incorporate everything and submit an
updated version later.

On 2020-Nov-30, Justin Pryzby wrote:

> On Tue, Nov 03, 2020 at 08:56:06PM -0300, Alvaro Herrera wrote:

> > +++ b/src/bin/psql/describe.c
> > -   printfPQExpBuffer(, _("Partition of: %s %s"), 
> > parent_name,
> > - partdef);
> > +   printfPQExpBuffer(, _("Partition of: %s %s%s"), 
> > parent_name,
> > + partdef,
> > + strcmp(detached, "t") 
> > == 0 ? " DETACHED" : "");
> 
> The attname "detached" is a stretch of what's intuitive (it's more like
> "detachING" or half-detached).  But I think psql should for sure show 
> something
> more obvious to users.  Esp. seeing as psql output isn't documented.  Let's
> figure out what to show to users and then maybe rename the column that, too.

OK.  I agree that "being detached" is the state we want users to see, or
maybe "detach pending", or "unfinisheddetach" (ugh).  I'm not sure that
"pg_inherits.inhbeingdetached" is a great column name.  Opinions welcome.

> > +PG_KEYWORD("finalize", FINALIZE, UNRESERVED_KEYWORD, BARE_LABEL)
> 
> Instead of finalize .. deferred ?  Or ??

Well, I'm thinking that this has to be a verb in the imperative mood.
The user is commanding the server to "finalize this detach operation".
I'm not sure that DEFERRED fits that grammatical role.  If there are
other ideas, let's discuss them.

ALTER TABLE tst DETACH PARTITION tst_1 FINALIZE  <-- decent
ALTER TABLE tst DETACH PARTITION tst_1 COMPLETE  <-- I don't like it
ALTER TABLE tst DETACH PARTITION tst_1 DEFERRED  <-- grammatically faulty?

> ATExecDetachPartition:
> Doesn't this need to lock the table before testing for default partition ?

Correct, it does.

> I ended up with an apparently broken constraint when running multiple loops
> around a concurrent detach / attach:
> 
> while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR VALUES FROM (1) TO (2)" -c "ALTER TABLE p DETACH PARTITION p1 CONCURRENTLY"; do :; done&
> while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR VALUES FROM (1) TO (2)" -c "ALTER TABLE p DETACH PARTITION p1 CONCURRENTLY"; do :; done&
> 
> "p1_check" CHECK (true)
> "p1_i_check" CHECK (i IS NOT NULL AND i >= 1 AND i < 2)

Not good.




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-11-30 Thread Justin Pryzby
On Tue, Nov 03, 2020 at 08:56:06PM -0300, Alvaro Herrera wrote:
> Here's an updated version of this patch.
> 
> Apart from rebasing to current master, I made the following changes:
> 
> * On the first transaction (the one that marks the partition as
> detached), the partition is locked with ShareLock rather than
> ShareUpdateExclusiveLock.  This means that DML is not allowed anymore.
> This seems a reasonable restriction to me; surely by the time you're
> detaching the partition you're not inserting data into it anymore.

I don't think it's an issue with your patch, but FYI that sounds like something
I had to do recently: detach *all* partitions of various tables to promote their
partition key column from timestamp to timestamptz.  And we insert directly
into child tables, not routed via parent.

I don't doubt your patch is still useful, just not to us.  So the documentation
should be clear about that.

> * ALTER TABLE .. DETACH PARTITION FINALIZE now waits for concurrent old
> snapshots, as previously discussed. This should ensure that the user
> doesn't just cancel the wait during the second transaction by Ctrl-C and
> run FINALIZE immediately afterwards, which I claimed would bring
> inconsistency.
> 
> * Avoid creating the CHECK constraint if an identical one already
> exists.
> 
> (Note I do not remove the constraint on ATTACH.  That seems pointless.)
> 
> Still to do: test this using the new hook added by 6f0b632f083b.

tab complete?

> +* Ensure that foreign keys still hold after this detach.  This keeps 
> lock
> +* on the referencing tables, which prevent concurrent transactions 
> from

keeps locks or
which prevents

> +++ b/doc/src/sgml/ref/alter_table.sgml
> @@ -947,6 +950,24 @@ WITH ( MODULUS numeric_literal, REM
>attached to the target table's indexes are detached.  Any triggers that
>were created as clones of those in the target table are removed.
>   
> + 
> +  If CONCURRENTLY is specified, this process runs in 
> two
> +  transactions in order to avoid blocking other sessions that might be 
> accessing
> +  the partitioned table.  During the first transaction,
> +  SHARE UPDATE EXCLUSIVE is taken in both parent 
> table and

missing "lock"
taken *on* ?

> +  partition, and the partition is marked detached; at that point, the 
> transaction

probably "its partition,"

> +  If FINALIZE is specified, complete actions of a
> +  previous DETACH CONCURRENTLY invocation that
> +  was cancelled or crashed.

say "actions are completed" or:

  If FINALIZE is specified, a previous DETACH that was cancelled or interrupted
  is completed.

> + if (!inhdetached && detached)
> + ereport(ERROR,
> + 
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +  errmsg("cannot complete 
> detaching partition \"%s\"",
> + childname),
> +  errdetail("There's no partial 
> concurrent detach in progress.")));

maybe say "partially-complete" or remove "partial"

> +  * the partition being detached?  Putting them on the partition 
> being
> +  * detached would be wrong, since they'd become "lost" after 
> the but
after *that* ?

> +  * Concurrent mode has to work harder; first we add a new constraint to 
> the
> +  * partition that matches the partition constraint, if there isn't a 
> matching
> +  * one already.  The reason for this is that the planner may have made
> +  * optimizations that depend on the constraint.  XXX Isn't it 
> sufficient to
> +  * invalidate the partition's relcache entry?

Ha.  I suggested this years ago.
https://www.postgresql.org/message-id/20180601221428.gu5...@telsasoft.com
|. The docs say: if detaching/re-attach a partition, should first ADD CHECK to
|  avoid a slow ATTACH operation.  Perhaps DETACHing a partition could
|  implicitly CREATE a constraint which is usable when reATTACHing?
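
Concretely, the documented workaround looks something like this (a sketch;
the p/p1 names and bounds are illustrative):

ALTER TABLE p1 ADD CONSTRAINT p1_i_check
  CHECK (i IS NOT NULL AND i >= 1 AND i < 2);  -- matches the partition constraint
ALTER TABLE p ATTACH PARTITION p1 FOR VALUES FROM (1) TO (2);
-- ATTACH finds an equivalent constraint and skips the validation scan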

> +  * Then we close our existing transaction, and in a new one wait for
> +  * all process to catch up on the catalog updates we've done so far; at

processes

> +  * We don't need to concern ourselves with waiting for a lock 
> the
> +  * partition itself, since we will acquire AccessExclusiveLock 
> below.

lock *on* ?

> +  * If asked to, wait until existing snapshots are gone.  This is 
> important
> +  * in the second transaction of DETACH PARTITION CONCURRENTLY is 
> canceled:

s/in/if/

> +++ b/src/bin/psql/describe.c
> - printfPQExpBuffer(, _("Partition of: %s %s"), 
> parent_name,
> -   partdef);
> + printfPQExpBuffer(, _("Partition of: %s %s%s"), 
> parent_name,
> + 

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-11-30 Thread Alvaro Herrera
On 2020-Nov-30, Anastasia Lubennikova wrote:

> The commitfest is nearing the end and this thread is "Waiting on Author".
> As far as I see, the last message contains a patch. Is there anything left to
> work on, or does it need review now? Alvaro, are you planning to continue working
> on it?

Thanks Anastasia.  I marked it as needs review.




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-11-30 Thread Anastasia Lubennikova

On 04.11.2020 02:56, Alvaro Herrera wrote:

Here's an updated version of this patch.

Apart from rebasing to current master, I made the following changes:

* On the first transaction (the one that marks the partition as
detached), the partition is locked with ShareLock rather than
ShareUpdateExclusiveLock.  This means that DML is not allowed anymore.
This seems a reasonable restriction to me; surely by the time you're
detaching the partition you're not inserting data into it anymore.

* In ExecInitPartitionDispatchInfo, the previous version always
excluded detached partitions.  Now it does include them in isolation
level repeatable read and higher.  Considering the point above, this
sounds a bit contradictory: you shouldn't be inserting new tuples in
partitions being detached.  But if you do, it makes more sense: in RR
two queries that insert tuples in the same partition would not fail
mid-transaction.  (In a read-committed transaction, the second query
does fail, but to me that does not sound surprising.)

* ALTER TABLE .. DETACH PARTITION FINALIZE now waits for concurrent old
snapshots, as previously discussed. This should ensure that the user
doesn't just cancel the wait during the second transaction by Ctrl-C and
run FINALIZE immediately afterwards, which I claimed would bring
inconsistency.

* Avoid creating the CHECK constraint if an identical one already
exists.

(Note I do not remove the constraint on ATTACH.  That seems pointless.)

Still to do: test this using the new hook added by 6f0b632f083b.


Status update for a commitfest entry.

The commitfest is nearing the end and this thread is "Waiting on Author".
As far as I see, the last message contains a patch. Is there anything
left to work on, or does it need review now? Alvaro, are you planning to
continue working on it?


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-11-03 Thread Alvaro Herrera
Here's an updated version of this patch.

Apart from rebasing to current master, I made the following changes:

* On the first transaction (the one that marks the partition as
detached), the partition is locked with ShareLock rather than
ShareUpdateExclusiveLock.  This means that DML is not allowed anymore.
This seems a reasonable restriction to me; surely by the time you're
detaching the partition you're not inserting data into it anymore.

* In ExecInitPartitionDispatchInfo, the previous version always
excluded detached partitions.  Now it does include them in isolation
level repeatable read and higher.  Considering the point above, this
sounds a bit contradictory: you shouldn't be inserting new tuples in
partitions being detached.  But if you do, it makes more sense: in RR
two queries that insert tuples in the same partition would not fail
mid-transaction.  (In a read-committed transaction, the second query
does fail, but to me that does not sound surprising.)

* ALTER TABLE .. DETACH PARTITION FINALIZE now waits for concurrent old
snapshots, as previously discussed. This should ensure that the user
doesn't just cancel the wait during the second transaction by Ctrl-C and
run FINALIZE immediately afterwards, which I claimed would bring
inconsistency.

* Avoid creating the CHECK constraint if an identical one already
exists.

(Note I do not remove the constraint on ATTACH.  That seems pointless.)

Still to do: test this using the new hook added by 6f0b632f083b.
>From 8135f82493892ad755176c8c4c8251355dae2927 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 3 Aug 2020 19:21:09 -0400
Subject: [PATCH v4] ALTER TABLE ... DETACH CONCURRENTLY

---
 doc/src/sgml/catalogs.sgml|  10 +
 doc/src/sgml/ref/alter_table.sgml |  25 +-
 src/backend/catalog/heap.c|  13 +-
 src/backend/catalog/index.c   |   4 +-
 src/backend/catalog/partition.c   |  28 +-
 src/backend/catalog/pg_inherits.c |  51 +-
 src/backend/commands/indexcmds.c  |   7 +-
 src/backend/commands/tablecmds.c  | 587 ++
 src/backend/commands/trigger.c|   5 +-
 src/backend/executor/execPartition.c  |  29 +-
 src/backend/nodes/copyfuncs.c |   1 +
 src/backend/nodes/equalfuncs.c|   1 +
 src/backend/optimizer/util/plancat.c  |  11 +-
 src/backend/parser/gram.y |  23 +-
 src/backend/partitioning/partbounds.c |   6 +-
 src/backend/partitioning/partdesc.c   |  21 +-
 src/backend/tcop/utility.c|  19 +
 src/backend/utils/cache/partcache.c   |  15 +-
 src/bin/psql/describe.c   |  40 +-
 src/include/catalog/pg_inherits.h |   7 +-
 src/include/nodes/parsenodes.h|   2 +
 src/include/parser/kwlist.h   |   1 +
 src/include/partitioning/partdesc.h   |   5 +-
 src/include/utils/snapmgr.h   |   1 +
 .../detach-partition-concurrently-1.out   | 301 +
 .../detach-partition-concurrently-2.out   |  66 ++
 src/test/isolation/isolation_schedule |   2 +
 .../detach-partition-concurrently-1.spec  |  90 +++
 .../detach-partition-concurrently-2.spec  |  41 ++
 29 files changed, 1234 insertions(+), 178 deletions(-)
 create mode 100644 src/test/isolation/expected/detach-partition-concurrently-1.out
 create mode 100644 src/test/isolation/expected/detach-partition-concurrently-2.out
 create mode 100644 src/test/isolation/specs/detach-partition-concurrently-1.spec
 create mode 100644 src/test/isolation/specs/detach-partition-concurrently-2.spec

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 5fb9dca425..b628ffaa4b 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -4482,6 +4482,16 @@ SCRAM-SHA-256$iteration count:
when using declarative partitioning.
   
  
+
+ 
+  
+   inhdetached bool
+  
+  
+   Set to true for a partition that is in the process of being detached;
+   false otherwise.
+  
+ 
 

   
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index c25ef5abd6..1ff6214fa7 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -36,7 +36,9 @@ ALTER TABLE ALL IN TABLESPACE name
 ALTER TABLE [ IF EXISTS ] name
 ATTACH PARTITION partition_name { FOR VALUES partition_bound_spec | DEFAULT }
 ALTER TABLE [ IF EXISTS ] name
-DETACH PARTITION partition_name
+DETACH PARTITION partition_name [ CONCURRENTLY ]
+ALTER TABLE [ IF EXISTS ] name
+DETACH PARTITION partition_name FINALIZE
 
 where action is one of:
 
@@ -938,7 +940,8 @@ WITH ( MODULUS numeric_literal, REM

 

-DETACH PARTITION partition_name
+DETACH PARTITION partition_name [ { CONCURRENTLY | FINALIZE } ]
+
 
  
   This form detaches 

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-10-16 Thread Alvaro Herrera
On 2020-Sep-24, Amit Langote wrote:

Hello Amit,

> Sorry I totally failed to see the v2 you had posted and a couple of
> other emails where you mentioned the issues I brought up.

No worries, I appreciate you reviewing this.

> However, I am a bit curious about including detached partitions in
> some cases but not in others, which can result in (to me) surprising
> behavior, as follows:
[ snip ]
> begin;
> insert into foo values (2);  -- ok
> select * from foo;
> select * from foo;  -- ?!
>  a | b
> ---+---
> (0 rows)
> 
> Maybe, it's fine to just always exclude detached partitions, although
> perhaps I am missing some corner cases that you have thought of?

Well, this particular case can be fixed by changing
ExecInitPartitionDispatchInfo specifically, from including detached
partitions to excluding them, as in the attached version.  Given your
example, I think there is a fairly good case that they should be excluded
there.  I can't think of a case that this change breaks.

However I'm not sure that excluding them everywhere is sensible.  There
are currently two cases where they are included (search for calls to
CreatePartitionDirectory if you're curious).  One is snapshot-isolation
transactions (repeatable read and serializable) in
set_relation_partition_info, per the example from Hao Wu.  If we simply
exclude detached partitions there, repeatable read no longer works
properly; rows will just go missing for no apparent reason.  I don't
think this is acceptable.

The other case is ExecCreatePartitionPruneState().  The whole point of
including detached partitions here is to make them available for queries
that were planned before the detach and executed after the detach.  My
fear is that the pruning plan will contain references (from planner) to
partitions that the executor doesn't know about.  If there are extra
partitions at the executor side, it shouldn't harm anything (and it
shouldn't change query results); but I'm not sure that things will work
OK if partitions seen by the planner disappear from under the executor.

I'm posting this version just as a fresh rebase -- it is not yet
intended for commit.  I haven't touched the constraint stuff.  I still
think that that can be removed before commit, which is a simple change;
but even if not, the problem with the duplicated constraints should be
easy to fix.

Again, my thanks for reviewing.
>From 2cf81404fb4fa4e85665d6695361830d015acac0 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 13 Jul 2020 20:15:30 -0400
Subject: [PATCH v3 1/2] Let ALTER TABLE exec routines deal with the relation

This means that ATExecCmd relies on AlteredTableInfo->rel instead of
keeping the relation as a local variable; this is useful if the
subcommand needs to modify the relation internally.  For example, a
subcommand that internally commits its transaction needs this.
---
 src/backend/commands/tablecmds.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 511f015a86..2784fb11f9 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -157,6 +157,8 @@ typedef struct AlteredTableInfo
 	Oid			relid;			/* Relation to work on */
 	char		relkind;		/* Its relkind */
 	TupleDesc	oldDesc;		/* Pre-modification tuple descriptor */
+	/* Transiently set during Phase 2, normally set to NULL */
+	Relation	rel;
 	/* Information saved by Phase 1 for Phase 2: */
 	List	   *subcmds[AT_NUM_PASSES]; /* Lists of AlterTableCmd */
 	/* Information saved by Phases 1/2 for Phase 3: */
@@ -354,7 +356,7 @@ static void ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 	  AlterTableUtilityContext *context);
 static void ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			  AlterTableUtilityContext *context);
-static void ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
+static void ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 	  AlterTableCmd *cmd, LOCKMODE lockmode, int cur_pass,
 	  AlterTableUtilityContext *context);
 static AlterTableCmd *ATParseTransformCmd(List **wqueue, AlteredTableInfo *tab,
@@ -4393,7 +4395,6 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 		{
 			AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab);
 			List	   *subcmds = tab->subcmds[pass];
-			Relation	rel;
 			ListCell   *lcmd;
 
 			if (subcmds == NIL)
@@ -4402,10 +4403,10 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			/*
 			 * Appropriate lock was obtained by phase 1, needn't get it again
 			 */
-			rel = relation_open(tab->relid, NoLock);
+			tab->rel = relation_open(tab->relid, NoLock);
 
 			foreach(lcmd, subcmds)
-ATExecCmd(wqueue, tab, rel,
+ATExecCmd(wqueue, tab,
 		  castNode(AlterTableCmd, lfirst(lcmd)),
 		  lockmode, pass, context);
 
@@ -4417,7 +4418,11 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			if (pass == AT_PASS_ALTER_TYPE)
 

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-10-14 Thread Andy Fan
Hi David/Alvaro:

On Thu, Oct 15, 2020 at 9:09 AM David Rowley  wrote:

> On Thu, 15 Oct 2020 at 14:04, Andy Fan  wrote:
> >
> > I think it would be possible to implement the detach with a NoWait option:
> >
> > ALTER TABLE ... DETACH PARTITION ..  [NoWait]
> >
> > If it can't get the lock, raise "resource is busy" immediately, without
> > blocking others; this should be the default behavior.  If people do want
> > to keep trying, they can set a ddl_lock_timeout to some interval; in this
> > case it will still block others (so it can't be as good as what you are
> > doing, but it is very simple), however the user would know exactly what
> > would happen and can coordinate with their application accordingly.  I'm
> > sorry about this since it is a bit off-topic, or it has been discussed
> > already.
>
> How would that differ from setting a low lock_timeout and running the DDL?
>

They are exactly the same (I wasn't aware of this parameter when I sent the
email).


> I think what Alvaro wants to avoid is taking the AEL in the first
> place.


I agree with this; that's why I said "so it can't be as good as what
you are doing".


> When you have multiple long overlapping queries to the
> partitioned table, then there may be no point in time where there are zero
> locks on the table. It does not sound like your idea would help with that.



Based on my current knowledge, "detach" will request an exclusive lock, and
the request will have higher priority than other waiters.  So it has to wait
for the lock holders before it (say, session 1), and at the same time it
blocks all the other waiters that request a lock, even if their lock mode is
compatible with session 1's.  So "detach" can probably get its lock in a
short time (unless some long transaction comes before it).  I'm not sure if
I have some misunderstanding here.

Overall I'd be +1 for this patch.

-- 
Best Regards
Andy Fan


Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-10-14 Thread David Rowley
On Thu, 15 Oct 2020 at 14:04, Andy Fan  wrote:
>
> I think it would be possible to implement the detach with a NoWait option:
>
> ALTER TABLE ... DETACH PARTITION ..  [NoWait]
>
> If it can't get the lock, raise "resource is busy" immediately, without
> blocking others; this should be the default behavior.  If people do want to
> keep trying, they can set a ddl_lock_timeout to some interval; in this case
> it will still block others (so it can't be as good as what you are doing,
> but it is very simple), however the user would know exactly what would
> happen and can coordinate with their application accordingly.  I'm sorry
> about this since it is a bit off-topic, or it has been discussed already.

How would that differ from setting a low lock_timeout and running the DDL?
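
For example, a low lock_timeout already gives up quickly instead of queueing
indefinitely behind other lockers (a sketch; the p/p1 names are illustrative):

SET lock_timeout = '100ms';
ALTER TABLE p DETACH PARTITION p1;
-- ERROR:  canceling statement due to lock timeout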

I think what Alvaro wants to avoid is taking the AEL in the first
place. When you have multiple long overlapping queries to the
partitioned table, then there may be no point in time where there are zero
locks on the table. It does not sound like your idea would help with
that.

David




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-10-14 Thread Alvaro Herrera
On 2020-Oct-15, Andy Fan wrote:

> I think it would be possible to implement the detach with a NoWait option:
> 
> ALTER TABLE ... DETACH PARTITION ..  [NoWait]
> 
> If it can't get the lock, raise "resource is busy" immediately, without
> blocking others; this should be the default behavior.  If people do want
> to keep trying, they can set a ddl_lock_timeout to some interval; in this
> case it will still block others (so it can't be as good as what you are
> doing, but it is very simple), however the user would know exactly what
> would happen and can coordinate with their application accordingly.  I'm
> sorry about this since it is a bit off-topic, or it has been discussed
> already.

Hi.  I don't think this has been discussed, but it doesn't really solve
the use case I want to address -- in many cases where the tables are
continuously busy, this would lead to starvation.  I think the proposal
to make the algorithm work with reduced lock level is much more useful.

I think you can already do NOWAIT behavior, with LOCK TABLE .. NOWAIT
followed by DETACH PARTITION, perhaps with a nonzero statement timeout.
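That is, something like this (a sketch; the table names are illustrative):

BEGIN;
SET LOCAL statement_timeout = '1s';            -- optional cap on the command
LOCK TABLE p IN ACCESS EXCLUSIVE MODE NOWAIT;  -- errors out at once if busy
ALTER TABLE p DETACH PARTITION p1;
COMMIT;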




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-10-14 Thread Andy Fan
Hi Alvaro:

On Tue, Aug 4, 2020 at 7:49 AM Alvaro Herrera wrote:

> I've been working on the ability to detach a partition from a
> partitioned table, without causing blockages to concurrent activity.
> I think this operation is critical for some use cases.
>

I think it would be possible to implement the detach with a NoWait option:

ALTER TABLE ... DETACH PARTITION ..  [NoWait]

If it can't get the lock, raise "resource is busy" immediately, without
blocking others; this should be the default behavior.  If people do want to
keep trying, they can set a ddl_lock_timeout to some interval; in this case
it will still block others (so it can't be as good as what you are doing,
but it is very simple), however the user would know exactly what would
happen and can coordinate with their application accordingly.  I'm sorry
about this since it is a bit off-topic, or it has been discussed already.

-- 
Best Regards
Andy Fan


Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-09-30 Thread Michael Paquier
On Thu, Sep 24, 2020 at 12:51:52PM +0900, Amit Langote wrote:
> Also, I noticed that looking up a parent's partitions via
> RelationBuildPartitionDesc or directly will inspect inhdetached to
> include or exclude partitions, but checking if a child table is a
> partition of a given parent table via get_partition_parent doesn't.
> Now if you fix get_partition_parent() to also take into account
> inhdetached, for example, to return InvalidOid if true, then the
> callers would need to not consider the child table a valid partition.
> So, RelationGetPartitionQual() on a detached partition should actually
> return NIL, making my earlier claim about not needing the defensive
> CHECK constraint invalid.  But maybe that's fine if all places agree
> on a detached partition not being a valid partition anymore?

It would be good to get that answered, and while on it please note
that the patch needs a rebase.
--
Michael




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-09-23 Thread Amit Langote
Hi Alvaro,

Sorry I totally failed to see the v2 you had posted and a couple of
other emails where you mentioned the issues I brought up.

On Thu, Sep 24, 2020 at 12:23 AM Alvaro Herrera wrote:
> On 2020-Sep-23, Amit Langote wrote:
> > I suspect that we don't really need this defensive constraint.  I mean
> > even after committing the 1st transaction, the partition being
> > detached still has relispartition set to true and
> > RelationGetPartitionQual() still returns the partition constraint, so
> > it seems the constraint being added above is duplicative.
>
> Ah, thanks for thinking through that.  I had vague thoughts about this
> being unnecessary in the current mechanics, but hadn't fully
> materialized the thought.  (The patch was completely different a few
> unposted iterations ago).
>
> > Moreover, the constraint is not removed as part of any cleaning up
> > after the DETACH process, so repeated attach/detach of the same
> > partition results in the constraints piling up:
>
> Yeah, I knew about this issue (mentioned in my self-reply on Aug 4) and
> didn't worry too much about it because I was thinking I'd rather get rid
> of the constraint addition in the first place.

Okay, gotcha.

> > Noticed a bug/typo in the patched RelationBuildPartitionDesc():
> >
> > -   inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock);
> > +   inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock,
> > +   include_detached);
> >
> > You're passing NoLock for include_detached which means you never
> > actually end up including detached partitions from here.
>
> I fixed this in the version I posted on Sept 10.  I think you were
> reading the version posted at the start of this thread.

I am trying the v2 now and I can confirm that those problems are now fixed.

However, I am a bit curious about including detached partitions in
some cases but not in others, which can result in (to me) surprising
behavior, as follows:

Session 1:

create table foo (a int, b int) partition by range (a);
create table foo1 partition of foo for values from (1) to (2);
create table foo2 partition of foo for values from (2) to (3);

...attach GDB and set breakpoint so as to block right after finishing
the 1st transaction of DETACH PARTITION CONCURRENTLY...
alter table foo detach partition foo2 concurrently;


Session 2:

begin;
insert into foo values (2);  -- ok
select * from foo;
select * from foo;  -- ?!
 a | b
---+---
(0 rows)

Maybe, it's fine to just always exclude detached partitions, although
perhaps I am missing some corner cases that you have thought of?

Also, I noticed that looking up a parent's partitions via
RelationBuildPartitionDesc or directly will inspect inhdetached to
include or exclude partitions, but checking if a child table is a
partition of a given parent table via get_partition_parent doesn't.
Now if you fix get_partition_parent() to also take into account
inhdetached, for example, to return InvalidOid if true, then the
callers would need to not consider the child table a valid partition.
So, RelationGetPartitionQual() on a detached partition should actually
return NIL, making my earlier claim about not needing the defensive
CHECK constraint invalid.  But maybe that's fine if all places agree
on a detached partition not being a valid partition anymore?

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-09-23 Thread Alvaro Herrera
On 2020-Sep-23, Amit Langote wrote:

Hi Amit, thanks for reviewing this patch!

> On Tue, Aug 4, 2020 at 8:49 AM Alvaro Herrera wrote:

> I suspect that we don't really need this defensive constraint.  I mean
> even after committing the 1st transaction, the partition being
> detached still has relispartition set to true and
> RelationGetPartitionQual() still returns the partition constraint, so
> it seems the constraint being added above is duplicative.

Ah, thanks for thinking through that.  I had vague thoughts about this
being unnecessary in the current mechanics, but hadn't fully
materialized the thought.  (The patch was completely different a few
unposted iterations ago).

> Moreover, the constraint is not removed as part of any cleaning up
> after the DETACH process, so repeated attach/detach of the same
> partition results in the constraints piling up:

Yeah, I knew about this issue (mentioned in my self-reply on Aug 4) and
didn't worry too much about it because I was thinking I'd rather get rid
of the constraint addition in the first place.

> Noticed a bug/typo in the patched RelationBuildPartitionDesc():
> 
> -   inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock);
> +   inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock,
> +   include_detached);
> 
> You're passing NoLock for include_detached which means you never
> actually end up including detached partitions from here.

I fixed this in the version I posted on Sept 10.  I think you were
reading the version posted at the start of this thread.

Thanks,

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-09-22 Thread Amit Langote
Hi Alvaro,

Studying the patch to understand how it works.

On Tue, Aug 4, 2020 at 8:49 AM Alvaro Herrera  wrote:
> Why two transactions?  The reason is that in order for this to work, we
> make a catalog change (mark it detached), and commit so that all
> concurrent transactions can see the change.  A second transaction waits
> for anybody who holds any lock on the partitioned table and grabs Access
> Exclusive on the partition (which now no one cares about, if they're
> looking at the partitioned table), where the DDL action on the partition
> can be completed.

+   /*
+* Concurrent mode has to work harder; first we add a new constraint to the
+* partition that matches the partition constraint.  The reason for this is
+* that the planner may have made optimizations that depend on the
+* constraint.  XXX Isn't it sufficient to invalidate the partition's
+* relcache entry?
...
+   /* Add constraint on partition, equivalent to the partition
constraint */
+   n = makeNode(Constraint);
+   n->contype = CONSTR_CHECK;
+   n->conname = NULL;
+   n->location = -1;
+   n->is_no_inherit = false;
+   n->raw_expr = NULL;
+   n->cooked_expr =
nodeToString(make_ands_explicit(RelationGetPartitionQual(partRel)));
+   n->initially_valid = true;
+   n->skip_validation = true;
+   /* It's a re-add, since it nominally already exists */
+   ATAddCheckConstraint(wqueue, tab, partRel, n,
+true, false, true, ShareUpdateExclusiveLock);

I suspect that we don't really need this defensive constraint.  I mean
even after committing the 1st transaction, the partition being
detached still has relispartition set to true and
RelationGetPartitionQual() still returns the partition constraint, so
it seems the constraint being added above is duplicative.  Moreover,
the constraint is not removed as part of any cleaning up after the
DETACH process, so repeated attach/detach of the same partition
results in the constraints piling up:

create table foo (a int, b int) partition by range (a);
create table foo1 partition of foo for values from (1) to (2);
create table foo2 partition of foo for values from (2) to (3);
alter table foo detach partition foo2 concurrently;
alter table foo attach partition foo2 for values from (2) to (3);
alter table foo detach partition foo2 concurrently;
alter table foo attach partition foo2 for values from (2) to (3);
\d foo2
Table "public.foo2"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | integer |   |  |
 b  | integer |   |  |
Partition of: foo FOR VALUES FROM (2) TO (3)
Check constraints:
"foo2_a_check" CHECK (a IS NOT NULL AND a >= 2 AND a < 3)
"foo2_a_check1" CHECK (a IS NOT NULL AND a >= 2 AND a < 3)

Noticed a bug/typo in the patched RelationBuildPartitionDesc():

-   inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock);
+   inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock,
+   include_detached);

You're passing NoLock for include_detached which means you never
actually end up including detached partitions from here.  I think it
is due to this bug that the partition-concurrent-attach test fails in my
run.  Also, the error seen in the following hunk of
detach-partition-concurrently-1 test is not intentional:

+starting permutation: s1brr s1prep s1s s2d s1s s1exec2 s1c
+step s1brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s1prep: PREPARE f(int) AS INSERT INTO d_listp VALUES ($1);
+step s1s: SELECT * FROM d_listp;
+a
+
+1
+2
+step s2d: ALTER TABLE d_listp DETACH PARTITION d_listp2 CONCURRENTLY;

+step s1s: SELECT * FROM d_listp;
+a
+
+1
+step s1exec2: EXECUTE f(2); DEALLOCATE f;
+step s2d: <... completed>
+error in steps s1exec2 s2d: ERROR:  no partition of relation
"d_listp" found for row
+step s1c: COMMIT;

As you're intending to make the executor always *include* detached
partitions, the insert should be able to route (2) to d_listp2, the
detached partition.  It's the bug mentioned above that causes the
executor's partition descriptor build to falsely fail to include the
detached partition.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-09-11 Thread Robert Haas
On Thu, Sep 10, 2020 at 4:54 PM Alvaro Herrera  wrote:
> Interesting example, thanks.  It seems this can be fixed without
> breaking anything else by changing the planner so that it includes
> detached partitions when we are in a snapshot-isolation transaction.
> Indeed, the results from the detach-partition-concurrently-1.spec
> isolation test are more satisfying with this change.

Hmm, so I think the idea here is that since we're out-waiting plans
with the old partition descriptor by waiting for lock release, it's OK
for anyone who has a lock to keep using the old partition descriptor
as long as they continuously hold the lock. Is that right? I can't
think of a hole in that logic, but it's probably worth noting in the
comments, in case someone is tempted to change the way that we
out-wait plans with the old partition descriptor to some other
mechanism.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-09-10 Thread Alvaro Herrera
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index c5154b818c..3bfbfdd8a5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -649,7 +649,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	EXCLUDE EXCLUDING EXCLUSIVE EXECUTE EXISTS EXPLAIN EXPRESSION
 	EXTENSION EXTERNAL EXTRACT
 
-	FALSE_P FAMILY FETCH FILTER FIRST_P FLOAT_P FOLLOWING FOR
+	FALSE_P FAMILY FETCH FILTER FINALIZE FIRST_P FLOAT_P FOLLOWING FOR
 	FORCE FOREIGN FORWARD FREEZE FROM FULL FUNCTION FUNCTIONS
 
 	GENERATED GLOBAL GRANT GRANTED GREATEST GROUP_P GROUPING GROUPS
@@ -2057,12 +2057,13 @@ partition_cmd:
 	n->subtype = AT_AttachPartition;
 	cmd->name = $3;
 	cmd->bound = $4;
+	cmd->concurrent = false;
 	n->def = (Node *) cmd;
 
 	$$ = (Node *) n;
 }
-			/* ALTER TABLE  DETACH PARTITION  */
-			| DETACH PARTITION qualified_name
+			/* ALTER TABLE  DETACH PARTITION  [CONCURRENTLY] */
+			| DETACH PARTITION qualified_name opt_concurrently
 {
 	AlterTableCmd *n = makeNode(AlterTableCmd);
 	PartitionCmd *cmd = makeNode(PartitionCmd);
@@ -2070,8 +2071,21 @@ partition_cmd:
 	n->subtype = AT_DetachPartition;
 	cmd->name = $3;
 	cmd->bo

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-09-03 Thread Amit Langote
Hi Hao,

On Wed, Sep 2, 2020 at 5:25 PM Hao Wu  wrote:
>
> Not related to DETACH PARTITION, but I found a bug in ATTACH PARTITION.
> Here are the steps to reproduce the issue:
>
> create table tpart(i int, j int) partition by range(i);
> create table tpart_1(like tpart);
> create table tpart_2(like tpart);
> create table tpart_default(like tpart);
> alter table tpart attach partition tpart_1 for values from(0) to (100);
> alter table tpart attach partition tpart_default default;
> insert into tpart_2 values(110,110),(120,120),(150,150);
> 1: begin;
> 1: alter table tpart attach partition tpart_2 for values from(100) to (200);
> 2: begin;
> -- insert will be blocked by ALTER TABLE ATTACH PARTITION
> 2: insert into tpart values (110,110),(120,120),(150,150);
> 1: end;
> 2: select * from tpart_default;
> 2: end;
>
>
> After that the partition tpart_default contains (110,110),(120,120),(150,150)
> inserted in session 2, which obviously violates the partition constraint.

Thanks for the report.  That looks like a bug.

I have started another thread to discuss this bug and a patch to fix
it to keep the discussion here focused on the new feature.

Subject: default partition and concurrent attach partition
https://www.postgresql.org/message-id/CA%2BHiwqFqBmcSSap4sFnCBUEL_VfOMmEKaQ3gwUhyfa4c7J_-nA%40mail.gmail.com

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-09-02 Thread Hao Wu
Not related to DETACH PARTITION, but I found a bug in ATTACH PARTITION.
Here are the steps to reproduce the issue:


create table tpart(i int, j int) partition by range(i);
create table tpart_1(like tpart);
create table tpart_2(like tpart);
create table tpart_default(like tpart);
alter table tpart attach partition tpart_1 for values from(0) to (100);
alter table tpart attach partition tpart_default default;
insert into tpart_2 values(110,110),(120,120),(150,150);

1: begin;
1: alter table tpart attach partition tpart_2 for values from(100) to (200);
2: begin;
-- insert will be blocked by ALTER TABLE ATTACH PARTITION
2: insert into tpart values (110,110),(120,120),(150,150);
1: end;
2: select * from tpart_default;
2: end;

After that the partition tpart_default contains (110,110),(120,120),(150,150)
inserted in session 2, which obviously violates the partition constraint.

Regards,
Hao Wu


Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-09-01 Thread Alvaro Herrera
On 2020-Aug-27, Robert Haas wrote:

> On Wed, Aug 26, 2020 at 7:40 PM Alvaro Herrera wrote:
> > To mark it detached means to set pg_inherits.inhdetached=true.  That
> > column name is a bit of a misnomer, since that column really means "is
> > in the process of being detached"; the pg_inherits row goes away
> > entirely once the detach process completes.  This mark guarantees that
> > everyone will see that row because the detaching session waits long
> > enough after committing the first transaction and before deleting the
> > pg_inherits row, until everyone else has moved on.
> 
> OK. Do you just wait until the XID of the transaction that set
> inhdetached is all-visible, or how do you do it?

I'm just doing WaitForLockers( ... AccessExclusiveLock ...)  on the
partitioned table at the start of the second transaction.  That will
wait until all lockers that have obtained a partition descriptor with
the old definition are gone.  Note we don't actually lock the
partitioned table with that lock level.

In the second transaction we additionally obtain AccessExclusiveLock on
the partition itself, but that's after nobody sees it as a partition
anymore.  That lock level is needed for some of the internal DDL
changes, and should not cause problems.

I thought about using WaitForOlderSnapshots() instead of waiting for
lockers, but it didn't seem to solve any actual problem.

Note that on closing the first transaction, the locks on both tables are
released.  This avoids the deadlock hazards because of the lock upgrades
that would otherwise occur.  This means that the tables could be dropped
or changed in the meantime.  The case where either relation is dropped
is handled by using try_relation_open() in the second transaction; if
either table is gone, then we can just mark the operation as completed.
This part is a bit fuzzy.  One thing that should probably be done is
have a few operations (such as other ALTER TABLE) raise an error when
run on a table with inhdetached=true, because that might get things out
of step and potentially cause other problems.  I've not done that yet.  

> So all the plans that were created before you set
> inhdetached=true have to be guaranteed to be invaliated or gone
> altogether before you can actually delete the pg_inherits row. It
> seems like it ought to be possible to ensure that, though I am not
> surely of the details exactly. Is it sufficient to wait for all
> transactions that have locked the table to go away? I'm not sure
> exactly how this stuff interacts with the plan cache.

Hmm, any cached plan should be released with relcache inval events, per
PlanCacheRelCallback().  There are some comments in plancache.h about
"unsaved" cached plans that I don't really understand :-(

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-08-31 Thread Hao Wu
Hi hacker,

I tested the patch provided by Alvaro.  It does not seem to behave well in REPEATABLE READ.

gpadmin=*# \d+ tpart
                           Partitioned table "public.tpart"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 i      | integer |           |          |         | plain   |              |
 j      | integer |           |          |         | plain   |              |
Partition key: RANGE (i)
Partitions: tpart_1 FOR VALUES FROM (0) TO (100),
            tpart_2 FOR VALUES FROM (100) TO (200)

begin isolation level repeatable read;
BEGIN
gpadmin=*# select * from tpart;
  i  |  j
-----+-----
  10 |  10
  50 |  50
 110 | 110
 120 | 120
 150 | 150
(5 rows)
-- Here, run `ALTER TABLE tpart DETACH PARTITION tpart_2 CONCURRENTLY`
-- but only complete the first transaction.

-- the tuples from tpart_2 are gone.
gpadmin=*# select * from tpart;
 i  | j
----+----
 10 | 10
 50 | 50
(2 rows)

gpadmin=*# \d+ tpart_2
                                Table "public.tpart_2"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 i      | integer |           |          |         | plain   |              |
 j      | integer |           |          |         | plain   |              |
Partition of: tpart FOR VALUES FROM (100) TO (200)
Partition constraint: ((i IS NOT NULL) AND (i >= 100) AND (i < 200))
Access method: heap

-- the partition tpart_2 is not shown as DETACHED
gpadmin=*# \d+ tpart
                           Partitioned table "public.tpart"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 i      | integer |           |          |         | plain   |              |
 j      | integer |           |          |         | plain   |              |
Partition key: RANGE (i)
Partitions: tpart_1 FOR VALUES FROM (0) TO (100),
            tpart_2 FOR VALUES FROM (100) TO (200)

-- next, the insert fails; that's OK.
gpadmin=*# insert into tpart values(130,130);
ERROR:  no partition of relation "tpart" found for row
DETAIL:  Partition key of the failing row contains (i) = (130).


Is this the expected behavior?

Regards,
Hao Wu


From: Robert Haas 
Sent: Thursday, August 27, 2020 11:46 PM
To: Alvaro Herrera 
Cc: Pg Hackers 
Subject: Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

On Wed, Aug 26, 2020 at 7:40 PM Alvaro Herrera  wrote:
> To mark it detached means to set pg_inherits.inhdetached=true.  That
> column name is a bit of a misnomer, since that column really means "is
> in the process of being detached"; the pg_inherits row goes away
> entirely once the detach process completes.  This mark guarantees that
> everyone will see that row because the detaching session waits long
> enough after committing the first transaction and before deleting the
> pg_inherits row, until everyone else has moved on.

OK. Do you just wait until the XID of the transaction that set
inhdetached is all-visible, or how do you do it?

> The other point is that the partition directory code can be asked to
> include detached partitions, or not to.  The executor does the former,
> and the planner does the latter.  This allows a transient period during
> which the partition descriptor returned to planner and executor is
> different; this makes the situation equivalent to what would have
> happened if the partition was attached during the operation: in executor
> we would detect that there is an additional partition that was not seen
> by the planner, and we already know how to deal with that situation by
> your handling of the ATTACH code.

Ah ha! That is quite clever and I don't think that I would have
thought of it. So all the plans that were created before you set
inhdetached=true have to be guaranteed to be invaliated or gone
altogether before you can actually delete the pg_inherits row. It
seems like it ought to be possible to ensure that, though I am not
surely of the details exactly. Is it sufficient to wait for all
transactions that have locked the table to go away? I'm not sure
exactly how this stuff interacts with the plan cache.

> There is one fly in the ointment though, which is that if you cancel the
> wait and then immediately do the ALTER TABLE DETACH FINALIZE without
> waiting for as long as the original execution would have waited, you
> might end up killing the partition ahead of time.  One solution to this
> would be to cause the FINALIZE action to wait again at start.  This
> would cause it to take even longer, but it would be safer.  (If you
> don't want it to take longer, you can jus

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-08-27 Thread Robert Haas
On Wed, Aug 26, 2020 at 7:40 PM Alvaro Herrera  wrote:
> To mark it detached means to set pg_inherits.inhdetached=true.  That
> column name is a bit of a misnomer, since that column really means "is
> in the process of being detached"; the pg_inherits row goes away
> entirely once the detach process completes.  This mark guarantees that
> everyone will see that row because the detaching session waits long
> enough after committing the first transaction and before deleting the
> pg_inherits row, until everyone else has moved on.

OK. Do you just wait until the XID of the transaction that set
inhdetached is all-visible, or how do you do it?

> The other point is that the partition directory code can be asked to
> include detached partitions, or not to.  The executor does the former,
> and the planner does the latter.  This allows a transient period during
> which the partition descriptor returned to planner and executor is
> different; this makes the situation equivalent to what would have
> happened if the partition was attached during the operation: in executor
> we would detect that there is an additional partition that was not seen
> by the planner, and we already know how to deal with that situation by
> your handling of the ATTACH code.

Ah ha! That is quite clever and I don't think that I would have
thought of it. So all the plans that were created before you set
inhdetached=true have to be guaranteed to be invaliated or gone
altogether before you can actually delete the pg_inherits row. It
seems like it ought to be possible to ensure that, though I am not
surely of the details exactly. Is it sufficient to wait for all
transactions that have locked the table to go away? I'm not sure
exactly how this stuff interacts with the plan cache.

> There is one fly in the ointment though, which is that if you cancel the
> wait and then immediately do the ALTER TABLE DETACH FINALIZE without
> waiting for as long as the original execution would have waited, you
> might end up killing the partition ahead of time.  One solution to this
> would be to cause the FINALIZE action to wait again at start.  This
> would cause it to take even longer, but it would be safer.  (If you
> don't want it to take longer, you can just not cancel it in the first
> place.)  This is not a problem if the server crashes in between (which
> is the scenario I had in mind when doing the FINALIZE thing), because of
> course no transaction can continue to run across a crash.

Yeah, it sounds like this will require some solution, but I agree that
just waiting "longer" seems acceptable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-08-26 Thread Alvaro Herrera
On 2020-Aug-04, Robert Haas wrote:

> On Mon, Aug 3, 2020 at 7:49 PM Alvaro Herrera wrote:
> > Why two transactions?  The reason is that in order for this to work, we
> > make a catalog change (mark it detached), and commit so that all
> > concurrent transactions can see the change.  A second transaction waits
> > for anybody who holds any lock on the partitioned table and grabs Access
> > Exclusive on the partition (which now no one cares about, if they're
> > looking at the partitioned table), where the DDL action on the partition
> > can be completed.
> 
> Is there a more detailed theory of operation of this patch somewhere?
> What exactly do you mean by marking it detached? Committing the change
> makes it possible for all concurrent transactions to see the change,
> but does not guarantee that they will. If you can't guarantee that,
> then I'm not sure how you can guarantee that they will behave sanely.

Sorry for the long delay.  I haven't written up the theory of operation.
I suppose it is complicated enough that it should be documented
somewhere.

To mark it detached means to set pg_inherits.inhdetached=true.  That
column name is a bit of a misnomer, since that column really means "is
in the process of being detached"; the pg_inherits row goes away
entirely once the detach process completes.  This mark guarantees that
everyone will see that row because the detaching session waits long
enough after committing the first transaction and before deleting the
pg_inherits row, until everyone else has moved on.
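
With the patch applied, the transient state is visible in the catalog.  A
sketch, assuming the parent table is named "parent" (a placeholder name):

SELECT inhrelid::regclass AS partition, inhdetached
FROM pg_inherits
WHERE inhparent = 'parent'::regclass;

inhdetached reads true between the first commit and the removal of the
pg_inherits row.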

The other point is that the partition directory code can be asked to
include detached partitions, or not to.  The executor does the former,
and the planner does the latter.  This allows a transient period during
which the partition descriptor returned to planner and executor is
different; this makes the situation equivalent to what would have
happened if the partition was attached during the operation: in executor
we would detect that there is an additional partition that was not seen
by the planner, and we already know how to deal with that situation by
your handling of the ATTACH code.

> Even if you can, it's not clear what the sane behavior is: what
> happens when a tuple gets routed to an ex-partition? What happens when
> an ex-partition needs to be scanned? 

During the transient period, any transaction that obtained a partition
descriptor before the inhdetached mark is committed should be able to
get the tuple routing done successfully, but transactions using later
snapshots should get their insertions rejected.  Reads should behave in
the same way.

> What prevents problems if a partition is detached, possibly modified,
> and then reattached, possibly with different partition bounds?

This should not be a problem, because the partition needs to be fully
detached before it can be attached again.  And if the partition bounds
are different, that won't be a problem, because the previous partition
bounds won't be present in the pg_class row.  Of course, the new
partition bounds will be checked against the existing contents.
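
For example (a sketch with hypothetical names): if a fully-detached part2
still contains a row with i = 150, re-attaching it with narrower bounds is
rejected during the validation scan:

ALTER TABLE parent ATTACH PARTITION part2 FOR VALUES FROM (100) TO (150);
-- ERROR:  partition constraint of relation "part2" is violated by some row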

There is one fly in the ointment though, which is that if you cancel the
wait and then immediately do the ALTER TABLE DETACH FINALIZE without
waiting for as long as the original execution would have waited, you
might end up killing the partition ahead of time.  One solution to this
would be to cause the FINALIZE action to wait again at start.  This
would cause it to take even longer, but it would be safer.  (If you
don't want it to take longer, you can just not cancel it in the first
place.)  This is not a problem if the server crashes in between (which
is the scenario I had in mind when doing the FINALIZE thing), because of
course no transaction can continue to run across a crash.
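
In SQL terms, the sequence of concern is roughly (a sketch):

ALTER TABLE parent DETACH PARTITION part CONCURRENTLY;
-- cancel it with Ctrl-C while it waits
ALTER TABLE parent DETACH PARTITION part FINALIZE;
-- without a renewed wait here, this could finish before old snapshots are gone

which is why making FINALIZE wait again at start seems the safer option.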


I'm going to see if I can get the new delay_execution module to help
test this, to verify whether my claims are true.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-08-04 Thread Robert Haas
On Mon, Aug 3, 2020 at 7:49 PM Alvaro Herrera  wrote:
> Why two transactions?  The reason is that in order for this to work, we
> make a catalog change (mark it detached), and commit so that all
> concurrent transactions can see the change.  A second transaction waits
> for anybody who holds any lock on the partitioned table and grabs Access
> Exclusive on the partition (which now no one cares about, if they're
> looking at the partitioned table), where the DDL action on the partition
> can be completed.

Is there a more detailed theory of operation of this patch somewhere?
What exactly do you mean by marking it detached? Committing the change
makes it possible for all concurrent transactions to see the change,
but does not guarantee that they will. If you can't guarantee that,
then I'm not sure how you can guarantee that they will behave sanely.
Even if you can, it's not clear what the sane behavior is: what
happens when a tuple gets routed to an ex-partition? What happens when
an ex-partition needs to be scanned? What prevents problems if a
partition is detached, possibly modified, and then reattached,
possibly with different partition bounds?

I think the two-transaction approach is interesting and I can imagine
that it possibly solves some problems, but it's not clear to me
exactly which problems it solves or how it does so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-08-04 Thread Alvaro Herrera
On 2020-Aug-03, Alvaro Herrera wrote:

> Why two transactions?  The reason is that in order for this to work, we
> make a catalog change (mark it detached), and commit so that all
> concurrent transactions can see the change.  A second transaction waits
> for anybody who holds any lock on the partitioned table and grabs Access
> Exclusive on the partition (which now no one cares about, if they're
> looking at the partitioned table), where the DDL action on the partition
> can be completed.

I forgot to mention.  If for whatever reason the second transaction
fails (say the user aborts it, or there is a crash), then the partition
is still marked as detached, so no queries would see it; but all the
ancillary catalog data remains.  This is just like a failed CREATE
INDEX CONCURRENTLY, which leaves behind an invalid index that must be
dropped; in this case, however, the changes still to be made are much
more extensive, so manual action is out of the question.  So there's
another DDL command to be invoked,

ALTER TABLE parent DETACH PARTITION part FINALIZE;

which will complete the detach action.

If we had UNDO then perhaps it would be possible to register an action
so that the detach is completed automatically.  But for now this seems
sufficient.


Another aspect worth mentioning is constraints.  In the patch, I create
a CHECK constraint to stand in for the partition constraint that is
about to logically disappear.  This was mentioned as a potential
problem in one of Robert's emails (I didn't actually verify that it is
one).  However, a funny thing is that if such a constraint already
exists, you get a duplicate, so after a few rounds of attach/detach you
can see them pile up.  I'll have to fix this at some point.  I also
need to think about whether foreign keys have similar problems, since
they too are used by the optimizer.
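
The pile-up is easy to reproduce in a sketch (names invented; the
constraint expression shown is approximate):

  CREATE TABLE parent (a int) PARTITION BY RANGE (a);
  CREATE TABLE part1 PARTITION OF parent FOR VALUES FROM (1) TO (10);
  ALTER TABLE parent DETACH PARTITION part1 CONCURRENTLY;
  -- part1 now carries a CHECK constraint standing in for the old
  -- partition constraint, roughly (a IS NOT NULL AND a >= 1 AND a < 10)
  ALTER TABLE parent ATTACH PARTITION part1 FOR VALUES FROM (1) TO (10);
  ALTER TABLE parent DETACH PARTITION part1 CONCURRENTLY;
  -- with the behavior described above, \d part1 now shows a second,
  -- identical CHECK constraint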

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-08-03 Thread Alvaro Herrera
On 2020-Aug-03, Alvaro Herrera wrote:

> There was a lot of great discussion which ended up in Robert completing
> a much sought implementation of non-blocking ATTACH.  DETACH was
> discussed too because it was a goal initially, but eventually dropped
> from that patch altogether. Nonetheless, that thread provided a lot of
> useful input to this implementation.  Important ones:
> 
> [1] 
> https://postgr.es/m/CA+TgmoYg4x7AH=_QSptvuBKf+3hUdiCa4frPkt+RvXZyjX1n=w...@mail.gmail.com
> [2] 
> https://postgr.es/m/CA+TgmoaAjkTibkEr=xJg3ndbRsHHSiYi2SJgX69MVosj=lj...@mail.gmail.com
> [3] 
> https://postgr.es/m/CA+TgmoY13KQZF-=hntrt9uywyx3_oyoqpu9iont49jggidp...@mail.gmail.com

There was some discussion about having a version number somewhere in
the partition descriptor as a means to implement this.  I couldn't
figure out how that would work, or what the version number would be
attached to.  Surely the idea wasn't to bump a version number on every
partition other than the one being detached?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2020-08-03 Thread Alvaro Herrera
I've been working on the ability to detach a partition from a
partitioned table, without causing blockages to concurrent activity.
I think this operation is critical for some use cases.

There was a lot of great discussion which ended up in Robert completing
a much sought implementation of non-blocking ATTACH.  DETACH was
discussed too because it was a goal initially, but eventually dropped
from that patch altogether. Nonetheless, that thread provided a lot of
useful input to this implementation.  Important ones:

[1] 
https://postgr.es/m/CA+TgmoYg4x7AH=_QSptvuBKf+3hUdiCa4frPkt+RvXZyjX1n=w...@mail.gmail.com
[2] 
https://postgr.es/m/CA+TgmoaAjkTibkEr=xJg3ndbRsHHSiYi2SJgX69MVosj=lj...@mail.gmail.com
[3] 
https://postgr.es/m/CA+TgmoY13KQZF-=hntrt9uywyx3_oyoqpu9iont49jggidp...@mail.gmail.com

Attached is a patch that implements
ALTER TABLE ... DETACH PARTITION .. CONCURRENTLY.

In the previous thread we were able to implement the concurrent model
without the extra keyword.  For this one I think that won't work; my
implementation works in two transactions so there's a restriction that
you can't run it in a transaction block.  Also, there's a wait phase
that makes it slower than the non-concurrent one.  Those two drawbacks
make me think that it's better to keep both modes available, just like
we offer both CREATE INDEX and CREATE INDEX CONCURRENTLY.
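
For example, the restriction shows up like this (sketch; error wording
approximate):

  BEGIN;
  ALTER TABLE parent DETACH PARTITION part1 CONCURRENTLY;
  -- ERROR:  ALTER TABLE ... DETACH CONCURRENTLY cannot run inside a
  -- transaction block
  ROLLBACK;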

Why two transactions?  The reason is that in order for this to work, we
make a catalog change (mark it detached), and commit so that all
concurrent transactions can see the change.  A second transaction waits
for anybody who holds any lock on the partitioned table and grabs Access
Exclusive on the partition (which now no one cares about, if they're
looking at the partitioned table), where the DDL action on the partition
can be completed.

ALTER TABLE is normally unable to run in two transactions.  I hacked it
(0001) so that the relation can be closed and reopened in the Exec phase
(by having the rel as part of AlteredTableInfo: when ATRewriteCatalogs
returns, it uses that pointer to close the rel).  It turns out that this
is sufficient to make that work.  This means that ALTER TABLE DETACH
CONCURRENTLY cannot work as part of a multi-command ALTER TABLE, but
that's already enforced by the grammar anyway.

DETACH CONCURRENTLY doesn't work if a default partition exists.  It's
just too problematic a case; you would still need to have AEL on the
default partition.
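
A quick sketch of that restriction (invented names; error wording
approximate):

  CREATE TABLE parent (a int) PARTITION BY LIST (a);
  CREATE TABLE part1 PARTITION OF parent FOR VALUES IN (1);
  CREATE TABLE part_def PARTITION OF parent DEFAULT;
  ALTER TABLE parent DETACH PARTITION part1 CONCURRENTLY;
  -- ERROR:  cannot detach partitions concurrently when a default
  -- partition exists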


I haven't yet experimented with queries running in a standby in tandem
with a detach.

-- 
Álvaro Herrera
>From 2f9202bf6f4c86d607032d7f04d3b2cee74a9617 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 13 Jul 2020 20:15:30 -0400
Subject: [PATCH v1 1/2] Let ALTER TABLE exec routines deal with the relation

This means that ATExecCmd relies on AlteredTableInfo->rel instead of
keeping the relation as a local variable; this is useful if the
subcommand needs to modify the relation internally.  For example, a
subcommand that internally commits its transaction needs this.
---
 src/backend/commands/tablecmds.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index ac53f79ada..07f4f562ee 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -157,6 +157,8 @@ typedef struct AlteredTableInfo
 	Oid			relid;			/* Relation to work on */
 	char		relkind;		/* Its relkind */
 	TupleDesc	oldDesc;		/* Pre-modification tuple descriptor */
+	/* Transiently set during Phase 2, normally set to NULL */
+	Relation	rel;
 	/* Information saved by Phase 1 for Phase 2: */
 	List	   *subcmds[AT_NUM_PASSES]; /* Lists of AlterTableCmd */
 	/* Information saved by Phases 1/2 for Phase 3: */
@@ -354,7 +356,7 @@ static void ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 	  AlterTableUtilityContext *context);
 static void ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			  AlterTableUtilityContext *context);
-static void ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
+static void ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 	  AlterTableCmd *cmd, LOCKMODE lockmode, int cur_pass,
 	  AlterTableUtilityContext *context);
 static AlterTableCmd *ATParseTransformCmd(List **wqueue, AlteredTableInfo *tab,
@@ -4326,7 +4328,6 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 		{
 			AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab);
 			List	   *subcmds = tab->subcmds[pass];
-			Relation	rel;
 			ListCell   *lcmd;
 
 			if (subcmds == NIL)
@@ -4335,10 +4336,10 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 			/*
 			 * Appropriate lock was obtained by phase 1, needn't get it again
 			 */
-			rel = relation_open(tab->relid, NoLock);
+			tab->rel = relation_open(tab->relid, NoLock);
 
 			foreach(lcmd, subcmds)
-				ATExecCmd(wqueue, tab, rel,
+				ATExecCmd(wqueue, tab,
 						  castNode(AlterTab