Re: Broken defenses against dropping a partitioning column

2019-07-22 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Jul-22, Tom Lane wrote:
>> I nearly missed the need for that because of all the noise that
>> check-world emits in pre-v12 branches.  We'd discussed back-patching
>> eb9812f27 at the time, and I think now it's tested enough that doing
>> so is low risk (or at least, lower risk than the risk of not seeing
>> a failure).  So I think I'll go do that now.

> I'd like that, as it bites me too, thanks.

Done.  The approach "make check-world >/dev/null" now emits the
same amount of noise on all branches, ie just

NOTICE:  database "regression" does not exist, skipping


The amount of parallelism you can apply is still pretty
branch-dependent, unfortunately.

regards, tom lane




Re: Broken defenses against dropping a partitioning column

2019-07-22 Thread Manuel Rigger
Thanks a lot for the fix!

Best,
Manuel

On Mon, Jul 22, 2019 at 9:35 PM Alvaro Herrera  wrote:
>
> On 2019-Jul-22, Tom Lane wrote:
>
> > I nearly missed the need for that because of all the noise that
> > check-world emits in pre-v12 branches.  We'd discussed back-patching
> > eb9812f27 at the time, and I think now it's tested enough that doing
> > so is low risk (or at least, lower risk than the risk of not seeing
> > a failure).  So I think I'll go do that now.
>
> I'd like that, as it bites me too, thanks.
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Broken defenses against dropping a partitioning column

2019-07-22 Thread Alvaro Herrera
On 2019-Jul-22, Tom Lane wrote:

> I nearly missed the need for that because of all the noise that
> check-world emits in pre-v12 branches.  We'd discussed back-patching
> eb9812f27 at the time, and I think now it's tested enough that doing
> so is low risk (or at least, lower risk than the risk of not seeing
> a failure).  So I think I'll go do that now.

I'd like that, as it bites me too, thanks.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Broken defenses against dropping a partitioning column

2019-07-22 Thread Tom Lane
I wrote:
> Here's a proposed patch for that.  It's mostly pretty straightforward,
> except I had to add some recursion defenses in findDependentObjects that
> weren't there before.  But those seem like a good idea anyway to prevent
> infinite recursion in case of bogus entries in pg_depend.
> Per above, I'm envisioning applying this to HEAD and v12 with a catversion
> bump, and to v11 and v10 with no catversion bump.

Pushed.  Back-patching turned up one thing I hadn't expected: pre-v12
pg_dump bleated about circular dependencies.  It turned out that Peter
had already installed a hack in pg_dump to suppress that complaint in
connection with generated columns, so I improved the comment and
back-patched that too.

I nearly missed the need for that because of all the noise that
check-world emits in pre-v12 branches.  We'd discussed back-patching
eb9812f27 at the time, and I think now it's tested enough that doing
so is low risk (or at least, lower risk than the risk of not seeing
a failure).  So I think I'll go do that now.

regards, tom lane




Re: Broken defenses against dropping a partitioning column

2019-07-19 Thread Tom Lane
I wrote:
> So I think we're probably stuck with the approach of adding new internal
> dependencies.  If we go that route, then our options for the released
> branches are (1) do nothing, or (2) back-patch the code that adds such
> dependencies, but without a catversion bump.  That would mean that only
> tables created after the next minor releases would have protection against
> this problem.  That's not ideal but maybe it's okay, considering that we
> haven't seen actual field reports of trouble of this kind.

Here's a proposed patch for that.  It's mostly pretty straightforward,
except I had to add some recursion defenses in findDependentObjects that
weren't there before.  But those seem like a good idea anyway to prevent
infinite recursion in case of bogus entries in pg_depend.

I also took the liberty of improving some related error messages that
I thought were unnecessarily vague and not up to project standards.

Per above, I'm envisioning applying this to HEAD and v12 with a catversion
bump, and to v11 and v10 with no catversion bump.

regards, tom lane

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 6315fc4..3356461 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -543,6 +543,7 @@ findDependentObjects(const ObjectAddress *object,
 ObjectIdGetDatum(object->objectId));
 	if (object->objectSubId != 0)
 	{
+		/* Consider only dependencies of this sub-object */
 		ScanKeyInit([2],
 	Anum_pg_depend_objsubid,
 	BTEqualStrategyNumber, F_INT4EQ,
@@ -550,7 +551,10 @@ findDependentObjects(const ObjectAddress *object,
 		nkeys = 3;
 	}
 	else
+	{
+		/* Consider dependencies of this object and any sub-objects it has */
 		nkeys = 2;
+	}
 
 	scan = systable_beginscan(*depRel, DependDependerIndexId, true,
 			  NULL, nkeys, key);
@@ -567,6 +571,18 @@ findDependentObjects(const ObjectAddress *object,
 		otherObject.objectId = foundDep->refobjid;
 		otherObject.objectSubId = foundDep->refobjsubid;
 
+		/*
+		 * When scanning dependencies of a whole object, we may find rows
+		 * linking sub-objects of the object to the object itself.  (Normally,
+		 * such a dependency is implicit, but we must make explicit ones in
+		 * some cases involving partitioning.)  We must ignore such rows to
+		 * avoid infinite recursion.
+		 */
+		if (otherObject.classId == object->classId &&
+			otherObject.objectId == object->objectId &&
+			object->objectSubId == 0)
+			continue;
+
 		switch (foundDep->deptype)
 		{
 			case DEPENDENCY_NORMAL:
@@ -855,6 +871,16 @@ findDependentObjects(const ObjectAddress *object,
 		otherObject.objectSubId = foundDep->objsubid;
 
 		/*
+		 * If what we found is a sub-object of the current object, just ignore
+		 * it.  (Normally, such a dependency is implicit, but we must make
+		 * explicit ones in some cases involving partitioning.)
+		 */
+		if (otherObject.classId == object->classId &&
+			otherObject.objectId == object->objectId &&
+			object->objectSubId == 0)
+			continue;
+
+		/*
 		 * Must lock the dependent object before recursing to it.
 		 */
 		AcquireDeletionLock(, 0);
@@ -1588,8 +1614,10 @@ recordDependencyOnExpr(const ObjectAddress *depender,
  * As above, but only one relation is expected to be referenced (with
  * varno = 1 and varlevelsup = 0).  Pass the relation OID instead of a
  * range table.  An additional frammish is that dependencies on that
- * relation (or its component columns) will be marked with 'self_behavior',
- * whereas 'behavior' is used for everything else.
+ * relation's component columns will be marked with 'self_behavior',
+ * whereas 'behavior' is used for everything else; also, if reverse_self
+ * is true, those dependencies are reversed so that the columns are made
+ * to depend on the table not vice versa.
  *
  * NOTE: the caller should ensure that a whole-table dependency on the
  * specified relation is created separately, if one is needed.  In particular,
@@ -1602,7 +1630,7 @@ recordDependencyOnSingleRelExpr(const ObjectAddress *depender,
 Node *expr, Oid relId,
 DependencyType behavior,
 DependencyType self_behavior,
-bool ignore_self)
+bool reverse_self)
 {
 	find_expr_references_context context;
 	RangeTblEntry rte;
@@ -1626,7 +1654,8 @@ recordDependencyOnSingleRelExpr(const ObjectAddress *depender,
 	eliminate_duplicate_dependencies(context.addrs);
 
 	/* Separate self-dependencies if necessary */
-	if (behavior != self_behavior && context.addrs->numrefs > 0)
+	if ((behavior != self_behavior || reverse_self) &&
+		context.addrs->numrefs > 0)
 	{
 		ObjectAddresses *self_addrs;
 		ObjectAddress *outobj;
@@ -1657,11 +1686,23 @@ recordDependencyOnSingleRelExpr(const ObjectAddress *depender,
 		}
 		context.addrs->numrefs = outrefs;
 
-		/* Record the self-dependencies */
-		if (!ignore_self)
+		/* Record the self-dependencies with the appropriate direction */
+		if (!reverse_self)
 			

Re: Broken defenses against dropping a partitioning column

2019-07-09 Thread Tom Lane
Tomas Vondra  writes:
> On Mon, Jul 08, 2019 at 10:58:56AM -0400, Tom Lane wrote:
>> So I think we're probably stuck with the approach of adding new internal
>> dependencies.  If we go that route, then our options for the released
>> branches are (1) do nothing, or (2) back-patch the code that adds such
>> dependencies, but without a catversion bump.  That would mean that only
>> tables created after the next minor releases would have protection against
>> this problem.  That's not ideal but maybe it's okay, considering that we
>> haven't seen actual field reports of trouble of this kind.

> Couldn't we also write a function that adds those dependencies for
> existing objects, and request users to run it after the update?

Maybe.  I'm not volunteering to write such a thing.

BTW, it looks like somebody actually did think about this problem with
respect to external dependencies of partition expressions:

regression=# create function myabs(int) returns int language internal as 
'int4abs' immutable strict parallel safe;
CREATE FUNCTION
regression=# create table foo (f1 int) partition by range (myabs(f1));
CREATE TABLE
regression=# drop function myabs(int);
ERROR:  cannot drop function myabs(integer) because other objects depend on it
DETAIL:  table foo depends on function myabs(integer)
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

Unfortunately, there's still no dependency on the column f1 in this
scenario.  That means any function that wants to reconstruct the
correct dependencies would need a way to scan the partition expressions
for Vars.  Not much fun from plpgsql, for sure.

regards, tom lane




Re: Broken defenses against dropping a partitioning column

2019-07-09 Thread Tomas Vondra

On Mon, Jul 08, 2019 at 10:58:56AM -0400, Tom Lane wrote:

Alvaro Herrera  writes:

That said, I'm not sure I see the use case for an ALTER TABLE .. DROP
COLUMN command that turns a partitioned table (with existing partitions
containing data) into one non-partitioned table with all data minus the
partitioning column(s).


Yeah, it'd be a lot of work for a dubious goal.


This seems vaguely related to the issue of dropping foreign keys; see
https://postgr.es/m/20190329152239.GA29258@alvherre.pgsql wherein I
settled with a non-ideal solution to the problem of being unable to
depend on something that did not cause the entire table to be dropped
in certain cases.


That's an interesting analogy.  Re-reading that thread, what I said
in <29497.1554217...@sss.pgh.pa.us> seems pretty apropos to the
current problem:


FWIW, I think that the dependency mechanism is designed around the idea
that whether it's okay to drop a *database object* depends only on what
other *database objects* rely on it, and that you can always make a DROP
valid by dropping all the dependent objects.  That's not an unreasonable
design assumption, considering that the SQL standard embeds the same
assumption in its RESTRICT/CASCADE syntax.


I think that is probably a fatal objection to my idea of putting an error
check into RemoveAttributeById().  As an example, consider the possibility
that somebody makes a temporary type and then makes a permanent table with
a partitioning column of that type.  What shall we do at session exit?
Failing to remove the temp type is not an acceptable choice, because that
leaves us with a permanently broken temp schema (compare bug #15631 [1]).

Also, I don't believe we can make that work without order-of-operations
problems in cases comparable to the original bug in this thread [2].
One or the other order of the object OIDs is going to lead to the column
being visited for deletion before the whole table is, and again rejecting
the column deletion is not going to be an acceptable behavior.

So I think we're probably stuck with the approach of adding new internal
dependencies.  If we go that route, then our options for the released
branches are (1) do nothing, or (2) back-patch the code that adds such
dependencies, but without a catversion bump.  That would mean that only
tables created after the next minor releases would have protection against
this problem.  That's not ideal but maybe it's okay, considering that we
haven't seen actual field reports of trouble of this kind.



Couldn't we also write a function that adds those dependencies for
existing objects, and request users to run it after the update?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Broken defenses against dropping a partitioning column

2019-07-08 Thread Robert Haas
On Mon, Jul 8, 2019 at 11:08 AM Tom Lane  wrote:
> FWIW, I was imagining the action as being (1) detach all the child
> partitions, (2) make parent into a non-partitioned table, (3)
> drop the target column in each of these now-independent tables.
> No data movement.  Other than the need to acquire locks on all
> the tables, it shouldn't be particularly slow.

I see.  I think that would be reasonable, but like you say, it's not
clear that it's really what users would prefer.  You can think of a
partitioned table as a first-class object and the partitions as
subordinate implementation details; or you can think of the partitions
as the first-class objects and the partitioned table as the
second-rate glue that holds them together. It seems like users prefer
the former view.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Broken defenses against dropping a partitioning column

2019-07-08 Thread Tom Lane
Alvaro Herrera  writes:
> That said, I'm not sure I see the use case for an ALTER TABLE .. DROP
> COLUMN command that turns a partitioned table (with existing partitions
> containing data) into one non-partitioned table with all data minus the
> partitioning column(s).

Yeah, it'd be a lot of work for a dubious goal.

> This seems vaguely related to the issue of dropping foreign keys; see
> https://postgr.es/m/20190329152239.GA29258@alvherre.pgsql wherein I
> settled with a non-ideal solution to the problem of being unable to
> depend on something that did not cause the entire table to be dropped
> in certain cases.

That's an interesting analogy.  Re-reading that thread, what I said
in <29497.1554217...@sss.pgh.pa.us> seems pretty apropos to the
current problem:

>> FWIW, I think that the dependency mechanism is designed around the idea
>> that whether it's okay to drop a *database object* depends only on what
>> other *database objects* rely on it, and that you can always make a DROP
>> valid by dropping all the dependent objects.  That's not an unreasonable
>> design assumption, considering that the SQL standard embeds the same
>> assumption in its RESTRICT/CASCADE syntax.

I think that is probably a fatal objection to my idea of putting an error
check into RemoveAttributeById().  As an example, consider the possibility
that somebody makes a temporary type and then makes a permanent table with
a partitioning column of that type.  What shall we do at session exit?
Failing to remove the temp type is not an acceptable choice, because that
leaves us with a permanently broken temp schema (compare bug #15631 [1]).

Also, I don't believe we can make that work without order-of-operations
problems in cases comparable to the original bug in this thread [2].
One or the other order of the object OIDs is going to lead to the column
being visited for deletion before the whole table is, and again rejecting
the column deletion is not going to be an acceptable behavior.

So I think we're probably stuck with the approach of adding new internal
dependencies.  If we go that route, then our options for the released
branches are (1) do nothing, or (2) back-patch the code that adds such
dependencies, but without a catversion bump.  That would mean that only
tables created after the next minor releases would have protection against
this problem.  That's not ideal but maybe it's okay, considering that we
haven't seen actual field reports of trouble of this kind.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/15631-188663b383e1e697%40postgresql.org

[2] 
https://www.postgresql.org/message-id/flat/CA%2Bu7OA4JKCPFrdrAbOs7XBiCyD61XJxeNav4LefkSmBLQ-Vobg%40mail.gmail.com




Re: Broken defenses against dropping a partitioning column

2019-07-08 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jul 8, 2019 at 11:02 AM Robert Haas  wrote:
>> On Mon, Jul 8, 2019 at 10:32 AM Alvaro Herrera  
>> wrote:
>>> That said, I'm not sure I see the use case for an ALTER TABLE .. DROP
>>> COLUMN command that turns a partitioned table (with existing partitions
>>> containing data) into one non-partitioned table with all data minus the
>>> partitioning column(s).

>> I think it would be useful to have "ALTER TABLE blah NOT PARTITIONED" but I

> ...hit send too soon, and also, I don't think anyone will be very
> happy if they get that behavior as a side effect of a DROP statement,
> mostly because it could take an extremely long time to execute.

FWIW, I was imagining the action as being (1) detach all the child
partitions, (2) make parent into a non-partitioned table, (3)
drop the target column in each of these now-independent tables.
No data movement.  Other than the need to acquire locks on all
the tables, it shouldn't be particularly slow.

But I'm still not volunteering to write it, because I'm not sure
anyone would want such a behavior.

regards, tom lane




Re: Broken defenses against dropping a partitioning column

2019-07-08 Thread Robert Haas
On Mon, Jul 8, 2019 at 11:02 AM Robert Haas  wrote:
> On Mon, Jul 8, 2019 at 10:32 AM Alvaro Herrera  
> wrote:
> > That said, I'm not sure I see the use case for an ALTER TABLE .. DROP
> > COLUMN command that turns a partitioned table (with existing partitions
> > containing data) into one non-partitioned table with all data minus the
> > partitioning column(s).
>
> I think it would be useful to have "ALTER TABLE blah NOT PARTITIONED" but I

...hit send too soon, and also, I don't think anyone will be very
happy if they get that behavior as a side effect of a DROP statement,
mostly because it could take an extremely long time to execute.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Broken defenses against dropping a partitioning column

2019-07-08 Thread Robert Haas
On Mon, Jul 8, 2019 at 10:32 AM Alvaro Herrera  wrote:
> That said, I'm not sure I see the use case for an ALTER TABLE .. DROP
> COLUMN command that turns a partitioned table (with existing partitions
> containing data) into one non-partitioned table with all data minus the
> partitioning column(s).

I think it would be useful to have "ALTER TABLE blah NOT PARTITIONED" but I

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Broken defenses against dropping a partitioning column

2019-07-08 Thread Alvaro Herrera
On 2019-Jul-07, Tom Lane wrote:

> Ideally, perhaps, a DROP CASCADE like this would not cascade to
> the whole table but only to the table's partitioned-ness property,
> leaving you with a non-partitioned table with most of its data
> intact.  It would take a lot of work to make that happen though,
> and it certainly wouldn't be back-patchable, and I'm not really
> sure it's worth it.

Maybe we can add dependencies to rows of the pg_partitioned_table
relation, with the semantics of "depends on the partitioned-ness of the
table"?

That said, I'm not sure I see the use case for an ALTER TABLE .. DROP
COLUMN command that turns a partitioned table (with existing partitions
containing data) into one non-partitioned table with all data minus the
partitioning column(s).

This seems vaguely related to the issue of dropping foreign keys; see
https://postgr.es/m/20190329152239.GA29258@alvherre.pgsql wherein I
settled with a non-ideal solution to the problem of being unable to
depend on something that did not cause the entire table to be dropped
in certain cases.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Broken defenses against dropping a partitioning column

2019-07-08 Thread Amit Langote
On Mon, Jul 8, 2019 at 4:11 AM Tom Lane  wrote:
> (Moved from pgsql-bugs thread at [1])

Thanks.

> Consider
>
> regression=# create domain d1 as int;
> CREATE DOMAIN
> regression=# create table t1 (f1 d1) partition by range(f1);
> CREATE TABLE
> regression=# alter table t1 drop column f1;
> ERROR:  cannot drop column named in partition key
>
> So far so good, but that defense has more holes than a hunk of
> Swiss cheese:

Indeed.

> regression=# drop domain d1 cascade;
> psql: NOTICE:  drop cascades to column f1 of table t1
> DROP DOMAIN
>
> Of course, the table is now utterly broken, e.g.
>
> regression=# \d t1
> psql: ERROR:  cache lookup failed for type 0

Oops.

> (More-likely variants of this include dropping an extension that
> defines the type of a partitioning column, or dropping the schema
> containing such a type.)

Yeah.  Actually, it's embarrassingly easy to fall through the holes.

create type mytype as (a int);
create table mytyptab (a mytype) partition by list (a);
drop type mytype cascade;
NOTICE:  drop cascades to column a of table mytyptab
DROP TYPE
select * from mytyptab;
ERROR:  cache lookup failed for type 0
LINE 1: select * from mytyptab;
  ^
drop table mytyptab;
ERROR:  cache lookup failed for type 0

> The fix I was speculating about in the pgsql-bugs thread was to add
> explicit pg_depend entries making the table's partitioning columns
> internally dependent on the whole table (or maybe the other way around;
> haven't experimented).  That fix has a couple of problems though:
>
> 1. In the example, "drop domain d1 cascade" would automatically
> cascade to the whole partitioned table, including child partitions
> of course.  This might leave a user sad, if a few terabytes of
> valuable data went away; though one could argue that they'd better
> have paid more attention to what the cascade cascaded to.
>
> 2. It doesn't fix anything for pre-existing tables in pre-v12 branches.
>
>
> I thought of a different possible approach, which is to move the
> "cannot drop column named in partition key" error check from
> ATExecDropColumn(), where it is now, to RemoveAttributeById().
> That would be back-patchable, but the implication would be that
> dropping anything that a partitioning column depends on would be
> impossible, even with CASCADE; you'd have to manually drop the
> partitioned table first.  Good for data safety, but a horrible
> violation of expectations, and likely of the SQL spec as well.

I prefer this second solution as it works for both preexisting and new
tables, although I also agree that it is not user-friendly.  Would it
help to document that one would be unable to drop anything that a
partitioning column directly and indirectly depends on (type, domain,
schema, extension, etc.)?

> I'm not sure we could avoid order-of-traversal problems, either.
>
> Ideally, perhaps, a DROP CASCADE like this would not cascade to
> the whole table but only to the table's partitioned-ness property,
> leaving you with a non-partitioned table with most of its data
> intact.

Yeah, it would've been nice if the partitioned-ness property of table
could be deleted independently of the table.

>  It would take a lot of work to make that happen though,
> and it certainly wouldn't be back-patchable, and I'm not really
> sure it's worth it.

Agreed that this sounds maybe more like a new feature.

Thanks,
Amit