On 2019/03/08 19:22, Amit Langote wrote: > On 2019/03/07 20:36, Amit Langote wrote: >> On Thu, Mar 7, 2019 at 11:17 AM Amit Langote >> <langote_amit...@lab.ntt.co.jp> wrote: >>> The problem start when ALTER TABLE users ALTER COLUMN is executed. >>> create table users(user_id int, name varchar(64), unique (user_id, name)) >>> partition by list(user_id); >>> >>> create table users_000 partition of users for values in(0); >>> create table users_001 partition of users for values in(1); >>> select relname, relfilenode from pg_class where relname like 'users%'; >>> relname │ relfilenode >>> ────────────────────────────┼───────────── >>> users │ 16441 >>> users_000 │ 16446 >>> users_000_user_id_name_key │ 16449 >>> users_001 │ 16451 >>> users_001_user_id_name_key │ 16454 >>> users_user_id_name_key │ 16444 >>> (6 rows) >>> >>> alter table users alter column name type varchar(127); >>> select relname, relfilenode from pg_class where relname like 'users%'; >>> relname │ relfilenode >>> ────────────────────────────┼───────────── >>> users │ 16441 >>> users_000 │ 16446 >>> users_000_user_id_name_key │ 16444 <=== duplicated >>> users_001 │ 16451 >>> users_001_user_id_name_key │ 16444 <=== duplicated >>> users_user_id_name_key │ 16444 <=== duplicated >>> (6 rows) >> >> I checked why users_000's and user_0001's indexes end up reusing >> users_user_id_name_key's relfilenode. At the surface, it's because >> DefineIndex(<parent's-index-to-be-recreated>) is carrying oldNode = >> <old-parents-index's-relfilenode> in IndexStmt, which is recursively >> passed down to DefineIndex(<child-indexes-to-be-recreated>). This >> DefineIndex() chain is running due to ATPostAlterTypeCleanup() on the >> parent rel. This surface problem may be solved in DefineIndex() by >> just resetting oldNode in each child IndexStmt before recursing, but >> that means child indexes are recreated with new relfilenodes. That >> solves the immediate problem of relfilenodes being wrongly duplicated, >> that's leading to madness such as SMgrRelationHash corruption being >> seen in the original bug report. > > This doesn't happen in HEAD, because in HEAD we got 807ae415c5, which > changed things so that partitioned relations always have their relfilenode > set to 0. So, there's no question of parent's relfilenode being passed to > children and hence being duplicated. > > But that also means child indexes are unnecessarily rewritten, that is, > with new relfilenodes. > >> But, the root problem seems to be that ATPostAlterTypeCleanup() on >> child tables isn't setting up their own >> DefineIndex(<child-index-to-be-rewritten>) step. That's because the >> parent's ATPostAlterTypeCleanup() dropped child copies of the UNIQUE >> constraint due to dependencies (+ CCI). So, ATExecAlterColumnType() >> on child relations isn't able to find the constraint on the individual >> child relations to turn into their own >> DefineIndex(<child-index-to-be-rewritten>). If we manage to handle >> each relation's ATPostAlterTypeCleanup() independently, child's >> recreated indexes will be able to reuse their old relfilenodes and >> everything will be fine. But maybe that will require significant >> overhaul of how this post-alter-type-cleanup occurs? > > We could try to solve this dividing ATPostAlterTypeCleanup processing into > two functions: > > 1 The first one runs right after ATExecAlterColumnType() is finished for a > given table (like it does today), which then runs ATPostAlterTypeParse to > generate commands for constraints and/or indexes to re-add. This part > won't drop the old constraints/indexes just yet, so child > constraints/indexes will remain for ATExecAlterColumnType to see when > executed for the children. > > 2. Dropping the old constraints/indexes is the responsibility of the 2nd > part, which runs just before executing ATExecReAddIndex or > ATExecAddConstraint (with is_readd = true), so that the new constraints > don't collide with the existing ones. > > This arrangement allows step 1 to generate the commands to recreate even > the child indexes such that the old relfilenode can be be preserved by > setting IndexStmt.oldNode. > > Attached patch is a very rough sketch, which fails some regression tests, > but I ran out of time today.
I'm thinking of adding this to open items under Older Bugs. Attached the patch that I had posted on -bugs, but it's only a rough sketch as described above, not a full fix. Link to the original bug report: https://www.postgresql.org/message-id/flat/15672-b9fa7db32698269f%40postgresql.org Thanks, Amit
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index 5dcedc337a..ca514919d1 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -872,7 +872,7 @@ DefineIndex(Oid relationId, CreateComments(indexRelationId, RelationRelationId, 0, stmt->idxcomment); - if (partitioned) + if (partitioned && !is_alter_table) { /* * Unless caller specified to skip this step (via ONLY), process each diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 93f13a4778..fb10d640e5 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -426,7 +426,9 @@ static ObjectAddress ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel, AlterTableCmd *cmd, LOCKMODE lockmode); static ObjectAddress ATExecAlterColumnGenericOptions(Relation rel, const char *colName, List *options, LOCKMODE lockmode); -static void ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, +static void ATPostAlterTypeCleanup1(List **wqueue, AlteredTableInfo *tab, + LOCKMODE lockmode); +static void ATPostAlterTypeCleanup2(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode); static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd, List **wqueue, LOCKMODE lockmode, @@ -4068,6 +4070,10 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode) if (subcmds == NIL) continue; + /* Drop the old indexes and constraints before creating*/ + if (pass == AT_PASS_OLD_INDEX || pass == AT_PASS_OLD_CONSTR) + ATPostAlterTypeCleanup2(wqueue, tab, lockmode); + /* * Appropriate lock was obtained by phase 1, needn't get it again */ @@ -4084,7 +4090,7 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode) * multiple columns of a table are altered). */ if (pass == AT_PASS_ALTER_TYPE) - ATPostAlterTypeCleanup(wqueue, tab, lockmode); + ATPostAlterTypeCleanup1(wqueue, tab, lockmode); relation_close(rel, NoLock); } @@ -7233,6 +7239,9 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel, if (constr->is_no_inherit) return address; + if (is_readd) + return address; + /* * Propagate to children as appropriate. Unlike most other ALTER * routines, we have to do this one level of recursion at a time; we can't @@ -10307,21 +10316,12 @@ ATExecAlterColumnGenericOptions(Relation rel, * and constraints that depend on the altered columns. */ static void -ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode) +ATPostAlterTypeCleanup1(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode) { - ObjectAddress obj; - ObjectAddresses *objects; ListCell *def_item; ListCell *oid_item; /* - * Collect all the constraints and indexes to drop so we can process them - * in a single call. That way we don't have to worry about dependencies - * among them. - */ - objects = new_object_addresses(); - - /* * Re-parse the index and constraint definitions, and attach them to the * appropriate work queue entries. We do this before dropping because in * the case of a FOREIGN KEY constraint, we might not yet have exclusive @@ -10363,16 +10363,13 @@ ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode) conislocal = con->conislocal; ReleaseSysCache(tup); - ObjectAddressSet(obj, ConstraintRelationId, oldId); - add_exact_object_address(&obj, objects); - /* * If the constraint is inherited (only), we don't want to inject a * new definition here; it'll get recreated when ATAddCheckConstraint * recurses from adding the parent table's constraint. But we had to * carry the info this far so that we can drop the constraint below. */ - if (!conislocal) + if (!conislocal && contype == CONSTR_CHECK) continue; /* @@ -10398,6 +10395,46 @@ ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode) ATPostAlterTypeParse(oldId, relid, InvalidOid, (char *) lfirst(def_item), wqueue, lockmode, tab->rewrite); + } +} + +/* + * Cleanup after we've finished all the ALTER TYPE operations for a + * particular relation. We have to drop and recreate all the indexes + * and constraints that depend on the altered columns. + */ +static void +ATPostAlterTypeCleanup2(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode) +{ + ObjectAddress obj; + ObjectAddresses *objects; + ListCell *def_item; + ListCell *oid_item; + + /* + * Collect all the constraints and indexes to drop so we can process them + * in a single call. That way we don't have to worry about dependencies + * among them. + */ + objects = new_object_addresses(); + forboth(oid_item, tab->changedConstraintOids, + def_item, tab->changedConstraintDefs) + { + Oid oldId = lfirst_oid(oid_item); + + if (!SearchSysCacheExists1(CONSTROID, oldId)) + continue; + + ObjectAddressSet(obj, ConstraintRelationId, oldId); + add_exact_object_address(&obj, objects); + } + forboth(oid_item, tab->changedIndexOids, + def_item, tab->changedIndexDefs) + { + Oid oldId = lfirst_oid(oid_item); + + if (!SearchSysCacheExists1(RELOID, oldId)) + continue; ObjectAddressSet(obj, RelationRelationId, oldId); add_exact_object_address(&obj, objects); @@ -10411,6 +10448,7 @@ ATPostAlterTypeCleanup(List **wqueue, AlteredTableInfo *tab, LOCKMODE lockmode) free_object_addresses(objects); + CommandCounterIncrement(); /* * The objects will get recreated during subsequent passes over the work * queue.