Added to the open items list. https://wiki.postgresql.org/wiki/PostgreSQL_11_Open_Items#Open_Issues
On Tue, Apr 3, 2018 at 2:52 PM, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote: > On 2018/04/03 14:45, Kyotaro HORIGUCHI wrote: > > Hello. > > > > At Mon, 2 Apr 2018 16:11:12 -0300, Alvaro Herrera < > alvhe...@alvh.no-ip.org> wrote: > >> Why do we need AccessExclusiveLock on all children of a relation that we > >> want to scan to search for rows not satisfying the constraint? I think > >> it should be enough to get ShareLock, which prevents INSERT, no? I have > >> a feeling I'm missing something here, but I don't know what, and all > >> tests pass with that change. > > Thinking on this a bit, I see no problem with locking the children with > just ShareLock. It was just a paranoia that if we're going to lock the > table itself being attached with AEL, we must its children (if any) with > AEL, too. > > > Mmm. I'm not sure if there's a lock-upgrade case but the > > following sentense is left at the last of one of the modified > > comments. ATREwriteTables is called with AEL after that (that has > > finally no effect in this case). > > > > | But we cannot risk a deadlock by > taking > > | * a weaker lock now and the stronger one only when needed. > > > > I don't actually find places where the children's lock can be > > raised but this seems just following the lock parent first > > principle. > > No lock upgrade happen as of now. The comment was added by the commit > 972b6ec20bf [1], which removed the code that could cause such a deadlock. > The comment fragment is simply trying to explain why we don't postpone the > locking of children to a later time, say, to the point where we actually > know that they need to be scanned. Previously the code next to the > comment used to lock the children using AccessShareLock, because at that > point we just needed to check if the table being attached is one of those > children and then later locked with AEL if it turned out that they need to > be scanned to check the partition constraint. > > > By the way check_default_allows_bound() (CREATE TABLE case) > > contains a similar usage of find_all_inheritors(default_rel, > > AEL). > > Good catch. Its job is more or less same as > ValidatePartitionConstraints(), except all the children (if any) are > scanned right away instead of queuing it like in the AT case. > > >> Also: the change proposed to remove the get_default_oid_from_partdesc() > >> call actually fixes the bug, but to me it was not at all obvious why. > > > > CommandCounterIncrement updates the content of a relcache entry > > via invalidation. It can be surprising for callers of a function > > like StorePartitionBound. > > > > CommandCounterIncrement > > <skip inval mechanism> > > RelationCacheInvalidateEntry > > RelationFlushRelation > > RelationClearRelation > > Because of the CCI() after storing the default partition OID into the > system catalog, RelationClearRelation() would changes what > rel->rd_partdesc points to where 'rel' is the ATExecAttachPartition()'s > reference to the relcache entry of the parent that it passed to > StorePartitionBound. > > So, whereas the rel->rd_partdesc wouldn't contain the default partition > before StorePartitionBound() was called, it would after. > > >> To figure out why, I first had to realize that > >> ValidatePartitionConstraints was lying to me, both in name and in > >> comments: it doesn't actually validate anything, it merely queues > >> entries so that alter table's phase 3 would do the validation. I found > >> this extremely confusing, so I fixed the comments to match reality, and > >> later decided to rename the function also. > > > > It is reasonable. Removing exccessive extension of lower-level > > partitions is also reasonable. The previous code extended > > inheritances in different manner for levels at odd and even > > depth. > > I like the new code including the naming, but I notice this changes the > order in which we do the locking now. There are still sites in the code > where the locking order is breadth-first, that is, as determined by > find_all_inheritors(), as this function would too previously. > > Also note that beside the breadth-first -> depth-first change, this also > changes the locking order of partitions for a given partitioned table. > The OIDs in partdesc->oids[] are canonically ordered (that is order of > their partition bounds), whereas find_inheritance_children() that's called > by find_all_inheritors() would lock them in the order in which the > individual OIDs were found in the system catalog. > > Not sure if there is anything to be alarmed of here, but in all previous > discussions, this has been a thing to avoid. > > >> At that point I was able to understand what the problem was: when > >> attaching the default partition, we were setting the constraint to be > >> validated for that partition to the correctly computed partition > >> constraint; and later in the same function we would set the constraint > >> to "the immature constraint" because we were now seeing that the default > >> partition OID was not invalid. So it was rather a bug in > >> ValidatePartitionConstraints, in that it was accepting to set the > >> expression to validate when another expression had already been set! I > >> added an assert to protect against this. And then changed the decision > >> of whether or not to run this block based on the attached partition > >> being the default one or not; because as the "if" test was, it was just > >> a recipe for confusion. (Now, if you look carefully you realize that > >> the new test for bound->is_default I added is kinda redundant: it can > >> only be set if there was a default partition OID at start of the > >> function, and therefore we know we're not adding a default partition. > >> And for the case where we *are* adding the default partition, it is > >> still Invalid because we don't re-read its own OID. But relying on that > >> seems brittle because it breaks if we happen to update the default > >> partition OID in the middle of that function, which is what we were > >> doing. Hence the new is_default test.) > > > > Seems reasonable. > > +1 > > > But even if we assume so, rereading > > defaultPartOid is still breaking the assumption that > > defaultPartOid is that at the time of entering to this function > > and the added condition just conceals the fact. > > Afaics, defaultPartOid is only set at the beginning of > ATExecAttachPartition, so it seems fine. > > > So I think it should be an assertion. > > > > | if (OidIsValid(defaultPartOid)) > > | { > > | /* > > | * The command cannot be adding default partition if the > > | * defaultPartOid is valid. > > | */ > > | Assert(!cmd->bound->is_default); > > I guess that makes sense, because when trying to attach a default > partition to the table that already has one, check_new_partition_bound > that's called before this errors out before we could get here. > > >> I looked at the implementation of ValidatePartitionConstraints and > >> didn't like it, so I rewrote it also. > >> > >> This comment is mistaken, too: > >> - /* > >> - * Skip if the partition is itself a partitioned table. We can > only > >> - * ever scan RELKIND_RELATION relations. > >> - */ > >> ... because it ignores the possibility of a partition being a foreign > >> table. The code seems to work, but I think there is no test to cover > >> the case that a foreign table containing data that doesn't satisfy the > >> constraint is attached, because when I set that case to return doing > >> nothing (ie. ATGetQueueEntry is not called for a foreign partition), no > >> test failed. > > > > Foreign tables are intentionally not verified on attaching (in my > > understanding). ATRewriteTables ignores foreign tables so it > > contradicts to what ATRewriteTables thinks. As my understanding > > foreign tables are assumed to be unrestrictable by local > > constraint after attaching so the users are responsible to the > > consistency. > > That and ATRewriteTable() in phase 3 cannot really cope with being handed > a foreign table as it can only work with RELKIND_RELATION tables. > Actually, the following in ATRewriteTables() also prevents passing foreign > tables: > > static void > ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE > lockmode) > { > ListCell *ltab; > > /* Go through each table that needs to be checked or rewritten */ > foreach(ltab, *wqueue) > { > AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab); > > /* > * Foreign tables have no storage, nor do partitioned tables and > * indexes. > */ > if (tab->relkind == RELKIND_FOREIGN_TABLE || > tab->relkind == RELKIND_PARTITIONED_TABLE || > tab->relkind == RELKIND_PARTITIONED_INDEX) > continue; > > >> Generally speaking, I didn't like ATExecAttachPartition; it's doing too > >> much that should have been done in ATPrepAttachPartition. The only > >> reason that's not broken is because we don't allow ATTACH PARTITION to > >> appear together with other commands in alter table, which seems > >> disappointing ... for example, when attaching multiple tables and a > >> default partition exist, you have to scan the default one multiple > >> times, which is lame. > > > > Indeed, currently only partition commands are isolated from other > > alter table subcommands (except all in tablespace cases). We can > > improve that in the next step? > > I think we can improve this. > > >> (Speaking of lame: I noticed that RelationGetPartitionQual obtains lock > >> on the parent relation ... I wonder if this can be used to cause a > >> deadlock during InitResultRelationInfo.) > > It does seem that there is a possibility of deadlock, because when > InitResultRelInfo(), that's initializing a partition, calls > RelationGetPartitionQual() to get its partition constraint, the partition > would already have been locked. So this reverses the generally > established order of locking the parent first; another session which tries > to add a column to the parent, for example, will lock the parent and then > partitions. > > I think we need for inserts directly into a partition to lock all of its > ancestors from the root to the direct parent (in that order) before > locking the partition itself, or maybe at least the parent if not all > ancestors. > > > Mmm. That seems strange since the RealtionGetPartitionQual > > doesn't return anything (specifically NIL) there, because we > > don't allow a partition to be attached to partitioned tables. It > > seems totally useless. > > > > Addition to that the code tries to add the partition qual (which > > is always NIL) destructively and assign to partConstraint.. > > It is not always NIL. Imagine attaching a partition at a lower level. > > create table foo (a int, b char) partition by list (a); > create table foo1 partition of foo for values in (1) partition by list (b); > create table foo1a (a, b) as values (2, 'b'); > > -- note that we're attaching to foo1, not foo > alter table foo1 attach partition foo1a for values in ('a'); > > If we didn't include foo1's (the parent) constraint (that is, a = 1), the > above command will wrongly succeed. It must include a = 1 in the > constraint to be be checked when scanning foo1a. > > Although, I noticed there is no test covering this. > > >> BTW, I think this is already broken for the case where the default > >> partition is partitioned and you attach a partition colliding with a row > >> that's being concurrently inserted in a partition of the default > >> partition, though I didn't bother to write a test for that. > > > > How is it broken? Every attaching partitions are checked for the > > specified partition bound and every partitions of the default > > partition are also checked against the new default part bound. We > > already hold required locks on all the participants. > > Yes, concurrent insertions to either the default partition or any of its > partitions couldn't be occurring as we'd have locked them. > > >> Anyway, I'm just an onlooker fixing a CommandCounterIncrement change. > > > > It's reassuring. Thanks. > > Yes, thank you for taking the time out to clean things up. > > Thanks, > Amit > > [1] > https://git.postgresql.org/gitweb/?p=postgresql.git;a= > commitdiff;h=972b6ec20bf > > -- Rushabh Lathia