On Wed, Mar 16, 2016 at 11:25 AM, Robert Haas <robertmh...@gmail.com> wrote: > Sure, and if everybody does that, then there will be 40 patches that > get updated in the last 2 days if the CommitFest, and that will be > impossible. Come on. You're demanding a degree of preferential > treatment which is unsupportable.
It's unexpected that an entirely maintenance-orientated patch like this would be received this way. I'm not demanding anything, or applying any real pressure. Let's just get on with it. I attach a revision, that makes all the changes that Heikki suggested, except one. As already noted several times, following this suggestion would have added a bug. Alvaro preferred my original approach here in any case. I refer to my original approach of making the new UNIQUE_CHECK_SPECULATIVE case minimally different from the existing UNIQUE_CHECK_PARTIAL case currently used for deferred unique constraints and speculative insertion, as opposed to making the new UNIQUE_CHECK_SPECULATIVE "like CHECK_UNIQUE_YES, but return FALSE instead of throwing an error on conflict". That was broken because CHECK_UNIQUE_YES waits for the outcome of an xact, which UNIQUE_CHECK_PARTIAL never does, and so UNIQUE_CHECK_SPECULATIVE must never do. Any and all waits happen in the first phase of speculative insertion, and never the seconds. I could give a complicated explanation for why, involving a deadlock scenario, but a simple explanation will do: it has always worked that way, and was tested to work that way. Feedback from Heikki led to these changes for this revision: * The use of arguments within ExecInsert() was simplified. * More concise AM documentation. The docs essentially describe two new concepts: - What unique index insertion needs to know about speculative insertion in general. This doesn't just apply to speculative inserters themselves, of course. - What speculative insertion is. Why it exists (why we don't just wait on xact). In other words, what "unprincipled deadlocks" are, and how they are avoided (from a relatively high level). I feel like I have a responsibility to make sure that this mechanism is well documented, especially given that not that many people were involved in its design. It's possible that no more than the 3 original authors of UPSERT fully understand speculative insertion -- it's easy to miss some of the subtleties. I do not pursue something like this without good reason. I'm optimistic that the patch will be accepted if it is carefully considered. -- Peter Geoghegan
From 2b2a4c40a5e60ac1f28a75f11204ce88eb48cc73 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan <peter.geoghega...@gmail.com> Date: Tue, 2 Jun 2015 17:34:16 -0700 Subject: [PATCH] Refactor speculative insertion into unique indexes Add a dedicated IndexUniqueCheck constant for the speculative insertion case, UNIQUE_CHECK_SPECULATIVE, rather than reusing UNIQUE_CHECK_PARTIAL, which should now only be used for deferrable unique constraints. This change allows btinsert() (and, in principle, any amcanunique aminsert function) to avoid physically inserting an IndexTuple in the event of detecting a conflict during speculative insertion's second phase. With nbtree, this avoidance now occurs at the critical point in _bt_doinsert() immediately after establishing that there is a conflict, but immediately before actually calling _bt_insertonpg() to proceed with physical IndexTuple insertion. At that point during UNIQUE_CHECK_PARTIAL insertion it makes sense to soldier on, because the possibility remains that the conflict will later go away and everything will happen to work out because the conflicting insertion's transaction aborted. Speculative inserters, in contrast, have no chance of working out a way to proceed without first deleting the would-be-pointed-to heap tuple already physically inserted. For the current row proposed for insertion, no useful progress will have been made at this point. This patch has nothing to do with performance; it is intended to clarify how amcanunique AMs perform speculative insertion, and the general theory of operation. It is natural to avoid an unnecessary index tuple insertion. That that could happen before was quite misleading, because it implied that it was necessary, and didn't acknowledge the differing requirements in each case. --- doc/src/sgml/indexam.sgml | 101 ++++++++++++++++++++++++++------- src/backend/access/nbtree/nbtinsert.c | 49 +++++++++------- src/backend/executor/execIndexing.c | 34 +++++++++-- src/backend/executor/nodeModifyTable.c | 2 +- src/include/access/genam.h | 8 +++ 5 files changed, 148 insertions(+), 46 deletions(-) diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml index 5f7befb..1b26dd0 100644 --- a/doc/src/sgml/indexam.sgml +++ b/doc/src/sgml/indexam.sgml @@ -271,10 +271,13 @@ aminsert (Relation indexRelation, <para> The function's Boolean result value is significant only when - <literal>checkUnique</> is <literal>UNIQUE_CHECK_PARTIAL</>. - In this case a TRUE result means the new entry is known unique, whereas - FALSE means it might be non-unique (and a deferred uniqueness check must - be scheduled). For other cases a constant FALSE result is recommended. + <literal>checkUnique</> is <literal>UNIQUE_CHECK_PARTIAL</> or + <literal>UNIQUE_CHECK_SPECULATIVE</>. In this case a TRUE result + means the new entry is known unique, whereas FALSE means it might + be non-unique (and a deferred uniqueness check must be scheduled, + or for <literal>UNIQUE_CHECK_SPECULATIVE</>, speculative insertion + will restart from its first phase, conflict prechecking). For + other cases a constant FALSE result is recommended. </para> <para> @@ -875,11 +878,15 @@ amrestrpos (IndexScanDesc scan); <listitem> <para> If a conflicting row has been inserted by an as-yet-uncommitted - transaction, the would-be inserter must wait to see if that transaction - commits. If it rolls back then there is no conflict. If it commits - without deleting the conflicting row again, there is a uniqueness - violation. (In practice we just wait for the other transaction to - end and then redo the visibility check in toto.) + transaction, the would-be inserter must wait to see if that + transaction commits (or if its <firstterm>speculative + insertion</> succeeds). If it rolls back (or the speculative + insertion does not complete) then there is no conflict. If it + commits without deleting the conflicting row again, there is a + uniqueness violation. (In practice we just wait for the other + transaction to end and then redo the visibility check in toto; + if a speculative insertion was conflicting, it may now be found + to be confirmed pending transaction commit.) </para> </listitem> <listitem> @@ -903,15 +910,17 @@ amrestrpos (IndexScanDesc scan); </para> <para> - We require the index access method to apply these tests itself, which - means that it must reach into the heap to check the commit status of - any row that is shown to have a duplicate key according to the index - contents. This is without a doubt ugly and non-modular, but it saves - redundant work: if we did a separate probe then the index lookup for - a conflicting row would be essentially repeated while finding the place to - insert the new row's index entry. What's more, there is no obvious way - to avoid race conditions unless the conflict check is an integral part - of insertion of the new index entry. + We require the index access method to apply these tests itself, + which means that it must reach into the heap to check the commit + status of any row (possibly including a speculative insertion + token) that is shown to have a duplicate key according to the index + contents. This is without a doubt ugly and non-modular, but it + saves redundant work: if we did a separate probe then the index + lookup for a conflicting row would be essentially repeated while + finding the place to insert the new row's index entry. What's + more, there is no obvious way to avoid race conditions unless the + conflict check is an integral part of insertion of the new index + entry. </para> <para> @@ -926,8 +935,36 @@ amrestrpos (IndexScanDesc scan); tuple and some other tuple with the same key are live, then the error must be reported. (Note that for this purpose, <quote>live</> actually means <quote>any tuple in the index entry's HOT chain is live</>.) - To implement this, the <function>aminsert</> function is passed a - <literal>checkUnique</> parameter having one of the following values: + </para> + + <para> + Speculative insertion is an executor infrastructure used by + <literal>INSERT ... ON CONFLICT DO UPDATE/NOTHING</>. A major goal + of the infrastructure is to consistently maintain a documented + user-visible guarantee: the avoidance of <quote>unprincipled</> + deadlocks. Unprincipled deadlocks are deadlocks that arise due to + a mutual dependency that is purely an implementation detail, that + users cannot avoid by being consistent about the order of lock + acquisition. A speculative insertion token lock is not held for + its transaction's entire duration; rather, it is held during its + holder's speculative insertion, typically for only an instant. + Without this precaution, under high concurrency <literal>INSERT ... + ON CONFLICT DO UPDATE</> sessions could raise non-avoidable + deadlocks, which would be inconsistent with documented behavior. + Speculative insertion conflicts are handled in a similar manner to + deferred constraint conflicts, in that no error is initially + thrown. However, rather than rechecking for a conflict at a + deferred point, the speculative insertion attempt is cancelled. + The executor restarts at the first phase, the precheck, which is + the only point at which unique index speculative insertion may wait + for an outcome of or in another transaction. Note that deferrable + constraints are not supported. + </para> + + <para> + To implement unique enforcement, the <function>aminsert</> function + is passed a <literal>checkUnique</> parameter having one of the + following values: <itemizedlist> <listitem> @@ -986,6 +1023,30 @@ amrestrpos (IndexScanDesc scan); for the same tuple values as were used in the original insertion. </para> </listitem> + <listitem> + <para> + <literal>UNIQUE_CHECK_SPECULATIVE</> is very similar to + <literal>UNIQUE_CHECK_PARTIAL</>, but indicates an optimistic + insertion in the second stage of speculative insertion. + </para> + + <para> + <literal>UNIQUE_CHECK_SPECULATIVE</> allows the access method + to not proceed with <structname>IndexTuple</structname> + insertion when it is already clear that that cannot be useful + (i.e. when <function>aminsert</> is set to return FALSE, + making caller immediately delete its physically inserted heap + tuple). Proceeding with insertion of the caller's + <structname>IndexTuple</structname> regardless is harmless but + unnecessary. When <literal>UNIQUE_CHECK_SPECULATIVE</> + insertion detects a conflict, the caller (executor) will cancel + its speculative insertion, immediately preventing its heap + tuple from being observed even by uniqueness checking. The + executor then restarts from the very beginning for the row + proposed for insertion. There should be no side-effects + associated with the cancelled attempt. + </para> + </listitem> </itemizedlist> </para> diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c index e3c55eb..3b6620f 100644 --- a/src/backend/access/nbtree/nbtinsert.c +++ b/src/backend/access/nbtree/nbtinsert.c @@ -91,17 +91,21 @@ static void _bt_vacuum_one_page(Relation rel, Buffer buffer, Relation heapRel); * This routine is called by the public interface routine, btinsert. * By here, itup is filled in, including the TID. * - * If checkUnique is UNIQUE_CHECK_NO or UNIQUE_CHECK_PARTIAL, this - * will allow duplicates. Otherwise (UNIQUE_CHECK_YES or - * UNIQUE_CHECK_EXISTING) it will throw error for a duplicate. - * For UNIQUE_CHECK_EXISTING we merely run the duplicate check, and - * don't actually insert. + * If checkUnique is UNIQUE_CHECK_NO, UNIQUE_CHECK_PARTIAL or + * UNIQUE_CHECK_SPECULATIVE, this will allow duplicates. Otherwise + * (UNIQUE_CHECK_YES or UNIQUE_CHECK_EXISTING) it will throw error for a + * duplicate. For UNIQUE_CHECK_EXISTING we merely run the duplicate + * check, and don't actually insert. * - * The result value is only significant for UNIQUE_CHECK_PARTIAL: - * it must be TRUE if the entry is known unique, else FALSE. - * (In the current implementation we'll also return TRUE after a - * successful UNIQUE_CHECK_YES or UNIQUE_CHECK_EXISTING call, but - * that's just a coding artifact.) + * The result value is only significant for UNIQUE_CHECK_PARTIAL + * and UNIQUE_CHECK_SPECULATIVE: it must be TRUE if the entry is + * known unique, else FALSE. (In the current implementation + * we'll also return TRUE after a successful UNIQUE_CHECK_YES or + * UNIQUE_CHECK_EXISTING call, but that's just a coding + * artifact.) + * + * Note that UNIQUE_CHECK_SPECULATIVE calls that return FALSE + * don't actually proceed with insertion of the index tuple. */ bool _bt_doinsert(Relation rel, IndexTuple itup, @@ -188,7 +192,8 @@ top: } } - if (checkUnique != UNIQUE_CHECK_EXISTING) + if (checkUnique != UNIQUE_CHECK_EXISTING && + (checkUnique != UNIQUE_CHECK_SPECULATIVE || is_unique)) { /* * The only conflict predicate locking cares about for indexes is when @@ -230,10 +235,11 @@ top: * progress, *speculativeToken is set to non-zero, and the caller can wait for * the verdict on the insertion using SpeculativeInsertionWait(). * - * However, if checkUnique == UNIQUE_CHECK_PARTIAL, we always return - * InvalidTransactionId because we don't want to wait. In this case we - * set *is_unique to false if there is a potential conflict, and the - * core code must redo the uniqueness check later. + * However, if checkUnique is UNIQUE_CHECK_PARTIAL or + * UNIQUE_CHECK_SPECULATIVE, we always return InvalidTransactionId because we + * don't want to wait. In this case we set *is_unique to false if there is a + * potential conflict, and the core code must redo the uniqueness check later + * or restart speculative insertion. */ static TransactionId _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel, @@ -331,12 +337,15 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel, /* * It is a duplicate. If we are only doing a partial - * check, then don't bother checking if the tuple is being - * updated in another transaction. Just return the fact - * that it is a potential conflict and leave the full - * check till later. + * check or speculative insertion, then don't bother + * checking if the tuple is being updated in another + * transaction. Just return the fact that it is a potential + * conflict. Partial check callers can leave the full check + * till later. Speculative insertion callers will need to + * restart. */ - if (checkUnique == UNIQUE_CHECK_PARTIAL) + if (checkUnique == UNIQUE_CHECK_PARTIAL || + checkUnique == UNIQUE_CHECK_SPECULATIVE) { if (nbuf != InvalidBuffer) _bt_relbuf(rel, nbuf); diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c index 5d553d5..f37c423 100644 --- a/src/backend/executor/execIndexing.c +++ b/src/backend/executor/execIndexing.c @@ -259,6 +259,14 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo) * the same is done for non-deferred constraints, but report * if conflict was speculative or deferred conflict to caller) * + * Note that the implementation does not bother with inserting + * into all indexes in the event of detecting a speculative + * insertion conflict. This should not matter, because when + * *specConflict is set to true, caller's heap tuple will be + * super-deleted; a complete list of indexes with conflicts + * is not useful (in fact, even the actual source of the + * speculative insertion conflict is not of interest). + * * CAUTION: this must not be called for a HOT update. * We can't defend against that here for lack of info. * Should we change the API to make it safer? @@ -320,6 +328,7 @@ ExecInsertIndexTuples(TupleTableSlot *slot, /* Record if speculative insertion arbiter */ arbiter = list_member_oid(arbiterIndexes, indexRelation->rd_index->indexrelid); + Assert(!arbiter || indexRelation->rd_index->indimmediate); /* If the index is marked as read-only, ignore it */ if (!indexInfo->ii_ReadyForInserts) @@ -374,7 +383,7 @@ ExecInsertIndexTuples(TupleTableSlot *slot, if (!indexRelation->rd_index->indisunique) checkUnique = UNIQUE_CHECK_NO; else if (noDupErr && (arbiterIndexes == NIL || arbiter)) - checkUnique = UNIQUE_CHECK_PARTIAL; + checkUnique = UNIQUE_CHECK_SPECULATIVE; else if (indexRelation->rd_index->indimmediate) checkUnique = UNIQUE_CHECK_YES; else @@ -432,18 +441,29 @@ ExecInsertIndexTuples(TupleTableSlot *slot, } if ((checkUnique == UNIQUE_CHECK_PARTIAL || + checkUnique == UNIQUE_CHECK_SPECULATIVE || indexInfo->ii_ExclusionOps != NULL) && !satisfiesConstraint) { + if (noDupErr && indexRelation->rd_index->indimmediate) + { + /* + * Speculative conflict found, so caller must restart + * speculative insertion. Caller will now not care about + * specifics of any other deferred constraint conflict + * that may have already been noted. + */ + list_free(result); + *specConflict = true; + return NIL; + } + /* * The tuple potentially violates the uniqueness or exclusion * constraint, so make a note of the index so that we can re-check - * it later. Speculative inserters are told if there was a - * speculative conflict, since that always requires a restart. + * it later */ result = lappend_oid(result, RelationGetRelid(indexRelation)); - if (indexRelation->rd_index->indimmediate && specConflict) - *specConflict = true; } } @@ -540,6 +560,10 @@ ExecCheckIndexConstraints(TupleTableSlot *slot, errtableconstraint(heapRelation, RelationGetRelationName(indexRelation)))); + /* + * Set checkedIndex here, since partial unique index will still + * count as found + */ checkedIndex = true; /* Check for partial index */ diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 27051e8..d0d52f6 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -441,7 +441,7 @@ ExecInsert(ModifyTableState *mtstate, */ if (specConflict) { - list_free(recheckIndexes); + Assert(recheckIndexes == NIL); goto vlock; } diff --git a/src/include/access/genam.h b/src/include/access/genam.h index 81907d5..2254716 100644 --- a/src/include/access/genam.h +++ b/src/include/access/genam.h @@ -98,6 +98,13 @@ typedef struct SysScanDescData *SysScanDesc; * known unique, false if it is possibly non-unique. In the "true" case * it is safe to omit the later recheck. * + * UNIQUE_CHECK_SPECULATIVE is very similar to UNIQUE_CHECK_PARTIAL, but is + * used for speculative insertion (which does not support deferrable unique + * constraints as arbiters). The only difference is that it allows the + * implementation to avoid pointlessly inserting an index tuple after + * establishing that there is a speculative insertion conflict, where the heap + * tuple must be super-deleted anyway. + * * When it is time to recheck the deferred constraint, a pseudo-insertion * call is made with UNIQUE_CHECK_EXISTING. The tuple is already in the * index in this case, so it should not be inserted again. Rather, just @@ -108,6 +115,7 @@ typedef enum IndexUniqueCheck UNIQUE_CHECK_NO, /* Don't do any uniqueness checking */ UNIQUE_CHECK_YES, /* Enforce uniqueness at insertion time */ UNIQUE_CHECK_PARTIAL, /* Test uniqueness, but no error */ + UNIQUE_CHECK_SPECULATIVE, /* Speculative Insertion */ UNIQUE_CHECK_EXISTING /* Check if existing tuple is unique */ } IndexUniqueCheck; -- 1.9.1
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers