Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

Peter Geoghegan Sat, 19 Mar 2016 16:55:15 -0700

On Wed, Mar 16, 2016 at 11:25 AM, Robert Haas <robertmh...@gmail.com> wrote:
> Sure, and if everybody does that, then there will be 40 patches that
> get updated in the last 2 days if the CommitFest, and that will be
> impossible.  Come on.  You're demanding a degree of preferential
> treatment which is unsupportable.


It's unexpected that an entirely maintenance-orientated patch like
this would be received this way. I'm not demanding anything, or
applying any real pressure. Let's just get on with it.

I attach a revision, that makes all the changes that Heikki suggested,
except one. As already noted several times, following this suggestion
would have added a bug. Alvaro preferred my original approach here in
any case. I refer to my original approach of making the new
UNIQUE_CHECK_SPECULATIVE case minimally different from the existing
UNIQUE_CHECK_PARTIAL case currently used for deferred unique
constraints and speculative insertion, as opposed to making the new
UNIQUE_CHECK_SPECULATIVE "like CHECK_UNIQUE_YES, but return FALSE
instead of throwing an error on conflict". That was broken because
CHECK_UNIQUE_YES waits for the outcome of an xact, which
UNIQUE_CHECK_PARTIAL never does, and so UNIQUE_CHECK_SPECULATIVE must
never do.

Any and all waits happen in the first phase of speculative insertion,
and never the seconds. I could give a complicated explanation for why,
involving a deadlock scenario, but a simple explanation will do: it
has always worked that way, and was tested to work that way.

Feedback from Heikki led to these changes for this revision:

* The use of arguments within ExecInsert() was simplified.

* More concise AM documentation.

The docs essentially describe two new concepts:

- What unique index insertion needs to know about speculative
insertion in general. This doesn't just apply to speculative inserters
themselves, of course.

- What speculative insertion is. Why it exists (why we don't just wait
on xact). In other words, what "unprincipled deadlocks" are, and how
they are avoided (from a relatively high level).

I feel like I have a responsibility to make sure that this mechanism
is well documented, especially given that not that many people were
involved in its design. It's possible that no more than the 3 original
authors of UPSERT fully understand speculative insertion -- it's easy
to miss some of the subtleties.

I do not pursue something like this without good reason. I'm
optimistic that the patch will be accepted if it is carefully
considered.

-- 
Peter Geoghegan

From 2b2a4c40a5e60ac1f28a75f11204ce88eb48cc73 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <peter.geoghega...@gmail.com>
Date: Tue, 2 Jun 2015 17:34:16 -0700
Subject: [PATCH] Refactor speculative insertion into unique indexes

Add a dedicated IndexUniqueCheck constant for the speculative insertion
case, UNIQUE_CHECK_SPECULATIVE, rather than reusing
UNIQUE_CHECK_PARTIAL, which should now only be used for deferrable
unique constraints.

This change allows btinsert() (and, in principle, any amcanunique
aminsert function) to avoid physically inserting an IndexTuple in the
event of detecting a conflict during speculative insertion's second
phase.  With nbtree, this avoidance now occurs at the critical point in
_bt_doinsert() immediately after establishing that there is a conflict,
but immediately before actually calling _bt_insertonpg() to proceed with
physical IndexTuple insertion.

At that point during UNIQUE_CHECK_PARTIAL insertion it makes sense to
soldier on, because the possibility remains that the conflict will later
go away and everything will happen to work out because the conflicting
insertion's transaction aborted.  Speculative inserters, in contrast,
have no chance of working out a way to proceed without first deleting
the would-be-pointed-to heap tuple already physically inserted.  For the
current row proposed for insertion, no useful progress will have been
made at this point.

This patch has nothing to do with performance; it is intended to clarify
how amcanunique AMs perform speculative insertion, and the general
theory of operation.  It is natural to avoid an unnecessary index tuple
insertion.  That that could happen before was quite misleading, because
it implied that it was necessary, and didn't acknowledge the differing
requirements in each case.
---
 doc/src/sgml/indexam.sgml              | 101 ++++++++++++++++++++++++++-------
 src/backend/access/nbtree/nbtinsert.c  |  49 +++++++++-------
 src/backend/executor/execIndexing.c    |  34 +++++++++--
 src/backend/executor/nodeModifyTable.c |   2 +-
 src/include/access/genam.h             |   8 +++
 5 files changed, 148 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 5f7befb..1b26dd0 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -271,10 +271,13 @@ aminsert (Relation indexRelation,
 
   <para>
    The function's Boolean result value is significant only when
-   <literal>checkUnique</> is <literal>UNIQUE_CHECK_PARTIAL</>.
-   In this case a TRUE result means the new entry is known unique, whereas
-   FALSE means it might be non-unique (and a deferred uniqueness check must
-   be scheduled).  For other cases a constant FALSE result is recommended.
+   <literal>checkUnique</> is <literal>UNIQUE_CHECK_PARTIAL</> or
+   <literal>UNIQUE_CHECK_SPECULATIVE</>.  In this case a TRUE result
+   means the new entry is known unique, whereas FALSE means it might
+   be non-unique (and a deferred uniqueness check must be scheduled,
+   or for <literal>UNIQUE_CHECK_SPECULATIVE</>, speculative insertion
+   will restart from its first phase, conflict prechecking).  For
+   other cases a constant FALSE result is recommended.
   </para>
 
   <para>
@@ -875,11 +878,15 @@ amrestrpos (IndexScanDesc scan);
      <listitem>
       <para>
        If a conflicting row has been inserted by an as-yet-uncommitted
-       transaction, the would-be inserter must wait to see if that transaction
-       commits.  If it rolls back then there is no conflict.  If it commits
-       without deleting the conflicting row again, there is a uniqueness
-       violation.  (In practice we just wait for the other transaction to
-       end and then redo the visibility check in toto.)
+       transaction, the would-be inserter must wait to see if that
+       transaction commits (or if its <firstterm>speculative
+       insertion</> succeeds).  If it rolls back (or the speculative
+       insertion does not complete) then there is no conflict.  If it
+       commits without deleting the conflicting row again, there is a
+       uniqueness violation.  (In practice we just wait for the other
+       transaction to end and then redo the visibility check in toto;
+       if a speculative insertion was conflicting, it may now be found
+       to be confirmed pending transaction commit.)
       </para>
      </listitem>
      <listitem>
@@ -903,15 +910,17 @@ amrestrpos (IndexScanDesc scan);
   </para>
 
   <para>
-   We require the index access method to apply these tests itself, which
-   means that it must reach into the heap to check the commit status of
-   any row that is shown to have a duplicate key according to the index
-   contents.  This is without a doubt ugly and non-modular, but it saves
-   redundant work: if we did a separate probe then the index lookup for
-   a conflicting row would be essentially repeated while finding the place to
-   insert the new row's index entry.  What's more, there is no obvious way
-   to avoid race conditions unless the conflict check is an integral part
-   of insertion of the new index entry.
+   We require the index access method to apply these tests itself,
+   which means that it must reach into the heap to check the commit
+   status of any row (possibly including a speculative insertion
+   token) that is shown to have a duplicate key according to the index
+   contents.  This is without a doubt ugly and non-modular, but it
+   saves redundant work: if we did a separate probe then the index
+   lookup for a conflicting row would be essentially repeated while
+   finding the place to insert the new row's index entry.  What's
+   more, there is no obvious way to avoid race conditions unless the
+   conflict check is an integral part of insertion of the new index
+   entry.
   </para>
 
   <para>
@@ -926,8 +935,36 @@ amrestrpos (IndexScanDesc scan);
    tuple and some other tuple with the same key are live, then the error
    must be reported.  (Note that for this purpose, <quote>live</> actually
    means <quote>any tuple in the index entry's HOT chain is live</>.)
-   To implement this, the <function>aminsert</> function is passed a
-   <literal>checkUnique</> parameter having one of the following values:
+  </para>
+
+  <para>
+   Speculative insertion is an executor infrastructure used by
+   <literal>INSERT ... ON CONFLICT DO UPDATE/NOTHING</>.  A major goal
+   of the infrastructure is to consistently maintain a documented
+   user-visible guarantee:  the avoidance of <quote>unprincipled</>
+   deadlocks.  Unprincipled deadlocks are deadlocks that arise due to
+   a mutual dependency that is purely an implementation detail, that
+   users cannot avoid by being consistent about the order of lock
+   acquisition.  A speculative insertion token lock is not held for
+   its transaction's entire duration;  rather, it is held during its
+   holder's speculative insertion, typically for only an instant.
+   Without this precaution, under high concurrency <literal>INSERT ...
+   ON CONFLICT DO UPDATE</> sessions could raise non-avoidable
+   deadlocks, which would be inconsistent with documented behavior.
+   Speculative insertion conflicts are handled in a similar manner to
+   deferred constraint conflicts, in that no error is initially
+   thrown.  However, rather than rechecking for a conflict at a
+   deferred point, the speculative insertion attempt is cancelled.
+   The executor restarts at the first phase, the precheck, which is
+   the only point at which unique index speculative insertion may wait
+   for an outcome of or in another transaction.  Note that deferrable
+   constraints are not supported.
+  </para>
+
+  <para>
+   To implement unique enforcement, the <function>aminsert</> function
+   is passed a <literal>checkUnique</> parameter having one of the
+   following values:
 
     <itemizedlist>
      <listitem>
@@ -986,6 +1023,30 @@ amrestrpos (IndexScanDesc scan);
        for the same tuple values as were used in the original insertion.
       </para>
      </listitem>
+     <listitem>
+      <para>
+       <literal>UNIQUE_CHECK_SPECULATIVE</> is very similar to
+       <literal>UNIQUE_CHECK_PARTIAL</>, but indicates an optimistic
+       insertion in the second stage of speculative insertion.
+      </para>
+
+      <para>
+       <literal>UNIQUE_CHECK_SPECULATIVE</> allows the access method
+       to not proceed with <structname>IndexTuple</structname>
+       insertion when it is already clear that that cannot be useful
+       (i.e.  when <function>aminsert</> is set to return FALSE,
+       making caller immediately delete its physically inserted heap
+       tuple).  Proceeding with insertion of the caller's
+       <structname>IndexTuple</structname> regardless is harmless but
+       unnecessary.   When <literal>UNIQUE_CHECK_SPECULATIVE</>
+       insertion detects a conflict, the caller (executor) will cancel
+       its speculative insertion, immediately preventing its heap
+       tuple from being observed even by uniqueness checking.  The
+       executor then restarts from the very beginning for the row
+       proposed for insertion. There should be no side-effects
+       associated with the cancelled attempt.
+      </para>
+     </listitem>
     </itemizedlist>
   </para>
 
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index e3c55eb..3b6620f 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -91,17 +91,21 @@ static void _bt_vacuum_one_page(Relation rel, Buffer buffer, Relation heapRel);
  *		This routine is called by the public interface routine, btinsert.
  *		By here, itup is filled in, including the TID.
  *
- *		If checkUnique is UNIQUE_CHECK_NO or UNIQUE_CHECK_PARTIAL, this
- *		will allow duplicates.  Otherwise (UNIQUE_CHECK_YES or
- *		UNIQUE_CHECK_EXISTING) it will throw error for a duplicate.
- *		For UNIQUE_CHECK_EXISTING we merely run the duplicate check, and
- *		don't actually insert.
+ *		If checkUnique is UNIQUE_CHECK_NO, UNIQUE_CHECK_PARTIAL or
+ *		UNIQUE_CHECK_SPECULATIVE, this will allow duplicates.  Otherwise
+ *		(UNIQUE_CHECK_YES or UNIQUE_CHECK_EXISTING) it will throw error for a
+ *		duplicate.  For UNIQUE_CHECK_EXISTING we merely run the duplicate
+ *		check, and don't actually insert.
  *
- *		The result value is only significant for UNIQUE_CHECK_PARTIAL:
- *		it must be TRUE if the entry is known unique, else FALSE.
- *		(In the current implementation we'll also return TRUE after a
- *		successful UNIQUE_CHECK_YES or UNIQUE_CHECK_EXISTING call, but
- *		that's just a coding artifact.)
+ *		The result value is only significant for UNIQUE_CHECK_PARTIAL
+ *		and UNIQUE_CHECK_SPECULATIVE:  it must be TRUE if the entry is
+ *		known unique, else FALSE.  (In the current implementation
+ *		we'll also return TRUE after a successful UNIQUE_CHECK_YES or
+ *		UNIQUE_CHECK_EXISTING call, but that's just a coding
+ *		artifact.)
+ *
+ *		Note that UNIQUE_CHECK_SPECULATIVE calls that return FALSE
+ *		don't actually proceed with insertion of the index tuple.
  */
 bool
 _bt_doinsert(Relation rel, IndexTuple itup,
@@ -188,7 +192,8 @@ top:
 		}
 	}
 
-	if (checkUnique != UNIQUE_CHECK_EXISTING)
+	if (checkUnique != UNIQUE_CHECK_EXISTING &&
+		(checkUnique != UNIQUE_CHECK_SPECULATIVE || is_unique))
 	{
 		/*
 		 * The only conflict predicate locking cares about for indexes is when
@@ -230,10 +235,11 @@ top:
  * progress, *speculativeToken is set to non-zero, and the caller can wait for
  * the verdict on the insertion using SpeculativeInsertionWait().
  *
- * However, if checkUnique == UNIQUE_CHECK_PARTIAL, we always return
- * InvalidTransactionId because we don't want to wait.  In this case we
- * set *is_unique to false if there is a potential conflict, and the
- * core code must redo the uniqueness check later.
+ * However, if checkUnique is UNIQUE_CHECK_PARTIAL or
+ * UNIQUE_CHECK_SPECULATIVE, we always return InvalidTransactionId because we
+ * don't want to wait.  In this case we set *is_unique to false if there is a
+ * potential conflict, and the core code must redo the uniqueness check later
+ * or restart speculative insertion.
  */
 static TransactionId
 _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
@@ -331,12 +337,15 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
 
 					/*
 					 * It is a duplicate. If we are only doing a partial
-					 * check, then don't bother checking if the tuple is being
-					 * updated in another transaction. Just return the fact
-					 * that it is a potential conflict and leave the full
-					 * check till later.
+					 * check or speculative insertion, then don't bother
+					 * checking if the tuple is being updated in another
+					 * transaction. Just return the fact that it is a potential
+					 * conflict. Partial check callers can leave the full check
+					 * till later. Speculative insertion callers will need to
+					 * restart.
 					 */
-					if (checkUnique == UNIQUE_CHECK_PARTIAL)
+					if (checkUnique == UNIQUE_CHECK_PARTIAL ||
+						checkUnique == UNIQUE_CHECK_SPECULATIVE)
 					{
 						if (nbuf != InvalidBuffer)
 							_bt_relbuf(rel, nbuf);
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 5d553d5..f37c423 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -259,6 +259,14 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  *		the same is done for non-deferred constraints, but report
  *		if conflict was speculative or deferred conflict to caller)
  *
+ *		Note that the implementation does not bother with inserting
+ *		into all indexes in the event of detecting a speculative
+ *		insertion conflict.  This should not matter, because when
+ *		*specConflict is set to true, caller's heap tuple will be
+ *		super-deleted;  a complete list of indexes with conflicts
+ *		is not useful (in fact, even the actual source of the
+ *		speculative insertion conflict is not of interest).
+ *
  *		CAUTION: this must not be called for a HOT update.
  *		We can't defend against that here for lack of info.
  *		Should we change the API to make it safer?
@@ -320,6 +328,7 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 		/* Record if speculative insertion arbiter */
 		arbiter = list_member_oid(arbiterIndexes,
 								  indexRelation->rd_index->indexrelid);
+		Assert(!arbiter || indexRelation->rd_index->indimmediate);
 
 		/* If the index is marked as read-only, ignore it */
 		if (!indexInfo->ii_ReadyForInserts)
@@ -374,7 +383,7 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 		if (!indexRelation->rd_index->indisunique)
 			checkUnique = UNIQUE_CHECK_NO;
 		else if (noDupErr && (arbiterIndexes == NIL || arbiter))
-			checkUnique = UNIQUE_CHECK_PARTIAL;
+			checkUnique = UNIQUE_CHECK_SPECULATIVE;
 		else if (indexRelation->rd_index->indimmediate)
 			checkUnique = UNIQUE_CHECK_YES;
 		else
@@ -432,18 +441,29 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 		}
 
 		if ((checkUnique == UNIQUE_CHECK_PARTIAL ||
+			 checkUnique == UNIQUE_CHECK_SPECULATIVE ||
 			 indexInfo->ii_ExclusionOps != NULL) &&
 			!satisfiesConstraint)
 		{
+			if (noDupErr && indexRelation->rd_index->indimmediate)
+			{
+				/*
+				 * Speculative conflict found, so caller must restart
+				 * speculative insertion.  Caller will now not care about
+				 * specifics of any other deferred constraint conflict
+				 * that may have already been noted.
+				 */
+				list_free(result);
+				*specConflict = true;
+				return NIL;
+			}
+
 			/*
 			 * The tuple potentially violates the uniqueness or exclusion
 			 * constraint, so make a note of the index so that we can re-check
-			 * it later.  Speculative inserters are told if there was a
-			 * speculative conflict, since that always requires a restart.
+			 * it later
 			 */
 			result = lappend_oid(result, RelationGetRelid(indexRelation));
-			if (indexRelation->rd_index->indimmediate && specConflict)
-				*specConflict = true;
 		}
 	}
 
@@ -540,6 +560,10 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
 					 errtableconstraint(heapRelation,
 								   RelationGetRelationName(indexRelation))));
 
+		/*
+		 * Set checkedIndex here, since partial unique index will still
+		 * count as found
+		 */
 		checkedIndex = true;
 
 		/* Check for partial index */
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 27051e8..d0d52f6 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -441,7 +441,7 @@ ExecInsert(ModifyTableState *mtstate,
 			 */
 			if (specConflict)
 			{
-				list_free(recheckIndexes);
+				Assert(recheckIndexes == NIL);
 				goto vlock;
 			}
 
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 81907d5..2254716 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -98,6 +98,13 @@ typedef struct SysScanDescData *SysScanDesc;
  * known unique, false if it is possibly non-unique.  In the "true" case
  * it is safe to omit the later recheck.
  *
+ * UNIQUE_CHECK_SPECULATIVE is very similar to UNIQUE_CHECK_PARTIAL, but is
+ * used for speculative insertion (which does not support deferrable unique
+ * constraints as arbiters).  The only difference is that it allows the
+ * implementation to avoid pointlessly inserting an index tuple after
+ * establishing that there is a speculative insertion conflict, where the heap
+ * tuple must be super-deleted anyway.
+ *
  * When it is time to recheck the deferred constraint, a pseudo-insertion
  * call is made with UNIQUE_CHECK_EXISTING.  The tuple is already in the
  * index in this case, so it should not be inserted again.  Rather, just
@@ -108,6 +115,7 @@ typedef enum IndexUniqueCheck
 	UNIQUE_CHECK_NO,			/* Don't do any uniqueness checking */
 	UNIQUE_CHECK_YES,			/* Enforce uniqueness at insertion time */
 	UNIQUE_CHECK_PARTIAL,		/* Test uniqueness, but no error */
+	UNIQUE_CHECK_SPECULATIVE,	/* Speculative Insertion */
 	UNIQUE_CHECK_EXISTING		/* Check if existing tuple is unique */
 } IndexUniqueCheck;
 
-- 
1.9.1

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

Reply via email to