Currently, speculative insertion (the INSERT ... ON CONFLICT DO UPDATE
executor/storage infrastructure) uses checkUnique ==
UNIQUE_CHECK_PARTIAL for unique indexes, which is a constant
originally only used by deferred unique constraints. It occurred to me
that this has a number of disadvantages:

* It confuses separation of concerns. Pushing down this information to
the nbtree AM makes it clear why it's slightly special from a
speculative insertion point of view. For example, the nbtree AM does
not actually require "livelock insurance" (for unique indexes,
although in principle not for nbtree-based exclusion constraints,
which are possible).

* UNIQUE_CHECK_PARTIAL is not only not the same thing as
UNIQUE_CHECK_SPECULATIVE (a new constant for the enum). It's also
naturally mutually exclusive with it (since we do not and cannot
support deferred unique constraints as arbiters). Let's represent this
directly.

* It makes a conflict not detected by the pre-check always insert an
index tuple, even though that occurs after a point where it's already
been established that the pointed-to TID is doomed -- it must go on to
be super deleted. Why bother bloating the index?


I'm actually not really motivated by wanting to reduce bloat here
(that was always something that I thought was a non-issue with *any*
implemented speculative insertion prototype [1]). Rather, by actually
physically inserting an index tuple unnecessarily, the implication is
that it makes sense to do so (perhaps for roughly the same reason it
makes sense with deferred unique constraints, or some other
complicated and subtle reason.). AFAICT that implication is incorrect,
though; I see no reason why inserting that index tuple serves any
purpose, and it clearly *can* be avoided with little effort.

Attached patch updates speculative insertion along these lines.

In passing, I have make ExecInsertIndexTuples() give up when a
speculative insertion conflict is detected. Again, this is not about
bloat prevention; it's about making the code easier to understand by
not doing something that is unnecessary, and in avoiding that also
avoiding the implication that it is necessary. There are already
enough complicated interactions that *are* necessary (e.g. "livelock
avoidance" for exclusion constraints). Let us make our intent clearer.

The patch also updates the AM interface documentation (the part
covering unique indexes). It seems quite natural to me to document the
theory of operation for speculative insertion there.

Thoughts?

[1] 
https://wiki.postgresql.org/wiki/Value_locking#.232._.22Promise.22_heap_tuples_.28Heikki_Linnakangas.29
-- 
Peter Geoghegan
From 693dc7b53d311b6a739fea0eea9c2f7147673caf Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <peter.geoghega...@gmail.com>
Date: Tue, 2 Jun 2015 17:34:16 -0700
Subject: [PATCH] Refactor speculative insertion with unique indexes

Add a dedicated IndexUniqueCheck constant that relates to the
speculative insertion case:  UNIQUE_CHECK_SPECULATIVE.  This allows
btinsert() (and, in principle, any amcanunique aminsert function) to
avoid physically inserting an IndexTuple in the event of detecting a
conflict during speculative insertion's second phase.  With btree, this
avoidance can occur at the critical point in _bt_doinsert() immediately
after conclusively establishing that there is a conflict, but
immediately before actually calling _bt_insertonpg() to proceed with
physical IndexTuple insertion.

It only makes sense to soldier on in the hope that the conflicts can
later be resolved for deferrable unique constraints -- speculative
inserters have no chance of working out a way to proceed without first
deleting the would-be-pointed-to heap tuple already physically inserted.

This isn't really about bloat, which seems unlikely to be a significant
problem with speculative insertion in practice.  Rather, it's about
bringing clarity to how btree (or future amcanunique AMs) handle
speculative insertion.  It seems quite natural to avoid a patently
unnecessary unique index physical IndexTuple insertion in the rare case
where that can currently happen.  Structuring the code so that the btree
access method has some knowledge of speculative insertion callers feels
like a good way of representing its contract with the executor for
speculative insertion.  Besides, it was inaccurate to assert that
UNIQUE_CHECK_PARTIAL only concerned deferrable unique indexes, and never
speculative insertion; the AM interface documentation and various
comments now recognize speculative insertion callers.  Allowing the
access method documentation to falsely claim that UNIQUE_CHECK_PARTIAL
only concerned deferrable constraints could conceivably cause bugs in
third party access method code (back when UNIQUE_CHECK_PARTIAL *did*
include speculative insertion, which, of course, is no longer the case
with this commit).  The access method documentation has been changed to
clarify matters, including describing the new and slightly distinct
UNIQUE_CHECK_SPECULATIVE case.
---
 doc/src/sgml/indexam.sgml              | 107 +++++++++++++++++++++++++++++----
 src/backend/access/nbtree/nbtinsert.c  |  49 +++++++++------
 src/backend/executor/execIndexing.c    |  36 +++++++++--
 src/backend/executor/nodeModifyTable.c |   2 +-
 src/include/access/genam.h             |   8 +++
 5 files changed, 163 insertions(+), 39 deletions(-)

diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 1c09bae..0c09977 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -199,10 +199,13 @@ aminsert (Relation indexRelation,
 
   <para>
    The function's Boolean result value is significant only when
-   <literal>checkUnique</> is <literal>UNIQUE_CHECK_PARTIAL</>.
-   In this case a TRUE result means the new entry is known unique, whereas
-   FALSE means it might be non-unique (and a deferred uniqueness check must
-   be scheduled).  For other cases a constant FALSE result is recommended.
+   <literal>checkUnique</> is <literal>UNIQUE_CHECK_PARTIAL</> or
+   <literal>UNIQUE_CHECK_SPECULATIVE</>.  In this case a TRUE result
+   means the new entry is known unique, whereas FALSE means it might
+   be non-unique (and a deferred uniqueness check must be scheduled,
+   or for <literal>UNIQUE_CHECK_SPECULATIVE</>, speculative insertion
+   will restart from scratch, before its conflict pre-check).  For
+   other cases a constant FALSE result is recommended.
   </para>
 
   <para>
@@ -804,6 +807,34 @@ amrestrpos (IndexScanDesc scan);
        for that transaction to commit or abort, and then repeat the test.
       </para>
      </listitem>
+     <listitem>
+      <para>
+       It may be necessary to wait on a <quote>speculative token</>
+       rather than a transaction, regardless of whether or not the
+       current insertion is a <firstterm>speculative insertion</>.
+       The procedure for handling waiting on a conflict with a
+       speculative token is largely the same as the
+       waiting-on-transaction case.
+      </para>
+      <para>
+       Speculative insertion is an executor infrastructure that allows
+       <literal>INSERT ... ON CONFLICT DO UPDATE/NOTHING</> to safely
+       maintain useful guarantees about the avoidance of
+       <quote>unprincipled deadlocks</> under high concurrency (among
+       other things).  These are deadlocks that arise due to a mutual
+       dependency that is not user visible.  By definition,
+       unprincipled deadlocks cannot be prevented by the user
+       reordering lock acquisition in client code, because the
+       implementation level lock acquisitions are not under the user's
+       direct control.  The speculative insertion token lock is an
+       identifier that is not tied to its holders transaction
+       duration;  rather, it is held (usually for only an instant)
+       during its holder's speculative insertion.  Without this
+       precaution, under high concurrency <literal>INSERT ... ON
+       CONFLICT DO UPDATE</> sessions could deadlock with each other,
+       which would not be acceptable.
+      </para>
+     </listitem>
     </itemizedlist>
   </para>
 
@@ -832,15 +863,20 @@ amrestrpos (IndexScanDesc scan);
   <para>
    If the unique constraint is deferrable, there is additional complexity:
    we need to be able to insert an index entry for a new row, but defer any
-   uniqueness-violation error until end of statement or even later.  To
-   avoid unnecessary repeat searches of the index, the index access method
-   should do a preliminary uniqueness check during the initial insertion.
-   If this shows that there is definitely no conflicting live tuple, we
-   are done.  Otherwise, we schedule a recheck to occur when it is time to
-   enforce the constraint.  If, at the time of the recheck, both the inserted
-   tuple and some other tuple with the same key are live, then the error
-   must be reported.  (Note that for this purpose, <quote>live</> actually
-   means <quote>any tuple in the index entry's HOT chain is live</>.)
+   uniqueness-violation error until end of statement or even later.
+   Speculative insertion similarly avoids uniquess-violation errors.
+   To avoid unnecessary repeat searches of the index, the index access
+   method should do a preliminary uniqueness check during the initial
+   insertion for deferrable unique constraint indexes (the speculative
+   insertion case has a pre-check in advance of index insertion).  If
+   this shows that there is definitely no conflicting live tuple, we
+   are done.  Otherwise, we schedule a recheck to occur when it is
+   time to enforce the constraint (or for the speculative insertion
+   case, a new attempt begins, starting with a new pre-check).  If, at
+   the time of the recheck, both the inserted tuple and some other
+   tuple with the same key are live, then the error must be reported.
+   (Note that for this purpose, <quote>live</> actually means
+   <quote>any tuple in the index entry's HOT chain is live</>.)
    To implement this, the <function>aminsert</> function is passed a
    <literal>checkUnique</> parameter having one of the following values:
 
@@ -901,6 +937,51 @@ amrestrpos (IndexScanDesc scan);
        for the same tuple values as were used in the original insertion.
       </para>
      </listitem>
+     <listitem>
+      <para>
+       <literal>UNIQUE_CHECK_SPECULATIVE</> is very similar to
+       <literal>UNIQUE_CHECK_PARTIAL</>, but indicates an optimistic
+       insertion in the second stage of speculative insertion.  (The
+       first stage involves the pre-check.)
+      </para>
+
+      <para>
+       <literal>UNIQUE_CHECK_SPECULATIVE</> allows the access method
+       to not proceed with <structname>IndexTuple</structname>
+       insertion when it is already clear that that cannot be useful
+       (i.e.  when <function>aminsert</> is set to return FALSE,
+       making caller immediately delete its would-be inserted TID).
+       Proceeding with insertion of the caller's
+       <structname>IndexTuple</structname> regardless is harmless but
+       unnecessary.  It must be harmless because there could be more
+       than one <function>aminsert</>
+       <literal>UNIQUE_CHECK_SPECULATIVE</> call for each of multiple
+       unique indexes, with all calls concerning the same TID &mdash;
+       in this case each unique index arbitrates insertion, and so can
+       individually veto the TID's insertion.
+      </para>
+
+      <para>
+       Note that speculative insertion pre-checks are not particularly
+       influenced by what might have occurred in previous iterations
+       (that is, previous unsuccessful attempts to insert or to take
+       alternative path for a user-proposed row).  There is no
+       <literal>UNIQUE_CHECK_EXISTING</> style check for the
+       <literal>UNIQUE_CHECK_SPECULATIVE</> case, because speculative
+       insertion pre-checks occur without <emphasis>any</>
+       <function>aminsert</> call (at least in the first iteration of
+       a speculative insertion), and because the second stage
+       (involving an <literal>UNIQUE_CHECK_SPECULATIVE</>
+       <function>aminsert</> call) is optimistic;  unlike the
+       <literal>UNIQUE_CHECK_PARTIAL</> case, the caller's TID will
+       never go on to play a useful role in locking or conflict
+       avoidance/detection.  The caller will immediately resign itself
+       to a restart, having made no useful progress for its TID (or,
+       at a higher level, for the row proposed for insertion).
+       <literal>UNIQUE_CHECK_SPECULATIVE</> is otherwise equivalent to
+       <literal>UNIQUE_CHECK_PARTIAL</>.
+      </para>
+     </listitem>
     </itemizedlist>
   </para>
 
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 77c2fdf..aa5b621 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -91,17 +91,21 @@ static void _bt_vacuum_one_page(Relation rel, Buffer buffer, Relation heapRel);
  *		This routine is called by the public interface routine, btinsert.
  *		By here, itup is filled in, including the TID.
  *
- *		If checkUnique is UNIQUE_CHECK_NO or UNIQUE_CHECK_PARTIAL, this
- *		will allow duplicates.  Otherwise (UNIQUE_CHECK_YES or
- *		UNIQUE_CHECK_EXISTING) it will throw error for a duplicate.
- *		For UNIQUE_CHECK_EXISTING we merely run the duplicate check, and
- *		don't actually insert.
+ *		If checkUnique is UNIQUE_CHECK_NO, UNIQUE_CHECK_PARTIAL or
+ *		UNIQUE_CHECK_SPECULATIVE, this will allow duplicates.  Otherwise
+ *		(UNIQUE_CHECK_YES or UNIQUE_CHECK_EXISTING) it will throw error for a
+ *		duplicate.  For UNIQUE_CHECK_EXISTING we merely run the duplicate
+ *		check, and don't actually insert.
  *
- *		The result value is only significant for UNIQUE_CHECK_PARTIAL:
- *		it must be TRUE if the entry is known unique, else FALSE.
- *		(In the current implementation we'll also return TRUE after a
- *		successful UNIQUE_CHECK_YES or UNIQUE_CHECK_EXISTING call, but
- *		that's just a coding artifact.)
+ *		The result value is only significant for UNIQUE_CHECK_PARTIAL
+ *		and UNIQUE_CHECK_SPECULATIVE:  it must be TRUE if the entry is
+ *		known unique, else FALSE.  (In the current implementation
+ *		we'll also return TRUE after a successful UNIQUE_CHECK_YES or
+ *		UNIQUE_CHECK_EXISTING call, but that's just a coding
+ *		artifact.)
+ *
+ *		Note that UNIQUE_CHECK_SPECULATIVE calls that return false
+ *		don't actually proceed with insertion of the index tuple.
  */
 bool
 _bt_doinsert(Relation rel, IndexTuple itup,
@@ -188,7 +192,8 @@ top:
 		}
 	}
 
-	if (checkUnique != UNIQUE_CHECK_EXISTING)
+	if (checkUnique != UNIQUE_CHECK_EXISTING &&
+		(checkUnique != UNIQUE_CHECK_SPECULATIVE || is_unique))
 	{
 		/*
 		 * The only conflict predicate locking cares about for indexes is when
@@ -230,10 +235,11 @@ top:
  * progress, *speculativeToken is set to non-zero, and the caller can wait for
  * the verdict on the insertion using SpeculativeInsertionWait().
  *
- * However, if checkUnique == UNIQUE_CHECK_PARTIAL, we always return
- * InvalidTransactionId because we don't want to wait.  In this case we
- * set *is_unique to false if there is a potential conflict, and the
- * core code must redo the uniqueness check later.
+ * However, if checkUnique is UNIQUE_CHECK_PARTIAL or
+ * UNIQUE_CHECK_SPECULATIVE, we always return InvalidTransactionId because we
+ * don't want to wait.  In this case we set *is_unique to false if there is a
+ * potential conflict, and the core code must redo the uniqueness check later
+ * or restart speculative insertion.
  */
 static TransactionId
 _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
@@ -331,12 +337,15 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
 
 					/*
 					 * It is a duplicate. If we are only doing a partial
-					 * check, then don't bother checking if the tuple is being
-					 * updated in another transaction. Just return the fact
-					 * that it is a potential conflict and leave the full
-					 * check till later.
+					 * check or speculative insertion, then don't bother
+					 * checking if the tuple is being updated in another
+					 * transaction. Just return the fact that it is a potential
+					 * conflict. Partial check callers can leave the full check
+					 * till later. Speculative insertion callers will need to
+					 * restart.
 					 */
-					if (checkUnique == UNIQUE_CHECK_PARTIAL)
+					if (checkUnique == UNIQUE_CHECK_PARTIAL ||
+						checkUnique == UNIQUE_CHECK_SPECULATIVE)
 					{
 						if (nbuf != InvalidBuffer)
 							_bt_relbuf(rel, nbuf);
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index bf38508..1163f1a 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -259,6 +259,14 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
  *		the same is done for non-deferred constraints, but report
  *		if conflict was speculative or deferred conflict to caller)
  *
+ *		Note that the implementation does not bother with inserting
+ *		into all indexes in the event of detecting a speculative
+ *		insertion conflict.  This should not matter, because when
+ *		*specConflict is set to true, caller's heap tuple will be
+ *		super-deleted;  a complete list of indexes with conflicts
+ *		is not useful (in fact, even the actual source of the
+ *		speculative insertion conflict is not of interest).
+ *
  *		CAUTION: this must not be called for a HOT update.
  *		We can't defend against that here for lack of info.
  *		Should we change the API to make it safer?
@@ -374,7 +382,7 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 		if (!indexRelation->rd_index->indisunique)
 			checkUnique = UNIQUE_CHECK_NO;
 		else if (noDupErr && (arbiterIndexes == NIL || arbiter))
-			checkUnique = UNIQUE_CHECK_PARTIAL;
+			checkUnique = UNIQUE_CHECK_SPECULATIVE;
 		else if (indexRelation->rd_index->indimmediate)
 			checkUnique = UNIQUE_CHECK_YES;
 		else
@@ -432,18 +440,31 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 		}
 
 		if ((checkUnique == UNIQUE_CHECK_PARTIAL ||
+			 checkUnique == UNIQUE_CHECK_SPECULATIVE ||
 			 indexInfo->ii_ExclusionOps != NULL) &&
 			!satisfiesConstraint)
 		{
+			if (indexRelation->rd_index->indimmediate && specConflict)
+			{
+				Assert(noDupErr);
+
+				/*
+				 * Speculative conflict found, necessitating new loop for
+				 * caller.  Caller will now not care about specifics of
+				 * conflicts if there was also a deferred constraint
+				 * conflict.
+				 */
+				list_free(result);
+				*specConflict = true;
+				return NIL;
+			}
+
 			/*
 			 * The tuple potentially violates the uniqueness or exclusion
 			 * constraint, so make a note of the index so that we can re-check
-			 * it later.  Speculative inserters are told if there was a
-			 * speculative conflict, since that always requires a restart.
+			 * it later
 			 */
 			result = lappend_oid(result, RelationGetRelid(indexRelation));
-			if (indexRelation->rd_index->indimmediate && specConflict)
-				*specConflict = true;
 		}
 	}
 
@@ -540,6 +561,11 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
 					 errtableconstraint(heapRelation,
 								   RelationGetRelationName(indexRelation))));
 
+		/*
+		 * Set checkedIndex here, since partial unique index will still count
+		 * as a found arbiter index despite being skipped due to predicate not
+		 * being satisfied
+		 */
 		checkedIndex = true;
 
 		/* Check for partial index */
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 874ca6a..7545e61 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -436,7 +436,7 @@ ExecInsert(ModifyTableState *mtstate,
 			 */
 			if (specConflict)
 			{
-				list_free(recheckIndexes);
+				Assert(recheckIndexes == NIL);
 				goto vlock;
 			}
 
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index d86590a..bd8d344 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -98,6 +98,13 @@ typedef struct SysScanDescData *SysScanDesc;
  * known unique, false if it is possibly non-unique.  In the "true" case
  * it is safe to omit the later recheck.
  *
+ * UNIQUE_CHECK_SPECULATIVE is very similar to UNIQUE_CHECK_PARTIAL, but is
+ * used for speculative insertion (which does not support deferrable unique
+ * constraints as arbiters).  The only difference is that it allows the
+ * implementation to avoid pointlessly inserting an index tuple after
+ * establishing that there is a speculative insertion conflict, where the heap
+ * tuple must be super-deleted anyway.
+ *
  * When it is time to recheck the deferred constraint, a pseudo-insertion
  * call is made with UNIQUE_CHECK_EXISTING.  The tuple is already in the
  * index in this case, so it should not be inserted again.  Rather, just
@@ -108,6 +115,7 @@ typedef enum IndexUniqueCheck
 	UNIQUE_CHECK_NO,			/* Don't do any uniqueness checking */
 	UNIQUE_CHECK_YES,			/* Enforce uniqueness at insertion time */
 	UNIQUE_CHECK_PARTIAL,		/* Test uniqueness, but no error */
+	UNIQUE_CHECK_SPECULATIVE,	/* Speculative Insertion */
 	UNIQUE_CHECK_EXISTING		/* Check if existing tuple is unique */
 } IndexUniqueCheck;
 
-- 
1.9.1

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to