On Sat, Jan 3, 2015 at 12:42 PM, Peter Geoghegan <p...@heroku.com> wrote:
> There are probably additional factors that make the situation more
> complicated than it is for the ON CONFLICT patch, but it's clear to me
> that the mutual dependency that can be involved with two ordinary
> exclusion constraint inserters is reason enough for those to deadlock.

I looked at the code in more detail, and realized that there were old
bugs in the exclusion constraint related modifications. I attach a
delta patch that fixes them. It is a combined patch: applying it on
top of v1.8.vallock2.tar.gz [1] gives you all available bugfixes.

This unbreaks my previous exclusion constraint test case, as far as it
goes. I am still concerned about the risk of lock starvation/livelocks
here. But I must admit, having stressed the patch with this latest
fix, that it's possible that the real risks have been overestimated.
Maybe this is in the same category as the way L&Y's algorithm could
theoretically starve a session whose page split needs to move right
indefinitely: a theoretical risk that turns out not to matter in
practice, for reasons that lack a real theoretical justification
beyond the fact that sustained very bad luck - a sustained "perfect
storm" - is required for starvation to occur. OTOH, maybe my OS
scheduler just doesn't have the characteristics that provoke the
suspected bug. I really don't know, but I suppose that one or the
other concurrent inserter will tend to win the race often enough - a
draw may be a very rare thing.
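
To make the "draw" scenario concrete (this is just an illustration, not
part of the patch - the names and timings are mine, and the shared dict
is a gross simplification of an index with speculative tuples): two
sessions each insert first, then check for a conflicting in-progress
tuple, and "super delete" their own tuple and retry on conflict. A
livelock requires both sessions to retry in perfect lockstep forever;
with any scheduling jitter, one of them checks while the other is
backed off and wins.

```python
import random
import threading
import time

# Toy "index": maps session id -> "inprogress" or "committed".
index = {}
index_lock = threading.Lock()
outcome = {}

def inserter(me: str, other: str) -> None:
    """Insert first, then check; super-delete and retry on conflict
    with an in-progress rival (grossly simplified protocol sketch)."""
    retries = 0
    while True:
        with index_lock:
            if index.get(other) == "committed":
                # Definite conflict with a committed tuple: we lose.
                outcome[me] = ("lost", retries)
                return
            index[me] = "inprogress"          # speculative insertion
        # Window in which our speculative tuple is visible to the rival.
        time.sleep(random.uniform(0, 0.0005))
        with index_lock:
            if index.get(other) is None:
                index[me] = "committed"       # no conflict: we win
                outcome[me] = ("won", retries)
                return
            if index.get(other) == "committed":
                del index[me]                 # rival won the race
                outcome[me] = ("lost", retries)
                return
            # Rival is also in progress: "super delete" our own tuple
            # and retry after random jitter, as in the retry loop.
            del index[me]
        retries += 1
        time.sleep(random.uniform(0, 0.001))

t1 = threading.Thread(target=inserter, args=("s1", "s2"))
t2 = threading.Thread(target=inserter, args=("s2", "s1"))
t1.start(); t2.start(); t1.join(); t2.join()
```

Since the commit-or-retry decision is atomic under one lock, exactly
one session ends up "won"; a draw would require the retry sleeps to
align indefinitely, which random jitter makes vanishingly unlikely.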

[1] 
http://www.postgresql.org/message-id/cam3swzrg_htrol-6_wfe6_d_ucuyc28jfapsfh_tra76gkk...@mail.gmail.com
-- 
Peter Geoghegan
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index ea8c57b..f8a4017 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1133,8 +1133,11 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 		 * If the index has an associated exclusion constraint, check that.
 		 * This is simpler than the process for uniqueness checks since we
 		 * always insert first and then check.  If the constraint is deferred,
-		 * we check now anyway, but don't throw error on violation; instead
-		 * we'll queue a recheck event.
+		 * we check now anyway, but don't throw error on violation or wait for
+		 * a conclusive outcome from a concurrent insertion; instead we'll
+		 * queue a recheck event.  Similarly, noDupErr callers (speculative
+		 * inserters) will recheck later, and wait for a conclusive outcome
+		 * then.
 		 *
 		 * An index for an exclusion constraint can't also be UNIQUE (not an
 		 * essential property, we just don't allow it in the grammar), so no
@@ -1142,15 +1145,15 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 		 */
 		if (indexInfo->ii_ExclusionOps != NULL)
 		{
-			bool		errorOK = (!indexRelation->rd_index->indimmediate &&
-								   !noDupErr);
+			bool		violationOK = (!indexRelation->rd_index->indimmediate ||
+									   noDupErr);
 
 			satisfiesConstraint =
 				check_exclusion_or_unique_constraint(heapRelation,
 													 indexRelation, indexInfo,
 													 tupleid, values, isnull,
-													 estate, false, errorOK,
-													 false, NULL);
+													 estate, false,
+													 violationOK, false, NULL);
 		}
 
 		if ((checkUnique == UNIQUE_CHECK_PARTIAL ||
@@ -1158,7 +1161,7 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 			!satisfiesConstraint)
 		{
 			/*
-			 * The tuple potentially violates the uniqueness or exclusion
+			 * The tuple potentially violates the unique index or exclusion
 			 * constraint, so make a note of the index so that we can re-check
 			 * it later.
 			 */
@@ -1322,7 +1325,7 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
  * is convenient for deferred exclusion checks; we need not bother queuing
  * a deferred event if there is definitely no conflict at insertion time.
  *
- * When errorOK is false, we'll throw error on violation, so a false result
+ * When violationOK is false, we'll throw error on violation, so a false result
  * is impossible.
  *
  * Note: The indexam is normally responsible for checking unique constraints,
@@ -1335,7 +1338,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 									 IndexInfo *indexInfo, ItemPointer tupleid,
 									 Datum *values, bool *isnull,
 									 EState *estate, bool newIndex,
-									 bool errorOK, bool wait,
+									 bool violationOK, bool wait,
 									 ItemPointer conflictTid)
 {
 	Oid		   *constr_procs;
@@ -1462,7 +1465,7 @@ retry:
 		 * we're not supposed to raise error, just return the fact of the
 		 * potential conflict without waiting to see if it's real.
 		 */
-		if (errorOK && !wait)
+		if (violationOK && !wait)
 		{
 			conflict = true;
 			if (conflictTid)
@@ -1487,7 +1490,7 @@ retry:
 			if (DirtySnapshot.speculativeToken)
 				SpeculativeInsertionWait(DirtySnapshot.xmin,
 										 DirtySnapshot.speculativeToken);
-			else if (errorOK)
+			else if (violationOK)
 				XactLockTableWait(xwait, heap, &tup->t_self,
 								  XLTW_RecheckExclusionConstr);
 			else
@@ -1499,7 +1502,7 @@ retry:
 		/*
 		 * We have a definite conflict.  Return it to caller, or report it.
 		 */
-		if (errorOK)
+		if (violationOK)
 		{
 			conflict = true;
 			if (conflictTid)
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4e5be1a..9bb13df 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -313,8 +313,8 @@ ExecInsert(TupleTableSlot *slot,
 		 * deleting the already-inserted tuple and retrying, but that's fairly
 		 * expensive, so we try to avoid it.
 		 */
-		conflict = false;
 vlock:
+		conflict = false;
 		ItemPointerSetInvalid(&conflictTid);
 
 		/*
@@ -354,7 +354,8 @@ vlock:
 			 * which is appropriate only for non-promise tuples) to wait for us
 			 * to decide if we're going to go ahead with the insertion.
 			 */
-			SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
+			if (spec != SPEC_NONE)
+				SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
 
 			/*
 			 * insert the tuple
@@ -363,7 +364,8 @@ vlock:
 			 * the t_self field.
 			 */
 			newId = heap_insert(resultRelationDesc, tuple,
-								estate->es_output_cid, HEAP_INSERT_SPECULATIVE,
+								estate->es_output_cid,
								spec != SPEC_NONE ? HEAP_INSERT_SPECULATIVE : 0,
 								NULL);
 
 			/*
@@ -404,8 +406,11 @@ vlock:
 							estate->es_output_cid, NULL, false, &hufd, true);
 			}
 
-			SpeculativeInsertionLockRelease(GetCurrentTransactionId());
-			ClearSpeculativeInsertionState();
+			if (spec != SPEC_NONE)
+			{
+				SpeculativeInsertionLockRelease(GetCurrentTransactionId());
+				ClearSpeculativeInsertionState();
+			}
 		}
 
 		if (conflict)
@@ -1014,6 +1019,17 @@ ExecLockUpdateTuple(EState *estate,
 						   LockTupleExclusive, false, /* wait */
 						   false, &buffer, &hufd);
 
+	/* Was tuple concurrently super-deleted? */
+	if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple.t_data),
+							 InvalidTransactionId))
+	{
+		test = HeapTupleUpdated;
+		/* XXX Should be caught within tqual.c */
+		elog(WARNING, "concurrent super delete of tuple (%u,%u)",
+			 ItemPointerGetBlockNumber(&tuple.t_data->t_ctid),
+			 ItemPointerGetOffsetNumber(&tuple.t_data->t_ctid));
+	}
+
 	if (test == HeapTupleMayBeUpdated)
 		copyTuple = heap_copytuple(&tuple);
 
@@ -1145,6 +1161,15 @@ ExecLockUpdateTuple(EState *estate,
 				*returning = ExecUpdate(&tuple.t_data->t_ctid, NULL, slot,
 										planSlot, &onConflict->mt_epqstate,
 										onConflict->ps.state, canSetTag);
+			else
+				/*
+				 * XXX Debug instrumentation.  This is useful for stress test
+				 * queries with "WHERE target.index = excluded.index", in order
+				 * to assert that the UPDATE affects the right row.
+				 */
+				elog(WARNING, "did not update tuple (%u,%u)",
+					 ItemPointerGetBlockNumber(&tuple.t_data->t_ctid),
+					 ItemPointerGetOffsetNumber(&tuple.t_data->t_ctid));
 
 			ReleaseBuffer(buffer);
 
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 9216779..94099f1 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -728,6 +728,16 @@ HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
 	snapshot->xmin = snapshot->xmax = InvalidTransactionId;
 	snapshot->speculativeToken = 0;
 
+	/*
+	 * Never return "super-deleted" tuples
+	 *
+	 * XXX:  Comment this code out and you'll get conflicts within
+	 * ExecLockUpdateTuple(), which result in an infinite loop.
+	 */
+	if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
+							InvalidTransactionId))
+		return false;
+
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
 		if (HeapTupleHeaderXminInvalid(tuple))
@@ -943,6 +953,13 @@ HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
 	Assert(ItemPointerIsValid(&htup->t_self));
 	Assert(htup->t_tableOid != InvalidOid);
 
+	/*
+	 * Never return "super-deleted" tuples
+	 */
+	if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
+							InvalidTransactionId))
+		return false;
+
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
 		if (HeapTupleHeaderXminInvalid(tuple))
@@ -1147,6 +1164,13 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	Assert(htup->t_tableOid != InvalidOid);
 
 	/*
+	 * Immediately VACUUM "super-deleted" tuples
+	 */
+	if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
+							InvalidTransactionId))
+		return HEAPTUPLE_DEAD;
+
+	/*
 	 * Has inserting transaction committed?
 	 *
 	 * If the inserting transaction aborted, then the tuple was never visible
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers