[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

Peter Geoghegan Mon, 05 Jan 2015 17:54:01 -0800

On Thu, Jan 1, 2015 at 11:17 PM, Peter Geoghegan <p...@heroku.com> wrote:
> The attached patch fixes Jeff's test case, as far as it goes. But, as
> I mentioned, it's not intended as a real fix. For one thing, I have
> made multiple changes to HeapTupleSatisfies*() routines, but haven't
> fully considered the consequences for, say, ri_triggers.c.
> HeapTupleSatisfiesSelf() and HeapTupleSatisfiesHistoricMVCC() were not
> modified. I've modified HeapTupleSatisfiesVacuum() to see
> InvalidTransactionID (not invalid xid infomask bits) as visible for
> garbage collection (alongside HeapTupleSatisfiesDirty() and
> HeapTupleSatisfiesMVCC(), which I changed first), and I wouldn't be
> surprised if that created new problems in other areas. In short, this
> patch is just for illustrative purposes.

I think that I see another race condition, requiring custom
instrumentation to the server to demonstrate.

Basically, even with the fix above, there are still similar races. I'm
not trying to call ExecLockUpdateTuple() against "super deleted" NULL
tuples, because those are no longer visible to any relevant snapshot
type, the fix described above. However, I can still see a change in
tuple header xmin between a dirty index scan and attempting to lock
that row to update. If I'm not mistaken, the race condition is small
enough that something like Jeff's tool won't catch it directly in a
reasonable amount of time, because it happens to be the same index key
value.

It's not all that easy to recreate, even with my custom
instrumentation. With fsync=off, it occurs somewhat infrequently with
a custom instrumentation Postgres build:

pg@hamster:~/jjanes_upsert$ perl count_upsert.pl 8 100000
[Mon Jan 5 17:31:57 2015] init done at count_upsert.pl line 101.
[Mon Jan 5 17:32:17 2015] child abnormal exit [Mon Jan 5 17:32:17
2015] DBD::Pg::db selectall_arrayref failed: ERROR: mismatch in xmin
for (322,264). Initially 4324145, then 4324429 at count_upsert.pl line
210.\n at count_upsert.pl line 227.
[Mon Jan 5 17:32:49 2015] sum is -800
[Mon Jan 5 17:32:49 2015] count is 9515
[Mon Jan 5 17:32:49 2015] normal exit at 1420507969 after 733912
items processed at count_upsert.pl line 182.

I think that "super deleting" broken promise tuples undermines how
vacuum interlocking is supposed to work [1]. We end up at the wrong
tuple, that happens to occupy the same slot as the old one (and
happens to have the same index key value, because the race is so small
are there are relatively few distinct values in play - it's hard to
recreate). Attached instrumentation patch was used with Jeff Jane's
tool.

If we weren't super deleting in the patch, then I think that our
backend's xmin horizon would be enough of an interlock (haven't tested
that theory by testing value locking approach #1 implementation yet,
but I worried about the scenario quite a lot before and things seemed
okay). But we are "super deleting" with implementation #2, and we're
also not holding a pin on the heap buffer or B-Tree leaf page buffer
throughout, so this happens (if I'm not mistaken).

[1]
https://github.com/postgres/postgres/blob/REL9_3_STABLE/src/backend/access/nbtree/README#L142
--
Peter Geoghegan

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index f8a4017..4ce45e3 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1189,7 +1189,7 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
  */
 bool
 ExecCheckIndexConstraints(TupleTableSlot *slot,
-						  EState *estate, ItemPointer conflictTid,
+						  EState *estate, HeapTuple conflictTid,
 						  Oid arbiterIdx)
 {
 	ResultRelInfo *resultRelInfo;
@@ -1339,7 +1339,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 									 Datum *values, bool *isnull,
 									 EState *estate, bool newIndex,
 									 bool violationOK, bool wait,
-									 ItemPointer conflictTid)
+									 HeapTuple conflictTid)
 {
 	Oid		   *constr_procs;
 	uint16	   *constr_strats;
@@ -1469,7 +1469,10 @@ retry:
 		{
 			conflict = true;
 			if (conflictTid)
-				*conflictTid = tup->t_self;
+			{
+				conflictTid->t_self = tup->t_self;
+				conflictTid->t_data = tup->t_data;
+			}
 			break;
 		}
 
@@ -1506,7 +1509,10 @@ retry:
 		{
 			conflict = true;
 			if (conflictTid)
-				*conflictTid = tup->t_self;
+			{
+				conflictTid->t_self = tup->t_self;
+				conflictTid->t_data = tup->t_data;
+			}
 			break;
 		}
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 69af2d1..a289ef4 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -56,7 +56,7 @@
 
 static bool ExecLockUpdateTuple(EState *estate,
 								ResultRelInfo *relinfo,
-								ItemPointer conflictTid,
+								HeapTuple 	conflictTid,
 								TupleTableSlot *planSlot,
 								TupleTableSlot *insertSlot,
 								bool			canSetTag,
@@ -292,7 +292,7 @@ ExecInsert(TupleTableSlot *slot,
 	else
 	{
 		bool				conflict;
-		ItemPointerData		conflictTid;
+		HeapTupleData conflictTid;
 
 		/*
 		 * Constraints might reference the tableoid column, so initialize
@@ -315,7 +315,7 @@ ExecInsert(TupleTableSlot *slot,
 		 */
 vlock:
 		conflict = false;
-		ItemPointerSetInvalid(&conflictTid);
+		ItemPointerSetInvalid(&conflictTid.t_self);
 
 		/*
 		 * XXX If we know or assume that there are few duplicates, it would be
@@ -432,7 +432,7 @@ vlock:
 															&returning))
 					goto vlock;
 			else if (spec == SPEC_IGNORE)
-				ExecCheckHeapTupleVisible(estate, resultRelInfo, &conflictTid);
+				ExecCheckHeapTupleVisible(estate, resultRelInfo, &conflictTid.t_self);
 
 			/*
 			 * RETURNING may have been processed already -- the target
@@ -976,7 +976,7 @@ lreplace:;
 static bool
 ExecLockUpdateTuple(EState *estate,
 					ResultRelInfo *relinfo,
-					ItemPointer conflictTid,
+					HeapTuple 	conflictTid,
 					TupleTableSlot *planSlot,
 					TupleTableSlot *insertSlot,
 					bool			canSetTag,
@@ -999,7 +999,7 @@ ExecLockUpdateTuple(EState *estate,
 	 * now, we just retry, and hopefully the new pre-check will fail on the
 	 * same tuple (or it's finished by now), and we'll get its TID that way.
 	 */
-	if (!ItemPointerIsValid(conflictTid))
+	if (!ItemPointerIsValid(&conflictTid->t_self))
 	{
 		elog(DEBUG1, "insertion conflicted after pre-check");
 		return false;
@@ -1013,15 +1013,15 @@ ExecLockUpdateTuple(EState *estate,
 	 * our previous conclusion that the tuple is the conclusively committed
 	 * conflicting tuple.
 	 */
-	tuple.t_self = *conflictTid;
+	tuple.t_self = conflictTid->t_self;
 	test = heap_lock_tuple(relation, &tuple,
 						   estate->es_output_cid,
-						   LockTupleExclusive, false, /* wait */
+						   LockTupleExclusive, LockWaitBlock,
 						   false, &buffer, &hufd);
 
 	/* Was tuple concurrently super-deleted? */
 	if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple.t_data),
-							 InvalidTransactionId))
+							InvalidTransactionId))
 	{
 		test = HeapTupleUpdated;
 		/* XXX Should be caught within tqual.c */
@@ -1157,6 +1157,16 @@ ExecLockUpdateTuple(EState *estate,
 
 			slot = EvalPlanQualNext(&onConflict->mt_epqstate);
 
+			if (copyTuple->t_data->t_choice.t_heap.t_xmin !=
+				conflictTid->t_data->t_choice.t_heap.t_xmin)
+			{
+				elog(ERROR, "mismatch in xmin for (%u,%u). Initially %u, then %u",
+					 ItemPointerGetBlockNumber(&copyTuple->t_data->t_ctid),
+					 ItemPointerGetOffsetNumber(&copyTuple->t_data->t_ctid),
+					 copyTuple->t_data->t_choice.t_heap.t_xmin,
+					 conflictTid->t_data->t_choice.t_heap.t_xmin);
+			}
+
 			if (!TupIsNull(slot))
 				*returning = ExecUpdate(&tuple.t_data->t_ctid, NULL, slot,
 										planSlot, &onConflict->mt_epqstate,
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index e7c16fb..abd7741 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -2288,12 +2288,14 @@ transformConflictClause(ParseState *pstate, ConflictClause *confClause,
 {
 	InferClause	   *infer = confClause->infer;
 
+#if 0
 	if (confClause->specclause == SPEC_INSERT && !infer)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("ON CONFLICT with UPDATE must contain columns or expressions to infer a unique index from"),
 				 parser_errposition(pstate,
 									exprLocation((Node *) confClause))));
+#endif
 
 	/* Raw grammar must ensure this invariant holds */
 	Assert(confClause->specclause != SPEC_INSERT ||
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 4cdf958..b668619 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -359,14 +359,14 @@ extern List *ExecLockIndexValues(TupleTableSlot *slot, EState *estate,
 extern List *ExecInsertIndexTuples(TupleTableSlot *slot, ItemPointer tupleid,
 					  EState *estate, bool noDupErr, Oid arbiterIdx);
 extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate,
-					  ItemPointer conflictTid, Oid arbiterIdx);
+					  HeapTuple conflictTid, Oid arbiterIdx);
 extern bool check_exclusion_or_unique_constraint(Relation heap, Relation index,
 									   IndexInfo *indexInfo,
 									   ItemPointer tupleid,
 									   Datum *values, bool *isnull,
 									   EState *estate,
 									   bool newIndex, bool errorOK,
-									   bool wait, ItemPointer conflictTid);
+									   bool wait, HeapTuple conflictTid);
 
 extern void RegisterExprContextCallback(ExprContext *econtext,
 							ExprContextCallbackFunction function,

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

Reply via email to