On Thu, Jan 1, 2015 at 11:17 PM, Peter Geoghegan <p...@heroku.com> wrote: > The attached patch fixes Jeff's test case, as far as it goes. But, as > I mentioned, it's not intended as a real fix. For one thing, I have > made multiple changes to HeapTupleSatisfies*() routines, but haven't > fully considered the consequences for, say, ri_triggers.c. > HeapTupleSatisfiesSelf() and HeapTupleSatisfiesHistoricMVCC() were not > modified. I've modified HeapTupleSatisfiesVacuum() to see > InvalidTransactionID (not invalid xid infomask bits) as visible for > garbage collection (alongside HeapTupleSatisfiesDirty() and > HeapTupleSatisfiesMVCC(), which I changed first), and I wouldn't be > surprised if that created new problems in other areas. In short, this > patch is just for illustrative purposes.
I think that I see another race condition, requiring custom instrumentation to the server to demonstrate. Basically, even with the fix above, there are still similar races. I'm not trying to call ExecLockUpdateTuple() against "super deleted" NULL tuples, because those are no longer visible to any relevant snapshot type, the fix described above. However, I can still see a change in tuple header xmin between a dirty index scan and attempting to lock that row to update. If I'm not mistaken, the race condition is small enough that something like Jeff's tool won't catch it directly in a reasonable amount of time, because it happens to be the same index key value. It's not all that easy to recreate, even with my custom instrumentation. With fsync=off, it occurs somewhat infrequently with a custom instrumentation Postgres build: pg@hamster:~/jjanes_upsert$ perl count_upsert.pl 8 100000 [Mon Jan 5 17:31:57 2015] init done at count_upsert.pl line 101. [Mon Jan 5 17:32:17 2015] child abnormal exit [Mon Jan 5 17:32:17 2015] DBD::Pg::db selectall_arrayref failed: ERROR: mismatch in xmin for (322,264). Initially 4324145, then 4324429 at count_upsert.pl line 210.\n at count_upsert.pl line 227. [Mon Jan 5 17:32:49 2015] sum is -800 [Mon Jan 5 17:32:49 2015] count is 9515 [Mon Jan 5 17:32:49 2015] normal exit at 1420507969 after 733912 items processed at count_upsert.pl line 182. I think that "super deleting" broken promise tuples undermines how vacuum interlocking is supposed to work [1]. We end up at the wrong tuple, that happens to occupy the same slot as the old one (and happens to have the same index key value, because the race is so small are there are relatively few distinct values in play - it's hard to recreate). Attached instrumentation patch was used with Jeff Jane's tool. If we weren't super deleting in the patch, then I think that our backend's xmin horizon would be enough of an interlock (haven't tested that theory by testing value locking approach #1 implementation yet, but I worried about the scenario quite a lot before and things seemed okay). But we are "super deleting" with implementation #2, and we're also not holding a pin on the heap buffer or B-Tree leaf page buffer throughout, so this happens (if I'm not mistaken). [1] https://github.com/postgres/postgres/blob/REL9_3_STABLE/src/backend/access/nbtree/README#L142 -- Peter Geoghegan
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c index f8a4017..4ce45e3 100644 --- a/src/backend/executor/execUtils.c +++ b/src/backend/executor/execUtils.c @@ -1189,7 +1189,7 @@ ExecInsertIndexTuples(TupleTableSlot *slot, */ bool ExecCheckIndexConstraints(TupleTableSlot *slot, - EState *estate, ItemPointer conflictTid, + EState *estate, HeapTuple conflictTid, Oid arbiterIdx) { ResultRelInfo *resultRelInfo; @@ -1339,7 +1339,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index, Datum *values, bool *isnull, EState *estate, bool newIndex, bool violationOK, bool wait, - ItemPointer conflictTid) + HeapTuple conflictTid) { Oid *constr_procs; uint16 *constr_strats; @@ -1469,7 +1469,10 @@ retry: { conflict = true; if (conflictTid) - *conflictTid = tup->t_self; + { + conflictTid->t_self = tup->t_self; + conflictTid->t_data = tup->t_data; + } break; } @@ -1506,7 +1509,10 @@ retry: { conflict = true; if (conflictTid) - *conflictTid = tup->t_self; + { + conflictTid->t_self = tup->t_self; + conflictTid->t_data = tup->t_data; + } break; } diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 69af2d1..a289ef4 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -56,7 +56,7 @@ static bool ExecLockUpdateTuple(EState *estate, ResultRelInfo *relinfo, - ItemPointer conflictTid, + HeapTuple conflictTid, TupleTableSlot *planSlot, TupleTableSlot *insertSlot, bool canSetTag, @@ -292,7 +292,7 @@ ExecInsert(TupleTableSlot *slot, else { bool conflict; - ItemPointerData conflictTid; + HeapTupleData conflictTid; /* * Constraints might reference the tableoid column, so initialize @@ -315,7 +315,7 @@ ExecInsert(TupleTableSlot *slot, */ vlock: conflict = false; - ItemPointerSetInvalid(&conflictTid); + ItemPointerSetInvalid(&conflictTid.t_self); /* * XXX If we know or assume that there are few duplicates, it would be @@ -432,7 +432,7 @@ vlock: &returning)) goto vlock; else if (spec == SPEC_IGNORE) - ExecCheckHeapTupleVisible(estate, resultRelInfo, &conflictTid); + ExecCheckHeapTupleVisible(estate, resultRelInfo, &conflictTid.t_self); /* * RETURNING may have been processed already -- the target @@ -976,7 +976,7 @@ lreplace:; static bool ExecLockUpdateTuple(EState *estate, ResultRelInfo *relinfo, - ItemPointer conflictTid, + HeapTuple conflictTid, TupleTableSlot *planSlot, TupleTableSlot *insertSlot, bool canSetTag, @@ -999,7 +999,7 @@ ExecLockUpdateTuple(EState *estate, * now, we just retry, and hopefully the new pre-check will fail on the * same tuple (or it's finished by now), and we'll get its TID that way. */ - if (!ItemPointerIsValid(conflictTid)) + if (!ItemPointerIsValid(&conflictTid->t_self)) { elog(DEBUG1, "insertion conflicted after pre-check"); return false; @@ -1013,15 +1013,15 @@ ExecLockUpdateTuple(EState *estate, * our previous conclusion that the tuple is the conclusively committed * conflicting tuple. */ - tuple.t_self = *conflictTid; + tuple.t_self = conflictTid->t_self; test = heap_lock_tuple(relation, &tuple, estate->es_output_cid, - LockTupleExclusive, false, /* wait */ + LockTupleExclusive, LockWaitBlock, false, &buffer, &hufd); /* Was tuple concurrently super-deleted? */ if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple.t_data), - InvalidTransactionId)) + InvalidTransactionId)) { test = HeapTupleUpdated; /* XXX Should be caught within tqual.c */ @@ -1157,6 +1157,16 @@ ExecLockUpdateTuple(EState *estate, slot = EvalPlanQualNext(&onConflict->mt_epqstate); + if (copyTuple->t_data->t_choice.t_heap.t_xmin != + conflictTid->t_data->t_choice.t_heap.t_xmin) + { + elog(ERROR, "mismatch in xmin for (%u,%u). Initially %u, then %u", + ItemPointerGetBlockNumber(©Tuple->t_data->t_ctid), + ItemPointerGetOffsetNumber(©Tuple->t_data->t_ctid), + copyTuple->t_data->t_choice.t_heap.t_xmin, + conflictTid->t_data->t_choice.t_heap.t_xmin); + } + if (!TupIsNull(slot)) *returning = ExecUpdate(&tuple.t_data->t_ctid, NULL, slot, planSlot, &onConflict->mt_epqstate, diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c index e7c16fb..abd7741 100644 --- a/src/backend/parser/parse_clause.c +++ b/src/backend/parser/parse_clause.c @@ -2288,12 +2288,14 @@ transformConflictClause(ParseState *pstate, ConflictClause *confClause, { InferClause *infer = confClause->infer; +#if 0 if (confClause->specclause == SPEC_INSERT && !infer) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("ON CONFLICT with UPDATE must contain columns or expressions to infer a unique index from"), parser_errposition(pstate, exprLocation((Node *) confClause)))); +#endif /* Raw grammar must ensure this invariant holds */ Assert(confClause->specclause != SPEC_INSERT || diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h index 4cdf958..b668619 100644 --- a/src/include/executor/executor.h +++ b/src/include/executor/executor.h @@ -359,14 +359,14 @@ extern List *ExecLockIndexValues(TupleTableSlot *slot, EState *estate, extern List *ExecInsertIndexTuples(TupleTableSlot *slot, ItemPointer tupleid, EState *estate, bool noDupErr, Oid arbiterIdx); extern bool ExecCheckIndexConstraints(TupleTableSlot *slot, EState *estate, - ItemPointer conflictTid, Oid arbiterIdx); + HeapTuple conflictTid, Oid arbiterIdx); extern bool check_exclusion_or_unique_constraint(Relation heap, Relation index, IndexInfo *indexInfo, ItemPointer tupleid, Datum *values, bool *isnull, EState *estate, bool newIndex, bool errorOK, - bool wait, ItemPointer conflictTid); + bool wait, HeapTuple conflictTid); extern void RegisterExprContextCallback(ExprContext *econtext, ExprContextCallbackFunction function,
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers